strtok question

  • Stu Cazzo

    strtok question

    Hi all,
    I have a question on why strtok is doing what it's doing for my
    splitString( string2 ); call.

    Below is the output for the entire program:

    token was: word1
    token was: word2
    token was: word3
    token was: word1
    token was: word3
    empty field found - token <(null)>


    The splitString( string1 ); works as expected, 3 tokens are found.
    The splitString( string2 ); does not work as I expected.
    I was expecting this:

    token was: word1
    empty field found - token <(null)>
    token was: word3

    Why does it not see the empty field for lineToken2?
    Is there a better way to strip out the tokens for the case I have
    where there is no space between my delimiters?

    -------------------------------------------------------------------
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    void splitString( char *string )
    {
        const char lineDelimiter[] = ",";
        char *lineToken1;
        char *lineToken2;
        char *lineToken3;

        /* strtok returns a null pointer when no further token is found */
        lineToken1 = strtok( string, lineDelimiter );
        if ( lineToken1 == NULL )
        {
            printf("empty field found - token <%s>\n", lineToken1);
        }
        else
        {
            printf("token was: %s\n", lineToken1);
        }

        lineToken2 = strtok( NULL, lineDelimiter );
        if ( lineToken2 == NULL )
        {
            printf("empty field found - token <%s>\n", lineToken2);
        }
        else
        {
            printf("token was: %s\n", lineToken2);
        }

        lineToken3 = strtok( NULL, lineDelimiter );
        if ( lineToken3 == NULL )
        {
            printf("empty field found - token <%s>\n", lineToken3);
        }
        else
        {
            printf("token was: %s\n", lineToken3);
        }
    }


    int main (int argc, const char **argv)
    {
        char string1[] = "word1,word2,word3";
        char string2[] = "word1,,word3";

        splitString( string1 );
        splitString( string2 );

        return 0;
    }

  • Jens Thoms Toerring

    #2
    Re: strtok question

    Stu Cazzo <SCazzo@gmail.com> wrote:
    > Hi all,
    > I have a question on why strtok is doing what it's doing for my
    > splitString( string2 ); call.
    > Below is the output for the entire program:
    > token was: word1
    > token was: word2
    > token was: word3
    > token was: word1
    > token was: word3
    > empty field found - token <(null)>
    > The splitString( string1 ); works as expected, 3 tokens are found.
    > The splitString( string2 ); does not work as I expected.
    > I was expecting this:
    > token was: word1
    > empty field found - token <(null)>
    > token was: word3
    > Why does it not see the empty field for lineToken2?

    Because that's not how strtok() works. The man page on my machine
    for strtok() actually makes it rather clear:

    A sequence of two or more contiguous delimiter characters in the
    parsed string is considered to be a single delimiter. Delimiter
    characters at the start or end of the string are ignored. Put
    another way: the tokens returned by strtok() are always non-empty
    strings.

    So if you have more than one ',' in a row all of them are treated
    the same as a single ','.

    > Is there a better way to strip out the tokens for the case I have
    > where there is no space between my delimiter?

    I guess you will have to write your own function for that, probably
    repeatedly using strchr() or strstr().

    Regards, Jens
    --
    \ Jens Thoms Toerring ___ jt@toerring.de
    \______________ ____________ http://toerring.de


    • CBFalconer

      #3
      Re: strtok question

      Jens Thoms Toerring wrote:
      > Stu Cazzo <SCazzo@gmail.com> wrote:
      >>
      .... snip ...
      >>
      >> The splitString( string1 ); works as expected, 3 tokens are found.
      >> The splitString( string2 ); does not work as I expected.
      >>
      .... snip ...
      >>
      >> Why does it not see the empty field for lineToken2?
      >
      > Because that's not how strtok() works. The man page on my machine
      > for strtok() actually makes it rather clear:
      >
      .... snip ...
      >
      >> Is there a better way to strip out the tokens for the case I have
      >> where there is no space between my delimiter?
      >
      > I guess you will have to write your own function for that, probably
      > repeatedly using strchr() or strstr().

      Try this:

      /* ------- file tknsplit.c ----------*/
      #include "tknsplit.h"

      /* copy over the next tkn from an input string, after
      skipping leading blanks (or other whitespace?). The
      tkn is terminated by the first appearance of tknchar,
      or by the end of the source string.

      The caller must supply sufficient space in tkn to
      receive any tkn; otherwise tkns will be truncated.

      Returns: a pointer past the terminating tknchar.

      This will happily return an infinity of empty tkns if
      called with src pointing to the end of a string. Tokens
      will never include a copy of tknchar.

      A better name would be "strtkn", except that is reserved
      for the system namespace. Change to that at your risk.

      released to Public Domain, by C.B. Falconer.
      Published 2006-02-20. Attribution appreciated.
      Revised 2006-06-13 2007-05-26 (name)
      */

      const char *tknsplit(const char *src, /* Source of tkns */
                           char tknchar,    /* tkn delimiting char */
                           char *tkn,       /* receiver of parsed tkn */
                           size_t lgh)      /* length tkn can receive */
                                            /* not including final '\0' */
      {
         if (src) {
            while (' ' == *src) src++;

            while (*src && (tknchar != *src)) {
               if (lgh) {
                  *tkn++ = *src;
                  --lgh;
               }
               src++;
            }
            if (*src && (tknchar == *src)) src++;
         }
         *tkn = '\0';
         return src;
      } /* tknsplit */

      --
      [mail]: Chuck F (cbfalconer at maineline dot net)
      [page]: <http://cbfalconer.home.att.net>
      Try the download section.


      ** Posted from http://www.teranews.com **


      • Ben Bacarisse

        #4
        Re: strtok question

        CBFalconer <cbfalconer@yahoo.com> writes:
        <snip>
        > while (*src && (tknchar != *src)) {
        <snip body>
        > }
        > if (*src && (tknchar == *src)) src++;

        Some people might find that test confusing. It is certainly
        belt-and-braces code.

        --
        Ben.


        • CBFalconer

          #5
          Re: strtok question

          Ben Bacarisse wrote:
          > CBFalconer <cbfalconer@yahoo.com> writes:
          > <snip>
          >> while (*src && (tknchar != *src)) {
          > <snip body>
          >> }
          >> if (*src && (tknchar == *src)) src++;
          >
          > Some people might find that test confusing. It is certainly
          > belt-and-braces code.

          You should have left the body. That code doesn't have the same
          effect. Assuming I am correctly interpreting your message.

          --
          [mail]: Chuck F (cbfalconer at maineline dot net)
          [page]: <http://cbfalconer.home.att.net>
          Try the download section.



          • Antoninus Twink

            #6
            Re: strtok question

            On 31 May 2008 at 23:37, Ben Bacarisse wrote:
            > CBFalconer <cbfalconer@yahoo.com> writes:
            > <snip>
            >> while (*src && (tknchar != *src)) {
            > <snip body>
            >> }
            >> if (*src && (tknchar == *src)) src++;
            >
            > Some people might find that test confusing. It is certainly
            > belt-and-braces code.

            Most people grow out of this sort of thing within a few months or so of
            their initial child-like excitement at discovering a language with so
            many side effects. Clarity and ease of debugging become more valuable
            than a transient smug feeling of cleverness.

            Actually I'm pretty tolerant of different people's ways of laying out
            code, indenting and the rest, but CBF really does seem to have total
            anti-taste when it comes to code formatting. Perhaps the most irritating
            thing of all is

            > } /* tknsplit */

            This seems to me to be about as helpful as the infamous

            i++; /* increment i by one */


            • Richard

              #7
              Re: strtok question

              Antoninus Twink <nospam@nospam.invalid> writes:
              > On 31 May 2008 at 23:37, Ben Bacarisse wrote:
              >> CBFalconer <cbfalconer@yahoo.com> writes:
              >> <snip>
              >>> while (*src && (tknchar != *src)) {
              >> <snip body>
              >>> }
              >>> if (*src && (tknchar == *src)) src++;
              >>
              >> Some people might find that test confusing. It is certainly
              >> belt-and-braces code.
              >
              > Most people grow out of this sort of thing within a few months or so of
              > their initial child-like excitement at discovering a language with so
              > many side effects. Clarity and ease of debugging become more valuable
              > than a transient smug feeling of cleverness.

              I have to agree 100% with this statement. Some people seem to take
              pleasure in obfuscating their code.

              > Actually I'm pretty tolerant of different people's ways of laying out
              > code, indenting and the rest, but CBF really does seem to have total
              > anti-taste when it comes to code formatting. Perhaps the most irritating
              > thing of all is
              >
              >> } /* tknsplit */
              >
              > This seems to me to be about as helpful as the infamous
              >
              > i++; /* increment i by one */

              Again agreed.



              • Ben Bacarisse

                #8
                Re: strtok question

                CBFalconer <cbfalconer@yahoo.com> writes:
                > Ben Bacarisse wrote:
                >> CBFalconer <cbfalconer@yahoo.com> writes:
                >> <snip>
                >>> while (*src && (tknchar != *src)) {
                >> <snip body>
                >>> }
                >>> if (*src && (tknchar == *src)) src++;
                >>
                >> Some people might find that test confusing. It is certainly
                >> belt-and-braces code.
                >
                > You should have left the body.

                It has no bearing on my point. Any loop body that terminates in the
                normal way would do just as well. Maybe I should have said "<snip
                body with no break statement>".

                > That code doesn't have the same
                > effect. Assuming I am correctly interpreting your message.

                I think you missed the point. If you want the function to work when
                tknchar might be 0, then if (tknchar == *src) src++; is enough. If
                you don't want it to work when tknchar is 0 (as seems to be the case)
                then if (*src) src++; is enough.

                It is not in any way wrong, just as if (c == '\n' && c) is not really
                wrong -- it just makes the reader do an unwarranted double take.

                --
                Ben.


                • Ben Bacarisse

                  #9
                  Re: strtok question

                  Antoninus Twink <nospam@nospam.invalid> writes:
                  > On 31 May 2008 at 23:37, Ben Bacarisse wrote:
                  >> CBFalconer <cbfalconer@yahoo.com> writes:
                  >> <snip>
                  >>> while (*src && (tknchar != *src)) {
                  >> <snip body>
                  >>> }
                  >>> if (*src && (tknchar == *src)) src++;
                  >>
                  >> Some people might find that test confusing. It is certainly
                  >> belt-and-braces code.
                  >
                  > Most people grow out of this sort of thing within a few months or so of
                  > their initial child-like excitement at discovering a language with so
                  > many side effects. Clarity and ease of debugging become more valuable
                  > than a transient smug feeling of cleverness.

                  I don't know what you mean in this case. I agree with the sentiment,
                  but what has it to do with this example? How do you write this without
                  using side effects?

                  The reference to debugging makes me think you object to the layout.
                  Would moving the increment of src down a line make it all OK?

                  > Actually I'm pretty tolerant of different people's ways of laying out
                  > code,

                  Ah. So you were commenting on the layout. What was the reference to
                  abusing side effects about?

                  --
                  Ben.
