Does strtok require a non-null token?

  • ryampolsky@gmail.com

    Does strtok require a non-null token?

    I'm using strtok to break apart a colon-delimited string. It basically
    works, but it looks like strtok skips over empty sections. In other
    words, if the string has 2 colons in a row, it doesn't treat that as a
    null token, it just treats the 2 colons as a single delimiter.

    Is that the intended behavior?

  • Al Balmer

    #2
    Re: Does strtok require a non-null token?

    On 12 Oct 2006 11:38:36 -0700, ryampolsky@gmail.com wrote:
    >I'm using strtok to break apart a colon-delimited string. It basically
    >works, but it looks like strtok skips over empty sections. In other
    >words, if the string has 2 colons in a row, it doesn't treat that as a
    >null token, it just treats the 2 colons as a single delimiter.
    >
    >Is that the intended behavior?
    Yes. This is one of the drawbacks of strtok. From the current
    position, it searches for the first character *not* in the delimiter
    set and makes that position the return pointer, then searches for the
    first character that *is* in the delimiter set and overwrites it with
    a null character.

    (Individual implementations may be different, but that's the way it's
    required to behave.)
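
The two-step search described above can be written out in standard C. The following is a minimal sketch, not the code of any actual library; the name `sketch_strtok` is made up for illustration. The `strspn` call in step 1 is what swallows a whole run of delimiters, so "a::b" never yields an empty token.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative restatement of the two-step search described above.
   sketch_strtok is a hypothetical name; real library code may differ. */
char *sketch_strtok(char *str, const char *delims)
{
    static char *next;            /* saved position between calls */
    char *start;

    if (str != NULL)
        next = str;               /* new string: restart the scan */
    if (next == NULL)
        return NULL;              /* nothing left to scan */

    /* Step 1: skip every leading delimiter. This is where "::" collapses. */
    start = next + strspn(next, delims);
    if (*start == '\0') {
        next = NULL;
        return NULL;              /* only delimiters remained */
    }

    /* Step 2: find the end of the token and terminate it. */
    next = start + strcspn(start, delims);
    if (*next != '\0')
        *next++ = '\0';           /* overwrite the delimiter, resume after it */
    else
        next = NULL;              /* token ran to the end of the string */

    return start;
}
```

Like strtok itself, the sketch keeps its position in a single static variable and writes into the caller's buffer.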

    For your application, it's probably easier to scan the string
    yourself.
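
Scanning the string yourself, as suggested above, could look like the sketch below. It assumes a single delimiter character and a writable buffer; `split_keep_empty` and its parameters are hypothetical names, not part of any library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits buf in place at every occurrence of delim and
   stores up to max field pointers in fields. Empty fields are kept.
   Returns the total number of fields found, which may exceed max. */
size_t split_keep_empty(char *buf, char delim, char **fields, size_t max)
{
    size_t n = 0;
    char *p = buf;

    for (;;) {
        /* Next delimiter, or NULL. A '\0' delimiter is treated as "none",
           because strchr would otherwise match the terminator itself. */
        char *end = (delim != '\0') ? strchr(p, delim) : NULL;

        if (n < max)
            fields[n] = p;        /* field starts here, possibly empty */
        n++;
        if (end == NULL)
            break;                /* last field runs to end of string */
        *end = '\0';              /* terminate this field */
        p = end + 1;              /* next field starts after the delimiter */
    }
    return n;
}
```

With this approach "a::b:" produces four fields ("a", an empty field, "b", and a trailing empty field), where strtok would report only "a" and "b".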

    --
    Al Balmer
    Sun City, AZ


    • William Hughes

      #3
      Re: Does strtok require a non-null token?


      ryampolsky@gmail.com wrote:
      > I'm using strtok to break apart a colon-delimited string. It basically
      > works, but it looks like strtok skips over empty sections. In other
      > words, if the string has 2 colons in a row, it doesn't treat that as a
      > null token, it just treats the 2 colons as a single delimiter.
      >
      > Is that the intended behavior?

      Yes. Just one more reason to avoid strtok().

      - William Hughes


      • Default User

        #4
        Re: Does strtok require a non-null token?

        William Hughes wrote:
        > ryampolsky@gmail.com wrote:
        >> I'm using strtok to break apart a colon-delimited string. It
        >> basically works, but it looks like strtok skips over empty
        >> sections. In other words, if the string has 2 colons in a row, it
        >> doesn't treat that as a null token, it just treats the 2 colons as
        >> a single delimiter.
        >>
        >> Is that the intended behavior?
        >
        > Yes. Just one more reason to avoid strtok().

        Unless that's the behavior you want. For example, when breaking lines
        into words separated by white space, you don't want a bunch of "null"
        words.
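
For the whitespace case mentioned above, strtok's merging of adjacent delimiters is the desired behaviour. A minimal sketch, assuming a writable line buffer; `split_words` is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits line in place into whitespace-separated words.
   Stores up to max pointers in words and returns the number stored. */
size_t split_words(char *line, char **words, size_t max)
{
    const char *ws = " \t\r\n";   /* delimiter set: any white space */
    size_t n = 0;
    char *w;

    /* Runs of blanks and tabs are skipped as a unit, so no empty words appear. */
    for (w = strtok(line, ws); w != NULL && n < max; w = strtok(NULL, ws))
        words[n++] = w;
    return n;
}
```

Leading, trailing and repeated white space in the input produces no extra entries in `words`.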





        Brian


        • Ben Pfaff

          #5
          Re: Does strtok require a non-null token?

          ryampolsky@gmail.com writes:
          > I'm using strtok to break apart a colon-delimited string. It basically
          > works, but it looks like strtok skips over empty sections. In other
          > words, if the string has 2 colons in a row, it doesn't treat that as a
          > null token, it just treats the 2 colons as a single delimiter.

          strtok() has at least these problems:

          * It merges adjacent delimiters. If you use a comma as your
          delimiter, then "a,,b,c" will be divided into three tokens,
          not four. This is often the wrong thing to do. In fact, it
          is only the right thing to do, in my experience, when the
          delimiter set contains white space (for dividing a string
          into "words") or it is known in advance that there will be
          no adjacent delimiters.

          * The identity of the delimiter is lost, because it is
          changed to a null terminator.

          * It modifies the string that it tokenizes. This is bad
          because it forces you to make a copy of the string if
          you want to use it later. It also means that you can't
          tokenize a string literal with it; this is not
          necessarily something you'd want to do all the time but
          it is surprising.

          * It can only be used once at a time. If a sequence of
          strtok() calls is ongoing and another one is started,
          the state of the first one is lost. This isn't a
          problem for small programs but it is easy to lose track
          of such things in hierarchies of nested functions in
          large programs. In other words, strtok() breaks
          encapsulation.
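
The last point can be reproduced with a small sketch in standard C. The function names `count_words` and `count_records_nested` are hypothetical. The inner loop reuses strtok's single hidden scan position, so when control returns to the outer loop the position of the ';'-separated scan is gone.

```c
#include <stddef.h>
#include <string.h>

/* Counts the space-separated words in one record. It uses strtok, so it
   silently overwrites the scan position of any caller that is mid-scan. */
static size_t count_words(char *record)
{
    size_t n = 0;
    char *w;

    for (w = strtok(record, " "); w != NULL; w = strtok(NULL, " "))
        n++;
    return n;
}

/* Intends to visit every ';'-separated record in buf.
   Returns how many records the outer loop actually saw. */
size_t count_records_nested(char *buf)
{
    size_t records = 0;
    char *rec;

    for (rec = strtok(buf, ";"); rec != NULL; rec = strtok(NULL, ";")) {
        records++;
        count_words(rec);         /* clobbers the outer scan state */
    }
    return records;
}
```

With the input "a b;c d;e f" the outer loop sees one record, not three: the inner scan ran to the end of "a b", so the next strtok(NULL, ";") returns a null pointer. POSIX systems offer strtok_r, which takes the saved position as an explicit argument, but it is not part of ISO C.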

          --
          "A lesson for us all: Even in trivia there are traps."
          --Eric Sosman


          • William Hughes

            #6
            Re: Does strtok require a non-null token?


            Default User wrote:
            > William Hughes wrote:
            >> ryampolsky@gmail.com wrote:
            >>> I'm using strtok to break apart a colon-delimited string. It
            >>> basically works, but it looks like strtok skips over empty
            >>> sections. In other words, if the string has 2 colons in a row, it
            >>> doesn't treat that as a null token, it just treats the 2 colons as
            >>> a single delimiter.
            >>>
            >>> Is that the intended behavior?
            >>
            >> Yes. Just one more reason to avoid strtok().
            >
            > Unless that's the behavior you want. Example, breaking lines into words
            > with white space. You don't want a bunch of "null" words.

            The point is not that the function's behaviour is not sometimes
            what you want. The point is

            -the default behaviour is surprising

            -the default behaviour is not even
            usually what you want

            -the default behaviour throws information away

            -if you don't like the default behaviour, see
            figure 1.

            Personally I'm with the Linux man pages on this one. Under Bugs
            is the advice "Never use this function".

            -William Hughes


            • Default User

              #7
              Re: Does strtok require a non-null token?

              William Hughes wrote:
              > Default User wrote:
              >> Unless that's the behavior you want. Example, breaking lines into
              >> words with white space. You don't want a bunch of "null" words.
              >
              > The point is not that the function's behaviour is not sometimes
              > what you want. The point is
              >
              > -the default behaviour is surprising

              Only if one fails to read the documentation. A number of functions are
              funny that way.

              > -the default behaviour is not even
              > usually what you want

              How do you know? Even if true, so what?

              > -the default behaviour throws information away

              Again, if you know that and it fits the problem, so what?

              > -if you don't like the default behaviour, see
              > figure 1.

              I don't understand this statement. I have no idea what "figure 1" is.

              > Personally I'm with the Linux man pages on this one. Under Bugs
              > is the advice "Never use this function".

              Well, that's stupid advice. The function may be tricky, but sometimes
              it's just the right thing. In those cases, it should be used. If not,
              it shouldn't.



              Brian




              • Al Balmer

                #8
                Re: Does strtok require a non-null token?

                On 12 Oct 2006 14:32:27 -0700, "William Hughes"
                <wpihughes@hotmail.com> wrote:
                > Default User wrote:
                >> William Hughes wrote:
                >>> ryampolsky@gmail.com wrote:
                >>>> I'm using strtok to break apart a colon-delimited string. It
                >>>> basically works, but it looks like strtok skips over empty
                >>>> sections. In other words, if the string has 2 colons in a row, it
                >>>> doesn't treat that as a null token, it just treats the 2 colons as
                >>>> a single delimiter.
                >>>>
                >>>> Is that the intended behavior?
                >>>
                >>> Yes. Just one more reason to avoid strtok().
                >>
                >> Unless that's the behavior you want. Example, breaking lines into words
                >> with white space. You don't want a bunch of "null" words.
                >
                > The point is not that the function's behaviour is not sometimes
                > what you want. The point is
                >
                > -the default behaviour is surprising

                The behavior of many functions might be surprising if you don't read
                the documentation.

                > -the default behaviour is not even
                > usually what you want

                Like any other function in the library, it's used where appropriate.
                Sometimes it *is* what I want.

                > -the default behaviour throws information away

                I don't really know what information you're referring to. You could
                just as easily say it adds information. If there's information that
                you need to protect, it's trivial.

                > -if you don't like the default behaviour, see
                > figure 1.

                ? Did you copy this from a book with pictures? That would explain the
                odd indentation, I suppose.

                > Personally I'm with the Linux man pages on this one. Under Bugs
                > is the advice "Never use this function".

                That's silly. Like any other function, it should be used when
                appropriate, and not used when not appropriate.

                --
                Al Balmer
                Sun City, AZ


                • CBFalconer

                  #9
                  Re: Does strtok require a non-null token?

                  ryampolsky@gmail.com wrote:
                  > I'm using strtok to break apart a colon-delimited string. It
                  > basically works, but it looks like strtok skips over empty
                  > sections. In other words, if the string has 2 colons in a row, it
                  > doesn't treat that as a null token, it just treats the 2 colons as
                  > a single delimiter.
                  >
                  > Is that the intended behavior?

                  Yes. If that is a problem, consider using my toksplit routine, the
                  code for which has been published here before. I think googling
                  for "toksplit" will bring it up, so I won't burden the newsgroup
                  with YAC (yet another copy).

                  --
                  Some informative links:
                  <news:news.announce.newusers>
                  <http://www.geocities.com/nnqweb/>
                  <http://www.catb.org/~esr/faqs/smart-questions.html>
                  <http://www.caliburn.nl/topposting.html>
                  <http://www.netmeister.org/news/learn2quote.html>
                  <http://cfaj.freeshell.org/google/>



                  • William Hughes

                    #10
                    Re: Does strtok require a non-null token?


                    Al Balmer wrote:
                    > On 12 Oct 2006 14:32:27 -0700, "William Hughes"
                    > <wpihughes@hotmail.com> wrote:
                    >> Default User wrote:
                    >>> William Hughes wrote:
                    >>>> ryampolsky@gmail.com wrote:
                    >>>>> I'm using strtok to break apart a colon-delimited string. It
                    >>>>> basically works, but it looks like strtok skips over empty
                    >>>>> sections. In other words, if the string has 2 colons in a row, it
                    >>>>> doesn't treat that as a null token, it just treats the 2 colons as
                    >>>>> a single delimiter.
                    >>>>>
                    >>>>> Is that the intended behavior?
                    >>>>
                    >>>> Yes. Just one more reason to avoid strtok().
                    >>>
                    >>> Unless that's the behavior you want. Example, breaking lines into words
                    >>> with white space. You don't want a bunch of "null" words.
                    >>
                    >> The point is not that the function's behaviour is not sometimes
                    >> what you want. The point is
                    >>
                    >> -the default behaviour is surprising
                    >
                    > The behavior of many functions might be surprising if you don't read
                    > the documentation.
                    >
                    >> -the default behaviour is not even
                    >> usually what you want
                    >
                    > Like any other function in the library, it's used where appropriate.
                    > Sometimes it *is* what I want.
                    >
                    >> -the default behaviour throws information away
                    >
                    > I don't really know what information you're referring to.

                    The number of delimiters. (strtok() also discards the identity
                    of these delimiters, but that has not been previously mentioned in
                    this subthread.)

                    > You could
                    > just as easily say it adds information. If there's information that
                    > you need to protect, it's trivial.
                    >
                    >> -if you don't like the default behaviour, see
                    >> figure 1.
                    >
                    > ? Did you copy this from a book with pictures? That would explain the
                    > odd indentation, I suppose.

                    Figure 1 is a picture of a hand with a single digit extended (guess
                    which one). It comes from an old piece of xerox-lore, a parody of DEC (?)
                    documentation in which an oft-repeated phrase is "see figure 1."
                    I guess the reference was a little too obscure.

                    >> Personally I'm with the Linux man pages on this one. Under Bugs
                    >> is the advice "Never use this function".
                    >
                    > That's silly. Like any other function, it should be used when
                    > appropriate, and not used when not appropriate.

                    Well, never is probably too strong. However, strtok() is dominated by
                    a good general purpose parsing method. Since you need a good
                    general purpose parsing method, why not use that instead of
                    strtok()?
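
What a more general routine might look like is a matter of taste. As one possible sketch (the `struct field` type and the `next_field` function are assumptions made for illustration, not anything proposed in this thread), the following scanner keeps empty fields, reports which delimiter ended each field, and leaves the source string untouched:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical field descriptor: a view into the source string. */
struct field {
    const char *start;  /* first character of the field (not null-terminated) */
    size_t len;         /* field length; 0 for an empty field */
    char delim;         /* delimiter that ended the field, '\0' at end of input */
};

/* Reads the field starting at *cursor and advances *cursor past its delimiter.
   Returns 1 if a field was produced, 0 once the input is exhausted.
   The source string is never modified, so string literals are acceptable. */
int next_field(const char **cursor, const char *delims, struct field *out)
{
    const char *p = *cursor;

    if (p == NULL)
        return 0;                          /* previous call hit end of input */
    out->start = p;
    out->len = strcspn(p, delims);         /* characters before next delimiter */
    out->delim = p[out->len];              /* which delimiter, or '\0' */
    *cursor = (out->delim != '\0') ? p + out->len + 1 : NULL;
    return 1;
}
```

Because the scan position lives in the caller's `cursor` variable, two scans can be interleaved without disturbing each other.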

                    - William Hughes


                    • Al Balmer

                      #11
                      Re: Does strtok require a non-null token?

                      On 12 Oct 2006 16:47:31 -0700, "William Hughes"
                      <wpihughes@hotmail.com> wrote:
                      >Well, never is probably too strong. However, strtok() is dominated by
                      >a good general purpose parsing method. Since you need a good
                      >general purpose parsing method, why not use that instead of
                      >strtok()?
                      Where one is needed, I do, and am obligated to supply it with the rest
                      of the code. Anyone maintaining that code is then obligated to read
                      and understand it.

                      Where it's not needed, strtok is already there, and the maintainer
                      already knows what it does.

                      It's the same reason I don't supply my own version of other parts of
                      the standard library.

                      --
                      Al Balmer
                      Sun City, AZ


                      • CBFalconer

                        #12
                        Re: Does strtok require a non-null token?

                        Ben Pfaff wrote:
                        > ryampolsky@gmail.com writes:
                        >> I'm using strtok to break apart a colon-delimited string. It
                        >> basically works, but it looks like strtok skips over empty
                        >> sections. In other words, if the string has 2 colons in a row,
                        >> it doesn't treat that as a null token, it just treats the 2
                        >> colons as a single delimiter.
                        >
                        > strtok() has at least these problems:
                        >
                        > * It merges adjacent delimiters. If you use a comma as your
                        >   delimiter, then "a,,b,c" will be divided into three tokens,
                        >   not four. This is often the wrong thing to do. In fact, it
                        >   is only the right thing to do, in my experience, when the
                        >   delimiter set contains white space (for dividing a string
                        >   into "words") or it is known in advance that there will be
                        >   no adjacent delimiters.
                        >
                        > * The identity of the delimiter is lost, because it is
                        >   changed to a null terminator.
                        >
                        > * It modifies the string that it tokenizes. This is bad
                        >   because it forces you to make a copy of the string if
                        >   you want to use it later. It also means that you can't
                        >   tokenize a string literal with it; this is not
                        >   necessarily something you'd want to do all the time but
                        >   it is surprising.
                        >
                        > * It can only be used once at a time. If a sequence of
                        >   strtok() calls is ongoing and another one is started,
                        >   the state of the first one is lost. This isn't a
                        >   problem for small programs but it is easy to lose track
                        >   of such things in hierarchies of nested functions in
                        >   large programs. In other words, strtok() breaks
                        >   encapsulation.

                        Whence sprang toksplit, which returns a pointer to the src string
                        just past the delimiting char, except at end of string. The only
                        possible nuisance IMO is that it handles only one possible token
                        delimiter char (apart from '\0').

                        const char *toksplit(const char *src, /* Source of tokens */
                                             char tokchar,    /* token delimiting char */
                                             char *token,     /* receiver of parsed token */
                                             size_t lgh);     /* length token can receive, */
                                                              /* not including final '\0' */
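
A routine with that parameter list might be sketched as follows. This is an illustration written from the description above, not CBFalconer's published toksplit code; the name `toksplit_sketch` and the truncation of over-long tokens are assumptions.

```c
#include <stddef.h>

/* Sketch only: same parameter list as the prototype quoted above, but the
   name and the body are assumptions, not the published toksplit routine. */
const char *toksplit_sketch(const char *src,  /* source of tokens */
                            char tokchar,     /* token delimiting char */
                            char *token,      /* receiver of parsed token */
                            size_t lgh)       /* room in token, excluding '\0' */
{
    /* Copy up to the delimiter or end of string; characters that do not
       fit in token are skipped (assumed truncation behaviour). */
    while (*src != '\0' && *src != tokchar) {
        if (lgh > 0) {
            *token++ = *src;
            lgh--;
        }
        src++;
    }
    *token = '\0';

    /* Step past the delimiter, but never past the end of the string. */
    if (*src != '\0')
        src++;
    return src;
}
```

Called repeatedly on "a::b" with ':' as the delimiter, this yields "a", then an empty token, then "b", after which the returned pointer points at the terminating '\0'. The source string is read-only throughout, and the caller holds the scan position.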




                        • William Hughes

                          #13
                          Re: Does strtok require a non-null token?


                          Al Balmer wrote:
                          > On 12 Oct 2006 16:47:31 -0700, "William Hughes"
                          > <wpihughes@hotmail.com> wrote:
                          >> Well, never is probably too strong. However, strtok() is dominated by
                          >> a good general purpose parsing method. Since you need a good
                          >> general purpose parsing method, why not use that instead of
                          >> strtok()?
                          >
                          > Where one is needed, I do, and am obligated to supply it with the rest
                          > of the code. Anyone maintaining that code is then obligated to read
                          > and understand it.
                          >
                          > Where it's not needed, strtok is already there, and the maintainer
                          > already knows what it does.
                          >
                          > It's the same reason I don't supply my own version of other parts of
                          > the standard library.

                          Ok. I can see why, if you expect the code to be maintained by
                          others (a common setup), you would want to use standard functions.
                          And, usually, a bad standard is better than no standard. But there
                          are limits!

                          In any case, I don't think your average maintenance drone would
                          know how strtok() works, or that said drone would be better
                          at reading the documentation for strtok() than any documentation
                          you supply with a good general purpose routine.

                          One reason for my opinion is that personally I don't find
                          strtok() very useful (indeed, outside of a couple of exercises that
                          mandated its use, I don't think I have ever used it). This is in
                          part because, if possible, I don't use C for string manipulation.
                          But even when I do use C I don't use strtok(). Clearly, my
                          situation may not be the most usual one (or even common).


                          - William Hughes


                          • Clever Monkey

                            #14
                            Re: Does strtok require a non-null token?

                            William Hughes wrote:
                            > Default User wrote:
                            >> William Hughes wrote:
                            >>> ryampolsky@gmail.com wrote:
                            >>>> I'm using strtok to break apart a colon-delimited string. It
                            >>>> basically works, but it looks like strtok skips over empty
                            >>>> sections. In other words, if the string has 2 colons in a row, it
                            >>>> doesn't treat that as a null token, it just treats the 2 colons as
                            >>>> a single delimiter.
                            >>>>
                            >>>> Is that the intended behavior?
                            >>>
                            >>> Yes. Just one more reason to avoid strtok().
                            >>
                            >> Unless that's the behavior you want. Example, breaking lines into words
                            >> with white space. You don't want a bunch of "null" words.
                            >
                            > The point is not that the function's behaviour is not sometimes
                            > what you want. The point is
                            >
                            > -the default behaviour is surprising

                            Perhaps. About the only thing surprising to me was that the argument
                            you pass it is affected.

                            > -the default behaviour is not even
                            > usually what you want

                            So far I've only ever needed the default behaviour with respect to
                            collapsing adjacent delimiters. In fact, I *expected* this! That is, for
                            the majority of the reasons I need to tokenize a string, this default
                            behaviour is exactly what I want.

                            > -the default behaviour throws information away

                            Not sure what you mean here, but I assume you are referring to how it
                            munges its argument. I guess I just never care about this because we
                            always store strings in a struct that is passed around, or make copies
                            of things we tokenized and care about.

                            > -if you don't like the default behaviour, see
                            > figure 1.

                            I assume figure 1 is a picture of your own implementation that has
                            non-default requirements :)

                            > Personally I'm with the Linux man pages on this one. Under Bugs
                            > is the advice "Never use this function".

                            Well, I'll ignore this advice. For the trivial case of needing
                            to tokenize a string to store in my own array of buffers, it works
                            just fine.

                            For those requirements that strtok() does not fit we have our own
                            internal tokenizing routines. If all I need is to parse out (say) a
                            bunch of email addresses passed as a list and store them in a char**
                            [which was the last time I used strtok()] then it fits perfectly. In
                            this case I don't even care if the calling code screwed up the list. I
                            either get one or more valid strings or I don't. I return success or
                            failure and let them howl!

                            Of course, if I'd been bitten by the function in the past, I'd be
                            arguing differently.

                            Many of the str_ routines in the Standard have some legacy use that
                            explains design decisions [e.g., strncpy() and database column width].
                            I wonder if strtok() also has history that explains why the defaults
                            cause so much consternation?


                            • William Hughes

                              #15
                              Re: Does strtok require a non-null token?


                              Clever Monkey wrote:
                              > William Hughes wrote:
                              >> Default User wrote:
                              >>> William Hughes wrote:
                              >>>> ryampolsky@gmail.com wrote:
                              >>>>> I'm using strtok to break apart a colon-delimited string. It
                              >>>>> basically works, but it looks like strtok skips over empty
                              >>>>> sections. In other words, if the string has 2 colons in a row, it
                              >>>>> doesn't treat that as a null token, it just treats the 2 colons as
                              >>>>> a single delimiter.
                              >>>>>
                              >>>>> Is that the intended behavior?
                              >>>>
                              >>>> Yes. Just one more reason to avoid strtok().
                              >>>
                              >>> Unless that's the behavior you want. Example, breaking lines into words
                              >>> with white space. You don't want a bunch of "null" words.
                              >>
                              >> The point is not that the function's behaviour is not sometimes
                              >> what you want. The point is
                              >>
                              >> -the default behaviour is surprising
                              >
                              > Perhaps. About the only thing surprising to me was that the argument
                              > you pass it is affected.
                              >
                              >> -the default behaviour is not even
                              >> usually what you want
                              >
                              > So far I've only ever needed the default behaviour with respect to
                              > collapsing adjacent delimiters. In fact, I *expected* this! That is, for
                              > the majority of the reasons I need to tokenize a string, this default
                              > behaviour is exactly what I want.
                              >
                              >> -the default behaviour throws information away
                              >
                              > Not sure what you mean here, but I assume you are referring to how it
                              > munges its argument.

                              No, it also throws away the number [and identity] of the delimiters.

                              > I guess I just never care about this because we
                              > always store strings in a struct that is passed around, or make copies
                              > of things we tokenized and care about.
                              >
                              >> -if you don't like the default behaviour, see
                              >> figure 1.
                              >
                              > I assume figure 1 is a picture of your own implementation that has
                              > non-default requirements :)

                              Nope. See the jargon file.

                              >> Personally I'm with the Linux man pages on this one. Under Bugs
                              >> is the advice "Never use this function".
                              >
                              > Well, I'll ignore this advice.

                              Chacun à son goût.

                              > For the trivial case of needing
                              > to tokenize a string to store in my own array of buffers, it works
                              > just fine.
                              >
                              > For those requirements that strtok() does not fit we have our own
                              > internal tokenizing routines.

                              And your reason for not using them in preference to strtok()?

                              > If all I need is to parse out (say) a
                              > bunch of email addresses passed as a list and store them in a char**
                              > [which was the last time I used strtok()] then it fits perfectly. In
                              > this case I don't even care if the calling code screwed up the list. I
                              > either get one or more valid strings or I don't. I return success or
                              > failure and let them howl!
                              >
                              > Of course, if I'd been bitten by the function in the past, I'd be
                              > arguing differently.
                              >
                              > Many of the str_ routines in the Standard have some legacy use that
                              > explains design decisions [e.g., strncpy() and database column width].
                              > I wonder if strtok() also has history that explains why the defaults
                              > cause so much consternation?

                              I am sure that the defaults were chosen for what was at the
                              time a good reason (maybe because the immediate need was removing
                              whitespace). The fact remains that they are not a good choice for a
                              general purpose routine (and the fact that they are "mandatory
                              defaults" makes things even worse).

                              - William Hughes

