looking for implementation of strtok

  • magicman

    looking for implementation of strtok

    can anyone point me out to its implementation in C before I roll my
    own.

    thx
  • Jensen Somers

    #2
    Re: looking for implementation of strtok

    magicman wrote:
    >can anyone point me out to its implementation in C before I roll my
    >own.
    >
    >thx

    First Google result:


    - Jensen


    • magicman

      #3
      Re: looking for implementation of strtok

      On Apr 19, 5:27 pm, Jensen Somers <jensen.som...@gmail.com> wrote:
      >magicman wrote:
      >>can anyone point me out to its implementation in C before I roll my
      >>own.
      >>
      >>thx
      >
      >First Google result: http://www.openbsd.org/cgi-bin/cvswe...ng/strtok.c?re...
      >
      >- Jensen

      what does s[-1] = 0; in the implementation mean?


      • CBFalconer

        #4
        Re: looking for implementation of strtok

        magicman wrote:
        >
        >can anyone point me out to its implementation in C before I roll
        >my own.

        Why? strtok() is part of the standard C library, and should always
        be available.

        7.21.5.8 The strtok function

        Synopsis
        [#1]
        #include <string.h>
        char *strtok(char * restrict s1,
        const char * restrict s2);

        --
        [mail]: Chuck F (cbfalconer at maineline dot net)
        [page]: <http://cbfalconer.home.att.net>
        Try the download section.


        ** Posted from http://www.teranews.com **


        • vippstar@gmail.com

          #5
          Re: looking for implementation of strtok

          On Apr 20, 2:17 am, CBFalconer <cbfalco...@yahoo.com> wrote:
          >magicman wrote:
          >>
          >>can anyone point me out to its implementation in C before I roll
          >>my own.

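          #include <string.h>

          char *strtok(char * restrict s1, const char * restrict s2) {
              static char *p;
              char *v;

              if (s1 != NULL) p = s1;
              if (p == NULL) return NULL;             /* nothing saved */
              /* strchr() also matches the '\0' ending s2, so test *p first */
              while (*p != 0 && strchr(s2, *p)) p++;  /* skip delimiters */
              if (*p == 0) { p = NULL; return NULL; } /* no token left */
              v = p;                                  /* token starts here */
              for (; *p != 0; p++)
                  if (strchr(s2, *p)) { *p++ = 0; return v; }  /* end token */
              return v;                               /* token ends at '\0' */
          }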

          >Why? strtok() is part of the standard C library, and should always
          >be available.

          Most likely to learn.


          • Barry Schwarz

            #6
            Re: looking for implementation of strtok

            On Sat, 19 Apr 2008 14:25:43 -0700 (PDT), magicman
            <ironsel2000@gmail.com> wrote:
            >can anyone point me out to its implementation in C before I roll my
            >own.
            The function does not have a single implementation. Each run-time
            library has its own implementation of the function. Some libraries
            include the source for the functions. Google can probably point you
            to some publicly available ones. Whether any particular
            implementation has any relationship to your system is something you
            will have to determine.


            Remove del for email


            • Peter Nilsson

              #7
              Re: looking for implementation of strtok

              CBFalconer wrote:
              >magicman wrote:
              >>can anyone point me out to its implementation in C before I roll
              >>my own.
              >
              >Why? strtok() is part of the standard C library, and should always
              >be available.

              It needn't be available on a freestanding implementation.

              >7.21.5.8 The strtok function
              >
              >Synopsis
              >[#1]
              >#include <string.h>
              >char *strtok(char * restrict s1,
              >const char * restrict s2);

              C99 4p6: "...A conforming freestanding implementation shall accept
              any strictly conforming program that does not use complex types
              and in which the use of the features specified in the library clause
              (clause 7) is confined to the contents of the standard headers
              <float.h>, <iso646.h>, <limits.h>, <stdarg.h>, <stdbool.h>,
              <stddef.h>, and <stdint.h>."

              Note the absence of <string.h>.
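On such a freestanding implementation, a tokenizer would have to supply its own span helpers. A minimal sketch, assuming only <stddef.h> (one of the headers 4p6 does list); the names my_strspn and my_strcspn are hypothetical, chosen to stay clear of the reserved str* prefix:

```c
#include <stddef.h>

/* Hypothetical stand-in for strspn(): length of the initial run of s
   made up entirely of characters found in accept. */
size_t my_strspn(const char *s, const char *accept)
{
    size_t n = 0;
    for (; s[n] != '\0'; n++) {
        const char *a = accept;
        while (*a != '\0' && *a != s[n])
            a++;
        if (*a == '\0')          /* s[n] is not in accept: stop */
            break;
    }
    return n;
}

/* Hypothetical stand-in for strcspn(): length of the initial run of s
   containing no character from reject. */
size_t my_strcspn(const char *s, const char *reject)
{
    size_t n = 0;
    for (; s[n] != '\0'; n++) {
        const char *r = reject;
        while (*r != '\0' && *r != s[n])
            r++;
        if (*r != '\0')          /* s[n] is in reject: stop */
            break;
    }
    return n;
}
```

Each follows the contract of the hosted function it mimics, so a strspn()/strcspn()-style tokenizer can be layered on top of them.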

              --
              Peter


              • Chris Torek

                #8
                Re: looking for implementation of strtok

                >magicman wrote:
                >>can anyone point me out to its implementation in C before I roll my
                >>own.

                [As others noted elsethread, there is not really an "its
                implementation", which implies exactly one single implementation.]

                >On Apr 19, 5:27 pm, Jensen Somers <jensen.som...@gmail.com> wrote:
                >>First Google
                >>result: http://www.openbsd.org/cgi-bin/cvswe...ng/strtok.c?re...

                In article <ceefe876-b3a5-4715-9a58-1b491587a58b@24g2000hsh.googlegroups.com>
                magicman <ironsel2000@gmail.com> wrote:
                >what does s[-1] = 0; in the implementation mean?

                The version above (with the truncated URL) is a rehash of the
                implementation I wrote in 1988. Mine did not have a strtok_r()
                (which is not even in C99, much less C89), and in any case it
                would be better to put strtok_r() in a separate file.

                The C99 "restrict" qualifiers are missing, and should be added
                if you have a C99-based system (though most people do not).

                These days it probably would be better, in some sense, to go ahead
                and call strspn() and strcspn(), rather than writing them in-line
                as I did. (When I wrote this, subroutine calls were relatively
                expensive, and compilers never did any inline expansion, so we had
                to do it by hand wherever it was profitable.)

                The assignment:

                s[-1] = 0;

                writes a '\0' at s[-1], of course. If you mean "what does s[-1]
                even *mean*", it means: "add -1 to the pointer in s, and then
                follow the resulting pointer". Adding an integer to a pointer
                moves "forward" by that number of objects, so adding -1 moves
                "forward" negative-one "char"s, or in other words, moves backward
                by one "char". Earlier we had:

                c = *s++;

                so s now points one "char" beyond the location that holds the
                character stored in "c". Pictorially, suppose we call strtok()
                with:

                char buf[] = ",hello,/,world";
                char *result;

                result = strtok(buf, ",/");
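The s[-1] step can be checked in isolation. A small sketch, using a hypothetical helper cut_at() that mirrors the "c = *s++" pattern on this same buf and then writes through s[-1], which C defines as *(s - 1):

```c
#include <string.h>

/* Hypothetical demo helper: mimics the end of the token scan, where s has
   just been advanced one char past the delimiter by "c = *s++". */
char *cut_at(char *buf, size_t delim_index)
{
    char *s = buf + delim_index;
    char c = *s++;      /* c is the delimiter; s now points one past it */
    (void)c;
    s[-1] = '\0';       /* s[-1] is *(s - 1): overwrites the delimiter */
    return s;           /* where a later scan would resume */
}
```

After cut_at(buf, 6), buf + 1 reads as "hello" and the returned pointer is &buf[7], the '/'.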

                When we enter strtok(), we have "s" pointing to &buf[0], which
                contains a ',' character. The static variable "last" is initially
                NULL, or is left over from a previous call to strtok(), but either
                way we are not going to use it, and are soon going to overwrite
                it.

                The strtok() call then calls strtok_r(), passing s (&buf[0]), delim
                (pointing to the ',' in the string ",/"), and &last. The strtok_r()
                function begins with (I will update this code for C99 while writing
                this; note this changes the name of the third argument to "lastp",
                as I would have called the variable if I had written this):

                char *strtok_r(char *restrict s, const char *restrict delim, char **lastp) {
                    const char *restrict spanp;
                    char c, sc;
                    char *restrict tok;

                    if (s == NULL && (s = *lastp) == NULL)
                        return NULL;

                Since s is not NULL, we do not even look at *lastp, much less
                return NULL. (However, if we call strtok() again later, passing
                NULL for s, we *will* look at *lastp, i.e., strtok()'s "last",
                at that point. This is how we will save a "sideways" result
                for later strtok() calls. This is not a *good* way to do it;
                see the BSD strsep() function for a better method.)
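For comparison, a sketch of the strsep()-style interface mentioned above. This is a hypothetical my_strsep(), written from the commonly documented behavior of BSD strsep() rather than from the BSD source: the caller holds the state in a pointer it owns, so there is no hidden static, and adjacent delimiters produce empty fields instead of being skipped.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical my_strsep(): *sp is caller-owned state. Returns the next
   field (possibly empty) or NULL once *sp has been set to NULL. */
char *my_strsep(char **sp, const char *delim)
{
    char *tok = *sp;
    char *end;

    if (tok == NULL)
        return NULL;                 /* no more fields */
    end = tok + strcspn(tok, delim); /* first delimiter or the '\0' */
    if (*end == '\0') {
        *sp = NULL;                  /* last field: nothing follows */
    } else {
        *end = '\0';                 /* terminate this field */
        *sp = end + 1;               /* resume just past the delimiter */
    }
    return tok;
}
```

With "a,,b" and ",", successive calls yield "a", "", "b", and then NULL.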

                Next, we enter the ugly loop that simply implements strspn():

                cont:
                    c = *s++;
                    for (spanp = delim; (sc = *spanp++) != '\0';) {
                        if (c == sc)
                            goto cont;
                    }

                In this case, "delim" points to ",/" and "c" initially picks up
                the ',' from &buf[0], leaving s pointing to &buf[1]. The inner
                loop sets spanp to delim, then loops while setting sc (the "span
                character") from *spanp++ until sc is '\0'. Because the
                loop increments spanp each time, it will look at all the non-'\0'
                characters in delim, i.e., ',' and '/' respectively, until
                something happens. In this case, the "something" happens
                right away: c==',' and sc==',', so c==sc, so we execute the
                "goto cont" line and thus start the outer loop over again.

                This time, c gets the 'h' from buf[1], and s winds up pointing
                to the 'e' in buf[2]. Since c=='h', and the inner loop looks
                for ',' and '/', we get all the way through the inner loop
                without doing a "goto cont". This leaves c=='h' and exits the
                outer loop.

                (The outer loop logically should use something other than a
                "goto", but more realistically, we could just replace the whole
                thing with:

                s += strspn(s, delim);
                c = *s++;

                The strspn() function counts the number of characters that are
                in the "span set", up to the first one that is not in it. That
                is, since s points to ",hello,/,world" and the span-set is ",/",
                strspn() would return 1, and s += 1 would then advance over the
                initial comma. If s pointed to ",/,hello", strspn() would
                return 3, and s += 3 would advance over all three initial
                characters.)
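The strspn() replacement, as a sketch. The helper name first_token_char() is hypothetical; it performs the two lines shown above and leaves s one past the returned character, just as the goto loop does:

```c
#include <string.h>

/* Hypothetical helper: skip leading delimiters with strspn(), then read
   the first token character with "c = *s++", updating the caller's s. */
char first_token_char(const char **sp, const char *delim)
{
    const char *s = *sp;
    char c;

    s += strspn(s, delim);   /* advance over the "span set" */
    c = *s++;                /* first non-delimiter, or '\0' */
    *sp = s;                 /* s is left one past c */
    return c;
}
```

With ",hello,/,world" and ",/", it returns 'h' and leaves the pointer at the 'e'; strspn() itself returns 1 for that string and 3 for ",/,hello", matching the counts above.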

                In any case, now that we have skipped leading "non-token"
                characters, we are ready to look for tokens:

                    if (c == '\0') {
                        *lastp = NULL;
                        return NULL;
                    }

                If there are no more tokens, we set *lastp to NULL (this is, I
                think, allowed but not required by the C standard) and return NULL.
                However, c=='h', so we do not do this, but instead plow on to the
                trickiest, densest part of the code:

                    tok = s - 1;
                    for (;;) {
                        c = *s++;
                        spanp = delim;
                        do {
                            if ((sc = *spanp++) == c) {
                                /* [to be shown and explained in a moment] */
                            }
                        } while (sc != '\0');
                    }
                    /* NOTREACHED */

                Remember that s points one character *past* the beginning of
                the token -- in this case, to the 'e' in "hello" -- because we
                did "c = *s++". So we set tok to s-1, so that tok points to
                the 'h' in "hello". We then enter the outer loop, which -- as
                the comment notes -- really just implements strcspn().

                For each trip through the outer loop, we look at the next character
                of the proposed token, via "c = *s++" as before. In this case,
                it first sets c to the 'e' in buf[2], leaving s pointing to buf[3]
                (the first 'l').

                The inner loop then runs through each character in "delim", putting
                it in sc each time, checking to see whether c==sc. Since the two
                delimiter characters are ',' and '/', and 'e' is neither a comma nor
                a slash, this will not find them. But here is one of the tricky
                bits: since "delim" is a C string, it ends with a '\0' character.
                I want *all* the loops to stop if c=='\0'. Rather than putting
                in a special test for c=='\0', I simply allow the inner loop to
                test c against the '\0' that ends the delimiter string.

                In other words, since delim points to {',', '/', '\0'}, I can let
                sc take on all *three* of these "char"s, and compare c==sc each
                time. I stop the inner loop only *after* both c!=sc *and* sc=='\0'.
                So this allows me to test c=='\0' without writing a separate "if"
                statement.
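The '\0' trick can be seen in a stripped-down copy of the token loop. This sketch (hand_cspn() is a hypothetical name) keeps the same do/while shape but returns the token length instead of modifying the string; because the inner loop also compares c against the terminator of delim, reaching the end of the input ends the scan with no separate test:

```c
#include <stddef.h>

/* Hypothetical copy of the token-scanning loop that returns a length.
   The do/while tries every char of delim, including its final '\0'. */
size_t hand_cspn(const char *s, const char *delim)
{
    const char *start = s;
    const char *spanp;
    char c, sc;

    for (;;) {
        c = *s++;
        spanp = delim;
        do {
            if ((sc = *spanp++) == c)        /* delimiter, or both '\0' */
                return (size_t)(s - 1 - start);
        } while (sc != '\0');
    }
}
```

For "hello,/,world" it returns 5 (stopping at the comma), and for "world" it also returns 5 (stopping at the terminating '\0'), the same values strcspn() gives.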

                In any case, none of 'e', 'l', 'l', and 'o' are token-delimiters,
                so eventually "c = *s++" sets c to ',', leaving s pointing to the
                '/' in ",hello,/,world". This time the inner loop *does* find
                a token-delimiter, and we get to the code inside the "if":

                if (c == '\0')
                    s = NULL;
                else
                    s[-1] = '\0';
                *lastp = s;
                return tok;

                Here, we check to see if c=='\0' first. A '\0' ends the string --
                meaning s points one past the end -- and the end of the string is
                necessarily the end of the token. But it also means there cannot
                possibly be any additional tokens. So we want to set *lastp to
                NULL, so that a future call to strtok() will return NULL. Setting
                s to NULL and *lastp to s will accomplish that. (In this case,
                since c==sc, we will also have sc=='\0'. That is, we found both
                the end of the delimiter string *and* the end of the token. This
                is OK; we do not care which "token-ending character" we found.)

                In our case, though, c!='\0', because c==','. So we set s[-1] to
                '\0'. Here s points to the '/' in buf[7], so this replaces the
                ',' in buf[6] with '\0'. (Remember, buf is initially set to
                ",hello,/,world", so buf[0] is ',', buf[1] is 'h', buf[2] is 'e',
                buf[3] is 'l', buf[4] is 'l', buf[5] is 'o', buf[6] is ',', buf[7]
                is '/', buf[8] is ',', buf[9] is 'w', and so on.) We leave s==&buf[7],
                and set *lastp = s -- so now the variable "last" in strtok()
                points to the '/' in buf[7].

                Finally, we return "tok", which is &buf[1]. The contents of buf[]
                have been modified slightly, and the static variable "last" in
                strtok() points just past the first token, so that a later call to
                strtok() can find more tokens.
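Putting the pieces shown so far together gives a complete function. This is a sketch that reassembles the fragments above; it is renamed my_strtok_r (a hypothetical name, to avoid clashing with a system strtok_r()) and omits the restrict qualifiers on the local pointers:

```c
#include <stddef.h>

/* The walkthrough fragments assembled; behavior follows the text above. */
char *my_strtok_r(char *s, const char *delim, char **lastp)
{
    const char *spanp;
    char c, sc;
    char *tok;

    if (s == NULL && (s = *lastp) == NULL)
        return NULL;

    /* Skip leading delimiters (an in-line strspn()). */
cont:
    c = *s++;
    for (spanp = delim; (sc = *spanp++) != '\0';) {
        if (c == sc)
            goto cont;
    }

    if (c == '\0') {            /* nothing but delimiters remained */
        *lastp = NULL;
        return NULL;
    }
    tok = s - 1;

    /* Scan the token (an in-line strcspn()); the '\0' ending delim
       doubles as the end-of-string test. */
    for (;;) {
        c = *s++;
        spanp = delim;
        do {
            if ((sc = *spanp++) == c) {
                if (c == '\0')
                    s = NULL;       /* no further tokens possible */
                else
                    s[-1] = '\0';   /* overwrite the delimiter */
                *lastp = s;
                return tok;
            }
        } while (sc != '\0');
    }
    /* NOTREACHED */
}
```

On ",hello,/,world" with ",/", the first call returns &buf[1] and leaves buf[6] == '\0' and the saved pointer at &buf[7], as traced above; a second call with NULL returns &buf[9] ("world") and clears the saved pointer.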

                Again, the two nasty loops can be replaced by strspn() -- which
                will "span" (or skip over) initial delimiters -- and strcspn(),
                which will span the "c"omplement of the delimiter-set, i.e., skip
                over initial "non-delimiters". Since everything that is not
                a delimiter is a token character, this will give us just what
                we want -- and we can rewrite the entire thing as:

                char *strtok(char *restrict s, const char *restrict delim) {
                    static char *last;
                    char *restrict tok;

                    /* pick up from previous stopping point, if told to do so */
                    if (s == NULL && (s = last) == NULL)
                        return NULL;

                    /* skip initial delimiters to find start of token */
                    tok = s + strspn(s, delim);

                    /* skip over non-delimiters to find end of token */
                    s = tok + strcspn(tok, delim);

                    /*
                     * Save state for next call, and return token. If there
                     * is no token, both *tok and *s will be '\0'. If there
                     * is one token that is ended with '\0', *s will be '\0'.
                     * (If there are trailing delimiters, we will skip them
                     * later.)
                     */
                    last = *s == '\0' ? NULL : s + 1;
                    return *tok == '\0' ? NULL : tok;
                }

                This code is now short enough that we can just use a separate copy
                for strtok_r(), if we like; or we can use the BSD-style "strtok is
                a wrapper around strtok_r", which just needs the obvious changes
                with "lastp".
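One way to make those "obvious changes" for a strtok_r()-style variant, as a sketch. The names span_strtok_r() and span_strtok() are hypothetical. Note that it also writes the '\0' that ends each token; the strtok() above leaves that step out, which is the bug Chris corrects in his follow-up (#10).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reentrant variant: the caller supplies the saved state. */
char *span_strtok_r(char *s, const char *delim, char **lastp)
{
    char *tok;

    if (s == NULL && (s = *lastp) == NULL)
        return NULL;

    tok = s + strspn(s, delim);      /* start of token */
    s = tok + strcspn(tok, delim);   /* one past its last char */

    if (*s == '\0') {
        *lastp = NULL;               /* input exhausted */
    } else {
        *s++ = '\0';                 /* terminate the token */
        *lastp = s;                  /* resume after the delimiter */
    }
    return *tok == '\0' ? NULL : tok;
}

/* BSD-style: strtok() as a thin wrapper keeping the state in a static. */
char *span_strtok(char *s, const char *delim)
{
    static char *last;
    return span_strtok_r(s, delim, &last);
}
```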

                It is also short enough to demonstrate that strtok() is not a
                particularly useful function: you can call strspn() and strcspn()
                yourself, and avoid the problems that eventually crop up with the
                saved state in the static variable called "last" here.
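A sketch of calling strspn() and strcspn() directly, as suggested above. The helper next_token() is hypothetical: it neither modifies the input nor keeps static state, and it reports each token as a pointer plus a length:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the next token in s (or NULL
   if none remain) and stores its length in *lenp. The input is not
   modified; the caller resumes at the returned pointer + *lenp. */
const char *next_token(const char *s, const char *delim, size_t *lenp)
{
    s += strspn(s, delim);           /* skip delimiters */
    if (*s == '\0')
        return NULL;                 /* no more tokens */
    *lenp = strcspn(s, delim);       /* token runs to the next delimiter */
    return s;
}
```

A caller can loop with for (p = src; (tok = next_token(p, delim, &len)) != NULL; p = tok + len) and print each token with printf("%.*s\n", (int)len, tok).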
                --
                In-Real-Life: Chris Torek, Wind River Systems
                Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
                email: gmail (figure it out) http://web.torek.net/torek/index.html


                • CBFalconer

                  #9
                  Re: looking for implementation of strtok

                  Peter Nilsson wrote:
                  >CBFalconer wrote:
                  >>magicman wrote:
                  >>>
                  >>>can anyone point me out to its implementation in C before I roll
                  >>>my own.
                  >>>
                  >>Why? strtok() is part of the standard C library, and should always
                  >>be available.
                  >>
                  >It needn't be available on a freestanding implementation.
                  >
                  .... snip ...

                  In which case the OP would be well advised to look for or design
                  his own other routines. One available is tknsplit, as follows:

                  /* ------- file tknsplit.h ----------*/
                  #ifndef H_tknsplit_h
                  # define H_tknsplit_h

                  # ifdef __cplusplus
                  extern "C" {
                  # endif

                  #include <stddef.h>

                  /* copy over the next tkn from an input string, after
                  skipping leading blanks (or other whitespace?). The
                  tkn is terminated by the first appearance of tknchar,
                  or by the end of the source string.

                  The caller must supply sufficient space in tkn to
                  receive any tkn; otherwise tkns will be truncated.

                  Returns: a pointer past the terminating tknchar.

                  This will happily return an infinity of empty tkns if
                  called with src pointing to the end of a string. Tokens
                  will never include a copy of tknchar.

                  released to Public Domain, by C.B. Falconer.
                  Published 2006-02-20. Attribution appreciated.
                  revised 2007-05-26 (name)
                  */

                  const char *tknsplit(const char *src, /* Source of tkns */
                  char tknchar, /* tkn delimiting char */
                  char *tkn, /* receiver of parsed tkn */
                  size_t lgh); /* length tkn can receive */
                  /* not including final '\0' */

                  # ifdef __cplusplus
                  }
                  # endif
                  #endif
                  /* ------- end file tknsplit.h ----------*/

                  /* ------- file tknsplit.c ----------*/
                  #include "tknsplit.h"

                  /* copy over the next tkn from an input string, after
                  skipping leading blanks (or other whitespace?). The
                  tkn is terminated by the first appearance of tknchar,
                  or by the end of the source string.

                  The caller must supply sufficient space in tkn to
                  receive any tkn; otherwise tkns will be truncated.

                  Returns: a pointer past the terminating tknchar.

                  This will happily return an infinity of empty tkns if
                  called with src pointing to the end of a string. Tokens
                  will never include a copy of tknchar.

                  A better name would be "strtkn", except that is reserved
                  for the system namespace. Change to that at your risk.

                  released to Public Domain, by C.B. Falconer.
                  Published 2006-02-20. Attribution appreciated.
                  Revised 2006-06-13 2007-05-26 (name)
                  */

                  const char *tknsplit(const char *src, /* Source of tkns */
                  char tknchar, /* tkn delimiting char */
                  char *tkn, /* receiver of parsed tkn */
                  size_t lgh) /* length tkn can receive */
                  /* not including final '\0' */
                  {
                      if (src) {
                          while (' ' == *src) src++;

                          while (*src && (tknchar != *src)) {
                              if (lgh) {
                                  *tkn++ = *src;
                                  --lgh;
                              }
                              src++;
                          }
                          if (*src && (tknchar == *src)) src++;
                      }
                      *tkn = '\0';
                      return src;
                  } /* tknsplit */

                  #ifdef TESTING
                  #include <stdio.h>

                  #define ABRsize 6 /* length of acceptable tkn abbreviations */

                  /* ---------------- */

                  static void showtkn(int i, char *tok)
                  {
                      putchar(i + '1'); putchar(':');
                      puts(tok);
                  } /* showtkn */

                  /* ---------------- */

                  int main(void)
                  {
                      char teststring[] = "This is a test, ,, abbrev, more";

                      const char *t, *s = teststring;
                      int i;
                      char tkn[ABRsize + 1];

                      puts(teststring);
                      t = s;
                      for (i = 0; i < 4; i++) {
                          t = tknsplit(t, ',', tkn, ABRsize);
                          showtkn(i, tkn);
                      }

                      puts("\nHow to detect 'no more tkns' while truncating");
                      t = s; i = 0;
                      while (*t) {
                          t = tknsplit(t, ',', tkn, 3);
                          showtkn(i, tkn);
                          i++;
                      }

                      puts("\nUsing blanks as tkn delimiters");
                      t = s; i = 0;
                      while (*t) {
                          t = tknsplit(t, ' ', tkn, ABRsize);
                          showtkn(i, tkn);
                          i++;
                      }
                      return 0;
                  } /* main */

                  #endif
                  /* ------- end file tknsplit.c ----------*/

                  --
                  [mail]: Chuck F (cbfalconer at maineline dot net)
                  [page]: <http://cbfalconer.home.att.net>
                  Try the download section.



                  ** Posted from http://www.teranews.com **


                  • Chris Torek

                    #10
                    Re: looking for implementation of strtok

                    In article <fugjau02jal@news3.newsguy.com> I wrote, in part,
                    a "simplified implementation" of my old strtok, but in the
                    process put in a major bug:
                    >char *strtok(char *restrict s, const char *restrict delim) {
                    >    static char *last;
                    >    char *restrict tok;
                    >
                    >    /* pick up from previous stopping point, if told to do so */
                    >    if (s == NULL && (s = last) == NULL)
                    >        return NULL;
                    >
                    >    /* skip initial delimiters to find start of token */
                    >    tok = s + strspn(s, delim);
                    >
                    >    /* skip over non-delimiters to find end of token */
                    >    s = tok + strcspn(tok, delim);
                    >
                    >    /*
                    >     * Save state for next call, and return token. If there
                    >     * is no token, both *tok and *s will be '\0'. If there
                    >     * is one token that is ended with '\0', *s will be '\0'.
                    >     * (If there are trailing delimiters, we will skip them
                    >     * later.)
                    >     */
                    >    last = *s == '\0' ? NULL : s + 1;
                    >    return *tok == '\0' ? NULL : tok;
                    >}

                    I forgot to '\0'-terminate the (nonempty case) token! The assignment
                    to "last" is too simple and should be expanded out to:

                    if (*s == '\0')
                        last = NULL;
                    else {
                        *s++ = '\0';
                        last = s;
                    }

                    Writing this without an "if" is possible, but I think ugly:

                    last = *s == '\0' ? NULL : (*s++ = '\0', s);
                    --
                    In-Real-Life: Chris Torek, Wind River Systems
                    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
                    email: gmail (figure it out) http://web.torek.net/torek/index.html


                    • Nick Keighley

                      #11
                      Re: looking for implementation of strtok

                      On 20 Apr, 00:17, CBFalconer <cbfalco...@yahoo.com> wrote:
                      >magicman wrote:
                      >>can anyone point me out to [strtok()]s implementation in C before I roll
                      >>my own.
                      >
                      >Why?  strtok() is part of the standard C library, and should always
                      >be available.

                      Just because it "should" be available doesn't mean it is.
                      I've worked with an incomplete C library and ended up implementing
                      a few standard functions.

                      Embedded systems are particularly prone to this. I think someone
                      told me he'd worked on a system that didn't implement things
                      like memset().

                      I've also come across broken implementations where the function
                      prototypes didn't quite match the standard.
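When a library does lack something like memset(), the fallback is usually short. A minimal sketch, assuming you give it a different name (my_memset here is hypothetical) so it does not collide with the reserved library identifier:

```c
#include <stddef.h>

/* Hypothetical fallback with memset()'s contract: fill the first n bytes
   of s with (unsigned char)c and return s. */
void *my_memset(void *s, int c, size_t n)
{
    unsigned char *p = s;

    while (n-- > 0)
        *p++ = (unsigned char)c;
    return s;
}
```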


                      --
                      Nick Keighley

                      A lot of the c.l.c. verbiage seems to be devoted to the numerical
                      density of cavorting nubile seraphim upon pinheads.
                      CBFalconer


                      • Richard

                        #12
                        Re: looking for implementation of strtok

                        Nick Keighley <nick_keighley_nospam@hotmail.com> writes:
                        >On 20 Apr, 00:17, CBFalconer <cbfalco...@yahoo.com> wrote:
                        >>magicman wrote:
                        >>
                        >>>can anyone point me out to [strtok()]s implementation in C before I roll
                        >>>my own.
                        >>>
                        >>Why?  strtok() is part of the standard C library, and should always
                        >>be available.
                        >>
                        >just because it "should" be available doesn't mean it is.
                        >I've worked with an incomplete C library and ended up implementing
                        >a few standard function.
                        >
                        >Embedded systems are particularly prone to this. I think someone
                        >told me he'd worked on a system that didn't implement things
                        >like memset().

                        Er, wouldn't you be the first to state that these systems are off topic
                        and you should go down the corridor and take the second right to the
                        newsgroup dealing with your incomplete implementation? Good to see a
                        little more openness.

                        >I've also come across broken implementations where the function
                        >prototypes didn't quite match the standard.

                        That's nice.

