Help with tokenizing

  • Acolyte
    New Member
    • Jan 2007
    • 20

    Ok, the program I'm working on now involves taking an input string and tokenizing it by separating it at the spaces. Here's what I've got:
    Code:
    #include <stdio.h>
    #include <string.h>
    
    int main (void)
    {
            char *tokenarray[100];
            char input[200] = {"Test , Test2"};
            char *tokenPtr;
            int cnt = 0;
    
            tokenPtr = strtok(input, " ");
            while (tokenPtr != NULL)
            {
                    tokenPtr = strtok(NULL, " ");
                    tokenarray[cnt] = tokenPtr;
                    cnt ++;
            }
    
            for(cnt = 0; cnt < 100; cnt ++);
            {
                    printf("%s\n", tokenarray[cnt]);
            }
    }
    The problem is that whenever I try to run it, I get a segmentation fault. Anybody see what I'm missing here?
  • Ganon11
    Recognized Expert Specialist
    • Oct 2006
    • 3651

    #2
    Can you explain what strtok(NULL, " ") does briefly? I haven't used this function before, but if I understand what you're trying to do, I can probably help.

    • Acolyte
      New Member
      • Jan 2007
      • 20

      #3
      NVM, I solved the initial problem, but I have a new one, which will be posted in a second.

      Anyways, strtok(NULL, " ") takes a string and separates it based on where the spaces are. So, if the input was char test[] = {"These are a bunch of words."}, it would separate it into "These", "are", "a", "bunch", "of", "words."

      Anyways, new problem:
      Code:
      #include <stdio.h>
      #include <string.h>
      #include <ctype.h>
      #include <stdlib.h>
      
      int main (void)
      {
              char *tokenarray[100];
              char input[] = {"Test, 1"};
              char *tokenPtr;
              int cnt = 0, c;
              int i;
      
              tokenPtr = strtok(input, " ");
              while (tokenPtr != NULL)
              {
                      tokenarray[cnt] = tokenPtr;
                      tokenPtr = strtok(NULL, " ");
                      cnt ++;
              }
      
              if (isdigit(tokenarray[1]) != 0)
                      {
                      i = atoi(tokenarray[1]);
                      }
      
              for(c = 0; c < cnt; c++)
              {
                      printf("%s\n", tokenarray[c]);
              }
      }
      What this version is trying to do is locate tokenized entries that consist of numbers and convert them to int. However, I get a segmentation fault whenever I try. Anybody see what I'm missing?

      • AdrianH
        Recognized Expert Top Contributor
        • Feb 2007
        • 1251

        #4
        I see a few problems here.
        • Your loop should make sure it doesn’t write past the end of tokenarray. Given your current input it doesn’t matter, but it will later on.
        • isdigit takes a single character value (passed as an int), not a pointer. You will get an undefined result unless you dereference tokenarray[1], using either the * operator (*tokenarray[1]) or a second [] operator (tokenarray[1][0]). The way you have it now will most likely result in false (0).
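
The two points above can be sketched roughly as follows. This is only a minimal sketch under a few assumptions: the names MAX_TOKENS, tokenize_bounded, and starts_with_digit are hypothetical and do not come from the thread, and the cast to unsigned char is an added precaution, since the <ctype.h> functions expect a value representable as unsigned char (or EOF).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 100  /* hypothetical capacity, mirroring tokenarray[100] */

/* Split `input` on spaces without ever writing past tokens[MAX_TOKENS - 1].
 * Returns the number of tokens stored. strtok modifies `input` in place. */
size_t tokenize_bounded(char *input, char *tokens[MAX_TOKENS])
{
    size_t cnt = 0;
    char *tok = strtok(input, " ");

    while (tok != NULL && cnt < MAX_TOKENS)   /* bound check added */
    {
        tokens[cnt++] = tok;
        tok = strtok(NULL, " ");
    }
    return cnt;
}

/* Test the first character of the token, not the pointer itself. */
int starts_with_digit(const char *token)
{
    return isdigit((unsigned char)token[0]) != 0;
}
```

With the input "Test, 1" this stores two tokens, "Test," and "1", and only the second one starts with a digit. Checking the returned count before indexing element 1 also guards against inputs that contain fewer than two tokens.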

        However, neither of the problems I pointed out appears to cause a segfault.

        If you have access to gdb (gnu debugger) on the system you are working on, try looking at this post. It briefly describes how to use it to debug your programme.


        Adrian

        • iknc4miles
          New Member
          • Oct 2006
          • 32

          #5
          Originally posted by Acolyte
          Code:
                  if (isdigit(tokenarray[1]) != 0)
                          {
                          i = atoi(tokenarray[1]);
                          }
          
          }
          What this version is trying to do is locate tokenized entries that consist of numbers and convert them to int. However, I get a segmentation fault whenever I try. Anybody see what I'm missing?
          At first sight, isdigit() takes a single character as an integer value. You're trying to feed it a char* instead of a char. Not sure if this would be the cause of your seg-fault, but it's a start. atoi is used correctly.

          If you're going to use isdigit, you have to check one character at a time. You might even need to cast each char as an int (see below).
          Code:
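          int k = 0;
          int all_digits = 1;
          /* Check one character at a time, stopping at the terminating '\0'. */
          while (tokenarray[1][k] != '\0')
          {
              if (!isdigit((int)tokenarray[1][k]))
                  all_digits = 0;   /* found a non-digit character */
              k++;
          }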
          This code is just an example and not a substitute solution to your code.
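
The character-by-character check can also be wrapped in a small helper. The following is a sketch only: is_all_digits is a hypothetical name, it deliberately rejects signs and empty strings, and it casts to unsigned char rather than int because isdigit is undefined for negative values other than EOF.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `s` is non-empty and every character
 * is a decimal digit, 0 otherwise. */
int is_all_digits(const char *s)
{
    if (s == NULL || *s == '\0')
        return 0;                       /* nothing to check */

    for (; *s != '\0'; s++)
    {
        /* Cast to unsigned char so that characters with a negative
         * char value are passed to isdigit safely. */
        if (!isdigit((unsigned char)*s))
            return 0;
    }
    return 1;
}
```

A token such as "1" passes this check and can then be handed to atoi, while "Test," or "12a" is rejected.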

          - iknc4miles

          • iknc4miles
            New Member
            • Oct 2006
            • 32

            #6
            Originally posted by Ganon11
            Can you explain what strtok(NULL, " ") does briefly? I haven't used this function before, but if I understand what you're trying to do, I can probably help.
            My guess is you are confused about the NULL as the first argument?

            As far as I've ever used it, the first use of strtok needs to specify the character array to be tokenized. After that, as long as you are continuing with the same character array you would call strtok(NULL, (delimiter char)) to get the next token in that array. When strtok returns a NULL value, it has reached the end of the string.
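
The first-call and later-call pattern described above might look like this. It is a minimal sketch with a hypothetical helper name, count_tokens. Two details are worth keeping in mind: strtok overwrites each delimiter it finds with '\0', so the buffer must be writable, and a run of consecutive delimiters counts as a single separator. Because strtok keeps its position in internal static state, two tokenizations cannot be interleaved.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the tokens in `buf` separated by any
 * character in `delims`. `buf` is modified in place. */
size_t count_tokens(char *buf, const char *delims)
{
    size_t n = 0;
    char *tok;

    /* The first call names the buffer; every later call passes NULL so
     * that strtok continues from where it stopped. */
    for (tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;

    return n;
}
```

After the call, the original buffer no longer reads as one string, because its first delimiter has been replaced by a terminator.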

            - iknc4miles

            • AdrianH
              Recognized Expert Top Contributor
              • Feb 2007
              • 1251

              #7
              Originally posted by iknc4miles
              My guess is you are confused about the NULL as the first argument?

              As far as I've ever used it, the first use of strtok needs to specify the character array to be tokenized. After that, as long as you are continuing with the same character array you would call strtok(NULL, (delimiter char)) to get the next token in that array. When strtok returns a NULL value, it has reached the end of the string.

              - iknc4miles
              That is what Acolyte is doing, but s/he is forgetting something, and I see it now. You can see it too if you init tokenarray like so:
              Code:
                      char *tokenarray[100] = {0};
              That will init all elements in the array to NULL.

              You should see (null) come up without your programme crashing. Think about the while loop a bit more and you should see the problem.
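
As a rough sketch of why the initialization helps: once every unused slot is NULL, a print loop can stop at the first empty slot instead of reading garbage pointers. The helper name print_tokens is hypothetical. Note that passing NULL to printf with %s is undefined behavior in standard C; some C libraries, glibc among them, happen to print (null), so stopping before the NULL is the safer option.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: print tokens until the first NULL slot or until
 * `capacity` entries have been examined. Returns the number printed. */
size_t print_tokens(char *const tokens[], size_t capacity)
{
    size_t i;

    for (i = 0; i < capacity && tokens[i] != NULL; i++)
        printf("%s\n", tokens[i]);

    return i;
}
```

Called on an array declared with = {0} and filled by the tokenizing loop, this prints only the slots that were actually assigned.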

              Come back if you still have problems.


              Adrian

              P.S. You don’t have to cast a char to an int. Since no precision is lost, the conversion happens automatically.

              • horace1
                Recognized Expert Top Contributor
                • Nov 2006
                • 1510

                #8
                this
                Code:
                        if (isdigit(tokenarray[1]) != 0)
                                {
                gives me a segmentation fault and is fixed by dereferencing (as indicated by previous contributors)
                Code:
                        if (isdigit(*tokenarray[1]) != 0)
                               {
                however, rather than having to check every character using isdigit() and then use atoi() you could use sscanf(), e.g.
                Code:
                        if(sscanf(tokenarray[1],"%d", &i) != 1)
                          printf("error %s is not integer\n", tokenarray[1]);
                        else
                          printf("integer %d\n", i);
                which attempts to convert an integer (%d) from tokenarray[1] (a char *). If successful it returns 1; any other return value means the conversion failed and the string does not begin with a valid integer.
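
One caveat with "%d" is that sscanf stops at the first character that cannot be part of the number, so a token like "12abc" still returns 1. If the whole token has to be numeric, strtol with an end pointer is one alternative. The sketch below is not from the thread; parse_int is a hypothetical helper name.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse all of `s` as a base-10 int.
 * Returns 1 and stores the value in *out on success, 0 otherwise. */
int parse_int(const char *s, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(s, &end, 10);

    if (end == s)                 /* no digits were consumed */
        return 0;
    if (*end != '\0')             /* trailing characters such as "12abc" */
        return 0;
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                 /* out of range for int */

    *out = (int)value;
    return 1;
}
```

For the tokens in this thread, parse_int accepts "1" and rejects "Test,", and it leaves *out untouched when it fails.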

                • andersod
                  New Member
                  • Mar 2007
                  • 9

                  #9
                  hi,
                  try the following:

                  change if (isdigit(tokenarray[1]) != 0)
                  to if (isdigit(*tokenarray[1]) != 0)

                  regards,
                  andersod
