C++: Tokenizing strings into vectors when there are no values between delimiters

  • m6s
    New Member
    • Aug 2007
    • 55

    1. After hours of research, I used this snippet:
    Code:
    void Object::TokenizeLines(const string& str, vector<string>& tokens, const string& delimiters)
    {
    	// Skip delimiters at beginning.
    	string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    	// Find the first delimiter after that (the end of the first token).
    	string::size_type pos     = str.find_first_of(delimiters, lastPos);
    
    	while (string::npos != pos || string::npos != lastPos) {
    		// Found a token, add it to the vector.
    		// CheckWord is the poster's own helper (not shown); it receives the
    		// same span that str.substr(lastPos, pos - lastPos) would extract.
    		tokens.push_back( CheckWord( str, lastPos, pos - lastPos));
    		// Skip delimiters.  Note the "not_of"
    		lastPos = str.find_first_not_of(delimiters, pos);
    		// Find the next delimiter (the end of the next token).
    		pos = str.find_first_of(delimiters, lastPos);
    	}
    }
    for tokenizing large blocks of text into lines,
    with a call like this: TokenizeLines(str, tokens, "\n");
    Because each line contained things like abc,de,,,f,g,,h,
    the previous code was faulty for that input (it skipped the empty fields). So I found this one:
    Code:
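    #include <algorithm>   // find, min
    #include <string>
    #include <vector>
    using namespace std;
    
    // Split str at every comma, keeping empty fields: "a,,b" -> "a", "", "b".
    void TokenizeWithComma(const string& str, vector<string>& tokens){
    	const char* first = str.c_str();
    	const char* last = first + str.size();   // points at the terminating '\0'
    	while (first != last) {
    		// Next comma, or last if there are no more commas.
    		const char* next = find(first, last, ',');
    		tokens.push_back(string(first, next - first));
    		// Step past the comma, but never beyond the end of the data.
    		first = min(next + 1, last);
    	}
    }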
    and I call it after TokenizeLines, passing each line and a vector, to split the line into words. Both worked.
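To see why the first function drops empty fields, here is a minimal stand-alone sketch of the same loop. It is a rewrite under assumptions, not the poster's code: the hypothetical name SplitSkippingEmpty returns the vector instead of filling a parameter, and str.substr() stands in for the unshown CheckWord() helper. Because find_first_not_of jumps over the whole run of delimiters, ",," and ",,," behave exactly like a single ",".

```cpp
#include <string>
#include <vector>

// Hypothetical stand-alone version of the TokenizeLines loop. str.substr()
// replaces the poster's CheckWord() helper, and the tokens are returned
// instead of being appended to a parameter.
std::vector<std::string> SplitSkippingEmpty(const std::string& str,
                                            const std::string& delimiters) {
    std::vector<std::string> tokens;
    // Start of the first token: the first character that is NOT a delimiter.
    std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // End of the first token: the first delimiter at or after lastPos.
    std::string::size_type pos = str.find_first_of(delimiters, lastPos);

    while (pos != std::string::npos || lastPos != std::string::npos) {
        // substr clamps the length, so pos == npos takes the rest of str.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Jump over the WHOLE run of delimiters: this is why ",," or ",,,"
        // never yields an empty token.
        lastPos = str.find_first_not_of(delimiters, pos);
        pos = str.find_first_of(delimiters, lastPos);
    }
    return tokens;
}
```

With "," as the delimiter, "abc,de,,,f,g,,h" gives abc, de, f, g, h: five tokens, with the empty fields gone. TokenizeWithComma, by contrast, looks for one comma at a time, so every pair of adjacent commas produces an empty string.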

    After spending too many hours on this today, I have my homework done, but I am not sure what I did here...
    Can someone (less of a novice than me) give me a more detailed explanation?

    Also:
    2. Why doesn't the first function work with strings like a,bc,,d?
    3. I also tried this for each word (the words are in a vector):
    Code:
    for (vector<string>::iterator w_iter = token_lines.begin(); w_iter != token_lines.end(); ++w_iter) {
    	string ff = (*w_iter);   // copy the current element into a string
    	string::size_type loc = ff.find( "abc", 0 );
    	if( loc != string::npos ) { cout << "Found abc at " << loc << endl;}
    	else {cout << "Didn't find abc" << endl;}
    }
    But while the code works with a plain string ff("abc,ccc,cc"), it does not seem to work with the declaration I had. Is the iterator at fault?
    This drove me nuts, and to find patterns I ended up dividing the string with substr (again a string member function) and adding special cases so that substr is never called with out-of-range positions.
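The iterator itself should not be the problem: dereferencing a vector<string>::iterator yields a std::string, so the copy into ff is an ordinary string object. A minimal sketch of the same search, assuming the goal is to report which elements contain the pattern (LinesContaining is a hypothetical name):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return the index of every element of token_lines that
// contains pattern. *w_iter is a std::string, so find() works on it directly.
std::vector<std::size_t> LinesContaining(const std::vector<std::string>& token_lines,
                                         const std::string& pattern) {
    std::vector<std::size_t> hits;
    std::size_t index = 0;
    for (std::vector<std::string>::const_iterator w_iter = token_lines.begin();
         w_iter != token_lines.end(); ++w_iter, ++index) {
        const std::string& ff = *w_iter;   // a reference avoids the copy
        std::string::size_type loc = ff.find(pattern, 0);
        if (loc != std::string::npos) {
            hits.push_back(index);
        }
    }
    return hits;
}
```

For example, with the elements "xabc,ccc,cc", "def" and "abc", searching for "abc" returns indices 0 and 2. If a search like this finds nothing, printing each element first is a quick way to check what the vector actually holds.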

    I know this might be a boring topic for most, but I appreciate your help...
    Thank you
  • weaknessforcats
    Recognized Expert Expert
    • Mar 2007
    • 9214

    #2
    Originally posted by m6s
    But while the code works with a plain string ff("abc,ccc,cc"),
    string ff("abc,ccc,cc") creates a string object.

    Your function takes a string& argument, so you can pass that string object as an argument.

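    This ("abc,ccc,cc") is a C-string. A C-string cannot be bound to a non-const string& because it's not a string object. (It can be passed to a const string&, as in your functions, because the compiler then builds a temporary string from it.)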

    Finally, I like your last solution. You should never use the string::c_str() method unless a function absolutely requires a C-string. Considering that C++ provides std::string to replace the C string library, there should be little call for this.
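A minimal sketch of the two kinds of reference parameter, using two hypothetical helpers: CountCommas takes a const std::string& and accepts a literal directly, while AppendComma takes a non-const std::string& and needs a real string object.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: a const std::string& parameter accepts a string
// literal, because the compiler creates a temporary std::string for the call.
std::size_t CountCommas(const std::string& s) {
    std::size_t n = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] == ',') ++n;
    }
    return n;
}

// Hypothetical helper: a non-const std::string& parameter only binds to an
// existing string object, so AppendComma("abc") would not compile.
void AppendComma(std::string& s) {
    s += ',';
}
```

CountCommas("abc,ccc,cc") compiles and returns 2, while AppendComma has to be called with a named string such as ff.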

    • m6s
      New Member
      • Aug 2007
      • 55

      #3
      Thank you for your answer, and for the kind words :-)
      I had even tried string temp = *iter; but that didn't work either. I assumed this would copy the element into a string object that could then be tokenized,
      since temp was then passed to the function. But it didn't work.
      Finally, how could I write TokenizeWithComma without c_str()?
      Should I build the whole string as char arrays? Or is there another option that stays as close to C++ as possible?

      • weaknessforcats
        Recognized Expert Expert
        • Mar 2007
        • 9214

        #4
        Why can't you use find() to locate the next comma?
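A minimal sketch of that suggestion, assuming the goal is to keep every field, empty ones included. It uses std::string::find and substr instead of c_str() and pointer arithmetic; SplitOnComma is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Hypothetical c_str()-free comma splitter: std::string::find locates each
// comma and substr copies the field before it. Empty fields are kept,
// including one after a trailing comma: "a,,b," -> "a", "", "b", "".
std::vector<std::string> SplitOnComma(const std::string& str) {
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type comma = str.find(',', start);
        if (comma == std::string::npos) {
            tokens.push_back(str.substr(start));   // last field (may be empty)
            break;
        }
        tokens.push_back(str.substr(start, comma - start));
        start = comma + 1;                          // continue after the comma
    }
    return tokens;
}
```

On "abc,de,,,f,g,,h" this produces abc, de, "", "", f, g, "", h. Unlike TokenizeWithComma, it also keeps the empty field after a trailing comma and returns one empty token for an empty input; which behavior is right depends on the data format.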

        • m6s
          New Member
          • Aug 2007
          • 55

          #5
          :-) I don't understand it either!!!
          I have a 64-bit machine; can that be the problem?
          Try it yourself if you like and find it interesting...
          I know it didn't work for me.

          In other words, can you fill a vector with lines like a,b,c,,,d,f,g and so on (even better if they come from a file) and iterate over it? (Did you notice the three commas, i.e. two empty fields?)

          Then use only C++ code, not C: I mean the first Tokenize function.
          What's your result? Oh, and assign each element (*iter) to a string and then tokenize it.
          For me it was a disaster...
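One way to run that experiment with only standard C++, assuming the input is line-oriented with comma-separated fields: read each line with std::getline, then split it with std::getline on a std::istringstream using ',' as the delimiter. This is a swapped-in technique rather than either function from the thread, and ReadCommaLines is a hypothetical name.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical reader: one inner vector per input line, one string per field.
// Empty fields between commas are kept as "". Caveat: std::getline does not
// report an empty field AFTER a final comma ("a," gives just "a").
std::vector<std::vector<std::string> > ReadCommaLines(std::istream& in) {
    std::vector<std::vector<std::string> > rows;
    std::string line;
    while (std::getline(in, line)) {              // read one line at a time
        std::vector<std::string> fields;
        std::istringstream fieldStream(line);
        std::string field;
        while (std::getline(fieldStream, field, ',')) {   // split at commas
            fields.push_back(field);
        }
        rows.push_back(fields);
    }
    return rows;
}
```

A std::ifstream opened on the data file can be passed in the same way. For the line a,b,c,,,d,f,g this yields eight fields, with "" at positions 3 and 4.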

          • weaknessforcats
            Recognized Expert Expert
            • Mar 2007
            • 9214

            #6
            Originally posted by m6s
            In other words, can you fill a vector with lines like a,b,c,,,d,f,g and so on (even better if they come from a file) and iterate over it? (Did you notice the three commas, i.e. two empty fields?)
            It's not the string that's your problem. It's your parsing logic.

            There is an article in the C/C++ HowTos on the State design pattern, and inside that article is a tokenizer that breaks a string into individual words, complete with code. You might read that article.
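The article's code is not reproduced here. As a rough sketch of the idea only, the tokenizer below is a plain enum-based state machine with two states, BetweenWords and InWord, rather than the State pattern's class hierarchy; SplitWords is a hypothetical name, and treating any non-alphanumeric character as a separator is an assumption.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical two-state tokenizer. The state records whether we are between
// words or inside one; each character either keeps the state or triggers a
// transition. A word is a run of letters or digits (an assumption).
std::vector<std::string> SplitWords(const std::string& text) {
    enum State { BetweenWords, InWord };
    State state = BetweenWords;
    std::vector<std::string> words;
    std::string current;

    for (std::string::size_type i = 0; i < text.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        bool wordChar = std::isalnum(c) != 0;
        switch (state) {
        case BetweenWords:
            if (wordChar) {              // transition: a word starts here
                current.assign(1, text[i]);
                state = InWord;
            }
            break;
        case InWord:
            if (wordChar) {
                current += text[i];      // stay inside the word
            } else {                     // transition: the word just ended
                words.push_back(current);
                state = BetweenWords;
            }
            break;
        }
    }
    if (state == InWord) {               // flush a word that runs to the end
        words.push_back(current);
    }
    return words;
}
```

For "a,b,c,,,d f" it returns a, b, c, d, f: separators only cause a transition out of InWord, so a run of commas produces no empty words, much like TokenizeLines.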

            • m6s
              New Member
              • Aug 2007
              • 55

              #7
              OK, thank you for your close support on this. I am going to check out this article too...
