Comment on trim string function please

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • swengineer001@gmail.com

    Comment on trim string function please

    Just looking for a few eyes on this code other than my own.

    void TrimCString(cha r *str)
    {
    // Trim whitespace from beginning:
    size_t i = 0;
    size_t j;

    while(isspace(s tr[i]))
    {
    i++;
    }
    if(i 0)
    {
    for(j = 0; i < strlen(str); j++, i++)
    {
    str[j] = str[i];
    }
    str[j] = '\0';
    }

    // Trim whitespace from end:
    i = strlen(str) - 1;

    while(isspace(s tr[i]))
    {
    i--;
    }
    if(i < (strlen(str) - 1))
    {
    str[i + 1] = '\0';
    }
    }
  • Jens Thoms Toerring

    #2
    Re: Comment on trim string function please

    swengineer001@g mail.com <swengineer001@ gmail.comwrote:
    Just looking for a few eyes on this code other than my own.
    void TrimCString(cha r *str)
    {
    // Trim whitespace from beginning:
    I guess I would trim from the end first since then you have less
    copying to do afterwards.
    size_t i = 0;
    size_t j;
    while(isspace(s tr[i]))
    isspace() expects an int and not a char as it's argument.
    {
    i++;
    }
    if(i 0)
    {
    for(j = 0; i < strlen(str); j++, i++)
    {
    str[j] = str[i];
    }
    str[j] = '\0';
    }
    An alternative would be to use memmove() here, so you don't
    have to do it byte by byte. Also callling strlen() each time
    through the loop is a bit of a waste of time - it doesn't
    change and can be replaced by a check if str[i] is '\0'.
    // Trim whitespace from end:
    i = strlen(str) - 1;
    Careful: This could set 'i' to -1 (if the string consistet of white
    space only) and then the rest won't work anymore.
    while(isspace(s tr[i]))
    {
    i--;
    }
    if(i < (strlen(str) - 1))
    {
    str[i + 1] = '\0';
    }
    }
    Here's an alternative version using pointers (and trying to
    minimize the number of calls of strlen()):

    void
    TrimCString( char *str )
    {
    char *p,
    *q;

    /* Check that we've got something that looks like a string */

    if ( ! str || ! * str )
    return;

    /* Trim from end */

    for ( p = str + strlen( str ) - 1; p != str && isspace( ( int ) *p ); p-- )
    /* empty */ ;

    if ( p == str ) /* only white space in string */
    {
    *str = '\0';
    return;
    }

    *++p = '\0';

    /* Trim from start */

    for ( q = str; isspace( ( int ) *q ); q++ )
    /* empty */ ;

    if ( q != str )
    memmove( str, q, p - q + 1 );
    }
    Regards, Jens
    --
    \ Jens Thoms Toerring ___ jt@toerring.de
    \______________ ____________ http://toerring.de

    Comment

    • badc0de4@gmail.com

      #3
      Re: Comment on trim string function please

      swengineer001@g mail.com wrote:
      Just looking for a few eyes on this code other than my own.
      >
      void TrimCString(cha r *str)
      Why not return a char *, like most other string functions?

      char *TrimCString(ch ar *str)

      using char * enables you to piggyback your function in the
      middle of other functions, eg
      printf("%s\n", TrimCString(som eString));
      {
      // Trim whitespace from beginning:
      size_t i = 0;
      size_t j;
      >
      while(isspace(s tr[i]))
      {
      i++;
      }
      if(i 0)
      {
      for(j = 0; i < strlen(str); j++, i++)
      move the strlen() outside the loop.
      Maybe use memmove() instead of the loop.
      {
      str[j] = str[i];
      }
      str[j] = '\0';
      }
      >
      // Trim whitespace from end:
      i = strlen(str) - 1;
      Use the strlen you computed before, when
      you moved it out of the for loop above :)
      >
      while(isspace(s tr[i]))
      {
      i--;
      }
      if(i < (strlen(str) - 1))
      {
      str[i + 1] = '\0';
      No need for the test.
      when i >= (strlen(str) - 1) -- it can only be equal, anyway -- the
      assignment overwrites a '\0' with a brand new '\0'.
      Anyway, if you want to keep the test, use the computed strlen.
      }
      }

      A couple what-if's

      * what if a pass NULL to the function?
      TrimCString(NUL L);

      * what if I pass a constant string literal to the function?
      TrimCString(" 4 spaces at both ends ");

      Comment

      • Eric Sosman

        #4
        Re: Comment on trim string function please

        swengineer001@g mail.com wrote:
        Just looking for a few eyes on this code other than my own.
        This pair of eyes sees three bugs, two occurring twice each
        and the other perhaps due to too much snippage. There are also
        some opportunities to improve speed and style. So, here we go:

        Bug: Since you're using isspace() and strlen(), you need to
        #include <ctype.hand <string.hto get their declarations.
        Without the declarations, a compiler operating under C99 rules
        must generate a diagnostic. Under C89 rules the code will work,
        but might not be as fast as if the vendor's "magic" declarations
        were present.
        void TrimCString(cha r *str)
        {
        // Trim whitespace from beginning:
        size_t i = 0;
        Style: Instead of initializing `i' at its declaration and then
        relying on the initial value later on, consider initializing it
        closer to its use. The `for' statement is convenient for such
        purposes.
        size_t j;
        >
        while(isspace(s tr[i]))
        Bug: If `str' contains negative-valued characters, this use
        may violate the "contract" of isspace() by passing an out-of-range
        argument. Use `while (isspace( (unsigned char)str[i] ))'. (This
        is one of those occasions where a cast is almost always required
        and almost always omitted, as opposed to the more usual situation
        where a cast is almost always inserted and almost always wrong.)
        {
        i++;
        }
        if(i 0)
        {
        for(j = 0; i < strlen(str); j++, i++)
        Speed: This loop calculates strlen(str) on every iteration.
        Since it will return the same value each time, all calls after the
        first are wasted effort. Call strlen() once before the loop and
        store the result in a variable, and use that variable to control
        the loop.
        {
        str[j] = str[i];
        }
        str[j] = '\0';
        }
        >
        // Trim whitespace from end:
        i = strlen(str) - 1;
        Bug: If `str' is the empty string (either because it was
        empty to begin with or because it contained only white space),
        this calculation will produce a very large `i' that is almost
        assuredly wrong, R-O-N-G.
        while(isspace(s tr[i]))
        Bug: Same missing cast as above.
        {
        i--;
        }
        if(i < (strlen(str) - 1))
        Bug: Same mishandling of the empty string as above.

        Speed: strlen(str) is still the same as it was a few lines
        ago, so there's no point in computing it again.
        {
        str[i + 1] = '\0';
        Speed: It's probably quicker -- and certainly less verbose --
        to do this assignment unconditionally than to test whether it's
        needed. If it isn't, you'll just store a zero on top of an
        existing zero, which is harmless.
        }
        }
        Summary: Not quite ready for prime time, but not as bad as
        some attempts I've seen.

        Challenge: See if you can think of a way to do the job in
        just one pass over the string (calling strlen() counts as a
        "pass"). Hint: During the copy-to-front operation, can you
        somehow figure out where the final '\0' should land without
        going back and inspecting the moved characters a second time?

        --
        Eric.Sosman@sun .com

        Comment

        • swengineer001@gmail.com

          #5
          Re: Comment on trim string function please

          On Jul 10, 1:32 pm, j...@toerring.d e (Jens Thoms Toerring) wrote:
          swengineer...@g mail.com <swengineer...@ gmail.comwrote:
          Just looking for a few eyes on this code other than my own.
          void TrimCString(cha r *str)
          {
                  // Trim whitespace from beginning:
          >
          I guess I would trim from the end first since then you have less
          copying to do afterwards.
          >
                  size_t i = 0;
                  size_t j;
                  while(isspace(s tr[i]))
          >
          isspace() expects an int and not a char as it's argument.
          Doesn't this promotion happen implicitly and without loss of
          information since I am actually dealing with characters and not using
          the char as a holder for small integers?
          >
                  {
                          i++;
                  }
                  if(i 0)
                  {
                          for(j = 0; i < strlen(str); j++, i++)
                      {
                                  str[j] = str[i];
                          }
                          str[j] = '\0';
                  }
          >
          An alternative would be to use memmove() here, so you don't
          have to do it byte by byte. Also callling strlen() each time
          through the loop is a bit of a waste of time - it doesn't
          change and can be replaced by a check if str[i] is '\0'.
          >
                  // Trim whitespace from end:
                  i = strlen(str) - 1;
          >
          Careful: This could set 'i' to -1 (if the string consistet of white
          space only) and then the rest won't work anymore.
          i is of type size_t which I believe can't be negative?
          >
                  while(isspace(s tr[i]))
                  {
                          i--;
                  }
                  if(i < (strlen(str) - 1))
                  {
                          str[i + 1] = '\0';
                  }
          }
          >
          Here's an alternative version using pointers (and trying to
          minimize the number of calls of strlen()):
          >
          void
          TrimCString( char *str )
          {
              char *p,
                       *q;
          >
              /* Check that we've got something that looks like a string */
          >
                  if ( ! str || ! * str )
                          return;
          >
                  /* Trim from end */
          >
              for ( p = str + strlen( str ) - 1; p != str && isspace( ( int) *p ); p-- )
                      /* empty */ ;
          >
                  if ( p == str )      /* only white space in string */
                  {
                      *str = '\0';
                          return;
              }
          >
                  *++p = '\0';
          >
              /* Trim from start */
          >
                  for ( q = str; isspace( ( int ) *q ); q++ )
                  /* empty */ ;
          >
              if ( q != str )
                  memmove( str, q, p - q + 1 );}
          Thanks for the code and the comments. This is useful.

          Comment

          • swengineer001@gmail.com

            #6
            Re: Comment on trim string function please

            On Jul 10, 1:39 pm, badc0...@gmail. com wrote:
            swengineer...@g mail.com wrote:
            Just looking for a few eyes on this code other than my own.
            >
            void TrimCString(cha r *str)
            >
            Why not return a char *, like most other string functions?
            >
            char *TrimCString(ch ar *str)
            >
                    using char * enables you to piggyback your function in the
                    middle of other functions, eg
                    printf("%s\n", TrimCString(som eString));
            Good point, I had not thought of it.
            >
            {
               // Trim whitespace from beginning:
               size_t i = 0;
               size_t j;
            >
               while(isspace(s tr[i]))
               {
                       i++;
               }
               if(i 0)
               {
                       for(j = 0; i < strlen(str); j++, i++)
            >
            move the strlen() outside the loop.
            Maybe use memmove() instead of the loop.
            >
                   {
                               str[j] = str[i];
                       }
                       str[j] = '\0';
               }
            >
               // Trim whitespace from end:
               i = strlen(str) - 1;
            >
            Use the strlen you computed before, when
            you moved it out of the for loop above :)
            the length of the string has changed since then, possibly.
            >
            >
            >
               while(isspace(s tr[i]))
               {
                       i--;
               }
               if(i < (strlen(str) - 1))
               {
                       str[i + 1] = '\0';
            >
            No need for the test.
            when i >= (strlen(str) - 1) -- it can only be equal, anyway -- the
            assignment overwrites a '\0' with a brand new '\0'.
            Anyway, if you want to keep the test, use the computed strlen.
            >
               }
            }
            >
            A couple what-if's
            >
            * what if a pass NULL to the function?
              TrimCString(NUL L);
            I added a check for this.
            >
            * what if I pass a constant string literal to the function?
              TrimCString("    4 spaces at both ends    ");
            How can I account for this?

            Comment

            • swengineer001@gmail.com

              #7
              Re: Comment on trim string function please

              On Jul 10, 1:41 pm, Eric Sosman <Eric.Sos...@su n.comwrote:
              swengineer...@g mail.com wrote:
              Just looking for a few eyes on this code other than my own.
              >
                   This pair of eyes sees three bugs, two occurring twice each
              and the other perhaps due to too much snippage.  There are also
              some opportunities to improve speed and style.  So, here we go:
              >
                   Bug: Since you're using isspace() and strlen(), you need to
              #include <ctype.hand <string.hto get their declarations.
              Without the declarations, a compiler operating under C99 rules
              must generate a diagnostic.  Under C89 rules the code will work,
              but might not be as fast as if the vendor's "magic" declarations
              were present.
              >
              void TrimCString(cha r *str)
              {
                 // Trim whitespace from beginning:
                 size_t i = 0;
              >
                   Style: Instead of initializing `i' at its declaration and then
              relying on the initial value later on, consider initializing it
              closer to its use.  The `for' statement is convenient for such
              purposes.
              >
                 size_t j;
              >
                 while(isspace(s tr[i]))
              >
                   Bug: If `str' contains negative-valued characters, this use
              may violate the "contract" of isspace() by passing an out-of-range
              argument.  Use `while (isspace( (unsigned char)str[i] ))'.  (This
              is one of those occasions where a cast is almost always required
              and almost always omitted, as opposed to the more usual situation
              where a cast is almost always inserted and almost always wrong.)
              >
                 {
                         i++;
                 }
                 if(i 0)
                 {
                         for(j = 0; i < strlen(str); j++, i++)
              >
                   Speed: This loop calculates strlen(str) on every iteration.
              Since it will return the same value each time, all calls after the
              first are wasted effort.  Call strlen() once before the loop and
              store the result in a variable, and use that variable to control
              the loop.
              >
                     {
                                 str[j] = str[i];
                         }
                         str[j] = '\0';
                 }
              >
                 // Trim whitespace from end:
                 i = strlen(str) - 1;
              >
                   Bug: If `str' is the empty string (either because it was
              empty to begin with or because it contained only white space),
              this calculation will produce a very large `i' that is almost
              assuredly wrong, R-O-N-G.
              >
                 while(isspace(s tr[i]))
              >
                   Bug: Same missing cast as above.
              >
                 {
                         i--;
                 }
                 if(i < (strlen(str) - 1))
              >
                   Bug: Same mishandling of the empty string as above.
              >
                   Speed: strlen(str) is still the same as it was a few lines
              ago, so there's no point in computing it again.
              >
                 {
                         str[i + 1] = '\0';
              >
                   Speed: It's probably quicker -- and certainly less verbose --
              to do this assignment unconditionally than to test whether it's
              needed.  If it isn't, you'll just store a zero on top of an
              existing zero, which is harmless.
              >
                 }
              }
              >
                   Summary: Not quite ready for prime time, but not as bad as
              some attempts I've seen.
              >
                   Challenge: See if you can think of a way to do the job in
              just one pass over the string (calling strlen() counts as a
              "pass").  Hint: During the copy-to-front operation, can you
              somehow figure out where the final '\0' should land without
              going back and inspecting the moved characters a second time?
              >
              --
              Eric.Sos...@sun .com
              Thanks for the input. Someone else had pointed out one of the issues
              you mentioned but I didn't understsnd what he was saying until I read
              your description.

              Comment

              • swengineer001@gmail.com

                #8
                Re: Comment on trim string function please

                Here is my second go at it after reading the comments provided.

                //This was in my original file but just not immediately before this
                function
                #include <ctype.h>
                #include <string.h>

                //Added char* return as suggested
                char* TrimCString(cha r *str)
                {
                // Trim whitespace from beginning:
                size_t i = 0;
                size_t j;

                //Added validation of the argument for NULL
                if(str == NULL)
                {
                return str;
                }

                //Added cast as suggested. I have always avoided casts for various
                reasons
                //discussed in the FAQ and elsewhere but it is good to know this
                case
                while(isspace(( unsigned char)str[i]))
                {
                i++;
                }
                if(i 0)
                {
                //Calculate length once
                size_t length = strlen(str);
                //Decided to leave the bytewise copy I just think I can
                understand it a little better
                for(j = 0; i < length; j++, i++)
                {
                str[j] = str[i];
                }
                str[j] = '\0';
                }

                // Trim whitespace from end:
                //Added check to catch the empty string
                i = (strlen(str) ? (strlen(str) - 1) : 0);

                while(isspace(( unsigned char)str[i]))
                {
                i--;
                }
                //Removed check that was not needed
                str[i + 1] = '\0';
                }

                Comment

                • badc0de4@gmail.com

                  #9
                  Re: Comment on trim string function please

                  swengineer001@g mail.com wrote:
                  On Jul 10, 1:39 pm, badc0...@gmail. com wrote:
                  swengineer...@g mail.com wrote:
                  Just looking for a few eyes on this code other than my own.
                  * what if I pass a constant string literal to the function?
                  TrimCString(" 4 spaces at both ends ");
                  How can I account for this?
                  Well ... you can't!
                  But you can do (at least) one of two things:

                  a) add a comment forbidding the caller to pass a string literal.
                  The comment goes near the function in the file with the code,
                  it also goes in the header file and in the documentation
                  provided for TrimCString()

                  b) Rewrite TrimCString to write the trimmed string to a
                  different memory area (variable, array, whatever), a bit
                  like strcpy() works

                  char *TrimCString(ch ar *dest, const char *source)
                  /* caller must ensure `dest` has enough space for
                  the trimmed string */
                  /* if source is NULL, trims `dest` in-place. `dest`
                  *MUST* be writable */

                  Comment

                  • Ben Bacarisse

                    #10
                    Re: Comment on trim string function please

                    Eric Sosman <Eric.Sosman@su n.comwrites:
                    swengineer001@g mail.com wrote:
                    >Just looking for a few eyes on this code other than my own.
                    <snip>
                    > size_t i = 0;
                    >
                    Style: Instead of initializing `i' at its declaration and then
                    relying on the initial value later on, consider initializing it
                    closer to its use. The `for' statement is convenient for such
                    purposes.
                    If you are suggesting replacing the while (isspace(str[i]) with a
                    for (int i = 0; isspace((unsign ed char)str[i]; i++); then this will not
                    allow i to tested outside the for:
                    > while(isspace(s tr[i]))
                    > {
                    > i++;
                    > }
                    > if(i 0)
                    here.

                    --
                    Ben.

                    Comment

                    • badc0de4@gmail.com

                      #11
                      Re: Comment on trim string function please

                      swengineer001@g mail.com wrote:
                      Here is my second go at it after reading the comments provided.
                      //Added char* return as suggested
                      char* TrimCString(cha r *str)
                      {
                      [...]
                      //Calculate length once
                      size_t length = strlen(str);
                      //Decided to leave the bytewise copy [...]
                      for(j = 0; i < length; j++, i++)
                      {
                      str[j] = str[i];
                      }
                      str[j] = '\0';
                      }
                      >
                      // Trim whitespace from end:
                      //Added check to catch the empty string
                      i = (strlen(str) ? (strlen(str) - 1) : 0);
                      Here, strlen(str) is the same as j.
                      You can replace the call to strlen() with j :)

                      i = j ? j - 1 : 0;
                      while(isspace(( unsigned char)str[i]))
                      {
                      i--;
                      }
                      //Removed check that was not needed
                      str[i + 1] = '\0';
                      And, since you changed the function to return a char *,
                      do return a char * to the caller.

                      return str;
                      }

                      Comment

                      • Michael Brennan

                        #12
                        Re: Comment on trim string function please

                        On 2008-07-10, swengineer001@g mail.com <swengineer001@ gmail.comwrote:
                        Here is my second go at it after reading the comments provided.
                        <snip>
                        // Trim whitespace from end:
                        //Added check to catch the empty string
                        i = (strlen(str) ? (strlen(str) - 1) : 0);
                        >
                        while(isspace(( unsigned char)str[i]))
                        {
                        i--;
                        }
                        //Removed check that was not needed
                        str[i + 1] = '\0';
                        }
                        2 things I found:

                        1. You are missing a return statement at the end.

                        2. This will not be correct if someone calls your function with an array
                        of length one (with its only element set to '\0'). The index in your
                        assignment at the last line will then be 1.

                        --
                        Michael Brennan

                        Comment

                        • swengineer001@gmail.com

                          #13
                          Re: Comment on trim string function please

                          On Jul 10, 2:59 pm, badc0...@gmail. com wrote:
                          swengineer...@g mail.com wrote:
                          Here is my second go at it after reading the comments provided.
                          //Added char* return as suggested
                          char* TrimCString(cha r *str)
                          {
                          >
                          [...]
                          >
                                  //Calculate length once
                                  size_t length = strlen(str);
                                  //Decided to leave the bytewise copy [...]
                                  for(j = 0; i < length; j++, i++)
                                  {
                                      str[j] = str[i];
                                  }
                                  str[j] = '\0';
                              }
                          >
                              // Trim whitespace from end:
                              //Added check to catch the empty string
                              i = (strlen(str) ? (strlen(str) - 1) : 0);
                          >
                          Here, strlen(str) is the same as j.
                          You can replace the call to strlen() with j :)
                          This is only true if there was whitespace to trim from the front and
                          not in the more general case.

                          Comment

                          • Eric Sosman

                            #14
                            Re: Comment on trim string function please

                            swengineer001@g mail.com wrote:
                            Here is my second go at it after reading the comments provided.
                            >
                            //This was in my original file but just not immediately before this
                            function
                            #include <ctype.h>
                            #include <string.h>
                            >
                            //Added char* return as suggested
                            char* TrimCString(cha r *str)
                            {
                            // Trim whitespace from beginning:
                            size_t i = 0;
                            size_t j;
                            >
                            //Added validation of the argument for NULL
                            if(str == NULL)
                            {
                            return str;
                            }
                            >
                            //Added cast as suggested. I have always avoided casts for various
                            reasons
                            //discussed in the FAQ and elsewhere but it is good to know this
                            case
                            while(isspace(( unsigned char)str[i]))
                            {
                            i++;
                            }
                            if(i 0)
                            {
                            //Calculate length once
                            size_t length = strlen(str);
                            //Decided to leave the bytewise copy I just think I can
                            understand it a little better
                            for(j = 0; i < length; j++, i++)
                            {
                            str[j] = str[i];
                            }
                            str[j] = '\0';
                            }
                            Why move the characters at all? The original version needed
                            to because it returned no value, so the only way it could report
                            the position of the first non-space was to squeeze out the leading
                            spaces. But now that you're returning a pointer, why not just
                            point at the first non-space no matter where you find it?
                            // Trim whitespace from end:
                            //Added check to catch the empty string
                            i = (strlen(str) ? (strlen(str) - 1) : 0);
                            >
                            while(isspace(( unsigned char)str[i]))
                            {
                            i--;
                            }
                            This still mis-handles the empty string. Work through it:
                            You'll set `i' to zero, and then you'll test whether `str[0]'
                            is a space. It's the '\0', so it's not a space, so you decrement
                            `i' from zero to a very large number. Then you check whether
                            `str[very_large_numb er]' is a space, and there's no telling
                            what will happen except that it's not likely to be good ...
                            //Removed check that was not needed
                            str[i + 1] = '\0';
                            }
                            You still haven't picked up on the hints about an easier way
                            to deal with the trailing spaces. Let's try another analogy:
                            Imagine you're watching a football ("soccer") game that has
                            reached the shoot-out tie-breaking phase. Abelard shoots and
                            the goalkeeper makes the save (space). Then Bertrand shoots
                            and scores (non-space). Then Claude shoots, and Daniel, and
                            Edouard, ... When the match is over, you know you will be asked
                            which shooter scored the side's final goal (which may or may not
                            have been followed by a few saves). Unfortunately, your memory
                            is terrible -- but you have an erasable slate with enough space
                            to write one name at a time. Can you think of a method that will
                            ensure you can answer the question correctly?

                            --
                            Eric.Sosman@sun .com

                            Comment

                            • badc0de4@gmail.com

                              #15
                              Re: Comment on trim string function please

                              swengineer001@g mail.com wrote:
                              On Jul 10, 2:59 pm, badc0...@gmail. com wrote:
                              swengineer...@g mail.com wrote:
                              Here is my second go at it after reading the comments provided.
                              //Added char* return as suggested
                              char* TrimCString(cha r *str)
                              {
                              [...]
                              missing in my previous post
                              if (i 0)
                              {
                              //Calculate length once
                              size_t length = strlen(str);
                              //Decided to leave the bytewise copy [...]
                              for(j = 0; i < length; j++, i++)
                              {
                              str[j] = str[i];
                              }
                              str[j] = '\0';
                              }
                              // Trim whitespace from end:
                              //Added check to catch the empty string
                              i = (strlen(str) ? (strlen(str) - 1) : 0);
                              Here, strlen(str) is the same as j.
                              You can replace the call to strlen() with j :)
                              This is only true if there was whitespace to trim from the front and
                              not in the more general case.
                              Oops ... you're right, thanks for pointing it out.
                              But I still think you could do without some of those strlen() calls.

                              Comment

                              Working...