Comment on trim string function please

  • vippstar@gmail.com

    #46
    Re: Comment on trim string function please

    On Jul 11, 5:55 am, Jack Klein <jackkl...@spamcop.net> wrote:
    On Thu, 10 Jul 2008 10:56:18 -0700 (PDT), "swengineer...@gmail.com"
    <snip i = strlen(s)-1>
    i is of type size_t which I believe can't be negative?
    >
    You are correct that size_t is an unsigned integer type, and can never
    be negative. But that doesn't mean that underflowing it won't cause a
    very serious problem. Consider:
    >
    <snip (unsigned long)(size_t)-1>
    >
    Compile and execute this and see what you get.
    I don't see underflow occurring anywhere in that code. Unsigned
    integers cannot underflow or overflow; arithmetic on them wraps
    around modulo one more than the maximum value of the type.
    I do see your point, but your terminology was not correct.
    Only signed integers can overflow, and when that happens the behavior
    is not defined.
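
A minimal sketch of the wraparound being discussed, assuming the snipped code computed `strlen(s) - 1` as the quoted line suggests. The function names `last_index_naive` and `last_index_checked` are hypothetical and do not come from the thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: index of the last character of s, computed as
   strlen(s) - 1. For an empty string the subtraction wraps around to
   the largest size_t value instead of producing -1. */
size_t last_index_naive(const char *s)
{
    return strlen(s) - 1;
}

/* Hypothetical safer variant: returns 1 and stores the index in *out
   if s has a last character, otherwise returns 0 and leaves *out alone. */
int last_index_checked(const char *s, size_t *out)
{
    size_t len = strlen(s);

    if (len == 0)
        return 0;
    *out = len - 1;
    return 1;
}
```

The wrapped value is well defined, but using it as an array index is not, which is why the checked form tests the length before subtracting.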


    • Keith Thompson

      #47
      Re: Comment on trim string function please

      "Bill Reid" <hormelfree@happyhealthy.net> writes:
      Keith Thompson <kst-u@mib.org> wrote in message
      news:lnwsjs2xlo.fsf@nuthaus.mib.org...
      >"Bill Reid" <hormelfree@happyhealthy.net> writes:
      Jens Thoms Toerring <jt@toerring.de> wrote in message
      news:6dmve2F3ecqhU1@mid.uni-berlin.de...
      >[...]
      >An alternative would be to use memmove() here, so you don't
      >have to do it byte by byte.
      >
      What would be the actual metrics to make a decision here
      one way or the other? I myself just assume for this kind of stuff
      that it will only be done on relatively small strings, just a few
      characters, so it seems that a call to memmove() might be overkill.
      Or is it? DOES it depend on the size of the block you'll
      be moving? How many cycles does it take to assign one
      character to a previous array position, versus the overhead of
      a call to memmove(), and the operation of the function itself?
      >>
      >All this is system-specific.
      >
      This is not really responsive.
      Ok, here's a more responsive answer.

      I don't know.

      Specifically, I don't know how many cycles any given operation will
      take on whatever system you're using, partly because I don't know what
      system you're using. Also, I don't know how many cycles it would take
      on the system(s) *I'm* using, since I've never bothered to measure it.

      I would expect that a call to memmove() would be faster than an
      equivalent explicit loop on some systems, and vice versa on others.
      memmove() has some overhead to determine whether the operands overlap,
      but if the compiler can determine that they don't, or if it knows
      which way they overlap and knows how memcpy() behaves, it might be
      able to avoid that overhead. memmove() or memcpy() likely has some
      overhead due to the function call, but an optimizing compiler might
      replace either with inline code in some cases. An implementation of
      memmove() or memcpy() might take advantage of copying chunks larger
      than 1 byte; whether it can do this might depend on the alignment of
      the operands. And so forth.

      Perfect knowledge of the C language (something I don't claim to have)
      would not enable someone to answer your question; that should be a
      clue that this is not the place to ask it. Consider measuring it
      yourself, on your own system with your own code.
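
As a rough sketch of the two alternatives being compared, here are a byte-by-byte shift and a `memmove()` shift that produce the same result. The names `shift_left_loop` and `shift_left_memmove` are hypothetical, and both assume `offset` is no larger than `strlen(s)`.

```c
#include <stddef.h>
#include <string.h>

/* Move the string that starts at s + offset down to s, one byte at a
   time. A forward copy is safe because the destination lies at or
   below the source. */
char *shift_left_loop(char *s, size_t offset)
{
    char *dst = s;
    const char *src = s + offset;

    while ((*dst++ = *src++) != '\0')
        ;
    return s;
}

/* The same operation with memmove(), which tolerates overlap. */
char *shift_left_memmove(char *s, size_t offset)
{
    size_t n = strlen(s + offset) + 1;  /* include the terminator */

    memmove(s, s + offset, n);
    return s;
}
```

Which one is faster cannot be read off the source. One way to find out on a given system is to call each in a loop over representative strings and time the loops, for example with `clock()` from `<time.h>`.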

      [...]
      But I'm "listening";
      I won't speculate on what the quotation marks are supposed to mean.
      tell me how many cycles different "systems"
      would consume to perform each method...
      See above.
      or is memmove() kind
      of like fread() and fwrite(), you THINK that they're doing something
      "special", but really they're using fgetc() and fputc() which in most
      cases you probably could have just used yourself and eliminated
      the "middleman"...
      The behavior of fread() and fwrite() is defined in terms of fgetc()
      and fputc(), but there's no requirement that they actually be
      implemented that way. Conversely, fgetc() and fputc() typically use
      buffering, which means that 1000 calls to fputc() won't necessarily be
      faster or slower than an equivalent single call to fwrite(). Use
      whatever is clearer.

      [...]
      >Consider letting the compiler do its job.
      >
      The compiler has "told" me which it considers faster, since it is
      documented that the default optimization is to replace sub-scripts
      with pointers. Therefore I cannot really be preventing the compiler
      from doing its "job" (your opinion) by pre-emptively writing my
      code to do the same thing as the "optimizer", at least for this
      particular "optimization".
      Perhaps for this particular compiler. Other compilers for other
      systems might behave differently.
      Riiiiiiiiiiiiiiight? Now as far as some other optimizations are
      concerned, I can't re-write my code to emulate those, but THOSE
      tend to be the ones I am really concerned about in the first place,
      with GOOD reason...
      --
      Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
      Nokia
      "We must do something. This is something. Therefore, we must do this."
      -- Antony Jay and Jonathan Lynn, "Yes Minister"


      • Ben Bacarisse

        #48
        Re: Comment on trim string function please

        "Bill Reid" <hormelfree@happyhealthy.net> writes:
        Ben Bacarisse <ben.usenet@bsb.me.uk> wrote in message
        news:87iqvbjbeg.fsf@bsb.me.uk...
        >"Bill Reid" <hormelfree@happyhealthy.net> writes:
        >
        ><snip>
        In any event, does the following win the prize for most efficient
        implementation of the presumed requirements of the function?
        >>
        >I think you have a bug.
        >>
        char *remove_beg_end_non_text(char *text) {
        char *beg;
        size_t length;
        >
        for(beg=text;*beg!='\0';beg++)
        if(!isspace(*beg)) break;
        >>
        >I'd guess
        >>
        > for (beg = text; isspace((unsigned char)*beg); beg++);
        >>
        >makes shorter code with some compilers.
        >
        Yeah, but does it fly past the terminating null character?
        Nope. Why would you think that?
        I think
        you may have out-obfuscated me...
        I am happy to oblige. Those were your rules: you wanted minimal code;
        your own example suggests clarity does not matter (although I think my
        loop is clearer, but that is simply opinion).
        Also, I've not been following why the cast is important here...
        >
        length=strlen(beg);
        >
        while(length>0)
        if(!isspace(*(beg+(--length)))) break;
        >
        *(beg+(++length))='\0';
        >>
        >If length is never decremented (because it was zero after the
        >initial space scan) then this writes outside the string. It always
        >helps to walk through what your code does in boundary cases like ""
        >and " ".
        >
        Actually, the "boundary case" for this is the size of array; as long
        as it is at least one character bigger than the "string" then this will
        always work. But you are correct if you replace the word "string" with
        the word "array fully-populated by the string" above.
        >
        There are actually only five possible test cases, which all should
        ASSUME a "array fully-populated by the string" (which I unfortunately
        didn't): spaces before text, spaces after the text, spaces before and
        after the text, just spaces, empty string (six if you count the stupidity
        of passing NULL). Pass those five (or six) cases, the thing is
        perfect...
        I am getting lost in all the words. I think your code is wrong (in
        two of the five cases you are considering). Are you saying it is
        correct? Like many "off by one" bugs it may not produce undefined
        behaviour. For example passing char test[2] = ""; is fine because of
        the extra byte, but it is still a bug.
        return beg==text ? text : memmove(text,beg,length+1);
        }
        >
        Gotta admit, you couldn't reduce the cycles too much on
        that, could you? And it could even win a little bonus prize
        for obfuscatory conditions like if(!isspace(*(beg+(--length))))...
        >>
        >You might have obscured it even to yourself!
        >
        OK, I need to work on a few things to make it the perfect piece
        of obscure ruthlessly efficient code...
        I'd want to be sure it is correct first. I would be the first to admit I
        make mistakes but you have not persuaded me that I am wrong about
        this. I tried to be as clear about the bug as possible.
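
For reference, a minimal sketch of one way to avoid the off-by-one write described above. This is not code from the thread: the name `trim_in_place` is hypothetical, and the sketch assumes the goal is to strip leading and trailing white space in place. The trailing scan tests `beg[length - 1]` before decrementing, so `length` always ends up as the index where the terminator belongs, including for `""` and `"   "`.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-place trim: removes leading and trailing white space
   and returns text. Never writes past the original terminator. */
char *trim_in_place(char *text)
{
    char *beg = text;
    size_t length;

    /* Skip leading white space; isspace('\0') is false, so the scan
       stops at the terminator at the latest. */
    while (isspace((unsigned char)*beg))
        beg++;

    /* Shrink length while the last remaining character is white space. */
    length = strlen(beg);
    while (length > 0 && isspace((unsigned char)beg[length - 1]))
        length--;

    beg[length] = '\0';
    if (beg != text)
        memmove(text, beg, length + 1);
    return text;
}
```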

        --
        Ben.


        • Mark McIntyre

          #49
          Re: Comment on trim string function please

          Bill Reid wrote:
          I was looking for the "Answers Department",
          Answers for what sort of questions? The price of tea in China? How to
          iron shirts? Did you try Yahoo Answers?

          but accidentally walked
          into the "Department of Begging the Question" (part of the "Clueless
          Pedants Department", sharing a common desk with the "Pointless
          Arguments Department")...
          Feel free to insult the experts who hang out here, I'm sure they'll be
          even keener to help you after you've been rude to them.
          ---
          William Ernest Reid
          ps one too many dashes, and no space in your sigsep.

          --
          Mark McIntyre

          CLC FAQ <http://c-faq.com/>
          CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>


          • Mark McIntyre

            #50
            Re: Comment on trim string function please

            Bill Reid wrote:
            >(I wrote)
            >So to get a good answer, you need to ask in a system-specific news group.
            >
            OK, I'll take that as a "I have no idea"...
            Take it how you like. It means what it says.
            For the record, I have several answers, depending on which system you're
            interested in. However you have zero chance of getting any of them from
            me now.
            >all you had
            to do was pick ANY system (or two or three) that you were familiar with
            to answer the question...
            And had you asked the question over in a system-specific group, I'd have
            answered with a relevant answer.

            However, it's not topical here, so given your abusive attitude I suggest
            you swivel on it.


            --
            Mark McIntyre

            CLC FAQ <http://c-faq.com/>
            CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>


            • CBFalconer

              #51
              Re: Comment on trim string function please

              Bill Reid wrote:
              <badc0de4@gmail.com> wrote:
              >Bill Reid wrote:
              >>
              .... snip ...
              >>
              >>char *remove_beg_end_non_text(char *text) {
              >> char *beg;
              >> size_t length;
              >>>
              >> for (beg = text; *beg != '\0'; beg++)
              >> if (!isspace(*beg)) break;
              (spaces added for legibility above)
              >>
              >BUG: missing cast
              >
              I've not been following why the cast is important; is this some
              type of "error" that has never actually occurred on the planet
              Earth but MIGHT happen in the unpredictable future?
              Serious bug. isspace(int) requires the integer value of an
              unsigned char. Repeat, unsigned. If the char type on any machine
              is signed, isspace can receive a negative value, and quite likely
              blow up. That's why the argument to isspace should receive an
              explicit cast. The only negative value those functions can receive
              is EOF.

              The cast can be avoided when the integer output of getc() is
              passed, because getc returns the appropriately cast value in the
              first place. But you are getting those chars from a string, and
              whether a raw char is signed or unsigned is always undefined.
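
A short sketch of the cast in use when the characters come from a string rather than from `getc()`. The function name `count_leading_space` is hypothetical; the point is only that each `char` is converted to `unsigned char` before it reaches `isspace()`.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: number of leading white-space characters in s.
   The cast keeps the argument to isspace() within 0..UCHAR_MAX even
   when plain char is signed and s holds characters above 127. */
size_t count_leading_space(const char *s)
{
    size_t n = 0;

    while (isspace((unsigned char)s[n]))
        n++;
    return n;
}
```

Without the cast, a byte such as `'\xE9'` can reach `isspace()` as a negative `int` on implementations where plain `char` is signed, which is outside the range the function is specified to accept.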

              --
              [mail]: Chuck F (cbfalconer at maineline dot net)
              [page]: <http://cbfalconer.home.att.net>
              Try the download section.



              • Richard Heathfield

                #52
                Re: Comment on trim string function please

                CBFalconer said:

                <snip>
                The cast can be avoided when the integer output of getc() is
                passed, because getc returns the appropriately cast value in the
                first place. But you are getting those chars from a string, and
                whether a raw char is signed or unsigned is always undefined.
                Your major point is correct, but you have one detail wrong. Whether a raw
                char is signed or unsigned is never undefined - it is always
                implementation-defined. "If a member of the required source character set
                enumerated in $2.2.1 is stored in a char object, its value is guaranteed
                to be positive. If other quantities are stored in a char object, the
                behavior is implementation-defined: the values are treated as either
                signed or nonnegative integers."
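
Because the choice is implementation-defined, it can be queried rather than guessed. The sketch below shows two ways to do that; the function names are hypothetical. `CHAR_MIN` comes from `<limits.h>` and is 0 when plain `char` is unsigned and negative when it is signed.

```c
#include <limits.h>

/* Hypothetical helper: 1 if plain char is signed here, 0 otherwise,
   decided from the limits the implementation publishes. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* The same question answered by conversion: -1 converted to char stays
   negative if char is signed and becomes UCHAR_MAX if it is unsigned. */
int plain_char_is_signed_by_conversion(void)
{
    return (char)-1 < 0;
}
```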

                --
                Richard Heathfield <http://www.cpax.org.uk>
                Email: -http://www. +rjh@
                Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
                "Usenet is a strange place" - dmr 29 July 1999


                • Bill Reid

                  #53
                  Re: Comment on trim string function please


                  Ben Bacarisse <ben.usenet@bsb.me.uk> wrote in message
                  news:87ej5yk7d9.fsf@bsb.me.uk...
                  "Bill Reid" <hormelfree@happyhealthy.net> writes:
                  Ben Bacarisse <ben.usenet@bsb.me.uk> wrote in message
                  news:87iqvbjbeg.fsf@bsb.me.uk...
                  "Bill Reid" <hormelfree@happyhealthy.net> writes:
                  <snip>
                  In any event, does the following win the prize for most efficient
                  implementation of the presumed requirements of the function?
                  >
                  I think you have a bug.
                  >
                  char *remove_beg_end_non_text(char *text) {
                  char *beg;
                  size_t length;

                  for(beg=text;*beg!='\0';beg++)
                  if(!isspace(*beg)) break;
                  >
                  I'd guess
                  >
                  for (beg = text; isspace((unsigned char)*beg); beg++);
                  >
                  makes shorter code with some compilers.
                  Yeah, but does it fly past the terminating null character?
                  >
                  Nope. Why would you think that?
                  Oh, you know, the fact that you aren't actually testing for
                  the null character, you're just relying on isspace() to return
                  0 for it, which probably is correct in all cases, but just
                  "looks dangerous"...
                  I think
                  you may have out-obfuscated me...
                  >
                  I am happy to oblige. Those were your rules: you wanted minimal code;
                  your own example suggests clarity does not matter (although I think my
                  loop is clearer, but that is simply option).
                  >
                  Also, I've not been following why the cast is important here...
                  length=strlen(beg);

                  while(length>0)
                  if(!isspace(*(beg+(--length)))) break;

                  *(beg+(++length))='\0';
                  >
                  If length is never decremented (because it was zero after the
                  initial space scan) then this writes outside the string. It always
                  helps to walk through what your code does in boundary cases like ""
                  and " ".
                  Actually, the "boundary case" for this is the size of array; as long
                  as it is at least one character bigger than the "string" then this will
                  always work. But you are correct if you replace the word "string" with
                  the word "array fully-populated by the string" above.

                  There are actually only five possible test cases, which all should
                  ASSUME a "array fully-populated by the string" (which I unfortunately
                  didn't): spaces before text, spaces after the text, spaces before and
                  after the text, just spaces, empty string (six if you count the stupidity
                  of passing NULL). Pass those five (or six) cases, the thing is
                  perfect...
                  >
                  I am getting lost in all the words. I think your code is wrong (in
                  two of the five cases you are considering). Are you saying it is
                  correct?
                  No, it is not correct. You're right, it is a TRUE "bug".
                  Like many "off by one" bugs it may not produce undefined
                  behaviour. For example passing char test[2] = ""; is fine because of
                  the extra byte, but it is still a bug.
                  Of course. The way I allocate arrays for text on THIS system
                  means you might NEVER encounter it, but it is still wrong...
                  return beg==text ? text : memmove(text,beg,length+1);
                  }

                  Gotta admit, you couldn't reduce the cycles too much on
                  that, could you? And it could even win a little bonus prize
                  for obfuscatory conditions like if(!isspace(*(beg+(--length))))...
                  >
                  You might have obscured it even to yourself!
                  OK, I need to work on a few things to make it the perfect piece
                  of obscure ruthlessly efficient code...
                  >
                  I'd want to be sure it is correct first. I would be the first to admit I
                  make mistakes but you have not persuaded me that I am wrong about
                  this. I tried to be as clear about the bug as possible.
                  Don't worry about it, you're right. Also, I'm giving up on writing
                  efficient obscure code for this. Aside from the cast "issues", here's
                  something that is totally "straight-forward" but probably not the
                  most "efficient" algorithm for all cases, or even any (it is also what
                  I actually use for small strings):

                  char *remove_beg_end_non_text(char *text) {
                      char *curr_char,*text_char=text;
                      size_t spaces=0;

                      for(curr_char=text;*curr_char!='\0';curr_char++)
                          if(!isspace(*curr_char)) break;

                      while(*curr_char!='\0') {

                          if(isspace(*curr_char)) spaces++;
                          else spaces=0;

                          *text_char++=*curr_char++;
                      }

                      *(text_char-spaces)='\0';

                  return text;
                  }

                  ASIDE FROM THE CAST "ISSUES", where's the bug in THAT?
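
One way to answer that question is to run the five cases listed earlier through a small harness. The sketch below repeats the function from this post with only the `unsigned char` casts added, then checks each case against its expected result. The harness name `count_trim_failures`, the buffer size, and the guard-byte trick are assumptions made for illustration. The guard byte sits immediately after the original terminator, so a write one past the end of the string, as in the earlier version, would be counted as a failure.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* The function from the post above, with only the casts added. */
char *remove_beg_end_non_text(char *text)
{
    char *curr_char, *text_char = text;
    size_t spaces = 0;

    for (curr_char = text; *curr_char != '\0'; curr_char++)
        if (!isspace((unsigned char)*curr_char)) break;

    while (*curr_char != '\0') {
        if (isspace((unsigned char)*curr_char)) spaces++;
        else spaces = 0;
        *text_char++ = *curr_char++;
    }
    *(text_char - spaces) = '\0';
    return text;
}

enum { GUARD = 0x7F };  /* arbitrary non-space sentinel value */

/* Hypothetical harness: returns how many of the five cases fail,
   counting both wrong results and writes past the terminator. */
int count_trim_failures(char *(*trim)(char *))
{
    static const char *const cases[][2] = {
        { "  text",   "text" },  /* spaces before */
        { "text  ",   "text" },  /* spaces after */
        { "  text  ", "text" },  /* spaces before and after */
        { "   ",      ""     },  /* just spaces */
        { "",         ""     }   /* empty string */
    };
    char buf[16];
    int failures = 0;
    size_t i;

    for (i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        size_t len = strlen(cases[i][0]);

        memcpy(buf, cases[i][0], len + 1);
        buf[len + 1] = GUARD;   /* byte just past the terminator */
        if (strcmp(trim(buf), cases[i][1]) != 0 || buf[len + 1] != GUARD)
            failures++;
    }
    return failures;
}
```

Calling `count_trim_failures(remove_beg_end_non_text)` exercises all five cases. Passing NULL, the sixth case mentioned above, is deliberately left out.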

                  ---
                  William Ernest Reid




                  • santosh

                    #54
                    Re: Comment on trim string function please

                    Bill Reid wrote:
                    CBFalconer <cbfalconer@yahoo.com> wrote in message
                    news:4879374E.53E97D90@yahoo.com...
                    >Bill Reid wrote:
                    <badc0de4@gmail.com> wrote:
                    >Bill Reid wrote:
                    >>
                    >... snip ...
                    >>
                    >>char *remove_beg_end_non_text(char *text) {
                    >> char *beg;
                    >> size_t length;
                    >>>
                    >> for (beg = text; *beg != '\0'; beg++)
                    >> if (!isspace(*beg)) break;
                    >>
                    >(spaces added for legibility above)
                    >>
                    >BUG: missing cast
                    >
                    I've not been following why the cast is important; is this some
                    type of "error" that has never actually occurred on the planet
                    Earth but MIGHT happen in the unpredictable future?
                    >>
                    >Serious bug. isspace(int) requires the integer value of an
                    >unsigned char. Repeat, unsigned.
                    >
                    Repetition is not clarification. By declaration, and all available
                    "official" documentation, isspace(int) requires a signed integer
                    value...
                    That's what Chuck is saying too. But you are passing it a plain char
                    value without a cast to unsigned. If plain char happens to be unsigned
                    on your implementation then things are fine, but is it a risk that you
                    want to take?

                    [ ... ]
                    >That's why the argument to isspace should receive an
                    >explicit cast. The only negative value those functions can receive
                    >is EOF.
                    >
                    Sure you're not conflating "strings" with "streams"?
                    How so? As far as I can see he is talking about neither, but the
                    interface of the is* and to* functions, which can be used with both
                    strings and text streams.

                    <snip>


                    • sebastian

                      #55
                      Re: Comment on trim string function please

                      Doing such things as returning malloc'ed memory, checking for NULL, or
                      returning a pointer to the first non-space will generally lead to
                      misuse. Just keep it simple:

                      char*
                      trim( char* str )
                      {
                      char
                      * cpy = str,
                      * seq = str,
                      * fin = str + strlen( str ) - 1;
                      while( seq <= fin && isspace( *seq ) )
                      ++seq;
                      while( seq <= fin && isspace( *fin ) )
                      --fin;
                      while( seq <= fin )
                      *cpy++ = *seq++;
                      *cpy = 0;
                      return str;
                      }

                      I can't guarantee that it's bug free of course...


                      • sebastian

                        #56
                        Re: Comment on trim string function please

                        Ah, it seems there were quite a few more posts than I thought (using a
                        new feed reader) - in that case my post must seem a little
                        discombobulated!
                        >Therefore, passing one of those char values to a <ctype.h> function invokes undefined behavior
                        Well, insofar as strictly using isspace as an iterative condition,
                        yes. Otherwise, it really doesn't matter, does it?


                        • Eric Sosman

                          #57
                          Re: Comment on trim string function please

                          sebastian wrote:
                          Doing such things as returning malloc'ed memory, checking for NULL, or
                          returning a pointer to the first non-space will generally lead to
                          misuse. Just keep it simple:
                          >
                          char*
                          trim( char* str )
                          {
                          char
                          * cpy = str,
                          * seq = str,
                          * fin = str + strlen( str ) - 1;
                          while( seq <= fin && isspace( *seq ) )
                          ++seq;
                          while( seq <= fin && isspace( *fin ) )
                          --fin;
                          while( seq <= fin )
                          *cpy++ = *seq++;
                          *cpy = 0;
                          return str;
                          }
                          >
                          I can't guarantee that it's bug free of course...
                          Let's see. Missing <string.h>, missing <ctype.h>,
                          potential undefined behavior if the argument points to the
                          '\0' of an empty string, potential undefined behavior from
                          misuse of isspace() -- other than that, Mrs. Lincoln, what
                          did you think of the play?
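
A sketch of how the points listed above might be addressed while keeping the same three-loop structure. This is a reworked variant, not sebastian's code. The name `trim_fixed` is hypothetical, and the main change is that `fin` points one past the last character, so no pointer is ever formed before the start of the array, even for an empty string.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical reworked trim: strips leading and trailing white space
   in place and returns str. */
char *trim_fixed(char *str)
{
    char *cpy = str;
    char *seq = str;
    char *fin = str + strlen(str);  /* one past the last character */

    /* Skip leading white space. */
    while (seq < fin && isspace((unsigned char)*seq))
        ++seq;
    /* Back up over trailing white space, looking at fin[-1] only while
       seq < fin so the access stays inside the string. */
    while (seq < fin && isspace((unsigned char)fin[-1]))
        --fin;
    /* Copy the remaining characters down to the start. */
    while (seq < fin)
        *cpy++ = *seq++;
    *cpy = '\0';
    return str;
}
```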

                          --
                          Eric Sosman
                          esosman@ieee-dot-org.invalid


                          • Eric Sosman

                            #58
                            Re: Comment on trim string function please

                            Bill Reid wrote:
                            Eric Sosman <esosman@ieee-dot-org.invalid> wrote
                            >>
                            > "If it works, it's not sufficiently tested." I'll bet you
                            >a cookie that either (1) char is in fact not signed on your
                            >systems, or (2) you haven't tested "Götterdämmerung" or "Aïda"
                            >or "µsec" or "garçon" or "£1 is worth ¥210" or ...
                            >
                            How many negative values are there in those characters that
                            are not part of the ASCII character set?
                            If char is signed, 127 or more.
                            Does the concept of
                            "sign-preserving" come into play here?
                            Hunnh?
                            As far as whether my "system" uses a default signed char,
                            in one place in the documentation they state that they USED
                            to do it that way, IMPLYING they don't any more, but don't worry
                            about it in any event, then in another place they EXPLICITLY
                            state that's how they do it currently.
                            >
                            Bastards...
                            There are at least two easy ways to find out by experiment.
                            > 1) The <ctype.h> functions take an argument of type int.
                            >>
                            > 2) The value of the int argument must be in the range zero
                            > through UCHAR_MAX (all positive) or else EOF (negative).
                            >
                            WHO TOL YA DAT!!! WHO TOL YA DAT!!!
                            A document that seems unfamiliar to you, but that has been
                            adopted by various international and national standards bodies.
                            > 4) On an implementation where char is signed, some char
                            > values are negative. (Since the "basic execution set"
                            > characters are all non-negative, lackadaisical testing
                            > will not reveal this.)
                            >
                            OK, where does "locale" play into this?
                            Sitting in with Basie and the boys? The locale setting may
                            affect the result isspace() returns for a given argument, but
                            does not change the set of valid arguments.
                            > 5) At least 126 of those negative values are !=EOF.
                            >
                            OK, where does "locale" play into this?
                            Tenor sax backing up Billie Holiday.
                            > 6) Therefore, passing one of those char values to a <ctype.h>
                            > function invokes undefined behavior.
                            >
                            Which of course, includes exactly the expected behavior as one
                            of the possibilities...
                            Right. Or, "Hundreds of times don't nuthin' happen a-tall,"
                            says an old joke about the unimportance of contraception.
                            [...]
                            I WANT MY COOKIE!!!!!!
                            >
                            ANY OTHER OF YOU IGNORANT BASTARDS WANT TO GIVE
                            ME SOME FREE FOOD???!!??!!
                            You haven't yet shown that char is in fact signed on your
                            system. In a way, it is both shameful and laudable that you
                            don't know ...

                            --
                            Eric Sosman
                            esosman@ieee-dot-org.invalid


                            • Barry Schwarz

                              #59
                              Re: Comment on trim string function please

                              On Sun, 13 Jul 2008 14:10:04 -0700 (PDT), sebastian
                              <sebastiangarth@gmail.com> wrote:
                              >Ah, it seems there were quite a few more posts than I thought (using a
                              >new feed reader) - in that case my post must seem a little
                              >discombobulated!
                              >
                              >>Therefore, passing one of those char values to a <ctype.h> function invokes undefined behavior
                              >
                              >Well, insofar as strictly using isspace as an iterative condition,
                              >yes. Otherwise, it really doesn't matter, does it?
                              Undefined behavior always matters if you want your code to work.


                              Remove del for email


                              • CBFalconer

                                #60
                                Re: Comment on trim string function please

                                Bill Reid wrote:
                                Eric Sosman <esosman@ieee-dot-org.invalid> wrote:
                                >
                                .... snip ...
                                >
                                I WANT MY COOKIE!!!!!!
                                >
                                ANY OTHER OF YOU IGNORANT BASTARDS WANT TO GIVE
                                ME SOME FREE FOOD???!!??!!
                                Since you are ignoring advice, and being rude about it, I see no
                                reason for paying any further attention to you. PLONK.

                                --
                                [mail]: Chuck F (cbfalconer at maineline dot net)
                                [page]: <http://cbfalconer.home.att.net>
                                Try the download section.

