Mystery: static variables & performance

  • Peter Nilsson

    Re: Mystery: static variables & performance

    "CBFalconer" <cbfalconer@yahoo.com> wrote in message
    news:4036B315.B8FE1A7E@yahoo.com...
    > Peter Nilsson wrote:
    ...
    > > > >>> int strcmp(const char *s1, const char *s2) {
    > > > >>> while ( *s1 == *s2 && *s1 )
    > > > >>> ++s1, ++s2;
    > > > >>>
    > > > >>> return *s1 - *s2;
    > > > >>> }
    >...
    > >
    > > The most robust answer would seem to be...
    > >
    > > return (unsigned char) *s1 > (unsigned char) *s2
    > > - (unsigned char) *s1 < (unsigned char) *s2;

    Oops...

    return ((unsigned char) *s1 > (unsigned char) *s2)
    - ((unsigned char) *s1 < (unsigned char) *s2);
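For reference, here is a minimal sketch of the whole function, with the corrected return expression dropped into the loop quoted at the top. The name my_strcmp is hypothetical, because defining strcmp itself would clash with the reserved library name. The casts follow Peter's conversion-based reading of the standard.

```c
/* Hypothetical name: defining strcmp itself would collide with the
   identifier reserved for the standard library. */
int my_strcmp(const char *s1, const char *s2)
{
    /* Advance while the characters match and s1 has not ended. */
    while (*s1 == *s2 && *s1 != '\0') {
        ++s1;
        ++s2;
    }

    /* Convert the first differing chars (or the terminators) to
       unsigned char and return -1, 0 or 1. Only the two 0/1
       comparison results are subtracted, so int overflow is
       impossible even if UCHAR_MAX > INT_MAX. */
    return ((unsigned char)*s1 > (unsigned char)*s2)
         - ((unsigned char)*s1 < (unsigned char)*s2);
}
```

With this version, my_strcmp("a", "a\xE9") returns -1 whether or not plain char is signed. The terminator converts to 0, and the byte 0xE9 converts to 233 on an 8-bit implementation.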
    > >
    > > or...
    > >
    > > return (unsigned char) *s1 < (unsigned char) *s2 ? -1
    > > : (unsigned char) *s1 > (unsigned char) *s2;
    >
    > Why cast to unsigned char?

    Because that is the specification for strcmp().

    > If native chars are signed, I would
    > want this routine to respect that.

    If we weren't talking about strcmp, you would be free to do that. But you
    may get some unexpected surprises, e.g. "a" > "aé".
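The following small sketch illustrates that surprise. It assumes 8-bit chars and é stored as the single byte 0xE9 (as in ISO 8859-1). The helpers plain_char_is_signed, diff_plain and diff_unsigned are hypothetical. diff_plain keeps the plain char subtraction from the quoted version, while diff_unsigned converts to unsigned char first.

```c
#include <limits.h>

/* Hypothetical helper: 1 if plain char is signed here, else 0. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* The ending of the strcmp quoted above: subtract the plain char values.
   For bytes above 127 the sign of the result depends on whether plain
   char is signed. */
int diff_plain(const char *s1, const char *s2)
{
    while (*s1 == *s2 && *s1 != '\0') {
        ++s1;
        ++s2;
    }
    return *s1 - *s2;
}

/* Same loop, but the chars are converted to unsigned char first.
   Assumes UCHAR_MAX <= INT_MAX, so both operands promote to int and
   the subtraction cannot overflow. */
int diff_unsigned(const char *s1, const char *s2)
{
    while (*s1 == *s2 && *s1 != '\0') {
        ++s1;
        ++s2;
    }
    return (unsigned char)*s1 - (unsigned char)*s2;
}
```

On a typical two's complement implementation where plain char is signed, diff_plain("a", "a\xE9") computes 0 - (-23) = 23, so "a" sorts after "aé". diff_unsigned computes 0 - 233 = -233.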
    > Casts are usually a sign of evil doings.

    You can do the same thing without casts, if you like.
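Peter doesn't show the cast-free form. One plausible version lets assignment do the work, because storing a char into an unsigned char object performs the same conversion as the cast. This is an assumption about what he meant, not his code, and strcmp_nocast is a hypothetical name.

```c
/* Hypothetical cast-free variant: the conversions happen on assignment. */
int strcmp_nocast(const char *s1, const char *s2)
{
    unsigned char c1, c2;

    do {
        c1 = *s1++;   /* implicit conversion char -> unsigned char */
        c2 = *s2++;
    } while (c1 == c2 && c1 != '\0');

    /* Each comparison yields 0 or 1, so the result is -1, 0 or 1. */
    return (c1 > c2) - (c1 < c2);
}
```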

    --
    Peter



    • R. Rajesh Jeba Anbiah

      Re: Mystery: static variables & performance

      nrk <ram_nrk2000@devnull.verizon.net> wrote in message news:<YKgZb.52718$1S1.40291@nwrddc01.gnilink.net>...

      <snip>

      Ram, sorry for my late follow-up; I was not feeling well for the past
      two days. Now I see the thread has some more useful info. Thanks.

      --
      "Success = 10% sweat + 90% tears"
      If you live in USA, please support John Edwards.
      http://guideme.itgo.com/atozofc/ - "A to Z of C" Project
      Email: rrjanbiah-at-Y!com


      • CBFalconer

        Re: Mystery: static variables & performance

        Peter Nilsson wrote:
        > "CBFalconer" <cbfalconer@yahoo.com> wrote in message
        > > Peter Nilsson wrote:
        > ...
        > > > > >>> int strcmp(const char *s1, const char *s2) {
        > > > > >>> while ( *s1 == *s2 && *s1 )
        > > > > >>> ++s1, ++s2;
        > > > > >>>
        > > > > >>> return *s1 - *s2;
        > > > > >>> }
        > >...
        > > >
        > > > The most robust answer would seem to be...
        > > >
        ... snip ...
        > > >
        > > > return (unsigned char) *s1 < (unsigned char) *s2 ? -1
        > > > : (unsigned char) *s1 > (unsigned char) *s2;
        > >
        > > Why cast to unsigned char?
        >
        > Because that is the specification for strcmp().
        >
        > > If native chars are signed, I would want this routine to
        > > respect that. Casts are usually a sign of evil doings.
        >
        > If we weren't talking about strcmp, you would be free to do that.
        > But you may get some unexpected surprises, e.g. "a" > "aé".

        I still see no justification for the cast. I know of nothing that
        specifies that strings consist of unsigned chars. However, that is
        a good argument for having the system specify char to be
        unsigned. From N869:

        7.21.4.2 The strcmp function

        Synopsis
        [#1]
        #include <string.h>
        int strcmp(const char *s1, const char *s2);

        Description

        [#2] The strcmp function compares the string pointed to by
        s1 to the string pointed to by s2.

        Returns

        [#3] The strcmp function returns an integer greater than,
        equal to, or less than zero, accordingly as the string
        pointed to by s1 is greater than, equal to, or less than the
        string pointed to by s2.

        --
        Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
        Available for consulting/temporary embedded and systems.
        <http://cbfalconer.home.att.net> USE worldnet address!


        • Arthur J. O'Dwyer

          Re: Mystery: static variables & performance

          On Sat, 21 Feb 2004, CBFalconer wrote:
          >
          > Peter Nilsson wrote:
          > > "CBFalconer" <cbfalconer@yahoo.com> wrote in message
          > > > Peter Nilsson wrote:
          > > ...
          > > > > > >>> int strcmp(const char *s1, const char *s2) {
          > > > > > >>> while ( *s1 == *s2 && *s1 )
          > > > > > >>> ++s1, ++s2;
          > > > >
          > > > > return (unsigned char) *s1 < (unsigned char) *s2 ? -1
          > > > > : (unsigned char) *s1 > (unsigned char) *s2;
          > > >
          > > > Why cast to unsigned char?
          > >
          > > Because that is the specification for strcmp().
          >
          > I still see no justification for the cast.

          Look about two subsections earlier in N869, where it discusses the
          semantics of comparison functions:

          7.21.4 Comparison functions

          [#1] The sign of a nonzero value returned by the comparison
          functions memcmp, strcmp, and strncmp is determined by the
          sign of the difference between the values of the first pair
          of characters (both interpreted as unsigned char) that
          differ in the objects being compared.

          IMHO this is a silly requirement; I would *expect* memcmp to do
          unsigned comparisons and 'strcmp' to do plain char comparisons,
          but for whatever reason the C committee decided otherwise.

          HTH,
          -Arthur


          • Joe Wright

            Re: Mystery: static variables & performance

            CBFalconer wrote:
            >
            > Peter Nilsson wrote:
            > > "CBFalconer" <cbfalconer@yahoo.com> wrote in message
            > > > Peter Nilsson wrote:
            > > ...
            > > > > > >>> int strcmp(const char *s1, const char *s2) {
            > > > > > >>> while ( *s1 == *s2 && *s1 )
            > > > > > >>> ++s1, ++s2;
            > > > > > >>>
            > > > > > >>> return *s1 - *s2;
            > > > > > >>> }
            > > >...
            > > > >
            > > > > The most robust answer would seem to be...
            > > > >
            > ... snip ...
            > > > >
            > > > > return (unsigned char) *s1 < (unsigned char) *s2 ? -1
            > > > > : (unsigned char) *s1 > (unsigned char) *s2;
            > > >
            > > > Why cast to unsigned char?
            > >
            > > Because that is the specification for strcmp().
            > >
            > > > If native chars are signed, I would want this routine to
            > > > respect that. Casts are usually a sign of evil doings.
            > >
            > > If we weren't talking about strcmp, you would be free to do that.
            > > But you may get some unexpected surprises, e.g. "a" > "aé".
            >
            > I still see no justification for the cast. I know of nothing that
            > specifies that strings consist of unsigned chars. However that is
            > a good argument for having the system specify char to be
            > unsigned. From N869:
            >
            > 7.21.4.2 The strcmp function
            >
            > Synopsis
            > [#1]
            > #include <string.h>
            > int strcmp(const char *s1, const char *s2);
            >
            > Description
            >
            > [#2] The strcmp function compares the string pointed to by
            > s1 to the string pointed to by s2.
            >
            > Returns
            >
            > [#3] The strcmp function returns an integer greater than,
            > equal to, or less than zero, accordingly as the string
            > pointed to by s1 is greater than, equal to, or less than the
            > string pointed to by s2.
            >
            Don't we have a guarantee that the characters in our character set are
            positive? With signed char, that would mean the range 0..127 (ASCII)?
            I've read that EBCDIC implementations, because their characters can
            be > 127, make plain char unsigned just so that characters remain
            positive.

            If this is the case, subtracting one positive integer from another
            cannot overflow.
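For what it's worth, C99 (6.2.5) guarantees only that members of the basic execution character set are nonnegative when stored in a char. It says nothing about 0..127 or ASCII. Characters outside the basic set, such as é, can still be negative where plain char is signed. A plain char subtraction cannot overflow int in that case, but it can give an ordering that disagrees with the unsigned char rule. The following is a quick check that uses a hypothetical all_nonnegative helper and assumes é is the byte 0xE9.

```c
/* Hypothetical helper: 1 if every char of s is nonnegative when read
   as plain char, else 0. */
int all_nonnegative(const char *s)
{
    for (; *s != '\0'; ++s) {
        if (*s < 0)
            return 0;   /* only possible when plain char is signed */
    }
    return 1;
}
```

Basic characters such as letters, digits and punctuation always pass. A string containing "\xE9" fails exactly where plain char is signed.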
            --
            Joe Wright http://www.jw-wright.com
            "Everything should be made as simple as possible, but not simpler."
            --- Albert Einstein ---


            • CBFalconer

              Re: Mystery: static variables & performance

              "Arthur J. O'Dwyer" wrote:
              >
              > On Sat, 21 Feb 2004, CBFalconer wrote:
              > >
              > > Peter Nilsson wrote:
              > > > "CBFalconer" <cbfalconer@yahoo.com> wrote in message
              > > > > Peter Nilsson wrote:
              > > > ...
              > > > > > > >>> int strcmp(const char *s1, const char *s2) {
              > > > > > > >>> while ( *s1 == *s2 && *s1 )
              > > > > > > >>> ++s1, ++s2;
              > > > > >
              > > > > > return (unsigned char) *s1 < (unsigned char) *s2 ? -1
              > > > > > : (unsigned char) *s1 > (unsigned char) *s2;
              > > > >
              > > > > Why cast to unsigned char?
              > > >
              > > > Because that is the specification for strcmp().
              > >
              > > I still see no justification for the cast.
              >
              > Look about two subsections earlier in N869, where it discusses the
              > semantics of comparison functions:
              >
              > 7.21.4 Comparison functions
              >
              > [#1] The sign of a nonzero value returned by the comparison
              > functions memcmp, strcmp, and strncmp is determined by the
              > sign of the difference between the values of the first pair
              > of characters (both interpreted as unsigned char) that
              > differ in the objects being compared.
              >
              > IMHO this is a silly requirement; I would *expect* memcmp to do
              > unsigned comparisons and 'strcmp' to do plain char comparisons,
              > but for whatever reason the C committee decided otherwise.

              Aha, that justifies Peter Nilsson's attitude and shoots down
              mine. It does ensure that, when one string is a prefix of
              another, the shorter one compares as less than the longer.

              It would be nice if such encompassing clauses were referenced in
              the individual descriptions, i.e. "See also 7.21.4".

              --
              Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
              Available for consulting/temporary embedded and systems.
              <http://cbfalconer.home.att.net> USE worldnet address!



              • pete

                Re: Mystery: static variables & performance

                Peter Nilsson wrote:

                > it does not make comparisons based on unsigned char values.
                > [A requirement for the real strcmp().]

                I don't see that in the standard.

                --
                pete


                • pete

                  Re: Mystery: static variables & performance

                  pete wrote:
                  >
                  > Peter Nilsson wrote:
                  >
                  > > it does not make comparisons based on unsigned char values.
                  > > [A requirement for the real strcmp().]
                  >
                  > I don't see that in the standard.

                  OK, now I do.

                  --
                  pete


                  • Ben Pfaff

                    Re: Mystery: static variables & performance

                    pete <pfiland@mindspring.com> writes:

                    > Peter Nilsson wrote:
                    >
                    >> it does not make comparisons based on unsigned char values.
                    >> [A requirement for the real strcmp().]
                    >
                    > I don't see that in the standard.

                    7.21.4 Comparison functions

                    1 The sign of a nonzero value returned by the comparison
                    functions memcmp, strcmp, and strncmp is determined by the
                    sign of the difference between the values of the first pair
                    of characters (both interpreted as unsigned char) that
                    differ in the objects being compared.

                    --
                    "Debugging is twice as hard as writing the code in the first place.
                    Therefore, if you write the code as cleverly as possible, you are,
                    by definition, not smart enough to debug it."
                    --Brian Kernighan


                    • nrk

                      Re: Mystery: static variables & performance

                      Peter Nilsson wrote:

                      > "nrk" <ram_nrk2000@devnull.verizon.net> wrote in message
                      > news:yVgZb.52767$1S1.8086@nwrddc01.gnilink.net...
                      >> Richard Heathfield wrote:
                      >>
                      >> > R. Rajesh Jeba Anbiah wrote:
                      >> >
                      >> >> nrk <ram_nrk2000@devnull.verizon.net> wrote in message
                      >> >> news:<WiNYb.43798$1S1.37569@nwrddc01.gnilink.net>...
                      >> >>>
                      >> > <snip>
                      >> >
                      >> >>> PS: Your book, section 4.2: The WAR style example of strcmp is
                      >> >>> atrocious IMO. If you must insist on a single return, here's a
                      >> >>> clearer version of strcmp:
                      >> >>>
                      >> >>> int strcmp(const char *s1, const char *s2) {
                      >> >>> while ( *s1 == *s2 && *s1 )
                      >> >>> ++s1, ++s2;
                      >> >>>
                      >> >>> return *s1 - *s2;
                      >> >>> }
                      >> >>
                      >> >> Thanks a lot for your interest in the quality of the book. As you
                      >> >> see, it has its bug reporting corner; please don't take c.l.c as the
                      >> >> one. Thanks for your help; thanks for your understanding.
                      >> >
                      >> > Since he /did/ post his "clearer version" to comp.lang.c, you should at
                      >> > least get some feedback as to what is wrong with his correction. Do you
                      >> > see the flaw? If not, then how can you do quality control on the bug
                      >> > reports you receive?
                      >> >
                      >> > comp.lang.c is good at this sort of thing.
                      >> >
                      >> > (Hint: the problem I can see has nothing to do with the loop.)
                      >> >
                      >>
                      >> Ok, here's my take on this:
                      >>
                      >> a) sizeof(int) > 1 for hosted implementations.
                      >> http://www.google.com/groups?selm=bu...unnews.cern.ch
                      >
                      > The mere fact that Dan Pop says so does not make it so! ;)
                      >
                      > There are actual members of the C Committee who disagree on this.
                      >
                      >> So, integer overflow not an issue, yes?
                      >
                      > No. Even if sizeof(int) == 2, you can still have INT_MAX < UCHAR_MAX.
                      > [It's the limits which are important, not the byte size.]
                      >

                      Sorry, I goofed it up, but if you read the quoted thread, the idea is that
                      INT_MAX >= UCHAR_MAX for hosted implementations by implication. Can you
                      point out why there can be a disagreement on that?
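Peter's point that the limits, not sizeof(int), decide the question can be tested directly with <limits.h>. Two unsigned char values promote to int, and their difference fits, exactly when UCHAR_MAX <= INT_MAX. Otherwise they promote to unsigned int and the subtraction wraps around instead of going negative. The following is a sketch of a hypothetical uchar_cmp that picks the method at preprocessing time.

```c
#include <limits.h>

/* Hypothetical helper: compare two chars as unsigned char values. */
int uchar_cmp(char a, char b)
{
    unsigned char ua = a, ub = b;   /* conversion, as Peter suggests */
#if UCHAR_MAX <= INT_MAX
    /* Both operands promote to int; the result lies in
       [-UCHAR_MAX, UCHAR_MAX], which always fits in int. */
    return ua - ub;
#else
    /* unsigned char promotes to unsigned int here, so subtraction
       would wrap; fall back to the -1/0/1 comparison idiom. */
    return (ua > ub) - (ua < ub);
#endif
}
```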
                      >>
                      >> b) Peter's concern still remains. So, does changing the last line to:
                      >>
                      >> return *(unsigned char *)s1 - *(unsigned char *)s2;
                      >>
                      >> make it alright?
                      >
                      > No. Reading chars via an unsigned char lvalue can produce a different
                      > value to the original.
                      >

                      This is only in the presence of padding bits, right? Or is there something
                      else that I am missing here?
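Padding bits are one route but not the only one. With signed plain char on a ones' complement or sign-magnitude machine, a negative value's bit pattern differs from its modular conversion. For example, with 8-bit chars, -1 is stored as 0xFE or 0x81 respectively, yet (unsigned char)-1 is 255. The following sketch checks whether conversion and reinterpretation agree for every char value on the current implementation. The function name is hypothetical, and the sketch assumes CHAR_MAX < INT_MAX so the loop counter cannot overflow. On common two's complement systems without padding, it returns 1.

```c
#include <limits.h>

/* Hypothetical check: 1 if converting each char value to unsigned char
   matches reading its stored bytes through an unsigned char lvalue. */
int conversion_matches_reinterpretation(void)
{
    int v;

    /* Assumes CHAR_MAX < INT_MAX, so ++v cannot overflow. */
    for (v = CHAR_MIN; v <= CHAR_MAX; ++v) {
        char c = (char)v;
        unsigned char converted = (unsigned char)c;          /* value conversion */
        unsigned char reinterpreted = *(unsigned char *)&c;  /* stored bits */
        if (converted != reinterpreted)
            return 0;
    }
    return 1;
}
```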

                      -nrk.
                      > Since character constants and I/O are based on unsigned char -> int ->
                      > char _conversions_ when storing plain char strings, the correct answer
                      > (assuming no integer overflow) is to use a _conversion_ of the plain char
                      > value to unsigned char...
                      >
                      > return (unsigned char) *s1 - (unsigned char) *s2;
                      >
                      > Note that on most implementations (8-bit, 2c, no padding) there is no need
                      > to go to this extreme, although the result is the same.
                      >
                      > The most robust answer would seem to be...
                      >
                      > return (unsigned char) *s1 > (unsigned char) *s2
                      > - (unsigned char) *s1 < (unsigned char) *s2;
                      >
                      > or...
                      >
                      > return (unsigned char) *s1 < (unsigned char) *s2 ? -1
                      > : (unsigned char) *s1 > (unsigned char) *s2;
                      >
                      > --
                      > Peter

                      --
                      Remove devnull for email


                      • pete

                        Re: Mystery: static variables & performance

                        Peter Nilsson wrote:

                        > No. Even if sizeof(int) == 2, you can still have INT_MAX < UCHAR_MAX. [It's
                        > the limits which are important, not the byte size.]
                        >
                        > >
                        > > b) Peter's concern still remains. So, does changing the last line to:
                        > >
                        > > return *(unsigned char *)s1 - *(unsigned char *)s2;
                        > >
                        > > make it alright?
                        >
                        > No. Reading chars via an unsigned char lvalue
                        > can produce a different value to the original.

                        That's what's called for.

                        > Since character constants and I/O are based on
                        > unsigned char -> int -> char
                        > _conversions_ when storing plain char strings,
                        > the correct answer (assuming no integer overflow)
                        > is to use a _conversion_ of the plain char value to
                        > unsigned char...

                        Conversion is not called for.
                        The functions that use converted values have the word "converted"
                        in their function descriptions.

                        > return (unsigned char) *s1 - (unsigned char) *s2;
                        >
                        > Note that on most implementations
                        > (8-bit, 2c, no padding) there is no need
                        > to go to this extreme, although the result is the same.
                        >
                        > The most robust answer would seem to be...
                        >
                        > return (unsigned char) *s1 > (unsigned char) *s2
                        > - (unsigned char) *s1 < (unsigned char) *s2;
                        >
                        > or...
                        >
                        > return (unsigned char) *s1 < (unsigned char) *s2 ? -1
                        > : (unsigned char) *s1 > (unsigned char) *s2;

                        I'm not seeing it that way.
                        ((unsigned char) *s1), is *s1 *Converted* to unsigned char.
                        (*(unsigned char*)s1), is *s1, interpreted as unsigned char.

                        N869
                        7.21.4 Comparison functions
                        [#1] The sign of a nonzero value returned by the comparison
                        functions memcmp, strcmp, and strncmp is determined by the
                        sign of the difference between the values of the first pair
                        of characters (both interpreted as unsigned char) that
                        differ in the objects being compared.
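Under pete's reading, strcmp would access the bytes through unsigned char pointers, the same way his memchr reads through p. The following is a minimal sketch, and strcmp_interp is a hypothetical name. On two's complement implementations without padding bits, it returns the same results as Peter's conversion-based versions.

```c
/* Hypothetical strcmp that interprets each byte as unsigned char
   instead of converting each char value. */
int strcmp_interp(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* Stop at the first differing byte or at the terminator. */
    while (*p1 == *p2 && *p1 != '\0') {
        ++p1;
        ++p2;
    }
    return (*p1 > *p2) - (*p1 < *p2);   /* -1, 0 or 1 */
}
```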

                        memchr is a function which uses both converted and
                        differently interpreted values.

                        N869
                        7.21.5.1 The memchr function
                        Description
                        [#2] The memchr function locates the first occurrence of c
                        (converted to an unsigned char) in the initial n characters
                        (each interpreted as unsigned char) of the object pointed to
                        by s.

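                        #include <stddef.h>  /* size_t, NULL */

                        /* Illustrative only: a program may not define its own memchr,
                           because the name is reserved for the library. */
                        void *memchr(const void *s, int c, size_t n)
                        {
                            const unsigned char *p = s;  /* bytes interpreted as unsigned char */

                            while (n-- != 0) {
                                if (*p == (unsigned char)c) {  /* c converted to unsigned char */
                                    return (void *)p;
                                }
                                ++p;
                            }
                            return NULL;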
                        }
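The "converted" half matters when a caller passes a plain char that may be negative. The following is a minimal sketch using the standard memchr. find_e_acute is a hypothetical helper, and 0xE9 assumes 8-bit chars with é encoded as that byte. Where plain char is signed, target typically holds -23, and memchr converts it to unsigned char (233) before comparing, so the byte is still found.

```c
#include <string.h>

/* Hypothetical helper: locate the first 0xE9 byte in buf[0..n-1]. */
const char *find_e_acute(const char *buf, size_t n)
{
    char target = '\xE9';   /* may be negative where plain char is signed */

    /* target promotes to int; memchr converts it to unsigned char. */
    return memchr(buf, target, n);
}
```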

                        --
                        pete
