Should I use "char" or "unsigned char" for strings?

  • John Devereux

    Should I use "char" or "unsigned char" for strings?

    Hi,

    I would like some advice on whether I should be using plain "chars"
    for strings. I have instead been using "unsigned char" in my code (for
    embedded systems). In general the strings contain ASCII characters in
    the 0-127 range, although I had thought that I might want to use the
    128-255 range for special symbols or foreign character codes.

    This all worked OK for a long time, but a recent update to the
    compiler on my system has resulted in a lot of errors such as:

    "pointer targets in passing argument 1 of 'strcpy' differ in
    signedness"

    Basically, the compiler is now protesting about me passing strings of
    "unsigned char" to standard library functions that expect "char"
    (which seems to be most of them).

    I can rewrite my code to use plain chars. Or, I can cast the string
    pointers in the standard library function calls. Both of these will
    need quite a lot of (fairly trivial) changes. Or I expect I can turn
    the warnings off.
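
The first two options can be sketched side by side. This is only an illustration: `LABEL_SIZE`, `set_label_cast` and `set_label_plain` are hypothetical names that do not come from the code discussed in this thread.

```c
#include <string.h>

/* Hypothetical buffer size, chosen only for this sketch. */
#define LABEL_SIZE 16

/* Option 1: keep unsigned char storage and cast at each library call. */
static void set_label_cast(unsigned char dst[LABEL_SIZE], const char *src)
{
    strncpy((char *)dst, src, LABEL_SIZE - 1);
    dst[LABEL_SIZE - 1] = '\0';   /* strncpy may not terminate on truncation */
}

/* Option 2: store the string as plain char, so no cast is needed. */
static void set_label_plain(char dst[LABEL_SIZE], const char *src)
{
    strncpy(dst, src, LABEL_SIZE - 1);
    dst[LABEL_SIZE - 1] = '\0';
}
```

Both functions store the same bytes; the difference is only whether every call into <string.h> needs a pointer cast.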

    (I would think this topic must be beaten to death, but I did not see
    anything in the FAQ!).

    Thanks,

    --

    John Devereux
  • Emmanuel Delahaye

    #2
    Re: Should I use "char" or "unsigned char" for strings?

    John Devereux wrote on 28/03/05 :
    > I would like some advice on whether I should be using plain "chars"
    > for strings. I have instead been using "unsigned char" in my code (for
    > embedded systems). In general the strings contain ASCII characters in
    > the 0-127 range, although I had thought that I might want to use the
    > 128-255 range for special symbols or foreign character codes.

    Stick to char for strings. You could activate some 'make char unsigned'
    option if you need a 0-255 range. But AFAIK, it's not necessary. Values
    128..255 are encoded -1..-127 on most machines.

    --
    Emmanuel
    The C-FAQ: http://www.eskimo.com/~scs/C-faq/faq.html
    The C-library: http://www.dinkumware.com/refxc.html

    "Clearly your code does not meet the original spec."
    "You are sentenced to 30 lashes with a wet noodle."
    -- Jerry Coffin in a.l.c.c++


    • Eric Sosman

      #3
      Re: Should I use "char" or "unsigned char" for strings?



      Emmanuel Delahaye wrote:
      > John Devereux wrote on 28/03/05 :
      >
      >>I would like some advice on whether I should be using plain "chars"
      >>for strings. I have instead been using "unsigned char" in my code (for
      >>embedded systems). In general the strings contain ASCII characters in
      >>the 0-127 range, although I had thought that I might want to use the
      >>128-255 range for special symbols or foreign character codes.
      >
      >
      > Stick to char for strings. You could activate some 'make char unsigned'
      > option if you need a 0-255 range. But AFAIK, it's not necessary. Values
      > 128..255 are encoded -1..-127 on most machines.

      ITYM -128..-1 -- but the advice is sound.
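
A small sketch of that mapping, going in the direction the language fully defines: converting a `char' to `unsigned char' reduces the value modulo UCHAR_MAX + 1, so with an eight-bit `char' the values -128..-1 come out as 128..255. The helper name `char_code' is made up for this illustration.

```c
#include <limits.h>

/* Map a plain char to its character code in the range 0..UCHAR_MAX.
   Works the same whether plain char is signed or unsigned. */
static int char_code(char c)
{
    return (unsigned char)c;
}
```

On an implementation where plain `char' is signed and eight bits wide, char_code applied to a `char' holding -1 gives 255, and ordinary ASCII characters are returned unchanged.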

      The possible signedness of `char' is, IMHO, one of
      the nagging infelicities of C. It's an imperfection we
      simply have to live with, and attempts to get around it
      by type-punning with `unsigned char' aren't satisfactory.
      As Emmanuel says, use plain `char' when dealing with
      characters -- but when using the <ctype.h> functions,
      take care to cast where needed:

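      #include <ctype.h>

      /* Return a pointer to the first non-whitespace character. */
      const char *skip_whitespace(const char *string) {
          /* Cast first: isspace() needs an unsigned char value (or EOF). */
          while (isspace((unsigned char)*string))
              ++string;
          return string;
      }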

      Despite appearances, the cast is required if there's any
      chance at all of "extended" characters in the strings. I
      can't think of any other standard library functions that
      require such ugliness, so if you switch to plain `char'
      strings there shouldn't be too many places where you need
      to insert casts.
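
The same pattern applies to the <ctype.h> functions that return a character. A minimal sketch, assuming the string lives in a writable plain `char' array; the name `upcase' is invented for this example:

```c
#include <ctype.h>

/* Upper-case a plain char string in place and return it. */
static char *upcase(char *s)
{
    char *p;

    for (p = s; *p != '\0'; ++p) {
        /* Cast on the way in so negative char values are never passed to
           toupper(); convert the int result back to char on the way out. */
        *p = (char)toupper((unsigned char)*p);
    }
    return s;
}
```

The cast on the argument is the one that matters for correctness. The cast on the result only makes the narrowing from `int' back to `char' explicit.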

      --
      Eric.Sosman@sun.com


      • John Devereux

        #4
        Re: Should I use "char" or "unsigned char" for strings?

        Eric Sosman <eric.sosman@sun.com> writes:

        > Emmanuel Delahaye wrote:
        > > John Devereux wrote on 28/03/05 :
        > >
        > >>I would like some advice on whether I should be using plain "chars"
        > >>for strings. I have instead been using "unsigned char" in my code (for
        > >>embedded systems). In general the strings contain ASCII characters in
        > >>the 0-127 range, although I had thought that I might want to use the
        > >>128-255 range for special symbols or foreign character codes.
        > >
        > >
        > > Stick to char for strings. You could activate some 'make char unsigned'
        > > option if you need a 0-255 range. But AFAIK, it's not necessary. Values
        > > 128..255 are encoded -1..-127 on most machines.
        >
        > ITYM -128..-1 -- but the advice is sound.

        OK, should cover any machine I am likely to encounter.
        >
        > The possible signedness of `char' is, IMHO, one of
        > the nagging infelicities of C. It's an imperfection we
        > simply have to live with, and attempts to get around it
        > by type-punning with `unsigned char' aren't satisfactory.
        > As Emmanuel says, use plain `char' when dealing with
        > characters -- but when using the <ctype.h> functions,
        > take care to cast where needed:
        >
        > #include <ctype.h>
        > const char *skip_whitespace(const char *string) {
        >     while (isspace((unsigned char)*string))
        >         ++string;
        >     return string;
        > }
        >
        > Despite appearances, the cast is required if there's any
        > chance at all of "extended" characters in the strings. I
        > can't think of any other standard library functions that
        > require such ugliness, so if you switch to plain `char'
        > strings there shouldn't be too many places where you need
        > to insert casts.

        Great, I rarely use these anyway.

        What about conversion to and from an "int" I wonder? Some of my
        functions process a string character by character, calling another
        function with that character. This will presumably get promoted to an
        int, right? And then probably converted back to a "char" again in the
        function. As I understand it, an in-range negative "int" is guaranteed
        to get converted to the same negative "char" value. So we should be
        OK.

        --

        John Devereux


        • Eric Sosman

          #5
          Re: Should I use "char" or "unsigned char" for strings?



          John Devereux wrote:
          > [...]
          > What about conversion to and from an "int" I wonder? Some of my
          > functions process a string character by character, calling another
          > function with that character. This will presumably get promoted to an
          > int, right? And then probably converted back to a "char" again in the
          > function. As I understand it, an in-range negative "int" is guaranteed
          > to get converted to the same negative "char" value. So we should be
          > OK.

          There are three cases:

          If the function is prototyped to take a `char' argument,
          the `char' value you provide is passed to the function without
          conversion or promotion, and received just as you passed it.
          There may be behind-the-scenes magic involved (e.g., passing
          an eight-bit value in a 32-bit register), but the effect must
          be "as if" nothing happens.

          If the `char' you provide is passed to an old-style
          function (no prototype) that expects a `char' argument, the
          provided value is promoted, passed, and then "demoted" upon
          receipt. Again, the value arrives unscathed even though the
          representation may change on "exotic" hardware: if you provide
          a negative zero the function might receive a positive zero,
          but it will in any case receive a zero.

          If the `char' argument corresponds to part of the `...'
          of a variable-argument function, the value is promoted just
          as for prototypeless functions. In this case, though, you
          actually need to know the promoted type when you fetch the
          argument: `va_arg(ap, char)' is incorrect. A `char' will
          promote to `int' if `int' can represent all possible values
          a `char' might have, or to `unsigned int' otherwise. From
          your earlier posts it appears you're assuming an eight-bit
          `char' (values between -128 and 255), which fits comfortably
          in the range of `int' (at least -32767..32767, perhaps wider).
          Some systems, though, have sizeof(int)==sizeof(char)==1, and
          if `char' is unsigned on such a system it will promote to
          `unsigned int' instead of `int'.
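
The third case is the one that is easy to get wrong, so here is a rough sketch of it. It assumes the common situation where `int' can represent every `char' value, so the promoted type is `int'. The function `collect_chars' and its parameters are hypothetical, made up for this illustration.

```c
#include <stdarg.h>

/* Hypothetical helper: copy `count' characters passed through `...' into
   `out' and terminate the result. The caller must provide room for
   count + 1 chars. */
static void collect_chars(char *out, int count, ...)
{
    va_list ap;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; ++i) {
        /* Each char argument arrived promoted, so fetch it as int
           (assumption: int can hold every char value), then narrow. */
        out[i] = (char)va_arg(ap, int);
    }
    va_end(ap);
    out[count] = '\0';
}
```

A call such as collect_chars(buf, 2, c1, c2) with `char' variables c1 and c2 passes both as `int'; the function recovers the original values when it converts them back to `char'.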

          --
          Eric.Sosman@sun.com
