Multi-character constants

  • Mirco Wahab

    Multi-character constants

    After reading through some (open) Intel (CPU detection)
    C++ source (https://www.intel.com/cd/ids/develop...eng/276611.htm)
    I stumbled upon a sketchy use of multibyte characters

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    unsigned int VendorID[3] = {0, 0, 0};
    try // If CPUID instruction is supported
    {
        ...
    }
    catch (...)
    {
        ...
    }
    return (
        (VendorID[0] == 'uneG') &&
        (VendorID[1] == 'Ieni') &&
        (VendorID[2] == 'letn')
    );

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    This seems to work; gcc 4.2 emits a warning:

    "warning: multi-character character constant"

    and Visual C++ 9 says nothing at all.

    What's the matter with multibyte characters now?
    I haven't used them and would be glad to learn
    whether they are widely implemented and part of
    the standard now, or will be soon.

    gcc tells us: (http://gcc.gnu.org/onlinedocs/gcc/Ch...mentation.html)
    ...
    [Characters]
    ...
    The value of a wide character constant containing more than
    one multibyte character, or containing a multibyte character
    or escape sequence not represented in the extended execution
    character set (C90 6.1.3.4, C99 6.4.4.4).
    ...



    Regards, and thanks for clearing this up

    M.
  • James Kanze

    #2
    Re: Multi-character constants

    On Jul 9, 4:29 pm, Mirco Wahab <wa...@chemie.uni-halle.de> wrote:
    > After reading through some (open) Intel (CPU detection)
    > C++ source (https://www.intel.com/cd/ids/develop...eng/276611.htm)
    > I stumbled upon a sketchy use of multibyte characters
    > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    > unsigned int VendorID[3] = {0, 0, 0};
    > try // If CPUID instruction is supported
    > {
    >     ...
    > }
    > catch (...)
    > {
    >     ...
    > }
    > return (
    >     (VendorID[0] == 'uneG') &&
    >     (VendorID[1] == 'Ieni') &&
    >     (VendorID[2] == 'letn')
    > );
    > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    > This seems to work; gcc 4.2 emits a warning:
    > "warning: multi-character character constant"
    > and Visual C++ 9 says nothing at all.
    > What's the matter with multibyte characters now?

    First, do you mean multi-byte characters (e.g. UTF-8), or
    multicharacter literals? Your example doesn't contain any
    multi-byte characters, only multicharacter literals.

    > I haven't used them and would be glad to learn whether they are
    > widely implemented and part of the standard now, or will be soon.

    Multicharacter literals are a holdover from the original C. As
    far as I can tell, they have no use, and are of no interest
    whatsoever. And what they mean is implementation-defined. All
    of which is probably why g++ warns about them.

    Multi-byte characters are becoming more and more frequent as
    applications shift to UTF-8, for reasons of
    internationalization. True support is still spotty, but getting
    there; the next version of the standard will require it (to some
    degree; there still won't be functions like isdigit which work
    on them).
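
To make the distinction concrete, here is a minimal sketch of what "multi-byte character" means in UTF-8: one character occupying more than one `char`. The helper names `count_code_points` and `demo` are hypothetical, chosen only for this illustration, and the code assumes its input is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts characters (code points) in a UTF-8 string by
// skipping continuation bytes, which always have the bit pattern 10xxxxxx.
// Assumes the input is valid UTF-8; no validation is attempted.
inline std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// One character, two bytes: U+00E9 (e with acute accent) encoded as UTF-8.
inline void demo() {
    const std::string e_acute = "\xC3\xA9";
    assert(e_acute.size() == 2);              // two chars of storage
    assert(count_code_points(e_acute) == 1);  // but a single character
}
```

A multicharacter literal such as 'uneG' is the opposite case: several ordinary single-byte characters packed into one int.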

    > gcc tells us: (http://gcc.gnu.org/onlinedocs/gcc/Ch...mentation.html)
    > ...
    > [Characters]
    > ...
    > The value of a wide character constant containing more than
    > one multibyte character, or containing a multibyte character
    > or escape sequence not represented in the extended execution
    > character set (C90 6.1.3.4, C99 6.4.4.4).
    > ...

    Implementation defined behavior is required to be documented by
    the implementation. In this case, you've cut the only
    significant bit, a link to the implementation defined behavior,
    where you'll find:

    The compiler values a multi-character character constant
    a character at a time, shifting the previous value left
    by the number of bits per target character, and then
    or-ing in the bit-pattern of the new character truncated
    to the width of a target character. The final
    bit-pattern is given type int, and is therefore signed,
    regardless of whether single characters are signed or
    not (a slight change from versions 3.1 and earlier of
    GCC). If there are more characters in the constant than
    would fit in the target int the compiler issues a
    warning, and the excess leading characters are ignored.

    For example, 'ab' for a target with an 8-bit char would
    be interpreted as `(int) ((unsigned char) 'a' * 256 +
    (unsigned char) 'b')', and '\234a' as `(int) ((unsigned
    char) '\234' * 256 + (unsigned char) 'a')'.

    (Technically, this documentation only applies to C, I think.
    But I would be very surprised if C++ did differently.)

    But since this is implementation defined, the above is only
    valid for gcc (although it does seem to be a frequent behavior).
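
The rule quoted above can be sketched in a few lines. This is an illustration, not the Intel source: `pack_chars`, `register_value`, `is_genuine_intel` and `self_check` are hypothetical names, and the code assumes 8-bit char and that CPUID places the first character of each four-byte vendor-string chunk in the lowest byte of the register, which is what the comparison against 'uneG' in the original post implies. It deliberately uses no multicharacter literals, so it does not rely on implementation-defined values and triggers no warning.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: packs characters the way the quoted GCC documentation
// describes, i.e. shift the running value left by 8 bits and OR in the next
// character. Assumes 8-bit char and at most four characters.
inline std::uint32_t pack_chars(const char* text) {
    std::uint32_t value = 0;
    for (; *text != '\0'; ++text) {
        value = (value << 8) | static_cast<unsigned char>(*text);
    }
    return value;
}

// Hypothetical helper: the value a 32-bit register holds when four
// vendor-string bytes are stored with the first character in the lowest byte.
inline std::uint32_t register_value(const char (&text)[5]) {
    std::uint32_t value = 0;
    for (int i = 3; i >= 0; --i) {
        value = (value << 8) | static_cast<unsigned char>(text[i]);
    }
    return value;
}

// Portable stand-in for the comparison in the original post.
inline bool is_genuine_intel(const std::uint32_t (&vendor)[3]) {
    return vendor[0] == register_value("Genu") &&
           vendor[1] == register_value("ineI") &&
           vendor[2] == register_value("ntel");
}

inline void self_check() {
    // The documentation's own example: 'ab' is 'a' * 256 + 'b'.
    assert(pack_chars("ab") == 'a' * 256u + 'b');
    // Why the literal is spelled backwards: GCC-style packing of "uneG"
    // gives the same bits as "Genu" stored lowest byte first.
    assert(pack_chars("uneG") == register_value("Genu"));
}
```

Writing the text forwards ("Genu") and stating the byte order in a helper keeps the intent readable, whereas the reversed literal only works on compilers that pack characters as GCC documents.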

    --
    James Kanze (GABI Software) email: james.kanze@gmail.com
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


    • Pete Becker

      #3
      Re: Multi-character constants

      On 2008-07-10 03:44:32 -0400, James Kanze <james.kanze@gmail.com> said:

      > They were part of K&R C. Where a character literal always had
      > type int. Even in C, however, the only place I've seen them
      > used was for generating the "magic" for certain types of files
      > in very early Unix. (Presumably, the author of the code "knew"
      > what his compiler did.) They're one of those misfeatures which
      > we can't get rid of for reasons of backwards compatibility.

      They were also used to define 32-bit ID values for the Creator field in
      Macintosh data files.

      --
      Pete
      Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
      Standard C++ Library Extensions: a Tutorial and Reference"
      (www.petebecker.com/tr1book)
