size and nomenclature of integral types

  • Shailesh

    size and nomenclature of integral types

    One problem I've been wrestling with for a long time is how to use the
    C++ integral data types, vis-a-vis their size. The C++ rules
    guarantee that a char is at least 1 bytes, a short and int at least 2
    bytes, and a long at least 4 bytes. The rules also define a size
    precedence to the types. In Stroustrup's book, it says that all type
    sizes are multiples of char, "so by definition the size of a char is
    1." According to the rules, that means 1 unit which can be any number
    of bytes.

    Why should I trust the compiler to optimize the memory usage of the
    types behind my back? As for portability, wouldn't fixed,
    unambiguously-sized types be much more portable? Doesn't the ambiguity
    open the door for me on my system X with compiler Y to rely on its
    Z-byte representation of int? And if system-dependent optimization is
    desired, wouldn't it be easier to do with fixed-size types instead?

    One of my gripes is that the terminology is unclear. New programmers
    can be especially confused. For example, 'short' and 'long' are
    relative adjectives, and they don't say how big or at least how big.
    The other extreme are the names like __int8, __int16, and __int32 in
    MSVC++. Wouldn't I be much less likely to use something called __int8
    to count numbers over 255, than I would something called char? On the
    other hand, these keywords fix the size of the type and allow no room
    for the compiler to optimize.

    If I could invent new types, I would name them something like:

    uint8, uint16, uint24, uintN, ... (unsigned integer types)
    sint8, sint16, sint24, sintN, ... (signed integer types)

    where N is any multiple of 8 greater than 0 (i.e. arbitrary precision
    types would be built-in.) I feel the signed/unsigned aspect is better
    part of the keyword, and not separate and optional. The Mozilla
    sources are instructive in that their cross-platform code implements
    macros following a similar convention; but macros are like pollution.

    I'd further have a new keyword like "allowopt", which when placed
    after the type keyword grants access to the compiler to optimize the
    memory allocation of the type. For example, when I write "uint16
    allowopt myCounter;", then I would unambiguously be declaring, "Give me
    a 16-bit, unsigned, integer called myCounter whose size the compiler
    may optimize."

    In most compilers, the default setting would be to enable optimization
    for all the declarations, and a pragma could turn it off. I have
    suspicions about why things are the way they are, but I'd like to hear
    the experts' opinions.

  • Ben Measures

    #2
    Re: size and nomenclature of integral types

    Shailesh wrote:
    > One problem I've been wrestling with for a long time is how to use the
    > C++ integral data types, vis-a-vis their size. The C++ rules guarantee
    > that a char is at least 1 bytes, a short and int at least 2 bytes, and a
    > long at least 4 bytes. The rules also define a size precedence to the
    > types. In Stroustrup's book, it says that all type sizes are multiples
    > of char, "so by definition the size of a char is 1." According to the
    > rules, that means 1 unit which can be any number of bytes.
    >
    > Why should I trust the compiler to optimize the memory usage of the
    > types behind my back?

    That question can have one of two meanings IMO.

    Q1: How do I know the compiler won't introduce an error in its optimisation?
    A: You don't. You'll just have to trust its optimisations or disable them.

    Q2: How do I know the compiler is coming up with the best, most
    optimised code?
    A: You don't. If you want assurance, write in assembly code (with many
    years' experience behind you).

    Trivial low-level optimisations like these have minuscule impact
    compared to algorithmic optimisations.

    > As for portability, wouldn't fixed,
    > unambiguously-sized types be much more portable? Doesn't the ambiguity
    > open the door for me on my system X with compiler Y to rely on its
    > Z-byte representation of int? And if system-dependent optimization is
    > desired, wouldn't it be easier to do with fixed-size types instead?

    The C++ standard specifies the minimum ranges that integral types must
    support. If you consider these to be your maximum ranges then your code
    will definitely be portable in that respect. (Note though, that the C
    standard library provides ways of getting the exact ranges.)

    In any case, you should try IMHO to avoid the concepts of sizes in bits
    and bytes when programming in C++ and instead think in higher-level
    terms of ranges.
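
To make the range-based approach concrete, here is a minimal C++ sketch. It assumes only what the standard guarantees for int (at least -32767 to 32767) and uses std::numeric_limits from <limits> to query the exact local range. The helper names fits_in_portable_int, fits_in_local_int and to_portable_int are invented for this illustration.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: true if `value` lies inside the range that every
// conforming implementation must support for int (-32767..32767).
bool fits_in_portable_int(long value) {
    return value >= -32767L && value <= 32767L;
}

// Hypothetical helper: true if `value` fits in int on this particular
// implementation, using the exact limits reported by <limits>.
bool fits_in_local_int(long value) {
    return value >= std::numeric_limits<int>::min() &&
           value <= std::numeric_limits<int>::max();
}

// Hypothetical helper: narrows to int only after checking the portable range.
int to_portable_int(long value) {
    assert(fits_in_portable_int(value));
    return static_cast<int>(value);
}
```

Code that only ever passes values accepted by fits_in_portable_int should behave the same wherever it is compiled, while fits_in_local_int shows one way to ask for the exact range of the implementation at hand.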

    --
    Ben Measures
    Software programming, Internet design/programming, Gaming freak.

    http://ben.measures.org.uk - when I find time

      • Joe C

        #4
        Re: size and nomenclature of integral types

        "Shailesh" <humbads1@hotmail.com> wrote in message
        news:CNubc.114082$8G2.71202@fe3.columbus.rr.com...
        > One problem I've been wrestling with for a long time is how to use the
        > C++ integral data types, vis-a-vis their size.

        Try using the standard C (not C++) header:
        #include <stdint.h>
        It came on board with C99.

        This shows the names it allows.

        My experience is that this header can be used in C++ programs with no
        problems in gcc-based compilers.
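
For readers who have not seen the header, a small sketch of how its names might be used from C++ follows. It assumes a C++11 or later compiler (for static_assert) that ships <stdint.h>, and a platform that provides the exact-width types uint8_t, uint16_t and int32_t, which is common but not guaranteed. The helpers wrap_add16 and to_u8 are made up for the example.

```cpp
#include <cassert>
#include <climits>
#include <stdint.h>

// Where int32_t is provided, it has exactly 32 bits and no padding bits.
static_assert(sizeof(int32_t) * CHAR_BIT == 32, "int32_t is exactly 32 bits");

// Hypothetical helper: adds two 16-bit counters with wrap-around. The
// operands are promoted for the addition; converting the sum back to
// uint16_t reduces it modulo 65536.
uint16_t wrap_add16(uint16_t a, uint16_t b) {
    return static_cast<uint16_t>(a + b);
}

// Hypothetical helper: narrows to 8 bits, refusing values above 255.
uint8_t to_u8(unsigned value) {
    assert(value <= 255u);
    return static_cast<uint8_t>(value);
}
```

The names state the width and signedness up front, which is close to what the original post asked for.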


          • Jack Klein

            #6
            Re: size and nomenclature of integral types

            On Sat, 03 Apr 2004 08:42:10 GMT, Shailesh <humbads1@hotmail.com>
            wrote in comp.lang.c++:

            > One problem I've been wrestling with for a long time is how to use the
            > C++ integral data types, vis-a-vis their size. The C++ rules
            > guarantee that a char is at least 1 bytes, a short and int at least 2
            > bytes, and a long at least 4 bytes.

            Your statement above is completely incorrect, because you are making
            the common mistake of confusing the word "byte" with the word "octet".

            In C and C++, the word "byte" is by definition the size of a
            character, and is at least 8 bits in width but may be wider. A char
            does not contain "at least 1 bytes", it contains exactly one byte,
            although that may be larger than one octet. A byte in C and C++ may
            have more than 8 bits.

            C++ does not guarantee that short and int are at least two bytes,
            although they must be at least two octets. Likewise with long.

            There are architectures where char contains more than 8 bits, mostly
            digital signal processors. On one such DSP, the minimum addressable
            unit is 16 bits, that is one byte has 16 bits. The character, short,
            and int types all contain 16 bits and their sizeof is 1. Another only
            addresses 32-bit quantities. All of the integer types, from char
            through long, are 32 bits, and all are exactly one byte.
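
The byte/octet distinction can be checked directly in code. The sketch below assumes C++11 or later; CHAR_BIT from <climits> gives the number of bits in a byte on the implementation at hand. The helpers storage_bits and bytes_for_bits are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// sizeof(char) is 1 by definition: the byte is the unit sizeof counts in.
static_assert(sizeof(char) == 1, "sizeof(char) is always 1");
// A byte has at least 8 bits, but it may have more (16 or 32 on some DSPs).
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical helper: number of bits of storage occupied by T
// (sizeof counts bytes, CHAR_BIT gives bits per byte).
template <typename T>
constexpr std::size_t storage_bits() {
    return sizeof(T) * static_cast<std::size_t>(CHAR_BIT);
}

// Hypothetical helper: number of bytes (not octets) needed to hold `bits` bits.
std::size_t bytes_for_bits(std::size_t bits) {
    assert(bits > 0);
    const std::size_t per_byte = static_cast<std::size_t>(CHAR_BIT);
    return (bits + per_byte - 1) / per_byte;
}
```

Where a byte is 8 bits, bytes_for_bits(16) yields 2; on a 16-bit DSP like the one described above it would yield 1, which is why sizeof reports 1 for short and int there.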

            > The rules also define a size
            > precedence to the types. In Stroustrup's book, it says that all type
            > sizes are multiples of char, "so by definition the size of a char is
            > 1." According to the rules, that means 1 unit which can be any number
            > of bytes.

            No, the size of a char in C or C++ is exactly one byte, no matter how
            many bits it contains. You are still making the common mistake of
            assuming that the term "byte" means exactly 8 bits, which it most
            certainly does not. Especially in C and C++, where by definition it
            does not.

            > Why should I trust the compiler to optimize the memory usage of the
            > types behind my back? As for portability, wouldn't fixed,
            > unambiguously-sized types be much more portable? Doesn't the ambiguity
            > open the door for me on my system X with compiler Y to rely on its
            > Z-byte representation of int? And if system-dependent optimization is
            > desired, wouldn't it be easier to do with fixed-size types instead?

            As others have already pointed out, C++ does not specify the size of
            any type in bytes, except for the character types. What it does
            specify is the range of values each type must be able to hold. If you
            stay within the minimum range of values for a type, it will be
            portable to all applications.

            > One of my gripes is that the terminology is unclear. New programmers
            > can be especially confused. For example, 'short' and 'long' are
            > relative adjectives, and they don't say how big or at least how big.
            > The other extreme are the names like __int8, __int16, and __int32 in
            > MSVC++. Wouldn't I be much less likely to use something called __int8
            > to count numbers over 255, than I would something called char? On the
            > other hand, these keywords fix the size of the type and allow no room
            > for the compiler to optimize.
            >
            > If I could invent new types, I would name them something like:
            >
            > uint8, uint16, uint24, uintN, ... (unsigned integer types)
            > sint8, sint16, sint24, sintN, ... (signed integer types)

            As already pointed out, C's <stdint.h> provides this for hardware
            platforms that actually support these exact sizes. The <stdint.h> for
            the 16-bit DSP I am currently working with does not typedef the int8_t
            and uint8_t types because they do not exist on the hardware.
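
Because the exact-width types are optional, portable code sometimes falls back to a "least" type when they are missing. A minimal sketch, assuming C++11's <cstdint>: the standard defines the INT8_MAX macro only when int8_t exists, so the macro can act as a feature test. The alias small_int and the helper to_small are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// INT8_MAX is defined only if the implementation provides int8_t.
#if defined(INT8_MAX)
typedef std::int8_t small_int;        // exactly 8 bits
#else
typedef std::int_least8_t small_int;  // smallest type with at least 8 bits
#endif

// Hypothetical helper: narrows to small_int, accepting only -127..127,
// the range that both alternatives above are guaranteed to hold.
small_int to_small(int v) {
    assert(v >= -127 && v <= 127);
    return static_cast<small_int>(v);
}
```

On a DSP like the one mentioned above, the #else branch would be taken and small_int would simply be wider than 8 bits.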

            > where N is any multiple of 8 greater than 0 (i.e. arbitrary precision
            > types would be built-in.) I feel the signed/unsigned aspect is better
            > part of the keyword, and not separate and optional. The Mozilla
            > sources are instructive in that their cross-platform code implements
            > macros following a similar convention; but macros are like pollution.
            >
            > I'd further have a new keyword like "allowopt", which when placed
            > after the type keyword grants access to the compiler to optimize the
            > memory allocation of the type. For example, when I write "uint16
            > allowopt myCounter;", then I would unambiguously be declaring, "Give me
            > a 16-bit, unsigned, integer called myCounter whose size the compiler
            > may optimize."
            >
            > In most compilers, the default setting would be to enable optimization
            > for all the declarations, and a pragma could turn it off. I have
            > suspicions about why things are the way they are, but I'd like to hear
            > the experts' opinions.

            You really need to do a net search for stdint.h, or get and read a
            copy of the current C standard. All implementations are required to
            provide typedefs for types that eliminate the need for the rather
            clumsy concept of a keyword like "allowopt".

            You can choose required types that are either the smallest or fastest
            to hold at least 8, 16, 32, and 64 bits, and an implementation is free
            to provide other widths if it supports them.
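
The "allowopt" idea maps fairly closely onto the "least" and "fast" families. A rough sketch, assuming C++11's <cstdint> (the same names come from C99's <stdint.h>): uint_least16_t asks for the smallest type with at least 16 bits, uint_fast16_t for the one the implementation considers fastest. The function sum_samples is invented for illustration.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Both families are required on every implementation, and both promise
// at least the requested width.
static_assert(sizeof(std::uint_least16_t) * CHAR_BIT >= 16, "at least 16 bits");
static_assert(sizeof(std::uint_fast16_t) * CHAR_BIT >= 16, "at least 16 bits");

// Hypothetical helper: sums `count` samples. The samples use the "least"
// type for compact storage; the loop counter uses the "fast" type, leaving
// its actual size up to the implementation.
std::uint_least32_t sum_samples(const std::uint_least16_t* samples,
                                std::uint_fast16_t count) {
    assert(samples != nullptr || count == 0);
    std::uint_least32_t total = 0;
    for (std::uint_fast16_t i = 0; i < count; ++i) {
        total += samples[i];
    }
    return total;
}
```

Declaring a counter as uint_fast16_t says roughly what "uint16 allowopt myCounter;" was meant to say: at least 16 bits, unsigned, with the final size chosen by the implementation.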

            The features of C's <stdint.h> will almost certainly be included in
            the next revision of the C++ standard.

            --
            Jack Klein
            Home: http://JK-Technology.Com
            FAQs for
            comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
            comp.lang.c++ http://www.parashift.com/c++-faq-lite/
            alt.comp.lang.learn.c-c++

              • Shailesh

                #8
                Re: size and nomenclature of integral types

                Jack Klein wrote:
                > As others have already pointed out, C++ does not specify the size of
                > any type in bytes, except for the character types. What it does
                > specify is the range of values each type must be able to hold. If you
                > stay within the minimum range of values for a type, it will be
                > portable to all applications.

                You're right that I had no idea a byte could be other than 8 bits. I
                also hadn't heard of the stdint.h header. I browsed it online, and it
                includes exactly the kinds of things I was looking for. I feel that a
                standard header like this would be far better than rolling one's own
                fixed-size types. Thank you for pointing it out. The fixed-width
                types are very helpful for tightly controlling data representation in
                files, memory, and network traffic. On the other hand, with so many
                flavors of CPU around, I can see how size is less meaningful in that
                context.
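
As an illustration of controlling data representation in files or on the wire, here is a small sketch that writes a 32-bit value as four octets in big-endian order, regardless of the host's byte order. It assumes C++11's <cstdint> and a platform that provides uint32_t; put_u32_be and get_u32_be are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: stores `value` in out[0..3], most significant
// octet first. Shifts work on the value, not on memory, so the result
// does not depend on the host's endianness.
void put_u32_be(std::uint32_t value, unsigned char* out) {
    assert(out != nullptr);
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFFu);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFFu);
    out[2] = static_cast<unsigned char>((value >> 8) & 0xFFu);
    out[3] = static_cast<unsigned char>(value & 0xFFu);
}

// Hypothetical helper: reads back four octets written by put_u32_be.
// Each element is masked to 8 bits in case a byte is wider than an octet.
std::uint32_t get_u32_be(const unsigned char* in) {
    assert(in != nullptr);
    return (static_cast<std::uint32_t>(in[0] & 0xFFu) << 24) |
           (static_cast<std::uint32_t>(in[1] & 0xFFu) << 16) |
           (static_cast<std::uint32_t>(in[2] & 0xFFu) << 8) |
            static_cast<std::uint32_t>(in[3] & 0xFFu);
}
```

The fixed-width type pins down the value's range, while the explicit shifts pin down the layout; neither relies on how the CPU happens to store integers.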
