Proper way to do casts while avoiding aliasing issues

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • liljencrantz@gmail.com

    Proper way to do casts while avoiding aliasing issues

    Hi,

    I have a piece of code that uses hashtables to store pointers to
    various bits of data. The hashtable sees all pointers as const void *,
    while the application obviously uses various other pointer types for
    the data. I've run into a warning with the following code:

    void hash_remove( hash_table_t *h,
    const void *key,
    const void **old_key,
    const void **old_data );

    void function_remove ( const wchar_t *name )
    {
    void *key;
    function_data_t *d;
    event_t ev;

    hash_remove( &function,
    name,
    (const void **) &key,
    (const void **)&d );

    ....
    }

    Compiling it using GCC 4.1 results in the following warning:

    gcc -g -O2 -std=c99 -fno-optimize-sibling-calls -Wall -D _GNU_SOURCE
    -c -o function.o function.c function.c: In function
    'function_remov e':
    function.c:204: warning: dereferencing type-punned pointer will break
    strict-aliasing rules

    The call to hash_remove is what is generating the warning.

    My understanding about why this is a problem may be a bit blurry. In
    general, using typecasting and pointer dereferencing does not seem to
    be proper C99, and may end up breaking the compilers assumptions about
    (non)aliasing of pointers of different types, and I suppose that is why
    GCC is a bit unhappy here.

    I can get around this by using a union of a const void * and the real
    datatype, but this seems like a hacksih solution. Any pointers to the
    proper way to do this type of typecasting, so as to avoid aliasing
    problems would be welcome.

    Also, I noticed that the comp.lang.c FAQ at
    www.eskimo.com/~scs/c-faq.com/ returns an access error, and at least
    the google cache of it contains no mention of aliasing. Perhaps a few
    questions about aliasing could be added to the FAQ as well?

    --
    Axel

  • Barry Schwarz

    #2
    Re: Proper way to do casts while avoiding aliasing issues

    On 25 Mar 2006 05:30:42 -0800, liljencrantz@gm ail.com wrote:
    [color=blue]
    >Hi,
    >
    >I have a piece of code that uses hashtables to store pointers to
    >various bits of data. The hashtable sees all pointers as const void *,
    >while the application obviously uses various other pointer types for
    >the data. I've run into a warning with the following code:
    >
    >void hash_remove( hash_table_t *h,
    > const void *key,
    > const void **old_key,
    > const void **old_data );
    >
    >void function_remove ( const wchar_t *name )
    >{
    > void *key;
    > function_data_t *d;
    > event_t ev;
    >
    > hash_remove( &function,
    > name,
    > (const void **) &key,
    > (const void **)&d );
    >
    >...
    >}
    >
    >Compiling it using GCC 4.1 results in the following warning:
    >
    >gcc -g -O2 -std=c99 -fno-optimize-sibling-calls -Wall -D _GNU_SOURCE
    >-c -o function.o function.c function.c: In function
    >'function_remo ve':
    >function.c:204 : warning: dereferencing type-punned pointer will break
    >strict-aliasing rules
    >
    >The call to hash_remove is what is generating the warning.
    >
    >My understanding about why this is a problem may be a bit blurry. In
    >general, using typecasting and pointer dereferencing does not seem to
    >be proper C99, and may end up breaking the compilers assumptions about
    >(non)aliasin g of pointers of different types, and I suppose that is why
    >GCC is a bit unhappy here.
    >[/color]
    Your prototype says that the first argument to hash_remove will have
    type hash_table_t*. In your calling statement, it has type
    function_data_t **. Are these really compatible?


    Remove del for email

    Comment

    • Keith Thompson

      #3
      Re: Proper way to do casts while avoiding aliasing issues

      liljencrantz@gm ail.com writes:
      [...][color=blue]
      > Also, I noticed that the comp.lang.c FAQ at
      > www.eskimo.com/~scs/c-faq.com/ returns an access error, and at least
      > the google cache of it contains no mention of aliasing. Perhaps a few
      > questions about aliasing could be added to the FAQ as well?[/color]

      The URL works for me, but the clc FAQ has moved to <http://www.c-faq.com>.

      --
      Keith Thompson (The_Other_Keit h) kst-u@mib.org <http://www.ghoti.net/~kst>
      San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
      We must do something. This is something. Therefore, we must do this.

      Comment

      • CBFalconer

        #4
        Re: Proper way to do casts while avoiding aliasing issues

        liljencrantz@gm ail.com wrote:[color=blue]
        >
        > I have a piece of code that uses hashtables to store pointers to
        > various bits of data. The hashtable sees all pointers as const
        > void *, while the application obviously uses various other pointer
        > types for the data. I've run into a warning with the following code:
        >
        > void hash_remove( hash_table_t *h,
        > const void *key,
        > const void **old_key,
        > const void **old_data );
        >
        > void function_remove ( const wchar_t *name )
        > {
        > void *key;
        > function_data_t *d;
        > event_t ev;
        >
        > hash_remove( &function,
        > name,
        > (const void **) &key,
        > (const void **)&d );
        > ...
        > }
        >
        > Compiling it using GCC 4.1 results in the following warning:[/color]

        You may want to look at how the interface to hashlib works with
        generic (i.e. void*) pointers. See:

        <http://cbfalconer.home .att.net/download/hashlib.zip>

        --
        Some informative links:
        news:news.annou nce.newusers
        Latest news coverage, email, free stock quotes, live scores and video are just the beginning. Discover more every day at Yahoo!






        Comment

        • Rod Pemberton

          #5
          Re: Proper way to do casts while avoiding aliasing issues


          <liljencrantz@g mail.com> wrote in message
          news:1143293442 .573269.244570@ v46g2000cwv.goo glegroups.com.. .[color=blue]
          > Hi,
          >
          > I have a piece of code that uses hashtables to store pointers to
          > various bits of data. The hashtable sees all pointers as const void *,
          > while the application obviously uses various other pointer types for
          > the data. I've run into a warning with the following code:
          >
          > void hash_remove( hash_table_t *h,
          > const void *key,
          > const void **old_key,
          > const void **old_data );
          >
          > void function_remove ( const wchar_t *name )
          > {
          > void *key;
          > function_data_t *d;
          > event_t ev;
          >
          > hash_remove( &function,
          > name,
          > (const void **) &key,
          > (const void **)&d );
          >
          > ...
          > }
          >
          > Compiling it using GCC 4.1 results in the following warning:
          >
          > gcc -g -O2 -std=c99 -fno-optimize-sibling-calls -Wall -D _GNU_SOURCE
          > -c -o function.o function.c function.c: In function
          > 'function_remov e':
          > function.c:204: warning: dereferencing type-punned pointer will break
          > strict-aliasing rules
          >
          > The call to hash_remove is what is generating the warning.[/color]

          Yes, but you don't say which argument is causing the problem. From what
          you've posted, 'name', 'key', or 'd' is just as likely as 'function' to be
          creating the error... Which brings up the question: "Why haven't you cast
          'function' and 'name'?"
          [color=blue]
          > My understanding about why this is a problem may be a bit blurry. In
          > general, using typecasting and pointer dereferencing does not seem to
          > be proper C99, and may end up breaking the compilers assumptions about
          > (non)aliasing of pointers of different types, and I suppose that is why
          > GCC is a bit unhappy here.[/color]





          Rod Pemberton


          Comment

          • Chris Torek

            #6
            Re: Proper way to do casts while avoiding aliasing issues

            In article <1143293442.573 269.244570@v46g 2000cwv.googleg roups.com>
            <liljencrantz@g mail.com> wrote:[color=blue]
            >I have a piece of code that uses hashtables to store pointers to
            >various bits of data. The hashtable sees all pointers as const void *,
            >while the application obviously uses various other pointer types for
            >the data. I've run into a warning with the following code:
            >
            >void hash_remove( hash_table_t *h,
            > const void *key,
            > const void **old_key,
            > const void **old_data );[/color]

            This says that hash_remove needs:

            - a pointer to a hash table, "h"
            - a key, "key"
            - a pointer to a "const void *", old_key
            - a pointer to a "const void *", old_data

            Let us assume, for the sake of argument, that "const void *" is a
            special datatype that uses 16 bytes to hold lots of extra information
            about the target, while other pointers (where the compiler can make
            more assumptions about the target) use just 4 bytes. So a *pointer
            to* "const void *" is only 4 bytes long but it points to 16 bytes.
            Meanwhile let us assume that a "double" is 8 bytes. Then:

            double x = 3.1415926535897 93238462;
            const void *p = &x;
            const void **q = &p;

            might look like this in memory:

            x p
            +---------------+ +------------------------------------------+
            | 3.415926... | <-|---------------------* |
            +---------------+ +------------------------------------------+
            ^
            |
            |
            +-------+
            q | * |
            +-------+

            Now we have this:
            [color=blue]
            >void function_remove ( const wchar_t *name ) {
            > void *key;
            > function_data_t *d;
            > event_t ev;
            >
            > hash_remove( &function,[/color]

            Presumably "&function" is the address of a hash table, and OK (although
            we have not seen a declaration for the identifier "function") .
            [color=blue]
            > name,[/color]

            Here "name" has type "const wchar_t *", but this is converted to a
            32-byte "const void *" and the resulting value is passed in. This
            seems fine.
            [color=blue]
            > (const void **) &key,[/color]

            Here we have a possible problem. The variable "key" is just "void *",
            not "const void *". If "key" were "const void *", &key would have
            type "const void **" and be a 4-byte pointer to a 16-byte pointer.
            The 16-byte pointer currently contains garbage (is uninitialized)
            but this would presumably be OK -- hash_remove is going to fill it
            in.

            As it happens, the C standard has some fuzzy wording about how
            const qualifiers should not affect the underlying representation,
            so if "const void *" is a 16-byte thing, plain "void *" should
            also be a 16-byte thing. So &key should be a 4-byte pointer to
            a 16-byte pointer, and this should actually work. It would be
            better to make "key" a "const void *" and remove the cast, though.
            [color=blue]
            > (const void **)&d );[/color]

            Here, on the other hand, we have a definite problem.

            Remember that, except for "void *", we said that pointers were all
            four bytes. Here "d" has type "function_data_ t *", so it is a
            4-byte pointer. &d is a 4-byte pointer to a 4-byte pointer:

            &d d
            +-------+ +-------+
            | *---|--> | *---|-----? [we have no idea where this points]
            +-------+ +-------+

            but hash_remove is expecting a 4-byte pointer to a 16-byte pointer:

            old_data
            +-------+ +-------+.............. .......+
            | *---|--> | *---|
            +-------+ +-------+.............. .......+
            \_____/ \______________ ______/
            OK expected, but not there
            [color=blue]
            >Compiling it using GCC 4.1 results in the following warning:
            >
            >gcc -g -O2 -std=c99 -fno-optimize-sibling-calls -Wall -D _GNU_SOURCE
            >-c -o function.o function.c function.c: In function
            >'function_remo ve':
            >function.c:204 : warning: dereferencing type-punned pointer will break
            >strict-aliasing rules
            >
            >The call to hash_remove is what is generating the warning.
            >
            >My understanding about why this is a problem may be a bit blurry. In
            >general, using typecasting and pointer dereferencing does not seem to
            >be proper C99, and may end up breaking the compilers assumptions about
            >(non)aliasin g of pointers of different types, and I suppose that is why
            >GCC is a bit unhappy here.[/color]

            Yes. Or, if pointers to different data types have different sizes,
            you could run into a situation in which hash_remove() updates all
            16 bytes of your 4-byte pointer.
            [color=blue]
            >I can get around this by using a union of a const void * and the real
            >datatype, but this seems like a hacksih solution. Any pointers to the
            >proper way to do this type of typecasting, so as to avoid aliasing
            >problems would be welcome.[/color]

            In general, the "right way to cast pointers" is not to do it at all.

            Consider the following small rewrite of function_remove ():

            void function_remove (const wchar_t *name) {
            const void *key;
            const void *old_data;
            const function_data_t *d;
            event_t ev;

            /* this fills in "key" and "old_data" */
            hash_remove(&fu nction, name, &key, &old_data);

            /*
            * We do not use "key" but we do use "old_data".
            * The "const void *" value that hash_remove() filled
            * in is actually a read-only pointer to function_data_t
            * (after being converted to "void *"), so we convert
            * it back from "void *" to "function_data_ t *", all
            * const-qualified, i.e., read-only.
            */
            d = old_data;

            ... more code here, presumably ...
            }

            It is probably the case that hash_remove() does not modify the
            data, but the data themselves are not necessarily read-only. If
            this is true, you will have fewer headaches if you remove most of
            the "const"s. Otherwise you will have places at which you are
            forced to use casts to cast away the const-ness. This means that
            "const" is not giving you any protection after all, and is therefore
            useless. (This is usually true in C, which gets "const" quite
            wrong. C++ is better about it, although still "not quite right".
            It would all work right, and I think be more in the "spirit of C",
            if const had been a storage-class modifier instead of a type-qualifier.
            The other alternative is to have dynamic typing, as in real OO
            languages, so that the type of a return parameter corresponds to
            the type of the input parameter. This is sort of like C++ templates,
            except not ugly :-) . But oh well.)

            Retaining just the one "useful or at least harmless" const gives:

            void hash_remove(has h_table_t *h, const void *key,
            void **old_key, void **old_data);

            void function_remove (const wchar_t *name) {
            void *key;
            void *old_data;
            function_data_t *d;
            event_t ev;

            hash_remove(&fu nction, name, &key, &old_data);
            d = old_data;

            ... more code here, presumably ...
            }

            In either case, though, the heart of the matter is to supply a
            correct "&old_data" , then derive the new value for "d" by *converting*
            the value that hash_remove() fills in. It is true that, on many
            (or perhaps even most) machines, this "conversion " is just "take
            the bits as-is", but by doing an actual conversion -- even if the
            compiler uses a total of 0 (zero) machine instructions to implement
            that conversion -- you get guaranteed-to-work code. On machines
            where the "conversion " is a no-op, if the compiler is half-decent
            (and gcc is), you get the zero-extra-instructions you would if you
            wrote non-portable, not-guaranteed code. Thus, there is no penalty
            for doing it right.
            --
            In-Real-Life: Chris Torek, Wind River Systems
            Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
            email: forget about it http://web.torek.net/torek/index.html
            Reading email is like searching for food in the garbage, thanks to spammers.

            Comment

            • liljencrantz@gmail.com

              #7
              Re: Proper way to do casts while avoiding aliasing issues


              Thank you for taking the time to write this very helpful and
              enlightening response. May I suggest that it is in some way added to
              the FAQ?

              Chris Torek wrote:[color=blue]
              > In article <1143293442.573 269.244570@v46g 2000cwv.googleg roups.com>[/color]
              [...][color=blue]
              > Let us assume, for the sake of argument, that "const void *" is a
              > special datatype that uses 16 bytes to hold lots of extra information
              > about the target, while other pointers (where the compiler can make
              > more assumptions about the target) use just 4 bytes. So a *pointer
              > to* "const void *" is only 4 bytes long but it points to 16 bytes.
              > Meanwhile let us assume that a "double" is 8 bytes. Then:
              >
              > double x = 3.1415926535897 93238462;
              > const void *p = &x;
              > const void **q = &p;
              >
              > might look like this in memory:
              >
              > x p
              > +---------------+ +------------------------------------------+
              > | 3.415926... | <-|---------------------* |
              > +---------------+ +------------------------------------------+
              > ^
              > |
              > |
              > +-------+
              > q | * |
              > +-------+[/color]

              Do any systems exist that actually _do_ this? (I want to fix the
              problem regardless, I'm just curious about if any system actually use
              this extre freedom provided by C)
              [color=blue]
              >
              > Now we have this:
              >[color=green]
              > >void function_remove ( const wchar_t *name ) {
              > > void *key;
              > > function_data_t *d;
              > > event_t ev;
              > >
              > > hash_remove( &function,[/color]
              >
              > Presumably "&function" is the address of a hash table, and OK (although
              > we have not seen a declaration for the identifier "function") .[/color]

              My bad. Your assumption is correct.
              [color=blue]
              >[color=green]
              > > name,[/color]
              >
              > Here "name" has type "const wchar_t *", but this is converted to a
              > 32-byte "const void *" and the resulting value is passed in. This
              > seems fine.
              >[color=green]
              > > (const void **) &key,[/color]
              >
              > Here we have a possible problem. The variable "key" is just "void *",
              > not "const void *". If "key" were "const void *", &key would have
              > type "const void **" and be a 4-byte pointer to a 16-byte pointer.
              > The 16-byte pointer currently contains garbage (is uninitialized)
              > but this would presumably be OK -- hash_remove is going to fill it
              > in.
              >
              > As it happens, the C standard has some fuzzy wording about how
              > const qualifiers should not affect the underlying representation,
              > so if "const void *" is a 16-byte thing, plain "void *" should
              > also be a 16-byte thing. So &key should be a 4-byte pointer to
              > a 16-byte pointer, and this should actually work. It would be
              > better to make "key" a "const void *" and remove the cast, though.
              >[color=green]
              > > (const void **)&d );[/color]
              >
              > Here, on the other hand, we have a definite problem.
              >
              > Remember that, except for "void *", we said that pointers were all
              > four bytes. Here "d" has type "function_data_ t *", so it is a
              > 4-byte pointer. &d is a 4-byte pointer to a 4-byte pointer:
              >
              > &d d
              > +-------+ +-------+
              > | *---|--> | *---|-----? [we have no idea where this points]
              > +-------+ +-------+
              >
              > but hash_remove is expecting a 4-byte pointer to a 16-byte pointer:
              >
              > old_data
              > +-------+ +-------+.............. .......+
              > | *---|--> | *---|
              > +-------+ +-------+.............. .......+
              > \_____/ \______________ ______/
              > OK expected, but not there
              >[color=green]
              > >Compiling it using GCC 4.1 results in the following warning:
              > >
              > >gcc -g -O2 -std=c99 -fno-optimize-sibling-calls -Wall -D _GNU_SOURCE
              > >-c -o function.o function.c function.c: In function
              > >'function_remo ve':
              > >function.c:204 : warning: dereferencing type-punned pointer will break
              > >strict-aliasing rules
              > >
              > >The call to hash_remove is what is generating the warning.
              > >
              > >My understanding about why this is a problem may be a bit blurry. In
              > >general, using typecasting and pointer dereferencing does not seem to
              > >be proper C99, and may end up breaking the compilers assumptions about
              > >(non)aliasin g of pointers of different types, and I suppose that is why
              > >GCC is a bit unhappy here.[/color]
              >
              > Yes. Or, if pointers to different data types have different sizes,
              > you could run into a situation in which hash_remove() updates all
              > 16 bytes of your 4-byte pointer.
              >[color=green]
              > >I can get around this by using a union of a const void * and the real
              > >datatype, but this seems like a hacksih solution. Any pointers to the
              > >proper way to do this type of typecasting, so as to avoid aliasing
              > >problems would be welcome.[/color]
              >
              > In general, the "right way to cast pointers" is not to do it at all.
              >
              > Consider the following small rewrite of function_remove ():
              >
              > void function_remove (const wchar_t *name) {
              > const void *key;
              > const void *old_data;
              > const function_data_t *d;
              > event_t ev;
              >
              > /* this fills in "key" and "old_data" */
              > hash_remove(&fu nction, name, &key, &old_data);
              >
              > /*
              > * We do not use "key" but we do use "old_data".
              > * The "const void *" value that hash_remove() filled
              > * in is actually a read-only pointer to function_data_t
              > * (after being converted to "void *"), so we convert
              > * it back from "void *" to "function_data_ t *", all
              > * const-qualified, i.e., read-only.
              > */
              > d = old_data;
              >
              > ... more code here, presumably ...
              > }
              >
              > It is probably the case that hash_remove() does not modify the
              > data, but the data themselves are not necessarily read-only. If
              > this is true, you will have fewer headaches if you remove most of
              > the "const"s. Otherwise you will have places at which you are
              > forced to use casts to cast away the const-ness. This means that
              > "const" is not giving you any protection after all, and is therefore
              > useless. (This is usually true in C, which gets "const" quite
              > wrong. C++ is better about it, although still "not quite right".
              > It would all work right, and I think be more in the "spirit of C",
              > if const had been a storage-class modifier instead of a type-qualifier.
              > The other alternative is to have dynamic typing, as in real OO
              > languages, so that the type of a return parameter corresponds to
              > the type of the input parameter. This is sort of like C++ templates,
              > except not ugly :-) . But oh well.)[/color]

              Ok, that makes sense. Thanks!
              [color=blue]
              >
              > Retaining just the one "useful or at least harmless" const gives:
              >
              > void hash_remove(has h_table_t *h, const void *key,
              > void **old_key, void **old_data);
              >
              > void function_remove (const wchar_t *name) {
              > void *key;
              > void *old_data;
              > function_data_t *d;
              > event_t ev;
              >
              > hash_remove(&fu nction, name, &key, &old_data);
              > d = old_data;
              >
              > ... more code here, presumably ...
              > }
              >
              > In either case, though, the heart of the matter is to supply a
              > correct "&old_data" , then derive the new value for "d" by *converting*
              > the value that hash_remove() fills in. It is true that, on many
              > (or perhaps even most) machines, this "conversion " is just "take
              > the bits as-is", but by doing an actual conversion -- even if the
              > compiler uses a total of 0 (zero) machine instructions to implement
              > that conversion -- you get guaranteed-to-work code. On machines
              > where the "conversion " is a no-op, if the compiler is half-decent
              > (and gcc is), you get the zero-extra-instructions you would if you
              > wrote non-portable, not-guaranteed code. Thus, there is no penalty
              > for doing it right.[/color]

              Yes, I've noticed that the const:ness doesn't buy you much. The
              disadvantages are pretty clearly outlined in your response, the
              advantage is that you never get a non-const pointer to a const memory
              region, which is something that always makes me a bit uneasy. I added
              it to err on the side of caution. You may be right that the
              disadvantage outweighs the advantage - I might remove the const:s.
              [color=blue]
              > --
              > In-Real-Life: Chris Torek, Wind River Systems
              > Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
              > email: forget about it http://web.torek.net/torek/index.html
              > Reading email is like searching for food in the garbage, thanks to spammers.[/color]

              --
              Axel

              Comment

              • liljencrantz@gmail.com

                #8
                Re: Proper way to do casts while avoiding aliasing issues


                Keith Thompson wrote:[color=blue]
                > liljencrantz@gm ail.com writes:
                > [...][color=green]
                > > Also, I noticed that the comp.lang.c FAQ at
                > > www.eskimo.com/~scs/c-faq.com/ returns an access error, and at least
                > > the google cache of it contains no mention of aliasing. Perhaps a few
                > > questions about aliasing could be added to the FAQ as well?[/color]
                >
                > The URL works for me, but the clc FAQ has moved to <http://www.c-faq.com>.[/color]

                Works for me now as well, the site must have been down. Thanks for the
                new URL. Hopefully it will bubble up in Google soon.
                [color=blue]
                >
                > --
                > Keith Thompson (The_Other_Keit h) kst-u@mib.org <http://www.ghoti.net/~kst>
                > San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
                > We must do something. This is something. Therefore, we must do this.[/color]

                --
                Axel

                Comment

                • Chris Torek

                  #9
                  Re: Proper way to do casts while avoiding aliasing issues

                  >Chris Torek wrote:[color=blue][color=green]
                  >> Let us assume, for the sake of argument, that "const void *" is a
                  >> special datatype that uses 16 bytes to hold lots of extra information
                  >> about the target, while other pointers (where the compiler can make
                  >> more assumptions about the target) use just 4 bytes. So a *pointer
                  >> to* "const void *" is only 4 bytes long but it points to 16 bytes.
                  >> Meanwhile let us assume that a "double" is 8 bytes. Then:
                  >>
                  >> double x = 3.1415926535897 93238462;
                  >> const void *p = &x;
                  >> const void **q = &p;
                  >>
                  >> might look like this in memory:
                  >>
                  >> x p
                  >> +---------------+ +------------------------------------------+
                  >> | 3.415926... | <-|---------------------* |
                  >> +---------------+ +------------------------------------------+
                  >> ^
                  >> |
                  >> |
                  >> +-------+
                  >> q | * |
                  >> +-------+[/color][/color]

                  In article <1143369219.869 684.280870@z34g 2000cwc.googleg roups.com>
                  <liljencrantz@g mail.com> wrote:[color=blue]
                  >Do any systems exist that actually _do_ this? (I want to fix the
                  >problem regardless, I'm just curious about if any system actually use
                  >this extre freedom provided by C)[/color]

                  The answer is "well, almost". :-)

                  The classic example these days, for machines with "weird" pointers,
                  is the IBM AS/400. But the AS/400 has a uniform "data pointer"
                  space, and a separate but also uniform "function pointer" space,
                  as I understand it. (I have never used an AS/400.)

                  Somewhat less recently, we can look at the Cray machines, the
                  Data General Eclipse (the MV/10000 -- this is also the machine
                  referred to in the title of the book "The Soul of a New Machine").
                  All of these have special "byte pointer" *formats*. These do
                  not take up *more* space than other pointers, but they do have
                  "extra bits" when compared to other "normal" (word) pointers.
                  This comes about because the machines are, at some fundamental
                  level, "word-addressed" rather than "byte-addressed", with a
                  word being sufficiently large so that the C-compiler-writers
                  decided to give you "sub-word" data types to hold characters.
                  In order to fit two or more "char"s into a word, you have to
                  start with a word pointer, then add at least one -- and in the
                  case of the Cray, 3 -- extra bits somewhere, to pick out which
                  of the (2 or 8, respectively) bytes you really want.

                  On the Eclipse, this extra bit is actually handled by hardware:
                  the machine, at machine-code level, has separate "byte pointers"
                  and "word pointers", and instructions to convert from one to
                  the other.

                  If we go back into "C prehistory" (long before there was an ANSI
                  standard for C), we also find machines like the PR1ME, with 32
                  bit pointers but 48-bit "byte pointers". As with the Cray, the
                  reason for the extra-long "byte pointer" is that machine-level
                  pointers only point to whole words. To select a byte within
                  a machine word, the "byte pointer" uses additional memory.

                  While it is not directly related to this: in the bad old days of
                  the 80186, 80286, and early 80386 Intel chips, one also had to deal
                  with "segmented" systems and multiple "memory models" (usually
                  called "small", "large", "compact", and "huge"). These also result
                  in multiple pointer formats, although in this case, the formats
                  are based not on the data type, but rather on the size of the target
                  data. With the x86-64 now appearing, these days have returned:
                  the x86-64 necessarily provides multiple memory models, so that
                  you can run old 32-bit code and new 64-bit code on the same machine.

                  I would bet that data-type-dependent pointer formats will also be
                  "reinvented ", and will re-appear at some point in the future. One
                  mistake people make in disregarding history is that they believe
                  "we are smarter now". We are not any smarter -- people back then
                  were just as clever -- we just have different materials and tool
                  sets, that allow us to tackle tougher problems. But the solutions
                  they invented had a reason to exist -- they were solving real
                  problems -- and those reasons, those same problems, tend to recur.
                  So history repeats -- or as Mark Twain was supposed to have said,
                  "History doesn't repeat itself, but it does rhyme." (This does
                  not appear to be an actual Twain quote. There is also a version
                  with "it echos".)
                  --
                  In-Real-Life: Chris Torek, Wind River Systems
                  Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
                  email: forget about it http://web.torek.net/torek/index.html
                  Reading email is like searching for food in the garbage, thanks to spammers.

                  Comment

                  • liljencrantz@gmail.com

                    #10
                    Re: Proper way to do casts while avoiding aliasing issues


                    Chris Torek skrev:
                    [color=blue]
                    > In article <1143369219.869 684.280870@z34g 2000cwc.googleg roups.com>
                    > <liljencrantz@g mail.com> wrote:[color=green]
                    > >Do any systems exist that actually _do_ this? (I want to fix the
                    > >problem regardless, I'm just curious about if any system actually use
                    > >this extre freedom provided by C)[/color]
                    >
                    > The answer is "well, almost". :-)
                    >
                    > The classic example these days, for machines with "weird" pointers,
                    > is the IBM AS/400. But the AS/400 has a uniform "data pointer"
                    > space, and a separate but also uniform "function pointer" space,
                    > as I understand it. (I have never used an AS/400.)
                    >
                    > Somewhat less recently, we can look at the Cray machines, the
                    > Data General Eclipse (the MV/10000 -- this is also the machine
                    > referred to in the title of the book "The Soul of a New Machine").
                    > All of these have special "byte pointer" *formats*. These do
                    > not take up *more* space than other pointers, but they do have
                    > "extra bits" when compared to other "normal" (word) pointers.
                    > This comes about because the machines are, at some fundamental
                    > level, "word-addressed" rather than "byte-addressed", with a
                    > word being sufficiently large so that the C-compiler-writers
                    > decided to give you "sub-word" data types to hold characters.
                    > In order to fit two or more "char"s into a word, you have to
                    > start with a word pointer, then add at least one -- and in the
                    > case of the Cray, 3 -- extra bits somewhere, to pick out which
                    > of the (2 or 8, respectively) bytes you really want.
                    >
                    > On the Eclipse, this extra bit is actually handled by hardware:
                    > the machine, at machine-code level, has separate "byte pointers"
                    > and "word pointers", and instructions to convert from one to
                    > the other.
                    >
                    > If we go back into "C prehistory" (long before there was an ANSI
                    > standard for C), we also find machines like the PR1ME, with 32
                    > bit pointers but 48-bit "byte pointers". As with the Cray, the
                    > reason for the extra-long "byte pointer" is that machine-level
                    > pointers only point to whole words. To select a byte within
                    > a machine word, the "byte pointer" uses additional memory.
                    >
                    > While it is not directly related to this: in the bad old days of
                    > the 80186, 80286, and early 80386 Intel chips, one also had to deal
                    > with "segmented" systems and multiple "memory models" (usually
                    > called "small", "large", "compact", and "huge"). These also result
                    > in multiple pointer formats, although in this case, the formats
                    > are based not on the data type, but rather on the size of the target
                    > data. With the x86-64 now appearing, these days have returned:
                    > the x86-64 necessarily provides multiple memory models, so that
                    > you can run old 32-bit code and new 64-bit code on the same machine.
                    >
                    > I would bet that data-type-dependent pointer formats will also be
                    > "reinvented ", and will re-appear at some point in the future. One
                    > mistake people make in disregarding history is that they believe
                    > "we are smarter now". We are not any smarter -- people back then
                    > were just as clever -- we just have different materials and tool
                    > sets, that allow us to tackle tougher problems. But the solutions
                    > they invented had a reason to exist -- they were solving real
                    > problems -- and those reasons, those same problems, tend to recur.
                    > So history repeats -- or as Mark Twain was supposed to have said,
                    > "History doesn't repeat itself, but it does rhyme." (This does
                    > not appear to be an actual Twain quote. There is also a version
                    > with "it echos".)[/color]

                    Thank you for your detailed and interesting answers.
                    [color=blue]
                    > --
                    > In-Real-Life: Chris Torek, Wind River Systems
                    > Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
                    > email: forget about it http://web.torek.net/torek/index.html
                    > Reading email is like searching for food in the garbage, thanks to spammers.[/color]

                    --
                    Axel

                    Comment

                    Working...