question on union

  • Roman Mashak

    question on union

    Hello,

    I'm going through "UNIX Network Programming" by R. Stevens and am stuck on
    the following code, which determines the endianness of the host it is running on:

    #include <stdio.h>
    #include <stdlib.h>

    #define CPU_VENDOR_OS "i686-pc-linux-gnu"

    int main(void)
    {
        /* Both members share the same storage. */
        union {
            short s;
            char c[sizeof(short)];
        } un;

        un.s = 0x0102;
        printf("%s: ", CPU_VENDOR_OS);
        if (sizeof(short) == 2) {
            if (un.c[0] == 1 && un.c[1] == 2)
                printf("big-endian\n");
            else if (un.c[0] == 2 && un.c[1] == 1)
                printf("little-endian\n");
            else
                printf("unknown\n");
        } else
            /* sizeof yields a size_t, so it is printed with %zu */
            printf("sizeof(short) = %zu\n", sizeof(short));

        exit(0);
    }

    What I don't get is how un.c[0] and un.c[1] come to contain the value that
    un.s was initialized with, i.e. 0x0102. Is this a feature of 'union'?
    Why couldn't we use a 'struct' to check how the bytes are placed in memory?

    Thanks in advance!


    With best regards, Roman Mashak. E-mail: mrv@tusur.ru


  • Morris Dovey

    #2
    Re: question on union

    Roman Mashak wrote:
    > What I don't get is how come that un.c[0] and un.c[1] both contain what has
    > been un.s initialized, i.e. 0x0102. Is it a feature of 'union'?
    > Why could not we use 'struct' to check how bytes are placed in memory ?

    The elements in a structure all occupy separate and distinct
    "pieces" of memory, but the elements of a union all occupy a
    common "piece" of memory.

    --
    Morris Dovey
    DeSoto Solar
    DeSoto, Iowa USA

    • Roman Mashak

      #3
      Re: question on union

      Hello, Morris!
      You wrote on Tue, 12 Feb 2008 20:32:28 -0500:

      ??> What I don't get is how come that un.c[0] and un.c[1] both contain
      ??> what has been un.s initialized, i.e. 0x0102. Is it a feature of
      ??> 'union'? Why could not we use 'struct' to check how bytes are placed
      ??> in memory ?

      MD> The elements in a structure all occupy separate and distinct
      MD> "pieces" of memory, but the elements of a union all occupy a
      MD> common "piece" of memory.

      What are the mechanics behind that? Say that, in the posted example,
      un.s = 0x0102 at run time and this value occupies the common memory. Is it
      the CPU that lays out un.c in memory according to the architecture?
      And why do the two members look different in the debugger:

      (gdb) p/x un
      $3 = {s = 0x102, c = {0x2, 0x1}}
      (gdb)

      With best regards, Roman Mashak. E-mail: mrv@tusur.ru


      • Arthur

        #4
        Re: question on union

        Hello! The CPU doesn't lay out a union according to its architecture;
        your compiler does that when it compiles your program.

        Suppose you define:
            union u_tag {
                int i;
                char c[sizeof(int)];
            } u;
        and refer to it using
            u.i = 0x12;
            u.c[0] = 0x12;
        The compiler will simply convert these into instructions like this:
            movl $0x12, _u
            movb $0x12, _u
        The compiler uses the same symbol for u.i and u.c[0].

        The reason they look different in your debugger is that
        Intel CPUs are little-endian.
        un.s is placed in memory like this:
            0x02 0x01
        When referred to as un.s, it means a short int 0x0102, i.e. s = 0x102.
        When referred to as un.c, it means an array of char, {0x02, 0x01}.

        Please correct me if I made any mistakes. Have a good day!

        • Roman Mashak

          #5
          Re: question on union

          Hello, Arthur!
          You wrote on Tue, 12 Feb 2008 21:59:55 -0800 (PST):

          [skip]
          Thanks for your explanations.

          A> and refers it using
          A>     u.i = 0x12;
          A>     u.c[0] = 0x12;
          A> the compiler will simply convert them into instructions like this:
          A>     movl $0x12, _u
          A>     movb $0x12, _u
          A> The compiler uses same symbols for u.i and u.c[0].

          A> The reason why they look different in your debugger is that
          A> Intel CPUs use little-endian.
          A> un.s is placed in memory like this:
          A>     0x02 0x01
          A> when referred as u.s, it means a short int 0x0102, i.e. s = 0x102
          A> when referred as u.c, it means an array of char, {0x02, 0x01}

          But both u.i and u.c are stored in memory on the same little-endian machine,
          so why do they look different? I can't grasp how that happens.

          With best regards, Roman Mashak. E-mail: mrv@tusur.ru


          • Martin Ambuhl

            #6
            Re: question on union

            Roman Mashak wrote:
            > What I don't get is how come that un.c[0] and un.c[1] both contain what has
            > been un.s initialized, i.e. 0x0102. Is it a feature of 'union'?
            > Why could not we use 'struct' to check how bytes are placed in memory ?

            The program is doing a very bad thing. The folks who "explained" why
            this code "works" are doing you a disservice. The value of any union
            member other than the last one stored into is unspecified. _Never_ store
            into one member of a union and attempt to access its value through
            another, except when accessing an identical common initial segment of
            struct members. That is a special exception to the general rule that a
            union can contain only one of its component values at a time. Storing
            into one member and accessing another is attempting to have the union
            contain more than one component value at a time.

            You can accomplish the above with a non-union array into which you
            memmove a value.

            • Martin

              #7
              Re: question on union

              On Feb 13, 8:50 am, Martin Ambuhl <mamb...@earthlink.net> wrote:
              > _Never_ store into one member of a union and attempt to access its value through
              > another except when accessing an identical common initial segment of
              > struct members. This is a special exception to the general rule that a
              > union can contain only one of its component values at a time. Storing
              > into one member and accessing another is attempting to have the union
              > contain more than one component value at a time.

              Does that mean that the answer to Summit's "C Programming FAQs"
              Question 20.9 is wrong? Viz.:

              union {
                  int i;
                  char c[sizeof(int)];
              } x;

              x.i = 1;

              if (x.c[0] == 1)
                  printf("little-endian\n");
              else
                  printf("big-endian\n");

              • Ben Bacarisse

                #8
                Re: question on union

                Martin Ambuhl <mambuhl@earthlink.net> writes:
                > The program is doing a very bad thing. The folks who "explained" why
                > this code "works" are doing you a disservice. The value of any union
                > member other than the last stored into is unspecified.

                Can you cite the prohibition? I thought it had been removed. There
                is a footnote (yes, I know, non-normative) that states:

                    If the member used to access the contents of a union object is not
                    the same as the member last used to store a value in the object, the
                    appropriate part of the object representation of the value is
                    reinterpreted as an object representation in the new type as
                    described in 6.2.6 (a process sometimes called "type punning"). This
                    might be a trap representation.

                (6.2.6 is the section on the representation of types.) Since unsigned
                char can't have trap representations, I think the code above could be
                rewritten to stay within the letter of C99. The intent seems clear:
                to allow type punning using a union.

                --
                Ben.

                • Michael Mair

                  #9
                  Re: question on union

                  Ben Bacarisse wrote:
                  > Can you cite the prohibition? I thought it had been removed. There
                  > is a footnote (yes, I know, non-normative) that states:
                  >
                  >     If the member used to access the contents of a union object is not
                  >     the same as the member last used to store a value in the object, the
                  >     appropriate part of the object representation of the value is
                  >     reinterpreted as an object representation in the new type as
                  >     described in 6.2.6 (a process sometimes called "type punning"). This
                  >     might be a trap representation.
                  >
                  > (6.2.6 is the section on the representation of types.) Since unsigned
                  > char can't have trap representations, I think the code above could be
                  > rewritten to stay within the letter of C99. The intent seems clear:
                  > to allow type punning using a union.

                  In the thread starting at
                  <pan.2005.06.29.01.23.01.115587@consulting.net.nz>,
                  Tim Rentsch pointed out:

                  ,- From <kfn7jfvrnla.fsf@alumnus.caltech.edu> --
                  My understanding is that the storing one member of a union in
                  different memory than another member was the result of unclear
                  language in the standard, and that the unclear language is
                  expected to be addressed through a TC. See:

                  `----

                  Cheers
                  Michael
                  --
                  E-Mail: Mine is an /at/ gmx /dot/ de address.

                  • Martin Ambuhl

                    #10
                    Re: question on union

                    Ben Bacarisse wrote:
                    > Martin Ambuhl <mambuhl@earthlink.net> writes:
                    > [...]
                    >> The value of any union
                    >> member other than the last stored into is unspecified.
                    >
                    > Can you cite the prohibition?

                    Appendix J is "informative", but explicitly includes:

                        J.1 Unspecified behavior
                        1 The following are unspecified:
                        [...]
                        -- The value of a union member other than the last one stored into
                           (6.2.6.1).

                    • Ben Bacarisse

                      #11
                      Re: question on union

                      Martin Ambuhl <mambuhl@earthlink.net> writes:
                      > Ben Bacarisse wrote:
                      >> Martin Ambuhl <mambuhl@earthlink.net> writes:
                      >> [...]
                      >>> The value of any union
                      >>> member other than the last stored into is unspecified.
                      >>
                      >> Can you cite the prohibition?
                      >
                      > Appendix J is "informative", but explicitly includes:
                      >
                      >     J.1 Unspecified behavior
                      >     1 The following are unspecified:
                      >     [...]
                      >     -- The value of a union member other than the last one stored into
                      >        (6.2.6.1).

                      Ah, right. I misunderstood your rather strong prohibition on
                      doing this type punning with a union. The behaviour is unspecified,
                      but so is the behaviour of your suggested alternative. Using memcpy
                      and inspecting the result will be no more specified than doing the
                      union trick. Is your objection to the union method stronger than
                      this?

                      --
                      Ben.

                      • Martin Ambuhl

                        #12
                        Re: question on union

                        Martin wrote:
                        > [...]
                        >> The FAQ's reference is to an older
                        >> edition of H&S, so I don't know if that text was there. If that
                        >> text was there, Steve ought not to have suppressed it.
                        > [...]
                        > My copy of the book is dated 1996. I don't think there is a later
                        > version.

                        You are wrong. H&S5 is from 2002.

                        > In the book, as well as the union example I posted, there is also the
                        > example as provided in the online FAQ, which uses a pointer. The
                        > online FAQ and my edition of the book also cross-reference to Harbison
                        > & Steele Sec. 6.1.2 pp. 163-4.

                        In H&S5 it is on p. 184.

                        > The introductory text you quote is not in my edition of the book.

                        It should have been.

                        • dj3vande@csclub.uwaterloo.ca.invalid

                          #13
                          Re: question on union

                          In article <fotp1f$1l2h$1@relay.tomsk.ru>, Roman Mashak <mrv@tusur.ru> wrote:
                          >Then why do both values look differently, in debugger:
                          >
                          >(gdb) p/x un
                          >$3 = {s = 0x102, c = {0x2, 0x1}}
                          >(gdb)
                          The value of something is determined by two things:
                          (1) the bit pattern of the memory it's stored in
                          (2) the type of the "something"
                          The type is important because it defines how the bit pattern is
                          interpreted.

                          With a union of two different objects of the same size (in your case a
                          short occupying two bytes and an array of two chars occupying one byte
                          each), the implementation will usually use the same memory for both
                          (the limitations on what you as a programmer can do with a union allow
                          it to do this, and this behavior is what most nonportable uses of a
                          union depend on), so the bit pattern of the memory is the same for both
                          objects (and happens to be valid for both in this case; knowing that it
                          will be usually requires some non-portable knowledge of the system it's
                          running on).

                          So since the actual data is the same, the type of the object you're
                          accessing the data through is what determines what it looks like.

                          When the debugger looks at un.s, it interprets the two bytes as a short
                          and gets the value 0x0102. When it looks at un.c, it interprets those
                          same two bytes as an array of char, and gets two different values, one
                          for each char, giving {0x02, 0x01}.

                          In general, if you need to care about this, you're doing something
                          that's non-portable and comp.lang.c is usually not the right place to
                          ask. But the basics of how representations and values interact are
                          well within the scope of comp.lang.c (and because of that I think the
                          claim elsethread that your question should have been posted elsewhere
                          is unjustified).


                          dave

                          --
                          Dave Vandervies dj3vande at eskimo dot com
                          Erm... wouldn't clock(), used with Bill Godfrey's follow-up, ignoring my
                          follow-up to him (as suggested in your follow-up to me), do the trick
                          quite nicely? --Joona I Palaste in comp.lang.c
