One for the language lawyers

  • Kenny McCormack

    One for the language lawyers

    Here is a commonly used technique, that will, of course, work fine on
    any reasonably modern, normal hardware. But, does it pass the CLC test?

    /* Assume well-formed input - of course, you can always break it by
     * feeding it bad input */

    struct foo { int field1, field2; char nl; } *bar;
    char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

    int main(void) {
        bar = (struct foo *) buffer;
        fgets(buffer, SOMENUMBERWHATEVERFLOATSYOURBOAT, stdin);
        /* Now access the members of the struct (using, e.g., bar->field1).
         * Note that no actual struct was ever declared - we are using
         * buffer as if it were the struct */
    }

  • Harald van Dĳk

    #2
    Re: One for the language lawyers

    On Mon, 09 Jun 2008 17:08:20 +0000, Kenny McCormack wrote:
    > Here is a commonly used technique,

    It is? Where have you seen it used?

    > that will, of course, work fine on
    > any reasonably modern, normal hardware. But, does it pass the CLC test?

    No.

    > /* Assume well-formed input - of course, you can always break it by
    >  * feeding it bad input */
    >
    > struct foo { int field1, field2; char nl; } *bar;

    What's the nl member for?

    > char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];
    >
    > int main(void) {
    >     bar = (struct foo *) buffer;

    This assumes that buffer is appropriately aligned for a struct foo. When
    you access *bar, you also ignore C's aliasing rules. Both problems can be
    avoided by using a union.

    >     fgets(buffer, SOMENUMBERWHATEVERFLOATSYOURBOAT, stdin);

    Did you mean fread, or were you really asking about fgets? If you meant
    fread, I don't see the point of a nl member at all. If you meant fgets, I
    don't see the point of a nl member at the very end.

    >     /* Now access the members of the struct (using, e.g., bar->field1).
    >      * Note that no actual struct was ever declared - we are using
    >      * buffer as if it were the struct */
    > }
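
A minimal sketch of the union approach suggested in this reply, assuming fread was the intended call and that the input was written by the same implementation (same int size, byte order and padding). The names `record_buf` and `read_record` are hypothetical and do not appear in the thread.

```c
#include <assert.h>
#include <stdio.h>

struct foo { int field1, field2; char nl; };

/* The union is aligned at least as strictly as struct foo, so the byte
 * array inside it can safely hold a struct foo. Reading `rec` after the
 * bytes were stored through `bytes` goes through the union itself,
 * which sidesteps the aliasing problem of casting a plain char array. */
union record_buf {
    struct foo rec;
    unsigned char bytes[sizeof(struct foo)];
};

/* Hypothetical helper: read one raw record from a stream opened in
 * binary mode. Returns 1 on success, 0 on a short read or an error. */
int read_record(FILE *fp, union record_buf *buf)
{
    assert(fp != NULL && buf != NULL);
    return fread(buf->bytes, sizeof buf->bytes, 1, fp) == 1;
}
```

After a successful call the fields are available as `buf.rec.field1` and `buf.rec.field2`. Whether the bytes form valid int values still depends on the input being well-formed for the platform.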


    • Jens Thoms Toerring

      #3
      Re: One for the language lawyers

      Kenny McCormack <gazelle@xmission.xmission.com> wrote:
      > Here is a commonly used technique, that will, of course, work fine on
      > any reasonably modern, normal hardware. But, does it pass the CLC test?
      >
      > /* Assume well-formed input - of course, you can always break it by
      >  * feeding it bad input */
      >
      > struct foo { int field1, field2; char nl; } *bar;
      > char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];
      >
      > int main(void) {
      >     bar = (struct foo *) buffer;
      >     fgets(buffer, SOMENUMBERWHATEVERFLOATSYOURBOAT, stdin);
      >     /* Now access the members of the struct (using, e.g., bar->field1).
      >      * Note that no actual struct was ever declared - we are using
      >      * buffer as if it were the struct */
      > }

      As long as sizeof(struct foo) isn't smaller than
      SOMENUMBERWHATEVERFLOATSYOURBOAT then there's no problem.
      It's rather obfuscated and I dare to doubt that this is
      a "commonly used technique", but 'buffer' is memory
      you own so you can do with it whatever you want. Of
      course, all hinges on your primary assumption that the
      input is well-formed (it may be difficult to make it
      non-well-formed for the types of members the structure
      has on main-stream hardware, but there might be some
      systems where certain bit-patterns don't represent ints
      and thus you may run into danger of undefined behaviour).
      So figuring out what's well-formed can be a bit of a
      bother but as long as you do that there's no problem.

      Regards, Jens
      --
      \ Jens Thoms Toerring ___ jt@toerring.de
      \______________ ____________ http://toerring.de


      • Hallvard B Furuseth

        #4
        Re: One for the language lawyers

        Kenny McCormack writes:
        > Here is a commonly used technique, (...)

        I hope not.

        > struct foo { int field1, field2; char nl; } *bar;
        > char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];
        >
        > int main(void) {
        >     bar = (struct foo *) buffer;
        >     fgets(buffer, SOMENUMBERWHATEVERFLOATSYOURBOAT, stdin);
        >     /* Now access the members of the struct (using, e.g., bar->field1).

        This breaks e.g. if there is a 0x0A byte (newline) in the integer
        representation of the would-be bar->field1 value. And as Harald
        said, it breaks if buffer is not properly aligned for a struct foo.

        Also when I see fgets() I suspect the file has been opened in text
        instead of binary mode, which means there may be bugs from converting
        between newline and the file system's representation of end-of-line.

        --
        Hallvard
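
A small sketch of the newline problem described in this reply, assuming an ASCII-based system where '\n' is the byte 0x0A and a stream opened in binary mode. The helper name `bytes_taken_by_fgets` is hypothetical. It reports how many bytes fgets consumed while trying to read one record, so a caller can see the read stop early when an int happens to contain a newline byte.

```c
#include <assert.h>
#include <stdio.h>

struct foo { int field1, field2; char nl; };

/* Hypothetical helper: let fgets try to read one struct foo worth of
 * bytes and return how many bytes it actually consumed from the
 * stream, or -1 on failure. fgets stops right after the first '\n'. */
long bytes_taken_by_fgets(FILE *fp)
{
    char buffer[sizeof(struct foo) + 1];  /* +1: fgets appends a '\0' */
    long start, end;

    assert(fp != NULL);
    start = ftell(fp);
    if (start < 0 || fgets(buffer, (int)sizeof buffer, fp) == NULL)
        return -1;
    end = ftell(fp);
    return end < 0 ? -1 : end - start;
}
```

With a record whose field1 is 10 (which has a 0x0A byte in its representation), the helper returns at most sizeof(int), so the rest of the record is never read. fread has no such sensitivity to byte values.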


        • Chris Torek

          #5
          Re: One for the language lawyers

          >Kenny McCormack <gazelle@xmission.xmission.com> wrote:
          >Here is a commonly used technique, that will, of course, work fine on
          >any reasonably modern, normal hardware. But, does it pass the CLC test?
          >
          >/* Assume well-formed input - of course, you can always break it by
          > * feeding it bad input */
          >struct foo { int field1, field2; char nl; } *bar;
          >char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];
          >
          >int main(void) {
          >    bar = (struct foo *) buffer;
          >    fgets(buffer, SOMENUMBERWHATEVERFLOATSYOURBOAT, stdin);
          >    /* Now access the members of the struct (using, e.g., bar->field1).
          >     * Note that no actual struct was ever declared - we are using
          >     * buffer as if it were the struct */
          >}
          In article <6b57voF399cfmU1@mid.uni-berlin.de>,
          Jens Thoms Toerring <jt@toerring.de> wrote:
          >As long as sizeof(struct foo) isn't smaller than
          >SOMENUMBERWHATEVERFLOATSYOURBOAT then there's no problem.
          When I first built the 4.xBSD system for the SPARC, tftp broke,
          precisely because it used this kind of trick. (In tftp's case,
          it was a more complex variant of the "struct hack".)
          >It's rather obfuscated and I dare to doubt that this is
          >a "commonly used technique", but 'buffer' is memory
          >you own so you can do with it whatever you want. Of
          >course, all hinges on your primary assumption that the
          >input is well-formed ...
          More importantly, it depends on the variable "buffer" being
          properly aligned for all member accesses.

          This was not true on the SPARC, where the compiler put the
          big buffer on an odd byte boundary.

          As a quick fix, I wrapped the buffer up into a union, which
          forced gcc to align the entire thing on an appropriate boundary.

          The trick also works if you use malloc() to obtain the buffer.

          In any case, it is not a very good idea to write the code this way,
          because it places such strong constraints on what constitutes "well
          formed" input. You need to make sure that these severe restrictions
          on whatever uses the code are paid for by whatever benefit you are
          getting from this "commonly used technique" (which, in my experience,
          was used perhaps once in the entire 4.xBSD code base -- that seems
          to argue against the claim that it is "commonly used").
          --
          In-Real-Life: Chris Torek, Wind River Systems
          Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
          email: gmail (figure it out) http://web.torek.net/torek/index.html
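
A minimal sketch of the malloc() variant mentioned in this reply. malloc returns storage suitably aligned for any object type, so the alignment problem that broke tftp on the SPARC does not arise. The helper name `read_record_alloc` is hypothetical, and the sketch assumes fread on a binary-mode stream with input produced by the same implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct foo { int field1, field2; char nl; };

/* Hypothetical helper: read one raw record into freshly allocated
 * storage. Returns NULL on allocation failure, short read or error;
 * otherwise the caller owns the result and must free() it. */
struct foo *read_record_alloc(FILE *fp)
{
    struct foo *rec;

    assert(fp != NULL);
    rec = malloc(sizeof *rec);
    if (rec == NULL)
        return NULL;
    if (fread(rec, sizeof *rec, 1, fp) != 1) {
        free(rec);
        return NULL;
    }
    return rec;
}
```

The constraints on what counts as well-formed input are unchanged: the bytes must match the reader's int size, byte order and structure padding.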


          • Jens Thoms Toerring

            #6
            Re: One for the language lawyers

            Chris Torek <nospam@torek.net> wrote:
            > Kenny McCormack <gazelle@xmission.xmission.com> wrote:
            >> Here is a commonly used technique, that will, of course, work fine on
            >> any reasonably modern, normal hardware. But, does it pass the CLC test?
            >>
            >> /* Assume well-formed input - of course, you can always break it by
            >>  * feeding it bad input */
            >> struct foo { int field1, field2; char nl; } *bar;
            >> char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];
            >>
            >> int main(void) {
            >>     bar = (struct foo *) buffer;
            >>     fgets(buffer, SOMENUMBERWHATEVERFLOATSYOURBOAT, stdin);
            >>     /* Now access the members of the struct (using, e.g., bar->field1).
            >>      * Note that no actual struct was ever declared - we are using
            >>      * buffer as if it were the struct */
            >> }
            >
            > In article <6b57voF399cfmU1@mid.uni-berlin.de>,
            > Jens Thoms Toerring <jt@toerring.de> wrote:
            >> As long as sizeof(struct foo) isn't smaller than
            >> SOMENUMBERWHATEVERFLOATSYOURBOAT then there's no problem.
            >
            > When I first built the 4.xBSD system for the SPARC, tftp broke,
            > precisely because it used this kind of trick. (In tftp's case,
            > it was a more complex variant of the "struct hack".)
            >
            >> It's rather obfuscated and I dare to doubt that this is
            >> a "commonly used technique", but 'buffer' is memory
            >> you own so you can do with it whatever you want. Of
            >> course, all hinges on your primary assumption that the
            >> input is well-formed ...
            >
            > More importantly, it depends on the variable "buffer" being
            > properly aligned for all member accesses.
            >
            > This was not true on the SPARC, where the compiler put the
            > big buffer on an odd byte boundary.

            Yes, that's a point I forgot about. Should have known better,
            having been bitten more than once by this issue when trying to port
            (mostly other people's ;-) code to a different architecture. I
            guess I am not too good a language lawyer ;-)

            Best regards, Jens
            --
            \ Jens Thoms Toerring ___ jt@toerring.de
            \______________ ____________ http://toerring.de


            • rahul

              #7
              Re: One for the language lawyers

              On Jun 10, 3:30 am, Chris Torek <nos...@torek.net> wrote:
              > As a quick fix, I wrapped the buffer up into a union, which
              > forced gcc to align the entire thing on an appropriate boundary.

              A bit off the topic:

              We can also use compiler specific extensions to achieve the alignment
              and padding requirements. In case of gcc, __attribute__((packed)) for
              eliminating padding for structures.
              We can also use aligned attributes for buffer to coerce the alignment.
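
A sketch of the aligned attribute applied to the buffer, assuming GCC or a compatible compiler such as Clang (`__attribute__` and `__alignof__` are extensions, not standard C). `BUFFER_SIZE` and `buffer_as_foo` are hypothetical names standing in for the thread's placeholder and cast.

```c
#include <assert.h>
#include <stdint.h>

struct foo { int field1, field2; char nl; };

#define BUFFER_SIZE 64  /* hypothetical stand-in for the thread's placeholder */

/* GCC/Clang extension: force the char array onto a boundary that is
 * suitable for struct foo. */
static char buffer[BUFFER_SIZE]
    __attribute__((aligned(__alignof__(struct foo))));

/* Hypothetical helper: return the buffer viewed as a struct foo.
 * The attribute only fixes alignment; accessing a char array through
 * a struct foo lvalue still falls foul of C's aliasing rules. */
struct foo *buffer_as_foo(void)
{
    assert((uintptr_t)buffer % __alignof__(struct foo) == 0);
    return (struct foo *)buffer;
}
```

This addresses only the alignment half of the objection raised earlier in the thread, which is one reason the union remains the portable choice.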


              • Nick Keighley

                #8
                Re: One for the language lawyers

                On 9 Jun, 18:08, gaze...@xmission.xmission.com (Kenny McCormack)
                wrote:
                > Here is a commonly used technique, that will, of course, work fine on
                > any reasonably modern, normal hardware.  But, does it pass the CLC test?
                >
                > /* Assume well-formed input - of course, you can always break it by
                >  * feeding it bad input */
                >
                > struct foo { int field1, field2; char nl; } *bar;
                > char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];
                >
                > int main(void) {
                >     bar = (struct foo *) buffer;
                >     fgets(buffer, SOMENUMBERWHATEVERFLOATSYOURBOAT, stdin);
                >     /* Now access the members of the struct (using, e.g., bar->field1).
                >      * Note that no actual struct was ever declared - we are using
                >      * buffer as if it were the struct */
                > }

                I used it on real systems. Now it makes me nervous.
                I've seen a system break when an OS was upgraded
                due to this.

                To use this I'd want to be *very* sure there was an
                identical system at both ends. And always would be.


                --
                Nick Keighley


                • Nick Keighley

                  #9
                  Re: One for the language lawyers

                  On 10 Jun, 05:30, rahul <rahulsin...@gmail.com> wrote:
                  > On Jun 10, 3:30 am, Chris Torek <nos...@torek.net> wrote:
                  >
                  >> As a quick fix, I wrapped the buffer up into a union, which
                  >> forced gcc to align the entire thing on an appropriate boundary.
                  >
                  > A bit off the topic:
                  >
                  > We can also use compiler specific extensions to achieve the alignment
                  > and padding requirements. In case of gcc, __attribute__((packed)) for
                  > eliminating padding for structures.
                  > We can also use aligned attributes for buffer to coerce the alignment.

                  eek!!! These things are different on every compiler. And sometimes
                  don't exist. Some hardware cannot support it (or it becomes *very*
                  inefficient).

                  I worked on systems that turned it on and off for
                  each structure in a large header...

                  I've hunted bugs when different packed/not packed options
                  had been used in different object files. It *linked* fine.

                  --
                  Nick Keighley

                  "Almost every species in the universe has an irrational fear of
                  #pragma packed. But they're wrong"
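
A sketch of why mismatched packing options are hard to spot, assuming GCC or Clang (`__attribute__((packed))` is a compiler extension). The two structure types below model what two object files might each believe `struct foo` looks like; `struct foo_packed` and `packing_size_difference` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Layout seen by code compiled with the default settings. */
struct foo { int field1, field2; char nl; };

/* Layout seen by code compiled with packing turned on
 * (GCC/Clang extension): no padding anywhere. */
struct foo_packed { int field1, field2; char nl; } __attribute__((packed));

/* Hypothetical helper: how many bytes the two views of the "same"
 * record disagree by. A nonzero result means an array of records
 * written by one side is misread by the other. */
size_t packing_size_difference(void)
{
    assert(sizeof(struct foo) >= sizeof(struct foo_packed));
    return sizeof(struct foo) - sizeof(struct foo_packed);
}
```

The linker compares symbol names, not structure layouts, so a mismatch of this kind produces no diagnostic at link time.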


                  • vippstar@gmail.com

                    #10
                    Re: One for the language lawyers

                    Kenny the Troll wrote:
                    > Here is a commonly used technique, that will, of course, work fine on

                    How did you come to the conclusion that this technique is common?
                    Where did you see or hear about it?

                    > any reasonably modern, normal hardware. But, does it pass the CLC test?

                    It certainly won't work for the "unreasonably modern/antique"
                    "abnormal hardware/software".

                    > /* Assume well-formed input - of course, you can always break it by
                    >  * feeding it bad input */

                    You *can't* always break it by feeding it bad input as long as it's
                    properly programmed.

                    > struct foo { int field1, field2; char nl; } *bar;
                    > char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];
                    >
                    > int main(void) {
                    >     bar = (struct foo *) buffer;
                    >     fgets(buffer, SOMENUMBERWHATEVERFLOATSYOURBOAT, stdin);

                    You don't check the return value of fgets, nor do you include <stdio.h>
                    for it.

                    >     /* Now access the members of the struct (using, e.g., bar->field1).

                    Where? I don't see the code accessing said members.

                    >      * Note that no actual struct was ever declared - we are using

                    There was - struct foo { int field1, field2; char nl; }.

                    >      * buffer as if it were the struct */

                    No you are not.

                    > }

                    You don't return a value from main().
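
A minimal sketch addressing two of the points in this reply: including <stdio.h> and checking what fgets returns. `BUFFER_SIZE` and `read_line` are hypothetical names, with `BUFFER_SIZE` standing in for the thread's placeholder.

```c
#include <assert.h>
#include <stdio.h>

#define BUFFER_SIZE 64  /* hypothetical stand-in for the thread's placeholder */

/* Hypothetical helper: returns 1 if fgets stored a line in buffer,
 * 0 on end-of-file or a read error (buffer must not be used then). */
int read_line(FILE *fp, char *buffer, int size)
{
    assert(fp != NULL && buffer != NULL && size > 0);
    return fgets(buffer, size, fp) != NULL;
}
```

A main() built on this would return EXIT_FAILURE when read_line reports 0 and return 0 otherwise, which also covers the last objection.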


                    • Serve Lau

                      #11
                      Re: One for the language lawyers

                      "Nick Keighley" <nick_keighley_nospam@hotmail.com> wrote in message
                      news:68baaecc-84c9-494b-b745-7e6784a1bfbd@l64g2000hse.googlegroups.com...
                      > On 10 Jun, 05:30, rahul <rahulsin...@gmail.com> wrote:
                      >> On Jun 10, 3:30 am, Chris Torek <nos...@torek.net> wrote:
                      >>
                      >>> As a quick fix, I wrapped the buffer up into a union, which
                      >>> forced gcc to align the entire thing on an appropriate boundary.
                      >>
                      >> A bit off the topic:
                      >>
                      >> We can also use compiler specific extensions to achieve the alignment
                      >> and padding requirements. In case of gcc, __attribute__((packed)) for
                      >> eliminating padding for structures.
                      >> We can also use aligned attributes for buffer to coerce the alignment.
                      >
                      > eek!!! These things are different on every compiler. And sometimes
                      > don't exist. Some hardware cannot support it (or it becomes *very*
                      > inefficient).

                      *very* inefficient is *very* relative. It all depends on the structure of
                      your code. So I would not worry about the efficiency aspect of unaligned
                      access, only about the incorrectness aspect :)
