UDB and pointer increments and decrements

This topic is closed.
  • Richard

    UDB and pointer increments and decrements


    I'm still battling with this causing UDB:

    while(e-- > s);

    If s points to the start of a string and e becomes less than s, then e is
    not really pointing to a defined char. Fine.

    But UDB?

    Yes, e has a UDV (undefined value), but would this really cause a
    program to misbehave? On any platform? Remember this value of e is never
    used again in this case.

    I ask because theoretically s can be pointing to the middle of a bigger
    string. We then call a function with s as a parameter.

    The function called can have no idea that s is a pointer into the middle
    of a string. Therefore it can have no idea how to "do undefined things" when
    e is decremented past the start of s. e and s are strictly char *s. It
    would be so "not C" if the compiler generated code to check the contents
    pointed to, to determine the range of the object to the middle of which s
    points. I mean, then we may as well have array limits and exceptions
    built into the language.

    I'm not being difficult here. Explain how this works. My problem (and I
    admit it's a problem) is that I feel too much of C is being elevated to
    an almost ADA type status and (in this group) C is losing that "down and
    dirty and efficient" feeling which it is famous for.


  • Jean-Marc Bourguet

    #2
    Re: UDB and pointer increments and decrements

    Richard <rgrdev@gmail.com> writes:
    > I'm still battling with this causing UDB:
    >
    > while(e-- > s);
    >
    > If s points to the start of a string and e becomes less than s, then e is
    > not really pointing to a defined char. Fine.
    >
    > But UDB?
    >
    > Yes, e has a UDV (undefined value), but would this really cause a
    > program to misbehave? On any platform? Remember this value of e is never
    > used again in this case.

    1/ C has four levels of definition (defined behavior, implementation-defined
    behavior, which includes locale-specific behavior, unspecified behavior, and
    undefined behavior), no more. Spending effort trying to classify undefined
    behavior more finely is probably not worthwhile. And it seems to me that's
    what you want: different rules for the undefined value created by
    decrementing a pointer than for all the others. There is a precedent (the
    similar one-past-the-end-of-an-array pointer comes immediately to mind), but
    yours would be more limited than that one (or you'd have met opposition
    from DOS folk, as allowing such pointers in comparisons would have
    constrained them a lot, probably limiting the size of an object to 32767
    instead of the 65535 they have).

    2/ Optimizers tend to use undefined behavior in creative ways. For example,
    value propagation can optimize out the body of the if in this code:

    if (i == INT_MAX) {
        /* do something that does not modify i */
    }
    ++i;

    (Reasoning: incrementing i overflows if i is INT_MAX, which would be
    undefined behavior, so the optimizer can assume i isn't INT_MAX and that
    the result of the comparison is false.) Optimizations like this are one of
    the reasons undefined behavior can be non-causal: its effects can show up
    before the offending code runs (you just have to be sure that the code
    causing the undefined behavior would have been executed). And note that
    optimizers do such propagation across more than the current function; they
    can potentially do it for the whole program, and that's the way they are
    heading.
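
A minimal sketch of how that fragment could be restructured so the increment is never reached when i equals INT_MAX, which leaves nothing undefined for an optimizer to reason from. The helper name `bump` and its leave-unchanged policy are assumptions made for illustration; they do not come from the post above.

```c
#include <limits.h>

/* Hypothetical helper: increment *i unless that would overflow.
 * Returns 1 if the increment happened, 0 if *i was already INT_MAX. */
int bump(int *i)
{
    if (*i == INT_MAX) {
        /* Handle the edge case and return before the increment,
         * so the overflowing ++ is never executed. */
        return 0;
    }
    ++*i;
    return 1;
}
```

Because the early return dominates the increment, the INT_MAX branch can no longer be treated as unreachable.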

    Yours,

    --
    Jean-Marc


    • Keith Thompson

      #3
      Re: UDB and pointer increments and decrements

      Richard <rgrdev@gmail.com> writes:
      > I'm still battling with this causing UDB:
      >
      > while(e-- > s);
      >
      > If s points to the start of a string and e becomes less than s, then e is
      > not really pointing to a defined char. Fine.
      >
      > But UDB?

      A small note: You're the only person I've ever seen refer to undefined
      behavior as "UDB". Most posters here (at least those who choose to
      abbreviate it) refer to it as "UB". Why do you feel the need to
      invent your own abbreviation when there's already a perfectly good one
      in widespread use? (One could argue that "UB" could also mean
      unspecified behavior, but I've never seen it used that way, and it's
      generally clear enough from the context.)

      Yes, the behavior is undefined, simply because the standard doesn't
      define the behavior. That's all "undefined behavior" means.

      > Yes, e has a UDV (undefined value), but would this really cause a
      > program to misbehave? On any platform? Remember this value of e is never
      > used again in this case.

      Yes. I don't have a real-world example, but if the containing object
      happens to be allocated at the beginning of a memory segment, it could
      easily blow up. And, as has been mentioned elsethread, a compiler is
      allowed to *assume* that undefined behavior does not occur, and
      perform code transformations based on that assumption (after all, if
      the behavior is already undefined, it can't make things worse); that
      may be a more realistic risk for most modern systems.

      > I ask because theoretically s can be pointing to the middle of a bigger
      > string. We then call a function with s as a parameter.

      Undefined behavior occurs if a pointer is decremented past the
      beginning of an array object, not if it's decremented past the initial
      value of a function parameter. Given this:

      char s[100];

      char *func(char *ptr) { return ptr - 1; }

      calling func(s+10) has well-defined behavior, but calling func(s) has
      undefined behavior. (I haven't compiled the above, so there may be
      some dumb mistakes.)
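
For the loop in the original question, one way to stay within defined behavior is to compare before decrementing, so the pointer never moves below s. A sketch under that assumption, using the hypothetical name `rfind_char`:

```c
#include <stddef.h>

/* Hypothetical helper: search the range [s, e) backward for c.
 * Returns a pointer to the last occurrence, or NULL if c is absent.
 * The test e > s runs before each decrement, so s - 1 is never formed. */
const char *rfind_char(const char *s, const char *e, char c)
{
    while (e > s) {
        --e;
        if (*e == c)
            return e;
    }
    return NULL;
}
```

This works the same whether s is the first element of its array or points into the middle of a larger one, because e never leaves the range the caller passed in.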

      > The function called can have no idea that s is a pointer into the middle
      > of a string.

      Right.

      > Therefore it can have no idea how to "do undefined things" when
      > e is decremented past the start of s. e and s are strictly char *s.

      It doesn't deliberately "do undefined things"; that's not the point.
      The point is that the standard doesn't define what it does. In my
      example above, I'm thinking of a hypothetical system on which
      constructing the pointer value s-1 causes a hardware trap (because s
      is allocated at the beginning of a segment, and the hardware
      "decrement address" instruction traps in this case). The code
      generated for the body of the function has no awareness of this.

      For example, assume an implementation on which signed integer overflow
      causes a trap.

      int func(int n) { return n + 1; }

      func(42) has well-defined behavior, and returns 43. func(INT_MAX) has
      undefined behavior, and (on this particular implementation) causes a
      trap (or does something arbitrarily strange if an optimizing compiler
      rearranges code based on the assumption that no UB occurs). The
      function has no awareness of this; it just returns the result of n +
      1.

      > It would be so "not C" if the compiler generated code to check the
      > contents pointed to, to determine the range of the object to the middle
      > of which s points. I mean, then we may as well have array limits and
      > exceptions built into the language.

      The compiler is *allowed* to perform such checks, but it's not
      required to. That's why the behavior is undefined, rather than being
      defined to do whatever a failing check would do.

      > I'm not being difficult here. Explain how this works. My problem (and I
      > admit it's a problem) is that I feel too much of C is being elevated to
      > an almost ADA type status and (in this group) C is losing that "down and
      > dirty and efficient" feeling which it is famous for.

      (It's "Ada", not "ADA".)

      C loses none of its "down and dirty and efficient" feeling because of
      this. In fact, the generated code can gain in efficiency because the
      compiler is allowed to trust the user to avoid undefined behavior and
      to perform aggressive optimization based on that assumption.

      A C implementation that does exactly what you seem to expect it to do
      (treat addresses as simple integers, allow arbitrary addresses to be
      computed, etc.) would be conforming. An implementation that performs
      aggressive bounds checking can also be conforming.

      --
      Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
      Nokia
      "We must do something. This is something. Therefore, we must do this."
      -- Antony Jay and Jonathan Lynn, "Yes Minister"


      • Keith Thompson

        #4
        Re: UDB and pointer increments and decrements

        Richard <rgrdev@gmail.com> writes:
        [...]
        >> checks mandatory, the behavior could not be undefined - it would have
        >> to be either standard-defined or implementation-defined. Because the
        >> behavior is undefined, an implementation is currently free to deal
        >> with array limits by ignoring them.
        >
        > And them remaining undefined? Unspecified would have been better surely?

        Better how?

        Unspecified behavior is "use of an unspecified value, or other
        behavior where this International Standard provides two or more
        possibilities and imposes no further requirements on which is chosen
        in any instance".

        For the behavior of, for example, attempting to access an array
        outside its bounds to be unspecified rather than undefined, the
        standard would have to provide a number of possible behaviors, and
        anything other than one of those behaviors would be non-conforming.

        Suppose I have an array object declared within a function, and I write
        to element -1 of that array. I could clobber nearly anything,
        including the function's stored return address or some other vital
        piece of information. How would you restrict the possible
        consequences of that to "two or more possibilities"?
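
For contrast, a small sketch of behavior that really is unspecified rather than undefined: the order in which the two operands of + are evaluated. The names `left`, `right`, `sum_both` and `last_order` are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>

static char order[3];   /* records which operand function ran first */
static size_t pos;

static int left(void)  { order[pos++] = 'L'; return 1; }
static int right(void) { order[pos++] = 'R'; return 2; }

/* The sum is always 3, but whether the log reads "LR" or "RL" is
 * unspecified: the standard permits either order and nothing else. */
int sum_both(void)
{
    pos = 0;
    order[0] = order[1] = order[2] = '\0';
    return left() + right();
}

/* Returns the evaluation order recorded by the last sum_both() call. */
const char *last_order(void)
{
    return order;
}
```

Here the set of outcomes can be enumerated in advance, which is exactly what cannot be done for a write to element -1 of an array.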

        [snip]
        > I appreciate the time you have taken to explain. I would still love
        > someone to explain the case I asked about above though. The one where s
        > is pointing into the middle of an array. Or did you and I didn't
        > understand?

        See my other recent response in this thread.

        --
        Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
        Nokia
        "We must do something. This is something. Therefore, we must do this."
        -- Antony Jay and Jonathan Lynn, "Yes Minister"


        • Eric Sosman

          #5
          Re: UDB and pointer increments and decrements

          Richard wrote:
          > I'm still battling with this causing UDB:
          >
          > while(e-- > s);
          >
          > If s points to the start of a string and e becomes less than s, then e is
          > not really pointing to a defined char. Fine.
          >
          > But UDB?

          Yes. 6.5.6p8, penultimate sentence.

          > Yes, e has a UDV (undefined value), but would this really cause a
          > program to misbehave? On any platform? Remember this value of e is never
          > used again in this case.

          The "UDV" need not even exist. Undefined behavior is not limited
          to generating an indeterminate value.

          > I ask because theoretically s can be pointing to the middle of a bigger
          > string. We then call a function with s as a parameter.

          No problem. Decrementing a pointer to the first element of this
          string is then well-defined, because the result points to an extant
          element of the larger array.

          > The function called can have no idea that s is a pointer into the middle
          > of a string. Therefore it can have no idea how to "do undefined things" when
          > e is decremented past the start of s. e and s are strictly char *s. It
          > would be so "not C" if the compiler generated code to check the contents
          > pointed to, to determine the range of the object to the middle of which s
          > points. I mean, then we may as well have array limits and exceptions
          > built into the language.

          It's not clear what you're getting at, or why you think any
          checking is necessary or implied. One of the reasons the Standard
          leaves things undefined is to *relieve* implementations of the burden
          of checking for errors. The benefit is that the generated code can
          be simpler and faster (dramatically so, in some cases), and the
          penalty is that there's no way to guarantee what happens when an
          error goes undetected.

          The Standard *could* have required that the implementation detect
          out-of-range pointer use and raise SIGSLOPPY, but that's the "so not C"
          philosophy that you mention. Instead, the Standard says "Try to
          generate an out-of-range pointer and all bets are off; I wash my hands
          of you and refuse to make any promises about what will or won't happen.
          Hasta la vista, baby." That's what "undefined behavior" means.

          > I'm not being difficult here. Explain how this works. My problem (and I
          > admit it's a problem) is that I feel too much of C is being elevated to
          > an almost ADA type status and (in this group) C is losing that "down and
          > dirty and efficient" feeling which it is famous for.

          Portability is one of many aspects a program can have, in greater
          or lesser degree. It is seldom if ever the only important aspect, nor
          even at the front of the line. Sometimes portability is compromised
          for a good reason, and I don't think you'll find anyone who says
          otherwise.

          But when portability is sacrificed for no reason, out of ignorance
          ("Right-shifting always propagates the sign bit"), or out of laziness
          ("It's easier to write `2' than `sizeof(int)'"), or out of sloppiness
          ("Don't worry where the pointer points; we'll only use it if it's OK"),
          or even out of arrogance ("All systems are just like mine"), then it's
          worth pointing out the sacrifice and suggesting safer alternatives.

          It's also worth noting that "efficient" is not an antonym of
          "portable" and not a synonym of "dirty."
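
One safer alternative of the kind suggested above is to walk backward with an unsigned index instead of a pointer. A sketch, assuming a hypothetical helper `rtrim_len` that reports the length of a buffer once trailing spaces are dropped:

```c
#include <stddef.h>

/* Hypothetical helper: length of s[0..n) after dropping trailing spaces.
 * i is an unsigned index, so the final i-- that wraps past zero is
 * well-defined arithmetic, unlike decrementing a pointer below s. */
size_t rtrim_len(const char *s, size_t n)
{
    size_t i = n;
    while (i-- > 0) {
        if (s[i] != ' ')
            return i + 1;
    }
    return 0;
}
```

The loop has the same shape as `while(e-- > s)`, but the value that goes "out of range" is an unsigned integer, for which wraparound is defined, and it is never used as a subscript afterwards.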

          --
          Eric.Sosman@sun .com


          • Flash Gordon

            #6
            Re: UDB and pointer increments and decrements

            Richard wrote, On 23/09/08 16:44:
            > I'm still battling with this causing UDB:
            >
            > while(e-- > s);
            >
            > If s points to the start of a string and e becomes less than s, then e is
            > not really pointing to a defined char. Fine.
            >
            > But UDB?

            <snip>

            > I'm not being difficult here. Explain how this works. My problem (and I
            > admit it's a problem) is that I feel too much of C is being elevated to
            > an almost ADA type status and (in this group) C is losing that "down and
            > dirty and efficient" feeling which it is famous for.

            Another poster and I suggested an object starting at the beginning
            of a page or segment and *hardware* that traps on trying to decrement to
            before the start of the page/segment. No software checks need be involved!
            --
            Flash Gordon
            If spamming me, send it to smap@spam.causeway.com
            If emailing me use my reply-to address
            See the comp.lang.c Wiki hosted by me at http://clc-wiki.net/


            • Old Wolf

              #7
              Re: UDB and pointer increments and decrements

              On Sep 24, 3:44 am, Richard <rgr...@gmail.com> wrote:
              > I'm still battling with this causing UDB:
              >
              > while(e-- > s);
              >
              > If s points to the start of a string and e becomes less than s, then e is
              > not really pointing to a defined char. Fine.

              What if the string is at the very start of
              the address space? Where does 'e' point after
              decrementing it?

              There are CPUs or MMUs that will trap upon
              loading of an obviously bogus pointer such
              as this one that doesn't even describe a
              memory location that exists.


              • Rosario

                #8
                Re: UDB and pointer increments and decrements

                "Richard" <rgrdev@gmail.com> wrote in message
                news:gbb2rg$aai$1@registered.motzarella.org...
                >
                > I'm still battling with this causing UDB:
                >
                > while(e-- > s);
                >
                > If s points to the start of a string and e becomes less than s, then e is
                > not really pointing to a defined char. Fine.
                >
                > But UDB?
                >
                > Yes, e has a UDV (undefined value), but would this really cause a
                > program to misbehave? On any platform? Remember this value of e is never
                > used again in this case.
                >
                > I ask because theoretically s can be pointing to the middle of a bigger
                > string. We then call a function with s as a parameter.

                If you want a language about which there is little to say, use assembly.

                From what I can see in this group, talking about the various "UB"
                (undefinite behaviours) takes up more time than programming.




                • James Kuyper

                  #9
                  Re: UDB and pointer increments and decrements

                  Rosario wrote:
                  ....
                  > From what I can see in this group, talking about the various "UB"
                  > (undefinite behaviours) takes up more time than programming.

                  That's because undefined (not "undefinite") behavior is the single most
                  serious kind of problem C code can have. It's also because most code
                  that people bring to this group because they're having problems with it,
                  has undefined behavior. That's a selection effect; syntax errors and
                  constraint violations are easily caught by the compiler; the programs
                  that actually compile and fail tend to have subtler problems, usually
                  involving undefined behavior.


                  • Tim Rentsch

                    #10
                    Re: UDB and pointer increments and decrements

                    jameskuyper@verizon.net writes:
                    > Richard wrote:
                    >> I'm still battling with this causing UDB:
                    >>
                    >> while(e-- > s);
                    >>
                    >> If s points to the start of a string and e becomes less than s, then e is
                    >> not really pointing to a defined char. Fine.
                    >>
                    >> But UDB?
                    >>
                    >> Yes, e has a UDV (undefined value), but would this really cause a
                    >> program to misbehave? On any platform? Remember this value of e is never
                    >> used again in this case.
                    >>
                    >> I ask because theoretically s can be pointing to the middle of a bigger
                    >> string. We then call a function with s as a parameter.
                    >> The function called can have no idea that s is a pointer into the middle
                    >> of a string. Therefore it can have no idea how to "do undefined things" when
                    >> e is decremented past the start of s. e and s are strictly char *s. It
                    >> would be so "not C" if the compiler generated code to check the contents
                    >> pointed to, to determine the range of the object to the middle of which s
                    >> points. I mean, then we may as well have array limits and exceptions
                    >> built into the language.
                    >
                    > It's too late - the language that makes the behavior undefined was
                    > inserted into the standard precisely for the purpose of allowing (but
                    > not mandating) array limit checks. [...]

                    Nonsense. Allowing a pointer to be decremented to before the
                    start of an array is still compatible with doing array limit
                    checks, just as allowing a pointer to be incremented past the end
                    of an array is compatible with doing array limit checks.
                    The rationale document makes clear that decrementing a pointer
                    to before the start of an array was rejected because it would
                    impose overly burdensome requirements on implementations.
                    Array limit checks are equally possible whether e-- is allowed
                    or not.
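
The one-past-the-end case mentioned above can be sketched as follows. The function name `sum_ints` is a hypothetical chosen for illustration; the point is only that a + n may be formed and compared against, while a - 1 may not.

```c
#include <stddef.h>

/* Hypothetical helper: sum n ints using a one-past-the-end pointer.
 * Forming a + n and comparing against it is permitted;
 * dereferencing it is not, and the loop never does. */
long sum_ints(const int *a, size_t n)
{
    const int *end = a + n;   /* one past the last element: valid to form */
    long total = 0;
    for (const int *p = a; p != end; ++p)
        total += *p;
    return total;
}
```

There is no mirror-image "one before the start" guarantee, so a backward loop cannot use a - 1 as its sentinel in the same way.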

