Realloc and pointer arithmetics

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • lithiumcat@gmail.com

    Realloc and pointer arithmetics

    Hi,

    maybe you remember me, some time ago I asked about how to store an
    integer value into a void*, and I learned that doing pointer
    arithmetic yeilding a pointer outside of an object (except the one-
    after-last thingy) is undefined behaviour.

    Actually I was trying to associate a function pointer with a key,
    through an AVL tree that managed void* data. Function pointers can't
    be stored in void* (that is, the standard does not garantee it works),
    so I stored them in a dynamic array and that was the index in said
    array that I wanted to store in the void* of the tree.

    My frist thought was actually to store into the tree pointers to array
    elements, instead of their index. The problem is, the array being
    dynamic, realloc() might change its location, and then all the
    pointers in my tree become invalid.

    So I thought about putting in my tree pointers to array elements, and
    then when realloc() changes the base, correct all the nodes of the
    tree accordingly. Something like that:

    fnptr* oldbase = base_of_dynamic _fnptr_array;
    increase_dynami c_array_size();
    fnptr* newbase = base_of_dynamic _fnptr_array;
    if (oldbase != newbase)
    for (void** storage_locatio n in the tree for all nodes) {
    fnptr* oldptr = *storage_locati on;
    int index = oldptr - oldbase;
    *storage_locati on = newbase + index; }

    From what I googled this is the usual way of dealing with pointers to
    elements inside a realloc()ed array, and I think it works just fine on
    many platforms.

    But is this garanteed to work? According to my (poor) understanding of
    the standard, if realloc() returns the original pointer, then oldbase
    and newbase are both valid pointers which compare equal, so there is
    no problem. But if realloc() changes the location of the dynamic
    array, oldbase becomes an invalid pointer. I think in that case
    oldbase is garanteed to compare unequal with newbase, which is a valid
    pointer, but using it to compute "index" is undefined, right?

    And if the above code is not garanteed to work by the standard, is
    there any portable way of handling dynamic arrays moved by realloc()?
    (besides never using pointers to elements inside a dynamic array)
  • Flash Gordon

    #2
    Re: Realloc and pointer arithmetics

    lithiumcat@gmai l.com wrote, On 18/04/08 16:33:
    Hi,
    >
    maybe you remember me, some time ago I asked about how to store an
    integer value into a void*, and I learned that doing pointer
    arithmetic yeilding a pointer outside of an object (except the one-
    after-last thingy) is undefined behaviour.
    I remember there was a discussion about this.
    Actually I was trying to associate a function pointer with a key,
    through an AVL tree that managed void* data. Function pointers can't
    be stored in void* (that is, the standard does not garantee it works),
    Well, the C standard does not. However, if you limit yourself to a
    suitable subset of C implementations you might find another standard
    that does. For instance, I believe that the Posix standard provides this
    guarantee, so it is possible. Note that as you are going beyond what the
    C standard guarantees you need to select which implementations you are
    interested in and select a method common to them.
    so I stored them in a dynamic array and that was the index in said
    array that I wanted to store in the void* of the tree.
    >
    My frist thought was actually to store into the tree pointers to array
    elements, instead of their index. The problem is, the array being
    dynamic, realloc() might change its location, and then all the
    pointers in my tree become invalid.
    OK, you have avoided the first trap.
    So I thought about putting in my tree pointers to array elements, and
    then when realloc() changes the base, correct all the nodes of the
    tree accordingly. Something like that:
    >
    fnptr* oldbase = base_of_dynamic _fnptr_array;
    increase_dynami c_array_size();
    fnptr* newbase = base_of_dynamic _fnptr_array;
    if (oldbase != newbase)
    for (void** storage_locatio n in the tree for all nodes) {
    fnptr* oldptr = *storage_locati on;
    int index = oldptr - oldbase;
    *storage_locati on = newbase + index; }
    >
    From what I googled this is the usual way of dealing with pointers to
    elements inside a realloc()ed array, and I think it works just fine on
    many platforms.
    >
    But is this garanteed to work?
    No.
    According to my (poor) understanding of
    the standard, if realloc() returns the original pointer, then oldbase
    and newbase are both valid pointers which compare equal, so there is
    no problem.
    Correct.
    But if realloc() changes the location of the dynamic
    array, oldbase becomes an invalid pointer. I think in that case
    oldbase is garanteed to compare unequal with newbase, which is a valid
    pointer, but using it to compute "index" is undefined, right?
    No, it is *not* guaranteed to compare unequal. Just evaluating it
    invokes undefined behaviour and so could crash your program.
    And if the above code is not garanteed to work by the standard, is
    there any portable way of handling dynamic arrays moved by realloc()?
    (besides never using pointers to elements inside a dynamic array)
    The way to deal with it is to use indices rather than pointers.

    Actually, for your problem as you need something beyond what the C
    standard guarantees I would be inclined to rely on the guarantees of
    Posix and the behaviour of Windows and just store the function pointers
    in the void* if that would cover all the required platforms. Then the
    code is far simpler.
    --
    Flash Gordon

    Comment

    • Chris Torek

      #3
      Re: Realloc and pointer arithmetics

      In article <l43nd5xt4m.ln2 @news.flash-gordon.me.uk>
      Flash Gordon <spam@flash-gordon.me.ukwro te:
      >Actually, for your problem as you need something beyond what the C
      >standard guarantees I would be inclined to rely on the guarantees of
      >Posix and the behaviour of Windows and just store the function pointers
      >in the void* if that would cover all the required platforms. Then the
      >code is far simpler.
      I tend to agree with Flash Gordon here: the engineering tradeoff
      between "greater portability" and "simpler but non-portable code"
      seems to be weighted towards the "simpler but non-portable" version.

      You might also want to attempt, as much as possible anyway, to
      isolate the non-portable code to a replaceable module, so that
      in the future, when the code is moved to a machine on which the
      old non-portable method fails, a new method can be substituted
      with minimal pain. (The new method can either continue to be
      non-portable, or chosen to be portable, depending on the newest
      set of engineering tradeoffs.)

      The other "obvious" option is to modify the AVL tree code, so
      that the "payload" part of the tree is a union:

      union avl_tree_value_ union {
      void *value_if_data_ pointer;
      void (*value_if_func _ptr)(void);
      int value_if_int;
      };

      (which is in fact portable, but requires modifying the AVL tree
      code).
      --
      In-Real-Life: Chris Torek, Wind River Systems
      Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
      email: gmail (figure it out) http://web.torek.net/torek/index.html

      Comment

      • Hallvard B Furuseth

        #4
        Re: Realloc and pointer arithmetics

        Flash Gordon writes:
        >lithiumcat@gma il.com wrote, On 18/04/08 16:33:
        >
        >According to my (poor) understanding of
        >the standard, if realloc() returns the original pointer, then oldbase
        >and newbase are both valid pointers which compare equal, so there is
        >no problem.
        >
        Correct.
        No. The realloc definition does not acknowledge any original pointer.
        It just says realloc deallocates the old & returns a new object.
        Comparing oldbase with anything yields undefined behavior, so how do you
        know realloc returned the original pointer anyway? The compiler can
        catch you at it.

        E.g. if it knows that some variable contains the now-indeterminate
        oldbase value, it may overwrite that variable. (Well, unless the
        program can detect that the bit pattern in the variable changed, I guess
        - by reading the variable as character data.)


        A more esoteric variant I seem to remember seeing in comp.<lang/std>.c:
        If malloc/realloc can affect page maps rather than moving memory around,
        the new pointer could even have the same bit representation as the old
        pointer - and yet be a different pointer. And realloc need then not
        ensure that the value (old pointer + pagesize) still refers to the
        corresponding (new pointer + pagesize).

        --
        Hallvard

        Comment

        • Flash Gordon

          #5
          Re: Realloc and pointer arithmetics

          Hallvard B Furuseth wrote, On 20/04/08 17:49:
          Flash Gordon writes:
          >lithiumcat@gmai l.com wrote, On 18/04/08 16:33:
          >>
          >>According to my (poor) understanding of
          >>the standard, if realloc() returns the original pointer, then oldbase
          >>and newbase are both valid pointers which compare equal, so there is
          >>no problem.
          >Correct.
          >
          No. The realloc definition does not acknowledge any original pointer.
          It just says realloc deallocates the old & returns a new object.
          Comparing oldbase with anything yields undefined behavior, so how do you
          know realloc returned the original pointer anyway?
          You can use memcmp. Of course, it might give false negatives. However,
          if the bit pattern is the same I don't see any way for it not to still
          be valid.
          The compiler can
          catch you at it.
          >
          E.g. if it knows that some variable contains the now-indeterminate
          oldbase value, it may overwrite that variable. (Well, unless the
          program can detect that the bit pattern in the variable changed, I guess
          - by reading the variable as character data.)
          Which the program is allowed to do :-)
          A more esoteric variant I seem to remember seeing in comp.<lang/std>.c:
          If malloc/realloc can affect page maps rather than moving memory around,
          the new pointer could even have the same bit representation as the old
          pointer - and yet be a different pointer. And realloc need then not
          ensure that the value (old pointer + pagesize) still refers to the
          corresponding (new pointer + pagesize).
          If the bit patterns of old pointer and new pointer are the same it is
          difficult to see how old pointer can fail to point at the correct place.

          However, the OP was explicitly talking about if the original pointer was
          returned not a pointer that happened to have the same bit pattern!

          I agree that there is no sensible way to check of the old pointer and
          new pointer are the same and it is pointless to try.
          --
          Flash Gordon

          Comment

          • Hallvard B Furuseth

            #6
            Re: Realloc and pointer arithmetics

            Flash Gordon writes:
            >Hallvard B Furuseth wrote, On 20/04/08 17:49:
            >>Flash Gordon writes:
            >>>lithiumcat@g mail.com wrote, On 18/04/08 16:33:
            >>>
            >>>According to my (poor) understanding of
            >>>the standard, if realloc() returns the original pointer, then oldbase
            >>>and newbase are both valid pointers which compare equal, so there is
            >>>no problem.
            >>Correct.
            >>
            >No. The realloc definition does not acknowledge any original pointer.
            >It just says realloc deallocates the old & returns a new object.
            >Comparing oldbase with anything yields undefined behavior, so how do you
            >know realloc returned the original pointer anyway?
            >
            You can use memcmp. Of course, it might give false negatives.
            And as I said below, it can give false positives too.
            However, if the bit pattern is the same I don't see any way for it not
            to still be valid.
            What's "valid"? What does "the pointers are the same" mean? The
            standard doesn't define any of this, so you have to define it first.
            >The compiler can
            >catch you at it.
            >E.g. if it knows that some variable contains the now-indeterminate
            >oldbase value, it may overwrite that variable. (Well, unless the
            >program can detect that the bit pattern in the variable changed, I guess
            >- by reading the variable as character data.)
            >
            Which the program is allowed to do :-)
            Sure, but most programs don't. And while you can use memcmp and thus
            freeze the bit pattern or something, it's still hard to decide just what
            that means in theory. In practice it may well make sense to decide you
            don't support too esoteric architectures.

            After a memcmp the old pointer value and pointers computed from it
            remain invalid when used as pointers, but you could use them anyway if
            you are sure you've successfully protected them from the compiler (and
            compilers are getting smarter all the time). Preferably you'd instead
            use the new pointer and just make a note that it matches the old one,
            but that doesn't help old pointers computed from the original old one.
            >A more esoteric variant I seem to remember seeing in comp.<lang/std>.c:
            >If malloc/realloc can affect page maps rather than moving memory around,
            >the new pointer could even have the same bit representation as the old
            >pointer - and yet be a different pointer. And realloc need then not
            >ensure that the value (old pointer + pagesize) still refers to the
            >correspondin g (new pointer + pagesize).
            >
            If the bit patterns of old pointer and new pointer are the same it is
            difficult to see how old pointer can fail to point at the correct place.
            >
            However, the OP was explicitly talking about if the original pointer was
            returned not a pointer that happened to have the same bit pattern!
            Yes. That's why I said it can adjust page maps. And an address range
            viewed as integeres need not be a sequence of contiguous numbers.
            Realloc can grow a malloced area by affecting the OS's mapping of
            virtual memory to physical memory. If it needs a new page to be put at
            the end of an address range, it just asks the OS to put insert a page
            there and tell it the page number. Pointer arithmetic which crosses
            page boundaries will need help from the virtual memory page maps.

            And - since realloc invalidates all pointers into the malloced address
            range, realloc might be a good time to normalize the page maps of that
            address range, or something like that. Then even if the bit pattern of
            a pointer to the start of the address range remains the same, the bit
            pattern of the value (start of alloced area + some offset) can change.

            I don't remember if that was the actual example (probably it wasn't),
            but it is possible and stuff like that has been done.
            I agree that there is no sensible way to check of the old pointer and
            new pointer are the same and it is pointless to try.
            --
            Hallvard

            Comment

            • Hallvard B Furuseth

              #7
              Re: Realloc and pointer arithmetics

              I wrote:
              Yes. That's why I said it can adjust page maps. And an address range
              viewed as integeres need not be a sequence of contiguous numbers.
              Realloc can grow a malloced area by affecting the OS's mapping of
              virtual memory to physical memory. If it needs a new page to be put at
              the end of an address range, it just asks the OS to put insert a page
              there and tell it the page number. Pointer arithmetic which crosses
              page boundaries will need help from the virtual memory page maps.
              If this wasn't clear: The point of the nonlinear address space is, then
              there need be no such concept as an unused area between two malloced
              address ranges, so "there is not enough room after the malloced area"
              cannot happen. As with a physical folder, you can always insert a new
              page after any particular page. Until the folder bursts, anyway.

              --
              Hallvard

              Comment

              Working...