Variable structure, constant length allocations

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Andrew Smallshaw

    Variable structure, constant length allocations

    I'm working on a data structure that began life as a skip list
    derivative, but has evolved to the point that it now only has a
    passing resemblance to them. Each node of this structure has a
    few constant size variables (ints and the like), an array holding
    the actual data (chars in this case) and a random number of pointers
    to other nodes in the structure.

    At the moment my code uses a fixed size array for the contents
    (which may or may not be filled) and the call to malloc for each
    node is adjusted to allow for the number of the pointers in the
    node. Nothing too special there.

    However, I am using a _lot_ of these structures and a significant
    proportion have comparatively short average lifetimes. With the
    expection of a few configuration tables that rarely, if ever, change
    once initialised they are my app's only dynamically allocated memory
    and I'm slightly concerned about memory fragmentation because they
    are of differing lengths. Simply allocating the maximum amount of
    memory that could be needed is an unacceptable waste of memory
    since the balance is skewed strongly towards low numbers of pointers
    - 50% have only a single pointer, 25% two pointers, 12.5% three
    and so on up to a maximum of about 25 pointers.

    It occurs to me that it is possible to define a fixed sized structure
    with a variable number of pointer if the length of the character
    length varies in inverse proportion to the number of pointers.
    The question is how to do it cleanly. Unions are out of the question
    because the number of possible structure types. I considered having
    the character buffer start at the same point as the second pointer
    and computing the real start of the buffer as needed. However, I
    am now leaning towards defining the character array as a maximum
    buffer size and having a single pointer array at the end. That
    is:

    struct structure {
    char buffer[MAX_BUFFER_LEN];
    struct structure *ptr[1];
    };

    and using _negative_ array indices to store the extra pointers. I
    can easily determine how many pointers are expected algorithmically
    - it is something I need to keep track of in any case - and define
    a macro to calculate the buffer's real maximum length when it is
    needed (not very often). This should work fine but I can't help
    feeling it seems a little messy. Does anyone have any better
    solutions, or even any views on whether this is worth bothering
    about?

    --
    Andrew Smallshaw
    andrews@sdf.lon estar.org
  • jameskuyper

    #2
    Re: Variable structure, constant length allocations

    Andrew Smallshaw wrote:
    ....
    It occurs to me that it is possible to define a fixed sized structure
    with a variable number of pointer if the length of the character
    length varies in inverse proportion to the number of pointers.
    The question is how to do it cleanly. Unions are out of the question
    because the number of possible structure types. I considered having
    the character buffer start at the same point as the second pointer
    and computing the real start of the buffer as needed. However, I
    am now leaning towards defining the character array as a maximum
    buffer size and having a single pointer array at the end. That
    is:
    >
    struct structure {
    char buffer[MAX_BUFFER_LEN];
    struct structure *ptr[1];
    };
    >
    and using _negative_ array indices to store the extra pointers.
    Negative array indices are a constraint violation. It might work on
    some systems, but it's not going to be portable. One safer approach
    would be to simply store an array of 25 pointers in your structure, or
    MAX_BUFFER_LEN/sizeof(struct structure *), whichever is larger. Insert
    new pointers starting from the top of the array and working toward the
    bottom, while accessing the unused bytes for use as your buffer at the
    bottom by treating the entire structure as an array of char. It's
    tricky and dangerous, I certainly would prefer a method that was
    safer, even at the expense of a little extra space. You have to be
    very careful to avoid overlap between the portion used for pointers
    and the portion used for your buffer. However, if done right, it would
    not violate any constraints and would have defined behavior.

    Comment

    • Richard Heathfield

      #3
      Re: Variable structure, constant length allocations

      jameskuyper said:

      <snip>
      Negative array indices are a constraint violation.
      Which constraint do they violate?

      <snip>

      --
      Richard Heathfield <http://www.cpax.org.uk >
      Email: -http://www. +rjh@
      Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
      "Usenet is a strange place" - dmr 29 July 1999

      Comment

      • Eric Sosman

        #4
        Re: Variable structure, constant length allocations

        Andrew Smallshaw wrote:
        [...]
        It occurs to me that it is possible to define a fixed sized structure
        with a variable number of pointer if the length of the character
        length varies in inverse proportion to the number of pointers.
        The question is how to do it cleanly. Unions are out of the question
        because the number of possible structure types. I considered having
        the character buffer start at the same point as the second pointer
        and computing the real start of the buffer as needed. However, I
        am now leaning towards defining the character array as a maximum
        buffer size and having a single pointer array at the end. That
        is:
        >
        struct structure {
        char buffer[MAX_BUFFER_LEN];
        struct structure *ptr[1];
        };
        >
        and using _negative_ array indices to store the extra pointers. I
        can easily determine how many pointers are expected algorithmically
        - it is something I need to keep track of in any case - and define
        a macro to calculate the buffer's real maximum length when it is
        needed (not very often). This should work fine but I can't help
        feeling it seems a little messy. Does anyone have any better
        solutions, or even any views on whether this is worth bothering
        about?
        A cleaner approach might use a union instead of a struct:

        union onion {
        char buffer[MAX_BUFFER_LEN];
        union onion *ptr[
        (MAX_BUFFER_LEN + sizeof(union onion*) - 1)
        / sizeof(union onion*)];
        };

        (The ugly arithmetic could be beautified by defining MAX_BUFFER_LEN
        as a multiple of sizeof(union onion*).)

        Another possibility is to store the characters just after the
        struct itself, possibly with a convenient pointer to them in the
        fixed portion of the struct. This snippet uses "the struct hack;"
        you could also use C99's flexible array member:

        struct structure {
        char *buffer;
        int this;
        double that;
        struct structure *ptr[1];
        } *p;
        p = malloc(offsetof (struct structure, ptr)
        + POINTER_CENSUS * sizeof(struct structure*)
        + strlen(the_stri ng) + 1);
        if (p == NULL) ...
        p->buffer = (char*)p
        + offsetof(struct structure, ptr)
        + POINTER_CENSUS * sizeof(struct structure*);
        p->this = 42;
        p->that = sqrt(42.0);
        strcpy (p->buffer, the_string);

        "Is it worth bothering about?" Dunno. If MAX_BUFFER_LEN is
        small and the number of structs/unions is large, then maybe. If
        MAX_BUFFER_LEN is large and/or the population is small, probably
        not. You'll need to measure in the context of your own program.

        --
        Eric.Sosman@sun .com

        Comment

        • jameskuyper

          #5
          Re: Variable structure, constant length allocations

          Richard Heathfield wrote:
          jameskuyper said:
          ....
          Negative array indices are a constraint violation.
          >
          Which constraint do they violate?
          Sorry - that was undefined behavior (6.5.6p8), not a constraint
          violation.

          I knew there was something wrong with that message, but I couldn't
          figure out what. I've got to stop answering these questions too
          quickly.

          Comment

          • Richard Heathfield

            #6
            Re: Variable structure, constant length allocations

            jameskuyper said:
            Richard Heathfield wrote:
            >jameskuyper said:
            ...
            Negative array indices are a constraint violation.
            >>
            >Which constraint do they violate?
            >
            Sorry - that was undefined behavior (6.5.6p8), not a constraint
            violation.
            Note that, whilst negative array indices do indeed invoke undefined
            behaviour, that doesn't mean that x[-1] is necessarily undefined. It isn't
            the negative index that's the problem. The problem is that the underlying
            pointer arithmetic ends up pointing before the beginning of the array. The
            following code is well-defined:

            int array[20];
            void foo(void)
            {
            int *p = array + 10;
            p[-5] = 42; /* linguistically: perfectly okay;
            stylistically: vile
            */
            }

            --
            Richard Heathfield <http://www.cpax.org.uk >
            Email: -http://www. +rjh@
            Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
            "Usenet is a strange place" - dmr 29 July 1999

            Comment

            • Gene

              #7
              Re: Variable structure, constant length allocations

              On Nov 10, 4:25 pm, Andrew Smallshaw <andr...@sdf.lo nestar.orgwrote :
              I'm working on a data structure that began life as a skip list
              derivative, but has evolved to the point that it now only has a
              passing resemblance to them.  Each node of this structure has a
              few constant size variables (ints and the like), an array holding
              the actual data (chars in this case) and a random number of pointers
              to other nodes in the structure.
              >
              At the moment my code uses a fixed size array for the contents
              (which may or may not be filled) and the call to malloc for each
              node is adjusted to allow for the number of the pointers in the
              node.  Nothing too special there.  
              >
              However, I am using a _lot_ of these structures and a significant
              proportion have comparatively short average lifetimes.  With the
              expection of a few configuration tables that rarely, if ever, change
              once initialised they are my app's only dynamically allocated memory
              and I'm slightly concerned about memory fragmentation because they
              are of differing lengths.  Simply allocating the maximum amount of
              memory that could be needed is an unacceptable waste of memory
              since the balance is skewed strongly towards low numbers of pointers
              - 50% have only a single pointer, 25% two pointers, 12.5% three
              and so on up to a maximum of about 25 pointers.
              >
              It occurs to me that it is possible to define a fixed sized structure
              with a variable number of pointer if the length of the character
              length varies in inverse proportion to the number of pointers.
              The question is how to do it cleanly.  Unions are out of the question
              because the number of possible structure types.  I considered having
              the character buffer start at the same point as the second pointer
              and computing the real start of the buffer as needed.  However, I
              am now leaning towards defining the character array as a maximum
              buffer size and having a single pointer array at the end.  That
              is:
              >
                 struct structure {
                    char buffer[MAX_BUFFER_LEN];
                    struct structure *ptr[1];
                 };
              >
              and using _negative_ array indices to store the extra pointers.  I
              can easily determine how many pointers are expected algorithmically
              - it is something I need to keep track of in any case - and define
              a macro to calculate the buffer's real maximum length when it is
              needed (not very often).  This should work fine but I can't help
              feeling it seems a little messy.  Does anyone have any better
              solutions, or even any views on whether this is worth bothering
              about?
              >
              Suggest you try something simple first. Implement an allocator that
              maintains N free lists for your N distinct possible node sizes. For
              skip lists, N ought to be no more than 50 or so. Replenish these free
              lists when empty by allocating standard-size blocks of a few pages (at
              least) to contain a bunch of equal-sized nodes. Alloc and free are
              now just a few instructions each to unchain and chain from the
              respective free list. Average fragmentation can't exceed half a node
              per block. If a given size goes dormant, its free list will be
              swapped out. This costs little, but if it's still too much, you can
              recycle blocks containing only free nodes: not hard to do with an
              allocation counter stored in each block header.



              Comment

              • Andrew Smallshaw

                #8
                Re: Variable structure, constant length allocations

                On 2008-11-11, Richard Heathfield <rjh@see.sig.in validwrote:
                >
                Note that, whilst negative array indices do indeed invoke undefined
                behaviour, that doesn't mean that x[-1] is necessarily undefined. It isn't
                the negative index that's the problem. The problem is that the underlying
                pointer arithmetic ends up pointing before the beginning of the array. The
                following code is well-defined:
                I was pretty sure that they were. I should be OK in this instance
                though? Although this is a 'real' negative index going past the
                beginning of an array, I know in advance it is going to clobber
                another array and thus won't hit any undefined memory.

                Of course, I'll have to be careful to respect the 'real' length of
                the preceding array but looking at my code it is easily refactored
                so that only comes up at one point. This code is going to be a
                little on the messy side but I think all things considered it will
                be an acceptable trade off.

                --
                Andrew Smallshaw
                andrews@sdf.lon estar.org

                Comment

                • James Kuyper

                  #9
                  Re: Variable structure, constant length allocations

                  Andrew Smallshaw wrote:
                  On 2008-11-11, Richard Heathfield <rjh@see.sig.in validwrote:
                  >Note that, whilst negative array indices do indeed invoke undefined
                  >behaviour, that doesn't mean that x[-1] is necessarily undefined. It isn't
                  >the negative index that's the problem. The problem is that the underlying
                  >pointer arithmetic ends up pointing before the beginning of the array. The
                  >following code is well-defined:
                  >
                  I was pretty sure that they were. I should be OK in this instance
                  though? ...
                  No.
                  ... Although this is a 'real' negative index going past the
                  beginning of an array, I know in advance it is going to clobber
                  another array and thus won't hit any undefined memory.
                  The wording of the standard makes it legal for an implementation to
                  implement pointers in such a way as to enable run-time bounds checking.
                  That would be sufficient to make your program blow up. However,
                  implementations which do that by default are rare, possibly
                  non-existent, so you're extremely unlikely to run into a problem for
                  that reason.

                  A far more realistic worry is that, because access memory through
                  negative offsets from an array has undefined behavior, an implementation
                  is allowed to optimize code in ways that depend upon the assumption that
                  such accesses will never be attempted. Your code will presumably access
                  the same memory through ptr[i] for negative values of 'i', and also
                  through positive offsets from buffer[]. I presume that it will carefully
                  synchronize the two uses of that memory so that they don't interfere
                  with each other. However, all of that careful synchronization will be
                  wasted: an implementation can legally optimize your code to delay reads
                  or writes to that memory in ways that will mess up your careful
                  synchronization . Such optimizations are allowed, because you're not
                  supposed to access buffer[] through ptr[], and vice versa.

                  Comment

                  Working...