Expanding buffer - response to "Determine the size of malloc" query

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • James Harris

    Expanding buffer - response to "Determine the size of malloc" query

    Initial issue: read in an arbitrary-length piece of text.
    Perceived issue: handle variable-length data

    The code below is a suggestion for implementing a variable length
    buffer that could be used to read text or handle arrays of arbitrary
    length. I don't have the expertise in C of many folks here so I feel
    like I'm offering a small furry animal for sacrifice to a big armour
    plated one... but will offer it anyway. Please do suggest improvements
    or challenge the premise. It would be great if it could be improved to
    become a generally useful piece of code.

    Well, here goes. This should be fun. :-?

    -

    The following utility code is passed a buffer (allocated by the
    caller) and maintains it at an appropriate size. The main function
    increases the allocation (when necessary) by factors - rather than
    fixed amounts - for speed. There is a secondary function to trim a
    buffer back to a specific size. An extra byte (one more than is
    requested) is always left at the end.

    /*
    * Expanding buffer
    */

    #define EBUF_SIZE_INIT 128
    #define EBUF_SIZE_MIN 128
    #define EBUF_INCREASE 1.5 /* Factor to increase space by each time */

    #include <stdio.h>
    #include <stdlib.h>
    #include <malloc.h>

    int ebuf_full(char **buf, size_t *buf_size, size_t offset) {
    size_t new_size;
    char *new_buf;

    if (*buf_size < offset + 2) { /* NB last pos left empty */
    new_size = *buf_size * EBUF_INCREASE + 1;
    if (new_size < offset + 2) new_size = offset + 2;
    if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN;
    if ((new_buf = realloc(*buf, new_size)) == NULL) {
    return 1; /* Failed to realloc buffer */
    }
    *buf = new_buf;
    *buf_size = new_size;
    }
    return 0; /* Reallocated successfuly */
    }


    int ebuf_trim(char **buf, size_t *buf_size, size_t offset) {
    int new_size = offset + 2; /* Includes empty char */
    char *new_buf;

    if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN;
    if (new_size != *buf_size) {
    if ((new_buf = realloc(*buf, new_size)) == NULL) {
    return 1; /* Reallocation failed (unlikely) */
    }
    *buf = new_buf;
    *buf_size = new_size;
    }
    return 0; /* Reallocation succeeded */
    }

  • James Harris

    #2
    Re: Expanding buffer - response to &quot;Determ ine the size of malloc&quot;que ry

    On 30 May, 14:12, James Harris <james.harri... @googlemail.com wrote:
    Initial issue: read in an arbitrary-length piece of text.
    Perceived issue: handle variable-length data
    >
    The code below is a suggestion for implementing a variable length
    buffer that could be used to read text or handle arrays of arbitrary
    length. I don't have the expertise in C of many folks here so I feel
    like I'm offering a small furry animal for sacrifice to a big armour
    plated one... but will offer it anyway. Please do suggest improvements
    or challenge the premise. It would be great if it could be improved to
    become a generally useful piece of code.
    >
    Well, here goes. This should be fun. :-?
    >
    -
    >
    The following utility code is passed a buffer (allocated by the
    caller) and maintains it at an appropriate size. The main function
    increases the allocation (when necessary) by factors - rather than
    fixed amounts - for speed. There is a secondary function to trim a
    buffer back to a specific size. An extra byte (one more than is
    requested) is always left at the end.
    >
    /*
    * Expanding buffer
    */
    >
    #define EBUF_SIZE_INIT 128
    #define EBUF_SIZE_MIN 128
    #define EBUF_INCREASE 1.5 /* Factor to increase space by each time */
    >
    #include <stdio.h>
    #include <stdlib.h>
    #include <malloc.h>
    >
    int ebuf_full(char **buf, size_t *buf_size, size_t offset) {
    size_t new_size;
    char *new_buf;
    >
    if (*buf_size < offset + 2) { /* NB last pos left empty */
    new_size = *buf_size * EBUF_INCREASE + 1;
    if (new_size < offset + 2) new_size = offset + 2;
    if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN;
    if ((new_buf = realloc(*buf, new_size)) == NULL) {
    return 1; /* Failed to realloc buffer */
    }
    *buf = new_buf;
    *buf_size = new_size;
    }
    return 0; /* Reallocated successfuly */
    >
    }
    >
    int ebuf_trim(char **buf, size_t *buf_size, size_t offset) {
    int new_size = offset + 2; /* Includes empty char */
    char *new_buf;
    >
    if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN;
    if (new_size != *buf_size) {
    if ((new_buf = realloc(*buf, new_size)) == NULL) {
    return 1; /* Reallocation failed (unlikely) */
    }
    *buf = new_buf;
    *buf_size = new_size;
    }
    return 0; /* Reallocation succeeded */
    >
    }
    An example of intended use follows. Note that the routines are coded
    to expect buffer and current size as parameters. Despite the error
    handling the code is intended to be fast. Including "if (offset + 2 >
    buf1_size)" in the main code the function should only be called if the
    buffer is too small. The cost of one integer comparison is small.

    int main() {
    char *buf1;
    size_t buf1_size = EBUF_SIZE_INIT;
    size_t offset;

    if ((buf1 = malloc(buf1_siz e)) == NULL) {
    fprintf(stderr, "Buffer initial malloc of %d bytes failed\n",
    buf1_size);
    exit(1);
    }

    ...

    offset = <position in buffer to write to>

    ...

    /* Check buf1 is big enough */
    if (offset + 2 buf1_size && ebuf_full(&buf1 , &buf1_size, offset))
    {
    fprintf(stderr, "Buffer overflow - have %d bytes but need %d
    bytes",
    buf1_size, offset + 2);
    exit(1);
    }
    buf1[offset] = 0;

    ...

    free(buf1);
    }


    Comment

    • James Harris

      #3
      Re: Expanding buffer - response to &quot;Determ ine the size of malloc&quot;que ry

      On 30 May, 14:12, James Harris <james.harri... @googlemail.com wrote:
      Initial issue: read in an arbitrary-length piece of text.
      Perceived issue: handle variable-length data
      >
      The code below is a suggestion for implementing a variable length
      buffer that could be used to read text or handle arrays of arbitrary
      length. I don't have the expertise in C of many folks here so I feel
      like I'm offering a small furry animal for sacrifice to a big armour
      plated one... but will offer it anyway. Please do suggest improvements
      or challenge the premise. It would be great if it could be improved to
      become a generally useful piece of code.
      >
      Well, here goes. This should be fun. :-?
      >
      -
      >
      The following utility code is passed a buffer (allocated by the
      caller) and maintains it at an appropriate size. The main function
      increases the allocation (when necessary) by factors - rather than
      fixed amounts - for speed. There is a secondary function to trim a
      buffer back to a specific size. An extra byte (one more than is
      requested) is always left at the end.
      >
      /*
      * Expanding buffer
      */
      >
      #define EBUF_SIZE_INIT 128
      #define EBUF_SIZE_MIN 128
      #define EBUF_INCREASE 1.5 /* Factor to increase space by each time */
      >
      #include <stdio.h>
      #include <stdlib.h>
      #include <malloc.h>
      >
      int ebuf_full(char **buf, size_t *buf_size, size_t offset) {
      size_t new_size;
      char *new_buf;
      >
      if (*buf_size < offset + 2) { /* NB last pos left empty */
      new_size = *buf_size * EBUF_INCREASE + 1;
      if (new_size < offset + 2) new_size = offset + 2;
      if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN;
      if ((new_buf = realloc(*buf, new_size)) == NULL) {
      return 1; /* Failed to realloc buffer */
      }
      *buf = new_buf;
      *buf_size = new_size;
      }
      return 0; /* Reallocated successfuly */
      >
      }
      >
      int ebuf_trim(char **buf, size_t *buf_size, size_t offset) {
      int new_size = offset + 2; /* Includes empty char */
      char *new_buf;
      >
      if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN;
      if (new_size != *buf_size) {
      if ((new_buf = realloc(*buf, new_size)) == NULL) {
      return 1; /* Reallocation failed (unlikely) */
      }
      *buf = new_buf;
      *buf_size = new_size;
      }
      return 0; /* Reallocation succeeded */
      >
      }
      Here's another piece of example code to use the proposed functions.
      This one is to read an arbitrary-length line. Hopefully when compared
      with a custom line-reading function the code below keeps a far simpler
      interface while allowing any necessary options. It should also be fast
      in that, again, the function only gets called if there is a need for
      more space. Since the function allocates memory in ever-increasing
      chunks for most iterations the function will not be called.


      #define ENDCHAR '\n'

      FILE *infile = stdin;
      char *buffer;
      size_t bufsize = 100; /* Initial size only */
      size_t offset;

      ... (allocate buffer)

      /* Read to 'endchar' */
      for (offset = 0; (ch = getc(infile)) != EOF; ) {
      if (offset + 2 bufsize &&
      ebuf_full(&buff er, &bufsize, offset) {
      fprintf(stderr, "Line too long for memory");
      exit(1);
      }
      buffer[offset++] = ch;
      if (ch == ENDCHAR) break;
      }

      ... (free buffer)

      Notably since we invoke getc() we could easily have more than one
      termination character such as

      if (ch == '\n' || ch == '\0' || ch == ',')

      etc. which is intended to be a big advantage over calling a line
      reader function.

      --
      James

      Comment

      • Eric Sosman

        #4
        Re: Expanding buffer - response to &quot;Determ ine the size of malloc&quot;que ry

        James Harris wrote:
        On 30 May, 14:12, James Harris <james.harri... @googlemail.com wrote:
        >Initial issue: read in an arbitrary-length piece of text.
        >Perceived issue: handle variable-length data
        >>
        >The code below is a suggestion for implementing a variable length
        >buffer that could be used to read text or handle arrays of arbitrary
        >length. I don't have the expertise in C of many folks here so I feel
        >like I'm offering a small furry animal for sacrifice to a big armour
        >plated one... but will offer it anyway. Please do suggest improvements
        >or challenge the premise. It would be great if it could be improved to
        >become a generally useful piece of code.
        >>
        >Well, here goes. This should be fun. :-?
        >>
        >-
        >>
        >The following utility code is passed a buffer (allocated by the
        >caller) and maintains it at an appropriate size. The main function
        >increases the allocation (when necessary) by factors - rather than
        >fixed amounts - for speed. There is a secondary function to trim a
        >buffer back to a specific size. An extra byte (one more than is
        >requested) is always left at the end.
        >>
        >/*
        > * Expanding buffer
        > */
        >>
        >#define EBUF_SIZE_INIT 128
        >#define EBUF_SIZE_MIN 128
        >#define EBUF_INCREASE 1.5 /* Factor to increase space by each time */
        >>
        >#include <stdio.h>
        >#include <stdlib.h>
        >#include <malloc.h>
        Only a small, furry animal offering itself up for sacrifice
        would use a non-standard header like <malloc.h>. Consider
        yourself eaten by the Ravenous Bugblatter Beast.

        Also, there seems to be no reason for <stdio.hin the
        buffer-bashing code; it doesn't hurt to #include extraneous
        baggage, but it doesn't help either. Everything you need
        is in <stdlib.h>.
        >int ebuf_full(char **buf, size_t *buf_size, size_t offset) {
        > size_t new_size;
        > char *new_buf;
        >>
        > if (*buf_size < offset + 2) { /* NB last pos left empty */
        This magical `2' appears in quite a few places. Maybe
        it deserves a #define of its own?
        > new_size = *buf_size * EBUF_INCREASE + 1;
        As a small matter of personal preference and prejudice,
        I myself would avoid floating-point arithmetic here and do
        the calculation in integers. Not a big deal, though.
        > if (new_size < offset + 2) new_size = offset + 2;
        > if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN;
        > if ((new_buf = realloc(*buf, new_size)) == NULL) {
        > return 1; /* Failed to realloc buffer */
        > }
        > *buf = new_buf;
        > *buf_size = new_size;
        > }
        > return 0; /* Reallocated successfuly */
        >>
        >}
        >>
        >int ebuf_trim(char **buf, size_t *buf_size, size_t offset) {
        > int new_size = offset + 2; /* Includes empty char */
        > char *new_buf;
        >>
        > if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN;
        > if (new_size != *buf_size) {
        > if ((new_buf = realloc(*buf, new_size)) == NULL) {
        > return 1; /* Reallocation failed (unlikely) */
        There's an interface design decision lurking here: Should
        this be considered a "failure," or just an "unsuccessf ul
        attempt to optimize?" Arguments can be made for both points
        of view. IMHO you've chosen rightly, because it's possible
        that ebuf_trim() could fail in an attempt to *increase* the
        size of the buffer, in which case the calling program might
        be, er, surprised to discover that the buffer was too small
        for the offset.
        > }
        > *buf = new_buf;
        > *buf_size = new_size;
        > }
        > return 0; /* Reallocation succeeded */
        >>
        >}
        >
        An example of intended use follows. Note that the routines are coded
        to expect buffer and current size as parameters. Despite the error
        handling the code is intended to be fast. Including "if (offset + 2 >
        buf1_size)" in the main code the function should only be called if the
        buffer is too small. The cost of one integer comparison is small.
        >
        int main() {
        `int main(void)' would be very slightly better.
        char *buf1;
        size_t buf1_size = EBUF_SIZE_INIT;
        size_t offset;
        >
        if ((buf1 = malloc(buf1_siz e)) == NULL) {
        fprintf(stderr, "Buffer initial malloc of %d bytes failed\n",
        buf1_size);
        What is the type of buf1_size? Answer: size_t. What type
        of operand does the "%d" specifier convert? Answer: int. Are
        size_t and int the same? Answer: No. What should you do to
        fix the mismatch? Answer: Change "%d" to "%g". (No, wait, I
        didn't mean that ...)

        For fprintf() and so on, <stdio.h*is* needed.
        exit(1);
        ITYM `exit(EXIT_FAIL URE);'. Or `return EXIT_FAILURE;'.
        }
        Instead of pre-allocating the initial buffer, why not
        set buf1=NULL and buf1_size=0 and just let ebuf_full()
        take care of everything?
        ...
        >
        offset = <position in buffer to write to>
        >
        ...
        >
        /* Check buf1 is big enough */
        if (offset + 2 buf1_size && ebuf_full(&buf1 , &buf1_size, offset))
        {
        fprintf(stderr, "Buffer overflow - have %d bytes but need %d
        bytes",
        buf1_size, offset + 2);
        exit(1);
        }
        buf1[offset] = 0;
        >
        ...
        >
        free(buf1);
        ... and since main() returns an int value, you should ...?
        (C99 introduced a special rule for main() that says falling
        off the end is equivalent to returning zero, but IMHO this
        should be viewed as a concession to the large amount of sloppy
        code already in existence, not as an encouragement to further
        sloppiness. Besides, C99 implementations have not exactly
        taken the world by storm, and lots of C90 implementations are
        still in use.)
        }
        It seems to me you understand the basic ideas of how to
        use realloc() to grow a buffer (although the fact that you
        can reallocate a NULL may have escaped you). There are a
        few glitches in the way you've done things, easily fixable.

        If you want to package something like this for wider
        use as a buffer-managing utility, you might consider putting
        the buffer information in a struct and passing a single
        struct pointer to the functions. Not only would this make
        the interface clearer by reducing the argument count, but
        it would also make it easy for you to add further fillips
        of functionality later on, just by adding a few elements
        to the struct and leaving the calls alone.

        Go back to your lair and lick your wounds; I think
        they're not life-threatening.

        --
        Eric.Sosman@sun .com

        Comment

        • Malcolm McLean

          #5
          Re: Expanding buffer - response to &quot;Determ ine the size of malloc&quot; query


          "James Harris" <james.harris.1 @googlemail.com wrote in message
          Initial issue: read in an arbitrary-length piece of text.
          Perceived issue: handle variable-length data
          >
          The code below is a suggestion for implementing a variable length
          buffer that could be used to read text or handle arrays of arbitrary
          length. I don't have the expertise in C of many folks here so I feel
          like I'm offering a small furry animal for sacrifice to a big armour
          plated one... but will offer it anyway. Please do suggest improvements
          or challenge the premise. It would be great if it could be improved to
          become a generally useful piece of code.
          >
          Firstly, don't worry about the actual code bodies at this stage. Any
          reasonably competent C programmer should be able to provide those.

          The thing is the interfaces.

          The first problem is that if we use char *, the functions will only work on
          character arrays. If we use void *s, this problem disappears, but there
          might be issues about too many casts to access the actual data.

          The second issue is whether to use a structure for the buffer, or, as you
          have done, pass in several parameters to represent size and capacity.
          There's a nasty C stitch-up if we use void *s with option 2.

          ebuf_full(void **buf ...)

          char *buffer;
          /* this is illegal */
          ebuf_full(&buff er)

          buffer has to be assigned to a dummy void *first. Which makes the function
          unusable.

          --
          Free games and programming goodies.



          Comment

          • James Harris

            #6
            Re: Expanding buffer - response to &quot;Determ ine the size of malloc&quot;que ry

            On 30 May, 15:14, Eric Sosman <Eric.Sos...@su n.comwrote:
            JamesHarriswrot e:
            On 30 May, 14:12,JamesHarr is<james.harri. ..@googlemail.c omwrote:
            Initial issue: read in an arbitrary-length piece of text.
            Perceived issue: handle variable-length data
            >
            The code below is a suggestion for implementing a variable length
            buffer that could be used to read text or handle arrays of arbitrary
            length. I don't have the expertise in C of many folks here so I feel
            like I'm offering a small furry animal for sacrifice to a big armour
            plated one... but will offer it anyway. Please do suggest improvements
            or challenge the premise. It would be great if it could be improved to
            become a generally useful piece of code.
            >
            Well, here goes. This should be fun. :-?
            >
            -
            >
            The following utility code is passed a buffer (allocated by the
            caller) and maintains it at an appropriate size. The main function
            increases the allocation (when necessary) by factors - rather than
            fixed amounts - for speed. There is a secondary function to trim a
            buffer back to a specific size. An extra byte (one more than is
            requested) is always left at the end.
            >
            /*
            * Expanding buffer
            */
            >
            #define EBUF_SIZE_INIT 128
            #define EBUF_SIZE_MIN 128
            #define EBUF_INCREASE 1.5 /* Factor to increase space by each time */
            >
            #include <stdio.h>
            #include <stdlib.h>
            #include <malloc.h>
            >
            Only a small, furry animal offering itself up for sacrifice
            would use a non-standard header like <malloc.h>. Consider
            yourself eaten by the Ravenous Bugblatter Beast.
            Haha - the sacrificial animal of my analogy was the code I was
            offering up - rather than me!! But you've raised - and eaten - some
            good points. I wasn't aware not to use malloc.h, for example.
            Also, there seems to be no reason for <stdio.hin the
            buffer-bashing code; it doesn't hurt to #include extraneous
            baggage, but it doesn't help either. Everything you need
            is in <stdlib.h>.
            OK
            int ebuf_full(char **buf, size_t *buf_size, size_t offset) {
            size_t new_size;
            char *new_buf;
            >
            if (*buf_size < offset + 2) { /* NB last pos left empty */
            >
            This magical `2' appears in quite a few places. Maybe
            it deserves a #define of its own?
            Agreed, it's a bit scabby as it stands. The reason for the +2 is that
            there's a +1 to change from an offset to a length - e.g. an offset of
            7 means a length of 8 - and I wanted to leave one extra byte after the
            specified length. I'd rather avoid the clutter of another defined
            constant. I'll rewrite to consistently use offsets rather than lengths
            and thus avoid the +2.
            new_size = *buf_size * EBUF_INCREASE + 1;
            >
            As a small matter of personal preference and prejudice,
            I myself would avoid floating-point arithmetic here and do
            the calculation in integers. Not a big deal, though.
            Me too. The reason for including a factor of 1.5 was simply to
            demonstrate that we don't need to settle for integer factors.
            if (new_size < offset + 2) new_size = offset + 2;
            if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN;
            if ((new_buf = realloc(*buf, new_size)) == NULL) {
            return 1; /* Failed to realloc buffer */
            }
            *buf = new_buf;
            *buf_size = new_size;
            }
            return 0; /* Reallocated successfuly */
            >
            }
            >
            int ebuf_trim(char **buf, size_t *buf_size, size_t offset) {
            int new_size = offset + 2; /* Includes empty char */
            char *new_buf;
            >
            if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN;
            if (new_size != *buf_size) {
            if ((new_buf = realloc(*buf, new_size)) == NULL) {
            return 1; /* Reallocation failed (unlikely) */
            >
            There's an interface design decision lurking here: Should
            this be considered a "failure," or just an "unsuccessf ul
            attempt to optimize?" Arguments can be made for both points
            of view. IMHO you've chosen rightly, because it's possible
            that ebuf_trim() could fail in an attempt to *increase* the
            size of the buffer, in which case the calling program might
            be, er, surprised to discover that the buffer was too small
            for the offset.
            OK
            }
            *buf = new_buf;
            *buf_size = new_size;
            }
            return 0; /* Reallocation succeeded */
            >
            }
            >
            An example of intended use follows. Note that the routines are coded
            to expect buffer and current size as parameters. Despite the error
            handling the code is intended to be fast. Including "if (offset + 2 >
            buf1_size)" in the main code the function should only be called if the
            buffer is too small. The cost of one integer comparison is small.
            >
            int main() {
            >
            `int main(void)' would be very slightly better.
            >
            char *buf1;
            size_t buf1_size = EBUF_SIZE_INIT;
            size_t offset;
            >
            if ((buf1 = malloc(buf1_siz e)) == NULL) {
            fprintf(stderr, "Buffer initial malloc of %d bytes failed\n",
            buf1_size);
            >
            What is the type of buf1_size? Answer: size_t. What type
            of operand does the "%d" specifier convert? Answer: int. Are
            size_t and int the same? Answer: No. What should you do to
            fix the mismatch? Answer: Change "%d" to "%g". (No, wait, I
            didn't mean that ...)
            I've no idea how to print these values, then. Would they be better as
            unsigned ints? I guess this would mean unsigned ints would have to be
            wide enough for any memory offset. Not sure if that can be relied
            upon.
            For fprintf() and so on, <stdio.h*is* needed.
            >
            exit(1);
            >
            ITYM `exit(EXIT_FAIL URE);'. Or `return EXIT_FAILURE;'.
            OK. Was trying to keep the interface light. As such I wanted to reduce
            the number of defines. The procedure name in meant to indicate the
            meaning of a zero or non-zero return. The function can exist in an if
            statement as

            if (ebuf_full(.... )) handle error

            }
            >
            Instead of pre-allocating the initial buffer, why not
            set buf1=NULL and buf1_size=0 and just let ebuf_full()
            take care of everything?
            I didn't know this could be done. Will include in the rewrite.

            ...
            >
            offset = <position in buffer to write to>
            >
            ...
            >
            /* Check buf1 is big enough */
            if (offset + 2 buf1_size && ebuf_full(&buf1 , &buf1_size, offset))
            {
            fprintf(stderr, "Buffer overflow - have %d bytes but need %d
            bytes",
            buf1_size, offset + 2);
            exit(1);
            }
            buf1[offset] = 0;
            >
            ...
            >
            free(buf1);
            >
            ... and since main() returns an int value, you should ...?
            (C99 introduced a special rule for main() that says falling
            off the end is equivalent to returning zero, but IMHO this
            should be viewed as a concession to the large amount of sloppy
            code already in existence, not as an encouragement to further
            sloppiness. Besides, C99 implementations have not exactly
            taken the world by storm, and lots of C90 implementations are
            still in use.)
            OK. That was a miss on my part.
            }
            >
            It seems to me you understand the basic ideas of how to
            use realloc() to grow a buffer (although the fact that you
            can reallocate a NULL may have escaped you). There are a
            few glitches in the way you've done things, easily fixable.
            >
            If you want to package something like this for wider
            use as a buffer-managing utility, you might consider putting
            the buffer information in a struct and passing a single
            struct pointer to the functions. Not only would this make
            the interface clearer by reducing the argument count, but
            it would also make it easy for you to add further fillips
            of functionality later on, just by adding a few elements
            to the struct and leaving the calls alone.
            I thought about that but chose against it. Options seem to be

            1. Address and size are scalars in the caller
            - limits other info that can be stored

            2. Struct holding address, size, factor and other parameters
            - simplfies calls to ebuf-trim
            - requires normal use of pointers to be dereferenced via the struct

            3. Struct holding parameters other than the address
            - still needs extra parameter to be passed to ebuf_full
            - requires ebuf_full to locate parameter block

            On balance the first option seemed best. It keeps the system simple
            without losing function.

            Comment

            • Eric Sosman

              #7
              Re: Expanding buffer - response to &quot;Determ ine the size of malloc&quot;que ry

              James Harris wrote:
              On 30 May, 15:14, Eric Sosman <Eric.Sos...@su n.comwrote:
              >JamesHarriswro te:
              >>[...]
              >> size_t buf1_size = EBUF_SIZE_INIT;
              >> size_t offset;
              >> if ((buf1 = malloc(buf1_siz e)) == NULL) {
              >> fprintf(stderr, "Buffer initial malloc of %d bytes failed\n",
              >>buf1_size);
              > What is the type of buf1_size? Answer: size_t. What type
              >of operand does the "%d" specifier convert? Answer: int. Are
              >size_t and int the same? Answer: No. What should you do to
              >fix the mismatch? Answer: Change "%d" to "%g". (No, wait, I
              >didn't mean that ...)
              >
              I've no idea how to print these values, then. Would they be better as
              unsigned ints? I guess this would mean unsigned ints would have to be
              wide enough for any memory offset. Not sure if that can be relied
              upon.
              If you can count on a C99 implementation, there's a length
              modifier "z" for printing size_t values:

              printf ("Size = %zu\n", buf1_size);

              If you need to live with the more widely available C90
              systems, there's no "z" modifier and you need to convert the
              size_t to something printf() knows how to handle:

              printf ("Size = %u\n", (unsigned int)buf1_size);

              or (safer):

              printf ("Size = %lu\n", (unsigned long)buf1_size) ;

              or even (extremely safe, extremely unusual):

              printf ("Size = %.0f\n", (double)buf1_si ze);

              These will work as well on C99 as they do on C90.

              --
              Eric.Sosman@sun .com

              Comment

              • James Harris

                #8
                Re: Expanding buffer - response to &quot;Determ ine the size of malloc&quot;que ry

                On 31 May, 09:05, "Malcolm McLean" <regniz...@btin ternet.comwrote :
                "JamesHarri s" <james.harri... @googlemail.com wrote in message
                Initial issue: read in an arbitrary-length piece of text.
                Perceived issue: handle variable-length data
                >
                The code below is a suggestion for implementing a variable length
                buffer that could be used to read text or handle arrays of arbitrary
                length. I don't have the expertise in C of many folks here so I feel
                like I'm offering a small furry animal for sacrifice to a big armour
                plated one... but will offer it anyway. Please do suggest improvements
                or challenge the premise. It would be great if it could be improved to
                become a generally useful piece of code.
                >
                Firstly, don't worry about the actual code bodies at this stage. Any
                reasonably competent C programmer should be able to provide those.
                >
                The thing is the interfaces.
                >
                The first problem is that if we use char *, the functions will only work on
                character arrays. If we use void *s, this problem disappears, but there
                might be issues about too many casts to access the actual data.
                AFAIK casts tend to make code less safe and I try to avoid them. Is
                there a good solution to this?
                The second issue is whether to use a structure for the buffer, or, as you
                have done, pass in several parameters to represent size and capacity.
                There's a nasty C stitch-up if we use void *s with option 2.
                >
                ebuf_full(void **buf ...)
                >
                char *buffer;
                /* this is illegal */
                ebuf_full(&buff er)
                >
                buffer has to be assigned to a dummy void *first. Which makes the function
                unusable.
                Not nice!

                Having a separate struct would allow other advantages such as having a
                per-buffer size increase factor but I think it would need the pointer
                to be dereferenced when it is used normally. On balance I think
                address and length (or, better, address and offset) is better.


                Here's a rewrite where I've improved the code slightly by simplifying
                a few bits of it. It now uses offsets rather than lengths and bases
                the increase on the requested offset so eliminating some of the
                checks. It does require the factor to be greater than or equal to 1.
                I'll include the functions and a sample main in one go.

                /*
                * Test expanding buffer
                */

                #define EBUF_INCREASE 1.5 /* Factor (>= 1) for space increase */

                #include <stdio.h>
                #include <stdlib.h>

                int ebuf_full(char **buf, size_t *buf_limit, size_t offset) {
                size_t new_limit;
                char *new_buf;

                if (offset >= *buf_limit) {
                new_limit = offset * EBUF_INCREASE;
                if ((new_buf = realloc(*buf, new_limit + 1)) == NULL) {
                return 1; /* Failed to realloc */
                }
                *buf = new_buf;
                *buf_limit = new_limit;
                }
                return 0; /* Realloc succeeded */
                }

                int ebuf_trim(char **buf, size_t *buf_limit, size_t offset) {
                char *new_buf;

                if ((new_buf = realloc(*buf, offset + 1)) == NULL) {
                return 1; /* Realloc failed */
                }
                *buf = new_buf;
                *buf_limit = offset;
                return 0; /* Succeeded */
                }


                int main(void) {
                char *buf1 = NULL;
                size_t buf1_limit = 0;
                size_t offset;

                for (offset = 0; offset < 1000; offset += 200) {
                fprintf(stderr, "\n---Checking for offset %d\n", offset);

                if (offset >= buf1_limit && ebuf_full(&buf1 , &buf1_limit, offset))
                {
                fprintf(stderr, "-Ebuf overflow %d/%d bytes", buf1_limit,
                offset);
                exit(1);
                }
                buf1[offset] = 'x';
                }

                fprintf(stderr, "\n---Trim from %d to %d\n", buf1_limit, offset);
                if (ebuf_trim(&buf 1, &buf1_limit, offset)) {
                fprintf(stderr, "-Buffer trim to %d failure\n", offset);
                exit(1);
                }

                free(buf1);
                return 0;
                }

                Comment

                • James Harris

                  #9
                  Re: Expanding buffer - response to &quot;Determ ine the size of malloc&quot;que ry

                  On 25 Jul, 22:33, Eric Sosman <Eric.Sos...@su n.comwrote:
                  James Harris wrote:
                  On 30 May, 15:14, Eric Sosman <Eric.Sos...@su n.comwrote:
                  JamesHarriswrot e:
                  >[...]
                  > size_t buf1_size = EBUF_SIZE_INIT;
                  > size_t offset;
                  > if ((buf1 = malloc(buf1_siz e)) == NULL) {
                  > fprintf(stderr, "Buffer initial malloc of %d bytes failed\n",
                  >buf1_size);
                  What is the type of buf1_size? Answer: size_t. What type
                  of operand does the "%d" specifier convert? Answer: int. Are
                  size_t and int the same? Answer: No. What should you do to
                  fix the mismatch? Answer: Change "%d" to "%g". (No, wait, I
                  didn't mean that ...)
                  >
                  I've no idea how to print these values, then. Would they be better as
                  unsigned ints? I guess this would mean unsigned ints would have to be
                  wide enough for any memory offset. Not sure if that can be relied
                  upon.
                  >
                  If you can count on a C99 implementation, there's a length
                  modifier "z" for printing size_t values:
                  >
                  printf ("Size = %zu\n", buf1_size);
                  >
                  If you need to live with the more widely available C90
                  systems, there's no "z" modifier and you need to convert the
                  size_t to something printf() knows how to handle:
                  >
                  printf ("Size = %u\n", (unsigned int)buf1_size);
                  >
                  or (safer):
                  >
                  printf ("Size = %lu\n", (unsigned long)buf1_size) ;
                  >
                  or even (extremely safe, extremely unusual):
                  >
                  printf ("Size = %.0f\n", (double)buf1_si ze);
                  >
                  These will work as well on C99 as they do on C90.
                  >
                  Rather than use size_t would I be better to use a type of unsigned int
                  or unsigned long in the first place?

                  --
                  James

                  Comment

                  • pete

                    #10
                    Re: Expanding buffer - response to &quot;Determ ine the size of malloc&quot;que ry

                    James Harris wrote:
                    On 25 Jul, 22:33, Eric Sosman <Eric.Sos...@su n.comwrote:
                    >James Harris wrote:
                    >>On 30 May, 15:14, Eric Sosman <Eric.Sos...@su n.comwrote:
                    >>>JamesHarrisw rote:
                    >>>>[...]
                    >>>> size_t buf1_size = EBUF_SIZE_INIT;
                    >>>> size_t offset;
                    >>>> if ((buf1 = malloc(buf1_siz e)) == NULL) {
                    >>>> fprintf(stderr, "Buffer initial malloc of %d bytes failed\n",
                    >>>>buf1_size );
                    >>> What is the type of buf1_size? Answer: size_t. What type
                    >>>of operand does the "%d" specifier convert? Answer: int. Are
                    >>>size_t and int the same? Answer: No. What should you do to
                    >>>fix the mismatch? Answer: Change "%d" to "%g". (No, wait, I
                    >>>didn't mean that ...)
                    >>I've no idea how to print these values, then. Would they be better as
                    >>unsigned ints? I guess this would mean unsigned ints would have to be
                    >>wide enough for any memory offset. Not sure if that can be relied
                    >>upon.
                    > If you can count on a C99 implementation, there's a length
                    >modifier "z" for printing size_t values:
                    >>
                    > printf ("Size = %zu\n", buf1_size);
                    >>
                    > If you need to live with the more widely available C90
                    >systems, there's no "z" modifier and you need to convert the
                    >size_t to something printf() knows how to handle:
                    >>
                    > printf ("Size = %u\n", (unsigned int)buf1_size);
                    >>
                    >or (safer):
                    >>
                    > printf ("Size = %lu\n", (unsigned long)buf1_size) ;
                    >>
                    >or even (extremely safe, extremely unusual):
                    >>
                    > printf ("Size = %.0f\n", (double)buf1_si ze);
                    >>
                    > These will work as well on C99 as they do on C90.
                    >>
                    >
                    Rather than use size_t would I be better to use a type of unsigned int
                    or unsigned long in the first place?
                    If size_t confuses you, then use long unsigned instead.

                    --
                    pete

                    Comment

                    • Eric Sosman

                      #11
                      Re: Expanding buffer - response to &quot;Determ ine the size of malloc&quot;que ry

                      James Harris wrote:
                      On 25 Jul, 22:33, Eric Sosman <Eric.Sos...@su n.comwrote:
                      >[... how to print a size_t ...]
                      >
                      Rather than use size_t would I be better to use a type of unsigned int
                      or unsigned long in the first place?
                      I think not: size_t is the type for sizes, so the program
                      cannot go wrong by using it. If you use unsigned int the program
                      may err, because valid size_t values might exceed the range of
                      unsigned int on some machines.

                      Under C90 rules, size_t must be an unsigned integer type,
                      and the only "integer types" are the flavors of char, short,
                      int, and long, with long being the widest. Therefore, in C90
                      it is guaranteed that converting a size_t to unsigned long
                      loses no information and preserves the value; it's a good
                      solution for printing. On the other hand, unsigned long might
                      be overkill: a 64-bit value, say, where a 32-bit value might
                      suffice. So the wise course is to calculate with size_t
                      values and convert to unsigned long only for display purposes.

                      The situation gets stickier in C99, because the repertoire
                      of "integer types" is expanded and in fact becomes open-ended:
                      it is now possible that size_t could be wider than unsigned long --
                      think of a 64-bit size_t with a 32-bit unsigned long. You could
                      print by converting to unsigned long long or to uintmax_t, but
                      since these types only exist in C99 and C99 provides the "z"
                      length modifier, using "z" for display is surely best.

                      Summary: Calculate with size_t, except perhaps in situations
                      where you must economize on storage and *know* that the actual
                      values will fit in something more restricted. For display
                      purposes, either convert to unsigned long (C90) or rely on
                      the "z" modifier (C99).

                      --
                      Eric Sosman
                      esosman@ieee-dot-org.invalid

                      Comment

                      • pete

                        #12
                        Re: Expanding buffer - response to &quot;Determ ine the size of malloc&quot;que ry

                        James Harris wrote:
                        #define EBUF_INCREASE 1.5 /* Factor (>= 1) for space increase */
                        In my get_line function, for reading text files,

                        I increase the buffer size by only one byte,
                        each time that the buffer is found to be too small.

                        Most text files that I've dealt with,
                        only have line lengths of less than a hundred bytes,
                        and a hundred calls to realloc in a program
                        isn't going to add up to any substantial time.

                        The get_line function is set up so that if you know
                        that you're going to be dealing
                        with a file which has significantly long lines,
                        then you can supply an adequately large original buffer
                        so that no reallocation will be needed.

                        --
                        pete

                        Comment

                        • Peter Nilsson

                          #13
                          Re: Expanding buffer - response to &quot;Determ ine the size of malloc&quot;que ry

                          James Harris wrote:
                          #define EBUF_SIZE_INIT 128
                          #define EBUF_SIZE_MIN 128
                          #define EBUF_INCREASE 1.5 /* Factor to increase space by each time */
                          >
                          #define ENDCHAR '\n'
                          Macros begining with E followed by another capital are reserved if
                          <errno.h>
                          is included. Although you don't now, you should not rule out the
                          possibility of
                          future versions including it.

                          --
                          Peter

                          Comment

                          • santosh

                            #14
                            Re: Expanding buffer - response to &quot;Determ ine the size of malloc&quot; query

                            Eric Sosman wrote:
                            James Harris wrote:
                            >On 30 May, 15:14, Eric Sosman <Eric.Sos...@su n.comwrote:
                            >>JamesHarriswr ote:
                            >>>[...]
                            >>> size_t buf1_size = EBUF_SIZE_INIT;
                            >>> size_t offset;
                            >>> if ((buf1 = malloc(buf1_siz e)) == NULL) {
                            >>> fprintf(stderr, "Buffer initial malloc of %d bytes failed\n",
                            >>>buf1_size) ;
                            >> What is the type of buf1_size? Answer: size_t. What type
                            >>of operand does the "%d" specifier convert? Answer: int. Are
                            >>size_t and int the same? Answer: No. What should you do to
                            >>fix the mismatch? Answer: Change "%d" to "%g". (No, wait, I
                            >>didn't mean that ...)
                            >>
                            >I've no idea how to print these values, then. Would they be better as
                            >unsigned ints? I guess this would mean unsigned ints would have to be
                            >wide enough for any memory offset. Not sure if that can be relied
                            >upon.
                            >
                            If you can count on a C99 implementation, there's a length
                            modifier "z" for printing size_t values:
                            >
                            printf ("Size = %zu\n", buf1_size);
                            >
                            If you need to live with the more widely available C90
                            systems, there's no "z" modifier and you need to convert the
                            size_t to something printf() knows how to handle:
                            >
                            printf ("Size = %u\n", (unsigned int)buf1_size);
                            >
                            or (safer):
                            >
                            printf ("Size = %lu\n", (unsigned long)buf1_size) ;
                            >
                            or even (extremely safe, extremely unusual):
                            >
                            printf ("Size = %.0f\n", (double)buf1_si ze);
                            >
                            These will work as well on C99 as they do on C90.
                            Even more safely:

                            printf("Size = %Lf\n", (long double)buf1_siz e);

                            :-)

                            Comment

                            • santosh

                              #15
                              Re: Expanding buffer - response to &quot;Determ ine the size of malloc&quot; query

                              pete wrote:
                              James Harris wrote:
                              >On 25 Jul, 22:33, Eric Sosman <Eric.Sos...@su n.comwrote:
                              >>James Harris wrote:
                              >>>On 30 May, 15:14, Eric Sosman <Eric.Sos...@su n.comwrote:
                              >>>>JamesHarris wrote:
                              >>>>>[...]
                              >>>>> size_t buf1_size = EBUF_SIZE_INIT;
                              >>>>> size_t offset;
                              >>>>> if ((buf1 = malloc(buf1_siz e)) == NULL) {
                              >>>>> fprintf(stderr, "Buffer initial malloc of %d bytes failed\n",
                              >>>>>buf1_size) ;
                              >>>> What is the type of buf1_size? Answer: size_t. What type
                              >>>>of operand does the "%d" specifier convert? Answer: int. Are
                              >>>>size_t and int the same? Answer: No. What should you do to
                              >>>>fix the mismatch? Answer: Change "%d" to "%g". (No, wait, I
                              >>>>didn't mean that ...)
                              >>>I've no idea how to print these values, then. Would they be better
                              >>>as unsigned ints? I guess this would mean unsigned ints would have
                              >>>to be wide enough for any memory offset. Not sure if that can be
                              >>>relied upon.
                              >> If you can count on a C99 implementation, there's a length
                              >>modifier "z" for printing size_t values:
                              >>>
                              >> printf ("Size = %zu\n", buf1_size);
                              >>>
                              >> If you need to live with the more widely available C90
                              >>systems, there's no "z" modifier and you need to convert the
                              >>size_t to something printf() knows how to handle:
                              >>>
                              >> printf ("Size = %u\n", (unsigned int)buf1_size);
                              >>>
                              >>or (safer):
                              >>>
                              >> printf ("Size = %lu\n", (unsigned long)buf1_size) ;
                              >>>
                              >>or even (extremely safe, extremely unusual):
                              >>>
                              >> printf ("Size = %.0f\n", (double)buf1_si ze);
                              >>>
                              >> These will work as well on C99 as they do on C90.
                              >>>
                              >>
                              >Rather than use size_t would I be better to use a type of unsigned
                              >int or unsigned long in the first place?
                              >
                              If size_t confuses you, then use long unsigned instead.
                              This will break on Windows with objects larger than 4 Gb.

                              Comment

                              Working...