Array.Resize or List<> or some other data structure

  • Trecius

    Array.Resize or List<> or some other data structure

    Hello, Newsgroupians:

    I have a quick optimization question for you all. I am reading bytes from
    a stream. At times the stream contains only a small number of bytes, such
    as 50 or so, and at other times it contains as many as 10,000,000 bytes.
    In reality, I do not know the maximum number of bytes.

    In my function, I am going to Read() the byte stream using a buffer. Now,
    is it better to read into a buffer and dump the buffer into a List<byte>,
    maybe using AddRange(), or should I use Array.Resize to grow the buffer by
    a fixed size every time?

    Code for List<byte>

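    List<byte> lstBytes = new List<byte>();
    byte[] buffer = new byte[2048];
    int bytesRead;

    // Stream.Read returns 0 at end of stream, not -1
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Append only the bytes actually read in this pass
        lstBytes.AddRange(new ArraySegment<byte>(buffer, 0, bytesRead));
    }
    return lstBytes.ToArray();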



    Code for resizing array:

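    byte[] buffer = new byte[2048];
    int total = 0, bytesRead;
    // Read into the free tail of the array; Stream.Read returns 0 at end of stream
    while ((bytesRead = stream.Read(buffer, total, buffer.Length - total)) > 0)
    {
        total += bytesRead;
        // Grow by a fixed 2048 bytes once the array is full
        if (total == buffer.Length)
            Array.Resize(ref buffer, buffer.Length + 2048);
    }
    Array.Resize(ref buffer, total); // trim the unused tail
    return buffer;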



    So which way should I use? Should I dump it into a list every time, or
    should I resize the array every time? Is there another way you would
    recommend? Thank you all for your help and suggestions.


    Trecius
  • Family Tree Mike

    #2
    RE: Array.Resize or List<> or some other data structure

    Are you aware there is:

    byte[] bytes = File.ReadAllBytes("file.bin");

    ?

    • Trecius

      #3
      RE: Array.Resize or List<> or some other data structure

      My stream isn't a file. :(

      • Family Tree Mike

        #4
        RE: Array.Resize or List<> or some other data structure

        Then will Stream.Length work to initially size the array?

        • Rudy Velthuis

          #5
          Re: Array.Resize or List<> or some other data structure

          Family Tree Mike wrote:
          > Then will Stream.Length work to initially size the array?

          If his stream reads bytes from, say, a port, I guess Length is not
          known before all bytes are read.

          --
          Rudy Velthuis http://rvelthuis.de

          "The study of non-linear physics is like the study of non-elephant
          biology." -- Unknown

          • Trecius

            #6
            Re: Array.Resize or List<> or some other data structure

            In fact, it is a port. :)

            • Peter Duniho

              #7
              Re: Array.Resize or List<> or some other data structure

              On Fri, 17 Oct 2008 11:20:05 -0700, Trecius
              <Trecius@discussions.microsoft.com> wrote:
              > In fact, it is a port. :)

              If by "port" you mean a NetworkStream retrieved from a Socket instance,
              then Rudy is correct: the Length property cannot be determined and in
              fact will always throw a NotSupportedException.
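
A minimal sketch of guarding against that exception when choosing an initial buffer size, assuming a hypothetical helper named InitialCapacity and a caller-supplied fallback size (neither is from the thread). Stream.CanSeek is false for NetworkStream, so the check avoids touching Length at all:

```csharp
using System;
using System.IO;

static class BufferSizing
{
    // Hypothetical helper: suggest an initial buffer size for reading a stream.
    public static int InitialCapacity(Stream source, int fallback)
    {
        // Non-seekable streams (for example NetworkStream) throw
        // NotSupportedException from Length, so do not ask them.
        if (!source.CanSeek)
        {
            return fallback;
        }

        // For seekable streams the remaining byte count is known up front.
        long remaining = source.Length - source.Position;
        return (int)Math.Min(Math.Max(remaining, 0L), (long)int.MaxValue);
    }
}
```

For a stream that cannot seek, the caller still has to grow its storage as data arrives; the fallback is only a starting size.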

              • Peter Duniho

                #8
                Re: Array.Resize or List<> or some other data structure

                On Fri, 17 Oct 2008 08:48:13 -0700, Trecius
                <Trecius@discussions.microsoft.com> wrote:
                > [...]
                > So which way should I use? Should I dump it into a list everytime, or
                > should I resize the array everytime? Is there another way you would
                > recommend? Thank you all for your help and suggestions.

                The two approaches you're asking about are basically equivalent. The
                List<T> class uses an array internally, and will do effectively the same
                operation as Array.Resize(). The only real difference between the two is
                that List<T> always doubles the size of the storage, so that you need to
                resize fewer and fewer times as the data gets larger. Of course, you
                could always use that strategy when using Array.Resize() as well, if that
                was important.
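
The doubling strategy with Array.Resize() could look roughly like the sketch below. The class and method names and the 2048-byte starting size are illustrative assumptions, not code from the thread:

```csharp
using System;
using System.IO;

static class DoublingReader
{
    // Hypothetical helper: read a stream to its end into a single array
    // whose size doubles each time it fills up.
    public static byte[] ReadAll(Stream stream)
    {
        byte[] data = new byte[2048];
        int count = 0;      // number of valid bytes in data
        int bytesRead;

        // Stream.Read returns 0 once the end of the stream is reached.
        while ((bytesRead = stream.Read(data, count, data.Length - count)) > 0)
        {
            count += bytesRead;
            if (count == data.Length)
            {
                // Doubling keeps the number of resize-and-copy operations
                // logarithmic in the final size.
                Array.Resize(ref data, data.Length * 2);
            }
        }

        // Trim the unused tail so the caller gets exactly the bytes read.
        Array.Resize(ref data, count);
        return data;
    }
}
```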

                Personally, I wouldn't use either. I would make every effort to
                process the bytes as they are read, so that they never have to be all in
                memory at once. That's the ideal solution, as it avoids the whole
                business of having to buffer an arbitrarily large amount of data
                altogether.
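
As an illustration of processing bytes as they arrive, the sketch below sums the byte values chunk by chunk. The summing is only a stand-in for whatever the real processing would be, and the names are hypothetical:

```csharp
using System;
using System.IO;

static class StreamingProcessor
{
    // Hypothetical example of per-chunk processing: only one small buffer
    // is ever held in memory, regardless of how long the stream is.
    public static long SumBytes(Stream stream)
    {
        byte[] buffer = new byte[2048];
        long sum = 0;
        int bytesRead;

        while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Handle only the bytes that this call to Read actually returned.
            for (int i = 0; i < bytesRead; i++)
            {
                sum += buffer[i];
            }
        }
        return sum;
    }
}
```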

                If you can't process the bytes as they are read, but instead need to store
                them all up first, I would use a MemoryStream, and write to the
                MemoryStream as the bytes come in. Then when you're done, you can use the
                MemoryStream.ToArray() method to get the byte array representing the data.
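
A minimal sketch of the MemoryStream approach, assuming a hypothetical ReadAll helper and a 2048-byte read buffer:

```csharp
using System;
using System.IO;

static class MemoryStreamReader
{
    // Hypothetical helper: accumulate everything read from the source stream
    // in a MemoryStream, then return it as one array.
    public static byte[] ReadAll(Stream stream)
    {
        byte[] buffer = new byte[2048];
        using (MemoryStream ms = new MemoryStream())
        {
            int bytesRead;
            // Stream.Read returns 0 at end of stream.
            while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Write only the bytes this call actually returned.
                ms.Write(buffer, 0, bytesRead);
            }
            return ms.ToArray();
        }
    }
}
```

On .NET Framework 4 and later, Stream.CopyTo performs the same read-and-write loop in a single call.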

                I believe that MemoryStream uses the same double-and-copy algorithm as
                List<T>, so if that wound up being a performance liability, I would switch
                to allocating individual buffers and storing them in a List<byte[]>. That
                is, rather than resizing a single byte[] over and over, just allocate a
                new byte[] when you've run out of room in your current byte[], storing a
                reference to each byte[] in the List<byte[]>.
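
One way the List<byte[]> idea might be sketched is shown below. The chunk size, the names, and the final step of joining the chunks into one array are assumptions for illustration; a caller that can consume the chunks directly would skip the join:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ChunkedReader
{
    // Hypothetical helper: store data in fixed-size chunks instead of
    // repeatedly resizing and copying one large array.
    public static byte[] ReadAll(Stream stream)
    {
        const int ChunkSize = 2048;
        List<byte[]> fullChunks = new List<byte[]>();
        byte[] current = new byte[ChunkSize];
        int used = 0;       // bytes filled in the current chunk
        int total = 0;      // bytes read overall
        int bytesRead;

        while ((bytesRead = stream.Read(current, used, ChunkSize - used)) > 0)
        {
            used += bytesRead;
            total += bytesRead;
            if (used == ChunkSize)
            {
                // Current chunk is full: keep a reference and start a new one.
                fullChunks.Add(current);
                current = new byte[ChunkSize];
                used = 0;
            }
        }

        // Join the chunks once, at the end, into a single array.
        byte[] result = new byte[total];
        int offset = 0;
        foreach (byte[] chunk in fullChunks)
        {
            Buffer.BlockCopy(chunk, 0, result, offset, ChunkSize);
            offset += ChunkSize;
        }
        if (used > 0)
        {
            Buffer.BlockCopy(current, 0, result, offset, used);
        }
        return result;
    }
}
```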

                One more alternative would be to have the i/o code use individual byte[]
                instances only, and hand those off to a different thread that deals with
                writing them to a MemoryStream. In terms of performance, this would
                probably be somewhere in between using a List<byte[]> to store individual
                buffers and just always writing to a MemoryStream.

                With this alternative, you could either use a double- or triple-buffering
                scheme where you have two or three such buffers that are used in rotation,
                or you could just allocate a new buffer as needed, letting the used ones
                be garbage collected after they've been copied to the MemoryStream. The
                former has the advantage of not causing a lot of repeated allocations and
                collections, at the cost of complexity and the possibility that the
                i/o thread has to wait for a buffer to become available.
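
A rough sketch of the rotating-buffer variant follows. It assumes .NET 4.5 or later for BlockingCollection<T> and Task.Run, which postdate this thread, and all names and sizes are hypothetical. Error handling is omitted for brevity:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

static class RotatingBufferReader
{
    // Hypothetical helper: the calling thread does the reads while a second
    // thread copies filled buffers into a MemoryStream and recycles them.
    public static byte[] ReadAll(Stream stream, int bufferCount = 3, int bufferSize = 2048)
    {
        // Buffers that are free for the i/o loop to fill.
        var free = new BlockingCollection<byte[]>();
        // Filled buffers paired with the number of valid bytes in each.
        var filled = new BlockingCollection<KeyValuePair<byte[], int>>();
        for (int i = 0; i < bufferCount; i++)
        {
            free.Add(new byte[bufferSize]);
        }

        var result = new MemoryStream();

        // Consumer: append each filled buffer, then return it to the pool.
        Task writer = Task.Run(() =>
        {
            foreach (KeyValuePair<byte[], int> item in filled.GetConsumingEnumerable())
            {
                result.Write(item.Key, 0, item.Value);
                free.Add(item.Key);
            }
        });

        // i/o loop: Take() blocks when every buffer is still in use, which is
        // the waiting cost mentioned above.
        while (true)
        {
            byte[] buffer = free.Take();
            int bytesRead = stream.Read(buffer, 0, buffer.Length);
            if (bytesRead <= 0)
            {
                break;
            }
            filled.Add(new KeyValuePair<byte[], int>(buffer, bytesRead));
        }

        // Signal the consumer that no more data is coming, then wait for it.
        filled.CompleteAdding();
        writer.Wait();
        return result.ToArray();
    }
}
```

Because the filled buffers pass through a single first-in, first-out queue with one consumer, the bytes reach the MemoryStream in the order they were read.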

                Personally, if you have to buffer all the data, I would start with writing
                to a MemoryStream. It is by far the simplest approach, and may well
                perform adequately for your needs. Only if I ran into some specific
                performance issue would I then start exploring some of these other
                options. They are reasonably straightforward to code, but would certainly
                obfuscate the core purpose of the code, and any complication of the code
                should be avoided unless absolutely necessary.

                Pete

                • Rudy Velthuis

                  #9
                  Re: Array.Resize or List<> or some other data structure

                  Peter Duniho wrote:
                  > If by "port", you mean a NetworkStream retrieved from a Socket
                  > instance, then Rudy is correct...the Length property cannot be
                  > determined and in fact will always throw a NotSupportedException.

                  I actually meant a physical port, like a USB port with some kind of
                  lab device attached, but the kind of port you meant has the same
                  problem. You simply can't know the amount of data to expect.

                  After all, data can be read from so many sources. <g>

                  --
                  Rudy Velthuis http://rvelthuis.de

                  "1001 words say more than one picture" -- Chinese proverb

                  • Trecius

                    #10
                    Re: Array.Resize or List<> or some other data structure

                    Thank you, Mr. Duniho. I will use your suggestion. It seems like it will
                    work perfectly for my needs. Thank you again.

                    Trecius