StringBuilder and internal memory question

  • DV

    StringBuilder and internal memory question

    I have a StringBuilder that has a string with 12,000,000 characters.
    When I do a ToString(), I expect to have ~25,000,000 bytes worth of
    memory, yet I end up with ~43,000,000 bytes. That's almost double
    the size. The string returned from ToString() is actually of size
    StringBuilder.Capacity, NOT StringBuilder.Length. It may have an
    end-of-string character at StringBuilder.Length, but its actual memory
    size is StringBuilder.Capacity.

    Does this sound right?

  • Randy A. Ynchausti

    #2
    Re: StringBuilder and internal memory question

    DV,
    > I have a StringBuilder that has a string with 12,000,000 characters.
    > When I do a ToString(), I expect to have ~25,000,000 bytes worth of
    > memory, yet I end up with ~43,000,000 bytes. That's almost double
    > the size. The string returned from ToString() is actually of size
    > StringBuilder.Capacity, NOT StringBuilder.Length. It may have an
    > end-of-string character at StringBuilder.Length, but its actual memory
    > size is StringBuilder.Capacity.

    Capacity of a StringBuilder is how many characters can be stored in the
    StringBuilder at a given time. Length is how many characters are
    represented in the string that the StringBuilder represents at a given time.
    The maximum value of length and capacity is 2,147,483,647.

    Unicode characters can consume 2, 3 or 4 bytes. Therefore, depending on
    which characters are being stored in the StringBuilder, it would be possible
    to have 12,000,000 characters in the StringBuilder that translate to about
    43,000,000 bytes. You will have to iterate the characters in the string
    counting the bytes to see if the number of bytes can be explained by the
    storage of Unicode characters.

    Regards,

    Randy

    • Willy Denoyette [MVP]

      #3
      Re: StringBuilder and internal memory question

      "DV" <datvong@gmail.com> wrote in message
      news:1141178512.611818.132800@j33g2000cwa.googlegroups.com...
      | I have a StringBuilder that has a string with 12,000,000 characters.
      | When I do a ToString(), I expect to have ~25,000,000 bytes worth of
      | memory, yet I end up with ~43,000,000 bytes. That's almost double
      | the size. The string returned from ToString() is actually of size
      | StringBuilder.Capacity, NOT StringBuilder.Length. It may have an
      | end-of-string character at StringBuilder.Length, but its actual memory
      | size is StringBuilder.Capacity.
      |
      | Does this sound right?

      No, it doesn't. I really don't know where you got this value from, but it's
      not the size of the SB. The Capacity and Length property values are the
      number of characters the SB can hold and the number it actually holds. Note
      the "number of characters", not the number of bytes.
      If you create a StringBuilder like this:

      string s = new String('\u0306', 12000000);
      StringBuilder sb = new StringBuilder(s);

      you will end up with an SB buffer (a String) with a Capacity of 16777216 and a
      Length of 12000000.
      The length of the char[] (the string backing store) in bytes is 33554432.
      Note that the string backing store is a char[], so the buffer size does not
      depend on the encoding used, as the other replier suggests; a char is a fixed
      2 bytes in .NET.

      Willy.

      • Willy Denoyette [MVP]

        #4
        Re: StringBuilder and internal memory question

        To add to my previous reply, if you create an SB like this:

        StringBuilder sb = new StringBuilder(s, 12000000);

        your buffer will be 24000000 bytes (SB Capacity = Length = 12000000). The
        reason is that you now create an SB with a predefined capacity, while
        in the previous sample the SB starts with a Capacity of 16 and expands by
        doubling its capacity each time it gets filled completely.
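
As a rough sketch of the two constructors discussed above, the following prints Length and Capacity for each. The names CapacityDemo and Describe are made up for illustration, and the character count is scaled down to 12,000 to keep the run cheap. No expected output is shown, because the capacity chosen by `new StringBuilder(s)` depends on the runtime: the 16777216 figure comes from the .NET version discussed in this thread, and other versions may size the initial buffer differently.

```csharp
using System;
using System.Text;

public static class CapacityDemo
{
    // Hypothetical helper: reports Length, Capacity and the approximate
    // size of the character storage in bytes (Capacity * 2, no object overhead).
    public static (int Length, int Capacity, long BufferBytes) Describe(StringBuilder sb)
    {
        return (sb.Length, sb.Capacity, (long)sb.Capacity * sizeof(char));
    }

    public static void Main()
    {
        string s = new string('\u0306', 12000);

        // Capacity chosen by the runtime.
        var byDefault = Describe(new StringBuilder(s));
        // Capacity requested explicitly, equal to the string length.
        var preSized = Describe(new StringBuilder(s, s.Length));

        Console.WriteLine($"default:   Length={byDefault.Length} Capacity={byDefault.Capacity} bytes={byDefault.BufferBytes}");
        Console.WriteLine($"pre-sized: Length={preSized.Length} Capacity={preSized.Capacity} bytes={preSized.BufferBytes}");
    }
}
```

Comparing the two Capacity values on your own runtime shows how much slack, if any, the default constructor leaves.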


        Willy.

        • Jon Skeet [C# MVP]

          #5
          Re: StringBuilder and internal memory question

          DV <datvong@gmail.com> wrote:
          > I have a StringBuilder that has a string with 12,000,000 characters.
          > When I do a ToString(), I expect to have ~25,000,000 bytes worth of
          > memory, yet I end up with ~43,000,000 bytes. That's almost double
          > the size. The string returned from ToString() is actually of size
          > StringBuilder.Capacity, NOT StringBuilder.Length. It may have an
          > end-of-string character at StringBuilder.Length, but its actual memory
          > size is StringBuilder.Capacity.
          >
          > Does this sound right?

          Its size in terms of memory consumption is indeed represented by the
          capacity.

          --
          Jon Skeet - <skeet@pobox.com>
          http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
          If replying to the group, please do not mail me too


          • Jon Skeet [C# MVP]

            #6
            Re: StringBuilder and internal memory question

            Randy A. Ynchausti <randy_ynchausti@msn.com> wrote:

            <snip>

            > Unicode characters can consume 2, 3 or 4 bytes.

            That's true in some encodings, but .NET internally uses UTF-16, where
            every char (UTF-16 code unit) is exactly 2 bytes. Surrogate pairs are
            treated as two characters when it comes to things like String.Length.
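
A small sketch of the point about UTF-16 and surrogate pairs. The class name Utf16Demo and the helper Utf16Bytes are invented for this example; the sample characters are arbitrary.

```csharp
using System;
using System.Text;

public static class Utf16Demo
{
    // Bytes the characters occupy in UTF-16 storage (object overhead not included).
    public static int Utf16Bytes(string s) => s.Length * sizeof(char);

    public static void Main()
    {
        string bmp = "\u0306";        // one code point inside the Basic Multilingual Plane
        string astral = "\U0001F600"; // one code point outside it, stored as a surrogate pair

        Console.WriteLine(bmp.Length);                         // 1
        Console.WriteLine(astral.Length);                      // 2
        Console.WriteLine(Utf16Bytes(bmp));                    // 2
        Console.WriteLine(Utf16Bytes(astral));                 // 4
        Console.WriteLine(Encoding.UTF8.GetByteCount(bmp));    // 2
        Console.WriteLine(Encoding.UTF8.GetByteCount(astral)); // 4
        Console.WriteLine(char.IsSurrogatePair(astral, 0));    // True
    }
}
```

The in-memory size follows String.Length times 2 regardless of which characters are stored; variable byte counts only appear when the string is encoded, for example to UTF-8.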

            --
            Jon Skeet - <skeet@pobox.com>
            http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
            If replying to the group, please do not mail me too


            • Jon Skeet [C# MVP]

              #7
              Re: StringBuilder and internal memory question

              Willy Denoyette [MVP] <willy.denoyette@telenet.be> wrote:
              > No it doesn't, I really don't know where you got this value from but it's
              > not the size of the SB. The Capacity and Length property values are the
              > number of characters the SB can hold and the actual number it holds. Note
              > the "number of characters", not the number of bytes.
              > If you create a stringbuilder like this:
              >
              > string s = new String('\u0306', 12000000);
              > StringBuilder sb = new StringBuilder(s);
              >
              > you will end with a SB buffer (a String) with a Capacity of 16777216 and a
              > Length of 12000000.

              Yes - but if you then call ToString() on that StringBuilder, you'll
              find that the string only has a Length of 12000000 but takes up
              2*16777216+overhead bytes. In other words, StringBuilder doesn't trim
              the string to its length before returning it.

              --
              Jon Skeet - <skeet@pobox.com>
              http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
              If replying to the group, please do not mail me too


              • Willy Denoyette [MVP]

                #8
                Re: StringBuilder and internal memory question

                That's why I said....
                The length of the char[] (the string backing store) in bytes is 33554432.


                Willy.

                • Willy Denoyette [MVP]

                  #9
                  Re: StringBuilder and internal memory question

                  Hit send too fast.
                  The String reference returned from SB.ToString() is just a reference
                  to the existing String object. Trimming would involve creating a new
                  string, which would increase the memory footprint and cost some
                  performance.

                  Willy.

                  • Nick Hounsome

                    #10
                    Re: StringBuilder and internal memory question


                    "Willy Denoyette [MVP]" <willy.denoyette@telenet.be> wrote in message
                    news:OXewTsWPGHA.1028@TK2MSFTNGP11.phx.gbl...
                    > Hit send too fast.
                    > The String reference returned from SB.ToString() is just the same
                    > reference to the existing String object, trimming would involve the
                    > creation of a new string, this would increase the memory footprint
                    > and would take a performance hit.
                    >
                    > Willy.

                    That's what I thought at first and then I thought about subsequent changes
                    to the builder - It must keep a reference to the string returned just in
                    case - subsequent mods must cause the copying of the string into a new
                    buffer but since this is uncommon in real code it's worth the optimisation.

                    • DV

                      #11
                      Re: StringBuilder and internal memory question

                      Thanks all... I think I got it.
                      Interestingly enough...

                      If I do...
                      StringBuilder sb = new StringBuilder(10000000);
                      for (int i=0; i<100; i++)
                          sb.Append("Hello World");
                      string a1 = sb.ToString();

                      and...
                      StringBuilder sb = new StringBuilder();
                      for (int i=0; i<100; i++)
                          sb.Append("Hello World");
                      string a2 = sb.ToString();

                      In this case, a1 actually takes LESS physical memory than a2. a1 gets
                      trimmed while a2 returns the internal string.
                      I guess the lesson is to set a capacity whenever possible, but that
                      also has drawbacks.

                      • Jon Skeet [C# MVP]

                        #12
                        Re: StringBuilder and internal memory question

                        Willy Denoyette [MVP] <willy.denoyette@telenet.be> wrote:
                        > Hit send too fast.
                        > The String reference returned from SB.ToString() is just the same reference
                        > to the existing String object, trimming would involve the creation of a new
                        > string, this would increase the memory footprint and would take a
                        > performance hit.

                        Exactly - so the OP's supposition that "The string returned from
                        ToString() is actually of size StringBuilder.Capacity, NOT
                        StringBuilder.Length" sounds right to me.

                        --
                        Jon Skeet - <skeet@pobox.com>
                        http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
                        If replying to the group, please do not mail me too


                        • Willy Denoyette [MVP]

                          #13
                          Re: StringBuilder and internal memory question


                          "Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message
                          news:MPG.1e70a87266259cd398cec6@msnews.microsoft.com...
                          | Willy Denoyette [MVP] <willy.denoyette@telenet.be> wrote:
                          | > Hit send too fast.
                          | > The String reference returned from SB.ToString() is just the same
                          | > reference to the existing String object, trimming would involve the
                          | > creation of a new string, this would increase the memory footprint
                          | > and would take a performance hit.
                          |
                          | Exactly - so the OP's supposition that "The string returned from
                          | ToString() is actually of size StringBuilder.Capacity, NOT
                          | StringBuilder.Length" sounds right to me.

                          Not really. What delimits a string is its Length. The string buffer
                          (char[]) is larger because of the way SBs handle the underlying
                          'String': they can add chars to the buffer without the need to create a new
                          String object (until the buffer is filled completely), which is the reason
                          why SBs exist. What is returned by SB.ToString() is a String reference, and
                          what counts is the Length property value; what's stored beyond it is not part
                          of the String. There isn't a managed way to get at the size of the buffer,
                          nor can you get at the contents of the store beyond the last char position +
                          1 (the 0x0000 terminator) in the buffer.
                          But this was not really my point. The OP assumes that the capacity is 43MB
                          for a string of 12000000 characters, which is definitely not true.

                          Willy.

                          • Jon Skeet [C# MVP]

                            #14
                            Re: StringBuilder and internal memory question

                            Willy Denoyette [MVP] wrote:
                            > | Exactly - so the OP's supposition that "The string returned from
                            > | ToString() is actually of size StringBuilder.Capacity, NOT
                            > | StringBuilder.Length" sounds right to me.
                            >
                            > Not really, what delimits a string is its Length, the fact that the string
                            > buffer (char[]) is larger is due to the way SBs handle the underlying
                            > 'String', they can add chars to the buffer without the need to create a new
                            > String object (until the buffer is filled completely), which is the reason
                            > why SBs exist. What is returned by SB.ToString() is a String reference and
                            > what counts is the Length property value, what's stored beyond is not part
                            > of the String, there isn't a managed way to get at the size of the buffer,
                            > nor can you get at the contents of the store beyond the last char position +
                            > 1 (the 0x0000 terminator) in the buffer.

                            No - but given the rest of the OP's question, I thought it reasonable
                            to assume that the "size" he's talking about is the size in memory, not
                            the "logical" length of the string. Given that reading, isn't the OP's
                            supposition valid? The "buffer size" (in characters) of the returned
                            string is the same as (or maybe one or two characters more or less -
                            not sure) the capacity of the StringBuilder it was returned from.

                            Even though you can't get at the rest of the buffer, I'd say it still
                            "counts" in terms of being valuable information. For instance, if you
                            were creating a lot of long-lived strings from StringBuilders, it may
                            be worth creating a copy of the string which effectively trims the
                            extra buffer, allowing the oversized string to then be garbage
                            collected. (I've done this many times in Java, particularly when
                            reading lines from a file. The initial buffer is 80 characters, and if
                            you're reading a dictionary a line at a time and the average line
                            length is only 5 or 6 characters, the wastage can be very, very
                            significant.)
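
The "copy to trim" idea could look roughly like this in C#. TrimDemo and TrimmedCopy are hypothetical names, not framework APIs, and the sketch assumes a runtime that behaves as described in this thread, where ToString() may hand back an oversized buffer. As far as I know, newer .NET runtimes already return a right-sized string from ToString(), in which case the extra copy buys nothing.

```csharp
using System;
using System.Text;

public static class TrimDemo
{
    // Hypothetical helper: returns a fresh string whose storage is sized to
    // the characters it contains, so the caller can drop the original.
    public static string TrimmedCopy(string s)
    {
        if (s.Length == 0) return string.Empty;
        return new string(s.ToCharArray());
    }

    public static void Main()
    {
        var sb = new StringBuilder();
        for (int i = 0; i < 100; i++)
            sb.Append("Hello World");

        string original = sb.ToString();
        string copy = TrimmedCopy(original);

        Console.WriteLine(copy == original);                       // True: same characters
        Console.WriteLine(object.ReferenceEquals(copy, original)); // False: separate object
    }
}
```

The copy only pays off for long-lived strings; for short-lived ones the extra allocation is likely to cost more than the slack it frees.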

                            > But this was not really my point, the OP assumes that the capacity is 43MB
                            > for a string of 12000000 which is definitely not true.

                            That much is certainly true - *if* by "string of 12000000" you mean a
                            "string of internal buffer size 12000000 characters".

                            Jon

                            • Willy Denoyette [MVP]

                              #15
                              Re: StringBuilder and internal memory question


                              "DV" <datvong@gmail.com> wrote in message
                              news:1141272408.013859.276340@u72g2000cwu.googlegroups.com...
                              | Thanks all... I think i got it...
                              | Interestingly enough..
                              |
                              | If I do...
                              | StringBuilder sb = new StringBuilder(10000000);
                              | for (int i=0; i<100; i++)
                              |     sb.Append("Hello World");
                              | string a1 = sb.ToString();
                              |
                              | and..
                              | StringBuilder sb = new StringBuilder();
                              | for (int i=0; i<100; i++)
                              |     sb.Append("Hello World");
                              | string a2 = sb.ToString();
                              |
                              | in this case, a1 actually takes LESS physical memory than a2. a1 gets
                              | trimmed while a2 returns the internal string.
                              | i guess the lesson is to put a capacity whenever possible. but that
                              | also has drawbacks..

                              Right, what you see here is the result of an optimization.
                              Let me try to explain what's happening ....

                              First you need to know what an SB and a String look like on the managed heap.
                              This is how a StringBuilder object looks:

                              <standard object header> // two IntPtr-sized values, not relevant here
                              IntPtr m_currentThread;
                              int m_maxCapacity;
                              string m_StringValue;

                              while a String looks like:

                              <object header>
                              int m_arrayLength;
                              int m_stringLength;
                              char m_firstChar;

                              In the first case, you create an SB with a capacity of 10000000, which means
                              that the underlying String object is larger than 85 KB, so the String
                              ends up on the Large Object Heap (LOH).
                              Then you start filling the string buffer. At the end of the loop the result
                              looks like this.

                              Your StringBuilder sb on the Gen0 heap:

                              m_currentThread = xxxx       // not important here
                              m_maxCapacity   = 2147483647 // Int32.MaxValue, i.e. 2G characters
                              m_StringValue   = 03271000   // reference to a string on the LOH (sample value)

                              and the String it references on the LOH:

                              m_arrayLength  = 10000001 // buffer space (in number of chars)
                              m_stringLength = 1100     // actual string Length
                              m_firstChar    = 'H'      // first char in buffer (start of buffer)
                              ....                      // following chars
                              ....           = 'd'      // last char of string (buffer position 1100)
                              ....           = 0x0000   // terminator after the last char (buffer position 1101)

                              Now when you execute sb.ToString(),
                              the CLR rightfully decides that this String object doesn't belong on the LOH
                              (it is < 85 KB), so it creates a new String on the Gen0 heap and returns its
                              reference in a1. The new string object now looks like:

                              m_arrayLength  = 1101   // buffer space (in number of chars)
                              m_stringLength = 1100   // actual string Length
                              m_firstChar    = 'H'    // first char in buffer (start of buffer)
                              ....                    // following chars
                              ....           = 'd'    // last char of string (buffer position 1100)
                              ....           = 0x0000 // terminator

                              Notice the new m_arrayLength of 1101.
                              Note that the buffer with m_arrayLength = 10000001 was never committed, only
                              reserved, which is why you don't see it allocated in physical memory.




                              What happens in the second case is this.

                              An SB is created on the Gen0 heap and looks like:

                              m_currentThread = xxxx       // not important here
                              m_maxCapacity   = 2147483647 // Int32.MaxValue, i.e. 2G characters
                              m_StringValue   = 01274e34   // reference to a string on the Gen0 heap (sample value)

                              and the underlying String object at the end of the loop:

                              m_arrayLength  = 2049   // buffer space (in number of chars)
                              m_stringLength = 1100   // actual string Length
                              m_firstChar    = 'H'    // first char in buffer (start of buffer)
                              ....                    // following chars
                              ....           = 'd'    // last char of string (buffer position 1100)
                              ....           = 0x0000 // terminator

                              Notice the m_arrayLength of 2049: the buffer is almost twice the string Length.

                              But before you get this final string, a number of temporary strings need to
                              be built. Remember that the SB starts with m_arrayLength = 17 (16 + 1 for
                              the 0x0000 string termination char).
                              That means that during the loop you have effectively created 8 string
                              buffers (16, 32, 64, ... 2048), 7 of which are discarded intermediates.
                              So you have wasted some memory and some CPU cycles, and you have also put
                              additional stress on the GC, which has to clean up the intermediate
                              objects.
                              Conclusion: you should try to pre-allocate SBs whenever possible. Note that
                              this is especially important for server applications and for client
                              (WinForms) applications that need to run in Terminal Server environments.
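
A rough way to observe the growth steps described above is to count how often Capacity changes while appending. GrowthDemo and CountCapacityChanges are hypothetical names for this sketch, and a change in Capacity is used only as a proxy for "a new buffer was allocated". The exact growth sequence (16, 32, 64, ...) is the one reported in this thread and may differ on other runtime versions, so the number printed for the default case is not stated here.

```csharp
using System;
using System.Text;

public static class GrowthDemo
{
    // Appends "Hello World" `count` times and reports how often Capacity changed.
    public static int CountCapacityChanges(StringBuilder sb, int count)
    {
        int changes = 0;
        int last = sb.Capacity;
        for (int i = 0; i < count; i++)
        {
            sb.Append("Hello World");
            if (sb.Capacity != last)
            {
                changes++;
                last = sb.Capacity;
            }
        }
        return changes;
    }

    public static void Main()
    {
        const int count = 100;
        int needed = "Hello World".Length * count; // 11 * 100 = 1100 chars

        int withDefault = CountCapacityChanges(new StringBuilder(), count);
        int preSized = CountCapacityChanges(new StringBuilder(needed), count);

        Console.WriteLine($"default capacity: {withDefault} capacity changes");
        Console.WriteLine($"pre-sized ({needed}): {preSized} capacity changes");
    }
}
```

With the builder pre-sized to exactly the number of characters appended, the capacity should not change at all, which is the effect the pre-allocation advice is after.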


                              Hope this clears things up a bit :-)

                              Willy.