Fast string operations

  • Chad Myers

    Fast string operations

    I've been perf testing an application of mine and I've noticed that there
    are a lot (and I mean A LOT -- megabytes and megabytes of 'em) of System.String
    instances being created.

    I've done some analysis and I'm led to believe (but can't yet quantitatively
    establish as fact) that the two basic culprits are a lot of calls to:

    1.) if( someString.ToLower() == "somestring" )

    and

    2.) if( someString != null && someString.Trim().Length > 0 )


    ToLower() generates a new string instance as does Trim().

    I believe that these are getting called many times and churning up a bunch
    of strings faster than the GC can collect them, or perhaps there's some
    weird interning/caching thing going on. Regardless, the number of string
    instances grows and grows. It gets bumped down occasionally, but it's
    basically 5 steps forward, 1 back.

    For reference, this is an ASP application calling into .NET ComVisible
    objects. So I assume this uses the workstation GC, right?


    Anyhow, so I think that I can solve problem (1) with String.Compare() which
    can perform in-place case-insensitive comparisons without generating new
    string instances.
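As a minimal sketch of that idea: the String.Compare overload that takes an ignoreCase flag compares the two strings directly, so no lowered copy of the input is built. The class and method names here are hypothetical, and note that this overload uses the current culture's casing rules, which may or may not be what the application wants.

```csharp
using System;

public static class CaseInsensitive
{
    // Hypothetical helper name. Wraps String.Compare(string, string, bool),
    // which compares without creating a lower-cased copy of either input.
    public static bool EqualsIgnoreCase(string a, string b)
    {
        // true = ignore case; the comparison is culture-sensitive.
        return String.Compare(a, b, true) == 0;
    }
}
```

With this in place, check (1) would read `if (CaseInsensitive.EqualsIgnoreCase(someString, "somestring"))`, and a null someString simply compares as not equal.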

    Problem (2), however, is more complicated. There doesn't appear to be a
    TrimmedLength or any type of method or property that can give me the length
    of a string, minus whitespace and without generating a new string instance,
    in the BCL.

    I suppose I could do some unsafe, or even unmanaged code (which is what MSFT
    did for all their string handling stuff inside System.String and using the
    COMString stuff), but I'd like to try to avoid that, or at least use a
    library that's already written and well tested.

    Any thoughts?

    Thanks in advance,
    Chad Myers


  • Nicholas Paldino [.NET/C# MVP]

    #2
    Re: Fast string operations

    Chad,

    For the first scenario, your solution should give you a performance increase.

    For the second scenario, you should use reflection once to get a
    reference to the internal static character array WhitespaceChars on the
    string class. Then, you can write a method which will cycle through a
    string passed to it, like so:

    public static bool TrimIsNullOrEmpty(string value)
    {
        // If null, then get out.
        if (value == null)
        {
            return true;
        }

        // Cycle through the characters in the string. If a character is not
        // found in the whitespace array, return false; otherwise, when done,
        // return true.
        // WhitespaceArray is assumed to be a static char[] field holding the
        // array obtained once via reflection, as described above.
        foreach (char c in value)
        {
            // Pass the loop variable c (not the type name) to IndexOf.
            if (Array.IndexOf<char>(WhitespaceArray, c) == -1)
            {
                return false;
            }
        }

        // The string is empty or consists entirely of whitespace.
        return true;
    }

    I used the generic version of the IndexOf method on the Array class in
    order to eliminate boxing. Also, if you really want to squeeze out every
    last bit of performance from this, you can take the WhitespaceArray and use
    the characters as keys in a dictionary. The number of whitespace characters
    is 25 (right now, that is). However, if your strings typically are padded
    with spaces, then you could get a big speed boost by copying the array
    initially, and then placing the space character as the first element in the
    array (which would cause most of the calls to IndexOf to return very
    quickly, probably quicker than a lookup in a dictionary).
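The space-first idea could be sketched roughly as follows. This is not the reflected array itself: as an assumption for illustration, the lookup array is built from every character that Char.IsWhiteSpace accepts, which may differ slightly from the set that Trim() uses on a given framework version. The class, field, and builder names are hypothetical, and the generic IndexOf call needs .NET 2.0 or later.

```csharp
using System;
using System.Collections.Generic;

public static class WhitespaceCheck
{
    // Hypothetical stand-in for the reflected array: all characters that
    // Char.IsWhiteSpace accepts, with the plain space placed at index 0.
    private static readonly char[] SpaceFirst = BuildSpaceFirstArray();

    private static char[] BuildSpaceFirstArray()
    {
        List<char> chars = new List<char>();
        chars.Add(' ');
        for (int i = char.MinValue; i <= char.MaxValue; i++)
        {
            char c = (char)i;
            if (c != ' ' && char.IsWhiteSpace(c))
            {
                chars.Add(c);
            }
        }
        return chars.ToArray();
    }

    public static bool TrimIsNullOrEmpty(string value)
    {
        if (value == null)
        {
            return true;
        }

        foreach (char c in value)
        {
            // For space-padded input, IndexOf matches at index 0 right away.
            if (Array.IndexOf<char>(SpaceFirst, c) == -1)
            {
                return false;
            }
        }

        return true;
    }
}
```

Whether this beats a dictionary lookup depends on how the real input is padded, so it would be worth measuring before committing to either.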

    I am curious though, are you seeing a performance issue, or do you just
    see the numbers and are worried about them? ASP.NET applications tend to
    get in a nice groove with the GC over time.

    Hope this helps.

    --
    - Nicholas Paldino [.NET/C# MVP]
    - mvp@spam.guard.caspershouse.com

    • Jonathan Allen

      #3
      Re: Fast string operations

      > 1.) if( someString.ToLower() == "somestring" )

      FxCop will actually catch and report instances of this for you. It is my 2nd
      favorite tool outside of Visual Studio.

      > 2.) if( someString != null && someString.Trim().Length > 0 )

      I would recommend using

      if (someString != null)
          someString = someString.Trim();
      else
          someString = "";

      if( someString.Length > 0 )

      My assumption here is that you already intend to trim the string before it
      is used.

      --
      Jonathan Allen


      • Chad Myers

        #4
        Re: Fast string operations

        Nicholas,

        Thanks for the quick reply. Unfortunately I'm not using .NET 2.0 (yet!), so
        I can't use Generics.

        Would looping over chars like that slow things down significantly? Also, is
        the char[] for each string cached with the string, or is a new one created
        when you call things like ToCharArray() or foreach() on the string (not
        every loop iteration, but on the first iteration)? Wouldn't I just be
        replacing a new string instance with a new char[] and not get any net gain
        over just calling .Trim()?

        In your opinion, if I weren't against unsafe code, could I make this
        significantly faster, or would it not afford me much difference?

        As far as performance, on some of our clients' instances, memory growth is
        rapid. It seems the more memory they have, the faster it grows which leads
        me to believe that the GC is being lax since it has so much free memory and
        doesn't see the need to aggressively collect memory. But it bothers our
        clients and they perceive this to be a memory leak.

        I realize it's an education issue, but I want to make sure that I'm
        educating them correctly, as opposed to just making up a B.S. excuse and
        Jedi hand-waving about the GC stuff.

        Also, it's not an ASP.NET application, it's an ASP app that used to call
        into VB6 COM objects. We've replaced the VB6 objects with .NET objects
        exposing a "compatibility layer" that has a ComVisible API that is identical
        (though not binary compatible) with the old VB6 stuff. Late-bound clients
        don't know the difference other than a different ProgID for the COM objects.

        So we're dealing with the workstation GC, as far as I know (since only ASP.NET
        uses the server GC unless you host the CLR yourself, from what I understand).
        I'm not sure how I'd even do that in an ASP/COM-interop situation, but,
        assuming it's possible, would writing our own CLR host to use the server GC
        help matters at all?

        Most of our clients' servers are dual-or-more processor boxes.

        Thanks again,
        Chad Myers

        • Chad Myers

          #5
          Re: Fast string operations

          Jonathan:

          Thanks for your quick response.

          Unfortunately, in (2), we're not doing that. In most cases, it's OK to have
          padded strings, just not all-whitespace strings.

          Regardless, even with your suggestion, the .Trim() still creates a new
          string instance and fills the heap with crap :(

          Thanks again,
          Chad

          • KH

            #6
            Re: Fast string operations

            For the second scenario -- trimming white space -- you could check the first
            and last chars to see if they're whitespace and only perform the Trim() if
            that condition is true:

            string str = " lalala ";

            if (Char.IsWhiteSpace(str[0]) || Char.IsWhiteSpace(str[str.Length - 1]))
            {
                str = str.Trim();
            }


            • KH

              #7
              Re: Fast string operations

              Better yet (I didn't notice this overload before) ...

              string str = " lalala ";

              // Be sure to check that the string variable is not a null reference
              // and its length is at least 1, otherwise you'll get index out of range
              // exceptions

              if (Char.IsWhiteSpace(str, 0) || Char.IsWhiteSpace(str, str.Length - 1))
              {
                  str = str.Trim();
              }
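A guarded version of the check above might look like the sketch below, with the null and zero-length cases handled before the first and last characters are inspected. The class and method names are hypothetical; only Char.IsWhiteSpace(string, int) and String.Trim() come from the thread.

```csharp
using System;

public static class TrimHelper
{
    // Hypothetical helper: calls Trim() only when the first or last
    // character is whitespace; otherwise returns the input unchanged.
    public static string TrimIfPadded(string str)
    {
        // Guard against null and empty input so the index checks are safe.
        if (str == null || str.Length == 0)
        {
            return str;
        }

        if (Char.IsWhiteSpace(str, 0) || Char.IsWhiteSpace(str, str.Length - 1))
        {
            return str.Trim();
        }

        return str;
    }
}
```

For input with no leading or trailing whitespace, the same string reference comes back and Trim() is never called.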



              • Chad Myers

                #8
                Re: Fast string operations

                KH,

                Hrm, that's a good idea. In your first suggestion, I'm afraid of indexing
                the char[] in the strings because I'm not sure when the char[] is created
                (does it always tag along with the string, or is it only created the first
                time you try to access the char[] -- through the indexer or through a call
                to ToCharArray, etc).

                But this suggestion just might work...

                I'll look into it and let everyone know how it goes.

                Thanks again everyone.

                Sincerely,
                Chad



                • Nicholas Paldino [.NET/C# MVP]

                  #9
                  Re: Fast string operations

                  Chad,

                  Looping over characters like that can't slow things down that much. No
                  matter what you do, you will have to perform some sort of loop operation to
                  check the string. There is no other way to do it.

                  Also, the char[] that is enumerated through is not created for every
                  iteration through the string. Rather, the string implements IEnumerable,
                  and then the IEnumerator implementation returned will return a new char for
                  each iteration.
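For code that cannot use generics, the same check can be sketched with a plain indexed loop and Char.IsWhiteSpace, neither of which needs .NET 2.0. The class and method names are hypothetical. The string indexer reads each character in place, so no char[] copy and no new string is created.

```csharp
using System;

public static class BlankCheck
{
    // Hypothetical helper: true when value is null, empty, or whitespace only.
    public static bool IsNullOrWhiteSpaceOnly(string value)
    {
        if (value == null)
        {
            return true;
        }

        for (int i = 0; i < value.Length; i++)
        {
            // value[i] returns a char by value; nothing is allocated here.
            if (!Char.IsWhiteSpace(value[i]))
            {
                return false;
            }
        }

        return true;
    }
}
```

The original test `someString != null && someString.Trim().Length > 0` would then become `!BlankCheck.IsNullOrWhiteSpaceOnly(someString)`, assuming Char.IsWhiteSpace is an acceptable definition of whitespace for the application.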

                  I don't think that using unsafe code is going to make things any better,
                  only because it's going to do the same thing you are going to do, maybe with
                  one or two operations eliminated in between (and I mean IL operations, not
                  function calls).

                  When you call Trim, a loop is going to start from the beginning of the
                  string, counting the whitespace characters that are at the beginning. Then
                  it is going to perform another loop to scan the end of the string for
                  whitespace characters. Once that is done, it will get the substring, which
                  will have to loop through the characters to copy them into a new string (on
                  some level or another, a loop is going to execute).
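The two scans described above can be sketched without the final copy, which gives the "TrimmedLength" asked for at the start of the thread. This is an illustration rather than the framework's actual implementation: the class and method names are hypothetical, and it assumes Char.IsWhiteSpace matches the set of characters that Trim() removes.

```csharp
using System;

public static class StringMeasure
{
    // Hypothetical helper: the length value.Trim() would have, computed
    // without building the trimmed string.
    public static int TrimmedLength(string value)
    {
        if (value == null)
        {
            return 0;
        }

        int start = 0;
        int end = value.Length - 1;

        // First loop: skip leading whitespace.
        while (start <= end && Char.IsWhiteSpace(value[start]))
        {
            start++;
        }

        // Second loop: skip trailing whitespace.
        while (end >= start && Char.IsWhiteSpace(value[end]))
        {
            end--;
        }

        // No substring is created; only the remaining span is measured.
        return end - start + 1;
    }
}
```

An all-whitespace string leaves start past end, so the result is 0 and the check becomes `StringMeasure.TrimmedLength(someString) > 0`.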

                  Also, the original issue was the amount of memory that is being consumed
                  (which in reality it is not; this is a customer education issue). If the
                  performance of the application is suffering, it is not because of these
                  operations. I would look elsewhere. The fact that you are using COM
                  interop means that for every call you make across that boundary, you are
                  adding something on the order of 40 extra operations. Depending on how
                  chunky your calls are, this could be a factor.

                  In the end, the GC is going to take up as much memory as possible, and
                  give it up only when the OS tells it (from a high level view). That's part
                  of what you sign up for when you use .NET. I'd work on educating your
                  customers to NOT look at task manager in order to determine whether or not
                  there is a memory leak. Rather, they should look at the performance
                  counters (many of which exist for .NET), which give a MUCH clearer
                  performance picture.

                  --
                  - Nicholas Paldino [.NET/C# MVP]
                  - mvp@spam.guard.caspershouse.com

                  "Chad Myers" <cmyers@N0.SP4M.austin.rr.com> wrote in message
                  news:P3pne.13797$PR6.10264@tornado.texas.rr.com...
                  > Nicholas,
                  >
                  > Thanks for the quick reply. Unfortunately I'm not using .NET 2.0 (yet!),
                  > so I can't use Generics.
                  >
                  > Would looping over chars like that slow things down significantly? Also,
                  > is the char[] for each string cached with the string, or is a new one
                  > created when you call things like ToCharArray() or foreach() on the string
                  > (not every loop iteration, but on the first iteration)? Wouldn't I just be
                  > replacing a new string instance with a new char[] and not get any net gain
                  > over just calling .Trim()?
                  >
                  > In your opinion, if I weren't against unsafe code, could I make this
                  > significantly faster, or would it not afford me much difference?
                  >
                  > As far as performance, on some of our clients' instances, memory growth is
                  > rapid. It seems the more memory they have, the faster it grows which leads
                  > me to believe that the GC is being lax since it has so much free memory
                  > and doesn't see the need to aggressively collect memory. But it bothers
                  > our clients and they perceive this to be a memory leak.
                  >
                  > I realize it's an education issue, but I want to make sure that I'm
                  > educating them correctly, as opposed to just making up a B.S. excuse and
                  > Jedi hand-waving about the GC stuff.
                  >
                  > Also, it's not an ASP.NET application, it's an ASP app that used to call
                  > into VB6 COM objects. We've replaced the VB6 objects with .NET objects
                  > exposing a "compatibility layer" that has a ComVisible API that is
                  > identical (though not binary compatible) with the old VB6 stuff.
                  > Late-bound clients don't know the difference other than a different ProgID
                  > for the COM objects.
                  >
                  > So we're dealing with the wkst GC, as far as I know (since only ASP.NET
                  > uses svr unless you host the CLR yourself, from what I understand). I'm
                  > not sure how I'd even do that in an ASP/COM-interop situation, but,
                  > assuming it's possible, would writing our own CLR host to use the svr GC
                  > help matters at all?
                  >
                  > Most of our clients' servers are dual-or-more processor boxes.
                  >
                  > Thanks again,
                  > Chad Myers
                  >
                  > "Nicholas Paldino [.NET/C# MVP]" <mvp@spam.guard.caspershouse.com> wrote
                  > in message news:Ozam7ZuZFHA.580@TK2MSFTNGP15.phx.gbl...
                  >> Chad,
                  >>
                  >> For the first scenario, your solution should give you an increase.
                  >>
                  >> For the second scenario, you should use reflection once to get a
                  >> reference to the internal static character array WhitespaceChars on the
                  >> string class. Then, you can write a method which will cycle through a
                  >> string passed to it, like so:
                  >>
                  >> public static bool TrimIsNullOrEmpty(string value)
                  >> {
                  >>     // If null, then get out.
                  >>     if (value == null)
                  >>     {
                  >>         // Return true.
                  >>         return true;
                  >>     }
                  >>
                  >>     // Cycle through the characters in the string. If a character is not
                  >>     // found in the whitespace array, return false; otherwise, when done,
                  >>     // return true. WhitespaceArray is the char[] obtained once via
                  >>     // reflection, as described above.
                  >>     foreach (char c in value)
                  >>     {
                  >>         // If the character is not found in the WhitespaceArray, then
                  >>         // return false.
                  >>         if (Array.IndexOf<char>(WhitespaceArray, c) == -1)
                  >>         {
                  >>             // Return false.
                  >>             return false;
                  >>         }
                  >>     }
                  >>
                  >>     // Return true, the string is full of whitespace.
                  >>     return true;
                  >> }
                  >>
                  >> I used the generic version of the IndexOf method on the Array class in
                  >> order to eliminate boxing. Also, if you really want to squeeze out every
                  >> last bit of performance from this, you can take the WhitespaceArray and
                  >> use the characters as keys in a dictionary. The number of whitespace
                  >> characters is 25 (right now, that is). However, if your strings
                  >> typically are padded with spaces, then you could get a big speed boost by
                  >> copying the array initially, and then placing the space character as the
                  >> first element in the array (which would cause most of the calls to
                  >> IndexOf to return very quickly, probably quicker than a lookup in a
                  >> dictionary).
                  >>
                  >> I am curious though, are you seeing a performance issue, or do you
                  >> just see the numbers and are worried about them? ASP.NET applications
                  >> tend to get in a nice groove with the GC over time.
                  >>
                  >> Hope this helps.
                  >>
                  >> --
                  >> - Nicholas Paldino [.NET/C# MVP]
                  >> - mvp@spam.guard.caspershouse.com



                  • Nicholas Paldino [.NET/C# MVP]

                    #10
                    Re: Fast string operations

                    Chad,

                    When you use the indexer on the string, it does not create a new
                    character array representing the whole string. Rather, it just fetches the
                    character and returns a copy of that single character to the user. A
                    character array is never created for the return value of an indexer.
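To make the difference concrete, here is a small sketch (class and method names are hypothetical, chosen only for this example). It contrasts ToCharArray(), which hands back a fresh array on each call for a non-empty string, with the indexer, which only returns a single char value:

```csharp
using System;

public sealed class IndexerDemo
{
    private IndexerDemo() { }

    // For a non-empty string, each ToCharArray() call copies the characters
    // into a new array, so the two references below are different objects.
    public static bool ToCharArrayAllocatesEachCall(string value)
    {
        char[] first = value.ToCharArray();
        char[] second = value.ToCharArray();
        return !Object.ReferenceEquals(first, second);
    }

    // The indexer returns one char by value; no array is involved.
    // This is the cheap pre-check suggested earlier in the thread:
    // only call Trim() when there is something to trim.
    public static bool HasOuterWhitespace(string value)
    {
        if (value == null || value.Length == 0)
        {
            return false;
        }

        return Char.IsWhiteSpace(value[0])
            || Char.IsWhiteSpace(value[value.Length - 1]);
    }
}
```

So a guard like `if (IndexerDemo.HasOuterWhitespace(str)) str = str.Trim();` only pays for a new string when the input actually has leading or trailing whitespace.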


                    --
                    - Nicholas Paldino [.NET/C# MVP]
                    - mvp@spam.guard.caspershouse.com

                    "Chad Myers" <cmyers@N0.SP4M.austin.rr.com> wrote in message
                    news:g9pne.13803$PR6.12970@tornado.texas.rr.com...
                    > KH,
                    >
                    > Hrm, that's a good idea. In your first suggestion, I'm afraid of indexing
                    > the char[] in the strings because I'm not sure when the char[] is created
                    > (does it always tag along with the string, or is it only created the first
                    > time you try to access the char[] -- through the indexer or through a call
                    > to ToCharArray, etc).
                    >
                    > But this suggestion just might work...
                    >
                    > I'll look into it and let everyone know how it goes.
                    >
                    > Thanks again everyone.
                    >
                    > Sincerely,
                    > Chad
                    >
                    > "KH" <KH@discussions.microsoft.com> wrote in message
                    > news:0FF3EA0B-C537-4AF9-9C4C-E4CA618E78B5@microsoft.com...
                    >> Better yet (I didn't notice this overload before) ...
                    >>
                    >> string str = " lalala ";
                    >>
                    >> // Be sure to check that the string variable is not a null reference
                    >> // and its length is at least 1, otherwise you'll get index out of range
                    >> // exceptions
                    >>
                    >> if (Char.IsWhiteSpace(str, 0) || Char.IsWhiteSpace(str, str.Length - 1))
                    >> {
                    >>     str = str.Trim();
                    >> }



                    • Jonathan Allen

                      #11
                      Re: Fast string operations

                      > As far as performance, on some of our clients' instances, memory growth is
                      > rapid. It seems the more memory they have, the faster it grows which leads
                      > me to believe that the GC is being lax since it has so much free memory
                      > and doesn't see the need to aggressively collect memory. But it bothers
                      > our clients and they perceive this to be a memory leak.

                      If it were an ASP.NET application, you could limit the amount of memory used
                      before the application recycles itself. However, I don't know whether that
                      is an option for ASP. I think your goal of educating the client is probably
                      the best bet.

                      May I suggest using the PerfMon tool to show them how often the GC runs and
                      its effect on memory.
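If it helps to see the same picture from inside the process, here is a rough sketch using only the GC class. It is an in-process approximation, not PerfMon itself, and GC.CollectionCount assumes .NET 2.0 or later (it is not available on 1.1). GcSnapshot, Describe and Churn are hypothetical names for illustration:

```csharp
using System;

public sealed class GcSnapshot
{
    private GcSnapshot() { }

    // Managed heap size plus the number of collections per generation so far.
    public static string Describe()
    {
        return String.Format(
            "heap={0} bytes, gen0={1}, gen1={2}, gen2={3}",
            GC.GetTotalMemory(false),
            GC.CollectionCount(0),
            GC.CollectionCount(1),
            GC.CollectionCount(2));
    }

    // Creates short-lived strings the same way the code under discussion
    // does (concatenation, Trim, ToLower), so the effect on the numbers
    // above can be observed.
    public static void Churn(int iterations)
    {
        for (int i = 0; i < iterations; i++)
        {
            string temp = ("  Value " + i + "  ").Trim().ToLower();
            if (temp.Length == 0)
            {
                throw new InvalidOperationException("unexpected empty string");
            }
        }
    }
}
```

Printing Describe() before and after a large Churn() call is one way to show a customer that collections keep happening while the heap figure does not grow without bound.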

                      --
                      Jonathan Allen




                      • Chad Myers

                        #12
                        Re: Fast string operations

                        Nicholas,

                        Looping: I thought looping over arrays in managed code was "slow"
                        (relatively speaking) because of all the bounds checking and whatnot. This
                        is why people use unsafe code now and then to use pointer arithmetic to loop
                        over arrays without all the unnecessary bounds checking.

                        I'm aware that looping has to occur one way or the other, but with Trim(),
                        the looping is happening in unmanaged code (COMString::TrimHelper to be
                        exact) and is much faster since it isn't required to do all the bloated .NET
                        array handling and such.

                        The problem with TrimHelper is that it always returns a new string instance.
                        There's no way to say "Trim and tell me what the length is when you're done,
                        don't return the trimmed string".

                        There are no performance issues that I'm aware of (yet). The concern is
                        rapid memory growth. The customer perceives this as a memory leak. I'm 99%
                        sure that it's just the GC being lazy/stand-offish until there's something
                        to worry about (we approach the 2GB limit of a process), but I wanted to
                        double-check before I unknowingly fed a line of B.S. to the customer.

                        The customer will eventually learn to understand this and accept it, but I
                        wanted to make sure that I understood it completely.

                        We have done extensive performance counting, so we're well aware of the
                        memory picture. I have followed numerous profiling guides and established
                        that the majority of allocations are System.String instances and that the
                        majority of the process's memory is taken up by the Gen2 heap. This
                        is what concerns me. My understanding is that you have to survive several
                        successive garbage collections in order to make it to the Gen2 heap. How are
                        temporary strings making it to Gen2?
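For what it's worth, the promotion rule can be checked with a small experiment. This is only a sketch under assumptions: GenerationDemo and Promote are hypothetical names, forced GC.Collect() calls stand in for the collections that happen naturally under load, and the exact generations reported can vary by runtime and build settings:

```csharp
using System;

public sealed class GenerationDemo
{
    private GenerationDemo() { }

    // Records the generation of one live string before any collection and
    // after each of two forced collections.
    public static int[] Promote()
    {
        // Built at run time, so it lives on the GC heap rather than being
        // an interned literal.
        string survivor = new string('x', 32);

        int[] generations = new int[3];
        generations[0] = GC.GetGeneration(survivor);

        GC.Collect();
        generations[1] = GC.GetGeneration(survivor);

        GC.Collect();
        generations[2] = GC.GetGeneration(survivor);

        // Keep the reference alive until after the last measurement.
        GC.KeepAlive(survivor);
        return generations;
    }
}
```

On a typical run the three values climb toward GC.MaxGeneration. The point is that an object only needs to be reachable at the moment of a couple of collections to be promoted, so a string that is "temporary" in the code but still referenced (by a field, a cache, or a long-running request) when the GC runs can end up in Gen2.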

                        That's the only reason for the 1% of doubt I have left.

                        Thanks again,
                        Chad

                        P.S.- the previous COM stuff was written in VB6 and was STA. We had
                        customers running on our legacy stuff (it was written before I got here,
                        don't blame me! ;) ) with many concurrent users. Some were starting to run
                        into the STA limitations, but, for the most part, it was running
                        surprisingly (and I mean surprisingly!) well. Eventually some hit a wall.

                        When our .NET rewrite/rearchitecture was finished, we wrote a COM
                        facade/compatibility layer so that existing COM- or ASP/SCRIPT-based code
                        could run against the new stuff.

                        We saw several orders of magnitude difference in performance, even with the
                        COM interop overhead, not to mention the now highly scalable multi-threaded
                        interface. Score +1 for .NET, again :)


                        "Nicholas Paldino [.NET/C# MVP]" <mvp@spam.guard.caspershouse.com> wrote in
                        message news:efo6bwuZFHA.3040@TK2MSFTNGP14.phx.gbl...[color=blue]
                        > Chad,
                        >
                        > Looping over characters like that can't slow things down that much. No
                        > matter what you do, you will have to perform some sort of loop operation
                        > to check the string. There is no other way to do it.
                        >
                        > Also, the char[] that is enumerated through is not created for every
                        > iteration through the string. Rather, the string implements IEnumerable,
                        > and then the IEnumerator implementation returned will return a new char
                        > for each iteration.
                        >
                        > I don't think that using unsafe code is going to make things any
                        > better, only because it's going to do the same thing you are going to do,
                        > maybe with one or two operations eliminated in between (and I mean IL
                        > operations, not function calls).
                        >
                        > When you call Trim, a loop is going to start from the beginning of the
                        > string, counting the whitespace characters that are at the beginning.
                        > Then it is going to perform another loop to scan the end of the string for
                        > whitespace characters. Once that is done, it will get the substring,
                        > which will have to loop through the characters to copy them into a new
                        > string (on some level or another, a loop is going to execute).
                        >
                        > Also, the original issue was the amount of memory that is being
                        > consumed (which in reality, it is not, but it is a customer education
                        > issue). If the performance of the application is suffering, it is not
                        > because of these operations. I would look elsewhere. The fact that you
                        > are using COM interop means that for every call you make across that
                        > boundary, you are adding something on the order of 40 extra operations.
                        > Depending on how chunky your calls are, this could be a factor.
                        >
                        > In the end, the GC is going to take up as much memory as possible, and
                        > give it up only when the OS tells it (from a high level view). That's
                        > part of what you sign up for when you use .NET. I'd work on educating
                        > your customers to NOT look at task manager in order to determine whether
                        > or not there is a memory leak. Rather, they should look at the
                        > performance counters (many of which exist for .NET) which give a MUCH more
                        > clear performance picture.
                        >
                        > --
                        > - Nicholas Paldino [.NET/C# MVP]
                        > - mvp@spam.guard.caspershouse.com
                        >
                        > "Chad Myers" <cmyers@N0.SP4M.austin.rr.com> wrote in message
                        > news:P3pne.13797$PR6.10264@tornado.texas.rr.com...[color=green]
                        >> Nicholas,
                        >>
                        >> Thanks for the quick reply. Unfortunately I'm not using .NET 2.0 (yet!),
                        >> so I can't use Generics.
                        >>
                        >> Would looping over chars like that slow things down significantly? Also,
                        >> is the char[] for each string cached with the string, or is a new one
                        >> created when you call things like ToCharArray() or foreach() on the
                        >> string (not every loop iteration, but on the first iteration)? Wouldn't I
                        >> just be replacing a new string instance with a new char[] and not get any
                        >> net gain over just calling .Trim()?
                        >>
                        >> In your opinion, if I weren't against unsafe code, could I make this
                        >> significantly faster, or would it not afford me much difference?
                        >>
                        >> As far as performance, on some of our clients' instances, memory growth
                        >> is rapid. It seems the more memory they have, the faster it grows which
                        >> leads me to believe that the GC is being lax since it has so much free
                        >> memory and doesn't see the need to aggressively collect memory. But it
                        >> bothers our clients and they perceive this to be a memory leak.
                        >>
                        >> I realize it's an education issue, but I want to make sure that I'm
                        >> educating them correctly, as opposed to just making up a B.S. excuse and
                        >> Jedi hand-waving about the GC stuff.
                        >>
                        >> Also, it's not an ASP.NET application, it's an ASP app that used to call
                        >> into VB6 COM objects. We've replaced the VB6 objects with .NET objects
                        >> exposing a "compatibil ity layer" that has a ComVisible API that is
                        >> identical (though not binary compatible) with the old VB6 stuff.
                        >> Late-bound clients don't know the difference other than a different
                        >> ProgID for the COM objects.
                        >>
                        >> So we're dealing with the wkst GC, as far as I know (since only ASP.NET
                        >> uses svr unless you host the CLR yourself, from what I understand). I'm
                        >> not sure how I'd even do that in an ASP/COM-interop situation, but,
                        >> assuming it's possible, would writing our own CLR host to use the svr GC
                        >> help matters at all?
                        >>
                        >> Most of our clients' servers are dual-or-more processor boxes.
                        >>
                        >> Thanks again,
                        >> Chad Myers
                        >>
                        >> "Nicholas Paldino [.NET/C# MVP]" <mvp@spam.guard .caspershouse.c om> wrote
                        >> in message news:Ozam7ZuZFH A.580@TK2MSFTNG P15.phx.gbl...[color=darkred]
                        >>> Chad,
                        >>>
                        >>> For the first scenario, your solution should give you an increase.
                        >>>
                        >>> For the second scenario, you should use reflection once to get a
                        >>> reference to the internal static character array WhitespaceChars on the
                        >>> string class. Then, you can write a method which will cycle through a
                        >>> string passed to it, like so:
                        >>>
                        >>> public static bool TrimIsNullOrEmp ty(string value)
                        >>> {
                        >>> // If null, then get out.
                        >>> if (value == null)
                        >>> {
                        >>> // Return true.
                        >>> return true;
                        >>> }
                        >>>
                        >>> // Cycle through the characters in the string. If the character is
                        >>> not found
                        >>> // in the whitespace array, return false, otherwise, when done,
                        >>> return true.
                        >>> foreach (char c in value)
                        >>> {
                        >>> // If the character is not found in the WhitespaceArray , then
                        >>> return
                        >>> // false.
                        >>> if (Array.IndexOf< char>(Whitespac eArray, char) == -1)
                        >>> {
                        >>> // Return false.
                        >>> return false;
                        >>> }
                        >>> }
                        >>>
                        >>> // Return true, the string is full of whitespace.
                        >>> return true;
                        >>> }
                        >>>
                        >>> I used the generic version of the IndexOf method on the Array class
                        >>> in order to eliminate boxing. Also, if you really want to squeeze out
                        >>> every last bit of performance from this, you can take the
                        >>> WhitespaceArray and use the characters as keys in a dictionary. The
                        >>> number of whitespace characters is 25 (right now, that is). However, if
                        >>> your strings typically are padded with spaces, then you could get a big
                        >>> speed boost by copying the array initially, and then placing the space
                        >>> character as the first element in the array (which would cause most of
                        >>> the calls to IndexOf to return very quickly, probably quicker than a
                        >>> lookup in a dictionary).
                        >>>
                        >>> I am curious though, are you seeing a performance issue, or do you
                        >>> just see the numbers and are worried about them? ASP.NET applications
                        >>> tend to get in a nice groove with the GC over time.
                        >>>
                        >>> Hope this helps.
                        >>>
                        >>> --
                        >>> - Nicholas Paldino [.NET/C# MVP]
                        >>> - mvp@spam.guard. caspershouse.co m
                        >>>
                        >>> "Chad Myers" <cmyers@N0.SP4M .austin.rr.com> wrote in message
                        >>> news:2rone.1378 0$PR6.9706@torn ado.texas.rr.co m...
                        >>>> I've been perf testing an application of mine and I've noticed that
                        >>>> there are a lot (and I mean A LOT -- megabytes and megabytes of 'em)
                        >>>> System.String instances being created.
                        >>>>
                        >>>> I've done some analysis and I'm led to believe (but can't yet
                        >>>> quantitatively establish as fact) that the two basic culprits are a lot
                        >>>> of calls to:
                        >>>>
                        >>>> 1.) if( someString.ToLo wer() == "somestring " )
                        >>>>
                        >>>> and
                        >>>>
                        >>>> 2.) if( someString != null && someString.Trim ().Length > 0 )
                        >>>>
                        >>>>
                        >>>> ToLower() generates a new string instance as does Trim().
                        >>>>
                        >>>> I believe that these are getting called many times and churning up a
                        >>>> bunch of strings faster than the GC can collect them, or perhaps
                        >>>> there's some weird interning/caching thing going on. Regardless, the
                        >>>> number of string instances grows and grows. It gets bumped down
                        >>>> occasionally, but it's basically 5 steps forward, 1 back.
                        >>>>
                        >>>> For reference, this is an ASP application calling into .NET ComVisible
                        >>>> objects. So I assume this uses the workstation GC, right?
                        >>>>
                        >>>>
                        >>>> Anyhow, so I think that I can solve problem (1) with String.Compare( )
                        >>>> which can perform in-place case-insensitive comparisons without
                        >>>> generating new string instances.
                        >>>>
                        >>>> Problem (2), however, is more complicated. There doesn't appear to be a
                        >>>> TrimmedLength or any type of method or property that can give me the
                        >>>> length of a string, minus whitespace and without generating a new
                        >>>> string instance, in the BCL.
                        >>>>
                        >>>> I suppose I could do some unsafe, or even unmanaged code (which is what
                        >>>> MSFT did for all their string handling stuff inside System.String and
                        >>>> using the COMString stuff), but I'd like to try to avoid that, or at
                        >>>> least use a library that's already written and well tested.
                        >>>>
                        >>>> Any thoughts?
                        >>>>
                        >>>> Thanks in advance,
                        >>>> Chad Myers
                        >>>>
                        >>>>
                        >>>
                        >>>[/color]
                        >>
                        >>[/color]
                        >
                        >[/color]



                        • KH

                          #13
                          Re: Fast string operations

                          Array looping: The CLR has an optimization to remove bounds checking under
                          certain conditions, mainly your basic for loop using Array.Length as the
                          condition:

                          for (int i=0; i < myarray.Length; ++i)
                          {
                          // presumably playing with the indexer or re-assigning the array variable
                          // here would disable the optimization, but I don't really know.
                          }

                          Anyway, I don't know what your app is, but it must be mighty big to be so
                          worried about string performance. It's usually overuse of strings that
                          causes problems, like building strings by conditionally concatenating them.
                          People don't realize that this creates a new string instance with EACH
                          operation:

                          string str1 = " ABC";
                          string str2 = "DEF ";
                          string str3 = (str1 + str2).ToLower().Trim(); // 3 strings created here

                          If that's the real issue you might look into the StringBuilder class, which
                          is mutable unlike String.
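
A minimal sketch of that suggestion, assuming the strings are being built up piece by piece in a loop. The class and method names (ConcatDemo, Join) are made up for illustration and are not from the original application:

```csharp
using System;
using System.Text;

public static class ConcatDemo
{
    // Hypothetical helper: joins the parts with commas using one growable
    // buffer instead of creating a new string on every concatenation.
    public static string Join(string[] parts)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.Length; ++i)
        {
            if (i > 0)
            {
                sb.Append(',');
            }
            sb.Append(parts[i]);
        }
        // The final string is produced once, here.
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Join(new string[] { "ABC", "DEF", "GHI" })); // prints ABC,DEF,GHI
    }
}
```

This only pays off when many pieces are appended; for a single comparison or trim check it does not help.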

                          - KH



                          "Chad Myers" wrote:
                          [color=blue]
                          > Nicholas,
                          >
                          > Looping: I thought looping over arrays in managed code was "slow"
                          > (relatively speaking) because of all the bounds checking and whatnot. This
                          > is why people use unsafe code now and then to use pointer arithmetic to loop
                          > over arrays without all the unnecessary bounds checking.
                          >
                          > I'm aware that looping has to occur one way or the other, but with Trim(),
                          > the looping is happening in unmanaged code (COMString::TrimHelper to be
                          > exact) and is much faster since it isn't required to do all the bloated .NET
                          > array handling and such.
                          >
                          > The problem with TrimHelper is that it always returns a new string instance.
                          > There's no way to say "Trim and tell me what the length is when you're done,
                          > don't return the trimmed string".
                          >
                          > There are no performance issues that I'm aware of (yet). The concern is
                          > rapid memory growth. The customer perceives this as a memory leak. I'm 99%
                          > sure that it's just the GC being lazy/stand-offish until there's something
                          > to worry about (we approach the 2GB limit of a process), but I wanted to
                          > double-check before I unknowingly fed a line of B.S. to the customer.
                          >
                          > The customer will eventually learn to understand this and accept it, but I
                          > wanted to make sure that I understood it completely.
                          >
                          > We have done extensive performance counting, so we're well aware of the
                          > memory picture. I have followed numerous profiling guides and established
                          > that the majority of allocations are of System.String's and that the
                          > majority of the process's memory is being taken up with the Gen2 heap. This
                          > is what concerns me. My understanding is that you have to survive several
                          > successive garbage collections in order to make it to the Gen2 heap. How are
                          > temporary strings making it to Gen2?
                          >
                          > That's the only reason for the 1% of doubt I have left.
                          >
                          > Thanks again,
                          > Chad
                          >
                          > P.S.- the previous COM stuff was written in VB6 and was STA. We had
                          > customers running on our legacy stuff (it was written before I got here,
                          > don't blame me! ;) ) with many concurrent users. Some were starting to run
                          > into the STA limitations, but, for the most part, it was running
                          > surprisingly (and I mean surprisingly!) well. Eventually some hit a wall.
                          >
                          > When our .NET rewrite/rearchitecture was finished, we wrote a COM
                          > facade/compatibility layer so that existing COM- or ASP/SCRIPT-based code
                          > could run against the new stuff.
                          >
                          > We saw several orders of magnitude difference in performance, even with the
                          > COM interop overhead, not to mention the now highly scalable multi-threaded
                          > interface. Score +1 for .NET, again :)
                          >
                          >
                          > "Nicholas Paldino [.NET/C# MVP]" <mvp@spam.guard.caspershouse.com> wrote in
                          > message news:efo6bwuZFHA.3040@TK2MSFTNGP14.phx.gbl...[color=green]
                          > > Chad,
                          > >
                          > > Looping over characters like that can't slow things down that much. No
                          > > matter what you do, you will have to perform some sort of loop operation
                          > > to check the string. There is no other way to do it.
                          > >
                          > > Also, the char[] that is enumerated through is not created for every
                          > > iteration through the string. Rather, the string implements IEnumerable,
                          > > and then the IEnumerator implementation returned will return a new char
                          > > for each iteration.
                          > >
                          > > I don't think that using unsafe code is going to make things any
                          > > better, only because it's going to do the same thing you are going to do,
                          > > maybe with one or two operations eliminated in between (and I mean IL
                          > > operations, not function calls).
                          > >
                          > > When you call Trim, a loop is going to start from the beginning of the
                          > > string, counting the whitespace characters that are at the beginning.
                          > > Then it is going to perform another loop to scan the end of the string for
                          > > whitespace characters. Once that is done, it will get the substring,
                          > > which will have to loop through the characters to copy them into a new
                          > > string (on some level or another, a loop is going to execute).
                          > >
                          > > Also, the original issue was the amount of memory that is being
                          > > consumed (which in reality, it is not, but it is a customer education
                          > > issue). If the performance of the application is suffering, it is not
                          > > because of these operations. I would look elsewhere. The fact that you
                          > > are using COM interop means that for every call you make across that
                          > > boundary, you are adding something on the order of 40 extra operations.
                          > > Depending on how chunky your calls are, this could be a factor.
                          > >
                          > > In the end, the GC is going to take up as much memory as possible, and
                          > > give it up only when the OS tells it (from a high level view). That's
                          > > part of what you sign up for when you use .NET. I'd work on educating
                          > > your customers to NOT look at task manager in order to determine whether
                          > > or not there is a memory leak. Rather, they should look at the
                          > > performance counters (many of which exist for .NET) which give a MUCH more
                          > > clear performance picture.
                          > >
                          > > --
                          > > - Nicholas Paldino [.NET/C# MVP]
                          > > - mvp@spam.guard.caspershouse.com
                          > >
                          > >[/color]
                          >
                          >
                          >[/color]


                          • Samuel R. Neff

                            #14
                            Re: Fast string operations


                            Array bound checking in a loop is optimized such that the bounds are
                            only verified once outside the loop when the JIT knows for sure that
                            every array index within the loop will be valid. This is the case
                            when looping over elements in an array (a built-in array, not a
                            collection) and only using the loop variable to index into the array
                            and not another variable or calculation.
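
The same loop shape can be applied to the trimmed-length question from earlier in the thread. This is a minimal sketch, not a BCL method: the names StringChecks and IsNullOrWhiteSpaceOnly are made up, and it assumes char.IsWhiteSpace is an acceptable definition of whitespace (it may not match the exact set Trim() uses in every framework version). It does not rely on the JIT removing the range check for the string indexer.

```csharp
using System;

public static class StringChecks
{
    // Hypothetical helper: true when value is null, empty, or whitespace only.
    // It never calls Trim(), so no new string instance is created.
    public static bool IsNullOrWhiteSpaceOnly(string value)
    {
        if (value == null)
        {
            return true;
        }
        for (int i = 0; i < value.Length; ++i)
        {
            // Stop at the first non-whitespace character.
            if (!char.IsWhiteSpace(value[i]))
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsNullOrWhiteSpaceOnly("   \t")); // prints True
        Console.WriteLine(IsNullOrWhiteSpaceOnly("  x "));  // prints False
    }
}
```

The original test, someString != null && someString.Trim().Length > 0, would then become !StringChecks.IsNullOrWhiteSpaceOnly(someString).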

                            Sam


                            On Wed, 01 Jun 2005 21:35:01 GMT, "Chad Myers"
                            <cmyers@N0.SP4M.austin.rr.com> wrote:

                            >Nicholas,
                            >
                            >Looping: I thought looping over arrays in managed code was "slow"
                            >(relatively speaking) because of all the bounds checking and whatnot. This
                            >is why people use unsafe code now and then to use pointer arithmetic to loop
                            >over arrays without all the unnecessary bounds checking.


                            • Jonathan Allen

                              #15
                              Re: Fast string operations

                              > Looping: I thought looping over arrays in managed code was "slow"
                              > (relatively speaking) because of all the bounds checking and whatnot. This
                              > is why people use unsafe code now and then to use pointer arithmetic to
                              > loop over arrays without all the unnecessary bounds checking.

                              That's not necessarily true.

                              for (int i = 0; i < arr.Length; i++)
                              {
                              sum += arr[i];
                              }

                              The optimizer will recognize this pattern and not perform the array bound
                              checks. That said, the chances of the array bound check being significant
                              are very low. On the other hand, the chances of messing things up big time
                              with unsafe code are great. Even greater are the chances of defeating the
                              real performance improvements that the compiler can make. In a
                              supercomputing class, we saw that the unrolled loop below can be faster on
                              some systems. The CLR knows this and uses it when appropriate.

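                              // Assumes arr.Length is a multiple of 4; otherwise the last pass
                              // indexes past the end of the array.
                              for (int i = 0; i < arr.Length; i += 4)
                              {
                                  sum += arr[i];
                                  sum += arr[i + 1];
                                  sum += arr[i + 2];
                                  sum += arr[i + 3];
                              }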
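
For arrays whose length is not a multiple of 4, the unrolled loop needs a cleanup pass. A minimal sketch, assuming an int[] input; UnrollDemo and Sum are made-up names, and whether unrolling by hand actually beats the plain loop on a given JIT is something to measure rather than assume:

```csharp
using System;

public static class UnrollDemo
{
    // Hypothetical helper: sums the array four elements at a time, then
    // handles the 0 to 3 leftover elements in an ordinary loop.
    public static int Sum(int[] arr)
    {
        int sum = 0;
        int i = 0;
        // Largest multiple of 4 that does not exceed arr.Length.
        int limit = arr.Length - (arr.Length % 4);
        for (; i < limit; i += 4)
        {
            sum += arr[i];
            sum += arr[i + 1];
            sum += arr[i + 2];
            sum += arr[i + 3];
        }
        // Remainder loop for the elements the unrolled loop did not cover.
        for (; i < arr.Length; ++i)
        {
            sum += arr[i];
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(Sum(new int[] { 1, 2, 3, 4, 5, 6, 7 })); // prints 28
    }
}
```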


                              --
                              Jonathan Allen


                              "Chad Myers" <cmyers@N0.SP4M.austin.rr.com> wrote in message
                              news:9Spne.13824$PR6.957@tornado.texas.rr.com...[color=blue]
                              > Nicholas,
                              >
                              > Looping: I thought looping over arrays in managed code was "slow"
                              > (relatively speaking) because of all the bounds checking and whatnot. This
                              > is why people use unsafe code now and then to use pointer arithmetic to
                              > loop over arrays without all the unnecessary bounds checking.
                              >
                              > I'm aware that looping has to occur one way or the other, but with Trim(),
                              > the looping is happening in unmanaged code (COMString::TrimHelper to be
                              > exact) and is much faster since it isn't required to do all the bloated
                              > .NET array handling and such.
                              >
                              > The problem with TrimHelper is that it always returns a new string
                              > instance. There's no way to say "Trim and tell me what the length is when
                              > you're done, don't return the trimmed string".
                              >
                              > There are no performance issues that I'm aware of (yet). The concern is
                              > rapid memory growth. The customer perceives this as a memory leak. I'm 99%
                              > sure that it's just the GC being lazy/stand-offish until there's something
                              > to worry about (we approach the 2GB limit of a process), but I wanted to
                              > double-check before I unknowingly fed a line of B.S. to the customer.
                              >
                              > The customer will eventually learn to understand this and accept it, but I
                              > wanted to make sure that I understood it completely.
                              >
                              > We have done extensive performance counting, so we're well aware of the
                              > memory picture. I have followed numerous profiling guides and established
                              > that the majority of allocations are of System.String's and that the
                              > majority of the process's memory is being taken up with the Gen2 heap.
                              > This is what concerns me. My understanding is that you have to survive
                              > several successive garbage collections in order to make it to the Gen2
                              > heap. How are temporary strings making it to Gen2?
                              >
                              > That's the only reason for the 1% of doubt I have left.
                              >
                              > Thanks again,
                              > Chad
                              >
                              > P.S.- the previous COM stuff was written in VB6 and was STA. We had
                              > customers running on our legacy stuff (it was written before I got here,
                              > don't blame me! ;) ) with many concurrent users. Some were starting to run
                              > into the STA limitations, but, for the most part, it was running
                              > surprisingly (and I mean surprisingly!) well. Eventually some hit a wall.
                              >
                              > When our .NET rewrite/rearchitecture was finished, we wrote a COM
                              > facade/compatibility layer so that existing COM- or ASP/SCRIPT-based code
                              > could run against the new stuff.
                              >
                              > We saw several orders of magnitude difference in performance, even with
                              > the COM interop overhead, not to mention the now highly scalable
                              > multi-threaded interface. Score +1 for .NET, again :)
                              >
                              >
                              > "Nicholas Paldino [.NET/C# MVP]" <mvp@spam.guard.caspershouse.com> wrote
                              > in message news:efo6bwuZFHA.3040@TK2MSFTNGP14.phx.gbl...[color=green]
                              >> Chad,
                              >>
                              >> Looping over characters like that can't slow things down that much.
                              >> No matter what you do, you will have to perform some sort of loop
                              >> operation to check the string. There is no other way to do it.
                              >>
                              >> Also, the char[] that is enumerated through is not created for every
                              >> iteration through the string. Rather, the string implements IEnumerable,
                              >> and then the IEnumerator implementation returned will return a new char
                              >> for each iteration.
                              >>
                              >> I don't think that using unsafe code is going to make things any
                              >> better, only because it's going to do the same thing you are going to do,
                              >> maybe with one or two operations eliminated in between (and I mean IL
                              >> operations, not function calls).
                              >>
                              >> When you call Trim, a loop is going to start from the beginning of the
                              >> string, counting the whitespace characters that are at the beginning.
                              >> Then it is going to perform another loop to scan the end of the string
                              >> for whitespace characters. Once that is done, it will get the substring,
                              >> which will have to loop through the characters to copy them into a new
                              >> string (on some level or another, a loop is going to execute).
                              >>
                              >> Also, the original issue was the amount of memory that is being
                              >> consumed (which in reality, it is not, but it is a customer education
                              >> issue). If the performance of the application is suffering, it is not
                              >> because of these operations. I would look elsewhere. The fact that you
                              >> are using COM interop means that for every call you make across that
                              >> boundary, you are adding something on the order of 40 extra operations.
                              >> Depending on how chunky your calls are, this could be a factor.
                              >>
                              >> In the end, the GC is going to take up as much memory as possible, and
                              >> give it up only when the OS tells it (from a high level view). That's
                              >> part of what you sign up for when you use .NET. I'd work on educating
                              >> your customers to NOT look at task manager in order to determine whether
                              >> or not there is a memory leak. Rather, they should look at the
                              >> performance counters (many of which exist for .NET) which give a MUCH
                              >> more clear performance picture.
                              >>
                              >> --
                              >> - Nicholas Paldino [.NET/C# MVP]
                              >> - mvp@spam.guard.caspershouse.com
                              >>
                              >> "Chad Myers" <cmyers@N0.SP4M.austin.rr.com> wrote in message
                              >> news:P3pne.13797$PR6.10264@tornado.texas.rr.com...[color=darkred]
                              >>> Nicholas,
                              >>>
                              >>> Thanks for the quick reply. Unfortunately I'm not using .NET 2.0 (yet!),
                              >>> so I can't use Generics.
                              >>>
                              >>> Would looping over chars like that slow things down significantly? Also,
                              >>> is the char[] for each string cached with the string, or is a new one
                              >>> created when you call things like ToCharArray() or foreach() on the
                              >>> string (not every loop iteration, but on the first iteration)? Wouldn't
                              >>> I just be replacing a new string instance with a new char[] and not get
                              >>> any net gain over just calling .Trim()?
                              >>>
                              >>> In your opinion, if I weren't against unsafe code, could I make this
                              >>> significantly faster, or would it not afford me much difference?
                              >>>
                              >>> As far as performance, on some of our clients' instances, memory growth
                              >>> is rapid. It seems the more memory they have, the faster it grows which
                              >>> leads me to believe that the GC is being lax since it has so much free
                              >>> memory and doesn't see the need to aggressively collect memory. But it
                              >>> bothers our clients and they perceive this to be a memory leak.
                              >>>
                              >>> I realize it's an education issue, but I want to make sure that I'm
                              >>> educating them correctly, as opposed to just making up a B.S. excuse and
                              >>> Jedi hand-waving about the GC stuff.
                              >>>
                              >>> Also, it's not an ASP.NET application, it's an ASP app that used to call
                              >>> into VB6 COM objects. We've replaced the VB6 objects with .NET objects
                              >>> exposing a "compatibility layer" that has a ComVisible API that is
                              >>> identical (though not binary compatible) with the old VB6 stuff.
                              >>> Late-bound clients don't know the difference other than a different
                              >>> ProgID for the COM objects.
                              >>>
                              >>> So we're dealing with the wkst GC, as far as I know (since only ASP.NET
                              >>> uses svr unless you host the CLR yourself, from what I understand). I'm
                              >>> not sure how I'd even do that in an ASP/COM-interop situation, but,
                              >>> assuming it's possible, would writing our own CLR host to use the svr GC
                              >>> help matters at all?
                              >>>
                              >>> Most of our clients' servers are dual-or-more processor boxes.
                              >>>
                              >>> Thanks again,
                              >>> Chad Myers
                              >>>
                              >>> "Nicholas Paldino [.NET/C# MVP]" <mvp@spam.guard.caspershouse.com> wrote
                              >>> in message news:Ozam7ZuZFHA.580@TK2MSFTNGP15.phx.gbl...
                              >>>> Chad,
                              >>>>
                              >>>> For the first scenario, your solution should give you an increase.
                              >>>>
                              >>>> For the second scenario, you should use reflection once to get a
                              >>>> reference to the internal static character array WhitespaceChars on the
                              >>>> string class. Then, you can write a method which will cycle through a
                              >>>> string passed to it, like so:
                              >>>>
                              >>>> // WhitespaceArray is the copy of String.WhitespaceChars obtained
                              >>>> // once via reflection, as described above.
                              >>>> public static bool TrimIsNullOrEmpty(string value)
                              >>>> {
                              >>>>     // If null, then get out.
                              >>>>     if (value == null)
                              >>>>     {
                              >>>>         // Return true.
                              >>>>         return true;
                              >>>>     }
                              >>>>
                              >>>>     // Cycle through the characters in the string. If the character is
                              >>>>     // not found in the whitespace array, return false, otherwise, when
                              >>>>     // done, return true.
                              >>>>     foreach (char c in value)
                              >>>>     {
                              >>>>         // If the character is not found in the WhitespaceArray, then
                              >>>>         // return false.
                              >>>>         if (Array.IndexOf<char>(WhitespaceArray, c) == -1)
                              >>>>         {
                              >>>>             // Return false.
                              >>>>             return false;
                              >>>>         }
                              >>>>     }
                              >>>>
                              >>>>     // Return true, the string is full of whitespace.
                              >>>>     return true;
                              >>>> }
                              >>>>
                              >>>> I used the generic version of the IndexOf method on the Array class
                              >>>> in order to eliminate boxing. Also, if you really want to squeeze out
                              >>>> every last bit of performance from this, you can take the
                              >>>> WhitespaceArray and use the characters as keys in a dictionary. The
                              >>>> number of whitespace characters is 25 (right now, that is). However,
                              >>>> if your strings typically are padded with spaces, then you could get a
                              >>>> big speed boost by copying the array initially, and then placing the
                              >>>> space character as the first element in the array (which would cause
                              >>>> most of the calls to IndexOf to return very quickly, probably quicker
                              >>>> than a lookup in a dictionary).
                              >>>>
                              >>>> I am curious though, are you seeing a performance issue, or do you
                              >>>> just see the numbers and are worried about them? ASP.NET applications
                              >>>> tend to get in a nice groove with the GC over time.
                              >>>>
                              >>>> Hope this helps.
                              >>>>
                              >>>> --
                              >>>> - Nicholas Paldino [.NET/C# MVP]
                              >>>> - mvp@spam.guard.caspershouse.com
                              >>>>
                              >>>> "Chad Myers" <cmyers@N0.SP4M.austin.rr.com> wrote in message
                              >>>> news:2rone.13780$PR6.9706@tornado.texas.rr.com...
                              >>>>> I've been perf testing an application of mine and I've noticed that
                              >>>>> there are a lot (and I mean A LOT -- megabytes and megabytes of 'em)
                              >>>>> System.String instances being created.
                              >>>>>
                              >>>>> I've done some analysis and I'm led to believe (but can't yet
                              >>>>> quantitatively establish as fact) that the two basic culprits are a
                              >>>>> lot of calls to:
                              >>>>>
                              >>>>> 1.) if( someString.ToLower() == "somestring" )
                              >>>>>
                              >>>>> and
                              >>>>>
                              >>>>> 2.) if( someString != null && someString.Trim().Length > 0 )
                              >>>>>
                              >>>>>
                              >>>>> ToLower() generates a new string instance as does Trim().
                              >>>>>
                              >>>>> I believe that these are getting called many times and churning up a
                              >>>>> bunch of strings faster than the GC can collect them, or perhaps
                              >>>>> there's some weird interning/caching thing going on. Regardless, the
                              >>>>> number of string instances grows and grows. It gets bumped down
                              >>>>> occasionally, but it's basically 5 steps forward, 1 back.
                              >>>>>
                              >>>>> For reference, this is an ASP application calling into .NET ComVisible
                              >>>>> objects. So I assume this uses the workstation GC, right?
                              >>>>>
                              >>>>>
                              >>>>> Anyhow, so I think that I can solve problem (1) with String.Compare()
                              >>>>> which can perform in-place case-insensitive comparisons without
                              >>>>> generating new string instances.
                              >>>>>
                              >>>>> Problem (2), however, is more complicated. There doesn't appear to be
                              >>>>> a TrimmedLength or any type of method or property that can give me the
                              >>>>> length of a string, minus whitespace and without generating a new
                              >>>>> string instance, in the BCL.
                              >>>>>
                              >>>>> I suppose I could do some unsafe, or even unmanaged code (which is
                              >>>>> what MSFT did for all their string handling stuff inside System.String
                              >>>>> and using the COMString stuff), but I'd like to try to avoid that, or
                              >>>>> at least use a library that's already written and well tested.
                              >>>>>
                              >>>>> Any thoughts?
                              >>>>>
                              >>>>> Thanks in advance,
                              >>>>> Chad Myers
                              >>>>>
                              >>>>>
                              >>>>
                              >>>>
                              >>>
                              >>>[/color]
                              >>
                              >>[/color]
                              >
                              >[/color]

