Efficent use of the string class

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Andrew Brampton

    Efficent use of the string class

    Hi,
    I'm new to using std classes, but I was wondering how I could do the
    following efficiently with the string class:

    Say I had a string with delimited values such as:
    "field1,field2, field3,field4"
    and I wanted to to place those 4 fields into different variables I would do
    something like:

    /* Note code quickly pulled from my head - may be wrong :) */
    char *tmp_ptr;
    char *ptr = input;

    while (ptr != null) {
    tmp_ptr = strchr(ptr, ',');
    strncpy(somewhe re, ptr, tmp_ptr - ptr);
    ptr = tmp_ptr;
    }

    Now that starts with a pointer to the beginning of the file, and moves it
    along... and each new search done by strchr starts where the last one left
    off.

    Now I'm faced with the same problem, but I wanted to use C++... so my code
    would look like this:

    string input = "blah,blah,blah ,blah";
    int pos = 0;
    int tmppos = 0;

    while (tmp != npos) {
    tmppos = input.find_firs t_of(",", pos);
    //do some kind of copy from pos, to tmppos
    pos = tmppos;
    }

    This again loops through the string, but I'm worried on how C++ handles this
    behind the scenes... Everytime I do input.find_firs t_of(.., pos) will it
    moving in its internal array from 0 to pos, and then starts the search?...In
    my previous case it didn't have to move from 0 to pos on each iteration
    because I stored pos as a pointer..

    So I was wondering if there is some way I can make input remember the last
    place I started from inside the string, or, should I just use c_str() and
    use my own char * to do everything?

    Also it should be noted that I'm only asking this so I can learn best
    practices, but say for example I wasn't doing this on small data, but data
    megabytes big, I imagine using a pointer to remember where to start from
    will be faster.

    Any feedback is welcome..
    Thanks
    Andrew




  • Karl Heinz Buchegger

    #2
    Re: Efficent use of the string class

    Andrew Brampton wrote:[color=blue]
    >
    > string input = "blah,blah,blah ,blah";
    > int pos = 0;
    > int tmppos = 0;
    >
    > while (tmp != npos) {
    > tmppos = input.find_firs t_of(",", pos);
    > //do some kind of copy from pos, to tmppos
    > pos = tmppos;
    > }
    >
    > This again loops through the string, but I'm worried on how C++ handles this
    > behind the scenes... Everytime I do input.find_firs t_of(.., pos) will it
    > moving in its internal array from 0 to pos, and then starts the search?...In
    > my previous case it didn't have to move from 0 to pos on each iteration
    > because I stored pos as a pointer..[/color]

    What do you think is the second argument to find_first_of() for?
    [color=blue]
    >
    > So I was wondering if there is some way I can make input remember the last
    > place I started from inside the string,[/color]

    You already did this by using the pos variable.

    --
    Karl Heinz Buchegger
    kbuchegg@gascad .at

    Comment

    • Siemel Naran

      #3
      Re: Efficent use of the string class

      "Andrew Brampton" <andrew@bramp.f reeserve.co.uk> wrote in message
      news:mb9oc.2112
      [color=blue]
      > while (ptr != null) {
      > tmp_ptr = strchr(ptr, ',');
      > strncpy(somewhe re, ptr, tmp_ptr - ptr);
      > ptr = tmp_ptr;
      > }[/color]
      [color=blue]
      > Now I'm faced with the same problem, but I wanted to use C++... so my code
      > would look like this:[/color]

      What works in C++ also works in C++.

      The direct translation would probably to C++ style algorithms would be
      std::find, but what you have works fine too and seems efficient. You can
      time.


      Comment

      • David Harmon

        #4
        Re: Efficent use of the string class

        On Tue, 11 May 2004 19:46:16 +0100 in comp.lang.c++, "Andrew Brampton"
        <andrew@bramp.f reeserve.co.uk> wrote,
        [color=blue]
        >while (tmp != npos) {
        > tmppos = input.find_firs t_of(",", pos);
        > //do some kind of copy from pos, to tmppos
        > pos = tmppos;
        >}[/color]

        My example version at


        Comment

        • Andrew Brampton

          #5
          Re: Efficent use of the string class

          Thanks for all the answers, but maybe I wasn't clear enough.

          Take for example this:
          string blah = "1234567890 ";
          for (int i=0; i<10; i++) {
          printf( blah.at(i) );
          }

          This can be improved in C by doing:
          char [] blah = "1234567890 "
          for (char *ptr = blah; ptr < blah + 10; ptr++) {
          printf( ptr );
          }

          In such a trival example the speed incrase is negletable... but say the
          string blah was a megabyte big, then the 2nd example will be alot faster.

          Now I was wondering if there was some similar way to do the 2nd example, but
          in C++. Something I hope like this:
          string blah = "1234567890 ";
          string_pointer ptr = blah; //some magical class I wish exists
          for (; ptr < blah.end() ; ptr++) {
          printf( ptr )
          }

          I know in this basic example the iterator would be used... but refer to my
          orginal question since I'm sure a iterator can't be used in that case.

          Thanks
          Andrew

          P.S I know I can keep my C code and use it in C++, but I was wondering if
          there was a C++ specific way of doing it.

          "Andrew Brampton" <andrew@bramp.f reeserve.co.uk> wrote in message
          news:mb9oc.2112 $wI4.214043@war ds.force9.net.. .
          | Hi,
          | I'm new to using std classes, but I was wondering how I could do the
          | following efficiently with the string class:
          |
          | Say I had a string with delimited values such as:
          | "field1,field2, field3,field4"
          | and I wanted to to place those 4 fields into different variables I would
          do
          | something like:
          |
          | /* Note code quickly pulled from my head - may be wrong :) */
          | char *tmp_ptr;
          | char *ptr = input;
          |
          | while (ptr != null) {
          | tmp_ptr = strchr(ptr, ',');
          | strncpy(somewhe re, ptr, tmp_ptr - ptr);
          | ptr = tmp_ptr;
          | }
          |
          | Now that starts with a pointer to the beginning of the file, and moves it
          | along... and each new search done by strchr starts where the last one left
          | off.
          |
          | Now I'm faced with the same problem, but I wanted to use C++... so my code
          | would look like this:
          |
          | string input = "blah,blah,blah ,blah";
          | int pos = 0;
          | int tmppos = 0;
          |
          | while (tmp != npos) {
          | tmppos = input.find_firs t_of(",", pos);
          | //do some kind of copy from pos, to tmppos
          | pos = tmppos;
          | }
          |
          | This again loops through the string, but I'm worried on how C++ handles
          this
          | behind the scenes... Everytime I do input.find_firs t_of(.., pos) will it
          | moving in its internal array from 0 to pos, and then starts the
          search?...In
          | my previous case it didn't have to move from 0 to pos on each iteration
          | because I stored pos as a pointer..
          |
          | So I was wondering if there is some way I can make input remember the last
          | place I started from inside the string, or, should I just use c_str() and
          | use my own char * to do everything?
          |
          | Also it should be noted that I'm only asking this so I can learn best
          | practices, but say for example I wasn't doing this on small data, but data
          | megabytes big, I imagine using a pointer to remember where to start from
          | will be faster.
          |
          | Any feedback is welcome..
          | Thanks
          | Andrew
          |
          |
          |
          |


          Comment

          • John Harrison

            #6
            Re: Efficent use of the string class


            "Andrew Brampton" <andrew@bramp.f reeserve.co.uk> wrote in message
            news:Xiboc.2157 $wI4.222997@war ds.force9.net.. .[color=blue]
            > Thanks for all the answers, but maybe I wasn't clear enough.
            >
            > Take for example this:
            > string blah = "1234567890 ";
            > for (int i=0; i<10; i++) {
            > printf( blah.at(i) );
            > }
            >
            > This can be improved in C by doing:
            > char [] blah = "1234567890 "
            > for (char *ptr = blah; ptr < blah + 10; ptr++) {
            > printf( ptr );[/color]

            printf( *ptr );
            [color=blue]
            > }
            >
            > In such a trival example the speed incrase is negletable... but say the
            > string blah was a megabyte big, then the 2nd example will be alot faster.[/color]

            No it wouldn't. You seem to have some misconception about how integer
            offsets and strings work together. Now sure the second example might be 10%
            faster on a small string, but if so it would be 10% faster on any length of
            string.
            [color=blue]
            >
            > Now I was wondering if there was some similar way to do the 2nd example,[/color]
            but[color=blue]
            > in C++. Something I hope like this:
            > string blah = "1234567890 ";
            > string_pointer ptr = blah; //some magical class I wish exists
            > for (; ptr < blah.end() ; ptr++) {
            > printf( ptr )
            > }
            >
            > I know in this basic example the iterator would be used... but refer to my
            > orginal question since I'm sure a iterator can't be used in that case.
            >[/color]

            You can use an iterator, use it in combination with std::find, instead of
            std::string::fi nd_first_of

            But I really don't see what you are worred about, all these methods are
            essentially equivalent.

            john


            Comment

            • Andrew Brampton

              #7
              Re: Efficent use of the string class

              Hi,
              thanks for trying to help, but I've just wrote a little benchmarking
              program, which proves that its ALOT more than 10%
              Here is the code (which works on windows, because it uses GetTickCount()) :

              DWORD now;
              int limit = 1000000; //size of string
              char temp; //used later to hold each char in the string
              std::string blah; //create a string
              blah.reserve(li mit); //reserve limit size of memory

              //File the string up with Xs, not used in the benchmark
              for (int i = 0; i<limit; i++) {
              blah.append("X" );
              }

              now = GetTickCount(); //Used for timing

              //Now we have a file full of Xs..
              //In this example we won't print the X out, instead
              //we will just set some varible it...kinda pointless :)
              //but printf is slow and takes away what we are actually benchmarking
              for (int i=0; i<limit; i++) {
              temp = blah.at(i) ;
              }

              printf("Done 1 in %dms", GetTickCount() - now);
              now = GetTickCount(); //Used for timing

              //Now we do the same thing, but with c char pointers
              char *blah_c = (char *)blah.c_str();
              for (char *ptr = blah_c; ptr < blah_c + limit; ptr++) {
              temp = *ptr;
              }

              printf("Done 2 in %dms\n", GetTickCount() - now);

              Here are the results run 10 times:
              Done 1 in 235ms Done 2 in 0ms
              Done 1 in 235ms Done 2 in 15ms
              Done 1 in 234ms Done 2 in 16ms
              Done 1 in 234ms Done 2 in 0ms
              Done 1 in 235ms Done 2 in 0ms
              Done 1 in 265ms Done 2 in 0ms
              Done 1 in 250ms Done 2 in 16ms
              Done 1 in 234ms Done 2 in 16ms
              Done 1 in 265ms Done 2 in 0ms
              Done 1 in 234ms Done 2 in 0ms

              You are welcome to point out any mistakes I've made, BUT the reason for
              example 1 is slow is because its doing this:
              blah.at(0)
              //internal c array is currently pointed at 0
              //work out 0 + 0
              //set a c pointer to 0 + 0
              //return memory that pointer is pointing at;
              blah.at(1)
              //internal c array is again pointed at 0
              //work out 0 + 1
              //set a c pointer to 0 + 1
              //return memory that pointer is pointing at;
              blah.at(2)
              //internal c array is yet again pointed at 0
              //work out 0 + 2
              //set a c pointer to 0 + 2
              //return memory that pointer is pointing at;

              Now the code using the char pointer does this
              *ptr
              //return memory that ptr is pointing at;
              ptr++;
              //increment ptr;
              *ptr
              //return memory that ptr is pointing at;
              ptr++
              //increment ptr;
              *ptr
              //return memory that ptr is pointing at;
              ptr++
              //increment ptr;

              If I was any good at asm I could most likly write that down in opcodes..

              I can imagine that Iterators were designed to stop this being slow, because
              the iterator keeps a internal pointer, just like my ptr did.

              You suggested using find with iterators, this might be the solution to my
              problem, but its getting late now and I have some other stuff to do now,
              I'll try that out tommorow.

              Thanks for the feedback
              Andrew

              "John Harrison" <john_andronicu s@hotmail.com> wrote in message
              news:2gcuj4F1hl pjU1@uni-berlin.de...
              |
              | "Andrew Brampton" <andrew@bramp.f reeserve.co.uk> wrote in message
              | news:Xiboc.2157 $wI4.222997@war ds.force9.net.. .
              | > Thanks for all the answers, but maybe I wasn't clear enough.
              | >
              | > Take for example this:
              | > string blah = "1234567890 ";
              | > for (int i=0; i<10; i++) {
              | > printf( blah.at(i) );
              | > }
              | >
              | > This can be improved in C by doing:
              | > char [] blah = "1234567890 "
              | > for (char *ptr = blah; ptr < blah + 10; ptr++) {
              | > printf( ptr );
              |
              | printf( *ptr );
              |
              | > }
              | >
              | > In such a trival example the speed incrase is negletable... but say the
              | > string blah was a megabyte big, then the 2nd example will be alot
              faster.
              |
              | No it wouldn't. You seem to have some misconception about how integer
              | offsets and strings work together. Now sure the second example might be
              10%
              | faster on a small string, but if so it would be 10% faster on any length
              of
              | string.
              |
              | >
              | > Now I was wondering if there was some similar way to do the 2nd example,
              | but
              | > in C++. Something I hope like this:
              | > string blah = "1234567890 ";
              | > string_pointer ptr = blah; //some magical class I wish exists
              | > for (; ptr < blah.end() ; ptr++) {
              | > printf( ptr )
              | > }
              | >
              | > I know in this basic example the iterator would be used... but refer to
              my
              | > orginal question since I'm sure a iterator can't be used in that case.
              | >
              |
              | You can use an iterator, use it in combination with std::find, instead of
              | std::string::fi nd_first_of
              |
              | But I really don't see what you are worred about, all these methods are
              | essentially equivalent.
              |
              | john
              |
              |


              Comment

              • JKop

                #8
                Re: Efficent use of the string class

                Andrew Brampton posted:
                [color=blue]
                > while (ptr != null)[/color]


                Do you not find that degrading?


                while (ptr)


                It reminds me of people who are learning a language, for instance English,
                and intentionally say "which" instead "that".


                -JKop

                Comment

                • Siemel Naran

                  #9
                  Re: Efficent use of the string class

                  "JKop" <NULL@NULL.NULL > wrote in message news:ZJcoc.7491[color=blue]
                  > Andrew Brampton posted:[/color]
                  [color=blue][color=green]
                  > > while (ptr != null)[/color]
                  >
                  > Do you not find that degrading?
                  >
                  > while (ptr)
                  >
                  > It reminds me of people who are learning a language, for instance English,
                  > and intentionally say "which" instead "that".[/color]

                  Neither (ptr != NULL) nor (ptr) should be considered better than the other.
                  Similarly, between putting braces on the same line and next line, neither
                  one is better. The thing to pay attention to is to follow a consistent
                  coding standard throughout your code.


                  Comment

                  • Siemel Naran

                    #10
                    Re: Efficent use of the string class

                    "John Harrison" <john_andronicu s@hotmail.com> wrote in message
                    news:2gcuj4F1hl pjU1@uni-
                    [color=blue][color=green]
                    > > I know in this basic example the iterator would be used... but refer to[/color][/color]
                    my[color=blue][color=green]
                    > > orginal question since I'm sure a iterator can't be used in that case.[/color]
                    >
                    > You can use an iterator, use it in combination with std::find, instead of
                    > std::string::fi nd_first_of[/color]

                    On some implementations , std::string::co nst_iterator is just const char *;
                    so using std::find with iterators should be very fast.


                    Comment

                    • John Harrison

                      #11
                      Re: Efficent use of the string class


                      "Andrew Brampton" <andrew@bramp.f reeserve.co.uk> wrote in message
                      news:ZYboc.2167 $wI4.225743@war ds.force9.net.. .[color=blue]
                      > Hi,
                      > thanks for trying to help, but I've just wrote a little benchmarking
                      > program, which proves that its ALOT more than 10%
                      > Here is the code (which works on windows, because it uses GetTickCount()) :
                      >
                      > DWORD now;
                      > int limit = 1000000; //size of string
                      > char temp; //used later to hold each char in the string
                      > std::string blah; //create a string
                      > blah.reserve(li mit); //reserve limit size of memory
                      >
                      > //File the string up with Xs, not used in the benchmark
                      > for (int i = 0; i<limit; i++) {
                      > blah.append("X" );
                      > }
                      >
                      > now = GetTickCount(); //Used for timing
                      >
                      > //Now we have a file full of Xs..
                      > //In this example we won't print the X out, instead
                      > //we will just set some varible it...kinda pointless :)
                      > //but printf is slow and takes away what we are actually benchmarking
                      > for (int i=0; i<limit; i++) {
                      > temp = blah.at(i) ;
                      > }
                      >
                      > printf("Done 1 in %dms", GetTickCount() - now);
                      > now = GetTickCount(); //Used for timing
                      >
                      > //Now we do the same thing, but with c char pointers
                      > char *blah_c = (char *)blah.c_str();
                      > for (char *ptr = blah_c; ptr < blah_c + limit; ptr++) {
                      > temp = *ptr;
                      > }
                      >
                      > printf("Done 2 in %dms\n", GetTickCount() - now);
                      >
                      > Here are the results run 10 times:
                      > Done 1 in 235ms Done 2 in 0ms
                      > Done 1 in 235ms Done 2 in 15ms
                      > Done 1 in 234ms Done 2 in 16ms
                      > Done 1 in 234ms Done 2 in 0ms
                      > Done 1 in 235ms Done 2 in 0ms
                      > Done 1 in 265ms Done 2 in 0ms
                      > Done 1 in 250ms Done 2 in 16ms
                      > Done 1 in 234ms Done 2 in 16ms
                      > Done 1 in 265ms Done 2 in 0ms
                      > Done 1 in 234ms Done 2 in 0ms
                      >
                      > You are welcome to point out any mistakes I've made, BUT the reason for
                      > example 1 is slow is because its doing this:[/color]

                      Obviously I didn't express my self very well.

                      I made some changes to your code to express the point I'm making better and
                      to make the test fairer.

                      1) I compiled in release mode, so the compile could optimise the code.
                      blah.at(i) is a function call, dereferencing a pointer is not. But in
                      release mode the compiler may optimise away the function call.

                      2) I made temp a global variable, without this change my compiler is smart
                      enough to realise that both sections of code do nothing, and to optimise
                      away both sections entirely.

                      3) I replaced blah.at(i) with blah[i]. This is a fairer comparison, at(i)
                      does range checking and throws an exception on an out of range index,
                      something neither blah[i] or *ptr do.

                      4) I repeated each loop 10000 times to get better timings.

                      5) This is the point I was trying to make, I repeated the test for different
                      length strings

                      Here's the results

                      length 10000, string method 359, pointer method 156
                      length 100000, string method 3453, pointer method 1704
                      length 1000000, string method 36438, pointer method 26578

                      No I was obviously wrong about 10%, its more like 100%. But my point is that
                      it is 100% slower irrespective of the length of the string (although for the
                      longest string its less than that, probably some virtual memory effect is
                      kicking in).

                      I thought your were implying that things got worse as the string got longer,
                      but maybe you didn't mean that. My point is that you might see a percentage
                      difference between all these methods, and I doubt you'll see a method more
                      efficient than using a pointer, but its a constant proportion. Its not
                      dependent on the length of the string.

                      john


                      Comment

                      • Siemel Naran

                        #12
                        Re: Efficent use of the string class

                        "John Harrison" <john_andronicu s@hotmail.com> wrote in message
                        news:2gdp2tF1j2 rpU1@uni-
                        [color=blue]
                        > I thought your were implying that things got worse as the string got[/color]
                        longer,[color=blue]
                        > but maybe you didn't mean that. My point is that you might see a[/color]
                        percentage[color=blue]
                        > difference between all these methods, and I doubt you'll see a method more
                        > efficient than using a pointer, but its a constant proportion. Its not
                        > dependent on the length of the string.[/color]

                        It does, kind of. If the optimized process is 1 sec and the unefficient way
                        is 100% slower, than it is 2 secs. If the process is 1 hours, then the
                        unefficient way is 2 hours. Percentage way, no difference. But amount
                        wise, big difference.


                        Comment

                        • Uwe Schnitker

                          #13
                          Re: Efficent use of the string class

                          "Andrew Brampton" <andrew@bramp.f reeserve.co.uk> wrote in message news:<ZYboc.216 7$wI4.225743@wa rds.force9.net> ...[color=blue]
                          > Hi,
                          > thanks for trying to help, but I've just wrote a little benchmarking
                          > program, which proves that its ALOT more than 10%
                          > Here is the code (which works on windows, because it uses GetTickCount()) :
                          >
                          > DWORD now;
                          > int limit = 1000000; //size of string
                          > char temp; //used later to hold each char in the string
                          > std::string blah; //create a string
                          > blah.reserve(li mit); //reserve limit size of memory
                          >
                          > //File the string up with Xs, not used in the benchmark
                          > for (int i = 0; i<limit; i++) {
                          > blah.append("X" );
                          > }
                          >
                          > now = GetTickCount(); //Used for timing
                          >
                          > //Now we have a file full of Xs..
                          > //In this example we won't print the X out, instead
                          > //we will just set some varible it...kinda pointless :)
                          > //but printf is slow and takes away what we are actually benchmarking
                          > for (int i=0; i<limit; i++) {
                          > temp = blah.at(i) ;[/color]

                          Please try to use blah[i] instead of blah.at(i).

                          The at function performs range-checking, but the operator[] function
                          typically does not - like the C-style code.

                          Using at is like using
                          if(ptr < blah_c + limit) temp = *ptr;
                          in the C-style example (so just for fun you might want to benchmark
                          that, too).

                          Hopefully you did a release build (with optimization on) for your benchmarks?

                          The standard library depends a lot on inline functions, if those are not
                          optimized, a C++ program using it will easily be an - or several - order(s)
                          of magnitude slower than a C-style program.

                          Of course, since you throw away what is assigned to temp in every loop
                          iteration, the compiler might just decide to take out the whole loop
                          anyway :-<

                          <SNIP>
                          [color=blue]
                          > You are welcome to point out any mistakes I've made, BUT the reason for
                          > example 1 is slow is because its doing this:
                          > blah.at(0)
                          > //internal c array is currently pointed at 0
                          > //work out 0 + 0
                          > //set a c pointer to 0 + 0
                          > //return memory that pointer is pointing at;
                          > blah.at(1)
                          > //internal c array is again pointed at 0
                          > //work out 0 + 1
                          > //set a c pointer to 0 + 1
                          > //return memory that pointer is pointing at;
                          > blah.at(2)
                          > //internal c array is yet again pointed at 0
                          > //work out 0 + 2
                          > //set a c pointer to 0 + 2
                          > //return memory that pointer is pointing at;
                          >
                          > Now the code using the char pointer does this
                          > *ptr
                          > //return memory that ptr is pointing at;
                          > ptr++;
                          > //increment ptr;
                          > *ptr
                          > //return memory that ptr is pointing at;
                          > ptr++
                          > //increment ptr;
                          > *ptr
                          > //return memory that ptr is pointing at;
                          > ptr++
                          > //increment ptr;
                          >
                          > If I was any good at asm I could most likly write that down in opcodes..
                          >
                          > I can imagine that Iterators were designed to stop this being slow, because
                          > the iterator keeps a internal pointer, just like my ptr did.
                          >
                          > You suggested using find with iterators, this might be the solution to my
                          > problem, but its getting late now and I have some other stuff to do now,
                          > I'll try that out tommorow.[/color]

                          Your analysis is impressively straightforward , but your compiler can optimize
                          away that overhead in most cases. It might just transform the code from the
                          first example to the second one behind the scenes, and create (almost) the
                          same executable for both cases. (After replacing at with [])
                          [color=blue]
                          > Thanks for the feedback
                          > Andrew[/color]

                          Uwe

                          <SNIP>

                          Comment

                          • Andrew Brampton

                            #14
                            Re: Efficent use of the string class

                            No, sorry I was just implying, that say example 1 was going to take a
                            minute, example 2 would take half that time, and 30seconds is a lot of time
                            :)

                            Thanks
                            Andrew

                            "John Harrison" <john_andronicu s@hotmail.com> wrote in message
                            news:2gdp2tF1j2 rpU1@uni-berlin.de...
                            |
                            | "Andrew Brampton" <andrew@bramp.f reeserve.co.uk> wrote in message
                            | news:ZYboc.2167 $wI4.225743@war ds.force9.net.. .
                            | > Hi,
                            | > thanks for trying to help, but I've just wrote a little benchmarking
                            | > program, which proves that its ALOT more than 10%
                            | > Here is the code (which works on windows, because it uses
                            GetTickCount()) :
                            | >
                            | > DWORD now;
                            | > int limit = 1000000; //size of string
                            | > char temp; //used later to hold each char in the string
                            | > std::string blah; //create a string
                            | > blah.reserve(li mit); //reserve limit size of memory
                            | >
                            | > //File the string up with Xs, not used in the benchmark
                            | > for (int i = 0; i<limit; i++) {
                            | > blah.append("X" );
                            | > }
                            | >
                            | > now = GetTickCount(); //Used for timing
                            | >
                            | > //Now we have a file full of Xs..
                            | > //In this example we won't print the X out, instead
                            | > //we will just set some varible it...kinda pointless :)
                            | > //but printf is slow and takes away what we are actually benchmarking
                            | > for (int i=0; i<limit; i++) {
                            | > temp = blah.at(i) ;
                            | > }
                            | >
                            | > printf("Done 1 in %dms", GetTickCount() - now);
                            | > now = GetTickCount(); //Used for timing
                            | >
                            | > //Now we do the same thing, but with c char pointers
                            | > char *blah_c = (char *)blah.c_str();
                            | > for (char *ptr = blah_c; ptr < blah_c + limit; ptr++) {
                            | > temp = *ptr;
                            | > }
                            | >
                            | > printf("Done 2 in %dms\n", GetTickCount() - now);
                            | >
                            | > Here are the results run 10 times:
                            | > Done 1 in 235ms Done 2 in 0ms
                            | > Done 1 in 235ms Done 2 in 15ms
                            | > Done 1 in 234ms Done 2 in 16ms
                            | > Done 1 in 234ms Done 2 in 0ms
                            | > Done 1 in 235ms Done 2 in 0ms
                            | > Done 1 in 265ms Done 2 in 0ms
                            | > Done 1 in 250ms Done 2 in 16ms
                            | > Done 1 in 234ms Done 2 in 16ms
                            | > Done 1 in 265ms Done 2 in 0ms
                            | > Done 1 in 234ms Done 2 in 0ms
                            | >
                            | > You are welcome to point out any mistakes I've made, BUT the reason for
                            | > example 1 is slow is because its doing this:
                            |
                            | Obviously I didn't express my self very well.
                            |
                            | I made some changes to your code to express the point I'm making better
                            and
                            | to make the test fairer.
                            |
                            | 1) I compiled in release mode, so the compile could optimise the code.
                            | blah.at(i) is a function call, dereferencing a pointer is not. But in
                            | release mode the compiler may optimise away the function call.
                            |
                            | 2) I made temp a global variable, without this change my compiler is smart
                            | enough to realise that both sections of code do nothing, and to optimise
                            | away both sections entirely.
                            |
                            | 3) I replaced blah.at(i) with blah[i]. This is a fairer comparison, at(i)
                            | does range checking and throws an exception on an out of range index,
                            | something neither blah[i] or *ptr do.
                            |
                            | 4) I repeated each loop 10000 times to get better timings.
                            |
                            | 5) This is the point I was trying to make, I repeated the test for
                            different
                            | length strings
                            |
                            | Here's the results
                            |
                            | length 10000, string method 359, pointer method 156
                            | length 100000, string method 3453, pointer method 1704
                            | length 1000000, string method 36438, pointer method 26578
                            |
                            | No I was obviously wrong about 10%, its more like 100%. But my point is
                            that
                            | it is 100% slower irrespective of the length of the string (although for
                            the
                            | longest string its less than that, probably some virtual memory effect is
                            | kicking in).
                            |
                            | I thought your were implying that things got worse as the string got
                            longer,
                            | but maybe you didn't mean that. My point is that you might see a
                            percentage
                            | difference between all these methods, and I doubt you'll see a method more
                            | efficient than using a pointer, but its a constant proportion. Its not
                            | dependent on the length of the string.
                            |
                            | john
                            |
                            |


                            Comment

                            • Andrew Brampton

                              #15
                              Re: Efficent use of the string class

                              I've found the answer to my problems now.

                              Three people mentioned the answer, but only as a side note. I'm starting to
                              see this trend in Newsgroups, you ask one question, and get the answer to 10
                              others which you didn't ask :)

                              But anyway thanks to all that answered, my eventual solution was to use
                              iterators with std::find... Before now I didn't know std::find existed
                              :).... and as a quick estimate the code runs nice and fast, faster than my
                              previous find_first_of, but not as fast as my C... However the point of this
                              exercise was to learn the most C++'s ish way to do something :)

                              Thanks to all
                              Andrew

                              "Andrew Brampton" <andrew@bramp.f reeserve.co.uk> wrote in message
                              news:mb9oc.2112 $wI4.214043@war ds.force9.net.. .
                              | Hi,
                              | I'm new to using std classes, but I was wondering how I could do the
                              | following efficiently with the string class:
                              |
                              | Say I had a string with delimited values such as:
                              | "field1,field2, field3,field4"
                              | and I wanted to to place those 4 fields into different variables I would
                              do
                              | something like:
                              |
                              | /* Note code quickly pulled from my head - may be wrong :) */
                              | char *tmp_ptr;
                              | char *ptr = input;
                              |
                              | while (ptr != null) {
                              | tmp_ptr = strchr(ptr, ',');
                              | strncpy(somewhe re, ptr, tmp_ptr - ptr);
                              | ptr = tmp_ptr;
                              | }
                              |
                              | Now that starts with a pointer to the beginning of the file, and moves it
                              | along... and each new search done by strchr starts where the last one left
                              | off.
                              |
                              | Now I'm faced with the same problem, but I wanted to use C++... so my code
                              | would look like this:
                              |
                              | string input = "blah,blah,blah ,blah";
                              | int pos = 0;
                              | int tmppos = 0;
                              |
                              | while (tmp != npos) {
                              | tmppos = input.find_firs t_of(",", pos);
                              | //do some kind of copy from pos, to tmppos
                              | pos = tmppos;
                              | }
                              |
                              | This again loops through the string, but I'm worried on how C++ handles
                              this
                              | behind the scenes... Everytime I do input.find_firs t_of(.., pos) will it
                              | moving in its internal array from 0 to pos, and then starts the
                              search?...In
                              | my previous case it didn't have to move from 0 to pos on each iteration
                              | because I stored pos as a pointer..
                              |
                              | So I was wondering if there is some way I can make input remember the last
                              | place I started from inside the string, or, should I just use c_str() and
                              | use my own char * to do everything?
                              |
                              | Also it should be noted that I'm only asking this so I can learn best
                              | practices, but say for example I wasn't doing this on small data, but data
                              | megabytes big, I imagine using a pointer to remember where to start from
                              | will be faster.
                              |
                              | Any feedback is welcome..
                              | Thanks
                              | Andrew
                              |
                              |
                              |
                              |


                              Comment

                              Working...