std::string and refcounting

  • joe martin

    std::string and refcounting

    In recent discussions relating to what to use for a new project which
    integrated the work of two previously separate teams, we got to the
    subject of our respective string implementations. One team rolled
    their own strings while the other used std::string. Reasons for
    using the home-grown strings (and vectors) were mainly refcounting and
    portability, but I thought that these days almost all STL
    implementations used refcounted strings and that the STL was available
    for most platforms.

    When we got back to test things out with my compiler (MSVC++ 6 with
    the latest patch-level) strings were refcounted but on the other team
    lead's computer (.net) strings were not refcounted. Do any of you know
    a webpage or site that consolidates information about the STL
    implementations on various platforms or does anyone have specific
    information about the state of the STL on Windows, WinCE, Symbian, Mac
    or Linux?

    Thanks for any help,

    -joe
  • red floyd

    #2
    Re: std::string and refcounting

    joe martin wrote:
    > In recent discussions relating to what to use for a new project which
    > integrated the work of two previously separate teams, we got to the
    > subject of our respective string implementations. One team rolled
    > their own strings while the other used std::string. Reasons for
    > using the home-grown strings (and vectors) were mainly refcounting and
    > portability, but I thought that these days almost all STL
    > implementations used refcounted strings and that the STL was available
    > for most platforms.
    >
    > When we got back to test things out with my compiler (MSVC++ 6 with
    > the latest patch-level) strings were refcounted but on the other team
    > lead's computer (.net) strings were not refcounted. Do any of you know
    > a webpage or site that consolidates information about the STL
    > implementations on various platforms or does anyone have specific
    > information about the state of the STL on Windows, WinCE, Symbian, Mac
    > or Linux?

    std::string was changed in VC.NET because of threading issues. However,
    that's kind of OT.


    • Dietmar Kuehl

      #3
      Re: std::string and refcounting

      joe martin wrote:
      > When we got back to test things out with my compiler (MSVC++ 6 with
      > the latest patch-level) strings were refcounted but on the other team
      > lead's computer (.net) strings were not refcounted. Do any of you know
      > a webpage or site that consolidates information about the STL
      > implementations on various platforms or does anyone have specific
      > information about the state of the STL on Windows, WinCE, Symbian, Mac
      > or Linux?

      I don't have specific information but my understanding from talking
      to the other C++ library implementers is that everybody is moving
      away from reference counted implementations of 'std::string'.
      Essentially, the reason is that the interface is not really suitable
      for this kind of implementation despite the fact that the specification
      in the standard actually even mentions reference counting in a note
      (if I remember correctly). This is somewhat related to the history of
      the string class which was vamped up when everything became a template.
      Things are further complicated in [potentially] multi-threaded
      environments where the reference counting approach effectively requires
      mutex locks in various places which significantly increases the costs.

      I haven't verified the results but apparently the conclusion is that
      copying strings is acceptable and the costs can be further reduced
      by the "small string" optimization (which simply embeds the string in
      the string object directly if it is smaller than e.g. 32 chars). For
      really large strings you probably want to pass them around by reference
      or through a shared pointer - at least when they are immutable.
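A minimal sketch of the small string idea described above, in C++. The class name `SmallString`, the 32-character threshold (the figure mentioned in the post) and the layout are assumptions for illustration only; real library implementations pack their storage differently and choose their own thresholds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical toy class: NOT how any real std::string is laid out.
class SmallString {
public:
    // Threshold assumed from the "e.g. 32 chars" figure in the post.
    static constexpr std::size_t kInlineCapacity = 32;

    explicit SmallString(const char* text) {
        assert(text != nullptr);
        size_ = std::strlen(text);
        store(text);
    }

    // Copying a short string is a plain memcpy with no allocation.
    SmallString(const SmallString& other) : size_(other.size_) {
        store(other.c_str());
    }

    bool is_inline() const { return heap_ == nullptr; }
    const char* c_str() const { return heap_ ? heap_.get() : inline_; }
    std::size_t size() const { return size_; }

private:
    void store(const char* text) {
        if (size_ < kInlineCapacity) {
            // Short: characters live inside the object itself.
            std::memcpy(inline_, text, size_ + 1);
        } else {
            // Long: fall back to a single heap allocation.
            heap_.reset(new char[size_ + 1]);
            std::memcpy(heap_.get(), text, size_ + 1);
        }
    }

    std::size_t size_ = 0;
    char inline_[kInlineCapacity] = {};
    std::unique_ptr<char[]> heap_;
};
```

Copying a short `SmallString` never touches the allocator, which is the cost the optimization is meant to avoid; only strings at or above the threshold pay for an allocation and a deep copy.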
      --
      <mailto:dietmar_kuehl@yahoo.com> <http://www.dietmar-kuehl.de/>
      <http://www.contendix.com> - Software Development & Consulting


      • joe martin

        #4
        Re: std::string and refcounting

        On Wed, 21 Apr 2004 05:38:10 +0200, Dietmar Kuehl
        <dietmar_kuehl@yahoo.com> wrote:

        >joe martin wrote:
        >> When we got back to test things out with my compiler (MSVC++ 6 with
        >> the latest patch-level) strings were refcounted but on the other team
        >> lead's computer (.net) strings were not refcounted. Do any of you know
        >> a webpage or site that consolidates information about the STL
        >> implementations on various platforms or does anyone have specific
        >> information about the state of the STL on Windows, WinCE, Symbian, Mac
        >> or Linux?
        >
        >I don't have specific information but my understanding from talking
        >to the other C++ library implementers is that everybody is moving
        >away from reference counted implementations of 'std::string'.
        >Essentially, the reason is that the interface is not really suitable
        >for this kind of implementation despite the fact that the specification
        >in the standard actually even mentions reference counting in a note
        >(if I remember correctly). This is somewhat related to the history of
        >the string class which was vamped up when everything became a template.
        >Things are further complicated in [potentially] multi-threaded
        >environments where the reference counting approach effectively requires
        >mutex locks in various places which significantly increases the costs.

        I thought, though, that atomic increments and decrements
        could be used in place of a mutex and that they were available on most
        processors. I asked on the Windows newsgroup about what their reasons
        might have been to not just use the provided Interlocked functions but
        haven't really gotten a response. Maybe I am wrong that atomic functions
        are all that are really needed for refcounted objects? That would be
        unfortunate as I think this is our internal solution.
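For illustration, here is a sketch of a reference count maintained only with atomic increments and decrements. It uses `std::atomic` from later C++ standards as a stand-in for the Win32 Interlocked functions, and `Buffer`, `share` and `drop` are hypothetical names. It shows only how the count itself is kept; whether that alone is enough for a complete copy-on-write string is exactly the open question here.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <utility>

// Hypothetical shared buffer; not taken from any real string implementation.
struct Buffer {
    explicit Buffer(std::string t) : text(std::move(t)) {}
    std::atomic<long> refs{1};   // the creator holds the first reference
    std::string text;
};

// Take another reference. Only atomicity is needed here, so relaxed
// ordering is sufficient for the increment.
Buffer* share(Buffer* b) {
    assert(b != nullptr);
    b->refs.fetch_add(1, std::memory_order_relaxed);
    return b;
}

// Drop a reference. The decrement is combined with a test: whoever sees
// the previous value 1 held the last reference and frees the buffer.
// Returns true if the buffer was destroyed.
bool drop(Buffer* b) {
    assert(b != nullptr);
    if (b->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete b;
        return true;
    }
    return false;
}
```

The decrement uses acquire-release ordering so that writes made through other references are visible to the thread that ends up destroying the buffer. Nothing in this sketch covers the harder part of a refcounted string, namely deciding when a shared buffer must be copied before a write.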

        >I haven't verified the results but apparently the conclusion is that
        >copying strings is acceptable and the costs can be further reduced
        >by the "small string" optimization (which simply embeds the string in
        >the string object directly if it is smaller than e.g. 32 chars). For
        >really large strings you probably want to pass them around by reference
        >or through a shared pointer - at least when they are immutable.

        This makes sense I guess although refcounting seems so much more
        efficient. My hope is that Smarter Brains Than Mine have considered
        the necessary issues in most STL implementations and acted
        accordingly. Anyway, thanks for your response.

        -joe


        • Siemel Naran

          #5
          Re: std::string and refcounting

          "joe martin" <joemar@hormel.product.iwishiwasdead.org> wrote in message
          > <dietmar_kuehl@yahoo.com> wrote:

          > >I don't have specific information but my understanding from talking
          > >to the other C++ library implementers is that everybody is moving
          > >away from reference counted implementations of 'std::string'.
          > >Essentially, the reason is that the interface is not really suitable
          > >for this kind of implementation despite the fact that the specification
          > >in the standard actually even mentions reference counting in a note
          > >(if I remember correctly). This is somewhat related to the history of
          > >the string class which was vamped up when everything became a template.

          What is it about the interface that turns implementors away from reference
          counting?

          > >Things are further complicated in [potentially] multi-threaded
          > >environments where the reference counting approach effectively requires
          > >mutex locks in various places which significantly increases the costs.

          This is only a problem when we share writable strings between threads. How
          often does this happen anyway? For that matter, isn't boost::shared_ptr a
          problem?

          > This makes sense I guess although refcounting seems so much more
          > efficient. My hope is that Smarter Brains Than Mine have considered
          > the necessary issues in most STL implementations and acted
          > accordingly. Anyway, thanks for your response.

          Why would refcounted strings be faster? Sure, it's fast when you pass and
          return strings by value. But when you then modify a string that shares its
          buffer, you have to make a deep copy anyway. Also, there is the return value
          optimization, but I don't know how many compilers implement this.



          • Dietmar Kuehl

            #6
            Re: std::string and refcounting

            joe martin <joemar@hormel.product.iwishiwasdead.org> wrote:
            > I thought, though, that atomic increments and decrements
            > could be used in place of a mutex and that they were available on most
            > processors. I asked on the Windows newsgroup about what their reasons
            > might have been to not just use the provided Interlocked functions but
            > haven't really gotten a response. Maybe I am wrong that atomic functions
            > are all that are really needed for refcounted objects? That would be
            > unfortunate as I think this is our internal solution.

            My understanding is that atomic increments and decrements (combined at
            least in one direction with a test) are sufficient for single processor
            machines but not for multi processor machines. However, I'm not really
            sure about this.

            > This makes sense I guess although refcounting seems so much more
            > efficient.

            Is this a conclusion from measuring or from deduction? I can see that
            reference counting huge strings will probably be more efficient but
            for the typical small strings I'm using in my programs I doubt that a
            reference counted approach is really faster. On the other hand, I
            haven't measured it either.

            > My hope is that Smarter Brains Than Mine have considered
            > the necessary issues in most STL implementations and acted
            > accordingly. Anyway, thanks for your response.

            The problem with the standard string class is that the specification
            effectively requires copying the string in many, often unexpected,
            places. For example, when obtaining and dereferencing an iterator for
            a non-const object, a copy becomes necessary if the representation is
            currently shared. The necessary additional
            logic will almost certainly dwarf any gains obtained from omitting
            copies except when handling mostly fairly large strings. A string
            class avoiding problems like this could use reference counting more
            effectively but I would still expect it to pay off only for bigger
            strings. A good analysis of this would probably be quite interesting.
            --
            <mailto:dietmar_kuehl@yahoo.com> <http://www.dietmar-kuehl.de/>
            <http://www.contendix.com> - Software Development & Consulting


            • Peter van Merkerk

              #7
              Re: std::string and refcounting

              > What is it about the interface that turns implementors away from reference
              > counting?

              This (old, possibly outdated) article sheds some light on the problem with
              reference counted string implementations:


              --
              Peter van Merkerk
              peter.van.merkerk(at)dse.nl



              • Michiel Salters

                #8
                Re: std::string and refcounting

                joe martin <joemar@hormel.product.iwishiwasdead.org> wrote in message news:<2n1b80dv7q8uf0oove1njb9ur8pnem1nh8@4ax.com>...
                > In recent discussions relating to what to use for a new project which
                > integrated the work of two previously separate teams, we got to the
                > subject of our respective string implementations. One team rolled
                > their own strings while the other used std::string. Reasons for
                > using the home-grown strings (and vectors) were mainly refcounting and
                > portability, but I thought that these days almost all STL
                > implementations used refcounted strings and that the STL was available
                > for most platforms.
                >
                > When we got back to test things out with my compiler (MSVC++ 6 with
                > the latest patch-level) strings were refcounted but on the other team
                > lead's computer (.net) strings were not refcounted. Do any of you know
                > a webpage or site that consolidates information about the STL
                > implementations on various platforms or does anyone have specific
                > information about the state of the STL on Windows, WinCE, Symbian, Mac
                > or Linux?

                A site that addresses the basics is www.gotw.ca, especially GOTW
                articles #43-#45. The executive summary: refcounting is too hard in
                threaded environments, and even in single-threaded environments
                typically provides little if any advantage.

                Regards,
                Michiel Salters


                • Alexander Terekhov

                  #9
                  Re: std::string and refcounting


                  Michiel Salters wrote:
                  [...]
                  > A site that addresses the basics is www.gotw.ca, especially GOTW
                  > articles #43-#45. The executive summary: refcounting is too hard in
                  > threaded environments, and even in single-threaded environments
                  > typically provides little if any advantage.

                  First off, it isn't really too hard. As for advantage... if deep
                  copying needs to allocate memory (small string optimisations
                  aside for a moment), it simply means that you'll incur "some"
                  synchronisation overhead in the allocator instead of one single
                  "naked" atomic increment without any membars on refcount. #43-#45
                  is rather interesting reading but don't believe everything
                  (especially conclusions) that it says.



                  regards,
                  alexander.


                  • Dietmar Kuehl

                    #10
                    Re: std::string and refcounting

                    "Siemel Naran" <SiemelNaran@REMOVE.att.net> wrote in message news:<k5Hhc.18351$um3.396410@bgtnsc04-news.ops.worldnet.att.net>...
                    > "joe martin" <joemar@hormel.product.iwishiwasdead.org> wrote in message
                    > > <dietmar_kuehl@yahoo.com> wrote:
                    >
                    > > >I don't have specific information but my understanding from talking
                    > > >to the other C++ library implementers is that everybody is moving
                    > > >away from reference counted implementations of 'std::string'.
                    > > >Essentially, the reason is that the interface is not really suitable
                    > > >for this kind of implementation despite the fact that the specification
                    > > >in the standard actually even mentions reference counting in a note
                    > > >(if I remember correctly). This is somewhat related to the history of
                    > > >the string class which was vamped up when everything became a template.
                    >
                    > What is it about the interface that turns implementors away from reference
                    > counting?

                    Effectively, the string has to be unshared in many, often unexpected,
                    situations. In particular, the string has to be [potentially]
                    unshared for each character access. This means that you get a conditional
                    dealing with the reference count in each iterator dereference and each array
                    access operation (on non-const strings, of course). This costs cycles
                    even for the considerate people who normally pass strings by reference.
                    Also, implementers use small string optimizations which don't need an
                    allocation for strings up to a certain size, e.g. 32 chars: this is big
                    enough to contain many strings (IDs, typical database values, etc.) and
                    only incurs a memory allocation for really big strings. With all this it
                    turns out that reference counting is actually more expensive than copying
                    strings in some cases.
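The per-access cost described above can be sketched with a toy copy-on-write wrapper. `CowString` is a hypothetical, single-threaded class built on `std::shared_ptr` purely for illustration; it is not how any shipping `std::string` is written.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical copy-on-write wrapper, single-threaded only.
class CowString {
public:
    explicit CowString(std::string text)
        : rep_(std::make_shared<std::string>(std::move(text))) {}

    // The compiler-generated copy constructor copies rep_, so copies
    // share one representation until somebody may write.

    // Const access can never write, so it never unshares.
    const char& operator[](std::size_t i) const {
        assert(i < rep_->size());
        return (*rep_)[i];
    }

    // Non-const access hands out a writable reference, so the class has
    // to assume a write is coming and unshare first, on every call.
    char& operator[](std::size_t i) {
        assert(i < rep_->size());
        if (rep_.use_count() > 1) {                       // the per-access branch
            rep_ = std::make_shared<std::string>(*rep_);  // the deep copy
        }
        return (*rep_)[i];
    }

    bool shares_with(const CowString& other) const { return rep_ == other.rep_; }

private:
    std::shared_ptr<std::string> rep_;
};
```

Reading `b[0]` through a non-const `CowString` already triggers the deep copy, because the class cannot tell a read from a write once it has handed out a `char&`. The branch on the reference count is paid even when the string is not shared at all.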

                    > > >Things are further complicated in [potentially] multi-threaded
                    > > >environments where the reference counting approach effectively requires
                    > > >mutex locks in various places which significantly increases the costs.
                    >
                    > This is only a problem when we share writable strings between threads.

                    Yes and no: the problem with reference counting inside a string is that
                    it is an implementation detail. As such, the implementer of a string
                    class for a multi-threaded environment has to make sure that it works
                    correctly if the string representation is really shared between threads
                    (well, strictly speaking the standard makes no such requirement but
                    users will expect it anyway): after all, to the user the strings are separate
                    things and there is no need to protect them in any form from concurrent
                    accesses. As a consequence, the string has to do the protection internally.
                    That is, it is a problem even if the strings are read-only and, from the
                    user's point of view, not shared at all...

                    > How
                    > often does this happen anyway? For that matter, isn't boost::shared_ptr a
                    > problem?

                    'shared_ptr' does not have this problem because it does no internal
                    sharing magic: if the 'shared_ptr' is used from different threads, the
                    user is responsible for the protection against concurrent accesses.
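To illustrate the difference, here is a sketch in which the sharing is explicit and the caller supplies the locking. It uses `std::shared_ptr` and `std::mutex` from later C++ standards as stand-ins for the Boost equivalents; `SharedText`, `append` and `snapshot` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical holder: the sharing is visible in the type, and so is the lock.
struct SharedText {
    std::mutex guard;   // protects text; supplied by the user, not by shared_ptr
    std::string text;
};

// Every writer takes the lock itself before touching the shared string.
void append(const std::shared_ptr<SharedText>& s, const std::string& more) {
    assert(s != nullptr);
    std::lock_guard<std::mutex> lock(s->guard);
    s->text += more;
}

// Readers also lock, and get an independent copy to work with afterwards.
std::string snapshot(const std::shared_ptr<SharedText>& s) {
    assert(s != nullptr);
    std::lock_guard<std::mutex> lock(s->guard);
    return s->text;
}
```

Anyone holding a `std::shared_ptr<SharedText>` knows the object may be shared and can see which lock guards it. With a refcounted string the sharing is invisible, so the string class would have to provide equivalent protection on its own.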
                    --
                    <mailto:dietmar_kuehl@yahoo.com> <http://www.dietmar-kuehl.de/>
                    <http://www.contendix.com> - Software Development & Consulting


                    • Michiel Salters

                      #11
                      Re: std::string and refcounting

                      Alexander Terekhov <terekhov@web.de> wrote in message news:<4087A330.42AED61B@web.de>...
                      > Michiel Salters wrote:
                      > [...]
                      > > A site that addresses the basics is www.gotw.ca, especially GOTW
                      > > articles #43-#45. The executive summary: refcounting is too hard in
                      > > threaded environments, and even in single-threaded environments
                      > > typically provides little if any advantage.
                      >
                      > First off, it isn't really too hard.

                      I think we're talking about two things. You probably interpreted it
                      as "too hard to implement correctly" while I meant "too hard to
                      implement correctly and still faster than comparable non-refcounted".

                      > As for advantage... if deep copying needs to allocate memory
                      > (small string optimisations aside for a moment), it simply means
                      > that you'll incur "some" synchronisation overhead in the
                      > allocator instead of one single "naked" atomic increment without
                      > any membars on refcount.

                      True. COW obviously shines in the absence of W. Of course, the common
                      CHAR_T& STRING::operator[](pos_type) might very well be a write, which
                      causes branches and possibly copies in COW-types.

                      > #43-#45 is rather interesting reading but don't believe
                      > everything (especially conclusions) that it says.

                      Indeed. The best string class can only be found by profiling.
                      Until that time, stick with std::string. It is universally available,
                      and in general recent versions are pretty good for common cases.
                      It also has the added advantage of being able to use
                      platform-specific tricks in the implementation without sacrificing
                      portability, something your code can never achieve ;)

                      Regards,
                      Michiel Salters


                      • Alexander Terekhov

                        #12
                        Re: std::string and refcounting


                        Michiel Salters wrote:
                        [...]
                        > True. COW obviously shines in the absence of W. Of course, the common
                        > CHAR_T& STRING::operator[](pos_type) might very well be a write, which
                        > causes branches and possibly copies in COW-types.

                        Use "const CHAR_T& STRING::operator[](pos_type) const" for reads.
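A small sketch of this advice. `Probe` and `first_char` are hypothetical names; the class only counts which overload of `operator[]` gets chosen, so the effect of reading through a const reference is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical probe type that records which operator[] overload ran.
class Probe {
public:
    explicit Probe(std::string text) : text_(std::move(text)) {}

    // Chosen for non-const objects; a COW string would have to unshare here.
    char& operator[](std::size_t i) {
        assert(i < text_.size());
        ++mutable_calls;
        return text_[i];
    }

    // Chosen for const objects and const references; no unsharing needed.
    const char& operator[](std::size_t i) const {
        assert(i < text_.size());
        ++const_calls;
        return text_[i];
    }

    int mutable_calls = 0;
    mutable int const_calls = 0;   // mutable so the const overload can count

private:
    std::string text_;
};

// Reading through a const reference selects the const overload.
char first_char(const Probe& p) { return p[0]; }
```

Overload resolution looks only at the constness of the object expression, not at what the caller does with the result. A plain `p[0]` on a non-const `Probe` therefore takes the mutable path even for a read, while `first_char(p)` takes the const one.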

                        regards,
                        alexander.


                        • Peter van Merkerk

                          #13
                          Re: std::string and refcounting

                          > > #43-#45 is rather interesting reading but don't believe
                          > > everything (especially conclusions) that it says.
                          >
                          > Indeed. The best string class can only be found by profiling.

                          For a given case and a given platform. It is not realistic to expect a
                          string implementation to produce optimal results in every case; a trade-off
                          has to be made somewhere.

                          --
                          Peter van Merkerk
                          peter.van.merkerk(at)dse.nl

