std::set constructor taking a sorted sequence

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • desktop

    std::set constructor taking a sorted sequence

    In the C++ standard page 472 it says that you can construct a std::set
    in linear time if the constructor gets a sorted sequence of elements.
    But how is this possible when insert takes logarithmic time? Should the
    time not be nlgn where n is the number of elements?
  • Zeppe

    #2
    Re: std::set constructor taking a sorted sequence

    desktop wrote:
    In the C++ standard page 472 it says that you can construct a std::set
    in linear time if the constructor gets a sorted sequence of elements.
    But how is this possible when insert takes logarithmic time? Should the
    time not be nlgn where n is the number of elements?
    The answer is in the question. If the sequence is ordered, there is no
    need to repeat the research each time (logn), so the set can be
    constructed in a linear time (you already know where to place the
    element, that would be the action requiring logn time).

    Regards,

    Zeppe

    Comment

    • Rosarin Roy

      #3
      Re: std::set constructor taking a sorted sequence

      On Jun 10, 6:15 pm, Zeppe <z...@remove.al l.this.long.com ment.email.it>
      wrote:
      desktop wrote:
      In the C++ standard page 472 it says that you can construct a std::set
      in linear time if the constructor gets a sorted sequence of elements.
      But how is this possible when insert takes logarithmic time? Should the
      time not be nlgn where n is the number of elements?
      >
      The answer is in the question. If the sequence is ordered, there is no
      need to repeat the research each time (logn), so the set can be
      constructed in a linear time (you already know where to place the
      element, that would be the action requiring logn time).
      >
      Regards,
      >
      Zeppe
      Most of the standard libraries make use of red-black trees for
      implementing set and map. In such case the insertion cannot be linear
      in time, immaterial of input is sorted or not sorted. And the claim
      that "you already know where to place the element" is not true across
      calls, as the search always begins at root.

      I wrote a test program (which I tested on Solaris running Sun C++ 5.8
      compiler) to confirm that sorted elements' insertion doesn't take
      linear time.

      I am still curious to know if it is possible to construct a set in
      linear time.

      Rosarin Roy

      Comment

      • Jerry Coffin

        #4
        Re: std::set constructor taking a sorted sequence

        In article <f4hpva$85u$1@n ews.net.uni-c.dk>, asdfsf@asd.com says...
        In the C++ standard page 472
        It's much better to cite section numbers or (better still) section names
        -- page numbers change from one version of the standard to the next, but
        the section names remain much closer to constant.
        it says that you can construct a std::set
        in linear time if the constructor gets a sorted sequence of elements.
        But how is this possible when insert takes logarithmic time? Should the
        time not be nlgn where n is the number of elements?
        Doing it with random access iterators is (mostly) pretty simple. You
        basically ignore the fact that you're working with a red-black tree (or
        AVL tree) and instead just construct a perfectly balanced tree -- take
        the median item in the input sequence, and make that your root. This
        leaves two halves of the input, which you recursively insert as the left
        and right sub-trees of the current node.

        This means you never re-balance the tree during creation (because you're
        creating it as close to perfectly balanced as possible given the number
        of nodes), and each node you insert is always a direct descendent of the
        current node, so you never have to do a Log(N) traversal to find where
        to insert a node.

        With Input Iterators, this gets a bit ugly -- doing this in linear time
        requires figuring the distance (and median item) in constant time. You
        could temporarily copy from the input sequence to a vector or deque, but
        that would rarely be worthwhile. In theory I believe this should be a
        win for a sufficiently large collection, but by the time it gets large
        enough, the space for the temporary copy would probably be prohibitive.

        --
        Later,
        Jerry.

        The universe is a figment of its own imagination.

        Comment

        • V.R. Marinov

          #5
          Re: std::set constructor taking a sorted sequence

          On Jun 11, 5:20 am, Jerry Coffin <jcof...@taeus. comwrote:
          In article <f4hpva$85...@n ews.net.uni-c.dk>, asd...@asd.com says...
          >
          >
          Doing it with random access iterators is (mostly) pretty simple. You
          basically ignore the fact that you're working with a red-black tree (or
          AVL tree) and instead just construct a perfectly balanced tree -- take
          the median item in the input sequence, and make that your root. This
          leaves two halves of the input, which you recursively insert as the left
          and right sub-trees of the current node.

          This is a very nice idea but unfortunately it imposes (small)
          unnecessary
          overhead when inserting unsorted ranges.

          With Input Iterators, this gets a bit ugly -- doing this in linear time
          requires figuring the distance (and median item) in constant time. You
          could temporarily copy from the input sequence to a vector or deque, but
          that would rarely be worthwhile. In theory I believe this should be a
          win for a sufficiently large collection, but by the time it gets large
          enough, the space for the temporary copy would probably be prohibitive.
          This is a bit overcomplicated and I guess that the set implementation
          is
          able to avoid it by making very good use of the "hint version" of
          insert().
          According to the standard the complexity of this function is:

          "logarithmi c in general, but amortized constant if t is inserted right
          after p."

          Comment

          • Jerry Coffin

            #6
            Re: std::set constructor taking a sorted sequence

            In article <1181536549.238 032.16540@p77g2 000hsh.googlegr oups.com>,
            v.r.marinov@gma il.com says...
            On Jun 11, 5:20 am, Jerry Coffin <jcof...@taeus. comwrote:
            [ ... ]
            With Input Iterators, this gets a bit ugly -- doing this in linear time
            requires figuring the distance (and median item) in constant time. You
            could temporarily copy from the input sequence to a vector or deque, but
            that would rarely be worthwhile. In theory I believe this should be a
            win for a sufficiently large collection, but by the time it gets large
            enough, the space for the temporary copy would probably be prohibitive.
            >
            This is a bit overcomplicated and I guess that the set implementation
            is able to avoid it by making very good use of the "hint version" of
            insert(). According to the standard the complexity of this function is:
            >
            "logarithmi c in general, but amortized constant if t is inserted right
            after p."
            This has one minor problem: while it gives amortized linear complexity,
            that doesn't (strictly speaking) meet the requirement for strictly
            linear complexity.

            --
            Later,
            Jerry.

            The universe is a figment of its own imagination.

            Comment

            • Zeppe

              #7
              Re: std::set constructor taking a sorted sequence

              Rosarin Roy wrote:
              On Jun 10, 6:15 pm, Zeppe <z...@remove.al l.this.long.com ment.email.it>
              wrote:
              >desktop wrote:
              >>In the C++ standard page 472 it says that you can construct a std::set
              >>in linear time if the constructor gets a sorted sequence of elements.
              [cut]
              Most of the standard libraries make use of red-black trees for
              implementing set and map. In such case the insertion cannot be linear
              in time, immaterial of input is sorted or not sorted. And the claim
              that "you already know where to place the element" is not true across
              calls, as the search always begins at root.
              as you can see from the quoted message, the OP says that "you can
              construct a std::set in linear time if the constructor gets a sorted
              sequence of elements." Of course if you make different calls it's not
              true any more. The reference constructor is that one accepting iterator
              first, iterator last. In that case, the comparison is done on the fly
              while inserting. I don't really know the red-black tree behaviour, but I
              can expect (correct me if I'm wrong) that, being a balanced binary tree,
              the insertion will imply a tree rebalancing which is O(1) (that is, it
              does not depend on the number of nodes). So, if I already know where to
              put the next value, I just have to do the balancing, that is O(1), and
              the total complexity is O(n).
              I wrote a test program (which I tested on Solaris running Sun C++ 5.8
              compiler) to confirm that sorted elements' insertion doesn't take
              linear time.
              >
              I am still curious to know if it is possible to construct a set in
              linear time.
              You have to build the set with the proper constructor. You will see a
              nice linear increment in the performance. Use this test:

              #include <set>
              #include <vector>
              #include <boost/date_time/posix_time/posix_time.hpp>

              int main()
              {
              std::vector<lon gv(100000);
              for(std::size_t i = 0; i < 100000; ++i){
              v[i] = i;
              }

              for(std::size_t i = 0; i < 100; ++i){
              std::vector<lon g>::const_itera tor begin = v.begin();
              std::vector<lon g>::const_itera tor end = v.begin() + 1000 * i;

              boost::posix_ti me::ptime startTime =
              boost::posix_ti me::microsec_cl ock::local_time ();
              std::set<longs( begin, end);
              boost::posix_ti me::ptime endTime =
              boost::posix_ti me::microsec_cl ock::local_time ();

              std::cout << endTime - startTime << std::endl;
              }
              return 0;
              }

              and plot the results.

              Regards,

              Zeppe

              Comment

              • desktop

                #8
                Re: std::set constructor taking a sorted sequence

                Jerry Coffin wrote:
                In article <1181536549.238 032.16540@p77g2 000hsh.googlegr oups.com>,
                v.r.marinov@gma il.com says...
                >On Jun 11, 5:20 am, Jerry Coffin <jcof...@taeus. comwrote:
                >
                [ ... ]
                >
                >>With Input Iterators, this gets a bit ugly -- doing this in linear time
                >>requires figuring the distance (and median item) in constant time. You
                >>could temporarily copy from the input sequence to a vector or deque, but
                >>that would rarely be worthwhile. In theory I believe this should be a
                >>win for a sufficiently large collection, but by the time it gets large
                >>enough, the space for the temporary copy would probably be prohibitive.
                >This is a bit overcomplicated and I guess that the set implementation
                >is able to avoid it by making very good use of the "hint version" of
                >insert(). According to the standard the complexity of this function is:
                >>
                >"logarithmi c in general, but amortized constant if t is inserted right
                > after p."
                >
                This has one minor problem: while it gives amortized linear complexity,
                that doesn't (strictly speaking) meet the requirement for strictly
                linear complexity.
                >
                Is it not enough to argument that the re balancing only takes constant
                time when the sequence are sorted?

                Re balancing can potentially take O(lg n) time. But when the sequence is
                sorted the while loop in re balancing will be executed at most one time
                so you get O(n) * O(1) which is O(n).

                Comment

                Working...