Cursors and Transactions, why?

  • Eric Ridge

    Cursors and Transactions, why?

    Why must a cursor be defined in an open transaction? Obviously there's
    a good reason, but I can't figure it out. On a high level, what would
    be involved in allowing a cursor to outlive the transaction that
    created it?

    Cursors seem as if they have some nice performance benefits (esp. if
    you're not using all rows found), but their usefulness drops
    considerably since you must leave a transaction open.

    eric


    ---------------------------(end of broadcast)---------------------------
    TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

  • Eric Ridge

    #2
    Re: Cursors and Transactions, why?

    On Apr 5, 2004, at 6:44 PM, Joe Conway wrote:
    > Eric Ridge wrote:
    >> Why must a cursor be defined in an open transaction? Obviously
    >> there's a good reason, but I can't figure it out. On a high level,
    >> what would be involved in allowing a cursor to outlive the
    >> transaction that created it?
    >
    > Historically I think it was because the memory was released at the end
    > of the current transaction (i.e. allocations were made in
    > TopTransactionContext). But as of 7.4, cursors *can* outlive
    > transactions:
    > http://www.postgresql.org/docs/curre...l-declare.html
    >
    > WITH HOLD
    > WITHOUT HOLD

    holy cow! This is fantastic. I had no idea. <short pause> ooh, and
    I see FETCH, in 7.4, supports absolute positioning. Must upgrade.

    thanks!

    eric



    • Joe Conway

      #3
      Re: Cursors and Transactions, why?

      Eric Ridge wrote:
      > Why must a cursor be defined in an open transaction? Obviously there's
      > a good reason, but I can't figure it out. On a high level, what would
      > be involved in allowing a cursor to outlive the transaction that created
      > it?

      Historically I think it was because the memory was released at the end
      of the current transaction (i.e. allocations were made in
      TopTransactionContext). But as of 7.4, cursors *can* outlive transactions:
      DECLARE — define a cursor. Synopsis: DECLARE name [ BINARY ] [ ASENSITIVE | INSENSITIVE ] [ [ NO …


      WITH HOLD
      WITHOUT HOLD

      WITH HOLD specifies that the cursor may continue to be used after
      the transaction that created it successfully commits. WITHOUT HOLD
      specifies that the cursor cannot be used outside of the transaction that
      created it. If neither WITHOUT HOLD nor WITH HOLD is specified, WITHOUT
      HOLD is the default.
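
      A minimal illustration in SQL (cursor and table names here are made up):

      ```sql
      BEGIN;
      DECLARE c CURSOR WITH HOLD FOR SELECT * FROM films;
      FETCH FORWARD 5 FROM c;
      COMMIT;                  -- the held cursor survives the commit
      FETCH FORWARD 5 FROM c;  -- still valid outside the transaction
      CLOSE c;
      ```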

      HTH,

      Joe



      • Jan Wieck

        #4
        Re: Cursors and Transactions, why?

        Eric Ridge wrote:
        > Why must a cursor be defined in an open transaction? Obviously there's
        > a good reason, but I can't figure it out. On a high level, what would
        > be involved in allowing a cursor to outlive the transaction that
        > created it?

        Because the transaction is what protects the rows that build the result
        set from being removed by vacuum. In PostgreSQL, a cursor is a running
        query executor just sitting in the middle of its operation. If the
        underlying query is for example a simple sequential scan, then the
        result set is not materialized but every future fetch operation will
        read directly from the base table. This would obviously get screwed up
        if vacuum thought nobody needed those rows any more.
        > Cursors seem as if they have some nice performance benefits (esp. if
        > you're not using all rows found), but their usefulness drops
        > considerably since you must leave a transaction open.

        And now you know why they are so good if you don't use all rows. This
        benefit I think goes away if you use Joe Conway's suggestion of WITH HOLD.


        Jan

        --
        #======================================================================#
        # It's easier to get forgiveness for being wrong than for being right. #
        # Let's break this rule - forgive me.                                  #
        #==================================================  JanWieck@Yahoo.com #



        • wespvp@syntegra.com

          #5
          Re: Cursors and Transactions, why?

          On 4/6/04 10:54 AM, "Jan Wieck" <JanWieck@yahoo.com> wrote:
          >> Cursors seem as if they have some nice performance benefits (esp. if
          >> you're not using all rows found), but their usefulness drops
          >> considerably since you must leave a transaction open.
          >
          > And now you know why they are so good if you don't use all rows. This
          > benefit I think goes away if you use Joe Conway's suggestion of WITH HOLD.

          I tried using WITH HOLD in the following case (using an ecpg C program):

          foreach row in table A
              update table B with value from table A
              commit once every 10,000 updates
          forend

          I created a cursor on table A. Without WITH HOLD, obviously I got an error
          on the next table A fetch because the COMMIT closed the cursor. I added
          'WITH HOLD' to the cursor. On the first COMMIT, the application hung. I
          assume the COMMIT would have completed after some period of time, but I
          didn't wait that long.

          There are 20 million rows in table A and 60 million in table B (one to many
          relationship).

          Is this hang on COMMIT when using WITH HOLD to be expected? Is there a way
          around it? I don't think it's reasonable to put the entire 60 million updates
          in a single transaction. The kludge solution I implemented was to write out
          all the data I needed from table A to a file, then read that file and update
          table B.
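
          For reference, a held-cursor version of that loop would look roughly like
          this (table and column names are invented; the COMMIT right after DECLARE
          is where the materialization cost would show up):

          ```sql
          BEGIN;
          DECLARE a_cur CURSOR WITH HOLD FOR SELECT id, val FROM table_a;
          COMMIT;  -- materializes the held cursor's result set

          -- repeated until FETCH returns no rows:
          FETCH FORWARD 10000 FROM a_cur;
          -- ... UPDATE table_b ... using the fetched rows ...
          COMMIT;

          CLOSE a_cur;
          ```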

          Wes



          • Tom Lane

            #6
            Re: Cursors and Transactions, why?

            <wespvp@syntegra.com> writes:
            > Is this hang on COMMIT when using WITH HOLD to be expected?

            Yes. WITH HOLD is not magic, it just makes a materialized copy of the
            SELECT result. If you're talking about a multi-million-row result,
            it's gonna take awhile.
            > The kludge solution I implemented was to write out all the data I
            > needed from table A to a file, then read that file and update table B.

            In theory at least, that should not be any faster than a WITH HOLD
            cursor, since you're effectively replicating the same functionality
            outside the database ...

            regards, tom lane


            • Eric Ridge

              #7
              Re: Cursors and Transactions, why?

              On Apr 6, 2004, at 11:54 AM, Jan Wieck wrote:
              > Eric Ridge wrote:
              >> Why must a cursor be defined in an open transaction? Obviously
              >> there's a good reason, but I can't figure it out. On a high level,
              >> what would be involved in allowing a cursor to outlive the
              >> transaction that created it?
              >
              > Because the transaction is what protects the rows that build the
              > result set from being removed by vacuum. In PostgreSQL, a cursor is a
              > running query executor just sitting in the middle of its operation.

              That's a good thing to know.
              > If the underlying query is for example a simple sequential scan, then
              > the result set is not materialized but every future fetch operation
              > will read directly from the base table. This would obviously get
              > screwed up if vacuum thought nobody needed those rows any more.

              Is vacuum the only thing that would muck with the rows?
              >> Cursors seem as if they have some nice performance benefits (esp. if
              >> you're not using all rows found), but their usefulness drops
              >> considerably since you must leave a transaction open.
              >
              > And now you know why they are so good if you don't use all rows. This
              > benefit I think goes away if you use Joe Conway's suggestion of WITH
              > HOLD.

              Okay, so WITH HOLD is actually materializing the entire resultset
              (sequential scan or otherwise)? If that's true, you're right, some of
              the benefits do go away.

              I need to set up a 7.4 test server and play with this some, and figure
              out if the benefits are really what I want them to be. I do appreciate
              the insight into how cursors work... it helps a lot!

              eric



              • wespvp@syntegra.com

                #8
                Re: Cursors and Transactions, why?

                On 4/6/04 3:55 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
                >> The kludge solution I implemented was to write out all the data I
                >> needed from table A to a file, then read that file and update table B.
                >
                > In theory at least, that should not be any faster than a WITH HOLD
                > cursor, since you're effectively replicating the same functionality
                > outside the database ...

                Except for the "out of memory" thing...

                Are you saying that once the first COMMIT completed, all COMMITs after that
                would function at normal speed - only the first one has to save the result
                set?

                Wes



                • Tom Lane

                  #9
                  Re: Cursors and Transactions, why?

                  <wespvp@syntegra.com> writes:
                  > On 4/6/04 3:55 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
                  >> In theory at least, that should not be any faster than a WITH HOLD
                  >> cursor, since you're effectively replicating the same functionality
                  >> outside the database ...
                  >
                  > Except for the "out of memory" thing...

                  What "out of memory thing"? The tuplestore code is perfectly capable of
                  spilling to disk --- in fact the usual performance gripe against it has
                  to do with spilling too soon, because sort_mem is set too small.

                  regards, tom lane


                  • Joe Conway

                    #10
                    Re: Cursors and Transactions, why?

                    Eric Ridge wrote:
                    > On Apr 6, 2004, at 11:54 AM, Jan Wieck wrote:
                    >> And now you know why they are so good if you don't use all rows. This
                    >> benefit I think goes away if you use Joe Conway's suggestion of WITH
                    >> HOLD.
                    >
                    > Okay, so WITH HOLD is actually materializing the entire resultset
                    > (sequential scan or otherwise)? If that's true, you're right, some of
                    > the benefits do go away.

                    Keep in mind that the tuplestore stays in memory as long as it fits
                    within sort_mem kilobytes. And you can do:

                    set sort_mem to <some_large_number>;

                    prior to COMMIT, and then

                    set sort_mem to default;

                    after COMMIT, as long as you can afford the memory use. A bit ugly, but
                    it might come in handy ;-)

                    Joe


                    • Jan Wieck

                      #11
                      Re: Cursors and Transactions, why?

                      Eric Ridge wrote:
                      > On Apr 6, 2004, at 11:54 AM, Jan Wieck wrote:
                      >> If the underlying query is for example a simple sequential scan, then
                      >> the result set is not materialized but every future fetch operation
                      >> will read directly from the base table. This would obviously get
                      >> screwed up if vacuum thought nobody needed those rows any more.
                      >
                      > Is vacuum the only thing that would muck with the rows?

                      Vacuum is the only thing that cares for the dustmites, yes.
                      > I need to set up a 7.4 test server and play with this some, and figure
                      > out if the benefits are really what I want them to be. I do appreciate
                      > the insight into how cursors work... it helps a lot!

                      Experience and knowledge can only be replaced by more experience and
                      more knowledge.


                      Jan

                      --
                      #======================================================================#
                      # It's easier to get forgiveness for being wrong than for being right. #
                      # Let's break this rule - forgive me.                                  #
                      #==================================================  JanWieck@Yahoo.com #



                      • Eric Ridge

                        #12
                        Re: Cursors and Transactions, why?

                        On Apr 7, 2004, at 12:43 AM, Joe Conway wrote:
                        > Eric Ridge wrote:
                        >> On Apr 6, 2004, at 11:54 AM, Jan Wieck wrote:
                        >>> And now you know why they are so good if you don't use all rows.
                        >>> This benefit I think goes away if you use Joe Conway's suggestion of
                        >>> WITH HOLD.
                        >> Okay, so WITH HOLD is actually materializing the entire resultset
                        >> (sequential scan or otherwise)? If that's true, you're right, some
                        >> of the benefits do go away.
                        >
                        > Keep in mind that the tuplestore stays in memory as long as it fits
                        > within sort_mem kilobytes. And you can do:

                        More good information. Thanks!

                        Is the tuplestore basically just an array of ItemPointers? I mean,
                        it's not a copy of each entire row, is it?

                        eric



                        • Eric Ridge

                          #13
                          Re: Cursors and Transactions, why?

                          On Apr 7, 2004, at 7:51 AM, Jan Wieck wrote:
                          > Eric Ridge wrote:
                          >> On Apr 6, 2004, at 11:54 AM, Jan Wieck wrote:
                          >>> If the underlying query is for example a simple sequential scan,
                          >>> then the result set is not materialized but every future fetch
                          >>> operation will read directly from the base table. This would
                          >>> obviously get screwed up if vacuum thought nobody needed those
                          >>> rows any more.
                          >> Is vacuum the only thing that would muck with the rows?
                          >
                          > Vacuum is the only thing that cares for the dustmites, yes.

                          And WITH HOLD is strong enough to defend against a vacuum, I hope...
                          >> I need to set up a 7.4 test server and play with this some, and figure
                          >> out if the benefits are really what I want them to be. I do
                          >> appreciate the insight into how cursors work... it helps a lot!
                          >
                          > Experience and knowledge can only be replaced by more experience and
                          > more knowledge.

                          Very wise words.

                          My real problem is that the JDBC drivers (and I assume this is true for
                          all client interfaces) buffer the results of a SELECT in memory,
                          because the backend pushes out all the tuples as the response. I'm not
                          dealing with a large number of rows (only a few thousand), but they're
                          very wide, and many contain fields with multi-megabyte data. In some
                          situations, when I've got a lot of open ResultSets, the JVM throws
                          OutOfMemory errors.

                          One half-baked thought was to hack the JDBC drivers to have 'em gzip
                          large resultsets in memory. Wouldn't completely solve the problem, but
                          would probably help quite a bit. But the better solution is to use
                          cursors. We're not in a position to upgrade to 7.4 just yet, so we'll
                          just deal with the OutOfMemory errors until we can.

                          eric



                          • Wes Palmer

                            #14
                            Re: Cursors and Transactions, why?

                            On 4/6/04 11:09 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
                            > What "out of memory thing"? The tuplestore code is perfectly capable of
                            > spilling to disk --- in fact the usual performance gripe against it has
                            > to do with spilling too soon, because sort_mem is set too small.

                            I tried doing a mass update of all rows with a single SQL statement in psql
                            and after it ran for many hours, I got 'out of memory'. I didn't try that
                            using C and WITH HOLD. I assumed it ran out of swap space, but was sleeping
                            at the time.

                            Wes



                            • Kris Jurka

                              #15
                              Re: Cursors and Transactions, why?



                              On Wed, 7 Apr 2004, Eric Ridge wrote:
                              > My real problem is that the JDBC drivers (and I assume this is true for
                              > all client interfaces) buffer the results of a SELECT in memory,
                              > because the backend pushes out all the tuples as the response. I'm not
                              > dealing with a large number of rows (only a few thousand), but they're
                              > very wide, and many contain fields with multi-megabyte data. In some
                              > situations, when I've got a lot of open ResultSets, the JVM throws
                              > OutOfMemory errors.

                              The 7.4 jdbc driver has the ability to use cursors behind the scenes on
                              queries. This is done by calling Statement.setFetchSize(n) to retrieve n
                              rows at a time. There are a number of other restrictions: you must be in
                              a transaction and the ResultSet type must be FORWARD_ONLY. You can use
                              the 7.4 jdbc driver against a 7.3 server as well so this may provide some
                              relief.
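
                              In code, that usage looks something like this (connection URL,
                              credentials, table name, and the fetch size of 50 are all placeholders):

                              ```java
                              // Cursor-based fetching with the 7.4 JDBC driver (sketch).
                              Connection conn = DriverManager.getConnection(
                                      "jdbc:postgresql://localhost/mydb", "user", "password");
                              conn.setAutoCommit(false);  // must be inside a transaction

                              Statement stmt = conn.createStatement(
                                      ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
                              stmt.setFetchSize(50);      // retrieve 50 rows at a time

                              ResultSet rs = stmt.executeQuery("SELECT * FROM big_table");
                              while (rs.next()) {
                                  // process one row; only ~50 rows are buffered at once
                              }
                              rs.close();
                              stmt.close();
                              conn.commit();
                              ```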

                              Kris Jurka


