Performance Tuning for Row-by-Row Update Statement

  • Muzamil

    Performance Tuning for Row-by-Row Update Statement

    hi

    For an unavoidable reason, I have to use row-by-row processing
    (update) on a temporary table to update a history table every day.
    I have around 60,000 records in the temporary table and about 2 million
    in the history table.

    Could anyone please suggest different methods to improve the runtime
    of the query?

    I would highly appreciate it!
  • David Portas

    #2
    Re: Performance Tuning for Row-by-Row Update Statement

    Is the row-by-row processing done in a cursor? Must you update exactly one
    row at a time (if so, why?) or would it be acceptable to update 2, 3 or 50
    rows at a time?

    You can use SET ROWCOUNT and a loop to fine-tune the batch size of rows to
    be updated. Bigger batches should improve performance over updating single
    rows.

    SET ROWCOUNT 50

    WHILE 1=1
    BEGIN

        UPDATE SomeTable
        SET ...
        WHERE /* row not already updated */

        IF @@ROWCOUNT = 0
            BREAK

    END

    SET ROWCOUNT 0
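
    To make that skeleton concrete, here is one way it could be fleshed out.
    This is only a sketch: the Processed flag column and the other names are
    hypothetical stand-ins for whatever marks a row as already updated in
    your schema.

    -- Batch-update sketch (hypothetical schema): update up to 50
    -- unprocessed rows per statement until none remain.
    SET ROWCOUNT 50

    WHILE 1=1
    BEGIN

        UPDATE SomeTable
        SET SomeCol = 'new value',   -- the real assignment goes here
            Processed = 1            -- mark the row so it is not picked again
        WHERE Processed = 0          -- only rows not already updated

        IF @@ROWCOUNT = 0
            BREAK                    -- nothing left to update

    END

    SET ROWCOUNT 0                   -- restore the default (no limit)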

    --
    David Portas
    SQL Server MVP
    --



      • Greg D. Moore (Strider)

        #4
        Re: Performance Tuning for Row-by-Row Update Statement


        "Muzamil" <muzamil@hotmail.com> wrote in message
        news:5a998f78.0405211023.24b40513@posting.google.com...
        > hi
        >
        > For an unavoidable reason, I have to use row-by-row processing
        > (update) on a temporary table to update a history table every day.
        > I have around 60,000 records in the temporary table and about 2 million
        > in the history table.

        Not much you can do if you absolutely HAVE to do row-by-row updating.

        You might want to post DDL, etc. so others can take a crack at it. I've
        seen many times someone will say, "I have to use a cursor", "I have to
        update one row at a time" and then someone posts a much better/faster
        solution.

        Also, how are you handling transactions? Explicitly or implicitly? If
        you're doing them implicitly, each update runs in its own transaction;
        could you batch, say, 20 updates into one instead?
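
        For example, batching a number of updates into one explicit
        transaction might look something like this sketch (the table and
        column names are placeholders, not from your system):

        -- Sketch: commit once per batch of 20 updates instead of once per
        -- row, cutting down the number of log flushes.
        DECLARE @i int
        SET @i = 1

        BEGIN TRANSACTION
        WHILE @i <= 20
        BEGIN
            UPDATE HistoryTable           -- hypothetical single-row update
            SET SomeCol = 'new value'
            WHERE KeyCol = @i

            SET @i = @i + 1
        END
        COMMIT TRANSACTION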

        Finally, where are your log files? On separate physical drives?

        >
        > Could anyone please suggest different methods to improve the runtime
        > of the query?
        >
        > I would highly appreciate it!



        • Muzamil

          #5
          Re: Performance Tuning for Row-by-Row Update Statement

          Hi
          Thanks for your reply.

          The row-by-row update is mandatory because the legacy system sends
          us records flagged "Add", "Modify" or "Delete", and this
          information HAS to be processed in the same order, otherwise we'll
          get erroneous data.
          I know it's a dumb way of doing things, but this is what our IT
          department and theirs have chosen as the correct course of action
          after several meetings. Hence the batch idea will not work here.


          I am not using cursors; instead I am using a loop based on the
          primary key.
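
          (For reference, a key-based loop of that kind generally looks
          something like the sketch below; the table and column names here
          are placeholders, not our actual schema.)

          -- Sketch: walk the temp table in primary-key order, one row at a time.
          DECLARE @key int
          SELECT @key = MIN(KeyCol) FROM #TempTable

          WHILE @key IS NOT NULL
          BEGIN
              -- apply this record's change to the history table
              UPDATE h
              SET SomeCol = t.SomeCol
              FROM HistoryTable h
              JOIN #TempTable t ON t.KeyCol = h.KeyCol
              WHERE t.KeyCol = @key

              -- advance to the next key in processing order
              SELECT @key = MIN(KeyCol) FROM #TempTable WHERE KeyCol > @key
          END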

          The log files are on different drives.

          I've also tried using "WITH (ROWLOCK)" in the update statement but
          it's not helping much.

          Can you please still throw in some ideas? It would be a great help!

          Thanks


          "Greg D. Moore (Strider)" <mooregr_deleteth1s@greenms.com> wrote in message news:<tOxrc.234090$M3.65389@twister.nyroc.rr.com>...
          > "Muzamil" <muzamil@hotmail.com> wrote in message
          > news:5a998f78.0405211023.24b40513@posting.google.com...
          > > hi
          > >
          > > For an unavoidable reason, I have to use row-by-row processing
          > > (update) on a temporary table to update a history table every day.
          > > I have around 60,000 records in the temporary table and about 2 million
          > > in the history table.
          >
          > Not much you can do if you absolutely HAVE to do row-by-row updating.
          >
          > You might want to post DDL, etc. so others can take a crack at it. I've
          > seen many times someone will say, "I have to use a cursor", "I have to
          > update one row at a time" and then someone posts a much better/faster
          > solution.
          >
          > Also, how are you handling transactions? Explicitly or implicitly? If
          > you're doing them implicitly, each update runs in its own transaction;
          > could you batch, say, 20 updates into one instead?
          >
          > Finally, where are your log files? On separate physical drives?
          >
          > > Could anyone please suggest different methods to improve the runtime
          > > of the query?
          > >
          > > I would highly appreciate it!


          • Erland Sommarskog

            #6
            Re: Performance Tuning for Row-by-Row Update Statement

            Muzamil (muzamil@hotmail.com) writes:
            > The row-by-row update is mandatory because the legacy system
            > sends us records flagged "Add", "Modify" or "Delete", and this
            > information HAS to be processed in the same order, otherwise we'll
            > get erroneous data.

            Ouch. Life is cruel, sometimes.

            I wonder what possibilities there could be to find parallel streams,
            that is, updates that could be performed independently. Maybe you
            could modify 10 rows at a time then. But it does not sound like a very
            easy thing to do.

            Without knowing the details of the system, it is difficult to give
            much advice. But any sort of pre-aggregation you can do is probably
            going to pay off.
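
            For instance, pre-aggregation could mean collapsing several
            "Modify" records for the same key into just the last one before
            the row-by-row pass. A sketch of that idea, with hypothetical
            table and column names, and assuming each incoming record carries
            the full set of column values:

            -- Sketch: drop a Modify whose immediate successor for the same
            -- key is also a Modify; the later one overwrites it anyway.
            DELETE a
            FROM inputtbl a
            WHERE a.statusflag = 'M'
              AND 'M' = (SELECT TOP 1 b.statusflag
                         FROM inputtbl b
                         WHERE b.keyval = a.keyval
                           AND b.rownumber > a.rownumber
                         ORDER BY b.rownumber)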


            --
            Erland Sommarskog, SQL Server MVP, sommar@algonet.se

            Books Online for SQL Server SP3


            • Muzamil

              #7
              Re: Performance Tuning for Row-by-Row Update Statement

              Details of the system:
              The legacy system sends us records flagged with "Add", "Modify" or
              "Delete".
              The purpose of these flags is self-explanatory. But the fun began when
              we noticed that within the same file, the legacy system sends us "Add"
              and then "Modify". Thus, we were left with no option but to do
              row-by-row processing.
              We came up with the following logic:

              a) If the record's StatusFlag is 'A' and the record's key does not exist
              in DataWareHouse's Table, then the record is inserted into
              DataWareHouse's Table.

              b) If the record's StatusFlag is 'A', but the record's key exists in
              DataWareHouse's Table, then the record is marked as invalid and will
              be inserted into InvalidTable.

              c) If the record's StatusFlag is 'M', the record's key exists in
              DataWareHouse's Table and the record is active, then the corresponding
              record in DataWareHouse's Table will be updated.

              d) If the record's StatusFlag is 'M' and the record's key exists in
              DataWareHouse's Table but the record is inactive, then the record is
              marked as invalid and will be inserted into InvalidTable.

              e) If the record's StatusFlag is 'M' and the record's key does not exist
              in DataWareHouse's Table, then the record is marked as invalid and will
              be inserted into InvalidTable.

              f) If the record's StatusFlag is 'D', the record's key exists in
              DataWareHouse's Table and the record is active, then the corresponding
              record in DataWareHouse's Table will be updated as inactive.

              g) If the record's StatusFlag is 'D' and the record's key exists in
              DataWareHouse's Table but the record is inactive, then the record is
              marked as invalid and will be inserted into InvalidTable.

              h) If the record's StatusFlag is 'D' and the record's key does not exist
              in DataWareHouse's Table, then the record is marked as invalid and will
              be inserted into InvalidTable.

              This logic takes care of ALL the anomalies we were facing before, but
              at the cost of long processing time.
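
              To make the rules concrete, the per-record dispatch is roughly
              equivalent to the sketch below. The schema here is simplified
              for illustration: DWTable(KeyCol, Payload, Active) and
              InvalidTable(KeyCol, Payload, StatusFlag) are stand-in names,
              and @key, @payload, @flag hold the current record.

              -- Sketch: route one incoming record according to rules a)-h).
              DECLARE @key int, @payload varchar(100), @flag char(1)
              DECLARE @active bit

              -- NULL here means the key does not exist in the warehouse table
              SELECT @active = Active FROM DWTable WHERE KeyCol = @key

              IF @flag = 'A' AND @active IS NULL
                  INSERT INTO DWTable (KeyCol, Payload, Active)    -- rule a)
                  VALUES (@key, @payload, 1)
              ELSE IF @flag = 'M' AND @active = 1
                  UPDATE DWTable                                   -- rule c)
                  SET Payload = @payload
                  WHERE KeyCol = @key
              ELSE IF @flag = 'D' AND @active = 1
                  UPDATE DWTable                                   -- rule f)
                  SET Active = 0
                  WHERE KeyCol = @key
              ELSE
                  INSERT INTO InvalidTable (KeyCol, Payload, StatusFlag)  -- rules b), d), e), g), h)
                  VALUES (@key, @payload, @flag)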

              I await your comments.


              Thanks


              Erland Sommarskog <sommar@algonet.se> wrote in message news:<Xns94F53BF51111Yazorman@127.0.0.1>...
              > Muzamil (muzamil@hotmail.com) writes:
              > > The row-by-row update is mandatory because the legacy system
              > > sends us records flagged "Add", "Modify" or "Delete", and this
              > > information HAS to be processed in the same order, otherwise we'll
              > > get erroneous data.
              >
              > Ouch. Life is cruel, sometimes.
              >
              > I wonder what possibilities there could be to find parallel streams,
              > that is, updates that could be performed independently. Maybe you
              > could modify 10 rows at a time then. But it does not sound like a very
              > easy thing to do.
              >
              > Without knowing the details of the system, it is difficult to give
              > much advice. But any sort of pre-aggregation you can do is probably
              > going to pay off.


              • Erland Sommarskog

                #8
                Re: Performance Tuning for Row-by-Row Update Statement

                Muzamil (muzamil@hotmail.com) writes:
                > Details of the system:
                > The legacy system sends us records flagged with "Add", "Modify" or
                > "Delete".
                > The purpose of these flags is self-explanatory. But the fun began when
                > we noticed that within the same file, the legacy system sends us "Add"
                > and then "Modify". Thus, we were left with no option but to do
                > row-by-row processing.
                > We came up with the following logic:

                Hm, you might be missing a few cases. What if you get an Add, and the
                record exists in DW, but is marked inactive? With your current logic,
                the input record is moved to the Invalid table.

                And could that feeding system be as weird as to send Add, Modify,
                Delete, and Add again? Well, for a robust solution this is what we
                should assume.

                It's a tricky problem, and I was about to defer it, when I recalled
                a solution a colleague did for one of our stored procedures. The
                secret word for tonight is bucketing! Assuming that there are only
                a couple of input records for each key value, this should be an
                excellent solution. You create buckets, so that each bucket has at
                most one row per key value. Here is an example of how to do it:

                -- Assign each row a bucket number: 1 + the number of earlier
                -- rows (by rownumber) with the same key value, so bucket 1
                -- holds the first record for every key, bucket 2 the second,
                -- and so on.
                UPDATE a
                SET bucket = (SELECT COUNT(*)
                              FROM inputtbl b
                              WHERE b.keyval = a.keyval
                                AND b.rownumber < a.rownumber) + 1
                FROM inputtbl a

                inputtbl.keyval is the key for the records in the DW table.
                rownumber is a column which describes the processing order. I assume
                that you have such a column.

                So now you can iterate over the buckets, and for each bucket, you can
                do set-based processing. You still have to iterate, but instead of
                over 60,000 rows, only over a couple of buckets.
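
                A sketch of the bucket loop, reusing the same hypothetical
                inputtbl columns (plus stand-in names DWTable, payload,
                statusflag, active):

                -- Sketch: process buckets in ascending order; within one
                -- bucket each key appears at most once, so every pass can
                -- be set-based.
                DECLARE @bucket int
                SELECT @bucket = MIN(bucket) FROM inputtbl

                WHILE @bucket IS NOT NULL
                BEGIN
                    -- e.g. apply all the Modifies in this bucket at once
                    UPDATE d
                    SET payload = i.payload
                    FROM DWTable d
                    JOIN inputtbl i ON i.keyval = d.keyval
                    WHERE i.bucket = @bucket
                      AND i.statusflag = 'M'
                      AND d.active = 1

                    -- ... similar set-based statements for 'A' and 'D' ...

                    SELECT @bucket = MIN(bucket) FROM inputtbl
                    WHERE bucket > @bucket
                END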


                --
                Erland Sommarskog, SQL Server MVP, sommar@algonet.se

                Books Online for SQL Server SP3


                • Muzamil

                  #9
                  Re: Performance Tuning for Row-by-Row Update Statement

                  I think I was not articulate enough to convey the logic properly.
                  Anyway, thanks to everyone for your help.
                  By using ROWLOCK and proper indexes, I was able to reduce the time considerably.

                  Erland Sommarskog <sommar@algonet.se> wrote in message news:<Xns94F6821D6ABYazorman@127.0.0.1>...
                  > Muzamil (muzamil@hotmail.com) writes:
                  > > Details of the system:
                  > > The legacy system sends us records flagged with "Add", "Modify" or
                  > > "Delete".
                  > > The purpose of these flags is self-explanatory. But the fun began when
                  > > we noticed that within the same file, the legacy system sends us "Add"
                  > > and then "Modify". Thus, we were left with no option but to do
                  > > row-by-row processing.
                  > > We came up with the following logic:
                  >
                  > Hm, you might be missing a few cases. What if you get an Add, and the
                  > record exists in DW, but is marked inactive? With your current logic,
                  > the input record is moved to the Invalid table.
                  >
                  > And could that feeding system be as weird as to send Add, Modify,
                  > Delete, and Add again? Well, for a robust solution this is what we
                  > should assume.
                  >
                  > It's a tricky problem, and I was about to defer it, when I recalled
                  > a solution a colleague did for one of our stored procedures. The
                  > secret word for tonight is bucketing! Assuming that there are only
                  > a couple of input records for each key value, this should be an
                  > excellent solution. You create buckets, so that each bucket has at
                  > most one row per key value. Here is an example of how to do it:
                  >
                  > UPDATE a
                  > SET bucket = (SELECT COUNT(*)
                  >               FROM inputtbl b
                  >               WHERE b.keyval = a.keyval
                  >                 AND b.rownumber < a.rownumber) + 1
                  > FROM inputtbl a
                  >
                  > inputtbl.keyval is the key for the records in the DW table.
                  > rownumber is a column which describes the processing order. I assume
                  > that you have such a column.
                  >
                  > So now you can iterate over the buckets, and for each bucket, you can
                  > do set-based processing. You still have to iterate, but instead of
                  > over 60,000 rows, only over a couple of buckets.


                  • Erland Sommarskog

                    #10
                    Re: Performance Tuning for Row-by-Row Update Statement

                    Muzamil (muzamil@hotmail.com) writes:
                    > I think I was not articulate enough to convey the logic properly.
                    > Anyway, thanks to everyone for your help. By using ROWLOCK and
                    > proper indexes, I was able to reduce the time considerably.

                    Good indexes are always useful, and of course for iterative
                    processing they are even more imperative, since the cost of a
                    less-than-optimal plan is multiplied.

                    I'm just curious: would my bucketing idea be applicable to your
                    problem? It should give you even more speed, but if what you have
                    now is good enough, there is of course no reason to spend more
                    time on it.


                    --
                    Erland Sommarskog, SQL Server MVP, sommar@algonet.se

                    Books Online for SQL Server SP3
