deleting large numbers of records

  • Frank Swarbrick

    deleting large numbers of records

    We have a batch process that inserts large numbers (100,000 - 1,000,000) of
    records into a database each day. (DL/I database.) We're considering
    converting it to a DB2 table. Currently we have logic in place that, prior
    to inserting any data, reads the first input record and checks to see if it
    already exists in the table. If the record already exists there are two
    options:
    1) Don't continue, because you already ran this job today!
    2) This is a rerun - continue.

    If number 2 is selected the first thing that happens is that it deletes
    every record that was inserted today prior to doing the regular insert
    process. (You may ask, why not just skip over the ones that are already
    there. It's because we may be rerunning with an updated input file, where
    the input records may be different than during the first run.)

    Anyway, I figured with DB2 this would be a snap. All I'd need to do is:
    EXEC SQL
    DELETE FROM FILM.FILM_TRANSACTIONS
    WHERE UPDATE_DATE = FB_FUNC.TO_DATE (:CURR-DATE-JUL-PACKED)
    END-EXEC

    The only problem is that my log file would end up running out of room. So
    now I've come up with the following:

    DELETE-TODAY SECTION.
    DISPLAY 'DELETE PROCESS BEGINS' UPON CONSOLE
    PERFORM WITH TEST AFTER
    UNTIL SQLCODE = 100
    DISPLAY 'COMMITTING...' UPON CONSOLE
    PERFORM COMMIT-UOW
    DISPLAY 'DELETING 10000' UPON CONSOLE
    PERFORM DB2-DELETE-TODAY
    END-PERFORM
    PERFORM COMMIT-UOW
    DISPLAY 'DELETE PROCESS ENDS' UPON CONSOLE
    .

    DB2-DELETE-TODAY SECTION.
    EXEC SQL
    DELETE FROM (
    SELECT UPDATE_DATE
    FROM FILM.FILM_TRANSACTIONS
    WHERE UPDATE_DATE = FB_FUNC.TO_DATE (:CURR-DATE-JUL-PACKED)
    FETCH FIRST 10000 ROWS ONLY
    )
    WHERE 1 = 1
    END-EXEC
    CALL CHECKERR USING SQLCA ERRLOC
    .

    My question is, is this the way to go or is there some better way?

    I tried making the "10000" a host variable, but that didn't work. Any way
    around this?

    You may be wondering why I put the "WHERE 1 = 1" clause on the DELETE
    statement. This is because DB2 gives a warning if you pre-compile a DELETE
    or UPDATE statement without a WHERE clause. Still works, but I like to
    avoid warnings.

    Thanks!
    Frank

    ---
    Frank Swarbrick
    Senior Developer/Analyst - Mainframe Applications
    FirstBank Data Corporation - Lakewood, CO USA
  • Mark A

    #2
    Re: deleting large numbers of records

    "Frank Swarbrick" <Frank.Swarbric k@efirstbank.co mwrote in message
    news:51q367F1lg dlrU1@mid.indiv idual.net...
    We have a batch process that inserts large numbers (100,000 - 1,000,000)
    of
    records into a database each day. (DL/I database.) We're considering
    converting it to a DB2 table. Currently we have logic in place that,
    prior
    to inserting any data, reads the first input record and checks to see if
    it
    already exists in the table. If the record already exists there are two
    options:
    1) Don't continue, because you already ran this job today!
    2) This is a rerun - continue.
    >
    If number 2 is selected the first thing that happens is that it deletes
    every record that was inserted today prior to doing the regular insert
    process. (You may ask, why not just skip over the ones that are already
    there. It's because we may be rerunning with an updated input file, where
    the input records may be different than during the first run.)
    >
    Anyway, I figured with DB2 this would be a snap. All I'd need to do is:
    EXEC SQL
    DELETE FROM FILM.FILM_TRANSACTIONS
    WHERE UPDATE_DATE = FB_FUNC.TO_DATE (:CURR-DATE-JUL-PACKED)
    END-EXEC
    >
    The only problem is that my log file would end up running out of room. So
    now I've come up with the following:
    >
    DELETE-TODAY SECTION.
    DISPLAY 'DELETE PROCESS BEGINS' UPON CONSOLE
    PERFORM WITH TEST AFTER
    UNTIL SQLCODE = 100
    DISPLAY 'COMMITTING...' UPON CONSOLE
    PERFORM COMMIT-UOW
    DISPLAY 'DELETING 10000' UPON CONSOLE
    PERFORM DB2-DELETE-TODAY
    END-PERFORM
    PERFORM COMMIT-UOW
    DISPLAY 'DELETE PROCESS ENDS' UPON CONSOLE
    .
    >
    DB2-DELETE-TODAY SECTION.
    EXEC SQL
    DELETE FROM (
    SELECT UPDATE_DATE
    FROM FILM.FILM_TRANSACTIONS
    WHERE UPDATE_DATE = FB_FUNC.TO_DATE (:CURR-DATE-JUL-PACKED)
    FETCH FIRST 10000 ROWS ONLY
    )
    WHERE 1 = 1
    END-EXEC
    CALL CHECKERR USING SQLCA ERRLOC
    .
    >
    My question is, is this the way to go or is there some better way?
    >
    I tried making the "10000" a host variable, but that didn't work. Any way
    around this?
    >
    You may be wondering why I put the "WHERE 1 = 1" clause on the DELETE
    statement. This is because DB2 gives a warning if you pre-compile a
    DELETE
    or UPDATE statement without a WHERE clause. Still works, but I like to
    avoid warnings.
    >
    Thanks!
    Frank
    >
    >
    ---
    Frank Swarbrick
    Senior Developer/Analyst - Mainframe Applications
    FirstBank Data Corporation - Lakewood, CO USA
    Yes there is a better way that will avoid filling up your DB2 z/OS logs.

    Declare a cursor with something like:
    SELECT UPDATE_DATE FROM FILM.FILM_TRANSACTIONS
    WHERE UPDATE_DATE = FB_FUNC.TO_DATE (:CURR-DATE-JUL-PACKED) FOR
    UPDATE WITH HOLD

    OPEN cursor-name.

    Then FETCH one row at a time (until RC = 100).

    DELETE WHERE CURRENT OF C1 (cursor name).

    COMMIT. You could commit every 100 - 10000 if you don't want to commit each
    row.

    CLOSE cursor-name.

    Note that the WITH HOLD will retain the cursor position even though you have
    committed (which usually closes a cursor).

    Please consult the SQL Reference for details, because I am posting all the
    above syntax from (my) memory.
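
    Roughly, in COBOL with embedded SQL, the steps Mark describes might look
    like the sketch below. This is an untested illustration only: the cursor
    name C1 and the host variables WS-UPDATE-DATE and WS-DELETE-COUNT are
    invented here, and the table/column names are just borrowed from Frank's
    example.

    EXEC SQL
    DECLARE C1 CURSOR WITH HOLD FOR
    SELECT UPDATE_DATE
    FROM FILM.FILM_TRANSACTIONS
    WHERE UPDATE_DATE = FB_FUNC.TO_DATE(:CURR-DATE-JUL-PACKED)
    FOR UPDATE
    END-EXEC

    EXEC SQL OPEN C1 END-EXEC
    PERFORM UNTIL SQLCODE = 100
        EXEC SQL FETCH C1 INTO :WS-UPDATE-DATE END-EXEC
        IF SQLCODE = 0
    *       Delete the row the cursor is positioned on
            EXEC SQL
            DELETE FROM FILM.FILM_TRANSACTIONS
            WHERE CURRENT OF C1
            END-EXEC
    *       Commit in batches (say every 1000 rows) rather than per row;
    *       WITH HOLD keeps the cursor positioned across the COMMIT
            ADD 1 TO WS-DELETE-COUNT
            IF WS-DELETE-COUNT >= 1000
                EXEC SQL COMMIT END-EXEC
                MOVE 0 TO WS-DELETE-COUNT
            END-IF
        END-IF
    END-PERFORM
    EXEC SQL COMMIT END-EXEC
    EXEC SQL CLOSE C1 END-EXEC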

    Also, you find that there is very little increase in performance in
    committing more often than every 100 updates. Committing more often will
    provide increased concurrency with other applications.



    • Mark A

      #3
      Re: deleting large numbers of records

      "Mark A" <nobody@nowhere .comwrote in message
      news:pOSdnQU31-zgcCrYnZ2dnUVZ_ sWdnZ2d@comcast .com...
      Also, you find that there is very little increase in performance in
      committing more often than every 100 updates. Committing more often will
      provide increased concurrency with other applications.
      Correction, you find that there is very little increase in performance in
      committing LESS often than every 100 updates.



      • jefftyzzer

        #4
        Re: deleting large numbers of records

        I think both solutions have merit. Frank, your version is very much
        like the approach outlined in Serge's "SQL on Fire" seminars, albeit in
        COBOL as opposed to SQL/PL. Mark's solution will use less log space
        than yours, but I respectfully wonder about the time required to delete
        all of the target rows when they're being fetched and deleted
        one-at-a-time.

        FWIW, here's how I've done it (in a stored procedure):

        CREATE PROCEDURE CSE.PURGE_LRD()
        BEGIN

        DECLARE V_NO_DATA SMALLINT DEFAULT 0;--
        DECLARE V_DEADLOCK_OR_LTO SMALLINT DEFAULT 0;--

        DECLARE C_DEADLOCK_OR_LTO CONDITION FOR SQLSTATE '40001';--

        DECLARE CONTINUE HANDLER FOR NOT FOUND
        SET V_NO_DATA = 1;--

        -- The V_DEADLOCK_OR_LTO attribute is throw-away,
        -- but a continue handler needs to do something,
        -- i.e., it's not enough to just declare a handler,
        -- it has to have an action in its body.
        DECLARE CONTINUE HANDLER FOR C_DEADLOCK_OR_LTO
        SET V_DEADLOCK_OR_LTO = 1;--

        WHILE (V_NO_DATA = 0) DO
        DELETE FROM
        (
        SELECT
        1
        FROM
        LRD
        FETCH FIRST 200000 ROWS ONLY
        ) AS LRD_D;--

        COMMIT;--
        END WHILE;--
        END;

        --Jeff

        On Jan 24, 3:44 pm, "Mark A" <nob...@nowhere.com> wrote:
        "Mark A" <nob...@nowhere.com> wrote in message news:pOSdnQU31-zgcCrYnZ2dnUVZ_sWdnZ2d@comcast.com...
        >
        Also, you find that there is very little increase in performance in
        committing more often than every 100 updates. Committing more often will
        provide increased concurrency with other applications.
        Correction, you find that there is very little increase in performance in
        committing LESS often than every 100 updates.


        • Frank Swarbrick

          #5
          Re: deleting large numbers of records

          Mark A <nobody@nowhere.com> 01/24/07 4:36 PM >>>
          >Yes there is a better way that will avoid filling up your DB2 z/OS logs.
          Actually, DB2/LUW, but I'm guessing your advice still applies.
          >Declare a cursor with something like:
          >SELECT UPDATE_DATE FROM FILM.FILM_TRANSACTIONS
          WHERE UPDATE_DATE = FB_FUNC.TO_DATE (:CURR-DATE-JUL-PACKED) FOR
          >UPDATE WITH HOLD
          >
          >OPEN cursor-name.
          >
          >Then FETCH one row at a time (until RC = 100).
          >
          >DELETE WHERE CURRENT OF C1 (cursor name).
          >
          >COMMIT. You could commit every 100 - 10000 if you don't want to commit each
          >row.
          >
          >CLOSE cursor-name.
          Interesting. I just figured that this would be much less efficient than
          doing just the delete with the fullselect, because in the latter case no
          data need be returned to the AR. Anyway, I will give it a shot.
          >Note that the WITH HOLD will retain the cursor position even though you
          have
          >committed (which usually closes a cursor).
          >
          >Please consult the SQL Reference for details, because I am posting all the
          >above syntax from (my) memory.
          >
          >Also, you find that there is very little increase in performance in
          >committing more often than every 100 updates. Committing more often will
          >provide increased concurrency with other applications.
          Will keep that in mind. Thanks!

          Frank

          ---
          Frank Swarbrick
          Senior Developer/Analyst - Mainframe Applications
          FirstBank Data Corporation - Lakewood, CO USA


          • Frank Swarbrick

            #6
            Re: deleting large numbers of records

            Oh! So every 10,000 actually does make sense then, right?
            >>Mark A <nobody@nowhere.com> 01/24/07 4:44 PM >>>
            "Mark A" <nobody@nowhere.com> wrote in message
            news:pOSdnQU31-zgcCrYnZ2dnUVZ_sWdnZ2d@comcast.com...
            Also, you find that there is very little increase in performance in
            committing more often than every 100 updates. Committing more often will
            provide increased concurrency with other applications.
            Correction, you find that there is very little increase in performance in
            committing LESS often than every 100 updates.





            • Frank Swarbrick

              #7
              Re: deleting large numbers of records

              jefftyzzer <jefftyzzer@sbcglobal.net> 01/24/07 5:54 PM >>>
              >I think both solutions have merit. Frank, your version is very much
              >like the approach outlined in Serge's "SQL on Fire" seminars, albeit in
              >COBOL as opposed to SQL/PL. Mark's solution will use less log space
              >than yours, but I respectfully wonder about the time required to delete
              >all of the target rows when they're being fetched and deleted
              >one-at-a-time.
              That was my concern as well. In any case, I will give both methods a shot
              and see which I prefer.
              >FWIW, here's how I've done it (in a stored procedure):
              >
              >CREATE PROCEDURE CSE.PURGE_LRD()
              > BEGIN
              >
              > DECLARE V_NO_DATA SMALLINT DEFAULT 0;--
              > DECLARE V_DEADLOCK_OR_LTO SMALLINT DEFAULT 0;--
              >
              > DECLARE C_DEADLOCK_OR_LTO CONDITION FOR SQLSTATE '40001';--
              >
              > DECLARE CONTINUE HANDLER FOR NOT FOUND
              > SET V_NO_DATA = 1;--
              >
              > -- The V_DEADLOCK_OR_LTO attribute is throw-away,
              > -- but a continue handler needs to do something,
              > -- i.e., it's not enough to just declare a handler,
              > -- it has to have an action in its body.
              > DECLARE CONTINUE HANDLER FOR C_DEADLOCK_OR_LTO
              > SET V_DEADLOCK_OR_LTO = 1;--
              >
              > WHILE (V_NO_DATA = 0) DO
              > DELETE FROM
              > (
              > SELECT
              > 1
              > FROM
              > LRD
              > FETCH FIRST 200000 ROWS ONLY
              > ) AS LRD_D;--
              >
              > COMMIT;--
              > END WHILE;--
              >END;
              My probably very naive thought is that it would be nice to have some sort of
              DELETE statement that didn't even do logging. While often (usually?) you
              would want to be able to ROLLBACK a DELETE, in the case of what I'm doing
              there's no reason I would ever want to rollback. So why log? Just
              wondering...

              Thanks!

              Frank

              ---
              Frank Swarbrick
              Senior Developer/Analyst - Mainframe Applications
              FirstBank Data Corporation - Lakewood, CO USA


              • Mark A

                #8
                Re: deleting large numbers of records

                "Frank Swarbrick" <Frank.Swarbric k@efirstbank.co mwrote in message
                news:51qg2jF1l7 lvnU2@mid.indiv idual.net...
                Oh! So every 10,000 actually does make sense then, right?
                No. I would commit every 100 - 1000 rows. A commit is not that expensive,
                unless you do it for every row. Doing it every 100 rows or 10,000 will
                provide about a 2% difference in performance at best.

                The major expense of a commit is a synchronous write of the log buffer to
                disk. But the log buffer will be flushed to disk anyway whenever the log
                buffer is full, or about every 1 second. In any case, a commit is usually
                only as expensive as a few deletes, so do the math.
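
                To put rough numbers on "do the math", using Mark's own rule of
                thumb that a commit costs about as much as a few deletes (say 3,
                purely for illustration): committing every 100 rows adds roughly
                3 delete-equivalents per 100 deletes, i.e. about 3% overhead;
                every 1,000 rows it is about 0.3%, and every 10,000 rows about
                0.03%. So going from 100 to 10,000 rows per commit buys at most
                a couple of percent, while the larger unit of work holds locks
                and log space longer.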



                • Mark A

                  #9
                  Re: deleting large numbers of records

                  "Frank Swarbrick" <Frank.Swarbric k@efirstbank.co mwrote in message
                  news:51qg1dF1l7 lvnU1@mid.indiv idual.net...
                  Interesting. I just figured that this would be much less efficient than
                  doing just the delete with the fullselect, because in the latter case no
                  data need be returned to the AR. Anyway, I will give it a shot.
                  It depends on where the program runs. If the program runs on the server and the
                  static SQL is bound into a package that runs on the server, then there is
                  not that much difference in performance (unless performance is
                  ultra-critical). If the program is running remotely, then there would be a
                  big difference in performance.

                  I have written SQL stored procedures to do mass deletes with a cursor and it
                  performs well.

                  I haven't seen too many COBOL programs running on DB2 LUW. What compiler are
                  you using? I used MicroFocus COBOL against OS/2 Database Manager, but that
                  was in 1991.



                  • Knut Stolze

                    #10
                    Re: deleting large numbers of records

                    Frank Swarbrick wrote:
                    jefftyzzer <jefftyzzer@sbcglobal.net> 01/24/07 5:54 PM >>>
                    >>I think both solutions have merit. Frank, your version is very much
                    >>like the approach outlined in Serge's "SQL on Fire" seminars, albeit in
                    >>COBOL as opposed to SQL/PL. Mark's solution will use less log space
                    >>than yours, but I respectfully wonder about the time required to delete
                    >>all of the target rows when they're being fetched and deleted
                    >>one-at-a-time.
                    >
                    That was my concern as well. In any case, I will give both methods a shot
                    and see which I prefer.
                    >
                    >>FWIW, here's how I've done it (in a stored procedure):
                    >>
                    >>CREATE PROCEDURE CSE.PURGE_LRD()
                    >>BEGIN
                    >>
                    >>DECLARE V_NO_DATA SMALLINT DEFAULT 0;--
                    >>DECLARE V_DEADLOCK_OR_LTO SMALLINT DEFAULT 0;--
                    >>
                    >>DECLARE C_DEADLOCK_OR_LTO CONDITION FOR SQLSTATE '40001';--
                    >>
                    >>DECLARE CONTINUE HANDLER FOR NOT FOUND
                    >>SET V_NO_DATA = 1;--
                    >>
                    >>-- The V_DEADLOCK_OR_LTO attribute is throw-away,
                    >>-- but a continue handler needs to do something,
                    >>-- i.e., it's not enough to just declare a handler,
                    >>-- it has to have an action in its body.
                    >>DECLARE CONTINUE HANDLER FOR C_DEADLOCK_OR_LTO
                    >>SET V_DEADLOCK_OR_LTO = 1;--
                    >>
                    >>WHILE (V_NO_DATA = 0) DO
                    >>DELETE FROM
                    >>(
                    >>SELECT
                    >>1
                    >>FROM
                    >>LRD
                    >>FETCH FIRST 200000 ROWS ONLY
                    >>) AS LRD_D;--
                    >>
                    >>COMMIT;--
                    >>END WHILE;--
                    >>END;
                    >
                    My probably very naive thought is that it would be nice to have some sort
                    of
                    DELETE statement that didn't even do logging. While often (usually?) you
                    would want to be able to ROLLBACK a DELETE, in the case of what I'm doing
                    there's no reason I would ever want to rollback. So why log? Just
                    wondering...
                    The logs are also used for crash recovery. Let's assume you run the
                    unlogged DELETE. Now your application or the DB2 server crashes before you
                    issued a COMMIT. Upon restart, DB2 has to make sure the transaction is
                    properly rolled back and the database is in a consistent state. If you
                    don't log the DELETE, you are out of luck there.

                    What would be nice to have in this respect is an option for the DELETE
                    statement to explicitly turn off logging - which would have a certain
                    amount of problems as I just mentioned. Truncating a whole table is
                    supported that way already: you can use ALTER TABLE ... ACTIVATE NOT LOGGED
                    INITIALLY WITH EMPTY TABLE for that. Maybe this, combined with range
                    partitioning is an option for you?
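
                    As a rough sketch of that combination (untested; the daily
                    ranges, the partition name PART_TODAY and the scratch table
                    are invented here, only the table and column names come from
                    Frank's post, and the exact clauses should be checked against
                    the DB2 LUW SQL Reference for your release):

                    -- Range-partition the table by day on UPDATE_DATE
                    CREATE TABLE FILM.FILM_TRANSACTIONS (
                      UPDATE_DATE DATE NOT NULL
                      -- ... other columns ...
                    )
                    PARTITION BY RANGE (UPDATE_DATE)
                      (STARTING FROM '2007-01-01' ENDING AT '2007-12-31' EVERY 1 DAY);

                    -- A rerun could then discard today's rows by detaching that
                    -- day's partition instead of deleting row by row:
                    ALTER TABLE FILM.FILM_TRANSACTIONS
                      DETACH PARTITION PART_TODAY INTO FILM.FILM_TRANS_SCRATCH;

                    -- Or, to empty a whole (suitably created) table without
                    -- logging, the statement Knut mentions:
                    ALTER TABLE FILM.FILM_TRANSACTIONS
                      ACTIVATE NOT LOGGED INITIALLY WITH EMPTY TABLE;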

                    --
                    Knut Stolze
                    DB2 z/OS Utilities Development
                    IBM Germany


                    • Knut Stolze

                      #11
                      Re: deleting large numbers of records

                      Mark A wrote:
                      The major expense of a commit is a synchronous write of the log buffer to
                      disk.
                      Not necessarily if group commit is used.

                      --
                      Knut Stolze
                      DB2 z/OS Utilities Development
                      IBM Germany


                      • Knut Stolze

                        #12
                        Re: deleting large numbers of records

                        Frank Swarbrick wrote:
                        Mark A <nobody@nowhere.com> 01/24/07 4:36 PM >>>
                        >>Yes there is a better way that will avoid filling up your DB2 z/OS logs.
                        >
                        Actually, DB2/LUW, but I'm guessing your advice still applies.
                        In that case, have you considered the MERGE statement? Then you may not
                        have to DELETE the rows at all - just UPDATE them.
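
                        A minimal sketch of that idea (untested; the staging
                        table SESSION.TODAYS_INPUT, the key TRANS_KEY and the
                        column TRANS_DATA are invented here for illustration,
                        only FILM.FILM_TRANSACTIONS and UPDATE_DATE come from
                        Frank's post):

                        MERGE INTO FILM.FILM_TRANSACTIONS T
                        USING SESSION.TODAYS_INPUT S
                          ON T.TRANS_KEY = S.TRANS_KEY
                        WHEN MATCHED THEN
                          UPDATE SET T.UPDATE_DATE = S.UPDATE_DATE,
                                     T.TRANS_DATA  = S.TRANS_DATA
                        WHEN NOT MATCHED THEN
                          INSERT (TRANS_KEY, UPDATE_DATE, TRANS_DATA)
                          VALUES (S.TRANS_KEY, S.UPDATE_DATE, S.TRANS_DATA);

                        One caveat not raised in the thread: MERGE on its own
                        leaves behind any first-run rows whose keys no longer
                        appear in the corrected input file, so it only replaces
                        the delete step completely if the rerun covers every key
                        from the first run.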

                        --
                        Knut Stolze
                        DB2 z/OS Utilities Development
                        IBM Germany


                        • aj

                          #13
                          Re: deleting large numbers of records

                          Frank:
                          Here's an OLAPy trick that I sometimes use:

                          Let's say you want to delete rows from a very large table based on a
                          sysdate column. You *don't* want to overfill the transaction logs

                          The answer: Figure out how many rows you can safely delete w/ your
                          logs, use row_number() to slap a number on each one, and delete based
                          not only on your sysdate, but also that number.
                          Let's say you can safely delete up to 200000 rows, and you only want to
                          delete rows where sysdate = 5/1/2005:

                          lock table mytable in exclusive mode ;
                          delete FROM
                            (SELECT sysdate, row_number() OVER (ORDER BY sysdate) AS rn
                             FROM mytable) AS tr
                          WHERE rn BETWEEN 1 and 200000 and sysdate = '5/1/2005' ;
                          COMMIT ;

                          Stick this in a loop and stop when no rows get deleted anymore.

                          HTH

                          aj

                          Frank Swarbrick wrote:
                          We have a batch process that inserts large numbers (100,000 - 1,000,000) of
                          records into a database each day. (DL/I database.) We're considering
                          converting it to a DB2 table. Currently we have logic in place that, prior
                          to inserting any data, reads the first input record and checks to see if it
                          already exists in the table. If the record already exists there are two
                          options:
                          1) Don't continue, because you already ran this job today!
                          2) This is a rerun - continue.
                          >
                          If number 2 is selected the first thing that happens is that it deletes
                          every record that was inserted today prior to doing the regular insert
                          process. (You may ask, why not just skip over the ones that are already
                          there. It's because we may be rerunning with an updated input file, where
                          the input records may be different than during the first run.)
                          >
                          Anyway, I figured with DB2 this would be a snap. All I'd need to do is:
                          EXEC SQL
                          DELETE FROM FILM.FILM_TRANSACTIONS
                          WHERE UPDATE_DATE = FB_FUNC.TO_DATE (:CURR-DATE-JUL-PACKED)
                          END-EXEC
                          >
                          The only problem is that my log file would end up running out of room. So
                          now I've come up with the following:
                          >
                          DELETE-TODAY SECTION.
                          DISPLAY 'DELETE PROCESS BEGINS' UPON CONSOLE
                          PERFORM WITH TEST AFTER
                          UNTIL SQLCODE = 100
                          DISPLAY 'COMMITTING...' UPON CONSOLE
                          PERFORM COMMIT-UOW
                          DISPLAY 'DELETING 10000' UPON CONSOLE
                          PERFORM DB2-DELETE-TODAY
                          END-PERFORM
                          PERFORM COMMIT-UOW
                          DISPLAY 'DELETE PROCESS ENDS' UPON CONSOLE
                          .
                          >
                          DB2-DELETE-TODAY SECTION.
                          EXEC SQL
                          DELETE FROM (
                          SELECT UPDATE_DATE
                          FROM FILM.FILM_TRANSACTIONS
                          WHERE UPDATE_DATE = FB_FUNC.TO_DATE (:CURR-DATE-JUL-PACKED)
                          FETCH FIRST 10000 ROWS ONLY
                          )
                          WHERE 1 = 1
                          END-EXEC
                          CALL CHECKERR USING SQLCA ERRLOC
                          .
                          >
                          My question is, is this the way to go or is there some better way?
                          >
                          I tried making the "10000" a host variable, but that didn't work. Any way
                          around this?
                          >
                          You may be wondering why I put the "WHERE 1 = 1" clause on the DELETE
                          statement. This is because DB2 gives a warning if you pre-compile a DELETE
                          or UPDATE statement without a WHERE clause. Still works, but I like to
                          avoid warnings.
                          >
                          Thanks!
                          Frank
                          >
                          >
                          ---
                          Frank Swarbrick
                          Senior Developer/Analyst - Mainframe Applications
                          FirstBank Data Corporation - Lakewood, CO USA


                          • ChrisC

                            #14
                            Re: deleting large numbers of records

                            Another option might be to create the table as an MDC (MultiDimensional
                            Cluster) table, clustered on UPDATE_DATE, and then turn on the option
                            MDC ROLLOUT (not sure of exact syntax). This should then allow you to
                            just delete all the rows for that date. According to the literature,
                            it should just mark each block (set of pages) for that cluster as
                            deleted, log each block as being deleted, and commit. If you think
                            about this, it should delete the 1,000,000 or so rows very quickly and
                            NOT fill up your log files.
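
                            A sketch of what that might look like (hedged, since
                            as Chris says the exact rollout syntax is uncertain
                            and release-dependent; only the table and column
                            names come from Frank's post):

                            -- MDC table dimensioned on the update date
                            CREATE TABLE FILM.FILM_TRANSACTIONS (
                              UPDATE_DATE DATE NOT NULL
                              -- ... other columns ...
                            )
                            ORGANIZE BY DIMENSIONS (UPDATE_DATE);

                            -- With MDC rollout enabled (e.g. via the
                            -- DB2_MDC_ROLLOUT registry variable, depending on
                            -- the DB2 LUW release), a plain delete on the
                            -- dimension column should be handled block-by-block
                            -- rather than row-by-row:
                            DELETE FROM FILM.FILM_TRANSACTIONS
                            WHERE UPDATE_DATE = CURRENT DATE;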

                            Disclaimer - I haven't (yet) been able to use this myself, so no actual
                            experience here. But if this is a new DB2 table/DB, it might be a
                            great time to check this feature out.

                            -Chris


                            • Frank Swarbrick

                              #15
                              Re: deleting large numbers of records

                              Knut Stolze <stolze@de.ibm.com> 01/25/07 12:41 AM >>>
                              >Frank Swarbrick wrote:
                              >
                              >My probably very naive thought is that it would be nice to have some
                              sort
                              >of
                              >DELETE statement that didn't even do logging. While often (usually?)
                              you
                              >would want to be able to ROLLBACK a DELETE, in the case of what I'm
                              doing
                              >there's no reason I would ever want to rollback. So why log? Just
                              >wondering...
                              >
                              >The logs are also used for crash recovery. Let's assume you run the
                              >unlogged DELETE. Now your application or the DB2 server crashes before
                              you
                              >issued a COMMIT. Upon restart, DB2 has to make sure the transaction is
                              >properly rolled back and the database is in a consistent state. If you
                              >don't log the DELETE, you are out of luck there.
                              Are you saying that DB2 occasionally crashes?
                              :-) (Just kidding.)
                              As you can tell, I'm hopelessly naive about these things. I'm just a simple
                              application programmer.
                              >What would be nice to have in this respect is an option for the DELETE
                              >statement to explicitly turn off logging - which would have a certain
                              >amount of problems as I just mentioned. Truncating a whole table is
                              >supported that way already: you can use ALTER TABLE ... ACTIVATE NOT
                              LOGGED
                              >INITIALLY WITH EMPTY TABLE for that. Maybe this, combined with range
                              >partitioning is an option for you?
                              I don't know about range partitioning. Can you give me a pointer to some
                              information on this?

                              Thanks!

                              Frank


                              ---
                              Frank Swarbrick
                              Senior Developer/Analyst - Mainframe Applications
                              FirstBank Data Corporation - Lakewood, CO USA
