deleting large numbers of records

  • Frank Swarbrick

    deleting large numbers of records

    We have a batch process that inserts large numbers (100,000 - 1,000,000) of
    records into a database each day. (DL/I database.) We're considering
    converting it to a DB2 table. Currently we have logic in place that, prior
    to inserting any data, reads the first input record and checks to see if it
    already exists in the table. If the record already exists there are two
    options:
    1) Don't continue, because you already ran this job today!
    2) This is a rerun - continue.

    If number 2 is selected the first thing that happens is that it deletes
    every record that was inserted today prior to doing the regular insert
    process. (You may ask, why not just skip over the ones that are already
    there. It's because we may be rerunning with an updated input file, where
    the input records may be different than during the first run.)

    Anyway, I figured with DB2 this would be a snap. All I'd need to do is:
    EXEC SQL
    DELETE FROM FILM.FILM_TRANSACTIONS
    WHERE UPDATE_DATE = FB_FUNC.TO_DATE (:CURR-DATE-JUL-PACKED)
    END-EXEC

    The only problem is that my log file would end up running out of room. So
    now I've come up with the following:

    DELETE-TODAY SECTION.
    DISPLAY 'DELETE PROCESS BEGINS' UPON CONSOLE
    PERFORM WITH TEST AFTER
    UNTIL SQLCODE = 100
    DISPLAY 'COMMITTING...' UPON CONSOLE
    PERFORM COMMIT-UOW
    DISPLAY 'DELETING 10000' UPON CONSOLE
    PERFORM DB2-DELETE-TODAY
    END-PERFORM
    PERFORM COMMIT-UOW
    DISPLAY 'DELETE PROCESS ENDS' UPON CONSOLE
    .

    DB2-DELETE-TODAY SECTION.
    EXEC SQL
    DELETE FROM (
    SELECT UPDATE_DATE
    FROM FILM.FILM_TRANSACTIONS
    WHERE UPDATE_DATE = FB_FUNC.TO_DATE (:CURR-DATE-JUL-PACKED)
    FETCH FIRST 10000 ROWS ONLY
    )
    WHERE 1 = 1
    END-EXEC
    CALL CHECKERR USING SQLCA ERRLOC
    .

    My question is, is this the way to go or is there some better way?

    I tried making the "10000" a host variable, but that didn't work. Any way
    around this?

    You may be wondering why I put the "WHERE 1 = 1" clause on the DELETE
    statement. This is because DB2 gives a warning if you pre-compile a DELETE
    or UPDATE statement without a WHERE clause. Still works, but I like to
    avoid warnings.

    Thanks!
    Frank

    ---
    Frank Swarbrick
    Senior Developer/Analyst - Mainframe Applications
    FirstBank Data Corporation - Lakewood, CO USA
  • Mark A

    #2
    Re: deleting large numbers of records

    "Frank Swarbrick" <Frank.Swarbric k@efirstbank.co mwrote in message
    news:51q367F1lg dlrU1@mid.indiv idual.net...
    We have a batch process that inserts large numbers (100,000 - 1,000,000)
    of
    records into a database each day. (DL/I database.) We're considering
    converting it to a DB2 table. Currently we have logic in place that,
    prior
    to inserting any data, reads the first input record and checks to see if
    it
    already exists in the table. If the record already exists there are two
    options:
    1) Don't continue, because you already ran this job today!
    2) This is a rerun - continue.
    >
    If number 2 is selected the first thing that happens is that it deletes
    every record that was inserted today prior to doing the regular insert
    process. (You may ask, why not just skip over the ones that are already
    there. It's because we may be rerunning with an updated input file, where
    the input records may be different than during the first run.)
    >
    Anyway, I figured with DB2 this would be a snap. All I'd need to do is:
    EXEC SQL
    DELETE FROM FILM.FILM_TRANSACTIONS
    WHERE UPDATE_DATE = FB_FUNC.TO_DATE (:CURR-DATE-JUL-PACKED)
    END-EXEC
    >
    The only problem is that my log file would end up running out of room. So
    now I've come up with the following:
    >
    DELETE-TODAY SECTION.
    DISPLAY 'DELETE PROCESS BEGINS' UPON CONSOLE
    PERFORM WITH TEST AFTER
    UNTIL SQLCODE = 100
    DISPLAY 'COMMITTING...' UPON CONSOLE
    PERFORM COMMIT-UOW
    DISPLAY 'DELETING 10000' UPON CONSOLE
    PERFORM DB2-DELETE-TODAY
    END-PERFORM
    PERFORM COMMIT-UOW
    DISPLAY 'DELETE PROCESS ENDS' UPON CONSOLE
    .
    >
    DB2-DELETE-TODAY SECTION.
    EXEC SQL
    DELETE FROM (
    SELECT UPDATE_DATE
    FROM FILM.FILM_TRANSACTIONS
    WHERE UPDATE_DATE = FB_FUNC.TO_DATE (:CURR-DATE-JUL-PACKED)
    FETCH FIRST 10000 ROWS ONLY
    )
    WHERE 1 = 1
    END-EXEC
    CALL CHECKERR USING SQLCA ERRLOC
    .
    >
    My question is, is this the way to go or is there some better way?
    >
    I tried making the "10000" a host variable, but that didn't work. Any way
    around this?
    >
    You may be wondering why I put the "WHERE 1 = 1" clause on the DELETE
    statement. This is because DB2 gives a warning if you pre-compile a
    DELETE
    or UPDATE statement without a WHERE clause. Still works, but I like to
    avoid warnings.
    >
    Thanks!
    Frank
    >
    >
    ---
    Frank Swarbrick
    Senior Developer/Analyst - Mainframe Applications
    FirstBank Data Corporation - Lakewood, CO USA
    Yes there is a better way that will avoid filling up your DB2 z/OS logs.

    Declare a cursor with something like:
    SELECT UPDATE_DATE FROM FILM.FILM_TRANSACTIONS
    WHERE UPDATE_DATE = FB_FUNC.TO_DATE (:CURR-DATE-JUL-PACKED) FOR
    UPDATE WITH HOLD

    OPEN cursor-name.

    Then FETCH one row at a time (until RC = 100).

    DELETE WHERE CURRENT OF C1 (cursor name).

    COMMIT. You could commit every 100 - 10000 if you don't want to commit each
    row.

    CLOSE cursor-name.

    Note that the WITH HOLD will retain the cursor position even though you have
    committed (which usually closes a cursor).

    Please consult the SQL Reference for details, because I am posting all the
    above syntax from (my) memory.
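
    Roughly, in COBOL with embedded SQL, the steps Mark describes might look
    like the sketch below. This is an untested illustration only: the cursor
    name C1 and the host variables WS-UPDATE-DATE and WS-DELETE-COUNT are
    invented here, and the table/column names are just borrowed from Frank's
    example.

    EXEC SQL
    DECLARE C1 CURSOR WITH HOLD FOR
    SELECT UPDATE_DATE
    FROM FILM.FILM_TRANSACTIONS
    WHERE UPDATE_DATE = FB_FUNC.TO_DATE(:CURR-DATE-JUL-PACKED)
    FOR UPDATE
    END-EXEC

    EXEC SQL OPEN C1 END-EXEC
    PERFORM UNTIL SQLCODE = 100
        EXEC SQL FETCH C1 INTO :WS-UPDATE-DATE END-EXEC
        IF SQLCODE = 0
    *       Delete the row the cursor is positioned on
            EXEC SQL
            DELETE FROM FILM.FILM_TRANSACTIONS
            WHERE CURRENT OF C1
            END-EXEC
    *       Commit in batches (say every 1000 rows) rather than per row;
    *       WITH HOLD keeps the cursor positioned across the COMMIT
            ADD 1 TO WS-DELETE-COUNT
            IF WS-DELETE-COUNT >= 1000
                EXEC SQL COMMIT END-EXEC
                MOVE 0 TO WS-DELETE-COUNT
            END-IF
        END-IF
    END-PERFORM
    EXEC SQL COMMIT END-EXEC
    EXEC SQL CLOSE C1 END-EXEC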

    Also, you find that there is very little increase in performance in
    committing more often than every 100 updates. Committing more often will
    provide increased concurrency with other applications.



    • Mark A

      #3
      Re: deleting large numbers of records

      "Mark A" <nobody@nowhere .comwrote in message
      news:pOSdnQU31-zgcCrYnZ2dnUVZ_ sWdnZ2d@comcast .com...
      Also, you find that there is very little increase in performance in
      committing more often than every 100 updates. Committing more often will
      provide increased concurrency with other applications.
      Correction, you find that there is very little increase in performance in
      committing LESS often than every 100 updates.



      • jefftyzzer

        #4
        Re: deleting large numbers of records

        I think both solutions have merit. Frank, your version is very much
        like the approach outlined in Serge's "SQL on Fire" seminars, albeit in
        COBOL as opposed to SQL/PL. Mark's solution will use less log space
        than yours, but I respectfully wonder about the time required to delete
        all of the target rows when they're being fetched and deleted
        one-at-a-time.

        FWIW, here's how I've done it (in a stored procedure):

        CREATE PROCEDURE CSE.PURGE_LRD()
        BEGIN

        DECLARE V_NO_DATA SMALLINT DEFAULT 0;--
        DECLARE V_DEADLOCK_OR_LTO SMALLINT DEFAULT 0;--

        DECLARE C_DEADLOCK_OR_LTO CONDITION FOR SQLSTATE '40001';--

        DECLARE CONTINUE HANDLER FOR NOT FOUND
        SET V_NO_DATA = 1;--

        -- The V_DEADLOCK_OR_LTO attribute is throw-away,
        -- but a continue handler needs to do something,
        -- i.e., it's not enough to just declare a handler,
        -- it has to have an action in its body.
        DECLARE CONTINUE HANDLER FOR C_DEADLOCK_OR_LTO
        SET V_DEADLOCK_OR_LTO = 1;--

        WHILE (V_NO_DATA = 0) DO
        DELETE FROM
        (
        SELECT
        1
        FROM
        LRD
        FETCH FIRST 200000 ROWS ONLY
        ) AS LRD_D;--

        COMMIT;--
        END WHILE;--
        END;

        --Jeff

        On Jan 24, 3:44 pm, "Mark A" <nob...@nowhere.com> wrote:
        "Mark A" <nob...@nowhere.com> wrote in message news:pOSdnQU31-zgcCrYnZ2dnUVZ_sWdnZ2d@comcast.com...
        >
        Also, you find that there is very little increase in performance in
        committing more often than every 100 updates. Committing more often will
        provide increased concurrency with other applications.
        Correction, you find that there is very little increase in performance in
        committing LESS often than every 100 updates.


        • Frank Swarbrick

          #5
          Re: deleting large numbers of records

          Mark A <nobody@nowhere.com> 01/24/07 4:36 PM >>>
          >Yes there is a better way that will avoid filling up your DB2 z/OS logs.
          Actually, DB2/LUW, but I'm guessing your advice still applies.
          >Declare a cursor with something like:
          >SELECT UPDATE_DATE FROM FILM.FILM_TRANSACTIONS
          WHERE UPDATE_DATE = FB_FUNC.TO_DATE (:CURR-DATE-JUL-PACKED) FOR
          >UPDATE WITH HOLD
          >
          >OPEN cursor-name.
          >
          >Then FETCH one row at a time (until RC = 100).
          >
          >DELETE WHERE CURRENT OF C1 (cursor name).
          >
          >COMMIT. You could commit every 100 - 10000 if you don't want to commit each
          >row.
          >
          >CLOSE cursor-name.
          Interesting. I just figured that this would be much less efficient than
          doing just the delete with the fullselect, because in the latter case no
          data need be returned to the AR. Anyway, I will give it a shot.
          >Note that the WITH HOLD will retain the cursor position even though you
          have
          >committed (which usually closes a cursor).
          >
          >Please consult the SQL Reference for details, because I am posting all the
          >above syntax from (my) memory.
          >
          >Also, you find that there is very little increase in performance in
          >committing more often than every 100 updates. Committing more often will
          >provide increased concurrency with other applications.
          Will keep that in mind. Thanks!

          Frank

          ---
          Frank Swarbrick
          Senior Developer/Analyst - Mainframe Applications
          FirstBank Data Corporation - Lakewood, CO USA


          • Frank Swarbrick

            #6
            Re: deleting large numbers of records

            Oh! So every 10,000 actually does make sense then, right?
            >>Mark A <nobody@nowhere.com> 01/24/07 4:44 PM >>>
            "Mark A" <nobody@nowhere.com> wrote in message
            news:pOSdnQU31-zgcCrYnZ2dnUVZ_sWdnZ2d@comcast.com...
            Also, you find that there is very little increase in performance in
            committing more often than every 100 updates. Committing more often will
            provide increased concurrency with other applications.
            Correction, you find that there is very little increase in performance in
            committing LESS often than every 100 updates.





            • Frank Swarbrick

              #7
              Re: deleting large numbers of records

              jefftyzzer <jefftyzzer@sbcglobal.net> 01/24/07 5:54 PM >>>
              >I think both solutions have merit. Frank, your version is very much
              >like the approach outlined in Serge's "SQL on Fire" seminars, albeit in
              >COBOL as opposed to SQL/PL. Mark's solution will use less log space
              >than yours, but I respectfully wonder about the time required to delete
              >all of the target rows when they're being fetched and deleted
              >one-at-a-time.
              That was my concern as well. In any case, I will give both methods a shot
              and see which I prefer.
              >FWIW, here's how I've done it (in a stored procedure):
              >
              >CREATE PROCEDURE CSE.PURGE_LRD()
              > BEGIN
              >
              > DECLARE V_NO_DATA SMALLINT DEFAULT 0;--
              > DECLARE V_DEADLOCK_OR_LTO SMALLINT DEFAULT 0;--
              >
              > DECLARE C_DEADLOCK_OR_LTO CONDITION FOR SQLSTATE '40001';--
              >
              > DECLARE CONTINUE HANDLER FOR NOT FOUND
              > SET V_NO_DATA = 1;--
              >
              > -- The V_DEADLOCK_OR_LTO attribute is throw-away,
              > -- but a continue handler needs to do something,
              > -- i.e., it's not enough to just declare a handler,
              > -- it has to have an action in its body.
              > DECLARE CONTINUE HANDLER FOR C_DEADLOCK_OR_LTO
              > SET V_DEADLOCK_OR_LTO = 1;--
              >
              > WHILE (V_NO_DATA = 0) DO
              > DELETE FROM
              > (
              > SELECT
              > 1
              > FROM
              > LRD
              > FETCH FIRST 200000 ROWS ONLY
              > ) AS LRD_D;--
              >
              > COMMIT;--
              > END WHILE;--
              >END;
              My probably very naive thought is that it would be nice to have some sort of
              DELETE statement that didn't even do logging. While often (usually?) you
              would want to be able to ROLLBACK a DELETE, in the case of what I'm doing
              there's no reason I would ever want to rollback. So why log? Just
              wondering...

              Thanks!

              Frank

              ---
              Frank Swarbrick
              Senior Developer/Analyst - Mainframe Applications
              FirstBank Data Corporation - Lakewood, CO USA


              • Mark A

                #8
                Re: deleting large numbers of records

                "Frank Swarbrick" <Frank.Swarbric k@efirstbank.co mwrote in message
                news:51qg2jF1l7 lvnU2@mid.indiv idual.net...
                Oh! So every 10,000 actually does make sense then, right?
                No. I would commit every 100 - 1000 rows. A commit is not that expensive,
                unless you do it for every row. Doing it every 100 rows or 10,000 will
                provide about a 2% difference in performance at best.

                The major expense of a commit is a synchronous write of the log buffer to
                disk. But the log buffer will be flushed to disk anyway whenever the log
                buffer is full, or about every 1 second. In any case, a commit is usually
                only as expensive as a few deletes, so do the math.
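
                To put rough numbers on "do the math", using Mark's own rule of
                thumb that a commit costs about as much as a few deletes (say 3,
                purely for illustration): committing every 100 rows adds roughly
                3 delete-equivalents per 100 deletes, i.e. about 3% overhead;
                every 1,000 rows it is about 0.3%, and every 10,000 rows about
                0.03%. So going from 100 to 10,000 rows per commit buys at most
                a couple of percent, while the larger unit of work holds locks
                and log space longer.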



                • Mark A

                  #9
                  Re: deleting large numbers of records

                  "Frank Swarbrick" <Frank.Swarbric k@efirstbank.co mwrote in message
                  news:51qg1dF1l7 lvnU1@mid.indiv idual.net...
                  Interesting. I just figured that this would be much less efficient than
                  doing just the delete with the fullselect, because in the latter case no
                  data need be returned to the AR. Anyway, I will give it a shot.
                  It depends on where the program runs. If the program runs on the server and the
                  static SQL is bound into a package that runs on the server, then there is
                  not that much difference in performance (unless performance is
                  ultra-critical). If the program is running remotely, then there would be a
                  big difference in performance.

                  I have written SQL stored procedures to do mass deletes with a cursor and it
                  performs well.

                  I haven't seen too many COBOL programs running on DB2 LUW. What compiler are
                  you using? I used MicroFocus COBOL against OS/2 Database Manager, but that
                  was in 1991.



                  • Knut Stolze

                    #10
                    Re: deleting large numbers of records

                    Frank Swarbrick wrote:
                    jefftyzzer <jefftyzzer@sbcglobal.net> 01/24/07 5:54 PM >>>
                    >>I think both solutions have merit. Frank, your version is very much
                    >>like the approach outlined in Serge's "SQL on Fire" seminars, albeit in
                    >>COBOL as opposed to SQL/PL. Mark's solution will use less log space
                    >>than yours, but I respectfully wonder about the time required to delete
                    >>all of the target rows when they're being fetched and deleted
                    >>one-at-a-time.
                    >
                    That was my concern as well. In any case, I will give both methods a shot
                    and see which I prefer.
                    >
                    >>FWIW, here's how I've done it (in a stored procedure):
                    >>
                    >>CREATE PROCEDURE CSE.PURGE_LRD()
                    >>BEGIN
                    >>
                    >>DECLARE V_NO_DATA SMALLINT DEFAULT 0;--
                    >>DECLARE V_DEADLOCK_OR_LTO SMALLINT DEFAULT 0;--
                    >>
                    >>DECLARE C_DEADLOCK_OR_LTO CONDITION FOR SQLSTATE '40001';--
                    >>
                    >>DECLARE CONTINUE HANDLER FOR NOT FOUND
                    >>SET V_NO_DATA = 1;--
                    >>
                    >>-- The V_DEADLOCK_OR_LTO attribute is throw-away,
                    >>-- but a continue handler needs to do something,
                    >>-- i.e., it's not enough to just declare a handler,
                    >>-- it has to have an action in its body.
                    >>DECLARE CONTINUE HANDLER FOR C_DEADLOCK_OR_LTO
                    >>SET V_DEADLOCK_OR_LTO = 1;--
                    >>
                    >>WHILE (V_NO_DATA = 0) DO
                    >>DELETE FROM
                    >>(
                    >>SELECT
                    >>1
                    >>FROM
                    >>LRD
                    >>FETCH FIRST 200000 ROWS ONLY
                    >>) AS LRD_D;--
                    >>
                    >>COMMIT;--
                    >>END WHILE;--
                    >>END;
                    >
                    My probably very naive thought is that it would be nice to have some sort
                    of
                    DELETE statement that didn't even do logging. While often (usually?) you
                    would want to be able to ROLLBACK a DELETE, in the case of what I'm doing
                    there's no reason I would ever want to rollback. So why log? Just
                    wondering...
                    The logs are also used for crash recovery. Let's assume you run the
                    unlogged DELETE. Now your application or the DB2 server crashes before you
                    issued a COMMIT. Upon restart, DB2 has to make sure the transaction is
                    properly rolled back and the database is in a consistent state. If you
                    don't log the DELETE, you are out of luck there.

                    What would be nice to have in this respect is an option for the DELETE
                    statement to explicitly turn off logging - which would have a certain
                    amount of problems as I just mentioned. Truncating a whole table is
                    supported that way already: you can use ALTER TABLE ... ACTIVATE NOT LOGGED
                    INITIALLY WITH EMPTY TABLE for that. Maybe this, combined with range
                    partitioning is an option for you?
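
                    As a rough sketch of that combination (untested; the daily
                    ranges, the partition name PART_TODAY and the scratch table
                    are invented here, only the table and column names come from
                    Frank's post, and the exact clauses should be checked against
                    the DB2 LUW SQL Reference for your release):

                    -- Range-partition the table by day on UPDATE_DATE
                    CREATE TABLE FILM.FILM_TRANSACTIONS (
                      UPDATE_DATE DATE NOT NULL
                      -- ... other columns ...
                    )
                    PARTITION BY RANGE (UPDATE_DATE)
                      (STARTING FROM '2007-01-01' ENDING AT '2007-12-31' EVERY 1 DAY);

                    -- A rerun could then discard today's rows by detaching that
                    -- day's partition instead of deleting row by row:
                    ALTER TABLE FILM.FILM_TRANSACTIONS
                      DETACH PARTITION PART_TODAY INTO FILM.FILM_TRANS_SCRATCH;

                    -- Or, to empty a whole (suitably created) table without
                    -- logging, the statement Knut mentions:
                    ALTER TABLE FILM.FILM_TRANSACTIONS
                      ACTIVATE NOT LOGGED INITIALLY WITH EMPTY TABLE;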

                    --
                    Knut Stolze
                    DB2 z/OS Utilities Development
                    IBM Germany


                    • Knut Stolze

                      #11
                      Re: deleting large numbers of records

                      Mark A wrote:
                      The major expense of a commit is a synchronous write of the log buffer to
                      disk.
                      Not necessarily if group commit is used.

                      --
                      Knut Stolze
                      DB2 z/OS Utilities Development
                      IBM Germany


                      • Knut Stolze

                        #12
                        Re: deleting large numbers of records

                        Frank Swarbrick wrote:
                        Mark A <nobody@nowhere.com> 01/24/07 4:36 PM >>>
                        >>Yes there is a better way that will avoid filling up your DB2 z/OS logs.
                        >
                        Actually, DB2/LUW, but I'm guessing your advice still applies.
                        In that case, have you considered the MERGE statement? Then you may not
                        have to DELETE the rows at all - just UPDATE them.
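
                        A minimal sketch of that idea (untested; the staging
                        table SESSION.TODAYS_INPUT, the key TRANS_KEY and the
                        column TRANS_DATA are invented here for illustration,
                        only FILM.FILM_TRANSACTIONS and UPDATE_DATE come from
                        Frank's post):

                        MERGE INTO FILM.FILM_TRANSACTIONS T
                        USING SESSION.TODAYS_INPUT S
                          ON T.TRANS_KEY = S.TRANS_KEY
                        WHEN MATCHED THEN
                          UPDATE SET T.UPDATE_DATE = S.UPDATE_DATE,
                                     T.TRANS_DATA  = S.TRANS_DATA
                        WHEN NOT MATCHED THEN
                          INSERT (TRANS_KEY, UPDATE_DATE, TRANS_DATA)
                          VALUES (S.TRANS_KEY, S.UPDATE_DATE, S.TRANS_DATA);

                        One caveat not raised in the thread: MERGE on its own
                        leaves behind any first-run rows whose keys no longer
                        appear in the corrected input file, so it only replaces
                        the delete step completely if the rerun covers every key
                        from the first run.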

                        --
                        Knut Stolze
                        DB2 z/OS Utilities Development
                        IBM Germany


                        • aj

                          #13
                          Re: deleting large numbers of records

                          Frank:
                          Here's an OLAPy trick that I sometimes use:

                          Let's say you want to delete rows from a very large table based on a
                          sysdate column. You *don't* want to overfill the transaction logs

                          The answer: Figure out how many rows you can safely delete w/ your
                          logs, use row_number() to slap a number on each one, and delete based
                          not only on your sysdate, but also that number.
                          Let's say you can safely delete up to 200000 rows, and you only want to
                          delete rows where sysdate = 5/1/2005:

                          lock table mytable in exclusive mode ;
                          delete FROM
                            (SELECT sysdate, row_number() OVER (ORDER BY sysdate) AS rn
                             FROM mytable) AS tr
                          WHERE rn BETWEEN 1 and 200000 and sysdate = '5/1/2005' ;
                          COMMIT ;

                          Stick this in a loop and stop when no rows get deleted anymore.

                          HTH

                          aj

                          Frank Swarbrick wrote:
                          We have a batch process that inserts large numbers (100,000 - 1,000,000) of
                          records into a database each day. (DL/I database.) We're considering
                          converting it to a DB2 table. Currently we have logic in place that, prior
                          to inserting any data, reads the first input record and checks to see if it
                          already exists in the table. If the record already exists there are two
                          options:
                          1) Don't continue, because you already ran this job today!
                          2) This is a rerun - continue.
                          >
                          If number 2 is selected the first thing that happens is that it deletes
                          every record that was inserted today prior to doing the regular insert
                          process. (You may ask, why not just skip over the ones that are already
                          there. It's because we may be rerunning with an updated input file, where
                          the input records may be different than during the first run.)
                          >
                          Anyway, I figured with DB2 this would be a snap. All I'd need to do is:
                          EXEC SQL
                          DELETE FROM FILM.FILM_TRANSACTIONS
                          WHERE UPDATE_DATE = FB_FUNC.TO_DATE (:CURR-DATE-JUL-PACKED)
                          END-EXEC
                          >
                          The only problem is that my log file would end up running out of room. So
                          now I've come up with the following:
                          >
                          DELETE-TODAY SECTION.
                          DISPLAY 'DELETE PROCESS BEGINS' UPON CONSOLE
                          PERFORM WITH TEST AFTER
                          UNTIL SQLCODE = 100
                          DISPLAY 'COMMITTING...' UPON CONSOLE
                          PERFORM COMMIT-UOW
                          DISPLAY 'DELETING 10000' UPON CONSOLE
                          PERFORM DB2-DELETE-TODAY
                          END-PERFORM
                          PERFORM COMMIT-UOW
                          DISPLAY 'DELETE PROCESS ENDS' UPON CONSOLE
                          .
                          >
                          DB2-DELETE-TODAY SECTION.
                          EXEC SQL
                          DELETE FROM (
                          SELECT UPDATE_DATE
                          FROM FILM.FILM_TRANSACTIONS
                          WHERE UPDATE_DATE = FB_FUNC.TO_DATE (:CURR-DATE-JUL-PACKED)
                          FETCH FIRST 10000 ROWS ONLY
                          )
                          WHERE 1 = 1
                          END-EXEC
                          CALL CHECKERR USING SQLCA ERRLOC
                          .
                          >
                          My question is, is this the way to go or is there some better way?
                          >
                          I tried making the "10000" a host variable, but that didn't work. Any way
                          around this?
                          >
                          You may be wondering why I put the "WHERE 1 = 1" clause on the DELETE
                          statement. This is because DB2 gives a warning if you pre-compile a DELETE
                          or UPDATE statement without a WHERE clause. Still works, but I like to
                          avoid warnings.
                          >
                          Thanks!
                          Frank
                          >
                          >
                          ---
                          Frank Swarbrick
                          Senior Developer/Analyst - Mainframe Applications
                          FirstBank Data Corporation - Lakewood, CO USA


                          • ChrisC

                            #14
                            Re: deleting large numbers of records

                            Another option might be to create the table as an MDC (MultiDimensional
                            Cluster) table, clustered on UPDATE_DATE, and then turn on the option
                            MDC ROLLOUT (not sure of exact syntax). This should then allow you to
                            just delete all the rows for that date. According to the literature,
                            it should just mark each block (set of pages) for that cluster as
                            deleted, log each block as being deleted, and commit. If you think
                            about this, it should delete the 1,000,000 or so rows very quickly and
                            NOT fill up your log files.
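
                            A sketch of what that might look like (hedged, since
                            as Chris says the exact rollout syntax is uncertain
                            and release-dependent; only the table and column
                            names come from Frank's post):

                            -- MDC table dimensioned on the update date
                            CREATE TABLE FILM.FILM_TRANSACTIONS (
                              UPDATE_DATE DATE NOT NULL
                              -- ... other columns ...
                            )
                            ORGANIZE BY DIMENSIONS (UPDATE_DATE);

                            -- With MDC rollout enabled (e.g. via the
                            -- DB2_MDC_ROLLOUT registry variable, depending on
                            -- the DB2 LUW release), a plain delete on the
                            -- dimension column should be handled block-by-block
                            -- rather than row-by-row:
                            DELETE FROM FILM.FILM_TRANSACTIONS
                            WHERE UPDATE_DATE = CURRENT DATE;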

                            Disclaimer - I haven't (yet) been able to use this myself, so no actual
                            experience here. But if this is a new DB2 table/DB, it might be a
                            great time to check this feature out.

                            -Chris


                            • Frank Swarbrick

                              #15
                              Re: deleting large numbers of records

                              Knut Stolze <stolze@de.ibm.com> 01/25/07 12:41 AM >>>
                              >Frank Swarbrick wrote:
                              >
                              >My probably very naive thought is that it would be nice to have some
                              sort
                              >of
                              >DELETE statement that didn't even do logging. While often (usually?)
                              you
                              >would want to be able to ROLLBACK a DELETE, in the case of what I'm
                              doing
                              >there's no reason I would ever want to rollback. So why log? Just
                              >wondering...
                              >
                              >The logs are also used for crash recovery. Let's assume you run the
                              >unlogged DELETE. Now your application or the DB2 server crashes before
                              you
                              >issued a COMMIT. Upon restart, DB2 has to make sure the transaction is
                              >properly rolled back and the database is in a consistent state. If you
                              >don't log the DELETE, you are out of luck there.
                              Are you saying that DB2 occasionally crashes?
                              :-) (Just kidding.)
                              As you can tell, I'm hopelessly naive about these things. I'm just a simple
                              application programmer.
                              >What would be nice to have in this respect is an option for the DELETE
                              >statement to explicitly turn off logging - which would have a certain
                              >amount of problems as I just mentioned. Truncating a whole table is
                              >supported that way already: you can use ALTER TABLE ... ACTIVATE NOT
                              LOGGED
                              >INITIALLY WITH EMPTY TABLE for that. Maybe this, combined with range
                              >partitioning is an option for you?
                              I don't know about range partitioning. Can you give me a pointer to some
                              information on this?

                              Thanks!

                              Frank


                              ---
                              Frank Swarbrick
                              Senior Developer/Analyst - Mainframe Applications
                              FirstBank Data Corporation - Lakewood, CO USA
