deleting large numbers of records

This topic is closed.
  • Frank Swarbrick

    #16
    Re: deleting large numbers of records

    Mark A <nobody@nowhere.com> 01/24/07 7:40 PM >>>
    >"Frank Swarbrick" <Frank.Swarbrick@efirstbank.com> wrote in message
    >news:51qg1dF1l7lvnU1@mid.individual.net...
    >Interesting. I just figured that this would be much less efficient than
    >doing just the delete with the fullselect, because in the latter case no
    >data need be returned to the AR. Anyway, I will give it a shot.
    >
    >It depends on where the program runs. If the program runs on server and the
    >static SQL is bound into a package that runs on the server, then there is
    >not that much difference in performance (unless performance is
    >ultra-critical). If the program is running remotely, then there would be a
    >big difference in performance.
    >
    >I have written SQL stored procedures to do mass deletes with a cursor and it
    >performs well.
    >
    >I haven't seen too many COBOL programs running on DB2 LUW. What compiler are
    >you using? I used MicroFocus COBOL against OS/2 Database Manager, but that
    >was in 1991.
    We're doing it in kind of an odd way. And for now, we're only testing. We
    are using "DB2 Server for VSE" as the client, with the IBM COBOL for VSE/ESA
    compiler. But all of our databases are remote databases on DB2/LUW.

    So we definitely fall into the category of a remote client, not a client
    running on the server.

    But to answer your question anyway, I have been successful using both Micro
    Focus Net Express (COBOL) 5.0 as well as OpenCobol 0.33 to access DB2/LUW
    databases.

    Frank


    ---
    Frank Swarbrick
    Senior Developer/Analyst - Mainframe Applications
    FirstBank Data Corporation - Lakewood, CO USA


    • Frank Swarbrick

      #17
      Re: deleting large numbers of records

      Knut Stolze <stolze@de.ibm.com> 01/25/07 1:38 AM >>>
      >Frank Swarbrick wrote:
      >
      >Mark A <nobody@nowhere.com> 01/24/07 4:36 PM >>>
      >>>Yes there is a better way that will avoid filling up your DB2 z/OS logs.
      >>
      >Actually, DB2/LUW, but I'm guessing your advice still applies.
      >
      >In that case, have you considered the MERGE statement? Then you may not
      >have to DELETE the rows at all - just UPDATE them.
      Yet another thing I am not familiar with. I will look into it. Thanks.
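
      For reference, a minimal sketch of what such a MERGE could look like here,
      assuming a hypothetical staging table FILM.DAILY_INPUT that holds the day's
      input records keyed by TRANS_ID (those names are illustrative, not from the
      original posts; only FILM.FILM_TRANSACTIONS and UPDATE_DATE come from the
      thread):

      MERGE INTO FILM.FILM_TRANSACTIONS T
      USING (SELECT TRANS_ID, UPDATE_DATE, AMOUNT FROM FILM.DAILY_INPUT) S
      ON (T.TRANS_ID = S.TRANS_ID)
      WHEN MATCHED THEN
          UPDATE SET T.UPDATE_DATE = S.UPDATE_DATE,
                     T.AMOUNT      = S.AMOUNT
      WHEN NOT MATCHED THEN
          INSERT (TRANS_ID, UPDATE_DATE, AMOUNT)
          VALUES (S.TRANS_ID, S.UPDATE_DATE, S.AMOUNT);

      On a rerun the rows that already exist are simply overwritten, so no mass
      DELETE is needed first.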

      Frank

      ---
      Frank Swarbrick
      Senior Developer/Analyst - Mainframe Applications
      FirstBank Data Corporation - Lakewood, CO USA


      • Frank Swarbrick

        #18
        Re: deleting large numbers of records

        aj <ronald@mcdonalds.com> 01/25/07 7:10 AM >>>
        >Frank:
        >Here's an OLAPy trick that I sometimes use:
        >
        >Let's say you want to delete rows from a very large table based on a
        >sysdate column. You *don't* want to overfill the transaction logs
        >
        >The answer: Figure out how many rows you can safely delete w/ your
        >logs, use row_number() to slap a number on each one, and delete based
        >not only on your sysdate, but also that number.
        >Let's say you can safely delete up to 200000 rows, and you only want to
        >delete rows where sysdate = 5/1/2005:
        >
        >lock table mytable in exclusive mode ;
        >DELETE FROM (SELECT sysdate, row_number() OVER (ORDER BY sysdate) AS rn
        >             FROM mytable) AS tr
        >WHERE rn BETWEEN 1 AND 200000 AND sysdate = '5/1/2005' ;
        >COMMIT ;
        >
        >Stick this in a loop and stop when no rows get deleted anymore.
        Sounds interesting. And brings up another question. Is there any way to
        dynamically determine how many rows I can delete w/o filling up the logs?
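
        One rough way to check, assuming the DB2 9 SYSIBMADM administrative views
        (DBCFG, SNAPDB) are available on the server, is to compare the configured
        log capacity with the space currently in use and divide the headroom by an
        estimated per-row log cost:

        -- logfilsiz is in 4 KB pages per log file; logprimary/logsecond are file counts
        SELECT NAME, VALUE
        FROM   SYSIBMADM.DBCFG
        WHERE  NAME IN ('logfilsiz', 'logprimary', 'logsecond');

        -- Current log usage, in bytes, for the active database
        SELECT TOTAL_LOG_AVAILABLE, TOTAL_LOG_USED
        FROM   SYSIBMADM.SNAPDB;

        The per-row cost still has to be estimated (roughly the row size plus the keys
        of every index on the table), so treat the result as a rough upper bound, not a
        guarantee.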

        Lots of good responses to this. Thanks all!

        Frank


        ---
        Frank Swarbrick
        Senior Developer/Analyst - Mainframe Applications
        FirstBank Data Corporation - Lakewood, CO USA


        • Knut Stolze

          #19
          Re: deleting large numbers of records

          Frank Swarbrick wrote:
          >>The logs are also used for crash recovery. Let's assume you run the
          >>unlogged DELETE. Now your application or the DB2 server crashes before you
          >>issued a COMMIT. Upon restart, DB2 has to make sure the transaction is
          >>properly rolled back and the database is in a consistent state. If you
          >>don't log the DELETE, you are out of luck there.
          >
          >Are you saying that DB2 occasionally crashes?
          >:-) (Just kidding.)
          I can't really comment on that. DB2 crashes quite often in my environment -
          sometimes on purpose, sometimes not. If not, then it is usually due to my
          (wrong) code changes, of course. ;-)

          Anyway, just think of someone tripping over the power cable or using Windows
          as OS...
          >>What would be nice to have in this respect is an option for the DELETE
          >>statement to explicitly turn off logging - which would have a certain
          >>amount of problems as I just mentioned. Truncating a whole table is
          >>supported that way already: you can use ALTER TABLE ... ACTIVATE NOT LOGGED
          >>INITIALLY WITH EMPTY TABLE for that. Maybe this, combined with range
          >>partitioning, is an option for you?
          >
          >I don't know about range partitioning. Can you give me a pointer to some
          >information on this?
          I guess Serge is the most knowledgeable about this. In a nutshell: you have
          one logical table that is internally stored as multiple physical tables.
          DB2 will distribute your data across those physical tables. For that, it
          needs some criteria/algorithm for the distribution. With range
          partitioning, you define ranges and a value in a row that fits into one
          range goes into the physical table for that range. During query time, the
          DB2 optimizer will analyze the query and if it finds that the query
          searches on ranges, it can eliminate scanning some/most of the physical
          tables, for instance.

          Another side effect is that you have now (V9) ALTER TABLE ... ATTACH
          PARTITION and ALTER TABLE ... DETACH PARTITION SQL statements.
          (http://publib.boulder.ibm.com/infoce...c/r0000888.htm)
          Essentially, those statements convert a regular base table into one of those
          physical tables and attach it to the logical table - or vice versa.
          Thus, you can roll-in and roll-out ranges of a table with a single SQL
          statement.

          If you can partition your table according to your deletion criteria, you can
          detach the internal, physical table holding the data you want to remove.
          It becomes a regular table, which you can drop.
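
          A minimal sketch of the idea against the table from the original post - the
          partition name, the date range, and the month granularity are all assumptions,
          not anything Knut or Frank posted:

          CREATE TABLE FILM.FILM_TRANSACTIONS (
              TRANS_ID     INTEGER NOT NULL,
              UPDATE_DATE  DATE    NOT NULL
              -- remaining columns omitted in this sketch
          )
          PARTITION BY RANGE (UPDATE_DATE)
              (STARTING FROM ('2005-01-01') ENDING ('2006-12-31') EVERY 1 MONTH);

          -- Roll a month out of the logical table: the detached partition becomes an
          -- ordinary table that can simply be dropped, with no per-row delete or logging.
          ALTER TABLE FILM.FILM_TRANSACTIONS
              DETACH PARTITION PART0 INTO FILM.FILM_TRANS_OLD;
          DROP TABLE FILM.FILM_TRANS_OLD;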

          p.s: I hope I didn't screw up too much on the terminology.

          --
          Knut Stolze
          DB2 z/OS Utilities Development
          IBM Germany


          • Mark A

            #20
            Re: deleting large numbers of records

            "Frank Swarbrick" <Frank.Swarbric k@efirstbank.co mwrote in message
            news:51s6eoF1jg m6oU1@mid.indiv idual.net...
            We're doing it in kind of an odd way. And for now, we're only testing. We
            are using "DB2 Server for VSE" as the client, with the IBM COBOL for VSE/ESA
            compiler. But all of our databases are remote databases on DB2/LUW.
            >
            So we definitely fall into the category of a remote client, not a client
            running on the server.
            >
            But to answer your question anyway, I have been successful using both Micro
            Focus Net Express (COBOL) 5.0 as well as OpenCobol 0.33 to access DB2/LUW
            databases.
            >
            Frank
            Given the above, I would create an SQL stored procedure to do the deletes.
            It will run on the LUW server (you can call it from a remote client with any
            parms you want) and it should perform quite well.
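
            As a rough illustration of that approach (this is not Mark's actual procedure;
            the procedure name is an assumption, while the table and the 10000-row chunk
            come from the original post), a chunked-delete SQL procedure might look like:

            CREATE PROCEDURE FILM.DELETE_DAY (IN p_date DATE)
            LANGUAGE SQL
            BEGIN
                DECLARE v_deleted INTEGER DEFAULT 1;
                -- Delete in chunks and commit after each one so no single unit of
                -- work has to fit the whole day's rows in the transaction log.
                WHILE v_deleted > 0 DO
                    DELETE FROM (SELECT UPDATE_DATE
                                 FROM FILM.FILM_TRANSACTIONS
                                 WHERE UPDATE_DATE = p_date
                                 FETCH FIRST 10000 ROWS ONLY) AS chunk;
                    GET DIAGNOSTICS v_deleted = ROW_COUNT;
                    COMMIT;
                END WHILE;
            END

            The remote VSE program would then issue a single CALL, and all of the row
            traffic stays on the LUW server.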



            • Frank Swarbrick

              #21
              Re: deleting large numbers of records

              Mark A <nobody@nowhere.com> 01/25/07 5:45 PM >>>
              >"Frank Swarbrick" <Frank.Swarbrick@efirstbank.com> wrote in message
              >news:51s6eoF1jgm6oU1@mid.individual.net...
              >We're doing it in kind of an odd way. And for now, we're only testing. We
              >are using "DB2 Server for VSE" as the client, with the IBM COBOL for VSE/ESA
              >compiler. But all of our databases are remote databases on DB2/LUW.
              >>
              >So we definitely fall into the category of a remote client, not a client
              >running on the server.
              >>
              >But to answer your question anyway, I have been successful using both Micro
              >Focus Net Express (COBOL) 5.0 as well as OpenCobol 0.33 to access DB2/LUW
              >databases.
              >>
              >Frank
              >
              >Given the above, I would create an SQL stored procedure to do the deletes.
              >It will run on the LUW server (you can call it from a remote client with any
              >parms you want) and it should perform quite well.
              Sounds right. To be honest, we probably would not do that for this
              particular case, simply because it's a situation that occurs just this side
              of never. But I will keep it in mind.
              Thanks,
              Frank


              ---
              Frank Swarbrick
              Senior Developer/Analyst - Mainframe Applications
              FirstBank Data Corporation - Lakewood, CO USA


              • Gregor Kovač

                #22
                Re: deleting large numbers of records

                Hi!

                I'd propose another method that involves EXPORT and a LOAD operation:
                - first you EXPORT the rows you want to preserve
                - then do a LOAD with REPLACE
                Example:
                1.) EXPORT TO TABLE1.IXF OF IXF SELECT * FROM
                SCHEMA.TABLE1 WHERE DATE BETWEEN CURRENT DATE - 3 DAYS AND CURRENT DATE
                2.) LOAD FROM TABLE1.IXF OF IXF REPLACE INTO SCHEMA.TABLE1
                After running this you'll probably have to do some SET INTEGRITY statements
                if you have RI.

                I only know DB2 for LUW, but if your ADMIN_CMD procedure supports EXPORT and
                LOAD operations then you could even do this via SQL procedure.
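
                A hedged sketch of that variant - the file path is illustrative, and whether
                your ADMIN_CMD level accepts EXPORT and LOAD command strings should be
                verified first:

                CALL SYSPROC.ADMIN_CMD(
                    'EXPORT TO /tmp/table1.ixf OF IXF
                     SELECT * FROM SCHEMA.TABLE1
                     WHERE DATE BETWEEN CURRENT DATE - 3 DAYS AND CURRENT DATE');

                CALL SYSPROC.ADMIN_CMD(
                    'LOAD FROM /tmp/table1.ixf OF IXF REPLACE INTO SCHEMA.TABLE1');

                With ADMIN_CMD the files live on the server rather than the client, which
                fits the remote-client setup described earlier in the thread.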

                Best regards,
                Kovi

                Frank Swarbrick wrote:
                We have a batch process that inserts large numbers (100,000 - 1,000,000) of
                records into a database each day. (DL/I database.) We're considering
                converting it to a DB2 table. Currently we have logic in place that, prior
                to inserting any data, reads the first input record and checks to see if it
                already exists in the table. If the record already exists there are two
                options:
                1) Don't continue, because you already ran this job today!
                2) This is a rerun - continue.
                >
                If number 2 is selected the first thing that happens is that it deletes
                every record that was inserted today prior to doing the regular insert
                process. (You may ask, why not just skip over the ones that are already
                there. It's because we may be rerunning with an updated input file, where
                the input records may be different than during the first run.)
                >
                Anyway, I figured with DB2 this would be a snap. All I'd need to do is:
                EXEC SQL
                    DELETE FROM FILM.FILM_TRANSACTIONS
                    WHERE UPDATE_DATE = FB_FUNC.TO_DATE(:CURR-DATE-JUL-PACKED)
                END-EXEC
                >
                The only problem is that my log file would end up running out of room. So
                now I've come up with the following:
                >
                DELETE-TODAY SECTION.
                    DISPLAY 'DELETE PROCESS BEGINS' UPON CONSOLE
                    PERFORM WITH TEST AFTER
                            UNTIL SQLCODE = 100
                        DISPLAY 'COMMITTING...' UPON CONSOLE
                        PERFORM COMMIT-UOW
                        DISPLAY 'DELETING 10000' UPON CONSOLE
                        PERFORM DB2-DELETE-TODAY
                    END-PERFORM
                    PERFORM COMMIT-UOW
                    DISPLAY 'DELETE PROCESS ENDS' UPON CONSOLE
                    .
                >
                DB2-DELETE-TODAY SECTION.
                    EXEC SQL
                        DELETE FROM (
                            SELECT UPDATE_DATE
                            FROM FILM.FILM_TRANSACTIONS
                            WHERE UPDATE_DATE = FB_FUNC.TO_DATE(:CURR-DATE-JUL-PACKED)
                            FETCH FIRST 10000 ROWS ONLY
                        )
                        WHERE 1 = 1
                    END-EXEC
                    CALL CHECKERR USING SQLCA ERRLOC
                    .
                >
                My question is, is this the way to go or is there some better way?
                >
                I tried making the "10000" a host variable, but that didn't work. Any way
                around this?
                >
                You may be wondering why I put the "WHERE 1 = 1" clause on the DELETE
                statement. This is because DB2 gives a warning if you pre-compile a DELETE
                or UPDATE statement without a WHERE clause. It still works, but I like to
                avoid warnings.
                >
                Thanks!
                Frank
                >
                >
                ---
                Frank Swarbrick
                Senior Developer/Analyst - Mainframe Applications
                FirstBank Data Corporation - Lakewood, CO USA
                --
                -~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
                | In A World Without Fences Who Needs Gates? |
                | Experience Linux. |
                -~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-


                • andyhe

                  #23
                  Re: deleting large numbers of records

                  >What would be nice to have in this respect is an option for the DELETE
                  >statement to explicitly turn off logging - which would have a certain
                  >amount of problems as I just mentioned. Truncating a whole table is
                  >supported that way already: you can use ALTER TABLE ... ACTIVATE NOT LOGGED
                  >INITIALLY WITH EMPTY TABLE for that. Maybe this, combined with range
                  >partitioning is an option for you?
                  >
                  What 'activate not logged initially' does (without the 'with empty
                  table' option) is that it suspends logging until the next commit. So
                  you CAN use it without partitioning. Just activate not logged
                  initially, do your delete and commit.
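
                  A minimal sketch of that sequence, assuming autocommit is off (the NOT
                  LOGGED INITIALLY state only lasts until the end of the current unit of
                  work) and using the table from the original post:

                  -- Must be issued in the same unit of work as the DELETE
                  ALTER TABLE FILM.FILM_TRANSACTIONS ACTIVATE NOT LOGGED INITIALLY;

                  DELETE FROM FILM.FILM_TRANSACTIONS
                  WHERE  UPDATE_DATE = CURRENT DATE;

                  COMMIT;
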
                  Beware of any errors (already mentioned)! If for some reason your
                  delete fails, db2 cannot roll back and the table becomes invalid. Also,
                  if you have to rollforward after a restore, there is no way to
                  rollforward over a 'not logged' transaction; your table will be invalid
                  after the rollforward.
                  We have used this feature for a one-time extension of a very big table,
                  but took precautions before doing it: backup AND snapshot. :-)
                  By the way: emptying a complete table is very easy with an import from
                  /dev/null (on unix).


                  • Frank Swarbrick

                    #24
                    Re: deleting large numbers of records

                    Good thought, but with two years of data, of 100,000 to 1,000,000 records a
                    day I don't think this would be very efficient! :-)
                    (This is a transaction history file with two years of transactions.)

                    Thanks,

                    Frank

                    ---
                    Frank Swarbrick
                    Senior Developer/Analyst - Mainframe Applications
                    FirstBank Data Corporation - Lakewood, CO USA
                    Gregor Kovač <gregor.kovac@mikropis.si> 01/26/07 12:21 AM >>>
                    Hi!

                    I'd propose another method that involves EXPORT and a LOAD operation:
                    - first you EXPORT the rows you want to preserve
                    - then do a LOAD with REPLACE
                    Example:
                    1.) EXPORT TO TABLE1.IXF OF IXF SELECT * FROM
                    SCHEMA.TABLE1 WHERE DATE BETWEEN CURRENT DATE - 3 DAYS AND CURRENT DATE
                    2.) LOAD FROM TABLE1.IXF OF IXF REPLACE INTO SCHEMA.TABLE1
                    After running this you'll probably have to do some SET INTEGRITY statements
                    if you have RI.

                    I only know DB2 for LUW, but if your ADMIN_CMD procedure supports EXPORT and
                    LOAD operations then you could even do this via SQL procedure.

                    Best regards,
                    Kovi



                    • Art S. Kagel

                      #25
                      Re: deleting large numbers of records

                      Frank Swarbrick wrote:
                      Knut Stolze <stolze@de.ibm.com> 01/25/07 1:38 AM >>>
                      >
                      >>Frank Swarbrick wrote:
                      >>
                      >>
                      >>>Mark A <nobody@nowhere.com> 01/24/07 4:36 PM >>>
                      >>>
                      >>>>Yes there is a better way that will avoid filling up your DB2 z/OS logs.
                      >>>
                      >>>Actually, DB2/LUW, but I'm guessing your advice still applies.
                      >>
                      >>In that case, have you considered the MERGE statement? Then you may not
                      >>have to DELETE the rows at all - just UPDATE them.
                      >
                      >
                      Yet another thing I am not familiar with. I will look into it. Thanks.
                      Just updating the non-key columns in each existing row (and inserting the
                      missing row if the record was not in the original change set) will be FAR
                      more efficient. The delete-then-add operations must update each index key
                      twice - once to remove the original record's keys and again to re-add the
                      row's keys. This is very expensive.

                      If you can determine whether all (or at least most) operations will be
                      replacing an existing row, and apparently you can, then just do the updates
                      in place. My testing has found that it is ALWAYS faster to update than to
                      delete-then-add, and faster than trying to insert the row and updating
                      instead if the insert fails with a unique key violation (i.e. the row
                      already exists) when fewer than about 30% of the rows are new. When
                      more than about 70% of the rows do not already exist it is usually faster
                      to do the insert and update if the insert fails (the exact cutoff depends
                      on the number of indexes, whether the update modifies indexed columns, the
                      order the database engine chooses to check the unique keys for violations,
                      etc.).
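
                      As a rough SQL PL sketch of the "insert, and update if the insert fails"
                      variant (this is not Art's code; the procedure name, parameters, and
                      TRANS_ID column are assumptions):

                      CREATE PROCEDURE FILM.UPSERT_TRANS (IN p_id INTEGER, IN p_date DATE)
                      LANGUAGE SQL
                      BEGIN
                          -- On a duplicate key (SQLSTATE 23505) fall back to an UPDATE
                          DECLARE CONTINUE HANDLER FOR SQLSTATE '23505'
                              UPDATE FILM.FILM_TRANSACTIONS
                              SET    UPDATE_DATE = p_date
                              WHERE  TRANS_ID = p_id;

                          INSERT INTO FILM.FILM_TRANSACTIONS (TRANS_ID, UPDATE_DATE)
                          VALUES (p_id, p_date);
                      END

                      The update-first variant is the mirror image: UPDATE, then INSERT only
                      when GET DIAGNOSTICS reports a row count of zero.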

                      Obviously YMMV so you need to test it yourself.

                      Art S. Kagel

