deleting large numbers of records

This topic is closed.
  • Frank Swarbrick

    #16
    Re: deleting large numbers of records

    Mark A <nobody@nowhere.com> 01/24/07 7:40 PM >>>
    >"Frank Swarbrick" <Frank.Swarbrick@efirstbank.com> wrote in message
    >news:51qg1dF1l7lvnU1@mid.individual.net...
    >Interesting. I just figured that this would be much less efficient than
    >doing just the delete with the fullselect, because in the latter case no
    >data need be returned to the AR. Anyway, I will give it a shot.
    >
    >It depends on where the program runs. If the program runs on server and the
    >static SQL is bound into a package that runs on the server, then there is
    >not that much difference in performance (unless performance is
    >ultra-critical). If the program is running remotely, then there would be a
    >big difference in performance.
    >
    >I have written SQL stored procedures to do mass deletes with a cursor and it
    >performs well.
    >
    >I haven't seen too many COBOL programs running on DB2 LUW. What compiler are
    >you using? I used MicroFocus COBOL against OS/2 Database Manager, but that
    >was in 1991.
    We're doing it in kind of an odd way. And for now, we're only testing. We
    are using "DB2 Server for VSE" as the client, with the IBM COBOL for VSE/ESA
    compiler. But all of our databases are remote databases on DB2/LUW.

    So we definitely fall into the category of a remote client, not a client
    running on the server.

    But to answer your question anyway, I have been successful using both Micro
    Focus Net Express (COBOL) 5.0 as well as OpenCobol 0.33 to access DB2/LUW
    databases.

    Frank


    ---
    Frank Swarbrick
    Senior Developer/Analyst - Mainframe Applications
    FirstBank Data Corporation - Lakewood, CO USA


    • Frank Swarbrick

      #17
      Re: deleting large numbers of records

      Knut Stolze <stolze@de.ibm.com> 01/25/07 1:38 AM >>>
      >Frank Swarbrick wrote:
      >
      >Mark A <nobody@nowhere.com> 01/24/07 4:36 PM >>>
      >>>Yes there is a better way that will avoid filling up your DB2 z/OS logs.
      >>
      >Actually, DB2/LUW, but I'm guessing your advice still applies.
      >
      >In that case, have you considered the MERGE statement? Then you may not
      >have to DELETE the rows at all - just UPDATE them.
      Yet another thing I am not familiar with. I will look into it. Thanks.
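
      For reference, a minimal sketch of what such a MERGE could look like here,
      assuming a hypothetical staging table FILM.DAILY_INPUT that holds the day's
      input records keyed by TRANS_ID (those names are illustrative, not from the
      original posts; only FILM.FILM_TRANSACTIONS and UPDATE_DATE come from the
      thread):

      MERGE INTO FILM.FILM_TRANSACTIONS T
      USING (SELECT TRANS_ID, UPDATE_DATE, AMOUNT FROM FILM.DAILY_INPUT) S
      ON (T.TRANS_ID = S.TRANS_ID)
      WHEN MATCHED THEN
          UPDATE SET T.UPDATE_DATE = S.UPDATE_DATE,
                     T.AMOUNT      = S.AMOUNT
      WHEN NOT MATCHED THEN
          INSERT (TRANS_ID, UPDATE_DATE, AMOUNT)
          VALUES (S.TRANS_ID, S.UPDATE_DATE, S.AMOUNT);

      On a rerun the rows that already exist are simply overwritten, so no mass
      DELETE is needed first.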

      Frank

      ---
      Frank Swarbrick
      Senior Developer/Analyst - Mainframe Applications
      FirstBank Data Corporation - Lakewood, CO USA


      • Frank Swarbrick

        #18
        Re: deleting large numbers of records

        aj <ronald@mcdonalds.com> 01/25/07 7:10 AM >>>
        >Frank:
        >Here's an OLAPy trick that I sometimes use:
        >
        >Let's say you want to delete rows from a very large table based on a
        >sysdate column. You *don't* want to overfill the transaction logs
        >
        >The answer: Figure out how many rows you can safely delete w/ your
        >logs, use row_number() to slap a number on each one, and delete based
        >not only on your sysdate, but also that number.
        >Let's say you can safely delete up to 200000 rows, and you only want to
        >delete rows where sysdate = 5/1/2005:
        >
        >lock table mytable in exclusive mode ;
        >DELETE FROM (SELECT sysdate, row_number() OVER (ORDER BY sysdate) AS rn
        >             FROM mytable) AS tr
        >WHERE rn BETWEEN 1 AND 200000 AND sysdate = '5/1/2005' ;
        >COMMIT ;
        >
        >Stick this in a loop and stop when no rows get deleted anymore.
        Sounds interesting. And brings up another question. Is there any way to
        dynamically determine how many rows I can delete w/o filling up the logs?
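
        One rough way to check, assuming the DB2 9 SYSIBMADM administrative views
        (DBCFG, SNAPDB) are available on the server, is to compare the configured
        log capacity with the space currently in use and divide the headroom by an
        estimated per-row log cost:

        -- logfilsiz is in 4 KB pages per log file; logprimary/logsecond are file counts
        SELECT NAME, VALUE
        FROM   SYSIBMADM.DBCFG
        WHERE  NAME IN ('logfilsiz', 'logprimary', 'logsecond');

        -- Current log usage, in bytes, for the active database
        SELECT TOTAL_LOG_AVAILABLE, TOTAL_LOG_USED
        FROM   SYSIBMADM.SNAPDB;

        The per-row cost still has to be estimated (roughly the row size plus the keys
        of every index on the table), so treat the result as a rough upper bound, not a
        guarantee.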

        Lots of good responses to this. Thanks all!

        Frank


        ---
        Frank Swarbrick
        Senior Developer/Analyst - Mainframe Applications
        FirstBank Data Corporation - Lakewood, CO USA


        • Knut Stolze

          #19
          Re: deleting large numbers of records

          Frank Swarbrick wrote:
          >>The logs are also used for crash recovery. Let's assume you run the
          >>unlogged DELETE. Now your application or the DB2 server crashes before you
          >>issued a COMMIT. Upon restart, DB2 has to make sure the transaction is
          >>properly rolled back and the database is in a consistent state. If you
          >>don't log the DELETE, you are out of luck there.
          >
          >Are you saying that DB2 occasionally crashes?
          >:-) (Just kidding.)
          I can't really comment on that. DB2 crashes quite often in my environment -
          sometimes on purpose, sometimes not. If not, then it is usually due to my
          (wrong) code changes, of course. ;-)

          Anyway, just think of someone tripping over the power cable or using Windows
          as OS...
          >>What would be nice to have in this respect is an option for the DELETE
          >>statement to explicitly turn off logging - which would have a certain
          >>amount of problems as I just mentioned. Truncating a whole table is
          >>supported that way already: you can use ALTER TABLE ... ACTIVATE NOT LOGGED
          >>INITIALLY WITH EMPTY TABLE for that. Maybe this, combined with range
          >>partitioning, is an option for you?
          >
          >I don't know about range partitioning. Can you give me a pointer to some
          >information on this?
          I guess Serge is the most knowledgeable about this. In a nutshell: you have
          one logical table that is internally stored as multiple physical tables.
          DB2 will distribute your data across those physical tables. For that, it
          needs some criteria/algorithm for the distribution. With range
          partitioning, you define ranges and a value in a row that fits into one
          range goes into the physical table for that range. During query time, the
          DB2 optimizer will analyze the query and if it finds that the query
          searches on ranges, it can eliminate scanning some/most of the physical
          tables, for instance.

          Another side effect is that you have now (V9) ALTER TABLE ... ATTACH
          PARTITION and ALTER TABLE ... DETACH PARTITION SQL statements.
          (http://publib.boulder.ibm.com/infoce...c/r0000888.htm)
          Essentially, those statements convert a regular base table into one of those
          physical tables and attach it to the logical table - or vice versa.
          Thus, you can roll-in and roll-out ranges of a table with a single SQL
          statement.

          If you can partition your table according to your deletion criteria, you can
          detach the internal, physical table holding the data you want to remove.
          It becomes a regular table, which you can drop.
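
          A minimal sketch of the idea against the table from the original post - the
          partition name, the date range, and the month granularity are all assumptions,
          not anything Knut or Frank posted:

          CREATE TABLE FILM.FILM_TRANSACTIONS (
              TRANS_ID     INTEGER NOT NULL,
              UPDATE_DATE  DATE    NOT NULL
              -- remaining columns omitted in this sketch
          )
          PARTITION BY RANGE (UPDATE_DATE)
              (STARTING FROM ('2005-01-01') ENDING ('2006-12-31') EVERY 1 MONTH);

          -- Roll a month out of the logical table: the detached partition becomes an
          -- ordinary table that can simply be dropped, with no per-row delete or logging.
          ALTER TABLE FILM.FILM_TRANSACTIONS
              DETACH PARTITION PART0 INTO FILM.FILM_TRANS_OLD;
          DROP TABLE FILM.FILM_TRANS_OLD;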

          p.s: I hope I didn't screw up too much on the terminology.

          --
          Knut Stolze
          DB2 z/OS Utilities Development
          IBM Germany


          • Mark A

            #20
            Re: deleting large numbers of records

            "Frank Swarbrick" <Frank.Swarbric k@efirstbank.co mwrote in message
            news:51s6eoF1jg m6oU1@mid.indiv idual.net...
            We're doing it in kind of an odd way. And for now, we're only testing. We
            are using "DB2 Server for VSE" as the client, with the IBM COBOL for VSE/ESA
            compiler. But all of our databases are remote databases on DB2/LUW.
            >
            So we definitely fall into the category of a remote client, not a client
            running on the server.
            >
            But to answer your question anyway, I have been successful using both Micro
            Focus Net Express (COBOL) 5.0 as well as OpenCobol 0.33 to access DB2/LUW
            databases.
            >
            Frank
            Given the above, I would create an SQL stored procedure to do the deletes.
            It will run on the LUW server (you can call it from a remote client with any
            parms you want) and it should perform quite well.
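
            As a rough illustration of that approach (this is not Mark's actual procedure;
            the procedure name is an assumption, while the table and the 10000-row chunk
            come from the original post), a chunked-delete SQL procedure might look like:

            CREATE PROCEDURE FILM.DELETE_DAY (IN p_date DATE)
            LANGUAGE SQL
            BEGIN
                DECLARE v_deleted INTEGER DEFAULT 1;
                -- Delete in chunks and commit after each one so no single unit of
                -- work has to fit the whole day's rows in the transaction log.
                WHILE v_deleted > 0 DO
                    DELETE FROM (SELECT UPDATE_DATE
                                 FROM FILM.FILM_TRANSACTIONS
                                 WHERE UPDATE_DATE = p_date
                                 FETCH FIRST 10000 ROWS ONLY) AS chunk;
                    GET DIAGNOSTICS v_deleted = ROW_COUNT;
                    COMMIT;
                END WHILE;
            END

            The remote VSE program would then issue a single CALL, and all of the row
            traffic stays on the LUW server.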



            • Frank Swarbrick

              #21
              Re: deleting large numbers of records

              Mark A <nobody@nowhere.com> 01/25/07 5:45 PM >>>
              >"Frank Swarbrick" <Frank.Swarbrick@efirstbank.com> wrote in message
              >news:51s6eoF1jgm6oU1@mid.individual.net...
              >We're doing it in kind of an odd way. And for now, we're only testing. We
              >are using "DB2 Server for VSE" as the client, with the IBM COBOL for VSE/ESA
              >compiler. But all of our databases are remote databases on DB2/LUW.
              >>
              >So we definitely fall into the category of a remote client, not a client
              >running on the server.
              >>
              >But to answer your question anyway, I have been successful using both Micro
              >Focus Net Express (COBOL) 5.0 as well as OpenCobol 0.33 to access DB2/LUW
              >databases.
              >>
              >Frank
              >
              >Given the above, I would create an SQL stored procedure to do the deletes.
              >It will run on the LUW server (you can call it from a remote client with any
              >parms you want) and it should perform quite well.
              Sounds right. To be honest, we probably would not do that for this
              particular case, simply because it's a situation that occurs just this side
              of never. But I will keep it in mind.
              Thanks,
              Frank


              ---
              Frank Swarbrick
              Senior Developer/Analyst - Mainframe Applications
              FirstBank Data Corporation - Lakewood, CO USA


              • Gregor Kovač

                #22
                Re: deleting large numbers of records

                Hi!

                I'd propose another method that involves EXPORT and a LOAD operation:
                - first you EXPORT the rows you want to preserve
                - then do a LOAD with REPLACE
                Example:
                1.) EXPORT TO TABLE1.IXF OF IXF SELECT * FROM
                SCHEMA.TABLE1 WHERE DATE BETWEEN CURRENT DATE - 3 DAYS AND CURRENT DATE
                2.) LOAD FROM TABLE1.IXF OF IXF REPLACE INTO SCHEMA.TABLE1
                After running this you'll probably have to do some SET INTEGRITY statements
                if you have RI.

                I only know DB2 for LUW, but if your ADMIN_CMD procedure supports EXPORT and
                LOAD operations then you could even do this via SQL procedure.
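
                A hedged sketch of that variant - the file path is illustrative, and whether
                your ADMIN_CMD level accepts EXPORT and LOAD command strings should be
                verified first:

                CALL SYSPROC.ADMIN_CMD(
                    'EXPORT TO /tmp/table1.ixf OF IXF
                     SELECT * FROM SCHEMA.TABLE1
                     WHERE DATE BETWEEN CURRENT DATE - 3 DAYS AND CURRENT DATE');

                CALL SYSPROC.ADMIN_CMD(
                    'LOAD FROM /tmp/table1.ixf OF IXF REPLACE INTO SCHEMA.TABLE1');

                With ADMIN_CMD the files live on the server rather than the client, which
                fits the remote-client setup described earlier in the thread.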

                Best regards,
                Kovi

                Frank Swarbrick wrote:
                We have a batch process that inserts large numbers (100,000 - 1,000,000) of
                records into a database each day. (DL/I database.) We're considering
                converting it to a DB2 table. Currently we have logic in place that, prior
                to inserting any data, reads the first input record and checks to see if it
                already exists in the table. If the record already exists there are two
                options:
                1) Don't continue, because you already ran this job today!
                2) This is a rerun - continue.
                >
                If number 2 is selected the first thing that happens is that it deletes
                every record that was inserted today prior to doing the regular insert
                process. (You may ask, why not just skip over the ones that are already
                there. It's because we may be rerunning with an updated input file, where
                the input records may be different than during the first run.)
                >
                Anyway, I figured with DB2 this would be a snap. All I'd need to do is:
                EXEC SQL
                    DELETE FROM FILM.FILM_TRANSACTIONS
                    WHERE UPDATE_DATE = FB_FUNC.TO_DATE(:CURR-DATE-JUL-PACKED)
                END-EXEC
                >
                The only problem is that my log file would end up running out of room. So
                now I've come up with the following:
                >
                DELETE-TODAY SECTION.
                    DISPLAY 'DELETE PROCESS BEGINS' UPON CONSOLE
                    PERFORM WITH TEST AFTER
                            UNTIL SQLCODE = 100
                        DISPLAY 'COMMITTING...' UPON CONSOLE
                        PERFORM COMMIT-UOW
                        DISPLAY 'DELETING 10000' UPON CONSOLE
                        PERFORM DB2-DELETE-TODAY
                    END-PERFORM
                    PERFORM COMMIT-UOW
                    DISPLAY 'DELETE PROCESS ENDS' UPON CONSOLE
                    .
                >
                DB2-DELETE-TODAY SECTION.
                    EXEC SQL
                        DELETE FROM (
                            SELECT UPDATE_DATE
                            FROM FILM.FILM_TRANSACTIONS
                            WHERE UPDATE_DATE = FB_FUNC.TO_DATE(:CURR-DATE-JUL-PACKED)
                            FETCH FIRST 10000 ROWS ONLY
                        )
                        WHERE 1 = 1
                    END-EXEC
                    CALL CHECKERR USING SQLCA ERRLOC
                    .
                >
                My question is, is this the way to go or is there some better way?
                >
                I tried making the "10000" a host variable, but that didn't work. Any way
                around this?
                >
                You may be wondering why I put the "WHERE 1 = 1" clause on the DELETE
                statement. This is because DB2 gives a warning if you pre-compile a DELETE
                or UPDATE statement without a WHERE clause. It still works, but I like to
                avoid warnings.
                >
                Thanks!
                Frank
                >
                >
                ---
                Frank Swarbrick
                Senior Developer/Analyst - Mainframe Applications
                FirstBank Data Corporation - Lakewood, CO USA
                --
                -~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
                | In A World Without Fences Who Needs Gates? |
                | Experience Linux. |
                -~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-


                • andyhe

                  #23
                  Re: deleting large numbers of records

                  >What would be nice to have in this respect is an option for the DELETE
                  >statement to explicitly turn off logging - which would have a certain
                  >amount of problems as I just mentioned. Truncating a whole table is
                  >supported that way already: you can use ALTER TABLE ... ACTIVATE NOT LOGGED
                  >INITIALLY WITH EMPTY TABLE for that. Maybe this, combined with range
                  >partitioning is an option for you?
                  >
                  What 'activate not logged initially' does (without the 'with empty
                  table' option) is that it suspends logging until the next commit. So
                  you CAN use it without partitioning. Just activate not logged
                  initially, do your delete and commit.
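
                  A minimal sketch of that sequence, assuming autocommit is off (the NOT
                  LOGGED INITIALLY state only lasts until the end of the current unit of
                  work) and using the table from the original post:

                  -- Must be issued in the same unit of work as the DELETE
                  ALTER TABLE FILM.FILM_TRANSACTIONS ACTIVATE NOT LOGGED INITIALLY;

                  DELETE FROM FILM.FILM_TRANSACTIONS
                  WHERE  UPDATE_DATE = CURRENT DATE;

                  COMMIT;
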
                  Beware of any errors (already mentioned)! If for some reason your
                  delete fails, db2 cannot roll back and the table becomes invalid. Also,
                  if you have to rollforward after a restore, there is no way to
                  rollforward over a 'not logged' transaction; your table will be invalid
                  after the rollforward.
                  We have used this feature for a one-time extension of a very big table,
                  but took precautions before doing it: backup AND snapshot. :-)
                  By the way: emptying a complete table is very easy with an import from
                  /dev/null (on unix).


                  • Frank Swarbrick

                    #24
                    Re: deleting large numbers of records

                    Good thought, but with two years of data, of 100,000 to 1,000,000 records a
                    day I don't think this would be very efficient! :-)
                    (This is a transaction history file with two years of transactions.)

                    Thanks,

                    Frank

                    ---
                    Frank Swarbrick
                    Senior Developer/Analyst - Mainframe Applications
                    FirstBank Data Corporation - Lakewood, CO USA
                    Gregor Kovač <gregor.kovac@mikropis.si> 01/26/07 12:21 AM >>>
                    Hi!

                    I'd propose another method that involves EXPORT and a LOAD operation:
                    - first you EXPORT the rows you want to preserve
                    - then do a LOAD with REPLACE
                    Example:
                    1.) EXPORT TO TABLE1.IXF OF IXF SELECT * FROM
                    SCHEMA.TABLE1 WHERE DATE BETWEEN CURRENT DATE - 3 DAYS AND CURRENT DATE
                    2.) LOAD FROM TABLE1.IXF OF IXF REPLACE INTO SCHEMA.TABLE1
                    After running this you'll probably have to do some SET INTEGRITY statements
                    if you have RI.

                    I only know DB2 for LUW, but if your ADMIN_CMD procedure supports EXPORT and
                    LOAD operations then you could even do this via SQL procedure.

                    Best regards,
                    Kovi



                    • Art S. Kagel

                      #25
                      Re: deleting large numbers of records

                      Frank Swarbrick wrote:
                      Knut Stolze <stolze@de.ibm.com> 01/25/07 1:38 AM >>>
                      >
                      >>Frank Swarbrick wrote:
                      >>
                      >>
                      >>>Mark A <nobody@nowhere.com> 01/24/07 4:36 PM >>>
                      >>>
                      >>>>Yes there is a better way that will avoid filling up your DB2 z/OS logs.
                      >>>
                      >>>Actually, DB2/LUW, but I'm guessing your advice still applies.
                      >>
                      >>In that case, have you considered the MERGE statement? Then you may not
                      >>have to DELETE the rows at all - just UPDATE them.
                      >
                      >
                      Yet another thing I am not familiar with. I will look into it. Thanks.
                      Just updating the non-key columns in each existing row (and inserting the
                      missing row if the record was not in the original change set) will be FAR
                      more efficient. The delete-then-add operations must update each index key
                      twice - once to remove the original record's keys and again to re-add the
                      row's keys. This is very expensive.

                      If you can determine whether all (or at least most) operations will be
                      replacing an existing row, and apparently you can, then just do the updates
                      in place. My testing has found that it is ALWAYS faster to update than to
                      delete-then-add, and faster than trying to insert the row and updating
                      instead if the insert fails with a unique key violation (i.e. the row
                      already exists) when fewer than about 30% of the rows are new. When
                      more than about 70% of the rows do not already exist it is usually faster
                      to do the insert and update if the insert fails (the exact cutoff depends
                      on the number of indexes, whether the update modifies indexed columns, the
                      order the database engine chooses to check the unique keys for violations,
                      etc.).
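
                      As a rough SQL PL sketch of the "insert, and update if the insert fails"
                      variant (this is not Art's code; the procedure name, parameters, and
                      TRANS_ID column are assumptions):

                      CREATE PROCEDURE FILM.UPSERT_TRANS (IN p_id INTEGER, IN p_date DATE)
                      LANGUAGE SQL
                      BEGIN
                          -- On a duplicate key (SQLSTATE 23505) fall back to an UPDATE
                          DECLARE CONTINUE HANDLER FOR SQLSTATE '23505'
                              UPDATE FILM.FILM_TRANSACTIONS
                              SET    UPDATE_DATE = p_date
                              WHERE  TRANS_ID = p_id;

                          INSERT INTO FILM.FILM_TRANSACTIONS (TRANS_ID, UPDATE_DATE)
                          VALUES (p_id, p_date);
                      END

                      The update-first variant is the mirror image: UPDATE, then INSERT only
                      when GET DIAGNOSTICS reports a row count of zero.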

                      Obviously YMMV so you need to test it yourself.

                      Art S. Kagel

