Delete and concurrency problems

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Michel Esber

    Delete and concurrency problems

    Hello,

    Env: DB2 V8 LUW FP16 running Linux

    create Table X (machine_id varchar(24) not null, ctime timestamp not
    null);
    create index iFoo on X (MACHINE_ID, CTIME) allow reverse scans;
    alter table X add primary key (MACHINE_ID, CTIME);

    Our C++ application inserts data into a table X using CLI array insert
    with very good throughput.

    On an hourly basis, another application (running on a different
    server) reads the last 'hour':

    select * from T where MACHINE_ID = ? and ctime between current
    timestamp - 1 hour and current timestamp for read only;

    After that, it starts a delete loop:

    while (SQLCODE <100 )
    do
    delete from (select 1 from T where MACHINE_ID = ? and CTIME between
    current timestamp - 1 hour and current timestamp fetch first 2000 rows
    only) as B
    done


    The deletes have severe impact on my insert rate/throughput. I can see
    number 3 times slower, or even worse. There are absolutely *NO* lock-
    waits.

    Question: Should I expect this performance degradation or am I missing
    something ? Maybe there is a better technique to delete data.

    Any thoughts ?

    Thanks in advance,
  • Mark A

    #2
    Re: Delete and concurrency problems

    "Michel Esber" <michel@us.auto matos.comwrote in message
    news:d3aeaf8f-d888-4b1d-8e30-f70937b1331f@v3 9g2000pro.googl egroups.com...
    Hello,
    >
    Env: DB2 V8 LUW FP16 running Linux
    >
    create Table X (machine_id varchar(24) not null, ctime timestamp not
    null);
    create index iFoo on X (MACHINE_ID, CTIME) allow reverse scans;
    alter table X add primary key (MACHINE_ID, CTIME);
    >
    Our C++ application inserts data into a table X using CLI array insert
    with very good throughput.
    >
    On an hourly basis, another application (running on a different
    server) reads the last 'hour':
    >
    select * from T where MACHINE_ID = ? and ctime between current
    timestamp - 1 hour and current timestamp for read only;
    >
    After that, it starts a delete loop:
    >
    while (SQLCODE <100 )
    do
    delete from (select 1 from T where MACHINE_ID = ? and CTIME between
    current timestamp - 1 hour and current timestamp fetch first 2000 rows
    only) as B
    done
    >
    >
    The deletes have severe impact on my insert rate/throughput. I can see
    number 3 times slower, or even worse. There are absolutely *NO* lock-
    waits.
    >
    Question: Should I expect this performance degradation or am I missing
    something ? Maybe there is a better technique to delete data.
    >
    Any thoughts ?
    >
    Thanks in advance,
    db2set DB2_SKIPDELETED =ON
    db2set DB2_SKIPINSERTE D=ON

    then restart the instance (db2stop, db2start)


    Comment

    • Michel Esber

      #3
      Re: Delete and concurrency problems

      db2set DB2_SKIPDELETED =ON
      db2set DB2_SKIPINSERTE D=ON
      >
      then restart the instance (db2stop, db2start)- Ocultar texto entre aspas -
      Hi Mark,

      Thanks for your feedback. However, this variable was already set.
      Here is my config:

      DB2LINUXAIO=TRU E
      DB2_SKIPINSERTE D=ON
      DB2_SCATTERED_I O=ON
      DB2_USE_ALTERNA TE_PAGE_CLEANIN G=ON
      DB2_EVALUNCOMMI TTED=ON
      DB2_SKIPDELETED =ON
      DB2COMM=tcpip
      DB2_PARALLEL_IO =*
      DB2AUTOSTART=NO

      Maybe v9 has a better locking/concurrency mechanism ?

      -M

      Comment

      • Michel Esber

        #4
        Re: Delete and concurrency problems

        I was reading Chris Eaton´s article:

        Tackle tech challenges together. Join our IT community to connect, share your expertise, get answers, stay up with trends, advance your career, and keep IT fun!


        And it seems like the use of select from delete may be useful here.

        However, here is the problem:

        My application has to read up to 500K rows, summarizes data, and then
        it loops deleting data in chunks of 2k rows.
        I do need to read ALL the rows within a range and only after that
        start the deletion.

        select * from OLD TABLE (delete from X where MACHINE_ID = ? and ctime
        between ? and ? fetch first 2000 rows only) doesn´t seem to suit my
        needs.

        Any thoughts?

        Thanks,

        Comment

        • Serge Rielau

          #5
          Re: Delete and concurrency problems

          Michel Esber wrote:
          I was reading Chris Eaton´s article:
          >
          Tackle tech challenges together. Join our IT community to connect, share your expertise, get answers, stay up with trends, advance your career, and keep IT fun!

          >
          And it seems like the use of select from delete may be useful here.
          >
          However, here is the problem:
          >
          My application has to read up to 500K rows, summarizes data, and then
          it loops deleting data in chunks of 2k rows.
          I do need to read ALL the rows within a range and only after that
          start the deletion.
          >
          select * from OLD TABLE (delete from X where MACHINE_ID = ? and ctime
          between ? and ? fetch first 2000 rows only) doesn´t seem to suit my
          needs.
          General question: How does teh plan look like? Are you deleting by index
          or does DB2 scan?

          Cheers
          Serge



          --
          Serge Rielau
          DB2 Solutions Development
          IBM Toronto Lab

          Comment

          • Michel Esber

            #6
            Re: Delete and concurrency problems

            General question: How does teh plan look like? Are you deleting by index
            or does DB2 scan?
            Hello Serge,

            It is deleting by index. Here is the full plan:

            *************** ***** EXPLAIN INSTANCE *************** *****

            DB2_VERSION: 08.02.8
            SOURCE_NAME: SQLC2E07
            SOURCE_SCHEMA: NULLID
            SOURCE_VERSION:
            EXPLAIN_TIME: 2008-09-19-11.05.35.744822
            EXPLAIN_REQUEST ER: DB2INST1

            Database Context:
            ----------------
            Parallelism: None
            CPU Speed: 4.802167e-07
            Comm Speed: 0
            Buffer Pool size: 189000
            Sort Heap size: 2048
            Database Heap size: 16000
            Lock List size: 8192
            Maximum Lock List: 20
            Average Applications: 50
            Locks Available: 167116

            Package Context:
            ---------------
            SQL Type: Dynamic
            Optimization Level: 5
            Blocking: Block All Cursors
            Isolation Level: Cursor Stability



            ---------------- STATEMENT 1 SECTION 203 ----------------
            QUERYNO: 1
            QUERYTAG:
            Statement Type: Searched Delete
            Updatable: Not Applicable
            Deletable: Not Applicable
            Query Degree: 1

            Original Statement:
            ------------------
            DELETE
            FROM RTM.TBL_COLLECT _PSSTAT_WIN_RTM
            where MACHINE_ID = ? and collect_time between ? and ?


            Optimized Statement:
            -------------------
            DELETE
            FROM RTM.TBL_COLLECT _PSSTAT_WIN_RTM AS Q1
            WHERE $RID$ IN
            (SELECT $RID$
            FROM RTM.TBL_COLLECT _PSSTAT_WIN_RTM AS Q2
            WHERE (:? <= :?) AND (Q2.COLLECT_TIM E <= :?) AND (:? <=
            Q2.COLLECT_TIME )
            AND (Q2.MACHINE_ID = :?))

            Access Plan:
            -----------
            Total Cost: 73.0298
            Query Degree: 1

            Rows
            RETURN
            ( 1)
            Cost
            I/O
            |
            8.39196
            DELETE
            ( 2)
            73.0298
            11.392
            /------+------\
            8.39196 2.918e+06
            IXSCAN TABLE: RTM
            ( 3) TBL_COLLECT_PSS TAT_WIN_RTM
            19.2689
            3
            |
            2.918e+06
            INDEX: DB2INST1
            IX_PS_WIN




            Extended Diagnostic Information:
            --------------------------------

            No extended Diagnostic Information for this statment.


            Plan Details:
            -------------


            1) RETURN: (Return Result)
            Cumulative Total Cost: 73.0298
            Cumulative CPU Cost: 252522
            Cumulative I/O Cost: 11.392
            Cumulative Re-Total Cost: 53.794
            Cumulative Re-CPU Cost: 178061
            Cumulative Re-I/O Cost: 8.39196
            Cumulative First Row Cost: 73.0298
            Estimated Bufferpool Buffers: 12.392

            Arguments:
            ---------
            BLDLEVEL: (Build level)
            DB2 v8.1.2.136 : special_19546
            HEAPUSE : (Maximum Statement Heap Usage)
            56 Pages
            STMTHEAP: (Statement heap size)
            4096

            Input Streams:
            -------------
            4) From Operator #2

            Estimated number of rows:
            8.39196
            Number of columns: 0
            Subquery predicate ID: Not
            Applicable


            2) DELETE: (Delete)
            Cumulative Total Cost: 73.0298
            Cumulative CPU Cost: 252522
            Cumulative I/O Cost: 11.392
            Cumulative Re-Total Cost: 53.794
            Cumulative Re-CPU Cost: 178061
            Cumulative Re-I/O Cost: 8.39196
            Cumulative First Row Cost: 73.0298
            Estimated Bufferpool Buffers: 12.392

            Input Streams:
            -------------
            2) From Operator #3

            Estimated number of rows:
            8.39196
            Number of columns: 2
            Subquery predicate ID: Not
            Applicable

            Column Names:
            ------------
            +Q3.$C0+Q3.$C1


            Output Streams:
            --------------
            3) To Object RTM.TBL_COLLECT _PSSTAT_WIN_RTM

            Estimated number of rows: 2.918e
            +06
            Number of columns: 1
            Subquery predicate ID: Not
            Applicable

            Column Names:
            ------------
            +Q1.$RID$

            4) To Operator #1

            Estimated number of rows:
            8.39196
            Number of columns: 0
            Subquery predicate ID: Not
            Applicable


            3) IXSCAN: (Index Scan)
            Cumulative Total Cost: 19.2689
            Cumulative CPU Cost: 143426
            Cumulative I/O Cost: 3
            Cumulative Re-Total Cost: 0.0331183
            Cumulative Re-CPU Cost: 68965.2
            Cumulative Re-I/O Cost: 0
            Cumulative First Row Cost: 19.245
            Estimated Bufferpool Buffers: 4

            Arguments:
            ---------
            MAXPAGES: (Maximum pages for prefetch)
            1
            PREFETCH: (Type of Prefetch)
            NONE
            ROWLOCK : (Row Lock intent)
            UPDATE
            SCANDIR : (Scan Direction)
            FORWARD
            TABLOCK : (Table Lock intent)
            INTENT EXCLUSIVE

            Predicates:
            ----------
            2) Sargable Predicate
            Relational Operator: Less Than or
            Equal (<=)
            Subquery Input Required: No
            Filter Factor: 0.333333

            Predicate Text:
            --------------
            (:? <= :?)

            3) Stop Key Predicate
            Relational Operator: Less Than or
            Equal (<=)
            Subquery Input Required: No
            Filter Factor: 0.0200165

            Predicate Text:
            --------------
            (Q2.COLLECT_TIM E <= :?)

            4) Start Key Predicate
            Relational Operator: Less Than or
            Equal (<=)
            Subquery Input Required: No
            Filter Factor: 0.0200165

            Predicate Text:
            --------------
            (:? <= Q2.COLLECT_TIME )

            5) Start Key Predicate
            Relational Operator: Equal (=)
            Subquery Input Required: No
            Filter Factor: 0.00143678

            Predicate Text:
            --------------
            (Q2.MACHINE_ID = :?)

            5) Stop Key Predicate
            Relational Operator: Equal (=)
            Subquery Input Required: No
            Filter Factor: 0.00143678

            Predicate Text:
            --------------
            (Q2.MACHINE_ID = :?)


            Input Streams:
            -------------
            1) From Object DB2INST1.IX_PS_ WIN

            Estimated number of rows: 2.918e
            +06
            Number of columns: 3
            Subquery predicate ID: Not
            Applicable

            Column Names:
            ------------
            +Q2.$RID$+Q2.MA CHINE_ID
            +Q2.COLLECT_TIM E


            Output Streams:
            --------------
            2) To Operator #2

            Estimated number of rows:
            8.39196
            Number of columns: 2
            Subquery predicate ID: Not
            Applicable

            Column Names:
            ------------
            +Q3.$C0+Q3.$C1


            Objects Used in Access Plan:
            ---------------------------

            Schema: DB2INST1
            Name: IX_PS_WIN
            Type: Index
            Time of creation:
            2008-07-11-18.07.56.326366
            Last statistics update:
            2008-09-06-10.49.52.692970
            Number of columns: 4
            Number of rows: 2917997
            Width of rows: -1
            Number of buffer pool pages: 27158
            Distinct row values: Yes
            Tablespace name: PSIDXWIN
            Tablespace overhead: 6.000000
            Tablespace transfer rate: 0.400000
            Source for statistics: Single Node
            Prefetch page count: 128
            Container extent page count: 32
            Index clustering statistic: 0.941143
            Index leaf pages: 51075
            Index tree levels: 4
            Index full key cardinality: 2917997
            Index first key cardinality: 696
            Index first 2 keys cardinality: 28598
            Index first 3 keys cardinality: 2917997
            Index first 4 keys cardinality: 2917997
            Index sequential pages: 51074
            Index page density: 98
            Index avg sequential pages: 51074
            Index avg gap between sequences:0
            Index avg random pages: 0
            Fetch avg sequential pages: -1
            Fetch avg gap between sequences:-1
            Fetch avg random pages: -1
            Index RID count: 2917997
            Index deleted RID count: 0
            Index empty leaf pages: 0
            Base Table Schema: RTM
            Base Table Name:
            TBL_COLLECT_PSS TAT_WIN_RTM
            Columns in index:
            MACHINE_ID
            COLLECT_TIME
            PROCESS_PID
            PROCESS_NAME

            Schema: RTM
            Name: TBL_COLLECT_PSS TAT_WIN_RTM
            Type: Table
            Time of creation:
            2008-01-17-11.51.05.399878
            Last statistics update:
            2008-09-06-10.49.52.692970
            Number of columns: 12
            Number of rows: 2917997
            Width of rows: 50
            Number of buffer pool pages: 27158
            Distinct row values: No
            Tablespace name: PSDATWIN
            Tablespace overhead: 6.000000
            Tablespace transfer rate: 0.400000
            Source for statistics: Single Node
            Prefetch page count: 128
            Container extent page count: 32
            Table overflow record count: 0
            Table Active Blocks: -1

            Comment

            • w.l.fischer@googlemail.com

              #7
              Re: Delete and concurrency problems

              Just ideas... You might as well already have ruled them out...

              One way to delete rows fast ist having them clustered by an MDC-index
              (can be one-dimensional) and the registry variable DB2_MDC_ROLLOUT
              being set (since DB2 V8.2 I believe). Just heard about it and I would
              be interested in how much deletes are sped up by this in reality...

              Another way of course is DB2 V9 partitioning and dropping partitions.

              Comment

              • yongleig@gmail.com

                #8
                Re: Delete and concurrency problems

                There are always contention on bufferpool and index root and leaf
                pages between delete and insert whenever delete starts because delete
                is always to delete the latest rows, those are rows that the insert
                process has just inserted. So the performance degradation is expected.

                RCT table maybe is worth to try, at least less contention. But there
                are many restrictions on RCT table.

                Comment

                • stefan.albert

                  #9
                  Re: Delete and concurrency problems

                  I don't really understand what you want to do with these steps...:
                  1) permanent (and concurrent) inserts with actual timestamp
                  2) every hour read / summarize - until here I understand - then delete
                  all until "now" - this I don't understand... because you will delete
                  un"processed/summarized" rows.

                  And also: You create an index and then a primary key with the same
                  columns... this throws an error, because the same columns can't be
                  used twice for an index.

                  Just my few cents...

                  On Sep 18, 10:21 pm, Michel Esber <mic...@us.auto matos.comwrote:
                  Hello,
                  >
                  Env: DB2 V8 LUW FP16 running Linux
                  >
                  create Table X (machine_id varchar(24) not null, ctime timestamp not
                  null);
                  create index iFoo on X (MACHINE_ID, CTIME) allow reverse scans;
                  alter table X add primary key (MACHINE_ID, CTIME);
                  >
                  Our C++ application inserts data into a table X using CLI array insert
                  with very good throughput.
                  >
                  On an hourly basis, another application (running on a different
                  server) reads the last 'hour':
                  >
                  select * from T where MACHINE_ID = ? and ctime between current
                  timestamp - 1 hour and current timestamp for read only;
                  >
                  After that, it starts a delete loop:
                  >
                  while (SQLCODE <100 )
                  do
                  delete from (select 1 from T where MACHINE_ID = ? and CTIME between
                  current timestamp - 1 hour and current timestamp fetch first 2000 rows
                  only) as B
                  done
                  >
                  The deletes have severe impact on my insert rate/throughput. I can see
                  number 3 times slower, or even worse. There are absolutely *NO* lock-
                  waits.
                  >
                  Question: Should I expect this performance degradation or am I missing
                  something ? Maybe there is a better technique to delete data.
                  >
                  Any thoughts ?
                  >
                  Thanks in advance,

                  Comment

                  Working...