DB2 vs MySQL - performance on large tables

  • Serge Rielau

    #46
    Re: DB2 vs MySQL - performance on large tables

    Bing,

    Let's try the following experiment:
    SELECT COUNT(*) FROM (SELECT
    case WHEN x <> y then cast(raise_error('70000', 'dummy') end,
    case WHEN x <> y then cast(raise_error('70000', 'dummy') end FROM
    COORDINATE AS C, FRAME AS F WHERE F.TID=1 AND C.FID=F.ID AND F.ID<1000
    AND F.ID>0 ) AS T ORDER BY C.FID,C.AID;
    vs.
    SELECT
    case WHEN x <> y then cast(raise_error('70000', 'dummy') end,
    case WHEN x <> y then cast(raise_error('70000', 'dummy') end FROM
    COORDINATE AS C, FRAME AS F WHERE F.TID=1 AND C.FID=F.ID AND F.ID<1000
    AND F.ID>0 ORDER BY C.FID,C.AID;

    What would this experiment do?
    First, the CASE with raise_error() ensures DB2 doesn't simplify the
    query (it can't drop the raise_error()). Be conscious about placing the
    WHERE in the inner select!
    Now, using the COUNT(*) cuts out all the network traffic, so we'll
    get a fair idea of how much work DB2 (the engine) is doing vs. how much
    is spent in the client-server communication.
    If the cost is in the engine, all those nice proposals about indexes,
    bufferpools, etc. may be relevant. If not, they are pointless.
    The one dial that hasn't been mentioned yet is the communication buffer.
    You should already get BLOCKING (i.e. prefetching of the result set into
    the comm buffer). Now we need to dial up the buffer itself.
    I think it's the DB2COMM special register, likely a client-side thing
    (never trust the SQL compiler guy ;-)

    Cheers
    Serge
    --
    Serge Rielau
    DB2 SQL Compiler Development
    IBM Toronto Lab


    • Database Guy

      #47
      Re: DB2 vs MySQL - performance on large tables

      "Mark A" <ma@switchboard.net> wrote in message news:<gkytb.224$iP6.100020@news.uswest.net>...
      > Sequential prefetch for the table and/or indexes
      > would probably be used in an optimal access path. In such cases, table and
      > index placement is important, as is extent size, prefetch size, and
      > bufferpool page size.

      I think Bing stated he's running on a single disk. He also stated that
      DB2 is "slow" (10 minutes) when doing a simple select returning many
      rows. So I still feel 20,000 rows/sec is all he's likely to get.
      There's no point making DB2 run the query faster internally if the
      coordinator/client interaction can't spool the results out fast
      enough.

      Perhaps Bing could try timing the following query, to test your theory
      that i/o throughput is a problem and that fetching 20,000/sec can be
      improved on:

      select x, y, z from coordinate where fid < 1000
      and (x + y) = (z * -1) - 1;

      The purpose of the second, added line (non-indexable) is to reduce the
      resultset. If the query runs much quicker than 10 minutes then it
      indicates the extent to which fetch rate is the bottleneck.

      Bing, if you read this, could you also let us know how many rows your
      10 minute query returned:

      select x, y, z from coordinate where fid < 1000


      DG


      • Gert van der Kooij

        #48
        Re: DB2 vs MySQL - performance on large tables

        In article <bp7q1q$ssb$1@hanover.torolab.ibm.com>, srielau@ca.eye-bee-m.com says...
        > The one dial that hasn't been mentioned yet is the communication buffer.
        > You should already get BLOCKING (i.e. prefetching of the result set into
        > the comm buffer). Now we need to dial up the buffer itself.
        > I think it's the DB2COMM special register, likely a client-side thing
        > (never trust the SQL compiler guy ;-)

        You're right :)
        It's the DBM CFG parameter RQRIOBLK (max requester I/O block size).
        By default it's 32767, the maximum is 65535.

        But it is used only when connecting from a remote client.
        I don't know enough about the internal memory sizes for local
        clients, but matching the RQRIOBLK size with an OPTIMIZE FOR n ROWS
        clause on a select improves performance for remote clients.
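
        To make that concrete, a rough sketch only - 65535 and 100000 are
        example values, not recommendations, and the dbm cfg change generally
        takes effect only after the instance is recycled:

        $ db2 update dbm cfg using RQRIOBLK 65535
        $ db2stop
        $ db2start

        SELECT x, y, z FROM coordinate WHERE fid < 1000
        OPTIMIZE FOR 100000 ROWS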


        • Blair Adamache

          #49
          Re: DB2 vs MySQL - performance on large tables

          The operating system being used is almost certainly Linux.

          Daniel Morgan wrote:
          > ... I don't know what
          > operating system is being used here, one can't even guarantee that a
          > single file laid down on a disk will be contiguous except within a raw
          > partition.


          • Daniel Morgan

            #50
            Re: DB2 vs MySQL - performance on large tables

            Blair Adamache wrote:
            > The operating system being used is almost certainly Linux.
            >
            > Daniel Morgan wrote:
            >
            >> ... I don't know what operating system is being used here, one can't
            >> even guarantee that a single file laid down on a disk will be
            >> contiguous except within a raw
            >> partition.

            I can't speak to a generic, much less any specific, Linux version. But
            some operating systems are designed to attempt optimization by
            spreading a single file across the disk, even a brand new disk with
            nothing else on it. May not affect you, but something to consider.
            --
            Daniel Morgan
            We make it possible for you to keep learning at the University of Washington, even if you work full time or live outside of the Seattle area.

            damorgan@x.washington.edu
            (replace 'x' with a 'u' to reply)


            • Bing Wu

              #51
              Re: DB2 vs MySQL - performance on large tables

              Yes, just to confirm, the OS is

              Linux 2.4.9-31enterprise #1 SMP vie jun 7 12:56:14 PDT 2002 i686 unknown

              Some hardware info below:-

              CPU info:

              processor : 0
              vendor_id : GenuineIntel
              cpu family : 6
              model : 11
              model name : Intel(R) Pentium(R) III CPU family 1400MHz
              stepping : 1
              cpu MHz : 1396.496
              cache size : 512 KB

              processor : 1
              vendor_id : GenuineIntel
              cpu family : 6
              model : 11
              model name : Intel(R) Pentium(R) III CPU family 1400MHz
              stepping : 1
              cpu MHz : 1396.496
              cache size : 512 KB

              Mem info:

              Mem: 1052528640 990863360 61665280 462897152 128512000 295145472
              Swap: 945864704 344354816 601509888
              MemTotal: 1027860 kB
              MemFree: 60220 kB
              MemShared: 452048 kB
              Buffers: 125500 kB
              Cached: 39164 kB
              SwapCached: 249064 kB
              Active: 550596 kB

              I haven't upgraded to RH9 since DB2 doesn't officially support it.

              Thanks,

              Bing

              Blair Adamache wrote:
              > The operating system being used is almost certainly Linux.
              >
              > Daniel Morgan wrote:
              >
              >> ... I don't know what operating system is being used here, one can't
              >> even guarantee that a single file laid down on a disk will be
              >> contiguous except within a raw
              >> partition.


              • Bing Wu

                #52
                Re: DB2 vs MySQL - performance on large tables

                Database Guy wrote:
                > Perhaps Bing could try timing the following query, to test your theory
                > that i/o throughput is a problem and that fetching 20,000/sec can be
                > improved on:
                >
                > select x, y, z from coordinate where fid < 1000
                > and (x + y) = (z * -1) - 1;

                Currently rebuilding indexes; will test the SQL later.
                > The purpose of the second, added line (non-indexable) is to reduce the
                > resultset. If the query runs much quicker than 10 minutes then it
                > indicates the extent to which fetch rate is the bottleneck.
                >
                > Bing, if you read this, could you also let us know how many rows your
                > 10 minute query returned:
                >
                > select x, y, z from coordinate where fid < 1000

                The above SQL returns 8.5 million rows (about 500 MB in the output file), roughly 5% of the data in the table.
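
                For reference, 8.5 million rows in roughly 10 minutes works out to
                about 8,500,000 / 600 s = ~14,000 rows/s, or a little under 1 MB/s
                written to the output file.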

                One strange thing: if I run this:

                SELECT C.X,C.Y,C.Z FROM COORDINATE AS C, FRAME AS F
                WHERE C.FID=F.ID AND F.ID<1000

                It returns the same results but in 7 minutes! I don't understand this. Please enlighten me. BTW, I time it with:

                $ date; db2 "SELECT ...." > output.file; date

                Many thanks,

                Bing







                • Mark A

                  #53
                  Re: DB2 vs MySQL - performance on large tables

                  "Bing Wu" <bing@biop.ox.ac.uk> wrote in message
                  news:bpauh4$pc8$1@news.ox.ac.uk...
                  > Yes, just to confirm, the OS is
                  >
                  > Linux 2.4.9-31enterprise #1 SMP vie jun 7 12:56:14 PDT 2002 i686 unknown
                  >
                  > [snip: CPU and memory details quoted in full above]
                  Bing,

                  You may already be doing this, but just in case you are not, here are some
                  further suggestions:

                  Since you have 2 processors, you should try to encourage DB2 intra-partition
                  parallelism (if you are not already doing so). This can be done even without
                  DB2 ESE with DPF. The best way to do this is to create 2 (or a multiple
                  thereof) containers in a DMS tablespace. Then make sure that the degree of
                  parallelism parameter is set to 2 or ANY.

                  This would work even better if you could create the containers on separate
                  physical drives. When you create the containers with the Control Center, it
                  will help you configure the extent size and prefetch size in an optimal
                  configuration. Basically, the prefetch size should be a multiple of the
                  extent size.
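
                  For example, something along these lines - just a sketch, since the
                  paths, container sizes and database name below are made up; adjust
                  them to your system (note the prefetch size is twice the extent size):

                  CREATE TABLESPACE coord_ts
                    MANAGED BY DATABASE
                    USING (FILE '/data1/coord_cont0' 500000,
                           FILE '/data2/coord_cont1' 500000)
                    EXTENTSIZE 32
                    PREFETCHSIZE 64;

                  $ db2 update dbm cfg using INTRA_PARALLEL YES
                  $ db2 update db cfg for yourdb using DFT_DEGREE 2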



                  • Bing Wu

                    #54
                    Re: DB2 vs MySQL - performance on large tables

                    Mark A wrote:
                    > Since you have 2 processors, you should try to encourage DB2 intra-partition
                    > parallelism (if you are not already doing so). This can be done even without
                    > DB2 ESE with DPF. The best way to do this is to create 2 (or a multiple
                    > thereof) containers in a DMS tablespace. Then make sure that the degree of
                    > parallelism parameter is set to 2 or ANY.

                    Thanks for the info. Just to confirm, the parallelism parameter has been set to 2.

                    $ db2 get dbm configuration | grep parall

                    Maximum query degree of parallelism (MAX_QUERYDEGREE) = 2
                    Enable intra-partition parallelism (INTRA_PARALLEL) = YES

                    The DB2 version is ESE 8.1.3, patch 4.
                    > This would work even better if you could create the containers on separate
                    > physical drives. When you create the containers with the Control Center, it
                    > will help you configure the extent size and prefetch size in an optimal
                    > configuration. Basically, the prefetch size should be a multiple of the
                    > extent size.

                    In my case, the server has only one hard drive at the moment. The DB2 defaults are:

                    DFT_EXTENT_SZ = 32
                    DFT_PREFETCH_SZ = 32
                    (page size: 4KB)
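
                    Those values come from the database configuration; they can be checked
                    with something like this (substitute the real database name for "mydb"):

                    $ db2 get db cfg for mydb | grep -i "extent\|prefetch"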

                    Many thanks,

                    Bing



                    • Bing Wu

                      #55
                      Re: DB2 vs MySQL - performance on large tables

                      Database Guy wrote:
                      > Perhaps Bing could try timing the following query, to test your theory
                      > that i/o throughput is a problem and that fetching 20,000/sec can be
                      > improved on:
                      >
                      > select x, y, z from coordinate where fid < 1000
                      > and (x + y) = (z * -1) - 1;

                      The SQL completed in just 22 seconds:

                      Tue Nov 18 17:56:27 GMT 2003

                      X Y Z
                      ------------------------ ------------------------ ------------------------
                      -9.31600E+000 +3.30400E+000 +5.01200E+000
                      -3.14100E+000 +3.27600E+000 -1.13500E+000
                      +5.08800E+000 +4.52000E+000 -1.06080E+001
                      -1.24380E+001 +1.80490E+001 -6.61100E+000
                      -7.53300E+000 +2.36800E+000 +4.16500E+000
                      +6.69900E+000 -5.14100E+000 -2.55800E+000
                      -4.95500E+000 +1.48270E+001 -1.08720E+001
                      +1.59270E+001 -5.64900E+000 -1.12780E+001
                      -1.40620E+001 +1.65760E+001 -3.51400E+000
                      +1.11800E+000 -1.15960E+001 +9.47800E+000
                      +7.34000E+000 +2.82100E+000 -1.11610E+001
                      +3.46800E+000 +3.53200E+000 -8.00000E+000
                      +6.26100E+000 +7.68700E+000 -1.49480E+001
                      -4.98700E+000 +8.91000E+000 -4.92300E+000
                      +1.21160E+001 +4.85600E+000 -1.79720E+001
                      -1.58230E+001 +9.37200E+000 +5.45100E+000
                      -8.39700E+000 -5.00200E+000 +1.23990E+001
                      +1.18110E+001 +4.91800E+000 -1.77290E+001
                      +1.08710E+001 -8.07400E+000 -3.79700E+000
                      +3.70300E+000 +1.52000E+001 -1.99030E+001
                      +1.66220E+001 -5.71000E+000 -1.19120E+001
                      +4.61500E+000 +9.30000E+000 -1.49150E+001
                      +1.45170E+001 -1.02910E+001 -5.22600E+000
                      +7.60300E+000 -1.56300E+001 +7.02700E+000
                      +6.84300E+000 +6.03200E+000 -1.38750E+001
                      -9.42900E+000 +2.74500E+000 +5.68400E+000
                      -3.78400E+000 +7.61800E+000 -4.83400E+000
                      +6.33700E+000 +7.54500E+000 -1.48820E+001
                      -1.01420E+001 +1.88600E+001 -9.71800E+000
                      -1.47090E+001 +2.67000E+000 +1.10390E+001
                      +7.58700E+000 +6.17000E+000 -1.47570E+001
                      -1.80590E+001 +8.09400E+000 +8.96500E+000

                      32 record(s) selected.

                      Tue Nov 18 17:56:49 GMT 2003
                      > The purpose of the second, added line (non-indexable) is to reduce the
                      > resultset. If the query runs much quicker than 10 minutes then it
                      > indicates the extent to which fetch rate is the bottleneck.

                      Does this mean I need to increase the fetch rate? Can you advise how?

                      DFT_EXTENT_SZ = 32
                      DFT_PREFETCH_SZ = 32
                      (page size: 4KB)

                      Many thanks,

                      Bing


                      • Bing Wu

                        #56
                        Re: DB2 vs MySQL - performance on large tables

                        Serge,

                        Serge Rielau wrote:
                        > Let's try the following experiment:
                        > SELECT COUNT(*) FROM (SELECT
                        > case WHEN x <> y then cast(raise_error('70000', 'dummy') end,
                        > case WHEN x <> y then cast(raise_error('70000', 'dummy') end FROM
                        > COORDINATE AS C, FRAME AS F WHERE F.TID=1 AND C.FID=F.ID AND F.ID<1000
                        > AND F.ID>0 ) AS T ORDER BY C.FID,C.AID;
                        > vs.
                        > SELECT
                        > case WHEN x <> y then cast(raise_error('70000', 'dummy') end,
                        > case WHEN x <> y then cast(raise_error('70000', 'dummy') end FROM
                        > COORDINATE AS C, FRAME AS F WHERE F.TID=1 AND C.FID=F.ID AND F.ID<1000
                        > AND F.ID>0 ORDER BY C.FID,C.AID;

                        It seems DB2 doesn't like "end":

                        SQL0104N An unexpected token "end" was found following "or('70000' , 'dummy')".
                        Expected tokens may include: "AS <cast_target> )". SQLSTATE=42601

                        Cheers,

                        Bing



                        • Serge Rielau

                          #57
                          Re: DB2 vs MySQL - performance on large tables

                          oops, .... cast(raise_error(...) AS INT) end
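
                          Written out in full, the pair would look roughly like this (an
                          untested sketch; I've named the dummy columns and dropped the ORDER BY
                          from the COUNT(*) form, where it no longer serves any purpose):

                          SELECT COUNT(*) FROM (
                            SELECT
                              CASE WHEN x <> y
                                   THEN CAST(raise_error('70000', 'dummy') AS INT) END AS d1,
                              CASE WHEN x <> y
                                   THEN CAST(raise_error('70000', 'dummy') AS INT) END AS d2
                            FROM COORDINATE AS C, FRAME AS F
                            WHERE F.TID=1 AND C.FID=F.ID AND F.ID<1000 AND F.ID>0
                          ) AS T;

                          vs.

                          SELECT
                            CASE WHEN x <> y
                                 THEN CAST(raise_error('70000', 'dummy') AS INT) END,
                            CASE WHEN x <> y
                                 THEN CAST(raise_error('70000', 'dummy') AS INT) END
                          FROM COORDINATE AS C, FRAME AS F
                          WHERE F.TID=1 AND C.FID=F.ID AND F.ID<1000 AND F.ID>0
                          ORDER BY C.FID, C.AID;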

                          Cheers
                          Serge
                          --
                          Serge Rielau
                          DB2 SQL Compiler Development
                          IBM Toronto Lab


                          • Database Guy

                            #58
                            Re: DB2 vs MySQL - performance on large tables

                            Bing Wu <bing@biop.ox.ac.uk> wrote in message news:<bpav01$pim$1@news.ox.ac.uk>...
                            > [snip: earlier exchange quoted in full above]
                            > The above SQL returns 8.5 million rows (about 500 MB in the output file), roughly 5% of the data in the table.
                            >
                            > One strange thing: if I run this:
                            >
                            > SELECT C.X,C.Y,C.Z FROM COORDINATE AS C, FRAME AS F
                            > WHERE C.FID=F.ID AND F.ID<1000
                            >
                            > It returns the same results but in 7 minutes! I don't understand this. Please enlighten me. BTW, I time it with:
                            >
                            > $ date; db2 "SELECT ...." > output.file; date

                            Bing,

                            Didn't realise that the plain select returned the same number of rows
                            as the query containing the join - although in hindsight this was
                            obvious from your other posts.

                            Your explain is for a slightly different query to the original shown,
                            which makes things a bit harder - it shows a multi-column index that
                            might possibly allow your other query to be satisfied by index access
                            only (can't tell without DDL).

                            To be honest I'm pretty confused. Your data volumes for coordinate
                            sound like they must be in the region of 1,300,000 4K pages. To scan
                            that would take about 20 minutes at an assumed default DB2 optimizer
                            transfer rate of roughly 0.9 ms per page (1,300,000 x 0.9 ms is about
                            1,170 s). Maybe your "hidden" index also contains the Z column, so
                            that index-only access can be used, or maybe your disk is higher
                            spec. Really you also need to explain the plain select to check what
                            it is doing.
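
                            If you want to capture that, something along these lines should dump
                            the access plan to the terminal (the database name is a placeholder):

                            $ db2expln -d yourdb -t -q "SELECT x, y, z FROM coordinate WHERE fid < 1000"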

                            The query containing the join and with 4 columns returned is quite
                            likely to be using indexes to avoid the need to table scan the entire
                            coordinate table, and instead scan just the required ranges.
                            Potentially this could be much faster.

                            I still think you will be bottlenecking on output to the client, but I
                            feel no certainty about what's going on given the discrepancies between
                            some of your posts - a moving target is hard to hit. Maybe you could
                            clarify a bit?


                            DG

