DB2 vs MySQL - performance on large tables

  • Serge Rielau

    #46
    Re: DB2 vs MySQL - performance on large tables

    Bing,

    Let's try the following experiment:
    SELECT COUNT(*) FROM (SELECT
    case WHEN x <> y then cast(raise_error('70000', 'dummy') end,
    case WHEN x <> y then cast(raise_error('70000', 'dummy') end FROM
    COORDINATE AS C, FRAME AS F WHERE F.TID=1 AND C.FID=F.ID AND F.ID<1000
    AND F.ID>0 ) AS T ORDER BY C.FID,C.AID;
    vs.
    SELECT
    case WHEN x <> y then cast(raise_error('70000', 'dummy') end,
    case WHEN x <> y then cast(raise_error('70000', 'dummy') end FROM
    COORDINATE AS C, FRAME AS F WHERE F.TID=1 AND C.FID=F.ID AND F.ID<1000
    AND F.ID>0 ORDER BY C.FID,C.AID;

    What would this experiment do?
    First, the CASE with raise_error() ensures DB2 doesn't simplify the
    query (it can't drop the raise_error()). Be conscious about placing the
    WHERE in the inner select!
    Now, using the COUNT(*) cuts out all the network traffic, so we'll
    get a fair idea of how much work DB2 (the engine) is doing vs. how much
    is spent in the client-server communication.
    If the cost is in the engine, all those nice proposals about indexes,
    bufferpools, etc. may be relevant. If not, they are pointless.
    The one dial that hasn't been mentioned yet is the communication buffer.
    You should already get BLOCKING (i.e. prefetching of the result set into
    the comm buffer). Now we need to dial up the buffer itself.
    I think it's the DB2COMM special register, likely a client-side thing
    (never trust the SQL compiler guy ;-)

    Cheers
    Serge
    --
    Serge Rielau
    DB2 SQL Compiler Development
    IBM Toronto Lab


    • Database Guy

      #47
      Re: DB2 vs MySQL - performance on large tables

      "Mark A" <ma@switchboard.net> wrote in message news:<gkytb.224$iP6.100020@news.uswest.net>...
      > Sequential prefetch for the table and/or indexes
      > would probably be used in an optimal access path. In such cases, table and
      > index placement is important, as is extent size, prefetch size, and
      > bufferpool page size.

      I think Bing stated he's running on a single disk. He also stated that
      DB2 is "slow" (10 minutes) when doing a simple select returning many
      rows. So I still feel 20,000 rows/sec is all he's likely to get.
      There's no point making DB2 run the query faster internally if the
      coordinator/client interaction can't spool the results out fast
      enough.

      Perhaps Bing could try timing the following query, to test your theory
      that i/o throughput is a problem and that fetching 20,000/sec can be
      improved on:

      select x, y, z from coordinate where fid < 1000
      and (x + y) = (z * -1) - 1;

      The purpose of the second, added line (non-indexable) is to reduce the
      resultset. If the query runs much quicker than 10 minutes then it
      indicates the extent to which fetch rate is the bottleneck.

      Bing, if you read this, could you also let us know how many rows your
      10 minute query returned:

      select x, y, z from coordinate where fid < 1000


      DG


      • Gert van der Kooij

        #48
        Re: DB2 vs MySQL - performance on large tables

        In article <bp7q1q$ssb$1@hanover.torolab.ibm.com>, srielau@ca.eye-bee-m.com says...
        > The one dial that hasn't been mentioned yet is the communication buffer.
        > You should already get BLOCKING (i.e. prefetching of the result set into
        > the comm buffer). Now we need to dial up the buffer itself.
        > I think it's the DB2COMM special register, likely a client-side thing
        > (never trust the SQL compiler guy ;-)

        You're right :)
        It's the DBM CFG parameter RQRIOBLK (max requester I/O block size).
        By default it's 32767, the maximum is 65535.

        But it is used only when connecting from a remote client.
        I don't know enough about the internal memory sizes for local
        clients, but matching the RQRIOBLK size with an OPTIMIZE FOR n ROWS
        clause on a select improves performance for remote clients.
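
        To make that concrete, a rough sketch only - 65535 and 100000 are
        example values, not recommendations, and the dbm cfg change generally
        takes effect only after the instance is recycled:

        $ db2 update dbm cfg using RQRIOBLK 65535
        $ db2stop
        $ db2start

        SELECT x, y, z FROM coordinate WHERE fid < 1000
        OPTIMIZE FOR 100000 ROWS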


        • Blair Adamache

          #49
          Re: DB2 vs MySQL - performance on large tables

          The operating system being used is almost certainly Linux.

          Daniel Morgan wrote:
          > ... I don't know what
          > operating system is being used here, one can't even guarantee that a
          > single file laid down on a disk will be contiguous except within a raw
          > partition.


          • Daniel Morgan

            #50
            Re: DB2 vs MySQL - performance on large tables

            Blair Adamache wrote:
            > The operating system being used is almost certainly Linux.
            >
            > Daniel Morgan wrote:
            >
            >> ... I don't know what operating system is being used here, one can't
            >> even guarantee that a single file laid down on a disk will be
            >> contiguous except within a raw
            >> partition.

            I can't speak to a generic, much less any specific, Linux version. But
            some operating systems are designed to attempt optimization by
            spreading a single file across the disk, even a brand new disk with
            nothing else on it. May not affect you, but something to consider.
            --
            Daniel Morgan
            We make it possible for you to keep learning at the University of Washington, even if you work full time or live outside of the Seattle area.

            damorgan@x.washington.edu
            (replace 'x' with a 'u' to reply)


            • Bing Wu

              #51
              Re: DB2 vs MySQL - performance on large tables

              Yes, just to confirm, the OS is

              Linux 2.4.9-31enterprise #1 SMP vie jun 7 12:56:14 PDT 2002 i686 unknown

              Some hardware info below:-

              CPU info:

              processor : 0
              vendor_id : GenuineIntel
              cpu family : 6
              model : 11
              model name : Intel(R) Pentium(R) III CPU family 1400MHz
              stepping : 1
              cpu MHz : 1396.496
              cache size : 512 KB

              processor : 1
              vendor_id : GenuineIntel
              cpu family : 6
              model : 11
              model name : Intel(R) Pentium(R) III CPU family 1400MHz
              stepping : 1
              cpu MHz : 1396.496
              cache size : 512 KB

              Mem info:

              Mem: 1052528640 990863360 61665280 462897152 128512000 295145472
              Swap: 945864704 344354816 601509888
              MemTotal: 1027860 kB
              MemFree: 60220 kB
              MemShared: 452048 kB
              Buffers: 125500 kB
              Cached: 39164 kB
              SwapCached: 249064 kB
              Active: 550596 kB

              I haven't upgraded to RH9 since DB2 doesn't officially support it.

              Thanks,

              Bing

              Blair Adamache wrote:
              > The operating system being used is almost certainly Linux.
              >
              > Daniel Morgan wrote:
              >
              >> ... I don't know what operating system is being used here, one can't
              >> even guarantee that a single file laid down on a disk will be
              >> contiguous except within a raw
              >> partition.


              • Bing Wu

                #52
                Re: DB2 vs MySQL - performance on large tables

                Database Guy wrote:
                > Perhaps Bing could try timing the following query, to test your theory
                > that i/o throughput is a problem and that fetching 20,000/sec can be
                > improved on:
                >
                > select x, y, z from coordinate where fid < 1000
                > and (x + y) = (z * -1) - 1;

                Currently rebuilding indexes; will test the SQL later.
                > The purpose of the second, added line (non-indexable) is to reduce the
                > resultset. If the query runs much quicker than 10 minutes then it
                > indicates the extent to which fetch rate is the bottleneck.
                >
                > Bing, if you read this, could you also let us know how many rows your
                > 10 minute query returned:
                >
                > select x, y, z from coordinate where fid < 1000

                The above SQL returns 8.5 million rows (about 500 MB in the output file), roughly 5% of the data in the table.
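
                For reference, 8.5 million rows in roughly 10 minutes works out to
                about 8,500,000 / 600 s = ~14,000 rows/s, or a little under 1 MB/s
                written to the output file.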

                One strange thing: if I run this:

                SELECT C.X,C.Y,C.Z FROM COORDINATE AS C, FRAME AS F
                WHERE C.FID=F.ID AND F.ID<1000

                It returns the same results but in 7 minutes! I don't understand this. Please enlighten me. BTW, I time it with:

                $ date; db2 "SELECT ...." > output.file; date

                Many thanks,

                Bing







                • Mark A

                  #53
                  Re: DB2 vs MySQL - performance on large tables

                  "Bing Wu" <bing@biop.ox.ac.uk> wrote in message
                  news:bpauh4$pc8$1@news.ox.ac.uk...
                  > Yes, just to confirm, the OS is
                  >
                  > Linux 2.4.9-31enterprise #1 SMP vie jun 7 12:56:14 PDT 2002 i686 unknown
                  >
                  > [snip: CPU and memory details quoted in full above]
                  Bing,

                  You may already be doing this, but just in case you are not, here are some
                  further suggestions:

                  Since you have 2 processors, you should try to encourage DB2 intra-partition
                  parallelism (if you are not already doing so). This can be done even without
                  DB2 ESE with DPF. The best way to do this is to create 2 (or a multiple
                  thereof) containers in a DMS tablespace. Then make sure that the degree of
                  parallelism parameter is set to 2 or ANY.

                  This would work even better if you could create the containers on separate
                  physical drives. When you create the containers with the Control Center, it
                  will help you configure the extent size and prefetch size in an optimal
                  configuration. Basically, the prefetch size should be a multiple of the
                  extent size.
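
                  For example, something along these lines - just a sketch, since the
                  paths, container sizes and database name below are made up; adjust
                  them to your system (note the prefetch size is twice the extent size):

                  CREATE TABLESPACE coord_ts
                    MANAGED BY DATABASE
                    USING (FILE '/data1/coord_cont0' 500000,
                           FILE '/data2/coord_cont1' 500000)
                    EXTENTSIZE 32
                    PREFETCHSIZE 64;

                  $ db2 update dbm cfg using INTRA_PARALLEL YES
                  $ db2 update db cfg for yourdb using DFT_DEGREE 2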



                  • Bing Wu

                    #54
                    Re: DB2 vs MySQL - performance on large tables

                    Mark A wrote:
                    > Since you have 2 processors, you should try to encourage DB2 intra-partition
                    > parallelism (if you are not already doing so). This can be done even without
                    > DB2 ESE with DPF. The best way to do this is to create 2 (or a multiple
                    > thereof) containers in a DMS tablespace. Then make sure that the degree of
                    > parallelism parameter is set to 2 or ANY.

                    Thanks for the info. Just to confirm, the parallelism parameter has been set to 2.

                    $ db2 get dbm configuration | grep parall

                    Maximum query degree of parallelism (MAX_QUERYDEGREE) = 2
                    Enable intra-partition parallelism (INTRA_PARALLEL) = YES

                    The DB2 version is ESE 8.1.3, patch 4.
                    > This would work even better if you could create the containers on separate
                    > physical drives. When you create the containers with the Control Center, it
                    > will help you configure the extent size and prefetch size in an optimal
                    > configuration. Basically, the prefetch size should be a multiple of the
                    > extent size.

                    In my case, the server has only one hard drive at the moment. The DB2 defaults are:

                    DFT_EXTENT_SZ = 32
                    DFT_PREFETCH_SZ = 32
                    (page size: 4KB)
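
                    Those values come from the database configuration; they can be checked
                    with something like this (substitute the real database name for "mydb"):

                    $ db2 get db cfg for mydb | grep -i "extent\|prefetch"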

                    Many thanks,

                    Bing



                    • Bing Wu

                      #55
                      Re: DB2 vs MySQL - performance on large tables

                      Database Guy wrote:
                      > Perhaps Bing could try timing the following query, to test your theory
                      > that i/o throughput is a problem and that fetching 20,000/sec can be
                      > improved on:
                      >
                      > select x, y, z from coordinate where fid < 1000
                      > and (x + y) = (z * -1) - 1;

                      The SQL completed in just 22 seconds:

                      Tue Nov 18 17:56:27 GMT 2003

                      X Y Z
                      ------------------------ ------------------------ ------------------------
                      -9.31600E+000 +3.30400E+000 +5.01200E+000
                      -3.14100E+000 +3.27600E+000 -1.13500E+000
                      +5.08800E+000 +4.52000E+000 -1.06080E+001
                      -1.24380E+001 +1.80490E+001 -6.61100E+000
                      -7.53300E+000 +2.36800E+000 +4.16500E+000
                      +6.69900E+000 -5.14100E+000 -2.55800E+000
                      -4.95500E+000 +1.48270E+001 -1.08720E+001
                      +1.59270E+001 -5.64900E+000 -1.12780E+001
                      -1.40620E+001 +1.65760E+001 -3.51400E+000
                      +1.11800E+000 -1.15960E+001 +9.47800E+000
                      +7.34000E+000 +2.82100E+000 -1.11610E+001
                      +3.46800E+000 +3.53200E+000 -8.00000E+000
                      +6.26100E+000 +7.68700E+000 -1.49480E+001
                      -4.98700E+000 +8.91000E+000 -4.92300E+000
                      +1.21160E+001 +4.85600E+000 -1.79720E+001
                      -1.58230E+001 +9.37200E+000 +5.45100E+000
                      -8.39700E+000 -5.00200E+000 +1.23990E+001
                      +1.18110E+001 +4.91800E+000 -1.77290E+001
                      +1.08710E+001 -8.07400E+000 -3.79700E+000
                      +3.70300E+000 +1.52000E+001 -1.99030E+001
                      +1.66220E+001 -5.71000E+000 -1.19120E+001
                      +4.61500E+000 +9.30000E+000 -1.49150E+001
                      +1.45170E+001 -1.02910E+001 -5.22600E+000
                      +7.60300E+000 -1.56300E+001 +7.02700E+000
                      +6.84300E+000 +6.03200E+000 -1.38750E+001
                      -9.42900E+000 +2.74500E+000 +5.68400E+000
                      -3.78400E+000 +7.61800E+000 -4.83400E+000
                      +6.33700E+000 +7.54500E+000 -1.48820E+001
                      -1.01420E+001 +1.88600E+001 -9.71800E+000
                      -1.47090E+001 +2.67000E+000 +1.10390E+001
                      +7.58700E+000 +6.17000E+000 -1.47570E+001
                      -1.80590E+001 +8.09400E+000 +8.96500E+000

                      32 record(s) selected.

                      Tue Nov 18 17:56:49 GMT 2003
                      > The purpose of the second, added line (non-indexable) is to reduce the
                      > resultset. If the query runs much quicker than 10 minutes then it
                      > indicates the extent to which fetch rate is the bottleneck.

                      Does this mean I need to increase the fetch rate? Can you advise how?

                      DFT_EXTENT_SZ = 32
                      DFT_PREFETCH_SZ = 32
                      (page size: 4KB)

                      Many thanks,

                      Bing


                      • Bing Wu

                        #56
                        Re: DB2 vs MySQL - performance on large tables

                        Serge,

                        Serge Rielau wrote:
                        > Let's try the following experiment:
                        > SELECT COUNT(*) FROM (SELECT
                        > case WHEN x <> y then cast(raise_error('70000', 'dummy') end,
                        > case WHEN x <> y then cast(raise_error('70000', 'dummy') end FROM
                        > COORDINATE AS C, FRAME AS F WHERE F.TID=1 AND C.FID=F.ID AND F.ID<1000
                        > AND F.ID>0 ) AS T ORDER BY C.FID,C.AID;
                        > vs.
                        > SELECT
                        > case WHEN x <> y then cast(raise_error('70000', 'dummy') end,
                        > case WHEN x <> y then cast(raise_error('70000', 'dummy') end FROM
                        > COORDINATE AS C, FRAME AS F WHERE F.TID=1 AND C.FID=F.ID AND F.ID<1000
                        > AND F.ID>0 ORDER BY C.FID,C.AID;

                        It seems DB2 doesn't like "end":

                        SQL0104N An unexpected token "end" was found following "or('70000' , 'dummy')".
                        Expected tokens may include: "AS <cast_target> )". SQLSTATE=42601

                        Cheers,

                        Bing



                        • Serge Rielau

                          #57
                          Re: DB2 vs MySQL - performance on large tables

                          oops, .... cast(raise_error(...) AS INT) end
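
                          Written out in full, the pair would look roughly like this (an
                          untested sketch; I've named the dummy columns and dropped the ORDER BY
                          from the COUNT(*) form, where it no longer serves any purpose):

                          SELECT COUNT(*) FROM (
                            SELECT
                              CASE WHEN x <> y
                                   THEN CAST(raise_error('70000', 'dummy') AS INT) END AS d1,
                              CASE WHEN x <> y
                                   THEN CAST(raise_error('70000', 'dummy') AS INT) END AS d2
                            FROM COORDINATE AS C, FRAME AS F
                            WHERE F.TID=1 AND C.FID=F.ID AND F.ID<1000 AND F.ID>0
                          ) AS T;

                          vs.

                          SELECT
                            CASE WHEN x <> y
                                 THEN CAST(raise_error('70000', 'dummy') AS INT) END,
                            CASE WHEN x <> y
                                 THEN CAST(raise_error('70000', 'dummy') AS INT) END
                          FROM COORDINATE AS C, FRAME AS F
                          WHERE F.TID=1 AND C.FID=F.ID AND F.ID<1000 AND F.ID>0
                          ORDER BY C.FID, C.AID;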

                          Cheers
                          Serge
                          --
                          Serge Rielau
                          DB2 SQL Compiler Development
                          IBM Toronto Lab


                          • Database Guy

                            #58
                            Re: DB2 vs MySQL - performance on large tables

                            Bing Wu <bing@biop.ox.ac.uk> wrote in message news:<bpav01$pim$1@news.ox.ac.uk>...
                            > [snip: earlier exchange quoted in full above]
                            > The above SQL returns 8.5 million rows (about 500 MB in the output file), roughly 5% of the data in the table.
                            >
                            > One strange thing: if I run this:
                            >
                            > SELECT C.X,C.Y,C.Z FROM COORDINATE AS C, FRAME AS F
                            > WHERE C.FID=F.ID AND F.ID<1000
                            >
                            > It returns the same results but in 7 minutes! I don't understand this. Please enlighten me. BTW, I time it with:
                            >
                            > $ date; db2 "SELECT ...." > output.file; date

                            Bing,

                            Didn't realise that the plain select returned the same number of rows
                            as the query containing the join - although in hindsight this was
                            obvious from your other posts.

                            Your explain is for a slightly different query to the original shown,
                            which makes things a bit harder - it shows a multi-column index that
                            might possibly allow your other query to be satisfied by index access
                            only (can't tell without DDL).

                            To be honest I'm pretty confused. Your data volumes for coordinate
                            sound like they must be in the region of 1,300,000 4K pages. To scan
                            that would take about 20 minutes at an assumed default DB2 optimizer
                            transfer rate of roughly 0.9 ms per page (1,300,000 x 0.9 ms is about
                            1,170 s). Maybe your "hidden" index also contains the Z column, so
                            that index-only access can be used, or maybe your disk is higher
                            spec. Really you also need to explain the plain select to check what
                            it is doing.
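
                            If you want to capture that, something along these lines should dump
                            the access plan to the terminal (the database name is a placeholder):

                            $ db2expln -d yourdb -t -q "SELECT x, y, z FROM coordinate WHERE fid < 1000"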

                            The query containing the join and with 4 columns returned is quite
                            likely to be using indexes to avoid the need to table scan the entire
                            coordinate table, and instead scan just the required ranges.
                            Potentially this could be much faster.

                            I still think you will be bottlenecking on output to the client, but I
                            feel no certainty about what's going on given the discrepancies between
                            some of your posts - a moving target is hard to hit. Maybe you could
                            clarify a bit?


                            DG

