Importance of primary keys in UNION

  • Killer42
    Recognized Expert Expert
    • Oct 2006
    • 8429

    Importance of primary keys in UNION

    Hi all.

    As has been mentioned here recently, I have a front-end database which just includes some queries and a bunch of (rather large) linked tables. For reasons of (very poor) performance, I had to do quite a bit of fiddling around to get my VBA code to build queries which would query each table individually, then UNION the results. Doing a single UNION query produced performance I probably could have bettered with a pencil and paper.
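
    For illustration, the per-table approach I ended up with looks something like this (the table and field names here are invented for the example, not my real ones):
    Code:
    SELECT [AcctNo], [TxnDate], [Amount]
    FROM [tblData2005]
    WHERE [AcctNo] = 'X1234'
    UNION ALL
    SELECT [AcctNo], [TxnDate], [Amount]
    FROM [tblData2006]
    WHERE [AcctNo] = 'X1234'
    Putting the WHERE inside each SELECT lets Jet use each table's own indexes before the rows are combined, and UNION ALL avoids the duplicate-removal sort that a plain UNION forces.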

    Something mentioned in this forum (by either NeoPa or mmccarthy, I think) inspired me to do some experimentation (which I had previously been putting off due to time pressure). This has led to the conclusion that the performance of a UNION query is hugely improved if the tables involved have primary keys. This may not come as a surprise to you, but I just hadn’t had time to check it out.

    My big tables don’t have a primary key because (a) there isn’t room (they’re right up against the 2GB limit) and (b) it wasn’t necessary when working with one table at a time – performance was quite acceptable. I’ve only just moved to the split design to try and consolidate everything into a single interface.

    What I’d like to know is, have I stuffed up my testing, or am I really likely to see an enormous speed improvement when doing simple queries against a bunch of “unioned” tables by adding a primary key to the tables? (The primary key that I added was totally unrelated to my test queries or to the real ones I’ll be using.)
  • PEB
    Recognized Expert Top Contributor
    • Aug 2006
    • 1418

    #2
    The PK is very important, and take care to use only a Long Integer PK! Index the fields you search on!

    It's very fast afterwards!

    :)

    No PK - no hope for fast performance!

    Comment

    • MMcCarthy
      Recognized Expert MVP
      • Aug 2006
      • 14387

      #3
      Originally posted by Killer42

      What I’d like to know is, have I stuffed up my testing, or am I really likely to see an enormous speed improvement when doing simple queries against a bunch of “unioned” tables by adding a primary key to the tables? (The primary key that I added was totally unrelated to my test queries or to the real ones I’ll be using.)
      You'll definitely see an improvement, as PEB says. In cases where it's just not possible to add a primary key, try to index at least one field in the table. Even an indexed field with duplicates should give some improvement.

      Comment

      • NeoPa
        Recognized Expert Moderator MVP
        • Oct 2006
        • 32633

        #4
        I'm open to correction here but I'm quite happy this is along the right tracks.
        Firstly, FYI, the size limit of Access 2K and beyond is 4GB. 97 was the last one limited to 2GB.
        Beware though, a Unicode feature was installed for text fields which can double your text data unless you use 'Unicode Compression', which should be safe for most databases in the English speaking world.
        Indices - what a subject!
        When an RDBMS (Access for instance) processes a recordset, it will look for an index that will help the processing it's currently on. So, if you run
        Code:
        SELECT [CName], [Surname], [DoB]
        FROM [tblStaff]
        ORDER BY [Surname]
        then having an index based on [Surname] will enable that to be used.
        If there is no index that will help, it will have to sort the data first before use.
        This is where the small PK comes in
        Originally posted by PEB
        The PK is very important, and take care to use only a Long Integer PK! Index the fields you search on!
        If you can't provide keys for all the various queries that use the table, then those that don't have keys will sort, and here's the nub, if there is a PK then it will sort via that - if no PK at all, it needs to create a temp table and sort the whole dataset :(.

        Having said that, my recommendation would be to use the PK for your main unique index - though there is clearly something to be said for PEB's approach, especially where you need to process the recordset from many different directions in different queries etc - as it were.
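
        For completeness, an index to support the ORDER BY in my example could be created like this (same example table as above):
        Code:
        CREATE INDEX idxSurname ON [tblStaff] ([Surname])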

        Comment

        • Killer42
          Recognized Expert Expert
          • Oct 2006
          • 8429

          #5
          Hi all.

          Don't have time to read and respond to everything just yet, but I wanted to make a couple of points.
          • I'm running Access 2003, but my databases are in Access 2000 format in case I need to work on them at home.
          • I have run into the 2GB limit, plenty of times. The last time was probably less than a fortnight ago. If I remember correctly, you get an "Invalid command" message, or something similar. It doesn't give any reasonable indication as to what's wrong.
          • All my search fields are indexed (allowing duplicates)
          • Performance is fine as long as I'm dealing with each 2GB database individually. They're around 12-15 million records each, and searching is nice and quick (couple of seconds). Once I UNION a couple of them, and do a simple search on a field which is indexed in both, it takes hours.
          • The "PK" I added was not integer, and had nothing directly to do with the search. What makes the difference appears to be simply whether there is a PK.

          NeoPa, I'll read your lengthy message some time today (it's morning here now).

          Comment

          • Killer42
            Recognized Expert Expert
            • Oct 2006
            • 8429

            #6
            Originally posted by NeoPa
            I'm open to correction here but I'm quite happy this is along the right tracks.
            Firstly, FYI, the size limit of Access 2K and beyond is 4GB. 97 was the last one limited to 2GB.
            Beware though, a Unicode feature was installed for text fields which can double your text data unless you use 'Unicode Compression', which should be safe for most databases in the English speaking world.
            Indices - what a subject!
            When an RDBMS (Access for instance) processes a recordset, it will look for an index that will help the processing it's currently on. So, if you run
            Code:
            SELECT [CName], [Surname], [DoB]
            FROM [tblStaff]
            ORDER BY [Surname]
            then having an index based on [Surname] will enable that to be used.
            If there is no index that will help, it will have to sort the data first before use.
            This is where the small PK comes in

            If you can't provide keys for all the various queries that use the table, then those that don't have keys will sort, and here's the nub, if there is a PK then it will sort via that - if no PK at all, it needs to create a temp table and sort the whole dataset :(.

            Having said that, my recommendation would be to use the PK for your main unique index - though there is clearly something to be said for PEB's approach, especially where you need to process the recordset from many different directions in different queries etc - as it were.
            Phew! Finally found the time to read all of this. Here's my response...
            • All of the fields I'm likely to want to search on are indexed (none unique). The indexes are exactly the same in all the databases. In fact, apart from the data, the DBs are identical.
            • I don't have any unique field on the records. Used to have a PK called ID (automatically added by Access, I think) but removed it to save space.
            • As I said, performance was wonderful on all (alright, most) queries until I started "unioning" the databases.
            • Some of the databases had the records in a different sequence to what I really prefer, so the new PK I'm adding is made up of a couple of fields and returns them in the preferred order.
            • It is obviously not doing any major amount of sorting now that my sample database has a PK, as the response is back to a second or two, and I’m searching (and returning) millions of records.
            • As for creating a temp table and so on, I can believe it may have been doing that when I didn’t have the PK, based on the ridiculous length of time and amount of disk activity.
            • Anyone know why Access can’t make use of the keys on the unioned tables? It’s a very simple structure, and query.

            I’ll go ahead and add a PK to all my tables (which will involve splitting each into two parts, since they are up against the size limit). Should be able to tell you in a few days whether it worked. Not much to do on my part, but dealing with these volumes, Access tends to chug along all day in the background. :) I've had a delete query running for 3-4 hours now, and it's probably around 70-80% finished.
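
             For the record, the sort of key I'm adding amounts to this (field names invented for the example):
             Code:
             ALTER TABLE [tblData2005]
             ADD CONSTRAINT PrimaryKey PRIMARY KEY ([AcctNo], [TxnDate])
             A two-field key like that also returns the records in my preferred order, though the extra index costs space, which matters when the file is already near the 2GB limit.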

            Comment

            • MMcCarthy
              Recognized Expert MVP
              • Aug 2006
              • 14387

              #7
              Originally posted by Killer42
              • Anyone know why Access can’t make use of the keys on the unioned tables? It’s a very simple structure, and query.
              Essentially, Access treats each index correctly when it's on an individual table or query.

              The minute you perform a UNION, Access can't reconcile the indexes in the various sets and pretty much ignores them.

              One solution to your problem which I have used in the past is to create a VBA routine to take in each of the tables from the various databases and to create a new recordset based on a new table structure and run a DoCmd.RunSQL INSERT query to append each record. The new table would of course have an autonumber primary key.

              It works much faster than the UNION query, and your only problem is to work out how to dynamically program it to move through the ODBC and recordset connections. I used a table with each database path and table name as field values.
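
              As a sketch, the append step for each source database amounts to something like this (the path and names are invented for the example):
              Code:
              INSERT INTO [tblAllYears]
              SELECT *
              FROM [tblData] IN 'C:\Data\Year2005.mdb'
              The VBA routine just loops through the path/table list and builds and runs one of these per source database.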

              Any questions on this approach let me know.

              Mary

              Comment

              • MMcCarthy
                Recognized Expert MVP
                • Aug 2006
                • 14387

                #8
                BTW

                I did this using DAO, not ADO, which gives problems when changing connections during a process.

                Comment

                • Killer42
                  Recognized Expert Expert
                  • Oct 2006
                  • 8429

                  #9
                  Originally posted by mmccarthy
                  Essentially, Access treats each index correctly when it's on an individual table or query.

                  The minute you perform a UNION, Access can't reconcile the indexes in the various sets and pretty much ignores them.

                  One solution to your problem which I have used in the past is to create a VBA routine to take in each of the tables from the various databases and to create a new recordset based on a new table structure and run a DoCmd.RunSQL INSERT query to append each record. The new table would of course have an autonumber primary key.

                  It works much faster than the UNION query, and your only problem is to work out how to dynamically program it to move through the ODBC and recordset connections. I used a table with each database path and table name as field values.

                  Any questions on this approach let me know.

                  Mary
                  Um...

                  As far as I can see, my main problem would still be the same - the data is too big to fit in one table. So how does this help?

                  Anyway, I don't really need an alternative method that badly, now. I have the single table from each of the databases linked to the front-end. It comes down to two choices. If a PK provides acceptable performance, I'll go back to using a UNION query which connects up the data for all of the years (it's one year per database). If not, I'll just have to union the results of the queries on each database. I'm doing that now, but don't like it.

                  Comment

                  • MMcCarthy
                    Recognized Expert MVP
                    • Aug 2006
                    • 14387

                    #10
                    Originally posted by Killer42
                    Um...

                    As far as I can see, my main problem would still be the same - the data is too big to fit in one table. So how does this help?
                    Using this method you never actually link the tables or run any queries, so the only thing you have is the new table. How many records are we talking about here? I've done it with a couple of million.

                    Comment

                    • Killer42
                      Recognized Expert Expert
                      • Oct 2006
                      • 8429

                      #11
                      Originally posted by mmccarthy
                      Using this method you never actually link the tables or run any queries, so the only thing you have is the new table. How many records are we talking about here? I've done it with a couple of million.
                      I'm afraid you've lost me completely as to what "this method" is.

                      I have a separate database for each year's data, with typically 12 to 15 million records, each database approaching 2GB in size (lucky coincidence). Each consists of a single table. They have identical structure and indexes. It's simply the same database copied, cleaned out and repopulated for each year.

                      I'd like to have a front-end database which just includes a union query to pull them all together, then play with that. It's not all that important - I can just continue to use them separately, but it's sometimes a little inconvenient.
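
                       Roughly, the union query I have in mind is this (paths invented for the example):
                       Code:
                       SELECT * FROM [tblData] IN 'C:\Data\Year2005.mdb'
                       UNION ALL
                       SELECT * FROM [tblData] IN 'C:\Data\Year2006.mdb'
                       The IN clause points each SELECT at an external back-end, so the front-end doesn't even need linked tables for this.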

                      Comment

                      • NeoPa
                        Recognized Expert Moderator MVP
                        • Oct 2006
                        • 32633

                        #12
                        I posted this elsewhere, but it really should have come in here.
                        Originally posted by NeoPa
                        Killer,

                        You should seriously consider porting your back end database to SQL Server 2005.
                        I believe the free version is now much more flexible than the 2K MSDE.
                        It will handle your databases (/tables) without strain.
                        Unfortunately, the SQL Server forums are not as active as these here in the Access section (No MM of course).

                        PS. Sorry about the 2GB limit - I thought it had gone from 2GB to 4GB. It must have been from 1GB to 2GB.

                        Comment

                        • MMcCarthy
                          Recognized Expert MVP
                          • Aug 2006
                          • 14387

                          #13
                          Originally posted by Killer42
                          I'm afraid you've lost me completely as to what "this method" is.

                          I have a separate database for each year's data, with typically 12 to 15 million records, each database approaching 2GB in size (lucky coincidence). Each consists of a single table. They have identical structure and indexes. It's simply the same database copied, cleaned out and repopulated for each year.

                          I'd like to have a front-end database which just includes a union query to pull them all together, then play with that. It's not all that important - I can just continue to use them separately, but it's sometimes a little inconvenient.
                          Sorry Killer

                          I didn't realise how many records you were talking about.

                          Comment

                          • Killer42
                            Recognized Expert Expert
                            • Oct 2006
                            • 8429

                            #14
                            Originally posted by mmccarthy
                            I didn't realise how many records you were talking about.
                            Just for the record, the error message when Access tries to go beyond 2GB was "Invalid argument". Why do I bring this up now? I'll give you one guess... :(

                            Comment

                            • NeoPa
                              Recognized Expert Moderator MVP
                              • Oct 2006
                              • 32633

                              #15
                               I don't think Access checks whether or not it's reached its ceiling - it just crashes (apparently randomly) wherever its internal code fails :(.

                              Comment
