Q: calling all gurus

**David Portas** · Jul 20 '05, 01:55 AM

Re: calling all gurus

It's always difficult to give design advice online without the opportunity
to analyse a particular situation in detail. But my opinion, based on the
information given, is that you should definitely use a single table.

Always start with the logical model - you appear to have a single entity so
you should have a single table. Add a column to indicate the status and
index it. The table is so small that retrieving the relevant rows from the
index should give SQLServer no trouble at all. Updating even this indexed
column should be far preferable to copying data between tables when the
status changes.

But as always with performance questions: don't believe what I say - test it
and see.

--
David Portas
------------
Please reply only to the newsgroup
--

**John Palmer** · Jul 20 '05, 01:55 AM

Re: Q: calling all gurus

On 29 Oct 2003 07:59:10 -0800, no1ninja@hotmai l.com (Saturnin Kepa)
wrote:
[color=blue]
>The Scenario:
>
>We have some data that can be in three states. One is a saved state,
>a temporary state, and a complete state.
>
>The complete state is the bulk of the data. This data will be
>requested 1% of the time, maybe even less.
>
>The saved state and temporary state will only number 10-15 records.
>This data will be used 99% of the time, very frequently.
>
>
>So to summarize there will be 10-15 records that will be used ALL the
>time…. And 50,000-100,000 that will be used very infrequently.
>
>The programmer that I work with wants to combine the tables. I want
>to keep the tables sepperate.
>
>My logic is that it will be much faster to work with 10-15 records
>than sift threw 50-100k most of which are unnecessary to 99% of the
>requests. Also there are frequent deletions/modifications in the
>temporary table(10-15 records). The complete table can NOT have
>modification s or deletions (50-100k records). The optimization for
>these tables is also very different.
>
>He says that since the structures are almost identical they should be
>in the same table, that we should just use a field to denote the
>different types of records. He insists that my view is wrong and that
>I know nothing about databases.
>
>I may be new to SQL, but am I really that out to lunch? Would it not
>make sense to keep these tables separate?[/color]

I'm a relative newbie at SQL design and administration, but I believe
that it probably won't make much difference. If you build the table
with a clustered index (hopefully an integer), and with an additional
field to flag "saved", "temporary" and "complete", and index the
additional field, there will be nearly no time lost in scanning the
index, and grabbing the 10-15 rows in either of the first two tables.

If these were large tables, it might make a difference, but even then,
less than you might think; the index on the flag field essentially
splits the table into three separate tables, so long as your queries
are written to take advantage of that index. (e.g.: if you're looking
for something "saved" that has a certain unique attribute, make sure
you specify in the query that it's "saved", so SQL Server doesn't have
to do a table scan for the attribute you're looking for.

(Also, as a side note: proper database design suggests that you do not
have a field that allows the words "saved", "temporary" and "complete;
you should have a field that allows integers from one to three, and a
second table that equates 1 with "saved", 2 with "temporary" and 3
with "complete" (or whatever order seems best). Comparing integers is
much faster than comparing strings, and this saves a nice chunk of
space, allowing more rows to be cached.)

**David Portas** · Jul 20 '05, 01:55 AM

Re: calling all gurus

It's always difficult to give design advice online without the opportunity
to analyse a particular situation in detail. But my opinion, based on the
information given, is that you should definitely use a single table.

Always start with the logical model - you appear to have a single entity so
you should have a single table. Add a column to indicate the status and
index it. The table is so small that retrieving the relevant rows from the
index should give SQLServer no trouble at all. Updating even this indexed
column should be far preferable to copying data between tables when the
status changes.

But as always with performance questions: don't believe what I say - test it
and see.

--
David Portas
------------
Please reply only to the newsgroup
--

**John Palmer** · Jul 20 '05, 01:55 AM

Re: Q: calling all gurus

On 29 Oct 2003 07:59:10 -0800, no1ninja@hotmai l.com (Saturnin Kepa)
wrote:
[color=blue]
>The Scenario:
>
>We have some data that can be in three states. One is a saved state,
>a temporary state, and a complete state.
>
>The complete state is the bulk of the data. This data will be
>requested 1% of the time, maybe even less.
>
>The saved state and temporary state will only number 10-15 records.
>This data will be used 99% of the time, very frequently.
>
>
>So to summarize there will be 10-15 records that will be used ALL the
>time…. And 50,000-100,000 that will be used very infrequently.
>
>The programmer that I work with wants to combine the tables. I want
>to keep the tables sepperate.
>
>My logic is that it will be much faster to work with 10-15 records
>than sift threw 50-100k most of which are unnecessary to 99% of the
>requests. Also there are frequent deletions/modifications in the
>temporary table(10-15 records). The complete table can NOT have
>modification s or deletions (50-100k records). The optimization for
>these tables is also very different.
>
>He says that since the structures are almost identical they should be
>in the same table, that we should just use a field to denote the
>different types of records. He insists that my view is wrong and that
>I know nothing about databases.
>
>I may be new to SQL, but am I really that out to lunch? Would it not
>make sense to keep these tables separate?[/color]

I'm a relative newbie at SQL design and administration, but I believe
that it probably won't make much difference. If you build the table
with a clustered index (hopefully an integer), and with an additional
field to flag "saved", "temporary" and "complete", and index the
additional field, there will be nearly no time lost in scanning the
index, and grabbing the 10-15 rows in either of the first two tables.

If these were large tables, it might make a difference, but even then,
less than you might think; the index on the flag field essentially
splits the table into three separate tables, so long as your queries
are written to take advantage of that index. (e.g.: if you're looking
for something "saved" that has a certain unique attribute, make sure
you specify in the query that it's "saved", so SQL Server doesn't have
to do a table scan for the attribute you're looking for.

(Also, as a side note: proper database design suggests that you do not
have a field that allows the words "saved", "temporary" and "complete;
you should have a field that allows integers from one to three, and a
second table that equates 1 with "saved", 2 with "temporary" and 3
with "complete" (or whatever order seems best). Comparing integers is
much faster than comparing strings, and this saves a nice chunk of
space, allowing more rows to be cached.)

**Craig Kelly** · Jul 20 '05, 01:55 AM

Re: calling all gurus

"Saturnin Kepa" <no1ninja@hotma il.com> wrote:

<snip>
[color=blue]
> My logic is that it will be much faster to work with 10-15 records
> than sift threw 50-100k most of which are unnecessary to 99% of the
> requests. Also there are frequent deletions/modifications in the
> temporary table(10-15 records). The complete table can NOT have
> modifications or deletions (50-100k records). The optimization for
> these tables is also very different.[/color]

<snip>

My two cents: I agree with David. However, I'd also like to throw in the
old adage that pre-mature optimization can cause you untold headaches. My
advice would be to go with a nice, clean, logical design (in this instance,
avoiding the use of multiple tables for a single type of entity based on a
single property).

If some prudent indexing won't keep your performance at acceptable levels,
then you can try things like splitting into separate tables. However, my
guess is that you'll find the single table performs just fine: it's not like
we're talking about 10 billion records or anything.

Craig

**Lyndon Hills** · Jul 20 '05, 01:55 AM

Re: Q: calling all gurus

On Wed, 29 Oct 2003 09:07:20 -0800, John Palmer
<jpalmer1@ix.ne tcom.com> wrote:
[color=blue]
>On 29 Oct 2003 07:59:10 -0800, no1ninja@hotmai l.com (Saturnin Kepa)
>wrote:
>[color=green]
>>The Scenario:
>>
>>We have some data that can be in three states. One is a saved state,
>>a temporary state, and a complete state.
>>
>>The complete state is the bulk of the data. This data will be
>>requested 1% of the time, maybe even less.
>>
>>The saved state and temporary state will only number 10-15 records.
>>This data will be used 99% of the time, very frequently.
>>
>>
>>So to summarize there will be 10-15 records that will be used ALL the
>>time…. And 50,000-100,000 that will be used very infrequently.
>>
>>The programmer that I work with wants to combine the tables. I want
>>to keep the tables sepperate.
>>
>>My logic is that it will be much faster to work with 10-15 records
>>than sift threw 50-100k most of which are unnecessary to 99% of the
>>requests. Also there are frequent deletions/modifications in the
>>temporary table(10-15 records). The complete table can NOT have
>>modificatio ns or deletions (50-100k records). The optimization for
>>these tables is also very different.
>>
>>He says that since the structures are almost identical they should be
>>in the same table, that we should just use a field to denote the
>>different types of records. He insists that my view is wrong and that
>>I know nothing about databases.
>>
>>I may be new to SQL, but am I really that out to lunch? Would it not
>>make sense to keep these tables separate?[/color]
>
>I'm a relative newbie at SQL design and administration, but I believe
>that it probably won't make much difference. If you build the table
>with a clustered index (hopefully an integer), and with an additional
>field to flag "saved", "temporary" and "complete", and index the
>additional field, there will be nearly no time lost in scanning the
>index, and grabbing the 10-15 rows in either of the first two tables.
>
>If these were large tables, it might make a difference, but even then,
>less than you might think; the index on the flag field essentially
>splits the table into three separate tables, so long as your queries
>are written to take advantage of that index. (e.g.: if you're looking
>for something "saved" that has a certain unique attribute, make sure
>you specify in the query that it's "saved", so SQL Server doesn't have
>to do a table scan for the attribute you're looking for.
>
>(Also, as a side note: proper database design suggests that you do not
>have a field that allows the words "saved", "temporary" and "complete;
>you should have a field that allows integers from one to three, and a
>second table that equates 1 with "saved", 2 with "temporary" and 3
>with "complete" (or whatever order seems best). Comparing integers is
>much faster than comparing strings, and this saves a nice chunk of
>space, allowing more rows to be cached.)[/color]

A further consideration could be how you are obtaining your primary
key. If this is one of the auto numbering schemes you may well find it
harder to work with two tables. As far as the speed issue is
concerned, try both ways and see if the difference justifies having
two tables.

**Craig Kelly** · Jul 20 '05, 01:55 AM

Re: calling all gurus

"Saturnin Kepa" <no1ninja@hotma il.com> wrote:

<snip>
[color=blue]
> My logic is that it will be much faster to work with 10-15 records
> than sift threw 50-100k most of which are unnecessary to 99% of the
> requests. Also there are frequent deletions/modifications in the
> temporary table(10-15 records). The complete table can NOT have
> modifications or deletions (50-100k records). The optimization for
> these tables is also very different.[/color]

<snip>

My two cents: I agree with David. However, I'd also like to throw in the
old adage that pre-mature optimization can cause you untold headaches. My
advice would be to go with a nice, clean, logical design (in this instance,
avoiding the use of multiple tables for a single type of entity based on a
single property).

If some prudent indexing won't keep your performance at acceptable levels,
then you can try things like splitting into separate tables. However, my
guess is that you'll find the single table performs just fine: it's not like
we're talking about 10 billion records or anything.

Craig

**--CELKO--** · Jul 20 '05, 01:55 AM

Re: Q: calling all gurus

>> The programmer that I work with wants to combine the tables. I
want to keep the tables separate. <<

Why? If they are the same entity in your data model, then they should
be in one table. What you are doing is called attribute splitting and
it gets messy very fast.
[color=blue][color=green]
>> My logic is that it will be much faster to work with 10-15 records[/color][/color]
[sic] than sift threw 50-100k most of which are unnecessary to 99% of
the requests. <<

First of all, rows are not records and the differences are vital to
using and understanding SQL. Secondly, use an index or uniqueness
constraint to access the rows; why would you look thru the whole
table? This is not a sequential file system.
[color=blue][color=green]
>> He says that since the structures are almost identical they should[/color][/color]
be in the same table, that we should just use a field [sic] to denote
the different types of records [sic]. <<

Yes. If they are the same kind of entities, and differ only by a
status code, then he is right. Get the dara model right first, then
tune the performance.

**Lyndon Hills** · Jul 20 '05, 01:55 AM

Re: Q: calling all gurus

On Wed, 29 Oct 2003 09:07:20 -0800, John Palmer
<jpalmer1@ix.ne tcom.com> wrote:
[color=blue]
>On 29 Oct 2003 07:59:10 -0800, no1ninja@hotmai l.com (Saturnin Kepa)
>wrote:
>[color=green]
>>The Scenario:
>>
>>We have some data that can be in three states. One is a saved state,
>>a temporary state, and a complete state.
>>
>>The complete state is the bulk of the data. This data will be
>>requested 1% of the time, maybe even less.
>>
>>The saved state and temporary state will only number 10-15 records.
>>This data will be used 99% of the time, very frequently.
>>
>>
>>So to summarize there will be 10-15 records that will be used ALL the
>>time…. And 50,000-100,000 that will be used very infrequently.
>>
>>The programmer that I work with wants to combine the tables. I want
>>to keep the tables sepperate.
>>
>>My logic is that it will be much faster to work with 10-15 records
>>than sift threw 50-100k most of which are unnecessary to 99% of the
>>requests. Also there are frequent deletions/modifications in the
>>temporary table(10-15 records). The complete table can NOT have
>>modificatio ns or deletions (50-100k records). The optimization for
>>these tables is also very different.
>>
>>He says that since the structures are almost identical they should be
>>in the same table, that we should just use a field to denote the
>>different types of records. He insists that my view is wrong and that
>>I know nothing about databases.
>>
>>I may be new to SQL, but am I really that out to lunch? Would it not
>>make sense to keep these tables separate?[/color]
>
>I'm a relative newbie at SQL design and administration, but I believe
>that it probably won't make much difference. If you build the table
>with a clustered index (hopefully an integer), and with an additional
>field to flag "saved", "temporary" and "complete", and index the
>additional field, there will be nearly no time lost in scanning the
>index, and grabbing the 10-15 rows in either of the first two tables.
>
>If these were large tables, it might make a difference, but even then,
>less than you might think; the index on the flag field essentially
>splits the table into three separate tables, so long as your queries
>are written to take advantage of that index. (e.g.: if you're looking
>for something "saved" that has a certain unique attribute, make sure
>you specify in the query that it's "saved", so SQL Server doesn't have
>to do a table scan for the attribute you're looking for.
>
>(Also, as a side note: proper database design suggests that you do not
>have a field that allows the words "saved", "temporary" and "complete;
>you should have a field that allows integers from one to three, and a
>second table that equates 1 with "saved", 2 with "temporary" and 3
>with "complete" (or whatever order seems best). Comparing integers is
>much faster than comparing strings, and this saves a nice chunk of
>space, allowing more rows to be cached.)[/color]

A further consideration could be how you are obtaining your primary
key. If this is one of the auto numbering schemes you may well find it
harder to work with two tables. As far as the speed issue is
concerned, try both ways and see if the difference justifies having
two tables.

**--CELKO--** · Jul 20 '05, 01:55 AM

Re: Q: calling all gurus

>> The programmer that I work with wants to combine the tables. I
want to keep the tables separate. <<

Why? If they are the same entity in your data model, then they should
be in one table. What you are doing is called attribute splitting and
it gets messy very fast.
[color=blue][color=green]
>> My logic is that it will be much faster to work with 10-15 records[/color][/color]
[sic] than sift threw 50-100k most of which are unnecessary to 99% of
the requests. <<

First of all, rows are not records and the differences are vital to
using and understanding SQL. Secondly, use an index or uniqueness
constraint to access the rows; why would you look thru the whole
table? This is not a sequential file system.
[color=blue][color=green]
>> He says that since the structures are almost identical they should[/color][/color]
be in the same table, that we should just use a field [sic] to denote
the different types of records [sic]. <<

Yes. If they are the same kind of entities, and differ only by a
status code, then he is right. Get the dara model right first, then
tune the performance.

**GoranG** · Jul 20 '05, 01:56 AM

Re: Q: calling all gurus

On 29 Oct 2003 13:43:52 -0800, joe.celko@north face.edu (--CELKO--)
wrote:
[color=blue][color=green][color=darkred]
>>> The programmer that I work with wants to combine the tables. I[/color][/color]
>want to keep the tables separate. <<
>
>Why? If they are the same entity in your data model, then they should
>be in one table. What you are doing is called attribute splitting and
>it gets messy very fast.
>[color=green][color=darkred]
>>> My logic is that it will be much faster to work with 10-15 records[/color][/color]
>[sic] than sift threw 50-100k most of which are unnecessary to 99% of
>the requests. <<
>
>First of all, rows are not records and the differences are vital to
>using and understanding SQL. Secondly, use an index or uniqueness
>constraint to access the rows; why would you look thru the whole
>table? This is not a sequential file system.
>[color=green][color=darkred]
>>> He says that since the structures are almost identical they should[/color][/color]
>be in the same table, that we should just use a field [sic] to denote
>the different types of records [sic]. <<
>
>Yes. If they are the same kind of entities, and differ only by a
>status code, then he is right. Get the dara model right first, then
>tune the performance.[/color]

(not a guru, but here it goes anyway...)

Note the word 'almost identical' - isn't that 'almost positive' sign
for either over generalization or existing denormalization (especially
on conceptual level of design)?

Also if one entity never changes (only inserts allowed) and the other
allows updates, inserts and deletes doesn't that suggest different
entities, and calls for deeper examination of those entities and their
attributes?

To me it seems that their DBA is the one that fails to see the
difference between the conceptual data modeling and physical data
model.

---

On another note, also hoping to learn something...
What exactly does get messy with (for instance):

CREATE TABLE ABIG (
prop1 INT PRIMARY KEY CHECK (prop2 > 1000),
prop2 CHAR(50))

CREATE TABLE ASMALL (
prop1 INT PRIMARY KEY CHECK (prop1 <= 1000),
prop2 CHAR(50))

CREATE VIEW A AS
SELECT prop1, prop2
FROM ABIG
UNION ALL
SELECT prop1, prop2
FROM ASMALL

INSERT INTO A (prop1, prop2) VALUES (1, 'G');
INSERT INTO A (prop1, prop2) VALUES (1001, 'G');

....except that it is already messy.
Does it get even messier?

( GoranG79 AT hotmail.com )

**hansje** · Jul 20 '05, 01:56 AM

Re: Q: calling all gurus

QUOTE
On another note, also hoping to learn something...
What exactly does get messy with (for instance):

CREATE TABLE ABIG (
prop1 INT PRIMARY KEY CHECK (prop2 > 1000),
prop2 CHAR(50))

CREATE TABLE ASMALL (
prop1 INT PRIMARY KEY CHECK (prop1 <= 1000),
prop2 CHAR(50))

CREATE VIEW A AS
SELECT prop1, prop2
FROM ABIG
UNION ALL
SELECT prop1, prop2
FROM ASMALL

INSERT INTO A (prop1, prop2) VALUES (1, 'G');
INSERT INTO A (prop1, prop2) VALUES (1001, 'G');

....except that it is already messy.
Does it get even messier?
END QUOTE

GoranG,

I'm not sure what you are asking or demonstrating here?

Tnx,

Hans Brouwer

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!

**GoranG** · Jul 20 '05, 01:56 AM

Re: Q: calling all gurus

On 30 Oct 2003 12:41:08 GMT, hansje <hansjes@anonym ous.com> wrote:
[color=blue]
>QUOTE
>On another note, also hoping to learn something...
>What exactly does get messy with (for instance):
>
>CREATE TABLE ABIG (
> prop1 INT PRIMARY KEY CHECK (prop2 > 1000),
> prop2 CHAR(50))
>
>CREATE TABLE ASMALL (
> prop1 INT PRIMARY KEY CHECK (prop1 <= 1000),
> prop2 CHAR(50))
>
>CREATE VIEW A AS
>SELECT prop1, prop2
>FROM ABIG
>UNION ALL
>SELECT prop1, prop2
>FROM ASMALL
>
>INSERT INTO A (prop1, prop2) VALUES (1, 'G');
>INSERT INTO A (prop1, prop2) VALUES (1001, 'G');
>
>...except that it is already messy.
>Does it get even messier?
>END QUOTE
>
>
>
>GoranG,
>
>I'm not sure what you are asking or demonstrating here?[/color]

Potential logical equivalence of spliting attributes at physical
level.

( GoranG79 AT hotmail.com )

**GoranG** · Jul 20 '05, 01:56 AM

Re: Q: calling all gurus

On Thu, 30 Oct 2003 14:57:06 +0100, GoranG <no@spam.net> wrote:

<cut>[color=blue]
>
>Potential logical equivalence of spliting attributes at physical
>level.[/color]

Oops.

Potential logical equivalence of
a) splitting attributes at physical level and having two physical
entities and a single view
to
b) single entity at physical level.

( GoranG79 AT hotmail.com )

Q: calling all gurus

Q: calling all gurus

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment