good table design...

**Hilarion** · Jul 20 '05, 05:31 AM

Re: good table design...

> You realize an autoincremented integer, at least in MS SQL Server is[color=blue]
> terrible for this. You can't guarantee that you won't have gaps[/color]

What's with those gaps? I always wonder. Who cares about them? It's not
a row number, but a key, so gaps are OK.

Hilarion

**David Portas** · Jul 20 '05, 05:31 AM

Re: good table design...

You've failed to make the vital distinction between a *surrogate* key and a
*natural* key. You have no data integrity if your data looks like this:

id vehicle
----------- -----------
1 Car
2 Car
3 Truck

An autoincrementin g surrogate key should never be the only key of a table.

--
David Portas
SQL Server MVP
--

**-P-** · Jul 20 '05, 05:31 AM

Re: good table design...

"Greg D. Moore (Strider)" <mooregr_delete th1s@greenms.co m> wrote in message
news:9EUpd.1446 9$1u.4180@twist er.nyroc.rr.com ...[color=blue]
>
> "-P-" <ent_must_die@h otmail.DOTcom> wrote in message
> news:fqadnXa8Od VOczrcRVn-iw@adelphia.com ...
>[color=green]
>> This is where I would use a separate identifier as the primary key.[/color]
> Something that will never change (an[color=green]
>> autoincremented integer, for example), so that the "descriptio n" column[/color]
> can contain the more volatile descriptive text.[color=green]
>>[/color]
>
> You realize an autoincremented integer, at least in MS SQL Server is
> terrible for this. You can't guarantee that you won't have gaps and you
> can't even guarantee that the numbers will remain the same. DBCC CHECKIDENT
> can reset things on you, copying them to another DB may completely change
> the numbers, etc.
>
> You are right that the descriptions may change. I'd recommend something
> like a partial VIN since that's described by an outside authority and pretty
> much won't change.
>
>[/color]

"Not having gaps" wasn't specified as a requirement... <G> I think gaps in identity columns are perfectly fine -
unless you're exposing the number to the user, as with an invoice number or some other "auditable" entity. In that
case, I wouldn't use IDENTITY.

-P-

**-P-** · Jul 20 '05, 05:31 AM

Re: good table design...

"David Portas" <REMOVE_BEFORE_ REPLYING_dporta s@acm.org> wrote in message news:1P6dncjbXd CHyjXcRVn-vw@giganews.com ...[color=blue]
> You've failed to make the vital distinction between a *surrogate* key and a *natural* key. You have no data integrity
> if your data looks like this:
>
> id vehicle
> ----------- -----------
> 1 Car
> 2 Car
> 3 Truck
>
> An autoincrementin g surrogate key should never be the only key of a table.
>[/color]

Baloney... A simple unique index or constraint on the description would prevent the scenario you just described.

"Never"? I also disagree with that statement. There are perfectly acceptable uses for that design.

-Paul-

**Hilarion** · Jul 20 '05, 05:31 AM

Re: good table design...

> I agree with the reply to that post:[color=blue]
> http://www.google.com/groups?hl=pl&l...TNGP11.phx.gbl[/color]

I mean I agree to the post pointed by the URL, and which is a replty to the
oryginal post.

Hilarion

**David Portas** · Jul 20 '05, 05:31 AM

Re: good table design...

> A simple unique index or constraint on the description would prevent the[color=blue]
> scenario you just described.[/color]

That's what I was suggesting. You didn't mention the importance of declaring
the natural key so I was just making the point for the benefit of the OP.
[color=blue]
> "Never"? I also disagree with that statement. There are perfectly
> acceptable uses for that design.[/color]

As part of an ETL process it may be acceptable in a staging table to use
only an artificial key. Not in a relational schema. Without a natural key by
implication you have redundant data and by definition no meaningful way to
define the entity you are modelling.

The cases I've come across that perhaps justify an exception to this are
when you have an automatic, event-driven process which logs to a table
without human intervention. It may not be feasible as part of a real-time
logging process to ensure that a natural key is enforced. The sequence of
events is the data you are attempting to capture but the database system may
not support a date/timestamp of sufficient precision to guarantee that each
event has a unique time. This is really a problem of application design
rather than relational database design. You still generate redundant data
but in a log that isn't always a big problem.

--
David Portas
SQL Server MVP
--

**Erland Sommarskog** · Jul 20 '05, 05:31 AM

Re: good table design...

Greg D. Moore (Strider) (mooregr_delete th1s@greenms.co m) writes:[color=blue]
> You realize an autoincremented integer, at least in MS SQL Server is
> terrible for this. You can't guarantee that you won't have gaps and you
> can't even guarantee that the numbers will remain the same. DBCC
> CHECKIDENT can reset things on you, copying them to another DB may
> completely change the numbers, etc.[/color]

Gaps are a non-issue in this case.

That said, you are terribly lazy if you need to have auto-incremented
ids for a simple lookup table. Better assigned the ids manually.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

Books Online for SQL Server SP3 at

Microsoft SQL Server

http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp

Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

**-P-** · Jul 20 '05, 05:31 AM

Re: good table design...

"David Portas" <REMOVE_BEFORE_ REPLYING_dporta s@acm.org> wrote in message news:GLudnVtpyo 3qSDXcRVn-rw@giganews.com ...[color=blue][color=green]
>> A simple unique index or constraint on the description would prevent the scenario you just described.[/color]
>
> That's what I was suggesting. You didn't mention the importance of declaring the natural key so I was just making the
> point for the benefit of the OP.
>[color=green]
>> "Never"? I also disagree with that statement. There are perfectly acceptable uses for that design.[/color]
>
> As part of an ETL process it may be acceptable in a staging table to use only an artificial key. Not in a relational
> schema. Without a natural key by implication you have redundant data and by definition no meaningful way to define the
> entity you are modelling.
>
> The cases I've come across that perhaps justify an exception to this are when you have an automatic, event-driven
> process which logs to a table without human intervention. It may not be feasible as part of a real-time logging
> process to ensure that a natural key is enforced. The sequence of events is the data you are attempting to capture but
> the database system may not support a date/timestamp of sufficient precision to guarantee that each event has a unique
> time. This is really a problem of application design rather than relational database design. You still generate
> redundant data but in a log that isn't always a big problem.
>[/color]

We still disagree. I would never use autoincrement in situations where a natural key was evident and available, but I
still contend that there is a place for autoincrement in the relational model. What about entities that have no
identifiable "natural" key? We have several in our model, and had to invent an identification scheme for them.

--
Paul Horan

**David Portas** · Jul 20 '05, 05:31 AM

Re: good table design...

> What about entities that have no identifiable "natural" key? We have[color=blue]
> several in our model, and had to invent an identification scheme for them.[/color]

There are two separate issues and maybe neither of us have spelt them out
well enough.

1) When you model an entity in a relational database it must always have a
natural key by definition (= "a subset of the attributes that uniquely
identifies a row"), otherwise you have redundancy and no integrity. For
example that key could be the vehicle description in Toedipper's case. If
you have a table without a natural key then someone has failed to identify
the entity properly in the logical model. However the natural key may not
necessarily be convenient for use as a foreign key (because of storage or
performance considerations for example).

2) In cases where the data changes infrequently it may additionally be
desirable to create your own user-assigned surrogate key, such as "C" for
"Car" for example. In other cases, a system-assigned surrogate (such as
IDENTITY in SQL Server) is often used. Surrogate keys are not a substitute
for a natural key, which should still be declared as a key of the table.
Unfortunately system-assigned "row-identifier" surrogates are too often used
carelessly by those who fail to design a proper logical model and think they
don't need real keys. System-assigned surrogates are not part of the logical
data model at all - they are part of the physical implementation.

There is of course room for a great deal of debate about the wisdom or
otherwise of using system-assigned surrogate keys at all. There are
reasonable arguments on both sides and I don't want to go over that debate
again here. What isn't usually disputed is that a surrogate key should never
be the ONLY key of a table. As soon as you compromise that principle your
data model is lost and you have big logical problems with data integrity and
often insurmountable practical problems when it comes to getting meaningful
information from the data.

--
David Portas
SQL Server MVP
--

**Erland Sommarskog** · Jul 20 '05, 05:31 AM

Re: good table design...

David Portas (REMOVE_BEFORE_ REPLYING_dporta s@acm.org) writes:[color=blue]
> There is of course room for a great deal of debate about the wisdom or
> otherwise of using system-assigned surrogate keys at all. There are
> reasonable arguments on both sides and I don't want to go over that
> debate again here. What isn't usually disputed is that a surrogate key
> should never be the ONLY key of a table. As soon as you compromise that
> principle your data model is lost and you have big logical problems with
> data integrity and often insurmountable practical problems when it comes
> to getting meaningful information from the data.[/color]

I would say that there certainly are cases where some sort of system-
generated key is the only possible key. Take for instance a table with
account transactions. You can fairly well describe a transaction by using
account number, date and time of day. But there may be two transactions
for the same account in the same millisecond, so those three alone cannot
make a key. You can then try to find some constraint that distinguishes
two transactions that happen at the same time. (Typically they would be
generated by some batch process.) But you would then only trying to intro-
duce a constraint that has no relation to business rules, and one day
you will get a failure for a perfectly valid transaction, because it
did not fit into the squared model. That's when a surrogate key is the
way to go.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

Books Online for SQL Server SP3 at

Microsoft SQL Server

http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp

Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

**David Portas** · Jul 20 '05, 05:31 AM

Re: good table design...

That depends on how you define the transaction entity. For example:

Account Timestamp Amount
------- ----------------------- -------
1234 2004-11-01 09:10:01.104 512.99
1234 2004-11-01 09:10:01.104 512.00

might be adequately represented as:

Account Timestamp Amount
------- ----------------------- -------
1234 2004-11-01 09:10:01.104 1024.99

Or even:

Account Timestamp Amount Trancount
------- ----------------------- ------- ---------
1234 2004-11-01 09:10:01.104 1024.99 2

In reality, at least in the financial systems I have worked with,
transactions contain more information that this. They are identified as part
of a batch by a unique batch number or journal number which is assigned at
the time the batch is generated. The batch number is itself a surrogate for
an entity composed of something like (orignating_ent ity, location,
datetime).

--
David Portas
SQL Server MVP
--

**Erland Sommarskog** · Jul 20 '05, 05:32 AM

Re: good table design...

David Portas (REMOVE_BEFORE_ REPLYING_dporta s@acm.org) writes:[color=blue]
> That depends on how you define the transaction entity. For example:
>
> Account Timestamp Amount
> ------- ----------------------- -------
> 1234 2004-11-01 09:10:01.104 512.99
> 1234 2004-11-01 09:10:01.104 512.00
>
>
> might be adequately represented as:
>
> Account Timestamp Amount
> ------- ----------------------- -------
> 1234 2004-11-01 09:10:01.104 1024.99[/color]

That is very likely to be completely unacceptable.
[color=blue]
> In reality, at least in the financial systems I have worked with,
> transactions contain more information that this.[/color]

Yes, there is very likely to be more information: transaction type,
transaction text etc. And some of these may be different, which is
why you cannot collapse two transactions into one. But the problem is
that you would have include about every column in the table, to not
put up a roadblock for a pair of transactions that are valid for the
real-world business.
[color=blue]
> They are identified as part of a batch by a unique batch number or
> journal number which is assigned at the time the batch is generated. The
> batch number is itself a surrogate for an entity composed of something
> like (orignating_ent ity, location, datetime).[/color]

Yes, there may be such a thing. But not all transactions may be booked
in this way.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

Books Online for SQL Server SP3 at

Microsoft SQL Server

http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp

Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

**David Portas** · Jul 20 '05, 05:32 AM

Re: good table design...

> you would have include about every column in the table, to not[color=blue]
> put up a roadblock for a pair of transactions that are valid for the
> real-world business.[/color]

Q.E.D.

--
David Portas
SQL Server MVP
--

**-P-** · Jul 20 '05, 05:32 AM

Re: good table design...

I've seen models that take the use of autoincrement to the extreme, where EVERY table has a sequence column as the PK,
and that's also wrong. I'm certainly not advocating that - but I do take issue with your assertion that there is NO
PLACE for autoincrement in a relational database.

For example, our Order table for storing commercial airtime orders from Ad agencies to TV stations... There are about
30 columns that help to describe the parameters of the order, including the Advertiser_ID, an Agency_ID (both fk
references to the Name_Address table), some dates, various and sundry codes, revision counters, and some descriptive
text... You're suggesting that we find the 7 or 8 columns that combine to uniquely identify an Order (including the
aforementioned descriptive text column) and call those the primary key? And then, we get to replicate all that data on
every row of every table that references Order (roughly 20 additional tables, up to 5 levels deep in places).

This poor design is solved by creating the Order_ID column as an autoincrementin g number, and using that as the primary
key. The references to Order then use Order_ID as the foreign key. No muss, no fuss. Significantly less duplication
of data, greatly increased performance for JOIN processing, and MUCH easier to work with from a development standpoint.

-Paul-

"David Portas" <REMOVE_BEFORE_ REPLYING_dporta s@acm.org> wrote in message news:m5qdncYoJY WiOjbcRVn-gw@giganews.com ...[color=blue][color=green]
>> you would have include about every column in the table, to not
>> put up a roadblock for a pair of transactions that are valid for the
>> real-world business.[/color]
>
> Q.E.D.
>
> --
> David Portas
> SQL Server MVP
> --
>
>[/color]

**David Portas** · Jul 20 '05, 05:32 AM

Re: good table design...

> I do take issue with your assertion that there is NO PLACE for[color=blue]
> autoincrement in a relational database.[/color]

I didn't say that. Some people take the view that you should never use
"autoincrementi ng" keys but I'm not one of them.

[color=blue]
> You're suggesting that we find the 7 or 8 columns that
> combine to uniquely identify an Order[/color]

Identifying the keys is an essential part of the process of designing the
logical model anyway.

[color=blue]
> (including the aforementioned descriptive text column) and call
> those the primary key? And then, we get to replicate all that data
> on every row of every table that references Order (roughly 20
> additional tables, up to 5 levels deep in places).[/color]

No. You missed my point. Yes, create a compact surrogate key if you need to
and use that in other tables as the foreign key. But it is still essential
ALSO to declare the *natural* key columns. You don't have to duplicate the
natural key in any referencing tables but you do have to *declare* the key
in the parent table. From before: "An autoincrementin g surrogate key should
never be the only key of a table." The key word is "only". That was the
assertion that you originally disagreed with but none of what you have said
contradicts that statement so maybe we agree after all. :-)

[color=blue]
> This poor design is solved by creating the Order_ID column as an
> autoincrementin g number, and using that as the primary key. The
> references to Order then use Order_ID as the foreign key. No muss, no
> fuss. Significantly less duplication of data, greatly increased
> performance for JOIN processing, and MUCH easier to work with from a
> development standpoint.[/color]

Agreed. System-generated "autoincrementi ng" keys aren't the only option for
a surrogate key but they do have their uses. I don't have anything against
surrogate keys - only against poorly designed tables without natural keys,
mainly because in my career I've spent a lot of time identifying and fixing
problems caused by other people's weak schema designs.

--
David Portas
SQL Server MVP
--

good table design...

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment