good table design...

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Hilarion

    #16
    Re: good table design...

    > You realize an autoincremented integer, at least in MS SQL Server is[color=blue]
    > terrible for this. You can't guarantee that you won't have gaps[/color]

    What's with those gaps? I always wonder. Who cares about them? It's not
    a row number, but a key, so gaps are OK.

    Hilarion


    Comment

    • David Portas

      #17
      Re: good table design...

      You've failed to make the vital distinction between a *surrogate* key and a
      *natural* key. You have no data integrity if your data looks like this:

      id vehicle
      ----------- -----------
      1 Car
      2 Car
      3 Truck

      An autoincrementin g surrogate key should never be the only key of a table.

      --
      David Portas
      SQL Server MVP
      --


      Comment

      • -P-

        #18
        Re: good table design...

        "Greg D. Moore (Strider)" <mooregr_delete th1s@greenms.co m> wrote in message
        news:9EUpd.1446 9$1u.4180@twist er.nyroc.rr.com ...[color=blue]
        >
        > "-P-" <ent_must_die@h otmail.DOTcom> wrote in message
        > news:fqadnXa8Od VOczrcRVn-iw@adelphia.com ...
        >[color=green]
        >> This is where I would use a separate identifier as the primary key.[/color]
        > Something that will never change (an[color=green]
        >> autoincremented integer, for example), so that the "descriptio n" column[/color]
        > can contain the more volatile descriptive text.[color=green]
        >>[/color]
        >
        > You realize an autoincremented integer, at least in MS SQL Server is
        > terrible for this. You can't guarantee that you won't have gaps and you
        > can't even guarantee that the numbers will remain the same. DBCC CHECKIDENT
        > can reset things on you, copying them to another DB may completely change
        > the numbers, etc.
        >
        > You are right that the descriptions may change. I'd recommend something
        > like a partial VIN since that's described by an outside authority and pretty
        > much won't change.
        >
        >[/color]


        "Not having gaps" wasn't specified as a requirement... <G> I think gaps in identity columns are perfectly fine -
        unless you're exposing the number to the user, as with an invoice number or some other "auditable" entity. In that
        case, I wouldn't use IDENTITY.

        -P-



        Comment

        • -P-

          #19
          Re: good table design...

          "David Portas" <REMOVE_BEFORE_ REPLYING_dporta s@acm.org> wrote in message news:1P6dncjbXd CHyjXcRVn-vw@giganews.com ...[color=blue]
          > You've failed to make the vital distinction between a *surrogate* key and a *natural* key. You have no data integrity
          > if your data looks like this:
          >
          > id vehicle
          > ----------- -----------
          > 1 Car
          > 2 Car
          > 3 Truck
          >
          > An autoincrementin g surrogate key should never be the only key of a table.
          >[/color]

          Baloney... A simple unique index or constraint on the description would prevent the scenario you just described.

          "Never"? I also disagree with that statement. There are perfectly acceptable uses for that design.

          -Paul-


          Comment

          • Hilarion

            #20
            Re: good table design...

            > I agree with the reply to that post:[color=blue]
            > http://www.google.com/groups?hl=pl&l...TNGP11.phx.gbl[/color]

            I mean I agree to the post pointed by the URL, and which is a replty to the
            oryginal post.

            Hilarion


            Comment

            • David Portas

              #21
              Re: good table design...

              > A simple unique index or constraint on the description would prevent the[color=blue]
              > scenario you just described.[/color]

              That's what I was suggesting. You didn't mention the importance of declaring
              the natural key so I was just making the point for the benefit of the OP.
              [color=blue]
              > "Never"? I also disagree with that statement. There are perfectly
              > acceptable uses for that design.[/color]

              As part of an ETL process it may be acceptable in a staging table to use
              only an artificial key. Not in a relational schema. Without a natural key by
              implication you have redundant data and by definition no meaningful way to
              define the entity you are modelling.

              The cases I've come across that perhaps justify an exception to this are
              when you have an automatic, event-driven process which logs to a table
              without human intervention. It may not be feasible as part of a real-time
              logging process to ensure that a natural key is enforced. The sequence of
              events is the data you are attempting to capture but the database system may
              not support a date/timestamp of sufficient precision to guarantee that each
              event has a unique time. This is really a problem of application design
              rather than relational database design. You still generate redundant data
              but in a log that isn't always a big problem.

              --
              David Portas
              SQL Server MVP
              --


              Comment

              • Erland Sommarskog

                #22
                Re: good table design...

                Greg D. Moore (Strider) (mooregr_delete th1s@greenms.co m) writes:[color=blue]
                > You realize an autoincremented integer, at least in MS SQL Server is
                > terrible for this. You can't guarantee that you won't have gaps and you
                > can't even guarantee that the numbers will remain the same. DBCC
                > CHECKIDENT can reset things on you, copying them to another DB may
                > completely change the numbers, etc.[/color]

                Gaps are a non-issue in this case.

                That said, you are terribly lazy if you need to have auto-incremented
                ids for a simple lookup table. Better assigned the ids manually.




                --
                Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

                Books Online for SQL Server SP3 at
                Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

                Comment

                • -P-

                  #23
                  Re: good table design...

                  "David Portas" <REMOVE_BEFORE_ REPLYING_dporta s@acm.org> wrote in message news:GLudnVtpyo 3qSDXcRVn-rw@giganews.com ...[color=blue][color=green]
                  >> A simple unique index or constraint on the description would prevent the scenario you just described.[/color]
                  >
                  > That's what I was suggesting. You didn't mention the importance of declaring the natural key so I was just making the
                  > point for the benefit of the OP.
                  >[color=green]
                  >> "Never"? I also disagree with that statement. There are perfectly acceptable uses for that design.[/color]
                  >
                  > As part of an ETL process it may be acceptable in a staging table to use only an artificial key. Not in a relational
                  > schema. Without a natural key by implication you have redundant data and by definition no meaningful way to define the
                  > entity you are modelling.
                  >
                  > The cases I've come across that perhaps justify an exception to this are when you have an automatic, event-driven
                  > process which logs to a table without human intervention. It may not be feasible as part of a real-time logging
                  > process to ensure that a natural key is enforced. The sequence of events is the data you are attempting to capture but
                  > the database system may not support a date/timestamp of sufficient precision to guarantee that each event has a unique
                  > time. This is really a problem of application design rather than relational database design. You still generate
                  > redundant data but in a log that isn't always a big problem.
                  >[/color]


                  We still disagree. I would never use autoincrement in situations where a natural key was evident and available, but I
                  still contend that there is a place for autoincrement in the relational model. What about entities that have no
                  identifiable "natural" key? We have several in our model, and had to invent an identification scheme for them.

                  --
                  Paul Horan


                  Comment

                  • David Portas

                    #24
                    Re: good table design...

                    > What about entities that have no identifiable "natural" key? We have[color=blue]
                    > several in our model, and had to invent an identification scheme for them.[/color]

                    There are two separate issues and maybe neither of us have spelt them out
                    well enough.

                    1) When you model an entity in a relational database it must always have a
                    natural key by definition (= "a subset of the attributes that uniquely
                    identifies a row"), otherwise you have redundancy and no integrity. For
                    example that key could be the vehicle description in Toedipper's case. If
                    you have a table without a natural key then someone has failed to identify
                    the entity properly in the logical model. However the natural key may not
                    necessarily be convenient for use as a foreign key (because of storage or
                    performance considerations for example).

                    2) In cases where the data changes infrequently it may additionally be
                    desirable to create your own user-assigned surrogate key, such as "C" for
                    "Car" for example. In other cases, a system-assigned surrogate (such as
                    IDENTITY in SQL Server) is often used. Surrogate keys are not a substitute
                    for a natural key, which should still be declared as a key of the table.
                    Unfortunately system-assigned "row-identifier" surrogates are too often used
                    carelessly by those who fail to design a proper logical model and think they
                    don't need real keys. System-assigned surrogates are not part of the logical
                    data model at all - they are part of the physical implementation.

                    There is of course room for a great deal of debate about the wisdom or
                    otherwise of using system-assigned surrogate keys at all. There are
                    reasonable arguments on both sides and I don't want to go over that debate
                    again here. What isn't usually disputed is that a surrogate key should never
                    be the ONLY key of a table. As soon as you compromise that principle your
                    data model is lost and you have big logical problems with data integrity and
                    often insurmountable practical problems when it comes to getting meaningful
                    information from the data.

                    --
                    David Portas
                    SQL Server MVP
                    --


                    Comment

                    • Erland Sommarskog

                      #25
                      Re: good table design...

                      David Portas (REMOVE_BEFORE_ REPLYING_dporta s@acm.org) writes:[color=blue]
                      > There is of course room for a great deal of debate about the wisdom or
                      > otherwise of using system-assigned surrogate keys at all. There are
                      > reasonable arguments on both sides and I don't want to go over that
                      > debate again here. What isn't usually disputed is that a surrogate key
                      > should never be the ONLY key of a table. As soon as you compromise that
                      > principle your data model is lost and you have big logical problems with
                      > data integrity and often insurmountable practical problems when it comes
                      > to getting meaningful information from the data.[/color]

                      I would say that there certainly are cases where some sort of system-
                      generated key is the only possible key. Take for instance a table with
                      account transactions. You can fairly well describe a transaction by using
                      account number, date and time of day. But there may be two transactions
                      for the same account in the same millisecond, so those three alone cannot
                      make a key. You can then try to find some constraint that distinguishes
                      two transactions that happen at the same time. (Typically they would be
                      generated by some batch process.) But you would then only trying to intro-
                      duce a constraint that has no relation to business rules, and one day
                      you will get a failure for a perfectly valid transaction, because it
                      did not fit into the squared model. That's when a surrogate key is the
                      way to go.


                      --
                      Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

                      Books Online for SQL Server SP3 at
                      Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

                      Comment

                      • David Portas

                        #26
                        Re: good table design...

                        That depends on how you define the transaction entity. For example:

                        Account Timestamp Amount
                        ------- ----------------------- -------
                        1234 2004-11-01 09:10:01.104 512.99
                        1234 2004-11-01 09:10:01.104 512.00


                        might be adequately represented as:

                        Account Timestamp Amount
                        ------- ----------------------- -------
                        1234 2004-11-01 09:10:01.104 1024.99


                        Or even:

                        Account Timestamp Amount Trancount
                        ------- ----------------------- ------- ---------
                        1234 2004-11-01 09:10:01.104 1024.99 2


                        In reality, at least in the financial systems I have worked with,
                        transactions contain more information that this. They are identified as part
                        of a batch by a unique batch number or journal number which is assigned at
                        the time the batch is generated. The batch number is itself a surrogate for
                        an entity composed of something like (orignating_ent ity, location,
                        datetime).

                        --
                        David Portas
                        SQL Server MVP
                        --


                        Comment

                        • Erland Sommarskog

                          #27
                          Re: good table design...

                          David Portas (REMOVE_BEFORE_ REPLYING_dporta s@acm.org) writes:[color=blue]
                          > That depends on how you define the transaction entity. For example:
                          >
                          > Account Timestamp Amount
                          > ------- ----------------------- -------
                          > 1234 2004-11-01 09:10:01.104 512.99
                          > 1234 2004-11-01 09:10:01.104 512.00
                          >
                          >
                          > might be adequately represented as:
                          >
                          > Account Timestamp Amount
                          > ------- ----------------------- -------
                          > 1234 2004-11-01 09:10:01.104 1024.99[/color]

                          That is very likely to be completely unacceptable.
                          [color=blue]
                          > In reality, at least in the financial systems I have worked with,
                          > transactions contain more information that this.[/color]

                          Yes, there is very likely to be more information: transaction type,
                          transaction text etc. And some of these may be different, which is
                          why you cannot collapse two transactions into one. But the problem is
                          that you would have include about every column in the table, to not
                          put up a roadblock for a pair of transactions that are valid for the
                          real-world business.
                          [color=blue]
                          > They are identified as part of a batch by a unique batch number or
                          > journal number which is assigned at the time the batch is generated. The
                          > batch number is itself a surrogate for an entity composed of something
                          > like (orignating_ent ity, location, datetime).[/color]

                          Yes, there may be such a thing. But not all transactions may be booked
                          in this way.


                          --
                          Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

                          Books Online for SQL Server SP3 at
                          Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

                          Comment

                          • David Portas

                            #28
                            Re: good table design...

                            > you would have include about every column in the table, to not[color=blue]
                            > put up a roadblock for a pair of transactions that are valid for the
                            > real-world business.[/color]

                            Q.E.D.

                            --
                            David Portas
                            SQL Server MVP
                            --


                            Comment

                            • -P-

                              #29
                              Re: good table design...

                              I've seen models that take the use of autoincrement to the extreme, where EVERY table has a sequence column as the PK,
                              and that's also wrong. I'm certainly not advocating that - but I do take issue with your assertion that there is NO
                              PLACE for autoincrement in a relational database.

                              For example, our Order table for storing commercial airtime orders from Ad agencies to TV stations... There are about
                              30 columns that help to describe the parameters of the order, including the Advertiser_ID, an Agency_ID (both fk
                              references to the Name_Address table), some dates, various and sundry codes, revision counters, and some descriptive
                              text... You're suggesting that we find the 7 or 8 columns that combine to uniquely identify an Order (including the
                              aforementioned descriptive text column) and call those the primary key? And then, we get to replicate all that data on
                              every row of every table that references Order (roughly 20 additional tables, up to 5 levels deep in places).

                              This poor design is solved by creating the Order_ID column as an autoincrementin g number, and using that as the primary
                              key. The references to Order then use Order_ID as the foreign key. No muss, no fuss. Significantly less duplication
                              of data, greatly increased performance for JOIN processing, and MUCH easier to work with from a development standpoint.

                              -Paul-


                              "David Portas" <REMOVE_BEFORE_ REPLYING_dporta s@acm.org> wrote in message news:m5qdncYoJY WiOjbcRVn-gw@giganews.com ...[color=blue][color=green]
                              >> you would have include about every column in the table, to not
                              >> put up a roadblock for a pair of transactions that are valid for the
                              >> real-world business.[/color]
                              >
                              > Q.E.D.
                              >
                              > --
                              > David Portas
                              > SQL Server MVP
                              > --
                              >
                              >[/color]


                              Comment

                              • David Portas

                                #30
                                Re: good table design...

                                > I do take issue with your assertion that there is NO PLACE for[color=blue]
                                > autoincrement in a relational database.[/color]

                                I didn't say that. Some people take the view that you should never use
                                "autoincrementi ng" keys but I'm not one of them.

                                [color=blue]
                                > You're suggesting that we find the 7 or 8 columns that
                                > combine to uniquely identify an Order[/color]

                                Identifying the keys is an essential part of the process of designing the
                                logical model anyway.

                                [color=blue]
                                > (including the aforementioned descriptive text column) and call
                                > those the primary key? And then, we get to replicate all that data
                                > on every row of every table that references Order (roughly 20
                                > additional tables, up to 5 levels deep in places).[/color]

                                No. You missed my point. Yes, create a compact surrogate key if you need to
                                and use that in other tables as the foreign key. But it is still essential
                                ALSO to declare the *natural* key columns. You don't have to duplicate the
                                natural key in any referencing tables but you do have to *declare* the key
                                in the parent table. From before: "An autoincrementin g surrogate key should
                                never be the only key of a table." The key word is "only". That was the
                                assertion that you originally disagreed with but none of what you have said
                                contradicts that statement so maybe we agree after all. :-)

                                [color=blue]
                                > This poor design is solved by creating the Order_ID column as an
                                > autoincrementin g number, and using that as the primary key. The
                                > references to Order then use Order_ID as the foreign key. No muss, no
                                > fuss. Significantly less duplication of data, greatly increased
                                > performance for JOIN processing, and MUCH easier to work with from a
                                > development standpoint.[/color]

                                Agreed. System-generated "autoincrementi ng" keys aren't the only option for
                                a surrogate key but they do have their uses. I don't have anything against
                                surrogate keys - only against poorly designed tables without natural keys,
                                mainly because in my career I've spent a lot of time identifying and fixing
                                problems caused by other people's weak schema designs.


                                --
                                David Portas
                                SQL Server MVP
                                --


                                Comment

                                Working...