What datatype to use for PK?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • David Portas

    #16
    Re: What datatype to use for PK?

    Conceptually the model is a "bag" rather than a "set" i.e. duplicates
    are allowed (the GUID isn't part of the logical model so it doesn't
    count). A bag may be useful for a table in a "staging" database as part
    of a data transformation process. The transformation then reconciles
    and eliminates duplicates so that integrity is maintained.

    --
    David Portas
    SQL Server MVP
    --

    Comment

    • Erland Sommarskog

      #17
      Re: What datatype to use for PK?

      David Portas (REMOVE_BEFORE_ REPLYING_dporta s@acm.org) writes:[color=blue]
      > A natural key is simply a product of the way some data is represented
      > in a table - a subset of the attributes that you need to record. In the
      > case of an online order that might be (user,order_dat etime) for
      > example. A natural key is never "generated" internally because it
      > always represents something that is "outside the system" - that is the
      > critical distinction between a natural and artificial key.[/color]

      The natural key for an order - the way users would refer to it - is
      the order id. In a paper-based computer-less system, it would be a
      pre-printed number on an order book. In a computer-based system, the
      order id is generated by the system at the time for the order.

      Involving userid is likely to be out of the question in many middle-
      tier apps - all orders from the same user. Customer may do better, but
      it may be perfectly legal for the same customer to place two orders at
      the same time. For instance, the customer may be a company, and two
      different departments are making different orders.

      And if even if you could construct this natural key, I can tell you that the
      users blissfully will ignore it. What they need is an order id, and you
      must somehow report that back. In a web app, you sometimes want to avoid
      an extra rountrip to get that number. Sending down a GUID from the client
      is one way to handle it. A GUID is a poor key for usability, but if it's
      reported back to another app in a business-to-business solution, this is
      less of an issue.

      The idea with a "natural key" as being something "outside the system" may
      sound nice, but it does not model the reality for real-life system.

      --
      Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

      Books Online for SQL Server SP3 at
      Accelerate your AI application's time to market by harnessing the power of your own data and the built-in AI capabilities of SQL Server 2025, the enterprise database with best-in-class security, performance and availability.

      Comment

      • David Portas

        #18
        Re: What datatype to use for PK?

        > Involving userid is likely to be out of the question in many middle-[color=blue]
        > tier apps - all orders from the same user[/color]

        True. There may be a difference between "user" and "customer". Customer
        is what I meant. Customers do indeed require an order number generated
        by the system - no problem there.
        [color=blue]
        > it may be perfectly legal for the same customer to place two orders[/color]
        at[color=blue]
        > the same time. For instance, the customer may be a company, and two
        > different departments are making different orders.[/color]

        Say, two orders are placed simultaneously by different entities
        (departments or whatever) and you don't record the information that
        distinguishes them then how will you know which is which? The
        artificial order number doesn't tell you that. In your hypothetical
        scenario wouldn't the order confirmation emails go to the same address?
        So even the customer's Accounts Payable department wouldn't know which
        items on the order belonged to which department unless the end-user
        recorded that information separately. Is there a benefit in issuing
        multiple order numbers in this scenario? I'm not sure, but if there
        were and you wanted to persist information about different, arbitrary
        "sub-groups" of order items with all other attributes of the order
        identical then in 3NF that information belongs as an attribute in the
        Order Details table, not as a duplicate in the Order table. That's
        unless the "department " attribute was recorded as well, in which case
        Department would obviously become part of the Order table's key.
        [color=blue]
        > The idea with a "natural key" as being something "outside the system"[/color]
        may[color=blue]
        > sound nice, but it does not model the reality for real-life system.[/color]

        I thought "real-life" was "outside the system" but maybe your
        definition of reality differs from mine. :-)

        --
        David Portas
        SQL Server MVP
        --

        Comment

        • Mike Sherrill

          #19
          Re: What datatype to use for PK?

          On 10 Feb 2005 03:28:00 -0800, "David Portas"
          <REMOVE_BEFORE_ REPLYING_dporta s@acm.org> wrote:
          [color=blue]
          >Conceptually the model is a "bag" rather than a "set" i.e. duplicates
          >are allowed (the GUID isn't part of the logical model so it doesn't
          >count).[/color]

          Not exactly where I was going, but it's in the same direction. I was
          using "conceptual model" in the sense Halpin uses it in ORM. And I'd
          still like Tal to post it.

          --
          Mike Sherrill
          Information Management Systems

          Comment

          • Don Vaillancourt

            #20
            Re: What datatype to use for PK?

            Well no, the PK would be an integer.

            Greg D. Moore (Strider) wrote:[color=blue]
            > "Don Vaillancourt" <donv@webimpact .com> wrote in message
            > news:CisOd.9068 8$vO1.567838@nn rp1.uunet.ca...
            >[color=green]
            >>Over the years I have always used the decimal(18,0) as the datatype for
            >>primary keys. Aside from the number of significant numbers involved,
            >>would BigInt type be better for performance or is decimal(18,0) still[/color]
            >
            > okay.
            >
            > I think it depends on what your primary key is. If it's an ISBN for
            > example, I don't think either of these would work.
            >
            >
            >[/color]


            --
            Don Vaillancourt
            Director of Software Development
            WEB IMPACT INC.
            phone: 416-815-2000 ext. 245
            fax: 416-815-2001
            email: donv@web-impact.com <mailto:donv@we bimpact.com>
            web: http://www.web-impact.com
            Web Impact Inc. <http://www.web-impact.com>
            This email message is intended only for the addressee(s) and contains
            information that may be confidential and/or copyright.

            If you are not the intended recipient please notify the sender by reply
            email and immediately delete this email.

            Use, disclosure or reproduction of this email by anyone other than the
            intended recipient(s) is strictly prohibited. No representation is made
            that this email or any attachments are free of viruses. Virus scanning
            is recommended and is the responsibility of the recipient.

            Comment

            • --CELKO--

              #21
              Re: What datatype to use for PK?

              >> Over the years I have always used the decimal(18,0) as the datatype
              for primary keys. <<

              That's wrong! The best key is the 17-digit Hebrew number that God puts
              on the bottom of everything in creation!
              Sounds pretty silly, doesn't it? But this the same thing as your
              question.

              There is no "magical, Universal, one-size-fits-all" key. There are
              industry standards and natural keys, which you discover with research.
              There are techniques for designing keys when the research fails --
              check digits, grep patterns, validation rules, etc.

              If you have been blindly writing DECIMAL(18,0) as the key on your
              tables, then you probably have never actually designed an RDBMS.
              Instead, you have been faking pointer chains and have not had any
              relational keys.

              You might want to get a course in the basics and learn exactly what a
              key is. I would also do a full data audit on what you have now.

              Comment

              • tal_mcmahon@hotmail.com

                #22
                Re: What datatype to use for PK?

                Ok,
                I think my example of address was taken a bit to literally. I used it
                only as an example. Where I was going with this is that i can have a
                table that I can mix foreign keys from different parents. Before you
                fall over or start the flame war, here is a system, There are reports,
                there are Contacts, Vehicles, and Property all attached to these
                reports. All of these (reports,contac ts,vehicles, and property) can
                have multiple images attached to them. I could have:

                1. Seperate tables for ReportImages,Co ntactImages,Veh icleImages, and
                PropertyImages all with there own pointers to the addresses.
                I chose not to use this as it would mean 4 times the stored
                procedures etc.

                2. a single table with a column for parentID (int), and
                ParentType(int) .
                I chose against this as most of the UI controls work better
                with one value

                3. a single table with a column for parentID (GUID)
                i chose this method because I could select from one place
                using one Column as the criteria.
                This method also allowed me to find images based upon their
                description(var char(250)) no matter what parent they were associated
                with. Sometimes a Smashed vehicle picture is attached to the report
                not the vehicle and vice versa. Once I find the image then I can find
                the parent record as needed.

                well there is my defense hope it holds up.

                tal

                Comment

                • Don Vaillancourt

                  #23
                  Re: What datatype to use for PK?

                  I have been designing databases for about 5 years. And from a
                  structural stand-point my designs have always been excellent. I have
                  always been able to add tables at a later time without falling into a
                  hole wishing I had designed the schema differently.

                  But, yes, I do lack knowledge in the area of what is the best types to
                  use for performance reasons. But I have never been corrected in my
                  selection of PK types even by well experienced DBAs.

                  What I am really interested in is if MSSQL (even, Oracle, MySQL, etc)
                  handles decimal differently than integer when generating keys and such.

                  Will an 8-byte BIGINT perform better than a 9-byte Decimal(18,0) when
                  dealing with millions of rows.

                  I wouldn't know how to test this since the time it would take to perform
                  such a test would probably differ in the nano-seconds if not
                  insignificant milli-seconds. Which is still even harder to test on a
                  multitasking OS when your result time is certainly always different.



                  --CELKO-- wrote:[color=blue][color=green][color=darkred]
                  >>>Over the years I have always used the decimal(18,0) as the datatype[/color][/color]
                  >
                  > for primary keys. <<
                  >
                  > That's wrong! The best key is the 17-digit Hebrew number that God puts
                  > on the bottom of everything in creation!
                  > Sounds pretty silly, doesn't it? But this the same thing as your
                  > question.
                  >
                  > There is no "magical, Universal, one-size-fits-all" key. There are
                  > industry standards and natural keys, which you discover with research.
                  > There are techniques for designing keys when the research fails --
                  > check digits, grep patterns, validation rules, etc.
                  >
                  > If you have been blindly writing DECIMAL(18,0) as the key on your
                  > tables, then you probably have never actually designed an RDBMS.
                  > Instead, you have been faking pointer chains and have not had any
                  > relational keys.
                  >
                  > You might want to get a course in the basics and learn exactly what a
                  > key is. I would also do a full data audit on what you have now.
                  >[/color]


                  --
                  Don Vaillancourt
                  Director of Software Development
                  WEB IMPACT INC.
                  phone: 416-815-2000 ext. 245
                  fax: 416-815-2001
                  email: donv@web-impact.com <mailto:donv@we bimpact.com>
                  web: http://www.web-impact.com
                  Web Impact Inc. <http://www.web-impact.com>
                  This email message is intended only for the addressee(s) and contains
                  information that may be confidential and/or copyright.

                  If you are not the intended recipient please notify the sender by reply
                  email and immediately delete this email.

                  Use, disclosure or reproduction of this email by anyone other than the
                  intended recipient(s) is strictly prohibited. No representation is made
                  that this email or any attachments are free of viruses. Virus scanning
                  is recommended and is the responsibility of the recipient.

                  Comment

                  • Erland Sommarskog

                    #24
                    Re: What datatype to use for PK?

                    David Portas (REMOVE_BEFORE_ REPLYING_dporta s@acm.org) writes:[color=blue]
                    > Say, two orders are placed simultaneously by different entities
                    > (departments or whatever) and you don't record the information that
                    > distinguishes them then how will you know which is which? The
                    > artificial order number doesn't tell you that. In your hypothetical
                    > scenario wouldn't the order confirmation emails go to the same address?
                    > So even the customer's Accounts Payable department wouldn't know which
                    > items on the order belonged to which department unless the end-user
                    > recorded that information separately. Is there a benefit in issuing
                    > multiple order numbers in this scenario? I'm not sure, but if there
                    > were and you wanted to persist information about different, arbitrary
                    > "sub-groups" of order items with all other attributes of the order
                    > identical then in 3NF that information belongs as an attribute in the
                    > Order Details table, not as a duplicate in the Order table. That's
                    > unless the "department " attribute was recorded as well, in which case
                    > Department would obviously become part of the Order table's key.[/color]

                    Most likely there is information to tell the orders apart, or if
                    they are duplicates. But that information is likely to be both
                    unpractical and uninteresting to use to identify the order.
                    [color=blue][color=green]
                    >> The idea with a "natural key" as being something "outside the system"
                    >> may sound nice, but it does not model the reality for real-life system.[/color]
                    >
                    > I thought "real-life" was "outside the system" but maybe your
                    > definition of reality differs from mine. :-)[/color]

                    Yes, many real-life system lives outside the system known as theory.
                    All systems that stores customers for instance - another concept that
                    does not have any useful natural key...


                    --
                    Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

                    Books Online for SQL Server SP3 at
                    Accelerate your AI application's time to market by harnessing the power of your own data and the built-in AI capabilities of SQL Server 2025, the enterprise database with best-in-class security, performance and availability.

                    Comment

                    • David Portas

                      #25
                      Re: What datatype to use for PK?

                      I differ on whether natural keys are "unpractica l", "uninterest ing" and
                      "not useful". There are plenty of scenarios in which natural keys are
                      not only useful but essential - data integration betweeen heterogeneous
                      data sources for example.

                      At least we do after all seem to agree that, with the right model,
                      natural keys always exist - whether you choose to implement them or not
                      is simply a decision to be made at design time.

                      --
                      David Portas
                      SQL Server MVP
                      --

                      Comment

                      • Erland Sommarskog

                        #26
                        Re: What datatype to use for PK?

                        David Portas (REMOVE_BEFORE_ REPLYING_dporta s@acm.org) writes:[color=blue]
                        > I differ on whether natural keys are "unpractica l", "uninterest ing" and
                        > "not useful". There are plenty of scenarios in which natural keys are
                        > not only useful but essential - data integration betweeen heterogeneous
                        > data sources for example.[/color]

                        I certainly don't wish to imply that natural keys never are useful. There
                        are certainly cases where they are.

                        But I like to point out that what is a natural key for one system, often
                        is key generated by another system, and thus originally an artificial key.
                        In fact, this can happen within a system as well. If I identify a position
                        by account number and instrument id, that is a natural key composed by
                        two artificial keys.
                        [color=blue]
                        > At least we do after all seem to agree that, with the right model,
                        > natural keys always exist - whether you choose to implement them or not
                        > is simply a decision to be made at design time.[/color]

                        Certainly a disagreement on "right". I claim that "customer" is a concept
                        that in many situations does not have a natural key. Even if you in could
                        in theory find information that you could use as a natural key, it would
                        be a gross error of system design to include in your database. And even
                        if you would include it, it would be very difficult to verify that you
                        actually have the correct values in your database.

                        --
                        Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

                        Books Online for SQL Server SP3 at
                        Accelerate your AI application's time to market by harnessing the power of your own data and the built-in AI capabilities of SQL Server 2025, the enterprise database with best-in-class security, performance and availability.

                        Comment

                        • David Portas

                          #27
                          Re: What datatype to use for PK?

                          You are correct that Customer data can pose some big challenges of data
                          verification. However, it would be an error NOT to enforce keys on this
                          data in my experience. Worst case is that you duplicate all the
                          customer attributes and allocate a new artifical key. In that scenario
                          there is likely to be little or no hope of ever determining which row
                          represents which physical customer or ensuring that referencing rows
                          point to the "right" customer. In other words you gain *nothing* from
                          having a redundant row in your table and you lose a lot from not having
                          a natural key. Use constraints to prevent duplicates and verify data at
                          entry or as part of a data integration process. There is plenty of
                          software out there to help verify and manage customer databases. It may
                          never be 100% accurate but it will be more useful and manageable with a
                          key than without one.

                          --
                          David Portas
                          SQL Server MVP
                          --

                          Comment

                          • Erland Sommarskog

                            #28
                            Re: What datatype to use for PK?

                            David Portas (REMOVE_BEFORE_ REPLYING_dporta s@acm.org) writes:[color=blue]
                            > You are correct that Customer data can pose some big challenges of data
                            > verification. However, it would be an error NOT to enforce keys on this
                            > data in my experience. Worst case is that you duplicate all the
                            > customer attributes and allocate a new artifical key. In that scenario
                            > there is likely to be little or no hope of ever determining which row
                            > represents which physical customer or ensuring that referencing rows
                            > point to the "right" customer. In other words you gain *nothing* from
                            > having a redundant row in your table and you lose a lot from not having
                            > a natural key. Use constraints to prevent duplicates and verify data at
                            > entry or as part of a data integration process. There is plenty of
                            > software out there to help verify and manage customer databases. It may
                            > never be 100% accurate but it will be more useful and manageable with a
                            > key than without one.[/color]

                            Assume that you run a web shop. How do you verify new customers? Well,
                            many web shops seems to use a user id or the mail address. When a new
                            customer register, they should probably check that some other crucial
                            attrbutes are not already in use by other customers, such as e-mail address
                            (if you use a user id), national registration number and credit-card number.
                            However, few business would require you to register all three. In any case,
                            if a customer prefers to register a second time, there is very little you
                            can do to stop him.

                            For a system where the customers are registered by staff through a GUI, you
                            can add some checks to the registration form. And there is a whole lot of
                            point of having the checks there and not in the database, because such
                            checks can help to prevent the user from entering data by mistake, but
                            not stop him from entering data he actually has to enter. It's naïve to
                            believe that you thought of each and every case when you designed your
                            database.

                            For instance, in our system there is at one case where users will have to
                            register a customer a second time, and that is if the customers changes tax
                            countries in the middle of the year. Yes, this can be construed as a
                            weakness of our system that we don't keep a history of tax country.

                            Then again, the driving force when we develop our system is what our
                            customer are prepared to pay for, and not fulfilment of theoretic rules.

                            --
                            Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

                            Books Online for SQL Server SP3 at
                            Accelerate your AI application's time to market by harnessing the power of your own data and the built-in AI capabilities of SQL Server 2025, the enterprise database with best-in-class security, performance and availability.

                            Comment

                            • David Portas

                              #29
                              Re: What datatype to use for PK?

                              > Assume that you run a web shop. How do you verify new customers?
                              Well,[color=blue]
                              > many web shops seems to use a user id or the mail address. When a new[/color]
                              [color=blue]
                              > customer register, they should probably check that some other crucial[/color]
                              [color=blue]
                              > attrbutes are not already in use by other customers, such as e-mail[/color]
                              address[color=blue]
                              > (if you use a user id), national registration number and credit-card[/color]
                              number.[color=blue]
                              > However, few business would require you to register all three. In any[/color]
                              case,[color=blue]
                              > if a customer prefers to register a second time, there is very little[/color]
                              you[color=blue]
                              > can do to stop him.[/color]

                              Isn't email address the key here? Sites I've used don't let me register
                              a second time with the same address and that seems like the only
                              sensible policy if the site issues its own user ID to the user. If the
                              user supplies his own user login name then that might be preferred to
                              email address as a key. In either case, I don't see a problem.
                              [color=blue]
                              > Then again, the driving force when we develop our system is what our
                              > customer are prepared to pay for, and not fulfilment of theoretic[/color]
                              rules.

                              If the customer won't pay for it that is indeed a good reason not to do
                              it! Of course they may end up paying more in the end for you to come
                              back and fix it :-)

                              --
                              David Portas
                              SQL Server MVP
                              --

                              Comment

                              • Erland Sommarskog

                                #30
                                Re: What datatype to use for PK?

                                David Portas (REMOVE_BEFORE_ REPLYING_dporta s@acm.org) writes:[color=blue]
                                > Isn't email address the key here? Sites I've used don't let me register
                                > a second time with the same address and that seems like the only
                                > sensible policy if the site issues its own user ID to the user. If the
                                > user supplies his own user login name then that might be preferred to
                                > email address as a key. In either case, I don't see a problem.[/color]

                                Sure, you can use an email as a key! But the difference between an
                                email address or a system-generated customer number is slim. Both are
                                unverifiable combinations of random bits. What you can verify with
                                an email address is that it has a an @ and a . and ends in a known domain.
                                Of course, you can send out a mail and ask for acknowledge before you
                                create the user. But that mail address could disappear just seconds
                                later.

                                One advantage with a customer number is that if I change my mail address,
                                I don't have to register as a new customer at the site.

                                What is really horrible is system that uses users names that looks like
                                email addresses, but aren't. This means that someone who feels like can
                                use someone else's mail address as his user name.


                                --
                                Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

                                Books Online for SQL Server SP3 at
                                Accelerate your AI application's time to market by harnessing the power of your own data and the built-in AI capabilities of SQL Server 2025, the enterprise database with best-in-class security, performance and availability.

                                Comment

                                Working...