What are cons and pros for using IDENTITY property as PK in SQL SERVER 2000?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • David Portas

    #16
    Re: What are cons and pros for using IDENTITY property as PK in SQL SERVER 2000?

    IDENTITY may be a *surrogate* key but it isn't a *natural* key because it
    bears no relation to the entity that you are modelling in your table. A
    natural primary key is a subset of the attributes of an entity which
    uniquely identify that entity. IDENTITY clearly isn't an attribute of any
    real entity - it's just an arbitrary number.

    Taking your Vendors table as an example, a naive design might look like
    this:

    CREATE TABLE Vendors (vendor_id INTEGER IDENTITY PRIMARY KEY /* ?? */,
    vendor_name VARCHAR(40) NOT NULL, vendor_tax_id VARCHAR(10) NOT NULL, ...)

    But this table has no uniqueness or integrity because multiple vendors can
    exist with different (arbitrary) vendor_ids. It may well work internally for
    a particular application but will break when someone needs to do some real
    analysis on your data. And what if you need to combine it with data from
    another system that doesn't have that same magical Vendor_id column?

    Here's a better alternative:

    CREATE TABLE Vendors (vendor_id INTEGER NOT NULL UNIQUE /* surrogate */,
    vendor_name VARCHAR(40) NOT NULL UNIQUE, vendor_tax_id VARCHAR(10) NOT NULL
    PRIMARY KEY, ...)

    By declaring UNIQUE / PK constraints on the correct attributes you can
    ensure that you have verifiably unique data. Keep the surrogate key if you
    like but make sure you declare the Natural key as well. (Moving the actual
    PK declaration is essentially cosmetic - PK is equivalent to NOT NULL UNIQUE
    and it's not unusual to have several NOT NULL UNIQUE keys in a table).

    Here's Celko on keys:

    Explore the latest news and expert commentary on software and services, brought to you by the editors of InformationWeek


    --
    David Portas
    ------------
    Please reply only to the newsgroup
    --


    Comment

    • Steve Kass

      #17
      Re: What are cons and pros for using IDENTITY property as PK in SQLSERVER 2000?



      Aaron Bertrand - MVP wrote:
      [color=blue][color=green]
      >>I am by no means a SQL expert, so forgive me if this seems ignorant...But
      >>why can't the ID columm be a natural key? For example, I am working on a
      >>project that has a vendors table. The list of vendors is used in
      >>relationshi p to several other tables. We build this table with an identity
      >>
      >>[/color]
      >
      >An identity value that is generated by the system is not "natural".. . a
      >natural key means that the key is, by nature, identifying a single row...
      >not artificially because you generated some value for it. A natural key
      >could be an e-mail address, or a social security number, or a license plate
      >number, or a latitude and longitude -- something that is part of the data
      >that also happens to uniquely identify it.
      >[/color]
      Quiz: Classify each key below as "natural" or "artificial ":

      Northwind..Cust omers.CustomerI D
      Northwind..Orde rs.OrderID
      Northwind..Terr itories.Territo ryID

      Most keys are fundamentally artificial, but somehow we only call them
      artificial if we made them up, not if someone else made them up. Social
      Security numbers are probably nothing different than identity values in
      someone elses database, which doesn't make them any more intrinsic to
      living people than VendorID values. I don't think an identity VendorID
      value is any more artificial than any other unique way of identifying a
      vendor, so long as it is assigned the first time a vendor enters the
      system and is never changed.

      I think this whole natural/artifical distinction is mostly quite silly.
      In a well-designed database, entities can just as easily be identified
      by an integer they are assigned when they enter the system as they can
      by some set of attributes within the data they enter the system with.
      The arguments against using identity values always seem to be arguments
      against the ways people misuse identity values.

      The advantage of keys like Northwind..Cust omers.CustomerI D is really an
      error-correction issue. If carefully chosen, those 5-character keys can
      be recovered if a single letter is mistyped. But this can be done with
      artificial values also, if check digits or other error-correcting
      schemes are used.

      SK
      [color=blue]
      >
      >Keep in mind that a primary key does not have to a natural key. I don't see
      >any problems with using an IDENTITY as a primary key, such as in your case.
      >But in your case it is not a natural key.
      >
      >
      >[/color]

      Comment

      • David Portas

        #18
        Re: What are cons and pros for using IDENTITY property as PK in SQL SERVER 2000?

        > living people than VendorID values. I don't think an identity VendorID[color=blue]
        > value is any more artificial than any other unique way of identifying a
        > vendor, so long as it is assigned the first time a vendor enters the
        > system and is never changed.[/color]

        But the point is that a "natural" key is verifiable outside of the system.

        When I see a NG post from "skass[at]drew.edu" I don't care whether that's
        based on your "real" name or even whether S.Kass is the same name as on your
        passport or driver's licence. What's important to me is that it's determined
        by a consistent method outside of the system which gives me some acceptable
        degree of confidence that you're the same person who posted here as
        "skass[at]drew.edu" yesterday. Of course that validity is destroyed if you
        change your email address or if someone spoofs your address. But it's still
        intrinsically better than an arbitrary ID allocated by the server.

        --
        David Portas
        ------------
        Please reply only to the newsgroup
        --


        Comment

        • Aaron Bertrand - MVP

          #19
          Re: What are cons and pros for using IDENTITY property as PK in SQL SERVER 2000?

          > artificial if we made them up, not if someone else made them up. Social[color=blue]
          > Security numbers are probably nothing different than identity values in
          > someone elses database,[/color]

          Right, but this is a centralized and controlled database. Nobody else has
          my SSN (though there are exceptions, e.g. someday SSNs for deceased people
          will have to be re-used); anybody else who uses it to identify themselves is
          likely attempting fraud / identity theft. I consider it a "natural" key
          because I supply it to the database, rather than the other way around.

          Whereas my customerID according to Barnes & Noble is very unlikely to be the
          same as my customerID at J.Crew.

          In any case, I do agree that the distinction is largely silly, especially
          when it erupts into arguments and "but Celko says..." nonsense. Like many
          other things in the database world, the choice of a key is not dictated by
          some higher power, but is rather situation-dependent.

          --
          Aaron Bertrand
          SQL Server MVP
          Please contact this domain's administrator as their DNS Made Easy services have expired.



          Comment

          • Bob Badour

            #20
            Re: What are cons and pros for using IDENTITY property as PK in SQL SERVER 2000?

            "David Portas" <REMOVE_BEFORE_ REPLYING_dporta s@acm.org> wrote in message
            news:e-qdnT5gCqKOTFGiR Vn-iQ@giganews.com ...[color=blue][color=green]
            > > living people than VendorID values. I don't think an identity VendorID
            > > value is any more artificial than any other unique way of identifying a
            > > vendor, so long as it is assigned the first time a vendor enters the
            > > system and is never changed.[/color]
            >
            > But the point is that a "natural" key is verifiable outside of the system.[/color]

            As soon as one records the generated identity value outside the system, the
            key is verifiable outside of the system. All keys are surrogates or
            artificial keys. Natural keys are nothing more than familiar surrogates.


            Comment

            • Steve Kass

              #21
              Re: What are cons and pros for using IDENTITY property as PK in SQLSERVER 2000?



              David Portas wrote:
              [color=blue][color=green]
              >>living people than VendorID values. I don't think an identity VendorID
              >>value is any more artificial than any other unique way of identifying a
              >>vendor, so long as it is assigned the first time a vendor enters the
              >>system and is never changed.
              >>
              >>[/color]
              >
              >But the point is that a "natural" key is verifiable outside of the system.
              >[/color]
              I guess it depends on where you draw the boundaries of "the system." If
              you need something outside of "the system" to verify your key, aren't
              you just working within a larger system (and one that is not entirely
              under your control)?
              [color=blue]
              >
              >When I see a NG post from "skass[at]drew.edu" I don't care whether that's
              >based on your "real" name or even whether S.Kass is the same name as on your
              >passport or driver's licence. What's important to me is that it's determined
              >by a consistent method outside of the system which gives me some acceptable
              >degree of confidence that you're the same person who posted here as
              >"skass[at]drew.edu" yesterday. Of course that validity is destroyed if you
              >change your email address or if someone spoofs your address. But it's still
              >intrinsicall y better than an arbitrary ID allocated by the server.
              >[/color]
              This is a good point. If entities enter your system from time to time,
              and you must determine whether they duplicate entities already in your
              system or are new, then you need some "natural" method of
              identification. If I apply for a Microsoft credit card, Microsoft will
              likely generate an artifical credit card number for me. But that can't
              be the only way in which Microsoft can identify me if a business rule
              prohibits one person from possessing two Microsoft credit cards

              A nice way to look at this is by recognizing when there is and when
              there isn't some external entity that participates in an internal
              business rule. The need to use a "natural" key such as my social
              security number exists only because there is an entity outside
              Microsoft's credit department (the
              person-registered-with-the-social-security-administration entity) that
              participates in a business rule: that the cardinality of the PRWTSSA <->
              CCP relation is required to be 1<->{0,1}.

              This is a useful way to look at things. Is an identity value VendorID a
              good key for a vendor? It depends on whether there is some entity
              "outside the system" with a certain relationship to the entity
              identified by VendorID. If there is, then VendorID is not suitable as
              the only key. If it is used, it must be a surrogate for a natural
              external key also recorded within the system. Some businesses may allow
              one corporation more than one VendorID, and others may not.

              Is an identity value InvoiceID a good key for invoices (in the database
              of the business generating the invoices)? It might well be, since
              invoices can be internal to the system. While there might be a more
              "natural" key, such as (CreationDateti me, IssuingEmployee _or_System),
              the natural key might not provide any added value beyond its individual
              attribute values if invoices are entirely internal.

              In some cases, what appear to be external entities can use internal
              (artificial) keys, because there is no business rule relating the
              internal and external entities. An example might be a deli counter
              customer who pulls an identity value from the "take a ticket" machine.
              Here the only confusion is that there is no handy word for the
              human-visit-to-deli-counter entity, and Customer might be a more
              convenient name. Despite the fact that customers are people and people
              can be identified uniquely outside the system, there's no need to worry
              about that to manage visits to the deli counter.

              I'm curious now to know whether this point of view helps make a little
              more sense of the big debate. If not that, maybe it will at least help
              me understand why I get anxious every time I need to use a Microsoft
              Passport for identification.

              SK
              [color=blue]
              >
              >
              >[/color]

              Comment

              • Louis Davidson

                #22
                Re: What are cons and pros for using IDENTITY property as PK in SQL SERVER 2000?

                The problem is with how it is used. If you start giving users access to
                identity based values, you get into a bad spot where they want to make
                changes to the value (in my line of work, we don't like the numbers 666 in
                account numbers) so using identities for user values is a bad idea. I use
                them only for internal pointers that are never presented to users, since
                they are not modifiable. I could use guids, or characters, or whatever for
                keys and no one would be the wiser.
                [color=blue]
                > Natural keys are nothing more than familiar surrogates.[/color]

                You are kind of right here, but it is generally true that natural keys can
                change, because in the world, things can change. Identities cannot change.

                --
                ----------------------------------------------------------------------------
                -----------
                Louis Davidson (drsql@hotmail. com)
                Compass Technology Management

                Pro SQL Server 2000 Database Design


                Note: Please reply to the newsgroups only unless you are
                interested in consulting services. All other replies will be ignored :)

                "Bob Badour" <bbadour@golden .net> wrote in message
                news:IOydndJOXv 5aR1GiRVn-hg@golden.net.. .[color=blue]
                > "David Portas" <REMOVE_BEFORE_ REPLYING_dporta s@acm.org> wrote in message
                > news:e-qdnT5gCqKOTFGiR Vn-iQ@giganews.com ...[color=green][color=darkred]
                > > > living people than VendorID values. I don't think an identity[/color][/color][/color]
                VendorID[color=blue][color=green][color=darkred]
                > > > value is any more artificial than any other unique way of identifying[/color][/color][/color]
                a[color=blue][color=green][color=darkred]
                > > > vendor, so long as it is assigned the first time a vendor enters the
                > > > system and is never changed.[/color]
                > >
                > > But the point is that a "natural" key is verifiable outside of the[/color][/color]
                system.[color=blue]
                >
                > As soon as one records the generated identity value outside the system,[/color]
                the[color=blue]
                > key is verifiable outside of the system. All keys are surrogates or
                > artificial keys. Natural keys are nothing more than familiar surrogates.
                >
                >[/color]


                Comment

                • Trey Walpole

                  #23
                  Re: What are cons and pros for using IDENTITY property as PK in SQL SERVER 2000?


                  "David Portas" <REMOVE_BEFORE_ REPLYING_dporta s@acm.org> wrote in message
                  news:e-qdnT5gCqKOTFGiR Vn-iQ@giganews.com ...[color=blue][color=green]
                  > > living people than VendorID values. I don't think an identity VendorID
                  > > value is any more artificial than any other unique way of identifying a
                  > > vendor, so long as it is assigned the first time a vendor enters the
                  > > system and is never changed.[/color]
                  >
                  > But the point is that a "natural" key is verifiable outside of the system.
                  >
                  > When I see a NG post from "skass[at]drew.edu" I don't care whether that's
                  > based on your "real" name or even whether S.Kass is the same name as on[/color]
                  your[color=blue]
                  > passport or driver's licence. What's important to me is that it's[/color]
                  determined[color=blue]
                  > by a consistent method outside of the system which gives me some[/color]
                  acceptable[color=blue]
                  > degree of confidence that you're the same person who posted here as
                  > "skass[at]drew.edu" yesterday. Of course that validity is destroyed if you
                  > change your email address or if someone spoofs your address. But it's[/color]
                  still[color=blue]
                  > intrinsically better than an arbitrary ID allocated by the server.
                  >[/color]

                  I am a firm believer that natural keys should only be used to logically
                  design/normalize the data. When it comes to the real reason for keys, data
                  integrity, more often than not I have seen that natural keys are
                  intrinsically not good physical primary keys.

                  1. Natural keys are, being natural and therefore user entered [i.e.,
                  provided to the database by external means], fungible. If a user enters
                  data, they must also be able to modify it. If data can be modified, then its
                  value as a systemic primary key is gone. Yes, you can cascade updates to
                  these, but why do it when it can be avoided to start with.

                  2. Natural keys are typically a composite of atomic attributes. If using a
                  composite, these must be propagated to referencing tables as foreign keys.
                  Your normalization drops below par, by having these [potentially] massively
                  duplicated columns.
                  Attributes that are single, [supposedly] unique attributes (e.g., SSN),
                  usually represent some official, governmentally recognized ID, and therefore
                  have legal issues with being propagated throughout a system.

                  Also, for amateurs and many professionals, natural keys are very often
                  chosen incorrectly. e.g., I believe some combination of Name and other info
                  has been used by my ISP as their primary key. My last name was entered into
                  their system incorrectly, but they cannot fix it because their system will
                  not allow it. Preposterous and poor design.

                  Surrogate keys generated by using the identity property are ideal for data
                  integrity, because
                  1. They are static values [i.e., once entered, it does not change] and the
                  DBA has control over allowing values in identity columns to be modified.
                  2. They are singleton row ids. The fact that they are sequential is
                  irrelevant. That is simply the most efficient means of generating new
                  numeric values.

                  Identity integers can be problematic in two-way replication, but proper
                  management of key ranges can alleviate these issues. GUIDs are the MS
                  recommended way to deal with distributed data and two-way replication, but
                  not as easy to deal with in unreplicated databases.



                  Comment

                  • Trevor Best

                    #24
                    Re: What are cons and pros for using IDENTITY property as PK in SQL SERVER 2000?

                    On Tue, 2 Dec 2003 09:04:35 -0500 in comp.databases, "Bob Badour"
                    <bbadour@golden .net> wrote:
                    [color=blue]
                    >
                    >"Louis Davidson" <dr_dontspamme_ sql@hotmail.com > wrote in message
                    >news:#rIbinJuD HA.3436@tk2msft ngp13.phx.gbl.. .[color=green]
                    >> While it is true that the chosen primary key cannot contain any optional
                    >> values, it is more the praticioner (sp?) that disagrees with this stance.
                    >> Compound keys are unwieldy and bad for performance, but the theorist in me
                    >> says "who cares?" It is all about what is right/best, not what is
                    >> fastest/easiest.[/color]
                    >
                    >Performance is determined by the physical structure and not by the logical
                    >interface. Legitimate theorists have written ad nauseum on the severe
                    >logical problems caused by using compound keys for references when data may
                    >be missing.[/color]

                    Are you talking about data being missing from a foreign key side of a
                    relationship? I'm pro identity column myself but I don't see how that
                    would help in this instance.

                    --
                    A)bort, R)etry, I)nfluence with large hammer.

                    Comment

                    • Trevor Best

                      #25
                      Re: What are cons and pros for using IDENTITY property as PK in SQL SERVER 2000?

                      On Tue, 2 Dec 2003 17:53:44 -0000 in comp.databases, "David Portas"
                      <REMOVE_BEFORE_ REPLYING_dporta s@acm.org> wrote:
                      [color=blue][color=green]
                      >> living people than VendorID values. I don't think an identity VendorID
                      >> value is any more artificial than any other unique way of identifying a
                      >> vendor, so long as it is assigned the first time a vendor enters the
                      >> system and is never changed.[/color]
                      >
                      >But the point is that a "natural" key is verifiable outside of the system.
                      >
                      >When I see a NG post from "skass[at]drew.edu" I don't care whether that's
                      >based on your "real" name or even whether S.Kass is the same name as on your
                      >passport or driver's licence. What's important to me is that it's determined
                      >by a consistent method outside of the system which gives me some acceptable
                      >degree of confidence that you're the same person who posted here as
                      >"skass[at]drew.edu" yesterday. Of course that validity is destroyed if you
                      >change your email address or if someone spoofs your address. But it's still
                      >intrinsicall y better than an arbitrary ID allocated by the server.[/color]

                      If Steve leaves that educational facility he's at now, his email
                      address will surely change. If he goes into a witness protection
                      scheme his name, address and SS number will change, he or someone else
                      could change quiet a bit about him but if he's on a database with an
                      identity column as his PK then it's more likely that it will *never*
                      change.

                      --
                      A)bort, R)etry, I)nfluence with large hammer.

                      Comment

                      • Steve Kass

                        #26
                        Re: What are cons and pros for using IDENTITY property as PK in SQLSERVER 2000?



                        Trevor Best wrote:
                        [color=blue]
                        >On Tue, 2 Dec 2003 17:53:44 -0000 in comp.databases, "David Portas"
                        ><REMOVE_BEFORE _REPLYING_dport as@acm.org> wrote:
                        >
                        >
                        >[color=green][color=darkred]
                        >>>living people than VendorID values. I don't think an identity VendorID
                        >>>value is any more artificial than any other unique way of identifying a
                        >>>vendor, so long as it is assigned the first time a vendor enters the
                        >>>system and is never changed.
                        >>>
                        >>>[/color]
                        >>But the point is that a "natural" key is verifiable outside of the system.
                        >>
                        >>When I see a NG post from "skass[at]drew.edu" I don't care whether that's
                        >>based on your "real" name or even whether S.Kass is the same name as on your
                        >>passport or driver's licence. What's important to me is that it's determined
                        >>by a consistent method outside of the system which gives me some acceptable
                        >>degree of confidence that you're the same person who posted here as
                        >>"skass[at]drew.edu" yesterday. Of course that validity is destroyed if you
                        >>change your email address or if someone spoofs your address. But it's still
                        >>intrinsical ly better than an arbitrary ID allocated by the server.
                        >>
                        >>[/color]
                        >
                        >If Steve leaves that educational facility he's at now, his email
                        >address will surely change. If he goes into a witness protection
                        >scheme his name, address and SS number will change, he or someone else
                        >could change quiet a bit about him but if he's on a database with an
                        >identity column as his PK then it's more likely that it will *never*
                        >change.
                        >[/color]
                        In fact none of the 257 different identity values assigned to me will ever
                        get changed - unless I'm organized and honest, and unless someone
                        checks some kind of natural key of mine, how will anyone know
                        it was me each of those 257 times I opened an account?

                        SK
                        [color=blue]
                        >
                        >
                        >[/color]

                        Comment

                        • Greg D. Moore \(Strider\)

                          #27
                          Re: What are cons and pros for using IDENTITY property as PK in SQL SERVER 2000?


                          "BenignVani lla" <bv@tibetanbeef garden.com> wrote in message
                          news:O5OdnVtG1u WtN1GiRVn-hg@giganews.com ...[color=blue]
                          >
                          > I am by no means a SQL expert, so forgive me if this seems ignorant...But
                          > why can't the ID columm be a natural key? For example, I am working on a
                          > project that has a vendors table. The list of vendors is used in
                          > relationship to several other tables. We build this table with an identity
                          > column, and a column with the vendor's name. Now when a vendor is added to
                          > the table, they are assigned a unique ID that ties all other related data
                          > back to this vendor, and in the case of a vendor changing their name, or a
                          > typo, we can make updates without affecting data. Seems like a perfect use
                          > for an identity field, and it is our primary key.[/color]

                          And what happens if someone does a DBCC checkident ('FOO', RESEED)?

                          Or you have to copy it into a new table and accidently set that table up
                          with an IDENTITY column and now all your rows get new IDs?

                          [color=blue]
                          >
                          >
                          > --
                          > BV.
                          > WebPorgmaster - www.IHeartMyPond.com
                          > Work at Home, Save the Environment - www.amothersdream.com
                          >
                          >[/color]


                          Comment

                          • Trey Walpole

                            #28
                            Re: What are cons and pros for using IDENTITY property as PK in SQL SERVER 2000?

                            > And what happens if someone does a DBCC checkident ('FOO', RESEED)?[color=blue]
                            >
                            > Or you have to copy it into a new table and accidently set that table up
                            > with an IDENTITY column and now all your rows get new IDs?[/color]

                            Fire the DBA. If they've allowed unskilled people such access or do anything
                            accidentally, no telling what other problems they'll cause ;)


                            Comment

                            • Greg D. Moore \(Strider\)

                              #29
                              Re: What are cons and pros for using IDENTITY property as PK in SQL SERVER 2000?


                              "Trey Walpole" <treyNOpole@SPc omcastAM.net> wrote in message
                              news:uPV5k3SuDH A.1196@TK2MSFTN GP12.phx.gbl...[color=blue][color=green]
                              > > And what happens if someone does a DBCC checkident ('FOO', RESEED)?
                              > >
                              > > Or you have to copy it into a new table and accidently set that table up
                              > > with an IDENTITY column and now all your rows get new IDs?[/color]
                              >
                              > Fire the DBA. If they've allowed unskilled people such access or do[/color]
                              anything[color=blue]
                              > accidentally, no telling what other problems they'll cause ;)[/color]

                              Wow. Can I get a job where you work where folks never make mistakes?

                              Seriously, a DBCC checkident can be necessary in some recovery scenarios.

                              Copying a table over into another one is often necessary in general
                              maintenance, schema changes, etc. It's pretty easy to forget to do it
                              right.

                              Admittedly, they are pretty contrived examples, but the point is, the value
                              of the identity relies on some arbitrary state of the DB at the point in
                              time it is created.

                              Now, in some cases that just might not matter, but in many cases it can be
                              an important factor.

                              [color=blue]
                              >
                              >[/color]


                              Comment

                              • Stijn Verrept

                                #30
                                Re: What are cons and pros for using IDENTITY property as PK in SQL SERVER 2000?

                                I've read through this thread but I don't understand it. I always use an
                                int or smallint as primary key, with identity. I believe it would be a mess
                                otherwise.

                                Example: I have a table with people, last name, first name, address, ... So
                                suppose you would make a natural key then you need at least the last name
                                and the first name. I have >25 other tables that reference that table. If
                                I get this right I will need to use the name and firstname field in all the
                                other tables as well to reference. Isn't that just a lot of data waste? If
                                I'm missing something, please tell me what because this seems a bit silly.


                                Stijn Verrept.


                                Comment

                                Working...