Data import - foreign key question

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • teddysnips@hotmail.com

    Data import - foreign key question

    This is a rather abstract question about data design, but I ask it here
    because a) the database is SQL Server, and b) you're such a learned
    bunch!

    Let's assume the classic relation of Customers and Orders, where an
    Order may reference a single Customer. If I was designing such a
    relation from scratch, I would create the Customer table with an
    Identity column and call it CustomerID. The Order table would contain
    a column called CustomerID, a foreign key to the Customer table.

    So far, so unexceptional. However, in my current project I have to
    work with legacy data that comes from a number of old Access systems
    where the data was not normalised. I wish to normalise it.

    The main table in this new system contains reports on parts. Each
    report may reference a single part. However, the old data which I have
    to import allowed the user to type in the part number. This has led to
    dirty data (for example, '40-7889-9098' appears, as does '40-7889-
    9098') so I will clean this data up. In the application, the part
    number will be selected from a drop down list, though the administrator
    will have access to a builder to add, amend or delete part numbers.

    So, my report table needs to store a reference to a part. When I
    import the data into my SQL Report table, I will initially bring across
    the part number. I will then populate the Part Numbers table with all
    discrete, distinct part numbers from the Report table. My question is
    should I then create a PartNumberID column in both tables, and "back
    populate" the Report table with the PartNumberID which corresponds with
    the matching PartNumber - e.g.

    UPDATE
    R
    SET
    R.fldPartNumber ID = PN.fldPartNumbe rID
    FROM
    tblReports R
    INNER JOIN tblPartNumbers RN
    ON R.fldPartNumber = RN.fldPartNumbe r

    I could then drop the fldPartNumber from the tblReports table.

    My question is - should I bother? Or can I just leave the actual
    PartNumber in the Reports table, and leave the tblPartNumbers table
    with a single column which is both Primary key and Foreign key?

    Sorry if this is poorly expressed - I had a tough weekend!

    Edward
    --
    The reading group's reading group:


  • Simon Hayes

    #2
    Re: Data import - foreign key question

    I'm not sure why you say you would leave the PartNumbers table with a
    single column. Would it not be better to have a Parts table, whose
    primary key is PartNumber? Presumably you have things like names,
    descriptions, flags etc. for each part (and if you don't, then you
    probably will at some point), so you would need a Parts table anyway.

    And what would the difference be between PartNumberID and PartNumber? I
    would guess that PartNumber must be unique (after you clean them up,
    that is), so why would you want to invent your own ID?

    Simon

    Comment

    • Edward

      #3
      Re: Data import - foreign key question

      "Simon Hayes" <sql@hayes.ch > wrote in message news:<111140689 8.871386.17820@ g14g2000cwa.goo glegroups.com>. ..[color=blue]
      > I'm not sure why you say you would leave the PartNumbers table with a
      > single column. Would it not be better to have a Parts table, whose
      > primary key is PartNumber? Presumably you have things like names,
      > descriptions, flags etc. for each part (and if you don't, then you
      > probably will at some point), so you would need a Parts table anyway.
      >
      > And what would the difference be between PartNumberID and PartNumber? I
      > would guess that PartNumber must be unique (after you clean them up,
      > that is), so why would you want to invent your own ID?[/color]

      This is exactly my point. The trouble is that I am so used to
      creating tables from scratch:

      fldPartNumberID
      fldPartNumber
      fldDescription
      fldCurrent
      fldOEMID
      etc.

      that I find myself itching to add the redundant Primary Key, even
      though the PartNumber is itself unique or, if it is not, it is unique
      in combination with one or more other fields.

      Ah, that's the point. I knew I'd get there in the end. If the part
      number is NOT unique, but in combination with, say, fldOEMID, then in
      order to store a reference to a row in the parts table the Reports
      table would need to have TWO foreign keys on the Parts table. So, my
      rule of thumb is:

      If the PartNumber is unique, it's the ID. If it isn't, it isn't the
      ID.

      Thanks for your help in clarifying my thoughts.

      Edward

      Comment

      • Simon Hayes

        #4
        Re: Data import - foreign key question


        "Edward" <teddysnips@hot mail.com> wrote in message
        news:25080b60.0 503211101.73c40 4b8@posting.goo gle.com...[color=blue]
        > "Simon Hayes" <sql@hayes.ch > wrote in message
        > news:<111140689 8.871386.17820@ g14g2000cwa.goo glegroups.com>. ..[color=green]
        >> I'm not sure why you say you would leave the PartNumbers table with a
        >> single column. Would it not be better to have a Parts table, whose
        >> primary key is PartNumber? Presumably you have things like names,
        >> descriptions, flags etc. for each part (and if you don't, then you
        >> probably will at some point), so you would need a Parts table anyway.
        >>
        >> And what would the difference be between PartNumberID and PartNumber? I
        >> would guess that PartNumber must be unique (after you clean them up,
        >> that is), so why would you want to invent your own ID?[/color]
        >
        > This is exactly my point. The trouble is that I am so used to
        > creating tables from scratch:
        >
        > fldPartNumberID
        > fldPartNumber
        > fldDescription
        > fldCurrent
        > fldOEMID
        > etc.
        >
        > that I find myself itching to add the redundant Primary Key, even
        > though the PartNumber is itself unique or, if it is not, it is unique
        > in combination with one or more other fields.
        >
        > Ah, that's the point. I knew I'd get there in the end. If the part
        > number is NOT unique, but in combination with, say, fldOEMID, then in
        > order to store a reference to a row in the parts table the Reports
        > table would need to have TWO foreign keys on the Parts table. So, my
        > rule of thumb is:
        >
        > If the PartNumber is unique, it's the ID. If it isn't, it isn't the
        > ID.
        >
        > Thanks for your help in clarifying my thoughts.
        >
        > Edward[/color]

        Not exactly - in the case you describe, you would have a composite primary
        key on (PartNumber, OEMID):

        create table dbo.Parts (
        PartNumber char(12) not null,
        OEMID int not null,
        ...
        constraint PK_Parts primary key (PartNumber, OEMID)
        )

        It would be then referenced by one foreign key made up of two columns:

        create table dbo.Reports (
        ReportID int not null,
        PartNumber char(12) not null,
        OEMID int not null,
        ...
        constraint PK_Reports primary key (ReportID),
        constraint FK_Reports_Part s foreign key (PartNumber, OEMID) references
        dbo.Parts (PartNumber, OEMID)
        )

        This isn't unusual, and in general, you should always use the table's
        natural primary key, even if it's made up of several columns, and only
        introduce your own when you have a good reason - there's no rule to say that
        a primary key must be a single column.

        For example, a certain table's natural key might be made up of 10 columns,
        including wide varchar columns - that would probably cause performance
        problems at some point. So in a situation like that, it's not unreasonable
        to introduce your own artificial key, but you would also put a UNIQUE
        constraint on the natural key anyway, to make sure the data is valid.

        Simon


        Comment

        • jimfortune@compumarc.com

          #5
          Re: Data import - foreign key question

          teddysnips@hotm ail.com wrote:[color=blue]
          >...
          > So far, so unexceptional. However, in my current project I have to
          > work with legacy data that comes from a number of old Access systems
          > where the data was not normalised. I wish to normalise it.
          >
          > The main table in this new system contains reports on parts. Each
          > report may reference a single part. However, the old data which I[/color]
          have[color=blue]
          > to import allowed the user to type in the part number. This has led[/color]
          to[color=blue]
          > dirty data (for example, '40-7889-9098' appears, as does '40-7889-
          > 9098') so I will clean this data up.[/color]
          [color=blue]
          >From my perspective, this is where your biggest potential problem lies,[/color]
          although hidden. I once had to do a similar project where data from
          Access tables had to be translated into a different format that could
          be imported into a different system. Almost all this was done in code
          since it involved complex decisions based on interrelationsh ips of
          existing data. The part I didn't like was that one field had to be put
          into different categories that didn't exist in the original. The other
          fields had nice concrete rules for the conversion. So I created a form
          and suggested that their regular employees choose the category from a
          combobox for each record similar to the way your application will
          select part numbers. All the data coverted smoothly into the new
          system except for one of the categories that had been miscatalogued.
          The customer wanted me to reduce my invoice based on the grief caused
          by having the wrong category. I pointed out that I had selected none
          of the categories. Showing which computer was used for the improper
          selections would have been even better. Avoid cleaning data yourself
          if possible. If not, keep a backup and get the cleanup rules approved.
          'Replace' can also do some pretty unexpected things if you're not
          careful. Insist that they have someone else check the "clean" data for
          final approval.

          James A. Fortune

          Comment

          Working...