Stored Proc To Copy Unnormalized to Normalized Table

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Alex

    Stored Proc To Copy Unnormalized to Normalized Table

    I have a "source" table that is being populated by a DTS bulk import
    of a text file. I need to scrub the source table after the import
    step by running appropriate stored proc(s) to copy the source data to
    2 normalized tables. The problem is that table "Companies" needs to
    be populated first in order to generate the Identity ID and then use
    that as the foreign key in the other table.

    Here is the DDL:

    CREATE TABLE [dbo].[OriginalList] (
    [FirstName] [varchar] (255) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL
    ,
    [LastName] [varchar] (255) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL
    ,
    [Company] [varchar] (255) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
    [Addr1] [varchar] (255) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
    [City] [varchar] (255) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
    [State] [varchar] (255) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
    [Zip] [varchar] (255) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
    [Phone] [varchar] (255) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL
    ) ON [PRIMARY]
    GO

    CREATE TABLE [dbo].[Companies] (
    [ID] [int] IDENTITY (1, 1) NOT NULL ,
    [Name] [varchar] (50) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NOT NULL
    ) ON [PRIMARY]
    GO

    CREATE TABLE [dbo].[CompanyLocation s] (
    [ID] [int] IDENTITY (1, 1) NOT NULL ,
    [CompanyID] [int] NOT NULL ,
    [Addr1] [varchar] (50) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
    [City] [varchar] (50) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
    [State] [varchar] (2) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
    [Zip] [varchar] (10) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL ,
    [Phone] [varchar] (14) COLLATE SQL_Latin1_Gene ral_CP1_CI_AS NULL
    ) ON [PRIMARY]
    GO


    This is the stored proc I have at this time that does NOT work. It
    uses the last Company insert for all the CompanyLocation s which is not
    correct.

    CREATE PROCEDURE DataScrubSP AS
    Begin Transaction
    insert Companies (Name) select Company from OriginalList
    IF @@Error <> 0
    GOTO ErrorHandler

    declare @COID int
    select @COID=@@identit y

    insert CompanyLocation s (CompanyID, Addr1, City, State, Zip) select
    @COID, Addr1, City, State, Zip from OriginalList
    IF @@Error <> 0
    GOTO ErrorHandler

    COMMIT TRANSACTION

    ErrorHandler:
    IF @@TRANCOUNT > 0
    ROLLBACK TRANSACTION
    RETURN
    GO

    Thanks for any help.

    Alex.
  • Hugo Kornelis

    #2
    Re: Stored Proc To Copy Unnormalized to Normalized Table

    On 9 Nov 2004 08:57:44 -0800, Alex wrote:
    [color=blue]
    >I have a "source" table that is being populated by a DTS bulk import
    >of a text file. I need to scrub the source table after the import
    >step by running appropriate stored proc(s) to copy the source data to
    >2 normalized tables. The problem is that table "Companies" needs to
    >be populated first in order to generate the Identity ID and then use
    >that as the foreign key in the other table.[/color]

    (snip DDL)
    [color=blue]
    >This is the stored proc I have at this time that does NOT work. It
    >uses the last Company insert for all the CompanyLocation s which is not
    >correct.[/color]

    (snip code)

    Hi Alex,

    First some comments on the DDL. OriginalList is obviously some staging
    table so I can understand the funny datatypes, but are you really sure
    that all these columns can be NULL? And that no column (or even
    combination of columns) can be used in a PRIMARY KEY or UNIQUE constraint?

    In the Companies table, you forgot to include a UNIQUE constraint on the
    Name column. Your company names are unique, aren't they? If not, you
    should of course not add the UNIQUE constraint on that column only, but
    add more columns, to ensure that you'll ahve a natural, verifiable key in
    addition to the convenient but meaningless identity column. Remember:
    without verifiable key, you'll never be able to know if the below example
    data is correct or a result of udplicated data entry:
    ID | Name
    ----+------------
    17 | Microsoft
    36 | Microsoft

    In the CompanyLocation s table, you'll gain a few bytes, a little bit of
    speed and improved self-documenting if you change State to char(2). Also,
    I'm surprised that you allow all columns to be NULL - most forms I have to
    fill out require me to supply at least address, city, state (if US) or
    country (if not US) and ZIP. Only phone is often optional.


    Now, on to your question. If you have a real strong urge to use @@IDENTITY
    or SCOPE_IDENTITY (), then you'll have to use a cursor or another way to
    loop through the rows in OriginalList and process them one by one. But
    that is not needed at all - as long as you have a good natural key (and
    you always should have!), there are other ways to find the IDENTITY value
    of a row: by using the natural key.

    Try this snippet of (untested) code. My assumption is that Companies.Name
    is indeed unique - if it isn't, you'll have to adapt the code (and your
    tables).

    CREATE PROCEDURE DataScrubSP AS
    BEGIN TRANSACTION
    -- Insert all companies from OriginalList
    INSERT Companies (Name)
    SELECT Company
    FROM OriginalList
    IF @@Error <> 0
    GOTO ErrorHandler

    -- Now, insert all companylocation s.
    -- Use companyname to find the assigned ID value.
    INSERT CompanyLocation s (CompanyID, Addr1, City, State, Zip)
    SELECT c.ID, o.Addr1, o.City, o.State, o.Zip
    FROM OriginalList AS o
    INNER JOIN Companies AS c
    ON c.Name = o.Company
    IF @@Error <> 0
    GOTO ErrorHandler

    COMMIT TRANSACTION

    ErrorHandler:
    IF @@TRANCOUNT > 0
    ROLLBACK TRANSACTION
    RETURN
    GO


    Best, Hugo
    --

    (Remove _NO_ and _SPAM_ to get my e-mail address)

    Comment

    • Erland Sommarskog

      #3
      Re: Stored Proc To Copy Unnormalized to Normalized Table

      Alex (hfamili@yahoo. com) writes:[color=blue]
      > I have a "source" table that is being populated by a DTS bulk import
      > of a text file. I need to scrub the source table after the import
      > step by running appropriate stored proc(s) to copy the source data to
      > 2 normalized tables. The problem is that table "Companies" needs to
      > be populated first in order to generate the Identity ID and then use
      > that as the foreign key in the other table.
      >...
      > This is the stored proc I have at this time that does NOT work. It
      > uses the last Company insert for all the CompanyLocation s which is not
      > correct.[/color]

      Remove the IDENTITY property on Companies, and roll your own:

      CREATE PROCEDURE DataScrubSP AS

      CREATE TABLE #temp (ident int IDENTITY,
      name varchar(50) NOT NULL)

      insert #temp (name) select Company from OriginalList
      IF @@Error <> 0
      GOTO ErrorHandler

      Begin Transaction

      declare @first_id int
      select @first_id = colaesce(MAX(ID ), 0) FROM Companies (UPDLOCK)

      insert Companies(ID, name)
      SELECT @first_id + ident, name FROM #temp
      IF @@Error <> 0
      GOTO ErrorHandler

      insert CompanyLocation s (CompanyID, Addr1, City, State, Zip)
      select c.ID, o.Addr1, o.City, o.State, o.Zip
      from OriginalList o
      Join Companies C ON o.Company = C.name
      IF @@Error <> 0
      GOTO ErrorHandler

      Actually, this example you don't even need that temp table or
      @first_id. Or even to drop the IDENTITY property on Companies.
      All you need the join in the INSERT statement. But I suspect
      that the due to the shrubbing there is some more complexity, so
      I included that technique.

      --
      Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

      Books Online for SQL Server SP3 at
      Accelerate your AI application's time to market by harnessing the power of your own data and the built-in AI capabilities of SQL Server 2025, the enterprise database with best-in-class security, performance and availability.

      Comment

      • alex

        #4
        Re: Stored Proc To Copy Unnormalized to Normalized Table

        Thanks Hugo. I implemented the suggestion and it worked.

        Alex.


        "Hugo Kornelis" <hugo@pe_NO_rFa ct.in_SPAM_fo> wrote in message
        news:rk72p01r1g n49ppfd1qqtgq57 8bt6ptsp8@4ax.c om...[color=blue]
        > On 9 Nov 2004 08:57:44 -0800, Alex wrote:
        >[color=green]
        >>I have a "source" table that is being populated by a DTS bulk import
        >>of a text file. I need to scrub the source table after the import
        >>step by running appropriate stored proc(s) to copy the source data to
        >>2 normalized tables. The problem is that table "Companies" needs to
        >>be populated first in order to generate the Identity ID and then use
        >>that as the foreign key in the other table.[/color]
        >
        > (snip DDL)
        >[color=green]
        >>This is the stored proc I have at this time that does NOT work. It
        >>uses the last Company insert for all the CompanyLocation s which is not
        >>correct.[/color]
        >
        > (snip code)
        >
        > Hi Alex,
        >
        > First some comments on the DDL. OriginalList is obviously some staging
        > table so I can understand the funny datatypes, but are you really sure
        > that all these columns can be NULL? And that no column (or even
        > combination of columns) can be used in a PRIMARY KEY or UNIQUE constraint?
        >
        > In the Companies table, you forgot to include a UNIQUE constraint on the
        > Name column. Your company names are unique, aren't they? If not, you
        > should of course not add the UNIQUE constraint on that column only, but
        > add more columns, to ensure that you'll ahve a natural, verifiable key in
        > addition to the convenient but meaningless identity column. Remember:
        > without verifiable key, you'll never be able to know if the below example
        > data is correct or a result of udplicated data entry:
        > ID | Name
        > ----+------------
        > 17 | Microsoft
        > 36 | Microsoft
        >
        > In the CompanyLocation s table, you'll gain a few bytes, a little bit of
        > speed and improved self-documenting if you change State to char(2). Also,
        > I'm surprised that you allow all columns to be NULL - most forms I have to
        > fill out require me to supply at least address, city, state (if US) or
        > country (if not US) and ZIP. Only phone is often optional.
        >
        >
        > Now, on to your question. If you have a real strong urge to use @@IDENTITY
        > or SCOPE_IDENTITY (), then you'll have to use a cursor or another way to
        > loop through the rows in OriginalList and process them one by one. But
        > that is not needed at all - as long as you have a good natural key (and
        > you always should have!), there are other ways to find the IDENTITY value
        > of a row: by using the natural key.
        >
        > Try this snippet of (untested) code. My assumption is that Companies.Name
        > is indeed unique - if it isn't, you'll have to adapt the code (and your
        > tables).
        >
        > CREATE PROCEDURE DataScrubSP AS
        > BEGIN TRANSACTION
        > -- Insert all companies from OriginalList
        > INSERT Companies (Name)
        > SELECT Company
        > FROM OriginalList
        > IF @@Error <> 0
        > GOTO ErrorHandler
        >
        > -- Now, insert all companylocation s.
        > -- Use companyname to find the assigned ID value.
        > INSERT CompanyLocation s (CompanyID, Addr1, City, State, Zip)
        > SELECT c.ID, o.Addr1, o.City, o.State, o.Zip
        > FROM OriginalList AS o
        > INNER JOIN Companies AS c
        > ON c.Name = o.Company
        > IF @@Error <> 0
        > GOTO ErrorHandler
        >
        > COMMIT TRANSACTION
        >
        > ErrorHandler:
        > IF @@TRANCOUNT > 0
        > ROLLBACK TRANSACTION
        > RETURN
        > GO
        >
        >
        > Best, Hugo
        > --
        >
        > (Remove _NO_ and _SPAM_ to get my e-mail address)[/color]


        Comment

        Working...