Appending data to a table but not duplicates

  • paul@domainscanners.com

    Appending data to a table but not duplicates

    Hiya everyone,

    I have two tables in SQL 2000. I would like to append the contents of
    TableA to TableB.

    Table A has around 1.1 Million Records.
    Table B has around 1 Million Records.

    Basically TableA has all of the data held in TableB plus 100,000
    additional records. I would only like to import or append these new
    additional records. I have a unique index already set up on Table B.

    Any ideas pretty pretty please?

    Paul.

    Ps. (Have been messing around with DTS but get a unique violation error
    - Which is kinda what I want I guess, but would like SQL to ignore the
    error and only copy the new data - if only)

  • Simon Hayes

    #2
    Re: Appending data to a table but not duplicates


    <paul@domainscanners.com> wrote in message
    news:1119371481.949023.149970@g43g2000cwa.googlegroups.com...
    > Hiya everyone,
    >
    > I have two tables in SQL 2000. I would like to append the contents of
    > TableA to TableB.
    >
    > Table A has around 1.1 Million Records.
    > Table B has around 1 Million Records.
    >
    > Basically TableA has all of the data held in TableB plus 100,000
    > additional records. I would only like to import or append these new
    > additional records. I have a unique index already set up on Table B.
    >
    > Any ideas pretty pretty please?
    >
    > Paul.
    >
    > Ps. (Have been messing around with DTS but get a unique violation error
    > - Which is kinda what I want I guess, but would like SQL to ignore the
    > error and only copy the new data - if only)
    >

    insert into dbo.TableB
        (col1, col2, col3,...)
    select col1, col2, col3...
    from dbo.TableA a
    where not exists (
        select *
        from dbo.TableB b
        where a.keycol = b.keycol)
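
For anyone who wants to try the pattern before running it against a million rows, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server 2000, and the two-column layout (keycol, val) and the sample rows are made up for illustration.

```python
import sqlite3


def append_new_rows(conn):
    """Copy rows from TableA into TableB unless TableB already has the key.

    Returns the number of rows that were actually added.
    """
    before = conn.execute("SELECT COUNT(*) FROM TableB").fetchone()[0]
    conn.execute(
        """
        INSERT INTO TableB (keycol, val)
        SELECT a.keycol, a.val
        FROM TableA AS a
        WHERE NOT EXISTS (
            SELECT 1 FROM TableB AS b WHERE b.keycol = a.keycol
        )
        """
    )
    after = conn.execute("SELECT COUNT(*) FROM TableB").fetchone()[0]
    return after - before


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TableA (keycol INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE TableB (keycol INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO TableA VALUES (1, 'a'), (2, 'b'), (3, 'c');  -- hypothetical data
    INSERT INTO TableB VALUES (1, 'a'), (2, 'b');
    """
)

inserted = append_new_rows(conn)  # only key 3 is new
rows_b = conn.execute("SELECT keycol, val FROM TableB ORDER BY keycol").fetchall()
print(inserted, rows_b)
```

Because existing keys are filtered out before the insert, the unique index is never violated, and running the statement a second time simply adds nothing.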

    Simon



    • rcamarda

      #3
      Re: Appending data to a table but not duplicates

      Simon,
      how is your sql different from

      insert into dbo.TableB
      (col1, col2, col3,...)
      select col1, col2, col3...
      from dbo.TableA a
      where keycol in (select keycol from dbo.tableb)
      ??
      TIA
      Rob


      • Erland Sommarskog

        #4
        Re: Appending data to a table but not duplicates

        rcamarda (rcamarda@cablespeed.com) writes:
        > Simon,
        > how is your sql different from
        >
        > insert into dbo.TableB
        > (col1, col2, col3,...)
        > select col1, col2, col3...
        > from dbo.TableA a
        > where keycol in (select keycol from dbo.tableb)

        That's one hell of a difference - you are inserting the duplicates only. :-)

        But, OK, put in the NOT, and your query is the same as Simon's. In SQL 6.5
        there was a difference in performance: NOT IN usually executed slower. I
        think that in SQL 2000, the optimizer rewrites the query internally.

        Anyway, there is still an advantage with the style that Simon used.
        Consider this query:

        insert into dbo.TableB
            (col1, col2, col3,...)
        select col1, col2, col3...
        from dbo.TableA a
        where not exists (
            select *
            from dbo.TableB b
            where a.keycol1 = b.keycol1
            and a.keycol2 = b.keycol2)

        That is not easily recast to NOT IN.

        So NOT EXISTS is simply an operation you need to master.
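
A small sketch of the two-column case, again assuming Python's sqlite3 as a stand-in for SQL Server and a made-up table layout. It first runs a naive NOT IN on keycol1 alone, to show what goes wrong when only half of the key is compared, and then the NOT EXISTS form on both columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TableA (keycol1 INTEGER, keycol2 INTEGER, val TEXT);
    CREATE TABLE TableB (keycol1 INTEGER, keycol2 INTEGER, val TEXT,
                         PRIMARY KEY (keycol1, keycol2));
    INSERT INTO TableA VALUES (1, 1, 'x'), (1, 2, 'y'), (2, 1, 'z');  -- hypothetical data
    INSERT INTO TableB VALUES (1, 1, 'x');
    """
)

# Naive attempt: NOT IN can compare only one column here, so the new row (1, 2)
# is wrongly treated as a duplicate because keycol1 = 1 already exists.
naive = conn.execute(
    """
    SELECT keycol1, keycol2 FROM TableA
    WHERE keycol1 NOT IN (SELECT keycol1 FROM TableB)
    ORDER BY keycol1, keycol2
    """
).fetchall()

# NOT EXISTS correlates on both key columns, so only exact key matches are skipped.
conn.execute(
    """
    INSERT INTO TableB (keycol1, keycol2, val)
    SELECT a.keycol1, a.keycol2, a.val
    FROM TableA AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM TableB AS b
        WHERE b.keycol1 = a.keycol1 AND b.keycol2 = a.keycol2
    )
    """
)
keys_b = conn.execute(
    "SELECT keycol1, keycol2 FROM TableB ORDER BY keycol1, keycol2"
).fetchall()

print(naive)   # [(2, 1)]
print(keys_b)  # [(1, 1), (1, 2), (2, 1)]
```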


        --
        Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se


        • rcamarda

          #5
          Re: Appending data to a table but not duplicates

          er. sorry. missed the not. I'm interested in "exists" vs. in (select
          .... )
          so. are you saying that the advantage is when you are looking for
          something that does not exist? Otherwise "..exists (select .." is the
          same as "in (select ..." ?


          • Erland Sommarskog

            #6
            Re: Appending data to a table but not duplicates

            rcamarda (rcamarda@cablespeed.com) writes:
            > er. sorry. missed the not. I'm interested in "exists" vs. in (select
            > ... )
            > so. are you saying that the advantage is when you are looking for
            > something that does not exist? Otherwise "..exists (select .." is the
            > same as "in (select ..." ?

            Same thing there: EXISTS is the only one that works when your condition
            involves more than a single column. There is also a gotcha when NULL
            values are involved.

            It's partly a matter of style, but I use (NOT) EXISTS far more often
            than (NOT) IN. (With subqueries, that is. (NOT) IN a list of values
            is another matter.)


            --
            Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se


            • Simon Hayes

              #7
              Re: Appending data to a table but not duplicates

              Yes - for a NOT IN condition, it's almost always better to use NOT
              EXISTS. The main issue, as Erland pointed out, is the possibility of a
              NULL in the subquery. Consider this simplified example:

              IF 1 IN (1, 2, 3, NULL) PRINT 'True'

              Obviously the condition is TRUE, but now consider this:

              IF 1 NOT IN (2, 3, NULL) PRINT 'True'

              Now we don't know if the condition is TRUE or not - the NULL has an
              unknown value, so in principle it could be a 1, and therefore the whole
              condition evaluates to UNKNOWN. So when NOT IN is used with a
              subquery, a single NULL returned by the subquery means that the whole
              query returns no rows. Using NOT EXISTS avoids this trap.
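
The trap is easy to reproduce. The sketch below assumes Python's sqlite3 as a stand-in for SQL Server (the three-valued logic for NULL behaves the same way here) and a hypothetical TableB whose key column is nullable and contains one NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TableA (keycol INTEGER);
    CREATE TABLE TableB (keycol INTEGER);   -- nullable on purpose
    INSERT INTO TableA VALUES (1), (2), (3);
    INSERT INTO TableB VALUES (2), (NULL);  -- hypothetical data with one NULL
    """
)

# NOT IN: for keys 1 and 3 the comparison against NULL is UNKNOWN, so the
# WHERE clause never evaluates to TRUE and no row comes back.
not_in = conn.execute(
    """
    SELECT keycol FROM TableA
    WHERE keycol NOT IN (SELECT keycol FROM TableB)
    ORDER BY keycol
    """
).fetchall()

# NOT EXISTS: the correlated subquery simply finds no matching row for 1 and 3,
# so both are returned regardless of the NULL.
not_exists = conn.execute(
    """
    SELECT a.keycol FROM TableA AS a
    WHERE NOT EXISTS (SELECT 1 FROM TableB AS b WHERE b.keycol = a.keycol)
    ORDER BY a.keycol
    """
).fetchall()

print(not_in)      # []
print(not_exists)  # [(1,), (3,)]
```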

              Admittedly, you often use primary key columns in the correlation, so
              there could never be a NULL in the subquery, but I think it's better to
              have a 'safer' habit of using NOT EXISTS. And as Erland also mentioned,
              there is some personal taste involved - I find that EXISTS/NOT EXISTS
              expresses the intention of the query more clearly, especially when
              someone is quickly looking through the code.

              Simon


              • rcamarda

                #8
                Re: Appending data to a table but not duplicates

                Thanks Guys!


                • paul@domainscanners.com

                  #9
                  Re: Appending data to a table but not duplicates

                  Hi Guys,

                  Thanks very much for your answers to my questions. I ran the query you
                  supplied and it worked fine, although I now have another problem and was
                  wondering if you would be able to help me out again?

                  Basically I now have a table containing all the data I need but the new
                  data has left a gap in the Identity column that I'm using.

                  Basically the original data's identity column went up to 1,000,000. I
                  was hoping that the new data that was appended would be inserted as
                  1,000,001 then 1,000,002 then 1,000,003 all the way up to 1,100,000.
                  However the new appended data went in as 1,254,324 then 1,254,325 etc.

                  Is there a command I can run to resynchronise my identity column, so
                  that the IDs run smoothly from 0 through to 1,100,000?

                  Hope you can help me out again,

                  Paul.


                  • Erland Sommarskog

                    #10
                    Re: Appending data to a table but not duplicates

                    (paul@domainscanners.com) writes:
                    > Basically the original data's identity column went up to 1,000,000. I
                    > was hoping that the new data that was appended would be inserted as
                    > 1,000,001 then 1,000,002 then 1,000,003 all the way up to 1,100,000.
                    > However the new appended data went in as 1,254,324 then 1,254,325 etc.
                    >
                    > Is there a command I can run to resynchronise my identity column, so
                    > that the IDs run smoothly from 0 through to 1,100,000?

                    If you want contiguous ids, or at least control over them, don't
                    use the IDENTITY property. When you attempt to insert a row into
                    a table with the IDENTITY property, you consume one number, even if
                    the INSERT fails. This may seem stupid, but it is actually a feature,
                    because it improves concurrency. If the number were to be reused in
                    case of failure, SQL Server would need to lock the number, and no
                    other process would be able to insert until the INSERT had completed.
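
To make the mechanism concrete, here is a toy model in Python. It is not SQL Server's actual implementation, and the class and its names are hypothetical. It shows only the one property described above: the number is handed out before the uniqueness check, and it is never handed back when the insert fails.

```python
class ToyIdentityTable:
    """Toy model only: an id counter that is never rolled back, plus a unique key."""

    def __init__(self):
        self.next_id = 1
        self.rows = {}  # unique key -> assigned id

    def insert(self, key):
        new_id = self.next_id  # the number is consumed up front...
        self.next_id += 1
        if key in self.rows:   # ...so a unique violation still burns it
            raise ValueError(f"duplicate key: {key}")
        self.rows[key] = new_id
        return new_id


table = ToyIdentityTable()
first = table.insert("a.com")
try:
    table.insert("a.com")      # fails on the unique key; id 2 is gone for good
except ValueError:
    pass
second = table.insert("b.com")

print(first, second)  # 1 3
```

In this model every rejected row leaves a hole in the sequence, which is presumably what earlier failed import attempts did to the table in question.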


                    --
                    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
