BCP Order on SQL Server

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Thomas Richards

    BCP Order on SQL Server

    I have two SQL Server 2000 machines. The same file is sent nightly to
    each server and a stored proc uses BULK INSERT to load it into a
    staging table for processing.

    Once I've bcp'ed it in, I put it into a temp table with an IDENTITY
    column appended to it. (I need this identity column to group by later
    on to remove duplicates.)

    ie

    select tempo.*,
    IDENTITY(int, 1,1) AS ID_Num
    into #test1
    from tempExtract tempo

    My question is : can I expect the ID_Num and the corresponding line of
    the file copied to the table to be the same on each server? Ie will
    each BCP into the staging table occur in the same order on both
    servers given that the file, the BULK INSERT command and the indexes
    are the same on each server.
  • Erland Sommarskog

    #2
    Re: BCP Order on SQL Server

    Thomas Richards (tom.richards@r ocketmail.com) writes:[color=blue]
    > I have two SQL Server 2000 machines. The same file is sent nightly to
    > each server and a stored proc uses BULK INSERT to load it into a
    > staging table for processing.
    >
    > Once I've bcp'ed it in, I put it into a temp table with an IDENTITY
    > column appended to it. (I need this identity column to group by later
    > on to remove duplicates.)
    >
    > ie
    >
    > select tempo.*,
    > IDENTITY(int, 1,1) AS ID_Num
    > into #test1
    > from tempExtract tempo
    >
    > My question is : can I expect the ID_Num and the corresponding line of
    > the file copied to the table to be the same on each server? Ie will
    > each BCP into the staging table occur in the same order on both
    > servers given that the file, the BULK INSERT command and the indexes
    > are the same on each server.[/color]

    No, you would need to have the identity column on the table you load
    the file into. I don't know for sure that you can trust the IDENTITY
    value to match the input file exactly, and if it works, it is likely
    to by mere chance. That is, there is no committment from Microsoft
    that it should work, and it could change in a future version of SQL
    Server.

    --
    Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

    Books Online for SQL Server SP3 at
    Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

    Comment

    • Thomas Richards

      #3
      Re: BCP Order on SQL Server

      Erland Sommarskog <esquel@sommars kog.se> wrote in message news:<Xns9556E9 43F45FAYazorman @127.0.0.1>...[color=blue]
      > Thomas Richards (tom.richards@r ocketmail.com) writes:[color=green]
      > > I have two SQL Server 2000 machines. The same file is sent nightly to
      > > each server and a stored proc uses BULK INSERT to load it into a
      > > staging table for processing.
      > >
      > > Once I've bcp'ed it in, I put it into a temp table with an IDENTITY
      > > column appended to it. (I need this identity column to group by later
      > > on to remove duplicates.)
      > >
      > > ie
      > >
      > > select tempo.*,
      > > IDENTITY(int, 1,1) AS ID_Num
      > > into #test1
      > > from tempExtract tempo
      > >
      > > My question is : can I expect the ID_Num and the corresponding line of
      > > the file copied to the table to be the same on each server? Ie will
      > > each BCP into the staging table occur in the same order on both
      > > servers given that the file, the BULK INSERT command and the indexes
      > > are the same on each server.[/color]
      >
      > No, you would need to have the identity column on the table you load
      > the file into. I don't know for sure that you can trust the IDENTITY
      > value to match the input file exactly, and if it works, it is likely
      > to by mere chance. That is, there is no committment from Microsoft
      > that it should work, and it could change in a future version of SQL
      > Server.[/color]

      Thanks for that. The problem that I'm trying to get round is that I
      have a key field and then one or more addresses. The key field and the
      fields that make up the address are all chars/varchars. I want to pick
      one arbitrary address to associate with the key and put in another
      table. There are no business rules (eg always take the one with the
      lowest street number) that will always identify just one of the
      addresses. Originally, I thought group by key and line number and pick
      the one with the lowest number. I would prefer to do this as it would
      get the first entry in the file which would more than likely give me
      the better address details. However as you've pointed out I can't
      depend on the order when bcp'ed in. Is there any other way to do this
      or would I have to get line number added to the file before SQL Server
      processes it?

      Comment

      • Erland Sommarskog

        #4
        Re: BCP Order on SQL Server

        Thomas Richards (tom.richards@r ocketmail.com) writes:[color=blue]
        > Thanks for that. The problem that I'm trying to get round is that I
        > have a key field and then one or more addresses. The key field and the
        > fields that make up the address are all chars/varchars. I want to pick
        > one arbitrary address to associate with the key and put in another
        > table. There are no business rules (eg always take the one with the
        > lowest street number) that will always identify just one of the
        > addresses. Originally, I thought group by key and line number and pick
        > the one with the lowest number. I would prefer to do this as it would
        > get the first entry in the file which would more than likely give me
        > the better address details. However as you've pointed out I can't
        > depend on the order when bcp'ed in. Is there any other way to do this
        > or would I have to get line number added to the file before SQL Server
        > processes it?[/color]

        The only way to be sure is to add the line numbers yourself. This can be
        done in two ways: 1) Manipulate the file, by running it through a program
        that adds a line number. 2) Instead of writing a to file, have the program
        to insert the data. In fact, you can still use bulk load, but you would
        bulk from variables, using the BCP API.

        However, BULK INSERT into a table with an IDENTITY gives you fairly good
        odds, and as I understand your case, it does not seem to be a disaster,
        if number is not what you expect. So I would go for that.


        --
        Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

        Books Online for SQL Server SP3 at
        Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

        Comment

        • Thomas Richards

          #5
          Re: BCP Order on SQL Server

          Thanks,

          I'm going to have problems adding the line numbers to the file unless
          there's an easy way to do it that uses windows built-in functionality.
          Unfortunately the file comes from a mainframe extract so the I can't
          change the program either.

          Just to be clear about what I'm doing, I bulk insert into a table that
          has a KEY field and then one or more addresses eg:

          KEY, ADDRESS1, ADDRESS2
          -----------------------------
          FRED, HOG STREET, HOGLAND
          FRED, HOG STREET, HOGLANDIO

          I need to take the key (ie FRED) and one address (doesn't matter
          which) and put it into another table. However, I have to get the same
          address on each server. That's more important than trying to get the
          first one in the file.

          I'm going to try out the bulk insert with identity. If I put a
          clustered key on the table that is bulk inserted to on all columns, I
          would assume that would force the order in the table to be the same on
          both servers - what do you think?

          Cheers
          Tom
          [color=blue]
          > The only way to be sure is to add the line numbers yourself. This can be
          > done in two ways: 1) Manipulate the file, by running it through a program
          > that adds a line number. 2) Instead of writing a to file, have the program
          > to insert the data. In fact, you can still use bulk load, but you would
          > bulk from variables, using the BCP API.
          >
          > However, BULK INSERT into a table with an IDENTITY gives you fairly good
          > odds, and as I understand your case, it does not seem to be a disaster,
          > if number is not what you expect. So I would go for that.[/color]

          Comment

          • Thomas Richards

            #6
            Re: BCP Order on SQL Server

            My latest thinking on this is to create a table the same as the table
            holding the 'key' and address components but with an extra identity
            type field.

            Then insert into this table ordering by key + all columns. This will
            force the sequence number to match the same row on each server and the
            'key' fields to be sequentially next to each other. Then I can do a
            group by, picking up the lowest sequence number.

            eg

            SELECT KEY,
            ADDRESS1,
            ADDRESS2,
            IDENTITY(int,1, 1) as Seq
            INTO tempTable
            FROM tempExtract
            WHERE 1=2

            INSERT INTO tempTable
            SELECT KEY,
            ADDRESS1,
            ADDRESS2
            FROM tempExtract
            ORDER BY KEY,
            ADDRESS1,
            ADDRESS2

            -- Finally get a key with just one address
            SELECT KEY,
            ADDRESS1,
            ADDRESS2
            FROM tempTable
            WHERE SEQ = (SELECT MIN (Seq)
            FROM tempTable sub
            WHERE sub.KEY = tempTable.KEY)

            Can you see any holes in that?!

            Cheers
            Tom

            Comment

            • Erland Sommarskog

              #7
              Re: BCP Order on SQL Server

              Thomas Richards (tom.richards@r ocketmail.com) writes:[color=blue]
              > I'm going to have problems adding the line numbers to the file unless
              > there's an easy way to do it that uses windows built-in functionality.[/color]

              Adding such line numbers is a very simple program that can be written
              VBscript, Perl, C or whatever your preference is. The one catch is that
              this is not very effcient if the file is huge.
              [color=blue]
              > I'm going to try out the bulk insert with identity. If I put a
              > clustered key on the table that is bulk inserted to on all columns, I
              > would assume that would force the order in the table to be the same on
              > both servers - what do you think?[/color]

              What matters is the order that the rows hit the tables. My guess is that
              a completely indexless table is better.
              [color=blue]
              > INSERT INTO tempTable
              > SELECT KEY,
              > ADDRESS1,
              > ADDRESS2
              > FROM tempExtract
              > ORDER BY KEY,
              > ADDRESS1,
              > ADDRESS2
              >...
              >
              > Can you see any holes in that?![/color]

              The sad case is that neither is there any guarantee with an INSERT
              statement that the IDENITTY value will reflect the ORDER BY statement.
              But if you add OPTION (MAXDOP 1) to the query it usually works. MAXDOP 1
              turns off parallelism which is the major reason the ORDER BY gets messed up.

              But maybe you should rethink completely. Maybe you should bulk into one
              server, remove the duplicates, and the replicate the result to the
              second server. This could be done by a linked query, or bulking out and
              in again.




              --
              Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

              Books Online for SQL Server SP3 at
              Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

              Comment

              • Thomas Richards

                #8
                Re: BCP Order on SQL Server

                My latest thinking on this is to create a table the same as the table
                holding the 'key' and address components but with an extra identity
                type field.

                Then insert into this table ordering by key + all columns. This will
                force the sequence number to match the same row on each server and the
                'key' fields to be sequentially next to each other. Then I can do a
                group by, picking up the lowest sequence number.

                eg

                SELECT KEY,
                ADDRESS1,
                ADDRESS2,
                IDENTITY(int,1, 1) as Seq
                INTO tempTable
                FROM tempExtract
                WHERE 1=2

                INSERT INTO tempTable
                SELECT KEY,
                ADDRESS1,
                ADDRESS2
                FROM tempExtract
                ORDER BY KEY,
                ADDRESS1,
                ADDRESS2

                -- Finally get a key with just one address
                SELECT KEY,
                ADDRESS1,
                ADDRESS2
                FROM tempTable
                WHERE SEQ = (SELECT MIN (Seq)
                FROM tempTable sub
                WHERE sub.KEY = tempTable.KEY)

                Can you see any holes in that?!

                Cheers
                Tom

                Comment

                • Erland Sommarskog

                  #9
                  Re: BCP Order on SQL Server

                  Thomas Richards (tom.richards@r ocketmail.com) writes:[color=blue]
                  > My latest thinking on this is to create a table the same as the table
                  > holding the 'key' and address components but with an extra identity
                  > type field.
                  >
                  > Then insert into this table ordering by key + all columns. This will
                  > force the sequence number to match the same row on each server and the
                  > 'key' fields to be sequentially next to each other. Then I can do a
                  > group by, picking up the lowest sequence number.
                  >...[/color]

                  That seems to the same suggestion, to which I answered once, so I
                  simply repear that answer:

                  The sad case is that neither is there any guarantee with an INSERT
                  statement that the IDENITTY value will reflect the ORDER BY statement.
                  But if you add OPTION (MAXDOP 1) to the query it usually works. MAXDOP 1
                  turns off parallelism which is the major reason the ORDER BY gets messed up.

                  But maybe you should rethink completely. Maybe you should bulk into one
                  server, remove the duplicates, and the replicate the result to the
                  second server. This could be done by a linked query, or bulking out and
                  in again.



                  --
                  Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

                  Books Online for SQL Server SP3 at
                  Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

                  Comment

                  • Thomas Richards

                    #10
                    Re: BCP Order on SQL Server

                    Sorry, what I meant to post was:

                    I'm thinking of doing the following to ensure I get the same row on
                    each server (assuming identical collations).
                    It is supposed to only bring back the row that matches the TOP value
                    of all the fields concatenated. I've tested it on identical servers
                    and it appears to work and the theory seems fine to me. Can you see
                    anything wrong with this??

                    Thanks
                    Tom

                    SELECT KEY,
                    ADDRESS1,
                    ADDRESS2
                    FROM TABLE main
                    WHERE ISNULL(KEY,'Z') +
                    ISNULL(ADDRESS1 , 'Z') +
                    ISNULL(ADDRESS2 , 'Z') =
                    (SELECT TOP 1 ISNULL(KEY, 'Z') +
                    ISNULL(ADDRESS1 , 'Z') +
                    ISNULL(ADDRESS2 , 'Z') =
                    FROM TABLE sub
                    WHERE sub.KEY = main.KEY
                    ORDER BY ISNULL(KEY,'Z') +
                    ISNULL(ADDRESS1 , 'Z') +
                    ISNULL(ADDRESS2 , 'Z'))

                    Comment

                    • Erland Sommarskog

                      #11
                      Re: BCP Order on SQL Server

                      Thomas Richards (tom.richards@r ocketmail.com) writes:[color=blue]
                      > I'm thinking of doing the following to ensure I get the same row on
                      > each server (assuming identical collations).
                      > It is supposed to only bring back the row that matches the TOP value
                      > of all the fields concatenated. I've tested it on identical servers
                      > and it appears to work and the theory seems fine to me. Can you see
                      > anything wrong with this??
                      >
                      > Thanks
                      > Tom
                      >
                      > SELECT KEY,
                      > ADDRESS1,
                      > ADDRESS2
                      > FROM TABLE main
                      > WHERE ISNULL(KEY,'Z') +
                      > ISNULL(ADDRESS1 , 'Z') +
                      > ISNULL(ADDRESS2 , 'Z') =
                      > (SELECT TOP 1 ISNULL(KEY, 'Z') +
                      > ISNULL(ADDRESS1 , 'Z') +
                      > ISNULL(ADDRESS2 , 'Z') =
                      > FROM TABLE sub
                      > WHERE sub.KEY = main.KEY
                      > ORDER BY ISNULL(KEY,'Z') +
                      > ISNULL(ADDRESS1 , 'Z') +
                      > ISNULL(ADDRESS2 , 'Z'))[/color]

                      You could get duplicates if you have some really weird data which
                      gives the same result for two concatenations, but I guess that is
                      a calculated risk. And you would get the same duplicates on both
                      servers.

                      Ah, there is one more catch - you must make sure that both databases
                      have the same collation. But since you can specify the collation per
                      column when you create the table, you can take care of that.

                      --
                      Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

                      Books Online for SQL Server SP3 at
                      Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

                      Comment

                      Working...