Copy 60GB table?

  • sql_server_user

    Copy 60GB table?

    I'm trying to copy all 440 million rows from one table in my SQL Server
    2005 db to another table with a different clustering scheme. After a
    few test inserts that were successful (up to a million rows in 20
    seconds with no problem), I crossed my fingers and went whole-hog with
    the naive "insert into dest_table (column1, column2, ...) select
    column1, column2, ... from source_table" but of course it ran out of
    space in *both* the db log file (at about 130 GB) and the tempdb log
    file (at about 54GB) and rolled it back, wrapping up 10 hours after I
    kicked it off.

    I think it would work if I could insert in batches and/or disable
    logging, like with a bulk insert, so I tried exporting the source table
    to a flat file in the hopes of bulk-importing it back into the
    destination table, but the export just halted for no reason after 176
    million rows, using all the CPU and not doing anything at all, just
    sitting there. (The fact that this operation not only failed, but
    brought down the whole server - any operation on any database in the
    instance timed out, no clients could connect, the whole deal - I find
    very disappointing for what MS claimed was a scalable, robust product
    that could rival Oracle.)

    What I'm wondering is, does anyone know of a way to copy a table that
    is apparently just too large for SQL Server 2005 to handle?

    Thanks,
    Seth

  • Erland Sommarskog

    #2
    Re: Copy 60GB table?

    sql_server_user (kaioptera@gmail.com) writes:
    > I'm trying to copy all 440 million rows from one table in my SQL Server
    > 2005 db to another table with a different clustering scheme. After a
    > few test inserts that were successful (up to a million rows in 20
    > seconds with no problem), I crossed my fingers and went whole-hog with
    > the naive "insert into dest_table (column1, column2, ...) select
    > column1, column2, ... from source_table" but of course it ran out of
    > space in *both* the db log file (at about 130 GB) and the tempdb log
    > file (at about 54GB) and rolled it back, wrapping up 10 hours after I
    > kicked it off.

    That operation is definitely not a walk in the park.

    I think I would attempt this strategy:

    1) Set recovery for the database to simple recovery.
    2) Create the clustered index on the target table with a relatively low
    fill factor, say 30%. (Of course, no NC indexes would be in place.)
    3) Insert batchwise from the old table, where batches are defined by
    intervals of the clustered index of the source table. For batch size,
    I would use 100000 to 500000 rows, depending on how wide the table is.
    4) I would make the control loop so that it is easy to stop it. For
    instance, on each iteration read from a control table. Possibly also
    a table that defines the batches, so you can keep track of what has
    been copied. But that table should not be joined with the INSERT;
    read the interval into variables (see the sketch below).
    5) Occasionally stop the process and monitor fragmentation in the
    target table. It should first decrease, as holes are filled in,
    but then it will increase again. (You will probably also see this
    from the fact that the time per iteration increases.)
    6) Once everything is done, switch to full recovery and back up the
    database.

    My thinking here is that creating the clustered index on this monster
    afterwards is going to require a lot of log and tempdb - you can guess
    what happens. So let's have the clustered index in place from the start,
    even if that makes the copy take longer.
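
    Here is a minimal sketch of what I mean, assuming the source table is
    clustered on a single integer column (called id below) and using made-up
    names (source_table, dest_table, copy_control) that you would replace
    with your own:

    DECLARE @from  int,
            @to    int,
            @step  int,
            @maxid int;

    SELECT @step = 250000;     -- batch size; tune to how wide the rows are
    SELECT @from = MIN(id), @maxid = MAX(id) FROM source_table;

    WHILE @from <= @maxid
    BEGIN
       -- Easy to stop: flip a flag in the control table and the loop
       -- exits after the current batch.
       IF EXISTS (SELECT * FROM copy_control WHERE stop_requested = 1)
          BREAK;

       SELECT @to = @from + @step - 1;

       INSERT INTO dest_table (column1, column2 /* , ... */)
       SELECT column1, column2 /* , ... */
       FROM   source_table
       WHERE  id BETWEEN @from AND @to;

       -- Record progress, so you can see what has been copied and
       -- restart from here if you stop the loop.
       UPDATE copy_control SET last_id_copied = @to;

       SELECT @from = @to + 1;
    END;

    The batch boundaries are read into variables up front, so the INSERT
    itself only touches the source and target tables.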



    --
    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Books Online for SQL Server 2005 at
    http://www.microsoft.com/technet/pro...ads/books.mspx
    Books Online for SQL Server 2000 at
    http://www.microsoft.com/sql/prodinf...ons/books.mspx


    • sql_server_user

      #3
      Re: Copy 60GB table?

      Thanks, Erland - I'm gonna have to pay you a consulting fee soon.

      The recovery model is set to simple, there are no nonclustered indexes
      on the table, and I agree that clustering on insert is the way to go,
      since I think trying to recluster the new table with all the data in it
      would just blow up the logs and tempdb like my initial insert attempt.
      I like the idea of the batched insert with the low fill factor
      clustered index; however, I ran a bunch of tests, and the amount of log
      and tempdb space required seems to be a function of the size of the
      destination table, not the number of rows in the batch: I would insert
      a batch (10 million rows), shrink the log and tempdb, insert another
      batch, and so on, and every time the log and tempdb would require more
      space, growing roughly linearly, as my very approximate figures below
      indicate:

      after 1st batch: log = 1GB, tempdb = 2GB
      after shrinking and 2nd batch: log = 2GB, tempdb = 4GB
      after shrinking and 3rd batch: log = 3 GB, tempdb = 6GB

      So I'm worried about the batch approach for the full insert.
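
      By "shrink the log and tempdb" I mean plain DBCC SHRINKFILE calls run
      between batches, roughly like this (the logical file names here are
      placeholders - the real ones are in sys.database_files):

      -- shrink the user database log back down between batches
      USE MyDb;
      DBCC SHRINKFILE (MyDb_log, 1024);   -- target size in MB

      -- shrink the tempdb log
      USE tempdb;
      DBCC SHRINKFILE (templog, 1024);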

      I'm currently trying an SSIS copy from one table to the other, and
      since it's doing a bulk insert it doesn't seem to be touching the db
      log. However, tempdb is getting pretty fat, 12GB and going strong, but
      I've cleared a lot of room for it this time - it has about 179GB to go,
      so hopefully that will be enough. If this doesn't work I'll definitely
      try your batched insert suggestion.

      Thanks again,
      Seth



      • Greg D. Moore (Strider)

        #4
        Re: Copy 60GB table?


        "sql_server_use r" <kaioptera@gmai l.com> wrote in message
        news:1147299426 .561430.312200@ u72g2000cwu.goo glegroups.com.. .[color=blue]
        > Thanks, Erland - I'm gonna have to pay you a consulting fee soon.
        >[/color]

        BTW, might be worth looking at using BCP to dump it out and then insert it
        into the new table.

        Then build your indices.
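
        Something along these lines (a rough sketch, assuming Windows
        authentication; MyDb, YOURSERVER and c:\dump\data.bcp are
        placeholders):

        rem export the source table in native format
        bcp MyDb.dbo.source_table out c:\dump\data.bcp -n -T -S YOURSERVER

        rem import into the destination table; -b sets the batch size and
        rem -h "TABLOCK" takes a table lock during the load
        bcp MyDb.dbo.dest_table in c:\dump\data.bcp -n -T -S YOURSERVER -b 100000 -h "TABLOCK"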




        • Tony Rogerson

          #5
          Re: Copy 60GB table?

          > 2) Create the clustered index on the target table with a relatively low
          > fill factor, say 30%. (Of course, no NC indexes would be in place.)

          As the table is empty, there is no point in setting a fill factor,
          because it's not enforced for new pages.

          It's only applied when you create or rebuild an index on existing data.

          I agree, though - the clustered index should be in place before you start.

          --
          Tony Rogerson
          SQL Server MVP
          http://sqlserverfaq.com - free video tutorials


          "Erland Sommarskog" <esquel@sommars kog.se> wrote in message
          news:Xns97BFF29 DB95D8Yazorman@ 127.0.0.1...[color=blue]
          > sql_server_user (kaioptera@gmai l.com) writes:[color=green]
          >> I'm trying to copy all 440 million rows from one table in my SQL Server
          >> 2005 db to another table with a different clustering scheme. After a
          >> few test inserts that were successful (up to a million rows in 20
          >> seconds with no problem), I crossed my fingers and went whole-hog with
          >> the naive "insert into dest_table (column1, column2, ...) select
          >> column1, column2, ... from source_table" but of course it ran out of
          >> space in *both* the db log file (at about 130 GB) and the tempdb log
          >> file (at about 54GB) and rolled it back, wrapping up 10 hours after I
          >> kicked it off.[/color]
          >
          > That operation is definitely not a walk in a park.
          >
          > I think I would attempt this strategy:
          >
          > 1) Set recovery for the database to simple recovery.
          > 2) Create the clustered index on the target table with a relatively low
          > fill factor, say 30%. (Of course, no NC indexes would be in place.)
          > 3) Insert batchwise from the old table, where batches are defined by
          > intervals of the clustered index of the source table. For batchsize,
          > I would use 100000 to 500000 rows, depending on wide the table is.
          > 4) I would make the control loop so that it is easy to stop it. For
          > instance on each iteration read from a control table. Possibly also
          > a table the defines the batches, so you can keep track of what has
          > been copied. But the table should not be joined with the INSERT;
          > read interval into variales.
          > 5) Occassionally stop the process and monitor fragmentation in
          > target table. It should first decrease, as holes are filled in,
          > but then it will increase again. (But you will probably see this
          > from the fact that the time for iteration increases.)
          > 6) Once everything is done, switch to full recovery and backup the
          > database.
          >
          > My thinking here is that creating the clustered index on this monster
          > is going require a lot of log and tempdb - you can guess what happens.
          > So let's have the clustered index in place from the start, even if
          > that will take longer time.
          >
          >
          >
          > --
          > Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se
          >
          > Books Online for SQL Server 2005 at
          > http://www.microsoft.com/technet/pro...ads/books.mspx
          > Books Online for SQL Server 2000 at
          > http://www.microsoft.com/sql/prodinf...ons/books.mspx[/color]



          • Erland Sommarskog

            #6
            Re: Copy 60GB table?

            Tony Rogerson (tonyrogerson@sqlserverfaq.com) writes:
            >> 2) Create the clustered index on the target table with a relatively low
            >> fill factor, say 30%. (Of course, no NC indexes would be in place.)
            >
            > As the table is empty, there is no point in setting a fill factor,
            > because it's not enforced for new pages.
            >
            > It's only applied when you create or rebuild an index on existing data.
            >
            > I agree, though - the clustered index should be in place before you start.

            Eh, I guess I will have to stick with my general lame excuse that
            the hour was late.

            However, it could perhaps be an idea to reindex every now and then
            with a low fill factor.
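
            For instance, something like this (just a sketch - the index and
            table names are placeholders):

            -- Rebuild the clustered index with a low fill factor, so the pages
            -- already filled get free space again for the rows still to come.
            ALTER INDEX pk_dest_table ON dbo.dest_table
            REBUILD WITH (FILLFACTOR = 30);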



            --
            Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

            Books Online for SQL Server 2005 at
            http://www.microsoft.com/technet/pro...ads/books.mspx
            Books Online for SQL Server 2000 at
            http://www.microsoft.com/sql/prodinf...ons/books.mspx


            • sql_server_user

              #7
              Re: Copy 60GB table?

              SSIS worked. It looks like it just put the *entire* table in tempdb -
              to sort it into the new clustering order, I guess. The key was
              eliminating logging with the bulk insert. Thanks for the help,
              everybody.

