CREATE INDEX on large table

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • dfurtney

    CREATE INDEX on large table

    SQL Server 7/2000: We have reasonably large tables (3,000,000 rows)
    that we need to add some indexes for. In a test, it took over 12 hours
    to CREATE a new INDEX against this table. One of us suggested that we
    create a temp table with the new index and copy the data from the old
    table into the new one, then rename it. I understand this took 15
    minutes. Why the heck would it be faster to move the data and build
    multiple indexes incrementally vs adding an index??

  • David Rawheiser

    #2
    Re: CREATE INDEX on large table

    An index on a sorted table is quicker as the indexing process does not need
    to reorganized it as its creating the index.

    "dfurtney" <dfurtney@hotma il.com> wrote in message
    news:1105492090 .497574.233360@ z14g2000cwz.goo glegroups.com.. .[color=blue]
    > SQL Server 7/2000: We have reasonably large tables (3,000,000 rows)
    > that we need to add some indexes for. In a test, it took over 12 hours
    > to CREATE a new INDEX against this table. One of us suggested that we
    > create a temp table with the new index and copy the data from the old
    > table into the new one, then rename it. I understand this took 15
    > minutes. Why the heck would it be faster to move the data and build
    > multiple indexes incrementally vs adding an index??
    >[/color]


    Comment

    • Tzvika Barenholz

      #3
      Re: CREATE INDEX on large table

      Hi

      This seems unlikely.
      You did not mention if the table is a heap (i.e. there is no clustered
      index).
      You probably ran into a case where the data in the old table was so out
      of order that building the additional index was constatnly splitting
      pages.

      Comment

      • Erland Sommarskog

        #4
        Re: CREATE INDEX on large table

        dfurtney (dfurtney@hotma il.com) writes:[color=blue]
        > SQL Server 7/2000: We have reasonably large tables (3,000,000 rows)
        > that we need to add some indexes for. In a test, it took over 12 hours
        > to CREATE a new INDEX against this table. One of us suggested that we
        > create a temp table with the new index and copy the data from the old
        > table into the new one, then rename it. I understand this took 15
        > minutes. Why the heck would it be faster to move the data and build
        > multiple indexes incrementally vs adding an index??[/color]

        12 hours to create an index for three million rows sounds abnormal.

        Of course, if the table did not have a clustered index, but already had
        several non-clustered index, and you added a clustered index, then it
        will take some time, but still not 12 hours.

        One possible reason, is that the CREATE INDEX process was blocked by
        another process most of the time.



        --
        Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

        Books Online for SQL Server SP3 at
        Transform your business with a unified data platform. SQL Server 2019 comes with Apache Spark and Hadoop Distributed File System (HDFS) for intelligence over all your data.

        Comment

        • dfurtney

          #5
          Re: CREATE INDEX on large table

          The table, vehicle_history , had a clustered primary key and 2
          additional indexes. We were adding an additional index on a single
          integer column. That column was filled with a single default value of
          0 in this case. It was running on a dedicated QA server.

          We want to add this index to speed up a query against the new column.
          But we found it was taking much longer than we expected to add the
          index. The only thing that seemed somewhat unusual is the size of the
          table.

          Comment

          • Erland Sommarskog

            #6
            Re: CREATE INDEX on large table

            dfurtney (dfurtney@hotma il.com) writes:[color=blue]
            > The table, vehicle_history , had a clustered primary key and 2
            > additional indexes. We were adding an additional index on a single
            > integer column. That column was filled with a single default value of
            > 0 in this case. It was running on a dedicated QA server.
            >
            > We want to add this index to speed up a query against the new column.
            > But we found it was taking much longer than we expected to add the
            > index. The only thing that seemed somewhat unusual is the size of the
            > table.[/color]

            Does all three million rows have 0 in this column? In that case it would
            not be a very good index.

            I have no idea whether large amount of duplicate values could be reason
            that creating the index so long. I still lean towards that there was some-
            thing else, for instance blocking, that was the cause. It simply doesn't
            take 12 hours to create a non-clustered index on a three-million row table.


            --
            Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

            Books Online for SQL Server SP3 at
            Transform your business with a unified data platform. SQL Server 2019 comes with Apache Spark and Hadoop Distributed File System (HDFS) for intelligence over all your data.

            Comment

            • dfurtney

              #7
              Re: CREATE INDEX on large table

              The query/index would be used by a subset of our customers that utilize
              a specific feature of the product. For them, the values would be
              non-zero, of course, and the resulting index quite selective. However,
              we were going to add the index for all customers since we generally
              don't know what functionaly they will be utilizing. This dataset was
              from a customer not using that function.

              I thought it was awfully long and one of my co-workers was going to do
              some testing. What order of time would you expect?

              Comment

              • Erland Sommarskog

                #8
                Re: CREATE INDEX on large table

                dfurtney (dfurtney@hotma il.com) writes:[color=blue]
                > The query/index would be used by a subset of our customers that utilize
                > a specific feature of the product. For them, the values would be
                > non-zero, of course, and the resulting index quite selective. However,
                > we were going to add the index for all customers since we generally
                > don't know what functionaly they will be utilizing. This dataset was
                > from a customer not using that function.
                >
                > I thought it was awfully long and one of my co-workers was going to do
                > some testing. What order of time would you expect?[/color]

                The below script which emulates the situation you have described ran
                in eight minutes on my workstation, a 2.8 GHz HT box with 1 GB of RAM
                (but with SQL Server constrained to some 120 MB), running Windows XP SP2.
                The particular part of creating a non-clustered index on a column with
                non-variant values took 30 seconds. (But then all data was in cache.)

                Of course, not only number of rows count, but the size of the rows as
                well, since wider the rows, the more pages you get. Then again, for
                the sorting phase there are still only three million rows.

                It occurred to me that one thing you could have run into is autogrow.
                If the database is 300 GB, and you have 10% autogrow and this happens to
                set in during the index creation, you're in for a pause. Initializing
                30 GB of data does take some time. Not 12 hours though. 20-30 minutes
                may be expected.

                use master
                go
                drop database klump
                go
                create database klump
                go
                use klump
                go
                select TOP 3000000 klumpid = identity(int, 1, 1),
                slaskcol = 0,
                a.* into klump
                from Northwind..Orde rs a
                cross join Northwind..Orde rs b
                cross join Northwind..Orde rs c
                go
                ALTER TABLE klump ADD CONSTRAINT pk_klump PRIMARY KEY (klumpid)
                go
                CREATE INDEX orderidix ON klump (OrderID)
                CREATE INDEX customerid ON klump (CustomerID)
                go
                SELECT getdate()
                go
                CREATE INDEX slaskix ON klump(slaskcol)
                go
                SELECT getdate()

                --
                Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

                Books Online for SQL Server SP3 at
                Transform your business with a unified data platform. SQL Server 2019 comes with Apache Spark and Hadoop Distributed File System (HDFS) for intelligence over all your data.

                Comment

                • dfurtney

                  #9
                  Re: CREATE INDEX on large table

                  Although this happened twice on two machines when analyzing the upgrade
                  script via profiler -- our latest attempts to isolate what is happening
                  ended up not reproducing our earlier results. The indexes are building
                  in 2 minutes when tested in isolation outside the script. The only
                  operation which we have reliably reproduced as slow is adding a column
                  with a default value to a large table - which is taking on the order of
                  20 minutes in the million row range. We do this a number of times. We
                  don't yet have an explanation for why the script took so long, although
                  we are running the analysis one more time as I speak.

                  Sorry to have presented you with a problem that didn't reproduce - but
                  we were about to make some drastic changes based on the faulty
                  assumption that building million row indexes was much more expensive
                  then it really is. I appreciate the help you folks have provided!

                  Comment

                  • Erland Sommarskog

                    #10
                    Re: CREATE INDEX on large table

                    dfurtney (dfurtney@hotma il.com) writes:[color=blue]
                    > Although this happened twice on two machines when analyzing the upgrade
                    > script via profiler -- our latest attempts to isolate what is happening
                    > ended up not reproducing our earlier results. The indexes are building
                    > in 2 minutes when tested in isolation outside the script. The only
                    > operation which we have reliably reproduced as slow is adding a column
                    > with a default value to a large table - which is taking on the order of
                    > 20 minutes in the million row range. We do this a number of times.[/color]

                    This sounds like a perfectly normal time for such an operation. Since
                    this column has to be copied into every row, the entire table has to
                    be rewritten. And unless every page bas bytes to spare for the new column,
                    you also get rows rearranged, and it is not a simple update in place.

                    In this case, it can sometimes be better to create a new table and
                    copy data over. (This in fact what we always do in our update scripts,
                    although that more has to do with the greater flexibility this
                    technique offers.)
                    [color=blue]
                    > Sorry to have presented you with a problem that didn't reproduce - but
                    > we were about to make some drastic changes based on the faulty
                    > assumption that building million row indexes was much more expensive
                    > then it really is. I appreciate the help you folks have provided![/color]

                    Oh, never mind! I think your concern was very valid, and I am glad to
                    have helped by telling you that it must have been a false alarm.

                    --
                    Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

                    Books Online for SQL Server SP3 at
                    Transform your business with a unified data platform. SQL Server 2019 comes with Apache Spark and Hadoop Distributed File System (HDFS) for intelligence over all your data.

                    Comment

                    • maxl@msn.com

                      #11
                      Re: CREATE INDEX on large table

                      I suggest the real issue is:
                      [color=blue]
                      > The only operation which we have reliably reproduced as slow is[/color]
                      adding a column[color=blue]
                      > with a default value to a large table - which is taking on the order[/color]
                      of[color=blue]
                      > 20 minutes in the million row range. We do this a number of times.[/color]

                      Why would you frequently add columns to a three million row table?

                      As far as selectivity goes, you don't gain any advantage to having a
                      default value of 0 vs. having a default value of NULL - it is a
                      disadvantage because you are frequently adding columns to a 3,000,000
                      row table. If the default value is NULL, SQL Server does not have to
                      rebuild the table when you add the value - NULL is nothing as far as
                      SQL Server is concerned.

                      The statistics histogram, based on only a select few companies having
                      this feature, would look something like:
                      _______________ __________|

                      Comment

                      • dfurtney

                        #12
                        Re: CREATE INDEX on large table

                        Point well taken regarding NULL vs 0. The reason we use 0 instead of
                        NULL is because most of the software and reporting is "confused" by
                        nulls. MFC recordsets return the inconvenient tvalue of

                        #define AFX_RFX_LONG_PS EUDO_NULL (0x4a4d4120L)

                        for null integers unless you specifically write code to check for null.
                        We have a large, old codebase which has no null handling code.

                        This is a function/service pack, so we commonly add a number of new
                        fields to support new functionality. In our 3rd test, the upgrade
                        script took 3 hours - which is in the "normal" range. I guess we are
                        going to attribute the earlier results as anomalous for now and monitor
                        for another occurrence. The index is useless for customers not
                        utilizing the feature, however, if the cost is only 2 minutes, it is
                        easier to just add the index for all customers. For those customers
                        using the feature/index, it is highly selective.

                        Comment

                        Working...