How do you improve SQL performance over large amount of data?

  • charlies224@hotmail.com

    How do you improve SQL performance over large amount of data?

    Hi,

    I am using SQL 2000 and have a table that contains more than 2 million
    rows of data (and growing). Right now, I have encountered 2 problems:

    1) Sometimes, when I try to query against this table, I get an SQL
    command timeout. I did more testing with Query Analyzer and found that
    the same queries do not always take about the same time to execute.
    Could anyone please tell me what affects the speed of a query, and
    which factor matters most? (I can think of open connections, the
    server's CPU/memory...)

    2) I am not sure whether 2 million rows is considered a lot, but it has
    started to take 5~10 seconds to finish some simple queries. I am
    wondering what the best practices are for handling this amount of data
    while keeping decent performance.

    Thank you,

    Charlie Chang
    [Charlies224@hotmail.com]

  • philipdm@msn.com

    #2
    Re: How do you improve SQL performance over large amount of data?


    Have you researched indexes?
    Generally, if you create an index on the fields most commonly used in
    your WHERE clauses, you can increase performance considerably.
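
    For example, a single nonclustered index is a minimal sketch of the
    idea; the table and column names below are only placeholders:

    -- Hypothetical example: index the column you filter on most often
    CREATE NONCLUSTERED INDEX IX_YourTable_YourColumn
    ON dbo.YourTable (YourColumn)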
    Keep in mind that creating too many indexes can hinder performance for
    insert and delete queries, since every index has to be maintained after
    each of these operations.
    Any other suggestions would require us seeing how you built the table.
    Hope that helps.

    Philip



    • Erland Sommarskog

      #3
      Re: How do you improve SQL performance over large amount of data?

      (charlies224@hotmail.com) writes:
      > I am using SQL 2000 and have a table that contains more than 2 million
      > rows of data (and growing). Right now, I have encountered 2 problems:

      I'll take the second question first, as it is more general.

      > 2) I am not sure whether 2 million rows is considered a lot, but it has
      > started to take 5~10 seconds to finish some simple queries. I am
      > wondering what the best practices are for handling this amount of data
      > while keeping decent performance.

      Two million rows for a table is a respectable number, although the world
      has seen many larger tables than this. By the way, what matters a lot
      is the total size: a two-million-row table with a single integer column
      and a two-million-row table with a single char(8000) column are very
      different. But say that you have some 30 columns and an average row size
      of 300 bytes. That's a 600 MB table, which is certainly not a small table.

      For a table of that size, it's essential that you have good indexes
      for the common queries. It is also essential that you rebuild indexes
      on a regular basis with DBCC DBREINDEX; how often depends on how quickly
      they get fragmented.
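
      As a sketch, a periodic rebuild could look like this (the table name is
      only a placeholder):

      -- Rebuild all indexes on one table with a fill factor of 90
      DBCC DBREINDEX ('dbo.Sales', '', 90)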

      When you say that queries are taking a long time, it could be because
      you need to add some more indexes. One tool for finding proper indexes
      is to run the Index Tuning Wizard on a workload.

      If you believe that you have the right indexes, a possible cause could
      be fragmentation. The command DBCC SHOWCONTIG can give you information
      about this.
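
      For example (again with a placeholder table name):

      -- Report fragmentation statistics for one table
      DBCC SHOWCONTIG ('dbo.Sales')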

      > 1) Sometimes, when I try to query against this table, I get an SQL
      > command timeout. I did more testing with Query Analyzer and found that
      > the same queries do not always take about the same time to execute.
      > Could anyone please tell me what affects the speed of a query, and
      > which factor matters most? (I can think of open connections, the
      > server's CPU/memory...)

      There are a bit too many unknowns here to give an exact answer. Does the
      same query take a different amount of time from execution to execution?
      There are at least two possible causes for this: blocking and caching.
      If another process performs some update operation, your query may be
      blocked for a while. You can examine blocking with the sp_who command:
      if you see a non-zero value in the Blk column, the spid on that row is
      blocked by the spid in Blk. In the status bar in QA you can see the spid
      of the current window.
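
      As a sketch, two ways to look for blockers (the sysprocesses columns are
      as in SQL 2000):

      EXEC sp_who   -- look for non-zero values in the blk column

      -- Or query sysprocesses directly for blocked spids
      SELECT spid, blocked, waittime, lastwaittype
      FROM master..sysprocesses
      WHERE blocked <> 0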

      SQL Server tries to keep as much data as it can in cache. If data is
      in cache, the response time for a query can be significantly better
      than if data has to be read from disk. But the cache cannot be bigger
      than a certain amount of the available memory in the machine. (I don't
      know the exact number, but say 60-70%). If there are a lot of scans in
      many tables, data will go in and out of the cache, and response time
      will vary accordingly.

      When testing different queries or indexes, one way to factor out the
      effect of the cache is to use the command DBCC DROPCLEANBUFFERS. This
      flushes the cache entirely. Obviously, it is not a good idea to do
      this on a production box.
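
      For example, when timing a query on a test box (a sketch; not something
      to run in production):

      CHECKPOINT              -- write dirty pages to disk first
      DBCC DROPCLEANBUFFERS   -- then empty the buffer cache
      -- now run the query you want to time against a cold cache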

      --
      Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

      Books Online for SQL Server SP3 at
      http://www.microsoft.com/sql/techinf...2000/books.asp


      • charlies224@hotmail.com

        #4
        Re: How do you improve SQL performance over large amount of data?

        Thx for the reply, I will read about indexes tonight.

        As for my table structure, it consists of 12 columns in the following
        order:

        Sale_Date_DT (datetime, first column)
        Employee_ID (int)
        Machine_ID (int)
        Receipt_Number_NV (nvarchar)
        UPC_NV (nvarchar)
        Quantity_Sold_IN (int)
        Sale_Price_MN (money)
        Tax_MN (money)
        Payment_Type_IN (int)
        Payment_Amount_MN (money)
        Rebate_Category_ID (int)
        Sales_ID (int, key, identity)

        I get somewhere between 1.5 and 2 million rows of data every year. I
        have been thinking about archiving and reindexing every 6 months.

        I guess I will read about indexing and full-text indexing (maybe on
        the receipt number). Any other suggestions would be appreciated :)
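
        As an illustration only, and assuming the common queries filter on the
        sale date or look up receipts by number (the table name dbo.Sales is a
        placeholder, since it is not given in the thread):

        CREATE NONCLUSTERED INDEX IX_Sales_SaleDate
        ON dbo.Sales (Sale_Date_DT)

        CREATE NONCLUSTERED INDEX IX_Sales_ReceiptNumber
        ON dbo.Sales (Receipt_Number_NV)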

        Thank you,

        Charlie Chang
        [charlies224@hotmail.com]


        • charlies224@hotmail.com

          #5
          Re: How do you improve SQL performance over large amount of data?

          Adding indexes works great. Thank you.

          I do have a few more questions:

          when I do dbcc showcontig (table_name)
          I get the following information:

          TABLE level scan performed.
          - Pages Scanned................................: 38882
          - Extents Scanned..............................: 4879
          - Extent Switches..............................: 4878
          - Avg. Pages per Extent........................: 8.0
          - Scan Density [Best Count:Actual Count].......: 99.63% [4861:4879]
          - Logical Scan Fragmentation ..................: 0.08%
          - Extent Scan Fragmentation ...................: 1.46%
          - Avg. Bytes Free per Page.....................: 27.4
          - Avg. Page Density (full).....................: 99.66%

          I guess the number to look at is the Scan Density (the table I had
          problems with was down to 34%). Now, what I really want to know is:
          in general, when should I reindex the table?

          Another issue I ran into is that while performing all the database
          maintenance, with some of it failing (fragmentation was causing
          operation timeouts), my transaction log got so big that my HD ran
          out of space. I detached the database, truncated the log, and
          reattached the database to fix this. I am wondering: is there a way
          to make the transaction log discard old log records when the log
          file reaches a certain size?


          Thank you again for your reply, it really helped.

          Charlie Chang


          • Erland Sommarskog

            #6
            Re: How do you improve SQL performance over large amount of data?

            (charlies224@hotmail.com) writes:
            > I do have a few more questions:
            >
            > when I do dbcc showcontig (table_name)
            > I get the following information:
            >
            > TABLE level scan performed.
            > - Pages Scanned................................: 38882
            > - Extents Scanned..............................: 4879
            > - Extent Switches..............................: 4878
            > - Avg. Pages per Extent........................: 8.0
            > - Scan Density [Best Count:Actual Count].......: 99.63% [4861:4879]
            > - Logical Scan Fragmentation ..................: 0.08%
            > - Extent Scan Fragmentation ...................: 1.46%
            > - Avg. Bytes Free per Page.....................: 27.4
            > - Avg. Page Density (full).....................: 99.66%
            >
            > I guess the number to look at is the Scan Density (the table I had
            > problems with was down to 34%). Now, what I really want to know is:
            > in general, when should I reindex the table?

            Depends a little, and there are actually a number of strategies you
            can use, depending on how the table is used. But as a simple rule of
            thumb, don't defragment if scan density is better than 70%. If nothing
            else, that avoids unnecessary bloat of the transaction log.
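
            As a sketch, when scan density has dropped below that threshold
            (database, table and index names are placeholders):

            -- Defragment a single index; keeps the table online and logs in
            -- many small transactions
            DBCC INDEXDEFRAG (MyDatabase, Sales, IX_Sales_SaleDate)

            -- Or rebuild all indexes on the table (more thorough, but takes
            -- locks and logs more)
            DBCC DBREINDEX ('dbo.Sales', '', 90)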

            > Another issue I ran into is that while performing all the database
            > maintenance, with some of it failing (fragmentation was causing
            > operation timeouts), my transaction log got so big that my HD ran
            > out of space. I detached the database, truncated the log, and
            > reattached the database to fix this. I am wondering: is there a way
            > to make the transaction log discard old log records when the log
            > file reaches a certain size?

            Well, it depends on what you want the transaction log for. If you are
            perfectly content with restoring the latest full backup (a backup every
            night is good) in case of a crash, just switch to simple recovery mode.
            You can still see the transaction log explode during reindexing, since
            the log can never be truncated past any currently running transaction,
            but at least when you are done, the log will be truncated automatically.

            If you need point-in-time recovery, you must run with full or bulk-logged
            recovery, but in that case you don't want the transaction log to be
            erased; you need to back it up every now and then.
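
            Both options as a sketch (the database name and backup path are
            placeholders):

            -- Option 1: simple recovery; the log is truncated automatically
            ALTER DATABASE MyDatabase SET RECOVERY SIMPLE

            -- Option 2: full recovery plus regular log backups to keep the
            -- log from growing without bound
            ALTER DATABASE MyDatabase SET RECOVERY FULL
            BACKUP LOG MyDatabase TO DISK = 'D:\Backups\MyDatabase_log.bak'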

            --
            Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

            Books Online for SQL Server SP3 at
            http://www.microsoft.com/sql/techinf...2000/books.asp
