clustered vs. non clustered

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • pb648174

    clustered vs. non clustered

    I've been doing a bit of reading and have read in quite a few places
    that an identity column is a good clustered index and that all or at
    least most tables should have a clustered index. The tool I used to
    generate tables made them all with non clustered indexes so I would
    like to drop all of them and generate clustered indexes. So my
    questions is a) good idea? and b) how? There are foreign key references
    to most of them so those would need to be dropped first and then
    re-created after the clustered one was created and that could cascade
    (I think?)

    Any existing scripts out there that might do this? I found something
    similar and modified it, the sql is included below. This gives me the
    list of all the columns I need, I just need to get the foreign keys for
    each from here before each one and generate all the create/drop
    scripts.

    All the columns I am looking to do this for are called "Id" making this
    somewhat simpler. I'm just looking to incrementally make the SQL side
    better and don't want to rewrite a bunch of application level code to
    make the column names ISO compliant, etc.

    /*
    -- Returns whether the column is ASC or DESC
    CREATE FUNCTION dbo.GetIndexCol umnOrder
    (
    @object_id INT,
    @index_id TINYINT,
    @column_id TINYINT
    )
    RETURNS NVARCHAR(5)
    AS
    BEGIN
    DECLARE @r NVARCHAR(5)
    SELECT @r = CASE INDEXKEY_PROPER TY
    (
    @object_id,
    @index_id,
    @column_id,
    'IsDescending'
    )
    WHEN 1 THEN N' DESC'
    ELSE N''
    END
    RETURN @r
    END

    -- Returns the list of columns in the index
    CREATE FUNCTION dbo.GetIndexCol umns
    (
    @table_name SYSNAME,
    @object_id INT,
    @index_id TINYINT
    )
    RETURNS NVARCHAR(4000)
    AS
    BEGIN
    DECLARE
    @colnames NVARCHAR(4000),
    @thisColID INT,
    @thisColName SYSNAME

    SET @colnames = INDEX_COL(@tabl e_name, @index_id, 1)
    + dbo.GetIndexCol umnOrder(@objec t_id, @index_id, 1)

    SET @thisColID = 2
    SET @thisColName = INDEX_COL(@tabl e_name, @index_id, @thisColID)
    + dbo.GetIndexCol umnOrder(@objec t_id, @index_id, @thisColID)

    WHILE (@thisColName IS NOT NULL)
    BEGIN
    SET @thisColID = @thisColID + 1
    SET @colnames = @colnames + ', ' + @thisColName

    SET @thisColName = INDEX_COL(@tabl e_name, @index_id,
    @thisColID)
    + dbo.GetIndexCol umnOrder(@objec t_id, @index_id,
    @thisColID)
    END
    RETURN @colNames
    END

    CREATE VIEW dbo.vAllIndexes
    AS
    begin
    SELECT
    TABLE_NAME = OBJECT_NAME(i.i d),
    INDEX_NAME = i.name,
    COLUMN_LIST = dbo.GetIndexCol umns(OBJECT_NAM E(i.id), i.id,
    i.indid),
    IS_CLUSTERED = INDEXPROPERTY(i .id, i.name, 'IsClustered'),
    IS_UNIQUE = INDEXPROPERTY(i .id, i.name, 'IsUnique'),
    FILE_GROUP = g.GroupName
    FROM
    sysindexes i
    INNER JOIN
    sysfilegroups g
    ON
    i.groupid = g.groupid
    WHERE
    (i.indid BETWEEN 1 AND 254)
    -- leave out AUTO_STATISTICS :
    AND (i.Status & 64)=0
    -- leave out system tables:
    AND OBJECTPROPERTY( i.id, 'IsMsShipped') = 0
    end
    */

    SELECT
    v.*
    FROM
    dbo.vAllIndexes v
    INNER JOIN
    INFORMATION_SCH EMA.TABLE_CONST RAINTS T
    ON
    T.CONSTRAINT_NA ME = v.INDEX_NAME
    AND T.TABLE_NAME = v.TABLE_NAME
    AND T.CONSTRAINT_TY PE = 'PRIMARY KEY'
    AND v.COLUMN_LIST = 'Id'
    AND v.IS_CLUSTERED = 0
    ORDER BY v.TABLE_NAME

  • Stu

    #2
    Re: clustered vs. non clustered

    It's OK to have a clustered index that is seperate from your
    nonclustered primary key, even if the two indexes cover the same
    columns. In fact, I usually build my indexes in this way in case I
    ever have to move the clustered index to a different column and I don't
    want to mess with my established foreign key constraints.

    That being said, I would simply add the clustered index to each table
    and not worry about dropping the pre-existing primary key constraint.
    It'll take a while, but it will work.

    Stu


    pb648174 wrote:
    I've been doing a bit of reading and have read in quite a few places
    that an identity column is a good clustered index and that all or at
    least most tables should have a clustered index. The tool I used to
    generate tables made them all with non clustered indexes so I would
    like to drop all of them and generate clustered indexes. So my
    questions is a) good idea? and b) how? There are foreign key references
    to most of them so those would need to be dropped first and then
    re-created after the clustered one was created and that could cascade
    (I think?)
    >
    Any existing scripts out there that might do this? I found something
    similar and modified it, the sql is included below. This gives me the
    list of all the columns I need, I just need to get the foreign keys for
    each from here before each one and generate all the create/drop
    scripts.
    >
    All the columns I am looking to do this for are called "Id" making this
    somewhat simpler. I'm just looking to incrementally make the SQL side
    better and don't want to rewrite a bunch of application level code to
    make the column names ISO compliant, etc.
    >
    /*
    -- Returns whether the column is ASC or DESC
    CREATE FUNCTION dbo.GetIndexCol umnOrder
    (
    @object_id INT,
    @index_id TINYINT,
    @column_id TINYINT
    )
    RETURNS NVARCHAR(5)
    AS
    BEGIN
    DECLARE @r NVARCHAR(5)
    SELECT @r = CASE INDEXKEY_PROPER TY
    (
    @object_id,
    @index_id,
    @column_id,
    'IsDescending'
    )
    WHEN 1 THEN N' DESC'
    ELSE N''
    END
    RETURN @r
    END
    >
    -- Returns the list of columns in the index
    CREATE FUNCTION dbo.GetIndexCol umns
    (
    @table_name SYSNAME,
    @object_id INT,
    @index_id TINYINT
    )
    RETURNS NVARCHAR(4000)
    AS
    BEGIN
    DECLARE
    @colnames NVARCHAR(4000),
    @thisColID INT,
    @thisColName SYSNAME
    >
    SET @colnames = INDEX_COL(@tabl e_name, @index_id, 1)
    + dbo.GetIndexCol umnOrder(@objec t_id, @index_id, 1)
    >
    SET @thisColID = 2
    SET @thisColName = INDEX_COL(@tabl e_name, @index_id, @thisColID)
    + dbo.GetIndexCol umnOrder(@objec t_id, @index_id, @thisColID)
    >
    WHILE (@thisColName IS NOT NULL)
    BEGIN
    SET @thisColID = @thisColID + 1
    SET @colnames = @colnames + ', ' + @thisColName
    >
    SET @thisColName = INDEX_COL(@tabl e_name, @index_id,
    @thisColID)
    + dbo.GetIndexCol umnOrder(@objec t_id, @index_id,
    @thisColID)
    END
    RETURN @colNames
    END
    >
    CREATE VIEW dbo.vAllIndexes
    AS
    begin
    SELECT
    TABLE_NAME = OBJECT_NAME(i.i d),
    INDEX_NAME = i.name,
    COLUMN_LIST = dbo.GetIndexCol umns(OBJECT_NAM E(i.id), i.id,
    i.indid),
    IS_CLUSTERED = INDEXPROPERTY(i .id, i.name, 'IsClustered'),
    IS_UNIQUE = INDEXPROPERTY(i .id, i.name, 'IsUnique'),
    FILE_GROUP = g.GroupName
    FROM
    sysindexes i
    INNER JOIN
    sysfilegroups g
    ON
    i.groupid = g.groupid
    WHERE
    (i.indid BETWEEN 1 AND 254)
    -- leave out AUTO_STATISTICS :
    AND (i.Status & 64)=0
    -- leave out system tables:
    AND OBJECTPROPERTY( i.id, 'IsMsShipped') = 0
    end
    */
    >
    SELECT
    v.*
    FROM
    dbo.vAllIndexes v
    INNER JOIN
    INFORMATION_SCH EMA.TABLE_CONST RAINTS T
    ON
    T.CONSTRAINT_NA ME = v.INDEX_NAME
    AND T.TABLE_NAME = v.TABLE_NAME
    AND T.CONSTRAINT_TY PE = 'PRIMARY KEY'
    AND v.COLUMN_LIST = 'Id'
    AND v.IS_CLUSTERED = 0
    ORDER BY v.TABLE_NAME

    Comment

    • Erland Sommarskog

      #3
      Re: clustered vs. non clustered

      pb648174 (google@webpaul .net) writes:
      I've been doing a bit of reading and have read in quite a few places
      that an identity column is a good clustered index and that all or at
      least most tables should have a clustered index. The tool I used to
      generate tables made them all with non clustered indexes so I would
      like to drop all of them and generate clustered indexes.
      Yes, having clustered indexes on all tables is a good idea, but the
      IDENTITY column is not always the best choice. It's a good choice if
      you have a high transaction rate, and you want to avoid fragmentation
      and page splits.

      But for SELECT queries it is likely that in most tables that there
      are better candidates for the clustered index, as you don't do
      range queries on ids that often. So I would suggest that you review
      your tables and look for better columns to cluster on.

      Here I had single-column PKs in mind. Clustering on a multi-column PK,
      or part of it is another matter. Take an OrderDetails table for instance.
      "SELECT ... FROM OrderDetails WHERE OrderID = @id" is a very likely
      query and a clustred index may be great here.

      Stu's suggestion of keeping the PK non-clustered, and adding a clustered
      index as well is not that bad. If you have a multi-column key that is 4
      30 bytes long, but the first key column is four bytes, the clustering on
      the first columns means that the key size for the clustered index is
      only 8 bytes. (key col + uniquifier). Since cluster-key colunms appear
      in non-clustered index, this matters quite a bit.

      As for looking up the foreign keys, the tables are sysreferences in
      SQL 2000 and sys.forein_keys in SQL 2005.



      --
      Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

      Books Online for SQL Server 2005 at

      Books Online for SQL Server 2000 at

      Comment

      • pb648174

        #4
        Re: clustered vs. non clustered

        Well that makes things simpler then.. I'll try adding the clustered
        columns to one area of the app and see if it makes a positive or
        negative performance impact. Thanks for the info guys.

        Erland Sommarskog wrote:
        pb648174 (google@webpaul .net) writes:
        I've been doing a bit of reading and have read in quite a few places
        that an identity column is a good clustered index and that all or at
        least most tables should have a clustered index. The tool I used to
        generate tables made them all with non clustered indexes so I would
        like to drop all of them and generate clustered indexes.
        >
        Yes, having clustered indexes on all tables is a good idea, but the
        IDENTITY column is not always the best choice. It's a good choice if
        you have a high transaction rate, and you want to avoid fragmentation
        and page splits.
        >
        But for SELECT queries it is likely that in most tables that there
        are better candidates for the clustered index, as you don't do
        range queries on ids that often. So I would suggest that you review
        your tables and look for better columns to cluster on.
        >
        Here I had single-column PKs in mind. Clustering on a multi-column PK,
        or part of it is another matter. Take an OrderDetails table for instance.
        "SELECT ... FROM OrderDetails WHERE OrderID = @id" is a very likely
        query and a clustred index may be great here.
        >
        Stu's suggestion of keeping the PK non-clustered, and adding a clustered
        index as well is not that bad. If you have a multi-column key that is 4
        30 bytes long, but the first key column is four bytes, the clustering on
        the first columns means that the key size for the clustered index is
        only 8 bytes. (key col + uniquifier). Since cluster-key colunms appear
        in non-clustered index, this matters quite a bit.
        >
        As for looking up the foreign keys, the tables are sysreferences in
        SQL 2000 and sys.forein_keys in SQL 2005.
        >
        >
        >
        --
        Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se
        >
        Books Online for SQL Server 2005 at

        Books Online for SQL Server 2000 at
        http://www.microsoft.com/sql/prodinf...ons/books.mspx

        Comment

        • pb648174

          #5
          Re: clustered vs. non clustered

          Performance was actually worse once I added the clustered index. A
          query that takes 4 seconds took 5 seconds after adding clustered
          indexes to all the tables for a particular module. I turned the actual
          execution plan display on and saw that it was using the clustered index
          instead of the non clustered. So without the clustered index the
          largest time used is an "index seek" and a "table spool/lazy spool" and
          with the clustered index the index seek just becomes a clustered index
          seek... No big difference except it takes longer!

          pb648174 wrote:
          Well that makes things simpler then.. I'll try adding the clustered
          columns to one area of the app and see if it makes a positive or
          negative performance impact. Thanks for the info guys.
          >
          Erland Sommarskog wrote:
          pb648174 (google@webpaul .net) writes:
          I've been doing a bit of reading and have read in quite a few places
          that an identity column is a good clustered index and that all or at
          least most tables should have a clustered index. The tool I used to
          generate tables made them all with non clustered indexes so I would
          like to drop all of them and generate clustered indexes.
          Yes, having clustered indexes on all tables is a good idea, but the
          IDENTITY column is not always the best choice. It's a good choice if
          you have a high transaction rate, and you want to avoid fragmentation
          and page splits.

          But for SELECT queries it is likely that in most tables that there
          are better candidates for the clustered index, as you don't do
          range queries on ids that often. So I would suggest that you review
          your tables and look for better columns to cluster on.

          Here I had single-column PKs in mind. Clustering on a multi-column PK,
          or part of it is another matter. Take an OrderDetails table for instance.
          "SELECT ... FROM OrderDetails WHERE OrderID = @id" is a very likely
          query and a clustred index may be great here.

          Stu's suggestion of keeping the PK non-clustered, and adding a clustered
          index as well is not that bad. If you have a multi-column key that is 4
          30 bytes long, but the first key column is four bytes, the clustering on
          the first columns means that the key size for the clustered index is
          only 8 bytes. (key col + uniquifier). Since cluster-key colunms appear
          in non-clustered index, this matters quite a bit.

          As for looking up the foreign keys, the tables are sysreferences in
          SQL 2000 and sys.forein_keys in SQL 2005.



          --
          Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

          Books Online for SQL Server 2005 at

          Books Online for SQL Server 2000 at
          http://www.microsoft.com/sql/prodinf...ons/books.mspx

          Comment

          • Dan Guzman

            #6
            Re: clustered vs. non clustered

            Index tuning is not black-and-white, especially when it comes to the
            clustered index decision. It is likely than some queries will benefit by
            the PK clustered index while others will not. You'll need run a mix of
            queries that is representative of the actual workload mix to ascertain
            overall performance impact. IMHO, an all-or-nothing clustered index
            decision is naive.

            It is also possible that some tables will benefit with the clustered PK and
            others will not. I know that this adds a wrinkle to automated schema
            generation but this is reality. You might consider using the Index Tuning
            Wizard (SQL 2000) or Database Engine Tuning Advisor (SQL 2005) for index
            recommendations based on workload.


            --
            Hope this helps.

            Dan Guzman
            SQL Server MVP

            "pb648174" <google@webpaul .netwrote in message
            news:1155727057 .245342.208480@ 74g2000cwt.goog legroups.com...
            Performance was actually worse once I added the clustered index. A
            query that takes 4 seconds took 5 seconds after adding clustered
            indexes to all the tables for a particular module. I turned the actual
            execution plan display on and saw that it was using the clustered index
            instead of the non clustered. So without the clustered index the
            largest time used is an "index seek" and a "table spool/lazy spool" and
            with the clustered index the index seek just becomes a clustered index
            seek... No big difference except it takes longer!
            >
            pb648174 wrote:
            >Well that makes things simpler then.. I'll try adding the clustered
            >columns to one area of the app and see if it makes a positive or
            >negative performance impact. Thanks for the info guys.
            >>
            >Erland Sommarskog wrote:
            pb648174 (google@webpaul .net) writes:
            I've been doing a bit of reading and have read in quite a few places
            that an identity column is a good clustered index and that all or at
            least most tables should have a clustered index. The tool I used to
            generate tables made them all with non clustered indexes so I would
            like to drop all of them and generate clustered indexes.
            >
            Yes, having clustered indexes on all tables is a good idea, but the
            IDENTITY column is not always the best choice. It's a good choice if
            you have a high transaction rate, and you want to avoid fragmentation
            and page splits.
            >
            But for SELECT queries it is likely that in most tables that there
            are better candidates for the clustered index, as you don't do
            range queries on ids that often. So I would suggest that you review
            your tables and look for better columns to cluster on.
            >
            Here I had single-column PKs in mind. Clustering on a multi-column PK,
            or part of it is another matter. Take an OrderDetails table for
            instance.
            "SELECT ... FROM OrderDetails WHERE OrderID = @id" is a very likely
            query and a clustred index may be great here.
            >
            Stu's suggestion of keeping the PK non-clustered, and adding a
            clustered
            index as well is not that bad. If you have a multi-column key that is 4
            30 bytes long, but the first key column is four bytes, the clustering
            on
            the first columns means that the key size for the clustered index is
            only 8 bytes. (key col + uniquifier). Since cluster-key colunms appear
            in non-clustered index, this matters quite a bit.
            >
            As for looking up the foreign keys, the tables are sysreferences in
            SQL 2000 and sys.forein_keys in SQL 2005.
            >
            >
            >
            --
            Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se
            >
            Books Online for SQL Server 2005 at

            Books Online for SQL Server 2000 at
            http://www.microsoft.com/sql/prodinf...ons/books.mspx
            >

            Comment

            Working...