Update in SQL Server 2000 slow?

  • Dan Berlin

    Update in SQL Server 2000 slow?

    I have two tables:

    T1 : Key as bigint, Data as char(20) - size: 61M records
    T2 : Key as bigint, Data as char(20) - size: 5M records

    T2 is the smaller, with 5 million records.

    They both have clustered indexes on Key.
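
    In DDL terms the setup is roughly this (just a sketch; nullability and
    index names are guesses, and Key is bracketed since it is a keyword):

    CREATE TABLE T1 ([Key] bigint NOT NULL, Data char(20) NULL)   -- ~61M rows
    CREATE TABLE T2 ([Key] bigint NOT NULL, Data char(20) NULL)   -- ~5M rows

    CREATE CLUSTERED INDEX cix_T1_Key ON T1 ([Key])
    CREATE CLUSTERED INDEX cix_T2_Key ON T2 ([Key])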

    I want to do:

    update T1 set Data = T2.Data
    from T2
    where T2.Key = T1.Key

    The goal is to match Key values, and only update the Data field of T1
    where they match. SQL Server seems to optimize this query fairly well,
    doing an inner merge join on the Key fields; however, it then does a
    hash match to get the Data fields, and this is taking FOREVER. It
    takes something like 40 minutes to do the above query, when it seems
    to me the data could be updated much more efficiently. I would expect
    to see just a merge and an update, like I would see in the following
    query:

    update T1 set Data = [someconstantdata]
    from T2
    where T2.Key = T1.Key and T2.Data = [someconstantdata]

    The above works VERY quickly, and if I were to perform the above query
    5 million times (assuming that my data in T2 is completely unique and I
    would need to), it would finish very quickly, much sooner than the
    previous query. Why won't SQL Server just match these up while it is
    merging the data and update in one step? Can I make it do this? If I
    extracted the data in sorted order into a flat file, I could write a
    program in ten minutes to merge the two tables and update in one step,
    and it would fly through this, but I imagine that SQL Server is capable
    of doing it and I am just missing it.

    Any advice would be GREATLY appreciated!
  • Erland Sommarskog

    #2
    Re: Update in SQL Server 2000 slow?

    Dan Berlin (dberlin@alum.rpi.edu) writes:
    > I have two tables:
    >
    > T1 : Key as bigint, Data as char(20) - size: 61M records
    > T2 : Key as bigint, Data as char(20) - size: 5M records
    >
    > T2 is the smaller, with 5 million records.
    >
    > They both have clustered indexes on Key.
    >
    > I want to do:
    >
    > update T1 set Data = T2.Data
    > from T2
    > where T2.Key = T1.Key
    >
    > The goal is to match Key values, and only update the Data field of T1
    > where they match. SQL Server seems to optimize this query fairly well,
    > doing an inner merge join on the Key fields; however, it then does a
    > hash match to get the Data fields, and this is taking FOREVER. It
    > takes something like 40 minutes to do the above query, when it seems
    > to me the data could be updated much more efficiently. I would expect
    > to see just a merge and an update, like I would see in the following
    > query:
    >
    > update T1 set Data = [someconstantdata]
    > from T2
    > where T2.Key = T1.Key and T2.Data = [someconstantdata]

    This query is quite different. Here SQL Server can scan T2, and for
    every row where Data has a matching value it can look up the key in T1.
    Since SQL Server has statistics about the data, it can tell how many
    hits the condition on T2.Data will get.

    In your first query you are not restricting T2, so you will have to
    scan all of it. A nested loop join would mean 5 million lookups in T1,
    which is probably not good. I would expect a merge join to be possible,
    but that is still a scan of both tables.

    First I would add the condition:

    WHERE (T1.Data <> T2.Data OR
    T1.Data IS NULL AND T2.Data IS NOT NULL OR
    T1.Data IS NOT NULL AND T2.Data IS NULL)

    so that you only update the rows that actually need updating.

    If there are plenty of other columns in the tables, I would add
    non-clustered indexes on (Key, Data) to both tables, since these
    indexes would cover the query.
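
    Spelled out, the whole thing might look like this (a sketch only; the
    index names are made up, and Key is bracketed since it is a keyword):

    -- Covering indexes, created up front (only useful if the tables have
    -- more columns than the two shown above)
    CREATE NONCLUSTERED INDEX ix_T1_Key_Data ON T1 ([Key], Data)
    CREATE NONCLUSTERED INDEX ix_T2_Key_Data ON T2 ([Key], Data)

    -- The update, restricted to rows where Data actually differs
    UPDATE T1
    SET    Data = T2.Data
    FROM   T1
    JOIN   T2 ON T2.[Key] = T1.[Key]
    WHERE  (T1.Data <> T2.Data OR
            T1.Data IS NULL AND T2.Data IS NOT NULL OR
            T1.Data IS NOT NULL AND T2.Data IS NULL)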

    --
    Erland Sommarskog, SQL Server MVP, sommar@algonet.se

    Books Online for SQL Server SP3


    • Dan Berlin

      #3
      Re: Update in SQL Server 2000 slow?

      Erland Sommarskog <sommar@algonet.se> wrote in message news:<Xns94E36714C325Yazorman@127.0.0.1>...
      > [...]
      >
      > First I would add the condition:
      >
      > WHERE (T1.Data <> T2.Data OR
      > T1.Data IS NULL AND T2.Data IS NOT NULL OR
      > T1.Data IS NOT NULL AND T2.Data IS NULL)
      >
      > so that you only update the rows that actually need updating.
      >
      > If there are plenty of other columns in the tables, I would add
      > non-clustered indexes on (Key, Data) to both tables, since these
      > indexes would cover the query.

      This was very helpful, thank you!

      However, there is still a large Hash Match/Aggregate being performed
      that requires 45% (for a T2 of 2.5M records) of the resources for the
      query. A complete table scan of the larger table accounts for 34% of
      the query, the merge join is 19%, and the Hash Match is 45%,
      effectively doubling the time the query takes to run. The larger my
      T2 table is, the longer the hash takes, on a scale that is increasing
      faster than linearly (exponential? not sure). The hash seems to be
      doing the following:

      HASH: bmk1000, RESIDUAL: (bmk1000=bmk1000) (T2.Data = ANY(T2.Data))

      This is from the query analyzer's estimated execution plan. Do you
      know how I can avoid this hash, or why it is necessary? It really,
      really slows down the query to an unacceptable level.
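
      (For reference, the plan text above is from the estimated plan; something
      along these lines reproduces it as text, and SET STATISTICS PROFILE ON
      instead gives actual per-operator row counts when the statement runs:)

      SET SHOWPLAN_TEXT ON    -- show the estimated plan as text, without executing
      GO
      -- the UPDATE statement being tuned goes here
      UPDATE T1 SET Data = T2.Data FROM T2 WHERE T2.[Key] = T1.[Key]
      GO
      SET SHOWPLAN_TEXT OFF
      GO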

      Thanks again for the help!
      Dan Berlin


      • Erland Sommarskog

        #4
        Re: Update in SQL Server 2000 slow?

        Dan Berlin (dberlin@alum.rpi.edu) writes:
        > However, there is still a large Hash Match/Aggregate being performed
        > that requires 45% (for a T2 of 2.5M records) of the resources for the
        > query. A complete table scan of the larger table accounts for 34% of
        > the query, the merge join is 19%, and the Hash Match is 45%,
        > effectively doubling the time the query takes to run. The larger my
        > T2 table is, the longer the hash takes, on a scale that is increasing
        > faster than linearly (exponential? not sure). The hash seems to be
        > doing the following:
        >
        > HASH: bmk1000, RESIDUAL: (bmk1000=bmk1000) (T2.Data = ANY(T2.Data))
        >
        > This is from the query analyzer's estimated execution plan. Do you
        > know how I can avoid this hash, or why it is necessary? It really,
        > really slows down the query to an unacceptable level.

        Again, without access to the tables, it is difficult to give very good
        suggestions. Query tuning is very much hands-on work.

        But if the hashing is a bottleneck, and is growing more than linearly,
        one idea is to try running the update in chunks, taking a reasonably
        sized interval of the key values at a time.
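
        Something along these lines, perhaps (just a sketch; the interval size
        is only a starting point to experiment with, and Key is bracketed since
        it is a keyword):

        DECLARE @lo bigint, @hi bigint, @step bigint
        SET @step = 500000                    -- chunk size; tune to taste
        SELECT @lo = MIN([Key]) FROM T2

        WHILE @lo IS NOT NULL
        BEGIN
           SET @hi = @lo + @step

           UPDATE T1
           SET    Data = T2.Data
           FROM   T1
           JOIN   T2 ON T2.[Key] = T1.[Key]
           WHERE  T1.[Key] >= @lo AND T1.[Key] < @hi
             AND  (T1.Data <> T2.Data OR
                   T1.Data IS NULL AND T2.Data IS NOT NULL OR
                   T1.Data IS NOT NULL AND T2.Data IS NULL)

           -- Next interval; MIN() returns NULL once no keys remain, ending the loop.
           SELECT @lo = MIN([Key]) FROM T2 WHERE [Key] >= @hi
        END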

        The hashing is on Data, I would guess to locate the rows that need
        updating. Hashing is probably better than a nested-loop join.

        Could you post:

        o CREATE TABLE and CREATE INDEX statements for your tables?
        o The query as it looks now?
        o The query plan you get?

        This would leave me a little less in the dark.


        --
        Erland Sommarskog, SQL Server MVP, sommar@algonet.se

        Books Online for SQL Server SP3
