Update in SQL Server 2000 slow?

  • Dan Berlin

    Update in SQL Server 2000 slow?

    I have two tables:

    T1 : Key as bigint, Data as char(20) - size: 61M records
    T2 : Key as bigint, Data as char(20) - size: 5M records

    T2 is the smaller, with 5 million records.

    They both have clustered indexes on Key.
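
    In DDL terms the setup is roughly this (just a sketch; nullability and
    index names are guesses, and Key is bracketed since it is a keyword):

    CREATE TABLE T1 ([Key] bigint NOT NULL, Data char(20) NULL)   -- ~61M rows
    CREATE TABLE T2 ([Key] bigint NOT NULL, Data char(20) NULL)   -- ~5M rows

    CREATE CLUSTERED INDEX cix_T1_Key ON T1 ([Key])
    CREATE CLUSTERED INDEX cix_T2_Key ON T2 ([Key])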

    I want to do:

    update T1 set Data = T2.Data
    from T2
    where T2.Key = T1.Key

    The goal is to match Key values, and only update the Data field of T1
    where they match. SQL Server seems to optimize this query fairly well,
    doing an inner merge join on the Key fields; however, it then does a
    hash match to get the Data fields, and this is taking FOREVER. It
    takes something like 40 minutes to do the above query, when it seems
    to me the data could be updated much more efficiently. I would expect
    to see just a merge and an update, like I would see in the following
    query:

    update T1 set Data = [someconstantdata]
    from T2
    where T2.Key = T1.Key and T2.Data = [someconstantdata]

    The above works VERY quickly, and if I were to perform the above query
    5 million times (assuming that my data in T2 is completely unique and I
    would need to), it would finish very quickly, much sooner than the
    previous query. Why won't SQL Server just match these up while it is
    merging the data and update in one step? Can I make it do this? If I
    extracted the data in sorted order into a flat file, I could write a
    program in ten minutes to merge the two tables and update in one step,
    and it would fly through this, but I imagine that SQL Server is capable
    of doing it and I am just missing it.

    Any advice would be GREATLY appreciated!
  • Erland Sommarskog

    #2
    Re: Update in SQL Server 2000 slow?

    Dan Berlin (dberlin@alum.rpi.edu) writes:
    > I have two tables:
    >
    > T1 : Key as bigint, Data as char(20) - size: 61M records
    > T2 : Key as bigint, Data as char(20) - size: 5M records
    >
    > T2 is the smaller, with 5 million records.
    >
    > They both have clustered indexes on Key.
    >
    > I want to do:
    >
    > update T1 set Data = T2.Data
    > from T2
    > where T2.Key = T1.Key
    >
    > The goal is to match Key values, and only update the Data field of T1
    > where they match. SQL Server seems to optimize this query fairly well,
    > doing an inner merge join on the Key fields; however, it then does a
    > hash match to get the Data fields, and this is taking FOREVER. It
    > takes something like 40 minutes to do the above query, when it seems
    > to me the data could be updated much more efficiently. I would expect
    > to see just a merge and an update, like I would see in the following
    > query:
    >
    > update T1 set Data = [someconstantdata]
    > from T2
    > where T2.Key = T1.Key and T2.Data = [someconstantdata]

    This query is quite different. Here SQL Server can scan T2, and for
    every row where Data has a matching value it can look up the key in T1.
    Since SQL Server has statistics about the data, it can tell how many
    hits the condition on T2.Data will get.

    In your first query you are not restricting T2, so you will have to
    scan all of it. A nested loop join would mean 5 million lookups in T1,
    which is probably not good. I would expect a merge join to be possible,
    but that is still a scan of both tables.

    First I would add the condition:

    WHERE (T1.Data <> T2.Data OR
    T1.Data IS NULL AND T2.Data IS NOT NULL OR
    T1.Data IS NOT NULL AND T2.Data IS NULL)

    so that you only update the rows that actually need updating.

    If there are plenty of other columns in the tables, I would add
    non-clustered indexes on (Key, Data) to both tables, since these
    indexes would cover the query.
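
    Spelled out, the whole thing might look like this (a sketch only; the
    index names are made up, and Key is bracketed since it is a keyword):

    -- Covering indexes, created up front (only useful if the tables have
    -- more columns than the two shown above)
    CREATE NONCLUSTERED INDEX ix_T1_Key_Data ON T1 ([Key], Data)
    CREATE NONCLUSTERED INDEX ix_T2_Key_Data ON T2 ([Key], Data)

    -- The update, restricted to rows where Data actually differs
    UPDATE T1
    SET    Data = T2.Data
    FROM   T1
    JOIN   T2 ON T2.[Key] = T1.[Key]
    WHERE  (T1.Data <> T2.Data OR
            T1.Data IS NULL AND T2.Data IS NOT NULL OR
            T1.Data IS NOT NULL AND T2.Data IS NULL)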

    --
    Erland Sommarskog, SQL Server MVP, sommar@algonet.se

    Books Online for SQL Server SP3


    • Dan Berlin

      #3
      Re: Update in SQL Server 2000 slow?

      Erland Sommarskog <sommar@algonet.se> wrote in message news:<Xns94E36714C325Yazorman@127.0.0.1>...
      > [...]
      >
      > First I would add the condition:
      >
      > WHERE (T1.Data <> T2.Data OR
      > T1.Data IS NULL AND T2.Data IS NOT NULL OR
      > T1.Data IS NOT NULL AND T2.Data IS NULL)
      >
      > so that you only update the rows that actually need updating.
      >
      > If there are plenty of other columns in the tables, I would add
      > non-clustered indexes on (Key, Data) to both tables, since these
      > indexes would cover the query.

      This was very helpful, thank you!

      However, there is still a large Hash Match/Aggregate being performed
      that requires 45% (for a T2 of 2.5M records) of the resources for the
      query. A complete table scan of the larger table accounts for 34% of
      the query, the merge join is 19%, and the Hash Match is 45%,
      effectively doubling the time the query takes to run. The larger my
      T2 table is, the longer the hash takes, on a scale that is increasing
      faster than linearly (exponential? not sure). The hash seems to be
      doing the following:

      HASH: bmk1000, RESIDUAL: (bmk1000=bmk1000) (T2.Data = ANY(T2.Data))

      This is from the query analyzer's estimated execution plan. Do you
      know how I can avoid this hash, or why it is necessary? It really,
      really slows down the query to an unacceptable level.
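
      (For reference, the plan text above is from the estimated plan; something
      along these lines reproduces it as text, and SET STATISTICS PROFILE ON
      instead gives actual per-operator row counts when the statement runs:)

      SET SHOWPLAN_TEXT ON    -- show the estimated plan as text, without executing
      GO
      -- the UPDATE statement being tuned goes here
      UPDATE T1 SET Data = T2.Data FROM T2 WHERE T2.[Key] = T1.[Key]
      GO
      SET SHOWPLAN_TEXT OFF
      GO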

      Thanks again for the help!
      Dan Berlin


      • Erland Sommarskog

        #4
        Re: Update in SQL Server 2000 slow?

        Dan Berlin (dberlin@alum.rpi.edu) writes:
        > However, there is still a large Hash Match/Aggregate being performed
        > that requires 45% (for a T2 of 2.5M records) of the resources for the
        > query. A complete table scan of the larger table accounts for 34% of
        > the query, the merge join is 19%, and the Hash Match is 45%,
        > effectively doubling the time the query takes to run. The larger my
        > T2 table is, the longer the hash takes, on a scale that is increasing
        > faster than linearly (exponential? not sure). The hash seems to be
        > doing the following:
        >
        > HASH: bmk1000, RESIDUAL: (bmk1000=bmk1000) (T2.Data = ANY(T2.Data))
        >
        > This is from the query analyzer's estimated execution plan. Do you
        > know how I can avoid this hash, or why it is necessary? It really,
        > really slows down the query to an unacceptable level.

        Again, without access to the tables, it is difficult to give very good
        suggestions. Query tuning is very much hands-on work.

        But if the hashing is a bottleneck, and is growing more than linearly,
        one idea is to try running the update in chunks, taking a reasonably
        sized interval of the key values at a time.
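
        Something along these lines, perhaps (just a sketch; the interval size
        is only a starting point to experiment with, and Key is bracketed since
        it is a keyword):

        DECLARE @lo bigint, @hi bigint, @step bigint
        SET @step = 500000                    -- chunk size; tune to taste
        SELECT @lo = MIN([Key]) FROM T2

        WHILE @lo IS NOT NULL
        BEGIN
           SET @hi = @lo + @step

           UPDATE T1
           SET    Data = T2.Data
           FROM   T1
           JOIN   T2 ON T2.[Key] = T1.[Key]
           WHERE  T1.[Key] >= @lo AND T1.[Key] < @hi
             AND  (T1.Data <> T2.Data OR
                   T1.Data IS NULL AND T2.Data IS NOT NULL OR
                   T1.Data IS NOT NULL AND T2.Data IS NULL)

           -- Next interval; MIN() returns NULL once no keys remain, ending the loop.
           SELECT @lo = MIN([Key]) FROM T2 WHERE [Key] >= @hi
        END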

        The hashing is on Data, I would guess to locate the rows that need
        updating. Hashing is probably better than a nested-loop join.

        Could you post:

        o CREATE TABLE and CREATE INDEX statements for your tables?
        o The query as it looks now?
        o The query plan you get?

        This would leave me a little less in the dark.


        --
        Erland Sommarskog, SQL Server MVP, sommar@algonet.se

        Books Online for SQL Server SP3
