I have two tables:
T1 : Key as bigint, Data as char(20) - size: 61M records
T2 : Key as bigint, Data as char(20) - size: 5M records
T2 is the smaller, with 5 million records.
They both have clustered indexes on Key.
I want to do:
update T1 set Data = T2.Data
from T2
where T2.Key = T1.Key
The goal is to match Key values, and only update the data field of T1
if they match. SQL server seems to optimize this query fairly well,
doing an inner merge join on the Key fields, however, it then does a
Hash match to get the data fields and this is taking FOREVER. It
takes something like 40 mins to do the above query, where it seems to
me, the data could be updated much more efficiently. I would expect
to see just a merge and update, like I would see in the following
query:
update T1 set Data = [someconstantdat a]
from T2
where T2.Key = T1.Key and T2.Data = [someconstantdat a]
The above works VERY quickly, and if I were to perform the above query
5 mil times(assuming that my data is completely unique in T2 and I
would need to) it would finish very quickly, much sooner than the
previous query. Why won't SQL server just match these up while it is
merging the data and update in one step? Can I make it do this? If I
extracted the data in sorted order into a flat file, I could write a
program in ten minutes to merge the two tables, and update in one
step, and it would fly through this, but I imagine that SQL server is
capable of doing it, and I am just missing it.
Any advice would be GREATLY appreciated!
T1 : Key as bigint, Data as char(20) - size: 61M records
T2 : Key as bigint, Data as char(20) - size: 5M records
T2 is the smaller, with 5 million records.
They both have clustered indexes on Key.
I want to do:
update T1 set Data = T2.Data
from T2
where T2.Key = T1.Key
The goal is to match Key values, and only update the data field of T1
if they match. SQL server seems to optimize this query fairly well,
doing an inner merge join on the Key fields, however, it then does a
Hash match to get the data fields and this is taking FOREVER. It
takes something like 40 mins to do the above query, where it seems to
me, the data could be updated much more efficiently. I would expect
to see just a merge and update, like I would see in the following
query:
update T1 set Data = [someconstantdat a]
from T2
where T2.Key = T1.Key and T2.Data = [someconstantdat a]
The above works VERY quickly, and if I were to perform the above query
5 mil times(assuming that my data is completely unique in T2 and I
would need to) it would finish very quickly, much sooner than the
previous query. Why won't SQL server just match these up while it is
merging the data and update in one step? Can I make it do this? If I
extracted the data in sorted order into a flat file, I could write a
program in ten minutes to merge the two tables, and update in one
step, and it would fly through this, but I imagine that SQL server is
capable of doing it, and I am just missing it.
Any advice would be GREATLY appreciated!
Comment