Hi,
I was just helping a coworker optimize a query. He had two versions:
one which used UNION for each value for which he was tallying results
and another query which used GROUP BY. Here is an aproximation of what
they were:
Query #1:
---------
SELECT 12 AS [Row],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN @week_start_dat e AND
DATEADD(d, 1, @week_start_dat e) THEN 1 ELSE 0 END), 0) AS [Monday],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN DATEADD(d, 1,
@week_start_dat e) AND DATEADD(d, 2, @week_start_dat e) THEN 1 ELSE 0
END), 0) AS [Tuesday]
FROM My_Table T
INNER JOIN Another_Table T2 ON T2.col1 = T.col1
WHERE T.my_date BETWEEN @week_start_dat e AND @week_end_date
AND T.col2 = 5
UNION
SELECT 13 AS [Row],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN @week_start_dat e AND
DATEADD(d, 1, @week_start_dat e) THEN 1 ELSE 0 END), 0) AS [Monday],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN DATEADD(d, 1,
@week_start_dat e) AND DATEADD(d, 2, @week_start_dat e) THEN 1 ELSE 0
END), 0) AS [Tuesday]
FROM My_Table T
INNER JOIN Another_Table T2 ON T2.col1 = T.col1
WHERE T.my_date BETWEEN @week_start_dat e AND @week_end_date
AND T.col2 = 6
Query #2:
---------
SELECT R.row_num AS [Row],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN @week_start_dat e AND
DATEADD(d, 1, @week_start_dat e) THEN 1 ELSE 0 END), 0) AS [Monday],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN DATEADD(d, 1,
@week_start_dat e) AND DATEADD(d, 2, @week_start_dat e) THEN 1 ELSE 0
END), 0) AS [Tuesday]
FROM My_Table T
INNER JOIN Another_Table T2 ON T2.col1 = T.col1
INNER JOIN Report_Rows R ON R.col2 = T.col2
WHERE T.my_date BETWEEN @week_start_dat e AND @week_end_date
GROUP BY ALL R.row_num
ORDER BY R.row_num
The Report_Rows table in this case would have had two rows mapping row
12 to a column value of 5 and row 13 to a column value of 6. The
second query was performing horribly until I noticed the ALL keyword
in the GROUP BY, which I didn't think was necessary. When I removed
that it performed more like I expected it to perform.
Before I had noticed that I was scouring over the query plans and
couldn't figure out why in one instance the query optimizer chose to
join My_Table and Another_Table, yet when the ALL keyword was there it
chose to return all of the records from Another_Table (a rather large
table) and join it to the Report_Rows table before then joining to
My_Table, which had the date criteria in the WHERE clause.
So, if you've read this far without giving up...
1. Why would the ALL keyword cause this? I understand the
functionality of ALL, but I still don't see why that caused the
reordering of the joins.
2. (more importantly) Are there any good resources that you know of
that explain how the query optimizer choices its query paths? Do the
"Inside SQL Server" books go into that much detail? Any good online
resources?
Thanks!
-Tom.
I was just helping a coworker optimize a query. He had two versions:
one which used UNION for each value for which he was tallying results
and another query which used GROUP BY. Here is an aproximation of what
they were:
Query #1:
---------
SELECT 12 AS [Row],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN @week_start_dat e AND
DATEADD(d, 1, @week_start_dat e) THEN 1 ELSE 0 END), 0) AS [Monday],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN DATEADD(d, 1,
@week_start_dat e) AND DATEADD(d, 2, @week_start_dat e) THEN 1 ELSE 0
END), 0) AS [Tuesday]
FROM My_Table T
INNER JOIN Another_Table T2 ON T2.col1 = T.col1
WHERE T.my_date BETWEEN @week_start_dat e AND @week_end_date
AND T.col2 = 5
UNION
SELECT 13 AS [Row],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN @week_start_dat e AND
DATEADD(d, 1, @week_start_dat e) THEN 1 ELSE 0 END), 0) AS [Monday],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN DATEADD(d, 1,
@week_start_dat e) AND DATEADD(d, 2, @week_start_dat e) THEN 1 ELSE 0
END), 0) AS [Tuesday]
FROM My_Table T
INNER JOIN Another_Table T2 ON T2.col1 = T.col1
WHERE T.my_date BETWEEN @week_start_dat e AND @week_end_date
AND T.col2 = 6
Query #2:
---------
SELECT R.row_num AS [Row],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN @week_start_dat e AND
DATEADD(d, 1, @week_start_dat e) THEN 1 ELSE 0 END), 0) AS [Monday],
ISNULL(SUM(CASE WHEN T.my_date BETWEEN DATEADD(d, 1,
@week_start_dat e) AND DATEADD(d, 2, @week_start_dat e) THEN 1 ELSE 0
END), 0) AS [Tuesday]
FROM My_Table T
INNER JOIN Another_Table T2 ON T2.col1 = T.col1
INNER JOIN Report_Rows R ON R.col2 = T.col2
WHERE T.my_date BETWEEN @week_start_dat e AND @week_end_date
GROUP BY ALL R.row_num
ORDER BY R.row_num
The Report_Rows table in this case would have had two rows mapping row
12 to a column value of 5 and row 13 to a column value of 6. The
second query was performing horribly until I noticed the ALL keyword
in the GROUP BY, which I didn't think was necessary. When I removed
that it performed more like I expected it to perform.
Before I had noticed that I was scouring over the query plans and
couldn't figure out why in one instance the query optimizer chose to
join My_Table and Another_Table, yet when the ALL keyword was there it
chose to return all of the records from Another_Table (a rather large
table) and join it to the Report_Rows table before then joining to
My_Table, which had the date criteria in the WHERE clause.
So, if you've read this far without giving up...
1. Why would the ALL keyword cause this? I understand the
functionality of ALL, but I still don't see why that caused the
reordering of the joins.
2. (more importantly) Are there any good resources that you know of
that explain how the query optimizer choices its query paths? Do the
"Inside SQL Server" books go into that much detail? Any good online
resources?
Thanks!
-Tom.
Comment