Convert correlated subqery to join?

**NeoPa** · Sep 15 '10, 12:33 PM

Perhaps it may be easier to understand what you're looking for if you described it in words.

**FLANDERS** · Sep 15 '10, 12:51 PM

OK, I used sample tables/columns in first example. There scenario is I have a table of item changes over time, basically ID, priceChange and a calculated row number. Some items may not change in a particular month and so I have a situation where I end up with a table like

Code:

[B]ID    priceChange    Month       rn[/B]
1     100      March 2010  1
1     null     June 2010   2
1     200      July 2010   3
2     520      March 2010  1 
2     null     June 2010   2

I want to fill in the null amount for each item by selecting the previous "real" item price for that particular item. Hence I want to end up with

Code:

[B]ID    priceChange    Month    rn[/B]
1     100      March 2010  1
1     100      June 2010   2
1     200      July 2010   3
2     520      March 2010  1 
2     520      June 2010   2

To do this, I do a join based on the item id being the same and select the record with largest rn value that has same ID.

Code:

select 
a.id,
coalesce(a.priceChange, b.priceChange) as priceChange
FROM items a, items b
where a.id = b.id AND b.rn = (   
    select max(rn) from items where id = a.id and rn <= a.rn
)

It works fine, just a bit slow, due to the correlated subquery. I have seen some examples of how these can be replaced with JOINs but I cant get it to fit my problem

Is that any clearer?

**NeoPa** · Sep 15 '10, 01:16 PM

Yes. Certainly.

Try this :

Code:

SELECT [ID]
     , COALESCE(tI.Price, sI.PrevPrice) AS [Price]
     , [Month]
     , [RN]
FROM   [Items] AS tI
       LEFT OUTER JOIN
       (
    SELECT   [Price] AS [PrevPrice]
    FROM     [Items]
             INNER JOIN
             (
        SELECT   [ID]
               , MAX([RN]) AS MaxRN
        FROM     [Items]
        GROUP BY [ID]
        ) AS sIi
      ON     tI.ID = sIi.ID
     AND     tI.RN > sIi.RN
    ) AS sI
  ON   tI.ID = sI.ID

First (logically - not in the SQL) find the RN of the previous record (The sub-sub-query sIi), then find the [Price] that matches that record (The sub-qury sI). Lastly, match this record (if found - hence LEFT OUTER JOIN) with the original and coalesce in the normal way.

Let us know how this works out for you.

Welcome to Bytes!

**FLANDERS** · Sep 15 '10, 01:56 PM

That doesnt run. The ON clause of the INNER JOIN is incorrect it appears:

Code:

ON tI.ID = sIi.ID AND tI.RN > sIi.RN

, since you are referring to an alias (tI) that isnt available at that point:
The multi-part identifier "tI.ID" could not be bound

**NeoPa** · Sep 15 '10, 04:41 PM

I'll try for a deeper look at this tomorrow. Sorry to get it wrong (at least so my initial tests would indicate).

Having most of my SQL experience in Jet (Access) I do sometimes lead people amiss (unintentionall y of course). Preliminary tests indicate that T-SQL doesn't handle this as well as Jet does, which is something of a surprise. Normally the reverse is more likely. It may just be that I have the vocabulary wrong and it is supported but just using a different approach.

**FLANDERS** · Sep 15 '10, 06:14 PM

Jeez dont be apologising, any help is great!

**ck9663** · Sep 15 '10, 06:44 PM

Tweak this for now while am working on your sample data...

Good Luck!!!

~~ CK

**ck9663** · Sep 15 '10, 07:08 PM

Here you go...

Your data...

Code:

declare @YourTable as table
(
   ID int,
   priceChange money,
   cMonth varchar(50),
   rn int
)
insert into @YourTable
select 1,100, 'March 2010', 1
union all
select 1,null,'June 2010',    2
union all
select 1,200, 'July 2010',    3
union all
select 2,520, 'March 2010',   1 
union all
select 2,null,'June 2010',    2

Your SQL

Code:

select y1.*, LastValidPrice = isnull(y1.pricechange,y2.pricechange)
from @YourTable y1
   left join @YourTable y2 on y1.id = y2.id and y1.rn = y2.rn + 1

Happy Coding!!!

~~ CK

**FLANDERS** · Sep 15 '10, 09:31 PM

Hey,
Thanks for the input. However, I think my sample data wasnt quite complete. Its not always valid to use
y1.rn = y2.rn + 1, the previous "real" value may be a few rows back. This is why I used the max() function, rather than what you did. A complete sample data would be

Code:

[B]ID    priceChange    Month       rn[/B]
1     100      March 2010         1
1     null     April  2010        2
1     null     May 2010           3
1     null     June 2010          4
1     200      July 2010          5
2     520      March 2010         1 
2     null     April  2010        2
2     null     May 2010           3
2     300      June 2010          4

In this case the value to be filled in for item 1 in April, May and June would come from March i.e. 100. Therefore I cant rely on it being always the previous row.

Any further ideas?

**FLANDERS** · Sep 15 '10, 11:27 PM

From tweaking your link, the following seems to work as required on the "updated" sample date:

Code:

select id, priceChange, 
coalesce((
   select priceChange 
   from YourTable b 
   where a.rn > b.rn and a.id = b.id and b.priceChange is not null and a.priceChange is null order by b.rn desc limit 1 ), 
   a.priceChange
) tmp, 
cMonth
from YourTable a order by id, rn

Is this also not a correlated subquery, which will have similar performance issues?

**FLANDERS** · Sep 16 '10, 03:23 PM

OK I think I have this sorted now. The below query has gone from 17 minutes to 22 seconds on my machine:

Code:

-- Fills in the blanks of the items table by joining each item change record
-- to the previous valid record for the same item, to allow retrival of last "real" price change
select * from (	
	select 
	a.id,
	coalesce(a.priceChange, b.priceChange) as priceChange,
ROW_NUMBER() OVER (PARTITION BY a.id, a.aMonth ORDER BY b.rn desc) as rn1  	
	FROM items a, items b
	where 
	a.id = b.id AND 	-- Only want records that relate to same item
	b.rn <= a.rn and    -- Previous or same (ensures we get items with no change at all) records only 
	b.priceChange IS NOT NULL   -- Previous record must be a "real" record
) tt where rn1 = 1 -- Get the latest "real" record that relates to same item. Latest is determined by the
-- row number column being 1 (since we ordered the partition DESC)

The correlated subquery has been replaced by the use of the analytical ROW_NUMBER() function. I do a self join to get all items before the "current" item, where there is a valid priceChange. The use of <= ensures that if there is no price change at all for a particular item, it will be joined to itself.
As part of the column selection, I assign each record a row number by partitioning the result set on the item id and month, ordering it so that the latest change record will be given a value of 1. This makes it possible to select the latest value using rn1 = 1 rather than selecting the max value.

There has been a huge performance increase using this strategy. It again confirms that correlated subqueries should be looked at if there is a query performance issue.

Thanks for your help, hopefully someone else will benefit from this in the future.

Convert correlated subqery to join?

Convert correlated subqery to join?

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment