Convert correlated subqery to join?

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • FLANDERS
    New Member
    • Jul 2008
    • 8

    Convert correlated subqery to join?

    I have a query that does a self join following by correlated subquery. I want to be able to select from "this row" or the latest row before "this row", based on the "rn" column, where column1 is populated. Its currently working using a correlated subquery as follows:

    Code:
    select 
    a.id,
    coalesce(a.typeName, b.typeName) as typeName
    FROM table1 a, table1 b
    where a.id = b.id AND b.rn = (   
        select max(rn) from table1 where id = a.id and rn <= a.rn and column1 is not null
    )
    Its a bit slow though since its needs to execute the inner query for each row in the outer query. I have seen examples of converting this to a join but cant seem to get it to work, as I always need to reference the outer query for

    Code:
    rn <= a.rn
    part.

    Can anyone help converting this to a non-correlated query?

    thanks
  • NeoPa
    Recognized Expert Moderator MVP
    • Oct 2006
    • 32636

    #2
    Perhaps it may be easier to understand what you're looking for if you described it in words.

    Comment

    • FLANDERS
      New Member
      • Jul 2008
      • 8

      #3
      OK, I used sample tables/columns in first example. There scenario is I have a table of item changes over time, basically ID, priceChange and a calculated row number. Some items may not change in a particular month and so I have a situation where I end up with a table like
      Code:
      [B]ID    priceChange    Month       rn[/B]
      1     100      March 2010  1
      1     null     June 2010   2
      1     200      July 2010   3
      2     520      March 2010  1 
      2     null     June 2010   2
      I want to fill in the null amount for each item by selecting the previous "real" item price for that particular item. Hence I want to end up with

      Code:
      [B]ID    priceChange    Month    rn[/B]
      1     100      March 2010  1
      1     100      June 2010   2
      1     200      July 2010   3
      2     520      March 2010  1 
      2     520      June 2010   2
      To do this, I do a join based on the item id being the same and select the record with largest rn value that has same ID.

      Code:
      select 
      a.id,
      coalesce(a.priceChange, b.priceChange) as priceChange
      FROM items a, items b
      where a.id = b.id AND b.rn = (   
          select max(rn) from items where id = a.id and rn <= a.rn
      )
      It works fine, just a bit slow, due to the correlated subquery. I have seen some examples of how these can be replaced with JOINs but I cant get it to fit my problem

      Is that any clearer?

      Comment

      • NeoPa
        Recognized Expert Moderator MVP
        • Oct 2006
        • 32636

        #4
        Yes. Certainly.

        Try this :

        Code:
        SELECT [ID]
             , COALESCE(tI.Price, sI.PrevPrice) AS [Price]
             , [Month]
             , [RN]
        FROM   [Items] AS tI
               LEFT OUTER JOIN
               (
            SELECT   [Price] AS [PrevPrice]
            FROM     [Items]
                     INNER JOIN
                     (
                SELECT   [ID]
                       , MAX([RN]) AS MaxRN
                FROM     [Items]
                GROUP BY [ID]
                ) AS sIi
              ON     tI.ID = sIi.ID
             AND     tI.RN > sIi.RN
            ) AS sI
          ON   tI.ID = sI.ID
        First (logically - not in the SQL) find the RN of the previous record (The sub-sub-query sIi), then find the [Price] that matches that record (The sub-qury sI). Lastly, match this record (if found - hence LEFT OUTER JOIN) with the original and coalesce in the normal way.

        Let us know how this works out for you.

        Welcome to Bytes!

        Comment

        • FLANDERS
          New Member
          • Jul 2008
          • 8

          #5
          That doesnt run. The ON clause of the INNER JOIN is incorrect it appears:
          Code:
          ON tI.ID = sIi.ID AND tI.RN > sIi.RN
          , since you are referring to an alias (tI) that isnt available at that point:
          The multi-part identifier "tI.ID" could not be bound

          Comment

          • NeoPa
            Recognized Expert Moderator MVP
            • Oct 2006
            • 32636

            #6
            I'll try for a deeper look at this tomorrow. Sorry to get it wrong (at least so my initial tests would indicate).

            Having most of my SQL experience in Jet (Access) I do sometimes lead people amiss (unintentionall y of course). Preliminary tests indicate that T-SQL doesn't handle this as well as Jet does, which is something of a surprise. Normally the reverse is more likely. It may just be that I have the vocabulary wrong and it is supported but just using a different approach.

            Comment

            • FLANDERS
              New Member
              • Jul 2008
              • 8

              #7
              Jeez dont be apologising, any help is great!

              Comment

              • ck9663
                Recognized Expert Specialist
                • Jun 2007
                • 2878

                #8
                Tweak this for now while am working on your sample data...

                Good Luck!!!

                ~~ CK

                Comment

                • ck9663
                  Recognized Expert Specialist
                  • Jun 2007
                  • 2878

                  #9
                  Here you go...

                  Your data...

                  Code:
                  declare @YourTable as table
                  (
                     ID int,
                     priceChange money,
                     cMonth varchar(50),
                     rn int
                  )
                  insert into @YourTable
                  select 1,100, 'March 2010', 1
                  union all
                  select 1,null,'June 2010',    2
                  union all
                  select 1,200, 'July 2010',    3
                  union all
                  select 2,520, 'March 2010',   1 
                  union all
                  select 2,null,'June 2010',    2
                  Your SQL

                  Code:
                  select y1.*, LastValidPrice = isnull(y1.pricechange,y2.pricechange)
                  from @YourTable y1
                     left join @YourTable y2 on y1.id = y2.id and y1.rn = y2.rn + 1
                  Happy Coding!!!

                  ~~ CK

                  Comment

                  • FLANDERS
                    New Member
                    • Jul 2008
                    • 8

                    #10
                    Hey,
                    Thanks for the input. However, I think my sample data wasnt quite complete. Its not always valid to use
                    y1.rn = y2.rn + 1, the previous "real" value may be a few rows back. This is why I used the max() function, rather than what you did. A complete sample data would be

                    Code:
                    [B]ID    priceChange    Month       rn[/B]
                    1     100      March 2010         1
                    1     null     April  2010        2
                    1     null     May 2010           3
                    1     null     June 2010          4
                    1     200      July 2010          5
                    2     520      March 2010         1 
                    2     null     April  2010        2
                    2     null     May 2010           3
                    2     300      June 2010          4
                    In this case the value to be filled in for item 1 in April, May and June would come from March i.e. 100. Therefore I cant rely on it being always the previous row.

                    Any further ideas?

                    Comment

                    • FLANDERS
                      New Member
                      • Jul 2008
                      • 8

                      #11
                      From tweaking your link, the following seems to work as required on the "updated" sample date:

                      Code:
                      select id, priceChange, 
                      coalesce((
                         select priceChange 
                         from YourTable b 
                         where a.rn > b.rn and a.id = b.id and b.priceChange is not null and a.priceChange is null order by b.rn desc limit 1 ), 
                         a.priceChange
                      ) tmp, 
                      cMonth
                      from YourTable a order by id, rn
                      Is this also not a correlated subquery, which will have similar performance issues?

                      Comment

                      • FLANDERS
                        New Member
                        • Jul 2008
                        • 8

                        #12
                        OK I think I have this sorted now. The below query has gone from 17 minutes to 22 seconds on my machine:
                        Code:
                        -- Fills in the blanks of the items table by joining each item change record
                        -- to the previous valid record for the same item, to allow retrival of last "real" price change
                        select * from (	
                        	select 
                        	a.id,
                        	coalesce(a.priceChange, b.priceChange) as priceChange,
                        ROW_NUMBER() OVER (PARTITION BY a.id, a.aMonth ORDER BY b.rn desc) as rn1  	
                        	FROM items a, items b
                        	where 
                        	a.id = b.id AND 	-- Only want records that relate to same item
                        	b.rn <= a.rn and    -- Previous or same (ensures we get items with no change at all) records only 
                        	b.priceChange IS NOT NULL   -- Previous record must be a "real" record
                        ) tt where rn1 = 1 -- Get the latest "real" record that relates to same item. Latest is determined by the
                        -- row number column being 1 (since we ordered the partition DESC)
                        The correlated subquery has been replaced by the use of the analytical ROW_NUMBER() function. I do a self join to get all items before the "current" item, where there is a valid priceChange. The use of <= ensures that if there is no price change at all for a particular item, it will be joined to itself.
                        As part of the column selection, I assign each record a row number by partitioning the result set on the item id and month, ordering it so that the latest change record will be given a value of 1. This makes it possible to select the latest value using rn1 = 1 rather than selecting the max value.

                        There has been a huge performance increase using this strategy. It again confirms that correlated subqueries should be looked at if there is a query performance issue.

                        Thanks for your help, hopefully someone else will benefit from this in the future.

                        Comment

                        Working...