[How to] avoid cross product/Cartesian product to improve performance

  • thelightkeeper
    New Member
    • Oct 2007
    • 2

    [How to] avoid cross product/Cartesian product to improve performance

    Hi,

    I have a table containing about 4 million entries, structured like below:

    [Alarm History]
    (
    AlarmID int,
    SetTime datetime
    )

    Now I want to:
    SELECT all the AlarmID values that occurred during Jan 2008 with no such AlarmID during Dec 2007. I used:

    SELECT * FROM [Alarm History]
    WHERE SetTime BETWEEN '1-jan-2008' AND '31-jan-2008'
    AND AlarmID NOT IN

    (
    SELECT AlarmID FROM [Alarm History]
    WHERE SetTime BETWEEN '1-dec-2007' AND '31-DEC-2007'
    )

    My query runs very slowly, since there are more than 20,000 entries during Dec 2007 and 15,000 entries during Jan 2008.

    The "NOT IN" operation is like a Cartesian product: it compares each AlarmID during December 2007 with each AlarmID during January 2008. The result is very slow performance.

    Can anybody help me find an alternative way to do this? Thanks a lot.
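    One alternative worth trying here is NOT EXISTS with a correlated subquery; SQL Server can often plan this as an anti-semi-join rather than a row-by-row comparison. A sketch against the [Alarm History] table defined above:

    ```sql
    -- Sketch: NOT EXISTS version of the same question,
    -- using the [Alarm History] table from the post above.
    SELECT Jan.*
    FROM [Alarm History] Jan
    WHERE Jan.SetTime BETWEEN '1-jan-2008' AND '31-jan-2008'
      AND NOT EXISTS (
          SELECT 1 FROM [Alarm History] Dec
          WHERE Dec.AlarmID = Jan.AlarmID
            AND Dec.SetTime BETWEEN '1-dec-2007' AND '31-dec-2007'
      )
    ```

    Whether this beats NOT IN depends on the optimizer and on indexing, so it is worth checking the execution plan for both.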
  • code green
    Recognized Expert Top Contributor
    • Mar 2007
    • 1726

    #2
    It is probably the sub-query that slows it down.
    There is an equivalent JOIN that looks for NULL instead of NOT IN:
    Code:
    SELECT Jan.* FROM [Alarm History] Jan
    LEFT JOIN [Alarm History] Dec ON (Jan.AlarmID = Dec.AlarmID
    AND Dec.SetTime BETWEEN '1-dec-2007' AND '31-dec-2007')
    WHERE Jan.SetTime BETWEEN '1-jan-2008' AND '31-jan-2008'
    AND Dec.AlarmID IS NULL
    I think


    • Delerna
      Recognized Expert Top Contributor
      • Jan 2008
      • 1134

      #3
      If you want to compare the effectiveness of two different methods of writing a query, then:

      Open Query Analyzer and paste the two queries into the query window.

      Then click the "Display Estimated Execution Plan" button; this will display a graphical representation of how each query will execute.

      I did that for the two queries here, changing table and field names to suit one of my tables.

      Sorry to say that, at least in the check that I did, code green's method was three times slower: 28% (thelightkeeper's method) to 72% (code green's method).
      That may be different when you try it on your own table.

      Also, the execution plan will show you which parts of your query are spending the most time, and therefore where you might be able to improve performance.
      Look for loops and high I/O costs when looking for performance bottlenecks.

      All the above is in relation to MS SQL Server. I guess other database engines have something similar.


      • code green
        Recognized Expert Top Contributor
        • Mar 2007
        • 1726

        #4
        Wow! Three times slower. That surprised me.
        What is really slowing the query down is the date comparison.
        I have tried similar queries in both formats on my online server (1and1).
        They both timed out.
        I was able to get around it because my table used auto-ids.
        I then used SQL variables to get the IDs of the minimum and maximum dates.
        Then did a SELECT comparing IDs rather than DATE.
        Very fast.
        Code:
        -- Get the IDs of the earliest and latest dates
        SELECT @earliest := MIN(AlarmID) FROM [Alarm History]
        WHERE SetTime >= '1-jan-2008';
        SELECT @latest := MAX(AlarmID) FROM [Alarm History]
        WHERE SetTime <= '31-jan-2008';

        -- Use these to filter the main query
        SELECT AlarmID FROM [Alarm History]
        WHERE AlarmID >= @earliest AND AlarmID <= @latest
        AND ....
        I know this is MySQL, but could you adapt this idea to your table?
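        For the SQL Server table in the original question, the same idea might look like the following. This is only a sketch: it assumes AlarmID is an identity column assigned in SetTime order, otherwise the ID range does not correspond to the date range.

        ```sql
        -- Sketch: T-SQL adaptation of the variable trick above.
        -- Assumes AlarmID increases monotonically with SetTime.
        DECLARE @earliest int, @latest int

        SELECT @earliest = MIN(AlarmID) FROM [Alarm History]
        WHERE SetTime >= '1-jan-2008'

        SELECT @latest = MAX(AlarmID) FROM [Alarm History]
        WHERE SetTime <= '31-jan-2008'

        -- Filter the main query on the ID range instead of the dates
        SELECT AlarmID FROM [Alarm History]
        WHERE AlarmID BETWEEN @earliest AND @latest
        ```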


        • Delerna
          Recognized Expert Top Contributor
          • Jan 2008
          • 1134

          #5
          Yeah, it surprised me too, as I also thought that the subquery was the problem, but it seems SQL Server does a pretty good job of executing it. I also tried a method of my own and was beaten 48% to 52%. I have not personally used a "WHERE NOT IN" query myself, but I think I will be taking a closer look at it in the future.

          One thing that may help the speed of this query is indexes. I have seen slow queries that had nothing wrong with the way they were written; the sheer number of records was the cause. Well-thought-out indexes took those queries from minutes to a few seconds.
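          For the table in this thread, such indexes might look like the following. A sketch only: the index names are made up, and the best column order depends on the actual query workload.

          ```sql
          -- Hypothetical indexes for the queries in this thread:
          -- one to seek on the date range, one to probe by AlarmID.
          CREATE INDEX IX_AlarmHistory_SetTime
              ON [Alarm History] (SetTime) INCLUDE (AlarmID)

          CREATE INDEX IX_AlarmHistory_AlarmID
              ON [Alarm History] (AlarmID, SetTime)
          ```

          The first index lets the date filter become a range seek that covers the query; the second supports the join or NOT IN probe on AlarmID.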


          • ozone702
            New Member
            • Jul 2013
            • 1

            #6
            Use a LEFT OUTER JOIN to your "not in" table, GROUP BY some columns in table a and at least one column in table b (the "not in" table), HAVING b.some_column IS NULL.

            This is much more efficient than a NOT IN statement.

            i.e.

            Code:
            select a.col1, a.col2
            from tablea as a
            left outer join tableb as b
              on b.fkey_id = a.id
            group by a.col1, a.col2, b.some_column
            having b.some_column is null
            Last edited by Rabbit; Jul 16 '13, 03:44 PM. Reason: Please use code tags when posting code.

