An optimum query..?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • dsdevonsomer@gmail.com

    An optimum query..?

    Hello friends,
    I have one simple question. I have two tables. 1 ( Table A ) has about
    2.5 million rows and second one ( Table B ) has about 1 million. There
    are common ID fields in both tables. I want join them on ID field and
    get all rows of Table A which are not in there Table B.

    When I ran following two queries, I got same result set, but time it
    took was very different.

    Following query took 1:35 minutes
    SELECT Tbl1.UID, Tbl1.[LAST NAME], Tbl1.[FIRST NAME], Tbl1.[HOUSE
    NUMBER], Tbl1.ADDRESS, Tbl1.CITY, Tbl1.STATE
    FROM [Table A] Tbl1
    WHERE NOT exists (
    SELECT 1 from [Table B] Tbl2 WHERE
    Tbl1.UID = Tbl2.UID )

    vs this one took .45 seconds.

    SELECT Tbl1.UID, Tbl1.[LAST NAME], Tbl1.[FIRST NAME], Tbl1.[HOUSE
    NUMBER], Tbl1.ADDRESS, Tbl1.CITY, Tbl1.STATE
    FROM [Table A] Tbl1 LEFT OUTER JOIN [Table B] Tbl2 WHERE ON Tbl1.UID =
    Tbl2.UID
    WHERE Tbl2.UID IS NULL

    Which option is better ? I have subsequent joins to another table
    which has about 2 mil more rows and trying to optimize the response
    time.

    I appreciate all help from the community.
    JB
  • Dan Guzman

    #2
    Re: An optimum query..?

    Which option is better ? I have subsequent joins to another table
    which has about 2 mil more rows and trying to optimize the response
    time.
    I would have expected both the NOT EXISTS and the LEFT JOIN...WHERE..I S NULL
    to yield the same plan and performance. However, the second query you
    posted is invalid (has a WHERE in the JOIN clause). Can you post the actual
    table DDL and queries?

    --
    Hope this helps.

    Dan Guzman
    SQL Server MVP


    <dsdevonsomer@g mail.comwrote in message
    news:195e21cb-f77d-48ef-b040-bbb4c403750d@m7 3g2000hsh.googl egroups.com...
    Hello friends,
    I have one simple question. I have two tables. 1 ( Table A ) has about
    2.5 million rows and second one ( Table B ) has about 1 million. There
    are common ID fields in both tables. I want join them on ID field and
    get all rows of Table A which are not in there Table B.
    >
    When I ran following two queries, I got same result set, but time it
    took was very different.
    >
    Following query took 1:35 minutes
    SELECT Tbl1.UID, Tbl1.[LAST NAME], Tbl1.[FIRST NAME], Tbl1.[HOUSE
    NUMBER], Tbl1.ADDRESS, Tbl1.CITY, Tbl1.STATE
    FROM [Table A] Tbl1
    WHERE NOT exists (
    SELECT 1 from [Table B] Tbl2 WHERE
    Tbl1.UID = Tbl2.UID )
    >
    vs this one took .45 seconds.
    >
    SELECT Tbl1.UID, Tbl1.[LAST NAME], Tbl1.[FIRST NAME], Tbl1.[HOUSE
    NUMBER], Tbl1.ADDRESS, Tbl1.CITY, Tbl1.STATE
    FROM [Table A] Tbl1 LEFT OUTER JOIN [Table B] Tbl2 WHERE ON Tbl1.UID =
    Tbl2.UID
    WHERE Tbl2.UID IS NULL
    >
    Which option is better ? I have subsequent joins to another table
    which has about 2 mil more rows and trying to optimize the response
    time.
    >
    I appreciate all help from the community.
    JB

    Comment

    • dsdevonsomer@gmail.com

      #3
      Re: An optimum query..?

      On Apr 2, 10:21 pm, "Dan Guzman" <guzma...@nospa m-
      online.sbcgloba l.netwrote:
      Which option is better ? I have subsequent joins to another table
      which has about 2 mil more rows and trying to optimize the response
      time.
      >
      I would have expected both the NOT EXISTS and the LEFT JOIN...WHERE..I S NULL
      to yield the same plan and performance.  However, the second query you
      posted is invalid (has a WHERE in the JOIN clause).  Can you post the actual
      table DDL and queries?
      >
      --
      Hope this helps.
      >
      Dan Guzman
      SQL Server MVPhttp://weblogs.sqlteam .com/dang/
      >
      <dsdevonso...@g mail.comwrote in message
      >
      news:195e21cb-f77d-48ef-b040-bbb4c403750d@m7 3g2000hsh.googl egroups.com...
      >
      >
      >
      Hello friends,
      I have one simple question. I have two tables. 1 ( Table A ) has about
      2.5 million rows and second one ( Table B ) has about 1 million. There
      are common ID fields in both tables. I want join them on ID field and
      get all rows of Table A which are not in there Table B.
      >
      When I ran following two queries, I got same result set, but time it
      took was very different.
      >
      Following query took 1:35 minutes
      SELECT Tbl1.UID, Tbl1.[LAST NAME], Tbl1.[FIRST NAME], Tbl1.[HOUSE
      NUMBER], Tbl1.ADDRESS, Tbl1.CITY, Tbl1.STATE
      FROM [Table A] Tbl1
      WHERE NOT exists (
      SELECT  1 from [Table B] Tbl2 WHERE
      Tbl1.UID = Tbl2.UID )
      >
      vs this one took .45 seconds.
      >
      SELECT Tbl1.UID, Tbl1.[LAST NAME], Tbl1.[FIRST NAME], Tbl1.[HOUSE
      NUMBER], Tbl1.ADDRESS, Tbl1.CITY, Tbl1.STATE
      FROM [Table A] Tbl1 LEFT OUTER JOIN [Table B] Tbl2 WHERE ON Tbl1.UID =
      Tbl2.UID
      WHERE Tbl2.UID IS NULL
      >
      Which option is better ? I have subsequent joins to another table
      which has about 2 mil more rows and trying to optimize the response
      time.
      >
      I appreciate all help from the community.
      JB- Hide quoted text -
      >
      - Show quoted text -
      Oh,
      second query is just without WHERE clause. It was an honest mistake. I
      checked on client statistics and execution plan. In Left Outer join
      query, Client processing time 67858, vs in NOT EXISTS query Client
      processing time 72074 .

      Thanks Dan..

      Comment

      • DeanGC

        #4
        Re: An optimum query..?

        On Apr 2, 8:06 pm, dsdevonso...@gm ail.com wrote:
        Oh,
        second query is just without WHERE clause. It was an honest mistake. I
        checked on client statistics and execution plan. In Left Outer join
        query, Client processing time 67858, vs in NOT EXISTS query Client
        processing time 72074 .
        What do the query plans look like? Like Dan, I would expect that these
        queries would produce the same plan.

        Did you run the queries multiple times? If you ran each one only once,
        the difference may be due to caching.

        Comment

        • --CELKO--

          #5
          Re: An optimum query..?

          Here is another alternative, if you have 2005 or 2008. Since you did
          not use very good data element names, you might not know about SAN
          ("Standard Address Number") which is used in many industries for
          address data. On Oracle and other DBs, the EXCEPT is quite fast.

          SELECT san, last_name, first_name, house_nbr, street_name, city_name,
          state_code
          FROM Addresses
          WHERE san
          IN
          (SELECT san FROM Addresses
          EXCEPT
          SELECT san FROM Foobar); -- not a great table name ..

          Comment

          Working...