Help with a Join and Duplicates?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • MaxPenguin

    Help with a Join and Duplicates?

    Hi All,


    I am banging my head against a brick wall over this problem, so any
    help in the correct direction would be muchly appreciated!



    I have 2 SQL (MS SQL) server tables, realated to -
    a Property,
    Sales of that property.


    A property is uniquely identifed by its Roll, valuation Number and
    Suffix (not my choosing).

    Each property can only appear in the property table once, and can only
    have 1 assessment - but can have multiple sales (ie - over the
    annalysis period the same property can sell more than once).

    There is approximatly 19000 properties relating to about 8000 sales.


    When creating a query to list property and most recent sale (if there
    is any) I end up with somthing like this -
    SELECT [roll], [valuation], [suffix], [sale date]
    FROM [property]
    LEFT JOIN [sales]
    ON
    [property].[roll] = [sales].[roll] AND
    [property].[valuation] = [sales].[valuation] AND
    [property].[suffix] = [sales].[suffix]
    (table names simplifed).

    I get rows where there is all the property data there, but sale date
    (etc.) is null (as I would expect from a left join), but the problem is
    - when there is more than 1 sale for a property it pulls out another
    copy of the property data.

    In short, because of that I come out with more records than properties.

    ie -
    roll valuation suffix sale date
    12 456789 A 1/1/2003
    12 788988 B NULL
    14 123456 A 1/1/2003
    14 123456 A 1/1/2004
    (Note - the last two are the same property).


    I didn't know that the left join can affect both joined tables!


    Is there any way around this? Any suggestions/hints in the right
    direction would be very much appreciated!

    THANKS!

  • Michael Gray

    #2
    Re: Help with a Join and Duplicates?

    On 11 Apr 2005 21:27:53 -0700, "MaxPenguin "
    <malcolm.lockye r@origen.co.nz> wrote:
    [color=blue]
    >Hi All,
    >
    >
    >I am banging my head against a brick wall over this problem, so any
    >help in the correct direction would be muchly appreciated!
    >
    >
    >
    >I have 2 SQL (MS SQL) server tables, realated to -
    >a Property,
    >Sales of that property.
    >
    >
    >A property is uniquely identifed by its Roll, valuation Number and
    >Suffix (not my choosing).
    >
    >Each property can only appear in the property table once, and can only
    >have 1 assessment - but can have multiple sales (ie - over the
    >annalysis period the same property can sell more than once).
    >
    >There is approximatly 19000 properties relating to about 8000 sales.
    >
    >
    >When creating a query to list property and most recent sale (if there
    >is any) I end up with somthing like this -
    >SELECT [roll], [valuation], [suffix], [sale date]
    >FROM [property]
    >LEFT JOIN [sales]
    >ON
    >[property].[roll] = [sales].[roll] AND
    >[property].[valuation] = [sales].[valuation] AND
    >[property].[suffix] = [sales].[suffix]
    >(table names simplifed).
    >
    >I get rows where there is all the property data there, but sale date
    >(etc.) is null (as I would expect from a left join), but the problem is
    >- when there is more than 1 sale for a property it pulls out another
    >copy of the property data.
    >
    >In short, because of that I come out with more records than properties.
    >
    >ie -
    >roll valuation suffix sale date
    >12 456789 A 1/1/2003
    >12 788988 B NULL
    >14 123456 A 1/1/2003
    >14 123456 A 1/1/2004
    >(Note - the last two are the same property).
    >
    >
    >I didn't know that the left join can affect both joined tables!
    >
    >
    >Is there any way around this? Any suggestions/hints in the right
    >direction would be very much appreciated!
    >
    >THANKS![/color]

    Are you able to tell us what want the query to return?
    (And why?)

    Comment

    • Thomas R. Hummel

      #3
      Re: Help with a Join and Duplicates?

      It isn't really a matter of the left join "affecting both tables". For
      any join that you use, if you have a many to one relationship then you
      are going to get multiple rows for the "one" side since the join
      effectively creates a cartesian product to start with. Whittling that
      down is a matter of your join criteria.

      In your description of the problem you state that you want the "most
      recent sale" but there is nothing in your query to try to limit the
      results to the most recent sale. Either of the queries below should
      work. I usually see better performance using the LEFT JOIN/IS NOT NULL
      method, but using NOT EXISTS is a bit more readable/logical in my
      opinion.

      SELECT p.roll, p.valuation, p.suffix, s.[sale date]
      FROM property p
      LEFT JOIN sales s ON p.roll = s.roll
      AND p.valuation = s.valuation
      AND p.suffix = s.suffix
      LEFT JOIN sales s2 ON s2.roll = s.roll
      AND s2.valuation = s.valuation
      AND s2.suffix = s.suffix
      AND s2.[sale date] > s.[sale date]
      WHERE s2.roll IS NULL

      SELECT p.roll, p.valuation, p.suffix, s.[sale date]
      FROM property p
      LEFT JOIN sales s ON p.roll = s.roll
      AND p.valuation = s.valuation
      AND p.suffix = s.suffix
      WHERE NOT EXISTS (SELECT *
      FROM sales s2
      WHERE s2.valuation = s.valuation
      AND s2.suffix = s.suffix
      AND s2.roll = s.roll
      AND s2.[sale date] > s.[sale date])

      Both methods assume that you cannot have two sales on the same exact
      date for the same property. They both effectively remove any rows where
      there is another row for the same property with a later sale date. That
      will leave only those rows with the latest (most recent) sale date.

      HTH,
      -Tom.

      Comment

      • MaxPenguin

        #4
        Re: Help with a Join and Duplicates?


        Thomas R. Hummel wrote:[color=blue]
        > In your description of the problem you state that you want the "most
        > recent sale" but there is nothing in your query to try to limit the
        > results to the most recent sale. Either of the queries below should
        > work. I usually see better performance using the LEFT JOIN/IS NOT[/color]
        NULL[color=blue]
        > method, but using NOT EXISTS is a bit more readable/logical in my
        > opinion.[/color]


        Thanks your solution works perfectly. Thats exactly what I needed
        really - a way to tell it that I did just want the most recent sales.



        Thanks again :)!




        [color=blue]
        >
        > SELECT p.roll, p.valuation, p.suffix, s.[sale date]
        > FROM property p
        > LEFT JOIN sales s ON p.roll = s.roll
        > AND p.valuation = s.valuation
        > AND p.suffix = s.suffix
        > LEFT JOIN sales s2 ON s2.roll = s.roll
        > AND s2.valuation = s.valuation
        > AND s2.suffix = s.suffix
        > AND s2.[sale date] > s.[sale date]
        > WHERE s2.roll IS NULL
        >
        > SELECT p.roll, p.valuation, p.suffix, s.[sale date]
        > FROM property p
        > LEFT JOIN sales s ON p.roll = s.roll
        > AND p.valuation = s.valuation
        > AND p.suffix = s.suffix
        > WHERE NOT EXISTS (SELECT *
        > FROM sales s2
        > WHERE s2.valuation = s.valuation
        > AND s2.suffix = s.suffix
        > AND s2.roll = s.roll
        > AND s2.[sale date] > s.[sale date])
        >
        > Both methods assume that you cannot have two sales on the same exact
        > date for the same property. They both effectively remove any rows[/color]
        where[color=blue]
        > there is another row for the same property with a later sale date.[/color]
        That[color=blue]
        > will leave only those rows with the latest (most recent) sale date.
        >
        > HTH,
        > -Tom.[/color]

        Comment

        Working...