How do I dedupe my recordset?

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • MWilson

    How do I dedupe my recordset?

    I have just been given the task of learning SQL server at work. I can manage the basics, and I do have a standalone dedupe prog, but tat is pretty slow.

    If I have data like this

    1. Mr / Smith / 1 Park Road / ME4 2TL / 1 ParkME4 2TL
    2. Mr / Brown / 2 Park Road / ME4 2TL / 2 ParkME4 2TL
    3. Mrs/ Brown / 2 Park Road / ME4 2TL / 2 ParkME4 2TL

    That is a title, surname, address, postcode and a field made of 6 characters of address and the postcode.

    I want to be able to dedupe on the final field, so that only one instance of '2 ParkME4 2TL' is left over. It matters not, which instance is deleted, or remains.

    I have seen lots of talk of removing duplicates online but it seems to all be about removing a whole duplicated row, or something that would remove both '2 ParkME4 2TL' instances.

    Thanks very much :o)
  • gpl
    New Member
    • Jul 2007
    • 152

    #2
    Im assuming that the 1. 2. 3. at the start of each row is a unique identifier
    Assume your table is called TABLE

    Code:
    Select t.* from TABLE inner join
    (
    Select SixChar, Min(ID) as ID
    from TABLE t
    group by SixChar
    ) u
    on t.ID=u.ID
    The inner query which creates a derived table selects only the unique values of SixChar, along with the lowest value of the ID for that value of SixChar, the join allows you to return the whole row from TABLE that has the same value of ID and SixChar.

    It will of course be a bit trickier if you do not have a unique row identifier.

    Note that the house number and postcode will uniquely identify an address

    Comment

    • MWilson

      #3
      That is spot on, thankyou very much!

      Within that same bit of code, is it possible to have the 'deduped' data placed into a permanent table? As far as I can see in my SQL studio, the derived table can't be worked on in its own right.

      Comment

      • gpl
        New Member
        • Jul 2007
        • 152

        #4
        Yes, of course, standard sql allows for this, either
        1] you have a table called ddtable with the same structure as TABLE already existing
        Code:
        Insert ddtable
        Select t.* from TABLE inner join 
        ( 
        Select SixChar, Min(ID) as ID 
        from TABLE t 
        group by SixChar 
        ) u 
        on t.ID=u.ID
        2] if you want to create a table and populate it at the same time
        Code:
        Select t.* 
        INTO ddtable
        from TABLE inner join 
        ( 
        Select SixChar, Min(ID) as ID 
        from TABLE t 
        group by SixChar 
        ) u 
        on t.ID=u.ID
        the chief difference in the 2 approaches are that with the second one, you can create the table at the same time as filling it.
        With the first approach, it will append data to ddtable every time you run it

        Comment

        Working...