delete otherwise duplicate records based on differing values in one column

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • jmstur2
    New Member
    • Feb 2008
    • 6

    delete otherwise duplicate records based on differing values in one column

    I have a table with what I consider duplicate records. Data in all columns are duplicate except for the date column, meaning that duplicate data was entered on different dates and those dates were stored along with the data.

    I want to delete the records except for most recent one. I can select the client ID field and the max(date field) to determine the ones I want to keep, but how do I determine the ones to delete. There are often more than two duplicates, so the min(date field) doesn't do it.

    Any suggestions or guidance will be appreciated!

    I use:
    Microsoft SQL Enterprise Manager 8.0 running on Windows XP SP2
  • Delerna
    Recognized Expert Top Contributor
    • Jan 2008
    • 1134

    #2
    Paste this code into query analyser and see if it does what you need

    Code:
    --Setup a table and add some data for rhe example to work with
    create table tblDulicateDates([Num1] [tinyint],[Num2] [tinyint],[dte] [datetime])
    
    delete from tblDulicateDates
    insert into tblDulicateDates select 1,1,'2007-01-01'
    insert into tblDulicateDates select 1,1,'2007-01-02'
    insert into tblDulicateDates select 1,1,'2007-01-03'
    insert into tblDulicateDates select 1,2,'2007-01-01'
    insert into tblDulicateDates select 1,2,'2007-01-02'
    insert into tblDulicateDates select 1,2,'2007-01-03'
    insert into tblDulicateDates select 1,3,'2007-01-01'
    
    
    
    --show the table contents with the duplicate records except for date
    select * from tblDulicateDates
    
    --Declare the necessary variables
    Declare @ThereAreDuplicates int,@Num1 int,@Num2 int, @Dte datetime
    
    
    --see if there are any duplicate records
    set @ThereAreDuplicates=(select count(a.num1) from
    	(select num1,num2,min(Dte) as Dte from tblDulicateDates group by num1,num2)a
    	join
    	(select num1,num2,max(Dte) as Dte from tblDulicateDates group by num1,num2)b on a.num1=b.num1 and a.num2=b.num2
    	where a.dte<>b.dte)
    
    
    --if there are duplicates then enter the loop
    while @ThereAreDuplicates > 0
    BEGIN
    	--select the duplicates that need to be deleted into a cursor
    	DECLARE DuplicatesCursor CURSOR FOR
    	select a.num1,a.num2,a.dte from
    	(select num1,num2,min(Dte) as Dte from tblDulicateDates group by num1,num2)a
    	join
    	(select num1,num2,max(Dte) as Dte from tblDulicateDates group by num1,num2)b on a.num1=b.num1 and a.num2=b.num2
    	where a.dte<>b.dte
    	
    	
    	OPEN DuplicatesCursor
    	FETCH NEXT FROM DuplicatesCursor
    	INTO @Num1,@Num2,@Dte
    	
    
    	--enter a loop that deletes each of the records in the cursor
    	WHILE @@FETCH_STATUS = 0
    	BEGIN
    		DELETE FROM tblDulicateDates where Num1=@Num1 and Num2=@Num2 and Dte=@Dte
    	
    		FETCH NEXT FROM DuplicatesCursor
    		INTO @Num1,@Num2,@Dte
    	END
    	CLOSE DuplicatesCursor
    	DEALLOCATE DuplicatesCursor
    
    	--Check to see if there are any more duplicates still in the table
    	--This is to handle the case where there are 3 or more duplicate records
    	set @ThereAreDuplicates=(select count(a.num1) from
    	(select num1,num2,min(Dte) as Dte from tblDulicateDates group by num1,num2)a
    	join
    	(select num1,num2,max(Dte) as Dte from tblDulicateDates group by num1,num2)b on a.num1=b.num1 and a.num2=b.num2
    	where a.dte<>b.dte)
    
    END
    
    --now show the table contents
    -- no duplicates and only the ones that had the max date are left
    select * from tblDulicateDates
    regards

    Comment

    • jmstur2
      New Member
      • Feb 2008
      • 6

      #3
      Thank you. I will give this a try!

      -Jeanne

      ---------------------------------------------------------------------------

      Originally posted by Delerna
      Paste this code into query analyser and see if it does what you need

      Code:
      --Setup a table and add some data for rhe example to work with
      create table tblDulicateDates([Num1] [tinyint],[Num2] [tinyint],[dte] [datetime])
      
      delete from tblDulicateDates
      insert into tblDulicateDates select 1,1,'2007-01-01'
      insert into tblDulicateDates select 1,1,'2007-01-02'
      insert into tblDulicateDates select 1,1,'2007-01-03'
      insert into tblDulicateDates select 1,2,'2007-01-01'
      insert into tblDulicateDates select 1,2,'2007-01-02'
      insert into tblDulicateDates select 1,2,'2007-01-03'
      insert into tblDulicateDates select 1,3,'2007-01-01'
      
      
      
      --show the table contents with the duplicate records except for date
      select * from tblDulicateDates
      
      --Declare the necessary variables
      Declare @ThereAreDuplicates int,@Num1 int,@Num2 int, @Dte datetime
      
      
      --see if there are any duplicate records
      set @ThereAreDuplicates=(select count(a.num1) from
      	(select num1,num2,min(Dte) as Dte from tblDulicateDates group by num1,num2)a
      	join
      	(select num1,num2,max(Dte) as Dte from tblDulicateDates group by num1,num2)b on a.num1=b.num1 and a.num2=b.num2
      	where a.dte<>b.dte)
      
      
      --if there are duplicates then enter the loop
      while @ThereAreDuplicates > 0
      BEGIN
      	--select the duplicates that need to be deleted into a cursor
      	DECLARE DuplicatesCursor CURSOR FOR
      	select a.num1,a.num2,a.dte from
      	(select num1,num2,min(Dte) as Dte from tblDulicateDates group by num1,num2)a
      	join
      	(select num1,num2,max(Dte) as Dte from tblDulicateDates group by num1,num2)b on a.num1=b.num1 and a.num2=b.num2
      	where a.dte<>b.dte
      	
      	
      	OPEN DuplicatesCursor
      	FETCH NEXT FROM DuplicatesCursor
      	INTO @Num1,@Num2,@Dte
      	
      
      	--enter a loop that deletes each of the records in the cursor
      	WHILE @@FETCH_STATUS = 0
      	BEGIN
      		DELETE FROM tblDulicateDates where Num1=@Num1 and Num2=@Num2 and Dte=@Dte
      	
      		FETCH NEXT FROM DuplicatesCursor
      		INTO @Num1,@Num2,@Dte
      	END
      	CLOSE DuplicatesCursor
      	DEALLOCATE DuplicatesCursor
      
      	--Check to see if there are any more duplicates still in the table
      	--This is to handle the case where there are 3 or more duplicate records
      	set @ThereAreDuplicates=(select count(a.num1) from
      	(select num1,num2,min(Dte) as Dte from tblDulicateDates group by num1,num2)a
      	join
      	(select num1,num2,max(Dte) as Dte from tblDulicateDates group by num1,num2)b on a.num1=b.num1 and a.num2=b.num2
      	where a.dte<>b.dte)
      
      END
      
      --now show the table contents
      -- no duplicates and only the ones that had the max date are left
      select * from tblDulicateDates
      regards

      Comment

      • jmstur2
        New Member
        • Feb 2008
        • 6

        #4
        The sample code worked well. It didn't quite do what I need, though. I tried to fit it into my situation, but had no luck.

        I have been given a table (TABLEDUPS) of known duplicate records. I need to delete records from a different table (TABLEEVENTS) based on this duplicate records table.

        So what I really need to do is look at TABLEDUPS to determine the duplicate records to keep (most recent) and delete records in TABLEEVENTS that match the remaining record(s) in TABLEDUPS.

        Any ideas...? Thanks.



        Originally posted by jmstur2
        I have a table with what I consider duplicate records. Data in all columns are duplicate except for the date column, meaning that duplicate data was entered on different dates and those dates were stored along with the data.

        I want to delete the records except for most recent one. I can select the client ID field and the max(date field) to determine the ones I want to keep, but how do I determine the ones to delete. There are often more than two duplicates, so the min(date field) doesn't do it.

        Any suggestions or guidance will be appreciated!

        I use:
        Microsoft SQL Enterprise Manager 8.0 running on Windows XP SP2

        Comment

        • ck9663
          Recognized Expert Specialist
          • Jun 2007
          • 2878

          #5
          "delete records in TABLEEVENTS that match the remaining record(s) in TABLEDUPS."

          Could you define "match" ? What columns will you use to determine if if matches? You can give the structure of those two tables.

          -- CK

          Comment

          • jmstur2
            New Member
            • Feb 2008
            • 6

            #6
            They need to match on key fields. The structure for these key fields is the same for both tables:
            Region varchar 2
            ID varchar 9
            Claim varchar 6
            SVCDate datetime 8
            Modifier varchar 2
            Provider varchar 15
            Profess varchar 15
            Place varchar 2
            ReportDate datetime 8

            Once the match is made, I need to look at ReportDate in TABLEDUPS, then delete from TABLEEVENTS all records except the one with the most recent date.

            Thanks.

            Originally posted by ck9663
            "delete records in TABLEEVENTS that match the remaining record(s) in TABLEDUPS."

            Could you define "match" ? What columns will you use to determine if if matches? You can give the structure of those two tables.

            -- CK

            Comment

            • ck9663
              Recognized Expert Specialist
              • Jun 2007
              • 2878

              #7
              Run this first:
              Code:
              select tableevents.* 
              from tableevents
              left join
              (select Region, ID, Claim, SVCDate, Modifier, Provider, Profess, Place, max(ReportDate) as latest from TABLEDUPS
              group by Region, ID, Claim, SVCDate, Modifier, Provider, Profess, Place) as dups
              on dups.Region = tableevents.Region and  dups.ID = tableevents.ID 
               and dups.Claim = tableevents.Claim and dups.SVCDate = tableevents.SVCDate 
               and dups.Modifier = tableevents.Modifier and dups.Provider = tableevents.Provider 
               and dups.Profess = tableevents.Profess and dups.Place = tableevents.Place 
               and dups.latest > tableevents.ReportDate
              If it returns the record you want you can delete it. Just make it a DELETE query. The reason I'm asking you run this first so that you don't delete the rows you don't want. By running a SELECT first, you can take a look first if your deleting the right rows. But it's always good to have a backup.

              Happy coding

              -- CK

              Comment

              • jmstur2
                New Member
                • Feb 2008
                • 6

                #8
                Thanks. I'll give this a try.

                Originally posted by ck9663
                Run this first:
                Code:
                select tableevents.* 
                from tableevents
                left join
                (select Region, ID, Claim, SVCDate, Modifier, Provider, Profess, Place, max(ReportDate) as latest from TABLEDUPS
                group by Region, ID, Claim, SVCDate, Modifier, Provider, Profess, Place) as dups
                on dups.Region = tableevents.Region and  dups.ID = tableevents.ID 
                 and dups.Claim = tableevents.Claim and dups.SVCDate = tableevents.SVCDate 
                 and dups.Modifier = tableevents.Modifier and dups.Provider = tableevents.Provider 
                 and dups.Profess = tableevents.Profess and dups.Place = tableevents.Place 
                 and dups.latest > tableevents.ReportDate
                If it returns the record you want you can delete it. Just make it a DELETE query. The reason I'm asking you run this first so that you don't delete the rows you don't want. By running a SELECT first, you can take a look first if your deleting the right rows. But it's always good to have a backup.

                Happy coding

                -- CK

                Comment

                Working...