Removing Duplicates based on most recent

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • bard777
    New Member
    • Jan 2008
    • 23

    Removing Duplicates based on most recent

    I fully expect to have a duh moment when I read replies to this....BUT....

    I have a table with duplicates in the [LN] field (could be 2 records with same value, might be 7 records). I need to delete all but the the most recent based on the [eDate] field.

    I hope this is enough info.

    Thanks.
  • gershwyn
    New Member
    • Feb 2010
    • 122

    #2
    Originally posted by bard777
    I fully expect to have a duh moment when I read replies to this....BUT....

    I have a table with duplicates in the [LN] field (could be 2 records with same value, might be 7 records). I need to delete all but the the most recent based on the [eDate] field.

    I hope this is enough info.

    Thanks.
    Do you have a unique key set up in your table?

    If so, you could do something like this:
    Code:
    DELETE Table.* FROM Table
    WHERE [ID] NOT IN (SELECT Last(ID) FROM
    (SELECT ID, LN FROM Table ORDER BY eDate)
    GROUP BY LN);
    That's a lot more sub-queries then I like to put into a criteria expression, but it gives you the idea. Also - do you ever have two records with the same LN and the same eDate? Would you want to keep them all, or does it matter which is deleted?

    Things are a little trickier if there's no ID field. Another thing to consider is you could select the data you want with a GROUP BY clause and make a new table from it. I wouldn't recommend it if this is something you will be repeating, but it might be an option depending on your setup.

    Comment

    • bard777
      New Member
      • Jan 2008
      • 23

      #3
      Originally posted by gershwyn
      Do you have a unique key set up in your table?

      If so, you could do something like this:
      Code:
      DELETE Table.* FROM Table
      WHERE [ID] NOT IN (SELECT Last(ID) FROM
      (SELECT ID, LN FROM Table ORDER BY eDate)
      GROUP BY LN);
      That's a lot more sub-queries then I like to put into a criteria expression, but it gives you the idea. Also - do you ever have two records with the same LN and the same eDate? Would you want to keep them all, or does it matter which is deleted?

      Things are a little trickier if there's no ID field. Another thing to consider is you could select the data you want with a GROUP BY clause and make a new table from it. I wouldn't recommend it if this is something you will be repeating, but it might be an option depending on your setup.
      Yes, I have an ID field, so this is a good lead, thanks.

      It is possible that [eDate] could be the same as well, but in this case either will do. The real issue is there is no data validation on the front end.

      Comment

      • Jim Doherty
        Recognized Expert Contributor
        • Aug 2007
        • 897

        #4
        Originally posted by bard777
        I fully expect to have a duh moment when I read replies to this....BUT....

        I have a table with duplicates in the [LN] field (could be 2 records with same value, might be 7 records). I need to delete all but the the most recent based on the [eDate] field.

        I hope this is enough info.

        Thanks.
        The following function will open a recordset examining two columns LN and edate in a table named Table1 and stack records in Ascending order of LN and descending descending order of edate. The procedure loops records skipping over the first LN number (ie the latest date for that LN value) and deletes any other records having the same LN value in the stack. It repeats this process for each individual LN number found until the end of the table. The end result is an individual LN value in the first column and in the second column the latest date for that LN number. Obviously test this out on a copy table to see if it fits your needs

        Code:
        Function Get_Rid_Of_LNRecords_Except_Latest_EDate()
        On Error GoTo Err_Get_Rid_Of_LNRecords_Except_Latest_EDate
        Dim db As DAO.Database
        Dim rst As DAO.Recordset
        Set db = CurrentDb
        mysql = "SELECT Table1.LN, Table1.eDate "
        mysql = mysql & "FROM Table1 "
        mysql = mysql & "ORDER BY Table1.LN, Table1.eDate DESC;"
        
        ' open a recordset of two columns
        ' sorting records by LN first then the edate descending
        Set rst = db.OpenRecordset(mysql, dbOpenDynaset)
        ' if a dataset is available loop through records
        ' skipping the first LN record encountered and deleting the rest
        ' the first LN record encountered will be first LN record for any given date
        
        mypointer = rst!LN
               
                Do While Not rst.BOF And Not rst.EOF
                    'skip the first one
                    rst.MoveNext
                            If mypointer = rst!LN Then
                               rst.Delete
                            Else
                            'ressign the pointer because we have encountered
                            ' a new LN
                                mypointer = rst!LN
                            End If
                 Loop
        
        rst.Close
        mysql = ""
        Set rst = Nothing
        Set db = Nothing
        MsgBox "Process Complete", vbInformation, "System Message"
        Exit_Get_Rid_Of_LNRecords_Except_Latest_EDate:
            Exit Function
        
        Err_Get_Rid_Of_LNRecords_Except_Latest_EDate:
            If Err = 3021 Then
                'just ignore because encountered last row
            MsgBox "Process Complete", vbInformation, "System Message"
            Else
                MsgBox "Error " & Err.Number & " " & Err.Description
            End If
            Resume Exit_Get_Rid_Of_LNRecords_Except_Latest_EDate
        
        End Function
        You can run the above if you paste it as is into a new module and then place a command button on a form (without using the wizard) then in the property box for the command button at the 'On click' event just type this =Get_Rid_Of_LNR ecords_Except_L atest_EDate()

        Comment

        • bard777
          New Member
          • Jan 2008
          • 23

          #5
          Originally posted by Jim Doherty
          The following function will open a recordset examining two columns LN and edate in a table named Table1 and stack records in Ascending order of LN and descending descending order of edate. The procedure loops records skipping over the first LN number (ie the latest date for that LN value) and deletes any other records having the same LN value in the stack. It repeats this process for each individual LN number found until the end of the table. The end result is an individual LN value in the first column and in the second column the latest date for that LN number. Obviously test this out on a copy table to see if it fits your needs

          Code:
          Function Get_Rid_Of_LNRecords_Except_Latest_EDate()
          On Error GoTo Err_Get_Rid_Of_LNRecords_Except_Latest_EDate
          Dim db As DAO.Database
          Dim rst As DAO.Recordset
          Set db = CurrentDb
          mysql = "SELECT Table1.LN, Table1.eDate "
          mysql = mysql & "FROM Table1 "
          mysql = mysql & "ORDER BY Table1.LN, Table1.eDate DESC;"
          
          ' open a recordset of two columns
          ' sorting records by LN first then the edate descending
          Set rst = db.OpenRecordset(mysql, dbOpenDynaset)
          ' if a dataset is available loop through records
          ' skipping the first LN record encountered and deleting the rest
          ' the first LN record encountered will be first LN record for any given date
          
          mypointer = rst!LN
                 
                  Do While Not rst.BOF And Not rst.EOF
                      'skip the first one
                      rst.MoveNext
                              If mypointer = rst!LN Then
                                 rst.Delete
                              Else
                              'ressign the pointer because we have encountered
                              ' a new LN
                                  mypointer = rst!LN
                              End If
                   Loop
          
          rst.Close
          mysql = ""
          Set rst = Nothing
          Set db = Nothing
          MsgBox "Process Complete", vbInformation, "System Message"
          Exit_Get_Rid_Of_LNRecords_Except_Latest_EDate:
              Exit Function
          
          Err_Get_Rid_Of_LNRecords_Except_Latest_EDate:
              If Err = 3021 Then
                  'just ignore because encountered last row
              MsgBox "Process Complete", vbInformation, "System Message"
              Else
                  MsgBox "Error " & Err.Number & " " & Err.Description
              End If
              Resume Exit_Get_Rid_Of_LNRecords_Except_Latest_EDate
          
          End Function
          You can run the above if you paste it as is into a new module and then place a command button on a form (without using the wizard) then in the property box for the command button at the 'On click' event just type this =Get_Rid_Of_LNR ecords_Except_L atest_EDate()
          Thanks Jim...I had already started donig something similar so I went with that (for now anyways).

          Here is what I have, but it is not working:

          Code:
          Set db = CurrentDb
          Dim rsDEDUPE As Recordset
          Dim varLN As String
          Dim varTic As Long
          Dim varToc As Long
          
          varTic = 0 ' count total deleted duplicate records
          varToc = 0 ' count total for duplicate recordset
          
          Set rsDEDUPE = db.OpenRecordset("_1duplicates for tbl_exceptions_CSV") ' rename query
          rsDEDUPE.MoveFirst
          
          Do While Not rsDEDUPE.EOF
          
              If Not rsDEDUPE.EOF Then
                  varLN = rsDEDUPE(0)
                  rsDEDUPE.MoveNext
                  varToc = varToc + 1
              End If
              
          If Not rsDEDUPE.EOF Then
             If Trim(varLN) = Trim(rsDEDUPE(0)) Then
              
                  DoCmd.RunSQL ("DELETE * FROM tbl_exceptions_CSV WHERE [AutoKey] = " & rsDEDUPE(2))
                         
                  varTic = varTic + 1
              End If
          End If
          
          Loop
          It is giving me a "Record Deleted" error on the live that is:

          Code:
          varLN = rsDEDUPE(0)
          I have gotten this error in the past when I have a query that has a linked table that is linked to a linked table. The table that the query uses to create the recordset is local, so I don't know why this would be happening.

          Comment

          • Jim Doherty
            Recognized Expert Contributor
            • Aug 2007
            • 897

            #6
            Have you looked at the logic of the one I sent you. I did take the time out to present you with something that aligns itself to your original post!

            I am not so sure I understand what you are experiencing with the #deletion# thing. If you have an openrecordset and delete records in SQL (Delete From X Where etc etc) at the same time then obviously the recordset will reflect that

            Comment

            • NeoPa
              Recognized Expert Moderator MVP
              • Oct 2006
              • 32662

              #7
              This should handle the nesting of the subqueries and the duplications when [eDate]s match :
              Code:
              DELETE tDo.*
              FROM   [tblDuplicates] AS tDo
              WHERE  tDo.ID Not In(
                  SELECT   TOP 1
                           tDi.ID
                  FROM     [tblDuplicates] AS tDi
                  WHERE    tDi.LN=tDo.LN
                  ORDER BY tDi.[eDate] DESC
                         , tDi.ID DESC
                  )
              Last edited by NeoPa; May 14 '10, 02:01 PM. Reason: Changed ORDER BY lines to DESCending.

              Comment

              • Jim Doherty
                Recognized Expert Contributor
                • Aug 2007
                • 897

                #8
                Originally posted by NeoPa
                This should handle the nesting of the subqueries and the duplications when [eDate]s match :
                Code:
                DELETE tDo.*
                FROM   [tblDuplicates] AS tDo
                WHERE  tDo.ID Not In(
                    SELECT   TOP 1
                             tDi.ID
                    FROM     [tblDuplicates] AS tDi
                    WHERE    tDi.LN=tDo.LN
                    ORDER BY tDi.[eDate]
                           , tDi.ID
                    )
                Just a small point Neopa' the correlated inner loop needs descending order edate sorting otherwise the poster will be left with the 'earliest date' as opposed to the latest date having the benefit of two insights SQL and VBA. What more can one need?

                Comment

                • NeoPa
                  Recognized Expert Moderator MVP
                  • Oct 2006
                  • 32662

                  #9
                  You're absolutely right of course Jim :)

                  Strangely I've done a couple of similar ones today and thought I'd included that in this one, as I did in the other. Thanks for catching.

                  Comment

                  • bard777
                    New Member
                    • Jan 2008
                    • 23

                    #10
                    Thanks NeoPa, that did the trick! I thought it could all be done in a query, I just couldn't wrap my mind around the logic. I also found out something interesting about using TOP 1....it will return multiple records if they meet the criteria (I didn't include the ID field in the ORDER first time around), didn't know it would do that.

                    Jim, thanks for the time you spent on your method. I had already started working on some VBA along the same lines before I saw your code. I didn't ignore your effort, I was trying to work with what I had instead of re-writing at that point. I did mean to seem ungratelful for your help.

                    Comment

                    • NeoPa
                      Recognized Expert Moderator MVP
                      • Oct 2006
                      • 32662

                      #11
                      No worries Bard. The TOP predicate catches many of us out believe me ;)

                      Comment

                      • Jim Doherty
                        Recognized Expert Contributor
                        • Aug 2007
                        • 897

                        #12
                        Originally posted by bard777
                        Thanks NeoPa, that did the trick! I thought it could all be done in a query, I just couldn't wrap my mind around the logic. I also found out something interesting about using TOP 1....it will return multiple records if they meet the criteria (I didn't include the ID field in the ORDER first time around), didn't know it would do that.

                        Jim, thanks for the time you spent on your method. I had already started working on some VBA along the same lines before I saw your code. I didn't ignore your effort, I was trying to work with what I had instead of re-writing at that point. I did mean to seem ungratelful for your help.
                        Don't worry about it Bard I am not the 'sensitive soul' :) I much rather prefer that multiple methods came piling into threads the browsing individual not just yourself then has choice according to their particular flavour of working.

                        SQL,s TOP predicate will return more than 'one' record as you experienced because it returns TIES by default in Access (whereas in SQL Server it does not you explicitly request it) You simply code that out by using the DISTINCT predicate

                        Comment

                        • NeoPa
                          Recognized Expert Moderator MVP
                          • Oct 2006
                          • 32662

                          #13
                          Interesting Jim (DISTINCT predicate of the SELECT clause for handling ties). I normally include a unique reference field in the sorting to avoid them (as in current thread).

                          Comment

                          Working...