Delete Duplicates

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • EoRaptor013

    Delete Duplicates

    I'm having trouble figuring out how to delete some _almost_ duplicate
    records in a look-up table. Here's the table:
    CREATE TABLE [user_fields] (
    [fKEY] [char] (16) NOT NULL ,
    [SEQUENCE] [char] (2) NOT NULL ,
    [FIELD_LABEL] [varchar] (20) NULL ,
    [FIELD_VALUE] [varchar] (50) NULL ,
    [EXPORT_DATE] [datetime] NULL ,
    CONSTRAINT [PK_user_fields] PRIMARY KEY CLUSTERED
    (
    [fKEY],
    [SEQUENCE]
    )
    CONSTRAINT [FK_USRFLD_INV_D OCID] FOREIGN KEY
    (
    [fKEY]
    ) REFERENCES [OTHER_TABLE] (
    [PKEY]
    )
    )

    Some values:
    fKEY SEQUENCE FIELD_LABEL FIELD_VALUE
    ----------------------------------------------------------
    8525645200692B8 9 19 Co. ID #
    8525645200692B8 9 20 Co. ID #
    8525645200692B8 9 21 Co. ID #
    8525645200692B8 9 13 Co/Div/Dept
    8525645200692B8 9 14 Co/Div/Dept
    8525645200692B8 9 15 Co/Div/Dept
    8525645200692B8 9 16 Division
    8525645200692B8 9 17 Division
    8525645200692B8 9 18 Division
    8525645200692B8 9 10 Group
    8525645200692B8 9 11 Group
    8525645200692B8 9 12 Group
    8525645200692B8 9 1 HR Contact JOHN NOVAK
    8525645200692B8 9 2 HR Contact JOHN NOVAK
    8525645200692B8 9 3 HR Contact JOHN NOVAK
    8525645200692B8 9 24 Job Location
    8525645200692B8 9 22 Job Location
    8525645200692B8 9 23 Job Location
    8525645200692B8 9 4 Manager
    8525645200692B8 9 5 Manager
    8525645200692B8 9 6 Manager
    8525645200692B8 9 7 Recruiter
    8525645200692B8 9 8 Recruiter
    8525645200692B8 9 9 Recruiter
    85256D740081C3A 4 13 Co. ID #
    85256D740081C3A 4 14 Co. ID #
    85256D740081C3A 4 10 Co/Div/Dept
    85256D740081C3A 4 9 Co/Div/Dept
    85256D740081C3A 4 11 Division
    85256D740081C3A 4 12 Division
    85256D740081C3A 4 8 Group
    85256D740081C3A 4 7 Group
    85256D740081C3A 4 2 HR Contact Diana Tarry
    85256D740081C3A 4 1 HR Contact Diana Tarry
    85256D740081C3A 4 15 Job Location
    85256D740081C3A 4 16 Job Location
    85256D740081C3A 4 3 Manager
    85256D740081C3A 4 4 Manager
    85256D740081C3A 4 5 Recruiter
    85256D740081C3A 4 6 Recruiter

    Note that fKEY 8525645200692B8 9 has three of every FIELD_LABEL, and
    fKEY 85256D740081C3A 4 has two. Both, however, should have only one.
    Unfortunately, when I do a slect ... having count(*) > 1, I have nearly
    900 different fKEYs with some variation of this problem.
    It's just not coming to me how to delete the duplicates (except for
    sequence). I don't care which of the sequence values I keep but as a
    matter of preference I tried to do something using max(sequence) but,
    so far, everything I've tried deletes all records for any given fKEY.
    Help?
    Thanks.
    Randy

  • BlueDragon

    #2
    Re: Delete Duplicates

    My project manager is a smart cookie:

    delete User_Fields
    FROM
    user_fields a
    WHERE exists (
    select null
    from user_fields b
    where a.fKEY= b. fKEY
    and a.serverrepid = b.serverrepid
    and a.FIELD_LABEL = b.FIELD_LABEL
    group by FKEY, serverrepid, FIELD_LABEL
    having count(*) > 1 and min(b.Sequence) <> a.sequence
    )

    Comment

    • kevindotcar@gmail.com

      #3
      Re: Delete Duplicates


      EoRaptor013 wrote:[color=blue]
      > Note that fKEY 8525645200692B8 9 has three of every FIELD_LABEL, and
      > fKEY 85256D740081C3A 4 has two. Both, however, should have only one.
      > Unfortunately, when I do a slect ... having count(*) > 1, I have[/color]
      nearly[color=blue]
      > 900 different fKEYs with some variation of this problem.
      > It's just not coming to me how to delete the duplicates (except for
      > sequence). I don't care which of the sequence values I keep but as a
      > matter of preference I tried to do something using max(sequence) but,[/color]

      The easiest would be the MAX(sequence) that you suggested. I'm just
      typing this DML off the top of my head;

      DELETE FROM user_fields
      WHERE PK_user_fields
      NOT IN (SELECT PK_user_fields
      FROM user_fields
      LEFT JOIN (SELECT PK_user_fields, MAX(SEQUENCE) AS MAXSeq
      FROM PK_user_fields
      GROUP BY PK_user_fields) AS MAXKEY
      ON MAXKEY.PK_user_ fields = user_fields.PK_ user_fields
      GROUP BY user_fields.PK_ user_fields
      HAVING MAXKEY.PK_user_ fields IS NULL)

      I'm actually more of an MSAccess guy, so my syntax may be off, but bear
      with me :) This has an inner join collecting the max sequence; A
      mid-join taking the parent-table to the MAX-value, and the outer DELETE
      operating on the criteria of the RHS of the left-join being null.

      [color=blue]
      > so far, everything I've tried deletes all records for any given fKEY.
      > Help?[/color]

      Make sure you back up if you try the above qry- :-)

      HTH-

      kDot

      Comment

      • --CELKO--

        #4
        Re: Delete Duplicates

        Have you thought about using a relational design for the table intead
        of this? It looks like an EAV design flaw, with mixed data and metadata
        in the same table. First you need a real key. Then you need to
        understand that rows are not records and columns are not fields --
        totally different things. You will never have data integrity with this
        thing.

        Comment

        Working...