Question about deleting table duplicates

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • ScottH

    Question about deleting table duplicates

    I was looking for thw SQL to delete dupes from a table, and came across
    this. All who saw it agreed in principle, but I can't quite figure out the
    logic. If we are deleting all rows whose rowid is greater than the least of
    the rowids returned from creating the subset of dupes, couldn't we
    inadvertently delete some non-dupes rows that were created after the last
    dupe ? I mean, any row created after the last dupe would have a greater
    rowid, wouldn't it ?

    Here's the SQL:

    delete from table_1 a
    where a.rowid >
    (select min(b.rowid)
    from table_1 b
    where b.col_dup_value s = a.col_dup_value s)

    By the way, should the delete ALL dupes, including the originals ? That is
    all rows participating in duplicity, as it is, will be gone.

    Thanks,
    Scott


  • sybrandb@yahoo.com

    #2
    Re: Question about deleting table duplicates

    "ScottH" <fakeaddress@ne wsgroupsonly.co mwrote in message news:<pOKdnbFXV 7ane_SiU-KYvQ@giganews.c om>...
    I was looking for thw SQL to delete dupes from a table, and came across
    this. All who saw it agreed in principle, but I can't quite figure out the
    logic. If we are deleting all rows whose rowid is greater than the least of
    the rowids returned from creating the subset of dupes, couldn't we
    inadvertently delete some non-dupes rows that were created after the last
    dupe ? I mean, any row created after the last dupe would have a greater
    rowid, wouldn't it ?
    >
    Here's the SQL:
    >
    delete from table_1 a
    where a.rowid >
    (select min(b.rowid)
    from table_1 b
    where b.col_dup_value s = a.col_dup_value s)
    >
    By the way, should the delete ALL dupes, including the originals ? That is
    all rows participating in duplicity, as it is, will be gone.
    >
    Thanks,
    Scott
    No, you will delete rows which are identical, save for their rowids.
    Rowids aren't stored, they are an internal attribute of a record.
    So if you have 2 identical rows, only 1 will be deleted. A *random*
    one of course, because rows are inserted randomly.
    If you would limit the delete with an extra where clause to one
    particular case, you should see easily that only one record is
    deleted.
    If more rows are deleted, the where clause in the subquery is
    incorrect.

    Sybrand Bakker
    Senior Oracle DBA

    Comment

    • Russ Bagley

      #3
      Re: Question about deleting table duplicates

      "ScottH" <fakeaddress@ne wsgroupsonly.co mwrote in message news:<pOKdnbFXV 7ane_SiU-KYvQ@giganews.c om>...
      I was looking for thw SQL to delete dupes from a table, and came across
      this. All who saw it agreed in principle, but I can't quite figure out the
      logic. If we are deleting all rows whose rowid is greater than the least of
      the rowids returned from creating the subset of dupes, couldn't we
      inadvertently delete some non-dupes rows that were created after the last
      dupe ? I mean, any row created after the last dupe would have a greater
      rowid, wouldn't it ?
      >
      Here's the SQL:
      >
      delete from table_1 a
      where a.rowid >
      (select min(b.rowid)
      from table_1 b
      where b.col_dup_value s = a.col_dup_value s)
      The condition in the subselect (b.col_dup_valu es = a.col_dup_value s)
      links the two instances (a and b) of table_1 in this SQL. The delete
      statement, therefore, only affects table_1 with the condition:
      (b.col_dup_valu es = a.col_dup_value s).

      This won't delete all duplicated rows, as such, but any row that is a
      duplicate of a row that already exists - leaving one row where there
      were several duplicates. If you wanted to do that the SQL is much
      simpler (at least, simpler to follow).

      DELETE FROM table_1
      WHERE col_dup_values = (SELECT col_dup_values
      ,COUNT(*)
      FROM table_1
      WHERE COUNT(*)>1
      GROUP BY col_dup_values)

      Comment

      • Mike Nugent

        #4
        Re: Question about deleting table duplicates

        Russ Bagley wrote:
        "ScottH" <fakeaddress@ne wsgroupsonly.co mwrote in message news:<pOKdnbFXV 7ane_SiU-KYvQ@giganews.c om>...
        >
        >>I was looking for thw SQL to delete dupes from a table, and came across
        >>this. All who saw it agreed in principle, but I can't quite figure out the
        >>logic. If we are deleting all rows whose rowid is greater than the least of
        >>the rowids returned from creating the subset of dupes, couldn't we
        >>inadvertent ly delete some non-dupes rows that were created after the last
        >>dupe ? I mean, any row created after the last dupe would have a greater
        >>rowid, wouldn't it ?
        >>
        >>Here's the SQL:
        >>
        >>delete from table_1 a
        >>where a.rowid >
        > (select min(b.rowid)
        > from table_1 b
        > where b.col_dup_value s = a.col_dup_value s)
        >
        >
        The condition in the subselect (b.col_dup_valu es = a.col_dup_value s)
        links the two instances (a and b) of table_1 in this SQL. The delete
        statement, therefore, only affects table_1 with the condition:
        (b.col_dup_valu es = a.col_dup_value s).
        >
        This won't delete all duplicated rows, as such, but any row that is a
        duplicate of a row that already exists - leaving one row where there
        were several duplicates. If you wanted to do that the SQL is much
        simpler (at least, simpler to follow).
        >
        DELETE FROM table_1
        WHERE col_dup_values = (SELECT col_dup_values
        ,COUNT(*)
        FROM table_1
        WHERE COUNT(*)>1
        GROUP BY col_dup_values)
        Hmm. This isn't how I saw it.

        Pretend data set

        rowid col
        1 1
        2 2
        3 1
        4 4
        5 1
        6 6

        Ok so the sub query said:
        select min(b.rowid)
        from table_1 b
        where b.col_dup_value s = a.col_dup_value s

        That would return 1, right? The min row where a.col = b.col.

        So plug that into the original query:

        delete from table_1 a
        where a.rowid >
        (select min(b.rowid)
        from table_1 b
        where b.col_dup_value s = a.col_dup_value s)

        You get:

        delete from table_1 a
        where a.rowid (1)

        Or am I missing something?

        --
        Mike Nugent
        Programmer/Author/DBA/Admin
        In search of employment, email for credentials
        news@remove-this.illuminatu s.org

        Comment

        Working...