Best way for finding duplicate entries in table?

**Aggro** · Jul 23 '05, 07:06 AM

Re: Best way for finding duplicate entries in table?

andreas.maurer1 971@web.de wrote:
[color=blue]
> Now my question:
> Are there better and especially faster ways to find duplicate entries
> within a table without having to create a seperate table which will be
> filled with a SELECT DISTINCT... statement from the "old" table?[/color]

select id,count(field) as amount
from table group by field having amount > 1;

**andreas.maurer1971@web.de** · Jul 23 '05, 07:06 AM

Re: Best way for finding duplicate entries in table?

Aggro schrieb:[color=blue]
> andreas.maurer1 971@web.de wrote:
>[color=green]
> > Now my question:
> > Are there better and especially faster ways to find duplicate entries
> > within a table without having to create a seperate table which will be
> > filled with a SELECT DISTINCT... statement from the "old" table?[/color]
>
> select id,count(field) as amount
> from table group by field having amount > 1;[/color]

Hi Aggro,

thank you very much for your help, it shows me duplicate entries but,
as well due to a not better specified question of mine, some problems
are left open.
First of all your statement, when issued in phpmyadmin, tells me that
it found in total all rows of the table. Do you have any idea, why?
Another problem I have is: It shows me the duplicate entries in "field"
but only one of the many keys related to this entry. Therefore, if I
wanted to delete all but one of these entries, I have to issue a lot of
SELECT and DELETE statements in order to clean my table.

Of course I always could create another (temporary) table and fill it
with DISTINCT entries of table1 but this is sometimes not an option for
me.

Do you have another idea how to fix this?

Thank's again,

Andy

**Bill Karwin** · Jul 23 '05, 07:06 AM

Re: Best way for finding duplicate entries in table?

andreas.maurer1 971@web.de wrote:[color=blue][color=green]
>>select id,count(field) as amount
>>from table group by field having amount > 1;[/color][/color]
[color=blue]
> Another problem I have is: It shows me the duplicate entries in "field"
> but only one of the many keys related to this entry. Therefore, if I
> wanted to delete all but one of these entries, I have to issue a lot of
> SELECT and DELETE statements in order to clean my table.[/color]

This should give the list of fields that occur more than once in the
database:

SELECT t2.`field`
FROM `table` AS t2
GROUP BY t2.`field`
HAVING COUNT(t2.`field `) > 1;

This should give the list of id's in whose records those values for
field occur:

SELECT t.id
FROM `table` AS t
WHERE t.`field` IN (
SELECT t2.`field`
FROM `table` AS t2
GROUP BY t2.`field`
HAVING COUNT(t2.`field `) > 1);

This should give the list of the lowest id for each given value in field:

SELECT MIN(t3.id)
FROM `table` AS t3
GROUP BY t3.`field`;

Putting it all together, this should delete records who contain values
for field that appear on more than one row, but do not delete such
records with the lowest id for a given value of `field`:

DELETE FROM `table`
WHERE id IN (
SELECT t.id
FROM `table` AS t
WHERE t.`field` IN (
SELECT t2.`field`
FROM `table` AS t2
GROUP BY t2.`field`
HAVING COUNT(t2.`field `) > 1))
AND id NOT IN (
SELECT MIN(t3.id)
FROM `table` AS t3
GROUP BY t3.`field`);

I'm assuming that you want to keep each record with the lowest id value.
In order to do this in a single operation, you must have some kind of
uniform policy like this, that applies to all cases. You could use
MAX(t3.id) instead to delete all but the most recent records, but other
than those two options, the alternatives get more complex to implement.

Regards,
Bill K.

Best way for finding duplicate entries in table?

Best way for finding duplicate entries in table?

Comment

Comment

Comment