Finding almost duplicate rows in mysql

**Chung Leong** · Jul 17 '05, 02:22 AM

Re: Finding almost duplicate rows in mysql

The query should be like this, I think:

SELECT userID, COUNT(*)
FROM users
GROUP BY userID
ORDER BY userID
HAVING COUNT(*) > 1;

Uzytkownik "Russell" <null@null.noem ail> napisal w wiadomosci
news:DYVLb.1133 3$6L3.10494@new s-binary.blueyond er.co.uk...[color=blue]
> I'm using MySQL 4.1.1
>
> I've inherited a database which has some (almost) duplicate rows.
>
> The databse is like this.
>
> userID
> userPosition
> userDepartment
>
> No user should be in more than one department but some are. I need to
> find all the users who are listed in more than one department for manual
> fixing.
>
> I can do something like
>
> SELECT *, COUNT(userID)
> FROM users
> GROUP BY COUNT(userID)
> ORDER BY COUNT(userID) DESC
>
> But that still gives me all 10,000 rows. All I want is where
> COUNT(userID) > 1
>
> If I add a WHERE clause to that effect I just get error messages.
>
> Any suggestions?
>[/color]

**Tim Van Wassenhove** · Jul 17 '05, 02:22 AM

Re: Finding almost duplicate rows in mysql

On 2004-01-10, Russell <null@null.noem ail> wrote:[color=blue]
> I'm using MySQL 4.1.1
>
> I've inherited a database which has some (almost) duplicate rows.
>
> The databse is like this.
>
> userID
> userPosition
> userDepartment
>
> No user should be in more than one department but some are. I need to
> find all the users who are listed in more than one department for manual
> fixing.
>
> I can do something like
>
> SELECT *, COUNT(userID)
> FROM users
> GROUP BY COUNT(userID)
> ORDER BY COUNT(userID) DESC[/color]

That needs to be:
SELECT *
FROM users
GROUP BY userID HAVING COUNT(*) > 1
ORDER BY userID DESC

Oh, and a good database design requires you have a primary key for each
table. I suggest you use userID. This way, next time when another record
is inserted with a userID that already exists, mysql will complain about
that ;)

--

http://home.mysth.be/~timvw

**Russell** · Jul 17 '05, 02:22 AM

Re: Finding almost duplicate rows in mysql

Chung Leong wrote:
[color=blue]
> The query should be like this, I think:
>
> SELECT userID, COUNT(*)
> FROM users
> GROUP BY userID
> ORDER BY userID
> HAVING COUNT(*) > 1;[/color]

You, Sir, are a star :) Thanks!

The HAVING clause has to come before ORDER BY but other than that it's
perfect - and yes, once the data is fixed, the Primary Key will be too.

Thanks,

Russell.

**Jeeke** · Jul 17 '05, 02:22 AM

Re: Finding almost duplicate rows in mysql

Hi,

try something like (not tested but it should be something like this) :

select
*
from
users u
where
(
(select count(*) from users where userID = u.userID) > 1
)

It might not be entirely correct but it should get you started.

Good luck !

On Sat, 10 Jan 2004 16:40:42 +0000, Russell <null@null.noem ail> wrote:
[color=blue]
> I'm using MySQL 4.1.1
>
> I've inherited a database which has some (almost) duplicate rows.
>
> The databse is like this.
>
> userID
> userPosition
> userDepartment
>
> No user should be in more than one department but some are. I need to
> find all the users who are listed in more than one department for manual
> fixing.
>
> I can do something like
>
> SELECT *, COUNT(userID)
> FROM users
> GROUP BY COUNT(userID)
> ORDER BY COUNT(userID) DESC
>
> But that still gives me all 10,000 rows. All I want is where
> COUNT(userID) > 1
>
> If I add a WHERE clause to that effect I just get error messages.
>
> Any suggestions?
>[/color]

--
Jeeke

Finding almost duplicate rows in mysql

Finding almost duplicate rows in mysql

Comment

Comment

Comment

Comment