Finding almost duplicate rows in mysql

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Russell

    Finding almost duplicate rows in mysql

    I'm using MySQL 4.1.1

    I've inherited a database which has some (almost) duplicate rows.

    The databse is like this.

    userID
    userPosition
    userDepartment

    No user should be in more than one department but some are. I need to
    find all the users who are listed in more than one department for manual
    fixing.

    I can do something like

    SELECT *, COUNT(userID)
    FROM users
    GROUP BY COUNT(userID)
    ORDER BY COUNT(userID) DESC

    But that still gives me all 10,000 rows. All I want is where
    COUNT(userID) > 1

    If I add a WHERE clause to that effect I just get error messages.

    Any suggestions?

  • Chung Leong

    #2
    Re: Finding almost duplicate rows in mysql

    The query should be like this, I think:

    SELECT userID, COUNT(*)
    FROM users
    GROUP BY userID
    ORDER BY userID
    HAVING COUNT(*) > 1;


    Uzytkownik "Russell" <null@null.noem ail> napisal w wiadomosci
    news:DYVLb.1133 3$6L3.10494@new s-binary.blueyond er.co.uk...[color=blue]
    > I'm using MySQL 4.1.1
    >
    > I've inherited a database which has some (almost) duplicate rows.
    >
    > The databse is like this.
    >
    > userID
    > userPosition
    > userDepartment
    >
    > No user should be in more than one department but some are. I need to
    > find all the users who are listed in more than one department for manual
    > fixing.
    >
    > I can do something like
    >
    > SELECT *, COUNT(userID)
    > FROM users
    > GROUP BY COUNT(userID)
    > ORDER BY COUNT(userID) DESC
    >
    > But that still gives me all 10,000 rows. All I want is where
    > COUNT(userID) > 1
    >
    > If I add a WHERE clause to that effect I just get error messages.
    >
    > Any suggestions?
    >[/color]


    Comment

    • Tim Van Wassenhove

      #3
      Re: Finding almost duplicate rows in mysql

      On 2004-01-10, Russell <null@null.noem ail> wrote:[color=blue]
      > I'm using MySQL 4.1.1
      >
      > I've inherited a database which has some (almost) duplicate rows.
      >
      > The databse is like this.
      >
      > userID
      > userPosition
      > userDepartment
      >
      > No user should be in more than one department but some are. I need to
      > find all the users who are listed in more than one department for manual
      > fixing.
      >
      > I can do something like
      >
      > SELECT *, COUNT(userID)
      > FROM users
      > GROUP BY COUNT(userID)
      > ORDER BY COUNT(userID) DESC[/color]

      That needs to be:
      SELECT *
      FROM users
      GROUP BY userID HAVING COUNT(*) > 1
      ORDER BY userID DESC


      Oh, and a good database design requires you have a primary key for each
      table. I suggest you use userID. This way, next time when another record
      is inserted with a userID that already exists, mysql will complain about
      that ;)


      --

      Comment

      • Russell

        #4
        Re: Finding almost duplicate rows in mysql

        Chung Leong wrote:
        [color=blue]
        > The query should be like this, I think:
        >
        > SELECT userID, COUNT(*)
        > FROM users
        > GROUP BY userID
        > ORDER BY userID
        > HAVING COUNT(*) > 1;[/color]

        You, Sir, are a star :) Thanks!

        The HAVING clause has to come before ORDER BY but other than that it's
        perfect - and yes, once the data is fixed, the Primary Key will be too.

        Thanks,

        Russell.

        Comment

        • Jeeke

          #5
          Re: Finding almost duplicate rows in mysql


          Hi,

          try something like (not tested but it should be something like this) :

          select
          *
          from
          users u
          where
          (
          (select count(*) from users where userID = u.userID) > 1
          )

          It might not be entirely correct but it should get you started.

          Good luck !




          On Sat, 10 Jan 2004 16:40:42 +0000, Russell <null@null.noem ail> wrote:
          [color=blue]
          > I'm using MySQL 4.1.1
          >
          > I've inherited a database which has some (almost) duplicate rows.
          >
          > The databse is like this.
          >
          > userID
          > userPosition
          > userDepartment
          >
          > No user should be in more than one department but some are. I need to
          > find all the users who are listed in more than one department for manual
          > fixing.
          >
          > I can do something like
          >
          > SELECT *, COUNT(userID)
          > FROM users
          > GROUP BY COUNT(userID)
          > ORDER BY COUNT(userID) DESC
          >
          > But that still gives me all 10,000 rows. All I want is where
          > COUNT(userID) > 1
          >
          > If I add a WHERE clause to that effect I just get error messages.
          >
          > Any suggestions?
          >[/color]



          --
          Jeeke

          Comment

          Working...