How to select rows based on duplicates in one column

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • chromis
    New Member
    • Jan 2008
    • 113

    #16
    Hi coolsti,

    Thanks alot that's made things alot clearer. I realise that now I do need to GROUP BY the results by the name and email and from what you said about the time, it sounds like I should be combining the time and milliseconds before using the min() function. You can't find the minimum of two columns I take it?

    This leaves me with the following query:
    Code:
          SELECT scores.id, scores.name, scores.email, scores.time, scores.milliseconds
          FROM scores
          INNER JOIN
          (
          SELECT T2.id, T2.name, T2.email, min(T2.time + T2.milliseconds) AS mintime
          FROM scores T2 GROUP BY T2.name, T2.email
          ) T3 ON scores.name = T3.name AND scores.email = T3.email
    Which much to my frustration is giving me all the results again.

    So, I've selected the columns I want to output, next I've created a temporary table T2 and selected the id, name, email and min() of time + milliseconds, then GROUPed the results by name and email, which means I should have unique name and email values. I've then created another temporary table T3 and JOINed the results from T2 based on the name and email columns.
    However, it seems the GROUP BY has been ignored somehow, I'm now getting all results again and in my results mintime is not a column.

    Sorry to be a pain, I still can't get this to work.

    Comment

    • BHTanna
      New Member
      • Aug 2008
      • 31

      #17
      Pls try this one...


      SELECT T1.id, T1.name, T1.email, T1.time, T1.milliseconds
      FROM scores T1
      INNER JOIN
      (
      SELECT [name], min(time + milliseconds) AS mintime
      FROM scores GROUP BY name
      ) T2 ON T2.name = T1.name AND T2(mintime). = T1.(time + milliseconds)

      This will definately work...

      Comment

      • chromis
        New Member
        • Jan 2008
        • 113

        #18
        Originally posted by BHTanna
        Pls try this one...


        SELECT T1.id, T1.name, T1.email, T1.time, T1.milliseconds
        FROM scores T1
        INNER JOIN
        (
        SELECT [name], min(time + milliseconds) AS mintime
        FROM scores GROUP BY name
        ) T2 ON T2.name = T1.name AND T2(mintime). = T1.(time + milliseconds)

        This will definately work...
        Hi BHTanna,

        Unforunately after some necessary modification it returned incorrect results.

        Code:
        SELECT T1.id, T1.name, T1.email, T1.time, T1.milliseconds
        FROM scores T1
        INNER JOIN
        (
        SELECT name, min(time + milliseconds) AS mintime
        FROM scores GROUP BY name
        ) T2 ON T2.name = T1.name AND T2.mintime = (T1.time + T1.milliseconds)
        produced:
        Code:
          	7  	Joe Wins Again!  	test  	00:00:58  	31
        	13 	sam 	  	00:01:21 	87
        	1 	stuart 	sjfbsakldmfasd 	00:01:58 	19
        	11 	tony d Man 	Tony@tony.co.uk 	00:03:06 	53
        Result 7 is not the fastest time. 00:00:54 93 is.
        I've come up with the following but still it outputs the wrong results:
        Code:
              SELECT scores.id, scores.name, scores.email, scores.time, scores.milliseconds
              FROM scores 
              INNER JOIN
              (
              SELECT T2.id, T2.name, T2.email, T2.time, T2.milliseconds, min(T2.time + T2.milliseconds) AS mintime
              FROM scores T2 GROUP BY T2.name, T2.email
              ) T3 ON scores.name = T3.name AND scores.email = T3.email AND (scores.time + scores.milliseconds) = T3.mintime
        results:
        Code:
           	3  	Joe Wins Again!  	as  	00:01:15  	81
         	7 	Joe Wins Again! 	test 	00:00:58 	31
         	13 	sam 	  	00:01:21 	87
         	12 	sam 	sjhads 	00:03:12 	63
         	1 	stuart 	sjfbsakldmfasd 	00:01:58 	19
         	11 	tony d Man 	Tony@tony.co.uk 	00:03:06 	53

        Comment

        • BHTanna
          New Member
          • Aug 2008
          • 31

          #19
          Sorry.. I again gone thru your table structure...

          pls try this and let me know..

          select s.id, s.name, s.email, s.time, s.milliseconds
          FROM scores s inner join

          (
          SELECT [name],mintime,min(mi lliseconds) as minsec
          FROM scores T1
          INNER JOIN
          (
          SELECT [name], min(time) AS mintime
          FROM scores GROUP BY name
          ) T2 ON T2.name = T1.name AND T2(mintime). = T1.(time)
          group by [name],mintime
          ) m
          on m.name=s.name,m intime=time,min sec=millisecond s

          Hope it will work and let me know, after it works, if u dont get, will explain you.

          Comment

          • coolsti
            Contributor
            • Mar 2008
            • 310

            #20
            Originally posted by chromis
            Hi coolsti,

            Thanks alot that's made things alot clearer. I realise that now I do need to GROUP BY the results by the name and email and from what you said about the time, it sounds like I should be combining the time and milliseconds before using the min() function. You can't find the minimum of two columns I take it?

            This leaves me with the following query:
            Code:
                  SELECT scores.id, scores.name, scores.email, scores.time, scores.milliseconds
                  FROM scores
                  INNER JOIN
                  (
                  SELECT T2.id, T2.name, T2.email, min(T2.time + T2.milliseconds) AS mintime
                  FROM scores T2 GROUP BY T2.name, T2.email
                  ) T3 ON scores.name = T3.name AND scores.email = T3.email
            Which much to my frustration is giving me all the results again.

            So, I've selected the columns I want to output, next I've created a temporary table T2 and selected the id, name, email and min() of time + milliseconds, then GROUPed the results by name and email, which means I should have unique name and email values. I've then created another temporary table T3 and JOINed the results from T2 based on the name and email columns.
            However, it seems the GROUP BY has been ignored somehow, I'm now getting all results again and in my results mintime is not a column.

            Sorry to be a pain, I still can't get this to work.
            You are getting there, but you are making small mistakes :)

            First of all, your time field is of type TIME and your milliseconds field is of type tinyint, and I am not sure you can find the minimum time by just adding these two fields. MySQL may not understand that and thereby give you the wrong answer, as a TIME variable is stored in a certain format. You may need a more complicated expression (check the Mysql documentation on date and time functions) to calculate the minimum time from these two columns. No, you cannot do a MIN() operation on both columns to get the minimum time. You need to convert the two columns to one number representing the time in either seconds, milliseconds, or whatever, and then do the minimum on that.

            The next problem is that when you have calculated this minimum time in your subquery, you also need to calculate the same on all your data because you need to compare this calculated total time in the first part with the minimum time of the subquery in order to do the final selection of only the rows that pertain to the minimum time.

            Here is what I mean:
            Code:
                  SELECT scores.id, scores.name, scores.email, scores.time, scores.milliseconds
                  FROM scores
                  INNER JOIN
                  (
                  SELECT T2.id, T2.name, T2.email, EXPRESSION(T2.time; T2.milliseconds) AS mintime
                  FROM scores T2 GROUP BY T2.name, T2.email
                  ) T3 ON scores.name = T3.name AND scores.email = T3.email and 
                     EXPRESSION(scores.time;scores.milliseconds) = T3.mintime
            Notice two things:

            1) you need to substitute the "EXRESSION(x;y) " in the two places above with whatever mysql expression will give you the minimum time from the two fields time and milliseconds. (Sorry I have no time right now to find this out for you)

            2) I added an "and" clause to the end of the query. You have forgotten this in your attempt, and that is why you got all the rows back.

            See if this gets you a bit further.

            ------------------------------------------------------
            Edit: Ok, I looked this up in the mysql documentation and it MAY be what you want. It depends on what your time column actually represents. Is it the time that you want to minimize (without the milliseconds part)?

            Code:
                  SELECT scores.id, scores.name, scores.email, scores.time, scores.milliseconds
                  FROM scores
                  INNER JOIN
                  (
                  SELECT T2.id, T2.name, T2.email, 
                     min(time_to_sec(T2.time)+T2.milliseconds/1000) AS mintime
                  FROM scores T2 GROUP BY T2.name, T2.email
                  ) T3 ON scores.name = T3.name AND scores.email = T3.email 
                     and (time_to_sec(scores.time)+scores.milliseconds/1000) = T3.mintime
            ---------------------------------------------------------------------
            Edit again: I just tried this using your create table and some fake data, and the above did not work until I modified it like this:

            Code:
                  SELECT scores.id, scores.name, scores.email, scores.time, scores.milliseconds
                  FROM scores
                  INNER JOIN
                  (
                  SELECT T2.id, T2.name, T2.email, 
                     min(time_to_sec(T2.time)*1000+T2.milliseconds) AS mintime
                  FROM scores T2 GROUP BY T2.name, T2.email
                  ) T3 ON scores.name = T3.name AND scores.email = T3.email 
                     and (time_to_sec(scores.time)*1000+scores.milliseconds) = T3.mintime
            The equality constraint fails when you divide milliseconds by 1000, most likely because I did not cast the tinyint to a float. It works though if you multiply the time by 1000 instead of dividing the milliseconds by 1000.

            Comment

            • chromis
              New Member
              • Jan 2008
              • 113

              #21
              Bhtanna:

              I tryed to convert your SQL into MySQL but ended up getting very confused with the different table alias's I didn't have time to work it out unforunately. But thanks anyway, I've learnt much from what you have said.

              Coolsti:

              Thanks for taking all this time to help. I did end up realising that simply adding millseconds won't work due the type conversion as you described it, so I came up with the following (albeit incorrect):

              Code:
                    SELECT scores.id, scores.name, scores.email, scores.time, scores.milliseconds
                    FROM scores 
                    INNER JOIN
                    (
                    SELECT T2.id, T2.name, T2.email, T2.time, T2.milliseconds, min(T2.time + (T2.milliseconds / 100)) AS mintime
                    FROM scores T2 GROUP BY T2.name, T2.email
                    ) T3 ON scores.name = T3.name AND scores.email = T3.email AND (scores.time + (scores.milliseconds / 100)) = T3.mintime
              Which is similar to what you came up with only I got the expression wrong.

              I ran your query and the only thing that wasn't quite right was the order, so I put an ORDER statement and a LIMIT statement for 10 records and hopefully, this is everything I need:
              Code:
              SELECT scores.id, scores.name, scores.email, scores.time, scores.milliseconds
              FROM scores
              INNER JOIN (
              	SELECT T2.id, T2.name, T2.email, T2.time, T2.milliseconds, min( T2.time + ( T2.milliseconds /100 ) ) AS mintime
              	FROM scores T2
              	GROUP BY T2.name, T2.email
              ) T3 ON scores.name = T3.name
              AND scores.email = T3.email
              AND ( scores.time + ( scores.milliseconds /100 ) ) = T3.mintime
              ORDER BY mintime
              LIMIT 0 , 10
              which gives the results:
              Code:
              6  	Joe Wins Again!  	test  	00:00:54  	93
              3 	Joe Wins Again! 	as 	00:01:15 	81
              13 	sam 	  	00:01:21 	87
              1 	stuart 	sjfbsakldmfasd 	00:01:58 	19
              11 	tony d Man 	Tony@tony.co.uk 	00:03:06 	53
              12 	sam 	sjhads 	00:03:12 	63
              and that works perfectly now! I've definately learnt alot through doing this and you've both been extremely helpful, thankyou very much!

              Comment

              • coolsti
                Contributor
                • Mar 2008
                • 310

                #22
                Glad to have helped!

                Just make sure that your expression for the total time is working correctly for all instances. I am not sure what happens when you just add the expression

                time + milliseconds/100

                when time is a TIME field and millisecond is a tinyint field. It may conveniently work for your test cases but may fail elsewhere.

                In my final solution, I convert the TIME field to seconds and then I multiplied (perhaps incorrectly using 1000 instead of 100) to put it in the same units as the milliseconds field so they could be added.

                Your solution may indeed work! This is just to caution you to check.

                Comment

                • chromis
                  New Member
                  • Jan 2008
                  • 113

                  #23
                  Ah right yeah I meant to put the order by and limit on your query not mine! You were right in your expression as there are 1000 milliseconds in a second.

                  Comment

                  Working...