SQL id of record at which maximum occurs, by group

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • SjaakBanaan
    New Member
    • Jun 2009
    • 6

    SQL id of record at which maximum occurs, by group

    Hey all,

    I have a question that ought to be simple but has given me headaches for a while.

    I have a table with contact email addresses, say.

    Code:
    T
    ID ContID Priority Emailaddress
    ----------------------------------------------
    1     1         10     email1@contact1.fi
    2     1          7      email2@contact1.fi
    3     2          9      email1@contact2.fi
    4     2          8          ....
    ...

    Now, the main question:
    I want to create a query that gives me
    - the ID-number of the records
    - that correspond to the highest-priority emailaddress
    - of each contact.


    I then want to use this query-result of ID-numbers to INNER JOIN to table T (or actually a table with more fields) to get the fields I'm interested in; the joining being done on the ID-fields, something like:
    Code:
    SELECT t1.ContID, t1.Emailaddress
    FROM t AS t1
    INNER JOIN
    (
      SELECT "the ID with the highest priority"
      FROM t AS t2
      GROUP BY t2.ContID
    ) AS t3 ON t3.ID = t1.ID
    The problem is that it's not possible to get "the ID of the record with the highest-priority email for each contact" with SQL- at least, none that I know of.
    (It IS possible to "get the maximum ID of the records for each contact" or "get the maximum priority for each contact", as shown below.)


    I did think of some solutions, but none that I like.


    I could do the JOINing on other fields, something like
    Code:
    SELECT t1.ContID, t1.Emailaddress
    FROM t AS t1
    INNER JOIN
    (
      SELECT t2.ContID, MAX(t2.priority) AS MaxPrio
      FROM t AS t2
      GROUP BY t2.ContID
    ) AS t3 ON t3.ContID = t1.ContID AND t3.MaxPrio = t1.Priority
    but this has the disadvantage of needing to join on multiple fields that may not necessarily be unique (there may be a contact with several emailaddresses with the same priority) - I want to get only one record per contact, maximum.

    This could multiplicity can be solved by GROUPing also in the outer query, like this:
    Code:
    SELECT t1.ContID, FIRST(t1.Emailaddress) AS FirstEmail
    FROM t AS t1
    INNER JOIN
    (
      SELECT t2.ContID, MAX(t2.priority) AS MaxPrio
      FROM t AS t2
      GROUP BY t2.ContID
    ) AS t3 ON t3.ContID = t1.ContID AND t3.MaxPrio = t1.Priority
    GROUP BY t1.ContID
    but also this solution has the disadvantage of JOINing on multiple non-unique fields.
    It also makes it difficult to include additional conditions: if the table has a yes/no field 'IsWork', and I want to get the highest-priority work-email, I need to include the 'WHERE IsWork = TRUE' line in both the inner and the outer query.

    It's just not pretty!


    Another possible solution is
    Code:
    SELECT t1.contID, t1.EmailAddress
    FROM t AS t1 
    INNER JOIN 
    (
    SELECT t2.ID, COUNT(*) 
      FROM t AS t2
      INNER JOIN t AS t3 ON (t2.ContID = t3.ContID) AND ((t2.priority < t3.priority) OR (t2.priority = t3.priority AND t2.ID <  t3.ID)) 
      GROUP BY t2.ID
      HAVING COUNT(*)  =1
      )  AS t4 ON t4.ID =t4.ID
    ORDER BY t1.contID
    (adjusted from http://rickosborne.org/blog/index.ph...grouped-query/, with the "(t2.ContID = t3.ContID) AND ((t2.priority < t3.priority) OR (t2.priority = t3.priority AND t2.ID < t3.ID)) " line to force a consistent ordering upon the recordset, even if a contact has several emailaddresses with the same priority.)

    I like this solution; all the selecting of the correct ID numbers is done isolatedly in the inner query, and additional conditions can be included here without needing to change something in the outer query.

    But it's really slow! :(

    Can someone help me out? Suggestions for an adequete, pretty, fast query to do the job?

    Thanks!!

    Sjaak

    if wanted i can prepare an Access file, of course.
    I'm using access 2007, SQL ANSI-92
  • Delerna
    Recognized Expert Top Contributor
    • Jan 2008
    • 1134

    #2
    I think you have all SQL possibilities pretty much covered but am interested to see alternatives if any.

    Comment

    • NeoPa
      Recognized Expert Moderator MVP
      • Oct 2006
      • 32662

      #3
      You would have thought that something like this would have been built into the SQL standards. It wasn't as far as I can see :(

      I would use either of two methods :
      1. Code:
        SELECT T.ContID,
               T.Emailaddress
        FROM   T INNER JOIN
               (
            SELECT   [ContID]
                     Val(Mid(Max(Format([Priority],'0000') & T.ID),5)) AS ID
            FROM     T
            GROUP BY [ContID]
               ) AS subT ON T.ContID=subT.ContID
      2. Code:
        SELECT [ContID],
               [Emailaddress]
        FROM   T AS T1
        WHERE  [Priority] In
               (
            SELECT   Max([Priority]) AS Data
            FROM     T AS T2
            GROUP BY [ContID]
            WHERE    T2.ContID=T1.ContID
               )

      Comment

      • SjaakBanaan
        New Member
        • Jun 2009
        • 6

        #4
        NeoPa, thanks for your suggestions

        Your first is quite creative :) (There's a typo in that the joining should be on the ID-fields, btw)

        The second one unfortunately doesnt work (even after reversing the WHERE and GROUP BY lines), as it lists all records with the same (highest) priority of a contact, not just one: if a contact has 2 emailaddresses both with prio=10 then both will show up in the final result.

        I'll think I'll go for the first, should be pretty fast and is quite readable. Unless someone else has a sudden great brain wave, of course!

        Sjaak

        Comment

        • NeoPa
          Recognized Expert Moderator MVP
          • Oct 2006
          • 32662

          #5
          Originally posted by SjaakBanaan
          Your first is quite creative :) (There's a typo in that the joining should be on the ID-fields, btw)
          Thank you. I had to work this out some while back, but having discovered this approach I missed the (generally more flexible) SQL approach until much later.

          Your point about the joining is correct (I discovered after a bit of a think through). The SQL need not even return the ContID field in fact :
          Code:
          SELECT T.ContID,
                 T.Emailaddress
          FROM   T INNER JOIN
                 (
              SELECT   Val(Mid(Max(Format([Priority],'0000') & T.ID),5)) AS ID
              FROM     T
              GROUP BY [ContID]
                 ) AS subT ON T.ID=subT.ID
          Originally posted by SjaakBanaan
          The second one unfortunately doesnt work (even after reversing the WHERE and GROUP BY lines), as it lists all records with the same (highest) priority of a contact, not just one: if a contact has 2 emailaddresses both with prio=10 then both will show up in the final result.
          Apologies for the incorrect order. I'm afraid I did it off the top of my head rather than going through any testing cycle.

          As far as the data is concerned, as your otherwise very thorough OP didn't mention how you wanted to handle multiple results I rather assumed that this was not possible in your scenario. As I can see no way, using aggregation, of selecting one above any other I think this approach can be considered a dud.

          Comment

          Working...