Top 2 per category

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • Diablette
    New Member
    • May 2010
    • 6

    Top 2 per category

    I need to get the largest and 2nd largest amounts in a column for each postcode.
    I then want to add these two figures to a table which has other information for each postcode (e.g. counts, totals, etc).
    I can get largest fine:
    Code:
    SELECT [4_digit_postcode],  max([Total_SI]) AS [LargestRiskSI]
       INTO tLargestRiskPerPostcode
       FROM [Location file_DO]
       GROUP BY [4_digit_postcode];
    And update the table where the postcodes match.

    And I get get the top 2 overall:
    Code:
    SELECT  top 2 Total_SI
      FROM [Location file_DO]
      ORDER BY Total_SI DESC;
    And the top 2 for a specific postcode, but its way too slow to run this code for every single postcode in Excel (I tried).
    For each postcode:
    Code:
    SELECT  top 2 TotalSI
      FROM [Location file_DO]
      WHERE  Postcode = **variablePostcode**
      ORDER BY TotalSI DESC
    I'm having no luck with google trying to get second largest by postcode. The tables are in Access, so it's Access SQL. Would you be able to tell me what my code should look like? It needs to be run regularly from an Excel VBA macro, with the results being put into an Excel sheet.
    If there is a tie at 1, I'd like the same number in largest and 2nd largest.
  • Diablette
    New Member
    • May 2010
    • 6

    #2
    Any help with this will be greatly appreciated.
    I have to get it working for work, and I'm really stuck.
    Thank you

    Comment

    • Jim Doherty
      Recognized Expert Contributor
      • Aug 2007
      • 897

      #3
      Originally posted by Diablette
      Any help with this will be greatly appreciated.
      I have to get it working for work, and I'm really stuck.
      Thank you
      What is your data row size here are you working in excel or Access?

      Comment

      • Diablette
        New Member
        • May 2010
        • 6

        #4
        Hi Jim,
        I trust you will be able to help me. Thank you for taking the time to look at my query.

        The original table that I am working with today has 100,000 rows.
        It has 25,000 unique postcode values.
        In the data I am working with today the postcodes are numbers, but other tables will have alphanumeric postcodes. There are no memo or very long string fields in the table.
        (I hope I understand your question on data row size)

        The table is in Access.
        I am writing code in Excel which runs various queries on the tables in Access and puts summary information into Excel sheets.

        Comment

        • Jim Doherty
          Recognized Expert Contributor
          • Aug 2007
          • 897

          #5
          Originally posted by Diablette
          Hi Jim,
          I trust you will be able to help me. Thank you for taking the time to look at my query.

          The original table that I am working with today has 100,000 rows.
          It has 25,000 unique postcode values.
          In the data I am working with today the postcodes are numbers, but other tables will have alphanumeric postcodes. There are no memo or very long string fields in the table.
          (I hope I understand your question on data row size)

          The table is in Access.
          I am writing code in Excel which runs various queries on the tables in Access and puts summary information into Excel sheets.
          OK well I have an appointment very shortly but I will come back to you on this. In the meantime if you have anything there ie: table field names and structures to post back maybe I can replicate something when i come back to test things out and help you

          Comment

          • Diablette
            New Member
            • May 2010
            • 6

            #6
            Hi Jim,
            The sort of table I have (ignoring the rest of the data):
            Code:
            Postcode  Name     TotalSI
            2600    Place A      500
            2600    Place A      400
            3800    Place B      600
            3800    Place B      500
            3800    Place B      400
            3800    Place B      300
            4460    Place C      300
            4460    Place C      300
            4460    Place C      200
            1100    Place D      100
            What I need to end up with somehow:
            Code:
            Postcode  Largest 2nd largest
            2600        500    400
            3800        600    500
            4460        300    300
            1100        100    null (or 0)
            I hope this makes my question clearer.
            Last edited by NeoPa; May 13 '10, 07:06 PM. Reason: Added [CODE] tags for clarity

            Comment

            • NeoPa
              Recognized Expert Moderator MVP
              • Oct 2006
              • 32661

              #7
              This won't be done simply using SQL. You'll need some sort of public VBA function. I would suggest a function that takes two parameters. One to identify the grouping (in this case [Postcode]) and one as the value.

              The function would return the top two values in a string. The SQL would need to get the Max() of the returned values within any group, and separate out the two values into separate fields.

              It (the function) would need to maintain static data of the grouping last used and the highest two values to date. If would need to determine in each iteration of the function whether or not the newly passed value should effect either of the top two positions. The Max() of all the results in any grouping should always be the correct result to use.

              Comment

              • Diablette
                New Member
                • May 2010
                • 6

                #8
                Hi NeoPa,

                Thank you for your reply.

                So am I understanding correctly that I can't even get a table like this using SQL (i.e. with all but the top 2 values per group/postcode excluded):
                Code:
                Postcode  Value 
                2600    500
                2600    400 
                3800    600
                3800    500 
                4460    300
                4460    300 
                1100    100
                The only way is doing this sql within a function:
                Code:
                SELECT  top 2 TotalSI 
                  FROM [Location file_DO] 
                  WHERE  Postcode = **variablePostcode** 
                  ORDER BY TotalSI DESC
                and calling it 25,000 times?
                (This is the code I wrote in my first post.)

                Apologies, but I don't understand this part of your reply:
                It (the function) would need to maintain static data of the grouping last used and the highest two values to date. If would need to determine in each iteration of the function whether or not the newly passed value should effect either of the top two positions.

                I also don't understand what the value parameter is that you refer to the function taking in. I only understand putting in one variable (the postcode) and getting out the top two values.

                Comment

                • NeoPa
                  Recognized Expert Moderator MVP
                  • Oct 2006
                  • 32661

                  #9
                  Originally posted by Diablette
                  Diablette: So am I understanding correctly that I can't even get a table like this using SQL (i.e. with all but the top 2 values per group/postcode excluded):
                  Absolutely, you are. There is no way in SQL that I'm aware of.

                  At this point I should add an update.

                  I set up a test bed database and actually managed to get something to work on lines very similar to what you're after.

                  I started by setting up a table with values :
                  Table Name=[tblGroup]
                  Code:
                  [I]ID  Group  Value[/I]
                   1    A       1
                   2    A       2
                   3    A       3
                   4    A       4
                   5    B       2
                   6    B       4
                   7    B       6
                   8    C       5
                   9    D       4
                  10    D       8
                  I then put some SQL together :
                  Code:
                  SELECT   tGo.ID
                         , tGo.Group
                         , tGo.Value
                  
                  FROM     tblGroup AS tGo
                  
                  WHERE    tGo.ID IN(
                      SELECT   TOP 2 tGi.ID
                      FROM     tblGroup AS tGi
                      WHERE    tGi.Group=tGo.Group
                      ORDER BY tGi.Value DESC
                      )
                  
                  ORDER BY tGo.Group
                         , tGo.Value DESC
                  This produced these results :
                  Code:
                  [I]ID  Group  Value[/I]
                   4    A       4
                   3    A       3
                   7    B       6
                   6    B       4
                   8    C       5
                  10    D       8
                   9    D       4
                  You will notice that it cannot create a second value for the Group C for you, but otherwise is fundamentally what you want I think.

                  Comment

                  • NeoPa
                    Recognized Expert Moderator MVP
                    • Oct 2006
                    • 32661

                    #10
                    Originally posted by Diablette
                    Diablette: Apologies, but I don't understand this part of your reply:
                    It (the function) would need to maintain static data of the grouping last used and the highest two values to date. If would need to determine in each iteration of the function whether or not the newly passed value should effect either of the top two positions.

                    I also don't understand what the value parameter is that you refer to the function taking in. I only understand putting in one variable (the postcode) and getting out the top two values.
                    I would just like to add that it's always pleasant to get an intelligent response from a member when they are confused by something I've written. The most common responses by far are :
                    1. Ignoring it completely.
                    2. Explaining why I must be wrong.

                    In this case it is so much easier to deal with :)
                    First, the reference to the function. You need to invert your understanding somewhat to get the picture. The SQL is not processed from within the function. Rather, the function is called by the SQL.

                    Second, for the function to be able to do its job it would need both the data that is grouped by, and the value. It is called for every record in the table (but it's important that the data come in already sorted by the grouped field). The function can remember previous matching records so can determine the highest two values it has so far seen. Until all the data for the group has been processed there will be many return values of the function that are simply wrong, but the Max() of these should always be correct.

                    With the SQL now proving possible, there is no need to worry about these details at all of course.

                    Comment

                    • Diablette
                      New Member
                      • May 2010
                      • 6

                      #11
                      Thank you very, very much.
                      I got your SQL code to work.
                      And bonus it doesn't take too long.
                      I think I will create a table from this code, another table just using max to get the largest with their ids and then a unmatched query on id to get the 2nd largest values, where they exist.
                      Your help is greatly appreciated.
                      And thank you for the compliment :)

                      Comment

                      • NeoPa
                        Recognized Expert Moderator MVP
                        • Oct 2006
                        • 32661

                        #12
                        It was a pleasure, and Welcome to Bytes!

                        Comment

                        Working...