How do I select random subjects from different groups in one table?

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • lilp32
    New Member
    • Oct 2010
    • 43

    How do I select random subjects from different groups in one table?

    I am trying to write a program to randomly select 20 subjects from 20 groups. All groups and subjects are in one table. The variable for group is SecondaryID and the variable for subject is SubjectID. I found some code online that works for the random selection but have been unable to modify the code to do this for each SecondaryID. Any suggestions would be much appreciated.

    Thanks!

    Code:
    Option Compare Database
    Sub PickRandom()
        Dim db As Database
        Dim tdf As TableDef
        Dim fld As Field
        Dim rst As Recordset
        Dim strSQL As String
        Dim strTableName As String
          
    ' 1: Create a new temporary table containing the required fields
        strSQL = "SELECT tblDATA.SecondaryID, tblDATA.SubjectID " & _
                 "INTO tblTemp " & _
                 "FROM tblDATA;"
        DoCmd.SetWarnings False
        DoCmd.RunSQL strSQL
        DoCmd.SetWarnings True
        
    ' 2: Add a new field to the new table
        Set db = CurrentDb()
        Set tdf = db.TableDefs("tblTemp")
        Set fld = tdf.CreateField("RandomNumber", dbSingle)
        tdf.Fields.Append fld
    
    ' 3: Place a random number in the new field for each record
        Set rst = db.OpenRecordset("tblTemp", dbOpenTable)
        rst.MoveFirst
        Do
            Randomize
            rst.Edit
                rst![RandomNumber] = Rnd()
            rst.Update
            rst.MoveNext
        Loop Until rst.EOF
        rst.Close
        Set rst = Nothing
        
    ' 4: Sort the tblTemp by the random number and move the top 20 into a new table
        strTableName = SecondaryID & Format(Date, "ddmmmyyyy")
        strSQL = "SELECT TOP 20 tblTemp.secondaryID, tblTemp.SubjectID " & _
                 "INTO " & strTableName & " " & _
                 "FROM tblTemp " & _
                 "ORDER BY tblTemp.RandomNumber;"
        DoCmd.SetWarnings False
        DoCmd.RunSQL strSQL
        DoCmd.SetWarnings True
    
    ' 5: Delete the temporary table
        db.TableDefs.Delete ("tblTemp")
    End Sub
  • C CSR
    New Member
    • Jan 2012
    • 144

    #2
    The code works for me. I noticed the var SecondaryID for part of the Tablename is empty (not initialized). I'll check it somemore though. (can't now). Check data types (I just used "long integer") and check spelling, etc. Look for a table with just the date.

    Comment

    • NeoPa
      Recognized Expert Moderator MVP
      • Oct 2006
      • 32656

      #3
      It's not very clear what you even want, but I'll guess you have a table with records split across 20 possible groups, and you're looking to select one subject from each of these groups, randomly. If that doesn't express it properly then you need to try again yourself, because without a question that makes sense, answers are hard to come by.

      As you also supply no information to indicate what format Subject and Group values are in I suppose we should assume it's textual. As random data is produced using the Rnd() function, which returns a value such that 0 <= X < 1, we need, somehow, to convert such a value into something that will select one of n records for each Group value, where n is the number of records for that Group.

      I would suggest Recordset processing might be used to determine the actual record to select. Determine the number of records for each Group (DCount() could be used for this or processing through the recordset in order counting the matching records) then multiply this by the generated random number and round back to an integer value. This will be the number of records to move on to from the first of the Group.

      You can produce a list of PK IDs to include within an In() list or process them as you find them.

      Comment

      • ADezii
        Recognized Expert Expert
        • Apr 2006
        • 8834

        #4
        You do not need to Create a Temporary Table, Create a Field, Sort the Temporary Table, then Delete the Temporary Table. The following Code, based on NeoPa's idea of Recordset processing, will pick a Random Subject for each Unique Group, based on the Number of Subjects for that Group. I've posted some Sample Data, the Code Logic, as well as Results for 3 Trial Runs:
        1. tblData
          Code:
          SecondaryID	SubjectID
          Group A	        A1
          Group A	        A2
          Group A	        A3
          Group D	        D1
          Group X	        X1
          Group X	        X2
          Group X	        X3
          Group X	        X4
          Group D	        D2
          Group D	        D3
          Group D	        D4
          Group D	        D5
          Group D	        D6
          Group D	        D7
          Group D	        D8
          Group D	        D9
          Group D	        D10
          Group X	        X5
        2. Code Logic
          Code:
          Dim MyDB As DAO.Database
          Dim rstUniqueGroups As DAO.Recordset
          Dim rstSubjectsForGroup As DAO.Recordset
          Dim strSQL1 As String
          Dim strSQL2 As String
          Dim intNumOfRecs As Integer     'Number of Subjects per Group
          
          Randomize
          
          Set MyDB = CurrentDb
          
          'Recordset representing Distinct Groups in tblData
          strSQL1 = "SELECT DISTINCT [SecondaryID] FROM tblData"
          
          Set rstUniqueGroups = MyDB.OpenRecordset(strSQL1, dbOpenSnapshot, dbOpenForwardOnly)
          
          With rstUniqueGroups
            Do While Not .EOF
              'Recordset representing Subjects for each Group in tblData
              strSQL2 = "SELECT * FROM tblData WHERE [SecondaryID] = '" & ![SecondaryID] & "';"
              Set rstSubjectsForGroup = MyDB.OpenRecordset(strSQL2, dbOpenSnapshot)
                rstSubjectsForGroup.MoveLast: rstSubjectsForGroup.MoveFirst
                intNumOfRecs = rstSubjectsForGroup.RecordCount
                  rstSubjectsForGroup.Move Int(Rnd * intNumOfRecs)
                  Debug.Print "Group: " & ![SecondaryID] & vbCrLf & "Number of Records: " & intNumOfRecs & _
                               vbCrLf & "Random Subject for " & ![SecondaryID] & ": " & rstSubjectsForGroup![SubjectID]
                Debug.Print "**********************************"
              .MoveNext
            Loop
          End With
          
          rstUniqueGroups.Close
          rstSubjectsForGroup.Close
          Set rstUniqueGroups = Nothing
          Set rstSubjectsForGroup = Nothing
        3. Results
          Code:
          Group: Group A
          Number of Records: 3
          Random Subject for Group A: A3
          **********************************
          Group: Group D
          Number of Records: 10
          Random Subject for Group D: D5
          **********************************
          Group: Group X
          Number of Records: 5
          Random Subject for Group X: X2
          **********************************
          
          Group: Group A
          Number of Records: 3
          Random Subject for Group A: A2
          **********************************
          Group: Group D
          Number of Records: 10
          Random Subject for Group D: D4
          **********************************
          Group: Group X
          Number of Records: 5
          Random Subject for Group X: X5
          **********************************
          
          Group: Group A
          Number of Records: 3
          Random Subject for Group A: A1
          **********************************
          Group: Group D
          Number of Records: 10
          Random Subject for Group D: D2
          **********************************
          Group: Group X
          Number of Records: 5
          Random Subject for Group X: X1
          **********************************

        Comment

        • NeoPa
          Recognized Expert Moderator MVP
          • Oct 2006
          • 32656

          #5
          Let me start by congratulating ADezii on some good code that produces the correct results. I also knocked up some code to illustrate how an idea, very similar to my previously posted one, could work without the need for so many opens of recordsets (as this is relatively expensive of resources). It uses the concept of processing through the table until a Group's records have been passed, then moving back a random numer of records for the selected record, then forward again to the first of the next Group to continue onwards. I haven't a rig to test it on but I would expect it to work efficiently :
          Code:
          Dim cdb As DAO.Database
          Dim strSQL As String, strGroup As String
          Dim lngNumSubs As Long, lngRand As Long
          Dim blnCheck As Boolean
          
          Call Randomize
          Set cdb = CurrentDb
          strSQL = "SELECT   * " & _
                   "FROM     [tblDATA] " & _
                   "ORDER BY [SecondaryID]"
          
          With cdb.OpenRecordset(strSQL, dbOpenSnapshot, dbReadOnly)
              Do
                  If .EOF Then
                      blnCheck = True
                  ElseIf .SecondaryID <> strGroup Then
                      blnCheck = True
                  End If
                  If blnCheck Then
                      lngRand = lngNumSubs - Int(Rnd() * lngNumSubs)
                      Call .Move(Rows:=-lngRand)
                      Debug.Print "Group: " & strGroup & vbCrLf & _
                                  "Number of Records: " & lngNumSubs & vbCrLf & _
                                  "Random Subject: " & .SubjectID & vbCrLf & _
                                  "**********************************"
                      Call .Move(Rows:=lngRand)
                      If Not .EOF Then
                          strGroup = .SecondaryID
                          lngNumSubs = 0
                          blnCheck = False
                      End If
                  End If
                  If .EOF Then Exit Do
                  lngNumSubs = lngNumSubs + 1
                  Call .MoveNext
              Loop
          End With
          PS. I should point out I used ADezii's code as a starting position. It's only changed in certain places.

          Comment

          • Rabbit
            Recognized Expert MVP
            • Jan 2007
            • 12517

            #6
            Here's a SQL version
            Code:
            SELECT T1.*
            FROM Table1 T1
               INNER JOIN (
                  SELECT SecondaryID, MAX(Rnd(-1 * DatePart("s", NOW()) * ASC(SecondaryID) * RIGHT(SubjectID, 1))) AS Expr1
                  FROM Table1
                  GROUP BY SecondaryID
               ) T2
               ON T1.SecondaryID = T2.SecondaryID
            WHERE T2.Expr1 = Rnd(-1 * DatePart("s", NOW()) * ASC(T1.SecondaryID) * RIGHT(T1.SubjectID, 1));
            It would be preferable to replace the stuff in the Rnd with a unique ID.

            Comment

            • NeoPa
              Recognized Expert Moderator MVP
              • Oct 2006
              • 32656

              #7
              This is really, really, clever.

              Unfortunately, I'd take some convincing of the randomness of [Expr1], effected as it is by the values of the two fields within the calculation. A unique ID would also suffer from the same effect in my view. Some very smart thinking behind the concept though.

              PS. I'm open to question if you disagree Rabbit.
              PPS. Maybe if the multiplication by any of the items (or parts thereof) found within the record were added to the parameter part of the Rnd() call instead of outside of it, the process might work perfectly?
              Last edited by NeoPa; Jan 18 '12, 02:21 PM. Reason: Added PPS

              Comment

              • ADezii
                Recognized Expert Expert
                • Apr 2006
                • 8834

                #8
                Like you, NeoPa, I found Rabbit's approach to be very cleaver, not to mention intriguing, so I made 10 Trial Runs on the SQL for curiosity sake. The Results are as follows:
                Code:
                Group A -  3 Subjects
                Group D - 10 Subjects
                Group X -  5 Subjects
                
                A1 - 20%
                A2 - 60%
                A3 - 20%
                
                D1 - 10%
                D2 - 30%
                D3 - 10%
                D4 -  0%
                D5 -  0%
                D6 - 20%
                D7 - 30%
                D8 -  0%
                D9 -  0%
                D10 - 0%
                
                X1 - 20%
                X2 - 60%
                X3 - 20%
                X4 -  0%
                X5 -  0%
                P.S. - With a greater number of Runs, I would assume that the outcome would have been more evenly distributed.

                Comment

                • lilp32
                  New Member
                  • Oct 2010
                  • 43

                  #9
                  Thanks everyone for your responses. I apologize for the lack of information - I am pretty new to this. I realize that my question was unclear - I would actually like to select 20 random subjects for EACH group for a total of 400. I got the code by ADezii to work for selecting one for each group; the code by NeoPa gives me an object required error at line 16 (strGroup). Thanks again.

                  Comment

                  • NeoPa
                    Recognized Expert Moderator MVP
                    • Oct 2006
                    • 32656

                    #10
                    @ADezii.
                    Your test data, where the first character of each [SecondaryID] value is the same for all, would not illustrate what I think is a shortcoming in Rabbit's code. Even if it did, the effect of it would be very hard to notice. It's actually easier to spot in the logic than in the results - as the Random results would tend to obscure any, otherwise noticeable, results.

                    @Lilp32.
                    Firstly thank you and congratulations for responding to the posts. It always helps and keeps up the interest.
                    I cannot easily think what the error could be referring to but here are some possibilities :
                    1. Line #2 wasn't included in your test so strGroup was unrecognised as a string variable.
                    2. tblDATA.Seconda ryID is some form of object that doesn't translate to a string. I wrote the code to fit ADezii's scenario (where both fields are string types). Some changes might be required if your actual situation doesn't match that. As yet we don't really know that situation very clearly.

                    Frankly neither seems particularly likely. Even if tblDATA.Seconda ryID were a number it would convert the data automatically and still work, and why you would miss some of the code out I cannot imagine.

                    On to your expanded/clarified question.
                    This makes life much more complicated. I would expect ADezii's approach to be more expandable to suit that if I'm honest, although the following routine might be incorporated into my code to produce the same results :
                    Code:
                    'Returns a string representing multiple moves through the recordset to select up
                    'to 20 random records.  Always ends up at start of next Group.
                    Private Sub Randomise(ByRef strResult As String, ByVal lngCount As Long)
                        Dim lngMax As Long, lngA As Long, lngX As Long, lngY As Long, lngZ As Long
                        Dim alngRecs() As Long
                    
                        ReDim alngRecs(lngCount - 1) As Long
                        lngMax = 20
                        If lngMax > lngCount Then lngMax = lngCount
                        For lngX = 1 To lngCount
                            alngRecs(lngX - 1) = lngX
                        Next lngX
                        strResult = ""
                        For lngX = 0 To lngMax - 1
                            lngY = Rnd() * lngCount + lngX
                            lngZ = alngRecs(lngY)
                            alngRecs(lngY) = alngRecs(lngX)
                            alngRecs(lngX) = lngZ
                            strResult = strResult & "," & lngZ - lngA
                            lngA = lngZ
                        Next lngX
                        strResult = Mid(strResult, 2) & lngZ
                    End Sub
                    Last edited by NeoPa; Jan 18 '12, 05:27 PM. Reason: Inserted chronologically first (but more complicated) response.

                    Comment

                    • NeoPa
                      Recognized Expert Moderator MVP
                      • Oct 2006
                      • 32656

                      #11
                      Alternatively, Rabbit's code could be manipulated to produce :
                      Code:
                      SELECT *
                           , Rnd(-Minute(Now()) * Second(Now()) * Asc([SecondaryID]) * Asc(Right([SubjectID], 1))) As RandNo
                      FROM   [tblDATA] AS [tD]
                      WHERE [RandNo] In(SELECT   TOP 20
                                                 Rnd(-Minute(Now()) * Second(Now()) * Asc([SecondaryID]) * Asc(Right([SubjectID], 1)))
                                        FROM     [tblDATA]
                                        WHERE    (tblDATE.SecondaryID = tD.SecondaryID)
                                        ORDER BY Rnd(-Minute(Now()) * Second(Now()) * Asc([SecondaryID]) * Asc(Right([SubjectID], 1))))

                      Comment

                      • NeoPa
                        Recognized Expert Moderator MVP
                        • Oct 2006
                        • 32656

                        #12
                        Incorporating this new code (Apologies for the similarity between Randomize and Randomise() by the way) was not as local a change as I'd hoped, so I need to repost the whole lot. Most of the changes (apart from a few changes to the Dims) were between the original lines #20 and #26 though :
                        Code:
                        Dim cdb As DAO.Database
                        Dim strSQL As String, strGroup As String, strResult As String
                        Dim lngNumSubs As Long, lngX As Long
                        Dim blnCheck As Boolean
                        Dim varResults As Variant
                        
                        Call Randomize
                        Set cdb = CurrentDb
                        strSQL = "SELECT   * " & _
                                 "FROM     [tblDATA] " & _
                                 "ORDER BY [SecondaryID]"
                        
                        With cdb.OpenRecordset(strSQL, dbOpenSnapshot, dbReadOnly)
                            Do
                                If .EOF Then
                                    blnCheck = True
                                ElseIf .SecondaryID <> strGroup Then
                                    blnCheck = True
                                End If
                                If blnCheck Then
                                    Debug.Print "Group: " & strGroup & vbCrLf & _
                                                "Number of Records: " & lngNumSubs
                                    Call Randomise(strResult, lngNumSubs)
                                    varResults = Split(strResult, ",")
                                    For lngX = 0 To lngNumSubs - 1
                                        Call .Move(Rows:=-varResults(lngX))
                                        Debug.Print "Random Subject: " & .SubjectID
                                    Next lngX
                                    Call .Move(Rows:=varResults(lngNumSubs))
                                    Debug.Print "**********************************"
                                    If Not .EOF Then
                                        strGroup = .SecondaryID
                                        lngNumSubs = 0
                                        blnCheck = False
                                    End If
                                End If
                                If .EOF Then Exit Do
                                lngNumSubs = lngNumSubs + 1
                                Call .MoveNext
                            Loop
                        End With

                        Comment

                        • lilp32
                          New Member
                          • Oct 2010
                          • 43

                          #13
                          Thanks again. I changed the SecondaryID to be a text field just in case. If it helps, the subject ID field is a number and numbers can be repeated across sites but not in the same site.

                          When I run this code I get the following compile error: "Wrong number of arguments or invalid property assignment" at line 23 (I changed randomise to randomize).

                          Code:
                          Dim cdb As DAO.Database
                          Dim strSQL As String, strGroup As String, strResult As String
                          Dim lngNumSubs As Long, lngX As Long
                          Dim blnCheck As Boolean
                          Dim varResults As Variant
                            
                          Call Randomize
                          Set cdb = CurrentDb
                          strSQL = "SELECT   * " & _
                                   "FROM     [tblDATA] " & _
                                   "ORDER BY [SecondaryID]"
                            
                          With cdb.OpenRecordset(strSQL, dbOpenSnapshot, dbReadOnly)
                              Do
                                  If .EOF Then
                                      blnCheck = True
                                  ElseIf tbldata.SecondaryID <> strGroup Then
                                      blnCheck = True
                                  End If
                                  If blnCheck Then
                                      Debug.Print "Group: " & strGroup & vbCrLf & _
                                                  "Number of Records: " & lngNumSubs
                                      Call Randomize(strResult, lngNumSubs)
                                      varResults = Split(strResult, ",")
                                      For lngX = 0 To lngNumSubs - 1
                                          Call .Move(Rows:=-varResults(lngX))
                                          Debug.Print "Random Subject: " & .SubjectID
                                      Next lngX
                                      Call .Move(Rows:=varResults(lngNumSubs))
                                      Debug.Print "**********************************"
                                      If Not .EOF Then
                                          strGroup = tbldata.SecondaryID
                                          lngNumSubs = 0
                                          blnCheck = False
                                      End If
                                  End If
                                  If .EOF Then Exit Do
                                  lngNumSubs = lngNumSubs + 1
                                  Call .MoveNext
                              Loop
                          End With

                          Comment

                          • ADezii
                            Recognized Expert Expert
                            • Apr 2006
                            • 8834

                            #14
                            @NeoPa:
                            I'm a little slow today so bear with me. How does your logic ensure that you end up with 20 Random, Unique, Subjects per Group. My question pertains to the 'Uniqueness' of the 20 Values. Thanks.

                            Comment

                            • NeoPa
                              Recognized Expert Moderator MVP
                              • Oct 2006
                              • 32656

                              #15
                              Originally posted by Lilp32
                              Lilp32:
                              When I run this code I get the following compile error: "Wrong number of arguments or invalid property assignment" at line 23 (I changed randomise to randomize).
                              That wasn't too clever as there is already a keyword Randomize. It's even used earlier in the code. Feel free to rename the procedure, but ensure it's not to an already-used keyword if you do.

                              PS. Have you tried out the solution proposed in post #11 yet. That may prove to be a much easier solution.
                              Last edited by NeoPa; Jan 18 '12, 06:09 PM.

                              Comment

                              Working...