How do I select random subjects from different groups in one table?

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • lilp32
    New Member
    • Oct 2010
    • 43

    #31
    Works perfectly! Thanks so much!!!

    Comment

    • Rabbit
      Recognized Expert MVP
      • Jan 2007
      • 12517

      #32
      Not a problem, good luck.

      Comment

      • Rabbit
        Recognized Expert MVP
        • Jan 2007
        • 12517

        #33
        The results of 100000 runs each on 50 unique IDs. There are a few outliers but for the most part, it approaches random.
        Code:
         1 :: 0.5097
         2 :: 0.4913
         3 :: 0.4957
         4 :: 0.4857
         5 :: 0.5064
         6 :: 0.4436
         7 :: 0.3428
         8 :: 0.5
         9 :: 0.52
        10 :: 0.5037
        11 :: 0.4366
        12 :: 0.5158
        13 :: 0.469
        14 :: 0.4779
        15 :: 0.49
        16 :: 0.5013
        17 :: 0.5315
        18 :: 0.4691
        19 :: 0.5175
        20 :: 0.541
        21 :: 0.4312
        22 :: 0.421
        23 :: 0.4748
        24 :: 0.4604
        25 :: 0.5207
        26 :: 0.5059
        27 :: 0.4522
        28 :: 0.4169
        29 :: 0.6054
        30 :: 0.5955
        31 :: 0.5092
        32 :: 0.5134
        33 :: 0.5379
        34 :: 0.5026
        35 :: 0.4889
        36 :: 0.5094
        37 :: 0.4758
        38 :: 0.452
        39 :: 0.5282
        40 :: 0.4668
        41 :: 0.4673
        42 :: 0.4998
        43 :: 0.4969
        44 :: 0.4396
        45 :: 0.5248
        46 :: 0.5087
        47 :: 0.5215
        48 :: 0.4943
        49 :: 0.4541
        50 :: 0.4971

        Comment

        • NeoPa
          Recognized Expert Moderator MVP
          • Oct 2006
          • 32656

          #34
          Originally posted by Rabbit
          Rabbit (Post #18):
          As for post #11, I prefer a join approach over a subquery in terms of speed.
          I'm sure you do. I would too if there were a version that gave valid results. I don't believe that's the case with what you posted in post #18 though, for the reasons I mentioned earlier (post #11). It doesn't seem sensible to compromise the results for reasons of performance (Hence I went for the less efficient approach that, as far as I can see, gives reliable results logically).

          Ref post #30.
          Certainly it is better to use the [UNIQUEID] field in the seeding, but this doesn't. It multiplies by a [UNIQUEID] instead. This has the effect that the results may well look random, but are not. It is my belief (until I see something else new to be fair) that this approach cannot yield the results you request. Assume for the sake of understanding that [UNIQUEID] is allocated in order of records created. As the product of the result of Rnd() * [UNIQUENO] will always be a value which follows the pattern 0 <= X < [UNIQUENO], it follows that more recent records will tend towards those to be selected. Nothing is easily testable as the randomness is always involved too, but unless [UNIQUENO] is added as part of the seed this will always be flawed. Of course if that were the case, then this approach wouldn't work anyway.

          I'm happy to be proven wrong here, but I'll require more than the fact that the results seem random (as I'd expect that anyway). That doesn't counter the knowledge that largere numbered [UNIQUEID] values (more recent records) would be weighted over those with lower values (entered earlier).
          Last edited by NeoPa; Jan 18 '12, 11:56 PM. Reason: Clarified where quote from

          Comment

          • NeoPa
            Recognized Expert Moderator MVP
            • Oct 2006
            • 32656

            #35
            Originally posted by Lilp32
            Lilp32:
            @NeoPa: I realized my mistake, thanks! I still haven't gotten the VBA code to work.
            Let me know what you're stuck on and I'll be happy to help.

            PS. As per my response in post #34, Rabbit is absolutely right in that my suggested SQL code in post #11 would not run as fast as the SQL using JOINs (I believe it would actually work reliably, but I'm not above wanting confirmation on that score from Rabbit who is, as you'll see, one of the best SQL exponents you could hope to come across). As such, I'm not sure whether or not it would run faster or slower than the code solution from post #12 (which references the routine from post #10).

            Comment

            • Rabbit
              Recognized Expert MVP
              • Jan 2007
              • 12517

              #36
              The only time that a join would return incorrect results is when Rnd() returns the same result for different records in the same group. At which point you would have to balance veracity against speed. And that would be highly dependent on how many records they have. I think at most you would get 19 records rather then your 20 and even that would be pretty a rare occurrence. If this is not the shortcoming you're referring to, can you explain?

              I don't think I suggested that the records be picked by Rnd() * [UniqueID] as that would definitely result in incorrect results. Rather, I meant that Rnd(-1 * [seconds or milliseconds] * [UniqueID]) should be used to get the random number. Using that as the seed should result in near randomness for any [UniqueID]. I believe the code in 30 and the results in 33 show that, at least for the numbers 1-50, any one ID approaches randomness.

              I suppose the next step would be to show randomness between any two sets of IDs. I can make test code for that as well. Because while randomness is, by definition, difficult to test, it is not impossible to test all possible seed values that may be used to seed the generator.

              In the end, I believe the key to approaching randomness would be to multiply more and more time values into the seed.

              Comment

              • NeoPa
                Recognized Expert Moderator MVP
                • Oct 2006
                • 32656

                #37
                Originally posted by Rabbit
                Rabbit:
                I don't think I suggested that the records be picked by Rnd() * [UniqueID] as that would definitely result in incorrect results.
                Indeed. You are correct on that point. I checked over and over again and still managed to get confused on that point, but now I look yet again, I see it was, indeed, within the seed of the Rnd() call (Not only in the most recent example but also in posts #6 and #18).

                My other objection was based on the fact that the [RandNo] values for the same records in [T1] and [T2] were identical (assuming Now() is a value that is only set once for the whole run and simply reused as that static value for each reference). I'm ashamed to say that on that score I was also mistaken (Either that or I'm just too tired to think clearly now). This is actually an essential part of the logic.

                If I was impressed before, I find I now have the urge to bow before a SQL deity. I see nothing wrong with it at all (If I'm being really picky I might say to update line #14 to GROUP BY T1.UNIQEUEID, but that's just a detail you didn't catch up with after that field was provided late).

                @Lilp32.
                Go ahead and mark post #30 as Best Answer. I'll be happy to take you through my code still, if you want, but Rabbit's solution is, frankly, in a class of its own. I'm just checking again in case my enthusiasm is a result of tiredness and I'm missing something, but I really don't think so. It also seems more straightforward now somehow (which illustrates its code-elegance). No. I'm happy it's the doggy's doo-dahs.

                Originally posted by Rabbit
                Rabbit:
                In the end, I believe the key to approaching randomness would be to multiply more and more time values into the seed.
                No. I led you up the garden path there I'm afraid (inadvertently of course). The code is perfectly random I see now. Any perceived lack of randomness was down to my misunderstandin g of where the parentheses left your multiplication of the value(s) from within the record.
                Last edited by NeoPa; Jan 19 '12, 01:01 AM. Reason: Added last quote and reply

                Comment

                • Rabbit
                  Recognized Expert MVP
                  • Jan 2007
                  • 12517

                  #38
                  I wouldn't say that you steered me wrong per se. Only that you sent me down an inevitable and necessary path to check whether or not it is truly random between any two numbers.

                  I plan on running the following simulation when I get back in the office tomorrow.
                  Code:
                  For seconds = 1 To 60
                     For minutes = seconds + 1 To 60
                        win = 0
                  
                        For id1 = 1 To 10001 Step 100
                           For id2 = id1 + 100 To 10001 Step 100
                              If rnd(minutes * seconds * id1) > rnd(minutes * seconds * id2) Then
                                 win = win + 1
                              End If
                           Next id2
                        Next id1
                  
                        Debug.Print win / 3600
                     Next minutes
                  Next seconds
                  The win rate for any two number combination should approach 50%.

                  Comment

                  • Rabbit
                    Recognized Expert MVP
                    • Jan 2007
                    • 12517

                    #39
                    I only went from 1 to 1001 with a step of 100. But here are the results:
                    Code:
                    1,101,0.497777777777778
                    1,201,0.506388888888889
                    1,301,0.508888888888889
                    1,401,0.487222222222222
                    1,501,0.493333333333333
                    1,601,0.495277777777778
                    1,701,0.495
                    1,801,0.489444444444444
                    1,901,0.498055555555556
                    1,1001,0.494444444444444
                    101,201,0.501944444444444
                    101,301,0.505
                    101,401,0.509444444444444
                    101,501,0.5125
                    101,601,0.487222222222222
                    101,701,0.498055555555556
                    101,801,0.514722222222222
                    101,901,0.496111111111111
                    101,1001,0.498888888888889
                    201,301,0.510833333333333
                    201,401,0.501111111111111
                    201,501,0.500277777777778
                    201,601,0.500555555555556
                    201,701,0.496111111111111
                    201,801,0.500555555555556
                    201,901,0.491388888888889
                    201,1001,0.506388888888889
                    301,401,0.507777777777778
                    301,501,0.493055555555556
                    301,601,0.4975
                    301,701,0.4925
                    301,801,0.490555555555556
                    301,901,0.487777777777778
                    301,1001,0.499166666666667
                    401,501,0.506944444444444
                    401,601,0.488888888888889
                    401,701,0.494166666666667
                    401,801,0.492777777777778
                    401,901,0.495
                    401,1001,0.503055555555556
                    501,601,0.499166666666667
                    501,701,0.496666666666667
                    501,801,0.489166666666667
                    501,901,0.493333333333333
                    501,1001,0.495277777777778
                    601,701,0.495555555555556
                    601,801,0.498333333333333
                    601,901,0.496666666666667
                    601,1001,0.501666666666667
                    701,801,0.491666666666667
                    701,901,0.506111111111111
                    701,1001,0.504166666666667
                    801,901,0.512222222222222
                    801,1001,0.504166666666667
                    901,1001,0.503055555555556
                    For the most part, it seems to approach random. The largest difference seems to be a 2.4% bias towards one number over another.

                    P.S. As expected, multiplying more time factors into the seed reduces variance. Minutes and seconds is good but minutes, seconds, hours is better and so on.

                    Comment

                    • NeoPa
                      Recognized Expert Moderator MVP
                      • Oct 2006
                      • 32656

                      #40
                      Originally posted by Rabbit
                      Rabbit:
                      P.S. As expected, multiplying more time factors into the seed reduces variance. Minutes and seconds is good but minutes, seconds, hours is better and so on.
                      Indeed. I left out the Hours when I looked at my version as that would have risked an overflow condition. Two fields from the record were required at that time though. Now [UNIQUEID] is available the Hours can be handled without risk of overflow.

                      Comment

                      Working...