Works perfectly! Thanks so much!!!
How do I select random subjects from different groups in one table?
-
The results of 100000 runs each on 50 unique IDs. There are a few outliers but for the most part, it approaches random.
Code:
 1 :: 0.5097    2 :: 0.4913    3 :: 0.4957    4 :: 0.4857    5 :: 0.5064
 6 :: 0.4436    7 :: 0.3428    8 :: 0.5       9 :: 0.52     10 :: 0.5037
11 :: 0.4366   12 :: 0.5158   13 :: 0.469    14 :: 0.4779   15 :: 0.49
16 :: 0.5013   17 :: 0.5315   18 :: 0.4691   19 :: 0.5175   20 :: 0.541
21 :: 0.4312   22 :: 0.421    23 :: 0.4748   24 :: 0.4604   25 :: 0.5207
26 :: 0.5059   27 :: 0.4522   28 :: 0.4169   29 :: 0.6054   30 :: 0.5955
31 :: 0.5092   32 :: 0.5134   33 :: 0.5379   34 :: 0.5026   35 :: 0.4889
36 :: 0.5094   37 :: 0.4758   38 :: 0.452    39 :: 0.5282   40 :: 0.4668
41 :: 0.4673   42 :: 0.4998   43 :: 0.4969   44 :: 0.4396   45 :: 0.5248
46 :: 0.5087   47 :: 0.5215   48 :: 0.4943   49 :: 0.4541   50 :: 0.4971
-
I'm sure you do. I would too if there were a version that gave valid results. I don't believe that's the case with what you posted in post #18 though, for the reasons I mentioned earlier (post #11). It doesn't seem sensible to compromise the results for reasons of performance (hence I went for the less efficient approach that, as far as I can see, gives logically reliable results).

Originally posted by Rabbit (post #18):
As for post #11, I prefer a join approach over a subquery in terms of speed.
Ref post #30.
Certainly it is better to use the [UNIQUEID] field in the seeding, but this doesn't. It multiplies by [UNIQUEID] instead. This has the effect that the results may well look random, but are not. It is my belief (until I see something new, to be fair) that this approach cannot yield the results you request. Assume, for the sake of understanding, that [UNIQUEID] is allocated in order of records created. As the product Rnd() * [UNIQUEID] will always be a value in the range 0 <= X < [UNIQUEID], it follows that more recent records will tend to be those selected. Nothing is easily testable as randomness is always involved too, but unless [UNIQUEID] is used as part of the seed this will always be flawed. Of course, if that were the case, then this approach wouldn't work anyway.
I'm happy to be proven wrong here, but I'll require more than the fact that the results seem random (as I'd expect that anyway). That doesn't counter the knowledge that larger-numbered [UNIQUEID] values (more recent records) would be weighted over those with lower values (entered earlier).
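For anyone who wants to see the weighting objection outside Access, here is a hypothetical Python sketch. Python's generator stands in for VBA's Rnd(), and sequential integers stand in for [UNIQUEID]; everything else is an assumption for illustration only. Sorting on Rnd() * [UNIQUEID] gives each record a key in the range 0 <= key < [UNIQUEID], so later records dominate the top of the ordering:

```python
import random

# Hypothetical illustration: Python's generator substitutes for VBA's
# Rnd(), and sequential integers substitute for [UNIQUEID].
random.seed(42)
ids = list(range(1, 51))                 # 50 records, IDs in creation order
RUNS = 5000
top_half_hits = {i: 0 for i in ids}
for _ in range(RUNS):
    # Sort descending on Rnd() * [UNIQUEID] and "select" the top 25.
    keyed = sorted(ids, key=lambda i: random.random() * i, reverse=True)
    for i in keyed[:25]:
        top_half_hits[i] += 1

# A fair scheme would select every ID about half the time; this does not.
print(top_half_hits[1] / RUNS)           # earliest record: almost never picked
print(top_half_hits[50] / RUNS)          # latest record: picked well over half the time
```

The earliest record almost never makes the selection, while the latest one makes it far more often than half the time, which is exactly the weighting described above.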
-
Let me know what you're stuck on and I'll be happy to help.

Originally posted by Lilp32:
@NeoPa: I realized my mistake, thanks! I still haven't gotten the VBA code to work.
PS. As per my response in post #34, Rabbit is absolutely right in that my suggested SQL code in post #11 would not run as fast as the SQL using JOINs (I believe it would actually work reliably, but I'm not above wanting confirmation on that score from Rabbit, who is, as you'll see, one of the best SQL exponents you could hope to come across). As such, I'm not sure whether or not it would run faster or slower than the code solution from post #12 (which references the routine from post #10).
-
The only time that a join would return incorrect results is when Rnd() returns the same result for different records in the same group. At which point you would have to balance veracity against speed. And that would be highly dependent on how many records they have. I think at most you would get 19 records rather than your 20, and even that would be a pretty rare occurrence. If this is not the shortcoming you're referring to, can you explain?
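To put a rough number on how often that tie case could bite, here is a back-of-envelope birthday-problem estimate in Python. It assumes VBA's Rnd() draws from roughly 2**24 distinct values (the size of its 24-bit internal state); treat that figure as an assumption rather than gospel:

```python
import math

# Birthday-problem approximation: probability that n draws from `space`
# equally likely values contain at least one duplicate.
# Assumption: VBA's Rnd() has about 2**24 possible outputs.
def collision_prob(n, space=2 ** 24):
    return 1 - math.exp(-n * (n - 1) / (2 * space))

print(collision_prob(50))      # a 50-record group: negligible risk of a tie
print(collision_prob(10000))   # a 10,000-record group: a tie is very likely
```

So for groups of the size discussed in this thread (tens of records), the missing-record case is vanishingly rare, which supports trading a little veracity for speed on small tables.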
I don't think I suggested that the records be picked by Rnd() * [UniqueID], as that would definitely result in incorrect results. Rather, I meant that Rnd(-1 * [seconds or milliseconds] * [UniqueID]) should be used to get the random number. Using that as the seed should result in near randomness for any [UniqueID]. I believe the code in post #30 and the results in post #33 show that, at least for the numbers 1-50, any one ID approaches randomness.
I suppose the next step would be to show randomness between any two sets of IDs. I can make test code for that as well. Because while randomness is, by definition, difficult to test, it is not impossible to test all possible seed values that may be used to seed the generator.
In the end, I believe the key to approaching randomness would be to multiply more and more time values into the seed.
-
Indeed. You are correct on that point. I checked over and over again and still managed to get confused on that point, but now I look yet again, I see it was, indeed, within the seed of the Rnd() call (not only in the most recent example but also in posts #6 and #18).

Originally posted by Rabbit:
I don't think I suggested that the records be picked by Rnd() * [UniqueID] as that would definitely result in incorrect results.
My other objection was based on the fact that the [RandNo] values for the same records in [T1] and [T2] were identical (assuming Now() is a value that is only set once for the whole run and simply reused as that static value for each reference). I'm ashamed to say that on that score I was also mistaken (Either that or I'm just too tired to think clearly now). This is actually an essential part of the logic.
If I was impressed before, I find I now have the urge to bow before a SQL deity. I see nothing wrong with it at all (if I'm being really picky I might say to update line #14 to GROUP BY T1.UNIQUEID, but that's just a detail you didn't catch up with after that field was provided late).
@Lilp32.
Go ahead and mark post #30 as Best Answer. I'll be happy to take you through my code still, if you want, but Rabbit's solution is, frankly, in a class of its own. I'm just checking again in case my enthusiasm is a result of tiredness and I'm missing something, but I really don't think so. It also seems more straightforward now somehow (which illustrates its code-elegance). No. I'm happy it's the doggy's doo-dahs.
No. I led you up the garden path there I'm afraid (inadvertently, of course). The code is perfectly random, I see now. Any perceived lack of randomness was down to my misunderstanding of where the parentheses left your multiplication of the value(s) from within the record.

Originally posted by Rabbit:
In the end, I believe the key to approaching randomness would be to multiply more and more time values into the seed.
-
I wouldn't say that you steered me wrong per se. Only that you sent me down an inevitable and necessary path to check whether or not it is truly random between any two numbers.
I plan on running the following simulation when I get back in the office tomorrow.
The win rate for any two number combination should approach 50%.

Code:
For seconds = 1 To 60
    For minutes = seconds + 1 To 60
        win = 0
        For id1 = 1 To 10001 Step 100
            For id2 = id1 + 100 To 10001 Step 100
                If Rnd(minutes * seconds * id1) > Rnd(minutes * seconds * id2) Then
                    win = win + 1
                End If
            Next id2
        Next id1
        Debug.Print win / 3600
    Next minutes
Next seconds
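For anyone who wants to try the same kind of experiment without Access to hand, here is a rough Python analogue of the simulation above. Python's random.seed()/random() stands in for the VBA Rnd(seed) trick, so the exact figures will differ from the VBA run, but the shape of the test (every minute/second combination, win rate per ID pair) is the same:

```python
import random

def win_rate(id1, id2):
    """Over all minute/second combinations, how often does id1's
    seeded random number beat id2's? Should hover around 0.5."""
    wins = total = 0
    for seconds in range(1, 61):
        for minutes in range(seconds + 1, 61):
            random.seed(minutes * seconds * id1)   # reseed per record, as in the VBA
            a = random.random()
            random.seed(minutes * seconds * id2)
            b = random.random()
            wins += a > b
            total += 1
    return wins / total

for id1, id2 in [(1, 101), (101, 201), (1, 1001)]:
    print(id1, id2, win_rate(id1, id2))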
-
I only went from 1 to 1001 with a step of 100. But here are the results:
For the most part, it seems to approach random. The largest difference seems to be a 2.4% bias towards one number over another.

Code:
1, 101, 0.497777777777778
1, 201, 0.506388888888889
1, 301, 0.508888888888889
1, 401, 0.487222222222222
1, 501, 0.493333333333333
1, 601, 0.495277777777778
1, 701, 0.495
1, 801, 0.489444444444444
1, 901, 0.498055555555556
1, 1001, 0.494444444444444
101, 201, 0.501944444444444
101, 301, 0.505
101, 401, 0.509444444444444
101, 501, 0.5125
101, 601, 0.487222222222222
101, 701, 0.498055555555556
101, 801, 0.514722222222222
101, 901, 0.496111111111111
101, 1001, 0.498888888888889
201, 301, 0.510833333333333
201, 401, 0.501111111111111
201, 501, 0.500277777777778
201, 601, 0.500555555555556
201, 701, 0.496111111111111
201, 801, 0.500555555555556
201, 901, 0.491388888888889
201, 1001, 0.506388888888889
301, 401, 0.507777777777778
301, 501, 0.493055555555556
301, 601, 0.4975
301, 701, 0.4925
301, 801, 0.490555555555556
301, 901, 0.487777777777778
301, 1001, 0.499166666666667
401, 501, 0.506944444444444
401, 601, 0.488888888888889
401, 701, 0.494166666666667
401, 801, 0.492777777777778
401, 901, 0.495
401, 1001, 0.503055555555556
501, 601, 0.499166666666667
501, 701, 0.496666666666667
501, 801, 0.489166666666667
501, 901, 0.493333333333333
501, 1001, 0.495277777777778
601, 701, 0.495555555555556
601, 801, 0.498333333333333
601, 901, 0.496666666666667
601, 1001, 0.501666666666667
701, 801, 0.491666666666667
701, 901, 0.506111111111111
701, 1001, 0.504166666666667
801, 901, 0.512222222222222
801, 1001, 0.504166666666667
901, 1001, 0.503055555555556
P.S. As expected, multiplying more time factors into the seed reduces variance. Minutes and seconds is good, but minutes, seconds, and hours is better, and so on.
-
Indeed. I left out the Hours when I looked at my version, as that would have risked an overflow condition. Two fields from the record were required at that time though. Now that [UNIQUEID] is available, the Hours can be handled without risk of overflow.

Originally posted by Rabbit:
P.S. As expected, multiplying more time factors into the seed reduces variance. Minutes and seconds is good, but minutes, seconds, and hours is better, and so on.