How to Gnerate a Random ID Number

**Hugo Kornelis** · Jun 13 '07, 11:35 PM

Re: How to Gnerate a Random ID Number

On Mon, 11 Jun 2007 01:02:19 +0200, Hugo Kornelis wrote:

I know that rand() is called
>just once for a set-based query, returning the same value for each row.

Which, BTW, can be overcome in SQL Server 2005 using a dirty trick:

SELECT o.name, r.rnd
FROM sys.objects AS o
CROSS APPLY (SELECT RAND(CHECKSUM(o .name) ^ CHECKSUM(newid( ))) AS rnd)
AS r

The CHECKSUM(o.name ) makes sure that the RAND function has to be called
for each row in sys.objects. With just this, the query would become
deterministic; this is overcome by also factoring in CHECKSUM(NEWID( )).
Both CHECKSUM values can span the entire integer range; combining them
with bitwise exclusive OR results in a new integer that also spans the
entire range of integers. (Bitwise inclusive OR favors values with many
bits said; bitwise AND favors values with many bits off; adding or
subtracting runs the risk of exceeding the integer domain; and
subtracting the absolute values favors values around 0).

--
Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis

**Gert-Jan Strik** · Jun 14 '07, 07:15 PM

Re: How to Gnerate a Random ID Number

Hugo Kornelis wrote:

>
On Mon, 11 Jun 2007 01:24:34 +0200, Gert-Jan Strik wrote:
>

And isn't newid() more or less the same (using a different seed and a
different algorithm to compute the next value, but still computing some
formula with a seed as input to get at a pseudo-random value?)

I doubt it. The newid() value has to be globally unique, which suggests
the function should never produce an 'old' value ever again.

>
Hi Gert-Jan,
>
Well, that definitely rules out newid() as a "good" pseudo random number
generator, then. A sequence of random numbers should have a chance to
hold duplicates.

Good observation. And so you correctly concluded that RAND() also does
not do this.

Of course, checksum(newid( )) will include duplicates, but only someone
privy to the implementation details of both newid() and checksum() can
determine wether the non-repetition of newid() values affects the
randomness of checksum(newid( )). If I had a need for a good RNG, I'd
look further!

Should you find a better (and practical) method, please share it :-)

When using rand(), you could expect the same values after a reseed, or
an SQL Server restart. The newid() function should not have such
behavior.

>
I wasn't aware that the seed is reset on server restart. Is this
documented anywhere, or just based on personal observation?

Oops... My apologies, that was a bit thoughtless of me. I merely
assumed the seed would be reset upon restart. However, I just tested
this on SQL Server 2005, and the seed does not seem to be reset (or at
least not to the same value).

Gert-Jan

Anyway, it's
easy to fix it by putting
SET @dummy = RAND(DATEDIFF(s , '20000101', CURRENT_TIMESTA MP))
in a stored procedure and run it on startup.
>
--
Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis

**Hugo Kornelis** · Jun 14 '07, 10:25 PM

Re: How to Gnerate a Random ID Number

On Thu, 14 Jun 2007 21:05:28 +0200, Gert-Jan Strik wrote:

(snip)

>Well, that definitely rules out newid() as a "good" pseudo random number
>generator, then. A sequence of random numbers should have a chance to
>hold duplicates.

>
>Good observation. And so you correctly concluded that RAND() also does
>not do this.

Hi Gert-Jan,

Am I reading you incorrectly, or are you saying that the sequence of
numbers generated by RAND() never produces the same value twice?

>Of course, checksum(newid( )) will include duplicates, but only someone
>privy to the implementation details of both newid() and checksum() can
>determine wether the non-repetition of newid() values affects the
>randomness of checksum(newid( )). If I had a need for a good RNG, I'd
>look further!

>
>Should you find a better (and practical) method, please share it :-)

Heh! I've never yet had to implement a good RNG in SQL Server (or
anywhere, for that matter), but I do know that there's tons of
information on this subject on web pages and in books, so that's where
I'd start.

With the CLR, it's probably a lot easier to implement the RNG algorithm
of choice than it was before.

>When using rand(), you could expect the same values after a reseed, or
>an SQL Server restart. The newid() function should not have such
>behavior.

>>
>I wasn't aware that the seed is reset on server restart. Is this
>documented anywhere, or just based on personal observation?

>
>Oops... My apologies, that was a bit thoughtless of me. I merely
>assumed the seed would be reset upon restart. However, I just tested
>this on SQL Server 2005, and the seed does not seem to be reset (or at
>least not to the same value).

Probably some value derived from an internal clock or something. Most
systems that have support for builtin random number generation use that
for their initial seed.

--
Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis

**Gert-Jan Strik** · Jun 15 '07, 04:15 PM

Re: How to Gnerate a Random ID Number

Well, that definitely rules out newid() as a "good" pseudo random number

generator, then. A sequence of random numbers should have a chance to
hold duplicates.

Good observation. And so you correctly concluded that RAND() also does
not do this.

>
Am I reading you incorrectly, or are you saying that the sequence of
numbers generated by RAND() never produces the same value twice?

No, I am not saying that. It might, I haven't analyzed the algorithm
thoroughly. But that doesn't matter. A good pseudo random number
generator should incorporate the idea that in a range of 2 billion
values, there is a one in 2 billion chance that the same value is
selected next. And after that, then again there is a one in 2 billion
chance it will appear again. And that is something the algorithm doesn't
do. The algorithm is totally deterministic.

Gert-Jan

**Seribus Dragon** · Jun 15 '07, 08:45 PM

Re: How to Gnerate a Random ID Number

I know this is an odd question but I thought the new access was actual
just a frount end to the destop version of SQLSERVER?

**Hugo Kornelis** · Jun 15 '07, 10:25 PM

Re: How to Gnerate a Random ID Number

On Fri, 15 Jun 2007 18:14:07 +0200, Gert-Jan Strik wrote:

>Well, that definitely rules out newid() as a "good" pseudo random number
>generator, then. A sequence of random numbers should have a chance to
>hold duplicates.
>
>Good observation. And so you correctly concluded that RAND() also does
>not do this.

>>
>Am I reading you incorrectly, or are you saying that the sequence of
>numbers generated by RAND() never produces the same value twice?

>
>No, I am not saying that. It might, I haven't analyzed the algorithm
>thoroughly. But that doesn't matter. A good pseudo random number
>generator should incorporate the idea that in a range of 2 billion
>values, there is a one in 2 billion chance that the same value is
>selected next. And after that, then again there is a one in 2 billion
>chance it will appear again. And that is something the algorithm doesn't
>do. The algorithm is totally deterministic.

Hi Gert-Jan,

Any pseudoRNG will always have a deterministic algorithm; the only
alternative would be some device that measures some physical magnitude
that is deemed to be random enough. And both philosophers and physicists
would probably argue whether even that is truly random.

Anyway, a deterministic algorithm can still satisfy the 1 in 2 billion
chance of producing the occasional duplicate. I'll try to illustrate
with a simplified example, using lower numbers (to save me the hassle of
translating Dutch words for extremely high numbers to English, and you
the hassle of translating them back :-)

Let's say that we have an algorith to produce random numbers between 1
and 64. We do of course not want to limit the seed to that range of 64
numbers - instead we use an integer to store the seed, giving us a range
of over 4 billion different seed values. We use an algorithm to
calculate next seed from the previous seed in such a way that there
won't be any obvious pattern to the series and that all 4-and-a-bit
billion possible values are calculated once before the series starts
over. We then use an other algorithm to hash each of the possible
integer values into a number between 1 and 64, such that there will be
an equal distribution but (again) no obvious pattern.

Even though the algorithm is entirely deterministic, if you write down
the full sequence of 4,294,967,296 numbers this algorithm generates
before starting over, you will see a 1 in 64 chance of getting the same
number twice in a row, a 1 in 64^2 chance of getting the same number
thrice and a 1 in 64^3 chance of getting four equal numbers in a row.
The chance for getting five equals in a row might be somewhat more or
less than 1 in 64^4, and the chance of six in a row will definitely
differ significantly from 1 in 64^5. These issues can be fixed by
increasing the ratio of seed numbers vs generated values (e.g. by using
biging instead of int for the seed).

Note that nothing of the above necessarily applies to the random number
generation by SQL Server. This applies to random number generation in
general - as far as I know, the random number generator in SQL Server is
not described in detail, so only MS employees are able to tell if it's
implemented as described here, or in another way.

--
Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis

**Hugo Kornelis** · Jun 15 '07, 11:15 PM

Re: How to Gnerate a Random ID Number

On Fri, 15 Jun 2007 15:41:38 -0500, Seribus Dragon wrote:

>I know this is an odd question but I thought the new access was actual
>just a frount end to the destop version of SQLSERVER?

Hi Seribus,

No, it's not. You can *use* Access as a front-end to many different
DB's, including the Desktop Engine, but it also comes with it's own
builtin "database engine". (I enclose that in quotes because it lacks
some of the features of truly high-end relational database engines).

--
Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis

How to Gnerate a Random ID Number

Comment

Comment

Comment

Comment

Comment

Comment

Comment