How to Gnerate a Random ID Number

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Hugo Kornelis

    #16
    Re: How to Gnerate a Random ID Number

    On Mon, 11 Jun 2007 01:02:19 +0200, Hugo Kornelis wrote:
    I know that rand() is called
    >just once for a set-based query, returning the same value for each row.
    Which, BTW, can be overcome in SQL Server 2005 using a dirty trick:

    SELECT o.name, r.rnd
    FROM sys.objects AS o
    CROSS APPLY (SELECT RAND(CHECKSUM(o .name) ^ CHECKSUM(newid( ))) AS rnd)
    AS r

    The CHECKSUM(o.name ) makes sure that the RAND function has to be called
    for each row in sys.objects. With just this, the query would become
    deterministic; this is overcome by also factoring in CHECKSUM(NEWID( )).
    Both CHECKSUM values can span the entire integer range; combining them
    with bitwise exclusive OR results in a new integer that also spans the
    entire range of integers. (Bitwise inclusive OR favors values with many
    bits said; bitwise AND favors values with many bits off; adding or
    subtracting runs the risk of exceeding the integer domain; and
    subtracting the absolute values favors values around 0).

    --
    Hugo Kornelis, SQL Server MVP
    My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis

    Comment

    • Gert-Jan Strik

      #17
      Re: How to Gnerate a Random ID Number

      Hugo Kornelis wrote:
      >
      On Mon, 11 Jun 2007 01:24:34 +0200, Gert-Jan Strik wrote:
      >
      And isn't newid() more or less the same (using a different seed and a
      different algorithm to compute the next value, but still computing some
      formula with a seed as input to get at a pseudo-random value?)
      I doubt it. The newid() value has to be globally unique, which suggests
      the function should never produce an 'old' value ever again.
      >
      Hi Gert-Jan,
      >
      Well, that definitely rules out newid() as a "good" pseudo random number
      generator, then. A sequence of random numbers should have a chance to
      hold duplicates.
      Good observation. And so you correctly concluded that RAND() also does
      not do this.
      Of course, checksum(newid( )) will include duplicates, but only someone
      privy to the implementation details of both newid() and checksum() can
      determine wether the non-repetition of newid() values affects the
      randomness of checksum(newid( )). If I had a need for a good RNG, I'd
      look further!
      Should you find a better (and practical) method, please share it :-)
      When using rand(), you could expect the same values after a reseed, or
      an SQL Server restart. The newid() function should not have such
      behavior.
      >
      I wasn't aware that the seed is reset on server restart. Is this
      documented anywhere, or just based on personal observation?
      Oops... My apologies, that was a bit thoughtless of me. I merely
      assumed the seed would be reset upon restart. However, I just tested
      this on SQL Server 2005, and the seed does not seem to be reset (or at
      least not to the same value).

      Gert-Jan
      Anyway, it's
      easy to fix it by putting
      SET @dummy = RAND(DATEDIFF(s , '20000101', CURRENT_TIMESTA MP))
      in a stored procedure and run it on startup.
      >
      --
      Hugo Kornelis, SQL Server MVP
      My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis

      Comment

      • Hugo Kornelis

        #18
        Re: How to Gnerate a Random ID Number

        On Thu, 14 Jun 2007 21:05:28 +0200, Gert-Jan Strik wrote:

        (snip)
        >Well, that definitely rules out newid() as a "good" pseudo random number
        >generator, then. A sequence of random numbers should have a chance to
        >hold duplicates.
        >
        >Good observation. And so you correctly concluded that RAND() also does
        >not do this.
        Hi Gert-Jan,

        Am I reading you incorrectly, or are you saying that the sequence of
        numbers generated by RAND() never produces the same value twice?
        >Of course, checksum(newid( )) will include duplicates, but only someone
        >privy to the implementation details of both newid() and checksum() can
        >determine wether the non-repetition of newid() values affects the
        >randomness of checksum(newid( )). If I had a need for a good RNG, I'd
        >look further!
        >
        >Should you find a better (and practical) method, please share it :-)
        Heh! I've never yet had to implement a good RNG in SQL Server (or
        anywhere, for that matter), but I do know that there's tons of
        information on this subject on web pages and in books, so that's where
        I'd start.

        With the CLR, it's probably a lot easier to implement the RNG algorithm
        of choice than it was before.
        >When using rand(), you could expect the same values after a reseed, or
        >an SQL Server restart. The newid() function should not have such
        >behavior.
        >>
        >I wasn't aware that the seed is reset on server restart. Is this
        >documented anywhere, or just based on personal observation?
        >
        >Oops... My apologies, that was a bit thoughtless of me. I merely
        >assumed the seed would be reset upon restart. However, I just tested
        >this on SQL Server 2005, and the seed does not seem to be reset (or at
        >least not to the same value).
        Probably some value derived from an internal clock or something. Most
        systems that have support for builtin random number generation use that
        for their initial seed.

        --
        Hugo Kornelis, SQL Server MVP
        My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis

        Comment

        • Gert-Jan Strik

          #19
          Re: How to Gnerate a Random ID Number

          Well, that definitely rules out newid() as a "good" pseudo random number
          generator, then. A sequence of random numbers should have a chance to
          hold duplicates.
          Good observation. And so you correctly concluded that RAND() also does
          not do this.
          >
          Am I reading you incorrectly, or are you saying that the sequence of
          numbers generated by RAND() never produces the same value twice?
          No, I am not saying that. It might, I haven't analyzed the algorithm
          thoroughly. But that doesn't matter. A good pseudo random number
          generator should incorporate the idea that in a range of 2 billion
          values, there is a one in 2 billion chance that the same value is
          selected next. And after that, then again there is a one in 2 billion
          chance it will appear again. And that is something the algorithm doesn't
          do. The algorithm is totally deterministic.

          Gert-Jan

          Comment

          • Seribus Dragon

            #20
            Re: How to Gnerate a Random ID Number

            I know this is an odd question but I thought the new access was actual
            just a frount end to the destop version of SQLSERVER?

            Comment

            • Hugo Kornelis

              #21
              Re: How to Gnerate a Random ID Number

              On Fri, 15 Jun 2007 18:14:07 +0200, Gert-Jan Strik wrote:
              >Well, that definitely rules out newid() as a "good" pseudo random number
              >generator, then. A sequence of random numbers should have a chance to
              >hold duplicates.
              >
              >Good observation. And so you correctly concluded that RAND() also does
              >not do this.
              >>
              >Am I reading you incorrectly, or are you saying that the sequence of
              >numbers generated by RAND() never produces the same value twice?
              >
              >No, I am not saying that. It might, I haven't analyzed the algorithm
              >thoroughly. But that doesn't matter. A good pseudo random number
              >generator should incorporate the idea that in a range of 2 billion
              >values, there is a one in 2 billion chance that the same value is
              >selected next. And after that, then again there is a one in 2 billion
              >chance it will appear again. And that is something the algorithm doesn't
              >do. The algorithm is totally deterministic.
              Hi Gert-Jan,

              Any pseudoRNG will always have a deterministic algorithm; the only
              alternative would be some device that measures some physical magnitude
              that is deemed to be random enough. And both philosophers and physicists
              would probably argue whether even that is truly random.

              Anyway, a deterministic algorithm can still satisfy the 1 in 2 billion
              chance of producing the occasional duplicate. I'll try to illustrate
              with a simplified example, using lower numbers (to save me the hassle of
              translating Dutch words for extremely high numbers to English, and you
              the hassle of translating them back :-)

              Let's say that we have an algorith to produce random numbers between 1
              and 64. We do of course not want to limit the seed to that range of 64
              numbers - instead we use an integer to store the seed, giving us a range
              of over 4 billion different seed values. We use an algorithm to
              calculate next seed from the previous seed in such a way that there
              won't be any obvious pattern to the series and that all 4-and-a-bit
              billion possible values are calculated once before the series starts
              over. We then use an other algorithm to hash each of the possible
              integer values into a number between 1 and 64, such that there will be
              an equal distribution but (again) no obvious pattern.

              Even though the algorithm is entirely deterministic, if you write down
              the full sequence of 4,294,967,296 numbers this algorithm generates
              before starting over, you will see a 1 in 64 chance of getting the same
              number twice in a row, a 1 in 64^2 chance of getting the same number
              thrice and a 1 in 64^3 chance of getting four equal numbers in a row.
              The chance for getting five equals in a row might be somewhat more or
              less than 1 in 64^4, and the chance of six in a row will definitely
              differ significantly from 1 in 64^5. These issues can be fixed by
              increasing the ratio of seed numbers vs generated values (e.g. by using
              biging instead of int for the seed).

              Note that nothing of the above necessarily applies to the random number
              generation by SQL Server. This applies to random number generation in
              general - as far as I know, the random number generator in SQL Server is
              not described in detail, so only MS employees are able to tell if it's
              implemented as described here, or in another way.

              --
              Hugo Kornelis, SQL Server MVP
              My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis

              Comment

              • Hugo Kornelis

                #22
                Re: How to Gnerate a Random ID Number

                On Fri, 15 Jun 2007 15:41:38 -0500, Seribus Dragon wrote:
                >I know this is an odd question but I thought the new access was actual
                >just a frount end to the destop version of SQLSERVER?
                Hi Seribus,

                No, it's not. You can *use* Access as a front-end to many different
                DB's, including the Desktop Engine, but it also comes with it's own
                builtin "database engine". (I enclose that in quotes because it lacks
                some of the features of truly high-end relational database engines).

                --
                Hugo Kornelis, SQL Server MVP
                My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis

                Comment

                Working...