rand() % n Revisited

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Rich Fife

    rand() % n Revisited

    Quick rand() question:

    I know you're not supposed to use "rand() % 1024" for instance,
    because it focuses on the lower bits. However, it seems to me that
    given that the argument is not a power of two (or near a power of
    two), that this is not an issue. The upper bits will participate
    equally in the result with the lower. Am I missing something?

    Thanks!

    -- Rich --
  • Paul Hsieh

    #2
    Re: rand() % n Revisited

    On Oct 22, 6:04 pm, Rich Fife <rf...@amug.org wrote:
    Quick rand() question:
    >
    I know you're not supposed to use "rand() % 1024" for instance,
    because it focuses on the lower bits. However, it seems to me
    that given that the argument is not a power of two (or near a
    power of two), that this is not an issue. The upper bits will
    participate equally in the result with the lower. Am I missing
    something?
    Yes, you are missing some mathematical analysis to back up what you
    just said. If you do (rand() % 1023) on Microsoft Visual C++ or
    WATCOM C/C++, 32 of the possible outputs will have an extra 3% bias no
    matter how good your random number generator is. No C compiler's
    rand() that I have ever seen has, by itself, a worse effect on random
    output than that.

    The C.L.C. FAQ about this gives extremely misleading advice on this
    point and it should seriously be ignored. If you want to seriously
    deal with random numbers just read my page about it:

    Pobox has been discontinued as a separate service, and all existing customers moved to the Fastmail platform.


    I build up a *REAL* ranged random number generator with reasonable
    performance characteristics , culminating in the randrange() function
    that removes all the primary problems with ranged random numbers. If
    you want something with pure random bit quality you can always use the
    Mersenne Twister or Fortuna as a base for my generator function.

    --
    Paul Hsieh
    Pobox has been discontinued as a separate service, and all existing customers moved to the Fastmail platform.


    Comment

    • lawrence.jones@siemens.com

      #3
      Re: rand() % n Revisited

      Paul Hsieh <websnarf@gmail .comwrote:
      No C compiler's
      rand() that I have ever seen has, by itself, a worse effect on random
      output than that.
      Then you've never seen the truely bad BSD rand() that fostered most of
      the paranoia about rand. With it, rand() % 1 generated the sequence 0,
      1, 0, 1, 0, 1, ....
      --
      Larry Jones

      This game lends itself to certain abuses. -- Calvin

      Comment

      • Kelsey Bjarnason

        #4
        Re: rand() % n Revisited

        On Thu, 23 Oct 2008 10:46:39 -0400, lawrence.jones wrote:
        Paul Hsieh <websnarf@gmail .comwrote:
        >No C compiler's
        >rand() that I have ever seen has, by itself, a worse effect on random
        >output than that.
        >
        Then you've never seen the truely bad BSD rand() that fostered most of
        the paranoia about rand. With it, rand() % 1 generated the sequence 0,
        1, 0, 1, 0, 1, ....
        That would be bad, as rand() % 1 should only ever produce 0.

        Comment

        • Keith Thompson

          #5
          Re: rand() % n Revisited

          Kelsey Bjarnason <kelseyb@lgisp. netwrites:
          On Thu, 23 Oct 2008 10:46:39 -0400, lawrence.jones wrote:
          >
          >Paul Hsieh <websnarf@gmail .comwrote:
          >>No C compiler's
          >>rand() that I have ever seen has, by itself, a worse effect on random
          >>output than that.
          >>
          >Then you've never seen the truely bad BSD rand() that fostered most of
          >the paranoia about rand. With it, rand() % 1 generated the sequence 0,
          >1, 0, 1, 0, 1, ....
          >
          That would be bad, as rand() % 1 should only ever produce 0.
          Obviously Larry is using a font that doesn't distinguish clearly
          enough between '%' and '&'. Yeah, that's it.

          --
          Keith Thompson (The_Other_Keit h) kst-u@mib.org <http://www.ghoti.net/~kst>
          Nokia
          "We must do something. This is something. Therefore, we must do this."
          -- Antony Jay and Jonathan Lynn, "Yes Minister"

          Comment

          • CBFalconer

            #6
            Re: rand() % n Revisited

            lawrence.jones@ siemens.com wrote:
            Paul Hsieh <websnarf@gmail .comwrote:
            >
            >No C compiler's rand() that I have ever seen has, by itself, a
            >worse effect on random output than that.
            >
            Then you've never seen the truely bad BSD rand() that fostered
            most of the paranoia about rand. With it, rand() % 1 generated
            the sequence 0, 1, 0, 1, 0, 1, ....
            FYI the value of "rand() % 1" is identically 0. Except it may be
            considerably slower than just writing "0".

            --
            [mail]: Chuck F (cbfalconer at maineline dot net)
            [page]: <http://cbfalconer.home .att.net>
            Try the download section.

            Comment

            • Phil Carmody

              #7
              Re: rand() % n Revisited

              Keith Thompson <kst-u@mib.orgwrites :
              Kelsey Bjarnason <kelseyb@lgisp. netwrites:
              >On Thu, 23 Oct 2008 10:46:39 -0400, lawrence.jones wrote:
              >>
              >>Paul Hsieh <websnarf@gmail .comwrote:
              >>>No C compiler's
              >>>rand() that I have ever seen has, by itself, a worse effect on random
              >>>output than that.
              >>>
              >>Then you've never seen the truely bad BSD rand() that fostered most of
              >>the paranoia about rand. With it, rand() % 1 generated the sequence 0,
              >>1, 0, 1, 0, 1, ....
              >>
              >That would be bad, as rand() % 1 should only ever produce 0.
              >
              Obviously Larry is using a font that doesn't distinguish clearly
              enough between '%' and '&'. Yeah, that's it.
              I had presumed that ``const int l=2;'' was the line before.

              Obfuscatorially yours,
              Phil
              --
              The fact that a believer is happier than a sceptic is no more to the
              point than the fact that a drunken man is happier than a sober one.
              The happiness of credulity is a cheap and dangerous quality.
              -- George Bernard Shaw (1856-1950), Preface to Androcles and the Lion

              Comment

              • Charles Richmond

                #8
                Re: rand() % n Revisited

                CBFalconer wrote:
                lawrence.jones@ siemens.com wrote:
                >Paul Hsieh <websnarf@gmail .comwrote:
                >>
                >>No C compiler's rand() that I have ever seen has, by itself, a
                >>worse effect on random output than that.
                >Then you've never seen the truely bad BSD rand() that fostered
                >most of the paranoia about rand. With it, rand() % 1 generated
                >the sequence 0, 1, 0, 1, 0, 1, ....
                >
                FYI the value of "rand() % 1" is identically 0. Except it may be
                considerably slower than just writing "0".
                >
                Not if your compiler has a good optimizer... ;-)

                --
                +----------------------------------------------------------------+
                | Charles and Francis Richmond richmond at plano dot net |
                +----------------------------------------------------------------+

                Comment

                • Keith Thompson

                  #9
                  Re: rand() % n Revisited

                  Charles Richmond <frizzle@tx.rr. comwrites:
                  CBFalconer wrote:
                  >lawrence.jones@ siemens.com wrote:
                  >>Paul Hsieh <websnarf@gmail .comwrote:
                  >>>
                  >>>No C compiler's rand() that I have ever seen has, by itself, a
                  >>>worse effect on random output than that.
                  >>Then you've never seen the truely bad BSD rand() that fostered
                  >>most of the paranoia about rand. With it, rand() % 1 generated
                  >>the sequence 0, 1, 0, 1, 0, 1, ....
                  >FYI the value of "rand() % 1" is identically 0. Except it may be
                  >considerably slower than just writing "0".
                  >>
                  >
                  Not if your compiler has a good optimizer... ;-)
                  Maybe. rand() has side effects, so a call to it can't be optimized
                  away, *unless* the compiler can prove that the side effects don't
                  affect the program's output.

                  --
                  Keith Thompson (The_Other_Keit h) kst-u@mib.org <http://www.ghoti.net/~kst>
                  Nokia
                  "We must do something. This is something. Therefore, we must do this."
                  -- Antony Jay and Jonathan Lynn, "Yes Minister"

                  Comment

                  • Eric Sosman

                    #10
                    Re: rand() % n Revisited

                    Keith Thompson wrote:
                    Charles Richmond <frizzle@tx.rr. comwrites:
                    >CBFalconer wrote:
                    >>lawrence.jones@ siemens.com wrote:
                    >>>Paul Hsieh <websnarf@gmail .comwrote:
                    >>>>
                    >>>>No C compiler's rand() that I have ever seen has, by itself, a
                    >>>>worse effect on random output than that.
                    >>>Then you've never seen the truely bad BSD rand() that fostered
                    >>>most of the paranoia about rand. With it, rand() % 1 generated
                    >>>the sequence 0, 1, 0, 1, 0, 1, ....
                    >>FYI the value of "rand() % 1" is identically 0. Except it may be
                    >>considerabl y slower than just writing "0".
                    >>>
                    >Not if your compiler has a good optimizer... ;-)
                    >
                    Maybe. rand() has side effects, so a call to it can't be optimized
                    away, *unless* the compiler can prove that the side effects don't
                    affect the program's output.
                    srand( rand() % 1 );

                    --
                    Eric Sosman
                    esosman@ieee-dot-org.invalid

                    Comment

                    • William Hughes

                      #11
                      Re: rand() % n Revisited

                      On Oct 22, 11:48 pm, Paul Hsieh <websn...@gmail .comwrote:
                      On Oct 22, 6:04 pm, Rich Fife <rf...@amug.org wrote:
                      >
                      Quick rand() question:
                      >
                      I know you're not supposed to use "rand() % 1024" for instance,
                      because it focuses on the lower bits. However, it seems to me
                      that given that the argument is not a power of two (or near a
                      power of two), that this is not an issue. The upper bits will
                      participate equally in the result with the lower. Am I missing
                      something?
                      >
                      Yes, you are missing some mathematical analysis to back up what you
                      just said. If you do (rand() % 1023) on Microsoft Visual C++ or
                      WATCOM C/C++, 32 of the possible outputs will have an extra 3% bias no
                      matter how good your random number generator is.
                      Well, 3% certainly meets the informal meaning of small. If your
                      problem
                      is such that you are worried about the 3% you should probably be more
                      worried about the fact that the rand() you are using has an output
                      space of only 16 bits.
                      No C compiler's
                      rand() that I have ever seen has, by itself, a worse effect on random
                      output than that.
                      Then you have not seen a rand() implementation that switched parity
                      on each call. I understand such an implementation not only existed
                      but
                      was relatively widespread.
                      >
                      The C.L.C. FAQ about this gives extremely misleading advice on this
                      point and it should seriously be ignored.
                      No. Using the advice given in the C.L.C. means that you will get
                      reasonable
                      results, even if rand() implementation is poor, as long as rand()
                      produces
                      integers that are more or less uniformly distributed in 0...RAND_MAX.
                      If you use the "rand() % n" technique you have no such guarantee.
                      The bias is small. Do not confuse detectablity with importance.
                      (The use of "signifcanc e" in the term "statistica l significance"
                      leads many people astray).

                      - William Hughes

                      Comment

                      • Nate Eldredge

                        #12
                        Re: rand() % n Revisited

                        William Hughes <wpihughes@hotm ail.comwrites:
                        >No C compiler's
                        >rand() that I have ever seen has, by itself, a worse effect on random
                        >output than that.
                        >
                        Then you have not seen a rand() implementation that switched parity
                        on each call. I understand such an implementation not only existed
                        but
                        was relatively widespread.
                        Right you are. Here is the rand() implementation from the very
                        influential 4.4BSD-Lite.

                        #define RAND_MAX 0x7fffffff

                        static u_long next = 1;

                        int
                        rand()
                        {
                        return ((next = next * 1103515245 + 12345) % ((u_long)RAND_M AX + 1));
                        }

                        void
                        srand(seed)
                        u_int seed;
                        {
                        next = seed;
                        }

                        Comment

                        • Paul Hsieh

                          #13
                          Re: rand() % n Revisited

                          On Oct 23, 7:45 pm, William Hughes <wpihug...@hotm ail.comwrote:
                          On Oct 22, 11:48 pm, Paul Hsieh <websn...@gmail .comwrote:
                          On Oct 22, 6:04 pm, Rich Fife <rf...@amug.org wrote:
                          Quick rand() question:
                          >
                          I know you're not supposed to use "rand() % 1024" for instance,
                          because it focuses on the lower bits.  However, it seems to me
                          that given that the argument is not a power of two (or near a
                          power of two), that this is not an issue.  The upper bits will
                          participate equally in the result with the lower.  Am I missing
                          something?
                          >
                          Yes, you are missing some mathematical analysis to back up what you
                          just said.  If you do (rand() % 1023) on Microsoft Visual C++ or
                          WATCOM C/C++, 32 of the possible outputs will have an extra 3% bias no
                          matter how good your random number generator is.
                          >
                          Well, 3% certainly meets the informal meaning of small.
                          You might like to explain that to the average casino. Card counting
                          in black
                          jack gives player around a 1% advantage over the house, and the
                          casinos kick
                          out such people whenever they are discovered. People who attack
                          defective
                          casino games rely on people with attitudes like yours.

                          Even if you are implementing something as simple as a 1d20 (where each
                          choice
                          itself is only 5%) in a dungeons and dragons game, the players will
                          easily see
                          that bias over time.
                          [...] If your
                          problem is such that you are worried about the 3% you should probably
                          be more worried about the fact that the rand() you are using has an
                          output space of only 16 bits.
                          That statement doesn't follow any line of logic of any relevance. If
                          you
                          care, then you care, and you want to get a correct ranged random
                          number
                          generator. If you go up to 32 bits, but still have bias that's just
                          a
                          little smaller, how can you be happy? And if you want to write
                          portable
                          code, then what are you going to do?
                          No C compiler's
                          rand() that I have ever seen has, by itself, a worse effect on random
                          output than that.
                          >
                          Then you have not seen a rand() implementation that switched parity
                          on each call.  I understand such an implementation not only existed
                          but was relatively widespread.
                          True enough, but this fundamentally comes from the lack of analysis.
                          The
                          C.L.C. FAQ just continues this tradition by failing to give effective
                          analysis of the problem.
                          The C.L.C. FAQ about this gives extremely misleading advice on this
                          point and it should seriously be ignored.
                          >
                          No.  Using the advice given in the C.L.C.  means that you will get
                          reasonable results, even if rand() implementation is poor, as long
                          as rand() produces integers that are more or less uniformly
                          distributed in 0...RAND_MAX.
                          Did you know that a simple counter will produce numbers that are
                          exactly
                          uniformly distributed in 0 ... RAND_MAX? You know, basic
                          understanding is
                          sometimes actually useful on occasion.
                          If you use the "rand() % n" technique you have no such guarantee.
                          The technique shown in the CLC FAQ also has no such guarantee. Its
                          totally besides the point.
                          The bias is small.
                          Define small. If you want to test how often a hash function will map
                          to
                          a common bucket either (rand() % n) or
                          (rand() * (double) n / (RAND_MAX + 1)) will make no difference. It
                          will
                          produce worthless results no matter what.
                          [...] Do not confuse detectablity with importance.
                          I assure you, I am not the one confused. The C.L.C. FAQ is giving a
                          solution that assumes a policy where low bit determinism is a worse
                          problem than pure measurable bias and also a worse problem than a
                          simple
                          range issue. In fact the CLC FAQ is promoting confusion by not
                          explaining the issue correctly and consequently how one might deal
                          with
                          the problem.
                          (The use of "significan ce" in the term "statistica l significance"
                          leads many people astray).
                          What has that got to do with anything? If you wish to test something
                          with a very small probability which is lower than the bias being
                          introduced by such short-sighted techniques then what good is the
                          C.L.C.
                          FAQs discussion on the subject?

                          Who actually wants to use a PRNG which is biased or incapable of even
                          measuring what you want? The subject deserves to be discussed
                          usefully.
                          The C.L.C. just harps one single anomaly that has resulted for the
                          weakness of the ANSI C standard.

                          --
                          Paul Hsieh
                          Pobox has been discontinued as a separate service, and all existing customers moved to the Fastmail platform.


                          Comment

                          • Richard Bos

                            #14
                            Re: rand() % n Revisited

                            Paul Hsieh <websnarf@gmail .comwrote:
                            The C.L.C. FAQ about this gives extremely misleading advice on this
                            point and it should seriously be ignored. If you want to seriously
                            deal with random numbers just read my page about it:
                            Or possibly don't. If the code on that page is as rotten as the HTML, I
                            wouldn't trust it.

                            Then, search this and other newsgroups for posts by George Marsaglia,
                            who _really_ knows what a good PRNG is like.

                            Richard

                            Comment

                            • William Hughes

                              #15
                              Re: rand() % n Revisited

                              On Oct 23, 11:29 pm, Paul Hsieh <websn...@gmail .comwrote:
                              On Oct 23, 7:45 pm, William Hughes <wpihug...@hotm ail.comwrote:
                              >
                              >
                              >
                              On Oct 22, 11:48 pm, Paul Hsieh <websn...@gmail .comwrote:
                              On Oct 22, 6:04 pm, Rich Fife <rf...@amug.org wrote:
                              Quick rand() question:
                              >
                              I know you're not supposed to use "rand() % 1024" for instance,
                              because it focuses on the lower bits. However, it seems to me
                              that given that the argument is not a power of two (or near a
                              power of two), that this is not an issue. The upper bits will
                              participate equally in the result with the lower. Am I missing
                              something?
                              >
                              Yes, you are missing some mathematical analysis to back up what you
                              just said. If you do (rand() % 1023) on Microsoft Visual C++ or
                              WATCOM C/C++, 32 of the possible outputs will have an extra 3% bias no
                              matter how good your random number generator is.
                              >
                              Well, 3% certainly meets the informal meaning of small.
                              >
                              You might like to explain that to the average casino.

                              The explanation goes "Even differences that are usually
                              considered small, e.g. 3%. can be very very important."
                              If the casino is still in business they will tell
                              me to teach my Grandmother to suck eggs.


                              Card counting
                              in black
                              jack gives player around a 1% advantage over the house,
                              Teach your Grandmother to suck eggs
                              and the casinos kick
                              out such people whenever they are discovered. People who attack
                              defective casino games rely on people with attitudes like yours.
                              >
                              Even if you are implementing something as simple as a 1d20 (where each
                              choice
                              itself is only 5%) in a dungeons and dragons game, the players will
                              easily see
                              that bias over time.
                              Piffle (even if we are talking about a 3% bias and not
                              the less than .1% bias you get with rand()%20 ).
                              And even if someone took the trouble to notice (e.g. tabulated
                              1000's of rolls and applied statistical techniques) they would notice
                              that the bias had no practical import.
                              >
                              [...] If your
                              problem is such that you are worried about the 3% you should probably
                              be more worried about the fact that the rand() you are using has an
                              output space of only 16 bits.
                              >
                              That statement doesn't follow any line of logic of any relevance. If
                              you
                              care, then you care, and you want to get a correct ranged random
                              number
                              generator. If you go up to 32 bits, but still have bias that's just
                              a
                              little smaller, how can you be happy?

                              E.g. I am interested at tail distributions. The performance of
                              my random generator has gone from terrible to reasonable.
                              >And if you want to write portable code, then what are you going to do?
                              Either I don't need much, in which case I can use
                              the system rand() or I provide my_rand().
                              >
                              No C compiler's
                              rand() that I have ever seen has, by itself, a worse effect on random
                              output than that.
                              >
                              Then you have not seen a rand() implementation that switched parity
                              on each call. I understand such an implementation not only existed
                              but was relatively widespread.
                              >
                              True enough, but this fundamentally comes from the lack of analysis.
                              The
                              C.L.C. FAQ just continues this tradition by failing to give effective
                              analysis of the problem.
                              >
                              The C.L.C. FAQ about this gives extremely misleading advice on this
                              point and it should seriously be ignored.
                              >
                              No. Using the advice given in the C.L.C. means that you will get
                              reasonable results, even if rand() implementation is poor, as long
                              as rand() produces integers that are more or less uniformly
                              distributed in 0...RAND_MAX.
                              >
                              Did you know that a simple counter will produce numbers that are
                              exactly
                              uniformly distributed in 0 ... RAND_MAX?

                              Indeed, one needs more than uniformly distributed.
                              The basic point, that the rand() implementation needs
                              to be really bad to produce unreasonable results
                              with the FAQ technique, but the rand() implementation
                              only needs to be a bit bad to produce unreasonable
                              results with the rand()%n technique remains.


                              >You know, basic
                              understanding is
                              sometimes actually useful on occasion.
                              >
                              If you use the "rand() % n" technique you have no such guarantee.
                              >
                              The technique shown in the CLC FAQ also has no such guarantee. Its
                              totally besides the point.
                              >
                              The bias is small.
                              >
                              Define small. If you want to test how often a hash function will map
                              to
                              a common bucket either (rand() % n) or
                              (rand() * (double) n / (RAND_MAX + 1)) will make no difference. It
                              will
                              produce worthless results no matter what.
                              >
                              No. A test that looks for perfection vs bias, will find a bias,
                              but since there are lots and lots of ways of introducing a
                              insignficant (note I did _not_ say "statistica lly insignificant")
                              bias, a test that looks for perfection vs bias is stupid.

                              [...] Do not confuse detectablity with importance.
                              >
                              I assure you, I am not the one confused. The C.L.C. FAQ is giving a
                              solution that assumes a policy where low bit determinism is a worse
                              problem than pure measurable bias and also a worse problem than a
                              simple
                              range issue. In fact the CLC FAQ is promoting confusion by not
                              explaining the issue correctly and consequently how one might deal
                              with
                              the problem.
                              >
                              (The use of "significan ce" in the term "statistica l significance"
                              leads many people astray).
                              >
                              What has that got to do with anything? If you wish to test something
                              with a very small probability which is lower than the bias being
                              introduced by such short-sighted techniques then what good is the
                              C.L.C.
                              FAQs discussion on the subject?
                              >
                              Who actually wants to use a PRNG which is biased
                              [I recall a wonderful poem about an archer who claimed
                              he was best, because, although he never came near
                              the target, he was unbiased, Lack of bias is not
                              everything!]


                              Lots of people don't care a fig. If I want to shuffle cards for a
                              bridge game
                              then I don't care about a 3% bias. (If I want to shuffle cards for
                              a computer poker game, then the fact that the average rand()
                              is about as cryptographicly secure as a Ceasar cypher is more
                              important than a 3% bias). If the wumpus alternates between
                              being in the left half of the maze and the right half of the maze,
                              I care a lot!

                              The CLC FAQ solves a real (although probably now historical) problem.

                              The system rand() may not be suitable for many applicatins.
                              Fixing one problem, which is not a problem in most applications
                              where the system rand() is suitable, does not magically make rand()
                              produce high quality random numbers.

                              - William Hughes

                              Comment

                              Working...