Most pythonic way of weighted random selection

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Manuel Ebert

    Most pythonic way of weighted random selection

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1

    Dear list,

    who's got aesthetic advice for the following problem? I've got some
    joint probabilities of two distinct events Pr(X=x, Y=y), stored in a
    list of lists of floats, where every row represents a possible
    outcome of X and every float in a row a possible outcome of Y (which
    I will now call my matrix, altough I can't use numpy or numeric for
    this), so e.g. m = [[0.2, 0.4, 0.05], [0.1, 0.05, 0.2]]. All lists in
    the list are equally long and the values of the flattened list add up
    to 1.0, i.e. sum([sum(row) for row in m]) == 1. In practice, this
    'matrix' is about 20x40, i.e. a list with 20 lists á 40 floats each.

    Now the task is to select one outcome for X and Y based on the joint
    probabilites, and afterwards check that the outcomes fullfill certain
    criteria. If it doesn't fulfill some criteria a new pair of outcomes
    has to be selected, for other criteria it will still be selected with
    a certain probability. My approach was to choose a random number, and
    then walk through the list adding the values together until the
    accumulated sum is greater than my random threshold:

    import random
    r = random.random()
    s = 0.0
    p = 0.2 # probability of selecting outcome if it doesn't fulfill
    criterion 2
    break_loop = False
    while not break_loop:
    for row_index in range(len(m)):
    for col_index in range(len(row)) :
    s += m[row_index][col_index]
    if s >= r:
    if not fulfills_criter ion_a(row_index , col_index):
    break_loop = True
    elif not fulfills_criter ion_b(row_index , col_index):
    if random.random() <= p:
    return row_index, col_index
    else:
    break_loop = True
    else:
    return row_index, col_index
    if break_loop: break
    if break_loop: break
    break_loop = False


    Now that looks plain ugly, and I wonder whether you might find a
    slightly more elegant way of doing it without using numpy and the like.

    Bests,
    Manuel
    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.7 (Darwin)

    iD8DBQFIuWoncZ7 0OCIgLecRArV4AJ 9ynhC/McegMIYTWOOOW4p 44t3rWgCbBjvm
    1JRHy5kp1qIGLDa CTXXFcSs=
    =X6Sv
    -----END PGP SIGNATURE-----
  • bearophileHUGS@lycos.com

    #2
    Re: Most pythonic way of weighted random selection

    Manuel Ebert, this may be related/useful:


    Note that numpy has a bisection method/function that are probably
    quite faster.

    Bye,
    bearophile

    Comment

    • duncan smith

      #3
      Re: Most pythonic way of weighted random selection

      Manuel Ebert wrote:
      Dear list,
      >
      who's got aesthetic advice for the following problem? I've got some
      joint probabilities of two distinct events Pr(X=x, Y=y), stored in a
      list of lists of floats, where every row represents a possible outcome
      of X and every float in a row a possible outcome of Y (which I will now
      call my matrix, altough I can't use numpy or numeric for this), so e.g.
      m = [[0.2, 0.4, 0.05], [0.1, 0.05, 0.2]]. All lists in the list are
      equally long and the values of the flattened list add up to 1.0, i.e.
      sum([sum(row) for row in m]) == 1. In practice, this 'matrix' is about
      20x40, i.e. a list with 20 lists á 40 floats each.
      >
      Now the task is to select one outcome for X and Y based on the joint
      probabilites, and afterwards check that the outcomes fullfill certain
      criteria. If it doesn't fulfill some criteria a new pair of outcomes has
      to be selected, for other criteria it will still be selected with a
      certain probability. My approach was to choose a random number, and then
      walk through the list adding the values together until the accumulated
      sum is greater than my random threshold:
      >
      [snip]

      For a list of cumulative probabilities you could use the bisect module
      to find the insertion point for a 0,1 uniform variate (which you could
      then map back to a cell index).

      Alternatively you could use an alias table,
      http://amath.colorado.edu/courses/74.../Web/SS-10.ppt.

      You could produce a probability table that takes into account your
      criteria, so that you don't need to check them. i.e. Each cell that
      does not satisfy criterion a has probability 0 of being selected. Each
      other cell has a probability of being selected proportional to its
      original value if it satisfies criterion b, or its original value * p
      otherwise. Normalise the resulting values to sum to 1 and you're done
      with the need to check against criteria.

      So, preprocess your table, create a corresponding list of cumulative
      probabilities and use the bisect model (is one possibility).

      Duncan

      Comment

      • Steven D'Aprano

        #4
        Re: Most pythonic way of weighted random selection

        On Sat, 30 Aug 2008 17:41:27 +0200, Manuel Ebert wrote:
        Dear list,
        >
        who's got aesthetic advice for the following problem?
        ....

        [ugly code removed]
        Now that looks plain ugly, and I wonder whether you might find a
        slightly more elegant way of doing it without using numpy and the like.
        Never be afraid to factor out pieces of code into small functions.
        Writing a single huge while loop that does everything is not only hard to
        read, hard to write and hard to debug, but it can also run slower. (It
        depends on a number of factors.)

        Anyway, here's my attempt to solve the problem, as best as I can
        understand it:


        import random

        def eq(x, y, tol=1e-10):
        # floating point equality within some tolerance
        return abs(x-y) <= tol

        M = [[0.2, 0.4, 0.05], [0.1, 0.05, 0.2]]
        # the sums of each row must sum to 1.0
        assert eq(1.0, sum([sum(row) for row in M]))

        # build a cumulative probability matrix
        CM = []
        for row in M:
        for p in row:
        CM.append(p) # initialize with the raw probabilities

        for i in range(1, len(CM)):
        CM[i] += CM[i-1] # and turn into cumulative probabilities

        assert CM[0] >= 0.0
        assert eq(CM[-1], 1.0)

        def index(data, p):
        """Return the index of the item in data
        which is no smaller than float p.
        """
        # Note: this uses a linear search. If it is too slow,
        # you can re-write it using the bisect module.
        for i, x in enumerate(data) :
        if x >= p:
        return i
        return len(data-1)

        def index_to_rowcol umn(i, num_columns):
        """Convert a linear index number i into a (row, column) tuple."""
        # When converting [ [a, b, c, ...], [...] ] into a single
        # array [a, b, c, ... z] we have the identity:
        # index number = row number * number of columns + column number
        return divmod(i, num_columns)

        # Now with these two helper functions, we can find the row and column
        # number of the first entry in M where the cumulative probability
        # exceeds some given value.

        # You will need to define your own fulfills_criter ion_a and
        # fulfills_criter ion_b, but here's a couple of mock functions
        # for testing with:

        def fulfills_criter ion_a(row, column):
        return random.random() < 0.5

        fulfills_criter ion_b = fulfills_criter ion_a

        def find_match(p=0. 2):
        while True:
        r = random.random()
        i = index(CM, r)
        row, column = index_to_rowcol umn(i, len(M[0]))
        if fulfills_criter ion_a(row, column) or \
        fulfills_criter ion_b(row, column):
        return row, column
        else:
        if random.random() <= p:
        return row, column


        And here's my test:
        >>find_match( )
        (1, 2)


        Hope this helps.


        --
        Steven

        Comment

        Working...