threading - race condition?

  • skunkwerk

    threading - race condition?

    i'm getting the wrong output for the 'title' attributes for this
    data. the queue holds a data structure (item name, position, and list
    to store results in). each thread takes in an item name and queries a
    database for various attributes. from the debug statements the item
    names are being retrieved correctly, but the attributes returned are
    those of other items in the queue - not its own item. however, the
    model variable is not a global variable... so i'm not sure what's
    wrong.

    i've declared a bunch of worker threads (100) and a queue into which
    new requests are inserted, like so:

    queue = Queue.Queue(0)
    WORKERS=100
    for i in range(WORKERS):
        thread = SDBThread(queue)
        thread.setDaemon(True)
        thread.start()

    the thread:

    class SimpleDBThread ( threading.Thread ):
        def __init__ ( self, queue ):
            self.__queue = queue
            threading.Thread.__init__ ( self )
        def run ( self ):
            while 1:
                item = self.__queue.get()
                if item!=None:
                    model = domain.get_item(item[0])
                    logger.debug('sdbthread item:'+item[0])
                    title = model['title']
                    scraped = model['scraped']
                    logger.debug("sdbthread title:"+title)

    any suggestions?
    thanks
  • John Nagle

    #2
    Re: threading - race condition?

    Hm. We don't have enough code here to see what's wrong.
    For one thing, we're not seeing how items get put on the queue. The
    trouble might be at the "put" end.

    Make sure that "model", "item", "title", and "scraped" are not globals.
    Remember, any assignment to them in a global context makes them a global.
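
A small illustration of the point about globals, written for Python 3 (the code in this thread is Python 2). The function names `lookup_local` and `lookup_shared` and the fake `model` values are hypothetical, not taken from the original code. A name assigned inside a function is local to each call, so every thread gets its own copy; a name declared `global` is one module-level slot that every thread overwrites.

```python
import threading

model = None  # module-level: one slot shared by every thread


def lookup_local(name, out):
    # 'model' is assigned inside the function, so it is a local variable;
    # each call (and therefore each thread) has its own copy.
    model = {"title": "title of " + name}
    out[name] = model["title"]


def lookup_shared(name):
    # 'global' makes every caller write to the same module-level name.
    global model
    model = {"title": "title of " + name}


results = {}
threads = [threading.Thread(target=lookup_local, args=(n, results))
           for n in ("a", "b", "c")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The local version never touched the module-level name.
untouched = model is None

lookup_shared("a")
lookup_shared("b")
# Only the last write survives in the shared slot.
shared_title = model["title"]
```

If the worker threads used something like `lookup_shared`, one thread could read `model` after another thread had already replaced it, which would match the symptom described in the first post.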

    You should never get "None" from the queue unless you put a "None"
    on the queue. "get()" blocks until there's work to do.

    John Nagle
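
The behaviour of `get()` can be checked in isolation. This is a minimal Python 3 sketch (the module is spelled `Queue` in Python 2); the queued values are made up for the demonstration. A timeout is used only so the example does not hang: an empty queue raises `queue.Empty` after the timeout and never returns `None` by itself.

```python
import queue  # named 'Queue' in Python 2

q = queue.Queue()

# Nothing has been put yet. A plain get() would block forever, so use a
# timeout to show that an empty queue raises queue.Empty instead of
# handing back None.
try:
    q.get(timeout=0.1)
    got_empty = False
except queue.Empty:
    got_empty = True

# None only comes out if something put it in.
q.put(None)
q.put(["item-1", 1])
first = q.get()
second = q.get()
```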


    • skunkwerk

      #3
      Re: threading - race condition?

      On May 9, 12:12 am, John Nagle <na...@animats.com> wrote:
      > Hm. We don't have enough code here to see what's wrong.
      > For one thing, we're not seeing how items get put on the queue. The
      > trouble might be at the "put" end.
      >
      > Make sure that "model", "item", "title", and "scraped" are not globals.
      > Remember, any assignment to them in a global context makes them a global.
      >
      > You should never get "None" from the queue unless you put a "None"
      > on the queue. "get()" blocks until there's work to do.
      >
      > John Nagle

      thanks John, Gabriel,
      here's the 'put' side of the requests:

      def prepSDBSearch(results):
          modelList = [0]
          counter=1
          for result in results:
              data = [result.item, counter, modelList]
              queue.put(data)
              counter+=1
          while modelList[0] < len(results):
              print 'waiting...'#wait for them to come home
          modelList.pop(0)#now remove '0'
          return modelList

      responses to your follow ups:
      1) 'item' in the threads is a list that corresponds to the 'data'
      list in the above function. it's not global, and the initial values
      seem ok, but i'm not sure if every time i pass in data to the queue it
      passes in the same memory address or declares a new 'data' list (which
      I guess is what I want)
      2) john, i don't think any of the variables you mentioned are
      global. the 'none' check was just for extra safety.
      3) the first item in the modelList is a counter that keeps track of
      the number of threads for this call that have completed - is there any
      better way of doing this?

      thanks again
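
On points 1 and 3: `data = [...]` builds a new list on every pass through the loop, so each queue entry is a distinct object; only `modelList` is shared between them. For the completion counter, the standard library's `task_done()` and `join()` on the queue can replace the busy-wait loop. The following is a Python 3 sketch under assumptions: `fake_get_item` is a hypothetical stand-in for `domain.get_item`, and the worker count of 4 is arbitrary.

```python
import queue
import threading

work = queue.Queue()


def fake_get_item(name):
    # Hypothetical stand-in for domain.get_item(); returns a dict-like record.
    return {"title": "title of " + name}


def worker():
    while True:
        name, position, results = work.get()
        try:
            # Each entry writes only to its own slot, so no counter is needed.
            results[position] = fake_get_item(name)["title"]
        finally:
            work.task_done()  # tell join() that this entry is finished


for _ in range(4):
    threading.Thread(target=worker, daemon=True).start()


def prep_search(names):
    results = [None] * len(names)
    for position, name in enumerate(names):
        # A new tuple is built on every iteration; only 'results' is shared.
        work.put((name, position, results))
    work.join()  # blocks until task_done() has been called for every put()
    return results


titles = prep_search(["a", "b", "c"])
```

One caveat: `join()` waits for everything on the queue, so if several searches share one queue at the same time, each call would also wait for the others' entries. A per-search counter protected by a lock, or a per-search event, would avoid that.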


      • Gabriel Genellina

        #4
        Re: threading - race condition?

        On Sun, 11 May 2008 13:16:25 -0300, skunkwerk <skunkwerk@gmail.com> wrote:
        > the only issue i have now is that it takes a long time for 100 threads
        > to initialize that connection (>5 minutes) - and as i'm doing this on
        > a webserver any time i update the code i have to restart all those
        > threads, which i'm doing right now in a for loop. is there any way I
        > can keep the thread stuff separate from the rest of the code for this
        > file, yet allow access?

        Like using a separate thread to create the other 100?

        --
        Gabriel Genellina
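
A rough Python 3 sketch of that suggestion, under assumptions: `slow_worker_init` is a hypothetical stand-in for constructing a worker whose setup opens a slow database connection, and the count of 5 is only for the demonstration. The starter runs in its own thread, so the code that launches it is not blocked while the workers come up one by one.

```python
import threading
import time

workers = []
workers_lock = threading.Lock()


def slow_worker_init():
    # Hypothetical stand-in for a worker whose constructor opens a slow
    # database connection.
    time.sleep(0.01)
    return threading.Thread(target=lambda: None, daemon=True)


def start_workers(count):
    # Runs in its own thread, so the caller is not blocked while the
    # workers are created one by one.
    for _ in range(count):
        t = slow_worker_init()
        t.start()
        with workers_lock:
            workers.append(t)


starter = threading.Thread(target=start_workers, args=(5,), daemon=True)
starter.start()
# The main thread is free to continue here while workers come up.
starter.join()  # only for the demonstration; a server would not wait
```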


        • skunkwerk

          #5
          Re: threading - race condition?

          On May 11, 1:55 pm, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
          > On Sun, 11 May 2008 09:16:25 -0700 (PDT), skunkwerk
          > <skunkw...@gmail.com> declaimed the following in comp.lang.python:
          >
          > > the only issue i have now is that it takes a long time for 100 threads
          > > to initialize that connection (>5 minutes) - and as i'm doing this on
          > > a webserver any time i update the code i have to restart all those
          > > threads, which i'm doing right now in a for loop. is there any way I
          > > can keep the thread stuff separate from the rest of the code for this
          > > file, yet allow access? It wouldn't help having a .pyc or using
          > > psyco, correct, as the time is being spent in the runtime? something
          > > along the lines of 'start a new thread every minute until you get to a
          > > 100' without blocking the execution of the rest of the code in that
          > > file? or maybe any time i need to do a search, start a new thread if
          > > the #threads is <100?
          >
          > Is this running as part of the server process, or as a client
          > accessing the server?
          >
          > Alternative question: Have you tried measuring the performance using
          > /fewer/ threads... 25 or less? I believe I'd mentioned prior that you
          > seem to have a lot of overhead code for what may be a short query.
          >
          > If the .get_item() code is doing a full sequence of: connect to
          > database; format&submit query; fetch results; disconnect from
          > database... I'd recommend putting the connect/disconnect outside of the
          > thread while loop (though you may then need to put sentinel values into
          > the feed queue -- one per thread -- so they can cleanly exit and
          > disconnect rather than relying on daemonization for exit).
          >
          > thread:
          >     dbcon = ...
          >     while True:
          >         query = Q.get()
          >         if query == SENTINEL: break
          >         result = get_item(dbcon, query)
          >         ...
          >     dbcon.close()
          >
          > Third alternative: Find some way to combine the database queries.
          > Rather than 100 threads each doing a single lookup (from your code, it
          > appears that only 1 result is expected per search term), run 10 threads
          > each looking up 10 items at once...
          >
          > thread:
          >     dbcon = ...
          >     terms = []
          >     terminate = False
          >     while not terminate:
          >         while len(terms) < 10:
          >             query = Q.get_nowait()
          >             if not query: break
          >             if query == SENTINEL:
          >                 terminate = True
          >                 break
          >             terms.append(query)
          >         results = get_item(dbcon, terms)
          >         terms = []
          >         #however you are returning items; match the query term to the
          >         #key item in the list of returned data?
          >     dbcon.close()
          >
          > where the final select statement looks something like:
          >
          > SQL = """select key, title, scraped from ***
          >     where key in ( %s )""" % ", ".join("?" for x in terms)
          >     #assumes database adapter uses ? for placeholder
          > dbcur.execute(SQL, terms)
          > --
          > Wulfraed Dennis Lee Bieber KD6MOG
          > wlfr...@ix.netcom.com wulfr...@bestiaria.com
          >
          > (Bestiaria Support Staff: web-a...@bestiaria.com)
          > HTTP://www.bestiaria.com/

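
The batching idea quoted above can be sketched as runnable Python 3, with several assumptions stated up front: the in-memory `sqlite3` table `items` and its rows are invented for the demonstration, `drain_batch` and `lookup` are hypothetical helper names, and `SENTINEL` is an arbitrary marker object. One detail differs from the pseudocode: `get_nowait()` raises `queue.Empty` when nothing is waiting rather than returning a false value, so the sketch catches that exception.

```python
import queue
import sqlite3

SENTINEL = object()  # hypothetical marker telling the worker to stop


def drain_batch(q, limit=10):
    """Block for one term, then take up to limit-1 more without waiting.

    Returns (terms, terminate).
    """
    terms = []
    terminate = False
    item = q.get()                 # wait until there is something to do
    while True:
        if item is SENTINEL:
            terminate = True
            break
        terms.append(item)
        if len(terms) >= limit:
            break
        try:
            item = q.get_nowait()  # raises queue.Empty when nothing is waiting
        except queue.Empty:
            break
    return terms, terminate


def lookup(dbcon, terms):
    # One "?" placeholder per term; sqlite3 uses the qmark style.
    placeholders = ", ".join("?" for _ in terms)
    sql = "select key, title, scraped from items where key in (%s)" % placeholders
    # Key the rows by search term so each result can be matched to its query.
    return {row[0]: row[1:] for row in dbcon.execute(sql, terms)}


dbcon = sqlite3.connect(":memory:")
dbcon.execute("create table items (key text, title text, scraped integer)")
dbcon.executemany("insert into items values (?, ?, ?)",
                  [("a", "Title A", 1), ("b", "Title B", 0), ("c", "Title C", 1)])

q = queue.Queue()
for term in ("a", "c"):
    q.put(term)
q.put(SENTINEL)

terms, terminate = drain_batch(q)
found = lookup(dbcon, terms)
dbcon.close()
```

Returning the rows as a dictionary keyed on the search term is one way to match each result to its query, which the comment in the pseudocode leaves open.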
          thanks again Dennis,
          i chose 100 threads so i could do 10 simultaneous searches (where
          each search contains 10 terms - using 10 threads). the .get_item()
          code is not doing the database connection - rather the initialization
          is done in the initialization of each thread. so basically once a
          thread starts the database connection is persistent and .get_item
          queries are very fast. this is running as a server process (using
          django).

          cheers
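
With a persistent connection per thread, the sentinel approach from the quoted message lets each worker close its connection on the way out. A minimal Python 3 sketch, assuming a hypothetical `FakeConnection` class in place of the real database connection and `None` as the shutdown marker:

```python
import queue
import threading

SENTINEL = None  # hypothetical shutdown marker; one is queued per worker
closed = []      # records which connections were closed, for the demo


class FakeConnection:
    # Hypothetical stand-in for a real database connection.
    def get_item(self, name):
        return {"title": "title of " + name}

    def close(self):
        closed.append(self)


def worker(q, out):
    dbcon = FakeConnection()      # opened once per thread, outside the loop
    try:
        while True:
            name = q.get()
            if name is SENTINEL:
                break             # leave the loop so close() runs
            out[name] = dbcon.get_item(name)["title"]
    finally:
        dbcon.close()


q = queue.Queue()
out = {}
threads = [threading.Thread(target=worker, args=(q, out)) for _ in range(3)]
for t in threads:
    t.start()
for name in ("a", "b"):
    q.put(name)
for _ in threads:
    q.put(SENTINEL)               # one sentinel per worker
for t in threads:
    t.join()
```

Because each worker exits after reading one sentinel, queuing exactly one per worker shuts all of them down without relying on daemon threads.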


          • skunkwerk

            #6
            Re: threading - race condition?

            On May 11, 9:10 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
            > Like using a separate thread to create the other 100?
            >
            > --
            > Gabriel Genellina

            thanks Gabriel,
            i think that could do it - let me try it out. don't know why i
            didn't think of it earlier.
