Problem: 'Threads' in Python?

This topic is closed.
  • Ralph Sluiters

    Problem: 'Threads' in Python?

    Hi,
I've got a small problem with my Python script. It is a CGI script, which is
called regularly (e.g. every 5 minutes) and returns an XML data structure.
This script calls a very slow function, with a duration of 10-40 seconds. To
avoid delays, I inserted a cache for the data. So, if the script is called,
it returns the last calculated data structure, and then the function is called
again and the new data is stored in the cache. (There is no problem with using
older but faster data.)

My problem is that the client (a Java program, browser, or command line)
waits until the whole script has ended, so the cache is worthless. How
can I tell the client/browser/... that after the last print line there is no
more data and it can proceed? Or how can I tell the Python script that
everything after the return of the data (the retrieval of the new data and
its storage in a file) can be done in another thread or in the background?

    Greetings

    Ralph


  • Francis Avila

    #2
    Re: Problem: 'Threads' in Python?


Ralph Sluiters wrote in message ...
> Hi,
> I've got a small problem with my Python script. It is a CGI script, which is
> called regularly (e.g. every 5 minutes) and returns an XML data structure.
> This script calls a very slow function, with a duration of 10-40 seconds. To
> avoid delays, I inserted a cache for the data. So, if the script is called,
> it returns the last calculated data structure, and then the function is called
> again and the new data is stored in the cache. (There is no problem with using
> older but faster data.)
>
> My problem is that the client (a Java program, browser, or command line)
> waits until the whole script has ended, so the cache is worthless. How
> can I tell the client/browser/... that after the last print line there is no
> more data and it can proceed? Or how can I tell the Python script that
> everything after the return of the data (the retrieval of the new data and
> its storage in a file) can be done in another thread or in the background?

    Wouldn't a better approach be to decouple the cache mechanism from the cgi
    script? Have a long-running Python process act as a memoizing cache and
    delegate requests to the slow function. The cgi scripts then connect to
    this cache process (via your favorite IPC mechanism). If the cache process
    has a record of the call/request, it returns the previous value immediately,
    and updates its cache in the meantime. If it doesn't have a record, then it
    blocks the cgi script until it gets a result.
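In modern Python, the memoizing part of this idea (minus the IPC plumbing) might be sketched roughly like this; the `MemoizingCache` name, the `slow_fn` callable, and the keys are all illustrative, not from the thread:

```python
import threading

class MemoizingCache:
    """Return the last cached value immediately (if any) and refresh it
    in the background; block only on a cache miss."""

    def __init__(self, slow_fn):
        self.slow_fn = slow_fn      # stands in for the slow (10-40 s) routine
        self.cache = {}             # key -> last computed value
        self.refreshing = set()     # keys with a refresh already in flight
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            if key in self.cache:
                value = self.cache[key]
                if key not in self.refreshing:
                    # Serve the stale value now; recompute in the background.
                    self.refreshing.add(key)
                    threading.Thread(target=self._refresh, args=(key,)).start()
                return value
        # Cache miss: block the caller until the slow function finishes.
        self._refresh(key)
        return self.cache[key]

    def _refresh(self, key):
        value = self.slow_fn(key)   # runs outside the lock
        with self.lock:
            self.cache[key] = value
            self.refreshing.discard(key)
```

A request for a known key returns the previous value at once while the recomputation runs; only the very first request for a key waits.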

    How can threading help you if the cgi-process dies after each request unless
    you store the value somewhere else? And if you store the value somewhere,
    why not have another process manage that storage? If it's possible to
    output a complete page before the cgi script terminates (I don't know if the
    server blocks until the script terminates), then you could do the cache
    updating afterwards. In this case I guess you could use a pickled
    dictionary or something as your cache, and you don't need a separate
    process. But even here you wouldn't necessarily use threads.

    Threads are up there with regexps: powerful, but avoid as much as possible.
    --
    Francis Avila


    • Ralph Sluiters

      #3
      Re: Problem: 'Threads' in Python?

> Wouldn't a better approach be to decouple the cache mechanism from the cgi
> script? Have a long-running Python process act as a memoizing cache and
> delegate requests to the slow function. The cgi scripts then connect to
> this cache process (via your favorite IPC mechanism). If the cache process
> has a record of the call/request, it returns the previous value immediately,
> and updates its cache in the meantime. If it doesn't have a record, then it
> blocks the cgi script until it gets a result.
The caching cannot be decoupled, because the CGI script gets a folder ID and
gets only data from this "folder". So if I decouple the processes, I don't
know which folders to cache, and I cannot cache all folders, because the
routine is too slow. So I must get the actual folder from CGI and then cache
this one as long as the user is in this folder and pulls data every 2
minutes, and cache another folder if the user changes his folder.
> How can threading help you if the cgi-process dies after each request unless
> you store the value somewhere else? And if you store the value somewhere,
> why not have another process manage that storage? If it's possible to
> output a complete page before the cgi script terminates (I don't know if the
> server blocks until the script terminates), then you could do the cache
> updating afterwards. In this case I guess you could use a pickled
> dictionary or something as your cache, and you don't need a separate
> process. But even here you wouldn't necessarily use threads.
The data is too large to store in memory, and with this method, as you
said, threading wouldn't help, but I store the data on disk.

      My code:

# Read from file
try:
    oldfile = open(filename, "r")
    oldresult = string.joinfields(oldfile.readlines(), '\r\n')
    oldfile.close()
except:
    # Start routine
    oldresult = get_data(ID)  # Get xml data

# Print header, so that it is returned via HTTP
print string.joinfields(header, '\r\n')
print oldresult

# ***

# Start routine
result = get_data(ID)  # Get xml data
# Save to file
newfile = open(filename, "w")
newfile.writelines(result)
newfile.close()
# END

At the position *** the rest of the script must be decoupled, so that the
client can proceed with the current data, while the new data generated for
the next request is stored in a file.

      Ralph



      • Dennis Lee Bieber

        #4
        Re: Problem: 'Threads' in Python?

        Ralph Sluiters fed this fish to the penguins on Tuesday 06 January 2004
        02:07 am:

> The caching cannot be decoupled, because the CGI script gets a
> folder ID and gets only data from this "folder". So if I decouple the
> processes, I don't know which folders to cache, and I cannot cache all
> folders, because the routine is too slow. So I must get the actual
> folder from CGI and then cache this one as long as the user is in this
> folder and pulls data every 2 minutes, and cache another folder if
> the user changes his folder.
>
        I've been having some difficulty following this thread but...

        Isn't this what Cookies are for? Obtaining some sort of user ID/state
        that can be passed into the processing to allow for continuing from a
        previous connection?

        HTTP is normally stateless. The client requests a page, the page
        contents are obtained (either a static page, or some CGI-style
        computation generates the immediate page data), the page is returned,
        and the connection ends. If the page needs to be updated, that is a
        completely separate transaction.

        Cookies are used to link these separate transactions into one "whole";
        the first time the client requests the page, a cookie is generated. On
        subsequent requests (updates) the (now) existing cookie is sent back to
        the server to identify the user and allow for selecting the proper
        continuation state.
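That handshake can be sketched with Python's standard cookie parser; the `session` cookie name and the helper below are illustrative only:

```python
from http.cookies import SimpleCookie
import uuid

def session_id(cookie_header):
    """Reuse the session id from an incoming Cookie header if present;
    otherwise mint a fresh one. Returns (id, is_new)."""
    cookie = SimpleCookie(cookie_header or "")
    if "session" in cookie:
        return cookie["session"].value, False
    return uuid.uuid4().hex, True
```

On the first request `is_new` is true and the server sends the cookie back with the page; later requests echo it, letting the server find the user's continuation state.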
>
> At the position *** the rest of the script must be decoupled, so that
> the client can proceed with the current data, while the new data
> generated for the next request is stored in a file.
>
        I've not coded CGI stuff (don't have access to a server that permits
        user CGI) but my rough view of this task would be:

CGI ******
if no cookie
    generate a cookie for this user
endif
pass (received or generated) cookie to background process
wait for return-data from background process (if a new cookie, this
    will take time to compute, otherwise the background process should
    already have computed it)
return web-page with cookie and data

Background ******
loop
    scan "cache" list for expired cookies (unused threads)
        terminate related process thread (process thread should clean up
        disk files used)
        clean up (delete) cookie from "cache" list
    get request (and cookie) from CGI
    if the cookie is not in the "cache" list
        create new processing thread
    endif
    use cookie data to identify the (existing) processing thread and read
        the next data batch from it (Queue.Queue perhaps, one queue per
        cookie)
    return data (processing thread continues to compute next update)
endloop
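The Background loop above, boiled down to a stripped Python sketch (one worker thread and one queue per cookie; the `produce` callable stands in for the slow folder routine, and the cookie-expiry bookkeeping is omitted):

```python
import queue
import threading

class Background:
    """One precomputing worker thread per cookie, each feeding a
    one-slot queue of ready data batches."""

    def __init__(self, produce):
        self.produce = produce   # stands in for the slow folder routine
        self.queues = {}         # cookie -> queue of computed batches

    def request(self, cookie):
        if cookie not in self.queues:
            q = queue.Queue(maxsize=1)
            self.queues[cookie] = q
            threading.Thread(target=self._worker, args=(cookie, q),
                             daemon=True).start()
        # First request blocks until computed; later ones find a batch waiting.
        return self.queues[cookie].get()

    def _worker(self, cookie, q):
        while True:
            q.put(self.produce(cookie))  # precompute the next update
```

Only the first request for a cookie pays the full computation time; the worker immediately starts on the next batch after each one is taken.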


You probably want to include, in "Background", a bit of logic to track
        "last request time" and terminate processing threads if no client has
        asked for an update in some period of time. The Cookies should also
        have expiration times associated so that reconnecting after a period of
        time will force a new cookie.

        As for the folder? If the user physically navigates to other folders,
        that can be passed to the background process and used to update the
        threads (or create a new thread, if you assume the cookie identifies a
        folder).

        Caching would be semi-automatic here. The processing threads could be
        folder specific, and when the thread is terminated (on lack of update
        requests... let's see, you expect 2-minute update period, allow for a
        slow net, say you terminate a process after 5 minutes of disuse...) you
        can clean up the disk space (folder) that process was using. The cookie
        expiration time would be updated on each update.

        The master web page should have whatever HTML tags force a timed
        reload to do a new request every 2 minutes.
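One common way to get that timed reload is a meta refresh tag, which the CGI script can simply print as part of the page head (120 seconds shown here to match the 2-minute period):

```python
# Ask the browser to re-request the page every 120 seconds.
REFRESH_TAG = '<meta http-equiv="refresh" content="120">'
print(REFRESH_TAG)
```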

--
> ================================================================ <
> wlfraed@ix.netcom.com | Wulfraed Dennis Lee Bieber KD6MOG <
> wulfraed@dm.net | Bestiaria Support Staff <
> ================================================================ <
> Bestiaria Home Page: http://www.beastie.dm.net/ <
> Home Page: http://www.dm.net/~wulfraed/ <


        • Ralph Sluiters

          #5
          Re: Problem: 'Threads' in Python?

You did everything but answer my question. I know what cookies are, but
I don't need cookies here. And you said in your answer "start background
process"; that was my question: how can I start a background process?

          But I've solved it now,

          Ralph



          • Ralph Sluiters

            #6
            I got the solution [was:Re: Problem: 'Threads' in Python?]

            Simply put the last part in an extra file 'cachedata.py', then use

import os
os.spawnlp(os.P_NOWAIT, 'python', 'python', 'cachedata.py')

to call this as a child process and DON'T wait for it.

            Ralph

