Locking around

  • Nikolaus Rath

    Locking around

    Hello,

    I need to synchronize the access to a couple of hundred-thousand
    files[1]. It seems to me that creating one lock object for each of the
    files is a waste of resources, but I cannot use a global lock for all
    of them either (since the locked operations go over the network, this
    would make the whole application essentially single-threaded even
    though most operations act on different files).

    My idea is therefore to create and destroy per-file locks "on-demand"
    and to protect the creation and destruction by a global lock
    (self.global_lock). For that, I add a "usage counter"
    (wlock.user_count) to each lock, and destroy the lock when it reaches
    zero. The number of currently active lock objects is stored in a dict:

    def lock_s3key(s3key):

        self.global_lock.acquire()
        try:

            # If there is a lock object, use it
            if self.key_lock.has_key(s3key):
                wlock = self.key_lock[s3key]
                wlock.user_count += 1
                lock = wlock.lock

            # otherwise create a new lock object
            else:
                wlock = WrappedLock()
                wlock.lock = threading.Lock()
                wlock.user_count = 1
                self.key_lock[s3key] = wlock
                lock = wlock.lock

        finally:
            self.global_lock.release()

        # Lock the key itself
        lock.acquire()


    and similarly

    def unlock_s3key(s3key):

        # Lock dictionary of lock objects
        self.global_lock.acquire()
        try:

            # Get lock object
            wlock = self.key_lock[s3key]

            # Unlock key
            wlock.lock.release()

            # We don't use the lock object any longer
            wlock.user_count -= 1
            assert wlock.user_count >= 0

            # If no other thread uses the lock, dispose of it
            if wlock.user_count == 0:
                del self.key_lock[s3key]

        finally:
            self.global_lock.release()


    WrappedLock is just an empty class that allows me to add the
    additional user_count attribute.
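    For reference, such a wrapper can be as small as the following sketch
    (a guess at the definition, since it is not shown in the post):

```python
import threading

class WrappedLock(object):
    """Empty container class: holds a real lock plus a usage counter."""
    pass

# Attributes are simply attached on demand, as the post describes.
wlock = WrappedLock()
wlock.lock = threading.Lock()
wlock.user_count = 1
print(wlock.user_count)  # 1
```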


    My questions:

    - Does that look like a proper solution, or does anyone have a better
    one?

    - Did I overlook any deadlock possibilities?


    Best,
    Nikolaus



    [1] Actually, it's not really files (because in that case I could use
    fcntl) but blobs stored on Amazon S3.


    --
    »It is not worth an intelligent man's time to be in the majority.
    By definition, there are already enough people to do that.«
    -J.H. Hardy

    PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C
  • Nikolaus Rath

    #2
    Re: Locking around

    Ulrich Eckhardt <eckhardt@satorlaser.com> writes:
    Nikolaus Rath wrote:
    >I need to synchronize the access to a couple of hundred-thousand
    >files[1]. It seems to me that creating one lock object for each of the
    >files is a waste of resources, but I cannot use a global lock for all
    >of them either (since the locked operations go over the network, this
    >would make the whole application essentially single-threaded even
    >though most operations act on different files).
    >
    > Just wondering, but at what time do you know what files are needed?

    As soon as I have read a client request. Also, I will only need one
    file per request, not multiple.

    > If you know that rather early, you could simply 'check out' the
    > required files, do whatever you want with them and then release them
    > again. If one of the requested files is marked as already in use,
    > you simply wait (without reserving the others) until someone
    > releases files and then try again. You could also wait for that
    > precise file to be available, but that would require that you
    > already reserve the other files, which might unnecessarily block
    > other accesses.
    >
    > Note that this idea requires that each access locks one set of files at the
    > beginning and releases them at the end, i.e. no attempts to lock files in
    > between, which would otherwise easily lead to deadlocks.

    I am not sure that I understand your idea. To me this sounds exactly
    like what I'm already doing, just replace 'check out' by 'lock' in
    your description... Am I missing something?
    >My idea is therefore to create and destroy per-file locks "on-demand"
    >and to protect the creation and destruction by a global lock
    >(self.global_lock). For that, I add a "usage counter"
    >(wlock.user_count) to each lock, and destroy the lock when it reaches
    >zero.
    [...code...]
    >
    > - Does that look like a proper solution, or does anyone have a better
    > one?
    >
    > This should work, at least the idea is not flawed. However, I'd say
    > there are too many locks involved. Rather, you just need a simple
    > flag and the global lock. Further, you need a condition/event that
    > tells waiting threads that you released some of the files so that it
    > should see again if the ones it wants are available.

    I have to agree that this sounds like an easier implementation. I just
    have to think about how to do the signalling. Thanks a lot!


    Best,

    -Nikolaus



    • Nikolaus Rath

      #3
      Re: Locking around

      Nikolaus Rath <Nikolaus@rath.org> writes:
      > > This should work, at least the idea is not flawed. However, I'd say
      > > there are too many locks involved. Rather, you just need a simple
      > > flag and the global lock. Further, you need a condition/event that
      > > tells waiting threads that you released some of the files so that it
      > > should see again if the ones it wants are available.
      >
      > I have to agree that this sounds like an easier implementation. I
      > just have to think about how to do the signalling. Thanks a lot!

      Here's the code I use now. I think it's also significantly easier to
      understand (cv is a threading.Condition() object and cv.locked_keys a
      set()).

      def lock_s3key(s3key):
          cv = self.s3_lock

          try:
              # Lock set of locked s3 keys (global lock)
              cv.acquire()

              # Wait for given s3 key becoming unused
              while s3key in cv.locked_keys:
                  cv.wait()

              # Mark it as used (local lock)
              cv.locked_keys.add(s3key)
          finally:
              # Release global lock
              cv.release()


      def unlock_s3key(s3key):
          cv = self.s3_lock

          try:
              # Lock set of locked s3 keys (global lock)
              cv.acquire()

              # Mark key as free (release local lock)
              cv.locked_keys.remove(s3key)

              # Notify other threads
              cv.notify()

          finally:
              # Release global lock
              cv.release()


      Best,

      -Nikolaus



      • Carl Banks

        #4
        Re: Locking around

        On Aug 4, 9:30 am, Nikolaus Rath <Nikol...@rath.org> wrote:
        > Hello,
        >
        > I need to synchronize the access to a couple of hundred-thousand
        > files[1]. It seems to me that creating one lock object for each of the
        > files is a waste of resources, but I cannot use a global lock for all
        > of them either (since the locked operations go over the network, this
        > would make the whole application essentially single-threaded even
        > though most operations act on different files).
        >
        > My idea is therefore to create and destroy per-file locks "on-demand"
        > and to protect the creation and destruction by a global lock
        > (self.global_lock). For that, I add a "usage counter"
        > (wlock.user_count) to each lock, and destroy the lock when it reaches
        > zero.
        [snip]
        > My questions:
        >
        > - Does that look like a proper solution, or does anyone have a better
        > one?

        You don't need the per-file locks at all if you use a global lock like
        this. Here's a way to do it using threading.Condition objects. I
        suspect it might not perform so well if there is a lot of competition
        for certain keys, but it doesn't sound like that's the case for you.
        Performance and robustness improvements left as an exercise. (Note:
        I'm not sure where self comes from in your examples so I left it out
        of mine.)


        global_lock = threading.Condition()
        locked_keys = set()

        def lock_s3key(s3key):
            global_lock.acquire()
            while s3key in locked_keys:
                global_lock.wait()
            locked_keys.add(s3key)
            global_lock.release()

        def unlock_s3key(s3key):
            global_lock.acquire()
            locked_keys.remove(s3key)
            global_lock.notifyAll()
            global_lock.release()



        Carl Banks


        • Carl Banks

          #5
          Re: Locking around

          On Aug 6, 6:34 am, Nikolaus Rath <Nikol...@rath.org> wrote:
          > Nikolaus Rath <Nikol...@rath.org> writes:
          > > This should work, at least the idea is not flawed. However, I'd say
          > > there are too many locks involved. Rather, you just need a simple
          > > flag and the global lock. Further, you need a condition/event that
          > > tells waiting threads that you released some of the files so that it
          > > should see again if the ones it wants are available.
          >
          > I have to agree that this sounds like an easier implementation. I
          > just have to think about how to do the signalling. Thanks a lot!
          >
          > Here's the code I use now. I think it's also significantly easier to
          > understand (cv is a threading.Condition() object and cv.locked_keys a
          > set()).
          >
          >     def lock_s3key(s3key):
          >         cv = self.s3_lock
          >
          >         try:
          >             # Lock set of locked s3 keys (global lock)
          >             cv.acquire()
          >
          >             # Wait for given s3 key becoming unused
          >             while s3key in cv.locked_keys:
          >                 cv.wait()
          >
          >             # Mark it as used (local lock)
          >             cv.locked_keys.add(s3key)
          >         finally:
          >             # Release global lock
          >             cv.release()
          >
          >     def unlock_s3key(s3key):
          >         cv = self.s3_lock
          >
          >         try:
          >             # Lock set of locked s3 keys (global lock)
          >             cv.acquire()
          >
          >             # Mark key as free (release local lock)
          >             cv.locked_keys.remove(s3key)
          >
          >             # Notify other threads
          >             cv.notify()
          >
          >         finally:
          >             # Release global lock
          >             cv.release()
          Freaky... I just posted nearly this exact solution.

          I have a couple comments. First, the call to acquire should come
          before the try block. If the acquire were to fail, you wouldn't want
          to release the lock on cleanup.

          Second, you need to change notify() to notifyAll(); notify alone won't
          cut it. Consider what happens if you have two threads waiting for
          keys A and B respectively. When the thread that has B is done, it
          releases B and calls notify, but notify happens to wake up the thread
          waiting on A. Thus the thread waiting on B is starved.
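          Both points above can be demonstrated with a small self-contained
          sketch (the KeyLockManager class name and the four-thread workload
          are invented for illustration; `with cv:` takes care of acquiring
          before any try block, and notify_all is the fix described above):

```python
import threading

class KeyLockManager:
    """Set-of-locked-keys scheme from the thread, with both fixes applied."""
    def __init__(self):
        self.cv = threading.Condition()
        self.locked_keys = set()

    def lock_key(self, key):
        with self.cv:  # 'with' acquires before entering the protected block
            while key in self.locked_keys:
                self.cv.wait()
            self.locked_keys.add(key)

    def unlock_key(self, key):
        with self.cv:
            self.locked_keys.remove(key)
            # notify_all(), not notify(): a single notify() may wake a
            # thread waiting on a different key, leaving the right
            # waiter asleep forever (the A/B starvation described above).
            self.cv.notify_all()

mgr = KeyLockManager()
done = []

def worker(key):
    mgr.lock_key(key)
    done.append(key)
    mgr.unlock_key(key)

threads = [threading.Thread(target=worker, args=(k,))
           for k in ("A", "B", "A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(done))     # ['A', 'A', 'B', 'B']
print(mgr.locked_keys)  # set()
```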


          Carl Banks


          • Nikolaus Rath

            #6
            Re: Locking around

            Carl Banks <pavlovevidence@gmail.com> writes:
            > Freaky... I just posted nearly this exact solution.
            >
            > I have a couple comments. First, the call to acquire should come
            > before the try block. If the acquire were to fail, you wouldn't want
            > to release the lock on cleanup.
            >
            > Second, you need to change notify() to notifyAll(); notify alone won't
            > cut it. Consider what happens if you have two threads waiting for
            > keys A and B respectively. When the thread that has B is done, it
            > releases B and calls notify, but notify happens to wake up the thread
            > waiting on A. Thus the thread waiting on B is starved.

            You're right. Thanks for pointing it out.

            Best,

            -Nikolaus



            • Tobiah

              #7
              Re: Locking around

              On Mon, 04 Aug 2008 15:30:51 +0200, Nikolaus Rath wrote:
              > Hello,
              >
              > I need to synchronize the access to a couple of hundred-thousand
              > files[1]. It seems to me that creating one lock object for each of the
              > files is a waste of resources, but I cannot use a global lock for all
              > of them either (since the locked operations go over the network, this
              > would make the whole application essentially single-threaded even
              > though most operations act on different files).
              Do you think you could use an SQL database on the network to
              handle the locking? I was thinking of a table with one row
              per file. If the lock field is clear, you could update with a unique
              ID, and query back to make sure that it is still yours before
              accessing the file.
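              A minimal sketch of this lock-table idea, with an in-memory
              sqlite3 database standing in for the networked SQL server (the
              table, column, and function names are invented for this example):

```python
import sqlite3
import uuid

# In-memory stand-in for the networked SQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file_locks (s3key TEXT PRIMARY KEY, owner TEXT)")

def try_lock(s3key):
    """Try to claim s3key; return an owner token, or None if already held."""
    owner = str(uuid.uuid4())
    try:
        # The PRIMARY KEY constraint makes a second claim fail atomically.
        conn.execute("INSERT INTO file_locks (s3key, owner) VALUES (?, ?)",
                     (s3key, owner))
        conn.commit()
    except sqlite3.IntegrityError:
        return None  # row already exists: the lock is held
    # Query back to verify the row is still ours before touching the file.
    row = conn.execute("SELECT owner FROM file_locks WHERE s3key = ?",
                       (s3key,)).fetchone()
    return owner if row and row[0] == owner else None

def unlock(s3key, owner):
    conn.execute("DELETE FROM file_locks WHERE s3key = ? AND owner = ?",
                 (s3key, owner))
    conn.commit()

token = try_lock("bucket/key1")
print(token is not None)                    # True: lock acquired
print(try_lock("bucket/key1") is None)      # True: second claim is refused
unlock("bucket/key1", token)
print(try_lock("bucket/key1") is not None)  # True: free again after unlock
```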

              Hey, maybe the files themselves should go into blobs.




              • Brian Quinlan

                #8
                Using an DTD not specified in XML file for validation

                Hey,

                I'm trying to figure out how I can validate an XML file using a DTD that
                isn't specified in the XML file.

                My code so far is:

                from xml import sax
                from xml.sax import sax2exts

                 parser = sax2exts.XMLValParserFactory.make_parser()

                 parser.setContentHandler(handler)
                 parser.setErrorHandler(handler)

                 parser.parse(xml_file)

                 And this works fine if the DTD is specified in the XML file, i.e. errors
                are generated for non-compliant entities. But I would like to force the
                file to be valid according to one other DTD file that is not referenced
                in the XML file.

                Anyone know how to do this?

                Cheers,
                Brian


                • Nikolaus Rath

                  #9
                  Re: Locking around

                  Tobiah <toby@tobiah.org> writes:
                  > On Mon, 04 Aug 2008 15:30:51 +0200, Nikolaus Rath wrote:
                  >
                  > > Hello,
                  > >
                  > > I need to synchronize the access to a couple of hundred-thousand
                  > > files[1]. It seems to me that creating one lock object for each of the
                  > > files is a waste of resources, but I cannot use a global lock for all
                  > > of them either (since the locked operations go over the network, this
                  > > would make the whole application essentially single-threaded even
                  > > though most operations act on different files).
                  >
                  > Do you think you could use an SQL database on the network to
                  > handle the locking?

                  Yeah, I could. It wouldn't even have to be over the network (I'm
                  synchronizing access from within the same program). But I think that
                  is even more resource-wasteful than my original idea.

                  > Hey, maybe the files themselves should go into blobs.

                  Nope, not possible. They're on Amazon S3.

                  Best,

                  -Nikolaus



                  • Ben Finney

                    #10
                    Re: Using an DTD not specified in XML file for validation

                    Brian Quinlan <brian@sweetapp.com> writes:
                    > I'm trying to figure out how I can validate an XML file using a DTD
                    > that isn't specified in the XML file.

                    When your intention is to start a new discussion, you could compose a
                    new message, *not* reply to an existing message. Your message here is
                    now part of an existing thread of discussion, yet is confusingly
                    unrelated in its content, and will not be noticed by most readers.

                    --
                    \ “Whatever you do will be insignificant, but it is very |
                    `\ important that you do it.” —Mahatma Gandhi |
                    _o__) |
                    Ben Finney


                    • Enrico

                      #11
                      Re: Using an DTD not specified in XML file for validation

                      "Brian Quinlan" <brian@sweetapp .comha scritto nel messaggio
                      news:mailman.12 00.1218043709.9 22.python-list@python.org ...
                      Hey,
                      >
                      I'm trying to figure out how I can validate an XML file using a DTD that
                      isn't specified in the XML file.
                      I did it once using lxml. You can read from:


                      With this package is quite simple (code not tested):

                      from lxml import etree
                      dtd = etree.DTD('mydtd.dtd')
                      f = file('mydoc.xml')
                      xml = etree.XML(f.read())
                      dtd.validate(xml)

                      Enrico



                      • linuxnow@gmail.com

                        #12
                        Re: Locking around

                        On Aug 6, 8:33 pm, Nikolaus Rath <Nikol...@rath.org> wrote:
                        > Tobiah <t...@tobiah.org> writes:
                        > > On Mon, 04 Aug 2008 15:30:51 +0200, Nikolaus Rath wrote:
                        > > Do you think you could use an SQL database on the network to
                        > > handle the locking?
                        >
                        > Yeah, I could. It wouldn't even have to be over the network (I'm
                        > synchronizing access from within the same program). But I think that
                        > is even more resource-wasteful than my original idea.

                        You could use Amazon SQS to queue the requests, as you are talking
                        about files kept in S3 anyway; that way it would be fully distributed.

                        --
                        Pau
