Locking around

  • Nikolaus Rath

    Locking around

    Hello,

    I need to synchronize the access to a couple of hundred-thousand
    files[1]. It seems to me that creating one lock object for each of the
    files is a waste of resources, but I cannot use a global lock for all
    of them either (since the locked operations go over the network, this
    would make the whole application essentially single-threaded even
    though most operations act on different files).

    My idea is therefore to create and destroy per-file locks "on-demand"
    and to protect the creation and destruction by a global lock
    (self.global_lock). For that, I add a "usage counter"
    (wlock.user_count) to each lock, and destroy the lock when it reaches
    zero. The number of currently active lock objects is stored in a dict:

    def lock_s3key(s3key):

        self.global_lock.acquire()
        try:

            # If there is a lock object, use it
            if self.key_lock.has_key(s3key):
                wlock = self.key_lock[s3key]
                wlock.user_count += 1
                lock = wlock.lock

            # otherwise create a new lock object
            else:
                wlock = WrappedLock()
                wlock.lock = threading.Lock()
                wlock.user_count = 1
                self.key_lock[s3key] = wlock
                lock = wlock.lock

        finally:
            self.global_lock.release()

        # Lock the key itself
        lock.acquire()


    and similarly

    def unlock_s3key(s3key):

        # Lock dictionary of lock objects
        self.global_lock.acquire()
        try:

            # Get lock object
            wlock = self.key_lock[s3key]

            # Unlock key
            wlock.lock.release()

            # We don't use the lock object any longer
            wlock.user_count -= 1
            assert wlock.user_count >= 0

            # If no other thread uses the lock, dispose of it
            if wlock.user_count == 0:
                del self.key_lock[s3key]

        finally:
            self.global_lock.release()


    WrappedLock is just an empty class that allows me to add the
    additional user_count attribute.
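    For reference, such a wrapper can be as small as the following sketch
    (a guess at the definition, since it is not shown in the post):

```python
import threading

class WrappedLock(object):
    """Empty container class: holds a real lock plus a usage counter."""
    pass

# Attributes are simply attached on demand, as the post describes.
wlock = WrappedLock()
wlock.lock = threading.Lock()
wlock.user_count = 1
print(wlock.user_count)  # 1
```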


    My questions:

    - Does that look like a proper solution, or does anyone have a better
    one?

    - Did I overlook any deadlock possibilities?


    Best,
    Nikolaus



    [1] Actually, it's not really files (because in that case I could use
    fcntl) but blobs stored on Amazon S3.


    --
    »It is not worth an intelligent man's time to be in the majority.
    By definition, there are already enough people to do that.«
    -J.H. Hardy

    PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C
  • Nikolaus Rath

    #2
    Re: Locking around

    Ulrich Eckhardt <eckhardt@satorlaser.com> writes:
    Nikolaus Rath wrote:
    >I need to synchronize the access to a couple of hundred-thousand
    >files[1]. It seems to me that creating one lock object for each of the
    >files is a waste of resources, but I cannot use a global lock for all
    >of them either (since the locked operations go over the network, this
    >would make the whole application essentially single-threaded even
    >though most operations act on different files).
    >
    > Just wondering, but at what time do you know what files are needed?

    As soon as I have read a client request. Also, I will only need one
    file per request, not multiple.

    > If you know that rather early, you could simply 'check out' the
    > required files, do whatever you want with them and then release them
    > again. If one of the requested files is marked as already in use,
    > you simply wait (without reserving the others) until someone
    > releases files and then try again. You could also wait for that
    > precise file to be available, but that would require that you
    > already reserve the other files, which might unnecessarily block
    > other accesses.
    >
    > Note that this idea requires that each access locks one set of files at the
    > beginning and releases them at the end, i.e. no attempts to lock files in
    > between, which would otherwise easily lead to deadlocks.

    I am not sure that I understand your idea. To me this sounds exactly
    like what I'm already doing, just replace 'check out' by 'lock' in
    your description... Am I missing something?
    >My idea is therefore to create and destroy per-file locks "on-demand"
    >and to protect the creation and destruction by a global lock
    >(self.global_lock). For that, I add a "usage counter"
    >(wlock.user_count) to each lock, and destroy the lock when it reaches
    >zero.
    [...code...]
    >
    > - Does that look like a proper solution, or does anyone have a better
    > one?
    >
    > This should work, at least the idea is not flawed. However, I'd say
    > there are too many locks involved. Rather, you just need a simple
    > flag and the global lock. Further, you need a condition/event that
    > tells waiting threads that you released some of the files so that it
    > should see again if the ones it wants are available.

    I have to agree that this sounds like an easier implementation. I just
    have to think about how to do the signalling. Thanks a lot!


    Best,

    -Nikolaus



    • Nikolaus Rath

      #3
      Re: Locking around

      Nikolaus Rath <Nikolaus@rath.org> writes:
      > > This should work, at least the idea is not flawed. However, I'd say
      > > there are too many locks involved. Rather, you just need a simple
      > > flag and the global lock. Further, you need a condition/event that
      > > tells waiting threads that you released some of the files so that it
      > > should see again if the ones it wants are available.
      >
      > I have to agree that this sounds like an easier implementation. I
      > just have to think about how to do the signalling. Thanks a lot!

      Here's the code I use now. I think it's also significantly easier to
      understand (cv is a threading.Condition() object and cv.locked_keys a
      set()).

      def lock_s3key(s3key):
          cv = self.s3_lock

          try:
              # Lock set of locked s3 keys (global lock)
              cv.acquire()

              # Wait for given s3 key becoming unused
              while s3key in cv.locked_keys:
                  cv.wait()

              # Mark it as used (local lock)
              cv.locked_keys.add(s3key)
          finally:
              # Release global lock
              cv.release()


      def unlock_s3key(s3key):
          cv = self.s3_lock

          try:
              # Lock set of locked s3 keys (global lock)
              cv.acquire()

              # Mark key as free (release local lock)
              cv.locked_keys.remove(s3key)

              # Notify other threads
              cv.notify()

          finally:
              # Release global lock
              cv.release()


      Best,

      -Nikolaus



      • Carl Banks

        #4
        Re: Locking around

        On Aug 4, 9:30 am, Nikolaus Rath <Nikol...@rath.org> wrote:
        > Hello,
        >
        > I need to synchronize the access to a couple of hundred-thousand
        > files[1]. It seems to me that creating one lock object for each of the
        > files is a waste of resources, but I cannot use a global lock for all
        > of them either (since the locked operations go over the network, this
        > would make the whole application essentially single-threaded even
        > though most operations act on different files).
        >
        > My idea is therefore to create and destroy per-file locks "on-demand"
        > and to protect the creation and destruction by a global lock
        > (self.global_lock). For that, I add a "usage counter"
        > (wlock.user_count) to each lock, and destroy the lock when it reaches
        > zero.
        [snip]
        > My questions:
        >
        > - Does that look like a proper solution, or does anyone have a better
        > one?

        You don't need the per-file locks at all if you use a global lock like
        this. Here's a way to do it using threading.Condition objects. I
        suspect it might not perform so well if there is a lot of competition
        for certain keys, but it doesn't sound like that's the case for you.
        Performance and robustness improvements left as an exercise. (Note:
        I'm not sure where self comes from in your examples so I left it out
        of mine.)


        global_lock = threading.Condition()
        locked_keys = set()

        def lock_s3key(s3key):
            global_lock.acquire()
            while s3key in locked_keys:
                global_lock.wait()
            locked_keys.add(s3key)
            global_lock.release()

        def unlock_s3key(s3key):
            global_lock.acquire()
            locked_keys.remove(s3key)
            global_lock.notifyAll()
            global_lock.release()



        Carl Banks


        • Carl Banks

          #5
          Re: Locking around

          On Aug 6, 6:34 am, Nikolaus Rath <Nikol...@rath.org> wrote:
          > Nikolaus Rath <Nikol...@rath.org> writes:
          > > This should work, at least the idea is not flawed. However, I'd say
          > > there are too many locks involved. Rather, you just need a simple
          > > flag and the global lock. Further, you need a condition/event that
          > > tells waiting threads that you released some of the files so that it
          > > should see again if the ones it wants are available.
          >
          > I have to agree that this sounds like an easier implementation. I
          > just have to think about how to do the signalling. Thanks a lot!
          >
          > Here's the code I use now. I think it's also significantly easier to
          > understand (cv is a threading.Condition() object and cv.locked_keys a
          > set()).
          >
          >     def lock_s3key(s3key):
          >         cv = self.s3_lock
          >
          >         try:
          >             # Lock set of locked s3 keys (global lock)
          >             cv.acquire()
          >
          >             # Wait for given s3 key becoming unused
          >             while s3key in cv.locked_keys:
          >                 cv.wait()
          >
          >             # Mark it as used (local lock)
          >             cv.locked_keys.add(s3key)
          >         finally:
          >             # Release global lock
          >             cv.release()
          >
          >     def unlock_s3key(s3key):
          >         cv = self.s3_lock
          >
          >         try:
          >             # Lock set of locked s3 keys (global lock)
          >             cv.acquire()
          >
          >             # Mark key as free (release local lock)
          >             cv.locked_keys.remove(s3key)
          >
          >             # Notify other threads
          >             cv.notify()
          >
          >         finally:
          >             # Release global lock
          >             cv.release()
          Freaky... I just posted nearly this exact solution.

          I have a couple comments. First, the call to acquire should come
          before the try block. If the acquire were to fail, you wouldn't want
          to release the lock on cleanup.

          Second, you need to change notify() to notifyAll(); notify alone won't
          cut it. Consider what happens if you have two threads waiting for
          keys A and B respectively. When the thread that has B is done, it
          releases B and calls notify, but notify happens to wake up the thread
          waiting on A. Thus the thread waiting on B is starved.
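          Both points above can be demonstrated with a small self-contained
          sketch (the KeyLockManager class name and the four-thread workload
          are invented for illustration; `with cv:` takes care of acquiring
          before any try block, and notify_all is the fix described above):

```python
import threading

class KeyLockManager:
    """Set-of-locked-keys scheme from the thread, with both fixes applied."""
    def __init__(self):
        self.cv = threading.Condition()
        self.locked_keys = set()

    def lock_key(self, key):
        with self.cv:  # 'with' acquires before entering the protected block
            while key in self.locked_keys:
                self.cv.wait()
            self.locked_keys.add(key)

    def unlock_key(self, key):
        with self.cv:
            self.locked_keys.remove(key)
            # notify_all(), not notify(): a single notify() may wake a
            # thread waiting on a different key, leaving the right
            # waiter asleep forever (the A/B starvation described above).
            self.cv.notify_all()

mgr = KeyLockManager()
done = []

def worker(key):
    mgr.lock_key(key)
    done.append(key)
    mgr.unlock_key(key)

threads = [threading.Thread(target=worker, args=(k,))
           for k in ("A", "B", "A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(done))     # ['A', 'A', 'B', 'B']
print(mgr.locked_keys)  # set()
```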


          Carl Banks


          • Nikolaus Rath

            #6
            Re: Locking around

            Carl Banks <pavlovevidence@gmail.com> writes:
            > Freaky... I just posted nearly this exact solution.
            >
            > I have a couple comments. First, the call to acquire should come
            > before the try block. If the acquire were to fail, you wouldn't want
            > to release the lock on cleanup.
            >
            > Second, you need to change notify() to notifyAll(); notify alone won't
            > cut it. Consider what happens if you have two threads waiting for
            > keys A and B respectively. When the thread that has B is done, it
            > releases B and calls notify, but notify happens to wake up the thread
            > waiting on A. Thus the thread waiting on B is starved.

            You're right. Thanks for pointing it out.

            Best,

            -Nikolaus



            • Tobiah

              #7
              Re: Locking around

              On Mon, 04 Aug 2008 15:30:51 +0200, Nikolaus Rath wrote:
              > Hello,
              >
              > I need to synchronize the access to a couple of hundred-thousand
              > files[1]. It seems to me that creating one lock object for each of the
              > files is a waste of resources, but I cannot use a global lock for all
              > of them either (since the locked operations go over the network, this
              > would make the whole application essentially single-threaded even
              > though most operations act on different files).
              Do you think you could use an SQL database on the network to
              handle the locking? I was thinking of a table with one row
              per file. If the lock field is clear, you could update with a unique
              ID, and query back to make sure that it is still yours before
              accessing the file.
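              A minimal sketch of this lock-table idea, with an in-memory
              sqlite3 database standing in for the networked SQL server (the
              table, column, and function names are invented for this example):

```python
import sqlite3
import uuid

# In-memory stand-in for the networked SQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file_locks (s3key TEXT PRIMARY KEY, owner TEXT)")

def try_lock(s3key):
    """Try to claim s3key; return an owner token, or None if already held."""
    owner = str(uuid.uuid4())
    try:
        # The PRIMARY KEY constraint makes a second claim fail atomically.
        conn.execute("INSERT INTO file_locks (s3key, owner) VALUES (?, ?)",
                     (s3key, owner))
        conn.commit()
    except sqlite3.IntegrityError:
        return None  # row already exists: the lock is held
    # Query back to verify the row is still ours before touching the file.
    row = conn.execute("SELECT owner FROM file_locks WHERE s3key = ?",
                       (s3key,)).fetchone()
    return owner if row and row[0] == owner else None

def unlock(s3key, owner):
    conn.execute("DELETE FROM file_locks WHERE s3key = ? AND owner = ?",
                 (s3key, owner))
    conn.commit()

token = try_lock("bucket/key1")
print(token is not None)                    # True: lock acquired
print(try_lock("bucket/key1") is None)      # True: second claim is refused
unlock("bucket/key1", token)
print(try_lock("bucket/key1") is not None)  # True: free again after unlock
```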

              Hey, maybe the files themselves should go into blobs.




              • Brian Quinlan

                #8
                Using an DTD not specified in XML file for validation

                Hey,

                I'm trying to figure out how I can validate an XML file using a DTD that
                isn't specified in the XML file.

                My code so far is:

                from xml import sax
                from xml.sax import sax2exts

                 parser = sax2exts.XMLValParserFactory.make_parser()

                 parser.setContentHandler(handler)
                 parser.setErrorHandler(handler)

                 parser.parse(xml_file)

                 And this works fine if the DTD is specified in the XML file, i.e. errors
                are generated for non-compliant entities. But I would like to force the
                file to be valid according to one other DTD file that is not referenced
                in the XML file.

                Anyone know how to do this?

                Cheers,
                Brian


                • Nikolaus Rath

                  #9
                  Re: Locking around

                  Tobiah <toby@tobiah.org> writes:
                  > On Mon, 04 Aug 2008 15:30:51 +0200, Nikolaus Rath wrote:
                  >
                  > > Hello,
                  > >
                  > > I need to synchronize the access to a couple of hundred-thousand
                  > > files[1]. It seems to me that creating one lock object for each of the
                  > > files is a waste of resources, but I cannot use a global lock for all
                  > > of them either (since the locked operations go over the network, this
                  > > would make the whole application essentially single-threaded even
                  > > though most operations act on different files).
                  >
                  > Do you think you could use an SQL database on the network to
                  > handle the locking?

                  Yeah, I could. It wouldn't even have to be over the network (I'm
                  synchronizing access from within the same program). But I think that
                  is even more resource-wasteful than my original idea.

                  > Hey, maybe the files themselves should go into blobs.

                  Nope, not possible. They're on Amazon S3.

                  Best,

                  -Nikolaus



                  • Ben Finney

                    #10
                    Re: Using an DTD not specified in XML file for validation

                    Brian Quinlan <brian@sweetapp.com> writes:
                    > I'm trying to figure out how I can validate an XML file using a DTD
                    > that isn't specified in the XML file.

                    When your intention is to start a new discussion, you could compose a
                    new message, *not* reply to an existing message. Your message here is
                    now part of an existing thread of discussion, yet is confusingly
                    unrelated in its content, and will not be noticed by most readers.

                    --
                    \ “Whatever you do will be insignificant, but it is very |
                    `\ important that you do it.” —Mahatma Gandhi |
                    _o__) |
                    Ben Finney


                    • Enrico

                      #11
                      Re: Using an DTD not specified in XML file for validation

                      "Brian Quinlan" <brian@sweetapp .comha scritto nel messaggio
                      news:mailman.12 00.1218043709.9 22.python-list@python.org ...
                      Hey,
                      >
                      I'm trying to figure out how I can validate an XML file using a DTD that
                      isn't specified in the XML file.
                      I did it once using lxml. You can read from:


                      With this package is quite simple (code not tested):

                      from lxml import etree
                      dtd = etree.DTD('mydtd.dtd')
                      f = file('mydoc.xml')
                      xml = etree.XML(f.read())
                      dtd.validate(xml)

                      Enrico



                      • linuxnow@gmail.com

                        #12
                        Re: Locking around

                        On Aug 6, 8:33 pm, Nikolaus Rath <Nikol...@rath.org> wrote:
                        > Tobiah <t...@tobiah.org> writes:
                        > > On Mon, 04 Aug 2008 15:30:51 +0200, Nikolaus Rath wrote:
                        > > Do you think you could use an SQL database on the network to
                        > > handle the locking?
                        >
                        > Yeah, I could. It wouldn't even have to be over the network (I'm
                        > synchronizing access from within the same program). But I think that
                        > is even more resource-wasteful than my original idea.

                        You could use Amazon SQS to queue the requests, as you are talking
                        about files kept in S3 anyway; that way it would be fully distributed.

                        --
                        Pau
