server bootstrapping upon connection (WARNING: LONG)

**ralf@brainbot.com** · Jul 18 '05, 08:20 AM

Re: server bootstrapping upon connection (WARNING: LONG)

fortepianissimo@yahoo.com.tw (Fortepianissimo) writes:

> Here is the situation: I want my server started up upon connection.
> When the first connection comes in, the server is not running. The
> client realizes the fact, and then starts up the server and tries to
> connect again. This of course all happens on the same machine (local
> connection only).
>
> The connections can come in as fast as 30+/sec, so the server is
> threaded (using SocketServer.ThreadingTCPServer). Further, the server
> initialization is threaded into two: one is to do the real, lengthy
> setup, the other is start listening ASAP.
>
> The problem: I need to prevent multiple copies of the server being
> started. I did this by using file locking (fcntl.lockf()). However,
> not every time the code successfully prevented the server being
> started up more than one time. Here is the relevant code:

When using fcntl.lockf, different FooClient instances in the same
process will be able to lock the file and start another server. You could
either use fcntl.flock to prevent that or use some global flag.
Also be sure to keep the file open by keeping a reference to it
(i.e. self.serverStartLock=open(...)).
For debugging purposes, remove that 'if self.connect(): return' and
I think you'll see many more servers being started.
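
A minimal sketch of that advice in present-day Python 3 (the thread's code is Python 2). It combines both suggestions: fcntl.flock plus a module-level reference that doubles as the "global flag". The names LOCK_PATH, try_acquire_start_lock and release_start_lock are made up for illustration and are not part of the original client.

```python
import fcntl
import os
import tempfile

# Hypothetical lock path; the thread's client uses TMP_PATH + 'serverStart.lock'.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "serverStart.lock")

# Module-level reference. If the file object were garbage collected, its
# descriptor would be closed and the lock released without any warning.
_lock_file = None


def try_acquire_start_lock(path=LOCK_PATH):
    """Return True if the caller now holds the exclusive 'start server' lock."""
    global _lock_file
    if _lock_file is not None:
        # Global flag: this process already holds the lock.
        return False
    f = open(path, "w")
    try:
        # Non-blocking exclusive lock; raises OSError if someone else holds it.
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return False
    _lock_file = f
    return True


def release_start_lock():
    """Release the lock and drop the reference, if we hold it."""
    global _lock_file
    if _lock_file is not None:
        fcntl.flock(_lock_file.fileno(), fcntl.LOCK_UN)
        _lock_file.close()
        _lock_file = None
```

A client would call try_acquire_start_lock() before launching the server and keep polling connect() if it returns False.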
>
> ---- CLIENT CODE SNIPPET STARTS ----
> from fooConsts import *
>
> import socket,fcntl,os,sys,time,logging
>
> FOO_CLIENT_DEBUG=1
>
> POLL_SERVER_INTVL=0.5
>
> if FOO_CLIENT_DEBUG:
>     myPID=os.getpid()
>     logger = logging.getLogger('FOO Client')
>     hdlr = logging.FileHandler(TMP_PATH+'fooClient.log')
>     formatter = logging.Formatter('%(asctime)s %(message)s')
>     hdlr.setFormatter(formatter)
>     logger.addHandler(hdlr)
>     logger.setLevel(logging.INFO)
>
> def log (info):
>     logger.info('%d: %s'%(myPID,info))
>
>
> class FooClient:
>     def __init__ (self, startServer=True):
>         """Connects to FooServer if it exists, otherwise starts it and
>         connects to it"""
>         self.connected=True
>         if self.connect(): return
>         elif not startServer:
>             if FOO_CLIENT_DEBUG: log('connection failed 1')
>             self.connected=False
>             return
>
>         # try to obtain the right to start a server; if we can't, someone else
>         # must be doing it - try to reconnect
>         try:
>             if FOO_CLIENT_DEBUG: log('try to get right to start server')
>             serverStartLock=open(TMP_PATH+'serverStart.lock','w')
>             fcntl.lockf(serverStartLock.fileno(),fcntl.LOCK_EX|fcntl.LOCK_NB)
>         except:
>             if FOO_CLIENT_DEBUG: log('someone else is doing it; wait 2')
>             while 1:
>                 time.sleep(POLL_SERVER_INTVL)
>                 if self.connect(): return
>
>         # safe to start a server and connect to it
>         if FOO_CLIENT_DEBUG: log('start server')
>         exitCode=os.system('python -OO "%sfooServer.py" &'%FOO_PATH)
>         if exitCode==0:
>             if FOO_CLIENT_DEBUG: log('server is being started; wait 3')
>             while 1:
>                 time.sleep(POLL_SERVER_INTVL)
>                 if self.connect(): return
>         else:
>             if FOO_CLIENT_DEBUG: log('sever bootstrapping failed')
>             self.connected=False
>             raise "Cannot start FOOServer"
>
>     def connect (self):
>         """Attempts to connect to FOOServer from PORT_MIN to
>         PORT_MAX"""
>         if FOO_CLIENT_DEBUG: log('connection attempt')
>         port=PORT_MIN
>         while port<=PORT_MAX:
>             try:
>                 self.socket=socket.socket(socket.AF_INET,
>                                           socket.SOCK_STREAM)
>                 self.socket.connect(('',port))
>                 break
>             except:
>                 self.socket.close()
>                 port+=1
>
>         if port>PORT_MAX:
>             if FOO_CLIENT_DEBUG: log('connection failed 2')
>             return False
>         if FOO_CLIENT_DEBUG: log('connection succeeded at port %d'%port)
>         return True
>     ...
>
> FooClient()
> ---- CLIENT CODE SNIPPET ENDS ----
>
> From the log (when problem occurred) I see even *AFTER* the server was
> started and accepted connections (several connections came and went
> happily), a connection would come in and hit the "connection failed 1"
> log line. This shouldn't have happened as the default value of
> startServer for FooClient.__init__() is True. In the very same

Well, maybe too many connection attempts are pending...
> incident, FooServer was started twice, but in the log there's no trace
> of two "start server" lines, but only the "connection failed 1" line
> in the very end of the log.
>
> I realize this is a rather complex question to ask for help, but I'm
> really at wits end. Feel free to skip the detailed description, and
> just throw me some general tips and hints. Thank you VERY MUCH!

--
brainbot technologies ag
boppstrasse 64 . 55118 mainz . germany
fon +49 6131 211639-1 . fax +49 6131 211639-2
http://brainbot.com/ mailto:ralf@brainbot.com

**Benjamin Han** · Jul 18 '05, 08:20 AM

Re: server bootstrapping upon connection (WARNING: LONG)

On 2004-02-10 13:46:37 -0500, ralf@brainbot.com said:

> fortepianissimo@yahoo.com.tw (Fortepianissimo) writes:
>> The problem: I need to prevent multiple copies of the server being
>> started. I did this by using file locking (fcntl.lockf()). However,
>> not every time the code successfully prevented the server being
>> started up more than one time. Here is the relevant code:
>
> When using fcntl.lockf, different FooClient instances in the same
> process will be able to lock the file and start another server. You could
> either use fcntl.flock to prevent that or use some global flag.

Hm... I didn't know there's a difference between flock() and lockf(),
and I didn't get much info from the document either. Could you explain
a bit on why lockf() would not lock the file?

Actually I wrote a tiny script just to test if lockf() does what it
claims to do:

--- CODE STARTS ---
#!/usr/bin/env python

import os,fcntl,sys

print "* about to open flock.txt"
f=open('flock.txt','w')
print "* opened the file"
fcntl.lockf(f.fileno(),fcntl.LOCK_EX|fcntl.LOCK_NB)
print "* obtained the lock, enter your line below:"
l=sys.stdin.readline()
f.truncate()
f.write(l)
f.flush()
sys.stdin.readline()
f.close()

--- CODE ENDS ---

It seems it does lock the file? (Mac OS X 10.3.2).
> Also be sure to keep the file open by keeping a reference to it
> (i.e. self.serverStartLock=open(...)). For debugging purposes, remove
> that 'if self.connect(): return' and
> I think you'll see many more servers being started.

--- CODE SNIPPET STARTS ---
class FooClient:
    def __init__ (self, startServer=True):
        """Connects to FooServer if it exists, otherwise starts it and
        connects to it"""
        self.connected=True
        if self.connect(): return
        elif not startServer:
            if FOO_CLIENT_DEBUG: log('connection failed 1')
            self.connected=False
            return
        ...
--- CODE SNIPPET ENDS ---

Well in that case every connection will try to start a server. Good
point on keeping a reference to the lock file though - I'll add to it
and see what happens.
>>
>> From the log (when problem occurred) I see even *AFTER* the server was
>> started and accepted connections (several connections came and went
>> happily), a connection would come in and hit the "connection failed 1"
>> log line. This shouldn't have happened as the default value of
>> startServer for FooClient.__init__() is True. In the very same
>
> Well, maybe too many connection attempts are pending...

I failed to see why this should affect the default value of the
argument... if startServer is True (default), that log line should have
never been reached.

Thanks!

Ben

**ralf@brainbot.com** · Jul 18 '05, 08:20 AM

Re: server bootstrapping upon connection (WARNING: LONG)

Benjamin Han <this@is.for.spambot> writes:

> On 2004-02-10 13:46:37 -0500, ralf@brainbot.com said:
>
>> fortepianissimo@yahoo.com.tw (Fortepianissimo) writes:
>>> The problem: I need to prevent multiple copies of the server being
>>> started. I did this by using file locking (fcntl.lockf()). However,
>>> not every time the code successfully prevented the server being
>>> started up more than one time. Here is the relevant code:
>> When using fcntl.lockf, different FooClient instances in the same
>> process will be able to lock the file and start another server. You could
>> either use fcntl.flock to prevent that or use some global flag.
>
> Hm... I didn't know there's a difference between flock() and lockf(),
> and I didn't get much info from the document either. Could you explain
> a bit on why lockf() would not lock the file?

Well, it might lock the file multiple times in the same
process. That's the problem:
----
import fcntl

def getlockfile():
    return open('serverStart.lock', 'w')

def getlock(f):
    try:
        lock(f.fileno(), fcntl.LOCK_EX|fcntl.LOCK_NB)
    except:
        return False
    return True

def doit():
    f1 = getlockfile()
    f2 = getlockfile()
    print getlock(f1), getlock(f2)

lock = fcntl.lockf
doit()

lock = fcntl.flock
doit()
---

Output is:
True True
True False

>
> Actually I wrote a tiny script just to test if lockf() does what it
> claims to do:
>
> --- CODE STARTS ---
> #!/usr/bin/env python
>
> import os,fcntl,sys
>
> print "* about to open flock.txt"
> f=open('flock.txt','w')
> print "* opened the file"
> fcntl.lockf(f.fileno(),fcntl.LOCK_EX|fcntl.LOCK_NB)
> print "* obtained the lock, enter your line below:"
> l=sys.stdin.readline()
> f.truncate()
> f.write(l)
> f.flush()
> sys.stdin.readline()
> f.close()
>
> --- CODE ENDS ---
>
> It seems it does lock the file? (Mac OS X 10.3.2).
>
>> Also be sure to keep the file open by keeping a reference to it
>> (i.e. self.serverStartLock=open(...)). For debugging purposes,
>> remove that 'if self.connect(): return' and
>> I think you'll see many more servers being started.
>
> --- CODE SNIPPET STARTS ---
> class FooClient:
>     def __init__ (self, startServer=True):
>         """Connects to FooServer if it exists, otherwise starts it and
>         connects to it"""
>         self.connected=True
>         if self.connect(): return
>         elif not startServer:
>             if FOO_CLIENT_DEBUG: log('connection failed 1')
>             self.connected=False
>             return
>         ...
> --- CODE SNIPPET ENDS ---
>
> Well in that case every connection will try to start a server. Good
> point on keeping a reference to the lock file though - I'll add to it
> and see what happens.
>
>>> From the log (when problem occurred) I see even *AFTER* the server was
>>> started and accepted connections (several connections came and went
>>> happily), a connection would come in and hit the "connection failed 1"
>>> log line. This shouldn't have happened as the default value of
>>> startServer for FooClient.__init__() is True. In the very same
>> Well, maybe too many connection attempts are pending...
>
> I failed to see why this should affect the default value of the
> argument... if startServer is True (default), that log line should
> have never been reached.

Well, then I suppose you're passing a false argument to
FooClient.__init__. I just wanted to say, that even if the server is
listening on that port, that connection attempts may fail.
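
If a single failed connect() does not prove the server is absent, the client could retry a few times before deciding to start one. A rough Python 3 sketch of that idea follows. The function name connect_with_retry and the attempt count, delay and timeout values are assumptions for illustration, not something taken from the original client.

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, delay=0.1, timeout=1.0):
    """Try to connect several times; return a connected socket or None.

    A refused or timed-out attempt is treated as 'maybe busy', not as
    proof that no server is running.
    """
    for _ in range(attempts):
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError:
            # Wait briefly before the next attempt.
            time.sleep(delay)
    return None
```

Only when this returns None would the client move on to the locking and server start-up path.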
>
> Thanks!
>
> Ben

--
brainbot technologies ag
boppstrasse 64 . 55118 mainz . germany
fon +49 6131 211639-1 . fax +49 6131 211639-2
http://brainbot.com/ mailto:ralf@brainbot.com

**Benjamin Han** · Jul 18 '05, 08:20 AM

Re: server bootstrapping upon connection (WARNING: LONG)

On 2004-02-10 18:22:30 -0500, ralf@brainbot.com said:

> Benjamin Han <this@is.for.spambot> writes:
>
>> On 2004-02-10 13:46:37 -0500, ralf@brainbot.com said:
>>
>>> fortepianissimo@yahoo.com.tw (Fortepianissimo) writes:
>>>> The problem: I need to prevent multiple copies of the server being
>>>> started. I did this by using file locking (fcntl.lockf()). However,
>>>> not every time the code successfully prevented the server being
>>>> started up more than one time. Here is the relevant code:
>>> When using fcntl.lockf, different FooClient instances in the same
>>> process will be able to lock the file and start another server. You could
>>> either use fcntl.flock to prevent that or use some global flag.
>>
>> Hm... I didn't know there's a difference between flock() and lockf(),
>> and I didn't get much info from the document either. Could you explain
>> a bit on why lockf() would not lock the file?
>
> Well, it might lock the file multiple times in the same
> process. That's the problem:
> ----
> import fcntl
>
> def getlockfile():
>     return open('serverStart.lock', 'w')
>
> def getlock(f):
>     try:
>         lock(f.fileno(), fcntl.LOCK_EX|fcntl.LOCK_NB)
>     except:
>         return False
>     return True
>
> def doit():
>     f1 = getlockfile()
>     f2 = getlockfile()
>     print getlock(f1), getlock(f2)
>
> lock = fcntl.lockf
> doit()
>
> lock = fcntl.flock
> doit()
> ---
>
> Output is:
> True True
> True False

Ok I've since changed all lockf() to flock(), but from the "ps" log
during the stress test, I still get two servers started up, listed by
ps like this:

27308 ?? S 0:01.25 python -OO fooServer.py
27465 ?? SV 0:01.25 python -OO fooServer.py

In this case the one with pid 27308 was the first server process
started. Some time later, with unknown reason, another server entry
will show up in ps log, but the status is always "RV" or "SV" (e.g.,
pid 27465). The way I started the server in the code is os.system()
call, and from the man page of ps, 'V' means "The process is suspended
during a vfork."

The weird thing is I also have log statements in the server script, and
from the log file only ONE copy is actually started. I start to suspect
that I might have misunderstood the ps log - could it be there really
is only ONE copy running, despite there're (up to) 2 entries shown in
the ps log?
> I just wanted to say, that even if the server is
> listening on that port, that connection attempts may fail.

How is it possible? Could you explain a bit more?

Thanks again,

Ben

**Krzysztof Stachlewski** · Jul 18 '05, 08:21 AM

Re: server bootstrapping upon connection (WARNING: LONG)

"Benjamin Han" <this@is.for.spambot> wrote in message
news:2004021019594716807%this@isforspambot...

>> I just wanted to say, that even if the server is
>> listening on that port, that connection attempts may fail.
>
> How is it possible? Could you explain a bit more?

Your server may be accepting connections slower than
the speed at which new connections are coming.
In such a situation your operating system maintains a list
of those awaiting connections. You can control the length of
this list with the "backlog" parameter of the "listen" function.
If the list is full, new connections may simply be dropped
as if the server was not listening at all.
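
For a server built on SocketServer (spelled socketserver in Python 3), the backlog passed to listen() comes from the class attribute request_queue_size, which defaults to 5. Below is a small Python 3 sketch of raising it. The echo handler and the names FooServer, make_server and echo_once are invented for the example, and the kernel may silently cap whatever value is requested.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    """Stand-in handler: echoes one message back to the client."""

    def handle(self):
        data = self.request.recv(1024)
        self.request.sendall(data)


class FooServer(socketserver.ThreadingTCPServer):
    # Passed to listen() as the backlog; the stock default is 5.
    request_queue_size = 100
    allow_reuse_address = True
    daemon_threads = True


def make_server(port=0):
    """Create a server on localhost; port 0 lets the OS pick a free port."""
    return FooServer(("127.0.0.1", port), EchoHandler)


def echo_once(message=b"ping"):
    """Start the server, send one message, return the echoed reply."""
    server = make_server()
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        with socket.create_connection(server.server_address) as conn:
            conn.sendall(message)
            return conn.recv(1024)
    finally:
        server.shutdown()
        server.server_close()
```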

Stach

**Benjamin Han** · Jul 18 '05, 08:21 AM

Re: server bootstrapping upon connection (WARNING: LONG)

On 2004-02-11 07:00:32 -0500, "Krzysztof Stachlewski"
<stach@fr.REMOVE.pl> said:

> "Benjamin Han" <this@is.for.spambot> wrote in message
> news:2004021019594716807%this@isforspambot...
>
>>> I just wanted to say, that even if the server is
>>> listening on that port, that connection attempts may fail.
>>
>> How is it possible? Could you explain a bit more?
>
> Your server may be accepting connections slower than
> the speed at which new connections are coming.
> In such a situation your operating system maintains a list
> of those awaiting connections. You can control the length of
> this list with the "backlog" parameter of the "listen" function.
> If the list is full, new connections may simply be dropped
> as if the server was not listening at all.
>
> Stach

Thanks for the tip - the default queue size is 5. I've since changed it
into 100 (much more than what it actually needs), but the original
problem remains.

**Benjamin Han** · Jul 18 '05, 08:22 AM

Re: server bootstrapping upon connection (WARNING: LONG)

On 2004-02-10 19:59:47 -0500, Benjamin Han <this@is.for.spambot> said:

> On 2004-02-10 18:22:30 -0500, ralf@brainbot.com said:
>
>> Benjamin Han <this@is.for.spambot> writes:
>>
>>> On 2004-02-10 13:46:37 -0500, ralf@brainbot.com said:
>>>
>>>> fortepianissimo@yahoo.com.tw (Fortepianissimo) writes:
>>>>> The problem: I need to prevent multiple copies of the server being
>>>>> started. I did this by using file locking (fcntl.lockf()). However,
>>>>> not every time the code successfully prevented the server being
>>>>> started up more than one time. Here is the relevant code:
>>>> ... from the "ps" log during the stress test, I still get two servers
>>>> started up, listed by ps like this:
>
> 27308 ?? S 0:01.25 python -OO fooServer.py
> 27465 ?? SV 0:01.25 python -OO fooServer.py
>
> In this case the one with pid 27308 was the first server process
> started. Some time later, with unknown reason, another server entry
> will show up in ps log, but the status is always "RV" or "SV" (e.g.,
> pid 27465). The way I started the server in the code is os.system()
> call, and from the man page of ps, 'V' means "The process is suspended
> during a vfork."
>
> The weird thing is I also have log statements in the server script, and
> from the log file only ONE copy is actually started. I start to suspect
> that I might have misunderstood the ps log - could it be there really
> is only ONE copy running, despite there're (up to) 2 entries shown in
> the ps log?

Ok this is by far the most puzzling one: I use the logging module in the
server script to record its execution. The code is like this:

--- LOGGING CODE STARTS ---
if FOO_COMM_DEBUG:
    myPID=os.getpid()
    logger = logging.getLogger('FOO Comm')
    hdlr = logging.FileHandler(TMP_PATH+'fooComm.log')
    formatter = logging.Formatter('%(asctime)s %(message)s')
    hdlr.setFormatter(formatter)
    logger.addHandler(hdlr)
    logger.setLevel(logging.INFO)

def log (info):
    logger.info('Server %d: %s'%(myPID,info))

if FOO_COMM_DEBUG: log('process %d started me'%os.getppid())
--- LOGGING CODE ENDS ---

This is the FIRST block of code right after my "import ..." statement
in the server script, so there's no way that it could be skipped if the
script gets executed. Of course FOO_COMM_DEBUG is set to 1.

But as described, the ps log showed two entries of fooServer.py, but in
the log produced, only one server actually was started (I can tell that
from the PID in the log file). I really can't explain this discrepancy!

**Alan Kennedy** · Jul 18 '05, 08:22 AM

Re: server bootstrapping upon connection (WARNING: LONG)

[Benjamin Han]
> But as described, the ps log showed two entries of fooServer.py, but in
> the log produced, only one server actually was started (I can tell that
> from the PID in the log file). I really can't explain this discrepancy!

Quick sanity check: Are you using an operating system that reports an
entry in the ps list for every *thread* of a process, rather than an
entry for every process, as you might expect? If so, and you have 2
threads running under the "fooServer.py" process, you get 2 entries in
the output of ps, rather than 1.

Some unixen do this.

HTH,

--
alan kennedy
------------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/contact/alan

**Alan Kennedy** · Jul 18 '05, 08:22 AM

Re: server bootstrapping upon connection (WARNING: LONG)

[ralf@brainbot.com]
>>>> I just wanted to say, that even if the server is
>>>> listening on that port, that connection attempts may fail.

[Benjamin Han]
>>> How is it possible? Could you explain a bit more?

[Krzysztof Stachlewski]
>> Your server may be accepting connections slower than
>> the speed at which new connections are coming.
>> In such a situation your operating system maintains a list
>> of those awaiting connections. You can control the length of
>> this list with the "backlog" parameter of the "listen" function.
>> If the list is full, new connections may simply be dropped
>> as if the server was not listening at all.

[Benjamin Han]
> Thanks for the tip - the default queue size is 5. I've since changed it
> into 100 (much more than what it actually needs), but the original
> problem remains.

Another quick point: even if you set the backlog to 100 connections,
that only gives you 3.3333r seconds of grace before the backlog of
connections fills up, given the "up to 30 connections/second" you
mentioned in another message.

The 101st client to attempt a connect() will fail, if the server
doesn't start accept()ing connections before the backlog fills up.
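
The arithmetic behind that figure, as a rough Python sketch. It is a simplified model that assumes steady arrival and accept rates, and the function name and the accepts_per_sec parameter are invented for illustration.

```python
def seconds_until_backlog_full(backlog, arrivals_per_sec, accepts_per_sec=0.0):
    """Estimate how long a listen backlog lasts under a steady load.

    Returns infinity when the server accepts at least as fast as
    connections arrive, since the queue then never fills.
    """
    net_rate = arrivals_per_sec - accepts_per_sec
    if net_rate <= 0:
        return float("inf")
    return backlog / net_rate


# Backlog of 100, 30 connections/second, server not accepting yet:
grace = seconds_until_backlog_full(100, 30)  # roughly 3.33 seconds
```

With the default backlog of 5 the same load leaves well under a second of grace.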

HTH,

--
alan kennedy
------------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/contact/alan

**Benjamin Han** · Jul 18 '05, 08:22 AM

Re: server bootstrapping upon connection (WARNING: LONG)

On 2004-02-11 14:23:33 -0500, Alan Kennedy <alanmk@hotmail.com> said:

> [Benjamin Han]
>> But as described, the ps log showed two entries of fooServer.py, but in
>> the log produced, only one server actually was started (I can tell that
>> from the PID in the log file). I really can't explain this discrepancy!
>
> Quick sanity check: Are you using an operating system that reports an
> entry in the ps list for every *thread* of a process, rather than an
> entry for every process, as you might expect? If so, and you have 2
> threads running under the "fooServer.py" process, you get 2 entries in
> the output of ps, rather than 1.
>
> Some unixen do this.
>
> HTH,

Hm, this is Mac OS X 10.3.2, and I checked the man page of ps: it has
another option '-M' to show the threads (which I didn't use to produce
the ps log in question) - but of course they all show up with the same
PID, which is not the phenomenon I saw in the ps log file (where more
than one fooServer.py process showed up, each with a unique PID).

But fooServer.py does use more than one thread (both from the
SocketServer.ThreadingTCPServer and from some other threads it creates).

**Krzysztof Stachlewski** · Jul 18 '05, 08:22 AM

Re: server bootstrapping upon connection (WARNING: LONG)

"Benjamin Han" <this@is.for.spambot> wrote in message
news:2004021111431416807%this@isforspambot...

> Thanks for the tip - the default queue size is 5. I've since changed it
> into 100 (much more than what it actually needs), but the original
> problem remains.

I don't really think that your OS supports
such big backlog queues.

Stach

**Benjamin Han** · Jul 18 '05, 08:22 AM

Re: server bootstrapping upon connection (WARNING: LONG)

On 2004-02-11 14:50:02 -0500, Alan Kennedy <alanmk@hotmail.com> said:

> [ralf@brainbot.com]
>>>>> I just wanted to say, that even if the server is
>>>>> listening on that port, that connection attempts may fail.
>
> [Benjamin Han]
>>>> How is it possible? Could you explain a bit more?
>
> [Krzysztof Stachlewski]
>>> Your server may be accepting connections slower than
>>> the speed at which new connections are coming.
>>> In such a situation your operating system maintains a list
>>> of those awaiting connections. You can control the length of
>>> this list with the "backlog" parameter of the "listen" function.
>>> If the list is full, new connections may simply be dropped
>>> as if the server was not listening at all.
>
> [Benjamin Han]
>> Thanks for the tip - the default queue size is 5. I've since changed it
>> into 100 (much more than what it actually needs), but the original
>> problem remains.
>
> Another quick point: even if you set the backlog to 100 connections,
> that only gives you 3.3333r seconds of grace before the backlog of
> connections fills up, given the "up to 30 connections/second" you
> mentioned in another message.
>
> The 101st client to attempt a connect() will fail, if the server
> doesn't start accept()ing connections before the backlog fills up.
>
> HTH,

The server is threaded, so it only takes fractions of a sec to finish
off a request. Actually from the log once the server is up there is
nothing I can see which could fill the entire queue (even if with a
much smaller size).

My hunch is that the problem is somewhere else. First, I still can't
account for the discrepancy between the server log file and the ps log
file: the former told me only one server instance has been started, but
the latter showed that from time to time a second instance had indeed
been fired up (with state "RV" listed in the log file).

BTW I use this simple line in bash to collect the ps log:

while [ '1' = '1' ] ; do ps awx | grep python | grep -v grep>> ps.log ;
echo --- >> ps.log ; done

I don't suppose this could interfere with anything?

**Benjamin Han** · Jul 18 '05, 08:25 AM

Re: server bootstrapping upon connection (WARNING: LONG)

On 2004-02-10 19:59:47 -0500, Benjamin Han <this@is.for.spambot> said:

> On 2004-02-10 18:22:30 -0500, ralf@brainbot.com said:
>
>> Benjamin Han <this@is.for.spambot> writes:
>>
>>> On 2004-02-10 13:46:37 -0500, ralf@brainbot.com said:
>>>
>>>> fortepianissimo@yahoo.com.tw (Fortepianissimo) writes:

> Here is the situation: I want my server started up upon connection.
> When the first connection comes in, the server is not running. The
> client realizes the fact, and then starts up the server and tries to
> connect again. This of course all happens on the same machine (local
> connection only).
>
> The connections can come in as fast as 30+/sec, so the server is
> threaded (using SocketServer.ThreadingTCPServer). Further, the server
> initialization is threaded into two: one is to do the real, lengthy
> setup, the other is start listening ASAP.

>>>>> The problem: I need to prevent multiple copies of the server being
>>>>> started. I did this by using file locking (fcntl.lockf()). However,
>>>>> not every time the code successfully prevented the server being
>>>>> started up more than one time. Here is the relevant code:
>
> ... from the "ps" log during the stress test, I still get two servers
> started up, listed by ps like this:
>
> 27308 ?? S 0:01.25 python -OO fooServer.py
> 27465 ?? SV 0:01.25 python -OO fooServer.py
>
> In this case the one with pid 27308 was the first server process
> started. Some time later, with unknown reason, another server entry
> will show up in ps log, but the status is always "RV" or "SV" (e.g.,
> pid 27465). The way I started the server in the code is os.system()
> call, and from the man page of ps, 'V' means "The process is suspended
> during a vfork."
>
> The weird thing is I also have log statements in the server script, and
> from the log file only ONE copy is actually started. I start to suspect
> that I might have misunderstood the ps log - could it be there really
> is only ONE copy running, despite there're (up to) 2 entries shown in
> the ps log?

Because of this discrepancy between the ps log and the log produced by
running the server script, I started to suspect that the second server
instance shown in the ps log was a forked process, especially since it
always came with a 'V' (vforked) state. Adding a "-o flags" to the ps
command showed the following (excerpt):

8896 Fri Feb 13 03:47:25 2004 R 2004004 - - 2a567f8 python -OO fooServer.py
8997 Fri Feb 13 03:47:34 2004 RV 8000014 - - 2a54028 python -OO fooServer.py

After checking sys/proc.h:

#define P_VFORK 0x2000000 /* process has vfork children */
#define P_INVFORK 0x8000000 /* proc in vfork */

This basically confirmed that the second entry was a vforked child of the
server process. I then did a "grep -r fork *" but nothing showed up in
any of my code. My question then is, where could such a "hidden" vfork
happen? I did use the following in my code:

1. pyDNS (http://pydns.sourceforge.net/)
2. os.popen(), os.popen2(), os.system()

Any hint is extremely welcome!
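
One possible reading, which nothing in this thread confirms: calls such as os.system() and os.popen() run their command in a child process, and until that child execs the new program it still carries the parent's command line. ps could therefore briefly list it as a second "python -OO fooServer.py" without the server script ever running twice. The Python 3 sketch below only illustrates the first half of that, that a spawned child reports the spawning process as its parent. The function names are made up, and it uses subprocess rather than the calls listed above.

```python
import os
import subprocess
import sys


def parent_pid_seen_by_child():
    """Run a tiny child Python process and return the parent PID it reports."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getppid())"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


def spawned_by_this_process():
    """True if the child's reported parent is the current process."""
    return parent_pid_seen_by_child() == os.getpid()
```

Adding a parent PID column to the ps log would be one way to check whether the extra entry is a child of the running server.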
