server bootstrapping upon connection (WARNING: LONG)

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Fortepianissimo

    server bootstrapping upon connection (WARNING: LONG)

    Here is the situation: I want my server started up upon connection.
    When the first connection comes in, the server is not running. The
    client realizes the fact, and then starts up the server and tries to
    connect again. This of course all happens on the same machine (local
    connection only).

    The connections can come in as fast as 30+/sec, so the server is
    threaded (using SocketServer.Th readingTCPServe r). Further, the server
    initialization is threaded into two: one is to do the real, lengthy
    setup, the other is start lisenting ASAP.

    The problem: I need to prevent multiple copies of the server being
    started. I did this by using file locking (fcntl.lockf()) . However,
    not every time the code successfully prevented the server being
    started up more than one time. Here is the relevant code:

    ---- CLIENT CODE SNIPPET STARTS ----
    from fooConsts import *

    import socket,fcntl,os ,sys,time,loggi ng

    FOO_CLIENT_DEBU G=1

    POLL_SERVER_INT VL=0.5

    if FOO_CLIENT_DEBU G:
    myPID=os.getpid ()
    logger = logging.getLogg er('FOO Client')
    hdlr = logging.FileHan dler(TMP_PATH+' fooClient.log')
    formatter = logging.Formatt er('%(asctime)s %(message)s')
    hdlr.setFormatt er(formatter)
    logger.addHandl er(hdlr)
    logger.setLevel (logging.INFO)

    def log (info):
    logger.info('%d : %s'%(myPID,info ))


    class FooClient:
    def __init__ (self, startServer=Tru e):
    """Connects to FooServer if it exists, otherwise starts it and
    connects to it"""
    self.connected= True
    if self.connect(): return
    elif not startServer:
    if FOO_CLIENT_DEBU G: log('connection failed 1')
    self.connected= False
    return

    # try to obtain the right to start a server; if we can't,
    someone else
    # must be doing it - try to reconnect
    try:
    if FOO_CLIENT_DEBU G: log('try to get right to start
    server')
    serverStartLock =open(TMP_PATH+ 'serverStart.lo ck','w')
    fcntl.lockf(ser verStartLock.fi leno(),fcntl.LO CK_EX|fcntl.LOC K_NB)
    except:
    if FOO_CLIENT_DEBU G: log('someone else is doing it; wait
    2')
    while 1:
    time.sleep(POLL _SERVER_INTVL)
    if self.connect(): return

    # safe to start a server and connect to it
    if FOO_CLIENT_DEBU G: log('start server')
    exitCode=os.sys tem('python -OO "%sfooServer.py " &'%FOO_PATH)
    if exitCode==0:
    if FOO_CLIENT_DEBU G: log('server is being started; wait
    3')
    while 1:
    time.sleep(POLL _SERVER_INTVL)
    if self.connect(): return
    else:
    if FOO_CLIENT_DEBU G: log('sever bootstrapping failed')
    self.connected= False
    raise "Cannot start FOOServer"

    def connect (self):
    """Attempts to connect to FOOServer from PORT_MIN to
    PORT_MAX"""
    if FOO_CLIENT_DEBU G: log('connection attempt')
    port=PORT_MIN
    while port<=PORT_MAX:
    try:
    self.socket=soc ket.socket(sock et.AF_INET,
    socket.SOCK_STR EAM)
    self.socket.con nect(('',port))
    break
    except:
    self.socket.clo se()
    port+=1

    if port>PORT_MAX:
    if FOO_CLIENT_DEBU G: log('connection failed 2')
    return False
    if FOO_CLIENT_DEBU G: log('connection succeeded at port
    %d'%port)
    return True
    ....

    FooClient()
    ---- CLIENT CODE SNIPPET ENDS ----

    From the log (when problem occurred) I see even *AFTER* the server was
    started and accepted connections (several connections came and went
    happily), a connection would come in and hit the "connection failed 1"
    log line. This shouldn't have happened as the default value of
    startServer for FooClient.__ini t__() is True. In the very same
    incident, FooServer was started twice, but in the log there's no trace
    of two "start server" lines, but only the "connection failed 1" line
    in the very end of the log.

    I realize this is a rather complex question to ask for help, but I'm
    really at wits end. Feel free to skip the detailed description, and
    just throw me some general tips and hints. Thank you VERY MUCH!
  • ralf@brainbot.com

    #2
    Re: server bootstrapping upon connection (WARNING: LONG)

    fortepianissimo @yahoo.com.tw (Fortepianissim o) writes:
    [color=blue]
    > Here is the situation: I want my server started up upon connection.
    > When the first connection comes in, the server is not running. The
    > client realizes the fact, and then starts up the server and tries to
    > connect again. This of course all happens on the same machine (local
    > connection only).
    >
    > The connections can come in as fast as 30+/sec, so the server is
    > threaded (using SocketServer.Th readingTCPServe r). Further, the server
    > initialization is threaded into two: one is to do the real, lengthy
    > setup, the other is start lisenting ASAP.
    >
    > The problem: I need to prevent multiple copies of the server being
    > started. I did this by using file locking (fcntl.lockf()) . However,
    > not every time the code successfully prevented the server being
    > started up more than one time. Here is the relevant code:[/color]

    When using fcntl.lockf different FooClient instances in the same
    process will be able lock the file and start another server. You could
    either use fcntl.flock to prevent that or use some global flag.
    Also be sure to keep the file open by keeping a reference to it
    (i.e. self.serverStar tLock=open(...) ).
    For debugging purposes, remove that 'if self.connect(): return' and
    I think you'll see much more servers being started.
    [color=blue]
    >
    > ---- CLIENT CODE SNIPPET STARTS ----
    > from fooConsts import *
    >
    > import socket,fcntl,os ,sys,time,loggi ng
    >
    > FOO_CLIENT_DEBU G=1
    >
    > POLL_SERVER_INT VL=0.5
    >
    > if FOO_CLIENT_DEBU G:
    > myPID=os.getpid ()
    > logger = logging.getLogg er('FOO Client')
    > hdlr = logging.FileHan dler(TMP_PATH+' fooClient.log')
    > formatter = logging.Formatt er('%(asctime)s %(message)s')
    > hdlr.setFormatt er(formatter)
    > logger.addHandl er(hdlr)
    > logger.setLevel (logging.INFO)
    >
    > def log (info):
    > logger.info('%d : %s'%(myPID,info ))
    >
    >
    > class FooClient:
    > def __init__ (self, startServer=Tru e):
    > """Connects to FooServer if it exists, otherwise starts it and
    > connects to it"""
    > self.connected= True
    > if self.connect(): return
    > elif not startServer:
    > if FOO_CLIENT_DEBU G: log('connection failed 1')
    > self.connected= False
    > return
    >
    > # try to obtain the right to start a server; if we can't,
    > someone else
    > # must be doing it - try to reconnect
    > try:
    > if FOO_CLIENT_DEBU G: log('try to get right to start
    > server')
    > serverStartLock =open(TMP_PATH+ 'serverStart.lo ck','w')
    > fcntl.lockf(ser verStartLock.fi leno(),fcntl.LO CK_EX|fcntl.LOC K_NB)
    > except:
    > if FOO_CLIENT_DEBU G: log('someone else is doing it; wait
    > 2')
    > while 1:
    > time.sleep(POLL _SERVER_INTVL)
    > if self.connect(): return
    >
    > # safe to start a server and connect to it
    > if FOO_CLIENT_DEBU G: log('start server')
    > exitCode=os.sys tem('python -OO "%sfooServer.py " &'%FOO_PATH)
    > if exitCode==0:
    > if FOO_CLIENT_DEBU G: log('server is being started; wait
    > 3')
    > while 1:
    > time.sleep(POLL _SERVER_INTVL)
    > if self.connect(): return
    > else:
    > if FOO_CLIENT_DEBU G: log('sever bootstrapping failed')
    > self.connected= False
    > raise "Cannot start FOOServer"
    >
    > def connect (self):
    > """Attempts to connect to FOOServer from PORT_MIN to
    > PORT_MAX"""
    > if FOO_CLIENT_DEBU G: log('connection attempt')
    > port=PORT_MIN
    > while port<=PORT_MAX:
    > try:
    > self.socket=soc ket.socket(sock et.AF_INET,
    > socket.SOCK_STR EAM)
    > self.socket.con nect(('',port))
    > break
    > except:
    > self.socket.clo se()
    > port+=1
    >
    > if port>PORT_MAX:
    > if FOO_CLIENT_DEBU G: log('connection failed 2')
    > return False
    > if FOO_CLIENT_DEBU G: log('connection succeeded at port
    > %d'%port)
    > return True
    > ...
    >
    > FooClient()
    > ---- CLIENT CODE SNIPPET ENDS ----
    >
    > From the log (when problem occurred) I see even *AFTER* the server was
    > started and accepted connections (several connections came and went
    > happily), a connection would come in and hit the "connection failed 1"
    > log line. This shouldn't have happened as the default value of
    > startServer for FooClient.__ini t__() is True. In the very same[/color]

    Well, maybe too many connection attempts are pending...
    [color=blue]
    > incident, FooServer was started twice, but in the log there's no trace
    > of two "start server" lines, but only the "connection failed 1" line
    > in the very end of the log.
    >
    > I realize this is a rather complex question to ask for help, but I'm
    > really at wits end. Feel free to skip the detailed description, and
    > just throw me some general tips and hints. Thank you VERY MUCH![/color]

    --
    brainbot technologies ag
    boppstrasse 64 . 55118 mainz . germany
    fon +49 6131 211639-1 . fax +49 6131 211639-2
    http://brainbot.com/ mailto:ralf@bra inbot.com

    Comment

    • Benjamin Han

      #3
      Re: server bootstrapping upon connection (WARNING: LONG)

      On 2004-02-10 13:46:37 -0500, ralf@brainbot.c om said:
      [color=blue]
      > fortepianissimo @yahoo.com.tw (Fortepianissim o) writes:[color=green]
      >> The problem: I need to prevent multiple copies of the server being
      >> started. I did this by using file locking (fcntl.lockf()) . However,
      >> not every time the code successfully prevented the server being
      >> started up more than one time. Here is the relevant code:[/color]
      >
      > When using fcntl.lockf different FooClient instances in the same
      > process will be able lock the file and start another server. You could
      > either use fcntl.flock to prevent that or use some global flag.[/color]

      Hm... I didn't know there's a difference between flock() and lockf(),
      and I didn't get much info from the document either. Could you explain
      a bit on why lockf() would not lock the file?

      Actually I wrote a tiny script just to test if lockf() does what it
      claims to do:

      --- CODE STARTS ---
      #!/usr/bin/env python

      import os,fcntl,sys

      print "* about to open flock.txt"
      f=open('flock.t xt','w')
      print "* opened the file"
      fcntl.lockf(f.f ileno(),fcntl.L OCK_EX|fcntl.LO CK_NB)
      print "* obtained the lock, enter your line below:"
      l=sys.stdin.rea dline()
      f.truncate()
      f.write(l)
      f.flush()
      sys.stdin.readl ine()
      f.close()

      --- CODE ENDS ---

      It seems it does lock the file? (Mac OS X 10.3.2).
      [color=blue]
      > Also be sure to keep the file open by keeping a reference to it
      > (i.e. self.serverStar tLock=open(...) ). For debugging purposes, remove
      > that 'if self.connect(): return' and
      > I think you'll see much more servers being started.[/color]

      --- CODE SNIPPET STARTS ---
      class FooClient:
      def __init__ (self, startServer=Tru e):
      """Connects to FooServer if it exists, otherwise starts it and
      connects to it"""
      self.connected= True
      if self.connect(): return
      elif not startServer:
      if FOO_CLIENT_DEBU G: log('connection failed 1')
      self.connected= False
      return
      ...
      --- CODE SNIPPET ENDS ---

      Well in that case every connection will try to start a server. Good
      point on keeping a reference to the lock file though - I'll add to it
      and see what happens.
      [color=blue][color=green]
      >>
      >> From the log (when problem occurred) I see even *AFTER* the server was
      >> started and accepted connections (several connections came and went
      >> happily), a connection would come in and hit the "connection failed 1"
      >> log line. This shouldn't have happened as the default value of
      >> startServer for FooClient.__ini t__() is True. In the very same[/color]
      >
      > Well, maybe too many connection attempts are pending...[/color]

      I failed to see why this should affect the default value of the
      argument... if startServer is True (default), that log line should have
      never been reached.

      Thanks!

      Ben

      Comment

      • ralf@brainbot.com

        #4
        Re: server bootstrapping upon connection (WARNING: LONG)

        Benjamin Han <this@is.for.sp ambot> writes:
        [color=blue]
        > On 2004-02-10 13:46:37 -0500, ralf@brainbot.c om said:
        >[color=green]
        >> fortepianissimo @yahoo.com.tw (Fortepianissim o) writes:[color=darkred]
        >>> The problem: I need to prevent multiple copies of the server being
        >>> started. I did this by using file locking (fcntl.lockf()) . However,
        >>> not every time the code successfully prevented the server being
        >>> started up more than one time. Here is the relevant code:[/color]
        >> When using fcntl.lockf different FooClient instances in the same
        >> process will be able lock the file and start another server. You could
        >> either use fcntl.flock to prevent that or use some global flag.[/color]
        >
        > Hm... I didn't know there's a difference between flock() and lockf(),
        > and I didn't get much info from the document either. Could you explain
        > a bit on why lockf() would not lock the file?[/color]

        Well, it might lock the file multiple times in the same
        process. That's the problem:
        ----
        import fcntl

        def getlockfile():
        return open('serverSta rt.lock', 'w')

        def getlock(f):
        try:
        lock(f.fileno() , fcntl.LOCK_EX|f cntl.LOCK_NB)
        except:
        return False
        return True

        def doit():
        f1 = getlockfile()
        f2 = getlockfile()
        print getlock(f1), getlock(f2)

        lock = fcntl.lockf
        doit()

        lock = fcntl.flock
        doit()
        ---

        Output is:
        True True
        True False

        [color=blue]
        >
        > Actually I wrote a tiny script just to test if lockf() does what it
        > claims to do:
        >
        > --- CODE STARTS ---
        > #!/usr/bin/env python
        >
        > import os,fcntl,sys
        >
        > print "* about to open flock.txt"
        > f=open('flock.t xt','w')
        > print "* opened the file"
        > fcntl.lockf(f.f ileno(),fcntl.L OCK_EX|fcntl.LO CK_NB)
        > print "* obtained the lock, enter your line below:"
        > l=sys.stdin.rea dline()
        > f.truncate()
        > f.write(l)
        > f.flush()
        > sys.stdin.readl ine()
        > f.close()
        >
        > --- CODE ENDS ---
        >
        > It seems it does lock the file? (Mac OS X 10.3.2).
        >[color=green]
        >> Also be sure to keep the file open by keeping a reference to it
        >> (i.e. self.serverStar tLock=open(...) ). For debugging purposes,
        >> remove that 'if self.connect(): return' and
        >> I think you'll see much more servers being started.[/color]
        >
        > --- CODE SNIPPET STARTS ---
        > class FooClient:
        > def __init__ (self, startServer=Tru e):
        > """Connects to FooServer if it exists, otherwise starts it and
        > connects to it"""
        > self.connected= True
        > if self.connect(): return
        > elif not startServer:
        > if FOO_CLIENT_DEBU G: log('connection failed 1')
        > self.connected= False
        > return
        > ..
        > --- CODE SNIPPET ENDS ---
        >
        > Well in that case every connection will try to start a server. Good
        > point on keeping a reference to the lock file though - I'll add to it
        > and see what happens.
        >[color=green][color=darkred]
        >>> From the log (when problem occurred) I see even *AFTER* the server
        >>> was
        >>> started and accepted connections (several connections came and went
        >>> happily), a connection would come in and hit the "connection failed 1"
        >>> log line. This shouldn't have happened as the default value of
        >>> startServer for FooClient.__ini t__() is True. In the very same[/color]
        >> Well, maybe too many connection attempts are pending...[/color]
        >
        > I failed to see why this should affect the default value of the
        > argument... if startServer is True (default), that log line should
        > have never been reached.[/color]

        Well, then I suppose you're passing an false argument to
        FooClient.__ini t__. I just wanted to say, that even if the server is
        listening on that port, that connection attempts may fail.
        [color=blue]
        >
        > Thanks!
        >
        > Ben[/color]

        --
        brainbot technologies ag
        boppstrasse 64 . 55118 mainz . germany
        fon +49 6131 211639-1 . fax +49 6131 211639-2
        http://brainbot.com/ mailto:ralf@bra inbot.com

        Comment

        • Benjamin Han

          #5
          Re: server bootstrapping upon connection (WARNING: LONG)

          On 2004-02-10 18:22:30 -0500, ralf@brainbot.c om said:
          [color=blue]
          > Benjamin Han <this@is.for.sp ambot> writes:
          >[color=green]
          >> On 2004-02-10 13:46:37 -0500, ralf@brainbot.c om said:
          >>[color=darkred]
          >>> fortepianissimo @yahoo.com.tw (Fortepianissim o) writes:
          >>>> The problem: I need to prevent multiple copies of the server being
          >>>> started. I did this by using file locking (fcntl.lockf()) . However,
          >>>> not every time the code successfully prevented the server being
          >>>> started up more than one time. Here is the relevant code:
          >>> When using fcntl.lockf different FooClient instances in the same
          >>> process will be able lock the file and start another server. You could
          >>> either use fcntl.flock to prevent that or use some global flag.[/color]
          >>
          >> Hm... I didn't know there's a difference between flock() and lockf(),
          >> and I didn't get much info from the document either. Could you explain
          >> a bit on why lockf() would not lock the file?[/color]
          >
          > Well, it might lock the file multiple times in the same
          > process. That's the problem:
          > ----
          > import fcntl
          >
          > def getlockfile():
          > return open('serverSta rt.lock', 'w')
          >
          > def getlock(f):
          > try:
          > lock(f.fileno() , fcntl.LOCK_EX|f cntl.LOCK_NB)
          > except:
          > return False
          > return True
          >
          > def doit():
          > f1 = getlockfile()
          > f2 = getlockfile()
          > print getlock(f1), getlock(f2)
          >
          > lock = fcntl.lockf
          > doit()
          >
          > lock = fcntl.flock
          > doit()
          > ---
          >
          > Output is:
          > True True
          > True False[/color]

          Ok I've since changed all lockf() to flock(), but from the "ps" log
          during the stress test, I still get two servers started up, listed by
          ps like this:

          27308 ?? S 0:01.25 python -OO fooServer.py
          27465 ?? SV 0:01.25 python -OO fooServer.py

          In this case the one with pid 27308 was the first server process
          started. Some time later, with unknown reason, another server entry
          will show up in ps log, but the status is always "RV" or "SV" (e.g.,
          pid 27465). The way I started the server in the code is os.system()
          call, and from the man page of ps, 'V' means "The process is suspended
          during a vfork."

          The weird thing is I also have log statements in the server script, and
          from the log file only ONE copy is actually started. I start to suspect
          that I might have misunderstood the ps log - could it be there really
          is only ONE copy running, despite there're (up to) 2 entries shown in
          the ps log?
          [color=blue]
          > I just wanted to say, that even if the server is
          > listening on that port, that connection attempts may fail.[/color]

          How is it possible? Could you explain a bit more?

          Thanks again,

          Ben

          Comment

          • Krzysztof Stachlewski

            #6
            Re: server bootstrapping upon connection (WARNING: LONG)

            "Benjamin Han" <this@is.for.sp ambot> wrote in message
            news:2004021019 594716807%this@ isforspambot...
            [color=blue][color=green]
            > > I just wanted to say, that even if the server is
            > > listening on that port, that connection attempts may fail.[/color]
            >
            > How is it possible? Could you explain a bit more?[/color]

            Your server may be accepting connections slower than
            the speed at which new connections are coming.
            In such situation your operating system maintains a list
            of those awaiting connections. You can control the length of
            this list with the "backlog" parameter of the "listen" function.
            If the list is full, new connections may simply be dropped
            as if the server was not listening at all.

            Stach

            Comment

            • Benjamin Han

              #7
              Re: server bootstrapping upon connection (WARNING: LONG)

              On 2004-02-11 07:00:32 -0500, "Krzysztof Stachlewski"
              <stach@fr.REMOV E.pl> said:
              [color=blue]
              > "Benjamin Han" <this@is.for.sp ambot> wrote in message
              > news:2004021019 594716807%this@ isforspambot...
              >[color=green][color=darkred]
              >>> I just wanted to say, that even if the server is
              >>> listening on that port, that connection attempts may fail.[/color]
              >>
              >> How is it possible? Could you explain a bit more?[/color]
              >
              > Your server may be accepting connections slower than
              > the speed at which new connections are coming.
              > In such situation your operating system maintains a list
              > of those awaiting connections. You can control the length of
              > this list with the "backlog" parameter of the "listen" function.
              > If the list is full, new connections may simply be dropped
              > as if the server was not listening at all.
              >
              > Stach[/color]

              Thanks for the tip - the default queue size is 5. I've since changed it
              into 100 (much more than what it actually needs), but the original
              problem remains.

              Comment

              • Benjamin Han

                #8
                Re: server bootstrapping upon connection (WARNING: LONG)

                On 2004-02-10 19:59:47 -0500, Benjamin Han <this@is.for.sp ambot> said:
                [color=blue]
                > On 2004-02-10 18:22:30 -0500, ralf@brainbot.c om said:
                >[color=green]
                >> Benjamin Han <this@is.for.sp ambot> writes:
                >>[color=darkred]
                >>> On 2004-02-10 13:46:37 -0500, ralf@brainbot.c om said:
                >>>
                >>>> fortepianissimo @yahoo.com.tw (Fortepianissim o) writes:
                >>>>> The problem: I need to prevent multiple copies of the server being
                >>>>> started. I did this by using file locking (fcntl.lockf()) . However,
                >>>>> not every time the code successfully prevented the server being
                >>>>> started up more than one time. Here is the relevant code:
                >>>> ... from the "ps" log during the stress test, I still get two servers
                >>>> started up, listed by ps like this:[/color][/color]
                >
                > 27308 ?? S 0:01.25 python -OO fooServer.py
                > 27465 ?? SV 0:01.25 python -OO fooServer.py
                >
                > In this case the one with pid 27308 was the first server process
                > started. Some time later, with unknown reason, another server entry
                > will show up in ps log, but the status is always "RV" or "SV" (e.g.,
                > pid 27465). The way I started the server in the code is os.system()
                > call, and from the man page of ps, 'V' means "The process is suspended
                > during a vfork."
                >
                > The weird thing is I also have log statements in the server script, and
                > from the log file only ONE copy is actually started. I start to suspect
                > that I might have misunderstood the ps log - could it be there really
                > is only ONE copy running, despite there're (up to) 2 entries shown in
                > the ps log?[/color]

                Ok this is by far the most puzzling one: I use logger module in the
                server script to notify its execution. The code is like this:

                --- LOGGING CODE STARTS ---
                if FOO_COMM_DEBUG:
                myPID=os.getpid ()
                logger = logging.getLogg er('FOO Comm')
                hdlr = logging.FileHan dler(TMP_PATH+' fooComm.log')
                formatter = logging.Formatt er('%(asctime)s %(message)s')
                hdlr.setFormatt er(formatter)
                logger.addHandl er(hdlr)
                logger.setLevel (logging.INFO)

                def log (info):
                logger.info('Se rver %d: %s'%(myPID,info ))

                if FOO_COMM_DEBUG: log('process %d started me'%os.getppid( ))
                --- LOGGING CODE ENDS ---

                This is the FIRST block of code right after my "import ..." statement
                in the server script, so there's no way that it could be skipped if the
                script gets executed. Of course FOO_COMM_DEBUG is set to 1.

                But as described, the ps log showed two entries of fooServer.py, but in
                the log produced, only one server actually was started (I can tell that
                from the PID in the log file). I really can't explain this discrepency!

                Comment

                • Alan Kennedy

                  #9
                  Re: server bootstrapping upon connection (WARNING: LONG)

                  [Benjamin Han][color=blue]
                  > But as described, the ps log showed two entries of fooServer.py, but in
                  > the log produced, only one server actually was started (I can tell that
                  > from the PID in the log file). I really can't explain this discrepency![/color]

                  Quick sanity check: Are you using an operating system that reports an
                  entry in the ps list for every *thread* of a process, rather an entry
                  for every process, as you might expect. So if you have 2 threads
                  running under the "fooServer. py" process, you get 2 entries in the
                  output of ps, rather than 1.

                  Some unixen do this.

                  HTH,

                  --
                  alan kennedy
                  ------------------------------------------------------
                  check http headers here: http://xhaus.com/headers
                  email alan: http://xhaus.com/contact/alan

                  Comment

                  • Alan Kennedy

                    #10
                    Re: server bootstrapping upon connection (WARNING: LONG)

                    [ralf@brainbot.c om][color=blue][color=green][color=darkred]
                    >>>> I just wanted to say, that even if the server is
                    >>>> listening on that port, that connection attempts may fail.[/color][/color][/color]

                    [Benjamin Han][color=blue][color=green][color=darkred]
                    >>> How is it possible? Could you explain a bit more?[/color][/color][/color]

                    [Krzysztof Stachlewski][color=blue][color=green]
                    >> Your server may be accepting connections slower than
                    >> the speed at which new connections are coming.
                    >> In such situation your operating system maintains a list
                    >> of those awaiting connections. You can control the length of
                    >> this list with the "backlog" parameter of the "listen" function.
                    >> If the list is full, new connections may simply be dropped
                    >> as if the server was not listening at all.[/color][/color]

                    [Benjamin Han][color=blue]
                    > Thanks for the tip - the default queue size is 5. I've since changed it
                    > into 100 (much more than what it actually needs), but the original
                    > problem remains.[/color]

                    Another quick point: even if you set the backlog to 100 connections,
                    that only gives you 3.3333r seconds of grace before the backlog of
                    connections fills up, given the "up to 30 connections/second" you
                    mentioned in another message.

                    The 101st client to attempt a connect() will fail, if the server
                    doesn't start accept()ing connections before the backlog fills up.

                    HTH,

                    --
                    alan kennedy
                    ------------------------------------------------------
                    check http headers here: http://xhaus.com/headers
                    email alan: http://xhaus.com/contact/alan

                    Comment

                    • Benjamin Han

                      #11
                      Re: server bootstrapping upon connection (WARNING: LONG)

                      On 2004-02-11 14:23:33 -0500, Alan Kennedy <alanmk@hotmail .com> said:
                      [color=blue]
                      > [Benjamin Han][color=green]
                      >> But as described, the ps log showed two entries of fooServer.py, but in
                      >> the log produced, only one server actually was started (I can tell that
                      >> from the PID in the log file). I really can't explain this discrepency![/color]
                      >
                      > Quick sanity check: Are you using an operating system that reports an
                      > entry in the ps list for every *thread* of a process, rather an entry
                      > for every process, as you might expect. So if you have 2 threads
                      > running under the "fooServer. py" process, you get 2 entries in the
                      > output of ps, rather than 1.
                      >
                      > Some unixen do this.
                      >
                      > HTH,[/color]

                      Hm, this is Mac OS X 10.3.2, and I checked the man page of ps, it has
                      another option '-M' to show the threads (which I didn't use to produce
                      the ps log in question) - but of course they all show up with the same
                      PID, not the phenomena I saw in the ps log file (where more than one
                      fooServer.py processes showed up with unique PIDs).

                      But fooServer.py does use more than one thread (both from the
                      SocketServer.Th readingTCPServe r and from some other threads it creates).

                      Comment

                      • Krzysztof Stachlewski

                        #12
                        Re: server bootstrapping upon connection (WARNING: LONG)

                        "Benjamin Han" <this@is.for.sp ambot> wrote in message
                        news:2004021111 431416807%this@ isforspambot...
                        [color=blue]
                        > Thanks for the tip - the default queue size is 5. I've since changed it
                        > into 100 (much more than what it actually needs), but the original
                        > problem remains.[/color]

                        I don't really think that your OS supports
                        so big backlog queues.

                        Stach

                        Comment

                        • Benjamin Han

                          #13
                          Re: server bootstrapping upon connection (WARNING: LONG)

                          On 2004-02-11 14:50:02 -0500, Alan Kennedy <alanmk@hotmail .com> said:
                          [color=blue]
                          > [ralf@brainbot.c om][color=green][color=darkred]
                          >>>>> I just wanted to say, that even if the server is
                          >>>>> listening on that port, that connection attempts may fail.[/color][/color]
                          >
                          > [Benjamin Han][color=green][color=darkred]
                          >>>> How is it possible? Could you explain a bit more?[/color][/color]
                          >
                          > [Krzysztof Stachlewski][color=green][color=darkred]
                          >>> Your server may be accepting connections slower than
                          >>> the speed at which new connections are coming.
                          >>> In such situation your operating system maintains a list
                          >>> of those awaiting connections. You can control the length of
                          >>> this list with the "backlog" parameter of the "listen" function.
                          >>> If the list is full, new connections may simply be dropped
                          >>> as if the server was not listening at all.[/color][/color]
                          >
                          > [Benjamin Han][color=green]
                          >> Thanks for the tip - the default queue size is 5. I've since changed it
                          >> into 100 (much more than what it actually needs), but the original
                          >> problem remains.[/color]
                          >
                          > Another quick point: even if you set the backlog to 100 connections,
                          > that only gives you 3.3333r seconds of grace before the backlog of
                          > connections fills up, given the "up to 30 connections/second" you
                          > mentioned in another message.
                          >
                          > The 101st client to attempt a connect() will fail, if the server
                          > doesn't start accept()ing connections before the backlog fills up.
                          >
                          > HTH,[/color]

                          The server is threaded, so it only takes fractions of a sec to finish
                          off a request. Actually from the log once the server is up there is
                          nothing I can see which could fill the entire queue (even if with a
                          much smaller size).

                          My hunch is that the problem is somewhere else. First I can't still
                          account for the discrepancy between the server log file and ps log
                          file: the former told me only one server instance has been started, but
                          the later showed from time to time a second instance had indeed been
                          fired up (with state "RV" listed in the log file).

                          BTW I use this simple line in bash to collec the ps log:

                          while [ '1' = '1' ] ; do ps awx | grep python | grep -v grep>> ps.log ;
                          echo --- >> ps.log ; done

                          I don't suppose this could interfere anything?

                          Comment

                          • Benjamin Han

                            #14
                            Re: server bootstrapping upon connection (WARNING: LONG)


                            On 2004-02-10 19:59:47 -0500, Benjamin Han <this@is.for.sp ambot> said:
                            [color=blue]
                            > On 2004-02-10 18:22:30 -0500, ralf@brainbot.c om said:
                            >[color=green]
                            >> Benjamin Han <this@is.for.sp ambot> writes:
                            >>[color=darkred]
                            >>> On 2004-02-10 13:46:37 -0500, ralf@brainbot.c om said:
                            >>>
                            >>>> fortepianissimo @yahoo.com.tw (Fortepianissim o) writes:[/color][/color][/color]
                            [color=blue]
                            > Here is the situation: I want my server started up upon connection.
                            > When the first connection comes in, the server is not running. The
                            > client realizes the fact, and then starts up the server and tries to
                            > connect again. This of course all happens on the same machine (local
                            > connection only).
                            >
                            > The connections can come in as fast as 30+/sec, so the server is
                            > threaded (using SocketServer.Th readingTCPServe r). Further, the server
                            > initialization is threaded into two: one is to do the real, lengthy
                            > setup, the other is start lisenting ASAP.[/color]
                            [color=blue][color=green][color=darkred]
                            >>>>> The problem: I need to prevent multiple copies of the server being
                            >>>>> started. I did this by using file locking (fcntl.lockf()) . However,
                            >>>>> not every time the code successfully prevented the server being
                            >>>>> started up more than one time. Here is the relevant code:[/color][/color]
                            >
                            > ... from the "ps" log during the stress test, I still get two servers
                            > started up, listed by ps like this:
                            >
                            > 27308 ?? S 0:01.25 python -OO fooServer.py
                            > 27465 ?? SV 0:01.25 python -OO fooServer.py
                            >
                            > In this case the one with pid 27308 was the first server process
                            > started. Some time later, with unknown reason, another server entry
                            > will show up in ps log, but the status is always "RV" or "SV" (e.g.,
                            > pid 27465). The way I started the server in the code is os.system()
                            > call, and from the man page of ps, 'V' means "The process is suspended
                            > during a vfork."
                            >
                            > The weird thing is I also have log statements in the server script, and
                            > from the log file only ONE copy is actually started. I start to suspect
                            > that I might have misunderstood the ps log - could it be there really
                            > is only ONE copy running, despite there're (up to) 2 entries shown in
                            > the ps log?[/color]

                            Because of this discrepancy between the ps log and the log produced by
                            running the server script, I started to suspect that the second server
                            instance shown in the ps log was a forked process, especially since it
                            always came with a 'V' (vforked) state. Adding a "-o flags" to the ps
                            command showed the following (exerpt);

                            8896 Fri Feb 13 03:47:25 2004 R 2004004 - - 2a567f8
                            python -OO fooServer.py
                            8997 Fri Feb 13 03:47:34 2004 RV 8000014 - - 2a54028
                            python -OO fooServer.py

                            After checking sys/proc.h:

                            #define P_VFORK 0x2000000 /* process has vfork children */
                            #define P_INVFORK 0x8000000 /* proc in vfork */

                            Basically confirmed that the server process was vforked. I then did a
                            "grep -r fork *" but none showed in any of my code. My question then
                            is, where could such a "hidden" vfork happen? I did use the following
                            in my code:

                            1. pyDNS (http://pydns.sourceforge.net/)
                            2. os.popen(), os.popen2(), os.system()

                            Any hint is extremely welcome!

                            Comment

                            Working...