asynchat sends data on async_chat.push and .push_with_producer

  • ludvig.ericson@gmail.com

    asynchat sends data on async_chat.push and .push_with_producer

    Hello,

    My question concerns asynchat in particular. With the following half-
    pseudo code in mind:

    class Example(asynchat.async_chat):
        def readable(self):
            if foo:
                self.push_with_producer(ProducerA())
            return asynchat.async_chat.readable(self)

    Now, asyncore will call the readable function just before a select(),
    and it is meant to determine whether or not to include that asyncore
    dispatcher in the select map for reading.

    The problem with this code is that it has the unexpected side effect
    of _immediately_ trying to send, regardless of whether the async_chat
    object is actually writable or not.

    asynchat.push_with_producer (and .push as well)
    call .initiate_send(), which in turn calls .send() if there's data
    buffered. While this might seem logical, it isn't at all.
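
    For reference, the push() path looks roughly like this in the 2.5-era
    asynchat.py (paraphrased from memory, so check your own copy):

        def push(self, data):
            # queue the data as a simple_producer...
            self.producer_fifo.push(simple_producer(data))
            # ...and then immediately try to write, which is the surprise
            self.initiate_send()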

    Suppose that when Example.readable is called, the socket has already
    been closed. There are two possible scenarios where it could be
    closed: a) the remote endpoint closed the connection, and b) the
    producer ProducerA somehow closed the connection (my case).

    Obviously, calling send on a socket that has been closed will result
    in an error - EBADF, "Bad file descriptor".

    So, my question is: why does asynchat.push* call self.initiate_send()?
    Nothing in the name "push" suggests that it'll transmit immediately,
    disregarding potential "closedness". Removing the two calls
    to .initiate_send() in the two push functions would still mean data is
    sent, but only when data can be sent - which is, IMO, what should be
    done.

    Thankful for insights,
    Ludvig.
  • Josiah Carlson

    #2
    Re: asynchat sends data on async_chat.push and .push_with_producer

    Ludvig,

    In a substantial way, I agree with you. Calling initiate_send()
    within push or push_with_producer is arguably a misfeature (which you
    have argued).

    In a pure world, the only writing that is done would be within the
    handle_send() callbacks within the select loop. Then again, in a
    perfect world, calling readable() and writable() would have no strange
    side effects (as your example does), and all push*() calls would
    be made within the handle_*() methods.

    We do not live in a pure world, Python isn't pure (practicality beats
    purity), and by attempting to send some data each time a .push*()
    method is called, there are measurable increases in transfer rates.

    In the particular case you are looking at (and complaining about ;) ),
    if you want to bypass the initiate_send() call, you can dig into the
    particular implementation of asynchat you are using (the internals may
    change in 2.6 and 3.x versus 2.5 and previous), and append your output
    to the outgoing queue. You could even abstract out the push*() calls
    for a non-auto-sending version (easy), write your own initiate_send()
    method that checks the stack to verify that it's being called from
    handle_send() (also easy), or any one of many other work-arounds.
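
    As a minimal sketch of the "non-auto-sending" variant (written against
    the 2.5-era internals, where pushed data ends up in self.producer_fifo;
    the class name here is only illustrative):

        import asynchat

        class quiet_chat(asynchat.async_chat):
            """push*() only queues data; actual sending is left to the
            handle_write()/initiate_send() pass of the asyncore loop."""

            def push(self, data):
                # queue the data, but skip the immediate initiate_send()
                self.producer_fifo.push(asynchat.simple_producer(data))

            def push_with_producer(self, producer):
                # likewise, just append the producer to the outgoing queue
                self.producer_fifo.push(producer)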

    Yes, it would be convenient to not have push*() actually send data
    when called in some cases, but in others, the increase in data
    transfer rates and/or reduction in latency is substantial.

    - Josiah


    • ludvig.ericson@gmail.com

      #3
      Re: asynchat sends data on async_chat.push and .push_with_producer

      > In a pure world, the only writing that is done would be within the
      > handle_send() callbacks within the select loop. Then again, in a
      > perfect world, calling readable() and writable() would have no strange
      > side effects (as your example does), and all push*() calls would
      > be made within the handle_*() methods.

      It wouldn't have those side-effects if push really just pushed. :-P

      > We do not live in a pure world, Python isn't pure (practicality beats
      > purity), and by attempting to send some data each time a .push*()
      > method is called, there are measurable increases in transfer rates.

      -- 8< --

      > Yes, it would be convenient to not have push*() actually send data
      > when called in some cases, but in others, the increase in data
      > transfer rates and/or reduction in latency is substantial.

      If it increases transfer speed that much, the calling application
      almost has to be broken, or at least not designed as it should be - of
      course there are such applications, but you know...

      Anyway, I went for a subclassing way of dealing with it, and it works
      fine.

      Thanks for the reply though, hadn't considered possibly "flawed"
      applications where the asyncore loop isn't revisited as often as it
      should. :->

      Ludvig


      • Giampaolo Rodola'

        #4
        Re: asynchat sends data on async_chat.push and .push_with_producer

        On May 13, 17:59, Josiah Carlson <josiah.carl...@gmail.com> wrote:
        > We do not live in a pure world, Python isn't pure (practicality beats
        > purity), and by attempting to send some data each time a .push*()
        > method is called, there are measurable increases in transfer rates.

        Good point. I'd like to ask a question: if we had a default
        asyncore.loop timeout of (say) 0.01 instead of 30, could we avoid
        this problem?
        I've always found it weird that asyncore has such a high default
        timeout value.
        Twisted, for example, uses a default of 0.01 for all its reactors.
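
        (For concreteness, the value in question is the timeout argument to
        asyncore.loop(), which is in seconds; something like the following,
        with 0.01 purely illustrative:)

            import asyncore

            # The default is timeout=30.0; select()/poll() still return as
            # soon as a monitored socket is ready, so this value only bounds
            # how long the loop sleeps when nothing is ready.
            asyncore.loop(timeout=0.01)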


        • Giampaolo Rodola'

          #5
          Re: asynchat sends data on async_chat.push and .push_with_producer

          On May 13, 18:35, "ludvig.eric...@gmail.com"
          <ludvig.eric...@gmail.com> wrote:
          > Anyway, I went for a subclassing way of dealing with it, and it works
          > fine.

          As Josiah already stated, pay attention to the changes that will be
          applied to the asyncore internals in Python 2.6 and 3.0 (in detail,
          you can see how things will change by looking at the patch provided
          in bug #1736190).
          Your subclass might not work on all implementations.

          --- Giampaolo


          • Josiah Carlson

            #6
            Re: asynchat sends data on async_chat.push and .push_with_producer

            On May 13, 9:35 am, "ludvig.eric...@gmail.com"
            <ludvig.eric...@gmail.com> wrote:
            > > In a pure world, the only writing that is done would be within the
            > > handle_send() callbacks within the select loop. Then again, in a
            > > perfect world, calling readable() and writable() would have no strange
            > > side effects (as your example does), and all push*() calls would
            > > be made within the handle_*() methods.
            >
            > It wouldn't have those side-effects if push really just pushed. :-P
            >
            > > We do not live in a pure world, Python isn't pure (practicality beats
            > > purity), and by attempting to send some data each time a .push*()
            > > method is called, there are measurable increases in transfer rates.
            >
            > -- 8< --
            >
            > > Yes, it would be convenient to not have push*() actually send data
            > > when called in some cases, but in others, the increase in data
            > > transfer rates and/or reduction in latency is substantial.
            >
            > If it increases transfer speed that much, the calling application
            > almost has to be broken, or at least not designed as it should be - of
            > course there are such applications, but you know...

            It's not a matter of being broken at all, it's a matter of control
            flow. When we immediately try to send whenever a .push() call is
            made, the underlying TCP/IP stack will accept a reasonably large
            amount of data before it actually fills up (the most recent FreeBSD,
            from what I understand, will accept up to 1 meg, which is how they are
            able to saturate 10Gbit links), and by tossing the data into the
            TCP/IP buffer early, the data gets sent earlier, thus reducing
            latency.

            Further, because we are making more actual calls to socket.send(),
            assuming the underlying TCP/IP buffer isn't filled (which may or may
            not be a good assumption), and assuming that the link has more
            capacity than is being used (usually the case on LANs and high-speed
            internet links), putting more data into the buffer to be handled by
            the underlying link layers will also increase transfer speeds.

            When the socket.send() calls are delayed until the next pass through
            the loop, and we aren't doing an initial send, then we don't get the
            benefit of the underlying TCP/IP socket layer buffering.

            In my experience over high-speed connections (LANs, Gbit WAN links,
            local machine connections), I have found that increasing block sizes
            to 32k significantly improves performance for bandwidth-constrained
            applications, as there are far fewer blocks to toss to the underlying
            layers, less Python code execution (Python 2.5 has a default block
            size of 512 bytes, or 64x as much Python execution to send the same
            amount of data, and one of the proposed 2.6 changes is to up this to a
            more reasonable 4096 bytes), and more effective use of the TCP/IP
            buffers (which are typically 64k or larger).
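
            (For a concrete knob: the block size discussed here is the
            ac_out_buffer_size class attribute on async_chat; a minimal sketch
            of bumping it, with 32k taken from the figure above:)

                import asynchat

                class big_buffer_chat(asynchat.async_chat):
                    # Default is 512 bytes in Python 2.5 (4096 proposed for 2.6);
                    # larger blocks mean fewer socket.send() calls for the same
                    # amount of data pushed.
                    ac_out_buffer_size = 32 * 1024
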
            > Anyway, I went for a subclassing way of dealing with it, and it works
            > fine.
            >
            > Thanks for the reply though, hadn't considered possibly "flawed"
            > applications where the asyncore loop isn't revisited as often as it
            > should. :->
            >
            > Ludvig

            Again, it's not about the application being flawed, it's a matter of
            control flow. ;) Also, it's not a matter of any timeouts in the
            select/poll loops (as Giampaolo suggested); if any socket is readable
            or writable, those calls will return immediately (a few hundred
            microseconds per call isn't bad).

            - Josiah
