yEnc implementation in Python, bit slow

  • Freddie

    yEnc implementation in Python, bit slow

    Hi,

    I posted a while ago for some help with my word finder program, which is now
    quite a lot faster than I could manage. Thanks to all who helped :)

    This time, I've written a basic batch binary usenet poster in Python, but
    encoding the data into yEnc format is fairly slow. Is it possible to improve
    the routine any, WITHOUT using non-standard libraries? I don't want to have
    to rely on something strange ;)

    yEncode1 tends to be slightly faster here for me on my K6/2 500:

    $ python2.3 testyenc.py
    yEncode1 401563 1.82
    yEncode1 401563 1.83
    yEncode2 401562 1.83
    yEncode2 401562 1.83

    Any help would be greatly appreciated :)

    Freddie


    import struct
    import time
    from zlib import crc32

    def timing(f, n, a):
        print f.__name__,
        r = range(n)
        t1 = time.clock()
        for i in r:
            #f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a)
            f(a)
        t2 = time.clock()
        print round(t2-t1, 3)

    def yEncSetup():
        global YENC
        YENC = [''] * 256

        for I in range(256):
            O = (I + 42) % 256
            if O in (0, 10, 13, 61):
                # Supposed to modulo 256, but err, why bother?
                O += 64
                YENC[I] = '=%c' % O
            else:
                YENC[I] = '%c' % O

    def yEncode1(data):
        global YENC
        yenc = YENC

        encoded = []
        datalen = len(data)
        n = 0
        while n < datalen:
            chunk = data[n:n+256]
            n += len(chunk)
            encoded.extend([yenc[ord(c)] for c in chunk])
            encoded.append('\n')

        print len(''.join(encoded)),

    def yEncode2(data):
        global YENC
        yenc = YENC

        lines = []
        datalen = len(data)
        n = 0

        bits = divmod(datalen, 256)
        format = '256s' * bits[0]
        parts = struct.unpack(format, data[:-bits[1]])
        for part in parts:
            lines.append(''.join([yenc[ord(c)] for c in part]))

        lines.append(''.join([yenc[ord(c)] for c in data[-bits[1]:]]))
        print len('\n'.join(lines) + '\n'),


    yEncSetup()

    teststr1 = 'a' * 400000
    teststr2 = 'b' * 400000

    for meth in (yEncode1, yEncode2):
        timing(meth, 1, teststr1)
        timing(meth, 1, teststr2)

    --
    Remove the oinks!
  • Oren Tirosh

    #2
    Re: yEnc implementation in Python, bit slow

    On Tue, Aug 05, 2003 at 12:50:58AM +1000, Freddie wrote:
    > Hi,
    >
    > I posted a while ago for some help with my word finder program, which is now
    > quite a lot faster than I could manage. Thanks to all who helped :)
    >
    > This time, I've written a basic batch binary usenet poster in Python, but
    > encoding the data into yEnc format is fairly slow. Is it possible to improve
    > the routine any, WITHOUT using non-standard libraries? I don't want to have
    > to rely on something strange ;)

    Python is pretty quick as long as you avoid loops that operate character
    by character. Try to use functions that operate on longer strings.

    Suggestions:

    For the (x+42)%256 build a translation table and use str.translate.
    To encode characters as escape sequences use str.replace or re.sub.

    Oren
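
Oren's two suggestions can be sketched roughly as follows. This is only an illustration, not code from the thread: it assumes Python 3 `bytes` rather than the Python 2 strings used in the posts, and the names `SHIFT_TABLE`, `CRITICAL` and `yenc_translate` are made up for the example.

```python
# Sketch only: Python 3 bytes API assumed; all names here are illustrative.

# Translation table: byte i maps to (i + 42) % 256.
SHIFT_TABLE = bytes((i + 42) % 256 for i in range(256))

# Critical bytes after shifting: '=' (61), NUL (0), LF (10), CR (13).
# '=' is handled first; otherwise the '=' markers added for the other
# three bytes would themselves be escaped a second time.
CRITICAL = (61, 0, 10, 13)


def yenc_translate(data: bytes) -> bytes:
    """Shift every byte by 42, then escape the critical bytes."""
    shifted = data.translate(SHIFT_TABLE)
    for code in CRITICAL:
        escaped = b"=" + bytes([(code + 64) % 256])
        shifted = shifted.replace(bytes([code]), escaped)
    return shifted
```

Both `translate` and `replace` run over the whole buffer in C, which is why this style avoids the per-character Python loop. Line splitting is left out of this sketch.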


    • Freddie

      #3
      Re: yEnc implementation in Python, bit slow

      Oren Tirosh <oren-py-l@hishome.net> wrote in
      news:mailman.1060033689.18067.python-list@python.org:

      > Suggestions:
      >
      > For the (x+42)%256 build a translation table and use str.translate.
      > To encode characters as escape sequences use str.replace or re.sub.
      >
      > Oren

      Aahh. I couldn't work out how to use translate() at 4am this morning, but I
      worked it out now :) This version is a whoooole lot faster, and actually
      meets the yEnc line splitting spec. Bonus!

      $ python2.3 testyenc.py
      yEncode1 407682 1.98
      yEncode2 407707 0.18

      I'm not sure how to use re.sub to escape the characters. I assume it would
      also be 4 separate replaces? Also, it needs a slightly more random input
      string than 'a' * 400000, so here we go.


      test = []
      for i in xrange(256):
          test.append(chr(i))
      teststr = ''.join(test*1562)


      def yEncode2(data):
          trans = ''
          for i in range(256):
              trans += chr((i+42)%256)

          translated = data.translate(trans)

          # escape =, NUL, LF, CR
          for i in (61, 0, 10, 13):
              j = '=%c' % (i + 64)
              translated = translated.replace(chr(i), j)


          encoded = []
          n = 0
          for i in range(0, len(translated), 256):
              chunk = translated[n+i:n+i+256]
              if chunk[-1] == '=':
                  chunk += translated[n+i+256+1]
                  n += 1
              encoded.append(chunk)
              encoded.append('\n')

          result = ''.join(encoded)

          print len(result),
          return result

      --
      -----------------------------------------------------------
      Remove the oinks!
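
On the re.sub question in this post: one pattern with a replacement function can cover all four critical bytes in a single pass, so four separate substitutions are not needed. A minimal sketch, assuming Python 3 `bytes` and already-shifted input; `escape_critical` and the other names are hypothetical:

```python
import re

# Matches any single critical byte: NUL, LF, CR or '='.
_CRITICAL_RE = re.compile(rb"[\x00\n\r=]")


def _escape(match):
    # The matched byte is replaced by '=' followed by (byte + 64) % 256.
    return b"=" + bytes([(match.group(0)[0] + 64) % 256])


def escape_critical(shifted: bytes) -> bytes:
    """Escape all critical bytes of already-shifted data in one pass."""
    return _CRITICAL_RE.sub(_escape, shifted)
```

Because everything happens in one pass, the ordering concern of the chained `replace` calls (doing '=' first) does not arise here. Whether this is faster than four `replace` calls depends on the data and would need measuring.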


      • Freddie

        #4
        Re: yEnc implementation in Python, bit slow

        Freddie <oinkfreddie@oinkshlick.oinknet> wrote in
        news:Xns93CE8D81747C5freddiethescaryeleph@218.100.3.9:

        Arr. There's an error here: the index [n+i+256+1] shouldn't have the +1. I always get
        that wrong :) With that fixed, the posted files actually decode, and the yEncode()
        overhead is a lot lower.

        <snip>

        > encoded = []
        > n = 0
        > for i in range(0, len(translated), 256):
        >     chunk = translated[n+i:n+i+256]
        >     if chunk[-1] == '=':
        >         chunk += translated[n+i+256] <<< this line
        >         n += 1
        >     encoded.append(chunk)
        >     encoded.append('\n')

        --
        Remove the oinks!
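
The line-splitting step discussed in this post could also be written with an explicit position counter instead of the `range` loop plus the `n` offset, which avoids running past the end of the data. A hedged sketch, again assuming Python 3 `bytes` and input that has already been shifted and escaped; `split_lines`, `width` and `eol` are illustrative names, and the `'\n'` line ending simply follows the thread's code:

```python
def split_lines(escaped: bytes, width: int = 256, eol: bytes = b"\n") -> bytes:
    """Break escaped data into lines without splitting an escape pair."""
    lines = []
    pos = 0
    total = len(escaped)
    while pos < total:
        end = min(pos + width, total)
        # In escaped data every '=' is an escape marker, so a line that
        # would end on '=' is extended by one byte to keep the pair whole.
        if end < total and escaped[end - 1:end] == b"=":
            end += 1
        lines.append(escaped[pos:end])
        pos = end
    return b"".join(line + eol for line in lines)
```

For example, `split_lines(b"ab=}cd", width=3)` keeps `=}` together on the first line, so that line is one byte longer than `width`.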
