XORing long strings opimization?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Noen

    XORing long strings opimization?

    def XOR(s1,s2):
    """ XOR string s1 with s2 """
    output = ""
    # Argument check
    if (type(s1) and type(s2)) != type(""):
    raise TypeError, "Arguments are not strings"
    if len(s1) != len(s2):
    raise ValueError, "Size differs in strings"
    # Xoring
    for c1 in s1:
    for c2 in s2:
    output += chr(ord(c1) ^ ord(c2))
    return output

    This way is very slow for large strings.
    Anyone know of a better and faster way to do it?

  • Peter Otten

    #2
    Re: XORing long strings opimization?

    Noen wrote:
    [color=blue]
    > def XOR(s1,s2):
    > """ XOR string s1 with s2 """
    > output = ""
    > # Argument check
    > if (type(s1) and type(s2)) != type(""):
    > raise TypeError, "Arguments are not strings"
    > if len(s1) != len(s2):
    > raise ValueError, "Size differs in strings"
    > # Xoring
    > for c1 in s1:
    > for c2 in s2:
    > output += chr(ord(c1) ^ ord(c2))
    > return output
    >
    > This way is very slow for large strings.
    > Anyone know of a better and faster way to do it?[/color]

    Before we start optimizing:

    len(XOR(s1, s2)) == len(s1) * len(s2)

    Bug or feature?

    Peter

    Comment

    • Noen

      #3
      Re: XORing long strings opimization?

      Peter Otten wrote:
      [color=blue]
      > Noen wrote:
      >
      >[color=green]
      >>def XOR(s1,s2):
      >>""" XOR string s1 with s2 """
      >>output = ""
      >># Argument check
      >>if (type(s1) and type(s2)) != type(""):
      >>raise TypeError, "Arguments are not strings"
      >>if len(s1) != len(s2):
      >>raise ValueError, "Size differs in strings"
      >># Xoring
      >>for c1 in s1:
      >>for c2 in s2:
      >>output += chr(ord(c1) ^ ord(c2))
      >>return output
      >>
      >>This way is very slow for large strings.
      >>Anyone know of a better and faster way to do it?[/color]
      >
      >
      > Before we start optimizing:
      >
      > len(XOR(s1, s2)) == len(s1) * len(s2)
      >
      > Bug or feature?
      >
      > Peter[/color]

      Oh, didnt notice that as I wrote it. Thanks...
      Anyway, this is how it should be.

      def XOR(s1,s2):
      """ XOR string s1 with s2 """
      output = ""
      # Argument check
      if (type(s1) and type(s2)) != type(""):
      raise TypeError, "Arguments are not strings"
      if len(s1) != len(s2):
      raise ValueError, "Size differs in strings"
      # Xoring
      for i in range(len(s1)):
      output += chr(ord(s1[i]) ^ ord(s2[i]))
      return output


      Comment

      • Peter Otten

        #4
        Re: XORing long strings opimization?

        Noen wrote:
        [color=blue]
        > Oh, didnt notice that as I wrote it. Thanks...
        > Anyway, this is how it should be.
        >
        > def XOR(s1,s2):
        > """ XOR string s1 with s2 """
        > output = ""
        > # Argument check
        > if (type(s1) and type(s2)) != type(""):
        > raise TypeError, "Arguments are not strings"
        > if len(s1) != len(s2):
        > raise ValueError, "Size differs in strings"
        > # Xoring
        > for i in range(len(s1)):
        > output += chr(ord(s1[i]) ^ ord(s2[i]))
        > return output[/color]

        Oops, I missed one:
        [color=blue][color=green][color=darkred]
        >>> xor2.XOR(1, "")[/color][/color][/color]
        Traceback (most recent call last):
        File "<stdin>", line 1, in ?
        File "xor2.py", line 7, in XOR
        if len(s1) != len(s2):
        TypeError: len() of unsized object

        Did you expect this?

        Peter

        Comment

        • Terry Reedy

          #5
          Re: XORing long strings opimization?


          "Noen" <not.available@ na.no> wrote in message
          news:i8Tpb.2445 1$BD3.4567242@j uliett.dax.net. ..[color=blue]
          > def XOR(s1,s2):
          > """ XOR string s1 with s2 """
          > output = ""
          > # Argument check
          > if (type(s1) and type(s2)) != type(""):
          > raise TypeError, "Arguments are not strings"
          > if len(s1) != len(s2):
          > raise ValueError, "Size differs in strings"
          > # Xoring
          > for i in range(len(s1)):
          > output += chr(ord(s1[i]) ^ ord(s2[i]))
          > return output[/color]

          You fell into the O(n**2) immutable sequence repeated addition trap.
          For O(n) behavior, try output = [] and return ''.join(output) .
          Replacing output+=stuff with output.append(s tuff) might be faster
          still.

          Terry J. Reedy


          Comment

          • Noen

            #6
            Re: XORing long strings opimization?

            Peter Otten wrote:[color=blue]
            > Noen wrote:
            >
            >[color=green]
            >>Oh, didnt notice that as I wrote it. Thanks...
            >>Anyway, this is how it should be.
            >>
            >>def XOR(s1,s2):
            >> """ XOR string s1 with s2 """
            >> output = ""
            >> # Argument check
            >> if (type(s1) and type(s2)) != type(""):
            >> raise TypeError, "Arguments are not strings"
            >> if len(s1) != len(s2):
            >> raise ValueError, "Size differs in strings"
            >> # Xoring
            >> for i in range(len(s1)):
            >> output += chr(ord(s1[i]) ^ ord(s2[i]))
            >> return output[/color]
            >
            >
            > Oops, I missed one:
            >
            >[color=green][color=darkred]
            >>>>xor2.XOR( 1, "")[/color][/color]
            >
            > Traceback (most recent call last):
            > File "<stdin>", line 1, in ?
            > File "xor2.py", line 7, in XOR
            > if len(s1) != len(s2):
            > TypeError: len() of unsized object
            >
            > Did you expect this?
            >
            > Peter[/color]
            nope, thought the Argument check would do it, but I used and opeartor
            instead of or. Well, here is XOR v.0.3b :P Any comments how to make it
            faster?

            def XOR(s1,s2):
            """ XOR string s1 with s2 """
            output = ""
            # Argument check
            if (type(s1) or type(s2)) != type(""):
            raise TypeError, "Arguments are not strings"
            if len(s1) != len(s2):
            raise ValueError, "Size differs in strings"
            # Xoring
            for i in range(len(s1)):
            output += chr(ord(s1[i]) ^ ord(s2[i]))
            return output


            Comment

            • Peter Otten

              #7
              Re: XORing long strings opimization?

              Noen wrote:
              [color=blue]
              > def XOR(s1,s2):
              > """ XOR string s1 with s2 """
              > output = ""
              > # Argument check
              > if (type(s1) or type(s2)) != type(""):
              > raise TypeError, "Arguments are not strings"
              > if len(s1) != len(s2):
              > raise ValueError, "Size differs in strings"
              > # Xoring
              > for i in range(len(s1)):
              > output += chr(ord(s1[i]) ^ ord(s2[i]))
              > return output[/color]

              I keep nagging:
              [color=blue][color=green][color=darkred]
              >>> xor3.XOR("", 1)[/color][/color][/color]
              Traceback (most recent call last):
              File "<stdin>", line 1, in ?
              File "xor3.py", line 7, in XOR
              if len(s1) != len(s2):
              TypeError: len() of unsized object

              Must be in a nasty mood this evening :-)
              Seriously, I think that "first make it right, then make it fast" is a
              reasonable design principle.
              You can check the type

              if type(s1) != type("") or type(s2) != type(""):
              ....

              or

              if not isinstance(s1, str) or not isinstance(s2, str):
              ....

              or even

              if not type(s1) == type(s2) == type(""):
              ....

              but personally I would omit the check altogether. The second check could be
              relaxed to len(s1) <= len(s2), assuming that s1 is the data to be encrypted
              and s2 the random character sequence that makes your encryption NSA-proof.
              For (much) weaker encryption schemes, you could omit it completely and
              cycle over the characters of s2 (not recommended).

              As to speed, I would suppose the following to be somewhat faster:

              def xor(s1, s2):
              return "".join([chr(ord(c1) ^ ord(c2)) for c1, c2 in itertools.izip( s1,
              s2)])

              It favours "".join(charact er list) over consecutive s += tail operations,
              which need to build "many" strings as intermediate results.

              I think, further speedups could be achieved building a lookup table for all
              character combinations.

              Peter

              Comment

              • Ulrich Petri

                #8
                Re: XORing long strings opimization?

                "Peter Otten" <__peter__@web. de> schrieb im Newsbeitrag
                news:bo8vq8$9q5 $01$1@news.t-online.com...[color=blue]
                > Noen wrote:
                >[color=green]
                > > Oh, didnt notice that as I wrote it. Thanks...
                > > Anyway, this is how it should be.
                > >
                > > def XOR(s1,s2):
                > > """ XOR string s1 with s2 """
                > > output = ""
                > > # Argument check
                > > if (type(s1) and type(s2)) != type(""):
                > > raise TypeError, "Arguments are not strings"
                > > if len(s1) != len(s2):
                > > raise ValueError, "Size differs in strings"
                > > # Xoring
                > > for i in range(len(s1)):
                > > output += chr(ord(s1[i]) ^ ord(s2[i]))
                > > return output[/color]
                >
                > Oops, I missed one:[/color]

                Therese even more ;)
                [color=blue][color=green]
                > > if (type(s1) and type(s2)) != type(""):
                > > raise TypeError, "Arguments are not strings"[/color][/color]

                i guess he meant:[color=blue][color=green]
                > > if type(s1) != type("") or type(s2) != type(""):
                > > raise TypeError, "Arguments are not strings"[/color][/color]

                Ciao Ulrich


                Comment

                • Francis Avila

                  #9
                  Re: XORing long strings opimization?

                  "Noen" <not.available@ na.no> wrote in message
                  news:ieUpb.2445 7$BD3.4567875@j uliett.dax.net. ..[color=blue]
                  > nope, thought the Argument check would do it, but I used and opeartor
                  > instead of or. Well, here is XOR v.0.3b :P Any comments how to make it
                  > faster?[/color]

                  How about using the struct module?
                  Caveat: I don't know if struct limits the repeat count.

                  import struct
                  def XOR(data, key):
                  if len(data) != len(key):
                  raise ValueError, "data and key not of equal length"
                  ldata = struct.unpack(' =%sb' % len(data), data)
                  lkey = struct.unpack(' =%sb' % len(key), key)
                  lxored = [d^k for d,k in zip(ldata, lkey)]
                  xored = struct.pack('=% sb' % len(lxored), *lxored)
                  return xored

                  Just an idea.
                  --
                  Francis Avila

                  Comment

                  • Alex Martelli

                    #10
                    Re: XORing long strings opimization?

                    Peter Otten wrote:
                    [color=blue]
                    > Noen wrote:
                    >[color=green]
                    >> Oh, didnt notice that as I wrote it. Thanks...
                    >> Anyway, this is how it should be.
                    >>
                    >> def XOR(s1,s2):
                    >> """ XOR string s1 with s2 """
                    >> output = ""
                    >> # Argument check
                    >> if (type(s1) and type(s2)) != type(""):
                    >> raise TypeError, "Arguments are not strings"
                    >> if len(s1) != len(s2):
                    >> raise ValueError, "Size differs in strings"
                    >> # Xoring
                    >> for i in range(len(s1)):
                    >> output += chr(ord(s1[i]) ^ ord(s2[i]))
                    >> return output[/color]
                    >
                    > Oops, I missed one:
                    >[color=green][color=darkred]
                    >>>> xor2.XOR(1, "")[/color][/color]
                    > Traceback (most recent call last):
                    > File "<stdin>", line 1, in ?
                    > File "xor2.py", line 7, in XOR
                    > if len(s1) != len(s2):
                    > TypeError: len() of unsized object[/color]

                    Checks apart -- on to optimization as per subject:

                    [alex@lancelot krb5module]$ timeit.py -c -s'from string import letters as S'
                    '"".join([chr(ord(S[i])^ord(S[i])) for i in xrange(len(S))])'
                    10000 loops, best of 3: 113 usec per loop

                    i'm not sure i can do much better with strings. arrays are cool, though:

                    [alex@lancelot krb5module]$ timeit.py -c -s'from string import letters as S'
                    -s'import array' 'x=array.array( "b",S)' 'for i,v in enumerate(x): x[i]^=v'
                    'x.tostring()'
                    10000 loops, best of 3: 57 usec per loop

                    Morale: when you find yourself fiddling with lots of chr and ord, and care
                    about performance, think about "array('b', ... -- it could buy you an easy
                    speedup by a factor of 2, like here.


                    Alex




                    Comment

                    • Noen

                      #11
                      Re: XORing long strings opimization?

                      Here is the result, if anyone wants it :)
                      It was quite quick indeed, It took me 10.3279999495 seconds to XOR
                      "a"*1000000 large string. Im going to XOR 700000000 mb sized files, so
                      if anyone know how to make it even faster, please tell me :)

                      import string
                      def xorstring(s1,s2 ):
                      """ XOR string s1 with s2 """
                      # Argument check
                      if not (type(s1) == type("")) or not (type(s2) == type("")):
                      raise TypeError, "Arguments are not strings"
                      if len(s1) != len(s2):
                      raise ValueError, "Size differs in strings"
                      # Xoring

                      # Create lists
                      l1 = map(ord, s1)
                      l2 = map(ord, s2)

                      # Xor it all together
                      xorlist = []
                      xorlist = [chr(x ^ y) for (x, y) in zip(l1, l2)]
                      return string.join(xor list,"") # Backward compatible

                      Comment

                      • Francis Avila

                        #12
                        Re: XORing long strings opimization?

                        "Alex Martelli" <aleax@aleax.it > wrote in message
                        news:a%Vpb.9414 5$e5.3462437@ne ws1.tin.it...
                        [color=blue]
                        > about performance, think about "array('b', ... -- it could buy you an easy
                        > speedup by a factor of 2, like here.[/color]

                        Why we should read a module a day: just today I learned about the array
                        module, when I *had* been using struct for similar tasks....

                        Sigh; if only the standard library weren't so *big* ;).
                        --
                        Francis Avila

                        Comment

                        • Peter Otten

                          #13
                          Re: XORing long strings opimization?

                          Francis Avila wrote:
                          [color=blue]
                          > "Noen" <not.available@ na.no> wrote in message
                          > news:ieUpb.2445 7$BD3.4567875@j uliett.dax.net. ..[color=green]
                          >> nope, thought the Argument check would do it, but I used and opeartor
                          >> instead of or. Well, here is XOR v.0.3b :P Any comments how to make it
                          >> faster?[/color]
                          >
                          > How about using the struct module?
                          > Caveat: I don't know if struct limits the repeat count.
                          >
                          > import struct
                          > def XOR(data, key):
                          > if len(data) != len(key):
                          > raise ValueError, "data and key not of equal length"
                          > ldata = struct.unpack(' =%sb' % len(data), data)
                          > lkey = struct.unpack(' =%sb' % len(key), key)
                          > lxored = [d^k for d,k in zip(ldata, lkey)]
                          > xored = struct.pack('=% sb' % len(lxored), *lxored)
                          > return xored
                          >
                          > Just an idea.
                          > --
                          > Francis Avila[/color]

                          Nope. Both your and my approach make it worse. Only Alex Martelli did it
                          right:

                          import itertools, array, struct

                          def xor_noen(s1,s2) :
                          """ XOR string s1 with s2 """
                          output = ""
                          for i in range(len(s1)):
                          output += chr(ord(s1[i]) ^ ord(s2[i]))
                          return output

                          def xor_po(s1, s2):
                          return "".join([chr(ord(c1) ^ ord(c2)) for c1, c2 in itertools.izip( s1,
                          s2)])

                          def xor_am(s1, s2):
                          x = array.array("b" , s2)
                          for i, v in enumerate(array .array("b", s1)):
                          x[i] ^= v
                          return x.tostring()

                          def xor_fa(data, key):
                          ldata = struct.unpack(' =%sb' % len(data), data)
                          lkey = struct.unpack(' =%sb' % len(key), key)
                          lxored = [d^k for d,k in zip(ldata, lkey)]
                          xored = struct.pack('=% sb' % len(lxored), *lxored)
                          return xored

                          # 35 usec
                          def pro_noen():
                          return xor_noen("sgnir ts ni sreffid eziS", "Size differs in strings")

                          # 40 usec
                          def pro_po():
                          return xor_po("sgnirts ni sreffid eziS", "Size differs in strings")

                          # 23 usec
                          def pro_am():
                          return xor_am("sgnirts ni sreffid eziS", "Size differs in strings")

                          # 46 usec
                          def pro_fa():
                          return xor_fa("sgnirts ni sreffid eziS", "Size differs in strings")

                          assert pro_po() == pro_am() == pro_fa() == pro_noen()

                          I was surprised how good pro_noen() does, but that might change for large
                          strings. By the way, results varied significantly (several usecs) in
                          various test runs.

                          Peter


                          Comment

                          • Casey Whitelaw

                            #14
                            Re: XORing long strings opimization?

                            Noen <not.available@ na.no> wrote in message news:<%AWpb.244 70$BD3.4569022@ juliett.dax.net >...[color=blue]
                            > Here is the result, if anyone wants it :)
                            > It was quite quick indeed, It took me 10.3279999495 seconds to XOR
                            > "a"*1000000 large string. Im going to XOR 700000000 mb sized files, so
                            > if anyone know how to make it even faster, please tell me :)
                            >
                            > import string
                            > def xorstring(s1,s2 ):
                            > """ XOR string s1 with s2 """
                            > # Argument check
                            > if not (type(s1) == type("")) or not (type(s2) == type("")):
                            > raise TypeError, "Arguments are not strings"
                            > if len(s1) != len(s2):
                            > raise ValueError, "Size differs in strings"
                            > # Xoring
                            >
                            > # Create lists
                            > l1 = map(ord, s1)
                            > l2 = map(ord, s2)
                            >
                            > # Xor it all together
                            > xorlist = []
                            > xorlist = [chr(x ^ y) for (x, y) in zip(l1, l2)]
                            > return string.join(xor list,"") # Backward compatible[/color]

                            Just fiddling around with the (excellent) suggestions already posted,
                            and found a couple of things.

                            Alex's suggestion using array is the fastest, but was slightly faster
                            when using xrange() rather than enumerate():

                            def xor6(s1, s2):
                            a1 = array.array("b" , s1)
                            a2 = array.array("b" , s2)

                            for i in xrange(len(a1)) :
                            a1[i] ^= a2[i]
                            return a1.tostring()

                            On my system, this XORs 500,000 characters in 0.39s, with Noen's
                            original taking 0.67s.

                            Using psyco produced a further ~3x speedup, reducing this to 0.14s.
                            That's pretty fast!

                            Casey

                            Comment

                            • Georgy Pruss

                              #15
                              Re: XORing long strings opimization?

                              Well, nothing to add actually, but my results:

                              import string
                              import array

                              def xorstring0(s1,s 2): # the original method
                              # argument tests were here

                              # Create lists
                              l1 = map(ord, s1)
                              l2 = map(ord, s2)

                              # Xor it all together
                              xorlist = []
                              xorlist = [chr(x ^ y) for (x, y) in zip(l1, l2)]
                              return string.join(xor list,"") # Backward compatible

                              def xorstring1(s1,s 2):
                              xorlist = [chr(ord(x) ^ ord(y)) for (x, y) in zip(s1, s2)]
                              return string.join(xor list,"") # Backward compatible

                              def xorstring2(s1,s 2):
                              xorlist = [chr(ord(s1[i]) ^ ord(s2[i])) for i in range(len(s1))] # range
                              return string.join(xor list,"") # Backward compatible

                              def xorstring3(s1,s 2):
                              xorlist = [chr(ord(s1[i]) ^ ord(s2[i])) for i in xrange(len(s1))] # xrange
                              return string.join(xor list,"") # Backward compatible

                              def xorstring4(s1,s 2):
                              a1 = array.array("B" , s1)
                              a2 = array.array("B" , s2)
                              for i in xrange(len(a1)) :
                              a1[i] ^= a2[i]
                              return a1.tostring()

                              s1 = 'abcew'*200000
                              s2 = 'xyzqa'*200000

                              fns = [ xorstring0, xorstring1, xorstring2, xorstring3, xorstring4 ]

                              # check if all works OK for some short data --
                              # protection from some rough error or typo
                              sx = fns[0]( s1[:100], s2[:100] )
                              for fn in fns:
                              assert sx == fn( s1[:100], s2[:100] )

                              import time

                              tms = [60] * len(fns) # assume some unreal big times

                              for n in range(30): # loop 30 times

                              for i in range(len(fns)) :

                              tb = time.time()
                              sx = fns[i]( s1, s2 ) # do it!
                              tm = time.time() - tb
                              tms[i] = min( tms[i], tm )

                              for i in range(len(fns)) :
                              print "fn%d -- %.3f sec -- %.2f" % (i,tms[i],tms[i]/tms[-1])

                              # fn0 -- 5.578 sec -- 6.37
                              # fn1 -- 4.593 sec -- 5.25
                              # fn2 -- 2.609 sec -- 2.98
                              # fn3 -- 2.531 sec -- 2.89
                              # fn4 -- 0.875 sec -- 1.00


                              BTW, for the same million char strings, a C function runs 0.0035 sec, or 250 times faster!


                              G-:


                              Comment

                              Working...