XORing long strings opimization?

**Peter Otten** · Jul 18 '05, 04:55 AM

Re: XORing long strings opimization?

Noen wrote:
[color=blue]
> def XOR(s1,s2):
> """ XOR string s1 with s2 """
> output = ""
> # Argument check
> if (type(s1) and type(s2)) != type(""):
> raise TypeError, "Arguments are not strings"
> if len(s1) != len(s2):
> raise ValueError, "Size differs in strings"
> # Xoring
> for c1 in s1:
> for c2 in s2:
> output += chr(ord(c1) ^ ord(c2))
> return output
>
> This way is very slow for large strings.
> Anyone know of a better and faster way to do it?[/color]

Before we start optimizing:

len(XOR(s1, s2)) == len(s1) * len(s2)

Bug or feature?

Peter

**Noen** · Jul 18 '05, 04:55 AM

Re: XORing long strings opimization?

Peter Otten wrote:
[color=blue]
> Noen wrote:
>
>[color=green]
>>def XOR(s1,s2):
>>""" XOR string s1 with s2 """
>>output = ""
>># Argument check
>>if (type(s1) and type(s2)) != type(""):
>>raise TypeError, "Arguments are not strings"
>>if len(s1) != len(s2):
>>raise ValueError, "Size differs in strings"
>># Xoring
>>for c1 in s1:
>>for c2 in s2:
>>output += chr(ord(c1) ^ ord(c2))
>>return output
>>
>>This way is very slow for large strings.
>>Anyone know of a better and faster way to do it?[/color]
>
>
> Before we start optimizing:
>
> len(XOR(s1, s2)) == len(s1) * len(s2)
>
> Bug or feature?
>
> Peter[/color]

Oh, didnt notice that as I wrote it. Thanks...
Anyway, this is how it should be.

def XOR(s1,s2):
""" XOR string s1 with s2 """
output = ""
# Argument check
if (type(s1) and type(s2)) != type(""):
raise TypeError, "Arguments are not strings"
if len(s1) != len(s2):
raise ValueError, "Size differs in strings"
# Xoring
for i in range(len(s1)):
output += chr(ord(s1[i]) ^ ord(s2[i]))
return output

**Peter Otten** · Jul 18 '05, 04:55 AM

Re: XORing long strings opimization?

Noen wrote:
[color=blue]
> Oh, didnt notice that as I wrote it. Thanks...
> Anyway, this is how it should be.
>
> def XOR(s1,s2):
> """ XOR string s1 with s2 """
> output = ""
> # Argument check
> if (type(s1) and type(s2)) != type(""):
> raise TypeError, "Arguments are not strings"
> if len(s1) != len(s2):
> raise ValueError, "Size differs in strings"
> # Xoring
> for i in range(len(s1)):
> output += chr(ord(s1[i]) ^ ord(s2[i]))
> return output[/color]

Oops, I missed one:
[color=blue][color=green][color=darkred]
>>> xor2.XOR(1, "")[/color][/color][/color]
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "xor2.py", line 7, in XOR
if len(s1) != len(s2):
TypeError: len() of unsized object

Did you expect this?

Peter

**Terry Reedy** · Jul 18 '05, 04:55 AM

Re: XORing long strings opimization?

"Noen" <not.available@ na.no> wrote in message
news:i8Tpb.2445 1$BD3.4567242@j uliett.dax.net. ..[color=blue]
> def XOR(s1,s2):
> """ XOR string s1 with s2 """
> output = ""
> # Argument check
> if (type(s1) and type(s2)) != type(""):
> raise TypeError, "Arguments are not strings"
> if len(s1) != len(s2):
> raise ValueError, "Size differs in strings"
> # Xoring
> for i in range(len(s1)):
> output += chr(ord(s1[i]) ^ ord(s2[i]))
> return output[/color]

You fell into the O(n**2) immutable sequence repeated addition trap.
For O(n) behavior, try output = [] and return ''.join(output) .
Replacing output+=stuff with output.append(s tuff) might be faster
still.

Terry J. Reedy

**Noen** · Jul 18 '05, 04:55 AM

Re: XORing long strings opimization?

Peter Otten wrote:[color=blue]
> Noen wrote:
>
>[color=green]
>>Oh, didnt notice that as I wrote it. Thanks...
>>Anyway, this is how it should be.
>>
>>def XOR(s1,s2):
>> """ XOR string s1 with s2 """
>> output = ""
>> # Argument check
>> if (type(s1) and type(s2)) != type(""):
>> raise TypeError, "Arguments are not strings"
>> if len(s1) != len(s2):
>> raise ValueError, "Size differs in strings"
>> # Xoring
>> for i in range(len(s1)):
>> output += chr(ord(s1[i]) ^ ord(s2[i]))
>> return output[/color]
>
>
> Oops, I missed one:
>
>[color=green][color=darkred]
>>>>xor2.XOR( 1, "")[/color][/color]
>
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> File "xor2.py", line 7, in XOR
> if len(s1) != len(s2):
> TypeError: len() of unsized object
>
> Did you expect this?
>
> Peter[/color]
nope, thought the Argument check would do it, but I used and opeartor
instead of or. Well, here is XOR v.0.3b :P Any comments how to make it
faster?

def XOR(s1,s2):
""" XOR string s1 with s2 """
output = ""
# Argument check
if (type(s1) or type(s2)) != type(""):
raise TypeError, "Arguments are not strings"
if len(s1) != len(s2):
raise ValueError, "Size differs in strings"
# Xoring
for i in range(len(s1)):
output += chr(ord(s1[i]) ^ ord(s2[i]))
return output

**Peter Otten** · Jul 18 '05, 04:55 AM

Re: XORing long strings opimization?

Noen wrote:
[color=blue]
> def XOR(s1,s2):
> """ XOR string s1 with s2 """
> output = ""
> # Argument check
> if (type(s1) or type(s2)) != type(""):
> raise TypeError, "Arguments are not strings"
> if len(s1) != len(s2):
> raise ValueError, "Size differs in strings"
> # Xoring
> for i in range(len(s1)):
> output += chr(ord(s1[i]) ^ ord(s2[i]))
> return output[/color]

I keep nagging:
[color=blue][color=green][color=darkred]
>>> xor3.XOR("", 1)[/color][/color][/color]
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "xor3.py", line 7, in XOR
if len(s1) != len(s2):
TypeError: len() of unsized object

Must be in a nasty mood this evening :-)
Seriously, I think that "first make it right, then make it fast" is a
reasonable design principle.
You can check the type

if type(s1) != type("") or type(s2) != type(""):
....

or

if not isinstance(s1, str) or not isinstance(s2, str):
....

or even

if not type(s1) == type(s2) == type(""):
....

but personally I would omit the check altogether. The second check could be
relaxed to len(s1) <= len(s2), assuming that s1 is the data to be encrypted
and s2 the random character sequence that makes your encryption NSA-proof.
For (much) weaker encryption schemes, you could omit it completely and
cycle over the characters of s2 (not recommended).

As to speed, I would suppose the following to be somewhat faster:

def xor(s1, s2):
return "".join([chr(ord(c1) ^ ord(c2)) for c1, c2 in itertools.izip( s1,
s2)])

It favours "".join(charact er list) over consecutive s += tail operations,
which need to build "many" strings as intermediate results.

I think, further speedups could be achieved building a lookup table for all
character combinations.

Peter

**Ulrich Petri** · Jul 18 '05, 04:56 AM

Re: XORing long strings opimization?

"Peter Otten" <__peter__@web. de> schrieb im Newsbeitrag
news:bo8vq8$9q5 $01$1@news.t-online.com...[color=blue]
> Noen wrote:
>[color=green]
> > Oh, didnt notice that as I wrote it. Thanks...
> > Anyway, this is how it should be.
> >
> > def XOR(s1,s2):
> > """ XOR string s1 with s2 """
> > output = ""
> > # Argument check
> > if (type(s1) and type(s2)) != type(""):
> > raise TypeError, "Arguments are not strings"
> > if len(s1) != len(s2):
> > raise ValueError, "Size differs in strings"
> > # Xoring
> > for i in range(len(s1)):
> > output += chr(ord(s1[i]) ^ ord(s2[i]))
> > return output[/color]
>
> Oops, I missed one:[/color]

Therese even more ;)
[color=blue][color=green]
> > if (type(s1) and type(s2)) != type(""):
> > raise TypeError, "Arguments are not strings"[/color][/color]

i guess he meant:[color=blue][color=green]
> > if type(s1) != type("") or type(s2) != type(""):
> > raise TypeError, "Arguments are not strings"[/color][/color]

Ciao Ulrich

**Francis Avila** · Jul 18 '05, 04:56 AM

Re: XORing long strings opimization?

"Noen" <not.available@ na.no> wrote in message
news:ieUpb.2445 7$BD3.4567875@j uliett.dax.net. ..[color=blue]
> nope, thought the Argument check would do it, but I used and opeartor
> instead of or. Well, here is XOR v.0.3b :P Any comments how to make it
> faster?[/color]

How about using the struct module?
Caveat: I don't know if struct limits the repeat count.

import struct
def XOR(data, key):
if len(data) != len(key):
raise ValueError, "data and key not of equal length"
ldata = struct.unpack(' =%sb' % len(data), data)
lkey = struct.unpack(' =%sb' % len(key), key)
lxored = [d^k for d,k in zip(ldata, lkey)]
xored = struct.pack('=% sb' % len(lxored), *lxored)
return xored

Just an idea.
--
Francis Avila

**Alex Martelli** · Jul 18 '05, 04:56 AM

Re: XORing long strings opimization?

Peter Otten wrote:
[color=blue]
> Noen wrote:
>[color=green]
>> Oh, didnt notice that as I wrote it. Thanks...
>> Anyway, this is how it should be.
>>
>> def XOR(s1,s2):
>> """ XOR string s1 with s2 """
>> output = ""
>> # Argument check
>> if (type(s1) and type(s2)) != type(""):
>> raise TypeError, "Arguments are not strings"
>> if len(s1) != len(s2):
>> raise ValueError, "Size differs in strings"
>> # Xoring
>> for i in range(len(s1)):
>> output += chr(ord(s1[i]) ^ ord(s2[i]))
>> return output[/color]
>
> Oops, I missed one:
>[color=green][color=darkred]
>>>> xor2.XOR(1, "")[/color][/color]
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> File "xor2.py", line 7, in XOR
> if len(s1) != len(s2):
> TypeError: len() of unsized object[/color]

Checks apart -- on to optimization as per subject:

[alex@lancelot krb5module]$ timeit.py -c -s'from string import letters as S'
'"".join([chr(ord(S[i])^ord(S[i])) for i in xrange(len(S))])'
10000 loops, best of 3: 113 usec per loop

i'm not sure i can do much better with strings. arrays are cool, though:

[alex@lancelot krb5module]$ timeit.py -c -s'from string import letters as S'
-s'import array' 'x=array.array( "b",S)' 'for i,v in enumerate(x): x[i]^=v'
'x.tostring()'
10000 loops, best of 3: 57 usec per loop

Morale: when you find yourself fiddling with lots of chr and ord, and care
about performance, think about "array('b', ... -- it could buy you an easy
speedup by a factor of 2, like here.

Alex

**Noen** · Jul 18 '05, 04:56 AM

Re: XORing long strings opimization?

Here is the result, if anyone wants it :)
It was quite quick indeed, It took me 10.3279999495 seconds to XOR
"a"*1000000 large string. Im going to XOR 700000000 mb sized files, so
if anyone know how to make it even faster, please tell me :)

import string
def xorstring(s1,s2 ):
""" XOR string s1 with s2 """
# Argument check
if not (type(s1) == type("")) or not (type(s2) == type("")):
raise TypeError, "Arguments are not strings"
if len(s1) != len(s2):
raise ValueError, "Size differs in strings"
# Xoring

# Create lists
l1 = map(ord, s1)
l2 = map(ord, s2)

# Xor it all together
xorlist = []
xorlist = [chr(x ^ y) for (x, y) in zip(l1, l2)]
return string.join(xor list,"") # Backward compatible

**Francis Avila** · Jul 18 '05, 04:56 AM

Re: XORing long strings opimization?

"Alex Martelli" <aleax@aleax.it > wrote in message
news:a%Vpb.9414 5$e5.3462437@ne ws1.tin.it...
[color=blue]
> about performance, think about "array('b', ... -- it could buy you an easy
> speedup by a factor of 2, like here.[/color]

Why we should read a module a day: just today I learned about the array
module, when I *had* been using struct for similar tasks....

Sigh; if only the standard library weren't so *big* ;).
--
Francis Avila

**Peter Otten** · Jul 18 '05, 04:56 AM

Re: XORing long strings opimization?

Francis Avila wrote:
[color=blue]
> "Noen" <not.available@ na.no> wrote in message
> news:ieUpb.2445 7$BD3.4567875@j uliett.dax.net. ..[color=green]
>> nope, thought the Argument check would do it, but I used and opeartor
>> instead of or. Well, here is XOR v.0.3b :P Any comments how to make it
>> faster?[/color]
>
> How about using the struct module?
> Caveat: I don't know if struct limits the repeat count.
>
> import struct
> def XOR(data, key):
> if len(data) != len(key):
> raise ValueError, "data and key not of equal length"
> ldata = struct.unpack(' =%sb' % len(data), data)
> lkey = struct.unpack(' =%sb' % len(key), key)
> lxored = [d^k for d,k in zip(ldata, lkey)]
> xored = struct.pack('=% sb' % len(lxored), *lxored)
> return xored
>
> Just an idea.
> --
> Francis Avila[/color]

Nope. Both your and my approach make it worse. Only Alex Martelli did it
right:

import itertools, array, struct

def xor_noen(s1,s2) :
""" XOR string s1 with s2 """
output = ""
for i in range(len(s1)):
output += chr(ord(s1[i]) ^ ord(s2[i]))
return output

def xor_po(s1, s2):
return "".join([chr(ord(c1) ^ ord(c2)) for c1, c2 in itertools.izip( s1,
s2)])

def xor_am(s1, s2):
x = array.array("b" , s2)
for i, v in enumerate(array .array("b", s1)):
x[i] ^= v
return x.tostring()

def xor_fa(data, key):
ldata = struct.unpack(' =%sb' % len(data), data)
lkey = struct.unpack(' =%sb' % len(key), key)
lxored = [d^k for d,k in zip(ldata, lkey)]
xored = struct.pack('=% sb' % len(lxored), *lxored)
return xored

# 35 usec
def pro_noen():
return xor_noen("sgnir ts ni sreffid eziS", "Size differs in strings")

# 40 usec
def pro_po():
return xor_po("sgnirts ni sreffid eziS", "Size differs in strings")

# 23 usec
def pro_am():
return xor_am("sgnirts ni sreffid eziS", "Size differs in strings")

# 46 usec
def pro_fa():
return xor_fa("sgnirts ni sreffid eziS", "Size differs in strings")

assert pro_po() == pro_am() == pro_fa() == pro_noen()

I was surprised how good pro_noen() does, but that might change for large
strings. By the way, results varied significantly (several usecs) in
various test runs.

Peter

**Casey Whitelaw** · Jul 18 '05, 04:56 AM

Re: XORing long strings opimization?

Noen <not.available@ na.no> wrote in message news:<%AWpb.244 70$BD3.4569022@ juliett.dax.net >...[color=blue]
> Here is the result, if anyone wants it :)
> It was quite quick indeed, It took me 10.3279999495 seconds to XOR
> "a"*1000000 large string. Im going to XOR 700000000 mb sized files, so
> if anyone know how to make it even faster, please tell me :)
>
> import string
> def xorstring(s1,s2 ):
> """ XOR string s1 with s2 """
> # Argument check
> if not (type(s1) == type("")) or not (type(s2) == type("")):
> raise TypeError, "Arguments are not strings"
> if len(s1) != len(s2):
> raise ValueError, "Size differs in strings"
> # Xoring
>
> # Create lists
> l1 = map(ord, s1)
> l2 = map(ord, s2)
>
> # Xor it all together
> xorlist = []
> xorlist = [chr(x ^ y) for (x, y) in zip(l1, l2)]
> return string.join(xor list,"") # Backward compatible[/color]

Just fiddling around with the (excellent) suggestions already posted,
and found a couple of things.

Alex's suggestion using array is the fastest, but was slightly faster
when using xrange() rather than enumerate():

def xor6(s1, s2):
a1 = array.array("b" , s1)
a2 = array.array("b" , s2)

for i in xrange(len(a1)) :
a1[i] ^= a2[i]
return a1.tostring()

On my system, this XORs 500,000 characters in 0.39s, with Noen's
original taking 0.67s.

Using psyco produced a further ~3x speedup, reducing this to 0.14s.
That's pretty fast!

Casey

**Georgy Pruss** · Jul 18 '05, 04:56 AM

Re: XORing long strings opimization?

Well, nothing to add actually, but my results:

import string
import array

def xorstring0(s1,s 2): # the original method
# argument tests were here

# Create lists
l1 = map(ord, s1)
l2 = map(ord, s2)

# Xor it all together
xorlist = []
xorlist = [chr(x ^ y) for (x, y) in zip(l1, l2)]
return string.join(xor list,"") # Backward compatible

def xorstring1(s1,s 2):
xorlist = [chr(ord(x) ^ ord(y)) for (x, y) in zip(s1, s2)]
return string.join(xor list,"") # Backward compatible

def xorstring2(s1,s 2):
xorlist = [chr(ord(s1[i]) ^ ord(s2[i])) for i in range(len(s1))] # range
return string.join(xor list,"") # Backward compatible

def xorstring3(s1,s 2):
xorlist = [chr(ord(s1[i]) ^ ord(s2[i])) for i in xrange(len(s1))] # xrange
return string.join(xor list,"") # Backward compatible

def xorstring4(s1,s 2):
a1 = array.array("B" , s1)
a2 = array.array("B" , s2)
for i in xrange(len(a1)) :
a1[i] ^= a2[i]
return a1.tostring()

s1 = 'abcew'*200000
s2 = 'xyzqa'*200000

fns = [ xorstring0, xorstring1, xorstring2, xorstring3, xorstring4 ]

# check if all works OK for some short data --
# protection from some rough error or typo
sx = fns[0]( s1[:100], s2[:100] )
for fn in fns:
assert sx == fn( s1[:100], s2[:100] )

import time

tms = [60] * len(fns) # assume some unreal big times

for n in range(30): # loop 30 times

for i in range(len(fns)) :

tb = time.time()
sx = fns[i]( s1, s2 ) # do it!
tm = time.time() - tb
tms[i] = min( tms[i], tm )

for i in range(len(fns)) :
print "fn%d -- %.3f sec -- %.2f" % (i,tms[i],tms[i]/tms[-1])

# fn0 -- 5.578 sec -- 6.37
# fn1 -- 4.593 sec -- 5.25
# fn2 -- 2.609 sec -- 2.98
# fn3 -- 2.531 sec -- 2.89
# fn4 -- 0.875 sec -- 1.00

BTW, for the same million char strings, a C function runs 0.0035 sec, or 250 times faster!

G-:

XORing long strings opimization?

XORing long strings opimization?

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment