xor: how come so slow?

  • Michele

    xor: how come so slow?

    Hi,
    I'm trying to encode byte data. Let's not focus on the encoding process itself; the point is that the method create_random_block takes 0.5 s per call (even Java is faster) on a dual-core 3.0 GHz machine:

    took 46.746999979s, avg: 0.46746999979s

    So I suspect that the XOR operation between bytes pushes the execution time up to 0.5 s. Why do I suspect that? Because Python has no byte type and no direct way to XOR bytes, so I used a workaround:
    I loop over the two strings to be XORed;
        for every char in the strings,
            convert each char to an integer (ord), XOR the integers,
            and store the result back as a char (chr).

    I suppose that ord() and chr() are the main problems with my implementation, but I don't know another way to XOR two bytes of data (bytes are represented as strings).
    For more details, see the attached code.

    How can I decrease the execution time?

    Thank you


    from __future__ import division
    import random
    import time
    import sha
    import os

    class Encoder(object):
        def create_random_block(self, data, seed, blocksize):
            number_of_blocks = int(len(data)/blocksize)
            random.seed(seed)
            random_block = ['0'] * blocksize
            for index in range(number_of_blocks):
                if int(random.getrandbits(1)) == 1:
                    block = data[blocksize*index:blocksize*index+blocksize]
                    for bit in range(len(block)):
                        # workaround to XOR str byte by byte; xor is only supported for int (ord)
                        random_block[bit] = chr(ord(random_block[bit]) ^ ord(block[bit]))
            return ''.join(random_block)


    x = Encoder()
    piece = os.urandom(1024*1024)
    blocksize = 16384
    t1 = time.time()
    for l in range(100):
        seed = random.getrandbits(32)
        block = x.create_random_block(piece, seed, blocksize)
    t2 = time.time()
    print 'took ' + str(t2-t1) + 's, avg: ' + str((t2-t1)/100.0) + 's'
  • Seun Osewa

    #2
    Re: xor: how come so slow?

    My answer is: never do things like this in pure Python.
    You will find this module useful: www.pycrypto.org

    On Oct 15, 12:19 pm, Michele <mich...@nectarine.it> wrote:
    [original post quoted in full; snipped]

    • bearophileHUGS@lycos.com

      #3
      Re: xor: how come so slow?

      A few suggestions for your code:
      - Use xrange instead of range.
      - Loop over lists directly where you can, instead of over their indexes.
      - array.array("B", somestring) may help, because it gives a byte "view" of a string.
      - Using Psyco helps a lot for this kind of code.
      - I think numpy arrays can contain text/chars too, so numpy may offer ways to speed up your code a lot.
      - Generally Python is fit for downloading pages from the net, acting as glue between different subsystems, doing bulk string processing, etc., but for grunt low-level work like this it's often much too slow, and you can use other lower-level languages.
      - You can use a library already written, or use an extension; for example you can try ShedSkin or Pyd.

      Bye,
      bearophile
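The array.array("B", ...) suggestion can be sketched like this (a minimal illustration in modern Python 3 terms, where the strings become bytes objects; the helper name xor_bytes is mine, not from the thread):

```python
from array import array

def xor_bytes(a, b):
    # Build byte "views" of the two equal-length byte strings and XOR
    # them element-wise, avoiding repeated ord()/chr() calls.
    ua = array("B", a)
    ub = array("B", b)
    for i in range(len(ua)):
        ua[i] ^= ub[i]
    return ua.tobytes()

print(xor_bytes(b"\x0f\xf0", b"\xff\x00"))  # -> b'\xf0\xf0'
```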

      • John Machin

        #4
        Re: xor: how come so slow?

        On Oct 15, 10:19 pm, Michele <mich...@nectarine.it> wrote:
        [...]
            from __future__ import division
            import random
            import time
            import sha
            import os

            class Encoder(object):
                def create_random_block(self, data, seed, blocksize):
                    number_of_blocks = int(len(data)/blocksize)
                    random.seed(seed)
                    random_block = ['0'] * blocksize
        You possibly mean '\0' i.e. the byte all of whose bits are zero.
                    for index in range(number_of_blocks):
                        if int(random.getrandbits(1)) == 1:
        getrandbits(1) produces a *long* with one random bit. Any good reason
        for preferring this to randrange(2) or randint(0, 1)?

        So there's a 50% chance that this block will be XORed into the result;
        is that what you intend?
                            block = data[blocksize*index:blocksize*index+blocksize]
        You don't need to slice out block, certainly not so awkwardly.

                            for bit in range(len(block)):

        Perhaps you mean "byte_index", not "bit".

        On the assumption that range(len(block)) is invariant: calculate it
        once. That assumption turns out to be incorrect, and so is your code
        for calculating the number of blocks; it ignores a possible short
        block at the end.

                                random_block[bit] = chr(ord(random_block[bit]) ^ ord(block[bit]))
        The chr() and one ord() are utterly wasteful; leave random_block as a
        list of ints and do the chr() conversion in the return statement.
                    # workaround to XOR str byte by byte; xor is only supported for int (ord)
                    return ''.join(random_block)
        This will become
        return ''.join(map(chr, random_block))
        or
        return ''.join(chr(i) for i in random_block)
        as taste or speed dictates :-)

        So the whole thing becomes [not tested]:

        def create_random_block(self, data, seed, blocksize):
            datalen = len(data)
            assert datalen % blocksize == 0
            random.seed(seed)
            random_block = [0] * blocksize
            block_range = range(blocksize)
            for start in xrange(0, datalen, blocksize):
                if random.randrange(2):
                    for x in block_range:
                        random_block[x] ^= ord(data[start + x])
            return ''.join(map(chr, random_block))

        Looks slightly more athletic than before :-)

        BTW, +1 misleading subject of the week; it's not XOR that's slow!!

        Cheers,
        John
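For what it's worth, John's [not tested] sketch can be checked against the byte-by-byte original. Below is a minimal Python 3 adaptation of both (my own translation, not John's code: in Python 3, indexing a bytes object already yields ints, so ord() disappears, and getrandbits(1) is kept in both functions so they draw identical values from the generator):

```python
import os
import random

def create_random_block(data, seed, blocksize):
    # John's reworking: keep the accumulator as a list of ints and convert
    # to bytes once at the end, instead of chr()/ord() on every byte.
    assert len(data) % blocksize == 0
    rng = random.Random(seed)
    random_block = [0] * blocksize
    for start in range(0, len(data), blocksize):
        # John suggested randrange(2); getrandbits(1) is used here so both
        # variants consume the same random stream and can be compared.
        if rng.getrandbits(1):
            for x in range(blocksize):
                random_block[x] ^= data[start + x]
    return bytes(random_block)

def naive_block(data, seed, blocksize):
    # The original block-slicing, element-by-element approach, for comparison.
    rng = random.Random(seed)
    out = [0] * blocksize
    for index in range(len(data) // blocksize):
        if rng.getrandbits(1):
            block = data[blocksize * index:blocksize * index + blocksize]
            for i in range(len(block)):
                out[i] ^= block[i]
    return bytes(out)

data = os.urandom(4096)
assert create_random_block(data, 42, 512) == naive_block(data, 42, 512)
print("both variants agree")
```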

        • Peter Otten

          #5
          Re: xor: how come so slow?

          Michele wrote:
          [...]
          How should I decrease the execution time?
          Use numpy. You should be able to swap the following into your script:

          import numpy
          from numpy.core import multiarray as ma

          class Encoder(object):
              def create_random_block(self, data, seed, blocksize):
                  number_of_blocks = len(data)//blocksize
                  random.seed(seed)
                  random_block = ma.fromstring("0"*blocksize, numpy.uint8)
                  for index in range(number_of_blocks):
                      if random.getrandbits(1):
                          block = ma.fromstring(data[blocksize*index:blocksize*index+blocksize], numpy.uint8)
                          random_block ^= block
                  return random_block.tostring()

          There are absolutely no warranties as I'm just a random tinkerer when it
          comes to numpy. But if it works you should get a nice speedup...

          Peter
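In current numpy terms, the heart of Peter's suggestion, XORing a whole block in one vectorized operation, can be sketched like this (my own illustration with names of my choosing, not Peter's exact code):

```python
import numpy as np

def xor_blocks(a, b):
    # Interpret each byte string as an array of uint8 and XOR them
    # in a single vectorized operation instead of a Python-level loop.
    ua = np.frombuffer(a, dtype=np.uint8)
    ub = np.frombuffer(b, dtype=np.uint8)
    return (ua ^ ub).tobytes()

print(xor_blocks(b"\x0f\xf0", b"\xff\x00"))  # -> b'\xf0\xf0'
```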

          • Lawrence D'Oliveiro

            #6
            Re: xor: how come so slow?

            In message <48f5d1a5$0$40310$4fafbaef@reader5.news.tin.it>, Michele wrote:
            class Encoder(object):
                def create_random_block(self, data, seed, blocksize):
                    number_of_blocks = int(len(data)/blocksize)
                    random.seed(seed)
                    random_block = ['0'] * blocksize
                    for index in range(number_of_blocks):
                        if int(random.getrandbits(1)) == 1:
                            block = data[blocksize*index:blocksize*index+blocksize]
                            for bit in range(len(block)):
                                # workaround to XOR str byte by byte; xor is only supported for int (ord)
                                random_block[bit] = chr(ord(random_block[bit]) ^ ord(block[bit]))
                    return ''.join(random_block)
            OK, this function randomly chooses blocks from data (of length
            number_of_blocks * blocksize) and XORs them together to produce a single
            block of length blocksize.
            piece = os.urandom(1024*1024)
            Is piece really meant to be random? If so, your create_random_block function
            isn't achieving much--XORing random data together isn't going to produce
            anything more exciting than less random data than you started with.

            • Steven D'Aprano

              #7
              Re: xor: how come so slow?

              On Fri, 17 Oct 2008 20:51:37 +1300, Lawrence D'Oliveiro wrote:
              Is piece really meant to be random? If so, your create_random_block
              function isn't achieving much--XORing random data together isn't going
              to produce anything more exciting than less random data than you started
              with.

              Hmmm... why do you say that xoring random data with other random data
              produces less randomness than you started with?

              I'm not saying that you're wrong, and certainly it is pointless since
              you're not going to improve on the randomness of /dev/urandom without a
              lot of work. But less random?

              --
              Steven

              • Paul Rubin

                #8
                Re: xor: how come so slow?

                Michele <michele@nectarine.it> writes:
                I suppose that ord() and char() are the main problems
                yes
                How should I decrease the execution time?
                See http://nightsong.com/phr/crypto/p3.py which deals with
                the same problem by using the array module to do the XORs
                32 bits at a time.
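Paul's 32-bits-at-a-time idea can be sketched as follows (a minimal Python 3 illustration of the technique, not the actual code from p3.py; it assumes the inputs' length is a multiple of 4, and the function name is mine):

```python
from array import array

def xor32(a, b):
    # View each byte string as an array of 32-bit words, so one Python-level
    # XOR handles four bytes at a time instead of one.
    assert len(a) == len(b) and len(a) % 4 == 0
    wa = array("I", a)   # "I" is an unsigned int, 4 bytes wide on CPython
    wb = array("I", b)
    for i in range(len(wa)):
        wa[i] ^= wb[i]
    return wa.tobytes()

print(xor32(b"\x0f" * 8, b"\xff" * 8))  # eight 0xf0 bytes
```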

                • Lawrence D'Oliveiro

                  #9
                  Re: xor: how come so slow?

                  In message <010845d8$0$20638$c3e8da3@news.astraweb.com>, Steven D'Aprano
                  wrote:
                  On Fri, 17 Oct 2008 20:51:37 +1300, Lawrence D'Oliveiro wrote:
                  >
                  >... why do you say that xoring random data with other random data
                  produces less randomness than you started with?
                  blocksize <= number_of_blocks * blocksize

                  • Steven D'Aprano

                    #10
                    Re: xor: how come so slow?

                    On Fri, 17 Oct 2008 22:45:19 +1300, Lawrence D'Oliveiro wrote:
                    In message <010845d8$0$20638$c3e8da3@news.astraweb.com>, Steven D'Aprano
                    wrote:
                    >
                    >On Fri, 17 Oct 2008 20:51:37 +1300, Lawrence D'Oliveiro wrote:
                    >>
                    >>... why do you say that xoring random data with other random data
                    >produces less randomness than you started with?
                    >
                    blocksize <= number_of_blocks * blocksize
                    I must be thick, because that looks like a non sequitur to me. I don't
                    see the relevance.

                    Of course, that's just another way of saying that:

                    1 <= number_of_blocks

                    and I don't see how this relates to whether xoring random data with other
                    random data decreases the randomness available.
                    >>> c1 = os.urandom(1)
                    >>> c2 = os.urandom(1)
                    >>> c3 = chr( ord(c1)^ord(c2) )
                    Is it your contention that c3 is more predictable than c1 or c2? If so,
                    why?



                    --
                    Steven
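Steven's point can be checked empirically: the XOR of two independent, uniformly random bytes is itself uniformly random. A quick sketch of my own, exhaustive rather than sampled:

```python
from collections import Counter

# For every pair (a, b) of byte values, count how often each XOR result
# occurs. If a and b are independent and uniform, each of the 256 possible
# results occurs exactly 256 times, i.e. the result is uniform too.
counts = Counter((a ^ b) & 0xFF for a in range(256) for b in range(256))
assert all(counts[v] == 256 for v in range(256))
print("a ^ b is exactly uniform over 0..255")
```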

                    • Sion Arrowsmith

                      #11
                      Re: xor: how come so slow?

                      [I think these attributions are right]
                      Steven D'Aprano <steve@REMOVE-THIS-cybersource.com.au> wrote:
                      >On Fri, 17 Oct 2008 22:45:19 +1300, Lawrence D'Oliveiro wrote:
                      >In message <010845d8$0$206 38$c3e8da3@news .astraweb.com>, Steven D'Aprano
                      >wrote:
                      >>... why do you say that xoring random data with other random data
                      >>produces less randomness than you started with?
                      >blocksize <= number_of_blocks * blocksize
                      >I must be thick, because that looks like a non sequitor to me. I don't
                      >see the relevance.
                      Lawrence originally said something along the lines of this just
                      being a way of taking some random data and producing "less random
                      data". You're reading it as "(less random) data". The intent (I
                      assume) is for it to be read as "less (random data)".

                      Maybe it should be "fewer random data". After all, each byte in
                      the block is discrete.

                      --
                      \S -- siona@chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
                      "Frankly I have no feelings towards penguins one way or the other"
                      -- Arthur C. Clarke
                      her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump

                      • Lawrence D'Oliveiro

                        #12
                        Re: xor: how come so slow?

                        In message <Jlj*lmIps@news.chiark.greenend.org.uk>, Sion Arrowsmith wrote:
                        Maybe it should be "fewer random data".
                        Except these days we tend to think of "data" being, say, more like "flour"
                        than "bees", so it's "less data", like "less flour", rather than
                        like "fewer bees". :)
                        After all, each byte in the block is discrete.
                        Data can come in fractional bits. That's how compression works.

                        • Steven D'Aprano

                          #13
                          Re: xor: how come so slow?

                          On Fri, 17 Oct 2008 13:59:27 +0100, Sion Arrowsmith wrote:
                          [I think these attributions are right] Steven D'Aprano
                          <steve@REMOVE-THIS-cybersource.com.au> wrote:
                          >>On Fri, 17 Oct 2008 22:45:19 +1300, Lawrence D'Oliveiro wrote:
                          >>In message <010845d8$0$206 38$c3e8da3@news .astraweb.com>, Steven
                          >>D'Aprano wrote:
                          >>>... why do you say that xoring random data with other random data
                          >>>produces less randomness than you started with?
                          >>blocksize <= number_of_blocks * blocksize
                          >>I must be thick, because that looks like a non sequitor to me. I don't
                          >>see the relevance.
                          >
                          Lawrence originally said something along the lines of this just being a
                          way of taking some random data and producing "less random data". You're
                          reading it as "(less random) data". The intent (I assume) is for it to
                          be read as "less (random data)".

                          Ah. That was how I read it.

                          Maybe it should be "fewer random data". After all, each byte in the
                          block is discrete.

                          Or "a smaller amount of random data".



                          --
                          Steven

                          • Steven D'Aprano

                            #14
                            Re: xor: how come so slow?

                            On Sat, 18 Oct 2008 09:16:11 +1300, Lawrence D'Oliveiro wrote:
                            In message <Jlj*lmIps@news.chiark.greenend.org.uk>, Sion Arrowsmith
                            wrote:
                            >
                            >Maybe it should be "fewer random data".
                            >
                            Except these days we tend to think of "data" being, say, more like
                            "flour" than "bees", so it's "less data", like "less flour", rather than
                            like "fewer bees". :)
                            >
                            >After all, each byte in the block is discrete.
                            >
                            Data can come in fractional bits. That's how compression works.
                            Compression works by removing redundant data so the same amount of useful
                            information requires fewer bits. If you don't believe me, try compressing
                            a single bit and see if you get a "fractional bit". I leave it to you to
                            choose whether the bit should be on or off.

                            That's why compressing data twice doesn't gain much, if anything. Try
                            compressing a typical JPEG image, chances are it won't get any smaller.
                            That's because JPEGs already remove the redundancy in the data,
                            compressing it. With no redundancy, there can be no compression.

                            Due to the pigeon-hole principle, any reversible compression scheme must
                            cause some data to be expanded rather than compressed. (If not, then
                            there must be two different pieces of data that compress to the same
                            thing, which means the scheme is not reversible.) If there were
                            fractional bits, that wouldn't apply, because you can always divide a
                            pigeon-hole (a bit) into two smaller pigeon-holes.



                            --
                            Steven
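Steven's claim about random or already-compressed data can be checked directly (a quick sketch of my own using zlib; the exact sizes vary from run to run, but random bytes never compress meaningfully):

```python
import os
import zlib

data = os.urandom(100_000)          # high-entropy input, like a JPEG payload
compressed = zlib.compress(data, 9)
double = zlib.compress(compressed, 9)

# Random data has no redundancy to remove, so "compression" only adds
# container overhead, and compressing a second time gains nothing either.
assert len(compressed) >= len(data) - 100
assert len(double) >= len(compressed) - 100
print(len(data), len(compressed), len(double))
```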

                            • Lawrence D'Oliveiro

                              #15
                              Re: xor: how come so slow?

                              In message <010a9c3f$0$20653$c3e8da3@news.astraweb.com>, Steven D'Aprano
                              wrote:
                              On Sat, 18 Oct 2008 09:16:11 +1300, Lawrence D'Oliveiro wrote:
                              >
                              >Data can come in fractional bits. That's how compression works.
                              >
                              If you don't believe me, try compressing a single bit and see if you get
                              a "fractional bit".
                              If both states of the bit are not equally likely, then you do indeed have a
                              fractional bit, since the entropy of the bit is

                              nrbits = - P[bit = 0] * log2(P[bit = 0]) - P[bit = 1] * log2(P[bit = 1])
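The entropy of a biased bit can be computed directly with the standard Shannon formula (a small sketch; the function name is mine):

```python
from math import log2

def bit_entropy(p1):
    # Shannon entropy, in bits, of a single bit that is 1 with probability p1.
    # Terms with probability 0 contribute nothing (the p*log2(p) limit is 0).
    p0 = 1.0 - p1
    h = 0.0
    for p in (p0, p1):
        if p > 0.0:
            h -= p * log2(p)
    return h

print(bit_entropy(0.5))   # -> 1.0  (a fair bit carries a full bit)
print(bit_entropy(0.9))   # about 0.469: a biased bit carries less than one bit
```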
