How to Read Bytes from a file

  • gregpinero@gmail.com

    How to Read Bytes from a file

    It seems like this would be easy but I'm drawing a blank.

    What I want to do is be able to open any file in binary mode, and read
    in one byte (8 bits) at a time and then count the number of 1 bits in
    that byte.

    I got as far as this but it is giving me strings and I'm not sure how
    to accurately get to the byte/bit level.

    f1=file('somefile','rb')
    while 1:
        abyte=f1.read(1)

    Thanks in advance for any help.

    -Greg

  • Alex Martelli

    #2
    Re: How to Read Bytes from a file

    gregpinero@gmail.com <gregpinero@gmail.com> wrote:
    > It seems like this would be easy but I'm drawing a blank.
    >
    > What I want to do is be able to open any file in binary mode, and read
    > in one byte (8 bits) at a time and then count the number of 1 bits in
    > that byte.
    >
    > I got as far as this but it is giving me strings and I'm not sure how
    > to accurately get to the byte/bit level.
    >
    > f1=file('somefile','rb')
    > while 1:
    >     abyte=f1.read(1)

    You should probably prepare, before the loop, a mapping from char to number
    of 1 bits in that char:

    m = {}
    for c in range(256):
        m[c] = countones(c)

    and then sum up the values of m[abyte] into a running total (break from
    the loop when 'not abyte', i.e. you're reading 0 bytes even though
    asking for 1 -- that tells you the file is finished; remember to close
    it).

    A trivial way to do the countones function:

    def countones(x):
        assert x>=0
        c = 0
        while x:
            c += (x&1)
            x >>= 1
        return c

    you just don't want to call it too often, whence the previous advice to
    call it just 256 times to prep a mapping.

    If you download and install gmpy you can use gmpy.popcount as a fast
    implementation of countones:-).
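
Editor's sketch of the advice above, assuming Python 3 (the thread itself uses Python 2 idioms such as file() and xrange). It builds the 256-entry table once and sums table lookups over the file. Reading in chunks rather than one byte per read() call is a substitution made here for speed, and the names TABLE, count_one_bits and chunk_size are hypothetical, not from the thread.

```python
def countones(x):
    """Count the 1 bits in a non-negative integer by shifting right."""
    assert x >= 0
    c = 0
    while x:
        c += x & 1
        x >>= 1
    return c


# Built once: TABLE[b] is the number of 1 bits in byte value b (0..255).
TABLE = [countones(b) for b in range(256)]


def count_one_bits(path, chunk_size=65536):
    """Return the total number of 1 bits in the file at `path`."""
    total = 0
    # The with-block closes the file even if an error occurs.
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            # In Python 3, iterating over bytes yields ints in 0..255.
            total += sum(TABLE[b] for b in chunk)
    return total
```

With chunk_size=1 this behaves like the original read(1) loop, only slower.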


    Alex


    • Leif K-Brooks

      #3
      Re: How to Read Bytes from a file

      Alex Martelli wrote:
      > You should probably prepare before the loop a mapping from char to number
      > of 1 bits in that char:
      >
      > m = {}
      > for c in range(256):
      >     m[c] = countones(c)

      Wouldn't a list be more efficient?

      m = [countones(c) for c in xrange(256)]


      • Bart Ogryczak

        #4
        Re: How to Read Bytes from a file

        On Mar 1, 7:52 am, "gregpin...@gmail.com" <gregpin...@gmail.com>
        wrote:
        > It seems like this would be easy but I'm drawing a blank.
        >
        > What I want to do is be able to open any file in binary mode, and read
        > in one byte (8 bits) at a time and then count the number of 1 bits in
        > that byte.
        >
        > I got as far as this but it is giving me strings and I'm not sure how
        > to accurately get to the byte/bit level.
        >
        > f1=file('somefile','rb')
        > while 1:
        >     abyte=f1.read(1)

        import struct
        buf = open('somefile','rb').read()
        count1 = lambda x: ((x&1)+(x&2>0)+(x&4>0)+(x&8>0)+(x&16>0)+(x&32>0)+
                            (x&64>0)+(x&128>0))
        byteOnes = map(count1,struct.unpack('B'*len(buf),buf))

        byteOnes[n] is the number of ones in byte n.
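
Editor's sketch of the same per-byte idea, assuming Python 3. There, iterating over a bytes object already yields ints, so neither struct.unpack nor ord() is needed, and map() returns an iterator, so a list is built to make indexing work. The name ones_per_byte and the sample data are hypothetical.

```python
def count1(x):
    # Same mask-and-compare expression as in the post, one term per bit.
    return ((x & 1) + (x & 2 > 0) + (x & 4 > 0) + (x & 8 > 0) +
            (x & 16 > 0) + (x & 32 > 0) + (x & 64 > 0) + (x & 128 > 0))


def ones_per_byte(buf):
    """Return a list whose n-th item is the number of 1 bits in byte n."""
    return [count1(b) for b in buf]


# In-memory sample standing in for open('somefile', 'rb').read().
sample = bytes([0x00, 0x01, 0x03, 0xF0, 0xFF])
byte_ones = ones_per_byte(sample)
print(byte_ones)  # [0, 1, 2, 4, 8]
```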




        • Jussi Salmela

          #5
          Re: How to Read Bytes from a file

          Bart Ogryczak wrote:
          > On Mar 1, 7:52 am, "gregpin...@gmail.com" <gregpin...@gmail.com>
          > wrote:
          >> It seems like this would be easy but I'm drawing a blank.
          >>
          >> What I want to do is be able to open any file in binary mode, and read
          >> in one byte (8 bits) at a time and then count the number of 1 bits in
          >> that byte.
          >>
          >> I got as far as this but it is giving me strings and I'm not sure how
          >> to accurately get to the byte/bit level.
          >>
          >> f1=file('somefile','rb')
          >> while 1:
          >>     abyte=f1.read(1)
          >
          > import struct
          > buf = open('somefile','rb').read()
          > count1 = lambda x: ((x&1)+(x&2>0)+(x&4>0)+(x&8>0)+(x&16>0)+(x&32>0)+
          >                     (x&64>0)+(x&128>0))
          > byteOnes = map(count1,struct.unpack('B'*len(buf),buf))
          >
          > byteOnes[n] is the number of ones in byte n.

          I guess struct.unpack is not necessary, because:

          byteOnes2 = map(count1, (ord(ch) for ch in buf))

          seems to do the trick also.

          Cheers,
          Jussi


          • Alex Martelli

            #6
            Re: How to Read Bytes from a file

            Leif K-Brooks <eurleif@ecritters.biz> wrote:
            > Alex Martelli wrote:
            >> You should probably prepare before the loop a mapping from char to number
            >> of 1 bits in that char:
            >>
            >> m = {}
            >> for c in range(256):
            >>     m[c] = countones(c)
            >
            > Wouldn't a list be more efficient?
            >
            > m = [countones(c) for c in xrange(256)]

            Yes, or an array.array -- actually I meant to use m[chr(c)] above (so
            you could use the character you're reading directly to index m, rather
            than calling ord(byte) a bazillion times for each byte you're reading),
            but if you're using the numbers (as I did before) a list or array is
            better.
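
Editor's sketch of the two table shapes discussed here, assuming Python 3, where f.read(1) returns a length-1 bytes object rather than a one-character string. A dict keyed by bytes([c]) plays the role of m[chr(c)], and a list indexed by int is the other option. Counting bits with bin(x).count("1") is a shortcut swapped in for brevity, and the names by_char, by_int and total_bits_bytewise are hypothetical.

```python
import io


def countones(x):
    # bin(5) == '0b101'; counting the '1' characters gives the bit count.
    return bin(x).count("1")


# Keyed by the 1-byte value that read(1) returns, so no conversion is needed.
by_char = {bytes([c]): countones(c) for c in range(256)}
# Indexed by the integer value of the byte.
by_int = [countones(c) for c in range(256)]


def total_bits_bytewise(stream):
    """Sum 1 bits by reading one byte at a time from a binary stream."""
    total = 0
    while True:
        abyte = stream.read(1)
        if not abyte:  # b'' signals end of stream
            break
        total += by_char[abyte]
    return total


data = b"\x00\xff\x10"
print(total_bits_bytewise(io.BytesIO(data)))  # 9
print(sum(by_int[b] for b in data))           # 9
```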


            Alex


            • gregpinero@gmail.com

              #7
              Re: How to Read Bytes from a file

              On Mar 1, 8:53 am, "Bart Ogryczak" <B.Ogryc...@gmail.com> wrote:
              > On Mar 1, 7:52 am, "gregpin...@gmail.com" <gregpin...@gmail.com>
              > wrote:
              >> It seems like this would be easy but I'm drawing a blank.
              >>
              >> What I want to do is be able to open any file in binary mode, and read
              >> in one byte (8 bits) at a time and then count the number of 1 bits in
              >> that byte.
              >>
              >> I got as far as this but it is giving me strings and I'm not sure how
              >> to accurately get to the byte/bit level.
              >>
              >> f1=file('somefile','rb')
              >> while 1:
              >>     abyte=f1.read(1)
              >
              > import struct
              > buf = open('somefile','rb').read()
              > count1 = lambda x: ((x&1)+(x&2>0)+(x&4>0)+(x&8>0)+(x&16>0)+(x&32>0)+
              >                     (x&64>0)+(x&128>0))
              > byteOnes = map(count1,struct.unpack('B'*len(buf),buf))
              >
              > byteOnes[n] is the number of ones in byte n.

              This solution looks nice, but how does it work? I'm guessing
              struct.unpack will provide me with 8 bit bytes (will this work on any
              system?)

              How does count1 work exactly?

              Thanks for the help.

              -Greg


              • John Machin

                #8
                Re: How to Read Bytes from a file

                On Mar 2, 12:53 am, "Bart Ogryczak" <B.Ogryc...@gmail.com> wrote:
                > import struct
                > buf = open('somefile','rb').read()
                > count1 = lambda x: ((x&1)+(x&2>0)+(x&4>0)+(x&8>0)+(x&16>0)+(x&32>0)+
                >                     (x&64>0)+(x&128>0))
                > byteOnes = map(count1,struct.unpack('B'*len(buf),buf))

                byteOnes = map(count1,struct.unpack('%dB' % len(buf),buf))
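
Editor's sketch of why the two format strings are interchangeable, assuming Python 3. The struct module accepts a repeat count before a format character, so '4B' means the same as 'BBBB' without building a long string. The sample buffer is hypothetical.

```python
import struct

buf = bytes([7, 0, 255, 16])

# One 'B' per byte: the format string grows with the buffer.
repeated = struct.unpack("B" * len(buf), buf)
# A repeat count in front of a single 'B': the format string stays short.
counted = struct.unpack("%dB" % len(buf), buf)

print(repeated)  # (7, 0, 255, 16)
print(counted)   # (7, 0, 255, 16)
# In Python 3 the same integers come straight from the bytes object.
print(list(buf))  # [7, 0, 255, 16]
```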


                • Bart Ogryczak

                  #9
                  Re: How to Read Bytes from a file

                  On Mar 1, 4:58 pm, "gregpin...@gmail.com" <gregpin...@gmail.com>
                  wrote:
                  > On Mar 1, 8:53 am, "Bart Ogryczak" <B.Ogryc...@gmail.com> wrote:
                  >> On Mar 1, 7:52 am, "gregpin...@gmail.com" <gregpin...@gmail.com>
                  >> wrote:
                  >>> It seems like this would be easy but I'm drawing a blank.
                  >>>
                  >>> What I want to do is be able to open any file in binary mode, and read
                  >>> in one byte (8 bits) at a time and then count the number of 1 bits in
                  >>> that byte.
                  >>>
                  >>> I got as far as this but it is giving me strings and I'm not sure how
                  >>> to accurately get to the byte/bit level.
                  >>>
                  >>> f1=file('somefile','rb')
                  >>> while 1:
                  >>>     abyte=f1.read(1)
                  >>
                  >> import struct
                  >> buf = open('somefile','rb').read()
                  >> count1 = lambda x: ((x&1)+(x&2>0)+(x&4>0)+(x&8>0)+(x&16>0)+(x&32>0)+
                  >>                     (x&64>0)+(x&128>0))
                  >> byteOnes = map(count1,struct.unpack('B'*len(buf),buf))
                  >>
                  >> byteOnes[n] is the number of ones in byte n.
                  >
                  > This solution looks nice, but how does it work? I'm guessing
                  > struct.unpack will provide me with 8 bit bytes

                  unpack with the 'B' format gives you an int value equivalent to an unsigned char
                  (1 byte).

                  > (will this work on any system?)

                  Any system with 8-bit bytes, which would mean any system made after
                  1965. I'm not aware of any Python implementation for UNIVAC, so I
                  wouldn't worry ;-)

                  > How does count1 work exactly?

                  1,2,4,8,16,32,64,128 in binary are
                  1,10,100,1000,10000,100000,1000000,10000000
                  x&1 == 1 if x has first bit set to 1
                  x&2 == 2, so (x&2>0) == True if x has second bit set to 1
                  ... and so on.
                  In the context of int, True is interpreted as 1, False as 0.
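
Editor's worked example of the masking explained above, assuming Python 3. Each mask has exactly one bit set, x & mask is nonzero only when x has that bit set, and summing the resulting True/False values counts the set bits. The names MASKS and explain_bits and the sample value 37 are hypothetical.

```python
MASKS = [1, 2, 4, 8, 16, 32, 64, 128]


def explain_bits(x):
    """Return (mask, is_set) pairs showing which bits of byte x are 1."""
    return [(mask, (x & mask) > 0) for mask in MASKS]


def count1(x):
    # Each True counts as 1 and each False as 0 when summed.
    return sum((x & mask) > 0 for mask in MASKS)


x = 0b00100101  # decimal 37: the bits worth 1, 4 and 32 are set
for mask, is_set in explain_bits(x):
    print(format(mask, "08b"), is_set)
print(count1(x))  # 3
```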


                  • gregpinero@gmail.com

                    #10
                    Re: How to Read Bytes from a file

                    On Mar 1, 12:46 pm, "Bart Ogryczak" <B.Ogryc...@gmail.com> wrote:
                    >> This solution looks nice, but how does it work? I'm guessing
                    >> struct.unpack will provide me with 8 bit bytes
                    >
                    > unpack with 'B' format gives you int value equivalent to unsigned char
                    > (1 byte).
                    >
                    >> (will this work on any system?)
                    >
                    > Any system with 8-bit bytes, which would mean any system made after
                    > 1965. I'm not aware of any Python implementation for UNIVAC, so I
                    > wouldn't worry ;-)
                    >
                    >> How does count1 work exactly?
                    >
                    > 1,2,4,8,16,32,64,128 in binary are
                    > 1,10,100,1000,10000,100000,1000000,10000000
                    > x&1 == 1 if x has first bit set to 1
                    > x&2 == 2, so (x&2>0) == True if x has second bit set to 1
                    > ... and so on.
                    > In the context of int, True is interpreted as 1, False as 0.

                    Thanks Bart. That's perfect. The other suggestion was to precompute
                    count1 for all possible bytes, I guess that's 0-256, right?

                    Thanks again everyone for the help.

                    -Greg


                    • Hendrik van Rooyen

                      #11
                      Re: How to Read Bytes from a file

                      <gregpinero@gmail.com> wrote:
                      > Thanks Bart. That's perfect. The other suggestion was to precompute
                      > count1 for all possible bytes, I guess that's 0-256, right?

                      0 to 255 inclusive, actually - that is 256 numbers...

                      The largest number representable in a byte is 255

                      eight bits, of value 128,64,32,16,8,4,2,1

                      Their sum is 255...

                      And then there is zero.
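
Editor's check of the arithmetic above, assuming Python 3: the eight bit values sum to 255, and counting zero as well gives 256 distinct byte values. The variable names are hypothetical.

```python
bit_values = [128, 64, 32, 16, 8, 4, 2, 1]

largest = sum(bit_values)  # every bit set
all_values = range(256)    # 0 up to, but not including, 256

print(largest)          # 255
print(len(all_values))  # 256
print(all_values[0], all_values[-1])  # 0 255

# A bytes object rejects 256, which does not fit in eight bits.
try:
    bytes([256])
except ValueError as exc:
    print("256 is not a byte:", exc)
```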

                      - Hendrik


                      • Bart Ogryczak

                        #12
                        Re: How to Read Bytes from a file

                        On Mar 1, 7:36 pm, "gregpin...@gmail.com" <gregpin...@gmail.com>
                        wrote:
                        > On Mar 1, 12:46 pm, "Bart Ogryczak" <B.Ogryc...@gmail.com> wrote:
                        >>> This solution looks nice, but how does it work? I'm guessing
                        >>> struct.unpack will provide me with 8 bit bytes
                        >>
                        >> unpack with 'B' format gives you int value equivalent to unsigned char
                        >> (1 byte).
                        >>
                        >>> (will this work on any system?)
                        >>
                        >> Any system with 8-bit bytes, which would mean any system made after
                        >> 1965. I'm not aware of any Python implementation for UNIVAC, so I
                        >> wouldn't worry ;-)
                        >>
                        >>> How does count1 work exactly?
                        >>
                        >> 1,2,4,8,16,32,64,128 in binary are
                        >> 1,10,100,1000,10000,100000,1000000,10000000
                        >> x&1 == 1 if x has first bit set to 1
                        >> x&2 == 2, so (x&2>0) == True if x has second bit set to 1
                        >> ... and so on.
                        >> In the context of int, True is interpreted as 1, False as 0.
                        >
                        > Thanks Bart. That's perfect. The other suggestion was to precompute
                        > count1 for all possible bytes, I guess that's 0-256, right?

                        0-255 actually. It'd be worth it if accessing a dictionary of
                        precomputed values were significantly faster than calculating the
                        lambda, which I doubt. I suspect it actually might be slower.



                        • Piet van Oostrum

                          #13
                          Re: How to Read Bytes from a file

                          >>>>> "Bart Ogryczak" <B.Ogryczak@gmail.com> (BO) wrote:
                          >BO> Any system with 8-bit bytes, which would mean any system made after
                          >BO> 1965. I'm not aware of any Python implementation for UNIVAC, so I
                          >BO> wouldn't worry ;-)

                          1965? I worked with non-8-bit-byte machines (CDC) until the beginning of the
                          80's. :=( In fact, at that time the institution where Guido worked also had such
                          a machine, but Python came later.
                          --
                          Piet van Oostrum <piet@cs.uu.nl>
                          URL: http://www.cs.uu.nl/~piet [PGP 8DAE142BE17999C4]
                          Private email: piet@vanoostrum.org


                          • Bart Ogryczak

                            #14
                            Re: How to Read Bytes from a file

                            On Mar 5, 10:51 am, Piet van Oostrum <p...@cs.uu.nl> wrote:
                            > >>>>> "Bart Ogryczak" <B.Ogryc...@gmail.com> (BO) wrote:
                            > >BO> Any system with 8-bit bytes, which would mean any system made after
                            > >BO> 1965. I'm not aware of any Python implementation for UNIVAC, so I
                            > >BO> wouldn't worry ;-)
                            >
                            > 1965? I worked with non-8-bit-byte machines (CDC) until the beginning of the
                            > 80's. :=( In fact, at that time the institution where Guido worked also had such
                            > a machine, but Python came later.

                            Right, I should have written 'designed', not 'made'. UNIVACs were also
                            produced until the early 1980s. Anyway, I'd call it
                            paleoinformatics ;-)






                            • Gabriel Genellina

                              #15
                              Re: How to Read Bytes from a file

                              On Fri, 02 Mar 2007 08:22:36 -0300, Bart Ogryczak <B.Ogryczak@gmail.com>
                              wrote:
                              > On Mar 1, 7:36 pm, "gregpin...@gmail.com" <gregpin...@gmail.com>
                              > wrote:
                              >> Thanks Bart. That's perfect. The other suggestion was to precompute
                              >> count1 for all possible bytes, I guess that's 0-256, right?
                              >
                              > 0-255 actually. It'd be worth it, if accessing dictionary with
                              > precomputed values would be significantly faster than calculating the
                              > lambda, which I doubt. I suspect it actually might be slower.

                              Dictionary access is highly optimized in Python. In fact, using a
                              precomputed dictionary is about 12 times faster:

                              py> import timeit
                              py> count1 = lambda x: (x&1)+(x&2>0)+(x&4>0)+(x&8>0)+(x&16>0)+(x&32>0)+(x&64>0)+(x&128>0)
                              py> d256 = dict((i, count1(i)) for i in range(256))
                              py> timeit.Timer("for x in range(256): w = d256[x]", "from __main__ import d256"
                              ... ).repeat(number=10000)
                              [0.54261253874445003, 0.54763468541393934, 0.54499943428564279]
                              py> timeit.Timer("for x in range(256): w = count1(x)", "from __main__ import count1"
                              ... ).repeat(number=10000)
                              [6.1867963665773118, 6.1967124313285638, 6.1666287195719178]
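
Editor's sketch for repeating this measurement, assuming Python 3.5 or later (for the globals argument of timeit.repeat). It times a dict lookup, a list lookup and a direct count1 call over all 256 byte values. The helper best_time, the list l256 and the smaller repetition count are assumptions made here, and no timings are claimed since they depend on the machine and interpreter.

```python
import timeit


def count1(x):
    return ((x & 1) + (x & 2 > 0) + (x & 4 > 0) + (x & 8 > 0) +
            (x & 16 > 0) + (x & 32 > 0) + (x & 64 > 0) + (x & 128 > 0))


d256 = {i: count1(i) for i in range(256)}  # precomputed dict
l256 = [count1(i) for i in range(256)]     # precomputed list


def best_time(stmt, number=500, repeat=3):
    """Return the fastest of `repeat` runs, each executing stmt `number` times."""
    return min(timeit.repeat(stmt, globals=globals(),
                             number=number, repeat=repeat))


results = {
    "dict lookup": best_time("for x in range(256): w = d256[x]"),
    "list lookup": best_time("for x in range(256): w = l256[x]"),
    "count1 call": best_time("for x in range(256): w = count1(x)"),
}

for label, seconds in results.items():
    print("%-12s %.4f s" % (label, seconds))
```

Taking the minimum of several runs is the usual way to reduce noise from other processes.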

                              --
                              Gabriel Genellina
