Conversion of 24bit binary to int

  • Idar

    Conversion of 24bit binary to int

    Is there an efficient/fast way in Python to convert binary data from file
    (24bit hex(int) big endian) to 32bit int (little endian)? Have seen
    struct.unpack, but I am unsure how and what Python has to offer. Idar

    The original data is stored in blocks of 512 words
    (1536B, 3 bytes/word). The binary data is big-endian and has the form
    Ch1: 1536B (3B*512)
    Ch2: 1536B (3B*512)
    Ch3: 1536B (3B*512)
    and so on

    The equivalent C++ program looks like this:
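    for (i = 0; i < nchn; i++)
    {
        for (k = 0; k < segl; k++)
        {
            ar24[k] = 0;  // output array = 32-bit int array -> Mt24 fmt
            // Write the big-endian bytes into ar24[k] lowest byte first;
            // this relies on the host being little-endian.
            pdt = (unsigned char *)(&ar24[k]);
            *pdt = *(a + 2);
            *(pdt + 1) = *(a + 1);
            *(pdt + 2) = *(a + 0);
            a += 3;
            ar24[k] -= DownloadDataOffset;
            // printf("%d\n", ar24[k]);  // this is the number in 32-bit format
        }
    }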
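For comparison, here is a minimal Python 3 sketch of what the inner C++ loop computes, assuming each channel block arrives as a `bytes` object and that `offset` stands in for `DownloadDataOffset` (its value isn't given in the post). The function name `convert_channel_block` is made up for illustration. Like the C++ code, it zero-extends each 24-bit word rather than sign-extending it.

```python
def convert_channel_block(block, offset=0):
    """Decode one block of big-endian 24-bit words into Python ints.

    Each 3-byte word is read most-significant byte first, zero-extended
    (no sign extension), and then has `offset` subtracted; `offset` is a
    hypothetical stand-in for DownloadDataOffset.
    """
    if len(block) % 3:
        raise ValueError("block length must be a multiple of 3")
    return [int.from_bytes(block[k:k + 3], "big") - offset
            for k in range(0, len(block), 3)]


# Three words: 0x000102, 0x800000, 0xFFFFFF
sample = bytes([0x00, 0x01, 0x02, 0x80, 0x00, 0x00, 0xFF, 0xFF, 0xFF])
print(convert_channel_block(sample))            # [258, 8388608, 16777215]
print(convert_channel_block(sample, 0x800000))  # [-8388350, 0, 8388607]
```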

  • Peter Hansen

    #2
    Re: Conversion of 24bit binary to int

    Idar wrote:
    >
    > Is there an efficient/fast way in Python to convert binary data from file
    > (24bit hex(int) big endian) to 32bit int (little endian)? Have seen
    > struct.unpack, but I am unsure how and what Python has to offer. Idar

    I think the question is unclear. You say you've seen struct.unpack.
    So what then? Don't you think struct.unpack will work? What do you
    mean you are unsure how and what Python has to offer? The documentation
    which is on the web site clearly explains how and what struct.unpack
    has to offer...

    Please clarify.

    -Peter


    • Mike C. Fletcher

      #3
      Re: Conversion of 24bit binary to int

      If I'm understanding correctly, hex has nothing to do with this and the
      data is really binary, so what you're looking for is probably:
      >>> import struct
      >>> data = '\000\001\002'
      >>> temp = struct.unpack( '>I', '\000'+data ) # pad to 4-byte unsigned big-endian integer format
      >>> print temp # is now a regular python integer (in a tuple)
      (258L,)
      >>> print repr(struct.pack( '<I', *temp )) # encode in 4-byte unsigned little-endian integer format
      '\x02\x01\x00\x00'
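The session above is Python 2 (`print` statement, `str` used as raw bytes, `L` suffix on integers). A Python 3 sketch of the same two steps, using `bytes` literals:

```python
import struct

data = b"\x00\x01\x02"                           # one 24-bit big-endian word
(value,) = struct.unpack(">I", b"\x00" + data)   # pad in front to make 4 bytes
print(value)                                     # 258
little = struct.pack("<I", value)                # 4-byte unsigned little-endian
print(little)                                    # b'\x02\x01\x00\x00'
```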

      There are faster ways if you have a lot of such data (e.g. PIL would
      likely have something to manipulate RGB to RGBA images), similarly, you
      could use Numpy to add large numbers of rows simultaneously (all 512 if
      I understand your description of the data correctly). Without knowing
      what type of data is being loaded it's hard to give a better recommendation.

      HTH,
      Mike


      Idar wrote:
      > Is there an efficient/fast way in Python to convert binary data from
      > file (24bit hex(int) big endian) to 32bit int (little endian)? Have
      > seen struct.unpack, but I am unsure how and what Python has to offer.
      > Idar

      ....
      _________________________________________
      Mike C. Fletcher
      Designer, VR Plumber, Coder

      • Patrick Maupin

        #4
        Re: Conversion of 24bit binary to int

        Idar wrote:
        > Is there an efficient/fast way in Python to convert binary data from file
        > (24bit hex(int) big endian) to 32bit int (little endian)? Have seen
        > struct.unpack, but I am unsure how and what Python has to offer. Idar

        As Peter mentions, you haven't _really_ given enough information
        about what you need, but here is some code which will do what
        I _think_ you said you want...

        This code assumes that you have a string (named teststr here)
        in the source format you describe. You can get a string
        like this in several ways, e.g. by reading from a file object.

        This code then swaps every 3 characters and inserts a null
        byte between every group of three characters.

        The result is in a list, which can easily be converted back
        to a string by ''.join() as shown in the test printout.

        I would expect that either the array module or Numpy would
        work faster with _exactly_ the same technique, but I'm
        not bored enough to check that out right now.

        If this isn't fast enough after using array or NumPy (or
        after Alex, Tim, et al. get through with it), I would
        highly recommend Pyrex -- you can do exactly the same
        sorts of coercions you were doing in your C++ code.


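        # 60-byte test string: 20 three-byte words
        teststr = ''.join([chr(i) for i in range(128, 128 + 20*3)])

        # 4 output bytes for every 3 input bytes, all initially NUL
        result = len(teststr) * 4 // 3 * [chr(0)]
        for x in range(3):
            # byte x of each input word goes to position 2-x of the output word
            result[2-x::4] = teststr[x::3]

        print repr(''.join(result))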
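The trick works because byte `x` of every 3-byte input word belongs at position `2 - x` of the matching 4-byte output word, so three extended-slice assignments handle all words at once. A Python 3 sketch of the same idea with `bytearray` (the name `swap_and_pad` is hypothetical):

```python
def swap_and_pad(src):
    """Turn big-endian 3-byte words into little-endian 4-byte words.

    Byte x of every 3-byte group lands at position 2 - x of the matching
    4-byte group; position 3 stays zero.
    """
    if len(src) % 3:
        raise ValueError("input length must be a multiple of 3")
    dst = bytearray(len(src) * 4 // 3)   # all zeros, so the pad bytes are already set
    for x in range(3):
        dst[2 - x::4] = src[x::3]        # one extended-slice assignment per byte lane
    return bytes(dst)


print(swap_and_pad(b"\x01\x02\x03\x04\x05\x06").hex())  # 0302010006050400
```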


        Regards,
        Pat


        • Idar

          #5
          Re: Conversion of 24bit binary to int



          On Tue, 11 Nov 2003 10:11:05 -0500, Peter Hansen <peter@engcorp.com> wrote:

          > Idar wrote:
          >>
          >> Is there an efficient/fast way in Python to convert binary data from file
          >> (24bit hex(int) big endian) to 32bit int (little endian)? Have seen
          >> struct.unpack, but I am unsure how and what Python has to offer. Idar
          >
          > I think the question is unclear. You say you've seen struct.unpack.
          > So what then? Don't you think struct.unpack will work? What do you
          > mean you are unsure how and what Python has to offer? The documentation
          > which is on the web site clearly explains how and what struct.unpack
          > has to offer...

          It was due to careless reading on my part...

          The doc says "Standard size and alignment are as follows: no alignment is
          required for any type (so you have to use pad bytes)..."

          It was unclear (at the time of reading) in the sense that I didn't see the
          above text, and there was no example of how to handle odd-byte/padding
          conversion, so my test program crashed!
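The crash was most likely a `struct.error`: `struct` has no 3-byte integer code, and `'I'` with standard size needs exactly four bytes. A Python 3 sketch of the failure and the usual fix, which is one zero byte on the most-significant side:

```python
import struct

word = b"\x12\x34\x56"               # one 3-byte big-endian sample

print(struct.calcsize(">I"))         # 4: standard-size 'I' is always four bytes
try:
    struct.unpack(">I", word)        # only three bytes supplied
except struct.error as exc:
    print("struct.error:", exc)

# Big-endian means most significant byte first, so the zero pad goes in front.
(value,) = struct.unpack(">I", b"\x00" + word)
print(hex(value))                    # 0x123456
```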

          But if you know how to convert this format (the file is about 6MB)
          efficiently, please do give me a hint. The data is stored in binary with the
          format:
          Ch1: 1536B (512*3B)
          ...
          Ch6: 1536B (512*3B)
          Then it is repeated again until the end:
          Ch1: 1536B (512*3B)
          ...
          Ch6: 1536B (512*3B)
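Assuming the file is nothing but Y back-to-back repetitions of the six channel blocks (no headers), the position of every block follows from simple arithmetic. A Python 3 sketch; the helper names and the 0-based channel/set numbering are illustrative assumptions:

```python
CHANNELS = 6
BLOCK_BYTES = 512 * 3                 # 1536 bytes: 512 big-endian 24-bit words
SET_BYTES = CHANNELS * BLOCK_BYTES    # 9216 bytes: one Ch1..Ch6 repetition


def count_sets(file_size):
    """Return Y, the number of complete Ch1..Ch6 repetitions in the file."""
    sets, leftover = divmod(file_size, SET_BYTES)
    if leftover:
        raise ValueError("file size is not a whole number of Ch1..Ch6 sets")
    return sets


def block_offset(channel, y):
    """Byte offset of channel `channel` (0 = Ch1) in repetition `y` (0-based)."""
    return y * SET_BYTES + channel * BLOCK_BYTES


print(count_sets(682 * SET_BYTES))   # 682 (about 6MB of data)
print(block_offset(1, 0))            # 1536: Ch2 of the first set
print(block_offset(0, 1))            # 9216: Ch1 of the second set
```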

          >
          > Please clarify.
          >
          > -Peter
          >



          --
          Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/


          • Idar

            #6
            Re: Conversion of 24bit binary to int

            On Tue, 11 Nov 2003 10:21:29 -0500, Mike C. Fletcher <mcfletch@rogers.com>
            wrote:

            > If I'm understanding correctly, hex has nothing to do with this and the
            > data is really binary, so what you're looking for is probably:

            Thanks for the hint!! And sorry, I meant binary!
            >
            > >>> data = '\000\001\002'
            > >>> temp = struct.unpack( '>I', '\000'+data ) # pad to 4-byte unsigned big-endian integer format
            > >>> print temp # is now a regular python integer (in a tuple)
            > (258L,)
            > >>> print repr(struct.pack( '<I', *temp )) # encode in 4-byte unsigned little-endian integer format
            > '\x02\x01\x00\x00'
            >
            > There are faster ways if you have a lot of such data (e.g. PIL would
            > likely have something to manipulate RGB to RGBA images), similarly, you
            > could use Numpy to add large numbers of rows simultaneously (all 512 if I
            > understand your description of the data correctly). Without knowing what
            > type of data is being loaded it's hard to give a better recommendation.

            It is binary, with no formatting characters to indicate the start/end of each
            block (fixed size).
            A file is about 6MB (and there are about 300 of them...):
            Ch1: 1536B (512*3B) - the 3B are big endian (int)
            ...
            Ch6: 1536B (512*3B)
            And then it is repeated till the end:
            Ch1: 1536B (512*3B)
            ...
            Ch6: 1536B (512*3B)

            ciao, idar

            > HTH,
            > Mike
            >
            > Idar wrote:
            >
            >> Is there an efficient/fast way in Python to convert binary data from
            >> file (24bit hex(int) big endian) to 32bit int (little endian)? Have seen
            >> struct.unpack, but I am unsure how and what Python has to offer. Idar
            >
            > ...
            > _________________________________________
            > Mike C. Fletcher
            > Designer, VR Plumber, Coder
            > http://members.rogers.com/mcfletch/




            • Idar

              #7
              Re: Conversion of 24bit binary to int

              Thanks for the example!

              The format is binary with no formatting characters to indicate start/end of
              each block (fixed size).
              A file is about 6MB (and there are about 300 of them...), so

              Ch1: 1536B (512*3B) - the 3B are big endian (int)
              ...
              Ch6: 1536B (512*3B)
              And then it is repeated till the end (say Y sets of Ch1 (the same for
              Ch2,3,4,5,6)):
              Ch1,Y: 1536B (512*3B)
              ...
              Ch6,Y: 1536B (512*3B)

              And ideally I would like to convert it to this format:
              Ch1: Y*512*4B (normal int with little endian)
              Ch2
              Ch3
              Ch4
              Ch5
              Ch6
              And that is the end :)
              Idar
              >
              > This code assumes that you have a string (named teststr here)
              > in the source format you describe. You can get a string
              > like this in several ways, e.g. by reading from a file object.
              >
              > This code then swaps every 3 characters and inserts a null
              > byte between every group of three characters.
              >
              > The result is in a list, which can easily be converted back
              > to a string by ''.join() as shown in the test printout.
              >
              > I would expect that either the array module or Numpy would
              > work faster with _exactly_ the same technique, but I'm
              > not bored enough to check that out right now.
              >
              > If this isn't fast enough after using array or NumPy (or
              > after Alex, Tim, et al. get through with it), I would
              > highly recommend Pyrex -- you can do exactly the same
              > sorts of coercions you were doing in your C++ code.
              >
              >
              > teststr = ''.join([chr(i) for i in range(128, 128 + 20*3)])
              >
              > result = len(teststr) * 4 // 3 * [chr(0)]
              > for x in range(3):
              >     result[2-x::4] = teststr[x::3]
              >
              > print repr(''.join(result))
              >
              >
              > Regards,
              > Pat
              >




              • Alex Martelli

                #8
                Re: Conversion of 24bit binary to int

                Idar wrote:
                > Thanks for the example!
                >
                > The format is binary with no formatting characters to indicate start/end of
                > each block (fixed size).
                > A file is about 6MB (and there are about 300 of them...), so
                >
                > Ch1: 1536B (512*3B) - the 3B are big endian (int)
                > ..
                > Ch6: 1536B (512*3B)
                > And then it is repeated till the end (say Y sets of Ch1 (the same for
                > Ch2,3,4,5,6)):
                > Ch1,Y: 1536B (512*3B)
                > ..
                > Ch6,Y: 1536B (512*3B)
                >
                > And ideally I would like to convert it to this format:
                > Ch1: Y*512*4B (normal int with little endian)
                > Ch2
                > Ch3
                > Ch4
                > Ch5
                > Ch6
                > And that is the end :)

                So, you don't really need to convert binary to int or anything, just
                shuffle bytes around, right? Your file starts with (e.g.), using a
                letter for each arbitrary binary byte:

                A B C D E F G H I ...

                and you want to output the bytes

                C B A 0 F E D 0 I H G 0 ...

                I.e., swap the 3 bytes, insert a 0 byte for padding, and proceed (for all
                Ch1, which is spread out in the original file -- then for all Ch2, and
                so on). Each file fits comfortably in memory (6MB for input, becoming
                8MB for output due to the padding). You can use two instances of
                array.array('B'), with .read for input and .write for output (just
                remember .read _appends_ to the array, so make a new empty one for
                each file you're processing -- the _output_ array you can reuse).
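In Python 3, `array.array` no longer has the `read`/`write` aliases; `fromfile()` and `tofile()` do the same job, and `fromfile()` likewise appends. A small sketch of both points, using `io.BytesIO` as a stand-in for real files:

```python
import array
import io

src = io.BytesIO(bytes(range(12)))   # stand-in for a file opened with 'rb'

ia = array.array("B")
ia.fromfile(src, 6)                  # reads 6 bytes and appends them ...
ia.fromfile(src, 6)                  # ... so a second call extends the same array
print(len(ia))                       # 12

oa = array.array("B", bytes(16))     # zero-filled output array, reusable across files
out = io.BytesIO()                   # stand-in for a file opened with 'wb'
oa.tofile(out)
print(len(out.getvalue()))           # 16
```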

                It's LOTS of indexing and single-byte moving, so I doubt the Python
                native performance will be great. Still, once you've implemented and
                checked it out you can use psyco or pyrex to optimize it, if needed.

                The primitive you need is typically "copy with swapping and padding
                a block of 1536 input bytes [starting from index SI] to a block of
                2048 output bytes" [starting from index SO -- the 0 bytes in the
                output you'll leave untouched after at first preparing the output
                array with OA = array.array('B', Y*2048*6*'\0') of course].
                That's just (using predefined ranges for speed, no need to remake
                them every time):

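                r512 = xrange(512)

                def doblock(SI, SO, IA, OA, r512=r512):
                    ii = SI
                    io = SO
                    for i in r512:
                        # Reverse one 3-byte word; the 4th output byte stays 0.
                        # (A reversed slice IA[ii+2:ii-1:-1] comes out empty
                        # when ii == 0, so copy the bytes one by one.)
                        OA[io], OA[io+1], OA[io+2] = IA[ii+2], IA[ii+1], IA[ii]
                        ii += 3
                        io += 4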

                so basically it only remains to compute SI and SO appropriately
                and loop ditto calling this primitive (or some speeded-up version
                thereof) 6*Y times for all the blocks in the various channels.
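Computing SI and SO is the only remaining piece. A Python 3 sketch of the whole loop, assuming 0-based channel `ch` and repetition `y`, input already read into a `bytes` object, and the output order Idar asked for (all of Ch1, then all of Ch2, ...). `convert` is a hypothetical name, and this `doblock` takes a `words` argument so the example can use tiny blocks:

```python
import array


def doblock(si, so, ia, oa, words=512):
    """Copy one block of `words` 24-bit words from ia[si:] to oa[so:].

    Each 3-byte big-endian word is written reversed into a 4-byte slot;
    the 4th byte of the slot is left at 0.
    """
    for _ in range(words):
        oa[so], oa[so + 1], oa[so + 2] = ia[si + 2], ia[si + 1], ia[si]
        si += 3
        so += 4


def convert(data, channels=6, words=512):
    """Turn Y interleaved Ch1..Ch6 sets into all-Ch1, all-Ch2, ... output."""
    in_block, out_block = words * 3, words * 4
    sets, leftover = divmod(len(data), channels * in_block)
    if leftover:
        raise ValueError("input is not a whole number of channel sets")
    ia = array.array("B", data)
    oa = array.array("B", bytes(sets * channels * out_block))  # zero-filled
    for y in range(sets):
        for ch in range(channels):
            si = (y * channels + ch) * in_block   # SI: block position in the input
            so = (ch * sets + y) * out_block      # SO: block position in the output
            doblock(si, so, ia, oa, words)
    return oa.tobytes()


print(convert(bytes(range(1, 13)), channels=2, words=1).hex())
# 0302010009080700060504000c0b0a00
```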


                Alex


                • Patrick Maupin

                  #9
                  Re: Conversion of 24bit binary to int

                  Alex Martelli wrote:
                  > r512 = xrange(512)
                  >
                  > def doblock(SI, SO, IA, OA, r512=r512):
                  >     ii = SI
                  >     io = SO
                  >     for i in r512:
                  >         OA[io:io+3] = IA[ii+2:ii-1:-1]
                  >         ii += 3
                  >         io += 4


                  It's my guess this would be faster using array.array
                  in combination with extended slicing, as per the list
                  example I gave in a previous message, even though I'm
                  still not bored enough to time it :) (The for loop
                  in my previous example only requires 3 iterations,
                  rather than 512 as in this example.)
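This can be checked directly. A Python 3 sketch comparing a per-word loop (512 Python-level iterations per block) with three extended-slice assignments; both must yield identical output, and `timeit` reports which is faster on a given machine (no timings are claimed here). The function names are hypothetical:

```python
import array
import timeit


def per_word(src):
    """One Python-level iteration per 24-bit word."""
    dst = array.array("B", bytes(len(src) // 3 * 4))
    for i in range(len(src) // 3):
        dst[4 * i:4 * i + 3] = src[3 * i:3 * i + 3][::-1]
    return dst


def per_lane(src):
    """Three extended-slice assignments, one per byte position in the word."""
    dst = array.array("B", bytes(len(src) // 3 * 4))
    for x in range(3):
        dst[2 - x::4] = src[x::3]
    return dst


block = array.array("B", bytes(range(256)) * 6)   # one 1536-byte block
assert per_word(block) == per_lane(block)

for fn in (per_word, per_lane):
    print(fn.__name__, timeit.timeit(lambda: fn(block), number=100))
```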

                  Pat


                  • Patrick Maupin

                    #10
                    Re: Conversion of 24bit binary to int

                    Idar wrote:
                    > Thanks for the example!
                    >
                    > The format is binary with no formatting characters to indicate start/end of
                    > each block (fixed size).
                    > A file is about 6MB (and there are about 300 of them...), so
                    >
                    > Ch1: 1536B (512*3B) - the 3B are big endian (int)
                    > ..
                    > Ch6: 1536B (512*3B)
                    > And then it is repeated till the end (say Y sets of Ch1 (the same for
                    > Ch2,3,4,5,6)):
                    > Ch1,Y: 1536B (512*3B)
                    > ..
                    > Ch6,Y: 1536B (512*3B)
                    >
                    > And ideally I would like to convert it to this format:
                    > Ch1: Y*512*4B (normal int with little endian)
                    > Ch2
                    > Ch3
                    > Ch4
                    > Ch5
                    > Ch6
                    > And that is the end :)
                    > Idar

                    OK, now that I have a beer and a specification, here is some code
                    which (I think) should do what (I think) you are asking for.
                    On my Athlon 2200+ (marketing number) computer, with the source
                    file cached by the OS, it operates at around 10 source megabytes/second.

                    (That should be about 3 minutes plus actual file I/O operations
                    for the 300 6MB files you describe.)

                    Verifying that it actually produces the data you expect is up to you :)

                    Regards,
                    Pat


                    import array

                    def mungeio(srcfile, dstfile, numchannels=6, blocksize=512):
                        """
                        This function converts 24 bit RGB into 32 bit BGR0,
                        and simultaneously de-interleaves video from multiple
                        sources. The parameters are:

                        srcfile -- a file object opened with 'rb'
                                   (or similar object)
                        dstfile -- a file object opened with 'wb'
                                   (or similar object)
                        numchannels -- the number of interleaved video channels
                        blocksize -- the number of pixels per channel on
                                     each interleaved block (interleave factor)

                        This function reads all the data from srcfile and writes
                        it to dstfile. It is up to the caller to close both files.

                        The function asserts that the amount of data to be read
                        from the source file is an integral multiple of
                        blocksize*numchannels*3.

                        This function assumes that multiple copies of the data
                        will easily fit into RAM, as the target file size is
                        6MB for the source files and 8MB for the destination
                        files. If this is not a good assumption, it should
                        be rearchitected to output to one file per channel,
                        and then stitch the output files together at the end.
                        """

                        srcblocksize = blocksize * 3
                        dstblocksize = blocksize * 4

                        def mungeblock(src, dstarray=array.array('B', dstblocksize*[0])):
                            """
                            This function accepts a string representing a single
                            source block, and returns a string representing a
                            single destination block.
                            """
                            srcarray = array.array('B', src)
                            for i in range(3):
                                dstarray[2-i::4] = srcarray[i::3]
                            return dstarray.tostring()

                        channellist = [[] for i in range(numchannels)]

                        while 1:
                            for channel in channellist:
                                data = srcfile.read(srcblocksize)
                                if len(data) != srcblocksize:
                                    break
                                channel.append(mungeblock(data))
                            else:
                                continue  # (with while statement)
                            break  # Propagate break from 'for' out of 'while'

                        # Check that input file length is valid (no leftovers),
                        # and then write the result.

                        assert channel is channellist[0] and not len(data)
                        dstfile.write(''.join(sum(channellist, [])))


                    def mungefile(srcname, dstname):
                        """
                        Actual I/O done in a separate function so it can
                        be more easily unit-tested.
                        """
                        srcfile = open(srcname, 'rb')
                        dstfile = open(dstname, 'wb')
                        mungeio(srcfile, dstfile)
                        srcfile.close()
                        dstfile.close()
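A Python 3 sketch of the same streaming approach, assuming `srcfile`/`dstfile` are binary file objects (exercised here with `io.BytesIO`). It raises `ValueError` instead of using `assert` for a truncated file and joins byte strings directly instead of flattening lists with `sum()`; the name `munge_io` is hypothetical:

```python
import io


def munge_io(srcfile, dstfile, numchannels=6, blocksize=512):
    """Swap/pad every 24-bit word and de-interleave the channel blocks.

    Writes all of channel 1, then all of channel 2, and so on.
    """
    srcblocksize = blocksize * 3
    channels = [[] for _ in range(numchannels)]
    while True:
        first = srcfile.read(srcblocksize)
        if not first:
            break                                    # clean end of input
        blocks = [first] + [srcfile.read(srcblocksize)
                            for _ in range(numchannels - 1)]
        if any(len(b) != srcblocksize for b in blocks):
            raise ValueError("input is not a whole number of channel sets")
        for chan, block in zip(channels, blocks):
            dst = bytearray(blocksize * 4)           # zero-filled: pad bytes already 0
            for x in range(3):
                dst[2 - x::4] = block[x::3]
            chan.append(bytes(dst))
    for chan in channels:
        dstfile.write(b"".join(chan))


# Tiny example: 2 channels, 1 word per block, 2 sets.
src = io.BytesIO(bytes(range(1, 13)))
dst = io.BytesIO()
munge_io(src, dst, numchannels=2, blocksize=1)
print(dst.getvalue().hex())   # 0302010009080700060504000c0b0a00
```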


                    • Patrick Maupin

                      #11
                      Re: Conversion of 24bit binary to int

                      I just realized that, according to your spec, it ought to be possible
                      to do the rgb -> bgr0 conversion on the entire file all at one go
                      (no nasty headers or block headers to get in the way:)

                      So I wrote a somewhat more comprehensible (for one thing, it gets rid
                      of that nasty sum() everybody's been complaining about :), somewhat more
                      memory-intensive version of the program. On my machine it executes at
                      approximately the same speed as the original one I wrote (10 source
                      megabytes/second), but this one might be more amenable to profiling
                      and further point optimizations if necessary.

                      The barebones (no comments or error-checking) functions are below.

                      Pat


                      import array

                      def RgbToBgr0(srcstring):
                          srcarray = array.array('B', srcstring)
                          dstarray = array.array('B', len(srcstring) * 4 // 3 * chr(0))
                          for i in range(3):
                              dstarray[2-i::4] = srcarray[i::3]
                          return dstarray.tostring()

                      def deinterleave(srcstring, numchannels=6, pixelsperblock=512):
                          bytesperblock = pixelsperblock * 4
                          totalblocks = len(srcstring) // bytesperblock
                          blocknums = []
                          for i in range(numchannels):
                              blocknums.extend(range(i, totalblocks, numchannels))
                          return ''.join([srcstring[i*bytesperblock:(i+1)*bytesperblock]
                                          for i in blocknums])

                      def mungefile(srcname, dstname):
                          srcfile = open(srcname, 'rb')
                          dstfile = open(dstname, 'wb')
                          dstfile.write(deinterleave(RgbToBgr0(srcfile.read())))
                          srcfile.close()
                          dstfile.close()
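In Python 3 the same whole-file approach works on `bytes`, and `with` blocks close both files even if an exception occurs. A sketch; the lower-case function names are hypothetical and chosen only to keep them distinct from the Python 2 functions:

```python
def rgb_to_bgr0(src):
    """Reverse every 3-byte group and append a zero byte, for the whole buffer."""
    dst = bytearray(len(src) * 4 // 3)
    for i in range(3):
        dst[2 - i::4] = src[i::3]
    return bytes(dst)


def deinterleave(src, numchannels=6, pixelsperblock=512):
    """Collect blocks channel by channel: blocks c, c+n, c+2n, ... for each c."""
    bytesperblock = pixelsperblock * 4
    totalblocks = len(src) // bytesperblock
    view = memoryview(src)                   # slicing a memoryview avoids copies
    return b"".join(view[i * bytesperblock:(i + 1) * bytesperblock]
                    for ch in range(numchannels)
                    for i in range(ch, totalblocks, numchannels))


def munge_file(srcname, dstname):
    with open(srcname, "rb") as srcfile, open(dstname, "wb") as dstfile:
        dstfile.write(deinterleave(rgb_to_bgr0(srcfile.read())))


print(deinterleave(rgb_to_bgr0(bytes(range(1, 13))), numchannels=2,
                   pixelsperblock=1).hex())   # 0302010009080700060504000c0b0a00
```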


                      • Christos TZOTZIOY Georgiou

                        #12
                        Re: Conversion of 24bit binary to int

                        On Wed, 12 Nov 2003 10:53:17 +0100, rumours say that Idar
                        <ip@itk.ntnu.no> might have written:

                        >But if you know how to convert this format (the file is about 6MB)
                        >efficiently, please do give me a hint. The data is stored in binary with the
                        >format:
                        >Ch1: 1536B (512*3B)
                        >..
                        >Ch6: 1536B (512*3B)
                        >Then it is repeated again until the end:
                        >Ch1: 1536B (512*3B)
                        >..
                        >Ch6: 1536B (512*3B)

                        So it's some audio file with 6 channels, right? (I missed the first
                        post)

                        I would take every chunk of 512*3 bytes, and for every 3 bytes,
                        struct.unpack('i', _3_bytes+'\0')[0] is the 32-bit value (assuming
                        Intel's little-endianness).
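One caveat, sketched in Python 3: appending the zero byte and unpacking with native `'i'` on an Intel machine reads the three bytes least-significant first, which suits little-endian samples. For the big-endian words described in this thread, the pad byte goes in front and the format should be big-endian. The example uses explicit `'<'`/`'>'` prefixes so the result does not depend on the host:

```python
import struct

word = b"\x12\x34\x56"                               # big-endian sample 0x123456

(trailing,) = struct.unpack("<i", word + b"\x00")    # pad after, read little-endian
(leading,) = struct.unpack(">i", b"\x00" + word)     # pad before, read big-endian

print(hex(trailing))   # 0x563412: the bytes were taken in the wrong order
print(hex(leading))    # 0x123456
```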

                        Hope this helps (no, really :)
                        --
                        TZOTZIOY, I speak England very best,
                        Ils sont fous ces Redmontains! --Harddix
