Reading binary data

  • Aaron Scott

    Reading binary data

    I've been trying to tackle this all morning, and so far I've been
    completely unsuccessful. I have a binary file that I have the
    structure to, and I'd like to read it into Python. It's not a
    particularly complicated file. For instance:

    signature   char[3]     "GDE"
    version     uint32      2
    attr_count  uint32
    {
        attr_id         uint32
        attr_val_len    uint32
        attr_val        char[attr_val_len]
    } ... repeated attr_count times ...

    However, I can't find a way to bring it into Python. This is my code
    -- which I know is definitely wrong, but I had to start somewhere:

    import struct
    file = open("test.gde", "rb")
    output = file.read(3)
    print output
    version = struct.unpack("I", file.read(4))[0]
    print version
    attr_count = struct.unpack("I", file.read(4))[0]
    while attr_count:
        print "---"
        file.seek(4, 1)
        counter = int(struct.unpack("I", file.read(4))[0])
        print file.read(counter)
        attr_count -= 1
    file.close()

    Of course, this doesn't work at all. It produces:

    GDE
    2
    ---
    é
    ---
    ê Å

    I'm completely at a loss. If anyone could show me the correct way to
    do this (or at least point me in the right direction), I'd be
    extremely grateful.
  • Jon Clements

    #2
    Re: Reading binary data

    On 10 Sep, 18:14, Aaron Scott <aaron.hildebra...@gmail.com> wrote:
    > I have a binary file that I have the structure to, and I'd like to
    > read it into Python. [...]
    > If anyone could show me the correct way to do this (or at least
    > point me in the right direction), I'd be extremely grateful.

    What if we view the data as having an 11 byte header:
    signature, version, attr_count = struct.unpack('3cII', yourfile.read(11))

    Then for the list of attr's:
    for idx in xrange(attr_count):
        attr_id, attr_val_len = struct.unpack('II', yourfile.read(8))
        attr_val = yourfile.read(attr_val_len)

    hth, or gives you a pointer anyway
    Jon.



    • Jon Clements

      #3
      Re: Reading binary data

      On 10 Sep, 18:33, Jon Clements <jon...@googlemail.com> wrote:
      > What if we view the data as having an 11 byte header:
      > signature, version, attr_count = struct.unpack('3cII', yourfile.read(11))

      CORRECTION: '3cII' should be '3sII'.


      • Aaron Scott

        #4
        Re: Reading binary data

        > signature, version, attr_count = struct.unpack('3cII', yourfile.read(11))

        This line is giving me an error:

        Traceback (most recent call last):
          File "test.py", line 19, in <module>
            signature, version, attr_count = struct.unpack('3cII', file.read(12))
        ValueError: too many values to unpack


        • Aaron Scott

          #5
          Re: Reading binary data

          > CORRECTION: '3cII' should be '3sII'.

          Even with the correction, I'm still getting the error.


          • Wojtek Walczak

            #6
            Re: Reading binary data

            On Wed, 10 Sep 2008 10:43:31 -0700 (PDT), Aaron Scott wrote:
            > This line is giving me an error:
            > [...]
            > ValueError: too many values to unpack

            Do:
                print struct.unpack('3cII', yourfile.read(11))
            instead of:
                signature, version, attr_count = struct.unpack('3cII', yourfile.read(11))
            to check what struct.unpack returns.

            I guess it returns more than three elements. Just like below:
            >>> a, b, c = (1, 2, 3, 4)
            Traceback (most recent call last):
              File "<stdin>", line 1, in ?
            ValueError: too many values to unpack

            As you can see the fourth element from the tuple has no place to go.
            Same thing happens in your code.
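
A minimal sketch of the point above, written for Python 3 (where the values come back as bytes objects). The 11-byte header is made up for illustration, and the "<" prefix (little-endian, no padding) is an assumption added so the example behaves the same on every platform. It shows that '3c' yields three separate one-byte values while '3s' yields a single three-byte value:

```python
import struct

# Hypothetical 11-byte header: "GDE", version 2, attr_count 2.
header = b"GDE" + struct.pack("<II", 2, 2)

# '3c' means "three single chars": five values in total.
as_chars = struct.unpack("<3cII", header)

# '3s' means "one 3-byte string": three values in total.
as_string = struct.unpack("<3sII", header)

print(len(as_chars), as_chars)
print(len(as_string), as_string)

# Only the '3s' form fits a three-name assignment.
signature, version, attr_count = as_string
```

With '3cII', the three-name assignment has five values to place, which is exactly the "too many values to unpack" situation.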

            HTH.

            --
            Regards,
            Wojtek Walczak,


            • Jon Clements

              #7
              Re: Reading binary data

              On Sep 10, 6:45 pm, Aaron Scott <aaron.hildebra...@gmail.com> wrote:
              >> CORRECTION: '3cII' should be '3sII'.
              >
              > Even with the correction, I'm still getting the error.

              Me being silly...

              Quick fix:
                  signature = file.read(3)
              then the rest can stay the same. struct.calcsize('3sII') expects a
              12 byte string, whereas you only really have 11 -- alignment and
              all that...
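
The alignment remark can be checked with struct.calcsize. This is a small sketch, not part of the original reply: in native mode (no prefix) struct normally inserts a pad byte after the 3-byte string so the first uint32 starts on a 4-byte boundary, while the "<" and "=" prefixes use standard sizes with no padding. The variable names are illustrative only.

```python
import struct

# Native mode: size and alignment follow the C compiler's rules,
# so on common platforms this is 12 (3 + 1 pad + 4 + 4).
native_size = struct.calcsize("3sII")

# Standard modes: no alignment padding, uint32 is always 4 bytes.
little_endian_size = struct.calcsize("<3sII")
native_order_size = struct.calcsize("=3sII")

print(native_size, little_endian_size, native_order_size)
```

So an alternative to reading the signature separately is to keep a single format string and add an explicit byte-order prefix, provided the file really is packed without padding.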

              Jon.


              • Aaron Scott

                #8
                Re: Reading binary data

                Sorry, I had posted the wrong error. The error I am getting is:

                struct.error: unpack requires a string argument of length 12

                which doesn't make sense to me, since I'm specifically asking for 11.
                Just for kicks, if I change the line to

                print struct.unpack('3sII', file.read(12))

                I get the result

                ('GDE', 33554432, 16777216)

                ... which isn't even close, past the first three characters.
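
Those odd numbers are what a one-byte shift would produce. The sketch below assumes the file starts with "GDE", version 2, attr_count 2 and an attr_id of 1, all little-endian; that is inferred from the output above rather than taken from the real test.gde. The explicit pad byte "x" imitates what native alignment typically does with '3sII':

```python
import struct

# Assumed first 12 bytes: "GDE", version=2, attr_count=2, then the
# first byte of the first attr_id (1). Inferred, not the real file.
data = b"GDE" + struct.pack("<II", 2, 2) + b"\x01"

# With a pad byte after the signature, every integer is read one
# byte too late: 00 00 00 02 and 00 00 00 01 as little-endian.
misaligned = struct.unpack("<3sxII", data)

# Without padding, the 11-byte header decodes as expected.
aligned = struct.unpack("<3sII", data[:11])

print(misaligned)
print(aligned)
```

33554432 is 0x02000000 and 16777216 is 0x01000000, i.e. the values 2 and 1 with their bytes shifted by one position.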


                • Aaron Scott

                  #9
                  Re: Reading binary data

                  Taking everything into consideration, my code is now:

                  import struct
                  file = open("test.gde", "rb")
                  signature = file.read(3)
                  version, attr_count = struct.unpack('II', file.read(8))
                  print signature, version, attr_count
                  for idx in xrange(attr_count):
                      attr_id, attr_val_len = struct.unpack('II', file.read(8))
                      attr_val = file.read(attr_val_len)
                      print attr_id, attr_val_len, attr_val
                  file.close()

                  which gives a result of:

                  GDE 2 2
                  1 4 é
                  2 4 ê Å

                  Essentially, the same results I was originally getting :(


                  • Jon Clements

                    #10
                    Re: Reading binary data

                    On Sep 10, 7:16 pm, Aaron Scott <aaron.hildebra...@gmail.com> wrote:
                    > Taking everything into consideration, my code is now:
                    > [...]
                    > which gives a result of:
                    >
                    > GDE 2 2
                    > 1 4 é
                    > 2 4 ê Å
                    >
                    > Essentially, the same results I was originally getting :(

                    Umm, how about yourfile.read(100) (or some arbitrary value, just to
                    see the data) and see what it returns... does it return something
                    that looks like values you'd expect in a char[]? I also find it odd
                    that the attr_val_len appears to be 4?


                    • John Machin

                      #11
                      Re: Reading binary data

                      On Sep 11, 4:16 am, Aaron Scott <aaron.hildebra...@gmail.com> wrote:
                      > Taking everything into consideration, my code is now:
                      > [...]
                      > which gives a result of:
                      >
                      > GDE 2 2
                      > 1 4 é
                      > 2 4 ê Å
                      >
                      > Essentially, the same results I was originally getting :(

                      Stop thrashing about, and do the following:
                      (1) print repr(open('test.gde', 'rb').read(100))
                      (2) tell us what you EXPECT to see in attr_val etc
                      (3) tell us what platform the file was created on and what platform
                      it's being read on
                      (4) (on the reading platform, at least) import sys; print sys.byteorder

                      When showing results, do print ..., repr(attr_val)
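
Steps (1) and (4) might look like the sketch below in Python 3. The helper name describe is hypothetical, and the sample bytes stand in for open('test.gde', 'rb').read(100), since the real file contents were never posted:

```python
import binascii
import sys

def describe(raw):
    """Return an unambiguous repr and a hex string for some raw bytes."""
    return repr(raw), binascii.hexlify(raw).decode("ascii")

# Stand-in for: raw = open('test.gde', 'rb').read(100)
raw = b"GDE\x02\x00\x00\x00"

text, hexed = describe(raw)
print(text)           # escapes every non-printable byte
print(hexed)          # two hex digits per byte
print(sys.byteorder)  # 'little' or 'big' on the reading platform
```

Unlike a plain print, repr and hex output cannot be distorted by the terminal's character encoding, which is why characters such as é in the earlier output are not a reliable view of the data.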


                      • Terry Reedy

                        #12
                        Re: Reading binary data



                        Aaron Scott wrote:
                        > Taking everything into consideration, my code is now:
                        > [...]
                        > which gives a result of:
                        >
                        > GDE 2 2
                        > 1 4 é
                        > 2 4 ê Å
                        >
                        > Essentially, the same results I was originally getting :(

                        It appears that your 4-byte attribute values are not what you were
                        expecting. Do you have separate info on the supposed contents? In any
                        case, I would print repr(attr_val) and even for c in attr_val:
                        print(ord(c)).

                        tjr
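
Putting the thread's suggestions together, a reader could end up with something like the following Python 3 sketch. It is not code from any of the posters: the function name read_gde, the little-endian "<" prefix and the sample attribute bytes are all assumptions made for illustration, and the data is built in memory instead of being read from test.gde.

```python
import io
import struct

def read_gde(stream):
    """Parse the layout from the first post (little-endian assumed)."""
    signature = stream.read(3)
    version, attr_count = struct.unpack("<II", stream.read(8))
    attrs = []
    for _ in range(attr_count):
        attr_id, attr_val_len = struct.unpack("<II", stream.read(8))
        attrs.append((attr_id, stream.read(attr_val_len)))
    return signature, version, attrs

# Invented sample in the same layout: two attributes of 4 bytes each.
sample = (b"GDE" + struct.pack("<II", 2, 2)
          + struct.pack("<II", 1, 4) + b"\xe9\x00\x00\x00"
          + struct.pack("<II", 2, 4) + b"\xea\x00\xc5\x00")

signature, version, attrs = read_gde(io.BytesIO(sample))
print(signature, version, len(attrs))
for attr_id, attr_val in attrs:
    # repr() and the numeric byte values show the data unambiguously.
    print(attr_id, len(attr_val), repr(attr_val), list(bytearray(attr_val)))
```

If the real values look like these, the parsing was probably correct all along and the 4-byte values are binary data (for example packed numbers) instead of printable text, but only the file's specification can confirm that.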

