Binary data representation

  • Charles T.

    Binary data representation

    Hi,

    I am currently writing a serialize/unserialize architecture. The read/write
    functions will read from and write to a binary file.

    My question is: is there some sort of defined standard to use when
    representing data types (int, int32, int64, double, string, etc.)?


    Thanks,




  • Phlip

    #2
    Re: Binary data representation

    Charles T. wrote:
    > I am currently writing a serialize/unserialize architecture. The read/write
    > functions will read from and write to a binary file.

    Why a binary file?
    > My question is: is there some sort of defined standard to use when
    > representing data types (int, int32, int64, double, string, etc.)?

    No. There is not even a "standard" for what order bytes go inside an int.
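
    To make the byte-order point concrete, here is a minimal sketch (not from
    the original post; the function names are made up for illustration) that
    looks at how the host lays out a 32-bit integer in memory. The same value
    gives different answers on different machines, which is why dumping raw
    ints to a file is not portable.

```cpp
#include <cstdint>
#include <cstring>

// Return the byte stored at the lowest address of a 32-bit integer.
// (Hypothetical helper, for illustration only.)
std::uint8_t first_byte_in_memory(std::uint32_t v) {
    std::uint8_t bytes[sizeof v];
    std::memcpy(bytes, &v, sizeof v);  // copy out the object representation
    return bytes[0];
}

// True on little-endian hosts (x86, for example), false otherwise.
bool host_is_little_endian() {
    return first_byte_in_memory(0x01020304u) == 0x04;
}
```

    On a little-endian host the first byte of 0x01020304 is 0x04; on a
    big-endian host it is 0x01.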

    The least heinous data format is XML. You can write very simple or very
    complex data structures in it, and you can read those structures in a text
    editor.

    But XML can be a little obese. Some data formats are compressed XML.

    --
    Phlip



    • AirPete

      #3
      Re: Binary data representation

      [snip]
      > The least heinous data format is XML. You can write very simple or
      > very complex data structures in it, and you can read those structures
      > in a text editor.
      >
      > But XML can be a little obese. Some data formats are compressed XML.

      I would recommend this, also.
      The game Age of Mythology uses XML compressed with zlib-compatible
      compression, and it generates very compact but easily decoded files.
      You can get zlib here:


      - Pete


      • Thomas Matthews

        #4
        Re: Binary data representation

        Charles T. wrote:
        > Hi,
        >
        > I am currently writing a serialize/unserialize architecture. The read/write
        > functions will read from and write to a binary file.

        There has been much discussion of serialization and persistence in
        this newsgroup and in news:comp.lang.c. Use a search engine and look
        for some ideas.

        > My question is: is there some sort of defined standard to use when
        > representing data types (int, int32, int64, double, string, etc.)?

        There is no standard, from platform to platform. On some platforms,
        there may be no standards between OS versions or compiler versions.
        For better portability, write out the data in a consistent form
        (e.g. uint64 == 64 bits, little endian) and let the programs convert
        the data into the native representation.
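
        As a rough sketch of the "consistent form" idea (the function names
        are assumptions for illustration, not from any library), a 64-bit
        value can be written as exactly 8 bytes, least significant byte
        first, using shifts. Shifts work on the value, not on its layout in
        memory, so the same code produces the same bytes on any host.

```cpp
#include <array>
#include <cstdint>

// Encode v as exactly 8 bytes, least significant byte first (little endian),
// regardless of the host's native byte order.
std::array<std::uint8_t, 8> encode_u64_le(std::uint64_t v) {
    std::array<std::uint8_t, 8> out{};
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<std::uint8_t>((v >> (8 * i)) & 0xFF);
    return out;
}

// Rebuild the native value from the 8 little-endian bytes.
std::uint64_t decode_u64_le(const std::array<std::uint8_t, 8>& in) {
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v |= static_cast<std::uint64_t>(in[i]) << (8 * i);
    return v;
}
```

        The writer calls encode_u64_le and writes the 8 bytes; the reader
        reads 8 bytes and calls decode_u64_le. Neither side needs to know
        the other's native representation.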

        Remember, when serializing, that the size of a structure may not
        be the sum of the size of its members. Compilers are allowed to
        add "padding bytes" between members.
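
        A small sketch of the padding issue, using a made-up Record struct
        (an assumption for illustration). The exact amount of padding is up
        to the compiler, so the code only checks what is guaranteed.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record, used only for illustration.
struct Record {
    std::uint8_t  tag;    // 1 byte
    std::uint32_t count;  // 4 bytes, usually aligned to a 4-byte boundary
};

// Bytes occupied by the members alone, ignoring any padding.
constexpr std::size_t packed_size = sizeof(std::uint8_t) + sizeof(std::uint32_t);

// A struct is never smaller than the sum of its members, but it may be larger.
static_assert(sizeof(Record) >= packed_size, "struct smaller than its members");

// Number of padding bytes this particular compiler added to Record.
constexpr std::size_t padding_bytes() { return sizeof(Record) - packed_size; }
```

        Many compilers report sizeof(Record) as 8 rather than 5 here, which
        is why writing the whole struct in one call produces a file that
        another compiler may not read back correctly. Writing each member
        separately avoids the problem.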

        Pointers don't store well. There is only a very small probability
        that an OS will allocate a variable in the same place for each
        execution of a program, so a saved address is meaningless when the
        file is read back.

        Since pointers don't store well, don't store strings as pointers.
        Store text as <quantity, text> or <text, sentinel character>.
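
        A minimal sketch of the <quantity, text> layout, assuming a 4-byte
        little-endian length prefix and an in-memory byte buffer. The names
        Bytes, put_string and get_string are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Append s as <quantity, text>: a 4-byte little-endian length, then the characters.
void put_string(Bytes& out, const std::string& s) {
    assert(s.size() <= 0xFFFFFFFFu);  // the length must fit in 32 bits
    const std::uint32_t n = static_cast<std::uint32_t>(s.size());
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>((n >> (8 * i)) & 0xFF));
    out.insert(out.end(), s.begin(), s.end());
}

// Read a string starting at pos and advance pos past it.
// Returns false (leaving pos and s untouched) if the buffer is too short.
bool get_string(const Bytes& in, std::size_t& pos, std::string& s) {
    if (pos > in.size() || in.size() - pos < 4) return false;
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
        n |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    if (in.size() - pos - 4 < n) return false;  // declared length exceeds the data
    s.assign(in.begin() + pos + 4, in.begin() + pos + 4 + n);
    pos += 4 + static_cast<std::size_t>(n);
    return true;
}
```

        Unlike the <text, sentinel character> form, a length prefix lets the
        text contain any byte value, including zero. The reader should still
        check the declared length against the data actually available, as
        get_string does.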

        See section [35] of the C++ FAQ (about serialization):

        > Thanks,


        --
        Thomas Matthews

        C++ newsgroup welcome message:

        C++ Faq: http://www.parashift.com/c++-faq-lite
        C Faq: http://www.eskimo.com/~scs/c-faq/top.html
        alt.comp.lang.learn.c-c++ faq:

        Other sites:
        http://www.josuttis.com -- C++ STL Library book

        • Gianni Mariani

          #5
          Re: Binary data representation

          Phlip wrote:
          > Charles T. wrote:
          >
          >> I am currently writing a serialize/unserialize architecture. The read/write
          >> functions will read from and write to a binary file.
          >
          > Why a binary file?
          >
          >> My question is: is there some sort of defined standard to use when
          >> representing data types (int, int32, int64, double, string, etc.)?
          >
          > No. There is not even a "standard" for what order bytes go inside an int.
          >
          > The least heinous data format is XML. You can write very simple or very
          > complex data structures in it, and you can read those structures in a text
          > editor.
          >
          > But XML can be a little obese. Some data formats are compressed XML.

          If you're talking about a real-time (streaming) system, the XML overhead
          may be too much of a price to pay.

          In 1999 I built a binary XML format that could be "parsed" in a fraction
          of the time. But for some systems, even this one was too expensive.

          • AirPete

            #6
            Re: Binary data representation

            Gianni Mariani wrote:
            [snip]
            > In 1999 I built a binary XML format that could be "parsed" in a
            > fraction of the time. But for some systems, even this one was too
            > expensive.

            Would you mind posting your implementation? I would be interested in seeing
            it.
            Thanks!

            - Pete


            • Geoff Macartney

              #7
              Re: Binary data representation

              You might want to look (depending on your application area and on
              whether you have time to learn it) at ASN.1, which is an ITU standard to
              provide "a notation for defining data structures [and] a defined
              (machine-independent) encoding for those data structures".

              Have a glance at www-sop.inria.fr/rodeo/personnel/hoschka/asn1.html,
              www.asn1.org, or google will bring back lots of links.
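
              To give a flavour of a machine-independent encoding, here is a
              small sketch of how DER (one of the ASN.1 encoding rules)
              represents a non-negative INTEGER: a tag byte 0x02, a length
              byte, then the value in big-endian two's complement using as
              few bytes as possible. The function name is an assumption for
              illustration, and this covers only this one case, not a
              general ASN.1 encoder.

```cpp
#include <cstdint>
#include <vector>

// Encode a non-negative integer as a DER INTEGER (tag, length, content).
std::vector<std::uint8_t> der_encode_uint(std::uint64_t v) {
    std::vector<std::uint8_t> content;
    do {                               // collect bytes, least significant first
        content.push_back(static_cast<std::uint8_t>(v & 0xFF));
        v >>= 8;
    } while (v != 0);
    if (content.back() & 0x80)         // a set top bit would read as negative,
        content.push_back(0x00);       // so prepend a zero byte

    std::vector<std::uint8_t> out;
    out.push_back(0x02);               // universal tag for INTEGER
    out.push_back(static_cast<std::uint8_t>(content.size()));  // short-form length (at most 9 here)
    out.insert(out.end(), content.rbegin(), content.rend());   // most significant byte first
    return out;
}
```

              With this sketch, 5 encodes as 02 01 05 and 128 encodes as
              02 02 00 80, on any machine.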

              Geoff Macartney

              Charles T. wrote:
              > Hi,
              >
              > I am currently writing a serialize/unserialize architecture. The read/write
              > functions will read from and write to a binary file.
              >
              > My question is: is there some sort of defined standard to use when
              > representing data types (int, int32, int64, double, string, etc.)?
              >
              > Thanks,

              • Martijn Lievaart

                #8
                Re: Binary data representation

                On Wed, 04 Feb 2004 12:21:16 -0500, Gianni Mariani wrote:
                > In 1999 I built a binary XML format that could be "parsed" in a fraction
                > of the time. But for some systems, even this one was too expensive.

                No need to reinvent the wheel, have a look at ASN.1. Parsers abound, BTW.

                M4

                • Gianni Mariani

                  #9
                  Re: Binary data representation

                  Martijn Lievaart wrote:
                  > On Wed, 04 Feb 2004 12:21:16 -0500, Gianni Mariani wrote:
                  >
                  >> In 1999 I built a binary XML format that could be "parsed" in a fraction
                  >> of the time. But for some systems, even this one was too expensive.
                  >
                  > No need to reinvent the wheel, have a look at ASN.1. Parsers abound, BTW.

                  ASN.1 is different - the binary format I'm talking about has a 1:1
                  correlation to XML. The format was simply more efficient to parse than
                  XML text - admittedly the XML parser I wrote was slower than molasses in
                  a blizzard ... :-)

                  • David Rasmussen

                    #10
                    Re: Binary data representation

                    Charles T. wrote:
                    > Hi,
                    >
                    > I am currently writing a serialize/unserialize architecture. The read/write
                    > functions will read from and write to a binary file.
                    >
                    > My question is: is there some sort of defined standard to use when
                    > representing data types (int, int32, int64, double, string, etc.)?

                    I have an application in which the compactness of binary representation
                    (as compared with, say, XML) is important, but where portability of that
                    binary file, regardless of endianness, is also important. My solution is
                    very simple: I just choose an endianness and stick with it, and make sure
                    to write/read one byte at a time to construct/reconstruct the data. It
                    works fine. The binary file is as compact as if I didn't care about
                    portability, and it works with all kinds of endianness. The reading and
                    the writing in principle takes a little longer because of the
                    disassembling/assembling that takes place here, but in practice it is
                    not a problem at all because of buffering. I just read, say, 1k at a
                    time and the problem disappears. Also, there are usually layers of
                    buffering involved anyway, in the OS, in the disk etc.
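
                    A rough sketch of this approach, not the poster's actual
                    code: it assumes big endian as the chosen byte order,
                    32-bit unsigned values, and a chunk that has already been
                    read into memory. The function names are made up for the
                    example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append v to buf as 4 bytes, most significant byte first (big endian).
void put_u32_be(std::vector<std::uint8_t>& buf, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        buf.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
}

// Reassemble every complete 4-byte group in an already-read chunk.
// Trailing bytes that do not form a full value are ignored.
std::vector<std::uint32_t> get_all_u32_be(const std::vector<std::uint8_t>& chunk) {
    std::vector<std::uint32_t> values;
    for (std::size_t i = 0; i + 4 <= chunk.size(); i += 4) {
        std::uint32_t v = 0;
        for (std::size_t k = 0; k < 4; ++k)
            v = (v << 8) | chunk[i + k];  // shift in one byte at a time
        values.push_back(v);
    }
    return values;
}
```

                    The writer fills a buffer with put_u32_be and writes it
                    out in one call; the reader reads a chunk and decodes it
                    in memory, so the byte-at-a-time work never touches the
                    file directly.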

                    /David

                    • Martijn Lievaart

                      #11
                      Re: Binary data representation

                      On Wed, 04 Feb 2004 19:50:36 -0500, Gianni Mariani wrote:
                      > Martijn Lievaart wrote:
                      >> On Wed, 04 Feb 2004 12:21:16 -0500, Gianni Mariani wrote:
                      >>
                      >>> In 1999 I built a binary XML format that could be "parsed" in a fraction
                      >>> of the time. But for some systems, even this one was too expensive.
                      >>
                      >> No need to reinvent the wheel, have a look at ASN.1. Parsers abound, BTW.
                      >
                      > ASN.1 is different - the binary format I'm talking about has a 1:1
                      > correlation to XML. The format was simply more efficient to parse than
                      > XML text - admittedly the XML parser I wrote was slower than molasses in
                      > a blizzard ... :-)

                      I think ASN.1 can easily handle binary XML. Something like this (it's been
                      a while since I worked with ASN.1, so the terminology is likely to be incorrect):

                      list xmlentitydef
                      list xmltagdef
                      utf8 xmltag
                      list xmlattrdef
                      utf8 attrkey
                      utf8 attrval
                      list xmlattr
                      utf8 attrkey
                      utf8 attrval
                      utf8 entitybody

                      Entity body could itself be a list with entities. Xmltag and xmlattrdef
                      could probably use binary tags if there are only a few possible tags, thus
                      saving greatly on space (and processing time).

                      I don't think you can get very much more efficient than that.

                      M4

                      • Charles T.

                        #12
                        Re: Binary data representation

                        Thanks for the responses,

                        I will take a look at the ASN.1 stuff.


                        "Geoff Macartney" <geoffrey.dot.macartney@openwavedotcom.nospam> wrote in
                        message news:E9eUb.47484$OA3.14831320@newsfep2-win.server.ntli.net...
                        > You might want to look (depending on your application area and on
                        > whether you have time to learn it) at ASN.1, which is an ITU standard to
                        > provide "a notation for defining data structures [and] a defined
                        > (machine-independent) encoding for those data structures".
                        >
                        > Have a glance at www-sop.inria.fr/rodeo/personnel/hoschka/asn1.html,
                        > www.asn1.org, or google will bring back lots of links.
                        >
                        > Geoff Macartney
                        >
                        > Charles T. wrote:
                        >
                        > > Hi,
                        > >
                        > > I am currently writing a serialize/unserialize architecture. The read/write
                        > > functions will read from and write to a binary file.
                        > >
                        > > My question is: is there some sort of defined standard to use when
                        > > representing data types (int, int32, int64, double, string, etc.)?
                        > >
                        > > Thanks,

