Reading Binary Files

This topic is closed.
  • Jon Skeet [C# MVP]

    #16
    Re: Reading Binary Files

    Rohith <Rohith@discussions.microsoft.com> wrote:
    > As it's a huge file (nearly 2GB), it would not be easy to form magic bytes
    > that will be present only once.

    They'd only have to not be present at the start of the old files.
    Compare that with your delimiter idea which relies on the delimiter
    *never* being present.

    > Also, the text present in the binary file will
    > not be the same. So to find the magic bytes, do I have to search through the
    > file every time before serializing?

    No, you'd look for the magic bytes at the start of the file when
    *deserializing*.

    --
    Jon Skeet - <skeet@pobox.com>
    http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
    If replying to the group, please do not mail me too


    • Rohith

      #17
      Re: Reading Binary Files


      "Jon Skeet [C# MVP]" wrote:
      > No, you'd look for the magic bytes at the start of the file when
      > *deserializing*.

      But how do I ensure that files from the previous version of my application
      do not have these magic bytes at the start of the file? Also, how do I
      identify magic bytes?


      • Jon Skeet [C# MVP]

        #18
        Re: Reading Binary Files

        Rohith <Rohith@discussions.microsoft.com> wrote:
        > "Jon Skeet [C# MVP]" wrote:
        > > No, you'd look for the magic bytes at the start of the file when
        > > *deserializing*.
        >
        > But how do I ensure that files from the previous version of my application
        > do not have these magic bytes at the start of the file?

        Well, what are these files? Many file formats already have a magic
        number at the start of the file.

        It seems to me that if you're only now considering how to deal with the
        problem, then you've got that problem whether you use extra headers or
        not. I don't see how your delimiter idea is any better (and it strikes
        me as more likely to be a lot worse).

        Where are these files coming from? Can you change existing ones when
        you upgrade to a new version of your software?

        If you generate a random sequence of 16 bytes, the chance of any
        existing file happening to start with that same sequence is
        *extremely* small (the same as two GUIDs colliding). I suspect that
        would actually be good enough, and the best you can do in the situation
        you're in.

        > Also, how do I identify magic bytes?

        That's simple - by reading the first 8 bytes (or however long your
        magic number is) and seeing whether or not they are the same as the
        magic number.
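
A minimal sketch of that check, written in Python rather than the C# this thread is about. The `MAGIC` value and the `starts_with_magic` name are hypothetical, chosen only for illustration. The one thing that matters is that the 16 bytes are generated once and then hard-coded:

```python
# Hypothetical 16-byte magic number. Generate it once (for example with
# os.urandom(16)), then hard-code it so that every new-format file starts
# with exactly the same bytes.
MAGIC = bytes.fromhex("8f3ac1d27b4e4f0a9c6615e2d0b7a4f1")


def starts_with_magic(path):
    """Return True if the file at `path` begins with MAGIC."""
    with open(path, "rb") as f:
        # A file shorter than the magic number simply yields fewer bytes,
        # so the comparison is False for it.
        return f.read(len(MAGIC)) == MAGIC
```

Only the first `len(MAGIC)` bytes are read, so the cost of the check does not depend on the size of the file.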

        --
        Jon Skeet - <skeet@pobox.com>
        http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
        If replying to the group, please do not mail me too


        • Rohith

          #19
          Re: Reading Binary Files

          > Well, what are these files? Many file formats already have a magic
          > number at the start of the file.
          > It seems to me that if you're only now considering how to deal with the
          > problem, then you've got that problem whether you use extra headers or
          > not. I don't see how your delimiter idea is any better (and it strikes
          > me as more likely to be a lot worse).
          >
          > Where are these files coming from? Can you change existing ones when
          > you upgrade to a new version of your software?

          No. I will not even be able to tell whether a file is from the older or
          the newer version.

          > If you generate a random sequence of 16 bytes, the chance of any
          > existing file happening to start with that same sequence is
          > *extremely* small (the same as two GUIDs colliding). I suspect that
          > would actually be good enough, and the best you can do in the situation
          > you're in.

          My previous-version files do not have these magic bytes written. But when I
          deserialize them I will not be in a position to tell whether a file is from
          the previous version or the new version. So I will not be able to get the
          length of the first file from the magic bytes.



          • Jon Skeet [C# MVP]

            #20
            Re: Reading Binary Files

            Rohith <Rohith@discussions.microsoft.com> wrote:
            > > Well, what are these files? Many file formats already have a magic
            > > number at the start of the file.
            > > It seems to me that if you're only now considering how to deal with the
            > > problem, then you've got that problem whether you use extra headers or
            > > not. I don't see how your delimiter idea is any better (and it strikes
            > > me as more likely to be a lot worse).
            > >
            > > Where are these files coming from? Can you change existing ones when
            > > you upgrade to a new version of your software?
            >
            > No. I will not even be able to tell whether a file is from the older or
            > the newer version.

            Okay. In that case, you would certainly have no chance with a single-
            byte delimiter as you were planning, would you?

            > > If you generate a random sequence of 16 bytes, the chance of any
            > > existing file happening to start with that same sequence is
            > > *extremely* small (the same as two GUIDs colliding). I suspect that
            > > would actually be good enough, and the best you can do in the situation
            > > you're in.
            >
            > My previous-version files do not have these magic bytes written. But when I
            > deserialize them I will not be in a position to tell whether a file is from
            > the previous version or the new version. So I will not be able to get the
            > length of the first file from the magic bytes.

            You can find the length of a whole file very easily (e.g. use
            FileStream.Length after opening it, or FileInfo.Length).

            Basically, if you start deserializing and don't see the magic number,
            the content is just the whole of the file.

            If you *do* see the magic number, you then read whatever header
            information you've put into the new files, and deserialize
            appropriately.
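
That two-way decision can be sketched as follows, in Python rather than C#. The header layout is an assumption made purely for illustration (the thread never specifies one): the magic number, then an 8-byte little-endian length of the first part, then the first part, then the remainder. `MAGIC`, `HEADER` and `read_parts` are hypothetical names.

```python
import struct

# Hypothetical fixed magic number, identical in every new-format file.
MAGIC = bytes.fromhex("8f3ac1d27b4e4f0a9c6615e2d0b7a4f1")
# Assumed header field: unsigned 64-bit little-endian length of the first part.
HEADER = struct.Struct("<Q")


def read_parts(path):
    """Return (first, second); `second` is None for an old-format file."""
    with open(path, "rb") as f:
        prefix = f.read(len(MAGIC))
        if prefix != MAGIC:
            # Old format: there is no header, so the whole file is the content.
            return prefix + f.read(), None
        # New format: the header says where the first part ends.
        # (A truncated header would raise struct.error here.)
        (first_len,) = HEADER.unpack(f.read(HEADER.size))
        first = f.read(first_len)
        second = f.read()
        return first, second
```

For a file approaching 2GB you would stream each part rather than read it into memory. The sketch reads everything at once only to keep the branching visible.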

            --
            Jon Skeet - <skeet@pobox.com>
            http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
            If replying to the group, please do not mail me too


            • Rohith

              #21
              Re: Reading Binary Files


              > Basically, if you start deserializing and don't see the magic number,
              > the content is just the whole of the file.
              >
              > If you *do* see the magic number, you then read whatever header
              > information you've put into the new files, and deserialize
              > appropriately.
              >
              Every time I serialize a file I have to generate magic bytes (say 16 bytes)
              and add them to the file. But at the time of deserializing I won't have that
              magic number with me (since I am serializing with new magic bytes every
              time, and I can deserialize any file in my application). So with the
              previous-version files, if I take the first 16 bytes I will not be able to
              tell whether they are magic bytes or actual data.


              • Jon Skeet [C# MVP]

                #22
                Re: Reading Binary Files

                Rohith <Rohith@discussions.microsoft.com> wrote:
                > > Basically, if you start deserializing and don't see the magic number,
                > > the content is just the whole of the file.
                > >
                > > If you *do* see the magic number, you then read whatever header
                > > information you've put into the new files, and deserialize
                > > appropriately.
                > >
                > Every time I serialize a file I have to generate magic bytes (say 16 bytes)
                > and add them to the file. But at the time of deserializing I won't have that
                > magic number with me (since I am serializing with new magic bytes every
                > time, and I can deserialize any file in my application). So with the
                > previous-version files, if I take the first 16 bytes I will not be able to
                > tell whether they are magic bytes or actual data.

                But this is what I was saying before - if you generate a random set of
                16 bytes to be your magic number (and that's the same for *every* file
                you create) then the chances of another file starting with the exact
                same 16 bytes are incredibly small.
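
To make the "same for every file" point concrete, here is a hedged sketch of the writing side, in Python rather than C#. The magic number is a constant compiled into the application, not a value generated per file. The layout (magic, then an 8-byte little-endian length of the first part, then both parts) and the names `MAGIC` and `write_parts` are assumptions for illustration only.

```python
import struct

# Hypothetical constant: generated randomly ONCE during development and then
# hard-coded, so the reader always knows which 16 bytes to look for.
MAGIC = bytes.fromhex("8f3ac1d27b4e4f0a9c6615e2d0b7a4f1")


def write_parts(path, first, second):
    """Write a new-format file: magic, length of `first`, `first`, `second`."""
    with open(path, "wb") as f:
        f.write(MAGIC)                          # identical in every file
        f.write(struct.pack("<Q", len(first)))  # assumed header field
        f.write(first)
        f.write(second)
```

Because the constant never changes, nothing has to be remembered between serializing and deserializing. The reader compares the first 16 bytes of any file against the same `MAGIC`.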

                There is absolutely no way round that kind of problem being present at
                all, because however you decide to serialize your files, there's always
                a possibility that there will be an old file with exactly the same
                content as a serialized pair of files.

                --
                Jon Skeet - <skeet@pobox.com>
                http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
                If replying to the group, please do not mail me too
