Reading Binary Files

**Jon Skeet [C# MVP]** · Dec 28 '05, 06:45 AM

Re: Reading Binary Files

Rohith <Rohith@discussions.microsoft.com> wrote:
> As it's a huge file (nearly 2 GB), it would not be easy to form magic bytes
> that will be present only once.

They'd only have to not be present at the start of the old files.
Compare that with your delimiter idea which relies on the delimiter
*never* being present.

> Also, the text present in the binary file will
> not be the same. So to find the magic bytes, do I have to search through the
> file every time before serializing?

No, you'd look for the magic bytes at the start of the file when
*deserializing*.

--
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

**Rohith** · Dec 28 '05, 07:05 AM

Re: Reading Binary Files

"Jon Skeet [C# MVP]" wrote:
> No, you'd look for the magic bytes at the start of the file when
> *deserializing*.

But how do I ensure that files from the previous version of my application do
not have these magic bytes at the start? Also, how do I identify the
magic bytes?

**Jon Skeet [C# MVP]** · Dec 28 '05, 07:35 AM

Re: Reading Binary Files

Rohith <Rohith@discussions.microsoft.com> wrote:
> "Jon Skeet [C# MVP]" wrote:
> > No, you'd look for the magic bytes at the start of the file when
> > *deserializing*.
>
> But how do I ensure that files from the previous version of my application
> do not have these magic bytes at the start?

Well, what are these files? Many file formats already have a magic
number at the start of the file.

It seems to me that if you're only now considering how to deal with the
problem, then you've got that problem whether you use extra headers or
not. I don't see how your delimiter idea is any better (and it strikes
me as more likely to be a lot worse).

Where are these files coming from? Can you change existing ones when
you upgrade to a new version of your software?

If you generate a random sequence of 16 bytes, the chances of any
existing files happening to start with that same sequence is
*extremely* small (the same as two GUIDs colliding). I suspect that
would actually be good enough, and the best you can do in the situation
you're in.
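
As a minimal sketch of that idea, the snippet below produces 16 random bytes and formats them so they can be pasted into source code as a constant. It is meant to be run once, not once per file. The names `MagicGenerator`, `NewMagic` and `ToLiteral` are hypothetical and not part of any library.

```csharp
using System;
using System.Security.Cryptography;

static class MagicGenerator
{
    // Returns 16 cryptographically random bytes. Run this ONCE and paste the
    // result into your source as a constant; do not call it for every file.
    public static byte[] NewMagic()
    {
        byte[] magic = new byte[16];
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(magic);
        }
        return magic;
    }

    // Formats the bytes as the inside of a C# array initializer.
    public static string ToLiteral(byte[] bytes)
    {
        string[] parts = new string[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
        {
            parts[i] = "0x" + bytes[i].ToString("X2");
        }
        return string.Join(", ", parts);
    }
}
```

Calling `Console.WriteLine(MagicGenerator.ToLiteral(MagicGenerator.NewMagic()))` prints a line that can be copied into a `static readonly byte[]` field.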

> Also, how do I identify the magic bytes?

That's simple - by reading the first 8 bytes (or however long your
magic number is) and seeing whether or not they are the same as the
magic number.
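
A minimal sketch of that check, assuming a 16-byte magic number. The byte values below are placeholders, and `MagicCheck` and `StartsWithMagic` are hypothetical names.

```csharp
using System;
using System.IO;

static class MagicCheck
{
    // Placeholder value: substitute 16 random bytes that you generated once.
    public static readonly byte[] Magic =
    {
        0x3B, 0x9F, 0xD4, 0x17, 0xE2, 0x68, 0xA5, 0x0C,
        0x71, 0xBE, 0x4D, 0x93, 0x26, 0xF8, 0x5A, 0xC1
    };

    // Reads Magic.Length bytes from the stream's current position and
    // reports whether they match the magic number.
    public static bool StartsWithMagic(Stream stream)
    {
        byte[] buffer = new byte[Magic.Length];
        int total = 0;
        // Stream.Read may return fewer bytes than requested, so loop.
        while (total < buffer.Length)
        {
            int read = stream.Read(buffer, total, buffer.Length - total);
            if (read == 0)
            {
                return false; // stream is shorter than the magic number
            }
            total += read;
        }
        for (int i = 0; i < Magic.Length; i++)
        {
            if (buffer[i] != Magic[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

The stream is left positioned just after the bytes that were read, so a caller that gets `false` and wants the whole file needs to seek back to the start.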


**Rohith** · Dec 28 '05, 07:55 AM

Re: Reading Binary Files

> Well, what are these files? Many file formats already have a magic
> number at the start of the file.
>
> It seems to me that if you're only now considering how to deal with the
> problem, then you've got that problem whether you use extra headers or
> not. I don't see how your delimiter idea is any better (and it strikes
> me as more likely to be a lot worse).
>
> Where are these files coming from? Can you change existing ones when
> you upgrade to a new version of your software?

No. I will not even be able to tell whether a file is from the older or the
newer version.

> If you generate a random sequence of 16 bytes, the chances of any
> existing files happening to start with that same sequence is
> *extremely* small (the same as two GUIDs colliding). I suspect that
> would actually be good enough, and the best you can do in the situation
> you're in.

Files from my previous version do not have these magic bytes written. But when I
deserialize them, I will not be in a position to tell whether a file is from the
previous version or the new one. So I will not be able to get the length of the
first file from the magic bytes.

**Jon Skeet [C# MVP]** · Dec 28 '05, 08:05 AM

Re: Reading Binary Files

Rohith <Rohith@discussions.microsoft.com> wrote:
> > Well, what are these files? Many file formats already have a magic
> > number at the start of the file.
> >
> > It seems to me that if you're only now considering how to deal with the
> > problem, then you've got that problem whether you use extra headers or
> > not. I don't see how your delimiter idea is any better (and it strikes
> > me as more likely to be a lot worse).
> >
> > Where are these files coming from? Can you change existing ones when
> > you upgrade to a new version of your software?
>
> No. I will not even be able to tell whether a file is from the older or the
> newer version.

Okay. In that case, you would certainly have no chance with a single-byte
delimiter as you were planning, would you?

> > If you generate a random sequence of 16 bytes, the chances of any
> > existing files happening to start with that same sequence is
> > *extremely* small (the same as two GUIDs colliding). I suspect that
> > would actually be good enough, and the best you can do in the situation
> > you're in.
>
> Files from my previous version do not have these magic bytes written. But when I
> deserialize them, I will not be in a position to tell whether a file is from the
> previous version or the new one. So I will not be able to get the length of the
> first file from the magic bytes.

You can find the length of a whole file very easily (e.g. use
FileStream.Length after opening it, or FileInfo.Length).

Basically, if you start deserializing and don't see the magic number,
the contents is just the whole of the file.

If you *do* see the magic number, you then read whatever header
information you've put into the new files, and deserialize
appropriately.
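
The two branches above could be sketched as follows. This assumes a hypothetical new-file layout of a 16-byte magic number followed by an 8-byte length of the first embedded file; that header layout, the placeholder magic bytes, and the names `ContainerReader` and `LocateFirstPayload` are illustrative assumptions, not something specified in this thread.

```csharp
using System;
using System.IO;

static class ContainerReader
{
    // Placeholder value: substitute 16 random bytes that you generated once.
    public static readonly byte[] Magic =
    {
        0x3B, 0x9F, 0xD4, 0x17, 0xE2, 0x68, 0xA5, 0x0C,
        0x71, 0xBE, 0x4D, 0x93, 0x26, 0xF8, 0x5A, 0xC1
    };

    // Returns the length of the first payload and leaves the (seekable)
    // stream positioned at the first byte of that payload.
    public static long LocateFirstPayload(Stream stream)
    {
        stream.Position = 0;
        byte[] head = new byte[Magic.Length];
        int total = 0;
        while (total < head.Length)
        {
            int n = stream.Read(head, total, head.Length - total);
            if (n == 0) break; // file is shorter than the magic number
            total += n;
        }

        bool isNewFormat = total == Magic.Length;
        for (int i = 0; isNewFormat && i < Magic.Length; i++)
        {
            isNewFormat = head[i] == Magic[i];
        }

        if (!isNewFormat)
        {
            // Old file: the content is the whole of the file.
            stream.Position = 0;
            return stream.Length;
        }

        // New file: read the assumed header (an Int64 length) after the magic.
        // The reader is deliberately not disposed so the stream stays open.
        BinaryReader reader = new BinaryReader(stream);
        return reader.ReadInt64();
    }
}
```

With a `FileStream` opened on the file, the caller reads the returned number of bytes as the first payload. In the new format, anything after that belongs to the rest of the container.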


**Rohith** · Dec 28 '05, 09:15 AM

Re: Reading Binary Files

> Basically, if you start deserializing and don't see the magic number,
> the contents is just the whole of the file.
>
> If you *do* see the magic number, you then read whatever header
> information you've put into the new files, and deserialize
> appropriately.

Every time I serialize a file I have to generate magic bytes (say 16 bytes)
and add them to the file. But at the time of deserializing I won't have that
magic number with me (since I am serializing with new magic bytes every
time, and I can deserialize any file in my application). So with files from the
previous version, when I take the first 16 bytes I will not be able to tell
whether they are magic bytes or actual data.

**Jon Skeet [C# MVP]** · Dec 28 '05, 09:35 AM

Re: Reading Binary Files

Rohith <Rohith@discussions.microsoft.com> wrote:
> > Basically, if you start deserializing and don't see the magic number,
> > the contents is just the whole of the file.
> >
> > If you *do* see the magic number, you then read whatever header
> > information you've put into the new files, and deserialize
> > appropriately.
>
> Every time I serialize a file I have to generate magic bytes (say 16 bytes)
> and add them to the file. But at the time of deserializing I won't have that
> magic number with me (since I am serializing with new magic bytes every
> time, and I can deserialize any file in my application). So with files from the
> previous version, when I take the first 16 bytes I will not be able to tell
> whether they are magic bytes or actual data.

But this is what I was saying before - if you generate a random set of
16 bytes to be your magic number (and that's the same for *every* file
you create) then the chances of another file starting with the exact
same 16 bytes are incredibly small.
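
On the writing side, that means the magic number is a constant compiled into the application rather than something generated per file. A minimal sketch, assuming a hypothetical layout of magic number, then an 8-byte length of the first payload, then the two payloads; the byte values are placeholders and `ContainerWriter` is an illustrative name.

```csharp
using System;
using System.IO;

static class ContainerWriter
{
    // Placeholder value: generated once, then identical for every file written.
    public static readonly byte[] Magic =
    {
        0x3B, 0x9F, 0xD4, 0x17, 0xE2, 0x68, 0xA5, 0x0C,
        0x71, 0xBE, 0x4D, 0x93, 0x26, 0xF8, 0x5A, 0xC1
    };

    // Writes: magic number, length of the first payload, then both payloads.
    public static void Write(Stream output, byte[] first, byte[] second)
    {
        // The writer is deliberately not disposed so the stream stays open.
        BinaryWriter writer = new BinaryWriter(output);
        writer.Write(Magic);              // raw bytes, no length prefix
        writer.Write((long)first.Length); // 8-byte little-endian header field
        writer.Write(first);
        writer.Write(second);
        writer.Flush();
    }
}
```

Byte arrays keep the sketch short. For files approaching 2 GB you would presumably copy from one stream to another in chunks instead of holding the payloads in memory.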

There is absolutely no way to rule that kind of problem out entirely,
because however you decide to serialize your files, there's always
a possibility that there will be an old file with exactly the same
content as a serialized pair of files.

