Reading Binary Files

  • Rohith

    Reading Binary Files

    I need to split a large binary file into two binary files. The file contains
    a delimiter (say, a newline). I need to split it so that the first file runs
    up to the newline and the second file runs from the newline to the end of the
    file. Kindly let me know whether this is possible.

    Thanks
    Rohith
  • David Browne

    #2
    Re: Reading Binary Files

    "Rohith" <Rohith@discussions.microsoft.com> wrote in message
    news:BD46DA29-EFF5-41A6-990F-9E719497D0AF@microsoft.com...
    > I need to split a large binary file into two binary files. I have a
    > delimiter (say NewLine) in the binary file. I need to split the binary
    > file such that the first file is up to the NewLine and the second file
    > is from NewLine to end of file. Kindly let me know whether this is
    > possible

    Just open a System.IO.FileStream against the file. Read it in chunks
    into a byte[] and examine each chunk for your delimiter. Write the chunks
    to a first FileStream and then, once the delimiter has been seen, to a second one.
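
As a rough illustration of the chunked approach described above, here is a minimal sketch. It is written in Python for brevity rather than against System.IO.FileStream, and it makes assumptions the thread does not settle: the delimiter is a single byte, only its first occurrence matters, and the delimiter itself is dropped from both output files. The function name and parameters are hypothetical.

```python
def split_at_delimiter(src_path, first_path, second_path,
                       delimiter=b"\n", chunk_size=64 * 1024):
    """Copy bytes before the first delimiter to first_path and the bytes
    after it to second_path. Returns True if the delimiter was found.

    Assumption: single-byte delimiter, which is not written to either file.
    """
    if len(delimiter) != 1:
        raise ValueError("this sketch only handles a single-byte delimiter")
    found = False
    with open(src_path, "rb") as src, \
         open(first_path, "wb") as first, \
         open(second_path, "wb") as second:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            if found:
                # Delimiter already seen: everything else belongs to file two.
                second.write(chunk)
                continue
            pos = chunk.find(delimiter)
            if pos == -1:
                first.write(chunk)
            else:
                found = True
                first.write(chunk[:pos])
                second.write(chunk[pos + 1:])
    return found
```

Only one chunk is held in memory at a time, so the size of the input file does not affect memory use.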

    David



    • Rohith

      #3
      Re: Reading Binary Files

      Yes, this will work. But I have a huge binary file, nearly 1 GB. Is there an
      alternative way to find the delimiter position with checking every
      chunk in a loop?

      • Rohith

        #4
        Re: Reading Binary Files

        Sorry, without looping through every chunk of data.


        • David Browne

          #5
          Re: Reading Binary Files

          "Rohith" <Rohith@discussions.microsoft.com> wrote in message
          news:86F94E05-1F4E-4D86-B9BF-CEC70091222F@microsoft.com...
          > Sorry, without looping through every chunk of data.

          I don't see how. How would you know where the delimiter is?

          David



          • Rohith

            #6
            Re: Reading Binary Files

            Thanks David. If I have to check every byte for a delimiter, it would
            be a huge performance hit. I was actually wondering whether there is
            an alternative solution for this?

            • Bill Butler

              #7
              Re: Reading Binary Files

              "Rohith" <Rohith@discussions.microsoft.com> wrote in message
              news:2D3272B7-173D-4504-AE19-5F77F892E130@microsoft.com...
              > Thanks David. If checking through every byte for a delimiter, then it would
              > be a huge performance blow...I was actually confused whether there is
              > alternate solution for this?

              First:
              Are you sure that the binary data CANNOT contain a newline?
              If it can.....Oh well.

              Are there any data-length headers embedded in the binary data?
              If so, you may be able to seek right to the position you need.
              Most binary formats have header fields that provide data lengths or offsets.

              Without knowing anything about the structure of this file, it is difficult to be more helpful.

              Good luck
              Bill



              • Rohith

                #8
                Re: Reading Binary Files

                Hi Bill,

                "Bill Butler" wrote:

                > First:
                > Are you sure that the binary data CANNOT contain a newline?
                > If it can.....Oh well.

                The delimiter need not be a newline. Actually, my requirement is that I have to
                serialize two binary files into a single binary file and then deserialize it.
                The delimiter can be anything, for that matter. I just need a way to find the
                position of the delimiter in that file.

                > Are there any data length headers embedded in the Binary data?
                > If so, you can possibly seek right to the position you need.
                > Most binary files have header fields that provide datalengths or offsets.

                The thing is that I will not know the actual position. I will
                know only the delimiter.

                > Without knowing anything about the structure of this file, it is difficult to be more helpful.

                Regarding the structure, it's only a raw chunk of bytes.

                Thanks
                Rohith


                • David Browne

                  #9
                  Re: Reading Binary Files

                  Typically you would prepend a header to the file indicating, say, the
                  number of files contained, their names, and their offsets. Then you can read
                  the header and seek straight to each file's offset.
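
To make the header idea concrete, here is a minimal sketch in Python (used for brevity; the thread itself is about .NET). The layout is an assumption invented for illustration, not an established format: a 4-byte little-endian file count, then for each file a 2-byte name length, the UTF-8 name, an 8-byte offset and an 8-byte length, followed by the raw payloads. The function names `pack`, `read_index`, and `extract` are hypothetical.

```python
import os
import struct

def pack(out_path, in_paths, chunk_size=64 * 1024):
    """Write a header (count, names, offsets, lengths), then the payloads."""
    names = [os.path.basename(p).encode("utf-8") for p in in_paths]
    sizes = [os.path.getsize(p) for p in in_paths]
    # 4-byte count + per entry: 2-byte name length, name, 8-byte offset, 8-byte length
    header_size = 4 + sum(2 + len(n) + 16 for n in names)
    with open(out_path, "wb") as out:
        out.write(struct.pack("<I", len(in_paths)))
        offset = header_size
        for name, size in zip(names, sizes):
            out.write(struct.pack("<H", len(name)) + name)
            out.write(struct.pack("<QQ", offset, size))
            offset += size
        for path in in_paths:
            with open(path, "rb") as src:
                while chunk := src.read(chunk_size):
                    out.write(chunk)

def read_index(path):
    """Return a list of (name, offset, length) read from the header only."""
    with open(path, "rb") as f:
        (count,) = struct.unpack("<I", f.read(4))
        index = []
        for _ in range(count):
            (name_len,) = struct.unpack("<H", f.read(2))
            name = f.read(name_len).decode("utf-8")
            offset, size = struct.unpack("<QQ", f.read(16))
            index.append((name, offset, size))
    return index

def extract(path, offset, size, out_path, chunk_size=64 * 1024):
    """Seek directly to one payload and copy it out; no scanning needed."""
    with open(path, "rb") as f, open(out_path, "wb") as out:
        f.seek(offset)
        remaining = size
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                raise EOFError("container is truncated")
            out.write(chunk)
            remaining -= len(chunk)
```

With this layout, locating the second file costs one small header read and one seek, regardless of how large the payloads are.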

                  David



                  • Bill Butler

                    #10
                    Re: Reading Binary Files

                    "Rohith" <Rohith@discussions.microsoft.com> wrote in message
                    news:E2679DD9-ACE1-47C4-ACD2-FB340D2BA135@microsoft.com...
                    <snip>
                    > NewLine need not be a delimiter. Actually my requirement is that I have to
                    > serialize two binary files in a single binary file and then deserialize it.
                    > The delimiter can be anything for that matter. I just need a way to find the
                    > position of the delimiter in that file.

                    If YOU are the one responsible for combining the data and then separating it, is there any reason
                    why you can't have a header in the file? If you could include a header, you could easily include the
                    sizes/offsets of the raw chunks. Then you would have no need for a delimiter.

                    If your hands are tied and you can do nothing more than a delimiter, then you have problems. You
                    need to choose a delimiter that CANNOT exist in the binary data, but ANY value can exist in binary
                    data. You would need to scan your data to make sure that the delimiter is acceptable, and then find
                    a way to keep track of what the delimiter was.

                    If your only option is to use a delimiter, you have no choice but to search for it linearly,
                    and you may need a multi-byte delimiter if every 8-bit value occurs in the data.
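
The scan described above, checking which byte values are free to serve as a delimiter, might look like the following sketch. It is in Python for brevity, and the function name `unused_byte_values` is hypothetical. It assumes the data being scanned is the payload that will sit before the delimiter.

```python
def unused_byte_values(path, chunk_size=64 * 1024):
    """Return the byte values (0-255) that never occur in the file.

    Any value in the result could act as a single-byte delimiter for
    this particular file. An empty result means every 8-bit value is
    present, so a multi-byte delimiter would be needed instead.
    """
    seen = bytearray(256)
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            # set() collapses the chunk to its distinct byte values first.
            for value in set(chunk):
                seen[value] = 1
    return [value for value in range(256) if not seen[value]]
```

Note that this is itself a full linear pass over the data, and the chosen delimiter still has to be recorded somewhere, which is part of why a header tends to be the simpler design.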

                    I personally would fight for the header.

                    Good luck,
                    Bill




                    • Jon Skeet [C# MVP]

                      #11
                      Re: Reading Binary Files

                      Rohith <Rohith@discussions.microsoft.com> wrote:
                      > Thanks David. If checking through every byte for a delimiter, then it would
                      > be a huge performance blow...I was actually confused whether there is
                      > alternate solution for this?

                      The cost of looking through memory is likely to be much smaller than
                      the IO cost in the first place.

                      As Bill suggested though, if you're the one who gets to combine the
                      files, it's easy - just include the lengths of each file.

                      --
                      Jon Skeet - <skeet@pobox.com>
                      http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
                      If replying to the group, please do not mail me too


                      • Chris Dunaway

                        #12
                        Re: Reading Binary Files

                        You're gonna have to read the file to split it anyway. If there is no
                        header that tells where the delimiter is or if you cannot create one,
                        then you will have to read the file in manually.

                        Typically, you would read a certain amount at a time into a memory
                        buffer, for example 4K, then search that buffer for the delimiter.

                        The performance should not be too bad.
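
A sketch of the buffered search described above, in Python for brevity. It uses the 4K buffer size from the post and, as an added assumption, supports a multi-byte delimiter by carrying the last `len(delimiter) - 1` bytes of each buffer over to the next read, so a delimiter that straddles two reads is still found. The function name `find_delimiter` is hypothetical.

```python
def find_delimiter(path, delimiter, chunk_size=4096):
    """Return the file offset of the first occurrence of delimiter, or -1."""
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    keep = len(delimiter) - 1   # bytes to carry across a chunk boundary
    tail = b""
    base = 0                    # file offset of the first byte of the window
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return -1
            window = tail + chunk
            pos = window.find(delimiter)
            if pos != -1:
                return base + pos
            # The carried tail is shorter than the delimiter, so it can
            # never contain a complete match on its own.
            tail = window[-keep:] if keep else b""
            base += len(window) - len(tail)
```

Once the offset is known, the two halves can be copied out with plain sequential reads on either side of it.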


                        • Rohith

                          #13
                          Re: Reading Binary Files

                          Thanks for the replies.

                          I would not be able to add a header to the files, as I have a set of binary
                          files from a previous version of my application that do not have a header. My new
                          requirement is to combine two separate binary files into a single binary file and
                          deserialize them. But if I add a header to the new files, I will not be able to
                          identify which files to split and which not.

                          • Bill Butler

                            #14
                            Re: Reading Binary Files

                            "Rohith" <Rohith@discussions.microsoft.com> wrote in message
                            news:31848E1B-AADD-4C2E-9133-BF0BDD9CA597@microsoft.com...
                            > Thanks for the Replies.
                            >
                            > I would not be able to add header to the files, as I have a set of previous
                            > version(of my Application) binary files that does not have header. Now my new
                            > requirement is to add two separate binary files in a single binary file and
                            > deserialize them. But If i add header to the new files i will not be able to
                            > identify which files to split and which not.

                            Sure you can.

                            Add the header to the compound files only.
                            Start it off with a MAGIC string of bytes that is always the same.
                            Although there is a tiny possibility of incorrectly identifying a simple file as compound, you
                            can control how tiny by extending the length of the MAGIC string.

                            Also:

                            If the binary data is not random, some sequences of bytes will be FAR more likely than
                            others. Careful selection of the MAGIC string can effectively eliminate false positives.
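
A minimal sketch of the magic-header idea, in Python for brevity. Everything specific here is an assumption made for illustration: the 9-byte `MAGIC` value, the 8-byte little-endian length field that follows it, and the function names. The point it shows is that the magic bytes sit at a fixed position at the start of the file, so telling a compound file from a legacy one means comparing the first few bytes, not searching the whole file.

```python
import os
import shutil
import struct

# Hypothetical signature; any fixed byte string unlikely to begin a legacy file.
MAGIC = b"\x89CMPD\r\n\x1a\n"

def write_compound(first_path, second_path, out_path):
    """Write MAGIC, the length of the first file, then both payloads."""
    with open(out_path, "wb") as out:
        out.write(MAGIC)
        out.write(struct.pack("<Q", os.path.getsize(first_path)))
        for path in (first_path, second_path):
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)

def read_compound_header(path):
    """Return (offset_of_first, length_of_first), or None for a legacy file.

    Only the first len(MAGIC) + 8 bytes are read, whatever the file size.
    """
    with open(path, "rb") as f:
        if f.read(len(MAGIC)) != MAGIC:
            return None
        raw = f.read(8)
        if len(raw) != 8:
            raise ValueError("compound header is truncated")
        (first_len,) = struct.unpack("<Q", raw)
    return len(MAGIC) + 8, first_len
```

The second payload starts at `offset_of_first + length_of_first` and runs to the end of the file. If the leading bytes of legacy files were uniformly random, an n-byte magic string would match by accident with probability 256^-n, which is the sense in which lengthening the string controls the false-positive rate.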

                            Good Luck
                            Bill



                            • Rohith

                              #15
                              Re: Reading Binary Files

                              As it's a huge file (nearly 2 GB), it would not be easy to form magic bytes
                              that will be present only once. Also, the text present in the binary file will
                              not always be the same. So, to find the magic bytes, do I have to search through the
                              file every time before serializing?