binary diff

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Ching-Lung

    binary diff

    Hi all,

    I try to create a tool to check the delta (diff) of 2
    binaries and create the delta binary. I use binary
    formatter (serialization) to create the delta binary. It
    works fine but the delta binary is pretty huge in size. I
    have 1 byte file and 2 bytes file, the delta should be 1
    byte but somehow it turns out to be 249 bytes using binary
    formatter. I guess serialization has some other things
    added to the delta file.

    Is there any better way to create delta binary from 2
    given files, but the delta has to be able to be
    constructed back to the original file? My current
    solution, using serialization binary formatter is great
    and easy but the size of the delta file... *sigh*

    Please advice, thanks!
    -CL
  • Bruno Jouhier [MVP]

    #2
    Re: binary diff

    Binary diff is a tough problem. So far, the best paper I have seen on the
    subject is Andrew Tridgell's paper on rsync. It actually tackles a slightly
    more difficult problem that just binary diff, i.e. binary diff between local
    and remote file, but the general idea (rolling checksum + strong checksum)
    can be used to compute a binary diff between two local files in an efficient
    way.

    Andrew's paper is on http://samba.org/rsync/tech_report/

    I don't undestand how binary serialization would help you implement binary
    diff. Seems like it is only going to make matters worse, but maybe you want
    to do something slightly different than standard binary diff.

    Bruno.

    "Ching-Lung" <chinglung@alum ni.cs.utexas.ed u> a écrit dans le message de
    news:10d101c389 67$960e1d10$a30 1280a@phx.gbl.. .[color=blue]
    > Hi all,
    >
    > I try to create a tool to check the delta (diff) of 2
    > binaries and create the delta binary. I use binary
    > formatter (serialization) to create the delta binary. It
    > works fine but the delta binary is pretty huge in size. I
    > have 1 byte file and 2 bytes file, the delta should be 1
    > byte but somehow it turns out to be 249 bytes using binary
    > formatter. I guess serialization has some other things
    > added to the delta file.
    >
    > Is there any better way to create delta binary from 2
    > given files, but the delta has to be able to be
    > constructed back to the original file? My current
    > solution, using serialization binary formatter is great
    > and easy but the size of the delta file... *sigh*
    >
    > Please advice, thanks!
    > -CL[/color]


    Comment

    • William Stacey

      #3
      Re: binary diff

      Instead of serializing the byte[] diff, why can't you just write that to a
      binary diff file (i.e. no serialization)?

      --
      William Stacey, DNS MVP

      "Ching-Lung" <chinglung@alum ni.cs.utexas.ed u> wrote in message
      news:10d101c389 67$960e1d10$a30 1280a@phx.gbl.. .[color=blue]
      > Hi all,
      >
      > I try to create a tool to check the delta (diff) of 2
      > binaries and create the delta binary. I use binary
      > formatter (serialization) to create the delta binary. It
      > works fine but the delta binary is pretty huge in size. I
      > have 1 byte file and 2 bytes file, the delta should be 1
      > byte but somehow it turns out to be 249 bytes using binary
      > formatter. I guess serialization has some other things
      > added to the delta file.
      >
      > Is there any better way to create delta binary from 2
      > given files, but the delta has to be able to be
      > constructed back to the original file? My current
      > solution, using serialization binary formatter is great
      > and easy but the size of the delta file... *sigh*
      >
      > Please advice, thanks!
      > -CL[/color]


      Comment

      • Ching-Lung

        #4
        Re: binary diff

        Hi Bruno,

        "...I don't undestand how binary serialization would help
        you implement binary diff..."

        I compare each byte from byte[] of file A and byte[] of
        file B. If they are diff then I store the byte diff in a
        hashtable (key = offset, value = byte diff). There are
        some other cases to consider, i.e. if the size of those 2
        files are not the same, etc. Once done, I serialize the
        hashtable.

        To reconstruct file B from file A with this delta binary,
        I deserialize the hashtable, do byte comparison between
        file A and the hashtable with one assumption that file A
        is the same as file A I created the binary diff from.

        "...but maybe you want to do something slightly different
        than standard binary diff..."

        I don't know much about standard binary diff, but I assume
        that it's all described in Andrew's paper. I'll check it
        out soon.

        Thanks!
        -CL



        [color=blue]
        >-----Original Message-----
        >Binary diff is a tough problem. So far, the best paper I [/color]
        have seen on the[color=blue]
        >subject is Andrew Tridgell's paper on rsync. It actually [/color]
        tackles a slightly[color=blue]
        >more difficult problem that just binary diff, i.e. binary [/color]
        diff between local[color=blue]
        >and remote file, but the general idea (rolling checksum + [/color]
        strong checksum)[color=blue]
        >can be used to compute a binary diff between two local [/color]
        files in an efficient[color=blue]
        >way.
        >
        >Andrew's paper is on http://samba.org/rsync/tech_report/
        >
        >I don't undestand how binary serialization would help you [/color]
        implement binary[color=blue]
        >diff. Seems like it is only going to make matters worse, [/color]
        but maybe you want[color=blue]
        >to do something slightly different than standard binary [/color]
        diff.[color=blue]
        >
        >Bruno.
        >
        >"Ching-Lung" <chinglung@alum ni.cs.utexas.ed u> a écrit [/color]
        dans le message de[color=blue]
        >news:10d101c38 967$960e1d10$a3 01280a@phx.gbl. ..[color=green]
        >> Hi all,
        >>
        >> I try to create a tool to check the delta (diff) of 2
        >> binaries and create the delta binary. I use binary
        >> formatter (serialization) to create the delta binary. [/color][/color]
        It[color=blue][color=green]
        >> works fine but the delta binary is pretty huge in [/color][/color]
        size. I[color=blue][color=green]
        >> have 1 byte file and 2 bytes file, the delta should be 1
        >> byte but somehow it turns out to be 249 bytes using [/color][/color]
        binary[color=blue][color=green]
        >> formatter. I guess serialization has some other things
        >> added to the delta file.
        >>
        >> Is there any better way to create delta binary from 2
        >> given files, but the delta has to be able to be
        >> constructed back to the original file? My current
        >> solution, using serialization binary formatter is great
        >> and easy but the size of the delta file... *sigh*
        >>
        >> Please advice, thanks!
        >> -CL[/color]
        >
        >
        >.
        >[/color]

        Comment

        • Ching-Lung

          #5
          Re: binary diff

          Hi Will,

          I actually store the byte diff in a hashtable (key =
          offset, value = byte diff). With serialization, it's
          easier to maintain the state of the hashtable.

          I was once thought about put everything in one byte[] and,
          like you suggested, no serialization-write it to a file.
          But this way, for each byte diff, I have to store the
          offset, length of the byte diff, and the byte itself. And
          to reconstruct file B back, I have to have 3 readings to
          get the offset, length, and read the next length byte from
          get the byte diff.

          File A:
          +-------------------+
          | A | B | C | D | E |
          +-------------------+

          File B:
          +-------------------+
          | A | b | c | D | e |
          +-------------------+

          byte[] diff:
          +---------------------------+
          | 1 | 2 | b | c | 4 | 1 | e |
          +---------------------------+

          So:
          - diff[0] = 1 indicates that A[1] has changed
          - diff[1] = 2 indicates that I have to read the next 2
          bytes from diff[] to get 'b' and 'c' and stick them into
          file A
          - and this goes on and on...

          Isn't it too much work to do?

          Well, above is just one example and the size of file A and
          B happen to be the same. One also needs to consider other
          cases.

          Any more idea?
          -CL




          [color=blue]
          >-----Original Message-----
          >Instead of serializing the byte[] diff, why can't you[/color]
          just write that to a[color=blue]
          >binary diff file (i.e. no serialization)?
          >
          >--
          >William Stacey, DNS MVP
          >
          >"Ching-Lung" <chinglung@alum ni.cs.utexas.ed u> wrote in[/color]
          message[color=blue]
          >news:10d101c38 967$960e1d10$a3 01280a@phx.gbl. ..[color=green]
          >> Hi all,
          >>
          >> I try to create a tool to check the delta (diff) of 2
          >> binaries and create the delta binary. I use binary
          >> formatter (serialization) to create the delta binary.[/color][/color]
          It[color=blue][color=green]
          >> works fine but the delta binary is pretty huge in[/color][/color]
          size. I[color=blue][color=green]
          >> have 1 byte file and 2 bytes file, the delta should be 1
          >> byte but somehow it turns out to be 249 bytes using[/color][/color]
          binary[color=blue][color=green]
          >> formatter. I guess serialization has some other things
          >> added to the delta file.
          >>
          >> Is there any better way to create delta binary from 2
          >> given files, but the delta has to be able to be
          >> constructed back to the original file? My current
          >> solution, using serialization binary formatter is great
          >> and easy but the size of the delta file... *sigh*
          >>
          >> Please advice, thanks!
          >> -CL[/color][/color]

          Comment

          • Bruno Jouhier [MVP]

            #6
            Re: binary diff

            Your algorithm will work and produce a diff that is significantly smaller
            than the complete files only in very special circumstances, i.e., if a few
            isolated bytes get replaced by some other bytes in your file. But it does
            not qualify for a general purpose binary diff utility.

            If the file is reorganized in some way (bytes inserted, deleted or moved
            around), your algo will not find any match and will produce an output which
            is much larger than the files themselves.

            And even if the file is not reorganized like that, your algo will perform
            very poorly if sequences of bytes change between two versions, because your
            are storing the diffs on individual bytes. You should at least try to find
            differing sequences rather than individual bytes.

            Finally, dumping a Hashtable is not an optimal way to save the difference. I
            suggest you take a look at the GDIFF format for a good example on how a
            binary diff could be stored: http://www.w3.org/TR/NOTE-gdiff-19970901.

            Bruno.

            "Ching-Lung" <chinglung@alum ni.cs.utexas.ed u> a écrit dans le message de
            news:05cb01c389 b0$929518c0$a00 1280a@phx.gbl.. .
            Hi Bruno,

            "...I don't undestand how binary serialization would help
            you implement binary diff..."

            I compare each byte from byte[] of file A and byte[] of
            file B. If they are diff then I store the byte diff in a
            hashtable (key = offset, value = byte diff). There are
            some other cases to consider, i.e. if the size of those 2
            files are not the same, etc. Once done, I serialize the
            hashtable.

            To reconstruct file B from file A with this delta binary,
            I deserialize the hashtable, do byte comparison between
            file A and the hashtable with one assumption that file A
            is the same as file A I created the binary diff from.

            "...but maybe you want to do something slightly different
            than standard binary diff..."

            I don't know much about standard binary diff, but I assume
            that it's all described in Andrew's paper. I'll check it
            out soon.

            Thanks!
            -CL



            [color=blue]
            >-----Original Message-----
            >Binary diff is a tough problem. So far, the best paper I[/color]
            have seen on the[color=blue]
            >subject is Andrew Tridgell's paper on rsync. It actually[/color]
            tackles a slightly[color=blue]
            >more difficult problem that just binary diff, i.e. binary[/color]
            diff between local[color=blue]
            >and remote file, but the general idea (rolling checksum +[/color]
            strong checksum)[color=blue]
            >can be used to compute a binary diff between two local[/color]
            files in an efficient[color=blue]
            >way.
            >
            >Andrew's paper is on http://samba.org/rsync/tech_report/
            >
            >I don't undestand how binary serialization would help you[/color]
            implement binary[color=blue]
            >diff. Seems like it is only going to make matters worse,[/color]
            but maybe you want[color=blue]
            >to do something slightly different than standard binary[/color]
            diff.[color=blue]
            >
            >Bruno.
            >
            >"Ching-Lung" <chinglung@alum ni.cs.utexas.ed u> a écrit[/color]
            dans le message de[color=blue]
            >news:10d101c38 967$960e1d10$a3 01280a@phx.gbl. ..[color=green]
            >> Hi all,
            >>
            >> I try to create a tool to check the delta (diff) of 2
            >> binaries and create the delta binary. I use binary
            >> formatter (serialization) to create the delta binary.[/color][/color]
            It[color=blue][color=green]
            >> works fine but the delta binary is pretty huge in[/color][/color]
            size. I[color=blue][color=green]
            >> have 1 byte file and 2 bytes file, the delta should be 1
            >> byte but somehow it turns out to be 249 bytes using[/color][/color]
            binary[color=blue][color=green]
            >> formatter. I guess serialization has some other things
            >> added to the delta file.
            >>
            >> Is there any better way to create delta binary from 2
            >> given files, but the delta has to be able to be
            >> constructed back to the original file? My current
            >> solution, using serialization binary formatter is great
            >> and easy but the size of the delta file... *sigh*
            >>
            >> Please advice, thanks!
            >> -CL[/color]
            >
            >
            >.
            >[/color]


            Comment

            • William Stacey

              #7
              Re: binary diff

              To get a feel for what is being stored, serialize the class using XML
              serializer and look at. As you will see, its not the data but the metadata
              that takes a lot of space. Now imagin how you might store the same data and
              metadata using a binary format. Its not xml (although there is bxml in the
              works), but you see the issue with creating a general format to reconstruct
              your object graph. Unless you come up with your own special method to
              serialize your structure, binary is probably the smallest you will get. If
              you must go more compact, I would probably look at a function to take your
              hashtable and convert it to a byte[] or a comma delimited text file and back
              again.

              --
              William Stacey, DNS MVP

              "Ching-Lung" <chinglung@alum ni.cs.utexas.ed u> wrote in message
              news:17dc01c389 b2$d5622970$a10 1280a@phx.gbl.. .[color=blue]
              > Hi Will,
              >
              > I actually store the byte diff in a hashtable (key =
              > offset, value = byte diff). With serialization, it's
              > easier to maintain the state of the hashtable.
              >
              > I was once thought about put everything in one byte[] and,
              > like you suggested, no serialization-write it to a file.
              > But this way, for each byte diff, I have to store the
              > offset, length of the byte diff, and the byte itself. And
              > to reconstruct file B back, I have to have 3 readings to
              > get the offset, length, and read the next length byte from
              > get the byte diff.
              >
              > File A:
              > +-------------------+
              > | A | B | C | D | E |
              > +-------------------+
              >
              > File B:
              > +-------------------+
              > | A | b | c | D | e |
              > +-------------------+
              >
              > byte[] diff:
              > +---------------------------+
              > | 1 | 2 | b | c | 4 | 1 | e |
              > +---------------------------+
              >
              > So:
              > - diff[0] = 1 indicates that A[1] has changed
              > - diff[1] = 2 indicates that I have to read the next 2
              > bytes from diff[] to get 'b' and 'c' and stick them into
              > file A
              > - and this goes on and on...
              >
              > Isn't it too much work to do?
              >
              > Well, above is just one example and the size of file A and
              > B happen to be the same. One also needs to consider other
              > cases.
              >
              > Any more idea?
              > -CL
              >
              >
              >
              >
              >[color=green]
              > >-----Original Message-----
              > >Instead of serializing the byte[] diff, why can't you[/color]
              > just write that to a[color=green]
              > >binary diff file (i.e. no serialization)?
              > >
              > >--
              > >William Stacey, DNS MVP
              > >
              > >"Ching-Lung" <chinglung@alum ni.cs.utexas.ed u> wrote in[/color]
              > message[color=green]
              > >news:10d101c38 967$960e1d10$a3 01280a@phx.gbl. ..[color=darkred]
              > >> Hi all,
              > >>
              > >> I try to create a tool to check the delta (diff) of 2
              > >> binaries and create the delta binary. I use binary
              > >> formatter (serialization) to create the delta binary.[/color][/color]
              > It[color=green][color=darkred]
              > >> works fine but the delta binary is pretty huge in[/color][/color]
              > size. I[color=green][color=darkred]
              > >> have 1 byte file and 2 bytes file, the delta should be 1
              > >> byte but somehow it turns out to be 249 bytes using[/color][/color]
              > binary[color=green][color=darkred]
              > >> formatter. I guess serialization has some other things
              > >> added to the delta file.
              > >>
              > >> Is there any better way to create delta binary from 2
              > >> given files, but the delta has to be able to be
              > >> constructed back to the original file? My current
              > >> solution, using serialization binary formatter is great
              > >> and easy but the size of the delta file... *sigh*
              > >>
              > >> Please advice, thanks!
              > >> -CL[/color][/color]
              >[/color]


              Comment

              • William Stacey

                #8
                Re: binary diff

                To get a feel for what is being stored, serialize the class using XML
                serializer and look at. As you will see, its not the data but the metadata
                that takes a lot of space. Now imagin how you might store the same data and
                metadata using a binary format. Its not xml (although there is bxml in the
                works), but you see the issue with creating a general format to reconstruct
                your object graph. Unless you come up with your own special method to
                serialize your structure, binary is probably the smallest you will get. If
                you must go more compact, I would probably look at a function to take your
                hashtable and convert it to a byte[] or a comma delimited text file and back
                again.

                --
                William Stacey, DNS MVP

                "Ching-Lung" <chinglung@alum ni.cs.utexas.ed u> wrote in message
                news:17dc01c389 b2$d5622970$a10 1280a@phx.gbl.. .[color=blue]
                > Hi Will,
                >
                > I actually store the byte diff in a hashtable (key =
                > offset, value = byte diff). With serialization, it's
                > easier to maintain the state of the hashtable.
                >
                > I was once thought about put everything in one byte[] and,
                > like you suggested, no serialization-write it to a file.
                > But this way, for each byte diff, I have to store the
                > offset, length of the byte diff, and the byte itself. And
                > to reconstruct file B back, I have to have 3 readings to
                > get the offset, length, and read the next length byte from
                > get the byte diff.
                >
                > File A:
                > +-------------------+
                > | A | B | C | D | E |
                > +-------------------+
                >
                > File B:
                > +-------------------+
                > | A | b | c | D | e |
                > +-------------------+
                >
                > byte[] diff:
                > +---------------------------+
                > | 1 | 2 | b | c | 4 | 1 | e |
                > +---------------------------+
                >
                > So:
                > - diff[0] = 1 indicates that A[1] has changed
                > - diff[1] = 2 indicates that I have to read the next 2
                > bytes from diff[] to get 'b' and 'c' and stick them into
                > file A
                > - and this goes on and on...
                >
                > Isn't it too much work to do?
                >
                > Well, above is just one example and the size of file A and
                > B happen to be the same. One also needs to consider other
                > cases.
                >
                > Any more idea?
                > -CL
                >
                >
                >
                >
                >[color=green]
                > >-----Original Message-----
                > >Instead of serializing the byte[] diff, why can't you[/color]
                > just write that to a[color=green]
                > >binary diff file (i.e. no serialization)?
                > >
                > >--
                > >William Stacey, DNS MVP
                > >
                > >"Ching-Lung" <chinglung@alum ni.cs.utexas.ed u> wrote in[/color]
                > message[color=green]
                > >news:10d101c38 967$960e1d10$a3 01280a@phx.gbl. ..[color=darkred]
                > >> Hi all,
                > >>
                > >> I try to create a tool to check the delta (diff) of 2
                > >> binaries and create the delta binary. I use binary
                > >> formatter (serialization) to create the delta binary.[/color][/color]
                > It[color=green][color=darkred]
                > >> works fine but the delta binary is pretty huge in[/color][/color]
                > size. I[color=green][color=darkred]
                > >> have 1 byte file and 2 bytes file, the delta should be 1
                > >> byte but somehow it turns out to be 249 bytes using[/color][/color]
                > binary[color=green][color=darkred]
                > >> formatter. I guess serialization has some other things
                > >> added to the delta file.
                > >>
                > >> Is there any better way to create delta binary from 2
                > >> given files, but the delta has to be able to be
                > >> constructed back to the original file? My current
                > >> solution, using serialization binary formatter is great
                > >> and easy but the size of the delta file... *sigh*
                > >>
                > >> Please advice, thanks!
                > >> -CL[/color][/color]
                >[/color]


                Comment

                • Bruno Jouhier [MVP]

                  #9
                  Re: binary diff

                  Hi Ching-Lung,

                  I read your reply to William and saw that your algo is smart enough to find
                  sequences that differ, not just isolated bytes. So, my judgement was a bit
                  severe.

                  Anyway, you should still take a look at the GDIFF format, it is a smart
                  format for storing the diff and it is relatively easy to generate and
                  process (you can take some shortcuts and eliminate the byte/short/int
                  variants of the tags if you don't need an optimal size).

                  Bruno.

                  "Ching-Lung" <chinglung@alum ni.cs.utexas.ed u> a écrit dans le message de
                  news:05cb01c389 b0$929518c0$a00 1280a@phx.gbl.. .
                  Hi Bruno,

                  "...I don't undestand how binary serialization would help
                  you implement binary diff..."

                  I compare each byte from byte[] of file A and byte[] of
                  file B. If they are diff then I store the byte diff in a
                  hashtable (key = offset, value = byte diff). There are
                  some other cases to consider, i.e. if the size of those 2
                  files are not the same, etc. Once done, I serialize the
                  hashtable.

                  To reconstruct file B from file A with this delta binary,
                  I deserialize the hashtable, do byte comparison between
                  file A and the hashtable with one assumption that file A
                  is the same as file A I created the binary diff from.

                  "...but maybe you want to do something slightly different
                  than standard binary diff..."

                  I don't know much about standard binary diff, but I assume
                  that it's all described in Andrew's paper. I'll check it
                  out soon.

                  Thanks!
                  -CL



                  [color=blue]
                  >-----Original Message-----
                  >Binary diff is a tough problem. So far, the best paper I[/color]
                  have seen on the[color=blue]
                  >subject is Andrew Tridgell's paper on rsync. It actually[/color]
                  tackles a slightly[color=blue]
                  >more difficult problem that just binary diff, i.e. binary[/color]
                  diff between local[color=blue]
                  >and remote file, but the general idea (rolling checksum +[/color]
                  strong checksum)[color=blue]
                  >can be used to compute a binary diff between two local[/color]
                  files in an efficient[color=blue]
                  >way.
                  >
                  >Andrew's paper is on http://samba.org/rsync/tech_report/
                  >
                  >I don't undestand how binary serialization would help you[/color]
                  implement binary[color=blue]
                  >diff. Seems like it is only going to make matters worse,[/color]
                  but maybe you want[color=blue]
                  >to do something slightly different than standard binary[/color]
                  diff.[color=blue]
                  >
                  >Bruno.
                  >
                  >"Ching-Lung" <chinglung@alum ni.cs.utexas.ed u> a écrit[/color]
                  dans le message de[color=blue]
                  >news:10d101c38 967$960e1d10$a3 01280a@phx.gbl. ..[color=green]
                  >> Hi all,
                  >>
                  >> I try to create a tool to check the delta (diff) of 2
                  >> binaries and create the delta binary. I use binary
                  >> formatter (serialization) to create the delta binary.[/color][/color]
                  It[color=blue][color=green]
                  >> works fine but the delta binary is pretty huge in[/color][/color]
                  size. I[color=blue][color=green]
                  >> have 1 byte file and 2 bytes file, the delta should be 1
                  >> byte but somehow it turns out to be 249 bytes using[/color][/color]
                  binary[color=blue][color=green]
                  >> formatter. I guess serialization has some other things
                  >> added to the delta file.
                  >>
                  >> Is there any better way to create delta binary from 2
                  >> given files, but the delta has to be able to be
                  >> constructed back to the original file? My current
                  >> solution, using serialization binary formatter is great
                  >> and easy but the size of the delta file... *sigh*
                  >>
                  >> Please advice, thanks!
                  >> -CL[/color]
                  >
                  >
                  >.
                  >[/color]


                  Comment

                  • Bruno Jouhier [MVP]

                    #10
                    Re: binary diff

                    Hi Ching-Lung,

                    I read your reply to William and saw that your algo is smart enough to find
                    sequences that differ, not just isolated bytes. So, my judgement was a bit
                    severe.

                    Anyway, you should still take a look at the GDIFF format, it is a smart
                    format for storing the diff and it is relatively easy to generate and
                    process (you can take some shortcuts and eliminate the byte/short/int
                    variants of the tags if you don't need an optimal size).

                    Bruno.

                    "Ching-Lung" <chinglung@alum ni.cs.utexas.ed u> a écrit dans le message de
                    news:05cb01c389 b0$929518c0$a00 1280a@phx.gbl.. .
                    Hi Bruno,

                    "...I don't undestand how binary serialization would help
                    you implement binary diff..."

                    I compare each byte from byte[] of file A and byte[] of
                    file B. If they are diff then I store the byte diff in a
                    hashtable (key = offset, value = byte diff). There are
                    some other cases to consider, i.e. if the size of those 2
                    files are not the same, etc. Once done, I serialize the
                    hashtable.

                    To reconstruct file B from file A with this delta binary,
                    I deserialize the hashtable, do byte comparison between
                    file A and the hashtable with one assumption that file A
                    is the same as file A I created the binary diff from.

                    "...but maybe you want to do something slightly different
                    than standard binary diff..."

                    I don't know much about standard binary diff, but I assume
                    that it's all described in Andrew's paper. I'll check it
                    out soon.

                    Thanks!
                    -CL



                    [color=blue]
                    >-----Original Message-----
                    >Binary diff is a tough problem. So far, the best paper I[/color]
                    have seen on the[color=blue]
                    >subject is Andrew Tridgell's paper on rsync. It actually[/color]
                    tackles a slightly[color=blue]
                    >more difficult problem that just binary diff, i.e. binary[/color]
                    diff between local[color=blue]
                    >and remote file, but the general idea (rolling checksum +[/color]
                    strong checksum)[color=blue]
                    >can be used to compute a binary diff between two local[/color]
                    files in an efficient[color=blue]
                    >way.
                    >
                    >Andrew's paper is on http://samba.org/rsync/tech_report/
                    >
                    >I don't undestand how binary serialization would help you[/color]
                    implement binary[color=blue]
                    >diff. Seems like it is only going to make matters worse,[/color]
                    but maybe you want[color=blue]
                    >to do something slightly different than standard binary[/color]
                    diff.[color=blue]
                    >
                    >Bruno.
                    >
                    >"Ching-Lung" <chinglung@alum ni.cs.utexas.ed u> a écrit[/color]
                    dans le message de[color=blue]
                    >news:10d101c38 967$960e1d10$a3 01280a@phx.gbl. ..[color=green]
                    >> Hi all,
                    >>
                    >> I try to create a tool to check the delta (diff) of 2
                    >> binaries and create the delta binary. I use binary
                    >> formatter (serialization) to create the delta binary.[/color][/color]
                    It[color=blue][color=green]
                    >> works fine but the delta binary is pretty huge in[/color][/color]
                    size. I[color=blue][color=green]
                    >> have 1 byte file and 2 bytes file, the delta should be 1
                    >> byte but somehow it turns out to be 249 bytes using[/color][/color]
                    binary[color=blue][color=green]
                    >> formatter. I guess serialization has some other things
                    >> added to the delta file.
                    >>
                    >> Is there any better way to create delta binary from 2
                    >> given files, but the delta has to be able to be
                    >> constructed back to the original file? My current
                    >> solution, using serialization binary formatter is great
                    >> and easy but the size of the delta file... *sigh*
                    >>
                    >> Please advice, thanks!
                    >> -CL[/color]
                    >
                    >
                    >.
                    >[/color]


                    Comment

                    Working...