Calculating CRC32 for uploaded files

  • Ricky Romaya

    Calculating CRC32 for uploaded files

    Hi,

    I'm working on a file upload script. I need to calculate the CRC32 of each
    file that is successfully uploaded. How can I do this? PHP only has a
    crc32() function for strings. However, the uploaded files are mostly
    binaries and are expected to be large (5-12 MB per file).

    Are there any ways other than CRC32 (which is supported by a default PHP
    installation) to generate a unique hash of an arbitrary file? (The users
    of my script are assumed to have no knowledge of file hashes, so I can't
    depend on them to generate one prior to upload.)

    TIA
  • Colin McKinnon

    #2
    Re: Calculating CRC32 for uploaded files

    Ricky Romaya wrote:
    >
    > Are there any ways other than CRC32 (which is supported by a default PHP
    > installation) to generate a unique hash of an arbitrary file? (The users
    > of my script are assumed to have no knowledge of file hashes, so I can't
    > depend on them to generate one prior to upload.)
    >
    There are other hash functions, but for files of this size you'd be better
    off shelling out and running a program specifically designed for the job.

    crc32 is hardly the cutting edge of file hashes. MD5 works quite well and is
    supported by most systems (and free source code is available).

    HTH

    C.


    • Ricky Romaya

      #3
      Re: Calculating CRC32 for uploaded files

      Colin McKinnon <colin.deletethis@andthis.mms3.com> wrote in
      news:ck2uel$l5m$1$8300dec7@news.demon.co.uk:

      > There are other hash functions, but for files of this size you'd be
      > better off shelling out and running a program specifically designed
      > for the job.
      >
      Uh, the problem is that I don't own the server; I use a web hosting
      service instead. If only I could shell out and run a file hashing
      program, my life would be easier.

      > crc32 is hardly the cutting edge of file hashes. MD5 works quite well
      > and is supported by most systems (and free source code is available)
      >
      Hmm, care to elaborate on which file hashing algorithms are 'cutting
      edge'?

      TIA


      • Chung Leong

        #4
        Re: Calculating CRC32 for uploaded files

        "Colin McKinnon" <colin.deletethis@andthis.mms3.com> wrote in message
        news:ck2uel$l5m$1$8300dec7@news.demon.co.uk...
        > crc32 is hardly the cutting edge of file hashes. MD5 works quite well
        > and is supported by most systems (and free source code is available)

        Plus there's md5_file(), so you don't have to load the entire file into
        memory to calculate the hash.
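
For anyone landing here later, a minimal sketch of hashing a file on disk rather than a string. It assumes a PHP build that ships the hash extension (hash_file() is newer than this thread); the temp file simply stands in for an upload, and the 'userfile' field name mentioned in the comment is made up for illustration.

```php
<?php
// Sketch: hash a file on disk without first reading it into a PHP string.
// The temp file stands in for an upload; in a real handler the path would
// come from something like $_FILES['userfile']['tmp_name'] (hypothetical
// field name).

$data = str_repeat("\x00\xff binary-ish data\n", 1000);
$path = tempnam(sys_get_temp_dir(), 'upl');
file_put_contents($path, $data);

// md5_file() reads the file itself and returns 32 hex characters.
$md5 = md5_file($path);

// hash_file() with 'crc32b' yields the same CRC32 that crc32() computes
// for strings, formatted as 8 hex characters.
$crc = hash_file('crc32b', $path);

echo "md5:   $md5\n";
echo "crc32: $crc\n";

unlink($path);
```

Both calls take a path, so the 5-12 MB uploads mentioned at the top of the thread never have to be held in a string just to be hashed.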



        • Michael Vilain

          #5
          Re: Calculating CRC32 for uploaded files

          In article <Xns957C30E996E19rickyralexandriacc@66.250.146.159>,
          Ricky Romaya <something@somewhere.com> wrote:

          > Colin McKinnon <colin.deletethis@andthis.mms3.com> wrote in
          > news:ck2uel$l5m$1$8300dec7@news.demon.co.uk:
          >
          > > There are other hash functions, but for files of this size you'd be
          > > better off shelling out and running a program specifically designed
          > > for the job.
          > >
          > Uh, the problem is that I don't own the server; I use a web hosting
          > service instead. If only I could shell out and run a file hashing
          > program, my life would be easier.
          >
          > > crc32 is hardly the cutting edge of file hashes. MD5 works quite well
          > > and is supported by most systems (and free source code is available)
          > >
          > Hmm, care to elaborate on which file hashing algorithms are 'cutting
          > edge'?
          >
          > TIA

          Aren't we a lazy-ass bum this afternoon...

          Doing a simple Google search on CRC32 and MD5 gives some choice hits:

          http://us4.php.net/crc32 (this is what the OP originally asked for)

          http://us4.php.net/md5 (md5() hashes a string; md5_file() does the same for a file)



          Basically, crc32 hashes aren't unique while md5 hashes are. SUN offers
          md5 checksums of all the files in the Solaris distributions as a
          'fingerprint' to verify if a file is authentic. That way a sysadmin can
          verify if the "ls" or "ps" they're using is the original from SUN.
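
A rough sketch of that kind of fingerprint check in PHP. The helper name matches_fingerprint() is made up for this example, and the "published" value is computed locally so the snippet runs on its own; in practice it would be copied from the vendor's checksum list.

```php
<?php
// Sketch: compare a file's MD5 against a published fingerprint.
// matches_fingerprint() is a hypothetical helper, not a PHP built-in.

function matches_fingerprint(string $path, string $expectedHex): bool
{
    $actual = md5_file($path);
    // md5_file() returns false on failure; treat that as a mismatch.
    return $actual !== false && hash_equals(strtolower($expectedHex), $actual);
}

$contents  = "pretend this is /usr/bin/ls\n";
$published = md5($contents);            // stand-in for the vendor's value

$path = tempnam(sys_get_temp_dir(), 'bin');
file_put_contents($path, $contents);
var_dump(matches_fingerprint($path, $published));   // file as shipped

file_put_contents($path, "tampered\n", FILE_APPEND);
var_dump(matches_fingerprint($path, $published));   // file modified afterwards

unlink($path);
```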

          --
          DeeDee, don't press that button! DeeDee! NO! Dee...




          • Chris

            #6
            Re: Calculating CRC32 for uploaded files

            -----BEGIN PGP SIGNED MESSAGE-----
            Hash: SHA1

            Michael Vilain wrote:

            [snip]
            > Basically, crc32 hashes aren't unique while md5 hashes are. SUN
            > offers md5 checksums of all the files in the Solaris distributions
            > as a 'fingerprint' to verify if a file is authentic. That way a
            > sysadmin can verify if the "ls" or "ps" they're using is the
            > original from SUN.
            >

            Hi,
            I'm sorry, but MD5 hashes are *not* unique. An MD5 hash is 128 bits
            long; therefore, for any input length > 128 bits, there must be at
            *least* two possible inputs which produce the same output. For the
            given file lengths measured in megabytes, there would be an immense
            number of possible inputs that give the same output: the only thing
            is, it's relatively difficult to arbitrarily *find* another file with
            the same MD5 as a given input. They do exist, however, as a little
            math demonstrates:

            Number of possible MD5 hashes = 2^128 = 3.4028236692093846e+38
            Number of possible 1 kilobit files = 2^1024 = 1.7976931348623159e+308

            where ^ means "to the power of"

            As you see, if the input is only a kilobit long, there are *immensely*
            more possible inputs than possible outputs. Since every possible
            input is mapped to some output, obviously multiple inputs must be
            mapped to the same output. This is called a "hash collision". As far
            as I know, MD5 is not perfectly secure about this (these are just
            news items I read recently, I didn't look in detail at the subject);
            however, a more secure hash, such as SHA-1, although obviously still
            suffering from the *existence* of hash collisions, makes *looking*
            for them very difficult (i.e. you just have to try every possible
            input until you get a collision).
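
The counting argument above is easy to see in miniature. As an illustrative sketch only (the function name and the "file-N" inputs are invented for the example), shrink the hash to the first 4 hex characters of MD5, i.e. 16 bits, so that there are only 65,536 possible outputs; at most 65,537 distinct inputs are then needed before two of them must share an output.

```php
<?php
// Sketch: force a collision by truncating MD5 to a handful of hex digits.
// find_truncated_collision() is a made-up name for this demonstration.

function find_truncated_collision(int $hexChars = 4): array
{
    $seen = [];                       // truncated hash => first input seen
    for ($i = 0; ; $i++) {
        $input = "file-$i";
        $short = substr(md5($input), 0, $hexChars);
        if (isset($seen[$short])) {
            // Two different inputs, same truncated hash.
            return [$seen[$short], $input, $short];
        }
        $seen[$short] = $input;
    }
}

[$a, $b, $short] = find_truncated_collision();
echo "'$a' and '$b' both map to $short\n";
```

The same argument applies to the full 128 bits; the search space is just far too large for this brute-force loop to ever finish.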

            Chris
            -----BEGIN PGP SIGNATURE-----
            Version: GnuPG v1.2.4 (GNU/Linux)

            iD8DBQFBZgl/gxSrXuMbw1YRAtoWAJkBV342ESDMMhRmcJ28QX/wmUweUwCg+HI8
            irJmD8Aelju4mJwxXN586Xo=
            =d+rO
            -----END PGP SIGNATURE-----


            • FLEB

              #7
              Re: Calculating CRC32 for uploaded files

              Regarding this well-known quote, often attributed to Chris's famous "Fri,
              08 Oct 2004 03:29:03 GMT" speech:

              > -----BEGIN PGP SIGNED MESSAGE-----
              > Hash: SHA1
              >
              > Michael Vilain wrote:
              >
              > [snip]
              >> Basically, crc32 hashes aren't unique while md5 hashes are. SUN
              >> offers md5 checksums of all the files in the Solaris distributions
              >> as a 'fingerprint' to verify if a file is authentic. That way a
              >> sysadmin can verify if the "ls" or "ps" they're using is the
              >> original from SUN.
              >>
              >
              > Hi,
              > I'm sorry, but MD5 hashes are *not* unique. An MD5 hash is 128 bits
              > long; therefore, for any input length > 128 bits, there must be at
              > *least* two possible inputs which produce the same output. For the
              > given file lengths measured in megabytes, there would be an immense
              > number of possible inputs that give the same output: the only thing
              > is, it's relatively difficult to arbitrarily *find* another file with
              > the same MD5 as a given input. They do exist, however, as a little
              > math demonstrates:
              >
              > (snipped: big files/small hashes--some will be the same)

              But the idea, IIRC, is that although there may be collisions, the chance of
              two *legible* inputs having the same MD5 is immensely small. Most collisions
              will pair one intelligible value with one that is unusable garbage. Hence
              MD5's usefulness in checking file integrity (it would be very difficult,
              and quite detectable, to inject malware into a file and keep the MD5), and
              its dubious state as a password-security mechanism (since a password needs
              to be legible in no other way except to pass the MD5 check).

              --
              -- Rudy Fleminger
              -- sp@mmers.and.evil.ones.will.bow-down-to.us
              (put "Hey!" in the Subject line for priority processing!)
              -- http://www.pixelsaredead.com


              • Chris

                #8
                Re: Calculating CRC32 for uploaded files

                -----BEGIN PGP SIGNED MESSAGE-----
                Hash: SHA1

                FLEB wrote:

                <snip>
                > But the idea, IIRC, is that although there may be collisions, the
                > chance of two *legible* inputs having the same MD5 is immensely
                > small. Most collisions will pair one intelligible value with one
                > that is unusable garbage. Hence MD5's usefulness in checking file
                > integrity (it would be very difficult, and quite detectable, to
                > inject malware into a file and keep the MD5), and its dubious state
                > as a password-security mechanism (since a password needs to be
                > legible in no other way except to pass the MD5 check).
                >

                Agreed. Of course, as I read somewhere (and it makes a lot of sense),
                all you *need* is gibberish that passes the MD5 test if it's an MD5
                validating a BIOS flash. It kind of doesn't matter *what* you put there
                if all you want to do is break the computer, does it? (I know BIOS
                chips can be replaced; however, this would be a major undertaking for
                many people who just might possibly consider flashing their BIOSes.)

                Of course, for uploading plain old files to a server, it's probably
                fine - and besides, the original purpose was to make sure the
                file wasn't *accidentally* changed, for which it's excellent.
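
To illustrate the accidental-change case, here is a small sketch that flips a single bit in some random data and compares the checksums before and after. The 64 KB buffer and the offset 1000 are arbitrary choices for the demo, and it assumes a PHP version that provides random_bytes() and hash().

```php
<?php
// Sketch: simulate accidental corruption and see whether the checksums notice.

$original = random_bytes(64 * 1024);                 // stand-in for an uploaded binary
$damaged  = $original;
$damaged[1000] = chr(ord($damaged[1000]) ^ 0x01);    // flip one bit of one byte

foreach (['crc32b', 'md5'] as $algo) {
    $before = hash($algo, $original);
    $after  = hash($algo, $damaged);
    printf(
        "%-6s %s -> %s (%s)\n",
        $algo,
        $before,
        $after,
        $before === $after ? 'unchanged' : 'changed'
    );
}
```

A single flipped bit is always caught by CRC32, which is why even the humble checksum the thread started with is adequate for spotting transfer damage; the stronger hashes matter when someone is deliberately trying to keep the checksum the same.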

                Chris
                -----BEGIN PGP SIGNATURE-----
                Version: GnuPG v1.2.4 (GNU/Linux)

                iD8DBQFBZ5PxgxSrXuMbw1YRAiLKAKCfyvkn4pXWMhjCWWwc1KaNWqZi5wCggkTt
                C8U8/ToYrvsL+6CHgq8JIz0=
                =3vRO
                -----END PGP SIGNATURE-----


                • Ricky Romaya

                  #9
                  Re: Calculating CRC32 for uploaded files

                  "Chung Leong" <chernyshevsky@hotmail.com> wrote in
                  news:ULWdnfxcneZdWvjcRVn-sA@comcast.com:
                  >
                  > Plus there's md5_file(), so you don't have to load the entire file
                  > into memory to calculate the hash.
                  >
                  Finally, someone answers my original question. Thx for the pointer. Why
                  didn't I see it when scouring the manual?
