Detecting original file format

  • omerbutt
    Contributor
    • Nov 2006
    • 638


    Hi,
    I have an application that accepts only the following extensions:
    jpeg
    jpg
    gif
    tiff
    bmp
    doc
    docx
    pdf

    It converts them into a single TIFF file using Imagick. My question is not about Imagick, though; it is about reading the original format of the file. Initially I was using a function to extract the extension from the file name, but I replaced it with a MIME type check, because if someone changes a file's extension, the application should give an error message. For example, if there is an image named ABC.png and someone renames it to ABC.DOC, the application should report:

    The file you provided is not a valid doc format

    But the MIME type in

    $_FILES['file_name']['type']

    does not reflect the file's original type. In fact, it gives 'application/msword', whereas the original MIME type was 'image/png'. Now I am looking into reading the file's metadata, but would that help with both images and documents? Apart from that, I have heard that not every image necessarily has metadata attached. I have done the big part of the work, but this requirement is driving me mad. Any help or helpful link would be appreciated.
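For context: PHP fills `$_FILES[...]['type']` from whatever the browser sends, and PHP does not verify it, which is why a renamed PNG shows up as `application/msword`. A common alternative (not something the thread itself proposes) is to read the first bytes of the uploaded temp file (`$_FILES['file_name']['tmp_name']` in PHP) and compare them against known "magic number" signatures. Below is a minimal Python sketch of that idea; `sniff`, `check_upload`, and the kind labels are hypothetical names for illustration. Note that a legacy .doc is an OLE2 compound file (shared with .xls/.ppt) and a .docx is a ZIP container, so the signature alone only confirms the container, not that it holds a Word document.

```python
import os

# Leading-byte signatures for the accepted formats, plus PNG for the example.
# Several extensions share one container signature.
SIGNATURES = [
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"II*\x00", "tiff"),   # little-endian TIFF
    (b"MM\x00*", "tiff"),   # big-endian TIFF
    (b"%PDF-", "pdf"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy .doc (also .xls, .ppt)
    (b"PK\x03\x04", "zip"),  # .docx (also .xlsx, .jar, plain .zip)
    (b"BM", "bmp"),          # only 2 bytes, so checked last; weak on its own
]

# Which detected kind each accepted extension must match.
EXPECTED_KIND = {
    "jpeg": "jpeg", "jpg": "jpeg", "gif": "gif", "tiff": "tiff",
    "bmp": "bmp", "pdf": "pdf", "doc": "ole2", "docx": "zip",
}

def sniff(data: bytes):
    """Return the kind whose signature matches the leading bytes, or None."""
    for magic, kind in SIGNATURES:
        if data.startswith(magic):
            return kind
    return None

def check_upload(filename: str, data: bytes):
    """Return None if the content matches the extension, else an error message."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    expected = EXPECTED_KIND.get(ext)
    if expected is None:
        return f"Extension .{ext} is not accepted"
    if sniff(data) != expected:
        return f"The file you provided is not a valid {ext} format"
    return None
```

With a renamed PNG, `check_upload("ABC.DOC", png_bytes)` returns "The file you provided is not a valid doc format", because the bytes sniff as `png` while `.doc` expects `ole2`. Only the first dozen or so bytes are needed, so the whole upload does not have to be read.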
    Regards,
    Omer Aslam
  • Stewart Ross
    Recognized Expert Moderator Specialist
    • Feb 2008
    • 2545

    #2
    Hi Omer. Not all files of the types you list will have metadata, and, as you say, some of the metadata may not be complete.

    Even if the file does have metadata, its location and form within the file will be type-dependent - so how would you know where to look for it in a file with an incorrect extension? No point in looking for Word's metadata if in fact the .doc file is actually a PNG file instead.
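To illustrate the point that metadata is optional and format-specific, here is a minimal Python sketch (an illustration, not something from the thread) that reads uncompressed `tEXt` metadata chunks from a PNG. It only works once you already know the file is a PNG: each chunk is a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC. The function name `png_text_chunks` is hypothetical, and the sketch skips CRC checking and the compressed `zTXt`/`iTXt` variants.

```python
import struct

PNG_SIG = b"\x89PNG\r\n\x1a\n"

def png_text_chunks(data: bytes) -> dict:
    """Return {keyword: text} from uncompressed tEXt chunks; {} if there are none."""
    if not data.startswith(PNG_SIG):
        # Other formats store metadata elsewhere (e.g. EXIF segments in JPEG,
        # property streams in OLE2 .doc), so this parser does not apply.
        raise ValueError("not a PNG: metadata layout is format-specific")
    out = {}
    pos = len(PNG_SIG)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            # tEXt body: Latin-1 keyword, NUL separator, Latin-1 text.
            key, _, text = body.partition(b"\x00")
            out[key.decode("latin-1")] = text.decode("latin-1")
        if ctype == b"IEND":
            break
        pos += 12 + length  # length field + type + data + CRC
    return out
```

A PNG with no `tEXt` chunks simply yields `{}`, which matches the observation that many images carry no metadata at all.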

    Sorry, but I don't see any easy or 100% viable solution that would allow you to retrieve the information that was lost when a file was stored with a different file extension than it should have had.

    -Stewart


    • Markus
      Recognized Expert Expert
      • Jun 2007
      • 6092

      #3
      Well, I'm not sure that a file loses any information when its extension is changed. But, you're right, this is a problematic area, determining the type of a file based on its content. However, the finfo extension (enabled by default as of PHP 5.3) does a good job at determining the type of file through some heuristics. As said before, though, this is definitely not bulletproof.
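PHP's finfo (`finfo_open(FILEINFO_MIME_TYPE)` plus `finfo_file()`) is backed by libmagic's signature database, which goes beyond a single leading signature when that signature is ambiguous. The Python sketch below is not finfo; it only illustrates one such heuristic for the ZIP-based .docx case, assuming that Word packages contain the conventional part names `[Content_Types].xml` and `word/document.xml`. The Office Open XML format lets the main part be named differently via `_rels/.rels`, so this is a heuristic, not a guarantee. `looks_like_docx` is a hypothetical name.

```python
import io
import zipfile

def looks_like_docx(data: bytes) -> bool:
    """Heuristic: a .docx is a ZIP package whose entries include the
    conventional Word parts [Content_Types].xml and word/document.xml."""
    if not data.startswith(b"PK\x03\x04"):
        return False  # not a ZIP local-file header at all
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = set(zf.namelist())
    except zipfile.BadZipFile:
        return False  # ZIP signature present but the archive is broken
    return "[Content_Types].xml" in names and "word/document.xml" in names
```

This distinguishes a genuine Word package from, say, a plain archive renamed to .docx, while still accepting files that only pass a bare "PK" check when they carry the expected structure.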


      • omerbutt
        Contributor
        • Nov 2006
        • 638

        #4
        @markus Yes, I came across that, but I have PHP 5.2.6. As I said, I am using Imagick, plus OpenOffice, OPSWAT antivirus to scan files for viruses, and SQL Server 2010 on Windows Server 2008. The whole setup is really a tangle: switching PHP versions would require me to switch to the appropriate, compatible DLL versions for Imagick, SQL Server and the others, and that would be a mess. Believe me, you wouldn't want to do that if you had worked in such an environment, especially with Imagick.
        @stewart Yes, I could, but I would have to treat each group of formats (images, documents, text files) separately. Have you read about getID3, which works for media files? Whatever name or extension you give a file, it will tell you that it is an MP3, MPEG-4, and so on. Similarly, there is phptoexl (I'm not sure about the exact name) that works like that. But that would again be something like a whole project.


        • Markus
          Recognized Expert Expert
          • Jun 2007
          • 6092

          #5
          Well... if you insist on using very old versions of PHP, then you're preventing yourself from using the cool latest features. Updating your PHP really isn't that hard.


          • omerbutt
            Contributor
            • Nov 2006
            • 638

            #6
            @markus It's not PHP itself that would irritate me; it's the extensions I have installed. Those are what I'm talking about.
