PDF Parser

  • Farhan

    PDF Parser

    I am trying to make a PDF parser in PHP that can extract data from
    PDF files. Basically, I want to convert PDF files to XML data. Any
    idea where I could start?
  • Shawn Wilson

    #2
    Re: PDF Parser

    Farhan wrote:
    >
    > I am trying to make a PDF parser in PHP that can extract data from
    > PDF files. Basically, I want to convert PDF files to XML data. Any
    > idea where I could start?


    You may find the links in the comments helpful.
    I've never done anything with PDFs/PHP, but some of the tutorials looked
    promising.

    Regards,
    Shawn
    --
    Shawn Wilson
    shawn@glassgiant.com


    I have a spam filter. Please include "PHP" in the
    subject line to ensure I'll get your message.


    • Farhan

      #3
      Re: PDF Parser

      Thanks for your reply, Shawn, but PDFLib is not really what I am
      looking for. I need to extract data from PDF files. Someone in #php on
      freenode told me that PDFLib with PID will be able to do that, but I
      don't think PID comes with PHP; we would need to buy it.

      farhan


      • Chung Leong

        #4
        Re: PDF Parser

        See my PDF highlighting code:

        http://www.conradish.net/pdfhi.php.txt

        Pay attention to lines 451 to 462.

        User "Farhan" <god_father52@hotmail.com> wrote in message
        news:b68af333.0401150556.8083345@posting.google.com...
        > Thanks for your reply, Shawn, but PDFLib is not really what I am
        > looking for. I need to extract data from PDF files. Someone in #php on
        > freenode told me that PDFLib with PID will be able to do that, but I
        > don't think PID comes with PHP; we would need to buy it.
        >
        > farhan



        • Farhan

          #5
          Re: PDF Parser

          > See my PDF highlighting code:
          >
          > http://www.conradish.net/pdfhi.php.txt
          >
          > Pay attention to lines 451 to 462.
          >

          Thanks, Chung. I will look at the code some other time, because
          the server you are pointing me to seems to be down right now. Just
          one question: do you simply search the binary data, or do you
          follow the Adobe PDF Specification? If you follow the
          specification, is it very complicated? Thanks again.

          farhan
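
On the question of searching the bytes versus following the specification, here is a minimal sketch of a middle path. It is not Chung's code (that file is not reproduced in this thread) and it is far from a full parser. The function name `extractPdfText` is hypothetical, and the sketch assumes that each stream has a direct numeric /Length, uses only the /FlateDecode filter, and shows text as literal strings with the Tj operator. Real files often break all three assumptions (indirect lengths, object streams, TJ arrays, custom font encodings), so treat it as a starting point for reading the PDF Reference, not as a converter.

```php
<?php
// Hypothetical helper (not from the thread): collect the literal strings
// shown with the Tj operator inside Flate-compressed streams of a PDF.
function extractPdfText(string $pdf): array
{
    $texts = [];
    $offset = 0;
    // A stream object looks like: << ... /Length N ... >> stream\n<N bytes>\nendstream
    // The lookahead skips indirect lengths such as "/Length 12 0 R".
    // Dictionaries with nested << >> after /Length are not matched (simplification).
    $pattern = '#/Length\s+(\d+)\b(?!\s+\d+\s+R)[^>]*>>\s*stream\r?\n#';

    while (preg_match($pattern, $pdf, $m, PREG_OFFSET_CAPTURE, $offset)) {
        $start  = $m[0][1] + strlen($m[0][0]);   // first byte after "stream\n"
        $length = (int) $m[1][0];
        $raw    = substr($pdf, $start, $length);
        $offset = min($start + $length, strlen($pdf));

        // Assumption: the stream is /FlateDecode (zlib). Anything else is skipped.
        $data = @gzuncompress($raw);
        if ($data === false) {
            continue;
        }

        // Match "(text) Tj", allowing backslash escapes inside the string.
        // Unescaped nested parentheses and octal escapes are not handled.
        if (preg_match_all('/\(((?:\\\\.|[^\\\\()])*)\)\s*Tj/s', $data, $found)) {
            foreach ($found[1] as $s) {
                // Undo the escapes \( \) and \\.
                $texts[] = preg_replace('/\\\\([()\\\\])/', '$1', $s);
            }
        }
    }
    return $texts;
}

// Usage: build a tiny stream object in memory instead of reading a real file.
$content = 'BT /F1 12 Tf (Hello \(PDF\) world) Tj ET';
$packed  = gzcompress($content);
$pdf = "%PDF-1.4\n4 0 obj\n<< /Length " . strlen($packed)
     . " /Filter /FlateDecode >>\nstream\n" . $packed
     . "\nendstream\nendobj\n";

print_r(extractPdfText($pdf));   // one entry: Hello (PDF) world
```

Reading the byte count from /Length rather than scanning for "endstream" is the part that follows the specification: compressed data can contain any byte sequence, so a plain search of the binary data can cut a stream short. Wrapping the returned strings in XML elements would then be a separate step, for example with PHP's DOM extension.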
