pdf to text

  • Justin Koivisto

    pdf to text

    I am looking for a way to convert PDF files into text content. I don't
    care about layout or formatting, just the plain text that I can use to
    search against in a database.

    I've looked into the pdftotext tool from:
    http://www.foolabs.com/xpdf/download.html

    However, when I use it via the command line, it works fine. If I issue
    the same command via a system() call, there are major problems that
    cause the server to crash. (Don't know why, there aren't any error
    messages being generated anywhere.)

    I am looking to use this when a PDF file is uploaded via a form and
    store the text in a database for a search function.

    TIA

    -- Justin
  • NC

    #2
    Re: pdf to text

    Justin Koivisto wrote:
    >
    > I am looking for a way to convert PDF files into text content.

    I vaguely remember using Ghostscript for that...

    Cheers,
    NC

    • Joe Blow

      #3
      Re: pdf to text

      I have not used pdftotext via system(). However, you could try different
      versions of pdftotext. In my experience, behavior can differ quite a
      bit from one version to the next, and different versions should be
      easily available. It's also possible to use ascii2txt, which depends on
      Ghostscript, I think. When I tried it I got into a muddle of versions
      though, and pdftotext was much easier.

      • Miguel Cruz

        #4
        Re: pdf to text

        Justin Koivisto <justin@koivi.com> wrote:
        > I am looking for a way to convert PDF files into text content. I don't
        > care about layout or formatting, just the plain text that I can use to
        > search against in a database.
        >
        > I've looked into the pdftotext tool from:
        > http://www.foolabs.com/xpdf/download.html
        >
        > However, when I use it via the command line, it works fine. If I issue
        > the same command via a system() call, there are major problems that
        > cause the server to crash. (Don't know why, there aren't any error
        > messages being generated anywhere.)

        What do you mean when you say the server crashes? The Apache process
        dies? The entire machine locks up? The server physically falls off the
        rack and lands on the floor?

        How about doing an experiment where you use system() to call a shell
        script that sets up some debugging and dumps the environment, and see
        what you come up with?
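
        A minimal sketch of what such a debugging script might look like.
        Everything named here is an assumption for illustration, not something
        from the original setup: the function name run_logged, the DEBUG_LOG
        variable, and the default log path are all hypothetical. It records the
        user, working directory, environment, and resource limits that the
        system() call actually sees, then runs the given command and logs its
        output and exit status.

```shell
#!/bin/sh
# Hypothetical debugging wrapper: point system() at this script instead of
# calling pdftotext directly. All names and paths here are illustrative.

# run_logged LOGFILE COMMAND [ARGS...]
# Appends a snapshot of the calling environment to LOGFILE, runs COMMAND,
# and appends its stdout, stderr, and exit status to the same file.
run_logged() {
    _log=$1
    shift
    {
        echo "=== $(date) ==="
        echo "user: $(id)"
        echo "cwd: $(pwd)"
        echo "command: $*"
        echo "--- environment ---"
        env
        echo "--- resource limits ---"
        ulimit -a
        echo "--- command output (stdout and stderr) ---"
    } >>"$_log" 2>&1

    # Run the real command; both output streams go to the log.
    "$@" >>"$_log" 2>&1
    _rc=$?
    echo "exit status: $_rc" >>"$_log"
    return "$_rc"
}

# When invoked with arguments, treat them as the command to run.
# DEBUG_LOG and the default path are assumptions; use any writable location.
if [ "$#" -gt 0 ]; then
    run_logged "${DEBUG_LOG:-/tmp/pdftotext-debug.log}" "$@"
fi
```

        Saved as, say, /path/to/debug_wrapper.sh (a made-up path), it could be
        called as "/path/to/debug_wrapper.sh pdftotext in.pdf out.txt" once
        from the command line and once via system(). Comparing the two log
        entries should show whether the user, PATH, working directory, or
        limits differ between the two cases.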

        miguel
        --
        Photos from 38 countries on 5 continents: http://travel.u.nu
        Latest photos: Australia; Malaysia; Burma; Thailand; Hong Kong
        Airports of the world: http://airport.u.nu
