Full text search in PDF and Word files ?

  • Ned Baldessin

    Full text search in PDF and Word files ?

    Hi,

    I need to perform full text searches on a batch of PDF and Word files.
    What is the best way to go?

    After some research, I'm thinking of extracting the plain text from the
    files with "pdftotext" and "catdoc", harmonizing the various possible
    encodings to UTF-8, storing the text in a MySQL database, and then
    using the full text search capabilities of MySQL.
    Do you think that would work well? I am told that the files are mostly
    text and won't be longer than 30 pages.
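
    A minimal sketch of that pipeline in Python, for illustration only.
    The function names, the "documents" table, and its columns are
    hypothetical. It assumes "pdftotext" and "catdoc" are on the PATH and
    asks both tools for UTF-8 output, with a decoding fallback in case a
    file still comes back in a legacy encoding. The SQL strings are meant
    to be passed to whatever MySQL driver is in use; the "%s" placeholder
    style is an assumption about that driver.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping from file extension to an extraction command.
# With these arguments both tools write the extracted text to stdout.
EXTRACTORS = {
    ".pdf": lambda p: ["pdftotext", "-enc", "UTF-8", str(p), "-"],
    ".doc": lambda p: ["catdoc", "-d", "utf-8", str(p)],
}

# Assumed schema: one row per file, with a FULLTEXT index on the body.
# FULLTEXT indexes need MyISAM, or InnoDB on MySQL 5.6 and later.
CREATE_TABLE_SQL = """
CREATE TABLE documents (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    path VARCHAR(512) NOT NULL,
    body MEDIUMTEXT,
    FULLTEXT (body)
) DEFAULT CHARSET=utf8mb4
"""

INSERT_SQL = "INSERT INTO documents (path, body) VALUES (%s, %s)"

# Natural-language full text query, best matches first.
SEARCH_SQL = """
SELECT path, MATCH(body) AGAINST (%s) AS score
FROM documents
WHERE MATCH(body) AGAINST (%s)
ORDER BY score DESC
"""


def extraction_command(path):
    """Return the command line used to extract plain text from `path`."""
    suffix = Path(path).suffix.lower()
    if suffix not in EXTRACTORS:
        raise ValueError(f"no extractor for {suffix!r}")
    return EXTRACTORS[suffix](path)


def to_utf8_text(raw, candidates=("utf-8", "cp1252")):
    """Decode bytes by trying each candidate encoding in order.

    latin-1 maps every byte to a character, so the final fallback
    never raises.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1")


def extract_text(path):
    """Run the matching extractor and return its output as a str."""
    result = subprocess.run(
        extraction_command(path), check=True, capture_output=True
    )
    return to_utf8_text(result.stdout)
```

    Each file would then be stored with INSERT_SQL, passing the path and
    the result of extract_text(path), and searched with SEARCH_SQL,
    passing the search terms for both placeholders.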

    Thanks.

    --
    My email address doesn't ride a horse.

  • James

    #2
    Re: Full text search in PDF and Word files ?


    I do this with Oracle Text. The documents are not stored in the
    database; Oracle is just used to index them (I store a file path
    and filename). Of course I do other things with Oracle as well, but
    this has been a superb solution for me, and faster than you could
    ever believe.

    Essentially you get to search unlimited documents in their native
    format without actually having to do any real work for it.
