Getting text from PDF

  • Klaus Jensen

    Getting text from PDF

    Hi!

    I need to extract all text from PDF files for full-text indexing purposes.
    How do I do that?

    I have looked at several PDF components, but none of them can read the
    text in a PDF; they only create PDFs.

    Using an application (or indexing service) to search the PDFs is not what I
    need. I need to extract the text and store it in a database.

    Any pointers and help will be greatly appreciated.

    Thanks in advance

    Klaus Jensen



  • Brian Henry

    #2
    Re: Getting text from PDF

    If you are using SQL Server, all you need to do is install the Adobe PDF
    IFilter, and it will full-text index the PDFs for you automatically.


    • Ken Tucker [MVP]

      #3
      Re: Getting text from PDF

      Hi,

      http://www.codeproject.com/showcase/TallComponents.asp

      Ken

      • Klaus Jensen

        #4
        Re: Getting text from PDF

        "Brian Henry" <nospam@nospam.com> wrote in message
        news:u8gB5Df9FHA.2176@TK2MSFTNGP14.phx.gbl...
        > if you are using SQL Server all you need to do is install the adobe PDF
        > Ifilter and it will full text index it for you automatically

        Hi Brian

        Thanks for your response!

        Unfortunately, that would mean having to store the PDFs in the SQL Server,
        and I am talking about 1 GB of data a day... I'm afraid it is not an option.

        - Klaus



        • Klaus Jensen

          #5
          Re: Getting text from PDF

          "Ken Tucker [MVP]" <vb2ae@bellsouth.net> wrote in message
          news:umUpGnm9FHA.1028@TK2MSFTNGP11.phx.gbl...
          > http://www.codeproject.com/showcase/TallComponents.asp

          Hi Ken

          Thanks for your reply, I'll look into it.

          - Klaus
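
The constraint from post #4 (leave the PDFs on disk and store only their extracted text in a database) could be sketched roughly as follows. This is only an illustration under stated assumptions, not something proposed in the thread: it uses Python with the third-party pypdf package as the extractor and the standard-library sqlite3 module as a stand-in for the real database. The table name `pdf_text` and the function names are hypothetical.

```python
import sqlite3
from typing import Callable, Iterable, List


def extract_text_pypdf(pdf_path: str) -> str:
    """Extract text with the third-party pypdf package (an assumption)."""
    from pypdf import PdfReader  # lazy import: the rest runs without pypdf

    reader = PdfReader(pdf_path)
    # Image-only (scanned) pages carry no text layer, so fall back to "".
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) pdf_text table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pdf_text ("
        " path TEXT PRIMARY KEY,"
        " body TEXT NOT NULL)"
    )
    return conn


def index_pdfs(
    conn: sqlite3.Connection,
    pdf_paths: Iterable[str],
    extract: Callable[[str], str] = extract_text_pypdf,
) -> int:
    """Store path + extracted text only; the PDF bytes stay on disk."""
    count = 0
    for path in pdf_paths:
        body = extract(str(path))
        # Re-indexing the same path overwrites the earlier row.
        conn.execute(
            "INSERT OR REPLACE INTO pdf_text (path, body) VALUES (?, ?)",
            (str(path), body),
        )
        count += 1
    conn.commit()
    return count


def search(conn: sqlite3.Connection, term: str) -> List[str]:
    """Naive substring search; a real setup would use a full-text index."""
    rows = conn.execute(
        "SELECT path FROM pdf_text WHERE body LIKE ? ORDER BY path",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]
```

The extractor is passed in as a parameter, so a different PDF component can be swapped in without touching the storage code. Only the text and the file path reach the database, which keeps its growth tied to the amount of text rather than to the size of the PDFs.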

