Good search theory

  • AaronV

    Good search theory

    Hello,

    I'm a webmaster for a college newspaper and I'm implementing an article
    search. I'm running PHP with a MySQL database to store the weekly
    stories. Does anyone know of an article that offers good search
    theory?

    My top priority right now is supporting multiple search terms and
    sorting by relevance, based on how many times the terms appear.

    It's easy to search for a single word or term in a body of text. I can
    just use a MySQL "WHERE `body` LIKE '%term%'" query. But what about
    searching for two terms, or finding the most relevant document based
    on how many hits of the term are found?
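A minimal sketch of the hit-count ranking described above, done in plain Python rather than in MySQL. It assumes the articles are already loaded into a dict mapping a hypothetical article id to its body text. Each body is tokenized once and the occurrences of every query term are summed, so an article is scanned once per search rather than once per term.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_hits(query: str, articles: dict[int, str]) -> list[tuple[int, int]]:
    """Return (article_id, hits) pairs, most hits first; articles with no hits are dropped."""
    terms = set(tokenize(query))
    results = []
    for article_id, body in articles.items():
        # Count every word once per article instead of running one search per term.
        counts = Counter(tokenize(body))
        hits = sum(counts[term] for term in terms)
        if hits:
            results.append((article_id, hits))
    # Most hits first; ties fall back to article id so the order is stable.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results
```

For example, with three short articles, `rank_by_hits("campus parking", articles)` puts an article that mentions "parking" twice ahead of one that mentions "campus" once. Raw hit counts favor long articles, which is one reason real search tools weight terms more carefully.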

    I imagine I would split up the search query and run multiple "LIKE
    '%term%'" queries to find multiple hits. I would have to cap the number
    of searches at some arbitrary limit, because searching each article 50
    times is not an option.
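One way to avoid running a separate query per term is to combine all the LIKE conditions into a single parameterized query. This sketch assumes a hypothetical `articles` table with `id`, `title`, and `body` columns, and a DB-API driver that uses the `%s` placeholder style (as MySQLdb and PyMySQL do). It escapes `%` and `_` in the user's terms so they match literally, relying on MySQL's default backslash escape character for LIKE.

```python
def escape_like(term: str) -> str:
    """Escape LIKE wildcards so they match literally (MySQL's default escape char is backslash)."""
    # Escape the backslash first so the escapes added below aren't doubled.
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_like_query(query: str, table: str = "articles") -> tuple[str, list[str]]:
    """Build one parameterized SELECT that requires every search term in `body`.

    `table` is interpolated directly, so it must come from your own code, never from user input.
    """
    terms = query.split()
    if not terms:
        raise ValueError("empty search query")
    # One LIKE condition per term, joined with AND; the driver fills in each %s safely.
    conditions = " AND ".join("`body` LIKE %s" for _ in terms)
    params = [f"%{escape_like(term)}%" for term in terms]
    return f"SELECT id, title FROM {table} WHERE {conditions}", params
```

This only filters; it does not rank. A pattern with a leading `%` also prevents MySQL from using an ordinary index on `body`, so every row is still scanned.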

    There seem to be a lot of choices in how to set up a good search
    system, and I'd like to get started on the right foot to reduce my
    workload.

    -Aaron

  • Sacs

    #2
    Re: Good search theory

    You could look at fulltext searches.



    Look especially at MATCH ... AGAINST, which returns a relevance score for each result.
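A sketch of the MATCH ... AGAINST form, assuming the same hypothetical `articles` table with a FULLTEXT index on `body` (created with something like `ALTER TABLE articles ADD FULLTEXT (body)`; on older MySQL versions FULLTEXT indexes require the MyISAM storage engine). It builds the query text and parameters for a `%s`-style driver. In natural-language mode MATCH returns a relevance score, so the same search string is bound twice: once to compute the score column and once in the WHERE clause.

```python
def build_fulltext_query(query: str, table: str = "articles") -> tuple[str, list[str]]:
    """Build a natural-language MATCH ... AGAINST query, best matches first.

    Assumes a FULLTEXT index on `body`. `table` must be a trusted constant, not user input.
    """
    sql = (
        "SELECT id, title, MATCH(body) AGAINST (%s) AS score "
        f"FROM {table} "
        "WHERE MATCH(body) AGAINST (%s) "
        "ORDER BY score DESC"
    )
    # Bind the search string twice: once for the score column, once for the filter.
    return sql, [query, query]
```

With a driver such as PyMySQL, the pair would be passed as `cursor.execute(sql, params)`.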

    Sacs


    • Mike Ash

      #3
      Re: Good search theory

      Since your search will be done on a body of text, I would suggest using
      MySQL's fulltext search. It is more efficient and accurate than using
      simple LIKE queries, and it also lets you rank the results by relevance.
      None of the searches I've built over the years has worked exactly
      right, but fulltext is as close as I've ever gotten. Below are some
      links that will hopefully point you in the right direction.
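Since multiple search terms were the top priority, here is a sketch using MySQL's IN BOOLEAN MODE, where prefixing a word with `+` makes it required. It assumes the same hypothetical table and `%s`-style driver. The helper keeps only word characters from the user's input, so characters MySQL treats as boolean operators (such as `-`, `*`, or quotes) are not passed through. Boolean-mode searches do not sort rows by relevance automatically, hence the explicit ORDER BY; depending on the storage engine, the score may be coarser than in natural-language mode.

```python
import re


def to_boolean_mode(query: str) -> str:
    """Turn 'campus parking' into '+campus +parking' so every word is required."""
    # Keep only word characters, dropping anything MySQL would read as an operator.
    words = re.findall(r"\w+", query)
    return " ".join(f"+{word}" for word in words)


def build_boolean_query(query: str, table: str = "articles") -> tuple[str, list[str]]:
    """Require all terms IN BOOLEAN MODE and order by MySQL's score explicitly."""
    expr = to_boolean_mode(query)
    sql = (
        "SELECT id, title, MATCH(body) AGAINST (%s IN BOOLEAN MODE) AS score "
        f"FROM {table} "  # `table` must be a trusted constant, not user input
        "WHERE MATCH(body) AGAINST (%s IN BOOLEAN MODE) "
        "ORDER BY score DESC"
    )
    return sql, [expr, expr]
```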





      Mike


      • Chung Leong

        #4
        Re: Good search theory


        Just let Google do it.



        • nospam@geniegate.com

          #5
          Re: Good search theory

          In <1110915682.132517.134550@l41g2000cwc.googlegroups.com>, "AaronV" <aaron.vanderpoel@gmail.com> wrote:
          >Hello,
          >
          >I'm a webmaster for a college newspaper and I'm implementing an article
          >search. I'm running PHP with a MySQL database to store the weekly
          >stories. Does anyone know of an article that could offer good search
          >theory.

          If it's an option for you, have a look at Swish-e.



          I don't know whether there is a PHP interface, though. It's somewhat
          difficult to set up, but the folks who wrote it did a really good job.
          There are all kinds of ways to configure Swish-e for META tags and the
          like.

          Proximity and phrase searches are tricky to get right, but Swish-e
          handles them.
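To illustrate why phrases are tricky: an index that only records which documents contain a word cannot tell "student senate" from "senate student". A common approach is a positional index that also stores each word's offsets. This is a toy in-memory sketch of that idea, not how Swish-e itself is implemented; the document ids and texts are illustrative.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(docs):
    """Map each word to {doc_id: [offsets where the word occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index


def phrase_search(index, phrase):
    """Return sorted ids of documents containing the phrase's words consecutively."""
    words = tokenize(phrase)
    if not words:
        return []
    # First narrow to documents that contain every word somewhere.
    candidates = set(index.get(words[0], {}))
    for word in words[1:]:
        candidates &= set(index.get(word, {}))
    matches = []
    for doc_id in candidates:
        # A start offset p works only if word i appears at offset p + i for every i.
        starts = set(index[words[0]][doc_id])
        for i, word in enumerate(words[1:], start=1):
            starts &= {p - i for p in index[word][doc_id]}
        if starts:
            matches.append(doc_id)
    return sorted(matches)
```

Proximity search ("within N words") extends the same idea by allowing a range of offsets instead of exactly `p + i`.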

          If swish-e won't work another option might be Lucene:



          It's been a few years, but when I looked into it, Lucene was quite good
          as well. It's Java, which may be an issue if you're not already running
          servlets. It's surprisingly fast, especially considering it's Java.

          Another option is ht://Dig.



          Last I checked, it didn't do phrase matching, but it's quite mature: it
          has been around a long time and plenty of people use it. It's the easiest
          of these to set up. If you don't require phrase matching, it's pretty
          decent.

          All of the tools I've listed build an index and scale pretty well. I
          wouldn't try to use them in place of teoma.com (with the possible exception
          of several Lucene instances), but I bet they would work well for your
          application.
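A toy illustration of what "use an index" means here: instead of rescanning every article on each query (as LIKE does), an inverted index maps each word to the articles containing it and is built ahead of time. This sketch also stores per-article counts so results can be ranked by hits, as Aaron wanted. It is a simplified in-memory model, not the actual implementation of any of these tools.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy inverted index: word -> {doc_id: number of occurrences in that doc}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        # Done once per article when it's published, not on every search.
        for word, count in Counter(tokenize(text)).items():
            self.postings[word][doc_id] = count

    def search(self, query):
        """Return (doc_id, total_hits) for docs containing every query word, best first."""
        words = set(tokenize(query))
        if not words:
            return []
        # Only the posting lists of the query words are read; other articles are never touched.
        hits = set.intersection(*(set(self.postings.get(w, {})) for w in words))
        scored = [(d, sum(self.postings[w][d] for w in words)) for d in hits]
        return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

The cost of a search now depends on how many articles contain the query words, not on the total size of the archive, which is why index-based tools hold up better as the number of stories grows.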

          One could probably fill a small library (or at least a full section of a
          library) with books on the subject of searching full text. 'tis not an easy
          task.

          >Seems like there are a lot of choices in how to set up a good search
          >system and I'd like to get started on the right foot to reduce my work
          >load.

          Maybe I'm prejudiced, but in my opinion SQL databases are not really designed
          for full-text searching. (It's been a while, but I've been burned by them for
          fulltext search in the past.) I suppose an SQL database would work for a few
          hundred articles and/or highly custom search tools. (If your articles are in
          XML, such a database would be OK for searching titles, or perhaps within
          predetermined XML containers like <var>..</var>.)

          The "issue" I take with them is that you are effectively using a database
          AS an index. A database's primary goal is (or should be) data storage. Fulltext
          indices are a different beast altogether.

          They are excellent for building a "proof of concept" prototype, but they
          quickly break down with larger quantities of data. (This opinion is based
          on a context-aware search tool I worked on in 1999; six years is a long
          time, and things may have changed.)

          They do make good stores for URLs, last-indexed times, and things like that.

          Jamie
          --
          http://www.geniegate.com Custom web programming
          guhzo_42@lnubb.pbz (rot13) User Management Solutions


          • AaronV

            #6
            Re: Good search theory

            Thanks for the many solutions, everyone. I'll start with fulltext
            because it will take the least effort to get something rudimentary
            working in short order. I'll examine the other options as well.

