MySQL and searching TEXT fields

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Michi

    MySQL and searching TEXT fields

    I was wondering what the best solution is for making large numbers of
    TEXT (or BLOB?) fields searchable. For example, if I have a forum,
    what is the best way to be able to search for specific words? How
    about exact phrases?

    I saw a solution where all words are preindexed in a "dictionary " like
    table and then another table stores the word matches. That seems
    really fast, but it has two major problems: 1) it can't do exact
    matches, and 2) the match table would get insanely big really fast.

    What is the "standard" way of doing this...?
  • Bill Karwin

    #2
    Re: MySQL and searching TEXT fields

    Michi wrote:
    [color=blue]
    > I was wondering what the best solution is for making large numbers of
    > TEXT (or BLOB?) fields searchable. For example, if I have a forum,
    > what is the best way to be able to search for specific words? How
    > about exact phrases?[/color]

    You should read MySQL's documentation on full-text searching, and see if
    it solves your requirements:



    Bill K.

    Comment

    • Michi

      #3
      Re: MySQL and searching TEXT fields

      Bill Karwin <bill@karwin.co m> wrote in message news:<ckupid11b mi@enews2.newsg uy.com>...[color=blue]
      > Michi wrote:
      >[color=green]
      > > I was wondering what the best solution is for making large numbers of
      > > TEXT (or BLOB?) fields searchable. For example, if I have a forum,
      > > what is the best way to be able to search for specific words? How
      > > about exact phrases?[/color]
      >
      > You should read MySQL's documentation on full-text searching, and see if
      > it solves your requirements:
      >
      > http://dev.mysql.com/doc/mysql/en/Fulltext_Search.html
      >
      > Bill K.[/color]

      There seems to be a large consensus that the searching is very slow.
      When I said "large" I really meant in the few hundred thousand to
      millions of records. Would you say this is really the best solution
      for large scale applications?

      Comment

      • Bill Karwin

        #4
        Re: MySQL and searching TEXT fields

        Michi wrote:[color=blue]
        > There seems to be a large consensus that the searching is very slow.
        > When I said "large" I really meant in the few hundred thousand to
        > millions of records.[/color]

        Yeah, I pretty much ignore when people say "large number" because it's
        so subjective. A small retail shop calls it large database if it's
        20MB, while the IRS might call it a large database if it's 200TB (maybe
        not even then).
        [color=blue]
        > Would you say this is really the best solution
        > for large scale applications?[/color]

        Maybe not. The "dictionary table" you mentioned seems more likely.

        Yes, it will have a lot of rows, but they don't all have to store the
        word being searched; you can store a table of words, and assign an
        integer to each one. And an integer to each record with the text blob.
        Then you have a words_in_blob table where each record is simply two
        integers that are foreign keys to the other tables.

        You could also populate that dictionary table only with words that are
        actually chosen by users as search criteria, instead of every word that
        occurs in the text. The first person to use a given word might take an
        extra performance hit as their word is indexed and added to the lookup
        table. But subsequent searches for that word will be very fast. Faster
        even than if you had pre-filled the entire dictionary into the table,
        since naturally the table and its index will be smaller.

        Or you could try using a full-text index for a while, and log the most
        frequently chosen search words. Then switch the implementation over to
        the dictionary table method, and prepopulate it with words for which
        people have actually searched.

        There is usually a tradeoff between cost of storage and cost of
        computation. Luckily, the cost of storage these days is very favorable.

        Regards,
        Bill K.

        Comment

        • steve

          #5
          Re: Re: MySQL and searching TEXT fields

          "Bill Karwin1" wrote:[color=blue]
          > Michi wrote:[color=green]
          > > There seems to be a large consensus that the searching is[/color]
          > very slow.[color=green]
          > > When I said "large" I really meant in the few hundred[/color]
          > thousand to[color=green]
          > > millions of records.[/color]
          >
          > Yeah, I pretty much ignore when people say "large number"
          > because it's
          > so subjective. A small retail shop calls it large database if
          > it's
          > 20MB, while the IRS might call it a large database if it's
          > 200TB (maybe
          > not even then).
          >[color=green]
          > > Would you say this is really the best solution
          > > for large scale applications?[/color]
          >
          > Maybe not. The "dictionary table" you mentioned seems more
          > likely.
          >
          > Yes, it will have a lot of rows, but they don't all have to
          > store the
          > word being searched; you can store a table of words, and
          > assign an
          > integer to each one. And an integer to each record with the
          > text blob.
          > Then you have a words_in_blob table where each record is
          > simply two
          > integers that are foreign keys to the other tables.
          >
          > You could also populate that dictionary table only with words
          > that are
          > actually chosen by users as search criteria, instead of every
          > word that
          > occurs in the text. The first person to use a given word
          > might take an
          > extra performance hit as their word is indexed and added to
          > the lookup
          > table. But subsequent searches for that word will be very
          > fast. Faster
          > even than if you had pre-filled the entire dictionary into the
          > table,
          > since naturally the table and its index will be smaller.
          >
          > Or you could try using a full-text index for a while, and log
          > the most
          > frequently chosen search words. Then switch the
          > implementation over to
          > the dictionary table method, and prepopulate it with words for
          > which
          > people have actually searched.
          >
          > There is usually a tradeoff between cost of storage and cost
          > of
          > computation. Luckily, the cost of storage these days is very
          > favorable.
          >
          > Regards,
          > Bill K.[/color]

          Mysql full text is slow for large tables, BUT the slowness in my
          opinion is due to the large amount of time to retrieve the large
          fulltext index file from disk to memory. So the easy solution (if you
          can) is to simply force the fulltext index to sit in memory.
          Building on that, if you can have all the indecis sit in memory at all
          times, then your performance is great. In mysql, you can dedicate a
          chunk of ram to index cache, and if that area is larger than the sum
          of all your indecis, then you will run like a speeding bullet.

          --
          http://www.dbForumz.com/ This article was posted by author's request
          Articles individually checked for conformance to usenet standards
          Topic URL: http://www.dbForumz.com/mySQL-search...ict161795.html
          Visit Topic URL to contact author (reg. req'd). Report abuse: http://www.dbForumz.com/eform.php?p=548290

          Comment

          Working...