Query takes 17 seconds, suggestions needed

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Ondernemer

    Query takes 17 seconds, suggestions needed

    I am building a website which stores the headers of newsgroup articles in a
    database so people can search through the headers using keywords.

    My queries use WHERE LIKE statements.

    CREATE TABLE `headers` (
    `header_id` int(11) NOT NULL auto_increment,
    `header_num` int(11) NOT NULL default '0',
    `message_id` varchar(150) NOT NULL default '',
    `subject` varchar(200) NOT NULL default '',
    `poster` varchar(75) NOT NULL default '',
    `postdate` int(11) default NULL,
    `newsgroup` varchar(100) NOT NULL default '',
    `partnum` int(11) NOT NULL default '0',
    `parttotal` int(11) NOT NULL default '0',
    `numlines` int(11) NOT NULL default '0',
    `bytes` int(11) NOT NULL default '0',
    `date_added` timestamp(14) NOT NULL,
    `sortfield` varchar(200) NOT NULL default '',
    PRIMARY KEY (`header_id`)
    ) TYPE=MyISAM

    I query on the SORTFIELD column.

    The SORTFIELD column is a string, made up of SUBJECT. Using RegExps I cut
    off things like RE: so that I can group similar subjects and sort on that
    field.

    Currently I have 5 million records in this table and searching the table
    takes 17 seconds to generate an output.

    I have tried using FULLTEXT indexes but I dropped this because FULLTEXT does
    not allow partial searches e.g. wildcard-searches.

    How can I speed up my results? I'm very much in the dark here. I want the
    user to be able to do a GLOBAL search and not limit them by making them
    select the newsgroup to search in before they submit theit keywords.

    Any help would be very much appreciated.



  • Bill Karwin

    #2
    Re: Query takes 17 seconds, suggestions needed

    Ondernemer wrote:[color=blue]
    > My queries use WHERE LIKE statements.[/color]

    LIKE cannot use an index unless the pattern to which you compare started
    with a non-wildcard. For instance, LIKE '%string' cannot use an index,
    but LIKE 'string%' might be able to use an index.

    Information on MySQL query performance and optimization is in the online
    MySQL documentation. Try reading section 7.2 and its subsections:


    It's nice that you showed us the table structure, but I don't think
    folks can help very much unless you also show the query you're using.

    Regards,
    Bill K.

    Comment

    • Rich R

      #3
      Re: Query takes 17 seconds, suggestions needed


      "Bill Karwin" <bill@karwin.co m> wrote in message
      news:cp4tae0246 f@enews4.newsgu y.com...[color=blue]
      > Ondernemer wrote:[color=green]
      > > My queries use WHERE LIKE statements.[/color]
      >
      > LIKE cannot use an index unless the pattern to which you compare started
      > with a non-wildcard. For instance, LIKE '%string' cannot use an index,
      > but LIKE 'string%' might be able to use an index.
      >
      > Information on MySQL query performance and optimization is in the online
      > MySQL documentation. Try reading section 7.2 and its subsections:
      > http://dev.mysql.com/doc/mysql/en/Query_Speed.html
      >
      > It's nice that you showed us the table structure, but I don't think
      > folks can help very much unless you also show the query you're using.
      >
      > Regards,
      > Bill K.[/color]

      Hi Bill,
      He at least included the SHOW CREATE TABLE, which is more than most posters
      do.
      Doesn't that show you the table structure?

      Rich


      Comment

      • Bill Karwin

        #4
        Re: Query takes 17 seconds, suggestions needed

        Rich R wrote:[color=blue]
        > He at least included the SHOW CREATE TABLE, which is more than most posters
        > do.[/color]

        Yes, and that's great! I commend Ondernemer for that.
        My comment was that he _did_ show the table structure, but _didn't_ show
        the query he was using. It's easiest to give some diagnosis or advice
        if we see both the table structure and the query.

        He only mentioned that he was using a LIKE expression on the SORTFIELD
        column. But he does not give the pattern against which he is trying to
        match SORTFIELD. So I can't tell if the expression would benefit from
        an index or not. If one uses wildcards at the start of a LIKE pattern,
        the search cannot use an index.

        Regards,
        Bill K.

        Comment

        • Ondernemer

          #5
          Re: Query takes 17 seconds, suggestions needed

          "Bill Karwin" wrote:[color=blue]
          > My comment was that he _did_ show the table structure, but _didn't_ show
          > the query he was using. It's easiest to give some diagnosis or advice if
          > we see both the table structure and the query.[/color]

          The query is pretty straight forward.

          Part of the function:

          function searchHeaders($ method, $keyword) {

          switch ($method) {

          case "subject" :
          $col = "sortfield" ;
          break;

          case "poster" :
          $col = "poster";
          break;

          case "ftd" :
          $col = "ftdnum";
          break;
          }

          $result = mysql_query ("SELECT h.sortfield, h.poster, h.postdate,
          count(h.partnum ) as parts,h.parttot al, n.newsgroup
          FROM headers h
          LEFT JOIN newsgroups n ON n.newsgroup_id = h.newsgroup
          WHERE $col like '%$keyword%'

          group by h.sortfield
          order by h.sortfield, h.partnum
          limit 0,250
          ");

          The LIKE statement and GROUP BY/ORDER BY are the bad guys. Doing an EXACT
          search, thus not using LIKE generates an output in 0.002 seconds.

          How would I set up my table so that partial searches go fast? I mean...
          Google has a newsgroup search and it doesnt take long.. how do they handle
          partial searches?



          Comment

          • Bill Karwin

            #6
            Re: Query takes 17 seconds, suggestions needed

            Ondernemer wrote:[color=blue]
            > How would I set up my table so that partial searches go fast? I mean...
            > Google has a newsgroup search and it doesnt take long.. how do they handle
            > partial searches?[/color]

            If we knew that, we could IPO and trade at $170 a share! ;-)

            But seriously, you can often solve these problems as a tradeoff between
            storage cost and computational cost. You can set up extra tables to
            create a mapping between keywords and each article that matches the
            keyword by subject, poster, or ftdnum. In other words, do a lot of the
            work of searching before the user asks for it, and store the results.

            Regards,
            Bill K.

            Comment

            • Ondernemer

              #7
              Re: Query takes 17 seconds, suggestions needed


              "Bill Karwin" wrote:
              [color=blue]
              > But seriously, you can often solve these problems as a tradeoff between
              > storage cost and computational cost. You can set up extra tables to
              > create a mapping between keywords and each article that matches the
              > keyword by subject, poster, or ftdnum. In other words, do a lot of the
              > work of searching before the user asks for it, and store the results.[/color]

              Yeah, I was thinking of using Regular Expressions to get words from the
              subject and storing them as keywords, in a different keyword table with
              mappings to the articles.

              I just don't want to create a larger database of keywords.

              How would you have set up this database?


              Comment

              Working...