Optimising MySQL queries against huge databases?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Jasper Bryant-Greene

    Optimising MySQL queries against huge databases?

    I have a database of movie titles, with about 78,000 records, and a
    database of related people (directors, writers, actors/actresses etc.)
    with about 141,000 records. I display a random movie out of this
    database on each hit to my website's homepage.

    This worked fine when I had only a couple thousand movies, but now that
    the DB has grown, it seems to be taking a bit longer to process the
    page.

    My DB schema for each table:

    CREATE TABLE `movies` (
    `id` int(11) NOT NULL auto_increment,
    `name` varchar(200) NOT NULL default '',
    `uri` varchar(100) NOT NULL default '',
    `year` year(4) NOT NULL default '0000',
    `released` date NOT NULL default '0000-00-00',
    `imdb` int(11) NOT NULL default '0',
    `allmovies` varchar(25) NOT NULL default '',
    `length` int(11) NOT NULL default '0',
    `colour` enum('c','b') NOT NULL default 'c',
    `sound` varchar(25) NOT NULL default '',
    `director` int(11) NOT NULL default '0',
    `writer` int(11) NOT NULL default '0',
    `asin` varchar(25) NOT NULL default '',
    `image` blob NOT NULL,
    `genre` varchar(25) NOT NULL default '',
    `fatso` int(11) NOT NULL default '0',
    PRIMARY KEY (`id`),
    UNIQUE KEY `imdb` (`imdb`),
    KEY `name` (`name`),
    FULLTEXT KEY `name_2` (`name`)
    ) TYPE=MyISAM AUTO_INCREMENT= 78483 ;

    CREATE TABLE `people` (
    `id` int(11) NOT NULL auto_increment,
    `name` varchar(100) NOT NULL default '',
    `born` date NOT NULL default '0000-00-00',
    `died` date NOT NULL default '0000-00-00',
    `imdb` int(11) NOT NULL default '0',
    `allmusic` varchar(25) NOT NULL default '',
    `allmovies` varchar(25) NOT NULL default '',
    `uri` varchar(100) NOT NULL default '',
    `image` blob NOT NULL,
    PRIMARY KEY (`id`),
    UNIQUE KEY `imdb` (`imdb`),
    KEY `name` (`name`),
    FULLTEXT KEY `name_2` (`name`)
    ) TYPE=MyISAM AUTO_INCREMENT= 141623 ;

    The SQL query I am using to fetch a random movie is:

    SELECT movies.id, movies.name, movies.asin, movies.fatso, people.id AS
    directorid, people.name AS director
    FROM movies, people
    WHERE movies.image<>' ' AND people.id=movie s.director
    ORDER BY RAND() LIMIT 1;

  • Michael Vilain

    #2
    Re: Optimising MySQL queries against huge databases?

    In article <1100233324.681 881.71700@z14g2 000cwz.googlegr oups.com>,
    "Jasper Bryant-Greene" <jasperbg@gmail .com> wrote:
    [color=blue]
    > I have a database of movie titles, with about 78,000 records, and a
    > database of related people (directors, writers, actors/actresses etc.)
    > with about 141,000 records. I display a random movie out of this
    > database on each hit to my website's homepage.
    >
    > This worked fine when I had only a couple thousand movies, but now that
    > the DB has grown, it seems to be taking a bit longer to process the
    > page.
    >
    > My DB schema for each table:
    >
    > CREATE TABLE `movies` (
    > `id` int(11) NOT NULL auto_increment,
    > `name` varchar(200) NOT NULL default '',
    > `uri` varchar(100) NOT NULL default '',
    > `year` year(4) NOT NULL default '0000',
    > `released` date NOT NULL default '0000-00-00',
    > `imdb` int(11) NOT NULL default '0',
    > `allmovies` varchar(25) NOT NULL default '',
    > `length` int(11) NOT NULL default '0',
    > `colour` enum('c','b') NOT NULL default 'c',
    > `sound` varchar(25) NOT NULL default '',
    > `director` int(11) NOT NULL default '0',
    > `writer` int(11) NOT NULL default '0',
    > `asin` varchar(25) NOT NULL default '',
    > `image` blob NOT NULL,
    > `genre` varchar(25) NOT NULL default '',
    > `fatso` int(11) NOT NULL default '0',
    > PRIMARY KEY (`id`),
    > UNIQUE KEY `imdb` (`imdb`),
    > KEY `name` (`name`),
    > FULLTEXT KEY `name_2` (`name`)
    > ) TYPE=MyISAM AUTO_INCREMENT= 78483 ;
    >
    > CREATE TABLE `people` (
    > `id` int(11) NOT NULL auto_increment,
    > `name` varchar(100) NOT NULL default '',
    > `born` date NOT NULL default '0000-00-00',
    > `died` date NOT NULL default '0000-00-00',
    > `imdb` int(11) NOT NULL default '0',
    > `allmusic` varchar(25) NOT NULL default '',
    > `allmovies` varchar(25) NOT NULL default '',
    > `uri` varchar(100) NOT NULL default '',
    > `image` blob NOT NULL,
    > PRIMARY KEY (`id`),
    > UNIQUE KEY `imdb` (`imdb`),
    > KEY `name` (`name`),
    > FULLTEXT KEY `name_2` (`name`)
    > ) TYPE=MyISAM AUTO_INCREMENT= 141623 ;
    >
    > The SQL query I am using to fetch a random movie is:
    >
    > SELECT movies.id, movies.name, movies.asin, movies.fatso, people.id AS
    > directorid, people.name AS director
    > FROM movies, people
    > WHERE movies.image<>' ' AND people.id=movie s.director
    > ORDER BY RAND() LIMIT 1;[/color]

    what does the EXPLAIN <query> tell you? It should show the strategy
    used by optimizer. You may have to do some rephrasing. Have you posted
    this in a mysql group:




    --
    DeeDee, don't press that button! DeeDee! NO! Dee...



    Comment

    • Henk Verhoeven

      #3
      Re: Optimising MySQL queries against huge databases?

      Hi Jasper,

      I agree with Michael about the explain, but i did not try that, so the
      following are just guesses:
      - there is no index on movies.image so mySQL must sequentially search
      throuhg all movies that have a director (probably all or close to all 141000
      - there are probably MANY movies that have an image and a director.
      Unless MySQL has a special optimization for ORDER BY RAND() LIMIT ..
      it may have to random-sort all of them. That will take quite some time,
      i guess. So if an index on movies.image does not help, you may have to
      add AND id = RAND() to the SELECT condition, with the RAND function
      parameterized or multiplied and rounded or floored to get whole numbers
      between the lowest and highest id in the database. And then repeat the
      query until it does return a row. (if many rows have been removed you
      may have to pack the id's of movies)

      Success,

      Henk Verhoeven,
      www.phpPeanuts.org.


      Comment

      Working...