MySQL Speed

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Ridge Burner

    MySQL Speed

    Can someone tell me which of these 2 SQL queries will be more efficient? I'm
    having a debate with another guy about which would be less resource
    intensive for MySQL.

    The first uses MySQL to pick a random row in a single statement:
    <?php
    $sql = "SELECT FROM myTable ORDER BY RAND() LIMIT 1";
    $query = mysql_query($sq l,$conn);
    ?>


    The second does the same thing, but uses 2 queries to do it:
    <?php
    $sqlA = "SELECT COUNT(id) FROM myTable";
    $queryA = mysql_query($sq lA,$conn);
    $num = mysql_result($q ueryA,0,0);
    $random = rand(1,$num);
    $sqlB = "SELECT FROM myTable WHERE id='" . $random . "' LIMIT 1";
    ?>

    Basically, which approach would be better to use in a high traffic
    environment to retrieve a single random row?


    Thanks for any help/advice you can give!

    ps. the code may not be 100% correct, it is for demonstration purposes only!


  • Chung Leong

    #2
    Re: MySQL Speed


    Ridge Burner wrote:[color=blue]
    > Can someone tell me which of these 2 SQL queries will be more efficient? I'm
    > having a debate with another guy about which would be less resource
    > intensive for MySQL.
    >
    > The first uses MySQL to pick a random row in a single statement:
    > <?php
    > $sql = "SELECT FROM myTable ORDER BY RAND() LIMIT 1";
    > $query = mysql_query($sq l,$conn);
    > ?>
    >
    >
    > The second does the same thing, but uses 2 queries to do it:
    > <?php
    > $sqlA = "SELECT COUNT(id) FROM myTable";
    > $queryA = mysql_query($sq lA,$conn);
    > $num = mysql_result($q ueryA,0,0);
    > $random = rand(1,$num);
    > $sqlB = "SELECT FROM myTable WHERE id='" . $random . "' LIMIT 1";
    > ?>
    >
    > Basically, which approach would be better to use in a high traffic
    > environment to retrieve a single random row?[/color]

    The problem here is that the second query is wrong for what you're
    trying to do. The number of rows in a table doesn't necessarily have
    anything kind of relationship with the primary keys. Rows could have
    been deleted or the seed value of the auto-increment column might not
    have been 1.

    Comment

    • Ridge Burner

      #3
      Re: MySQL Speed


      "Chung Leong" <chernyshevsky@ hotmail.com> wrote in message
      news:1146254836 .874212.306620@ e56g2000cwe.goo glegroups.com.. .[color=blue]
      >
      > Ridge Burner wrote:[color=green]
      >> Can someone tell me which of these 2 SQL queries will be more efficient?
      >> I'm
      >> having a debate with another guy about which would be less resource
      >> intensive for MySQL.
      >>
      >> The first uses MySQL to pick a random row in a single statement:
      >> <?php
      >> $sql = "SELECT FROM myTable ORDER BY RAND() LIMIT 1";
      >> $query = mysql_query($sq l,$conn);
      >> ?>
      >>
      >>
      >> The second does the same thing, but uses 2 queries to do it:
      >> <?php
      >> $sqlA = "SELECT COUNT(id) FROM myTable";
      >> $queryA = mysql_query($sq lA,$conn);
      >> $num = mysql_result($q ueryA,0,0);
      >> $random = rand(1,$num);
      >> $sqlB = "SELECT FROM myTable WHERE id='" . $random . "' LIMIT 1";
      >> ?>
      >>
      >> Basically, which approach would be better to use in a high traffic
      >> environment to retrieve a single random row?[/color]
      >
      > The problem here is that the second query is wrong for what you're
      > trying to do. The number of rows in a table doesn't necessarily have
      > anything kind of relationship with the primary keys. Rows could have
      > been deleted or the seed value of the auto-increment column might not
      > have been 1.
      >[/color]

      What if I change it to say something like 'SELECT COUNT(column1) FROM
      myTable WHERE live='0' LIMIT 1'

      I understand the primary keys may not always return a reliable result, but
      I'm questioning whether it is better to use one query to return a row vs.
      using 2 queries to return a random row.

      Right now we are experiencing issues with MySQL being able to keep up with
      all of the traffic. The first SQL statement is used for ad banners, and our
      hoster has mentioned that this is not the most efficient way to do what were
      doing. The only other way I can see to retrieve a random row is to query for
      the # of rows that match the criteria, then use that # to have PHP generate
      a random # between 1 & that #, then re-query MySQL for the row that PHP
      generated. After that I'll need to make sure that the row does actually
      exist in the table. If it doesn't, I have to start again.

      It's just to me, the second approach seems to have a lot more overhead in it
      than the first.





















      Comment

      • Rik

        #4
        Re: MySQL Speed

        Ridge Burner wrote:[color=blue]
        > Can someone tell me which of these 2 SQL queries will be more
        > efficient? I'm having a debate with another guy about which would be
        > less resource intensive for MySQL.[/color]
        Depends:
        <?php
        $db = mysql_connect() ;
        $conn = mysql_select_db ("testbase",$db );

        $start1 = microtime(true) ;
        for($i=1;$i<200 0;$i++){
        $sql = "SELECT * FROM wnk_huizen ORDER BY RAND() LIMIT 1";
        $query = mysql_query($sq l);
        $result = mysql_fetch_arr ay($query);
        }
        $end1 = microtime(true) ;

        $start2 = microtime(true) ;

        for($i=1;$i<200 0;$i++){
        $value = array();
        $sql = "SELECT id FROM wnk_huizen";
        $query = mysql_query($sq l);
        while($result = mysql_fetch_arr ay($query)){
        $value[] = $result;
        }
        $random = rand(0,count($v alue)-1);
        $sql = "SELECT * FROM wnk_huizen WHERE id='" . $value[$random] . "' LIMIT
        1";
        $query = mysql_query($sq l);
        $result = mysql_fetch_arr ay($query);
        }
        $end2 = microtime(true) ;
        echo "<br />result with ".count($value) ." records;<br />";
        $time1= $end1-$start1;
        $time2= $end2-$start2;
        echo "<br />Option 1:".$start1."-".$end1." :".$time1;
        echo "<br />Option 2:".$start2."-".$end2." :".$time2;
        ?>

        Result from my extremely slow testserver:

        result with 69 records;

        Option 1:1146258591.25-1146258626.6845 :35.43452620506 3
        Option 2:1146258626.68 45-1146258636.4802 :9.795674085617 1

        result with 2208 records;

        Option 1:1146259157.82 81-1146259527.2195 :369.3913819789 9
        Option 2:1146259527.21 95-1146259655.4201 :128.2005729675 3

        You'd think option 1 is slower, except if I use a different table:

        result with 613 records (id=int(4), primary key);

        Option 1:1146258636.48 04-1146258644.8055 :8.325155019760 1
        Option 2:1146258644.80 56-1146258683.6749 :38.86934089660 6

        I presume it's because the first table had "text" fields, the second
        integers and a 2 VARCHAR(50).
        for your reference:
        TABLE 1:
        Field Type Null Key Default Extra
        id int(5) PRI NULL auto_increment
        stad varchar(50)
        postcode varchar(7)
        adres varchar(70)
        page varchar(40)
        prijs varchar(20)
        short_desc text
        long_desc text
        status char(1)
        time timestamp YES CURRENT_TIMESTA MP

        TABLE 2
        Field Type Null Key Default Extra
        id int(5) PRI NULL auto_increment
        img_name varchar(50)
        img_huis_id int(5) 0
        img_huis_defaul t tinyint(1) 0
        img_huis_desc varchar(50)


        So, it's highly dependable on the database, I'm not going to waste time
        checking the details myself, but some people on the mysql newsgroup might
        now how ORDER BY RAND() is affected by type of fields in a table.

        Grtz,
        --
        Rik Wasmus


        Comment

        • Rik

          #5
          Re: MySQL Speed

          Ridge Burner wrote:[color=blue]
          > Right now we are experiencing issues with MySQL being able to keep up
          > The first SQL statement is used for ad
          > banners, and our hoster has mentioned that this is not the most
          > efficient way to do what were doing.[/color]

          See my other post. Í thought it was just an hypothetical(?) situation. For
          this specific problem you can simply test yourself: create the table on your
          local server, and test with for($i=1;$i<$nu mber;$i++) where $number is an
          arbitrary high number.

          Note: this is the time the script takes, not neccesarily how long MySQL
          takes. If it's just a case about MySQL which is to busy, and PHP has no
          problem keeping up, it's a simple choice to choose to do more processing in
          PHP.

          Then again, I'm no server admin. Maybe someone else with more experience can
          shed more light on the subject of processing by MySQL and PHP on production
          servers.

          (?)pfff, english, I hope it's the correct word and spelling)

          Grtz,
          --
          Rik Wasmus


          Comment

          • bobzimuta

            #6
            Re: MySQL Speed


            Ridge Burner wrote:[color=blue]
            > Can someone tell me which of these 2 SQL queries will be more efficient? I'm
            > having a debate with another guy about which would be less resource
            > intensive for MySQL.
            >
            > The first uses MySQL to pick a random row in a single statement:
            > <?php
            > $sql = "SELECT FROM myTable ORDER BY RAND() LIMIT 1";
            > $query = mysql_query($sq l,$conn);
            > ?>
            >
            >
            > The second does the same thing, but uses 2 queries to do it:
            > <?php
            > $sqlA = "SELECT COUNT(id) FROM myTable";
            > $queryA = mysql_query($sq lA,$conn);
            > $num = mysql_result($q ueryA,0,0);
            > $random = rand(1,$num);
            > $sqlB = "SELECT FROM myTable WHERE id='" . $random . "' LIMIT 1";
            > ?>
            >
            > Basically, which approach would be better to use in a high traffic
            > environment to retrieve a single random row?
            >
            >
            > Thanks for any help/advice you can give!
            >
            > ps. the code may not be 100% correct, it is for demonstration purposes only![/color]

            MySQL creates an internal counter for the number of rows in a table (at
            least MyISAM as far as I know). So doing "SELECT COUNT(*) FROM table"
            is instant. So as far as I know, using the COUNT(*) should be faster,
            as long as id is a unique (or primary) key. If in doubt, use the
            EXPLAIN syntax to see how many rows are being queried for the two
            different queries. Hope that helps.

            Comment

            • Gordon Burditt

              #7
              Re: MySQL Speed

              >>> The first uses MySQL to pick a random row in a single statement:[color=blue][color=green][color=darkred]
              >>> <?php
              >>> $sql = "SELECT FROM myTable ORDER BY RAND() LIMIT 1";
              >>> $query = mysql_query($sq l,$conn);
              >>> ?>
              >>>
              >>>
              >>> The second does the same thing, but uses 2 queries to do it:
              >>> <?php
              >>> $sqlA = "SELECT COUNT(id) FROM myTable";
              >>> $queryA = mysql_query($sq lA,$conn);
              >>> $num = mysql_result($q ueryA,0,0);
              >>> $random = rand(1,$num);
              >>> $sqlB = "SELECT FROM myTable WHERE id='" . $random . "' LIMIT 1";
              >>> ?>
              >>>
              >>> Basically, which approach would be better to use in a high traffic
              >>> environment to retrieve a single random row?[/color]
              >>
              >> The problem here is that the second query is wrong for what you're
              >> trying to do. The number of rows in a table doesn't necessarily have
              >> anything kind of relationship with the primary keys. Rows could have
              >> been deleted or the seed value of the auto-increment column might not
              >> have been 1.
              >>[/color]
              >
              >What if I change it to say something like 'SELECT COUNT(column1) FROM
              >myTable WHERE live='0' LIMIT 1'[/color]

              Consider seriously what happens when the result of this will *NEVER*
              match a value for id. For example, suppose id is always a credit
              card number (12-16 digits and unlikely to contain leading zeroes, as
              Amex, Mastercard, VISA, and Discover begin with 3, 4, 5, and 6, not
              necessarily in that order).
              [color=blue]
              >I understand the primary keys may not always return a reliable result, but
              >I'm questioning whether it is better to use one query to return a row vs.
              >using 2 queries to return a random row.[/color]

              Any code can be made infinitely fast and run in zero space if it doesn't
              have to return a correct answer. And I consider no banner ad to always
              be preferable to a banner ad.
              [color=blue]
              >Right now we are experiencing issues with MySQL being able to keep up with
              >all of the traffic. The first SQL statement is used for ad banners, and our
              >hoster has mentioned that this is not the most efficient way to do what were
              >doing. The only other way I can see to retrieve a random row is to query for
              >the # of rows that match the criteria, then use that # to have PHP generate
              >a random # between 1 & that #, then re-query MySQL for the row that PHP
              >generated. After that I'll need to make sure that the row does actually
              >exist in the table. If it doesn't, I have to start again.[/color]

              If you have 10 rows in your table numbered 77, 83, 84, 85, 89, 92, 93, 94,
              95, and 97, you will *NEVER* find a row that exists and it will take
              an INFINITE number of queries to generate the page.
              [color=blue]
              >It's just to me, the second approach seems to have a lot more overhead in it
              >than the first.[/color]

              Gordon L. Burditt

              Comment

              Working...