Median Function in mySQL

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Ross Contino

    Median Function in mySQL

    Hello to all:

    I have been searching the web for examples on how to determine a median
    value in a mySQL table. I have reviewed the article at
    http://mysql.progen.com.tr/doc/en/Gr...functions.html. I am an
    experienced VB programmer that has recently moved to PHP/mySQL. My employer
    has a text file outputted from a vendor specific software with data.
    However it cannot be manipulated because it is text. I created a web that
    reads the text file on client, converts it to usable data, inputs to mySQL
    table and then reflects HTML pages with desired report. One report
    subtracts two times and then my employer wishes a median grouped by another
    field. I used the AVG function, but of course this is mean, not Median. My
    query is:

    $mSql = "SELECT AdmitDoc,
    Count(AdmitDoc) as Number_Admits,
    sec_to_time(AVG (unix_timestamp (Outdt) -
    unix_timestamp( Decdt))) as AVG_Doc_To_Admi t,
    sec_to_time(AVG (unix_timestamp (Outdt) -
    unix_timestamp( Indt))) as AVG_Total_Stay
    from tadmits GROUP BY AdmitDoc";

    This query subtracts several wait times in an emergency department and
    groups by admitting physician. I would like to have median rather than
    mean. AdmitDoc is text field, Outdt, Decdt, Indt are all date/time fields.

    Any thoughts?

    Thanks,
    Ross


  • Steve

    #2
    Re: Median Function in mySQL

    Ross Contino wrote:[color=blue]
    > Hello to all:
    >
    > I have been searching the web for examples on how to determine a[/color]
    median[color=blue]
    > value in a mySQL table. I have reviewed the article at
    > http://mysql.progen.com.tr/doc/en/Gr...functions.html. I am an
    > experienced VB programmer that has recently moved to PHP/mySQL. My[/color]
    employer[color=blue]
    > has a text file outputted from a vendor specific software with data.
    > However it cannot be manipulated because it is text. I created a web[/color]
    that[color=blue]
    > reads the text file on client, converts it to usable data, inputs to[/color]
    mySQL[color=blue]
    > table and then reflects HTML pages with desired report. One report
    > subtracts two times and then my employer wishes a median grouped by[/color]
    another[color=blue]
    > field. I used the AVG function, but of course this is mean, not[/color]
    Median. My[color=blue]
    > query is:
    >
    > $mSql = "SELECT AdmitDoc,
    > Count(AdmitDoc) as Number_Admits,
    > sec_to_time(AVG (unix_timestamp (Outdt) -
    > unix_timestamp( Decdt))) as AVG_Doc_To_Admi t,
    > sec_to_time(AVG (unix_timestamp (Outdt) -
    > unix_timestamp( Indt))) as AVG_Total_Stay
    > from tadmits GROUP BY AdmitDoc";
    >
    > This query subtracts several wait times in an emergency department[/color]
    and[color=blue]
    > groups by admitting physician. I would like to have median rather[/color]
    than[color=blue]
    > mean. AdmitDoc is text field, Outdt, Decdt, Indt are all date/time[/color]
    fields.[color=blue]
    >
    > Any thoughts?
    >
    > Thanks,
    > Ross[/color]

    There is no MEDIAN() operator in MySQL, so you will have to perform
    this operation in several steps...

    (1) Count the number of rows in each group
    (2) For each group, if the count is odd, the median is the middle row
    (when the set is ordered); when the count is even, the median is the
    mean of the middle two rows.

    In skeletal form...

    // get counts for each set of interest...

    SELECT AdmitDoc, COUNT(*) AS "Number_Adm its"
    FROM tadmits
    GROUP BY AdmitDoc

    // for each AdmitDoc, work out LIMIT parameters (nb first row =
    0)...

    $doc = $datRow[ 'AdmitDoc' ];
    $count = $datRow[ Number_Admits];

    if $count is odd,
    $limit1 = intval( $count / 2 );
    $limit2 = 1;
    else
    $limit1 = intval( $count / 2 ) - 1;
    $limit2 = 2;

    // select middle row(s) from ORDERed set...

    SELECT AdmitDoc,
    SEC_TO_TIME(UNI X_TIMESTAMP(Out dt)-UNIX_TIMESTAMP( Decdt)) AS
    "Doc_To_Adm it"
    FROM tadmits
    WHERE AdmitDoc = $doc
    ORDER BY UNIX_TIMESTAMP( Outdt) - UNIX_TIMESTAMP( Decdt)
    LIMIT $limit1, $limit2


    if $count is odd,
    // only one row...
    $datRow = mysql_fetch_arr ay( $datRows, MYSQL_ASSOC );
    $median_doc_to_ admit = $datRow[ 'Doc_To_Admit' ];
    else
    // two rows
    $datRow = mysql_fetch_arr ay( $datRows, MYSQL_ASSOC );
    $val1 = $datRow[ 'Doc_To_Admit' ];
    $datRow = mysql_fetch_arr ay( $datRows, MYSQL_ASSOC );
    $val2 = $datRow[ 'Doc_To_Admit' ];
    $median_doc_to_ admit = ( $val1 + $val2 ) / 2;

    You'll have to do this separately for your two required measures
    because they each need a different sort order.

    ---
    Steve

    Comment

    • Steve

      #3
      Re: Median Function in mySQL


      Ross Contino wrote:[color=blue]
      > Hello to all:
      >
      > I have been searching the web for examples on how to determine a[/color]
      median[color=blue]
      > value in a mySQL table. I have reviewed the article at
      > http://mysql.progen.com.tr/doc/en/Gr...functions.html. I am an
      > experienced VB programmer that has recently moved to PHP/mySQL. My[/color]
      employer[color=blue]
      > has a text file outputted from a vendor specific software with data.
      > However it cannot be manipulated because it is text. I created a web[/color]
      that[color=blue]
      > reads the text file on client, converts it to usable data, inputs to[/color]
      mySQL[color=blue]
      > table and then reflects HTML pages with desired report. One report
      > subtracts two times and then my employer wishes a median grouped by[/color]
      another[color=blue]
      > field. I used the AVG function, but of course this is mean, not[/color]
      Median. My[color=blue]
      > query is:
      >
      > $mSql = "SELECT AdmitDoc,
      > Count(AdmitDoc) as Number_Admits,
      > sec_to_time(AVG (unix_timestamp (Outdt) -
      > unix_timestamp( Decdt))) as AVG_Doc_To_Admi t,
      > sec_to_time(AVG (unix_timestamp (Outdt) -
      > unix_timestamp( Indt))) as AVG_Total_Stay
      > from tadmits GROUP BY AdmitDoc";
      >
      > This query subtracts several wait times in an emergency department[/color]
      and[color=blue]
      > groups by admitting physician. I would like to have median rather[/color]
      than[color=blue]
      > mean. AdmitDoc is text field, Outdt, Decdt, Indt are all date/time[/color]
      fields.[color=blue]
      >
      > Any thoughts?
      >
      > Thanks,
      > Ross[/color]

      There is no MEDIAN() operator in MySQL, so you will have to perform
      this operation in several steps...

      (1) Count the number of rows in each group
      (2) For each group, if the count is odd, the median is the middle row
      (when the set is ordered); when the count is even, the median is the
      mean of the middle two rows.

      In skeletal form...

      // get counts for each set of interest...

      SELECT AdmitDoc, COUNT(*) AS "Number_Adm its"
      FROM tadmits
      GROUP BY AdmitDoc

      // for each AdmitDoc, work out LIMIT parameters (nb first row =
      0)...

      $doc = $datRow[ 'AdmitDoc' ];
      $count = $datRow[ Number_Admits];

      if $count is odd,
      $limit1 = intval( $count / 2 );
      $limit2 = 1;
      else
      $limit1 = intval( $count / 2 ) - 1;
      $limit2 = 2;

      // select middle row(s) from ORDERed set...

      SELECT AdmitDoc,
      SEC_TO_TIME(UNI X_TIMESTAMP(Out dt)-UNIX_TIMESTAMP( Decdt)) AS
      "Doc_To_Adm it"
      FROM tadmits
      WHERE AdmitDoc = $doc
      ORDER BY UNIX_TIMESTAMP( Outdt) - UNIX_TIMESTAMP( Decdt)
      LIMIT $limit1, $limit2


      if $count is odd,
      // only one row...
      $datRow = mysql_fetch_arr ay( $datRows, MYSQL_ASSOC );
      $median_doc_to_ admit = $datRow[ 'Doc_To_Admit' ];
      else
      // two rows
      $datRow = mysql_fetch_arr ay( $datRows, MYSQL_ASSOC );
      $val1 = $datRow[ 'Doc_To_Admit' ];
      $datRow = mysql_fetch_arr ay( $datRows, MYSQL_ASSOC );
      $val2 = $datRow[ 'Doc_To_Admit' ];
      $median_doc_to_ admit = ( $val1 + $val2 ) / 2;

      You'll have to do this separately for your two required measures
      because they each need a different sort order.

      ---
      Steve

      Comment

      • Ross Contino

        #4
        Re: Median Function in mySQL

        Hi Steve:

        I ended up writing a php function to select the medians from both values,
        but it entailed looping the data frequently and creating a new table to
        store it (see below). I could not quite get your method working. Would
        your method have only worked via SQL and preclude the constant looping and
        need for a table to store the medians?

        Median functions:

        /***********func tion block********** ***************/
        function GetMedian_Dec($ myDoc, $No_Admits)
        {
        $MyMedian = "";
        $LimitSql = "SELECT
        SEC_TO_TIME(UNI X_TIMESTAMP(Out dt)-UNIX_TIMESTAMP( Decdt))AS Doc_To_Admit
        FROM tadmits WHERE AdmitDoc = '" . $myDoc . "'" . "
        ORDER BY Doc_To_Admit";
        $result2 = mysql_query($Li mitSql);
        $i=1;
        $FirstTime = "";
        $SecondTime = "";
        If (($No_Admits % 2)==1){ //odd number of admissions
        $MyCounter = intval(($No_Adm its / 2) + 1);
        while($row3=mys ql_fetch_assoc( $result2)){
        if ($i ==$MyCounter){
        $UpSql = "UPDATE tersummary SET Median_MD_To_Ad mit = '"
        .. $row3['Doc_To_Admit'] . "' WHERE AdmitDoc = '" . $myDoc . "'";
        mysql_query($Up Sql);
        }//end if
        $i++;
        }//end while
        }
        else { //even number of admissions
        $MyCounter = intval($No_Admi ts / 2);
        while($row3=mys ql_fetch_assoc( $result2)){
        if ($i==$MyCounter ){
        $FirstTime = $row3['Doc_To_Admit'];
        }//end if
        if ($i==$MyCounter + 1){
        $SecondTime = $row3['Doc_To_Admit'];
        }//end if
        $i++;
        }//end while
        $AvgVal = (strtotime($Fir stTime) + strtotime($Seco ndTime)) /
        2;
        $UpSql = "UPDATE tersummary SET Median_MD_To_Ad mit = '" .
        date("H:i:s",$A vgVal) . "' WHERE AdmitDoc = '" . $myDoc . "'";
        mysql_query($Up Sql);
        }
        }
        function GetMedian_Stay( $myDoc, $No_Admits)
        { $MyMedian = "";
        $LimitSql = "SELECT
        SEC_TO_TIME(UNI X_TIMESTAMP(Out dt)-UNIX_TIMESTAMP( Indt))AS Total_Stay
        FROM tadmits WHERE AdmitDoc = '" . $myDoc . "'" . "
        ORDER BY Total_Stay";
        $result2 = mysql_query($Li mitSql);
        $i=1;
        $FirstTime = "";
        $SecondTime = "";
        If (($No_Admits % 2)==1){ //odd number of admissions
        $MyCounter = intval(($No_Adm its / 2) + 1);
        while($row3=mys ql_fetch_assoc( $result2)){
        if ($i ==$MyCounter){
        $UpSql = "UPDATE tersummary SET Median_Total_St ay = '"
        .. $row3['Total_Stay'] . "' WHERE AdmitDoc = '" . $myDoc . "'";
        mysql_query($Up Sql);
        }//end if
        $i++;
        }//end while
        }
        else { //even number of admissions
        $MyCounter = intval($No_Admi ts / 2);
        while($row3=mys ql_fetch_assoc( $result2)){
        if ($i==$MyCounter ){
        $FirstTime = $row3['Total_Stay'];
        }//end if
        if ($i==$MyCounter + 1){
        $SecondTime = $row3['Total_Stay'];
        }//end if
        $i++;
        }//end while
        $AvgVal = (strtotime($Fir stTime) + strtotime($Seco ndTime)) /
        2;
        $UpSql = "UPDATE tersummary SET Median_Total_St ay = '" .
        date("H:i:s",$A vgVal) . "' WHERE AdmitDoc = '" . $myDoc . "'";
        mysql_query($Up Sql);
        }//end if/else
        }


        /*************en d block********** *************** ***/


        Thanks,
        Ross
        [color=blue]
        >
        > There is no MEDIAN() operator in MySQL, so you will have to perform
        > this operation in several steps...
        >
        > (1) Count the number of rows in each group
        > (2) For each group, if the count is odd, the median is the middle row
        > (when the set is ordered); when the count is even, the median is the
        > mean of the middle two rows.
        >
        > In skeletal form...
        >
        > // get counts for each set of interest...
        >
        > SELECT AdmitDoc, COUNT(*) AS "Number_Adm its"
        > FROM tadmits
        > GROUP BY AdmitDoc
        >
        > // for each AdmitDoc, work out LIMIT parameters (nb first row =
        > 0)...
        >
        > $doc = $datRow[ 'AdmitDoc' ];
        > $count = $datRow[ Number_Admits];
        >
        > if $count is odd,
        > $limit1 = intval( $count / 2 );
        > $limit2 = 1;
        > else
        > $limit1 = intval( $count / 2 ) - 1;
        > $limit2 = 2;
        >
        > // select middle row(s) from ORDERed set...
        >
        > SELECT AdmitDoc,
        > SEC_TO_TIME(UNI X_TIMESTAMP(Out dt)-UNIX_TIMESTAMP( Decdt)) AS
        > "Doc_To_Adm it"
        > FROM tadmits
        > WHERE AdmitDoc = $doc
        > ORDER BY UNIX_TIMESTAMP( Outdt) - UNIX_TIMESTAMP( Decdt)
        > LIMIT $limit1, $limit2
        >
        >
        > if $count is odd,
        > // only one row...
        > $datRow = mysql_fetch_arr ay( $datRows, MYSQL_ASSOC );
        > $median_doc_to_ admit = $datRow[ 'Doc_To_Admit' ];
        > else
        > // two rows
        > $datRow = mysql_fetch_arr ay( $datRows, MYSQL_ASSOC );
        > $val1 = $datRow[ 'Doc_To_Admit' ];
        > $datRow = mysql_fetch_arr ay( $datRows, MYSQL_ASSOC );
        > $val2 = $datRow[ 'Doc_To_Admit' ];
        > $median_doc_to_ admit = ( $val1 + $val2 ) / 2;
        >
        > You'll have to do this separately for your two required measures
        > because they each need a different sort order.
        >
        > ---
        > Steve
        >[/color]


        Comment

        • Steve

          #5
          Re: Median Function in mySQL


          Ross:

          The outline I suggested uses PHP to control the loop through each
          AdmitDoc, and to create the text of the query to find the median
          value(s), so it is not a SQL-only solution. The main advantage it has
          over your code is that using a LIMIT clause fetches only 1 or 2 rows
          regardless of how many are in the table, which could be a sigificant
          optimisation if you have thousands of rows.

          Which bit wasn't working for you?

          Here's a self-contained worked example...


          <?php

          $t_strServer = 'localhost';
          $t_strUser = 'root';
          $t_strPassword = '';


          $t_objSQLConnec tion = mysql_connect( $t_strServer, $t_strUser,
          $t_strPassword );
          mysql_select_db ( 'TEST', $t_objSQLConnec tion );
          mysql_query( 'DROP TABLE IF EXISTS tadmits', $t_objSQLConnec tion );
          mysql_query( 'CREATE TABLE tadmits ( id INT, AdmitDoc VARCHAR(2), outdt
          DATETIME, decdt DATETIME )', $t_objSQLConnec tion );
          mysql_query( 'INSERT INTO tadmits VALUES ( 1, "aa", "20041203000000 ",
          "20041201000000 " )', $t_objSQLConnec tion );
          mysql_query( 'INSERT INTO tadmits VALUES ( 2, "aa", "20041204000000 ",
          "20041202000000 " )', $t_objSQLConnec tion );
          mysql_query( 'INSERT INTO tadmits VALUES ( 3, "aa", "20041206000000 ",
          "20041202000000 " )', $t_objSQLConnec tion );
          mysql_query( 'INSERT INTO tadmits VALUES ( 4, "bb", "20041208000000 ",
          "20041203000000 " )', $t_objSQLConnec tion );
          mysql_query( 'INSERT INTO tadmits VALUES ( 5, "bb", "20041211000000 ",
          "20041204000000 " )', $t_objSQLConnec tion );
          mysql_query( 'INSERT INTO tadmits VALUES ( 6, "cc", "20041210000000 ",
          "20041206000000 " )', $t_objSQLConnec tion );
          mysql_query( 'INSERT INTO tadmits VALUES ( 7, "cc", "20041213000000 ",
          "20041203000000 " )', $t_objSQLConnec tion );
          mysql_query( 'INSERT INTO tadmits VALUES ( 8, "cc", "20041216000000 ",
          "20041205000000 " )', $t_objSQLConnec tion );
          mysql_query( 'INSERT INTO tadmits VALUES ( 9, "cc", "20041223000000 ",
          "20041201000000 " )', $t_objSQLConnec tion );
          mysql_query( 'INSERT INTO tadmits VALUES (10, "cc", "20041222000000 ",
          "20041207000000 " )', $t_objSQLConnec tion );
          mysql_query( 'INSERT INTO tadmits VALUES (11, "dd", "20041226000000 ",
          "20041202000000 " )', $t_objSQLConnec tion );
          mysql_query( 'INSERT INTO tadmits VALUES (12, "ee", "20041214000000 ",
          "20041203000000 " )', $t_objSQLConnec tion );

          print "Doctor:\tMedia n days\n\n";

          $t_datRows = mysql_query( 'SELECT AdmitDoc, COUNT(*) as "Number_Adm its"
          FROM tadmits GROUP BY AdmitDoc', $t_objSQLConnec tion );
          while( $t_datRow = mysql_fetch_arr ay( $t_datRows, MYSQL_ASSOC ) )
          {
          $t_strAdmitDoc = $t_datRow[ 'AdmitDoc' ];
          $t_lngNumber_Ad mits = $t_datRow[ 'Number_Admits' ];
          if( $t_lngNumber_Ad mits % 2 == 0 )
          {
          // even...
          $t_lngLimit1 = intval( $t_lngNumber_Ad mits / 2 ) - 1;
          $t_lngLimit2 = 2;
          }
          else
          {
          // odd...
          $t_lngLimit1 = intval( $t_lngNumber_Ad mits / 2 );
          $t_lngLimit2 = 1;
          }

          $t_datRows2 = mysql_query( 'SELECT AdmitDoc, UNIX_TIMESTAMP( outdt) -
          UNIX_TIMESTAMP( decdt) AS "Doc_To_Adm it" FROM tadmits WHERE AdmitDoc =
          "' . $t_strAdmitDoc . '" ORDER BY UNIX_TIMESTAMP( outdt) -
          UNIX_TIMESTAMP( decdt) LIMIT ' . $t_lngLimit1 . ',' . $t_lngLimit2,
          $t_objSQLConnec tion );

          if( $t_lngNumber_Ad mits % 2 == 0 )
          {
          // even...
          $t_datRow2 = mysql_fetch_arr ay( $t_datRows2, MYSQL_ASSOC );
          $t_dblValue1 = $t_datRow2[ 'Doc_To_Admit' ];
          $t_datRow2 = mysql_fetch_arr ay( $t_datRows2, MYSQL_ASSOC );
          $t_dblValue2 = $t_datRow2[ 'Doc_To_Admit' ];
          $t_lngMedianTim e = ( $t_dblValue1 + $t_dblValue2 ) / 2;
          }
          else
          {
          // odd...
          $t_datRow2 = mysql_fetch_arr ay( $t_datRows2, MYSQL_ASSOC );
          $t_lngMedianTim e = $t_datRow2[ 'Doc_To_Admit' ];
          }

          mysql_free_resu lt( $t_datRows2 );

          // seconds to decimal days...
          $t_lngMedianTim e = $t_lngMedianTim e / ( 3600 * 24 );

          print "$t_strAdmitDoc :\t$t_lngMedian Time days\n\n";

          }

          mysql_free_resu lt( $t_datRows );

          ?>


          I get this:

          Doctor: Median days

          aa: 2 days

          bb: 6 days

          cc: 11 days

          dd: 24 days

          ee: 11 days



          ---
          Steve

          Comment

          Working...