faster way to get from SQL database into array

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Cleverbum@hotmail.com

    faster way to get from SQL database into array

    I currently have a list of md5 strings and need to check if a new
    string is in that list hundreds of thousands of times. I've found that
    the fastest way to do this is to have all the md5's stored in an array
    and use the php function in_array().
    my only problem now is that populating this array with data from my sql
    server is rather slow, I currently use the lines:

    $resone = mysql_query("SE LECT * FROM logs_full");
    mysql_close();

    while ($row = mysql_fetch_arr ay($resone)) {
    $md5array[$md5count]= $row['textmd5'];
    $md5count++;
    }

    to do this. does anyone have a faster method?

  • Erwin Moller

    #2
    Re: faster way to get from SQL database into array

    Cleverbum@hotma il.com wrote:
    I currently have a list of md5 strings and need to check if a new
    string is in that list hundreds of thousands of times. I've found that
    the fastest way to do this is to have all the md5's stored in an array
    and use the php function in_array().
    my only problem now is that populating this array with data from my sql
    server is rather slow, I currently use the lines:
    >
    $resone = mysql_query("SE LECT * FROM logs_full");
    mysql_close();
    >
    while ($row = mysql_fetch_arr ay($resone)) {
    $md5array[$md5count]= $row['textmd5'];
    $md5count++;
    }
    >
    to do this. does anyone have a faster method?
    Hi,

    Loading the whole table into PHP's memory is surely slow.
    I think it is faster to make the field textmd5 UNIQUE on databaselevel, and
    just insert and of course catch the possible error (UNIQUE CONSTAINT
    VIOLATION etc etc): In case of collision it will complain. than handle
    that.
    In that way only your database has to scan the table, and will not tranfer
    its contents to php.

    Also, avoid * when making queries, it slows down too.
    You only want textmd5, so just ask for that field. :-)

    Regards,
    Erwin Moller

    Comment

    • Rik

      #3
      Re: faster way to get from SQL database into array

      Erwin Moller wrote:
      Cleverbum@hotma il.com wrote:
      >
      >I currently have a list of md5 strings and need to check if a new
      >string is in that list hundreds of thousands of times. I've found
      >that the fastest way to do this is to have all the md5's stored in
      >an array and use the php function in_array().
      >my only problem now is that populating this array with data from my
      >sql server is rather slow, I currently use the lines:
      >>
      >$resone = mysql_query("SE LECT * FROM logs_full");
      >mysql_close( );
      >>
      >while ($row = mysql_fetch_arr ay($resone)) {
      > $md5array[$md5count]= $row['textmd5'];
      > $md5count++;
      >}
      >>
      >to do this. does anyone have a faster method?
      >
      Hi,
      >
      Loading the whole table into PHP's memory is surely slow.
      I think it is faster to make the field textmd5 UNIQUE on
      databaselevel, and just insert and of course catch the possible error
      (UNIQUE CONSTAINT VIOLATION etc etc): In case of collision it will
      complain. than handle that.
      In that way only your database has to scan the table, and will not
      tranfer its contents to php.
      Indeed, or possibly check it first:
      $searchstring = md5(//something);
      $result = mysql_query("SE LECT `textmd5` FROM `logs_full` WHERE `textmd5` =
      '{$searchstring }'");
      if(mysql_num_ro ws($result) < 1){
      //not in table logic..
      } else {
      //is in table logic..
      }
      Also, avoid * when making queries, it slows down too.
      You only want textmd5, so just ask for that field. :-)
      Amen.
      --
      Rik Wasmus


      Comment

      • Toby Inkster

        #4
        Re: faster way to get from SQL database into array

        Rik wrote:
        Indeed, or possibly check it first:
        $searchstring = md5(//something);
        $result = mysql_query("SE LECT `textmd5` FROM `logs_full` WHERE `textmd5` =
        '{$searchstring }'");
        if(mysql_num_ro ws($result) < 1){
        //not in table logic..
        } else {
        //is in table logic..
        }
        As the OP said, he needs to check "hundreds of thousands" of md5 strings
        -- using an SQL call for each one will slow him down.

        My advice would be to keep doing roughly what you're already doing, but
        speed up your array search.

        To speed up your search, make sure your array is sorted in alphabetical
        order. You can do this using your initial SQL query:

        SELECT textmd5 FROM logs_full ORDER BY textmd5;

        (Note: the query will speed up by specifying the exact column you need to
        select; not '*'.)

        You can then use a binary search function[1] instead of in_array() to
        check that a value exists in your array.

        ____
        1. Such as...


        --
        Toby A Inkster BSc (Hons) ARCS
        Contact Me ~ http://tobyinkster.co.uk/contact

        Comment

        • Cleverbum@hotmail.com

          #5
          Re: faster way to get from SQL database into array


          Toby Inkster wrote:
          Rik wrote:
          >
          Indeed, or possibly check it first:
          $searchstring = md5(//something);
          $result = mysql_query("SE LECT `textmd5` FROM `logs_full` WHERE `textmd5` =
          '{$searchstring }'");
          if(mysql_num_ro ws($result) < 1){
          //not in table logic..
          } else {
          //is in table logic..
          }
          >
          As the OP said, he needs to check "hundreds of thousands" of md5 strings
          -- using an SQL call for each one will slow him down.
          Hit the nail on the head.
          >
          My advice would be to keep doing roughly what you're already doing, but
          speed up your array search.
          >
          To speed up your search, make sure your array is sorted in alphabetical
          order. You can do this using your initial SQL query:
          >
          SELECT textmd5 FROM logs_full ORDER BY textmd5;
          >
          (Note: the query will speed up by specifying the exact column you need to
          select; not '*'.)
          >
          You can then use a binary search function[1] instead of in_array() to
          check that a value exists in your array.
          >
          ____
          1. Such as...

          >
          --
          Toby A Inkster BSc (Hons) ARCS
          Contact Me ~ http://tobyinkster.co.uk/contact
          Thanks for the binary search idea I've done some timing and the
          in_array seems to be taking a lot longer than I originally thought it
          would.

          Comment

          • Erwin Moller

            #6
            Re: faster way to get from SQL database into array

            Cleverbum@hotma il.com wrote:
            >
            Toby Inkster wrote:
            >Rik wrote:
            >>
            Indeed, or possibly check it first:
            $searchstring = md5(//something);
            $result = mysql_query("SE LECT `textmd5` FROM `logs_full` WHERE
            `textmd5` = '{$searchstring }'");
            if(mysql_num_ro ws($result) < 1){
            //not in table logic..
            } else {
            //is in table logic..
            }
            >>
            >As the OP said, he needs to check "hundreds of thousands" of md5 strings
            >-- using an SQL call for each one will slow him down.
            >
            Hit the nail on the head.
            Sorry to be rude, but I think you both missed the nail.
            You have a certain value that you transform to a md5 and check if it is your
            db allready, right?
            PHP can do that md5.

            After that you simply need to find that value in the table, like:
            $mymd5= md5("whatever") ;
            $SQL = "SELECT md5text FROM logs_full WHERE (md5text='".$my md5."');"

            You do not have to make hundreds of thousands of queries, or hundreds of
            thousands md5-calculations.
            You only feed this query once and sees if it returns any results.

            Am I missing something completely here?

            Regards,
            Erwin Moller
            >
            >>
            >My advice would be to keep doing roughly what you're already doing, but
            >speed up your array search.
            >>
            >To speed up your search, make sure your array is sorted in alphabetical
            >order. You can do this using your initial SQL query:
            >>
            >SELECT textmd5 FROM logs_full ORDER BY textmd5;
            >>
            >(Note: the query will speed up by specifying the exact column you need to
            >select; not '*'.)
            >>
            >You can then use a binary search function[1] instead of in_array() to
            >check that a value exists in your array.
            >>
            >____
            >1. Such as...
            >http://www.rci.rutgers.edu/~jfulton/...ary_search.inc
            >>
            >--
            >Toby A Inkster BSc (Hons) ARCS
            >Contact Me ~ http://tobyinkster.co.uk/contact
            >
            Thanks for the binary search idea I've done some timing and the
            in_array seems to be taking a lot longer than I originally thought it
            would.

            Comment

            • Geoff Berrow

              #7
              Re: faster way to get from SQL database into array

              Message-ID: <457e7592$0$324 $e4fe514c@news. xs4all.nlfrom Erwin Moller
              contained the following:
              >You do not have to make hundreds of thousands of queries, or hundreds of
              >thousands md5-calculations.
              >You only feed this query once and sees if it returns any results.
              >
              >Am I missing something completely here?
              If you are, so am I.
              --
              Geoff Berrow (put thecat out to email)
              It's only Usenet, no one dies.
              My opinions, not the committee's, mine.
              Simple RFDs http://www.ckdog.co.uk/rfdmaker/

              Comment

              • Norman Peelman

                #8
                Re: faster way to get from SQL database into array

                <Cleverbum@hotm ail.comwrote in message
                news:1165846992 .423101.215040@ f1g2000cwa.goog legroups.com...
                I currently have a list of md5 strings and need to check if a new
                string is in that list hundreds of thousands of times. I've found that
                the fastest way to do this is to have all the md5's stored in an array
                and use the php function in_array().
                my only problem now is that populating this array with data from my sql
                server is rather slow, I currently use the lines:
                >
                $resone = mysql_query("SE LECT * FROM logs_full");
                mysql_close();
                >
                while ($row = mysql_fetch_arr ay($resone)) {
                $md5array[$md5count]= $row['textmd5'];
                $md5count++;
                }
                >
                to do this. does anyone have a faster method?
                >
                Maybe,

                I think you are going about this project all wrong:

                Assumption - your 'list' of md5's is actually the md5's in the database.
                Assumption - you arecreating md5's from strings in text files.
                Assumption - you need to check to make sure that the new value isn't already
                in the database.

                Problem - You are reading in the ENTIRE database.
                Problem - you may be tempted to think in_array() would be faster, but it has
                to start at the beginning of the array for each new value every time.
                Problem - big waste of time and resources. You are doing double work.

                Solution - Let MySQL do what it was designed to do. Since md5's are meant to
                be unique in their own right, simply make your 'textmd5' field the PRIMARY
                KEY and it will automatically be indexed. Now only do the operations you
                actually require:

                $query = "SELECT textmd5 FROM logs_full WHERE textmd5 = 'search_md5';
                $resone = mysql_query($qu ery, $dbc);
                if (!mysql_num_row s($resone))
                {
                // no match found
                // insert new info into database
                }
                else
                {
                //match found
                // no need to insert
                }

                no arrays being used and I guarantee it will be WAY faster for any size of
                database. Now for the real question. Any particular reason you are creating
                an md5 database? It's already being done...

                Norm
                --
                FREE Avatar hosting at www.easyavatar.com



                Comment

                • C.

                  #9
                  Re: faster way to get from SQL database into array

                  Cleverbum@hotma il.com wrote:
                  Toby Inkster wrote:
                  Rik wrote:
                  Indeed, or possibly check it first:
                  $searchstring = md5(//something);
                  $result = mysql_query("SE LECT `textmd5` FROM `logs_full` WHERE `textmd5` =
                  '{$searchstring }'");
                  if(mysql_num_ro ws($result) < 1){
                  //not in table logic..
                  } else {
                  //is in table logic..
                  }
                  As the OP said, he needs to check "hundreds of thousands" of md5 strings
                  -- using an SQL call for each one will slow him down.
                  >
                  Hit the nail on the head.
                  >
                  PHP arrays are not particularly fast / efficient - particularly when
                  they get big. The fastest way to resolve two large lists like this is
                  to keep the "hundreds of thousands" of needles in a table as well as
                  the haystack and join the two in an SQL query (rDBMS are designed
                  specifically to do this kind of thing).

                  C.

                  Comment

                  • Jerry Stuckle

                    #10
                    Re: faster way to get from SQL database into array

                    Norman Peelman wrote:
                    <Cleverbum@hotm ail.comwrote in message
                    news:1165846992 .423101.215040@ f1g2000cwa.goog legroups.com...
                    >
                    >>I currently have a list of md5 strings and need to check if a new
                    >>string is in that list hundreds of thousands of times. I've found that
                    >>the fastest way to do this is to have all the md5's stored in an array
                    >>and use the php function in_array().
                    >>my only problem now is that populating this array with data from my sql
                    >>server is rather slow, I currently use the lines:
                    >>
                    >>$resone = mysql_query("SE LECT * FROM logs_full");
                    >>mysql_close() ;
                    >>
                    >>while ($row = mysql_fetch_arr ay($resone)) {
                    > $md5array[$md5count]= $row['textmd5'];
                    > $md5count++;
                    >>}
                    >>
                    >>to do this. does anyone have a faster method?
                    >>
                    >
                    >
                    Maybe,
                    >
                    I think you are going about this project all wrong:
                    >
                    Assumption - your 'list' of md5's is actually the md5's in the database.
                    Assumption - you arecreating md5's from strings in text files.
                    Assumption - you need to check to make sure that the new value isn't already
                    in the database.
                    >
                    Problem - You are reading in the ENTIRE database.
                    Problem - you may be tempted to think in_array() would be faster, but it has
                    to start at the beginning of the array for each new value every time.
                    Problem - big waste of time and resources. You are doing double work.
                    >
                    Solution - Let MySQL do what it was designed to do. Since md5's are meant to
                    be unique in their own right, simply make your 'textmd5' field the PRIMARY
                    KEY and it will automatically be indexed. Now only do the operations you
                    actually require:
                    >
                    Actually, md5's are NOT unique. If they were, it would be the most
                    impressive compression mechanism known to computers. Imagine - any size
                    file compressed to 32 characters!
                    $query = "SELECT textmd5 FROM logs_full WHERE textmd5 = 'search_md5';
                    $resone = mysql_query($qu ery, $dbc);
                    if (!mysql_num_row s($resone))
                    {
                    // no match found
                    // insert new info into database
                    }
                    else
                    {
                    //match found
                    // no need to insert
                    }
                    >
                    no arrays being used and I guarantee it will be WAY faster for any size of
                    database. Now for the real question. Any particular reason you are creating
                    an md5 database? It's already being done...
                    >
                    Norm

                    --
                    =============== ===
                    Remove the "x" from my email address
                    Jerry Stuckle
                    JDS Computer Training Corp.
                    jstucklex@attgl obal.net
                    =============== ===

                    Comment

                    • Norman Peelman

                      #11
                      Re: faster way to get from SQL database into array

                      "Jerry Stuckle" <jstucklex@attg lobal.netwrote in message
                      news:Asednf6PSq V6X-PYnZ2dnUVZ_qjin Z2d@comcast.com ...
                      Norman Peelman wrote:
                      <Cleverbum@hotm ail.comwrote in message
                      news:1165846992 .423101.215040@ f1g2000cwa.goog legroups.com...
                      >I currently have a list of md5 strings and need to check if a new
                      >string is in that list hundreds of thousands of times. I've found that
                      >the fastest way to do this is to have all the md5's stored in an array
                      >and use the php function in_array().
                      >my only problem now is that populating this array with data from my sql
                      >server is rather slow, I currently use the lines:
                      >
                      >$resone = mysql_query("SE LECT * FROM logs_full");
                      >mysql_close( );
                      >
                      >while ($row = mysql_fetch_arr ay($resone)) {
                      $md5array[$md5count]= $row['textmd5'];
                      $md5count++;
                      >}
                      >
                      >to do this. does anyone have a faster method?
                      >

                      Maybe,

                      I think you are going about this project all wrong:

                      Assumption - your 'list' of md5's is actually the md5's in the database.
                      Assumption - you arecreating md5's from strings in text files.
                      Assumption - you need to check to make sure that the new value isn't
                      already
                      in the database.

                      Problem - You are reading in the ENTIRE database.
                      Problem - you may be tempted to think in_array() would be faster, but it
                      has
                      to start at the beginning of the array for each new value every time.
                      Problem - big waste of time and resources. You are doing double work.

                      Solution - Let MySQL do what it was designed to do. Since md5's are
                      meant to
                      be unique in their own right, simply make your 'textmd5' field the
                      PRIMARY
                      KEY and it will automatically be indexed. Now only do the operations you
                      actually require:
                      >
                      Actually, md5's are NOT unique. If they were, it would be the most
                      impressive compression mechanism known to computers. Imagine - any size
                      file compressed to 32 characters!
                      >

                      Well, I knew i'd get some flak for that. :) I was assuming they would be
                      unique enough for the situation.

                      $query = "SELECT textmd5 FROM logs_full WHERE textmd5 = 'search_md5';
                      $resone = mysql_query($qu ery, $dbc);
                      if (!mysql_num_row s($resone))
                      {
                      // no match found
                      // insert new info into database
                      }
                      else
                      {
                      //match found
                      // no need to insert
                      }

                      no arrays being used and I guarantee it will be WAY faster for any size
                      of
                      database. Now for the real question. Any particular reason you are
                      creating
                      an md5 database? It's already being done...

                      Norm
                      >
                      Norm
                      --
                      FREE Avatar hosting at www.easyavatar.com


                      Comment

                      • Toby Inkster

                        #12
                        Re: faster way to get from SQL database into array

                        Erwin Moller wrote:
                        You have a certain value that you transform to a md5 and check if it is your
                        db allready, right?
                        No -- he has *hundred of thousands* of "certain values" that he needs to
                        transform to an MD5 and check to see if it's in his DB already.

                        What you are suggesting is (pseudo-code abound):

                        $needles = (hundreds of thousands of values);
                        foreach ($needles as $n)
                        {
                        $nMD5 = md5($n);
                        $r = sql_query("SELE CT md5text FROM logsfull WHERE md5text='$nMD5' ;");
                        if (sql_fetch_arra y($r))
                        print "$nMD5 exists in database.\n";
                        else
                        print "$nMD5 does not exist in database.\n";
                        }

                        This will involve hundreds of thousands of SQL queries. Say the "logsfull"
                        table has zero rows (as it well might!), then that is hundreds of
                        thousands of useless calls to your RDBMS.

                        What I am suggesting is:

                        $needles = (hundreds of thousands of values);
                        $r = sql_query("SELE CT md5text FROM logsfull ORDER BY md5text;");
                        while (list($straw) = sql_fetch_array ($r))
                        $haystack[] = $straw;
                        foreach ($needles as $n)
                        {
                        $nMD5 = md5($n);
                        if (bsearch($hayst ack, $n))
                        print "$nMD5 does exist in database\n";
                        else
                        print "$nMD5 does not exist in database\n";
                        }

                        A single call to our poor beleagured RDBMS, and then an efficient binary
                        search for each needle. The bottleneck is likely to be the md5() function
                        here.

                        --
                        Toby A Inkster BSc (Hons) ARCS
                        Contact Me ~ http://tobyinkster.co.uk/contact

                        Comment

                        • Toby Inkster

                          #13
                          Re: faster way to get from SQL database into array

                          Norman Peelman wrote:
                          Well, I knew i'd get some flak for that. :) I was assuming they would be
                          unique enough for the situation.
                          There are varying degrees of uniqueness?

                          --
                          Toby A Inkster BSc (Hons) ARCS
                          Contact Me ~ http://tobyinkster.co.uk/contact

                          Comment

                          • Norman Peelman

                            #14
                            Re: faster way to get from SQL database into array

                            <Cleverbum@hotm ail.comwrote in message
                            news:1165846992 .423101.215040@ f1g2000cwa.goog legroups.com...
                            I currently have a list of md5 strings and need to check if a new
                            string is in that list hundreds of thousands of times. I've found that
                            the fastest way to do this is to have all the md5's stored in an array
                            and use the php function in_array().
                            my only problem now is that populating this array with data from my sql
                            server is rather slow, I currently use the lines:
                            >
                            $resone = mysql_query("SE LECT * FROM logs_full");
                            mysql_close();
                            >
                            while ($row = mysql_fetch_arr ay($resone)) {
                            $md5array[$md5count]= $row['textmd5'];
                            $md5count++;
                            }
                            >
                            to do this. does anyone have a faster method?
                            >
                            I can only shorten it:

                            $resone = mysql_query("SE LECT * FROM logs_full");
                            mysql_close();

                            while ($row = mysql_fetch_ass oc($resone)) {
                            $md5array[]= $row['textmd5'];
                            }

                            How big (how many rows) is the table you are reading in? Are you storing new
                            md5's when they aren't found in the db?


                            Norm
                            --
                            FREE Avatar hosting at www.easyavatar.com


                            Comment

                            • Norman Peelman

                              #15
                              Re: faster way to get from SQL database into array

                              "Toby Inkster" <usenet200611@t obyinkster.co.u kwrote in message
                              news:l22354-d0o.ln1@ophelia .g5n.co.uk...
                              Norman Peelman wrote:
                              >
                              Well, I knew i'd get some flak for that. :) I was assuming they would
                              be
                              unique enough for the situation.
                              >
                              There are varying degrees of uniqueness?
                              >
                              --
                              Toby A Inkster BSc (Hons) ARCS
                              Contact Me ~ http://tobyinkster.co.uk/contact
                              >
                              The poster is converting strings not files to md5. md5 is not a method
                              of encryption per se but a more complicated method of CRC to 'help' validate
                              data... It is convienent to use md5 to 'one-way encrypt' small strings used
                              as passwords. I would assume that the poster is also storing the matching
                              plaintext along with the md5 so if a match was found he could then see if
                              the plaintext matched as well but, according to the definitions i've read on
                              the net; the odds are extremely high, paraphrasing:

                              Taken from


                              The odds of any two strings having the same md5 are on the order of 2^64
                              operations.
                              The odds of finding a string having a given md5 is on the order of 2^128
                              operations.

                              ....so yes, i'd say that there are variable degrees of uniqueness. So maybe
                              an easier way than my first post would be:

                              Using textmd5 as PRIMARY KEY

                              $db_link = mysql_connect(. ..,...,...); //etc.
                              $query = "INSERT INTO logs_full VALUES ($textmd5, $textplain)";
                              $resone = mysql_query($qu ery, $dbc);
                              if (!mysql_errno($ db_link))
                              {
                              // match found (duplicate entry for key error 1062 returned from MySQL)
                              // could then double check textplain here although odds of being here
                              are very high
                              // do other stuff here
                              }
                              else
                              {
                              // no match found
                              // new entry has just been inserted
                              }

                              Of course this assumes he is also storing new md5's to check against. I
                              gueuss we don't fully know what the poster completely intends on doing
                              unless he pops back in with a little more info.

                              Norm


                              Comment

                              Working...