How to inspect array elements based on criteria?

  • deko

    How to inspect array elements based on criteria?

    I have a counter file that records page hits - each hit is a UNIX timestamp
    in the file. But I'm only interested in page hits in the last 365 days.
    The below code creates an array from the file and counts hits based on
    interval - day, month or year.

    $avs = file($viscounter);
    foreach ($avs as $val)
    {
        if ($val > (time() - 86400))
        {
            ++$v24h;
        }
        if ($val > (time() - 2592000))
        {
            ++$v30d;
        }
        if ($val > (time() - 31536000))
        {
            ++$v365d;
        }
    }
    echo $v24h;
    echo $v30d;
    echo $v365d;

    The problem is speed - the foreach loop is too slow. This is exacerbated
    when the file contains timestamps that are more than one year old - the code
    has to loop through additional array elements that I'm not interested in -
    sometimes in the tens of thousands. I can't modify the file because it
    needs to be kept for historical analysis.

    How can I only inspect array elements that are less than or equal to one
    year old? Other ways to make this more efficient?

    Thanks!


  • Pedro Graca

    #2
    Re: How to inspect array elements based on criteria?

    deko wrote:
    > I have a counter file that records page hits - each hit is a UNIX timestamp
    > in the file. But I'm only interested in page hits in the last 365 days.
    > The below code creates an array from the file and counts hits based on
    > interval - day, month or year.
    >
    > $avs = file($viscounter);

    /* calculate time offsets once only */
    $y = time()-31536000;
    $m = time()-2592000;
    $d = time()-86400;

    > foreach ($avs as $val)
    > {

    /*
    > if($val>(time()-(86400)))
    > {
    > ++$v24h;
    > }
    > if($val>(time()-(2592000)))
    > {
    > ++$v30d;
    > }
    > if($val>(time()-(31536000)))
    > {
    > ++$v365d;
    > }
    */

    /* reorder your if()s */
    if ($val > $y) {
        ++$v365d;
        if ($val > $m) {
            ++$v30d;
            if ($val > $d) ++$v24h;
        }
    }

    > }
    > echo $v24h;
    > echo $v30d;
    > echo $v365d;
    >
    > The problem is speed - the foreach loop is too slow. This is exacerbated
    > when the file contains timestamps that are more than one year old - the code
    > has to loop through additional array elements that I'm not interested in -
    > sometimes in the tens of thousands. I can't modify the file because it
    > needs to be kept for historical analysis.
    >
    > How can I only inspect array elements that are less than or equal to one
    > year old? Other ways to make this more efficient?

    In your original code you are calculating time()-<constant> 3 times for
    every line in the file.
    You are also comparing every value to three different offsets.

    With the changes above you calculate time()-<constant> only 3 times in
    total, and you compare every value to the year offset first, checking
    month and day only if the value falls within the last year.




    If you know for sure the file is ordered, maybe that code can be sped
    up even more by getting rid of most comparisons, but this is just a
    guess: try it and use the faster code.

    $in_year = false;
    $in_month = false;
    $in_today = false;

    foreach($avs as $val) {
        if ($in_year || ($val > $y)) {
            ++$v365d;
            $in_year = true;
            if ($in_month || ($val > $m)) {
                ++$v30d;
                $in_month = true;
                if ($in_today || ($val > $d)) {
                    ++$v24h;
                    $in_today = true;
                }
            }
        }
    }
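
If the file really is ordered oldest-first, most comparisons can also be avoided with a plain binary search. This is not the flag-based loop posted above but a different technique swapped in for illustration: find the first entry newer than each cutoff and derive the count from its position. A minimal sketch, assuming ascending timestamps; `countNewerThan()` and the sample data are hypothetical.

```php
<?php
// Hypothetical helper: count how many entries in an ascending list of
// timestamps are strictly greater than $cutoff, using binary search.
function countNewerThan(array $sorted, int $cutoff): int
{
    $lo = 0;
    $hi = count($sorted);            // search window is [$lo, $hi)
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ((int) $sorted[$mid] > $cutoff) {
            $hi = $mid;              // first newer entry is at $mid or earlier
        } else {
            $lo = $mid + 1;          // everything up to $mid is too old
        }
    }
    return count($sorted) - $lo;     // entries from $lo onward are newer
}

// Sample data standing in for file($viscounter): oldest first.
$now = 1090281600;                   // fixed "current time" for the example
$avs = [
    $now - 40000000,                 // older than a year
    $now - 20000000,                 // within the year
    $now - 1000000,                  // within 30 days
    $now - 3600,                     // within 24 hours
];

$v365d = countNewerThan($avs, $now - 31536000);
$v30d  = countNewerThan($avs, $now - 2592000);
$v24h  = countNewerThan($avs, $now - 86400);

echo "$v24h $v30d $v365d\n";         // prints "1 2 3"
```

Each count then costs a handful of comparisons regardless of how many old entries the file holds, although file() still has to read the whole file first.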



    --
    USENET would be a better place if everybody read: | to email me: use |
    http://www.catb.org/~esr/faqs/smart-questions.html | my name in "To:" |
    http://www.netmeister.org/news/learn2quote2.html | header, textonly |
    http://www.expita.com/nomime.html | no attachments. |


    • Pjotr Wedersteers

      #3
      Re: How to inspect array elements based on criteria?


      A minor improvement (without touching the concept, which indeed seems to be
      a candidate for an efficiency upgrade; I will think about that in the
      meantime...) is calling the time() function only once per pass through the
      loop and storing the result in a variable.

      Then I think you could save some processing by changing the order in which
      you check for year, month and day. But this may be because the current
      script does things you don't want it to do. It looks to me right now as if
      a $val that passes the year test also fulfills the other two tests. If
      that's what you want, you could use a case structure and intentionally omit
      the breaks so the value falls through from the first appropriate test. I
      think the tests take up most of the time, not the assignments/increments.

      I really don't see how your script dismisses all stamps older than 1 year.
      It seems to me it does include all of these in your counts. But then again,
      it's late, I've had a couple of brewskis and may be wrong...

      HTH
      Pjotr
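
The fall-through idea can be sketched as below. This is only an illustration under assumptions: the sample timestamps are made up, and the cases are ordered from the narrowest window to the widest, so that a match on one case also runs the increments for every wider window.

```php
<?php
$now = time();
$d = $now - 86400;      // 24-hour cutoff
$m = $now - 2592000;    // 30-day cutoff
$y = $now - 31536000;   // 365-day cutoff

// Sample timestamps standing in for the lines of the counter file.
$avs = [$now - 40000000, $now - 20000000, $now - 1000000, $now - 3600];

$v24h = $v30d = $v365d = 0;

foreach ($avs as $val) {
    $val = (int) $val;
    // Test the narrowest window first; once a case matches, execution falls
    // through the remaining cases without re-testing them.
    switch (true) {
        case $val > $d:
            ++$v24h;
            // no break: a hit in the last day is also in the last 30 days
        case $val > $m:
            ++$v30d;
            // no break: a hit in the last 30 days is also in the last year
        case $val > $y:
            ++$v365d;
    }
}

echo "$v24h $v30d $v365d\n";   // prints "1 2 3"
```

An entry older than a year matches no case at all, so it is tested three times and counted nowhere.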



      • Michael Austin

        #4
        Re: How to inspect array elements based on criteria?


        Deko,

        You REALLY need to learn to use a database, there are builtin functions that do
        all of this for you and would let your counter store a lot more
        data. Currently, what happens when 2 people try to log in simultaneously and
        write to that log file? Is there contention that will cause a login failure?

        Once you have it in a database, it is easy to do something like the
        following (this is not the exact syntax, you will need to work that out
        yourself -- call it a learning exercise):

        select fielda, fieldb from logtable where datetime > time()-(2592000)

        On access to a page you would use an insert statement:

        insert into logtable values('someuser','somepage',<?php time() ?>);

        Now you can not only see how many visitors, but also which page(s) they hit.

        I would write it for you, but I am not getting paid to do your job... :)
        --
        Michael Austin.
        Consultant - Available.
        Donations welcomed. Http://www.firstdbasource.com/donations.html
        :)


        • Herbie Cumberland

          #5
          Re: How to inspect array elements based on criteria?

          On Mon, 19 Jul 2004 21:32:17 GMT, "deko" <nospam@hotmail.com> wrote:

          >I have a counter file that records page hits - each hit is a UNIX timestamp
          >in the file. But I'm only interested in page hits in the last 365 days.
          >The below code creates an array from the file and counts hits based on
          >interval - day, month or year.
          >
          >$avs = file($viscounter);
          >foreach ($avs as $val)
          >{
          > if($val>(time()-(86400)))
          > {
          > ++$v24h;
          > }
          > if($val>(time()-(2592000)))
          > {
          > ++$v30d;
          > }
          > if($val>(time()-(31536000)))
          > {
          > ++$v365d;
          > }
          >}
          >echo $v24h;
          >echo $v30d;
          >echo $v365d;
          >
          >The problem is speed - the foreach loop is too slow. This is exacerbated
          >when the file contains timestamps that are more than one year old - the code
          >has to loop through additional array elements that I'm not interested in -
          >sometimes in the tens of thousands. I can't modify the file because it
          >needs to be kept for historical analysis.
          >
          >How can I only inspect array elements that are less than or equal to one
          >year old? Other ways to make this more efficient?

          are you sure it's just the foreach loop that's taking excessive time?
          the file read operation will also take a long time for a large file.
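
One way to check where the time actually goes is to time the read and the loop separately with microtime(). A rough sketch, assuming a temporary file with generated timestamps as a stand-in for the real counter file; the file size and cutoff are arbitrary.

```php
<?php
// Stand-in for the real counter file: a temporary file with sample timestamps.
$viscounter = tempnam(sys_get_temp_dir(), 'hits');
file_put_contents($viscounter, implode("\n", range(1000000000, 1000049999)) . "\n");

$t0 = microtime(true);
$avs = file($viscounter);            // read phase
$t1 = microtime(true);

$cutoff = 1000025000;                // arbitrary cutoff for the example
$count = 0;
foreach ($avs as $val) {             // loop phase
    if ((int) $val > $cutoff) {
        ++$count;
    }
}
$t2 = microtime(true);

printf("read: %.4f s, loop: %.4f s, hits: %d\n", $t1 - $t0, $t2 - $t1, $count);

unlink($viscounter);
```

The absolute numbers depend on the machine, so only the ratio between the two phases is meaningful.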

          anyway, it would speed things up if you didn't make 3 separate calls
          to the time() function and 3 separate mathematical calculations for
          every line of the file... something like this might speed the foreach
          loop considerably:

          <?php
          // read data file
          $avs = file($viscounter);
          // get current time
          $now = time();
          // define values for our test times
          // (use constants rather than variables for processing speed)
          define('TIME24h', $now - 86400);
          define('TIME30d', $now - 2592000);
          define('TIME365d', $now - 31536000);
          // start each counter at zero
          $v24h = $v30d = $v365d = 0;
          // process the data
          foreach ($avs as $val)
          {
              if ($val > TIME24h)
              {
                  ++$v24h;
              }
              if ($val > TIME30d)
              {
                  ++$v30d;
              }
              if ($val > TIME365d)
              {
                  ++$v365d;
              }
              else
              {
                  // entry is older than 365 days, so stop processing
                  break;
              }
          }
          echo $v24h;
          echo $v30d;
          echo $v365d;
          ?>

          a _much_ better way would be to store your page hits in a database -
          you could then also store other info, such as client IP, browser type,
          etc.

          as long as the timestamp field is indexed, lookups based on that field
          will be very fast.

          you would then do something like this to get number of hits in the
          last 24 hours (in MySQL):

          "SELECT COUNT(timestamp) AS hits FROM page_hits WHERE timestamp >
          UNIX_TIMESTAMP() - 86400"


          hth,
          h.c.
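
For anyone who wants to try the database route without setting up a MySQL server, the same idea can be sketched with PDO and an in-memory SQLite database. This is an illustration under assumptions, not the setup described above: it needs the pdo_sqlite extension, `page_hits` is the table name from the query above, and `hit_time` is a hypothetical column name.

```php
<?php
// Assumption: the pdo_sqlite extension is available. An in-memory SQLite
// database stands in for the MySQL server mentioned above.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// page_hits is the table name from the query above; hit_time is a
// hypothetical column name. The index keeps range lookups fast.
$db->exec('CREATE TABLE page_hits (hit_time INTEGER NOT NULL)');
$db->exec('CREATE INDEX idx_hit_time ON page_hits (hit_time)');

// Record a few sample hits (one INSERT per page view in real use).
$now = time();
$insert = $db->prepare('INSERT INTO page_hits (hit_time) VALUES (?)');
foreach ([$now - 40000000, $now - 20000000, $now - 1000000, $now - 3600] as $ts) {
    $insert->execute([$ts]);
}

// Count hits newer than a cutoff; the index lets the database skip old rows.
$count = $db->prepare('SELECT COUNT(*) FROM page_hits WHERE hit_time > ?');
$hits = [];
foreach (['24h' => 86400, '30d' => 2592000, '365d' => 31536000] as $label => $seconds) {
    $count->execute([$now - $seconds]);
    $hits[$label] = (int) $count->fetchColumn();
    $count->closeCursor();           // release the result before re-executing
}

print_r($hits);
```

The old rows stay in the table for historical analysis, and the query simply never touches them.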


          • deko

            #6
            Re: How to inspect array elements based on criteria?

            > You REALLY need to learn to use a database, there are builtin functions
            > that do all of this

            Agreed, but the goal here is to avoid using a database. Actually, it's
            working pretty well - except for that bit of inefficient code.

            > call it a learning exercise

            MySQL is next week - PHP is this week :)

            > Is there contention that will cause a login failure?

            Yes, there is the possibility for contention, but I make sure there is a
            lock on the file before writing, and the code will retry to get the lock if
            it cannot the first time.
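
A lock-and-retry write of this kind might look roughly like the following. It is a sketch under assumptions, not the code actually in use: `recordHit()`, the retry count and the 50 ms pause are all hypothetical.

```php
<?php
// Hypothetical helper: append one timestamp to the counter file, retrying
// a few times if another request currently holds the lock.
function recordHit(string $path, int $timestamp, int $maxTries = 5): bool
{
    $fh = fopen($path, 'ab');
    if ($fh === false) {
        return false;
    }
    $written = false;
    for ($try = 0; $try < $maxTries; ++$try) {
        // LOCK_NB makes flock() return immediately instead of waiting.
        if (flock($fh, LOCK_EX | LOCK_NB)) {
            $written = fwrite($fh, $timestamp . "\n") !== false;
            fflush($fh);             // push the data out before unlocking
            flock($fh, LOCK_UN);
            break;
        }
        usleep(50000);               // wait 50 ms before the next attempt
    }
    fclose($fh);
    return $written;
}

// Usage with a temporary file standing in for the real counter file.
$viscounter = tempnam(sys_get_temp_dir(), 'hits');
$ok1 = recordHit($viscounter, 1090281600);
$ok2 = recordHit($viscounter, 1090281700);
$contents = file_get_contents($viscounter);
echo $contents;
unlink($viscounter);
```

Appending in 'ab' mode keeps the file ordered oldest-first, which the faster counting approaches in this thread rely on.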

            > I would write it for you, but I am not getting paid to do your job... :)

            Neither am I :)



            • deko

              #7
              Re: How to inspect array elements based on criteria?

              > If you know for sure the file is ordered, maybe that code can be sped
              > up even more by getting rid of most comparisons, but this is just a
              > guess: try it and use the faster code.

              Yes, the file is ordered - the oldest time is the first line; the newest
              time is the last line.

              > $in_year = false;
              > $in_month = false;
              > $in_today = false;
              >
              > foreach($avs as $val) {
              > if ($in_year || ($val > $y)) {
              > ++$v365d;
              > $in_year = true;
              > if ($in_month || ($val > $m)) {
              > ++$v30d;
              > $in_month = true;
              > if ($in_today || ($val > $d)) {
              > ++$v24h;
              > $in_today = true;
              > }
              > }
              > }
              > }

              This looks great! I will give it a try. Thanks!



              • deko

                #8
                Re: How to inspect array elements based on criteria?

                > are you sure it's just the foreach loop that's taking excessive time?
                > the file read operation will also take a long time for a large file.

                Good point - but what do you think is a large file? 100K? 1 MB? 5 MB?

                > anyway, it would speed things up if you didn't make 3 separate calls
                > to the time() function and 3 separate mathematical calculations for
                > every line of the file... something like this might speed the foreach
                > loop considerably:
                >
                > <?php
                > // read data file
                > $avs = file($viscounter);
                > // get current time
                > $now = time();
                > // define values for our test times
                > // (use constants rather than variables for processing speed)

                Great! Will do.

                > define('TIME24h', $now - 86400);
                > define('TIME30d', $now - 2592000);
                > define('TIME365d', $now - 31536000);
                > // process the data
                > foreach ($avs as $val)
                > {
                > if ($val > TIME24h)
                > {
                > ++$v24h;
                > }
                > if ($val > TIME30d)
                > {
                > ++$v30d;
                > }
                > if ($val > TIME365d)
                > {
                > ++$v365d;
                > }
                > else
                > {
                > // entry is older than 365 days, so stop processing
                > break;

                Sounds good - I did not realize I could exit the foreach loop with "break".
                I think this may be the best way to deal with entries that are over 1 year
                old.

                > }
                > }
                > echo $v24h;
                > echo $v30d;
                > echo $v365d;
                > ?>
                >
                > a _much_ better way would be to store your page hits in a database -
                > you could then also store other info, such as client IP, browser type,
                > etc.
                >
                > as long as the timestamp field is indexed, lookups based on that field
                > will be very fast.
                >
                > you would then do something like this to get number of hits in the
                > last 24 hours (in MySQL):
                >
                > "SELECT COUNT(timestamp) AS hits FROM page_hits WHERE timestamp >
                > UNIX_TIMESTAMP() - 86400"

                10-4 on the MySQL idea - but there's a reason I am trying to do this with a
                file-based script.

                Thanks for your help!



                • Herbie Cumberland

                  #9
                  Re: How to inspect array elements based on criteria?

                  On Tue, 20 Jul 2004 00:16:48 GMT, "deko" <nospam@hotmail.com> wrote:

                  >> are you sure it's just the foreach loop that's taking excessive time?
                  >> the file read operation will also take a long time for a large file.
                  >
                  >Good point - but what do you think is a large file? 100K? 1 MB? 5 MB?

                  depends on a lot of things - processor speed, disk read speed, other
                  things happening on the machine, etc.

                  >Sounds good - I did not realize I could exit the foreach loop with "break".
                  >I think this may be the best way to deal with entries that are over 1 year
                  >old.

                  ah.. i've just read another post of yours in this thread in which you
                  said:

                  >Yes, the file is ordered - the oldest time is the first line; the newest
                  >time is the last line.

                  which complicates things... my example would only work if the lines
                  were in newest-first format...

                  you _could_ reverse the array once you've read it in from file, but
                  this would add to script time
                  (http://www.php.net/manual/en/function.array-reverse.php)

                  but better to work through the array backwards...

                  <?php
                  $avs = file($viscounter);
                  // do other stuff (set time constants, etc)
                  for( $i=count($avs)-1; $i>=0; $i-- )
                  {
                      if( $avs[$i] > TIME24h )
                      {
                          ...
                      }
                      ...
                  }
                  ?>



                  • deko

                    #10
                    Re: How to inspect array elements based on criteria?

                    > but better to work through the array backwards...
                    >
                    > <?php
                    > $avs = file($viscounter);
                    > // do other stuff (set time constants, etc)
                    > for( $i=count($avs)-1; $i>=0; $i-- )
                    > {
                    > if( $avs[$i] > TIME24h )
                    > {
                    > ...
                    > }
                    > ...
                    > }
                    > ?>

                    going to gin this up now... will let you know how it works...



                    • deko

                      #11
                      Re: How to inspect array elements based on criteria?


                      This seems to work - much faster! Thanks!

                      $now = time();
                      define('TIME24h', $now - 86400);
                      define('TIME30d', $now - 2592000);
                      define('TIME365d', $now - 31536000);
                      // start each counter at zero
                      $v24h = $v30d = $v365d = 0;
                      $avs = file($viscounter);
                      for( $i=count($avs)-1; $i>=0; $i-- )
                      {
                          if( $avs[$i] > TIME24h )
                          {
                              ++$v24h;
                          }
                          if( $avs[$i] > TIME30d )
                          {
                              ++$v30d;
                          }
                          if( $avs[$i] > TIME365d )
                          {
                              ++$v365d;
                          }
                          if( $avs[$i] < TIME365d )
                          {
                              echo "<br>some entries are older than 1 year";
                              break;
                              //does break statement look okay?
                          }
                      }
                      echo "<br>v24h = ".$v24h;
                      echo "<br>v30d = ".$v30d;
                      echo "<br>v365d = ".$v365d;





                      • deko

                        #12
                        Re: How to inspect array elements based on criteria?

                        > but better to work through the array backwards...

                        > $avs = file($viscounter);
                        > for( $i=count($avs)-1; $i>=0; $i-- )

                        Not sure I understand this...

                        count($avs) is the line number of the last line in the file - correct?

                        So if it's a 10-line file, the for statement says:

                        start with i = 9,
                        while i is greater than or equal to 0,
                        decrement i by one.

                        Why start at 9 rather than 10?
                        Why start at the bottom of the file? Is that where the pointer is in the
                        array, after it reads the file into memory? How is it more efficient?



                        • Geoff Berrow

                          #13
                          Re: How to inspect array elements based on criteria?

                          I noticed that Message-ID:
                          <FV0Lc.24185$dv7.11361@newssvr27.news.prodigy.com> from deko contained
                          the following:

                          >count($avs) is the line number of the last line in the file - correct?

                          Nope, it's the number of elements in the array.

                          >So if it's a 10-line file, the for statement says:

                          10 element array

                          >start with i = 9,
                          >while i is greater than or equal to 0,
                          >decrement i by one.
                          >
                          >Why start at 9 rather than 10?

                          because arrays start at [0]
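
A quick way to see the difference between the element count and the last index, using a made-up 10-element array in place of the file contents:

```php
<?php
// A 10-line file read with file() becomes a 10-element array.
$avs = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'];

$n = count($avs);          // number of elements
$last = $n - 1;            // index of the last element

echo "count: $n\n";                         // prints "count: 10"
echo "first index: 0, last index: $last\n"; // prints "first index: 0, last index: 9"
echo "last element: {$avs[$last]}\n";       // prints "last element: j"
var_dump(isset($avs[$n]));                  // bool(false): no element at index 10
```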

                          --
                          Geoff Berrow (put thecat out to email)
                          It's only Usenet, no one dies.
                          My opinions, not the committee's, mine.
                          Simple RFDs http://www.ckdog.co.uk/rfdmaker/


                          • deko

                            #14
                            Re: How to inspect array elements based on criteria?

                            > >count($avs) is the line number of the last line in the file - correct?
                            >
                            > Nope, it's the number of elements in the array.
                            > >
                            > >So if it's a 10-line file, the for statement says:
                            > 10 element array
                            > >
                            > >start with i = 9,
                            > >while i is greater than or equal to 0,
                            > >decrement i by one.
                            > >
                            > >Why start at 9 rather than 10?
                            >
                            > because arrays start at [0]

                            I see. But I'm still curious how it's more efficient to start at the last
                            element in the array.

                            > for( $i=count($avs)-1; $i>=0; $i-- )

                            Is this because that's where the pointer is or something like that - or
                            should the array be reversed?



                            • Geoff Berrow

                              #15
                              Re: How to inspect array elements based on criteria?

                              I noticed that Message-ID:
                              <bx3Lc.12762$GU.4325@newssvr25.news.prodigy.com> from deko contained the
                              following:

                              >I see. But I'm still curious how it's more efficient to start at the last
                              >element in the array.

                              Because that is how your file is ordered: the newest entries are at the
                              end, so starting there lets the loop stop as soon as it reaches an entry
                              that is too old.
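
The effect can be seen by counting how many elements each direction has to inspect. A small sketch with made-up data (seven entries older than a year followed by three recent ones, oldest first like the counter file); the variable names are hypothetical.

```php
<?php
$now = time();
$cutoff = $now - 31536000;           // 365-day cutoff

// Sample data, oldest first like the counter file: seven old entries
// followed by three from the last year.
$avs = [];
for ($i = 10; $i >= 1; --$i) {
    $avs[] = $i > 3 ? $now - 40000000 - $i : $now - $i * 1000;
}

// Forward: every element has to be inspected.
$forwardInspected = 0;
$forwardHits = 0;
foreach ($avs as $val) {
    ++$forwardInspected;
    if ($val > $cutoff) {
        ++$forwardHits;
    }
}

// Backward: stop at the first entry that is too old.
$backwardInspected = 0;
$backwardHits = 0;
for ($i = count($avs) - 1; $i >= 0; --$i) {
    ++$backwardInspected;
    if ($avs[$i] <= $cutoff) {
        break;                       // everything before this is older still
    }
    ++$backwardHits;
}

echo "forward:  $forwardHits hits, $forwardInspected inspected\n";   // 3 hits, 10 inspected
echo "backward: $backwardHits hits, $backwardInspected inspected\n"; // 3 hits, 4 inspected
```

Both loops find the same three hits, but the backward loop never looks at the older entries beyond the first one it meets. The array pointer plays no part in this, and the array does not need to be reversed.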

