is file_exists expensive in performance terms?

  • lkrubner@geocities.com

    is file_exists expensive in performance terms?


    I've written some template code and one thing I'm trying to protect
    against is references to images that don't exist. Because users have
    the ability to muck around with the templates after everything's been
    set up, there is a chance they'll delete an image or ruin some tag
    after the web designer has set up everything perfectly. I want the
    software to catch mistakes like that and, at the very least, not show
    broken links. On the control panel sometimes as many as 100 thumbnails
    are run for each page, I'm wondering if running file_exists on all of
    those slows things down at all?

  • Andy Hassall

    #2
    Re: is file_exists expensive in performance terms?

    On 2 Feb 2005 15:48:41 -0800, lkrubner@geocities.com wrote:
    >I've written some template code and one thing I'm trying to protect
    >against is references to images that don't exist. Because users have
    >the ability to muck around with the templates after everything's been
    >set up, there is a chance they'll delete an image or ruin some tag
    >after the web designer has set up everything perfectly. I want the
    >software to catch mistakes like that and, at the very least, not show
    >broken links. On the control panel sometimes as many as 100 thumbnails
    >are run for each page, I'm wondering if running file_exists on all of
    >those slows things down at all?

    Clearly, yes - running file_exists() is slower than not running file_exists().
    But that's not a useful thing to say. You've given good reasons why you should
    be checking the existence of the files, so removing the call doesn't seem to be
    an option. file_exists() is the PHP method for checking whether a file exists,
    after all.

    Along with the question about comments, you seem to be trying to optimise at
    random; as another poster pointed out, you need to profile your code to measure
    whether your file_exists() calls, or any other particular part of the script,
    take up a significant proportion of your code's runtime before you start
    thinking about optimising or removing them - there's not much point getting a
    10% reduction in elapsed time on a part that takes 0.01% of the total runtime.

    The cost of file_exists() depends on several factors: operating system, speed
    of disk, whether the directory's inode is in cache, filesystem type, number of
    files in the directory, etc.
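
A minimal sketch of that kind of measurement, not a full profiler. It assumes PHP 5 or later (for `microtime(true)` and `sys_get_temp_dir()`); the helper name `timeExistenceChecks` and the placeholder paths are hypothetical, and the `usleep()` call only stands in for the rest of the page's work.

```php
<?php
// Hypothetical helper: time file_exists() over a list of paths.
// Returns array(elapsed seconds, number of missing files).
function timeExistenceChecks(array $paths)
{
    $missing = 0;
    $start = microtime(true); // float form needs PHP 5 or later
    foreach ($paths as $path) {
        if (!file_exists($path)) {
            $missing++;
        }
    }
    return array(microtime(true) - $start, $missing);
}

$pageStart = microtime(true);

// Placeholder paths: one temporary file that exists, one name that does not.
$existing = tempnam(sys_get_temp_dir(), 'fe');
$paths = array($existing, $existing . '.missing');
list($checkTime, $missing) = timeExistenceChecks($paths);

usleep(20000); // stand-in for the rest of the page's work

$pageTime = microtime(true) - $pageStart;
printf(
    "file_exists(): %.6f s of %.6f s total (%.2f%%), %d missing\n",
    $checkTime,
    $pageTime,
    100 * $checkTime / $pageTime,
    $missing
);

unlink($existing); // remove the temporary file
```

If the percentage printed for the real page is tiny, the checks are not worth optimising.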

    --
    Andy Hassall / <andy@andyh.co.uk> / <http://www.andyh.co.uk>
    <http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool


    • steve

      #3
      Re: is file_exists expensive in performance terms?

      "lkrubner" wrote:
      >I’ve written some template code and one thing I’m trying to protect
      >against is references to images that don’t exist. Because users have
      >the ability to muck around with the templates after everything’s been
      >set up, there is a chance they’ll delete an image or ruin some tag
      >after the web designer has set up everything perfectly. I want the
      >software to catch mistakes like that and, at the very least, not show
      >broken links. On the control panel sometimes as many as 100 thumbnails
      >are run for each page, I’m wondering if running file_exists on all of
      >those slows things down at all?

      Put a loop of 10,000 iterations around file_exists() and look at the
      timing. Can’t get more accurate than that.

      If you are still worried about it, perhaps you could store references
      to those files in a MySQL table?



      • Jean-Baptiste Nizet

        #4
        Re: is file_exists expensive in performance terms?

        No. This won't be very accurate because, as the documentation says,
        the results of this function are cached. So you have to make sure each
        call is made on a different file. Moreover, the cost of the function
        might differ depending on whether the file exists. It might also
        differ depending on whether the directory contains 2 or 2000 files.
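
A rough sketch of how those cases could be timed separately. It assumes PHP 5 or later; `timeRepeated` is a hypothetical helper, and the temporary file is only a placeholder for a real thumbnail. `clearstatcache()` is the documented way to reset PHP's stat cache, but how much the cache affects file_exists() may depend on the PHP version, so the numbers are worth measuring rather than assuming.

```php
<?php
// Hypothetical helper: call file_exists() on $path $n times and return the
// elapsed seconds. When $clearCache is true, the stat cache is reset before
// every call, so the timing also includes the cost of clearstatcache().
function timeRepeated($path, $n, $clearCache)
{
    $start = microtime(true);
    for ($i = 0; $i < $n; $i++) {
        if ($clearCache) {
            clearstatcache();
        }
        file_exists($path);
    }
    return microtime(true) - $start;
}

$existing = tempnam(sys_get_temp_dir(), 'fe'); // a file that exists
$missing  = $existing . '.missing';            // a name that does not

$n = 10000;
printf("existing file, cache left alone: %.4f s\n", timeRepeated($existing, $n, false));
printf("existing file, cache cleared:    %.4f s\n", timeRepeated($existing, $n, true));
printf("missing file:                    %.4f s\n", timeRepeated($missing, $n, false));

unlink($existing); // remove the temporary file
```

To cover the directory-size point as well, the same helper could be run against directories holding different numbers of files.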


        • Senator Jay Billington Bulworth

          #5
          Re: is file_exists expensive in performance terms?

          lkrubner@geocities.com wrote in
          news:1107388121.254255.123800@z14g2000cwz.googlegroups.com:
          >
          > I've written some template code and one thing I'm trying to protect
          > against is references to images that don't exist. Because users have
          > the ability to muck around with the templates after everything's been
          > set up, there is a chance they'll delete an image or ruin some tag
          > after the web designer has set up everything perfectly. I want the
          > software to catch mistakes like that and, at the very least, not show
          > broken links. On the control panel sometimes as many as 100 thumbnails
          > are run for each page, I'm wondering if running file_exists on all of
          > those slows things down at all?
          >

          After some local testing, I believe the answer is no. I built the following
          script:

          <?php

          // Return the current Unix time in seconds as a float.
          function getmicrotime() {
              list($usec, $sec) = explode(' ', microtime());
              return ((float)$usec + (float)$sec);
          }

          // Find out how long it takes to do nothing 1000 times
          $start = getmicrotime();
          for ($i = 0; $i < 1000; $i++) {
              // Control - do nothing
          }
          $finish = getmicrotime();
          $latency = sprintf("%.2f", ($finish - $start));
          echo "It took $latency seconds to do nothing 1000 times.\n";

          // Find out how long it takes to sleep 10 seconds
          $start = getmicrotime();
          sleep(10);
          $finish = getmicrotime();
          $latency = sprintf("%.2f", ($finish - $start));
          echo "It took $latency seconds to sleep 10 seconds.\n";

          // Find out how long it takes to generate 1000 random filenames
          $start = getmicrotime();
          for ($i = 0; $i < 1000; $i++) {
              $foo = uniqid('');
          }
          $finish = getmicrotime();
          $latency = sprintf("%.2f", ($finish - $start));
          echo "It took $latency seconds to call uniqid() 1000 times.\n";

          // Find out how long it takes to look up 1000 random filenames
          $start = getmicrotime();
          for ($i = 0; $i < 1000; $i++) {
              $foo = file_exists('/home/foo/' . uniqid(''));
          }
          $finish = getmicrotime();
          $latency = sprintf("%.2f", ($finish - $start));
          echo "It took $latency seconds to look up 1000 files.\n";

          ?>

          The purpose of this script is to test how long it takes to do various
          things 1,000 times. First, it tests doing nothing. Then, it tests sleeping
          for 10 seconds as another control. Next, it generates 1,000 random
          filenames using uniqid(). Finally, it runs file_exists() against 1,000
          filenames randomly generated with uniqid().

          Here is the output from a few runs on my local dev machine:

          [root@winfosec phptest]# php test.php
          It took 0.00 seconds to do nothing 1000 times.
          It took 10.00 seconds to sleep 10 seconds.
          It took 20.00 seconds to call uniqid() 1000 times.
          It took 20.05 seconds to look up 1000 files.

          [root@winfosec phptest]# php test.php
          It took 0.00 seconds to do nothing 1000 times.
          It took 10.01 seconds to sleep 10 seconds.
          It took 20.04 seconds to call uniqid() 1000 times.
          It took 20.06 seconds to look up 1000 files.

          [root@winfosec phptest]# php test.php
          It took 0.00 seconds to do nothing 1000 times.
          It took 10.01 seconds to sleep 10 seconds.
          It took 20.02 seconds to call uniqid() 1000 times.
          It took 20.04 seconds to look up 1000 files.

          What I'm really looking at here are the last two values. Test #3 times
          the generation of 1,000 names via uniqid(''). Test #4 times
          file_exists() on names generated the same way, so it includes the
          uniqid('') cost as well. Test #4 takes only 0.02 to 0.05 seconds longer
          than test #3. That tells me that nearly all of the time goes to the
          uniqid('') calls, not to the file_exists() calls.

          file_exists() appears to be a pretty low-resource function, at least from
          my results. This test was performed using PHP 4.3.10 on a FreeBSD 4.10
          system (P3 600 MHz, 40 MB of RAM). Better systems will no doubt give
          better results. If your server is a bit more modern than my test machine,
          I would suggest that you shouldn't have any problem calling file_exists()
          hundreds or even thousands of times per execution.
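
Since uniqid('') dominated the timings above, one possible variation is to build the filenames before starting the clock, so that only the lookups are timed. This is a sketch under assumptions, not the script that produced the results above: it needs PHP 5 or later, the `fe_bench_` names are made up, and the system temp directory stands in for a real image directory.

```php
<?php
$n = 1000;
$dir = sys_get_temp_dir(); // assumption: any readable directory will do

// Build the (almost certainly non-existent) filenames outside the timed section.
$names = array();
for ($i = 0; $i < $n; $i++) {
    $names[] = $dir . '/fe_bench_' . getmypid() . '_' . $i;
}

// Control: walk the list without touching the filesystem.
$start = microtime(true);
foreach ($names as $name) {
    $len = strlen($name);
}
$control = microtime(true) - $start;

// Measurement: the same loop, with file_exists() on each name.
$found = 0;
$start = microtime(true);
foreach ($names as $name) {
    if (file_exists($name)) {
        $found++;
    }
}
$lookup = microtime(true) - $start;

// Approximate cost per call in microseconds, with the loop overhead removed.
$perCall = max(0.0, ($lookup - $control) / $n * 1e6);
printf("Looked up %d names (%d found) in %.4f s, about %.1f microseconds per call.\n",
    $n, $found, $lookup, $perCall);
```

Like the original test, this only measures lookups of files that do not exist; a list of real thumbnail paths would be needed to time the other case.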

          hth


          --

          Bulworth : PHP/MySQL/Unix | Email : str_rot13('f@fung.arg');
          --------------------------|---------------------------------
          <http://www.phplabs.com/> | PHP scripts, webmaster resources


          • Chung Leong

            #6
            Re: is file_exists expensive in performance terms?

            <lkrubner@geocities.com> wrote in message
            news:1107388121.254255.123800@z14g2000cwz.googlegroups.com...
            >
            > I've written some template code and one thing I'm trying to protect
            > against is references to images that don't exist. Because users have
            > the ability to muck around with the templates after everything's been
            > set up, there is a chance they'll delete an image or ruin some tag
            > after the web designer has set up everything perfectly. I want the
            > software to catch mistakes like that and, at the very least, not show
            > broken links. On the control panel sometimes as many as 100 thumbnails
            > are run for each page, I'm wondering if running file_exists on all of
            > those slows things down at all?
            >

            The answer is yes and no. Yes, file_exists() is fairly expensive. Although
            PHP does cache directory info, I do not believe the cache lives beyond the
            lifetime of the request. At best, it's a per-thread cache, meaning requests
            handled by other server threads would still end up hitting the OS.

            The answer is also no, however, because whatever overhead there is will be
            small compared to loading 100 thumbnails.
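
If the same image can be referenced several times on one page, an application-level memo is one way to avoid repeating the check. The sketch below is only an illustration: `imageExists` and `thumbnailSrc` are hypothetical helpers, `placeholder.png` is a made-up fallback, and the memo lives in a static array, so it too lasts only for the current request.

```php
<?php
// Hypothetical helper: remember each existence result for the rest of the
// request, so a path is checked against the filesystem at most once.
function imageExists($path)
{
    static $seen = array();
    if (!array_key_exists($path, $seen)) {
        $seen[$path] = file_exists($path);
    }
    return $seen[$path];
}

// Hypothetical helper: return the image path if the file is there,
// otherwise a placeholder, so the template never emits a broken link.
function thumbnailSrc($path, $placeholder)
{
    return imageExists($path) ? $path : $placeholder;
}

// Usage: a made-up thumbnail path with a made-up fallback image.
echo thumbnailSrc('/path/to/thumbs/photo1.jpg', 'placeholder.png'), "\n";
```

The trade-off is that a file deleted or created in the middle of a request is not noticed until the next request.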

