Locking threads

This topic is closed.
  • Jim

    Locking threads

    Hi,

    I'm developing a CMS and I'd like to be able to cache the site "tree" in
    a multi-dimensional array in shared memory (far too many sql calls
    otherwise). When someone adds an item in the tree I need to be able to
    read in the array from shared memory, add the new item, then write it
    back to shared memory... all in one atomic action.

    I've done plenty of research and short of using something like
    eaccelerator or mmcache I'm stuck with PHP semaphores which even then
    don't appear to be thread safe, only process safe (correct me if I'm
    wrong) - and then I'm restricted to *nix systems.

    Is there any way to create a method of doing the above which will work
    on *nix and Windows, whether it's multi-threaded or single-threaded?

    Thanks,

    Jim.
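
    [Aside: a minimal sketch of the shared-memory approach Jim describes,
    using PHP's sysvsem/sysvshm extensions (*nix only, process-level
    locking, as he notes). The keys, segment size and variable slot are
    made up.]

    <?php
    $sem = sem_get(0x5350, 1);          // one semaphore guarding the cache
    $shm = shm_attach(0x5351, 1048576); // 1 MB shared memory segment

    sem_acquire($sem);                  // other processes block here
    $tree = @shm_get_var($shm, 1);      // slot 1 holds the cached tree
    if (!is_array($tree)) {
        $tree = array();                // nothing cached yet
    }
    $tree['new-id'] = array('parent' => 'parent-id'); // add the new item
    shm_put_var($shm, 1, $tree);        // write the whole array back
    sem_release($sem);                  // unlock

    shm_detach($shm);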

  • Jerry Stuckle

    #2
    Re: Locking threads

    Jim wrote:
    > Hi,
    >
    > I'm developing a CMS and I'd like to be able to cache the site "tree" in
    > a multi-dimensional array in shared memory (far too many sql calls
    > otherwise). When someone adds an item in the tree I need to be able to
    > read in the array from shared memory, add the new item, then write it
    > back to shared memory... all in one atomic action.
    >
    > I've done plenty of research and short of using something like
    > eaccelerator or mmcache I'm stuck with PHP semaphores which even then
    > don't appear to be thread safe, only process safe (correct me if I'm
    > wrong) - and then I'm restricted to *nix systems.
    >
    > Is there any way to create a method of doing the above which will work
    > on *nix and Windows, whether it's multi-threaded or single-threaded?
    >
    > Thanks,
    >
    > Jim.

    Jim,

    Use a database. There are dozens around which use databases; if
    implemented properly they can be quite efficient.

    --
    ==================
    Remove the "x" from my email address
    Jerry Stuckle
    JDS Computer Training Corp.
    jstucklex@attglobal.net
    ==================

    • Jim

      #3
      Re: Locking threads

      > > [snip: original question]
      >
      > Jim,
      >
      > Use a database. There are dozens around which use databases; if
      > implemented properly they can be quite efficient.

      Hi Jerry,

      If I could think of a way of doing it efficiently then I'd stick with
      db only, but I can't see how. For example, I have a table which
      represents the structure of the site, so to put it simply each record
      has an id and a parent id. To build say a left hand nav I may need to
      call 3 or 4 sql statements to get all the data I need which I'd like
      to avoid doing if possible.

      Thanks,

      Jim.
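
      [Aside: for illustration, the kind of adjacency-list table Jim is
      describing; the table and column names are hypothetical.]

      <?php
      // Roughly the table Jim describes:
      //   site_tree(id INT PRIMARY KEY, parent_id INT, title VARCHAR(255))
      // Built naively, the nav costs one query per level (assumes an open
      // mysql connection):
      $top = mysql_query('SELECT id, title FROM site_tree WHERE parent_id = 0');
      // ...then another query for each open branch, and so on.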

      • rf

        #4
        Re: Locking threads


        "Jim" <jimyt@yahoo.co mwrote in message
        news:1184570751 .961494.54930@m 3g2000hsh.googl egroups.com...
        >Use a database. There are dozens around which use databases; if
        >implemented properly they can be quite efficient.
        >
        Hi Jerry,
        >
        If I could think of a way of doing it efficiently then I'd stick with
        db only, but I can't see how. For example, I have a table which
        represents the structure of the site, so to put it simply each record
        has an id and a parent id. To build say a left hand nav I may need to
        call 3 or 4 sql statements to get all the data I need which I'd like
        to avoid doing if possible.
        Premature optimization?

        Three or four sql calls are sub-millisecond (once the operating system has
        *cached* your database working set for you). Compare this to the tens of
        milliseconds for a TCP/IP packet exchange or, from over here in .au,
        hundreds of milliseconds.

        Have you done any benchmarking to prove the sql calls are really a problem?

        --
        Richard.
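
        [Aside: a quick-and-dirty way to answer Richard's benchmarking
        question; the query and iteration count are illustrative and assume
        an open mysql connection.]

        <?php
        $start = microtime(true);
        for ($i = 0; $i < 100; $i++) {
            $res = mysql_query('SELECT id, parent_id, title FROM site_tree');
            while (mysql_fetch_assoc($res)) {
                // just drain the result set
            }
        }
        printf("avg per query: %.3f ms\n",
               (microtime(true) - $start) / 100 * 1000);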


        • Jim

          #5
          Re: Locking threads

          > > [snip: previous post]
          >
          > Premature optimization?

          Perhaps, but I'm still interested in the shared memory caching options
          as there's more than one instance where I believe caching of data
          would come in handy. At the moment I'm not seeing issues with
          performance, however on accessing a page I may have 5 or 6 db
          statements accessing data which rarely changes (i.e. site structure,
          data dictionary) and it feels like I'm making the db work
          unnecessarily when the data could be cached. I need the cms to be
          able to scale well and I feel that ignoring caching would be a
          mistake.

          Jim.

          • rf

            #6
            Re: Locking threads


            "Jim" <jimyt@yahoo.co mwrote in message
            news:1184578259 .619879.293680@ k79g2000hse.goo glegroups.com.. .
            If I could think of a way of doing it efficiently then I'd stick with
            db only, but I can't see how. For example, I have a table which
            represents the structure of the site, so to put it simply each record
            has an id and a parent id. To build say a left hand nav I may need to
            call 3 or 4 sql statements to get all the data I need which I'd like
            to avoid doing if possible.
            >>
            >Premature optimization?
            >
            Perhaps, but I'm still interested in the shared memory caching options
            You already have one. Your operating system. Modern OSs (and many older
            ones) are very very good at keeping a processes working set in memory. The
            authors have spent many years fine tuning the caching algorithms. A cache
            miss is very expensive (read many milliseconds). A cache hit is to be
            ignored (microseconds).

            <checks OS> Yep. Of the two gigabytes of physical memory present on this
            computer the OS is currently allocating almost one gig to System Cache. In
            there would be the working set of each program I have open (not windows,
            programs), the working set of each process (yes, windows) I have open and
            most likely the contents of the most recent files each process has opened.
            In the case of my database server I would expect that most of the indexes
            and much of the data that I have recently hit would be in the OS cache. All
            this is over and above the virtual memory living behind my real memory,
            which is much faster than any other file access.

            Your server would be running far fewer processes than I am at the moment.

            > as there's more than one instance where I believe caching of data
            > would come in handy. At the moment I'm not seeing issues with
            > performance, however on accessing a page I may have 5 or 6 db
            > statements accessing data which rarely changes (i.e. site structure,
            > data dictionary)

            So, those things will most likely be in the OS cache, if they come from a
            database (after the first hit on your page, that is). If *you* re-cache them
            in memory then you are defeating the OS or the database cache, since your
            memory cache will use up memory that they may have been able to use. And,
            I'll bet, they are better at cache algorithms than you, or I, are :-)

            > and it feels like I'm making the db work
            > unnecessarily when the data could be cached.

            Cross purposes. The db is working from cached data. Why layer another
            caching system over the top of that?

            > I need the cms to be
            > able to scale well

            Think about all the other CMS's around. They all use a database. Then think
            about the obviously database driven web sites out there. CNN, ebay and, most
            to the point, Google. Do you really think they have put lots of effort into
            building a turnkey memory cache? Nope. They rely on the technologies that
            lie underneath the sql call. They rely on the database manager to perform
            properly (ie, cache where it can, and should), which relies on the operating
            system to perform properly (ie, cache where it can), which relies on the
            hardware to perform properly (ie, to cache where it can and yes, modern disk
            drives do cache, they even do read ahead, anticipating that if you have just
            read this bit you will probably read what follows pretty soon).

            > and I feel that ignoring caching would be a
            > mistake.

            Developing a cache to lie above all the other technology that is already
            there would IMHO be the mistake. Better to simply add one more index in the
            right place to your database, so your database manager can use the (cached)
            index to better access your data. Or compress that 200K image you have down
            to the 20K it should be.

            Finally, what about all the other things that happen during a page access?
            How many PHP files make up the page? (you do use include, don't you?) How
            many (correctly compressed) images are there? The CSS files? Javascript? And
            where do these files live, after the first hit on your site? In the OS cache
            :-)

            Premature optimization.

            Phew, it's now time for a quiet beer ;-)
            --
            Richard.
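
            [Aside: the "one more index" Richard suggests is a one-off statement;
            the table and column names below are hypothetical.]

            <?php
            // Assumes an open mysql connection. An index on the parent column
            // lets the database answer "children of X" lookups from its own
            // (cached) index pages instead of scanning the table.
            mysql_query('ALTER TABLE site_tree ADD INDEX idx_parent (parent_id)');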



            • Jerry Stuckle

              #7
              Re: Locking threads

              Jim wrote:
              > > [snip]
              >
              > Hi Jerry,
              >
              > If I could think of a way of doing it efficiently then I'd stick with
              > db only, but I can't see how. For example, I have a table which
              > represents the structure of the site, so to put it simply each record
              > has an id and a parent id. To build say a left hand nav I may need to
              > call 3 or 4 sql statements to get all the data I need which I'd like
              > to avoid doing if possible.
              >
              > Thanks,
              >
              > Jim.

              It is efficient - and probably a lot more so than what you're trying to
              do. Not only do you need to cache it in memory, but you need ways to
              identify the cached data, determine if it is in memory and, if not, load
              it into memory - plus a whole bunch of other things. All of these can
              easily take more time than a simple database access (or three or four).

              There's a good reason why every CMS today uses databases - it works, and
              it works well.

              And don't prematurely optimize your code.

              --
              ==================
              Remove the "x" from my email address
              Jerry Stuckle
              JDS Computer Training Corp.
              jstucklex@attglobal.net
              ==================

              • FrobinRobin

                #8
                Re: Locking threads

                On Jul 16, 11:30 am, "rf" <r...@invalid.com> wrote:
                > [snip]

                I am developing a system which relies heavily on multiple include()
                calls and has lots of sql calls/data... so I saved my sql results
                to a session array (so it only runs the queries once and updates them
                only when needed). I thought all the sql would slow the page down a
                lot (especially as I'm on a shared hosting server) but to my surprise
                PHP is really, really quick - I'm not even up to half a second yet :)
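
                [Aside: roughly what FrobinRobin describes - caching query results
                in the session; the query, key name and table are made up.]

                <?php
                session_start();
                if (!isset($_SESSION['nav_cache'])) {
                    // assumes an open mysql connection
                    $res  = mysql_query('SELECT id, parent_id, title FROM site_tree');
                    $rows = array();
                    while ($row = mysql_fetch_assoc($res)) {
                        $rows[] = $row;
                    }
                    $_SESSION['nav_cache'] = $rows; // remember to clear this on updates
                }
                $nav = $_SESSION['nav_cache'];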

                • Jim

                  #9
                  Re: Locking threads

                  > > as there's more than one instance where I believe caching of data
                  > > would come in handy. At the moment I'm not seeing issues with
                  > > performance, however on accessing a page I may have 5 or 6 db
                  > > statements accessing data which rarely changes (i.e. site structure,
                  > > data dictionary)
                  >
                  > So, those things will most likely be in the OS cache, if they come from a
                  > database (after the first hit on your page, that is). If *you* re-cache them
                  > in memory then you are defeating the OS or the database cache, since your
                  > memory cache will use up memory that they may have been able to use. And,
                  > I'll bet, they are better at cache algorithms than you, or I, are :-)

                  I understand what you're saying but when you consider that each time I
                  display a nav I need to execute a recursive function which may result
                  in many calls to the database then I struggle to believe that it'll be
                  anywhere near as quick as retrieving an array from shared memory, even
                  if the entire database is cached... I'll have to perform some tests.

                  > > and it feels like I'm making the db work
                  > > unnecessarily when the data could be cached.
                  >
                  > Cross purposes. The db is working from cached data. Why layer another
                  > caching system over the top of that?

                  Because there's a fair amount of php code executed to build the multi-
                  dimensional arrays that represent the site structure and data
                  dictionary. It would save the execution of this code every time.

                  I think I need to run some tests and see what happens. I'll give
                  eAccelerator a go for now.

                  Cheers,

                  Jim.
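
                  [Aside: a sketch of the eAccelerator route Jim mentions. The
                  function names are eAccelerator's shared-memory API as best
                  recalled - worth checking against its docs - and
                  build_site_tree() is a hypothetical helper.]

                  <?php
                  $tree = function_exists('eaccelerator_get')
                        ? eaccelerator_get('site_tree')
                        : null;
                  if ($tree === null) {
                      $tree = build_site_tree();              // e.g. from the db
                      if (function_exists('eaccelerator_put')) {
                          eaccelerator_put('site_tree', $tree, 0); // 0 = no expiry
                      }
                  }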


                  • Jim

                    #10
                    Re: Locking threads

                    > I am developing a system which relies heavily on multiple include()
                    > calls and has lots of sql calls/data... so I saved my sql results
                    > to a session array (so it only runs the queries once and updates them
                    > only when needed).

                    I'm effectively doing that at the moment but I'd like to take it one
                    step further with shared memory. Did you notice that your system was
                    slower before caching the mysql results?

                    Thanks,

                    Jim.

                    • Rik

                      #11
                      Re: Locking threads

                      On Mon, 16 Jul 2007 13:21:39 +0200, Jim <jimyt@yahoo.com> wrote:
                      > I understand what you're saying but when you consider that each time I
                      > display a nav I need to execute a recursive function which may result
                      > in many calls to the database then I struggle to believe that it'll be
                      > anywhere near as quick as retrieving an array from shared memory, even
                      > if the entire database is cached... I'll have to perform some tests.

                      It should not have to be like that. If you have an adjacency model, what
                      about this (bogus code, unchecked):

                      $navs  = mysql_query('SELECT id, name, parent FROM table');
                      $pages = array();
                      while ($page = mysql_fetch_assoc($navs)) {
                          $pages[$page['id']] = $page;
                      }
                      foreach ($pages as $id => $page) {
                          // Are 'root' nodes stored with parent = 0 or NULL? Both are handled:
                          $parent = ($page['parent'] > 0) ? $page['parent'] : 0;
                          // may be unnecessary, but I like to be explicit:
                          if (!isset($pages[$parent]['childs'])) $pages[$parent]['childs'] = array();
                          // reference it in the parent:
                          $pages[$parent]['childs'][] =& $pages[$id];
                      }
                      print_r($pages[0]['childs']);

                      One query, some fiddling with references, and you're done; be very wary
                      of recursion (cycles) in your tree though.

                      Of course, you could always try a nested set, which might be more
                      appropriate for a navigation.
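
                      [Aside: an illustration of the nested-set idea, not part of Rik's
                      post; lft/rgt are the usual nested-set columns and the table name
                      is hypothetical.]

                      <?php
                      // Assumes an open mysql connection. One query returns the whole
                      // tree in display order, with a depth value for indentation.
                      $res = mysql_query(
                          'SELECT node.id, node.title, node.lft, COUNT(parent.id) - 1 AS depth
                             FROM site_tree AS node, site_tree AS parent
                            WHERE node.lft BETWEEN parent.lft AND parent.rgt
                            GROUP BY node.id, node.title, node.lft
                            ORDER BY node.lft'
                      );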

                      > > > and it feels like I'm making the db work
                      > > > unnecessarily when the data could be cached.
                      > >
                      > > Cross purposes. The db is working from cached data. Why layer another
                      > > caching system over the top of that?
                      >
                      > Because there's a fair amount of php code executed to build the multi-
                      > dimensional arrays that represent the site structure and data
                      > dictionary. It would save the execution of this code every time.

                      Possibly go for the simple solution of saving the tree once on changes in
                      the backend with var_export(), and just calling that from somewhere (file,
                      db, etc)?
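
                      [Aside: a rough sketch of the var_export() idea; the cache file
                      path and $tree variable are placeholders.]

                      <?php
                      // In the backend, whenever the tree changes:
                      file_put_contents('/tmp/site_tree.cache.php',
                          '<?php return ' . var_export($tree, true) . ';');

                      // On every page view:
                      $tree = include '/tmp/site_tree.cache.php';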

                      However, as others have said, only employ this kind of thing when you
                      think your server isn't coping right now / takes too long to complete a
                      page. If it's all right without caching, don't bother with it.
                      --
                      Rik Wasmus
