Data objects vs. speed

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • rpsetzer@yahoo.com

    Data objects vs. speed

    I have to create a big web application and I was thinking of using a
    data layer. For each entity in the database, I'll define a class that
    maps the table structure, having sub-objects for each foreign key,
    having insert/delete/update methods, the usual deal. Yet, I am very
    concerned about performance. For example, there are lots of cases when
    I may just be needing the employee name. Yet using this model, I will
    have to instantiate an entire Employee class, which may have sub-
    objects and use lots of SELECT queries. I think this will be an
    important performance hit. What do you think? Ever did something like
    this? Is it worthy? Is there a better way? Should I give up creating
    the data layer?

  • lister

    #2
    Re: Data objects vs. speed

    I have just implemented something similar.

    You can structure the SELECTs in such a way that they are only run if
    the relavent information is requested from the class via a get method.
    For instance, if any of the methods requesting user information are
    called, a SELECT is run which gets all user information (since all
    info is on the same row - might as well get it all). Any further
    requests then give up this cached info rather than another SELECT.

    However, I share your concern with having to instantiate a large class
    for even simple queries. I am a newbie at PHP (C++ coder by trade) and
    am only now reading about the performance overheard of function calls
    and the fact that PHP will have to parse the entire class code.

    Is it worth the trade off? I don't know. I has certainly made the
    coding a whole lot easier and more structured. I am now implementing
    cacheing certain data in the session, and this is a snip as everything
    is accessed via the same class.

    I will read this thread with interest.




    Comment

    • Jerry Stuckle

      #3
      Re: Data objects vs. speed

      rpsetzer@yahoo. com wrote:
      I have to create a big web application and I was thinking of using a
      data layer. For each entity in the database, I'll define a class that
      maps the table structure, having sub-objects for each foreign key,
      having insert/delete/update methods, the usual deal. Yet, I am very
      concerned about performance. For example, there are lots of cases when
      I may just be needing the employee name. Yet using this model, I will
      have to instantiate an entire Employee class, which may have sub-
      objects and use lots of SELECT queries. I think this will be an
      important performance hit. What do you think? Ever did something like
      this? Is it worthy? Is there a better way? Should I give up creating
      the data layer?
      >
      I do objects in all my more complex sites. But I generally use business
      objects - ones related to the job at hand. Any business object may be
      related to a few columns in one table, most columns across multiple
      tables, or anything in between.

      And I may even use the same object, but have different fetch methods
      which retrieve only partial data.

      For instance, if I have:

      table student
      id int
      name char(30)
      other stuff

      table class
      id int
      name char
      instructor_id int
      other stuff

      table class_student
      class_id int
      student_id int

      I might have a class which will retrieve all the student names, the
      instructor name, day(s) and start/s for a single class, the room in
      which it will be held, etc.. Alternatively, it might only retrieve a
      count of the number of students in the class

      If the data is getting more complex, I may add a pure data layer under
      the business object layer, where I do have one table per object. But
      this layer will also have multiple functions to allow me to retrieve
      what I need, when I need it.

      --
      =============== ===
      Remove the "x" from my email address
      Jerry Stuckle
      JDS Computer Training Corp.
      jstucklex@attgl obal.net
      =============== ===

      Comment

      • Jerry Stuckle

        #4
        Re: Data objects vs. speed

        lister wrote:
        I have just implemented something similar.
        >
        You can structure the SELECTs in such a way that they are only run if
        the relavent information is requested from the class via a get method.
        For instance, if any of the methods requesting user information are
        called, a SELECT is run which gets all user information (since all
        info is on the same row - might as well get it all). Any further
        requests then give up this cached info rather than another SELECT.
        >
        However, I share your concern with having to instantiate a large class
        for even simple queries. I am a newbie at PHP (C++ coder by trade) and
        am only now reading about the performance overheard of function calls
        and the fact that PHP will have to parse the entire class code.
        >
        Is it worth the trade off? I don't know. I has certainly made the
        coding a whole lot easier and more structured. I am now implementing
        cacheing certain data in the session, and this is a snip as everything
        is accessed via the same class.
        >
        I will read this thread with interest.
        >
        >
        >
        >
        In a big class where you need lots of data, this would be a performance
        disaster. Some of my tables have 40+ columns - and making 40+ SELECT
        statements, each fetching one item, is a tremendous waste of time.

        --
        =============== ===
        Remove the "x" from my email address
        Jerry Stuckle
        JDS Computer Training Corp.
        jstucklex@attgl obal.net
        =============== ===

        Comment

        • lister

          #5
          Re: Data objects vs. speed

          In a big class where you need lots of data, this would be a performance
          disaster. Some of my tables have 40+ columns - and making 40+ SELECT
          statements, each fetching one item, is a tremendous waste of time.
          I think you misunderstand me. There is only one select statement per
          table. Yes, I could have 40 columns, and 40 different get*() methods,
          but only the first call to one of those methods would result in a
          SELECT *. Then all 40 pieces of information would be populated in the
          class, and any subsequent get*() calls would simply return the value.

          Comment

          • Jerry Stuckle

            #6
            Re: Data objects vs. speed

            lister wrote:
            >In a big class where you need lots of data, this would be a performance
            >disaster. Some of my tables have 40+ columns - and making 40+ SELECT
            >statements, each fetching one item, is a tremendous waste of time.
            >
            I think you misunderstand me. There is only one select statement per
            table. Yes, I could have 40 columns, and 40 different get*() methods,
            but only the first call to one of those methods would result in a
            SELECT *. Then all 40 pieces of information would be populated in the
            class, and any subsequent get*() calls would simply return the value.
            >
            OK, that's better. But it's also a lot of unnecessary overhead to fetch
            40 columns if you only need 3.

            Additionally, you should never use SELECT * - put in the names of the
            columns instead. For instance, what happens if someone adds a 10Mb BLOB
            field to your table and you SELECT *?

            As I indicated above, a better compromise between getting everything and
            getting just one field is to have more than one fetch method to allow
            fetching of multiple columns.

            For instance, on one customer I have one table (companies they sell to)
            with 42 columns (and yes, it is normalized). One fetch might get
            company name and id. Another might get this plus address and phone
            info. A third one might get all of this plus current financial info.

            And since this is a business object, I could have a method which fetches
            outstanding transactions. Another can fetch all transactions. These
            and others like them link to other tables - but that is transparent to
            the program itself.

            --
            =============== ===
            Remove the "x" from my email address
            Jerry Stuckle
            JDS Computer Training Corp.
            jstucklex@attgl obal.net
            =============== ===

            Comment

            • Toby A Inkster

              #7
              Re: Data objects vs. speed

              rpsetzer wrote:
              Yet, I am very concerned about performance. For example, there are lots
              of cases when I may just be needing the employee name.
              The approach I'm using in a current project is along these lines:

              public function load_from_datab ase($keys) {...}
              public function load_from_array ($vals) {...}

              Such that the class can be populated from either the database, or from an
              array. It can be partially populated, so that, for example, you can load
              the Employee ID and Employee Name from an array like so:

              $e = new Employee();
              $e->load_from_arra y(array('id'=>1 , 'name'=>'Joe')) ;

              where the other fields like address, telephone number, etc would remain
              blank within the object, but if they were asked for:

              print $e->get('phone') ;

              then the Employee object is smart enough to grab all the missing data from
              the database and return the appropriate value.

              So for example:

              $db = new PDO(...);
              foreach ($db->query('SELEC T id, name FROM employees') as $row)
              {
              $e = new Employee();
              $e->load_from_arra y($row);
              print $e->get('name');
              if ($e->get('name')==' Joe')
              print ' '.$e->get('phone') ;
              print "\n";
              }

              will work as expected, printing a list of employees, including phone
              numbers for employees called Joe. Yes, that's right -- even when the
              initial query didn't select phone numbers!

              When you perform the get('phone') call, $e realises that 'phone' is NULL,
              so performs its own call to load_from_datab ase, which runs 'SELECT * FROM
              employees WHERE id={$this->id}' and then feeds the results into
              load_from_array .

              --
              Toby A Inkster BSc (Hons) ARCS
              Contact Me ~ http://tobyinkster.co.uk/contact
              Geek of ~ HTML/SQL/Perl/PHP/Python*/Apache/Linux

              * = I'm getting there!

              Comment

              • Henk Verhoeven

                #8
                Re: Data objects vs. speed

                Hi,

                Of course it depends on what you need to do and how much processing
                power you have at hand. For some purposes using a relational database
                may itself be a bad idea. But for most apps data objects can work fine,
                if you implement caching. When i developed the first version of
                phpPeanuts i wanted it to be the simpelest thing that could possebly
                work so i did not include caching. But when i tested my first serious
                app with it, caching was the first thing to be added. Without it it
                simply did not perform. With caching the app only fired 1/4th to 1/10th
                of the queries, this solved the performance problem. I implemented
                caching in the framework and i have not yet had to develop an app for
                wchich it was not sufficient.

                To implement caching each object type needs to have a key. For each
                object that is retrieved from the database a reference* is added to the
                cache, using its key as key in an associative array. Associative arrays
                are great for caching because they use hashed lookup to quickly retrieve
                objects. With hashed lookup a there is very little searching, and the
                searching does not increase substantially with large arrays.

                When navigating over a '1-n' relationship in the '1' direction you have
                the (now foreign) key, so you first look in the cache to see if the
                object has already been loaded. If it has not, you load it and cache it
                and return a reference. This mechanism is so fast that it is not needed
                to store the reference to a related object in a member variable of the
                relating object. This saves you a big headache with circular references
                that tend to make php unstable.

                When navigating over an 1-n relationship in the n direction the chache
                is of little help**. Therefore it is generally a good idea to store each
                array of objects you retrieve in a member variable of the relating object.

                BTW, you do not have to implement insert/delete/update methods for each
                object type, with metadata you can implement generic insert, delete and
                update methods so you only need three methods however many types you
                make. Or you could download a framework that does all this ;-).

                Greetings,

                Henk Verhoeven,
                www.phpPeanuts.org.

                BTW phpPeanuts does not work if you need arbitrary keys. It allways
                needs 'id' to be the key. Simpelest thing...

                * in php5 it is not necessary to think about references, but in php4 you
                may get a subtantial performance gain by using variable references here.
                ** However, if an object is stored in multiple tables but you do not
                know in advance in which ones (like with polymorphism) the cache may
                save you an extra query to retrieve additional data.

                Comment

                • Tony Marston

                  #9
                  Re: Data objects vs. speed


                  <rpsetzer@yahoo .comwrote in message
                  news:1171888343 .626985.121040@ v33g2000cwv.goo glegroups.com.. .
                  >I have to create a big web application and I was thinking of using a
                  data layer. For each entity in the database, I'll define a class that
                  maps the table structure, having sub-objects for each foreign key,
                  having insert/delete/update methods, the usual deal. Yet, I am very
                  concerned about performance. For example, there are lots of cases when
                  I may just be needing the employee name. Yet using this model, I will
                  have to instantiate an entire Employee class, which may have sub-
                  objects and use lots of SELECT queries. I think this will be an
                  important performance hit. What do you think? Ever did something like
                  this? Is it worthy? Is there a better way? Should I give up creating
                  the data layer?
                  >
                  Having a single class for each database table is a good idea - I have been
                  using it for years. The notion of having a subclass for each foreign key is
                  a definite no-no. It is totally unnecessary and a waste of time. Nor do you
                  need a separate class method for each possible SELECT ... WHERE ... as it is
                  possible to have a generic getData($where) method where the $where argument
                  is a string which is provided at runtime. This can cover all eventualities.

                  It does not matter that you have to instantiate the EMPLOYEE class
                  containing 40 columns if you only want a single column. In my methodology
                  the default is SELECT *, but a specific list of column names can be provided
                  for individual queries if required. This removes the need for a different
                  class method for each combination of SELECT ... and WHERE ...

                  One thin that you should notice is that 95% of the code in a database table
                  class is common to all database tables, therefore this common code can be
                  put into a superclass and inherited by individual table classes. This means
                  that each individual table class need contain no more than the following:
                  (1) database engine name
                  (2) database name
                  (3) table name
                  (4) database structure (column names, primary key names, candidate key
                  names, relationships with other tables)
                  (5) custom business rules

                  If you are really clever you can put your database APIs in a separate class
                  so that you can switch database engines (MySQL, PostgreSQL or Oracle) by
                  changing a single line of code.

                  The bottom line is that if you use OO classes intelligently you can save a
                  lot of repetitive coding, must don't waste time trying to create a complex
                  class hierarchy - it just ain't worth it.

                  --
                  Tony Marston
                  This is Tony Marston's web site, containing personal information plus pages devoted to the Uniface 4GL development language, XML and XSL, PHP and MySQL, and a bit of COBOL

                  Build apps faster with Rapid Application Development using open-source RAD tools, modern RAD frameworks, and rapid application design methods.



                  Comment

                  Working...