Searching in tree structures

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Gordon

    Searching in tree structures

    I've developed a CMS that manages content across several sites. The
    content is represented in a tree structure maintained in a database,
    where each item has an id and a parent that points to the containing
    item or 0 if it's a root item (a site).

    I developed a fulltext search script for the CMS that will return
    pages ordered by relevance.

    This is fine, but I've now had a change request whereby searches can
    be restricted to specific sites. This is a problem because the only
    way to know what site an item belongs to is to fetch it, walk up it's
    tree and look at it's root node.

    I can see 2 possible solutions, but both have drawbacks. The first is
    to just fetch all matches and then discard ones that don't belong to
    the specified site. This means increased DB load and traffic between
    the DB and the webserver and could potentially cause quite a
    performance hit when the number of pages being dealt with climbs into
    the thousands.

    The other solution is to, in addition to the parent node, also store
    the root node in the database. This would give an extra field to
    match against and make eliminating items with an SQL query, thus
    reducing load and traffic, quite simple. However the big drawback is
    that there are now two fields determining where an item fits into the
    tree structure instead of just one, making it an obvious source of
    potential error. It would also mean updating code related to item
    creation updating that wouldn't have to be touched with the other
    approach.

    If any of you guys out there have another potential solution that I
    might have not thought of I'd like to hear what they are. Failing
    that, which of the options I do have would you recommend?
  • Jerry Stuckle

    #2
    Re: Searching in tree structures

    Gordon wrote:
    I've developed a CMS that manages content across several sites. The
    content is represented in a tree structure maintained in a database,
    where each item has an id and a parent that points to the containing
    item or 0 if it's a root item (a site).
    >
    I developed a fulltext search script for the CMS that will return
    pages ordered by relevance.
    >
    This is fine, but I've now had a change request whereby searches can
    be restricted to specific sites. This is a problem because the only
    way to know what site an item belongs to is to fetch it, walk up it's
    tree and look at it's root node.
    >
    I can see 2 possible solutions, but both have drawbacks. The first is
    to just fetch all matches and then discard ones that don't belong to
    the specified site. This means increased DB load and traffic between
    the DB and the webserver and could potentially cause quite a
    performance hit when the number of pages being dealt with climbs into
    the thousands.
    >
    The other solution is to, in addition to the parent node, also store
    the root node in the database. This would give an extra field to
    match against and make eliminating items with an SQL query, thus
    reducing load and traffic, quite simple. However the big drawback is
    that there are now two fields determining where an item fits into the
    tree structure instead of just one, making it an obvious source of
    potential error. It would also mean updating code related to item
    creation updating that wouldn't have to be touched with the other
    approach.
    >
    If any of you guys out there have another potential solution that I
    might have not thought of I'd like to hear what they are. Failing
    that, which of the options I do have would you recommend?
    Not from the PHP end. You will have to retrieve the data and throw away
    what you don't want.

    If you're looking for a database solution, you should try the
    appropriate database newsgroup.

    --
    =============== ===
    Remove the "x" from my email address
    Jerry Stuckle
    JDS Computer Training Corp.
    jstucklex@attgl obal.net
    =============== ===

    Comment

    • sheldonlg

      #3
      Re: Searching in tree structures

      Gordon wrote:
      I've developed a CMS that manages content across several sites. The
      content is represented in a tree structure maintained in a database,
      where each item has an id and a parent that points to the containing
      item or 0 if it's a root item (a site).
      >
      I developed a fulltext search script for the CMS that will return
      pages ordered by relevance.
      >
      This is fine, but I've now had a change request whereby searches can
      be restricted to specific sites. This is a problem because the only
      way to know what site an item belongs to is to fetch it, walk up it's
      tree and look at it's root node.
      >
      I can see 2 possible solutions, but both have drawbacks. The first is
      to just fetch all matches and then discard ones that don't belong to
      the specified site. This means increased DB load and traffic between
      the DB and the webserver and could potentially cause quite a
      performance hit when the number of pages being dealt with climbs into
      the thousands.
      >
      The other solution is to, in addition to the parent node, also store
      the root node in the database. This would give an extra field to
      match against and make eliminating items with an SQL query, thus
      reducing load and traffic, quite simple. However the big drawback is
      that there are now two fields determining where an item fits into the
      tree structure instead of just one, making it an obvious source of
      potential error. It would also mean updating code related to item
      creation updating that wouldn't have to be touched with the other
      approach.
      >
      If any of you guys out there have another potential solution that I
      might have not thought of I'd like to hear what they are. Failing
      that, which of the options I do have would you recommend?
      I opt for the second. It falls into the realm of code once, thoroughly
      test, forget about it. A severe performance hit, OTOH, is always present.

      Comment

      • =?ISO-8859-1?Q?=22=C1lvaro_G=2E_Vicario=22?=

        #4
        Re: Searching in tree structures

        Gordon escribió:
        I've developed a CMS that manages content across several sites. The
        content is represented in a tree structure maintained in a database,
        where each item has an id and a parent that points to the containing
        item or 0 if it's a root item (a site).
        [...]
        If any of you guys out there have another potential solution that I
        might have not thought of I'd like to hear what they are. Failing
        that, which of the options I do have would you recommend?
        There are many DBMS-specific solutions to handle hierarchical data. What
        database types does your CMS need to support?

        In any case, unless your tree is too big you can:

        1. Fetch the complete tree structure (1 DB query and not much data, just
        IDs)
        2. Find the IDs of the right nodes in PHP
        3. Fetch the desired info filtering by node: WHERE NODE_ID IN (....)

        A couple of links I've gathered:

        [MySQL+InnoDB] http://jan.kneschke.de/projects/mysql/sp
        [Oracle] http://philip.greenspun.com/sql/trees.html


        --
        -- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
        -- Mi sitio sobre programación web: http://bits.demogracia.com
        -- Mi web de humor al baño María: http://www.demogracia.com
        --

        Comment

        • Gordon

          #5
          Re: Searching in tree structures

          On May 20, 1:00 pm, sheldonlg <sheldonlgwrote :
          Gordon wrote:
          I've developed a CMS that manages content across several sites. The
          content is represented in a tree structure maintained in a database,
          where each item has an id and a parent that points to the containing
          item or 0 if it's a root item (a site).
          >
          I developed a fulltext search script for the CMS that will return
          pages ordered by relevance.
          >
          This is fine, but I've now had a change request whereby searches can
          be restricted to specific sites. This is a problem because the only
          way to know what site an item belongs to is to fetch it, walk up it's
          tree and look at it's root node.
          >
          I can see 2 possible solutions, but both have drawbacks. The first is
          to just fetch all matches and then discard ones that don't belong to
          the specified site. This means increased DB load and traffic between
          the DB and the webserver and could potentially cause quite a
          performance hit when the number of pages being dealt with climbs into
          the thousands.
          >
          The other solution is to, in addition to the parent node, also store
          the root node in the database. This would give an extra field to
          match against and make eliminating items with an SQL query, thus
          reducing load and traffic, quite simple. However the big drawback is
          that there are now two fields determining where an item fits into the
          tree structure instead of just one, making it an obvious source of
          potential error. It would also mean updating code related to item
          creation updating that wouldn't have to be touched with the other
          approach.
          >
          If any of you guys out there have another potential solution that I
          might have not thought of I'd like to hear what they are. Failing
          that, which of the options I do have would you recommend?
          >
          I opt for the second. It falls into the realm of code once, thoroughly
          test, forget about it. A severe performance hit, OTOH, is always present.
          That's a good point. I suppose if I'm sufficiantly careful then the 2
          pointers solution offers almost no performance hit as opposed to the
          other approach. I just get a bit paranoid about that kind of thing, I
          had it drilled into me during my education on databases you don't
          store redundant data because of the problems it can cause if it
          desyncs. In this case the tradeoff is probably worth it.

          Comment

          • Gordon

            #6
            Re: Searching in tree structures

            On May 20, 1:06 pm, "Álvaro G. Vicario"
            <alvaroNOSPAMTH A...@demogracia .comwrote:
            Gordon escribió:
            >
            I've developed a CMS that manages content across several sites. The
            content is represented in a tree structure maintained in a database,
            where each item has an id and a parent that points to the containing
            item or 0 if it's a root item (a site).
            [...]
            If any of you guys out there have another potential solution that I
            might have not thought of I'd like to hear what they are. Failing
            that, which of the options I do have would you recommend?
            >
            There are many DBMS-specific solutions to handle hierarchical data. What
            database types does your CMS need to support?
            >
            In any case, unless your tree is too big you can:
            >
            1. Fetch the complete tree structure (1 DB query and not much data, just
            IDs)
            2. Find the IDs of the right nodes in PHP
            3. Fetch the desired info filtering by node: WHERE NODE_ID IN (....)
            >
            A couple of links I've gathered:
            >
            [MySQL+InnoDB] http://jan.kneschke.de/projects/mysql/sp
            [Oracle]http://philip.greenspu n.com/sql/trees.html
            >
            --
            --http://alvaro.es- Álvaro G. Vicario - Burgos, Spain
            -- Mi sitio sobre programación web:http://bits.demogracia.com
            -- Mi web de humor al baño María:http://www.demogracia.com
            --
            Thanks for the pointers. The CMS I'm working on is using Postgres
            8.3.1 as the backend. Very good fulltext search once you've gotten
            used to its quirks. :)

            Comment

            • Jerry Stuckle

              #7
              Re: Searching in tree structures

              Gordon wrote:
              On May 20, 1:00 pm, sheldonlg <sheldonlgwrote :
              >Gordon wrote:
              >>I've developed a CMS that manages content across several sites. The
              >>content is represented in a tree structure maintained in a database,
              >>where each item has an id and a parent that points to the containing
              >>item or 0 if it's a root item (a site).
              >>I developed a fulltext search script for the CMS that will return
              >>pages ordered by relevance.
              >>This is fine, but I've now had a change request whereby searches can
              >>be restricted to specific sites. This is a problem because the only
              >>way to know what site an item belongs to is to fetch it, walk up it's
              >>tree and look at it's root node.
              >>I can see 2 possible solutions, but both have drawbacks. The first is
              >>to just fetch all matches and then discard ones that don't belong to
              >>the specified site. This means increased DB load and traffic between
              >>the DB and the webserver and could potentially cause quite a
              >>performance hit when the number of pages being dealt with climbs into
              >>the thousands.
              >>The other solution is to, in addition to the parent node, also store
              >>the root node in the database. This would give an extra field to
              >>match against and make eliminating items with an SQL query, thus
              >>reducing load and traffic, quite simple. However the big drawback is
              >>that there are now two fields determining where an item fits into the
              >>tree structure instead of just one, making it an obvious source of
              >>potential error. It would also mean updating code related to item
              >>creation updating that wouldn't have to be touched with the other
              >>approach.
              >>If any of you guys out there have another potential solution that I
              >>might have not thought of I'd like to hear what they are. Failing
              >>that, which of the options I do have would you recommend?
              >I opt for the second. It falls into the realm of code once, thoroughly
              >test, forget about it. A severe performance hit, OTOH, is always present.
              >
              That's a good point. I suppose if I'm sufficiantly careful then the 2
              pointers solution offers almost no performance hit as opposed to the
              other approach. I just get a bit paranoid about that kind of thing, I
              had it drilled into me during my education on databases you don't
              store redundant data because of the problems it can cause if it
              desyncs. In this case the tradeoff is probably worth it.
              Depending on the database, it might be possible with the structure you
              have now. But you need to be asking in a newsgroup for your database.

              --
              =============== ===
              Remove the "x" from my email address
              Jerry Stuckle
              JDS Computer Training Corp.
              jstucklex@attgl obal.net
              =============== ===

              Comment

              Working...