Hi,
I'm currently implementing a database with a tree structure in a table. The
nodes in the tree are stored as records with a column called "Parent". The
root of the tree has a "NULL" parent. The path to each node is stored in
the column "Path" and is of the form "\000001\000002 \000003\" etc. The
latter enabling me to fetch subtrees using the "LIKE" predicate. I also
have created the relation "ID" <-> "ID_Parent, effectively the table is
related to itself. I did this so that an attempt to remove a parent when
that parent still has children will fail due to referential integrity.
Unfortunately, in order to delete subtrees, I have to first set all of the
nodes in the subtree to point to the "NULL" parent so that I don't get
caught with integrity errors during deletes.
Unlike a typical linear system, any given node record is related to more
than one of the other records. What I mean is it is possible to follow a
chain from any given node back to the root (ancestors) or collect a series
of branches and leaves (descendants) and so it is not reasonable to consider
any given node in isolation. Changes to any given descendant can trigger
changes which are propagated to it's ancestors. I am not using recursion to
do this, rather, I am using an iterative approach to select and update the
parent, then the parents parent, etc. according to it's new state (given by
the state of it's immediate descendants), right back up to the root.
Now, the problem comes when I consider concurrency with respect to this
scheme. It seems to me, that locking the record I am updating is not
sufficient to ensure clients are kept synchronised or the integrity of the
tree structure is correct. I think I need to lock all ancestors of the tree
(HOLDLOCK) before performing any operation on a given node. Is this
reasonable? Also, consider the "delete" problem given above. I really
should HOLDLOCK on the entire subtree of any node I wish to delete as I am
going to set the entire subtrees parent values to NULL. I don't want
another client to perform a read on part of the subtree while the nodes are
"parentless " pending deletion.
Secondly, I am not sure how to handle synchronization of the tree for each
client. How does each client know when a change has been made to the tree?
Consider a client holding a copy of the tree in memory. Another client
deletes a subtree. The first client attempts to update one of the deleted
subtree nodes and fails because the node no longer exists. At this point,
in the eyes of the first client, all of the ancestors and all of the
descendants of the node in question must become suspect. Should the
software now attempt to re-build this part of the tree? It seems that any
operation on any of the nodes in the tree will potentially make a lot of
other nodes suspect and so my application will be doing a lot of updating.
I would be interested to hear any insights people have on these issues,
particularly those implementing a "file system" structure in their database
or similar. How do you deal with these concurrency issues when manipulating
trees?
Thanks
Robin
I'm currently implementing a database with a tree structure in a table. The
nodes in the tree are stored as records with a column called "Parent". The
root of the tree has a "NULL" parent. The path to each node is stored in
the column "Path" and is of the form "\000001\000002 \000003\" etc. The
latter enabling me to fetch subtrees using the "LIKE" predicate. I also
have created the relation "ID" <-> "ID_Parent, effectively the table is
related to itself. I did this so that an attempt to remove a parent when
that parent still has children will fail due to referential integrity.
Unfortunately, in order to delete subtrees, I have to first set all of the
nodes in the subtree to point to the "NULL" parent so that I don't get
caught with integrity errors during deletes.
Unlike a typical linear system, any given node record is related to more
than one of the other records. What I mean is it is possible to follow a
chain from any given node back to the root (ancestors) or collect a series
of branches and leaves (descendants) and so it is not reasonable to consider
any given node in isolation. Changes to any given descendant can trigger
changes which are propagated to it's ancestors. I am not using recursion to
do this, rather, I am using an iterative approach to select and update the
parent, then the parents parent, etc. according to it's new state (given by
the state of it's immediate descendants), right back up to the root.
Now, the problem comes when I consider concurrency with respect to this
scheme. It seems to me, that locking the record I am updating is not
sufficient to ensure clients are kept synchronised or the integrity of the
tree structure is correct. I think I need to lock all ancestors of the tree
(HOLDLOCK) before performing any operation on a given node. Is this
reasonable? Also, consider the "delete" problem given above. I really
should HOLDLOCK on the entire subtree of any node I wish to delete as I am
going to set the entire subtrees parent values to NULL. I don't want
another client to perform a read on part of the subtree while the nodes are
"parentless " pending deletion.
Secondly, I am not sure how to handle synchronization of the tree for each
client. How does each client know when a change has been made to the tree?
Consider a client holding a copy of the tree in memory. Another client
deletes a subtree. The first client attempts to update one of the deleted
subtree nodes and fails because the node no longer exists. At this point,
in the eyes of the first client, all of the ancestors and all of the
descendants of the node in question must become suspect. Should the
software now attempt to re-build this part of the tree? It seems that any
operation on any of the nodes in the tree will potentially make a lot of
other nodes suspect and so my application will be doing a lot of updating.
I would be interested to hear any insights people have on these issues,
particularly those implementing a "file system" structure in their database
or similar. How do you deal with these concurrency issues when manipulating
trees?
Thanks
Robin
Comment