Concurrency issues with "Tree" structures.

**Thomas R. Hummel** · Jul 23 '05, 08:18 AM

Re: Concurrency issues with "Tree&quot ; structures.

I believe that Joe Celko has a book that is devoted to tree structures
in a SQl environment. Do a search on Amazon for Celko and you should
see it. I'm sure that Joe will probably chime in here as well. My
thoughts on the subject though...
[color=blue]
> The path to each node is stored in the column "Path" and is of the[/color]
form[color=blue]
> "\000001\000002 \000003\"[/color]
Ack, yuck, ick! I hate that method of storing tree information myself.
Check Joe's book for several different ways to store trees in a RDBMS.
[color=blue]
> also have created the relation "ID" <-> "ID_Parent, effectively the[/color]
table is[color=blue]
> related to itself. I did this so that an attempt to remove a parent[/color]
when[color=blue]
> that parent still has children will fail due to referential[/color]
integrity.[color=blue]
> Unfortunately, in order to delete subtrees, I have to first set all[/color]
of the[color=blue]
> nodes in the subtree to point to the "NULL" parent so that I don't[/color]
get[color=blue]
> caught with integrity errors during deletes.[/color]
So, what you seem to be saying is that you want RI so that you can't
delete any rows by mistake, but you want to be able to easily delete
rows. You can't have it both ways... you either need to explicitly
delete the children (or set the FK's to NULL) or you need to remove the
RI.
[color=blue]
> Changes to any given descendant can trigger changes which are[/color]
propagated to it's[color=blue]
> ancestors[/color]
This sounds like a design problem with regards to normalization to me,
but without knowing the specifics I really can't say.

As for the client application tracking changes to the tree, that is
really dependent on the business requirements for your application. You
can reload the client every X seconds or you could wait for the client
to take an action then check to see if the action is still valid on the
tree in its current form. To limit the number of client refreshes that
you have to perform, each node could use a last_updated datetime column
to track changes and you could retrieve only those nodes and their
subtrees where the last_updated column has a value that is greater than
the last time that you refreshed your tree.

My suggestion would be to check out Joe's book (or Google past posts by
Joe) to find alternative tree representations . Some of them are rather
ingenious and easy to work with.

HTH,
-Tom.

**Erland Sommarskog** · Jul 23 '05, 08:19 AM

Re: Concurrency issues with "Tree&quot ; structures.

Robin Tucker (idontwanttobes pammedanymore@r eallyidont.com) writes:[color=blue]
> Now, the problem comes when I consider concurrency with respect to this
> scheme. It seems to me, that locking the record I am updating is not
> sufficient to ensure clients are kept synchronised or the integrity of
> the tree structure is correct. I think I need to lock all ancestors of
> the tree (HOLDLOCK) before performing any operation on a given node. Is
> this reasonable?[/color]

UPDLOCK would be better. Else you could run into conversion deadlocks.
An UPDLOCK is a shared lock, so other processes can read. But only one
can have an UPDLOCK.
[color=blue]
> Also, consider the "delete" problem given above. I
> really should HOLDLOCK on the entire subtree of any node I wish to
> delete as I am going to set the entire subtrees parent values to NULL.
> I don't want another client to perform a read on part of the subtree
> while the nodes are "parentless " pending deletion.[/color]

I'm not really sure why you need to do this set NULL thing. In fact
that is something I would avoid like the plague. But then I know
very little of your actual business problem.

I think Joe Celko's trick for tress is to number each node in a way so
that a subtree is a contiguous range. Then you can blow away to whole
subtree in one delete.

Then again, you already had that path. Would not:

DELETE tbl WHERE path = 'a/b/c' or path LIKE 'a/b/c/%'

work?

Of course, even with set-based statements, you can have interesting
effects if two clients are it at the same time.
[color=blue]
> Secondly, I am not sure how to handle synchronization of the tree for
> each client. How does each client know when a change has been made to
> the tree?[/color]

You will have to ask you tech lead about that. :-) Seriously, with
no knowledge of the requirements etc, it is very difficult to answer.
As Thomas said, you can refresh automatically with some frequency.
For more bells and whistle you could push the change by activating
some signaling mechanism from a trigger. But you make have to ask for
a bigger budget to do this.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

Books Online for SQL Server SP3 at

Microsoft SQL Server

http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp

Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

**--CELKO--** · Jul 23 '05, 08:19 AM

Re: Concurrency issues with "Tree&quot ; structures.

>> I would be interested to hear any insights people have on these
issues, .. <<

Get a copy of TREES & HIERARCHIES IN SQL and you can see several
different approaches.
[color=blue][color=green]
>> in order to delete subtrees, I have to first set all of the nodes in[/color][/color]
the subtree to point to the "NULL" parent so that I don't get caught
with integrity errors during deletes. <<

Look up the nested sets model; this one atomic DELETE FROM statement.

**Robin Tucker** · Jul 23 '05, 08:19 AM

Re: Concurrency issues with "Tree&quot ; structures.

Hi Tom,
[color=blue]
> Ack, yuck, ick! I hate that method of storing tree information myself.
> Check Joe's book for several different ways to store trees in a RDBMS.
>[/color]

Ok, I don't want to get into a religious war about the best representation
of trees. I am aware of Celkos nested sets model but prefer the clarify of
the path string approach (my system isn't going to be operating with 1,000
users or anything, so I don't mind if it's performance isn't so good); of
course when I refer to clarity, I mean my own personal ability to visualise
whats going on and write code to perform the various manipulations required!
However, I will look once again at Celkos methods because I rejected them a
year ago for various reasons (the potential renumbering of large numbers of
nodes during inserts for example) and am now a bit wiser. I also tried out
nested sets with Tropashenkos' (apologies if I misspelled the name)
materialised path (binary fractions) and found this to be unbelievably slow
and restrictive (for various reasons mainly based on numeric accuracy).
[color=blue]
> So, what you seem to be saying is that you want RI so that you can't
> delete any rows by mistake, but you want to be able to easily delete
> rows. You can't have it both ways... you either need to explicitly
> delete the children (or set the FK's to NULL) or you need to remove the
> RI.[/color]

You are right of course, I need to throw away the RI on this, then deletion
of subtrees will be atomic and I will not have to set parents to NULL.
[color=blue]
>[color=green]
>> Changes to any given descendant can trigger changes which are[/color]
> propagated to it's[color=green]
>> ancestors[/color]
> This sounds like a design problem with regards to normalization to me,
> but without knowing the specifics I really can't say.
>[/color]

I will solve this by locking the ancestors (as another person says in his
post, using UPDLOCK rather than HOLDLOCK) and then processing them in any
transaction that changes state. Thinking about it though, it will still be
possible for a client to read a node that has not yet been processed while
another client is currently processing. So that first client may well
receive some processed and some unprocessed nodes back after a read
operation (is this true though? If the processing occurs within a
transaction, can other users read the uncommitted data?). The reason for
this is because the algorithm for state propagation cannot be atomic (as
fetching "ancestors" of any given node requires iteration).

However, I suppose it is possible for the client to detect problems like
this, by checking ancestors itself for consistency and forcing a refresh if
it detects an inconsistency when it reads back parts of the tree. Extra
work I think but bearing in mind the above (Celko nested sets), I suppose it
may be possible to UPDATE all ancestors atomically. If this is indeed the
case, then I might not need an iterative approach. But I doubt this is
possible, as I would have to ensure updates on the set of ancestors were
performed in order of their "depth" (leaves first of course). I don't think
SQL has such a mechanism (tables are sets after all, not sequences). I
would love to know how, using a path approach it is possible to fetch all
ancestors of a given node. I don't think it can be done with just a path
string and ID_Parent relation.

The specifics of this is each node has a "condition" (green, yellow, red,
undefined) according to the condition of it's children. For nodes of type
0, it's condition is the highest condition of each of it's children, for
nodes of type 1, it's condition is the condition of it's "most recently
added child". For nodes of type 2, the condition is fixed by the client
when the record is created. Condition propagation basically is tasked with
ensuring these conditions are consistent after deletes, updates or inserts.
[color=blue]
> As for the client application tracking changes to the tree, that is
> really dependent on the business requirements for your application. You
> can reload the client every X seconds or you could wait for the client
> to take an action then check to see if the action is still valid on the
> tree in its current form. To limit the number of client refreshes that
> you have to perform, each node could use a last_updated datetime column
> to track changes and you could retrieve only those nodes and their
> subtrees where the last_updated column has a value that is greater than
> the last time that you refreshed your tree.
>[/color]

I have another idea about this (I will timestamp the nodes). I have a
worker thread that serializes access to the database from the various client
controls, firing callbacks when it performs an operation. This thread can
check through the tree in it's idle time to ensure it is synchronized. Of
course, if I execute an update and fail (ROWCOUNT = 0) or a delete come to
think of it, I can force a refresh of the entire subtree and it's ancestors.
But I would consider the entire subtree and all it's ancestors to be suspect
if I detected any given node had a later timestamp than that expected.
[color=blue]
> My suggestion would be to check out Joe's book (or Google past posts by
> Joe) to find alternative tree representations . Some of them are rather
> ingenious and easy to work with.
>
> HTH,
> -Tom.
>[/color]

Thanks for all of your ideas.

Robin

Concurrency issues with "Tree" structures.

Concurrency issues with "Tree" structures.

Comment

Comment

Comment

Comment