Is there a good approach to modelling many heterogeneous entity types
with that have some attributes in common?
Say I have entities "employees" which share some attibutes (e.g.
firstname, lastname, dateofbirth) but some subsets of employees (e.g.
physicians, janitors, nurses, ambulance drivers) may have additional
attributes that do not apply to all employees. Physicians may have
attributes specialty and date of board certification, ambulance
drivers may have a drivers license id, janitors may have
preferredbroomt ype and so on.
There are many employee subtypes and more can be dynamically added
after the application is deployed so it's obviously no good to keep
adding attributes to the employees table because most attributes will
be NULL (since janitors are never doctors at the same time).
The only solution I found for this is a generalization hiearchy where
you have the employee table with all generic attributes and then you
add tables for each new employee subtype as necessary. The subtype
tables share the primary key of the employee table. The employee table
has a "discrimina tor" field that allows you to figure out which
subtype table to load for a particular entity.
This solution does not seem to scale since for each value of
"discrimina tor" I need to perform a join with a different table. What
if I need to retrieve 1,000 employees at once?
Is that possible to obtain a single ResultSet with one SQL statement
SQL?
Or do you I need to iterate look at the discriminator and then
perform the appropriate join? If this kind of iteration is necessary
then obviously this generalization hierarchy approach does not work in
practice
since it would be painfully slow.
Is there a better approach to modelling these kind of heterogeneous
entities with shared attributes that does not involve creating a table
for each new employee type or having sparce tables (mostly filled with
NULLS)
I guess another approach would be to use name/value pairs but that
would make reporting really ugly.
Seems like a very common problem. Any ideas? Is this a fundamental
limitation of SQL?
Thanks!
- robert
with that have some attributes in common?
Say I have entities "employees" which share some attibutes (e.g.
firstname, lastname, dateofbirth) but some subsets of employees (e.g.
physicians, janitors, nurses, ambulance drivers) may have additional
attributes that do not apply to all employees. Physicians may have
attributes specialty and date of board certification, ambulance
drivers may have a drivers license id, janitors may have
preferredbroomt ype and so on.
There are many employee subtypes and more can be dynamically added
after the application is deployed so it's obviously no good to keep
adding attributes to the employees table because most attributes will
be NULL (since janitors are never doctors at the same time).
The only solution I found for this is a generalization hiearchy where
you have the employee table with all generic attributes and then you
add tables for each new employee subtype as necessary. The subtype
tables share the primary key of the employee table. The employee table
has a "discrimina tor" field that allows you to figure out which
subtype table to load for a particular entity.
This solution does not seem to scale since for each value of
"discrimina tor" I need to perform a join with a different table. What
if I need to retrieve 1,000 employees at once?
Is that possible to obtain a single ResultSet with one SQL statement
SQL?
Or do you I need to iterate look at the discriminator and then
perform the appropriate join? If this kind of iteration is necessary
then obviously this generalization hierarchy approach does not work in
practice
since it would be painfully slow.
Is there a better approach to modelling these kind of heterogeneous
entities with shared attributes that does not involve creating a table
for each new employee type or having sparce tables (mostly filled with
NULLS)
I guess another approach would be to use name/value pairs but that
would make reporting really ugly.
Seems like a very common problem. Any ideas? Is this a fundamental
limitation of SQL?
Thanks!
- robert
Comment