SQL for Modeling Generalization Hierarchies

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Robert Brown

    SQL for Modeling Generalization Hierarchies

    Is there a good approach to modelling many heterogeneous entity types
    with that have some attributes in common?

    Say I have entities "employees" which share some attibutes (e.g.
    firstname, lastname, dateofbirth) but some subsets of employees (e.g.
    physicians, janitors, nurses, ambulance drivers) may have additional
    attributes that do not apply to all employees. Physicians may have
    attributes specialty and date of board certification, ambulance
    drivers may have a drivers license id, janitors may have
    preferredbroomt ype and so on.

    There are many employee subtypes and more can be dynamically added
    after the application is deployed so it's obviously no good to keep
    adding attributes to the employees table because most attributes will
    be NULL (since janitors are never doctors at the same time).

    The only solution I found for this is a generalization hiearchy where
    you have the employee table with all generic attributes and then you
    add tables for each new employee subtype as necessary. The subtype
    tables share the primary key of the employee table. The employee table
    has a "discrimina tor" field that allows you to figure out which
    subtype table to load for a particular entity.

    This solution does not seem to scale since for each value of
    "discrimina tor" I need to perform a join with a different table. What
    if I need to retrieve 1,000 employees at once?

    Is that possible to obtain a single ResultSet with one SQL statement
    SQL?

    Or do you I need to iterate look at the discriminator and then
    perform the appropriate join? If this kind of iteration is necessary
    then obviously this generalization hierarchy approach does not work in
    practice
    since it would be painfully slow.

    Is there a better approach to modelling these kind of heterogeneous
    entities with shared attributes that does not involve creating a table
    for each new employee type or having sparce tables (mostly filled with
    NULLS)

    I guess another approach would be to use name/value pairs but that
    would make reporting really ugly.

    Seems like a very common problem. Any ideas? Is this a fundamental
    limitation of SQL?

    Thanks!

    - robert
  • Laconic2

    #2
    Re: SQL for Modeling Generalization Hierarchies


    "Robert Brown" <robertbrown197 1@yahoo.comwrot e in message
    news:240a4d09.0 405270856.4ce55 c7d@posting.goo gle.com...
    Is there a good approach to modelling many heterogeneous entity types
    with that have some attributes in common?
    This is a frequently asked question. Unfortunately, the frequently given
    responses don't usually settle the matter.

    There are many employee subtypes and more can be dynamically added
    after the application is deployed so it's obviously no good to keep
    adding attributes to the employees table because most attributes will
    be NULL (since janitors are never doctors at the same time).
    >
    The only solution I found for this is a generalization hiearchy where
    you have the employee table with all generic attributes and then you
    add tables for each new employee subtype as necessary. The subtype
    tables share the primary key of the employee table. The employee table
    has a "discrimina tor" field that allows you to figure out which
    subtype table to load for a particular entity.
    If you invent new subtypes on an adhoc basis, (e.g. a DBA is never either a
    janitor or a doctor)
    and you invent new tables for subtypes on an equally ad hoc basis, I'd
    suggest that you are altering your data model on an ad hoc basis.

    You can do that if you want, but don't expect the same power and simplicity
    that you get from a stable data model.


    >
    This solution does not seem to scale since for each value of
    "discrimina tor" I need to perform a join with a different table. What
    if I need to retrieve 1,000 employees at once?
    Why would you ever need the attributes of more than one subtype? If your
    query is about doctors, why would you need to join the table about janitors
    into the query?

    If your query is about employees, why would you need to know any of the
    subtype attributes?
    >
    Or do you I need to iterate look at the discriminator and then
    perform the appropriate join? If this kind of iteration is necessary
    then obviously this generalization hierarchy approach does not work in
    practice
    since it would be painfully slow.
    You need to ask two questions about the model, and about any imlpementation
    based on the model:
    first, is it correct and complete? Second, if yes to the first, is it fast
    enough?

    In general, different implementations of the same model should be
    transparent to users of the imlpementation who only look at the features
    visible in the model.

    >
    Is there a better approach to modelling these kind of heterogeneous
    entities with shared attributes that does not involve creating a table
    for each new employee type or having sparce tables (mostly filled with
    NULLS)
    If you have new entities, you are going to have new relations. That either
    means having new tables, or fudging the relationship between tables and
    rleations. Do the second one at your own peril.
    Seems like a very common problem. Any ideas? Is this a fundamental
    limitation of SQL?
    I would suggest it goes beyond SQl to the very heart of using foreign key/
    primary key aossciations to establish linkages.


    Comment

    • Laconic2

      #3
      Re: SQL for Modeling Generalization Hierarchies


      "Robert Brown" <robertbrown197 1@yahoo.comwrot e in message
      news:240a4d09.0 405270856.4ce55 c7d@posting.goo gle.com...
      adding attributes to the employees table because most attributes will
      be NULL (since janitors are never doctors at the same time).
      By the way, after I wrote "a DBA is never a janitor or a doctor" it
      occurred to me that when I was a DBA there were days when I felt like I was
      both at the same time! ;)



      Comment

      • Erland Sommarskog

        #4
        Re: SQL for Modeling Generalization Hierarchies

        Robert Brown (robertbrown197 1@yahoo.com) writes:
        The only solution I found for this is a generalization hiearchy where
        you have the employee table with all generic attributes and then you
        add tables for each new employee subtype as necessary. The subtype
        tables share the primary key of the employee table. The employee table
        has a "discrimina tor" field that allows you to figure out which
        subtype table to load for a particular entity.
        >
        This solution does not seem to scale since for each value of
        "discrimina tor" I need to perform a join with a different table. What
        if I need to retrieve 1,000 employees at once?
        So what? 1000 rows for a modern RDBMS is a breeze.
        Is that possible to obtain a single ResultSet with one SQL statement
        SQL?
        Yes, although with many discriminators, it will be one hell of a join:

        SELECT main.col1, main.col2, ... d1.col1, d1.col2, ...
        FROM main
        LEFT JOIN discriminator d1 ON d1.keycol = main.keycol
        LEFT JOIN discriminator d2 ON d2.keycol = main.keycol
        ...

        In practice, things might be even messier, because some values might
        apply to four discriminators, but be irrelevant to the rest. In
        they the would be four different columns. Although, this could be
        addressed with:

        common_to_four = colaesce(d3.com mon, d6.common, d9.common, d11.common)

        But it may be better to add this as a nullable column to the common
        table.

        Likewise, if two discrimiators are very similar, it may be better to
        lump them in the same table.
        I guess another approach would be to use name/value pairs but that
        would make reporting really ugly.
        And you get less control over your spelling errors. But sometimes this
        is the way to go.
        Seems like a very common problem. Any ideas? Is this a fundamental
        limitation of SQL?
        Sort of. SQL tables are squared, and object-oriented hierachies are
        jagged.

        But that's alright, just don't be too object-oriented. Be pragmatic too.

        --
        Erland Sommarskog, SQL Server MVP, sommar@algonet. se

        Books Online for SQL Server SP3 at
        SQL Server 2025 redefines what's possible for enterprise data. With developer-first features and integration with analytics and AI models, SQL Server 2025 accelerates AI innovation using the data you already have.

        Comment

        • Robert Brown

          #5
          Re: SQL for Modeling Generalization Hierarchies

          "Laconic2" <laconic2@comca st.netwrote in message news:<E8KdnXHBZ NMsuSvdRVn-jg@comcast.com> ...
          "Robert Brown" <robertbrown197 1@yahoo.comwrot e in message
          news:240a4d09.0 405270856.4ce55 c7d@posting.goo gle.com...
          Is there a good approach to modelling many heterogeneous entity types
          with that have some attributes in common?
          >
          This is a frequently asked question. Unfortunately, the frequently given
          responses don't usually settle the matter.
          Could you please post links to any relevant past threads if you
          remember where to find them?I tried to search the news group for
          "generaliza tion hierarchy" but did not find anything useful. Is there
          a better search term I can use?
          >
          There are many employee subtypes and more can be dynamically added
          after the application is deployed so it's obviously no good to keep
          adding attributes to the employees table because most attributes will
          be NULL (since janitors are never doctors at the same time).

          The only solution I found for this is a generalization hiearchy where
          you have the employee table with all generic attributes and then you
          add tables for each new employee subtype as necessary. The subtype
          tables share the primary key of the employee table. The employee table
          has a "discrimina tor" field that allows you to figure out which
          subtype table to load for a particular entity.
          >
          If you invent new subtypes on an adhoc basis, (e.g. a DBA is never either a
          janitor or a doctor)
          and you invent new tables for subtypes on an equally ad hoc basis, I'd
          suggest that you are altering your data model on an ad hoc basis.
          >
          You can do that if you want, but don't expect the same power and simplicity
          that you get from a stable data model.
          >
          >
          >

          This solution does not seem to scale since for each value of
          "discrimina tor" I need to perform a join with a different table. What
          if I need to retrieve 1,000 employees at once?
          >
          Why would you ever need the attributes of more than one subtype? If your
          query is about doctors, why would you need to join the table about janitors
          into the query?
          Because I need to display users and their heterogenious attributes in
          the UI as one list (e.g. in response to a search).
          If your query is about employees, why would you need to know any of the
          subtype attributes?
          Same reason, our customer's HR department does not see the employees
          as separate entities. To them an employee is an employee and they want
          to see all the associated data while browsing employees. I guess you
          could show the generic attributes on the first pass and then require
          the end user to click to drill down and see the subtype attributes -
          but that's not what they want.

          Or do you I need to iterate look at the discriminator and then
          perform the appropriate join? If this kind of iteration is necessary
          then obviously this generalization hierarchy approach does not work in
          practice
          since it would be painfully slow.
          >
          You need to ask two questions about the model, and about any imlpementation
          based on the model:
          first, is it correct and complete? Second, if yes to the first, is it fast
          enough?
          >
          In general, different implementations of the same model should be
          transparent to users of the imlpementation who only look at the features
          visible in the model.
          >
          >

          Is there a better approach to modelling these kind of heterogeneous
          entities with shared attributes that does not involve creating a table
          for each new employee type or having sparce tables (mostly filled with
          NULLS)
          >
          If you have new entities, you are going to have new relations. That either
          means having new tables, or fudging the relationship between tables and
          rleations. Do the second one at your own peril.
          >
          Seems like a very common problem. Any ideas? Is this a fundamental
          limitation of SQL?
          >
          I would suggest it goes beyond SQl to the very heart of using foreign key/
          primary key aossciations to establish linkages.

          Comment

          • Christopher Browne

            #6
            Re: SQL for Modeling Generalization Hierarchies

            In the last exciting episode, "Laconic2" <laconic2@comca st.netwrote:
            "Robert Brown" <robertbrown197 1@yahoo.comwrot e in message
            news:240a4d09.0 405270856.4ce55 c7d@posting.goo gle.com...
            >
            >adding attributes to the employees table because most attributes will
            >be NULL (since janitors are never doctors at the same time).
            >
            By the way, after I wrote "a DBA is never a janitor or a doctor" it
            occurred to me that when I was a DBA there were days when I felt
            like I was both at the same time! ;)
            Some days it also combines coroner and roto-rooter operator...

            A couple coworkers were in that state today :-(.
            --
            wm(X,Y):-write(X),write( '@'),write(Y). wm('cbbrowne',' acm.org').

            If a person with multiple personalities threatens suicide, is that
            considered a hostage situation?

            Comment

            • Payson

              #7
              Re: SQL for Modeling Generalization Hierarchies

              This is always messy. With dynamically added subtypes, I would
              consider keeping everything soft. Here is a first approximation for
              the tables:

              Employee Valid Subtypes Attributes/Subtype Valid Attributes
              Subtype<--------Subtype--------->Subtype ?Datatype
              Employee # ?Desc Attribute<-----------Attribute
              | ?Desc
              |
              Employee Attributes
              Employee #
              Attribute
              Attribute data

              You could then build a list of available attributes for an employee
              based on his subtype from the attributes/subtype table. The data
              entry screen(s) could be built from the Valid Attributes
              table(depending on how smart it needs to be).

              This is a bit rough, but I hope the idea comes through. Control over
              the Valid Subtypes table, the Attributes table, etc. would need to be
              defined (who and how), but, gosh, I would have to have to create a new
              table every time a new subtype was added.

              LOL.. here I go again, ask me what time it is and I tell you how to
              build a watch. Hope this helps more than it hurts.

              Payson


              robertbrown1971 @yahoo.com (Robert Brown) wrote in message news:<240a4d09. 0405270856.4ce5 5c7d@posting.go ogle.com>...
              Is there a good approach to modelling many heterogeneous entity types
              with that have some attributes in common?
              >
              Say I have entities "employees" which share some attibutes (e.g.
              firstname, lastname, dateofbirth) but some subsets of employees (e.g.
              physicians, janitors, nurses, ambulance drivers) may have additional
              attributes that do not apply to all employees. Physicians may have
              attributes specialty and date of board certification, ambulance
              drivers may have a drivers license id, janitors may have
              preferredbroomt ype and so on.
              >
              There are many employee subtypes and more can be dynamically added
              after the application is deployed so it's obviously no good to keep
              adding attributes to the employees table because most attributes will
              be NULL (since janitors are never doctors at the same time).
              >
              The only solution I found for this is a generalization hiearchy where
              you have the employee table with all generic attributes and then you
              add tables for each new employee subtype as necessary. The subtype
              tables share the primary key of the employee table. The employee table
              has a "discrimina tor" field that allows you to figure out which
              subtype table to load for a particular entity.
              >
              This solution does not seem to scale since for each value of
              "discrimina tor" I need to perform a join with a different table. What
              if I need to retrieve 1,000 employees at once?
              >
              Is that possible to obtain a single ResultSet with one SQL statement
              SQL?
              >
              Or do you I need to iterate look at the discriminator and then
              perform the appropriate join? If this kind of iteration is necessary
              then obviously this generalization hierarchy approach does not work in
              practice
              since it would be painfully slow.
              >
              Is there a better approach to modelling these kind of heterogeneous
              entities with shared attributes that does not involve creating a table
              for each new employee type or having sparce tables (mostly filled with
              NULLS)
              >
              I guess another approach would be to use name/value pairs but that
              would make reporting really ugly.
              >
              Seems like a very common problem. Any ideas? Is this a fundamental
              limitation of SQL?
              >
              Thanks!
              >
              - robert

              Comment

              • --CELKO--

                #8
                Re: SQL for Modeling Generalization Hierarchies

                >Is there a good approach to modelling many heterogeneous entity
                types with that have some attributes in common? <<

                Ignoring that the phrase "entity type" is OO **yuck!**

                The classic scenario calls for a root class with all the common
                attributes and then specialized sub-classes under it. As an example,
                let's take the class of Vehicles and find an industry standard
                identifier (VIN), and add two mutually exclusive sub-classes, Sport
                utility vehicles and sedans ('SUV', 'SED').

                CREATE TABLE Vehicles
                (vin CHAR(17) NOT NULL PRIMARY KEY,
                vehicle_type CHAR(3) NOT NULL
                CHECK(vehicle_t ype IN ('SUV', 'SED')),
                UNIQUE (vin, vehicle_type),
                ..);

                Notice the overlapping candidate keys. I then use a compound candidate
                key (vin, vehicle_type) and a constraint in each sub-class table to
                assure that the vehicle_type is locked and agrees with the Vehicles
                table. Add some DRI actions and you are done:

                CREATE TABLE SUV
                (vin CHAR(17) NOT NULL PRIMARY KEY,
                vehicle_type CHAR(3) DEFAULT 'SUV' NOT NULL
                CHECK(vehicle_t ype = 'SUV'),
                UNIQUE (vin, vehicle_type),
                FOREIGN KEY (vin, vehicle_type)
                REFERENCES Vehicles(vin, vehicle_type)
                ON UPDATE CASCADE
                ON DELETE CASCADE,
                ..);

                CREATE TABLE Sedans
                (vin CHAR(17) NOT NULL PRIMARY KEY,
                vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
                CHECK(vehicle_t ype = 'SED'),
                UNIQUE (vin, vehicle_type),
                FOREIGN KEY (vin, vehicle_type)
                REFERENCES Vehicles(vin, vehicle_type)
                ON UPDATE CASCADE
                ON DELETE CASCADE,
                ..);

                I can continue to build a hierarchy like this. For example, if I had
                a Sedans table that broke down into two-door and four-door sedans, I
                could a schema like this:

                CREATE TABLE Sedans
                (vin CHAR(17) NOT NULL PRIMARY KEY,
                vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
                CHECK(vehicle_t ype IN ('2DR', '4DR', 'SED')),
                UNIQUE (vin, vehicle_type),
                FOREIGN KEY (vin, vehicle_type)
                REFERENCES Vehicles(vin, vehicle_type)
                ON UPDATE CASCADE
                ON DELETE CASCADE,
                ..);

                CREATE TABLE TwoDoor
                (vin CHAR(17) NOT NULL PRIMARY KEY,
                vehicle_type CHAR(3) DEFAULT '2DR' NOT NULL
                CHECK(vehicle_t ype = '2DR'),
                UNIQUE (vin, vehicle_type),
                FOREIGN KEY (vin, vehicle_type)
                REFERENCES Sedans(vin, vehicle_type)
                ON UPDATE CASCADE
                ON DELETE CASCADE,
                ..);

                CREATE TABLE FourDoor
                (vin CHAR(17) NOT NULL PRIMARY KEY,
                vehicle_type CHAR(3) DEFAULT '4DR' NOT NULL
                CHECK(vehicle_t ype = '4DR'),
                UNIQUE (vin, vehicle_type),
                FOREIGN KEY (vin, vehicle_type)
                REFERENCES Sedans (vin, vehicle_type)
                ON UPDATE CASCADE
                ON DELETE CASCADE,
                ..);

                The idea is to build a chain of identifiers and types in a UNIQUE()
                constraint that go up the tree when you use a REFERENCES constraint.
                Obviously, you can do variants of this trick to get different class
                structures.

                If an entity doesn't have to be exclusively one subtype, you play with
                the root of the class hierarchy:

                CREATE TABLE Vehicles
                (vin CHAR(17) NOT NULL,
                vehicle_type CHAR(3) NOT NULL
                CHECK(vehicle_t ype IN ('SUV', 'SED')),
                PRIMARY KEY (vin, vehicle_type),
                ..);

                Now start hiding all this stuff in VIEWs immediately and add an
                INSTEAD OF trigger to those VIEWs.

                Here is the link on Amazon.com for my new book on "Trees & Hierarchies
                in SQL"


                Comment

                • Erland Sommarskog

                  #9
                  Re: SQL for Modeling Generalization Hierarchies

                  --CELKO-- (jcelko212@eart hlink.net) writes:
                  CREATE TABLE Vehicles
                  (vin CHAR(17) NOT NULL PRIMARY KEY,
                  vehicle_type CHAR(3) NOT NULL
                  CHECK(vehicle_t ype IN ('SUV', 'SED')),
                  UNIQUE (vin, vehicle_type),
                  ..);
                  >
                  Notice the overlapping candidate keys. I then use a compound candidate
                  key (vin, vehicle_type) and a constraint in each sub-class table to
                  assure that the vehicle_type is locked and agrees with the Vehicles
                  table. Add some DRI actions and you are done:
                  Ah! Interesting trick! I'll save a mental note about that.

                  --
                  Erland Sommarskog, SQL Server MVP, sommar@algonet. se

                  Books Online for SQL Server SP3 at
                  SQL Server 2025 redefines what's possible for enterprise data. With developer-first features and integration with analytics and AI models, SQL Server 2025 accelerates AI innovation using the data you already have.

                  Comment

                  • DaaaDaaa

                    #10
                    Re: SQL for Modeling Generalization Hierarchies

                    jcelko212@earth link.net (--CELKO--) wrote in message news:<18c7b3c2. 0405291007.6ac7 2984@posting.go ogle.com>...
                    Is there a good approach to modelling many heterogeneous entity
                    types with that have some attributes in common? <<
                    >
                    Ignoring that the phrase "entity type" is OO **yuck!**
                    >
                    The classic scenario calls for a root class with all the common
                    attributes and then specialized sub-classes under it. As an example,
                    let's take the class of Vehicles and find an industry standard
                    identifier (VIN), and add two mutually exclusive sub-classes, Sport
                    utility vehicles and sedans ('SUV', 'SED').
                    >
                    CREATE TABLE Vehicles
                    (vin CHAR(17) NOT NULL PRIMARY KEY,
                    vehicle_type CHAR(3) NOT NULL
                    CHECK(vehicle_t ype IN ('SUV', 'SED')),
                    UNIQUE (vin, vehicle_type),
                    ..);
                    >
                    Notice the overlapping candidate keys. I then use a compound candidate
                    key (vin, vehicle_type) and a constraint in each sub-class table to
                    assure that the vehicle_type is locked and agrees with the Vehicles
                    table. Add some DRI actions and you are done:
                    >
                    CREATE TABLE SUV
                    (vin CHAR(17) NOT NULL PRIMARY KEY,
                    vehicle_type CHAR(3) DEFAULT 'SUV' NOT NULL
                    CHECK(vehicle_t ype = 'SUV'),
                    UNIQUE (vin, vehicle_type),
                    FOREIGN KEY (vin, vehicle_type)
                    REFERENCES Vehicles(vin, vehicle_type)
                    ON UPDATE CASCADE
                    ON DELETE CASCADE,
                    ..);
                    >
                    CREATE TABLE Sedans
                    (vin CHAR(17) NOT NULL PRIMARY KEY,
                    vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
                    CHECK(vehicle_t ype = 'SED'),
                    UNIQUE (vin, vehicle_type),
                    FOREIGN KEY (vin, vehicle_type)
                    REFERENCES Vehicles(vin, vehicle_type)
                    ON UPDATE CASCADE
                    ON DELETE CASCADE,
                    ..);
                    >
                    I can continue to build a hierarchy like this. For example, if I had
                    a Sedans table that broke down into two-door and four-door sedans, I
                    could a schema like this:
                    >
                    CREATE TABLE Sedans
                    (vin CHAR(17) NOT NULL PRIMARY KEY,
                    vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
                    CHECK(vehicle_t ype IN ('2DR', '4DR', 'SED')),
                    UNIQUE (vin, vehicle_type),
                    FOREIGN KEY (vin, vehicle_type)
                    REFERENCES Vehicles(vin, vehicle_type)
                    ON UPDATE CASCADE
                    ON DELETE CASCADE,
                    ..);
                    >
                    CREATE TABLE TwoDoor
                    (vin CHAR(17) NOT NULL PRIMARY KEY,
                    vehicle_type CHAR(3) DEFAULT '2DR' NOT NULL
                    CHECK(vehicle_t ype = '2DR'),
                    UNIQUE (vin, vehicle_type),
                    FOREIGN KEY (vin, vehicle_type)
                    REFERENCES Sedans(vin, vehicle_type)
                    ON UPDATE CASCADE
                    ON DELETE CASCADE,
                    ..);
                    >
                    CREATE TABLE FourDoor
                    (vin CHAR(17) NOT NULL PRIMARY KEY,
                    vehicle_type CHAR(3) DEFAULT '4DR' NOT NULL
                    CHECK(vehicle_t ype = '4DR'),
                    UNIQUE (vin, vehicle_type),
                    FOREIGN KEY (vin, vehicle_type)
                    REFERENCES Sedans (vin, vehicle_type)
                    ON UPDATE CASCADE
                    ON DELETE CASCADE,
                    ..);
                    >
                    The idea is to build a chain of identifiers and types in a UNIQUE()
                    constraint that go up the tree when you use a REFERENCES constraint.
                    Obviously, you can do variants of this trick to get different class
                    structures.
                    >
                    If an entity doesn't have to be exclusively one subtype, you play with
                    the root of the class hierarchy:
                    >
                    CREATE TABLE Vehicles
                    (vin CHAR(17) NOT NULL,
                    vehicle_type CHAR(3) NOT NULL
                    CHECK(vehicle_t ype IN ('SUV', 'SED')),
                    PRIMARY KEY (vin, vehicle_type),
                    ..);
                    >
                    Now start hiding all this stuff in VIEWs immediately and add an
                    INSTEAD OF trigger to those VIEWs.
                    >
                    Here is the link on Amazon.com for my new book on "Trees & Hierarchies
                    in SQL"
                    >
                    http://www.amazon.com/exec/obidos/tg...roduct-details
                    Good stuff, for one thing I don't get it,
                    Why would one need NOT NULL in the following DDL,
                    CREATE TABLE Sedans
                    (vin CHAR(17) NOT NULL PRIMARY KEY,
                    vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
                    CHECK(vehicle_t ype IN ('2DR', '4DR', 'SED')),
                    UNIQUE (vin, vehicle_type),
                    FOREIGN KEY (vin, vehicle_type)
                    REFERENCES Vehicles(vin, vehicle_type)
                    ON UPDATE CASCADE
                    ON DELETE CASCADE,
                    ..);
                    since by definition PRIMARY KEY cannot be null, so, why would I tell
                    people, "I'm an American and an American citizen", instead of just
                    say, "I'm an American"?

                    Comment

                    • ctcgag@hotmail.com

                      #11
                      Re: SQL for Modeling Generalization Hierarchies

                      robertbrown1971 @yahoo.com (Robert Brown) wrote:
                      >
                      This solution does not seem to scale since for each value of
                      "discrimina tor" I need to perform a join with a different table. What
                      if I need to retrieve 1,000 employees at once?
                      Why would you ever need the attributes of more than one subtype? If
                      your query is about doctors, why would you need to join the table
                      about janitors into the query?
                      >
                      Because I need to display users and their heterogenious attributes in
                      the UI as one list (e.g. in response to a search).
                      Uh, that isn't really an answer.
                      If your query is about employees, why would you need to know any of the
                      subtype attributes?
                      >
                      Same reason, our customer's HR department does not see the employees
                      as separate entities. To them an employee is an employee and they want
                      to see all the associated data while browsing employees. I guess you
                      could show the generic attributes on the first pass and then require
                      the end user to click to drill down and see the subtype attributes -
                      but that's not what they want.
                      It sounds to me like the real solution is to fire your HR department and
                      replace them with people who just do their jobs, rather than browsing
                      around on the other employees information out of idle curiousity.

                      If the main goal is to provide custom interfaces, one per job-type, that
                      support job-specific functions--and satisfying nosy Nellies is only
                      secondary--then I think I would go with the main-table / multiple sub-table
                      set up. When people want to see data mixed aggregations of job-types for
                      no apparent reason, they can damn well wait the 5.2 seconds it will take to
                      assemble it.

                      If the main or only goal is to satisfy nosy Nellies, I would either go
                      with a auxilliary table of name-value pairs (and another auxilliary table
                      with listing allowed or suggested attributes for each job type, or just
                      do away with all those tables and make a "free text" clob column on the
                      employee table (or call it XML rather than free text, if that type of
                      things turns you on).



                      Xho

                      --
                      -------------------- http://NewsReader.Com/ --------------------
                      Usenet Newsgroup Service $9.95/Month 30GB

                      Comment

                      • Mikito Harakiri

                        #12
                        Re: SQL for Modeling Generalization Hierarchies

                        <ctcgag@hotmail .comwrote in message
                        news:2004060212 5733.647$6f@new sreader.com...
                        robertbrown1971 @yahoo.com (Robert Brown) wrote:
                        If the main or only goal is to satisfy nosy Nellies, I would either go
                        with a auxilliary table of name-value pairs (and another auxilliary table
                        with listing allowed or suggested attributes for each job type, or just
                        do away with all those tables and make a "free text" clob column on the
                        employee table (or call it XML rather than free text, if that type of
                        things turns you on).
                        Yeah, right, make it XML, and watch your performance going down the drain.

                        If you want "flexible" structure, use auxilliary name-value table. Convert
                        it into nested collection if you have object fever.





                        Comment

                        • Nick Landsberg

                          #13
                          Re: SQL for Modeling Generalization Hierarchies

                          ctcgag@hotmail. com wrote:


                          My 2 cents thrown in at the bottom:
                          robertbrown1971 @yahoo.com (Robert Brown) wrote:
                          >
                          >>>>This solution does not seem to scale since for each value of
                          >>>>"discrimina tor" I need to perform a join with a different table. What
                          >>>>if I need to retrieve 1,000 employees at once?
                          >>>
                          >>>Why would you ever need the attributes of more than one subtype? If
                          >>>your query is about doctors, why would you need to join the table
                          >>>about janitors into the query?
                          >>
                          >>Because I need to display users and their heterogenious attributes in
                          >>the UI as one list (e.g. in response to a search).
                          >
                          >
                          Uh, that isn't really an answer.
                          >
                          >
                          >>>If your query is about employees, why would you need to know any of the
                          >>>subtype attributes?
                          >>
                          >>Same reason, our customer's HR department does not see the employees
                          >>as separate entities. To them an employee is an employee and they want
                          >>to see all the associated data while browsing employees. I guess you
                          >>could show the generic attributes on the first pass and then require
                          >>the end user to click to drill down and see the subtype attributes -
                          >>but that's not what they want.
                          >
                          >
                          It sounds to me like the real solution is to fire your HR department and
                          replace them with people who just do their jobs, rather than browsing
                          around on the other employees information out of idle curiousity.
                          >
                          If the main goal is to provide custom interfaces, one per job-type, that
                          support job-specific functions--and satisfying nosy Nellies is only
                          secondary--then I think I would go with the main-table / multiple sub-table
                          set up. When people want to see data mixed aggregations of job-types for
                          no apparent reason, they can damn well wait the 5.2 seconds it will take to
                          assemble it.
                          >
                          If the main or only goal is to satisfy nosy Nellies, I would either go
                          with a auxilliary table of name-value pairs (and another auxilliary table
                          with listing allowed or suggested attributes for each job type, or just
                          do away with all those tables and make a "free text" clob column on the
                          employee table (or call it XML rather than free text, if that type of
                          things turns you on).
                          >
                          >
                          >
                          Xho
                          >
                          When I was dealing with large database systems, we tried
                          to make a distinction between "transactio n" and "report"
                          and tried to get the response time requirements ignored
                          for "reports".

                          Anything which returns 1,000 rows is no longer a "transactio n"
                          in my book. As Xho said, "... they can damn well wait ..."

                          If this were a print job, it would print on roughly 20 pages,
                          give or take. It would take whoever about 2 minutes
                          to even give it a cursory scan. About the same amount of time
                          to scroll down through a screenful of 1000 entries too,
                          probably more.

                          Questions I used to ask when confronted with this:
                          - What is the business need for this information?
                          - Can you usually wait for it overnight?
                          - In a pinch (when overnight isn't good enough), would
                          10 minutes be OK?
                          - If you really need it in 1-2 seconds, how much is your
                          hardware budget? (Because I'm going to need more/faster hardware
                          to satisfy this request in just a second or two.)

                          What actually used to worry me about having such "reports"
                          as on-line "transactio ns" was how they would impact the
                          response times for the other users who were doing
                          stuff which actually did require rapid response
                          (e.g. pull up an individual's medical history in an
                          emergency).

                          (I think that was actully 3 cents :)

                          NPL


                          --
                          "It is impossible to make anything foolproof
                          because fools are so ingenious"
                          - A. Bloch

                          Comment

                          • Joel Garry

                            #14
                            Re: SQL for Modeling Generalization Hierarchies

                            daaa@rock.com (DaaaDaaa) wrote in message news:<2193afb.0 406011916.5abb2 052@posting.goo gle.com>...
                            >
                            Good stuff, for one thing I don't get it,
                            Why would one need NOT NULL in the following DDL,
                            CREATE TABLE Sedans
                            (vin CHAR(17) NOT NULL PRIMARY KEY,
                            vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
                            CHECK(vehicle_t ype IN ('2DR', '4DR', 'SED')),
                            UNIQUE (vin, vehicle_type),
                            FOREIGN KEY (vin, vehicle_type)
                            REFERENCES Vehicles(vin, vehicle_type)
                            ON UPDATE CASCADE
                            ON DELETE CASCADE,
                            ..);
                            Methinks he means theoretically the primary key could be separate from
                            the vin definition. Just because Oracle disallows it doesn't mean it
                            is redundant. I don't know about ANSI or other db's offhand.
                            since by definition PRIMARY KEY cannot be null, so, why would I tell
                            people, "I'm an American and an American citizen", instead of just
                            say, "I'm an American"?
                            'Cause you are not these guys:



                            Obviously, the attributes of American are cultural, and of a citizen
                            are legal. A citizen could be raised abroad and not consider
                            themselves American, and it is possible to have null citizenship.



                            jg
                            --
                            @home.com is bogus.
                            Happy Birthday, Jerry Mathers.

                            Comment

                            • Job Miller

                              #15
                              Re: SQL for Modeling Generalization Hierarchies

                              Taking out of context slightly:
                              Tom Kyte's viewpoint on this kind of thing I think is:

                              "sounds like a single table design to me. I do not factor out 1:1
                              optional
                              relationships unless a table is "really really wide" -- really wide
                              being into 3 digits.

                              put the most likely to be null columns at the end of the create table
                              and when
                              they are NULL, they will consume NO space.

                              benchmark it. you would always have to join to pick up this optional
                              information (2 or 3 LIO's at least per row retrieved for each optional
                              set of
                              data) vs an extra 50 bytes of flags saying "this is null". I would go
                              for the
                              extra 50 bytes in a row that will be accessed via an index rather then
                              incurring 2/3 LIO's to read an index to access another table. "

                              The question was asking about subtype modeling, but the example the
                              questioner posed wasn't really subtype. The info they were talking
                              about was actually M:1 relationships most likely, but Tom's comments
                              apply to the subtype situation you are discussing.

                              Your model and application are much more simple with this single table
                              model.

                              Read the link for yourself to see the full thread:


                              Comment

                              Working...