Database design for large numbers of attributes

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • John

    Database design for large numbers of attributes

    I'm developing an application for medical use that will be used to
    capture patient background and visit data. The application will have
    approximately 50 forms, with an average of about 20 fields each-- so a
    total of 1000+ fields. Almost always, the fields are unique to a
    particular form--and it will be pretty unlikely that new forms and
    fields will be added--although not out of the realm of possibility.
    There will also be a search feature that will allow physicians to
    query for certain forms based on as many attributes as they want to
    use. Another relevant piece of info is that each form will have a
    status, and that it will be necessary to aggregate all of the form
    statuses for a patient to determine a patient status. The target
    database is ORacle 9i (9.2.0.2). Total number of patients will
    probably be around 10,000-- each will probably have an average of 50
    forms (including multipe visit forms)-- so about 500,000 forms.
    I'm trying to decide how to design the database:
    SHould I
    1. use a "survey" type design, where every attribute will be a row in
    a response table?
    Pros:
    -most normalized
    -most flexible and easy to make "schema" changes
    Cons
    -Large number of rows in response table
    -cumbersome to work with
    -reduced performance
    2. create a separate table for each form w/ its unique attributes, and
    a master table that has the common attributes and the primary keys
    Pros
    -Easier to work with
    -updates and selects will have better performance
    Cons
    -Not normalized?
    -The "ad-hoc" search queries and reports may be difficult to write and
    maintain (possibly solvable w/ a large view?)
    3. Store all the data for each form as XML in an XMLType column (non
    schema based)
    Pros:
    -Easy to work with
    -updates and selects will have good performance (since i'm always
    updating or selecting an entire form)-- may actually have the best
    performance of all 4 options for updating and retrieving form data
    -normalized
    Cons
    -XML bloats storagce space requirements (but not really that big a
    deal)
    -Will make ad-hoc search and reporting queries perform(and probably
    use lots of RAM) more slowly since XML will have to be parsed in order
    for Xpath queries to run
    4. USe oracle 9i's schema based XML feature
    Pros
    -Faster Xpath queries than option 3
    COns
    -If shcema changes, all generated objects have to be dropped and
    re-created-- seems like a big pain
    -If I was going to do tihs, why not just go with option 2.


    I'm most tempted by options 2 and 3-- and would really prefer 3 if
    there was a way to make sure that the ad-hoc queries would perform
    decently.
  • Dave

    #2
    Re: Database design for large numbers of attributes

    rjg96@yahoo.com (John) wrote in message news:<63394c1f. 0402152348.7f8d b6b2@posting.go ogle.com>...[color=blue]
    > I'm developing an application for medical use that will be used to
    > capture patient background and visit data. The application will have
    > approximately 50 forms, with an average of about 20 fields each-- so a
    > total of 1000+ fields. Almost always, the fields are unique to a
    > particular form--and it will be pretty unlikely that new forms and
    > fields will be added--although not out of the realm of possibility.
    > There will also be a search feature that will allow physicians to
    > query for certain forms based on as many attributes as they want to
    > use. Another relevant piece of info is that each form will have a
    > status, and that it will be necessary to aggregate all of the form
    > statuses for a patient to determine a patient status. The target
    > database is ORacle 9i (9.2.0.2). Total number of patients will
    > probably be around 10,000-- each will probably have an average of 50
    > forms (including multipe visit forms)-- so about 500,000 forms.
    > I'm trying to decide how to design the database:
    > SHould I
    > 1. use a "survey" type design, where every attribute will be a row in
    > a response table?
    > Pros:
    > -most normalized
    > -most flexible and easy to make "schema" changes
    > Cons
    > -Large number of rows in response table
    > -cumbersome to work with
    > -reduced performance
    > 2. create a separate table for each form w/ its unique attributes, and
    > a master table that has the common attributes and the primary keys
    > Pros
    > -Easier to work with
    > -updates and selects will have better performance
    > Cons
    > -Not normalized?
    > -The "ad-hoc" search queries and reports may be difficult to write and
    > maintain (possibly solvable w/ a large view?)
    > 3. Store all the data for each form as XML in an XMLType column (non
    > schema based)
    > Pros:
    > -Easy to work with
    > -updates and selects will have good performance (since i'm always
    > updating or selecting an entire form)-- may actually have the best
    > performance of all 4 options for updating and retrieving form data
    > -normalized
    > Cons
    > -XML bloats storagce space requirements (but not really that big a
    > deal)
    > -Will make ad-hoc search and reporting queries perform(and probably
    > use lots of RAM) more slowly since XML will have to be parsed in order
    > for Xpath queries to run
    > 4. USe oracle 9i's schema based XML feature
    > Pros
    > -Faster Xpath queries than option 3
    > COns
    > -If shcema changes, all generated objects have to be dropped and
    > re-created-- seems like a big pain
    > -If I was going to do tihs, why not just go with option 2.
    >
    >
    > I'm most tempted by options 2 and 3-- and would really prefer 3 if
    > there was a way to make sure that the ad-hoc queries would perform
    > decently.[/color]

    John -

    The right answer really depends on your specific needs. Here are some
    things to think about....

    Using the row based approach really hinders your ability to query
    effectively both from a performance standpoint and complexity of the
    query syntax. Using a column based approach really makes querying
    easier...for instance, "select * from tab1 where col1 > 2 and col2 <=
    col3+col4 or col5 is not null", etc... try doing this with a row based
    approach.

    In the row-based approach, you also lose type checking since you have
    to store everything as a string and your model becomes less
    self-describing.

    With a column-based approach, you lose some of the flexibility you
    mention of adding and removing attributes, but it doesn't sound like
    you need it. You could separate volatile vs. fixed attributes in
    separate tables if you find the need to support adding/removing
    attributes. Then you wouldn't have to rebuild everything when things
    change.

    Nothing says you need to go with one approach only. Perhaps you can
    separate columns that are likely to be queried from ones that will
    not be queried and go with both a column and row-based approach. Or
    put all string based attributes in a row-based table if that would
    help.

    If many of the attributes are simple yes/no answers, perhaps you could
    look into encoding them as a bit string within one column. For
    instance, Q1=Yes, Q2=No, Q3=Yes, could be represented as 101 in one
    table column and could be queried using the BITAND function. If you
    have 100 Yes/No answers, you've just compressed 100 columns into 1,
    but you need to make sure the BITAND function will support your
    queries and perform well.

    I'm not that familiar with the XML functionality in Oracle yet, but I
    suspect if you have aggregate query requirements, then relational will
    perform better. But again, you may find a hybrid approach is best.
    Perhaps some of your attributes are best stored as XML.

    These aren't answers, but I hope it gives you something to think
    about.

    Good luck,
    Dave

    Comment

    Working...