Dynamic data elements for a data collection application

  • mittal.pradeep@gmail.com

    Dynamic data elements for a data collection application

    Which is the better table design for a data collection application?
    1. Vertical model (pk, attributeName, AttributeValue)
    2. Custom columns (pk, custom1, custom2, custom3...custom50)

    Since the data elements collected may change year over year, which
    model better accommodates this column dynamism?

  • Erland Sommarskog

    #2
    Re: Dynamic data elements for a data collection application

    (mittal.pradeep@gmail.com) writes:
    > Which is the better table design for a data collection application?
    > 1. Vertical model (pk, attributeName, AttributeValue)
    > 2. Custom columns (pk, custom1, custom2, custom3...custom50)
    >
    > Since the data elements collected may change year over year, which
    > model better accommodates this column dynamism?

    The vertical model is certainly cleaner from a relational perspective.
    It also requires less maintenance.

    But admittedly queries can be more complex. If attributes can be of
    different data types, you need some triggers to check this. A tip
    is that the sql_variant data type is good in this case.
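
A minimal sketch of the vertical model described above (SQLite through Python for a self-contained example; the `survey_data` table and attribute names are made up, and SQLite's dynamic typing stands in for SQL Server's sql_variant):

```python
import sqlite3

# Hypothetical "vertical" model: one row per (pk, attribute).
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE survey_data (
        pk            INTEGER NOT NULL,
        attributeName TEXT    NOT NULL,
        attributeValue,                -- dynamically typed, like sql_variant
        PRIMARY KEY (pk, attributeName)
    )""")

# A new data element next year is just a new attributeName -- no ALTER TABLE.
rows = [(1, "Age", 42), (1, "City", "Oslo"), (2, "Age", 35)]
con.executemany("INSERT INTO survey_data VALUES (?, ?, ?)", rows)

age = con.execute(
    "SELECT attributeValue FROM survey_data "
    "WHERE pk = 1 AND attributeName = 'Age'").fetchone()[0]
print(age)  # -> 42
```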


    --
    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Books Online for SQL Server SP3 at
    http://www.microsoft.com/sql/techinf...2000/books.asp



    • mittal.pradeep@gmail.com

      #3
      Re: Dynamic data elements for a data collection application


      1. But the data collection and reporting are in horizontal format. If
      collected data is edited vertically, won't there be extra steps of
      converting horizontally obtained data to vertical, and then vertical back
      to horizontal for reports? In the custom-column model, data always remains
      horizontal. Won't performance be an issue in taking care of the two
      extra steps required in the vertical model?
      2. Won't concurrency be an issue, considering the fact that a
      logical single horizontal row is edited as, say, 10 rows? Two people
      might be changing the same primary key's different attributes at the same
      time.




      • Erland Sommarskog

        #4
        Re: Dynamic data elements for a data collection application

        (mittal.pradeep@gmail.com) writes:
        > 1. But the data collection and reporting are in horizontal format. If
        > collected data is edited vertically, won't there be extra steps of
        > converting horizontally obtained data to vertical, and then vertical
        > back to horizontal for reports? In the custom-column model, data always
        > remains horizontal. Won't performance be an issue in taking care of the
        > two extra steps required in the vertical model?

        If you are to present ten of those custom values as columns in a report,
        you get a 10-way self-join. Certainly bulky in code. Performance is
        probably not top-notch, but I don't see that it would be abysmal.
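
One common alternative to the N-way self-join for this pivot (not discussed in the thread) is conditional aggregation; a minimal sketch, again with SQLite through Python and made-up attribute names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE survey_data (pk INTEGER, attributeName TEXT, attributeValue)")
con.executemany("INSERT INTO survey_data VALUES (?, ?, ?)",
                [(1, "Age", 42), (1, "City", "Oslo"),
                 (2, "Age", 35), (2, "City", "Bergen")])

# Pivot vertical rows into one horizontal row per pk. Each MAX(CASE ...)
# replaces one arm of the N-way self-join: MAX ignores the NULLs produced
# for the non-matching attribute names.
rows = con.execute("""
    SELECT pk,
           MAX(CASE WHEN attributeName = 'Age'  THEN attributeValue END) AS Age,
           MAX(CASE WHEN attributeName = 'City' THEN attributeValue END) AS City
    FROM survey_data
    GROUP BY pk
    ORDER BY pk""").fetchall()
print(rows)  # -> [(1, 42, 'Oslo'), (2, 35, 'Bergen')]
```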
        > 2. Won't concurrency be an issue, considering the fact that a
        > logical single horizontal row is edited as, say, 10 rows? Two people
        > might be changing the same primary key's different attributes at the
        > same time.

        Good point. This can be handled fairly easily, but it requires more
        careful programming than the horizontal method.
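
One minimal sketch of that more careful programming (SQLite through Python; table and attribute names are made up): write all attribute rows of one logical row inside a single transaction, so another session never sees a half-updated logical row. Detecting lost updates would additionally need something like a rowversion check, which is not shown here.

```python
import sqlite3

# Hypothetical vertical table; one logical "horizontal" row for pk = 1 is
# stored as several attribute rows.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE survey_data (
        pk INTEGER, attributeName TEXT, attributeValue,
        PRIMARY KEY (pk, attributeName))""")
con.executemany("INSERT INTO survey_data VALUES (?, ?, ?)",
                [(1, "Age", 42), (1, "City", "Oslo")])

def save_row(con, pk, values):
    # One transaction around the whole logical row: a concurrent reader
    # sees either all old values or all new ones, never a mix.
    with con:
        for name, value in values.items():
            con.execute("UPDATE survey_data SET attributeValue = ? "
                        "WHERE pk = ? AND attributeName = ?", (value, pk, name))

save_row(con, 1, {"Age": 43, "City": "Bergen"})
new_age = con.execute("SELECT attributeValue FROM survey_data "
                      "WHERE pk = 1 AND attributeName = 'Age'").fetchone()[0]
print(new_age)  # -> 43
```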

        Overall, there certainly is a tradeoff. If the set of custom fields is
        fairly stable, changing only once per year or so, you might be prepared
        to take the extra maintenance cost. But if users ask for new fields every
        week, then the horizontal method could be a nightmare.


        --
        Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

        Books Online for SQL Server SP3 at
        http://www.microsoft.com/sql/techinf...2000/books.asp



        • mittal.pradeep@gmail.com

          #5
          Re: Dynamic data elements for a data collection application

          Thanks a lot for the reply.
          1. I am not able to understand why I require a 10-way join though. If I
          have a mapping between custom columns and actual columns, all I need is
          dynamic SQL generated from the mapping.

          E.g.
          table
          -----
          pk, custom1, custom2, custom3...custom10

          map
          ---
          customColName  ActualColName
          custom1        ActualName1
          custom2        ActualName2
          ....

          Now I can generate dynamic SQL using the map.

          2. As long as the number of custom columns is enough to cover the data
          element additions that happen in a year, all that is needed is the
          addition of new elements to the mapping table to decipher newly
          assigned custom columns.
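
The dynamic SQL generation described above can be sketched as follows (SQLite through Python; the `data`/`map` tables and the mapped names are made up). Note the column names interpolated into the SQL come from the trusted mapping table, not from user input:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (pk INTEGER, custom1, custom2)")
con.execute("CREATE TABLE map (customColName TEXT, actualColName TEXT)")
con.executemany("INSERT INTO map VALUES (?, ?)",
                [("custom1", "Age"), ("custom2", "City")])
con.execute("INSERT INTO data VALUES (1, 42, 'Oslo')")

# Build the SELECT list from the mapping table, so reports show the real
# names even though the physical columns are generic.
pairs = con.execute("SELECT customColName, actualColName FROM map").fetchall()
select_list = ", ".join(f'{c} AS "{a}"' for c, a in pairs)
sql = f"SELECT pk, {select_list} FROM data"

cur = con.execute(sql)
cols = [d[0] for d in cur.description]
row = cur.fetchone()
print(cols)  # -> ['pk', 'Age', 'City']
print(row)   # -> (1, 42, 'Oslo')
```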






          • --CELKO--

            #6
            Re: Dynamic data elements for a data collection application

            Neither. The design flaw you are calling a vertical model is actually
            known as "EAV" or "Entity-Attribute-Value" because it is a common
            newbie mistake. I have no idea what your #2 means.

            As your data elements change, you need to re-design the schema --
            constraints, keys, data types, etc. Learn RDBMS and do it right.

            I found an old "cut & paste". Someone like you posted this:

            CREATE TABLE EAV -- no key declared
            (key_col VARCHAR (10) NULL,
            attrib_value VARCHAR (50) NULL);

            INSERT INTO EAV VALUES ('LOCATION','Bedroom');
            INSERT INTO EAV VALUES ('LOCATION','Dining Room');
            INSERT INTO EAV VALUES ('LOCATION','Bathroom');
            INSERT INTO EAV VALUES ('LOCATION','courtyard');
            INSERT INTO EAV VALUES ('EVENT','verbal aggression');
            INSERT INTO EAV VALUES ('EVENT','peer');
            INSERT INTO EAV VALUES ('EVENT','bad behavior');
            INSERT INTO EAV VALUES ('EVENT','other');

            CREATE TABLE EAV_DATA -- note lack of constraints, defaults, DRI
            (id INTEGER IDENTITY (1,1) NOT NULL,
            bts_id INTEGER NULL,
            key_col VARCHAR (10) NULL,
            attrib_value VARCHAR (50) NULL);

            INSERT INTO EAV_DATA VALUES (1, 'LOCATION', 'Bedroom');
            INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'other');
            INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'bad behavior');
            INSERT INTO EAV_DATA VALUES (2, 'LOCATION', 'Bedroom');
            INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'other');
            INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'verbal aggression');
            INSERT INTO EAV_DATA VALUES (3, 'LOCATION', 'courtyard');
            INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'other');
            INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'peer');

            Ideally, the result set of the query would be Location, Event, count
            (headings if possible):

            Bedroom verbal aggression 1
            Bedroom peer 0
            Bedroom bad behavior 0
            Bedroom other 2
            Dining Room verbal aggression 0
            Dining Room peer 0
            Dining Room bad behavior 0
            Dining Room other 0
            Bathroom verbal aggression 0
            Bathroom peer 0
            Bathroom bad behavior 0
            Bathroom other 0
            courtyard verbal aggression 0
            courtyard peer 1
            courtyard bad behavior 0
            courtyard other 1

            Also, if possible, another query would return this result set. (I think
            I know how to do this one.)

            Location Event count
            Bedroom verbal aggression 1
            Bedroom other 2
            courtyard peer 1
            courtyard other 1

            Here is a reply from Thomas Coleman:

            SELECT Locations.locationvalue, Events.eventvalue,
                   (SELECT COUNT(*)
                    FROM (SELECT LocationData.locationvalue,
                                 EventData.eventvalue
                          FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
                                FROM eav_data AS TD1
                                WHERE TD1.key = 'location') AS LocationData
                          INNER JOIN
                               (SELECT TD2.bts_id, TD2.value AS eventvalue
                                FROM eav_data AS TD2
                                WHERE TD2.key = 'event') AS EventData
                          ON LocationData.bts_id = EventData.bts_id
                         ) AS CollatedEventData
                    WHERE CollatedEventData.locationvalue = Locations.locationvalue
                      AND CollatedEventData.eventvalue = Events.eventvalue)
            FROM (SELECT T1.value AS locationvalue
                  FROM EAV AS T1
                  WHERE T1.key = 'location') AS Locations,
                 (SELECT T2.value AS eventvalue
                  FROM EAV AS T2
                  WHERE T2.key = 'event') AS Events
            ORDER BY Locations.locationvalue, Events.eventvalue;

            Is the same thing in a proper schema as:

            SELECT L.locationvalue, E.eventvalue, COUNT(*)
            FROM Locations AS L, Events AS E
            WHERE L.btd_id = E.btd_id
            GROUP BY L.locationvalue, E.eventvalue;

            The reason that I had to use so many subqueries is that those entities
            are all lumped into the same table. There should be separate tables
            for Locations and Events.

            The column names are seriously painful. Beyond the fact that I
            personally hate underscores in column names, using underscores at the
            end of the column name is really non-intuitive. I removed them for my
            example and came across the next column-name faux pas. Don't use "key"
            and "value" for column names. It means that the developer *has to*
            surround the column name with square brackets for everything, which
            is a serious pain.

            There is such a thing as "too" generic. There has to be some structure
            or everything becomes nothing more than a couple of tables called
            "things". The real key (no pun intended) is commonality. Is there a
            pattern to the data that they want to store? It may not be possible to
            create one structure to rule them all and in the darkness bind them.

            "To be is to be something in particular; to be nothing in particular
            is to be nothing." --Aristotle

            All data integrity is destroyed. Any typo becomes a new attribute or
            entity. Entities are found missing attributes, so all the reports are
            wrong.

            Try to write a single CHECK() constraint that works for all the
            attributes of those 30+ entities your users created because you were
            too dumb or too lazy to do your job. It can be done! You need a CASE
            expression with almost 70 WHEN clauses for a simple invoice and order
            system; that is what it took when I tried it as an exercise.

            Try to write a single DEFAULT clause for 30+ entities crammed into
            one column. Impossible!

            Try to set up DRI actions among the entities. If you thought the WHEN
            clauses in the single CASE expression were unmaintainable, wait until
            you see the "TRIGGERs from Hell" -- Too bad that they might not fit
            into older SQL Server which had some size limits. Now maintain it.

            For those who are interested, here are a couple of links to articles
            I found on the net:

            Generic Design of Web-Based Clinical Databases
            http://www.jmir.org/2003/4/e27/

            The EAV/CR Model of Data Representation
            http://ycmi.med.yale.edu/nadkarni/eav_CR_contents.htm

            An Introduction to Entity-Attribute-Value Design for Generic
            Clinical Study Data Management Systems
            http://ycmi.med.yale.edu/nadkarni/I...systems..htm

            Data Extraction and Ad Hoc Query of an Entity-Attribute-Value
            Database
            http://www.pubmedcentral.nih.gov/ar...pubmed&pubme...

            Exploring Performance Issues for a Clinical Database Organized Using
            an Entity-Attribute-Value Representation
            http://www.pubmedcentral.nih.gov/ar...pubmed&pubme...


            • Erland Sommarskog

              #7
              Re: Dynamic data elements for a data collection application

              (mittal.pradeep@gmail.com) writes:
              > Thanks a lot for the reply.
              > 1. I am not able to understand why I require a 10-way join
              > though. If I have a mapping between custom columns and actual
              > columns, all I need is dynamic SQL generated from the mapping.
              >
              > E.g.
              > table
              > -----
              > pk, custom1, custom2, custom3...custom10
              >
              > map
              > ---
              > customColName  ActualColName
              > custom1        ActualName1
              > custom2        ActualName2
              > ....
              >
              > Now I can generate dynamic SQL using the map.

              When I said 10-way join I was thinking of the vertical solution.
              For the horizontal solution it's a simple join - once you have gone
              through all that SQL building. To me, this sounds more complex to
              implement. Then again, if the user selects dynamically which
              columns he wants to see, the horizontal solution would require
              dynamic SQL as well.
              > 2. As long as the number of custom columns is enough to cover the
              > data element additions that happen in a year, all that is needed
              > is the addition of new elements to the mapping table to decipher
              > newly assigned custom columns.

              I didn't realise that you had this mapping table. One could say
              that this is a kind of compromise between the horizontal model and
              an entirely static vertical model.



              --
              Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

              Books Online for SQL Server SP3 at
              http://www.microsoft.com/sql/techinf...2000/books.asp



              • mittal.pradeep@gmail.com

                #8
                Re: Dynamic data elements for a data collection application

                CELKO,

                If I change the data elements 10 times during the year, I cannot
                change the front end 10 times to accommodate them. I need to come
                up with a dynamic solution to take care of this. The vertical
                approach is the best way to get there. Using custom columns is a
                compromise, as the vertical approach is harder to implement.

