Q: calling all gurus

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Saturnin Kepa

    Q: calling all gurus

    The Scenario:

    We have some data that can be in three states. One is a saved state,
    a temporary state, and a complete state.

    The complete state is the bulk of the data. This data will be
    requested 1% of the time, maybe even less.

    The saved state and temporary state will only number 10-15 records.
    This data will be used 99% of the time, very frequently.


    So to summarize there will be 10-15 records that will be used ALL the
    time…. And 50,000-100,000 that will be used very infrequently.

    The programmer that I work with wants to combine the tables. I want
    to keep the tables sepperate.

    My logic is that it will be much faster to work with 10-15 records
    than sift threw 50-100k most of which are unnecessary to 99% of the
    requests. Also there are frequent deletions/modifications in the
    temporary table(10-15 records). The complete table can NOT have
    modifications or deletions (50-100k records). The optimization for
    these tables is also very different.

    He says that since the structures are almost identical they should be
    in the same table, that we should just use a field to denote the
    different types of records. He insists that my view is wrong and that
    I know nothing about databases.

    I may be new to SQL, but am I really that out to lunch? Would it not
    make sense to keep these tables separate?

    (Thank you for taking the time to read this)

    Sat
  • David Portas

    #2
    Re: calling all gurus

    It's always difficult to give design advice online without the opportunity
    to analyse a particular situation in detail. But my opinion, based on the
    information given, is that you should definitely use a single table.

    Always start with the logical model - you appear to have a single entity so
    you should have a single table. Add a column to indicate the status and
    index it. The table is so small that retrieving the relevant rows from the
    index should give SQLServer no trouble at all. Updating even this indexed
    column should be far preferable to copying data between tables when the
    status changes.

    But as always with performance questions: don't believe what I say - test it
    and see.

    --
    David Portas
    ------------
    Please reply only to the newsgroup
    --


    Comment

    • John Palmer

      #3
      Re: Q: calling all gurus

      On 29 Oct 2003 07:59:10 -0800, no1ninja@hotmai l.com (Saturnin Kepa)
      wrote:
      [color=blue]
      >The Scenario:
      >
      >We have some data that can be in three states. One is a saved state,
      >a temporary state, and a complete state.
      >
      >The complete state is the bulk of the data. This data will be
      >requested 1% of the time, maybe even less.
      >
      >The saved state and temporary state will only number 10-15 records.
      >This data will be used 99% of the time, very frequently.
      >
      >
      >So to summarize there will be 10-15 records that will be used ALL the
      >time…. And 50,000-100,000 that will be used very infrequently.
      >
      >The programmer that I work with wants to combine the tables. I want
      >to keep the tables sepperate.
      >
      >My logic is that it will be much faster to work with 10-15 records
      >than sift threw 50-100k most of which are unnecessary to 99% of the
      >requests. Also there are frequent deletions/modifications in the
      >temporary table(10-15 records). The complete table can NOT have
      >modification s or deletions (50-100k records). The optimization for
      >these tables is also very different.
      >
      >He says that since the structures are almost identical they should be
      >in the same table, that we should just use a field to denote the
      >different types of records. He insists that my view is wrong and that
      >I know nothing about databases.
      >
      >I may be new to SQL, but am I really that out to lunch? Would it not
      >make sense to keep these tables separate?[/color]

      I'm a relative newbie at SQL design and administration, but I believe
      that it probably won't make much difference. If you build the table
      with a clustered index (hopefully an integer), and with an additional
      field to flag "saved", "temporary" and "complete", and index the
      additional field, there will be nearly no time lost in scanning the
      index, and grabbing the 10-15 rows in either of the first two tables.

      If these were large tables, it might make a difference, but even then,
      less than you might think; the index on the flag field essentially
      splits the table into three separate tables, so long as your queries
      are written to take advantage of that index. (e.g.: if you're looking
      for something "saved" that has a certain unique attribute, make sure
      you specify in the query that it's "saved", so SQL Server doesn't have
      to do a table scan for the attribute you're looking for.

      (Also, as a side note: proper database design suggests that you do not
      have a field that allows the words "saved", "temporary" and "complete;
      you should have a field that allows integers from one to three, and a
      second table that equates 1 with "saved", 2 with "temporary" and 3
      with "complete" (or whatever order seems best). Comparing integers is
      much faster than comparing strings, and this saves a nice chunk of
      space, allowing more rows to be cached.)

      Comment

      • David Portas

        #4
        Re: calling all gurus

        It's always difficult to give design advice online without the opportunity
        to analyse a particular situation in detail. But my opinion, based on the
        information given, is that you should definitely use a single table.

        Always start with the logical model - you appear to have a single entity so
        you should have a single table. Add a column to indicate the status and
        index it. The table is so small that retrieving the relevant rows from the
        index should give SQLServer no trouble at all. Updating even this indexed
        column should be far preferable to copying data between tables when the
        status changes.

        But as always with performance questions: don't believe what I say - test it
        and see.

        --
        David Portas
        ------------
        Please reply only to the newsgroup
        --


        Comment

        • John Palmer

          #5
          Re: Q: calling all gurus

          On 29 Oct 2003 07:59:10 -0800, no1ninja@hotmai l.com (Saturnin Kepa)
          wrote:
          [color=blue]
          >The Scenario:
          >
          >We have some data that can be in three states. One is a saved state,
          >a temporary state, and a complete state.
          >
          >The complete state is the bulk of the data. This data will be
          >requested 1% of the time, maybe even less.
          >
          >The saved state and temporary state will only number 10-15 records.
          >This data will be used 99% of the time, very frequently.
          >
          >
          >So to summarize there will be 10-15 records that will be used ALL the
          >time…. And 50,000-100,000 that will be used very infrequently.
          >
          >The programmer that I work with wants to combine the tables. I want
          >to keep the tables sepperate.
          >
          >My logic is that it will be much faster to work with 10-15 records
          >than sift threw 50-100k most of which are unnecessary to 99% of the
          >requests. Also there are frequent deletions/modifications in the
          >temporary table(10-15 records). The complete table can NOT have
          >modification s or deletions (50-100k records). The optimization for
          >these tables is also very different.
          >
          >He says that since the structures are almost identical they should be
          >in the same table, that we should just use a field to denote the
          >different types of records. He insists that my view is wrong and that
          >I know nothing about databases.
          >
          >I may be new to SQL, but am I really that out to lunch? Would it not
          >make sense to keep these tables separate?[/color]

          I'm a relative newbie at SQL design and administration, but I believe
          that it probably won't make much difference. If you build the table
          with a clustered index (hopefully an integer), and with an additional
          field to flag "saved", "temporary" and "complete", and index the
          additional field, there will be nearly no time lost in scanning the
          index, and grabbing the 10-15 rows in either of the first two tables.

          If these were large tables, it might make a difference, but even then,
          less than you might think; the index on the flag field essentially
          splits the table into three separate tables, so long as your queries
          are written to take advantage of that index. (e.g.: if you're looking
          for something "saved" that has a certain unique attribute, make sure
          you specify in the query that it's "saved", so SQL Server doesn't have
          to do a table scan for the attribute you're looking for.

          (Also, as a side note: proper database design suggests that you do not
          have a field that allows the words "saved", "temporary" and "complete;
          you should have a field that allows integers from one to three, and a
          second table that equates 1 with "saved", 2 with "temporary" and 3
          with "complete" (or whatever order seems best). Comparing integers is
          much faster than comparing strings, and this saves a nice chunk of
          space, allowing more rows to be cached.)

          Comment

          • Craig Kelly

            #6
            Re: calling all gurus

            "Saturnin Kepa" <no1ninja@hotma il.com> wrote:

            <snip>
            [color=blue]
            > My logic is that it will be much faster to work with 10-15 records
            > than sift threw 50-100k most of which are unnecessary to 99% of the
            > requests. Also there are frequent deletions/modifications in the
            > temporary table(10-15 records). The complete table can NOT have
            > modifications or deletions (50-100k records). The optimization for
            > these tables is also very different.[/color]

            <snip>

            My two cents: I agree with David. However, I'd also like to throw in the
            old adage that pre-mature optimization can cause you untold headaches. My
            advice would be to go with a nice, clean, logical design (in this instance,
            avoiding the use of multiple tables for a single type of entity based on a
            single property).

            If some prudent indexing won't keep your performance at acceptable levels,
            then you can try things like splitting into separate tables. However, my
            guess is that you'll find the single table performs just fine: it's not like
            we're talking about 10 billion records or anything.

            Craig


            Comment

            • Lyndon Hills

              #7
              Re: Q: calling all gurus

              On Wed, 29 Oct 2003 09:07:20 -0800, John Palmer
              <jpalmer1@ix.ne tcom.com> wrote:
              [color=blue]
              >On 29 Oct 2003 07:59:10 -0800, no1ninja@hotmai l.com (Saturnin Kepa)
              >wrote:
              >[color=green]
              >>The Scenario:
              >>
              >>We have some data that can be in three states. One is a saved state,
              >>a temporary state, and a complete state.
              >>
              >>The complete state is the bulk of the data. This data will be
              >>requested 1% of the time, maybe even less.
              >>
              >>The saved state and temporary state will only number 10-15 records.
              >>This data will be used 99% of the time, very frequently.
              >>
              >>
              >>So to summarize there will be 10-15 records that will be used ALL the
              >>time…. And 50,000-100,000 that will be used very infrequently.
              >>
              >>The programmer that I work with wants to combine the tables. I want
              >>to keep the tables sepperate.
              >>
              >>My logic is that it will be much faster to work with 10-15 records
              >>than sift threw 50-100k most of which are unnecessary to 99% of the
              >>requests. Also there are frequent deletions/modifications in the
              >>temporary table(10-15 records). The complete table can NOT have
              >>modificatio ns or deletions (50-100k records). The optimization for
              >>these tables is also very different.
              >>
              >>He says that since the structures are almost identical they should be
              >>in the same table, that we should just use a field to denote the
              >>different types of records. He insists that my view is wrong and that
              >>I know nothing about databases.
              >>
              >>I may be new to SQL, but am I really that out to lunch? Would it not
              >>make sense to keep these tables separate?[/color]
              >
              >I'm a relative newbie at SQL design and administration, but I believe
              >that it probably won't make much difference. If you build the table
              >with a clustered index (hopefully an integer), and with an additional
              >field to flag "saved", "temporary" and "complete", and index the
              >additional field, there will be nearly no time lost in scanning the
              >index, and grabbing the 10-15 rows in either of the first two tables.
              >
              >If these were large tables, it might make a difference, but even then,
              >less than you might think; the index on the flag field essentially
              >splits the table into three separate tables, so long as your queries
              >are written to take advantage of that index. (e.g.: if you're looking
              >for something "saved" that has a certain unique attribute, make sure
              >you specify in the query that it's "saved", so SQL Server doesn't have
              >to do a table scan for the attribute you're looking for.
              >
              >(Also, as a side note: proper database design suggests that you do not
              >have a field that allows the words "saved", "temporary" and "complete;
              >you should have a field that allows integers from one to three, and a
              >second table that equates 1 with "saved", 2 with "temporary" and 3
              >with "complete" (or whatever order seems best). Comparing integers is
              >much faster than comparing strings, and this saves a nice chunk of
              >space, allowing more rows to be cached.)[/color]

              A further consideration could be how you are obtaining your primary
              key. If this is one of the auto numbering schemes you may well find it
              harder to work with two tables. As far as the speed issue is
              concerned, try both ways and see if the difference justifies having
              two tables.

              Comment

              • Craig Kelly

                #8
                Re: calling all gurus

                "Saturnin Kepa" <no1ninja@hotma il.com> wrote:

                <snip>
                [color=blue]
                > My logic is that it will be much faster to work with 10-15 records
                > than sift threw 50-100k most of which are unnecessary to 99% of the
                > requests. Also there are frequent deletions/modifications in the
                > temporary table(10-15 records). The complete table can NOT have
                > modifications or deletions (50-100k records). The optimization for
                > these tables is also very different.[/color]

                <snip>

                My two cents: I agree with David. However, I'd also like to throw in the
                old adage that pre-mature optimization can cause you untold headaches. My
                advice would be to go with a nice, clean, logical design (in this instance,
                avoiding the use of multiple tables for a single type of entity based on a
                single property).

                If some prudent indexing won't keep your performance at acceptable levels,
                then you can try things like splitting into separate tables. However, my
                guess is that you'll find the single table performs just fine: it's not like
                we're talking about 10 billion records or anything.

                Craig


                Comment

                • --CELKO--

                  #9
                  Re: Q: calling all gurus

                  >> The programmer that I work with wants to combine the tables. I
                  want to keep the tables separate. <<

                  Why? If they are the same entity in your data model, then they should
                  be in one table. What you are doing is called attribute splitting and
                  it gets messy very fast.
                  [color=blue][color=green]
                  >> My logic is that it will be much faster to work with 10-15 records[/color][/color]
                  [sic] than sift threw 50-100k most of which are unnecessary to 99% of
                  the requests. <<

                  First of all, rows are not records and the differences are vital to
                  using and understanding SQL. Secondly, use an index or uniqueness
                  constraint to access the rows; why would you look thru the whole
                  table? This is not a sequential file system.
                  [color=blue][color=green]
                  >> He says that since the structures are almost identical they should[/color][/color]
                  be in the same table, that we should just use a field [sic] to denote
                  the different types of records [sic]. <<

                  Yes. If they are the same kind of entities, and differ only by a
                  status code, then he is right. Get the dara model right first, then
                  tune the performance.

                  Comment

                  • Lyndon Hills

                    #10
                    Re: Q: calling all gurus

                    On Wed, 29 Oct 2003 09:07:20 -0800, John Palmer
                    <jpalmer1@ix.ne tcom.com> wrote:
                    [color=blue]
                    >On 29 Oct 2003 07:59:10 -0800, no1ninja@hotmai l.com (Saturnin Kepa)
                    >wrote:
                    >[color=green]
                    >>The Scenario:
                    >>
                    >>We have some data that can be in three states. One is a saved state,
                    >>a temporary state, and a complete state.
                    >>
                    >>The complete state is the bulk of the data. This data will be
                    >>requested 1% of the time, maybe even less.
                    >>
                    >>The saved state and temporary state will only number 10-15 records.
                    >>This data will be used 99% of the time, very frequently.
                    >>
                    >>
                    >>So to summarize there will be 10-15 records that will be used ALL the
                    >>time…. And 50,000-100,000 that will be used very infrequently.
                    >>
                    >>The programmer that I work with wants to combine the tables. I want
                    >>to keep the tables sepperate.
                    >>
                    >>My logic is that it will be much faster to work with 10-15 records
                    >>than sift threw 50-100k most of which are unnecessary to 99% of the
                    >>requests. Also there are frequent deletions/modifications in the
                    >>temporary table(10-15 records). The complete table can NOT have
                    >>modificatio ns or deletions (50-100k records). The optimization for
                    >>these tables is also very different.
                    >>
                    >>He says that since the structures are almost identical they should be
                    >>in the same table, that we should just use a field to denote the
                    >>different types of records. He insists that my view is wrong and that
                    >>I know nothing about databases.
                    >>
                    >>I may be new to SQL, but am I really that out to lunch? Would it not
                    >>make sense to keep these tables separate?[/color]
                    >
                    >I'm a relative newbie at SQL design and administration, but I believe
                    >that it probably won't make much difference. If you build the table
                    >with a clustered index (hopefully an integer), and with an additional
                    >field to flag "saved", "temporary" and "complete", and index the
                    >additional field, there will be nearly no time lost in scanning the
                    >index, and grabbing the 10-15 rows in either of the first two tables.
                    >
                    >If these were large tables, it might make a difference, but even then,
                    >less than you might think; the index on the flag field essentially
                    >splits the table into three separate tables, so long as your queries
                    >are written to take advantage of that index. (e.g.: if you're looking
                    >for something "saved" that has a certain unique attribute, make sure
                    >you specify in the query that it's "saved", so SQL Server doesn't have
                    >to do a table scan for the attribute you're looking for.
                    >
                    >(Also, as a side note: proper database design suggests that you do not
                    >have a field that allows the words "saved", "temporary" and "complete;
                    >you should have a field that allows integers from one to three, and a
                    >second table that equates 1 with "saved", 2 with "temporary" and 3
                    >with "complete" (or whatever order seems best). Comparing integers is
                    >much faster than comparing strings, and this saves a nice chunk of
                    >space, allowing more rows to be cached.)[/color]

                    A further consideration could be how you are obtaining your primary
                    key. If this is one of the auto numbering schemes you may well find it
                    harder to work with two tables. As far as the speed issue is
                    concerned, try both ways and see if the difference justifies having
                    two tables.

                    Comment

                    • --CELKO--

                      #11
                      Re: Q: calling all gurus

                      >> The programmer that I work with wants to combine the tables. I
                      want to keep the tables separate. <<

                      Why? If they are the same entity in your data model, then they should
                      be in one table. What you are doing is called attribute splitting and
                      it gets messy very fast.
                      [color=blue][color=green]
                      >> My logic is that it will be much faster to work with 10-15 records[/color][/color]
                      [sic] than sift threw 50-100k most of which are unnecessary to 99% of
                      the requests. <<

                      First of all, rows are not records and the differences are vital to
                      using and understanding SQL. Secondly, use an index or uniqueness
                      constraint to access the rows; why would you look thru the whole
                      table? This is not a sequential file system.
                      [color=blue][color=green]
                      >> He says that since the structures are almost identical they should[/color][/color]
                      be in the same table, that we should just use a field [sic] to denote
                      the different types of records [sic]. <<

                      Yes. If they are the same kind of entities, and differ only by a
                      status code, then he is right. Get the dara model right first, then
                      tune the performance.

                      Comment

                      • GoranG

                        #12
                        Re: Q: calling all gurus

                        On 29 Oct 2003 13:43:52 -0800, joe.celko@north face.edu (--CELKO--)
                        wrote:
                        [color=blue][color=green][color=darkred]
                        >>> The programmer that I work with wants to combine the tables. I[/color][/color]
                        >want to keep the tables separate. <<
                        >
                        >Why? If they are the same entity in your data model, then they should
                        >be in one table. What you are doing is called attribute splitting and
                        >it gets messy very fast.
                        >[color=green][color=darkred]
                        >>> My logic is that it will be much faster to work with 10-15 records[/color][/color]
                        >[sic] than sift threw 50-100k most of which are unnecessary to 99% of
                        >the requests. <<
                        >
                        >First of all, rows are not records and the differences are vital to
                        >using and understanding SQL. Secondly, use an index or uniqueness
                        >constraint to access the rows; why would you look thru the whole
                        >table? This is not a sequential file system.
                        >[color=green][color=darkred]
                        >>> He says that since the structures are almost identical they should[/color][/color]
                        >be in the same table, that we should just use a field [sic] to denote
                        >the different types of records [sic]. <<
                        >
                        >Yes. If they are the same kind of entities, and differ only by a
                        >status code, then he is right. Get the dara model right first, then
                        >tune the performance.[/color]

                        (not a guru, but here it goes anyway...)

                        Note the word 'almost identical' - isn't that 'almost positive' sign
                        for either over generalization or existing denormalization (especially
                        on conceptual level of design)?

                        Also if one entity never changes (only inserts allowed) and the other
                        allows updates, inserts and deletes doesn't that suggest different
                        entities, and calls for deeper examination of those entities and their
                        attributes?

                        To me it seems that their DBA is the one that fails to see the
                        difference between the conceptual data modeling and physical data
                        model.

                        ---

                        On another note, also hoping to learn something...
                        What exactly does get messy with (for instance):

                        CREATE TABLE ABIG (
                        prop1 INT PRIMARY KEY CHECK (prop2 > 1000),
                        prop2 CHAR(50))

                        CREATE TABLE ASMALL (
                        prop1 INT PRIMARY KEY CHECK (prop1 <= 1000),
                        prop2 CHAR(50))

                        CREATE VIEW A AS
                        SELECT prop1, prop2
                        FROM ABIG
                        UNION ALL
                        SELECT prop1, prop2
                        FROM ASMALL

                        INSERT INTO A (prop1, prop2) VALUES (1, 'G');
                        INSERT INTO A (prop1, prop2) VALUES (1001, 'G');

                        ....except that it is already messy.
                        Does it get even messier?

                        ( GoranG79 AT hotmail.com )

                        Comment

                        • hansje

                          #13
                          Re: Q: calling all gurus

                          QUOTE
                          On another note, also hoping to learn something...
                          What exactly does get messy with (for instance):

                          CREATE TABLE ABIG (
                          prop1 INT PRIMARY KEY CHECK (prop2 > 1000),
                          prop2 CHAR(50))

                          CREATE TABLE ASMALL (
                          prop1 INT PRIMARY KEY CHECK (prop1 <= 1000),
                          prop2 CHAR(50))

                          CREATE VIEW A AS
                          SELECT prop1, prop2
                          FROM ABIG
                          UNION ALL
                          SELECT prop1, prop2
                          FROM ASMALL

                          INSERT INTO A (prop1, prop2) VALUES (1, 'G');
                          INSERT INTO A (prop1, prop2) VALUES (1001, 'G');

                          ....except that it is already messy.
                          Does it get even messier?
                          END QUOTE



                          GoranG,

                          I'm not sure what you are asking or demonstrating here?

                          Tnx,

                          Hans Brouwer

                          *** Sent via Developersdex http://www.developersdex.com ***
                          Don't just participate in USENET...get rewarded for it!

                          Comment

                          • GoranG

                            #14
                            Re: Q: calling all gurus

                            On 30 Oct 2003 12:41:08 GMT, hansje <hansjes@anonym ous.com> wrote:
                            [color=blue]
                            >QUOTE
                            >On another note, also hoping to learn something...
                            >What exactly does get messy with (for instance):
                            >
                            >CREATE TABLE ABIG (
                            > prop1 INT PRIMARY KEY CHECK (prop2 > 1000),
                            > prop2 CHAR(50))
                            >
                            >CREATE TABLE ASMALL (
                            > prop1 INT PRIMARY KEY CHECK (prop1 <= 1000),
                            > prop2 CHAR(50))
                            >
                            >CREATE VIEW A AS
                            >SELECT prop1, prop2
                            >FROM ABIG
                            >UNION ALL
                            >SELECT prop1, prop2
                            >FROM ASMALL
                            >
                            >INSERT INTO A (prop1, prop2) VALUES (1, 'G');
                            >INSERT INTO A (prop1, prop2) VALUES (1001, 'G');
                            >
                            >...except that it is already messy.
                            >Does it get even messier?
                            >END QUOTE
                            >
                            >
                            >
                            >GoranG,
                            >
                            >I'm not sure what you are asking or demonstrating here?[/color]

                            Potential logical equivalence of spliting attributes at physical
                            level.

                            ( GoranG79 AT hotmail.com )

                            Comment

                            • GoranG

                              #15
                              Re: Q: calling all gurus

                              On Thu, 30 Oct 2003 14:57:06 +0100, GoranG <no@spam.net> wrote:

                              <cut>[color=blue]
                              >
                              >Potential logical equivalence of spliting attributes at physical
                              >level.[/color]

                              Oops.

                              Potential logical equivalence of
                              a) splitting attributes at physical level and having two physical
                              entities and a single view
                              to
                              b) single entity at physical level.

                              ( GoranG79 AT hotmail.com )

                              Comment

                              Working...