Database Design Q: Many Rows or Many Tables?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Good Man

    Database Design Q: Many Rows or Many Tables?

    Hi there

    I'm in the planning stages of creating a database, and I have two options
    here. Which makes more sense, and/or provides better performance for
    queries - a single table with hundreds of thousands of rows, or many
    tables with less rows?

    GOAL:
    Recording information about buildings, storey by storey. Basically I
    will be recording six bits of information about each storey in each
    building.

    SETUP:
    There will be one table for Buildings for sure. The big question is about
    how I should record those six details per storey.

    OPTIONS:
    A) One table that records building ID, storey number, and the six
    measurements per row.
    B) Six tables that record building ID, storey number, and the appropriate
    measurement.


    Eventually, there could be a few thousand buildings in the database, and
    each building may have anywhere from 5 to 50 storeys. Not to mention that
    "old information" is to be kept and archived as new information about
    storeys is added.

    This means with option A, I can expect to have a table of a few hundred
    thousand rows.... (obviously i would index the BuildingID and Storey
    Number). Sounds like a lot(!), but it means that I could retreive/select
    information I need with one single query and no table joins.

    If I go with option B, I can expect to have to do six table joins/queries
    when retreiving information about one storey in one building, however
    there will be less records to sift through.


    Which is the better option to take, and why? By 'better', I am referring
    to database performance/speed, updates and other future maintenance on
    the database. All opinions welcome.

    Thanks,
    GM
    ,
  • Oli Filth

    #2
    Re: Database Design Q: Many Rows or Many Tables?

    Good Man wrote:[color=blue]
    > Hi there
    >
    > I'm in the planning stages of creating a database, and I have two options
    > here. Which makes more sense, and/or provides better performance for
    > queries - a single table with hundreds of thousands of rows, or many
    > tables with less rows?
    >
    > GOAL:
    > Recording information about buildings, storey by storey. Basically I
    > will be recording six bits of information about each storey in each
    > building.
    >
    > SETUP:
    > There will be one table for Buildings for sure. The big question is about
    > how I should record those six details per storey.
    >
    > OPTIONS:
    > A) One table that records building ID, storey number, and the six
    > measurements per row.
    > B) Six tables that record building ID, storey number, and the appropriate
    > measurement.
    >
    >
    > Eventually, there could be a few thousand buildings in the database, and
    > each building may have anywhere from 5 to 50 storeys. Not to mention that
    > "old information" is to be kept and archived as new information about
    > storeys is added.
    >
    > This means with option A, I can expect to have a table of a few hundred
    > thousand rows.... (obviously i would index the BuildingID and Storey
    > Number). Sounds like a lot(!), but it means that I could retreive/select
    > information I need with one single query and no table joins.
    >
    > If I go with option B, I can expect to have to do six table joins/queries
    > when retreiving information about one storey in one building, however
    > there will be less records to sift through.
    >[/color]

    Why would there be less rows with option B? With each scenario, you'll still
    have one row per storey, surely? So in fact, option B will be a far larger
    database, because you'll be storing each building ID and storey number six times.

    Option A is far more sensible, because it'll be smaller, you won't have to do
    any table joins, and the data is all in one place, which is easier to maintain
    in the long run.

    Oli

    Comment

    • Good Man

      #3
      Re: Database Design Q: Many Rows or Many Tables?

      Oli Filth <oli_filth@eats pam.coldmail.co m> wrote in
      news:r0Ewd.164$ mj4.82@newsfe1-gui.ntli.net:
      [color=blue]
      > Why would there be less rows with option B? With each scenario, you'll
      > still have one row per storey, surely? So in fact, option B will be a
      > far larger database, because you'll be storing each building ID and
      > storey number six times.
      >
      > Option A is far more sensible, because it'll be smaller, you won't
      > have to do any table joins, and the data is all in one place, which is
      > easier to maintain in the long run.[/color]

      Hi... I left something out regarding my database design structure (option
      A may actually record more than one row per story) but still you bring up
      a good point - there will still be a similar number of total rows
      regardless of which option I choose. Funny how I missed that :P

      Thanks for your help... I suppose my real concern was making sure that
      MySQL can go through several hundred thousand rows (what about
      millions?), but I suppose it can if I have the required disk space and I
      index my tables properly...

      PS: Regarding indexing, do I just set up the table with an Index, or do I
      need to index the table every month or so to make sure new records are
      being indexed?

      Thanks for the advice...

      GM

      Comment

      • Oli Filth

        #4
        Re: Database Design Q: Many Rows or Many Tables?

        Good Man wrote:[color=blue]
        > Thanks for your help... I suppose my real concern was making sure that
        > MySQL can go through several hundred thousand rows (what about
        > millions?), but I suppose it can if I have the required disk space and I
        > index my tables properly...
        >
        > PS: Regarding indexing, do I just set up the table with an Index, or do I
        > need to index the table every month or so to make sure new records are
        > being indexed?
        >
        > Thanks for the advice...
        >
        > GM[/color]

        Hi

        I've never used a MySQL table with more than a few hundred records, but I guess
        it still works fine with thousands or millions (just that queries will take much
        longer).

        If you set up a particular field in a table as an index, MySQL will
        automatically handle indexing of new insertions. However, if you're frequently
        deleting or altering existing records then the table becomes fragmented, so you
        should run a "OPTIMIZE TABLE table_name" query every so often, which defrags the
        table and re-sorts indexes.

        If you're going to be running tables with hundreds of thousands of records, it'd
        be well worth thinking about optimisations that you can use for your table
        design and the way that you form queries. For instance, use fixed-length CHAR
        strings rather than VARCHAR or TEXT. Try
        http://dev.mysql.com/doc/mysql/en/My...imization.html for loads of stuff on
        optimisation.

        Hope this helps,
        Oli

        Comment

        • John Sanders

          #5
          Re: Database Design Q: Many Rows or Many Tables?

          Good Man wrote:[color=blue]
          > Hi there
          >
          > I'm in the planning stages of creating a database, and I have two options
          > here. Which makes more sense, and/or provides better performance for
          > queries - a single table with hundreds of thousands of rows, or many
          > tables with less rows?
          >
          > GOAL:
          > Recording information about buildings, storey by storey. Basically I
          > will be recording six bits of information about each storey in each
          > building.
          >
          > SETUP:
          > There will be one table for Buildings for sure. The big question is about
          > how I should record those six details per storey.
          >
          > OPTIONS:
          > A) One table that records building ID, storey number, and the six
          > measurements per row.
          > B) Six tables that record building ID, storey number, and the appropriate
          > measurement.
          >
          >
          > Eventually, there could be a few thousand buildings in the database, and
          > each building may have anywhere from 5 to 50 storeys. Not to mention that
          > "old information" is to be kept and archived as new information about
          > storeys is added.
          >
          > This means with option A, I can expect to have a table of a few hundred
          > thousand rows.... (obviously i would index the BuildingID and Storey
          > Number). Sounds like a lot(!), but it means that I could retreive/select
          > information I need with one single query and no table joins.
          >
          > If I go with option B, I can expect to have to do six table joins/queries
          > when retreiving information about one storey in one building, however
          > there will be less records to sift through.
          >
          >
          > Which is the better option to take, and why? By 'better', I am referring
          > to database performance/speed, updates and other future maintenance on
          > the database. All opinions welcome.
          >
          > Thanks,
          > GM
          > ,[/color]

          I would say that A is your best solution, but only if you are storing
          one peice of information about the building, meaning Building ID. If
          you want to store other peices of information about the building, like
          street address, then you need a different approach... Option C.

          Option C, requires 2 tables... Buildings, and Storeys. Tables are
          listed below, and the fields are indented.

          Buildings
          BuildingID
          Address1
          Address2
          City
          ... (other fields)


          Storeys
          BuildingID (FK to Building/BuildingID)
          StoreyNumber
          BitOfInfo1
          BitOfInfo2
          ...
          BitOfInfo6


          John

          Comment

          • peter

            #6
            Re: Database Design Q: Many Rows or Many Tables?

            B is better as long as you have a normalized data model.

            Fewer tables mean less relational and mor records meaning more system
            resources are used for each query

            For large volumes do not use MySQL as it is not that scalable

            Postgres is much faster

            Good Man wrote:[color=blue]
            > Hi there
            >
            > I'm in the planning stages of creating a database, and I have two options
            > here. Which makes more sense, and/or provides better performance for
            > queries - a single table with hundreds of thousands of rows, or many
            > tables with less rows?
            >
            > GOAL:
            > Recording information about buildings, storey by storey. Basically I
            > will be recording six bits of information about each storey in each
            > building.
            >
            > SETUP:
            > There will be one table for Buildings for sure. The big question is about
            > how I should record those six details per storey.
            >
            > OPTIONS:
            > A) One table that records building ID, storey number, and the six
            > measurements per row.
            > B) Six tables that record building ID, storey number, and the appropriate
            > measurement.
            >
            >
            > Eventually, there could be a few thousand buildings in the database, and
            > each building may have anywhere from 5 to 50 storeys. Not to mention that
            > "old information" is to be kept and archived as new information about
            > storeys is added.
            >
            > This means with option A, I can expect to have a table of a few hundred
            > thousand rows.... (obviously i would index the BuildingID and Storey
            > Number). Sounds like a lot(!), but it means that I could retreive/select
            > information I need with one single query and no table joins.
            >
            > If I go with option B, I can expect to have to do six table joins/queries
            > when retreiving information about one storey in one building, however
            > there will be less records to sift through.
            >
            >
            > Which is the better option to take, and why? By 'better', I am referring
            > to database performance/speed, updates and other future maintenance on
            > the database. All opinions welcome.
            >
            > Thanks,
            > GM
            > ,[/color]

            Comment

            • Oli Filth

              #7
              Re: Database Design Q: Many Rows or Many Tables?

              peter wrote:[color=blue]
              > B is better as long as you have a normalized data model.
              >
              > Fewer tables mean less relational and mor records meaning more system
              > resources are used for each query
              >[/color]

              Doesn't apply in the OP's case, as his "option B" would literally be his "option
              A" split into six.

              i.e. For each storey:

              "option A":

              Table1: ID, StoreyNum, Data1, Data2, Data3, Data4, Data5, Data6
              ======

              "option B":

              Table1: ID, StoreyNum, Data1
              ======
              Table2: ID, StoreyNum, Data2
              ======
              Table3: ID, StoreyNum, Data3
              ======
              Table4: ID, StoreyNum, Data4
              ======
              Table5: ID, StoreyNum, Data5
              ======
              Table6: ID, StoreyNum, Data6
              ======

              As the OP has said that each storey will always be associated with six fields of
              data, "option B" will take up more space, as ID and StoreyNum will be stored six
              times each, *and* he'd have to do a six-table join for every query. There's no
              way that's more efficient.

              Oli

              Comment

              • nospam@geniegate.com

                #8
                Re: Database Design Q: Many Rows or Many Tables?

                Good Man <heyho@letsgo.c om> wrote:[color=blue]
                > Which is the better option to take, and why? By 'better', I am referring
                > to database performance/speed, updates and other future maintenance on
                > the database. All opinions welcome.[/color]

                I'd go with option A.

                You didn't mention what you were searching on? If it's building ID or
                storey ID, I wouldn't worry too much about table size. A modern database
                should be able to search fixed keys from a table of millions in under
                1 second. (where you get into trouble is like('%query%') )

                The trick is in understanding how the database interprets the query. Fixed
                index lookups are fast. Scanning rows of data is slow, so, try and minimize
                rows you'll likely need to scan.

                Don't be tempted to create indices w/out first populating the data, an index at the
                wrong place & time can slow down the database (particularly inserts).

                A good rule to follow is never, ever, EVER duplicate data. Ie, don't store
                the same bit of information twice. For instance, don't copy stuff around, use
                links: buildings AND stories.

                buildings ======
                bid | name | info that applies globally to same building

                storey ======
                sid | bid | info1 | info2 | info3 | info that applies only to storey.


                Where building->bid = storey->bid.

                That way, if you need to scan for a building name, you won't have to wade
                through as much data. Once you find the building name, you can use it's
                building ID (bid) to greatly minimize the rows from storeys that you'll be
                scanning. (since 'bid' would be indiced and a FIXED query. (bid=code, NOT bid
                like('%code%))

                You've got: select * from storeys where bid=100 AND info2 LIKE('%str%')

                The database should be smart enough to realize it can use it's bid index
                to minimize the rows passed on to the like() clause. So, even if your stories
                table has 50 buildings with 50 stories each, you'll only end up scanning
                100 rows. (50 from the building table to get the BID from the name, and 50
                from the stories table that were reduced from the index)

                (Experience has shown me to hold off on indices until they are actually
                required)

                Jamie
                --
                http://www.geniegate.com Custom web programming
                guhzo_42@lnubb. pbz (rot13) User Management Solutions

                Comment

                Working...