Should I Denormalize?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Josh

    Should I Denormalize?

    I have a database design question and wanted to see if folks could lend some
    advice. For simplicity sake, I'll illustrate using a book (but the
    hierarchy is the same in the real application - just with extra fields).

    A book has a title has chapters.
    A chapter has a title and has pages.
    A page has a title has pictures.
    A picture has a title.

    My application will show a book, a chapter, a page, or a picture at any
    given time. However, regardless of what I'm showing, I need to show my
    name, and the names of all my parents (e.g. a page must show it's name, and
    the names of it's parent chapter and book).

    The question - in database terms, does the second, denormalized table design
    give any performance advantage? Here are my two options:

    -- Normalized --
    book: book_id, title
    chapter: chapter_id, book_id, title
    page: page_id, chapter_id, title
    picture: picture_id, page_id, title

    -- Denormalized --
    book: book_id, title
    chapter: chapter_id, book_id, title
    page: page_id, chapter_id, book_id, title
    picture: picture_id, page_id, chapter_id, book_id, title

    I'll use the most extreme case (the lowest table) as an example. When
    viewing a picture, I'll need to get the picture title, and also the titles
    of each of my parents. Below are the resulting queries and explain routes:

    NORMALIZED QUERY

    select book.title, chapter.title, page.title, picture.title
    from book, chapter, page, picture
    where book.book_id = chapter.book_id
    and chapter.chapter _id = page.chapter_id
    and page.page_id = picture.page_id
    and picture.picture _id = 3601

    NORMALIZED EXPLAIN

    PRIMARY,fk_pict ure_2_page
    PRIMARY,fk_page _2_chapter
    PRIMARY,fk_chap ter_2_book
    PRIMARY

    If I chose the denormalized version, they are:

    DENORMALIZED QUERY

    select book.title, chapter.title, page.title, picture.title
    from book, chapter, page, picture
    where book.book_id = picture.book_id
    and chapter.chapter _id = picture.chapter _id
    and page.page_id = picture.page_id
    and picture.picture _id = 3601

    DENORMALIZED EXPLAIN

    PRIMARY,fk_pict ure_2_page,fk_p icture_2_chapte r,fk_picture_2_ book
    PRIMARY
    PRIMARY
    PRIMARY

    I've run some basic tests, and haven't noticed any real difference in
    performance. I'm pretty new to database design, so if anyone has any
    pointers, please let me know.

    Thanks!

    --
    Josh


  • Thomas Bartkus

    #2
    Re: Should I Denormalize?

    "Josh" <memphis75 - gmail> wrote in message
    news:11591u12p3 g3402@corp.supe rnews.com...
    <snip>[color=blue]
    > The question - in database terms, does the second, denormalized table[/color]
    design[color=blue]
    > give any performance advantage?[/color]
    <snip>

    The entire (MySQL) system is designed to perform it's all around best using
    textbook
    classic normalized tables. The kind every intro to db design preaches. This
    seems to be the most difficult thing to "preach" to newcomers to database
    design. It just sounds so unconvincing that one *should* spend a bit more
    effort writing table joins instead of cramming redundant information into
    flat tables. My advice to someone serious about database design would be to
    simply take it on faith because the "Ahah!" experience will come to you.
    For most of us who begin by hot rodding flat tables at the get go the
    revelation experience is more like "Oh sh**! - So that's why you're supposed
    to do it that way! ..."

    Let me just say that you (might!) get some performance advantage using flat
    "denormaliz ed" tables. MySQL would certainly be the *wrong* tool for this.
    I would go for a simple Excel table or one of the several flat file database
    systems that are out there. As long as the data set is *small* and the
    entire system is *simple* you will enjoy a performance advantage and may
    never encounter the numerous and onerous booby traps this will set for a
    system you expect to grow over time. MySQL, on the other hand, is meant to
    scale large. It may not be as fast out of the gate, but neither will it slow
    or bog down as the system expands. It's more like a marathon runner. As the
    miles wear on the sprinter always falls behind never to catch up again.

    But hey! If you are only going to cover 100 or 200 yards - by all means
    sprint. Just don't waste your efforts by using a marathon tool like MySQL.

    Okay - enough. Now what does your book say about normalizing a database
    that you don't quite believe? Or the advantage that you think might be worth
    sacrificing for few microseconds faster response time?

    Thomas Bartkus

    Oh how I wish production MySQL had views or virtual tables. Then when the
    next PHP hotshot demands a flat table to simplify his coding I could just
    write a nice proper query, store it, and tell him I'm giving him his table -
    muttering "jerk!" under my breath.


    Comment

    • Josh

      #3
      Re: Should I Denormalize?

      Thomas - Thank you for taking the time to explain this!
      [color=blue]
      > The entire (MySQL) system is designed to perform it's all around best
      > using textbook classic normalized tables.[/color]

      This is exactly the information I was hoping to get.
      [color=blue]
      > take it on faith because the "Ahah!" experience will come to you.[/color]

      I loaded a few million records across the tables and ran some performance
      tests (multi-threaded). You're absolutely right about the minimal
      performance gains. As long as I was joining on primary keys, there was no
      significant boost by denormalizing, only extra columns to keep up with!

      --
      Josh





      Comment

      • Thomas Bartkus

        #4
        Re: Should I Denormalize?

        "Josh" <memphis75 - gmail> wrote in message
        news:115g0hjbfb 9grc3@corp.supe rnews.com...[color=blue]
        > Thomas - Thank you for taking the time to explain this!
        >[color=green]
        > > The entire (MySQL) system is designed to perform it's all around best
        > > using textbook classic normalized tables.[/color]
        >
        > This is exactly the information I was hoping to get.
        >[color=green]
        > > take it on faith because the "Ahah!" experience will come to you.[/color]
        >
        > I loaded a few million records across the tables and ran some performance
        > tests (multi-threaded). You're absolutely right about the minimal
        > performance gains. As long as I was joining on primary keys, there was no
        > significant boost by denormalizing, ....[/color]
        [color=blue]
        > .... only extra columns to keep up with! * * *[/color]

        Yup - that's one of the ahahs!
        Maintaining data integrity is going to be *so much easier and reliable* when
        you follow the rules.

        There will be more!
        Thomas Bartkus


        Comment

        • jerry gitomer

          #5
          Re: Should I Denormalize?

          Thomas Bartkus wrote:[color=blue]
          > "Josh" <memphis75 - gmail> wrote in message
          > news:115g0hjbfb 9grc3@corp.supe rnews.com...
          >[color=green]
          >>Thomas - Thank you for taking the time to explain this!
          >>
          >>[color=darkred]
          >>>The entire (MySQL) system is designed to perform it's all around best
          >>>using textbook classic normalized tables.[/color]
          >>
          >>This is exactly the information I was hoping to get.
          >>
          >>[color=darkred]
          >>>take it on faith because the "Ahah!" experience will come to you.[/color]
          >>
          >>I loaded a few million records across the tables and ran some performance
          >>tests (multi-threaded). You're absolutely right about the minimal
          >>performance gains. As long as I was joining on primary keys, there was no
          >>significant boost by denormalizing, ....[/color]
          >
          >[color=green]
          > > .... only extra columns to keep up with! * * *[/color]
          >
          > Yup - that's one of the ahahs!
          > Maintaining data integrity is going to be *so much easier and reliable* when
          > you follow the rules.
          >
          > There will be more!
          > Thomas Bartkus
          >
          >[/color]
          Another factor to consider when looking at performance is the
          cost of "throwing hardware at the problem". As RAM and hard
          disks both become less expensive and faster to access trying for
          performance improvements by denormalizing your database (aside
          from probably not yielding any significant gains) is short
          sighted and will result in additional maintenance costs that
          will exceed the cost of additional hardware.

          Comment

          Working...