Database Normalization and Table Structures

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • DirtNap
    New Member
    • Jun 2008
    • 3

    #46
    Unfortunately, this is not quite correct.

    To give a quick summary, the normal forms are as follows:

    1NF: Every row must be an identifiable relation. This means that, in a table, no row may be an exact duplicate of another row, and nor may the row be completely filled with NULL.
    2NF All non-key attributes of the table must depend on the entire key. This means that, when a table has a compound key, attributes which depend only on a subset of the key columns should be moved out of the table. For example, take the comment above which mentions a compound primary key of OrderID and LineNo; if the application wants to use invoice_bgcolor to alternate the background color of rows on an invoice, it should go outside of this table, because it depends on LineNo but not on OrderID
    3NF You have 3NF pretty much correct above.
    BCNF Ummm.... what you have up above is actually pretty close to 2NF, but not BCNF. Technically, you could meet the definition of 1NF, 2NF and 3NF by making attribute columns which violate a definition become part of the key -- but BCNF says not to do this. For example, take the OrderID/LineNo example, above. Maybe this company assigns not only an OrderID, but also a ParcelID, which is used internally by the warehouse. OrderID and ParcelID have a one to one relationship, and so both are unique w/r/t LineNo. If we used (OrderID, LineNo) as the key and used ParcelID as an attribute, however, we would violate the 2NF (because ParcelID does not depend on LineNo.) Technically, however, we could make our key (OrderID,LineNo ,ParcelID) -- this would still be a unique key, because ParcelID maps 1-1 to OrderID -- but it would violate the BCNF.
    4NF The 4NF says that, if you have a compound key, the key must contain only one relation within it. For example, think of a table called student_skills (student,langua ge,sport) This type of table design leads to rows like the following:
    • Johnny, English, Soccer
    • Johnny, English, Baseball
    • Johnny, Italian, Soccer
    • Johnny, Italian, Baseball

    ...or even worse, like this:
    • Johnny, English, Soccer
    • Johnny, Italian, Baseball

    Now, this is fine if Johnny plays soccer in an English speaking league, and plays baseball in an Italian speaking league, but if that's not the case, this should be broken into two different tables: student_languag es(student,lang uage) and student_sports( student,sport)
    5NF 5NF takes effect if Johnny does play in leagues with different languages. That's a poor example, but say there were a multi-relation table like student_league_ sports(student, league,sport) with these rows:
    • Johnny,league a, baseball
    • Johnny, league a, footbal
    • Johnny, league a, soccer
    • Johnny, league b, soccer
    • Johnny, league c, tennis
    • Freddie, league b, soccer
    • Freddie, league b, golf
    • Freddie, league c, tennis
    • Freddie, league c, golf

    Now, notice that, since this table already passes 4NF, we don't mean that Johnny plays in league c, and also Johnny plays tennis -- what we're saying is that "Johnny plays baseball, football, soccer and tennis, and he does so in leagues a, b, and c. Freddie plays soccer, tennis and golf, and does so in leagues b and c. In other words, Johnny will play baseball, football, soccer and tennis, but only in leagues in which he is a member. Freddie, likewise, will play soccer, golf, and tennis, only in leagues in which he is a member. Johnny does not play golf just because he is a member of leagues b and c, nor is Freddie a member of league a just because he plays soccer -- there is really a three value relationship here. It should therefore be represented by the following tables: student_sports( student, sport), student_leagues (student,league ) and league_sports(l eague,sport). The data would then look like this:
    • student_sports
      • Johnny,baseball
      • Johnny, football
      • Johnny, soccer
      • Johnny, tennis
      • Freddie, soccer
      • Freddie, football
      • Freddie, tennis
      • Freddie, golf
    • student_leagues
      • Johnny, a
      • Johnny, b
      • Johnny, c
      • Freddie, b
      • Freddie, c
    • league_sports
      • a,baseball
      • a, football
      • a, soccer
      • c, tennis
      • b, soccer
      • b, golf
      • c, golf


    Notice that in this case, Freddie would like to play football, but can't.

    Finally, as stated above, a table must meet all previous normal forms, ie: 2NF implies 1NF, 5NF implies 4NF, 3NF, BCNF, 2NF, 1NF, etc.

    A database is normalized if it is in 5NF. There is also a 6NF and a few more theoretical forms, but they came after the original work in the 70's.



    msquared, please don't take this the wrong way -- thanks for putting the effort in to help people understand relational theory, I just want to get the correct info out there, I'm not trying to rain on your parade.


    ...also, some of the suppositions about why you normalize are also wrong, but this has been a long enough post already :)

    Comment

    • MMcCarthy
      Recognized Expert MVP
      • Aug 2006
      • 14387

      #47
      Originally posted by DirtNap
      msquared, please don't take this the wrong way -- thanks for putting the effort in to help people understand relational theory, I just want to get the correct info out there, I'm not trying to rain on your parade.

      ...also, some of the suppositions about why you normalize are also wrong, but this has been a long enough post already :)
      No offense, taken. Bytes is a democracy where all view points are welcomed and encouraged. I may not agree with all your conclusions but if I've learned anything over the years it's that the more I know the more I find I have to learn.

      The only other point I would make is that the premise of this article was to simplify Normal Forms to help people understand Normalisation better. It's a complex subject that is often badly implemented by so called database designers. In a way this was a deliberate attempt to simplify the subject in the hopes of encouraging others to take the structural design of databases seriously.

      All comments are always welcome :D

      Mary

      Comment

      • mfisher5kavika
        New Member
        • Jun 2008
        • 1

        #48
        Originally posted by DirtNap
        Unfortunately, this is not quite correct.

        ...also, some of the suppositions about why you normalize are also wrong, but this has been a long enough post already :)
        I was thinking the exact same thing when I read the post. It's great that msquared posted this to help explain normalization, which hardly anyone seems to understand any more, but the forms weren't exactly correct. DirtNap explained it exactly as I learned them many years ago. Thanks for clearing it up.

        Comment

        • MMcCarthy
          Recognized Expert MVP
          • Aug 2006
          • 14387

          #49
          Originally posted by mfisher5kavika
          I was thinking the exact same thing when I read the post. It's great that msquared posted this to help explain normalization, which hardly anyone seems to understand any more, but the forms weren't exactly correct. DirtNap explained it exactly as I learned them many years ago. Thanks for clearing it up.
          Its strange, one of the things I remember is that the explanations I got for Normalisation in College a few years ago (Late Student :D) differed in my mind somewhat from how it was taught to me many years ago when I first studied the subject. That may be just my impression though.

          Comment

          • DirtNap
            New Member
            • Jun 2008
            • 3

            #50
            Originally posted by msquared
            Its strange, one of the things I remember is that the explanations I got for Normalisation in College a few years ago (Late Student :D) differed in my mind somewhat from how it was taught to me many years ago when I first studied the subject. That may be just my impression though.
            The stuff I posted has been the way it is since the mid 70's, and continues to be that way, although there are some new forms like DKNF and 6NF that are newer, and which I didn't cover (because they are not normally included when talking about "normalizin g" a database.) However, if you've learned about this before, say, 1998 and learned about it today, you would get a whole different flavor, because before 1998 you would not have a very large practical component in the training.

            Comment

            • MMcCarthy
              Recognized Expert MVP
              • Aug 2006
              • 14387

              #51
              Originally posted by DirtNap
              The stuff I posted has been the way it is since the mid 70's, and continues to be that way, although there are some new forms like DKNF and 6NF that are newer, and which I didn't cover (because they are not normally included when talking about "normalizin g" a database.) However, if you've learned about this before, say, 1998 and learned about it today, you would get a whole different flavor, because before 1998 you would not have a very large practical component in the training.
              It was pre '98 originally. (LOL) Now I'm exposing my age.

              In truth I normalise nowadays as a matter of habit and would have to think about what I do and why. I think based on the results I'm doing it right maybe just not explaining it right :)

              Comment

              • DirtNap
                New Member
                • Jun 2008
                • 3

                #52
                Originally posted by msquared
                It was pre '98 originally. (LOL) Now I'm exposing my age.

                In truth I normalise nowadays as a matter of habit and would have to think about what I do and why. I think based on the results I'm doing it right maybe just not explaining it right :)

                There are three real reasons to do this:
                1. Integrity constraints are only guaranteed to work on normalized data
                2. When you are writing a query, normalization gives you a mathematical certainty that your data will behave in a predictable way
                3. The database itself can learn from normaliztion.


                The first two are because SQL and integrity constraints are based on set theory, which is what normalization is based on. The third doesn't apply to access, but for bigger databases like Oracle or DB2 that have a sophisticated query engine, the database can imply things about some columns based on other columns or tables because the data is normalized.

                Of course, there are many advantages to selective denormalization , but that's a different article.

                Comment

                • MMcCarthy
                  Recognized Expert MVP
                  • Aug 2006
                  • 14387

                  #53
                  Originally posted by DirtNap
                  Of course, there are many advantages to selective denormalization , but that's a different article.


                  Agreed! I also concur with the three points you made in the previous post. The amount of database designers who use denormalization as a "way around" badly structured tables is something that annoys me. I try to explain to them that before you can denormalize first you have to normalize. Most don't get it.

                  Comment

                  • ramien
                    New Member
                    • Jun 2008
                    • 3

                    #54
                    Has anyone ever used the Pick Database? I was involved in programming for that database and found it to be so intuitive.
                    All data relative to the record was stored in the record, ie. multiple phone #'s all in one field.
                    File structure, wow, so different from anything currently in use. Basically you create a file to hold the records. Then the records are retrieved via a Hash #. Fantastically quick. B-trees also used to access records where the data is 'inside' the record; no need to search entire db to get records. SQL has the same thing, BUT, all the data is scattered because all the pertinent data isnt contained in the record.
                    Just my 2cents.
                    Marshall Neill

                    Comment

                    • MMcCarthy
                      Recognized Expert MVP
                      • Aug 2006
                      • 14387

                      #55
                      Originally posted by ramien
                      Has anyone ever used the Pick Database? I was involved in programming for that database and found it to be so intuitive.
                      All data relative to the record was stored in the record, ie. multiple phone #'s all in one field.
                      File structure, wow, so different from anything currently in use. Basically you create a file to hold the records. Then the records are retrieved via a Hash #. Fantastically quick. B-trees also used to access records where the data is 'inside' the record; no need to search entire db to get records. SQL has the same thing, BUT, all the data is scattered because all the pertinent data isnt contained in the record.
                      Just my 2cents.
                      Marshall Neill
                      I haven't encountered it but sounds like something interesting to "play around" with :)

                      Comment

                      • ramien
                        New Member
                        • Jun 2008
                        • 3

                        #56
                        Originally posted by msquared
                        I haven't encountered it but sounds like something interesting to "play around" with :)
                        I think they have a free version for testing at rainingdata.com
                        odd name i know. used to be Pick Systems. It was developed by Richard Pick and i think Mike Nelso back in 1967 for the Army and some helicopter database.

                        Comment

                        • MMcCarthy
                          Recognized Expert MVP
                          • Aug 2006
                          • 14387

                          #57
                          Originally posted by ramien
                          I think they have a free version for testing at rainingdata.com
                          odd name i know. used to be Pick Systems. It was developed by Richard Pick and i think Mike Nelso back in 1967 for the Army and some helicopter database.
                          I'll have to check it out. Thanks for the recommendation.

                          Mary

                          Comment

                          • ramien
                            New Member
                            • Jun 2008
                            • 3

                            #58
                            Originally posted by msquared
                            I'll have to check it out. Thanks for the recommendation.

                            Mary
                            My bad, Mike Nelson. n
                            You're quite welcome Mary

                            Comment

                            • mshmyob
                              Recognized Expert Contributor
                              • Jan 2008
                              • 903

                              #59
                              I finally got around to reading your article Mary (lol). Nice. I explain it a little different but the concept is there.

                              As for the people explaining anything above 3NF you are just confusing things. Anyone out there who is a bigginner you can stop at 3NF and design any database you want.

                              As for why you normalize it is simple - It helps to reduce data redundancy. Notice I said help not eliminate because that is impossible.

                              cheers,

                              Comment

                              • MMcCarthy
                                Recognized Expert MVP
                                • Aug 2006
                                • 14387

                                #60
                                Originally posted by mshmyob
                                I finally got around to reading your article Mary (lol). Nice. I explain it a little different but the concept is there.

                                As for the people explaining anything above 3NF you are just confusing things. Anyone out there who is a bigginner you can stop at 3NF and design any database you want.

                                As for why you normalize it is simple - It helps to reduce data redundancy. Notice I said help not eliminate because that is impossible.

                                cheers,
                                Thanks :D

                                Comment

                                Working...