Unfortunately, this is not quite correct.
To give a quick summary, the normal forms are as follows:
1NF: Every row must be an identifiable relation. This means that, in a table, no row may be an exact duplicate of another row, and nor may the row be completely filled with NULL.
2NF All non-key attributes of the table must depend on the entire key. This means that, when a table has a compound key, attributes which depend only on a subset of the key columns should be moved out of the table. For example, take the comment above which mentions a compound primary key of OrderID and LineNo; if the application wants to use invoice_bgcolor to alternate the background color of rows on an invoice, it should go outside of this table, because it depends on LineNo but not on OrderID
3NF You have 3NF pretty much correct above.
BCNF Ummm.... what you have up above is actually pretty close to 2NF, but not BCNF. Technically, you could meet the definition of 1NF, 2NF and 3NF by making attribute columns which violate a definition become part of the key -- but BCNF says not to do this. For example, take the OrderID/LineNo example, above. Maybe this company assigns not only an OrderID, but also a ParcelID, which is used internally by the warehouse. OrderID and ParcelID have a one to one relationship, and so both are unique w/r/t LineNo. If we used (OrderID, LineNo) as the key and used ParcelID as an attribute, however, we would violate the 2NF (because ParcelID does not depend on LineNo.) Technically, however, we could make our key (OrderID,LineNo ,ParcelID) -- this would still be a unique key, because ParcelID maps 1-1 to OrderID -- but it would violate the BCNF.
4NF The 4NF says that, if you have a compound key, the key must contain only one relation within it. For example, think of a table called student_skills (student,langua ge,sport) This type of table design leads to rows like the following:
...or even worse, like this:
Now, this is fine if Johnny plays soccer in an English speaking league, and plays baseball in an Italian speaking league, but if that's not the case, this should be broken into two different tables: student_languag es(student,lang uage) and student_sports( student,sport)
5NF 5NF takes effect if Johnny does play in leagues with different languages. That's a poor example, but say there were a multi-relation table like student_league_ sports(student, league,sport) with these rows:
Now, notice that, since this table already passes 4NF, we don't mean that Johnny plays in league c, and also Johnny plays tennis -- what we're saying is that "Johnny plays baseball, football, soccer and tennis, and he does so in leagues a, b, and c. Freddie plays soccer, tennis and golf, and does so in leagues b and c. In other words, Johnny will play baseball, football, soccer and tennis, but only in leagues in which he is a member. Freddie, likewise, will play soccer, golf, and tennis, only in leagues in which he is a member. Johnny does not play golf just because he is a member of leagues b and c, nor is Freddie a member of league a just because he plays soccer -- there is really a three value relationship here. It should therefore be represented by the following tables: student_sports( student, sport), student_leagues (student,league ) and league_sports(l eague,sport). The data would then look like this:
Notice that in this case, Freddie would like to play football, but can't.
Finally, as stated above, a table must meet all previous normal forms, ie: 2NF implies 1NF, 5NF implies 4NF, 3NF, BCNF, 2NF, 1NF, etc.
A database is normalized if it is in 5NF. There is also a 6NF and a few more theoretical forms, but they came after the original work in the 70's.
msquared, please don't take this the wrong way -- thanks for putting the effort in to help people understand relational theory, I just want to get the correct info out there, I'm not trying to rain on your parade.
...also, some of the suppositions about why you normalize are also wrong, but this has been a long enough post already :)
To give a quick summary, the normal forms are as follows:
1NF: Every row must be an identifiable relation. This means that, in a table, no row may be an exact duplicate of another row, and nor may the row be completely filled with NULL.
2NF All non-key attributes of the table must depend on the entire key. This means that, when a table has a compound key, attributes which depend only on a subset of the key columns should be moved out of the table. For example, take the comment above which mentions a compound primary key of OrderID and LineNo; if the application wants to use invoice_bgcolor to alternate the background color of rows on an invoice, it should go outside of this table, because it depends on LineNo but not on OrderID
3NF You have 3NF pretty much correct above.
BCNF Ummm.... what you have up above is actually pretty close to 2NF, but not BCNF. Technically, you could meet the definition of 1NF, 2NF and 3NF by making attribute columns which violate a definition become part of the key -- but BCNF says not to do this. For example, take the OrderID/LineNo example, above. Maybe this company assigns not only an OrderID, but also a ParcelID, which is used internally by the warehouse. OrderID and ParcelID have a one to one relationship, and so both are unique w/r/t LineNo. If we used (OrderID, LineNo) as the key and used ParcelID as an attribute, however, we would violate the 2NF (because ParcelID does not depend on LineNo.) Technically, however, we could make our key (OrderID,LineNo ,ParcelID) -- this would still be a unique key, because ParcelID maps 1-1 to OrderID -- but it would violate the BCNF.
4NF The 4NF says that, if you have a compound key, the key must contain only one relation within it. For example, think of a table called student_skills (student,langua ge,sport) This type of table design leads to rows like the following:
- Johnny, English, Soccer
- Johnny, English, Baseball
- Johnny, Italian, Soccer
- Johnny, Italian, Baseball
...or even worse, like this:
- Johnny, English, Soccer
- Johnny, Italian, Baseball
Now, this is fine if Johnny plays soccer in an English speaking league, and plays baseball in an Italian speaking league, but if that's not the case, this should be broken into two different tables: student_languag es(student,lang uage) and student_sports( student,sport)
5NF 5NF takes effect if Johnny does play in leagues with different languages. That's a poor example, but say there were a multi-relation table like student_league_ sports(student, league,sport) with these rows:
- Johnny,league a, baseball
- Johnny, league a, footbal
- Johnny, league a, soccer
- Johnny, league b, soccer
- Johnny, league c, tennis
- Freddie, league b, soccer
- Freddie, league b, golf
- Freddie, league c, tennis
- Freddie, league c, golf
Now, notice that, since this table already passes 4NF, we don't mean that Johnny plays in league c, and also Johnny plays tennis -- what we're saying is that "Johnny plays baseball, football, soccer and tennis, and he does so in leagues a, b, and c. Freddie plays soccer, tennis and golf, and does so in leagues b and c. In other words, Johnny will play baseball, football, soccer and tennis, but only in leagues in which he is a member. Freddie, likewise, will play soccer, golf, and tennis, only in leagues in which he is a member. Johnny does not play golf just because he is a member of leagues b and c, nor is Freddie a member of league a just because he plays soccer -- there is really a three value relationship here. It should therefore be represented by the following tables: student_sports( student, sport), student_leagues (student,league ) and league_sports(l eague,sport). The data would then look like this:
- student_sports
- Johnny,baseball
- Johnny, football
- Johnny, soccer
- Johnny, tennis
- Freddie, soccer
- Freddie, football
- Freddie, tennis
- Freddie, golf
- student_leagues
- Johnny, a
- Johnny, b
- Johnny, c
- Freddie, b
- Freddie, c
- league_sports
- a,baseball
- a, football
- a, soccer
- c, tennis
- b, soccer
- b, golf
- c, golf
Notice that in this case, Freddie would like to play football, but can't.
Finally, as stated above, a table must meet all previous normal forms, ie: 2NF implies 1NF, 5NF implies 4NF, 3NF, BCNF, 2NF, 1NF, etc.
A database is normalized if it is in 5NF. There is also a 6NF and a few more theoretical forms, but they came after the original work in the 70's.
msquared, please don't take this the wrong way -- thanks for putting the effort in to help people understand relational theory, I just want to get the correct info out there, I'm not trying to rain on your parade.
...also, some of the suppositions about why you normalize are also wrong, but this has been a long enough post already :)
Comment