Currently Browsing: Databases

Database Normalization

Normalizing any data is the key to successful database development. There are many techniques and approaches to this but the main aim is always to divide the given data into the smallest logical groups and avoid any duplication of any kind.

The main thing is we to understand how the data we’re tracking relate to each other.

Consider grade. Can a student have a grade all by itself? No, the student has to be in a class in order to eventually receive a grade. It also follows that grade is going to depend on which student we’re talking about, and even that’s not enough; we have to know which class this particular student took to determine the grade. After all, the same student can flunk English while passing Math, right? Therefore, one possible logical conclusion is that grade is an attribute of the junction Student-Course and thus belongs in the table representing such junction. You will want to read up on many-many relationship about how this actually works out.

A table in a relational database is said to be in a certain normal form if it satisfies certain constraints. Edgar F. Codd’s original work defined three such forms but there are now other generally accepted normal forms. We give here a short informal overview of the most common ones. Each normal form below represents a stronger condition than the previous one (in the order below). For most practical purposes, databases are considered normalized if they adhere to third normal form.

  • First Normal Form (or 1NF) requires that all column values in a table are atomic (e.g., a number is an atomic value, while a list or a set is not). For example, normalization eliminates repeating groups by putting each into a separate table and connecting them with a primary key-foreign key relationship.
  • Second Normal Form (or 2NF) requires that there are no non-trivial functional dependencies of a non-key attribute on a part of a candidate key.
  • Third Normal Form (or 3NF) requires that there are not non-trivial functional dependencies of non-key attributes on something else than a superset of a candidate key.
  • Boyce-Codd Normal Form (or BCNF) requires that there are no non-trival functional dependencies of attributes on something else than a superset of a candidate key. At this stage, all attributes are dependent on a key, a whole key and nothing but a key (excluding trivial dependencies, like A->A).
  • Fourth Normal Form (or 4NF) requires that there no non-trivial multi-valued dependencies of attribute sets on something else than a superset of a candidate key.
  • Fifth Normal Form (or 5NF or PJ/NF) requires that there are no non-trivial join dependencies that do not follow from the key constraints.
  • Domain-Key Normal Form (or DK/NF) requires that all constraints follow from the domain and the key constraints.