The Basics of Database Normalization

How to normalize a database

If you've been working with databases for a while, chances are you've heard the term normalization. Perhaps someone's asked you, "Is that database normalized?" or "Is that in BCNF?" Normalization is often considered a luxury only academics have time for. However, knowing the principles of normalization and applying them to your daily database design tasks isn't all that complicated, and it can drastically reduce redundant data and the inconsistencies that come with it.

In this article, we'll introduce the concept of normalization and take a brief look at the most common normal forms. 

What Is Normalization?

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, the same data stored in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals, as they reduce the amount of space a database consumes and ensure that data is logically stored.

The Normal Forms

The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF, along with the occasional 4NF. The fifth normal form is very rarely seen and won't be discussed in this article.

Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines only. Occasionally, it becomes necessary to stray from them to meet practical business requirements. However, when you do deviate from them, it's imperative to evaluate any possible ramifications for your system and account for potential inconsistencies. That said, let's explore the normal forms.

First Normal Form (1NF)

First normal form (1NF) sets the fundamental rules for an organized database:

  • Eliminate duplicative columns from the same table, so that each column holds a single value rather than a repeating group.
  • Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
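As an illustration of the 1NF rules above, here is a minimal sketch using Python's standard-library sqlite3 module and an in-memory database. The customers and customer_phones tables, their columns, and the data are hypothetical, chosen only for this example: a repeating group of phone1, phone2, and phone3 columns is moved into its own table, whose primary key is (customer_id, phone).

```python
import sqlite3

# Hypothetical, unnormalized rows: phone1..phone3 form a repeating group of columns.
UNNORMALIZED = [
    # (customer_id, name, phone1, phone2, phone3)
    (1, "Alice", "555-0100", "555-0101", None),
    (2, "Bob", "555-0200", None, None),
]


def to_first_normal_form(rows):
    """Load rows into a 1NF schema: one table per group of related data, each with a primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE customer_phones ("
        " customer_id INTEGER NOT NULL REFERENCES customers(customer_id),"
        " phone TEXT NOT NULL,"
        " PRIMARY KEY (customer_id, phone))"
    )
    for customer_id, name, *phones in rows:
        conn.execute("INSERT INTO customers VALUES (?, ?)", (customer_id, name))
        for phone in phones:
            if phone is not None:  # unused slots in the repeating group are not stored
                conn.execute(
                    "INSERT INTO customer_phones VALUES (?, ?)", (customer_id, phone)
                )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = to_first_normal_form(UNNORMALIZED)
    for row in conn.execute(
        "SELECT customer_id, phone FROM customer_phones ORDER BY customer_id, phone"
    ):
        print(row)
```

Each phone number becomes its own row, so a customer with a fourth number needs one more row rather than a new phone4 column.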

Second Normal Form (2NF)

Second normal form (2NF) further addresses the concept of removing duplicative data:

  • Meet all the requirements of the first normal form.
  • Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
  • Create relationships between these new tables and their predecessors through the use of foreign keys.
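To see what 2NF removes, consider a hypothetical order_items table keyed by (order_id, product_id) that also stores product_name. The name depends only on product_id, which is just part of the key, so it repeats on every order line for that product. The sketch below, again using Python's sqlite3 module with invented table names and data, moves that subset of data into a products table and links it back with a foreign key.

```python
import sqlite3

# Hypothetical rows keyed by (order_id, product_id). product_name depends only on
# product_id (part of the key), so "Widget" is repeated for every order that buys it.
ORDER_ITEMS = [
    # (order_id, product_id, product_name, quantity)
    (100, 7, "Widget", 2),
    (100, 9, "Gadget", 1),
    (101, 7, "Widget", 5),
]


def to_second_normal_form(rows):
    """Split product data into its own table and reference it by foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.execute(
        "CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE order_items ("
        " order_id INTEGER NOT NULL,"
        " product_id INTEGER NOT NULL REFERENCES products(product_id),"
        " quantity INTEGER NOT NULL,"
        " PRIMARY KEY (order_id, product_id))"
    )
    for order_id, product_id, product_name, quantity in rows:
        # Each product is stored once; later duplicates are ignored.
        # (Assumes the source data never gives one product_id two different names.)
        conn.execute(
            "INSERT OR IGNORE INTO products VALUES (?, ?)", (product_id, product_name)
        )
        conn.execute(
            "INSERT INTO order_items VALUES (?, ?, ?)", (order_id, product_id, quantity)
        )
    conn.commit()
    return conn


# A join rebuilds the original rows, so the split loses no information.
REJOIN = """
    SELECT oi.order_id, oi.product_id, p.product_name, oi.quantity
    FROM order_items AS oi JOIN products AS p ON p.product_id = oi.product_id
    ORDER BY oi.order_id, oi.product_id
"""
```

Renaming a product now means updating a single row in products rather than every matching order line.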

Third Normal Form (3NF)

Third normal form (3NF) goes one significant step further:

  • Meet all the requirements of the second normal form.
  • Remove columns that are not directly dependent upon the primary key (for example, a column that depends on another non-key column).
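A common 3NF violation is a transitive dependency: in a hypothetical employees table that also stores dept_name, the name depends on dept_id rather than on the employee's key. A minimal SQLite sketch, with table names and rows invented for illustration, keeps the department name in a separate departments table so that renaming a department touches exactly one row.

```python
import sqlite3


def build_third_normal_form():
    """Create hypothetical 3NF tables: dept_name lives only in departments."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE departments (
            dept_id   INTEGER PRIMARY KEY,
            dept_name TEXT NOT NULL
        );
        -- employees keeps only the foreign key, not the department's name,
        -- so no non-key column depends on another non-key column.
        CREATE TABLE employees (
            employee_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            dept_id     INTEGER NOT NULL REFERENCES departments(dept_id)
        );
        INSERT INTO departments VALUES (10, 'Sales'), (20, 'Support');
        INSERT INTO employees VALUES (1, 'Ann', 10), (2, 'Raj', 10), (3, 'Li', 20);
        """
    )
    return conn


def rename_department(conn, dept_id, new_name):
    """Rename a department; returns how many rows the UPDATE changed."""
    cur = conn.execute(
        "UPDATE departments SET dept_name = ? WHERE dept_id = ?", (new_name, dept_id)
    )
    conn.commit()
    return cur.rowcount
```

Had dept_name been stored on every employee row, the same rename would have had to update both Sales employees, and missing one would leave the data inconsistent.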

Boyce-Codd Normal Form (BCNF or 3.5NF)

The Boyce-Codd Normal Form, also referred to as the "third and a half (3.5) normal form," adds one more requirement:

  • Meet all the requirements of the third normal form.
  • Every determinant must be a candidate key.
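The BCNF rule can be checked mechanically once the functional dependencies are written down. The Python sketch below assumes each dependency is given as a pair of attribute sets (determinant, dependents); the helper names fd, closure, and bcnf_violations are hypothetical. It computes the attribute closure of each determinant and flags any non-trivial dependency whose determinant does not determine every attribute of the table, that is, is not a superkey, which is how the "every determinant must be a candidate key" rule is usually formalized.

```python
from typing import FrozenSet, Iterable, List, Tuple

# A functional dependency as (determinant, dependent attributes).
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Hypothetical helper: fd("a b", "c") means {a, b} -> {c}."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the determinant is covered, its dependents are covered too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(attributes: Iterable[str], fds: List[FD]) -> List[FD]:
    """Non-trivial dependencies whose determinant is not a superkey."""
    all_attrs = frozenset(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not all_attrs <= closure(lhs, fds)
    ]


if __name__ == "__main__":
    # Classic example: each instructor teaches one course, and a student takes a
    # given course from one instructor. The table is in 3NF but not in BCNF.
    attrs = {"student", "course", "instructor"}
    fds = [fd("student course", "instructor"), fd("instructor", "course")]
    print(bcnf_violations(attrs, fds))
```

For the instructor example, only instructor -> course is reported, because instructor is a determinant but not a candidate key. Splitting the data into (instructor, course) and (student, instructor) removes the violation, at the known cost that {student, course} -> instructor can no longer be enforced by a key constraint alone.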

Fourth Normal Form (4NF)

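Finally, fourth normal form (4NF) has one additional requirement:

  • Meet all the requirements of the Boyce-Codd normal form.
  • A table must not contain multivalued dependencies: two or more independent, multivalued facts about the same entity belong in separate tables.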

Remember, these normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database.
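Fourth normal form targets multivalued dependencies, where one key is associated with two or more independent sets of values. In the hypothetical example below, each course has a set of teachers and, independently, a set of textbooks. Stored in a single table, every teacher must be paired with every textbook; a 4NF design keeps one table per independent fact, and joining them on course reproduces the single table exactly. This is a plain-Python sketch over invented data, not a feature of any particular database.

```python
from itertools import product

# Hypothetical data: teachers and textbooks are independent multivalued facts about a course.
TEACHERS = {"Physics": ["Brown", "Green", "White"], "Math": ["Green"]}
TEXTBOOKS = {
    "Physics": ["Mechanics", "Optics", "Thermodynamics"],
    "Math": ["Algebra", "Calculus"],
}


def single_table():
    """Non-4NF design: one (course, teacher, textbook) row for every combination."""
    return sorted(
        (course, teacher, book)
        for course in TEACHERS
        for teacher, book in product(TEACHERS[course], TEXTBOOKS[course])
    )


def fourth_normal_form_tables():
    """4NF design: a separate table for each independent multivalued fact."""
    course_teachers = sorted((c, t) for c, ts in TEACHERS.items() for t in ts)
    course_textbooks = sorted((c, b) for c, bs in TEXTBOOKS.items() for b in bs)
    return course_teachers, course_textbooks


def join_on_course(course_teachers, course_textbooks):
    """Natural join on course: rebuilds the single-table rows without loss."""
    return sorted(
        (c1, teacher, book)
        for c1, teacher in course_teachers
        for c2, book in course_textbooks
        if c1 == c2
    )


if __name__ == "__main__":
    teachers, textbooks = fourth_normal_form_tables()
    print(len(single_table()), len(teachers), len(textbooks))
    print(join_on_course(teachers, textbooks) == single_table())
```

With this data the single table needs 11 rows against 4 + 5 in the split design, and adding one more Physics textbook would cost three rows (one per Physics teacher) in the single table but only one row in course_textbooks.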

Should I Normalize?

While database normalization is often a good idea, it's not an absolute requirement. There are some cases where deliberately violating the rules of normalization is a good practice, for example, duplicating data in read-heavy reporting tables to avoid expensive joins.

If you'd like to ensure your database is normalized, start by learning how to put your database into First Normal Form.
