Normalization in DBMS

Video Tutorial
FREE
 Introduction to Normalisation + 1NF and 2NF +examples thumbnail
This video belongs to
DBMS Course - Master the Fundamentals and Advanced Concepts
16 modules
Certificate
Topics Covered

Overview

Normalization is the process of organizing the data and the attributes of a database. It is performed to reduce the data redundancy in a database and to ensure that data is stored logically. Data redundancy in DBMS means having the same data but at multiple places. It is necessary to remove data redundancy because it causes anomalies in a database which makes it very hard for a database administrator to maintain it.

Why Do We Need Normalization?

As we have discussed above, normalization is used to reduce data redundancy. It provides a method to remove the following anomalies from the database and bring it to a more consistent state:

A database anomaly is a flaw in the database that occurs because of poor planning and redundancy.

  1. Insertion anomalies: This occurs when we are not able to insert data into a database because some attributes may be missing at the time of insertion.

  2. Updation anomalies: This occurs when the same data items are repeated with the same values and are not linked to each other.

  3. Deletion anomalies: This occurs when deleting one part of the data deletes the other necessary information from the database.

In the next section, we’ll see the types of normal forms in the database.

Normal Forms

There are four types of normal forms that are usually used in relational databases as you can see in the following figure:

Normal Forms

  1. 1NF: A relation is in 1NF if all its attributes have an atomic value.

  2. 2NF: A relation is in 2NF if it is in 1NF and all non-key attributes are fully functional dependent on the candidate key in DBMS.

  3. 3NF: A relation is in 3NF if it is in 2NF and there is no transitive dependency.

  4. BCNF: A relation is in BCNF if it is in 3NF and for every Functional Dependency, LHS is the super key.

To understand the above-mentioned normal forms, we first need to have an understanding of the functional dependencies.

Functional dependency is a relationship that exists between two sets of attributes of a relational table where one set of attributes can determine the value of the other set of attributes. It is denoted by X -> Y, where X is called a determinant and Y is called dependent.

There are various levels of normalizations. Let’s go through them one by one:

First Normal Form (1NF)

A relation is in 1NF if every attribute is a single-valued attribute or it does not contain any multi-valued or composite attribute, i.e., every attribute is an atomic attribute. If there is a composite or multi-valued attribute, it violates the 1NF. To solve this, we can create a new row for each of the values of the multi-valued attribute to convert the table into the 1NF.

Let’s take an example of a relational table <EmployeeDetail> that contains the details of the employees of the company.

<EmployeeDetail>

Employee CodeEmployee NameEmployee Phone Number
101John98765623,998234123
101John89023467
102Ryan76213908
103Stephanie98132452

Here, the Employee Phone Number is a multi-valued attribute. So, this relation is not in 1NF.

To convert this table into 1NF, we make new rows with each Employee Phone Number as a new row as shown below:

<EmployeeDetail>

Employee CodeEmployee NameEmployee Phone Number
101John998234123
101John98765623
101John89023467
102Ryan76213908
103Stephanie98132452

Second Normal Form (2NF)

The normalization of 1NF relations to 2NF involves the elimination of partial dependencies. A partial dependency in DBMS exists when any non-prime attributes, i.e., an attribute not a part of the candidate key, is not fully functionally dependent on one of the candidate keys.

For a relational table to be in second normal form, it must satisfy the following rules:

  1. The table must be in first normal form.
  2. It must not contain any partial dependency, i.e., all non-prime attributes are fully functionally dependent on the primary key.

If a partial dependency exists, we can divide the table to remove the partially dependent attributes and move them to some other table where they fit in well.

Let us take an example of the following <EmployeeProjectDetail> table to understand what is partial dependency and how to normalize the table to the second normal form:

<EmployeeProjectDetail>

Employee CodeProject IDEmployee NameProject Name
101P03JohnProject103
101P01JohnProject101
102P04RyanProject104
103P02StephanieProject102

In the above table, the prime attributes of the table are Employee Code and Project ID. We have partial dependencies in this table because Employee Name can be determined by Employee Code and Project Name can be determined by Project ID. Thus, the above relational table violates the rule of 2NF.

The prime attributes in DBMS are those which are part of one or more candidate keys.

To remove partial dependencies from this table and normalize it into second normal form, we can decompose the <EmployeeProjectDetail> table into the following three tables:

<EmployeeDetail>

Employee CodeEmployee Name
101John
101John
102Ryan
103Stephanie

<EmployeeProject>

Employee CodeProject ID
101P03
101P01
102P04
103P02

<ProjectDetail>

Project IDProject Name
P03Project103
P01Project101
P04Project104
P02Project102

Thus, we’ve converted the <EmployeeProjectDetail> table into 2NF by decomposing it into <EmployeeDetail>, <ProjectDetail> and <EmployeeProject> tables. As you can see, the above tables satisfy the following two rules of 2NF as they are in 1NF and every non-prime attribute is fully dependent on the primary key.

The relations in 2NF are clearly less redundant than relations in 1NF. However, the decomposed relations may still suffer from one or more anomalies due to the transitive dependency. We will remove the transitive dependencies in the Third Normal Form.

Third Normal Form (3NF)

The normalization of 2NF relations to 3NF involves the elimination of transitive dependencies in DBMS.

A functional dependency X -> Z is said to be transitive if the following three functional dependencies hold:

  • X -> Y
  • Y does not -> X
  • Y -> Z

For a relational table to be in third normal form, it must satisfy the following rules:

  1. The table must be in the second normal form.
  2. No non-prime attribute is transitively dependent on the primary key.
  3. For each functional dependency X -> Z at least one of the following conditions hold:
  • X is a super key of the table.
  • Z is a prime attribute of the table.

If a transitive dependency exists, we can divide the table to remove the transitively dependent attributes and place them to a new table along with a copy of the determinant.

Let us take an example of the following <EmployeeDetail> table to understand what is transitive dependency and how to normalize the table to the third normal form:

<EmployeeDetail>

Employee CodeEmployee NameEmployee ZipcodeEmployee City
101John110033Model Town
101John110044Badarpur
102Ryan110028Naraina
103Stephanie110064Hari Nagar

The above table is not in 3NF because it has Employee Code -> Employee City transitive dependency because:

  • Employee Code -> Employee Zipcode
  • Employee Zipcode -> Employee City

Also, Employee Zipcode is not a super key and Employee City is not a prime attribute.

To remove transitive dependency from this table and normalize it into the third normal form, we can decompose the <EmployeeDetail> table into the following two tables:

<EmployeeDetail>

Employee CodeEmployee NameEmployee Zipcode
101John110033
101John110044
102Ryan110028
103Stephanie110064

<EmployeeLocation>

Employee ZipcodeEmployee City
110033Model Town
110044Badarpur
110028Naraina
110064Hari Nagar

Thus, we’ve converted the <EmployeeDetail> table into 3NF by decomposing it into <EmployeeDetail> and <EmployeeLocation> tables as they are in 2NF and they don’t have any transitive dependency.

The 2NF and 3NF impose some extra conditions on dependencies on candidate keys and remove redundancy caused by that. However, there may still exist some dependencies that cause redundancy in the database. These redundancies are removed by a more strict normal form known as BCNF.

Boyce-Codd Normal Form (BCNF)

Boyce-Codd Normal Form(BCNF) is an advanced version of 3NF as it contains additional constraints compared to 3NF.

For a relational table to be in Boyce-Codd normal form, it must satisfy the following rules:

  1. The table must be in the third normal form.
  2. For every non-trivial functional dependency X -> Y, X is the superkey of the table. That means X cannot be a non-prime attribute if Y is a prime attribute.

A superkey is a set of one or more attributes that can uniquely identify a row in a database table.

Let us take an example of the following <EmployeeProjectLead> table to understand how to normalize the table to the BCNF:

<EmployeeProjectLead>

Employee CodeProject IDProject Leader
101P03Grey
101P01Christian
102P04Hudson
103P02Petro

The above table satisfies all the normal forms till 3NF, but it violates the rules of BCNF because the candidate key of the above table is {Employee Code, Project ID}. For the non-trivial functional dependency, Project Leader -> Project ID, Project ID is a prime attribute but Project Leader is a non-prime attribute. This is not allowed in BCNF.

To convert the given table into BCNF, we decompose it into three tables:

<EmployeeProject>

Employee CodeProject ID
101P03
101P01
102P04
103P02

<ProjectLead>

Project LeaderProject ID
GreyP03
ChristianP01
HudsonP04
PetroP02

Thus, we’ve converted the <EmployeeProjectLead> table into BCNF by decomposing it into <EmployeeProject> and <ProjectLead> tables.

Conclusion

  • Normal forms are a mechanism to remove redundancy and optimize database storage.
  • In 1NF, we check for atomicity of the attributes of a relation.
  • In 2NF, we check for partial dependencies in a relation.
  • In 3NF, we check for transitive dependencies in a relation.
  • In BCNF, we check for the superkeys in LHS of all functional dependencies.

Read More: