Denormalization in DBMS

Overview

Denormalization is a database optimization technique used by database administrators to optimize the efficiency of their database by adding redundant (duplicate) data to one or more tables. This method can help us to avoid costly joins in a relational database made during normalization.

What is Denormalization?

Traditionally data is stored in normalized databases, in which multiple separate tables are maintained in a relational database to minimize the redundant data. Therefore, whenever we have to access data from multiple tables, we need to perform complex and costly join operations on the required tables.

Denormalization does not mean that the database is never normalized. It is just a method of optimizing databases that is implemented after normalization has been achieved.

Example

Here in the example, we have three tables Student table, Branch table and HOD table after normalization. The student table has roll_no, student_name, age, and branch_id as attributes.

roll_no	student_name	age	branch_id
101	Ash	18	1
102	Gary	16	3
103	Corey	19	4
104	James	23	2
105	George	20	3

Additionally, the Branch table has attributes such as branch_id, branch_name, total_students. The branch table is linked to the Student table via branch_id as the student table's foreign key.

branch_id	branch_name	total_students	hod_id
1	CSE	120	901
2	ME	90	902
3	ECE	60	903
4	EEE	30	904

And we also have HOD (Head of Department) table with attributes such as hod_id, hod_name, address. The HOD table is linked to the branch table via hod_id as the branch table's foreign key.

hod_id	hod_name	address
901	Mr. Max	Munich
902	Mr. Tyson	Tokyo
903	Mr. Andrew	Chicago
904	Mr. Rick	Vancouver

Suppose we want to retrieve all student names along with their branch name and hod name. For this to happen, a JOIN operation is required to join both student and branch tables and the result of this join is further joined with the hod table. This query will be a great option if the table is small but in case if the tables are big, joins on tables can take an excessively long time.

The SQL query to query the required data will be

Output

student_name	branch_name	hod_name
Ash	CSE	Mr. Max
Gary	ECE	Mr. Andrew
Corey	EEE	Mr. Rick
James	ME	Mr. Tyson
George	ECE	Mr. Andrew

So to counter the above situation, we can denormalize the database with redundant data and extra effort to maximize the efficiency benefits of fewer joins. Therefore, we can add the branch name, hod's name data from the Branch and HOD table respectively to the student table to optimize the database.

After denormalization, the table will look like

roll_no	student_name	age	branch_id	branch_name	hod_name
101	Ash	18	1	CSE	Mr. Max
102	Gary	16	3	ECE	Mr. Andrew
103	Corey	19	4	EEE	Mr. Rick
104	James	23	2	ME	Mr. Tyson
105	George	20	3	ECE	Mr. Andrew

Pros of Denormalization

Enhanced Query Performance Querying data from normalized databases requires joining multiple tables which slows down the query performance. To overcome this, redundancy is added to the database thus minimizing the number of joins.
More convenient to manage databases In a normalized database, required values are calculated while query execution takes a longer time thus slowing down the execution of the query. Thus to counter this, denormalization is done to fetch queries that can be simpler because we need to look at fewer tables.
Facilitate and accelerate reporting In normalized databases, if we want to regularly monitor revenues over a certain period, Generating such reports from live data will require "searching" throughout the entire database thus significantly slowing it down.

Cons of Denormalization

It takes a lot of storage space due to storing redundant data.
It is very expensive to maintain updates and inserts in a denormalized database, also code for insert and update is harder to write in this case.
Data Integrity is not maintained in denormalized databases, as there is redundancy so data is inconsistent.

How is Denormalization Different from Normalization?

Denormalization is a method to merge data from multiple relational tables into a single table in a database that can be retrieved quickly. While Normalization, on the other hand, is used to delete redundant/duplicate data from a database and replace it with non-redundant and reliable data.

Denormalization is used when joins are costly, and queries are run regularly on the tables. Normalization, on the other hand, is typically used when a large number of insert/update/delete operations are performed, and joins between those tables are not expensive.

Conclusion

Denormalization is a database optimization technique used by database administrators to optimize the efficiency of their database by adding redundant data to one or more tables.
The data from one table is included in another table to reduce the number of joins in the query and hence helps in speeding up the performance.
In database systems that require scalability, like that of any big tech company, we always use both normalized and denormalized databases.