Denormalization in DBMS

Learn via video course
FREE
View all courses
DBMS Course - Master the Fundamentals and Advanced Concepts
DBMS Course - Master the Fundamentals and Advanced Concepts
by Srikanth Varma
1000
5
Start Learning
DBMS Course - Master the Fundamentals and Advanced Concepts
DBMS Course - Master the Fundamentals and Advanced Concepts
by Srikanth Varma
1000
5
Start Learning
Topics Covered

Overview

Denormalization is a database optimization technique used by database administrators to optimize the efficiency of their database by adding redundant (duplicate) data to one or more tables. This method can help us to avoid costly joins in a relational database made during normalization.

What is Denormalization?

Traditionally data is stored in normalized databases, in which multiple separate tables are maintained in a relational database to minimize the redundant data. Therefore, whenever we have to access data from multiple tables, we need to perform complex and costly join operations on the required tables.

Denormalization does not mean that the database is never normalized. It is just a method of optimizing databases that is implemented after normalization has been achieved.

Example

Here in the example, we have three tables Student table, Branch table and HOD table after normalization. The student table has roll_no, student_name, age, and branch_id as attributes.

roll_nostudent_nameagebranch_id
101Ash181
102Gary163
103Corey194
104James232
105George203

Additionally, the Branch table has attributes such as branch_id, branch_name, total_students. The branch table is linked to the Student table via branch_id as the student table's foreign key.

branch_idbranch_nametotal_studentshod_id
1CSE120901
2ME90902
3ECE60903
4EEE30904

And we also have HOD (Head of Department) table with attributes such as hod_id, hod_name, address. The HOD table is linked to the branch table via hod_id as the branch table's foreign key.

hod_idhod_nameaddress
901Mr. MaxMunich
902Mr. TysonTokyo
903Mr. AndrewChicago
904Mr. RickVancouver

Suppose we want to retrieve all student names along with their branch name and hod name. For this to happen, a JOIN operation is required to join both student and branch tables and the result of this join is further joined with the hod table. This query will be a great option if the table is small but in case if the tables are big, joins on tables can take an excessively long time.

The SQL query to query the required data will be

Output

student_namebranch_namehod_name
AshCSEMr. Max
GaryECEMr. Andrew
CoreyEEEMr. Rick
JamesMEMr. Tyson
GeorgeECEMr. Andrew

So to counter the above situation, we can denormalize the database with redundant data and extra effort to maximize the efficiency benefits of fewer joins. Therefore, we can add the branch name, hod's name data from the Branch and HOD table respectively to the student table to optimize the database.

After denormalization, the table will look like

roll_nostudent_nameagebranch_idbranch_namehod_name
101Ash181CSEMr. Max
102Gary163ECEMr. Andrew
103Corey194EEEMr. Rick
104James232MEMr. Tyson
105George203ECEMr. Andrew

Pros of Denormalization

  1. Enhanced Query Performance Querying data from normalized databases requires joining multiple tables which slows down the query performance. To overcome this, redundancy is added to the database thus minimizing the number of joins.

  2. More convenient to manage databases In a normalized database, required values are calculated while query execution takes a longer time thus slowing down the execution of the query. Thus to counter this, denormalization is done to fetch queries that can be simpler because we need to look at fewer tables.

  3. Facilitate and accelerate reporting In normalized databases, if we want to regularly monitor revenues over a certain period, Generating such reports from live data will require "searching" throughout the entire database thus significantly slowing it down.

Cons of Denormalization

  1. It takes a lot of storage space due to storing redundant data.
  2. It is very expensive to maintain updates and inserts in a denormalized database, also code for insert and update is harder to write in this case.
  3. Data Integrity is not maintained in denormalized databases, as there is redundancy so data is inconsistent.

How is Denormalization Different from Normalization?

Denormalization is a method to merge data from multiple relational tables into a single table in a database that can be retrieved quickly. While Normalization, on the other hand, is used to delete redundant/duplicate data from a database and replace it with non-redundant and reliable data.

Denormalization is used when joins are costly, and queries are run regularly on the tables. Normalization, on the other hand, is typically used when a large number of insert/update/delete operations are performed, and joins between those tables are not expensive.

Conclusion

  • Denormalization is a database optimization technique used by database administrators to optimize the efficiency of their database by adding redundant data to one or more tables.
  • The data from one table is included in another table to reduce the number of joins in the query and hence helps in speeding up the performance.
  • In database systems that require scalability, like that of any big tech company, we always use both normalized and denormalized databases.

Read Also: