Concurrency Control in DBMS

Overview

When several transactions execute concurrently without any rules or protocols, various problems can arise that harm the integrity of the database. These problems are known as concurrency control problems. Therefore, several rules are designed to maintain consistency in transactions while they execute concurrently; these rules are known as concurrency control protocols.

What is concurrency control in DBMS?

A transaction is a single logical unit of work that can retrieve or modify the data of a database. Executing transactions one at a time increases the waiting time for the other transactions and delays the overall execution. Hence, to increase throughput and reduce waiting time, transactions are executed concurrently.

Example: Suppose 5 trains have to travel between two railway stations, A and B. If the trains are lined up and only one train is allowed to move from station A to B while the others wait for it to reach its destination, it will take a long time for all the trains to complete the journey. To reduce the total time, all the trains should be allowed to move concurrently from station A to B, while ensuring there is no risk of collision between them.

When several transactions execute simultaneously, there is a risk of violating the integrity of the data. Concurrency Control in DBMS is the procedure of managing simultaneous transactions while ensuring their atomicity, isolation, consistency, and serializability.

Concurrent Execution in DBMS

  • In a multi-user system, multiple users can access and use the same database at the same time, which is known as concurrent execution of the database. It means that the same database is used simultaneously by different users on a multi-user system.
  • While working on database transactions, multiple users may need to use the database to perform different operations, and in that case, concurrent execution of the database takes place.
  • This simultaneous execution should be done in an interleaved manner, and no operation should affect the other executing operations, so that the consistency of the database is maintained. However, concurrent execution of transaction operations gives rise to several challenging problems that need to be solved.

Concurrency Control Problems

The problems that arise when numerous transactions execute simultaneously in an uncontrolled manner are referred to as Concurrency Control Problems.

Dirty Read Problem

The dirty read problem in DBMS occurs when a transaction reads data that has been updated by another transaction that has not yet committed. It arises when multiple uncommitted transactions execute simultaneously.

Example: Consider two transactions A and B performing read/write operations on a data item DT in the database DB. The current value of DT is 1000. The following table shows the read/write operations in transactions A and B.

Time | A             | B
T1   | READ(DT)      | ------
T2   | DT = DT + 500 | ------
T3   | WRITE(DT)     | ------
T4   | ------        | READ(DT)
T5   | ------        | COMMIT
T6   | ROLLBACK      | ------

Transaction A reads the value of data item DT as 1000 and modifies it to 1500, which is stored in the temporary buffer. Transaction B then reads DT as 1500 and commits, so the value of DT is permanently changed to 1500 in the database DB. Later, a server error occurs in transaction A and it rolls back to the initial value, i.e., 1000. Transaction B has thus acted on a value that was never committed, which is the dirty read problem.
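The anomaly is easy to reproduce in code. The following is a minimal Python sketch, not tied to any real DBMS API; the Database class and its methods are illustrative assumptions. It replays the schedule above under the assumption that uncommitted writes are visible to other readers, which is exactly what makes the dirty read possible.

```python
class Database:
    """Toy store keeping committed values separate from uncommitted writes.
    Reads see uncommitted values too, which is what allows dirty reads."""
    def __init__(self, data):
        self.committed = dict(data)
        self.uncommitted = {}          # item -> (value, writing transaction)

    def read(self, item):
        if item in self.uncommitted:   # dirty read: prefer the uncommitted value
            return self.uncommitted[item][0]
        return self.committed[item]

    def write(self, item, value, txn):
        self.uncommitted[item] = (value, txn)

    def commit(self, txn):
        for item, (value, writer) in list(self.uncommitted.items()):
            if writer == txn:
                self.committed[item] = value
                del self.uncommitted[item]

    def rollback(self, txn):
        for item, (_, writer) in list(self.uncommitted.items()):
            if writer == txn:
                del self.uncommitted[item]


db = Database({"DT": 1000})

dt = db.read("DT")               # T1: A reads 1000
db.write("DT", dt + 500, "A")    # T2-T3: A writes 1500, still uncommitted
b_val = db.read("DT")            # T4: B reads 1500, a dirty read
db.write("DT", b_val, "B")       # B carries the dirty value forward...
db.commit("B")                   # T5: ...and commits, making 1500 permanent
db.rollback("A")                 # T6: A rolls back, but the damage is done
print(db.committed["DT"])        # 1500 instead of the expected 1000
```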

Unrepeatable Read Problem

The unrepeatable read problem occurs when a transaction reads two or more different values of the same data item during its read operations, because another transaction modifies the data in between the reads.

Example: Consider two transactions A and B performing read/write operations on a data item DT in the database DB. The current value of DT is 1000. The following table shows the read/write operations in transactions A and B.

Time | A             | B
T1   | READ(DT)      | ------
T2   | ------        | READ(DT)
T3   | DT = DT + 500 | ------
T4   | WRITE(DT)     | ------
T5   | ------        | READ(DT)

Transactions A and B initially read the value of DT as 1000. Transaction A then modifies DT from 1000 to 1500, and when transaction B reads the value again, it finds 1500. Transaction B thus obtains two different values of DT from its two read operations.

Phantom Read Problem

In the phantom read problem, the same data is read through two different read operations in the same transaction. The first read operation returns a value, but the second one returns an error saying that the data does not exist, because another transaction deleted it in between.

Example: Consider two transactions A and B performing read/write operations on a data item DT in the database DB. The current value of DT is 1000. The following table shows the read/write operations in transactions A and B.

Time | A          | B
T1   | READ(DT)   | ------
T2   | ------     | READ(DT)
T3   | DELETE(DT) | ------
T4   | ------     | READ(DT)

Transaction B initially reads the value of DT as 1000. Transaction A then deletes the data item DT from the database DB, and when transaction B reads the value again, it receives an error saying that DT does not exist in the database DB.

Lost Update Problem

The Lost Update problem arises when two different transactions update the same data item and one update overwrites the other, so that one of the updates is lost.

Example: Consider two transactions A and B performing read/write operations on a data item DT in the database DB. The current value of DT is 1000. The following table shows the read/write operations in transactions A and B.

Time | A             | B
T1   | READ(DT)      | ------
T2   | DT = DT + 500 | ------
T3   | WRITE(DT)     | ------
T4   | ------        | DT = DT + 300
T5   | ------        | WRITE(DT)
T6   | READ(DT)      | ------

Transaction A initially reads the value of DT as 1000 and modifies it from 1000 to 1500, and then transaction B modifies the value to 1800. When transaction A reads DT again, it finds 1800 instead of the value it wrote, so the update made by transaction A has been lost.
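To make the loss explicit, here is a minimal Python sketch of the canonical lost-update race, in which both transactions read DT before either one writes (slightly different from the schedule above); the dictionary db and the variable names are purely illustrative.

```python
db = {"DT": 1000}

a_val = db["DT"]          # transaction A reads 1000
b_val = db["DT"]          # transaction B also reads 1000

db["DT"] = a_val + 500    # A writes 1500
db["DT"] = b_val + 300    # B writes 1300, overwriting A's write

print(db["DT"])           # 1300: the +500 update made by A is lost
```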

Incorrect Summary Problem

The Incorrect Summary problem occurs when a transaction computes an incorrect sum of two data items. This happens when a transaction sums data items using an aggregate function while another transaction changes the value of one of those items in between.

Example: Consider two transactions A and B performing read/write operations on two data items DT1 and DT2 in the database DB. The current value of DT1 is 1000 and of DT2 is 2000. The following table shows the read/write operations in transactions A and B.

Time | A               | B
T1   | READ(DT1)       | ------
T2   | add = 0         | ------
T3   | add = add + DT1 | ------
T4   | ------          | READ(DT2)
T5   | ------          | DT2 = DT2 + 500
T6   | READ(DT2)       | ------
T7   | add = add + DT2 | ------

Transaction A reads the value of DT1 as 1000. It computes an aggregate sum of the two data items DT1 and DT2 in the variable add, but in between, the value of DT2 is changed from 2000 to 2500 by transaction B. The variable add therefore uses the modified value of DT2 and gives a resulting sum of 3500 instead of 3000.
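The miscalculation can be reproduced in a few lines of Python; the dictionary db and the variable add below are illustrative stand-ins for the database and the aggregate accumulator.

```python
db = {"DT1": 1000, "DT2": 2000}

add = 0
add = add + db["DT1"]          # T1-T3: transaction A adds DT1 (1000) to its sum

db["DT2"] = db["DT2"] + 500    # T4-T5: transaction B updates DT2 to 2500 in between

add = add + db["DT2"]          # T6-T7: A adds the already-modified DT2
print(add)                     # 3500, although the sum when A started was 3000
```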

Concurrency Control Protocols

To avoid concurrency control problems and to maintain consistency and serializability during the execution of concurrent transactions, some rules are made. These rules are known as Concurrency Control Protocols.

Lock-Based Protocols

To attain consistency, isolation between transactions is the most important tool. Isolation is achieved by preventing a transaction from performing read/write operations on a data item while another transaction is using it; this is known as locking. Through lock-based protocols, the desired operations are allowed to run freely while the conflicting operations are locked out.

There are two kinds of locks used in Lock-based protocols:

Shared Lock(S): The locks which allow read operations but disable write operations on a data item are known as shared locks. They are also known as read-only locks and are represented by 'S'.

Exclusive Lock(X): The locks which allow both read and write operations on a data item are known as exclusive locks. Only one transaction can hold an exclusive lock on a data item at a time, so it cannot be granted to two transactions for the same item simultaneously. They are represented by 'X'.
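The compatibility between the two lock modes can be summarised in a small sketch: shared locks coexist with other shared locks, while an exclusive lock conflicts with every other lock. The Python LockTable class below is an illustrative assumption, not a real DBMS interface.

```python
class LockTable:
    def __init__(self):
        self.locks = {}   # item -> {"mode": "S" or "X", "holders": set of txn ids}

    def acquire(self, txn, item, mode):
        entry = self.locks.get(item)
        if entry is None:
            self.locks[item] = {"mode": mode, "holders": {txn}}
            return True
        # Only shared locks are compatible with other shared locks;
        # any request that involves an exclusive lock conflicts.
        if mode == "S" and entry["mode"] == "S":
            entry["holders"].add(txn)
            return True
        return False      # the caller must wait (or abort, depending on the protocol)

    def release(self, txn, item):
        entry = self.locks.get(item)
        if entry and txn in entry["holders"]:
            entry["holders"].discard(txn)
            if not entry["holders"]:
                del self.locks[item]


lt = LockTable()
print(lt.acquire("A", "DT", "S"))   # True:  A reads DT under a shared lock
print(lt.acquire("B", "DT", "S"))   # True:  shared locks are compatible
print(lt.acquire("C", "DT", "X"))   # False: the exclusive lock must wait
```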

There are four kinds of lock-based protocols:

Simplistic Lock Protocol: This protocol instructs the transaction to lock the data item before it is updated, so that no other operation can be performed on that data. The transaction unlocks the data item as soon as the write operation is completed.

Pre-claiming Lock Protocol: According to the pre-claiming lock protocol, the transaction first assesses the operations it is going to perform and prepares a list of the data items on which it needs locks. It then requests all of these locks from the system before starting execution. If all the locks are granted, the operations in the transaction run smoothly and the locks are returned to the system on completion. If all the locks are not granted, the transaction rolls back.

Two-phase Locking Protocol: This protocol consists of three phases. The transaction starts its execution with the first (growing) phase, in which it acquires the locks it needs. Once the locks are granted, the second phase begins, in which the transaction holds all of its locks. The third (shrinking) phase begins when the transaction releases its first lock; from that point on, the transaction only releases locks as its operations complete and acquires no new ones.

Strict Two-Phase Locking Protocol: Strict 2PL is almost similar to 2PL. The only difference is that strict 2PL does not release the locks just after the execution of the operations; instead, it holds all the locks and releases them only when the commit is triggered. Check out this article to learn more about Lock-Based Protocols.
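The growing/shrinking discipline of two-phase locking, with the commit-time release used by strict 2PL, can be sketched as follows. The class, the global lock registry, and the error messages are illustrative assumptions; a real lock manager would also handle lock modes, waiting, and deadlock detection.

```python
locked_items = {}      # item -> id of the transaction currently holding the lock

class TwoPhaseTransaction:
    def __init__(self, txn_id):
        self.txn = txn_id
        self.held = set()
        self.shrinking = False     # becomes True once the first lock is released

    def lock(self, item):
        # Growing phase: no new lock may be taken after any lock has been released.
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after an unlock")
        if locked_items.get(item, self.txn) != self.txn:
            raise RuntimeError(f"{self.txn} must wait: {item} is locked")
        locked_items[item] = self.txn
        self.held.add(item)

    def unlock(self, item):
        # The first release starts the shrinking phase.
        self.shrinking = True
        self.held.discard(item)
        locked_items.pop(item, None)

    def commit(self):
        # Under strict 2PL, all locks are kept until commit and released here.
        for item in list(self.held):
            locked_items.pop(item, None)
        self.held.clear()


t = TwoPhaseTransaction("A")
t.lock("DT1")
t.lock("DT2")        # still in the growing phase: allowed
t.unlock("DT1")      # shrinking phase begins
try:
    t.lock("DT3")    # violates the two-phase rule
except RuntimeError as err:
    print(err)       # 2PL violation: lock requested after an unlock
t.commit()           # releases the remaining lock on DT2
```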

Time-based Protocols

According to this protocol, every transaction has a timestamp attached to it, based on the time at which the transaction enters the system. In addition, every data item carries a read timestamp and a write timestamp, which record the time of the latest read and write operations performed on that item, respectively.

Timestamp Ordering Protocol:

The timestamp ordering protocol uses the timestamp values of the transactions to resolve conflicting pairs of operations, thus ensuring serializability among transactions. The following are the notations used to define the protocol for a transaction A operating on the data item DT:

Terms                           | Denotations
Timestamp of transaction A      | TS(A)
Read timestamp of data item DT  | R-timestamp(DT)
Write timestamp of data item DT | W-timestamp(DT)

Following are the rules on which the timestamp-ordering protocol works:

  1. When transaction A is going to perform a read operation on data item DT:
    • TS(A) < W-timestamp(DT): The transaction will roll back. Transaction A entered the system before the latest write on DT, so it would read a value written "in its future", and the read is rejected.
    • TS(A) >= W-timestamp(DT): The read operation will be executed, because transaction A entered the system at or after the latest write on DT.
    • If the read is executed, R-timestamp(DT) is updated to the maximum of its current value and TS(A).
  2. When transaction A is going to perform a write operation on data item DT:
    • TS(A) < R-timestamp(DT): The transaction will roll back, because the latest read of DT was performed by a transaction that entered the system after A, and A's write would invalidate a value that has already been read.
    • TS(A) < W-timestamp(DT): The transaction will roll back, because DT has already been updated by a transaction that entered the system after A, so A's write is out of date.
    • In all other cases, the write operation is executed and W-timestamp(DT) is updated to TS(A).

Thomas' Write Rule: This rule alters the timestamp-ordering protocol to make the schedule view serializable. In the case TS(A) < W-timestamp(DT), the basic timestamp-ordering protocol would roll the transaction back; according to Thomas' Write Rule, such an obsolete write operation is simply ignored instead.
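The read and write rules above, together with Thomas' Write Rule, can be sketched in a few lines of Python. The function names and the dictionary standing in for the data item DT are illustrative assumptions; ts, r_ts, and w_ts correspond to TS(A), R-timestamp(DT), and W-timestamp(DT).

```python
def read_item(ts, item):
    if ts < item["w_ts"]:
        return "ROLLBACK"                   # DT was already overwritten by a later transaction
    item["r_ts"] = max(item["r_ts"], ts)    # record the latest read
    return "EXECUTE"

def write_item(ts, item, thomas_rule=False):
    if ts < item["r_ts"]:
        return "ROLLBACK"                   # a later transaction already read the old value
    if ts < item["w_ts"]:
        # basic protocol: roll back; Thomas' Write Rule: the write is obsolete, so ignore it
        return "IGNORE" if thomas_rule else "ROLLBACK"
    item["w_ts"] = ts
    return "EXECUTE"

DT = {"r_ts": 0, "w_ts": 0}
print(read_item(5, DT))                      # EXECUTE, R-timestamp(DT) becomes 5
print(write_item(3, DT))                     # ROLLBACK: TS(A)=3 < R-timestamp(DT)=5
print(write_item(7, DT))                     # EXECUTE, W-timestamp(DT) becomes 7
print(write_item(6, DT, thomas_rule=True))   # IGNORE: the obsolete write is skipped
```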

Validation Based Protocol

Under this protocol, a transaction executes in the following three phases:

Read phase: In this phase, the transaction reads the values of the data items and stores the results of all of its operations in a local buffer. No modification is made to the database.

Validation phase: In this phase, validation tests are performed to check whether the values held in the local buffer can replace the original values in the database without harming serializability.

Validation Test: A validation test is performed on a transaction A executing concurrently with a transaction B such that TS(A) < TS(B). The transactions must satisfy one of the following conditions:

  1. Finish(A) < Start(B): Transaction A finishes all of its operations before transaction B starts, so the serializability order is trivially maintained.
  2. Start(B) < Finish(A) < Validate(B): Transaction A finishes after B starts but before B validates; in this case, the set of data items written by transaction A must not intersect with the set of data items read by transaction B.

Write phase: If the transaction passes the tests of the validation phase, the buffered values are copied to the database; otherwise, the transaction rolls back.
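The validation test itself is small enough to sketch directly. The Python function below checks the two conditions listed above for a pair of transactions with TS(A) < TS(B); the field names (start, validate, finish, read_set, write_set) are illustrative assumptions rather than a standard API.

```python
def validate(a, b):
    """Return True if transaction B passes validation against an earlier
    transaction A, i.e. TS(A) < TS(B)."""
    # Condition 1: A finished before B started, so the serial order is trivially kept.
    if a["finish"] < b["start"]:
        return True
    # Condition 2: A finished before B validates, and A wrote nothing that B read.
    if b["start"] < a["finish"] < b["validate"]:
        return not (a["write_set"] & b["read_set"])
    return False

A = {"start": 1, "validate": 4, "finish": 5,
     "read_set": {"DT1", "DT2"}, "write_set": set()}
B = {"start": 2, "validate": 6, "finish": 7,
     "read_set": {"DT1", "DT2"}, "write_set": {"DT1", "DT2"}}

print(validate(A, B))   # True: A wrote nothing, so B's reads are unaffected
```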

Example: Consider two transactions A and B performing read/write operations on two data items DT1 and DT2 in the database DB. The current value of DT1 is 1000 and of DT2 is 2000. The following table shows the read/write operations in transactions A and B.

Time | A                | B
T1   | READ(DT1)        | ------
T2   | ------           | READ(DT1)
T3   | ------           | DT1 = DT1 - 100
T4   | ------           | READ(DT2)
T5   | ------           | DT2 = DT2 + 500
T6   | ------           | READ(DT2)
T7   | ------           | <VALIDATE>
T8   | PRINT(DT2 - DT1) | ------
T9   | <VALIDATE>       | ------
T10  | ------           | WRITE(DT1)
T11  | ------           | WRITE(DT2)

The schedule passes the validation test of the validation phase because the timestamp of transaction B is less than that of transaction A. Note that the write operations are performed only after both transactions have been validated; all the operations before the final writes are carried out in the local buffer.

Conclusion

  • Concurrency Control in DBMS is a procedure of managing simultaneous transactions ensuring their atomicity, isolation, consistency, and serializability.
  • Several problems that arise when numerous transactions execute simultaneously in a random manner are referred to as concurrency control problems.
  • The dirty read problem occurs when a transaction reads the data that has been updated by another transaction that is still uncommitted.
  • The unrepeatable read problem occurs when two or more different values of the same data are read during the read operations in the same transaction.
  • The phantom read problem occurs when data that has already been read gets deleted by another transaction, and a later read operation on it returns an error.
  • The Lost Update problem arises when two different transactions update the same data item and one update overwrites the other.
  • The Incorrect Summary problem occurs when a transaction computes an incorrect sum of data items because another transaction modifies one of them in between.
  • To maintain consistency and serializability during the execution of concurrent transactions some rules are made. These rules are known as concurrency control protocols.
  • Lock-based protocols prevent a transaction from performing read or write operations on a data item until the required lock is acquired.
  • Timestamp-based protocols associate a timestamp with every transaction and execute conflicting operations in timestamp order.
  • Validation-based protocols first record the new values of the data in a local buffer and then decide, during validation, whether those values should be written to the database.
