B Tree in Data Structure

Overview

B tree is an m-way tree which self balances itself. Such trees are extensively used to organize and manage gigantic datasets and ease searches due to their balanced nature.

Takeaways

Complexity of B tree
- Time complexity - O(log n).

Introduction to B Tree in Data Structure

Binary trees can have a maximum of 2 children while a Multiway Search Tree, commonly known as a M-way tree can yield a maximum of m children where m>2. Due to their structure, M-way trees are mainly used in external searching, i.e. in situations where data is to be retreived from secondary storage like disk files. Following is how an m-way tree looks like.

Introduction to B Trees

As we know, access time for secondary storage is much higher than that of primary storage, and thus our aim is to minimize the number of file accesses. To do so, the height of the tree is reduced by forming an m-way tree. Lesser the height, more efficient is the external search.

Although the height of an M-way tree is less, there is a chance for further reduction which can be achieved by balancing the tree. That's where B tree comes in. A B Tree (height Balanced m-way search Tree) is a special type of M-way tree which balances itself.

The figure above is an example of a B Tree of order 5. It has [6,17] at the root. 4 that is lesser than 6 falls in the left child. 12 being lesser than 17 and greater than 6 is the middle child. [19,22] that are greater than 17 are the rightmost child. The same process follows as we go down the tree.

Important Property of B Tree in Data Structure

A B Tree of order m can be defined as an m-way search tree which is either empty, ot satisfies the following properties:

All leaf nodes of a B tree are at the same level, i.e. they have the same depth (height of the tree).
The keys of each node of a B tree (in case of multiple keys), should be stored in the ascending order.
In a B tree, all non-leaf nodes (except root node) should have at least m/2 children.
All nodes (except root node) should have at least m/2 - 1 keys.
If the root node is a leaf node (only node in the tree), then it will have no children and will have at least on ekey. If the root node is a non-leaf node, then it will have at least 2 children and at least one key.
A non-leaf node with n-1 key values should have n non NULL children.

Important Property of B Tree

Re-read the given points while checking out the above image. As you can see, all the leaf nodes are at the same height. Each node doesn't have multiple values mandatorily, but if it does, then they will be stored in ascending order. Eg: In the figure given, the keys are stored as [19,22] (ascending order). Considering the order of the tree of 4, so each node (except root) has at least 1 key.

From the properties, we can conclude that any node (except the root) in B tree is at least half full and this avoids wastage of space. The B tree is perfectly balanced so that the number of nodes accessed to find a key becomes less.

The value n or the order of the B tree is a degree that helps you determine the number of children a node can hold in a B tree. If the order of the B tree =3, then minimum number of children would be 2 and maximum number would be 3. Similarly, if the order of the B tree is 4, the minimum and maximum number of children would be 2 and 4 respectively.

Q. Is the tree given in the following image a B Tree?

Ans: No. Since the leaf nodes are not at the same level, we can say that this is not a B Tree.

Why Do You Need B Tree Data Structure?

The B tree data structure is needed due to the following reasons:

B Tree is an extension of M-way tree. While B trees are self-balanced, M way trees can be balanced, skewed or any way. In case of external storage, there is a need for faster access. An M-way tree can help ease searches for external storage more efficiently than a normal Binary Search Tree. However, due to the self-balancing nature of a B tree, it had fewer levels and thus the access-time is cut short to a huge extent by a B Tree.
A B tree facilitates ordered sequential access and simplifies insertion and deletion of records when there are millions of records. This is possible due to the reduced height and balanced nature of the B tree.

Operations on a B Tree in Data Structure

In order to ensure that none of the properties of a B tree are violated during the operations, the B tree may be split or joined.

Searching Operation on B Tree in Data Structure

Searching in a B Tree is similar to that in a Binary Search Tree. However, unlike a BST instead of making a 2-way decision (left or right), a B tree makes a n-way decision at each node where n is the number of children of the node.

Steps to search an element in B Tree in Data Structure

The search starts from the root node. Compare the search element k with the root.
1.1. If root node contains k, search completes.
1.2. If k < the first value in the root, we move to the leftmost chld and search the child recursively.
1.3.1. If the root has just 2 children, then move to the rightmost child and search the child recursively.
1.3.2. If the root has more than 2 keys, then search the rest.
If k is not found after traversing the whole tree, then the search element is not present in the B Tree.

Let's see this with the help of an example. Suppose that we want to search for a key k=32 in the B tree shown below.

We first check the root and realize that k=32 is not present here. So we move onto the children of the root node. Since it has 2 children here, we compare between the 2 keys.
Since k>31, go to the right child of the root node.
Compare k with the values of the node, i.e. 34 and 45. Since k<34 (i.e. the left key), we move to the left child of 34.
Since the value in left child of 34 = k = 32, we can say that the key k is found and search is complete.

In this particular example, we compared the key with 4 values until we found it. The time complexity required for the search operation in a B Tree is thus $O(log n)$ .

Inserting Operation on B Tree in Data Structure

While inserting an element to B tree, we have to first check whether the element is already present in the tree. If the element is found, then the insertion doesn't take place. If not, we will proceed further with the insertion. 2 cases need to be taken care of while inserting the key in the leaf node:

Node does not contain more than MAX number of keys - Simply insert the key in the node at its proper place.
Node contains MAX number of keys - Key is inserted to the full node and then a split takes place splitting the full node into 3 parts. The second or median key moves to the parent node and the first and third parts act as the left and right children. The process is repeated with the parent node if it contains MAX number of keys too.

Steps to insert an element in B Tree in Data Structure

Calculate the maximum number of keys in the node based on the order of the B tree.
If the tree is empty, a root node is allocated and the key is inserted and acts as the root node.
Search the appropriate node for insertion.`
If the node is full:
4.1. Insert the elements in increasing order.
4.2. If the elements are greater than the maximum number of keys, split at the median.
4.3. Push the median key upwards and split the left and right keys as left and right child respectively.
If the node is not full, insert the node in increasing order.

Suppose we have to create a B tree of order 4. The elements to be inserted are 4, 2, 20, 10, 1, 14, 7, 11, 3, 8.

Since m=3, max number of keys for a node = m-1 = 2.

Insert 4
Since 2<4, insert 2 to the left of 4 in the same node.
Since 20>4, insert 20 to the right of 4 in the same node. As we now, maximum number of keys in the node are 2, one of these keys will have to be moved to a node above to split it. 4 being the middle element will move up and 2 and 20 will be its left and right nodes respectively.
10>4 and 10<20 and thus, 10 will be inserted as a key in the node that contains 20 as a key.
Since 1<2, it will be inserted as a key in the node that contains 2 as a key.
14>10 and 14<20. Since the number of keys in that node exceeds the maximum number of keys, the node will split after the middle key moves upto the node in the above line. Thus, 14 gets added to the right of 4 in the node that contains 4, and 10 and 20 are split as 2 separate nodes.
Since 7<10, it gets inserted to the left in the node that contains 10 as a key.
11<14 and 11>10. Thus, 11 should get added to the right of the node that contains 7 and 10. However, since the maximum number of keys in the tree are 2, a split should take place. Thus, the middle element 10 moves to the above node and 7 and 11 split as separate nodes. The above node now contains 4, 10 and 14. Since the count of keys exceeds the maximum key count, there would be a split there. Now, 10 is the root node with 4 and 14 as its children.
Since 3<4 and 3>2,it gets inserted to the right of the node containing 1, 2. This node exceeds the maximum count of keys in a node, leading to a split. 2 is added to the upper node beside 4.
Since 8>7 and 8<10, it gets added to the left of the node that contains 7 as a key.

In this particular example, the number of comparisons at each step varied. The first value was directly entered, thereafter every value had to be compared with the nodes present in the tree. The time complexity for insertion in a B Tree is dependent on the number of nodes and thus, O(log n).

The diagram below shows the insertions in order.

Diagram of Insertion in a B Tree

Deletion Operation on B Tree in Data Structure

During insertion, we had to ensure that the number of keys in the node doesn't cross a maximum. Similarly, during deletion, we need to ensure that the number of keys in the node after deletion doesn't go below the minimum number of keys that a node can hold. Thus, in case of deletion, a combination process takes place instead of a split.

Steps to delete an element in B Tree in Data Structure

Deletion can be studied using 2 cases:

Deletion from a leaf node.
Deletion from a non-leaf node.

The 2 cases can be represented by the following flowchart:

Steps to delete an element in B Tree

Case 1 - Deletion from Leaf Node

1.1. If the node has more than MIN keys - Deletion of key does not violate any property and thus the key can be deleted easily by shifting other keys of the node, if required.

1.2. If the node has less than MIN keys - This kind of deletion violates a property of B tree. In case the keys in the left sibling are greater than MIN, keys are borrowed from there. If the keys in right sibling are greater than MIN, then keys are borrowed from there. If either of these do not hold true, then a combination of node takes place with either of the siblings.

Case 2 - Deletion from Non-Leaf Node

In this case, the successor key (smallest key in the right subtree) is copied at the place of the key to be deleted and then the successor is deleted. This case further reduces to Case 1, i.e. deletion from a leaf node.

Now, let's understand deletion using an example. Consider the following B tree of order 4 with the nodes 5, 12, 32 and 53 to be deleted in the given order.

Since the order of the B tree is 4, the minimum and maximum number of keys in a node are 2 and 4 respectively.

Deletion from Non-Leaf Node

Step 1 - Deleting 5

Since 5 is a key in a leaf node with keys>MIN, this would be Case 1.1. A simple deletion with key shift would be done. Keys 9 and 10 are shifted left to fill the gap created by the deletion of 5. Deleting Five

Step 2 - Deleting 12

Here, key 12 is to be deleted from node [12,17]. Since this node has only MIN keys, we will try to borrow from its left sibling [2,9,10] which has more than MIN keys. The parent of these nodes [11,21] contains the separator key 11. So, the last key of left sibling (10) is moved to the place of the separator key and the separator key is moved to the underflow node (the node where deletion took place). The resulting tree after deletion can be found as follows: Deleting Twelve

Step 3 - Deleting 32

Here, key 32 is to be deleted from node [32, 41]. Since this node has only MIN keys and does not have a left sibling, we will try to borrow from its right sibling [53, 61, 64] which has more than MIN keys. The parent of these nodes [51, 67] contains the separator key 51. So, the first key of right sibling (53) is moved to the place of the separator key and the separator key is moved to the underflow node (the node where deletion took place). The resulting tree after deletion can be found as follows:

Deleting Thrity Two

Step 4 - Deleting 53

Here key 53 is to be deleted from node [53, 67] which is a non-leaf node. In such a case, the successor key (61) will be copied in place of 53 and now the task reduces to deletion of 61 from the leaf node. Since this node would have less than MIN keys, we check for the left sibling. Since the left sibling has only MIN keys, we move to the right sibling. The leftmost key of the right sibling (68) moves to the parent node and replaces the separator (67) while the separator shifts to the underflow node making it [64,67]. The resulting tree after deletion can be found as follows:

Deleting Fifty Three

The time complexity here varies dependent on the location from where the key has to be deleted. The average time complexity required for the deletion operation in a B Tree is thus O(log n).

Application of B Tree

Database or File System - Consider having to search details of a person in Yellow Pages (directory containing details of professionals). Such a system where you need to search a record among millions can be done using B Tree.
Search Engines and Multilevel Indexing - B tree Searching is used in search engines and also querying in MySql Server.
To store blocks of data - B Tree helps in faster storing blocks of data in secondary storage due to the balanced structure of B Tree.

B Tree Operation Code in Java / C++

Java

Creation of Node in B Tree

Search Operation in B Tree

Insertion Operation in B Tree

Deletion Operation in B Tree

Display B Tree

C++

Creation of Node in B Tree

Search Operation in B Tree

Insertion Operation in B Tree

Deletion Operation in B Tree

Display Tree in B Tree

Time Complexity of B Tree

In a B tree, disk access times are much lower than other trees like AVL Tree, Red-Black Tree, etc. due to its low height. If we consider n to be the total number of elements in a B tree, the best case time complexity for any of the common operations, i.e. Search, insert and delete will be O(log n).

Unlock the Power of Efficient Coding! Join Our DSA Course Today and Transform Your Problem-Solving Abilities.

Summary

The B Tree is a data structure that aids in data operations such as insertion, deletion, and traversal.
Disk access times in B trees are much lower due to the low height of these trees.
B trees are extremely useful in real-world scenarios requiring large datasets due to its balanced structure.