Open Addressing | Closed Hashing

Learn via video course
FREE
View all courses
DSA Problem Solving for Interviews using Java
DSA Problem Solving for Interviews using Java
by Jitender Punia
1000
4.9
Start Learning
DSA Problem Solving for Interviews using Java
DSA Problem Solving for Interviews using Java
by Jitender Punia
1000
4.9
Start Learning
Topics Covered

Open Addressing, also known as closed hashing, is a simple yet effective way to handle collisions in hash tables. Unlike chaining, it stores all elements directly in the hash table. This method uses probing techniques like Linear, Quadratic, and Double Hashing to find space for each key, ensuring easy data management and retrieval in hash tables.


The main concept of Open Addressing hashing is to keep all the data in the same hash table and hence a bigger Hash Table is needed. When using open addressing, a collision is resolved by probing (searching) alternative cells in the hash table until our target cell (empty cell while insertion, and cell with value xx while searching xx) is found. It is advisable to keep load factor (α\alpha) below 0.50.5, where α\alpha is defined as α=n/m\alpha=n/m where nn is the total number of entries in the hash table and mm is the size of the hash table. As explained above, since all the keys are stored in the same hash table so it's obvious that α1\alpha\leq1 because nmn\leq m always. If in case a collision happens then, alternative cells of the hash table are checked until the target cell is found. More formally,

  • Cells h0(x),h1(x),h2(x)....hn(x)h_0(x), h_1(x), h_2(x) .... h_n(x) are tried consecutively until the target cell has been found in the hash table. Where hi(x)=(hash(x)+f(i))%Size,h_i(x)=(hash(x) + f(i))\%Size, keeping f(0)=0f(0)=0.
  • The collision function ff is decided according to method resolution strategy.

There are three main Method Resolution Strategies --

  1. Linear Probing
  2. Quadratic Probing
  3. Double Hashing

Linear Probing

In linear probing, collisions are resolved by searching the hash table consecutively (with wraparound) until an empty cell is found. The definition of collision function ff is quite simple in linear probing. As suggested by the name it is a linear function of ii or simply f(i)=if(i) = i. Operations in linear probing collision resolution technique -

  • For inserting xx we search for the cells hash(x)+0,hash(x)+1,...hash(x)+khash(x)+0, hash(x)+1, ... hash(x)+k until we found a empty cell to insert xx.
  • For searching xx we again search for the cells hash(x)+0,hash(x)+1,...hash(x)+khash(x)+0, hash(x)+1, ... hash(x)+k until we found a cell with value xx. If we found a cell that has never been occupied it means xx is not present in the hash table.
  • For deletion, we repeat the search process if a cell is found with value xx we replace the value xx with a predefined unique value (say \infty) to denote that this cell has contained some value in past.

Example of linear probing -

Table Size = 77 Hash Function - hash(key)=key%7hash(key)=key\%7 Collision Resoulution Strategy - f(i)=if(i)=i

  • Insert - 16,40,27,9,7516, 40, 27, 9, 75
  • Search - 75,2175, 21
  • Delete - 4040

Steps involved are

  • Step 1 - Make an empty hash table of size 7. Empty hash table of size 7

  • Step 2 - Inserting 16,4016, 40, and 2727.

    • hash(16)=16%7=2hash(16)=16\%7=2
    • hash(40)=40%7=5hash(40)=40\%7=5
    • hash(27)=27%7=6hash(27)=27\%7=6

As we do not get any collision we can easily insert values at their respective indexes generated by the hash function.
After inserting, the hash table will look like - hash table After insertion

  • Step 3 - Inserting 9 and 75.
    • hash(9)=9%7=2hash(9)=9\%7=2 But at index 22 already 1616 is placed and hence collision occurs so as per linear probing we will search for consecutive cells till we find an empty cell.
      So we will probe for hash(9)+1hash(9)+1 i.e.i.e. cell 3, since the next cell i.e. 3i.e. \space 3 is not occupied we place 9 in cell 33.
    • hash(75)=75%7=5hash(75)=75\%7=5 Again collision happens because 40 is already placed in cell 55. So will search for the consecutive cells, so we search for cell 66 which is also occupied then we will search for cell (hash(75)+2)%7 i.e. 0(hash(75)+2)\%7\space i.e. \space 0 which is empty so we will place 75 there.

After inserting 99 and 7575 hash table will look like - After inserting 9 and 75 hash table

  • Step 4 - Search 7575 and 2121 -
    • hash(75)=75%7=5hash(75)=75\%7=5 But at index 5,755, 75 is not present so we search for consecutive cells until we found an empty cell or a cell with a value of 7575. So we search in cell 66 but it does not contain 7575, so we search for 7575 in cell 00 and we stop our search here as we have found 7575.
    • h(21)=21%7=0h(21)=21\%7=0 We will search for 2121 in cell 00 but it contains 75 so we will search in the next cell hash(21)+1,hash(21)+1, i.e. 1i.e. \space 1 since it is found empty it is clear that 2121 do not exist in our table.
  • Step 5 - Delete 4040
    • h(40)=40%7=5h(40)=40\%7=5 Firstly we search for 4040 which results in a successful search as we get 4040 in cell 55 then we will remove 4040 from cell 55 and replace it with a unique value (say -ash\infty).

After all these operations our hash table will look like -  hash table After all  operations

Algorithm of linear probing

  • Insert(xx) -
    • Find the hash value, kk of xx from the hash function hash(x)hash(x).
    • Iterate consecutively in the table starting from the k,k, till you find a cell that is currently not occupied.
    • Place xx in that cell.
  • Search(xx) -
    • Find the hash value, kk of xx from the hash function hash(x)hash(x).
    • Iterate consecutively in the table starting from the k,k, till you find a cell that contains xx or which is never been occupied.
    • If we found xx, then the search is successful and unsuccessful in the other case.
  • Delete(xx) -
    • Repeat the steps of Search(xx).
    • If element xx does not exist in the table then we can't delete it.
    • If xx exists in the cell (say kk), put \infty in cell k to denote it has been occupied some time in the past, but now it is empty.

Pesudocode of Linear Probing

Code of Linear Probing

  • C/C++
  • Java

Output -

Problem With Linear Probing

Even though linear probing is intuitive and easy to implement but it suffers from a problem known as Primary Clustering. It occurs because the table is large enough therefore time to get an empty cell or to search for a key kk is quite large. This happens mainly because consecutive elements form a group and then it takes a lot of time to find an element or an empty cell which ultimately makes the worst case time complexity of searching, insertion and deletion operations to be O(n)O(n), where nn is the size of the table.

Quadratic Probing

Quadratic probing eliminates the problem of "Primary Clustering" that occurs in Linear probing techniques. The working of quadratic probing involves taking the initial hash value and probing in the hash table by adding successive values of an arbitrary quadratic polynomial. As suggested by its name, quadratic probing uses a quadratic collision function ff. One of the most common and reasonable choices for ff is -- f(i)=i2f(i)=i^2Operations in quadratic probing collision resolution strategy are -

  • For inserting xx we search for the cells hash(x)+0,hash(x)+12,hash(x)+22,...hash(x)+0, hash(x)+1^2, hash(x)+2^2, ... until we find an empty cell to insert xx.
  • For searching xx we again search for the cells hash(x)+0,hash(x)+12,hash(x)+22,...hash(x)+0, hash(x)+1^2, hash(x)+2^2, ... until we find a cell with value xx. If we find an empty cell that has never been occupied it means xx is not present in the hash table.
  • For deletion, we repeat the search process if a cell is found with value xx we replace the value xx with a predefined unique value to denote that this cell has contained some value in past.

You can see that the only one change between linear and quadratic probing is that in case of collision we are not searching in cells consecutively, rather we are interested in probing the cells quadratically. Let us understand this by an example -

Example of Quadratic Probing

Table Size = 7 Insert = 15, 23, and 85. Search & Delete = 85 Hash Function - Hash(x)=x%7Hash(x)=x\%7 Collision Resolution Strategy - f(i)=i2f(i)=i^2

  • Step 1 - Create a table of size 77. table of size 7

  • Step 2 - Insert 1515 and 2323

    • hash(15)=15%7=1hash(15)=15\%7=1 Since the cell at index 1 is not occupied we can easily insert 15 at cell 1.
    • hash(23)=23%7=2hash(23)=23\%7=2 Again cell 2 is not occupied so place 23 in cell 2. After performing this step our hash table will look like - insertion of 15 and 23
  • Step 3 - Inserting 8585

    • hash(85)=85%7=1hash(85)=85\%7=1 In our hash table cell 1 is already occupied so we will search for cell 1+12, i.e.1+1^2,\space i.e. cell 22. Again it is found occupied so we will search for cell 1+22, i.e.1+2^2, \space i.e. cell 55. It is not occupied so we will place 8585 in cell 55. After performing all these 33 insertions in our hash table it will look like - hash table After performing all these 3 insertions
  • Step 4 - Search and delete 8585 We will go through the same steps as in inserting 85 and when we find 85 our search is successful and to delete it we will replace it with some other unique value a good choice is to replace it with \infty.

Now as there is not much change in the approach of quadratic probing and linear probing. We are skipping algorithm and pseudocode of quadratic probing and directly jumping to its code.

Code of quadratic probing

  • C/C++
  • Java

Output -

Problem with Quadratic Probing

It can be observed that when the hash value for two keys, say x1x_1 and x2x_2 is the same then they will follow the same probe sequence because k, h(x1)+k2=h(x2)+k2\forall k, \space h(x_1)+k^2=h(x_2)+k^2. Which leads to a milder form of clustering, called secondary clustering.

Another problem which can be more severe is, we are not sure if we can probe all the locations of our table. In other words, it can be said that there is no guarantee of finding an empty cell if the table is more than half-filled. The problem can be more severe when the size of the table is not a prime number.

Luckily :relieved: , to pose a solution to this problem, we have a theorem saying that -

  • If the table size is prime and α0.5\alpha \leq 0.5 then all probes will be to a different location and an element can always be inserted in the hash table.

Bonus Tip of Quadratic Probing

There are a lot of costly mathematical operations like * and %\% used in quadratic probing which can make it a little slow when dealing with huge input. To overcome this difficulty we can use some of our mathematical knowledge, by using this trick - Let, H(i)=h(k)+i2H(i)=h(k)+i^2 for any given kk, then it can be rewritten as H(i)=H(i1)+(2×i)1H(i)=H(i-1)+(2\times i)-1, of course with taking modulus with table size wherever needed.

Double Hashing

Double hashing offers us one of the best techniques for hashing with open addressing. The permutation formed by double hashing is like a random permuatation therefore it almost eliminates the chances of cluster forming as it uses a secondary hash function as an offset to deal with collision condition.
The hash function that is used in double hashing is of the form -

h(k,i)=hash1(k)+i×hash2(k),  i=0,1,2,3,...h(k,i)=hash_1(k)+i\times hash_2(k), \space \space i=0, 1, 2, 3,...

A convenient and reasonable way to choose hash functions, hash1(k)hash_1(k) and hash2(k)hash_2(k) for a table of size, "sizesize" is like - hash1(k)=k%sizehash_1(k)=k\%size

hash2(k)=1+(k%size)hash_2(k)=1+(k\%size')

Where sizesize should be a prime number and sizesize' can be slightly less than sizesize (say, size1size-1). The reason to add 11 in hash2(k)hash_2(k) is to avoid getting hash2(k)hash_2(k) as 00, because if for any kk we get hash2(k)hash_2(k) as 00 then i,\forall i, we will have h(k,i)=hash1(k)h(k,i)=hash_1(k).

By now, you might have got an idea about how you will do operations double hashing. So you can implement it like shown below - Operations in double hashing technique are -

  • For inserting xx we search for the cells hash1(x)+0×hash2(x), hash1(x)+1×hash2(x), hash1(x)+2×hash_1(x)+0\times hash_2(x),\space hash_1(x)+1\times hash_2(x), \space hash_1(x)+2\times hash2(x),...hash_2(x), ... until we find an empty cell to insert xx. From here you can observe that if we don't get any collision at first (from the hash value of hash1(x))hash_1(x)), we will not calculate hash2(x)hash_2(x) because we can simply place xx at the index hash1(x)hash_1(x)
  • For searching kk we again search for the cells hash1(x)+0×hash2(x), hash1(x)+1×hash2(x), hash1(x)+2×hash_1(x)+0\times hash_2(x),\space hash_1(x)+1\times hash_2(x), \space hash_1(x)+2\times hash2(x),...hash_2(x), ... until we find a cell with value xx. If we find a cell that has never been occupied it means xx is not present in the hash table.
  • For deletion, we repeat the search process if a cell is found with value xx we replace the value xx with a predefined unique value to denote that this cell has contained some value in past.

Example of double hashing

Table size = 7 hash1(k)hash_1(k) = k%7k\%7 and hash2(k)=1+k%6hash_2(k)=1+k\%6 Insert = 37,25,12,40,37, 25, 12, 40, and 7575 Search & Delete = 7575

  • Step 1 - Create a table of size 77. Example of double hashing

  • Step 2 - Insert 37,25,37, 25, and 1212.

    • hash1(37)=37%7=2hash_1(37)=37\%7=2
    • hash1(25)=25%7=4hash_1(25)=25\%7=4
    • hash1(12)=12%7=5hash_1(12)=12\%7=5

    There is no collision at any point during inserting these three values so we will simply place these elements at their respective positions. After which, our hash table will look like - Example of double hashing step 2

  • Step 3 - Insert 4040 and 7474

    • hash1(40)=40%7=5,  hash2(40)=1+40%6=5hash_1(40)=40\%7=5, \space \space hash_2(40)=1+40\%6=5 But at the cell at index 5 is already occupied, so we will check for next cell i.e. hash1(40)+1×hash2(40)i.e.\space hash_1(40)+1\times hash_2(40) which will evaluate to (5+5)%7=3(5+5)\%7=3. Cell at index 3 is not occupied so we will place 40 in cell 3.

    • hash1(74)=74%7=4,  hash2(74)=1+74%6=3hash_1(74)=74\%7=4, \space \space hash_2(74)=1+ 74\%6=3 But at the cell at index 4 is already occupied, so we will check for next cell i.e. hash1(74)+1×hash2(74)i.e.\space hash_1(74)+1\times hash_2(74) which will evaluate to (4+3)%7=0(4+3)\%7=0. Cell at index 0 is not occupied so we will place 74 in cell 0.

    After which our hash table will look like - Example of double hashing step 3

  • Step 4 - Search and Delete 7474 hash1(74)=74%7=4,  hash2(74)=1+74%6=3hash_1(74)=74\%7=4, \space \space hash_2(74)=1+ 74\%6=3

    We will firstly search in cell 4 which does not contain 7474 so we will search for the next cell which may contain 74 i.e. hash1(74)+1×hash2(74)i.e.\space hash_1(74)+1\times hash_2(74) which will evaluate to (4+3)%7=0(4+3)\%7=0. We find 7474 in the cell at index 0 so our search is successful and now to delete it we replace its value with a very large number (say \infty).

After all the steps our hash table will look like - Example of double hashing step 4

Since the algorithm is not very different from the above two methods of hashing with open addressing we are directly going to code of Double Hashing -

Code of double hashing -

  • C/C++
  • Java

Output -

Analysis of Open Addressing

It is can be seen that, the cost of Insert, Search and Delete operations depends on how much the hash table is filled. More formally, we can say performance is related to load factor (α\alpha). Time required to perform any operation increases as α\alpha increases. With some mathematical proofs, we can say that it takes roughly O(1/(1α))O(1/(1-\alpha)) time operate in the hash table if we are using open addressing.

Now we will look at the relative efficiency between the three collision resolution strategies that we have discussed.
relative efficiency

From the above image, we can see that performance of quadratic probing and double hashing is almost similar but Linear probing took a very large number of probes for an operation when the table is more than half-filled. The main difference between the above three strategies is that, linear probing provides the best cache performance but the clustering problem is very much evident in it. Double hashing exhibits poor cache performance but there is very less or no clustering problem. While quadratic probing lies between the two in both of the fields.

Advantages of open addressing -

  • Open addressing provides better cache performance because all the data is stored in the same table only.
  • It is easy to implement as no pointers are not involved.
  • Different strategies to resolve collisions can be adopted as per the use case.

Practice Problems of Hashing with Open Addresssing

Ques 1Ques\space 1- Given the hash table of size 7, already three keys are inserted as shown in the image. What will be the number of comparison required to insert 7272 in the hash table if the hash function is hash(x)=x%7hash(x)=x\%7?

Practice Problem 1

Answer
Correct Answer - 4
Explanation -
KaTeX parse error: Expected 'EOF', got '%' at position 12: hash(72)=72%̲7=2 so we will begin our probe from 2nd index of the hash table and it will take 4 comparison to insert 72 in the hash table. Image -

practice problem 2


Ques 2Ques \space 2- A hash function hh is defined by h(x)=x%7h(x)=x\%7, If keys that are inserted into a hash table of size 7 are 37, 73, 58, and 52. Answer the following questions -

  • What will be the position of 52 in the hash table, if the linear probing strategy is used to resolve the collision?
  • What will be the position of 58 in the hash table, if the quadratic probing strategy is used to resolve the collision?

Answers
Position of 52 if linear probing is used is - 55 Position of 58 if quadratic probing is used is - 66

Explanation - Below are the hash tables after performing the insertion operations. They are self-sufficient to explain the solutions. practice-problem-2 answer

Conclusion

  • In this article, we have seen how we can use open addressing to resolve collisions during hashing process.
  • Three collision resolution strategies have been discussed viz. Linear Probing, Quadratic Probing, and Double Hashing.
  • Remember that, try to keep load factor (α\alpha) below 0.50.5 and table size as a prime number unless it is unavoidable.

Code with Confidence! Enroll in Our Online Data Structures Courses and Master Efficient Algorithms.