Open Addressing - Scaler Topics

Open Addressing, also known as closed hashing, is a simple yet effective way to handle collisions in hash tables. Unlike chaining, it stores all elements directly in the hash table. This method uses probing techniques like Linear, Quadratic, and Double Hashing to find space for each key, ensuring easy data management and retrieval in hash tables.

The main concept of Open Addressing hashing is to keep all the data in the same hash table and hence a bigger Hash Table is needed. When using open addressing, a collision is resolved by probing (searching) alternative cells in the hash table until our target cell (empty cell while insertion, and cell with value $x$ while searching $x$ ) is found. It is advisable to keep load factor ( $\alpha$ ) below $0.5$ , where $\alpha$ is defined as $\alpha=n/m$ where $n$ is the total number of entries in the hash table and $m$ is the size of the hash table. As explained above, since all the keys are stored in the same hash table so it's obvious that $\alpha\leq1$ because $n\leq m$ always. If in case a collision happens then, alternative cells of the hash table are checked until the target cell is found. More formally,

Cells $h_0(x), h_1(x), h_2(x) .... h_n(x)$ are tried consecutively until the target cell has been found in the hash table. Where $h_i(x)=(hash(x) + f(i))\%Size,$ keeping $f(0)=0$ .
The collision function $f$ is decided according to method resolution strategy.

There are three main Method Resolution Strategies --

Linear Probing
Quadratic Probing
Double Hashing

Linear Probing

In linear probing, collisions are resolved by searching the hash table consecutively (with wraparound) until an empty cell is found. The definition of collision function $f$ is quite simple in linear probing. As suggested by the name it is a linear function of $i$ or simply $f(i) = i$ . Operations in linear probing collision resolution technique -

For inserting $x$ we search for the cells $hash(x)+0, hash(x)+1, ... hash(x)+k$ until we found a empty cell to insert $x$ .
For searching $x$ we again search for the cells $hash(x)+0, hash(x)+1, ... hash(x)+k$ until we found a cell with value $x$ . If we found a cell that has never been occupied it means $x$ is not present in the hash table.
For deletion, we repeat the search process if a cell is found with value $x$ we replace the value $x$ with a predefined unique value (say $\infty$ ) to denote that this cell has contained some value in past.

Example of linear probing -

Table Size = $7$ Hash Function - $hash(key)=key\%7$ Collision Resoulution Strategy - $f(i)=i$

Insert - $16, 40, 27, 9, 75$
Search - $75, 21$
Delete - $40$

Steps involved are

Step 1 - Make an empty hash table of size 7.
Step 2 - Inserting $16, 40$ , and $27$ .
- $hash(16)=16\%7=2$
- $hash(40)=40\%7=5$
- $hash(27)=27\%7=6$

As we do not get any collision we can easily insert values at their respective indexes generated by the hash function.
After inserting, the hash table will look like - hash table After insertion

Step 3 - Inserting 9 and 75.
- $hash(9)=9\%7=2$ But at index $2$ already $16$ is placed and hence collision occurs so as per linear probing we will search for consecutive cells till we find an empty cell.
  So we will probe for $hash(9)+1$ $i.e.$ cell 3, since the next cell $i.e. \space 3$ is not occupied we place 9 in cell $3$ .
- $hash(75)=75\%7=5$ Again collision happens because 40 is already placed in cell $5$ . So will search for the consecutive cells, so we search for cell $6$ which is also occupied then we will search for cell $(hash(75)+2)\%7\space i.e. \space 0$ which is empty so we will place 75 there.

After inserting $9$ and $75$ hash table will look like - After inserting 9 and 75 hash table

Step 4 - Search $75$ and $21$ -
- $hash(75)=75\%7=5$ But at index $5, 75$ is not present so we search for consecutive cells until we found an empty cell or a cell with a value of $75$ . So we search in cell $6$ but it does not contain $75$ , so we search for $75$ in cell $0$ and we stop our search here as we have found $75$ .
- $h(21)=21\%7=0$ We will search for $21$ in cell $0$ but it contains 75 so we will search in the next cell $hash(21)+1,$ $i.e. \space 1$ since it is found empty it is clear that $21$ do not exist in our table.
Step 5 - Delete $40$
- $h(40)=40\%7=5$ Firstly we search for $40$ which results in a successful search as we get $40$ in cell $5$ then we will remove $40$ from cell $5$ and replace it with a unique value (say -ash $\infty$ ).

After all these operations our hash table will look like - hash table After all operations

Algorithm of linear probing

Insert( $x$ ) -
- Find the hash value, $k$ of $x$ from the hash function $hash(x)$ .
- Iterate consecutively in the table starting from the $k,$ till you find a cell that is currently not occupied.
- Place $x$ in that cell.
Search( $x$ ) -
- Find the hash value, $k$ of $x$ from the hash function $hash(x)$ .
- Iterate consecutively in the table starting from the $k,$ till you find a cell that contains $x$ or which is never been occupied.
- If we found $x$ , then the search is successful and unsuccessful in the other case.
Delete( $x$ ) -
- Repeat the steps of Search( $x$ ).
- If element $x$ does not exist in the table then we can't delete it.
- If $x$ exists in the cell (say $k$ ), put $\infty$ in cell k to denote it has been occupied some time in the past, but now it is empty.

Pesudocode of Linear Probing

Code of Linear Probing

C/C++

Java

Output -

Problem With Linear Probing

Even though linear probing is intuitive and easy to implement but it suffers from a problem known as Primary Clustering. It occurs because the table is large enough therefore time to get an empty cell or to search for a key $k$ is quite large. This happens mainly because consecutive elements form a group and then it takes a lot of time to find an element or an empty cell which ultimately makes the worst case time complexity of searching, insertion and deletion operations to be $O(n)$ , where $n$ is the size of the table.

Quadratic Probing

Quadratic probing eliminates the problem of "Primary Clustering" that occurs in Linear probing techniques. The working of quadratic probing involves taking the initial hash value and probing in the hash table by adding successive values of an arbitrary quadratic polynomial. As suggested by its name, quadratic probing uses a quadratic collision function $f$ . One of the most common and reasonable choices for $f$ is -- $f(i)=i^2$ Operations in quadratic probing collision resolution strategy are -

For inserting $x$ we search for the cells $hash(x)+0, hash(x)+1^2, hash(x)+2^2, ...$ until we find an empty cell to insert $x$ .
For searching $x$ we again search for the cells $hash(x)+0, hash(x)+1^2, hash(x)+2^2, ...$ until we find a cell with value $x$ . If we find an empty cell that has never been occupied it means $x$ is not present in the hash table.
For deletion, we repeat the search process if a cell is found with value $x$ we replace the value $x$ with a predefined unique value to denote that this cell has contained some value in past.

You can see that the only one change between linear and quadratic probing is that in case of collision we are not searching in cells consecutively, rather we are interested in probing the cells quadratically. Let us understand this by an example -

Example of Quadratic Probing

Table Size = 7 Insert = 15, 23, and 85. Search & Delete = 85 Hash Function - $Hash(x)=x\%7$ Collision Resolution Strategy - $f(i)=i^2$

Step 1 - Create a table of size $7$ .
Step 2 - Insert $15$ and $23$
- $hash(15)=15\%7=1$ Since the cell at index 1 is not occupied we can easily insert 15 at cell 1.
- $hash(23)=23\%7=2$ Again cell 2 is not occupied so place 23 in cell 2. After performing this step our hash table will look like -
Step 3 - Inserting $85$
- $hash(85)=85\%7=1$ In our hash table cell 1 is already occupied so we will search for cell $1+1^2,\space i.e.$ cell $2$ . Again it is found occupied so we will search for cell $1+2^2, \space i.e.$ cell $5$ . It is not occupied so we will place $85$ in cell $5$ . After performing all these $3$ insertions in our hash table it will look like -
Step 4 - Search and delete $85$ We will go through the same steps as in inserting 85 and when we find 85 our search is successful and to delete it we will replace it with some other unique value a good choice is to replace it with $\infty$ .

Now as there is not much change in the approach of quadratic probing and linear probing. We are skipping algorithm and pseudocode of quadratic probing and directly jumping to its code.

Code of quadratic probing

C/C++

Java

Output -

Problem with Quadratic Probing

It can be observed that when the hash value for two keys, say $x_1$ and $x_2$ is the same then they will follow the same probe sequence because $\forall k, \space h(x_1)+k^2=h(x_2)+k^2$ . Which leads to a milder form of clustering, called secondary clustering.

Another problem which can be more severe is, we are not sure if we can probe all the locations of our table. In other words, it can be said that there is no guarantee of finding an empty cell if the table is more than half-filled. The problem can be more severe when the size of the table is not a prime number.

Luckily :relieved: , to pose a solution to this problem, we have a theorem saying that -

If the table size is prime and $\alpha \leq 0.5$ then all probes will be to a different location and an element can always be inserted in the hash table.

Bonus Tip of Quadratic Probing

There are a lot of costly mathematical operations like $*$ and $\%$ used in quadratic probing which can make it a little slow when dealing with huge input. To overcome this difficulty we can use some of our mathematical knowledge, by using this trick - Let, $H(i)=h(k)+i^2$ for any given $k$ , then it can be rewritten as $H(i)=H(i-1)+(2\times i)-1$ , of course with taking modulus with table size wherever needed.

Double Hashing

Double hashing offers us one of the best techniques for hashing with open addressing. The permutation formed by double hashing is like a random permuatation therefore it almost eliminates the chances of cluster forming as it uses a secondary hash function as an offset to deal with collision condition.
The hash function that is used in double hashing is of the form -

h(k,i)=hash_1(k)+i\times hash_2(k), \space \space i=0, 1, 2, 3,...

A convenient and reasonable way to choose hash functions, $hash_1(k)$ and $hash_2(k)$ for a table of size, " $size$ " is like - $hash_1(k)=k\%size$

$hash_2(k)=1+(k\%size')$

Where $size$ should be a prime number and $size'$ can be slightly less than $size$ (say, $size-1$ ). The reason to add $1$ in $hash_2(k)$ is to avoid getting $hash_2(k)$ as $0$ , because if for any $k$ we get $hash_2(k)$ as $0$ then $\forall i,$ we will have $h(k,i)=hash_1(k)$ .

By now, you might have got an idea about how you will do operations double hashing. So you can implement it like shown below - Operations in double hashing technique are -

For inserting $x$ we search for the cells $hash_1(x)+0\times hash_2(x),\space hash_1(x)+1\times hash_2(x), \space hash_1(x)+2\times$ $hash_2(x), ...$ until we find an empty cell to insert $x$ . From here you can observe that if we don't get any collision at first (from the hash value of $hash_1(x))$ , we will not calculate $hash_2(x)$ because we can simply place $x$ at the index $hash_1(x)$
For searching $k$ we again search for the cells $hash_1(x)+0\times hash_2(x),\space hash_1(x)+1\times hash_2(x), \space hash_1(x)+2\times$ $hash_2(x), ...$ until we find a cell with value $x$ . If we find a cell that has never been occupied it means $x$ is not present in the hash table.
For deletion, we repeat the search process if a cell is found with value $x$ we replace the value $x$ with a predefined unique value to denote that this cell has contained some value in past.

Example of double hashing

Table size = 7 $hash_1(k)$ = $k\%7$ and $hash_2(k)=1+k\%6$ Insert = $37, 25, 12, 40,$ and $75$ Search & Delete = $75$

Step 1 - Create a table of size $7$ .
Step 2 - Insert $37, 25,$ and $12$ .
- $hash_1(37)=37\%7=2$
- $hash_1(25)=25\%7=4$
- $hash_1(12)=12\%7=5$
There is no collision at any point during inserting these three values so we will simply place these elements at their respective positions. After which, our hash table will look like -
Step 3 - Insert $40$ and $74$
- $hash_1(40)=40\%7=5, \space \space hash_2(40)=1+40\%6=5$ But at the cell at index 5 is already occupied, so we will check for next cell $i.e.\space hash_1(40)+1\times hash_2(40)$ which will evaluate to $(5+5)\%7=3$ . Cell at index 3 is not occupied so we will place 40 in cell 3.
- $hash_1(74)=74\%7=4, \space \space hash_2(74)=1+ 74\%6=3$ But at the cell at index 4 is already occupied, so we will check for next cell $i.e.\space hash_1(74)+1\times hash_2(74)$ which will evaluate to $(4+3)\%7=0$ . Cell at index 0 is not occupied so we will place 74 in cell 0.
After which our hash table will look like -
Step 4 - Search and Delete $74$ $hash_1(74)=74\%7=4, \space \space hash_2(74)=1+ 74\%6=3$

We will firstly search in cell 4 which does not contain $74$ so we will search for the next cell which may contain 74 $i.e.\space hash_1(74)+1\times hash_2(74)$ which will evaluate to $(4+3)\%7=0$ . We find $74$ in the cell at index 0 so our search is successful and now to delete it we replace its value with a very large number (say $\infty$ ).

After all the steps our hash table will look like - Example of double hashing step 4

Since the algorithm is not very different from the above two methods of hashing with open addressing we are directly going to code of Double Hashing -

Analysis of Open Addressing

It is can be seen that, the cost of Insert, Search and Delete operations depends on how much the hash table is filled. More formally, we can say performance is related to load factor ( $\alpha$ ). Time required to perform any operation increases as $\alpha$ increases. With some mathematical proofs, we can say that it takes roughly $O(1/(1-\alpha))$ time operate in the hash table if we are using open addressing.

Now we will look at the relative efficiency between the three collision resolution strategies that we have discussed.

From the above image, we can see that performance of quadratic probing and double hashing is almost similar but Linear probing took a very large number of probes for an operation when the table is more than half-filled. The main difference between the above three strategies is that, linear probing provides the best cache performance but the clustering problem is very much evident in it. Double hashing exhibits poor cache performance but there is very less or no clustering problem. While quadratic probing lies between the two in both of the fields.

Advantages of open addressing -

Open addressing provides better cache performance because all the data is stored in the same table only.
It is easy to implement as no pointers are not involved.
Different strategies to resolve collisions can be adopted as per the use case.

Practice Problems of Hashing with Open Addresssing

$Ques\space 1-$ Given the hash table of size 7, already three keys are inserted as shown in the image. What will be the number of comparison required to insert $72$ in the hash table if the hash function is $hash(x)=x\%7$ ?

Practice Problem 1

Answer
Correct Answer - 4
Explanation -
KaTeX parse error: Expected 'EOF', got '%' at position 12: hash(72)=72%̲7=2 so we will begin our probe from 2nd index of the hash table and it will take 4 comparison to insert 72 in the hash table. Image -

practice problem 2

$Ques \space 2-$ A hash function $h$ is defined by $h(x)=x\%7$ , If keys that are inserted into a hash table of size 7 are 37, 73, 58, and 52. Answer the following questions -

What will be the position of 52 in the hash table, if the linear probing strategy is used to resolve the collision?
What will be the position of 58 in the hash table, if the quadratic probing strategy is used to resolve the collision?

Answers
Position of 52 if linear probing is used is - $5$ Position of 58 if quadratic probing is used is - $6$

Explanation - Below are the hash tables after performing the insertion operations. They are self-sufficient to explain the solutions. practice-problem-2 answer