Find Duplicates in Array

Learn via video course
FREE
View all courses
DSA Problem Solving for Interviews using Java
DSA Problem Solving for Interviews using Java
by Jitender Punia
1000
4.9
Start Learning
DSA Problem Solving for Interviews using Java
DSA Problem Solving for Interviews using Java
by Jitender Punia
1000
4.9
Start Learning
Topics Covered

Problem Statement

Given an array A of size N + 1. Each element in array A is between 1 and N. Write a program to find one of the duplicates in array A. If there are multiple answers, then print any.

Example 1

Input: N = 5 A = [1, 2, 5, 3, 4, 3] Output 3

Explanation In this example, element 3 is repeated at position 4 and position 6 in array A.

Example 2

Input N = 5 A = [1, 2, 4, 2, 4, 4] Output 2

Explanation In this example, element 2 is repeated as well as element 4 is repeated. Hence we can output any of them.

Constraints

1<=N<=1051 <= N <= 10^{5}
1<=A[i]<=N1 <= A[i] <= N

Problem Discussion

Before moving on to the solution, let us first analyze the constraints. In this problem, we are given that NN is at least 11. This means that the size of the input array will be at least 2. This constraint guarantees that there is at least one duplicate element in the array. Why?

This is because we are given that array AA is of size N+1N + 1 and each element A[i]A[i] is between 11 and NN. So, if we give unique values to the first NN positions in the array AA, then the last position of array AA will be a duplicate in the array.

For example: N=6N = 6 Then, the maximum number of unique values in AA can be 55. A=[1,2,3,4,5,4]A = [1, 2, 3, 4, 5, 4]. The last element will be repeated. So the answer always exists.

example-duplicate-in-array

Approach 1: Brute Force

We can solve this problem using a simple brute force over the array AA. We can use two nested iterations over array AA to find the duplicate element.

  • For each element A[i]A[i], we can check is there an element A[j]A[j] such that iji\neq j and A[i]==A[j]A[i] == A[j].
  • If such an element A[j]A[j] exists, then we can say that A[i]A[i] is the repeated element and we have found our answer. In this way, we can find duplicate in array.

C++ Implementation

Java Implementation

Python3 Implementation

Time Complexity Analysis

In this method, we will do two nested iterations over the array AA of size N+1N + 1. Hence the time complexity will be O(N)O(N) for the outer loop and O(N)O(N) for the inner loop. Since the iterations are nested, the time complexity will be O(N2)O(N^2).
Time complexity: O(N2)O(N^2)

Space Complexity Analysis

In this method, we only need a constant amount of variables of iteration. We are also keeping another variable to store the duplicate element. Hence the space complexity is O(1)O(1).
Space Complexity: O(1)O(1)

Approach 2: Using Sorting

  • Firstly, we will sort the given array A in increasing or decreasing order. This will guarantee that all equal elements are placed side by side in the sorted array.
  • After sorting, equal elements will be placed one after the other so we can do a simple iteration on the sorted array A and check adjacent elements.
  • When iterating over the array, if we find a position where adjacent elements are equal, then we have found our duplicate element.

For example A:[3,2,5,4,3,1,4]A: [3, 2, 5, 4, 3, 1, 4] Sorted(A):[1,2,3,3,4,4,5]Sorted(A): [1, 2, 3, 3, 4, 4, 5] Now if we iterate over the sorted array AA, and compare adjacent elements at position 33, we will find that A[3]=A[4]=3A[3] = A[4] = 3. This means that the duplicate element is 33.

C++ Implementation

Java Implementation

Python3 Implementation

Time Complexity Analysis

We can sort the array using a comparison-based sorting algorithm. This will take O(Nlog(N))O(N log(N)) time. Now to search for the duplicate element, we are doing a simple iteration. This will take O(N)O(N) time.
Time complexity: O(Nlog(N))O(N log(N)).

We can also use a linear time sorting algorithm like Counting sort. In this case, the time complexity will be O(N)O(N). The trade-off here is that counting sort requires extra space.

Space Complexity Analysis

We are not using any extra space in our algorithm. We only need a constant number of variables for iteration and searching for the answer. Hence the space complexity is O(1)O(1). Space Complexity: O(1)O(1)

If we use a linear time sorting algorithm like counting sort, then the space complexity will be O(N)O(N) as it requires an extra array to store the count of each element.

Approach 3: Using Frequency Array

To find a duplicate element in the array, we can make use of the observation that any duplicate number A[i] will occur more than once in array A. This means that suppose, if we know the number of times each element occurs in array A, then the duplicate element will be one of the elements whose frequency or the number of times it occurs in A is more than one.

Now how to find the frequency of each element ? To do this, we can create a frequency table over the array A. A frequency table is a map that represents the number of times each element occurs in array A. It represents the count or frequency of each element in array A.

  • A frequency table can be implemented using arrays or a hash map. Generally, if the range of elements in the given array A is small, then it is preferred that we use arrays as a frequency table. This is because a hash map can have some overhead costs associated with it.
  • Let's say we have a variable freqfreq that represents our frequency table of size NN. We will create a table of size NN as A[i]A[i] lies between 11 and NN.
  • Now, we iterate over array A and increment freq[A[i]]freq[A[i]] by 11. In this way, we can build the frequency table.
  • Now, we can simply check the frequency table. If at any point freq[i]>1freq[i] > 1 then we have found the duplicate element equal to ii.

For example, Let us take A=[3,2,5,4,3,1,4]A = [3, 2, 5, 4, 3, 1, 4] Now we build a frequency table freq=[0,1,1,2,2,1]freq = [0, 1, 1, 2, 2, 1] It means freq[0]=0,freq[1]=1,freq[2]=1,freq[3]=2freq[0] = 0, freq[1] = 1, freq[2] = 1, freq[3] = 2 and so on. As we can see the freq[3]=2freq[3] = 2 and freq[4]=2freq[4] = 2, so both these elements are repeated. We can output any one of them as both are duplicates in the array.

C++ Implementation

Java Implementation

Python3 Implementation

Time Complexity Analysis

To create the frequency table, we need to do an iteration over the array AA. This will take O(N)O(N) time. To find the duplicate element, we need to do an iteration over the frequency array. The size of the frequency array is also bounded by O(N)O(N), hence the time taken will be O(N)O(N).
Time complexity: O(N)O(N)

Space Complexity Analysis

In this method, we need to create a frequency array of size N+1N + 1. So, the space complexity for this algorithm is O(N)O(N).
Space complexity: O(N)O(N)

Approach 4: Linked List Cycle Method

This method considers the array as a linked list. Suppose we consider the given elements of the array as nodes of a linked list. Here the indexindex of the element indicates the address of the node and it points to node A[index]A[index]. A[index]A[index] stores the address of the next node in the linked list. For example, consider the below array A:[1,2,3,4]A: [1, 2, 3, 4] In this case, we have 4 linked lists, each of size 1.

linked-list-cycle-method

Consider another example A:[2,3,1,5,4]A: [2, 3, 1, 5, 4]

duplicate-array-by-linked-list-cycle-method

In this way, we represent each list A in the form of a linked list. In the examples shown above, there are no duplicate elements in array A. Observe that, this means that each element occurs only once so there can be at most one incoming connection for each node. Try out some more examples to understand this clearly.

Now suppose, we add a duplicate element in array A. The main observation is that there will be at least one node in the linked list that will have multiple incoming connections. This is because there are at least two distinct positions i and j such that A[i]=A[j]A[i] = A[j]. So nodes ii and jj will point to the same node and that node will have multiple incoming connections.

  • To find the duplicate element, we will represent array A in the form of a linked list.
  • Now, to find the duplicate element, we need to find the start of the cycle. This can be done using Floyd’s cycle finding algorithm (discussed below).

Let us understand this using an example. Consider A=[3,2,4,1,5,1]A = [3, 2, 4, 1, 5, 1] Let us say, we are index 00, A[0]=3A[0] = 3, so the next element is at index 33. At index 33, A[3]=1A[3] = 1, so the next element of the list is at index 11. At index 11, we have A[1]=2A[1] = 2, so the next element of the list is at index 22. At index 22, we have A[2]=4A[2] = 4, so the next element is at index 44. At index 44, we have A[4]=5A[4] = 5, so the next element is at index 11.

floyd-cycle-algorithm

Now as we can see this represents a linked list structure with a cycle. So when the elements range between 11 and NN and there is a duplicate then it will form a cycle. The duplicate element will be the start of this cycle. This is because the duplicate element will point to an element that has already been visited. So, if we find the start of this cycle, then we can find the duplicate in array AA.

Now to find the start of the cycle, we can use Floyd’s cycle finding algorithm. The idea is to use two pointers - slow and fast pointer. We will start from index 00 and keep on moving the slow and the fast pointer. The slow pointer will move by one step and the fast pointer will move by two steps. Now when these pointers meet we can say that both the pointers are in the cycle.

duplicate-floyds-cycle-finding-algorithm

To find the start of the cycle, we will re-initialize the slow pointer to 00 and this time, we will update both the pointers by one step at a time. Now, these pointers will end up meeting at the start of the linked list. In this way, we can find the duplicate.

find-duplicate-to-find-start-of-cycle

Now, the question is why should slow and fast pointers meet at the start of the cycle using this method. To understand this consider the image below.

slow-fast-pointer-at-start-of-cycle

Let us say: The distance between the first element and the point of collision of the slow and the fast pointer is DD. Cycle length = CC and the Distance of the point of collision from the start of the cycle is XX. Distance traveled by slow pointer = DD so the distance traveled by fast pointer = 2D2D. If slow and fast pointers meet, then the distance between them is integer multiple of CC. This means D=kCD = k * C, where kk is some positive constant. Now when we reset the slow pointer, the fast pointer has moved a distance of X from starting point of the cycle. If we now update both the pointers, one at a time, this means that the slow pointer will travel a distance of (DX)(D - X) and the fast pointer will also travel a distance of DXD - X to meet the slow pointer at the start of the cycle. In this way, we can find the duplicate element in the array.

C++ Implementation

Java Implementation

Python3 Implementation

Time Complexity Analysis

To find the cycle, we will iterate over the cycle once. This is true because when both the pointers - slow and fast are in the cycle, the distance between them will decrease by 11. Since the cycle length is O(N)O(N) the time taken to find the cycle is O(N)O(N). To find the starting point of the cycle, we will iterate over the array once. The time complexity of this algorithm is O(N)O(N).
Time complexity: O(N)O(N)

Space Complexity Analysis

If we look at the implementation, we are only using a constant amount of space. Hence the space complexity is O(1)O(1).
Space complexity: O(1)O(1)

Approach 5: Binary Search on Array

Suppose, we are given that there is only one duplicate element in array AA. This means that there are NN unique elements in array AA and one duplicate element. We can use binary search to find the duplicate element.

The idea is to perform a binary search to find the duplicate element in the range [1,N][1, N]. Given the middle element of the range, we will count the number of elements smaller than or equal to the middle element. If low=1low = 1, high=Nhigh = N and mid=(low+high)/2mid = (low + high) / 2. Then if countcount represents number of A[i]<=midA[i] <= mid. There are three cases.

  • If the countcount is greater than midmid, then in this case we have a smaller subproblem of the original problem. Observe that, in this case, we can choose a subset of all elements that are smaller than or equal to midmid, but the size will be greater than midmid. So, it is like we have an array of size K(=mid)K (= mid) with elements in the range [1,K][1, K], and the size is at least K+1K + 1. So this is a smaller subproblem of the original problem. We will shift our highhigh to midmid.
  • If the countcount is equal to midmid, then, in this case, we can say that there are no duplicates less than or equal to midmid. This is because we have only one duplicate element and NN unique elements, so elements from 11 to midmid will always get counted. So we can discard half the possibilities so we shift our lowlow to mid+1mid + 1.
  • The case where the countcount is less than midmid is not possible. This is because midmid lies between lowlow and highhigh and there is one duplicate element in AA. This implies we have NN unique elements from 11 to NN so no matter what the value of mid is, the elements from 11 to midmid will always be counted.

Let us take an example to understand this. Let us say A=[3,2,4,1,5,2]A = [3, 2, 4, 1, 5, 2]

  • Low=1,high=6,mid=3=>count=4Low = 1, high = 6, mid = 3 => count = 4 (as 4 elements [3,2,1,2][3, 2, 1, 2] are less than or equal to 3 in the A)
  • Low=1,high=3,mid=2=>count=3Low = 1, high = 3, mid = 2 => count = 3 (as 3 elements [2,1,2][2, 1, 2] are less than or equal to 2 in A)
  • Low=1,high=2,mid=1=>count=1Low = 1, high = 2, mid = 1 => count = 1 so we set low = mid
  • Low=2,high=2Low = 2, high = 2. Since both are equal we have found our answer and we terminate the loop. To find the number of elements that are less than or equal to a given value of midmid, we can do a simple iteration over AA and count such elements.

C++ Implementation

Java Implementation

Python3 Implementation

Time Complexity Analysis

In this method, we perform a binary search over array AA. For each iteration of binary search, we have to iterate over array AA to find the number of elements less than or equal to the given mid. Hence the time complexity is O(Nlog(N))O(N log(N)).
Time Complexity: O(Nlog(N))O(N log(N))

Space Complexity Analysis

Binary search does not require any extra space. We use only a constant amount of space. Hence space complexity: O(1)O(1)

Approach 6: Count Iterations

In this method, the idea is to count the number of times, each element occurs while iterating over the array AA. This method relies on the fact that the frequency of duplicate elements is greater than one. In this method, we will find the duplicate element in a single pass.

  • We will maintain a frequency table count with all entries as 00.
  • Now iterate over the array A and for each element A[i]A[i] increment A[i]A[i] by 11. While iterating, if at any point, the number of occurrences of A[i]A[i] is greater than 11, then we have found the duplicate element.

Example: A=[3,2,1,4,1,5]A = [3, 2, 1, 4, 1, 5] count=[0,0,0,0,0,0,0]count = [0, 0, 0, 0, 0, 0, 0] Iteration 1: count=[0,0,0,1,0,0,0]count = [0, 0, 0, 1, 0, 0, 0] count[A[1]]=count[3]=1count[A[1]] = count[3] = 1, so we continue. Iteration 2: count=[0,0,1,1,0,0,0]count = [0, 0, 1, 1, 0, 0, 0] count[A[2]]=count[2]=1count[A[2]] = count[2] = 1, so we continue. Iteration 3: count=[0,1,1,1,0,0,0]count = [0, 1, 1, 1, 0, 0, 0] count[A[3]]=count[1]=1count[A[3]] = count[1] = 1, so we continue. Iteration 4: count=[0,1,1,1,1,0,0]count = [0, 1, 1, 1, 1, 0, 0] count[A[4]]=count[4]=1count[A[4]] = count[4] = 1, so we continue. Iteration 5: count=[0,2,1,1,1,0,0]count = [0, 2, 1, 1, 1, 0, 0] count[A[1]]=count[1]=2(>1)count[A[1]] = count[1] = 2 ( > 1). We will break here because we have found a duplicate element.

C++ Implementation

Java Implementation

Python3 Implementation

Time Complexity Analysis

In this method, we are doing a single iteration over array A and updating the count in the frequency table. Hence the complexity of this solution is O(N)O(N).
Time complexity: O(N)O(N)

Space Complexity Analysis

This method requires a frequency table for calculating the answer. The frequency table is implemented using an array of size N+1. So, the space complexity is O(N).
Space complexity: O(N)O(N).

Approach 7: Sum of Elements

This method is useful if it is known that there is only one duplicate element. If the problem specifies a new constraint, that there can be only one duplicate element, then we can use the following math trick to find that duplicate element efficiently.

We know that array AA contains NN elements from 11 to NN and one duplicate element. Let us say that the duplicate element in the array is DD.

Now we can write, (Sum of elements from 11 to NN) ++ D = Sum of elements in array AA From this, we can derive DD = Sum of elements in array AA - Sum of elements from 11 to NN. We also know that the sum of elements from 11 to NN is N(N+1)/2N*(N+1)/2.

sum-of-elements

The only thing we need is the sum of elements in array AA, which can be calculated using a simple iteration over the array. In this way, we can find the required duplicate element in array AA. Note that, this method is only useful when it is given that there is only one duplicate element. If this is not the case, then this method will fail as the equation defined above does not holds good.

C++ Implementation

Java Implementation

Python3 Implementation

Time Complexity Analysis

To find the duplicate in array AA, we need the sum of elements of array A. This is done using a simple interaction over the array AA in O(N)O(N). Once we know the sum, we can find the duplicate element in O(1)O(1) time. Hence the time complexity is O(N)O(N).
Time Complexity: O(N)O(N)

Space Complexity Analysis

We do not need any extra space to find the answer. We only need a variable to calculate the sum.
Space complexity: O(1)O(1)

Approach 8: Marker

In this approach, we will make use of the information given to us, that each element A[i]A[i] lies in between 11 and NN.

Suppose we take a boolean flagflag array of size N with all entries as 00. Now for each element A[i]A[i], we assign flag[A[i]]=1flag[A[i]] = 1. Now as we are doing the iteration, if we arrive at an element A[i] such that flag[A[i]]flag[A[i]] is already one, then, in this case, we can determine that the A[i]A[i] is a duplicate.

This is because if the value of flag[A[i]]flag[A[i]] is 11, this means that we have already encountered A[i]A[i] before in the iteration. This flagflag variable serves as a marker. When we are iterating over the elements of AA, we are marking each element A[i]A[i] being visited using the array flag.

Space Optimization: In this approach, we can optimize space as well. We are given positive elements in array AA, so we can use the same array as a flag array. Suppose, while iterating over A, to mark the element A[i]A[i], we make A[A[i]]A[A[i]] as negative. In this way, we can say that if at any time, we encounter an element x such that A[x]A[x] is negative, then this will tell us that x is a duplicate element. During the iteration, we will consider the absolute values of A[i]A[i] as the negative sign indicates that the index is marked by some previous element. This technique helps us to reuse memory and reduce space complexity.

Example Consider array A=[3,2,4,1,5,2]A = [3, 2, 4, 1, 5, 2] Iteration 1: A=[3,2,4,1,5,2]A = [3, 2, 4, 1, 5, 2] Make A[3]A[3] as negative. A=[3,2,4,1,5,2]A = [3, 2, 4, -1, 5, 2]

Iteration 2: A=[3,2,4,1,5,2]A = [3, 2, 4, -1, 5, 2] Make A[2]A[2] as negative. A=[3,2,4,1,5,2]A = [3, 2, -4, -1, 5, 2]

Iteration 3: A=[3,2,4,1,5,2]A = [3, 2, -4, -1, 5, 2] Make A[4]A[4] as negative. A=[3,2,4,1,5,2]A = [3, 2, -4, -1, -5, 2]

Iteration 4: A=[3,2,4,1,5,2]A = [3, 2, -4, -1, -5, 2] Make A[1]A[1] as negative. A=[3,2,4,1,5,2]A = [3, -2, -4, -1, -5, 2]

Iteration 5: A=[3,2,4,1,5,2]A = [3, -2, -4, -1, -5, 2] Make A[5]A[5] as negative. A=[3,2,4,1,5,2]A = [3, -2, -4, -1, -5, -2]

Iteration 6: A=[3,2,4,1,5,2]A = [3, -2, -4, -1, -5, -2] Now, we see that A[2]A[2] is already negative, so we know that the duplicate element is 2.

C++ Implementation

Java Implementation

Python3 Implementation

Time Complexity Analysis

In this method, we are iterating over the array A once, hence the time complexity is O(N)O(N).
Time complexity: O(N)O(N)

Space Complexity Analysis

If you use a secondary array flag for marking each element, then the space complexity will be O(N)O(N). If you re-use the same input array for marking each element, then the space complexity of our algorithm is O(1)O(1).

Conclusion

  • There are many methods, discussed in this article, to find duplicates in an array.
  • Some techniques like Binary search over an array or method using the sum of elements are only applicable only if there is a single duplicate element.
  • Other techniques like using a frequency table or hash tables can be applied to a more general problem.
  • The linked list cycle technique can be applied only when elements range from 1 to N.
  • From this problem, you can learn how to apply different types of techniques to the same problem. It is essential to understand each algorithm and the complexity analysis.