Find Duplicates in Array

Problem Statement
Given an array A of size N + 1. Each element in array A is between 1 and N. Write a program to find one of the duplicates in array A. If there are multiple answers, then print any.
Example 1
Input: N = 5 A = [1, 2, 5, 3, 4, 3] Output 3
Explanation In this example, element 3 is repeated at position 4 and position 6 in array A.
Example 2
Input N = 5 A = [1, 2, 4, 2, 4, 4] Output 2
Explanation In this example, element 2 is repeated as well as element 4 is repeated. Hence we can output any of them.
Constraints
Problem Discussion
Before moving on to the solution, let us first analyze the constraints. In this problem, we are given that is at least . This means that the size of the input array will be at least 2. This constraint guarantees that there is at least one duplicate element in the array. Why?
This is because we are given that array is of size and each element is between and . So, if we give unique values to the first positions in the array , then the last position of array will be a duplicate in the array.
For example: Then, the maximum number of unique values in can be . . The last element will be repeated. So the answer always exists.

Approach 1: Brute Force
We can solve this problem using a simple brute force over the array . We can use two nested iterations over array to find the duplicate element.
- For each element , we can check is there an element such that and .
- If such an element exists, then we can say that is the repeated element and we have found our answer. In this way, we can find duplicate in array.
C++ Implementation
Java Implementation
Python3 Implementation
Time Complexity Analysis
In this method, we will do two nested iterations over the array of size . Hence the time complexity will be for the outer loop and for the inner loop. Since the iterations are nested, the time complexity will be .
Time complexity:
Space Complexity Analysis
In this method, we only need a constant amount of variables of iteration. We are also keeping another variable to store the duplicate element. Hence the space complexity is .
Space Complexity:
Approach 2: Using Sorting
- Firstly, we will sort the given array A in increasing or decreasing order. This will guarantee that all equal elements are placed side by side in the sorted array.
- After sorting, equal elements will be placed one after the other so we can do a simple iteration on the sorted array A and check adjacent elements.
- When iterating over the array, if we find a position where adjacent elements are equal, then we have found our duplicate element.
For example Now if we iterate over the sorted array , and compare adjacent elements at position , we will find that . This means that the duplicate element is .
C++ Implementation
Java Implementation
Python3 Implementation
Time Complexity Analysis
We can sort the array using a comparison-based sorting algorithm. This will take time. Now to search for the duplicate element, we are doing a simple iteration. This will take time.
Time complexity: .
We can also use a linear time sorting algorithm like Counting sort. In this case, the time complexity will be . The trade-off here is that counting sort requires extra space.
Space Complexity Analysis
We are not using any extra space in our algorithm. We only need a constant number of variables for iteration and searching for the answer. Hence the space complexity is .
Space Complexity:
If we use a linear time sorting algorithm like counting sort, then the space complexity will be as it requires an extra array to store the count of each element.
Approach 3: Using Frequency Array
To find a duplicate element in the array, we can make use of the observation that any duplicate number A[i] will occur more than once in array A. This means that suppose, if we know the number of times each element occurs in array A, then the duplicate element will be one of the elements whose frequency or the number of times it occurs in A is more than one.
Now how to find the frequency of each element ? To do this, we can create a frequency table over the array A. A frequency table is a map that represents the number of times each element occurs in array A. It represents the count or frequency of each element in array A.
- A frequency table can be implemented using arrays or a hash map. Generally, if the range of elements in the given array A is small, then it is preferred that we use arrays as a frequency table. This is because a hash map can have some overhead costs associated with it.
- Let's say we have a variable that represents our frequency table of size . We will create a table of size as lies between and .
- Now, we iterate over array A and increment by . In this way, we can build the frequency table.
- Now, we can simply check the frequency table. If at any point then we have found the duplicate element equal to .
For example, Let us take Now we build a frequency table It means and so on. As we can see the and , so both these elements are repeated. We can output any one of them as both are duplicates in the array.
C++ Implementation
Java Implementation
Python3 Implementation
Time Complexity Analysis
To create the frequency table, we need to do an iteration over the array . This will take time. To find the duplicate element, we need to do an iteration over the frequency array. The size of the frequency array is also bounded by , hence the time taken will be .
Time complexity:
Space Complexity Analysis
In this method, we need to create a frequency array of size . So, the space complexity for this algorithm is .
Space complexity:
Approach 4: Linked List Cycle Method
This method considers the array as a linked list. Suppose we consider the given elements of the array as nodes of a linked list. Here the of the element indicates the address of the node and it points to node . stores the address of the next node in the linked list. For example, consider the below array In this case, we have 4 linked lists, each of size 1.

Consider another example

In this way, we represent each list A in the form of a linked list. In the examples shown above, there are no duplicate elements in array A. Observe that, this means that each element occurs only once so there can be at most one incoming connection for each node. Try out some more examples to understand this clearly.
Now suppose, we add a duplicate element in array A. The main observation is that there will be at least one node in the linked list that will have multiple incoming connections. This is because there are at least two distinct positions i and j such that . So nodes and will point to the same node and that node will have multiple incoming connections.
- To find the duplicate element, we will represent array A in the form of a linked list.
- Now, to find the duplicate element, we need to find the start of the cycle. This can be done using Floyd’s cycle finding algorithm (discussed below).
Let us understand this using an example. Consider Let us say, we are index , , so the next element is at index . At index , , so the next element of the list is at index . At index , we have , so the next element of the list is at index . At index , we have , so the next element is at index . At index , we have , so the next element is at index .

Now as we can see this represents a linked list structure with a cycle. So when the elements range between and and there is a duplicate then it will form a cycle. The duplicate element will be the start of this cycle. This is because the duplicate element will point to an element that has already been visited. So, if we find the start of this cycle, then we can find the duplicate in array .
Now to find the start of the cycle, we can use Floyd’s cycle finding algorithm. The idea is to use two pointers - slow and fast pointer. We will start from index and keep on moving the slow and the fast pointer. The slow pointer will move by one step and the fast pointer will move by two steps. Now when these pointers meet we can say that both the pointers are in the cycle.

To find the start of the cycle, we will re-initialize the slow pointer to and this time, we will update both the pointers by one step at a time. Now, these pointers will end up meeting at the start of the linked list. In this way, we can find the duplicate.

Now, the question is why should slow and fast pointers meet at the start of the cycle using this method. To understand this consider the image below.

Let us say: The distance between the first element and the point of collision of the slow and the fast pointer is . Cycle length = and the Distance of the point of collision from the start of the cycle is . Distance traveled by slow pointer = so the distance traveled by fast pointer = . If slow and fast pointers meet, then the distance between them is integer multiple of . This means , where is some positive constant. Now when we reset the slow pointer, the fast pointer has moved a distance of X from starting point of the cycle. If we now update both the pointers, one at a time, this means that the slow pointer will travel a distance of and the fast pointer will also travel a distance of to meet the slow pointer at the start of the cycle. In this way, we can find the duplicate element in the array.
C++ Implementation
Java Implementation
Python3 Implementation
Time Complexity Analysis
To find the cycle, we will iterate over the cycle once. This is true because when both the pointers - slow and fast are in the cycle, the distance between them will decrease by . Since the cycle length is the time taken to find the cycle is . To find the starting point of the cycle, we will iterate over the array once. The time complexity of this algorithm is .
Time complexity:
Space Complexity Analysis
If we look at the implementation, we are only using a constant amount of space. Hence the space complexity is .
Space complexity:
Approach 5: Binary Search on Array
Suppose, we are given that there is only one duplicate element in array . This means that there are unique elements in array and one duplicate element. We can use binary search to find the duplicate element.
The idea is to perform a binary search to find the duplicate element in the range . Given the middle element of the range, we will count the number of elements smaller than or equal to the middle element. If , and . Then if represents number of . There are three cases.
- If the is greater than , then in this case we have a smaller subproblem of the original problem. Observe that, in this case, we can choose a subset of all elements that are smaller than or equal to , but the size will be greater than . So, it is like we have an array of size with elements in the range , and the size is at least . So this is a smaller subproblem of the original problem. We will shift our to .
- If the is equal to , then, in this case, we can say that there are no duplicates less than or equal to . This is because we have only one duplicate element and unique elements, so elements from to will always get counted. So we can discard half the possibilities so we shift our to .
- The case where the is less than is not possible. This is because lies between and and there is one duplicate element in . This implies we have unique elements from to so no matter what the value of mid is, the elements from to will always be counted.
Let us take an example to understand this. Let us say
- (as 4 elements are less than or equal to 3 in the A)
- (as 3 elements are less than or equal to 2 in A)
- so we set low = mid
- . Since both are equal we have found our answer and we terminate the loop. To find the number of elements that are less than or equal to a given value of , we can do a simple iteration over and count such elements.
C++ Implementation
Java Implementation
Python3 Implementation
Time Complexity Analysis
In this method, we perform a binary search over array . For each iteration of binary search, we have to iterate over array to find the number of elements less than or equal to the given mid. Hence the time complexity is .
Time Complexity:
Space Complexity Analysis
Binary search does not require any extra space. We use only a constant amount of space. Hence space complexity:
Approach 6: Count Iterations
In this method, the idea is to count the number of times, each element occurs while iterating over the array . This method relies on the fact that the frequency of duplicate elements is greater than one. In this method, we will find the duplicate element in a single pass.
- We will maintain a frequency table count with all entries as .
- Now iterate over the array A and for each element increment by . While iterating, if at any point, the number of occurrences of is greater than , then we have found the duplicate element.
Example: Iteration 1: , so we continue. Iteration 2: , so we continue. Iteration 3: , so we continue. Iteration 4: , so we continue. Iteration 5: . We will break here because we have found a duplicate element.
C++ Implementation
Java Implementation
Python3 Implementation
Time Complexity Analysis
In this method, we are doing a single iteration over array A and updating the count in the frequency table. Hence the complexity of this solution is .
Time complexity:
Space Complexity Analysis
This method requires a frequency table for calculating the answer. The frequency table is implemented using an array of size N+1. So, the space complexity is O(N).
Space complexity: .
Approach 7: Sum of Elements
This method is useful if it is known that there is only one duplicate element. If the problem specifies a new constraint, that there can be only one duplicate element, then we can use the following math trick to find that duplicate element efficiently.
We know that array contains elements from to and one duplicate element. Let us say that the duplicate element in the array is .
Now we can write, (Sum of elements from to ) D = Sum of elements in array From this, we can derive = Sum of elements in array Sum of elements from to . We also know that the sum of elements from to is .

The only thing we need is the sum of elements in array , which can be calculated using a simple iteration over the array. In this way, we can find the required duplicate element in array . Note that, this method is only useful when it is given that there is only one duplicate element. If this is not the case, then this method will fail as the equation defined above does not holds good.
C++ Implementation
Java Implementation
Python3 Implementation
Time Complexity Analysis
To find the duplicate in array , we need the sum of elements of array A. This is done using a simple interaction over the array in . Once we know the sum, we can find the duplicate element in time. Hence the time complexity is .
Time Complexity:
Space Complexity Analysis
We do not need any extra space to find the answer. We only need a variable to calculate the sum.
Space complexity:
Approach 8: Marker
In this approach, we will make use of the information given to us, that each element lies in between and .
Suppose we take a boolean array of size N with all entries as . Now for each element , we assign . Now as we are doing the iteration, if we arrive at an element A[i] such that is already one, then, in this case, we can determine that the is a duplicate.
This is because if the value of is , this means that we have already encountered before in the iteration. This variable serves as a marker. When we are iterating over the elements of , we are marking each element being visited using the array flag.
Space Optimization: In this approach, we can optimize space as well. We are given positive elements in array , so we can use the same array as a flag array. Suppose, while iterating over A, to mark the element , we make as negative. In this way, we can say that if at any time, we encounter an element x such that is negative, then this will tell us that x is a duplicate element. During the iteration, we will consider the absolute values of as the negative sign indicates that the index is marked by some previous element. This technique helps us to reuse memory and reduce space complexity.
Example Consider array Iteration 1: Make as negative.
Iteration 2: Make as negative.
Iteration 3: Make as negative.
Iteration 4: Make as negative.
Iteration 5: Make as negative.
Iteration 6: Now, we see that is already negative, so we know that the duplicate element is 2.
C++ Implementation
Java Implementation
Python3 Implementation
Time Complexity Analysis
In this method, we are iterating over the array A once, hence the time complexity is .
Time complexity:
Space Complexity Analysis
If you use a secondary array flag for marking each element, then the space complexity will be . If you re-use the same input array for marking each element, then the space complexity of our algorithm is .
Conclusion
- There are many methods, discussed in this article, to find duplicates in an array.
- Some techniques like Binary search over an array or method using the sum of elements are only applicable only if there is a single duplicate element.
- Other techniques like using a frequency table or hash tables can be applied to a more general problem.
- The linked list cycle technique can be applied only when elements range from 1 to N.
- From this problem, you can learn how to apply different types of techniques to the same problem. It is essential to understand each algorithm and the complexity analysis.