Longest Consecutive Subsequence
Problem Statement
We are given an unsorted integer array nums, and we have to find the length of the longest consecutive sequence of elements in it.
The order of elements in the array shouldn't matter as we are looking for a subsequence and not a subarray.
For example, the longest consecutive subsequence in the array [8, 2, 5, 6, 7, 3], is [5, 6, 7, 8].
Note: The algorithm should run in O(n) time, where n is the size of the input array nums.
Example 1:
Input:
Output:
:::
Explanation:
The longest consecutive subsequence in the input array is [6, 7, 8, 9]; therefore, its length is 4.
:::
Example 2:
Input:
Output:
Explanation: The longest consecutive subsequence in the input array is [-1, 0, 1, 2, 3, 4, 5], therefore its length is 7.
Constraints
Approach 1: Brute Force Approach
The Brute Force approach is the most inefficient way in terms of time complexity as in this approach, we will be considering each number in and check if that is the part of the longest consecutive subsequence or not.
Algorithm
This algorithm considers each number in the array and counts as high as the sequence goes from that. The greatest count recorded is returned as the length of the longest consecutive subsequence.
This is an optimal algorithm as it checks for every possibility and returns an accurate answer.
Implementation (C++ 20, Java 1.8, Python 3)
Implementation in C++ 20
Implementation in Java 1.8
Implementation in Python 3
The output for all of the implementations will be the same as the problem and the algorithm are the same.
Output
The longest consecutive sequence in the array [3, 4, 6, 7, 8] will be [6, 7, 8].
Complexity analysis
Time Complexity:
The for loop in the longestConsecutive() function runs exactly times and the while loop nested inside it also runs in time as currNum is getting incremented by in each iteration. Then on each iteration in the while loop, a lookup in the array is performed to check if is present in it or not, this also costs O(n) runtime.
Therefore, the runtime of this brute force algorithm is actually the same as three nested O(n) loops.
Space Complexity: O(1)
This algorithm only allocates only a handful of integers that too independent of n.
Approach 2: Naive Approach (Sorting)
This algorithm is what comes to everyone's mind upon seeing this problem, as it is the simplest to perceive and think of.
The intuition behind this algorithm is that it will be easier to find sequences of consecutive numbers in a sorted array than in an unsorted one.
Algorithm
In this algorithm, we are first sorting the array and considering each number to compare to its previous number. To store the longest length recorded and the current length of the subsequence, we will initialize two variables - longest and current.
Now, we will consider each number (except the first, as we need to compare each number to its previous number). If the current number and previous number are equal, then our sequence is neither extended nor broken, and we can simply move to the next number.
If the current number and previous number are unequal, we will check if the current number is one more than the previous (extends the sequence). If it does, then we will add to our current count (current) and move forward, and if it doesn't, we will store our current count and reset it to 1 (to include the count of the current number that broke the sequence).
In the end, we will return the longest count (longest) as the length of the longest consecutive sequence of .
Implementation (C++ 20, Java 1.8, Python 3)
Implementation in C++ 20
Implementation in Java 1.8
Implementation in Python 3
The output for all of the implementations will be the same as the problem and the algorithm are the same.
Output
The longest consecutive sequence in the array [-1, -2, -3, -4, 0, 8, 4] will be [-4, -3, -2, -1, 0].
Complexity analysis
Time Complexity:
The for loop inside the function runs exactly times, but it is dominated by the time taken to sort the array (). Therefore, the asymptotic time complexity will be , i.e., .
Space Complexity:
This algorithm only allocates only a handful of integers that are independent of .
This algorithm is better than the Brute Force algorithm as it has lesser time complexity.
Approach 3: Using a Priority Queue
In this approach, we will be using a priority queue to find the length of the longest subsequence in an array.
Note: A priority queue is a special type of queue in which all the elements are associated with a priority value. In this case, the bigger numbers are a higher priority than the smaller numbers.
Algorithm
At first, we will create an integer priority queue p_nums and copy all the elements of nums to it. A variable will be initialized and assigned the value at the top of p_nums.
Then we will run a while loop till p_nums becomes empty. Inside the while loop, we will check if the top of p_nums is equal to prev, if yes, the current number will be ignored, and we will move pop out the topmost element. Then we will check if the top of the queue (current number) is l greater than the previous number or not, if yes, our current count will be incremented by l, and the topmost element will be popped out from the queue.
If none of these conditions satisfies, we will simply reset the current count to 1 and assign the topmost element of the queue to the previous number before popping it out from the queue.
We will keep track of the greatest count recorded in each iteration, and after the end of the while loop, the greatest count will be returned as the length of the longest consecutive subsequence of nums.
Note: This algorithm only works with a min-heap priority queue but priority queues in C++ are max-heap by default.
To solve this problem we will initialize our priority queue using the greater<>, our priority queue initialization will look like this:
priority_queue<int, vector<int>, greater<>> var_name;
Here, greater<> is an object class for greater-than equality comparison, which converts max-heap priority queue to min-heap priority queue.
Implementation (C++ 20, Java 1.8, Python 3)
Implementation in C++ 20
Implementation in Java 1.8
Implementation in Python 3
The output for all of the implementations will be the same as the problem and the algorithm are the same.
Output
The longest consecutive sequence in the array [0, 2, 3, 5] will be [2, 3].
Complexity analysis
Time Complexity: O(nlogn)
The main while loop in this algorithm runs exactly time, but inside the while loop, we are also using a priority queue simultaneously, which has a time complexity of O(nlogn). Therefore, the total time complexity will be O(nlogn).
Space Complexity:
This algorithm has a linear space complexity as a priority queue is maintained throughout the algorithm of size n.
Approach 4: Using a HashSet
Using a HashSet, we can optimize our brute force solution to a great extent! None of our previous solutions has reached time complexity, but that will not be the case with this approach (making it the only acceptable solution for this problem).
Algorithm
This algorithm is more similar to the brute force approach than any other algorithm as it has only two changes.
- The numbers are stored in a HashSet, allowing O(1) lookups.
- An attempt to build a sequence from numbers that are already not a part of a longer sequence.
The point is accomplished by ensuring that the number which precedes the current number by is not present in the HashSet, because if it were present, we would have ignored the current number as that number would have been a part of a longer sequence.
For example, if nums = [4, 3, 2, 6], the longest consecutive sequence will not start from 4 or 3 as they will be a part of the subsequence starting from 2.
Implementation (C++ 20, Java 1.8, Python 3)
Implementation in C++
Implementation in Java 1.8
Implementation in Python 3
The output for all of the implementations will be the same as the problem and the algorithm are the same.
Output
The longest consecutive sequence in the array [0, 2, 3, 5] will be [2, 3].
Complexity analysis
Time Complexity:
This algorithm appears to be of O(n^2) time complexity due to a nested while loop inside a for loop, but on inspecting closely, it is realized that it is a linear time complexity algorithm as the for loop and the while loop runs for times throughout the algorithm. In other words, the while loop is amortized.
This means that the time complexity of the algorithm is O(n+n), i.e., O(n).
Space Complexity: O(n)
This algorithm has a linear space complexity as a HashSet is maintained throughout the algorithm of number of elements.
This algorithm is accepted as a solution to the problem and is more efficient than any other algorithm due to its lowest time complexity.
Conclusion
- The problem was to find the length of the longest consecutive subsequence in an array.
- We can find the solution by using Brute Force (three nested loops) approach, it is an optimal solution, but it has O(n^3), which is not preferable at all.
- The Sorting approach can also be used to find the solution, it is more efficient than the brute force approach, but still, it cannot be accepted as a solution due to its high time complexity.
- The Priority Queue approach works, but that is just unnecessary, first of all, it's difficult to intuitively come up with this approach, everyone will prefer the sorting approach, and the sorting approach is a more efficient solution of the two (as the sorting approach doesn't take any auxiliary space).
- The HashSet is the best solution for this problem as it has the lowest time complexity of all O(n).
FAQs
Q.What is a Priority Queue?
A.A Priority Queue is a special type of a queue data structure. Each of its elements is assigned a priority. Elements with a higher priority are served before the elements with a lower priority.
For example, Java's implementation of a Priority Queue is min-heap, i.e., smaller numbers are a higher priority than larger numbers, whereas, C++'s implementation of a Priority Queue is max-heap, i.e., larger numbers have a higher priority than smaller numbers.
Q.When are HashMaps used?
A.HashMaps are used when we have unique keys with their values to be stored. We use HashMaps to search for items based on their keys to save time as it has a lookup time.
Q.Why only the HashSet solution is accepted by this problem and not other approaches?
A.The HashSet solution is accepted and not the others because only the HashSet solution works with time complexity, others have a higher time complexity which doesn't satisfy the problem's needs.
Q.What are HashSets?
A.HashSets are an unordered sequence of elements that only contain unique elements. They are called unordered sets in C++ and sets in Python.
Q.Why did we use a HashSet instead of a HashMap in the last approach?
A. We could have used HashMaps in that approach, but the use of HashSet makes much more sense. A HashSet only has unique values, and there is no use of duplicate values in this algorithm. It takes less memory to store the same number of elements in a HashSet than in a HashMap, as a Hashmap requires two objects for every element (key and value).