Segmented Sieve -Scaler Topics

Overview

Given a number N, print all the prime numbers less than N. Prime numbers are the numbers which are divisible by 1 and itself only.

Example:

Input:

Output:

Example Explanation:

The given input number is 12, and prime numbers which are smaller than 12 are only 2, 3, 5, 7, 11. So, they are printed.

Constraints:

$2<=N<=10^{6}$

Approach 1: Brute Force

Just run the loop from 1 to N
For every number check whether it is a prime or not.
If the number is prime, then print the number.

Time Complexity: $O(N^2)$

For the first loop, it takes O(N)
The nested loop to check whether it is a prime number takes O(N)
So, the overall time complexity sums up to be O(N^2)

Space Complexity: O(1)

No extra space is needed for this approach.

This approach fails for the large test case(from constraints) as we get a TLE verdict, So to optimize it we use the famous algorithm Sieve of Eratosthenes. This approach helps in optimizing the time complexity for finding the primes smaller than N.

Approach 2: Using Sieve of Eratosthenes

Explanation with an example: Let's take an example N=25, find all the numbers which are less than 25 and are primes.

Firstly initialize a boolean array, sieve of size 26 with all values as true.
Then start traversing the iterator i from 2. If sieve[i] is found to be true, then mark all the multiples of i as false in the sieve array, it means that the numbers which are marked as false can't be prime since they've i as their factor.

In our example all the numbers which are multiples of 2 are marked as false in the sieves array, the array looks like this:

Now as we increment i to i, so i=3, since the sieve[3] is true, we mark all its multiples as false i.e,
After incrementing i it is 4 now, but sieve[4] is already false, which means that all the factors of 4 are already marked false. So, increment i. 11. Now i=5, since sieve[5] is not false, mark all its multiples as false, the array looks like this:
We can observe that non-prime(composite) numbers below 25 are already marked as false, so we can say that loop can run till $\sqrt{25}$ which is till 5, so the loop traverses till $\sqrt{N}$ (first optimization).
So, Now let's talk about how we mark i's multiples as false.

Observe that for i=3, we need to mark [6, 9, 12, 15, 18, 21, 24] we can notice that 6 is already been marked as false by 2's multiples, so we can start this loop from i^2^(second optimization in the sieve of primes) instead of starting the second loop from 2*i.
So the final array sieve has all the prime numbers less than 25 as true.

Segmented Sieve Data Structures

Output:

Algorithm:

Below is the algorithm to be followed for Sieve of Eratosthenes:

Initialise a boolean array of size N+1 with all values as true.
Run a for loop from 2 to $\sqrt{N}$
if sieve[i]==True
Mark all the multiples of i in the sieve as false.
This can be done by running a nested loop from i^2^ to n, where we increment the iterator by i, since we are marking its multiples as false.
Else increment i.
Finally, return all the indices where sieve[i] is True, they are the prime numbers smaller than n.

Implementation:

The above discussed algorithm has been implemented in this section:

Python implementation of simple sieve:

Java implementation of simple sieve:

C++ implementation of simple sieve:

Output:

Explanation: All the prime numbers which are less than 25 are printed as the result, this is achieved by using the Sieve of Eratosthenes algorithm.

Complexity Analysis:

Time Complexity: O(N log(logN))

It is assumed that time taken to mark as false takes constant time, then the number of times the loop runs is:
$(N/2 + N/3 + N/5 + N/7 +....p)$
By solving this we get:
$N (1/2 + 1/3 + 1/5 + 1/7 ....)$
The Harmonic progression of the sum of primes is log(log(N)).
Therefore the overall time complexity is O(N log(log(N)))

Space Complexity: O(N)

A boolean array of size N for keeping track of elements that are prime and smaller than N. Steps to be followed during Competitive Programming:

In this approach, generally if the queries are given for each test case and if the constraints are huge, we get a TLE verdict. So to overcome that we can pre-compute the sieve array only once. That can be done in a function/ globally before starting the main program.

And we can call this function and check for prime numbers in the given range.

For checking the prime numbers in the given range takes about O(1) from the sieve array which is already formed, however for computing the sieve it takes O(10^6^) if the size of the sieve is 10^6^.

This method helps in cases where queries are given for each test case, we need to print the prime numbers.

There are many questions based on this sieve, a few of them are :

Print K^th^ prime number

Print the K^th^ prime factor for the given number

Count the number of primes in the given range All these can be solved using a Sieve of Eratosthenes in the optimized and efficient way.

Segmented Sieves

Find the primes in the given range.

Given the range, Left ( L ) and Right ( R ), print all the numbers which are prime in the given range. Prime numbers are numbers which are divisible by 1 and itself.

Example:

Input:

Output:

Explanation: The prime numbers present in the given range i.e, from 10 to 30 are printed.

This problem uses the Segmented Sieve concept. To understand this in a better manner, the basic prerequisite is the simple sieve concept, also called as Sieve of Eratosthenes which we discussed earlier.

Constraints:

$1 <= L, R <= 10^{12}$
$1 <= R-L <= 10^{6}$

For the given constraints simple sieve, can't be used (we get memory error in many languages) as we exceed the maximum limit for the boolean array. So, we use the concept of the segmented sieve to overcome the issue. The main difference between segmented sieve and sieve is that in segmented sieve we limit the range from L to R instead of 1 to R.
This algorithm uses Sieve, which is discussed above to find primes smaller than or equal to $\sqrt{R}$ Our next question is why only up to $\sqrt{R}$ , because non-prime numbers till range R, will be marked as false by numbers smaller than $\sqrt{R}$ already.

Approach: (Using Segmented Sieve)

Generate all the primes in the range $\sqrt{R}$ .
Now instead of keeping track of all primes just keep track of primes only between the given range by creating a boolean (dummy) array of size (R-L+1) and initialising every element as true.
Each index in the dummy array represents L+i, so now the question is divided into segment.
Now from each element p, in the generated primes, mark its multiples as false in the dummy array.
This can be done by running a loop on dummy array. The starting index i.e. first multiple in the range [L-R] can be calculated as:
low_idx= ((L//p)p) (integer division)
if the low_idx is less than L, then add p.
Now the low_idx can be max(low_idx, p*p).
For example if we take the range as [110, 130], here for the prime number 11, multiples start from 110. We can simply start from 121, because start index is max(low_idx, p*p)
The below image explains more clearly:
After computing this we get the primes in the given range from the dummy array(which are marked as True).

Algorithm:

Firstly generate all the primes smaller than the $\sqrt{R}$ which is done using a sieve of primes and store the result in an array. This is the prerequisite for the next steps.
Initialise the boolean array/dummy array of size (R-L+1) and initialize all the values to true.
The idea is to partition the interval [L, R] into smaller segments and calculate the primes only upto $\sqrt{R}$ .
This is where we optimize the space complexity. This ensures that the algorithm runs for a large value of R (up to 10^12^) because it takes space of the order of 10^6^.
For all primes p in the generated primes array which is calculated from step-1, we cancel out its multiples in the range [L, R**] in the dummy array, by marking them as false.
Get the first multiple in the range, and start cancelling it's multiples smaller than or equal to R.
The element at index 0 represents L, and the element at the last index represents R.
Iterate over the dummy array, if the dummy array value is true print (i+L), which is the prime number in the given range.

So unlike the simple sieve we discussed earlier, we check for all multiples of every number less than or equal to $\sqrt{R}$ , this reduces the space complexity as well considered to be more time-efficient.

Implementation of the above approach:

Python implementation of segmented sieves:

Java implementation of segmented sieves:

C++ implementaion of segmented sieves:

Output:

Explanation: The prime numbers(divisible by only 1 and itself) in the given range i.e, from 110 to 130 are printed using the segmented sieves algorithm.

Complexity Analysis:

Time Complexity: $O((R-L+1) log log( R ) +$ \sqrt{R} $log log($ \sqrt{R} $))$ Log (log $\sqrt{R}$ ) can be negligible and the final time complexity will be

 O((R-L+1) log log( R ) +  $\sqrt{R}$)

For iterating over the dummy array takes $O(R-L+1)$ times and generating the primes using the sieve takes around $\sqrt{R}$
However this runs faster than it looks, and is considered to be the most efficient algorithm for finding the primes in the given range.
So, let's assume the worst-case i.e, if the R is $10^{12}+1200$ and L is some $10^{12}+410$ , then also this segmented sieve doesn't show TLE because for segmented sieve we take $\sqrt{R}$ which is $10^{6}$ ^, and for boolean array, we take $10^{6}$ , so the code is efficient for large test cases also.

Space Complexity: O(R-L+1)

A dummy array of size R-L+1 is used for keeping track of the prime numbers in the given range.

Conclusion

We observed that both the segmented sieve and the simple sieve algorithms have the same time complexity, but what affects most is the space optimization which is done in the segmented sieve as we use a dummy array to store the values only in the given range and return prime numbers.
If the constraints are very large, the segmented sieve algorithm consumes less space and therefore it is considered to be more efficient than simple sieve.
We have used a boolean array in the sieve algorithm or the segmented sieve algorithm to keep track of the prime numbers according to the given input, instead of that we can also use an integer array by keeping 0 and 1 as values for indication of prime or non-prime number.
Why is the boolean array more preferable?
It is because a boolean array when declared globally can take up to 10^8^ values, if it is in the function it can take up to 10^7^ which is advantageous when the input is very huge.
This can be done in C++ and also in java.

HAPPY CODING!!