Naive String Matching Algorithm

Learn via video course
FREE
View all courses
DSA Problem Solving for Interviews using Java
DSA Problem Solving for Interviews using Java
by Jitender Punia
1000
4.9
Start Learning
DSA Problem Solving for Interviews using Java
DSA Problem Solving for Interviews using Java
by Jitender Punia
1000
4.9
Start Learning
Topics Covered

Problem Statement

The problem states that we are given a text string (let's say text) having the length n and a pattern string (let's say pattern) having the length m. We need to write a function that takes these two strings (pattern, and text) and prints all the occurrences of the pattern string in the text string.

Note:

  • We can assume that the value of m will always be less than the value of n.
  • We can assume 0-based indexing.

Refer to the Example and Example Explanation sections for more details and the Approach section to understand the working of the naive string matching algorithm.

Example

Example 1: The input string is:

The input pattern is:

Output: The pattern was found at index: 11.

Example 2: The input string is:

The input pattern is:

Output:

Example Explanation

Before getting into various approaches to the Naive String Matching Algorithm, let us briefly discuss strings in brief.

A string is an immutable data type that is used to store the sequence of characters. Strings are one of the most widely used data types of any programming language. A string can be easily created using quotes (either single quotes or double quotes).

Example:

or Example:

Now, let us take a few examples to understand the algorithm i.e. naive string matching algorithm.

In the example above, the input string is "Hello World! and the pattern that needs to be searched is "World!". We can see that the size of the pattern is 6 (i.e. m) and the size of the input text is 12 (i.e. n).

We can start searching for the pattern in the input text by sliding the pattern over the text one by one and checking for a match.

So, we would start searching from the first index and slowly move our pattern window from index-0 to index-6. At index-6, we can see that both the W's are matching. So, we will search the entire pattern in the window starting with index-6.

Hence, we will find a match and return the starting index of the pattern as the answer.

For better clarity of the window search, please refer to the image provided below.

window-searching-pattern

Best Case

The best case in the naive string matching algorithm is when the required pattern is found in the first searching window only.

For example, the input string is: "Scaler Topics" and the input pattern is "Scaler. We can see that if we start searching from the very first index, we will get the matching pattern from index-0 to index-5. So, we would return 0 as the answer.

In the best-case scenarios, we have only searched a smaller section of the entire string. We need not traverse the entire input string.

Note: In the best case, we need to perform the O(1 + m)a number of comparisons, m for comparing the pattern string in the window and 11 comparison for the first character of the input string text with the pattern string's first character. Here, m is the size of the input pattern string.

Worst Case

The worst case in the naive string-matching algorithm is when the required pattern cannot be found in the entire input string.

For example, the input string is: "Scaler Topics" and the input pattern is "InterviewBit. We can see that if we start searching from the very first index and traverse the entire input string but we cannot find any matching.

In the worst-case scenario, we need to traverse the entire input string.

Note: In the worst case, we need to perform the O(m*(n-m+1)) number of comparisons, where n is the size of the input string and m is the size of the input pattern string.

Features

String searching algorithms like the naive string matching algorithm are widely used algorithms in computer science. They are extensively used to search strings and patterns in notepad/word files or browsers or databases, etc. They are used in search engines to search for matching results.

Input/Output

The input text is:

The input pattern is:

Output:

Example 2: The input text is:

The input pattern is:

Output: The pattern was found at index: 7.

Constraints

  • 1 <= n <= 30
  • 1 <= m <= 30

In some problems, you may find the number of test cases represented by t. So, we only need to call the searchPattern() function t-times.

Algorithm 1 - Naive Pattern Searching

One approach to search a certain pattern in the input string can be using two loops (nested loops). We will use all the possible placements of the pattern string in the input text and check whether the current positioning matches the input pattern or not. If the current positioning matches with the input pattern then we would return the first index of the matching window (of input text). If all the possible positioning does not match with the input string then we would return 1-1.

Note: This approach tests all the possible placements of input pattern [1.......m] relative to input text [1......n]. We then try to shift s = 0, 1.......n-m, successively and for each shift s, we will compare both the input and pattern string.

Refer to the next sub-section to learn the understanding pseudo-code of the naive string-matching algorithm.

Algorithm

The pseudo-code for the same:

  1. We will initialize an outer loop that will run from i = 0 to i less than or equal to (n-m). Since, after the index n-m, our searching space (or searching window) will exceed the length of the input string.
  2. After the outer loop, we will initialize an inner loop that will run from j = 0 to j less than m as this inner loop will range from 0 to the size of the input pattern.
  3. Now, first we would check if the current input text character matches with the current input pattern character or not.
    • If a match is not found, we would break from the loop, and the inner loop will move one index far and start searching in the next window.
    • If a match is found, we would try to match the entire input pattern with the current window of the input text.
      • If we have traversed the entire searching window in the input text and we have found our pattern string, we would print the result.
      • Else, we would search in the next window.

Please check the code implementation section for more clarity.

Code Implementation

Let us the code implementation of the same in various languages:

C++ Code:

Java Code:

Python Code:

Output

In the naive string matching algorithm, we are traversing the entire input string text until we found a match with the input pattern. For searching the pattern, we are searching the window size of m if the first character of the window matches the input string text.

Time complexity

In the naive string matching algorithm, the time complexity of the algorithm comes out to be O(n-m+1), where n is the size of the input string and m is the size of the input pattern string.

Space complexity

In the naive string matching algorithm, the space complexity of the algorithm comes out to be O(1).

Algorithm 2 - Alternate Approach - using 2 pointers

Another approach to the string pattern searching problem can be using two-pointers. We can use the degenerating property which means patterns having the same sub-patterns appearing in the pattern more than once.

Whenever we detected a mismatch, we can already sense or see the characters that will be present in the next search window. We can use this information and this information will help us to avoid searching for repeated characters present commonly in the various windows.

Algorithm

The pseudo-code for the same:

  1. We would initialize two pointers i and j. i will point to the input text string and j will point to the input pattern string
  2. Start comparing the pattern with the text of the current window.
  3. We will keep incrementing both the pointer (i and j) until both the string matches.
  4. When a mismatch is encountered:
    1. We already know that the character of the pattern from index-0 to index-j - 1 matches with the text string from index i - j to index i - 1. Apart from this, we also know that the pattern[0.. j-1] is both a proper suffix and a proper prefix.
    2. So, we do not need to match both these character sequences because these two (proper suffix and proper prefix) will always match.
    3. We would then perform the same algorithm for the next window but in the new window, we already know about the proper prefix and proper suffix.

Note:

  • Proper Prefix: A proper prefix of the string will be a subset of the string using only the beginning portion of the pattern. A subset is any possible combination of the original set (string).
  • Proper Suffix: A proper suffix of a string would be a subset of the pattern with elements taken only starting from the last character.

We would also store the longest proper prefix inside a string (namely - longestProperPrefSuf) which is also a suffix so that the processing becomes easier.

Code Implementation

Let us the code implementation of the same in various languages:

C++ Code:

Java Code:

Python Code:

Output

In this algorithm, we are calculating the longestProperPrefSuf first, then utilizing it to get the desired answer.

Time complexity

In the two-pointer algorithm, the time complexity of the algorithm comes out to be O(n), where n is the size of the input string.

Space complexity

In the two-pointer algorithm, we are creating an array for proper prefixes or suffixes, the space complexity of the algorithm comes out to be O(m), where m is the size of the input pattern string.

Advantages & Disadvantages of Naive String Matching Algorithm

Let us first discuss some of the advantages of the naive string-matching algorithm.

  • In the naive string matching algorithm, we do not need any extra space to find the starting index of the pattern in the string.
  • In the naive string matching algorithm, we can perform the checking and comparison in any direction (from left to right or from right to left).
  • In the two-pointer approach, we need to create a proper suffix or proper prefix array. In the naive string-matching algorithm, we do not require any pre-processing.

The only disadvantage of the naive string matching algorithm is that it is an inefficient searching algorithm as it requires O(n-m+1) time complexity, where n is the size of the input string and m is the size of the input pattern string.

Conclusion

  • The naive string matching algorithm uses two loops (nested loops). It uses all the possible placements of the pattern string in the input text and checks whether the current positioning matches the input pattern or not.
  • In the naive string matching algorithm, we do not need any extra space to find the starting index of the pattern in the string.
  • In the naive string matching algorithm, we can perform the checking and comparison in any direction (from left to right or from right to left).
  • In the two-pointer approach, we need to create a proper suffix or proper prefix array. In the naive string-matching algorithm, we do not require any pre-processing.
  • Another approach to the string pattern searching problem can be using two pointers. We can use the degenerating property which means patterns having the same sub-patterns appearing in the pattern more than once.
  • In the naive string matching algorithm, the time complexity of the algorithm comes out to be O(n-m+1), where n is the size of the input string and m is the size of the input pattern string.
  • In the naive string matching algorithm, the space complexity of the algorithm comes out to be O(1).
  • In the two-pointer algorithm, the time complexity of the algorithm comes out to be O(n), where n is the size of the input string.
  • In the two-pointer algorithm, the space complexity of the algorithm comes out to be O(m), where m is the size of the input pattern string.