Which Module in Python Supports Regular Expressions?

Overview

Regular expressions in Python are supported by the "re" module. "re" is a built-in package, so it is available in every Python installation. Once we have imported the "re" module, we can start using regular expressions in our code.

What are Regular Expressions in Python?

You must have filled out different kinds of forms online, for example, registration forms, Google Forms, etc. Have you ever noticed that if you enter an email address and skip the '@' or the '.', you are immediately prompted with an error saying "Invalid Email Address"? Or, if you enter a letter instead of a digit in the mobile number field, you instantly get an error saying "Only numbers are allowed in this field"!

Have you wondered how these checks are done? How does the form know that we skipped a mandatory character or added an invalid one? This is where regular expressions come into play! Let us understand what regular expressions are in Python.

Generally, regular expressions are used to define a search pattern that can be used to search for things (or patterns) in a string. They are also known as “regex” or “regexp”. Regular expressions are a powerful language for matching text patterns. So, what kind of patterns do regular expressions search for or match in a string?

Well, they can be used to search for or match strings of text, for example, a particular word or some pattern of characters. This is easy to see in our registration form example. In general, an email address consists of a string, followed by an '@', followed by another string that contains a '.' (for example, name@example.com). So, we may say that '@' and '.' are mandatory elements in any email address field, apart from the other string values.

Hence, regular expressions are used to describe such patterns, and with their help we can match and extract any text that fits the pattern. Regular expressions, or just RegEx, are available in almost all programming languages.

Now, how can we use regular expressions in our Python code? This is where the "re" module comes in, which lets us use regular expressions in our code. Let us learn more about the "re" module in Python.

"re" Module in Python

Regular expressions are a powerful language for matching text patterns. In Python, we have a built-in package called "re" that provides regular expression support.

The "re" module is part of Python's standard library, so there is nothing to install with pip. (A separate third-party package named "regex" exists on PyPI and does need to be installed with pip, but it is a different module and is not required for anything in this article.)

To use the "re" module, we only need to import it into our code.

Importing "re" in Python:
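A minimal sketch of the import step follows. The sample text "room 42" is only an illustration and does not come from the article.

```python
import re  # part of the standard library, so no installation step is needed

# Quick check that the module works: find the first run of digits.
first_number = re.search(r"\d+", "room 42").group()
print(first_number)  # 42
```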

Now, let us look at some examples where we actually use Regular Expressions.

In Python, a regular expression search is typically written as:

match = re.search(pat, str)

Let us understand each of the terms used above in depth:

  • re.search() : In general, the re.search() method takes a regular expression pattern and a string and searches for that pattern within the string. In case the search is successful, the search() method returns a "match object". Otherwise, it returns None. Hence, the search() method is usually followed by an if-else statement to confirm whether our search operation was successful or not.

  • Pattern : The pattern describes what we are trying to find in our string. There are multiple notations available for writing a pattern, and we will learn a few of them in this article. To learn about them in detail, you may refer to Regular Expression in Python.

  • String : Here we pass the string in which we will try to find our pattern.
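The search-then-check pattern described above can be sketched as follows. The helper name find_first and the sample strings are hypothetical, chosen only for illustration.

```python
import re

def find_first(pat, text):
    """Return the first substring of text that matches pat, or None."""
    match = re.search(pat, text)  # match object on success, None otherwise
    if match:
        return match.group()
    else:
        return None

print(find_first(r"cat", "a cat sat"))  # cat
print(find_first(r"dog", "a cat sat"))  # None
```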

Let us see an example to understand our above statements.

Code :

Output :

In the above example, we have performed the below steps:

  • First, we import the "re" module by using import re.
  • Then, we have our string str, which we will use for searching.
  • Then re.search(r'How are you', str) is used to search for the phrase 'How are you' in our string str.
  • Please note that we have used r at the start of the pattern string. The r prefix makes the pattern a raw string, which means Python does not process backslashes in the literal as escape sequences. Ex: r'\n' is a backslash followed by the letter n rather than a newline, so the backslash reaches the regex engine unchanged. Regex special characters such as '.' or '?' still keep their special meaning. Raw strings are very handy for regular expressions and highly recommended.
  • The result is stored in the variable "match". In this example, since the phrase is present in the string, the if statement will be executed and 'found the matching word: ' will be printed, followed by match.group(), which is the matching text.
  • In case the string did not contain the phrase, re.search() would return None, which is treated as false in the if statement, so the if branch would be skipped.
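The steps above can be sketched as follows. The sample sentence is an assumption, and the variable is named text rather than str here so that Python's built-in str type is not shadowed.

```python
import re

text = "Hello! How are you doing today?"  # assumed sample string

match = re.search(r"How are you", text)
if match:
    print("found the matching word:", match.group())
else:
    print("did not find a match")

# The r prefix only changes how Python reads backslashes in the literal.
print(len("\n"), len(r"\n"))  # 1 2
```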

Let us take some more complex examples and understand their usage.

Code:

Output:

We have already discussed the general working of re.search in the above example, so let us focus on understanding re.search(r'ID\=\w\d\d\w', str) in this example.

  • \ -- removes the "specialness" of a character. So, if you want to match a character that has a special meaning in a pattern, such as \ or '?' or '.', you should put \ before it to make sure it is treated just as a character. Characters such as '@' and '=' have no special meaning, so escaping them is optional; the \ before '=' in the example is harmless but not required.
  • \w -- (lowercase w) matches a "word" character: a letter or digit or underscore [a-zA-Z0-9_]. Please note that it matches a single word character, and not a whole word.
  • \d -- matches a single decimal digit [0-9].

Hence, in the above example, re.search(r'ID\=\w\d\d\w', str) matched ID=A10D because ID is followed by '=', which is followed by a word character, two digits and another word character.
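A minimal sketch of this pattern in action is given below. The sample string is an assumption; only the pattern and the matched text ID=A10D come from the discussion above.

```python
import re

text = "Order received, ID=A10D, status OK"  # assumed sample string

match = re.search(r"ID\=\w\d\d\w", text)
if match:
    print("found:", match.group())  # found: ID=A10D

# '=' has no special meaning in a pattern, so the unescaped form matches the same text.
same = re.search(r"ID=\w\d\d\w", text)
print(same.group() == match.group())  # True
```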

Regular Expression Examples

We have learnt enough of the basics of regular expressions. Now we will look into the various functions provided by Python's "re" module, along with their code and use cases.

Note: Before going ahead with this article, I would highly recommend reading about the "Meta Characters" in the Scaler article Regular Expression in Python.

Match Method

The re.match() function checks for the pattern only at the beginning of the string. If it finds a match there, it returns a match object; otherwise it returns None. The group() method is used to fetch the matched text from the match object.

The syntax for the match method in Python regex is given below.

Syntax:

re.match(pattern, string, flags=0)

Let us understand each of the terms:

  • The pattern is the regular expression that we pass.
  • The string is the text in which we look for the pattern.
  • The flags modify how the pattern is matched, for example re.IGNORECASE makes the matching case-insensitive. It is not a mandatory parameter and can be skipped.
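As a small illustration of the optional flags parameter, the sketch below uses re.IGNORECASE. The sample string is an assumption.

```python
import re

text = "Python is fun"  # assumed sample string

# Without a flag the match is case-sensitive, so lowercase "python" fails.
print(re.match(r"python", text))  # None

# With re.IGNORECASE the same pattern matches at the start of the string.
ci = re.match(r"python", text, flags=re.IGNORECASE)
print(ci.group())  # Python
```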

Let us look into an example based on the match method.

Code:

Output:

Explanation:

In the above code, we have used these methods and attributes of re.Match objects:

  • match.group(): It returns the part of the string which matched the pattern.
  • match.start(): It returns the starting position of the match in the string.
  • match.end(): It returns the ending position of the match in the string, that is, the index just after the last matched character.
  • match.span(): It returns a tuple which has the start and end positions of the match.
  • match.re: It returns the pattern object used for matching.
  • match.string: It returns the string given for matching.
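The methods and attributes listed above can be tried out with a short sketch like the one below. The sample string and pattern are assumptions chosen for illustration.

```python
import re

text = "Python is fun"  # assumed sample string
match = re.match(r"Python", text)

print(match.group())     # Python
print(match.start())     # 0
print(match.end())       # 6
print(match.span())      # (0, 6)
print(match.re.pattern)  # Python
print(match.string)      # Python is fun

# match() only looks at the start of the string; search() scans all of it.
print(re.match(r"fun", text))          # None
print(re.search(r"fun", text).span())  # (10, 13)
```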

Findall (Pattern, String)

The findall method, as the name signifies, is used to find all the occurrences of a particular pattern in the string. It returns a list containing all the non-overlapping matches of the pattern.

The syntax of findall is given below:

Syntax:

re.findall(pattern, string, flags=0)

Let us look at some code to further understand the usage of findall in regex and find all the occurrences of a pattern in a string.

Code:

Output:

In the above output, the findall method finds matches for the "Python is" pattern and returns them in a list. This pattern occurs twice in the text, hence it appears two times in the list.
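A minimal sketch consistent with the description above is given here. The sample text is an assumption; only the "Python is" pattern and its two occurrences come from the discussion.

```python
import re

text = "Python is easy to read. Python is widely used."  # assumed sample text

matches = re.findall(r"Python is", text)
print(matches)  # ['Python is', 'Python is']
```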

Split(Pattern, String)

The re.split() function splits the string wherever the given pattern matches. After splitting, it returns a list of the pieces.

The syntax of split is given below:

Syntax:

re.split(pattern, string, maxsplit=0, flags=0)

Let us look at some code to further understand the usage of split in regex and split a string on a given pattern.

Code:

Output:

So, in the above example, the split method divided our string on '?'. It split the string wherever it found a '?'. Note that '?' is a special character in patterns, so it has to be written as \? to be matched literally.
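Splitting on '?' can be sketched as follows. The sample text is an assumption.

```python
import re

text = "Who are you?Where are you from?What do you do"  # assumed sample text

# '?' is escaped so that it is matched as a literal question mark.
parts = re.split(r"\?", text)
print(parts)  # ['Who are you', 'Where are you from', 'What do you do']
```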

Sub Method

The sub method in regex is used to replace all the occurrences of the RE pattern with a new string.

The syntax for the sub method is:

Syntax:

re.sub(pattern, repl, string, count=0, flags=0)

Here, "repl" is the string that replaces every occurrence of the pattern in the string.

Let us look at some code to further understand the usage of sub in regex and replace all the occurrences of a pattern in a string with another string.

Code:

Output:

In the above output, we can see that the function replaced the pattern ‘David’ with the given string ‘he’. In general, this function replaces every occurrence of the pattern in the string with the given replacement.
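A minimal sketch of this replacement is given below. The sample sentence is an assumption; only the pattern ‘David’ and the replacement ‘he’ come from the discussion above. The optional count argument of re.sub limits how many occurrences are replaced.

```python
import re

text = "David is late because David missed the bus."  # assumed sample sentence

result = re.sub(r"David", "he", text)
print(result)  # he is late because he missed the bus.

# count=1 replaces only the first occurrence.
first_only = re.sub(r"David", "he", text, count=1)
print(first_only)  # he is late because David missed the bus.
```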

Applications of Regular Expressions

Regular expressions are used in a vast number of fields. You have probably already used them once or multiple times in your projects. Let us list some common uses of regular expressions.

  • String parsing, for example, finding text that matches certain characters, catching all URL GET parameters, etc.
  • Data scraping or web scraping, like finding all the pages that contain a certain pattern in a certain order.
  • Data wrangling, which is transforming data from a “raw” format to another format.
  • Data validation, for example, checking whether an email or phone number is well-formed or not.
  • Text pre-processing for NLP.
  • Other uses include syntax highlighting, file renaming, packet sniffing and many other applications involving strings.
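The data validation use case from the list above can be sketched as follows. The patterns and the helper names looks_like_email and is_all_digits are hypothetical and deliberately simple; they only check for the '@' and '.' in an email and for digits in a number field, and are not complete validators.

```python
import re

# Simplified patterns for illustration only.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
DIGITS_PATTERN = re.compile(r"\d+")

def looks_like_email(value):
    """True if value has the shape something@something.something."""
    return EMAIL_PATTERN.fullmatch(value) is not None

def is_all_digits(value):
    """True if value consists only of digits."""
    return DIGITS_PATTERN.fullmatch(value) is not None

print(looks_like_email("user@example.com"))  # True
print(looks_like_email("userexample.com"))   # False
print(is_all_digits("9876543210"))           # True
print(is_all_digits("98765a3210"))           # False
```

Here fullmatch is used instead of search so that the whole field, not just a part of it, has to fit the pattern.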

Conclusion

In this article, we learned which module in Python supports regular expressions, and also a lot about regular expressions themselves. Let's take a brief pause and reflect on what we have learned so far!

  • Regular expressions (“regex” or “regexp”) are used to match strings of text such as particular characters, words, or patterns of characters.
  • In Python, we have a built-in package called "re" to work with regular expressions.
  • The match method looks for the RE pattern at the beginning of the string and returns a match object, or None if there is no match.
  • The search method works like match, but it finds the pattern irrespective of the position at which it is present.
  • The split method splits a string on the given pattern and returns a list.
  • The findall method is like search, but it matches all the occurrences of the pattern in the given string and returns them as a list.
  • The sub method replaces every occurrence of the pattern with the given string.
  • Regex has great applications in the fields of string parsing, data scraping, data validation, NLP, etc.