Conditional Changes in Pandas DataFrame

Overview

In this article, we will learn how to make conditional changes in a Pandas DataFrame, that is, how to replace the values in a column with new values according to a condition provided by the user. Several different methods are covered.

Introduction 

There are different methods for conditional changes in Pandas DataFrame, such as DataFrame.loc[], numpy.where(), and DataFrame.mask(), which are used to replace the values of selected columns according to the condition provided by the user. Values can also be replaced using the if condition.

Replace Values in the Column Based on the Condition

The values in a selected column of a Pandas DataFrame can be replaced with new values wherever a condition holds. Let's discuss DataFrame.loc[], numpy.where(), and DataFrame.mask() one by one with the help of some examples.

Replace Values in a Column using DataFrame.loc[]

The pandas DataFrame.loc[] property selects a group of rows and columns by label or by a boolean array. A selection made with loc[] can be both read and assigned to, so it can be used to replace the values of a column in exactly those rows where a condition is true.

Code:

Output:

Explanation:

In the above code example, a dictionary of data is created and converted to a Pandas DataFrame with the pd.DataFrame() function, and the result is stored in the df variable. Then, inside .loc[], we filter on the Price column: wherever the price of the book is greater than 500, we replace the price value with 1000.
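To make the steps described above concrete, here is a minimal sketch of the .loc[] approach. The Book column and the sample values are assumptions made up for illustration; only the Price column and the threshold come from the explanation.

```python
import pandas as pd

# Hypothetical sample data: the book names and prices are illustrative only.
data = {
    "Book": ["oxford", "cambridge", "collins", "penguin"],
    "Price": [450, 600, 500, 750],
}
df = pd.DataFrame(data)

# The boolean mask picks the rows with Price > 500;
# the second argument names the column to overwrite in those rows.
df.loc[df["Price"] > 500, "Price"] = 1000

print(df)
# Price is now [450, 1000, 500, 1000]
```

Note that a price of exactly 500 is left unchanged, because the condition is strictly "greater than".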

Replace Values in a Column using numpy.where()

Another approach to replace the values of a column based on a condition is the numpy.where() function. Called with three arguments, numpy.where(condition, x, y) returns an array whose elements are taken from x where the condition is true and from y where it is false. Called with the condition alone, it returns the indices of the elements that satisfy the condition. NumPy itself is a library for computing with multi-dimensional arrays.

Code:

Output:

Explanation: 

In this code example, we created a dictionary of keys and values and converted it into a DataFrame. The np.where() function checks the price of each book: where the price is greater than or equal to 500, the value 1000 is used instead, and the result is stored in the New_Price column.
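The following is one possible sketch of that np.where() step. The sample data is hypothetical; the New_Price column name and the threshold follow the explanation.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "Book": ["oxford", "cambridge", "collins", "penguin"],
    "Price": [450, 600, 500, 750],
})

# Three-argument form: take 1000 where the condition is true,
# otherwise keep the original price.
df["New_Price"] = np.where(df["Price"] >= 500, 1000, df["Price"])

print(df)
# New_Price is [450, 1000, 1000, 1000]; Price is unchanged
```

Writing the result to a new column keeps the original prices available for comparison.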

Replace Values in a Column by Checking Multiple Conditions

We can replace the values in selected columns by checking multiple conditions at the same time. Let's take a look at the example below:

Code:

Output:

Explanation: In this code example, we check multiple conditions together: if the price is greater than or equal to 500 and the book name is 'oxford', we change the price to 1000.
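A sketch of how two conditions might be combined is shown below. The data is hypothetical, and using .loc[] for the assignment is an assumption of this sketch; the explanation does not say which method was used.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "Book": ["oxford", "oxford", "collins", "penguin"],
    "Price": [450, 600, 500, 750],
})

# Each comparison needs its own parentheses, because & binds
# more tightly than >= and ==.
condition = (df["Price"] >= 500) & (df["Book"] == "oxford")
df.loc[condition, "Price"] = 1000

print(df)
# Only the second row satisfies both conditions:
# Price is now [450, 1000, 500, 750]
```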

Using DataFrame.mask()

The DataFrame.mask() function replaces values wherever the condition is true and keeps them everywhere else. It is the opposite of the pandas where() method, which keeps values where the condition is true and replaces them where it is false.

Code:

Output:

Explanation: In the above code example, .mask() is used to check the condition. Wherever the price is greater than or equal to 500, the price is replaced with 1000. Passing inplace=True applies the change to the existing object instead of returning a new one.
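Here is a minimal sketch of the mask() step with hypothetical data. As a choice made for this sketch, the result of mask() is assigned back to the column rather than passing inplace=True, so the example does not rely on in-place modification of a selected column.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "Book": ["oxford", "cambridge", "collins", "penguin"],
    "Price": [450, 600, 500, 750],
})

# mask() replaces the values where the condition is True.
df["Price"] = df["Price"].mask(df["Price"] >= 500, 1000)

# where() is the mirror image: it keeps values where the condition is True,
# so negating the condition gives the same result.
same_result = pd.Series([450, 600, 500, 750]).where(lambda s: s < 500, 1000)

print(df)
# Price is now [450, 1000, 1000, 1000]
```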

Apply IF Condition in a Pandas DataFrame

Selected data can be replaced or changed by using an if condition. We check whether each value satisfies the condition provided by the user and change the data accordingly. To understand how the if condition works, let's look at some examples below.

Code:

Output:

Explanation: In this code example, .loc[] is used to select the rows that satisfy a condition on the Price column. If the price is less than or equal to 500, we assign True for that row, and if the price is greater than 500, we assign False. This True or False value is stored in the New_Price=<500 column.
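One way this could look is sketched below. The prices are hypothetical, and creating the column with a default of False before the .loc[] assignment is an assumption of this sketch, made so that the column holds plain booleans from the start.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Price": [450, 600, 500, 750]})

# Start with False everywhere (the "price > 500" case) ...
df["New_Price=<500"] = False
# ... then flip the rows where the price is less than or equal to 500.
df.loc[df["Price"] <= 500, "New_Price=<500"] = True

print(df)
# New_Price=<500 is [True, False, True, False]
```

The same column can also be produced in one step with df["Price"] <= 500, since a comparison on a column already yields True or False for each row.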

Code:

Output:

Explanation: We can perform the same operation using lambda. The lambda keyword creates an anonymous function, and here i is its argument, which receives one price value at a time. For each price, if the value is less than or equal to 500, we assign "True" to it. Otherwise, we assign "False", and the resulting values are stored in the New_price=<500? column.
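A minimal sketch of the lambda version follows, assuming the lambda is passed to Series.apply() (the explanation does not name the method) and using hypothetical prices. As described, the results here are the strings "True" and "False", not booleans.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Price": [450, 600, 500, 750]})

# apply() calls the lambda once per value; i is a single price each time.
df["New_price=<500?"] = df["Price"].apply(
    lambda i: "True" if i <= 500 else "False"
)

print(df)
# New_price=<500? is ["True", "False", "True", "False"]
```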

Code:

Output:

Explanation:

Here, we are checking string data. The .loc[] indexer is used to select rows based on the specified column. The == operator tests for equality and != tests for inequality, so we check the First_Name column: if the First_Name matches the given name, "John", we assign Match, otherwise Mismatch, and the result is stored in the Match_Name column.
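The string comparison described above can be sketched as follows. The names in the sample data are hypothetical, and creating the Match_Name column with an empty placeholder first is an assumption of this sketch.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"First_Name": ["John", "Harry", "Emma", "John"]})

# Create the column first so both assignments write into an existing column.
df["Match_Name"] = ""

# == selects the rows that equal "John"; != selects all the others.
df.loc[df["First_Name"] == "John", "Match_Name"] = "Match"
df.loc[df["First_Name"] != "John", "Match_Name"] = "Mismatch"

print(df)
# Match_Name is ["Match", "Mismatch", "Mismatch", "Match"]
```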

Code:

Output:

Explanation:

In this code example, we do the same as in the previous example, but using lambda. Here i is the argument of the lambda function and receives one string value at a time. If the value satisfies the given condition, we assign Match, and if not, we assign Mismatch, and the results are stored in the Match_Name column.
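As a sketch, the lambda version of the name check might look like this, again assuming Series.apply() and hypothetical names:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"First_Name": ["John", "Harry", "Emma", "John"]})

# The lambda receives one name at a time and returns the label for it.
df["Match_Name"] = df["First_Name"].apply(
    lambda i: "Match" if i == "John" else "Mismatch"
)

print(df)
# Match_Name is ["Match", "Mismatch", "Mismatch", "Match"]
```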

Code:

Output:

Explanation:

Here, we use the "and" and "or" operators to check multiple conditions: & is used for "and" and | is used for "or". If the First_Name matches John or Harry, we assign Match to the Match_Name column, and if the First_Name matches neither John nor Harry, we assign Mismatch.
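A sketch of combining conditions with | and & is given below. The sample names are hypothetical, and the use of .loc[] with a default value is an assumption of this sketch.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"First_Name": ["John", "Harry", "Emma", "Liam"]})

# "or": the name is John or the name is Harry.
is_match = (df["First_Name"] == "John") | (df["First_Name"] == "Harry")
# "and": the name is not John and the name is not Harry.
is_mismatch = (df["First_Name"] != "John") & (df["First_Name"] != "Harry")

df["Match_Name"] = "Mismatch"
df.loc[is_match, "Match_Name"] = "Match"

print(df)
# Match_Name is ["Match", "Match", "Mismatch", "Mismatch"]
```

The two masks are exact opposites of each other, so either one is enough to label every row.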

Conclusion

  • There are several methods for conditional changes in Pandas DataFrames, including DataFrame.loc[], numpy.where(), and DataFrame.mask(), which replace the values of selected columns with new values based on a condition.
  • We can also replace the values in columns by using the if condition in Pandas DataFrame.
  • We can also apply multiple conditions at the same time and change our data.