Iterating Over a Dataframe in Pandas

Learn via video courses
Topics Covered

Overview

In this article, we will be going to learn how to iterate over a DataFrame. There are different methods like iterrows(), .itertuples(), .iteritems(), using loc[] etc. to iterate over rows and columns of the DataFrame. By creating a list of columns, iteration over DataFrame can also be performed. Sometimes it's not necessary to iterate over a DataFrame, so an alternative method i.e., vectorization is used.

Introduction

There are different methods to iterate over rows and columns of the DataFrame. The .iterrows(), .itertuples(), and .items() methods are used to iterate over rows of the Dataframe and return tuple objects. The loc[], iloc[], and attribute indexing methods can also be used to perform iterations. We can iterate over columns of the DataFrame by creating a list of columns.

Methods to Iterate Over Rows

There are various methods to iterate over rows, some of which are discussed below. Here we try to iterate the below dictionary DataFrame.

Code#1:

Output:

Explanation: Pandas are imported as pd, and dictionary data is created. DataFrame df is created using pd.DataFrame().

Using iterrows()

The Pandas .iterrows() function can be used to iterate over rows of the DataFram. It returns a tuple-based object, and each tuple consists of the index and the data of each row from the data frame. The .iterrows() method does not maintain data types.

Code#2:

Output:

Explanation: In the above code, i and j pointers are used to iterate over a row. First, i iterate over the row index i.e 0,1,2 etc., and then j iterate over row data like we have the first-row containing Book_Name Oxford, Author Jhon Pearson, and Price 350.

Code#3:

Output:

Explanation: In the above code example, we have two pointers i and j. The first one iterate over the row index, and the second one iterate over the row data as pandas series using the column name.

We can also iterate over external files, such as CSV files, Excel files, etc., using this method.

A small Dataset is loaded as a CSV file from GitHub

The data looks like this:

Using iterrows()

Code#4:

Output:

Explanation: Here CSV data is loaded from GitHub using the pd.read_csv() function. The .shape is used to print the shape of the data, i.e. the number of rows and columns data contains, and using the .head() function, we get the starting rows of the data, by default, it takes the first five rows from the start.

Code#5:

Output:

Explanation: Like the above examples, in this code example, i contains the row index, and j iterates over the data of each row.

Using iteritems()

The iteritems() function is used to iterate over each column as key, value pair with the label as key, and Series object for the column value.

Code#6:

Output:

Explanation: Here, iteritems() is used for iterating over the columns. We have two pointers i and j, the first one is for the column and the second one is for the series items in each column. For example, Book_Name columns contain 'Oxford', 'Arihant', 'Pearson', 'Disha', and 'Cengage' items.

Code#7:

Output:

Explanation: Here, Data is loaded from the GitHub using pd.read_csv() function, i, and j pointers are used to iterate over the columns of the data. i is for the columns and j is for the items in the column.

Using itertuples()

The itertuples() function is used to iterate through the rows; this function returns a tuple for each row in the DataFrame. The row's matching index value will be the first element of the tuple, and the remaining values are the row values.

Code#8:

Output:

Explanation: In the above code, itertuples() is used to iterate over rows, i is a pointer that iterates over each row item. Like index 0 contains Oxford Book_Name, Jhon Pearson Author, and 350 price.

Code#9:

Output:

Explanation: Here, we are iterating over rows of the CSV file, the same as in code#8.

Using the index attribute of DataFrame

We can use the index to iterate over a DataFrame. Let's see the working with the help of the example given below:

Code#9:

Output:

Explanation: In the above code example, we are using the index of the data. For example, we have Oxford at the 0th index, Arihant at index 1, and so on in the Book_Name column.

Using loc[]

Pandas DataFrame.loc attribute accesses a group of rows and columns set of rows and columns in the given DataFrame using the labels or a boolean array. .loc[] is label based technique for data selection, which means we have to name the rows and columns which we want to access. it takes a string, a list of strings, and slice notation using strings as input. It includes the last element of the range passed.

Code#10:

Output:

Explanation: In the above code example, we iterate over each row of the data like the 0th row has items Oxford, Jhon, Pearson, and 350.

Code#11:

Output:

Explanation: In this example, we are using for loop and inside the .loc[], we passed the iterator i and the column city, whose data we want to access. It iterates over each row of this column.

Using iloc[]

Rows can be retrieved from a Data Frame using a special method given by the Pandas library. It is an index-based technique for data selection, which means we have to pass the index of the row or column which we want to access. It takes integers, a list of integers, or slice notation consisting of start and stops value as input. The .iloc[] function does not include the last element of the range passed.

Code#12:

Output:

Explanation: In the above code example, We have to specify the index inside the iloc[] function of row and column. For loop is used for iteration as iterator i iterates over a specified row and column of the data.

Code#13:

Output:

Explanation: In this example, Iterator i using for loop iterates over the data, and by using the iloc[] function, it will first go to the 0th row and 4th column then 1st row and 4th column, 2nd row and 4thh column and so on.

Using apply()

The **.apply() **function of Pandas is used to perform iterations on columns/rows. It uses vectorized techniques to speed up the execution of both simple and complex operations.

Code#14:

Output:

Explanation: In the above example, We concatenate the two rows of the data using the .apply function. Here, lambda is used to iterate over the data like for loop does and row is an iterator.

Methods to iterate over Columns

Create a list of columns

To iterate over the columns, we create a list of DataFrame columns and iterate through that list to extract the DataFrame columns.

Code#15:

Output:

Explanation: Pandas are imported as pd and pandas DataFrame df is created using the dictionary data.

Now we iterate through columns. First, we create a list of DataFrame columns and then iterate through the list. Code#16:

Output:

Explanation: Here, DataFrame is converted into a list and stored in the col variable. Then using the index, specified data is printed. Like here we use df[i][1], and iterator i starts for loop from 0 so, we get the element at the 0th row and 1st column the element at the 1st row and 1st column, then the element at the 2nd row and 1st column and so on.

Code#17:

Output:

Explanation: Here, we are using the top 5 rows of the data for reference and created a new data df of these rows. Then df is converted into a list and using for loop and specifying the index value, we iterate over the data as discussed in the explanation part of code#16.

Why Iterating over Dataframe Rows is a Bad Idea?

According to the pandas, iterating over DataFrame is not a good idea in most cases it's not needed, and DataFrame consists of lots of data records, over 1000 records, so iterating over these large DataFrame slows down the process. To avoid this, Pandas recommends vectorization and the .apply() method to apply specific formulas if needed. Iterating over rows without any need is not recommended.

Vectorize

With the "vectorize" technique, you can execute a single action across an entire Series, Index, or even DataFrame. In simple words, it is a technique to perform python operations on the data without using loops. It uses python's inbuilt functions and operands.

You will learn how to square a number in a column by using the example below. The calculation would be run as many times as there are records in the column if you were to iterate over each row. However, by vectorizing, you can directly apply a transformation to a column.

Code#18:

Output:

Explanation: In the above code example, DataFrame df is created from a dictionary. A new column New_Sales is created by squaring the data of the previous Sales column using vectorization.

Want to Explore Further? Our Data Science Training Course Delivers In-Depth Insights Beyond the Blog. Enroll Now for a Rich Learning Experience!

Conclusion

  • There are different methods to iterate over rows and columns of the DataFrame.
  • By using iterrows(),iteritems(),loc[],iloc[] and itertuples() we can iterate over rows of the DataFrame.
  • We can also use apply () by creating a list of columns to iterate over columns and columns of the DataFrame.
  • In most cases, iteration is not necessary, and it's a good idea to perform an iteration over DataFrame because DataFrame consists of large datasets. So iterating over these large datasets can slow down the process.
  • To speed up the process pandas recommend vectorize and apply() methods instead iterating over DataFrame.