Core Components of Pandas


Overview

In this article, we will learn about the core components of Pandas, an open-source Python library built on top of the NumPy library. The core components of Pandas are the Series and the DataFrame, which have various applications.

Introduction 

Pandas is an open-source Python library built on top of the NumPy library. Its core components are the Series and the DataFrame. Both can be created in different ways, for example from existing files or from Python lists, dicts, etc.

Core Components of Pandas 

Series and DataFrame are the core components of pandas. Let's discuss them one by one.

Series

The Pandas Series is a one-dimensional data structure that can hold data of any type, like integers, floats, strings, Python objects, etc. It is one of the two primary data structures of pandas and is built on top of the NumPy library of Python. A Pandas series is just like a column in an Excel sheet. The axis labels are collectively referred to as the index. Labels must be of a hashable type, but they do not need to be unique. A Python object is said to be hashable if it has a hash value, an integer that remains constant throughout the object's lifetime. A Series supports both integer-based (positional) and label-based indexing and provides a wide range of methods for operations involving the index.
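The index properties described above can be illustrated with a minimal sketch. The labels and values below are made up for illustration; `.loc` and `.iloc` are the standard pandas accessors for label-based and position-based lookup.

```python
import pandas as pd

# Labels are hashable (strings here) and need not be unique: "x" appears twice
s = pd.Series([10, 20, 30], index=["x", "y", "x"])

by_label = s.loc["y"]      # label-based lookup of a unique label
by_position = s.iloc[0]    # position-based lookup of the first element
duplicates = s.loc["x"]    # a repeated label returns a Series with every match

print(by_label)
print(by_position)
print(duplicates)
```

Because labels may repeat, a label lookup can return either a single value or a smaller Series, which is worth keeping in mind when writing code that assumes a scalar.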

Creating Series

The Pandas series can be created from existing datasets such as CSV files, Excel files, etc. It can also be created from a Python list, dictionary, etc. pandas.Series() is the constructor used for creating a series. Let's look at some basic examples:

Creating series from a list or an existing dataset

Code:

Output:

Explanation: In the code example, pandas is imported as pd. A list of letters is created and stored in the li variable. A series is created from the list using the pd.Series() constructor and stored in the data variable.
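A minimal sketch of the steps this explanation describes might look as follows. The variable names li and data come from the explanation; the letters in the list are an assumption.

```python
import pandas as pd

# A list of letters (the exact values are an assumption for illustration)
li = ["a", "b", "c", "d", "e"]

# Build a Series from the list; pandas assigns a default integer index 0..n-1
data = pd.Series(li)

print(data)
print(type(data))
```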

Code:

Output:

Explanation: In the above code example, a CSV file is loaded from a GitHub link using the pd.read_csv() function and stored in the data variable. The data is stored as a Pandas Series using pd.Series(). The built-in type() function is used to check the type of the object, and the .head() method returns the first rows of the data. By default, head() returns the first 5 rows; passing a number n returns the first n rows instead.
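The following sketch follows the same steps under stated assumptions: since the GitHub URL is not given here, an in-memory CSV string stands in for the remote file, and the column names (Date, Close) and values are hypothetical. With a real file, the path or URL would be passed to pd.read_csv() in place of the io.StringIO object.

```python
import io
import pandas as pd

# Stand-in for the remote CSV file; column names and values are hypothetical
csv_text = """Date,Close
20200101,10.5
20200102,11.0
20200103,10.8
20200104,11.2
20200105,11.5
20200106,11.9
"""

# read_csv accepts a path, a URL, or a file-like object and returns a DataFrame
df = pd.read_csv(io.StringIO(csv_text))

# Selecting a single column and passing it to pd.Series() gives a Series
data = pd.Series(df["Close"])

print(type(data))
print(data.head())    # first 5 rows by default
print(data.head(3))   # first 3 rows
```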

Various Operations on Series

We can perform various arithmetic operations on series, such as addition and subtraction. To perform these operations, we can use Series methods like .add() for addition and .sub() for subtraction, which correspond to the + and - operators.

Code:

Output:

Explanation: Here, we created two series from lists with an alphabetic index using pd.Series() and stored them in the data1 and data2 variables. Then data2 is added to data1 using the .add() method.
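As a hedged illustration of this addition, the sketch below uses the names data1 and data2 from the explanation; the numbers and index letters are assumptions.

```python
import pandas as pd

# Two Series sharing the same alphabetic index (values are illustrative)
data1 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
data2 = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# Element-wise addition, aligned on index labels
result = data1.add(data2)

print(result)
```

Addition is aligned by label rather than by position, so the order of the labels in the two series does not have to match.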

Code:

Output:

Explanation: Here, we created two series from lists with an alphabetic index using pd.Series() and stored them in the data1 and data2 variables. Then, using the .sub() method, data2 is subtracted from data1.
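The subtraction can be sketched in the same way; again, only the names data1 and data2 come from the explanation, and the values are assumptions.

```python
import pandas as pd

# Two Series sharing the same alphabetic index (values are illustrative)
data1 = pd.Series([10, 20, 30], index=["a", "b", "c"])
data2 = pd.Series([1, 2, 3], index=["a", "b", "c"])

# Element-wise subtraction: data1 - data2, aligned on index labels
diff = data1.sub(data2)

print(diff)
```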

Conversion operations change the form or data type of a Pandas series. For example, .tolist() converts the series to a Python list, and .astype() changes the data type of its elements.

Code:

Output:

Explanation: Here, a CSV file is loaded from a GitHub URL. The Date column of the data is converted to a Series using pd.Series(), and this Series is then converted into a list using the .tolist() method. The type of the data is shown before and after the conversion. Here, .head() is used to get the first 5 rows.
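A sketch of this conversion is shown below. The Date column name comes from the explanation; the in-memory CSV text, its second column, and all values are hypothetical stand-ins for the remote file.

```python
import io
import pandas as pd

# Stand-in for the remote CSV file; values are hypothetical
csv_text = """Date,Close
20200101,10.5
20200102,11.0
20200103,10.8
"""

df = pd.read_csv(io.StringIO(csv_text))

# Take the Date column as a Series
dates = pd.Series(df["Date"])
print(type(dates))
print(dates.head())

# Convert the Series to a plain Python list
as_list = dates.tolist()
print(type(as_list))
print(as_list)
```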

Code:

Output:

Explanation: In the above code example, a list of numbers is used to create a pandas series with letters as index values. The data type of the series is then converted to float using the .astype() method.
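The data type conversion might be sketched as follows, with assumed numbers and index letters.

```python
import pandas as pd

# Integer Series with letters as index values (values are illustrative)
data = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
print(data.dtype)

# astype returns a new Series; the original is left unchanged
converted = data.astype(float)
print(converted.dtype)
print(converted)
```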

DataFrames

A Pandas DataFrame is a two-dimensional, size-mutable data structure with labeled axes, i.e., rows and columns. Each column is a pandas Series.
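These properties can be checked with a small sketch. The column names and values are invented for illustration.

```python
import pandas as pd

# A DataFrame built from a dictionary: keys become column labels
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [30, 25]})

# Each column is a Series
age = df["age"]
print(type(age))

# Size-mutable: a new column can be added after creation
df["city"] = ["Oslo", "Rome"]
print(df.shape)
```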

Creating DataFrames

A Pandas DataFrame can be created from existing datasets such as CSV files, Excel files, etc. It can also be created from Python lists, dictionaries, etc., using the pandas.DataFrame() constructor.

Let's look at some pandas dataframe examples:

Code:

Output:

Explanation: In the above code example, the Python list li is used to create a DataFrame with the pd.DataFrame() constructor. We can check the type of the result using the built-in type() function.
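A minimal sketch of this example, assuming a short list of letters for li:

```python
import pandas as pd

# A plain Python list (the values are an assumption for illustration)
li = ["a", "b", "c", "d"]

# Each list element becomes one row; the single column gets the default label 0
df = pd.DataFrame(li)

print(df)
print(type(df))
```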

Code:

Output:

Explanation: Here, a Pandas DataFrame is created from a CSV file loaded from a GitHub URL using the pandas.read_csv() function. The built-in type() function is used to check the type of the object, and the .head() method is used to get the specified number of rows from the top of the data.
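The sketch below mirrors this explanation under stated assumptions: an in-memory CSV string replaces the remote file, and its column names and values are hypothetical.

```python
import io
import pandas as pd

# Stand-in for the remote CSV file; column names and values are hypothetical
csv_text = """Date,Open,Close
20200101,10.0,10.5
20200102,10.5,11.0
20200103,11.0,10.8
20200104,10.8,11.2
"""

# read_csv returns a DataFrame directly
df = pd.read_csv(io.StringIO(csv_text))

print(type(df))
print(df.head(2))   # first 2 rows
```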

Various Operations on DataFrame

Various operations can be performed on a Pandas DataFrame: arithmetic operations (such as addition with the .add() method and subtraction with the .sub() method), data type conversion with .astype(), summary statistics with .describe(), etc.

Code:

Output:

Explanation: In the above example, a DataFrame of numbers with letters as index values is created from a list. With .add(), the value passed to the method is added to each element of the DataFrame. For example, 10 is added to the first element, 1, to give 11, and so on.
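A sketch of this scalar addition is given below. The first element (1) and the added value (10) follow the explanation; the remaining numbers, the index letters, and the column name are assumptions.

```python
import pandas as pd

# DataFrame of numbers with letters as index values
df = pd.DataFrame([1, 2, 3, 4], index=["a", "b", "c", "d"], columns=["value"])

# Adding a scalar applies it to every element
result = df.add(10)

print(result)
```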

Code:

Output:

Explanation: Here, two DataFrames, data1 and data2, are created using the pd.DataFrame() constructor. data2 is then added to data1 using the .add() method.
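Adding one DataFrame to another might look like the following sketch, with assumed column names and values.

```python
import pandas as pd

# Two DataFrames with the same row and column labels (values are illustrative)
data1 = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
data2 = pd.DataFrame({"x": [10, 20, 30], "y": [40, 50, 60]})

# Element-wise addition, aligned on both row and column labels
result = data1.add(data2)

print(result)
```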

Code:

Output:

Explanation: In this example, the GitHub URL of a data file is used to create the DataFrame. The built-in type() function is used to check the type of the object, and .describe() returns summary statistics of the data (count, mean, standard deviation, minimum, quartiles, and maximum for numeric columns).
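To illustrate .describe(), the sketch below again substitutes a hypothetical in-memory CSV for the remote data file.

```python
import io
import pandas as pd

# Stand-in for the remote CSV file; column names and values are hypothetical
csv_text = """Date,Close
20200101,10.5
20200102,11.0
20200103,10.8
20200104,11.2
"""

df = pd.read_csv(io.StringIO(csv_text))
print(type(df))

# describe() returns a DataFrame of summary statistics, one column per numeric column
summary = df.describe()
print(summary)
```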

Code:

Output:

Explanation: In this example, the Date column, which is initially of int data type, is converted to float data type using the .astype() method.
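A sketch of the column conversion follows. The Date column name and its integer type come from the explanation; the CSV text and values are hypothetical.

```python
import io
import pandas as pd

# Stand-in for the remote CSV file; values are hypothetical
csv_text = """Date,Close
20200101,10.5
20200102,11.0
"""

df = pd.read_csv(io.StringIO(csv_text))
dtype_before = df["Date"].dtype
print(dtype_before)

# astype returns a new Series, so assign it back to replace the column
df["Date"] = df["Date"].astype(float)
print(df["Date"].dtype)
```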

Panel

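The Panel was a container for three-dimensional data in older versions of Pandas, created using pandas.Panel(). It was deprecated in pandas 0.20 and removed in pandas 0.25, so it is not available in current releases; a DataFrame with a MultiIndex, or the xarray library, is recommended instead. A Panel had three axes:

  • items − axis 0; each item corresponds to a DataFrame.
  • major_axis − axis 1; the index (rows) of each DataFrame.
  • minor_axis − axis 2; the columns of each DataFrame.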
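Current pandas releases no longer ship Panel, so the three axes listed above are usually modelled with a MultiIndex DataFrame. The sketch below is one possible way to do this, not the only one; the frame names, labels, and values are hypothetical, and the level names simply reuse the Panel axis names.

```python
import pandas as pd

# Two DataFrames that would have been separate "items" in a Panel
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["r1", "r2"])
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]}, index=["r1", "r2"])

# Stack them under an outer index level; the dictionary keys label each item
stacked = pd.concat({"item1": df1, "item2": df2}, names=["items", "major_axis"])
print(stacked)

# Select one item (a 2-D slice) or a single cell
first_item = stacked.loc["item1"]
cell = stacked.loc[("item2", "r1"), "B"]
print(first_item)
print(cell)
```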

Applications of Different Pandas Components

  • Data Creation: We can create tabular data from data files using a Pandas DataFrame, which improves the readability of large data files and makes the data easier to analyze.
  • Data Loading/Acquisition: With the help of Pandas, we can load existing data such as CSV files, Excel files, etc., from the local disk or from the URL of a data file using built-in Pandas functions.
  • Extracting individual columns: Pandas helps to extract a selected column from a dataset. This is very useful when we are working with a large dataset and only need the information in a specific column, which we can extract by its name.
  • Extracting individual records: Sometimes we need specific records based on a condition, such as "data records of vehicles sold before 2020". Pandas lets us select such records from large datasets.
  • Data Visualization: Because pandas makes loading, selecting, and summarizing data straightforward, it becomes easy to inspect and analyze a dataset, for example by getting a summary description of the whole dataset.
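Two of the applications above, extracting a column by name and extracting records that satisfy a condition, can be sketched as follows. The vehicles table, its column names, and its values are hypothetical; the condition mirrors the "sold before 2020" example.

```python
import pandas as pd

# Hypothetical vehicle sales records
vehicles = pd.DataFrame({
    "model": ["A", "B", "C", "D"],
    "year_sold": [2018, 2019, 2020, 2021],
    "price": [9000, 12000, 15000, 18000],
})

# Extract a single column by its name (returns a Series)
prices = vehicles["price"]

# Extract the records that satisfy a condition using a boolean mask
sold_before_2020 = vehicles[vehicles["year_sold"] < 2020]

print(prices)
print(sold_before_2020)
```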

Conclusion

  • Series and DataFrame are the core components of Pandas.
  • There are different ways to create a pandas Series or DataFrame, such as from existing data files like CSV or Excel files, or from a Python list, dict, etc.
  • We can perform various operations like addition, subtraction, conversion of data, etc. on Pandas Series and DataFrame.
  • Panel was the pandas container for three-dimensional data in older versions; it has since been removed from the library.
  • Pandas components have many applications, such as data creation, data visualization, extracting specific columns or records from large datasets, and loading data from the local computer or from an external link, all of which are very useful when working with large datasets.