Ordinal Encoding


Overview

Ordinal encoding is a technique for transforming categorical features into a numerical format. Labels are translated to numbers based on their ordinal relationship to one another. For example, a feature with the values {low, medium, high} can be converted into {1, 2, 3}, where 1 represents low, 2 represents medium, and 3 represents high. Encoding is an essential step before training an ML model, as many ML algorithms do not support categorical data directly and require it to be converted into a numerical format.


Introduction to Categorical Data

Before getting into ordinal encoding, it is necessary to understand what categorical data is and the different kinds of categorical features.

A real-world dataset in a Data Science project generally consists of numerical and categorical features. Numerical features contain only numbers, i.e., integers or decimals. Categorical features can take only a limited, fixed set of values. These values represent categories, groups, or labels associated with the data, and they are usually stored as words or strings rather than numbers. A few examples of categorical features include -

  • An animal variable with values of dog, cat, and bird
  • A country variable with values India, USA, and Germany
  • A product variable with values Samsung, Apple, and LG

Further, categorical features/variables can be divided into two categories as described below -

  1. Ordinal Categorical Variable - An ordinal categorical variable is a categorical variable whose categories can be ordered or ranked. In other words, its categories have a clear, natural, and intrinsic ordering. A few examples of ordinal variables are economic status (low income, middle income, high income), educational level (high school, bachelor's, master's), customer feedback ratings (strongly dislike, dislike, neutral, like, strongly like), etc.
  2. Nominal Categorical Variable - In a nominal categorical variable, the categories have no ordering relationship with each other. For example, gender (male, female, transgender), colors (blue, red, green, yellow), blood group (A+, B+, O+, O-), etc.
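
The difference between the two kinds of variable can be sketched with pandas, which is an assumption here since the article itself only names sklearn. The sample values are illustrative. An ordered `Categorical` accepts comparisons and `max()`, while an unordered one rejects them:

```python
import pandas as pd

# Ordinal: the category order is meaningful, so it is declared explicitly.
education = pd.Categorical(
    ["bachelor's", "high school", "master's", "high school"],
    categories=["high school", "bachelor's", "master's"],
    ordered=True,
)

# Nominal: same container, but no order is declared.
colors = pd.Categorical(["red", "green", "blue", "red"])

# Ordered categoricals support comparisons and min/max.
highest = education.max()
above_high_school = (education > "high school").tolist()

# Unordered categoricals raise TypeError on order-based operations.
try:
    colors.max()
    nominal_supports_max = True
except TypeError:
    nominal_supports_max = False

print(highest)
print(above_high_school)
print(nominal_supports_max)
```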

Figure: Types of categorical variables

What is Encoding?

  • Encoding a categorical variable is the process of transforming its categories into a numerical format. This is often necessary before training ML models, as most machine learning and deep learning algorithms require numerical input.
  • A few of the most common techniques to encode categorical variables include - ordinal encoding, one-hot encoding, and binary encoding. The choice of technique depends on the characteristics of the variable. For example, one-hot encoding is typically used for nominal variables, and ordinal encoding is used for ordinal variables.
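
As a rough illustration of how the two most common techniques differ, the sketch below implements both in plain Python. The helper names `ordinal_encode` and `one_hot_encode` are hypothetical and written only for this example. They are not library functions.

```python
def ordinal_encode(values, ordered_categories):
    """Map each value to its 1-based rank in ordered_categories."""
    rank = {category: i for i, category in enumerate(ordered_categories, start=1)}
    return [rank[v] for v in values]


def one_hot_encode(values, categories):
    """Map each value to a 0/1 vector with a single 1 at its category's position."""
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return encoded


# Ordinal variable: one integer column that keeps the order low < medium < high.
levels_encoded = ordinal_encode(["low", "high", "medium"], ["low", "medium", "high"])

# Nominal variable: one column per category, so no order is implied.
colors_encoded = one_hot_encode(["red", "green", "blue"], ["blue", "green", "red"])

print(levels_encoded)
print(colors_encoded)
```

One-hot encoding adds a column per category, which is the price paid for not implying any order between them.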


What is Ordinal Encoding?

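  • Ordinal encoding is a technique that transforms a categorical variable into a numerical format by assigning a unique integer to each of its categories, following the order of the categories. It is sometimes loosely called label encoding, although label encoding usually assigns integers without regard to any order (scikit-learn's LabelEncoder, for instance, is intended for target labels and numbers the classes in sorted order). For example, suppose we have customer feedback data collected through a survey or an online feedback mechanism. It contains the categories very dissatisfied, dissatisfied, neutral, satisfied, and very satisfied. To encode this variable using ordinal encoding, we can assign numerical values as mentioned below -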
    • very dissatisfied - 1
    • dissatisfied - 2
    • neutral - 3
    • satisfied - 4
    • very satisfied - 5
  • Ordinal encoding assumes that the categories of a variable have a clear, natural, and intrinsic ordering. It is not suitable for nominal categorical variables, as no ordering exists between their categories. In the previous example, we assigned the lowest value, 1, to the very dissatisfied category and the highest value, 5, to the very satisfied category. This preserves the natural ordering of the categories: very dissatisfied < dissatisfied < neutral < satisfied < very satisfied is retained as 1 < 2 < 3 < 4 < 5. Now suppose we have another categorical variable that contains the categories red, blue, and green. We could assign 1 to red, 2 to blue, and 3 to green, but this may lead to incorrect results. The encoded values carry the ordering 1 < 2 < 3, so a model may treat red < blue < green as meaningful even though no such ordering exists.
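
The mapping above can be written as a plain dictionary lookup. This is a minimal sketch. The sample responses are made up for illustration, and the color codes are included only to show why the same trick misleads for a nominal variable.

```python
# Codes follow the natural order of the feedback categories.
FEEDBACK_CODES = {
    "very dissatisfied": 1,
    "dissatisfied": 2,
    "neutral": 3,
    "satisfied": 4,
    "very satisfied": 5,
}

# Illustrative survey responses (not real data).
responses = ["satisfied", "neutral", "very satisfied", "dissatisfied"]
encoded = [FEEDBACK_CODES[r] for r in responses]

# Sorting by code reproduces the natural order of the categories.
ranked = sorted(responses, key=FEEDBACK_CODES.get)

# The same approach applied to a nominal variable creates a fake order:
# the "average" of red and green lands exactly on blue, which means nothing.
COLOR_CODES = {"red": 1, "blue": 2, "green": 3}
midpoint = (COLOR_CODES["red"] + COLOR_CODES["green"]) / 2

print(encoded)
print(ranked)
print(midpoint == COLOR_CODES["blue"])
```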

Example: Encoding Categorical Data using Ordinal Encoding

Let’s understand how you can apply ordinal encoding to categorical features using Python libraries. We will use the OrdinalEncoder class provided by the sklearn library.
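
The original code listing is not available here, so the following is a minimal sketch rather than the article's own example. It reuses the customer feedback categories from the previous section with made-up sample rows. The order is passed explicitly through the `categories` parameter, because otherwise `OrdinalEncoder` sorts string categories alphabetically. Note that the encoder numbers categories from 0, not 1.

```python
from sklearn.preprocessing import OrdinalEncoder

# One feature column, one row per response (illustrative data).
feedback = [
    ["satisfied"],
    ["very dissatisfied"],
    ["neutral"],
    ["very satisfied"],
    ["dissatisfied"],
]

# Explicit category order, from lowest to highest.
order = ["very dissatisfied", "dissatisfied", "neutral", "satisfied", "very satisfied"]

# `categories` takes one list per feature column.
encoder = OrdinalEncoder(categories=[order])
encoded = encoder.fit_transform(feedback)

# Codes are floats starting at 0; add 1 to match the 1-5 scale used earlier.
codes = encoded.ravel().tolist()
one_based = (encoded + 1).ravel().tolist()

# inverse_transform recovers the original labels.
decoded = encoder.inverse_transform(encoded).ravel().tolist()

print(codes)
print(one_based)
print(decoded)
```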

Figure: Output of the OrdinalEncoder class


Conclusion

  • In ordinal encoding, categorical variables are transformed into numerical variables by assigning unique numbers to their categories based on their ordinal relationship to one another.
  • Ordinal encoding is only suitable for ordinal categorical features and can lead to incorrect results for nominal variables, where no ordering exists between the categories.