Boston House Price Prediction

Overview

As the name suggests, Boston House Price Prediction Project predicts house prices based on various parameters like area, number of rooms, age, and more. In addition, the Boston House Price Prediction dataset has numeric values making it uncomplicated to compute and predict.

Moreover, this project can be developed for real-time applications in any city, helping developers, realtors, and buyers predict property value.

What are we Building?

In the Boston House Price Prediction Project, we are building a predictive model to evaluate the price of a house with provided parameters.

The parameters into consideration are - Per Capita Crime Rate by Town, Nitric Oxide Concentration, Weighted Distances to five Boston Employment Centres, Tax, etc.

In this project, we will use - Linear Regression, Random Forest, and decision tree classifier and compare the model accuracy.

Pre-requisites

To build the Boston House Price Prediction Project, the following pre-requisites will be a benefit -

Understanding of Python
Implementation of libraries like Pandas, Numpy, Seaborn, Matplotlib, and SciKit Learn
Familiarity with Machine Learning algorithms like Regression, Decision Trees, and Random Forests
Basic acquaintance with concepts like Data Cleaning, Processing, Model Evaluation, Model Testing, Model Training, and Feature detection

How are we going to build this?

Let us have a walkthrough of building the project -

Libraries and Data - Importing the necessary libraries and dataset
Analysis - Reviewing the values, data types, and columns of the dataset
Visualisation - Data Visualisation to check for underlying trends and patterns in the dataset
Feature Analysis - Analysing the features to check if refactoring, dropping of columns, or missing data is to be filled
Data Cleaning - Filling in missing values and dropping columns that are not needed
Feature Engineering - Scoring the features to assess their significance in predicting
Training, Testing, and Splitting data - Getting the data prepared for prediction by dividing it into training and testing data
Models - Fitting the models and then predicting outputs
Evaluation - Obtaining the model score and checking their accuracy

Final Output

Given below is the score of each of the models applied to the dataset.

Linear Regression Graph -

Decision Tree Graph -

Random Forest Graph -

Requirements

Environment - Either an online or offline compiler
Libraries - Pandas, Numpy, Seaborn, Matplotlib, and SciKit Learn

About the data

The Boston Housing Data has 506 rows and 14 columns. All the values are either integers or float.

Below are the columns and their description -

CRIM - Per Capita Crime Rate
ZN - Proportion of Residential Land Zoned for Lots over 25,000 sq. ft
INDUS - Proportion of Non-Retail Business Acres
CHAS - Charles River Variable
NOX - Nitric Oxide Concentration
RM - Average number of Rooms
AGE - Proportion of owner-occupied units built before 1940
DIS - Weighted Distances to Boston Employment Centres
RAD - Index of Accessibility to Radial Highways
TAX - Property Tax Rate per $10,000
PTRATIO - Student — Teacher Ration
B - Proportion of people of African American descent
LSTAT - Percentage of Lower Status of the Population
MEDV - Median value of owner-occupied homes in $1000, i.e., Price of the property

Understanding the Problem Statement

Here, the goal is to build a model that can predict the price of a property based on parameters like crime rate, age of the property, air quality, etc.

Algorithms

We have used three algorithms -

Logistic Regression An algorithm for classifying the data is logistic regression. It is applied to predict a binary outcome based on several independent factors.
Decision Tree Classifier A supervised learning technique named a decision tree can be used to tackle classification and regression problems and is usually preferred. It is a tree-structured classifier, where internal nodes stand in for a dataset's features, branches for decision-making, and each leaf node for the classification result.
Random Forest A "forest" is created by growing and combining various decision trees using the supervised machine learning method Random Forest. The Random Forest model is based on the idea that several uncorrelated models (the various decision trees) work significantly better together than they do separately.

Building the Project

Load the data

Output- RandomForestRegressor

Data Analysis

Output-

Output- deviation-maximum

Data Visualisation

Output- average number of room distribution

Output-

homeowner distribution with low class

Output- student to teacher ratio distribution

Output- Ages of black owned portion in the town

Output- random graph

Output- crim-zn-indus

Data Cleaning

Output-

Feature Engineering

Output-

Output- specs-score

Output-

Output- tax-indus-ptratio

Train the model

Output-

Model evaluation

Output-

Model Testing

Output-

Model Testing

Output-

rsquared_score

Output-

rmse-score

Output-

What’s next

To enhance this project further -

Web Implementation - Flask can be used to have an interface
Models - Here, we have used three models, other models can be tried out to improve accuracy further

Conclusion

Boston House Price prediction is the best regarding working on numeric data.
In this project, the best accuracy was that of Random Forest at 93%.