Boston House Price Prediction

Learn via video courses
Topics Covered

Overview

As the name suggests, Boston House Price Prediction Project predicts house prices based on various parameters like area, number of rooms, age, and more. In addition, the Boston House Price Prediction dataset has numeric values making it uncomplicated to compute and predict.

Moreover, this project can be developed for real-time applications in any city, helping developers, realtors, and buyers predict property value.

What are we Building?

In the Boston House Price Prediction Project, we are building a predictive model to evaluate the price of a house with provided parameters.

The parameters into consideration are - Per Capita Crime Rate by Town, Nitric Oxide Concentration, Weighted Distances to five Boston Employment Centres, Tax, etc.

In this project, we will use - Linear Regression, Random Forest, and decision tree classifier and compare the model accuracy.

Pre-requisites

To build the Boston House Price Prediction Project, the following pre-requisites will be a benefit -

  • Understanding of Python
  • Implementation of libraries like Pandas, Numpy, Seaborn, Matplotlib, and SciKit Learn
  • Familiarity with Machine Learning algorithms like Regression, Decision Trees, and Random Forests
  • Basic acquaintance with concepts like Data Cleaning, Processing, Model Evaluation, Model Testing, Model Training, and Feature detection

How are we going to build this?

Let us have a walkthrough of building the project -

  1. Libraries and Data - Importing the necessary libraries and dataset
  2. Analysis - Reviewing the values, data types, and columns of the dataset
  3. Visualisation - Data Visualisation to check for underlying trends and patterns in the dataset
  4. Feature Analysis - Analysing the features to check if refactoring, dropping of columns, or missing data is to be filled
  5. Data Cleaning - Filling in missing values and dropping columns that are not needed
  6. Feature Engineering - Scoring the features to assess their significance in predicting
  7. Training, Testing, and Splitting data - Getting the data prepared for prediction by dividing it into training and testing data
  8. Models - Fitting the models and then predicting outputs
  9. Evaluation - Obtaining the model score and checking their accuracy

Final Output

Given below is the score of each of the models applied to the dataset.

Linear Regression Graph - Linear Regression Graph

Decision Tree Graph - Decision Tree Graph

Random Forest Graph - Random Forest Graph

Requirements

  • Environment - Either an online or offline compiler
  • Libraries - Pandas, Numpy, Seaborn, Matplotlib, and SciKit Learn

About the data

The Boston Housing Data has 506 rows and 14 columns. All the values are either integers or float.

Below are the columns and their description -

  • CRIM - Per Capita Crime Rate
  • ZN - Proportion of Residential Land Zoned for Lots over 25,000 sq. ft
  • INDUS - Proportion of Non-Retail Business Acres
  • CHAS - Charles River Variable
  • NOX - Nitric Oxide Concentration
  • RM - Average number of Rooms
  • AGE - Proportion of owner-occupied units built before 1940
  • DIS - Weighted Distances to Boston Employment Centres
  • RAD - Index of Accessibility to Radial Highways
  • TAX - Property Tax Rate per $10,000
  • PTRATIO - Student — Teacher Ration
  • B - Proportion of people of African American descent
  • LSTAT - Percentage of Lower Status of the Population
  • MEDV - Median value of owner-occupied homes in $1000, i.e., Price of the property

Understanding the Problem Statement

Here, the goal is to build a model that can predict the price of a property based on parameters like crime rate, age of the property, air quality, etc.

Algorithms

We have used three algorithms -

  1. Logistic Regression An algorithm for classifying the data is logistic regression. It is applied to predict a binary outcome based on several independent factors.

  2. Decision Tree Classifier A supervised learning technique named a decision tree can be used to tackle classification and regression problems and is usually preferred. It is a tree-structured classifier, where internal nodes stand in for a dataset's features, branches for decision-making, and each leaf node for the classification result.

  3. Random Forest A "forest" is created by growing and combining various decision trees using the supervised machine learning method Random Forest. The Random Forest model is based on the idea that several uncorrelated models (the various decision trees) work significantly better together than they do separately.

Building the Project

  1. Load the data

Output- RandomForestRegressor

  1. Data Analysis

Output-

Output-

Output- deviation-maximum

  1. Data Visualisation

Output- average number of room distribution

Output-

homeowner distribution with low class

Output- student to teacher ratio distribution

Output- Ages of black owned portion in the town

Output- random graph

Output- crim-zn-indus

  1. Data Cleaning

Output-

  1. Feature Engineering

Output-

Output-

Output- specs-score

Output-

Output- tax-indus-ptratio

  1. Train the model

Output-

  1. Model evaluation

Output-

Output-

  1. Model Testing

Output-

Model Testing

Output-

Output-

rsquared_score

Output-

Output-

rmse-score

Output-

What’s next

To enhance this project further -

  • Web Implementation - Flask can be used to have an interface
  • Models - Here, we have used three models, other models can be tried out to improve accuracy further

Conclusion

  • Boston House Price prediction is the best regarding working on numeric data.
  • In this project, the best accuracy was that of Random Forest at 93%.