ON
Submitted by:
Farhan Ahmad
Registration No : 11916199
(OCT, 2022)
ACKNOWLEDGEMENT
I have put considerable effort into this project. However, it would not have been
possible without the kind support and help of many individuals and
organizations. I would like to extend my sincere thanks to all of them.
Flight Fare Prediction Using Machine Learning
Overview
In this article, we analyze a flight fare dataset using essential exploratory
data analysis techniques and then use machine learning to predict the price of
a flight from features such as the airline, the arrival time, the departure
time, the duration of the flight, the source, the destination, and more.
About the dataset
1. Airline: The name of the airline, such as Indigo, Jet Airways, Air India,
and many more.
3. Source: The name of the place where the passenger’s journey starts.
5. Route: The route by which the passenger travels from the source to the
destination.
8. Total_Stops: The number of places where the flight stops over the whole
journey.
9. Additional_Info: Information about food, the kind of food, and other
amenities.
10. Price: The price of the flight for the complete journey, including all the
expenses before boarding.
The lifecycle of a data science project is divided into four parts:
Now, let’s start with the task of using machine learning to predict flight
fares. I will start by importing all the necessary libraries that we need for
this task and loading the train dataset.
1) Importing libraries
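A minimal sketch of this step, assuming pandas is available. The report does not name the data file, so an inline sample stands in for it here; its rows are invented placeholders that only reuse column names from the dataset description, and `load_train` is a hypothetical helper name.

```python
import io

import pandas as pd

# Hypothetical stand-in for the training file: these rows are invented
# placeholders, not values taken from the real dataset.
SAMPLE_CSV = """Airline,Source,Route,Total_Stops,Additional_Info,Price
Airline A,City X,X to Y,non-stop,No info,4000
Airline B,City X,X to Z to Y,1 stop,No info,6500
Airline A,City Z,Z to Y,non-stop,No info,3800
"""


def load_train(source):
    """Read the training data from a file path or a file-like object."""
    return pd.read_csv(source)


train_data = load_train(io.StringIO(SAMPLE_CSV))

# A first look at the data: number of rows and columns, then column types.
print(train_data.shape)
print(train_data.dtypes)
```

With a real file, the same helper could be called with the file path instead of the in-memory sample.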
3. Categorical Variables
4. Outliers
Univariate Analysis
Feature Selection
Model Training:
We cannot know beforehand which model will perform best on this problem. We
trained an Extra Trees Regressor and a Random Forest Regressor on the train
set. You can try any number of regression models and choose the one that
performs best.
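The comparison described above can be sketched as follows, assuming scikit-learn is installed. The features here are synthetic random numbers standing in for the encoded flight data, and the R^2 score, the 80/20 split, and the model settings are illustrative choices rather than the ones used in this project.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded features and the price (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 4))
y = 3000 + 5000 * X[:, 0] + 2000 * X[:, 1] + rng.normal(0, 100, size=300)

# Hold out 20% of the rows to compare the models on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "ExtraTreesRegressor": ExtraTreesRegressor(n_estimators=100, random_state=42),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Fit each candidate and record its R^2 score on the held-out rows.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R^2 = {scores[name]:.3f}")

best_name = max(scores, key=scores.get)
print("Best model on this split:", best_name)
```

Other regressors can be added to the `models` dictionary to compare them in the same loop.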
We drop the “Price” column from the train dataset to form the independent
variables and keep Price as the dependent variable, then examine the
correlation between the dependent and independent variables. After cleaning
the data, we can visualize it to better understand the relationships between
different variables. Many more visualizations, such as scatter plots,
histograms, and box plots, can help you learn more about your dataset.
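A small sketch of the split and correlation check described above, assuming pandas. The frame below is a made-up example; `Duration_mins` is a hypothetical column name, and the stop counts are assumed to be already encoded as numbers.

```python
import pandas as pd

# Made-up numeric data for illustration only (assumption).
train_data = pd.DataFrame(
    {
        "Total_Stops": [0, 1, 2, 0, 1, 2],
        "Duration_mins": [90, 180, 300, 100, 200, 320],
        "Price": [3000, 5000, 8000, 3200, 5400, 8500],
    }
)

# Independent variables: every column except the target.
X = train_data.drop(columns=["Price"])
# Dependent variable: the price we want to predict.
y = train_data["Price"]

# Pearson correlation of each feature with the price, strongest first.
corr_with_price = X.corrwith(y).sort_values(ascending=False)
print(corr_with_price)
```

Features with a correlation close to zero are candidates to revisit during feature selection, although correlation only captures linear relationships.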
A scatter plot uses dots to show the relationship between two variables. Here
the points are nearly aligned along a line, which indicates a strong, roughly
linear relationship between the plotted variables.
Hyperparameter Tuning
RMSE (Root Mean Squared Error) is the square root of the MSE (Mean Squared
Error), so it expresses the error in the same units as the price.
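As an illustration of the definition above, here is a minimal RMSE function using only the standard library. The prices in the example are invented, and `rmse` is a helper written for this sketch, not a library function.

```python
import math


def rmse(y_true, y_pred):
    """Return the square root of the mean squared error."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # MSE: the average of the squared differences.
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Invented prices for illustration only.
actual = [4000.0, 6500.0, 3800.0]
predicted = [4100.0, 6300.0, 3800.0]
print(rmse(actual, predicted))
```

A lower RMSE means the predicted prices are, on average, closer to the actual prices.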
Saving model