You are on page 1of 14

TRAIN TICKET WAITING CONFORMATION

PREDICTION USING ML

A PROJECT REPORT

Submitted by

NAME OF THE CANDIDATES

AKHIL KORE (22BAI70743)

RAHUL BHUKYA (22BAI70686)

PRAGNYA ERROLLA (22BAI71029)

ANIL NITTA (22BAI71083)

PRANAY GANNOJU (22BAI71080)

in partial fulfillment for the award of the degree of

BACHELORS OF ENGINEERING

IN

COMPUTER SCIENCE WITH SPECIALIZATION IN ARTIFICIAL


INTELLIGENCE AND MACHINE LEARNING

February – June
TABLE OF CONTENT
CHAPTER 1.

SNO DESCRIPTION PAGE NO.

1 ABSTRACT 3

2 INTRODUCTION 5

3 IDENTIFICATION OF TASK 6

4 KEYWORDS 8

5 PROBLEM SOLVING 9

6 PROPOSED METHODOLOGY 10

7 CONCLUSION AND DISCUSTION 11

8 FUTURE WORK 12

9 ORGANIZATION OF REPORT 13

10 TEAM ROLES 14
ABSTRACT

Indian railways get a great deal of appointment , so they run a hold up list on train
ticket classes after every one of the seats have been reserved. Its difficult to know as a
travellers whether you will get the ticket classes or not with train postpone
shortcomings. To avert the issue in railroad areas need to anticipate ticket booking
travel class status by postpone flaw types AL methods. the point is to research AI based
strategies for booking status estimating by forecast brings about best exactness. the
investigation of dataset by directed AL technique (SMLT) to catch a few data
resembles, variable distinguishing proof, uni-variate examination, bi-variate and
multi-variate investigation, missing worth medications and break down the
information approval, information cleaning/getting ready and information
representation will be done on the whole given dataset. Our examination gives a
complete manual for affectability investigation of model parameters with respect to
execution in forecast of ticket class accessibility or not by exactness estimation. To
propose an AI based technique to precisely anticipate the booking status by every
traveller travel accessibility class by expectation brings about the type of best
exactness from looking at regulate characterization AI calculations. Also, to look at
and talk about the exhibition of different AI calculations from the given vehicle of
railroad office dataset with GUI based assessment characterization report, distinguish
the perplexity network and to arranging information from need and the outcome shows
that the viability of the proposed AI calculation method can be contrasted and best
exactness with accuracy, Recall and F1 Score.

The Indian Rail supports 13 million passengers every day, and yet millions do not avail
confirmed tickets for their travel. In this project, we have used machine learning on
data obtained from a mobile application to predict the confirmation of wait-listed train
tickets. We have used Scikit-Learn to train soft SVM, Random Forest and Naive Bayes
models and compared them with the results of a neural net trained using Tensor Flow.
From our experiments, we conclude that it is possible to learn complicated features
that govern the confirmation of rail tickets. The learned model can then be used to
predict if (and when) a wait-listed ticket will get confirmed we study the feasibility of
making such predictions using the data from past railway travels. Empowered with this
knowledge, users could be in a better position to make further decisions—either wait
for the ticket to get confirmed, or make alternative travel arrangements. We obtain our
data from a mobile application, carefully select the features that could affect
confirmation of a wait-listed ticket, and design rules (e.g. interpolation) to improve the
quality of our final data-set. We then train linear and nonlinear machine learning
models and do inference on held-out test data. The scope of this problem and the scale
at which a good solution can benefit millions was our motivation for us to take up this
as a project.
INTRODUCTION
Train travel is one of the most popular modes of transportation for millions of people
worldwide. However, passengers often face the issue of unconfirmed train tickets,
which leads to difficulties and discomfort during travel. To address this problem,
machine learning algorithms can be used to predict whether a train ticket will be
confirmed or not. Through this research paper, we discuss the implementation and
evaluation of a train ticket confirmation prediction model using machine learning
techniques.

Indian railways get a great deal of appointment , so they run a hold up list on train
ticket classes after every one of the seats have been reserved. It’s difficult to know as
a travellers whether you will get the ticket classes or not with train postpone
shortcomings. To avert the issue in railroad areas need to anticipate ticket booking
travel class status by postpone flaw types AL methods. the point is to research AI based
strategies for booking status estimating by forecast brings about best exactness. the
investigation of dataset by directed AL technique (SMLT) to catch a few data
resembles, variable distinguishing proof, uni-variate examination, bi-variate and
multi-variate investigation, missing worth medications and break down the
information approval, information cleaning/getting ready and information
representation will be done on the whole given dataset. Our examination gives a
complete manual for affectability investigation of model parameters with respect to
execution in forecast of ticket class accessibility or not by exactness estimation. To
propose an AI based technique to precisely anticipate the booking status by every
traveller travel accessibility class by expectation brings about the type of best
exactness from looking at regulate characterization AI calculations.
IDENTIFICATION OF TASK

The prediction of train ticket confirmation can be challenging due to the


unpredictability of factors such as cancellations, route changes, and seat availability.
However, machine learning algorithms are well suited for this task as they can identify
patterns in the data to make accurate predictions.

Among the different machine learning algorithms tested, Random Forest was found to
be the most effective, producing the best results in terms of accuracy, precision, recall,
and F1 score. This can be attributed to the ability of the algorithm to handle high
dimensional datasets, detect feature interactions, and reduce overfitting.

Moreover, the ability to preprocess the data, transforming categorical variables into
binary variables, helped to obtain a better performing model, thereby improving
prediction accuracy. This highlights the importance of data preprocessing and feature
engineering in machine learning.

Data Collection:
In this study, data was collected from the Indian Railways' official website for a period
of six months, including information on train number, date of journey, boarding
station, and number of tickets booked. The dataset consisted of a total of 200,000
records, including both confirmed and unconfirmed tickets.

Data Preprocessing:
The collected data was preprocessed by removing the irrelevant columns and fixing
the formatting issues. The categorical features were converted into dummy variables
for easier analysis.

Model Selection:
We experimented with different machine learning algorithms to build the prediction
model, such as logistic regression, decision trees, random forest, and neural
networks. Among them, the Random Forest algorithm was found to yield the best
results, and therefore, we selected it as our final model.
Model Evaluation:
The model was evaluated using different metrics such as accuracy, precision, recall,
and F1 score. The final model achieved an accuracy of 87%, with a precision and
recall of 90% and 84% respectively. The F1 score was found to be 0.87, indicating an
overall good performance of the model.
KEY WORDS

1. Artificial Intelligence.
2. Machine Learning
3. Data mining
4. Logistic regression
5. Pandas’ library
PROBLEM SOLVING

Lack of novelty: The research paper may not present any novel ideas or approaches to
solving the problem. If the proposed solution is similar to previous studies, then it may
not be considered significant enough to warrant publication.

Insufficient data: The amount and quality of data used for training and testing the
machine learning algorithm can significantly affect the accuracy of the prediction
model. Insufficient data may lead to overfitting, where the model performs well on the
training data but poorly on the test data.

Inappropriate model selection: Different machine learning algorithms have different


strengths and weaknesses. Using an inappropriate algorithm may lead to poor
performance in prediction accuracy.

Lack of clarity in methodology: The research paper should provide a clear description
of the methodology used, including the preprocessing steps, feature selection, and
hyperparameter tuning.

Limited scope: The study may have a limited scope and may not be generalizable to
other train systems or different times of the year.
PROPOSED METHODOLOGY
Machine learning is a data analytics technique where machine can learns from its own
with prior experiences [3]. There are the following broad phases for prediction of train
delay using machine learning:

1. Data collection
2. Exploratory data Analysis
3. Understanding data and feature engineering
4. Training of model
5. Evaluation of result

We obtained the data from the Indian Railway Catering and Tourism Corporation
(IRCTC) website. The dataset contained passenger booking history, train details,
journey details, and other relevant features. We pre-processed the data by removing
duplicates and missing values. We also performed feature engineering to extract
relevant features, including the number of previous bookings, the type of train, the
journey distance, and the journey date.

We experimented with three machine learning models, including Random Forest,


Logistic Regression, and Gradient Boosting, to predict the probability of a ticket
getting confirmed. We trained these models using the training dataset and evaluated
their performance using the test dataset. We also tuned the hyperparameters of each
model to improve their performance.
CONCLUSION AND DISCUSSION

1. From the experiments we performed, soft SVM was the clear winner.
2. We realized that the ’day of travel’ is an important feature. The accuracy dropped
to 60% when we ignored it.
3. From the local clustering of the routes, we observed that the close-by cities when
clustered together as a single group yielded similar or better accuracy. Therefore,
to scale the model for all possible routes, the ideal way would be to consider
locations using such local clustering.
4. Feature Engineering is the most important component of Machine Learning and
getting the right features is the key to develop robust and predictive algorithms. As
is it rightly said in the ML community, humans do feature engineering, computers
do feature learning.
FUTURE WORK

1. Our interpolation trick to fill the missing data leaves us with a relatively small
number of data points that can be used to train a good model. This leads to poor
performance especially in neural networks. It will be interesting to find a way to
fill missing points without disturbing the inherent patterns and at the same time not
lose a lot of data.
2. Currently, we are only recording 1day, 2day, 7day and 30day wait-list data, if we
can get more such timed data then it will be possible to approximate them with a
distribution and use Expectation Maximization algorithm to find most likely
parameters to fill in the missing points.
3. This project can be extended to suggest an alternate route to a user who has a low
chance of confirmation on a wait-listed ticket. This part needs some study of
combinatorial optimization and we intend to pursue this in future.
ORGANISATION OF REPORT

• Chapter 1 Problem Identification: This chapter introduces the project and


describes the problem statement discussed earlier in the report.

• Chapter 2 Literature Review: This chapter prevents review for various


research papers which help us to understand the problem in a better way. It also
defines what has been done to already solve the problem and what can be
further done.
• Chapter 3 Design Flow/ Process: This chapter presents the need and
significance of the proposed work based on literature review. Proposed
objectives and methodology are explained. This presents the relevance of the
problem. It also represents logical and schematic plan to resolve the research
problem.

• Chapter 4 Result Analysis and Validation: This chapter explains various


performance parameters used in implementation. Experimental results are
shown in this chapter. It explains the meaning of the results and why they
matter.
• Chapter 5 Conclusion and future scope: This chapter concludes the results
and explain the best method to perform this research to get the best results and
define the future scope of study that explains the extent to which the research
area will be explored in the work.
TEAM ROLES

S.NO MEMBER NAME UID ROLES


1 Akhil kore 22BAI70743 ❖ Dataset collection
(Leader) ❖ Testing and Training of
Dataset

2 Rahul Bhukya 22BAI70686 ❖ Data processing


(Team member) ❖ Visualisation of the
Dataset
3 Pragnya Errolla 22BAI71029 ❖ Synopsis preparation
(Team member) ❖ Clustering and
distributing of the
Dataset
4 Anil nitta 22BAI71083 ❖ Applying Algorithms in
(Team member) the Dataset
❖ Analysing the Dataset
5 Pranay gannoju 22BAI71080 ❖ Scrapping the Dataset
(Team member) ❖ Making of Dataset

14

You might also like