
Prediction of Cab Cancellation for an Online Taxi Booking Website


Name – Samip Sen
Class roll no. - 69
MAKAUT roll no. - 12200121010
Department - CSE
Subject with code – Data Warehousing and Data Mining (PEC-IT602B)
St. Thomas’ College of Engineering & Technology
Date of Submission – 25-01-2024
Contents
• Introduction
• Goal of the Analysis
• Dataset
• Models Used for Analysis
• Prediction or Classification
• Decision Tree Analysis
• Logistic Regression Analysis
• Memory Based Reasoning Analysis
• Best Model
• Future Scope
• Conclusion
Introduction
Explanation of the Problem
• We aim to find the model with the "lowest average cost of error"
• That is, the model should give the lowest misclassification cost, where the costly error is predicting a cancelled booking as un-cancelled
• In this problem and analysis, the misclassification cost is also referred to as the cost of error
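A minimal Python sketch of how an average cost of error could be computed. It assumes that each booking carries its own misclassification cost, that this cost is incurred only when the prediction for that booking is wrong, and that the total is averaged over all bookings. The function name and the toy values are hypothetical and are not taken from the original analysis.

```python
from typing import Sequence


def average_cost_of_error(
    actual: Sequence[int],
    predicted: Sequence[int],
    cost_of_error: Sequence[float],
) -> float:
    """Average misclassification cost over all bookings.

    Assumption (for illustration only): a booking adds its cost_of_error
    to the total only when its predicted label differs from the actual
    label (1 = cancelled, 0 = not cancelled).
    """
    if not (len(actual) == len(predicted) == len(cost_of_error)):
        raise ValueError("all inputs must have the same length")
    if not actual:
        raise ValueError("at least one booking is required")
    total = sum(
        cost
        for y, y_hat, cost in zip(actual, predicted, cost_of_error)
        if y != y_hat
    )
    return total / len(actual)


# Hypothetical toy data: 4 bookings; the second (a cancellation) is missed.
actual = [0, 1, 1, 0]
predicted = [0, 0, 1, 0]
costs = [1.0, 100.0, 100.0, 1.0]
print(average_cost_of_error(actual, predicted, costs))  # 25.0
```

Under this definition, comparing models means computing this average on the same bookings for each model and preferring the lowest value.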
Goal of the Analysis
• Our goal is to reduce the cost incurred by the company as a result of
cab cancellations being misclassified as un-cancelled
• By predicting possible cancellations an hour before the pickup time,
YourCabs will be better able to manage its drivers by providing them
with up-to-date information about customer cancellations and reduce
the cost incurred from sending a cab to a booking location that has
been cancelled by the customer
Dataset
Explanation of the Dataset
• For the purpose of this analysis, we used the TRAINING dataset, which includes 43,000 bookings of Yourcabs.com
• This dataset was used because it is intended for building and evaluating predictive models. It includes the output variable car_cancellation and the misclassification costs in Cost_of_error. Both the output and the potential input variables are represented in it
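Building and evaluating models on one dataset usually involves holding part of it out for validation. The source does not say how the bookings were partitioned, so the following is only a minimal sketch of a stratified split in plain Python, assuming a hypothetical validation fraction and random seed. Stratifying keeps the share of cancelled bookings similar in both parts, which matters when cancellations are the minority class.

```python
import random
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], valid_fraction: float = 0.3, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Split row indices into training and validation sets, sampling each
    class (1 = cancelled, 0 = not cancelled) separately so both sets keep
    a similar cancellation rate. Fraction and seed are assumptions."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train: List[int] = []
    valid: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_valid = round(len(indices) * valid_fraction)
        valid.extend(indices[:n_valid])
        train.extend(indices[n_valid:])
    return sorted(train), sorted(valid)


# Hypothetical labels: 10 bookings, 2 of them cancelled.
labels = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
train_idx, valid_idx = stratified_split(labels, valid_fraction=0.5)
print(len(train_idx), len(valid_idx))  # 5 5
```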
Models Used for Analysis
Prediction or Classification
• Since the output is binary, with a cancelled booking coded as 1 and an un-cancelled booking as 0, and we are determining the cost of misclassifying a cancelled booking (1) as an un-cancelled booking (0), this is a classification problem
• The models that we looked at are Decision Tree, Logistic Regression and Memory Based Reasoning. The overall model diagram is as follows:
[Figure: overall model diagram]
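In a classification setting, each model's predicted probability of cancellation is turned into a class label with a cutoff, and the four possible outcomes can then be counted. The sketch below assumes a hypothetical cutoff of 0.5; the false negatives ("FN") are exactly the cancelled bookings predicted as un-cancelled that this analysis is concerned with.

```python
from typing import Dict, Sequence


def confusion_counts(
    actual: Sequence[int], probabilities: Sequence[float], cutoff: float = 0.5
) -> Dict[str, int]:
    """Label a booking as cancelled (1) when its predicted probability of
    cancellation is at least `cutoff` (an assumed value), then count the
    true/false positives and negatives."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, p in zip(actual, probabilities):
        y_hat = 1 if p >= cutoff else 0
        if y == 1 and y_hat == 1:
            counts["TP"] += 1
        elif y == 0 and y_hat == 1:
            counts["FP"] += 1
        elif y == 0 and y_hat == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1  # cancelled booking predicted as un-cancelled
    return counts


# Hypothetical bookings and model probabilities.
actual = [1, 0, 1, 0, 0]
probs = [0.8, 0.3, 0.4, 0.6, 0.1]
print(confusion_counts(actual, probs))  # {'TP': 1, 'FP': 1, 'TN': 2, 'FN': 1}
```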
Decision Tree Analysis

• Decision trees are a simple but powerful form of multiple-variable analysis
• Decision trees are produced by algorithms that identify various ways of splitting a data set into branch-like segments
• A decision tree can handle both a continuous and a categorical target (object of analysis)
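To make "splitting a data set into branch-like segments" concrete, here is a minimal sketch of how one split on a single numeric input could be chosen. It uses Gini impurity, which is one common splitting criterion; the source does not state which criterion the decision tree in this analysis used. The input "lead_hours" and its values are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of 0/1 labels: 1 minus the sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    share_1 = sum(labels) / n
    return 1.0 - share_1 ** 2 - (1.0 - share_1) ** 2


def best_split(
    values: Sequence[float], labels: Sequence[int]
) -> Tuple[Optional[float], float]:
    """Try every midpoint between neighbouring sorted values of one numeric
    input and return (threshold, weighted Gini) for the purest split.
    Returns (None, impurity of the parent) if no split improves on it."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, gini(labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical input: hours between booking and pickup, with cancellation flag.
lead_hours = [1, 2, 3, 10, 12, 15]
cancelled = [0, 0, 0, 1, 1, 1]
print(best_split(lead_hours, cancelled))  # (6.5, 0.0)
```

A full tree repeats this search over every input in each segment and recurses on the resulting branches until a stopping rule is met.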
Logistic Regression Analysis

• Logistic regression measures the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function, which is the cumulative distribution function of the logistic distribution
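The logistic function maps any linear combination of inputs to a probability between 0 and 1. A minimal sketch follows; the intercept and coefficients are hypothetical, whereas a fitted model would estimate them from the bookings data.

```python
import math
from typing import Sequence


def logistic(z: float) -> float:
    """Logistic (sigmoid) function, the CDF of the standard logistic
    distribution. Branching on the sign avoids overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(
    intercept: float, coefficients: Sequence[float], inputs: Sequence[float]
) -> float:
    """Probability of cancellation from the linear predictor
    b0 + b1*x1 + ... + bk*xk passed through the logistic function."""
    z = intercept + sum(b * x for b, x in zip(coefficients, inputs))
    return logistic(z)


# Hypothetical coefficients and inputs (z = -2.0 + 0.5*4.0 + 1.0*1.0 = 1.0).
print(logistic(0.0))  # 0.5
print(round(predict_probability(-2.0, [0.5, 1.0], [4.0, 1.0]), 4))  # 0.7311
```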
Memory Based Reasoning Analysis

• Memory Based Reasoning, also known as k-nearest neighbours (KNN), is a non-parametric method: it does not involve estimating model parameters
• It makes no assumptions about the data (normality, multicollinearity, etc.), so it is a data-driven rather than a model-driven analysis technique
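Because no parameters are estimated, the stored training bookings themselves act as the model: a new booking is classified by looking up its nearest stored neighbours. A minimal sketch, assuming Euclidean distance, a majority vote and a hypothetical k of 3 (an odd k avoids tied votes with two classes):

```python
import math
from collections import Counter
from typing import Sequence


def knn_predict(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[int],
    query: Sequence[float],
    k: int = 3,
) -> int:
    """Predict a 0/1 label for `query` by majority vote among its k nearest
    training bookings (Euclidean distance). Nothing is fitted in advance."""
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    votes = Counter(y for _, y in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature bookings: 0 = not cancelled, 1 = cancelled.
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_x, train_y, [5.5, 5.5]))  # 1
print(knn_predict(train_x, train_y, [0.2, 0.2]))  # 0
```

Since distances are computed on raw values, inputs measured on very different scales would normally be standardised first so that no single input dominates the distance.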
Best Model
The best model from our analysis is the Decision Tree.
• How was it chosen? By comparing the models on:
1. SAS suggestion (the model recommended by SAS)
2. ROC curve
3. Cumulative lift chart
4. Overall error rate
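The source lists these criteria without their numeric values. As a rough illustration of what three of them measure, the sketch below computes the overall error rate, the cumulative lift at a given depth and the area under the ROC curve from predicted scores; it does not reproduce SAS's own model selection. The data are hypothetical toy values, not results from this analysis, and the 0.5 cutoff is an assumption.

```python
from typing import Sequence


def overall_error_rate(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """Share of bookings whose predicted class differs from the actual class."""
    wrong = sum(1 for y, y_hat in zip(actual, predicted) if y != y_hat)
    return wrong / len(actual)


def cumulative_lift(
    actual: Sequence[int], scores: Sequence[float], depth: float
) -> float:
    """Cancellation rate in the top `depth` fraction of bookings ranked by
    score, divided by the overall cancellation rate."""
    ranked = [y for _, y in sorted(zip(scores, actual), key=lambda t: t[0], reverse=True)]
    n_top = max(1, round(len(ranked) * depth))
    top_rate = sum(ranked[:n_top]) / n_top
    base_rate = sum(actual) / len(actual)
    return top_rate / base_rate


def roc_auc(actual: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen
    cancelled booking scores higher than a randomly chosen un-cancelled
    one, with ties counted as one half."""
    pos = [s for s, y in zip(scores, actual) if y == 1]
    neg = [s for s, y in zip(scores, actual) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores from one model for 10 bookings.
actual = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.2, 0.8, 0.3, 0.1, 0.4, 0.35, 0.05, 0.6, 0.15]
predicted = [1 if s >= 0.5 else 0 for s in scores]
print(overall_error_rate(actual, predicted))  # 0.2
print(round(cumulative_lift(actual, scores, 0.2), 2))  # 3.33
print(round(roc_auc(actual, scores), 3))  # 0.905
```

When comparing models, a lower error rate, a higher lift in the top deciles and a larger area under the ROC curve all point towards the better model.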
Future Scope
• Mobile App Enhancements:
• Develop a user-friendly mobile app interface that educates users about the factors
affecting cancellations and encourages responsible booking behavior.
• Implement push notifications to alert users about potential cancellations and
suggest alternative times or routes.
• Predictive Customer Support:
• Implement a predictive customer support system that anticipates user queries
related to cancellations and provides proactive assistance.
Conclusion
• As we look to the future, the prediction of cab cancellations not only optimizes
operational efficiency for service providers but also empowers users with
insights, proactive notifications, and personalized recommendations. The vision
outlined here aims to transform the online taxi booking experience into a more
intelligent, responsive, and user-centric service, setting the stage for a paradigm
shift in the way we perceive and interact with transportation in the digital age.
Through continuous innovation and collaboration, the prediction of cab
cancellations emerges as a pivotal element in the ongoing evolution of smart,
efficient, and customer-focused mobility solutions.
Thank You.
