Jaypee Business School Case Study on Uber's Dynamic Pricing Algorithm

Jaypee Business School
Jaypee Institute of Information Technology

(Declared Deemed to be University u/s 3 of UGC Act)
A-10, Sector 62, NOIDA, 201 307, INDIA, www.jbs.ac.in
AIB ASSIGNMENT
Brand Personality: Case Study of Uber (Dynamic Price Optimization)
MBA 2020-2022
Submitted by:
Chhaya Dubey (20609045)
Vivek Roy (20609036)
Introduction
Artificial Intelligence powers many of the technologies and services underpinning Uber’s platform,
allowing engineering and data science teams to make informed decisions that help improve user
experiences for products across our lines of business.
At the forefront of this effort is Uber AI, Uber’s centre for advanced artificial intelligence research and
platforms. Uber AI powers applications in computer vision, natural language processing, deep learning,
advanced optimization methods, and intelligent location and sensor processing across the company, as
well as advancing fundamental research and engaging with the broader AI community through
publications and open-source projects.
These machine learning and AI techniques and models allow Uber to move the needle across several
verticals, from transportation and mobility to customer support and driver-partner navigation. This
year alone, AI research at Uber has led to significant improvements in demand prediction and more
seamless pick-up experiences.
Improving location accuracy with sensing and perception.

In 2019, Uber AI’s Sensing and Perception team worked on projects across our mobile and back-end
stack to improve the coverage, accuracy, speed, and heading of vehicle locations on the Uber platform.
Overcoming the limitations of GPS and having more precise locations makes it easier for riders and
drivers to find one another, improves estimated times of arrival (ETAs), reduces rider and driver
cancellations, and makes our marketplace operate more efficiently.
Enhancing real-time forecasting with neural networks.

Uber leverages ML models powered by neural networks to forecast rider demand, pick-up and drop-off
ETAs, and hardware capacity planning requirements, among other variables that drive our operations.
To improve our forecasting abilities in 2019 and beyond, we developed new tools and techniques to
enhance these models, including X-Ray, GENIE, and HotStarts.
X-Ray is an in-house tool that allows us to search thousands of features in parallel, uncovering those
that will improve a model’s predictions. In 2019, we deployed this tool to production in systems across
the company. In 2020, we plan to integrate X-Ray into the Michelangelo feature store for more accurate
ML model feature assessment, which will enable us to further fine tune our predictions.
Also launched in 2019, GENIE, a novel architecture for deep learning creatively applied to temporal
prediction, powered a 12.3 percent improvement in demand forecasting in over 100 cities worldwide,
while HotStarts for AutoTune, our optimization-as-a-service tool, reduced the cost of tuning ML models
and algorithms by a factor of 5-10 for recurring tasks.
How does Uber use Machine Learning to facilitate its business?

Most readers will be familiar with how easy Uber is to use. We only need to open the app and book a
cab; the cab arrives and takes us to our destination, and we pay the driver once the ride is
complete.
In reality, it is not as simple as it appears from the outside. To offer such a simple user
experience, Uber runs many background services and complex algorithms. The key component that
makes this possible is Machine Learning. Let’s see how Uber uses Machine Learning to offer
seamless services.
1. Adequate demand-supply chain
Uber deals with a large amount of data daily. It forecasts where and when demand will arise by
exploiting both stored and real-time data. It uses these estimates to alert drivers so that more
cabs are available to meet demand in a particular area. In this manner, Uber balances demand and
supply and offers customer-centric services.
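As a rough illustration of combining stored and real-time data, the sketch below blends a historical
average for one zone and hour with the latest real-time count. The function name, the sample numbers,
and the 0.5 blending weight are assumptions made for illustration; the source does not describe
Uber's actual forecasting method.

```python
from statistics import mean

def forecast_demand(historical_counts, recent_count, weight_recent=0.5):
    """Blend the historical mean with the latest real-time observation.

    historical_counts: ride requests seen in this zone at this hour on past days
    recent_count: ride requests seen in the most recent interval
    weight_recent: hypothetical blending weight between 0 and 1
    """
    if not 0.0 <= weight_recent <= 1.0:
        raise ValueError("weight_recent must be between 0 and 1")
    baseline = mean(historical_counts)
    return weight_recent * recent_count + (1.0 - weight_recent) * baseline

# Made-up numbers: the same hour on four past days, plus the current reading.
history = [120, 135, 128, 140]
print(forecast_demand(history, recent_count=180))
```

A higher weight makes the estimate react faster to sudden spikes, at the cost of more noise.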
2. Fare Estimates
Machine Learning-enabled demand forecasting allows Uber to raise prices during peak hours for
increased profit, but this comes at a cost to customer retention. Uber calculates fares using
real-time traffic data. It also analyses external factors that could affect fares, such as the
availability and accessibility of public transport.
3. Customer Retention
A gap between demand and supply can leave cabs unavailable. In such circumstances, users may book a
ride with a competing service. Uber’s machine learning-based demand predictions therefore play a
crucial role in customer retention: they use both historical and real-time data to bridge the gap
between demand and supply.
4. Accurate estimated time of arrival

It can be very frustrating for users to wait for a cab to reach the pickup location. Using Machine
Learning-based approaches, Uber combines real-time traffic data, GPS data, and map APIs to forecast
the estimated time of arrival (ETA). Steps can then be taken to reduce the ETA when customers book
rides. Uber focuses on providing a superior customer experience by reducing the user's waiting
time.
5. Route Optimization.
Uber uses a Machine Learning-based system to predict and recommend the best routes to drivers. Its
route optimization system helps drivers avoid congested areas. Traditionally, route selection was
based on the driver's own assumptions and judgement, which did not account for real-time traffic,
road blockages, or weather conditions. Machine Learning-based systems incorporate all these
parameters.
6. Uber Pool
Uber introduced Uber Pool, a shared-ride service, to combat the shortage of cabs during peak hours.
Uber Pool lets riders heading in the same direction share a cab and offers customers a more
economical ride. Uber uses Machine Learning-based algorithms to identify rides that can be matched
and assign them to the same cab. The same system also decides whom to pick up first and whom to
drop off first. Uber Pool also uses stored data to uncover hidden patterns and adjust prices
accordingly, offering the best service to customers while maintaining higher profits.
7. Big Data and Uber

Uber is continuously transforming the industry using its Machine Learning-based system and
data-driven business model. Its system collects and maintains a large amount of data, applies big
data processing techniques, and offers more personalized services. It relies on big data frameworks
for real-time processing of large-scale Machine Learning-based algorithms.
It also maintains a huge database of drivers, which allows it to match a ride to a particular
driver in just 10–15 seconds. Uber closely observes each ride and its associated data to predict
demand, supply, and prices more accurately and to allocate sufficient resources as needed. It also
considers external data such as the availability of public transport.
8. Surge Pricing
You may have noticed that Uber sometimes charges 1.5–2 times the usual price. This is the result of
its Machine Learning-powered Surge Pricing algorithm, which finds the most reasonable price to
offer based on the economic and current traffic conditions of a particular area. It aims to ensure
that a passenger can always get a ride, even if at a higher price. The algorithm uses geo-location
data and demand forecasting data to position drivers efficiently, and it relies heavily on
regression analysis to determine which locations will be busiest so that surge pricing can be
activated there. The same predictions can also be used to send more drivers to that location,
allowing more customer retention and more profit.
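To make the idea concrete, here is a minimal sketch of how a surge multiplier could be derived from
the ratio of open ride requests to available drivers in an area and snapped to a fixed set of
levels. The ratio rule, the function name, and the level list are illustrative assumptions; the
source does not describe Uber's actual formula.

```python
# Hypothetical discrete surge levels, chosen for illustration only.
SURGE_LEVELS = [1.0, 1.25, 1.5, 1.75, 2.0, 2.5]

def surge_multiplier(open_requests, available_drivers, levels=SURGE_LEVELS):
    """Return the largest level that does not exceed the demand/supply ratio."""
    if open_requests <= 0:
        return levels[0]   # no demand: no surge
    if available_drivers <= 0:
        return levels[-1]  # demand but no supply: cap at the highest level
    ratio = open_requests / available_drivers
    eligible = [level for level in levels if level <= ratio]
    return eligible[-1] if eligible else levels[0]

print(surge_multiplier(30, 40))  # demand below supply
print(surge_multiplier(90, 50))  # demand well above supply
```

Snapping to discrete levels keeps prices predictable for riders and matches how the multiplier is
usually displayed.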
It is therefore evident how deeply Machine Learning is involved in the functioning of Uber. Now it
is time to implement one of these use cases ourselves, as this is the best way to learn something
thoroughly.
Problem Statement
In this article, we will develop a version of Uber’s Machine Learning-powered Surge Pricing
algorithm. We will predict the surge multiplier based on different weather conditions. Unlike
public transport fares, Uber and Lyft ride prices are not constant and are greatly affected by the
demand for and supply of rides at a given time. Bad weather such as rain or snow can cause more
people to take rides, which affects pricing. In this section, we implement cab price prediction for
Lyft and Uber cabs against the weather, based on the surge multiplier.
Implementation Steps
Step 1: Data Description
The images given below show the structure of the two sets of data that will be used here. Cab price data
consists of the details of each ride along with its corresponding price. At the same time, weather data
gives information regarding the weather at a particular instant of time.
Cab price data

Weather data
Step 2: Data Pre-processing

The initial step involves data cleaning, such as removing null values and converting date-time
values to the desired format, along with other pre-processing steps. After pre-processing and
merging the cab and weather datasets, a snapshot of the final data looks like this:
Merged Dataset
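The source does not show its merge code. As a sketch of what such a merge might do, the example
below attaches to each ride the most recent weather reading at the ride's source location. The field
names (source, location, time_stamp, temp, rain) and the sample rows are assumptions for
illustration. In pandas, pd.merge_asof performs a similar nearest-key join.

```python
from bisect import bisect_right

def merge_rides_with_weather(rides, weather):
    """Attach to each ride the latest weather reading at its source location
    taken at or before the ride's timestamp. Rides with no such reading are
    dropped, mirroring the removal of null rows."""
    # Group weather readings by location, sorted by time.
    by_location = {}
    for reading in sorted(weather, key=lambda r: r["time_stamp"]):
        by_location.setdefault(reading["location"], []).append(reading)

    merged = []
    for ride in rides:
        readings = by_location.get(ride["source"], [])
        times = [r["time_stamp"] for r in readings]
        idx = bisect_right(times, ride["time_stamp"]) - 1
        if idx < 0:
            continue  # no earlier weather reading for this location
        reading = readings[idx]
        merged.append({**ride, "temp": reading["temp"], "rain": reading["rain"]})
    return merged

# Made-up sample rows; timestamps are plain integers for brevity.
rides = [
    {"source": "A", "time_stamp": 105, "price": 12.0},
    {"source": "B", "time_stamp": 50, "price": 8.5},
]
weather = [
    {"location": "A", "time_stamp": 100, "temp": 40.1, "rain": 0.0},
    {"location": "A", "time_stamp": 110, "temp": 39.0, "rain": 0.2},
    {"location": "B", "time_stamp": 60, "temp": 41.5, "rain": 0.0},
]
merged = merge_rides_with_weather(rides, weather)
print(merged)
```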
Step 3: Balance the data

If we look at the frequency distribution of the surge multiplier (shown in the image below), it is
evident that the data is highly imbalanced. Hence we need to apply an over-sampling technique to
balance the data. Here we have used the Synthetic Minority Over-sampling Technique (SMOTE).
Before SMOTE
import numpy as np
# Count how many training samples fall into each surge class
unique, counts = np.unique(train_labels, return_counts=True)
print(dict(zip(unique, counts)))
Train data before SMOTE = (873731, 9); test data after SMOTE = (291244, 9). NOTE: We removed
label = 3 because it had very few samples.
After SMOTE
from imblearn.over_sampling import SMOTE
# Over-sample the minority surge classes in the training data
sm = SMOTE(random_state=42)
train_features, train_labels = sm.fit_resample(train_features, train_labels)
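For intuition, SMOTE creates synthetic minority samples by interpolating between a minority sample
and one of its nearest minority-class neighbours. The simplified sketch below shows only that
interpolation step in plain Python. It is not the imblearn implementation, and the function name and
sample points are made up for illustration.

```python
import math
import random

def smote_like_samples(minority, n_new, k=2, seed=42):
    """Generate n_new synthetic points from a list of minority-class points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

minority_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
new_points = smote_like_samples(minority_points, n_new=4)
print(new_points)
```

Because every synthetic point lies on a segment between two real minority points, the new samples
stay inside the region the minority class already occupies.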
Step 4: Model Training

Since the price between a given source and destination is almost fixed, we need to predict the
surge multiplier to obtain the appropriate price for the weather conditions. This can be framed as
either a regression or a classification problem. We have used a Random Forest Classifier, treating
[1, 1.25, 1.5, 1.75, 2.0, 2.5] as 6 different classes. One could also use other classifiers such as
an SVM or a neural network.
from sklearn.ensemble import RandomForestClassifier
# class_weight="balanced" re-weights classes inversely to their frequency
rf = RandomForestClassifier(n_jobs=-1, random_state=42, class_weight="balanced")
rf.fit(train_features, train_labels)
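Treating the multipliers as classes usually means mapping each one to an integer label before
training and mapping predictions back afterwards. The sketch below shows one way to do that for the
six classes listed above, and how a predicted label would translate into a price. The helper names
and the base price are hypothetical and do not come from the source.

```python
SURGE_CLASSES = [1.0, 1.25, 1.5, 1.75, 2.0, 2.5]

def encode_surge(multiplier):
    """Map a surge multiplier to its integer class label."""
    return SURGE_CLASSES.index(multiplier)

def decode_surge(label):
    """Map an integer class label back to its surge multiplier."""
    return SURGE_CLASSES[label]

def predicted_price(base_price, label):
    """Apply the predicted surge class to a (hypothetical) base price."""
    return round(base_price * decode_surge(label), 2)

print(encode_surge(1.5))
print(predicted_price(10.0, 3))
```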
Step 5: Feature Importance

It is always good practice to examine how much our model depends on each feature. The diagram
below shows that our Random Forest Classifier is mostly dependent on the Temperature and Wind
features.
# Get numerical feature importances
importances = list(rf.feature_importances_)
# List of (feature, importance) tuples
feature_importances = [(feature, round(importance, 2))
                       for feature, importance in zip(feature_list, importances)]
# Sort the feature importances by most important first
feature_importances = sorted(feature_importances, key=lambda x: x[1], reverse=True)
# Print each feature with its importance
for pair in feature_importances:
    print('Variable: {:20} Importance: {}'.format(*pair))
It can be seen that the model depends most on the distance feature, followed by the other features
in decreasing order of importance.
Step 6: Evaluation of the built model

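from sklearn.metrics import accuracy_score, f1_score
# test_features is assumed to be the held-out feature matrix matching test_labels
predictions = rf.predict(test_features)
print(f1_score(test_labels, predictions, average='weighted'))
print(accuracy_score(test_labels, predictions))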
For the Random Forest classifier that we built, the F1-score is 0.9616 and the accuracy is 95.77%,
so the ML model is doing a decent job here. The diagram below shows the complete confusion matrix.
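To clarify what the weighted F1-score measures, the plain-Python sketch below computes a per-class
F1 and averages it weighted by each class's share of the true labels, alongside plain accuracy. The
toy labels are made up and are unrelated to the results reported above.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * count / total
    return score

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy labels for three surge classes.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
print(weighted_f1(y_true, y_pred), accuracy(y_true, y_pred))
```

On imbalanced data, the two metrics can diverge, which is why both are reported.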
Conclusion.
The swift progress of Machine Learning tools and techniques continues to create opportunities to
offer customer-oriented services and raise business productivity. Uber has emerged as a leader by
using machine learning-based systems and focusing on customer-oriented services. Systems backed by
Artificial Intelligence and Machine Learning help it offer optimized services and are also highly
useful for acquiring and retaining customers.
