
NOTEBOOK

Uber Ridesharing
Clustering
Determining optimal position of cabs

Knowledge Discovery through Brand Stories


#1
Introduction

Uber Technologies, Inc., commonly known as Uber, is an American
technology company. Its services include ride-hailing, food
delivery (Uber Eats), package delivery, couriers, and freight
transportation. It was formerly known as UberCab (2009 - 2011).
Uber has been valued at roughly $75 billion. Uber currently
operates in 71 countries and 890 cities, with five major markets:
the US, Mexico, Brazil, Spain, and India.



#2
Why is Uber a Technology Company?
Most of Uber's services are in the transportation industry, yet many
still consider Uber a technology company. That is because Uber tries
to solve the problems facing the transportation industry through
technology. The following are some of the problems Uber is trying to
solve.

1. Decrease commute time from point A to point B.
2. Be independent and create supply.
3. Determine the optimal position of cabs.

Problem 1: Decrease commute time

To solve this problem, Uber started Uber Elevate in 2016. Its goal
is to build drone taxis and bring them to market, which would cut
travel time from point A to point B. Uber Elevate has played an
important role in laying the groundwork for the aerial ridesharing
market by bringing together regulators, civic leaders, real estate
developers, and technology companies around a shared vision for the
future of air travel.



Problem 2: Be independent and create supply
Whenever Uber enters a new city or country, it faces the challenge
of matching demand (riders) with supply (drivers). Generally, there
is huge demand but very little supply. To solve this problem, Uber
is working on driverless cars, which can be deployed as soon as it
enters a new market. With these, Uber would no longer depend on
drivers registering on its platform. Uber ATG (Advanced Technologies
Group) was responsible for all development on Uber's driverless
platform.

Problem 3: Determining optimal position of cabs
This is our problem of the day. We will see how Uber tries to
determine the optimal position of cabs.



#Extra
Introduction to Colab

Nearly all machine learning and deep learning algorithms require
good hardware. What if you don't have good hardware? Should you drop
your dream of becoming a data scientist? No, there's an alternative.
Let us introduce you to Colab.

Colab is a service provided by Google that lets you access a virtual
machine hosted on Google servers. These virtual machines have
dual-core Xeon processors with 12 GB of RAM. You can even use a GPU
for your neural networks. Colab is an interactive Python notebook
(ipynb), which means that alongside Python code you can also write
normal text and include images.
To create a new Colab notebook, go to
"https://colab.research.google.com" and create a new notebook.
A new, empty notebook will open.

You can connect to a runtime by clicking the "Connect" button in the
top-right corner. You can add a new code cell or text cell using the
respective buttons in the toolbar.

Uploading files
Sometimes you need to use a file from your PC. For that, you can
upload the required files to Google Colab. Colab provides 100 GB of
storage in a Colab session. If you want to access files from your
Google Drive, you can do so by connecting the drive to Colab.

To upload something, open the Files pane from the left toolbar.

Now upload the file using the first button. You can also mount
Google Drive using the last button.



How do they do it?

#3.1

Data Collection
Every time someone books a ride from Uber's application, Uber
saves their location with a timestamp. Uber has been collecting this
data since it began operations, and it uses this data to determine
the optimal position of cabs.

We can also plot these coordinates on a map.
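
As a rough illustration of what such booking records might look like, here is a minimal sketch that parses timestamped locations from CSV text. The column names (timestamp, lat, lon) and the sample values are made up for this example; the real data layout is not described in this notebook.

```python
import csv
import io
from datetime import datetime

# Hypothetical booking records; column names and values are illustrative only.
SAMPLE_CSV = """timestamp,lat,lon
2014-08-01 00:03:00,40.7366,-73.9906
2014-08-01 00:09:00,40.7260,-73.9918
2014-08-01 00:12:00,40.7209,-74.0507
"""


def load_pickups(text):
    """Parse CSV text into a list of (timestamp, lat, lon) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (
            datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S"),
            float(row["lat"]),
            float(row["lon"]),
        )
        for row in reader
    ]


pickups = load_pickups(SAMPLE_CSV)

# Keep only the coordinates, which is what a clustering algorithm works on.
coords = [(lat, lon) for _, lat, lon in pickups]
print(coords)
```

The coords list is the kind of input that can be plotted on a map or passed to a clustering algorithm.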


#3.2

Types of learning algorithms


Supervised Learning
As the name suggests, this type of learning algorithm has a
supervisor or teacher. Here, well-labeled data acts as the
supervisor: the algorithm uses it to learn patterns or features.
Take the fruit sorting problem: a child who can read is given a
table of fruits and their names, and is later asked to sort the
fruits into baskets. With the table, the child can sort them and
knows which basket contains which fruit.
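
The fruit example can be sketched in code. The snippet below is only an illustration: it uses a 1-nearest-neighbor rule as one simple supervised method, and the features (weight in grams, length in cm) and their values are made up.

```python
import math

# The "table of fruits and their names": features paired with labels.
# Features are (weight in grams, length in cm); all values are illustrative.
labeled_fruits = [
    ((150.0, 7.0), "apple"),
    ((170.0, 8.0), "apple"),
    ((120.0, 18.0), "banana"),
    ((130.0, 20.0), "banana"),
    ((5.0, 2.0), "grape"),
    ((6.0, 2.5), "grape"),
]


def predict(features):
    """Return the label of the closest labeled example (1-nearest neighbor)."""
    nearest = min(labeled_fruits, key=lambda item: math.dist(item[0], features))
    return nearest[1]


print(predict((140.0, 8.0)))
print(predict((5.5, 2.0)))
```

Because the training data carries names, the prediction is a name too, which is exactly what the unsupervised case lacks.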

Unsupervised Learning
Unlike supervised learning, there is no supervisor or teacher
present, i.e. the algorithm uses unlabelled data to find features or
patterns. Consider a newborn child who can't read. We cannot give
this child the table that we gave the previous one. But when asked
to sort the fruits, this child can still sort them by shape, color,
or weight. Note that in the end the child has sorted the fruits into
baskets but still doesn't know which basket contains which fruit.
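
To make the contrast concrete, here is a small sketch that groups fruits by weight alone, with no names attached. Splitting at the largest gap between sorted weights is a deliberately simple stand-in chosen for this example, not a standard algorithm from this notebook, and the weights are made up.

```python
def split_by_largest_gap(weights):
    """Split unlabeled weights (at least two values) into two groups
    at the largest gap between consecutive sorted values."""
    ordered = sorted(weights)
    # gaps[i] is the distance between ordered[i] and ordered[i + 1].
    gaps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


# Unlabeled weights in grams (illustrative values).
weights = [150.0, 5.0, 170.0, 6.0, 160.0, 4.5]
light, heavy = split_by_largest_gap(weights)
print(light, heavy)
```

The result is two baskets, but nothing in the output says which fruit is in which basket.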


#3.3

K Means
K Means is an unsupervised clustering algorithm used to find
clusters in unlabelled data. It divides the data into K clusters,
where K is chosen based on the problem being solved and on
efficiency. The algorithm takes data as input and returns the
centroids of the K clusters.

How does K Means work:

The K Means algorithm consists of three steps: centroid
initialization, clustering, and updating centroids.

Step 1: Centroid Initialization


This step marks the beginning of the algorithm: K centroids are
randomly initialized. There are two ways to initialize them.
1. Choosing K random data points from the dataset as centroids.
2. Choosing K random points (not from the dataset) as centroids.
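
Both options can be sketched in a few lines. The function names are hypothetical, and for option 2 the sketch assumes the random points are drawn from the bounding box of the data, which the text above does not specify.

```python
import random


def init_from_data(points, k, rng):
    """Option 1: pick k distinct data points as the initial centroids."""
    return rng.sample(points, k)


def init_random_in_range(points, k, rng):
    """Option 2: draw k random points inside the bounding box of the data
    (the bounding box is an assumption made for this sketch)."""
    dims = range(len(points[0]))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]
    return [
        tuple(rng.uniform(lows[d], highs[d]) for d in dims)
        for _ in range(k)
    ]


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (5.0, 1.0)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable

from_data = init_from_data(points, 2, rng)
from_range = init_random_in_range(points, 2, rng)
print(from_data)
print(from_range)
```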

Step 2: Clustering
In this step, the algorithm enters the loop and computes the distance
between each data point and each centroid. Each data point is
assigned to the cluster whose centroid is closest. After this step,
we have K clusters of data points.
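
A minimal sketch of this assignment step, assuming Euclidean distance and using made-up points and centroids:

```python
import math


def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(1.0, 1.5), (8.5, 9.0)]

labels = assign_clusters(points, centroids)
print(labels)  # [0, 0, 1, 1]
```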

Step 3: Updating centroids


In this step, each centroid is recalculated as the mean of the data
points in its cluster, and control goes back to step 2. We can run
this loop a fixed number of times (N), or make the algorithm exit
when there is no significant change in the centroids.
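
Putting steps 2 and 3 together, the loop might look like the sketch below. It takes the initial centroids as an argument, uses Euclidean distance, and treats "no significant change" as a maximum centroid shift below a tolerance; the names max_iter and tol and the handling of empty clusters are choices made for this example.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def k_means(points, init_centroids, max_iter=100, tol=1e-6):
    centroids = list(init_centroids)
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Step 3: recompute each centroid as the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            mean_point(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:  # no significant change: stop early
            break
    return centroids


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0),
          (9.0, 9.0), (9.0, 10.0), (10.0, 9.0), (10.0, 10.0)]

# Here the first and last data points serve as the initial centroids.
centroids = k_means(points, [points[0], points[-1]])
print(centroids)  # [(1.5, 1.5), (9.5, 9.5)]
```

The result depends on the starting centroids: a poor start can leave the loop stuck in a worse grouping, which is one reason smarter initialization schemes exist.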


#3.4
How to use K Means in Python:
In Python, the K Means algorithm can be imported from the sklearn
library.
from sklearn.cluster import KMeans

Create KMeans model object:

model = KMeans(n_clusters=K, init='random', max_iter=300)

Parameters:
n_clusters: the number of clusters to form.

init: specifies how to initialize the cluster centroids.

'k-means++': selects initial cluster centers for k-means
clustering in a smart way to speed up convergence.
'random': chooses n_clusters observations (rows) at
random from the data for the initial centroids.

max_iter: maximum number of iterations of the k-means
algorithm for a single run.

Train model

model.fit(X)

Parameters:
X: the training data, as a DataFrame or NumPy array with one row per
data point

Get cluster centroids

model.cluster_centers_

cluster_centers_ is a NumPy array containing the K centroids.
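
The calls above can be combined into one short end-to-end sketch. The pickup coordinates here are synthetic, generated around three made-up hotspots, and treating latitude and longitude as plain Euclidean coordinates is a simplification that is only reasonable over a small area. The n_init and random_state arguments are added for repeatability.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic pickup coordinates (lat, lon) scattered around three
# made-up hotspots; real booking data would be loaded instead.
rng = np.random.default_rng(0)
hotspots = np.array([[40.70, -74.00], [40.75, -73.98], [40.80, -73.95]])
X = np.vstack([h + rng.normal(scale=0.003, size=(100, 2)) for h in hotspots])

# Create and train the model.
model = KMeans(n_clusters=3, init="k-means++", n_init=10,
               max_iter=300, random_state=0)
model.fit(X)

centroids = model.cluster_centers_  # one (lat, lon) row per cluster
labels = model.labels_              # cluster index assigned to each row of X
print(centroids)
```

Each row of centroids is a candidate position for cabs serving that cluster of pickups.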

Result


Visit us at: techlearn.live
Email: support@techlearn.live
Phone: +91-9154796743
