Uber Ridesharing Clustering: Notebook

NOTEBOOK
Uber Ridesharing
Clustering
Determining optimal position of cabs
Knowledge Discovery through Brand Stories

#1
Introduction
Uber Technologies, Inc., commonly known as Uber, is an American
technology company. Its services include ride-hailing, food
delivery (Uber Eats), package delivery, couriers, and freight
transportation. It was formerly known as UberCab (2009 - 2011).
Uber is valued at roughly $75 billion. It currently operates in
71 countries and 890 cities, with five major markets: the US,
Mexico, Brazil, Spain, and India.

#2
Why is Uber a Technology
Company?
Although most of the services offered by Uber belong to the
transportation industry, many still consider Uber a technology
company. That is because Uber tries to solve the problems currently
facing the transportation industry through technology. The following
are some of the problems Uber is trying to solve.
1. Decrease commute time from point A to point B.
2. Be independent and create supply.
3. Determining optimal position of cabs.
Problem 1: Decrease commute time

To solve this problem, Uber started Uber Elevate in 2016. Its goal
is to build drone taxis and bring them to market, which would
decrease travel time from point A to point B. Uber Elevate has
played an important role in laying the groundwork for the aerial
ridesharing market by bringing together regulators, civic leaders, real
estate developers, and technology companies around a shared vision
for the future of air travel.

Problem 2: Be independent and create supply
Whenever Uber enters a new city or country, it faces the challenge
of matching demand (riders) with supply (drivers). Generally, there is
huge demand but very little supply. To solve this problem, Uber is
working on driverless cars, which can be deployed as soon as it
enters the market in a country. With these, it will no longer
depend on drivers registering on its platform. Uber ATG
(Advanced Technologies Group) was responsible for all
development of Uber's driverless platform.
Problem 3: Determining optimal position of cabs
This is our problem of the day. We will see how Uber tries to
determine the optimal position of cabs.

#Extra
Introduction to Colab
Nearly all machine learning and deep learning algorithms require
good hardware. What if you don't have good hardware? Should
you drop your dream of being a data scientist? No, there's an
alternative. Let us introduce you to Colab.
Colab is a service provided by Google that lets you access a virtual
machine hosted on Google's servers. These virtual machines have
dual-core Xeon processors with 12 GB of RAM. You can even use a GPU
for your neural networks. Colab is an interactive Python notebook
(ipynb), which means that alongside Python code you can also
write normal text and include images.
To create a new Colab notebook, go to
"https://colab.research.google.com" and create a new notebook.
This opens an empty notebook.
You can connect to a runtime by clicking the "Connect" button in the
top-right corner. You can add a new code cell or text cell using the
respective buttons in the toolbar.

Uploading files
Sometimes you need to use a file from your PC. For that, you can
upload the required files to Colab. Colab provides 100 GB of storage
in a session. If you want to access files from your Google Drive, you
can do so by connecting the drive to Colab.
To upload something, open the file pane from the left toolbar.
Then upload the file using the first button. You can also mount
Google Drive using the last button.
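The same two actions can also be done from a code cell. The following is a minimal sketch, not the only way to do it: the `google.colab` module is importable only inside a Colab runtime, so the import is guarded, and the mount point `/content/drive` is the conventional location, assumed here.

```python
# Upload a file or mount Google Drive from a Colab code cell.
# google.colab only exists inside a Colab runtime, so guard the import.
try:
    from google.colab import drive, files
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    # Opens a file picker; returns a dict mapping file names to their bytes.
    uploaded = files.upload()
    print("Uploaded:", list(uploaded))

    # Mounts Google Drive; your files appear under /content/drive/MyDrive.
    drive.mount("/content/drive")
else:
    print("Not running in Colab; skipping upload and Drive mount.")
```

Uploaded files live only as long as the session, so mounting Drive is usually the better choice for datasets you want to reuse.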

How do they do it?
#3.1
Data Collection
Every time someone books a ride through Uber's application, Uber
saves their location with a timestamp. Uber has been collecting this
data since it began operations, and uses it to
determine the optimal position of cabs.
We can also plot these coordinates on a map.
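As a rough illustration of such a plot, the sketch below scatters pickup coordinates with longitude on the x-axis and latitude on the y-axis. The coordinates are synthetic and the city centre is a made-up assumption. Real data would come from Uber's ride records, which are not part of this notebook.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic pickup coordinates scattered around a hypothetical city centre.
n_rides = 500
lat = 40.75 + 0.05 * rng.standard_normal(n_rides)
lon = -73.98 + 0.05 * rng.standard_normal(n_rides)

# A scatter plot of longitude vs. latitude works as a crude map.
fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(lon, lat, s=5, alpha=0.5)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("Pickup locations (synthetic data)")
```

In a Colab cell the figure is rendered when the cell finishes. Dense regions of the scatter are the areas where many rides were booked.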

How do they do it?
#3.2
Types of learning algorithms

Supervised Learning
As the name suggests, this type of learning algorithm has a
supervisor or teacher. Well-labelled data acts as the supervisor
or teacher, and the algorithm uses this data to learn patterns or
features. Take the fruit sorting problem: a child who can read is
given a table of fruits and their names, and is later asked to sort
the fruits into baskets. With the provided table, the child can sort
them and knows which basket contains which fruit.
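The fruit example can be sketched in code. This is only an illustration: the two features (weight and a "redness" score) and the choice of a nearest-neighbour classifier are assumptions made for the sketch, not part of the original example.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical features per fruit: [weight in grams, redness from 0 to 1].
X_train = [[150, 0.90], [160, 0.80], [120, 0.10], [115, 0.20]]
y_train = ["apple", "apple", "banana", "banana"]  # the labels are the "teacher"

# A 1-nearest-neighbour classifier: an unseen fruit gets the name of the
# most similar labelled fruit.
model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)

# Because it learned from labelled examples, the model can name unseen fruits.
predictions = model.predict([[152, 0.85], [118, 0.15]])
print(predictions)
```

The key point is the second argument to `fit`: the names in `y_train` play the role of the table given to the child.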


Unsupervised Learning
Unlike supervised learning, there is no supervisor or teacher present,
i.e. the algorithm uses unlabelled data to find features or patterns.
Consider a newborn child who can't read. We cannot give this child the
same table that we gave to the previous child. But when asked
to sort the fruits, the child can still sort them on the basis of
shape, color, or weight. Notice that in the end the child has sorted
the fruits into baskets but still doesn't know which basket
contains which fruit.
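A small sketch of the same idea in code, under assumed features (weight and a "redness" score, both invented for illustration). No fruit names are passed to the algorithm, so it can only return anonymous group numbers.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical features per fruit: [weight in grams, redness from 0 to 1].
# Note that there are no labels this time.
X = np.array([[150, 0.90], [160, 0.80], [120, 0.10], [115, 0.20]])

# n_init and random_state are set only to make the sketch reproducible.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)

# Each fruit gets a group number (0 or 1), but nothing says which
# group is "apple" and which is "banana".
print(labels)
```

The first two fruits end up in one group and the last two in the other, which mirrors the child who sorts the baskets correctly without knowing what each basket contains.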

How do they do it?
#3.3
K Means
K Means is an unsupervised clustering algorithm used to find
clusters in unlabelled data. It divides the data into
K clusters, where K is a parameter chosen to suit the problem
being solved and the efficiency required. The algorithm takes data as
input and returns the centroids of the K clusters.
How does K Means work:

The K Means algorithm consists of three steps: centroid
initialization, clustering, and updating centroids.

Step 1: Centroid Initialization

This step marks the beginning of the algorithm: K centroids are
randomly initialized. There are two ways to initialize
them.
1. Choosing K random data points from the dataset as centroids.
2. Choosing K random points (not from the dataset) as centroids.
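The two initialization options can be sketched with NumPy as follows. The dataset is synthetic. For option 2, drawing points from the bounding box of the data is one reasonable interpretation of "random points", assumed here.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
X = rng.uniform(low=0.0, high=10.0, size=(100, 2))  # synthetic 2-D data
K = 3

# Option 1: pick K distinct rows of the dataset as the starting centroids.
row_ids = rng.choice(len(X), size=K, replace=False)
centroids_from_data = X[row_ids]

# Option 2: draw K random points inside the bounding box of the data.
centroids_random = rng.uniform(low=X.min(axis=0), high=X.max(axis=0), size=(K, 2))

print(centroids_from_data)
print(centroids_random)
```

Option 1 guarantees that every starting centroid lies where data actually exists, whereas option 2 may place a centroid in an empty region.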
Step 2: Clustering
In this step, the algorithm enters a loop and computes the distance
between each data point and each centroid. Each data point is
assigned to the cluster whose centroid is closest. After this step,
we have K clusters of data points.
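A minimal sketch of this assignment step on four hand-picked points and two centroids. Euclidean distance is assumed, since the text does not name a distance measure.

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids = np.array([[1.0, 1.5], [8.5, 9.0]])

# distances[i, j] = Euclidean distance from point i to centroid j.
# Broadcasting compares every point with every centroid at once.
distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)

# Each point joins the cluster of its nearest centroid.
labels = distances.argmin(axis=1)
print(labels)  # [0 0 1 1]
```

The first two points are closest to centroid 0 and the last two to centroid 1, so the step yields two clusters.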
Step 3: Updating centroids

In this step, each centroid is recalculated as the mean of the data
points in its cluster, and control goes back to step 2. We can run
this loop N times, or make the algorithm exit when there is no
significant change in the centroids.
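Putting the three steps together, the whole loop can be sketched in plain NumPy. This is an illustrative implementation, not the one scikit-learn uses. The function name `k_means`, the tolerance `tol`, and the handling of empty clusters are choices made for this sketch.

```python
import numpy as np

def k_means(X, k, max_iter=100, tol=1e-4, seed=0):
    """Plain NumPy K-Means sketch: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1: initialise centroids with k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)

        # Step 3: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])

        # Exit early once the centroids have effectively stopped moving.
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break

    return centroids, labels


# Two well-separated synthetic blobs, centred near (0, 0) and (5, 5).
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2)),
])
centroids, labels = k_means(X, k=2)
print(centroids)
```

Both exit conditions from the text appear here: `max_iter` caps the number of loop iterations, and `tol` stops the loop when the centroids no longer change significantly.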

How do they do it?
#3.4
How to use K Means in Python:
In Python, the K Means algorithm can be imported from the
sklearn library.
from sklearn.cluster import KMeans
Create KMeans model object:
model = KMeans(n_clusters=K, init='random', max_iter=300)
Parameters:
n_clusters: the number of clusters needed.
init: specifies how to initialize the cluster centroids.
    'k-means++': selects initial cluster centers for k-means
    clustering in a smart way to speed up convergence. See
    the scikit-learn documentation for more details.
    'random': chooses n_clusters observations (rows) at
    random from the data for the initial centroids.
max_iter: maximum number of iterations of the k-means
algorithm for a single run.
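To see the effect of the `init` parameter, the sketch below builds the model with each option on synthetic data. `n_init` and `random_state` are extra arguments, not mentioned above, added here only so that the run is reproducible across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(200, 2))  # synthetic data for illustration
K = 4

for init in ("k-means++", "random"):
    model = KMeans(n_clusters=K, init=init, max_iter=300, n_init=10, random_state=0)
    model.fit(X)
    # n_iter_ is the number of iterations the best run needed; inertia_ is the
    # sum of squared distances from each point to its assigned centroid.
    print(init, model.n_iter_, round(model.inertia_, 2))
```

Note that the option string is lowercase: `'k-means++'`. A lower `inertia_` means tighter clusters, so it is a convenient number for comparing the two initializations.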

Train model
model.fit(X)
Parameters:
X: a DataFrame or NumPy array
Get cluster centroids
model.cluster_centers_
cluster_centers_ is a NumPy array containing the K centroids.
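An end-to-end sketch of training the model and reading the centroids. The pickup data is synthetic, generated around three made-up hotspots, and `n_init` and `random_state` are extra arguments added for reproducibility. With real ride data, `X` would hold the recorded latitude and longitude of each booking.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Synthetic pickups (latitude, longitude) around three hypothetical hotspots.
hotspots = np.array([[40.70, -74.00], [40.76, -73.98], [40.80, -73.95]])
X = np.vstack([h + 0.005 * rng.standard_normal((100, 2)) for h in hotspots])

model = KMeans(n_clusters=3, init="k-means++", max_iter=300, n_init=10, random_state=0)
model.fit(X)

# One [lat, lon] row per cluster: candidate positions for waiting cabs.
centroids = model.cluster_centers_
print(centroids)

# predict() returns the index of the nearest centroid for new points.
zone = model.predict([[40.761, -73.981]])
print("Nearest cab position:", centroids[zone[0]])
```

Each centroid sits in the middle of a group of bookings, which is why the centroids can be read as suggested cab positions.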
Result

Visit us at: techlearn.live
Email: support@techlearn.live
Phone: +91-9154796743
