
The Amazing Ways Uber Is Using Big Data


With more than 8 million users, 1 billion trips and 160,000+ drivers
across 449 cities in 66 countries, Uber is one of the fastest-growing
startups and stands at the top of its game.

By tackling problems such as poor transportation infrastructure
in some cities, unsatisfactory customer experience, late
cars, poor fulfilment and drivers refusing to accept credit cards,
Uber has “eaten the world” in less than 5 years
and has become a name to reckon with when it comes to
solving people's transportation problems.
• With a huge database of drivers, Uber's algorithms can match a rider with the most
suitable nearby driver within about 15 seconds of a car being requested. Uber stores
and analyses data on every single trip its users take, and uses it to predict demand
for cars, set fares and allocate sufficient resources.

• The data science team at Uber also analyses public transport networks in depth
across different cities, so that it can focus on cities with poor transportation and
use the data to improve the customer experience.

• Uber drivers continue to generate data even when they are not carrying any
passengers, because they keep transmitting data back to Uber's central platform,
where it is used to draw inferences about traffic patterns. The data is stored in a
database for supply and demand analysis.

• Driver data is used for autonomous car research, surge pricing, tracking the location
of drivers, monitoring drivers' speed, motion and acceleration, and identifying whether
a driver is also working for a competing cab-sharing company.
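
As a rough illustration of how location reports from idle drivers could feed a supply picture, the sketch below buckets pings into coarse grid cells and counts distinct idle drivers per cell. The `Ping` schema, the cell size and the function names are assumptions made for this example; the article does not describe Uber's actual data model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Ping:
    """One location report from a driver (hypothetical schema)."""
    driver_id: str
    lat: float
    lon: float
    has_passenger: bool


def grid_cell(lat: float, lon: float, cell_deg: float = 0.01) -> tuple[int, int]:
    """Map a coordinate to a coarse grid cell (0.01 degrees is roughly 1 km north-south)."""
    return (int(lat // cell_deg), int(lon // cell_deg))


def idle_supply_by_cell(pings: list[Ping]) -> Counter:
    """Count distinct idle drivers per grid cell from a batch of pings."""
    seen = set()        # (driver_id, cell) pairs already counted
    supply = Counter()  # cell -> number of distinct idle drivers
    for p in pings:
        if p.has_passenger:
            continue  # only drivers without a passenger count as available supply
        key = (p.driver_id, grid_cell(p.lat, p.lon))
        if key not in seen:
            seen.add(key)
            supply[key[1]] += 1
    return supply
```

A per-cell count of ride requests built the same way would give the demand side, and comparing the two is one simple input to the supply and demand analysis described above.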
Big data analysis spans diverse functions at Uber: machine learning, data
science, marketing, fraud detection and more. Uber's data covers trips, billing,
the health of its infrastructure and the other services behind its app. City
operations teams use this data to calculate driver incentive payments and to predict
many other real-time events. Data streaming runs through a Hadoop Hive-based
analytics platform, which gets the required data to the right people and services
at the right time.

“Whether it’s calculating Uber’s ‘surge pricing’, helping drivers to avoid accidents, or
finding the optimal positioning of cars to maximize profits, data is central to what Uber
does. All these data problems… are really crystalized on this one math with people all
over the world trying to get where they want to go. That’s made data extremely
exciting here, it’s made engaging with Spark extremely exciting,” said Uber’s Head of
Data, Aaron Schildkrout.

Data Science at Uber
Surge Pricing
1. Uber’s surge pricing model uses both geo-location and ride demand to
position drivers efficiently.
2. Data science methods are used extensively to analyse the short-term effects of
surge pricing on customer demand and its long-term effects on customer
retention.
3. Uber depends on regression analysis to find out which neighbourhoods will be the
busiest so it can activate surge pricing to get more drivers on the roads.
4. In the near future, machine learning algorithms will take multiple data inputs and
predict where demand will be highest so that Uber drivers can be redirected
there. Keeping supply in line with demand this way would mean that surge pricing
does not actually have to be applied.
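
The regression and surge steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not Uber's model: it fits a straight line to hypothetical hourly request counts for one neighbourhood, then applies an assumed rule that scales price by the ratio of forecast demand to available drivers, capped at an arbitrary maximum.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def surge_multiplier(predicted_requests, available_drivers, cap=3.0):
    """Assumed rule: price scales with the demand/supply ratio, kept within [1, cap]."""
    if available_drivers <= 0:
        return cap  # no supply at all: use the maximum multiplier
    ratio = predicted_requests / available_drivers
    return max(1.0, min(cap, ratio))


# Hypothetical history for one neighbourhood: requests per hour on one evening.
hours = [17, 18, 19, 20]
requests = [40, 50, 60, 70]

slope, intercept = fit_line(hours, requests)
forecast_21h = slope * 21 + intercept  # extrapolate one hour ahead
print(round(forecast_21h), surge_multiplier(forecast_21h, available_drivers=50))
```

A real forecast would draw on many more inputs (weather, events, day of week) and a richer model; the point here is only the two-stage shape of predicting demand first and then deriving a price signal from the gap between demand and supply.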
Matching Algorithms at Uber
1. Given a pickup location, a drop-off location and the time of day, predictive models developed
at Uber estimate how long it will take a driver to cover the distance. Uber has sophisticated
routing and matching algorithms that direct cars to people and people to places. From
the moment you open the Uber app until you reach your destination, Uber’s routing engine and
matching algorithms are hard at work.
2. Uber follows a supplier pick map matching algorithm, in which the customer selects the
variables associated with a service (in this case through the Uber app) and the service makes
a match by sending requests to the best-suited list of service providers. Any Uber ride request
is first sent to the nearest available Uber driver (determined by comparing the customer's
location with each driver's expected time of arrival). The Uber
driver then accepts or rejects the ride request. This matching algorithm works well for Uber
because the transaction is highly commoditized, i.e. the number of variables the customer
has to decide on before a match is made is minimal.
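
A minimal sketch of the dispatch flow described above: rank the available drivers by estimated time of arrival and offer the ride to each in turn until one accepts. The straight-line distance, the assumed average speed, the dictionary fields and the `accepts` callback are simplifications introduced for this example; a production routing engine would use road-network travel times.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def eta_minutes(driver, rider, avg_speed_kmh=25.0):
    """Crude ETA: straight-line distance at an assumed average city speed."""
    km = haversine_km(driver["lat"], driver["lon"], rider["lat"], rider["lon"])
    return km / avg_speed_kmh * 60


def dispatch(rider, drivers, accepts):
    """Offer the ride to available drivers in order of ETA until one accepts.

    `accepts` is a hypothetical callback standing in for the driver's
    accept/reject decision in the app. Returns the driver id, or None.
    """
    candidates = sorted(
        (d for d in drivers if d["available"]),
        key=lambda d: eta_minutes(d, rider),
    )
    for d in candidates:
        if accepts(d):
            return d["id"]
    return None
```

Because the rider only supplies a pickup point and a destination, the ranking needs a single score per driver, which is what makes this kind of supplier-side matching simple to run at scale.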
Fare Estimates

1. Uber uses a mixture of internal and external data to estimate fares.
2. Uber calculates fares automatically using street traffic data, GPS data and its
own algorithms, which adjust the fare based on the time of the journey.
3. It also analyses external data, such as public transport routes, to plan various
services.
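
To make the fare calculation concrete, here is a hedged sketch of a metered estimate: a base charge plus distance and time components, multiplied by a time-of-day factor. The rate card, the peak hours and the 1.25 factor are placeholder assumptions, not Uber's prices or rules; in practice distance would come from GPS data and duration from traffic data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical rate card; the values are placeholders, not Uber's prices."""
    base: float = 2.00
    per_km: float = 1.00
    per_minute: float = 0.25
    minimum: float = 5.00


def time_of_day_factor(hour: int) -> float:
    """Assumed adjustment: a premium during the morning and evening peaks."""
    return 1.25 if hour in (7, 8, 9, 17, 18, 19) else 1.0


def estimate_fare(distance_km: float, duration_min: float, hour: int,
                  rates: Rates = Rates()) -> float:
    """Base + distance + time, adjusted for time of day, never below the minimum."""
    metered = rates.base + rates.per_km * distance_km + rates.per_minute * duration_min
    return round(max(rates.minimum, metered * time_of_day_factor(hour)), 2)
```

With these placeholder rates, an 8 km, 20-minute trip comes to 15.00 off-peak and 18.75 at 18:00, while a very short trip is lifted to the 5.00 minimum.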
Uber Data Science Tools
1. Data science at Uber is commonly done in Python, with third-party modules such as
NumPy, SciPy, Matplotlib and Pandas.
2. The Uber data team occasionally uses the R programming language, Octave or Matlab
for prototypes or one-off data science projects, but not in the production stack.
3. D3 is the preferred data visualization tool at Uber, and Postgres is the
preferred SQL database.

Source - https://www.dezyre.com/article/how-uber-uses-data-science-to-reinvent-transportation/290
