
Analysis Report – Analytics Challenge

Julián David Rojas Castaño

Bogotá D.C., Colombia – 2020


Development

Part 1. Metrics design

1. First of all, we need to make two assumptions: (1) it is necessary to relocate
the couriers in peak hours (7-9 am, 12-2 pm and 6-8 pm) due to the high
volume of orders, and (2) couriers prefer to be in zones with a high volume of
orders, which creates a bottleneck and leaves other zones without couriers to
pick up users’ orders. Based on this, I think that the main variables that
the research team studied are the following:

• Zones with a high quantity of couriers in peak hours.
• Zones with a low quantity of couriers in peak hours.
• Quantity of orders in zones with a high quantity of couriers in peak hours.
• Quantity of orders in zones with a low quantity of couriers in peak hours.
• Quantity of couriers in zones with a high quantity of orders in peak hours.
• Quantity of couriers in zones with a low quantity of orders in peak hours.

Graphic 1 represents my point of view:


Graphic 1. Representation of the problem
2. Based on the variables chosen in the first question, and assuming
that the purpose of the optimization model is to avoid a
bottleneck in zones with a high quantity of couriers while ensuring enough
couriers for all the orders in zones with a low quantity of couriers, I
suggest the following 3 metrics for measuring its impact, in order of
importance:

a. Quantity of couriers vs quantity of orders in a certain period of
time in zones with low quantity of couriers compared to other
zones.

$$X_{\text{zones with low quantity of couriers}} = \frac{\text{Quantity of couriers}}{\text{Quantity of orders}}$$

(It should be 1 or close to 1.)

b. Quantity of couriers vs quantity of orders in a certain period of
time in zones with high quantity of couriers compared to other
zones.

$$Y_{\text{zones with high quantity of couriers}} = \frac{\text{Quantity of couriers}}{\text{Quantity of orders}}$$

(It should be 1 or close to 1.)

c. Amount of time between the moment the user places an order and the
moment any courier picks it up, no matter the zone.

Z = Amount of time between …

Note: The reason why I didn’t unify metric “a” and metric “b” is that our
goal is to get an index of 1 in each one, which means that we have enough
couriers for the number of orders in that particular zone. Every city has
zones with a bigger volume of orders than other zones; the main
goal is to keep a balance between the demand (orders) and the supply
(couriers). My proposal is to always compare zones with a high number of
couriers against zones with a low number of couriers. (Of course, real operations
will demand something more elaborate.)
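
A minimal sketch of how metrics a, b and c could be computed, assuming that per-zone counts of couriers and orders for one time window are available. The zone names and numbers are made up for illustration and do not come from the dataset; the function names are hypothetical.

```python
from statistics import mean


def courier_order_ratio(couriers: int, orders: int) -> float:
    """Metrics X and Y: couriers available per order in one zone and time window."""
    if orders == 0:
        raise ValueError("ratio is undefined for a window without orders")
    return couriers / orders


def mean_pickup_minutes(waits_in_minutes: list[float]) -> float:
    """Metric Z: average time between order creation and courier pickup."""
    return mean(waits_in_minutes)


# Hypothetical peak-hour snapshot: (couriers, orders) per zone.
snapshot = {"zone_low": (12, 20), "zone_high": (45, 30)}

ratios = {zone: courier_order_ratio(c, o) for zone, (c, o) in snapshot.items()}

# Distance from the target value of 1: smaller is better for both X and Y.
gaps = {zone: abs(ratio - 1) for zone, ratio in ratios.items()}

print(ratios)
print(gaps)
```

Measuring the distance from 1 instead of the raw ratio treats a shortage of couriers and an excess of couriers as the same kind of imbalance, which matches the goal of an index close to 1 in every zone.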

3. I would take the following approach for testing the new optimization model:

We need to choose only one city to test the performance. This city has to be
small; in a country like Colombia, it could be Montería or Soledad. Then we
need to determine which zones have a high quantity of couriers in peak hours
and which ones have a low quantity of couriers in the same peak hours. The
idea is to check the performance of the model for one full week, because
Mondays to Fridays behave differently from Saturdays and Sundays. Then we can
begin gathering information based on the metrics proposed before. I would
propose the following standard:

Table 1. Table to track Rappi’s new optimization model


Of course, this is just a general proposal; for the actual operations, we would
have to analyze a large amount of data. For variables X and Y, the purpose is
to get them as close to 1 as possible. For variable Z, the purpose
is to decrease it as much as possible.

If the objective of the new optimization model is to better allocate couriers
and maximize the chance that there is an available courier for each one of
the orders, which is an amazing milestone, the company has to take the risk
and monitor the results, as I said, in a small city for only one week. If the
company gets an improvement in performance based on the established
metrics compared with the old model, we can use this new model for 2 or
3 more weeks, always making sure that the rest of the operations are
working well. If at some point the new model breaks, the company has to
apply a contingency plan to roll back to the old model.

4. Common sense would tell you to compare the old model with the new one,
and that’s what I would do. To be able to deliver good conclusions, we need
to use the same metrics for both models, which means that we need to gather
enough information using the same variables. Table 2 represents, in general
terms, how to compare both optimization models using the same metrics.

Table 2. Table to compare Rappi’s new optimization model vs old one

Getting an improvement of 1% could be very helpful in an operation as large as
Rappi’s.
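
As a rough sketch of that comparison, assuming weekly averages of X, Y and Z have been collected for both models (all numbers below are hypothetical, not measured results):

```python
def relative_change(old: float, new: float) -> float:
    """Relative change of a metric from the old model to the new one."""
    return (new - old) / old


def gap_to_one(ratio: float) -> float:
    """For X and Y the target is 1, so the models are compared by distance from 1."""
    return abs(ratio - 1.0)


# Hypothetical weekly averages for the same city and the same peak hours.
old = {"X": 0.60, "Y": 1.50, "Z_minutes": 10.0}
new = {"X": 0.80, "Y": 1.25, "Z_minutes": 9.0}

x_improved = gap_to_one(new["X"]) < gap_to_one(old["X"])
y_improved = gap_to_one(new["Y"]) < gap_to_one(old["Y"])

# A negative value means that orders are picked up faster with the new model.
z_change = relative_change(old["Z_minutes"], new["Z_minutes"])

print(x_improved, y_improved, f"{z_change:.1%}")
```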
After establishing that the new model is better than the old one, we
need to take into account the following conditions:

• Deployment cost
• Deployment time
• Scalability
• Faults
• Integration with the couriers

Only after ensuring that those conditions are acceptable can we
proceed to deploy the new optimization model to all the cities,
city by city rather than all of them at the same time, while
tracking the performance and keeping the contingency plan ready
in case the optimization model breaks.

Part 3. Data analysis

1. First of all, we need to organize the information because of the current format of
the file. I split the values into separate columns and adjusted
the format of each column. Then I selected all the information to create a
pivot table, which is an excellent tool for these cases.

I created 2 graphics to answer the questions as below:


[Pie chart: Taken vs non-taken orders. Taken: 115860 (92%); non-taken: 9689 (8%).]

Plot 1. Taken vs non-taken orders

Plot 1 shows that 9.689 orders were not taken by any courier, which is
8% of the total orders (125.549). This is a low value, and the goal is
to keep decreasing it as much as possible.
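
The percentage can be checked directly from the two counts shown in Plot 1. This is only a worked calculation; the variable names are illustrative.

```python
taken = 115860      # taken orders, from Plot 1
non_taken = 9689    # non-taken orders, from Plot 1

total = taken + non_taken
non_taken_share = non_taken / total

# The exact share is slightly below 8%; the report rounds it to a whole percent.
print(total, f"{non_taken_share:.1%}")
```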

[Bar chart: Day vs non-taken orders. Non-taken orders per day of the week, in descending order: 2090, 1630, 1324, 1252, 1190, 1113, 1090.]

Plot 2. Day vs non-taken orders

Plot 2 shows that, among the non-taken orders, Saturday is the day of the
week with the highest value (2.090, 21.6%), and Friday has the highest
value (1.630, 16.8%) among working days (Monday to Friday).
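
A pivot table is one way to get these counts. A rough Python equivalent is sketched below, assuming each order has a creation timestamp and a taken flag. The field layout and the toy rows are hypothetical and are not taken from orders.xlsx.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def non_taken_by_weekday(orders):
    """Count non-taken orders per weekday.

    `orders` is an iterable of (created_at, taken) pairs, where created_at is an
    ISO 8601 string and taken is a bool (hypothetical layout).
    """
    counts = Counter()
    for created_at, taken in orders:
        if not taken:
            weekday = datetime.fromisoformat(created_at).weekday()  # Monday is 0
            counts[DAYS[weekday]] += 1
    return counts


# Toy rows for illustration only.
sample = [
    ("2020-02-01T12:30:00", False),  # a Saturday
    ("2020-02-01T19:00:00", True),
    ("2020-01-31T08:15:00", False),  # a Friday
]

counts = non_taken_by_weekday(sample)
total_non_taken = sum(counts.values())
shares = {day: n / total_non_taken for day, n in counts.items()}
print(counts, shares)
```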
2. From my point of view, the most important variables to determine whether or
not an order is going to be taken by a courier are the following three:
to_user_distance, to_user_elevation and total_earning. Due to the large
number of values in each column, it was necessary to group them:

total_earning (COP)
0 and 2.000
2.000 and 3.000
3.000 and 4.000
4.000 and 5.000
5.000 and 6.000
6.000 and 7.000
7.000 and 8.000
8.000 and 9.000
more than 9.000

to_user_distance (km)
0 and 0.5
0.5 and 1
1 and 1.5
1.5 and 2
2 and 2.5
2.5 and 3
3 and 3.5
3.5 and 4
more than 4

to_user_elevation (m)
more than 0
-40 and 0
less than -40

The limits were chosen after analyzing all the information and
finding a pattern. The only special case was “to_user_elevation”: I had to
assume that all the couriers were using bikes, and I found
on the Internet that when the difference in altitude from point A to point
B is less than -40 m (which means that the courier has to go uphill), the physical
effort is too great, so couriers would prefer not to take the order.
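
The grouping can be sketched in Python as follows. The limits are the ones listed above; how a value that falls exactly on a limit is assigned is not stated in this report, so the boundary handling below (a value equal to a limit goes to the upper group, and exactly -40 m goes to "-40 and 0") is an assumption.

```python
from bisect import bisect_right

EARNING_EDGES = [2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000]  # COP
DISTANCE_EDGES = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]          # km


def bin_index(value: float, edges: list[float]) -> int:
    """Return the group number: 0 below edges[0], up to len(edges) from edges[-1] on.

    Assumption: a value exactly equal to a limit is placed in the upper group.
    """
    return bisect_right(edges, value)


def elevation_group(elevation_m: float) -> str:
    """Map to_user_elevation (m) to one of the three groups used in this report."""
    if elevation_m > 0:
        return "more than 0"
    if elevation_m >= -40:  # assumption: exactly -40 m belongs to this group
        return "-40 and 0"
    return "less than -40"


print(bin_index(3500, EARNING_EDGES))   # third earning group (3.000 and 4.000)
print(bin_index(0.3, DISTANCE_EDGES))   # first distance group (0 and 0.5)
print(elevation_group(-55))
```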

The next step was to compare each of the three columns with the taken or
non-taken column, which gives us a simple estimate of whether the order is
going to be taken by a courier or not. I found the following results:

total_earning (COP)   taken   non-taken   taken (%)   non-taken (%)
0 and 2.000               3           1         75%             25%
2.000 and 3.000           4           0        100%              0%
3.000 and 4.000       20183        3017         87%             13%
4.000 and 5.000       29752        2350         93%              7%
5.000 and 6.000       29723        1849         94%              6%
6.000 and 7.000       17056        1200         93%              7%
7.000 and 8.000        9252         562         94%              6%
8.000 and 9.000        5425         307         95%              5%
more than 9.000        4462         403         92%              8%

to_user_distance (km)   taken   non-taken   taken (%)   non-taken (%)
0 and 0.5               13458         603         96%              4%
0.5 and 1               26863        1478         95%              5%
1 and 1.5               25589        1926         93%              7%
1.5 and 2               21817        2259         91%              9%
2 and 2.5               14485        1662         90%             10%
2.5 and 3                7021         928         88%             12%
3 and 3.5                4345         561         89%             11%
3.5 and 4                1745         236         88%             12%
more than 4               537          36         94%              6%

to_user_elevation (m)   taken   non-taken   taken (%)   non-taken (%)
more than 0             69774        5532         93%              7%
-40 and 0               29868        2458         92%              8%
less than -40           16218        1699         91%              9%
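
The percentage columns follow from the counts. As a small worked check, the sketch below recomputes them for the to_user_elevation table; the counts are copied from that table and the function name is illustrative.

```python
def taken_rate(taken: int, non_taken: int) -> float:
    """Share of orders in a group that were taken by a courier."""
    return taken / (taken + non_taken)


# (taken, non-taken) counts copied from the to_user_elevation table.
elevation_counts = {
    "more than 0": (69774, 5532),
    "-40 and 0": (29868, 2458),
    "less than -40": (16218, 1699),
}

rates = {group: taken_rate(t, n) for group, (t, n) in elevation_counts.items()}

for group, rate in rates.items():
    print(f"{group}: {rate:.0%} taken, {1 - rate:.0%} non-taken")
```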

To get deeper and more realistic conclusions, my proposal is to create
combinations with the 4 most important columns, changing the
conditions in each one:
As you can observe, this process creates an iterative method that
is very inefficient to do in Excel. I suggest writing a program in
Python to find all the combinations and plot them:

Combinations = 9 x 9 x 3 = 243
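
A minimal sketch of such a program, assuming every order has already been assigned to one group per column. The row layout and the toy rows are hypothetical; plotting is left out.

```python
from collections import defaultdict
from itertools import product

EARNING_GROUPS = [
    "0 and 2.000", "2.000 and 3.000", "3.000 and 4.000", "4.000 and 5.000",
    "5.000 and 6.000", "6.000 and 7.000", "7.000 and 8.000", "8.000 and 9.000",
    "more than 9.000",
]
DISTANCE_GROUPS = [
    "0 and 0.5", "0.5 and 1", "1 and 1.5", "1.5 and 2", "2 and 2.5",
    "2.5 and 3", "3 and 3.5", "3.5 and 4", "more than 4",
]
ELEVATION_GROUPS = ["more than 0", "-40 and 0", "less than -40"]

# Every (earning, distance, elevation) combination: 9 x 9 x 3.
combinations = list(product(EARNING_GROUPS, DISTANCE_GROUPS, ELEVATION_GROUPS))
print(len(combinations))


def rates_by_combination(rows):
    """Taken rate per combination that actually occurs in the data.

    `rows` is an iterable of (earning_group, distance_group, elevation_group, taken).
    """
    counts = defaultdict(lambda: [0, 0])  # combination -> [taken, non_taken]
    for earning, distance, elevation, taken in rows:
        counts[(earning, distance, elevation)][0 if taken else 1] += 1
    return {combo: t / (t + n) for combo, (t, n) in counts.items()}


# Toy rows for illustration only.
toy_rows = [
    ("5.000 and 6.000", "0 and 0.5", "more than 0", True),
    ("5.000 and 6.000", "0 and 0.5", "more than 0", False),
]
print(rates_by_combination(toy_rows))
```

Combinations without any order simply do not appear in the result, which makes empty combinations easy to spot by comparing against the full list.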

3. First of all, I would like to express the importance of getting and delivering
this kind of information. At least 90% of the decisions taken inside a
company should be based on data obtained from daily operations. That’s
why the Analytics and Operations Departments should be considered among the
most crucial in any organization.

Based on the analysis made with the dataset provided, I would present the
following conclusions to Rappi’s Business Department to increase the number of orders
taken by the couriers:

• There is a 13% chance that the courier will reject the order if the
payment for the delivery is between $3.000 and $4.000 COP.
• There are only 8 orders with a payment lower than $3.000 COP.
• There is a 100% chance that the courier will take the order if the
payment for the delivery is between $2.000 and $3.000 COP and the
distance between the user and the restaurant is less than 2.45 km.
• The highest chance of an order being taken (95%) occurs when the payment
is between $8.000 and $9.000 COP; the courier could take longer
trips with this payment.
• There is a 12% chance that the courier will reject the order if the
distance for the delivery is between 2.5 and 3 km.
• The courier will take almost all of the orders if the distance for the
delivery is between 0 and 0.5 km, unless the payment is very low or
the difference in altitude is very high.
• The courier is willing to take trips of more than 4 km only if the
payment is higher than $4.000 COP and the difference in altitude is higher than 0.
In cases where the difference in altitude is less than 40 m, the
payment usually should be higher than $7.000 COP.
• There is a 93% chance that the courier will take the order if the
difference in altitude is higher than 0, which means that the courier
has to go downhill.
• There is only a 91% chance that the courier will take the order if the
difference in altitude is lower than -40 m; Rappi needs to
balance that with shorter distances and higher payments.
• There are no orders with the following conditions: earnings between 0
and $2.000 COP, distances between 0 and 0.5 km and a difference in
altitude higher than 0.
• There is a 97% chance that the courier will take the order if it has
the following conditions: earnings between $5.000 and $6.000 COP,
distances between 0 and 0.5 km and a difference in altitude higher
than 0.
• There are 6 very uncommon orders in which the payment was
higher than $9.000, the distance was between 0 and 0.5 km, the
difference in altitude was higher than 0, and the courier still rejected
them. The same happened with 7 orders with payments between $8.000 and $9.000
and 17 orders with payments between $7.000 and $8.000. This could be
due to the hour or the zone of the order.

One of the main strategies to encourage couriers to take an order
would be to pay them more, but Rappi needs to find a balance
between how much is too much and how much it can afford while
keeping the business sustainable.

Rappi can take advantage of my conclusions by trying to change the
conditions in cases where the chance of rejection is higher than
3%. After evaluating all the combinations established before (243), Rappi can
get a bigger picture and more context on how to proceed with the objective of
decreasing rejections by couriers.
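
As an illustration of that 3% rule, the sketch below applies it to the single-column to_user_distance results from question 2. The counts are copied from that table; applying the same filter to the 243 combinations would work the same way. The names are illustrative.

```python
REJECTION_THRESHOLD = 0.03  # the 3% limit mentioned above

# (taken, non-taken) counts copied from the to_user_distance table.
distance_counts = {
    "0 and 0.5": (13458, 603),
    "0.5 and 1": (26863, 1478),
    "1 and 1.5": (25589, 1926),
    "1.5 and 2": (21817, 2259),
    "2 and 2.5": (14485, 1662),
    "2.5 and 3": (7021, 928),
    "3 and 3.5": (4345, 561),
    "3.5 and 4": (1745, 236),
    "more than 4": (537, 36),
}


def rejection_rate(taken: int, non_taken: int) -> float:
    """Share of orders in a group that no courier took."""
    return non_taken / (taken + non_taken)


all_rates = {g: rejection_rate(t, n) for g, (t, n) in distance_counts.items()}
flagged = {g: r for g, r in all_rates.items() if r > REJECTION_THRESHOLD}
worst = max(flagged, key=flagged.get)

print(len(flagged), worst, f"{flagged[worst]:.1%}")
```

With a single column every distance group exceeds the limit, so the threshold becomes more informative once the groups are split into combinations.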
General conclusions
To finish this report, I would like to express the following recommendations and
experiences:

• My laptop almost crashed: the large amount of data inside the Excel
file couldn’t be handled with good performance. (I really need to buy
a new one.)
• To manage this kind of information, it is more convenient to use Power
BI; the issue is that I have an old MacBook Air.
• While working on this kind of analysis, you have to make a series of
assumptions; in real operations my approach might have
been different from this one.
• It is important to check the Excel file (orders.xlsx) to understand
the source of my arguments.
• The format of the values was a big issue while working with this
Excel file. Converting from a text format to a number format
consumes a lot of RAM.
• Sometimes Excel and Power BI are not efficient enough to deal with
this amount of information, which is why it is more advisable to use
Python.
• Another tool, like Python, could be more efficient when you have to
make multiple iterations.
