Analysis Report - Analytics Challenge
a. X (zones with low quantity of couriers) = Quantity of couriers / Quantity of orders (it should be 1 or near 1)
b. Y (zones with high quantity of couriers) = Quantity of couriers / Quantity of orders (it should be 1 or near 1)
c. Amount of time between the moment the user places an order and the
moment any courier picks it up, regardless of the zone.
Note: The reason why I didn’t unify metric “a” and metric “b” is that our
goal is to get an index of 1 in each one. An index of 1 means that we have
enough couriers for the amount of orders in that particular zone. Every city
has zones with a bigger volume of orders than others, so the main goal is to
keep a balance between demand (orders) and supply (couriers). My proposal
is to always compare zones with a high amount of couriers against zones
with a low amount of couriers. (Of course, real operations will demand
something more elaborate.)
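As a minimal sketch of metrics “a” and “b”, the couriers-to-orders index can be computed per zone as shown here. The zone names, the counts and the 0.1 tolerance around 1 are assumptions made for illustration; they do not come from the dataset.

```python
from collections import defaultdict

def courier_order_ratio(records):
    """Return couriers / orders per zone from (zone, couriers, orders) tuples."""
    totals = defaultdict(lambda: [0, 0])  # zone -> [couriers, orders]
    for zone, couriers, orders in records:
        totals[zone][0] += couriers
        totals[zone][1] += orders
    # Zones without orders are skipped to avoid dividing by zero.
    return {z: c / o for z, (c, o) in totals.items() if o > 0}

# Hypothetical peak-hour counts, invented for illustration only.
sample = [("zone_low", 12, 30), ("zone_high", 45, 40), ("zone_low", 8, 20)]
ratios = courier_order_ratio(sample)

TOLERANCE = 0.1  # assumed meaning of "near 1"; not defined in the report
for zone, ratio in sorted(ratios.items()):
    status = "balanced" if abs(ratio - 1) <= TOLERANCE else "unbalanced"
    print(f"{zone}: {ratio:.2f} ({status})")
```

An index below 1 points to a shortage of couriers for the demand in that zone, and an index above 1 points to a surplus.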
3. I would take the following approach for testing the new optimization model:
We need to choose only one city to test the performance. This city has to be
small; in a country like Colombia, it could be Montería or Soledad. Then we
need to determine which zones have a high quantity of couriers in peak hours
and which ones have a low quantity of couriers in the same peak hours. The
idea is to check the performance of the model for one full week, because
Monday to Friday behaves differently from Saturday and Sunday. Then we can
begin gathering information based on the metrics proposed before. I would
propose the following standard:
4. Common sense would tell you to compare the old model with the new one,
and that’s what I would do. To be able to deliver good conclusions, we need
to use the same metrics for both models, which means gathering enough
information using the same variables. Table 2 represents, in general
terms, how to compare both optimization models using the same metrics.
Deployment cost
Deployment time
Scalability
Faults
Integration with the couriers
Only after ensuring that those conditions are met can we proceed to
deploy the new optimization model to all the cities, city by city
rather than all at once, while tracking the performance and keeping a
contingency plan ready in case the optimization model fails.
1. First of all, we need to organize the information because of the current
format of the file. I divided the values into different columns and made some
changes in the format of each column. Then I selected all the information to
create a pivot table, which is an excellent tool for these cases.
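The same two steps (splitting the raw text into columns, then counting in a pivot table) can be sketched in Python with the standard library. This is only an illustration: the column names `day` and `taken` and the sample rows are hypothetical, and the real file (orders.xlsx) may be laid out differently.

```python
import csv
import io
from collections import Counter

# Hypothetical rows; column names and values are invented for illustration.
raw = """day,taken
Saturday,0
Saturday,1
Friday,0
Monday,1
Saturday,0
"""

def pivot_counts(rows, row_key, col_key):
    """Count rows per (row_key, col_key) pair, like a pivot table with a count aggregate."""
    return Counter((r[row_key], r[col_key]) for r in rows)

# csv.DictReader does the "text to columns" step, one dict per row.
rows = list(csv.DictReader(io.StringIO(raw)))
table = pivot_counts(rows, "day", "taken")
for (day, taken), n in sorted(table.items()):
    print(day, "taken" if taken == "1" else "non-taken", n)
```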
Plot 1. Taken vs non-taken orders (taken: 115860; 92%)
Plot 1 shows that 9.689 orders were not taken by any courier, which is
8% of the total orders (125.549). We can interpret it as a low value,
with the goal of continuing to decrease it as much as possible.
Plot 2. Day vs non-taken orders (x-axis: day of the week)
Plot 2 shows that, among the non-taken orders, Saturday is the day of the
week with the highest value (2.090, 21.5%), and Friday has the highest
value (1.630, 16.8%) among working days (Monday to Friday).
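The percentages quoted for Plot 1 and Plot 2 are simple shares, and they can be rechecked with a small helper. The counts below are the ones reported in this document; the helper name `share` is hypothetical. Recomputed values may differ from the text in the last digit depending on whether figures were rounded or truncated.

```python
def share(part, total):
    """Return part / total as a percentage rounded to one decimal."""
    return round(100 * part / total, 1)

total_orders = 125549   # all orders in the dataset
non_taken = 9689        # orders not taken by any courier

print(share(non_taken, total_orders))  # non-taken share of all orders
print(share(2090, non_taken))          # Saturday share of non-taken orders
print(share(1630, non_taken))          # Friday share of non-taken orders
```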
2. From my point of view, the most important variables to determine whether or
not an order is going to be taken by a courier are the following three:
to_user_distance, to_user_elevation and total_earning. Due to the large
number of values in each column, it was necessary to group them:
The limits of each group were chosen after analyzing all the information
and finding a pattern. The only special case was “to_user_elevation”. I had
to make the assumption that all the couriers were using bikes, and I found
on the Internet that when the difference in altitude from point A to point
B is less than -40 m (which means that the courier has to go uphill), the
physical effort is too big, so couriers would prefer not to take the order.
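A minimal sketch of the grouping step, using the group limits that appear in the result tables of this report. Which group a value exactly on a limit belongs to is not stated in the report, so the choice made here (a limit value goes to the lower group) is an assumption, as are the names `bucket` and the `*_EDGES` constants.

```python
import bisect

# Upper limits of each group; the last group of each variable is open-ended.
EARNING_EDGES = [2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000]  # COP, 9 groups
DISTANCE_EDGES = [0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4]                 # km, 9 groups
ELEVATION_EDGES = [-40, 0]                                         # m, 3 groups

def bucket(value, edges):
    """Return the index of the group that value falls into.

    Assumption: a value equal to a limit is assigned to the lower group.
    """
    return bisect.bisect_left(edges, value)

print(bucket(3500, EARNING_EDGES))   # group "3.000 to 4.000"
print(bucket(0.3, DISTANCE_EDGES))   # group "0 to 0.5"
print(bucket(-50, ELEVATION_EDGES))  # group "less than -40"
```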
The next step was to compare each of the three columns with the taken or
non-taken column, which gives us a simple estimate of whether the order is
going to be taken by a courier or not. I found the following results:
total_earning (COP) | taken | non-taken | taken % | non-taken %
0 to 2.000 | 3 | 1 | 75% | 25%
2.000 to 3.000 | 4 | 0 | 100% | 0%
3.000 to 4.000 | 20183 | 3017 | 87% | 13%
4.000 to 5.000 | 29752 | 2350 | 93% | 7%
5.000 to 6.000 | 29723 | 1849 | 94% | 6%
6.000 to 7.000 | 17056 | 1200 | 93% | 7%
7.000 to 8.000 | 9252 | 562 | 94% | 6%
8.000 to 9.000 | 5425 | 307 | 95% | 5%
more than 9.000 | 4462 | 403 | 92% | 8%
to_user_distance (km) | taken | non-taken | taken % | non-taken %
0 to 0.5 | 13458 | 603 | 96% | 4%
0.5 to 1 | 26863 | 1478 | 95% | 5%
1 to 1.5 | 25589 | 1926 | 93% | 7%
1.5 to 2 | 21817 | 2259 | 91% | 9%
2 to 2.5 | 14485 | 1662 | 90% | 10%
2.5 to 3 | 7021 | 928 | 88% | 12%
3 to 3.5 | 4345 | 561 | 89% | 11%
3.5 to 4 | 1745 | 236 | 88% | 12%
more than 4 | 537 | 36 | 94% | 6%
to_user_elevation (m) | taken | non-taken | taken % | non-taken %
more than 0 | 69774 | 5532 | 93% | 7%
-40 to 0 | 29868 | 2458 | 92% | 8%
less than -40 | 16218 | 1699 | 91% | 9%
Combinations=9 x 9 x 3=243
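The 243 combinations come from crossing the 9 earning groups, the 9 distance groups and the 3 elevation groups. A short sketch of how they could be enumerated and how a take rate per combination could be computed follows; the sample orders and the function name `take_rate_by_combination` are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from itertools import product

N_EARNING, N_DISTANCE, N_ELEVATION = 9, 9, 3

# Every (earning group, distance group, elevation group) triple.
combinations = list(product(range(N_EARNING), range(N_DISTANCE), range(N_ELEVATION)))
print(len(combinations))  # 9 x 9 x 3

def take_rate_by_combination(orders):
    """orders: iterable of (earning_group, distance_group, elevation_group, taken)."""
    counts = defaultdict(lambda: [0, 0])  # combination -> [taken, total]
    for e, d, h, taken in orders:
        counts[(e, d, h)][0] += int(taken)
        counts[(e, d, h)][1] += 1
    return {combo: t / n for combo, (t, n) in counts.items()}

# Hypothetical grouped orders, invented for illustration only.
sample = [(4, 0, 2, True), (4, 0, 2, True), (4, 0, 2, False), (2, 5, 0, False)]
rates = take_rate_by_combination(sample)

# Combinations with no orders at all, which the analysis also reports on.
empty = [c for c in combinations if c not in rates]
```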
3. First of all, I would like to stress the importance of gathering and
delivering this kind of information. At least 90% of the decisions taken
inside a company should be based on data obtained from daily operations.
That’s why the Analytics and Operations departments should be considered
among the most crucial in any organization.
Based on the analysis of the dataset provided, I would present the
following conclusions to Rappi’s Business Department to increase the number
of orders taken by the couriers:
- There is a 13% chance that the courier will decline the order if the
payment for the delivery is between $3.000 and $4.000 COP.
- There are only 8 orders with a payment lower than $3.000 COP.
- There is a 100% chance that the courier will take the order if the
payment for the delivery is between $2.000 and $3.000 COP and the
distance between the user and the restaurant is less than 2.45 km,
although only 4 orders fall in this payment range.
- The highest chance of an order being taken (95%) occurs when the
payment is between $8.000 and $9.000 COP; the courier could take
longer trips with this payment.
- There is a 12% chance that the courier will decline the order if the
distance for the delivery is between 2.5 and 3 km.
- The courier will take almost all of the orders if the distance for the
delivery is between 0 and 0.5 km, unless the payment is very low or
the difference in altitude is very high.
- The courier is willing to take trips of more than 4 km only if the
payment is higher than $4.000 COP and the difference in altitude is
higher than 0. When the difference in altitude is less than 40 m, the
payment usually should be higher than $7.000 COP.
- There is a 93% chance that the courier will take the order if the
difference in altitude is higher than 0, which means that the courier
has to go downhill.
- There is only a 91% chance that the courier will take the order if the
difference in altitude is lower than -40 m; Rappi needs to balance
that with shorter distances and higher payments.
- There are no orders with the following conditions: earnings between 0
and $2.000 COP, distance between 0 and 0.5 km, and difference in
altitude higher than 0.
- There is a 97% chance that the courier will take the order under the
following conditions: earnings between $5.000 and $6.000 COP,
distance between 0 and 0.5 km, and difference in altitude higher
than 0.
- There are 6 very uncommon orders: the payment was higher than
$9.000, the distance was between 0 and 0.5 km, the difference in
altitude was higher than 0, and the courier still declined them. The
same happened with 7 orders with payments between $8.000 and $9.000
and 17 orders with payments between $7.000 and $8.000. It could be
due to the hour or the zone of the order.
- My laptop almost crashed; it couldn’t handle the large amount of data
inside the Excel file with good performance. (I really need to buy a
new one.)
- It is more convenient to manage this kind of information with Power
BI; the issue is that I have an old MacBook Air.
- While working on this kind of analysis, you have to make a series of
assumptions; in real operations my approach might have been
different from this one.
- It is important to check the Excel file (orders.xlsx) to understand
the source of my arguments.
- The format of the values was a big issue while working with this
Excel file. Converting from a text format to a number format
consumes a lot of RAM.
- Sometimes Excel and Power BI are not efficient enough to deal with
this amount of information; that’s why it is more advisable to use
Python.
- Another tool, like Python, could be more efficient when you have to
make multiple iterations.
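As a rough illustration of why Python can help with the memory and text-to-number issues mentioned above, the sketch below reads rows one at a time and converts each value as it goes, so the whole file never has to sit in memory. It assumes the sheet has been exported to CSV with a `to_user_distance` column (named in this report) and a hypothetical `taken` column holding 0 or 1; the sample values are invented.

```python
import csv
import io

def stream_take_rate(lines, threshold_km):
    """Read rows one at a time and return the take rate for orders beyond threshold_km."""
    taken = total = 0
    for row in csv.DictReader(lines):
        distance = float(row["to_user_distance"])  # text -> number, one row at a time
        if distance > threshold_km:
            total += 1
            taken += int(row["taken"])
    # None signals that no order matched the filter.
    return taken / total if total else None

# Hypothetical CSV export; values are invented for illustration only.
raw = io.StringIO("to_user_distance,taken\n0.4,1\n2.7,0\n3.1,1\n2.9,1\n")
print(stream_take_rate(raw, 2.5))
```

With a real file, `lines` would be an open file handle, for example `open("orders.csv", newline="")`, where the file name is again an assumption.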