
Fuel efficiency and safety in Coca-Cola FEMSA last-mile logistics
by
Arturo Torres Arpi Acero
Industrial and Systems Engineer, ITESM CSF
and
Fernando González Gil
Industrial and Systems Engineer, ITESM CSF
SUBMITTED TO THE PROGRAM IN SUPPLY CHAIN MANAGEMENT

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF APPLIED SCIENCE IN SUPPLY CHAIN MANAGEMENT
AT THE
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2021
© 2021 Arturo Torres Arpi Acero and Fernando González Gil. All rights reserved.
The authors hereby grant to MIT permission to reproduce and to distribute publicly paper and electronic
copies of this capstone document in whole or in part in any medium now known or hereafter created.
Signature of Author: ____________________________________________________________________

Department of Supply Chain Management
May 14, 2021
Signature of Author: ____________________________________________________________________

Department of Supply Chain Management
May 14, 2021
Certified by: __________________________________________________________________________
Dr. María Jesús Saenz Gil de Gómez
Executive Director, Supply Chain Management Blended Program
Capstone Advisor
Accepted by: __________________________________________________________________________
Prof. Yossi Sheffi
Director, Center for Transportation and Logistics
Elisha Gray II Professor of Engineering Systems
Professor, Civil and Environmental Engineering
Fuel efficiency and safety in Coca-Cola FEMSA last-mile logistics
by
Arturo Torres Arpi Acero
and
Fernando González Gil
Submitted to the Program in Supply Chain Management

on May 14, 2021 in Partial Fulfillment of the
Requirements for the Degree of Master of Applied Science in Supply Chain Management
ABSTRACT
Across industries and supply chains, the safety of drivers and efficient use of fuel by truck fleets are an
increasing concern. This project focused on understanding driving styles, understanding the tradeoffs
between safe and efficient driving styles, and finding the highest levels of safety and fuel efficiency. We
worked with Coca-Cola FEMSA to analyze one year of telematics data from over 3,000 vehicles. To analyze
the data, we employed a methodology that involved multiple machine learning and analytical techniques,
including multiple regressions, a random forest classification algorithm, Bayesian Gaussian Mixture Model
for clustering, what-if simulations, and the use of interactive data visualization tools. These techniques
were used first to understand the main fuel efficiency drivers, then to understand the drivers of safety,
and finally to understand the trade-offs between fuel efficiency and safety with respect to different driving
styles. Our results show that significant gains can be achieved in terms of fuel efficiency by changing
driving behaviors. Results from the regression and simulator show that average speed, acceleration events
and maximum RPM are the three most important variables for fuel efficiency. With small changes, such as
increasing speed by 1 km/h, reducing acceleration events by 5%, and reducing maximum RPM by 5%, fuel
efficiency can be increased by 6%. We also demonstrate the main factors defining safety and their relative
importance. Finally, we cluster driving styles and suggest good practices to replicate the best driving styles
between different driving style clusters. Through a change management framework, we propose how
some drivers could improve Coca-Cola FEMSA’s safety proxy by 34% without sacrificing fuel efficiency.
Capstone Advisor: Dr. María Jesús Saenz Gil de Gómez

Title: Executive Director, Supply Chain Management Blended Program
ACKNOWLEDGMENTS
We would like to thank our advisor, Dr. Maria Jesus Saenz, for providing guidance and support throughout
this project. Next, we want to thank Pamela Siska for reviewing our reports and providing detailed
feedback on areas for improvement. Lastly, we both would like to thank our families and partners, for
always being supportive throughout our master’s program.
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
1 INTRODUCTION
2 LITERATURE REVIEW
2.1 Safety
2.2 Fuel Efficiency and Costs
2.3 Fuel Efficiency and Sustainability
2.4 Conclusions
3 METHODOLOGY
3.1 Business understanding
3.2 Data understanding
3.3 Data preparation
3.4 Modeling
3.4.1 Fuel Efficiency
3.4.2 Safety
3.4.3 Cluster Analysis
3.4.4 Individual Cluster Analysis
3.5 Conclusions
4 RESULTS
4.1 Fuel Efficiency
4.1.1 Regression Model
4.1.2 Fuel Efficiency Scenario Analysis
4.1.3 Anomaly Detection
4.2 Safety
4.3 Fuel Efficiency and Safety
5 DISCUSSION
5.1 Fuel Efficiency
5.2 Safety
6 INSIGHTS AND MANAGEMENT RECOMMENDATIONS
6.1 Fuel Efficiency
7 FUTURE RESEARCH
7.1 Fuel Efficiency
7.2 Safety and Fuel Efficiency
8 CONCLUSION
REFERENCES
LIST OF FIGURES
Figure 1: Characteristic turn maneuver and lane change patterns
Figure 2: Forces Influencing Driver Safety (Douglas and Swartz, 2016)
Figure 3: Factors Influencing Fuel Efficiency
Figure 4: Factors Influencing Sustainable Supply Chains
Figure 5: Driving Styles Independent Variables and Dependent Variables
Figure 6: Methodology
Figure 7: Cross Industry Standard Process for Data Mining CRISP-DM (Shearer, 2000)
Figure 8: Histogram of Fuel Efficiency
Figure 9: Pareto Chart of the Standardized Effects
Figure 10: Residuals for Linear Regression Model
Figure 11: Prediction Error for Linear Regression
Figure 12: Telematics Parameters Correlation Matrix
Figure 13: Scenario Analysis
Figure 14: Anomaly Detection Example
Figure 15: Distribution of Safety Score Classes
Figure 16: ROC Analysis for Safety Classification for Random Forest
Figure 17: Relative Feature Importance
Figure 18: Driving Style Clusters’ Mean Safety Score and Fuel Efficiency
Figure 19: Condensed View of Driving Styles
LIST OF TABLES
Table 1: Classification Model Comparison
Table 2: Features used for Cluster Analysis
Table 3: Fuel Efficiency Linear Regression Results
Table 4: Change Motivator Nudges for Driving Styles
1 INTRODUCTION
Across industries and supply chains, the safety of drivers and the efficient use of fuel by truck fleets are an
increasing concern. This project focused on the research of driving styles that promote the highest levels
of safety and fuel efficiency, as well as the tradeoffs that can happen between the two. The hypothesis
of the project was that statistically significant different driving styles could be uncovered that showed the
most fuel efficient and safest driving styles and that we would also find inherent trade-offs between safety
and fuel efficiency.
This introduction discusses five main points: the impact of fuel efficiency in terms of cost and carbon
dioxide emissions, how driving style is related to fuel efficiency and safety, how telematics data can be
used to track driving styles, how analytics can be used to analyze this problem, and the data that we use
for this project.
Across industries and supply chains, fuel consumption presents a two-fold problem, as it is an issue that
directly affects both companies’ profits and the environment. Proof of this comes from a study carried
out by Chainalytics (2020) in which they estimated that, on average, transportation costs make up 50
to 60% of all supply chain operating costs. A similar study, done in Mexico, found that fuel contributes
approximately 38.5% of the total direct costs of road transportation in Mexico (Moreno, 2014). In addition,
transportation was responsible in 2010 for approximately 23% of worldwide energy-related CO2 emissions
(IPCC, 2014). Therefore, finding ways to reduce fuel consumption across industries is a win-win
solution that both reduces climate impact and helps companies' bottom lines.
One of the main factors affecting fuel efficiency is driving style. The difference in fuel consumption
between the most and least efficient drivers can be as high as 35%, according to a report by the American
Trucking Association’s Technology and Maintenance Council (Hooper & Murray, 2018). Driving style is also
crucial to ensure the safety of the drivers and their communities; the German Federal Statistical Office
(2010) presented in their accidents report that 69% of the accidents in Germany happen because of
drivers’ mistakes. Also, according to the Dutch Eco-Drive initiative (www.ecodrive.org, 2001), a safer
driving style is more efficient and reduces pollution. Fleet driving companies are dealing with a big gap in
driving styles, hurting them in safety, environmental impact, and cost.
There are many ways to monitor driving styles. One of them is vehicle telematics: each vehicle carries
multiple sensors that gather data on various metrics and events, such as CO2 emissions, fuel consumption,
RPM, tire pressure, and speed. The data provided by vehicle telematics is extremely valuable. In its report
on the automotive revolution and its perspective towards 2030, McKinsey (Gao, Kaas, Wee, 2018) claims
that recurring revenue from data-driven and on-demand mobility services could increase by $1.5 trillion
in 2030. This is 30% of the overall automotive revenue pools.
Important developments in telematics are coming in the next few years. However, there is still a huge
challenge with vehicle telematics: each vehicle can generate extensive data streams, which cannot be
analyzed with traditional methods or spreadsheets.
Telematics providers already offer standard safety and cost-related web reports, but these
reports tend to be descriptive in nature and only provide data from past events without being able to
advise on the best actions forward. To provide predictive and prescriptive analytics, a more advanced
approach is needed. This approach is now possible thanks to the increase in computer processing power
and the decrease in database storage costs. Our approach leverages this by integrating various
machine learning and analytics techniques into a multi-methodological approach to answer our research
question.
The data analyzed came from Coca-Cola FEMSA based out of Mexico. This company operates a fleet of
around 3,000 telematics-enabled delivery trucks in Mexico alone. These trucks are focused on the last-
mile delivery from regional distribution centers to both large grocery stores and nano stores all around
the country. The data from the telematics sensors is sent continuously to the cloud to give real-time
insights to the company and can also be exported to analyze historical datasets. The company provided
us with an entire year's worth of data from those 3,000 trucks, which accounts for around 600,000 driving
days.
In summary, our project focused on finding the safest and most fuel-efficient driving styles and the
inherent tradeoffs between safety and fuel efficiency. We are using data from the truck fleet of Coca-Cola
FEMSA Mexico. To answer our research question, we will be using a multimethodological approach that
includes various machine learning and analytics techniques. We hope that this research leads to data-
driven improvements in efficiency and safety for this company and the last-mile transportation industry.
2 LITERATURE REVIEW
Numerous articles have been written on transportation safety, fuel efficiency, sustainable supply chains,
driving styles, and telematics analytics. The objective of this project is to find out which driving styles are
the most fuel-efficient and safest and to analyze the inherent tradeoffs between safety and efficiency.
Different approaches for understanding driving behaviors have been researched over time, ranging
from manual surveys to advanced analytics on telematics data. Chen et al. (2017) defined three main
modeling scales used to estimate and understand fuel consumption. First, the microscopic
approach takes near-real-time values and builds a model from the second-to-second decisions of the
driver. Second, the macroscopic approach looks at cumulative data from long periods and uses one trip
or day as the unit of measure. Third, the mesoscopic approach combines
microscopic and macroscopic data to model fuel consumption. Our approach focuses on the
macroscopic level by analyzing the summary of daily metrics per truck.
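As a minimal sketch of what moving from the microscopic to the macroscopic scale can look like, the snippet below collapses second-by-second samples into one summary row per truck and day. The record layout, field names, and values are hypothetical and do not describe the actual Coca-Cola FEMSA telematics export.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical second-by-second records: (truck_id, date, speed_kmh, rpm, fuel_l)
records = [
    ("T1", "2020-01-01", 30.0, 1500, 0.002),
    ("T1", "2020-01-01", 50.0, 2100, 0.004),
    ("T1", "2020-01-02", 40.0, 1800, 0.003),
    ("T2", "2020-01-01", 20.0, 1200, 0.001),
]

def daily_summary(rows):
    """Collapse microscopic samples into one macroscopic row per truck-day."""
    groups = defaultdict(list)
    for truck, day, speed, rpm, fuel in rows:
        groups[(truck, day)].append((speed, rpm, fuel))
    summary = {}
    for key, samples in groups.items():
        speeds, rpms, fuels = zip(*samples)
        summary[key] = {
            "avg_speed_kmh": mean(speeds),   # average over the day
            "max_rpm": max(rpms),            # peak value over the day
            "fuel_l": sum(fuels),            # cumulative fuel burned
            "n_samples": len(samples),
        }
    return summary

summary = daily_summary(records)
print(summary[("T1", "2020-01-01")])
```

Each key of `summary` corresponds to one driving day, which is the unit of analysis at the macroscopic level.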
With the advent of big data and advances in machine learning practices, a new branch of analytics was
created that is commonly referred to as telematics analytics. Several approaches can be taken to analyze
the massive amounts of data coming from the telematic devices. Carlos et al. (2020) analyzed vehicle
telematics around aggressive behavior and the relation to road accidents worldwide. Their approach used
first and second order representations to model accelerometer data for classifying driving behavior.
Following this approach, we are looking into incorporating accelerometer data, among other variables,
into our model.
Various projects analyzed the data that can be collected from smartphone sensors. For example, Kang and
Banerjee (2017) showed how modern smartphones can be widely used to collect data on accelerations,
braking, turns, and lane changes. This can serve as
encouragement for smaller firms that lack access to advanced telematics devices but want to tap into the
advantages of analyzing their drivers’ styles to improve their safety, sustainability, and fuel efficiency.
By leveraging the framework and insights provided in this project, firms can even use readily available
data, as shown in the paper by Kang and Banerjee. In this project we not only extracted telematics data
but also analyzed it further to propose data-driven best practices, taking into consideration the different
impacts of a driving style.
2.1 Safety
Fleet management’s top priority is generally safety. Accidents are very hard to model or predict
given their chaotic nature and the high number of external factors that can influence
them. Driver behavior is certainly one of the biggest factors. Johnson et al. (2009) found that “as many
as 56% of deadly crashes involve one or more unsafe driving behaviors typically associated with aggressive
driving”. An aggressive driving style is a behavioral pattern or classification of a driver which is associated
with risky speeding profiles (irregular, instantaneous and abrupt changes in vehicle speed). Toledo et al.
(2008) also found that even after controlling for the larger distances they drive, company car drivers are
50% more likely to be involved in car crashes compared to other drivers.
Some studies focus on understanding driving styles from data in multiple applications, such as
autonomous driving, insurance applications, and driver distraction detection. Meiring et al. (2015)
reviewed the ongoing research on driving style analysis systems and their applications and synthesized
the latest research in their article “A Review of Intelligent Driving Style Analysis Systems and Related
Artificial Intelligence Algorithms”. According to them, one of the most traditional ways to rate driving
styles is through surveys. For example, Richard Rowe et al. repeatedly tested around 12,000 drivers with
the Driver Behavior Questionnaire (DBQ) six months after they passed their driver’s license
test and confirmed the integrity and validity of the DBQ as a driver behavior measure in traffic accident
prediction. However, this approach requires manual input from each driver, relies on the driver’s integrity,
and may vary with time. Similar approaches were proposed by Houston (2003) with the Aggressive Driving
Behavior Scale (ADBS) and by Harris (2014) with the Prosocial and Aggressive Driving Inventory (PADI).
Houston and Harris have shown that there is a statistical correlation between driver behavior and crash
involvement. Their studies focused mainly on individual variability associated with numerous parameters
such as age, gender, and geographic location.
Another approach to measuring driving safety is to analyze in-vehicle data. Toledo et al. (2008) used an
IVDR (In-Vehicle Data Recorder) to measure different factors related to safety. For example, the
acceleration and direction of the vehicle, in both the lateral and longitudinal directions, are measured by
accelerometers at a sampling rate of 40 measurements per second. The vehicle speed is derived from the
GPS receiver data or from the vehicle speed sensor (VSS). Pattern recognition algorithms are then applied
to the raw measurements to detect the maneuvers that the vehicle performs. Figure 1 shows
how a turn maneuver (left) and a lane change (right) can be differentiated by
detecting the changes in longitudinal and lateral acceleration.
Figure 1: Characteristic turn maneuver and lane change patterns
Note: In their research, Toledo et al. (2008) use this information to calculate risk indices that indicate
the overall trip safety. Drivers receive feedback through various summary reports, real-time text messages,
or an in-vehicle display unit. Reductions in crash rates and the risk indices are observed in the short term.
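To illustrate how a turn and a lane change might be told apart from lateral acceleration alone, the toy heuristic below counts sign flips in the signal. This is not the pattern recognition algorithm of Toledo et al. (2008); the sign-flip rule, the noise threshold, and the sample traces are all assumptions made for illustration.

```python
def classify_maneuver(lateral_acc, noise=0.5):
    """Label a window of lateral acceleration samples (m/s^2).

    Assumed heuristic: a turn keeps lateral acceleration on one side,
    while a lane change swings to one side and then to the other.
    """
    # Keep only the sign of samples that clear the noise band.
    signs = [1 if a > 0 else -1 for a in lateral_acc if abs(a) > noise]
    if not signs:
        return "straight"
    # Count how often consecutive significant samples change side.
    flips = sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)
    return "turn" if flips == 0 else "lane_change"

# Hypothetical traces, one value per sample
turn = [0.1, 1.2, 2.0, 2.1, 1.4, 0.2]
lane_change = [0.1, 1.1, 1.5, 0.3, -1.2, -1.6, -0.2]

print(classify_maneuver(turn))
print(classify_maneuver(lane_change))
```

A production system would also use longitudinal acceleration, speed, and maneuver duration, which this sketch ignores.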
Another more recent study by Amarasinghe et al. (2015) proposed a cloud-based driver monitoring and
vehicle diagnostic app with OBD2 telematics. They designed an architecture in which an OBD2 (On-Board
Diagnostics) sensor reads the data generated in real time by the vehicle computer and monitors different
parameters, such as speed, acceleration, and cooler temperature. These inputs are then processed and
analyzed automatically by algorithms that detect reckless driving from high lateral and longitudinal
acceleration changes, demonstrating that an application could be developed to give feedback. The
study shows different metrics and ways to measure them but does not go into detail on the
interactions between different parameters or their relations with driving styles.
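One simple way to flag high acceleration changes is to threshold the rate of change of acceleration (jerk) between consecutive samples. The sketch below assumes a fixed sampling interval and an arbitrary limit; neither value comes from Amarasinghe et al. (2015), and the function name is hypothetical.

```python
def harsh_events(acc, dt=1.0, jerk_limit=3.0):
    """Return sample indices where acceleration changes faster than jerk_limit.

    acc is a list of (longitudinal, lateral) accelerations in m/s^2,
    sampled every dt seconds. The threshold is an illustrative assumption.
    """
    events = []
    for i in range(1, len(acc)):
        d_long = (acc[i][0] - acc[i - 1][0]) / dt
        d_lat = (acc[i][1] - acc[i - 1][1]) / dt
        if abs(d_long) > jerk_limit or abs(d_lat) > jerk_limit:
            events.append(i)
    return events

# Hypothetical trace: hard braking at index 2, sharp swerve at index 4
trace = [(0.0, 0.0), (0.5, 0.1), (-4.0, 0.2), (-3.5, 0.1), (-3.0, 4.0)]
print(harsh_events(trace))
```

Counting such events per driving day yields the kind of aggregate feature that can later be related to safety or fuel efficiency.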
Lack of attention is another important factor defining driver safety. The “100-Car Naturalistic Study” was
a study designed by the National Highway Traffic Safety Administration (NHTSA) in collaboration with the Virginia
Tech Transportation Institute (VTTI) to provide insight on the influence and contribution of driver behavior
immediately preceding an accident. This study was performed on 100 vehicles fitted out with surveillance
and other sensor devices for a duration of a year, driven collectively for nearly 2 million miles, and
accumulated 42,000 hours of data from the 241 drivers. It revealed that 78% of the 82 accidents recorded,
and 65% of the 761 near accidents, were the direct effect of driver inattention.
Douglas and Swartz (2016) proposed three main factors that affect driver safety: external forces,
organizational forces, and regulatory forces. Regulatory forces include all the regulations and policies
coming from governmental agencies. Organizational forces are the ones set by a company, such as
dispatching policies, safety priorities, and the climate of risk acceptance. External forces, such as road and
weather conditions, also affect driver safety in a variety of ways.
Figure 2: Forces Influencing Driver Safety (Douglas and Swartz, 2016)
Organizational forces can be adjusted to improve driver safety, as demonstrated by Rodriguez, Targar
and Belzer (2006). They showed that driver safety, as measured by crash incidence, can be improved in
two ways. The first is increasing employee retention, as more experienced drivers have fewer
accidents. The second is increasing pay, as their data showed that better-paid drivers
also had fewer accidents.
2.2 Fuel Efficiency and Costs
The optimization of fuel use while driving is also affected by various factors including the inherent
efficiency of the truck, the optimization of the route and the maintenance of the truck. One way
companies can improve their fuel efficiency is by incentivizing drivers to reduce their fuel use.
Adamidis, Mantouka and Vlahogianni (2020) also showed that adopting smooth driving can have a
statistically significant impact on fuel efficiency and emissions. Some of the behaviors that they observed
related to the acceleration and the braking speeds of the vehicles. Figure 3 shows some of the various
factors that influence fuel efficiency.
Figure 3: Factors Influencing Fuel Efficiency
In this project, the main variables to be analyzed are the Vehicle Make and Model and the Driving Styles,
as there is no readily available data on fuel quality, and routing optimization is out of the scope of this
project. The expected fuel efficiencies for each Vehicle Make and Model can be found on the specific
manufacturer’s website. To ensure the applicability for Coca-Cola FEMSA and avoid any outside effect,
the project team ran a regression model to understand the impact on fuel efficiency of the mentioned
parameters.
The term eco-driving was defined by the UK government as the adoption of a driving behavior that
maximizes the efficiency of the vehicle’s engine. Xu et al. (2014) revealed that eco-driving can reduce fuel
consumption by an amount ranging from 15% to 25% and GHG emissions by at least 30%. A recent study
by Panagiotis Fafoutellis (2014) provides an in-depth overview of existing research regarding eco-driving,
in which he concludes that ICT (Information and Communications Technology) systems to generate and
store data are crucial for the quantification and understanding of the effects that different driving styles
can have. Driving style is also identified in existing research as one of five components of fuel
consumption, with road geometry, vehicle specifications, traffic, and weather conditions being among the
most influential (Gilman et al., 2015). Another conclusion from their study is that a big data approach is
needed to jointly consider data from different sources of information. As stated by Fafoutellis (2014),
linear models can be considered more useful in assessing the influence and importance of each factor in
fuel consumption rather than predicting it, while machine learning and deep learning algorithms, such as
AdaBoost and neural networks, perform better at predicting fuel efficiency but do not offer much insight
since they are not explainable models.
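The explanatory value of a linear model can be sketched with a small ordinary least squares fit whose coefficients are then standardized so that factors measured in different units become comparable. The data below is synthetic and built from a known linear rule, and the variable names and units (km/l, events per 100 km) are assumptions; it is not the regression reported later in this capstone.

```python
from statistics import pstdev

def solve(a, b):
    """Solve a small linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(X, y):
    """Ordinary least squares with intercept via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical daily rows: (average speed km/h, acceleration events per 100 km)
X = [(20, 30), (25, 22), (30, 35), (35, 18), (40, 26), (45, 12)]
# Synthetic fuel efficiency (km/l) from a known linear rule, no noise.
y = [2.0 + 0.05 * s - 0.02 * a for s, a in X]

intercept, b_speed, b_accel = ols(X, y)

# Standardized effects: coefficient scaled by sd(x) / sd(y).
sd_y = pstdev(y)
std_speed = b_speed * pstdev([r[0] for r in X]) / sd_y
std_accel = b_accel * pstdev([r[1] for r in X]) / sd_y
print(round(b_speed, 4), round(b_accel, 4))
```

The sign and relative size of `std_speed` and `std_accel` are what make such a model readable: each one states the direction and strength of a factor's association with fuel efficiency.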
Ping et al. (2014) explained that modeling driving behavior under inherently dynamic driving conditions is
complex. They also showed how making a quantitative analysis of the relationship between the driving
behavior and the fuel consumption is difficult. Nevertheless, in their study, they applied machine learning
algorithms to smartphone data to implement driving style identification. Several studies have used
smartphone data to mimic vehicle telematics data. They showed that speed and acceleration are
discretized by the smartphone which increases the error margin, and that smartphone data does not
provide the same number of parameters that can be extracted directly from the vehicle’s computer. In
their study, they developed a deep learning framework (LSTM) and used K-Means clustering to separate
drivers into different profiles and then estimate fuel consumption. Although some of the parameters were
discretized and they did not have all possible parameters, their model achieves an accuracy greater than
80%.
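The clustering step of such a pipeline can be sketched with plain K-Means (Lloyd's algorithm). The snippet covers only the clustering, not the LSTM framework, and the driver features, initial centroids, and values are hypothetical.

```python
def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm; returns final centroids and a label per point."""
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave empty clusters in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels

# Hypothetical driver features: (mean speed km/h, harsh accelerations per hour)
drivers = [(30, 2), (32, 3), (31, 2), (55, 9), (57, 10), (54, 8)]
centroids, labels = kmeans(drivers, centroids=[drivers[0], drivers[3]])
print(labels)
```

In practice the features would be standardized before clustering so that no single unit dominates the distance, and the number of clusters would be chosen from the data rather than fixed in advance.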
To interpret the data from telematics, Yao et al. (2020) developed various machine learning models
focusing on driving behavior (speed, acceleration, constant speed duration and braking). The algorithms
used were Neural Networks, Random Forest and Support Vector Regression. The models achieved
RMSE values of 0.87, 0.89 and 0.78 respectively, which correspond to a MAPE of less than 10%.
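For reference, the two error measures mentioned here can be computed as follows. The sample values are made up and unrelated to the results of Yao et al. (2020).

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error, in the units of the target variable."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical observed and predicted fuel consumption values
actual = [10.0, 12.0, 8.0, 11.0]
predicted = [9.0, 12.5, 8.5, 12.0]

print(rmse(actual, predicted), mape(actual, predicted))
```

RMSE depends on the scale of the target, while MAPE is relative, which is why the two are often reported together.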
Vittorio Astarita et al. (2013) developed an app called EcoSmart, which replicates apps that describe
driving behavior for a limited set of parameters without the need to connect to OBDII or any vehicle
data. EcoSmart generates fuel consumption simulations based on smartphone GPS and a set of tuned
parameters, with results that differ by under 5% from data reported by vehicle telematics. While this
approach is practical and easier than the previously mentioned OBDII-connected app, it is limited, since
the simulations could produce less accurate results for shorter and more chaotic routes, e.g., with variable
traffic, slopes, stops and load, which are this study’s focus.
Common approaches to improve driving styles mentioned in research include targeted pricing policies. For
example, Fafoutellis et al. (2020) suggested new regulations for alternative fuel vehicles and a systematic
upgrade of the transport infrastructure towards a more connected and cooperative city environment. On
the other hand, Scania, C. (2014) mentioned that “by gaining knowledge of the impact of their actions on
fuel consumption, drivers are more likely to adopt more environmentally friendly practices”. Another
impactful approach is designing a proper driving reward system. Lai (2015) showed how through a proper
reward system, a 10% improvement in fuel consumption efficiency was achieved.
2.3 Fuel Efficiency and Sustainability

With the increased attention around climate change and corporate responsibility, a growing number of
companies are looking into becoming more sustainable for the environment. Companies, such as Amazon,
have signed Climate Pledges to build sustainable businesses, which they translate as becoming net zero
carbon. The idea behind net zero carbon is to eliminate or offset all CO2 emissions that are produced at
any point of their supply chains. Our project helps companies in identifying fuel efficient driving styles that
can lessen the amount of CO2 produced.
Fuel consumption and CO2 emissions go hand in hand, and several factors influence how efficient fuel
consumption can be. Demir et al. (2011) considered four different factors: the vehicle, the environment,
the traffic, and the operations. Demir et al. (2014) published another article that narrowed down the
factors that most influenced the amount of emissions to total mass, speed, and road gradient.
Another approach that was proposed by Jaller et al. (2015) was to move the transportation of goods and
materials to hours of the day in which there is less congestion. Their study showed that freight deliveries
that were done in hours with less traffic led to reduced fuel consumption. This last study supports the
observations from previously mentioned studies in which constant changes in speed lead to higher
consumption of fuel.
Optimizing truck allocation by truck type is another approach that has been taken to reduce fuel
consumption. Velázquez et al. (2016) standardized this approach into a methodology that uses K-means
clustering and Tukey’s method to cluster trucks into certain types. These types can then be optimally
assigned to environments in which they would perform at their best, thereby reducing the overall
emissions produced by the fleet.
Many of the factors that impact fuel efficiency positively are the same ones that impact the environment
positively. As the main offender to sustainability in transportation is the burning of fossil fuels, we see no
inherent tradeoff between these two topics.
Figure 4: Factors Influencing Sustainable Supply Chains
Note: The factors considered for this project are the analysis of fuel efficiency in hours that have low
traffic, the changes in environment such as constant changes in altitude, the information around the
vehicles and the general constraints of the operation.
2.4 Conclusions
From data gathering to problem modelling, various challenges have been found for understanding driving
styles through data and proposing strategies for improving safety and fuel efficiency. Many articles have
been created given the timely importance of safety, cost, and sustainability, as well as the increasing
technological development and general interest for data science and machine learning. Specific solutions
for specific problems related to driving behavior have been designed: some use traditional methods like DBQ,
while others are more complex, like the diagnostic app with OBD2 connected to vehicle telematics.
To provide insights into how all these different factors relate to and affect each other, this project analyzes
the interactions around vehicles and their characteristics, the environment in which the drivers conduct
their day-to-day business, organizational forces such as company policies and regulations, driving styles such
as constant changes in velocity, vehicle cargo as measured by the gross weight of the cargo, and traffic. All
these different variables are analyzed in the context of the two main dimensions of this project which are
safety and fuel efficiency. The articles that have been cited center on specific aspects of either safety or
fuel efficiency. Our project goes further by also analyzing the trade-offs between the different
variables and by identifying situations in which there are clear win-win scenarios.
Figure 5: Driving Styles Independent Variables and Dependent Variables
Note: The variables in squares are a fraction of the independent variables that are used to describe the
driving styles, the variables in the center are the dependent variables that are the focus of the project.
3 METHODOLOGY
To understand which driving styles can help fleet owners improve safety and fuel efficiency, we followed
an approach that involved multiple machine learning and analytical techniques. Figure 6 shows our overall
approach. The first step consisted of Business Understanding, Data Understanding and Data Preparation,
all of which are explained in detail in sections 3.1, 3.2 and 3.3. The second part of our methodology involved
doing regression analysis to understand how our different independent variables impact fuel efficiency
(section 3.4.1) and a classification model to see how variables affect safety (section 3.4.2). The third part
involved factor analysis to reduce the dimensionality of our dataset to create clusters that represented
different driving styles (section 3.4.3). The fourth part involved analyzing each of the clusters to
understand what differentiates each driving style and how this impacts safety and fuel efficiency (section
3.4.4). The fifth and final part was drafting the conclusions to propose best practices to increase safety
and fuel efficiency (3.4.5).
Figure 6: Methodology
Note: numbers on the diagram represent the section in this document where more information can be
found.
Throughout the project, we followed the Cross Industry Standard Process (CRISP) for Data Mining
framework (Shearer, 2000). This is a general framework that emphasizes the
iterative nature of data mining problems. Figure 7 shows the general framework; its application to this
project is detailed in the following sections. The Evaluation steps are discussed in parts throughout this
section and in the Results section. The Deployment was not part of the scope of this project, but several
recommendations for deployment are given in section 6, Insights and Management Recommendations.
Figure 7: Cross Industry Standard Process for Data Mining CRISP-DM (Shearer, 2000)
3.1 Business understanding

Our initial research helped us understand Coca-Cola FEMSA’s business needs. This research consisted
of reading academic documents, industry reviews, and annual reports shared by organizations such as the
Intergovernmental Panel on Climate Change (IPCC) and the American Transportation Research Institute
(ATRI) for a high-level overview. This research showed that drivers’ decisions and driving styles are among
the main factors defining the safety and fuel efficiency of any company’s fleet.
Coca-Cola FEMSA also shared documentation regarding their telematics approach and objectives, to focus
the project and validate the company’s approach to using telematics data by comparing it with the
reviewed literature. Based on the initial research, the company’s main priorities are safety and fuel
efficiency (the latter affects cost savings and CO2 emissions equally).
A series of weekly interviews and discussions were held with Coca-Cola FEMSA’s secondary distribution
stakeholders. In these discussions, insights were shared from visualizations in Power BI, receiving
feedback and interpretation from Coca-Cola FEMSA’s experts. During these sessions we interviewed the
Director of Distribution, the Telematics Managers, and the Digital Analytics teams.
To gain a better understanding of the day-to-day of the truck drivers of Coca-Cola FEMSA, a field visit was
arranged. By accompanying truck drivers during their daily routes to deliver beverages, interesting insights
beyond the data from telematics emerged. A key insight from the field visit is that some
information not reflected in telematics can affect the drivers’ decisions, for example, traffic,
street conditions, weather, and other vehicles’ driving behavior.
3.2 Data understanding

The amount of available data in this project was an important challenge. A telematics supplier integrates
with over 3,000 trucks to generate over 40 different tables that are updated every day, or in some cases every
minute. Each table has different parameters with different aggregation levels, so a clean data
set is fundamental for the project. A data dictionary was created to better understand each of the
different parameters shown in the reports, as well as the aggregation level of each report. Weekly calls
with the telematics team helped to clarify questions from the team.
With the data dictionary ready, different visualizations in Microsoft Power BI helped us to get an initial
feel for the data. These visualizations were iteratively validated and discussed with Coca-Cola FEMSA’s
stakeholders to clarify the expected ranges for important parameters and the expected relations between
them.
3.3 Data preparation

The following criteria were followed to clean the data:
• For outlier treatment we followed two approaches depending on the attribute. For most
attributes, we trimmed the values to what the users found to be real minimums or maximums. As
an example, for the attribute “Hours Driven”, there is no way a driver could have driven for more
than 24 hours in one day. For attributes in which there was no knowledge of the limits, a
conservative approach was followed, trimming only the values that went beyond 3 times the
interquartile range.
• Trucks that do not report fuel usage or full telematics data were removed. This removed a large
part of the trucks that the company uses but still left us with 360 trucks with data from 325 days
of delivery.
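The two outlier rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and sample values are hypothetical, "trimming" is taken to mean dropping the offending records, and the interquartile rule is read as keeping values within 3 times the IQR of the first and third quartiles.

```python
import statistics

def trim_by_known_limits(values, lower, upper):
    """Keep only values inside limits that users consider physically possible."""
    return [v for v in values if lower <= v <= upper]

def trim_by_iqr(values, k=3.0):
    """Keep values within [Q1 - k*IQR, Q3 + k*IQR]; k=3 is the conservative rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical "Hours Driven" values: no driver can exceed 24 hours in one day.
hours_driven = [7.5, 8.0, 9.2, 26.0, 6.8]
print(trim_by_known_limits(hours_driven, 0, 24))

# Hypothetical attribute for which no real minimum or maximum is known.
idle_minutes = [30, 32, 35, 31, 29, 33, 34, 400]
print(trim_by_iqr(idle_minutes))
```

If clipping extreme values to the limit is preferred over dropping them, the list comprehensions can be replaced by `min(max(v, low), high)`.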
Data was also further processed in the following manner:
• One-hot encoding was used to transform categorical variables (e.g., truck type and
model) into dummy variables to be used in regression models.
• Several features had to be engineered to be used in the system. These were mainly ratios
of the time an activity took with respect to the total operating time, for example, the total time a
truck spent accelerating with respect to the total time of operation.
• Feature scaling was used to normalize the range of independent variables. The feature scaling
method used was min-max normalization, under which the minimum value of
an attribute becomes 0 and the maximum value becomes 1 and all the other values are adjusted
on that 0 to 1 scale.
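As a minimal sketch of the three processing steps listed above, the following Python functions show one-hot encoding, a time-ratio feature and min-max normalization. The function names, column labels and sample values are hypothetical and are not taken from the project's code.

```python
def one_hot(categories):
    """Turn a categorical column into one dummy (0/1) column per level."""
    levels = sorted(set(categories))
    return [{f"is_{lvl}": int(c == lvl) for lvl in levels} for c in categories]

def ratio_feature(activity_seconds, operating_seconds):
    """Share of total operating time spent on one activity (e.g., accelerating)."""
    return activity_seconds / operating_seconds if operating_seconds > 0 else 0.0

def min_max_scale(values):
    """Map the minimum to 0, the maximum to 1 and everything else in between."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]

print(one_hot(["A", "B", "A"]))    # hypothetical truck types
print(ratio_feature(900, 3600))    # 15 minutes accelerating in a 1-hour shift
print(min_max_scale([10, 20, 30]))
```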
3.4 Modeling
Our approach for modelling involved three different machine learning models. The first one was a
regression model to understand how different driving behaviors impact fuel efficiency, the second
one was a machine learning model to understand how different driving behaviors impact safety and the
third model was a clustering analysis of driving behaviors to create different driving styles clusters to
analyze how these driving styles affect safety and fuel efficiency simultaneously.
3.4.1 Fuel Efficiency

For our first regression model, the dependent variable to be analyzed is fuel efficiency as measured by
kilometers per liter, where a higher fuel efficiency is better for the economy of the company and produces
a lesser amount of carbon dioxide per kilometer driven. To quantify the monetary impact of any change,
the average of the price per liter in Mexican pesos is used. To obtain the average price per liter, we used a dataset
of all refuels of 2020, which yielded 18.16 MXN per liter. To obtain the average carbon dioxide
produced per liter of diesel we used a constant of 2.68 kilograms of carbon dioxide per liter.
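The conversion from a fuel efficiency change to money and emissions is simple arithmetic, sketched here in Python. The two constants come from the paragraph above; the function name, the distance and the before/after efficiencies are hypothetical values chosen only for illustration.

```python
PRICE_MXN_PER_LITER = 18.16  # average 2020 refuel price stated in the text
CO2_KG_PER_LITER = 2.68      # kilograms of CO2 per liter of diesel stated in the text

def savings(distance_km, km_per_liter_before, km_per_liter_after):
    """Liters, pesos and kilograms of CO2 saved over a fixed distance."""
    liters_before = distance_km / km_per_liter_before
    liters_after = distance_km / km_per_liter_after
    liters_saved = liters_before - liters_after
    return {
        "liters_saved": liters_saved,
        "mxn_saved": liters_saved * PRICE_MXN_PER_LITER,
        "co2_kg_saved": liters_saved * CO2_KG_PER_LITER,
    }

# Hypothetical scenario: 100,000 km driven, efficiency improving from 2.5 to 2.6 km/L.
print(savings(distance_km=100_000, km_per_liter_before=2.5, km_per_liter_after=2.6))
```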
Different regression algorithms were tested to explain the main forces impacting fuel consumption. Some
of the models considered were multiple linear regression (i.e., polynomial regression), support vector
regression and simple decision trees. Other ensemble methods were tested like boosting (e.g. AdaBoost,
XGBoost and LightGBM), bagging (e.g. random forests) and stacking of various ensemble and simple
methods. Although the bagging and boosting models explained the variance in the observations
better (for reference, the adjusted R2 for the AdaBoost Regressor was 0.72 while for the
Linear Regression it was 0.67), we decided to use multiple linear regressions because they are fully
explainable. These types of models allow us to explain how the model interprets the inputs to produce
outputs as opposed to a black-box model that only produces outputs that are not explainable. To make sure
the results of the linear regression are reproducible, the main four assumptions behind linear regression
were tested using residuals plots. Here is a list of the main four assumptions:
1. Independence of observations
2. Linearity of Response
3. Normality of Residuals
4. Homogeneity of Variance (i.e., homoscedasticity)
Multicollinearity issues were addressed in three different ways:
• Feature Selection was carried out by understanding the meaning behind each of the attributes to
discard metrics that were proxies of each other.
• A Pearson Correlation Map was created to discard attributes with a Pearson correlation coefficient
greater than 0.6 with at least one other attribute. Such a coefficient indicates a high
correlation with another feature, which increases the effect of multicollinearity. A correlation
analysis only checks for correlation problems between pairs of attributes.
• Variance Inflation Factor (VIF) was obtained for each of the attributes, and we discarded attributes
that had a factor greater than 5, which would indicate highly correlated attributes. A Pearson
Correlation Map helps with identifying pairs of attributes that are correlated, while the VIF
approach helps to identify multicollinearity among the interactions between the variables, not
just between two of them.
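The pairwise filter and the VIF threshold described above can be illustrated with a short Python sketch. The greedy order in which features are kept, the feature names and the sample values are assumptions made for the example. The VIF helper uses the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one attribute on all the others, so a threshold of 5 corresponds to an R² of 0.8.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(features, threshold=0.6):
    """Greedy filter (assumed order): keep a feature only if |r| <= threshold
    against every feature that has already been kept."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

def vif_from_r_squared(r_squared):
    """Variance Inflation Factor given the R² of one attribute regressed on the rest."""
    return 1.0 / (1.0 - r_squared)

# Hypothetical columns: top speed moves almost in lockstep with average speed.
columns = {
    "avg_speed": [20, 25, 30, 35, 40],
    "top_speed": [60, 66, 71, 74, 82],
    "idling_ratio": [0.3, 0.1, 0.4, 0.2, 0.25],
}
print(drop_correlated(columns))
print(vif_from_r_squared(0.8))
```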
Once the regression was validated, we developed a Microsoft Power BI-based simulation tool that allows
the users to simulate what the fuel efficiency gains would be if any of the independent variables are
modified. Afterwards, we validated the results of our regression against samples of data.
The samples used came from drivers where we had detected abrupt changes in their fuel efficiency. To
detect the abrupt changes in fuel efficiency behavior for each driver we calculated a rolling 7-day average
of fuel efficiency to smooth out the daily noise and only kept those drivers in which we saw a change that
remained constant; an example is shown in Figure 14.
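A minimal Python sketch of this idea follows. The 7-day window comes from the text; everything else is an assumption for illustration, including the function names, the 15% relative-change threshold, the rule of comparing the 7 days before a date with the 7 days after it, and the sample series.

```python
def rolling_mean(values, window=7):
    """Trailing rolling average; positions without a full window are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out

def largest_sustained_shift(values, window=7, min_rel_change=0.15):
    """Find the day where the mean of the next `window` days differs most from the
    mean of the previous `window` days. Returns (index, relative change), or None
    if the largest shift is below the (hypothetical) threshold."""
    best_index, best_change = None, 0.0
    for i in range(window, len(values) - window + 1):
        before = sum(values[i - window:i]) / window
        after = sum(values[i:i + window]) / window
        change = (after - before) / before
        if abs(change) > abs(best_change):
            best_index, best_change = i, change
    if best_index is not None and abs(best_change) >= min_rel_change:
        return best_index, best_change
    return None

# Hypothetical daily km/L for one truck: a level shift after the 14th day.
daily_efficiency = [2.5] * 14 + [3.4] * 14
print(rolling_mean(daily_efficiency)[6:9])
print(largest_sustained_shift(daily_efficiency))
```

Comparing non-overlapping windows is one simple way to require that a change "remains constant"; a single noisy day moves a 7-day mean too little to pass the threshold.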
3.4.2 Safety
Safety is not a straightforward concept to measure. Some of the proxies for measuring safety include
number of accidents or proprietary Safety Scores given by telematics data providers. Our initial approach
to understand which driving behaviors affect safety was to use accidents as our dependent variable on a
regression model.
Predicting crash rates is hard given that accidents are stochastic events that do not always follow
the same pattern and depend on a wide variety of directly controllable factors, like driving style, and
external factors like the weather, external traffic and highly uncertain events like people crossing streets
or other drivers’ reckless driving. An econometric model was used to analyze the driving behavior of the
drivers that had accidents. The econometric model was based on logistic regressions to predict the
probability of an accident occurring. Econometric models allow the model to incorporate past events (by
using lag features) that have led to an accident, such as a driver engaging in unsafe practices for several
days in a row. Another option includes using several proxies for safety: events such as reducing the velocity
of the vehicle too quickly or events in which vehicles make hard turns at considerable speeds. Events like
these could be used as proxies for safety as they are considered unsafe behaviors.
Our econometric model to predict accidents did not produce statistically significant results. This was reflected
in an adjusted R2 that was less than 0.2 and attributes with p-values greater than 0.05. Therefore, this
part of the process was not integrated into the results. Our hypothesis of how to make this model work
would be to structure how the data is collected and analyze more data related to status of the driver as
suggested by Houston, J. (2003) and Harris, P. (2014).
Given that our first proxy for safety failed to work, we decided to use Coca-Cola FEMSA’s telematics
provider’s Safety Score. This score uses various events to calculate a proxy for the probability of having an
accident. The calculations used are Intellectual Property of the supplier, but they are based on a micro
modeling approach similar to that of Toledo et al. (2008) mentioned in the literature review (i.e., real
time analytics of telematics data). For example, a sudden longitudinal and lateral acceleration change
measured by the vehicle’s computer may indicate an abrupt turn. This way, the previously mentioned
independent variables were generated (e.g., abrupt lane changes, abrupt turns, acceleration or braking
events while turning, etc.).
To understand the relative importance of our independent variables regarding the Safety Score we
discretized the Safety Score variable into 6 equal-frequency categories. The reason for discretizing the
variable instead of treating it as continuous numerical feature is that the Safety Score is bounded by an
upper limit at 100. So, a linear regression model would produce results with heteroscedasticity problems.
Therefore, we ran a classification model using 20 independent variables to predict one of the 6 Safety
Score classes. Figure 15 shows the ranges and number of observations for each of the 6 classes. The
independent variables used are listed in Table 1.
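Equal-frequency discretization can be sketched in plain Python as follows. This is a rank-based illustration with hypothetical scores, not the procedure actually used in the project. Because it assigns classes by rank, tied scores may land in neighboring classes, whereas a quantile-edge approach keeps ties together at the cost of less even class sizes.

```python
from collections import Counter

def equal_frequency_classes(scores, n_classes=6):
    """Label each score 0..n_classes-1 so that classes hold (almost) equal counts."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, idx in enumerate(order):
        labels[idx] = rank * n_classes // len(scores)
    return labels

# Hypothetical Safety Scores, bounded above by 100.
safety_scores = [95, 60, 88, 99, 91, 97, 80, 93, 85, 100, 70, 96]
labels = equal_frequency_classes(safety_scores)
print(labels)
print(Counter(labels))
```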
We tested various machine learning models to decide which one would
best fit the data. Among the options that we tried were Random Forest, AdaBoost, Naïve Bayes and
Support Vector Machines (SVM). Table 3 shows the comparison between the different machine learning
models. The machine learning algorithm that produced the best results in terms of Area Under the Curve
(AUC) was Random Forest. The AUC is a common metric to evaluate the results of multiclass
classification problems as it provides an aggregate measure of performance across all possible
classification thresholds. A common way to interpret this metric is as the probability that the model ranks
a random positive example more highly than a random negative example.
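The ranking interpretation of the AUC can be checked directly by counting pairs, as in this Python sketch for a single class treated as "positive" against all others. The function name and the sample scores are hypothetical, and the text does not state how the per-class values were averaged into one multiclass figure.

```python
def auc_by_ranking(scores, labels):
    """AUC as the share of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half a win."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted probabilities for one Safety Score class versus the rest.
predicted = [0.9, 0.8, 0.4, 0.3]
is_class = [1, 0, 1, 0]
print(auc_by_ranking(predicted, is_class))  # 3 of the 4 pairs are ranked correctly
```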
3.4.3 Cluster Analysis

Our third and final model was clustering. Clustering is a family of unsupervised machine learning
techniques. The intention of using clustering was to group the different driving behavior
characteristics that are gathered at the daily and truck level to identify clusters of driving styles. As input
variables we used the twenty variables shown in Table 1.
The clustering approach that we decided to use was a Bayesian Gaussian Mixture Model. The reason for
using a probabilistic Gaussian Mixture model is that it allowed us to better understand the properties
of input examples. Many clustering algorithms like K-Means simply give a cluster representative that
shows nothing about how the points are spread. The Gaussian properties of this approach give us not
only the mean of the cluster but also the variance, which can be used to estimate the likelihood that a
point belongs to a certain cluster. The reason for choosing a Bayesian Gaussian Mixture Model instead of
the traditional Gaussian Mixture was to take a probabilistic approach to choosing the number of clusters.
With a traditional Gaussian Mixture Model, techniques such as the Bayesian Information Criterion (BIC) or the Akaike
Information Criterion (AIC) must be used to select an optimal number of clusters, whereas with
a Bayesian one, the algorithm takes the cluster parameters as latent random variables, not as fixed model
parameters. In other words, with this algorithm you can set an initial maximum number of clusters and
the algorithm will decide the optimal number of clusters to reward models that fit the data well while
minimizing a theoretical information criterion. The possible range of number of clusters would be
between 1 and the maximum number of clusters that was set. For our problem, we chose a maximum
number of clusters of 10, as this would allow us to separate the driving styles into business-relatable
information; the algorithm suggested 5 clusters as the optimal number.
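To illustrate how the mean and variance of each cluster yield a membership likelihood, the following Python sketch computes posterior membership probabilities for a one-dimensional mixture. It is a simplification: the model in this project used 20 variables, and the weights, means and variances below are hypothetical rather than fitted values. Libraries such as scikit-learn provide a `BayesianGaussianMixture` class that also fits these parameters, although the text does not state which implementation was used.

```python
import math

def gaussian_pdf(x, mean, variance):
    """Density of a one-dimensional normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def membership_probabilities(x, components):
    """Posterior probability that x belongs to each component.
    `components` is a list of (weight, mean, variance) tuples."""
    weighted = [w * gaussian_pdf(x, m, v) for w, m, v in components]
    total = sum(weighted)
    return [p / total for p in weighted]

# Two hypothetical driving-style clusters described by average speed (km/h).
clusters = [(0.5, 20.0, 9.0), (0.5, 32.0, 9.0)]
print(membership_probabilities(21.0, clusters))  # almost certainly the slower style
print(membership_probabilities(26.0, clusters))  # halfway: equally likely to be either
```

A hard-assignment method such as K-Means would place the second observation in exactly one cluster, while the mixture model reports that it sits between two styles.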
Table 1: Features used for Cluster Analysis
Feature ID Variable name

1 Life mileage
2 Max. engine t(°C)
3 Max. RPM
4 Top Speed
5 Operation time
6 Time in DC
7 Avg. Speed
8 % route under min. t(°C)
9 Over Revolution Time %
10 Idling time %
11 Acceleration Route time %
12 Overspeed events (%)
13 Number of stops (%)
14 Abrupt Acceleration
15 Abrupt Braking
16 Abrupt turns
17 Abrupt Lane Changes
18 Acceleration while turning
19 Braking while turning
20 OverAcceleration events
Note: All variables were measured at a vehicle-day disaggregation level.
3.4.4 Individual Cluster Analysis

Each cluster generated by our model had a weight from independent variables that impact fuel efficiency,
for example idling times and excessive acceleration events, as well as independent variables related to
safety, for example abrupt lane changes, abrupt turns and acceleration or braking events while turning.
Each cluster was generated based on the independent variables that represent the driving behavior, to
explain which patterns each driver follows. Then we used the created clusters to see how they compare in terms
of fuel efficiency and safety score with the purpose of explaining the tradeoffs between the different types
of driving styles.
To increase each cluster’s interpretability and applicability to business daily practices, a persona was
defined for each cluster. A persona is a term borrowed from the marketing industry which is described as
“the aspect of someone’s character”. Our intention in using these personas was to create fictitious but
relatable characters so that the driving style of any driver could be identified and easily recognized. Our
Gaussian Mixture Model approach also allows a driver’s behavior to belong, probabilistically speaking, to
several driving styles.
3.5 Conclusions
To answer our research question of which driving styles can help fleet owners increase safety and fuel
efficiency we used multiple machine learning and analytics techniques. We used data from over 3,000
trucks to come up with a fuel efficiency regression model. After an unsuccessful attempt at predicting
crash rates with an econometric regression, we used a proprietary Safety Score from the
telematics provider as a proxy for safety, and we developed a clustering analysis to drive the business
recommendations and actionable insights. Given the amount of data we were dealing
with, we also followed the CRISP-DM methodology to guide us through the iterative process of data
mining. In the next section we will describe the results of each model.
4 RESULTS
4.1 Fuel Efficiency
4.1.1 Regression Model

Our polynomial regression model to understand the main drivers behind fuel efficiency (kilometers per
liter of diesel) contained 13 independent variables plus the bias term. The linear regression model had an
R2 of 0.67, a MAPE of 8.7%, an MAE of 0.23 and an RMSE of 0.28. As context, the mean fuel efficiency was 2.72
kilometers per liter with a standard deviation of 0.52. Figure 8 shows a histogram of the distribution of
fuel efficiency.
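For readers less familiar with these error metrics, the Python sketch below computes them from their standard definitions. The function name and the sample values are hypothetical and are not taken from the project's data.

```python
import math

def regression_metrics(actual, predicted):
    """MAE, RMSE, MAPE (in percent) and R² from their standard definitions."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "mape": 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n,
        "r2": 1 - ss_res / ss_tot,
    }

# Hypothetical observed and predicted fuel efficiency (km/L) for four truck-days.
observed = [2.4, 2.9, 3.1, 2.6]
estimated = [2.5, 2.7, 3.0, 2.8]
print(regression_metrics(observed, estimated))
```

Since the MAE and the RMSE are expressed in kilometers per liter, an MAE of 0.23 against a mean of 2.72 is consistent with a MAPE of roughly 8 to 9%.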
Figure 8: Histogram of Fuel Efficiency
As expected, not all variables have the same impact in estimating fuel efficiency. Figure 9 shows the
relative standardized effects of each variable. The top three are AvgSpeed,
RatioAceleradorDuracionEventos (the percentage of time a truck spends accelerating) and RPMMaxima.
Figure 9: Pareto Chart of the Standardized Effects
Figure 10 and Figure 11 show the Residuals Plots and the Prediction Error for the Linear Regression Model.
These plots were used to test the four key assumptions that were mentioned in the Methodology.
Figure 10: Residuals for Linear Regression Model
Figure 11: Prediction Error for Linear Regression
Note: these two plots show that the residuals are normally distributed (Gaussian bell on the top right), which
visually supports the normality of the residuals. The points also appear to be evenly distributed (points in
the center), which suggests linearity of response and homogeneity of
variance.
Figure 12 shows a matrix of all features and how they correlate with each of the other features. A number
close to 1 indicates a strong positive correlation between two variables. A number close to 0
indicates little or no linear correlation between the two variables. A number close to negative 1 indicates a
strong negative correlation. This plot served as initial analysis to do feature selection to avoid
multicollinearity issues.
Figure 12: Telematics Parameters Correlation Matrix
Table 2 shows the summarized results of the linear regression with the names of features, the coefficients,
the standard error of the coefficients, the T-values, the P-values and the VIF. The VIF values were used for
feature selection to avoid multicollinearity problems.
Table 2: Fuel Efficiency Linear Regression Results
Note: Adjusted R2: 0.67
4.1.2 Fuel Efficiency Scenario Analysis

After developing the model, we built a simulator that allowed Coca-Cola FEMSA to perform “what-if”
analysis with the different variables that impact fuel efficiency. Figure 13 displays an example of a possible
scenario. The objective of this simulation tool is to allow the company to estimate the potential gain of
experimenting with changes to any of the independent variables and to see how this would impact
both costs and CO2 emissions.
Figure 13: Scenario Analysis
Note: As an example, the figure shows the results of a reduction from 15% to 10% of time accelerating
(the variable named “RatioAceleradorDuracion”), and how this is associated with a 3% increase in fuel
efficiency, a decrease of 253 tons of CO2 and a reduction in costs of 1.72 million MXN.
4.1.3 Anomaly Detection
The results of the model and the simulator are only correlational and therefore causality was not proven.
Experimenting with the findings of this model to prove causality was out of the scope of the project.
Nonetheless, we performed anomaly detection procedures to understand when certain drivers had
important changes in their fuel efficiency to see if the results of the model were correlated to our
predicted outcome. Several of these anomalies were identified, with results similar to those
in the example in Figure 14.
Figure 14: Anomaly Detection Example
Note: this figure shows, on the top visualization, the average efficiency for a particular truck across time
in the year 2020. On September 30, there was a sudden 36% improvement in terms of fuel efficiency.
The top visualization also shows how the average speed went up dramatically in the same period.
The positive correlation between speed and fuel efficiency was identified by the model as the main
predictor. After analyzing the data further, it was identified that the main driver behind this change was
that the route this driver took with this truck was changed in the months of October, November, and
December. The visualization at the bottom shows the different stop locations for this driver in the months
of September, October, and December.
4.2 Safety
Figure 15 shows the number of observations that were allotted per class after the discretization of the
Safety Score. This was done in order to balance the classes before training and testing the machine
learning models.
Figure 15: Distribution of Safety Score Classes
[Bar chart: Observations per Safety Score Class. Classes: 0 to 80.5, 80.5-88.5, 88.5-92.5, 92.5-94.5, 94.5-96.5 and 96.5-100; class sizes range from 2,110 to 2,955 observations.]
Note: Safety Scores were grouped into 6 different classes each containing a similar number of
observations per class to balance the groups.
Table 3 shows the results for the 4 different machine learning classification algorithms that we tried. We
decided to use Random Forest as our model as it had the highest AUC score. The score of 0.71 led us to
the conclusion that the independent variables we used are good predictors of the Safety Score defined by
the telematics data provider.
Table 3: Classification Model Comparison
Model AUC
Random Forest 0.713
AdaBoost 0.703
Naïve Bayes 0.695
SVM 0.655
Figure 16 shows the ROC analysis for the classification algorithm. As can be seen, using a threshold of 0.5,
the classification method results in less than 10% false positives and under 40% false negatives, which
supports the influence of the independent variables on the Safety Score.
Figure 16: ROC Analysis for Safety Classification for Random Forest
The classification algorithm provided the relative importance of the independent variables as shown in
Figure 17. We can see that the Number of Braking Events while Turning is the most important
factor, while the number of abrupt lane changes is the least decisive factor.
Figure 17: Relative Feature Importance
Safety Score: Relative Feature Importances (features in the order shown in the chart, from most to least important; importance axis from 0 to 0.04)
Number of Braking Events while Turning
Number of Abrupt Braking Events
Number of Acceleration Events while Turning
RatioRalentiTiempoTotal
TempMaximaReportada
KilometrajeAcumulado
ExcesoVelocidadVelocidadMaxima
TiempoGeoUO
RatioTempTiempoDebajoMinima
RPMMaxima
RatioNumStops
AvgSpeed
TiempoOperacionHMS
RatioAceleradorDuracionEventos
Number of Excessive Acceleration Events
RatioRMPDuracionTotalEventos
Number of Abrupt Turns
Number of Abrupt Acceleration Events
RatioExcesoVelocidadEventos
Number of Abrupt Lane Changes
4.3 Fuel Efficiency and Safety

The results of our Bayesian Gaussian Mixture Model are shown in Figure 18. Each cluster represents a
driving style that is characterized by a specific driving behavior. The average driving behaviors per cluster
are shown in Figure 19. We then used the driving behaviors and their impact on Fuel Efficiency and Safety
Score to come up with names that personify each cluster, in this way creating the driving styles personas.
For example, Speedy Sebastian is a cluster characterized by the highest average speed, which can lead to
higher Fuel Efficiency, but, as Figure 18 shows, this does not translate to the highest Safety Scores.
Figure 18 also evidences the inherent tradeoffs between fuel efficiency and safety. For example, Average
Arthur has the highest safety score, but a relatively low fuel efficiency, while Speedy Sebastian increases
fuel efficiency while having a lower safety score.
This personification of each cluster is a simplification of more than 20 different variables. Therefore, the
interpretation of Figure 18 and Figure 19 requires in-depth analysis of each of the average attributes, as
each of the variables is needed to interpret a driving style. In other words, it is complicated to ask an
individual driver to adjust his or her performance to change variables like the number of braking while
turning events by a certain factor but also the number of sudden accelerations by a different factor. These
clusters embody the driver’s decisions at a microscopic level, to humanize the math behind the decisions
that build driving behaviors.
Figure 18: Driving Style Clusters’ Mean Safety Score and Fuel Efficiency
Note: the size of the bubble is determined by the Number of Observations.
Cluster interpretation
Figure 19 shows a condensed view, with numerical values, of each variable used in the cluster analysis. As
discussed with Coca-Cola FEMSA, describing a driving behavior with numbers is not as impactful as
personifying each cluster, so we assigned a persona to each cluster to relate them to human driving
behaviors and make them easier to interpret.
Figure 19: Condensed View of Driving Styles
Note: Parameters normalized to show differences
Average Arthur (2.76 km/L, 91.69)
This driver is the safest. Average Arthur puts safety first on an average route, with average time under
optimal temperature and average idling time. The key differentiator that makes Average Arthur the safest
is that he has almost zero reckless driving events: no abrupt turns and no braking or accelerating while turning.
In general, Average Arthur does not rush his deliveries and his speed is average; hence his fuel efficiency
indicator is suboptimal.
Overachiever Ophelia (2.85 km/L, 87.64)
Overachiever Ophelia trades off a few safety score points to increase fuel efficiency, but she stays on the
Pareto frontier of safety and fuel efficiency. She tries to be fast and efficient because she has some of the
longest routes. She is cautious but, due to her higher average speed, she has the highest number of
high-speed events and the highest top RPM. She is still very safe, reporting a low number of abrupt
accelerations, lane switching, and accelerating-while-turning events.
Inefficient Isabel (2.73 km/L, 68.57)
This persona is off the Pareto frontier; the company would undoubtedly benefit from finding a
way to help her improve her performance. Inefficient Isabel has both a low fuel efficiency and a low safety
score. She has a high number of acceleration events but a low average speed; she is also by far the top
accelerator while turning and has the highest number of stops. This could be explained by routes in
high-density urban areas, at peak traffic times, and probably with many customers. Inefficient Isabel could be
helped by Overachiever Ophelia to increase efficiency and safety, since their parameters contrast
sharply.
Speedy Sebastian (2.99 km/L, 76.71)
Speedy Sebastian takes fuel efficiency to the extreme. He is still on the Pareto frontier, but he has
lost balance: his driving style is too risky, since his average safety score is under 80. He has the highest
average speed and the most accumulated kilometers, so he probably has the longest routes. Given his high
speed, he has the highest number of abrupt turns and abrupt lane changes. He also has a high number of
accelerations while turning, resulting in the low safety score mentioned above.
Hazardous Harry (2.85 km/L, 55.16)
This is probably the most undesirable driving behavior. Like Inefficient Isabel, Hazardous Harry is not on
the Pareto frontier. He has a dangerously low safety score of 55 due to his extremely high number
of sudden braking events and abrupt turns; his performance is also affected by a high idling time. This
persona could be helped by Average Arthur to increase his safety score without sacrificing fuel efficiency.
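The Pareto classification used in the persona descriptions can be checked with a short script. The following is a minimal sketch, not the project's code: it takes the cluster means quoted in the persona headings as input, and the function names `dominates` and `pareto_frontier` are our own illustrative choices.

```python
# Cluster means from the persona headings: (fuel efficiency in km/L, safety score).
PERSONAS = {
    "Average Arthur": (2.76, 91.69),
    "Overachiever Ophelia": (2.85, 87.64),
    "Inefficient Isabel": (2.73, 68.57),
    "Speedy Sebastian": (2.99, 76.71),
    "Hazardous Harry": (2.85, 55.16),
}


def dominates(a, b):
    """True if a is at least as good as b on both metrics and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_frontier(points):
    """Return the names of the points that are not dominated by any other point."""
    frontier = []
    for name, metrics in points.items():
        others = (m for other, m in points.items() if other != name)
        if not any(dominates(other, metrics) for other in others):
            frontier.append(name)
    return frontier


frontier = pareto_frontier(PERSONAS)
print(frontier)
```

With these means, the three personas described as being on the frontier are the ones the script returns; the other two are each dominated by at least one persona that is both safer and at least as fuel efficient.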
5 DISCUSSION
The motivation for our project was to understand which driving styles can help fleet owners increase Fuel
Efficiency and Safety. We centered our methodology on understanding (1) the main drivers behind fuel
efficiency, (2) the main drivers behind safety, and (3) the trade-offs between driving styles that promote Fuel
Efficiency and those that promote Safety. This section provides an in-depth analysis of results,
managerial recommendations, and future directions for the core aspects of our research.
5.1 Fuel Efficiency
We developed a linear regression model to predict fuel efficiency using data from the telematics sensors
of trucks. This allows companies to understand how different factors affect fuel efficiency and, by
controlling and optimizing these factors, to decrease their CO2 emissions and fuel costs. Our model
explained the variance of the observed data with an adjusted R2 of 0.6. We observed that our model
predicts well under various circumstances, such as route changes. This leads us to believe that our model
is a good reference for thoughtful experimentation. It allows decision makers to prioritize the variables
that can have a high impact, and it also allows them to quantify the potential cost and CO2 emissions
reductions. When planning an experiment to improve fuel efficiency, decision makers can now estimate the
costs and then use the model to project potential gains and calculate the return on investment of a project
related to increasing fuel efficiency.
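For readers less familiar with the goodness-of-fit measures reported for the regression (adjusted R2 here, and MAPE in the conclusion), the sketch below shows their standard definitions. It is illustrative only: the sample values are hypothetical and are not taken from the project's data.

```python
def adjusted_r2(r2, n_obs, n_features):
    """Adjusted R2: penalizes R2 for the number of explanatory variables."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_features - 1)


def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical numbers for illustration (not the project's data).
example_adj_r2 = adjusted_r2(0.5, n_obs=101, n_features=10)
observed_km_per_l = [2.5, 3.0, 2.8]
predicted_km_per_l = [2.75, 2.7, 2.8]
example_mape = mape(observed_km_per_l, predicted_km_per_l)
print(round(example_adj_r2, 4), round(example_mape, 2))
```

Adjusted R2 is always at most the plain R2, and the gap grows as more variables are added relative to the number of observations, which is why it is the more conservative figure to report for a model with many telematics features.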
The main variables regarding driving behavior that have an impact on fuel efficiency are average
speed, full throttle events, top revolutions per minute, and top temperature reported. This finding
indicates that a regular driving behavior is the most fuel efficient, while an irregular driving behavior, with
many braking and acceleration events but a low average speed, is the least efficient. We can relate
this driving behavior to the pattern that drivers may follow in a megacity at peak hours, when heavy
traffic makes them drive more aggressively.
Table 3 shows the final coefficients and relative importance of the features in our fuel efficiency
regression model. A positive (green) coefficient indicates that the feature and the dependent variable
have a positive correlation, whereas a negative (red) coefficient indicates a negative correlation. Similarly,
a high relative importance indicates that the variable has a high impact on fuel efficiency; for example,
average speed is the most important factor in determining fuel efficiency, and a higher average speed
results in a higher fuel efficiency.
5.2 Safety
Safety is a complicated variable to measure, given the chaotic nature and low occurrence of traffic
accidents. Some measures given by telematics providers can serve as a proxy for safety, namely counts of
reckless driving events such as sudden accelerations, sudden braking events and sudden lane changes.
Compared to fuel efficiency, we concluded that safety is better suited to a classification algorithm because
the relation between a safety score and the probability of having an accident is not as linear as the
relationship between fuel efficiency and fuel used.
According to our classification algorithm, the most important factors reducing the safety score are the
number of braking events while turning, the number of abrupt braking events, the number of acceleration
events while turning, and idling time. The least important ones according to our data are the number of
abrupt lane changes, overspeeding events (RatioExcesoVelocidadEventos), the number of abrupt acceleration
events, and the number of abrupt turns.

Driving styles are made up of drivers’ decisions, which can be captured and quantified with the help of
vehicle telematics as different driving indicators (the independent variables in our project). These behaviors
result in different fuel efficiency and safety indicators that can be below or on the Pareto frontier
between both variables. Companies should always attempt to move all clusters toward the Pareto frontier and
define which factor weighs more depending on their business strategy and corporate culture.
This clustering technique can also help find the drivers that are not performing as well overall. For
example, drivers in the Inefficient Isabel cluster would benefit from transitioning to an
Average Arthur driving behavior, as they would increase both safety and fuel efficiency.
Cluster rating: A Pareto frontier can be seen in Figure 18, indicating that, depending on the weight
that a company gives to safety score and fuel efficiency, a different driving style could be defined as
optimal. Different regions may have different priorities for their transportation metrics; therefore, the
first step is to align the company’s priorities with the weights assigned to each transportation metric.
Based on this, we can rate the clusters according to the company’s priorities and the environment in each
case. For example, safety may be more relevant for a route that goes through the center of a city, whereas a
long-distance route through a town with low population density may focus more on fuel
efficiency.
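One possible way to make the cluster rating concrete is a weighted score over the two metrics. The sketch below is an assumption on our part, not the method used in the project: it min-max normalizes the cluster means from the persona headings and combines them with a hypothetical `safety_weight` between 0 and 1.

```python
# Cluster means from the persona headings: (fuel efficiency in km/L, safety score).
PERSONAS = {
    "Average Arthur": (2.76, 91.69),
    "Overachiever Ophelia": (2.85, 87.64),
    "Inefficient Isabel": (2.73, 68.57),
    "Speedy Sebastian": (2.99, 76.71),
    "Hazardous Harry": (2.85, 55.16),
}


def min_max(values):
    """Rescale values to the 0-1 range (assumed normalization, for comparability)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def rate_clusters(personas, safety_weight):
    """Rank clusters by a weighted sum of normalized safety and fuel efficiency."""
    names = list(personas)
    fuel = min_max([personas[n][0] for n in names])
    safety = min_max([personas[n][1] for n in names])
    fuel_weight = 1.0 - safety_weight
    scores = {
        n: safety_weight * s + fuel_weight * f
        for n, f, s in zip(names, fuel, safety)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def best_cluster(personas, safety_weight):
    """Name of the top-rated cluster for the given safety weight."""
    return rate_clusters(personas, safety_weight)[0][0]


for weight in (0.8, 0.7, 0.2):
    print(weight, best_cluster(PERSONAS, weight))
```

Under this particular normalization, a strongly safety-oriented weighting favors Average Arthur, a slightly more balanced one favors Overachiever Ophelia, and a fuel-oriented one favors Speedy Sebastian. A different normalization or scoring rule could shift the exact crossover points.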
As shown by our linear regression model, a company trying to improve fuel efficiency should
aim for a higher average speed, fewer acceleration events, fewer stops, less time with a cold
engine temperature, and a lower maximum RPM.
6 INSIGHTS AND MANAGEMENT RECOMMENDATIONS
6.1 Fuel Efficiency
Driving with a low average speed is the main factor reducing fuel efficiency. Flexibility to deliver
outside peak traffic times and consolidating truck stops would increase a vehicle’s average
speed and could have a large impact on fuel efficiency. Each company has its own level of complexity,
particularly in secondary distribution and last-mile delivery. Therefore, when possible, fleet owners should
first understand their model and its main implications, run statistical analyses to assess the validity of
these models, and draw conclusions specific to their model. Factors such as vehicle type, geographic
area, regulatory restrictions, street conditions, delivery windows and other business considerations can
represent important variables that might affect a particular vehicle’s model.
Results from the regression and simulator show that average speed, acceleration events and
maximum RPM are the three most important variables for fuel efficiency. With small changes, such as
increasing average speed by 1 km/h, reducing acceleration events by 5%, and reducing maximum RPM by 5%,
fuel efficiency can be increased by 6%. The immediate next steps should be to plan a controlled experiment
to bring these savings to reality, or to find the hidden implications of changing our regression model’s
independent variables.
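To translate a gain in fuel efficiency (km/L) into fuel actually saved, note that fuel used over a fixed distance is inversely proportional to km/L. The sketch below works through that arithmetic for the 6% figure; the function name is ours and the calculation assumes the distance driven stays the same.

```python
def fuel_saved_fraction(efficiency_gain):
    """Fraction of fuel saved over a fixed distance when km/L rises by efficiency_gain.

    Fuel used = distance / (km per liter), so a gain g in km/L scales fuel use by 1 / (1 + g).
    """
    return 1.0 - 1.0 / (1.0 + efficiency_gain)


saving = fuel_saved_fraction(0.06)
print(f"{saving:.1%}")
```

A 6% increase in km/L therefore corresponds to slightly less than 6% of fuel saved for the same distance, which matters when converting the model's output into cost and CO2 estimates.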
Standardizing telematics data is necessary to get the full value from understanding and improving driving
styles. We suggest standardizing the data pipeline that converts telematics data reports into valuable
insights; this could enable a dashboard fed in real time with the outputs of the regression and
clustering models. This would provide flexibility in the analysis and insights that may be applicable to
all zones or routes but that would otherwise need a new model to replicate. Without a standardized data
pipeline, companies can lose a great deal of time and introduce human error by replicating a model manually.
After conducting a thorough analysis of the driving styles and their components, we have three
main insights and management recommendations. The first concerns deciding the optimal driving style
considering safety and fuel efficiency; the second is a proposed framework for changing drivers’
driving styles; and the third is a proposed framework for operationalizing the insights from this
project.
Given the inherent tradeoff between safety and fuel efficiency, a company should first define the weight
it gives to each factor, considering that different routes may require different priorities. In our
discussions with Coca-Cola FEMSA, safety emerged as the top priority. Therefore, even though Overachiever
Ophelia looks best graphically because she is almost as safe as Average Arthur, Average Arthur is the best
option.
Many of the attributes that lead to an optimal driving style, such as minimizing rapid accelerations
or minimizing the maximum revolutions per minute, are driving behaviors that are rooted in the
driving style of each person. Changing these habits could prove to be a massive undertaking that would
require best-in-class change management practices. We suggest a framework around nudges that can
extrinsically and intrinsically motivate drivers. Table 4 shows some of the nudges that could be considered
to motivate the drivers to change their driving styles:
Table 4: Change Motivator Nudges for Driving Styles
Frequency | Intrinsic Nudges* | Extrinsic Nudges
Daily | SMS at the end of the shift with CO2 saved by day with an equivalent of how impactful this is for the world. | SMS at the end of the day with details of where the driver behaved in unsafe or inefficient manners. Real time pulses (to cellphone or watch) notifying undesired events.
Weekly | Dashboard with Scorecard, Trends and Badges | Variable Bonus or Prizes related to fuel efficiency objectives
Monthly | Gamification**. Monthly objectives based on best performance days. | Trainings and Behavior Transfer***. Public Recognition.
Notes:
• *Nudge is a term popularized by Richard Thaler, winner of the Nobel Prize in Economics. It is
commonly defined as a set of policies that lead people to make better decisions without
depriving them of the freedom to choose.
• **Gamification is a broad term that encompasses the application of typical elements of game
playing to other areas of activity to encourage change and/or engagement. Some of the most
typical gamification techniques are points, badges, and leaderboards. Gamification could be
applied to encourage drivers to change their driving styles. An example of the concept applied to
this problem would be to give points to drivers who consistently drive efficiently or consistently achieve
a high safety score. Another example would be to give badges to drivers as they progress to more
efficient ways of driving; the personas could be used as the badges, so people could transition
from being a predominant Average Arthur to an Overachiever Ophelia.
• ***Training and Behavior Transfer: A driving style is the result of microscopic human decisions;
therefore, the best way to improve driving style is by enabling interaction between best
performers and the rest of the drivers. This way, the individual drivers’ best practices will survive
and be replicated by other drivers.
The third recommendation is to develop the necessary infrastructure to continuously gather the data,
create reports and send alerts to drivers. Telematics data comes from various telematics
providers, so the formats, sensors and infrastructure differ. The first step in consolidating the data
would be to design a data model that standardizes the data coming from the various providers. The architects
of this solution could use, as a reference, the variables that this project suggests are most impactful and
useful. The project would involve a team of data engineers to create the pipelines and the
orchestrator that continuously transfer the data from the providers to a centralized and standardized data
storage resource. This resource could be either a data lake or a data warehouse that can
store big data efficiently and cheaply. The processing power could come from the data
warehouse or from the tool used for reporting. Various solutions exist in the market that could be
used to visualize the data and allow users to interact with it to derive insights; some examples include
Microsoft Power BI, Tableau and MicroStrategy. Finally, a component should be set up to allow the system to
continuously send alerts to the necessary personnel as nudges. The code that was developed for this
project can serve as a guide for data cleaning and transformation, and the machine learning models can be
used for predictions.
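As an illustration of what a standardized data model could look like, the sketch below maps records from two telematics providers onto one common vehicle-day record. Everything in it is hypothetical: the provider names, the raw field names, and the small set of target fields are placeholders, not the schema used in this project.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class VehicleDayRecord:
    """Common schema: one row per vehicle per day (field selection is illustrative)."""
    vehicle_id: str
    day: date
    avg_speed_kmh: float
    abrupt_braking_events: int
    idling_minutes: float


# Hypothetical mapping from the common field names to each provider's raw field names.
FIELD_MAPS = {
    "provider_a": {
        "vehicle_id": "unit",
        "day": "date",
        "avg_speed_kmh": "avg_speed_kmh",
        "abrupt_braking_events": "hard_brakes",
        "idling_minutes": "idle_min",
    },
    "provider_b": {
        "vehicle_id": "camion",
        "day": "fecha",
        "avg_speed_kmh": "velocidad_promedio",
        "abrupt_braking_events": "frenados_bruscos",
        "idling_minutes": "minutos_ralenti",
    },
}


def standardize(provider, raw):
    """Convert one raw provider record (a dict) into the common schema."""
    mapping = FIELD_MAPS[provider]
    return VehicleDayRecord(
        vehicle_id=str(raw[mapping["vehicle_id"]]),
        day=date.fromisoformat(raw[mapping["day"]]),
        avg_speed_kmh=float(raw[mapping["avg_speed_kmh"]]),
        abrupt_braking_events=int(raw[mapping["abrupt_braking_events"]]),
        idling_minutes=float(raw[mapping["idling_minutes"]]),
    )


record = standardize(
    "provider_a",
    {"unit": "T-001", "date": "2021-03-01", "avg_speed_kmh": "23.5",
     "hard_brakes": 4, "idle_min": 35},
)
print(record)
```

Keeping the provider-specific knowledge in a mapping table means that adding a provider only requires a new entry, while the downstream regression, classification and clustering steps always see the same columns and types.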
A possible objective could be to train and motivate drivers from the Inefficient Isabel cluster to transition
over to the Average Arthur cluster. This would improve the Safety Score by about 34% (from 68.57 to 91.69)
without sacrificing fuel efficiency.
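The size of this improvement follows directly from the cluster means reported in the persona headings. A minimal worked check, using only those published values:

```python
def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# Cluster means from the persona headings.
isabel = {"km_per_l": 2.73, "safety": 68.57}
arthur = {"km_per_l": 2.76, "safety": 91.69}

safety_gain = pct_change(isabel["safety"], arthur["safety"])
fuel_gain = pct_change(isabel["km_per_l"], arthur["km_per_l"])
print(round(safety_gain, 1), round(fuel_gain, 1))
```

The safety score rises by roughly a third, while fuel efficiency changes by about one percent in the favorable direction, which is why the transition does not sacrifice fuel efficiency.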
7 FUTURE RESEARCH
7.1 Fuel Efficiency

In terms of possible avenues for future research into fuel efficiency, we suggest three ideas.
First, the next step that we suggest to Coca-Cola FEMSA in the management recommendations is also a
good research experiment: running a controlled experiment would make our model more robust by
incorporating any considerations that might appear during experimentation. An example would be a proper
experimental design in which the treatment is a limit on the maximum number of revolutions
per minute, to see whether the resulting change in fuel efficiency agrees with the prediction of the fuel
efficiency regression.
Second, in this research we define the impact that different variables have on fuel efficiency, but we
assumed all routes to be static, which is not the case, as shown in Anomaly detection (4.1.3). Because of
this, another future line of research is to feed these relationships into a mixed-integer linear programming
routing model that takes into consideration a variable fuel efficiency factor given its relation with
driving behavior. This model would choose faster, more fuel-efficient routes that may be longer in
distance but better overall when we consider the different fuel efficiencies that come from different
routes.
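The intuition behind a route-dependent fuel efficiency factor can be shown with a small comparison. The route names, distances and km/L values below are hypothetical and chosen only to illustrate that a longer route can consume less fuel; a full routing model would embed this calculation in its objective function.

```python
def fuel_liters(distance_km, km_per_l):
    """Fuel needed to cover a distance at a given fuel efficiency."""
    return distance_km / km_per_l


# Hypothetical alternatives between the same origin and destination:
# name -> (distance in km, expected fuel efficiency in km/L on that route).
routes = {
    "shorter, congested": (10.0, 2.73),
    "longer, free-flowing": (10.5, 2.99),
}

fuel_by_route = {name: fuel_liters(*spec) for name, spec in routes.items()}
best_route = min(fuel_by_route, key=fuel_by_route.get)
print(best_route, round(fuel_by_route[best_route], 2))
```

In this illustration, the route that is 5% longer still uses less fuel because its expected efficiency is higher, which is the kind of trade-off a distance-only objective would miss.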
Lastly, there is a deep relation between environmental sustainability and fuel efficiency. Fuel
efficiency and sustainability are generally a win-win situation, but there may be cases in which sustainable
does not mean fuel efficient. For example, the shortest path could be less fuel efficient but more
sustainable, since the overall fuel used is less. Sustainability could also go beyond fuel efficiency: the most
sustainable option could be a change to electric vehicles (EVs) or an entirely different routing logic.
Including a broader view of environmental sustainability in the study of driving behavior is therefore another
attractive way to continue this line of research, since environmental sustainability is an increasingly
important global topic.
7.2 Safety and Fuel Efficiency
For future research in this area, we suggest three different approaches. The first is related to
real-time feedback loops, the second is to run models that include exogenous variables, and the third is to
adjust the vehicle routing problem to consider the safety and fuel efficiency that a route could have, based
on historical telematics data.
In the last section we suggested the use of daily SMS messages to the drivers summarizing their
performance and areas of improvement. Another approach would be to provide
real-time feedback whenever a driver is driving in an unsafe or inefficient way. Possible
ways to provide these real-time feedback loops are flashing lights, electronic bracelets with buzzers, visual
cues on the dashboard, sound cues from the speakers, etc. The main challenges of this project would be
to create a nudge system that does not distract the driver from the road, that cannot be
tampered with or fooled, and that does not annoy the drivers to the point of
exasperation.
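One simple way to limit how often a driver is interrupted is a cooldown per driver and event type. The sketch below is a hypothetical design, not part of this project: the class name, the cooldown length and the event labels are all assumptions, and a real system would also need to address distraction and tampering.

```python
class NudgeThrottle:
    """Suppress repeated alerts of the same type to the same driver within a cooldown."""

    def __init__(self, cooldown_s):
        self.cooldown_s = cooldown_s
        self._last_sent = {}  # (driver_id, event_type) -> timestamp of last alert sent

    def should_alert(self, driver_id, event_type, timestamp_s):
        """Return True if an alert should be sent now, and record it if so."""
        key = (driver_id, event_type)
        last = self._last_sent.get(key)
        if last is not None and timestamp_s - last < self.cooldown_s:
            return False
        self._last_sent[key] = timestamp_s
        return True


throttle = NudgeThrottle(cooldown_s=60)
decisions = [
    throttle.should_alert("driver-1", "abrupt_braking", 0),
    throttle.should_alert("driver-1", "abrupt_braking", 30),
    throttle.should_alert("driver-1", "abrupt_turn", 30),
    throttle.should_alert("driver-1", "abrupt_braking", 61),
]
print(decisions)
```

The cooldown is counted from the last alert actually sent, so a burst of events produces one cue rather than many. Choosing the cooldown length would itself be a natural variable for the controlled experiments suggested above.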
The second suggested area of research is to run machine learning models with more variables that are
known to affect both fuel efficiency and safety. Some examples of exogenous variables that would be
interesting to include are weather and traffic. The rationale would be to increase the predictive
power of the model, understand how different external factors affect safety and fuel efficiency, and
see if there are potential areas of improvement when external conditions are adverse.
The third approach is to incorporate the factors that we define in our project into the vehicle routing
problem. The most straightforward approach that we suggest is to keep the standard formulation but add
a fuel efficiency and safety score factor to the cost of each arc. Generally, the objective function
attempts to minimize cost by minimizing distance; our proposal is to include the fuel efficiency and safety
factors that a driver would have depending on the route they choose, based on the historical telematics
data and using our framework.
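A possible form for such an arc cost is sketched below. It is only one of many ways to combine the two factors, and it rests on stated assumptions: the safety score is treated as being on a 0 to 100 scale, and the fuel price and the monetary penalty per missing safety point are hypothetical parameters that a company would need to calibrate.

```python
def arc_cost(distance_km, km_per_l, safety_score,
             fuel_price_per_l, risk_cost_per_point):
    """Cost of traversing one arc: fuel cost plus a penalty for expected unsafe driving.

    Assumes a safety score on a 0-100 scale, so (100 - safety_score) measures
    how far the historical driving on this arc is from a perfect score.
    """
    fuel_cost = distance_km / km_per_l * fuel_price_per_l
    risk_cost = (100.0 - safety_score) * risk_cost_per_point
    return fuel_cost + risk_cost


# Illustrative values only: two alternative arcs between the same pair of stops.
short_arc = arc_cost(distance_km=10.0, km_per_l=2.5, safety_score=70.0,
                     fuel_price_per_l=20.0, risk_cost_per_point=1.0)
long_arc = arc_cost(distance_km=11.0, km_per_l=2.75, safety_score=90.0,
                    fuel_price_per_l=20.0, risk_cost_per_point=1.0)
print(short_arc, long_arc)
```

Because the cost remains a fixed number per arc once the historical averages are plugged in, the standard vehicle routing formulation can be kept unchanged and only its cost matrix is replaced.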
8 CONCLUSION
Companies across the world are trying to improve the fuel efficiency and safety of their last-mile logistics.
Our research focused on understanding both concepts and the trade-offs between safety and fuel efficiency
using data from 3,000 Coca-Cola FEMSA trucks in Mexico. We applied a multiple polynomial
regression to understand and predict fuel efficiency from different telematics-based parameters,
successfully predicting fuel efficiency with a MAPE of 8.7 and an R2 of 0.67; through anomaly detection we
confirmed the correlations explained in our linear regression. Afterwards, we applied a random forest
classification algorithm to classify safety in driving behaviors, with an AUC of 0.713. Then, we applied a
clustering algorithm to all our independent variables to understand the tradeoffs between safe and
fuel-efficient driving behaviors. We then generated simulation tools and insights for Coca-Cola FEMSA to
do scenario planning and further understand the impact of different variables on driving behavior.
Our research shows how companies that use telematic data can improve both their safety and fuel
efficiency. Our approach can be adopted by small companies using mobile phone data and by large
companies using telematic sensors. We have shown how several small changes in driving styles can lead
to large benefits not only for the bottom line of companies but also for the safety of drivers and the
improvement of the world.
REFERENCES
Carlos, M.R., Gonzalez, L.C., Wahlstrom, J., Ramirez, G. Martinez, F., Runger, G. How Smartphone
accelerometers reveal aggressive driving behavior? The Key is the Representation. IEEE
Transactions on Intelligent Transportation Systems IEEE Trans. Intell. Transport. Syst. Intelligent
Transportation Systems, IEEE Transactions on. 21(8):3377-3387 Aug, 2020
Demir, E., Bektas, T., Laporte G., 2011. A comparative analysis of several vehicle emission models for
road freight transportation. Transportation Research Part D (16), 347–357
Demir, E., Bektas, T., Laporte G., 2014. A review of recent research on green road freight transportation.
European Journal of Operational Research (237), 775–793
Fafoutellis, P., Mantouka, E., & Vlahogianni, E. (2020, December 29). Eco-Driving and its impacts on Fuel
Efficiency: An overview of technologies and data-driven methods. Retrieved April 06, 2021,
from https://www.mdpi.com/2071-1050/13/1/226/htm#B18-sustainability-13-00226
Gao, P., Kaas, H., Mohr, D. (2014). Climate Change 2014: Mitigation of Climate Change. Contribution of
Working Group III to the Fifth Assessment Report of the Intergovernmental Panel on Climate
Change (C., Ed.). Retrieved September/October, 2020, from
https://www.ipcc.ch/site/assets/uploads/2018/02/ipcc_wg3_ar5_chapter8.pdf
Gao, P., Kaas, H., Mohr, D., & Wee, D. (2018, May 08). Automotive revolution – perspective towards 2030.
Retrieved October 16, 2020, from
https://www.mckinsey.com/industries/automotive-and-assembly/our-insights/disruptive-trends-that-will-transform-the-auto-industry/de-de
Gilman, E., Keskinarkaus, A., Tamminen, S., Pirttikangas, S., Röning, J., & Riekki, J. (2015, March 09).
Personalised assistance for fuel-efficient driving. Transportation Research Part C: Emerging
Technologies, Volume 58, Part D, 2015, Pages 681-705, ISSN 0968-090X
Harris, P., Houston, J., Vazquez, J., Smither, J., Harms, A., Dahlke, J., & Sachau, D. (2014, July 05). The
Prosocial and Aggressive Driving Inventory (PADI): A self-report measure of safe and unsafe
driving behaviors, Accident Analysis & Prevention, Volume 72, 2014, Pages 1-8, ISSN 0001-4575
Hooper, A., & Murray, D. (2018, October). An Analysis of the Operational Costs of Trucking: 2018
Update. Retrieved October 16, 2020, from https://truckingresearch.org/wp-
content/uploads/2018/10/ATRI-Operational-Costs-of-Trucking-2018.pdf
Houston, J. M., Harris, P. B., & Norman, M. (2003). The Aggressive Driving Behavior Scale: Developing a
self- report measure of unsafe driving practices. North American Journal of Psychology, 5, 193-
202.
Jaller, M., Sánchez, S., Green J., Fandiño M., 2015. Quantifying the impacts of sustainable city logistics
measures in the Mexico City Metropolitan Area. Transportation Research Procedia (12), 613–
626
Johnson, D.A.; Trivedi, M.M. Driving style recognition using a smartphone as a sensor platform. In
Proceedings of the 2011 14th International IEEE Conference on Intelligent Transportation
Systems (ITSC), Washington, WA, USA, 5–7 October 2011; pp. 1609–1615.
Lai, W. (2015). The effects of eco-driving motivation, knowledge and reward intervention on fuel
efficiency. Transportation Research Part D January 2015 34:155-160.
Amarasinghe, M., Kottegoda, S., Arachchi, A. L., Muramudalige, S., Dilum Bandara, H. M. N. and Azeez,
A., "Cloud-based driver monitoring and vehicle diagnostic with OBD2 telematics," 2015
Fifteenth International Conference on Advances in ICT for Emerging Regions (ICTer), Colombo,
Sri Lanka, 2015, pp. 243-249, doi: 10.1109/ICTER.2015.7377695.
Meirin, G., Myburgh HC. A Review of Intelligent Driving Style Analysis Systems and Related Artificial
Intelligence Algorithms. Sensors. 2015; 15(12):30653-30682.
https://doi.org/10.3390/s151229822
Moreno, E. (2014). índices de Precios en el Transporte por Carretera. Retrieved October 16, 2020, from
https://www.imt.mx/archivos/Publicaciones/PublicacionTecnica/pt424.pdf
Ping, P., Qin, W., Xu, Y., Miyajima, C.and Takeda, K., "Impact of Driver Behavior on Fuel Consumption:
Classification, Evaluation and Prediction Using Machine Learning" in IEEE Access, vol. 7, pp.
78515-78532, 2019, doi: 10.1109/ACCESS.2019.2920489.
Rodríguez, D. A., Targa, F., & Belzer, M. H. (2006). Pay Incentives and Truck Driver Safety: A Case Study.
ILR Review, 59(2), 205–225. https://doi.org/10.1177/001979390605900202
Rowe, R., Roman, G., McKenna, F., Barker, E., & Poulter, D. (2014, October 29). Measuring errors and
violations on the road: A bifactor modeling approach to the driver behavior questionnaire.
Accident Analysis & Prevention, Volume 74, 2015, Pages 118-125, ISSN 0001-4575,
https://doi.org/10.1016/j.aap.2014.10.012
Scania, C. (2014). Modeling the Relation Between Driving Behavior and Fuel Consumption. Retrieved
April 05, 2021, from https://www.cgi.com/sites/default/files/white-
papers/driving_behavior_and_fuel_consumption_white_paper.pdf
Shearer, C. (2000) The CRISP-DM Model: The New Blueprint for Data Mining. Journal of Data
Warehousing, 5, 13-22. Journal of Data Warehousing. Volume 5, Number 4, Fall 2000. Pages 13-
22.
Sims, R., Schaeffer, R., Creutzig, F., Cruz-Núñez, X., D’Agosto, M., Dimitriu, D., Figueroa Meza M.J.,
Fulton, L., Kobayashi, S., Lah, O., McKinnon, A., Newman, P., Ouyang, M., Schauer, J.J., Sperling,
D., and Tiwari, G., 2014: Transport. In: Climate Change 2014: Mitigation of Climate Change.
Contribution of Working Group III to the Fifth Assessment Report of the Intergovernmental
Panel on Climate Change [Edenhofer, O., R. Pichs-Madruga, Y. Sokona, E. Farahani, S. Kadner, K.
Seyboth, A. Adler, I. Baum, S. Brunner, P. Eickemeier, B. Kriemann, J. Savolainen, S. Schlömer, C.
von Stechow, T. Zwickel and J.C. Minx (eds.)]. Cambridge University Press, Cambridge, United
Kingdom and New York, NY, USA.
Toledo, T., Musicant, O., & Lotan, T. (2008, March 04). In-vehicle data recorders for monitoring and
feedback on drivers' behavior. Transportation Research Part C: Emerging Technologies, Volume
16, Issue 3, 2008, Pages 320-331, ISSN 0968-090X, https://doi.org/10.1016/j.trc.2008.01.001
Vaiana R., Iuele T., Astarita V., Caruso M.V., Tassitani A., Zaffino C., Giofre V.P. (2014, January 2). Driving
Behavior and Traffic Safety: An Acceleration-Based Safety Evaluation Procedure for
Smartphones. Modern Applied Science; Vol. 8, No. 1; 2014. ISSN 1913-1844
Velázquez, J., Fransoo, J., Blanco, E., Valenzuela, K., 2016. A new statistical method of assigning vehicles
to delivery areas for CO2 emissions reduction. Transportation Research Part D (43), 133– 144.
Vorndran, I. (2011, July). Unfallentwicklung auf deutschen Strassen 2010. Retrieved October 16, 2020,
from https://www.destatis.de/DE/Methoden/WISTA-Wirtschaft-und-
Statistik/2011/07/unfallentwicklung-2010-072011.pdf?__blob=publicationFile
Xu, Y., Li, H., Liu, H., Rodgers, M., & Guensler, R. (2016, October 24). Eco-driving for transit: An effective
strategy to conserve fuel and emissions. Applied Energy, Volume 194, 2017, Pages 784-797,
ISSN 0306-2619, https://doi.org/10.1016/j.apenergy.2016.09.101
Yuche, C., Lei, Z., Gonder, J., Young, S., Walkowicz, K. 2017. Data-driven fuel consumption estimation: A
multivariate adaptive regression spline approach. Transportation Research Part C: Emerging
Technologies, Volume 83, 2017, Pages 134-145, ISSN 0968-090X.
