
Science of the Total Environment 786 (2021) 147366

Contents lists available at ScienceDirect

Science of the Total Environment

journal homepage: www.elsevier.com/locate/scitotenv

Forecasting transitions in the state of food security with machine learning using transferable features
Joris J.L. Westerveld a,b,c,⁎, Marc J.C. van den Homberg c, Gabriela Guimarães Nobre e,f, Dennis L.J. van den Berg c,
Aklilu D. Teklesadik c,d, Sjoerd M. Stuit b
a TNO Defense, Security and Safety, the Netherlands
b Department of Experimental Psychology, Utrecht University, the Netherlands
c 510, an initiative of the Netherlands Red Cross, the Netherlands
d Vrije Universiteit Brussel, Belgium
e Institute for Environmental Studies (IVM), Vrije Universiteit Amsterdam, the Netherlands
f World Food Programme, Research, Assessment and Monitoring Division, Italy

Highlights

• Forecasting food insecurity is essential to be able to trigger early actions, for example, by humanitarian actors.
• Forecast monthly transitions in food security in Ethiopia using supervised machine learning and open data.
• The transferable model performs better when forecasting long term (7 months) compared to short term (3 months).
• Combining machine learning and open data can add value to existing consensus-based forecasting approaches.

Article info

Article history:
Received 30 November 2020
Received in revised form 14 March 2021
Accepted 22 April 2021
Available online 27 April 2021

Editor: Martin Drews

Keywords:
Food security
Early warning systems
Open data
Machine learning
Extreme gradient boosting
IPC

Abstract

Food insecurity is a growing concern due to man-made conflicts, climate change, and economic downturns. Forecasting the state of food insecurity is essential to be able to trigger early actions, for example, by humanitarian actors. To measure the actual state of food insecurity, expert and consensus-based approaches and surveys are currently used. Both require substantial manpower, time, and budget. This paper introduces an extreme gradient-boosting machine learning model to forecast monthly transitions in the state of food security in Ethiopia, at a spatial granularity of livelihood zones, and for lead times of one to 12 months, using open-source data. The transition in the state of food security, hereafter referred to as predictand, is represented by the Integrated Food Security Phase Classification Data. From 19 categories of datasets, 130 variables were derived and used as predictors of the transition in the state of food security. The predictors represent changes in climate and land, market, conflict, infrastructure, demographics and livelihood zone characteristics. The most relevant predictors are found to be food security history and surface soil moisture. Overall, the model performs best for forecasting Deteriorations and Improvements in the state of food security compared to the baselines. The proposed method performs (F1 macro score) at least twice as well as the best baseline (a dummy classifier) for a Deterioration. The model performs better when forecasting long-term (7 months; F1 macro average = 0.61) compared to short-term (3 months; F1 macro average = 0.51). Combining machine learning, Integrated Phase Classification (IPC) ratings from monitoring systems, and open data can add value to existing consensus-based forecasting approaches as this combination provides longer lead times and more regular updates. Our approach can also be transferred to other countries as most of the data on the predictors are openly available from global data repositories.

⁎ Corresponding author at: TNO Defense, Security and Safety, the Netherlands.
E-mail address: Joriswesterveld93@gmail.com (J.J.L. Westerveld).

https://doi.org/10.1016/j.scitotenv.2021.147366
0048-9697/© 2021 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Food security is reached when “the criteria that all people in a given region have physical and economic access to sufficient, safe, and nutritious food to meet their dietary needs and food preferences for a healthy and active life at all times, are met” (FAO, 2003). Unfortunately, food insecurity is a growing concern. Since 2015, the world has been experiencing an increase in the prevalence of undernourishment, and, in 2018 alone, approximately 1.3 billion people were food insecure at moderate levels (FAO, 2019; 2003). Climate variability and climate change (FAO, 2008), land degradation (Frelat et al., 2015; Holden and Shiferaw, 2004), man-made conflicts (Holleman et al., 2017), economic downturns or food price spikes (economic access) (Godfray et al., 2010; HLPE, 2011) all impact a region's food security. To end hunger and ensure access to food by all people, multiple political commitments have been established, such as the second goal (zero hunger) of the United Nations (UN) Sustainable Development Goals (UN, 2019). However, reaching the Sustainable Development Goal of zero hunger is challenging. Urgent action is needed if the goal is to be achieved by 2030.

Consequently, there is a need for more insights on how to anticipate, prevent, and respond to food security crises in a more effective and timely manner. Reducing the impact of an impending crisis through early action is not only important for saving lives and reducing suffering, but early action is also more efficient since it can yield significant cost savings of humanitarian aid (Guimarães Nobre et al., 2019). Given the widening gap between humanitarian needs and the funds available to respond (Development Initiatives, 2020), the possibility to better allocate scarce resources is essential. In terms of food security, this means one must not only monitor but also forecast the state of food security. However, forecasting the state of food security requires an extensive information system (Braimoh et al., 2018) that is capable of intelligently capturing the multidimensionality of food security-related phenomena (Barrett, 2010; Barrett, 2002; Headey and Barrett, 2015).

There are several approaches to monitor food security, which Jones et al. (2013) describe as tools that can: (1) provide national-level estimates of food security; (2) inform global monitoring and early warning systems; (3) assess household food access and acquisition; and (4) measure food consumption and utilization. Household surveys, which form input to categories 1, 3, or 4, are mainly used to measure several or single food security dimensions. From these surveys, well-known food security metrics are extracted, such as the Food Consumption Score, Household Dietary Diversity Score, the Coping Strategy Index, and the Household Food Insecurity Access Scale (Food Security Cluster, 2017; Jones et al., 2013). Despite providing detailed information on food security outcomes, these surveys are costly in terms of assets and time (Barrett, 2010). Consequently, they are often performed at a limited spatial scale, such as at the household or regional levels (Jones et al., 2013). Also, these surveys are used for providing insight into food consumption in the past given a certain period, and as such, they provide only limited insights on future food security crises. As a result, there is a need for developing predictive models for forecasting the state of food security.

Currently, the central system for food security monitoring and forecasting is the Famine Early Warning Systems Network (FEWS NET, 2019). The FEWS NET has been monitoring food security since 1985 (Funk et al., 2019). It provides information that stakeholders around the world consistently require for strategic decision-making about food security. The FEWS NET projections of food security are compliant with the guidelines established by the Integrated Phase Classification (IPC) Reference Table since 2011 (FEWS NET, 2011). Through the use of the IPC indicator, the complexity of the status of food security in a region can be translated in a generic yet rigorous and straightforward way. For determining a certain IPC class, FEWS NET builds technical consensus regarding the classification of acute food insecurity by engaging with relevant experts in every country they operate (FEWS NET, 2011). Experts converge on assigning a food security classification (Lentz et al., 2019) based on current and future information on different food security dimensions for several well-informed scenarios of food security. This early warning system has been tested in Ethiopia by Choularton and Krishnamurthy (2019), who showed that it was remarkably accurate although with mixed forecasting accuracy in situations of transition from food security to food crises. They also note that investments in data collection and analysis could likely yield improvements in the performance of the FEWS NET system. Therefore, the FEWS NET current forecast is not fully data-driven but relies largely on expert judgment and “most likely” scenarios. For driving the “most likely” scenarios, key assumptions based on critical factors are considered. However, the choices for key factors can increase a model's uncertainty and reduce its transparency. Therefore, producing a data-driven approach for quantifying key factors can strengthen FEWS NET's transparency, accuracy, and reproducibility, especially when using open data.

The use of big, preferably open-source, data in combination with predictive analytics could improve the existing monitoring and forecast systems of food security that rely on convergence-of-evidence methodologies and local stakeholders (Lentz et al., 2020). We note that the term “prediction” in predictive analytics refers to estimating the output for unseen input data. Often, models use input and output data that have the same temporal dimensions. Forecasting is a subdomain of prediction, whereby the output represents a state later in time than the input does (Döring, 2018). Several actors are currently exploring the use of predictive analytics for humanitarian purposes. OCHA (2021) lists in their catalogue for predictive models in the humanitarian sector, apart from FEWS NET, the Artemis Famine Action Mechanism (FAM) model from the World Bank, and the Hunger Map (World Food Program, 2021a) and Safety Nets Alert Program (SNAP) (World Food Program, 2021b), both from World Food Program. Also, academia is developing models that are not yet operationalized but may be in the future.

This section presents and discusses the state of the art of these humanitarian and academic models that give forecast information on one or a combination of food security dimensions. These dimensions are availability, access, utilization, and stability (Lentz et al., 2019). Some models are production-focused. They forecast a derivative of availability, such as the yield of the main crop in a country. Guimarães Nobre et al. (2019) developed a forecast model using Fast-and-Frugal Trees to unravel relationships between climate variability, vegetation coverage, and maize yields (the predictand) at multiple lead times and at the district levels in Kenya. Biffis and Chavez (2017) also developed a model to predict crop yield in relation to weather risk insurance in Mozambique.

Others are focusing on one of the other dimensions of food security, i.e., access, utilization, and stability. For example, Mwebaze et al. (2010) and Okori and Obua (2011) used causal structure discovery and several machine learning classifiers to predict famine at the household level (utilization) using one relatively small household-level dataset. We note that they developed a prediction model but not a forecasting model. The World Food Program (WFP) Hunger Map is a platform that uses streams of public data from multiple sources to create a


complex dynamic subnational portrait of world hunger in 63 countries. It is mostly a monitoring or nowcasting system with daily updates. In places where the data is missing, machine learning algorithms such as Xgboost are used to make extrapolations. Another monitoring system from WFP is the SNAP, which can detect (abnormal) rises in food prices. Lentz et al. (2019) used linear and log-linear models to forecast, individually, three food security dimensions at different spatial aggregation levels. Their model was limited to one lead time (2 months earlier than the quarterly or semi-annual IPC actual value release) and trained on data from only 1 year for Malawi. In principle, they state that their model is replicable and can model other countries. Finally, a limited number of models are more holistic since they forecast the overall IPC classification, which uses information related to both availability and access. Andrée et al. (2020) used machine learning to forecast food security transitions as part of the Artemis FAM program. They tried simple machine learning models, such as linear regression, up to more complex ones. Random Forest performed best in their case. They deliberately did not use lagged values of the IPC ratings as predictors as they wanted to update forecasts monthly, independently of whether recent IPC ratings are available (Andrée et al., 2020). They used as a benchmark the FEWS NET outlooks.

For all of the models above, remote sensing data is often used to create the predictors in relation to mostly the environmental and climate variability drivers of food security. For example, Zargar et al. (2011) give an overview of different drought-related indices whereby each index uses remote sensing data processed to a greater or lesser extent. Recent research combines machine learning or statistical learning with remote sensing data to provide better monitors or forecasts for one specific driver. Boult et al. (2020) developed the TAMSAT Alert, to provide forecasts of seasonal mean soil moisture and the water requirement satisfaction index.

As we have explained, the research field of food insecurity prevention, preparedness, and early action is evolving at a fast pace. Our analysis of related work shows that artificial intelligence in the form of machine learning, and open and big data, are being used more and more. However, this is often for monitoring or nowcasting systems. There is still little research into a forecasting system that is able to detect, for different lead times, holistic changes in the state of the food security and that is robustly trained on long time series. Also, benchmarking is often done in a very limited manner.

Building upon earlier research from van der Heijden et al. (2018), our research hypothesized that a supervised machine learning algorithm can forecast whether the state of food security improves, remains the same, or deteriorates (hereafter called “change events”) within a livelihood zone. Furthermore, we hypothesize that a machine learning algorithm will perform better than heuristic-based baselines but also than a dummy classifier. This model should also be transparent and scalable to other countries (by using a large number of predictors that are openly available from global data repositories). This means that the methodology implemented in one country can be transferred to other countries. This study has Ethiopia as a proof-of-concept, a country in which food security transitions are often observed. For instance, food security deterioration is regularly observed among pastoral and agropastoral communities inhabiting the northeastern and southeastern regions, whereas a stable state of food security is regularly observed in the western highlands. Moreover, central zones can often bounce between improvement and deterioration in the state of food security (Choularton and Krishnamurthy, 2019).

This paper is organised as follows. First, the materials and methods are described, which contain information about the data and model specifications. Next, the results for the overall performance, long- and short-term forecasting and spatial performance are presented. The last section discusses the performance of the model and its contribution towards food security.

2. Materials and methods

To forecast transitions in food security, a new approach has been developed that is depicted in Fig. 1. First, we collected scalable datasets at monthly intervals to be used as predictors of change events (Fig. 1, Section 2.1.1) and derived the target variable for classification (change event; Fig. 1, Section 2.1.2). Second, we preprocessed the datasets by carrying out three sub-steps: imputation and processing (Fig. 1, Section 2.2.1), feature engineering (Fig. 1, Section 2.2.2), and class imbalance (Fig. 1, Section 2.2.3). Third, we applied three machine learning algorithms, “Extreme Gradient Boosting”, “Random Forest”, and “CatBoost” (Fig. 1, Section 2.3.1), tuned the hyperparameters, and validated the best model (Fig. 1, Section 2.3.2). Lastly, we compared our results to several baselines (Fig. 1, Section 2.3.3).

2.1. Input and output data

2.1.1. Input variable

Misselhorn (2005) and Connolly-Boutin and Smit (2016) describe several drivers for food security in the biophysical, political, and socioeconomic domain. For our approach, we have used a range of different drivers: markets, conflict, climate and land, infrastructure, demographical variables, and livelihood zone characteristics. For a complete overview, see Supplementary Table A.1. A livelihood zone is defined as a geographical area within which people share the same patterns of access to food and income (that is, they grow the same crops, or keep the same types of livestock), and have the same access to markets (Household Economy Approach, 2008). In consequence, we created an inventory of online open-source geospatial data repositories at the spatial scale of livelihood zones, described below.

First, in terms of the market drivers, we collected data on food market prices from the World Food Program for the three most traded products in Ethiopia, namely wheat, maize, and sorghum (World Food Program, 2020). We used the average monthly prices per administrative level 1 (both retail and wholesale) of different food markets from 2010 to 2018. We rasterized the information available at administrative level 1, and subsequently extracted the average monthly prices at the livelihood zone scale. For an overview of all the markets used to calculate the average mean price, see Supplementary Table A.2. Food market prices can hold relevant information about food accessibility (Godfray et al., 2010). In addition, we extracted the price volatility from the WFP repository information, which is known to influence an individual's purchasing power (HLPE, 2011). Second, we collected conflict data from Uppsala University (2018), which is a highly relevant driver of the rise in food insecurity (FAO et al., 2017; Food Security Information Network, 2018).

Third, we used satellite imagery stored in the Google Earth Engine (GEE) Database for the climate and biophysical indicators. The GEE data archive contains climate, biophysical indicators, and demographical variables, which are already processed. For example, the model used the Normalized Difference Vegetation Index (NDVI) data set, which calculates its value using Near-infrared and Red bands (Google, 2020). The NDVI, precipitation, and soil moisture were shown to be important drivers of food availability in past studies (Frelat et al., 2015; Holden and Shiferaw, 2004). Supplementary Table A.1 contains references that describe the climate and biophysical predictors extracted from the GEE. Fourth, the GEE database also contains datasets that represent a proxy for infrastructure, such as the Global Friction Surface and Accessibility to Cities datasets (Weiss et al., 2018). Information concerning the available infrastructure can indicate the connectedness and access of people to markets (Godfray et al., 2010; Rosegrant and Cline, 2003). We used a proxy dataset for travel speed, measured in minutes required to travel 1 m, which we derived from data on roads, railways, and terrain types. Closely related, we used an accessibility map that expresses the land-based travel time to the nearest densely-populated area. Furthermore, we retrieved population count data


Fig. 1. Flowchart of the methodological framework divided into three steps: (i) collection of input and output data; (ii) data preprocessing; and (iii) modelling. ADASYN refers to the
Adaptive Synthetic sampling approach.

given that population growth and density may affect the labour force and household welfare as more people have to compete for labour and share the outcome of a constant land area (Holden and Shiferaw, 2004; Sheffield et al., 2014). We obtained information about past humanitarian assistance to the regions from the Famine Early Warning Systems Network (2011) IPC classes, whose shapefiles indicate whether the IPC phase classification would likely be at least one phase worse without current or programmed humanitarian assistance. Lastly, we have also included livelihood zone characteristics which contain information about the characteristics of an area. For example, some livelihood zones were identified as urban, while others were pastoral. The information has been further processed and extracted within the livelihood zone (Fig. 2). Supplementary Table A.1 summarizes all key input datasets for our selection of food security drivers.
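
As a small illustration of one satellite-derived predictor mentioned in this section, NDVI is conventionally defined from the near-infrared and red bands as (NIR − Red) / (NIR + Red). The plain-Python sketch below only illustrates that formula; it is not the GEE processing chain used for the study, and the function name `ndvi` and the reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for a single pixel.

    nir, red: reflectance in the near-infrared and red bands.
    Returns None when both bands are zero, where the index is undefined.
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


# Hypothetical reflectance values, for illustration only.
dense_vegetation = ndvi(nir=0.50, red=0.10)  # strong NIR reflectance
bare_soil = ndvi(nir=0.30, red=0.25)         # similar NIR and red reflectance
```

Values close to 1 indicate dense green vegetation, while values near 0 indicate bare or sparsely vegetated surfaces.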


Fig. 2. Livelihood zones in Ethiopia used to aggregate the different datasets. In each of these homogeneous zones, people share broadly the same pattern of livelihood. The area of a livelihood zone varies between 230 and 69,176 km². The coordinate reference system EPSG:3857 is used to create this map.
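
The aggregation of datasets per livelihood zone mentioned in the caption can be pictured as a zonal statistic: each raster pixel carries a zone identifier, and pixel values are reduced to one number per zone, for example with a mean, median, or sum. The following is a minimal plain-Python sketch under those assumptions, not the authors' pipeline; `aggregate_by_zone`, the zone codes, and the pixel values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

# Supported reducers; the names are illustrative assumptions.
REDUCERS = {"mean": mean, "median": median, "sum": sum}


def aggregate_by_zone(zone_ids, values, how="mean"):
    """Reduce pixel values to a single number per zone.

    zone_ids: zone identifier of each pixel.
    values:   pixel values in the same order (None marks a missing pixel).
    how:      one of "mean", "median", "sum".
    """
    reducer = REDUCERS[how]
    pixels = defaultdict(list)
    for zone, value in zip(zone_ids, values):
        if value is not None:  # skip missing pixels
            pixels[zone].append(value)
    return {zone: reducer(vals) for zone, vals in pixels.items()}


# Hypothetical pixels: two zones, one missing value.
zones = ["Z1", "Z1", "Z2", "Z2", "Z2"]
rain = [10.0, 14.0, 3.0, None, 5.0]
zonal_mean = aggregate_by_zone(zones, rain, "mean")
zonal_sum = aggregate_by_zone(zones, rain, "sum")
```

Which reducer is appropriate depends on the variable: a sum suits counts such as population, whereas a mean or median suits intensities such as rainfall or NDVI.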

2.1.2. Output variable

For the target variable, we used the current situation Integrated Phase Classification (IPC) class (or the actual observed IPC) from FEWS NET (2011) during the period of 2010 to 2018 to differentiate between three different transitions (Eq. (1)).

$$CE(t) = -\left(IPC(t+n) - IPC(t)\right) \tag{1}$$

where t is the current month, n the lead time to be forecasted ahead, to be chosen from one to 12 months, and CE (Change Event) defines the change in IPC class. As a higher IPC class corresponds to a worse food security state, this definition identifies positive CE values with Improvement transitions, negative with Deteriorations, and zero values with No Change. The IPC class is available in shapefiles at the FEWS NET.

2.2. Data pre-processing

2.2.1. Imputation & processing

Open-source data created the need to deal with missing values in both input and output datasets. The method used linear interpolation to fill in the missing values. The importance of this step is twofold. First, IPC values were released every January, April, July, and October from 2010 until the end of 2015. However, in 2016, the release date shifted to February, June, and October. Because of this temporal shift, we rounded a linear interpolation to make it possible to use the IPC class data from the two different periods (an IPC value of 2.33 would be rounded to 2, and a value of 2.5 would be rounded to 3). Second, given that we interpolated the IPC values every month, we used monthly input datasets instead of only for particular periods. Also, we linearly interpolated missing values at the temporal scale for the food market prices, population count, NDVI, precipitation, and soil moisture. Furthermore, we aggregated the different datasets on the livelihood scale. Given that the population count, NDVI, rain precipitation, and soil moisture were all data based on satellite imagery, we further processed these datasets by aggregating each pixel using either the mean, median, or sum for each livelihood zone. The conflict dataset from the Uppsala University (2018) reports the number and the duration of conflict events in Ethiopia at a yearly aggregated level. We assumed that the number of fatalities for each conflict event were uniformly distributed over its duration. Regions without conflict reports were labelled as 0. Furthermore, we also used several data sets which had static values per month per region. Specifically, the data sets that contained static values are elevation, accessibility to cities, a friction map, and livelihood zone characteristics. For the cases where no data were available on the type of livelihood zone, we considered it an “unknown” zone type. The same applied to the main stock and crop data set (see Supplementary


Table A.1). If a certain stock or crop was missing in a region, we filled it with a decision problem. Second, these methods allow us to use differ-
and made this a separate category (labelled as −1). ent types of variables, such as binary and continuous, together without
any data transformation. Third, tree ensemble methods can identify the
2.2.2. Feature engineering: creating new variables from existing predictors most important variables. The most important variable can both reveal
To increase the performance of the machine learning approach, fea- which variables are relevant for the prediction but also functions as a
ture engineering was used. This allowed us to extract additional charac- type of feature selection. Less relevant variables won't be selected
teristics from the datasets (see Supplementary Table A.1). In total, we during splits and won't be used in the model. For more information
created 50 additional variables using feature engineering. We intro- about the Xgboost, CatBoost or Random Forest we refer to Chen and
duced three new dummy variables for the different rainy seasons of Guestrin, 2016), Dorogush et al. (2018), and Breiman (2001).
Ethiopia: the Belg (February to May), Kiremt (June to September) and
Bega (October to January). For instance, if a month was in the Belg sea- 2.3.2. Hyperparameter tuning
son, the row had a value of 1, otherwise, it was 0. Next, we also included To compare the different ensemble methods fairly, hyperparameter
time lags for the IPC and a binary variable indicating whether a region is tuning was carried out (see Fig. 3). Tuning the hyperparameters allowed
receiving humanitarian support, which resulted in 12 new variables. To us to identify the model with the best performance that generalizes well
create this, we shifted each variable with a period i from 1 to 6 months. to unseen data. In our case, a holdout validation strategy was used. With
We did not include the time lags for NDVI, rain precipitation, soil mois- this strategy, a part of the dataset, referred to as the hold out set, was put
ture, and food market prices. Instead, we used a rolling mean metric. In aside to be used later to test the performance of the model. The whole
this case, we used the mean value per variable for the four preceding data set was split into a training set ranging from 2010-01-01 until
months. Also, we created three different binary variables from the 2016-01-31 and the hold out set from 2016-02-01 until 2018-06-01.
change events. These three variables are a cumulative sum of the The hold out set was left untouched and was completely independent
times a livelihood zone has, in the previous period, deteriorated, im- of the training and tuning of the model. A training set is needed to
proved, or not changed with regard to food security. By using these train the model by giving it a set of examples (sample of the data)
three variables, the model uses more historical context. We also in- (Ripley, 1996, p. 354). The hyperparameter tuning was done for the 4-
cluded the cumulative sum of the fatalities per livelihood zone to in- month interval change event, which corresponded to the lead time of
clude the historical background of conflict and stability per livelihood the FEWS NET forecasts.
zone. Lastly, we also included a binary variable on whether market Hyperparameters were tuned before training and testing the model. To
prices increased or decreased compared to the previous month for tune the model without introducing bias, the training set was used in a
both retail and wholesale prices. In the end, this resulted in 130 vari- grid-search time-series cross-validation to find the best hyperparameters
ables distributed over 19 different predictor types (for a complete list for the three different ensemble methods. Grid search is a method in
see Supplementary Table A.1). which a set of different combination of hyperparameters are given to the
model. A repeated 10-fold time series cross-validation was used. Using
2.2.3. IPC class imbalance this cross-validation, the train set gets larger with each consecutive split
The data collection and processing described above resulted in a that was selected beforehand. The size of the test window remains the
somewhat imbalanced dataset. For instance, the dataset with the same but slides over until the last 10th split. For each k-step split, we
change events calculated for the 7-month lead time contained 10,647 resampled the imbalanced training set using the ADASYN algorithm,
No Change transitions, 2444 Deterioration transitions, and 2840 Im- which we explained briefly in the class imbalance Section 2.2.3. Using a re-
provements transitions in total. To counter this imbalance, we used peated times series cross-validation is important for two reasons. First, to
the Adaptive Synthetic sampling approach (ADASYN) by He et al. find a robust model, it is essential to train and test on different periods.
(2008). This technique uses a weighted distribution for the minority Secondly, respecting the timeline is of vital importance for preventing in-
class examples according to their difficulty in learning. In other words, formation leakage from future data points.
classes that are harder to learn get more synthetic data generated The selection of hyperparameters for each ensemble method is
than classes that are easier to learn (He et al., 2008). As suggested by based on finding the highest performance score (F1 macro), while also
He et al. (2008), by using ADASYN we not only reduce the bias which finding a model that generalizes well (e.g., does not overfit). Thus, the
can be introduced by the class imbalance but also shift the decision performance of the model was quantified by averaging the F1 macro
boundary to the more difficult examples. This allows the machine learn- score over the ten splits. The F1 score is the harmonic mean between
ing model to be more effective in identifying the underrepresented clas- the precision and recall metric. Precision is the proportion of positive
ses. It is of crucial importance to detect a change from food security identifications that were correct (Google, 2019). Thus, it identifies the
towards food insecurity for early warning purposes. proportion of true positives compared to all positives (true and nega-
tive). Recall is the proportion of positives that were identified correctly
2.3. Model (Google, 2019). Thus, it identifies the proportion of true positives com-
pared to samples it should have found (true positives and false nega-
2.3.1. Model specifications tives). The F1 score is used because it penalizes for false positives as
To find out which machine learning algorithm performs the best, we well as false negatives and is less sensitive to class imbalance. We
compared three different popular ensemble methods. In our case, we used the macro variant of the F1 score because it weights all classes equally and thus gives relatively more
compared a Random Forest, an Extreme Gradient Boosting Algorithm weight to the less frequent classes. The reasoning behind this choice
(Xgboost), and a CatBoost algorithm. The methods are based on the was that we assumed that the detection of Deteriorations and Improve-
wisdom of the crowd principle, which means that the aggregation of a ments was more important than detecting the majority class (No
group of predictors (like decision trees) often performs better than Change).
the best individual predictor (Géron, 2019). Using a tree ensemble Using this strategy, we calculated two different F1 macro scores
method has several advantages in this use case. First, tree ensemble during hyperparameter tuning. From the 10-fold time series cross-
methods avoid multicollinearity, thus inhibiting redundant information validation split we calculated a training score and validation score. In
among the predictors. This is because tree-based ensemble methods this case, the training score was the average training score over the 10
split on the best candidate. If two variables (X and Z) are highly corre- splits for a candidate model using the training data from each split.
lated, the tree will only use one of them. For instance, if the tree uses The training score describes how well the model performs after training
variable X for making a decision (split), variable Z will not be subse- on the training set itself (already seen data). The validation score is how
quently used given that the information contained in Z does not help well the candidate model performs using the test data from each split

J.J.L. Westerveld, M.J.C. van den Homberg, G.G. Nobre et al. Science of the Total Environment 786 (2021) 147366

Fig. 3. An illustration of our chosen time series cross-validation strategy. The data is first split into a training and hold out set. Afterwards, the training set is used in a 10-fold time series
cross-validation.

and quantifies how well candidate models performed. The smaller the in a specific month. The last model (called RO) used the assumption
difference between the training and validation score, the better the that the most recent observation for a livelihood zone, taking into account
model generalizes. After selecting the hyperparameters for each the months, is a good forecast for the future (see the Supplementary
method (see Supplementary Table A.3 for the selected values) with Table A.4 for a summary).
the best balance between performance (highest validation score) and
generalization (smallest difference between the train and validation 3. Results
score), we used these hyperparameters to compare the best performing
candidate model from each ensemble method. 3.1. Model comparison
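The selection criterion described above (a high mean validation F1 macro combined with a small gap between training and validation scores) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `f1_macro`, `expanding_window_splits` and `generalization_gap` are hypothetical, and the expanding-window split is only one common way to respect the timeline in time series cross-validation.

```python
def f1_macro(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (every class counts equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


def expanding_window_splits(n_samples, n_splits=10):
    """Yield (train, test) index ranges in which training data always precede
    the test data, so no information leaks from future data points.
    Assumes the samples are already sorted by time (hypothetical helper)."""
    if n_samples < n_splits + 1:
        raise ValueError("need at least n_splits + 1 samples")
    fold = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train_end = fold * i
        test_end = fold * (i + 1) if i < n_splits else n_samples
        yield range(0, train_end), range(train_end, test_end)


def generalization_gap(train_scores, val_scores):
    """Return mean training score, mean validation score and their difference."""
    mean_train = sum(train_scores) / len(train_scores)
    mean_val = sum(val_scores) / len(val_scores)
    return mean_train, mean_val, mean_train - mean_val
```

A candidate model would be fitted on each training range, scored with `f1_macro` on both the training and the test range of that split, and summarized with `generalization_gap`; a smaller gap suggests better generalization.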

2.3.3. Validation In Fig. 4 we show the performance of three tuned tree-ensemble


Using the selected ensemble method and hyperparameters found, methods for forecasts made 4 months ahead. This comparison was
we then retrained on the full training set (which is upsampled using done by executing 100 runs per method. We found that the Xgboost
ADASYN) and tested on our test set to estimate how well our model per- performs the best compared to the Random Forest and CatBoost. A
formed at different time intervals. This produced the test score from the Welch t-test between the three normal distributions of algorithm scores
hold out set, which was put aside. To validate the model, we also looked confirmed this since the differences between the F1 score of both the
at the geographical aspects: which livelihood zones were forecasted Xgboost and CatBoost (t (198) =19.86, p < 0.001) and the Xgboost
more and less accurately. We first calculated the F1 score per livelihood and Random Forest (t (198) =86.33, p < 0.001) are significant. Interest-
zone. Furthermore, we tested how the model performed over different ingly, the random forest had the smallest difference between the train-
lead times. To interpret model performance, we compared the selected ing and validation score. Nevertheless, since the Xgboost was very close
ensemble method to a number of baselines. These baselines were com- to the random forest score on this metric, and the performance of the
posed of simple heuristic models or models based on chance and thus Xgboost in the validation set was significantly higher, we further ex-
could not be tuned. plored the results of the Xgboost.
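The Welch t-test used to compare the score distributions of the ensemble methods can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: `welch_t_test` is a hypothetical name, and the p-value is left out because it needs the Student t distribution, which the standard library does not provide. With 100 runs per method, the Welch degrees of freedom can be at most 198, which they reach when both groups have equal variance.

```python
import math
from statistics import mean, variance


def welch_t_test(a, b):
    """Return the Welch t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    na, nb = len(a), len(b)
    # Squared standard errors of the two sample means.
    se_a = variance(a) / na
    se_b = variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (na - 1) + se_b ** 2 / (nb - 1))
    return t, df


if __name__ == "__main__":
    # Made-up scores for two methods, for illustration only.
    scores_a = [0.61, 0.63, 0.62, 0.64, 0.60]
    scores_b = [0.55, 0.57, 0.56, 0.58, 0.54]
    print(welch_t_test(scores_a, scores_b))
```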
To demonstrate the added value of the model's performance, we cre-
ated several comparison models to serve as a baseline for performance. 3.2. Model performance based on the hold out set
The first baseline (DCS) was determined by a heuristic model based on
the dummy classifier from the python package scikit-learn (Pedregosa We evaluated the performance of the Xgboost (XGB) model against
et al., 2012). This model generates forecasts by respecting the training the baseline models based on 100 runs. Subsequently, we calculated
set's class distribution (Pedregosa et al., 2012). The second heuristic the 95% confidence intervals of the F1 score for the Xgboost approach
model (HN) used the historical norm, assuming that the historically and the DCS approach. Note that confidence intervals are not applicable
most occurring situation per livelihood zone is sufficient to forecast to the other approaches as they are deterministic by nature. We found
the CE in the next period. The third heuristic model (referred to as FP) that the Xgboost model generally outperformed all other models
assumed that the future equals the present. Therefore, the most recent (Fig. 5A). Moreover, the Xgboost model performed at least twice as
history of a livelihood zone forecasts the state of that livelihood zone in the future. well as the best baseline performance on Deterioration events (Fig. 5B)
Note that if there is a No Change event, this means that the future equals and almost five times as well on Improvement events (Fig. 5D). Only
the present. The fourth model (HNT) took the temporal aspect into ac- on No Change events could the baseline performance surpass the
count by counting the historically most-frequently occurring situation Xgboost model (Fig. 5C). This pattern was often expected given that


the other hand, a 3-month interval is more likely to trigger action and
therefore is important to understand the variables generating this
early warning signal.

3.3. Long-term forecasting: 7-month interval

This section takes a more detailed look into the performance and
most important variables of the Xgboost model when forecasting 7
months into the future on the hold out set. In Fig. 6, we show the ratio
of correct forecasts per change event. In contrast to all other approaches,
the Xgboost model showed a clear and strong diagonal pattern in its
confusion matrix that visualizes the performance of the algorithm
(Fig. 6), as expected for a well-performing model. In other words, the
closer the diagonal numbers are to 1, the better the model perfor-
mance. Specifically, this figure shows that of all Improvement cases in
the hold out set, it predicted a ratio of 0.77 (77%) correctly. The number
of cases per class is listed in Table 1. This means that the Xgboost
model was able to forecast all three classes of change events. For an
overview of a complete performance table (including metrics other
than the F1 Macro), please see Supplementary Table A5. Furthermore,
we investigated in more detail the performance of the model by identi-
fying which specific underlying IPC class transition was being predicted
Fig. 4. The F1 Macro performance of the different ensemble methods. This figure correctly (using the F1 Macro). For instance, we could identify how
reveals that the Xgboost performs the best out of the three, while the difference many times it correctly predicted an Improvement; this could, for exam-
between the train and validation score is also low (i.e., the model generalizes well).
ple, be from 3 (the initial IPC class) to 2 or 1. The results from this are shown in
Table 1. The results show that the model predicted most of the Improve-
we applied techniques to balance the data set, which in turn led to a per- ment cases, especially when there were Improvements from an initial
formance increase for the minority class and a performance decrease for IPC value 3 or 4. The model did not predict the scarce cases (20) cor-
the majority class. Since performance appeared to stabilize around the rectly when an IPC went from a 3 to a 4 (a Deterioration). However,
7-month interval, we further explored the model in two timescales: the model has performed relatively well in detecting mild levels of
up to 3 months and up to 7 months. These time intervals can potentially Deterioration (0.68: from an initial IPC 1 to a higher class) at this long
support short- and long-term planning of humanitarian interventions. lead time.
However, in practice, a 7-month early warning for potential food insecu- To score the importance of each variable, and thus identify which
rity would require additional monitoring of the food security status. On variables are the most important for our model, we used the feature

Fig. 5. Average performance of the models, bootstrapped over 100 runs whenever possible, for different forecast intervals. Fig. 5A shows the overall performance expressed as the F1 Macro,
including, when applicable, the 95% confidence intervals (note that the small interval size makes them hard to see). The Xgboost classifier (XGB) performed better for each forecast
window when compared to the baselines. The performance increased as the forecast interval expanded. Figs. 5B–D show the performances for specific classes in more detail. Note that
while the relatively uninteresting class No Change Event decreased in performance for longer intervals, the classes Improvement and Deterioration both increased.
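The heuristic baselines that the Xgboost model is compared against in Fig. 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record layout `(zone, month, event)` and all function names are hypothetical, and the Future-equals-the-Present baseline is assumed to reduce to always forecasting a No Change event.

```python
from collections import Counter, defaultdict


def fit_historical_norm(history):
    """HN sketch: most frequent change event per livelihood zone.
    `history` is an iterable of (zone, month, event) tuples (assumed layout)."""
    counts = defaultdict(Counter)
    for zone, _month, event in history:
        counts[zone][event] += 1
    return {zone: c.most_common(1)[0][0] for zone, c in counts.items()}


def fit_historical_norm_by_month(history):
    """HNT sketch: most frequent change event per (zone, calendar month)."""
    counts = defaultdict(Counter)
    for zone, month, event in history:
        counts[(zone, month)][event] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def future_equals_present(n_forecasts):
    """FP sketch: if the future equals the present, nothing changes."""
    return ["No Change"] * n_forecasts


if __name__ == "__main__":
    # Made-up records for a single hypothetical zone "A".
    history = [
        ("A", 1, "No Change"), ("A", 2, "No Change"), ("A", 3, "No Change"),
        ("A", 1, "Deterioration"), ("A", 1, "Deterioration"),
    ]
    print(fit_historical_norm(history))
    print(fit_historical_norm_by_month(history))
```

Because these baselines only count past events, they have no hyperparameters to tune, which matches the statement that the baselines could not be tuned.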


Fig. 6. The y-axes show the true classes Deterioration (D), No Change Events (NCE) and Improvement (IM) for 7-month lead time. On the x-axes are the forecasted classes. In each cell, we
show the ratio of correct decisions for each combination, normalized across rows such that each row sums to 1. Note that the Xgboost model was the only one that showed a clear
raised diagonal, where the forecasted class is most often the same as the true class, as one would like to see in a confusion matrix. For the absolute values please refer to Fig. B.1 in the
supplementary materials.
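The row normalization described in this caption can be sketched as follows. This is a minimal standard-library illustration with made-up labels; `row_normalized_confusion` is a hypothetical name. After dividing each row by the number of true cases of that class, the diagonal entries equal the per-class recall, which is why a raised diagonal indicates a well-performing model.

```python
def row_normalized_confusion(y_true, y_pred, labels):
    """Confusion matrix with true classes as rows and forecasted classes as
    columns, where every non-empty row is scaled to sum to 1."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


if __name__ == "__main__":
    labels = ["D", "NCE", "IM"]
    y_true = ["D", "D", "NCE", "IM", "IM", "IM", "IM"]
    y_pred = ["D", "NCE", "NCE", "IM", "IM", "IM", "D"]
    for label, row in zip(labels, row_normalized_confusion(y_true, y_pred, labels)):
        print(label, row)
```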

Table 1
The left side of the table shows the specific F1 macro score per transition for the different initial IPC classes for the 7-month interval. The right side shows the distribution of the dif-
ferent transitions per class. This table reveals that the model correctly predicted most Improvement transitions from initial IPC classes 3 and 4.

                              Correctly identified cases (F1 Macro)        Distribution of cases
                              in the hold out set                          in the hold out set
Initial IPC class             Deterioration  No change  Improvement        Deterioration  No change  Improvement
1                             0.68           0.91       NA                 406            1709       0
2                             0.19           0.30       0.41               183            882        347
3                             0.0            0.28       0.88               20             59         287
4                             NA             NA         0.81               0              0          45
5                             NA             NA         NA                 0              0          0
Total cases                                                                609            2650       679
Percentage of the data set                                                 15%            67%        17%


importance tool from scikit-learn, for 100 Xgboost models using a dif- different livelihood zones using the 7-month lead time model over 100
ferent random seed for each model. Next, we aggregated the mean im- runs (Fig. 10A). The model identified regions in northwest Ethiopia
portance values of each variable based on these 100 models (Fig. 7). more easily (thus a higher F1). These are high potential cropping high-
Since there are many variables included in the model, we only showcase land regions with stable state food security levels, which the Xgboost
the top 20 highest-scoring feature-importance variables. Results forecasted accurately as No Change events of IPC class 1. The darker re-
showed that the most important variables reflect the IPC situation de- gions are more difficult to forecast compared to regions that have a ligh-
scriptors at the time point from which the forecast is made. ter colour. The lowest performance was observed largely in the Afar
Region, a predominantly pastoral region with consistently high levels of
3.4. Short term forecasting - 3-month interval recurring food crisis (Choularton and Krishnamurthy, 2019). Further-
more, the model performance over the large pastoral areas Somali and
Comparing the scores from the Xgboost model to the baselines Oromia showed moderate to poor performance. Besides experiencing re-
showed a similar pattern as with the 7-month interval analysis. Namely, current conflict and high levels of dryness, several pastoral communities
the Xgboost model predicted Deteriorations and Improvements better in the Somali region have also had to deal with declining levels of rainfall
than the baselines (see Fig. 8). The Xgboost model performed worse due to climate change. Overall, forecasting the status of food security in
when predicting Deteriorations for the 3-month interval compared to this region is a challenging task. To better understand why change events
the 7-month interval analysis. A performance table (including metrics in some areas can be forecasted better than in others, it was essential to
other than the F1 Macro) is included in Supplementary Table A.6. look closer at the training data. Specifically, it was important to analyse
Looking at the specific transitions that it predicted correctly (using whether there was a relationship between the number of change events
the F1 macro score), Table 2 revealed that the pattern is similar to the for a given area and algorithm performance. Note that more change
7-month interval analysis. That is, the model predicted Improvements events allow for better training of the model. Results showed a negative
from the initial IPC classes 3 and 4 (albeit a bit worse than the 7- relation between the number of change events and the performance
month interval model) very well. This model detected No Changes bet- per livelihood zone (Pearson r (368) = 0.605, p < 0.001; figure results
ter than the 7-month interval. Note that the total number of cases for 10B). This higher performance was expected since we have previously
the 3-month and 7-month intervals differ slightly, simply because observed that No Change events are forecasted more often correctly
there are fewer 7-month intervals in a given time window than there (Fig. 5).
are 3-month intervals, and hence in our dataset.
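The Pearson correlation reported for the per-zone analysis can be sketched as follows. This is a minimal standard-library illustration with hypothetical function names, not the authors' code. The degrees of freedom of the test are n - 2, so a reported r(368) corresponds to 370 paired observations.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    df = n - 2
    return r * math.sqrt(df / (1 - r ** 2))


if __name__ == "__main__":
    # Made-up per-zone values, for illustration only.
    share_no_change = [0.9, 0.8, 0.6, 0.5, 0.3]
    f1_per_zone = [0.85, 0.70, 0.65, 0.40, 0.35]
    print(pearson_r(share_no_change, f1_per_zone))
```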
In addition, Fig. 9 displays 20 key variables selected by the model using 4. Discussion
the same method as for the 7-month interval. These variables were found
to be important for the short-term transitions of the state of food security. In this paper we introduced a machine learning model to forecast
We observed that a range of climate and biophysical predictors were now food security transitions for the livelihood zones in Ethiopia with differ-
being used, which demonstrate that intra-seasonal weather variability has ent lead times using open-source data. Our model showed higher per-
strong links with the state of the food security in Ethiopia. Furthermore, formance on the F1 macro score relative to six baselines and two
we observe that variables that reflect soil moisture levels had the other machine learning algorithms. Our model forecasts three specific
most predictive value for the upcoming state of food security. Lastly, levels of change events with long‑lead time well: Deteriorations (from
we found that the IPC situation from the present and past also plays IPC 1 to 2 or minimal to stress), No Change (IPC 1 or minimal) and
an important role. Improvements (from IPC 4 to 3, or emergency to crisis, and IPC 3 to 2,
or crisis to stress). Forecasting the onset of stressed levels of food inse-
3.5. Spatial variations for the 7-month lead time curity 7 months ahead of time means that more preventive actions
can be put in place to avoid the rise in food insecurity. Furthermore,
To identify regions where the model performed well, we assessed the model mostly chose predictors that reflect the vulnerability of the
the F1 macro score of the Xgboost model at the livelihood zone-level. livelihood zone, which is currently modelled as a measure of how
In particular, we look at the spatial variation in performance for these often Deterioration takes place (or in other words, how often the zone

Fig. 7. This image shows the top twenty features with the highest average feature-importance scores over 100 runs of the model with a 7-month lead time.
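The aggregation behind this figure (averaging the feature-importance scores over runs with different random seeds and keeping the highest-scoring variables) can be sketched as follows. This is a minimal illustration that assumes each run has already produced a mapping from feature name to importance; the function names and the feature names in the example are hypothetical and no model is trained here.

```python
from collections import defaultdict


def mean_importances(runs):
    """Average importance per feature over a list of {feature: importance} dicts.
    A feature missing from a run is treated as having importance 0 in that run."""
    totals = defaultdict(float)
    for run in runs:
        for name, value in run.items():
            totals[name] += value
    return {name: total / len(runs) for name, total in totals.items()}


def top_k(importances, k=20):
    """Return the k (feature, mean importance) pairs with the highest scores."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Two made-up runs with made-up feature names.
    runs = [
        {"ipc_current": 0.50, "soil_moisture": 0.30, "price": 0.20},
        {"ipc_current": 0.40, "soil_moisture": 0.35, "price": 0.25},
    ]
    print(top_k(mean_importances(runs), k=2))
```

Averaging over many seeds reduces the influence of arbitrary choices between highly correlated variables in any single run.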


Fig. 8. The y-axes show the true classes Deterioration (D), No Change Events (NCE) and Improvement (IM) for 3-month lead time. On the x-axes are the forecasted classes. Each cell shows
the ratio of correct decisions for each combination, normalized across rows meaning that each row sums to 1. Note that the Xgboost (XGB) model is the only one that shows a clear raised
diagonal, where the forecasted class is most often the same as the true class, as one would like to see in a confusion matrix. Note that the scores of the Future Equals the Present and
Historical Norm were identical. This happened because the historical norm for each livelihood zone in the 3-month interval is that there are no changes in the food security state. For the
absolute values, please see Fig. B.2 in the supplementary materials.

Table 2
The left side of the table shows the specific F1 macro score per transition for the different initial IPC classes for the 3-month interval. The right side shows the distribution of the different
transitions per class. This table reveals that the model correctly predicted most Improvement transitions from initial IPC classes 3 and 4 for this lead time.

                              Correctly identified cases (F1 macro)        Distribution of cases
                              in the hold out set                          in the hold out set
Initial IPC class             Deterioration  No change  Improvement        Deterioration  No change  Improvement
1                             0.47           0.90       NA                 251            2250       0
2                             0.16           0.43       0.28               145            1327       221
3                             0              0.54       0.76               20             174        219
4                             NA             0          0.79               0              13         34
5                             NA             NA         NA                 0              0          0
Total cases                                                                416            3764       474
Percentage of the data set                                                 9%             81%        10%


Fig. 9. This image shows the top twenty features with the highest average feature importance scores over 100 runs of the model with a 3-month lead time.

experiences food insecurity). This comes in combination with economic over time (Heinrich and Bailey, 2020). It can also be that communities
conditions (prices) and the Bega season (dry season), which seems to suffer gradually from the impact of two or more consecutive seasons,
indicate that future food security transitions can be better forecasted which have similar impacts to drought seasons but do not have below-
with predictors that are recorded in the dry season and that they are de- average rainfall. Variables that reflect food market prices, precipitation,
termined by a combination of the dynamics of food security vulnerabil- and seasonal information are also relatively important, corresponding
ity and price. Therefore, Deteriorations may be prevented by ensuring well with the research from Godfray et al. (2010). They showed that
the availability of food in local markets at stable prices, especially dur- food market prices could influence whether people have access to
ing the lean season in which communities increasingly rely on markets world markets (which in turn affects food security). Our results also
for supplies at high prices (Abay and Hirvonen, 2016). corresponded with the research from Jones et al. (2013) and Holden
Furthermore, our models performed reasonably well with a 3- and Shiferaw (2004), which showed that the NDVI, precipitation, and
month lead time for detecting No Change (IPC 3) and Improvements soil moisture are essential drivers of food availability.
(from IPC 4 to 3, or emergency to crisis, and IPC 3 to 2, or crisis to stress). Our results also showed that both space and time are essential mod-
Short-term transitions in the state of food security have strong links erators for our forecast performance. The performance was better for
with climate variability, which plays a determinant role in the health areas that showed fewer change events, and it tended to improve
of the biophysical system. For instance, the model selected rainfall, when forecasted further in the future. Concerning the spatial relation-
NDVI, and soil moisture from the Kiremt season preceding the dry sea- ships, this performance behaviour was not surprising. Mainly, we saw
son. Soil moisture is a critical element of the hydrological cycle that di- that forecasting a lack of change was associated with the highest perfor-
rectly affects plant water availability, overall plant productivity, and mance; areas that rarely changed were then associated with the highest
crop yields – especially in arid and semiarid areas with limited water performance. On the other hand, areas where change was more fre-
and marginal agricultural lands (Krishnamurthy et al., 2020). These quent were more likely to experience food insecurity and, as such,
weather variables play an important role as they directly influence would benefit the most from accurate forecasting. More importantly,
food production of the agropastoral and agricultural sectors. Given the performance of our algorithm was better when forecasting about
that soil moisture levels and rainfall variability are key drivers of transi- 7 months in the future. Note that the increase in performance was
tions in the state of the food security, access to granular seasonal fore- mainly due to a better forecast for the most relevant class for food secu-
cast information such as the TAM-SAT alert (Boult et al., 2020) and rity, namely Deterioration. This effect can be explained by the delay be-
rainfall anomalies through the Greater Horn of Africa Climate Outlook tween events that will result in food insecurity and the effect they will
Forum can create extra information for supporting the prioritization of have on the resources of the population. In terms of forecast, the delay
fast humanitarian interventions. is useful since it allows sufficient time to plan anticipatory action.
In spite of the promising results, a careful analysis was essential as Anticipatory action to lessen the impact of a Deterioration of food secu-
the complexity of the food security context includes different food sys- rity ranges from structural to non-structural measures and from the
tems interacting with several socio-economic and environmental pro- household up to national levels. In addition, humanitarian agencies
cesses. For example, pastoralists rely on different food systems than might benefit from early forecasts of Improvements to better quantify,
non-pastoralists. Coughlan de Perez et al. (2019) showed that pastoral- diversify and prioritize their (often limited) budget for anticipatory ac-
ists in East Africa more frequently experience food insecurity than do tion. For example, anticipatory action can be planned and prioritized for
non-pastoralists. One of the key environmental processes is drought, areas with Deterioration outlooks, such as improving water resources
whereby drought is a hazard with a slow onset and a large spatial and management, food distribution, and nutrition screening. The forecasts
temporal extent. Agricultural drought can have a severe impact on the 7 months ahead can be used to mobilize financial resources, shape
food security and nutrition of populations whose lives and livelihoods structural measures and to start preparations for the non-structural
are highly dependent on rainfed agriculture. For instance, drought can measures, such as to pre-position stocks (Red Cross, 2020).
have a negative effect on pastoralist families by causing widespread It is important to note that forecasting future transitions in the state
death of their livestock (World Food Program, 2019). Thus, drought of the food security requires continued effort in monitoring the current
can be seen as a “silent emergency” - impacts are insidious and build status of food security. Therefore, efforts carried out by providers of


Fig. 10. Fig. 10A shows the map of Ethiopia overlaid with the Xgboost model performance per region for the 7-month lead time. The regions in the northwest were the easiest for the model
to identify (thus a higher F1). The darker regions are more difficult to identify compared to regions that have a lighter colour. Fig. 10B shows the relationship between the percentage of No
Change events and the Xgboost F1 macro score. Results showed that the more stable a region is, the easier it is for the model to identify these correctly. This most likely explains the spatial
variation in Fig. 10A.

early warning information such as FEWS NET and the WFP remain es- The proposed approach is of great relevance given the transferability
sential enablers of food insecurity prevention and reduction. Thus, we of the model to other countries as most of the data on the predictors are
believe that our model can satisfactorily complement these existing openly available from global repositories. Although the model is trans-
systems. ferable, the model might produce a different ranking of the importance


of variables for another country, given that there are different socio- based on the trigger will not reach the right people or area (livelihood
economic and environmental processes at play. Model performance zone in our case), having negative consequences not only for the affected
will increase once data with higher spatial and temporal resolution be- people, but also the supporting organisations, in terms of trust, and do-
comes available. The current model implementation required temporal nors (van den Homberg et al., 2020).
interpolation of the variables for population, NDVI, and In conclusion, we showed that transitions in the state of the food se-
precipitation, which most likely had a negative effect on performance. curity can be forecasted using open data and the Extreme Gradient
Similarly, the livelihood zone is a relatively large spatial area; if data Boosting machine learning algorithm, with
on all predictors were available at finer spatial resolutions, the model the strongest performance for longer lead times. Soil moisture-related
would possibly also have a higher performance. Soil moisture as a key variables were the most important predictors for the shorter lead
predictor for the 3-month lead time was now only openly available at time, and socio-economic variables the most important for the longer
25 to 30 km2, whereas higher resolution soil moisture data do exist. lead time. Some of the key drivers for transitions in food security, such
The current model has some limitations that could be addressed in as soil moisture, NDVI, and rainfall, can be monitored and forecasted
future research. First of all, we expected conflict to have a more signifi- in high temporal and spatial resolution and with relatively high accu-
cant impact on food insecurity than our results show. Brown et al. racy. We showed that combining machine learning and monitoring systems
(2020) created an overview of empirical studies of factors associated with open data adds value to existing consensus-based forecasting ap-
with malnutrition. Delbiso et al. (2017) and Akresh et al. (2012) showed proaches, as it provides longer lead times and more regular outlooks.
that shocks due to conflict are a consistent predictor of child malnutri- Our approach can also be transferred to other countries as most of the
tion. Delbiso et al. (2017) used conflict data from both the Uppsala da- data on the predictors is openly available from global data repositories
tabase and ACLED to determine if conflict took place (yes or no) in a (for example, the remote-sensing-derived predictors).
certain area 5 months prior to a nutrition survey in the same area. The code can be found on https://github.com/rodekruis. Supplemen-
Akresh et al. (2012) used a demographic health survey taken 2 years tary data to this article can be found online at
after the Eritrean-Ethiopian conflict. In both cases, analyses were done https://doi.org/10.1016/j.scitotenv.2021.147366.
on an individual level on one specific aspect or outcome of food security
(i.e., malnutrition) with only one lead time, while ours was done at a CRediT authorship contribution statement
more spatially aggregated level (livelihood zones) and for several lead
times. The limited impact of conflict in our model showcases the difficulty in relating conflict Joris J.L. Westerveld: Writing – original draft, Data curation,
to food insecurity if it is done at a more aggregated level. Moreover, re- Validation, Methodology, Software. Marc J.C. van den Homberg:
search from Lentz et al. (2019) also shows that forecasting shock effects Supervision, Methodology, Writing – review & editing, Funding acquisi-
like conflict or disasters with machine learning techniques is difficult; tion. Gabriela Guimarães Nobre: Funding acquisition, Writing – review
consensus and Delphic approaches work better for these. Nevertheless, & editing. Dennis L.J. van den Berg: Data curation, Software, Writing –
Lentz et al. (2019) proposed that perhaps market assessments and review & editing. Aklilu D. Teklesadik: Writing – review & editing.
prices (which are included in the current model) could also give an in- Sjoerd M. Stuit: Supervision, Methodology, Software, Writing- Original
dication about disruptions due to shocks like conflict or natural disas- draft preparation.
ters. Another potential solution would be to include the dataset from
ACLED, as Delbiso et al. (2017) did, to see if conflict rises in impor- Declaration of competing interest
tance in the model.
Second, there are many different types of machine learning algo- The authors declare that they have no known conflict/competing fi-
rithms. We chose to compare the Xgboost, Random Forest, and CatBoost nancial interests or personal relationships that could have appeared to
algorithms. Many other state-of-the-art algorithms exist, such as the influence the work reported in this paper.
Light Gradient Boosting Machine or Rotation Forests. Based on the rel- influence the work reported in this paper.
atively small differences in performance between the three algorithms Acknowledgements
we tested, our expectation is that other algorithms will have similar per-
formance. Third, we did not use a separate algorithm to perform feature We would like to express our gratitude to Stijn Heemskerk for his
selection before running the machine learning model. We expected all helpful discussions on the initial model.
features to be relevant at one of the lead times, whereby our ensemble Marc van den Homberg, Gabriela Guimarães Nobre and Joris
methods have an internal feature selection mechanism. The results Westerveld were partly funded by the GFDRR Forecast-based financing
also showed that the most important features for the 3-month interval for food security project. Marc van den Homberg, Aklilu Teklesadik and
are different than the features for the 7-month interval. Therefore, it is Joris Westerveld were also partly funded by the Ikea Foundation as part
important not to discard features, since different features might be im- of the Innovative Approaches to Response Preparedness Program. Marc
portant for different lead times. So, in our case, the ensemble methods van den Homberg was also partly funded by the Netherlands Red Cross
also functioned as an internal feature selection per lead time. We do, Princess Margriet Fund FbF methodologies project.
however, acknowledge that there are also advantages in using feature
selection. One advantage is that a model becomes more parsimonious, References
and therefore efforts and resources on data collection can be minimized.
Another advantage is that feature importance scores with feature selec- Abay, K., Hirvonen, K., 2016. Does Market Access Mitigate the Impact of Seasonality on
Child Growth? (Panel data evidence from northern Ethiopia)
tion provides more realistic estimates. Without feature selection, two
Akresh, R., Lucchetti, L., Thirumurthy, H., 2012. Wars and child health: evidence from the
highly-correlated variables could both be used to split a decision tree Eritrean-Ethiopian conflict. J. Dev. Econ. 99, 330–340. https://doi.org/10.1016/j.
without concrete preference for either variable (Andrée et al., 2020). jdeveco.2012.04.001.
We mitigated this bias by bootstrapping the model 100 times and aver- Andrée, B.P.J., Chamorro, A., Kraay, A., Spencer, P., Wang, D., 2020. Predicting food crises.
Policy Research Working Papers.
aging the feature importance scores. Nevertheless, it would be interest-
Barrett, C.B., 2002. Food security and food assistance programs. In: Gardner, B.L., Rausser,
ing to get even more insight into which features are important for each G.C. (Eds.), Handbook of Agricultural Economics. Elsevier, pp. 2103–2190 Handbook
specific transition. A new technique like the SHAP (SHapley Additive ex- of Agricultural Economics.
Planations) by Lundberg and Lee (2017) could provide this extra infor- Barrett, C.B., 2010. Measuring food insecurity. Science (80-.) 327, 825–828. https://doi.
org/10.1126/science.1182768.
mation. Finally, research started to assess biases in machine learning
Biffis, E., Chavez, E., 2017. Satellite data and machine learning for weather risk manage-
outcomes due to, for example, biases in one of the input data sources. If ment and food security. Risk Anal. 37, 1508–1521. https://doi.org/10.1111/
a machine learning algorithm is biased, the interventions implemented risa.12847.

Boult, V., Young, M., Maidment, R., Mwangi, E., Ambani, M., Waruru, S., Otieno, G., Black, E., Asfaw, D., Todd, M., 2020. Evaluation and validation of TAMSAT-ALERT soil moisture and WRSI for use in drought anticipatory action. Meteorol. Appl. 27, 1–22. https://doi.org/10.1002/met.1959.
Braimoh, A., Manyena, B., Obuya, G., Muraya, F., 2018. Assessment of Food Security Early Warning Systems for East and Southern Africa. World Bank Other Operational Studies, The World Bank.
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32. https://doi.org/10.1023/A:1010933404324.
Brown, M., Backer, D., Billing, T., White, P., Grace, K., Doocy, S., Huth, P., 2020. Empirical studies of factors associated with child malnutrition: highlighting the evidence about climate and conflict shocks. Food Secur. 12, 1–12. https://doi.org/10.1007/s12571-020-01041-y.
Chen, T., Guestrin, C., 2016. XGBoost: a Scalable Tree Boosting System. pp. 785–794. https://doi.org/10.1145/2939672.2939785.
Choularton, R., Krishnamurthy, P.K., 2019. How accurate is food security early warning? Evaluation of FEWS NET accuracy in Ethiopia. Food Secur. 11, 333–344.
Connolly-Boutin, L., Smit, B., 2016. Climate change, food security, and livelihoods in sub-Saharan Africa. Reg. Environ. Chang. 16, 385–399. https://doi.org/10.1007/s10113-015-0761-x.
Coughlan de Perez, E., van Aalst, M., Choularton, R., van den Hurk, B., Mason, S., Nissan, H., Schwager, S., 2019. From rain to famine: assessing the utility of rainfall observations and seasonal forecasts to anticipate food insecurity in East Africa. Food Secur. 11, 57–68. https://doi.org/10.1007/s12571-018-00885-9.
Delbiso, T.D., Rodriguez-Llanes, J.M., Donneau, A.-F., Speybroeck, N., Guha-Sapir, D., 2017. Drought, conflict and children’s undernutrition in Ethiopia 2000-2013: a meta-analysis. Bull. World Health Organ. 95, 94–102. https://doi.org/10.2471/BLT.16.172700.
Development Initiatives, 2020. Global Humanitarian Assistance Report (Bristol).
Döring, M., 2018. Prediction vs forecasting predictions do not always concern the future [WWW document]. URL. https://www.datascienceblog.net/post/machine-learning/forecasting_vs_prediction/.
Dorogush, A., Ershov, V., Gulin, A., 2018. CatBoost: Gradient Boosting with Categorical Features Support.
Famine Early Warning Systems Network, 2011. Special Brief Fews Net Adopts IPC Version 2.0 Scale.
Famine Early Warning Systems Network, 2019. Livelihoods.
FAO, IFAD, UNICEF, WFP, WHO, 2017. The State of Food Security and Nutrition in the World (Rome).
Food and Agriculture Organization of the United Nations, 2003. Trade Reforms and Food Security. FAO, Rome.
Food and Agriculture Organization of the United Nations, 2008. Climate Change and Food Security: A Framework Document (Rome).
Food and Agriculture Organization of the United Nations, 2019. The State of Food Security and Nutrition in the World. FAO, Rome.
Food Security Cluster, 2017. Core Indicator Handbook.
Food Security Information Network, 2018. Global Report on Food Crises 2018.
Frelat, R., Lopez-Ridaura, S., Giller, K., Herrero, M., Douxchamps, S., Djurfeldt, A., Erenstein, O., Henderson, B., Berresaw, M., Paul, B., Rigolot, C., Ritzema, R., Rodriguez, D., Van Asten, P.J.A., Van Wijk, M., 2015. Drivers of household food availability in sub-Saharan Africa based on big data from small farms. Proc. Natl. Acad. Sci. 113, 201518384. https://doi.org/10.1073/pnas.1518384112.
Funk, C., Shukla, S., Thiaw, W., Hoell, A., Mcnally, A., Husak, G., Novella, N., Budde, M., Peters-Lidard, C., Alkhalil, A., Galu, G., Korecha, D., Magadzire, T., Rodriguez, M., Robjhon, M., Bekele, E., Arsenault, K., Peterson, P., Verdin, J., 2019. Recognizing the famine early warning systems Network (FEWS NET): over 30 years of drought early warning science advances and partnerships promoting global food security. Bull. Am. Meteorol. Soc. 100. https://doi.org/10.1175/BAMS-D-17-0233.1.
Géron, A., 2019. Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems.
Godfray, H., Beddington, J., Crute, I., Haddad, L., Lawrence, D., Muir, J., Pretty, J., Robinson, S., Thomas, S., Toulmin, C., 2010. Food security: the challenge of feeding 9 billion people. Science 327, 812–818.
Google, 2019. Classification: Precision and Recall [WWW Document]. URL. https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall.
Google, 2020. Catalog Earth Engine Data - MODIS Combined 16-Day NDVI [WWW Document].
Guimarães Nobre, G., Davenport, F., Bischiniotis, K., Veldkamp, T., Jongman, B., Funk, C.C., Husak, G., Ward, P.J., Aerts, J.C.J.H., 2019. Financing agricultural drought risk through ex-ante cash transfers. Sci. Total Environ. 653, 523–535. https://doi.org/10.1016/j.scitotenv.2018.10.406.
He, H., Bai, Y., Garcia, E., Li, S., 2008. ADASYN: adaptive synthetic sampling approach for imbalanced learning. Proceedings of the International Joint Conference on Neural Networks, pp. 1322–1328. https://doi.org/10.1109/IJCNN.2008.4633969.
Headey, D., Barrett, C., 2015. Opinion: measuring development resilience in the world’s poorest countries. Proc. Natl. Acad. Sci. U. S. A. 112, 11423–11425. https://doi.org/10.1073/pnas.1512215112.
van der Heijden, W., van den Homberg, M., Marijnis, M., de Graaff, M., Daniels, H., 2018. Combining Open Data and Machine Learning to Predict Food Security in Ethiopia. 2018 International Tech4Dev Conference: UNESCO Chair in Technologies for Development: Voices of the Global South. https://www.researchgate.net/publication/326096566_Combining_Open_Data_and_Machine_Learning_to_predict_Food_Security_in_Ethiopia
Heinrich, D., Bailey, M., 2020. Forecast-based Financing and Early Action for Drought – Guidance Notes for the Red Cross Red Crescent.
High Level Panel of Experts on Food Security and Nutrition, 2011. Price Volatility and Food Security the High Level Panel of Experts on Food Security and Nutrition.
Holden, S., Shiferaw, B., 2004. Land degradation, drought and food security in a less-favoured area in the Ethiopian highlands: a bio-economic model with market imperfections. Agric. Econ. 30, 31–49. https://doi.org/10.1111/j.1574-0862.2004.tb00174.x.
Holleman, C., Jackson, J., Sanchéz, M., Vos, R., 2017. Sowing the Seeds of Peace for Food Security (Rome).
van den Homberg, M., Gevaert, C., Georgiadou, P.Y. (Yola), 2020. The changing face of accountability in humanitarianism: using artificial intelligence for anticipatory action. Polit. Gov. 8, 456–467. https://doi.org/10.17645/pag.v8i4.3158.
Household Economy Approach, 2008. The Practitioners’ Guide to the Household Economy Approach.
Jones, A.D., Ngure, F., Pelto, G., Young, S., 2013. What are we assessing when we measure food security? A compendium and review of current metrics. Adv. Nutr. 4 (5), 481–505.
Krishnamurthy, R., P., K., Fisher, J.B., Schimel, D.S., Kareiva, P.M., 2020. Applying tipping point theory to remote sensing science to improve early warning drought signals for food security. Earth’s Futur. 8, e2019EF001456. https://doi.org/10.1029/2019EF001456.
Lentz, E., Michelson, H., Baylis, K., Zhou, Y., 2019. A data-driven approach improves food insecurity crisis prediction. World Dev. 122, 399–409. https://doi.org/10.1016/j.worlddev.2019.06.008.
Lentz, E., Gottlieb, G., Simmons, C., Maxwell, D., 2020. The Ecosystem of Humanitarian Diagnostics and its Application to Anticipatory Action (Boston).
Lundberg, S., Lee, S.-I., 2017. A Unified Approach to Interpreting Model Predictions.
Misselhorn, A., 2005. What drives food insecurity in southern Africa? A meta-analysis of household economy studies. Glob. Environ. Chang. 15, 33–43. https://doi.org/10.1016/j.gloenvcha.2004.11.003.
Mwebaze, E., Okori, W., Quinn, J.A., 2010. Causal structure learning for famine prediction. AAAI Spring Symposium: Artificial Intelligence for Development.
Office for the Coordination of Humanitarian Affairs, 2021. Catalogue of Predictive Models in the Humanitarian Sector [WWW Document]. URL. https://centre.humdata.org/catalogue-for-predictive-models-in-the-humanitarian-sector/.
Okori, W., Obua, J., 2011. Machine learning classification technique for famine prediction. Proc. World Congr. Eng. 2011. 2. WCE 2011, pp. 991–996.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E., Louppe, G., 2012. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12.
Red Cross, 2020. Global Early Action Database.
Ripley, B.D., 1996. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9780511812651.
Rosegrant, M.W., Cline, S.A., 2003. Global food security: challenges and policies. Science 302, 1917–1919. https://doi.org/10.1126/science.1092958.
Sheffield, J., Wood, E.F., Chaney, N., Guan, K., Sadri, S., Yuan, X., Olang, L., Amani, A., Ali, A., Demuth, S., Ogallo, L., 2014. A drought monitoring and forecasting system for sub-Sahara African water resources and food security. Bull. Am. Meteorol. Soc. 95, 861–882. https://doi.org/10.1175/BAMS-D-12-00124.1.
United Nations, 2019. Goal 2: Zero Hunger - United Nations Sustainable development. [WWW Document].
Uppsala University, 2018. Uppsala Conflict Data Program [WWW Document]. URL. https://www.pcr.uu.se/research/ucdp/ (accessed 7.20.18).
Weiss, D.J., Nelson, A., Gibson, H.S., Temperley, W., Peedell, S., Lieber, A., Hancher, M., Poyart, E., Belchior, S., Fullman, N., Mappin, B., Dalrymple, U., Rozier, J., Lucas, T.C.D., Howes, R.E., Tusting, L.S., Kang, S.Y., Cameron, E., Bisanzio, D., Battle, K.E., Bhatt, S., Gething, P.W., 2018. A global map of travel time to cities to assess inequalities in accessibility in 2015. Nature 553, 333–336. https://doi.org/10.1038/nature25181.
World Food Program, 2019. 5 Climate-Driven Disasters - And how WFP Has Prepared and Responded in 2019 [WWW Document]. URL. https://insight.wfp.org/5-climate-driven-disasters-and-how-wfp-has-prepared-andresponded-%0A%0Ain-2019-aa715b454f06.
World Food Program, 2020. Ethiopia - Food Prices [WWW Document]. URL. https://data.humdata.org/dataset/wfp-food-prices-for-ethiopia (accessed 9.12.20).
World Food Program, 2021a. HungerMap [WWW Document]. URL. https://hungermap.wfp.org (accessed 2.28.21).
World Food Program, 2021b. And Safety Nets Alert Program [WWW Document]. URL. https://snap.vam.wfp.org/main/.
Zargar, A., Sadiq, R., Naser, B., Khan, F.I., 2011. A review of drought indices. Environ. Rev. 19, 333–349. https://doi.org/10.1139/a11-013.
