
Computers and Electronics in Agriculture 190 (2021) 106442
Original papers
IoT-Agro: A smart farming system to Colombian coffee farms

Jhonn Pablo Rodríguez a, *, Ana Isabel Montoya-Munoz a, Carlos Rodriguez-Pabon a,
Javier Hoyos b, Juan Carlos Corrales a
a Department of Telematics Engineering, Engineering Telematics Group, Universidad del Cauca, Popayán 190002, Colombia
b Parque Tecnológico de Innovación del Café - TECNICAFÉ, Popayán 190002, Colombia
ARTICLE INFO

Keywords: Smart Farming; Internet of Things; Data Analytics; Outlier Detection; BPMN; Coffee Farm

ABSTRACT

Currently, the adoption of smart technologies for sustainable farming systems creates a distinct competitive edge for farmers, extension services, agri-business, and policy-makers. However, selecting the most appropriate technologies from a wide range of options is never an easy job. In this context, several authors consider Smart Farming as the best solution. However, they fall short in providing the information needed to recommend the most appropriate IoT technology, the options to manage the IoT infrastructure, and the services for crop management plans and crop production estimation. This paper implements a Smart Farming System based on a three-layered architecture (Agriculture Perception, Edge Computing, and Data Analytics). In the Agriculture Perception Layer, we evaluated Omicron, Libelium, and Intel technologies under criteria such as price, number of inputs for sensor connection, communication protocols, portability, battery life, and photovoltaic energy-harvesting system. In the Edge Layer, we evaluated edge-based management mechanisms to provide data reliability, focusing on outlier detection and treatment using Machine Learning and Interpolation algorithms. We recommend the Isolation Forest algorithm for classifying outliers in the monthly temperature dataset (99% precision) and the Cubic Spline technique for effectively replacing the data classified as outliers (RMSE lower than 0.085). In the Data Analytics Layer, we evaluated different machine learning algorithms to estimate coffee production. The results show that the XGBoost algorithm keeps its measured error values lower than those of the other models (RMSE 0.008, MAE 0.032, and RSE 0.585). The www.iot-agro.com platform offers farmer services such as weather variables monitoring, coffee production estimating, and IoT infrastructure setting. Finally, stakeholders, researchers, and engineers validated our Smart Farming Solution through a Colombian coffee farm case study. The test evaluated the usability, the straightforward interpretation of data, and the look and feel of the web application.
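As a rough illustration of the outlier handling summarized in the abstract (an isolation-based detector that flags outliers, followed by cubic spline interpolation that replaces them), the sketch below works on a small synthetic temperature series. It is not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the synthetic hourly data, the use of all points in every tree (the original Isolation Forest subsamples), the natural boundary conditions of the spline, and the choice to flag a known number of outliers instead of using a score threshold.

```python
import bisect
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path_length(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(values, depth, max_depth, rng):
    """Grow one isolation tree by cutting the 1-D values at random points."""
    if depth >= max_depth or len(values) <= 1 or min(values) == max(values):
        return ("leaf", len(values))
    cut = rng.uniform(min(values), max(values))
    left = [v for v in values if v < cut]
    right = [v for v in values if v >= cut]
    return ("node", cut,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    """Depth at which x lands in a leaf, adjusted for the points left in that leaf."""
    if tree[0] == "leaf":
        return depth + avg_path_length(tree[1])
    _, cut, left, right = tree
    return path_length(x, left if x < cut else right, depth + 1)


def isolation_scores(values, n_trees=100, seed=0):
    """Anomaly score per value; scores closer to 1 mean easier to isolate."""
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(len(values)))
    trees = [build_tree(values, 0, max_depth, rng) for _ in range(n_trees)]
    norm = n_trees * avg_path_length(len(values))
    return [2.0 ** (-sum(path_length(v, t) for t in trees) / norm) for v in values]


def natural_cubic_spline(xs, ys):
    """Return a function interpolating (xs, ys) with a natural cubic spline."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    lower, diag, upper, rhs = [0.0] * n, [1.0] * n, [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        lower[i], diag[i], upper[i] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    for i in range(1, n):  # forward elimination (Thomas algorithm)
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * n  # second derivatives at the knots; zero at both ends
    m[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        m[i] = (rhs[i] - upper[i] * m[i + 1]) / diag[i]

    def evaluate(x):
        i = max(0, min(n - 2, bisect.bisect_right(xs, x) - 1))
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate


def clean_series(readings, n_outliers):
    """Flag the n_outliers highest-scoring readings and replace them with spline values."""
    scores = isolation_scores(readings)
    ranked = sorted(range(len(readings)), key=lambda i: scores[i], reverse=True)
    flagged = sorted(ranked[:n_outliers])
    keep = [i for i in range(len(readings)) if i not in flagged]
    spline = natural_cubic_spline([float(i) for i in keep], [readings[i] for i in keep])
    cleaned = list(readings)
    for i in flagged:
        cleaned[i] = spline(float(i))
    return flagged, cleaned


# Synthetic hourly temperatures (assumed data) with two injected faulty readings.
truth = [20.0 + 3.0 * math.sin(2.0 * math.pi * t / 24.0) for t in range(24)]
readings = list(truth)
readings[5], readings[14] = 45.0, -5.0
flagged, cleaned = clean_series(readings, n_outliers=2)
rmse = math.sqrt(sum((cleaned[i] - truth[i]) ** 2 for i in flagged) / len(flagged))
print("flagged indices:", flagged, "RMSE of replaced values:", rmse)
```

The RMSE here is computed only over the replaced positions and against the known synthetic signal, which is one plausible way to read the RMSE figure quoted in the abstract; the paper's own evaluation protocol may differ. In practice a library implementation (for example, an Isolation Forest from a machine learning toolkit and a spline routine from a numerical library) would normally replace the hand-written functions.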
* Corresponding author.
E-mail addresses: jhonnpablo@unicauca.edu.co (J.P. Rodríguez), aimontoya@unicauca.edu.co (A.I. Montoya-Munoz), carlosdr@unicauca.edu.co (C. Rodriguez-Pabon), javierhoyosg@unicauca.edu.co (J. Hoyos), jcorral@unicauca.edu.co (J.C. Corrales).
https://doi.org/10.1016/j.compag.2021.106442
Received 14 January 2021; Received in revised form 31 July 2021; Accepted 3 September 2021
Available online 1 October 2021
0168-1699/© 2021 Elsevier B.V. All rights reserved.

1. Introduction

Nowadays, technology is involved in many sectors of the world. In the agriculture sector, new technologies have become an essential pillar for millions of farmers who live off their crops. The United Nations estimates that the world's population will exceed nine billion by 2050 (Gupta et al., 2020), which demands a quick and safe increase in food production. Technology has put the cost and time of craft practices in the spotlight; farmers' main aim is to carry out agricultural work with the minimum possible time, staff, and price.

One of the main problems in the agriculture sector is climate variability. This variability causes alterations in production and changes in the environmental context. Monitoring environmental variables is of great importance because it gives growers relevant information for proper decision-making, programming, and execution of agricultural activities such as flowering calendars, harvest times, irrigation methods, efficient nutrition almanacs, early warnings, and planning at harvest time, and for applying proper practices at all stages of the value chain. However, selecting the most appropriate technologies for supporting these needs is not an easy job considering the wide range of options and the specific agriculture domains.

Several authors consider Smart Farming (SF) as the best solution to the issues mentioned above in the literature. SF is defined as using information and communication technologies to identify, monitor, analyze, and represent agricultural production's spatial characteristics to support decision-making to improve agrarian productivity (Glaroudis
et al., 2020; Ryu et al., 2015). SF facilitates the processes of production chains, providing safety, efficiency, and performance. Wolfert et al. (2017) review the state of the art concerning SF and propose a flexible SF architecture comprising farm processes, farm management, data chain, network management organization, and network management technology. Popovic et al. (2017) present a case study of a private IoT-enabled platform for ecological monitoring domains, using architectural perspectives to describe the solution from the viewpoint of different stakeholders. Colezea et al. (2018) introduce a Cloud-based web-service platform that increases the quality of products and supports business development in agriculture-related fields. However, these works fall short in providing detailed information to recommend the most appropriate IoT technology, options to manage the IoT infrastructure, services for crop management plans, or crop production estimation.

Moreover, several works introduce SF approaches based on different architectural layers. Sadowski and Spachos (2020) provide an experimental analysis of three wireless technologies (Zigbee, LoRaWAN, and WiFi) used in an agricultural monitoring system with energy harvesting capabilities. Carpio et al. (2017) introduce an Android-based mobile application called SmartHof that integrates Cloud and Fog for monitoring the quality of the animals' life. Rodriguez et al. (2020) propose a classic computer vision approach to detect cherry coffee beans in crops from images captured by a smartphone in an uncontrolled environment. However, to the best of our knowledge, no work introduces a fully suited, implemented, and validated SF system for a real farm scenario with a complete and extended deployment of each architectural layer.

This paper proposes and implements IoT-Agro, an innovative SF system based on a three-layered architecture: Agriculture Perception, Edge, and Data Analytics. In the Agriculture Perception Layer, we evaluated three technologies available on the market under criteria such as price, communication protocol, Pearson correlation, and Mean Square Error (MSE). Results indicate a good correlation in temperature and humidity between weather stations and satellite service. In the Edge Layer, we evaluated edge-based management mechanisms to provide data reliability, focusing on outlier detection and treatment using Machine Learning (ML) and Interpolation algorithms. Results show that our mechanisms achieve high Precision and Recall and low False Alarm Rate (FAR) and Root Mean Square Error (RMSE) when detecting and replacing outliers with inferred data. In the Data Analytics Layer, we evaluated different Artificial Intelligence (AI) algorithms to estimate coffee production, such as TreeRegressor, Artificial Neural Networks, XGBoost, Support Vector Machine for Regression, and Random Forest. The XGBoost algorithm obtained the lowest Mean Absolute Error (0.032).

Moreover, our proposed solution was implemented and validated by a case study in a Colombian coffee farm with several stakeholders. For the case study, we introduce a Business Process Modelling Notation (BPMN)-based model (Chinosi and Trombetta, 2012) of a coffee value chain and a validated web platform: www.iot-agro.com. This platform offers farm services such as weather variables monitoring, coffee production estimating, and IoT infrastructure setting.

The remainder of this paper is organized as follows. Section 2 presents related work. Section 3 describes the proposed IoT-Agro architecture and each of its layers. Section 4 presents the case study, and Section 5 describes the IoT-Agro architecture as applied to and evaluated in the case study. Section 6 discusses the experiences from the IoT-Agro validation. In Section 7, we present conclusions and future work.

2. Related work

This section presents research works on SF architectures, Agriculture Perception, Edge, and Data Analytics-based approaches. Table 1 summarizes the shortcomings of each one.

Table 1
Related work.
Approach | Reference | Shortcomings
SF | Gupta et al. (2020) | The authors analyze cybersecurity challenges in SF. However, they do not introduce an actual scenario implementation of these technologies.
SF | Gia et al. (2019) | This work does not compare technologies at the Perception layer.
SF | Colezea et al. (2018) | The authors present a web-service platform to increase the quality of the products grown. However, the authors do not offer forecasting services at the Cloud Layer of the platform proposed.
SF | Popovic et al. (2017) | The authors present the design of a private platform for research in SF. However, the authors do not consider Data Analytics in the Cloud Layer.
SF | Ferrag et al. (2020) | The authors analyze privacy and security challenges for green IoT-based agriculture. However, this approach does not implement a real scenario in agriculture.
Agriculture Perception Layer | Glaroudis et al. (2020) | They focus on the use, requirements, evaluation, and research challenges of IoT application protocols in SF.
Agriculture Perception Layer | Ray (2017) | They do not compare the technologies implemented in real scenarios by metrics.
Agriculture Perception Layer | Sadowski and Spachos (2020) | This approach does not compare technologies in qualitative terms such as price, portability, or harvest energy.
Edge-Fog Layer | Carpio et al. (2017) | Fog Layer is only used to provide redundancy of the database in case of failures.
Edge-Fog Layer | Ramli et al. (2020) | There is no implementation of a fault detection method.
Edge-Fog Layer | Taneja et al. (2020) | This approach does not implement a mechanism to improve the data reliability at the Fog Layer.
Edge-Fog Layer | Natale et al. (2020) | The statistically-based cleaning method improves the yield data collected but is still inaccurate to identify outliers.
Edge-Fog Layer | Celik et al. (2011) | The authors compare a cluster-based method for anomaly detection but do not include an isolation-based approach, classification-based techniques, or a method for treating the outliers.
Edge-Fog Layer | Wahir et al. (2018) | This work only considers linear interpolation but no other techniques to interpolate values such as cubic spline or nearest neighbor.
Data Analytics Layer | Shine et al. (2018) | The authors predict direct consumption of water and electricity on dairy farms; however, they do not present an SF architecture; they only focus on the data analytics layer.
Data Analytics Layer | Goncalves et al. (2015) | The authors do not implement an SF architecture and focus on growing sugar cane with a data set of satellite imagery.
Data Analytics Layer | Ramesh (2015) | This research uses Data Analytics techniques with different information sources to predict crop production but does not involve SF architectures.
Data Analytics Layer | Rodriguez et al. (2020) | This work provides a first approximation to the estimation of coffee production based on computer vision techniques. However, the authors do not implement SF architectures.
Data Analytics Layer | Ramos et al. (2017) | The authors developed a computer vision system for counting coffee beans on tree branches. The authors used a dataset of 1018 images of coffee tree branches.
Data Analytics Layer | Kouadio et al. (2018) | The authors proposed an Extreme Learning Machine model to analyze soil fertility properties and generate an accurate estimate of robusta coffee yield.

Regarding SF architectures, the
authors in Gupta et al. (2020) analyze cybersecurity challenges in SF. They present a multi-layer SF architecture supported on edge and cloud environments and explain several attack scenarios in smart farms and the open challenges regarding smart agriculture security and privacy. In Gia et al. (2019), the authors propose a 5-layer SF architecture (Sensor, Edge, Fog, Cloud, and Terminal) for IoT developments with LoRa. This approach uses different convolutional neural networks (CNNs) to compress data and reduce the link load of low-power wide-area networks (LPWAN). In Colezea et al. (2018), the integrated web-service platform aims to increase the quality of products grown in farms and support business development in agriculture-related fields. The authors propose a framework that enjoys the benefits of Cloud Computing, such as flexibility, availability, or security, and that can be accessed at any time, in any place, by using just an Internet connection. This paper also presents the system's architecture and the performance test results that prove its efficiency. In Popovic et al. (2017), the design of a private platform for research in SF implements smart spraying and irrigation, assessing the marine environment and fish/mussel farm monitoring. The platform is created for researchers to use for Data Analytics purposes. In Ferrag et al. (2020), the authors provide an overview of a four-tier green IoT-based agriculture architecture. They focus on categorizing threat models against green IoT-based agriculture into five categories, including attacks against privacy, authentication, confidentiality, availability, and integrity properties. They also analyze privacy-oriented blockchain-based solutions, as well as consensus algorithms.

Some research papers have implemented a comparative analysis of technologies in the Agricultural Perception Layer, such as the one presented in Glaroudis et al. (2020). This paper focuses on the evaluation of IoT protocols of farm applications. The authors provide a general survey of IoT application protocols and focus on the use, requirements, evaluation, and research challenges of IoT application protocols in SF based on suitable key performance indicators. Ray (2017) reviews potential IoT applications and the specific issues and challenges associated with IoT deployment for improved farming. The author briefly reviews various case studies that explore the existing IoT-based solutions performed by multiple organizations and individuals and categorizes them according to their deployment parameters. Sadowski and Spachos (2020) provide an experimental analysis of three wireless technologies (Zigbee, LoRaWAN, and WiFi) used in an agricultural monitoring system with energy harvesting capabilities. The authors created three identical systems, each using one wireless technology. They conclude that LoRaWAN is the optimal wireless technology for an agricultural monitoring system, based on evaluating power consumption and the network lifetime.

The recent expansion of IoT applications in several domains, such as farms, has challenged cloud computing performance, primarily due to unpredictable communication and constrained Internet connectivity (Masip-bruin et al., n.d.). Fog Computing (FC) was proposed to help execute applications and services closer to “Things”. This new concept has been perceived as what is known as Edge Computing. It essentially relies on nearby computational resources rather than remote resources in data centers. However, some authors state that FC is typically, but not exclusively, located at the edge of the network (Bonomi et al., 2012). Moreover, an FC layer is used for big data analysis; thus, a Fog Node is instantiated on devices with intermediate computing power. We argue that using an Edge Computing approach is more appropriate in the SF domain, especially considering the economic aspects. A Smart Farm typically does not handle enough data to require a Big Data analysis, nor does it count on enormous computing and networking resources for Fog-enabled services. There are no Fog providers in developing countries, and the Cloud ones are usually far from farms.

The following papers propose Edge and Fog Layers to control and manage SF-based IoT environments for meeting the requirements related to reliability, connectivity, and capacity constraints. In this paper, we highlight data reliability as a farming requirement to avoid inaccurate data that can diminish the quality of crops and, consequently, money loss. Carpio et al. (2017) propose a novel application framework on animal welfare that enables farmers to document the quality of the animals' life and facilitates various behavioral studies, such as social play for animals. The computing and sensing framework deploys an Android-based mobile application called SmartHof that integrates Cloud and Fog.

Ramli et al. (2020) design an adaptive network mechanism to achieve a more reliable smart farm system by switching the communication protocol between IEEE 802.11ac and LoRaWAN depending on the situation. The evaluation showed that LoRaWAN is suitable for transmitting the sensor data and IEEE 802.11ac for sending video data, since this protocol has a higher data rate than LoRaWAN.

Taneja et al. (2020) propose using FC to reduce the amount of data transferred to the cloud by 84% in smart dairy farming. In this application, the Fog Node consists of a local database that collects and preprocesses raw data to form behavioral activities into a daily time series.

The following papers introduce mechanisms to provide data reliability in SF environments. Natale et al. (2020) propose a cleaning methodology for yield maps collected by the Agricolus platform's tracking systems, allowing farmers to make data-driven decisions. The data cleaning method consists of practical cleaning steps, statistical analysis based on “moving windows,” and interpolation to fill the missing data. Celik et al. (2011) present a comparison between statistical and clustering methods for detecting anomalies in monthly temperature datasets. The authors implement the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, which discovers anomalies better than the statistical method, even if the outliers are not extreme values. Wahir et al. (2018) introduce a treatment for outliers using linear interpolation methods in time-series datasets. The authors detect anomalies with Box-Jenkins and neural network approaches, classify the outliers as missing values, and fill them using the linear interpolation method.

Several SF research papers use AI techniques executed at the Cloud Layer, which accommodates the information registered and captured by different IoT devices. Next, we discuss some approaches that use ML algorithms to estimate crop production. Shine et al. (2018) predict direct water and electricity consumption on Irish dairy farms, based on ML algorithms (decision trees, random forests, artificial neural networks, and support vector machines). The information (20 features) is collected from a remote monitoring system installed in a study sample of 58 Irish commercial pasture-based dairy farms between 2014 and 2016. The authors obtained a prediction of electricity consumption within 12% (relative prediction error) using a support vector machine and a prediction of water consumption within 38% with random forests. Goncalves et al. (2015) propose a Multiple Linear Regression (MLR) to estimate sugar cane production in the state of Sao Paulo in Brazil, based on time series of meteorological and agroclimatic data. The model proposed in this work uses planted area variables, standardized difference vegetation index, and Water Requirement Satisfaction Index (WRSI), which had correlation coefficients around 0.9. The models also showed a directly proportional relationship between sugarcane production and the Normalized Difference Vegetation Index (NDVI) and an inversely proportional relationship with the WRSI. Ramesh (2015) implements two techniques for crop prediction: Multiple Linear Regression (MLR) and Density-Based Clustering (DBC). These models were tested in the East Godavari district of Andhra Pradesh in India. The authors considered the following input variables for the models: year (the date on which the data were captured), rainfall, planting area, yield, fertilizers (nitrogen, phosphorus, potassium), and crop production. The results obtained in this work showed about a 2% difference between the actual value and the predicted values. Rodriguez et al. (2020) propose a classic computer vision approach to detect cherry coffee beans in crops from images captured by a mid-range smartphone in an uncontrolled environment. The set of images used in this study contained images of the entire coffee tree of 3
coffee varieties (Caturra, Bourbon, and Castillo). The system achieved the best results for Bourbon coffee trees, with 0.594 precision and 0.669 of the total relevant cherry coffee beans correctly classified. Ramos et al. (2017) developed a computer vision system for counting coffee beans on tree branches. The authors used a dataset of 1018 images of coffee tree branches at different stages of maturation. The system was validated in four plots of Castillo variety coffee, at various stages of development and different densities, with a correlation higher than 90% in the early stages of crop development. Kouadio et al. (2018) evaluated an Extreme Learning Machine (ELM) model to analyze soil fertility properties and generate an accurate estimate of robusta coffee yield. Compared to Multiple Linear Regression and Random Forest models, the ELM model contributes to selecting the significant soil properties used in coffee yield. The proposed ELM model, considering the organic matter, potassium, and sulfur characteristics as predictor variables, generated the most accurate coffee yield estimate.

3. IoT-Agro architecture

The IoT-Agro architecture is built on three layers: Agriculture Perception, Edge, and Data Analytics (Ferrag et al., 2020; Gupta et al., 2020; Gia et al., 2019) (see Fig. 1). The Agriculture Perception Layer consists of IoT-enabled devices sensing real-time variables such as weather conditions and product tracking. These devices include sensors, actuators, weather stations, tags, and Radio Frequency IDentification (RFID) readers. These represent all the IoT infrastructure deployed along the crop value chain (Crop, Harvest, Post-Harvest, Drying, and Storage) to provide data to the architecture's analysis components located in the Data Analytics Layer. The Edge Layer is near the end-devices for local real-time computations and data preprocessing to reduce the Data Analytics Layer's load and reinforce the agriculture IoT data reliability. This layer comprises interconnected Edge devices, physical or virtual entities (e.g., routers, switches, wireless access points, repeaters, embedded systems, and servers) geo-distributed in the farms to collect all the data from the Agriculture Perception Layer. The Data Analytics Layer consists of data centers and traditional Cloud servers with virtually unlimited computing resources and storage. This layer deploys IoT-Agro services based on Data Analytics (see Section 5.4).

Fig. 1. IoT-Agro Architecture.

3.1. Agriculture Perception Layer

Fig. 2. Agriculture Perception Layer.

The Agriculture Perception Layer installed on the farm represents the first layer of our architecture (see Fig. 2); technically, it comprises physical sensors, actuators, microcontrollers, and network modules. In agricultural terms, these devices include weather stations, drones, embedded sensors, hubs, antennas, and gateway devices located in different farm areas. These devices mainly collect and send data to the Edge Layer, while actuating devices receive edge gateway commands to control actuators. The devices collect real-time data about location, soil moisture level, and weather conditions, which can be sent to the Edge and
then to the Cloud.

Notably, an IoT device has some capabilities such as reading sensors, actuation on outputs, control, and monitoring. IoT devices can exchange data with other connected devices and applications, collect data from other devices, and process the data locally or send them to centralized servers or cloud-based applications. These services process the data and perform IoT infrastructure tasks based on temporal and space constraints (i.e., battery, memory, processing capabilities, communication latency, speed, and deadlines).

Fig. 3 depicts a general model for IoT devices describing several communication interfaces to other devices, both wired and wireless. These include I/O interfaces for sensors, interfaces for local or internet connectivity, memory and storage interfaces, and audio/video interfaces. IoT devices can also be of varied types, for instance, cameras, smart valves, localization and tracking sensors, and industrial machines. Almost all IoT devices generate data processed by Data Analytics systems that generate useful information to guide further actions locally or remotely. For instance, sensor data generated by a soil moisture monitoring device in a plot can help determine the optimum watering schedules when processed.

Fig. 3. IoT device elements.

3.2. Edge Layer

In SF, not all data need to be sent to the Cloud, nor should all applications operate only in a Cloud. Internet connectivity is still limited in farms far from urban areas. Some data and applications must run at the edge, closer to the end-users. Moreover, it is necessary to offer data reliability since the raw data collected include many incorrect readings. Hence, raw data cannot be used directly in SF applications such as forecasters or weather monitoring. These applications would not be helpful to the farm stakeholders if they operated with unreliable data.

Let us consider a coffee farm as a particular SF scenario that involves IoT devices. These devices can measure: i) pH levels at the fermentation phase; ii) ambient temperature, air humidity, and relative temperature and humidity of coffee beans at the dry phase; iii) temperature, time, and bean color at the roasted phase; and iv) weather variables at any phase. These data are essential to guarantee the quality, taste, and aroma of coffee. Indeed, these data are the main input for applications such as data analysis to predict coffee production, traceability, and alarm systems. Therefore, the data collection must be reliable to support the coffee farm's correct operation and SF in general. To provide such reliability, we propose a detector of outliers (i.e., measurements that significantly deviate from the normal pattern of sensed data (Chandola et al., 2009)) and a data recovery mechanism that removes and replaces the tagged outliers.

IoT-Agro implements Edge Computing from two perspectives: 1) introducing an Edge-based and reliability-oriented architectural layer that incorporates local data storage, a mechanism based on ML to detect outliers, and another based on Interpolation for inferring data intended to replace outliers; 2) offering IoT Infrastructure management services (Section 5.4) for checking the status of IoT devices (e.g., battery level) and configuring the IoT devices (e.g., registering a new device).

This section focuses on the architectural perspective. Fig. 4 describes the Edge Layer, composed of Edge devices. Edge devices are enriched with computing and data processing capacities. The Network Modules collect the data sensed by the Agriculture Perception Layer. The Edge Server receives these data and processes them with the Outlier Detector and Data Recovery mechanisms. These mechanisms update the local Database with reliable data for further sending to the Gateway Data Analytics Layer. The following sections extensively describe the data cleaning mechanisms to provide data reliability by detecting and treating outliers in raw collected data from the Agriculture Perception Layer.

Fig. 4. Edge Layer.

3.2.1. Outliers Detection

Our outlier detector receives data from the Perception Layer and identifies the data failures as outliers using ML algorithms. This detector
considers the following ML algorithms: DBSCAN, Isolation Forest (IF), and Support Vector Machine (SVM). DBSCAN is a clustering-based technique that uses a density level estimation based on a threshold for the number of neighbors, minimum points (minPts), and the radius epsilon (eps) (Schubert et al., 2017). If eps is too small, most of the data will not be clustered at all. If eps is too large, the dataset will merge into a single cluster (Ester et al., 1996). IF is an isolation-based technique that performs recursive random splits on attribute values, generating trees (Ding and Fei, 2013). When a forest of random trees collectively produces shorter path lengths for particular samples, they are highly likely to be outliers (Liu et al., 2008). SVM is a classification-based technique that separates the data from different classes by fitting a hyperplane between them, which maximizes the separation (Rajasegarar et al., 2007).

3.2.2. Data Recovery

Our data recovery mechanism receives the dataset with the outliers tagged, then removes and replaces the outliers with other suitable values obtained with Interpolation techniques: Cubic Spline (Laszlo, 2005), Linear (Hazewinkel, 2013), and Nearest Neighbor (Rukundo, 2012). Cubic Spline consists of a series of unique cubic polynomials fitted between each of the data points. Linear uses linear polynomials to construct new data points within the range of a discrete set of known data points. The Nearest Neighbor technique takes a rounded value of the expected position and finds the closest data value at the integer position (Klapetek, 2013).

3.3. Data Analytics Layer

In SF architectures, the Data Analytics Layer is built into the Cloud Layer. We define it as a global layer because it integrates Data Storage, Data Processing, Data Modeling, and Data Visualization (see Fig. 5). With these modules, the Data Analytics Layer is complemented to effectively implement an SF architecture without considering large volumes of data.

In agriculture, Data Analytics has been a great help in increasing productivity and sustainability. The patterns and trends that the analysis of large data sets can reveal can provide crucial information (Kamilaris et al., 2017). Considering a Data Analytics Layer in an SF architecture helps farmers in decision-making, for example, in rice, wheat, down, and jowar crops (Shah et al., 2015). The authors consider different information sources (geospatial, weather, climate, production data) to create a crop recommendation system to predict crop yields and identify crop production trends. The recommender system will enable farmers to profit from the results delivered by Data Analytics. However, the authors do not describe Data Analytics as a Layer; they consider it a process for

and pre-modeling outlier cleanup (Data Processing); then perform all the modeling of the data to be processed and implement the techniques of artificial intelligence (Data Modeling); finally deliver the results to users in the best way and interpretation (Data Visualization). These processes are described in more detail below.

3.3.1. Data Storage

In the Data Storage process, all data from different sources are stored, which can be: weather data from satellite images, weather stations, crop management information, multi-spectral images, vegetation indices, product price in market exchanges, etc. This process also seeks to implement acceptable data storage practices in terms of structure, which allows for increased productivity in data query performance.

3.3.2. Data Processing

Data Processing is the conversion of data into a usable and desired form. This conversion or “processing” is performed using a predefined sequence of data selection and data cleansing operations.

In Data Selection, you decide the data for performing the analysis. Criteria include the importance of data concerning objectives, quality, and technical constraints. Therefore, it should be considered which data will be included or excluded in the dataset.

Data Cleansing is the phase where you have to decide whether to increase the data quality to the level required by the selected analysis techniques. This phase can involve selecting subsets of clean data or inserting appropriate default data. This phase's main activities are: correct, remove, or ignore the noise of the data, and decide how to deal with particular values.

3.3.3. Data Modeling

In the Data Modeling stage, modeling techniques are selected. Four phases must be taken into account: Modeling Technique Selection, Tests Design, Model Construction, and Model Evaluation. Below is a brief description of each phase.

In Modeling Technique Selection, you select the basic modeling technique to use. If multiple techniques are applied, this phase is performed for each selected technique. Many modeling techniques make specific assumptions about the data, for example, that all attributes have uniform distributions, that no disallowed values are present, and so on.

Tests Design defines how the quality and validity of the model are tested. Typically, the dataset is separated into a training set and a test set; the model is built on the training set, and its quality is estimated over the test set. This phase describes the plan for training and evaluating models and determines how the available dataset is divided into training and test data.

In Model Construction, you run the modeling tool on the dataset to create one or more models. Many parameters can be adjusted in any
creating the recommendation system.
modeling tool, so it is necessary to list the parameters and their chosen
Data Analytics Layer works as following: you must collect the
values with the reasoning to adjust the parameters to that value.
different sources of information (Data Storage); make a data selection
In this last Model Evaluation, the model has to be interpreted ac
cording to the desired domain, success criteria, and test design. Also,
classify it, evaluate it according to the evaluation criteria, apply a single
technique more than once, generate results with several different
methods, and compare all the results according to the evaluation
criteria.
The objective is to summarize the evaluation results regarding
business success criteria and services, including a final statement
establishing whether Data Analytics Layer has achieved business
objectives.
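
The Tests Design, Model Construction, and Model Evaluation phases can be illustrated with a minimal sketch. It is not the modeling pipeline used in this work: the sample values, the helper names (`holdout_split`, `fit_line`, `rmse`), and the choice of a one-variable least-squares line as the model are assumptions made only for illustration.

```python
from math import sqrt


def holdout_split(samples, n_test):
    """Tests Design: keep the last n_test samples as the unseen test set."""
    return samples[:-n_test], samples[-n_test:]


def fit_line(train):
    """Model Construction: ordinary least squares for y = a * x + b."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def rmse(model, test):
    """Model Evaluation: root mean squared error on the test set."""
    a, b = model
    return sqrt(sum((y - (a * x + b)) ** 2 for x, y in test) / len(test))


# Hypothetical (x, y) samples lying exactly on y = 2x + 1.
samples = [(float(x), 2.0 * x + 1.0) for x in range(10)]
train, test = holdout_split(samples, n_test=3)
model = fit_line(train)
error = rmse(model, test)
```

Holding out the most recent samples, instead of a random subset, mirrors the idea of estimating quality on data the model has never seen.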
4. Case Study
This section describes the coffee farm where the case study is carried
out, and the coffee value chain’s description at the crop stage.
Fig. 5. Data Analytics Layer.
4.1. Description Coffee Farm

The coffee farm “Los Naranjos” used for this case study belongs to the company Supracafé and is located in La Venta district, in the municipality of Cajibio, Cauca (21-35’08”N, 76-32’53”W). The farm is composed of 38 plots (Fig. 6); each of the plots is distinguished by the sown coffee variety; among the main coffee varieties planted on the farm are Castillo, Bourbon, Caturra, and Tabi.

Fig. 6. Plots distribution of the farm Los Naranjos.

4.2. Coffee Value Chain: Crop Stage

We modeled each stage of the coffee value chain from crop to export stages using BPMN (see Fig. 7). BPMN is a standardized graphical notation designed to represent the activities that make up an organization’s business processes and the messages flowing between participants and each of the activities (Chinosi and Trombetta, 2012).
Due to the limited length of the paper, we only present in detail the crop stage. Nevertheless, the whole BPMN-based coffee value chain is available at the following repository: https://github.com/iotagro2018/BPMN-coffee-model.
With the value chain in BPMN, we have benefits such as (Mirabelli et al., 2012): high flexibility, reduced development time, reduced implementation costs, high usability, management and control of actors, processes and data, easy information exchange between the different actors of the supply chain, and the appropriate level of integration with the data system. In the coffee sector, the BPMN value chain allows coffee growers to automate complex processes, gain greater transparency of communication and information, and reduce costs in each of their operations.

Fig. 7. BPMN Coffee Value Chain.

The Crop is the first stage of the coffee value chain, and its inputs are the coffee seeds. This stage has four associated roles of coffee growers at the beginning of the Crop: Germinator, Seedbed, Planting, and Growth (Fig. 8).

• Germinator: This stage begins developing the plant’s vegetative organs (root, stem, and leaves). In this stage, the lab technician classifies the seeds according to the seed origin and coffee varieties, ensuring healthy coffee trees. Secondly, the technician distributes the seeds over the moistened sand at the Seed Germination and then covers them with another sand layer. In this activity, the soil moisture and shadow variables are essential. The output of this stage is the seedlings ready to be transplanted into the bags.
• Seedbed: The input of this stage is the coffee seedlings already developed. In this stage, the lab technician classifies the seeds according to the vegetative pore. Then, the lab technician transplants the best seedlings into bags. The output of this stage is the coffee seedlings with their first leaves (Small Coffee Tree). In this activity, the soil moisture and shadow variables are essential.
• Planting: At this stage, the inputs are the small coffee trees. At the Small Coffee Tree Classification, the little coffee trees are classified by growth and leaf foliage. The coffee grower adapts to the conditions of the terrain where the small coffee trees are planted. Then the tiny coffee trees are transplanted into the ground, where soil properties are essential variables. After this stage, the coffee crops are ready to be cared for and harvested for a long time.
• Growth: The Fertilization, Integrated Weed Management, and Integrated Plagues and Disease Management stages continue throughout the entire life of the crop. Fertilization is understood as the nutrients that coffee trees need before and after harvests. Integrated Weed Management is relative to the land, such as a scythe and weed cleaning. Integrated Plagues and Disease Management are activities to prevent diseases of the coffee crop.

Fig. 8. Crop stage.

5. Solution Applying in the Case Study

This section describes the implementation of the IoT-Agro architecture in the case study, and specifically in the Crop stage of the coffee value chain (described in Section 4). Below we present the results obtained in implementing the three layers that make up the IoT-Agro architecture.

5.1. Implementation of the Agriculture Perception Layer

Fig. 9 shows the physical infrastructure; it identifies the elements described in a general way in the Agriculture Perception Layer of Fig. 2. We evaluate different IoT technologies in several farm plots to compare their performance, usability, and price. We conducted this comparison between satellite services and weather stations (MeteoBlue vs. Libelium and Omicron) and between weather stations and low-cost Tag devices (Libelium and Omicron vs. GWS-CSCG INTEL Logistic Monitoring Tag devices). Meteoblue is a Weather Service created at the University of Basel, Switzerland, in cooperation with the U.S. National Oceanic and Atmospheric Administration and the National Environmental Prediction Centers (Universitat Basel, 2006).
We use IoT devices, such as weather stations available in the market. Specifically, we installed Omicron and Libelium Smart Agriculture Pro. The first of them corresponds to a low-cost station available in the Colombian market. The second is a weather station with great international prestige in the IoT sector with more than 120 countries. Omicron has devices for measuring temperature, humidity, and soil humidity in the plots compared with the GWS-CSCG INTEL Logistic Monitoring Tag devices developed by Intel to monitor and track objects.
Table 2 presents analyzed parameters, such as the price, number of inputs for sensor connection, communication protocols, portability, battery life, and harvesting energy system such as a photo-voltaic panel. Fig. 10 compares the values reported by the Libelium climatic station and the low-cost nodes installed in the plots. In general, the nodes deliver higher temperature levels than the professional climatic station. For humidity, Omicron reported higher values than the GWS-CSCG INTEL Logistic Monitoring Tag relative to those reported by the professional station. This evaluation shows that the GWS-CSCG INTEL Logistic Monitoring Tag provides a useful humidity measurement, whereas the temperature values are significantly out of phase for both low-cost technologies evaluated.
With the goal of evaluating meteorological monitoring technologies, and based on the analysis proposed in Colston et al. (2018), we compared satellite climate data against weather station data. We chose the Pearson correlation coefficient (R) and MSE as metrics for this study. Fig. 11 shows temperature and humidity measurements for Omicron, Libelium, and MeteoBlue technologies for a selected group of days with different climatic conditions. The three technologies behave similarly for both variables. However, Omicron reports higher temperature levels; Libelium and Omicron report almost the same value at low temperatures. Likewise, in terms of humidity, Libelium delivers saturated levels. Fig. 12 shows the wind speed and direction measurement of the three technologies for a group of days. Libelium and Meteoblue report data regarding wind direction and speed; Libelium reports greater wind values than Meteoblue between the southeast and east. Omicron
presents an inaccurate report in terms of wind direction. Meteoblue offers too low values, implying a low estimate of the wind speed due to the measurement mechanism via satellite compared to devices installed on the farm.
Fig. 13 shows scatter plots of the two sources’ daily variable values, the satellite service (X-axis) variables against station-based data (Y-axis). For the temperature variable, there was a high correlation (R > 0.9) between the temperatures measured at weather stations and the satellite service. Libelium had the lowest level of statistical agreement for temperature according to Pearson’s correlation. However, it has the lowest MSE value. Furthermore, relative humidity showed a moderate correlation between the satellite service and station-based data at the farm; Omicron presents less significant values; in terms of MSE, humidity values were more important than temperature. On the other hand, wind speed and direction obtained from satellite service show a low correlation with those recorded by the weather stations (Omicron and Libelium) and exhibit considerable biases and poor statistical agreement. The

Fig. 9. Components in Agriculture Perception Layer Infrastructure.
Table 2
Capabilities between the weather stations and end nodes.
Type                   Technology  Price   Inputs for Sensors  Communication  Portability  Battery Life  Harvest Energy
Weather Stations       Libelium    HIGH    6                   WiFi, wired    MEDIUM       HIGH          YES
Weather Stations       Omicron     MEDIUM  3                   WiFi, Sigfox   HIGH         HIGH          YES
End node on land plot  Intel       LOW     3 Integrated        Zigbee, BLE    HIGH         MEDIUM        NO
End node on land plot  Omicron     MEDIUM  3                   WiFi           MEDIUM       MEDIUM        YES
Fig. 10. Comparison of temperature and humidity for Omicron, Intel and Libelium technologies.
Fig. 11. Comparison of temperature and humidity for Omicron, Libelium and Meteoblue technologies.
Fig. 12. Comparison of Wind directions and speeds for Libelium, Omicron and Meteoblue technologies.
Fig. 13. Comparison between weather stations and satellite service data.
wind is a challenging variable to measure remotely, so we consider that this variable must be measured in a localized way, even more so when the satellite service resolution is 5 km.
The comparative analysis shows that Omicron’s low-cost weather station reports similar humidity and low temperatures; however, it reports higher temperature values because its sensor is less robust against solar rays. Meteoblue satellite data provides limited values for low and high temperature and humidity levels; however, they are helpful when completion is required after faults occur.
In terms of wind speed and direction, the Libelium weather station presented a report with a wide range of values. Omicron presented errors in the direction sensor, and Meteoblue presented a low resolution and opposite values to those provided by Libelium; therefore, the data provided by the satellite service are not adequate for completion in the event of possible failures for the mentioned variables. Table 3 summarizes the evaluation of the four meteorological variables for the two devices against the satellite service.

5.2. Implementation of the Edge Layer

Fig. 14 depicts the implemented and deployed architectural Edge Layer in the Colombian Coffee Farm, introduced in Section 4.1. The Edge Layer described in Section 3.2 was instantiated as follows. Network Modules are Ubiquiti antennas and access points; Edge server, outlier detection, data recovery, and local database run over a Raspberry Pi 3 Model B; and the gateway consists of a set of Raspberry Pi 3 Model B with EPS modules.
Following the data reliability importance presented in Section 3.2 in terms of outlier detection and data recovery, this section evaluates the data cleaning mechanisms implemented in our case study. To assess our outlier detector, we conducted several experiments comparing the
Table 3
Evaluation statistics for weather stations and satellite service.
Technology  Parameter              Temperature  Humidity  Wind Speed  Wind Direction
Libelium    Pearson’s correlation  0.927        0.777     0.45        0.2527
Libelium    MSE                    2.782        15.11     3.142       119.64
Omicron     Pearson’s correlation  0.93         0.828     0.093       -0.112
Omicron     MSE                    3.142        18.683    77.072      48326.58
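
The two statistics reported in Table 3 can be computed as in the following minimal sketch. The temperature values and the names `pearson`, `mse`, `satellite`, and `station` are hypothetical and chosen only for illustration; they are not measurements from the farm.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def mse(xs, ys):
    """Mean squared error between two equally long series."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical daily temperatures: the station reads a constant 1 degree higher.
satellite = [18.0, 20.0, 22.0, 24.0]
station = [19.0, 21.0, 23.0, 25.0]
r = pearson(satellite, station)
err = mse(satellite, station)
```

In this example the two series are perfectly correlated even though the MSE is not zero, which is why the two metrics are read together: correlation captures agreement in trend, while MSE captures the size of the offset.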
Fig. 14. Components in Edge Layer Infrastructure.
performance of the ML algorithms presented in Section 3.2.1 with a dataset collected from our Agriculture Perception Layer in terms of (i) recall (i.e., the percentage of typical total values), (ii) precision (i.e., the fraction of normal values properly identified among the instances classified as normal), (iii) F-Score (i.e., the harmonic mean of precision and recall), and (iv) FAR (i.e., the percentage of falsely detected typical values of the instances classified as outliers). This dataset has 22532 instances of temperature measurements sensed with our weather station Libelium for a month. We take a high precision and a low FAR to indicate an algorithm’s best performance in detecting outliers. We vary the eps (e) parameter of DBSCAN from 0.01 to 0.8, the contamination (c) parameter of IF from 0 to 0.5, and the nu parameter of SVM from 0.01 to 0.1. Moreover, to evaluate the outlier detector’s performance in the worst case, we insert 10% of outliers into the dataset. The following figures depict the results for each ML algorithm.
Fig. 15 presents the DBSCAN outlier detection performance. The evaluated algorithm obtained excellent Recall, Precision, and F-Score for eps values below 0.1. For eps values above 0.1, DBSCAN had limitations in identifying outliers close to normal data, which increased the FAR significantly. Moreover, DBSCAN mistakenly classified values as normal for each eps value, although this percentage is smaller for small eps values.
Fig. 16 depicts the IF outlier detection performance. This algorithm obtained an excellent result regarding Recall, Precision, F-Score, and FAR with the c parameter set to 0.1. IF improved the classification of normal values when c increases. However, the recall decreased, indicating outliers identified improperly.
Fig. 17 presents the SVM outlier detection performance. The results highlight that SVM is inefficient for classifying many outliers. Even for the best results regarding Recall, Precision, and F-Score (nu = 0.1), the FAR is higher than 7%. Based on the results above, we consider IF the best option for carrying out outlier detection. IF obtained an excellent performance for marking outliers, with recall, precision, and F-Score greater than 99%, and zero FAR.
To evaluate our Data Recovery, we use RMSE (Chai and Draxler, 2014). RMSE measures the amount of error between the two datasets. We calculate the RMSE of the original dataset versus the interpolated dataset to identify the most accurate technique. The lower the RMSE, the better the performance of the Interpolation technique. Table 4 presents the results considering the three Interpolation techniques presented in Section 3.2.2.
Table 4 indicates that the Nearest Neighbor technique obtained the worst RMSE; Nearest Neighbor selects the nearest datum without considering neighboring data values, which tends to increase noise. Cubic Spline offers a slightly better result than Linear Interpolation. Based on the results above, we recommend Cubic Spline and Linear Interpolation to conduct the data recovery.

5.3. Implementation of the Data Analytics Layer: Coffee production estimation

This section looks at the implementation of the Data Analytics Layer in the case study. The results provide the estimate of coffee production based on weather information and crop management information.
The coffee production estimation is useful for scheduling coffee crop management tasks as a support tool for decision-making. There have been remarkable advances in the development of methodologies to predict bean production from a national perspective. The first approaches to the coffee production estimation were based on supply functions that depend on the harvested area and historical production levels. Later, these functions added variables such as the yield of plants by age, prices paid to the producer, and the effects of applied technology. There are also approximations from the computer vision area for counting cherry coffee beans from images taken directly in the crop (Rodriguez et al., 2020). Currently, the coffee production estimation implies a sampling methodology in coffee plantations. According to Ramos et al. (2015), the FNC (Federación Nacional de Cafeteros) now estimates the national coffee production with direct measurements in the field, leaving out the cherry beans collected from coffee production (destructive sampling). Each sample consists of 60 coffee trees per hectare, in an area of 20000 hectares. Although the FNC methodology can compute a cherry coffee production estimation, this approach has four shortcomings: counting error in the sampling process, insufficient coffee bean samples, significant expenses of costs and time, and coffee bean losses.
We propose a Data Analytics Layer for the coffee production estimation based on weather data and crop management information. Each phase of the Data Analytics Layer is described below.

5.3.1. Data Storage
In the Data Storage component, information from three sources is stored: climate data from the Agriculture Perception Layer, weather data, and crop management data.
Climate data is provided by the Agriculture Perception Layer. This information consists of four variables: ambient temperature, relative humidity, rainfall, and solar radiation. It is essential to highlight that this information is captured from weather stations and sensors installed on the farm (5.1).
Meteoblue provides weather data. The weather information consists of more than 20 variables with a daily timeline; some of the most representative variables of this service are temperatures, relative humidity, rain, solar radiation, etc. This information is read every day at two access points.
Crop management data is provided by the Naranjos farm (Section 4.1). This information is composed of fertilization, control, and cleaning activities. It also has information on coffee production. This crop management information is for the years 2012, 2013, 2014, 2016, 2017, and 2018; there is no information for 2015 due to problems in registering the farm data. This information is registered manually by the farm administrator in Excel files.
The features of crop management (control, fertilization, and cleaning) were processed to calculate new attributes, such as weeds control, disease control, pest control, renewal, and fertilization. In addition, crop
Fig. 15. Results of DBSCAN.
Fig. 16. Results of Isolation Forest.
Fig. 17. Results of Support Vector Machine.
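
The evaluation of the Edge Layer data cleaning (detection metrics, then data recovery scored with RMSE) can be sketched as follows. This is an illustrative reading, not the code deployed on the farm: the readings, the detector output in `predicted`, and the helper names are hypothetical, normal readings are treated as the positive class, and FAR is interpreted as the share of flagged readings that were actually normal. Only linear interpolation is shown; Cubic Spline would need a numerical library.

```python
from math import sqrt


def detection_metrics(is_outlier_true, is_outlier_pred):
    """Precision, recall, F-Score and FAR with normal readings as positives."""
    pairs = list(zip(is_outlier_true, is_outlier_pred))
    tp = sum(1 for t, p in pairs if not t and not p)  # normal kept as normal
    fp = sum(1 for t, p in pairs if t and not p)      # outlier kept as normal
    fn = sum(1 for t, p in pairs if not t and p)      # normal flagged as outlier
    tn = sum(1 for t, p in pairs if t and p)          # outlier flagged as outlier
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    far = fn / (fn + tn)  # assumed reading of FAR, see lead-in
    return precision, recall, f_score, far


def recover_linear(values, flagged):
    """Replace flagged readings by linear interpolation of clean neighbours."""
    recovered = list(values)
    good = [i for i, f in enumerate(flagged) if not f]
    for i, f in enumerate(flagged):
        if not f:
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:        # no earlier clean reading: copy the next one
            recovered[i] = values[right]
        elif right is None:     # no later clean reading: copy the previous one
            recovered[i] = values[left]
        else:
            w = (i - left) / (right - left)
            recovered[i] = values[left] + w * (values[right] - values[left])
    return recovered


def rmse(a, b):
    """Root mean squared error between two equally long series."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


clean = [20.0, 21.0, 22.0, 23.0, 24.0, 25.0]
sensed = [20.0, 21.0, 80.0, 23.0, 24.0, 25.0]          # injected outlier at index 2
truth = [False, False, True, False, False, False]
predicted = [False, False, True, False, True, False]   # hypothetical detector, one false alarm
precision, recall, f_score, far = detection_metrics(truth, predicted)
recovered = recover_linear(sensed, predicted)
recovery_error = rmse(clean, recovered)
```

The sketch does not guard against empty classes (for example, a detector that flags nothing), which would cause a division by zero.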
Table 4
Data recovery results.
Interpolation Techniques  RMSE
Cubic Spline              0.0849270614
Linear                    0.0965005343
Nearest Neighbor          3.8505036271

Table 5
Dataset features.
INFORMATION      FEATURES             TYPE     DESCRIPTION
Crop Management  Control              Nominal  Feature indicating whether or not pest and disease control activities have been done in the crop.
Crop Management  Fertilization        Nominal  Features indicating whether or not fertilizer application activities have been done in the crop.
Crop Management  Crop cleaning        Nominal  Feature indicating whether or not cleaning activities have been done in the crop.
Crop Management  Coffee production    Numeric  Target variable representing coffee production in kilograms (Kg).
Weather Data     Average temperature  Numeric  Average temperature in centigrade (°C).
Weather Data     Maximum temperature  Numeric  Maximum temperature in centigrade (°C).
Weather Data     Minimum temperature  Numeric  Minimum temperature in centigrade (°C).
Weather Data     Relative humidity    Numeric  Relative humidity in percentage (%).
Weather Data     Rainfall             Numeric  Rainfall in millimetres (mm).
Weather Data     Solar radiation      Numeric  Solar radiation in watts per square meter (W/m2).

management features were modified from nominal to numerical values, computed as the number of acts per control management category performed during that week.
The data set comprises 318 features corresponding to the information on crop management and weather data and 107 samples, of which 90 belong to the training set and 17 to the test set (main harvest of the year 2018).

5.3.2. Data Processing
Data Selection was made from expert knowledge (coffee growers) and literature (systematic review) to estimate coffee production, where the most influential variables in the domain are reviewed. Table 5 shows the selected variables for weather data and crop management data. In addition, we used a feature selection method to improve the results. The Pearson Correlation (PC) filter method (Granitto et al., 2006) uses Pearson’s correlation to select the best subset of variables from the highest absolute value of correlation between an independent and dependent variable. We define as threshold a Pearson correlation <= 0.2 due to several iterations to adjust from the results.
The filter method selected 112 features from the 318 of the initial data set; this reduces the dimensionality of the data set and improves the results of the coffee estimation.
Data Cleansing was done as follows: for weather data was verified within the ranges set for each variable, where there were no difficulties because the data provider service sends structured and proved data. In contrast to crop management data, there were several drawbacks because the data recording is manual, where errors were found, such as spelling, poor writing, lack of data, etc. The advantage is that the farm manager could complete the information. It was unnecessary to apply methods for processing outliers in this phase due to the specific mechanisms to provide reliable data at the Edge layer presented in Sections 3.2.1 and 3.2.2.

5.3.3. Data Modeling
ML algorithms were used in Data Modeling to estimate coffee production.
The Modeling Technique Selection was based on literature where there are quite a few efforts to detect coffee diseases, but few jobs are for production. The selected machine learning techniques are described below.
• TreeRegressor (TR) (Quinlan, 1992): A regression tree is similar to a classification tree, except that the target variable takes ordered values, and a regression model is fitted to each node to give the predicted values. This model breaks down a dataset into smaller subsets while at the same time incrementally developing an associated regression tree.
• Artificial Neural Network (ANN) (Corrales et al., 2017): For this work, a Multi-Layer Perceptron (MLP) network was used. The MLP is made up of nodes that are organized into layers: input, hidden, and output layers. MLP has no short-cuts, so each node within a layer is connected to each node in the following layer. The topology of the MLP (i.e., number of hidden layers, number of nodes in each hidden layer, the activation function of each neuron, etc.) is fixed and established by an expert or follows default parameters given by a software platform.
• Extreme Gradient Boosting (XGBOOST) (Chen and Guestrin, 2016): XGBoost belongs to a family of boosting algorithms and uses the gradient boosting (GBM) framework at its core. It is an optimized, distributed gradient boosting library. Boosting is a sequential technique that works on the principle of an ensemble. It combines a set of weak learners and delivers improved prediction accuracy. The outcomes predicted correctly are given a lower weight, and the ones misclassified are weighted higher.
• Support Vector Regression (SVR) (Nisha and Sreekumar, 2017): A Support Vector Machine tries to get a hyperplane as a boundary for a decision, so the separation between the patterns at each side (one side for each class) of the hyperplane is maximum. The approach that SVR follows to denote the hyperplanes is based on statistical learning theory. SVR can also be applied to regression tasks. In this case, a distance measure is included in the loss function.
• Random Forest (RF) (Segal, 2004): The random forest model is a type of additive model that makes predictions by combining decisions from a base model sequence. This broad technique of using multiple models to obtain better predictive performance is called model ensembling. In random forests, all the base models are constructed independently using a different subsample of the data.

We select the ML models according to their excellent results in work related to coffee crops (Corrales et al., 2017; Corrales et al., 2015). In addition, the XGBOOST and RF models are ensemble methods that, based on weak models such as regression trees, considerably improve the results.
The Tests Design consists of the training data set and the test data set. The training data set comprises data for the years 2012, 2013, 2014, 2016, and 2017. Because we want to estimate the predictive performance (generalization performance) (Raschka, 2020) of our model on future (unseen) data, the testing data set contains real unseen data which were not obtained by using cross-validation. In addition, because the size of the data set is small and the models will be used by end-users (coffee growers), the evaluation of the models is carried out with real data that correspond to the main harvest of the year 2018.
The ML models (Model Construction) were implemented in the Python 3 language with the sklearn library. The models were trained with the training data and with the parameters shown in Table 6. In total, five models were obtained, which receive the input parameters (weather data and crop management data) and give as output the estimate of coffee production in kilograms. Fig. 18 shows each model’s coffee production estimate for the main harvest of 2018. The estimation is made on a monthly scale for the analyzed harvest period (April - July).

Table 6
Parameter values used for ML models.
Models   Params              Values
TR       splitter            best
TR       criterion           mae
TR       max_features        sqrt
ANN      solver              lbfgs
ANN      alpha               1e-5
ANN      hidden_layer_sizes  (2, 2)
ANN      random_state        12
ANN      max_iter            10000
XGBOOST  objective           reg:linear
XGBOOST  booster             gbtree
SVR      kernel              poly
SVR      c (regularization)  100
SVR      gamma               auto
SVR      degree              3
SVR      epsilon             .1
SVR      coef0               1
RF       n_estimators        1000
RF       random_state        42

The XGBOOST and RF algorithms get the values closest to the real coffee production. There is a difference of approximately 1 thousand kilograms for XGBOOST and 55 thousand kilograms for RF.
As mentioned above, the difference between the estimated and the real values is quite significant because the models do not consider variables of coffee bean loss at the harvest stage of the coffee value chain. According to a CENICAFE study (Ocampo-Lopez et al., 2017), the average loss in traditional harvesting is 4.46 coffee beans per tree. However, a coffee crop has approximately 5 thousand to 10 thousand coffee trees per hectare (Farfan et al., 2016). Besides, 1 kilogram of mature coffee (cherry) has an average of 555 coffee beans (Martinez Marin, 2015). Specifically, the case study farm has, on average, 4.57 thousand trees per hectare and a total of 38 plots of one hectare each. If we calculate the loss of coffee beans per tree, we would have 20.4 thousand lost beans per hectare and 775.35 thousand throughout the farm. If we convert this amount to kilograms, we would be talking about 1.397 thousand kilograms (approximately 1.4 tons) of loss in collection throughout the farm. Finally, it would be essential to consider the loss of coffee beans generated by transport from the crop to the collection center.
Fig. 18 shows that the two models (XGBOOST and RF) have the most remarkable correlation against the real coffee production curve; this is reflected in the evaluation metrics implemented in the Model Evaluation phase.
The following evaluation metrics were taken into account in the Model Evaluation (Gholipoor and Nadali, 2019; Rodriguez et al., 2018): PC, RMSE, MAE, and Relative Squared Error (RSE).
The results presented in Table 7 show that the measured error values are close to each other. However, in the three error evaluation metrics, the XGBOOST algorithm keeps the values lower than the other models, while the ANN, TR, SVR, and RF models get the highest MAE values. For the correlation metric, the best algorithms were RF, SVR, and XGBOOST, which obtained a positive correlation greater than 0.45. From this analysis of the models’ evaluation, we can say that the algorithm with the lowest error value and with the best proximity to the actual coffee production value is the XGBOOST algorithm.
We made this coffee production estimation with classic ML models, which is the first experimentation of this estimation mechanism in coffee crops. It is essential to mention that the models implemented did not consider variables of loss of coffee bean collection, which are a crucial factor in measuring the weighing of what was collected (Ocampo-Lopez et al., 2017).

5.4. IoT-Agro Services

This section presents the primary services offered by IoT-Agro in the Data Analytics Layer: Environment Variable Monitoring and Coffee Production Estimation. Fig. 19 depicts the Graphical User Interface (GUI) of the IoT-Agro web site (i.e., https://www.iot-agro.com/) for the Smart Farming services.
IoT-Agro offers monitoring services of environmental variables on three spatial scales: region, farm, and plot. The satellite service is used to obtain weather information of the area with an accuracy of 5 Km by Meteoblue (Universitat Basel, 2006). Libelium and Omicron weather
Fig. 18. Coffee production estimation from the main harvest of 2018.
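
The four evaluation metrics named for the Model Evaluation (PC, RMSE, MAE, and RSE) can be computed as in the sketch below. The paper does not state its formulas, so the common textbook definitions are assumed here, with RSE taken as the squared error relative to that of always predicting the mean; the function name `evaluation_metrics` and the numbers are hypothetical.

```python
from math import sqrt


def evaluation_metrics(actual, predicted):
    """Return PC, RMSE, MAE and RSE for two equally long series."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    sq_err = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    abs_err = sum(abs(a - p) for a, p in zip(actual, predicted))
    return {
        "PC": cov / sqrt(var_a * var_p),   # Pearson correlation
        "RMSE": sqrt(sq_err / n),          # root mean squared error
        "MAE": abs_err / n,                # mean absolute error
        "RSE": sq_err / var_a,             # relative squared error (assumed definition)
    }


# Hypothetical normalized production values and estimates with a constant offset.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
scores = evaluation_metrics(actual, predicted)
```

A constant offset leaves PC at 1 while RMSE, MAE, and RSE stay above zero, so the correlation and the error metrics are best read side by side.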
Table 7
Evaluation metrics.
      TR     ANN    XGBOOST  SVR     RF
PC    0.014  0.0    0.459    0.481   0.567
RMSE  0.014  0.010  0.008    0.026   0.007
MAE   0.065  0.065  0.032    0.091   0.044
RSE   0.262  0.474  0.585    -0.429  0.597

Fig. 19. GUI - IoT-Agro services.

stations offer many variables: ambient temperature, ambient humidity, solar radiation, rain, wind speed and direction, atmospheric pressure, and soil moisture. With them, we get more specific information on the environmental conditions of the farm. Finally, low-cost sensors are responsible for measuring some variables such as environmental temperature and humidity in the plots where the coffee is grown to obtain specific crop data. Fig. 20 shows historical data from the Intel device placed in the coffee plot throughout October; the average information is displayed hourly, daily, monthly as the graphic map.
IoT-Agro estimates coffee production from weather data and crop management data using machine learning models. We have achieved promising results in the Naranjos farm prototype. The coffee production estimate module on the IoT-Agro platform comprises the plot in which data provided by the coffee grower plus the weather data are the inputs of the model. Finally, the platform shows the production graph with the real data vs. the estimate and indicates the accumulated one per year.
IoT-Agro brings the management applications of checking IoT devices’ status, configuring new IoT devices, and administering the IoT-Agro platform users. Fig. 23 presents the service to consult the current battery level and signal strength of all devices at the farm. The user only needs to select the technology and the specific device to make the query. This application allows the farmer to identify and verify real-time device outages or deficiencies due to low battery. In this way, the farmer can carry out the corresponding tasks in an appropriate manner. For instance, servicing the devices’ batteries on time avoids losing data and keeps the infrastructure maintained.

6. Discussion

To validate the proposal presented in this paper and particularly the web IoT-Agro services illustrated in the above section, we conducted the User Acceptance Tests (UAT) (Cimperman, 2006) with several farming stakeholders. Table 8 presents the ranking (from 0 to 5, 0 being the lowest acceptance, 3 the intermediate, and 5 the highest acceptance) given by the users that performed the UAT. The stakeholders vary from
the production is estimated; the batch area is displayed on the map. The farmers and operators to researchers and students from institutions such
coffee grower must also enter agricultural activities (fertilization, con
trol, and cleaning) (see Fig. 21).
After the coffee grower presses the ”Estimate” button, the platform
automatically loads the weather data and graphs it (see Fig. 22). The
14
as TECNICAFÉ¹, EcoTecma SAS², and Universidad del Cauca³. The test evaluated the usability, the straightforward interpretation of data, and the look & feel of the www.iot-agro.com web application. The results of the UAT tests prompt the following discussion.

¹ https://www.tecnicafe.co/
² https://ecotecma.com.co/
³ https://www.unicauca.edu.co/

Fig. 20. GUI - Historical weather data.
Fig. 21. GUI - Coffee Production Estimation Module - Query Parameters.

For the coffee farm director and foreman, access to climate information, such as that from weather stations, has great value because of the current climatic variability. Our work is motivated by the need to provide reliable data through IoT applications to guarantee correct decision-making support for farming practices. Traditionally, decisions such as where to harvest and when to fertilize or apply a fungicide are made by observation and experience. For instance, a coffee farmer can lose a
considerable amount of beans if he decides to harvest a partially ripe lot instead of a fully ripe lot, because the fully mature coffee beans fall from the tree and are lost. This problem is even more severe for large farms, since it is difficult for the farmer to cover the entire farm to examine the crops' state empirically. The rains, humidity, and temperature directly affect the maturation time of harvests. These farming practices can then be supported by a monitoring system of environmental variables that informs the farmer of climatic changes.
From the point of view of agriculture experts and researchers, it is essential to take preventive actions based on real historical data to mitigate or avoid diseases, for example, rust, iron stain, and descending death, or attacks by pests such as the bit, pigeons, and malunion.

Fig. 22. GUI - Coffee Production Estimation Module - Results.
Fig. 23. GUI - IoT devices status query.

Table 8
Farming stakeholders.
Stakeholders                                                                    Ranking
Coffee farm operational director                                                3.0
Coffee farm foreman                                                             5.0
Farm-based software application developer                                       5.0
Environmental monitoring and management expert                                  3.8
Researcher of A.I. applied in coffee agriculture                                5.0
Researcher of green coffee defects classification using machine vision          4.8
Researcher of a model of estimation of ground data based on satellite images    4.9
Mean                                                                            4.5
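As a quick arithmetic check of Table 8, the reported mean acceptance of 4.5 can be reproduced from the seven individual rankings. The Python sketch below is illustrative only: the dictionary and variable names are hypothetical and are not part of the IoT-Agro platform; only the ranking values come from Table 8.

```python
from statistics import mean

# Rankings copied from Table 8 (scale 0-5). The structure and names used
# here are illustrative assumptions, not part of the IoT-Agro code base.
rankings = {
    "Coffee farm operational director": 3.0,
    "Coffee farm foreman": 5.0,
    "Farm-based software application developer": 5.0,
    "Environmental monitoring and management expert": 3.8,
    "Researcher of A.I. applied in coffee agriculture": 5.0,
    "Researcher of green coffee defects classification using machine vision": 4.8,
    "Researcher of a model of estimation of ground data based on satellite images": 4.9,
}

# Arithmetic mean of the seven rankings (31.5 / 7).
mean_rank = mean(rankings.values())

# Stakeholders whose ranking falls below the mean.
below_mean = [who for who, score in rankings.items() if score < mean_rank]

print(f"Mean ranking: {mean_rank:.1f}")
print("Below the mean:", below_mean)
```

The seven rankings sum to 31.5, so the mean is 4.5, as reported in the table; only the operational director and the environmental monitoring expert ranked the platform below that value.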
Likewise, the prediction of abnormal situations can lead to other actions such as nutrition. Suppose that the state of crop development is well known, particularly the phenological⁴ phase and the reproductive and productive needs. Using the prediction, we can ensure the ideal frequency of nutrition that allows suitable fruits to grow. This action will enable the plant to withstand phytosanitary⁵ situations without reaching the point of economic damage, thanks to physiological development cross-referenced with environmental conditions.

⁴ periodic biological phenomena correlated with climatic conditions
⁵ of, relating to, or being measures for the control of plant diseases, especially in crops

All stakeholders conclude that, based on climate information (current and forecasted) provided by satellite imaging services, farmers can adjust crop management plans such as nutrition, phytosanitary management, planting scheduling, renovations, crops, and irrigation scheduling. However, it is crucial to compare the satellite information with data from weather stations to make accurate cultivation decisions.
Storing climate data in the Data Analytics Layer helps farmers access information that, combined with crop management information, allows technical calculations for different crops, such as irrigation needs and early warnings of phytosanitary problems. Data storage at the top layer also enables them to do regional planning to seek economic stability, promote markets, adjust offers, and even influence prices to benefit producers.
Besides, from the stored information, farmers can estimate their crops' production with artificial intelligence models. Estimating crop production lets farmers negotiate without committing to deliveries they cannot fulfil. There is also the option to calculate the proportions of the different qualities that can be produced and, in the same way, adjust the quantities offered per quality; this also gives a better view of the structure of production costs and the profitability of the production year.

7. Conclusions and Future Work

We highlight that the data represent the primary input in an SF deployment, and their analysis is the crucial factor in decision-making to support farming practice. This paper introduces IoT-Agro, a complete solution implemented in a Colombian farm to help farmers in tasks such as scheduling the harvest time based on climatic changes, preventively mitigating crop diseases based on the historical farm data, and estimating the crop production per year.
This paper proposes a fully implemented Smart Farming architecture with three layers (Agriculture Perception, Edge, and Data Analytics) in a real Colombian coffee farm scenario. We conducted an extensive comparison of the IoT technologies implemented at the Agriculture Perception Layer in terms of price, communication protocols, portability, battery life, Pearson correlation, and RMSE. The technology comparison provides an essential guide for smart farming providers and researchers for any study that uses IoT to select the suitable infrastructure according to the specific needs. We recommend professional stations for evaluating variables such as wind speed and direction; the low-cost station provided very similar information in terms of temperature and humidity. In addition, satellite information is helpful for data completion, considering that it offers lower values.
We evaluated edge-based management mechanisms focusing on outlier detection and treatment of the data from the Agriculture Perception Layer using ML and Interpolation algorithms in terms of Precision, Recall, FAR, and RMSE. We recommend the Isolation Forest algorithm for classifying outliers in the monthly temperature dataset, with 99% precision, and the Cubic Spline technique for effectively replacing the data classified as outliers, with an RMSE lower than 0.085. The outlier analysis of humidity, pressure, and light measurements is out of the scope of the present paper, as is the differentiation of outliers from crop events of interest.
We assessed several AI algorithms to estimate coffee production in terms of PC, RMSE, MAE, and RSE. The ML models proposed for estimating coffee production in the Data Analytics Layer allow coffee growers to plan activities before the main harvest, reducing cost and time. Regarding limitations, we found that the information on crop management is scarce, which means that the ML models do not have a sufficient volume of data to be trained. However, it is noteworthy that the model obtained good results with the data obtained from the coffee farm (case study).
This paper also introduces a novel BPMN-based coffee value chain that describes all stages of the coffee value chain in detail, characterizes the relevant variables related to the quality of the coffee, and depicts the steps as a process with inputs and outputs in a standardized notation. This model offers the opportunity to improve, adapt, or modify the value chain in a standardized manner according to different coffee farm types and farmers' interests. Moreover, we presented the web platform www.iot-agro.com that enables coffee growers (i) to monitor environmental variables of the coffee zone by plot, farm, or region, (ii) to make their own crop management decisions for coffee trees based on their real data, and (iii) to plan the activities before the harvest based on the yearly coffee production estimate, avoiding money loss in production. The IoT-Agro platform validation led to a rich and relevant discussion with the stakeholders presented in this paper. Our proposal proved promising by implementing a functional, usable, and practical web platform that allows viewing all IoT-Agro services based on the value chain model's description.
We propose as future work: (1) implementing IoT-Agro services in more stages of the coffee value chain, such as storage, transport, and exportation, (2) adding more services to increase added value, such as traceability and blockchain deployment, and (3) constructing a recommendation system for controlling coffee diseases based on data collected by sensors and expert knowledge.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

We thank the Telematics Engineering Group (GIT) of the Universidad del Cauca and Tecnicafé for the technical support. This work has also been supported by Innovacción-Cauca (SGR-Colombia) under project “Alternativas Innovadoras de Agricultura Inteligente para sistemas productivos agrícolas del departamento del Cauca soportado en entornos de IoT ID 4633 - Convocatoria 04C-2018 Banco de Proyectos Conjuntos UEES - Sostenibilidad”. We also thank Ph.D. Rahul Khanna and Ph.D. Giby Raphael from Intel Corporation for their help with the GWS-CSCG INTEL Logistic Monitoring Tag devices. Finally, we thank the TaIO Systems company for their support and assistance with our project.

References

Bonomi, F., Milito, R., Zhu, J., & Addepalli, S. (2012). Fog computing and its role in the internet of things. Proceedings of the first edition of the MCC workshop on Mobile cloud computing - MCC '12, (March), 13. doi: 10.1145/2342509.2342513.
Carpio, F., Jukan, A., Martin Sanchez, A.I., Amla, N., Kemper, N., 2017. Beyond Production Indicators: A Novel Smart Farming Application and System for Animal Welfare. In: Proceedings of the Fourth International Conference on Animal-Computer Interaction, pp. 7:1–7:11. https://doi.org/10.1145/3152130.3152140.
Celik, M., Dadaşer-Celik, F., Dokuz, A.Ş., 2011. Anomaly detection in temperature data using DBSCAN algorithm. In: INISTA 2011 - 2011 International Symposium on INnovations in Intelligent SysTems and Applications, pp. 91–95. https://doi.org/10.1109/INISTA.2011.5946052.
Chai, T., Draxler, R.R., 2014. Root mean square error (rmse) or mean absolute error (mae)? - arguments against avoiding rmse in the literature. Geoscientific model development 7 (3), 1247–1250.
Chandola, V., Banerjee, A., Kumar, V., 2009. Anomaly detection: A survey. ACM computing surveys (CSUR) 41 (3), 1–58.
Chen, T., Guestrin, C., 2016. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794. https://doi.org/10.1145/2939672.2939785.
Chinosi, M., Trombetta, A., 2012. BPMN: An introduction to the standard. Computer Standards & Interfaces 34 (1), 124–134. https://doi.org/10.1016/j.csi.2011.06.002.
Cimperman, R., 2006. Uat defined: A guide to practical user acceptance testing (digital short cut). Pearson Education.
Colezea, M., Musat, G., Pop, F., Negru, C., Dumitrascu, A., Mocanu, M., 2018. CLUeFARM: Integrated web-service platform for smart farms. Computers and Electronics in Agriculture 154, 134–154. https://doi.org/10.1016/j.compag.2018.08.015.
Colston, J. M., Ahmed, T., Mahopo, C., Kang, G., Kosek, M., de Sousa Junior, F., Shrestha, P. S., Svensen, E., Turab, A., & Zaitchik, B. (2018). Evaluating meteorological data from weather stations, and from satellites and global models for a multi-site epidemiological study. Environmental Research, 165(October 2017), 91–109. doi: 10.1016/j.envres.2018.02.027.
Corrales, D.C., Figueroa, A., Ledezma, A., Corrales, J.C., 2015. An empirical multi-classifier for coffee rust detection in Colombian crops. In: Gervasi, O., Murgante, B., Misra, S., Gavrilova, M.L., Rocha, A.M.A.C., Torre, C., Taniar, D., Apduhan, B.O. (Eds.), Computational science and its applications - ICCSA 2015. Springer International Publishing, pp. 60–74. https://doi.org/10.1007/978-3-319-21404-7_5.
Corrales, D.C., Gutierrez, G., Rodriguez, J.P., Ledezma, A., Corrales, J.C., 2017. Lack of data: Is it enough estimating the coffee rust with meteorological time series? Computational Science and Its Applications - ICCSA 2017, 3–16. https://doi.org/10.1007/978-3-319-62395-5_1.
Ding, Z., & Fei, M. (2013). An anomaly detection approach based on isolation forest algorithm for streaming data using sliding window (Vol. 3). IFAC. https://doi.org/10.3182/20130902-3-CN-3020.00044.
Ester, M., Kriegel, H.-P., Sander, J., Xu, X., et al., 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. Kdd 96, 226–231.
Farfan V., F. F., & Sanchez A., P. M. (2016). Densidad de siembra del cafe variedad Castillo en sistemas agroforestales en el departamento de Santander Colombia [Accepted: 2016-07-18T15:03:17Z]. Retrieved December 14, 2020, from https://biblioteca.cenicafe.org/handle/10778/678.
Ferrag, M.A., Shu, L., Yang, X., Derhab, A., Maglaras, L., 2020. Security and Privacy for Green IoT-Based Agriculture: Review, Blockchain Solutions, and Challenges. IEEE Access 8, 32031–32053. https://doi.org/10.1109/ACCESS.2020.2973178.
Gholipoor, M., Nadali, F., 2019. Fruit yield prediction of pepper using artificial neural network. Sci. Hortic. 250, 249–253. https://doi.org/10.1016/j.scienta.2019.02.040.
Gia, T.N., Qingqing, L., Pe, J., 2019. Edge AI in Smart Farming IoT. In: CNNs at the Edge and Fog Computing with LoRa. (September).
Glaroudis, D., Iossifides, A., Chatzimisios, P., 2020. Survey, comparison and research challenges of iot application protocols for smart farming. Comput. Netw. 168, 107037. https://doi.org/10.1016/j.comnet.2019.107037.
Goncalves, R. R. d. V., Zullo, J., Peron, T. M., Evangelista, S. R. M., & Romani, L. A. S. (2015). Numerical models to forecast the sugarcane production in regional scale based on time series of NDVI/AVHRR images, 1–4. doi: 10.1109/Multi-Temp.2015.7245806.
Granitto, P.M., Furlanello, C., Biasioli, F., Gasperi, F., 2006. Recursive feature elimination with random forest for PTR-MS analysis of agroindustrial products. Chemometrics and Intelligent Laboratory Systems 83 (2), 83–90. https://doi.org/10.1016/j.chemolab.2006.01.007.
Gupta, M., Abdelsalam, M., Mittal, S., 2020. Security and Privacy in Smart Farming: Challenges and Opportunities. https://doi.org/10.1109/ACCESS.2020.2975142.
Hazewinkel, M. (2013). Encyclopaedia of mathematics: Volume 6: Subject index-author index. Springer Science & Business Media.
Kamilaris, A., Kartakoullis, A., Prenafeta-Boldu, F.X., 2017. A review on the practice of big data analysis in agriculture. Computers and Electronics in Agriculture 143 (January), 23–37. https://doi.org/10.1016/j.compag.2017.09.037.
Klapetek, P., 2013. Basic Data Processing. Quantitative Data Processing in Scanning Probe Microscopy 55–80. https://doi.org/10.1016/b978-1-45-573058-2.00004-8.
Kouadio, L., Deo, R.C., Byrareddy, V., Adamowski, J.F., Mushtaq, S., Phuong Nguyen, V., 2018. Artificial intelligence approach for the prediction of robusta coffee yield using soil fertility properties. Computers and Electronics in Agriculture 155, 324–338. https://doi.org/10.1016/j.compag.2018.10.014.
Laszlo, L., 2005. Cubic spline interpolation with quasiminimal B-spline coefficients. Acta Mathematica Hungarica 107 (1–2), 77–87. https://doi.org/10.1007/s10474-005-0180-4.
Liu, F.T., Ting, K.M., Zhou, Z.-H., 2008. Isolation forest. Eighth IEEE International Conference on Data Mining 2008, 413–422.
Martinez Marin, J. L. (2015). La importancia del corte selectivo de cafe cereza. http://cafecol.mx/documentos/CORTE.
Masip-bruin, X., Jukan, A., Ren, G.-J., Zhu, J., & Clara, S. (n.d.). What is a Fog Node? A Tutorial on Current Concepts towards a Common Definition.
Mirabelli, G., Pizzuti, T., Gomez-Gonzalez, F., & Sanz-Bobi, M. A. (2012). A bpmn general framework for managing traceability in a food supply chain.
Natale, A., Antognelli, S., Ranieri, E., Cruciani, A., Boggia, A., 2020. A novel cleaning method for yield data collected by sensors: A case study on winter cereals. In: International Conference on Computational Science and Its Applications, pp. 684–691.
Nisha, K.G., Sreekumar, K., 2017. A review and analysis of machine learning and statistical approaches for prediction. In: 2017 International Conference on Inventive Communication and Computational Technologies (ICICCT), pp. 135–139. https://doi.org/10.1109/ICICCT.2017.7975174.
Ocampo-Lopez, O. L., Ovalle-Castiblanco, A. M., Arroyave-Diaz, A., Salazar-Ospina, K., Ramirez-Gomez, C. A., & Oliveros-Tascon, C. E. (2017). Nuevo metodo estandar para la recoleccion selectiva de cafe [Publisher: Facultad de Ingenieria, UNAM]. Ingeniería, investigación y tecnología, 18(2), 127–137. Retrieved December 4, 2020, from http://www.scielo.org.mx/scielo.php?script=sci_abstract&pid=S1405-77432017000200127&lng=es&nrm=iso&tlng=es.
Popovic, T., Latinovic, N., Pesic, A., Zecevic, Z., Krstajic, B., Djukanovic, S., 2017. Architecting an IoT-enabled platform for precision agriculture and ecological monitoring: A case study. Computers and Electronics in Agriculture 140, 255–265. https://doi.org/10.1016/j.compag.2017.06.008.
Quinlan, J. R. (1992). Learning with continuous classes, 343–348.
Rajasegarar, S., Leckie, C., Palaniswami, M., Bezdek, J.C., 2007. Quarter sphere based distributed anomaly detection in wireless sensor networks. IEEE International Conference on Communications 2007, 3864–3869.
Ramesh, D. (2015). ANALYSIS OF CROP YIELD PREDICTION USING DATA MINING TECHNIQUES. International Journal of Research in Engineering and Technology, 04(1), 470–473. Retrieved April 17, 2019, from https://www.academia.edu/21226449/ANALYSIS_OF_CROP_YIELD_PREDICTION_USING_DATA_MINING_TECHNIQUES.
Ramli, M.R., Daely, P.T., Kim, D.-S., Lee, J.M., 2020. IoT-based adaptive network mechanism for reliable smart farm system. Computers and Electronics in Agriculture 170, 105287.
Ramos, P., Prieto, F., Oliveros, C., Aleixos, N., Albert, F., & Blasco, J. (2015). Medicion del porcentaje de madurez en ramas de cafe mediante dispositivos moviles y vision por computador.
Ramos, P.J., Prieto, F.A., Montoya, E.C., Oliveros, C.E., 2017. Automatic fruit count on coffee branches using computer vision. Computers and Electronics in Agriculture 137, 9–22. https://doi.org/10.1016/j.compag.2017.03.010.
Raschka, S. (2020). Model evaluation, model selection, and algorithm selection in machine learning. arXiv: 1811.12808 [cs, stat]. Retrieved July 31, 2021, from http://arxiv.org/abs/1811.12808.
Ray, P.P., 2017. Internet of things for smart agriculture: Technologies, practices and future direction. Journal of Ambient Intelligence and Smart Environments 9 (4), 395–420.
Rodriguez, J.P., Corrales, D.C., Aubertot, J.-N., Corrales, J.C., 2020. A computer vision system for automatic cherry beans detection on coffee trees. Pattern Recognition Letters 136, 142–153. https://doi.org/10.1016/j.patrec.2020.05.034.
Rodriguez, J.P., Giron, E.J., Corrales, D.C., Corrales, J.C., 2018. A guideline for building large coffee rust samples applying machine learning methods. In: Angelov, P., Iglesias, J.A., Corrales, J.C. (Eds.), Advances in information and communication technologies for adapting agriculture to climate change. Springer International Publishing, pp. 97–110.
Rukundo, O. (2012). Nearest Neighbor Value Interpolation. arXiv preprint arXiv:1211.1768, 3(4), 1–6.
Ryu, M., Yun, J., Miao, T., Ahn, I.Y., Choi, S.C., Kim, J., 2015. Design and implementation of a connected farm for smart farming system. IEEE SENSORS - Proceedings 2015, 1–4. https://doi.org/10.1109/ICSENS.2015.7370624.
Sadowski, S., & Spachos, P. (2020). Wireless technologies for smart agricultural monitoring using internet of things devices with energy harvesting capabilities. Computers and Electronics in Agriculture, 172(September 2019), 105338. doi: 10.1016/j.compag.2020.105338.
Schubert, E., Sander, J., Ester, M., Kriegel, H.P., Xu, X., 2017. DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN. ACM Transactions on Database Systems 42 (3). https://doi.org/10.1145/3068335.
Segal, M. R. (2004). Machine learning benchmarks and random forest regression. Retrieved September 12, 2020, from https://escholarship.org/uc/item/35x3v9t4.
Shah, P., Hiremath, B. H., & Chaudhary, S. (2015). Big data analytics for crop recommendation system.
Shine, P., Murphy, M.D., Upton, J., Scully, T., 2018. Machine-learning algorithms for predicting on-farm direct water and electricity consumption on pasture based dairy farms. Computers and Electronics in Agriculture 150, 74–87. https://doi.org/10.1016/j.compag.2018.03.023.
Taneja, M., Byabazaire, J., Jalodia, N., Davy, A., Olariu, C., Malone, P., 2020. Machine learning based fog computing assisted data-driven approach for early lameness detection in dairy cattle. Computers and Electronics in Agriculture 171 (February), 105286. https://doi.org/10.1016/j.compag.2020.105286.
Universitat Basel, S. (2006). Meteoblue. https://www.meteoblue.com/.
Wahir, N., Nor, M., Rusiman, M., Gopal, K., 2018. Treatment of outliers via interpolation method with neural network forecast performances. J. Phys: Conf. Ser. 995, 1–7.
Wolfert, S., Ge, L., Verdouw, C., Bogaardt, M.-J., 2017. Big Data in Smart Farming - A review. Agric. Syst. 153, 69–80. https://doi.org/10.1016/j.agsy.2017.01.023.
