
Journal of Intelligent & Robotic Systems

https://doi.org/10.1007/s10846-023-01849-8

REGULAR PAPER

Under-Canopy Navigation for an Agricultural Rover Based on Image Data
Estêvão Serafim Calera1 · Gabriel Correa de Oliveira1 · Gabriel Lima Araujo1 · Jorge Id Facuri Filho1 ·
Lucas Toschi1 · Andre Carmona Hernandes2 · Andres Eduardo Baquero Velasquez3 ·
Mateus Valverde Gasparino4 · Girish Chowdhary4 · Vitor Akihiro Hisano Higuti3 · Marcelo Becker1

Received: 2 June 2022 / Accepted: 3 March 2023


© The Author(s), under exclusive licence to Springer Nature B.V. 2023

Abstract
This paper presents an image-data-based autonomous navigation system for an under-canopy agricultural mini-rover called TerraSentia. This kind of navigation is a very challenging problem due to the lack of GNSS accuracy: the crop leaves and stems attenuate the GNSS signal and produce multi-path errors. In such a scenario, reactive navigation techniques based on the detection of crop rows using image data have proved to be an efficient alternative to GNSS. However, this approach also presents some challenges, mainly owing to leaf occlusions under the canopy and varying weather conditions. Our system addresses these issues by combining different image-based approaches using low-cost hardware. Tests were carried out using multiple robots, in different field conditions, and in different locations. The results show that our system is able to navigate safely, without interventions, in fields without significant gaps in the crop rows. As future steps, we plan not only to compare more recent convolutional neural networks in terms of processing requirements and accuracy, but also to fuse the vision-based approaches previously developed by our group in order to obtain the best of both.

Keywords Mobile robotics · Navigation · Image data · Under-canopy

All authors contributed equally to this work.

Corresponding authors: Andre Carmona Hernandes (andre.hernandes@ufscar.br) and Marcelo Becker (becker@sc.usp.br). Extended author information is available on the last page of the article.

1 Introduction

The development of agricultural robots able to navigate autonomously in crop field scenarios has been carried out for a long time, both in research centers and in companies. Checking the literature, we may cite [1, 2] as two excellent references that provide a systematic review of various applications of agricultural robots (research and commercial platforms) that were developed for in-crop-field operations.

Historically, early works on autonomous navigation for agricultural robots were focused on the auto-guidance of large agricultural machinery. For almost two decades, GNSS-based navigation systems have been widely used to allow automatic guidance of traditional crop machinery such as harvesters and tractors. In the same period, researchers have investigated the potential use of GNSS-based navigation systems for small autonomous robots that travel over the canopy of horticultural crops such as sugar beet [3]. Although these approaches work reasonably well when there is a direct line of sight to the GNSS satellite constellations, they suffer from inaccuracies due to multi-path errors when the robots are under the canopy. The multi-path error occurs because of thick leaf canopies in crops. We may cite the Robotanist, an under-canopy robotic platform, as an example of this problem [4]. The robot used GNSS data for navigation, but its navigator faced challenges when the plants grew taller than the GNSS antenna. In addition to this, it is important to highlight that GNSS is not always reliable everywhere on Earth. The magnetosphere model used in GNSS positioning computation was developed for Earth's northern hemisphere, and some countries in the southern hemisphere are affected by the South Atlantic Magnetic Anomaly (SAMA) [5, 6]. Furthermore, GNSS alone cannot provide the robot with essential information to deal with the presence of static and dynamic obstacles in the planned GNSS waypoint path [7, 8].


Based on these examples, we may conclude that GNSS-based navigation cannot be a stand-alone solution for under-canopy autonomous agricultural (AG) robots [4, 9, 10].

When it comes to vision-based navigation systems for agricultural applications, one may notice that they have been extensively studied in the last decades. Several authors developed such systems for following rows in crop fields, mainly applying computer vision techniques. Crop rows in agricultural fields have a very peculiar geometric structure that usually can be modeled as parallel lines. This characteristic facilitates the use of computer vision techniques. Previous works in this field focused on auto-guidance for tractors and heavy machinery such as harvesters [11]. More recently, many researchers have also focused on vision-based navigation of over-the-canopy agricultural robots [12]. In such applications, the camera usually points down at the crop rows in order to acquire a top-down view and a clear view of multiple crop rows. The next step is the application of a segmentation algorithm (based on color indices and line fitting) to the images in order to separate the vegetation from the soil background. In the literature, there are some classical approaches to performing this segmentation procedure, such as the Hough Transform. After the segmentation, one may use the fitted lines to extract the relative orientation and offset of the robot with respect to the crop row. This information is essential to control the robot steering and keep it in the middle of the lane between the crop rows [13, 14].

LiDAR sensors are commonly used for navigating mobile robots in indoor and outdoor environments. For decades they have been applied as the main input data source for mapping, localization, and obstacle detection procedures. More specifically in agricultural applications, LiDARs have been used in several scenarios. In many applications, the use of 2D LiDAR sensors is focused on line extraction and line fitting, as with camera images, in order to detect crop rows. Later, based on these results, the robot estimates its orientation and offset relative to the crop rows and generates the navigation controller outputs. It is important to highlight that the high cost of 3D LiDAR sensors often makes their use unfeasible in agricultural applications. Due to this, it is common to use 2D LiDAR sensors with an additional degree of freedom (for instance, a rotating platform) to acquire 3D point clouds of the environment. In both cases (2D and 3D LiDAR sensors) it is necessary to deal with the clutter and noise present in LiDAR scan data [15].

Aiming to obtain the best AG robot navigation behavior in the crop rows, sensor fusion techniques are commonly applied. These techniques combine data from several embedded sensors (for instance, encoders, IMU, GNSS, LiDAR, cameras, etc.) to improve the robustness of the robot navigation system. This is essential in under-canopy scenarios, where one may not navigate based only on GNSS data. Nevertheless, when applying sensor fusion techniques in under-canopy environments, one may face several troubles because of the presence of clutter and occlusions [16]. Unfortunately, another issue is the cost associated with the use of several sensors. If, on the one hand, multiple sensors and sensor fusion can produce better navigation behavior, on the other hand, the robot can become too expensive for customers. For this reason, we decided to study the use of vision data alone for navigation. So, this paper presents the results of our research on under-canopy navigation for AG robots based only on vision data.

2 Robotic Platform: TerraSentia

The TerraSentia rover is a compact and autonomous mobile robot initially designed for phenotyping. Since 2017, it has been produced and commercialized by EarthSense Inc. It is lightweight (approx. 16.55 kg with battery), low-cost, and its dimensions are 0.54 m x 0.32 m x 0.35 m (length x width x height). Thanks to its dimensions, it is able to travel under the canopy in crops with a minimum row spacing of 0.4 m. Its body was specially designed to guarantee that the rover does not harm the plant leaves while traveling in the crop field. TerraSentia is powered by four Maytech brushless outrunner hub motors, each equipped with a hall-effect sensor, driving the rover's four wheels. The wheels are built using additive manufacturing with polylactic acid. Each motor is assembled in the middle of the wheel and driven by a custom version of the VESC4 motor controllers. When it comes to its embedded perception system, the rover has a Bosch BNO055 IMU (Inertial Measurement Unit), a u-blox ZED-F9P GNSS (Global Navigation Satellite System) receiver, two 2D Hokuyo LiDARs (UST-10LX) for scanning the environment, and four wide-angle Full HD (1080p) USB camera modules to record videos (Fig. 1).

The IMU, GNSS, LiDARs, and some cameras are mainly used for navigation purposes. One LiDAR is placed in the front part and provides a horizontal scan of the surroundings that is used as input data for our navigation system. The second LiDAR is placed in the rear part, pointing up, and provides a vertical scan that is used for off-line 3D crop reconstruction. The four camera modules record videos that can be used to extract meaningful phenotype features (for instance, it is possible to count plants and to estimate stem width, leaf area index under the canopy, and plant height). The images can also be used to detect diseases [17, 18].

All embedded sensors are connected to a Raspberry Pi 3. In a nutshell, it acquires the data from the front LiDAR, IMU, and hall-effect sensors, runs the navigation algorithms, and calculates the desired control signals (rover angular and linear velocities). Then, based on the rover geometry and kinematics, it converts the desired control signals into the desired velocity for each motor and sends these values to the VESC4 controllers as PWM (Pulse Width Modulation) signals to drive the motors. In addition to this, the rover has an Intel NUC i7 embedded computer (500 GB SSD and 16 GB RAM) to store all acquired sensor data.
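As an illustration of this last conversion step, the sketch below maps a commanded linear/angular velocity pair (v, w) to per-wheel speeds under a differential-drive approximation of the skid-steer platform. This is not the rover's firmware: the track width, wheel radius, and function name are illustrative assumptions, and the sketch uses Python for brevity.

# Illustrative sketch (not the rover's firmware): convert body commands
# v [m/s] and w [rad/s] into left/right wheel angular speeds [rad/s],
# assuming a differential-drive approximation of the skid-steer platform.
TRACK_WIDTH = 0.32   # [m] assumed effective track width (illustrative)
WHEEL_RADIUS = 0.10  # [m] assumed wheel radius (illustrative)

def body_to_wheel_speeds(v, w):
    """Return (left, right) wheel angular speeds for body commands (v, w)."""
    v_left = v - w * TRACK_WIDTH / 2.0
    v_right = v + w * TRACK_WIDTH / 2.0
    return v_left / WHEEL_RADIUS, v_right / WHEEL_RADIUS

# Example: 0.6 m/s forward with a gentle 0.2 rad/s left turn
left, right = body_to_wheel_speeds(0.6, 0.2)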


Fig. 1 Snapshots of TerraSentia inside a sugarcane plantation taken by a commercial drone. In the bottom left corner, TerraSentia is highlighted

3 Methodology

Figure 2 shows the big picture tackled in this work, illustrating the visual data flow and all the modules used, which will be detailed in the following subsections. It is worth mentioning that Fig. 2 represents all the integration effort needed to complete autonomous in-crop navigation using only images.

To autonomously navigate under the canopy using only image data, an entrance detection module was first proposed to help the robot precisely find the row it should enter. Although the robot is in the maneuvering area, where GNSS-based methods can partially or totally recover from the in-crop degradation, the standard errors associated with GNSS can make the robot skip a row; therefore, the entrance detection module aids the approach controller in steering toward the correct row, as detailed in Section 3.1.

When TerraSentia enters the row, four parallel modules start running, illustrated in Fig. 2 with numbers 2 to 5. Modules 2 and 3 (Section 3.2) are interconnected routines: module 2's ultimate goal is to detect the navigable floor, with the aid of the perspective anchor point, using the relative camera position to the perspective point as a direction detector, while module 3 uses optical flow and filtering to achieve visual odometry. Thus, module 2 feeds the controller with an estimated heading and module 3 feeds it with how much the robot has driven.

However, driving in-crop has many challenges. The previous modules were tuned for clean rows, which is not always the case: there are times when the rows are irregular and leaves bend down into the middle of the row, creating a false impression of a blocked path. To help the navigation control overcome these shortcomings, modules 4 and 5 were implemented. Module 4 (Section 3.3) divides the image and classifies each smaller block as plant, soil, or sky; thus, if a cluster of plant blocks is found in the middle of the path, a blockage can be detected. Module 5 (Section 3.4) uses deep learning to robustly detect the outside plants, and this information aids the direction estimation and the navigation planner. It is important to highlight that modules 4 and 5 run independently of modules 2 and 3, which were chosen as the main modules for navigation. The main idea of module 4 is to warn modules 2 and 3 that what is in front of TerraSentia is just a leaf, so that the controller can go straight and the apparent blockage can be passed. Module 5 kicks in whenever the plantation starts to deviate too much from the perfect corridor scenario, either because of flaws in the plantation line or because of the presence of weeds in the middle of the path. Also, when the standard deviation of the angular control action passes a safety threshold, module 5 can aid in finding the correct corridor/direction to follow.

Finally, exiting a row is challenging, since GNSS-based methods cannot be used and the visual cues used for in-crop movement are no longer present, as detailed in Subsection 3.5.

Fig. 2 Data flow and modules representation for under-canopy navigation

3.1 Entrance Detection Module

In the maneuvering area, GNSS-based control methods may be applied to guide the robot to the next desired row; however, as the robot approaches the crop, GNSS shadowing and multipath start to decrease accuracy, and thus an auxiliary row detection method is required to keep the robot on the right track.

When performing the entrance procedure, it is necessary to identify the space between two crop stalks so as to get into the row straight, without collisions, and to activate the navigation module responsible for controlling the robot inside the crop rows. The proposed entrance module was based on images, as it is more likely to differentiate a crop from a weed using computer vision algorithms than using LiDAR data alone.

Figure 3 encompasses all the stages used to achieve the entrance identification.

The first stage consists of cropping the original frame (stage 1) into a smaller, more focused image. This behaviour is triggered by GNSS positioning, when its deviation starts to increase due to satellite occlusion. As a proof of concept, fixed values were used at the triggering point, considered to be a critical point for detection. As illustrated in images 1 and 2 of Fig. 3, it was chosen to remove 30% of the ground and 25% at each side of the image.

The second stage is to use Canny edge detection [19]. This multi-stage algorithm, developed by John F. Canny, is already implemented in the OpenCV library; the two thresholds used were 100 and 499, and the other parameters were kept at their defaults, resulting in the stage 3 image.

Once the edges are detected, it can be noticed that the crops are highlighted and their edges are nearly vertical lines. This is fairly explainable, since there is a crisp contrast between the up-front plants and the in-crop image, due to the shadow cast by the plant height, which gives this stage's detection method a fairly consistent behavior.

Fig. 3 Entrance identification overview. 1 - Raw image, 2 - Cropped ROI, 3 - Edge detection, 4 - Hough line detection, 5 - Output image

The next stage, as shown in stage 4, is to identify the main plants that form the walls of the desired row. In this stage, the Hough Line Transform [20] was used, which is also available in the OpenCV library. In this work, a rho resolution of 23 pixels and a theta resolution of 8 degrees were used. Also, the Hough threshold was set to 96 to avoid smaller lines being accounted for. In the final stage, a reference rotation was made, choosing the vertical line as the 0 degree reference. After that, a search for lines between -15 and 15 degrees was performed, resulting in image 5 of Fig. 3.
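The sketch below strings these stages together with the OpenCV Python bindings (the authors' system is implemented in C++). The ROI placement and the angle convention used to define "near-vertical" lines are assumptions; only the Canny thresholds (100/499) and the Hough parameters (rho = 23 px, theta = 8 deg, threshold = 96, search within 15 deg of vertical) come from the text.

import numpy as np
import cv2

def detect_entrance_lines(frame):
    """Sketch of the entrance pipeline: crop ROI, Canny edges, Hough lines,
    then keep only near-vertical lines (within 15 deg of vertical)."""
    h, w = frame.shape[:2]
    # Stages 1-2: drop 30% of the ground (bottom) and 25% on each side (assumed layout).
    roi = frame[: int(0.7 * h), int(0.25 * w): int(0.75 * w)]
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    # Stage 3: Canny edge detection with the thresholds reported in the text.
    edges = cv2.Canny(gray, 100, 499)
    # Stage 4: standard Hough transform, rho = 23 px, theta = 8 deg, threshold = 96.
    lines = cv2.HoughLines(edges, 23, np.deg2rad(8), 96)
    if lines is None:
        return []
    # Stage 5: keep lines whose Hough angle is within 15 deg of the vertical direction
    # (theta close to 0 or to pi in OpenCV's normal-angle convention).
    keep = []
    for rho, theta in lines[:, 0]:
        if theta < np.deg2rad(15) or theta > np.pi - np.deg2rad(15):
            keep.append((rho, theta))
    return keep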

3.2 In-Crop Navigation

Once inside the crop row, three distinct modules start working together to achieve autonomous navigation: Visual Odometry, to aid in estimating the traveled distance and orientation; a visual direction definition, to estimate the orientation of the center of the robot with respect to the center of the row; and the controller, which uses both to keep the TerraSentia robot on track.

3.2.1 Visual Odometry

Visual odometry (VO) is a computer vision method to estimate the position and orientation of a moving body along an arbitrary path. According to [21], different approaches to the visual odometry problem are available in the literature. Our work is inspired by [22], which is used for comparison with our method and is referenced as VO.

In the standard visual odometry algorithm, a set of features is detected and tracked into the new image. The overall feature movement is what makes it possible to recover the actual movement of the robot. For feature detection, we used the Features from Accelerated Segment Test (FAST) detector [23, 24], given its ability to retrieve more corners in less processing time.

To track the set of features identified by FAST, the Kanade-Lucas-Tomasi (KLT) algorithm, proposed by [25], was used. It implements a pyramidal reduction of the image resolution and, with the set of features, it minimizes a residual function that estimates the optical flow parameters and, thus, the camera movement.

The main part of the visual odometry strategy is to find an essential matrix E that describes the geometric relation between consecutive images respecting the epipolar constraint [22]. This is crucial since it is the essential matrix that contains the rotation matrix $R_{k,k-1}$ and the translation vector $t_{k,k-1}$ of the camera motion between $I_k$ and $I_{k-1}$ [22], as seen in Eq. (1):

$$E_{k,k-1} = \hat{t}_{k,k-1} R_{k,k-1} \tag{1}$$

where

$$\hat{t}_{k,k-1} = \begin{bmatrix} 0 & -t_z & t_y \\ t_z & 0 & -t_x \\ -t_y & t_x & 0 \end{bmatrix} \tag{2}$$

To estimate $E_{k,k-1}$, the Five-Point algorithm [26] is used; alongside it, the Random Sample Consensus algorithm (RANSAC) [27] is applied to improve the accuracy of the extracted rotation matrix and translation vector [22].
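A minimal sketch of this frame-to-frame VO step, using the OpenCV Python bindings (the paper's system is C++): FAST corners, pyramidal KLT tracking, and the five-point algorithm inside RANSAC. The camera intrinsics K and the RANSAC settings are illustrative placeholders, not values from the paper.

import numpy as np
import cv2

K = np.array([[700.0, 0.0, 320.0],    # illustrative intrinsics; the real values
              [0.0, 700.0, 240.0],    # come from the rover's camera calibration
              [0.0, 0.0, 1.0]])

fast = cv2.FastFeatureDetector_create(threshold=60)   # FAST threshold reported in Sec. 4.1
lk_params = dict(winSize=(15, 15), maxLevel=3)        # KLT 15x15 window, 3 pyramid levels

def relative_motion(img_prev, img_curr):
    """Estimate the camera rotation R and (unit-scale) translation t between
    two consecutive grayscale frames, as in one VO step."""
    kps = fast.detect(img_prev, None)
    pts_prev = cv2.KeyPoint_convert(kps).astype(np.float32)
    # KLT tracking of the FAST corners into the new frame.
    pts_curr, status, _ = cv2.calcOpticalFlowPyrLK(img_prev, img_curr, pts_prev, None, **lk_params)
    good = status.ravel() == 1
    p0, p1 = pts_prev[good], pts_curr[good]
    # Five-point algorithm inside a RANSAC loop to estimate the essential matrix (Eq. 1).
    E, _ = cv2.findEssentialMat(p0, p1, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)
    # Decompose E into the rotation matrix and translation vector of the camera motion.
    _, R, t, _ = cv2.recoverPose(E, p0, p1, K)
    return R, t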

3.2.2 Direction Definition

Our method for position and orientation estimation uses another heuristic to estimate the rotation matrix. The method was developed using multiple image-processing steps, following the diagram in Fig. 4 and illustrated in Fig. 5.

The first step consists again of a Gaussian blur filter, used to reduce the noise present in the image. Then, a color treatment is applied to select the colors of interest. This treatment uses the Excess Green (ExG) vegetation index [28], outputting a grayscale image that highlights the green regions of the image. In order to binarize the image, the Otsu threshold method [29] was used. It consists of an iterative technique that runs through all possible threshold values and minimizes the sum of the intraclass variances of the analyzed image.

The next step is a morphological opening operation, applied to smooth the contours present in the images, which facilitates the following stages. The opening is composed of two other operations: an erosion, followed by a dilation, where both use the same kernel [30]. The erosion operation is carried out by a convolved kernel, where a pixel of the image is only kept white if all the pixels under the kernel are white; otherwise, it is set to black, that is, eroded. In contrast, dilation is the dual operation to erosion, i.e., if at least one pixel of the image under the kernel is white, the pixel is set to white.

Fig. 4 Method flowchart

Fig. 5 Method steps

An edge detection function (the Canny edge detector) was then applied to the morphologically processed image [19]. The return of this function contains only the contours present in the binarized image, as shown in letter d of Fig. 5.

The convex hull of the contours was found using a function based on the Sklansky algorithm [31]. This step was made to facilitate the following one.

The convex hull is drawn on an image and then passed through the Hough Transform. This function works by converting each point of the Cartesian space (x, y) into a sinusoidal space ($\rho = x\cos\theta + y\sin\theta$), called the Hough space. A straight line in the Cartesian space is translated into an intersection of multiple sinusoids in the Hough space [32]. The method was implemented using the probabilistic Hough Transform from the OpenCV library, since it returns finite lines (just two points). It was adopted in order to facilitate the following phases, reducing execution time.

Several lines are returned by the transform, but it is necessary to analyze them in order to use them efficiently, since not all of them are used to define the robot's direction. The selected lines are drawn on the inverted binarized image, and then the connectedComponentsWithStats function from OpenCV is used to obtain the contours, their areas, and their centroids. That information is very important for the next step of the method.

The two largest-area components represent the sky and the corridor. Since only the corridor is the focus of the method, a selection of this area is made using the centroids: the lower centroid of the two is selected, since it represents the corridor, and then the coordinates of the highest point of the corridor contour are taken. This point is used together with the lowest point in the middle of the image to establish the direction vector [33].
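A condensed sketch of this direction-definition pipeline, assuming the OpenCV Python bindings. The convex-hull step is omitted for brevity and all numeric parameters (kernel size, Canny thresholds, Hough settings) are illustrative; only the sequence of operations follows the text.

import numpy as np
import cv2

def direction_vector(frame_bgr):
    """Sketch of the direction-definition pipeline (parameters are illustrative)."""
    blur = cv2.GaussianBlur(frame_bgr, (5, 5), 0)
    b, g, r = cv2.split(blur.astype(np.float32) / 255.0)
    exg = 2.0 * g - r - b                                      # Excess Green index
    exg8 = cv2.normalize(exg, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, binary = cv2.threshold(exg8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = np.ones((5, 5), np.uint8)
    opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)  # erosion then dilation
    edges = cv2.Canny(opened, 50, 150)
    # Probabilistic Hough transform: finite segments along the crop-row borders.
    segs = cv2.HoughLinesP(edges, 1, np.pi / 180, 50, minLineLength=40, maxLineGap=10)
    canvas = cv2.bitwise_not(opened)                           # inverted binarized image
    if segs is not None:
        for x1, y1, x2, y2 in segs[:, 0]:
            cv2.line(canvas, (x1, y1), (x2, y2), 0, 3)         # draw the selected lines
    # Connected components: the two largest are sky and corridor; the corridor
    # is the one whose centroid sits lower in the image.
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(canvas)
    largest = np.argsort(stats[1:, cv2.CC_STAT_AREA])[::-1][:2] + 1
    corridor = max(largest, key=lambda i: centroids[i][1])
    ys, xs = np.where(labels == corridor)
    top = (int(xs[np.argmin(ys)]), int(ys.min()))              # highest corridor point
    bottom = (frame_bgr.shape[1] // 2, frame_bgr.shape[0] - 1) # lowest mid-image point
    return np.array(top) - np.array(bottom)                    # direction vector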

3.2.3 Motion Estimation

Aiming to perform camera motion estimation along the robot's path, $R_{k,k-1}$ and $t_{k,k-1}$ must be used to construct a homogeneous transformation matrix $T_{k,k-1}$ that describes the camera pose at time k with respect to time k - 1 [22]. The matrix $T_{k,k-1}$ is represented in Eq. (3):

$$T_{k,k-1} = \begin{bmatrix} R_{k,k-1} & t_{k,k-1} \\ 0 & 1 \end{bmatrix} \tag{3}$$

Computing $T_{k,k-1}$ for every two consecutive images, it is possible to build the set $T_{1:n} = \{T_{1,0}, ..., T_{n,n-1}\}$, which provides the relative camera motion along the whole traveled trajectory. To compute the camera pose at time k with respect to the initial time, $T_{k,0}$ is calculated as shown in Eq. (4):

$$T_{k,0} = \begin{bmatrix} R_{k-1,0} R_{k,k-1} & R_{k-1,0}\, t_{k,k-1} + t_{k-1,0} \\ 0 & 1 \end{bmatrix} \tag{4}$$

As the last VO step, the set of camera poses $\{T_{1,0}, ..., T_{n,0}\}$ recovers the robot's trajectory, providing the estimate of the path it traveled.
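Equations (3) and (4) amount to stacking each (R, t) pair into a homogeneous matrix and chaining the matrices by multiplication, as in the short NumPy sketch below (illustrative, not the authors' code).

import numpy as np

def to_homogeneous(R, t):
    """Build T_{k,k-1} from R_{k,k-1} and t_{k,k-1} (Eq. 3)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t.ravel()
    return T

def integrate_poses(relative_transforms):
    """Chain the relative motions T_{k,k-1} into absolute poses T_{k,0} (Eq. 4)."""
    poses = [np.eye(4)]                    # T_{0,0}: the starting pose
    for T_rel in relative_transforms:
        poses.append(poses[-1] @ T_rel)    # T_{k,0} = T_{k-1,0} * T_{k,k-1}
    return poses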
3.2.4 Navigation

A simple PID controller was implemented to control the robot. The communication was done through ROS messages, and a comparator was implemented before the PID input. This comparator compares the vector angle between readings to check whether the movement they represent is physically possible. That is, if the method reads between two frames that the robot rotated 90 degrees, the algorithm does not allow the robot to correct by moving 90 degrees in the opposite direction. Since this method can suffer from robot position drift, the vector value is reset every 30 frames, thus preventing the robot from locking into a position.
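A minimal sketch of this controller stage: a textbook PID plus the plausibility comparator placed before it. The gains, sampling time, and the 20-degree bound are illustrative assumptions; only the ideas of rejecting physically impossible inter-frame rotations and resetting the accumulated vector every 30 frames come from the text.

class HeadingPID:
    """Minimal PID sketch for the steering command; gains are illustrative."""
    def __init__(self, kp=1.0, ki=0.0, kd=0.1, dt=0.05):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

MAX_STEP_DEG = 20.0   # assumed bound on a physically plausible inter-frame rotation
RESET_EVERY = 30      # the accumulated direction vector is reset every 30 frames (from the text)

def plausible(angle_change_deg):
    """Comparator placed before the PID input: reject readings such as a 90 deg
    jump between two consecutive frames, which the platform cannot perform."""
    return abs(angle_change_deg) <= MAX_STEP_DEG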

3.3 Leaf Obstruction Detection Module

Aiming to achieve TerraSentia's autonomous navigation, a problem that arises is the numerous leaves that protrude into the trail, making it difficult for the robot to orient itself. Therefore, it was proposed to use computer vision and machine learning on the images from the robot's front camera, in order to identify the obstructing leaves and remove the corresponding superpixels from the image.

The first step of image processing is the application of a Gaussian filter to reduce input noise [34]. Then, the image is segmented into superpixels by the Fast Simple Linear Iterative Clustering (Fast-SLIC) method [35, 36] to facilitate and improve, in terms of runtime, the following steps. The segmentation is adjusted by two parameters - the number of superpixels and the compactness - whose values were chosen according to the superpixels' adherence to the edges of the image elements. An example frame segmented into superpixels is shown in Fig. 6.

With the segmented image, superpixel classification into leaf, ground, or sky is performed. For this purpose, a large dataset of labeled superpixels was created with color features - color spaces (RGB, LAB, HSV, LUV, and YCrCb [37]) and color vegetation indices (NDI, ExG, ExGR, CIVE, and COM2 [38, 39]) - and statistical features (standard deviation, skewness, kurtosis, entropy, minimum, maximum, mean, and median [34]), and a Random Forest Classifier (RFC) [40, 41] was implemented. The complexity of the machine learning model, in terms of the number and depth of decision trees, was considerably reduced by removing the features with the smallest variances and by analyzing the performance metrics (confusion matrix, accuracy, precision, recall, and F1-score [42]), aiming for real-time application.

Finally, the task was to differentiate which leaf superpixels are obstructing the path and which are not. With this in mind, the robot's navigable-region and perspective-point detection algorithms were used, whereby it is possible to determine which leaf superpixels are in this region - i.e., outside the usual region - and therefore correspond to the overlapping leaves on the path, so that they can be removed from the image.

Fig. 6 Fast-SLIC segmentation example
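A sketch of the segmentation-plus-classification chain, assuming the fast-slic package cited in [35] and scikit-learn [41]. Only the superpixel parameters (1100 segments, compactness 15) and the reduced forest size (5 trees of depth 3, Section 4.2) come from the paper; the feature set here is simplified to the mean and standard deviation of ExG.

import numpy as np
from fast_slic import Slic                      # package from [35], https://github.com/Algy/fast-slic
from sklearn.ensemble import RandomForestClassifier

# Superpixel segmentation with the values reported in Section 4.2.
slic = Slic(num_components=1100, compactness=15)

def superpixel_features(rgb):
    """Simplified per-superpixel features (mean/std of ExG); the full system
    also uses ExGR and higher-order statistics."""
    labels = slic.iterate(np.ascontiguousarray(rgb))     # superpixel label map
    r, g, b = [rgb[..., i].astype(np.float32) / 255.0 for i in range(3)]
    exg = 2.0 * g - r - b
    feats = []
    for sp in range(labels.max() + 1):
        vals = exg[labels == sp]
        feats.append([vals.mean(), vals.std()])
    return labels, np.array(feats)

# Reduced classifier used for real-time operation: 5 trees of depth 3.
clf = RandomForestClassifier(n_estimators=5, max_depth=3)
# clf.fit(train_features, train_labels)   # labels: 0 = leaf, 1 = ground, 2 = sky
# predictions = clf.predict(feats)        # one class per superpixel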

3.4 Detection Module

For this project, a detection module able to identify and classify plants individually was created. During its development, a dataset with one thousand images was produced. They were obtained by TerraSentia in a cornfield with its monocular camera and were manually labeled by the research team. This module uses deep learning methods and the YOLOv3 architecture, obtaining satisfactory results. Improvements to the validation and labeling methods and an expansion of the dataset stand out as future improvements.

This component allows better implementations of Simultaneous Localization and Mapping (SLAM) algorithms in dynamic contexts, like the agricultural environment. Besides that, it complements the other modules, providing data about the scene objects.

Fig. 7 YOLOv3 neural network architecture [43]

Accordingly, a semantic understanding of the robot's surroundings is necessary to develop fully autonomous navigation. This knowledge allows high-level action planning and can be used to interpret other data by context. The technique for this purpose is called semantic segmentation and is essential in this module [44, 45].

Semantic segmentation has three main types: batch-based training, dense classification, and multi-scale classification. For the last one, the neural networks that can detect and classify objects with just one pass of the image through the model stand out. These are used in real-time applications due to their inference speed [44].

Some models fit the described requirements, like "SSD: single shot multibox detector" [46], PSPNet [47], YOLO (You Only Look Once) [48], and YOLOv3 [43]. Among them, the YOLOv3 neural network was chosen due to its contribution to current commercial technologies and its state-of-the-art status in detection and classification tasks. The YOLOv3 architecture is shown in Fig. 7. It uses the backbone network Darknet53 as the basic image feature extractor. The network outputs contain the positions and box sizes as well as a score for object existence and a score for each one of the pre-defined classes.

To train the network, a dataset was built with one thousand images obtained by the TerraSentia robot in a cornfield. Among the images, 750 were used for training and the other 250 for method validation. They were labeled manually using the tool called CVAT (Computer Vision Annotation Tool). As manual work, it has some imperfections - in other words, some plants were not marked even though they were explicit in the image. This observation will be resumed later.

Due to the computational effort needed to train the neural network, the transfer learning technique was used. Therefore, the heavy training to identify common image features was not needed, only the fine-tuning to accomplish the specific task described. Besides that, batch training and image augmentations were also used.

3.5 Exit Detection Module

The exit identification runs in parallel to the navigation when the robot is inside the crop field, and its function is to indicate the moment when the robot gets out of the row. This is done by following the steps described in Fig. 8. For comparison purposes, the same diagram was made for a moment when the robot is inside the crop row and far from the exit. This diagram is shown in Fig. 9.

First of all, the Excess Green (ExG) [28] color vegetation index is applied in order to emphasize the green part of the image (the vegetation) and suppress the soil and the sky. This technique is widely used in the literature and produces a grayscale image that has to be binarized. The most used method in this context is Otsu's threshold [29], which performs well and gives a value to define what should be white (higher intensity) and black (lower intensity). This process may leave some noise in the image, which is corrected with the erode and dilate technique [34]. It removes small white regions and keeps the crop stalks and large leaves.

Fig. 8 Exit identification diagram

Fig. 9 Exit identification diagram inside crop rows

To focus on the main region of interest, the image is cropped. With this segmented image representing just the vegetation, it is notable that the middle of the image is almost black, since it is a region mostly consisting of soil, while the white pixels are concentrated on the sides. Additionally, that concentration can be observed all along the way, from the beginning to the end of the crop row, and only changes when the robot reaches the maneuver area, which is the objective of this section. Therefore, the algorithm can detect the crop row ending and send a positive flag, marked in red in Fig. 8 for visualization purposes.
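A compact sketch of this exit test with the OpenCV Python bindings. The ROI split into thirds and the side-to-middle ratio threshold are assumptions used only to illustrate the criterion described above (vegetation piled up on the sides while inside the row, disappearing at the maneuver area).

import numpy as np
import cv2

def exit_flag(frame_bgr, side_ratio_thresh=3.0):
    """Return True when the side concentration of vegetation drops, which is
    taken as the end of the row. Threshold and ROI split are illustrative."""
    b, g, r = cv2.split(cv2.GaussianBlur(frame_bgr, (5, 5), 0).astype(np.float32) / 255.0)
    exg = cv2.normalize(2.0 * g - r - b, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, mask = cv2.threshold(exg, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.dilate(cv2.erode(mask, kernel), kernel)       # remove small white noise
    h, w = mask.shape
    roi = mask[h // 3:, :]                                   # keep the lower part of the image
    third = w // 3
    sides = np.count_nonzero(roi[:, :third]) + np.count_nonzero(roi[:, 2 * third:])
    middle = np.count_nonzero(roi[:, third:2 * third]) + 1   # avoid division by zero
    return (sides / middle) < side_ratio_thresh              # True -> end of the row reached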
4 Experimental Results

In this section, the most prominent results are shown. As modules 2, 3, and 6 (detect navigable floor, visual odometry, and controller) are interconnected and chosen as the main modules, their results are shown as tracking errors provided by the visual odometry. For modules 4 and 5, as they run independently, only classification results are shown. As for the entrance and exit modules, given their nature as interface modules with GNSS systems, no metrics were computed, since they depend on the GNSS system.

4.1 In-Crop Navigation Results

This work compared the standard VO method, presented in [22], and our proposed method. Both were tested on a laptop with an NVIDIA GeForce GTX 1650 4 GB video card, an AMD Ryzen 5 processor, and 16 GB of RAM, running Ubuntu 18.04 LTS, and were implemented in C++ with the aid of the OpenCV library.

The dataset used to test the methods is a set of images of a path traveled by TerraSentia in a cornfield. The robot traveled this path for 63 m in a straight line. The dataset has images in 640 x 480 resolution, so it was necessary to rescale the calibration matrix parameters by a factor of 0.5 in the x coordinates and by a factor of 0.67 in the y coordinates, since they were initially calibrated for a 1280 x 720 image resolution.

Some images from the dataset were ignored during the process by an image-skipping algorithm. These images were skipped because of the presence of leaves on the path that occluded the camera's vision, as shown in Fig. 10. The presence of these occlusions caused errors in the implemented algorithm, making it necessary to eliminate their influence so the camera could observe the environment correctly. The rejection algorithm works by calculating the percentage of features in image $I_{k-1}$ tracked in image $I_k$. If the percentage is less than an empirically defined threshold, the image $I_k$ is skipped and features are re-detected to restart the tracking process. Features are also re-detected every five frames to restart the tracking process, because of the tracked-feature loss caused by the optical flow.

To detect the features, the FAST detector used a threshold parameter thr = 60, and the KLT tracker used a 15 x 15 integration window with L = 3 pyramid levels. Other parameters were tested, but the chosen ones showed better results in motion estimation. Figure 11 shows the tracking process after feature detection. To evaluate the proposed method, its estimated trajectory was compared to the ones given by VO and the ground truth (Fig. 12). The ground truth was obtained using GNSS, the gyroscope, and hall sensors for wheel speeds as a reference.
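The frame-rejection rule reduces to a small bookkeeping function such as the one below. The 0.4 survival ratio is a placeholder, since the paper only states that the threshold was set empirically, while the five-frame re-detection interval comes from the text.

def should_skip(num_tracked, num_detected, frame_index,
                min_ratio=0.4, redetect_every=5):
    """Sketch of the frame-skipping rule: drop frame I_k when too few of the
    features from I_{k-1} survive tracking (leaf occlusion), and force a fresh
    FAST detection every few frames to limit tracked-feature loss."""
    skip = (num_tracked / max(num_detected, 1)) < min_ratio
    redetect = skip or (frame_index % redetect_every == 0)
    return skip, redetect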

The proposed method and VO trajectories were able to follow the ground truth, although there were some oscillations in the route because the robot did not follow a perfect straight-line path. The robot could not do so because uncertainties in its motion took it out of its equilibrium state. Due to this, the action of the robot's control system forced the robot back to its equilibrium state, causing a zigzag motion at certain moments. The oscillations made the estimated trajectories deviate from the reference in some parts of the path. Because some zigzag motions were very small, the optical flow did not correctly recognize such motions and failed to estimate this oscillatory behavior in some positions of the trajectory.

The small differences from the ground truth were caused by the drift problem of VO, which directly reflects on the proposed method. Because VO is an incremental estimator, the drift problem causes translation errors that accumulate during the whole trajectory estimation. Other translation error sources that deviated the estimated trajectory from the ground truth were the excessive motion of the plants in the scene, the wind that hit the plants, and the robot's passage over some plants that had fallen on the path. These motions destabilized the optical flow process of the tracker, which started to identify motions other than the robot's own motion.

The proposed method followed the ground truth better than VO (Fig. 12). The estimated trajectory of the proposed method deviated less from the reference, and the errors caused by the drift problem and by the oscillations due to the robot's uncertainties were smaller. This occurred because of the greater precision of the rotation matrix calculated in the proposed method compared to the one estimated in VO. The former computes the rotation matrix without tracked features, which ensures that outliers left unfiltered by RANSAC do not interfere with the rotation matrix estimation. Although the RANSAC method eliminates most of the outliers in the set of tracked points, some of these points are not eliminated, interfering detrimentally with the estimation of the essential matrix used to extract the rotation matrix.

Fig. 10 Camera vision occlusion caused by a leaf

Fig. 11 Feature tracking process after feature detection

Fig. 12 Estimated trajectory comparisons between VO, the proposed method, and ground truth: whole trajectory, starting at the (0, 0) position (left), and zoomed end of the path (right)

Table 1 Translation Mean Squared Errors (MSE) and processing time of VO and the proposed method

Method            MSE x [m]   MSE y [m]   Time [s]
Proposed method   0.0272      1.1604      0.202
VO                0.0298      1.2253      0.174

Fig. 13 Translation error on the x-axis

Fig. 14 Translation error on the y-axis

To compare the performance of VO and the proposed method along the whole path, Figs. 13 and 14 show the translation errors with respect to the ground truth. Table 1 shows the translation Mean Squared Errors (MSE) with respect to the ground truth and the average processing time of the methods. It is possible to notice that, although the methods achieved translation errors close to each other, the proposed method showed smaller translation errors than the classical VO method. Furthermore, the average processing time of the proposed method is 16.09% longer than the processing time of the VO method. This was already expected, since the proposed method runs more processes than the standard VO to estimate the trajectory.

The problem of recovering a relative scale for each stretch of the estimated trajectory was not solved. For the results presented in Figs. 13 and 14, an absolute scale was manually determined so that the obtained trajectories were as close as possible to the size of the ground truth trajectory. This approach, despite allowing a good visualization of the comparison between the trajectories, causes translation errors to increase, since each position of the trajectory is scaled by a default factor.

4.2 Leaf Obstruction Detection Module Results

The input image segmentation into superpixels was performed by the Fast-SLIC method, which consists of a SLIC-variant algorithm implementation that aims for a significantly low runtime on a CPU [35]. With values of 1100 for the number of segments and 15 for the compactness, the superpixels adhered to most of the image elements' edges.

For the classification step of the superpixels into leaf, ground, and sky, it was necessary to create a vast dataset with over 34,000 labeled superpixels, with color space features, color vegetation indices, and statistical features for each color channel.

The classification method chosen was Random Forest because it overcomes the overfitting problem, can handle thousands of input variables, provides estimates of which variables are important in the classification, is easy to interpret, and is efficient [49]. Initially, the highest possible performance was obtained using all features and 250 decision trees, each with a depth of around 50. Figure 15 and Table 2 show the values of the metrics obtained. To optimize the model for real-time application, features with smaller variances, that is, less relevant for learning, were discarded from the training, leaving the color vegetation indices ExG and ExGR and their respective statistical descriptors: standard deviation, skewness, kurtosis, entropy, mean, and median. Furthermore, the decision tree complexity was considerably reduced to 5 decision trees of depth 3, since any increase in complexity is not justified by the performance metrics. Figure 16 and Table 3 exemplify this idea very well because, after the major simplification of the Random Forest model, the accuracy decreased by only 8%, the F1-score varied by at most 10%, and the confusion matrix values varied by at most 12%. Thus, the metric values were still high enough for excellent classification. Regarding the features, it was concluded that ExG entropy, ExGR entropy, and ExGR median are the most relevant for classification, with importances of 28%, 15%, and 13%, respectively.

When analyzing the output image segmented and classified by the Random Forest (Fig. 16), it can be seen that most of the classification errors occur in the superpixels that group leaf and ground pixels, as a result of the shadows, which make it difficult to discern the scene elements.

Fig. 15 Normalized confusion matrix for the RFC with 250 decision trees of depth 50

Table 2 RFC performance with 250 decision trees of depth 50

          Precision   Recall   F1-score
Leaf      0.90        0.93     0.92
Ground    0.93        0.89     0.91
Sky       0.92        0.88     0.90
Total accuracy: 91.3%

Fig. 16 Normalized confusion matrix for the RFC with 5 decision trees of depth 3

Table 3 RFC performance with 5 decision trees of depth 3

          Precision   Recall   F1-score
Leaf      0.87        0.80     0.84
Ground    0.79        0.88     0.83
Sky       0.85        0.78     0.81
Total accuracy: 83.2%

4.3 Detection Module Results

Some neural network examples can be seen in Fig. 17. In plots a and b, the neural network found good results that were not manually marked in the ground-truth boxes; conversely, in plots c and d, the neural network failed to detect some of the ground-truth boxes. Usually, in an object detection task, the Mean Average Precision (mAP) is used to measure how well the model managed to adjust the predictions to the ground-truth boxes. The calculated mAP value is 65.70%, a result that is not adequate for understanding the neural network performance, since some identifiable plants were omitted in the labeling. That can be explained by the fact that the ground-truth boxes were subjectively selected by a human.

Therefore, to measure the model precision, custom metrics were necessary. In each of the 250 images from the validation dataset, the number of ground-truth boxes ($n_{gt}$) and the number of predictions ($n_{pred}$) were measured. Besides that, the number of good prediction boxes ($n_{gpred}$) and the number of them that coincide with the ground-truth ones ($n_{gtpred}$) were counted. The term "good predictions" was guided by the question: "does this box delimit the majority of a plant's pixels?".

With the described counts, it was possible to calculate the virtual "total" of plants ($n_{vtotal}$), given by Eq. (5). Besides that, the custom metrics were defined as in Eqs. (6) to (10).

$$n_{vtotal} = n_{gt} + (n_{gpred} - n_{gtpred}) \tag{5}$$

$$P = \frac{n_{gpred}}{n_{pred}} \tag{6}$$

$$C = \frac{n_{gtpred}}{n_{pred}} \tag{7}$$

$$NC = \frac{n_{gt} - n_{gtpred}}{n_{gt}} \tag{8}$$

$$EB = \frac{n_{gpred} - n_{gtpred}}{n_{pred}} \tag{9}$$

$$EBOT = \frac{n_{gpred} - n_{gtpred}}{n_{vtotal}} \tag{10}$$

Calculating the average of the five parameters over the images of the validation dataset, Table 4 was constructed (the prefix "m" indicates the mean value).
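Equations (5)-(10) are simple ratios of the four per-image counts; a direct transcription in Python is given below, with hypothetical counts in the usage comment.

def detection_metrics(n_gt, n_pred, n_gpred, n_gtpred):
    """Custom metrics of Section 4.3 (Eqs. 5-10), computed from the per-image counts:
    ground-truth boxes, predictions, good predictions, and good predictions that
    coincide with a ground-truth box."""
    n_vtotal = n_gt + (n_gpred - n_gtpred)          # Eq. (5): virtual total of plants
    return {
        "P":    n_gpred / n_pred,                   # Eq. (6): good predictions over all predictions
        "C":    n_gtpred / n_pred,                  # Eq. (7): predictions coinciding with ground truth
        "NC":   (n_gt - n_gtpred) / n_gt,           # Eq. (8): ground-truth boxes not detected
        "EB":   (n_gpred - n_gtpred) / n_pred,      # Eq. (9): good but unlabeled boxes
        "EBOT": (n_gpred - n_gtpred) / n_vtotal,    # Eq. (10): same, over the virtual total
    }

# Example with hypothetical counts for one validation image:
# detection_metrics(n_gt=12, n_pred=11, n_gpred=10, n_gtpred=8)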

Fig. 17 Some YOLOv3 model predictions (in yellow) and ground-truth boxes (in pink)

Table 4 Mean YOLOv3 model performance parameter values

mAP      mP       mC       mNC      mEB      mEBOT
65.70%   93.88%   62.61%   25.71%   31.48%   26.18%

Observing Table 4, it is possible to see that the mean overall precision of the model (mP) is relatively good (93.88%). Still, only 62.61% (mC) of the predictions coincide with the ground-truth boxes, and 31.48% (mEB) of the predictions were good but do not coincide with the ground-truth boxes. Concerning the virtual "total" of plants in each image, 26.18% (mEBOT) of the predictions do not have a previous reference in the dataset. Nonetheless, 25.71% (mNC) of the ground-truth boxes were not identified by the model.

5 Conclusions

In this paper, we first provided an extensive literature review of AG robots, focusing on the under-canopy problem and emphasizing that GNSS-based methods cannot be used alone in this scenario. Navigation in such a scenario is challenging and still an open problem for low-cost robots. Thereafter, we presented our mobile robot, describing its embedded sensors and hardware. For us, these were essential pieces of information for deciding on the best way to implement an in-crop navigation framework.

Based on our mobile robot's embedded sensor configuration and processing power, and aiming to find the best balance between costs and technical challenges, we decided to develop an image-based autonomous navigation framework. Next, we described our ongoing development of autonomous navigation in under-canopy real farm scenarios. Although some of the techniques used in our navigation framework may not be considered state of the art in the literature, they are balanced in terms of time and processing power needs, and they can be executed in real time, embedded in our mobile robot, while it acquires data and runs the algorithms for phenotyping the crops.

In order to validate the vision-based navigation framework algorithms, we carried out several experiments on corn, soybean, sugarcane, and sorghum crops. The results were encouraging and showed that vision-only autonomous navigation for in-crop situations is feasible. We observed that when the GNSS signal started to lose precision, the vision-based framework was capable of identifying the row and guiding the robot inside the crop. In all tests carried out, when the robot was under the canopy the vision-based modules could identify plants, estimate the robot's orientation, estimate its position, and classify possible leaf obstructions. Connecting all the individual modules, we could autonomously guide the robot out of the crop row. In order to clarify each module's contribution to the framework, individual results of the modules were presented, and detailed comments are given in Section 4.1. In a nutshell, for the under-canopy navigation, the framework achieved lateral errors estimated at 30 cm, an overall accuracy of 83.2% for the leaf obstruction detection module, and, for the detection module using YOLOv3, an overall precision of 93.88%. It is important to highlight that YOLOv3 was chosen due to its fast training and processing characteristics. Certainly, other more modern convolutional networks could increase the precision, but probably at a higher processing power cost.

Taking into account the promising results obtained, for future work we are planning to test other convolutional networks and evaluate their performance in terms of precision, robustness, and processing power. We are also planning to develop a combined visual and LiDAR-based framework and compare it with our vision-based framework. It will then be possible to verify the pros and cons considering not only the embedded sensor costs and processing power needs but also the system performance when it comes to the system's overall autonomy.

Acknowledgements The work was partially supported by Sao Paulo Research Foundation (FAPESP) grant numbers 2020/13037-3, 2020/12710-6, 2020/11262-0, 2020/11089-6, and 2020/10533-0. The authors thank EarthSense for support with TerraSentia robots and field data.

Author Contributions All authors contributed to the study's conception and design. The field preparation, data collection, and experimentation in the fields were performed by Andres Eduardo Baquero Velasquez, Mateus Valverde Gasparino, Girish Chowdhary, and Vitor Akihiro Hisano Higuti. Each module had a responsible developer: the entrance/exit development was done by Estevão Serafim Calera, the visual odometry by Jorge Id Facuri Filho, the orientation estimation and navigation control by Gabriel Lima Araujo, the leaf/soil/sky classifier by Gabriel Correa de Oliveira, and the deep learning plant identification by Lucas Toschi. All modules were supervised by Vitor Akihiro Hisano Higuti, Marcelo Becker, and Andre Carmona Hernandes. All authors contributed to the first draft of the manuscript. Final revisions were performed by Marcelo Becker and Andre Carmona Hernandes. All authors read and approved the final manuscript.

Funding The work was partially supported by Sao Paulo Research Foundation (FAPESP) grant numbers 2020/13037-3, 2020/12710-6, 2020/11262-0, 2020/11089-6, and 2020/10533-0.

Data Availability The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Declarations

Competing Interests The authors have no relevant financial or non-financial interests to disclose.

Ethics Approval Not applicable

Consent to Participate Not applicable

Consent for Publication Not applicable

References

1. Ramin Shamshiri, R., Weltzien, C., Hameed, I.A., Yule, I.J., Grift, T.E., Balasundram, S.K., Pitonakova, L., Ahmad, D., Chowdhary, G.: Research and development in agricultural robotics: A perspective of digital farming. Int. J. Agric. Biol. Eng. 11(4), 1–11 (2018). https://doi.org/10.25165/j.ijabe.20181104.4278
2. Fountas, S., Mylonas, N., Malounas, I., Rodias, E., Hellmann Santos, C., Pekkeriet, E.: Agricultural robotics for field operations. Sensors 20(9), 2672 (2020). https://doi.org/10.3390/s20092672
3. Bakker, T., van Asselt, K., Bontsema, J., Müller, J., van Straten, G.: Autonomous navigation using a robot platform in a sugar beet field. Biosyst. Eng. 109(4), 357–368 (2011). https://doi.org/10.1016/j.biosystemseng.2011.05.001
4. Mueller-Sim, T., Jenkins, M., Abel, J., Kantor, G.: The Robotanist: A ground-based agricultural robot for high-throughput crop phenotyping. In: 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 3634–3639. IEEE (2017). http://ieeexplore.ieee.org/document/7989418
5. Abdu, M.A., Batista, I.S., Carrasco, A.J., Brum, C.G.M.: South Atlantic magnetic anomaly ionization: A review and a new focus on electrodynamic effects in the equatorial ionosphere. J. Atmos. Sol.-Terr. Phys. 67(17-18), 1643–1657 (2005). https://doi.org/10.1016/j.jastp.2005.01.014
6. Spogli, L., Alfonsi, L., Romano, V., De Franceschi, G., Joao Francisco, G.M., Hirokazu Shimabukuro, M., Bougard, B., Aquino, M.: Assessing the GNSS scintillation climate over Brazil under increasing solar activity. J. Atmos. Sol.-Terr. Phys. 105-106, 199–206 (2013). https://doi.org/10.1016/j.jastp.2013.10.003
7. Reina, G., Milella, A., Rouveure, R., Nielsen, M., Worst, R., Blas, M.R.: Ambient awareness for agricultural robotic vehicles. Biosyst. Eng. 146, 114–132 (2016). https://doi.org/10.1016/j.biosystemseng.2015.12.010
8. Rovira-Más, F., Chatterjee, I., Sáiz-Rubio, V.: The role of GNSS in the navigation strategies of cost-effective agricultural robots. Comput. Electron. Agric. 112, 172–183 (2015). https://doi.org/10.1016/j.compag.2014.12.017
9. Bergerman, M., Maeta, S.M., Zhang, J., Freitas, G.M., Hamner, B., Singh, S., Kantor, G.: Robot farmers: Autonomous orchard vehicles help tree fruit production. IEEE Robotics & Automation Magazine 22(March), 54–63 (2015)
10. Santos, F.B.N.D., Sobreira, H.M.P., Campos, D.F.B., Santos, R.M.P.M.d., Moreira, A.P.G.M., Contente, O.M.S.: Towards a reliable monitoring robot for mountain vineyards. In: 2015 IEEE International Conference on Autonomous Robot Systems and Competitions, pp. 37–43. IEEE (2015). https://doi.org/10.1109/ICARSC.2015.21
11. Reid, J., Searcy, S.: Vision-based guidance of an agriculture tractor. IEEE Control Syst. Mag. 7(2), 39–43 (1987)
12. Ball, D., Upcroft, B., Wyeth, G., Corke, P., English, A., Ross, P., Patten, T., Fitch, R., Sukkarieh, S., Bate, A.: Vision-based obstacle detection and navigation for an agricultural robot. J. Field Robot. 33(8), 1107–1130 (2016)
13. Zhang, S., Wang, Y., Zhu, Z., Li, Z., Du, Y., Mao, E.: Tractor path tracking control based on binocular vision. Inf. Process. Agric. 5(4), 422–432 (2018). https://doi.org/10.1016/j.inpa.2018.07.003
14. Radcliffe, J., Cox, J., Bulanon, D.M.: Machine vision for orchard navigation. Comput. Ind. 98, 165–171 (2018). https://doi.org/10.1016/j.compind.2018.03.008
15. Higuti, V.A.H., Velasquez, A.E.B., Magalhaes, D.V., Becker, M., Chowdhary, G.: Under canopy light detection and ranging-based autonomous navigation. J. Field Robot. 36(3), 547–567 (2019)
16. Xue, J., Zhang, L., Grift, T.E.: Variable field-of-view machine vision based row guidance of an agricultural robot. Comput. Electron. Agric. 84, 85–91 (2012). https://doi.org/10.1016/j.compag.2012.02.009
17. Kayacan, E., Zhang, Z., Chowdhary, G.: Embedded high precision control and corn stand counting algorithms for an ultra-compact 3D printed field robot. In: Proceedings of Robotics: Science and Systems, Pittsburgh, Pennsylvania, pp. 1–9 (2018). https://doi.org/10.15607/RSS.2018.XIV.036. http://www.roboticsproceedings.org/rss14/p36.html


18. Choudhuri, A., Chowdhary, G.: Crop stem width estimation in Intelligence, vol. 34, no. 11, pp. 2274–2282, (2012). https://doi.
highly cluttered field environment. Proceedings of the Computer org/10.1109/TPAMI.2012.120
Vision Problems in Plant Phenotyping (CVPPP 2018), Newcastle, 37. Perissini, I.: Experimental analysis of color constancy and seg-
UK, 6–13 (2018) mentation algorithms for plant seedlings detection. Master’s thesis.
19. Canny, J.: A computational approach to edge detection. IEEE University of Sao Paulo (2018)
Transactions on Pattern Analysis and Machine Intelligence 38. Meyer, G., Neto, J.: Verification of color vegetation indices for
PAMI 8(6), 679–698 (1986). https://doi.org/10.1109/TPAMI. automated crop imaging applications. Computers and Electronics
1986.4767851 in Agriculture, 282–293 (2008). https://doi.org/10.1016/j.compag.
20. Bradski, G., Kaehler, A.: Learning OpenCV: Computer Vision with 2008.03.00939
the OpenCV Library. O’Reilly, Cambridge (2008) 39. Hamuda, E., Glavin, M., Jones, E.: A survey of image processing
21. Aqel, M.O., Marhaban, M.H., Saripan, M.I., Ismail, N.B.: Review techniques for plant extraction and segmentation in the field. Com-
of visual odometry: types, approaches, challenges, and applica- puters and Electronics in Agriculture, 184–199 (2016). https://doi.
tions. SpringerPlus 5(1), 1–26 (2016) org/10.1016/j.compag.2016.04.024
22. Scaramuzza, D., Fraundorfer, F.: Visual odometry [tutorial]. IEEE 40. Breiman, L.: Random Forests. Machine Learning 45, 5–32 (2001).
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


Estêvão Serafim Calera is currently a student of Mechatronics Engineering at the University of São Paulo (USP), having enrolled in 2018. From 2019 to 2021, he was a member of an undergraduate robotics group called SEMEAR, where he served as a project manager and director, leading nearly 30 people across five simultaneous projects. Through his leadership, the team won two national titles and secured a second-place finish in an international competition. In 2021 and 2022, he devoted himself to the development of autonomous mobile robotics in agriculture and computer vision algorithms, working at the Mobile Robotics Laboratory (LabRoM) with the support of the São Paulo Research Foundation (FAPESP). Currently, he is focused on developing Blockchain solutions in the corporate context.

Gabriel Correa de Oliveira is currently an undergraduate student in Mechatronics Engineering at the University of São Paulo (USP). From 2019 to 2022, he was a member of an undergraduate robotics group called Soluções em Engenharia Mecatrônica e Aplicação na Robótica (SEMEAR), developing algorithms for computer vision, telemetry, and control for autonomous aerial robots. In 2020 and 2021, he worked on a research project at the Mobile Robotics Laboratory (LabRoM) focused on autonomous mobile robotics in agriculture, computer vision, and machine learning, receiving financial support from the São Paulo Research Foundation (FAPESP). From June to October 2022, he worked at an agricultural drone company on navigation, guidance, and control. Since January 2023, he has been working as a data scientist in customer retention for a bank.

Gabriel Lima Araujo is currently an undergraduate student in Mechatronics Engineering at the University of São Paulo (USP). From 2019 to 2021, he was a member of an undergraduate robotics group called Soluções em Engenharia Mecatrônica e Aplicação na Robótica (SEMEAR), developing autonomous robots. In 2020 and 2021, he worked on a research project at the Mobile Robotics Laboratory (LabRoM) focused on autonomous mobile robotics in agriculture and computer vision, receiving financial support from the São Paulo Research Foundation (FAPESP). Since 2022, he has been on an exchange program with Leibniz University Hannover (LUH), in Germany, focusing on machine learning algorithms.

Jorge Id Facuri Filho is currently an undergraduate student in Electrical Engineering at the University of São Paulo (USP). From 2019 to 2022, he was a member of an undergraduate robotics group called Soluções em Engenharia Mecatrônica e Aplicação na Robótica (SEMEAR), developing computer vision solutions for autonomous robots. In 2020 and 2021, he worked on a research project at the Mobile Robotics Laboratory (LabRoM) focused on developing a visual odometry solution for autonomous mobile robotics in agriculture, receiving financial support from the São Paulo Research Foundation (FAPESP), and published a paper at LARS (Latin American Robotics Symposium). Since 2022, he has been working with Strategic Sourcing solutions at a bank.

Lucas Toschi is currently an undergraduate student in Electrical Engineering at the University of São Paulo (USP). From 2019 until 2022, he was a member of an undergraduate robotics group called Soluções em Engenharia Mecatrônica e Aplicação na Robótica (SEMEAR), developing autonomous robots and leading teams. In 2022, he worked as a director of the SEMEAR Autonomous Robots Department, supervising almost thirty people on four projects. Since 2021, he has researched applications of Artificial Intelligence and Deep Learning to improve perception and navigation in the agricultural environment, supported by the São Paulo Research Foundation (FAPESP). From January 2023 to April 2023, he developed a corn crop perception project at the Distributed Autonomous Systems Laboratory (DASLab) of the University of Illinois at Urbana-Champaign (UIUC) with financial support from FAPESP.

Andre Carmona Hernandes is currently a Professor in the Electrical Engineering Department of the Federal University of São Carlos - UFSCar, Brazil. He holds a Ph.D. (2016) and an M.Sc. (2012) in Mechanical Engineering, both from the University of São Paulo (USP). He did an internship in 2014 at Ruhr-Universität Bochum, in Germany. He holds a bachelor's degree (2009) from the University of São Paulo (USP) in Mechatronic Engineering. His research interests include Modeling and characterization of quadrotor dynamics, Aerial Robotics, Computer Vision, and Machine Learning applied to aerial images.

Andres Eduardo Baquero Velasquez is a senior engineer of autonomy at EarthSense, where he is working on the development of autonomy algorithms for mobile robots operating in harsh, changing, and uncertain field environments. He holds a bachelor's degree (2011) in Electronic Engineering from the Llanos University (Colombia), and an M.Sc. (2015) and a Ph.D. (2019) in Mechanical Engineering from the University of São Paulo (Brazil). Additionally, he was a postdoc (2019–2022) at the Distributed Autonomous Systems Laboratory (DASLab) of the University of Illinois at Urbana-Champaign.

Mateus Valverde Gasparino is a Ph.D. student at the Computer Science Department, University of Illinois at Urbana-Champaign, USA. He was awarded an M.Sc. degree in Mechanical Engineering and a bachelor's degree in Mechatronics Engineering from the University of São Paulo, Brazil. He is currently a graduate research assistant at the Distributed Autonomous Systems Laboratory (DASLab), and his research interests include perception, mapping, control, and learning for robots in unstructured outdoor environments.


Girish Chowdhary is an associate professor and Donald Biggar Willett Faculty Fellow at the University of Illinois at Urbana-Champaign. He is the director of the Field Robotics Engineering and Science Hub (FRESH) at UIUC and the Director of the USDA/NIFA Farm of the Future. Girish holds a joint appointment with Agricultural and Biological Engineering and Computer Science; he is a member of the UIUC Coordinated Science Lab and holds affiliate appointments in Aerospace Engineering and Electrical Engineering. He holds a PhD (2010) from the Georgia Institute of Technology in Aerospace Engineering. He was a postdoc at the Laboratory for Information and Decision Systems (LIDS) of the Massachusetts Institute of Technology (2011–2013) and an assistant professor at Oklahoma State University (2013–2016). He also worked with the German Aerospace Center’s (DLR’s) Institute of Flight Systems for around three years (2003–2006). He is the winner of the Air Force Young Investigator Award and several best paper awards, including a best systems paper award at RSS 2018 for his recent work on the agricultural robot TerraSentia.

Vitor Akihiro Hisano Higuti holds a Mechatronics Engineering B.Sc. degree and a Ph.D. in Mechanical Engineering, both from the University of São Paulo (USP), Brazil. He is the main contributor to the current LiDAR-based perception subsystem that allows small robots such as TerraSentia to follow crop rows. Currently, he is Lead Autonomy Engineer at EarthSense, Inc., which manufactures and develops robots for agricultural tasks.

Marcelo Becker received his B.Sc. degree in Mechanical Engineering (ME), with emphasis on Mechatronics, in 1993 from the University of São Paulo (USP), Brazil. He received his M.Sc. and D.Sc. degrees in ME in 1997 and 2000, respectively, from the State University of Campinas (Unicamp), Brazil. During his D.Sc. studies, he spent 8 months as a guest student at the Institute of Robotics (IfR) of the Swiss Federal Institute of Technology, Zurich (ETHZ), where he was involved in research on obstacle avoidance and map-building procedures for indoor mobile robots. From August 2005 until July 2006, he did a sabbatical at the Autonomous Systems Lab (ASL) of the Swiss Federal Institute of Technology, Lausanne (EPFL), where he was involved in research on obstacle avoidance for indoor and outdoor mobile robots. From 2001 until 2008, he was an associate professor at the Pontifical Catholic University of Minas Gerais (PUC Minas), Brazil. From 2002 to 2005, he was also the co-head of the Mechatronics Engineering Department and of the Robotics and Automation Group (GEAR) at PUC Minas. Since 2008, he has been a Professor at the University of São Paulo (EESC-USP), and since 2022 he has been the coordinator of the Robotics Center at USP. He has published several papers in the fields of mechanical design and mobile robotics in conferences and journals. Although his research interests range broadly, his chief areas of interest are mobile robots for agriculture, inspection robots for industry, design methodologies and tools, and mechanical design applied to robots and mechatronics.


Authors and Affiliations

Estêvão Serafim Calera1 · Gabriel Correa de Oliveira1 · Gabriel Lima Araujo1 · Jorge Id Facuri Filho1 ·
Lucas Toschi1 · Andre Carmona Hernandes2 · Andres Eduardo Baquero Velasquez3 ·
Mateus Valverde Gasparino4 · Girish Chowdhary4 · Vitor Akihiro Hisano Higuti3 · Marcelo Becker1

Estêvão Serafim Calera
estevaoscalera@usp.br

Gabriel Correa de Oliveira
gabrielcorrea@usp.br

Gabriel Lima Araujo
gabrielaraujo18@usp.br

Jorge Id Facuri Filho
jorgeid@usp.br

Lucas Toschi
ltoschi@usp.br

Andres Eduardo Baquero Velasquez
andru89@illinois.edu

Mateus Valverde Gasparino
mvalve2@illinois.edu

Girish Chowdhary
girishc@illinois.edu

Vitor Akihiro Hisano Higuti
akihiro@earthsense.co

1 Sao Carlos Engineering School - EESC, University of Sao Paulo - USP, Av. do Trabalhador Saocarlense, 400, 13566-590 Sao Carlos, SP, Brazil

2 Electrical Engineering Department, Federal University of Sao Carlos - UFSCar, Rod. Washington Luiz, s/n, 13565-905 Sao Carlos, SP, Brazil

3 EarthSense, 1800 South Oak Street, Suite 111, 61820 Champaign, IL, USA

4 Department of Agricultural and Biological Engineering, University of Illinois at Urbana-Champaign - UIUC, 61801 Urbana, USA
