
Row Detection Using a Machine Learning Approach for Autonomous Agriculture Vehicles


Niko Anthony Simon
Department of Electrical Engineering
University of St. Thomas
St. Paul, MN, United States

Cheol-Hong Min
Department of Electrical Engineering
University of St. Thomas
St. Paul, MN, United States

2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC) | 978-1-6654-1490-6/21/$31.00 ©2021 IEEE | DOI: 10.1109/CCWC51732.2021.9375999

Abstract— Autonomous agriculture vehicles require robust vision systems to navigate farm fields within rows. Image processing methods are commonly used for row detection in applications such as forestry and crop production, to collect growth information and yield data based on location. Often these methods work from a bird's-eye view on images obtained via drones to determine distinct patterns, but this does not offer close-up, specific information. Autonomous agriculture vehicles can offer a ground-level view of crops and are capable of relaying more nuanced data to the user. The proposed analysis focuses on determining the effectiveness of various features for machine-learning-based image classification of crops at a ground-level view. Due to the similarities in the colors and patterns observed in the images, classification for autonomous navigation is a difficult problem. The proposed solution is a machine-learning model constructed from color and texture features to predict the occurrence of the crop. The input image is divided into pieces, and the prediction is performed on a segment-by-segment basis to construct a coordinate map of distinct areas of interest in the input image. A combination of color features produced by K-means clustering and texture features produced by Haralick textures was found to deliver 95% accuracy. Autonomous navigation is a real-time application in which computational efficiency is important; therefore, we also aim to reduce processing time while achieving high accuracy.

Keywords—K-means, Haralick, Machine Learning, Autonomous Agriculture Vehicles, Row Detection

I. INTRODUCTION

Several aspects must be considered when analyzing agricultural image data. First, distinct regions of interest must be identified. For an autonomous vehicle on a farm, this means that the image data contains crops (in this case corn) separated into rows, grass in between the crop rows, and the sky. Associated conditions such as weather and camera orientation will cause variations. For example, if the sky is overcast in one image and sunny in an identical setting, the color regions that separate distinct areas may become muddled. If the autonomous vehicle is close to crops, close-up views of plants need to be considered, which may be highly detailed and could be misclassified. Various lighting conditions also affect classification results. If all lighting conditions were known beforehand, it might be possible to construct a series of color ranges to represent every input condition. An example input image is shown in Fig. 1.

Figure 1. Example image taken from a farm vehicle showing (corn) crops, grass and sky during the daytime, with shadows on the grass.

If an unexpected situation were to occur, such as grass being darker than usual or corn stalks being the same color as grass, confusion could arise when trying to distinguish different regions. This means that if image pixel colors are the main input to a system, it may still have some dependence on lighting even if results generally seem dependable, as shown in [1, 3, 5, 9, 10, 13]. On the other hand, texture identification depends entirely on patterns. Unlike color-based identification, which relies on determining pixel values and comparing them against a pre-set range, texture identification depends on the patterns that the pixels create.

II. PROPOSED METHOD

For our application, a machine learning classification approach is examined, specifically to identify the best-performing features. With these features, the proposed system will be implemented as a real-time application; therefore, certain aspects regarding processing efficiency and feature choice are highly important. The proposed method seeks to use a minimal number of features while producing a sufficiently high degree of prediction accuracy. A basic diagram of this procedure is shown in Fig. 2. If the majority of predictions are correct, misidentifications will stand out as obvious outliers in the output and can be treated as noise in post-processing.

Using a minimal number of features is desirable because features take time to be processed. In a real-time application, depending on how a feature is computed, it will take a certain amount of processing time to execute. This increased processing time is the trade-off for extra feature information that may not be strictly necessary for classification. Color and texture features are the most widely used across machine-learning classification applications; however, other types of features are also examined in a comprehensive feature analysis for this data to measure feature effectiveness.

Figure 2: Flow diagram of the proposed method

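The segment-wise flow of Fig. 2 can be sketched in a few lines. This is a minimal illustration, assuming NumPy and a grid of fixed-size segments; the function name, frame dimensions, and the choice to discard partial edge segments are assumptions of this sketch, not details given by the paper at this point.

```python
import numpy as np

def split_into_segments(image, seg_h, seg_w):
    """Split an H x W x 3 image into a grid of (row, col, segment) tuples.
    Partial segments at the right/bottom edges are discarded."""
    h, w = image.shape[:2]
    segments = []
    for top in range(0, h - seg_h + 1, seg_h):
        for left in range(0, w - seg_w + 1, seg_w):
            segments.append((top // seg_h, left // seg_w,
                             image[top:top + seg_h, left:left + seg_w]))
    return segments

# A 480x720 frame split into 90x60-pixel (width x height) segments
# gives an 8x8 grid of segments; each grid position later receives a
# region prediction, forming the coordinate map.
frame = np.zeros((480, 720, 3), dtype=np.uint8)
grid = split_into_segments(frame, seg_h=60, seg_w=90)
```

Each `(row, col)` pair here is what the later sections plot as X and Y coordinates of the prediction map.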
This method also relies on a pool of training data in order to construct a model. The training images must be an accurate representation of system inputs. If there is significant pattern variation within a particular region, then more training images may be required to develop an accurate model. An example of this is corn in different stages of the growing season: depending on the time of year and environmental conditions, a variety of textures and colors can be observed from the same location.

The proposed method also relies on a preprocessing stage that takes the input image and breaks it down into segments. The segments are then fed into the model, and predictions are plotted as coordinates to represent distinct regions of interest. This differs from other methods, which process the input image as a whole rather than gradually in segments.

Because this type of agricultural data does not include distinct objects, but rather areas of a particular occurrence, such as crops or grass, segmenting the data is desirable because a higher resolution can be achieved at the boundaries between different areas. The most critical boundary is between the crop and the grass, because this is where the autonomous vehicle travels, and it would be undesirable for the vehicle to drive into the crop. The second most important boundary is between the crop and the sky, because it helps to establish crop height, the horizon line, and other factors relating to the spatial orientation of the autonomous vehicle. Some of this data can supplement existing sensor data for the autonomous vehicle, as suggested by [1, 10, 12].

III. FEATURES AND TRAINING IMAGES

A. Features

Color data is based on image pixel values. Ideally, if all of the colors for a given area were known, a range of colors could be developed to represent that area. However, if the lighting is slightly different, those color ranges would be invalidated. This is specifically relevant in cases of shadow.

Checking every pixel value is expensive in terms of processing. Determining arbitrary color ranges and checking pixel values in this systematic manner may give good resolution, but only if the color ranges remain constant. In the case of agricultural data, several different conditions could influence the color, so evaluating colors at this level of detail could introduce unwanted problems. For this reason, only the most frequently occurring colors in a given training image are recorded and used as features.

Determining the most frequently occurring colors is achieved using K-means clustering, which examines all colors present in a segment. The detected colors are labeled and divided into groups, which can be measured by frequency of occurrence. Selecting an arbitrary n most-occurring colors must be implemented with care because, in some cases, there may not exist n different colors in a given area, as suggested by Yadav in [3]. This effect was observed in the sky region: on a clear, cloudless day there may be only two or three distinct colors in a given segment, while a segment of grass may contain twenty distinct colors.

There are a variety of ways to determine texture information, but one method was ultimately chosen for its ease of execution. Texture features are calculated using Haralick textures, which essentially look for patterns in adjacent pixels and output a set of coefficients to represent those patterns. Haralick textures are often used in biology and medical applications to identify organics, which are somewhat similar to the complex textures found in most plant life, as shown by Bekkari in [8]. Texture identification can be performed in dim or changing lighting conditions, which could be useful if the autonomous agriculture vehicle were to be active in the evenings.

Another benefit is that the number of coefficients produced by the Haralick texture operation is relatively small. This matters for the same reason discussed for color features: each additional coefficient increases processing time. Gabor filters were also considered as texture features for this implementation; however, they require several more coefficients and multiple filter instances to perform effectively.

Moments are also explored as a possible feature. Image moments are a weighted average of pixel values and are commonly used for shape detection. Specifically, the Hu moment is tested as a possible feature because it does not require certain phase-related inputs that other types of moments require, and its output consists of only seven coefficients, as shown by Fu [7].

B. Training Images

Eight hundred reference images each for the crop, sky, and grass are used to train the model. These images were created by segmenting several farm images from the surrounding location, extracting sample areas, and manually separating them into the three categories of interest. To develop an accurate model, it is vital to use several examples of the training textures that are expected to be observed in the input image. If the training images differ too much in perspective, size, or pattern from the input image, the quality of the predictions will suffer. Once the model is developed, it can be reused without having to be compiled again. One possible benefit of this design is that multiple models could be created and loaded if the surroundings change, depending on the specific type of crop or season of operation.

For each segment, the model outputs a percent confidence value for the predicted region: crop, grass, or sky. This means there will be a dominant percentage as well as secondary and tertiary percentages predicting whether a given segment shows a given region. In a segment with mixed regions, however, the percentages may be more skewed. The goal is to distinguish the different textures and colors, but the larger a segment is, the greater the chance that it will contain more than one region.

To address such limitations, an alternative preprocessing technique is considered. Rather than simply splitting the input into segments in a grid-like fashion, a sliding window can be applied to sweep across the input and create segments at a rate specified by the user. At a certain point, though, optimizations like this become a hardware design question. GPU selection and other hardware choices are outside the scope of this analysis, but they are major considerations for field implementation. Associated factors are considered in [4, 6, 11].
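The K-means color grouping described in Section III-A can be sketched as follows. This is a minimal illustration, assuming scikit-learn's KMeans and an RGB segment; the paper does not name a specific implementation, and in the full feature set the Haralick texture coefficients discussed above would be concatenated to this vector.

```python
import numpy as np
from sklearn.cluster import KMeans

def dominant_colors(segment, n_colors=5):
    """Cluster a segment's pixels and return the n_colors cluster
    centers, ordered from most to least frequently occurring."""
    pixels = segment.reshape(-1, 3).astype(np.float64)
    km = KMeans(n_clusters=n_colors, n_init=10, random_state=0).fit(pixels)
    counts = np.bincount(km.labels_, minlength=n_colors)
    order = np.argsort(counts)[::-1]        # most frequent first
    return km.cluster_centers_[order]       # shape: (n_colors, 3)

# A 90x60-pixel segment (width x height, as in Section IV) yields a
# compact 15-value color feature vector.
segment = np.random.randint(0, 256, size=(60, 90, 3), dtype=np.uint8)
features = dominant_colors(segment, n_colors=5).ravel()
```

Note that, as discussed above, `n_colors` must be chosen with care: a near-uniform sky segment may contain fewer distinct colors than requested clusters.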

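The training stage of Section III-B could look roughly like the sketch below: labeled reference segments for crop, grass, and sky are converted to feature vectors and used to fit a random forest, whose `predict_proba` output plays the role of the per-region percent confidence. The toy per-channel-mean "features" and the synthetic data standing in for the 800 reference images per class are assumptions of this sketch, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
LABELS = ["crop", "grass", "sky"]

def toy_features(segment):
    # Placeholder: per-channel means stand in for the K-means color
    # and Haralick texture features of Section III-A.
    return segment.reshape(-1, 3).mean(axis=0)

# Synthetic stand-in for the labeled reference segments of each class.
X, y = [], []
for label, base in zip(LABELS, ([40, 120, 40], [60, 160, 60], [180, 200, 240])):
    for _ in range(100):
        seg = np.clip(rng.normal(base, 10, size=(60, 90, 3)), 0, 255)
        X.append(toy_features(seg))
        y.append(label)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# predict_proba yields the dominant/secondary/tertiary percentages
# used later to threshold low-confidence segments.
probs = model.predict_proba([toy_features(np.full((60, 90, 3), 200.0))])
```

Once fitted, the model can be serialized and reloaded, matching the paper's point that multiple models could be swapped in for different crops or seasons.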
Figure 3. Sample segments of input image in two different sizes: 360x240 pixels (left), 90x60 pixels (right)

IV. EXPERIMENTAL RESULTS

A. Effects of Segment Size

During classification, the original image is segmented as part of processing the input, but the size of the segment affects accuracy and precision. Classification increases in difficulty depending on which specific area of the input is observed. Because corn stalks have a variety of textures in the husk and leaves, it is even possible for parts to be confused with the texture of the grass. When segments decrease in size, small areas of shadow and other small pockets of unexpected patterns become more apparent, as can be seen in Fig. 3. Increasing the number of segments does make the coordinate map more precise by increasing the number of predictions. However, the interpretation of predictions produced by the model must also become more complex as a tradeoff, in order to distinguish variations among individual textures and colors. For the proposed method, segments of 90x60 pixels are used. At most, segments could be reduced to about 45x30 pixels for this application while maintaining an identifiable representation of a given region; any smaller dimensions may introduce inconsistency.

For classification, a Random Forest classifier is used to construct the machine learning model. Several other classifier types were tested, and most performed similarly in terms of output accuracy. The specific classifier type is not a critical point of the proposed method and can be changed if a different classifier is found to process data more efficiently for this application.

B. Prediction Tolerance and Environmental Factors

To maintain the original aspect ratio of the input image, horizontal points are scaled by a factor of two-thirds. Once all of the image segments have been mapped to X and Y coordinates, they are fed into a KNN classifier to determine a decision boundary. KNN is a supervised machine learning algorithm commonly used for classification, in which coordinates are classified based on the labels of neighboring data. The value K represents the number of neighboring data points considered during classification, and there are tradeoffs in choosing an appropriate value for K. If K is too small, a bias could be introduced, creating noise in the decision boundary. If K is too large, increased processing time can become an issue, especially for a real-time application. The KNN classifier serves as a post-processing stage if portions of the output have low confidence in their predictions.

In cases where there is irregular or overlapping clustering of coordinates, the value of K becomes more relevant, but in this case it can be kept low without negative effects, as demonstrated by Guo in [2]. The KNN classifier groups the voids in the grass regions of the image map caused by the shadows, which can be seen in Fig. 4 and Fig. 5. In the formation of the decision boundary, the voids created by the shadows in the grass are appropriately filled in based on the clustering of surrounding coordinates.

Figure 4. KNN decision boundary with K=7; the yellow, blue and green colors have no special meaning and simply show divisions between the crop, sky and grass

Figure 5. Close-up view of corn stalks and corresponding KNN decision boundary, where K=7

C. Feature Effectiveness

If the input image itself has a close view of crop textures and colors, it has the potential to affect row identification predictions if this is not expected. In the case of Fig. 5, the corn is still in season, but the distribution of distinct regions is more complicated. The sky is intermingled with the corn stalks, and the grass can be seen on the other side of the furrow. This is a likely scenario for an autonomous device, and since it would be undesirable for it to drive into the crops, there needs to be a clear distinction between regions even if one region takes up most of the image.

Table 1 shows features that were evaluated using the training data pool of images. Each feature is measured against every other feature for its contribution to the aggregate accuracy.

Figure 6. Output predictions with no percentage threshold or KNN clustering, showing clear division between the crop, grass, and sky

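The KNN post-processing stage of Section IV-B can be sketched as follows: segment predictions become labeled (x, y) coordinates, and a K=7 nearest-neighbor classifier fills voids (such as shadowed grass) from the labels of surrounding coordinates. The grid layout and simulated voids below are synthetic illustrations, and scikit-learn is our assumed implementation, not the paper's.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy coordinate map: sky in the top rows, crop in the middle, grass
# below, with some grass coordinates omitted to mimic the voids left
# by low-confidence (shadowed) predictions.
coords, labels = [], []
for yy in range(12):
    for xx in range(12):
        region = "sky" if yy < 4 else "crop" if yy < 8 else "grass"
        if region == "grass" and (xx + yy) % 5 == 0:
            continue  # simulated low-confidence void
        coords.append((xx, yy))
        labels.append(region)

knn = KNeighborsClassifier(n_neighbors=7).fit(coords, labels)

# Void positions are filled in from their neighbors' labels, as in the
# decision boundaries of Fig. 4 and Fig. 5.
filled = knn.predict([(5, 10), (2, 9)])
```

Keeping K low, as noted above, is sufficient here because the synthetic clusters are regular and non-overlapping.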
TABLE I. FEATURE EFFECTIVENESS

Feature              Number of Coefficients   Effectiveness
Color (K-Means)      Between 3 and 10         Effective
Haralick Texture     4 groupings of 13        Effective
Hu Moment            7                        Not Effective
Variance             16200                    Not Effective
Median               16200                    Effective
Gaussian (Sigma=7)   16200                    Effective

Color, texture, and Gaussian features were the overall top-performing features. A Gaussian filter essentially applies a normal distribution, and changing the value of sigma adjusts the variance about the mean of the bell curve. Even though Gaussian and some time-domain features (like the median) make reasonable contributions, including all of them would increase the calculation time per segment. Adding features nevertheless remains possible: different implementations of color- and texture-based features could be added as long as the calculation is not overly intensive.

Segment size and environmental factors are not the only important considerations. The logic that interprets model predictions can be adjusted to omit predictions below a minimum threshold. Corn stalks are correctly identified as the major region in the foreground of Fig. 5, even when the camera orientation is adjusted to a close-up view of the crop. This is a good indication that no biases were introduced into the feature matrix during the construction of the model. In the case of Fig. 6, prediction thresholds are removed to observe the raw output of the model. Most of the misclassification occurs in the crop region, and an overall accuracy of about 95% is achieved. In general, the KNN decision boundary does not need to be used as a post-processing step unless environmental factors are present that have not been considered.

V. CONCLUSIONS

Under certain lighting conditions, color features perform significantly better, especially in cases of minimal shadow. If there is a significant amount of shadow caused by natural light, texture features offer more valuable feature data for classification. This is one of the benefits of including different feature types. Other types of features commonly used for classification of one-dimensional data, such as median and Gaussian filters, process the segment as a whole and produce thousands of coefficients on each instance. Overall, the tradeoff of increased processing is not worth the benefit these features offer. A purely signal-processing-based approach can be very effective but has some disadvantages compared to a machine learning approach, evident simply in flexibility of design: certain signal processing implementations may require the redesign of filter parameters, whereas a machine learning approach could simply swap out a model.

The most challenging aspect of working with this type of data is the complexity of, and similarity between, the crop and grass image data. Furthermore, applications that involve classifying plant life or other organic materials offer valuable comparisons for determining effective classification features in this application. The proposed method adequately solves the classification problem. However, there is an inherent issue associated with this type of machine learning approach: because features must be chosen manually, the performance of the model is directly related to that choice. Further work on this application will include video input and investigation of a neural-network-based classification solution.

REFERENCES

[1] M. Kaur and C. Min, "Automatic Crop Furrow Detection for Precision Agriculture," 2018 IEEE 61st International Midwest Symposium on Circuits and Systems (MWSCAS), Windsor, ON, Canada, 2018, pp. 520-523, doi: 10.1109/MWSCAS.2018.8623906.
[2] J. Guo and X. Wang, "Image Classification Based on SURF and KNN," 2019 IEEE/ACIS 18th International Conference on Computer and Information Science (ICIS), Beijing, China, 2019, pp. 356-359, doi: 10.1109/ICIS46139.2019.8940198.
[3] H. Yadav, P. Bansal and R. Kumar Sunkaria, "Color dependent K-means clustering for color image segmentation of colored medical images," 2015 1st International Conference on Next Generation Computing Technologies (NGCT), Dehradun, 2015, pp. 858-862, doi: 10.1109/NGCT.2015.7375241.
[4] U. Shruthi, V. Nagaveni and B. K. Raghavendra, "A Review on Machine Learning Classification Techniques for Plant Disease Detection," 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS), Coimbatore, India, 2019, pp. 281-284, doi: 10.1109/ICACCS.2019.8728415.
[5] W. Winterhalter, F. V. Fleckenstein, C. Dornhege and W. Burgard, "Crop Row Detection on Tiny Plants With the Pattern Hough Transform," IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 3394-3401, Oct. 2018, doi: 10.1109/LRA.2018.2852841.
[6] K. Aparna and P. Supriya, "Precision Agriculture in Maize Fields," 2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 2018, pp. 1407-1410, doi: 10.1109/ICCONS.2018.8662936.
[7] M. Fu, F. Liu, Y. Yang and M. Wang, "Background pixels mutation detection and Hu invariant moments based traffic signs detection on autonomous vehicles," Proceedings of the 33rd Chinese Control Conference, Nanjing, 2014, pp. 670-674, doi: 10.1109/ChiCC.2014.6896705.
[8] A. Bekkari, S. Idbraim, D. Mammass and M. E. Yassa, "Exploiting spectral and space information in classification of high resolution urban satellite images using Haralick features and SVM," 2011 International Conference on Multimedia Computing and Systems, Ouarzazate, 2011, pp. 1-4, doi: 10.1109/ICMCS.2011.5945611.
[9] X. Jinlin and J. Weiping, "Vision-Based Guidance Line Detection in Row Crop Fields," 2010 International Conference on Intelligent Computation Technology and Automation, Changsha, 2010, pp. 1140-1143, doi: 10.1109/ICICTA.2010.400.
[10] C. Tu, B. J. van Wyk, K. Djouani, Y. Hamam and S. Du, "An efficient crop row detection method for agriculture robots," 2014 7th International Congress on Image and Signal Processing, Dalian, 2014, pp. 655-659, doi: 10.1109/CISP.2014.7003860.
[11] L. Zheng and J. Xu, "Multi-crop-row detection based on strip analysis," 2014 International Conference on Machine Learning and Cybernetics, Lanzhou, 2014, pp. 611-614, doi: 10.1109/ICMLC.2014.7009678.
[12] G. Jiang, C. Zhao and Y. Si, "A machine vision based crop rows detection for agricultural robots," 2010 International Conference on Wavelet Analysis and Pattern Recognition, Qingdao, 2010, pp. 114-118, doi: 10.1109/ICWAPR.2010.5576422.
[13] N. H. Chehade, J. Boureau, C. Vidal and J. Zerubia, "Multi-class SVM for forestry classification," 2009 16th IEEE International Conference on Image Processing (ICIP), Cairo, 2009, pp. 1673-1676, doi: 10.1109/ICIP.2009.5413395.
