
Computers and Electronics in Agriculture 80 (2012) 89–96


Sequential support vector machine classification for small-grain weed species discrimination with special regard to Cirsium arvense and Galium aparine
Till Rumpf a,*, Christoph Römer a, Martin Weis b, Markus Sökefeld b, Roland Gerhards b, Lutz Plümer a
a Institute of Geodesy and Geoinformation, Department of Geoinformation, University of Bonn, Germany
b Institute of Phytomedicine (360), Department of Weed Science, University of Hohenheim, Germany

Article info

Article history:
Received 7 June 2011
Received in revised form 24 October 2011
Accepted 28 October 2011

Keywords:
Early weed detection
Cirsium arvense
Galium aparine
Feature selection
Sequential classification
Support vector machines

Abstract

Site-specific weed management can reduce the amount of herbicides used in comparison to classical broadcast applications. The ability to apply herbicides on weed patches within the field requires automation. This study focuses on the automatic detection of different species with imaging sensors. Image processing algorithms determine shape features for the plants in the images. With these shape descriptions classification algorithms can be trained to identify the weed and crop species. Since weeds differ in their economic loss due to their yield effect and are controlled by different herbicides, it is necessary to correctly distinguish between the species. Image series of different measurements with plant samples at different growth stages were analysed. For the classification a sequential classification approach was chosen, involving three different support vector machine (SVM) models. In a first step groups of similar plant species were successfully identified (monocotyledons, dicotyledons and barley). Distinctions within the class of dicotyledons proved to be particularly difficult. For that purpose species in this group were subject to a second and third classification step. For each of these steps different features were found to be most important. Feature weighting was done with the RELIEF-F algorithm and SVM-Weighting. The focus was on the early identification of the two most harmful species Cirsium arvense and Galium aparine, with higher accuracy than using a non-sequential classification approach. An overall classification accuracy of 97.7% was achieved in the first step. For the two subsequent classifiers accuracy rates of 80% and more were obtained for C. arvense and G. aparine.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Weed populations have been found to be distributed heterogeneously within agricultural fields (Marshall, 1998; Johnson et al., 1996; Christensen and Heisel, 1998; Gerhards and Christensen, 2003; Christensen et al., 2009). Due to the lack of automatic weed detection techniques and site-specific herbicide application, the majority of farmers spray herbicide uniformly across the field. An exact herbicide application to the weeds within a weed patch requires not only detailed information on the weed density but also on weed species distribution. Based on this information and the application of economic weed thresholds, selective herbicides can be sprayed site-specifically. Gerhards and Oebel (2006) realised herbicide savings in cereals, maize and sugar beet fields from 6% to 81% with site-specific herbicide application based on weed species distribution maps. A site-specific application of a mixture of the individual herbicides on the same field achieved savings of only 19% compared to a uniform treatment of the whole field (Gerhards and Sökefeld, 2003).

According to the demand of Christensen et al. (2009) for the identification of single weed species, a species discrimination of dicotyledons is necessary. A discrimination of species yields information on weed distribution and weed species composition, laying the foundations for the site-specific application of selective herbicides. A suitable application technology, which allows a simultaneous application of several herbicidal agents on-the-go, based on sensor signals or weed distribution maps, is a further requirement for the adoption of site-specific and selective herbicide application. Possible technical solutions for patch spraying with several herbicides were outlined by Schulze-Lammers and Vondricka (2010) and Sökefeld (2010).

A major step towards a practical solution for site-specific weed management is the development of precise and powerful data acquisition techniques to automatically and continuously determine in-field variation of weed populations. The most promising techniques to identify weed species in arable crops are based on image processing (Weis and Sökefeld, 2010). Infrared, multispectral and RGB (red, green, blue channel) cameras were used to take pictures of crops and weed species from a low distance above the ground. Plant properties were then extracted from the images by image processing algorithms. Those properties were computed as features, which were used to separate species from each other.

* Corresponding author. Tel.: +49 228 736335.
E-mail address: (T. Rumpf).

0168-1699/$ - see front matter 2011 Elsevier B.V. All rights reserved.

Åstrand and Baerveldt (2004) combined colour and shape features, which were derived after an initial plant segmentation step. The colour features were computed as standard deviation and mean of the RGB values. Shape features were computed as area of segment, form factors (distance variance to centre of gravity, compactness and moments). Segments were merged if the distance between them was small. In addition to the colour and shape features a row distance measure was introduced to locate the position in-row or between-row. Classification was finally done with a Bayesian approach. Burks et al. (2000) computed 33 unique colour texture features (co-occurrence) from a hue, saturation and intensity (HSI) representation to distinguish between five weed species and the soil. Different neural network classifiers were tested for their performance based on the data given in Burks et al. (2005), resulting in a backpropagation training algorithm with high classification success. Blasco et al. (2002) implemented a vision system for the detection of weeds in a lettuce crop. Segmentation into plant/soil was based on a Bayes classifier for RGB colours, and size features were used to separate weeds from lettuce plants. Cho et al. (2002) used a discriminant function for shape feature selection and neural networks to identify weeds in a radish crop. Tellaeche et al. (2011) successfully applied Support Vector Machines to identify weeds between crop rows based on weed and crop cover measures for image parts. Zhu and Zhu (2009) present a Support Vector Machines approach for weed detection based on shape and texture features for single leaves. Still, there was no approach for reliable and robust classification between weed species yet. This, however, is a necessary prerequisite for site-specific weed management.

The objective of this paper was the automatic classification of crop and the four weed classes Galium aparine, Cirsium arvense, other dicotyledonous weeds and monocotyledonous weeds using image processing and classification. Diverse economic thresholds for these weed classes and the availability of selective herbicides against these weeds are the reason for this separation. Gerowitt and Heitefuss (1990) determined weed thresholds for G. aparine between 0.1 and 0.5 plants m⁻², 40–50 plants m⁻² for dicotyledonous weeds in total and 20–30 plants m⁻² for grass weeds. For the very competitive weed species C. arvense Börner (1995) indicated a threshold of 2 plants m⁻².

The approach in this study is based on an imaging system capturing two images of different wavelengths to differentiate plants from background. The plants are then extracted with image processing algorithms and classified according to their shape. The shape description is expressed as shape features. While the differences in shape features between crop and weed were rather large, features of weeds were highly similar and therefore specific weeds are hard to identify. In addition different growth stages amplify the problem, especially during two-leaf stage. For instance, the dicotyledons G. aparine and Veronica persica appear to be nearly identical, but due to the high economic loss caused by G. aparine accurate classification is critical. This means that very specific features and classifiers are needed to solve this problem. This, however, is not possible in a single multi-classification approach with all weed species and crops. Hence, a sequential classification approach was developed. The main idea is to separate between similar subgroups of the dataset, like crop, monocotyledons and dicotyledons, which are well separable. In next steps dicotyledons were identified by features specialised for the current subgroup. Relevant features were determined by SVM-Weighting (Guyon et al., 2002) or RELIEF-F (Kononenko et al., 1994). Depending on the current task for classification linear and non-linear Support Vector Machines (SVMs) were used. This way it is possible to differentiate weed species of similar appearance.

The image analysis system in combination with automatic algorithms for plant species discrimination can be included into real-time and map-based approaches for site-specific weed control.

2. Materials and methods

2.1. Data acquisition

Images were taken from greenhouse series and in the field: weed and crop species were grown in pots in the years 2006 and 2008, field data were acquired in maize (2008), winter wheat (2007) and sugar beet crops (2007). Images of the red (R, ca. 580 nm) and infrared (IR, >720 nm) spectrum of the light were taken simultaneously and subtracted from each other (IR-R) to generate difference images (Sökefeld et al., 2007). Most of the red light is absorbed by plants for the photosynthesis, whereas the infrared light is reflected. All other materials in field (soil, mulch, stones) have a similar reflection at both wavelengths. Plant material therefore appears bright in the difference images due to the typical red edge in the reflectance spectrum.

The data set consisted of samples for 10 species, which were to be differentiated: two monocotyledonous, seven dicotyledonous weed species and summer barley were chosen because of their relevance for weed management. All species were in early growth stages, ranging from germination to two-leaf stage, only C. arvense appeared with up to five leaves. The different growth stages of each weed and crop (Hordeum vulgare, summer barley) were merged, because they have no economic relevance in the management practice, since the management thresholds are set according to the number of plants per m².

2.2. Segmentation and feature extraction

The acquired difference images are converted using image processing techniques: a grey value threshold is used to separate plants and background, resulting in binary images with two values, one for the foreground (plant) and the other for the background. In the binary images objects are identified by segmentation of connected foreground components. These segments correspond to the plants or parts thereof, if single leaves are not connected after the thresholding step. Fig. 1 shows the resulting objects after binarisation and segmentation for some of the training samples of this study.

Overlapping plants in this step lead to complex objects, containing parts of different plants. These objects would be difficult to separate into their components and even then a proper shape description is likely to fail for the following analysis. Therefore they are put into classes for overlapped plants and as such handled parallel to the other species classes in the system. Overlaps can also be identified according to their shape description, as they lead to large objects and their complexity expresses itself in some of the shape features. Since the monocotyledonous species with their long leaves naturally tend to overlap, these objects are usually assigned to a monocotyledonous class, which especially for H. vulgare crops leads to a valid decision. In Fig. 1 some overlapping of H. vulgare plants (HORVS) can be seen. The general assumption for the application of this approach is that the plants are measured in early development stages, where overlapping does not affect the overall sampling accuracy. The shape of single plants cannot be extracted from cluttered scenes, limiting the analysis to early growth stages. The most important herbicide applications, which can benefit from this technology, take place shortly after germination; the problems due to overlapping are limited during this period.

To identify different weeds and crop species, shape parameters were computed for the objects in the image. Some of the shape parameters were derived from the set of pixels belonging to an object, like areasize, inertia values according to the main axes of the object, central moments and moment invariants. Other features were derived from the border representation, like border
length and Fourier features. A combination of these feature representations led to the features compactness and distance features of the border to either the centre of gravity of the region (rmean, rmax) or to the main inertia axes (drear). To derive further features, a distance transform was applied to the region, resulting in a distance value (to the border) for each pixel. A skeletonisation step additionally created skeletons of the objects, which are the central lines of a region, defined by the local maxima of the distance values (Weis et al., 2007). Combining the skeleton lines with the distance transform, the distance values for all skeleton pixels were assembled and statistical measures were computed (number of values: skelsize, maximum, mean: skelmean, variance). These measures describe the overall thickness of an object: typically monocotyledon leaves have a larger skeleton and the skeleton does not have a large distance to the border lines. In contrast to these, dicotyledons with more compact leaves have a shorter skeleton representation with bigger distances to the border. Finally, the values of each feature had to be normalized to a mean of zero and a standard deviation of one. The normalisation avoids numerical problems introduced by different scales.

After the feature weighting steps, which were conducted during the analysis, the following features proved to be the most relevant and are therefore named here:

• areasize, the number of pixels of an object;
• rmean and rmax, the mean and maximum distance of the border to the centre of gravity;
• drear, the vertical distance of the border to the main axis;
• eccentricity and compactness, describing the overall elongatedness and computed as the ratio of the (squared) border length to the area (Burger and Burge, 2009);
• hu1 and hu2, the first two moment invariants as given by Hu;
• skelsize and skelmean, two features derived from the skeletonisation step, denoting the length of the skeleton (measured in pixel) and the mean distance to the border.

Segments of the data sets were selected for the training and classes were assigned to them, resulting in a large training data set of plants grown under various conditions. Since the appearance of the plants changes during the growth process, several growth stages of each species were distinguished during a training step. The training was done by manual assignment of classes to visually selected segments and their feature vectors, which contain the shape parameters. Samples of the training data set are shown in Fig. 1. Obviously the shape variation within each class can be high (top–down) and weeds of each group in early growth stages have similar appearance (left–right).

Fig. 1. Samples of the training data, sorted by class assignment. The segments were scaled to a common maximum width for better visual comparison of the shape. The species are given by their EPPO-Codes. Monocotyledonous: AGRRE: Agropyron repens; ALOMY: Alopecurus myosuroides. Dicotyledonous: CIRAR: C. arvense; GALAP: Galium aparine; LAMSS: Lamium sp.; MATIN: Matricaria inodora; SINAR: Sinapis arvensis; STEME: Stellaria media; VERPE: Veronica persica. Crop: HORVS: Hordeum vulgare.

2.3. Classification with support vector machines

The simple case of linear SVMs assumes that two classes (dichotomous problem) are linearly separable, i.e. their discriminant can be described by a linear function. Consider a set of samples $(\vec{x}_1, y_1), \ldots, (\vec{x}_l, y_l)$, $l \in \mathbb{N}$. Each sample has a vector $\vec{x}_i \in \mathbb{R}^m$ consisting of m known features describing it. The label $y_i \in \{\pm 1\}$ indicates to which class it belongs and usually cannot be directly observed. Each sample is generated by an unknown, underlying probability distribution $P(\vec{x}, y) = P(\vec{x}) P(y \mid \vec{x})$. The task at hand for a binary classifier is now to decide to which class an unknown sample belongs, based on the observed feature vector $\vec{x}$.

To achieve this a training data set of m < l samples is needed where both the feature vectors and the labels are observed for each sample. The classifier aims at finding an indicator function $f: \vec{x} \to \{\pm 1\}$ which approximates the unknown $P(\vec{x}, y)$ and gives the best prediction of the class y of a new sample $\vec{x}$ which was not part of the training data set.

Without a restriction of the set of functions, however, a function f which does well on the training data might not generalize well on unseen data. This risk of overfitting occurs especially if the indicator function f may be arbitrarily complex.

Hence, Support Vector Machines (SVMs) (Vapnik, 2000; Schölkopf et al., 1998) restrict their set of indicator functions to separating hyperplanes of the form

$$f(\vec{x}) = \operatorname{sgn}\left(\langle \vec{\omega}, \vec{x} \rangle + b\right), \tag{1}$$

where $\vec{\omega}$ denotes the orientation of the hyperplane and $b \in \mathbb{R}$ the offset from the origin. For a set of linearly separable data, however,

there are many different hyperplanes and the concept of finding a separating hyperplane is not exclusively used by Support Vector Machines. What is special about SVMs is the specific selection of the separating hyperplane. SVMs adopt the hyperplane which maximises the distance to the closest samples of both classes in the training data, i.e. the hyperplane with the maximal margin. As Vapnik (1998) proves, this choice of the hyperplane minimises the risk of overfitting.

To get the optimal separating hyperplane with maximal margin, we have to minimise

$$\frac{1}{2} \lVert \vec{\omega} \rVert^2, \tag{2}$$

$$\text{subject to } y_i \left(\langle \vec{\omega}, \vec{x}_i \rangle + b\right) \ge 1, \quad \forall i. \tag{3}$$

Note that only those few samples closest to the hyperplane determine the position of the hyperplane. Those samples are called support vectors.

To solve the constrained minimisation problem, it is often easier to solve the dual maximisation problem (Boyd and Vandenberghe, 2004). Using the Karush–Kuhn–Tucker conditions (Rockafellar, 1993) the dual Lagrangian is formed

$$\underset{\vec{\alpha}}{\text{maximise}} \; W(\vec{\alpha}) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \langle \vec{x}_i, \vec{x}_j \rangle, \tag{4}$$

$$\text{subject to } \alpha_i \ge 0 \tag{5}$$

$$\text{and } \sum_{i=1}^{m} \alpha_i y_i = 0, \tag{6}$$

where $\alpha$ denotes the Lagrangian variable. One key feature of the Lagrangian is that the multiplier $\alpha_i$ of every vector $\vec{x}_i$ which is not a support vector becomes zero in the optimisation process, so that such a vector has no weight in the construction of the optimal hyperplane

$$\hat{\vec{\omega}} = \sum_{i=1}^{m} \alpha_i y_i \vec{x}_i, \tag{7}$$

$$\hat{b} = -\frac{1}{2} \langle \hat{\vec{\omega}}, \vec{x}_a + \vec{x}_b \rangle, \tag{8}$$

where $\vec{x}_a$ (class $+1$) and $\vec{x}_b$ (class $-1$) are any support vectors of the two classes.

2.3.1. Nonlinear support vector machines

Until now linear separability was assumed, though this is indeed a rather special case. As every dot product can be replaced by a kernel of the form $k(\vec{x}_1, \vec{x}_2)$ (Schölkopf and Smola, 2002), the dual expression in Eq. (4) makes it possible to use the so-called kernel trick and modify Eq. (4) to

$$\underset{\vec{\alpha}}{\text{maximise}} \; W(\vec{\alpha}) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j k(\vec{x}_i, \vec{x}_j). \tag{9}$$

In our approach, we use a very popular kernel, namely Gaussian radial basis functions (RBF). The idea is similar to radial basis networks, though instead of placing the radial basis functions into the centres of both classes, they are placed at each support vector, i.e. those samples which are critical for the classification task (Schölkopf et al., 1997). The RBF kernel used is

$$k(\vec{x}_1, \vec{x}_2) = \exp\left(-\frac{\lVert \vec{x}_1 - \vec{x}_2 \rVert^2}{2\sigma^2}\right). \tag{10}$$

2.3.2. Soft margin support vector machine

In order to improve the generalisation ability of the SVM classifier, the risk of outliers, noise and wrongly labelled training samples has to be taken into account. To achieve this, slack variables $\xi_i$ are introduced which penalise classification errors. This modifies the optimisation problem to

$$\underset{\vec{\omega}}{\text{minimise}} \; \frac{1}{2} \lVert \vec{\omega} \rVert^2 + C \sum_{i=1}^{m} \xi_i, \tag{11}$$

$$\text{subject to } y_i \left(\langle \vec{\omega}, \vec{x}_i \rangle + b\right) \ge 1 - \xi_i, \quad \forall i \tag{12}$$

$$\text{and } \xi_i \ge 0, \quad \forall i. \tag{13}$$

The parameter C handles the tradeoff between margin maximisation and training error minimisation (Schölkopf and Smola, 2002). Interestingly, apart from an additional upper bound in constraint (5), which becomes $C \ge \alpha_i \ge 0$, the general form of the dual problem does not change (Cortes and Vapnik, 1995).

2.4. Feature weighting

It is unlikely that all 50 shape parameters are equally important for class separation. This is important as many bad features may outweigh a small number of good ones. Out of many possible feature weighting methods two approaches were chosen. For linear classification SVM-Weighting (Guyon et al., 2002) was used and for non-linear classification tasks the RELIEF-F algorithm (Kira et al., 1992) proved to be suitable.

2.4.1. SVM-Weighting

As stated above SVMs are multivariate classifiers, which means they are able to handle multiple features in training simultaneously. The weight vector $\vec{\omega}$ (Eq. (7)), which is a linear combination of training patterns, specifies the weight of every feature in the construction of the optimal hyperplane. Features where separation is good have a large corresponding coefficient in $\vec{\omega}$ due to Eq. (7). These weights can also be used to rank features (Guyon et al., 2002).

2.4.2. RELIEF

First, a dichotomous case with non-linear classification was considered. The key idea of the RELIEF algorithm is to estimate the quality of features according to how well their values distinguish among samples that are near to each other (Kononenko et al., 1994). RELIEF searches for two neighbourhoods of a given sample, each consisting of k elements. For a given k, hit is the set of the k nearest neighbours of the same class and miss the set of the k nearest neighbours of the different class. The relevance of a feature $F_j$ is determined by the sum of the Euclidean distances between nearest misses $M_l$ and nearest hits $H_l$ for all samples used to approximate probabilities.

Listing 1: Pseudo code of the RELIEF-F algorithm for two class classification.

    INPUT: a set of features F = F_1, ..., F_m, a set of samples R_1, ..., R_n
           and a class label for each R
    OUTPUT: a set of feature weights W = W_1, ..., W_m
    set all weights W(F) := 0
    for i := 1 to n do (number of samples for approximating probabilities)
    begin
        randomly select a sample R_i;
        find k nearest hits H_l and nearest misses M_l;
        for j := 1 to m do (all features)
        begin
            W(F_j) := W(F_j) - sum_{l=1..k} difference(F_j, R_i, H_l) / (m * k)
                             + sum_{l=1..k} difference(F_j, R_i, M_l) / (m * k);
        end
    end;
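The two-class RELIEF update of Listing 1 can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation used in the study: the function names, the range scaling inside `diff`, the normalisation by the number of visited samples times k, and the toy data are all assumptions made here for the example.

```python
import random


def diff(a, b, lo, hi):
    """Absolute difference of one feature value, scaled to [0, 1]."""
    return abs(a - b) / (hi - lo) if hi > lo else 0.0


def relief_weights(samples, labels, k=2, n_iter=None, seed=0):
    """Two-class RELIEF with k nearest hits and k nearest misses (sketch)."""
    n, m = len(samples), len(samples[0])
    lo = [min(s[j] for s in samples) for j in range(m)]
    hi = [max(s[j] for s in samples) for j in range(m)]

    def dist(a, b):
        # Euclidean distance on range-scaled features.
        return sum(diff(a[j], b[j], lo[j], hi[j]) ** 2 for j in range(m)) ** 0.5

    # Assumption: visit every sample once by default; otherwise draw
    # n_iter samples at random, as in the listing.
    if n_iter is None:
        picks = list(range(n))
    else:
        rng = random.Random(seed)
        picks = [rng.randrange(n) for _ in range(n_iter)]

    weights = [0.0] * m
    norm = len(picks) * k  # assumed normalisation constant
    for i in picks:
        by_distance = sorted((t for t in range(n) if t != i),
                             key=lambda t: dist(samples[i], samples[t]))
        hits = [t for t in by_distance if labels[t] == labels[i]][:k]
        misses = [t for t in by_distance if labels[t] != labels[i]][:k]
        for j in range(m):
            for t in hits:    # differences to near same-class samples lower the weight
                weights[j] -= diff(samples[i][j], samples[t][j], lo[j], hi[j]) / norm
            for t in misses:  # differences to near other-class samples raise it
                weights[j] += diff(samples[i][j], samples[t][j], lo[j], hi[j]) / norm
    return weights


# Toy data (invented): feature 0 separates the classes, feature 1 does not.
samples = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
labels = [-1, -1, -1, 1, 1, 1]
weights = relief_weights(samples, labels, k=2)
```

On this toy set the separating feature receives a positive weight and the uninformative one a negative weight, which is the ranking behaviour the algorithm is meant to provide.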

For multi-class problems the extension RELIEF-F (Kononenko et al., 1994) can be used. Instead of finding the nearest miss M from the different class, the algorithm finds one nearest miss of each different class and averages their contribution for updating the weight W(Fj). The average is weighted with the prior probability of each class.

3. Results

3.1. One-against-all classification

In a first approach a classifier to differentiate all classes in one step was applied. Due to the number of 10 classes and difficulties of class separation, especially among dicotyledons, a non-linear SVM was used. Features were weighted according to the RELIEF-F algorithm. This way, features with good class separability properties got more weight in classification.

After the application of a non-linear SVM with RBF kernel in order to classify between all classes, an overall accuracy of only 69.25% was achieved (Table 1). The classification result was not sufficient for our task; however, we learned which classes were well separable and which classes were often mixed up. H. vulgare with a class recall of 82.98% and the two dicotyledons C. arvense and Sinapis arvensis with a class recall over 77% were adequately identified, while the other weeds were systematically misclassified.

First, misclassification occurred in the group of monocotyledons Agropyron repens and Alopecurus myosuroides (highlighted dark grey), whereas the separation of H. vulgare, monocotyledons and dicotyledons was excellent. Second, within the dicotyledons symmetric misclassifications could also be observed. Especially two groups of weeds were interchanged, viz. the first group G. aparine, Lamium sp. and V. persica (highlighted black) and the second group Matricaria inodora and Stellaria media (highlighted light grey). The group of misclassifications within dicotyledons is of particular interest, because G. aparine is a weed with low economic threshold which accordingly has to be identified with high accuracy.

In order to improve the classification result specific models for special groups of weeds had to be learned. The classification problem was sequentially divided into several separation models. Thus, in each classification particularly suitable features could be weighted higher and parameters of each SVM could be individually adapted.

The advantage of this approach is that by reduction of class numbers more specific features could be weighted higher for this particular classification. As the non-linear kernel used is based on radial basis functions, it naturally depends on the density of the data points, especially near the margin. The more classes are used, the higher is the risk that their densities differ significantly. This can be expected here as some classes (e.g. G. aparine, A. repens and A. myosuroides) are very dissimilar in feature space and others (e.g. G. aparine, V. persica and Lamium sp.) are very close. Due to the variance in density within the different classes, the classification problem had to be sequentially divided into robustly separable groups of weeds. The features skelsize and areasize were the most important to separate the crop H. vulgare from the classes monocotyledons and dicotyledons, but were not suitable to separate the different classes of dicotyledons. Other features have more relevance for the separation of the dicotyledonous species.

First, the crop H. vulgare and the groups of monocotyledons and dicotyledons were clearly classified. Second, specific classifiers for the more difficult classification of the dicotyledons were applied sequentially. This approach is introduced below.

3.2. Sequential classification

The aim of sequential classification was the identification of weeds with high economic effect. The classification task was divided into three steps. In a first step the three well separable groups H. vulgare, monocotyledons and dicotyledons, as shown in Table 1, were differentiated. Using SVM-Weighting the features were weighted by relevance for this classification step. The most relevant features separating the three groups were skelsize, areasize, rmean and rmax. The overall classification accuracy using a linear SVM was 97.74% (Table 2). The class recall of the dicotyledons, which have the highest economic effect, even amounted to 99.24%. Further, only a classification between several dicotyledons was necessary.

In contrast to the SVM-Weighting of the first classification step, we applied RELIEF-F to weight the features in the second, non-linear classification by SVM with RBF kernel. The two features most weighted by RELIEF-F were eccentricity and hu1. A second group of features, namely skelmean, drear and hu2, were almost as relevant. In comparison to the first classification step the most relevant features were different. The second classification step separated the dicotyledons with an accuracy of 69.37% (Table 3). Regarding the classification result in more detail, four classes were detected with a high accuracy of above 70%, whereas the classification result of C. arvense, the weed with highest economic effect, was even above 82.71%. Symmetric misclassifications, however, still occurred. The identification of the second weed with high economic effect, namely G. aparine, was still unsatisfactory with an accuracy of 64.00%. The two classes with which G. aparine was confused are Lamium amplexicaule and Lamium purpureum (merged as LAMSS) and V. persica (highlighted grey).

Thus, in a third step we learned a non-linear SVM classifier with RBF kernel only for these three dicotyledons. The most relevant

Table 1
Results of the one-against-all non-linear SVM classification in one step with features weighted by RELIEF-F. Cells of equal colour highlight the groups of weeds where common misclassifications occurred.

Table 2
Results of the first step of sequential classification with linear SVM.

Prediction         Ground truth                              Class precision (%)
                   Monocotyledons  Dicotyledons  HORVS
Monocotyledons     171             3             3           96.61
Dicotyledons       2               784           10          98.49
HORVS              3               3             81          93.10
Class recall (%)   97.16           99.24         86.17       97.74
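
The derived figures in Table 2 follow directly from the raw counts. The short sketch below recomputes them, assuming (as the table header indicates) that rows are predictions and columns are ground truth; the function name `confusion_metrics` is illustrative and not taken from the paper.

```python
def confusion_metrics(matrix):
    """Per-class precision and recall plus overall accuracy, all in percent.

    Rows of `matrix` are predicted classes, columns are ground-truth classes.
    """
    n = len(matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    precision = [100.0 * matrix[i][i] / row_sums[i] for i in range(n)]
    recall = [100.0 * matrix[i][i] / col_sums[i] for i in range(n)]
    accuracy = 100.0 * sum(matrix[i][i] for i in range(n)) / sum(row_sums)
    return precision, recall, accuracy


# Counts from Table 2 (order: monocotyledons, dicotyledons, HORVS).
table2 = [
    [171, 3, 3],
    [2, 784, 10],
    [3, 3, 81],
]
precision, recall, accuracy = confusion_metrics(table2)
```

Rounded to two decimals, the results match the precision column, the recall row and the overall accuracy of 97.74% reported in the table.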

Table 3
Results of the second step of sequential classification for the dicotyledonous weed species with non-linear SVM. Cells of equal colour highlight the group of weeds where common misclassifications occurred.

Table 4
Results of the third step of sequential classification with non-linear SVM.

Prediction         Ground truth                  Class precision (%)
                   GALAP    VERPE    LAMSS
GALAP              100      11       21          75.76
VERPE              11       86       17          75.44
LAMSS              14       16       105         77.78
Class recall (%)   80.00    76.11    73.43       76.36

features for this classifier were rmin, hu2 and compactness. The average classification performance of these three dicotyledons increased considerably to 76.36% (Table 4). The weed G. aparine with high economic effect was reliably detected with a class recall of 80%.

The sequential classification steps and achieved classification accuracies are summarised in Table 5.

4. Discussion

Automatic weed detection is a prerequisite for accurate and profitable site-specific weed control. Furthermore, not only differentiation between crop and weed, but especially between different weeds, is of high interest. This is important, as the economic threshold varies highly between different weed species. Therefore an unidentified weed which causes high economic loss, and hence wrongly applied herbicides, may be fatal for yield. For instance, Pallutt and Flatter (1998) calculated yield losses in winter barley of 10–30 kg/ha for one G. aparine plant m⁻², whereas weeds with lower competitiveness like Lamium sp. caused losses between 1 and 2 kg/ha per plant m⁻².

The focus of this study is to discriminate between different weed groups, single weed species and crop plants of various image series taken in the greenhouse and under field conditions. This means that each class consists of individuals from images of diverse growth stages and origin, resulting in a very demanding classification task due to highly similar morphology of different weeds. Despite the high class variance an optimal classification scheme that can distinguish between several weed species and the crop was developed.

Due to the fact that the classification problem involves separation of a high number of classes and the separability of some species with similar morphology is difficult in one classification step, the problem was divided into several parts of classification. Well separable groups of weed species were identified by a standard SVM with RBF kernel, which was used to separate the different weed classes simultaneously in one step (Table 1). The classification results turned out to be unsuitable for herbicide-specific weed management. Especially weed species with a high negative impact on crop yield were often misclassified.

In order to achieve an adequate classification accuracy, a sequential classification approach was necessary. One key to adequate classification results is the weighting of relevant features. For instance, we have great morphologic differences between certain weed species such as C. arvense and V. persica on the one hand, and very similar shapes like G. aparine and Lamium sp. on the other hand. Naturally, the optimal features to describe the differences between named classes are not the same. Hence, features highly

Table 5
The weed species for the three sequential classification steps from top to bottom with the achieved classification accuracies.

          Crop (%)    Monocotyledons (%)    Dicotyledons (%)
Step 1    86.2        97.2                  99.2
Step 2    82.7    70.8    80.0    70.7    88.7
Step 3    80.0    76.1    73.4
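The routing logic behind the three steps summarised in Table 5 can be sketched as a small cascade. This is only an illustration under stated assumptions: the three stage functions are toy stand-ins for the trained SVMs (the paper uses a linear SVM in the first step and an RBF-kernel SVM in the second), the thresholds are arbitrary placeholders, the features `elongation` and `area` are hypothetical, and the class layout of the second step is simplified to "C. arvense versus the remaining, similar dicotyledons".

```python
from typing import Callable, Dict

Features = Dict[str, float]
Classifier = Callable[[Features], str]


def classify_sequentially(x: Features, step1: Classifier,
                          step2: Classifier, step3: Classifier) -> str:
    """Route one plant's shape features through up to three steps."""
    group = step1(x)                 # step 1: crop / monocotyledon / dicotyledon
    if group != "dicotyledon":
        return group
    species = step2(x)               # step 2: split the dicotyledons
    if species != "similar dicotyledon":
        return species
    return step3(x)                  # step 3: resolve the look-alike species


# Toy stand-ins for the trained SVMs. The thresholds and the features
# "elongation" and "area" are arbitrary placeholders, not fitted values.
def toy_step1(x: Features) -> str:
    if x["elongation"] > 0.8:
        return "crop" if x["area"] > 50.0 else "monocotyledon"
    return "dicotyledon"


def toy_step2(x: Features) -> str:
    return "C. arvense" if x["area"] > 30.0 else "similar dicotyledon"


def toy_step3(x: Features) -> str:
    if x["compactness"] < 0.3:
        return "G. aparine"
    return "V. persica" if x["rmin"] < 2.0 else "Lamium sp."


plant = {"elongation": 0.2, "area": 10.0, "compactness": 0.2, "rmin": 1.0}
print(classify_sequentially(plant, toy_step1, toy_step2, toy_step3))
```

Because each stage is passed in as a plain callable, every step can use its own feature weighting and kernel parameters without changing the routing code.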

weighted for an SVM classification of all plants in a one-step approach are not the same features that are best suited for a separation between two or three species. Therefore, a sequential classification approach first separates dissimilar subgroups of weeds, and then step by step specialises the classifier on separating the resulting subgroups of weeds with specialised features and particular kernel parameters.

Consequently, in the first classification step, groups of weed species (monocotyledons, dicotyledons and crop) were separated with a high accuracy of more than 97% by using a linear SVM combined with SVM-Weighting (Table 2). Pérez et al. (2000) reduced the weed-crop classification to a binary problem and achieved classification rates using pattern recognition of almost 90% for the crop and between 75% and 80% for the weed, depending on the classification process.

The subdivision of the dicotyledons into the classes of C. arvense and G. aparine during the second classification step follows from their extremely diverse economic thresholds in comparison to the other dicotyledons (Gerowitt and Heitefuss, 1990; Börner, 1995). Furthermore, selective herbicides are available for these weed species as well as for monocotyledons. The highly weighted shape features of the first classification step are not suitable for distinguishing the dicotyledons from each other. Moreover, discrimination between the dicotyledons is a non-linear task, and therefore the RBF kernel was chosen for the SVM, combined with the feature weighting algorithm RELIEF-F. Misclassifications between V. persica, Lamium sp. and G. aparine, however, still occurred.

This is the reason for the third classification step, which detects G. aparine with high accuracy, such that the classification accuracy reached 80% for this class. Gerhards and Oebel (2006) used an image analysis approach for weed detection with a similar subdivision of the weeds. In comparison to this approach, the sequential classification improved the results for G. aparine to 80% versus 72% and for monocotyledons to 97% versus 74%. The classification results achieved by means of sequential classification with SVM (Table 5), which are presented in this paper, are a solid basis for the differentiation of weed species, or rather groups of weed species, for site-specific weed control. This kind of classification not only allows the differentiation between crops and weeds, which is a crucial requirement for the site-specific application of broad-spectrum herbicides or herbicide mixtures, but is also capable of discriminating within the weeds by dividing them into monocotyledons and dicotyledons.

5. Conclusion

The conducted work shows that early detection and classification of different weed species using image processing and classification based on shape features is feasible. To focus the approach on the identification of weeds with high economic effects, in particular C. arvense and G. aparine, a sequential classification was implemented. The presented approach divides the overall complexity of the classification problem into less complex parts, thus improving the classification accuracy and detection rates. The results of the classification can be used by decision support systems for site-specific weed management. This will lead to improved herbicide savings in terms of cost reduction and minimisation of environmental impact.

References

Åstrand, B., Baerveldt, A.-J., 2004. Plant recognition and localization using context information. In: Proceedings of Mechatronics and Robotics 2004 (MechRob2004). Sascha Eysoldt Verlag, Aachen, Germany, Special Session Autonomous Machines in Agriculture, pp. 1191–1196.
Blasco, J., Aleixos, N., Roger, J.M., Rabatel, G., Moltó, E., 2002. AE-Automation and emerging technologies: Robotic weed control using machine vision. Biosystems Engineering 83, 149–157.
Boyd, S., Vandenberghe, L., 2004. Convex Optimization. Cambridge University Press, New York, NY, USA.
Burger, W., Burge, M.J., 2009. Principles of Digital Image Processing: Core Algorithms. Undergraduate Topics in Computer Science, 1st edition. Springer.
Burks, T., Shearer, S., Heath, J., Donohue, K., 2005. Evaluation of neural-network classifiers for weed species discrimination. Biosystems Engineering 91 (3), 293–304.
Burks, T.F., Shearer, S.A., Payne, F.A., 2000. Classification of weed species using color texture features and discriminant analysis. In: Transactions of the ASAE, vol. 43. American Society of Agricultural Engineers, pp. 441–448.
Börner, H., 1995. Unkrautbekämpfung. Gustav Fischer, Jena.
Cho, S.I., Lee, D.S., Jeong, J.Y., 2002. Weed–plant discrimination by machine vision and artificial neural network. Biosystems Engineering 83, 275–280.
Christensen, S., Heisel, T., 1998. Patch spraying using historical, manual and real-time monitoring of weeds in cereals. Zeitschrift für Pflanzenkrankheiten und Pflanzenschutz Sonderheft XVI, 257–263.
Christensen, S., Søgaard, H., Kudsk, P., Nørremark, M., Lund, I., Nadimi, E., Jørgensen, R., 2009. Site-specific weed control technologies. Weed Research 49 (3), 233–241.
Cortes, C., Vapnik, N.V., 1995. Support-vector networks. Machine Learning 20 (3), 273–297.
Gerhards, R., Christensen, S., 2003. Real-time weed detection, decision making and patch spraying in maize, sugar beet, winter wheat and winter barley. Weed Research 43 (6), 385–392.
Gerhards, R., Oebel, H., 2006. Practical experiences with a system for site-specific weed control in arable crops using real-time image analysis and GPS-controlled patch spraying. Weed Research 46 (3), 185–193.
Gerhards, R., Sökefeld, M., 2003. Precision farming in weed control system components and economic benefits. Precision Agriculture 1, 229–234.
Gerowitt, B., Heitefuss, R., 1990. Weed economic thresholds in the F.R. Germany. Crop Protection 9, 323–331.
Guyon, I., Weston, J., Barnhill, S., Vapnik, V., 2002. Gene selection for cancer classification using support vector machines. Machine Learning 46, 389–422, 10.1023/A:1012487302797.
Hu, M.K., 1962. Visual pattern recognition by moment invariants. IRE Transactions Information Theory 8 (2), 179–187.
Johnson, G., Mortensen, D., Gotway, C., 1996. Spatial and temporal analysis of weed seedling populations using geostatistics. Weed Science 44, 704–710.
Kira, K., Rendell, L.A., 1992. The feature selection problem: traditional methods and a new algorithm. In: Proceedings of the 10th National Conference on Artificial Intelligence, AAAI'92. AAAI Press, pp. 129–134.
Kononenko, I., 1994. Estimating attributes: analysis and extensions of relief. In: Proceedings of the European Conference on Machine Learning. Springer-Verlag, New York, Inc., Secaucus, NJ, USA, pp. 171–182.
Marshall, E., 1998. Field-scale estimates of grass populations in arable land. Weed Research 28, 191–198.
Pallutt, B., Flatter, A., 1998. Variabilität der Konkurrenz von Unkräutern in Getreide und daraus resultierende Auswirkungen auf die Sicherheit von Schwellenwerten. Zeitschrift für Pflanzenkrankheiten und Pflanzenschutz Sonderheft XVI, 333–344.
Pérez, A., López, F., Benlloch, J., Christensen, S., 2000. Colour and shape analysis techniques for weed detection in cereal fields. Computers and Electronics in Agriculture 25, 197–212.
Rockafellar, R.T., 1993. Lagrange multipliers and optimality. SIAM Review 35 (2), 183–238.
Schölkopf, B., Smola, J.A., 2002. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, reprint ed. The MIT Press, Cambridge, MA, USA.
Schölkopf, B., Smola, J.A., Müller, K.R., 1998. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation 10 (5), 1299–1319.
Schölkopf, B., Sung, K., Burges, J.C., Girosi, F., Niyogi, P., Poggio, T., Vapnik, N.V., 1997. Comparing support vector machines with gaussian kernels to radial basis function classifiers. IEEE Transactions on Signal Processing 45, 2758–2765.
Schulze-Lammers, P., Vondricka, J., 2010. Precision crop protection - the challenge and use of heterogeneity. In: Direct Injection Sprayer, first ed. Springer Verlag, Dordrecht, Heidelberg, London, New York, Ch., pp. 295–310.
Sökefeld, M., 2010. Precision crop protection - the challenge and use of heterogeneity. In: Variable Rate Technology for Herbicide Application, first ed. Springer Verlag, Dordrecht, Heidelberg, London, New York, Ch., pp. 335–347.
Sökefeld, M., Gerhards, R., Oebel, H., Therburg, R.-D., 2007. Image acquisition for weed detection and identification by digital image analysis. In: Stafford, J. (Ed.), Precision Agriculture '07, Sixth European Conference on Precision Agriculture, vol. 6. Wageningen Academic Publishers, The Netherlands, pp. 523–529.
Tellaeche, A., Pajares, G., Burgos-Artizzu, X.P., Ribeiro, A., 2011. A computer vision approach for weeds identification through support vector machines. Applied Soft Computing 11 (1), 908–915.
Vapnik, N.V., 1998. Statistical learning theory. Wiley, New York.
Vapnik, N.V., 2000. The nature of statistical learning theory, Statistics for engineering and information science, 2nd Edition. Springer-Verlag, New York.
Weis, M., Gerhards, R., 2007. Feature extraction for the identification of weed species in digital images for the purpose of site-specific weed control. In: Stafford, J. (Ed.), Precision Agriculture '07, Sixth European Conference on Precision Agriculture, ECPA, vol. 6. Wageningen Academic Publishers, The Netherlands, pp. 537–545.

Weis, M., Sökefeld, M., 2010. Precision crop protection - the challenge and use of heterogeneity. In: Detection and Identification of Weeds, first ed. Springer Verlag, Dordrecht, Heidelberg, London, New York, Ch., pp. 119–134.
Zhu, W., Zhu, X., 2009. The application of support vector machine in weed classification. In: Intelligent Computing and Intelligent Systems, 2009. ICIS 2009. IEEE International Conference on, vol. 4., pp. 532–536.