
Evaluating Preferential Voting Methods for Multiple Class Classification

Vaibhav Kumar Katturu∗, ParamPuneet Kaur Thind∗, Teryn Cha† and Sung-Hyuk Cha∗
∗Computer Science Department, Pace University, New York, NY, 10038, USA
Email: {vk38221n,pt54854n,scha}@pace.edu
†Mathematics, Engineering Technologies & Computer Sciences, Essex County College, Newark, NJ, 07102, USA
Email: yan@essex.edu

Abstract—Ensemble methods are of great importance in pattern recognition and machine learning because combining multiple classifiers often results in better accuracy and predictions in many classification problems. Recently, a novel preferential voting method with multiple convolutional neural networks was suggested to classify plant diseases. It was shown that the new approach performs better than conventional preferential voting methods. Here, the effect of the number of classes on preferential voting methods is studied to identify the optimal number of classes for each preferential voting method. A plant disease dataset containing images of eleven plants and seven plant diseases is used in the experiments. Four to seven disease classes were tested, and the proposed model performs best when the number of classes is five.

I. INTRODUCTION

Consensus of a group plays an important role in decision making such as elections [1]. It is essential in most democratic societies and has received great attention in the pattern recognition and machine learning communities as well. The idea of voting is employed in pattern classification to combine multiple classifiers. Various ensemble models for combining multiple classifiers can be found in [2], [3].

Widely used conventional voting methods are simple majority voting, if only the top choice is selected, and the Borda method, if all candidates are ranked. Ranking candidates is often called preferential voting, and an exhaustive list of preferential voting methods can be found in [4], [5]. Among the various preferential voting methods, the Borda method is popular in the pattern classification area, such as in [6]. In [7], a modified Borda method with a power weight vector was proposed to classify plant diseases, and its superiority over other voting methods was shown.

In real-world election systems, however, preferential voting systems are affected greatly by the number of candidates [8]. This paper raises the question of how the number of classes affects preferential voting methods for combining multiple classifiers. Various experiments are conducted to identify the optimal number of classes when preferential voting methods are used. To examine the effect of the number of classes on preferential voting methods, an image dataset of eleven plants and seven plant diseases is used.

A convolutional neural network (CNN) is a deep learning algorithm which can take an image as input, assign importance to various aspects or objects in the image, and differentiate one from another. A CNN is composed of multiple building blocks such as convolutional layers, pooling layers, and fully connected layers, and is designed to automatically and adaptively learn spatial hierarchies of features through the backpropagation algorithm [9]. CNNs have shown remarkable performance in computer vision tasks such as static images (object recognition) and videos (action or motion recognition) [10]. CNNs have proven their worth in medical applications as well. In all these applications, CNNs reach near precise accuracy, though in some cases a CNN does not perform well, due to (1) overfitting the data while training and (2) compromised quality of images. Multiple CNNs are utilized as voters in this paper because numerous CNNs can be easily generated by changing structures, filters, and functions, and candidates can be easily ranked by the output neurons' net values.

Combining multiple CNNs has been studied in [7], [11]. Here, a total of twenty CNN models were trained for each experiment with the goal of achieving maximum accuracy by each model. Architectural configurations for each model were designed to be non-repetitive. Each model was trained with a diversity of data, so as to maximize the accuracy of the result. After training the models, each model was made to vote in decreasing order for all 7 diseases. The voting information was then accumulated and processed again to calculate the resultant disease.

The rest of the paper is organized as follows. Section II provides a literature review of previous work on CNN ensemble methods. In Section III, the preferential voting method is reviewed with an illustration, and various weight vectors are introduced. Section IV provides the experimental results on the plant disease dataset and observations. Finally, Section V concludes this work.

II. RELATED WORK ON CNN ENSEMBLE

A. Ensemble of Multiple Classifiers

A CNN ensemble method with weighted voting has been used in many applications; one such application is Pneumonia
detection with weighted voting [11]. That work proposes a method to detect lung opacities, which can be identified as Pneumonia, on chest X-rays (CXR), by using an ensemble of different classifiers with a majority weighted voting method. Various attempts were made to ensemble Mask R-CNN and RetinaNet models, which resulted in a higher mean average precision (mAP). A suitable ratio of weights given to each classifier played an important role, increasing the mAP to 0.21746.

In [12], an ensemble CNN was proposed for the detection and recognition of traffic signs. It was developed to help drivers understand what a road sign says without shifting their focus off the road. This project was done using the Belgian and German datasets. Three CNNs were trained with random initialization of weights and later aggregated to form a single ensemble model by averaging the output of each CNN.

Another ensemble of a CNN with a long short-term memory (LSTM) method was proposed, using the output of a single CNN as input to an LSTM, for ensembling on the CIFAR-10 and CIFAR-100 datasets [13].

In [14], the ensemble method is applied to steganalysis. An ensemble method combining a CNN with an SRM-EC model by averaging their output classification probabilities was proposed.

B. Two-Stage Ensemble Method

Lee et al. suggested a model-selecting and box-voting method of two-stage detectors for the purpose of improving accuracy in object detection [15]. In the proposed model, object size is also considered when selecting the feature extractor. In box voting, the per-class average precision (AP) was used as the weight for each classifier on the PASCAL VOC 2012 dataset; the results showed an improvement in mAP.

In [16], a two-stage ensemble of deep CNNs was proposed. For each basic CNN model, multiple rounds of training were conducted. To increase diversity, 9 different CNN models were used. In the first stage, outputs from each basic CNN were integrated using Min-Max median. This is followed by a second stage combining all outputs of each CNN model.

III. PREFERENTIAL VOTING METHODS: PRELIMINARY

We provide a brief introduction to the background required for the preferential voting method. Consider a sample artificial neural network with four output neurons. Suppose that there are four classes, S = {d1, d2, d3, d4}, and each output neuron corresponds to its respective class. The output is classified to the class whose output net value is the maximum. In Fig. 1, the third output neuron has the highest output net value and thus the input is classified as d3. To utilize the preferential voting method, the ranks of the classes are recorded instead of simply the highest one.

Fig. 1. A sample ANN and its outputs. (The output neurons' net values and ranks are d1: 0.712, rank 2; d2: 0.012, rank 4; d3: 0.912, rank 1; d4: 0.312, rank 3.)

Suppose that there are (n = 5) different classifiers. Typical classifiers may return only the predicted class, whose output neuron value is the highest. If simple majority voting is conducted, it classifies the input instance into class d2, as it received two votes while the others received one vote each. A sample preferential voting is given in Table I, which shows the ranks assigned to the classes by each classifier.

TABLE I
SAMPLE PREFERENTIAL VOTING

               Classifiers
classes   c1   c2   c3   c4   c5
d1         2    3    4    3    1
d2         4    1    1    4    3
d3         1    2    2    2    2
d4         3    4    3    1    4

Due to flaws in the simple top-choice voting system, the preferential voting system, in which the voter ranks candidates in order of preference, has been proposed. Among these, the Borda method, also known as the rank method, uses a score function with certain weights. In preferential voting, a voter's ranking is an assignment of grades (e.g., "1st position", "2nd position", "3rd position") to the candidates. Requiring voters to rank all the candidates means that (1) every candidate is assigned a grade, (2) there are the same number of possible grades as the number of candidates, and (3) different candidates must be assigned different grades. A conventional Borda method uses the following weights to compute the score:

Borda (linear, offset 0), L0:   wL0(r) = |S| − r        (1)
Borda (linear, offset 1), L1:   wL1(r) = |S| − r + 1    (2)

The class for an input instance is determined by the class with the maximum total weight. Let r(x, di, cj) be the rank of the ith class di by the jth classifier cj on the input instance x. Then the ensemble preferential classifier can be defined as follows:

class(x, {c1, ..., cn}) = argmax_{di ∈ S} Σ_{j=1}^{n} w(r(x, di, cj))    (3)
                        = argmax_{di ∈ S} score(di)                      (4)

where score(di) = Σ_{j=1}^{n} w(r(x, di, cj))    (5)
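The ensemble rule of eqns (3)–(5) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the rank matrix is the one from Table I, the weight functions correspond to eqns (1) and (11), and all variable and function names are illustrative.

```python
# Preferential voting ensemble (eqns (3)-(5)): each classifier ranks
# every class, ranks are mapped to weights by a weight function, and
# the class with the maximum total score wins.

def w_linear0(rank, num_classes):
    # Linear Borda weight with offset 0, eqn (1): w(r) = |S| - r
    return num_classes - rank

def w_power0(rank, num_classes):
    # Power weight with offset 0, eqn (11): w(r) = 2^(|S| - r)
    return 2 ** (num_classes - rank)

def ensemble_vote(ranks, weight_fn):
    # ranks[d] holds the rank that class d received from each classifier
    num_classes = len(ranks)
    scores = {d: sum(weight_fn(r, num_classes) for r in rs)
              for d, rs in ranks.items()}
    winner = max(scores, key=scores.get)
    return winner, scores

# Rank matrix from Table I: five classifiers, four classes
table_i = {
    "d1": [2, 3, 4, 3, 1],
    "d2": [4, 1, 1, 4, 3],
    "d3": [1, 2, 2, 2, 2],
    "d4": [3, 4, 3, 1, 4],
}

winner, scores = ensemble_vote(table_i, w_linear0)
print(winner, scores)  # d3 {'d1': 7, 'd2': 7, 'd3': 11, 'd4': 5}
```

Swapping in `w_power0` reproduces the P0 column of Table III (d1: 17, d2: 20, d3: 24, d4: 14), again selecting d3.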
The score for each class is defined by the sum of the weights. If the weight vector in eqn (1) is used in the sample voting in Table I, the scores for each class are:

score(d1) = 2 + 1 + 0 + 1 + 3 = 7
score(d2) = 0 + 3 + 3 + 0 + 1 = 7
score(d3) = 3 + 2 + 2 + 2 + 2 = 11
score(d4) = 1 + 0 + 1 + 3 + 0 = 5

While the simple majority voting method classifies the input instance in Table I to d2, the Borda preferential voting method classifies it as d3. The voting methods discussed in this section can be viewed as generalizations of scoring methods, as different weights such as those in eqn (2) can be used. While eqn (1) produces the weight vector (3,2,1,0), eqn (2) produces the weight vector (4,3,2,1). The difference is the initial starting value, which is sometimes called the offset. It should be noted that there is no offset effect on the linear Borda scoring vector because of the following correlation, eqn (6):

scoreL1(di) = scoreL0(di) + n    (6)

However, the offset may cause differences in the other weight vectors stated below in eqns (7 ∼ 10).

If the weight vector is (1,0,0,0), it is the simple majority voting method, since it takes only the top choice. Instead of the linearly increasing functions in eqns (1) and (2), different increasing functions such as the triangular number, quadratic, or power functions given in eqns (7 ∼ 12) can be used to produce weight vectors.

Triangular, T0:   wT0(r) = wL0(r)(wL0(r) + 1)/2 = (|S| − r)(|S| − r + 1)/2        (7)
Triangular, T1:   wT1(r) = wL1(r)(wL1(r) + 1)/2 = (|S| − r + 1)(|S| − r + 2)/2    (8)
Quadratic, Q0:    wQ0(r) = wL0(r)^2 = (|S| − r)^2                                  (9)
Quadratic, Q1:    wQ1(r) = wL1(r)^2 = (|S| − r + 1)^2                              (10)
Power, P0:        wP0(r) = 2^{wL0(r)} = 2^{|S| − r}                                (11)
Power, P1:        wP1(r) = 2^{wL1(r)} = 2^{|S| − r + 1}                            (12)

TABLE II
WEIGHT VECTORS FOR VARIOUS PREFERENTIAL VOTING METHODS.

        maj.   Borda Lin.   Triangular   Quadratic   Power
rank    SM     L0    L1     T0    T1     Q0    Q1    P0    P1
1       1      3     4      6     10     9     16    8     16
2       0      2     3      3     6      4     9     4     8
3       0      1     2      1     3      1     4     2     4
4       0      0     1      0     1      0     1     1     2

Table II enumerates the nine weight vectors: the simple majority (SM), Borda linear, triangular, quadratic, and power weight vectors, depending on the offset. Table III shows the scores for each class under the various preferential voting methods.

It should also be noted that there is no offset effect on the power scoring vector because of the following correlation, eqn (13):

scoreP1(di) = 2 × scoreP0(di)    (13)

TABLE III
SCORES FOR EACH CLASS ON VARIOUS PREFERENTIAL VOTING METHODS.

        Voting methods
class   SM   L0   L1   T0   T1   Q0   Q1   P0   P1
d1      1    7    12   11   23   15   34   17   34
d2      2    7    12   13   25   19   38   20   40
d3      1    11   16   18   34   25   52   24   48
d4      1    5    10   8    18   11   26   14   28

IV. EXPERIMENT ON PLANT DISEASE

This section describes the experimental setups for identifying the effect of the number of candidates on various preferential voting methods. First, the plant disease dataset is explained. Then the complete process of developing the convolutional neural network models and of applying the preferential voting methods is presented.

A. Plant Disease Dataset

Agricultural demands are multiplying at an alarming rate, in direct proportion to the increasing population. However, the supply graph can almost be computed inversely. Unpredictable weather change and disease, along with other factors, cause massive yield reduction during pre- and post-harvest periods. Non-implementation of modern technology is a contributing factor to low yield. Hence, the problem of identifying plant contamination by adopting such methodologies has gained attention.

The plant disease dataset contains a total of 2,500 pictures of infected plants whose disease is known. The pictures were collected from various sources such as Google Images, the Plant Village dataset [17], and the Citrus database [18]. There are seven diseases: Anthracnose, Aster Yellow, Bacterial Spot, Black Rot, Canker, Early Blight, and Late Blight. There are eleven crops: Banana, Cabbage, Carrot, Grapes, Green Bell Pepper, Guava, Mango, Potato, Sweet Lemon, Tomato, and Yellow Bell Pepper.

Not all diseases occur in all plants, as depicted in Fig. 2. Hence, detecting plants may help detect diseases better. However, since the main purpose of the experiment is to compare the different voting methods' performances, the experiment here is concerned purely with detecting diseases using multiple CNNs. In other words, pre- and post-processing steps that improve the performance are ruled out in order to isolate the effect of the number of classes. Utilizing semantics and knowledge to improve the performance is beyond the scope of this paper. The experimental system can be utilized as a low-level sub-system of a larger disease diagnosis system.
Fig. 2. Relationship between plants and diseases. (The eleven crops: Banana, Cabbage, Carrot, Grapes, Green Bell Pepper, Guava, Mango, Potato, Sweet Lemon, Tomato, and Yellow Bell Pepper; the seven diseases: Aster Yellow, Anthracnose, Black Rot, Bacterial Spot, Canker, Early Blight, and Late Blight.)

Fig. 3. CNN model for the plant diseases. (A pre-processed input image is fed to a CNN whose seven output neurons correspond to d1 Anthracnose, d2 Aster Yellow, d3 Bacterial Spot, d4 Black Rot, d5 Canker, d6 Early Blight, and d7 Late Blight.)
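As Figs. 1 and 3 suggest, the rank vector each CNN contributes is obtained by sorting its output neurons' net values, with the highest net value receiving rank 1. A minimal sketch of this step (the function name is illustrative, not from the paper):

```python
# Convert one CNN's output net values into a preferential-voting rank
# vector: the neuron with the highest net value gets rank 1, the next
# highest gets rank 2, and so on.

def ranks_from_outputs(net_values):
    # Indices of the output neurons, ordered from highest to lowest value
    order = sorted(range(len(net_values)),
                   key=lambda i: net_values[i], reverse=True)
    ranks = [0] * len(net_values)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks

# Net values of the sample ANN in Fig. 1
print(ranks_from_outputs([0.712, 0.012, 0.912, 0.312]))  # [2, 4, 1, 3]
```

Applying this to each of the twenty CNNs yields the columns of a rank table like Table I, which the voting methods then aggregate.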

If a plant photo contains multiple objects, the region of interest was cropped and saved with its ground truth. The selected region of interest is converted into a 224 × 224 pixel image, where each pixel is scaled to a value between 0 and 1.

B. Convolutional Neural Network Setup

Four different experiments were set up depending on the number of classes: four, five, six, and seven.

The data was divided into three parts: training data, validation data, and testing data. The training and validation datasets were used while training the neural networks. The last set, the testing set, is used to report performance.

C. Experimental Results

1) Experiment One (4 classes): Experiment one was conducted on 4 classes. The 4 classes are Late Blight, Early Blight, Bacterial Spot, and Black Rot, consisting of 800 images per class. The entire dataset contained 3200 pictures. 400 pictures of each class were used for training all the classifiers, and 100 pictures of each class were selected for validation. A total of 300 pictures of each class was used for testing the classifiers and for acquiring each classifier's vote. Hence, the total training data consisted of 1600 pictures, the total validation data consisted of 400 pictures, and the total testing data consisted of 1200 pictures.

20 CNN classifiers were trained on the dataset, each with a different architecture and with different training samples. The accuracy of the classifiers ranged between 63% and 77%, with the highest accuracy being 77%, and the average of these classifiers is 70.748%. Simple majority voting provides considerable improvement, with a performance of 80.917%. Very little further improvement was recorded with the Borda linear voting method with offsets 0 and 1, as it showed an accuracy of 81.833%. Other voting functions like the triangular sequence with offsets 0 and 1 did not provide much improvement, either. The power sequence voting showed the highest accuracy. These experimental results are summarized in the second column of Table IV. 'Maximum Cx' is the highest accuracy among the 20 CNN classifiers and 'Average Cx' is the average of the 20 classifiers' performances.

TABLE IV
ACCURACIES OF THE VARIOUS PREFERENTIAL VOTING METHODS IN THE FOUR EXPERIMENTS.

Voting method               Exp 1 (4 classes)   Exp 2 (5 classes)   Exp 3 (6 classes)   Exp 4 (7 classes)
Maximum Cx                  77.000%             80.000%             73.000%             70.238%
Average Cx                  70.748%             76.000%             69.000%             66.895%
Simple Majority Voting SM   80.917%             82.917%             80.500%             72.381%
Borda linear L0 & L1        81.833%             83.033%             81.424%             73.330%
Triangular T0               81.167%             83.000%             81.348%             73.857%
Triangular T1               81.167%             83.000%             81.348%             73.857%
Quadratic Q0                81.000%             82.917%             81.348%             73.857%
Quadratic Q1                81.083%             82.976%             81.348%             73.857%
Power P0 & P1               83.083%             84.400%             83.000%             75.000%

2) Experiment Two (5 classes): Experiment two was conducted on 5 classes. The 5 classes are Late Blight, Early Blight, Bacterial Spot, Black Rot, and Anthracnose, consisting of 500 images per class. The entire dataset contained 2500 pictures. 300 pictures of each class were used for training all the classifiers, and 50 pictures of each class were selected for validation. A total of 150 pictures of each class was used for testing the classifiers and for acquiring each classifier's vote. Hence, the total training data consisted of 1500 pictures, the total validation data consisted of 250 pictures, and the total testing data consisted of 750 pictures.

20 CNN classifiers were trained on the dataset, each with a different architecture and with different training samples. The accuracy of the classifiers ranged between 65% and 80%, with the highest accuracy being 80%, and the average of these classifiers is 76%. Simple majority voting provides considerable improvement, with a performance of 82.917%. Very little further improvement was recorded with the Borda linear voting method with offsets 0 and 1, as it showed an accuracy of 83.033%. Other voting functions like the triangular sequence with offsets 0 and 1 did not provide much improvement, either. The power sequence voting showed the highest accuracy. These experimental results are summarized in the third column of Table IV.

3) Experiment Three (6 classes): Experiment three was conducted on 6 classes. The 6 classes are Late Blight, Early Blight, Bacterial Spot, Black Rot, Anthracnose, and Canker, consisting of 500 images per class. The entire dataset contained 3500 pictures. 240 pictures of each class were used for training all the classifiers, and 40 pictures of each class were selected for validation. A total of 220 pictures of each class was used for testing the classifiers and for acquiring each classifier's vote. In all, the total training data consisted of 1440 pictures, the total validation data consisted of 240 pictures, and the total testing data consisted of 1320 pictures.

20 CNN classifiers were trained on the dataset, each with a different architecture and with different training samples. The accuracy of the classifiers ranged between 67% and 73%, with the highest accuracy being 73%, and the average of these classifiers is 69%. Simple majority voting provides considerable improvement, with a performance of 80.5%. Very little further improvement was recorded with the Borda linear voting method with offsets 0 and 1, as it showed an accuracy of 81.424%. Other voting functions like the triangular sequence with offsets 0 and 1 did not provide much improvement, either. The power sequence voting showed the highest accuracy. These experimental results are summarized in the fourth column of Table IV.

4) Experiment Four (7 classes): Experiment four was conducted on 7 classes. The 7 classes are Late Blight, Early Blight, Bacterial Spot, Black Rot, Anthracnose, Canker, and Aster Yellow, consisting of 200 images per class. The entire dataset contained 1400 pictures. 160 pictures of each class were used for training all the classifiers, and 10 pictures of each class were selected for validation. A total of 30 pictures of each class was used for testing the classifiers and for acquiring each classifier's vote. Hence, the total training data consisted of 1120 pictures, the total validation data consisted of 70 pictures, and the total testing data consisted of 210 pictures.

20 CNN classifiers were trained on the dataset, each with a different architecture and with different training samples. The accuracy of the classifiers ranged between 58.571% and 70.238%, with the highest accuracy being 70.238%, and the average of these classifiers is 66.895%. Simple majority voting provides considerable improvement, with a performance of 72.381%. Very little further improvement was recorded with the Borda linear voting method with offsets 0 and 1, as it showed an accuracy of 73.33%. Other voting functions like the triangular sequence with offsets 0 and 1 did not provide much improvement, either. The power sequence voting showed the highest accuracy. These experimental results are summarized in the fifth column of Table IV.

V. CONCLUSION

Ensembles of multiple classifiers are among the most promising methods in pattern recognition and machine learning. There are many methods in computer science, especially using artificial intelligence, for the plant disease detection and classification process, but research in this field is still lacking. Even popular convolutional neural networks did not provide a good result on their own. However, when twenty CNNs were trained individually and combined using preferential voting methods, good performance was observed.

This paper evaluated preferential voting methods with different weight vectors: simple voting, linear, quadratic, triangular, and power vectors. The one with the power weight vector showed the highest performance on the plant disease dataset when used to combine multiple CNN classifiers.

The major observation made from the experiments is the effect of the number of candidates on preferential voting methods. Four different experiments were set up depending on the number of classes: four, five, six, and seven. When the number of classes (diseases) is five, the best performance was observed.

There are several future works and open problems. The first is exploring other datasets to support and verify the claims, such as the superiority of the power voting method and the five-class limit hypothesis. The second is to explore various other preferential voting methods and other ensemble methods. Finally, collecting a larger plant image dataset is necessary.
REFERENCES

[1] J. Levin and B. Nalebuff, An Introduction to Vote-Counting Schemes, Journal of Economic Perspectives, vol. 9, no. 1, pp. 3-26, 1995.
[2] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed., Wiley, 2001.
[3] T. Mitchell, Machine Learning, McGraw Hill, 1997.
[4] S.-H. Cha and Y. J. An, Syntactic and Semantic Taxonomy of Preferential Voting Methods, IAENG Transactions on Engineering Technologies, Lecture Notes in Electrical Engineering, vol. 247, pp. 301-315, 2014.
[5] S.-H. Cha and Y. J. An, Taxonomy and Nomenclature of Preferential Voting Methods, Lecture Notes in Engineering and Computer Science, Proceedings of the World Congress on Engineering and Computer Science (WCECS), San Francisco, USA, pp. 173-178, 2012.
[6] T. K. Ho, J. J. Hull, and S. N. Srihari, Decision Combination in Multiple Classifier Systems, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16, no. 1, pp. 66-75, 1994.
[7] V. Katturu, P. K. Thind, S.-H. Cha, and T. Cha, A New Ensemble Method for Convolutional Neural Networks and its Application to Plant Disease Detection, 16th International Conference on Machine Learning and Data Mining (MLDM), Amsterdam, The Netherlands, July 2020, pp. 187-196.
[8] S. Merrill III, Making Multicandidate Elections More Democratic, Princeton University Press, Princeton, NJ, 1988.
[9] R. Yamashita, M. Nishio, R. K. G. Do, and K. Togashi, Convolutional neural networks: an overview and application in radiology, Insights into Imaging, vol. 9, pp. 611-629, 2018.
[10] E. Park, X. Han, T. L. Berg, and A. C. Berg, Combining multiple sources of knowledge in deep CNNs for action recognition, in 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, 2016.
[11] H. Ko, H. Ha, H. Cho, K. Seo, and J. Lee, Pneumonia Detection with Weighted Voting Ensemble of CNN Models, in the 2nd International Conference on Artificial Intelligence and Big Data (ICAIBD), Chengdu, China, 2019.
[12] A. Vennelakanti, S. Shreya, R. Rajendran, D. Sarkar, D. Muddegowda, and P. Hanagal, Traffic Sign Detection and Recognition using a CNN Ensemble, Proceedings of the IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 2019.
[13] J. Chen, Y. Wang, Y. Wu, and C. Cai, An Ensemble of Convolutional Neural Networks for Image Classification Based on LSTM, Proceedings of the International Conference on Green Informatics (ICGI), Fuzhou, 2017.
[14] K. Liu, J. Yang, and X. Kang, Ensemble of CNN and rich model for steganalysis, Proceedings of the International Conference on Systems, Signals and Image Processing (IWSSIP), Poznan, 2017.
[15] J. Lee, S. Lee, and S. Yang, An Ensemble Method of CNN Models for Object Detection, Proceedings of the International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Korea, 2018.
[16] R. Uddamvathanak, Two-Stage Ensemble of Deep Convolutional Neural Networks for Object Recognition, Proceedings of the International Conference on Intelligent Rail Transportation (ICIRT), Singapore, 2018.
[17] S. P. Mohanty, Dataset of diseased plant leaf images and corresponding labels, https://github.com/spMohanty/PlantVillage-Dataset, 2018.
[18] H. T. Rauf, B. A. Saleem, M. I. U. Lali, M. A. Khan, M. Sharif, and S. A. C. Bukhari, A citrus fruits and leaves dataset for detection and classification of citrus diseases through machine learning, Data in Brief, vol. 26, Article 104340, 2019.
