
HYBRID PRE-TRAINED CNN FOR MULTI-CLASSIFICATION OF RICE PLANTS
Sri Silpa Padmanabhuni¹, Abhishek Sri Sai Tammannagari², Rajesh Pudi³, Srujana Pesaramalli⁴
¹Associate Professor, Department of Computer Science and Engineering, PSCMR College of Engineering and Technology, Vijayawada, A.P.
²,³,⁴Student, Department of Computer Science and Engineering, PSCMR College of Engineering and Technology, Vijayawada, A.P.

Silpa.padmanabhuni@gmail.com

Abstract: India is the largest producer of rice, where it is also a widely grown crop. Because rice is planted over large areas in villages, towns, and cities, many types of rice seed have been produced and developed to respond to changes in growing conditions such as climate, soil, and water, and these varieties differ in nutritional value and flavour. New rice seed varieties continue to be introduced to the market as cropping methods advance. The system suggested in this research uses an Enhanced Vanilla CNN to categorize varieties of rice plants from an image of the seed. Previously defined models offer limited functionality and accuracy in predicting the type of rice seed, resulting in inconsistency. This paper provides an accurate model to classify and identify the type of rice seed that an end user is about to purchase, along with the benefits of that particular type of rice. The model is trained on a dataset containing five types of rice seed and is more consistent and accurate than previously developed models. This paper also reports the accuracy difference between the Enhanced Vanilla CNN and VGG16.

Keywords: Vanilla CNN, VGG16, Rice Seed, Accuracy, Deep Learning, Arborio, Basmati, Ipsala, Jasmine, Karacadag

1 Introduction
Different models and techniques have been introduced to detect and separate rice seeds of distinct varieties. Many have served their purpose, and a few are still used daily. However, these models and techniques often achieved lower accuracy and lower-quality results than recently developed models. When separating and identifying rice seeds, it is crucial to know whether the seed being sold is genuine and belongs to a rice plant family that is nutritious and healthy. The system described here uses the Enhanced Vanilla CNN and the VGG16 model to identify the rice plant family and to establish which of the two predicts it more accurately.

The proposed system works on a dataset containing images of various rice plant families planted and harvested in different parts of India. Convolutional Neural Networks are a network architecture used alongside deep learning algorithms; their suitability for image recognition and image processing makes them particularly reliable here. The model operates directly on the pixels of the provided images.

The Vanilla CNN can be viewed as an extension of linear regression, a supervised machine learning algorithm. The difference between the two is that in a vanilla neural network, a hidden layer carries out the additional computations. This extra layer is introduced between the inputs and the outputs. Here the hidden layer, denoted H, contains three neurons, H0, H1, and H2, although any number of neurons can be added to any number of hidden layers. A Vanilla Neural Network also relies on backpropagation to train the hidden layer.

Using a Vanilla Neural Network in this model shows that substantial accuracy can be reached in this context. Its architecture is robust: each layer's output is generated by applying a non-linear activation function to a weighted sum of the layer's inputs plus a bias. When ReLU is used as the activation function, the Vanilla Neural Network computes this weighted sum at every layer.
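As a minimal sketch (an illustration under assumed sizes, not the paper's exact network), the hidden layer H with neurons H0, H1, and H2 can be computed as a ReLU over a weighted sum plus a bias:

import numpy as np

# Hypothetical sizes: 4 input features, 3 hidden neurons (H0, H1, H2).
rng = np.random.default_rng(0)
x = rng.random(4)         # input vector
W = rng.random((3, 4))    # one weight row per hidden neuron
b = rng.random(3)         # one bias per hidden neuron

h = np.maximum(0.0, W @ x + b)  # ReLU applied to the weighted sum
print(h)                        # activations of H0, H1, H2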

This study also reports the accuracy attained by the VGG16 architecture. VGG16 is a Convolutional Neural Network (CNN) that achieved winning results in the ILSVRC (ImageNet) competition in 2014 and is regarded as one of the most prominent vision model architectures of its time. The key feature that sets VGG16 apart from other networks is its consistent use of 3x3 convolution filters with a stride of 1 and the same padding, combined with 2x2 max pooling layers with a stride of 2. The number 16 stands for the 16 weight layers that comprise VGG16. The network stacks convolution with ReLU and max pooling, follows with fully connected layers with ReLU, and ends with a softmax layer that completes the model.

VGG16 is used in this work to evaluate and compare the accuracy achieved by the Vanilla CNN model. This paper uses a dataset from Kaggle containing the rice seed types Arborio, Ipsala, Basmati, Jasmine, and Karacadag; with it, we determine the accuracies of the two models.

2 Literature Survey
Zhengjun Qiu et al. proposed a model for identifying rice seed varieties from hyperspectral imaging, classified by a convolutional neural network. The system uses a Support Vector Machine (SVM), K-Nearest Neighbours (KNN), and a Convolutional Neural Network (CNN). The variation between the preprocessed and unpreprocessed average spectra is not significant across the seed varieties. With SVM, the accuracies on the testing and training datasets are 86.9% and 84.0%, while the CNN model's training and testing accuracies are 89.6% and 87.0% [1].
Samson Damilola Fabiyi et al. defined a way of classifying rice seed varieties using the RGB colour scale together with hyperspectral imaging, on a dataset containing 90 species of rice seed. They propose a brand-new approach for inspecting rice seeds that combines hyperspectral and traditional RGB imaging. The model eliminates impure samples by combining spatial characteristics obtained from photos with higher spatial resolution and spectral features from hyperspectral data cubes. The average recall, F1 score, and precision over the full dataset are 79.64%, 78.80%, and 78.27% [2].
Ilkay Cinar et al. defined various methods for identifying rice varieties using machine learning algorithms: K-Nearest Neighbours (KNN), Random Forest, Support Vector Machine, Multilayer Perceptron, Logistic Regression, and Decision Tree. While other cultivars are classified more accurately, Arborio and Basmati have a lower accuracy rate. Of all the models, Random Forest produced the most promising accuracy and precision in identifying rice seed varieties, reaching 98.04% accuracy [3].
Ilkay Cinar et al. also classified rice varieties using artificial intelligence methods. The algorithms used are Logistic Regression, Multilayer Perceptron, SVM, Random Forest, Decision Tree, KNN, and Naïve Bayes. Success criteria such as efficiency, precision, selectivity, clarity, F1 measure, negative predictive value, false positive rate, false discovery rate, and false negative rate are calculated for two-class classification performance assessment. The Logistic Regression method achieved the highest accuracy among the models, at 93.02% [4].
Xu Ma et al. proposed a model using a fully convolutional network to segment rice seedling and weed images in paddy fields. The methods used in this model are FCN, U-Net, and SegNet. The percentage of pixels correctly assigned to a specific class is expressed as the "pixel accuracy" (PA). The model can extract characteristics directly from RGB photos and identify and classify the pixels. SegNet performed well and gave a higher accuracy rate than the other two models, reaching 93.6% [5].

Table 1: Existing System analysis

| S.No | Author | Algorithm | Merits | Demerits |
|------|--------|-----------|--------|----------|
| 1 | Zhengjun Qiu | SVM, KNN, CNN | As the training set increases, CNN outperforms the other two models | In the spectral information, some bands (e.g., the 1st and 2nd) are noisy and cannot be trusted |
| 2 | Paul Murray | RGB, Hyperspectral Imaging | Eliminates impure species from rice seed samples using high-spatial-resolution images | Decrease in performance due to the level of similarity |
| 3 | Ilkay Cinar | KNN, DT, LR, MLP, RF, SVM | The Random Forest algorithm performed better than the other algorithms | The models only work properly if the instances are correctly recognized |
| 4 | Ilkay Cinar | KNN, DT, LR, MLP, RF, SVM, NB | The Logistic Regression method achieved more accuracy than the other algorithms | Only some models can determine the importance of the variables in new predictions |
| 5 | Xu Ma | FCN, U-Net, SegNet | Extracts features directly from RGB images and classifies and recognizes the pixels | The picture patch impacts and restricts the algorithm's performance |

3 Proposed Methodology
3.1 Dataset Description

This model uses the Rice Seed image dataset, publicly available on Kaggle. The dataset consists of five types of rice seed, namely Arborio, Ipsala, Basmati, Karacadag, and Jasmine. These five types are split into training and testing datasets to train the defined models and measure their accuracy. The rice seeds are divided into five classes and passed on for preprocessing.

Fig. 1. Types of Rice seed in the dataset (a)Arborio (b)Basmati (c)Ipsala (d)Jasmine (e)Karacadag

Fig. 1 shows the types of rice seed used in this model. The dataset contains 75,000 images in total, 15,000 per rice seed type, as listed in Table 2.

Table 2: Dataset description

Rice Seed No. of Images


Arborio 15000
Basmati 15000
Ipsala 15000
Jasmine 15000
Karacadag 15000
3.2 Preprocessing

In machine learning, data augmentation and preprocessing play a crucial role in transforming the data into a format the model can quickly process and learn from. Preprocessing in this model is done using Keras's ImageDataGenerator class.

The ImageDataGenerator class performs translations, rotations, shearing, changes in scale, image flipping, and zooming on the image dataset. Figure 2 illustrates this preprocessing. The ImageDataGenerator class completes the following steps (a configuration sketch is given after Fig. 2):

1) It takes a batch of images that the model uses for training.


2) It applies random transformations to each image contained in the batch.
3) The newly altered batch then replaces the original batch of photos.
4) The deep-learning model is then trained on this transformed batch.

Fig. 2. The preprocessing of ImageDataGenerator Class
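A minimal configuration sketch of these steps, assuming illustrative augmentation parameters and a hypothetical dataset directory (the paper does not list the exact values):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rescale=1.0 / 255,       # scale pixel values to [0, 1]
    rotation_range=20,       # random rotations
    width_shift_range=0.1,   # random translations
    height_shift_range=0.1,
    shear_range=0.1,         # random shearing
    zoom_range=0.1,          # random changes in scale
    horizontal_flip=True,    # random image flipping
    validation_split=0.2,    # assumed train/validation split
)

train_batches = datagen.flow_from_directory(
    'rice_image_dataset',    # hypothetical path: one folder per class
    target_size=(224, 224),
    class_mode='categorical',
    subset='training',
)
val_batches = datagen.flow_from_directory(
    'rice_image_dataset',
    target_size=(224, 224),
    class_mode='categorical',
    subset='validation',
)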

3.3 Classification

Dividing a dataset into classes is known as classification in machine learning. It can be performed on both structured and unstructured data. Predicting the category of a set of given data points is the first step in the classification process. The created classes are frequently called targets, labels, or categories. The Enhanced Vanilla Convolutional Neural Network and the VGG16 architecture are the classification algorithms employed in this model.

Vanilla CNN: A Convolutional Neural Network consists of three kinds of layers. Figure 3 depicts the CNN model.

1) Pooling Layer
2) Convolution Layer
3) Fully Connected Layer
Fig. 3. CNN Layers

Pooling Layer: The pooling layer replaces the network's output at specific locations with a statistical summary of the adjacent outputs. Because of this, the spatial size of the representation is reduced, which means fewer computations and weights are required. The pooling operation is applied to each slice of the representation separately [11].
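As a toy illustration (values chosen for this example, not from the paper), 2x2 max pooling with stride 2 summarizes each 2x2 block of a 4x4 feature map by its maximum, halving the spatial size:

import numpy as np

feature_map = np.array([[1, 3, 2, 0],
                        [4, 6, 1, 2],
                        [7, 2, 9, 5],
                        [3, 1, 4, 8]])

# Group the map into 2x2 blocks and take each block's maximum.
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6 2]
               #  [7 9]]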

Convolution Layer: The convolution layer is the essential element of CNN design and carries the vast bulk of the network's computational load. The dot product of two matrices is evaluated in this layer: the first matrix is the kernel, a set of learnable parameters, and the second is the confined receptive field [12]. The kernel is spatially smaller than the image but extends through its full depth. As a result, if a picture contains three (RGB) channels, the kernel's height and width are small, but its depth spans all three channels.
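A toy illustration of a single convolution step (the numbers are purely illustrative): the elementwise product of the kernel with its current receptive field, summed to one output value:

import numpy as np

receptive_field = np.array([[1, 0, 2],
                            [3, 1, 0],
                            [0, 2, 1]])
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])   # a simple vertical-edge kernel

# The "dot product" of the two matrices: elementwise multiply, then sum.
output = np.sum(receptive_field * kernel)
print(output)  # 1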

Fully Connected Layer: As in a traditional fully connected neural network, each neuron in this layer is connected to all neurons in the layers before and after it. Consequently, its output can be calculated by a matrix multiplication followed by a bias offset. The FC layer maps the representation between the input and the output.

There are various types of non-linear operations. The most popular are:

Sigmoid: The sigmoid nonlinearity has the mathematical form σ(κ) = 1/(1 + e^(−κ)). It "squashes" a real-valued number into the range (0, 1). A particularly undesirable sigmoid characteristic occurs when the activation sits at either tail: the gradient there practically disappears, so a local gradient that shrinks to an exceedingly low value during backpropagation effectively "dies". Furthermore, if the data entering a neuron is always positive, the gradient updates for its weights will be either all positive or all negative, producing zigzag dynamics; since the sigmoid's outputs are always positive, this affects the layers it feeds.
Tanh: Tanh squashes a real-valued number into the interval [−1, 1]. Its activation saturates like the sigmoid's, but its output is zero-centred, unlike the sigmoid's.
ReLU: The Rectified Linear Unit (ReLU) has received much attention recently. It computes the function f(κ) = max(0, κ); in other words, the activation is simply thresholded at zero. ReLU reaches convergence about six times faster than sigmoid and tanh and is also more dependable. However, ReLU can be fragile during training: a strong gradient flowing through a unit can update its weights in a way that prevents the neuron from ever activating again. We can mitigate this by selecting a suitable learning rate.
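The three activations can be sketched directly (a minimal illustration of the formulas above):

import numpy as np

def sigmoid(k):
    return 1.0 / (1.0 + np.exp(-k))  # squashes into (0, 1)

def relu(k):
    return np.maximum(0.0, k)        # thresholds at zero

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))   # ~[0.119 0.5 0.881]
print(np.tanh(x))   # ~[-0.964 0. 0.964], zero-centred
print(relu(x))      # [0. 0. 2.]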

VGG16: Convolutional neural networks, a kind of artificial neural network, are also known as ConvNets. A convolutional neural network consists of an input layer, an output layer, and several hidden layers. VGG16, a CNN subtype, is one of the most influential computer vision models. Fig. 4 describes how the data is processed in the VGG16 model. The model's creators used an architecture with exceedingly small (3 x 3) convolution filters to analyze the networks while increasing the depth, significantly improving over previous approaches. Increasing the depth to 16–19 weight layers yields roughly 138 million trainable parameters.
Fig. 4. VGG16 Architecture

The number 16 stands for the 16 weight layers of VGG16. VGG16 includes 21 layers in total: 13 convolutional layers, five max pooling layers, and three dense layers; however, only the 16 convolutional and dense layers carry learnable parameters. VGG16 has an input tensor size of 224 x 224 with three RGB channels. The design keeps the convolution and max pool layers in a consistent sequence: the Conv-1 block has 64 filters, Conv-2 has 128 filters, Conv-3 has 256 filters, and Conv-4 and Conv-5 have 512 filters each. Three fully connected (FC) layers follow this stack of convolutional layers; the first two have 4096 channels each, and the third performs the 1000-way ILSVRC classification.
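These layer counts can be checked directly: instantiating VGG16 in Keras without pre-trained weights and printing its summary lists the 13 convolutional, five pooling, and three dense layers, along with the roughly 138 million parameters (a quick verification sketch, not part of the proposed pipeline):

import tensorflow as tf

model = tf.keras.applications.VGG16(weights=None)  # full 1000-class ILSVRC head
model.summary()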

4 Results & Discussions


The results of the proposed model are described below.

Fig. 5. Data Preprocessing

Fig. 5 shows the model counting and summarizing the provided dataset: the number of pictures and the classes into which they are divided, represented as a bar chart. This preprocessing step helps the user visualize the data the model works on and supports accurate generation of results.

Fig. 6. Epoch vs accuracy graph for model accuracy and loss using Vanilla CNN

Fig. 6 plots the model's accuracy and loss at different epochs. The model's accuracy kept improving as the iterations proceeded on the training data but decreased on the validation data. The model's loss on the training data kept descending over the iterations, while the loss on the validation data kept rising.
On further analysis of the Vanilla CNN model, the F1 score, accuracy, precision, recall, and support values are generated for each class of the dataset, as shown in fig. 7. Macro averages and weighted averages are also computed over these values.

Fig. 7. Output values for Vanilla CNN

The output values vary from run to run and depend on the model trained on the dataset. The next step analyses and produces a summary of the VGG16 model on the dataset, as described in fig. 8. The model summary helps in understanding the model's structure: it includes detailed information about the layers used in the model and reports how many parameters are trainable, how many are non-trainable, and the total parameter count.

Fig. 8. Model Summary to evaluate using VGG16

After classification, validation, and data analysis, a graph of accuracy versus epochs is plotted, as shown in fig. 9. In this model, the accuracy kept improving and the loss kept decreasing as the number of epochs increased.
Fig. 9. Epoch vs Accuracy graph for Model accuracy and loss using VGG16

The precision, recall, F1 score, and support values are generated and elaborated in fig. 10, which gives a detailed overview of these predicted values.

Fig. 10. Output values using VGG16

Fig. 11 represents the final output of the model predicting the type of the seed; it shows both the ground-truth label and the predicted name of the rice seed, which indicates whether the model predicts accurately.
Fig. 11. Prediction output for both VGG16 and Vanilla CNN model

5 Conclusion

The model proposed in this paper predicts the accuracies of two different models, Vanilla CNN and VGG16, in order to compare their predictive performance. This accuracy comparison helps define which model produces better results when trained and tested on an image dataset. When the models were trained and their accuracies measured, the VGG16 model performed well and predicted the outputs better than the Vanilla CNN model. Because VGG16 passes the data through many layers of pooling and convolution, it is the more accurate model; hence, VGG16 outperforms the Vanilla Convolutional Neural Network. Furthermore, these models can be compared on various datasets and against newer, more enhanced models in future work.

References

1. Qiu, Z., Chen, J., Zhao, Y., Zhu, S., He, Y., & Zhang, C. (2018). Variety identification of single rice seed using
hyperspectral imaging combined with the convolutional neural network. Applied Sciences, 8(2), 212.
2. S. D. Fabiyi et al., "Varietal Classification of Rice Seeds Using RGB and Hyperspectral Images," in IEEE Access,
vol. 8, pp. 22493-22505, 2020, doi: 10.1109/ACCESS.2020.2969847.
3. Cinar, Ilkay, and Murat Koklu. "Identification of Rice Varieties Using Machine Learning Algorithms." Journal of
Agricultural Sciences (2022): 9-9.
4. Cinar, Ilkay, and Murat Koklu. "Classification of rice varieties using artificial intelligence methods." International
Journal of Intelligent Systems and Applications in Engineering 7.3 (2019): 188-194.
5. Ma, X., Deng, X., Qi, L., Jiang, Y., Li, H., Wang, Y., & Xing, X. (2019). Fully convolutional network for rice
seedling and weed image segmentation at the seedling stage in paddy fields. PloS one, 14(4), e0215676.
6. Durai, S., and C. Mahesh. "Research on varietal classification and germination evaluation system for rice seed
using handheld devices." Acta Agriculturae Scandinavica, Section B—Soil & Plant Science 71.9 (2021): 939-955.
7. Liu, Zy., Cheng, F., Ying, Yb. et al. Identification of rice seed varieties using neural network. J Zhejiang Univ Sci B 6, 1095–1100 (2005). https://doi.org/10.1631/jzus.2005.B1095
8. P. T. Thu Hong, T. T. Thanh Hai, L. T. Lan, V. T. Hoang, V. Hai and T. T. Nguyen, "Comparative Study on
Vision Based Rice Seed Varieties Identification," 2015 Seventh International Conference on Knowledge and
Systems Engineering (KSE), 2015, pp. 377-382, doi: 10.1109/KSE.2015.46.
9. Jin, B., Zhang, C., Jia, L., Tang, Q., Gao, L., Zhao, G., & Qi, H. (2022). Identification of rice seed varieties based
on near-infrared hyperspectral imaging technology combined with deep learning. ACS omega, 7(6), 4735-4749.
10. Sudeepthi Govathoti, A Mallikarjuna Reddy, Deepthi Kamidi and G Balakrishna, “Data Augmentation Techniques
on Chilly Plants to Classify Healthy and Bacterial Blight Disease Leaves” International Journal of Advanced
Computer Science and Applications(IJACSA), 13(6), 2022. http://dx.doi.org/10.14569/IJACSA.2022.0130618
11. Sri Silpa Padmanabhuni and Pradeepini Gera, “Synthetic Data Augmentation of Tomato Plant Leaf using Meta
Intelligent Generative Adversarial Network: Milgan” International Journal of Advanced Computer Science and
Applications(IJACSA), 13(6), 2022. http://dx.doi.org/10.14569/IJACSA.2022.0130628

APPENDIX

Pseudo Code for VGG16

import tensorflow as tf
from tensorflow.keras.applications import VGG16

input_shape = (224, 224, 3)  # assumed RGB input size, matching Section 3.3

# Load VGG16 pre-trained on ImageNet, without its 1000-way classification head.
vgg16 = VGG16(weights="imagenet", include_top=False, input_shape=input_shape)
vgg16.trainable = False  # freeze the convolutional base for transfer learning

inputs = tf.keras.Input(input_shape)
x = vgg16(inputs, training=False)                # run the frozen base in inference mode
x = tf.keras.layers.GlobalAveragePooling2D()(x)  # collapse feature maps to a vector
x = tf.keras.layers.Dense(1024, activation='relu')(x)
x = tf.keras.layers.Dense(5, activation='softmax')(x)  # 5 rice seed classes
model_vgg16 = tf.keras.Model(inputs, x)
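Either model can then be compiled and trained in the usual Keras way. A minimal sketch, assuming the Adam optimizer, categorical cross-entropy loss, ten epochs, and the generator batches from the Section 3.2 sketch (the paper does not state these choices):

model_vgg16.compile(optimizer='adam',
                    loss='categorical_crossentropy',
                    metrics=['accuracy'])
model_vgg16.fit(train_batches,                # generators from the Section 3.2 sketch
                validation_data=val_batches,
                epochs=10)                    # epoch count is an assumption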
Pseudo Code for Vanilla CNN

import tensorflow as tf

input_shape = (224, 224, 3)  # assumed RGB input size, matching the VGG16 model

model_vanilla = tf.keras.Sequential([
    # Block 1: two 3x3 convolutions with 32 filters
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.BatchNormalization(axis=3),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='same'),
    tf.keras.layers.Dropout(0.3),

    # Block 2: two 3x3 convolutions with 64 filters
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.BatchNormalization(axis=3),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='same'),
    tf.keras.layers.Dropout(0.3),

    # Block 3: two 3x3 convolutions with 128 filters
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.BatchNormalization(axis=3),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='same'),
    tf.keras.layers.Dropout(0.5),

    # Classification head
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Dense(5, activation='softmax'),  # 5 rice seed classes
])
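The same compile and fit calls sketched after the VGG16 listing train model_vanilla unchanged; only the model object differs.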
