You are on page 1of 5

International Journal of Computer Science and Information Security (IJCSIS),

Vol. 15, No. 8, Augus 2017

Principle Component Analysis for Classification of the Quality of


Aromatic Rice
Etika Kartikadarma 1* , Sari Wijayanti2 , Sari Ayu Wulandari3 , Fauzi Adi Rafrastara 1
1 Faculty of ComputerScience, Universitas Dian Nuswantoro, 50131, Indonesia
2 Information Engineering Department, STMIK Jenderal A Yani, Yogyakarta, 552851, Indonesia
3 Faculty of Engineering, Universitas Dian Nuswantoro, 50131, Indonesia
*
email: etika.kartikadarma@dsn.dinus.ac.id

Abstract—This research introduces an instrument for performing


quality control on aromatic rice by utilizing feature extraction of
Principle Component Analysis (PCA) method. Our proposed system
(DNose v0.2) uses the principle of electronic nose or enose. Enose is
a detector instrument that work based on classification of the smell,
like function of human nose. It has to be trained first for recognizing
the smell before work in classification process.
The aim of this research is to build an enose system for quality
control instrument, especially on aromatic rice. The advantage of this
system is easy to operate and not damaging the object of research. In
this experiment, ATM ega 328 and 6 gas sensors are involved in the
electronic module and PCA method is used for classification process.

Keywords— Enose, Principal Component Analysis, Aromatic


Rice, Quality Control Instrument
FIGURE 1. DNOSE V0.2
I. INT RODUCT ION
One of the natural wealth of Indonesia is the diversity in A. Array Sensor System
tropical plant, and aromatic rice is a part of it, especially fro m This enose system consists of 6 gas sensors: TGS825,
group of rice. Aromatic rice is different to the common one in TGS826, TGS822, TGS813, TGS2620 and TGS2611. Those
term of smell and quality [1]. It has a musty smell because of sensors are metal oxide sensor and very useful for olfactory
sugar fermentation [2]. The experiment for such kind of rice system. Figure 2 describes the characteristic respond of sensor
actually can be conducted in laboratories, but it takes a long array when tested with aroma. Initially the aroma was sprayed
time and costly. into chamber sampler, then absorbed and thrown away slowly.
Characteristic Array Sensor
A new hope as an alternative way to test the aromatic rice 20000
quickly and accurately has landed and it is available in the body TGS825
of electronic nose (hereinafter referred as enose) [3]. The 15000 TGS826
TGS813
application of enose is so wide nowadays, including medical TGS822
field [4]. Enose even can bring the benefit on the daily activity, 10000 TGS2620
Voltage

like to monitor the ripeness level of tomatoes based on its aroma TGS2611

[5]. By looking at the potentiality, enose is proper to be 5000


developed more and more and to be implemented, especially in
Indonesia. 0

II. SYST EM DESCRIPT ION -5000


0 2 4 6 8 10 12 14 16 18 20
Dinus Nose (DNose) v0.2 is a new version of enose that Frequency

developed in UDINUS (Universitas Dian Nuswantoro, FIGURE 2. CHARACTERISTIC OF SENSOR ARRAY


Indonesia), with some improvements on pattern recognition
engine. Our previous system, DNose v0.1 [6], could B. Clustering System
successfully recognize 80% the smell of tofu. PCA is a method to extract and reduce initial pattern vector
This DNose v0.2 is composed of sensor array system, numbers into fewer pattern vectors, called principle component.
electronic data acquisition system, and clustering system using In MATLAB, PCA has some functions that can be used to
PCA method. In this research, DNose v0.2 is developed with improve our system performance. The steps of PCA are as
the aim to determine and classify the rice quality on food follows:
industry.  Define the template vector

315 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 15, No. 8, Augus 2017

 A11 A12 A13 A1n  The most common normalization methods for enose research
A A22 A23 A2 n 
are amplitude and frequency normalization. The result of
 21  amplitude normalization is linier amplitude where the highest
A   .... .... .... ....  .............................(1) value of normalized amplitude is 1. Amplitude normalizatio n
  method keeps the form of the signal and it only moves on
 .... .... .... ....  positive axis of Cartesian coordinate y.
 Am1 Am 2 Am 3 Amn  One of the popular amplitude normalization methods is
Power Average. This method is begun by finding the highest
 Find the reduction vector and variance matrix. The value
score of pattern vector. It then performs division between
of variance matrix represents eigen value.
pattern vector and the maximu m score of pattern vector. After
 var  A1  cov  A1 , A2  cov  A1 , A3  cov  A1 , A4  completing this step, then we find the average of normalized
cov  A , A  var  A2  cov  A2 , A3  cov  A2 , A4  pattern vector for each class. Finally, the obtained average
 
2 1

 cov  A3 , A1  cov  A3 , A2  var  A3  cov  A3 , A4  score is taken as template pattern that going to be compared
 
cov  A4 , A1  cov  A4 , A2  cov  A4 , A3  var  A4   with experiment data.
On frequency normalization, the result is in the form of
(2)
polarized amplitude, and it is moving on the positive and
 The result of theorem 2 will be used to find the
negative axis from Cartesian coordinate y. Frequency
covariance matrix. The value of covariance matrix normalization method that commonly used is fft (fast fourier
represents the eigen value.
transform). This method is begun by finding fft score of each
 To create Matrix L we can check the diagonal value of
pattern vector. Next, the normalized pattern vector is calculated
theorem 2 (Matrix A). That diagonal matrix represents to find the average score for each cluster. Finally, this average
the eigen value.
score is compared with experiment data, as a pattern template.
1 0 0 0 0 
0  0 0 0 
 2  ..........…………...(3) (15)
L  0 0 .. 00 III. EXPERIMENT
  The aim of this research is to propose a DNose v0.2 based
0 0 0 ... 0 
 0 on PCA method for recognizing the smell of aromatic rice. The
0 0 0  n 
steps are as follows:
 Where 1   2  ......   n .
1. Measuring odorant, weight: 1 ons.
u 1 , u 2 ,......... .u n 2. Collecting censor data for 1 minute in a room with 270 C
 Find the set of eigen vector { } fro m
temperature, and volume of odorant chamber is 1573
vi cm3 .
matrix S where is orthonormal eigen vector and
 3. Perform sampling odorant data to find the Analog
consistent to the eigen value i . Digital Converting (ADC) value from Microcontroller ,
 Principal Component is obtained from the equation fs = 3Hz.
below [7]. 4. Performing data normalization by using 2 methods:
PC i  u i .A ' power average and fft (fast fourier transform) method.
……………………………...(4)
5. Applying PCA clustering with the results of those 2
normalization method.
C. Normalization 6. Repeating the experiment up to 3 times for each method.
Gas sensing system which is also known as system for aroma
detection and identification, become an important research for IV. RESULT A ND DISCUSSION
examining the quality of food material. An enose with VOC
(Volatile Organic Compound) system can works very well in a A. Experiment of Power Average Normalization
special place with certain air temperature, signal booster, and Power average method is done by processing voltage data
gas sensor. Figaro gas sensor is also using VOC system. that yielded by thos e six censors, which initially have 60
However, there is a main problem on VOC system that it cannot samples then reduced to 20 samples, with sampling frequency
identified well on the patterns which almost alike. It has low fs = 3 Hz. The average of FFT normalization is listed in Table
sensitivity on sensing system [7]. On this case, the success rate 1, whereas the graph for normalization result of censor 1 to 6 is
of PCA with VOC is considered low since there is no algorith m illustrated in Figure 3.
to perform the learning process from new aroma.
T ABEL I
Actually, the process of pattern normalization can affect to
the loss of information on concentration level, but somehow, RESULT OF POWER AVERAGE NORMALIZATION
some algorithms like NN, GA, and Cluster Analysis are using Quality Censor 1 Censor2 Censor3 Censor4 Censor5 Censor6
the normalized response. KW1 2.14E+08 3.89E+08 2.05E+08 2.03E+08 1.73E+08 2.23E+08
KW2 56845310 1.7E+08 1.95E+08 1.39E+08 1.07E+08 2.68E+15
KW3 84248108 3.07E+08 93200111 76526991 69535669 3.27E+08

316 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 15, No. 8, Augus 2017

Here, we can use FFT to reduce the dimension so that we can


KW1
1 visualize the data.
4
0.5
26
3
0 29
1 2 3 4 5 6 27 32
30
KW2 2 23

2nd Principal Component


21
1 24
25 38 2236 33
1 40
37 28
1716 42
0.5 13 39 34
35
14
12
11
0
10
2018 60 5147
4 43
0 195 50 46 59
48 58
1 2 3 4 5 6 3 56
-1 98
6 49 52 55
7 57
KW3 1 2 54
45
1 -2 44
53

0.5 -3
-3 -2 -1 0 1 2 3 4
1st Principal Component
0
1 2 3 4 5 6
FIGURE 5. THE FIRST AND SECOND PRINCIPAL COMPONENTS WITH POWER
AVERAGE NORMALIZATION METHOD
FIGURE 3. THERESULT OF POWER AVERAGE NORMALIZATION
4
The gathered result of power average normalization is KWALITAS 3 26
clustered by using PCA, as illustrated in Figure 4, 5, 6, and 7. 3
29
27 32
30
2 23

2nd Principal Component


21
TGS2020 31
24
KWALITAS 1 25 38 2236 33
1 40
37 28
1716 42 35
TGS824 14
12 39 34
11
0
20 10 60 5147 KWALITAS 2
4 41
43
195 50 46 59
48 58
3 56
TGS2620 -1 6 87
9 49 52 57
55
1 2 54
45
-2 44
TGS826
53

-3
TGS825 -3 -2 -1 0 1 2 3 4
1st Principal Component

TGS822
FIGURE 6. THE FIRST AND SECOND PRINCIPAL COMPONENTS CLUSTER
1000 1500 2000 2500 3000 3500 4000 4500 DESCRIP TION WITH POWER AVERAGE NORMALIZATION
Values

FIGURE 4. BLOCK PLOT OF POWER AVERAGE NORMALIZATION ON PCA


90 90%

80 80%

70 70%
T ABEL 2
Variance Explained (%)

DEVIATION STANDARD ON PCA POWER AVERAGE NORMALIZATION 60 60%

50 50%
Censor 1 Censor 2 Censor 3 Censor 4 Censor 5 Censor 6
869.4426 710.0997 675.6139 646.1813 580.0782 626.4368 40 40%

30 30%

Table 2 shows that deviation standard of PCA with power 20 20%

average normalization is still high, with the highest deviation is 10 10%

on censor 1 (869.4426) and the lowest deviation is on censor 5 0


1 2 3 4 5
0%

(580.0782). Those data can be a reference to extract some Principal Component

censors that have high deviation standard as it can obstruct the FIGURE 7. PARETO DIAGRAM OF EACH PRINCIPAL COMPONENT WITH POWER
pattern recognition process. AVERAGE NORMALIZATION
The clustering result of power average normalization shows
that the clusters are still distributed and the distance of each
cluster is still close, as illustrated in Figure 6. B. Experiment of FFT Normalization
The pareto chart illustrates the plots of variable percentage
Power average method is applied by processing the voltage
which explained on each Principle Component (PC). Figure 6
data that obtained from those six censors. Initially it has 60
shows the clear differences between first and second
samples and then reduced to 20 samples, with sampling
component. However, the variance differences between those
frequency fs = 3 Hz. The average of FFT normalization is listed
two components is still less than 50%, so it is required more
in Table 3, whereas the graph for normalization result of censor
components to be involved. According to Figure 6, the first two
1 to 6 is illustrated in Figure 7.
components represent 2/3 of total variabilities on the template.

317 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 15, No. 8, Augus 2017

T ABEL 3
4
1
T HE RESULT OF FFT NORMALIZATION
2 41
Quality Censor 1 Censor2 Censor3 Censor4 Censor5 Censor6

2nd Principal Component


0 24
4
51
24

KW1 4.2E+09 7.77E+09 4.02E+09 3.97E+09 3.4E+09 4.36E+09


-2
KW2 1.02E+09 3.29E+09 3.82E+09 2.76E+09 2.05E+09 1.91E+19

KW3 1.54E+09 6.13E+09 1.79E+09 1.44E+09 1.33E+09 6.41E+09 -4

-20 -6 21
x 10 KW1
5
-8
-14 -12 -10 -8 -6 -4 -2 0 2
0 1st Principal Component

-5
1 2 3 4 5 6
FIGURE 10. THE FIRST AND SECOND PRINCIPAL COMPONENTS
-29
KW2
4
x 10 The clustering result of power average normalization shows
that the clusters have been well concentrated, and the distance
2
among each cluster is far enough. There is a new cluster created
0
by merging between cluster 2 and 3. Table 5 shows the list of
1 2 3 4 5 6 clusters involved.
-18
x 10 KW3
1 T ABLE5
P ATTERN VECTOR DISTRIBUTION OF PCA WITH FFT NORMALIZATION
0
Quality Rows
-1 KW1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
1 2 3 4 5 6
KW2 21,22,23,25,26,27,28,29,31,33,34,35,36,37,39,40
KW3 41,43,44,45,46,48,49,50,51,52,53,54,56,57,58,59
New Cluster 24,30,32,38,42,47,55,60
FIGURE 8. RESULT OF FFT NORMALIZATION

On this step, the result of FFT normalization is clustered by According to Table 5, Q1 (Quality 1) has distribution 0%,
using PCA, as shown in figure 8, 9, and 10. whereas the distribution of both Q2 and Q3 are 20%.

TGS2020
90 90%

80 80%
TGS824
70 70%
Variance Explained (%)

TGS2620 60 60%

50 50%

TGS826
40 40%

30 30%
TGS825
20 20%

TGS822 10 10%

0 0%
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 2
Values Principal Component

FIGURE 9. BLOCK PLOT OF FFT NORMALIZATION ON PCA FIGURE 11. PARETO DIAGRAM OF EACH PRINCIPAL COMPONENT

By applying the result of PCA into pattern recognition


T ABLE4 algorithm, the smaller distribution score on PCA system means
DEVIATION STANDARD ON PCA WITH FFT NORMALIZATION the pattern can be recognized easier.
Censor 1 Censor 2 Censor 3 Censor 4 Censor5 Censor6
The above pareto chart illustrates the plots of variable
0 0 0 0 0 0.1291 percentage which explained on each Principle Component (PC).
Figure 10 shows the big differences between first and second
Table 4 explains the deviation standard of PCA (FFT component. However, the variance differences between those
normalization) which has slight better score, with the highest two components is still less than 84%, so it is required more
deviation is only on censor 6 (0.1291) and the lowest deviation components to be involved. According to Figure 6, the first two
is on censor 1 to 5 (0). Those data can be a reference to remove components represent 2/3 of total variabilities on the template.
the censor that has high deviation standard. This is because a Here, we can use FFT to reduce the dimension so that we can
censor with high deviation standard can obstruct the pattern visualize the data.
recognition process. So, in this case, we remove censor 6.

318 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 15, No. 8, Augus 2017

V. CONCLUSION 2. Http://kimia.upi.edu. (March 2012). Bahan Ajar Kuliah.


3. N. E. Barbri, et al., "Application of Portable Electronic Nose Sistem to
Based on the experiment and gathered result, we conclude Assess T he Freshness of Moroccan Sardines," Materials Science and
that FFT normalization method is better than power average Engineering vol. C28, p. 656, 2008.
one. It is proved by the score of variability explained on FFT is 4. A. D'Amico, et al., "Olfactory Sistems for Medical Applications," Sensors
and Actuators vol. B, 130, p. 458, 2008.
higher than the score of variability explained on power average 5. A. H. Gomez, et al., "Monitoring Storage Shelf Life of T omato Using
normalization method. However, by using FFT, the result of Electronic Nose Tecnique," Jurnal of Food Engineering vol. 85, p. 625,
clustering process is still less than 0.1%. Therefore, the further 2008.
research is needed to find the better normalization method that 6. Wulandari, S. A., Hidayat, R., Firmansyah, E., Teknik, J., Fakultas, E.,
Universitas, T., … Indonesia, Y. (n.d.). PENGELOMPOKAN AROMA
can produce significant improvement of the percentage of T AHU BERFORMALIN BERBASIS MET ODE PCA (PRINCIPLE
variability explained. COMPONENT ANALYSIS).
7. Smith, L. I. (2002). A tutorial on Principal Components Analysis
A CKNOWLEDGEMENT Introduction.
8. R. Polikar, et al., "DET ECTION AND IDENT IFICATION OF
We would like to express our gratitude to Dr. Eng. Kuwat ODORANT S USING AN ELECT RONIC NOSE," IEEE, vol. 0-7803-
Triana, M.Si, he has been our mentor for this research: eNose 7041 -4/01 02001, 2011.
for detection of quality of aromatic rice.

REFERENCES
1. Http://www.paskomnas.com. (March 2012). Isu-Strategis-Ketahanan-
Pangan.

319 https://sites.google.com/site/ijcsis/
ISSN 1947-5500

You might also like