Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 179 (2021) 713–720

www.elsevier.com/locate/procedia

5th International Conference on Computer Science and Computational Intelligence 2020

Implementation of Using HMM-GA In Time Series Data


Agung Yuniarta Sosiawan, Rani Nooraeni, Liza Kurnia Sari*

STIS Polytechnic of Statistics, Jl. Otto Iskandardinata No. 64C, East Jakarta 13330, Indonesia

Abstract

Some time series modeling methods have a weakness: static and dynamic information cannot be combined consistently. The Hidden Markov Model provides a solution to this problem. The Hidden Markov Model (HMM) is an extension of the Markov chain in which the state cannot be observed directly (it is hidden) but can only be observed through another set of observations. One of the problems in HMM is maximizing $P(O \mid \lambda)$, where O is an observation sequence and λ is a model whose parameters consist of the transition matrix, the emission matrix, and the initial probability vector; this problem can be solved by the Baum-Welch algorithm. In practice, the Baum-Welch algorithm produces a model that is not optimal because the algorithm depends heavily on the choice of initial parameters. To solve this problem, HMM is combined with a genetic algorithm (Hybrid GA-HMM). In general, based on the AIC and BIC values, the Hybrid GA-HMM is more optimal than the plain HMM.
© 2021 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the scientific committee of the 5th International Conference on Computer Science and Computational Intelligence 2020
Keywords: Hidden Markov Model; genetic algorithm; time series data

1. Introduction

Nowadays, the need to process large amounts of data is very important. One approach that can be used is data mining. Data mining is a process that uses statistical, mathematical, artificial intelligence, and machine learning techniques to extract and identify useful information and related knowledge from large databases. Data mining is a series of processes for creating additional value, in the form of previously unrecognized knowledge, from a collection of data.

* Corresponding author. Tel.: +62-813-10734734; fax: +62-813-10734734.


E-mail address: lizakurnia@stis.ac.id

10.1016/j.procs.2021.01.060

Time series data are often analyzed using data mining. Such data are obtained and arranged in order of time, or collected from time to time. Time series analysis is a method used to determine patterns in data that have been collected regularly. Some time series modeling methods have a weakness: static and dynamic information cannot be combined consistently1. The Hidden Markov Model provides a solution to this problem. The Hidden Markov Model (HMM) is an extension of the Markov chain in which the state cannot be observed directly (it is hidden); it can only be observed through another set of observations2.
One of the problems with HMM is called the learning problem: maximizing $P(O \mid \lambda)$, where O is an observation sequence and λ is a model consisting of the parameters $(A, B, \pi)$. This problem can be addressed by the Baum-Welch algorithm. In practice, the Baum-Welch algorithm produces a model that is not optimal because the algorithm depends heavily on the choice of the initial parameters $(A, B, \pi)$. To solve this problem, we combine HMM with a genetic algorithm (Hybrid GA-HMM). Compared with other optimization methods, the genetic algorithm has the advantage of being flexible to hybridize with other algorithms, which makes it suitable to combine with the Baum-Welch algorithm3. The genetic algorithm has proven its ability to improve the performance of statistical methods, for example in the studies of Nooraeni (2015)4 and Xiao, Zou, and Li (2007)5. Many studies use GA-HMM5, 6, 7, but they use roulette wheel selection in the selection step, which has the weakness of producing premature convergence8; this study therefore uses ranking selection.
GDP, the total value of goods and services produced within a country, can illustrate the economic condition of that country. In Indonesia, GDP can be used by the government in policy making. In this study, we use Indonesia's GDP growth in 1983–2018 as a case study, and data generated from the R program as training data. The data are divided into 2, 3, 4, and 5 states.
Based on these problems, this study aims to optimize the Hidden Markov Model with a genetic algorithm and to build the best model of Indonesia's GDP growth in 1983–2018 and of the training data.

2. Methodology

2.1. Data collection method

The data used in the case study are Indonesia's GDP in 1983–2018 at constant market prices, obtained from a BPS publication. The training data are random data generated in R. This study uses a multinomial distribution with 100 observations, generated from a function that labels an increase as "N" and a decrease as "T" in normally distributed data. The normally distributed data are generated by the "rnorm" function.
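As an illustration, this generation step might look like the following R sketch. The exact labeling function is not given in the paper, so the rule used here (a non-negative first difference counted as an increase) is an assumption:

# Sketch of the training-data generation (the labeling rule is an assumption;
# the paper only states that "N" marks an increase and "T" a decrease).
set.seed(5)                                     # the seed the paper uses elsewhere
x   <- rnorm(100)                               # normally distributed base data
obs <- ifelse(diff(c(x[1], x)) >= 0, "N", "T")  # label rises "N", falls "T"
table(obs)                                      # counts of the two categories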

2.2. Analysis method

2.2.1. Hidden Markov Model


The Hidden Markov Model (HMM) is an extension of the Markov chain in which the state cannot be observed directly (it is hidden); it can only be observed through another set of observations2. The Markov chain is a stochastic model that describes a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. That is, the Markov chain assumes that predicting the next step of the sequence may use only the present state, not earlier states. An HMM consists of five elements, i.e. Q, A, O, B, and π 9. $Q = q_1, q_2, q_3, \ldots, q_N$ is a collection of N states. A is an $n \times n$ transition probability matrix in which $a_{ij}$ represents the probability of a change from state i to state j. $O = o_1, o_2, o_3, \ldots, o_T$ expresses a sequence of T ordered observations. $B = b_i(o_t)$ is the set of emission probabilities, each expressing the probability that $o_t$ is generated from state i. $\pi = \pi_1, \pi_2, \pi_3, \ldots, \pi_N$ is the initial probability distribution over the states; $\pi_i$ is the probability that the Markov chain begins in state i. If some state j has $\pi_j = 0$, that state cannot be the initial state.
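For concreteness, the five elements can be held in R as follows. The numeric values are illustrative assumptions, not the paper's estimates; this lambda list is reused in the sketches later in this section:

# A minimal 2-state illustration of the HMM elements lambda = (A, B, pi).
states  <- c("s1", "s2")                      # Q: the hidden states
symbols <- c("N", "T")                        # the observable symbols
A  <- matrix(c(0.7, 0.3,                      # A[i, j] = P(state j | state i)
               0.4, 0.6), 2, 2, byrow = TRUE,
             dimnames = list(states, states))
B  <- matrix(c(0.8, 0.2,                      # B[i, m] = P(symbol m | state i)
               0.3, 0.7), 2, 2, byrow = TRUE,
             dimnames = list(states, symbols))
pi <- c(s1 = 0.5, s2 = 0.5)                   # initial state distribution
lambda <- list(A = A, B = B, pi = pi)         # the model lambda = (A, B, pi)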
These are the HMM stages:
1. Determination of the number of states. This study uses 2, 3, 4, and 5 states. States are the classes that each observation will enter.
2. Modeling. The modeling stage finds the initial values of the parameters, i.e. π and the elements of the transition matrix A, which are obtained randomly using the "runif" function from the stats package in R. Then each π vector and each row of the matrix A is totaled, and each probability is divided by its row total so that every row sums to 1. The initial model obtained here is used as the initial model for both the case study and the training data. To keep the probabilities reproducible, we use the function set.seed() in R with parameter 5. This initial model will also be the first individual in the Hybrid GA-HMM so that the changes can be seen; a sketch of this random initialization is given below.
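A hedged sketch of step 2: draw raw values with runif, then normalize so that π and each row of A sum to 1 (B can be drawn the same way; it is omitted here for brevity):

# Random initialization of (pi, A) as described in step 2.
random_init <- function(n_states, seed = 5) {
  set.seed(seed)                              # reproducible initial model
  pi <- runif(n_states); pi <- pi / sum(pi)   # initial distribution sums to 1
  A  <- matrix(runif(n_states^2), n_states)
  A  <- A / rowSums(A)                        # each row of A sums to 1
  list(A = A, pi = pi)
}
init <- random_init(2)
rowSums(init$A)                               # sanity check: all equal to 1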
3. Forward algorithm
With this algorithm we find $P(O \mid \lambda)$, the probability of $O = \{o_1 = v_{m_1}, o_2 = v_{m_2}, o_3 = v_{m_3}, \ldots, o_T = v_{m_T}\}$ given the HMM $\lambda = (A, B, \pi)$10. It can be determined by induction using the forward algorithm. Define the forward variable $\alpha_t(i)$ as:

$\alpha_t(i) = P(o_1 = v_{m_1}, \ldots, o_t = v_{m_t}, q_t = s_i \mid \lambda)$ (1)

Equation (1) is the probability of the observation sequence $o_1, o_2, o_3, \ldots, o_t$ and of being in state $s_i$ at time t, given λ. By induction, $P(O \mid \lambda)$ can be calculated using the algorithm9 shown in Fig. 1.

Fig. 1. Algorithm to find $P(O \mid \lambda)$.
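A direct R transcription of this recursion is sketched below, using the lambda list from earlier in this section; this is our own rendering of Fig. 1, not the authors' code:

# Forward algorithm: obs is a vector of symbol names matching the columns of B.
forward <- function(obs, lambda) {
  N <- nrow(lambda$A); T_len <- length(obs)
  alpha <- matrix(0, T_len, N)
  alpha[1, ] <- lambda$pi * lambda$B[, obs[1]]          # initialization
  for (t in 2:T_len)                                    # induction step
    alpha[t, ] <- (alpha[t - 1, ] %*% lambda$A) * lambda$B[, obs[t]]
  list(alpha = alpha, prob = sum(alpha[T_len, ]))       # termination: P(O | lambda)
}
forward(c("N", "T", "T", "N"), lambda)$prob             # example call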

4. Parameter estimation with the Baum-Welch algorithm

In application, HMM has the problem of maximizing $P(O \mid \lambda)$, where O is an observation sequence and λ is the parameter model $\lambda = (A, B, \pi)$. The Baum-Welch algorithm is used to solve this problem. In the Baum-Welch algorithm, four variables are defined, namely the forward variable, the backward variable, the variable $\xi_t(i, j)$, and the variable $\gamma_t(i)$10. The first, the forward variable, was defined in equation (1), with its induction stage given in Fig. 1. The second, the backward variable, is defined as:

$\beta_t(i) = P(o_{t+1}, \ldots, o_T \mid q_t = s_i, \lambda)$ (2)

Fig. 2. The Baum-Welch algorithm.

Equation (2) is the probability of the partial observation sequence $o_{t+1}, \ldots, o_T$, given state $s_i$ at time t and the model λ.

Furthermore, $\beta_t(i)$ can be computed by induction as follows:

• Initialization:
$\beta_T(i) = 1, \quad 1 \le i \le N$ (3)
• Induction:
$\beta_t(i) = \sum_{j=1}^{N} a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1; \quad 1 \le i \le N$ (4)

The steps of the Baum-Welch algorithm9 are shown in Fig. 2.


If the initial model is $\lambda = (A, B, \pi)$ and the process is carried out to obtain the parameter estimate $\bar{\lambda} = (\bar{A}, \bar{B}, \bar{\pi})$, then by the Baum-Welch algorithm $\bar{\lambda}$ is more optimal than λ in the sense that $P(O \mid \bar{\lambda}) > P(O \mid \lambda)$; that is, the new model $\bar{\lambda}$ produces an observation sequence that is more similar to the data. If the process is repeated until certain conditions are met, the probability of the observation sequence given the model can be increased.
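The backward recursion of equations (3)-(4) can be sketched in R as the counterpart of forward(); note this is only one ingredient of Baum-Welch, whose full re-estimation step (Fig. 2) also needs $\xi$ and $\gamma$:

# Backward algorithm: beta[t, i] = P(o_{t+1}, ..., o_T | q_t = s_i, lambda).
backward <- function(obs, lambda) {
  N <- nrow(lambda$A); T_len <- length(obs)
  beta <- matrix(0, T_len, N)
  beta[T_len, ] <- 1                                    # initialization, Eq. (3)
  for (t in (T_len - 1):1)                              # induction, Eq. (4)
    beta[t, ] <- lambda$A %*% (lambda$B[, obs[t + 1]] * beta[t + 1, ])
  beta
}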
5. Viterbi

To determine the optimal state sequence $Q^* = \{q_1^*, q_2^*, q_3^*, \ldots, q_T^*\}$ for an observation sequence $O = \{o_1, o_2, o_3, \ldots, o_T\}$, the Viterbi algorithm is used10. In the Viterbi algorithm, two auxiliary variables are used, namely $\delta_t$ and $\psi_t$:

$\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} P(q_1 = s_{i_1}, \ldots, q_t = s_{i_t}, o_1 = v_{m_1}, \ldots, o_t = v_{m_t} \mid \lambda)$ (5)

By induction we obtain:

$\delta_{t+1}(j) = \max_{1 \le i \le N} [\delta_t(i) \, a_{ij}] \, b_j(o_{t+1})$ (6)

$\psi_t(i) = \arg\max_{q_1, q_2, \ldots, q_{t-1}} P(q_1 = s_{i_1}, \ldots, q_t = s_{i_t}, o_1 = v_{m_1}, \ldots, o_t = v_{m_t} \mid \lambda)$ (7)

The steps of the Viterbi algorithm9 for determining the best sequence are shown in Fig. 3.

Fig. 3. Viterbi algorithm for determining the best sequence.
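A sketch of the Viterbi recursion in R: delta holds the best path probabilities of equations (5)-(6), psi the argmax back-pointers of equation (7) used to recover $Q^*$:

# Viterbi algorithm: returns the most probable hidden state sequence.
viterbi <- function(obs, lambda) {
  N <- nrow(lambda$A); T_len <- length(obs)
  delta <- matrix(0, T_len, N); psi <- matrix(0L, T_len, N)
  delta[1, ] <- lambda$pi * lambda$B[, obs[1]]          # initialization
  for (t in 2:T_len) for (j in 1:N) {
    cand <- delta[t - 1, ] * lambda$A[, j]              # delta_{t-1}(i) * a_ij
    psi[t, j]   <- which.max(cand)                      # back-pointer, Eq. (7)
    delta[t, j] <- max(cand) * lambda$B[j, obs[t]]      # Eq. (6)
  }
  path <- integer(T_len); path[T_len] <- which.max(delta[T_len, ])
  for (t in (T_len - 1):1) path[t] <- psi[t + 1, path[t + 1]]  # backtrack
  path                                                  # best state sequence Q*
}
viterbi(c("N", "T", "T"), lambda)                       # example call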

2.2.2. Hybrid Genetic Algorithm Hidden Markov Model


Hybrid GA-HMM is a combination of the Baum-Welch algorithm and a genetic algorithm to maximize $P(O \mid \lambda)$ in HMM. The Genetic Algorithm (GA) is a stochastic search method that can perform a global search in a defined search space. The algorithm follows the laws of natural selection and genetics5. The process of selecting genes in the genetic algorithm is similar to the process of selecting individuals who survive natural selection. The fitness function used is the similarity value of the model based on AIC. The process of the Hybrid GA-HMM is shown in Fig. 4.
1. Stages of the genetic algorithm
• Coding scheme. At this stage, an individual is formed consisting of genes of real data type that contain the initial values π and the elements of the transition matrix A. Fig. 5 shows the individual formation: genes 1 to n hold the initial values π and the remaining genes hold the elements of the transition matrix A.

Fig. 4. Block diagram of the genetic algorithm: coding scheme → population initialization → objective function → older generation → selection → crossover → mutation → objective function → regeneration, repeated until the termination condition is met.

• Population initialization. At this stage, the individuals that have been formed are combined into a list. This study used a population of 200 individuals.
• The objective function determines the fitness value of each individual. The Baum-Welch algorithm function is used to find the maximum $P(O \mid \lambda)$ and to calculate the AIC value for comparing individuals.

$\pi_1 \; \pi_2 \; \ldots \; \pi_n \;\; a_{11} \; a_{12} \; a_{13} \; \ldots \; a_{n,n-1} \; a_{nn}$

Fig. 5. Individual formation.

• The older generation is the initial population with its members' respective fitness values.
• Initial selection is the stage of selecting individuals using the ranking selection method. In this study, two individuals from the population were selected as parent 1 and parent 2.
• Crossover is the process of crossing the two parents with a certain crossover rate, which is the probability that genes are exchanged. This study used a crossover rate of 0.1.
• Mutation is the process of altering part of a parent's genes with a certain mutation rate. The greater the mutation rate, the greater the probability that a parent gene will change. This study used a mutation rate of 0.01.
• The objective function determines the fitness value of each mutant and is formulated as:

$f = \frac{1}{AIC}$ (8)

• Regeneration is the process of inserting mutants and their fitness values into the population, replacing the individuals with the smallest fitness values.
• Termination stops the iteration of the genetic algorithm once the desired number of generations has been produced. This study used 60 generations.
• Final selection is the stage of selecting the individual with the best fitness value in the final generation. A condensed sketch of the whole loop is given after this list.
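The following R sketch condenses these stages into one loop under the settings reported above (population of 200, crossover rate 0.1, mutation rate 0.01, 60 generations). The helper baum_welch_aic() is an assumed function, not given in the paper, that decodes an individual's gene vector into (π, A), runs Baum-Welch, and returns the AIC:

# One-loop sketch of the GA stages; pop is a list of real-valued gene vectors.
ga_hmm <- function(pop, n_gen = 60, cx_rate = 0.1, mut_rate = 0.01) {
  fitness <- sapply(pop, function(ind) 1 / baum_welch_aic(ind))  # Eq. (8)
  for (g in 1:n_gen) {
    ranks   <- rank(fitness)                      # ranking selection, not roulette
    parents <- sample(seq_along(pop), 2, prob = ranks / sum(ranks))
    child   <- pop[[parents[1]]]
    swap    <- runif(length(child)) < cx_rate     # crossover: exchange genes
    child[swap] <- pop[[parents[2]]][swap]
    mut     <- runif(length(child)) < mut_rate    # mutation: perturb genes
    child[mut] <- runif(sum(mut))
    f_child <- 1 / baum_welch_aic(child)          # objective function of mutant
    worst   <- which.min(fitness)                 # regeneration: replace the
    pop[[worst]] <- child; fitness[worst] <- f_child  # least-fit individual
  }
  pop[[which.max(fitness)]]                       # final selection: best model
}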

2. Stages of HMM
The parameters of the individual chosen in the final selection are used as the initial model parameters of the HMM. The next steps are the same as in the plain HMM.

2.3. Evaluation method

The Akaike Information Criterion (AIC), first introduced by Akaike (1974), is used to identify the model of a dataset. It uses the maximum likelihood to measure the level of similarity in one state. The AIC equation for selecting a model, as proposed by Akaike, is:

$AIC = -2 \log L + 2p$ (9)

with:
log L = log-likelihood of the model
p = number of parameters in the model
In testing HMM models, AIC is usually paired with BIC. BIC is one of the best model selection criteria, proposed by Schwarz (1978). Like AIC, BIC uses the maximum likelihood approach to measure the level of similarity in one state, and it complements the information for cases with a large sample. The BIC formula for selecting the best model, as proposed by Schwarz, is:

$BIC = -2 \log L + p \log N$ (10)

N = number of observations
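As a worked check of equations (9) and (10): back-solving the 2-state GA-HMM row of Table 1 (Section 3.1) suggests p = 5 free parameters and a log-likelihood of about -66.92. These are inferred values, not figures reported by the authors, but they reproduce the tabulated AIC and BIC:

# Eqs. (9)-(10) in R, with values back-solved from Table 1 (an inference).
logL <- -66.9237; p <- 5; N <- 100
aic <- -2 * logL + 2 * p        # Eq. (9):  AIC = -2 log L + 2p      -> 143.8474
bic <- -2 * logL + p * log(N)   # Eq. (10): BIC = -2 log L + p log N -> 156.8733
c(AIC = aic, BIC = bic)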

2.4. Related studies

Xiao, Zou, and Li (2007)5 examined the optimization of the Hidden Markov Model with a genetic algorithm for web information extraction. Their study used 2100 web extractions and produced an average precision of 84.483% with GA-Baum-Welch, 13.434 percentage points greater than the HMM trained with Baum-Welch alone, which reached 71.049%. Wardana (2013)6 used a genetic algorithm for HMM modeling in building a voice imitation system. The genetic algorithm improved the cepstral RMSE value, with a highest improvement of 7.08% and an average of 2.75%. In terms of similarity, the MOS test for HMM-GA had an average value of 3.26 while that for HMM had an average of 3.07; in terms of quality, the MOS test for HMM-GA averaged 2.89 against 2.78 for HMM. Chau et al.7 compared HMM optimization using the Baum-Welch algorithm and a genetic algorithm for speech recognition; GA-HMM training achieved higher average log probabilities than HMM trained by Baum-Welch. Maldonado et al.11 used a genetic algorithm for HMM modeling on Speaker-Independent (SI) speech recognizer data, with two approaches. In the first approach, each phoneme's HMM is optimized one at a time, applying the reproduction operators on subsets of the HMM's transition matrix elements (thus, parents consisted of different elements of the HMM's transition matrix). The second approach optimized all HMMs at once, applying the reproduction operators on pairs of HMM transition matrices (thus, parents consisted of complete HMM transition matrices). The first approach performed better than the second, achieving statistically significant improvements in recognition accuracy in experiments with un-adapted and adapted Speaker-Independent (SI) HMMs. However, while improvements were obtained consistently for 80% of the test speakers, no improvement was achieved for the remaining 20%. This can be due to significant acoustic differences or the effect of language model restrictions during the execution of the GA estimates. Hassan et al.12 combined the Hidden Markov Model (HMM), Artificial Neural Networks (ANN), and Genetic Algorithms (GA) to forecast financial market behaviour, comparing the forecasts with ARIMA. The result shows that the forecasting ability of the fusion model is as good as that of the ARIMA model. The difference between this study and the research above is that the selection step of the genetic algorithm here uses ranking selection, because roulette wheel selection has the weakness of producing premature convergence8.

3. Results and Discussion

To be able to compare the results of the HMM with those of the GA-HMM, the initial parameters of the HMM are required for each number of states.

3.1. Training data

The training data follow a multinomial distribution generated with the R program, with 100 observations. The following are the results of comparing the HMM and GA-HMM methods on the training data:

Table 1. Evaluation of the HMM and GA-HMM methods on multinomially distributed data (n = 100).

State  Algorithm    AIC        BIC
2      HMM          147.7836   160.8094
       GA-HMM       143.8474   156.8732
       Difference     3.9362     3.9362
3      HMM          158.6097   187.2666
       GA-HMM       156.8930   185.5498
       Difference     1.7167     1.7168
4      HMM          174.8544   224.3526
       GA-HMM       173.0934   222.5916
       Difference     1.7610     1.7610
5      HMM          195.0696   270.6195
       GA-HMM       194.3212   269.8712
       Difference     0.7484     0.7483

Table 1 shows that the biggest improvement occurs when the number of states is 2, where the criterion falls by 3.9362, i.e. 2.663% of the AIC value and 2.448% of the BIC value. The smallest improvement occurs with 5 states, a decrease of 0.7484, i.e. 0.384% of the AIC value and 0.276% of the BIC value. The best model is produced when 2 states are used, with an AIC of 143.8474 and a BIC of 156.8732.
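The quoted percentages can be verified directly from the 2-state row of Table 1:

# Quick arithmetic check of the percentage improvements quoted above.
round(100 * 3.9362 / 147.7836, 3)   # 2.663 (% decrease in AIC)
round(100 * 3.9362 / 160.8094, 3)   # 2.448 (% decrease in BIC)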

3.2. Case study data

The case study data are Indonesia's GDP growth in 1983–2018. In this study, we use the multinomial distribution with the categories of rising or falling GDP growth in Indonesia; a rise is denoted by "N" and a fall by "T".

Table 2. Evaluation of the HMM and GA-HMM methods on Indonesia's GDP growth in 1983–2018 (multinomial distribution).

State  Algorithm     AIC        BIC
2      HMM            58.4361    66.2128
       GA-HMM         56.8242    64.6010
       Difference      1.6118     1.6118
3      HMM            65.8084    82.9172
       GA-HMM         62.8558    79.9646
       Difference      2.9526     2.9526
4      HMM            84.6388   114.1905
       GA-HMM         73.6122   103.1638
       Difference     11.0267    11.0267
5      HMM           102.8740   147.9791
       GA-HMM         89.1204   134.2255
       Difference     13.7536    13.7536

Table 2 shows that, for Indonesia's 1983–2018 GDP growth data with a multinomial distribution, the biggest improvement occurs when 5 states are used, a decrease of 13.7536, i.e. 13.369% of the AIC value and 9.294% of the BIC value. The smallest improvement occurs with 2 states, a decrease of 1.6118, i.e. 2.758% of the AIC value and 2.434% of the BIC value.
The results of clustering Indonesia's GDP growth data in 1983–2018 with the multinomial distribution show that state 1 contains the data for 1984, 1986, 1989, 1991, 1995, 1997, 2000, 2003, 2005, 2008, 2012, 2014, and 2018, while state 2 contains the data for 1985, 1987–1988, 1990, 1992–1994, 1996, 1998–1999, 2001–2002, 2004, 2006–2007, 2009–2011, 2013, and 2015–2017. All data in state 1 show decreased GDP growth; there is no year in state 1 in which GDP growth increases. Meanwhile, in state 2, there are 18 years in which GDP growth increased and 4 years in which it decreased. A summary of the GDP growth grouping is shown in Table 3: state 1 contains 0 increasing (N) and 13 decreasing (T) observations, while state 2 contains 18 increasing (N) and 4 decreasing (T) observations, so the first state can be called the decreasing state and the second state the increasing state.

Table 3. A summary of the results of clustering Indonesia's GDP growth data in 1983–2018 with multinomial distribution.

State  Increasing (N)  Decreasing (T)
1       0              13
2      18               4

4. Conclusion

Based on modeling on training and case study data with number of states 2, 3, 4, and 5, an AIC and BIC value of
the Hybrid GA-HMM method is smaller than the usual HMM method. So it can be concluded that the genetic
algorithm can optimize solution of HMM. For training data, the best model is form when the number of states used
are 2, but for case study, the best model is form when the number of states used are 5. The next researchers need to
study the prediction method. To get better results, we need data with a greater number of rows and conduct HMM and
Hybrid GA-HMM experiments on data that have many variables.

Acknowledgements

Thanks to STIS Polytechnic of Statistics for financial support in publishing this work.

References

1 Leng C, Wang S. Hidden Markov Model for Predicting the Turning Points of GDP Fluctuation. International Conference on Future Computer and Communication Engineering (ICFCCE 2014); 2014. p. 1–3.
2 Mamonto SW, Langi YAR, Rindengan AJ. Penerapan Hidden Markov Model pada Harga Saham [Application of the Hidden Markov Model to stock prices]. Jurnal Matematika dan Aplikasi. 2016; 5(1): 35–41.
3 Gen M, Cheng R. Genetic Algorithms and Engineering Design. Canada: John Wiley and Sons; 1997.
4 Nooraeni R. Metode Cluster Menggunakan Kombinasi Algoritma Cluster K-Prototype dan Algoritma Genetika untuk Data Bertipe Campuran [A cluster method combining the K-Prototype cluster algorithm and a genetic algorithm for mixed-type data]. Jurnal Aplikasi Statistika & Komputasi Statistik. 2015; 7(2): 81–98.
5 Xiao J, Zou L, Li C. Optimization of Hidden Markov Model by a Genetic Algorithm for Web Information Extraction. Advances in Intelligent Systems Research; 2007.
6 Wardana IMK. Optimasi Penerapan Hidden Markov Model Menggunakan Algoritma Genetika pada Konversi Suara [Optimizing the application of the Hidden Markov Model using a genetic algorithm for voice conversion]. Bandung: Graduate School, Telkom Institute of Technology; 2013.
7 Chau CW, Kwong S, Diu CK, Fahinet WR. Optimization of HMM by a Genetic Algorithm. ICASSP; 1997. p. 1727–1730.
8 Yadav SL, Sohal A. Study of the Various Selection Techniques in Genetic Algorithms. International Journal of Engineering, Science and Mathematics. 2017; 6: 198–204.
9 Jurafsky D, Martin JH. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Third edition draft; 2019.
10 Firdaniza N, Akmal. Hidden Markov Model. SEMNAS Matematika dan Pendidikan Matematika; 2006. p. 201–214.
11 Maldonado YP, Morales SOC, Ortega ROC. GA Approaches to HMM Optimization for Automatic Speech Recognition. MCPR; 2012. p. 313–322.
12 Hassan MR, Nath B, Kirley M. A fusion model of HMM, ANN and GA for stock market forecasting. Expert Systems with Applications. 2007; 33: 171–180.
