(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 6, 2010
III. A PROBABILISTIC APPROACH FOR SEQUENCE REPRESENTATION
A DNA sequence is essentially represented as a string over the four characters A, C, T and G, and looks something like ACCTGACCTTACG. These strings can also be represented in terms of probability measures, and using these measures they can be depicted graphically as well. This graphical representation matches the Hidden Markov Model. A physical or mathematical model of a system that produces a sequence of symbols according to certain probabilities associated with them is known as a stochastic process [2]. There are different ways to use probabilities for depicting DNA sequences. The diagrammatic representation can be shown as follows:

FIG 1: [The states of A, C, G and T.]

For example, the transition probability from state G to state T is 0.08, i.e.,
$$P(x_i = T \mid x_{i-1} = G) = 0.08$$
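The text does not say how the transition probabilities in Fig. 1 were obtained. One common approach, assumed here purely for illustration, is to count adjacent nucleotide pairs and normalize each row. The sketch below applies that idea to the example string ACCTGACCTTACG quoted earlier; the function name `transition_probabilities` is hypothetical.

```python
from collections import Counter

def transition_probabilities(seq, alphabet="ACGT"):
    """Estimate P(next | current) from adjacent-pair counts in one sequence."""
    pair_counts = Counter(zip(seq, seq[1:]))
    table = {}
    for cur in alphabet:
        total = sum(pair_counts[(cur, nxt)] for nxt in alphabet)
        # Leave the row empty if the state never appears with a successor.
        table[cur] = (
            {nxt: pair_counts[(cur, nxt)] / total for nxt in alphabet}
            if total
            else {}
        )
    return table

# Example string from the text; far too short for a reliable estimate.
probs = transition_probabilities("ACCTGACCTTACG")
print(probs["C"])
```

Each non-empty row of the resulting table sums to one, which is the property a table of transition probabilities needs. A real application would estimate the table from many sequences of a class, not from a single short string.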
In a given sequence $x$ of length $L$, $x_1, x_2, \ldots, x_L$ represent the nucleotides. The sequence starts at the first state $x_1$ and makes successive transitions to $x_2$, $x_3$ and so on, till $x_L$. By the Markov property [6], the probability of $x_L$ depends only on the value of the previous state, $x_{L-1}$, not on the entire previous sequence. This characteristic is known as the Markov property [5] and can be written as:
$$P(x) = P(x_L \mid x_{L-1})\, P(x_{L-1} \mid x_{L-2}) \cdots P(x_2 \mid x_1)\, P(x_1) = P(x_1) \prod_{i=2}^{L} P(x_i \mid x_{i-1})$$
(1)

In Equation (1) we need to specify $P(x_1)$, the probability of the starting state. For simplicity, we would like to model this as a transition too. This can be done by adding a begin state, denoted by 0, so that the starting state becomes $x_0 = 0$. Now, writing $a_{x_{i-1} x_i}$ for the transition probability $P(x_i \mid x_{i-1})$, we can rewrite (1) as
$$P(x) = \prod_{i=1}^{L} a_{x_{i-1} x_i}$$
(2)

If there are $n$ classes, then we must calculate the probability of a sequence $x$ under every one of the classes. To overcome this drawback we use the fuzzy composition relation. That is, we divide the $n$ classes into different groups based on their similarities. So, if $m$ out of the $n$ classes are similar, they are treated as one group and their individual transition probability tables are merged using the fuzzy composition relation. The remaining ($n - m$) classes are similarly grouped. Let us say there are two classes $R_1$ and $R_2$; the fuzzy composition relation between $R_1$ and $R_2$ [6][7] can be written as follows:
$$R_1 \circ R_2 = \operatorname{Max}\bigl(\operatorname{Min}(R_1(x), R_2(y))\bigr)$$
(3)
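Equation (3) is stated in compact form. A minimal sketch of how two tables might be merged is given below, assuming the standard max-min composition on square matrices, in which entry $(x, z)$ of the result is $\max_y \min(R_1(x, y), R_2(y, z))$. The function name `max_min_composition` and the 2x2 values are illustrative and are not taken from the paper.

```python
def max_min_composition(r1, r2):
    """Max-min composition of two square tables given as lists of rows."""
    n = len(r1)
    return [
        [max(min(r1[x][y], r2[y][z]) for y in range(n)) for z in range(n)]
        for x in range(n)
    ]

# Illustrative 2x2 tables (hypothetical values, not the paper's data).
r1 = [[0.2, 0.8],
      [0.6, 0.4]]
r2 = [[0.5, 0.5],
      [0.9, 0.1]]

merged = max_min_composition(r1, r2)
print(merged)  # [[0.8, 0.2], [0.5, 0.5]]
```

Note that the rows of a max-min composition need not sum to one. The text does not say whether the merged table is renormalized before it is used as a transition probability table.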
[Figure panels: Different class representation; Grouping of similar classes]
Fig 2: Grouping of similar classes

A table is then constructed representing the entire ($n - m$) similar classes. From this table we compute the probability that a sequence $x$ belongs to a given group using the following equation:
$$\log \frac{P(x \mid +)}{P(x \mid -)} = \sum_{i=1}^{L} \log \frac{a^{+}_{x_{i-1} x_i}}{a^{-}_{x_{i-1} x_i}}$$
(4)

Here “+” represents the transition probability of the sequence belonging to one of the classes, obtained using the fuzzy composition relation, and “-” represents the same transition probability for another class [1]. If this log-odds ratio is greater than zero, we can say that the sequence $x$ is from the first class; otherwise it is from the other one.

An Example:

Let us consider an example for applying this classification method. We have taken into consideration the Swine flu data [11]. The different categories of the Swine flu data are shown as $R_1$, $R_2$ and $R_3$. $R_1$, $R_2$ and $R_3$ show the transition probabilities of the Type 1, Type 2 and Type 3 varieties of Avian Flu.
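The decision rule built from Equations (2) and (4) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each model is stored as a nested dictionary with an extra row for the begin state "0", and that every transition used has non-zero probability. The two-letter tables are hypothetical values chosen for readability and are unrelated to the flu data.

```python
import math

BEGIN = "0"  # begin state x_0, so the first symbol is also a transition

def log_prob(seq, a):
    """log P(x) = sum_i log a[x_{i-1}][x_i], the log form of Equation (2)."""
    prev, total = BEGIN, 0.0
    for cur in seq:
        total += math.log(a[prev][cur])  # assumes a[prev][cur] > 0
        prev = cur
    return total

def log_odds(seq, a_plus, a_minus):
    """Equation (4): log P(x | +) - log P(x | -)."""
    return log_prob(seq, a_plus) - log_prob(seq, a_minus)

# Hypothetical two-letter models, for illustration only.
a_plus = {"0": {"A": 0.5, "C": 0.5},
          "A": {"A": 0.8, "C": 0.2},
          "C": {"A": 0.2, "C": 0.8}}
a_minus = {"0": {"A": 0.5, "C": 0.5},
           "A": {"A": 0.5, "C": 0.5},
           "C": {"A": 0.5, "C": 0.5}}

score = log_odds("AAAC", a_plus, a_minus)
label = "+" if score > 0 else "-"
print(label)
```

Working with sums of logarithms instead of the raw product in Equation (2) avoids numerical underflow on long sequences, and it makes the comparison with zero in the decision rule direct.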
146 http://sites.google.com/site/ijcsis/ ISSN 1947-5500