(IJCSIS) International Journal of Computer Science and Information Security,Vol. 8, No.6, 2010
approach and the comparison results are described in Section 5. Finally, conclusions are made in Section 6.

II. PROBLEM FORMULATION

As shown in Figure 1, assume that dataset $U$ consists of $n$ haplotypes $\{h_i\}_{1 \le i \le n}$, each with $p$ different SNPs $\{S_j\}_{1 \le j \le p}$, so that $U$ is an $n \times p$ matrix. Each row of $U$ represents a haplotype $h_i$ and each column represents a SNP $S_j$. The element $d_{i,j} \in \{0, 1\}$ denotes the $j$-th SNP of the $i$-th haplotype. Our goal is to determine a minimum-size set of selected SNPs (htSNPs) $V = \{v_k\}$, $k \in \{1, 2, \ldots, p\}$, with $g = |V|$, in which each random variable $v_k$ corresponds to the $k$-th SNP of the haplotypes in $U$, so as to predict the remaining unselected SNPs with minimum prediction error. The size of $V$ may not exceed a user-defined value $R$ ($g \le R$). The selected SNPs are called haplotype tagging SNPs (htSNPs), while the remaining unselected ones are called tagged SNPs. Thus, the selection set $V$ of htSNPs is chosen according to how well it predicts the remaining unselected SNPs, and the number $g$ of selected SNPs is usually minimized subject to the prediction error estimated by leave-one-out cross-validation (LOOCV) experiments [7].
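To make the objective concrete, the following minimal sketch estimates the LOOCV prediction error of a candidate tag set $V$ on a binary matrix $U$. The predictor is not specified in this section, so the sketch assumes a simple 1-nearest-neighbour rule on the tag columns as a stand-in; the function name `loocv_error`, the 0-based column indices, and the toy data are hypothetical and serve only as illustration.

```python
from typing import Sequence


def loocv_error(U: Sequence[Sequence[int]], tags: Sequence[int]) -> float:
    """Leave-one-out error rate for predicting tagged SNPs from tag SNPs.

    ASSUMPTION: a 1-nearest-neighbour rule (Hamming distance over the tag
    columns) stands in for the real predictor, which is not specified here.
    Column indices are 0-based.
    """
    n, p = len(U), len(U[0])
    tag_set = set(tags)
    tagged = [j for j in range(p) if j not in tag_set]
    if n < 2 or not tagged:
        return 0.0

    wrong = 0
    for i in range(n):  # hold out haplotype h_i
        best_row, best_dist = -1, None
        for r in range(n):
            if r == i:
                continue
            dist = sum(U[i][j] != U[r][j] for j in tag_set)
            if best_dist is None or dist < best_dist:
                best_row, best_dist = r, dist
        # Copy the tagged SNPs of the nearest haplotype and count mismatches.
        wrong += sum(U[i][j] != U[best_row][j] for j in tagged)
    return wrong / (n * len(tagged))


# Toy data set: n = 4 haplotypes, p = 4 SNPs (hypothetical values).
U = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
print(loocv_error(U, tags=[0]))  # error rate with a single htSNP, g = 1
```

In this toy matrix every SNP is perfectly correlated with the first one, so a single htSNP already predicts the tagged SNPs without error, whereas an empty tag set does not. A search procedure would compare such error values across candidate sets with $g \le R$.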
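$$
U =
\left[
\begin{array}{ccccccc|c}
S_1 & S_2 & \cdots & S_j & \cdots & S_{p-1} & S_p & \\ \hline
d_{1,1} & d_{1,2} & \cdots & d_{1,j} & \cdots & d_{1,p-1} & d_{1,p} & h_1 \\
d_{2,1} & d_{2,2} & \cdots & d_{2,j} & \cdots & d_{2,p-1} & d_{2,p} & h_2 \\
\vdots & \vdots & \ddots & \vdots & & \vdots & \vdots & \vdots \\
d_{i,1} & d_{i,2} & \cdots & d_{i,j} & \cdots & d_{i,p-1} & d_{i,p} & h_i \\
\vdots & \vdots & & \vdots & \ddots & \vdots & \vdots & \vdots \\
d_{n-1,1} & d_{n-1,2} & \cdots & d_{n-1,j} & \cdots & d_{n-1,p-1} & d_{n-1,p} & h_{n-1} \\
d_{n,1} & d_{n,2} & \cdots & d_{n,j} & \cdots & d_{n,p-1} & d_{n,p} & h_n
\end{array}
\right]_{n \times p}
$$

Figure 1. The haplotype tagging SNP selection problem.

III. RELATED WORKS

A. Particle Swarm Optimization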
PSO is a novel optimization method originally developed by Kennedy and Eberhart [8]. It models the sociological behavior associated with bird flocking and is one of the evolutionary computation techniques. In PSO, each solution is a 'bird' in the flock and is referred to as a 'particle'. A particle is analogous to a chromosome in a GA. Each particle traverses the search space looking for the global optimum. The basic PSO algorithm is as follows:
$$v_{id}^{k+1} = w \cdot v_{id}^{k} + c_1 \cdot r_1 \cdot (pb_{id}^{k} - x_{id}^{k}) + c_2 \cdot r_2 \cdot (gb_{id}^{k} - x_{id}^{k}) \qquad (1)$$

$$x_{id}^{k+1} = v_{id}^{k+1} + x_{id}^{k} \qquad (2)$$

where $d = 1, 2, \ldots, D$ and $i = 1, 2, \ldots, S$; $D$ is the dimension of the problem space, $S$ is the population size, and $k$ is the iteration number. $v_{id}^{k}$ is the velocity of the $i$-th particle, $x_{id}^{k}$ is the current particle solution, $pb_{id}^{k}$ is the best solution ($p_{best}$) achieved so far by the $i$-th particle, and $gb_{id}^{k}$ is the global best solution ($g_{best}$) obtained so far by any particle in the population. $r_1$ and $r_2$ are random values in the range $[0, 1]$, $c_1$ and $c_2$ are learning factors, usually $c_1 = c_2 = 2$, and $w$ is an inertia factor. A large inertia weight facilitates global exploration, while a small one favors local exploration. To obtain a more refined solution, a general rule of thumb is to set the initial inertia value to the maximum, $w_{\max} = 0.9$, and gradually decrease it to the minimum, $w_{\min} = 0.4$.
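The update in Eqs. (1) and (2) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `inertia` and `pso_step` are hypothetical, and the linear decrease of $w$ from $w_{\max}$ to $w_{\min}$ is an assumption, since the text only says that the weight is lowered gradually.

```python
import random
from typing import List, Tuple


def inertia(k: int, k_max: int, w_max: float = 0.9, w_min: float = 0.4) -> float:
    """Inertia weight at iteration k (0-based).

    ASSUMPTION: a linear schedule from w_max down to w_min.
    """
    return w_max - (w_max - w_min) * k / max(k_max - 1, 1)


def pso_step(
    x: List[List[float]],      # positions, one list of D values per particle
    v: List[List[float]],      # velocities, same shape as x
    pbest: List[List[float]],  # personal best position of each particle
    gbest: List[float],        # global best position
    w: float,
    c1: float = 2.0,
    c2: float = 2.0,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Apply Eq. (1) and then Eq. (2) once to every particle and dimension."""
    new_x, new_v = [], []
    for xi, vi, pbi in zip(x, v, pbest):
        vi_next, xi_next = [], []
        for d in range(len(xi)):
            r1, r2 = random.random(), random.random()  # r1, r2 in [0, 1)
            vel = (w * vi[d]
                   + c1 * r1 * (pbi[d] - xi[d])
                   + c2 * r2 * (gbest[d] - xi[d]))      # Eq. (1)
            vi_next.append(vel)
            xi_next.append(xi[d] + vel)                 # Eq. (2)
        new_v.append(vi_next)
        new_x.append(xi_next)
    return new_x, new_v


# One update for a single 2-dimensional particle (hypothetical values).
x, v = [[1.0, 2.0]], [[0.5, -0.5]]
new_x, new_v = pso_step(x, v, pbest=[[1.5, 1.0]], gbest=[2.0, 0.0],
                        w=inertia(0, 100))
print(new_x, new_v)
```

When a particle already sits on both its personal best and the global best, the two attraction terms vanish and the velocity simply shrinks by the factor $w$, which is why a small inertia weight favors local refinement.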
According to the searching behavior of PSO, the gbest value is an important clue in leading particles to the global optimal solution. While searching for better solutions, particles will inevitably fall into local minima. To let the search explore the area and produce more potential solutions, a mutation-like disturbance operation is inserted between Eq. (1) and Eq. (2). The disturbance operation randomly selects $k$ dimensions ($1 \le k \le$ number of problem dimensions) of $m$ particles ($1 \le m \le$ number of particles) and adds Gaussian noise to their moving vectors (velocities). The disturbance pushes particles in unexpected directions in the selected dimensions, without altering their previous experience. This helps particles jump out of a local search and explore more unsearched areas.

According to the velocity and position update formulas mentioned above, the basic process of the PSO algorithm is given as follows:
1.) Initialize the swarm by randomly generating initial particles.
2.) Evaluate the fitness of each particle in the population.
3.) Compare the particles' fitness values to identify both the $p_{best}$ and $g_{best}$ values.
4.) Update the velocity of all particles using Equation (1).
5.) Apply the disturbance operator to the moving vector (velocity).
6.) Update the position of all particles using Equation (2).
7.) Repeat Step 2 to Step 6 until a termination criterion is satisfied (e.g., the number of iterations reaches the pre-defined maximum or a sufficiently good fitness value is obtained).

The authors of [8] proposed a discrete binary version to allow the PSO algorithm to operate in discrete problem spaces. In the binary PSO (BPSO), the particle's personal best and global best are updated as in the continuous version. The major difference between the discrete PSO and the continuous version is that the velocities of the particles are instead defined in terms of the probability that a bit changes to one. By this definition, a velocity must be restricted within the range $[V_{\min}, V_{\max}]$. If $v_{id}^{k+1} \notin (V_{\min}, V_{\max})$, then $v_{id}^{k+1} = \max(\min(V_{\max}, v_{id}^{k+1}), V_{\min})$. The new particle position is calculated using the following rule:
http://sites.google.com/site/ijcsis/ ISSN 1947-5500