
**A Hybrid PSO-SVM Approach for Haplotype Tagging SNP Selection Problem**

Min-Hui Lin
Department of Computer Science and Information Engineering, Dahan Institute of Technology, Sincheng, Hualien County, Taiwan, Republic of China

Chun-Liang Leu
Department of Information Technology, Ching Kuo Institute of Management and Health, Keelung, Taiwan, Republic of China

Abstract—Due to the large number of single nucleotide polymorphisms (SNPs), it is essential to use only a subset of all SNPs, called haplotype tagging SNPs (htSNPs), for finding the relationship between complex diseases and SNPs in biomedical research. In this paper, a PSO-SVM model that hybridizes particle swarm optimization (PSO) and the support vector machine (SVM) with feature selection and parameter optimization is proposed to select the htSNPs appropriately. Several public datasets of different sizes are used to compare the proposed approach with other previously published methods. The computational results validate the effectiveness and performance of the proposed approach: high prediction accuracy can be obtained with fewer htSNPs.

Keywords: Single Nucleotide Polymorphisms (SNPs), Haplotype Tagging SNPs (htSNPs), Particle Swarm Optimization (PSO), Support Vector Machine (SVM).

I. INTRODUCTION

The large number of single nucleotide polymorphisms (SNPs) in the human genome provides essential tools for finding associations between sequence variation and complex diseases. A description of the SNPs on a chromosome is called a haplotype. Each element of a haplotype string is 0 or 1, where 0 denotes the major allele and 1 the minor allele. A genotype is the combined information of the two haplotypes on homologous chromosomes, and it is prohibitively expensive to determine the haplotypes of an individual directly. Usually, an element of a genotype string is 0, 1, or 2, where 0 represents a homozygous site with the major allele, 1 a homozygous site with the minor allele, and 2 a heterozygous site. The genotyping cost depends on the number of SNPs typed. To reduce this cost, a small set of haplotype tagging SNPs (htSNPs) that predicts the remaining SNPs is needed. The haplotype tagging SNP selection problem has become a very active research topic and is promising for disease association studies. Several computational algorithms have been proposed in the past few years, which can be divided into two categories: block-based and block-free methods. Block-based methods [1-2] first partition the human genome into haplotype blocks, within which haplotype diversity is limited, and then search for subsets of tagging SNPs within each block. A main drawback of block-based methods is that the definition of blocks is not standardized and there is no consensus on how blocks should be partitioned. Algorithmic frameworks that select a minimum informative set of SNPs without any reference to haplotype blocks are called block-free methods [3]. In the literature [4-5], feature selection techniques have been adopted for the tagging SNP selection problem with promising results. Feature selection algorithms fall broadly into two groups: filter approaches and wrapper approaches. A filter approach selects highly ranked features based on a statistical score as a preprocessing step; it is relatively cheap computationally since it does not involve the induction algorithm. A wrapper approach, on the contrary, directly uses the induction algorithm to evaluate feature subsets; it generally outperforms the filter approach in classification accuracy but is computationally more intensive. The Support Vector Machine (SVM) [6] is a useful technique for data classification. A practical difficulty in using an SVM is the selection of parameters, such as the penalty parameter C of the error term and the kernel parameter γ of the RBF kernel function; an appropriate choice of parameters yields better generalization performance. In this paper, a hybrid PSO-SVM model that incorporates particle swarm optimization (PSO) and the support vector machine (SVM) with feature selection and parameter optimization is proposed to select the htSNPs appropriately. Several public benchmark datasets are used to compare the proposed approach with other published methods. Experimental results validate the effectiveness of the proposed approach: high prediction accuracy can be obtained with fewer htSNPs. The remainder of the paper is organized as follows: Section 2 introduces the problem formulation. Section 3 describes PSO and the SVM classifier. In Section 4, the particle representation, fitness measurement, and the proposed hybrid system procedure are presented. Three public benchmark problems are used to validate the proposed


http://sites.google.com/site/ijcsis/ ISSN 1947-5500

(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No.6, 2010

approach and the comparison results are described in Section 5. Finally, conclusions are made in Section 6.

II. PROBLEM FORMULATION

As shown in Figure 1, assume that dataset U consists of n haplotypes {h_i}_{1≤i≤n}, each with p different SNPs {S_j}_{1≤j≤p}; U is an n×p matrix. Each row of U is a haplotype h_i and each column a SNP S_j. The element d_{i,j} ∈ {0,1} denotes the j-th SNP of the i-th haplotype. Our goal is to determine a minimum-size set of g selected SNPs (htSNPs) V = {v_k}, k ∈ {1, 2, ..., p}, g = |V|, where each variable v_k corresponds to the k-th SNP of the haplotypes in U, so as to predict the remaining unselected SNPs with minimum prediction error. The size of V must not exceed a user-defined value R (g ≤ R); the selected SNPs are called haplotype tagging SNPs (htSNPs), while the remaining unselected ones are called tagged SNPs. Thus the selection of V is based on how well it predicts the unselected SNPs, and the number g of selected SNPs is usually minimized subject to the prediction error measured by leave-one-out cross-validation (LOOCV) experiments [7].

U = [ d_{1,1}  d_{1,2}  ...  d_{1,p}
      d_{2,1}  d_{2,2}  ...  d_{2,p}
        ...      ...    ...    ...
      d_{n,1}  d_{n,2}  ...  d_{n,p} ]  (n×p)

Figure 1 The haplotype tagging SNP selection problem.

III. RELATED WORKS

A. Particle Swarm Optimization

PSO is an optimization method originally developed by Kennedy and Eberhart [8]. It models the sociological behavior associated with bird flocking and is one of the evolutionary computation techniques. In PSO, each solution is a 'bird' in the flock and is referred to as a 'particle'; a particle is analogous to a chromosome in a GA. Each particle traverses the search space looking for the global optimum. The basic PSO update is:

v_id^(k+1) = w · v_id^k + c1 · r1 · (pb_id^k − x_id^k) + c2 · r2 · (gb_id^k − x_id^k)    (1)

x_id^(k+1) = x_id^k + v_id^(k+1)    (2)

where d = 1, 2, ..., D and i = 1, 2, ..., S; D is the dimension of the problem space, S is the population size, and k is the iteration index. v_id^k is the velocity of the i-th particle, x_id^k is its current position, pb_id^k is the particle's best (pbest) solution achieved so far, and gb_id^k is the global best (gbest) solution obtained so far by any particle in the population. r1 and r2 are random values in the range [0,1], c1 and c2 are learning factors (usually c1 = c2 = 2), and w is an inertia factor. A large inertia weight facilitates global exploration, while a small one favors local exploration. To obtain more refined solutions, a common rule of thumb is to start the inertia weight at a maximum wmax = 0.9 and decrease it gradually to a minimum wmin = 0.4.

According to the search behavior of PSO, the gbest value is an important clue leading particles toward the globally optimal solution, but the search may still fall into a local minimum. To let the search produce more potential solutions, a mutation-like disturbance operation is inserted between Eq. (1) and Eq. (2). The disturbance operation randomly selects k dimensions (1 ≤ k ≤ problem dimension) of m particles (1 ≤ m ≤ number of particles) and adds Gaussian noise to their moving vectors (velocities). The disturbance moves the affected particles toward unexpected directions in the selected dimensions, rather than following previous experience; this lets particles jump out of local search and explore more unsearched areas. With the velocity and position update formulas above, the basic PSO algorithm is as follows: 1.) Initialize the swarm by randomly generating initial particles. 2.) Evaluate the fitness of each particle in the population. 3.) Compare fitness values to identify the pbest and gbest solutions. 4.) Update the velocity of all particles using Eq. (1). 5.) Apply the disturbance operator to the moving vectors (velocities). 6.) Update the position of all particles using Eq. (2). 7.) Repeat steps 2 to 6 until a termination criterion is satisfied (e.g., the number of iterations reaches the predefined maximum or a sufficiently good fitness value is obtained).

The authors of [8] proposed a discrete binary version that allows PSO to operate in discrete problem spaces. In binary PSO (BPSO), the particle's personal best and global best are updated as in the continuous version; the major difference from the continuous version is that the velocity of a particle is interpreted as the probability that a bit flips to one. Under this definition, a velocity must be restricted to the range [Vmin, Vmax]: if v_id^(k+1) ∉ (Vmin, Vmax), then v_id^(k+1) = max(min(Vmax, v_id^(k+1)), Vmin).
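As a concrete illustration (a sketch, not the authors' implementation), the velocity and position updates of Eqs. (1)-(2), together with the Gaussian disturbance and the velocity clamping just described, can be written in Python with NumPy; the swarm shapes, noise scale, and disturbance rate here are illustrative assumptions:

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w, c1=2.0, c2=2.0,
             v_clip=6.0, disturb_rate=0.05, rng=None):
    """One PSO iteration: Eq. (1) velocity update, a mutation-like
    Gaussian disturbance on randomly chosen velocity entries,
    velocity clamping to [-v_clip, v_clip], then the Eq. (2)
    position update."""
    if rng is None:
        rng = np.random.default_rng()
    S, D = x.shape                          # swarm size, dimension
    r1, r2 = rng.random((S, D)), rng.random((S, D))
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # Eq. (1)
    # Disturbance: add Gaussian noise to a random subset of dimensions
    mask = rng.random((S, D)) < disturb_rate
    v = v + mask * rng.normal(0.0, 1.0, (S, D))
    v = np.clip(v, -v_clip, v_clip)         # restrict v to [Vmin, Vmax]
    x = x + v                               # Eq. (2)
    return x, v
```

The pbest/gbest bookkeeping and the fitness evaluation of steps 2, 3, and 7 would wrap around this single step in an outer loop.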


The new particle position is then calculated using the following rule:

If rand() < S(v_id^(k+1)), then x_id^(k+1) = 1; else x_id^(k+1) = 0    (3)

where S(v_id^(k+1)) = 1 / (1 + e^(−v_id^(k+1)))    (4)

The function S(v_id) is a sigmoid limiting transformation and rand() is a random number drawn from a uniform distribution on [0, 1]. Note that BPSO is susceptible to sigmoid saturation, which occurs when velocity values are either too large or too small; for a velocity of zero, the bit flips with a probability of 50%.

B. Support Vector Machine Classifier

The SVM starts from a linear classifier and searches for the optimal hyperplane with maximal margin. The main motivating criterion is to separate the various classes in the training set with a surface that maximizes the margin between them. It is an approximate implementation of the structural risk minimization induction principle, which aims to minimize a bound on the generalization error of a model. Given a training set of instance-label pairs (x_i, y_i), i = 1, 2, ..., m, where x_i ∈ R^n and y_i ∈ {+1, −1}, the generalized linear SVM finds an optimal separating hyperplane f(x) = <w · x> + b by solving the following optimization problem:

Minimize_{w,b,ξ}  (1/2) w^T w + C Σ_{i=1..m} ξ_i
Subject to:  y_i (<w · x_i> + b) + ξ_i − 1 ≥ 0,  ξ_i ≥ 0    (5)

where C is a penalty parameter on the training error and the ξ_i are non-negative slack variables. This optimization problem can be solved by the Lagrangian method, which maximizes the same dual Lagrangian L_D(α) (6) as in the separable case:

Maximize_α  L_D(α) = Σ_{i=1..m} α_i − (1/2) Σ_{i,j=1..m} α_i α_j y_i y_j <x_i · x_j>
Subject to:  0 ≤ α_i ≤ C, i = 1, 2, ..., m,  and  Σ_{i=1..m} α_i y_i = 0    (6)

To solve for the optimal hyperplane, the dual Lagrangian L_D(α) must be maximized with respect to the non-negative α_i under the constraints Σ_{i=1..m} α_i y_i = 0 and 0 ≤ α_i ≤ C. The penalty parameter C is a constant chosen by the user; a larger value of C corresponds to assigning a higher penalty to the errors. After the optimal solution α_i* is obtained, the optimal hyperplane parameters w* and b* can be determined, and the optimal decision hyperplane f(x, α*, b*) can be written as:

f(x, α*, b*) = Σ_{i=1..m} y_i α_i* <x_i · x> + b* = <w* · x> + b*    (7)

The linear SVM can be generalized to the non-linear SVM via a mapping function Φ, so that the training data can be linearly separated in the mapped space by applying the linear SVM formulation. The inner product (Φ(x_i) · Φ(x_j)) is calculated by a kernel function k(x_i, x_j) for the given training data. By introducing the kernel function, the non-linear SVM (optimal hyperplane) takes the following form:

f(x, α*, b*) = Σ_{i=1..m} y_i α_i* <Φ(x) · Φ(x_i)> + b* = Σ_{i=1..m} y_i α_i* k(x, x_i) + b*    (8)

Though new kernel functions continue to be proposed by researchers, there are four basic kernels:

• Linear: k(x_i, x_j) = x_i^T x_j    (9)
• Polynomial: k(x_i, x_j) = (γ x_i^T x_j + r)^d, γ > 0    (10)
• RBF: k(x_i, x_j) = exp(−γ ||x_i − x_j||^2), γ > 0    (11)
• Sigmoid: k(x_i, x_j) = tanh(γ x_i^T x_j + r)    (12)

where γ, r and d are kernel parameters. The radial basis function (RBF) of Eq. (11) is a common kernel function. To improve classification accuracy, the kernel parameter γ in the kernel function should be set properly.

IV. METHODS

For the htSNP selection problem described in Section 2, the following notation and definitions are used to present our proposed method. In the n×p matrix dataset U, each row (haplotype) can be viewed as a learning instance belonging to a class, and each column (SNP) is an attribute or feature on which sequences can be classified. Given the values of the g htSNPs of an unknown individual x and the known full training samples from U, the SNP prediction process can be treated as a feature selection problem: selecting tagging SNPs so as to predict the non-selected tagged SNPs of x. Thus tagging SNP selection can be transformed into binary classification of vectors with g coordinates using the support vector machine classifier. Here, an effective PSO-SVM model that hybridizes particle swarm optimization and the support vector machine with feature selection and parameter optimization is proposed to select the htSNPs appropriately. The particle representation, fitness definition, disturbance strategy for the PSO operation, and system procedure of the proposed hybrid model are described as follows.

A. Particle Representation

The RBF kernel function is used in the SVM classifier to implement our proposed method. The RBF kernel requires only two parameters, C and γ, to be set. Using the RBF kernel for the SVM, the parameters C and γ and the SNPs viewed as input features must be optimized


simultaneously for our proposed PSO-SVM hybrid system. The particle representation consists of three parts: C and γ are continuous variables, and the SNPs mask consists of discrete variables. For feature selection, if n_f features are available to choose from, then n_f + 2 decision variables must be encoded in each particle. Table 1 shows the particle representation of our design: particle i has dimension n_f + 2, where n_f is the number of SNPs (features), which varies between datasets. x_{i,1} ~ x_{i,n_f} ∈ {0,1} denote the SNPs mask, x_{i,n_f+1} holds the parameter value C, and x_{i,n_f+2} the parameter value γ. If x_{i,k} = 1, k = 1, 2, ..., n_f, the k-th SNP is selected on the i-th particle, and vice versa.

TABLE I. The particle i representation.

  Discrete variables (SNPs mask)       Continuous variables
  x_{i,1}, x_{i,2}, ..., x_{i,n_f}     C: x_{i,n_f+1}    γ: x_{i,n_f+2}

A random key encoding method [9] is applied in the PSO algorithm. Generally, random key encoding is used for order-based encoding schemes, where the value of the random key is the genotype and the decoded value is the phenotype. Each element {x_{i,k}}_{1≤k≤n_f} of a particle is assigned a random number on (0, 1) and is decoded in ascending order of its value. In the PSO learning process, elements with larger counts tend to evolve closer to 1 and those with smaller counts closer to 0. Therefore, a repair mechanism, such as the particle amendment in [5] that guarantees the number of htSNPs after the PSO update, is not required.
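As an illustration of the particle layout and one plausible reading of the random-key decoding (the exact decoding rule is not fully specified in the text, so the "top-g keys are selected" convention below is an assumption), a particle of length n_f + 2 can be split into an SNP mask and the two SVM parameters:

```python
import numpy as np

def decode_particle(particle, n_f, g):
    """Split a particle of length n_f + 2 into an SNP mask and the
    SVM parameters.  The first n_f entries are random keys on (0, 1);
    here we assume the g SNPs with the largest keys are the selected
    htSNPs (a hypothetical decoding convention).  The last two
    entries hold C and gamma directly."""
    keys = np.asarray(particle[:n_f])
    C, gamma = particle[n_f], particle[n_f + 1]
    order = np.argsort(keys)           # ascending order of key values
    mask = np.zeros(n_f, dtype=int)
    mask[order[-g:]] = 1               # top-g keys decode to 1 (selected)
    return mask, C, gamma
```

Because exactly g entries of the mask are set, no repair step is needed after a PSO update, which matches the point made above.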

B. Fitness Measurement

In order to compare the performance of our proposed approach with the published methods SVM/STSA in [4] and BPSO in [5], leave-one-out cross-validation is used to evaluate the fitness measurement. The prediction accuracy is measured as the percentage of correctly predicted SNP values among the non-selected SNPs. In the LOOCV experiments, each haplotype sequence is removed one by one from dataset U; the htSNPs are selected using only the remaining haplotypes and are then used to predict the tagged SNP values of the removed one. This procedure is repeated so that each haplotype in U serves once in turn as the validation data.

C. The Proposed Hybrid System Procedure

Figure 2 shows the system architecture of our proposed hybrid model.

Figure 2 The flowchart of the proposed PSO-SVM model.

Based on the particle representation and fitness measurement mentioned above, details of the proposed hybrid PSO-SVM procedure are described as follows:

Procedure PSO-SVM hybrid model

1.) Data preparation. The given dataset U is split into training and testing sets by the leave-one-out cross-validation process; the training and testing sets are denoted U_TR and U_TE, respectively.
2.) PSO initialization and parameter setting. Set the PSO parameters, including the number of iterations, number of particles, velocity limits, particle dimension, and disturbance rate. Generate initial particles comprising the feature mask, C and γ.
3.) htSNP (feature) subset selection. Select the input features of the training set according to the feature mask represented in the particle from step 2; this determines the feature subset.


TABLE II. Comparison of PSO-SVM with SVM/STSA [4] and BPSO [5] on three real haplotype datasets, 5q31 (103 SNPs), TRPM8 (101 SNPs) and LPL (88 SNPs): for each method, the number of selected htSNPs required to reach prediction accuracies from 80% to 99%.

4.) SVM model training and testing. Using the parameters C and γ represented in the particle, train the SVM classifier on the training set; the prediction accuracy of the SVM on the testing set is then evaluated by LOOCV.
5.) Fitness calculation. For each particle, evaluate the fitness value from the prediction accuracy obtained in the previous step. The best fitness value is stored to provide feedback to the PSO evolution process toward higher fitness in the next generation.
6.) Termination check. When the maximal evolutionary epoch is reached, the program ends; otherwise, go to the next step.
7.) PSO operation. In the evolution process, the discrete-valued and continuous-valued dimensions of PSO, with the disturbance operator, are updated to search for better solutions.
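Steps 1, 4 and 5 together amount to a LOOCV fitness evaluation. The sketch below illustrates one way to realize it, using scikit-learn's `SVC` in place of the authors' Matlab/Libsvm implementation; the per-tagged-SNP classifier scheme (one RBF-SVM per non-selected SNP, trained on the htSNP columns) is our reading of the prediction task, not the authors' exact code:

```python
import numpy as np
from sklearn.svm import SVC

def loocv_accuracy(U, mask, C, gamma):
    """LOOCV fitness: for each held-out haplotype, train one RBF-SVM
    per tagged (non-selected) SNP on the htSNP columns of the
    remaining haplotypes, and predict that SNP for the held-out row.
    Returns the fraction of correctly predicted tagged-SNP values."""
    n, p = U.shape
    ht = np.flatnonzero(mask)              # selected htSNP columns
    tagged = np.flatnonzero(mask == 0)     # columns to predict
    correct = total = 0
    for i in range(n):
        train = np.delete(np.arange(n), i)
        for j in tagged:
            y = U[train, j]
            if len(np.unique(y)) == 1:     # constant column: predict it
                pred = y[0]
            else:
                clf = SVC(C=C, gamma=gamma, kernel="rbf")
                clf.fit(U[train][:, ht], y)
                pred = clf.predict(U[i, ht].reshape(1, -1))[0]
            correct += int(pred == U[i, j])
            total += 1
    return correct / total
```

This value would serve as the particle's fitness in step 5; in practice one would cache classifiers or vectorize across tagged SNPs, since the loop above retrains an SVM for every (haplotype, tagged SNP) pair.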

V. EXPERIMENTAL RESULTS AND COMPARISONS

To validate the performance of the developed hybrid approach, three public experimental SNP datasets [4], 5q31, TRPM8 and LPL, are used to compare the proposed approach with other previously published methods. When missing data exist in the haplotype datasets, the GERBIL [4-5] program is used to resolve them. The 5q31 dataset comes from a 616-kilobase region of human chromosome 5q31 and contains 103 SNPs. TRPM8, which consists of 101 SNPs, was obtained from HapMap. The human lipoprotein lipase (LPL) dataset was derived from the haplotypes of 71 individuals typed over 88 SNPs. Our implementation was carried out in Matlab 7.3, a mathematical development environment, by extending the Libsvm library originally designed by Chang and Lin [10]. The empirical evaluation was performed on an Intel Pentium IV CPU running at 3.4 GHz with 2 GB RAM. Through initial experiments, the PSO parameter values were set as follows. The swarm size is set to 200 particles. The search ranges of the continuous dimension parameters are C ∈ [10^-2, 10^4] and γ ∈ [10^-4, 10^4]. For the discrete particle dimensions (the feature mask), we set [Vmin, Vmax] = [−6, 6], which by the sigmoid limiting transformation of Eq. (4) yields probabilities in the range [0.0025, 0.9975]. Both the cognition learning factor c1 and the social learning factor c2 are set to 2. The disturbance rate is 0.05, and the number of generations is 600. The inertia weight factors are wmin = 0.4 and wmax = 0.9, with the linearly decreasing inertia weight set as in Eq. (13), where i_now is the current iteration and i_max is the predefined maximum iteration:

w = wmax − i_now (wmax − wmin) / i_max    (13)

To compare the proposed PSO-SVM approach with SVM/STSA [4] and BPSO [5] on the three haplotype datasets by LOOCV experiments, the computational results, i.e., the prediction accuracies achieved for given numbers of selected htSNPs, are summarized in Table 2. As mentioned in [4], it may seem surprising that 80% prediction accuracy can be achieved with only one SNP in the 5q31 and TRPM8 datasets. In practice, guessing every SNP as 0 already yields a prediction accuracy of 72.5% for the 5q31 dataset and 79.3% for TRPM8; therefore, selecting one htSNP that correctly predicts 80% of the remaining non-selected SNPs is reasonable. The proposed PSO-SVM hybrid model clearly achieves higher prediction accuracy with fewer selected htSNPs on the three haplotype datasets. In general, prediction accuracy increases with the number of selected htSNPs. Figures 3 to 5 show that the number of selected htSNPs on each haplotype dataset grows with the required prediction accuracy, and that the PSO-SVM algorithm performs very well on the haplotype tagging SNP selection problem in the three test cases.
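The experimental settings above (swarm size, iteration budget, search ranges, and the linearly decreasing inertia weight of Eq. (13)) can be collected in a small configuration sketch; the values are those reported in the text, while the dictionary layout itself is illustrative:

```python
# PSO settings reported in Section V, collected for reference
PSO_CONFIG = {
    "swarm_size": 200,
    "generations": 600,
    "c1": 2.0, "c2": 2.0,
    "disturbance_rate": 0.05,
    "v_range": (-6.0, 6.0),        # discrete (mask) dimensions
    "C_range": (1e-2, 1e4),        # continuous SVM penalty C
    "gamma_range": (1e-4, 1e4),    # continuous RBF gamma
    "w_max": 0.9, "w_min": 0.4,
}

def inertia_weight(i_now, i_max, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight, Eq. (13)."""
    return w_max - i_now * (w_max - w_min) / i_max
```

The schedule starts at w_max = 0.9 on the first iteration and reaches w_min = 0.4 at the final iteration, matching the rule of thumb quoted in Section 3.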


VI. CONCLUSION

In this paper, a hybrid PSO-SVM model that combines particle swarm optimization (PSO) and the support vector machine (SVM) with feature selection and parameter optimization is proposed to solve the haplotype tagging SNP selection problem effectively. Several public datasets of different sizes are used to compare PSO-SVM with the previously published SVM/STSA and BPSO methods. The experimental results show the effectiveness of the proposed approach: with the hybrid PSO-SVM system, high prediction accuracy can be obtained with a smaller number of haplotype tagging SNPs.

Figure 3 The comparison result of prediction accuracy associated with selected htSNPs on the 5q31 dataset.

Figure 4 The comparison result of prediction accuracy associated with selected htSNPs on the TRPM8 dataset.

Figure 5 The comparison result of prediction accuracy associated with selected htSNPs on the LPL dataset.

REFERENCES

[1] K. Zhang, M. Deng, T. Chen, M. Waterman and F. Sun, "A dynamic programming algorithm for haplotype block partitioning," Proc. Natl. Acad. Sci., vol. 99, pp. 7335–7339, 2002.
[2] K. Zhang, F. Sun, S. Waterman and T. Chen, "Haplotype block partition with limited resources and applications to human chromosome 21 haplotype data," Am. J. Hum. Genet., vol. 73, pp. 63–73, 2003.
[3] H. Avi-Itzhak, X. Su and F. de la Vega, "Selection of minimum subsets of single nucleotide polymorphisms to capture haplotype block diversity," in Proceedings of the Pacific Symposium on Biocomputing, vol. 8, pp. 466–477, 2003.
[4] J. He and A. Zelikovsky, "Informative SNP selection methods based on SNP prediction," IEEE Transactions on NanoBioscience, vol. 6, pp. 60–67, 2007.
[5] C.-H. Yang, C.-H. Ho and L.-Y. Chuang, "Improved tag SNP selection using binary particle swarm optimization," IEEE Congress on Evolutionary Computation (CEC 2008), pp. 854–860, 2008.
[6] V. N. Vapnik, The Nature of Statistical Learning Theory, New York: Springer-Verlag, 1995.
[7] E. Halperin, G. Kimmel and R. Shamir, "Tag SNP selection in genotype data for maximizing SNP prediction accuracy," Bioinformatics, vol. 21, pp. 195–203, 2005.
[8] J. Kennedy and R. C. Eberhart, "A discrete binary version of the particle swarm algorithm," in Proceedings of the World Multiconference on Systemics, Cybernetics and Informatics, Piscataway, NJ, 1997, pp. 4104–4109.
[9] J. C. Bean, "Genetics and random keys for sequencing and optimization," ORSA J. Comput., vol. 6, pp. 154–160, 1994.
[10] C.-C. Chang and C.-J. Lin, LIBSVM: a library for support vector machines, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2001.
