10055

UDC

Prediction of effects associated with single-point protein mutations and Study of mutation databases


http://202.113.20.161:8001/index.htm

2010

5

25

1120070340

2010

5

25

/

/ 022 ( ) 81501949 94 Email gao_shan@mail.nankai.edu.cn 220

Abstract

Mutations are changes in DNA sequences and include many types, including substitutions, insertions and deletions, amplifications, etc. Mutations can be roughly classified into natural mutations, randomly induced mutations and site-directed induced mutations. Mutagenesis experiments play an indispensable role in basic biological research (e.g. the investigation of protein structure-function relationships, the identification of DNA-protein interaction sites) and in applications (e.g. drug design, gene therapy). In forward genetics, researchers start with a mutant phenotype from natural mutations or random mutagenesis experiments and work toward identifying the mutated genes. In reverse genetics, based on large-scale genome sequencing, researchers can use site-directed mutagenesis experiments to study the functions of genes or elements on DNA sequences, the structures of RNA or proteins, or other properties.

The rapidly accumulating molecular biology data from mutagenesis experiments have made it possible to study mutation problems systematically with bioinformatics methods. To facilitate such studies, a number of mutation databases have been developed. However, the heterogeneity of those databases makes it difficult to submit, exchange, and use mutation data. This raised the issue of integrating and standardizing existing mutation databases. The Human Variome Project (HVP) was initiated to provide unified, standardized and high-quality mutation data. Data mining and knowledge discovery based on mutation databases is another important class of tasks in HVP. Among those tasks, one of the most significant challenges is to predict the effects of protein point substitutions (mutations). The prediction results can be used to guide biological experiments directly. Moreover, this kind of research lays a foundation for further study of related biological problems (e.g. studies of protein functions).

The research work presented in this dissertation includes two parts. In the first part (chapter 2), HVP and its development are introduced first. We then address some problems related to the integration and standardization of mutation databases. We propose the Hierarchical Entity-Relation Graph (HERG) model, which can be used to depict published molecular biology databases graphically. The HERG model can also be extended into a basic model in a unified framework for standardizing the heterogeneous databases. In the second part (chapter 3), we report a novel substitution-matrix based kernel for the support vector machine (SVM) and its application in predicting the effects of protein point substitutions (mutations), based on a large dataset extracted from the Protein Mutant Database (PMD). We demonstrate the advantages of the new kernel over classical SVM kernels. We conclude this part with a discussion of the meaning of substitution-matrix based kernel functions using information theory.

Key words: mutation, human variome, data model, standardization, SVMs, substitution-matrix, prediction


1.1 Bioinformatics Molecular Bioinformatics Neuroinformatics 80 System Biology National Institutes of Health Technology Initiative Consortium BISTIC NIH Biomedical Information Science and [1] Computational Biology Quantitative Biology Mathematical Biology BISTIC [1] Biostatistics Biomedical Informatics 1 .1.1.1 1.1.

1.1.1 2 .1 1.Medical Informatics Information Technology Research Computing IT Healthcare Informatics Advanced Biomedical Sciences 1.2 [2] 1.1 1.

1.org 3 .1.1 1.1.1 1.oxfordjournals.3 [3] Nucleic Acids Research 2000 [4] The Molecular Biology Database Collection 2004 1 1170 14 2009 [5] 2009 4 Oxford University Press [6] http //database.

post genome era RNA 1.2 1.1.2.4 Support Vector Machine A SVM 1.1.1 DNA [7] [2] DNA RNA DNA RNA DNA RNA mutation polymorphism [9] variation alternation [8] mutation polymorphism variation 4 .1.1.

2.1.2 [13] 10bp 5 .[8][9][10][11] single nucleotide polymorphism SNPs SNPs SNPs cSNPs SNPs[12] DNA 1% SNPs coding region SNPs perigenic SNPs iSNPs sSNPs SNPs pSNPs nsSNPs SNPs SNPs intergenic SNPs SNPs non-synonymous cSNP SNPs synonymous SNPs 1.

2.point mutation substitution A T transversion synonymous mutation nonsynonymous or missense mutation frameshift mutation 1.2 G transition C/T [14] C A/G DNA silent mutation nonsense mutation neutral mutation 1.1.3 6 .

gene mutation wild type gene dominant mutation a a A back mutation or reversion insertion DNA repeating element deletion DNA DNA transposable element A a allele recessive mutation A 1.3 1.2 7 4 .

3B rearrangement inversion gene 1.1.1.3 1.3C translocation DNA tandem duplication segmental duplication fusion natural variant random mutagenesis site-directed mutagenesis hereditary mutation germline mutation somatic mutation acquired mutation de novo mutation mosaic mutation heterozygous mutation homozygous mutation mutation .2. loss-of-function mutation gain-of-function mutation 8 compound heterozygous .3A amplification or duplication gene duplication 1.

5 genetic marker 4 4 marker morphological marker cytological marker DNA biochemical molecular marker 9 .1.1.2.2.dominant negative mutation mutation mutation mutation DNA mutation in coding region mutation in intron region 1.4 forward genetics mutation in regulatory region lethal mutation antimorphic beneficial or advantageous harmful or deleterious mutation nearly neutral mutation neutral mutation in exon mutation in splicing genotype [13] phenotype mutant phenotypic variation [16] [15] genetic variation detection mutant screening mapping cloning sequencing 1.

1 Polymerase Chain SNPs Reaction PCR Denaturing Gradient Gel Electrophoresis Heteroduplex Analysis CDI CFLP of Mismatch PCR Conformational Polymorphism CCM dideoxy Fingerprinting DNA HA DGGE Carbodiimide EMC Cleavage Fragment Length Polymorphism Enzyme Mismatch Cleavage ddF DNA chip Chemical Cleavage PCR PCRPCR-SSCP [18] PCR Single-Strand linkage analysis association analysis [20] [19] RFLP pedigree population [21] 10 .2.1.Restriction Fragment Length Polymorphism RFLP DNA DNA [17] DNA 1.

recombinant frequency map unit Log Odd score Affected Sib Pair Member APM [22] LODs ASP [19] Affected Pedigree haplotype frequency Linkage Disequilibrium LD [23] Case-Control study CC CC population stratification family based design Haploid-Relative-Risk Transmission Disequilibrium Test TDT [21] HRR functional cloning phenotype cloning map-based cloning 1.4 11 [24] .4 1.

5 E.genotyping DNA 1. coli XL-1 Red PCR 12 X .

6 PCR [18] homologue reverse genetics 13 .PCR DNA shuffling Staggered Extension Process PCR error prone PCR StEP [25] DNA RPR Random-Priming Recombination 1.

5 [27] 14 .5 M13 PCR PCR DNA M13 PCR PCR 1.2.6 Pfu-PCR [18] [26] Eckstein UMP [18] Kunkel PCR DNA [18] 1.1.1.

1.1.7 1 1.3.1.1 14 Human Genome Organisation HUGO 15 .3 Human Variome Project HVP HVP 1.

1.1.3.1 HVP HVP 1.2 1990 1995 1993 1998 [35] LSDBs [36] HVP[37] 1.Human Genome Variation Society Horaitis Database 2.3.1.1.1 1.1.2.2.1 3 [31] [28] HGVS Central LSDBs SNP 2 Locus-Specific Databases [29] [30] 262 [32] [33] [34] 2.3 16 .

3.1.4 SNPs SNPs neutral deleterious SNPs A disease associated 17 .Rogozin Spectra DNA Cox [39] [38] Mutation [40] [41] 19 Collins Wang Gut [42] SNPs SNPs SNPs Aerts [44] [45] Martin ISAB [43] TP53 SNPs Pedigree Tool Soussi [47] Lachmund Cai TP53 Freimuth [46] web TP53 PolyMAPr Nalla Gao [52] [50] [49] [48] Greenblatt[51] Gefen 18 1.

2 homologues / / identity [57][69][71] u 10 18 .1 Decision Trees DT empirical rules [46][63][64] [41][54-60] [65][66][67] [61][68-72] [61][62] [69] mutagenesis 3 [73][74] T4 lac repressor bacteriophage T4 lysozyme [75][76] HIV-1 2.1 HIV-1 protease [77] SWISS-PROT/TrEMBL HGVbase HGMD PMD 40% pseudo pseudo 3.1.Single Amino acid Polymorphism Care [53] SAP nsSNPs 1.2.1.2.

2 1 2 HERG 3 SVM 19 .2.solvent accessibility Position-Specific Scoring Matrix substitution score matrix PSSM 1.1 1 2 3 1.2.

2.8 HERG 1.3 1 1.8 2 20 .1.

1.1 2.hgvs.HERG HERG 2.1.org/dblist/glsdb.1 LSDBs LSDBs HGVS LSDBs http //www.1.html Human Gene Mutation Database [81] HGMD [78-80] Online Mendelian Inheritance [82] in Man OMIM Genome Database dbSNP [84] [85] GDB [83] Human Genic Bi-Allelic Sequences HGBASE LSDBs curation 21 .

HERG

2.1.1.2 1994 ASHG Disorders Research Centre Melbourne mutation databases Human Genome Organisation HUGO 1996 HUGO HUGO-MDI http 6 HVP
[86]

American Society of Human Genetics Genomic Australia Richard.G.H.Cotton

HUGO Mutation Database Initiative Human Mutation journal/38515/home Science
[87]

2001

ASHG

HGVS //www3.interscience.wiley.com/ 2006

Human Variome Project

2.1.2
HVP HVP 2004 System HGVSYS HVP 2.1.2.1 HGVSYS HGVSYS HGVSYS 3 LSDBs HGVbase WayStation Central Database national and ethnic-specific mutation databases 2.1 HVP
[88]

the Human Genome Variation HVP
[36]

2006


2.1

HGVSYS
[36]

WayStation

Human Mutation Genome Variation Reports PubMed PubMed ID 2.2 GVRs GVRs WayStation 2003 //www.centralmutations.org/ 2005 PubMed ID 7 http WayStation

HGVSYS

HGVSYS

WayStation Gene Editor
[36]

WayStation Review Board WayStation //www.hgvs.org/geneeditors.html

92%

LSDBs http 1

HGVS


2

2.2

WayStation

3 5 6 WayStation Steve Callaghan 7

4

LSDBs

WayStation

LSDBs HGVS The Medical Research Council Human Genetics Unit

1

2 HVP 2006 HVP 6 20 10 96 [88] 3 4 23 12 HVP 10 25 .1.2.HERG 2 web 5 2.

HERG 12 Coordinating office Howard Florey Institute HVP HVP 10-12 HVP The clinic and phenotype Richard Cotton 1996 & HVP Diagnostic laboratories / Disease-Specific Database / HGVS Variation/linkage of common diseases/research laboratory Haplotype Informatics and central databases LSDBs HVP 26 .

1 27 . collection and locus-specific databases LSDBs LSDBs HVP Developing countries¶ international liaison HVP Funding and sustainability HVP Nomenclature and standards HVP Ethics and education HVP Publication and scientific journals Translation HVP 2.HERG Curation.3 1996 2006 HUGO 2.1.

HVP 28 . HVP HVP 4.HERG 2. HVP HVP 2. HVP 3.1 HVP [89 [95] [11] [38 [47 [98 [104] [106] [107] [78] [108 [113 90] 1996 1999 1998 42] 99] [91] 1998 [96] [97] 2000 2000 [43 [50] [100] [105] 44] [92] 2000 [93 94] 2002 2001 2000 2002 [51 [45 [101 46] 2004 2005 49] 2005 2008 2000 2000 1997 114] [79] 2008 2009 2004 2009 52] 2010 103] 1998 [115] [80 2009 85] 2000 [112] 2007 2010 109] 2001 [110] 2003 [111] 2005 [116-117] 2.4 HVP HVP 1.1.

1 Horaitis [28] 1.HERG 5.3.1 2. HVP HVP HVP HVP HVP HVP HVP 2.1.1.4 2.2.1 2.2.2. HVP 6.3 2.2.2.1.1 Horaitis [28] Gene29 .2 2.

1.1.alzgene.org/dblist/glsdb.org/ Alzheimer Disease and Haematology Atlas of Genetics and Cytogeneticsin Oncology http //atlasgeneticsoncology.or disease-specific mutation databases LSDBs [28] Horaitis HGVS LSDBs HUGO Gene Nomenclature Committee 2.fandm.html gene symbol 2.org http 2.edu HGVS SNP http //www.mitomap.3 HGVS 30 LSDBs .3 disease-specific mutation databases AlzGene HGNC http //www.3 http //www.hgvs.1 Central Mutation & 16S and 23S rRNA Mutation Database 2.org/ HGVS Disease Centered Central Mutation Databases system -specific mutation databases Mitochondrial Mutation Databases RNA //ribosome.HERG system.

2.1.1 record or entry field domain 2.2 4 completeness availability 2.2 31 .2 B quality nonredundancy richness 2.HERG SNP Databases Horaitis [28] HGMD[78-80] HGVbase[85] PMD[118] B 2.2.2.2.1.2.1 HGVS NCBI[119] Central repositories SWISS-PROT/TrEMBL[120] 2.3.

2.2.2.4 32 .3 2.2.HERG cross check feed back wiki WikiGenes [121] 2.

4 Entrez local deployment Application Programming Interface API remote module XML Schema [122] eXtensible Mark-up Language SQL scritps XML 33 .HERG Accessibility web 2.

2.2 34 .3 3 integration 2.2.3.2.3.HERG 2.4 Entrez Federation [126] 2.1 View integration Warehouse [123] link SRS ARX [124] Entrez [125] 2.5 KIND [127] overview 2.

2.6 35 .3.3 Mediation [126] KIND B/S Brower/Server architecture level Data access level Representation 2.5 2.HERG KIND[127] wrapper-mediator 2.6 B/S 2.

3.3.3.2.2.3 2.2.2.3 Data access level syntax 36 .7 [123] 2.2.2 IGD Integrated Genome Representation level 2.HERG 2.2.1 2.4 2.3 Database [128] 2.

xgap.1 2.hl7.4 flat file relational XML 2.3.2.2 Polymorphism Markup Language PML [138] object Phenotype and Genotype Experiment Object Model PaGE-OM [138] Proteomics [137] Standard Initiative Model Molecular Interaction format PSI-MI PharmGKB MAGE-OM MicroArray Gene Expression Object Model XGAP [141] [139] the model of Pharmacogenetics and Pharmacogenomics Knowledge Base [140] the Extensible the Genomic Genotype and Phenotype Model Genomics Model HL7-CGM [142] www.HERG semantics field nomenclature vocabularies ontologies HERG 2.org Sequence Variation Markup Language GSVML Experiment Model FuGE GSVML HL7-CGM the Health Level Seven Clinical the Functional Genomics XGAP FuGE www.org PML*# Japan Biological Information Consortium JSNP[129] HGVBase[130] HapMap[131] HGVBaseG2P[132] BIND [133] DIP[134] PaGE-OM*# PSI-MI*# PML Human Proteome Organization HUPO Proteomics Standards Initiative PSI 37 .

2 HERG 2.HERG HPRD[135] MINT[136] PharmGKB[137] The model of PharmGKB*# MAGE-OM &MAGE-ML* XGAP GSVML HL7-CGM* FuGE*# International Warfarin Pharmacogenetics Consortium IWPC Microarray Gene Expression Data Group MGED Rosetta Agilent Company and Affymetrix Company Groningen Bioinformatics Center Tokyo Medical and Dental University HL7 Clinical Genomics Work Group Rosseta biosoftware Company 2.8 HERG 6 Sequences & Genome Molecular Phenome Phenome Documentation Tools 4 Sequence Features Networks & Pathways Structures Terms & Nomenclatures Others Experiment 38 .3.3.2.2 MAGE-ML * MAGE-OM XML XML # XSD 2.1 Hierarchical Entity-Relation Graph Model 2.

8 HERG 39 .HERG DNA RNA 2.

2 HERG XML Schema 2.2 XML XML XML Schema HERG C XSD XML Schema Definition XML Schema 2.9 40 .3.HERG post-translational modification binding enzyme active [143] 2.2.

10A 99 HERG 1 0 4 2009[5] 1 PMD 41 [118] .9 2.3 HERG HERG [3] 1170 HERG 99 & 99 HERG SNPs 2.3.3.3 HERG HERG 2.1 2000[4] 132 33 99 2.3.3.2 HERG HERG HERG 6 0 1 PMD HERG HERG 99 2.2.HERG 2.3.3.2 2.

10A 99 2.10 HERG PMD SNPeffect 2.HERG 2.3.3.3 cluster 42 .

11B 4 pattern1 3 2.HERG database pattern HERG 2.11C pattern3 3 pattern7 3 pattern9 2.11 2.3.11A 2 pattern4 2.4 2 3 3 pattern6 4 2.11 2 Genotype to Phenotype [138] G2P 2 variation gene/protein function 43 3 / SNP .3.11 2.

8 3/99 2.HERG DNA PMD SNPeffect[143] 2.10B 2.8 99 19/99 hotplot 8/99 The Swiss-Prot Variant Page and the ModSNP Swiss-Prot PharmGKB [137] [144] 5 1 111 112 [i] 3 [j] 44 .

Human Genome Project (HGP)

Feature extraction; Coding; Feature selection

Sliding window (window length 2n+1, centred on site i):

$$x_i = x_i^{-n}\, x_i^{-n+1} \cdots x_i^{-1}\, x_i^{0}\, x_i^{1} \cdots x_i^{n-1}\, x_i^{n}$$

jp/aaindex/ 3.2.2 x i ! x i n x i n 1 .1.2.3 46 .xi1 xi0 x i1 .1 20 N 2 5 ! 32 A 01000000000000000000 00000000000000000001 21 0 21 [146] C 5 20 0[145] 10000000000000000000 20 C « Y 21 20 0 C D 00100000000000000000 N 20 1 21 3.2 index AAindex [147] http //www.SVM n=3 3.1.genome.1.2.xin 1 x in 20 3 3.1.
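The sliding window and the binary coding of residues described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the residue ordering in `AMINO_ACIDS`, the padding symbol `PAD` used for window positions that fall beyond the N or C terminus, and all function names are hypothetical. Each residue is coded as a 21-bit vector (20 amino acids plus one padding bit), so a window with n = 3 gives 7 × 21 = 147 features.

```python
# Assumed residue ordering; the ordering used in the dissertation is not recoverable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # hypothetical symbol for positions outside the sequence
ALPHABET = AMINO_ACIDS + PAD


def one_hot(residue):
    """Code one residue as a 21-bit vector with a single 1."""
    vec = [0] * len(ALPHABET)
    vec[ALPHABET.index(residue)] = 1
    return vec


def window(sequence, position, n=3):
    """Return the 2n+1 residues centred on a 1-based position, padded at the termini."""
    centre = position - 1
    return "".join(
        sequence[k] if 0 <= k < len(sequence) else PAD
        for k in range(centre - n, centre + n + 1)
    )


def encode_window(sequence, position, n=3):
    """Concatenate the one-hot codes of all residues in the window."""
    features = []
    for residue in window(sequence, position, n):
        features.extend(one_hot(residue))
    return features


seq = "QSEPEDLLK"
print(window(seq, 5))              # window around the 5th residue
print(len(encode_window(seq, 5)))  # number of binary features
```

A one-hot code keeps all residues equidistant from each other, which is why a substitution matrix is later needed to express that some replacements are more conservative than others.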

2 [145] 3.SVM residue composition 20 sequence profile 3.2 5 S eq QSEPEDLLK 20 N 20 v 9 AM S eq C 20 3.3 47 .2 3.1.

4.1.4.1.2.2 / performance evaluation 3.2.1 training set set validation set 3 model selection [68] test 8 3.2 SVM 3 3.5 1 1 10-fold k-2 48 k k=10 k-1 .2.1 3.SVM [148] filter 2 3 SVM Wrapper 3.

2 Resubstitution [149] 3.1.5 49 .1.2.2.1.1.3 Holdout Holdout 3.2.4 Holdout Leave-One-Out Jackknife l l l -1 l l l l 3.2.SVM / / 3.

2. .2.2.2 binary 1/-1 1/0 ti yi / performance measures l Y ! y1 .1 confusion matrix confusion matrix contingency measures 3.SVM Cross Validation l k-fold ku2 k k l k-1 k k 3 k k 5 7 10 k k k 3. t l l N l!P N P   t! l y! 1 l § yi l i !1 T Y 3. . 1 l § ti l i !1   T ! t1 .3 TP FN FP TN matrix 50 . y l .
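The k-fold cross validation mentioned in this section (l samples split into k folds, k-1 folds for training and one for testing, with k = 10 used later) can be sketched as below. The function name, the fixed random seed and the round-robin assignment of shuffled indices to folds are assumptions made for the example.

```python
import random


def k_fold_indices(l, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation over l samples."""
    if not 2 <= k <= l:
        raise ValueError("k must satisfy 2 <= k <= l")
    indices = list(range(l))
    random.Random(seed).shuffle(indices)
    # Round-robin assignment keeps fold sizes within one sample of each other.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


for train, test in k_fold_indices(l=25, k=5):
    print(len(train), len(test))
```

Setting k = l reproduces Leave-One-Out, at the cost of training l models.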

Percentages (sensitivity, specificity and accuracy) are computed from TP, FP, TN and FN, with P = TP + FN and N = TN + FP:

Sensitivity of positive examples:
$$Sn^{+} = \frac{TP}{TP + FN} = \frac{TP}{P}$$

Specificity of positive examples:
$$Sp^{+} = \frac{TP}{TP + FP}$$

Sensitivity of negative examples:
$$Sn^{-} = \frac{TN}{TN + FP} = \frac{TN}{N}$$

Specificity of negative examples:
$$Sp^{-} = \frac{TN}{TN + FN}$$

Overall Accuracy:
$$Ac = \frac{TP + TN}{TP + TN + FP + FN}$$

Error Rate:
$$Err = 1 - Ac$$
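The percentage measures above translate directly into code. The sketch below is illustrative only; the function name and dictionary keys are assumptions. The counts in the usage lines are the TP, FP, TN and FN values that appear for IP-SVM1 in the results of Section 3.4.

```python
def percentages(tp, fp, tn, fn):
    """Return the percentage measures defined from the confusion matrix."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    return {
        "Sn+": tp / (tp + fn),  # sensitivity of positive examples
        "Sp+": tp / (tp + fp),  # specificity of positive examples
        "Sn-": tn / (tn + fp),  # sensitivity of negative examples
        "Sp-": tn / (tn + fn),  # specificity of negative examples
        "Ac": accuracy,         # overall accuracy
        "Err": 1 - accuracy,    # error rate
    }


ip_svm1 = percentages(tp=27628, fp=9195, tn=795, fn=190)
for name, value in ip_svm1.items():
    print(f"{name}: {value:.2%}")
```

A high Sn+ together with a low Sn- indicates a classifier that labels almost every sample as positive, which overall accuracy alone does not reveal on an unbalanced dataset.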

Sn  Q + Sp 

Q P

P + Q2

Sn  Coverage 3.2.2.3 Lp distances 3.2 numerical
yi ti

Sp  Precision

Recall Hit Rate

binary 1

1/0

p

2

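The Lp distance between the predictions y_i and the targets t_i is

$$d_p = \Big[\sum_i |y_i - t_i|^p\Big]^{1/p}$$

For p = 1 (L1), d_1 is the Hamming distance; with binary 1/0 coding of t_i and y_i, d_1 = FP + FN. For p = 2 (L2), d_2 is the Euclidean (quadratic) distance, also called LMS or MMS (Least/Minimum Mean Square).

Correlation Coefficient

The Pearson Correlation Coefficient (CC) is

$$CC = \frac{(T-\bar T)(Y-\bar Y)^T}{\sqrt{[(T-\bar T)(T-\bar T)^T]\,[(Y-\bar Y)(Y-\bar Y)^T]}} = \frac{TY^T - l\,\bar t\,\bar y}{\sqrt{(TT^T - l\,\bar t^{\,2})(YY^T - l\,\bar y^{\,2})}}$$

For binary 1/0 coding,

$$TT^T = TP + FN,\qquad YY^T = TP + FP,\qquad TY^T = TP,\qquad \bar t = (TP+FN)/l,\qquad \bar y = (TP+FP)/l$$

so that

$$CC = \frac{TP - l\,\bar t\,\bar y}{l\,\sqrt{\bar t\,\bar y\,(1-\bar t)(1-\bar y)}}$$

which is the Matthews Correlation Coefficient (MCC) [150]:

$$MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FN)(TP+FP)(TN+FN)(TN+FP)}}$$

MCC ranges from -1 to 1 and equals 0 for a random prediction. It is undefined when a factor of the denominator is zero, e.g. when TN + FN = 0.

Approximate Correlation Coefficient

Burset: ACP (Average Conditional Probability) [150]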
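As a worked check of the Matthews Correlation Coefficient, the sketch below computes MCC from the four confusion-matrix counts. The function name is an assumption, and returning 0.0 when the denominator vanishes (for example when TN + FN = 0) is a convention chosen for this example, not something stated in the text. The counts in the usage line are the TP, FP, TN and FN values that appear for IP-SVM1 in the results of Section 3.4.

```python
import math


def matthews_cc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fn) * (tp + fp) * (tn + fn) * (tn + fp))
    if denominator == 0:
        # MCC is undefined here; 0.0 is an assumed fallback for this sketch.
        return 0.0
    return (tp * tn - fp * fn) / denominator


print(round(matthews_cc(tp=27628, fp=9195, tn=795, fn=190), 4))
```

Unlike overall accuracy, MCC stays close to 0 for a classifier that assigns nearly every sample to the majority class.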

5) AC ACP 3.SVM 0 ACP ! 0 TP TN TN » 1 « TP ¬ TP  FN  TP  FP  TN  FP  TN  FN ¼ 4­ ½ 3.4B Alarm Rate FAR E ti ! 1 ¢ Mutual Information y i ! 1 True FN FN H0 FNR False Match Rate FMR False F TP TP TPR t i ! 1 False Positive Rate FRR P FAR FPR False Non-match t i ! 1 N False 53 .2.6 Approximate Correlation AC ! 2 ( ACP  0.7 3.6 Relative Entropy Kullback Leibler KL [150] 3.2.7 ROC Receiver Operating Characteristic curve null hypothesis H0 H0 t i ! 1 TN TN TNR ti ! 1 Negative Rate False Negative Rate Accept Rate H0 yi ! 1 FAR H1 alternative hypothesis ti ! 1 True Positive Rate FP FP H0 False Reject Rate Rate FNMR l 3.2.2.

9 E 0.9 F E ROC F .4 TNR FNR TPR E E E F F F E F FPR 3.05 E E F TN FP !1 !1E N N FN FNR ! FAR ! !F P TP FN TPR ! !1 !1 F P P FP FPR ! FRR ! !E N TNR ! E U 54 3.SVM 3.

5A EER EER FRR FAR AUG 3.1.5 Area Under the Receiver Operating Characteristic Curve AUG = 1 3.5 ROC Equal Error Rate ROC AUC 3.SVM U F E 3.3.1 SVM [151] [152] 3.5A 3.1 SVM 55 over fitting .5B FRR FPR -FAR FNR 3.5B AUG = 0.4A F U E TPR-FPR 3.3.
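The ROC curve and the area under it (AUC) discussed above can be computed by sweeping the decision threshold over the SVM outputs. The following is a minimal sketch, assuming targets coded as +1/-1 and real-valued decision scores; the function names and the trapezoidal integration are choices made for the example.

```python
def roc_points(targets, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step."""
    p = sum(1 for t in targets if t == 1)
    n = len(targets) - p
    if p == 0 or n == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, targets), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Samples with equal scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n, tp / p))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


pts = roc_points([1, -1, 1, -1], [0.9, 0.8, 0.7, 0.1])
print(pts, auc(pts))
```

An AUC of 1 corresponds to a perfect ranking of positives above negatives, and 0.5 to a ranking no better than chance.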

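Key notions: generalization, overfitting, underfitting, robustness.

The soft-margin SVM solves the primal problem

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}\xi_i \qquad \text{s.t.}\quad t_i\,[\langle w, \phi(x_i)\rangle + b] \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1, 2, \dots, l$$

where the parameter C balances the margin term against the training errors, and thereby overfitting against underfitting. The corresponding dual problem is

$$\min_{\alpha}\ Q(\alpha) = \frac{1}{2}\,\alpha^T G\,\alpha - c^T\alpha \qquad \text{s.t.}\quad t^T\alpha = 0,\quad [0, 0, \dots, 0]^T \le \alpha \le [C, C, \dots, C]^T \qquad (3.10)$$

with

$$\alpha = [\alpha_1, \alpha_2, \dots, \alpha_l]^T,\quad t = [t_1, t_2, \dots, t_l]^T,\quad c = [1, 1, \dots, 1]^T,\quad G_{ij} = t_i t_j K(x_i, x_j),\quad i, j = 1, 2, \dots, l$$

Because Q is quadratic, for any two feasible points α and α'

$$Q(\alpha') = Q(\alpha) + \nabla Q(\alpha)^T(\alpha' - \alpha) + \frac{1}{2}(\alpha' - \alpha)^T G\,(\alpha' - \alpha)$$

so Q is convex, i.e.

$$Q(\alpha') \ge Q(\alpha) + \nabla Q(\alpha)^T(\alpha' - \alpha)$$

exactly when

$$(\alpha' - \alpha)^T G\,(\alpha' - \alpha) \ge 0$$

that is, when G is positive semidefinite.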

SVM 3. x)  b * ) i !1 SVM 3.5 f f f [151] Linear Function Radius Base Function RBF 57 Sigmoid £ l f Polynomial Function 3.1. . l} a H ! _ : R n p Z .3. 2.2 3.5 {( x i . t i ) x  R n i .11 .  1 }.4 SVM 3. - .1.3. SVM y ! f ( x ) ! sgn( § E i t i ( x i . Z ! _ .3. k a f 1 H SVM H f A.1.. t i  { 1 .3.1.5 [153] SVM SVM 3. i ! 1 .3.

Classical kernel functions [153]:

Polynomial:
$$K(x_i, x_j) = (1 + \langle x_i, x_j\rangle)^d$$

RBF:
$$K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / \sigma^2),\qquad \|x_i - x_j\|^2 = \langle x_i, x_i\rangle + \langle x_j, x_j\rangle - \langle x_i, x_j\rangle - \langle x_j, x_i\rangle$$

Sigmoid:
$$K(x_i, x_j) = \tanh(k_1\langle x_i, x_j\rangle + k_2) \qquad (3.11)$$

Similarity matrices: DNA similarity matrix; amino acid similarity matrix (also called amino acid mutation matrix or score matrix) over the 20 amino acids [147]; identity matrix (1 for identical residues, 0 otherwise); Genetic Code Matrix (GCM).
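The classical kernels of (3.11) can be written out in a few lines. This is a plain-Python sketch for feature vectors given as lists of numbers; the function names and default parameter values are assumptions made for the example and are not the settings used in the experiments.

```python
import math


def dot(x, z):
    """Inner product <x, z> of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, d=2):
    return (1 + dot(x, z)) ** d


def rbf_kernel(x, z, sigma=1.0):
    # ||x - z||^2 expanded into inner products, as in (3.11).
    squared_distance = dot(x, x) + dot(z, z) - dot(x, z) - dot(z, x)
    return math.exp(-squared_distance / sigma ** 2)


def sigmoid_kernel(x, z, k1=1.0, k2=0.0):
    return math.tanh(k1 * dot(x, z) + k2)


x, z = [1, 0, 0, 1], [1, 0, 1, 0]
print(polynomial_kernel(x, z), rbf_kernel(x, z), sigmoid_kernel(x, z))
```

Writing the RBF distance in terms of inner products is what allows the inner product to be swapped for another similarity score later on.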

6 BLOSUM62 PAM1 n n 59 .SVM Point Accepted Mutation PAM PAM 34 85 relative mutability mutation probability matrix 10 PAM1 log odds matrix 1% [154] 1572 71 10 PAM 100 PAM PAMn n>1 3.

x j ) 3. x j  G 20 u u G ! { A. V . C .2 3.12 xi .14 3.x n1 x n j j j j j j S (xi . s ( xi .0 2 Block 1961 Swiss-Prot20 504 n>1 BLOSUM62 62% 3. T .1 PAM PAM BLOSUM BLOSUM PAM1 PAM250 BLOSUM62 x i ! x i n x i n 1 .1 30% 3.3.x 1 x 0 x1j . x j ) p « n u u » S (x i .xi1 xi0 x i1 . K . x j ) ! ¬ § s( x i . P . E . x j ) xi . R . Y } .3. N . Q.SVM BLOcks SUbstitution Matrix Henikoff [155] BLOSUM PAM BLOSUM Prosite8. D. S . I . x j " S (xi . F . x j )¼ ­u !  n ½ 3. G.xin 1 x in x j ! x  n x  n 1 .6 BLOSUM62 60 u u ¤ . W . L. H .

1.current 2000 12 24 VB 3.13 3.1.4. x j )  k 2 ) d 3.ac.734 [0] [+] [-] 61 .07Mar26. x j )  S ( x j . x j )) d K ( x i . x j ) ! (1  S ( x i .SVM K ( x i .1 PMD pmdchseq. x i ) K ( x i .294 t i ! 1 10.nig. x j )  S ( x i .4.4. x j ) ! exp(  || x i  x j || 2 / W 2 ) || x i  x j || 2 ! S ( x i .Z Visual Basic) PMD FUNCTION 40028 20 A234N A 234 234 A N B CHANGE ftp://spock. x i )  S ( x j .1 3.2 FUNCTION B ti ! 1 [=] 29.jp/pub/pmd/ pmd. x j ) ! tanh( k 1 S ( x i .genes.
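The substitution-matrix based RBF kernel of (3.12) and (3.13) replaces the inner product of two windows by the summed substitution score S(x_i, x_j). The sketch below illustrates the construction only: `make_toy_matrix` builds a hypothetical match/mismatch score table that stands in for BLOSUM62 (its values are not BLOSUM62 scores), and all function names are assumptions.

```python
import math


def make_toy_matrix(alphabet, match=2, mismatch=-1):
    """Hypothetical symmetric score table; a stand-in for a real substitution matrix."""
    return {(a, b): (match if a == b else mismatch) for a in alphabet for b in alphabet}


def window_score(xi, xj, s):
    """S(x_i, x_j): sum of s(x_i^u, x_j^u) over aligned window positions, as in (3.12)."""
    if len(xi) != len(xj):
        raise ValueError("windows must have the same length")
    return sum(s[(a, b)] for a, b in zip(xi, xj))


def sm_rbf_kernel(xi, xj, s, sigma=1.0):
    """RBF kernel of (3.13) with the inner product replaced by S."""
    squared_distance = (
        window_score(xi, xi, s) + window_score(xj, xj, s)
        - window_score(xi, xj, s) - window_score(xj, xi, s)
    )
    return math.exp(-squared_distance / sigma ** 2)


s = make_toy_matrix("ACDEFGHIKLMNPQRSTVWY")
print(sm_rbf_kernel("SEPEDLL", "SEPADLL", s, sigma=2.0))
```

With a real substitution matrix the value depends on which residue replaces which, so conservative substitutions give kernel values closer to 1 than radical ones.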

2 SVM IP-SVM1 2 IP-SVM2 Eisenberg [157] 0 2n 1 N C 62 .11 3.2.7A 20 -1 3.7A 20 3.1.4.7 IP-SVM1 40 3.3 IP-SVM1 3.2.11 3. 1 2 ! 1 l W SVM 3.2.2 SVM SVM 3.4.2.13 LIBSVM[156] RBF C ! 1.4.4.7B 1 0 40 3.SVM 3.1 SVM 1 inner product based kernel SVM 20 3.2 RBF 3.

L.3. W .4. N . P.96% SVM IP-SVM2 27381 8977 1013 437 98. C .2 SM-SVM Sn  SM-SVM 4. R.2.1.3 SVM 3. Y } pab fa a .14% 78.6 3. K .87% 29. H .3 3. G .4.14 3.SVM 3.02% Sn  Sp  Sn  63 ¥ . 32% 75.31% 10.11 / BLOSUM62 sab ! 1 p log ab P fa fb 3. a fb b b P 3.03% 7. I . S .1 PMD IP-SVM SVM Sn  SM-SVM 10-fold 3. Q.14 a. E. V .2.43% 75. D. T . F . A. b  G ! {.2 IP-SVM1 TP FP TN FN 27628 9195 795 190 99.14% SM-SVM 26466 7092 2899 1352 95.4.3 SVM 3 substitution matrix based kernel SVM SM-SVM BLOSUM62 RBF 3.
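The log-odds form of a substitution score in (3.14), s_ab = (1/λ) log(p_ab / (f_a f_b)), can be illustrated with a short calculation. The base-2 logarithm and λ = 0.5 (half-bit units) are assumptions made for this sketch, and the frequencies in the usage lines are invented toy values, not data from BLOSUM62.

```python
import math


def log_odds(p_ab, f_a, f_b, lam=0.5):
    """Log-odds score: observed pair frequency versus the frequency expected by chance."""
    return (1 / lam) * math.log2(p_ab / (f_a * f_b))


# Toy frequencies: a pair seen twice as often as chance, and one seen half as often.
print(round(log_odds(0.02, 0.1, 0.1), 6))
print(round(log_odds(0.005, 0.1, 0.1), 6))
```

A positive score means the pair of residues is observed more often than expected from the background frequencies, a negative score means it is observed less often.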

2013 0.3.2 SVM 3.20% 77.7479 3.71% 75.3372 0.1967 0.8 SM-SVM IP-SVM1 Sn  3 Ac MCC SVM IP-SVM IP-SVM1 AUC ROC IP-SVM2 SM-SVM 3.86% 75.4.2 IP-SVM2 SM-SVM 3.SVM Sp  Ac MCC AUC 80.9 SM-SVM 7 Ac 64 2n 1 Ac 2n 1 5 7 [158] 49 .18% 0.653 68.8 ROC 3.1 Ac 3.67% 0.1.7001 69.10% 0.

6 3.SVM SVM 19 Ac SVM SM-SVM 7 Ac 4.3 SVM U U TP FP TN FN Ac t i ! {1.10 optimal threshold 65 .3.4.3. 1} U 0 U Ac Ac U Ac 3.9 3.2.

10 3.5.1 66 .4.SVM 3.4 [159] SM-SVM (Trehalose synthase) 962 SM-SVM / SM-SVM 3 [159] (Meiothermus 18278 962 v 19 ruber)CBS01 D200G R227C R392A 3.1.1 3.5.
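A protein of length N has 19N possible single-point substitutions, which is where the figure 962 × 19 = 18278 for trehalose synthase comes from. A minimal sketch that enumerates them in the same label style as D200G or R227C is given below; the function name and the residue ordering are assumptions, and the short sequence in the usage lines is only a placeholder.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residues


def all_point_substitutions(sequence):
    """Yield labels such as 'D200G': wild type, 1-based position, mutant residue."""
    for position, wild_type in enumerate(sequence, start=1):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                yield f"{wild_type}{position}{mutant}"


labels = list(all_point_substitutions("QSEPEDLLK"))
print(len(labels), labels[:3])
```

Each label can then be turned into a window of residues around the site and scored by the trained classifier.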

2 .1.13 67 .5. K (x i . ' K ij  min( K ij ) max( K ij )  min( K ij ) [0 K (x i . x i ) SVM [160] BLOSUM62 BLOSUM62 AAindex PMD 3. x j ) ! K (x j .SVM 3 . x j ) ! K (x j .5. x i ) K ij ! 1] . BLOSUM62 merce 3.2 RBF 3.
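The rescaling K'_ij = (K_ij - min K_ij) / (max K_ij - min K_ij), which maps all entries into [0, 1], can be sketched as follows. The function name and the nested-list matrix representation are assumptions made for the example; the rescaling keeps the matrix symmetric.

```python
def min_max_normalise(matrix):
    """Rescale all entries of a matrix linearly into the interval [0, 1]."""
    values = [v for row in matrix for v in row]
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("matrix is constant; rescaling is undefined")
    return [[(v - lowest) / (highest - lowest) for v in row] for row in matrix]


toy = [[4, -1, 0],
       [-1, 5, -2],
       [0, -2, 6]]
print(min_max_normalise(toy))
```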

The substitution-matrix based RBF kernel can be factored as

$$K(x_i, x_j) = \exp(-\|x_i - x_j\|^2/\sigma^2),\qquad \|x_i - x_j\|^2 = S(x_i, x_i) + S(x_j, x_j) - S(x_i, x_j) - S(x_j, x_i)$$

$$K(x_i, x_j) = \exp(-S(x_i, x_i)/\sigma^2) \times \exp(-S(x_j, x_j)/\sigma^2) \times \exp(S(x_i, x_j)/\sigma^2) \times \exp(S(x_j, x_i)/\sigma^2) \qquad (3.15)$$

Take the third factor, K_3(x_i, x_j) = exp(S(x_i, x_j)/σ²). Writing the score in the form of (3.14), with Q_ab corresponding to p_ab and R_a, R_b corresponding to f_a, f_b:

$$K_3(x_i, x_j) = \exp\Big[\frac{1}{\lambda\sigma^2}\sum_{u=-n}^{n}\log\frac{Q(x_i^u, x_j^u)}{R(x_i^u)\,R(x_j^u)}\Big] = \exp\Big[k\,\ln\frac{\prod_{u=-n}^{n} Q(x_i^u, x_j^u)}{\prod_{u=-n}^{n} R(x_i^u)\,\prod_{u=-n}^{n} R(x_j^u)}\Big] = \Big[\frac{\prod_{u=-n}^{n} Q(x_i^u, x_j^u)}{\prod_{u=-n}^{n} R(x_i^u)\,\prod_{u=-n}^{n} R(x_j^u)}\Big]^{k} \qquad (3.16)$$

$$k = \frac{1}{\lambda\sigma^2}\log e$$

By the Method of Types (Theorem 11.1.2 in [161]), with P_{x_i}, P_{x_j} and P_{x_i x_j} the types of x_i, x_j and the aligned pair (x_i, x_j) over the 20 amino acids in G:

$$K_3(x_i, x_j) = \left\{\frac{2^{-n[H(P_{x_i x_j}) + D(P_{x_i x_j}\|Q_{ab})]}}{2^{-n[H(P_{x_i}) + D(P_{x_i}\|R_a)]}\;2^{-n[H(P_{x_j}) + D(P_{x_j}\|R_b)]}}\right\}^{k} \qquad (3.17)$$

SVM H xi . x j ) ! 2 2 kn[ H ( Qab )] a {b 3.5.3 3. x j D Q ab Px i Px j ! Rb x j ! Qab xi Ra Px ! Ra i xj Rb K3 (x i .1. x i ) ! 2 kn[ H ( Qaa )2 H ( Ra )] 3.19 BLOSUM62 PSI-BLAST[162] Position-Specific Iterated BLAST 3. x i ) ! 2 kn[ H (Qba )2 H ( Ra )] H ( Raa ) ! H ( R bb ) K ( x i . x j ) ! 2 kn[ H (Qab )H ( Ra )H ( Rb )] H (Ra ) ! H (R b ) K 3 ( x i .2. x j ) ! 2  kn[ H ( Qab )2 H ( Ra )] K1 ( x i .3 PSSM PSSM ortholog SM-SVM PSSM 69 .18 K 2 ( x j . x j ) ! 2 kn[ H ( Qbb )2 H ( Ra )] K 4 ( x j .
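The kernel factor K_3(x_i, x_j) = exp(S(x_i, x_j)/sigma^2), with S summed position by position over two sequence windows, and the min-max rescaling of kernel values to [0, 1] can be sketched as follows. This is an illustration under stated assumptions only. The score table is a made-up three-letter example and is not taken from BLOSUM62, and all function names are hypothetical.

```python
import math

# Made-up symmetric scores for a 3-letter alphabet (illustration only,
# NOT BLOSUM62 values). Only one orientation of each pair is stored.
TOY_SCORES = {
    ("A", "A"): 2, ("A", "G"): 0, ("A", "W"): -1,
    ("G", "G"): 3, ("G", "W"): -1,
    ("W", "W"): 5,
}


def score(a, b):
    """Symmetric lookup s(a, b) = s(b, a)."""
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a)))


def window_similarity(xi, xj):
    """S(xi, xj): sum of position-wise scores over two equal-length windows."""
    if len(xi) != len(xj):
        raise ValueError("windows must have the same length")
    return sum(score(a, b) for a, b in zip(xi, xj))


def k3(xi, xj, sigma):
    """K_3(xi, xj) = exp(S(xi, xj) / sigma^2)."""
    return math.exp(window_similarity(xi, xj) / sigma ** 2)


def min_max_normalize(matrix):
    """Rescale all entries to [0, 1] via (K - min) / (max - min)."""
    flat = [value for row in matrix for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in matrix]
    return [[(value - lo) / (hi - lo) for value in row] for row in matrix]


windows = ["AGW", "GAW", "WWA"]
gram = [[k3(p, q, sigma=2.0) for q in windows] for p in windows]
print(min_max_normalize(gram))
```

Because the score table is symmetric, the resulting matrix is symmetric as well, but symmetry alone does not guarantee that the Mercer condition holds.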

5.2.4.1.SVM 3.5 9 sequence-based [163] PMD sequence-based sample-based sample-based 3 4 [a][g] 3.1 [h] [b][c] SM-SVM SVM 111 112 70 .4 SVM fold 1 sequence-based 10-fold 9 sequenced-based sample-based fold 1 9/10 sequence-based sample-based 10-fold 10 3.
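The contrast between sample-based and sequence-based 10-fold cross-validation can be made concrete with a short sketch. It assumes each sample is a (protein_id, mutation) pair: the sample-based split deals individual mutations into folds, while the sequence-based split deals whole proteins, so that no protein contributes to both training and test data. Function names and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def sample_based_folds(samples, k=10, seed=0):
    """Shuffle individual samples and deal them into k folds."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    folds = [[] for _ in range(k)]
    for rank, index in enumerate(order):
        folds[rank % k].append(samples[index])
    return folds


def sequence_based_folds(samples, k=10, seed=0):
    """Deal whole proteins into k folds so that no protein spans two folds."""
    by_protein = defaultdict(list)
    for protein_id, mutation in samples:
        by_protein[protein_id].append((protein_id, mutation))
    proteins = sorted(by_protein)
    random.Random(seed).shuffle(proteins)
    folds = [[] for _ in range(k)]
    for rank, protein_id in enumerate(proteins):
        folds[rank % k].extend(by_protein[protein_id])
    return folds


# Toy data: 20 proteins with 5 mutations each (illustrative only).
samples = [(f"P{p}", f"A{m}G") for p in range(20) for m in range(1, 6)]
for fold in sequence_based_folds(samples)[:2]:
    print(sorted({protein_id for protein_id, _ in fold}))
```

In each round one fold is held out for testing and the other nine are used for training; only the way the folds are formed differs between the two schemes.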

4.1.1 HERG HERG HERG HERG HERG HERG annotation mining MeSH[164] Medical Subject Headings SO[165] Sequence Ontology ( databases) Medical Language System HERG HERG GOA[167] GO[166] Gene Ontology The Gene Ontology Annotation UMLS[168] OBO[123] the Open Biomedical Ontologies Unified / 4.2.1 PMD 2.1.1 N 19N 71 19 .2.2 4.1.

1 Ac Sn  Ac 9 1 90 72 .1.2.1.2 1.3.1.3.4.3 3.2.1 4.4 PMD 4.1.1.1.3.4 LSDBs LSDBs 1 2 PMD 3.4.

4 [143] SVM 4.5 [159] SVM HERG HERG 73 .[68] [169] SVM C C 2.1.2.4.73 SVM 3.2.2 C / C 4.1.1.

1 3.17 4.19 3.3.2 4.3.1 HERG HERG 4.3.2 PMD 4.3.2.3 PMD FUNCTION FUNCTION SM-SVM [53] [0] [ ] [53] Swiss-Prot 74 disease .2.3.SVM 4.2.

1.1.2.4 3.polymorphism 4.3 1.2.3.3.2.6 75 .3.5 SM-SVM SM-SVM [158] [170][171] [172] [174] [177][178] [182] [183] [175] [72] [176] [179][180][181] [184] [163][173] SM-SVM SVM SM-SVM 4.4 web 4.3.

65% 78% 70% 75% active pocket 76 .

19(1):69 75. Guy R C.uscnlife. Central Dogma of Molecular Biology [J]. Human Mutation. [17] [18] 12±9/2010±01±25.35(5):17 18. Sefcovic E. About the Journal [DB/OL]. Achter P J.nih.cnki.aspx?searchword=%e7%aa%81%e5%8f%98%e4%bd%93 2010±01 [J] [M] 1 2000 Vol. Vol.org/wiki/Mutation. The Molecular Biology Database Collection: an online compilation of relevant database resources [J].7:969 972. www. Nucleic Acids Research.cn . A Contextualized Study of Public Discourse [J]. [4] Andreas D B. Kelso J. DATABASE: A new forum for biological databases and curation [J]. [9] Condit C M.bisti. [3] Nucl. Cotton R G. [16] Ring H Z. [19] Wikipedia.gov/docs/CompuBioDef. 2002. http://en.1093/database/bap002 published on March 26.wikipedia. 2002. 2009. 2009 [7] Wikipedia. [21] 77 563.wikipedia. Scriver C R.oxfordjournals. [20] Wikipedia. 19(1):76 78.org/wiki/Genetic_association. Vol. 1999. Vol. [2] Crick F. 2002. 2000-7-17 /2010±01±27. 2009± [DB/OL] http://www. Gentleman R. The Changing Meanings of Mutation.net 1996 158 [12] Brookes A J. 2009. Mutation [DB/OL]. Vol. Database. Nucleic Acids Research annual Database Issue and the NAR online Molecular Biology Database Collection in 2009 [J]. Acids Res. 2009±12±6/2010±01±25. 2010±01±23/2010±01 ±23. [5] Michael Y G. Human Mutation. Nature. bap002. Vol. Kwok P Y. The essence of SNPs [J]. http://en. EJ [M] 1 /WebForms/WebDefines. 1970. doi:10. Genetic linkage [DB/OL]. 1998. On The Changing Meanings of Mutation [J]. Lauer I. [M] 1 1984 208 [DB/OL].org/our_journals Disease Causing Mutation [J]. 2000. [10] Marshall J H. http://define.37 (Database issue):D1 D4. [8] Cotton R G H. 2006. et al. Vol. [6] Landsman D. Proof of Vol. 2000-7-17/2010±01±27. Communicating ³Mutation´ Modern Meanings and Connotations [J]. Human Mutation. Pharmacogenomics.28(1):1 7. Vol. Vol. Genetic association [DB/OL]. 2009. 19(1):2 3. 3. Human variome project: an international collaboration to catalogue human genetic variation [J]. Human Mutation. 2007 336 348. 
http://www.pdf.227:561 /nar/about. 12(1):1 [13] 174 [14] [15] ±23 /2010±01±23.wikipedia. Gene. NIH Working Definition of Bioinformatics and Computational Biology [DB/OL]. 8. Nucleic Acids Research.html.[1] BISTIC Definition Committee. 234(2):177 186. http://en. Vol.org/wiki/Genetic_linkage. [11] Cotton R G H.

Populations. 282(5389):682 689. [22] 2005-4-15/2010±01±27.uscnlife. [37] Cotton R G H. Vol. Horaitis O. In: Dracopoli N C. Human mutation databases [A]. 179. Auerbach A D. [29] Marsh S. 12:680 688.htm 2005-4-15/2010±01±27. SNPs. Allelic Association With SNPs: Metrics. 2010. [38] Rogozin I B. Human [31] Claustres M. 2002 178. Patrinos A. Cotton R G H. Vol. Ennis S. In Current protocols in human genetics [C]. et al. 17(4):263 270. Science. [40] Collins A.htm [DB/OL]. 2004. and the Linkage Disequilibrium Map [J].11. Jordan E. Sijmons R H.11. 20(3):174 Mutation. Human Mutation. Human Mutation. et al. 2001. Goals for the U.cn/web/page/news6137. [M] [J] [M] 2 [M] 3 1 2006 http://en. 2008:2 Repositories [J].11. Human 78 [DB/OL] Wikipedia. Vol. Automation in Genotyping of Single Nucleotide Polymorphisms [J]. SNP Databases and Pharmacogenetics: Great Start.wikipedia. Vol. Haines J L. 2003. 2008. Human Mutation. Barcelona: Human Genome Variation Society Newsletter. 2001. Science. Kwok P. Human Genome Project: 1998-2003 [J]. 2009. . Vol. 31(3):366 367. 17(2):141 150. Protein Structure. Central mutation databases a review [J]. 322(5903): 861 862. 2010±1±20/2010±01±25. 2001. Human Mutation. eds. et al. Vol. Human Mutation. 17(4):255 262. Use of Mutation Spectra Analysis Software [J]. The Human Variome Project (HVP) 2009 Forum ³Towards Establishing Standards´ [J]. [23] [24] 18. Data Mining: Efficiency of Using Sequence Databases for Polymorphism Discovery [J]. pp. Human Mutation. New York: Wiley-Liss. 1998. Moult J. [41] Wang Z. et al. 2000. 23:447 452. [39] Cox D G. Glazko G V. Vol./web/page/news6139. 30(4):493 4. 2002. [35] Collins F S. Sharing Data between LSDBs and Central [34] Howard H J. Canzian F. [30] Porter C J.S. Vol. The Challenge of Documenting Mutation Across the Genome: the Human Genome Variation Society Approach [J]. Kondrashov F A. [42] Gut I G. Genome Research. 7. Axton M. 
Time for a unified system of mutation description and reporting: a review of locus specific mutation databases [J]. [25] [26] [27] ( ) T AP 2007 544 572. Cotton R G H. Vol. 495. and Disease [J]. McLeod H L. Taillon-Miller P. Andersen P S.org/wiki/Linkage_ 2005 2 Vol. Human Mutation. Vol. Vol. Boillot C. [32] Human Genome Variation Society LSDB Core Data Integration Project [R]. 2001. 15(1):36 44. Linkage disequilibrium http://www. 2002. Cotton R G H. Human Mutation. [36] Horaitis O. [28] Horaitis O. Horaitis O.10(1):92 96. The Human Variome Project [J]. but a Long Way to Go [J]. et al. [33] den Dunnen J T.1 7. Talbott-Jr C C. et al. Vanevski M. Cuticchia A J. disequilibrium. Vol. 17(2):83 102. et al.

Annotation. McLeod H L. [58] Ramensky V. Human Mutation. 2009. Human Molecular Genetics. [53] Care M A. Vol. [49] Nalla V K.. [50] Gao S. Vol. Human Mutation. Predicting deleterious amino acid substitutions [J]. The Pedigree Tool: Web-Based Visualization of a Family Tree [J]. 31(3):229 236. [45] Lachmund P. Duan G Y. Levy P P. [57] Saunders C T. Bulpitt A J. Edmonson M N. Data Mining of Public SNP Databases for the Selection of Intragenic SNPs [J]. Automated Splicing Mutation Analysis by Information Theory [J]. Syndrome to gene (S2G): in-silico identification of candidate genes for human diseases [J]. Adams R M. Bioinformatics tools for single nucleotide polymorphism discovery and analysis [J]. Human Mutation. et al. Evaluation of structural and evolutionary contributions to deleterious mutations prediction [J]. Acad. Predicting the functional consequences of non-synonymous single nucleotide polymorphisms: structure based assessment of amino acid variation [J]. 2004. Integrating Mutation Data and Structural Analysis of the TP53 Tumor-Suppressor Protein [J]. 1020:101 109. 2004. N. Vol. [46] Cai Z. 79 . Zhang N. [48] Freimuth R R. [55] Sunyaev S. 2002. 30:3894 3900. [52] Gefen A. Genome Research. 2003. Vol. Nucleic Acids Research. Human Mutation. Reassessment of the TP53 Mutation Database in Human Disease by Data Mining With a Library of TP53 Missense Mutations [J]. 24(2):178 184. 2010. Vol. Prediction of deleterious functional effects of amino acid mutations using a library of structure-based function descriptors [J]. Vol. Vol. Journal of Molecular Biology. 2002. 23(2):103 105. et al. Rogan P K. Sunyaev S R. 2005. Deleterious SNP prediction: be mindful of your training data! [J]. Vol. 25(1):6 17. Vol. [44] Aerts J. Prediction of function changes associated with single-point protein mutations using support vector machines (SVMs) [J]. Nadine C. PROTEINS. et al. 2007. 11:863 874. and Functional Analysis [J]. 2002. et al. [54] Ng P C. Bioinformatics. 20(3):162 173. 
Human Mutation. Bork P.Mutation. [60] Clifford R J. 2010. 25(4):334 342. [59] Herrgard S. 2005. 17(6):475 492. 30(8):1161 1166. Cuff A L. [43] Martin A C R. Vol. 31(3): v. Human non-synonymous SNPs: server and survey [J]. Wetzels Y. 10(6):591 597. 2004. et al. Ann. Vol. Needham C J. Stormo G D. Hoffman B T. Vol. Vol. Human Mutation. Vol. Human Mutation. Vol. et al. Human Mutation. Birk O S. Human Mutation. 2001. Henikoff S. Nebel I T. PolyMAPr: Programs for Polymorphism Database Mining. [51] Greenblatt M S. Sci. 2002. [56] Chasman D. et al. Vol. Nguyen C. 53(4): 806 816. Y. [47] Soussi T. Journal of Molecular Biology. Cohen R. 19(2):149 164. 322:891 901. Prediction of deleterious human alleles [J]. Human Mutation. 2005. Cammer S A. 2001. 2001. Mutation clusters offer insight into predicting pathogenicity [J]. et al. Vol. et al. Kato S. et al. Facchiano A M. 23(6): 664 672. Bayesian approach to discovering pathogenic SNPs in conserved protein domains [J]. Baker D. 307:683 706. Vol. Vol. 2001. 25(2):110 117. Führer D.

[61] Krishnan V G, Westhead D R. A comparative study of machine-learning methods to predict the effects of single nucleotide polymorphisms on protein function [J]. Bioinformatics, 2003, Vol. 19:2199-2209.
[62] Dobson R, et al. Predicting deleterious nsSNPs: an analysis of sequence and structural attributes [J]. BMC Bioinformatics, 2006, Vol. 7:217.
[63] Verzilli C J, John C W, Stallard N, et al. A hierarchical Bayesian model for predicting the functional consequences of amino-acid polymorphisms [J]. Appl. Stat., 2005, Vol. 54:191-206.
[64] Needham C J, et al. Predicting the effect of missense mutations on protein function: analysis with Bayesian networks [J]. BMC Bioinformatics, 2006, Vol. 7:405.
[65] Ferrer-Costa C, Orozco M, de la Cruz X. Sequence-based prediction of pathological mutations [J]. Proteins, 2004, Vol. 57(4):811-819.
[66] Ferrer-Costa C, Orozco M, de la Cruz X. Use of bioinformatics tools for the annotation of disease-associated mutations in animal models [J]. Proteins, 2005, Vol. 61(4):878-887.
[67] Ferrer-Costa C, Orozco M, de la Cruz X. Characterization of disease-associated single amino acid polymorphisms in terms of sequence and structure properties [J]. Journal of Molecular Biology, 2002, Vol. 315(4):771-786.
[68] Bromberg Y, Rost B. SNAP: predict effect of non-synonymous polymorphisms on function [J]. Nucleic Acids Research, 2007, Vol. 35:3823-3835.
[69] Bao L, Cui Y. Prediction of the phenotypic effects of non-synonymous single nucleotide polymorphisms using structural and evolutionary information [J]. Bioinformatics, 2005, Vol. 21:2185-2190.
[70] Yue P, Li Z, Moult J. Loss of protein structure stability as a major causative factor in monogenic disease [J]. Journal of Molecular Biology, 2005, Vol. 353:459-463.
[71] Yue P, Moult J. Identification and analysis of deleterious human SNPs [J]. Journal of Molecular Biology, 2006, Vol. 356:1263-1274.
[72] Capriotti E, Calabrese R, Casadio R. Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information [J]. Bioinformatics, 2006, Vol. 22(22):2729-2734.
[73] Alber T, et al. Temperature-sensitive mutations of bacteriophage T4 lysozyme occur at sites with low mobility and low solvent accessibility in the folded protein [J]. Biochemistry, 1987, Vol. 26:3754-3758.
[74] Rennell D, et al. Systematic mutation of bacteriophage T4 lysozyme [J]. Journal of Molecular Biology, 1991, Vol. 222:67-88.
[75] Markiewicz P, et al. Genetic studies of the lac repressor. XIV. Analysis of 4000 altered Escherichia coli lac repressors reveals essential and non-essential residues, as well as "spacers" which do not require a specific sequence [J]. Journal of Molecular Biology, 1994, Vol. 240:421-433.
[76] Suckow J, et al. Genetic studies of the Lac repressor. XV: 4000 single amino acid substitutions and analysis of the resulting phenotypes on the basis of the protein structure [J]. Journal of Molecular Biology, 1996, Vol. 261:509-523.
[77] Loeb D D, Swanstrom R, Everitt L, et al. Complete mutagenesis of the HIV-1 protease [J]. Nature, 1989, Vol. 340:397-400.

[78] Krawczak M, Cooper D N. The human gene mutation database [J]. Trends in Genetics, 1997, Vol. 13:121-122.
[79] Cooper D N, Ball E V, Krawczak M. The human gene mutation database [J]. Nucleic Acids Research, 1998, Vol. 26(1):285-287.
[80] Krawczak M, Cooper D N. Human Gene Mutation Database. A biomedical information and research resource [J]. Human Mutation, 2000, Vol. 15(1):45-51.
[81] Lehväslaiho H, Stupka E, Ashburner M. Sequence variation database project at the European Bioinformatics Institute [J]. Human Mutation, 2000, Vol. 15(1):52-56.
[82] Hamosh A, Scott A F, Amberger J, et al. Online Mendelian Inheritance in Man (OMIM) [J]. Human Mutation, 2000, Vol. 15(1):57-61.
[83] Cuticchia A J. Future vision of the GDB human genome database [J]. Human Mutation, 2000, Vol. 15(1):62-67.
[84] Sherry S T, Ward M, Sirotkin K. Use of molecular variation in the NCBI dbSNP Database [J]. Human Mutation, 2000, Vol. 15(1):68-75.
[85] Brookes A J, Lehväslaiho H, Siegfried M, et al. HGBASE: a database of SNPs and other variations in and around human genes [J]. Nucleic Acids Research, 2000, Vol. 28(1):356-360.
[86] Cotton R G H. Progress of the HUGO Mutation Database Initiative: A Brief Introduction to the Human Mutation MDI Special Issue [J]. Human Mutation, 2000, Vol. 15(1):4-6.
[87] Cotton R G H, Kazazian-Jr H H. Human Mutation: The Official Journal of the Human Genome Variation Society (HGVS) [J]. Human Mutation, 2002, Vol. 19(1):1.
[88] Cotton R G H, participants of the 2006 Human Variome Project meeting. Recommendations of the 2006 Human Variome Project meeting [J]. Nature Genetics, 2007, Vol. 39(4):433-436.
[89] Beaudet A L, the Ad Hoc Committee on Mutation Nomenclature. Update on nomenclature for human gene mutations [J]. Human Mutation, 1996, Vol. 8(3):197-202.
[90] Beutler E, McKusick V A, Motulsky A, et al. Mutation nomenclature: nicknames, systematic names and unique identifiers [J]. Human Mutation, 1996, Vol. 8(3):203-206.
[91] Antonarakis S E, the Nomenclature Working Group. Recommendations for a nomenclature system for human gene mutations [J]. Human Mutation, 1998, Vol. 11(1):1-3.
[92] den Dunnen J T, Antonarakis S E. Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion [J]. Human Mutation, 2000, Vol. 15(1):7-12.
[93] den Dunnen J T, Antonarakis S E. Mutation Nomenclature Extensions and Suggestions to Describe Complex Mutations: A Discussion [J]. Human Mutation, 2002, Vol. 20(5):403.
[94] Nebert D W. Proposal for an Allele Nomenclature System Based on the Evolutionary Divergence of Haplotypes [J]. Human Mutation, 2002, Vol. 20(6):463-472.
[95] Scriver C R, Nowacki P M, Lehvaslaiho H. Guidelines and recommendations for content, structure and deployment of mutation databases [J]. Human Mutation, 1999, Vol. 13(5):344-350.
[96] Scriver C R, Nowacki P M, Lehvaslaiho H, et al. Guidelines and recommendations for content, structure, and deployment of mutation databases. II. Journey in progress [J]. Human Mutation, 2000, Vol. 15(1):13-15.
[97] Cotton R G H, Horaitis O. Quality control in the discovery, reporting, and recording of genomic variation [J]. Human Mutation, 2000, Vol. 15(1):16-21.

[98] Brown A F, McKie M A. MuStaR and other software for locus-specific mutation databases [J]. Human Mutation, 2000, Vol. 15(1):76-85.
[99] Béroud C. UMD (universal mutation database): a generic software to build and analyse locus specific databases [J]. Human Mutation, 2000, Vol. 15(1):86-94.
[100] Fredman D, Jobs M, Strömqvist L, et al. DFold: PCR Design that Minimizes Secondary Structure and Optimizes Downstream Genotyping Applications [J]. Human Mutation, 2004, Vol. 24(1):1-8.
[101] Manaster C, Zheng W Y, Teuber M, et al. InSNP: A Tool for Automated Detection and Visualization of SNPs and InDels [J]. Human Mutation, 2005, Vol. 26(1):11-19.
[102] Fokkema I F, den Dunnen J T, Taschner P E. LOVD: Easy Creation of a Locus-Specific Sequence Variation Database Using an "LSDB-in-a-Box" Approach [J]. Human Mutation, 2005, Vol. 26(2):63-68.
[103] Béroud C, Hamroun D, Collod-Béroud G, et al. UMD (Universal Mutation Database): 2005 Update [J]. Human Mutation, 2005, Vol. 26(3):184-191.
[104] Smith T D, Cotton R G H. VariVis: A visualization toolkit for variation databases [J]. BMC Bioinformatics, 2008, Vol. 9:206.
[105] Brandon M C, Ruiz-Pesini E, Mishmar D, et al. MITOMASTER: A Bioinformatics Tool for the Analysis of Mitochondrial DNA Sequences [J]. Human Mutation, 2009, Vol. 30(1):1-6.
[106] Maurer S. Coping with change: intellectual property rights, new legislation, and the Human Mutations Database Initiative [J]. Human Mutation, 2000, Vol. 15(1):22-29.
[107] Knoppers B M, Laberge C M. Ethical guideposts for allelic variation databases [J]. Human Mutation, 2000, Vol. 15(1):30-35.
[108] Lehnert V, Holzwarth J, Ott M, et al. A Semi-Automated System for Analysis and Storage of SNPs [J]. Human Mutation, 2001, Vol. 17(4):243-254.
[109] Zhang G, Zhang S Z, Chen W, et al. Go!Poly: A Gene-Oriented Polymorphism Database [J]. Human Mutation, 2001, Vol. 18(5):382-387.
[110] Stenson P D, Ball E V, Mort M, et al. Human Gene Mutation Database (HGMD): 2003 Update [J]. Human Mutation, 2003, Vol. 21(6):577-581.
[111] Tahira T, Baba S, Higasa K, et al. dbQSNP: A Database of SNPs in Human Promoter Regions With Allele Frequency Information Determined by Single-Strand Conformation Polymorphism-Based Methods [J]. Human Mutation, 2005, Vol. 26(2):69-77.
[112] Giardine B, Riemer C, Hefferon T, et al. PhenCode: Connecting ENCODE Data With Mutations and Phenotype [J]. Human Mutation, 2007, Vol. 28(6):554-562.
[113] Yip Y L, Famiglietti M, Gos A, et al. Annotating Single Amino Acid Polymorphisms in the UniProt/Swiss-Prot Knowledgebase [J]. Human Mutation, 2008, Vol. 29(3):361-366.
[114] Owen R P, Altman R B, Klein T E. PharmGKB and the International Warfarin Pharmacogenetics Consortium: The Changing Role for Pharmacogenomic Databases and Single-Drug Pharmacogenetics [J]. Human Mutation, 2008, Vol. 29(4):456-460.
[115] Rhee H, Lee J S. MedRefSNP: A Database of Medically Investigated SNPs [J]. Human Mutation, 2009, Vol. 30(3):E460-E466.
[116] Friedrich A, Garnier N, Gagnière N, et al. SM2PH-db: an interactive system for the integrated analysis of phenotypic consequences of missense mutations in proteins involved in human genetic diseases [J]. Human Mutation, 2010, Vol. 31(2):127-135.

2002. Nucleic Acids [119] Benson D A. Vol. A wiki for the life sciences where authorship matters [J]. Nishikawa K. Vol. Vol. Yuan Y P. SRS: Information retrieval system for molecular biology data banks [J]. [132] Thorisson G A. BIND--The Biomolecular Interaction Network Database [J]. 266:141 [126] Kazemian M. 162. Navarro J D. [131] International HapMap Consortium. 2005 CICC. Prototype implementation of the integrated genomic database [J]. Kristiansen T Z. Kocab P. Argos P. Biomed. Vol. Ludäscher B. GenBank [J]. 30(1):303 305. Artificial Intelligence and Machine Learning 2005 Conference (AIML 05). Zhang B. INTEGRATING BIOLOGICAL DATABASES [J]. The Universal Protein Resource (UniProt) [J]. 37(Database issue):D797 D802. Epstein J A. Vol. Hashimoto Y. Nature Genetics 2008. 65(2):266 4(5):337 345. [127] Gupta A. Vol. 29(1):242 245. Tanaka T. Nat Rev Genet. Human Mutation. 2003. et al. Cairo. 30(1):158 162. [128] Ritter O. 1996. et al. 1999. Donaldson I. Martone M E. 40:1047 1051. 27(1):355 357. Vol. Wolting C. 31(3):219 Research. 426(6968):789-796. 2003. Res. Nucleic Acids Research. Methods in Enzymology. 1996. Passi K. Egypt. 266:114 128. Nucleic Acids Research. The Protein Mutant Database [J]. [118] Kawabata T. Nucleic Acids Research. Proceedings of the 12th International Conference on Scientific and Statistical Database Management.. Moshiri B. [125] Schuler G D. Vol. 2000:39. HGVbase: a human sequence variation database emphasizing data quality and a broad spectrum of data sources [J]. Vol. 32(Database issue):D497 D501. Nikhbakh H. Senger M. Nucleic Acids Research. JSNP: a database of common gene variations in the Japanese population [J]. 2002. Nucleic Acids Research. Nucleic Acids Research. 228. Vol. CanProVar: a human cancer proteome variation database [J]. et al. Salwínski L. el at. Bhowmick S. The International HapMap Project [J]. Vol. 2002. [133] Bader G D. Comput. 83 [121] Hoffmann R. HGVbaseG2P: a central genetic association database [J]. 27(2): 97 115. Vol. 
Methods in Enzymology. Duncan D T. 303 [123] Lincoln D S. 2004. 35(Database issue):D193 Vol. 2010. 2008. Nucleic Acids Research. [135] Peri S. [124] Etzold T. Human protein reference database as a discovery resource for proteomics [J]. Duan X J. [130] Fredman D. [129] Hirakawa M. D197. 30(1):387 391. Lancaster O. at el. Siegfried M. D30. Lipman D J. Architecture for Biological Database Integration [Z]. Vol. Vol. Knowledge-Based Integration of Neuroscience Data Sources [Z]. Vol. Vol. Data & Knowledge Engineering. Karsch-Mizrachi I. 2008. et al. An XML Schema integration and query mechanism system [J]. 36(Database issue): D25 [120] The UniProt Consortium. 2001. [122] Madria S. Free R C. the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions [J]..[117] Li J. . Ota M. 2007. et al. et al. Ohkawa H. Nucleic Acids Research. Nature. et al. [134] Xenarios I. DIP. et al. 1994. Entrez: molecular biology database and retrieval system [J].

2002. Bader G. Sun Z R. Human Mutation. 25(10):1127 1133. Analysis of amino acid indices and mutation matrices for sequence comparison and structure prediction of proteins [J]. MINT: a Molecular INTeraction database [J]. Muilu J. Genome Biology. 4 . Joint annotation of coding and non-coding single nucleotide polymorphisms and mutations in the SNPeffect and PupaSuite databases [J]. 2004. Human Mutation. Diemand A V. 2006:1043. Human Mutation. FEBS Lett. Vol.[136] Zanzoni A. 2008. Journal of Molecular Biology 2001 Vol. 16(5):412 424. [145] 2005 Vol. 2000. Vol. The phenotype and genotype experiment object model (PaGE-OM): a robust data structure for information related to DNA variation [J]. Kanehisa M. Vol. 513(1):135 140. Troup C. et al. Conde L. Quondam M. 29(2):212 219. Protein Eng. et al. et al. Hiroi K. [140] Spellman P T. Brunak S. [137] Whirl-Carrillo M. Vol. Nature Biotechnology. Vol. Vol. [138] Brookes A J. Vol. Miller M. Ido K. et al. 1996..31(11):180 184. 2008. [144] Yip Y L. [142] Jones A R. Bioinformatics. 2004 [147] Tomii K. 2007. An XML-Based Interchange Format for Genotype±Phenotype Data [J]. Vol. 22(2):177 183. Design and implementation of microarray gene expression markup language (MAGE-ML) [J]. et al. [139] Hermjakob H. Medina I. 2009. Assessing the accuracy of prediction algorithms for classification: an overview [J]. et al.. An Overview of Genomic Sequence Variation Markup Language (GSVML) [J].5(4):501 504 84 . . Woon M. [149] ( ) [M] · [J] [J] 3 2004 473 475 [150] Baldi P. A Novel Method of Protein Secondary Structure Prediction with High Segment Overlap Measure: Support Vector Machine Approach [J]. et al. 30(6):968 977. Nucleic Acids Research. 2006. et al. AMIA Annu Symp Proc. 2002. 23(5):464 470. [141] Nakaya J. Nature Biotechnology. The HUPO PSI's molecular interaction format--a community standard for the representation of protein interaction data [J].31(3):229 235 [146] Hua S J. 3(9):RESEARCH0046. [143] Reumers J. 308: 397 407. Vol. 
Montecchi-Palazzi L. Miller M. The Functional Genomics Experiment model (FuGE): an extensible framework for standards in functional genomics [J]. 2004. 36. Stewart J. The Swiss-Prot Variant Page and the ModSNP Database: A Resource for Sequence and Structure Information on Human Protein Variants [J]. Vol. Lehvaslaiho H. Aebersold R. Thorn C F. Montecchi-Palazzi L. et al.9:27 [148] Vol. [M] 1 [J] [M] 1 2006 37 ( ) 2006 2006 1 . Chauvin Y. Scheib H. Vol. 36(Database issue):D825 D829. et al. [151] [152] 41 [153] Vol.

[167] Camon E. 232(2):584 99. [158] Capriotti E. Vol. upon single point mutations. Sander C. 21(Suppl. Madden T L. [170] Capriotti E. [172] Cheng J. [168] Bodenreider O.edu. Dimmer E. 89: 10915 10919. The hydrophobic moment detects periodicity in protein hydrophobicity [J]. Vol. 34 (Database issue):D322 D326. Schwartz R M. Calabrese R. Barrell D. 1993.pdf. Vol. [165] Eilbeck K. et al. Vol. Schopen M. [160] 2004 37 [161] ( )Cover T M 2007 198 199 [162] Altschul S F. A model for evolutionary change in proteins [J]. Vol. Nucleic Acids Research. 33(Web Server issue):W306 stability W310. The Sequence Ontology: a tool for the unification of genome annotations [J]. www. [164] Nelson S J. The Gene Ontology (GO) project in 2006 [J]. Orcutt B C. 107(Pt1):67 69. Prediction of protein stability changes for single-site 85 [J] [M] 41 [M] 2 1 [J] Vol. 2005. Savage A G. 2005. Interface Design. 25 (17): 3389 3402. 2005. 81(1): 140 144. Vol. Vol. 4(1):5 6. 2004. Nucleic Acids Research. 32 (Database issue):D267 D270. et al. Fariselli P. Terwilliger T C. Journal of Molecular Biology. 6(5):R44 [166] Gene Ontology Consortium. Schäffer A A. [169] 10 13. Casadio R: A neural-network-based method for predicting protein changes 20(Suppl. Nucleic Acids Research. 2004. Proc Natl Acad Sci USA. Bioinformatics.ntu. [171] Capriotti E. Atlas of Protein Sequence and Structure. Mungall C J. Natl Acad. Nucleic Acids Research. 2004. USA.5 (Suppl.. 2006. [156] Hsu C W. [J].2):ii54 ii58. [155] Henikoff S. Amino acid substitution matrices from protein blocks [J]. Casadio R: I-Mutant2. 2004. Bioinformatics. The MeSH Translation Maintenance System: Structure. Vol. A practical guide to support vector classification [DB/OL]. 1997. Vol. Sci. Vol. In Silico Biology. Prediction of protein secondary structure at better than 70% accuracy [J]. [163] Rost B. Fariselli P. et al: Predicting protein stability changes from sequences using support vector machines [J].tw/~cjlin/papers/guide/guide. 1978. Vol. 
36(5): 658 665. Stud Health Technol Inform. Vol. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].0: predicting stability changes upon mutation from the protein sequence or structure [J]. Fariselli P. The Gene Ontology Annotation (GOA) Database±An Integrated Resource of GO Annotations to the UniProt Knowledgebase [J]. Proc. Vol.csie. Lewis S E. Lee V. Genome Biology. Apweiler R . 2003-x-x/2010±04±03. [159] 2009 Vol.[154] Dayshift M O. et al. Weiss R M.2008 35 4 : . The Unified Medical Language System (UMLS): integrating biomedical terminology [J]. 1992. et al.1):i63 i68. [157] Eisenberg D. Henikoff J G. and Baldi P. and Implementation [J]. Randall A. 3): 345 352.

cnki. 20(17):3179 3184. Vol. Proteins.org/wiki/ [DB/OL] . Support vector machines for predicting protein structural 99%A8%E5%AD%A6%E4%B9%A0 2010±02±20/2010±02±20. 18(5):689 696. 15(2):181-90. et al. Liu X J. A novel method for protein secondary structure prediction using dual-layer SVM and profiles [J].. [186] [187] [188] 2010±01±23/2010±01±23. Peptides. J. Chou K C. )Vapnik V N 2004 96 [M] 1 [DB/OL] http://zh. Vol. [183] Zhang T. et al. Proteins. Support vector machines for prediction of protein signal sequences and their cleavage sites [J]. [185] Wang L H. et al.net/WebForms 743. [180] Caragea C. [179] Hansen J E. [173] Guo J. Vol.hudong. Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs [J]. Vol. 2004. Xu X. Zhang H. [181] Kim J H. 2004. Support vector machines for predicting membrane protein types by using functional domain composition [J]. Lin S L. et al. [M] 1 105. 2007.. 2003. et al. Engelbrecht J. et al.mutations using support vector machines [J]. Kanehisa M.. Zhou G P. 3257 3263. et al. Vol. Chou K C. 9:388. Support vector machines for predicting HIV protease cleavage sites in protein [J]. Li Y F. Prediction of O-glycosylation of mammalian of UDP-GalNAc: polypeptide N-acetylgalactosaminyl transferase [J]. Sequence based residue depth prediction using evolutionary information and predicted secondary structure [J]. Prediction of phosphorylation sites using SVMs [J]. 2:3. 19(13):1656 1663. Reifman J. Comput. Vol. Support vector machines with selective kernel scaling for protein classification and identification of key amino acid positions [J]. 2008. Vol. Oh B. BMC Bioinformatics. 2003. Chen H. [184] Zhang H. Sinapov J. 54(4):738 class [J]. Biochem. Stevens F J. 2002.aspx?searchword=%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0 86 . Vol. [189] ( [190] ( )Mitchell T M 2003 15 20. [182] Zavaljevski N. Liu X J. Bioinformatics. /WebDefines. 24(20):2329 2338. 2002. 8: 438. 2010±02±20/2010±02±20. 
BMC Bioinformatics. 2003. Bioinformatics. 84(5). Lee J. 2004.. http://www. Vol. Biophys. Vol. Silvescu A. Chem. Zhou G P. 2006. 62(4):1125 1132. Lund O. Genome Inform. J. [176] Park K J. [175] Cai Y D. [174] Cai Y D. Vol. Predicting protein secondary structure by a support vector machine based on a new coding scheme [J]. Zhang T. Accurate sequence-based prediction of catalytic residues [J]. Glycosylation site prediction using ensembles of Support Vector Machine classifiers [J]. 2008. J. [178] Cai Y D. 23(2):267 proteins: specificity patterns 274. 308(Pt3):801 813. Vol. BMC Bioinformatics. Vol. Chen K.com/wiki/%E6%9C%BA%E5% [DB/OL] http://define.wikipedia. 1995. Chen K. Liu J. Bioinformatics. 24(1):159 161. Bioinformatics. [177] Cai Y D. 2001. Sun Z. et al. Vol. Xu X B.

Cotton 87 .G.alberta Tuszynski Lukasz Kurgan Jack u u Richard.H.

A [186] Machine Learning [187] [188] paradigm Learning by Deduction Learning by Induction Transductive Learning Connectionism Actionism 88 Learning by Analogy Symbolicism Analytical .

Learning Learning / Statistical Learning Reinforcement Learning A.1 Manifold Learning Semi-supervised Learning Multi-instance Learning Ranking / 89 .1 [151] Ensemble A.1 [151] A.

ti . l} R 1 -1 ne3 n i 0 n 1 90 . t i ) xi x i  R n .1 / SVM [152] {( x i . t i  { 1 . i ! 1 .Leaming from Examples concept acquisition [189] ambiguity Concept Label A.  1 }.

x " G b A. t i ! 1 t i [ w .4 w .4 d ! 1/ w A.2 n=3 n G n g (x ) ! w.l A. 2. xi  G  A. i ! 1. x " b ! 0.2 A.Optimal Separating Hyperplane OSH A. x i "  b w .2B G+ G G+ G G- d w . x i "  b e  1. . t i ! 1 w . t i ! 1 G 2 G w 2 . x i "  b ] u 1. x  R n=3 G- w w.1 A. x i "  b u 1.3 G+ G d! G+ d ! 1/ w 2/ w 91 w.3. x i "  b ! 1.

t. i ! 1.3. \ i u 0.5 l Max l Q( ) ! § E i  i !1 1 l § E iE j t i t j 2 i .t.2. x " b * ) x A. j !1 xi .l min A.2.8 92 . i ! 1.2.5 g (x) ! w * . J ( x i ) "  b ] u 1  \ i . t i [ w .t. x " b * ! 0 A.3. x i "  b ] u 1.l A. x j " A.OSH 1 2 w w .5 w* b* y ! f ( x ) ! sgn( w * .l min A.2. E i u 0. . i ! 1.2A J x Hilbert J (x) 1 2 w w . i ! 1.3. . J (x i ) "  b ] u 1.6 s.7 l 1 2 w  C§ \i w .l min A. t i [ w . §t E i i !1 i ! 0.b 2 s.5 y lagrange A.t.b 2 i !1 s.b 2 s. . t i [ w .3. .

2. t i  { 1 . 0 e E i e.2. t ) p (x.3.l Max l Q( ) ! § E i  i !1 1 l § E iE j t i t j 2 i .9 l Max l Q( ) ! § E i  i !1 1 l § E iE j t i t j K 2 i .11 A.2. t ) 93 . J (x j ) " K (x i . k a f 1 a f R( f ) ! ´ c( f (x).t. x j " A.10 s.  1 }. J (x j ) " K (x i . C i ! 1. t i ) l i !1 H A.9 s. x j ) J (x i ). i ! 1 . t )dxdt Remp ( f ) ! 1 l § c( f (x i ). §t E i i !1 i ! 0 .l A. 0 e E i e. J (x j ) " K (x i . x j ) J A.8-10 Ei §t E i i !1 i ! 0.l J (x i ). x j ) ! J (x i ). J ( x j ) " A. t i ) n x i  R n . - . j !1 xi .3. Z ! _. l} H ! _ : R p Z .. j !1 J ( x i ).t.12 R( f ) Remp ( f ) p ( x. t ) c( f (x). x )  b * ) i !1 SV {( x i . C i ! 1.. . A.9-10 C 0 0 Ei l Support Vector y ! f ( x ) ! sgn( § E i t i K ( x i . .
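The support-vector decision function y = sgn(sum over SV of alpha_i t_i K(x_i, x) + b*) can be evaluated directly once the multipliers are known. The sketch below is an illustration under stated assumptions only. The support vectors, multipliers and bias are hand-picked rather than obtained by solving the dual problem, the RBF form exp(-||x - z||^2 / sigma^2) is taken as the kernel, and all names are hypothetical.

```python
import math


def rbf_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / sigma^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / sigma ** 2)


def svm_decision(x, support_vectors, alphas, targets, bias, kernel=rbf_kernel):
    """Return +1 or -1 from the kernel expansion over the support vectors.

    A value of exactly zero is mapped to +1 by convention in this sketch.
    """
    total = bias + sum(
        alpha * target * kernel(sv, x)
        for sv, alpha, target in zip(support_vectors, alphas, targets)
    )
    return 1 if total >= 0 else -1


# Hand-picked values for illustration; not the result of training.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
alphas = [1.0, 1.0]
targets = [-1, 1]
print(svm_decision((1.9, 2.1), support_vectors, alphas, targets, bias=0.0))
```

Only the support vectors enter the sum, which is why the classifier stays compact even when the training set is large.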

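Empirical risk minimization is consistent if, as l \to \infty, both the expected risk and the empirical risk of the selected function f_l converge to the smallest attainable risk (Figure A.3 [152]):

$$R(f_l) \to \inf_{f \in H} R(f), \qquad R_{emp}(f_l) \to \inf_{f \in H} R(f) \qquad (l \to \infty)$$

Figure A.3 [152]

Vapnik [190] showed that, for 0 \le \eta \le 1, the following bound holds with probability 1 - \eta for every f \in H:

$$R(f) \le R_{emp}(f) + \sqrt{\frac{h \left( \ln \frac{2l}{h} + 1 \right) - \ln \frac{\eta}{4}}{l}} \tag{A.13}$$

where h is the VC dimension of H and l is the number of training samples.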
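To see how the confidence term of the bound R(f) <= R_emp(f) + sqrt((h(ln(2l/h) + 1) - ln(eta/4)) / l) behaves, it can be evaluated directly. The sketch below assumes this standard form of the bound; the function names and the sample values of l, h and eta are hypothetical.

```python
import math

def vc_confidence(l, h, eta):
    """Confidence term sqrt((h * (ln(2l/h) + 1) - ln(eta/4)) / l).

    l: number of training samples, h: VC dimension, eta: 1 - confidence level.
    """
    if not (0 < eta <= 1):
        raise ValueError("eta must lie in (0, 1]")
    if not (0 < h <= l):
        raise ValueError("this sketch requires 0 < h <= l")
    return math.sqrt((h * (math.log(2.0 * l / h) + 1.0) - math.log(eta / 4.0)) / l)

def risk_bound(empirical_risk, l, h, eta):
    """Upper bound on the expected risk: empirical risk plus the confidence term."""
    return empirical_risk + vc_confidence(l, h, eta)

# Hypothetical settings: the term shrinks with more samples and grows with capacity.
few_samples = vc_confidence(100, 10, 0.05)
many_samples = vc_confidence(1000, 10, 0.05)
high_capacity = vc_confidence(1000, 50, 0.05)
```

The comparison of the three values illustrates the trade-off behind structural risk minimization: a richer hypothesis space lowers the empirical risk but raises the confidence term.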

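In (A.13), the confidence term depends on the ratio l/h: it is small when l/h is large and grows as l/h decreases (Figure A.4).

Figure A.4 [152]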

Appendix B  Three mutation databases: HGMD, PMD and SNPeffect

B.1 HGMD

The Human Gene Mutation Database (HGMD, http://www.hgmd.cf.ac.uk) is maintained by the Institute of Medical Genetics and collects mutations underlying human inherited disease. A Professional version (HGMD Professional) is also available. HGMD offers a gene-centered web interface in which a gene can be found by gene symbol, gene description, disease/phenotype, OMIM ID or GDB ID. Taking the ARX gene as an example (Figure B.1), a gene page contains the following items.

Gene information: Gene symbol, Gene name, Chromosomal location, Accession number.
cDNA sequence: the sequence viewer (linked to NCBI) and the Extended cDNA view, which shows each exon together with 25 bp of intron at the Splice junctions.
Mutation data by type (Mutation type, Number of mutations; displayed in the Mutation viewer): Missense/nonsense, Splicing, Regulatory, Small deletions, Small insertions, Small indels, Repeat variations, Complex rearrangements.
Mutation data by disease/phenotype: Disease/phenotype, Number of mutations, First published mutation report.
External links: PubMed.

Figure B.1 HGMD entry for the ARX gene

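B.2 PMD

The Protein Mutant Database (PMD) [118] (http://pmd.ddbj.nig.ac.jp/) is maintained by the Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics. It covers protein mutants other than members of the globin and immunoglobulin families. Each PMD entry (Figure B.2) contains the following fields.

Bibliographic fields: ENTRY, AUTHORS, JOURNAL, TITLE.
Protein fields: CROSS-REFERENCE (IDs in Swiss-Prot, PDB and NCBI), PROTEIN, SOURCE, N-TERMINAL, EXPRESSION-SYSTEM (for example Escherichia coli, Yeast, Human kidney 293 cells).
Mutation fields: PURPOSE, CHANGE, FUNCTION, STRUCTURE, STABILITY, EXPRESSION, MATURATION, TRANSPORT, DISEASE.

The effect of a mutation on each property is recorded with a bracketed symbol: [0], [=], [+], [+ +], [-], [- -]. For example, a FUNCTION record may read "Protein synthesis inhibitory activity [-]".

Figure B.2 PMD entry format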
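A PMD FUNCTION record such as "Protein synthesis inhibitory activity [-]" pairs a property with a bracketed effect symbol. The sketch below shows one way such a record could be split and reduced to a neutral versus non-neutral label. The reading of the symbols ([=] no change, [0] loss, + increase, - decrease) and the two-class reduction are assumptions made for illustration, not a specification taken from PMD or from this thesis.

```python
import re

# Property text followed by one bracketed symbol: [0], [=], [+]..[+++], [-]..[---].
_RECORD = re.compile(r"^(?P<prop>.*?)\s*(?P<sym>\[(?:0|=|\+{1,3}|-{1,3})\])\s*$")

def parse_function_record(text):
    """Split 'property [symbol]' into (property, symbol); return None if it does not match."""
    match = _RECORD.match(text)
    if match is None:
        return None
    return match.group("prop"), match.group("sym")

def effect_direction(symbol):
    """Assumed reading of the symbols: [=] no change, [0] loss, + increase, - decrease."""
    if symbol == "[=]":
        return "no change"
    if symbol == "[0]":
        return "loss"
    return "increase" if "+" in symbol else "decrease"

def is_neutral(symbol):
    """Hypothetical two-class label: only [=] counts as neutral."""
    return symbol == "[=]"

record = parse_function_record("Protein synthesis inhibitory activity [-]")
```

Labels of this kind are the sort of target a classifier for function changes would be trained on, but the exact labelling scheme should be taken from the main text rather than from this sketch.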

B.3 SNPeffect

SNPeffect [143] (http://snpeffect.vib.be) is developed by the Switch laboratory, Free University of Brussels, and annotates the effect of SNPs on proteins. It is integrated with PupaSuite (http://pupasuite.bioinfo.cipf.es). SNPeffect provides both a SNP-centered and a protein-centered web interface; SNPs are identified by SNP ID and proteins by ENSEMBL ID. Figure B.3 shows the entry of SNP 23948838 in the protein Cytochrome c oxidase copper chaperone. A SNP entry is organized into the following sections.

Identifiers: SNP ID, Protein ID, Allele string, Validation status.
Sequence: Wild Type and SNP sequences.
Disease: OMIM.
Molecular phenotype (YES/NO for each item):
  Aggregation: Amylogenic regions.
  Structure & Dynamics: Stability, Transmembrane regions.
  Functional Sites: Catalytic sites, Chaperone binding.
  Cellular Processing: Phosphorylation, N-glycosylation, Acetylation.
PupaSuite: Triplex, TFBS, SpliceSites, ESE, ESS, Omegas.
Links: SNP related links, Gene related links, Protein related links.

Figure B.3 SNPeffect entry for SNP 23948838

Appendix C  HERG XML Schema

<?xml version="1.0" encoding="utf-8" ?>
<!--Created with Liquid XML Studio - 30 Day Trial Edition 7.1.5.1419 (http://www.liquid-technologies.com)-->
<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="SequencesAndGenome">
    <xs:all>
      <xs:element name="landmark"><xs:complexType /></xs:element>
      <xs:element name="DNA"><xs:complexType /></xs:element>
      <xs:element name="chromosome_breakpoint"><xs:complexType /></xs:element>
      <xs:element name="RNA"><xs:complexType /></xs:element>
      <xs:element name="protein"><xs:complexType /></xs:element>
      <xs:element name="others"><xs:complexType /></xs:element>
    </xs:all>
  </xs:complexType>
  <xs:complexType name="Sequence_Features">
    <xs:all>
      <xs:element name="DNA">
        <xs:complexType>
          <xs:all>
            <xs:element name="TR_site">

              <xs:complexType />
            </xs:element>
            <xs:element name="exon_and_intron"><xs:complexType /></xs:element>
            <xs:element name="UTR"><xs:complexType /></xs:element>
            <xs:element name="sequence_variant">
              <xs:complexType>
                <xs:all>
                  <xs:element name="SNP"><xs:complexType /></xs:element>
                </xs:all>
              </xs:complexType>
            </xs:element>
            <xs:element name="others"><xs:complexType /></xs:element>
          </xs:all>
        </xs:complexType>
      </xs:element>
      <xs:element name="RNA">
        <xs:complexType>
          <xs:all>
            <xs:element name="splice_site"><xs:complexType /></xs:element>
            <xs:element name="sequence_variant"><xs:complexType /></xs:element>
            <xs:element name="others">
              <xs:complexType />

            </xs:element>
          </xs:all>
        </xs:complexType>
      </xs:element>
      <xs:element name="protein">
        <xs:complexType>
          <xs:all>
            <xs:element name="polypeptide_motif_or_domain"><xs:complexType /></xs:element>
            <xs:element name="catalytic_residue"><xs:complexType /></xs:element>
            <xs:element name="signal_peptide"><xs:complexType /></xs:element>
            <xs:element name="binding_site"><xs:complexType /></xs:element>
            <xs:element name="post_translational_site"><xs:complexType /></xs:element>
            <xs:element name="sequence_variant"><xs:complexType /></xs:element>
            <xs:element name="missense_or_nonsense"><xs:complexType /></xs:element>
            <xs:element name="others"><xs:complexType /></xs:element>
          </xs:all>
        </xs:complexType>
      </xs:element>

    </xs:all>
  </xs:complexType>
  <xs:complexType name="Molecular_Phenome">
    <xs:all>
      <xs:element name="single_molecular">
        <xs:complexType>
          <xs:all>
            <xs:element name="gene_or_protein_property"><xs:complexType /></xs:element>
            <xs:element name="gene_or_protein_function"><xs:complexType /></xs:element>
            <xs:element name="protein_interaction"><xs:complexType /></xs:element>
            <xs:element name="protein_post_translation"><xs:complexType /></xs:element>
            <xs:element name="protein_transport_and_location"><xs:complexType /></xs:element>
            <xs:element name="gene_expression"><xs:complexType /></xs:element>
            <xs:element name="Others"><xs:complexType /></xs:element>
          </xs:all>
        </xs:complexType>
      </xs:element>
      <xs:element name="Multiple_Molecular">
        <xs:complexType>
          <xs:all>

            <xs:element name="transcriptome"><xs:complexType /></xs:element>
            <xs:element name="proteome"><xs:complexType /></xs:element>
            <xs:element name="metabolome"><xs:complexType /></xs:element>
            <xs:element name="epigenome"><xs:complexType /></xs:element>
            <xs:element name="others"><xs:complexType /></xs:element>
          </xs:all>
        </xs:complexType>
      </xs:element>
    </xs:all>
  </xs:complexType>
  <xs:complexType name="Phenome">
    <xs:all>
      <xs:element name="deseases"><xs:complexType /></xs:element>
      <xs:element name="aging"><xs:complexType /></xs:element>
      <xs:element name="others"><xs:complexType /></xs:element>
    </xs:all>
  </xs:complexType>
  <xs:complexType name="NetworksAndPathways">

    <xs:all>
      <xs:element name="signal_transduction"><xs:complexType /></xs:element>
      <xs:element name="cellular_process"><xs:complexType /></xs:element>
      <xs:element name="metabolic_pathway"><xs:complexType /></xs:element>
      <xs:element name="molecular_interaction_network"><xs:complexType /></xs:element>
      <xs:element name="others"><xs:complexType /></xs:element>
    </xs:all>
  </xs:complexType>
  <xs:complexType name="Structures">
    <xs:all>
      <xs:element name="small_molecule_structure"><xs:complexType /></xs:element>
      <xs:element name="nucleic_acid_structure"><xs:complexType /></xs:element>
      <xs:element name="protein_structure"><xs:complexType /></xs:element>
      <xs:element name="carbohydrate_structure"><xs:complexType /></xs:element>
      <xs:element name="others">
        <xs:complexType />

      </xs:element>
    </xs:all>
  </xs:complexType>
  <xs:complexType name="Experiment_Documentation" />
  <xs:complexType name="Terms_And_Nomenclatures" />
  <xs:complexType name="Tools" />
  <xs:complexType name="Others" />
</xs:schema>

List 1: 99 databases (of the 132 mutation databases in Section 2.3.3)

GenAtlas  http://www.genatlas.org/
Genetics Home Reference  http://ghr.nlm.nih.gov/
HAGR  http://genomics.senescence.info/
HCAD  www.pdg.cnb.uam.es/UniPub/HCAD/
HGMD  http://www.hgmd.cf.ac.uk
Human PAML Browser  http://mendel.gene.cwru.edu/adamslab/pbrowser.py
MSY Breakpoint Mapper  http://breakpointmapper.wi.mit.edu
MutDB  http://mutdb.org
OMIM  http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
SNP2NMD  http://variome.kobic.re.kr/SNP2NMD/
ALFRED  http://alfred.med.yale.edu
CTGA  http://www.cags.org.ae
Cypriot national mutation database  http://www.goldenhelix.org/cypriot/
Cytokine Gene Polymorphism Database  http://www.nanea.dk/cytokinesnps/
Database of Genomic Variants  http://projects.tcag.ca/variation/
dbQSNP  http://qsnp.gen.kyushu-u.ac.jp/
dbRIP  http://falcon.roswellpark.org:9090/
dbSNP  http://www.ncbi.nlm.nih.gov/SNP/
D-HaploDB  http://orca.gen.kyushu-u.ac.jp
FINDBase  http://www.findbase.org

pref.ncku.queensu.do http://oncodb.psych.uk/perl/CGP/cosmic http://www.tumor-gene.ddbj.mc.html http://www3.tw http://atlasgeneticsoncology.nlm.org/ http://lifesciencedb.php http://www.kr/SNPatETHNIC/ http://snpeffect.ca/androgendb/ http://genome.edu/pipaslab/ http://www.nig.cn/Index.mcgill.fr/polymorphix/query.uic.be/ http://gila.p hp http://pmd.phenomicdb.tw/TAG/GeneDo c.cn/hptaa/ http://www-p53.edu.kobic.org http://www.ncbi.ua.html http://bioinfo.nih.carpedb.org.cshl.be/pubmeth/ http://snp500cancer.alzgene.fi/BTKbase/ http://www.bioinfo.cz/projects/germline_mut _p53.tw http://matrix.cuni.ims.edu.ca/F-SNP/ http://snp.HCC PubMeth SNP500Cancer SV40 Large T-Antigen Mutant Database Tumor Associated Gene Database Tumor Gene Family Databases (TGDBs) ALPSbase AlzGene Androgen Receptor Gene Mutations DB BGED BTKbase CarpeDB http://compbio.org/tgdf.iarc.ac.cs.nci.nih.ac.pitt.gov/topics/ALPS/ http://www.lf2.sinica.niaid.univ-lyon1.bioengr.nih.jp/ http://bioportal.cchmc.fr/ittaca http://methycancer.iis.tw http://www.hcc.sanger.hgvbaseg2p.cangem.jp/BGED/ http://bioinf.jp/cged/ http://www.uta.mskcc.osaka.binfo.org http://pbil.fc gi?db=cancerchromosomes http://cbio.ibms.u-tokyo.org.gov/entrez/query.F-SNP HapMap Project HGVbase JSNP PhenomicDB PolyDoms Polymorphix Protein Mutant DB SNP@Ethnos SNPeffect & PupaSuite TopoSNP TPMD Atlas of Genetics and Cytogenetics in Oncology and Haematology Cancer Chromosomes CancerGenes CanGEM CGED COSMIC Database of Germline p53 Mutations EHCO HPTAA IARC TP53 Database ITTACA MethyCancer OncoDB.org/index http://snp.htm http://ehco.ugent.edu 109 .ac.curie.de http://polydoms.org http://www.fr/index.org/ http://www.ac.vib.org/cancergenes http://www.nhri.sinica.re.edu/snp/toposnp http://tpmd.bio.gov http://supernova.jp/ http://www.edu.

edu http://gibk26.com/plpmdb/ http://telomerase.nih.mcgill.de/joomla/ http://ymbc.studiofmp.ucl.html http://fmf.upenn.gov/ http://genome.ac.edu/pgdb/ http://www.uk/ncl/ http://neibank.mcgill.nw.edu/ http://pax2.lanl.org http://projects.fi/imt/bioinfo/KinMutBase/ http://research.edu/EpoDB/ http://gold.edu/HIV 110 .db HaemB HbVar HDBase HemBase HORDE HOX-PRO HPMR Human PAX2 Allelic Variant Database Human PAX6 Allelic Variant Database IL2Rgbase Imprinted Gene Catalogue INFEVERS KinMutBase Lowe Syndrome Mutation Database NCL Resource NEIbank PAHdb PGDB PHEXdb RB1 Gene Mutation DB SCAdb T1Dbase The Autism Chromosome Rearrangement DB The Lafora Database 16S and 23S rRNA Mutation Database ProTherm PLPMDB Telomerase database HIV Drug Resistance Database HIV Positive Selection Mutation Database http://www.hiv.casrdb.iephb.gov/scid/ http://igc.fandm.hgu.kcl.mrc.nhgri.le.otago.gov/content/sequence/RE SDB/ http://bioinfo.ac.mcgill.CASRDB Collagen Mutation DB EpoDB GOLD.cnrs.html http://www.nz/home.ym.nei.ucsf.edu/ http://www.ucla.psu.nih.hgu.stanford.bse.ac.kyutech.tugraz.gov http://www.fr/infevers http://www.igh.tcag.org/ http://hembase.uk/genetics/collagen/ http://www.ru/labs/lab38/spirov/ho x_pro/hox-pro00.tcag.uta.asu.cse.mbi.ca/lafora/ http://ribosome.ca/autism http://projects.edu.nhgri.verandi.ca/ http://www.ac.ac.at http://www.uk/ http://pax6.nih.html http://globin.mrc.ac.nih.il/horde/ http://www.edu/hbvar/ http://hdbase.pahdb.tw/sca_ensembl/ http://t1dbase.ac.cbil.ca http://www.gov/lowe/ http://www.jp/jouhou/prothe rm/protherm.niddk.weizmann.ac.html http://receptome.uk/ http://research.uk/ip/petergreen/haemBdat abase.ca http://www.phexdb.

edu/mutbase/ http://metalab.edu/mgs/dbases/agns http://cooke.de/asthmagen/main.org/ http://genetics.hpi.bbk.edu.infobiogen.ac.uta.uk/WebPages/Mai n/main.fr http://www.de http://cooke.gsf.cnr.angis.edu.de/ http://emj-pc.a-star.pedb.hvrbase.uk/Research/Mitbase/mitb ase.pku.u-tokyo.ac.uci.med.uit.unc.edu/cgi-bin/PRMut.kaist.jp/Mutation View/jsp/index.au/ http://mutview.uni-hamburg.cbi.no/GRAP/ http://genisys.ba.de/dt40.lit.ac.uk/ 111 .ac.se/cgi-bin/w3-msql/ptc hbase/index.tigem.net/ http://rmd.mrc.kr:8080 No longer maintained http://syndb.it/LOCAL/drosophila/dros.csc.ac.cybergene.pl http://tbase.cryst.cfm http://tinyGRAP.upenn.stanford.i2r. html http://www3.html http://genoplante-info.med.jax.interactiva.sg/kberg http://bioinformatics.edu/HemoP DB/ http://sdmc.edu/ http://www.html http://www.edu/dnam/mainpage.yale.ebi.dmb.fi/imt/bioinfo/idr/ http://hgbase.ncpgr.htm http://hivdb.ics.html http://research.it/~areamt08/MmtDBWW http://info.SCMD FlyTrap PathBase Rice Mutant Database HAMSTeRS HIV-RT HvrBase Online Mendelian Inheritance in Animals KMDB http://yeast.keio.yale.cn http://www.jp/ http://flytrap.k.gi.org.pl http://eyesite.pathbase.jsp 33 MmtDB Mutation Spectra Database p53 Databases DRESH MitBASE Transgenic/Targeted Mutation Database DT40 FLAGdb/FST IDR HGBASE AGNS Asthma and Allergy Database Asthma Gene Database GRAP Mutant Databases GeniSys T-REGs SynDB Prostate Expression DB PTCH1 Mutation DB KBERG HemoPDB ERGDB EyeSite http://www.gsf.org/ http://www.wistar.org.org/ http://omia.cgi http://www.med.cn/ http://europium.sg/ergdb/cgi-bin/explore.ac.

jp/EICODB/ http://angiodb.edu/ Incorporated into TGDBs.ceinge.ucsf.html http://www. no. rodent lacI and rodent lacZ databases HGVS Databases SNAP DG-CST FESD No longer maintained http://fantom2. human hprt. 155 http://www.kr/FESD/ 112 .org/dnam/mainpage.re.kribb.snu.unina.gsc.humgen.ibiblio.au.DENIZ EICO DB AngioDB BayGenomics Oral Cancer Gene DB Human p53.it/ http://sysbio.kr/ http://baygenomics.hgvs.dk/ http://dgcst.ac.org/ http://platform.riken.

Shan Gao. Guangyou Duan.454) [d] ZHANG Ning. 263(3): 360-368. 113 .1977 2004 2007 1 2007 24 1995 SCI 4 [a] Shan Gao. The Interstrand Amino Acid Pairs Play a Significant Role in Determining the Parallel or Antiparallel Orientation of beta-Strands.2009. 2009 30 (8): 1161-1166. Prediction of function changes associated with single-point protein mutations using support vector machines (SVMs) [J]. (IF= 2. 2010 Accepted. Ning Zhang.Journal of Theoretical Biology.DOI 10. Tao Zhang.5162547 [f] Ning Zhang. Human Mutation. ZHANG Tao. Jishou Ruan. Shan Gao. 386: 537-543.648) [c] Ning Zhang.1109/ICBBE.924) a universal software tool for DNA/Protein sequence relationship visualization based on undirected graphs. ICBBE 2009. Zhuo Yang. 2009. Guang-You Duan. Jishou Ruan. (IF= 7.Bioinformatics and Biomedical Engineering.SRD Journal of Biomedical Informatics.033) [b] Ning Zhang. Tao Zhang Component Vector method and its application in detecting similarities between sequences. YANG Zhuo. Biochemical and Biophysical Research Communications. Shan Gao. (IF= 2. Guangyou Duan. 2009. DUAN Guangyou. 3rd International Conference on 11-13 June 2009 Page(s): 1 . Ji S Ruan. Zhuo Yang. EI (2 ) [e] Guang-You Duan. You G Duan. StrandPairsViewer a toolkit for visualization and analysis of amino acids pairs in protein sheet structures.3. Zhuo Yang. Shan Gao. Tao Zhang. GAO Shan. Prediction of the Parallel / Antiparallel Orientation of Beta-Strands Using Amino Acid Pairing Preferences and Support Vector Machines. (IF= 1. Tao Zhang. 2010. Ning Zhang. Tao Zhang.

Jishou Ruan. Ning Zhang. 2009. 2010: under review. . 2010 2009 25 17. Ji S Ruan. . 2010: under review. Tao Zhang. You G Duan. . Zhuo Yang. Shan Gao. Interoperate and Study molecular biology Databases. Ning Zhang.DOI 10. Ning Zhang. Identifying non-neutral amino acid substitutions by SVMs. ICBBE 2009.1109/ICBBE. Tao Zhang. 3rd International Conference on 11-13 June 2009 Page(s): 1 .2009. Tao Zhang. HERG A Model to Describe.5163427 3 [g] Shan Gao. Visual Basic 5722-5725. [k] Guangyou Duan.Bioinformatics and Biomedical Engineering.Journal of Theoretical Biology. You G Duan.4. [h] 24 [i] . 114 . 2009 30 2 [j] Shan Gao. Improved splice site prediction using sequence information and singular value decomposition.