Intelligent Computing in Bioinformatics 10th International Conference ICIC 2014 Taiyuan China August 3 6 2014 Proceedings 1st Edition De-Shuang Huang
De-Shuang Huang
Kyungsook Han
Michael Gromiha (Eds.)
LNBI 8590
Intelligent Computing
in Bioinformatics
10th International Conference, ICIC 2014
Taiyuan, China, August 3–6, 2014
Proceedings
Lecture Notes in Bioinformatics 8590
Volume Editors
De-Shuang Huang
Tongji University
Machine Learning and Systems Biology Laboratory
School of Electronics and Information Engineering
4800 Caoan Road, Shanghai 201804, China
E-mail: dshuang@tongji.edu.cn
Kyungsook Han
Inha University
Department of Computer Science and Engineering
Incheon, South Korea
E-mail: khan@inha.ac.kr
Michael Gromiha
Indian Institute of Technology (IIT) Madras
Department of Biotechnology
Chennai 600 036, Tamilnadu, India
E-mail: gromiha@iitm.ac.in
authors, the success of the conference would not have been possible. Finally,
we are especially grateful to the IEEE Computational Intelligence Society, the
International Neural Network Society, and the National Science Foundation of
China for their sponsorship.
General Co-chairs
Publication Chair
Phalguni Gupta, India
Tutorial Chair
Laurent Heutte, France
International Liaison
Prashan Premaratne, Australia
Publicity Co-chairs
Exhibition Chair
Chun-Hou Zheng, China
Reviewers
Tao He
Selena He
Xing He
Feng He
Van-Dung Hoang
Tian Hongjun
Lei Hou
Jingyu Hou
Ke Hu
Changjun Hu
Zhaoyang Hu
Jin Huang
Qiang Huang
Lei Huang
Shin-Ying Huang
Ke Huang
Huali Huang
Jida Huang
Xixia Huang
Fuxin Huang
Darma Putra i Ketut Gede
Haci Ilhan
Sorin Ilie
Saiful Islam
Saeed Jafarzadeh
Alex James
Chuleerat Jaruskulchai
James Jayaputera
Umarani Jayaraman
Mun-Ho Jeong
Zhiwei Ji
Shouling Ji
Yafei Jia
Hongjun Jia
Xiao Jian
Min Jiang
Changan Jiang
Tongyang Jiang
Yizhang Jiang
He Jiang
Yunsheng Jiang
Shujuan Jiang
Ying Jiang
Yizhang Jiang
Changan Jiang
Xu Jie
Jiening Xia
Taeseok Jin
Jingsong Shi
Mingyuan Jiu
Kanghyun Jo
Jayasudha John Suseela
Ren Jun
Fang Jun
Li Jun
Zhang Junming
Yugandhar K.
Tomáš Křen
Yang Kai
Hee-Jun Kang
Qi Kang
Dong-Joong Kang
Shugang Kang
Bilal Karaaslan
Rohit Katiyar
Ondrej Kazik
Mohd Ayyub Khan
Muhammed Khan
Sang-Wook Kim
Hong-Hyun Kim
One-Cue Kim
Duangmalai Klongdee
Kunikazu Kobayashi
Yoshinori Kobayashi
Takashi Komuro
Toshiaki Kondo
Deguang Kong
Kitti Koonsanit
Rafal Kozik
Kuang Li
Junbiao Kuang
Baeguen Kwon
Hebert Lacey
Chunlu Lai
Chien-Yuan Lai
David Lamb
Wei Lan
Chaowang Lan
Qixun Lan
Yee Wei Law
Machine Learning
Predicting the Outer/Inner Beta Strands in Protein Beta Sheets Based on the Random Forest Algorithm
Li Tang, Zheng Zhao, Lei Zhang, Tao Zhang, and Shan Gao
Neural Networks
Tumor Clustering Using Independent Component Analysis and Adaptive Affinity Propagation
Fen Ye, Jun-Feng Xia, Yan-Wen Chong, Yan Zhang, and Chun-Hou Zheng
Image Processing
Multi-scale Level Set Method for Medical Image Segmentation without Re-initialization
Xiao-Feng Wang, Hai Min, Le Zou, and Yi-Gang Zhang
Predicting the Outer/Inner Beta Strands in Protein Beta Sheets Based on the Random Forest Algorithm
Li Tang (1,2,*), Zheng Zhao (1), Lei Zhang (3,*), Tao Zhang (3,**), and Shan Gao (3,**)
1 School of Computer Science and Technology, Tianjin University, Tianjin, P.R. China
tangli0831@yeah.net
2 Information Science and Technology Department, Tianjin University of Finance and Economics, Tianjin, P.R. China
3 Key Lab. of Bioactive Materials, Ministry of Education and The College of Life Sciences, Nankai University, Tianjin, P.R. China
{zhni,zhangtao,gao_shan}@nankai.edu.cn
Abstract. The beta sheet, one of the common forms of regular secondary
structure in proteins, plays an important role in protein function. The
beta strands in a beta sheet can be classified as outer or inner strands.
Because protein primary sequences carry information that determines how
strands are arranged in the beta sheet topology, we introduce an approach
that uses the random forest algorithm to predict whether a beta strand is an
outer or an inner strand. We use nine features to describe a strand, based on
the hydrophobicity, the hydrophilicity, the side-chain mass and other
properties of the beta strands. Among five machine learning methods, the
random forest classifier reaches the best prediction accuracy, 89.45%, under
10-fold cross-validation. This result demonstrates that there are significant
differences between the outer and the inner beta strands in beta sheets. The
finding in this study can be used to arrange beta strands in a beta sheet
without any prior structure information. It can also help to better understand
the mechanisms of protein beta sheet formation.
Keywords: beta sheet, beta strand, protein secondary structure, random forest
algorithm.
1 Introduction
* These authors contributed equally to this paper.
** Corresponding author.
D.-S. Huang et al. (Eds.): ICIC 2014, LNBI 8590, pp. 1–9, 2014.
© Springer International Publishing Switzerland 2014
contain beta sheets. When predicting the structures of these beta-sheet-containing
proteins, assigning beta strands to a beta sheet can reduce the search space of
ab initio methods [13, 45]. Moreover, beta sheets play important roles in protein
function, particularly in the formation of the protein aggregates observed in
many human diseases, notably Alzheimer's disease.
Fig. 1. Illustration of beta strand pairs and configurations. Arrows show the amide (N) to
carbonyl (C) direction of beta strands. Hydrogen bonds are represented by dotted lines.
In a beta sheet, beta strands are paired through hydrogen bonds in a parallel
or antiparallel arrangement (Fig. 1). A beta sheet forms a topology (Fig. 2b), which
can be described by three components: the group of beta strands in the beta sheet,
the order of these beta strands at the sequence level (Fig. 3), and the configuration
of the beta strand pairs (parallel or antiparallel). The order in which beta strands
are arranged in a beta sheet topology differs from their order at the sequence level
(Fig. 2). As described in the Protein Data Bank Contents Guide [6], beta strands are
listed and numbered according to their order at the sequence level. In this study,
the first and the last strand in a beta sheet are defined as outer strands, whereas
the other strands are defined as inner strands (Fig. 3).
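The outer/inner definition above can be illustrated with a minimal sketch that reads PDB SHEET records and labels the lowest- and highest-numbered strand of each sheet as outer. The function names are hypothetical, the fixed column positions follow the PDB file format's SHEET record layout, and chain identifiers and insertion codes are ignored for brevity; this is not the authors' own pipeline.

```python
from collections import defaultdict


def parse_sheet_records(pdb_lines):
    """Collect (strand_no, start_res, end_res) per sheet ID from SHEET records."""
    sheets = defaultdict(list)
    for line in pdb_lines:
        if not line.startswith("SHEET"):
            continue
        strand_no = int(line[7:10])     # columns 8-10: strand number within the sheet
        sheet_id = line[11:14].strip()  # columns 12-14: sheet identifier
        start_res = int(line[22:26])    # columns 23-26: first residue number
        end_res = int(line[33:37])      # columns 34-37: last residue number
        sheets[sheet_id].append((strand_no, start_res, end_res))
    return sheets


def label_strands(sheets):
    """Label the first and last strand of every sheet 'outer', all others 'inner'."""
    labels = {}
    for sheet_id, strands in sheets.items():
        ordered = sorted(strands)  # sort by strand number
        last = len(ordered) - 1
        for pos, (strand_no, _start, _end) in enumerate(ordered):
            labels[(sheet_id, strand_no)] = "outer" if pos in (0, last) else "inner"
    return labels
```

Calling `label_strands(parse_sheet_records(lines))` on the lines of a PDB file returns a dictionary keyed by `(sheet_id, strand_no)`; a two-strand sheet yields two outer strands and no inner ones.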
Many studies have sought to understand beta sheet topology at different levels.
The mechanisms and rules of beta sheet formation have been investigated and
simulated by theoretical and experimental methods [7-10]. Some efforts focus on
predicting residue contact maps, which can be used to construct the beta sheet
topology [11, 12]. Other researchers have predicted parallel/antiparallel
beta strand pairs [13-15], based on the non-random distribution and pairing
Fig. 2. (a) Seven strands of protein 1VJG in sequence order. (b) Beta-sheet topology of protein
1VJG. (c) Protein 1VJG rendered in Rasmol.
Because the two outer beta strands occupy the first and last positions in a
beta sheet, we hypothesize that outer and inner strands have different conserved
properties at the sequence level. In this study, we predict the outer/inner beta
strands in beta sheets using the Random Forest algorithm (see Materials and
Methods), with features extracted from the protein primary sequences.
Fig. 3. Illustration of beta strand numbering in the beta sheets of protein 1VJG in the PDB
file. The numbers 1 and 5, marked with circles, denote the outer beta strands, whereas the
numbers marked with boxes denote inner ones. Strand number 1, spanning residues 43 to 51,
corresponds to the second strand in sequence order.
2.1 Datasets
The protein structure dataset we used comes from the PISCES database server
established by Wang et al. [22, 23]. To examine classification accuracy precisely
via cross-validation, an appropriate sequence identity cutoff is necessary to
avoid redundancy and homology bias [24, 25]. PISCES combines PSI-BLAST and
structure-based alignments to determine sequence identities, and produces lists
of sequences from the Protein Data Bank (PDB) using a number of entry- and
chain-specific criteria and mutual sequence identity according to the needs of
the study. In our investigation, we use a non-redundant dataset
(cullpdb_pc25_res2.0_R0.25_d090516_chains4260) with a sequence identity cutoff
of 25%. The crystal structures have a resolution of 2.0 Å or better and an
R-factor below 0.25. We import the set into the Sheet Pair Database [26] for
easier data management and screening. Unsuitable samples, such as protein chains
that contain non-standard amino acids or disordered regions [27, 28], and any
patterns with a chain break or heteroatom, are excluded. We treat the outer beta
strands as positive samples and the inner ones as negative samples (Fig. 3). The
final dataset contains 1,205 proteins with 11,424 outer beta strands and 13,285
inner ones.
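The screening step can be sketched roughly as follows. The helper names are hypothetical, a chain break is approximated as a gap in consecutive residue numbers (insertion codes are ignored), and the checks for disordered regions and heteroatoms are left out because the excerpt does not describe how they were applied.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # one-letter codes of the 20 standard amino acids


def has_only_standard_residues(sequence):
    """True if every residue is one of the 20 standard amino acids."""
    return all(res in STANDARD_AA for res in sequence.upper())


def has_chain_break(residue_numbers):
    """True if consecutive residue numbers are not contiguous."""
    return any(b - a != 1 for a, b in zip(residue_numbers, residue_numbers[1:]))


def keep_strand(sequence, residue_numbers):
    """Keep a strand only if it passes both screening checks."""
    return has_only_standard_residues(sequence) and not has_chain_break(residue_numbers)
```

For example, `keep_strand("VKVAV", [43, 44, 45, 46, 47])` keeps the strand, while a sequence containing `X` or a jump in residue numbering is dropped.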
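\[
x_1 = \frac{1}{L}\sum_{i=1}^{L} H_1(R_i), \quad
x_2 = \prod_{i=1}^{L} H_1(R_i), \quad
x_3 = \frac{1}{L}\sum_{i=1}^{L} H_1(R_i), \quad
x_4 = \frac{1}{L}\sum_{i=1}^{L} H_2(R_i) \tag{1-4}
\]
\[
x_5 = \prod_{i=1}^{L} H_2(R_i), \quad
x_6 = \frac{1}{L}\sum_{i=1}^{L} H_2(R_i), \quad
x_7 = \frac{1}{L}\sum_{i=1}^{L} M(R_i) \tag{5-7}
\]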
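\[
x_8 = \frac{1}{L-1}\sum_{i=1}^{L-1}\Theta(R_i, R_{i+1}), \quad
x_9 = \frac{\dfrac{w}{L-1}\sum_{i=1}^{L-1}\Theta(R_i, R_{i+1})}
           {\sum_{i=1}^{20} f_i + \dfrac{w}{L-1}\sum_{i=1}^{L-1}\Theta(R_i, R_{i+1})} \tag{8-9}
\]
\[
H_1(R_i) = \frac{H_1^0(R_i) - \sum_{i=1}^{20}\dfrac{H_1^0(R_i)}{20}}
                {\sqrt{\dfrac{\sum_{i=1}^{20}\left(H_1^0(R_i) - \sum_{i=1}^{20}\dfrac{H_1^0(R_i)}{20}\right)^2}{20}}} \tag{11}
\]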
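The normalization of Eq. (11) and several of the feature forms can be sketched as follows. The sketch covers the mean form (x1, x4, x7), the product form (x2, x5) and the coupling features (x8, x9). Several parts are assumptions: the Kyte-Doolittle hydropathy values stand in for the raw scale H1^0, which this text does not name; `theta_h1` is a hypothetical coupling function, since the definition of Θ is not given here; and f_i is taken to be the frequency of amino acid i in the strand.

```python
import math
from collections import Counter

# Assumption: Kyte-Doolittle hydropathy values as a stand-in for the raw scale H1^0.
RAW_H1 = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def normalize(raw):
    """Eq. (11): subtract the mean over the 20 amino acids, divide by the population SD."""
    n = len(raw)  # 20 amino acids
    mean = sum(raw.values()) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in raw.values()) / n)
    return {aa: (v - mean) / sd for aa, v in raw.items()}


H1 = normalize(RAW_H1)


def mean_feature(seq, scale):
    """Mean of a property over the strand (the form of x1, x4 and x7)."""
    return sum(scale[r] for r in seq) / len(seq)


def product_feature(seq, scale):
    """Product of a property over the strand (the form of x2 and x5)."""
    return math.prod(scale[r] for r in seq)


def theta_h1(a, b):
    """Hypothetical coupling function: squared difference of normalized H1 values."""
    return (H1[b] - H1[a]) ** 2


def coupling_feature(seq, theta):
    """x8: mean coupling between sequence-adjacent residues."""
    return sum(theta(a, b) for a, b in zip(seq, seq[1:])) / (len(seq) - 1)


def weighted_coupling_feature(seq, theta, w):
    """x9: weighted coupling term divided by (sum of f_i + weighted coupling term)."""
    counts = Counter(seq)
    freq_sum = sum(counts[aa] / len(seq) for aa in RAW_H1)  # sum of f_i (assumed frequencies)
    weighted = w * coupling_feature(seq, theta)
    return weighted / (freq_sum + weighted)
```

Because the normalized scale has zero mean and unit variance over the 20 amino acids, features built from different properties are on comparable scales. The weight `w` is a free parameter whose value is not stated in this text.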
trees. RF has good predictive performance even when the dataset is noisy
[36]. As the number of decision trees increases, RF avoids the overfitting problem
and the dependence on the training data sets [35]. Owing to these properties,
Random Forest has been applied successfully to many classification and
prediction problems in varied biological fields [37-41]. In this study, we used the
Weka software package [42] to implement the RF classification of the outer/inner
beta strands. Three parameters control RF in Weka developer version 3.6.2:
I, the number of trees constructed in the forest; K, the number of features
considered when defining each node in a tree; and S, the random number seed. In
this study, we used the default settings without model selection.
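To make the roles of I, K and S concrete, here is a toy sketch in plain Python. It is not Weka's implementation: each "tree" is a single threshold split (a decision stump) fitted on a bootstrap sample, whereas a real random forest grows full trees and draws a fresh feature subset at every node. All function names and default values are illustrative assumptions.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, k, rng):
    """Find the best single split among k randomly chosen features (parameter K)."""
    best = None  # (impurity, feature, threshold, left_label, right_label)
    for f in rng.sample(range(len(X[0])), k):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def fit_forest(X, y, n_trees=10, k=None, seed=1):
    """n_trees plays the role of I, k of K, and seed of S."""
    rng = random.Random(seed)
    n_features = len(X[0])
    if k is None:
        k = min(n_features, int(math.log2(n_features) + 1))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample with replacement
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], k, rng))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]
```

Fixing the seed makes the bootstrap samples and feature subsets reproducible, which is the purpose of S. Raising `n_trees` reduces the variance of the vote at the cost of training time.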
To assess the performance of the classifiers, we used the following measures: the
number of true positives (TP), the number of false positives (FP), the number of
true negatives (TN), the number of false negatives (FN), sensitivity of positive
examples (Sn+), specificity of positive examples (Sp+), sensitivity of negative
examples (Sn-), specificity of negative examples (Sp-), accuracy (Ac), and the
Matthews correlation coefficient (MCC), as described in [43].
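These measures can be computed from the four confusion-matrix counts, as in the sketch below. The formulas for Sn+, Sp+, Sn- and Sp- follow a convention common in bioinformatics, where "specificity" of a class is the fraction of predictions for that class that are correct. This is an assumption, because the exact definitions are given in [43] and not in this text. Accuracy and MCC use their standard definitions.

```python
import math


def classification_measures(tp, fp, tn, fn):
    """Return Sn+, Sp+, Sn-, Sp-, Ac and MCC from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Sn+": tp / (tp + fn),  # positives correctly recovered
        "Sp+": tp / (tp + fp),  # assumed: correct fraction of positive predictions
        "Sn-": tn / (tn + fp),  # negatives correctly recovered
        "Sp-": tn / (tn + fn),  # assumed: correct fraction of negative predictions
        "Ac": (tp + tn) / (tp + fp + tn + fn),
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


measures = classification_measures(tp=50, fp=10, tn=30, fn=10)
```

Unlike accuracy, MCC stays informative when the two classes are unbalanced, which is why it is usually reported alongside Ac.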
3 Results
Note
Measurements:
Head 18 mm
Body 41 mm
Ear 14 × 10 mm
Anterior margin of ear 8.5 mm
Tragus (length from the lowest point of the base to the tip) 5 × 2 mm
Humerus 24 mm
Forearm 36 mm
Digit 1 with claw 6.3 mm
Digit 2 (33.5 + 5.5) 39 mm
Digit 3 (35 + 11.5 + 9 + 3) 58.5 mm
Digit 4 (33 + 11 + 5) 49 mm
Digit 5 (29 + 5.5 + 4.7) 39.2 mm
Femur 12 mm
Lower leg 15.5 mm
Foot with claws 10 mm
Calcar c. 13 mm
Tail c. 37 mm
Penis 9 mm
a–l. 11 specimens in spirit: 3 males (36–37.5 mm), 7 females (36–38), 1 juvenile
female (24); Kema (4 VIII 93), Tomohon (4 IX 94), etc., Minahassa, North Celebes.
m. 1 male in spirit, Loka, Peak of Bonthain, c. 1200 m, South Celebes, X 95 (33 mm).
This specimen is small but adult.
The species, which has been recorded from the Indian subcontinent to the Moluccas,
has already been recorded specifically from North Celebes (Dobson Cat. 1878, 317
and Jentink Cat. MPB. XII, 190 1888), whereas the Sarasin locality in South Celebes
is new. Dresden also holds it from NW New Guinea (36 mm).
1 Designations following Hensel.
2 Dobson (Cat. 1878, 227) says of V. abramus that it has a relatively larger penis
than all other bats; Jentink (Webers Zool. Erg. I, 128 1890) lists a young specimen
from Sumatra (forearm only 14.5 mm) whose penis is 4.5 mm long.
3 These measurements do not agree exactly with those of the illustration on Plate IV,
Figure 3, because the latter is drawn in perspective.
Emballonuridae
Molossi
Plate IV Figs. 4–6, Plate X Figs. 3, 4 and 28, and Plate XI Figs. 2 and 2a
Measurements:
Head 25 mm
Body 45 mm
Ear 21 × 15 mm
Tragus (length measured from the projection) 2 × 1.2 mm
Humerus 29 mm
Forearm 40 mm
Digit 1 with claw 8.5 mm
Digit 2 (40 + 2) 42 mm
Digit 3 (39 + 18 + 15 + 7.5) 79.5 mm
Digit 4 (37.5 + 14 + 10 + 1.7) 63.2 mm
Digit 5 (23.5 + 11.5 + 4 + 1.6) 40.6 mm
Femur 16 mm
Lower leg 14.5 mm
Foot with claws 13 mm
Calcar 8 mm
Tail (of which 23 mm free) 34 mm
Plate IV Fig. 4 shows a dorsal view of the whole animal at natural size, Fig. 5
a side view of the head at twice natural size, and Fig. 6 the tragus at four
times natural size.
Note
No throat pit.
Measurements:
Head 25 mm
Body 41 mm
Ear 15 × 16 mm
Tragus 2 × 1.6 mm
Humerus 26 mm
Forearm 36 mm
Digit 1 with claw 7 mm
Digit 2 33 mm
Digit 3 (35 + 14 + 12.5 + 6) 67.5 mm
Digit 4 (33 + 11 + 9 + 1.5) 54.5 mm
Digit 5 (22 + 8 + 4 + 1.5) 35.5 mm
Femur (?) 15 mm
Lower leg 13.5 mm
Foot with claws 9 mm
Calcar 11 mm
Tail (of which 15 mm free) 34 mm