Professional Documents
Culture Documents
O r i g i nal Resea r c h
Open Access
Full open access to this and
thousands of other papers at
http://www.la-press.com.
Abstract: Effective representation of DNA sequences is one of the important tasks in the study of genome sequences. In this paper, we
propose a graphical representation of DNA sequences based on nucleotide ring structure. In the proposed representation, we convert
DNA sequences into 16 dinucleotides on the surface of the hexagon so that it can preserve nucleotides chemical property and positional
information. Our approach can provide capability of efficient similarity comparison between DNA sequences and also high comparison
accuracy. Furthermore, our approach satisfies uniqueness and no degeneracy of DNA sequences. In the experimental study, we use phylogeny analysis for evolutionary relationship among different species. Extensive performance study shows that the proposed method
can give better performance than existing methods in comparison with the degree of similarity.
Keywords: -globin gene, DNA curve, hexagon, ring structure
251
Bari etal
Introduction
Related Works
r epresentation containing uniqueness and no degeneracy is another contribution in the era of DNA
sequence visualization.
On the other hand, methods which are based on
non-graphical representation must also ensure that no
information is lost. If not, the result would be compatible but not precise with other methods that have no
loss in conversion. In this paper, we converted DNA
sequences into DNA curves without any loss of information and degeneracy.
keto
keto
purine
pyrimidine
pyrimidine
amino
strong-H
pyrimidine
strong-H
amino
a) Adenine (A)
b) Thymine (T)
c) Guanine (G)
d) Cytosine (C)
purine
purine
amino
keto
amino
pyrimidine
weak-H
purine
keto
weak-H
weak-H
amino
strong-H
pyrimidine
strong-H
keto
weak-H
purine
weak-H
purine
strong-H
pyrimidine
weak-H
keto
253
Bari etal
Table 1. 3D coordinates of ATACGATGCAG based on the proposed method.
Points
Dinucleotide
P1
P2
P3
P4
P5
P6
P7
P8
P9
P10
AT
TA
AC
CG
GA
AT
TG
GC
CA
AG
Cycle 1
Cycle 2
0.5
-0.5
-1.0
0
1.0
1.5
0.5
0
1.0
1.0
1.25
2.25
3.5
2.5
2.5
3.75
2.75
1.5
2.5
4.0
1
2
3
4
5
6
7
8
9
10
1
1
1.5
1.5
2
3
2
1
2
3
0
1.5
2.75
1.25
0
0
1
1
0
1
1
2
3
4
5
6
7
8
9
10
AC
TA(1,1)
6
1 AG(0,15)
AT
CA (1,1)
2
AA
CC
TT(0.5,0)
GA(1,0)
(0,0)
TC(1,0)
GG
TG(1,1)
5
GC
CG(1,1)
3
GT
4 CT(0,1.5)
254
= (x si si +1 , ysi si +1 , i),
i = 1, 2, 3, ..., N-1
Cycle 3
Cycle 4
Cycle 5
Cycle 6
0.5
1.5
2.5
1.5
1.0
1.5
1.5
1
1
2
-1.25
-0.25
-0.25
-1.25
-2.5
-3.75
-2.25
-1.0
-2.5
-3.5
1
2
3
4
5
6
7
8
9
10
-0.5
0.5
1.0
0
-1.0
-1.5
-0.5
0
-1
-1
-1.25
-2.25
-3.5
-2.5
-2.5
-3.75
-2.75
-1.5
-2.5
-4
1
2
3
4
5
6
7
8
9
10
-1
-1
-1.5
-1.5
-2
-3
-2
-1
-2
-3
0
-1.5
-2.75
-1.25
0
0
-1
-1
0
-1
1
2
3
4
5
6
7
8
9
10
-0.5
-1.5
-2.5
-1.5
-1.0
-1.5
-1.5
-1.0
-1.0
-2.0
1.25
0.25
0.25
1.25
2.5
3.75
2.25
1.0
2.5
3.5
1
2
3
4
5
6
7
8
9
10
Experimental Analysis
Performance metric, dataset
and experimental environment
10
10
0
4
0 -1
0
4
10
Bari etal
-2
10
-1
-4 -2
0
4
10
-2
-2
-3
0 1
d) Cycle 4
0
0
-1
-2
-3
c) Cycle 3
10
0
0
-1
b) Cycle 2
a) Cycle 1
0 -3
0
0
-1
e) Cycle 5
-2
-4 0
f) Cycle 6
Figure 4. The graphical representation of the proposed model for the example sequence ATACGATGCAG.
50
50
50
0
50
0
-50 -10
-5
0
50
100
100
100
a) Human
0
-50 -10
-5
0
50
b) Chimpanzee
50
50
50
0
50
-50 -10
-5
d) Lemur
-50 -20
-10
50
50
50
Z
Y
-50 -20
-10
0
50
10
g) Rabbit
0
-50 -10
-5
h) Goat
50
0
50
-5
j) Gallus
-10
10
-50 -10
-5
i) Bovine
50
-50 -10
100
-50 -20
0
50
100
100
100
f) Rat
100
0
50
10
e) Mouse
0
50
-5
100
100
0
50
-50 -10
c) Gorilla
100
0
20
0
-20 -10
-5
k) Opossum
256
30
Human
Chimpanzee
Gorilla
Lemur
Rat
Mouse
Rabbit
20
Goat
Bovine
10
Opossum
Gallus
60
50
40
0
50
40
30
20
10
-10
-4
-3
-2
-1
Species
ID/Accession
Database
Length
Human
Chimpanzee
Gorilla
Lemur
Rat
Mouse
Rabbit
Goat
Bovine
Opossum
Gallus
U01317
X02345
X61109
M15734
X06701
V00722
V00882
M15387
X00376
J03643
V00409
NCBI
NCBI
NCBI
NCBI
NCBI
NCBI
NCBI
NCBI
NCBI
NCBI
NCBI
92
105
93
92
92
93
92
86
86
92
92
ux =
1
1
x i , u y = 1 yi , u z = z i
N
N
N
i =1
i =1
i =1
= u 2x + u 2y + u 2z
D i,j =
1j
3j
1i
2i 2j 3i
+
+
N + 1 N + 1 N + 1 N + 1 N + 1 N + 1
2
2
2
4j
6j
5j
6i
4i
5i
+
N + 1 - N + 1 + N + 1 - N + 1 + N + 1 - N + 1
Bari et al
Table 3.
pecies
uman
Chimpanzee
orilla
Lemur
at
Mouse
abbit
oat
Bovine
Opossum
allus
ycle 1
ycle 2
ycle 3
ux
uy
uz
ux
uy
uz
ux
uy
uz
-4.489
-4.7981
-4.5163
-2.3132
-6.7802
-5.6882
-2.3736
-2.7824
-1.7824
-1.8571
-2.4615
-15.0275
-17.5024
-15.2255
-12.6181
-11.8489
-18.2285
-13.6401
-16.1882
-14.6706
-3.8159
-11.5604
46
52.5
46.5
46
46
47
46
43
43
46
46
-9.81
-10.88
-9.9
-6.71
-9.78
-13
-7.31
-9.106
-7.747
-1.92
-5.9
-3.75
-4.89
-3.85
-2.23
1.53
-1.65
-4.88
-4.98
-4.76
0.266
-5.24
46
52.5
46.5
46
46
47
46
43
43
46
46
-6.74725
-7.59135
-6.82065
-3.65385
-3.74176
-7.8172
-5.9011
-6.68235
-6.06471
0.489011
-4.34615
5.947802
5.947115
5.942935
6.035714
10.74725
13.25
2.239011
5.679412
4.447059
2.379121
0.898352
46
52.5
46.5
46
46
47
46
43
43
46
46
ycle 1
ycle 2
ycle 3
ycle 4
ycle 5
ycle 6
48.60017
55.54823
49.13718
47.75529
47.98299
50.73099
48.03838
46.03042
45.46871
46.19535
47.49423
47.18367
53.83806
47.69782
46.54027
47.05305
48.79265
46.83215
44.23482
43.95081
46.04082
46.67191
46.87112
53.37834
47.37182
46.53795
47.38675
49.45373
46.43098
43.88519
43.65269
46.06408
46.21359
46.98303
53.51123
47.48703
46.88248
47.01907
49.95977
46.6677
44.35377
44.18473
46.03293
46.12239
46.55629
53.13654
47.06175
46.75461
46.9441
48.54146
46.33272
43.81217
43.65795
46.24424
46.27557
47.55673
54.30821
48.07599
47.3303
48.01026
49.74948
46.93223
44.79453
44.31434
46.36385
46.96276
himpanzee
0.0020
Gorilla
Lemur
Rat
Mouse
Rabbit
Goat
Bovine
Opossum
Gallus
0.0002
0.0018
0.0124
0.0138
0.0126
0.0109
0.0115
0.0109
0.0134
0.0336
0.0330
0.0336
0.0420
0.0155
0.0116
0.0131
0.0118
0.0080
0.0174
0.0443
0.0236
0.0226
0.0235
0.0280
0.0233
0.0280
0.0311
0.0139
0.0137
0.0139
0.0156
0.0318
0.0330
0.0186
0.0131
0.0343
0.0358
0.0344
0.0236
0.0342
0.0644
0.0238
0.0494
0.0370
0.0189
0.0201
0.0190
0.0114
0.0218
0.0516
0.0088
0.0368
0.0245
0.0169
Human
Chimpanzee
Gorilla
Lemur
Goat
Bovine
Rabbit
Rat
Mouse
Opossum
Gallus
258
Degree of similarity/dissimilarity
pecies
150
Proposed method
Huang Y. et al
Qix et al
Qiz.H. et al
100
Liao B. et al
Jafarzadeh N. et al
50
Gorilla
Chimpanzee
Lemur
Rat
Mouse
Rabbit
Goat
Bovine
Opossum
Gallus
Cycle 4
Cycle 5
Cycle 6
ux
uy
uz
ux
uy
uz
ux
uy
uz
-0.45055
-0.70192
-0.47283
2.208791
3.615385
2.844086
-2.49451
1.088235
0.517647
1.686813
-0.33516
9.55
10.33
9.62
8.78
9.04
16.7
7.46
10.82
10.15
0.431
3.341
46
52.5
46.5
46
46
47
46
43
43
46
46
6.967033
7.875
7.032609
8.208791
8.296703
12.13441
5.39011
14.5
12
3.032967
4.071429
-1.71703
-2.28606
-1.76087
-1.61538
-4.3489
0.172043
-1.29121
-2.75
-2.25
-3.6511
-2.97527
46
52.5
46.5
46
46
47
46
43
43
46
46
3.901099
4.581731
3.951087
5.148352
2.258242
6.903226
3.978022
5.964706
5.864706
0.620879
2.521978
-11.4203
-13.1202
-11.5516
-9.88187
-13.5604
-14.7769
-8.41484
-11.0441
-8.96471
-5.76374
-9.11813
46
52.5
46.5
46
46
47
46
43
43
46
46
ii. The pair (goat, bovine) has the small entry 0.0131
which indicates the evolutionary similarity
between goat and bovine. The biological taxonomy of bovine and goat proves that both of them
are even-toed ungulates and belong to the family
of Bovidae;16
iii.Rat and mouse also show a small entry which
indicates their evolutionary closeness;
iv.The remote mammalian opossum has the largest
entry to all other mammalians.
Phylogenic analysis
Bari etal
goat and opossum are either mammals or nonmammals, however it is known that the goat is a
mammal. It therefore must be concluded that the
opossum is also a mammal, but the most remote
species from the remaining mammals. As a result,
gallus is the only species that is neither mammalian nor shows score value like opossum. Hence,
gallus falls into a single group within the species
analyzed: non-mammalian.
Conclusion
Author Contributions
Funding
Competing Interests
260
As a requirement of publication the authors have provided signed confirmation of their compliance with
ethical and legal obligations including but not limited to compliance with ICMJE authorship and competing interests guidelines, that the article is neither
under consideration for publication nor published
elsewhere, of their compliance with legal and ethical guidelines concerning human and animal research
participants (if applicable), and that permission has
been obtained for reproduction of any copyrighted
material. This article was subject to blind, independent, expert peer review. The reviewers reported no
competing interests.
References
1. Hamori E, Ruskin J. H curves: a novel method of representation of nucleotide series especially suited for long DNA sequences. J Biol Chem. 1983;
258(2):131827.
2. Li Y, Huang G, Liao B, Liu Z. H-L curve: a novel 2-D graphical representation of protein sequences. MATCH Commun Math Comput Chem.
2009;61(2):51932.
3. Guo X, Randic M, Basak SC. A novel 2-D graphical representation of
DNA sequences of low degeneracy. Chem Phys Letter. 2001;350(12):
10612.
4. Jafarzadeh N, Iranmanesh A. A novel graphical and numerical representation for analyzing DNA sequences based on codons. MATCH Commun
Math Comput Chem. 2012;68:61120.
5. Yu JF, Wang JH, Sun X. Analysis of similarities/dissimilarities of DNA
sequences based on a novel graphical representation. MATCH Commun
Math Comput Chem. 2010;63:493512.
6. Liao B, Zhu W, Liu Y. 3D graphical representation of DNA sequence without degeneracy and its applications in constructing phylogenic tree. MATCH
Commun Math Comput Chem. 2006;56(1):20916.
7. Cao Z, Liao B, Li R. A group of 3D graphical representation of DNA
sequences based on dual nucleotides. Internat J Quant Chem. 2008;108(9):
148590.
8. Chi R, Ding K. Novel 4D numerical representation of DNA sequences.
Chemical Physics Letters. 2005;407(13):637.
9. Liao B,Li R,Zhu W, Xiang X. On the similarity of DNA primary sequences
based on 5-D representation. J Math Chem. 2007;42(1):4757.
10. Liao B, Wang T. Analysis of similarity/dissimilarity of DNA sequences
based on nonoverlapping triplets of nucleotide bases. J Chem Inf Comput
Sci. 2004;44(5):166670.
11. Wu R, Hu Q, Li R, Yue G. A novel composition coding method of DNA
sequence and its application. MATCH Commun Math Comput Chem.
2012;67:26976.
12. Qi X, Wu Q, Zhang Y, Fuller E, Zhang CQ. A novel model for DNA
sequence similarity analysis based on graph theory. Evol Bioinform Online.
2011;7:14958.
13. Ewens J, Grant G. Statistical Methods in Bioinformatics: An Introduction.
2nd ed. New York: Springer Science; 2005.
14. Randi M, Vrako M, Ler N, Plavi D. Novel 2-D graphical representation of DNA sequences and their numerical characterization. Chem Phys
Lett. 2003;368(12):16.
15. Randi M, Vrako M, Ler N, Plavi D. Analysis of similarity/dissimilarity
of DNA sequences based on novel 2-D graphical representation. Chem Phys
Lett. 2003;371(12):2027.
21. Castro-Chavez F. Most used codons per amino acid and per genome in the
code of man compared to other organisms according to the rotating circular
genetic code. Neuroquantology. 2011;9(4):74766.
22. Castro-Chavez F. A tetrahedral representation of the genetic code emphasizing aspects of symmetry. BIOcomplexity. 2012;2012:16.
23. Castro-Chavez F. Defragged binary I Ching genetic code chromosomes compared to Nirenbergs and transformed into rotating 2D circles and squares
and into a 3D 100% symmetrical tetrahedron coupled to a functional one
to discern start from non-start methionines through a stella octangula.
J Proteome Sci Comput Biol. 2012;2012(1):3.
261
Copyright of Evolutionary Bioinformatics is the property of Libertas Academica Ltd. and its
content may not be copied or emailed to multiple sites or posted to a listserv without the
copyright holder's express written permission. However, users may print, download, or email
articles for individual use.