
YMPEV 5308 No.

of Pages 9, Model 5G
1 October 2015

Molecular Phylogenetics and Evolution xxx (2015) xxx–xxx



Contents lists available at ScienceDirect

Molecular Phylogenetics and Evolution


journal homepage: www.elsevier.com/locate/ympev


Molecular signatures that are distinctive characteristics of the vertebrates and chordates and supporting a grouping of vertebrates with the tunicates

Radhey S. Gupta
Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario L8N 3Z5, Canada
Article info

Article history:
Received 29 April 2015
Revised 16 September 2015
Accepted 18 September 2015
Available online xxxx

Keywords:
Chordates
Vertebrates
Cephalochordates/urochordates
Evolutionary relationships
Conserved signature indels
Molecular signatures

Abstract

Members of the phylum Chordata and the subphylum Vertebrata are presently distinguished solely on the basis of morphological characteristics. The relationship of the vertebrates to the two non-vertebrate chordate subphyla is also a subject of debate. Analyses of protein sequences have identified multiple conserved signature indels (CSIs) that are specific for Chordata or for Vertebrata. Five CSIs in 4 important proteins are specific for the Vertebrata, whereas two other CSIs are uniquely found in all sequenced chordate species including Ciona intestinalis and Oikopleura dioica (Tunicates) as well as Branchiostoma floridae (Cephalochordates). The shared presence of these molecular signatures by all vertebrate/chordate species, but in no other animal taxa, strongly indicates that the genetic changes represented by the identified CSIs diagnose monophyletic groups. Two other discovered CSIs are uniquely shared by different vertebrate species and by either one (Ciona intestinalis) or both tunicate (Ciona and Oikopleura) species, but they are not found in Branchiostoma or other animal species. Specific presence of these CSIs in different vertebrates and either one or both tunicate species provides strong independent evidence that the vertebrate species are more closely related to the urochordates (tunicates) than to the cephalochordates.

© 2015 Elsevier Inc. All rights reserved.

This paper was edited by the Associate Editor A. Larson.
E-mail address: gupta@mcmaster.ca

http://dx.doi.org/10.1016/j.ympev.2015.09.019
1055-7903/© 2015 Elsevier Inc. All rights reserved.

Please cite this article in press as: Gupta, R.S. Molecular signatures that are distinctive characteristics of the vertebrates and chordates and supporting a grouping of vertebrates with the tunicates. Mol. Phylogenet. Evol. (2015), http://dx.doi.org/10.1016/j.ympev.2015.09.019

1. Introduction

The phylum Chordata comprises Vertebrata as well as two other non-vertebrate taxa Urochordata and Cephalochordata (Nielsen, 1995; Gee, 1996; Jefferies, 1986). The chordates in turn are part of the superphylum Deuterostomes, which also includes the phyla Echinodermata, Hemichordata and a recently described phylum Xenoturbellida (Nielsen, 1995; Gee, 1996; Jefferies, 1986; Bourlat et al., 2006; Blair and Hedges, 2005). Because Vertebrata contains all known vertebrate species, an understanding of its evolutionary relationship to the other chordates and deuterostomes is of central importance to zoology (Blair and Hedges, 2005; Philippe and Telford, 2006; Edgecombe et al., 2011; Springer et al., 2004; Bourlat et al., 2006; Delsuc et al., 2006). Of the two non-vertebrate chordate taxa, cephalochordates are morphologically more similar to the vertebrates than to the adult urochordates (tunicates); thus, they are traditionally considered to be the closest relatives of vertebrates (Nielsen, 1995; Gee, 1996; Jefferies, 1986). A grouping of cephalochordates with vertebrates to the exclusion of urochordates is also observed in a number of phylogenetic studies based primarily on small subunit (SSU) and large subunit (LSU) rRNA gene sequences (Cameron et al., 2000; Mallatt and Winchell, 2007; Winchell et al., 2002). Additionally, the genomic organization of the Hox genes in cephalochordates also suggests that cephalochordates are more similar phylogenetically to the vertebrates than to the tunicates, whose genomes are highly divergent and not informative in this regard (Pascual-Anaya et al., 2013; Swalla and Smith, 2008). In contrast to these studies, the interrelationships among different deuterostomes and metazoan phyla have been examined in detail based on large datasets of sequences for nuclear proteins (Delsuc et al., 2006, 2008; Bourlat et al., 2006; Blair and Hedges, 2005). Surprisingly, the results of these studies strongly indicate that the urochordates and not cephalochordates are the sister taxon to the vertebrates. The distal branching of the tunicates from vertebrates in earlier studies was shown to be an artifact of long-branch attraction attributed to rapid evolution within the tunicates (Tsagkogeorga et al., 2010; Delsuc et al., 2006). In some of these studies, monophyly of the phylum Chordata was ambiguous using various phylogenetic methods (Delsuc et al., 2006; Winchell et al., 2002; Cameron et al., 2000; Glenner et al., 2004).

The inference from recent studies that tunicates are the closest relatives of vertebrates is of much importance for understanding
the origin and evolution of vertebrates (Delsuc et al., 2006; Blair and Hedges, 2005; Bourlat et al., 2006; Swalla and Smith, 2008). Currently, the evidence that tunicates are more closely related to the vertebrates is entirely based on molecular phylogenetic studies (Delsuc et al., 2006; Blair and Hedges, 2005; Bourlat et al., 2006). However, several recent studies show that the inferences from molecular phylogenetic studies, even when they are based on large datasets involving multiple proteins, are sensitive to multiple confounding factors including differences in evolutionary rates among species, composition biases in sequences, taxon sampling, conflict in the phylogenetic signal contained within the different amino acid sequences, and long-branch attraction (Rokas et al., 2003; Jeffroy et al., 2006; Nosenko et al., 2013; Delsuc et al., 2008). Due to these factors, inferences from independent phylogenetic studies are often contradictory (Delsuc et al., 2006; Bourlat et al., 2006; Teeling and Hedges, 2013; Song et al., 2012; Philippe and Telford, 2006). Although precautions were taken to guard against these artifacts in the studies that group tunicates and vertebrates as a clade (Delsuc et al., 2006; Blair and Hedges, 2005; Bourlat et al., 2006), it is important to confirm the relationship of vertebrates to the other chordate taxa by independent means.

The chordate as well as vertebrate clades are presently distinguished from other animals only on the basis of a limited number of morphological characteristics (Nielsen, 1995; Gee, 1996; Jefferies, 1986; Swalla and Smith, 2008). Besides the morphological characteristics, no other reliable molecular or biochemical property is known that is specifically shared by either all chordates or all vertebrates and can be used to distinguish these important groups of animals from all others. The availability of genome sequences from large numbers of animal species covering the diversity of metazoan taxa now provides a valuable resource for identifying novel molecular markers that are diagnostic for different animal taxa. Conserved signature indels (CSIs) in protein sequences constitute one type of rare genetic changes (RGCs) that provide very useful markers for this purpose, and they have been used extensively for evolutionary and systematic studies (Rokas and Holland, 2000; Springer et al., 2004; Baldauf and Palmer, 1993; Rivera and Lake, 1992; Gupta et al., 1994; Gupta, 1998, 2014; Bhandari et al., 2012). Although the shared presence of CSIs in protein sequences in some cases can represent homoplasy or lateral gene transfers (Bapteste and Philippe, 2002; Gupta, 2012), in general, when a conserved indel of a definite length is found uniquely in a phylogenetically related group of organisms, its most parsimonious explanation is inheritance from the most recent common ancestor (Nielsen, 1995; Gee, 1996; Jefferies, 1986). Thus, conserved signature indels provide a powerful means to support or refute a given phylogenetic hypothesis.

In the present work, I have examined sequence alignments of >3000 proteins from different metazoan species to identify conserved signature indels that are specific for either the Vertebrata or groupings of vertebrates with other animal taxa (particularly the other chordate lineages). These studies have identified seven CSIs in 6 widely distributed proteins that are uniquely found in either all sequenced vertebrate species, or all sequenced chordate species, but which are not present in any other animal groups/phyla. The unique shared presence of these CSIs (synapomorphies) in these animal groups provides evidence that these groups are monophyletic, and the identified characteristics provide novel molecular means for distinguishing members of these groups from other animal taxa. Additionally, the present study has identified 2 other CSIs in widely distributed proteins that are uniquely shared by all sequenced vertebrate species and the urochordate species (Ciona intestinalis and Oikopleura dioica), but which are not found in Branchiostoma (cephalochordate) or other deuterostome species. The specific presence of these CSIs in these two chordate lineages provides strong and independent confirmation that the vertebrate species are more closely related to the tunicates (or Urochordata subphylum) than to the cephalochordates.

2. Materials and methods

Identification of conserved signature indels that are specific for the chordates or vertebrates was performed as described in earlier work (Gupta, 2014, 1998; Gupta and Golding, 1996). Briefly, for these studies, Blastp searches were performed on >2000 proteins from the genome of Ciona intestinalis (Satou et al., 2008). For each protein for which high-scoring homologs were found in assorted vertebrates as well as non-vertebrate species, sequences for 20–25 homologs from divergent chordates and other animal taxa were retrieved, and their multiple sequence alignments were created using Clustal X 1.83 (Jeanmougin et al., 1998). Additionally, multiple sequence alignments for large numbers (>1000) of other proteins from diverse eukaryotic lineages were also utilized in this work. The alignments were visually inspected to identify any conserved insert or deletion (indel) that was flanked on both sides by at least 5–6 identical/conserved residues in the neighboring 30–40 amino acids, and which was uniquely found in either different chordates or vertebrate species. The indels that were not flanked on both sides by conserved regions, or those limited to other clades of animals, were not further evaluated in this work. The selected indels of interest were further evaluated by performing repeat Blastp searches on the indels and their flanking conserved regions. In most cases, the top 500 hits were examined to determine the taxon specificity of the observed indels (Gupta, 2014; Naushad et al., 2014). The results of these Blast searches were processed using SIG_CREATE and SIG_STYLE programs to construct signature files (Gupta, 2014). In the main figures, the results for the presence or absence of the indels in different groups are shown for only a limited number of representative species. However, unless otherwise indicated, the CSIs described here are specific for the indicated clades of animals, and similar CSIs were not present in other animals within the 500 Blast hits.

3. Results

The aim of this study is to identify novel molecular markers in the form of conserved signature indels in protein sequences specific for either the Vertebrata or other chordate subphyla. The presence or absence of conserved indels in gene/protein sequences is generally not affected by factors such as differences in the evolutionary rates among species, composition biases, long-branch attraction, etc., which significantly affect phylogenetic analyses of base substitutions (Rokas and Holland, 2000; Gupta, 1998, 2014; Springer et al., 2004). Thus, the presence of such molecular markers as synapomorphies for a given group of species generally provides strong evidence that the species harboring the given signature form a monophyletic group (Baldauf and Palmer, 1993; Bhandari et al., 2012; Gupta, 2014; Rivera and Lake, 1992). To identify conserved indels, multiple sequence alignments of >3000 proteins from assorted animal taxa were created and then examined for the presence of conserved inserts or deletions that were restricted to members of the groups of interest. These studies have identified a number of conserved indels that are specific for either all sequenced vertebrate species, or which provide information regarding the relationship of vertebrates to other chordate lineages.

3.1. Molecular signatures that are specific for the vertebrates

Of the molecular signatures identified in this work, five are specific for the subphylum Vertebrata. Two examples of the


vertebrate-specific signatures are shown in Fig. 1. In the universally distributed protein elongation factor-2 (EF-2), which plays a central role in protein synthesis by catalyzing GTP-dependent translocation of the ribosome (Kaul et al., 2011), a 13 aa insert in a highly conserved region is uniquely found in all of the sequenced vertebrate species, but it is not found in any other animal or eukaryotic taxa (Fig. 1A). Another prominent signature consisting of a 20 aa conserved insert that is also specific for the vertebrates occurs in the eukaryotic translational initiation factor 3 (eIF-3) (subunit L isoform X2) (Fig. 1B). Both these inserts are of fixed lengths, and they are flanked on both sides by conserved regions, which gives confidence that the identified CSIs are reliable molecular characteristics. In Fig. 1 (as well as in Figs. 2 and 3), sequence information is shown for only a limited number of species. More detailed information for these signatures is provided in Supplementary Figs. 1 and 2. As seen, both these conserved inserts are commonly shared by all sequenced vertebrate species (>140) including different mammals, birds, fishes, reptiles and amphibians. In contrast to their shared presence by vertebrates, these CSIs are absent from the non-vertebrate chordate species (Ciona intestinalis and Oikopleura dioica (tunicates); Branchiostoma floridae (cephalochordates)), other deuterostomes (Saccoglossus kowalevskii (Hemichordata) and Strongylocentrotus purpuratus (Echinodermata)) as well as other animal/eukaryotic taxa. Interestingly, in the sequence alignment of eIF-3 shown in Fig. 1B and Supplementary Fig. 2, homologs from different fishes (including cartilaginous) contain a number of specific amino acid substitutions, which serve to distinguish these classes of vertebrates from all others.

Three other CSIs in two other proteins are also specific for the Vertebrata subphylum. Sequence information for these molecular signatures is presented in Supplementary Figs. 3–5, and some of their characteristics are summarized in Table 1. Of these vertebrate-specific signatures, two consisting of 1 aa conserved inserts occur in different regions of the protein adenosine kinase (Supplementary Figs. 3 and 4). The enzyme adenosine kinase is ubiquitously present in eukaryotic organisms and plays a central role in controlling the intra- and extra-cellular levels of adenosine and in the regulation of methylation (Bjursell et al., 2011; Park and Gupta, 2008). The last of the vertebrate-specific signatures comprises a 2 aa deletion in a conserved protein that is related to ubiquitin carboxyl-terminal hydrolase (Supplementary Fig. 5). In all of these cases, the identified signatures or CSIs are uniquely present in different vertebrate species, but they are lacking in all other animal taxa including the other sequenced chordates and deuterostomes. The unique shared presence of these CSIs in the gene/protein homologs from different vertebrate species to the exclusion of all other animal taxa provides strong evidence that these molecular signatures represent shared derived (synapomorphic) characteristics that are distinctive of the vertebrate clade.

3.2. Molecular characteristics that are specific for the chordates and support a grouping of the vertebrates with the tunicates

In contrast to the vertebrate species, which form a strongly supported clade in different phylogenetic studies, monophyly of the chordates is not universally or strongly supported by molecular data (Delsuc et al., 2006; Winchell et al., 2002; Cameron et al., 2000; Glenner et al., 2004). The present work has identified two conserved inserts in two different proteins that are uniquely shared by all of the sequenced chordate species and are informative in this regard (Fig. 2). The first of these molecular signatures consists of a 1 aa insert in a cyclophilin-like protein (Fig. 2A). This conserved insert is commonly shared by all sequenced vertebrate species as well as by the homologs of the tunicate species (viz. C. intestinalis and O. dioica) and a cephalochordate (viz. B. floridae). However, this insert is absent from homologs from other deuterostomes (viz. Saccoglossus kowalevskii (Hemichordata) and Strongylocentrotus purpuratus (Echinodermata)) as well as from other animal and plant homologs. The only exception seen in this case is that some nematode species also contain an insert in this position. However, as nematodes are distantly related to the deuterostomes and/or chordate phyla (Edgecombe et al., 2011; Glenner et al., 2004), the observed insert in nematodes has very likely occurred independently, and it is unrelated to that found in the chordates. The other molecular signature that is specific for the chordates is in the mitochondrial inner membrane protease (MIMP) ATP23 (isoform X1). The 1 aa insert in this protein is also uniquely present in all sequenced vertebrate species as well as in C. intestinalis and B. floridae. However, a homolog showing significant similarity to this region of the protein was not found in O. dioica, whose genome has diverged greatly from other chordate species (Denoeud et al., 2010). Besides the above noted exception, the CSIs in both of the above proteins are highly specific for the chordates and absent from other deuterostomes as well as other animals or eukaryotic species (see Supplementary Figs. 6 and 7 for more detailed information for these signatures). The unique shared presence of these two CSIs in different chordate homologs provides evidence that all of the chordate species constitute a monophyletic group.

As noted in the introduction, the relationship of vertebrates to the other chordate lineages has been a subject of much research and controversy. While morphological considerations and earlier phylogenetic studies favor the view that vertebrates are more closely related to the cephalochordates (Nielsen, 1995; Gee, 1996; Jefferies, 1986), more recent phylogenetic studies strongly indicate that tunicates are the closest relatives of vertebrates (Delsuc et al., 2006, 2008; Bourlat et al., 2006; Blair and Hedges, 2005). Thus, additional evidence that can independently test this important relationship is of much importance. The present work has identified two additional molecular signatures that provide useful insights in this regard. The first of these molecular signatures consists of a 2 aa deletion in a conserved region of the predicted exosome complex RRP44 protein, which is uniquely shared by all sequenced vertebrate species and the two tunicates Ciona intestinalis and O. dioica. However, this deletion does not characterize B. floridae (cephalochordate) or any other deuterostomes or animal species. The shared presence of the 2 aa conserved deletion by all of the vertebrates and tunicates, but not by cephalochordates, supports the view that tunicates are more closely related to the vertebrates, and that the genetic change responsible for this CSI occurred in a common ancestor of the vertebrates and tunicates. The second informative molecular signature is in the protein serine palmitoyltransferase, which plays a key role in the biosynthesis of sphingosine and other sphingolipids (Hanada, 2003). In this case, a 1 aa insert in a conserved region is uniquely present in all vertebrate species and Ciona intestinalis. However, this insert is absent from O. dioica (a tunicate) and B. floridae (cephalochordate) as well as other deuterostomes and animal species. This signature suggests that among the tunicates O. dioica, whose genome has undergone extensive reorganization (Denoeud et al., 2010), has lost this insert. In contrast to these molecular signatures that are specifically shared by the vertebrate species and either one or both of the tunicate species, no signature was identified in this work that was specifically shared by vertebrates and B. floridae (cephalochordates) to the exclusion of tunicates. These results support the view that vertebrates are more closely related to the tunicates (urochordates) than to the cephalochordates.

4. Discussion

The phylum Chordata, which includes vertebrates and two non-vertebrate subphyla (Urochordata and Cephalochordata), is


Fig. 1. Partial sequence alignments of (A) the protein synthesis elongation factor-2 (EF-2) and (B) eukaryotic translational initiation factor 3 (subunit L) showing two large inserts (boxed) in conserved regions of the proteins that are commonly shared by all sequenced vertebrate species, but which are not found in the homologs from other chordates, deuterostomes or animal/eukaryotic species. Sequence information for only representative species from different groups is shown in this and other figures, but more detailed information for these signatures is provided in Supplementary Figs. 1 and 2. The dashes (-) in the alignments indicate identity with the residue in the sequence shown on the top lines. GenBank identification (GI) numbers for each sequence are indicated in the second column. At the position indicated by * an extra amino acid is present in Callorhinchus milii (* = SA). Sequence information for three other CSIs that are also specific for the vertebrate species is provided in Supplementary Figs. 3–5 and in Table 1.
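
The identity convention used in these alignment figures, in which a dash stands for the residue shown in the top sequence, can be reversed when the rows are handled as plain text. The following Python sketch is illustrative only: the function name, the toy rows, and the use of "." as a gap symbol are assumptions and are not taken from Fig. 1.

```python
def expand_identity_marks(reference: str, row: str, identity: str = "-") -> str:
    """Return `row` with every identity mark replaced by the residue
    found at the same position in `reference` (the top sequence)."""
    if len(reference) != len(row):
        raise ValueError("reference and row must have the same aligned length")
    return "".join(ref if ch == identity else ch for ref, ch in zip(reference, row))


# Toy rows (hypothetical, not taken from Fig. 1); '.' marks a gap by assumption.
top_row = "GKTVLSA"
other_row = "--S--.."

print(expand_identity_marks(top_row, other_row))
```

Rows expanded in this way carry explicit residues at every position, which makes it easier to compare the flanking regions of an indel across species.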


Fig. 2. Excerpts from sequence alignments of (A) a cyclophilin-like protein and (B) mitochondrial inner membrane protease ATP23 showing 1 aa conserved inserts found in
conserved regions of these proteins that are uniquely shared by different vertebrates and chordate species, but not found in other animal species, with the exception of some
nematodes that also contain a 1 aa insert in the cyclophilin-like protein. More detailed information for these signatures is provided in Supplementary Figs. 6 and 7.


Fig. 3. Partial sequence alignments of (A) the predicted exosome complex RRP44 protein and (B) serine palmitoyltransferase showing two conserved signature indels (a 2 aa deletion and a 1 aa insert, respectively) that are specifically shared by different vertebrate species and either both of the sequenced tunicate species (viz. Ciona intestinalis and O. dioica) or only by Ciona intestinalis. These CSIs are not found in B. floridae or other animal species (see Supplementary Figs. 8 and 9). In (A), in the place marked by *, an extra amino acid (* = EG) is present in B. floridae.
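
The distribution of the nine CSIs across the chordate lineages, as reported in the Results and in Table 1, can be tallied to show which groupings the signatures support. The Python sketch below is a minimal illustration: the dictionary layout, the lineage labels and the `classify` helper are assumptions introduced here, while the presence data are transcribed from the text. The sporadic nematode insert in the cyclophilin-like protein is left out, and O. dioica is simply omitted for ATP23 because no homolog of that region was found.

```python
from collections import Counter

# Lineages in which each CSI is reported as present (from Table 1 and Results).
CSI_PRESENCE = {
    "EF-2 (13 aa insert)": {"Vertebrata"},
    "eIF-3 subunit L (20 aa insert)": {"Vertebrata"},
    "Adenosine kinase signature-1 (1 aa insert)": {"Vertebrata"},
    "Adenosine kinase signature-2 (1 aa insert)": {"Vertebrata"},
    "Ubiquitin carboxyl-terminal hydrolase related (2 aa deletion)": {"Vertebrata"},
    "Cyclophilin-like protein (1 aa insert)": {"Vertebrata", "Ciona", "Oikopleura", "Branchiostoma"},
    "ATP23 (1 aa insert)": {"Vertebrata", "Ciona", "Branchiostoma"},  # no O. dioica homolog found
    "RRP44 (2 aa deletion)": {"Vertebrata", "Ciona", "Oikopleura"},
    "Serine palmitoyltransferase (1 aa insert)": {"Vertebrata", "Ciona"},
}

TUNICATES = {"Ciona", "Oikopleura"}
CEPHALOCHORDATES = {"Branchiostoma"}


def classify(taxa: set) -> str:
    """Name the grouping supported by a CSI present in the given lineages."""
    if "Vertebrata" not in taxa:
        return "other"
    in_tunicates = bool(taxa & TUNICATES)
    in_cephalochordates = bool(taxa & CEPHALOCHORDATES)
    if in_tunicates and in_cephalochordates:
        return "Chordata"
    if in_tunicates:
        return "Vertebrata + Tunicata"
    if in_cephalochordates:
        return "Vertebrata + Cephalochordata"
    return "Vertebrata"


tally = Counter(classify(taxa) for taxa in CSI_PRESENCE.values())
for grouping in ("Vertebrata", "Chordata", "Vertebrata + Tunicata", "Vertebrata + Cephalochordata"):
    print(f"{grouping}: {tally[grouping]}")
```

The tally reproduces the counts given in the text: five vertebrate-specific CSIs, two chordate-specific CSIs, two CSIs shared by vertebrates and tunicates, and none shared by vertebrates and the cephalochordate alone.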


Table 1
Characteristics of the identified conserved signature indels (CSIs).

Protein name                                                     GI number  Figure number             Indel size     Indel position(a)  Specificity
Elongation Factor-2                                              194387694  Fig. 1A and Supp. Fig. 1  13 aa insert   217–267            Vertebrata
Eukaryotic translation initiation factor-3 subunit L isoform X2  194374379  Fig. 1B and Supp. Fig. 2  20 aa insert   48–106             Vertebrata
Adenosine Kinase Signature-1                                     530393088  Supp. Fig. 3              1 aa insert    62–123             Vertebrata
Adenosine Kinase Signature-2                                     530393088  Supp. Fig. 4              1 aa insert    132–174            Vertebrata
Ubiquitin carboxyl-terminal hydrolase related protein            158261139  Supp. Fig. 5              2 aa deletion  11–58              Vertebrata
Cyclophilin-like protein                                         1199600    Fig. 2A and Supp. Fig. 6  1 aa insert    185–229            Chordata
Mitochondrial inner membrane protease ATP23                      530400987  Fig. 2B and Supp. Fig. 7  1 aa insert    279–333            Chordata
Predicted exosome complex RRP44 protein                          119600924  Fig. 3A and Supp. Fig. 8  2 aa deletion  415–456            Vertebrata + Tunicates (Ciona intestinalis and O. dioica)
Serine palmitoyltransferase                                      4758668    Fig. 3B and Supp. Fig. 9  1 aa insert    152–181            Vertebrata + Tunicates (Ciona intestinalis)

(a) The region of the specified protein that contains the indel.
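
The selection criteria described in Materials and methods (an indel flanked on both sides by conserved residues and restricted to one group of species) were applied there by visual inspection of the alignments. As a hedged illustration only, the same idea can be sketched in Python for the case of an insert. The function names, the per-side `window` and `min_conserved` parameters, and the toy alignment are all assumptions, and the sketch counts only strictly identical columns rather than conserved substitutions. Note that "-" denotes a gap here, as in standard alignment files, not identity as in the figures.

```python
from typing import Dict, Set

GAP = "-"


def conserved_count(alignment: Dict[str, str], start: int, end: int) -> int:
    """Count columns in [start, end) where all sequences share one non-gap residue."""
    seqs = list(alignment.values())
    count = 0
    for col in range(max(start, 0), min(end, len(seqs[0]))):
        residues = {s[col] for s in seqs}
        if len(residues) == 1 and GAP not in residues:
            count += 1
    return count


def is_candidate_insert(alignment: Dict[str, str], ingroup: Set[str],
                        start: int, end: int,
                        window: int = 20, min_conserved: int = 5) -> bool:
    """True if columns [start, end) hold an insert that is present in every
    ingroup sequence, absent from all others, and flanked by conserved columns."""
    for name, seq in alignment.items():
        segment = seq[start:end]
        if name in ingroup:
            if GAP in segment:  # insert must be complete in the ingroup
                return False
        elif any(ch != GAP for ch in segment):  # and absent elsewhere
            return False
    left = conserved_count(alignment, start - window, start)
    right = conserved_count(alignment, end, end + window)
    return left >= min_conserved and right >= min_conserved


# Hypothetical toy alignment: a 2 aa insert in columns 6-7 of two sequences.
TOY_ALIGNMENT = {
    "vertebrate_a": "GDVLTKSAPEELRN",
    "vertebrate_b": "GDVLTKTAPEELRN",
    "tunicate":     "GDVLTK--PEELRN",
    "lancelet":     "GDVLTK--PEELRN",
}

print(is_candidate_insert(TOY_ALIGNMENT, {"vertebrate_a", "vertebrate_b"}, 6, 8))
```

A deletion specific to a group could be screened in the same way by exchanging the roles of the ingroup and the remaining sequences. In practice the taxon specificity of each candidate would still need to be confirmed against a much larger set of homologs, as was done here with the repeat Blastp searches.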

presently recognized only on the basis of certain morphological characteristics viz. the notochord, a hollow dorsal nerve cord, pharyngeal gills and a muscular post-anal tail (Nielsen, 1995; Gee, 1996; Jefferies, 1986). Besides these characteristics, some of which are observed only at certain developmental stages in some members, no other reliable molecular or biochemical characteristic is known to be specific to all chordates. The vertebrate species represent an overwhelming majority of the chordates, and in them the notochord induces formation of a vertebral column (Nielsen, 1995; Gee, 1996; Jefferies, 1986). However, apart from this and other chordate characteristics, no reliable molecular or biochemical characteristic is known that is uniquely shared by all vertebrates. Hence, identification of molecular markers that are specific for the vertebrates is of considerable significance.

The analysis of protein sequences from different vertebrates and non-vertebrate species reported here has identified multiple molecular markers in the form of conserved signature indels that are specific for these two large groups of animals. Of these molecular signatures, 5 CSIs in 4 proteins (viz. EF-2, AdK, Euk IF-3 and a protein related to ubiquitin carboxyl-terminal hydrolase) are specific for the Vertebrata. In addition, two other CSIs in the proteins cyclophilin-like protein and mitochondrial inner membrane protease ATP23 are uniquely found in all sequenced vertebrates as well as the non-vertebrate chordate species including the tunicates (C. intestinalis and O. dioica) and cephalochordates (B. floridae). The discovered CSIs occur in highly conserved regions of these proteins, and their presence or absence can be readily ascertained by means of visual examination of the sequence alignments. The specificity as well as the reliability of these molecular signatures (both large and small CSIs) for the Vertebrata/Chordata taxa can be gauged from the fact that these signatures are present in the protein homologs of all (>140) sequenced vertebrate species (covering all main lineages including mammals, reptiles, amphibians, birds, and both bony and cartilaginous fishes); they are not found in any other animal taxa, except sporadically. The unique presence of these molecular signatures in different vertebrates/chordates strongly indicates that the genetic changes responsible for them occurred in the common ancestors of the vertebrate or chordate lineages, and these changes were then retained by all descendant species. The discovery of these molecular signatures or synapomorphic characteristics that are specific for either all vertebrates or chordates provides strong evidence that both these groups are monophyletic. [...] signature sequences in different species (Ahmod et al., 2011; Wong et al., 2014; Griffiths et al., 2005).

This work also provides important independent insight as to which of the two other chordate subphyla is more closely related to the vertebrates. In contrast to phylogenetic studies of single base substitutions, where the derived inferences can be affected by a variety of factors including differences in evolutionary rates, composition differences and long-branch attraction, the determination of evolutionary relationships based upon the shared presence of RGCs such as the CSIs is generally insensitive to these confounding factors (Rokas and Holland, 2000; Springer et al., 2004; Baldauf and Palmer, 1993; Rivera and Lake, 1992; Gupta, 1998, 2014; Bhandari et al., 2012). Additionally, the CSI-based approach makes use of sequence information for all available taxa, and thus the inferences based upon it are not dependent upon taxon selection. This work describes two CSIs that are uniquely present in vertebrate species as well as either one (Ciona intestinalis) or both of the tunicates (Ciona and Oikopleura). However, these CSIs are not found in Branchiostoma or other deuterostomes or animal species. The unique shared presence of these CSIs by different vertebrates and the tunicate species strongly indicates that these two taxa are more closely related to each other than either is to Branchiostoma (cephalochordate). In view of the demonstrated specificity of different CSIs described in this work for specific lineages, the chance occurrence of these CSIs in the tunicates is considered highly unlikely. It is also important to note that no CSI was identified in this work that is commonly shared by the vertebrates and the cephalochordates. These results provide strong evidence independent of prior phylogenetic studies (Delsuc et al., 2006, 2008; Bourlat et al., 2006; Blair and Hedges, 2005) that the tunicates are the closest relatives of the vertebrates.

The proteins in which these vertebrate or chordate-specific signatures occur are widely distributed, and several of them play key roles in different cellular processes. Of these proteins, protein synthesis elongation factor-2 and the eukaryotic translational initiation factor 3 (subunit L isoform X2) are essential components of protein synthesis (Kaul et al., 2011; Jackson et al., 2010). The enzyme adenosine kinase (AdK) plays a central role in controlling the intra- and extra-cellular concentration of adenosine and in the maintenance of methylation reactions (Bjursell et al., 2011; Park and Gupta, 2008). Gene-knockout studies on the AdK gene have established that the function of this gene/protein is essential for the survival of mammalian species (Boison et al., 2002). In view
384 The identified CSIs because of their specificity for Vertebrata/ of the evolutionary conservation and high degree of specificity of 433
385 Chordata also provide novel molecular diagnostic tools for mem- the identified CSIs in these proteins, it is expected that the genetic 434
386 bers of these lineages (Ahmod et al., 2011). Due to the high degree changes represented by these CSIs are functionally important for 435
387 of sequence conservation of genes/proteins containing these CSIs, the vertebrate/chordate species. Indeed, earlier work on a number 436
388 degenerate PCR primers for sequence regions flanking these CSIs of CSIs (both large and small) in the Hsp60 and Hsp70 proteins 437
389 can be employed to determine the presence or absence of these show that the observed CSIs are essential for the group of 438

Please cite this article in press as: Gupta, R.S. Molecular signatures that are distinctive characteristics of the vertebrates and chordates and supporting a
grouping of vertebrates with the tunicates. Mol. Phylogenet. Evol. (2015), http://dx.doi.org/10.1016/j.ympev.2015.09.019

organisms in which they occur, and that the removal of these CSIs, or any significant change in them, is incompatible with cellular growth (Singh and Gupta, 2009). Structural studies on CSIs show that they are generally located in the surface loops of the proteins (Akiva et al., 2008; Gao and Gupta, 2012), which are predicted to play important roles in mediating novel protein–protein or protein–ligand interactions that are specific for the group of species in which these CSIs occur. Based on these observations, it is expected that the CSIs for the vertebrates and/or chordates described in this work will also be involved in cellular functions that are specific and essential for these animal lineages.

Acknowledgments

This work was supported by a research grant (No. 249924) from the Natural Sciences and Engineering Research Council of Canada. I thank Rajni Gupta, Fariq Aziz and Reena Fabros for assistance in the creation of sequence alignments and in the formatting of the signature files. The assistance of Mobolaji Adeolu and Jeffery Chan in creating the Graphical Abstract of the present work is also much appreciated.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.ympev.2015.09.019.

References

Ahmod, N.Z., Gupta, R.S., Shah, H.N., 2011. Identification of a Bacillus anthracis specific indel in the yeaC gene and development of a rapid pyrosequencing assay for distinguishing B. anthracis from the B. cereus group. J. Microbiol. Meth. 87, 278–285.
Akiva, E., Itzhaki, Z., Margalit, H., 2008. Built-in loops allow versatility in domain–domain interactions: lessons from self-interacting domains. Proc. Natl. Acad. Sci. USA 105, 13292–13297.
Baldauf, S.L., Palmer, J.D., 1993. Animals and fungi are each other’s closest relatives: congruent evidence from multiple proteins. Proc. Natl. Acad. Sci. USA 90, 11558–11562.
Bapteste, E., Philippe, H., 2002. The potential value of indels as phylogenetic markers: position of trichomonads as a case study. Mol. Biol. Evol. 19, 972–977.
Bhandari, V., Naushad, H.S., Gupta, R.S., 2012. Protein based molecular markers provide reliable means to understand prokaryotic phylogeny and support Darwinian mode of evolution. Front. Cell. Infect. Microbiol. 2, 98.
Bjursell, M.K., Blom, H.J., Cayuela, J.A., Engvall, M.L., Lesko, N., Balasubramaniam, S., Brandberg, G., Halldin, M., Falkenberg, M., Jakobs, C., Smith, D., Struys, E., von Dobeln, U., Gustafsson, C.M., Lundeberg, J., Wedell, A., 2011. Adenosine kinase deficiency disrupts the methionine cycle and causes hypermethioninemia, encephalopathy, and abnormal liver function. Am. J. Hum. Genet. 89, 507–515.
Blair, J.E., Hedges, S.B., 2005. Molecular phylogeny and divergence times of deuterostome animals. Mol. Biol. Evol. 22, 2275–2284.
Boison, D., Scheurer, L., Zumsteg, V., Rulicke, T., Litynski, P., Fowler, B., Brandner, S., Mohler, H., 2002. Neonatal hepatic steatosis by disruption of the adenosine kinase gene. Proc. Natl. Acad. Sci. USA 99, 6985–6990.
Bourlat, S.J., Juliusdottir, T., Lowe, C.J., Freeman, R., Aronowicz, J., Kirschner, M., Lander, E.S., Thorndyke, M., Nakano, H., Kohn, A.B., Heyland, A., Moroz, L.L., Copley, R.R., Telford, M.J., 2006. Deuterostome phylogeny reveals monophyletic chordates and the new phylum Xenoturbellida. Nature 444, 85–88.
Cameron, C.B., Garey, J.R., Swalla, B.J., 2000. Evolution of the chordate body plan: new insights from phylogenetic analyses of deuterostome phyla. Proc. Natl. Acad. Sci. USA 97, 4469–4474.
Delsuc, F., Brinkmann, H., Chourrout, D., Philippe, H., 2006. Tunicates and not cephalochordates are the closest living relatives of vertebrates. Nature 439, 965–968.
Delsuc, F., Tsagkogeorga, G., Lartillot, N., Philippe, H., 2008. Additional molecular support for the new chordate phylogeny. Genesis 46, 592–604.
Denoeud, F., Henriet, S., Mungpakdee, S., Aury, J.M., Da Silva, C., Brinkmann, H., Mikhaleva, J., Olsen, L.C., Jubin, C., Canestro, C., Bouquet, J.M., Danks, G., Poulain, J., Campsteijn, C., Adamski, M., Cross, I., Yadetie, F., Muffato, M., Louis, A., Butcher, S., Tsagkogeorga, G., Konrad, A., Singh, S., Jensen, M.F., Huynh, C.E., Eikeseth-Otteraa, H., Noel, B., Anthouard, V., Porcel, B.M., Kachouri-Lafond, R., Nishino, A., Ugolini, M., Chourrout, P., Nishida, H., Aasland, R., Huzurbazar, S., Westhof, E., Delsuc, F., Lehrach, H., Reinhardt, R., Weissenbach, J., Roy, S.W., Artiguenave, F., Postlethwait, J.H., Manak, J.R., Thompson, E.M., Jaillon, O., Du, P.L., Boudinot, P., Liberles, D.A., Volff, J.N., Philippe, H., Lenhard, B., Roest, C.H., Wincker, P., Chourrout, D., 2010. Plasticity of animal genome architecture unmasked by rapid evolution of a pelagic tunicate. Science 330, 1381–1385.
Edgecombe, G.D., Giribet, G., Dunn, C.W., Hejnol, A., Kristensen, R.M., Neves, R.C., Rouse, G.W., Worsaae, K., Sorensen, M.V., 2011. Higher-level metazoan relationships: recent progress and remaining questions. Organ. Divers. Evol. 11, 151–172.
Gao, B., Gupta, R.S., 2012. Phylogenetic framework and molecular signatures for the main clades of the phylum Actinobacteria. Microbiol. Mol. Biol. Rev. 76, 66–112.
Gee, H., 1996. Before the Backbone. Chapman and Hall, London.
Glenner, H., Hansen, A.J., Sorensen, M.V., Ronquist, F., Huelsenbeck, J.P., Willerslev, E., 2004. Bayesian inference of the metazoan phylogeny; a combined molecular and morphological approach. Curr. Biol. 14, 1644–1649.
Griffiths, E., Petrich, A., Gupta, R.S., 2005. Conserved indels in essential proteins that are distinctive characteristics of Chlamydiales and provide novel means for their identification. Microbiology 151, 2647–2657.
Gupta, R.S., 1998. Protein phylogenies and signature sequences: a reappraisal of evolutionary relationships among archaebacteria, eubacteria, and eukaryotes. Microbiol. Mol. Biol. Rev. 62, 1435–1491.
Gupta, R.S., 2012. Origin and spread of photosynthesis based upon conserved sequence features in key bacteriochlorophyll biosynthesis proteins. Mol. Biol. Evol. 29, 3397–3412.
Gupta, R.S., 2014. Identification of conserved indels that are useful for classification and evolutionary studies. In: Goodfellow, M., Sutcliffe, I.C., Chun, J. (Eds.), Bacterial Taxonomy, Methods in Microbiology, vol. 41. Elsevier, London, pp. 153–182.
Gupta, R.S., Aitken, K., Falah, M., Singh, B., 1994. Cloning of Giardia lamblia heat shock protein HSP70 homologs: implications regarding origin of eukaryotic cells and of endoplasmic reticulum. Proc. Natl. Acad. Sci. USA 91, 2895–2899.
Gupta, R.S., Golding, G.B., 1996. The origin of the eukaryotic cell. Trends Biochem. Sci. 21, 166–171.
Hanada, K., 2003. Serine palmitoyltransferase, a key enzyme of sphingolipid metabolism. Biochim. Biophys. Acta 1632, 16–30.
Jackson, R.J., Hellen, C.U., Pestova, T.V., 2010. The mechanism of eukaryotic translation initiation and principles of its regulation. Nat. Rev. Mol. Cell Biol. 11, 113–127.
Jeanmougin, F., Thompson, J.D., Gouy, M., Higgins, D.G., Gibson, T.J., 1998. Multiple sequence alignment with Clustal X. Trends Biochem. Sci. 23, 403–405.
Jefferies, R.P.S., 1986. The Ancestry of the Vertebrates. British Museum (Natural History), London.
Jeffroy, O., Brinkmann, H., Delsuc, F., Philippe, H., 2006. Phylogenomics: the beginning of incongruence? Trends Genet. 22, 225–231.
Kaul, G., Pattan, G., Rafeequi, T., 2011. Eukaryotic elongation factor-2 (eEF2): its regulation and peptide chain elongation. Cell Biochem. Funct. 29, 227–234.
Mallatt, J., Winchell, C.J., 2007. Ribosomal RNA genes and deuterostome phylogeny revisited: more cyclostomes, elasmobranchs, reptiles, and a brittle star. Mol. Phylogenet. Evol. 43, 1005–1022.
Naushad, H.S., Lee, B., Gupta, R.S., 2014. Conserved signature indels and signature proteins as novel tools for understanding microbial phylogeny and systematics: identification of molecular signatures that are specific for the phytopathogenic genera Dickeya, Pectobacterium and Brenneria. Int. J. Syst. Evol. Microbiol. 64, 366–383.
Nielsen, C., 1995. Animal Evolution: Interrelationship of the Living Phyla. Oxford University Press, Oxford.
Nosenko, T., Schreiber, F., Adamska, M., Adamski, M., Eitel, M., Hammel, J., Maldonado, M., Muller, W.E., Nickel, M., Schierwater, B., Vacelet, J., Wiens, M., Worheide, G., 2013. Deep metazoan phylogeny: when different genes tell different stories. Mol. Phylogenet. Evol. 67, 223–233.
Park, J., Gupta, R.S., 2008. Adenosine kinase and ribokinase – the RK family of proteins. Cell. Mol. Life Sci. 65, 2875–2896.
Pascual-Anaya, J., D’Aniello, S., Kuratani, S., Garcia-Fernandez, J., 2013. Evolution of Hox gene clusters in deuterostomes. BMC Dev. Biol. 13, 26.
Philippe, H., Telford, M.J., 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol. Evol. 21, 614–620.
Rivera, M.C., Lake, J.A., 1992. Evidence that eukaryotes and eocyte prokaryotes are immediate relatives. Science 257, 74–76.
Rokas, A., Holland, P.W., 2000. Rare genomic changes as a tool for phylogenetics. Trends Ecol. Evol. 15, 454–459.
Rokas, A., King, N., Finnerty, J., Carroll, S.B., 2003. Conflicting phylogenetic signals at the base of the metazoan tree. Evol. Dev. 5, 346–359.
Satou, Y., Mineta, K., Ogasawara, M., Sasakura, Y., Shoguchi, E., Ueno, K., Yamada, L., Matsumoto, J., Wasserscheid, J., Dewar, K., Wiley, G.B., Macmil, S.L., Roe, B.A., Zeller, R.W., Hastings, K.E., Lemaire, P., Lindquist, E., Endo, T., Hotta, K., Inaba, K., 2008. Improved genome assembly and evidence-based global gene model set for the chordate Ciona intestinalis: new insight into intron and operon populations. Genome Biol. 9, R152.
Singh, B., Gupta, R.S., 2009. Conserved inserts in the Hsp60 (GroEL) and Hsp70 (DnaK) proteins are essential for cellular growth. Mol. Genet. Genom. 281, 361–373.
Song, S., Liu, L., Edwards, S.V., Wu, S., 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc. Natl. Acad. Sci. USA 109, 14942–14947.
Springer, M.S., Stanhope, M.J., Madsen, O., de Jong, W.W., 2004. Molecules consolidate the placental mammal tree. Trends Ecol. Evol. 19, 430–438.
Swalla, B.J., Smith, A.B., 2008. Deciphering deuterostome phylogeny: molecular, morphological and palaeontological perspectives. Philos. Trans. R. Soc. Lond. B Biol. Sci. 363, 1557–1568.
Teeling, E.C., Hedges, S.B., 2013. Making the impossible possible: rooting the tree of placental mammals. Mol. Biol. Evol. 30, 1999–2000.
Tsagkogeorga, G., Turon, X., Galtier, N., Douzery, E.J., Delsuc, F., 2010. Accelerated evolutionary rate of housekeeping genes in tunicates. J. Mol. Evol. 71, 153–167.
Winchell, C.J., Sullivan, J., Cameron, C.B., Swalla, B.J., Mallatt, J., 2002. Evaluating hypotheses of deuterostome phylogeny and chordate evolution with new LSU and SSU ribosomal DNA data. Mol. Biol. Evol. 19, 762–776.
Wong, S.Y., Paschos, A., Gupta, R.S., Schellhorn, H.E., 2014. Insertion/deletion-based approach for the detection of Escherichia coli O157:H7 in freshwater environments. Environ. Sci. Technol. 48, 11462–11470.
