Evolution of Spider Silks: Conservation and Diversification of The C-Terminus
doi: 10.1111/j.1365-2583.2005.00606.x
Received 27 May 2005; accepted following revision 1 August 2005. Correspondence: Dr S. Goodacre, School of Biological Sciences, University of
East Anglia, Norwich, NR4 7TJ. Tel.: +44-1603-593 853; fax: +44-1603-592
250; e-mail: s.goodacre@uea.ac.uk.
Table 1. Accession numbers of spider silk sequences in this study. All sequences are from mRNA apart from those indicated by *, which are from genomic DNA
Family          Species               Protein   Accession number   Reference
Dipluridae      Euagrus chisoseus     Fib1      AF350271
Plectreuridae   Plectreurys tristis   Fib1      AF350281
                                      Fib2      AF350282
                                      Fib3      AF350283
                                      Fib4      AF350284
Araneidae
Araneus bicentarius
Araneus diadematus
MaSp2*
MaSp1
MaSp2
MiSp
MaSp2*
Flag
MaSp1
MaSp2
MaSp2*
U20328
U47854
U47856
U47853
AF350263
AF350264
AF350266
AF350267
AF350272
Flag*
MaSp1
MiSp
MaSp1*
MaSp2*
MaSp1*
MaSp2*
MaSp1*
MaSp1*
AF218621
U20329
AF027736
AF350277
AF350278
AF350279
AF350280
AF350285
AF350286
Argiope aurantia
Argiope trifasciata
Gasteracantha mammosa
Tetragnathidae
Nephila clavipes
Nephila madagascariensis
Nephila senegalensis
Tetragnatha kauaiensis
Tetragnatha versicolor
Theridiidae     Latrodectus geometricus   MaSp1     AF350273
Pisauridae      Dolomedes tenebrosus      AmSp1     AF350269
                                          AmSp2     AF350270
© 2006 The Royal Entomological Society, Insect Molecular Biology, 15, 45–56
Figure 2. ClustalW alignment of C-terminal amino acid sequences, shaded to indicate similarities (grey) and identities (black). The region of highly conserved
sequence about a QALLE motif, which corresponds to a region of predicted α-helix in all silk types, is also shown.
Results
Sequence conservation
There were 26 DNA sequences on GENBANK for which the
C-terminal silk sequence was available: five cribellate fibroins
and 21 ecribellate spidroins/fibroins. The total length of each
sequence varied since all are partial gene sequences with
variable repeat length, repeat number and C-terminal length.
Greater sequence conservation was observed at the C-terminus. There is a particularly conserved region at a QALLE
amino acid sequence motif (Fig. 2), at which the majority of
sequences share greater than 50% identity (Dayhoff similarity
matrix; Dayhoff et al., 1978). When the entire C-terminus
is considered, the similarity is lower, but most silks share at
least 30% identity. The flag C-termini are the most highly
diverged. They do not have a complete QALLE motif and
share as little as 23% sequence identity with other silks.
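
The identity figures quoted above are pairwise comparisons over aligned positions. A minimal sketch of such a calculation follows. It counts plain residue identity and skips gapped columns, so it is not the Dayhoff matrix-based similarity used in the study. The sequences and their names are hypothetical toy strings, not the C-termini listed in Table 1.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # ignore columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical toy alignment (NOT the sequences from Table 1).
toy_alignment = {
    "silk_A": "SQALLEVVSAL",
    "silk_B": "SQALLEIVSAL",
    "silk_C": "TQA-LEMISGL",
}

# Identity for every unordered pair of sequences.
names = sorted(toy_alignment)
pairwise = {
    (x, y): percent_identity(toy_alignment[x], toy_alignment[y])
    for i, x in enumerate(names)
    for y in names[i + 1:]
}
```

Whether gapped columns are skipped or counted as mismatches changes the percentages, so that choice should be stated alongside any reported identity.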
Phylogenetic analysis
Phylogenetic analyses were performed on the entire silk C-terminal data set (198 bp, 66 amino acids) and repeated with
Figure 3. (a) Unrooted maximum likelihood tree (198 base pairs) of C-termini. Rate matrix, proportion of invariant sites (0.22) and α (1.66) estimated from an
initial neighbour-joining tree. Numbers indicate the support for individual branches from 100 bootstrap replicates (values above 70 shown). (b) Unrooted
phylogeny constructed using a Bayesian approach (computed with MRBAYES using 4 chains of 1 000 000 generations after a burn-in time of 100 000
generations), estimating the proportion of invariant sites (0.19) and α (1.69). Probabilities for each branch are given. Tests for substitution rate heterogeneity
among branches labelled 1–4 are described in Table 2.
Within the MaSp/MiSp silk group there are few well-supported nodes in the ML or NJ trees but strong support
in the Bayesian tree for the following: AmSp and MiSp silks
cluster separately from the MaSp silks, and MaSp silks
cluster in several well-supported paraphyletic groups, with
strong support for several terminal groupings consisting
of either MaSp1 or MaSp2 silks but not both. The single
exception is MaSp1 of Araneus diadematus, which falls
within a well-supported group containing MaSp2 silks of
other species.
Maximum likelihood and Bayesian analysis of amino acid
sequences (JTT model of substitution) are shown in Fig. 4.
Well-supported terminal groups in these trees were also
well supported by analysis of nucleotide sequences (Fig. 3).
Sequence evolution
Tests for recombination made using Recpars (Hein, 1993; with
any gaps in the alignment removed) inferred between 1
and 5 recombination events within the phylogeny when the
recombination:substitution cost was set at 1.5 : 1. When all
sequences were included, at least one recombination event
Figure 4. (a) Unrooted maximum likelihood tree of C-terminal amino acid sequences (66 amino acids) calculated using MOLPHY (JTT substitution
matrix). The majority rule consensus of the 50 trees produced is shown, with branches supported in more than 50% (= 25) of trees. (b) Unrooted Bayesian analysis
of amino acid sequences (JTT substitution matrix; chain number, burn-in time and branch probabilities as for Fig. 3b). Majority rule consensus shown.
Table 2. Estimates of ω under different models of heterogeneity across sites or branches within the tree topology (indicated on Fig. 3) and likelihood ratio tests
(LRTs) of nested models
Model
All sequences included
One ω across branches and sites
Model = 0 NSites = 0
One ω across branches, rate variation among sites
Model = 0 NSites = 3 (discrete classes)
Model = 0 NSites = 7 (beta distribution)
Model = 0 NSites = 8 (beta + free ratio of ω)
MaSp1 & 2 sequences only
One ω across branches, rate variation among sites
Model = 0, NSites = 0
3 (discrete classes)
7 (beta distribution)
8 (beta + free ratio of ω)
All sequences included
One ω across branches, 2 site categories
Model = 0 NSites = 3 (2 categories)
Branch-sites model (variation across sites and branches)
Model = 2 NSites = 3
ω1 ≠ ω2 = ω3 = ω4
ω2 ≠ ω1 = ω3 = ω4
ω3 ≠ ω1 = ω2 = ω4
ω4 ≠ ω1 = ω2 = ω3
Parameter estimates
Likelihood
2Δℓ, LRT
3874.79
3846.70
3850.14
NS
p = 2.75, q = 26.77
p0 = 1, p = 2.75, q = 26.77
(p1 = 0, ω = 2.60)
2 d.f.
ω (per branch) = 0.063
ω (site classes) = 0.029, 0.130, 0.440
p = 0.92, q = 11.68
p0 = 1, p = 0.82, q = 11.69
p1 = 0, ω = 2.73
1703.95
1679.78
1682.84
1682.84
3847.84
3841.80
12.08, P = 0.002
3847.26
NS
3847.87
NS
3845.74
NS
NS
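
The likelihood ratio tests in Table 2 compare twice the difference in log-likelihood between nested models against a chi-square distribution. A minimal sketch of that arithmetic follows, under two assumptions. First, the likelihood values in the table are taken to be log-likelihoods whose minus signs were lost in extraction. Second, the test is taken to have 2 degrees of freedom, as the "2 d.f." entry suggests. The function name is illustrative.

```python
import math


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int = 2):
    """Return (2 * delta lnL, p-value) for a pair of nested models.

    The chi-square tail probability exp(-x / 2) used here is exact only
    for df == 2, so other values are rejected rather than approximated.
    """
    if df != 2:
        raise ValueError("this sketch only implements the df = 2 case")
    statistic = 2.0 * (lnl_alt - lnl_null)
    if statistic <= 0:
        return statistic, 1.0  # alternative fits no better than the null
    return statistic, math.exp(-statistic / 2.0)


# Values read from Table 2; the negative signs are an ASSUMPTION.
lnl_two_site_classes = -3847.84   # Model = 0, NSites = 3 (2 categories)
lnl_branch_1 = -3841.80           # Model = 2, NSites = 3, branch 1

stat, p = likelihood_ratio_test(lnl_two_site_classes, lnl_branch_1)
```

With these two values the statistic is 12.08, matching the table entry, and the tail probability is about 0.0024, in line with the reported P = 0.002.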
Sequence evolution
Phylogenetic analyses of nucleotide sequences (198 unambiguously
aligned base pairs/66 amino acids) were performed using several
methods: maximum likelihood (Felsenstein, 1981), neighbour-joining
and a probability-based, Bayesian approach.
Maximum likelihood trees were constructed in PAUP* v. 4.0b
(Swofford, 1999) using the general time-reversible (GTR) model
(Lanave et al., 1984). The GTR rate matrix, base frequencies, the
proportion of invariant sites and the shape parameter (α) of the
gamma distribution that describes rate heterogeneity across sites were
all estimated by likelihood using an iterative procedure. Parameter
values were first estimated by likelihood from an initial simple
neighbour-joining tree. These parameters were then used to make a new
neighbour-joining tree, and the parameters were re-estimated by
likelihood from this new tree. The process was repeated until no
further improvement in the likelihood of the neighbour-joining tree
was observed. The final parameter estimates were used to construct a
tree by maximum likelihood. The phylogeny was rooted on E. chisoseus
on the basis of its basal position within the Araneae, as indicated by
morphological data (Fig. 1). Tree searching involved a heuristic
procedure with tree-bisection-reconnection branch swapping. Bootstrap
resampling (100 replicates; Felsenstein, 1985) was used to assign
support for particular branches within the tree. Neighbour-joining
trees were constructed in MEGA 2.1 (Kumar et al., 2001) using the
Tajima-Nei (1984) model of nucleotide substitution.
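
Bootstrap support rests on resampling alignment columns with replacement and rebuilding the tree from each pseudo-replicate. The sketch below shows only the resampling step. The tree-building step, done in PAUP* in the study, is not reproduced. The alignment, taxon names and function name are hypothetical.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement (one bootstrap pseudo-replicate)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    # Draw as many column indices as there are sites, with replacement.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Apply the same column choice to every sequence so columns stay intact.
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


# Hypothetical toy nucleotide alignment (not data from the paper).
toy = {
    "taxon_1": "ATGCAGGCTCTT",
    "taxon_2": "ATGCAAGCTCTA",
    "taxon_3": "ATGCAGGCACTT",
}

rng = random.Random(1)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
```

Support for a branch is then the fraction of replicate trees in which that branch appears.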
A probability-based, Bayesian approach to tree construction was
carried out using MRBAYES (Huelsenbeck & Ronquist, 2001).
This package uses a Metropolis-coupled Markov chain Monte Carlo
algorithm to allow the running of multiple chains. A run of four chains
for 1 000 000 generations with a burn-in time of 100 000 generations
was carried out to ensure Markov chain convergence. A general
time reversible model of nucleotide substitution was used allowing
for rate heterogeneity across sites, with a proportion of sites
allowed to be invariant.
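
The idea behind Metropolis-coupled MCMC can be illustrated on a toy problem. The sketch below is not MRBAYES and involves no trees. It samples a one-dimensional bimodal density with one cold chain and three heated chains, proposes state swaps between chains, and discards a burn-in period. The target density, the heating scheme 1/(1 + heat × i), the step size and all names are assumptions made for illustration. The generation counts are scaled down from the 1 000 000 and 100 000 used in the study.

```python
import math
import random


def log_target(x: float) -> float:
    """Log density (up to a constant) of a toy bimodal target: unit normals at -3 and +3."""
    a = -0.5 * (x + 3.0) ** 2
    b = -0.5 * (x - 3.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def mc3(n_generations: int, burn_in: int, n_chains: int = 4,
        heat: float = 0.2, step: float = 1.0, seed: int = 0) -> list:
    """Metropolis-coupled MCMC; returns post-burn-in samples from the cold chain."""
    rng = random.Random(seed)
    # Chain i samples target ** beta_i; beta_0 = 1 is the cold chain.
    betas = [1.0 / (1.0 + heat * i) for i in range(n_chains)]
    states = [0.0] * n_chains
    samples = []
    for generation in range(n_generations):
        # Within-chain Metropolis update with a symmetric Gaussian proposal.
        for i in range(n_chains):
            proposal = states[i] + rng.gauss(0.0, step)
            log_ratio = betas[i] * (log_target(proposal) - log_target(states[i]))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                states[i] = proposal
        # Propose swapping the states of two randomly chosen chains.
        if n_chains > 1:
            i, j = rng.sample(range(n_chains), 2)
            log_swap = (betas[i] - betas[j]) * (
                log_target(states[j]) - log_target(states[i])
            )
            if rng.random() < math.exp(min(0.0, log_swap)):
                states[i], states[j] = states[j], states[i]
        # Only the cold chain is recorded, and only after burn-in.
        if generation >= burn_in:
            samples.append(states[0])
    return samples


samples = mc3(n_generations=10_000, burn_in=1_000)
```

The heated chains move between modes more easily than the cold chain, and accepted swaps pass those moves to the cold chain. This is why several chains are run together rather than a single one.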
Maximum likelihood analysis of amino acid sequences using
the Jones, Taylor and Thornton (JTT; 1992) substitution matrix was
performed using the program MOLPHY v. 2.3 (Adachi & Hasegawa,
1996). Fifty bootstrap replicates were used to assign support for
individual nodes within the tree. Bayesian analysis of amino acid
sequences was also carried out using the JTT matrix with the
same burn-in and run parameters as before.
Tests for detecting recombination events based upon a phylogenetic
approach were carried out using the program Recpars (Hein, 1993).
Phylogenies with and without recombination events were evaluated
against one another by comparing their total costs using a range of
recombination to substitution costs (the recommended ratio is 1.5 : 1,
Wiuf et al., 2001). A further test for recombination was made using
the DSS (difference in sum of squares) approach (McGuire &
Wright, 2000; F84 distance measure used) as implemented in the
program TOPALI (Milne, Husmeier, McGuire & Wright, 2003, 04).
Estimates of ω, the parameter describing the ratio of non-synonymous
to synonymous (dN/dS) substitutions, were made by
maximum likelihood using the program codeml in the software
package PAML (Yang, 1997). The method allows codon bias and
variable substitution rates to be incorporated into the analysis (Yang
& Bielawski, 2000), which is essential given the AT bias of third codon
positions in spider silk (Xu & Lewis, 1990; Hayashi & Lewis, 1998).
Estimates were made based upon a given tree topology, with the
following sets of criteria: (i) assuming a single value of ω across all
branches and sites in the tree (Model = 0, Nsites = M0); (ii) allowing
for heterogeneity in ω among codons within the tree (Model = 0, Nsites
= M3, M7 or M8); (iii) allowing a different value of ω along a specified
branch in the tree (Yang & Nielsen, 2002) whilst at the same time
allowing four different classes of ω for amino acid positions (using
Model = 2, Nsites = M3).
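
The three sets of criteria correspond to settings in a codeml control file. The sketch below generates such files as text. The file names, the codon frequency setting, the starting values and the use of ncatG = 2 for the branch-site run are assumptions, not settings reported in the study. The foreground branch for the branch-site model is marked in the tree file, which is not shown.

```python
def codeml_control(seqfile: str, treefile: str, outfile: str,
                   model: int, nssites: list, ncatg: int = 3) -> str:
    """Build the text of a codeml control file for a codon-based analysis."""
    settings = [
        ("seqfile", seqfile),
        ("treefile", treefile),
        ("outfile", outfile),
        ("runmode", 0),      # 0: evaluate the user-supplied tree topology
        ("seqtype", 1),      # 1: codon sequences
        ("CodonFreq", 2),    # 2: F3x4 codon frequencies (assumed, not stated in the text)
        ("model", model),    # 0: one omega for all branches; 2: user-specified branch classes
        ("NSsites", " ".join(str(n) for n in nssites)),
        ("ncatG", ncatg),    # number of site classes for the discrete model (M3)
        ("fix_kappa", 0),    # estimate kappa
        ("kappa", 2),        # starting value (assumed)
        ("fix_omega", 0),    # estimate omega
        ("omega", 0.4),      # starting value (assumed)
    ]
    return "\n".join(f"{key:>10} = {value}" for key, value in settings) + "\n"


# (i) and (ii): one omega across branches, site models M0, M3, M7 and M8.
site_models = codeml_control(
    "silk_cterm.phy", "silk_cterm.tre", "site_models.out",
    model=0, nssites=[0, 3, 7, 8],
)

# (iii): branch-site analysis, Model = 2 with NSsites = 3.
branch_site = codeml_control(
    "silk_cterm.phy", "silk_cterm_branch1.tre", "branch_site.out",
    model=2, nssites=[3], ncatg=2,
)
```

Writing each string to its own control file and running codeml on it would yield the log-likelihoods that the nested-model tests compare.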
The tests described can theoretically detect the small number of
sites (or branches) for which ω > 1 even when ω < 1 for the majority
Acknowledgements
The authors are grateful to Dr Brent Emerson, Amy Crowther
and Dr Alison Surridge for critically reading the manuscript.
This work was supported by the University of East Anglia
and by a BBSRC grant to Prof. Hewitt.
References
Adachi, J. and Hasegawa, M. (1996) MOLPHY, Version 2.3:
Programs for Molecular Phylogenetics Based on Maximum
Likelihood. Tokyo: Institute of Statistical Mathematics.
Anisimova, M., Nielsen, R. and Yang, Z. (2003) Effect of recombination on the accuracy of the likelihood method for detecting
positive selection at amino acid sites. Genetics 164: 1229–1236.
Beckwitt, R. and Arcidiacono, S. (1994) Sequence conservation in
the C-terminal region of spider silk proteins (spidroin) from
Nephila clavipes (Tetragnathidae) and Araneus bicentarius
(Araneidae). J Biol Chem 269: 6661–6663.