The Limits of Amino Acid Sequence Data in Angiosperm Phylogenetic Reconstruction
Kare Bremer
Evolution, Vol. 42, No. 4 (Jul., 1988), 795-803.
Evolution is currently published by the Society for the Study of Evolution.
Swedish Museum of Natural History, Department of Phanerogamic Botany,
P.O. Box 50007, S-104 05 Stockholm, SWEDEN
Abstract. Amino acid sequence data are available for ribulose biphosphate carboxylase, plastocyanin, cytochrome c, and ferredoxin for a number of angiosperm families. Cladistic analysis of the data, including evaluation of all equally or almost equally parsimonious cladograms, shows that much homoplasy (parallelisms and reversals) is present and that few or no well supported monophyletic groups of families can be demonstrated. In one analysis of nine angiosperm families and 40 variable amino acid positions from three proteins, the most parsimonious cladograms were 151 steps long and contained 63 parallelisms and reversals (consistency index = 0.583). In another analysis of six families and 53 variable amino acid positions from four proteins, the most parsimonious cladogram was 161 steps long and contained 50 parallelisms and reversals (consistency index = 0.689). Single changes in both data matrices could yield most parsimonious cladograms with quite different topologies and without common monophyletic groups. Presently, amino acid sequence data are not comprehensive enough for phylogenetic reconstruction among angiosperms. More informative positions are needed, either from sequencing longer parts of the proteins or from sequencing more proteins from the same taxa.

Received June 22, 1987. Accepted January 26, 1988.

Very little is known with certainty about the phylogeny of the angiosperms (e.g., Heywood, 1977; Dahlgren and Bremer, 1985). Often, morphological data are not evaluated in enough detail, and the vast amount of information is difficult to synthesize into a single, consistent, and parsimonious hypothesis of angiosperm phylogeny. Homoplasy (i.e., parallelisms and reversals) is common in morphological data at higher taxonomic levels.

Amino acid sequences from a number of proteins are used for inferring interrelationships among an increasing number of investigated angiosperm families. If these data are reliable, a much needed general framework of angiosperm phylogeny could be generated. Most protein sequence data used in phylogenetic reconstruction come from the small subunit of ribulose biphosphate carboxylase (RBC-SSU; Martin et al., 1983; Martin and Dowd, 1984a, 1984b, 1986a), plastocyanin (Boulter et al., 1979), and cytochrome c (Boulter et al., 1972). Sometimes the phylogenetic trees generated from sequences of these different proteins are inconsistent, and there is uncertainty about the stability of the data. The investigators are generally very cautious in drawing conclusions about phylogeny from their analyses (Boulter et al., 1979; Martin et al., 1985).
A major problem is that the number of variable amino acid positions currently available in a single protein (15-30) is generally too small for a well supported hypothesis about the interrelationships of the taxa investigated (Peacock, 1981). The combination of data from several species into familial sequences (Martin et al., 1983) and the subsequent combination of familial sequences from several proteins (Martin et al., 1985) yield data sets with a larger number of informative amino acid positions. Such data sets may become large enough to say something about the interrelationships of the angiosperm families included.
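The step of combining familial sequences from several proteins into one data set can be sketched as a simple concatenation. This is a minimal sketch under stated assumptions, not the procedure of Martin et al.: the short sequences are invented, the names `concatenate` and `PROTEIN_ORDER` are hypothetical, and a family lacking data for one protein is padded with the unknown symbol `?` (dropping such families is the other obvious choice).

```python
# Hypothetical familial nucleotide sequences per protein (invented for illustration).
PROTEIN_ORDER = ["RBC-SSU", "plastocyanin", "cytochrome c"]

per_protein = {
    "RBC-SSU": {"Apiaceae": "ACGU", "Poaceae": "ACGA", "Fabaceae": "UCGA"},
    "plastocyanin": {"Apiaceae": "GGA", "Poaceae": "GGC"},  # no Fabaceae data here
    "cytochrome c": {"Apiaceae": "CU", "Poaceae": "CA", "Fabaceae": "GA"},
}


def concatenate(per_protein, order, missing="?"):
    """Join each family's per-protein sequences into one combined row.

    A family with no data for a protein is padded with `missing`
    symbols over that protein's aligned length.
    """
    families = sorted({f for seqs in per_protein.values() for f in seqs})
    matrix = {f: "" for f in families}
    for protein in order:
        seqs = per_protein[protein]
        lengths = {len(s) for s in seqs.values()}
        if len(lengths) != 1:
            raise ValueError(f"{protein}: sequences are not aligned to one length")
        (length,) = lengths
        for f in families:
            matrix[f] += seqs.get(f, missing * length)
    return matrix


combined = concatenate(per_protein, PROTEIN_ORDER)
for family, row in combined.items():
    print(family, row)
# Apiaceae ACGUGGACU / Fabaceae UCGA???GA / Poaceae ACGAGGCCA
```

The protein order is fixed up front so that every row places the same protein in the same columns.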
Martin et al. (1985) used sequences of five macromolecules, RBC-SSU, plastocyanin, cytochrome c, ferredoxin, and 5S rRNA, to construct phylogenetic trees for 11 angiosperm families. Since data from all five macromolecules are available for only three of the families, they used data sets based on combinations of three or four macromolecules. The trees from the different analyses were compared, and inconsistencies were discussed. One analysis included sequences from RBC-SSU, plastocyanin, and cytochrome c in nine families. One most parsimonious tree and 12 trees that were one or two steps longer were found. In all 13 trees, Chenopodiaceae and Polygonaceae appeared as sister groups, as did Fabaceae and Brassicaceae. Martin et al. (1985, p. 399) concluded that these two sister-group relationships "appear to be very reliable."

[Figure 1: data matrix; rows: Apiaceae, Brassicaceae, Caprifoliaceae, Solanaceae, Poaceae, Chenopodiaceae, Fabaceae, Asteraceae, Polygonaceae; columns: 64 variable nucleotide positions]

Fig. 1. Data matrix for nine angiosperm families and 64 variable nucleotide positions (columns) inferred from amino acid sequences of three proteins (see text for source of data). The letters A, C, G, and U represent nucleotides; ? = unknown. Each position is coded with a letter indicating the protein (S = RBC-SSU, P = plastocyanin, and C = cytochrome c), immediately followed by the position in the amino acid sequence and ending with the position (1, 2, or 3) in the inferred nucleotide triplet.
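The search for the shortest trees, and for trees one or two steps longer, can be illustrated on a toy problem. Martin et al. based their analyses on the Hendy and Penny (1982) algorithm, which finds minimal trees without scoring every topology. The sketch below instead swaps in plain exhaustive enumeration, which is practical only for a handful of taxa (5 taxa give 15 unrooted binary trees, 9 taxa already 135,135). Each tree is scored with Fitch's parsimony count, treating `?` as any nucleotide. The five-taxon matrix and all function names are hypothetical.

```python
from collections import Counter

NUCLEOTIDES = frozenset("ACGU")


def add_leaf(tree, leaf):
    """Yield every tree made by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)  # attach on the edge above this subtree
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in add_leaf(left, leaf):
            yield (new_left, right)
        for new_right in add_leaf(right, leaf):
            yield (left, new_right)


def unrooted_trees(taxa):
    """All unrooted binary trees on >= 3 taxa, each written as (taxa[0], subtree)."""
    trees = [taxa[1]]
    for leaf in taxa[2:]:
        trees = [t for tree in trees for t in add_leaf(tree, leaf)]
    return [(taxa[0], t) for t in trees]


def fitch(tree, states):
    """Fitch pass for one character: (state set, number of steps)."""
    if not isinstance(tree, tuple):
        s = states[tree]
        return (NUCLEOTIDES if s == "?" else frozenset(s)), 0
    (a, steps_a), (b, steps_b) = fitch(tree[0], states), fitch(tree[1], states)
    if a & b:
        return a & b, steps_a + steps_b
    return a | b, steps_a + steps_b + 1


def tree_length(tree, matrix):
    """Sum of Fitch steps over all characters (columns) of the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(fitch(tree, {t: seq[i] for t, seq in matrix.items()})[1]
               for i in range(n_chars))


# Hypothetical five-taxon matrix (invented for illustration).
matrix = {"A": "AAUU", "B": "AAUU", "C": "GGUU", "D": "GGCC", "E": "GGCC"}

scored = [(tree_length(t, matrix), t) for t in unrooted_trees(sorted(matrix))]
best = min(length for length, _ in scored)
near_best = [t for length, t in scored if length <= best + 2]
print(len(scored), best, len(near_best))  # 15 4 5
print(sorted(Counter(length for length, _ in scored).items()))
# [(4, 1), (6, 4), (8, 10)]
```

Taxon `A` is held at the root position so that each unrooted tree is listed exactly once; Fitch's count does not depend on where the root is placed.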
Despite the detailed analyses by Martin et al. (1985), uncertainty about the conclusions remains, because the extent of homoplasy in the data was not estimated. I have investigated the number of parallelisms and reversals necessary for explaining the data, and I discuss the equally or almost equally parsimonious cladograms (trees) possible in relation to the homoplasy involved. I do not consider the methods used by earlier authors to construct trees from amino acid sequence data (see Peacock, 1981). The methods differed and were much constrained by the algorithms and computer power available at the time (Boulter et al., 1972, 1979; Martin et al., 1983, 1985; Martin and Dowd, 1986a). Essentially, I use the same parsimony analysis as Martin et al. (1983, 1985; based on the algorithm of Hendy and Penny [1982]), but in addition to constructing cladograms, I seek to determine the extent of homoplasy in the amino acid sequences, how many cladograms are possible, and what reliable phylogenetic information is actually present in those cladograms.
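Estimating the extent of homoplasy on a given cladogram can be sketched as follows, assuming the usual definitions: a character's minimum conceivable number of changes is its number of observed states minus one; the steps a tree needs beyond the sum of those minima are the parallelisms and reversals; and the consistency index is that sum divided by tree length. Under these definitions the abstract's figures for the second analysis agree, since (161 - 50)/161 ≈ 0.689. The matrix, the two trees, and the function names are hypothetical.

```python
NUCLEOTIDES = frozenset("ACGU")


def fitch_steps(tree, states):
    """Fitch's count of minimum changes for one character on a fixed binary tree.

    `tree` is a nested 2-tuple of taxon names; `states` maps taxon -> nucleotide,
    with "?" treated as any nucleotide.
    """
    def visit(node):
        if not isinstance(node, tuple):
            s = states[node]
            return (NUCLEOTIDES if s == "?" else frozenset(s)), 0
        (a, ka), (b, kb) = visit(node[0]), visit(node[1])
        shared = a & b
        return (shared, ka + kb) if shared else (a | b, ka + kb + 1)

    return visit(tree)[1]


def homoplasy_summary(tree, matrix):
    """Return (tree length, extra steps, consistency index) for one tree."""
    n_chars = len(next(iter(matrix.values())))
    length = minimum = 0
    for i in range(n_chars):
        column = {taxon: seq[i] for taxon, seq in matrix.items()}
        length += fitch_steps(tree, column)
        observed = {s for s in column.values() if s != "?"}
        minimum += max(len(observed) - 1, 0)  # fewest changes possible on any tree
    extra = length - minimum                  # parallelisms and reversals
    ci = minimum / length if length else 1.0
    return length, extra, ci


# Hypothetical five-taxon matrix and two candidate trees (invented for illustration).
matrix = {"A": "AAUU", "B": "AAUU", "C": "GGUU", "D": "GGCC", "E": "GGCC"}
print(homoplasy_summary(("A", ("B", ("C", ("D", "E")))), matrix))  # (4, 0, 1.0)
print(homoplasy_summary(("A", (("B", "D"), ("C", "E"))), matrix))  # (8, 4, 0.5)

# Consistency check on the abstract's second analysis: 161 steps, 50 extra steps.
print(round((161 - 50) / 161, 3))  # 0.689
```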
MATERIALS AND METHODS
Two data matrices were analyzed. The first one (Fig. 1) includes the same nine angiosperm families discussed by Martin et al. (1985, fig. 1c) in their conclusions (Apiaceae, Brassicaceae, Caprifoliaceae, Solanaceae, Poaceae, Chenopodiaceae, Fabaceae, Asteraceae, and Polygonaceae). The sequences included are from RBC-SSU, plastocyanin, and cytochrome c and comprise 64 variable nucleotide positions, inferred from 40 amino acid positions. The second data matrix (Fig. 2) is limited to those six families for which there are sequence data for four proteins (RBC-SSU, plastocyanin, cytochrome c, and ferredoxin). The six families are Apiaceae, Brassicaceae, Caprifoliaceae, Poaceae, Chenopodiaceae, and Fabaceae, and the sequences comprise 82 variable nucleotide positions, inferred from 53 amino acid positions.
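Restricting a matrix to its variable positions, as was done for both data matrices, can be sketched as a column filter. The sketch also flags parsimony-informative positions (at least two states, each present in two or more taxa), because a variable position that is not informative adds the same number of steps to every tree. The sequences are invented, the helper names are hypothetical, and `?` is ignored when counting states.

```python
from collections import Counter


def variable_positions(matrix, missing="?"):
    """Indices of columns with more than one observed state (missing ignored)."""
    n = len(next(iter(matrix.values())))
    return [i for i in range(n)
            if len({seq[i] for seq in matrix.values()} - {missing}) > 1]


def informative_positions(matrix, missing="?"):
    """Indices of columns where at least two states each occur in two or more taxa."""
    n = len(next(iter(matrix.values())))
    result = []
    for i in range(n):
        counts = Counter(seq[i] for seq in matrix.values() if seq[i] != missing)
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            result.append(i)
    return result


def keep_columns(matrix, columns):
    """Return a new matrix containing only the given column indices."""
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in matrix.items()}


# Hypothetical nucleotide rows (invented for illustration).
matrix = {
    "Apiaceae": "ACGUA",
    "Poaceae": "ACGCA",
    "Fabaceae": "AUGCG",
    "Solanaceae": "A?GUG",
}
print(variable_positions(matrix))     # [1, 3, 4]
print(informative_positions(matrix))  # [3, 4]
print(keep_columns(matrix, variable_positions(matrix)))
```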
The data for each family generally come from several species (one species in some cases), the proteins of which have been sequenced. By performing preliminary analyses, Martin et al. (1983) established inferred familial sequences from the species sequences. Martin et al. (1983) converted the amino acid sequences to inferred nucleotide sequences before analysis, using the method outlined by Penny et al. (1980) and Fitch and Farris (1974). I used the family nucleotide sequences converted from RBC-SSU, plastocyanin, and cytochrome c amino acid sequences in the accessory publication to Martin et al. (1983). For ferredoxin, each family is represented by one species, except the Fabaceae, which is represented by Leucaena glauca and Medicago sativa. Positions with different amino acids in these two