The Limits of Amino Acid Sequence Data in Angiosperm Phylogenetic Reconstruction
Kåre Bremer
Evolution, Vol. 42, No. 4 (Jul., 1988), 795-803.
Stable URL: http://links.jstor.org/sici?sici=0014-3820%28198807%2942%3A4%3C795%3ATLOAAS%3E2.0.CO%3B2-M
Evolution is currently published by Society for the Study of Evolution.

THE LIMITS OF AMINO ACID SEQUENCE DATA IN ANGIOSPERM PHYLOGENETIC RECONSTRUCTION

KÅRE BREMER
Swedish Museum of Natural History, Department of Phanerogamic Botany, P.O. Box 50007, S-104 05 Stockholm, SWEDEN

Abstract. Amino acid sequence data are available for ribulose biphosphate carboxylase, plastocyanin, cytochrome c, and ferredoxin for a number of angiosperm families.
Cladistic analysis of the data, including evaluation of all equally or almost equally parsimonious cladograms, shows that much homoplasy (parallelisms and reversals) is present and that few or no well supported monophyletic groups of families can be demonstrated. In one analysis of nine angiosperm families and 40 variable amino acid positions from three proteins, the most parsimonious cladograms were 151 steps long and contained 69 parallelisms and reversals (consistency index = 0.543). In another analysis of six families and 53 variable amino acid positions from four proteins, the most parsimonious cladogram was 161 steps long and contained 50 parallelisms and reversals (consistency index = 0.689). Single changes in both data matrices could yield most parsimonious cladograms with quite different topologies and without common monophyletic groups. Presently, amino acid sequence data are not comprehensive enough for phylogenetic reconstruction among angiosperms. More informative positions are needed, either from sequencing longer parts of the proteins or from sequencing more proteins from the same taxa.

Received June 22, 1987. Accepted January 26, 1988.

Very little is known with certainty about the phylogeny of the angiosperms (e.g., Heywood, 1977; Dahlgren and Bremer, 1985). Often, morphological data are not evaluated in enough detail, and the vast amount of information is difficult to synthesize in a single, consistent, and parsimonious hypothesis of angiosperm phylogeny. Homoplasy (i.e., parallelisms and reversals) is common in morphological data at higher taxonomic levels.

Amino acid sequences from a number of proteins are used for inferring interrelationships between an increasing number of investigated angiosperm families. If these data are reliable, a much needed general framework of angiosperm phylogeny could be generated. Most protein sequence data used in phylogenetic reconstruction come from the small subunit of ribulose biphosphate carboxylase (RBC-SSU; Martin et al., 1983; Martin and Dowd, 1984a, 1984b, 1984c), plastocyanin (Boulter et al., 1979), and cytochrome c (Boulter et al., 1972). Sometimes the phylogenetic trees generated from sequences in these different proteins are inconsistent, and there is uncertainty about the stability of the data. The investigators are generally very cautious in drawing conclusions about phylogeny from their analyses (Boulter et al., 1979; Martin et al., 1985).

A major problem is that the number of variable amino acid positions currently available in a single protein (15-30) is generally too small for a well supported hypothesis about the interrelationships of the taxa investigated (Peacock, 1981). The combination of data from several species into familial sequences (Martin et al., 1983) and the subsequent combination of familial sequences from several proteins (Martin et al., 1985) yield data sets with a larger number of informative amino acid positions. Such data sets may become large enough to say something about the interrelationships of the angiosperm families included.

Martin et al. (1985) used sequences of five macromolecules, RBC-SSU, plastocyanin, cytochrome c, ferredoxin, and 5S rRNA, to construct phylogenetic trees for 11 angiosperm families. Since data from all five macromolecules are available for only three of the families, they used data sets based on combinations of three or four macromolecules. The trees from the different analyses were compared, and inconsistencies were discussed. One analysis included sequences from RBC-SSU, plastocyanin, and cytochrome c in nine families. One most parsimonious tree and 12 trees that were one or two steps longer were found.
In all 13 trees, Chenopodiaceae and Polygonaceae appeared as sister groups, as did Fabaceae and Brassicaceae. Martin et al. (1985 p. 399) concluded that these two sister-group relationships "appear to be very reliable."

[Fig. 1: data matrix. Rows: Apiaceae, Brassicaceae, Caprifoliaceae, Solanaceae, Poaceae, Chenopodiaceae, Fabaceae, Asteraceae, Polygonaceae. Columns: 64 variable nucleotide positions.]

FIG. 1. Data matrix for nine angiosperm families and 64 variable nucleotide positions (columns) inferred from amino acid sequences of three proteins (see text for source of data). The letters A, C, G, and U represent nucleotides; ? = unknown. The positions are coded with a letter indicating the protein (S = RBC-SSU, P = plastocyanin, C = cytochrome c), immediately followed by the position in the amino acid sequence, and ending with the position (1, 2, or 3) in the inferred nucleotide triplet.

Despite the detailed analyses by Martin et al.
(1985), uncertainty about the conclusions remains, because the extent of homoplasy in the data was not estimated. I have investigated the number of parallelisms and reversals necessary for explaining the data, and I discuss the equally or almost equally parsimonious cladograms (trees) possible in relation to the homoplasy involved. I do not consider the methods used by earlier authors to construct trees from amino acid sequence data (see Peacock, 1981). The methods differed and were much constrained by the algorithms and computer power available at the time (Boulter et al., 1972, 1979; Martin et al., 1983, 1985; Martin and Dowd, 1986a). Essentially, I use the same parsimony analysis as Martin et al. (1983, 1985; based on the algorithm of Hendy and Penny [1982]), but in addition to constructing cladograms, I seek to determine the extent of homoplasy in the amino acid sequences, how many cladograms are possible, and what reliable phylogenetic information is actually present in those cladograms.

MATERIALS AND METHODS

Two data matrices were analyzed. The first one (Fig. 1) includes the same nine angiosperm families discussed by Martin et al. (1985 fig. 1c) in their conclusions (Apiaceae, Brassicaceae, Caprifoliaceae, Solanaceae, Poaceae, Chenopodiaceae, Fabaceae, Asteraceae, and Polygonaceae). The sequences included are from RBC-SSU, plastocyanin, and cytochrome c and comprise 64 variable nucleotide positions, inferred from 40 amino acid positions. The second data matrix (Fig. 2) is limited to those six families for which there are sequence data for four proteins (RBC-SSU, plastocyanin, cytochrome c, and ferredoxin).
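
The relation between tree length, homoplasy, and consistency index used in the abstract can be checked with a short calculation. The following is a minimal sketch in Python and is not part of the original paper. It assumes that the consistency index is the minimum possible number of steps divided by the observed tree length, and that every step beyond the minimum counts as one parallelism or reversal. The function name `consistency_index` is hypothetical. The inputs (151 steps with 69 homoplasies, 161 steps with 50 homoplasies) are the values that reproduce the indices of 0.543 and 0.689 given in the abstract.

```python
def consistency_index(tree_length: int, homoplasies: int) -> float:
    """Consistency index, taken here as minimum steps / observed steps.

    Assumption (not stated in the source): each step beyond the minimum
    is one parallelism or reversal, so
    minimum steps = tree_length - homoplasies.
    """
    if tree_length <= 0:
        raise ValueError("tree_length must be positive")
    if not 0 <= homoplasies <= tree_length:
        raise ValueError("homoplasies must lie between 0 and tree_length")
    return (tree_length - homoplasies) / tree_length


# Nine-family matrix (three proteins): 151 steps, 69 homoplasies.
nine_family_ci = consistency_index(tree_length=151, homoplasies=69)
# Six-family matrix (four proteins): 161 steps, 50 homoplasies.
six_family_ci = consistency_index(tree_length=161, homoplasies=50)

print(f"{nine_family_ci:.3f}")  # prints 0.543
print(f"{six_family_ci:.3f}")   # prints 0.689
```

Under these assumptions the nine-family trees need a minimum of 82 steps and the six-family tree a minimum of 111, so a little under half of the changes in the first matrix and about a third in the second are homoplasious.
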
The six families are Apiaceae, Brassicaceae, Caprifoliaceae, Poaceae, Chenopodiaceae, and Fabaceae, and the sequences comprise 82 variable nucleotide positions, inferred from 53 amino acid positions.

The data for each family generally come from several species (one species in some cases), the proteins of which have been sequenced. By performing preliminary analyses, Martin et al. (1983) established inferred familial sequences from the species sequences. Martin et al. (1983) converted the amino acid sequences to inferred nucleotide sequences before analysis, using the method outlined by Penny et al. (1980) and Fitch and Farris (1974). I used the family nucleotide sequences converted from RBC-SSU, plastocyanin, and cytochrome c amino acid sequences in the accessory publication to Martin et al. (1983). For ferredoxin, each family is represented by one species, except the Fabaceae, which is represented by Leucaena glauca and Medicago sativa. Positions with different amino acids in these two
