
AP Biology
Comparing DNA Sequences to Understand Evolutionary Relationships With BLAST
Name_______________________________ Date_______________________ Block_____

Background.

Between 1990 and 2003, scientists working on an international research project known as the Human Genome Project were able to identify and map the approximately 20,000 genes that define a human being. The project also successfully mapped the genomes of other species, including the fruit fly, mouse, and Escherichia coli. The location and complete sequence of the genes in each of numerous species are now available for anyone in the world to access via the Internet.

Why is this information important? Being able to identify the precise location and sequence of human genes will allow us to better understand aspects of human development and genetic diseases. In addition, learning about the sequence of genes in other species helps us understand evolutionary relationships among organisms. Many of our genes are identical or similar to those found in other species. Suppose you identify a single gene that is responsible for a particular disease in fruit flies. Is that same gene found in humans? Does it cause a similar disease? It would take you nearly 10 years to read through the entire human genome to try to locate the same sequence of bases as that in fruit flies. This definitely isn’t practical, so a sophisticated technological method is needed.

Bioinformatics is a field that combines statistics, mathematical modeling, and computer science to analyze biological data. Using bioinformatics methods, entire genomes can be quickly compared in order to detect genetic similarities and differences. An extremely powerful bioinformatics tool is BLAST, which stands for Basic Local Alignment Search Tool. Using BLAST, you can input a gene sequence of interest and search entire genomic libraries for identical or similar sequences in a matter of seconds. In this laboratory investigation, you will use BLAST to compare several genes, and then transfer the information into Biology Workbench to construct a cladogram.
A cladogram (also called a phylogenetic tree) is a visualization of the evolutionary relatedness of species. Figure 1 is a simple cladogram. Note that the cladogram is treelike, with the endpoints of each branch representing a specific species. The closer two species are located to each other, the more recently they share a common ancestor. For example, Selaginella (spikemoss) and Isoetes (quillwort) share a more recent common ancestor with each other than either shares with the other species in the figure. Figure 2 (next page) includes additional details, such as the evolution of particular physical structures called shared derived characters. Note that the placement of derived characters indicates that every species above the character label possesses that structure. For example, tigers and gorillas have hair, but lampreys, sharks, salamanders, and lizards do not have hair. The Figure 2 cladogram can be used to answer several questions. Which organisms have lungs? What three structures do all lizards possess? Did dry skin or hair evolve first?


Historically, only physical structures were used to create cladograms; however, current cladistics relies heavily on genetic evidence as well. Chimpanzees and humans share about 98% of their DNA, which would place them closely together on a cladogram. Humans and fruit flies share approximately 60% of their DNA, which would place them farther apart on a cladogram. Can you draw a cladogram that depicts the evolutionary relationship among humans, chimpanzees, fruit flies, and mosses?

Pre-lab Assignment.
Complete 2 and 3 in your lab notebook.
1. Work through http://www.ucmp.berkeley. … .htm to learn more about the evolution of flight and the development of phylogenetic trees using cladistics.
2. Use the data at right to construct a cladogram of the major plant groups.
3. GAPDH (glyceraldehyde 3-phosphate dehydrogenase) is an enzyme that catalyzes the sixth step in glycolysis, an important reaction that produces molecules used in cellular respiration. The following data table shows the percentage similarity of this gene and the protein it expresses in humans versus other species. For example, according to the table, the GAPDH gene in chimpanzees is 99.6% identical to the gene found in humans, while the protein is identical.
a. Why is the percentage similarity in the gene always lower than the percentage similarity in the protein for each of the species? (Hint: Recall how a gene is expressed to produce a protein.)
b. Draw a cladogram depicting the evolutionary relationships among all five species (including humans) according to their percentage similarity in the GAPDH gene.
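The hint in question 3a can be explored with a short Python sketch. It compares two made-up coding sequences at the DNA level and again after translation. The sequences, the partial codon table, and the function names are illustrative assumptions; they are not GAPDH data and not part of BLAST or Biology Workbench.

```python
# Toy example only: hypothetical sequences, not real GAPDH data.
# The codon table lists just the codons used below (standard genetic code).
CODON_TABLE = {
    "ATG": "M", "GGT": "G", "GGC": "G", "AAA": "K", "AAG": "K",
    "GTT": "V", "GTC": "V", "CTG": "L", "ATC": "I",
}


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acids, codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


species_a = "ATGGGTAAAGTTCTG"  # hypothetical gene fragment
species_b = "ATGGGCAAGGTCATC"  # same length, several base changes

gene_identity = percent_identity(species_a, species_b)
protein_identity = percent_identity(translate(species_a), translate(species_b))
print(f"gene: {gene_identity:.1f}%  protein: {protein_identity:.1f}%")
```

In this toy pair, several base changes fall in the third position of a codon and leave the amino acid unchanged, so the two percentages differ. Whether the same pattern explains the GAPDH table is for you to argue in your notebook.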

Part 1. Molecular Phylogeny and Marine Mammals
Adapted from: Maier, Caroline Alexandra. “Building Phylogenetic Trees from DNA Sequence Data: Investigating Polar Bear and Giant Panda Ancestry.” The American Biology Teacher, Vol. 63, No. 9, pp. 643-646 and Foglia, Kim and Stuart M. Brown. “Walruses, Whales and Hippos, Oh My: Using Bioinformatics to Teach Cladistics and Evolution.” <http://www.nyu. … .pdf> accessed 24 April 2011.

Walruses, whales, dolphins, seals, and manatees are all marine mammals. They all have streamlined bodies, legs reduced to flippers, blubber under the skin and other adaptations for survival in the water. Although mammals evolved on land, these species have returned to the sea. Did they evolve from a single ancestor who returned to the ocean, or were there distinct return events from separate ancestors?

It is not possible to go back in time to observe what happened, but DNA and protein sequences contain evidence about the evolutionary history of organisms and the relationships between living creatures. Once we collect and analyze DNA or protein sequences of marine and land mammals, it is possible that the data will reveal the evolutionary history of marine mammals.

The goal of this analysis is to test hypotheses about the evolutionary ancestry of different marine mammals: Did marine mammals evolve from a single ancestor that returned to the ocean, or were there different return events and parallel (convergent) evolution that led to similar adaptations among these species? A useful starting hypothesis is that all modern marine mammals have a single common land mammal ancestor.

This analysis uses a protein that all mammals share, the hemoglobin beta protein. Hemoglobin is a protein made of four polypeptide subunits, two alpha chains and two beta chains. Hemoglobin is a good molecule for this evolutionary analysis because it shows conservation across species, since it performs the essential function of carrying oxygen in the blood, and variation between species, due to random DNA mutations accumulating over time. In addition, many biologists have studied hemoglobin, so sequences from many different organisms are available in the GenBank database, a public database of gene and protein sequences, available at the National Center for Biotechnology Information (NCBI) website.

After obtaining the hemoglobin beta amino acid sequences of several mammalian species from GenBank, you will load that information into analysis software called Biology Workbench. Using the tools available in Biology Workbench, you will compare the amino acid sequences and create phylogenetic trees to determine the evolutionary relationships between these species.

Table 1: Accession Numbers of Species (to be used with NCBI site)

Species              Scientific Name                  Hemoglobin beta (Hbb) Accession Number
Abyssinian Hyrax     Procavia capensis habessinica    P02086
African Elephant     Loxodonta africana               P02085
Amazon Manatee       Trichechus manatus               P07415
Bottlenose Dolphin   Tursiops truncatus               P18990
Domestic cow         Bos taurus                       P02070
Domestic dog         Canis lupus familiaris           P60524
Harbor Seal          Phoca vitulina                   P09909
Hippopotamus         Hippopotamus amphibius           P19016
Human                Homo sapiens                     P68871
Minke Whale          Balaenoptera acutorostrata       P18984
Mouse                Mus musculus                     ….1
Pacific Walrus       Odobenus rosmarus divergens      P68046
Red Kangaroo         Macropus rufus                   P02107
Rhesus Monkey        Macaca mulatta                   AFE67078
Sperm Whale          Physeter catodon                 P09905

Procedure
Part 1 - Accessing hemoglobin beta (Hbb) amino acid sequences and importing them into Biology Workbench
1. Open a new window in Firefox, and log on to http://workbench.sdsc. … Click on “register for a free account”. Set up your account. Then enter ‘Biology Workbench 3.2’. On the next page select ‘Protein Tools’, then ‘Add New Protein Sequence’ and ‘Run’.
2. In the label box, type “Amazon Manatee”.
3. Using Firefox, go to www.ncbi.nlm.nih. … In the Search pull-down menu at the top, select “Protein”, and in the space below enter the accession number (located in Table 1 above) for the amino acid sequence of the Amazon Manatee. Select ‘Go’.
4. This page shows you the classification of the organism (Domain, Kingdom, Phylum, etc.), as well as relevant journal articles. Make sure the display format is “GenPept” to view the information about the sequence. Double-check with the scientific name listed in Table 1 to make sure you have the hemoglobin sequence for the correct organism.
5. At the top of the screen, beside ‘Display’, select “FASTA”. This will convert the amino acid sequence into a format that can be read by the Biology Workbench program. Note that the sequence uses the one-letter abbreviations for the names of the amino acids.
6. To import the sequence into Biology Workbench, highlight the entire FASTA sequence, including the initial ‘>’ symbol. Copy the highlighted sequence.
7. Position the cursor in the top left corner of the sequence box and paste the copied amino acid sequence.
8. Replace the FASTA identification lines (everything after “>” and before the amino acid sequence) with the descriptive name (Amazon Manatee) used in the label box. To do this, position the cursor after the initial ‘>’ symbol, highlight the rest of the identification lines, and delete. Type the descriptive name immediately behind the ‘>’ symbol. Make sure to keep the ‘>’ and all of the amino acid sequence!
9. Select ‘save’ at the bottom of the screen.
10. Return to the window open to NCBI, and repeat steps 2-9 for each of the other species.
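The relabeling described in the procedure (keep the ‘>’ symbol and the amino acid sequence, replace the identification text with a descriptive name) can be sketched in Python. This is only an illustration of the edit you perform by hand; the function name `relabel_fasta` and the example record are hypothetical and are not real NCBI output or part of Biology Workbench.

```python
def relabel_fasta(fasta_text, label):
    """Replace the FASTA identification line with '>' + label and keep the sequence."""
    lines = [line.strip() for line in fasta_text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA text must begin with a '>' identification line")
    sequence_lines = [line for line in lines[1:] if line]
    if any(line.startswith(">") for line in sequence_lines):
        raise ValueError("expected a single sequence record")
    return ">" + label + "\n" + "\n".join(sequence_lines) + "\n"


# Hypothetical record: the header and residues are placeholders for illustration.
example = """>xx|X00000|PLACEHOLDER identification text from the database
MVHLTPEEKS
AVTALWGKVN
"""

relabeled = relabel_fasta(example, "Amazon Manatee")
print(relabeled)
```

The function refuses input that does not start with ‘>’, which mirrors the warning in the procedure about keeping that symbol.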

Part Two - Analyzing the relationships between different species using the Hbb sequence
1. First, align sequences for five species (Domestic Cow, Domestic Dog, Harbor Seal, Minke Whale, and Red Kangaroo) and draw a tree showing evolutionary relationships based on the hemoglobin beta amino acid sequences.
a. In Biology Workbench, select ‘Protein Tools’ and select the five species sequences (Domestic Cow, Domestic Dog, Harbor Seal, Minke Whale, and Red Kangaroo) by checking the boxes beside the names at the bottom of the page.
b. From the Protein Tools menu box, choose “CLUSTALW”, followed by ‘run’ and then ‘submit’. (Note: Here and in other steps, the program may run automatically, and you may need only enter ‘submit’.) Observe the alignment of sequences, paying attention to the consensus key.
c. Click on “Import Alignment” to enter the Alignment Tools section of Biology Workbench.
2. Determine the Genetic Distance between sequence pairs
a. Click on the box to the left of the aligned sequences, then choose “CLUSTALDIST”, ‘Run’ and ‘Submit’. These numbers indicate the degree of difference between the hemoglobin beta amino acid sequences from each species. A difference of 0.00 indicates identical sequences. As the difference between two sequences increases, their distance number increases.
b. Observe the clustal distance matrix and record in a table in your lab notebook.
c. Which of the five species appears least related to the other four based on this sequence analysis? Click ‘Return’.
3. Build a rooted Phylogenetic Tree of these five species
a. From the Alignment Tools menu, choose “DRAWGRAM” then ‘Run’ and ‘Submit’.
b. Draw the tree that appears on the screen in the data section.
c. Which species is least related to others based on analysis of the Hbb amino acid sequence? Hit return to bring you back to the main page.
4. Determine the relationships of all twelve mammalian species.
a. Repeat step 1 above, but, this time, select all twelve species.
b. Repeat steps 2 and 3 to create a Phylogenetic Tree that includes all twelve species. Be sure to select the appropriate CLUSTALW group during step 2.
c. What does this tree suggest about whether or not all marine mammals share a single ancestor that returned to the ocean?
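To make the idea of a distance matrix concrete, the sketch below computes a simple pairwise distance (the fraction of aligned positions that differ) for a toy alignment. This is an assumption-laden illustration: the sequences are invented, the function names are hypothetical, and CLUSTALDIST may calculate its numbers differently. It only shows why identical sequences score 0.00 and why the number grows as sequences diverge.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":  # skip columns where either sequence has a gap
            continue
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differing / compared


def distance_matrix(alignment):
    """Return matrix[name1][name2] for every pair of names in the alignment."""
    names = list(alignment)
    matrix = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        matrix[a][b] = matrix[b][a] = p_distance(alignment[a], alignment[b])
    return matrix


def least_related(matrix):
    """Name whose average distance to all the other sequences is largest."""
    def mean_distance(name):
        others = [d for other, d in matrix[name].items() if other != name]
        return sum(others) / len(others)
    return max(matrix, key=mean_distance)


# Invented aligned fragments ('-' marks a gap); not real hemoglobin data.
toy_alignment = {
    "Species A": "MVHLTGEEKA",
    "Species B": "MVHLTAEEKA",
    "Species C": "MVHWSAEEKS",
    "Species D": "MV-FSDDQRS",
}

matrix = distance_matrix(toy_alignment)
for name, row in matrix.items():
    print(name, {other: round(d, 2) for other, d in row.items()})
print("Least related:", least_related(matrix))
```

Averaging each row is one simple way to pick out the least related sequence; reading the matrix by eye, as the procedure asks, should lead to the same kind of judgment.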

) Table 2. and Red Kangaroo Phylogenetic Tree #2 – All fifteen species listed in Table 1. Domestic Dog. Look up this protein and find out what it does.Domestic Cow. Model of table to be recorded in your lab notebook. The lengths of the branches do NOT necessarily indicate relative time since divergence between species. A useful protein to study a range of organisms. Can hemoglobin beta protein be used to study plants? Fungus? Think about the function of this protein and hypothesize as to the types of organisms that would produce this protein. beyond animals. Domestic Dog. Minke Whale. even beyond multicellular creatures! is cytochrome C. Part 2.Data Record Example Make a table similar to the one below in your lab notebook to record Clustal Distance Analysis for Hemoglobin beta sequences of Domestic Cow. Minke Whale. Molecular Phylogeny and the Tree of Life Suppose one is interested in examining more branches of the tree of life. you will construct a phylogenetic tree using information stored in the NCBI database and the alignment tools of Biology Workbench. so do not need to be filled in. (NOTE: Shaded squares would repeat data. Record genetic distance between each pair of species in the appropriate box. Draw the resulting tree in your notebook. Harbor Seal. The list of organisms you will analyze is found at the top of the next page. beyond vertebrates. and Red Kangaroo Record species names in boxes on top and right side of table. Note that each node indicates the most recent common ancestor of two (or more) species. Species names Draw the two phylogenetic trees in your lab notebook. Harbor Seal. 6 . How does its function help to explain why it is present in a very broad range of organisms? What other structures and processes are likely to be shared among all eukaryotes? Among both prokaryotes and eukaryotes? Using the same steps you followed above. Phylogenetic Tree #1 .
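To see how a tree can follow from a table of pairwise distances such as Table 2, here is a minimal sketch of UPGMA, a simple average-linkage clustering method. UPGMA is substituted here for illustration and is not necessarily the method Biology Workbench uses; the species labels and distances are invented, and the function name `upgma` is hypothetical. The output is a nested-parentheses string in which the most closely grouped names are the most similar.

```python
def upgma(names, dist):
    """Cluster names by average linkage and return a nested-parentheses tree.

    dist maps frozenset({a, b}) to the distance between a and b
    for every pair of names.
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        labels = sorted(clusters)
        # Pick the closest pair of current clusters (ties broken alphabetically).
        pair = min(
            (frozenset((a, b)) for i, a in enumerate(labels) for b in labels[i + 1:]),
            key=lambda p: (d[p], sorted(p)),
        )
        a, b = sorted(pair)
        merged = "(" + a + "," + b + ")"
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters)) + ";"


# Invented distances between four hypothetical species A-D.
toy_distances = {
    frozenset(("A", "B")): 0.1,
    frozenset(("A", "C")): 0.4,
    frozenset(("B", "C")): 0.3,
    frozenset(("A", "D")): 0.8,
    frozenset(("B", "D")): 0.8,
    frozenset(("C", "D")): 0.6,
}

tree = upgma(["A", "B", "C", "D"], toy_distances)
print(tree)  # (((A,B),C),D);
```

Here A and B join first because their distance is smallest, C joins that pair next, and D branches off earliest. As noted above for the lab's own trees, the grouping shows relatedness; it does not by itself give the time since divergence.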

Table 3: Accession Numbers of Species (to be used with NCBI site)

Species           Scientific Name            Cytochrome C Accession Number
Albacore Tuna     Thunnus alalunga           P81459
Bullfrog          Rana catesbeiana           ACO51922.1
Chicken           Gallus gallus              NP_001072946.1
Corn              Zea mays                   AFW81901.1
Domestic cow      Bos taurus                 NP_001039526.1
Domestic dog      Canis lupus familiaris     AEP27248.1
Fruit Fly         Drosophila melanogaster    AAA28437.1
Fungus            Candida albicans           AAB68996.1
Fungus            Neurospora crassa          AAA92156.1
Horse             Equus caballus             NP_001157486.1
Human             Homo sapiens               NP_061820.1
Potato            Solanum tuberosum          AFX66977.1
Snapping turtle   Chelydra serpentina        P00022.2
Thale Cress       Arabidopsis thaliana       AAB72175.1
Turkey            Meleagris gallopavo        P67882.2
Wheat             Triticum aestivum          P00068.1

Part 3. Discussion Questions.
Turn in written answers to these questions, along with your notes taken while completing the lab and your phylogenetic trees for all parts of the lab.
1. What have you learned about the origin of marine mammals? What do you find particularly interesting in the first phylogenetic tree? Explain in detail.
2. Why could hemoglobin beta protein be used to answer the question about the origin of marine mammals, but the second tree required the use of the cytochrome c protein? Which gene must have evolved more recently? Why?
3. Bats aren’t bugs! Can you help clear up Calvin’s misunderstanding? Choose five or six organisms and find out their scientific names. You may use some of the organisms listed above, if you wish. Check in NCBI to see if you can find the sequence for the protein you wish to study. (Does it make more sense to analyze hemoglobin sequences? Or cytochrome c? Hmmmm….) Follow the steps you used before to make a small phylogenetic tree that might help Calvin better understand classification of bats.

4. Does the phylogenetic tree based on cytochrome c protein seem consistent with morphological features of these different organisms? Discuss with your classmates and do some additional research if you are unclear about characteristics of specific organisms. Cite any online or print sources you use.
5. The chicken and the turkey are both birds and have the same sequence of amino acids in their cytochrome-c protein. Explain how two different species can have identical cytochrome-c and still be different species.
6. If the molecular data provides the complete “instruction manual” for an organism, why are structural similarities among living organisms and in the fossil record still important in determining relationships between species?
7. One of the uses of comparative evolution is in epidemiology, tracing the source of an infectious agent. In the early 1990s a young woman in Florida died of AIDS, even though she had no known risk factors for HIV infection. Comparative analysis of the gene sequences for the HIV-1 outer-envelope protein from the woman, from the woman’s dentist, from other patients, and from a member of the local community (as a control) determined that she had been infected by her dentist, who may have also infected other patients during invasive dental procedures.
a. What are some of the ethical and legal aspects involved in a case such as this?
b. Do you think it is always important to trace the origin of transmitted diseases? Why or why not?
c. HIV is a virus that rapidly evolves. Would this make it easier or more difficult to maintain an accurate phylogeny of HIV strains in a population?
d. How could sequence alignment be used to help study the global spread of influenza virus, which often includes new strains, each year?