Genome Sequencing Project – Up Close and Personal

Definition
Genome sequencing projects are scientific endeavours that ultimately aim to determine the complete genome sequence of an organism (an animal, a plant, a fungus, a bacterium, an archaeon, a protist or a virus). The genome sequence for any organism requires the DNA sequences for each of the chromosomes in an organism to be determined. For bacteria, which usually have just one chromosome, a genome project will aim to map the sequence of that chromosome. Humans, with 22 pairs of autosomes and 2 sex chromosomes, will require 24 separate chromosome sequences in order to represent the completed genome.

Background
The sequencing of the human genome along with related organisms represents one of the largest scientific endeavours in the history of mankind. The information gathered from sequencing will provide the raw data for the exploding field of bioinformatics, where computer science and biology live in symbiotic harmony. The classic method of determining the sequence of DNA is known as Sanger sequencing, after its pioneer, Frederick Sanger. This technique involves the separation of fluorescently labelled DNA fragments according to their length on a polyacrylamide gel (polyacrylamide gel electrophoresis, PAGE). The base at the end of each fragment can then be visualized and identified by the fluorescent dye with which it is labelled. The time- and labour-intensive nature of gel preparation and running, as well as the large amounts of sample required, increase the time and costs of genomic sequencing. These conditions drastically reduce the efficiency of sequencing projects, ultimately limiting researchers in their sequencing attempts.[1]
Frederick Sanger – the man behind "shotgun sequencing"
[1] Encyclopedia of Medical Genomics and Proteomics. Jürgen Fuchs, Maurizio Podda. 2004. CRC Press. London, UK.
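The gel-reading step of Sanger sequencing described above can be sketched in a few lines of Python. This is a minimal, idealized sketch: it assumes every fragment ends in a single dye-labelled terminating base, that no bands are missing, and it uses a hypothetical dye-to-base colour mapping (real instruments define their own dye chemistry). The function name `read_ladder` is likewise our own.

```python
# Hypothetical dye-to-base mapping; real sequencers use their own dye sets.
DYE_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}


def read_ladder(fragments):
    """Read a sequence from (fragment_length, dye) pairs.

    Sorting by length mimics the gel: the shortest fragment ends at the
    first base after the primer, the next one at the second base, and so on.
    """
    if not fragments:
        return ""
    ordered = sorted(fragments, key=lambda frag: frag[0])
    lengths = [length for length, _ in ordered]
    # Idealized assumption: exactly one band per position, no gaps or duplicates.
    if lengths != list(range(lengths[0], lengths[0] + len(lengths))):
        raise ValueError("fragment lengths must be consecutive")
    return "".join(DYE_TO_BASE[dye] for _, dye in ordered)


if __name__ == "__main__":
    bands = [(3, "red"), (1, "yellow"), (4, "green"), (2, "blue")]
    print(read_ladder(bands))  # GCTA
```

Sorting by length stands in for electrophoresis; everything chemical (dye incorporation, band detection) is abstracted away.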
The success with viral genome sequencing stemmed from the relatively small length of their genetic codes. Bacteriophage φX174 was the first genome to be sequenced, a viral genome of only 5,386 nucleotides. Frederick Sanger, in another revolutionary discovery, invented the method of "shotgun" sequencing, a strategy based on breaking the genome into random pieces of DNA that are cloned and sequenced individually. The sequenced pieces are then assembled by their overlapping regions to form contiguous sequences (otherwise known as contigs). The final step involved the use of custom primers to close the gaps between the contigs, thus giving the completely sequenced genome. Sanger first used "shotgun" sequencing five years later to complete the sequence of bacteriophage λ, which was significantly larger: 48,502 base pairs (bp). This method allowed sequencing projects to proceed at a much faster rate, thus expanding the scope of realistic sequencing ventures. Since then several other viral and organellar genomes have been sequenced using similar techniques, such as the 229 kb genome of cytomegalovirus (CMV), the 192 kb genome of vaccinia, the 187 kb mitochondrial and the 121 kb chloroplast genomes of Marchantia polymorpha, and the 186 kb genome of smallpox.

In 1989, André Goffeau set up a European consortium to sequence the genome of the budding yeast Saccharomyces cerevisiae (12.5 Mb). At the time, the sequencing of model organisms such as S. cerevisiae appeared to be the logical step towards the eventual characterization of the human genome, a task that seemed beyond the scope of technology due to its tremendous size of 3000 Mb. Sequencing smaller genomes would highlight the problems with sequencing techniques, eventually refining the technology to be used on large-scale projects like H. sapiens. In addition, valuable insight concerning these organisms would be gained with the elucidation of their genetic makeup. The S. cerevisiae sequence was approximately 60 times larger than any sequence previously attempted, which explains why Goffeau felt compelled to invite the cooperation of a group of laboratories. Goffeau's European collaboration involved 74 different laboratories, drawn to the project in hopes of sequencing the homologs of their favourite genes.[2] Most laboratories used Sanger's "shotgun" method of sequencing, which had become the accepted standard for genome sequencing.

[2] Review of the Department of Energy's Genomics:GTL Program. Committee on Review of the Department of Energy's Genomics:GTL Program, National Research Council (U.S.). 2006. National Academies Press. New York.
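The shotgun idea can be illustrated with a toy simulation. The sketch below is a simplification under stated assumptions: fragments are sampled uniformly at random from a known sequence, and because the simulation records each fragment's true position, contigs are found simply by merging overlapping intervals. A real project does not know these positions and must infer them from sequence overlaps. The function names `shotgun_fragments` and `contig_spans` are hypothetical.

```python
import random


def shotgun_fragments(genome, n_fragments, min_len, max_len, seed=0):
    """Sample random (start, fragment) pairs from genome, like random shearing."""
    if max_len > len(genome):
        raise ValueError("fragments cannot be longer than the genome")
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_fragments):
        length = rng.randint(min_len, max_len)
        start = rng.randint(0, len(genome) - length)
        fragments.append((start, genome[start:start + length]))
    return fragments


def contig_spans(fragments):
    """Merge overlapping fragments into contig spans using their true positions.

    Toy model only: returns (start, end) pairs with end exclusive. The
    stretches between consecutive spans are the gaps that custom primers
    would have to close.
    """
    spans = []
    for start, fragment in sorted(fragments):
        end = start + len(fragment)
        if spans and start < spans[-1][1]:
            spans[-1][1] = max(spans[-1][1], end)  # extends the current contig
        else:
            spans.append([start, end])  # no overlap: a new contig begins
    return [tuple(span) for span in spans]


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(2000))
    frags = shotgun_fragments(genome, n_fragments=30, min_len=80, max_len=120)
    print(contig_spans(frags))
```

Running it with more fragments typically yields fewer, longer spans, which is the intuition behind sequencing to high coverage.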
The following year saw the initiation of a plethora of ambitious sequencing proposals, the foremost being the introduction of the Human Genome Project in 1990. The U.S. Human Genome Project (HGP) is a joint effort of the Department of Energy and the National Institutes of Health that was designed as a three-step program to produce genetic maps, physical maps, and finally the complete nucleotide sequence map of the human chromosomes. The first two aims of the project are practically fulfilled, and now the majority of work is concentrated on the exact nucleotide sequence of the human genome.

In the wake of this pronouncement came the start of three projects aimed at elucidating the sequences of smaller model organisms, namely Escherichia coli, Mycoplasma capricolum, and Caenorhabditis elegans. It was hoped that these projects would increase the efficiency of sequencing, but unfortunately they fell short of this task. Many anticipated that E. coli would be the first genome to be sequenced entirely but, to the shock of the scientific community, an outsider won the race for the first complete genome sequence of a free-living organism: Haemophilus influenzae.

A team headed by J. Craig Venter from The Institute for Genomic Research (TIGR) and Nobel laureate Hamilton Smith of Johns Hopkins University sequenced the 1.8 Mb H. influenzae genome. In conventional sequencing, the genome is broken down laboriously into ordered, overlapping segments, each containing up to 40 kb of DNA. These segments are "shotgunned" into smaller pieces and then sequenced to reconstruct the genome. Venter's team used a more comprehensive approach, "shotgunning" the entire 1.8 Mb bacterium and relying on new computational methods developed at TIGR's facility in Gaithersburg, Maryland. Previous sequencing projects had been limited by the lack of adequate computational approaches to assemble the large amount of random sequences produced by "shotgun" sequencing. Until then, such an approach would have failed because the software did not exist to assemble such a massive amount of information accurately. Software developed by TIGR, called the TIGR Assembler, was up to the task, reassembling the approximately 24,000 DNA fragments into the whole genome. After the H. influenzae genome was "shotgunned" and the clones purified sufficiently, the TIGR Assembler required approximately 30 hours of central processing unit (CPU) time on a SPARCenter 2000 containing half a gigabyte of RAM, testifying to the enormous complexity of the computation.

TIGR's dramatic leadership role in the field of genome sequencing was paralleled by the final completion of two of the largest genomic sequences, those of the bacterium E. coli K-12 and the yeast S. cerevisiae, in 1997. These projects were the culmination of over seven years of intensive work. The yeast genome was the final result of a tremendous international collaboration of more than 600 scientists from over 100 laboratories, representing the largest decentralised experiment in modern molecular biology. In an incredible display of organizational mastery, only 3.4% of the total sequencing effort was duplicated among laboratories. The final work represented efforts of scientists from Japan, Europe, Canada, and the United States, producing the largest full-length sequence (12 Mb) ever done. The E. coli sequence was considerably smaller (4.6 Mb) but, like S. cerevisiae, equally important in terms of experimental and academic utility. E. coli is the preferred model in biochemical genetics, molecular biology, and biotechnology, and its genomic characterization will undoubtedly further research toward a more complete understanding of this important experimental, medical, and industrial organism.

As of September 1997, thirteen genome sequences of free-living organisms had been completed, including the two largest, E. coli and yeast, and eleven other microbial genomes under 4.2 Mb in length. Four other large-scale projects are in progress, including the sequencing of the nematode C. elegans, which is 71% completed (finished: 1998); the fruit fly Drosophila melanogaster, which is 6% completed (finished: 2006); the mouse, which has less than 1% finished (December 2007: only 20%); and the human, which is only 1.5% completed (current: 92%).

The rapid proliferation of biological information in the form of genome sequences has been the major factor in the creation of the field of bioinformatics, which focuses on the acquisition, storage, access, analysis, modelling, and distribution of the many types of information embedded in DNA sequences. This field will be challenged by the heavy demands that ever-increasing amounts of information place on the algorithms currently used for sequence manipulation. The growing sequence knowledge of the human genome has been likened to the establishment of the periodic table in the 19th century. Just as past chemists systematically organized all elements in an array that captured their differences and similarities, the Human Genome Project will allow modern scientists to construct a biological periodic table relating units of nucleotides. This periodic table will not contain 100 elements but 100,000 genes, reflecting not their similarity in electronic configuration but their evolutionary and functional relationship. Bioinformatics will be the tool of the modern scientist in interpreting this periodic table of biological information.
Genome Assembly
Genome assembly refers to the process of taking a large number of short DNA sequences, all of which were generated by a shotgun sequencing project, and putting them back together to create a representation of the original chromosomes from which the DNA originated.[3] In a shotgun sequencing project, the entire DNA from a source (usually a single organism, anything from a bacterium to a mammal) is first fractured into millions of small pieces. These pieces are then "read" by automated sequencing machines, which can read up to 900 nucleotides or bases at a time. (The four bases are adenine, cytosine, guanine, and thymine, represented by the letters A, C, G and T.) A genome assembly algorithm works by taking all the pieces, or reads, aligning them to one another, and detecting all places where two of the short sequences overlap. These overlapping reads can be merged together, and the process continues.[4]

Original DNA is broken into a collection of fragments
The ends of each fragment (drawn in green) are sequenced

[3] http://www.cbcb.umd.edu/research/assembly_primer.shtml (111008)
[4] http://bioinfo.mbb.yale.edu/course/projects/final-4/ (111008)
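The overlap-and-merge loop described above can be sketched as a greedy assembler. This is a toy illustration, not the algorithm of any particular program: it assumes error-free reads from a single strand, requires exact overlaps of at least `min_len` bases, and repeatedly merges the pair of reads with the longest overlap. The names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b, or 0.

    Overlaps shorter than min_len are ignored as likely to occur by chance.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Greedily merge the pair of reads with the longest overlap until none remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Reads contained inside longer reads add no new sequence.
    reads = [r for r in reads if not any(r != other and r in other for other in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the result is several separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
    print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is easy to follow but can be misled by repeated sequence, one of the assembly challenges this document discusses; production assemblers use more robust strategies.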
The sequence reads are assembled together based on sequence similarity

Assembly Statistics
The assembler relies on the basic assumption that two sequence reads (two strings of letters produced by the sequencing machine) that share the same string of letters originated from the same place in the genome (see figure above).[5] Using such overlaps between the sequences, the assembler can join the sequences together in a manner similar to solving a jigsaw puzzle. It is important to note that the shotgun sequencing process is inherently "wasteful" as, due to the randomness of the shearing process, assembly is only possible once enough sequences are generated to cover the genome 8 to 10 times. Intuitively, this phenomenon can be understood by thinking of a sidewalk as it begins to rain. As raindrops fall randomly across the sidewalk, dry spots persist for quite a while, corresponding to regions of the genome that are not represented in the set of shotgun reads. Mathematically, this phenomenon was modelled by Eric Lander and Michael Waterman in 1988. They examined the correlation between the oversampling of the genome (coverage) and the number of contiguous pieces of DNA (contigs) that can be reconstructed by an idealized assembly program. The graph below shows a plot of the Lander-Waterman equation for a genome of 1 Mbp (1,000,000 base pairs). Between 8- and 10-fold coverage the model predicts that most of the genome will be assembled into a small number of contigs (approx. 5 for a 1 Mbp genome).

[5] Bioinformatics for Geneticists: A Bioinformatics Primer for the Analysis of Genetic Data. Michael R. Barnes. 2007. John Wiley and Sons. London, UK.
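The simplest form of the Lander-Waterman model can be evaluated directly. For a genome of G bases sequenced with N reads of length L, the coverage is c = NL/G; ignoring the minimum overlap an assembler needs to join two reads, the model predicts about N·e^(-c) contigs and an expected fraction e^(-c) of bases covered by no read at all. The read length of 500 bases used below is our assumption (the text does not state the read length behind its plot); with it, 8-fold coverage of a 1 Mbp genome gives roughly 5 contigs, in line with the figure quoted above.

```python
import math


def lander_waterman(genome_len, read_len, coverage):
    """Idealized Lander-Waterman estimates for a shotgun project.

    Returns (number of reads, expected number of contigs, expected
    fraction of bases not covered by any read). The minimum detectable
    overlap is taken to be zero, which is a simplifying assumption.
    """
    n_reads = coverage * genome_len / read_len
    uncovered = math.exp(-coverage)  # Poisson probability of zero reads at a base
    expected_contigs = n_reads * uncovered
    return n_reads, expected_contigs, uncovered


if __name__ == "__main__":
    for c in (1, 2, 4, 6, 8, 10):
        n, contigs, gap = lander_waterman(1_000_000, 500, c)
        print(f"{c:>2}x coverage: {n:>6.0f} reads, ~{contigs:7.1f} contigs, "
              f"uncovered fraction ~{gap:.1e}")
```

Under this model the expected contig count peaks at 1-fold coverage and then falls off exponentially, which is why projects aim for 8- to 10-fold coverage rather than stopping at 1- or 2-fold.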
Lander-Waterman estimation of number of contigs w.r.t. genome coverage

Assembly Challenges[6]
Ideally, an assembly program should produce one contig for every chromosome of the genome being sequenced. In all but the simplest cases, however, many contigs are produced due to a combination of factors. Even at 8-10-fold coverage, there is a non-zero probability that some portion of the genome remains unsequenced. More important, however, is the fact that the distribution of the sheared fragments along the genome cannot be modelled as a perfect Poisson process. Sanger sequencing requires many copies of each fragment in order for the sequencing chemistry to be possible. Each shotgun fragment must therefore be cloned, a procedure usually performed by inserting the fragment into a cloning vector, introducing it into a cell of the Escherichia coli bacterium and allowing this bacterium to grow, thereby replicating the fragment as E. coli replicates its own genome. In most genomes, certain regions are toxic to the E. coli bacterium, leading to the presence of gaps in the coverage.

[6] http://www.cbcb.umd.edu/research/assembly_primer.shtml (111008)
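Both effects can be seen in a small Monte Carlo sketch. It places reads uniformly at random (the idealized, Poisson-like case) and measures the fraction of bases that no read covers; an optional hypothetical `unclonable` region, in which any overlapping fragment is discarded, is a crude stand-in for sequence that is toxic to E. coli. Genome size, read length and coverage are illustrative values chosen by us, not taken from the text.

```python
import random


def uncovered_fraction(genome_len, read_len, coverage, seed=1, unclonable=None):
    """Fraction of bases left uncovered by randomly placed reads.

    unclonable: optional (start, end) region; fragments overlapping it are
    dropped, mimicking DNA that cannot be propagated in E. coli.
    """
    rng = random.Random(seed)
    covered = bytearray(genome_len)  # 0 = not yet sequenced, 1 = covered
    n_reads = int(coverage * genome_len / read_len)
    for _ in range(n_reads):
        start = rng.randrange(genome_len - read_len + 1)
        end = start + read_len
        if unclonable and start < unclonable[1] and end > unclonable[0]:
            continue  # this fragment never makes it into a clone
        covered[start:end] = b"\x01" * read_len
    return 1 - sum(covered) / genome_len


if __name__ == "__main__":
    g, L, c = 1_000_000, 500, 3
    print(f"random placement only: {uncovered_fraction(g, L, c):.4f}")
    print("with a 20 kb unclonable region: "
          f"{uncovered_fraction(g, L, c, unclonable=(400_000, 420_000)):.4f}")
```

With 3-fold coverage the first figure should come out close to e^(-3), about 0.05, while the unclonable region adds a gap of its own that no amount of extra coverage will close in this model.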
AMOS (A Modular. who are now at the University of Maryland. Celera Assembler was a key element in the successful assembly of the human genome by Celera Genomics and is currently used in numerous bacterial and eukaryotic projects.assembly program developed at Celera Genomics.assembly program developed at the University of Washington. AMOS was initiated at The Institute for Genomic Research by Steven Salzberg. used throughout the years in the assembly of many bacterial and eukaryotic genomes. this has changed as the software has grown more complex and as the number of sequencing centres has increased. 2. and Art Delcher. most large-scale DNA sequencing centres developed their own software for assembling the sequences that they produced.Genome Sequencing Project The ability of an assembly program to produce a single contig is also limited by regions of the genome that occur in multiple near-identical copies throughout the genome (repeats). The assembly program incorrectly combined the reads from the two copies of the repeat leading to the creation of two separate contigs Assembly software Originally. Genome misassembled due to a repeat. Open-Source assembler)7 is a well-known open source effort to bring together the efforts of leading genome assembly code developers. Among the list of available assemblers are: 1. The Celera Assembler . phrap is one of the most widely used assembly programs. Despite its age. Most notably. phrap was the main workhorse in the public effort to sequence the human genome. The reads originating from different copies of a repeat appear identical to the assembler and cause assembly errors. 7 http://amos. The reads coloured in red and those coloured in yellow appear identical to the assembly program. Celera Assembler demonstrated the applicability of the shotgun method to the assembly of a whole eukaryotic genome by successfully assembling the genome of the fruit fly Drosophila melanogaster. 3. However. Phrap .net/ (111008) 8 . 
A simple example: Two copies of a repeat along a genome. Mihai Pop.sourceforge.
This assembler was used to generate the first sequence of a free living organism Haemophilus influenzae. 9 .assembly program developed at the Institute for Genomic Research (TIGR). The Arachne .program developed at the Broad Institute of MIT. TIGR Assembler . Arachne and Celera Assembler are arguably the best assemblers available to the scientific community for the assembly of large eukaryotic genomes.Genome Sequencing Project 4. widely used in genome projects both at the Broad Institute and other research organizations. accomplishment reported in the journal Science in 1995. 5.
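Returning to the repeat problem: one simple way to see where repeats would confuse an assembler is to count k-mers (substrings of length k) and report those that occur at more than one position, since reads falling entirely inside such a region look identical. The sketch below is an illustrative assumption, not how any of the assemblers listed above actually detects repeats, and the toy genome is made up for the example.

```python
from collections import defaultdict


def repeated_kmers(genome, k):
    """Return {k-mer: [start positions]} for every k-mer seen more than once."""
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


if __name__ == "__main__":
    repeat = "GGCATTCA"  # made-up repeat copied twice into a toy genome
    genome = "ACTGA" + repeat + "TCCAG" + repeat + "CGTAT"
    for kmer, starts in sorted(repeated_kmers(genome, 6).items()):
        print(kmer, starts)
```

Any read shorter than the repeat and lying wholly inside one of its copies cannot be placed uniquely, which is how the misassembly in the figure above arises.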
Genome Annotation
Genome annotation is the process of attaching biological information to sequences. It consists of two main steps:
1. identifying elements on the genome, a process called gene finding, and
2. attaching biological information to these elements.
Automatic annotation tools try to perform all this by computer analysis, as opposed to manual annotation (also called curation), which involves human expertise. Ideally, these approaches co-exist and complement each other in the same annotation pipeline. The basic level of annotation is using BLAST for finding similarities, and then annotating genomes based on that. However, nowadays more and more additional information is added to the annotation platform. The additional information allows manual annotators to resolve discrepancies between genes that are given the same annotation. For example, the SEED database uses genome context information, similarity scores, experimental data, and integrations of other resources to provide the most accurate genome annotations through their Subsystems approach. The Ensembl database relies on both curated data sources and a range of different software tools in their automated genome annotation pipeline.[8]

Structural annotation consists of the identification of genomic elements:
• open reading frames (ORFs), portions of an organism's genome which contain a sequence of bases that could potentially encode a protein, and their localisation
• gene structure
• coding regions
• location of regulatory motifs

Functional annotation consists of attaching biological information to genomic elements:
• biochemical function
• biological function
• involved regulation and interactions
• expression

These steps may involve both biological experiments and in silico (performed on computer or via computer simulation) analysis.

Genome annotation is the next major challenge for the Human Genome Project, now that the genome sequences of human and several model organisms are largely complete. Identifying the locations of genes and other genetic control elements is often described as defining the biological "parts list" for the assembly and normal operation of an organism. Scientists are still at an early stage in the process of delineating this parts list and in understanding how all the parts "fit together". Genome annotation is an active area of investigation and involves a number of different organizations in the life science community which publish the results of their efforts in publicly available biological databases accessible via the web and other electronic means. A variety of software tools have been developed to permit scientists to view and share genome annotations.[9]

[8] http://www.seedling.org/IJDC/DB/ (121008)
[9] http://insilico.ehu.es/ (121008)
Examples of publicly available genome annotation resources include:
1. Encyclopedia of DNA Elements (ENCODE) - aims to identify all functional elements in the human genome sequence. The pilot phase of the project is focused on a specified 30 megabases (about 1%) of the human genome sequence and is organized as an international consortium of computational and laboratory-based scientists working to develop and apply high-throughput approaches for detecting all sequence elements that confer biological function. The results of this pilot phase will guide future efforts to analyze the entire human genome.
2. Ensembl - joint project between EMBL-EBI and the Sanger Institute to develop a software system which produces and maintains automatic annotation on selected eukaryotic genomes.
3. Gene Ontology Consortium - collaborative effort to address the need for consistent descriptions of gene products in different databases. Since 1998, the GO Consortium has grown to include many databases, including several of the world's major repositories for plant, animal and microbial genomes.
4. RefSeq - non-redundant collection of richly annotated DNA, RNA, and protein sequences from diverse taxa. Each RefSeq represents a single, naturally occurring molecule from one organism. The collection includes sequences from plasmids, organelles, viruses, bacteria, archaea, and eukaryotes.
5. UniProt - aims to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information.
6. Vertebrate Genome Annotation (Vega) - central repository for high-quality, frequently updated, manual annotation of vertebrate finished genome sequence.
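Structural annotation starts from the simplest signal listed in the annotation section, the open reading frame. The sketch below scans the three forward reading frames for an ATG start codon followed in-frame by a stop codon (TAA, TAG or TGA). It is a deliberately simplified example resting on stated assumptions: it ignores the reverse strand, alternative start codons and introns, and real gene finders add statistical models on top of such scans. The name `find_orfs` and the `min_codons` threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based; end is exclusive and includes the stop codon.
    min_codons counts codons from the ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
            elif codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ATG in this frame
    return orfs


if __name__ == "__main__":
    dna = "CCATGAAATTTTAGGG"
    for frame, start, end in find_orfs(dna):
        print(frame, start, end, dna[start:end])  # 2 2 14 ATGAAATTTTAG
```

Extending the sketch to the reverse strand would mean running the same scan on the reverse complement and converting the coordinates back.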
Plant Genome Sequencing Project – Rice
Rice is a cereal foodstuff which forms an important part of the diet of many people worldwide, and as such it is a staple food for many. There are two species of domesticated rice in the Poaceae ("true grass") family:
• Oryza sativa: native to tropical and subtropical southern Asia
• Oryza glaberrima (African rice): native to West Africa
There is also another type of rice other than domesticated rice: wild rice. The term is usually used for species of the different but related genus Zizania, both wild and domesticated, although it may also be used for primitive or uncultivated varieties of Oryza.
Rice is the staple food for a large part of the human population, making it the second-most consumed cereal grain, especially in Latin America, East Asia, South Asia, Southeast Asia, and Africa. It provides more than one fifth of the calories consumed worldwide by humans. In early 2008, some governments and retailers began rationing supplies of the grain due to fears of a global rice shortage.
Potential of rice:
• Improve nutrition
• Boost food security
• Foster rural development
• Support sustainable land care

Rice's life[10]
Rice is grown as a monocarpic annual plant, although in tropical areas it can survive as a perennial, can produce a ratoon crop and can survive for up to 20 years. It can grow to 1-1.8 m tall, occasionally more depending on the variety and soil fertility. It has long, slender leaves 50-100 cm long and 2-2.5 cm broad. The small wind-pollinated flowers are produced in a branched, arching to pendulous inflorescence 30-50 cm long. The edible seed is a grain 5-12 mm long and 2-3 mm thick.

[10] Cereal Genomics. P. K. Gupta, Rajeev K. Varshney. 2004. Springer. London, UK.
Rice cultivation is well-suited to countries and regions with low labour costs and high rainfall, as it is very labour-intensive to cultivate and requires plenty of water. Rice can be grown practically anywhere, even on a steep hill or mountain, although its species are native to South Asia and certain parts of Africa.

How rice is cultivated
The fields are flooded while, or after, the young seedlings are set. This method requires sound planning and servicing of the water damming and channelling, but it reduces the growth of less robust weed and pest plants that have no submerged growth state, and it deters vermin. While flooding is not mandatory for growing rice, all other methods of irrigation require higher effort in weed and pest control during growth periods and a different approach for fertilizing the soil.

Genetic History
As we know, two species of rice were domesticated, O. sativa and O. glaberrima. According to Londo and Chiang, O. sativa appears to have been domesticated from the crop wild relative Oryza rufipogon around the foothills of the Himalayas, with O. sativa var. indica on the Indian side and O. sativa var. japonica on the Chinese and Japanese side. Current genetic analysis suggests that O. sativa can best be divided into five groups: indica, aus, aromatic, temperate japonica and tropical japonica. Other studies have suggested that there are three groups of Oryza sativa cultivars: the short-grained "japonica" or "sinica" varieties, exemplified by Japanese rice; the long-grained "indica" varieties, exemplified by Basmati rice; and the broad-grained "javanica" varieties, which thrive under tropical conditions. Further analysis of the genetic material of various types of rice indicates which cultivars emerged first; their genomes show significant differences in age.
(Figure: groups of O. sativa: indica, aus, aromatic, temperate japonica, tropical japonica)

Environmental Effect
In many countries where rice is the main cereal crop, rice cultivation is responsible for most of the methane emissions. Rice also requires much more water to produce than other grains. Farmers in some of the arid regions try to cultivate rice using groundwater bored through pumps, thus increasing the chances of famine in the long run. On the other hand, mechanized cultivation is extremely oil-intensive, more than other food products with the exception of beef and dairy products. As sea levels rise, rice will become more inclined to remain flooded for longer periods of time.[11] Longer stays in water cut the soil off from atmospheric oxygen and cause fermentation of organic matter in the soil. During the wet season, rice cannot hold the carbon in anaerobic conditions. The microbes in the soil convert the carbon into methane, which is then released through the respiration of the rice plant or through diffusion through the water. Methane is twenty times more effective as a greenhouse gas than carbon dioxide. A further rise in sea level of 10-85 centimetres would then stimulate the release of more methane into the air by rice plants.

[11] http://www.fao.org/rice2004/en/rice4.htm (121008)

Pests and Disease
Rice pests are any organisms or microbes with the potential to reduce the yield or value of the rice crop. Rice pests include weeds, pathogens, insects, rodents and birds. A variety of factors can contribute to pest outbreaks, including the overuse of pesticides and high rates of nitrogen fertilizer application. Weather conditions also contribute to pest outbreaks. Rice pests are managed by cultural techniques, pest-resistant rice varieties and pesticides. Increasingly, there is evidence that farmers' pesticide applications are often unnecessary. By reducing the populations of natural enemies of rice pests, misuse of insecticides can actually lead to pest outbreaks.[12] Botanicals, so-called "natural pesticides", are used by some farmers in an attempt to control rice pests, but in general the practice is not common. Upland rice is grown without standing water in the field. Some upland rice farmers in Cambodia spread chopped leaves of the bitter bush over the surface of fields after planting. The practice probably helps the soil retain moisture and thereby facilitates seed germination. Farmers also claim the leaves are a natural fertilizer and help suppress weed and insect infestations.

One of the challenges facing crop protection specialists is to develop rice pest management techniques which are sustainable; in other words, to manage crop pests in such a manner that future crop production is not threatened.

Among rice cultivars there are differences in the responses to, and recovery from, pest damage. Therefore, particular cultivars are recommended for areas prone to certain pest problems. The genetically based ability of a rice variety to withstand pest attack is called resistance. Three main types of plant resistance to pests are recognized:
• Nonpreference: host plants which insects prefer to avoid
• Antibiosis: where insect survival is reduced after the ingestion of host tissues
• Tolerance: the capacity of a plant to produce high yield or retain high quality despite insect infestation
Over time, the use of pest-resistant rice varieties selects for pests that are able to overcome these mechanisms of resistance. When a rice variety is no longer able to resist pest infestations, resistance is said to have broken down. Rice varieties that can be widely grown for many years in the presence of pests, and retain their ability to withstand the pests, are said to have durable resistance.

Major rice pests include:
• The brown planthopper
• Armyworms
• The green leafhopper
• The rice gall midge
• The rice bug
• Hispa
• The rice leaffolder
• Stemborer
• Rats
• The weed Echinochloa crus-galli
Rice weevils are also known to be a threat to rice crops in the US, PR China and Taiwan.

[12] Pest Management of Rice Farmers in Asia. Kong Luen Heong, M. M. Escalada. 1997. International Rice Research Institute. Manila, Philippines.
Major rice diseases:
• Rice ragged stunt
• Sheath blight
• Tungro
• Rice blast, caused by the fungus Magnaporthe grisea, which is the most significant disease affecting rice cultivation

Cultivars
While most breeding of rice is carried out for crop quality and productivity, there are varieties selected for other reasons. Cultivars exist that are adapted to deep flooding, and these are generally called "floating rice". The largest collection of rice cultivars is at the International Rice Research Institute (IRRI), with over 100,000 rice accessions held in the International Rice Genebank.[13] Rice cultivars are often classified by their grain shapes and texture. For example, Thai Jasmine rice is long-grain and relatively less sticky, as long-grain rice contains less amylopectin than short-grain cultivars. Chinese restaurants usually serve long-grain rice as plain unseasoned steamed rice. Japanese mochi rice and Chinese sticky rice are short-grain; Chinese people use sticky rice, which is properly known as "glutinous rice", to make zongzi. Japanese table rice is a sticky, short-grain rice, and Japanese sake rice is another kind as well.
• Indian rice: long-grained and aromatic Basmati; Patna rice, long and medium-grained; Sona masoori, short-grained; Ponni, grown in the delta regions of the Kaveri River; Ambemohar, with the fragrance of mango blossom
• Aromatic rices have definite aromas and flavours: a) Thai fragrant rice b) Patna rice c) Basmati, which has a mild popcorn-like aroma and flavour d) Texmati

Biotechnology
High Yielding Varieties
The high yielding varieties are a group of crops created intentionally during the Green Revolution to increase global food production. Rice, like corn and wheat, was genetically manipulated to increase its yield. This project enabled labour markets in Asia to shift away from agriculture and into industrial sectors.

[13] http://beta.irri.org/statistics/index.php?option=com_frontpage&Itemid=1 (131008)
The first "modern rice", IR8, was produced in 1966 at the International Rice Research Institute, which is based in the Philippines at the University of the Philippines' Los Baños site. IR8 was created through a cross between an Indonesian variety named "Peta" and a Chinese variety named "Dee Geo Woo Gen".

Potential for the Future
As the UN Millennium Development project seeks to spread global economic development to Africa, the "Green Revolution" is cited as the model for economic development. With the intent of replicating the successful Asian boom in agronomic productivity, groups like the Earth Institute are doing research on African agricultural systems, hoping to increase productivity. An important way this can happen is the production of "New Rices for Africa" (NERICA). These rices, selected to tolerate the low input and harsh growing conditions of African agriculture, are produced by the African Rice Center and billed as technology "from Africa, for Africa". The NERICA have appeared in The New York Times (October 10, 2007) and the International Herald Tribune (October 9, 2007), trumpeted as miracle crops that will dramatically increase rice yield in Africa and enable an economic resurgence.

Golden Rice
German and Swiss researchers have engineered rice to produce beta-carotene, with the intent that it might someday be used to treat vitamin A deficiency. The addition of the carotene turns the rice gold. Additional efforts are being made to improve the quantity and quality of other nutrients in golden rice.

Expression of Human Protein
Ventria Bioscience has genetically modified rice to express lactoferrin, lysozyme, and human serum albumin, which are proteins usually found in breast milk. These proteins have antiviral, antibacterial, and antifungal effects. Rice containing these added proteins can be used as a component in oral rehydration solutions, which are used to treat diarrheal diseases, thereby shortening their duration and reducing recurrence. Such supplements may also help reverse anemia.

Genome Project
Rice (Oryza sativa) is a model species for monocotyledonous plants, especially for members of the grass family. Several attributes, such as small genome size, diploid nature, transformability, and the establishment of genetic and molecular resources, make it a tractable organism for plant biologists. With an estimated genome size of 430 Mb, it is feasible to obtain the complete genome sequence of rice using current technologies. An international effort has been established and is in the process of sequencing O. sativa ssp. japonica var. "Nipponbare" using a bacterial artificial chromosome/P1 artificial chromosome (BAC/PAC) shotgun sequencing strategy. Annotation of the rice genome is performed using prediction-based and homology-based searches to identify genes. Annotation tools such as optimized gene prediction programs are being developed for rice to improve the quality of annotation. Resources are also being developed to leverage the rice genome sequence to partial genome projects such as expressed sequence tag projects, thereby maximizing the output from the rice genome project. To provide a low level of annotation for rice genomic sequences, all rice BAC/PAC sequences have been aligned with The Institute for Genomic Research Gene Indices, a set of nonredundant transcripts generated from nine public plant expressed sequence tag projects (rice, maize, barley, sorghum, wheat, Arabidopsis, potato, tomato, and barrel medic). In addition, data from The Institute for Genomic Research Gene Indices and the Arabidopsis and Rice Genome Projects was used to identify putative orthologues and paralogues among these nine genomes.

Rice is the first crop plant whose complete genetic sequence, or genome, has been compiled and placed in computer data banks around the world. It will be a key tool for researchers working on improved strains of rice and other grains as they struggle to stay ahead of human population growth. "You could equate this to being as important as the Human Genome Project," which recently compiled a human genetic map, said Rod Wing, a scientist at the University of Arizona who was a key participant in the rice project. "This is really a project that can lead to important discoveries and findings that can help the condition of the poor. The poorest of the poor are the ones that depend on rice the most."[14] The number of people in the world is expected to increase 50 percent, to 9 billion, by the middle of this century. Much of that growth will come in Asian countries where rice is the dietary staple. The new map will make it possible, in theory, to perform sophisticated genetic manipulations of the rice plant, including introducing genes from other species to create desirable traits. For example, one project introduced a daffodil gene into rice to turn the plant into a source of vitamin A, which it normally lacks. But that kind of work has been controversial, and how many countries will embrace it remains to be seen.

[14] Washington Post. 11 August 2005.
of St. The great cereals whose cultivation made human civilization possible -. China. Brazil and Britain. Louis and Syngenta AG of Basel. agriculture. Japan and other places.Genome Sequencing Project More important in the short term. That makes the cereals close genetic relatives. They have been embraced by U. It is a crucial model for understanding the biology of all cereals. While the map is an important achievement. Rice is the principal source of calories for about half the world's population.but. by giving scientists more precise knowledge of how the plants work.S. France.descended from a common ancestor. and its importance is rising rapidly in urban Africa. completion of the rice genome is expected to speed conventional breeding programs. Monsanto Co. and the United Nations Food and Agriculture Organization projects that demand will raise sharply in coming decades. Taiwan. where it is being embraced as easier to prepare than many traditional African foods. which for decades has funded research aimed at feeding the world. India. wheat and corn are the most important . but the plant is vitally important to them nonetheless. allowing researchers to produce rice strains that resist drought and disease and that grow in colder climates and at higher elevations. and rice. though the many other strains of rice . scientists are about to tackle the far larger genome of corn. brown rice. It is a map of the Nipponbare strain of white rice grown in Japan. an independent genetics laboratory founded by maverick scientist J. while the Rockefeller Foundation is funding work in the Philippines and other countries on strains that could yield enough even in drought years to keep a farm family from starving. are hot on the trail of genetic variations that might allow rice to grow in colder climates.S. researchers in Japan. using the new map. They need to learn to read the genetic messages and understand how the proteins in rice interact with one another. purple rice . 
Switzerland. Two Western agricultural companies. 1 . Building on their success with rice. farmers. Scientists now have a rice genome with but a few gaps. it may also help to reduce some of the theoretical risks that have led to controversy.rice. more abundant rice is seen as one of the keys to reducing hunger worldwide. basmati rice. Thailand. Korea. Rice is a minor component of most diets in the developed world but it supplies most daily calories for people in Asia who remain in poverty. with the smallest genome. the most important commodity in U. helped get the project off the ground. contributed genetic information that moved up completion of the project by at least a year. Those are critical needs as Asia's rapid urbanization reduces the land available for rice cultivation. but vociferously rejected by consumers in Europe. Seed rice is not a major product for companies like Monsanto and Syngenta. it also means that an immense new task opens before the world's plant biologists. It was led by scientists in Japan but involved teams from the United States. It is critically important to poor people in Latin America. It cost more than $100 million.red rice. scientists said.are expected to be similar. Availability of the rice genome will make such genetic manipulation easier in all the cereals . Craig Venter. proved to be the easiest to analyze. The Rockefeller Foundation of New York. a wild grass that lived more than 50 million years ago. which is likely to take decades. Companies like Syngenta and Monsanto have brought genetically modified strains of corn and other crops to market. A lot of the work was done in Rockville at the Institute for Genomic Research. Already. Cheaper. The International Rice Genome Sequencing Project began in 1998.
Genome Sequencing Project "Our work is not over. Access and Intellectual Property • Domination of world food production by a few companies. • Increasing dependence on industrialized nations by developing countries. Objections to consuming animal genes in plants and vice versa. United States) • Mixing GM crops with non-GM products confounds labeling attempts Society • New advances may be skewed to interests of rich countries 1 .. Stress for animal. Japan. "It's just starting. unknown effects on other organisms (e. Ethics • • • • Violation of natural organisms' intrinsic values. and loss of flora and fauna biodiversity.g. Tampering with nature by mixing genes among species.g. • Biopiracy or foreign exploitation of natural resources. unknown effects.. soil microbes)." Issues and Controversies in Plant Genome Project Safety • Potential human health impacts. including: unintended transfer of transgenes through cross-pollination. and principal leader of the rice genome project. Labeling • Not mandatory in some countries (e. including allergens. • Potential environmental impacts." said Takuji Sasaki. transfer of antibiotic resistance markers. vice president of the National Institute of Agrobiological Sciences in Tsukuba.
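The rice annotation strategy described in the Genome Project paragraph identifies putative orthologues and paralogues from the TIGR Gene Indices and genome-project data, but the text does not say how those calls were made. One widely used heuristic for putative orthologues is the reciprocal best hit (RBH): gene a in genome A and gene b in genome B are paired only if each is the other's highest-scoring match. The sketch below is a minimal illustration, not the rice project's method: it assumes pairwise similarity scores were already computed by some alignment tool, and the gene identifiers and scores are hypothetical.

```python
"""Reciprocal best hits (RBH): a common heuristic for putative orthologues.

Assumes pairwise similarity scores between genes of genome A and genome B
were computed beforehand (e.g. with an alignment tool). Gene IDs and scores
used in the demo are hypothetical.
"""
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (gene in A, gene in B) -> similarity score


def best_hits(scores: Scores, reverse: bool = False) -> Dict[str, str]:
    """Map each query gene to its highest-scoring partner in the other genome.

    reverse=False: queries are genome-A genes; reverse=True: genome-B genes.
    Ties keep the first pair encountered.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for (a, b), score in scores.items():
        query, target = (b, a) if reverse else (a, b)
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(scores: Scores) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    a_to_b = best_hits(scores)
    b_to_a = best_hits(scores, reverse=True)
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)


if __name__ == "__main__":
    demo: Scores = {
        ("Os1", "Zm1"): 950.0,  # hypothetical rice gene vs. maize gene
        ("Os1", "Zm2"): 400.0,
        ("Os2", "Zm2"): 870.0,
        ("Os3", "Zm2"): 880.0,
    }
    print(reciprocal_best_hits(demo))  # [('Os1', 'Zm1'), ('Os3', 'Zm2')]
```

In the demo, Os2 and Os3 both match Zm2 best, but only Os3 is Zm2's own best match, so Os2 is left unpaired. Genes left over in this way are sometimes examined further as candidate paralogues, which RBH alone cannot confirm.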
Animal Genome Sequencing Project – House Mouse
Introduction
Mus musculus may have originally been distributed from the Mediterranean region to China, but it has now been spread throughout the world by humans and lives as a human commensal. Because of their association with humans, house mice have been able to inhabit inhospitable areas (such as tundra and desert) which they would not be able to occupy independently.
House mice are from 65 to 95 mm long from the tip of the nose to the end of the body, and their tails are 60 to 105 mm long. They range from 12 to 30 g in weight. Their fur ranges in colour from light brown to black, and they generally have white or buffy bellies. Their long tails have very little fur and bear circular rows of scales (annulations). House mice tend to have longer tails and darker fur when living closely with humans. Many domestic forms of mice have been developed that vary in colour from white to black and with spots.

Behaviour
House mice generally live in close association with humans, in houses, barns, granaries, etc. They also occupy cultivated fields, fencerows, and even wooded areas, but they seldom stray far from buildings. Some individuals spend the summer in fields and move into barns and houses with the onset of cool autumn weather. In the wild state, house mice generally dwell in cracks in rocks or walls or make underground burrows consisting of a complex network of tunnels, several chambers for nesting and storage, and three or four exits. In human habitation, house mice nest behind rafters, in woodpiles, in storage areas, or in any hidden spot near a source of food. They construct nests from rags, paper, or other soft substances and line them with finer shredded material.
House mice are generally nocturnal, although some are active during the day in human dwellings. They are quick runners (up to 8 miles per hour), good climbers and jumpers, and also swim well. Despite this, they rarely travel more than 50 feet from their established homes.
Mus musculus is generally considered both territorial and colonial when living commensally with humans. Dominant males set up a territory including a family group of several females and their young. Occasionally, subordinate males may occupy a territory or males may share territories. Females establish a loose hierarchy within the territories but are far less aggressive than males. Aggression within family groups is rare, but all the individuals in a territory will defend an area against outsiders. Young mice are generally made to disperse through adult aggression, although some (especially females) may remain in the vicinity of their parents. Territoriality is not as pronounced in wild conditions.
House mice have a polygynous mating system, and Mus musculus is characterized by tremendous reproductive potential. Breeding occurs throughout the year, although wild mice may have a reproductive season extending only from April to September. The estrous cycle is 4-6 days long, with estrus lasting less than a day. Females generally have 5-10 litters per year if conditions are suitable, and they experience a postpartum estrus 12-18 hours after giving birth. Gestation is 19-21 days but may be extended by several days if the female is lactating. Litters consist of 3-12 (generally 5 or 6) offspring, but as many as 14 have been reported. The young, which are born naked and blind, are fully furred after 10 days, open their eyes at 14 days, are weaned at 3 weeks, and reach sexual maturity at 5-7 weeks. Young mice are cared for in their mother's nest until they reach 21 days old. Soon after this most young mice leave their mother's territory, though young females are more likely to stay nearby.
In the wild, most mice do not live beyond 12-18 months. The average life span in captivity, including as pets, is about 2 years. Wild-derived captive Mus musculus individuals have lived up to 4 years, mutant and calorie-restricted captive individuals as long as 5 years, and some individuals as long as 6 years.

Communication and Perception
House mice have excellent vision and hearing and a keen sense of smell, and they use their whiskers to feel air movements and surface textures. They use pheromones and other smells to communicate with each other about social dominance, family composition, and reproductive readiness. House mice often squeak to each other in the nest. It was recently discovered that male mice produce complex, ultrasonic songs in response to female sex pheromones, which suggests that this behaviour may be involved in mate choice.

Food Habits
In the wild, house mice eat many kinds of plant matter, such as seeds, leaves and stems, and fleshy roots. Insects (beetle larvae, caterpillars, and cockroaches) and meat (carrion) may be taken when available. Mus musculus consumes any human food that is accessible, as well as glue, soap, and other household materials. Many mice store their food or live within a human food storage facility.

Predation
House mice are eaten by a wide variety of small predators throughout the world, including cats, foxes, weasels, ferrets, mongooses, large lizards, snakes, hawks, falcons, and owls. House mice try to avoid predation by keeping out of the open and by being fast. They are also capable of reproducing very rapidly, which means that populations can recover quickly from predation.

Ecosystem Roles
Where house mice are abundant they can consume huge quantities of grains, making these foods unavailable to other (perhaps native) animals. House mice are also important prey items for many small predators. Mus musculus also has a small role as an insect destroyer, but this is minimal.

Economic Importance for Humans
Domesticated forms and albinos have been developed which are commonly used as laboratory animals (especially in medicine and genetics) and as pets.
House mice do not cause such serious health and economic problems as do Rattus norvegicus and Rattus rattus; however, they do consume and contaminate stored human food with their droppings. They also destroy woodwork, furniture, upholstery, and clothing, and they are agricultural pests in some areas. In addition, they contribute to the spread of diseases such as murine typhus, rickettsial pox, tularemia, food poisoning (Salmonella), and bubonic plague. Recent research has also shown that they carry a virus, the mouse mammary tumour virus (MMTV), that may contribute to breast cancer in humans15.
15 The Mouse in Animal Genetics and Breeding Research. Eugene J. Eisen (contributor). 2005. Imperial College Press, London.
As many as seven separate species may be placed under Mus musculus, and the name often refers to several fairly distinct kinds of mice, such as Mus domesticus (western European house mice) and Mus castaneus (southeastern Asian house mice). "Dancing" and "singing" mice are other names for house mice. The former refers to a genetic strain with inner ear defects, causing the mice to weave, turn in circles, and wobble when they walk. The latter refers to a pathological condition causing mice to twitter constantly with a "song" resembling that of a cricket.

Genome Project
Sequencing of the mouse genome was completed in late 2002. The haploid genome is about 3 billion bases long (3000 Mb distributed over 20 chromosomes), comparable in size to the human genome. Estimating the number of genes contained in the mouse genome is difficult, in part because the definition of a gene is still being debated and extended. The current estimated gene count is 23,786; this estimate takes into account knowledge of molecular biology as well as comparative genomic data. For comparison, humans are estimated to have 23,686 genes.
Researchers report that approximately 99 percent of mouse genes have counterparts in humans; virtually every gene in the mouse is also present in humans. The genes in humans and mice are essentially the same genes, and the neighbourhoods in which these genes reside are strikingly similar in the two species, although the mouse genome is about fourteen percent smaller than the human genome. These genes were inherited from a common mammalian ancestor millions of years ago. However, evolution changes genomes through the duplication and specialisation of genes, so although many human and mouse genes appear to be similar, they may have taken on slightly different roles, or be active at different times during the life of a person or a mouse. Humans and mice also share "nongene" regions that may regulate genes, and these could be critical to understanding why humans develop certain diseases16. Comparing humans and mice has the potential to reveal key features of mammalian biology, and more insights will emerge as more genomes are completed17.
The mouse genome is essentially a reference manual for understanding the human genome. Researchers state that having a publicly available mouse genome sequence draft means we can move from knowing that a general region of the genome is contributing to a disease state or biological process to actually looking at that region and seeing directly what genes are there. It will save investigators months, if not years, of gene-hunting effort.
16 Nature 420(6915):520-562 (2002).
17 Digital Code of Life: How Bioinformatics is Revolutionizing Science, Medicine, and Business. Glyn Moody. 2004. John Wiley and Sons, New York.
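The "tremendous reproductive potential" of Mus musculus can be put into rough numbers using only the ranges quoted in the mouse section. This is a minimal sketch under stated assumptions: every litter is of typical size (5 or 6), a newborn female conceives as soon as she reaches maturity, and the extra gestation time during lactation is ignored. The results are rough illustrations, not measured values.

```python
"""Rough reproductive arithmetic for Mus musculus using the ranges in the text.

Illustrative only: assumes typical litter sizes and ignores the extra
gestation time when a female is lactating.
"""
from typing import Tuple

LITTERS_PER_YEAR: Tuple[int, int] = (5, 10)  # "5-10 litters per year if conditions are suitable"
TYPICAL_LITTER: Tuple[int, int] = (5, 6)     # "generally 5 or 6" offspring per litter
GESTATION_DAYS: Tuple[int, int] = (19, 21)   # gestation length
MATURITY_WEEKS: Tuple[int, int] = (5, 7)     # age at sexual maturity


def pups_per_female_per_year() -> Tuple[int, int]:
    """Low and high estimates of offspring produced by one female per year."""
    low = LITTERS_PER_YEAR[0] * TYPICAL_LITTER[0]
    high = LITTERS_PER_YEAR[1] * TYPICAL_LITTER[1]
    return low, high


def shortest_generation_days() -> int:
    """Earliest age (days) at which a newborn female could give birth herself:
    minimum age at maturity plus minimum gestation, ignoring mating delays."""
    return MATURITY_WEEKS[0] * 7 + GESTATION_DAYS[0]


if __name__ == "__main__":
    print("offspring per female per year:", pups_per_female_per_year())  # (25, 60)
    print("shortest generation (days):", shortest_generation_days())     # 54
```

With roughly 54 days from birth to a female's own first litter under these assumptions, several generations can fit within a single year, which is consistent with the text's remark that populations can recover quickly from predation.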
Human Genome Sequencing Project
Introduction
Cells are the fundamental working units of every living system. All the instructions needed to direct their activities are contained within the chemical DNA (deoxyribonucleic acid). DNA from all organisms is made up of the same chemical and physical components. The DNA sequence is the particular side-by-side arrangement of bases along the DNA strand (e.g., ATTCCGGA). This order spells out the exact instructions required to create a particular organism with its own unique traits.
The genome is an organism's complete set of DNA. Genomes vary widely in size: the smallest known genome for a free-living organism (a bacterium) contains about 600,000 DNA base pairs, while human and mouse genomes have some 3 billion. Except for mature red blood cells, all human cells contain a complete genome.
DNA in the human genome is arranged into 24 distinct chromosomes: physically separate molecules that range in length from about 50 million to 250 million base pairs. A few types of major chromosomal abnormalities, including missing or extra copies or gross breaks and rejoinings (translocations), can be detected by microscopic examination. Most changes in DNA, however, are more subtle and require a closer analysis of the DNA molecule to find perhaps single-base differences.
Each chromosome contains many genes, the basic physical and functional units of heredity. Genes are specific sequences of bases that encode instructions on how to make proteins. Genes comprise only about 2% of the human genome; the remainder consists of non-coding regions, whose functions may include providing chromosomal structural integrity and regulating where, when, and in what quantity proteins are made. The human genome is estimated to contain 20,000-25,000 genes.
Although genes get a lot of attention, it's the proteins that perform most life functions and even make up the majority of cellular structures. Proteins are large, complex molecules made up of smaller subunits called amino acids. Chemical properties that distinguish the 20 different amino acids cause the protein chains to fold up into specific three-dimensional structures that define their particular functions in the cell.
The constellation of all proteins in a cell is called its proteome. Unlike the relatively unchanging genome, the dynamic proteome changes from minute to minute in response to tens of thousands of intra- and extracellular environmental signals. A protein's chemistry and behaviour are specified by the gene sequence and by the number and identities of other proteins made in the same cell at the same time and with which it associates and reacts. Studies to explore protein structure and activities, known as proteomics, will be the focus of much research for decades to come and will help elucidate the molecular basis of health and disease.

Whose genome was sequenced in the public (HGP) and private projects?
The human genome reference sequences do not represent any one person's genome. Rather, they serve as a starting point for broad comparisons across humanity. The knowledge obtained from the sequences applies to everyone because all humans share the same basic set of genes and genomic regulatory regions that control the development and maintenance of their biological structures and processes.
In the international public-sector Human Genome Project (HGP), researchers collected blood (female) or sperm (male) samples from a large number of donors. Only a few samples were processed as DNA resources. Thus donors' identities were protected so neither they nor scientists could know whose DNA was sequenced. DNA clones from many libraries were used in the overall project. Technically, it is much easier to prepare DNA cleanly from sperm than from other cell types because of the much higher ratio of DNA to protein in sperm and the much smaller volume in which purifications can be done. Sperm contain all chromosomes necessary for study, including equal numbers of cells with the X (female) or Y (male) sex chromosomes. However, HGP scientists also used white cells from female donors' blood to include samples originating from women.
Many polymorphisms (small regions of DNA that vary among individuals), mostly single nucleotide polymorphisms (SNPs), also were identified during the HGP. Most SNPs have no physiological effect, although a minority contributes to the beneficial diversity of humanity. A much smaller minority of polymorphisms affect an individual's susceptibility to disease and response to medical treatments.

Who sequenced the human genome?
Human Genome Project research was funded at many laboratories across the U.S. by the Department of Energy (DOE), the National Institutes of Health (NIH), or both. At any given time, the DOE Human Genome Project has funded about 100 principal investigators. Other researchers at numerous colleges, universities, and laboratories throughout the United States also have received DOE and NIH funding for human genome research18. In addition, many large and small private U.S. companies are conducting genome research. At least 18 other countries have participated in the Human Genome Project.
18 http://www.gov/sciencetech/genome.htm (181008)

Figure: Sets of human chromosomes

Mapping the Genome
One of the central goals of the Human Genome Project is to produce a detailed "map" of the human genome, and more than one kind of map is used. One type, a genetic linkage map, is based on careful analyses of human inheritance patterns. It indicates for each chromosome the whereabouts of genes or other "heritable markers", with distances measured in centimorgans, a measure of recombination frequency. During the formation of sperm and egg cells, a process of genetic recombination, or "crossing over", occurs in which pieces of genetic material are swapped between paired chromosomes.
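The genetic linkage map measures distance by how often two markers are separated by recombination. A minimal sketch, assuming offspring in a pedigree have already been scored as recombinant or non-recombinant for a pair of markers (the counts below are hypothetical): the recombination fraction is the share of recombinant offspring, and for closely linked markers one centimorgan corresponds to a 1-in-100 chance of separation. The Haldane mapping function is included only as one standard correction for longer distances (it assumes crossovers occur independently); the text itself does not discuss it.

```python
"""Genetic distance between two markers from pedigree counts (sketch).

Assumptions: offspring are already scored as recombinant or not for the
marker pair; counts used in the demo are hypothetical.
"""
import math


def recombination_fraction(recombinant: int, total: int) -> float:
    """Share of scored offspring in which the two markers were separated."""
    if total <= 0 or not 0 <= recombinant <= total:
        raise ValueError("need total > 0 and 0 <= recombinant <= total")
    return recombinant / total


def centimorgans_linear(r: float) -> float:
    """Close markers: 1 cM ~ a 1-in-100 chance of separation, so d = 100 * r."""
    return 100.0 * r


def centimorgans_haldane(r: float) -> float:
    """Haldane mapping function, d = -50 * ln(1 - 2r) cM, for 0 <= r < 0.5.

    Assumes crossovers occur independently; corrects for double crossovers
    that the linear rule misses at larger distances.
    """
    if not 0 <= r < 0.5:
        raise ValueError("Haldane distance is defined only for 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * r)


if __name__ == "__main__":
    r = recombination_fraction(recombinant=3, total=300)  # hypothetical pedigree counts
    print(centimorgans_linear(r))               # 1.0
    print(round(centimorgans_haldane(r), 2))    # 1.01
    print(round(centimorgans_haldane(0.2), 1))  # 25.5 (vs. 20.0 with the linear rule)
```

For close markers the two estimates agree almost exactly, which is why the simple one-in-a-hundred rule works at the scale of the sub-centimorgan marker spacing quoted for the 1994 map; the gap between the methods only grows as markers move farther apart.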
Recombination of this kind is a form of chromosomal scrambling, and it accounts for the differences invariably seen even in siblings (apart from identical twins). Logically, the closer two genes are to each other on a single chromosome, the less likely they are to get split up during genetic recombination. When they are close enough that the chances of being separated are only one in a hundred, they are said to be separated by a distance of one centimorgan.
The role of human pedigrees now becomes clear. By studying family trees and tracing the inheritance of diseases and physical traits, or even unique segments of DNA identifiable only in the laboratory, geneticists can begin to pin down the relative positions of these genetic markers. By the end of 1994, a comprehensive map was available that included more than 5800 such markers, including genes implicated in cystic fibrosis, myotonic dystrophy, Huntington disease, Tay-Sachs disease, several cancers, and many other maladies. The average gap between markers was about 0.7 centimorgan19.
19 Genome Analysis: A Laboratory Manual. Bruce W. Birren, Eric D. Green. 2003. CSHL Press, New York, USA.
Other maps are known as physical maps, so called because the distances between features are measured not in genetic terms, but in "real" physical units, typically, numbers of base pairs. A close analogy can thus be drawn between physical maps and the road maps familiar to us all. Just as small-scale road maps may show only large cities and indicate distances only between major features, so a low-resolution physical map includes only a relative sprinkling of chromosomal landmarks. A well-known low-resolution physical map, for example, is the familiar chromosomal map, showing the distinctive staining patterns that can be seen in the light microscope. Further, specific segments of DNA can be targeted in intact chromosomes by using complementary strands synthesized in the laboratory, by a process known as in situ hybridization. These laboratory-made "probes" carry a fluorescent or radioactive label, which can then be detected and thus pinpointed on a specific region of the chromosome.
The analogy can be extended further. Fortunately, means are also available to produce physical maps of much higher resolution, analogous to large-scale county maps that show every village and farm road, and indicate distances at a similar level of detail. Just such a detailed physical map is one that emerges from the use of restriction enzymes, DNA-cleaving enzymes that serve as highly selective microscopic scalpels. A typical restriction enzyme known as EcoRI, for example, recognizes the DNA sequence GAATTC and selectively cuts the double helix at that site. One use of these handy tools involves cutting up a selected chromosome into small pieces, then cloning and ordering the resulting fragments.
The cloning, or copying, process is a product of recombinant DNA technology, in which the natural reproductive machinery of a "host" organism (typically a bacterium or yeast) replicates a "parasitic" fragment of human DNA, thus producing the multiple copies needed for further study. By cloning enough such fragments, each overlapping the next and together spanning long segments (or even the entire length) of the chromosome, workers can eventually produce an ordered library of clones. Each contiguous block of ordered clones is known as a contig, and the resulting map is a contig map. If a gene can be localized to a single fragment within a contig map, its physical location is thereby accurately pinned down. Further, these conveniently sized clones become resources for further studies by researchers around the world, as well as the natural starting points for systematic sequencing efforts.

Two giant steps: Chromosomes 16 and 19
One of the signal achievements of the DOE genome effort so far is the successful physical mapping of chromosomes 16 and 19.
The high-resolution chromosome 19 map, constructed at the Lawrence Livermore National Laboratory, is based on restriction fragments cloned in cosmids, synthetic cloning "vectors" modelled after bacteria-infecting viruses known as bacteriophages. Like a phage, a cosmid hijacks the cellular machinery of a bacterium to mass-produce its own genetic material, together with any "foreign" human DNA that has been smuggled into it. The foundation of the chromosome 19 map is a large set of cosmid contigs that were assembled by automated analysis of overlapping but unordered restriction fragments. These contigs span an estimated 54 million base pairs, more than 95 percent of the chromosome, excluding the centromere. Most of the contigs have been mapped by fluorescence in situ hybridization (FISH) to visible chromosomal bands. Moreover, more than 200 cosmids have been more accurately ordered along the chromosome by a high-resolution FISH technique in which the distances between cosmids are determined with a resolution of about 50,000 base pairs. This ordered FISH map, with cosmid reference points separated by an average of 230,000 base pairs, provides the essential framework to which other cosmid contigs can be anchored. Further, the EcoRI restriction sites have been mapped on more than 45 million base pairs of the overall cosmid map.
Over 450 genes and genetic markers have also been localized on this map, of which nearly 300 have been incorporated into the ordered map. An emerging gene map shows the locations of the mapped genes. Among these genes is the one responsible for the most common form of adult muscular dystrophy (DM), which was identified in 1992 by an international consortium that included Livermore scientists. A second important disease gene (COMP), responsible for a form of dwarfism known as pseudoachondroplasia, has also been identified. And yet another gene, one linked to a form of congenital kidney disease, has been localized to a single contig spanning one million base pairs but has not yet been precisely pinpointed. About 2000 other genes are likely to be found eventually on chromosome 19.
In a similar effort, the Los Alamos National Laboratory Center for Human Genome Studies has completed a highly integrated map of chromosome 16, a chromosome that contains genes linked to blood disorders, a second form of kidney disease, leukemia, and breast and prostate cancers. The framework for the Los Alamos effort is yet another kind of map, a "cytogenetic breakpoint map" based on 78 lines of cultured cells20, each a hybrid that contains mouse chromosomes and a fragment of human chromosome 16. Natural breakpoints in chromosome 16 are thus identified, leading to a breakpoint map that divides the chromosome into segments whose lengths average 1.1 million base pairs. Anchored to this framework are a low-resolution contig map based on YAC clones and a high-resolution contig map based largely on cosmids. The low-resolution map, comprising 700 YACs from a library constructed by the Centre d'Etude du Polymorphisme Humain (CEPH), provides practically complete coverage of the chromosome, except the highly repetitive DNA in the centromere region. The high-resolution map comprises some 4000 cosmid clones, assembled into about 500 contigs covering 60 percent of the chromosome. In addition, it includes 250 smaller YAC clones that have been merged with the cosmid contig map. The high- and low-resolution maps have been tied together by sequence-tagged sites (STSs), short but unique stretches of DNA sequence. They have also been integrated into the breakpoint map and with genetic maps developed at the Adelaide Children's Hospital and by CEPH. The integrated map also includes a transcription map of 1000 sequenced exons (expressed fragments of genes) and more than 600 other markers developed at other laboratories around the world. A readable display of this integrated map covers a sheet of paper more than 15 feet long; a portion of it, much reduced and showing only some of its central features, is reproduced here as the figure "Mapping chromosome 16".
The cosmid contig map is an especially important step forward, since it is a "sequence-ready" map. It is based on bacterial clones that are ideal substrates for DNA sequencing, and, further, these clones have been restriction mapped to allow identification of a minimum set of overlapping clones for a large-scale sequencing effort.
20 Encyclopedia of Human Biology. Renato Dulbecco. 1997. Academic Press, London, UK.
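EcoRI was described above as recognizing GAATTC and cutting the double helix at that site, and the chromosome 19 effort mapped EcoRI sites across the cosmid map. A minimal sketch of locating those sites in a sequence and simulating a complete digest, assuming a plain A/C/G/T string and modelling only one strand: EcoRI cuts between the G and the first A (G^AATTC), so every fragment after the first begins with AATTC. The example sequence is made up.

```python
"""Locate EcoRI sites and simulate a complete digest of a DNA string (sketch).

EcoRI recognizes GAATTC and cuts between G and A (G^AATTC). Only the top
strand is modelled, so overhangs are not represented and fragment lengths
are top-strand lengths. The demo sequence is made up.
"""
from typing import List

SITE = "GAATTC"
CUT_OFFSET = 1  # cut falls after the G: G^AATTC


def ecori_sites(seq: str) -> List[int]:
    """0-based start positions of every GAATTC in seq."""
    seq = seq.upper()
    positions = []
    start = seq.find(SITE)
    while start != -1:
        positions.append(start)
        start = seq.find(SITE, start + 1)
    return positions


def digest(seq: str) -> List[str]:
    """Split seq at every EcoRI cut point and return the fragments in order."""
    cuts = [p + CUT_OFFSET for p in ecori_sites(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    dna = "ATGGAATTCTTAGCGAATTCGG"
    print(ecori_sites(dna))  # [3, 14]
    print(digest(dna))       # ['ATGG', 'AATTCTTAGCG', 'AATTCGG']
```

Because GAATTC reads the same on both strands, the matching cut on the complementary strand falls four bases away, which is why real EcoRI fragments carry short single-stranded AATT overhangs; this sketch does not model them.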
Getting down to details: Sequencing the genome

Ultimately, though, these physical maps and the clones they point to are mere stepping stones to the most visible goal of the genome project: the string of three billion characters (A's, T's, C's, and G's) representing the sequence of base pairs that defines our species. Should anyone undertake to print it all out, the result would fill several hundred volumes the size of a big-city phone book. Included, of course, would be the sequence for every gene, as well as the sequences for stretches of DNA whose functions we don't yet know (but which may be involved in such little-understood processes as orchestrating gene expression in different parts of our bodies, at different times of our lives). Even more daunting is the realization that we will eventually need to sequence many parts of the genome many times, to reveal differences that indicate various forms of the same gene.

Only the barest start has been made in taking this dramatic step in the Human Genome Project. Several hundred million base pairs have been sequenced and archived in databases, but the great majority of these are from short "sequence tags" on cloned fragments. Only about 30 million base pairs of human DNA (roughly one percent of the total) have been sequenced in longer stretches. The longest contiguous fragment, about 685,000 base pairs from the human T-cell receptor beta region, is a product of DOE-supported work at the University of Washington.

At the beginning of the project, the cost of sequencing a single base pair was between $2 and $10, and one researcher could produce between 20,000 and 50,000 base pairs of continuous, accurate sequence in a year. Sequencing the genome by the year 2005 would therefore likely cost $10-20 billion and require a dedicated cadre of at least 5000 workers[21]. Clearly, a major effort in technology development was called for: an effort that would drive the cost well below $1 per base pair and that would allow automation of the sequencing process. Hence, as with so many human enterprises, the challenge of sequencing the genome is largely one of doing the job cheaper and faster. From the beginning, the DOE has emphasized programs to pave the way for expeditious and economical sequencing efforts: programs to develop new technologies, including new cloning vectors, and to establish suitable resources for sequencing, including clone libraries and libraries of expressed sequences.

Efforts to develop new cloning vectors have been especially productive. YACs remain a classic tool for cloning large fragments of human DNA, but they are not perfect. Some regions of the genome, for example, resist cloning in YACs, and others are prone to rearrangement. New vectors such as bacterial artificial chromosomes (BACs), P1 phages, and P1-derived artificial chromosomes (PACs) have thus been devised to address these problems. These new approaches are critical for ensuring that the entire genome can be faithfully represented in clone libraries, without the danger of deletions, rearrangements, or spurious insertions.

Marked progress is also evident in the development of sequencing technologies, though all of those in widespread current use are still based on methods developed in 1977 by Allan Maxam and Walter Gilbert and by Frederick Sanger and his coworkers. Both of these methods rely on gel-based electrophoresis systems to separate DNA fragments, and recent advances in commercial systems include increasing the number of gel lanes, decreasing run times, and enhancing the accuracy of base identification. As a result of such improvements, a standard sequencing machine can now turn out raw, unverified sequences of 50,000 to 75,000 bases per day.

21 Beyond the Molecular Frontier: Challenges for Chemistry and Chemical Engineering. National Research Council (U.S.) Committee on Challenges for the Chemical Sciences in the 21st Century. 2003. National Academies Press.
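The cost and staffing estimates above follow from simple arithmetic on the figures the text gives. A back-of-the-envelope sketch, assuming a genome of exactly three billion base pairs sequenced once, and assuming (hypothetically; the text gives no project length) a 15-year effort:

```python
GENOME_BP = 3_000_000_000  # "the string of three billion characters"


def total_cost(dollars_per_bp: float, bp: int = GENOME_BP) -> float:
    """Cost of sequencing every base once at a given price per base pair."""
    return bp * dollars_per_bp


def workers_needed(bp_per_person_year: int, years: int, bp: int = GENOME_BP) -> float:
    """Full-time researchers needed to finish in `years` at a given personal output."""
    return bp / (bp_per_person_year * years)


def machine_days(bases_per_day: int, bp: int = GENOME_BP) -> float:
    """Days of raw, single-pass output from one machine (no redundancy, no finishing)."""
    return bp / bases_per_day


if __name__ == "__main__":
    low, high = total_cost(2), total_cost(10)
    print(f"Cost at $2-$10 per bp: ${low / 1e9:.0f}-{high / 1e9:.0f} billion")
    # Assumption: a 15-year effort at 50,000 or 20,000 bp per researcher per year.
    print(f"Workers: {workers_needed(50_000, 15):.0f}-{workers_needed(20_000, 15):.0f}")
    print(f"One machine at 75,000 bases/day: {machine_days(75_000):.0f} days")
```

The $6-30 billion range brackets the $10-20 billion estimate, and 4000-10,000 workers is in line with "at least 5000." Even a single unverified pass would take one machine about 40,000 days, which is why automation and parallel instruments mattered so much.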
Equally important to the sequencing goals of the genome project is a rational system for organizing and distributing the material to be sequenced. Based on cell- and chromosome-sorting technologies developed at Livermore and Los Alamos, libraries of clones were established for each of the human chromosomes. These clones were invaluable in such notable "gene hunts" as the successful searches for the cystic fibrosis and Huntington disease genes. More recently, as more efficient vectors have become available, complete human DNA libraries have been established using BACs, PACs, and YACs, and the individual clones are widely available for mapping and for isolating genes.

Another critical resource is being assembled in an effort known as I.M.A.G.E. (Integrated Molecular Analysis of Genomes and their Expression), cofounded by the Livermore Human Genome Center. The aim is a master set of mapped and sequenced human cDNA, representing the expressed parts of the human genome. By early 1996, I.M.A.G.E. had distributed over 250,000 partial and complete cDNA clones, most of them with one or both ends sequenced to provide unique identifiers. These identifiers, expressed sequence tags (ESTs), are usually 300-500 base pairs each. Twenty-five hundred genes have also been newly mapped as part of this coordinated effort.

Shotguns and transposons

Such advances as these, in both technology development and the assembly of resource libraries, have brought much nearer the day when "production sequencing" can begin. A great deal of variety remains, however, in the approaches available to sequencing the human genome, and it is not yet clear which will prove the most efficient and most cost-effective way to read long stretches of DNA over the next decade. One of the available choices, for example, is between "shotgun" and "directed" strategies. Another is the degree of redundancy: how many times must a given strand be sequenced to ensure acceptable confidence in the result?

Shotgun sequencing derives its name from the randomly generated DNA fragments that are the objects of scrutiny. Many copies of a single large clone are broken into pieces of perhaps 1500 base pairs, either by restriction enzymes or by physical shearing. Each fragment is then separately cloned, and a convenient portion of it sequenced. A computational assembly process then compares the terminal sequences of the many fragments and, by finding overlaps that indicate neighboring fragments, constructs an ordered library for the parent clone. The members of this ordered library can then be sequenced from end to end to yield a complete sequence for the parent. The statistics involved mean that, if no gaps are to be tolerated in the final sequence, the same sequence must be done many times (in the many overlapping fragments); that is the main disadvantage. A benefit is that the final sequence is highly reliable.

Despite its cost, shotgun sequencing has been the primary means for generating most of the genomic sequence data in public DNA databases. The strategy is also being used at the Genome Therapeutics Corporation and The Institute for Genomic Research (TIGR). As part of the DOE-supported Microbial Genome Initiative, Genome Therapeutics has sequenced 1.8 million base pairs of Methanobacterium thermoautotrophicum, a microbe important in energy production and bioremediation, and TIGR has successfully sequenced the complete genomes of three microorganisms: Haemophilus influenzae (1,830,137 base pairs; an effort supported mostly by private funds), Mycoplasma genitalium (580,070 base pairs), and Methanococcus jannaschii (1,739,933 base pairs).
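The overlap-finding step of shotgun assembly can be illustrated with a toy greedy assembler that repeatedly merges the two fragments with the longest suffix-prefix overlap. This is a teaching sketch, not the software used at any of the centres named here: it assumes error-free reads taken from a single strand, ignores reverse complements and repeats, and its `min_overlap` threshold is an arbitrary illustrative parameter.

```python
from typing import List


def overlap(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if shorter than min_overlap)."""
    start = 0
    while True:
        start = a.find(b[:min_overlap], start)  # next place the overlap could begin
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: List[str], min_overlap: int = 3) -> List[str]:
    """Merge reads by best overlap until none remain; returns one string per contig."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        # A read lying wholly inside another adds no positional information.
        contigs = [c for c in contigs
                   if not any(c != other and c in other for other in contigs)]
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            return contigs  # nothing overlaps any more: each entry is a separate contig
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


if __name__ == "__main__":
    print(greedy_assemble(["ATTAGAC", "AGACCTG", "CCTGATC"]))  # ['ATTAGACCTGATC']
```

Real fragments contain sequencing errors and repeated elements, so production assemblers score overlaps statistically rather than demanding exact matches; the greedy rule simply shows how overlaps order fragments.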
The alternative to shotgun sequencing is a directed approach, in which one seeks to sequence the target clone from end to end with a minimum of duplication. The essence of this approach is embodied in a technique known as primer walking. The widely automated Sanger sequencing method involves a DNA replication step that must be "primed" by a DNA fragment that is complementary to 15 to 20 base pairs of the strand to be sequenced. In principle, starting at one end of a single large fragment, one replicates a stretch of DNA (say, 400 base pairs long) that can be sequenced in one run. With the sequence for this first segment in hand, the next stretch of DNA, just overlapping the first, is then tackled in the same way. One can thus "walk" the entire length of the original clone.

Unfortunately, this conceptually simple approach has been historically beset with disadvantages, mainly the expense and inconvenience of custom-synthesizing a primer as the necessary starting point for each sequencing step. Until recently, making these primers was an expensive and time-consuming business, but recent innovations have made primer walking, and similar directed strategies, more and more economically feasible.
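The walking loop just described can be simulated on a string to show why each step after the first costs a new custom primer. In this minimal sketch, annealing is modeled as an exact, unique string match on the strand being read, the next primer is simply the last 18 bases just read, and the vector priming site and random insert are hypothetical; real primer design also weighs melting temperature and read quality near the end of a run.

```python
import random
from typing import Tuple


def sequence_from_primer(template: str, primer: str, read_len: int) -> str:
    """Simulated sequencing run: find where the primer binds and read the bases after it.

    Assumption: binding is an exact, unique string match on the strand being read.
    """
    site = template.find(primer)
    if site == -1:
        raise ValueError("primer does not match the template")
    if template.find(primer, site + 1) != -1:
        raise ValueError("primer matches more than one site; a different primer is needed")
    begin = site + len(primer)
    return template[begin:begin + read_len]


def primer_walk(template: str, first_primer: str,
                read_len: int = 400, primer_len: int = 18) -> Tuple[str, int]:
    """Walk from a known starting primer to the end of the clone.

    Returns the sequence read and the number of runs; every run after the
    first needs a newly synthesized custom primer.
    """
    known, primer, runs = "", first_primer, 0
    while True:
        read = sequence_from_primer(template, primer, read_len)
        runs += 1
        known += read
        if len(read) < read_len:       # the run reached the far end of the clone
            return known, runs
        primer = read[-primer_len:]    # next primer copies the last bases just read


if __name__ == "__main__":
    rng = random.Random(1996)
    vector_primer = "GCTAGCATCGGATCCAAG"  # hypothetical 18-base priming site in the vector
    insert = "".join(rng.choice("ACGT") for _ in range(1850))
    sequence, runs = primer_walk(vector_primer + insert, vector_primer)
    print(sequence == insert, runs, "runs,", runs - 1, "custom primers")
```

For an 1850-base insert read 400 bases at a time, the walk takes five runs and four custom primers; the uniqueness check mirrors a real failure mode, since a primer that matches a repeated sequence cannot tell the walk where to go next.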
Bioinformatics in Human Genome Sequencing Project[22]

From the start, it has been clear that the Human Genome Project would require advanced instrumentation and automation if its mapping and sequencing goals were to be met. Significant DOE resources have been committed to innovations in instrumentation, ranging from straightforward applications of automation to improve the speed and efficiency of conventional laboratory protocols to the development of technologies on the cutting edge: technologies that might potentially increase mapping and sequencing efficiencies by orders of magnitude. And here, especially, the DOE's engineering infrastructure and tradition of instrumentation development have been crucial contributors to the international effort.

On the first of these fronts, genome researchers are seeing significant improvements in the rate, efficiency, and economy of large-scale mapping and sequencing efforts as a result of improved laboratory automation tools. In many cases, commercial robots have simply been mechanically reconfigured and reprogrammed to perform repetitive tasks, including the replication of large clone libraries, the pooling of libraries as a prelude to various assays, and the arraying of clone libraries for hybridization studies. In other cases, custom-designed instruments have proved more efficient. A notable illustration is the world's fastest cell and chromosome sorter, developed at Livermore and now being commercialized, which is used to sort human chromosomes for chromosome-specific libraries. Other examples include a high-speed, robotics-compatible thermal cycler developed at Berkeley, which greatly accelerates PCR amplifications, and instruments developed at Utah for automated hybridization in multiplex sequencing schemes. Some of this effort has already been transferred to the private sector.

Smaller is better: and other developments

Beyond "mere" automation are efforts aimed at more fundamental enhancements of established techniques. In particular, a number of DOE-supported efforts aim at improved versions of the automated gel-based Sanger sequencing technique, with an eye to simplifying sample preparation, reducing measurement times, increasing the length of the strands that can be analyzed in a single run, and facilitating interpretation of the results. For example, ultrathin gels, less than 0.1 millimetres thick, supplemented by automated gel and sample loading, can be used to obtain 400 bases of sequence from each lane in an hour's run, a fivefold improvement in throughput over conventional systems. Even faster speedups are seen when arrays of 0.1-millimetre capillaries are used as the separation medium in place of the conventional slab gels. Both of these approaches exploit higher electric field strengths to increase DNA mobility and to reduce analysis times. The capillary approach is especially ripe for further development. Challenges include providing uniform excitation over arrays of 50 to 100 capillaries and then efficiently detecting the fluorescence emitted by labeled samples. Technologies under investigation include fiber-optic arrays, scanning confocal microscopy, and cooled CCD cameras, and tenfold improvements in speed, economy, and efficiency are projected in future commercial instruments. And Livermore scientists are looking beyond even capillaries, to sequencing arrays of rigid glass microchannels.

The move toward miniaturization is afoot elsewhere as well. Building on experiences in the electronics industry, several DOE-supported groups are exploring ways to adapt high-resolution photolithographic methods to the manipulation of minuscule quantities of biological reagents, followed by assays performed on the same "chip." Current thrusts of this "nanotechnology" approach include the design of microscopic electrophoresis systems and ultrasmall-volume, high-speed thermal cycling systems for PCR. A miniaturized, computer-controlled PCR device under development at Livermore operates on 9-volt batteries and might ultimately lead to arrays of thousands of individually controlled microPCR chambers. Another miniaturization effort aims at the fabrication of high-density combinatorial arrays of custom oligomers (short chains of nucleotides), which would make large-scale hybridization assays feasible. This same technology has already been used for genetic screening and cDNA fingerprinting. Similar approaches can be envisioned to understand differences in patterns of gene expression: Which genes are active (which are producing mRNA) in which cells? Which are active at different times during an organism's development? Which are active, or inactive, in disease?

In spite of continuing improvements to sequencers based on the classic methods, it is nonetheless desirable to explore altogether new approaches, including sequencing by hybridization. This innovative technique uses short oligomers that pair up with corresponding sequences of DNA. The oligomers are placed on an array by a process similar to that of making silicon chips for electronics. Successful matches between oligomers and genomic DNA are then detected by fluorescence, and the application of sophisticated statistical analyses reassembles the target sequence. Sequencing by hybridization is only one of several forward-looking ideas for revolutionizing sequencing technology.

22 Harper's Illustrated Biochemistry. Robert K. Murray, Darryl K. Granner, Peter A. Mayes, Victor W. Rodwell. 2006. McGraw-Hill Professional. Columbus, Ohio.
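How a set of hybridization hits can be turned back into a sequence is easiest to see in an idealized case. The sketch below uses the textbook de Bruijn graph and Eulerian path formulation. That choice of illustration is ours, and it is not necessarily the "statistical analyses" the text refers to. It assumes a perfect, error-free experiment that reports every k-base oligomer present in the target, and a target in which no (k-1)-base stretch repeats. Real arrays must cope with noise and repeats, which is exactly where the statistics come in.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def spectrum(sequence: str, k: int) -> List[str]:
    """Every k-base oligomer in the target: the ideal, error-free hybridization result."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def reconstruct(kmers: List[str]) -> str:
    """Rebuild a sequence from its k-mers via an Eulerian path in a de Bruijn graph.

    Assumes each k-mer occurs once in the target and that no (k-1)-mer repeats,
    so the path (and therefore the answer) is unique.
    """
    if not kmers:
        return ""
    graph: Dict[str, List[str]] = defaultdict(list)
    out_deg: Counter = Counter()
    in_deg: Counter = Counter()
    for kmer in kmers:
        left, right = kmer[:-1], kmer[1:]   # each k-mer is an edge between two (k-1)-mers
        graph[left].append(right)
        out_deg[left] += 1
        in_deg[right] += 1
    # The path starts at the node with one more outgoing than incoming edge.
    start = next((n for n in list(graph) if out_deg[n] - in_deg[n] == 1), kmers[0][:-1])
    # Hierholzer's algorithm: follow unused edges, backtracking at dead ends.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    hits = sorted(spectrum("ATGGCGTGCA", 4))  # an array reports hits unordered
    print(reconstruct(hits))                  # ATGGCGTGCA
```

Sorting the hits before reconstruction stands in for the fact that an array says which oligomers matched but not in what order; the graph recovers the order.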
Another innovative sequencing method is under investigation at Los Alamos. In this method, the characteristic fluorescence is detected by a laser system, base by base, thereby yielding the sequence. The approach is beset by major technical challenges, and direct sequencing has not yet been achieved. But the potential benefits are great, and much of the instrumentation for sensitive detection of fluorescence signals has already proved useful for molecular sizing in mapping applications.

Over the course of the past few years, several alternative approaches to direct sequencing have been explored, including atomic-resolution molecular scanning, single-molecule detection of individual bases, and mass spectrometry of DNA fragments. All of these alternatives look promising in the long term, but mass spectrometry has perhaps demonstrated the greatest near-term potential. Mass spectrometry measures the masses of ionized DNA fragments by recording their time-of-flight in vacuum, so it would replace traditional gel electrophoresis as the last step in a conventional sequencing scheme. Routine application of this technique still lies in the future, but fragments of up to 500 bases have been analyzed, and practical systems based on high-resolution mass separations of DNA fragments of fewer than 100 bases are currently being developed at several universities and national laboratories.

Dealing with the data

Among the less visible challenges of the Human Genome Project is the daunting prospect of coping with all the data that success implies. Appropriate information systems are needed not only during data acquisition, but also for sophisticated data analysis and for the management and public distribution of unprecedented quantities of biological information. Further, the challenge extends not just to the Human Genome Project, but also to the microbial genome program and to public- and private-sector programs focused on areas such as health effects, structural biology, and environmental remediation, because much of the challenge is interpreting genomic data and making the results available for scientific and technological applications. Efforts in all these areas are the mandate of the DOE genome informatics program.

The roles of laboratory data acquisition and management systems include the construction of genetic and physical maps, DNA sequencing, and gene expression analysis. These systems typically comprise databases for tracking biological materials and experimental procedures, software for controlling robots or other automated systems, and software for acquiring laboratory data and presenting it in useful form. Among such systems, whose products are already widely used in genome laboratories, are physical mapping databases developed at Livermore and Los Alamos, robot control software developed at Berkeley and Livermore, and DNA sequence assembly software developed at the University of Arizona. These systems are the keys to efficient, cost-effective data production in both DOE laboratories and the many other laboratories that use them.

The interpretation of map and sequence data is the job of data analysis systems. These systems typically include task-specific computational engines, together with graphics and user-friendly interfaces that invite their use by biologists and other non-computer scientists. The genome informatics program, supporting efforts at Oak Ridge National Laboratory and elsewhere, is the world leader in developing automated systems for identifying genes in DNA sequence data from humans and other organisms. The Oak Ridge-developed GRAIL system, illustrated in "Gene hunts," is a world-standard gene identification tool. In 1995 alone, more than 180 million base pairs of DNA were analyzed with GRAIL by general molecular biology and medical laboratories, biotechnology companies, and biopharmaceutical companies around the world.
A third area of informatics reflects, in a sense, the ultimate product of the Human Genome Project: information readily available to the scientific and lay communities. Public resource databases must provide data and interpretive analyses to a worldwide research and development community. As this community of researchers expands and as the quantity of data grows, the challenges of maintaining accessible and useful databases likewise increase. Systems now in place include the Genome Database of human genome map data at Johns Hopkins University, the Genome Sequence DataBase at the National Center for Genome Resources in Santa Fe, and the Molecular Structure Database at Brookhaven National Laboratory. In addition, it is critical to develop scientific databases that "interoperate," sharing data and protocols so that users can expect answers to complex questions that demand information from geographically distributed data resources. As the genome project continues to provide data that interlink structural and functional biochemistry; molecular, cellular, and developmental biology; physiology and medicine; and environmental science, such interoperable databases will be the critical resources for both research and technology development. The bioinformatics program is crucial to the multiagency effort to develop just such databases.

Another directed approach uses a naturally occurring genetic element called a transposon, which insinuates itself more or less randomly in longer DNA strands. This predilection for random insertion and the fact that the transposon's DNA sequence is well known are the keys to this directed sequencing strategy. The largest clones are broken into smaller subclones (each of about 3000 base pairs), which then become the targets of the transposons. Multiple copies of each subclone are exposed to the transposons, and reaction conditions are controlled to yield, on average, a single insertion in each 3000-base-pair strand. The individual strands are then analyzed to yield, for each, the approximate position of the inserted transposon. By mapping these positions, a "minimum tiling path" can be determined for each subclone; that is, a set of strands can be identified whose transposon insertions are roughly 300 base pairs apart. In this set of strands, the region around each transposon is then sequenced, using the inserted transposons as starting points. The known transposon sequence allows a single primer to be used for sequencing the full set of overlapping regions. At the Lawrence Berkeley National Laboratory, this technique has been used to sequence over 1.5 million base pairs of DNA on human chromosomes 5 and 20, as well as over three million base pairs from the fruit fly Drosophila melanogaster. On chromosome 5, for example, interest focuses on a region of three million base pairs that is rich in growth factor and receptor genes, whereas, on chromosome 20, Berkeley researchers are interested in a region of about two million base pairs that is implicated in 15 to 20 percent of all primary breast carcinomas.

One way to deal with the primer bottleneck of primer walking, developed at the Brookhaven National Laboratory, is to use sets of very short fragments to prime the next sequencing step. As an illustration, the four nucleotides (A, T, C, and G) can be ordered in more than 68 billion ways to create an 18-base primer, an imposing set of possibilities. But it is eminently practical to create a library of the 4096 possible 6-base primers. Three of these "6-mers" can be matched to the end of the fragment to be sequenced, thus serving as an 18-base primer. This modular primer technology is currently being applied to Borrelia burgdorferi, the organism that causes Lyme disease, and a 34,000-base-pair fragment has already been sequenced.
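The arithmetic behind the modular primer idea, and the way three library hexamers stand in for one custom 18-mer, can be checked in a few lines. This is a sketch only: the function names are hypothetical, and it ignores the hybridization chemistry (how three adjacent 6-mers actually prime together on the template), modeling the primer simply as the last 18 bases already read.

```python
from itertools import product
from typing import List, Set

BASES = "ACGT"


def count_primers(length: int) -> int:
    """Distinct oligomers of a given length: four choices at every position."""
    return len(BASES) ** length


def hexamer_library() -> Set[str]:
    """The complete, reusable library of 6-base oligomers (synthesized once)."""
    return {"".join(p) for p in product(BASES, repeat=6)}


def modular_primer(known_sequence: str, library: Set[str],
                   modules: int = 3, k: int = 6) -> List[str]:
    """Pick the library k-mers that together spell the last modules*k bases read so far."""
    span = modules * k
    if len(known_sequence) < span:
        raise ValueError("not enough known sequence to build the primer")
    tail = known_sequence[-span:]
    parts = [tail[i:i + k] for i in range(0, span, k)]
    missing = [p for p in parts if p not in library]
    if missing:
        raise ValueError(f"not in library: {missing}")
    return parts


if __name__ == "__main__":
    print(f"{count_primers(18):,} possible 18-mers")    # 68,719,476,736
    library = hexamer_library()
    print(len(library), "hexamers in the library")       # 4096
    print(modular_primer("TTTTACGTACGGCCTATTAGCA", library))  # ['ACGTAC', 'GGCCTA', 'TTAGCA']
```

The design point is that the 4096-member library is made once and reused for every step of every walk, whereas a custom 18-mer would have to be chosen from 4^18 possibilities and synthesized anew each time.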
Researchers supported by the DOE at the University of Utah are also pursuing the use of directed sequencing. In addition, they have developed a methodology for "multiplex" DNA sequencing, which offers a way of increasing throughput with either shotgun or directed approaches. By attaching a unique identifying sequence to each sequencing sample in a mixture of, say, 50 such samples, the entire mixture can be analyzed in a single electrophoresis lane. The 50 samples can then be resolved sequentially by probing, first, for bands containing the first identifier, then for bands containing the second, and so forth. Interestingly, multiplexing can also be used for mapping. The Utah group is now able to map almost 5000 transposons in a single experiment, and they are using multiplexing in concert with a directed sequencing strategy to sequence the 1.8 million base pairs of the thermophilic microbe Pyrococcus furiosus and two important regions of human chromosome 17.

The completed physical maps of chromosomes 16 and 19, with their extensive coverage in many different kinds of cloning vectors, are especially ripe for large-scale sequencing. Los Alamos scientists have therefore begun sequencing chromosome 16, focusing special effort on locating the estimated 3000 expressed genes on that chromosome and using those sites as starting points for directed genomic sequencing. Sequence fragments already known, such as end sequences and sequence-tagged sites, also serve as starting points. A region of 60,000 base pairs has already been sequenced around the adult polycystic kidney gene, and good starts have been made in mapping other genes. Even random sequencing has led to the identification of gene DNA in over 15 percent of the samples, confirming the apparent high density of genes on this chromosome. Between chromosome 16 and the short arm of chromosome 5, another Los Alamos target, the genome center there has produced almost two million base pairs of human DNA sequence.

A parallel effort is under way at Livermore on chromosome 19 and other targeted genomic regions. Using a shotgun approach, researchers there have completed over 1.3 million bases of genomic sequence. They are attacking two major regions of chromosome 19: one of about two million base pairs, containing several genes involved in DNA repair and replication, and another of approximately one million base pairs, containing a kidney disease gene. The Livermore scientists are making use of the I.M.A.G.E. cDNA resource to sequence the cDNA from these regions, along with the associated segments of the genome. Further, Livermore scientists have targeted DNA repair gene regions throughout the genome, using whole genomic DNA, and, in many cases, have done comparative sequencing of these genes in other species, especially the mouse. Such comparative sequencing has identified conserved sequence elements that might act as regulatory regions for these genes and has also assisted in the identification of gene function.

How good is good enough?

The goal of most sequencing to date has been to guarantee an error rate below 1 in 10,000, sometimes even 1 in 100,000. However, the difference between one human being and another is more like one base pair in five hundred, so most researchers now agree that one error in a thousand is a more reasonable standard. Using this lowered standard would greatly reduce the cost of acquiring sequence data for the bulk of human DNA, though the most biologically or medically important regions would still be sequenced more exhaustively, to assure a higher level of confidence and perhaps to uncover important individual differences.

With this philosophy in mind, Los Alamos scientists have begun a project to determine the cost and throughput of a low-redundancy sequencing strategy known as sample sequencing (SASE, or "sassy"). Clones are selected from the high-resolution Los Alamos cosmid map and then physically broken into 3000-base-pair subclones, much as in other sequencing approaches. In contrast to, say, shotgun sequencing, though, only a small random set of the subclones is initially selected for sequencing. The result is sequence coverage for about 70 percent of the original cosmid clone, enough to allow identification of genes and ESTs, thus pinpointing the most critical targets for later, more thorough sequencing efforts. In addition, the SASE-derived sequences provide enough information for researchers elsewhere to pursue just such comprehensive efforts.
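The trade-off between SASE's partial coverage and the heavier redundancy of shotgun sequencing, and the error-rate standards just discussed, can be estimated with standard back-of-the-envelope formulas that the text itself does not give. This sketch assumes the usual Poisson (Lander-Waterman) idealization: sequenced stretches land uniformly at random, so with average redundancy c the expected fraction of bases read at least once is 1 - e^(-c). Real clones, with cloning bias and repeats, fall short of this ideal.

```python
import math

GENOME_BP = 3_000_000_000


def fraction_covered(redundancy: float) -> float:
    """Expected fraction of bases read at least once at average redundancy c: 1 - e^-c.

    Assumption: reads are placed uniformly at random (Poisson model).
    """
    return 1.0 - math.exp(-redundancy)


def redundancy_for(fraction: float) -> float:
    """Average redundancy needed to expect a given covered fraction: -ln(1 - f)."""
    return -math.log(1.0 - fraction)


def expected_errors(length_bp: int, one_error_per: int) -> float:
    """Expected miscalled bases at an accuracy of one error per `one_error_per` bases."""
    return length_bp / one_error_per


if __name__ == "__main__":
    print(f"~70% coverage (SASE) needs c = {redundancy_for(0.70):.2f}")  # 1.20
    for c in (7, 10):  # shotgun-style redundancy
        print(f"c = {c:2d}: {fraction_covered(c):.5f} of bases covered")
    for rate in (1_000, 10_000, 100_000):
        print(f"1 error in {rate:,}: {expected_errors(GENOME_BP, rate):,.0f} errors per genome")
    print(f"1 difference in 500: {expected_errors(GENOME_BP, 500):,.0f} differences between two people")
```

Under these assumptions, about 70 percent coverage needs only about 1.2-fold redundancy, while seven- to tenfold redundancy leaves well under 0.1 percent of bases unread. At one error in a thousand, a genome would carry roughly three million miscalls, about half the six million or so genuine differences expected between two people at one base in five hundred. That is the intuition behind relaxing the standard.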
As the first major target of SASE analysis, Los Alamos scientists chose a cosmid contig of four million base pairs at the end (the telomere) of the short arm of chromosome 16. By early 1996, over 1.4 million base pairs had been sequenced, and a gene, EST, or suspected coding region had been located on every cosmid sampled. The cost of SASE sequencing is only one-tenth the cost of obtaining a complete sequence, and a genomic region can be "sampled" ten times as fast. Los Alamos is also building on the SASE effort by using SASE sequence data as the basis for an efficient primer walking strategy for detailed genomic sequencing. The first application of this strategy, to a telomeric region on the long arm of chromosome 7, proved to be as efficient as typical shotgun sequencing, but it required only two- to threefold redundancy to produce a complete sequence, in contrast to the seven- to tenfold redundancy required in shotgun approaches. The resulting 230,000-base-pair sequence is the second-longest stretch of contiguous human DNA sequence ever produced.

In a sense, though, even a complete genome sequence (the ultimate physical map) is only a start in understanding the human genome. The deepest mystery is how the potential of 100,000 genes is regulated and controlled, how blood cells and brain cells are able to perform their very different functions with the same genetic program, and how these and countless other cell types arise in the first place from a single undifferentiated egg cell. A first step toward solving these subtle mysteries, though, is a more complete physical picture of the master molecules that lie at the heart of it all.

Ethical Issues[23]

Controversies That Never End

There are many ethical issues raised as a direct result of our knowledge, understanding and usage of genome technology. First and foremost is what will be done with this information. Who has a right to have it? Should potential employers be given this information? Should insurance carriers be given this information? Will this technology absolve some and indict others of their responsibilities to society, family, or government? Since this society, based on historical documentation, can't or won't free itself from the deleterious clutches of the "isms" (racism, sexism, ageism, genderism), would such information merely serve as another discriminating mechanism to ostracize individuals from mainstream society?

Treatment and Medicine

Genaissance Pharmaceuticals, Inc., located in New Haven, announced that it had detected an "astonishing" variance at the genetic level in 82 unrelated people from four racial backgrounds: white, black, Asian, and Hispanic. Its study of 313 genes, out of the 30,000 identified by human genome scientists, found that for each gene there were on average 14 versions that could be inherited by a given person from parents. Gerald Vovis, Genaissance chief technology officer and senior vice president, felt this might explain why there is such a wide variance in how people respond to medication. Vovis foresees a day when doctors will take a sample of blood, do a total genetic examination, and let the results guide the prescribing of treatment. Another upside to this technology is that side effects produced by the ingestion of medication could be minimized or eradicated altogether. The downside is that some unscrupulous individual with access to that information could misuse it or exploit the person concerned[24].

23 Current Controversies in the Biological Sciences: Case Studies of Policy Challenges from New Technologies. Karen F. Greif, Jon F. Merz. 2007. MIT Press. Boston, Mass.
24 Huge gene variation found in humans: Find may explain differing responses to medication. MSNBC/Reuters. July 12, 2001. http://www.msnbc.com/news
Insurance Companies

If life insurance companies had this information, how might that impact society? Anyone who has ever sought life insurance is familiar with the little indicators that can prohibit your ability to become insured, or cause you to pay abhorrently elevated fees. Smoking, obesity, coronary heart disease, cancer, AIDS, multiple sclerosis, Huntington's disease, and a host of other maladies can increase your rates. Now imagine if insurance companies had access to your genetic composition. You could potentially be penalized now for what may be coming twenty years (or never) down the road. The only upside to the current system is that if you have something you don't know about, it cannot be used against you.

Employment

What if potential employers had this information? When you currently apply for a job, your "employability" is based on skill, experience, education, references, and sometimes who-you-know. With access to your genetic code, any predisposition to alcoholism, cancer, or other conditions could become grounds for not being hired. Would it be ethical for the employer to deny you the opportunity to make money today, based on something that might not take place in the next five to ten years? From the potential employer's perspective, the answer would probably be a resounding yes. That will not sit well with a potential employee who needs to be self-supporting.

Moral Issues

Science and religion have been at odds for eons. The fundamental difference between the two is buttressed in philosophy. Science relies on empirical data; it believes in the tangible and concrete, and says reality must be grounded in fact. Religion is predicated on faith, and faith is a belief in that which is not seen; religion says, "I believe, therefore it is real." Religion sets its sights on a reality that has no basis in logic, but rather in emotional rectitude. Religion has promoted the notion that an omnipotent, omnipresent, and omniscient deity created all life here on Earth.

When Charles Robert Darwin first presented his book of theories, On the Origin of Species by Means of Natural Selection, in 1859, it was met much as it is today. Darwin believed that man evolved, and that changes came about because of natural selection. Today, thanks to our continuing breakthroughs with DNA, the naturalist feels that Darwin has been vindicated, and the case for evolution now pierces the veneer of American piety. The Human Genome Project confirms the theory of evolution. …none of these headlines capture the most basic, the most important consequence of mapping out all of our genes. The genome reveals, indisputably and beyond any serious doubt, that Darwin was right: mankind evolved over a long period of time from primitive animal ancestors. Eric Lander of the Whitehead Institute in Cambridge, Mass., said that if you look at our genome it is clear that "evolution …must make new genes from old parts." The core recipe of humanity carries clumps of genes that show we are descended from bacteria, and our genetic instructions have been slowly assembled from the genetic instructions that made jellyfish, dinosaurs, woolly mammoths and our primate ancestors. There is no other way to explain the jerry-rigged nature of the genes that control key aspects of our development; there is, as the scientists who cracked the genome all agreed, no other possible explanation. No one can look at how the book of life is written and not come away fully understanding that our genetic instructions have evolved from the same programs that guided the development of earlier animals. Our genes show that scientific creationism cannot be true. The response to all those who thump their bible and say there is no proof, no test and no evidence in support of evolution is, "The proof is right here, in our genes."

The debate further intensifies as religion frowns on the notion that man will attempt to play God. Gene mapping will make it possible to do away with perceived flaws, or defects in children: change the eye colouring, let's make the child athletic and very aesthetically pleasing, and by all means, add to the aptitude of the child. Even the not particularly religious fear a resurgence of Adolf Hitler's vision of creating the perfect race.

Sure, the business side of cracking our genetic code is fascinating. The potential for eliminating illnesses with debilitating effects on adults and children is clearly a good reason to continue. We have sequenced 3.1 billion letters of DNA[25] and shown that humans are made up of 30,000 to 40,000 genes. We are spawning new scientific fields of study like "proteomics," the study of the production of proteins. We are correcting past wrongs: historical denials have been overturned courtesy of the genome factor, for example by proving paternity of several of Sally Hemings' children, and those who have been incarcerated unjustly are being freed. During the last twenty years, we have seen a thirty-percent increase in the number of centenarians. Clearly our bio-technical advances are working.

As more and more companies enter the arena, will there be a way to control what goes on? Will quality and validity be retained as more for-profit businesses like Celera Genomics enter the picture? Only time will tell. The public at large must indeed become more knowledgeable so that an eye can be kept on Big Brother. Clearly the genie is out of the bottle and there is no way of stopping its progress. Though we have the technology, one salient thought keeps me from totally embracing it: can we fallible creatures objectively and responsibly handle this knowledge?

"Darwin vindicated!" Cracking of human genome confirms theory of evolution. Arthur Caplan. February 21, 2001. MSNBC.
only two times more than fruit flies.S. once vehemently denied. is it moral to take away the variety that nature provides? Will scientists one day perceive certain ethnic groups as being unwanted flaws? Conclusion There is no doubt that the human genome project started in 1990 left humankind hanging at the precipice of eminent power and direction. The collaboration of the U. And we all need to be sure that our government does not leave us in the genetic lurch without laws to ensure our privacy and protect us against genetic discrimination25. like the Thomas Jefferson debauchery. change the hair texture. Department of Energy and the National Institutes of Health seems to have been a good merger.
Marine Genome Sequencing Project
More than 80% of the earth's living organisms are found only in aquatic ecosystems. Genome sequencing projects among marine organisms will enable scientists to differentiate populations and address emerging diseases to protect fishery and ecological resources. Seafood-borne illness adversely affects public health and coastal economies, so molecular technology is needed to develop rapid diagnostics that ensure the safety of the seafood we eat and the vitality of the seafood industry. We also have to develop the biological technology needed to identify sources of ecological stress and to develop strategies to protect and restore coastal resources. Our challenge as a nation is to discover the life-enhancing and lifesaving properties these unique organisms possess.

One example of genome sequencing among marine organisms is the nanoflagellates. Nanoflagellates are a group of marine microbes, less than 10 micrometers in size, that prey on other microbes, such as bacteria and phytoplankton. These predatory protists play a critical role in marine carbon cycling, yet we know little about their biochemical characteristics. An international team of investigators led by Monterey Bay Aquarium Research Institute's Alexandra Worden will investigate the genetic mechanisms behind the processes of predation, digestion, and biomass incorporation by protists that determine the fate of phytoplankton and bacteria, to bridge the gap in our knowledge about this important player in the marine food web.

Shipworms, wood-boring marine bivalves, have been nicknamed the "termites of the sea". These animals can survive by feeding solely on wood, utilizing a highly efficient system of symbiotic lignocellulose degradation that is biologically, functionally, and evolutionarily distinct from those found in termites, ruminants, and all other cellulose-consuming animals. Like termites, shipworms depend for their ability to consume wood on symbiotic bacteria that provide enzymes, including cellulases and other hydrolases critical for digestion of wood by the host and potentially valuable for commercial bioconversion of lignocelluloses to ethanol. Unlike termites, shipworms accomplish the complete degradation of lignocellulose with a simple intracellular consortium of just a few related types of microbes. One such metagenome lurks inside the giant Pacific shipworm, Bankia setacea26. The project was proposed by Daniel Distel of the Ocean Genome Legacy Foundation. Analysis of the shipworm symbiont community metagenome will provide important insights into the composition and function of this unique lignocellulose-degrading bacterial community and will allow valuable comparisons to the recently sequenced termite symbiont metagenome.
Sir Frederick Stratten Russell, Sir Maurice Yonge. 1971. Advances in Marine Biology. Academic Press. London, UK.

Another marine organism is Botryococcus braunii, a colony-forming green microalga that synthesizes long-chain liquid hydrocarbon compounds and sequesters them in the extracellular matrix of the colony to afford buoyancy. A type of B. braunii produces a family of compounds termed botryococcenes, which hold promise as an alternative energy source; botryococcenes have already been converted to fuel suitable for internal combustion engines. Geochemical analysis has shown that botryococcenes, presumably from ancient B. braunii communities, also make up a portion of the hydrocarbon masses in several modern-day petroleum and coal deposits. While algae have been recognized for their role in carbon sequestration and for biofuels production, little information, either genetic or metabolic, has been reported for this particular organism. This project, led by Andrew Koppisch and colleagues from Los Alamos National Laboratory and five other institutions, will target the identification of specific metabolic pathways responsible for hydrocarbon synthesis to alleviate bottlenecks in biofuels production.

The newly sequenced genome of a one-celled, planktonic marine organism, reported on Feb. 14, 2008 in the journal Nature, is already telling scientists about the evolutionary changes that accompanied the jump from one-celled life forms to multicellular animals like us. In the Nature paper and a complementary Science paper also released that week, University of California, Berkeley, biologists Nicole King, Daniel Rokhsar and their colleagues present their first draft of the genome of a choanoflagellate called Monosiga brevicollis, and their first comparisons with the genes of multicellular animals, the so-called metazoans. The sequencing and analysis were performed by the Department of Energy Joint Genome Institute (JGI) in Walnut Creek, Calif., in collaboration with researchers from UC Berkeley and eight other institutions.

"Choanoflagellates are the closest living unicellular relatives of animals and, as such, they hold a key to understanding the origins and evolution of animals," said King, an assistant professor of integrative biology and of molecular and cell biology and a 2005 MacArthur "genius" Award winner. "They help shed light on the biology and genome content of the unicellular organisms from which we evolved." Because choanoflagellates and animals shared a common ancestor between 600 million and a billion years ago, studying them can help us learn about our history and the history of life on Earth. According to King, choanoflagellates play a major role in the carbon cycle of the oceans by consuming large quantities of bacteria. Yet, aside from the fact that they are an important food for krill, which are the main source of food for baleen whales, biologists know almost nothing about these organisms.

One finding confirmed by the sequencing is that choanoflagellates have many genes that, in animals, produce proteins essential to cell-to-cell signalling and to determining which cells stick to one another. Some of these proteins, called cadherins, evolved in animals for linking cells together. "Choanoflagellates show no hint of multicellularity, but they have 23 genes for cadherin proteins," King said. "In animals, they are the glue that prevents clumps of cells from falling apart." Since Monosiga does not form colonies as do some other choanoflagellates, these proteins' roles are a mystery.

In the Science paper, King and graduate student Monika Abedin report that some of these proteins are found around the base of the choanoflagellate cell, where the choanoflagellate attaches to surfaces, and around the tentacles, where bacteria are captured and ingested. Perhaps, they argue, the last single-celled ancestor of all animals (including humans) employed these ancient cadherin proteins to bind and eat bacteria, while more complex metazoans adopted these proteins for gluing cells into a larger, many-celled creature. "The transition to multicellularity likely rested upon the co-option of diverse transmembrane and secreted proteins to new functions in intercellular signaling and adhesion," they wrote in Science.

The draft genome, completed and annotated in 2007, consists of about 9,200 genes. It is similar in size to the genomes of fungi and diatoms, but much smaller than the genomes of metazoans. Humans, for example, have about 25,000 genes, about the same as the mouse.

Choanoflagellates are found abundantly in salt and fresh water around the world, where they gorge on bacteria. The cells are egg-shaped, with a single long tail or flagellum at one end surrounded at its base by a collar of tentacles (choano comes from the Greek word for collar) that capture bacteria. The flagellum propels the choanoflagellate through the water and also washes bacteria towards the tentacles. At about 10 microns across, they're about the size of another one-celled eukaryote, yeast. While yeast is well known to genetics researchers, however, choanoflagellates are not, a situation King hopes will change now that the genome is sequenced.

Because choanoflagellates resemble the feeding cells of sponges, which are among the most primitive of animals, biologists 165 years ago proposed that these organisms were very distant ancestors of multicelled animals. "Choanoflagellates really are a unique window back in time to the origin of animals and humans, because the fossil record is not there," said Dan Rokhsar, UC Berkeley professor of molecular and cell biology and program head for computational genomics at JGI. "They are our best way of triangulating on that last unicellular ancestor of animals." King and Rokhsar successfully proposed the choanoflagellate for sequencing several years ago as part of the Department of Energy's Microbial Genome Program, and in the intervening years, King worked on isolating enough uncontaminated DNA for sequencing. King and Rokhsar also are members of UC Berkeley's Center for Integrative Genomics.

Interestingly, the choanoflagellate genome, like the genomes of many seemingly simple organisms sequenced in recent years, shows a surprising degree of complexity. The choanoflagellate, for example, has nearly as many introns (noncoding regions once referred to as "junk" DNA) in its genes as humans do in their genes, and often in the same spots. Introns have to be snipped out before a gene can be used as a blueprint for a protein and have been associated mostly with higher organisms. Likewise, choanoflagellates have five immunoglobulin domains, though they have no immune system; collagen, integrin and cadherin domains, though they have no skeleton or matrix binding cells together; and proteins called tyrosine kinases that are a key part of signaling between cells, even though Monosiga is not known to communicate, or at least does not form colonies. Many genes involved in the central nervous system of higher organisms have been found in simple organisms that lack a centralized nervous system, Rokhsar said, noting a similar situation with the starlet sea anemone, Nematostella vectensis, sequenced in 2007.

Nevertheless, it is not always easy to determine which genes were in the last common ancestor of choanoflagellates and humans, and which are new. Choanoflagellates and humans have been evolving for the same length of time, so differences between the genomes may reflect genes that have been lost by choanoflagellates as much as genes gained by humans. Comparison of the Monosiga genome to that of other organisms, including another choanoflagellate, a colony-former called Proterospongia, whose genome is due to be sequenced by the National Institutes of Health, may answer such questions.

These findings are helping King and her colleagues assemble a picture of what the original common ancestor of humans and choanoflagellates looked like and also get hints about the first animals. "It's remarkable to what extent we can figure out how those animal ancestors must have been able to stick together and communicate with each other, at least in ways that allow you to make hypotheses about what those first steps toward animals looked like," King said. King has hopes that the Monosiga genome will answer many questions of animal evolution and illuminate the biology of this poorly understood aquatic creature. "This is a new era, where we start with a genome to understand the biology of an organism," King said. "The genome is the toehold."
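The point that introns have to be snipped out before a gene can serve as a blueprint for a protein can be illustrated with a minimal Python sketch. It assumes the intron coordinates are already known and given as 0-based, half-open (start, end) intervals; the function name `splice` and the toy sequence are hypothetical. In the cell, splicing acts on RNA and is carried out by the spliceosome, which recognizes signals at the intron boundaries rather than precomputed coordinates.

```python
def splice(seq: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns from a gene sequence and join the remaining exons.

    Hypothetical helper: `introns` holds 0-based, half-open (start, end)
    intervals that are assumed to be known in advance.
    """
    exons = []
    pos = 0
    for start, end in sorted(introns):
        # Reject overlapping, empty, or out-of-range intervals.
        if start < pos or start >= end or end > len(seq):
            raise ValueError(f"invalid intron interval ({start}, {end})")
        exons.append(seq[pos:start])  # exon preceding this intron
        pos = end                     # skip over the intron
    exons.append(seq[pos:])           # final exon
    return "".join(exons)


# Toy gene (hypothetical): exon 1 (ATGGCC), intron (GTAAGTAG), exon 2 (TTTTGA).
gene = "ATGGCC" + "GTAAGTAG" + "TTTTGA"
print(splice(gene, [(6, 14)]))  # prints ATGGCCTTTTGA
```

The toy intron begins with GT and ends with AG, the boundary pattern found in most eukaryotic introns; the sketch does not check for it, since the coordinates are supplied directly.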
Future Perspectives and Conclusion

When is a genome project finished?
When sequencing a genome, there are usually regions that are difficult to sequence (often regions with highly repetitive DNA). Thus, “completed” genome sequences are rarely ever complete, and terms such as “working draft” or “essentially complete” have been used to describe the status of such genome projects more accurately. Even when every base pair of a genome sequence has been determined, there are still likely to be errors present, because DNA sequencing is not a completely accurate process. It could also be argued that a complete genome project should include the sequences of mitochondria and (for plants) chloroplasts, as these organelles have their own genomes.

It is often reported that the goal of sequencing a genome is to obtain information about the complete set of genes in that particular genome sequence. The proportion of a genome that encodes genes may be very small (particularly in eukaryotes such as humans, where coding DNA may account for only a few percent of the entire sequence). However, it is not always possible (or desirable) to sequence only the coding regions separately. Also, as scientists understand more about the role of this noncoding DNA (often referred to as junk DNA), it will become more important to have a complete genome sequence as a background to understanding the genetics and biology of any given organism.

In many ways, genome projects do not confine themselves to only determining the DNA sequence of an organism. Such projects may also include gene prediction to find out where the genes are in a genome, and what those genes do. There may also be related projects to sequence ESTs or mRNAs to help find out where the genes actually are.

Future Perspectives
Historically, when sequencing eukaryotic genomes (such as the worm Caenorhabditis elegans), it was common to first map the genome to provide a series of landmarks across it. Rather than sequencing a chromosome in one go, it would be sequenced piece by piece (with prior knowledge of approximately where each piece is located on the larger chromosome). Changes in technology, and in particular improvements in the processing power of computers, mean that genomes can now be “shotgun sequenced” in one go (although this approach has caveats when compared with the traditional approach). Improvements in DNA sequencing technology have meant that the cost of sequencing a new genome has steadily fallen (in terms of cost per base pair), and newer technology has also meant that genomes can be sequenced far more quickly.

When research agencies decide which new genomes to sequence, the emphasis has been on species that either have relevance to human health (for example, pathogenic bacteria or vectors of disease such as mosquitoes) or have commercial importance (such as livestock and crop plants). Secondary emphasis is placed on species whose genomes will help answer important questions in molecular evolution (such as the common chimpanzee).

In the future, it is likely that it will become even cheaper and quicker to sequence a genome. This will allow complete genome sequences to be determined from many different individuals of the same species. For humans, this will allow us to better understand aspects of human genetic diversity.
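Shotgun sequencing depends on stitching overlapping fragments back together computationally, which is why gains in computer processing power mattered. The following is a minimal Python sketch of one simple strategy, greedy overlap merging: repeatedly join the two fragments with the longest suffix-prefix overlap until no overlap of at least `min_len` bases remains. The function names, the `min_len` threshold, and the toy reads are hypothetical, and this is not the method of any particular genome project. Real assemblers must also handle sequencing errors and reverse-complement reads, and a greedy merge can misjoin highly repetitive regions, one reason such regions remain hard to finish.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and 0 is returned.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if k > 0 and a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge reads into contigs by their longest overlaps (toy sketch)."""
    # Remove exact duplicates, then reads fully contained in another read.
    frags = list(dict.fromkeys(reads))
    frags = [r for r in frags if not any(r != o and r in o for o in frags)]
    while True:
        # Find the ordered pair (i, j) with the longest suffix/prefix overlap.
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            # No qualifying overlaps remain: each fragment is a contig.
            return frags
        # Append b to a, keeping the overlapping bases only once.
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)


reads = ["GATTACAG", "ACAGGCTT", "GCTTAACC"]
print(greedy_assemble(reads))  # prints ['GATTACAGGCTTAACC']
```

Each merge removes one fragment, so the loop always terminates; the result is a list because reads with no qualifying overlap stay as separate contigs, leaving gaps of the kind that must later be closed by other means.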