Genome Sequencing Project – Up Close and Personal

Definition

Genome sequencing projects are scientific endeavours that ultimately aim to determine the complete genome sequence of an organism (an animal, a plant, a fungus, a bacterium, an archaeon, a protist or a virus). The genome sequence of any organism requires the DNA sequence of each of its chromosomes to be determined. For bacteria, which usually have just one chromosome, a genome project will aim to map the sequence of that chromosome. Humans, with 22 pairs of autosomes and 2 sex chromosomes, require 24 separate chromosome sequences to represent the completed genome.

Background

The sequencing of the human genome, along with those of related organisms, represents one of the largest scientific endeavours in the history of mankind. The information gathered from sequencing will provide the raw data for the exploding field of bioinformatics, where computer science and biology live in symbiotic harmony. The art of determining the sequence of DNA is known as Sanger sequencing after its brilliant pioneer. This technique involves the separation of fluorescently labelled DNA fragments according to their length on a polyacrylamide gel (PAGE). The base at the end of each fragment can then be visualized and identified by the dye with which it reacts. The time- and labour-intensive nature of gel preparation and running, as well as the large amounts of sample required, increase the time and cost of genomic sequencing. These conditions drastically reduce the efficiency of sequencing projects, ultimately limiting researchers in their sequencing attempts1.
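The read-out step of Sanger sequencing described above (fragments ordered by length, terminal base identified by its dye) can be illustrated with a toy sketch. The dye colours, the fragment data and the function name are assumptions made up for this illustration; they do not describe any particular sequencing instrument.

```python
# Toy model of reading a Sanger gel: each labelled fragment is
# (length, dye colour). The colour scheme below is hypothetical.
DYE_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}


def read_sequence(fragments):
    """Sort fragments by length; the dye on each fragment gives the
    base at the position equal to that fragment's length."""
    ordered = sorted(fragments, key=lambda frag: frag[0])
    return "".join(DYE_TO_BASE[dye] for _, dye in ordered)


# Fragments arrive in no particular order, as they would before separation.
fragments = [(3, "yellow"), (1, "green"), (4, "red"), (2, "blue")]
print(read_sequence(fragments))
```

Sorting by length plays the role of the gel: the shortest fragment reports the first base, the next shortest the second, and so on.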
Frederick Sanger – a man behind “shotgun sequencing”
Encyclopedia of Medical Genomics and Proteomics. Jürgen Fuchs, Maurizio Podda. 2004. CRC Press. London, UK.
Bacteriophage φX174 was the first genome to be sequenced, a viral genome with only 5386 base pairs (bp). Frederick Sanger, in another revolutionary discovery, invented the method of "shotgun" sequencing, a strategy based on the isolation of random pieces of DNA from the host genome to be used as primers for the PCR amplification of the entire genome. The amplified portions of DNA are then assembled by their overlapping regions to form contiguous transcripts (otherwise known as contigs). The final step involved the utilization of custom primers to elucidate the gaps between the contigs, thus giving the completely sequenced genome. Sanger first used "shotgun" sequencing five years later to complete the bacteriophage λ sequence, which was significantly larger: 48502 bp. This method allowed sequencing projects to proceed at a much faster rate, thus expanding the scope of realistic sequencing ventures. Since then several other viral and organellar genomes have been sequenced using similar techniques, such as the 229 kb genome of cytomegalovirus (CMV), the 192 kb genome of vaccinia, the 187 kb mitochondrial and the 121 kb chloroplast genomes of Marchantia polymorpha, and the 186 kb genome of smallpox.

The success with viral genome sequencing stemmed from the relatively small length of their genetic codes. In 1989, Andre Goffeau set up a European consortium to sequence the genome of the budding yeast Saccharomyces cerevisiae (12.5 Mb). S. cerevisiae had a sequence approximately 60 times larger than any sequence previously attempted, indicating why Goffeau felt compelled to invite the cooperation of a group of laboratories. Goffeau's European collaboration involved 74 different laboratories drawn to the project in hopes of sequencing the homologs of their favourite genes2. Most laboratories utilized Sanger's "shotgun" method of sequencing, which had become the accepted standard for genome sequencing.

At the time, the sequencing of model organisms such as S. cerevisiae appeared to be the logical step towards the eventual characterization of the human genome, a task that seemed beyond the scope of technology due to its tremendous size of 3000 Mb. Sequencing smaller genomes would highlight the problems with sequencing techniques, eventually refining the technology to be used on large-scale projects like H. sapiens. In addition, valuable insight concerning these organisms would be gained with the elucidation of their genetic makeup.

2 Review of the Department of Energy's Genomics: GTL Program. National Research Council (U.S.). Committee on Review of the Department of Energy's Genomics: GTL Program. 2006. National Academies Press. New York
The following year saw the initiation of a plethora of ambitious sequencing proposals, the foremost being the introduction of the Human Genome Project in 1990. The U.S. Human Genome Project (HGP) is a joint effort of the Department of Energy and the National Institutes of Health that was designed as a three-step program to produce genetic maps, physical maps, and finally the complete nucleotide sequence map of the human chromosomes. The first two aims of the project are practically fulfilled and now the majority of work is concentrated on the exact nucleotide sequence of the human. In the wake of this pronouncement came the start of three projects aimed at elucidating the sequences of smaller model organisms, such as Escherichia coli, Mycoplasma capricolum, and Caenorhabditis elegans. It was hoped that these projects would increase the efficiency of sequencing but unfortunately they fell short of this task.

Many anticipated that E. coli would be the first genome to be sequenced entirely but, to the shock of the science community, an outsider won the race for the first complete genome sequence of a free living organism, Haemophilus influenzae. A team headed by J. Craig Venter from the Institute for Genomic Research (TIGR) and Nobel laureate Hamilton Smith of Johns Hopkins University sequenced the 1.8 Mb bacterium with new computational methods developed at TIGR's facility in Gaithersburg, Maryland. Previous sequencing projects had been limited by the lack of adequate computational approaches to assemble the large amount of random sequences produced by "shotgun" sequencing. In conventional sequencing, the genome is broken down laboriously into ordered, overlapping segments, each containing up to 40 Kb of DNA. These segments are "shotgunned" into smaller pieces and then sequenced to reconstruct the genome. Venter's team utilized a more comprehensive approach by "shotgunning" the entire 1.8 Mb H. influenzae genome. Previously, such an approach would have failed because the software did not exist to assemble such a massive amount of information accurately. Software developed by TIGR, called the TIGR Assembler, was up to the task, reassembling the approximately 24000 DNA fragments into the whole genome. After the H. influenzae genome was "shotgunned" and the clones purified sufficiently, the TIGR Assembler software required approximately 30 hours of central processing unit time on a SPARCenter 2000 containing half a gigabyte of RAM, testifying to the enormous complexity of the computation.

TIGR's dramatic leadership role in the field of genome sequencing was paralleled by the final completion of two of the largest genomic sequences, the bacterium E. coli K-12 and the yeast S. cerevisiae, in 1997. These projects were the culmination of over seven years of intensive work. The yeast genome was the final result of a tremendous international collaboration of more than 600 scientists from over 100 laboratories, representing the largest decentralised experiment in modern molecular biology. In an incredible display of organizational mastery, only 3.4% of the total sequencing effort was duplicated among laboratories. The final work represented the efforts of scientists from Japan, Europe, Canada, and the United States, producing the largest full length sequence (12 Mb) ever done. The E. coli sequence was considerably smaller (4.6 Mb) but equally important in terms of experimental utility, similar to S. cerevisiae in its academic utility. E. coli is the preferred model in biochemical genetics, molecular biology, and biotechnology, and its genomic characterization will undoubtedly further research toward a more complete understanding of this important experimental, medical, and industrial organism.

By September 1997, thirteen genome sequences of free-living organisms had been completed, including the two largest, E. coli and yeast, and eleven other microbial genomes under the length of 4.2 Mb. Four other large-scale projects are in progress, including the sequencing of the nematode C. elegans, which is 71% completed (finished: 1998), the fruit fly Drosophila melanogaster, which is 6% completed (finished: 2006), the mouse, which has less than 1% finished (December 2007: only 20%), and the human, which is only 1.5% completed (current: 92%).

The rapid proliferation of biological information in the form of genome sequences has been the major factor in the creation of the field of bioinformatics, which focuses on the acquisition, storage, access, analysis, modelling, and distribution of the many types of information embedded in DNA sequences. This field will be challenged by the heightening demands that increased information places on the algorithms currently utilized for sequence manipulation. The growing sequence knowledge of the human genome has been likened to the establishment of the periodic table in the 19th century. Just as past chemists systematically organized all elements in an array that captured their differences and similarities, the Human Genome Project will allow modern scientists to construct a biological periodic table relating units of nucleotides. This periodic table will not contain 100 elements but 100000 genes, reflecting not their similarity in electronic configuration but their evolutionary and functional relationship. Bioinformatics will be the tool of the modern scientist in interpreting this periodic table of biological information.
Genome Assembly

Genome assembly refers to the process of taking a large number of short DNA sequences, all of which were generated by a shotgun sequencing project, and putting them back together to create a representation of the original chromosomes from which the DNA originated3. In a shotgun sequencing project, the entire DNA from a source (usually a single organism, anything from a bacterium to a mammal) is first fractured into millions of small pieces. These pieces are then "read" by automated sequencing machines, which can read up to 900 nucleotides or bases at a time. (The four bases are adenine, guanine, cytosine, and thymine, represented as AGCT.) A genome assembly algorithm works by taking all the pieces and aligning them to one another, and detecting all places where two of the short sequences, or reads, overlap. These overlapping reads can be merged together, and the process continues4.

Original DNA is broken into a collection of fragments

The ends of each fragment (drawn in green) are sequenced

3 http://www.cbcb.umd.edu/research/assembly_primer.shtml (111008)
4 http://bioinfo.mbb.yale.edu/course/projects/final-4/ (111008)
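The overlap-and-merge idea described above can be sketched as a small greedy assembler. This is only a minimal illustration under simplifying assumptions (error-free reads, a single strand, no repeats, no read contained inside another); it is not the algorithm used by any of the production assemblers named in this document, and the function names and the toy reads are invented for the example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.
    Returns 0 when the best overlap is shorter than `min_len`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap
    until no pair overlaps by at least `min_len` bases."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # remaining reads form separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three toy reads covering a 16-base "genome".
contigs = greedy_assemble(["AGCTTAGG", "TAGGCCAT", "CCATGGAC"])
print(contigs)
```

Each value left in the returned list corresponds to one contig; reads that share no sufficiently long overlap simply stay separate.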
The sequence reads are assembled together based on sequence similarity

Assembly Statistics

The assembler relies on the basic assumption that two sequence reads (two strings of letters produced by the sequencing machine) that share the same string of letters originated from the same place in the genome (see the figure above)5. Using such overlaps between the sequences, the assembler can join the sequences together in a manner similar to solving a jigsaw puzzle. It is important to note that the shotgun sequencing process is inherently "wasteful" as, due to the randomness of the shearing process, assembly is only possible once enough sequences are generated to cover the genome 8 to 10 times. Intuitively, this phenomenon can be understood by thinking of a sidewalk as it begins to rain. As raindrops fall randomly across the sidewalk, dry spots persist for quite a while, corresponding to regions of the genome that are not represented in the set of shotgun reads. Mathematically, this phenomenon was modelled by Eric Lander and Michael Waterman in 1988. They examined the correlation between the oversampling of the genome (coverage) and the number of contiguous pieces of DNA (contigs) that can be reconstructed by an idealized assembly program. The graph below shows a plot of the Lander-Waterman equation for a genome of 1 Mbp (1000000 base pairs). Between 8- and 10-fold coverage the model predicts that most of the genome will be assembled into a small number of contigs (approx. 5 for a 1 Mbp genome).

5 Bioinformatics for Geneticists: A Bioinformatics Primer for the Analysis of Genetic Data. Michael R. Barnes. 2007. John Wiley and Sons. London, UK.
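The Lander-Waterman estimate mentioned above can be evaluated with a few lines of code. In its commonly quoted form the expected number of contigs is N·e^(-c·σ), where N is the number of reads, c = N·L/G is the coverage, L the read length, G the genome length, and σ = 1 - T/L with T the minimum overlap needed to join two reads. The read length (800 bases) and minimum overlap (40 bases) below are illustrative assumptions, not values taken from this document.

```python
import math


def expected_contigs(genome_len, read_len, coverage, min_overlap=0):
    """Lander-Waterman expected number of contigs: N * exp(-c * sigma).
    N = c * G / L reads; sigma = 1 - T / L, where T is the minimum
    overlap required to detect that two reads belong together."""
    n_reads = coverage * genome_len / read_len
    sigma = 1.0 - min_overlap / read_len
    return n_reads * math.exp(-coverage * sigma)


# Assumed parameters: 1 Mbp genome, 800-base reads, 40-base minimum overlap.
for c in (1, 2, 4, 6, 8, 10):
    print(c, round(expected_contigs(1_000_000, 800, c, 40), 1))
```

With these assumed parameters the estimate falls to roughly five contigs at 8-fold coverage and below one at 10-fold, which is in line with the "approx. 5 for a 1 Mbp genome" figure quoted in the text.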
Lander-Waterman estimation of number of contigs w.r.t. genome coverage

Assembly Challenges6

Ideally, an assembly program should produce one contig for every chromosome of the genome being sequenced. In all but the simplest cases, however, many contigs are produced due to a combination of factors. Even at 8-10 fold coverage, there is a non-zero probability that some portion of the genome remains unsequenced. More important, however, is the fact that the distribution of the sheared fragments along the genome cannot be modelled as a perfect Poisson process. Sanger sequencing requires many copies of each fragment in order for the sequencing chemistry to be possible. Each shotgun fragment must be cloned, a procedure usually performed by inserting the fragment into the cell of the Escherichia coli bacterium (called a vector) and allowing this bacterium to grow, thereby replicating the fragment as E. coli replicates its own genome. In most genomes, however, certain regions are toxic to the E. coli bacterium, leading to the presence of gaps in the coverage.

6 http://www.cbcb.umd.edu/research/assembly_primer.shtml (111008)
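To put a number on the "non-zero probability" of unsequenced regions, the idealized Poisson model can be used: if reads start at uniformly random positions, a given base is missed by every read with probability e^(-c) at coverage c. As the text stresses, real fragment distributions are not perfectly Poisson, so the sketch below should be read as an optimistic estimate; the function name is an assumption for illustration.

```python
import math


def expected_unsequenced_bases(genome_len, coverage):
    """Expected number of bases covered by no read, assuming read start
    positions follow an ideal Poisson process (edge effects ignored)."""
    return genome_len * math.exp(-coverage)


for c in (2, 4, 8, 10):
    print(c, round(expected_unsequenced_bases(1_000_000, c)))
```

Even under this idealization a 1 Mbp genome at 8-fold coverage is expected to leave a few hundred bases uncovered, and cloning bias can only make the gaps larger.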
The ability of an assembly program to produce a single contig is also limited by regions of the genome that occur in multiple near-identical copies throughout the genome (repeats). The reads originating from different copies of a repeat appear identical to the assembler and cause assembly errors. A simple example:

Two copies of a repeat along a genome. The reads coloured in red and those coloured in yellow appear identical to the assembly program.

Genome misassembled due to a repeat. The assembly program incorrectly combined the reads from the two copies of the repeat, leading to the creation of two separate contigs.

Assembly software

Originally, most large-scale DNA sequencing centres developed their own software for assembling the sequences that they produced. However, this has changed as the software has grown more complex and as the number of sequencing centres has increased. AMOS (A Modular, Open-Source assembler)7 is a well-known open source effort to bring together the efforts of leading genome assembly code developers. AMOS was initiated at The Institute for Genomic Research by Steven Salzberg, Mihai Pop, and Art Delcher, who are now at the University of Maryland. Among the list of available assemblers are:

1. Phrap - assembly program developed at the University of Washington. Despite its age, phrap is one of the most widely used assembly programs. Most notably, phrap was the main workhorse in the public effort to sequence the human genome.
2. The Celera Assembler - assembly program developed at Celera Genomics, used throughout the years in the assembly of many bacterial and eukaryotic genomes. Celera Assembler demonstrated the applicability of the shotgun method to the assembly of a whole eukaryotic genome by successfully assembling the genome of the fruit fly Drosophila melanogaster. Celera Assembler was a key element in the successful assembly of the human genome by Celera Genomics and is currently used in numerous bacterial and eukaryotic projects.
3. The Arachne - program developed at the Broad Institute of MIT, widely used in genome projects both at the Broad Institute and other research organizations. Arachne and Celera Assembler are arguably the best assemblers available to the scientific community for the assembly of large eukaryotic genomes.
4. TIGR Assembler - assembly program developed at the Institute for Genomic Research (TIGR). This assembler was used to generate the first sequence of a free living organism, Haemophilus influenzae, an accomplishment reported in the journal Science in 1995.

7 http://amos.sourceforge.net/ (111008)
Genome Annotation

Genome annotation is the process of attaching biological information to sequences. It consists of two main steps:

1. identifying elements on the genome, a process called gene finding, and
2. attaching biological information to these elements.

Automatic annotation tools try to perform all this by computer analysis, as opposed to manual annotation (also called curation), which involves human expertise. Ideally, these approaches co-exist and complement each other in the same annotation pipeline. The basic level of annotation is using BLAST for finding similarities, and then annotating genomes based on that. However, nowadays more and more additional information is added to the annotation platform. The additional information allows manual annotators to deconvolute discrepancies between genes that are given the same annotation. For example, the SEED database uses genome context information, similarity scores, experimental data, and integrations of other resources to provide the most accurate genome annotations through their Subsystems approach. The Ensembl database relies on both curated data sources and a range of different software tools in its automated genome annotation pipeline8.

Structural annotation consists in the identification of genomic elements:
• Open reading frames (ORFs) - portion of an organism's genome which contains a sequence of bases that could potentially encode a protein - and their localisation,
• gene structure,
• coding regions,
• location of regulatory motifs.

Functional annotation consists in attaching biological information to genomic elements:
• biochemical function
• biological function
• involved regulation and interactions
• expression

These steps may involve both biological experiments and in silico (performed on computer or via computer simulation) analysis. A variety of software tools have been developed to permit scientists to view and share genome annotations9.

Genome annotation is the next major challenge for the Human Genome Project, now that the genome sequences of human and several model organisms are largely complete. Identifying the locations of genes and other genetic control elements is often described as defining the biological "parts list" for the assembly and normal operation of an organism. Scientists are still at an early stage in the process of delineating this parts list and in understanding how all the parts "fit together".

Genome annotation is an active area of investigation and involves a number of different organizations in the life science community which publish the results of their efforts in publicly available biological databases accessible via the web and other electronic means:

1. Encyclopedia of DNA Elements (ENCODE) - aims to identify all functional elements in the human genome sequence. The pilot phase of the project is focused on a specified 30 megabases (1%) of the human genome sequence and is organized as an international consortium of computational and laboratory-based scientists working to develop and apply high-throughput approaches for detecting all sequence elements that confer biological function. The results of this pilot phase will guide future efforts to analyze the entire human genome.
2. Ensembl - joint project between EMBL-EBI and the Sanger Institute to develop a software system which produces and maintains automatic annotation on selected eukaryotic genomes.
3. Gene Ontology Consortium - collaborative effort to address the need for consistent descriptions of gene products in different databases. Since 1998, the GO Consortium has grown to include many databases, including several of the world's major repositories for plant, animal and microbial genomes.
4. RefSeq - non-redundant collection of richly annotated DNA, RNA, and protein sequences from diverse taxa. The collection includes sequences from plasmids, organelles, viruses, archaea, bacteria, and eukaryotes. Each RefSeq represents a single, naturally occurring molecule from one organism.
5. Uniprot - provides the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information.
6. Vertebrate and Genome Annotation Project (Vega) - central repository for high quality, frequently updated, manual annotation of vertebrate finished genome sequence.

8 http://www.seedling.org/IJDC/DB/ (121008)
9 http://insilico.ehu.es/ (121008)
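The structural-annotation step of locating open reading frames, described in the annotation section above, can be illustrated with a deliberately simple scanner. It is a sketch under stated assumptions: it uses the standard start codon ATG and stop codons TAA, TAG and TGA, looks only at the three forward-strand reading frames, and ignores introns, so it is far from a real gene finder. The function name and the test sequence are invented for the example.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end) index pairs of ORFs on the forward strand.
    `end` is exclusive and includes the stop codon; `min_codons` counts
    codons from the start codon up to (not including) the stop."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == START:
                j = i + 3
                while j + 3 <= len(seq):
                    if seq[j:j + 3] in STOPS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        break
                    j += 3
                i = j + 3  # resume scanning after this ORF (or at the end)
            else:
                i += 3
    return orfs


seq = "CCATGAAATTTTAGGG"
for start, end in find_orfs(seq):
    print(start, end, seq[start:end])
```

A production pipeline would also scan the reverse complement and score candidates against known proteins (for example with BLAST, as the text notes) before reporting them as genes.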
Plant Genome Sequencing Project – Rice

Rice is a cereal foodstuff which forms an important part of the diet of many people worldwide. It is a staple food for a large part of the human population, especially in Latin America, East Asia, South Asia, Southeast Asia, and Africa, making it the second-most consumed cereal grain. It provides more than one fifth of the calories consumed worldwide by humans. In early 2008, some governments and retailers began rationing supplies of the grain due to fears of a global rice shortage.

Potential of rice
Improve nutrition
Boost food security
Foster rural development
Support sustainable land care

There are two species of domesticated rice in the Poaceae (“true grass”) family:
Oryza sativa: native to tropical and subtropical southern Asia.
Oryza glaberrima (African rice): native to West Africa.

Besides domesticated rice there is also wild rice. The name is usually used for species of the different but related genus Zizania, both wild and domesticated, although the term may also be used for primitive or uncultivated varieties of Oryza.

Rice’s life10

Rice is grown as a monocarpic annual plant, although in tropical areas it can survive as a perennial, produce a ratoon crop and survive for up to 20 years. It can grow to 1-1.8 m tall, occasionally more depending on the variety and soil fertility. It has long, slender leaves 50-100 cm long and 2-2.5 cm broad. The small wind-pollinated flowers are produced in a branched arching to pendulous inflorescence 30-50 cm long. The edible seed is a grain 5-12 mm long and 2-3 mm thick.

10 Cereal Genomics. P. K. Gupta, Rajeev K. Varshney. P. K. Gupta (contributor). 2004. Springer. London, UK
Rice cultivation is well-suited to countries and regions with low labour costs and high rainfall, as it is very labour-intensive to cultivate and requires plenty of water. On the other hand, mechanized cultivation is extremely oil-intensive, more than other food products with the exception of beef and dairy products. Rice can be grown practically anywhere, even on a steep hill or mountain, although its species are native to South Asia and certain parts of Africa.

How rice is cultivated

The fields are flooded while, or after, setting the young seedlings. This method requires sound planning and servicing of the water damming and channelling, but reduces the growth of less robust weed and pest plants that have no submerged growth state, and deters vermin. While flooding is not mandatory for growing rice, all other methods of irrigation require higher effort in weed and pest control during growth periods and a different approach for fertilizing the soil.

Genetic History

As noted, two species of rice were domesticated, O. sativa and O. glaberrima. Current genetic analysis suggests that O. sativa is best divided into five groups, labelled indica, aus, aromatic, temperate japonica and tropical japonica. Other studies have suggested that there are three groups of Oryza sativa cultivars: the short-grained “japonica” or “sinica” varieties, exemplified by Japanese rice; the long-grained “indica” varieties, exemplified by Basmati rice; and the broad-grained “javanica” varieties, which thrive under tropical conditions. Further analysis of the genetic material of various types of rice indicates that the genomes of these cultivar groups show significant differences in age. According to Londo and Chiang, O. sativa appears to have been domesticated from the crop wild relative Oryza rufipogon around the foothills of the Himalayas, with O. sativa var indica on the Indian side and O. sativa var japonica on the Chinese and Japanese side.

Environmental Effect

In many countries where rice is the main cereal crop, rice cultivation is responsible for most of the methane emissions. Farmers in some of the arid regions try to cultivate rice using groundwater bored through pumps, thus increasing the chances of famine in the long run. Rice also requires much more water to produce than other grains. As sea levels rise, rice will become more inclined to remain flooded for longer periods of time11. Longer stays in water cut the soil off from atmospheric oxygen and cause fermentation of organic matter in the soil. During the wet season, rice cannot hold the carbon in anaerobic conditions. The microbes in the soil convert the carbon into methane, which is then released through the respiration of the rice plant or through diffusion of water. Methane is twenty times more effective as a greenhouse gas than carbon dioxide is. A further rise in sea level of 10-85 centimetres would then stimulate the release of more methane into the air by rice plants.

11 http://www.fao.org/rice2004/en/rice4.htm (121008)

Pests and Disease

Rice pests are any organisms or microbes with the potential to reduce the yield or value of the rice crop. Rice pests include weeds, pathogens, insects, rodents and birds. A variety of factors can contribute to pest outbreaks, including the overuse of pesticides and high rates of nitrogen fertilizer application. Weather conditions also contribute to pest outbreaks.

One of the challenges facing crop protection specialists is to develop rice pest management techniques which are sustainable; in other words, to manage crop pests in such a manner that future crop production is not threatened. Rice pests are managed by cultural techniques, pest-resistant rice varieties and pesticides. Increasingly, there is evidence that farmers' pesticide applications are often unnecessary. By reducing the populations of natural enemies of rice pests, misuse of insecticides can actually lead to pest outbreaks12. Botanicals, so-called “natural pesticides”, are used by some farmers in an attempt to control rice pests, but in general the practice is not common. Upland rice is grown without standing water in the field. Some upland rice farmers in Cambodia spread chopped leaves of the bitter bush over the surface of fields after planting. The practice probably helps the soil retain moisture and thereby facilitates seed germination. Farmers also claim the leaves are a natural fertilizer and help suppress weed and insect infestations.

Among rice cultivars there are differences in the responses to, and recovery from, pest damage. Therefore, particular cultivars are recommended for areas prone to certain pest problems. The genetically based ability of a rice variety to withstand pest attack is called resistance. Three main types of plant resistance to pests are recognized:
Nonpreference: host plants which insects prefer to avoid
Antibiosis: where insect survival is reduced after the ingestion of host tissues
Tolerance: the capacity of a plant to produce high yield or retain high quality despite insect infestation

Over time, the use of pest-resistant rice varieties selects for pests that are able to overcome these mechanisms of resistance. When a rice variety is no longer able to resist pest infestations, resistance is said to have broken down. Rice varieties that can be widely grown for many years in the presence of pests and retain their ability to withstand the pests are said to have durable resistance.

Major rice pests include:
The brown planthopper
Armyworms
The green leafhopper
The rice gall midge
The rice bug
Hispa
The rice leaffolder
Stemborer
Rats
The weed Echinochloa crusgali

Rice weevils are also known to be a threat to rice crops in the US, PR China and Taiwan.

12 Pest Management of Rice Farmers in Asia. Kong Luen Heong, M. M. Escalada. 1997. International Rice Research Institute. Manila, Philippines.
Major rice diseases:
Rice ragged stunt
Sheath blight
Tungro
Rice blast, caused by the fungus Magnaporthe grisea, is the most significant disease affecting rice cultivation.

Cultivars

While most breeding of rice is carried out for crop quality and productivity, there are varieties selected for other reasons. Cultivars exist that are adapted to deep flooding, and these are generally called “floating rice”. The largest collection of rice cultivars is at the International Rice Research Institute (IRRI), with over 100,000 rice accessions held in the International Rice Genebank13. Rice cultivars are often classified by their grain shapes and texture. For example:

Thai Jasmine rice: long-grain and relatively less sticky, as long-grain rice contains less amylopectin than short-grain cultivars. Chinese restaurants usually serve long-grain as plain unseasoned steamed rice.
Japanese mochi rice & Chinese sticky rice: short-grain. Chinese people use sticky rice, which is properly known as “glutinous rice”, to make zongzi.
Japanese table rice: sticky, short-grain rice.
Japanese sake rice: another kind as well.
Indian rice: long-grained and aromatic Basmati
Patna rice: long and medium-grained
Sona masoori: short-grained
Ponni: grown in the delta regions of Kaveri River
Ambemohar: fragrance of Mango blossom
Aromatic rices: have definite aromas and flavours
a) Thai fragrant rice
b) Patna rice
c) Basmati - have a mild popcorn-like aroma and flavour
d) Texmati

13 http://beta.irri.org/statistics/index.php?option=com_frontpage&Itemid=1 (131008)

Biotechnology

High Yielding Varieties

The high yielding varieties are a group of crops created intentionally during the Green Revolution to increase global food production. Rice, like corn and wheat, was genetically manipulated to increase its yield. This project enabled labour markets in Asia to shift away from agriculture and into industrial sectors.
The first “modern rice”, IR8, was produced in 1966 at the International Rice Research Institute, which is based in the Philippines at the University of the Philippines’ Los Banos site. IR8 was created through a cross between an Indonesian variety named “Peta” and a Chinese variety named “Dee Geo Woo Gen”.

Potential for the Future

As the UN Millennium Development project seeks to spread global economic development to Africa, the ‘Green Revolution’ is cited as the model for economic development. With the intent of replicating the successful Asian boom in agronomic productivity, groups like the Earth Institute are doing research on African agricultural systems, hoping to increase productivity. An important way this can happen is the production of ‘New Rices for Africa’ (NERICA). These rices, selected to tolerate the low input and harsh growing conditions of African agriculture, are produced by the African Rice Center and billed as technology from Africa, for Africa. The NERICA have appeared in The New York Times (October 10, 2007) and International Herald Tribune (October 9, 2007), trumpeted as miracle crops that will dramatically increase rice yield in Africa and enable an economic resurgence.

Golden Rice

German and Swiss researchers have engineered rice to produce beta-carotene, with the intent that it might someday be used to treat vitamin A deficiency. Additional efforts are being made to improve the quantity and quality of other nutrients in golden rice. The addition of the carotene turns the rice gold.

Expression of Human Protein

Ventria Bioscience has genetically modified rice to express lactoferrin, lysozyme, and human serum albumin, which are proteins usually found in breast milk. These proteins have antiviral, antibacterial, and antifungal effects. Rice containing these added proteins can be used as a component in oral rehydration solutions which are used to treat diarrheal diseases, thereby shortening their duration and reducing recurrence. Such supplements may also help reverse anemia.

Genome Project

Rice (Oryza sativa) is a model species for monocotyledonous plants, especially for members of the grass family. Several attributes such as small genome size, diploid nature, transformability, and establishment of genetic and molecular resources make it a tractable organism for plant biologists. With an estimated genome size of 430 Mb, it is feasible to obtain the complete genome sequence of rice using current technologies. An international effort has been established and is in the process of sequencing O. sativa ssp. japonica var "Nipponbare" using a bacterial artificial chromosome/P1 artificial chromosome shotgun sequencing strategy. Annotation of the rice genome is performed using prediction-based and homology-based searches to identify genes. Annotation tools such as optimized gene prediction programs are being developed for rice to improve the quality of annotation. Resources are also being developed to leverage the rice genome sequence to partial genome projects such as expressed sequence tag projects, thereby maximizing the output from the rice genome project. To provide a low level of annotation for rice genomic sequences, we have aligned all rice bacterial artificial chromosome/P1 artificial chromosome sequences with The Institute of Genomic Research Gene Indices, a set of nonredundant transcripts generated from nine public plant expressed sequence tag projects (rice, wheat, sorghum, maize, barley, Arabidopsis, tomato, potato, and barrel medic). In addition, data from The Institute of Genomic Research Gene Indices and the Arabidopsis and Rice Genome Projects was used to identify putative orthologues and paralogues among these nine genomes.

Rice is the first crop plant whose complete genetic sequence, or genome, has been compiled and placed in computer data banks around the world. It will be a key tool for researchers working on improved strains of rice and other grains as they struggle to stay ahead of human population growth. "You could equate this to being as important as the Human Genome Project," which recently compiled a human genetic map, said Rod Wing, a scientist at the University of Arizona who was a key participant in the rice project. The poorest of the poor are the ones that depend on rice the most. "This is really a project that can lead to important discoveries and findings that can help the condition of the poor."14

The number of people in the world is expected to increase 50 percent, to 9 billion, by the middle of this century. Much of that growth will come in Asian countries where rice is the dietary staple. The new map will make it possible, in theory, to perform sophisticated genetic manipulations of the rice plant, including introducing genes from other species to create desirable traits. For example, one project introduced a daffodil gene into rice to turn the plant into a source of vitamin A, which it normally lacks. But that kind of work has been controversial, and how many countries will embrace it remains to be seen.

14 Washington Post, 11 August 2005
More important in the short term, completion of the rice genome is expected to speed conventional breeding programs, allowing researchers to produce rice strains that resist drought and disease and that grow in colder climates and at higher elevations. Those are critical needs as Asia's rapid urbanization reduces the land available for rice cultivation. Already, using the new map, researchers in Japan are hot on the trail of genetic variations that might allow rice to grow in colder climates, while the Rockefeller Foundation is funding work in the Philippines and other countries on strains that could yield enough even in drought years to keep a farm family from starving.

Rice is the principal source of calories for about half the world's population, and the United Nations Food and Agriculture Organization projects that demand will rise sharply in coming decades. Cheaper, more abundant rice is seen as one of the keys to reducing hunger worldwide. Rice is a minor component of most diets in the developed world, but it supplies most daily calories for people in Asia who remain in poverty. It is critically important to poor people in Latin America, and its importance is rising rapidly in urban Africa, where it is being embraced as easier to prepare than many traditional African foods.

The International Rice Genome Sequencing Project began in 1998. It was led by scientists in Japan but involved teams from the United States, China, Taiwan, Korea, India, Thailand, France, Brazil and Britain. It cost more than $100 million. The Rockefeller Foundation of New York, which for decades has funded research aimed at feeding the world, helped get the project off the ground. A lot of the work was done in Rockville at the Institute for Genomic Research, an independent genetics laboratory founded by maverick scientist J. Craig Venter. Two Western agricultural companies, Monsanto Co. of St. Louis and Syngenta AG of Basel, Switzerland, contributed genetic information that moved up completion of the project by at least a year.

Seed rice is not a major product for companies like Monsanto and Syngenta, but the plant is vitally important to them nonetheless. It is a crucial model for understanding the biology of all cereals. The great cereals whose cultivation made human civilization possible (rice, wheat and corn are the most important) descended from a common ancestor, a wild grass that lived more than 50 million years ago. That makes the cereals close genetic relatives, and rice, with the smallest genome, proved to be the easiest to analyze. Building on their success with rice, scientists are about to tackle the far larger genome of corn, the most important commodity in U.S. agriculture.

Companies like Syngenta and Monsanto have brought genetically modified strains of corn and other crops to market. They have been embraced by U.S. farmers, but vociferously rejected by consumers in Europe, Japan and other places. Availability of the rice genome will make such genetic manipulation easier in all the cereals, but, by giving scientists more precise knowledge of how the plants work, it may also help to reduce some of the theoretical risks that have led to controversy.

Scientists now have a rice genome with but a few gaps. It is a map of the Nipponbare strain of white rice grown in Japan, though the many other strains of rice (red rice, brown rice, basmati rice, purple rice) are expected to be similar. While the map is an important achievement, scientists said, it also means that an immense new task opens before the world's plant biologists. They need to learn to read the genetic messages and understand how the proteins in rice interact with one another, which is likely to take decades.
"Our work is not over," said Takuji Sasaki, vice president of the National Institute of Agrobiological Sciences in Tsukuba, Japan, and principal leader of the rice genome project. "It's just starting."

Issues and Controversies in Plant Genome Project

Safety
• Potential human health impacts, including allergens, transfer of antibiotic resistance markers, unknown effects.
• Potential environmental impacts, including: unintended transfer of transgenes through cross-pollination, unknown effects on other organisms (e.g., soil microbes), and loss of flora and fauna biodiversity.

Access and Intellectual Property
• Domination of world food production by a few companies.
• Increasing dependence on industrialized nations by developing countries.
• Biopiracy or foreign exploitation of natural resources.

Ethics
• Violation of natural organisms' intrinsic values.
• Tampering with nature by mixing genes among species.
• Objections to consuming animal genes in plants and vice versa.
• Stress for animals.

Labeling
• Not mandatory in some countries (e.g., United States)
• Mixing GM crops with non-GM products confounds labeling attempts

Society
• New advances may be skewed to interests of rich countries
Animal Genome Sequencing Project – House Mouse

Introduction

Mus musculus may have originally been distributed from the Mediterranean region to China, but it has now been spread throughout the world by humans and lives as a human commensal. House mice generally live in close association with humans: in houses, barns, granaries, etc. They also occupy cultivated fields, fencerows, and even wooded areas, but they seldom stray far from buildings. Because of their association with humans, house mice have been able to inhabit inhospitable areas (such as tundra and desert) which they would not be able to occupy independently.

House mice are from 65 to 95 mm long from the tip of their nose to the end of their body; their tails are 60 to 105 mm long. They range from 12 to 30 g in weight. Their fur ranges in colour from light brown to black, and they generally have white or buffy bellies. They have long tails that have very little fur and have circular rows of scales (annulations). House mice tend to have longer tails and darker fur when living closely with humans. Many domestic forms of mice have been developed that vary in colour from white to black and with spots.

House mice have a polygynous mating system. The recent discovery of ultrasonic songs produced by male mice when exposed to female sex pheromones suggests that this behavior may be involved in mate choice. Mus musculus is characterized by tremendous reproductive potential. Breeding occurs throughout the year, although wild mice may have a reproductive season extending only from April to September. The estrous cycle is 4-6 days long, with estrus lasting less than a day. Females experience a postpartum estrus 12-18 hours after giving birth. Females generally have 5-10 litters per year if conditions are suitable, but as many as 14 have been reported. Gestation is 19-21 days but may be extended by several days if the female is lactating. Litters consist of 3-12 (generally 5 or 6) offspring, which are born naked and blind. They are fully furred after 10 days, open their eyes at 14 days, are weaned at 3 weeks, and reach sexual maturity at 5-7 weeks. Young mice are cared for in their mother's nest until they reach 21 days old. Soon after this most young mice leave their mother's territory, though young females are more likely to stay nearby.

Average life span is about 2 years in captivity, but individuals have lived for as long as 6 years. Mutant and calorie-restricted captive individuals have lived for as long as 5 years, and wild-derived captive Mus musculus individuals have lived up to 4 years. In the wild, most mice do not live beyond 12-18 months.

Behaviour

In the wild state, house mice generally dwell in cracks in rocks or walls or make underground burrows consisting of a complex network of tunnels, several chambers for nesting and storage, and three or four exits. When living with humans, house mice nest behind rafters, in woodpiles, storage areas, or any hidden spot near a source of food. They construct nests from rags, paper, or other soft substances and line them with finer shredded material. House mice are generally nocturnal, although some are active during the day in human dwellings. House mice are quick runners (up to 8 miles per hour), good climbers and jumpers, and also swim well. Despite this, they rarely travel more than 50 feet from their established homes.

Mus musculus is generally considered both territorial and colonial when living commensally with humans. Territoriality is not as pronounced in wild conditions, however. Dominant males set up a territory including a family group of several females and their young. Occasionally, subordinate males may occupy a territory or males may share territories. Females establish a loose hierarchy within the territories, but they are far less aggressive than males. Aggression within family groups is rare, but all the individuals in a territory will defend an area against outsiders. Young mice are generally made to disperse through adult aggression, although some (especially females) may remain in the vicinity of their parents.

Communication and Perception

House mice have excellent vision and hearing and a keen sense of smell, and they use their whiskers to feel air movements and surface textures. House mice often squeak to each other in the nest. They use pheromones and other smells to communicate with each other about social dominance, family composition, and reproductive readiness. It was recently discovered that male mice produce complex, ultrasonic songs in response to female sex pheromones.

Food Habits

In the wild, house mice eat many kinds of plant matter, such as seeds, fleshy roots, leaves and stems. Insects (beetle larvae, caterpillars, and cockroaches) and meat (carrion) may be taken when available. In human habitation, Mus musculus consumes any human food that is accessible as well as glue, soap, and other household materials. Many mice store their food or live within a human food storage facility.

Predation

House mice are eaten by a wide variety of small predators throughout the world, including cats, foxes, weasels, ferrets, mongooses, large lizards, snakes, hawks, falcons, and owls. House mice try to avoid predation by keeping out of the open and by being fast. They are also capable of reproducing very rapidly, which means that populations can recover quickly from predation.

Ecosystem Roles

Where house mice are abundant they can consume huge quantities of grains, making these foods unavailable to other (perhaps native) animals. House mice are also important prey items for many small predators.

Economic Importance for Humans

House mice do not cause such serious health and economic problems as do Rattus norvegicus and Rattus rattus. Mice are agricultural pests in some areas, however, and they do consume and contaminate stored human food with their droppings. They also destroy woodwork, furniture, upholstery, and clothing. In addition, they contribute to the spread of diseases such as murine typhus, rickettsial pox, tularemia, food poisoning (Salmonella), and bubonic plague. Recent research has also shown that they carry a virus, the mouse mammary tumour virus (MMTV) [15], that may contribute to breast cancer in humans. Domesticated forms and albinos have been developed which are commonly used as laboratory animals (especially in medicine and genetics), and as pets. Mus musculus also has a small role as an insect destroyer, but this is minimal.

"Dancing" and "singing" mice are other names for house mice. The former refers to a genetic strain with inner ear defects, causing the mice to weave, turn in circles, and wobble when they walk. The latter refers to a pathological condition causing mice to twitter constantly with a "song" resembling that of a cricket. Mus musculus often refers to several fairly distinct kinds of mice. As many as seven separate species may be placed under Mus musculus, such as Mus domesticus, western European house mice, and Mus castaneus, southeastern Asian house mice.

[15] The Mouse in Animal Genetics and Breeding Research. Eugene J. Eisen (contributor). 2005. Imperial College Press. London.

Genome Project

Sequencing of the mouse genome was completed in late 2002. The haploid genome is about 3 billion bases long (3000 Mb distributed over 20 chromosomes) and is therefore roughly comparable in size to the human genome. Estimating the number of genes contained in the mouse genome is difficult, in part because the definition of a gene is still being debated and extended. The current estimated gene count is 23,786. This estimate takes into account knowledge of molecular biology as well as comparative genomic data. For comparison, humans are estimated to have 23,686 genes.

The mouse genome is essentially a reference manual for understanding the human genome. Comparing humans and mice has the potential to reveal key features of mammalian biology, and more insights will emerge as more genomes are completed [17]. Researchers report that approximately 99 percent of mouse genes have counterparts in humans. Virtually every gene in the mouse is also present in humans, and the neighbourhoods in which these genes reside are strikingly similar in humans and mice, although the mouse genome is fourteen percent smaller than the human genome. The genes in humans and mice are essentially the same genes: they were inherited from a common mammalian ancestor millions of years ago. However, evolution changes genomes through the duplication and specialisation of genes. In fact, though many human and mouse genes appear to be similar, they may have taken on slightly different roles, or be active at different times during the life of a person or a mouse. Although both man and mouse share genes, they also share 'nongene' regions that may regulate genes, and these could be critical to understanding why humans develop certain diseases [16].

Researchers state that having a publicly available mouse genome sequence draft means we can move from knowing that a general region of the genome is contributing to a disease state or biological process, to actually looking at that region and seeing directly what genes are there. It will save investigators months, if not years, of gene-hunting effort.

[16] Nature 420(6915): 520-562 (5 December 2002).
[17] Digital Code of Life: How Bioinformatics is Revolutionizing Science, Medicine, and Business. Glyn Moody. 2004. John Wiley and Sons. New York.
Human Genome Sequencing Project

Introduction

Cells are the fundamental working units of every living system. All the instructions needed to direct their activities are contained within the chemical DNA (deoxyribonucleic acid). DNA from all organisms is made up of the same chemical and physical components. The DNA sequence is the particular side-by-side arrangement of bases along the DNA strand (e.g., ATTCCGGA). This order spells out the exact instructions required to create a particular organism with its own unique traits.

The genome is an organism's complete set of DNA. Genomes vary widely in size: the smallest known genome for a free-living organism (a bacterium) contains about 600,000 DNA base pairs, while human and mouse genomes have some 3 billion. Except for mature red blood cells, all human cells contain a complete genome.

DNA in the human genome is arranged into 24 distinct chromosomes: physically separate molecules that range in length from about 50 million to 250 million base pairs. A few types of major chromosomal abnormalities, including missing or extra copies or gross breaks and rejoinings (translocations), can be detected by microscopic examination. Most changes in DNA, however, are more subtle and require a closer analysis of the DNA molecule to find perhaps single-base differences.

Each chromosome contains many genes, the basic physical and functional units of heredity. Genes are specific sequences of bases that encode instructions on how to make proteins. Genes comprise only about 2% of the human genome; the remainder consists of non-coding regions, whose functions may include providing chromosomal structural integrity and regulating where, when, and in what quantity proteins are made. The human genome is estimated to contain 20,000-25,000 genes.

Although genes get a lot of attention, it's the proteins that perform most life functions and even make up the majority of cellular structures. Proteins are large, complex molecules made up of smaller subunits called amino acids. Chemical properties that distinguish the 20 different amino acids cause the protein chains to fold up into specific three-dimensional structures that define their particular functions in the cell.

The constellation of all proteins in a cell is called its proteome. Unlike the relatively unchanging genome, the dynamic proteome changes from minute to minute in response to tens of thousands of intra- and extracellular environmental signals. A protein's chemistry and behaviour are specified by the gene sequence and by the number and identities of other proteins made in the same cell at the same time and with which it associates and reacts. Studies to explore protein structure and activities, known as proteomics, will be the focus of much research for decades to come and will help elucidate the molecular basis of health and disease.

Whose genome was sequenced in the public (HGP) and private projects?

The human genome reference sequences do not represent any one person's genome. Rather, they serve as a starting point for broad comparisons across humanity. The knowledge obtained from the sequences applies to everyone because all humans share the same basic set of genes and genomic regulatory regions that control the development and maintenance of their biological structures and processes.
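To put the figures quoted in the introduction side by side, the following is a minimal sketch of the arithmetic. It assumes the rounded values given in the text (3 billion base pairs, 600,000 base pairs, about 2% genic sequence, 20,000-25,000 genes); the constant and function names are illustrative only, and the "base pairs per gene" values are a crude average, not a measured gene length.

```python
# Rounded figures taken from the text; they are approximations, not measurements.
HUMAN_GENOME_BP = 3_000_000_000        # "some 3 billion" base pairs
SMALLEST_FREE_LIVING_BP = 600_000      # smallest known free-living bacterium
GENIC_PERCENT = 2                      # genes comprise "about 2%" of the genome
GENE_COUNT_LOW, GENE_COUNT_HIGH = 20_000, 25_000


def genome_summary():
    """Derive a few back-of-the-envelope quantities from the constants above."""
    genic_bp = HUMAN_GENOME_BP * GENIC_PERCENT // 100
    return {
        # How many times larger the human genome is than the smallest one.
        "size_ratio": HUMAN_GENOME_BP // SMALLEST_FREE_LIVING_BP,
        # Base pairs that fall inside genes under the 2% assumption.
        "genic_bp": genic_bp,
        # Crude average genic base pairs per gene at each end of the estimate.
        "bp_per_gene_if_20k": genic_bp // GENE_COUNT_LOW,
        "bp_per_gene_if_25k": genic_bp // GENE_COUNT_HIGH,
    }


if __name__ == "__main__":
    for name, value in genome_summary().items():
        print(f"{name}: {value:,}")
```

Under these assumptions the human genome is a few thousand times larger than the smallest free-living genome, and only tens of millions of its base pairs lie within genes.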
In the international public-sector Human Genome Project (HGP), researchers collected blood (female) or sperm (male) samples from a large number of donors. Only a few samples were processed as DNA resources. Thus donors' identities were protected so neither they nor scientists could know whose DNA was sequenced. DNA clones from many libraries were used in the overall project.

Technically, it is much easier to prepare DNA cleanly from sperm than from other cell types because of the much higher ratio of DNA to protein in sperm and the much smaller volume in which purifications can be done. Sperm contain all chromosomes necessary for study, including equal numbers of cells with the X (female) or Y (male) sex chromosomes. However, HGP scientists also used white cells from female donors' blood to include samples originating from women.

Many polymorphisms (small regions of DNA that vary among individuals) also were identified during the HGP, mostly single nucleotide polymorphisms (SNPs). Most SNPs have no physiological effect, although a minority contributes to the beneficial diversity of humanity. A much smaller minority of polymorphisms affect an individual's susceptibility to disease and response to medical treatments.

Who sequenced the human genome?

Human Genome Project research was funded at many laboratories across the U.S. by the Department of Energy (DOE), the National Institutes of Health (NIH), or both. At any given time, the DOE Human Genome Project has funded about 100 principal investigators. Other researchers at numerous colleges, universities, and laboratories throughout the United States also have received DOE and NIH funding for human genome research [18]. At least 18 other countries have participated in the Human Genome Project. In addition, many large and small private U.S. companies are conducting genome research.

[18] http://www.energy.gov/sciencetech/genome.htm (181008)

Figure: Sets of human chromosomes

Mapping the Genome

One of the central goals of the Human Genome Project is to produce a detailed "map" of the human genome. One type, a genetic linkage map, is based on careful analyses of human inheritance patterns. It indicates for each chromosome the whereabouts of genes or other "heritable markers," with distances measured in centimorgans, a measure of recombination frequency. During the formation of sperm and egg cells, a process of genetic recombination, or "crossing over", occurs in which pieces of genetic material are swapped between paired chromosomes.
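As a rough illustration of how a linkage-map distance relates to recombination frequency, the sketch below converts offspring counts into centimorgans. It assumes the usual convention that one centimorgan corresponds to a 1% chance that two markers are separated by recombination in a single generation, which is a reasonable approximation only for closely linked markers. The function name and the counts in the usage line are hypothetical.

```python
def map_distance_cm(recombinant_offspring, total_offspring):
    """Estimate the genetic distance between two markers in centimorgans.

    Assumes 1 cM == 1% recombination frequency, which holds approximately
    for markers that lie close together on the same chromosome.
    """
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinant_offspring <= total_offspring:
        raise ValueError("recombinant_offspring must be between 0 and total_offspring")
    # Recombination frequency expressed as a percentage.
    return 100 * recombinant_offspring / total_offspring


if __name__ == "__main__":
    # Hypothetical pedigree data: 1 recombinant observed among 100 offspring.
    print(map_distance_cm(1, 100), "cM")
```

In practice geneticists pool many pedigrees and use statistical methods to handle uncertainty; this sketch shows only the basic proportionality.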
This process of chromosomal scrambling accounts for the differences invariably seen even in siblings (apart from identical twins). Logically, the closer two genes are to each other on a single chromosome, the less likely they are to get split up during genetic recombination. When they are close enough that the chances of being separated are only one in a hundred, they are said to be separated by a distance of one centimorgan.

The role of human pedigrees now becomes clear. By studying family trees and tracing the inheritance of diseases and physical traits, or even unique segments of DNA identifiable only in the laboratory, geneticists can begin to pin down the relative positions of these genetic markers. By the end of 1994, a comprehensive map was available that included more than 5800 such markers, including genes implicated in cystic fibrosis, myotonic dystrophy, Huntington disease, Tay-Sachs disease, several cancers, and many other maladies. The average gap between markers was about 0.7 centimorgan [19].

[19] Genome Analysis: A Laboratory Manual. Bruce W. Birren, Eric D. Green. 2003. CSHL Press. New York, USA.

Other maps are known as physical maps, so called because the distances between features are measured not in genetic terms, but in "real" physical units, typically numbers of base pairs. A close analogy can thus be drawn between physical maps and the road maps familiar to us all. Indeed, the analogy can be extended further. Just as small-scale road maps may show only large cities and indicate distances only between major features, so a low-resolution physical map includes only a relative sprinkling of chromosomal landmarks. A well-known low-resolution physical map, for example, is the familiar chromosomal map, showing the distinctive staining patterns that can be seen in the light microscope. Further, by a process known as in situ hybridization, specific segments of DNA can be targeted in intact chromosomes by using complementary strands synthesized in the laboratory. These laboratory-made "probes" carry a fluorescent or radioactive label, which can then be detected and thus pinpointed on a specific region of the chromosome.

Fortunately, means are also available to produce physical maps of much higher resolution, analogous to large-scale county maps that show every village and farm road and indicate distances at a similar level of detail. Just such a detailed physical map is one that emerges from the use of restriction enzymes, DNA-cleaving enzymes that serve as highly selective microscopic scalpels. A typical restriction enzyme known as EcoRI, for example, recognizes the DNA sequence GAATTC and selectively cuts the double helix at that site. One use of these handy tools involves cutting up a selected chromosome into small pieces, then cloning and ordering the resulting fragments. The cloning, or copying, process is a product of recombinant DNA technology, in which the natural reproductive machinery of a "host" organism (a bacterium or yeast, for example) replicates a "parasitic" fragment of human DNA, thus producing the multiple copies needed for further study.

By cloning enough such fragments, each overlapping the next and together spanning long segments (or even the entire length) of the chromosome, workers can eventually produce an ordered library of clones. Each contiguous block of ordered clones is known as a contig, and the resulting map is a contig map. If a gene can be localized to a single fragment within a contig map, its physical location is thereby accurately pinned down. Further, these conveniently sized clones become resources for further studies by researchers around the world, as well as the natural starting points for systematic sequencing efforts.

Two giant steps: Chromosomes 16 and 19

One of the signal achievements of the DOE genome effort so far is the successful physical mapping of chromosomes 16 and 19. The high-resolution chromosome 19 map, constructed at the Lawrence Livermore National Laboratory, is based on restriction fragments cloned in cosmids, synthetic cloning "vectors" modelled after bacteria-infecting viruses known as bacteriophages. Like a phage, a cosmid hijacks the cellular machinery of a bacterium to mass-produce its own genetic material, together with any "foreign" human DNA that has been smuggled into it.

The foundation of the chromosome 19 map is a large set of cosmid contigs that were assembled by automated analysis of overlapping but unordered restriction fragments. These contigs span an estimated 54 million base pairs, more than 95 percent of the chromosome, excluding the centromere. Most of the contigs have been mapped by fluorescence in situ hybridization (FISH) to visible chromosomal bands. Further, more than 200 cosmids have been more accurately ordered along the chromosome by a high-resolution FISH technique in which the distances between cosmids are determined with a resolution of about 50,000 base pairs. This ordered FISH map, with cosmid reference points separated by an average of 230,000 base pairs, provides the essential framework to which other cosmid contigs can be anchored. Moreover, the EcoRI restriction sites have been mapped on more than 45 million base pairs of the overall cosmid map.

Over 450 genes and genetic markers have also been localized on this map, of which nearly 300 have been incorporated into the ordered map. An emerging gene map shows the locations of the mapped genes. Among these genes is the one responsible for the most common form of adult muscular dystrophy (DM), which was identified in 1992 by an international consortium that included Livermore scientists. A second important disease gene (COMP), responsible for a form of dwarfism known as pseudoachondroplasia, has also been identified. And yet another gene, one linked to a form of congenital kidney disease, has been localized to a single contig spanning one million base pairs, but has not yet been precisely pinpointed. About 2000 other genes are likely to be found eventually on chromosome 19.

In a similar effort, the Los Alamos National Laboratory Center for Human Genome Studies has completed a highly integrated map of chromosome 16, a chromosome that contains genes linked to blood disorders, a second form of kidney disease, leukemia, and breast and prostate cancers. A readable display of this integrated map covers a sheet of paper more than 15 feet long; a portion of it, much reduced and showing only some of its central features, is reproduced here as "Mapping chromosome 16".

The framework for the Los Alamos effort is yet another kind of map, a "cytogenetic breakpoint map" based on 78 lines of cultured cells, each a hybrid that contains mouse chromosomes and a fragment of human chromosome 16 [20]. Natural breakpoints in chromosome 16 are thus identified, leading to a breakpoint map that divides the chromosome into segments whose lengths average 1.1 million base pairs. Anchored to this framework are a low-resolution contig map based on YAC clones and a high-resolution contig map based largely on cosmids. The low-resolution map, comprising 700 YACs from a library constructed by the Centre d'Etude du Polymorphisme Humain (CEPH), provides practically complete coverage of the chromosome, except the highly repetitive DNA in the centromere region. The high-resolution map comprises some 4000 cosmid clones, assembled into about 500 contigs covering 60 percent of the chromosome. In addition, it includes 250 smaller YAC clones that have been merged with the cosmid contig map.

[20] Encyclopedia of Human Biology. Renato Dulbecco. 1997. Academic Press. London, UK.

The cosmid contig map is an especially important step forward, since it is a "sequence-ready" map. It is based on bacterial clones that are ideal substrates for DNA sequencing, and further, these clones have been restriction mapped to allow identification of a minimum set of overlapping clones for a large-scale sequencing effort. The high- and low-resolution maps have been tied together by sequence-tagged sites (STSs), short but unique stretches of DNA sequence. They have also been integrated into the breakpoint map, and with genetic maps developed at the Adelaide Children's Hospital and by CEPH. The integrated map also includes a transcription map of 1000 sequenced exons (expressed fragments of genes) and more than 600 other markers developed at other laboratories around the world.

Getting down to details: Sequencing the genome

Ultimately, though, these physical maps and the clones they point to are mere stepping stones to the most visible goal of the genome project, the string of three billion characters (A's, T's, C's, and G's) representing the sequence of base pairs that defines our species. Should anyone undertake to print it all out, the result would fill several hundred volumes the size of a big-city phone book. Included, of course, would be the sequence for every gene, as well as the sequences for stretches of DNA whose functions we don't yet know (but which may be involved in such little-understood processes as orchestrating gene expression in different parts of our bodies, at different times of our lives).

Only the barest start has been made in taking this dramatic step in the Human Genome Project. Several hundred million base pairs have been sequenced and archived in databases, but the great majority of these are from short "sequence tags" on cloned fragments. Only about 30 million base pairs of human DNA (roughly one percent of the total) have been sequenced in longer stretches, the longest being about 685,000 base pairs long. Even more daunting is the realization that we will eventually need to sequence many parts of the genome many times, thus to reveal differences that indicate various forms of the same gene.

Hence, as with so many human enterprises, the challenge of sequencing the genome is largely one of doing the job cheaper and faster. At the beginning of the project, the cost of sequencing a single base pair was between $2 and $10, and one researcher could produce between 20,000 and 50,000 base pairs of continuous, accurate sequence in a year. Sequencing the genome by the year 2005 would therefore likely cost $10-20 billion and require a dedicated cadre of at least 5000 workers [21]. Clearly, a major effort in technology development was called for, an effort that would drive the cost well below $1 per base pair and that would allow automation of the sequencing process. From the beginning, therefore, the DOE has emphasized programs to pave the way for expeditious and economical sequencing efforts: programs to develop new technologies, including new cloning vectors, and to establish suitable resources for sequencing, including clone libraries and libraries of expressed sequences.

[21] Beyond the Molecular Frontier: Challenges for Chemistry and Chemical Engineering. National Research Council (U.S.), Committee on Challenges for the Chemical Sciences in the 21st Century. 2003. National Academies Press. New York.

Efforts to develop new cloning vectors have been especially productive. YACs remain a classic tool for cloning large fragments of human DNA, but they are not perfect. Some regions of the genome, for example, resist cloning in YACs, and others are prone to rearrangement. New vectors such as bacterial artificial chromosomes (BACs), P1 phages, and P1-derived artificial cloning systems (PACs) have thus been devised to address these problems. These new approaches are critical for ensuring that the entire genome can be faithfully represented in clone libraries, without the danger of deletions, rearrangements, or spurious insertions.

Marked progress is also evident in the development of sequencing technologies, though all of those in widespread current use are still based on methods developed in 1977 by Allan Maxam and Walter Gilbert and by Frederick Sanger and his coworkers. Both of these methods rely on gel-based electrophoresis systems to separate DNA fragments, and recent advances in commercial systems include increasing the number of gel lanes, decreasing run times, and enhancing the accuracy of base identification. As a result of such improvements, a standard sequencing
unverified sequences of 50.8 million base pairs of Methanobacterium thermoautotrophicum.Genome Sequencing Project machine can now turn out raw.000 partial and complete cDNA clones.A.000 bases per day. The aim is a master set of mapped and sequenced human cDNA. are usually 300-500 base pairs each. By early 1996. most of them with one or both ends sequenced to provide unique identifiers.830. had distributed over 250. however. representing the expressed parts of the human genome. how many times must a given strand be sequenced to ensure acceptable confidence in the result? Shotgun sequencing derives its name from the randomly generated DNA fragments that are the objects of scrutiny.000 to 75. if no gaps are to be tolerated in the final sequence.M. A benefit is that the final sequence is highly reliable. I. Twenty-five hundred genes have also been newly mapped as part of this coordinated effort. an effort supported mostly by private funds). and it is not yet clear which will prove the most efficient and most costeffective way to read long stretches of DNA over the next decade. Each fragment is then separately cloned. a bacterium important in energy production and bioremediation. have brought much nearer the day when "production sequencing" can begin. A computational assembly process then compares the terminal sequences of the many fragments and. Genome Therapeutics has sequenced 1. in both technology development and the assembly of resource libraries. (Integrated Molecular Analysis of Genomes and their Expression). Another critical resource is being assembled in an effort known as I. and YACs. Another is the degree of redundancy .G.and chromosome-sorting technologies developed at Livermore and Los Alamos. Equally important to the sequencing goals of the genome project is a rational system for organizing and distributing the material to be sequenced.E. 
and TIGR has successfully sequenced the complete genomes of three free-living bacteria.a product of DOE-supported work at the University of Washington. shotgun sequencing has been the primary means for generating most of the genomic sequence data in public DNA databases. as more efficient vectors have become available. from the human T-cell receptor beta region. Haemophilus influenzae (1. in the approaches available to sequencing the human genome. This includes the longest contiguous fragment of sequenced human DNA. expressed sequence tags (ESTs). One of the available choices. The statistics involved in taking this approach require that many copies of the original clone be randomly fragmented. as part of the DOE-supported Microbial Genome Initiative. is between "shotgun" and "directed" strategies. either by restriction enzymes or by physical shearing. Shotguns and transposons Such advances as these. libraries of clones were established for each of the human chromosomes. for example. Many copies of a single large clone are broken into pieces of perhaps 1500 base pairs. A great deal of variety remains. The members of this ordered library can then be sequenced from end to end to yield a complete sequence for the parent. Mycoplasma 2 . by finding overlaps that indicate neighboring fragments.M.A. of about 685. Based on cell.E. More recently. constructs an ordered library for the parent clone. cofounded by the Livermore Human Genome Center.137 base pairs. the main disadvantage is that the same sequence must be done many times (in the many overlapping fragments). These clones were invaluable in such notable "gene hunts" as the successful searches for the cystic fibrosis and Huntington disease genes.G. Nevertheless. and a convenient portion of it sequenced. The shotgun strategy is also being used at the Genome Therapeutics Corporation and The Institute for Genomic Research (TIGR). PACs.that is. These identifiers.000 base pairs -. 
and the individual clones are widely available for mapping and for isolating genes. complete human DNA libraries have been established using BACs.
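The shotgun strategy described in this section (random fragments, overlaps found computationally, an ordered set rebuilt into the parent sequence) can be illustrated with a toy assembler. This is only a minimal sketch under stated assumptions: the greedy "merge the largest overlap first" heuristic, the function names `overlap` and `greedy_assemble`, and the `min_len` threshold are all illustrative choices, not the software used by any of the laboratories mentioned, and it compares whole error-free fragments rather than only their terminal sequences.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len: int = 3):
    """Toy shotgun assembly: repeatedly merge the best-overlapping pair.

    Hypothetical helper for illustration; real assemblers must also
    handle sequencing errors, repeats, and reverse-complement reads.
    """
    # Drop exact duplicates and fragments wholly contained in another.
    contigs = list(dict.fromkeys(fragments))
    contigs = [f for f in contigs
               if not any(f != g and f in g for g in contigs)]

    while len(contigs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left: the remaining contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]
    return contigs


if __name__ == "__main__":
    parent = "ATGGCGTACGTTAGCCTAGGCTA"
    reads = [parent[12:23], parent[0:10], parent[6:16]]  # unordered fragments
    print(greedy_assemble(reads))
```

When fragments do not overlap, the function returns more than one contig, which mirrors the gaps that have to be closed by further sequencing.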
making these primers was an expensive and time-consuming business. Granner. Ohio. 2006.that can be sequenced in one run. the pooling of libraries as Harper's Illustrated Biochemistry. Significant DOE resources have been committed to innovations in instrumentation. just overlapping the first. Unfortunately. Starting at one end of a single large fragment. Darryl K. in which one seeks to sequence the target clone from end to end with a minimum of duplication. especially. the DOE's engineering infrastructure and tradition of instrumentation development have been crucial contributors to the international effort. The alternative to shotgun sequencing is a directed approach. commercial robots have simply been mechanically reconfigured and reprogrammed to perform repetitive tasks. is then tackled in the same way. one can thus "walk" the entire length of the original clone. but recent innovations have made primer walking. Murray. And here. one replicates a stretch of DNA . Robert K.933 base pairs). 400 base pairs long . The widely automated Sanger sequencing method involves a DNA replication step that must be "primed" by a DNA fragment that is complementary to 15 to 20 base pairs of the strand to be sequenced.739. In principle. and economy of large-scale mapping and sequencing efforts as a result of improved laboratory automation tools. Peter A. 2 . In many cases. Victor W. the next stretch of DNA.070 base pairs). Mayes. mainly the expense and inconvenience of custom-synthesizing a primer as the necessary starting point for each sequencing step. Until recently. efficiency. Columbus. McGraw-Hill Professional. genome researchers are seeing significant improvements in the rate.technologies that might potentially increase mapping and sequencing efficiencies by orders of magnitude. On the first of these fronts. and similar directed strategies. Bioinformatics in Human Genome Sequencing Project22 From the start. 
it has been clear that the Human Genome Project would require advanced instrumentation and automation if its mapping and sequencing goals were to be met.say. including the replication of large clone libraries. Rodwell. 22 ranging from straightforward applications of automation to improve the speed and efficiency of conventional laboratory protocols to the development of technologies on the cutting edge . and Methanococcus jannaschii (1. With the sequence for this first segment in hand.Genome Sequencing Project genitalium (580. more and more economically feasible. The essence of this approach is embodied in a technique known as primer walking. this conceptually simple approach has been historically beset with disadvantages.
and instruments developed at Utah for automated hybridization in multiplex sequencing schemes. and efficiency are projected in future commercial instruments. and cooled CCD cameras. 3 .1 millimetres thick. in disease? Sequencing by hybridization is only one of several forward-looking ideas for revolutionizing sequencing technology. The oligomers are placed on an array by a process similar to that of making silicon chips for electronics. a number of DOEsupported efforts aim at improved versions of the automated gel-based Sanger sequencing technique. Another miniaturization effort aims at the fabrication of high-density combinatorial arrays of custom oligomers (short chains of nucleotides). This same technology has already been used for genetic screening and cDNA fingerprinting. Challenges include providing uniform excitation over arrays of 50 to 100 capillaries and then efficiently detecting the fluorescence emitted by labeled samples. economy. Similar approaches can be envisioned to understand differences in patterns of gene expression: Which genes are active (which are producing mRNA) in which cells? Which are active at different times during an organism's development? Which are active. In other cases. which is used to sort human chromosomes for chromosome-specific libraries. several DOE-supported groups are exploring ways to adapt high-resolution photolithographic methods to the manipulation of minuscule quantities of biological reagents. ultrathin gels. Building on experiences in the electronics industry. robotics-compatible thermal cycler developed at Berkeley. The capillary approach is especially ripe for further development. Successful matches between oligomers and genomic DNA are then detected by fluorescence. and tenfold improvement in speed. scanning confocal microscopy. Smaller is better: and other developments Beyond "mere" automation are efforts aimed at more fundamental enhancements of established techniques. 
custom-designed instruments have proved more efficient. followed by assays performed on the same "chip. including sequencing by hybridization. in place of the conventional slab gels. with an eye to simplifying sample preparation. A notable illustration is the world's fastest cell and chromosome sorter. And Livermore scientists are looking beyond even capillaries. In particular. and the arraying of clone libraries for hybridization studies. Both of these approaches exploit higher electric field strengths to increase DNA mobility and to reduce analysis times. The move toward miniaturization is afoot elsewhere as well. or inactive. In spite of continuing improvements to sequencers based on the classic methods. Other examples include a high-speed. computer-controlled PCR device under development at Livermore operates on 9-volt batteries and might ultimately lead to arrays of thousands of individually controlled microPCR chambers.1-millimeter capillaries are used as the separation medium. a fivefold improvement in throughput over conventional systems. to sequencing arrays of rigid glass microchannels. developed at Livermore and now being commercialized. less than 0. Some of this effort has already been transferred to the private sector. reducing measurement times. For example. high-speed thermal cycling systems for PCR. Even faster speedups are seen when arrays of 0. supplemented by automated gel and sample loading. a prelude to various assays. can be used to obtain 400 bases of sequence from each lane in an hour's run. it is nonetheless desirable to explore altogether new approaches. Technologies under investigation include fiber-optic arrays. and the application of sophisticated statistical analyses reassembles the target sequence. which greatly accelerates PCR amplifications. A miniaturized. which would make feasible large-scale hybridization assays." 
Current thrusts of this "nanotechnology" approach include the design of microscopic electrophoresis systems and ultrasmall-volume. This innovative technique uses short oligomers that pair up with corresponding sequences of DNA.
It would therefore replace traditional gel electrophoresis as the last step in a conventional sequencing scheme. is a world-standard gene identification tool. together with graphics and user-friendly interfaces that invite their use by biologists and other non-computer scientists. and facilitating interpretation of the results. All of these alternatives look promising in the long term. Further. because much of the challenge is interpreting genomic data and making the results available for scientific and technological applications. cost-effective data production in both DOE laboratories and the many other laboratories that use them. and biopharmaceutical companies around the world. Over the course of the past few years. and gene expression analysis. DNA sequencing. and practical systems based on high-resolution mass separations of DNA fragments of fewer than 100 bases are currently being developed at several universities and national laboratories. Another innovative sequencing method is under investigation at Los Alamos. The roles of laboratory data acquisition and management systems include the construction of genetic and physical maps. These systems are the keys to efficient. general molecular biology and medical laboratories. These systems typically comprise databases for tracking biological materials and experimental procedures. These systems typically include task-specific computational engines. Mass spectrometry measures the masses of ionized DNA fragments by recording their time-of-flight in vacuum. biotechnology companies. The Oak Ridge-developed GRAIL system. but fragments of up to 500 bases have been analyzed. but mass spectrometry has perhaps demonstrated the greatest near-term potential. Dealing with the data Among the less visible challenges of the Human Genome Project is the daunting prospect of coping with all the data that success implies. more than 180 million base pairs of DNA were analyzed with GRAIL. 
including atomic-resolution molecular scanning. robot control software developed at Berkeley and Livermore. Appropriate information systems are needed not only during data acquisition. Efforts in all these areas are the mandate of the DOE genome informatics program. But the potential benefits are great. This approach is beset by major technical challenges. and mass spectrometry of DNA fragments. the challenge extends not just to the Human Genome Project. In 1995 alone. and much of the instrumentation for sensitive detection of fluorescence signals has already proved useful for molecular sizing in mapping applications. and DNA sequence assembly software developed at the University of Arizona. illustrated in Gene hunts. base by base. software for controlling robots or other automated systems. supporting efforts at Oak Ridge National Laboratory and elsewhere. The interpretation of map and sequence data is the job of data analysis systems. Among such systems are physical mapping databases developed at Livermore and Los Alamos. thereby yielding the sequence. but also for sophisticated data analysis and for the management and public distribution of unprecedented quantities of biological information. The characteristic fluorescence is detected by a laser system.Genome Sequencing Project increasing the length of the strands that can be analyzed in a single run. and environmental remediation. and software for acquiring laboratory data and presenting it in useful form. singlemolecule detection of individual bases.and private-sector programs focused on areas such as health effects. Routine application of this technique still lies in the future. 3 . and direct sequencing has not yet been achieved. The genome informatics program is the world leader in developing automated systems for identifying genes in DNA sequence data from humans and other organisms. but also to the microbial genome program and to public . 
several alternative approaches to direct sequencing have been explored. whose products are already widely used in genome laboratories. structural biology.
In this set of strands. is to use sets of very short fragments to prime the next sequencing step.that is. on chromosome 20. it is critical to develop scientific databases that "interoperate. whereas. Systems now in place include the Genome Database of human genome map data at Johns Hopkins University. the Genome Sequence DataBase at the National Center for Genome Resources in Santa Fe. for each. As the genome project continues to provide data that interlink structural and functional biochemistry. as well as over three million base pairs from the fruit fly Drosophila melanogaster. cellular. which offers a way of increasing throughput with either shotgun or directed approaches. One way to deal with the primer bottleneck. molecular. The largest clones are broken into smaller subclones (each of about 3000 base pairs). a "minimum tiling path" can be determined for each subclone -. As this community of researchers expands and as the quantity of data grows. this technique has been used to sequence over 1. the organism that causes Lyme disease. an imposing set of possibilities. thus serving as an 18-base primer. and G) can be ordered in more than 68 billion ways to create an 18-base primer. Berkeley researchers are interested in a region of about two million base pairs that is implicated in 15 to 20 percent of all primary breast carcinomas. At the Lawrence Berkeley National Laboratory.000-base-pair fragment has already been sequenced. On chromosome 5.Genome Sequencing Project A third area of informatics reflects. and reaction conditions are controlled to yield. But it is eminently practical to create a library of the 4096 possible 6-base primers. such interoperable databases will be the critical resources for both research and technology development. which insinuates itself more or less randomly in longer DNA strands. the challenges of maintaining accessible and useful databases likewise increase. 
This predilection for random insertion and the fact that the transposon's DNA sequence is well known are the keys to the sequencing strategy depicted schematically in taking a directed approach. a set of strands can be identified whose transposon insertions are roughly 300 base pairs apart. Three of these "6-mers" can be matched to the end of the fragment to be sequenced. As an illustration. and the Molecular Structure Database at Brookhaven National Laboratory. the approximate position of the inserted transposon. on average. For example. Another directed approach uses a naturally occurring genetic element called a transposon. which then become the targets of the transposons. is currently being applied to Borrelia burgdorferi. By mapping these positions. the region around each transposon is then sequenced. In addition. for example. a 34. the ultimate product of the Human Genome Project -. Public resource databases must provide data and interpretive analyses to a worldwide research and development community. a single insertion in each 3000-base-pair strand. the four nucleotides (A. The known transposon sequence allows a single primer to be used for sequencing the full set of overlapping regions. using the inserted transposons as starting points. Multiple copies of each subclone are exposed to the transposons.information readily available to the scientific and lay communities. they have developed a methodology for "multiplex" DNA sequencing.5 million base pairs of DNA on human chromosomes 5 and 20. in a sense. developed at the Brookhaven National Laboratory. interest focuses on a region of three million base pairs that is rich in growth factor and receptor genes. C. The individual strands are then analyzed to yield. Bionformatics program is crucial to the multiagency effort to develop just such databases. physiology and medicine." 
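The primer arithmetic in this passage is easy to verify: with four nucleotides there are 4^18 possible 18-base primers (more than 68 billion) but only 4^6 = 4096 possible 6-base primers, so a complete hexamer library is practical while a complete 18-mer library is not. The sketch below checks those counts and shows how an 18-base priming site splits into three adjacent library "6-mers". The helper names and the example site are hypothetical, and the sketch ignores strand complementarity and the chemistry needed to make adjacent hexamers act as one primer.

```python
from itertools import product

BASES = "ACGT"


def primer_count(length: int) -> int:
    """Number of distinct primers of a given length over A, C, G, T."""
    return len(BASES) ** length


def hexamer_library() -> set:
    """Every possible 6-base primer (the 4096-member library)."""
    return {"".join(p) for p in product(BASES, repeat=6)}


def split_into_hexamers(site: str) -> list:
    """Split an 18-base priming site into three adjacent 6-mers."""
    if len(site) != 18 or set(site) - set(BASES):
        raise ValueError("expected an 18-base string over A, C, G, T")
    return [site[i:i + 6] for i in range(0, 18, 6)]


if __name__ == "__main__":
    print(primer_count(18))            # 68719476736
    print(primer_count(6))             # 4096
    library = hexamer_library()
    site = "ACGTACGGTTCAGGATCC"        # illustrative sequence, not from the source
    parts = split_into_hexamers(site)
    print(parts, all(p in library for p in parts))
```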
sharing data and protocols so that users can expect answers to complex questions that demand information from geographically distributed data resources. Researchers supported by the DOE at the University of Utah are also pursuing the use of directed sequencing. By attaching a unique identifying sequence to each sequencing sample in a 3 . This modular primer technology. and environmental science. T. and developmental biology.
first. with their extensive coverage in many different kinds of cloning vectors. thus pinpointing the most critical targets for later. the genome center there has produced almost two million base pairs of human DNA sequence. more thorough sequencing efforts. for bands containing the first identifier. though. 50 such samples. containing a kidney disease gene.8 million base pairs of the thermophilic microbe Pyrococcus furiosus and two important regions of human chromosome 17. especially the mouse. To assure a higher level of confidence. and good starts have been made in mapping other genes. the entire mixture can be analyzed in a single electrophoresis lane. Initially.Genome Sequencing Project mixture of. even random sequencing has led to the identification of gene DNA in over 15 percent of the samples. shotgun sequencing. sequence-tagged sites.much as in other sequencing approaches. A parallel effort is under way at Livermore on chromosome 19 and other targeted genomic regions. the difference between one human being and another is more like one base pair in five hundred. and perhaps to uncover important individual differences. and so forth -. and then physically broken into 3000-base-pair subclones -. only a small random set of the subclones is then selected for sequencing. In 3 . In addition.E. Clones are selected from the high-resolution Los Alamos cosmid map.G. researchers there have completed over 1. say. However. containing several genes involved in DNA repair and replication. The 50 samples can be resolved sequentially by probing.end sequences. the SASEderived sequences provide enough information for researchers elsewhere to pursue just such comprehensive efforts. or "sassy"). Further. cDNA resource to sequence the cDNA from these regions. enough to allow identification of genes and ESTs. so most researchers now agree that one error in a thousand is a more reasonable standard. Between chromosome 16 and the short arm of chromosome 5. 
multiplexing can also be used for mapping.are used as the starting points. and another of approximately one million base pairs.000 base pairs has already been sequenced around the adult polycystic kidney gene. have done comparative sequencing of these genes in other species. Such comparative sequencing has identified conserved sequence elements that might act as regulatory regions for these genes and has also assisted in the identification of gene function How good is good enough? The goal of most sequencing to date has been to guarantee an error rate below 1 in 10. they are attacking two major regions of chromosome 19: one of about two million base pairs.000. using whole genomic DNA. Los Alamos scientists have begun a project to determine the cost and throughput of a low-redundancy sequencing strategy known as sample sequencing (SASE. along with the associated segments of the genome. say. Interestingly. the most biologically or medically important regions would still be sequenced more exhaustively. A region of 60. Los Alamos scientists have therefore begun sequencing chromosome 16. focusing special effort on locating the estimated 3000 expressed genes on that chromosome and using those sites as starting points for directed genomic sequencing.000. confirming the apparent high density of genes on this chromosome. Using a shotgun approach. but using this lowered standard would greatly reduce the cost of acquiring sequence data for the bulk of human DNA. then for bands containing the second. The Utah group is now able to map almost 5000 transposons in a single experiment. are especially ripe for large-scale sequencing. and they are using multiplexing in concert with a directed sequencing strategy to sequence the 1. In contrast to.A.3 million bases of genomic sequence. The Livermore scientists are making use of the I.M. Livermore scientists have targeted DNA repair gene regions throughout the genome and. Sequence fragments already known -. In a similar way. 
With this philosophy in mind. sometimes even 1 in 100. The result is sequence coverage for about 70 percent of the original cosmid clone. in many cases. and so forth. another Los Alamos target. The completed physical maps of chromosomes 16 and 19.
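The trade-off between redundancy and coverage that underlies sample sequencing can be made concrete with a simple model. The sketch below assumes the idealized random-sampling (Lander-Waterman style) estimate, in which the expected fraction of a clone covered at redundancy c is 1 - e^(-c). That model is an assumption introduced here for illustration and is not stated in the text. Under it, the roughly 70 percent coverage reported for sample sequencing corresponds to a little over onefold redundancy, while the seven- to tenfold redundancy of a full shotgun project leaves only a tiny uncovered fraction.

```python
import math


def expected_coverage(redundancy: float) -> float:
    """Expected fraction of a clone covered at a given sequencing redundancy.

    Assumes reads land uniformly at random (idealized model, see lead-in).
    """
    if redundancy < 0:
        raise ValueError("redundancy must be non-negative")
    return 1.0 - math.exp(-redundancy)


def redundancy_for_coverage(fraction: float) -> float:
    """Redundancy needed to reach a target covered fraction (same model)."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -math.log(1.0 - fraction)


if __name__ == "__main__":
    print(round(redundancy_for_coverage(0.70), 2))  # redundancy implied by 70% coverage
    for c in (2, 3, 7, 10):
        print(c, round(expected_coverage(c), 5))
```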
000 genes is regulated and controlled.4 million base pairs had been sequenced. understanding and usage of genome technology. As the first major target of SASE analysis. to a telomeric region on the long arm of chromosome 7. 2007. Vovis foresees a day when doctors will take a sample of blood. and a genomic region can be "sampled" ten times as fast. Hispanic. family. Huge gene Variation found in Humans: Find May Explain Differing Responses to Medication.to threefold redundancy to produce a complete sequence. ageism. Their study of 313 genes. though. http://www.the ultimate physical map -. EST. Los Alamos is building on the SASE effort by using SASE sequence data as the basis for an efficient primer walking strategy for detailed genomic sequencing. Jon F. Asian. Karen F. Greif. The deepest mystery is how the potential of 100. even a complete genome sequence -.com/news. Ethical Issues23 Controversies That Never End There are many ethical issues that are raised as a direct result of our knowledge.msnbc. genderism). MIT Press. can't/ won't free itself from the deleterious clutches of the "isms" (racism. The downside is that some unscrupulous individual having access to that information could misuse or exploit that individual 24. Inc. By early 1996. and a gene.Genome Sequencing Project addition. government? Since this society. in contrast to the seven. July 12.is only a start in understanding the human genome. do a total genetic examination. located in New Haven announced they have detected an "astonishing" variance at the genetic level in 82 unrelated people from four racial backgrounds . black.000-base-pair sequence is the second-longest stretch of contiguous human DNA sequence ever produced. and how these and countless other cell types arise in the first place from an single undifferentiated egg cell.to tenfold redundancy required in shotgun approaches. is a more complete physical picture of the master molecules that lie at the heart of it all. out of 30. In a sense. 
2001 3 . Another upside to this technology is that side effects produced by the ingestion of medication could be minimized or eradicated altogether. proved to be as efficient as typical shotgun sequencing. the cost of SASE sequencing is only one-tenth the cost of obtaining a complete sequence. A first step toward solving these subtle mysteries. First and foremost is what will be done with this information? Who has a right to have it? Should potential employers be given this information? Should insurance carriers be given this information? Will this technology absolve some and indict others of their responsibilities to society. Gearld Vovis.white. In addition. and have that guide in prescribing treatment. sexism. or suspected coding region had been located on every cosmid sampled. Merz. though. Boston. Genaissance chief technology officer and senior vice president felt this might explain why there is such a wide variance in how people respond to medication. The resulting 230.000 identified by human genome scientists found that for each gene. based on historical documentation. over 1. there were on average 14 versions that could be inherited by a given person from parents. Los Alamos scientists chose a cosmid contig of four million base pairs at the end (the telomere) of the short arm of chromosome 16. but it required only two. how blood cells and brain cells are able to perform their very different functions with the same genetic program. 23 Current Controversies in the Biological Sciences: Case Studies of Policy Challenges from New Technologies. The first application of this strategy. would such information merely serve as another discriminating mechanism to ostracize individuals from mainstream society? Treatment and Medicine Genaissance Pharmaceticals. Mass 24 MSNBC Reuter.
Huntington's disease. the most important consequence of mapping out all of our genes. Religion is predicated on all faith. With access to your genetic code. No one can look at how the book of life is written and not come away fully understanding that our genetic instructions have evolved from the same programs that guided the development of earlier animals. wooly mammoths and our primate ancestors. Mass. as the scientists who cracked the genome all agreed. Smoking. Darwin believed that first man evolved. obesity. Employment What if potential employers had this information? When you currently apply for a job. any predisposition to alcoholism. Insurance Companies If life insurance companies had this information. The genome reveals. indisputably and beyond any serious doubt that Darwin was right: mankind evolved over a long period of time from primitive animal ancestors. Sure the business side of cracking 
When Charles Robert Darwin first presented his book of theories entitled." The core recipe of humanity carries clumps of genes that show we are descended from bacteria. or cause you to pay abhorrently elevated fees. The human genome project confirms the theory on evolution. and a host of other malaise can increases your rates. it was met fairly much as it is today. Religion sets its sights on a reality that has no bases in logic. Religion has promoted the notion that an omnipotent. Faith is a belief in that which is not seen. it cannot be used against you. how might that impact society? Anyone who has ever sought life insurance is familiar with the little indicators that can prohibit your ability to become insured." Eric Lander of the Whitehead Institute in Cambridge. AIDS. "On the Origin of Species by Means of Natural Selection in 1859. said that if you look at our genome it is clear that "evolution …must make new genes from old parts. The response to all those who thump their bible and say there is no proof. the answer would probably be a resounding yes. experience. Science believes in the tangible and concrete. Today. but rather emotional rectitude. Changes came about because of natural selection. in our genes. cancer. Religion says I believe therefore it is real. no test and no evidence in support of evolution is. or experienced. Our genetic instructions have been slowly assembled from the genetic instructions that made jellyfish. Our genes show that scientific creationism cannot be true. Science relies on empirical data. Multiple Sclerosis. There is. The only up side to the current system is. and education and sometimes who-you-know.. "The proof is right here. …none of these headlines capture the most basic. no other possible explanation. and omniscient deity created all life here on Earth.
fear a resurgence of Adolph Hitler's vision of creating the perfect race. is it moral to take away the variety that nature provides? Will scientists one day perceive certain ethnic groups as being unwanted flaws? Conclusion There is no doubt that the human genome project started in 1990 left humankind hanging at the precipice of eminent power and direction. Change the eye colouring. We have sequenced 3." the study of the production of proteins. like the Thomas Jefferson debauchery.000 genes. Department of Energy and the National Institutes of Health seems to have been a good merger.Genome Sequencing Project our genetic code is fascinating. MSNBC. The debate further intensifies as religion frowns on the notion that man will attempt to play God. and proven that humans are made up of 30. Historical denials. Gene mapping will make it possible to do away with perceived flaws. thanks to our continuing breakthroughs with DNA. will there be a way to control what goes on? Will the quality and validity be retained. One salient thought keeps me from totally embracing this new technology: can we fallible creatures objectively and responsibly handle this knowledge? Caplan. The potential for eliminating illnesses with debilitating effects on adults and children are clearly a good reason to continue. The collaboration of the U. Clearly our bio-technical advances are working.S. We are correcting past wrongs. once vehemently denied. The public at-large must indeed become more knowledgeable so that an eye can be kept on Big Brother. As more and more companies enter the arena. Even that not particularly religious. or defects in children. Though we have the technology. let's make the child athletic and very aesthetically pleasing.1 billon 25 letters of DNA. Clearly the genie is out of the bottle and there is no way of stopping its progress. 
And we all need to be sure that our government does not leave us in the genetic lurch without laws to ensure our privacy and protect us against genetic discrimination25. February 21. and by all means. courtesy of the genome factor by proving he fathered several of Sally Hemings' children. We are spawning new scientific fields of study like "proteomics. 2001 3 . we have seen a thirty-percent increase in the number of centenarians. 'Darwin vindicated!' Cracking of human genome confirms theory of evolution. Arthur. add to the aptitude of the child. During the last twenty years. only two times more than fruit flies. change the hair texture. as more for-profit businesses like Celera Genomics enter the picture? Only time will tell.000 to 40. now pierces the veneer of American piety. freeing those who have been incarcerated unjustly.
Marine Genome Sequencing Project - Nanoflagellates

More than 80% of the earth's living organisms are found only in aquatic ecosystems, and we know little about their biochemical characteristics. Our challenge as a nation is to discover the life-enhancing and lifesaving properties these unique organisms possess. We also have to develop the biological technology needed to identify sources of ecological stress and to develop strategies to protect and restore coastal resources. Seafood-borne illness adversely affects public health and coastal economies. Molecular technology can be used to develop rapid diagnostics that ensure the safety of the seafood we eat and the vitality of the seafood industry. Genome sequencing projects among marine organisms will enable scientists to differentiate populations and address emerging diseases in order to protect fishery and ecological resources.

One example of genome sequencing among marine organisms is the nanoflagellates. Nanoflagellates are a group of marine microbes that prey on other microbes, such as bacteria and phytoplankton, for survival. These predatory protists play a critical role in marine carbon cycling. An international team of investigators led by Alexandra Worden of the Monterey Bay Aquarium Research Institute will investigate the genetic mechanisms behind the processes of predation, digestion, and biomass incorporation by protists that determine the fate of phytoplankton and bacteria, to bridge the gap in our knowledge about this important player in the marine food web.

A metagenome of similar interest lurks inside the giant Pacific shipworm, Bankia setacea26. Shipworms, wood-boring marine bivalves, have been nicknamed "termites of the sea". These animals are capable of feeding solely on wood, utilizing a highly efficient system of symbiotic lignocellulose degradation that is biologically, functionally, and evolutionarily distinct from those found in termites, ruminants, and all other cellulose-consuming animals. Like termites, shipworms depend for their ability to consume wood on symbiotic bacteria that provide enzymes, including cellulases and other hydrolases critical for digestion of wood by the host and potentially valuable for commercial bioconversion of lignocellulose to ethanol. Unlike termites, shipworms accomplish the complete degradation of lignocellulose with a simple intracellular consortium of just a few related types of microbes. Analysis of the shipworm symbiont community metagenome will provide important insights into the composition and function of this unique lignocellulose-degrading bacterial community and will allow valuable comparisons to the recently sequenced termite symbiont metagenome. The project was proposed by Daniel Distel of the Ocean Genome Legacy Foundation.

26 Advances in Marine Biology. Sir Frederick Stratten Russell, Sir Maurice Yonge. 1971. Academic Press. London, UK.

Another marine organism is Botryococcus braunii. It is a colony-forming green microalga, less than 10 micrometers in size, that synthesizes long-chain liquid hydrocarbon compounds and sequesters them in the extracellular matrix of the colony to afford buoyancy. A type of B. braunii produces a family of compounds termed botryococcenes, which hold promise as an alternative energy source. Botryococcenes have already been converted to fuel suitable for internal combustion engines. Geochemical analysis has shown that botryococcenes, presumably from ancient B. braunii communities, also comprise a portion of the hydrocarbon masses in several modern-day petroleum and coal deposits. While algae have been recognized for their role in carbon sequestration and for biofuels production, little information, either genetic or metabolic, has been reported for this particular organism. This project, led by Andrew Koppisch and colleagues from Los Alamos National Laboratory and five other institutions, will target the identification of specific metabolic pathways responsible for hydrocarbon synthesis to alleviate bottlenecks in biofuels production.

The newly sequenced genome of a one-celled, planktonic marine organism, reported on Feb. 14, 2008 in the journal Nature, is already telling scientists about the evolutionary changes that accompanied the jump from one-celled life forms to multicellular animals like us. In the Nature paper and a complementary Science paper also released that week, University of California, Berkeley, biologists Nicole King, Daniel Rokhsar and their colleagues present their first draft of the genome of a choanoflagellate called Monosiga brevicollis, and their first comparisons with the genes of multicellular animals, the so-called metazoans. The sequencing and analysis was performed by the Department of Energy Joint Genome Institute (JGI) in Walnut Creek, Calif., in collaboration with researchers from UC Berkeley and eight other institutions.

According to King, because choanoflagellates and animals shared a common ancestor between 600 million and a billion years ago, they hold a key to understanding the origins and evolution of animals. "Choanoflagellates are the closest living unicellular relatives of animals and, as such, can help us learn about our history and the history of life on Earth, which has been dominated by one-celled organisms," King said. "They help shed light on the biology and genome content of the unicellular organisms from which we evolved."

By consuming large quantities of bacteria, choanoflagellates play a major role in the carbon cycle of the oceans. Yet, aside from the fact that they are an important food for krill, which are the main source of food for baleen whales, biologists know almost nothing about these organisms.

One finding confirmed by the sequencing is that choanoflagellates have many genes that, in animals, evolved for linking cells together. Since Monosiga does not form colonies as do some other choanoflagellates, these proteins' roles are a mystery, King said. "In animals, some of these proteins, called cadherins, produce proteins essential to cell-to-cell signalling and in determining which cells stick to one another, they are the glue that prevents clumps of cells from falling apart," said King, an assistant professor of integrative biology and of molecular and cell biology, and a 2005 MacArthur "genius" Award winner. "Choanoflagellates show no hint of multicellularity, but they have 23 genes for cadherin proteins, about the same as the fruit fly or the mouse."
In the Science paper, King and graduate student Monika Abedin report that some of these proteins are found around the base of the choanoflagellate cell, where the choanoflagellate attaches to surfaces, and around the tentacles, where bacteria are captured and ingested. Perhaps, they argue, the last single-celled ancestor of all animals (including humans) employed these ancient cadherin proteins to bind and eat bacteria, while more complex metazoans adopted these proteins for gluing cells into a larger, many-celled creature. "The transition to multicellularity likely rested upon the co-option of diverse transmembrane and secreted proteins to new functions in intercellular signaling and adhesion," they wrote in Science.

"Choanoflagellates really are a unique window back in time to the origin of animals and humans, because the fossil record is not there," said Dan Rokhsar, UC Berkeley professor of molecular and cell biology and program head for computational genomics at JGI. "They are our best way of triangulating on that last unicellular ancestor of animals." King and Rokhsar also are members of UC Berkeley's Center for Integrative Genomics.

Choanoflagellates are found abundantly in salt and fresh water around the world, where they gorge on bacteria. At about 10 microns across, they're about the size of another one-celled eukaryote, yeast. The cells are egg-shaped with a single long tail or flagellum at one end, surrounded at its base by a collar of tentacles (choano comes from the Greek word for collar) that capture bacteria. The flagellum propels the choanoflagellate through the water and also washes bacteria towards the tentacles. Because choanoflagellates resemble the feeding cells of sponges, which are among the most primitive of animals, biologists 165 years ago proposed that these organisms were very distant ancestors of multicelled animals.

While yeast is well known to genetics researchers, choanoflagellates are not, a situation King hopes will change now that the genome is sequenced. King and Rokhsar successfully proposed the choanoflagellate for sequencing several years ago as part of the Department of Energy's Microbial Genome Program, and in the intervening years, King worked on isolating enough uncontaminated DNA for sequencing.

The draft genome, completed and annotated in 2007, consists of about 9,200 genes. It is similar in size to the genomes of fungi and diatoms, but much smaller than the genomes of metazoans. Humans, for example, have about 25,000 genes.
Interestingly, the choanoflagellate has nearly as many introns (noncoding regions once referred to as "junk" DNA) in its genes as humans do in their genes, and often in the same spots. Introns have to be snipped out before a gene can be used as a blueprint for a protein and have been associated mostly with higher organisms.

The choanoflagellate genome, like the genomes of many seemingly simple organisms sequenced in recent years, shows a surprising degree of complexity, King said. Many genes involved in the central nervous system of higher organisms, for example, have been found in simple organisms that lack a centralized nervous system. Likewise, choanoflagellates have five immunoglobulin domains, though they have no immune system; collagen, integrin and cadherin domains, though they have no skeleton or matrix binding cells together; and proteins called tyrosine kinases that are a key part of signaling between cells, even though Monosiga is not known to communicate, or at least does not form colonies.

These findings are helping King and her colleagues assemble a picture of what the original common ancestor of humans and choanoflagellates looked like and also get hints about the first animals. Nevertheless, it is not always easy determining which genes were in the last common ancestor of choanoflagellates and humans, and which are new. Choanoflagellates and humans have been evolving for the same length of time, so differences between the genomes may reflect genes that have been lost by choanoflagellates as much as genes gained by humans. Comparison of the Monosiga genome to that of other organisms, including another choanoflagellate (a colony-former called Proterospongia, whose genome is due to be sequenced by the National Institutes of Health), may answer such questions.

"It is remarkable to what extent we can figure out how those animal ancestors must have been able to stick together and communicate with each other, at least in ways that allow you to make hypotheses about what those first steps toward animals looked like," Rokhsar said, noting a similar situation with the starlet sea anemone, Nematostella vectensis, sequenced in 2007.

King has hopes that the Monosiga genome will answer many questions of animal evolution and illuminate the biology of this poorly understood aquatic creature. "The genome is the toehold," King said. "This is a new era, where we start with a genome to understand the biology of an organism."
Future Perspectives and Conclusion

When is a genome project finished?

When sequencing a genome, there are usually regions that are difficult to sequence (often regions with highly repetitive DNA). Thus, "completed" genome sequences are rarely ever complete, and terms such as "working draft" or "essentially complete" have been used to more accurately describe the status of such genome projects. Even when every base pair of a genome sequence has been determined, there are still likely to be errors present because DNA sequencing is not a completely accurate process. It could also be argued that a complete genome project should include the sequences of mitochondria and (for plants) chloroplasts, as these organelles have their own genomes.

It is often reported that the goal of sequencing a genome is to obtain information about the complete set of genes in that particular genome sequence. The proportion of a genome that encodes genes may be very small (particularly in eukaryotes such as humans, where coding DNA may only account for a few percent of the entire sequence). However, it is not always possible (or desirable) to sequence only the coding regions separately. Also, as scientists understand more about the role of this noncoding DNA (often referred to as junk DNA), it will become more important to have a complete genome sequence as a background to understanding the genetics and biology of any given organism.

In many ways genome projects do not confine themselves to only determining the DNA sequence of an organism. Such projects may also include gene prediction to find out where the genes are in a genome, and what those genes do. There may also be related projects to sequence ESTs or mRNAs to help find out where the genes actually are.

Future Perspectives

Historically, when sequencing eukaryotic genomes (such as the worm Caenorhabditis elegans) it was common to first map the genome to provide a series of landmarks across the genome. Rather than sequencing a chromosome in one go, it would be sequenced piece by piece (with the prior knowledge of approximately where that piece is located on the larger chromosome). Changes in technology, and in particular improvements to the processing power of computers, mean that genomes can now be "shotgun sequenced" in one go (there are caveats to this approach, though, when compared to the traditional approach).

Improvements in DNA sequencing technology have meant that the cost of sequencing a new genome has steadily fallen (in terms of cost per base pair), and newer technology has also meant that genomes can be sequenced far more quickly.

When research agencies decide what new genomes to sequence, the emphasis has been on species which have either a relevance to human health (examples: pathogenic bacteria or vectors of disease such as mosquitoes) or species which have commercial importance (such as livestock and crop plants). Secondary emphasis is placed on species whose genomes will help answer important questions in molecular evolution (such as the common chimpanzee).

In the future, it is likely that it will become even cheaper and quicker to sequence a genome. This will allow for complete genome sequences to be determined from many different individuals of the same species. For humans, this will allow us to better understand aspects of human genetic diversity.
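The idea of gene prediction mentioned in this section, finding out where the genes are in a sequence, can be illustrated with a very small sketch. The code below is not the method used by any genome project described here; it is a toy, under the assumption that a candidate gene is simply an open reading frame (an ATG start codon followed in the same reading frame by a stop codon) on the forward strand. The names find_orfs and min_codons are hypothetical and chosen only for this illustration. Real gene predictors also have to deal with introns, the reverse strand, and supporting evidence such as ESTs or mRNAs.

```python
# Toy illustration only: real gene prediction uses statistical models,
# intron/exon structure and EST or mRNA evidence, none of which is modelled here.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) index pairs of open reading frames on the forward strand.

    An open reading frame is taken to be an ATG followed, in the same reading
    frame, by the first stop codon. The length in codons counts both the ATG
    and the stop codon, and frames shorter than min_codons are discarded.
    (find_orfs and min_codons are hypothetical names for this sketch.)
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):              # the three forward reading frames
        start = None                    # index of the current ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i               # open a candidate reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None            # close it and look for the next ATG
    return orfs


if __name__ == "__main__":
    # ATG AAA TTT TAG starts at index 2 and ends (exclusive) at index 14.
    print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14)]
```

Scanning each reading frame separately keeps the logic simple, at the price of ignoring overlapping candidates within the same frame; that trade-off is acceptable for an illustration but not for real annotation work.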