Genome Sequencing Project – Up Close and Personal

Definition
Genome sequencing projects are scientific endeavours that ultimately aim to determine the complete genome sequence of an organism (an animal, a plant, a fungus, a bacterium, an archaeon, a protist or a virus). The genome sequence of an organism requires the DNA sequence of each of its chromosomes to be determined. For bacteria, which usually have just one chromosome, a genome project aims to map the sequence of that chromosome. Humans, with 22 pairs of autosomes and 2 sex chromosomes, require 24 separate chromosome sequences to represent the completed genome.

Background
The sequencing of the human genome, along with that of related organisms, represents one of the largest scientific endeavours in the history of mankind. The information gathered from sequencing provides the raw data for the exploding field of bioinformatics, where computer science and biology live in symbiotic harmony. The classic technique for determining the sequence of DNA is known as Sanger sequencing, after its brilliant pioneer. This technique involves separating fluorescently labelled DNA fragments according to their length by polyacrylamide gel electrophoresis (PAGE). The base at the end of each fragment can then be visualized and identified by the dye with which it reacts. The time- and labour-intensive nature of gel preparation and running, as well as the large amounts of sample required, increase the time and cost of genomic sequencing. These conditions drastically reduce the efficiency of sequencing projects, ultimately limiting researchers in their sequencing attempts1.
Frederick Sanger – the man behind “shotgun sequencing”
1 Encyclopedia of Medical Genomics and Proteomics. Jürgen Fuchs, Maurizio Podda. 2004. CRC Press. London, UK.
Bacteriophage φX174 was the first genome to be sequenced, a viral genome with only 5386 base pairs (bp). Frederick Sanger, in another revolutionary discovery, invented the method of "shotgun" sequencing, a strategy based on the isolation of random pieces of DNA from the host genome to be used as primers for the PCR amplification of the entire genome. The amplified portions of DNA are then assembled by their overlapping regions to form contiguous sequences (otherwise known as contigs). The final step involved the utilization of custom primers to elucidate the gaps between the contigs, thus giving the completely sequenced genome. Sanger first used "shotgun" sequencing five years later to complete the bacteriophage lambda sequence, which was significantly larger: 48502 bp. This method allowed sequencing projects to proceed at a much faster rate, thus expanding the scope of realistic sequencing ventures. Since then several other viral and organellar genomes have been sequenced using similar techniques, such as the 229 kb genome of cytomegalovirus (CMV), the 192 kb genome of vaccinia, the 187 kb mitochondrial and the 121 kb chloroplast genomes of Marchantia polymorpha, and the 186 kb genome of smallpox.

The success with viral genome sequencing stemmed from the relatively small length of their genetic codes. In 1989, Andre Goffeau set up a European consortium to sequence the genome of the budding yeast Saccharomyces cerevisiae (12.5 Mb). S. cerevisiae had a sequence approximately 60 times larger than any sequence previously attempted, indicating why Goffeau felt compelled to invite the cooperation of a group of laboratories. Goffeau's European collaboration involved 74 different laboratories drawn to the project in hopes of sequencing the homologs of their favourite genes 2. Most laboratories utilized Sanger's "shotgun" method of sequencing, which had become the accepted standard for genome sequencing. At the time, the sequencing of model organisms such as S. cerevisiae appeared to be the logical step towards the eventual characterization of the human genome, a task that seemed beyond the scope of technology due to its tremendous size of 3000 Mb. Sequencing smaller genomes would highlight the problems with sequencing techniques, eventually refining the technology to be used on large-scale projects like H. sapiens. In addition, valuable insight concerning these organisms would be gained with the elucidation of their genetic makeup.

2 Review of the Department of Energy's Genomics: GTL Program. National Research Council (U.S.), Committee on Review of the Department of Energy's Genomics: GTL Program. 2006. National Academies Press. New York
The following year saw the initiation of a plethora of ambitious sequencing proposals, the foremost being the introduction of the Human Genome Project in 1990. The U.S. Human Genome Project (HGP) is a joint effort of the Department of Energy and the National Institutes of Health that was designed as a three-step program to produce genetic maps, physical maps, and finally the complete nucleotide sequence map of the human chromosomes. The first two aims of the project are practically fulfilled and now the majority of work is concentrated on the exact nucleotide sequence of the human genome. In the wake of this pronouncement came the start of three projects aimed at elucidating the sequences of smaller model organisms, similar to S. cerevisiae in their academic utility, such as Escherichia coli, Mycoplasma capricolum, and Caenorhabditis elegans. It was hoped that these projects would increase the efficiency of sequencing, but unfortunately they fell short of this task.

Many anticipated that E. coli would be the first genome to be sequenced entirely, but to the shock of the science community, an outsider won the race for the first complete genome sequence of a free-living organism, Haemophilus influenzae. A team headed by J. Craig Venter from the Institute for Genomic Research (TIGR) and Nobel laureate Hamilton Smith of Johns Hopkins University sequenced the 1.8 Mb bacterium with new computational methods developed at TIGR's facility in Gaithersburg, Maryland. Previous sequencing projects had been limited by the lack of adequate computational approaches to assemble the large amount of random sequences produced by "shotgun" sequencing. In conventional sequencing, the genome is broken down laboriously into ordered, overlapping segments, each containing up to 40 Kb of DNA. These segments are "shotgunned" into smaller pieces and then sequenced to reconstruct the genome. Venter's team utilized a more comprehensive approach by "shotgunning" the entire 1.8 Mb H. influenzae genome. Previously, such an approach would have failed because the software did not exist to assemble such a massive amount of information accurately. Software developed by TIGR, called the TIGR Assembler, was up to the task, reassembling the approximately 24000 DNA fragments into the whole genome. After the H. influenzae genome was "shotgunned" and the clones purified sufficiently, the TIGR Assembler software required approximately 30 hours of central processing unit time on a SPARCenter 2000 containing half a gigabyte of RAM, testifying to the enormous complexity of the computation.

TIGR's dramatic leadership role in the field of genome sequencing was paralleled by the final completion of two of the largest genomic sequences, the bacterium E. coli K-12 and the yeast S. cerevisiae, in 1997. These projects were the culmination of over seven years of intensive work. The yeast genome was the final result of a tremendous international collaboration of more than 600 scientists from over 100 laboratories, representing the largest decentralised experiment in modern molecular biology. The final work represented the efforts of scientists from Japan, Europe, Canada, and the United States, producing the largest full-length sequence (12 Mb) ever done. In an incredible display of organizational mastery, only 3.4% of the total sequencing effort was duplicated among laboratories. The E. coli sequence was considerably smaller (4.6 Mb) but equally important in terms of experimental utility. E. coli is the preferred model in biochemical genetics, molecular biology, and biotechnology, and its genomic characterization will undoubtedly further research toward a more complete understanding of this important experimental, medical, and industrial organism.

By September 1997, thirteen genome sequences of free-living organisms had been completed, including the two largest, E. coli and yeast, and eleven other microbial genomes under the length of 4.2 Mb. Four other large-scale projects are in progress, including the sequencing of the nematode C. elegans, which is 71% completed (finished: 1998), the fruit fly Drosophila melanogaster, which is 6% completed (finished: 2006), the mouse, which has less than 1% finished (December 2007: only 20%), and the human, which is only 1.5% completed (current: 92%).

The rapid proliferation of biological information in the form of genome sequences has been the major factor in the creation of the field of bioinformatics, which focuses on the acquisition, storage, access, analysis, modelling, and distribution of the many types of information embedded in DNA sequences. This field will be challenged by the heightening demands of increased information on the algorithms currently utilized for sequence manipulation. The growing sequence knowledge of the human genome has been likened to the establishment of the periodic table in the 19th century. Just as past chemists systematically organized all elements in an array that captured their differences and similarities, the Human Genome Project will allow modern scientists to construct a biological periodic table relating units of nucleotides. This periodic table will not contain 100 elements, but 100000 genes, reflecting not their similarity in electronic configuration but their evolutionary and functional relationship. Bioinformatics will be the tool of the modern scientist in interpreting this periodic table of biological information.
Genome Assembly
Genome assembly refers to the process of taking a large number of short DNA sequences, all of which were generated by a shotgun sequencing project, and putting them back together to create a representation of the original chromosomes from which the DNA originated3. In a shotgun sequencing project, the entire DNA from a source (usually a single organism, anything from a bacterium to a mammal) is first fractured into millions of small pieces. These pieces are then "read" by automated sequencing machines, which can read up to 900 nucleotides or bases at a time. (The four bases are adenine, guanine, cytosine, and thymine, represented as AGCT.) A genome assembly algorithm works by taking all the pieces and aligning them to one another, and detecting all places where two of the short sequences, or reads, overlap. These overlapping reads can be merged together, and the process continues4.

Figure: Original DNA is broken into a collection of fragments. The ends of each fragment (drawn in green) are sequenced.

3 http://www.cbcb.umd.edu/research/assembly_primer.shtml (111008)
4 http://bioinfo.mbb.yale.edu/course/projects/final-4/ (111008)
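The align, detect-overlap and merge loop described in the Genome Assembly paragraph can be illustrated with a minimal sketch. It is a toy greedy merger and not the method of any assembler named in this report: the function names `overlap` and `greedy_assemble`, the `min_len` threshold and the three example reads are all assumptions made for illustration, and the sketch assumes error-free reads from a single strand with no repeats.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the best match is shorter than `min_len`
    (an illustrative threshold, not a standard value).
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap.

    Toy model: exact matches only, one strand, no handling of reads
    that lie entirely inside another read.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Three hypothetical reads sampled from the made-up sequence
    # ATGCGTACCTGAAGTC, given in scrambled order.
    print(greedy_assemble(["GTACCTGA", "CTGAAGTC", "ATGCGTAC"]))
```

The all-against-all comparison in each round is quadratic in the number of reads, which is one reason real assemblers index the reads instead of comparing every pair.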
Figure: The sequence reads are assembled together based on sequence similarity.

Assembly Statistics
The assembler relies on the basic assumption that two sequence reads (two strings of letters produced by the sequencing machine) that share the same string of letters originated from the same place in the genome (see pic above)5. Using such overlaps between the sequences, the assembler can join the sequences together in a manner similar to solving a jigsaw puzzle. It is important to note that the shotgun sequencing process is inherently "wasteful" as, due to the randomness of the shearing process, assembly is only possible once enough sequences are generated to cover the genome 8 to 10 times. Intuitively, this phenomenon can be understood by thinking of a sidewalk as it begins to rain. As raindrops fall randomly across the sidewalk, dry spots persist for quite a while, corresponding to regions of the genome that are not represented in the set of shotgun reads. Mathematically, this phenomenon was modelled by Eric Lander and Michael Waterman in 1988. They examined the correlation between the oversampling of the genome (coverage) and the number of contiguous pieces of DNA (contigs) that can be reconstructed by an idealized assembly program. The graph below shows a plot of the Lander-Waterman equation for a genome of 1Mbp (1000000 base pairs). Between 8 and 10-fold coverage the model predicts that most of the genome will be assembled into a small number of contigs (approx. 5 for a 1Mbp genome).

5 Bioinformatics for Geneticists: A Bioinformatics Primer for the Analysis of Genetic Data. Michael R. Barnes. 2007. John Wiley and Sons. London, UK.
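The Lander-Waterman relationship mentioned above can be evaluated directly. The sketch below uses the commonly quoted form of the result, expected contigs = N * exp(-c * sigma), where N is the number of reads, c = N * L / G is the coverage, and sigma = 1 - T / L accounts for the minimum overlap T needed to join two reads of length L. The read length of 500 bases and the zero minimum overlap are assumptions chosen for illustration; the report does not state which values were used for its plot.

```python
import math


def expected_contigs(genome_len: float, read_len: float,
                     coverage: float, min_overlap: float = 0.0) -> float:
    """Expected number of contigs under the Lander-Waterman model.

    N = coverage * G / L reads of length L are placed uniformly at random
    on a genome of length G; two reads join only if they overlap by at
    least `min_overlap` bases.
    """
    n_reads = coverage * genome_len / read_len
    sigma = 1.0 - min_overlap / read_len
    return n_reads * math.exp(-coverage * sigma)


if __name__ == "__main__":
    G = 1_000_000   # 1 Mbp genome, as in the text
    L = 500         # assumed read length (not given in the report)
    for c in range(1, 11):
        print(f"coverage {c:2d}x -> {expected_contigs(G, L, c):8.1f} expected contigs")
```

With these assumed values the model gives about 5 contigs at 8-fold coverage, in line with the figure quoted in the text. The count first rises with coverage (more reads, few overlaps) and then falls as reads begin to join.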
Figure: Lander-Waterman estimation of number of contigs w.r.t. genome coverage

Assembly Challenges6
Ideally, an assembly program should produce one contig for every chromosome of the genome being sequenced. In all but the simplest cases, however, many contigs are produced due to a combination of factors. Even at 8-10 fold coverage, there is a non-zero probability that some portion of the genome remains unsequenced, leading to the presence of gaps in the coverage. More importantly, however, is the fact that the distribution of the sheared fragments along the genome cannot be modelled as a perfect Poisson process. Sanger sequencing requires many copies of each fragment in order for the sequencing chemistry to be possible. Each shotgun fragment must be cloned, a procedure usually performed by inserting the fragment into the cell of the Escherichia coli bacterium (called a vector) and allowing this bacterium to grow, thereby replicating the fragment as E. coli replicates its own genome. In most genomes, certain regions are toxic to the E. coli bacterium.

6 http://www.cbcb.umd.edu/research/assembly_primer.shtml (111008)
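The claim that some bases remain unsequenced even at high coverage can be checked with a small simulation of the idealized case. This is a sketch under stated assumptions, not a model of any real project: reads are placed uniformly at random on a circular chromosome (so there are no end effects), every read has the same length, and cloning bias is ignored. The function name `uncovered_fraction` and all parameter values are hypothetical. Under these assumptions the uncovered fraction should be close to exp(-coverage).

```python
import math
import random


def uncovered_fraction(genome_len: int, read_len: int,
                       coverage: float, seed: int = 0) -> float:
    """Fraction of bases hit by no read when reads fall uniformly at random.

    Assumes a circular genome and read_len <= genome_len.
    """
    rng = random.Random(seed)
    n_reads = round(coverage * genome_len / read_len)
    diff = [0] * (genome_len + 1)        # difference array of read depth
    for _ in range(n_reads):
        start = rng.randrange(genome_len)
        end = start + read_len
        if end <= genome_len:
            diff[start] += 1
            diff[end] -= 1
        else:                            # read wraps around the origin
            diff[start] += 1
            diff[genome_len] -= 1
            diff[0] += 1
            diff[end - genome_len] -= 1
    depth = 0
    uncovered = 0
    for pos in range(genome_len):
        depth += diff[pos]
        if depth == 0:
            uncovered += 1
    return uncovered / genome_len


if __name__ == "__main__":
    for c in (1, 2, 4, 8, 10):
        simulated = uncovered_fraction(100_000, 500, c)
        print(f"{c:2d}x  simulated {simulated:.5f}  exp(-c) {math.exp(-c):.5f}")
```

Real libraries deviate from this idealized picture in the direction the text describes: regions that clone poorly are under-represented, so the actual gaps are larger than the Poisson estimate suggests.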
The ability of an assembly program to produce a single contig is also limited by regions of the genome that occur in multiple near-identical copies throughout the genome (repeats). The reads originating from different copies of a repeat appear identical to the assembler and cause assembly errors. A simple example:

Figure: Two copies of a repeat along a genome. The reads coloured in red and those coloured in yellow appear identical to the assembly program.

Figure: Genome misassembled due to a repeat. The assembly program incorrectly combined the reads from the two copies of the repeat, leading to the creation of two separate contigs.

Assembly software
Originally, most large-scale DNA sequencing centres developed their own software for assembling the sequences that they produced. However, this has changed as the software has grown more complex and as the number of sequencing centres has increased. Among the list of available assemblers are:
1. Phrap - assembly program developed at the University of Washington. Despite its age, phrap is one of the most widely used assembly programs. Most notably, phrap was the main workhorse in the public effort to sequence the human genome.
2. The Celera Assembler - assembly program developed at Celera Genomics. Celera Assembler demonstrated the applicability of the shotgun method to the assembly of a whole eukaryotic genome by successfully assembling the genome of the fruit fly Drosophila melanogaster. Celera Assembler was a key element in the successful assembly of the human genome by Celera Genomics and is currently used in numerous bacterial and eukaryotic projects.
3. AMOS (A Modular, Open-Source assembler)7 is a well-known open source effort to bring together the efforts of leading genome assembly code developers. AMOS was initiated at The Institute for Genomic Research by Steven Salzberg, Mihai Pop, and Art Delcher, who are now at the University of Maryland.
4. TIGR Assembler - assembly program developed at the Institute for Genomic Research (TIGR), used throughout the years in the assembly of many bacterial and eukaryotic genomes. This assembler was used to generate the first sequence of a free-living organism, Haemophilus influenzae, an accomplishment reported in the journal Science in 1995.
5. The Arachne - program developed at the Broad Institute of MIT, widely used in genome projects both at the Broad Institute and other research organizations. Arachne and Celera Assembler are arguably the best assemblers available to the scientific community for the assembly of large eukaryotic genomes.

7 http://amos.sourceforge.net/ (111008)
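The repeat problem described before the software list can be made concrete with a short sketch. The sequences below are invented for illustration, and the helper names `suffix_prefix_overlap` and `successors` are hypothetical. The point is only that a read ending inside a repeat overlaps equally well with reads leaving either copy of that repeat, so overlap information alone cannot tell the assembler which continuation is correct.

```python
def suffix_prefix_overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def successors(name: str, reads: dict[str, str], min_len: int) -> list[str]:
    """Names of reads that could follow `name` with an overlap >= min_len."""
    return [other for other, seq in reads.items()
            if other != name
            and suffix_prefix_overlap(reads[name], seq) >= min_len]


# Made-up genome layout: unique1 + REPEAT + unique2 + REPEAT + unique3
REPEAT = "GGCATTCAGG"
reads = {
    "into_copy1": "ACGTAC" + REPEAT,     # enters the first repeat copy
    "out_of_copy1": REPEAT + "TTGACA",   # leaves the first repeat copy
    "into_copy2": "TTGACA" + REPEAT,     # enters the second repeat copy
    "out_of_copy2": REPEAT + "CCATGA",   # leaves the second repeat copy
}

if __name__ == "__main__":
    # Both exits look like valid continuations of the read entering copy 1.
    print(successors("into_copy1", reads, min_len=len(REPEAT)))
```

Choosing the wrong successor here would join the sequence before copy 1 directly to the sequence after copy 2, which is the kind of misassembly shown in the figure.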
Genome Annotation
Genome annotation is the process of attaching biological information to sequences. It consists of two main steps:
1. identifying elements on the genome, a process called gene finding, and
2. attaching biological information to these elements.

Automatic annotation tools try to perform all this by computer analysis, as opposed to manual annotation (also called curation), which involves human expertise. Ideally, these approaches co-exist and complement each other in the same annotation pipeline.

The basic level of annotation is using BLAST for finding similarities, and then annotating genomes based on that. However, nowadays more and more additional information is added to the annotation platform. The additional information allows manual annotators to deconvolute discrepancies between genes that are given the same annotation. For example, the SEED database uses genome context information, similarity scores, experimental data, and integrations of other resources to provide the most accurate genome annotations through their Subsystems approach. The Ensembl database relies on both curated data sources and a range of different software tools in its automated genome annotation pipeline8.

Structural annotation consists in the identification of genomic elements:
• Open reading frames (ORFs) - portions of an organism's genome which contain a sequence of bases that could potentially encode a protein - and their localisation,
• gene structure,
• coding regions,
• location of regulatory motifs.

Functional annotation consists in attaching biological information to genomic elements:
• biochemical function
• biological function
• involved regulation and interactions
• expression

These steps may involve both biological experiments and in silico (performed on computer or via computer simulation) analysis. A variety of software tools have been developed to permit scientists to view and share genome annotations9.

Genome annotation is the next major challenge for the Human Genome Project, now that the genome sequences of human and several model organisms are largely complete. Identifying the locations of genes and other genetic control elements is often described as defining the biological "parts list" for the assembly and normal operation of an organism. Scientists are still at an early stage in the process of delineating this parts list and in understanding how all the parts "fit together".

Genome annotation is an active area of investigation and involves a number of different organizations in the life science community which publish the results of their efforts in publicly available biological databases accessible via the web and other electronic means:
1. Encyclopedia of DNA Elements (ENCODE) - aims to identify all functional elements in the human genome sequence. The pilot phase of the project is focused on a specified 30 megabases (about 1%) of the human genome sequence and is organized as an international consortium of computational and laboratory-based scientists working to develop and apply high-throughput approaches for detecting all sequence elements that confer biological function. The results of this pilot phase will guide future efforts to analyze the entire human genome.
2. Ensembl - joint project between EMBL-EBI and the Sanger Institute to develop a software system which produces and maintains automatic annotation on selected eukaryotic genomes.
3. Gene Ontology Consortium - collaborative effort to address the need for consistent descriptions of gene products in different databases. Since 1998, the GO Consortium has grown to include many databases, including several of the world's major repositories for plant, animal and microbial genomes.
4. RefSeq - non-redundant collection of richly annotated DNA, RNA, and protein sequences from diverse taxa. The collection includes sequences from plasmids, organelles, viruses, archaea, bacteria, and eukaryotes. Each RefSeq represents a single, naturally occurring molecule from one organism.
5. Uniprot - provides the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information.
6. Vertebrate and Genome Annotation Project (Vega) - central repository for high quality, frequently updated, manual annotation of vertebrate finished genome sequence.

8 http://www.seedling.org/IJDC/DB/ (121008)
9 http://insilico.ehu.es/ (121008)
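The open reading frame search listed under structural annotation can be sketched in a few lines. This is a deliberately naive illustration and not the algorithm of any annotation pipeline named above: it scans only the three forward reading frames, treats ATG as the only start codon, and reports the first ATG in each open stretch. The function name `find_orfs`, the `min_codons` cut-off and the example sequence are assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int, int]]:
    """Find ATG...stop stretches on the forward strand.

    Returns (start, end, frame) tuples with 0-based `start`, exclusive
    `end` that includes the stop codon, and frame 0, 1 or 2.
    `min_codons` counts the codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # close it and keep scanning
    return orfs


if __name__ == "__main__":
    example = "CCATGAAACCCGGGTAGTT"            # made-up sequence
    for start, end, frame in find_orfs(example):
        print(frame, start, end, example[start:end])
```

A fuller version would also scan the reverse complement, and real gene finders add statistical models of coding sequence, which is part of why prediction-based and homology-based evidence are combined in practice.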
Plant Genome Sequencing Project – Rice
Rice is a cereal foodstuff which forms an important part of the diet of many people worldwide, and as such it is a staple food for many. There are two species of domesticated rice in the Poaceae (“true grass”) family:
Oryza sativa: native to tropical and subtropical southern Asia.
Oryza glaberrima (African rice): native to West Africa.
Besides domesticated rice there is also wild rice. This name is usually used for species of the different but related genus Zizania, both wild and domesticated, although the term may also be used for primitive or uncultivated varieties of Oryza.

Rice’s life10
Rice is grown as a monocarpic annual plant, although in tropical areas it can survive as a perennial, produce a ratoon crop and survive for up to 20 years. It can grow to 1-1.8 m tall, occasionally more depending on the variety and soil fertility. It has long, slender leaves 50-100 cm long and 2-2.5 cm broad. The small wind-pollinated flowers are produced in a branched, arching to pendulous inflorescence 30-50 cm long. The edible seed is a grain 5-12 mm long and 2-3 mm thick.

Potential of rice
Improve nutrition
Boost food security
Foster rural development
Support sustainable land care

Rice is a staple food for a large part of the human population, especially in East Asia, South Asia, Southeast Asia, Latin America and Africa, making it the second-most consumed cereal grain. It provides more than one fifth of the calories consumed worldwide by humans. In early 2008, some governments and retailers began rationing supplies of the grain due to fears of a global rice shortage.

10 Cereal Genomics. P. K. Gupta, Rajeev K. Varshney. P. K. Gupta (contributor). Springer. 2004. London, UK
Rice cultivation is well-suited to countries and regions with low labour costs and high rainfall, as it is very labour-intensive to cultivate and requires plenty of water. Rice can be grown practically anywhere, even on a steep hill or mountain, although its species are native to South Asia and certain parts of Africa.

How rice is cultivated
The traditional method is to flood the fields while, or after, setting the young seedlings. This method requires sound planning and servicing of the water damming and channelling, but it reduces the growth of less robust weed and pest plants that have no submerged growth state, and deters vermin. Flooding is not mandatory for growing rice, but all other methods of irrigation require higher effort in weed and pest control during growth periods and a different approach for fertilizing the soil.

Genetic History
As noted above, two species of rice were domesticated, O. sativa and O. glaberrima. According to Londo and Chiang, O. sativa appears to have been domesticated from the crop wild relative Oryza rufipogon around the foothills of the Himalayas, with O. sativa var indica on the Indian side and O. sativa var japonica on the Chinese and Japanese side. Current genetic analysis suggests that O. sativa is best divided into five groups, labelled:
indica
aus
aromatic
temperate japonica
tropical japonica
Other studies have suggested that there are three groups of Oryza sativa cultivars: the short-grained “japonica” or “sinica” varieties, exemplified by Japanese rice; the long-grained “indica” varieties, exemplified by Basmati rice; and the broad-grained “javonica” varieties, which thrive under tropical conditions. Further analysis of the genetic material of various types of rice indicates the order in which the cultivars emerged, starting with temperate japonica and tropical japonica, followed by indica, aus and aromatic, whose genomes did show significant differences in age.

Environmental Effect
In many countries where rice is the main cereal crop, rice cultivation is responsible for most of the methane emissions. Rice also requires much more water to produce than other grains. Farmers in some of the arid regions try to cultivate rice using groundwater bored through pumps, thus increasing the chances of famine in the long run. On the other hand, mechanized cultivation is extremely oil-intensive, more than other food products with the exception of beef and dairy products. As sea levels rise, rice will become more inclined to remain flooded for longer periods of time11. Longer stays in water cut the soil off from atmospheric oxygen and cause fermentation of organic matter in the soil. During the wet season, rice cannot hold the carbon in anaerobic conditions. The microbes in the soil convert the carbon into methane, which is then released through the respiration of the rice plant or through diffusion of water. A further rise in sea level of 10-85 centimetres would then stimulate the release of more methane into the air by rice plants. Methane is twenty times more effective as a greenhouse gas than carbon dioxide is.

Pests and Disease
Rice pests are any organisms or microbes with the potential to reduce the yield or value of the rice crop. Rice pests include weeds, pathogens, insects, rodents and birds. A variety of factors can contribute to pest outbreaks, including the overuse of pesticides and high rates of nitrogen fertilizer application. Weather conditions also contribute to pest outbreaks.

Major rice pests include:
The brown planthopper
Armyworms
The green leafhopper
The rice gall midge
The rice bug
Hispa
The rice leaffolder
Stemborer
Rats
The weed Echinochloa crus-galli
Rice weevils are also known to be a threat to rice crops in the US, PR China and Taiwan.

One of the challenges facing crop protection specialists is to develop rice pest management techniques which are sustainable; in other words, to manage crop pests in such a manner that future crop production is not threatened. Rice pests are managed by cultural techniques, pest-resistant rice varieties and pesticides. Increasingly, there is evidence that farmers’ pesticide applications are often unnecessary. By reducing the populations of natural enemies of rice pests, misuse of insecticides can actually lead to pest outbreaks12. Botanicals, so-called “natural pesticides”, are used by some farmers in an attempt to control rice pests, but in general the practice is not common. Upland rice is grown without standing water in the field. Some upland rice farmers in Cambodia spread chopped leaves of the bitter bush over the surface of fields after planting. The practice probably helps the soil retain moisture and thereby facilitates seed germination. Farmers also claim the leaves are a natural fertilizer and help suppress weed and insect infestations.

Among rice cultivars there are differences in the responses to, and recovery from, pest damage. Therefore, particular cultivars are recommended for areas prone to certain pest problems. The genetically based ability of a rice variety to withstand pest attack is called resistance. Three main types of plant resistance to pests are recognized:
Nonpreference: host plants which insects prefer to avoid
Antibiosis: where insect survival is reduced after the ingestion of host tissues
Tolerance: the capacity of a plant to produce high yield or retain high quality despite insect infestation.
Over time, the use of pest-resistant rice varieties selects for pests that are able to overcome these mechanisms of resistance. When a rice variety is no longer able to resist pest infestations, resistance is said to have broken down. Rice varieties that can be widely grown for many years in the presence of pests, and retain their ability to withstand the pests, are said to have durable resistance.

11 http://www.fao.org/rice2004/en/rice4.htm (121008)
12 Pest Management of Rice Farmers in Asia. Kong Luen Heong, M. M. Escalada. 1997. International Rice Research Institute. Manila, Philippines.
Major rice diseases:
Rice ragged stunt
Sheath blight
Tungro
Rice blast, caused by the fungus Magnaporthe grisea, is the most significant disease affecting rice cultivation.

Cultivars
While most breeding of rice is carried out for crop quality and productivity, there are varieties selected for other reasons. Cultivars exist that are adapted to deep flooding, and these are generally called “floating rice”. The largest collection of rice cultivars is at the International Rice Research Institute (IRRI), with over 100,000 rice accessions held in the International Rice Genebank13. Rice cultivars are often classified by their grain shapes and texture. For example:
Thai Jasmine rice: long-grain and relatively less sticky, as long-grain rice contains less amylopectin than short-grain cultivars. Chinese restaurants usually serve long-grain rice as plain unseasoned steamed rice.
Japanese mochi rice & Chinese sticky rice: short-grain. Chinese people use sticky rice, which is properly known as “glutinous rice”, to make zongzi.
Japanese table rice: sticky, short-grain rice.
Japanese sake rice: another kind as well.
Indian rice: long-grained and aromatic Basmati
Patna rice: long and medium-grained
Sona masoori: short-grained
Ponni: grown in the delta regions of Kaveri River.
Ambemohar: fragrance of Mango blossom
Aromatic rices: have definite aromas and flavours
a) Thai fragrant rice
b) Patna rice
c) Basmati - has a mild popcorn-like aroma and flavour
d) Texmati

Biotechnology
High Yielding Varieties
The high yielding varieties are a group of crops created intentionally during the Green Revolution to increase global food production. Rice, like corn and wheat, was genetically manipulated to increase its yield. This project enabled labour markets in Asia to shift away from agriculture and into industrial sectors.

13 http://beta.irri.org/statistics/index.php?option=com_frontpage&Itemid=1 (131008)
The first “modern rice”, IR8, was produced in 1966 at the International Rice Research Institute, which is based in the Philippines at the University of the Philippines’ Los Baños site. IR8 was created through a cross between an Indonesian variety named “Peta” and a Chinese variety named “Dee Geo Woo Gen”.

Potential for the Future
As the UN Millennium Development project seeks to spread global economic development to Africa, the ‘Green Revolution’ is cited as the model for economic development. With the intent of replicating the successful Asian boom in agronomic productivity, groups like the Earth Institute are doing research on African agricultural systems, hoping to increase productivity. An important way this can happen is the production of ‘New Rices for Africa’ (NERICA). These rices, selected to tolerate the low input and harsh growing conditions of African agriculture, are produced by the African Rice Center and billed as technology from Africa, for Africa. The NERICA have appeared in The New York Times (October 10, 2007) and International Herald Tribune (October 9, 2007), trumpeted as miracle crops that will dramatically increase rice yield in Africa and enable an economic resurgence.

Golden Rice
German and Swiss researchers have engineered rice to produce beta-carotene, with the intent that it might someday be used to treat vitamin A deficiency. The addition of the carotene turns the rice gold. Additional efforts are being made to improve the quantity and quality of other nutrients in golden rice.

Expression of Human Protein
Ventria Bioscience has genetically modified rice to express lactoferrin, lysozyme, and human serum albumin, which are proteins usually found in breast milk. These proteins have antiviral, antibacterial, and antifungal effects. Rice containing these added proteins can be used as a component in oral rehydration solutions, which are used to treat diarrheal diseases, thereby shortening their duration and reducing recurrence. Such supplements may also help reverse anemia.

Genome Project
Rice (Oryza sativa) is a model species for monocotyledonous plants, especially for members in the grass family. Several attributes such as small genome size, diploid nature, transformability, and establishment of genetic and molecular resources make it a tractable organism for plant biologists. With an estimated genome size of 430 Mb, it is feasible to obtain the complete genome sequence of rice using current technologies. An international effort has been established and is in the process of sequencing O. sativa ssp. japonica var "Nipponbare" using a bacterial artificial chromosome/P1 artificial chromosome shotgun sequencing strategy. Annotation of the rice genome is performed using prediction-based and homology-based searches to identify genes. Annotation tools such as optimized gene prediction programs are being developed for rice to improve the quality of annotation. Resources are also being developed to leverage the rice genome sequence to partial genome projects such as expressed sequence tag projects, thereby maximizing the output from the rice genome project. To provide a low level of annotation for rice genomic sequences, we have aligned all rice bacterial artificial chromosome/P1 artificial chromosome sequences with The Institute for Genomic Research Gene Indices that are a
set of nonredundant transcripts that are generated from nine public plant expressed sequence tag projects (rice, wheat, sorghum, maize, barley, Arabidopsis, tomato, potato, and barrel medic). In addition, data from The Institute for Genomic Research Gene Indices and the Arabidopsis and Rice Genome Projects was used to identify putative orthologues and paralogues among these nine genomes.

Rice is the first crop plant whose complete genetic sequence, or genome, has been compiled and placed in computer data banks around the world. It will be a key tool for researchers working on improved strains of rice and other grains as they struggle to stay ahead of human population growth.
"This is really a project that can lead to important discoveries and findings that can help the condition of the poor," said Rod Wing, a scientist at the University of Arizona who was a key participant in the rice project. "The poorest of the poor are the ones that depend on rice the most. You could equate this to being as important as the Human Genome Project," which recently compiled a human genetic map.14
The number of people in the world is expected to increase 50 percent, to 9 billion, by the middle of this century. Much of that growth will come in Asian countries where rice is the dietary staple.
The new map will make it possible, in theory, to perform sophisticated genetic manipulations of the rice plant, including introducing genes from other species to create desirable traits. For example, one project introduced a daffodil gene into rice to turn the plant into a source of vitamin A, which it normally lacks. But that kind of work has been controversial, and how many countries will embrace it remains to be seen.

14 Washington Post, 11 August 2005
More important in the short term, completion of the rice genome is expected to speed conventional breeding programs, scientists said, allowing researchers to produce rice strains that resist drought and disease and that grow in colder climates and at higher elevations. Those are critical needs as Asia's rapid urbanization reduces the land available for rice cultivation. Already, using the new map, researchers in Japan are hot on the trail of genetic variations that might allow rice to grow in colder climates, while the Rockefeller Foundation is funding work in the Philippines and other countries on strains that could yield enough even in drought years to keep a farm family from starving.
Rice is the principal source of calories for about half the world's population, and the United Nations Food and Agriculture Organization projects that demand will rise sharply in coming decades. Rice is a minor component of most diets in the developed world but it supplies most daily calories for people in Asia who remain in poverty. It is critically important to poor people in Latin America, and its importance is rising rapidly in urban Africa, where it is being embraced as easier to prepare than many traditional African foods. Cheaper, more abundant rice is seen as one of the keys to reducing hunger worldwide.
The International Rice Genome Sequencing Project began in 1998. It was led by scientists in Japan but involved teams from the United States, China, Taiwan, Korea, India, Thailand, France, Brazil and Britain. It cost more than $100 million. A lot of the work was done in Rockville at the Institute for Genomic Research, an independent genetics laboratory founded by maverick scientist J. Craig Venter. The Rockefeller Foundation of New York, which for decades has funded research aimed at feeding the world, helped get the project off the ground. Two Western agricultural companies, Monsanto Co. of St. Louis and Syngenta AG of Basel, Switzerland, contributed genetic information that moved up completion of the project by at least a year.
Seed rice is not a major product for companies like Monsanto and Syngenta, but the plant is vitally important to them nonetheless. It is a crucial model for understanding the biology of all cereals. The great cereals whose cultivation made human civilization possible (rice, wheat and corn are the most important) descended from a common ancestor, a wild grass that lived more than 50 million years ago. That makes the cereals close genetic relatives, and rice, with the smallest genome, proved to be the easiest to analyze. Building on their success with rice, scientists are about to tackle the far larger genome of corn, the most important commodity in U.S. agriculture.
Companies like Syngenta and Monsanto have brought genetically modified strains of corn and other crops to market. They have been embraced by U.S. farmers, but vociferously rejected by consumers in Europe, Japan and other places. Availability of the rice genome will make such genetic manipulation easier in all the cereals but, by giving scientists more precise knowledge of how the plants work, it may also help to reduce some of the theoretical risks that have led to controversy.
Scientists now have a rice genome with but a few gaps. It is a map of the Nipponbare strain of white rice grown in Japan, though the many other strains of rice (red rice, brown rice, basmati rice, purple rice) are expected to be similar. While the map is an important achievement, it also means that an immense new task opens before the world's plant biologists. They need to learn to read the genetic messages and understand how the proteins in rice interact with one another, which is likely to take decades.
"Our work is not over," said Takuji Sasaki, vice president of the National Institute of Agrobiological Sciences in Tsukuba, Japan, and principal leader of the rice genome project. "It's just starting."

Issues and Controversies in Plant Genome Project
Safety
• Potential human health impacts, including allergens, transfer of antibiotic resistance markers, unknown effects.
• Potential environmental impacts, including: unintended transfer of transgenes through cross-pollination, unknown effects on other organisms (e.g., soil microbes), and loss of flora and fauna biodiversity.
Access and Intellectual Property
• Domination of world food production by a few companies.
• Increasing dependence on industrialized nations by developing countries.
• Biopiracy or foreign exploitation of natural resources.
Ethics
• Violation of natural organisms' intrinsic values.
• Tampering with nature by mixing genes among species.
• Objections to consuming animal genes in plants and vice versa.
• Stress for animal.
Labeling
• Not mandatory in some countries (e.g., United States)
• Mixing GM crops with non-GM products confounds labeling attempts
Society
• New advances may be skewed to interests of rich countries
Animal Genome Sequencing Project – Domestic Mouse
Introduction
Mus musculus may have originally been distributed from the Mediterranean region to China, but it has now been spread throughout the world by humans and lives as a human commensal. House mice generally live in close association with humans - in houses, barns, granaries, etc. They also occupy cultivated fields, fencerows, and even wooded areas, but they seldom stray far from buildings. Some individuals spend the summer in fields and move into barns and houses with the onset of cool autumn weather. Because of their association with humans, house mice have been able to inhabit inhospitable areas (such as tundra and desert) which they would not be able to occupy independently.
House mice are from 65 to 95 mm long from the tip of their nose to the end of their body; their tails are 60 to 105 mm long. They range from 12 to 30 g in weight. Their fur ranges in colour from light brown to black, and they generally have white or buffy bellies. They have long tails that have very little fur and have circular rows of scales (annulations). House mice tend to have longer tails and darker fur when living closely with humans. Many domestic forms of mice have been developed that vary in colour from white to black and with spots.
Mus musculus is characterized by tremendous reproductive potential. House mice have a polygynous mating system. The recent discovery of ultrasonic songs produced by male mice, when exposed to female sex pheromones, suggests that this behavior may be involved in mate choice. Breeding occurs throughout the year, although wild mice may have a reproductive season extending only from April to September. The estrous cycle is 4-6 days long, with estrus lasting less than a day. Females experience a postpartum estrus 12-18 hours after giving birth. Females generally have 5-10 litters per year if conditions are suitable, but as many as 14 have been reported. Gestation is 19-21 days but may be extended by several days if the female is lactating. Litters consist of 3-12 (generally 5 or 6) offspring, which are born naked and blind. They are fully furred after 10 days, open their eyes at 14 days, are weaned at 3 weeks, and reach sexual maturity at 5-7 weeks. Young mice are cared for in their mother's nest until they reach 21 days old. Soon after this most young mice leave their mother's territory, though young females are more likely to stay nearby.
Average life span is about 2 years in captivity, but individuals have lived for as long as 6 years. Wild-derived captive Mus musculus individuals have lived up to 4 years in captivity, but mutant and calorie-restricted captive individuals have lived for as long as 5 years. If a house mouse is a pet, the average life span is about 2 years. In the wild, most mice do not live beyond 12-18 months.

Behaviour
In the wild state, house mice generally dwell in cracks in rocks or walls or make underground burrows consisting of a complex network of tunnels, several chambers for nesting and storage, and three or four exits. When
living with humans, house mice nest behind rafters, in woodpiles, storage areas, or any hidden spot near a source of food. They construct nests from rags, paper, or other soft substances and line them with finer shredded material. House mice are generally nocturnal, although some are active during the day in human dwellings. House mice are quick runners (up to 8 miles per hour), good climbers, jumpers, and also swim well. Despite this, they rarely travel more than 50 feet from their established homes.
Mus musculus is generally considered both territorial and colonial when living commensally with humans. Territoriality is not as pronounced in wild conditions. Dominant males set up a territory including a family group of several females and their young. Occasionally, subordinate males may occupy a territory or males may share territories. Females establish a loose hierarchy within the territories, but they are far less aggressive than males. Aggression within family groups is rare, but all the individuals in a territory will defend an area against outsiders. Young mice are generally made to disperse through adult aggression, although some (especially females) may remain in the vicinity of their parents.

Communication and Perception
House mice have excellent vision and hearing, a keen sense of smell, and use their whiskers to feel air movements and surface textures. House mice often squeak to each other in the nest. They use pheromones and other smells to communicate with each other about social dominance, family composition, and reproductive readiness. It was recently discovered that male mice produce complex, ultrasonic songs in response to female sex pheromones.

Food Habit
In the wild, house mice eat many kinds of plant matter, such as seeds, fleshy roots, leaves and stems. Insects (beetle larvae, caterpillars, and cockroaches) and meat (carrion) may be taken when available. In human habitation, Mus musculus consumes any human food that is accessible as well as glue, soap, and other household materials. Many mice store their food or live within a human food storage facility.

Predation
House mice are eaten by a wide variety of small predators throughout the world, including cats, foxes, weasels, ferrets, mongooses, large lizards, snakes, hawks, falcons, and owls. House mice try to avoid predation by keeping out of the open and by being fast. They are also capable of reproducing very rapidly, which means that populations can recover quickly from predation.

Ecosystem Roles
Where house mice are abundant they can consume huge quantities of grains, making these foods unavailable to other (perhaps native) animals. House mice are also important prey items for many small predators.

Economic Importance for Humans
House mice do not cause such serious health and economic problems as do Rattus norvegicus and Rattus rattus. Mice are agricultural pests in some areas, however, and they do consume and contaminate stored human food with their droppings. They also destroy woodwork, furniture, upholstery, and clothing. In addition, they contribute to the spread of diseases such as murine typhus, rickettsial pox, tularemia, food poisoning (Salmonella), and bubonic plague. Recent research has also shown that they carry a virus - the mouse mammary tumour virus (MMTV)15 - that may contribute to breast cancer in humans.

15 The Mouse in Animal Genetics and Breeding Research. Eugene J. Eisen (contributor). 2005. Imperial College Press. London.

Domesticated forms and albinos have been developed which are commonly used as laboratory animals (especially in medicine and
genetics), and as pets. Mus musculus also has a small role as an insect destroyer, but this is minimal.
"Dancing" and "singing" mice are other names for house mice. The former refers to a genetic strain with inner ear defects, causing the mice to weave, turn in circles, and wobble when they walk. The latter refers to a pathological condition causing mice to twitter constantly with a "song" resembling that of a cricket.
Mus musculus often refers to several fairly distinct kinds of mice. As many as seven separate species may be placed under Mus musculus, such as Mus domesticus, western European house mice, and Mus castaneus, southeastern Asian house mice.

Genome Project
Sequencing of the mouse genome was completed in late 2002. The haploid genome is about 3 billion bases long (3000 Mb distributed over 20 chromosomes) and is therefore roughly comparable in size to the human genome. Estimating the number of genes contained in the mouse genome is difficult, in part because the definition of a gene is still being debated and extended. The current estimated gene count is 23,786. This estimate takes into account knowledge of molecular biology as well as comparative genomic data. For comparison, humans are estimated to have 23,686 genes.
Researchers report that approximately 99 percent of mouse genes have counterparts in humans, although the mouse genome is fourteen percent smaller than the human genome. Virtually every gene in the mouse is also present in humans, and the neighbourhoods in which these genes reside are strikingly similar in humans and mice. The mouse genome is essentially a reference manual for understanding the human genome. Although both man and mouse share genes, they also share 'nongene' regions that may regulate genes, and these could be critical to understanding why humans develop certain diseases16.
The genes in humans and mice are essentially the same genes: they were inherited from a common mammalian ancestor millions of years ago. However, evolution changes genomes through the duplication and specialisation of genes. In fact, though many human and mouse genes appear to be similar, they may have taken on slightly different roles, or be active at different times during the life of a person or a mouse. Comparing humans and mice has the potential to reveal key features of mammalian biology, and more insights will emerge as more genomes are completed17.
Researchers state that having a publicly available mouse genome sequence draft means we can move from knowing that a general region of the genome is contributing to a disease state or biological process, to actually looking at that region and seeing directly what genes are there. It will save investigators months, if not years, of gene-hunting effort.

16 Nature 5 420(6915):520-62 (2002)
17 Digital Code of Life: How Bioinformatics is Revolutionizing Science, Medicine, and Business. Glyn Moody. 2004. John Wiley and Sons. New York.
Human Genome Sequencing Project
Introduction
Cells are the fundamental working units of every living system. All the instructions needed to direct their activities are contained within the chemical DNA (deoxyribonucleic acid). DNA from all organisms is made up of the same chemical and physical components. The DNA sequence is the particular side-by-side arrangement of bases along the DNA strand (e.g., ATTCCGGA). This order spells out the exact instructions required to create a particular organism with its own unique traits.
The genome is an organism’s complete set of DNA. Genomes vary widely in size: the smallest known genome for a free-living organism (a bacterium) contains about 600,000 DNA base pairs, while human and mouse genomes have some 3 billion. Except for mature red blood cells, all human cells contain a complete genome.
DNA in the human genome is arranged into 24 distinct chromosomes: physically separate molecules that range in length from about 50 million to 250 million base pairs. A few types of major chromosomal abnormalities, including missing or extra copies or gross breaks and rejoinings (translocations), can be detected by microscopic examination. Most changes in DNA, however, are more subtle and require a closer analysis of the DNA molecule to find perhaps single-base differences.
Each chromosome contains many genes, the basic physical and functional units of heredity. Genes are specific sequences of bases that encode instructions on how to make proteins. Genes comprise only about 2% of the human genome; the remainder consists of non-coding regions, whose functions may include providing chromosomal structural integrity and regulating where, when, and in what quantity proteins are made. The human genome is estimated to contain 20,000-25,000 genes.
Although genes get a lot of attention, it’s the proteins that perform most life functions and even make up the majority of cellular structures. Proteins are large, complex molecules made up of smaller subunits called amino acids. Chemical properties that distinguish the 20 different amino acids cause the protein chains to fold up into specific three-dimensional structures that define their particular functions in the cell.
The constellation of all proteins in a cell is called its proteome. Unlike the relatively unchanging genome, the dynamic proteome changes from minute to minute in response to tens of thousands of intra- and extracellular environmental signals. A protein’s chemistry and behaviour are specified by the gene sequence and by the number and identities of other proteins made in the same cell at the same time and with which it associates and reacts. Studies to explore protein structure and activities, known as proteomics, will be the focus of much research for decades to come and will help elucidate the molecular basis of health and disease.

Whose genome was sequenced in the public (HGP) and private projects?
The human genome reference sequences do not represent any one person’s genome. Rather, they serve as a starting point for broad comparisons across humanity. The knowledge obtained from the sequences applies to everyone because all humans share the same basic set of genes and genomic regulatory regions that control the development and maintenance of their biological structures and processes.
In the international public-sector Human Genome Project (HGP), researchers collected blood (female) or sperm (male) samples from a large number of donors. Only a few samples were processed as DNA resources. Thus donors' identities were protected so neither they nor scientists could know whose DNA was sequenced. DNA clones from many libraries were used in the overall project.
Technically, it is much easier to prepare DNA cleanly from sperm than from other cell types because of the much higher ratio of DNA to protein in sperm and the much smaller volume in which purifications can be done. Sperm contain all chromosomes necessary for study, including equal numbers of cells with the X (female) or Y (male) sex chromosomes. However, HGP scientists also used white cells from female donors' blood to include samples originating from women.
Many polymorphisms - small regions of DNA that vary among individuals - also were identified during the HGP, mostly single nucleotide polymorphisms (SNPs). Most SNPs have no physiological effect, although a minority contributes to the beneficial diversity of humanity. A much smaller minority of polymorphisms affect an individual’s susceptibility to disease and response to medical treatments.

Who sequenced the human genome?
Human Genome Project research was funded at many laboratories across the U.S. by the Department of Energy (DOE), the National Institutes of Health (NIH), or both. At any given time, the DOE Human Genome Project has funded about 100 principal investigators. Other researchers at numerous colleges, universities, and laboratories throughout the United States also have received DOE and NIH funding for human genome research18. In addition, many large and small private U.S. companies are conducting genome research. At least 18 other countries have participated in the Human Genome Project.

18 http://www.energy.gov/sciencetech/genome.htm (181008)

[Figure: Sets of human chromosomes]

Mapping the Genome
One of the central goals of the Human Genome Project is to produce a detailed "map" of the human genome. One type of map, a genetic linkage map, is based on careful analyses of human inheritance patterns. It indicates for each chromosome the whereabouts of genes or other "heritable markers," with distances measured in centimorgans, a measure of recombination frequency. During the formation of sperm and egg cells, a process of genetic recombination - or "crossing over" - occurs in which pieces of genetic material are swapped between paired chromosomes.
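The idea of measuring map distance by recombination frequency can be made concrete with a short calculation. The sketch below is an illustration only and is not taken from the Human Genome Project. It assumes the common convention that a recombination frequency of 1 percent corresponds to 1 centimorgan, which is a reasonable approximation only for closely linked markers. The function name and the offspring counts are hypothetical.

```python
def map_distance_cm(recombinant_offspring: int, total_offspring: int) -> float:
    """Estimate the genetic distance between two markers in centimorgans.

    Assumption: 1% recombination frequency is treated as 1 cM. This holds
    approximately for closely linked markers and underestimates the
    distance for markers that are far apart.
    """
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinant_offspring <= total_offspring:
        raise ValueError("recombinant_offspring must be between 0 and total_offspring")
    # Recombination frequency expressed as a percentage equals the
    # estimated distance in centimorgans under the assumption above.
    return 100.0 * recombinant_offspring / total_offspring


if __name__ == "__main__":
    # Hypothetical pedigree data: the two markers were inherited
    # separately in 10 of 1000 informative offspring.
    print(map_distance_cm(10, 1000), "cM")
```

With these invented counts the two markers would be placed about one centimorgan apart. Real linkage analysis uses statistical methods and mapping functions that correct for multiple crossovers, which this sketch deliberately leaves out.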
This process of chromosomal scrambling accounts for the differences invariably seen even in siblings (apart from identical twins). Logically, the closer two genes are to each other on a single chromosome, the less likely they are to get split up during genetic recombination. When they are close enough that the chances of being separated are only one in a hundred, they are said to be separated by a distance of one centimorgan.
The role of human pedigrees now becomes clear. By studying family trees and tracing the inheritance of diseases and physical traits, or even unique segments of DNA identifiable only in the laboratory, geneticists can begin to pin down the relative positions of these genetic markers. By the end of 1994, a comprehensive map was available that included more than 5800 such markers, including genes implicated in cystic fibrosis, myotonic dystrophy, Huntington disease, Tay-Sachs disease, several cancers, and many other maladies. The average gap between markers was about 0.7 centimorgan19.
Other maps are known as physical maps, so called because the distances between features are measured not in genetic terms, but in "real" physical units, typically, numbers of base pairs. A close analogy can thus be drawn between physical maps and the road maps familiar to us all. Indeed, the analogy can be extended further. Just as small-scale road maps may show only large cities and indicate distances only between major features, so a low-resolution physical map includes only a relative sprinkling of chromosomal landmarks. A well-known low-resolution physical map, for example, is the familiar chromosomal map, showing the distinctive staining patterns that can be seen in the light microscope. Further, by a process known as in situ hybridization, specific segments of DNA can be targeted in intact chromosomes by using complementary strands synthesized in the laboratory. These laboratory-made "probes" carry a fluorescent or radioactive label, which can then be detected and thus pinpointed on a specific region of the chromosome.
Fortunately, means are also available to produce physical maps of much higher resolution - analogous to large-scale county maps that show every village and farm road, and indicate distances at a similar level of detail. Just such a detailed physical map is one that emerges from the use of restriction enzymes - DNA-cleaving enzymes that serve as highly selective microscopic scalpels. A typical restriction enzyme known as EcoRI, for example, recognizes the DNA sequence GAATTC and selectively cuts the double helix at that site. One use of these handy tools involves cutting up a selected chromosome into small pieces, then cloning and ordering the resulting fragments. The cloning, or copying, process is a product of recombinant DNA technology, in which the natural reproductive machinery of a "host" organism - a bacterium or yeast, for example - replicates a "parasitic" fragment of human DNA, thus producing the multiple copies needed for further study.
By cloning enough such fragments, each overlapping the next and together spanning long segments (or even the entire length) of the chromosome, workers can eventually produce an ordered library of clones. Each contiguous block of ordered clones is known as a contig, and the resulting map is a contig map. If a gene can be localized to a single fragment within a contig map, its physical location is thereby accurately pinned down. Further, these conveniently sized clones become resources for further studies by researchers around the world - as well as the natural starting points for systematic sequencing efforts.

19 Genome Analysis: A Laboratory Manual. Bruce W. Birren, Eric D. Green. 2003. CSHL Press. New York, USA.

Two giant steps: Chromosomes 16 and 19
One of the signal achievements of the DOE genome effort so far is the successful physical mapping of chromosomes 16 and 19. The high-resolution chromosome 19 map, constructed at the Lawrence Livermore National Laboratory, is based on restriction fragments cloned in cosmids, synthetic cloning "vectors" modelled after bacteria-infecting viruses
known as bacteriophages. Like a phage, a cosmid hijacks the cellular machinery of a bacterium to mass-produce its own genetic material, together with any "foreign" human DNA that has been smuggled into it. The foundation of the chromosome 19 map is a large set of cosmid contigs that were assembled by automated analysis of overlapping but unordered restriction fragments. These contigs span an estimated 54 million base pairs, more than 95 percent of the chromosome, excluding the centromere.
Most of the contigs have been mapped by fluorescence in situ hybridization to visible chromosomal bands. Further, more than 200 cosmids have been more accurately ordered along the chromosome by a high-resolution FISH technique in which the distances between cosmids are determined with a resolution of about 50,000 base pairs. This ordered FISH map, with cosmid reference points separated by an average of 230,000 base pairs, provides the essential framework to which other cosmid contigs can be anchored. Moreover, the EcoRI restriction sites have been mapped on more than 45 million base pairs of the overall cosmid map.
Over 450 genes and genetic markers have also been localized on this map, of which nearly 300 have been incorporated into the ordered map. Among these genes is the one responsible for the most common form of adult muscular dystrophy (DM), which was identified in 1992 by an international consortium that included Livermore scientists. A second important disease gene (COMP), responsible for a form of dwarfism known as pseudoachondroplasia, has also been identified. And yet another gene, one linked to a form of congenital kidney disease, has been localized to a single contig spanning one million base pairs, but has not yet been precisely pinpointed. About 2000 other genes are likely to be found eventually on chromosome 19. An emerging gene map shows the locations of the mapped genes.
In a similar effort, the Los Alamos National Laboratory Center for Human Genome Studies has completed a highly integrated map of chromosome 16, a chromosome that contains genes linked to blood disorders, a second form of kidney disease, leukemia, and breast and prostate cancers. A readable display of this integrated map covers a sheet of paper more than 15 feet long; a portion of it, much reduced and showing only some of its central features, is reproduced here as Mapping chromosome 16.
The framework for the Los Alamos effort is yet another kind of map, a "cytogenetic breakpoint map" based on 78 lines of cultured cells, each a hybrid that contains mouse chromosomes and a fragment of human chromosome 1620. Natural breakpoints in chromosome 16 are thus identified, leading to a breakpoint map that divides the chromosome into segments whose lengths average 1.1 million base pairs. Anchored to this framework are a low-resolution contig map based on YAC clones and a high-resolution contig map based largely on cosmids. The low-resolution map, comprising 700 YACs from a library constructed by the Centre d'Etude du Polymorphisme Humain (CEPH), provides practically complete coverage of the chromosome, except the highly repetitive DNA in the centromere region. The high-resolution map comprises some 4000 cosmid clones, assembled into about 500 contigs covering 60 percent of the chromosome. In addition, it includes 250 smaller YAC clones that have been merged with the cosmid contig map.
The cosmid contig map is an especially important step forward, since it is a "sequence-ready" map. It is based on bacterial clones that are ideal substrates for DNA sequencing, and further, these clones have been restriction mapped to allow identification of a minimum set of overlapping clones for a large-scale sequencing effort. The high- and low-resolution maps have been tied together by sequence-tagged sites (STSs), short but unique stretches of DNA sequence. They have also been integrated into the breakpoint map, and with genetic maps developed at the Adelaide Children's Hospital and by CEPH.

20 Encyclopedia of Human Biology. Renato Dulbecco. 1997. Academic Press. London, UK.

The integrated map also includes a transcription map of 1000 sequenced
exons (expressed fragments of genes) and more than 600 other markers developed at other laboratories around the world.

Getting down to details: Sequencing the genome

Ultimately, though, these physical maps and the clones they point to are mere stepping stones to the most visible goal of the genome project, the string of three billion characters -- A's, T's, C's, and G's -- representing the sequence of base pairs that defines our species. Should anyone undertake to print it all out, the result would fill several hundred volumes the size of a big-city phone book. Included, of course, would be the sequence for every gene, as well as the sequences for stretches of DNA whose functions we don't yet know (but which may be involved in such little-understood processes as orchestrating gene expression in different parts of our bodies, at different times of our lives).

Only the barest start has been made in taking this dramatic step in the Human Genome Project. Several hundred million base pairs have been sequenced and archived in databases, but the great majority of these are from short "sequence tags" on cloned fragments. Only about 30 million base pairs of human DNA (roughly one percent of the total) have been sequenced in longer stretches, the longest being about 685,000 base pairs long. Even more daunting is the realization that we will eventually need to sequence many parts of the genome many times, thus to reveal differences that indicate various forms of the same gene.

Hence, as with so many human enterprises, the challenge of sequencing the genome is largely one of doing the job cheaper and faster. At the beginning of the project, the cost of sequencing a single base pair was between $2 and $10, and one researcher could produce between 20,000 and 50,000 base pairs of continuous, accurate sequence in a year. Sequencing the genome by the year 2005 would therefore likely cost $10-20 billion and require a dedicated cadre of at least 5000 workers21. Clearly, a major effort in technology development was called for -- an effort that would drive the cost well below $1 per base pair and that would allow automation of the sequencing process. From the beginning, therefore, the DOE has emphasized programs to pave the way for expeditious and economical sequencing efforts -- programs to develop new technologies, including new cloning vectors, and to establish suitable resources for sequencing, including clone libraries and libraries of expressed sequences.

21 Beyond the Molecular Frontier: Challenges for Chemistry and Chemical Engineering. National Research Council (U.S.). Committee on Challenges for the Chemical Sciences in the 21st Century. 2003. National Academies Press. New York

Efforts to develop new cloning vectors have been especially productive. YACs remain a classic tool for cloning large fragments of human DNA, but they are not perfect. Some regions of the genome, for example, resist cloning in YACs, and others are prone to rearrangement. New vectors such as bacterial artificial chromosomes (BACs), P1 phages, and P1-derived artificial cloning systems (PACs) have thus been devised to address these problems. These new approaches are critical for ensuring that the entire genome can be faithfully represented in clone libraries, without the danger of deletions, rearrangements, or spurious insertions.

Marked progress is also evident in the development of sequencing technologies, though all of those in widespread current use are still based on methods developed in 1977 by Allan Maxam and Walter Gilbert and by Frederick Sanger and his coworkers. Both of these methods rely on gel-based electrophoresis systems to separate DNA fragments, and recent advances in commercial systems include increasing the number of gel lanes, decreasing run times, and enhancing the accuracy of base identification. As a result of such improvements, a standard sequencing
machine can now turn out raw, unverified sequences of 50,000 to 75,000 bases per day.

Equally important to the sequencing goals of the genome project is a rational system for organizing and distributing the material to be sequenced. Based on cell- and chromosome-sorting technologies developed at Livermore and Los Alamos, libraries of clones were established for each of the human chromosomes, and the individual clones are widely available for mapping and for isolating genes. These clones were invaluable in such notable "gene hunts" as the successful searches for the cystic fibrosis and Huntington disease genes. More recently, as more efficient vectors have become available, complete human DNA libraries have been established using BACs, PACs, and YACs.

Another critical resource is being assembled in an effort known as I.M.A.G.E. (Integrated Molecular Analysis of Genomes and their Expression), cofounded by the Livermore Human Genome Center. The aim is a master set of mapped and sequenced human cDNA, representing the expressed parts of the human genome. By early 1996, I.M.A.G.E. had distributed over 250,000 partial and complete cDNA clones, most of them with one or both ends sequenced to provide unique identifiers. These identifiers, expressed sequence tags (ESTs), are usually 300-500 base pairs each. Twenty-five hundred genes have also been newly mapped as part of this coordinated effort.

Shotguns and transposons

Such advances as these, in both technology development and the assembly of resource libraries, have brought much nearer the day when "production sequencing" can begin. A great deal of variety remains, however, in the approaches available to sequencing the human genome, and it is not yet clear which will prove the most efficient and most cost-effective way to read long stretches of DNA over the next decade. One of the available choices, for example, is between "shotgun" and "directed" strategies. Another is the degree of redundancy -- that is, how many times must a given strand be sequenced to ensure acceptable confidence in the result?

Shotgun sequencing derives its name from the randomly generated DNA fragments that are the objects of scrutiny. Many copies of a single large clone are broken into pieces of perhaps 1500 base pairs, either by restriction enzymes or by physical shearing. Each fragment is then separately cloned, and a convenient portion of it sequenced. A computational assembly process then compares the terminal sequences of the many fragments and, by finding overlaps that indicate neighboring fragments, constructs an ordered library for the parent clone. The members of this ordered library can then be sequenced from end to end to yield a complete sequence for the parent.

The statistics involved in taking this approach require that many copies of the original clone be randomly fragmented, if no gaps are to be tolerated in the final sequence. A benefit is that the final sequence is highly reliable; the main disadvantage is that the same sequence must be done many times (in the many overlapping fragments). Nevertheless, shotgun sequencing has been the primary means for generating most of the genomic sequence data in public DNA databases. This includes the longest contiguous fragment of sequenced human DNA, from the human T-cell receptor beta region, of about 685,000 base pairs -- a product of DOE-supported work at the University of Washington.

The shotgun strategy is also being used at the Genome Therapeutics Corporation and The Institute for Genomic Research (TIGR), as part of the DOE-supported Microbial Genome Initiative. Genome Therapeutics has sequenced 1.8 million base pairs of Methanobacterium thermoautotrophicum, a bacterium important in energy production and bioremediation, and TIGR has successfully sequenced the complete genomes of three free-living bacteria, Haemophilus influenzae (1,830,137 base pairs, an effort supported mostly by private funds), Mycoplasma
genitalium (580,070 base pairs), and Methanococcus jannaschii (1,739,933 base pairs).

The alternative to shotgun sequencing is a directed approach, in which one seeks to sequence the target clone from end to end with a minimum of duplication. The essence of this approach is embodied in a technique known as primer walking. Starting at one end of a single large fragment, one replicates a stretch of DNA -- say, 400 base pairs long -- that can be sequenced in one run. With the sequence for this first segment in hand, the next stretch of DNA, just overlapping the first, is then tackled in the same way. In principle, one can thus "walk" the entire length of the original clone.

Unfortunately, this conceptually simple approach has been historically beset with disadvantages, mainly the expense and inconvenience of custom-synthesizing a primer as the necessary starting point for each sequencing step. The widely automated Sanger sequencing method involves a DNA replication step that must be "primed" by a DNA fragment that is complementary to 15 to 20 base pairs of the strand to be sequenced. Until recently, making these primers was an expensive and time-consuming business, but recent innovations have made primer walking, and similar directed strategies, more and more economically feasible.

Bioinformatics in Human Genome Sequencing Project22

From the start, it has been clear that the Human Genome Project would require advanced instrumentation and automation if its mapping and sequencing goals were to be met. And here, especially, the DOE's engineering infrastructure and tradition of instrumentation development have been crucial contributors to the international effort. Significant DOE resources have been committed to innovations in instrumentation, ranging from straightforward applications of automation to improve the speed and efficiency of conventional laboratory protocols to the development of technologies on the cutting edge -- technologies that might potentially increase mapping and sequencing efficiencies by orders of magnitude.

22 Harper's Illustrated Biochemistry. Robert K. Murray, Darryl K. Granner, Peter A. Mayes, Victor W. Rodwell. 2006. McGraw-Hill Professional. Columbus, Ohio.

On the first of these fronts, genome researchers are seeing significant improvements in the rate, efficiency, and economy of large-scale mapping and sequencing efforts as a result of improved laboratory automation tools. In many cases, commercial robots have simply been mechanically reconfigured and reprogrammed to perform repetitive tasks, including the replication of large clone libraries, the pooling of libraries as
a prelude to various assays, and the arraying of clone libraries for hybridization studies. In other cases, custom-designed instruments have proved more efficient. A notable illustration is the world's fastest cell and chromosome sorter, developed at Livermore and now being commercialized, which is used to sort human chromosomes for chromosome-specific libraries. Other examples include a high-speed, robotics-compatible thermal cycler developed at Berkeley, which greatly accelerates PCR amplifications, and instruments developed at Utah for automated hybridization in multiplex sequencing schemes.

Smaller is better: and other developments

Beyond "mere" automation are efforts aimed at more fundamental enhancements of established techniques. In particular, a number of DOE-supported efforts aim at improved versions of the automated gel-based Sanger sequencing technique. For example, ultrathin gels, less than 0.1 millimetres thick, can be used to obtain 400 bases of sequence from each lane in an hour's run, a fivefold improvement in throughput over conventional systems. Even faster speedups are seen when arrays of 0.1-millimetre capillaries are used as the separation medium, in place of the conventional slab gels. Both of these approaches exploit higher electric field strengths to increase DNA mobility and to reduce analysis times. Some of this effort has already been transferred to the private sector, and tenfold improvements in speed, economy, and efficiency are projected in future commercial instruments.

The capillary approach is especially ripe for further development. Challenges include providing uniform excitation over arrays of 50 to 100 capillaries and then efficiently detecting the fluorescence emitted by labeled samples. Technologies under investigation include fiber-optic arrays, scanning confocal microscopy, and cooled CCD cameras. And Livermore scientists are looking beyond even capillaries, to sequencing arrays of rigid glass microchannels.

The move toward miniaturization is afoot elsewhere as well. Building on experiences in the electronics industry, several DOE-supported groups are exploring ways to adapt high-resolution photolithographic methods to the manipulation of minuscule quantities of biological reagents, followed by assays performed on the same "chip." Current thrusts of this "nanotechnology" approach include the design of microscopic electrophoresis systems and ultrasmall-volume, high-speed thermal cycling systems for PCR. A miniaturized, computer-controlled PCR device under development at Livermore operates on 9-volt batteries and might ultimately lead to arrays of thousands of individually controlled microPCR chambers.

Another miniaturization effort aims at the fabrication of high-density combinatorial arrays of custom oligomers (short chains of nucleotides), which would make feasible large-scale hybridization assays, including sequencing by hybridization. This innovative technique uses short oligomers that pair up with corresponding sequences of DNA. The oligomers are placed on an array by a process similar to that of making silicon chips for electronics. Successful matches between oligomers and genomic DNA are then detected by fluorescence, and the application of sophisticated statistical analyses reassembles the target sequence. This same technology has already been used for genetic screening and cDNA fingerprinting. Similar approaches can be envisioned to understand differences in patterns of gene expression: Which genes are active (which are producing mRNA) in which cells? Which are active at different times during an organism's development? Which are active, or inactive, in disease?

Sequencing by hybridization is only one of several forward-looking ideas for revolutionizing sequencing technology. In spite of continuing improvements to sequencers based on the classic methods, it is nonetheless desirable to explore altogether new approaches, with an eye to simplifying sample preparation, reducing measurement times,
increasing the length of the strands that can be analyzed in a single run, and facilitating interpretation of the results. Over the course of the past few years, several alternative approaches to direct sequencing have been explored, including atomic-resolution molecular scanning, single-molecule detection of individual bases, and mass spectrometry of DNA fragments. All of these alternatives look promising in the long term, but mass spectrometry has perhaps demonstrated the greatest near-term potential.

Mass spectrometry measures the masses of ionized DNA fragments by recording their time-of-flight in vacuum. It would therefore replace traditional gel electrophoresis as the last step in a conventional sequencing scheme. Routine application of this technique still lies in the future, but fragments of up to 500 bases have been analyzed, and practical systems based on high-resolution mass separations of DNA fragments of fewer than 100 bases are currently being developed at several universities and national laboratories.

Another innovative sequencing method is under investigation at Los Alamos. The characteristic fluorescence is detected by a laser system, thereby yielding the sequence, base by base. This approach is beset by major technical challenges, and direct sequencing has not yet been achieved. But the potential benefits are great, and much of the instrumentation for sensitive detection of fluorescence signals has already proved useful for molecular sizing in mapping applications.

Dealing with the data

Among the less visible challenges of the Human Genome Project is the daunting prospect of coping with all the data that success implies. Appropriate information systems are needed not only during data acquisition, but also for sophisticated data analysis and for the management and public distribution of unprecedented quantities of biological information. Further, because much of the challenge is interpreting genomic data and making the results available for scientific and technological applications, the challenge extends not just to the Human Genome Project, but also to the microbial genome program and to public- and private-sector programs focused on areas such as health effects, structural biology, and environmental remediation. Efforts in all these areas are the mandate of the DOE genome informatics program, whose products are already widely used in genome laboratories, general molecular biology and medical laboratories, biotechnology companies, and biopharmaceutical companies around the world.

The roles of laboratory data acquisition and management systems include the construction of genetic and physical maps, DNA sequencing, and gene expression analysis. These systems typically comprise databases for tracking biological materials and experimental procedures, software for controlling robots or other automated systems, and software for acquiring laboratory data and presenting it in useful form. Among such systems are physical mapping databases developed at Livermore and Los Alamos, robot control software developed at Berkeley and Livermore, and DNA sequence assembly software developed at the University of Arizona. These systems are the keys to efficient, cost-effective data production in both DOE laboratories and the many other laboratories that use them.

The interpretation of map and sequence data is the job of data analysis systems. These systems typically include task-specific computational engines, together with graphics and user-friendly interfaces that invite their use by biologists and other non-computer scientists. The genome informatics program is the world leader in developing automated systems for identifying genes in DNA sequence data from humans and other organisms, supporting efforts at Oak Ridge National Laboratory and elsewhere. The Oak Ridge-developed GRAIL system, illustrated in Gene hunts, is a world-standard gene identification tool. In 1995 alone, more than 180 million base pairs of DNA were analyzed with GRAIL.
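The computational assembly step in shotgun sequencing, and the DNA sequence assembly software mentioned among the laboratory systems, both rest on finding overlaps between fragment ends. The sketch below is a deliberately simple greedy version of that idea and is not a description of any of the programs named in the text. The function names, the `min_len` threshold, and the three short reads are hypothetical. It also assumes error-free reads from a single strand, which real data never provide.

```python
def overlap(a, b, min_len):
    """Return the length of the longest suffix of a that is a prefix of b.

    Overlaps shorter than min_len are ignored and reported as 0.
    """
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere inside a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept only if everything from that point to the end of a matches b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    length = overlap(a, b, min_len)
                    if length > best_len:
                        best_len, best_i, best_j = length, i, j
        if best_len == 0:
            return reads  # no overlaps left: each entry is one contig
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


if __name__ == "__main__":
    # Three hypothetical reads, given out of order.
    fragments = ["AGCCGATAGGCT", "ATGGCGTACG", "TACGTTAGCC"]
    print(greedy_assemble(fragments))
```

Greedy merging can be misled by repeated sequence, which is one reason real assemblers are considerably more elaborate than this illustration.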
A third area of informatics reflects, in a sense, the ultimate product of the Human Genome Project -- information readily available to the scientific and lay communities. Public resource databases must provide data and interpretive analyses to a worldwide research and development community. As this community of researchers expands and as the quantity of data grows, the challenges of maintaining accessible and useful databases likewise increase. For example, it is critical to develop scientific databases that "interoperate," sharing data and protocols so that users can expect answers to complex questions that demand information from geographically distributed data resources. The bioinformatics program is crucial to the multiagency effort to develop just such databases. Systems now in place include the Genome Database of human genome map data at Johns Hopkins University, the Genome Sequence DataBase at the National Center for Genome Resources in Santa Fe, and the Molecular Structure Database at Brookhaven National Laboratory.

As the genome project continues to provide data that interlink structural and functional biochemistry, molecular, cellular, and developmental biology, physiology and medicine, and environmental science, such interoperable databases will be the critical resources for both research and technology development.

One way to deal with the primer bottleneck, for example, is to use sets of very short fragments to prime the next sequencing step. As an illustration, the four nucleotides (A, T, C, and G) can be ordered in more than 68 billion ways to create an 18-base primer, an imposing set of possibilities. But it is eminently practical to create a library of the 4096 possible 6-base primers. Three of these "6-mers" can be matched to the end of the fragment to be sequenced, thus serving as an 18-base primer. This modular primer technology, developed at the Brookhaven National Laboratory, is currently being applied to Borrelia burgdorferi, the organism that causes Lyme disease; a 34,000-base-pair fragment has already been sequenced.

Another directed approach uses a naturally occurring genetic element called a transposon, which insinuates itself more or less randomly in longer DNA strands. This predilection for random insertion and the fact that the transposon's DNA sequence is well known are the keys to the sequencing strategy depicted schematically in taking a directed approach. The largest clones are broken into smaller subclones (each of about 3000 base pairs), which then become the targets of the transposons. Multiple copies of each subclone are exposed to the transposons, and reaction conditions are controlled to yield, on average, a single insertion in each 3000-base-pair strand. The individual strands are then analyzed to yield, for each, the approximate position of the inserted transposon. By mapping these positions, a "minimum tiling path" can be determined for each subclone -- that is, a set of strands can be identified whose transposon insertions are roughly 300 base pairs apart. In this set of strands, the region around each transposon is then sequenced, using the inserted transposons as starting points. The known transposon sequence allows a single primer to be used for sequencing the full set of overlapping regions.

At the Lawrence Berkeley National Laboratory, this technique has been used to sequence over 1.5 million base pairs of DNA on human chromosomes 5 and 20, as well as over three million base pairs from the fruit fly Drosophila melanogaster. On chromosome 5, interest focuses on a region of three million base pairs that is rich in growth factor and receptor genes, whereas, on chromosome 20, Berkeley researchers are interested in a region of about two million base pairs that is implicated in 15 to 20 percent of all primary breast carcinomas.

Researchers supported by the DOE at the University of Utah are also pursuing the use of directed sequencing. In addition, they have developed a methodology for "multiplex" DNA sequencing, which offers a way of increasing throughput with either shotgun or directed approaches. By attaching a unique identifying sequence to each sequencing sample in a
mixture of, say, 50 such samples, the entire mixture can be analyzed in a single electrophoresis lane. The 50 samples can be resolved sequentially by probing, first, for bands containing the first identifier, then for bands containing the second, and so forth. In a similar way, multiplexing can also be used for mapping. The Utah group is now able to map almost 5000 transposons in a single experiment, and they are using multiplexing in concert with a directed sequencing strategy to sequence the 1.8 million base pairs of the thermophilic microbe Pyrococcus furiosus and two important regions of human chromosome 17.

The completed physical maps of chromosomes 16 and 19, with their extensive coverage in many different kinds of cloning vectors, are especially ripe for large-scale sequencing. Los Alamos scientists have therefore begun sequencing chromosome 16, focusing special effort on locating the estimated 3000 expressed genes on that chromosome and using those sites as starting points for directed genomic sequencing. A region of 60,000 base pairs has already been sequenced around the adult polycystic kidney gene, and good starts have been made in mapping other genes. Interestingly, even random sequencing has led to the identification of gene DNA in over 15 percent of the samples, confirming the apparent high density of genes on this chromosome. Between chromosome 16 and the short arm of chromosome 5, another Los Alamos target, the genome center there has produced almost two million base pairs of human DNA sequence.

A parallel effort is under way at Livermore on chromosome 19 and other targeted genomic regions. Using a shotgun approach, researchers there have completed over 1.3 million bases of genomic sequence. Initially, they are attacking two major regions of chromosome 19: one of about two million base pairs, containing several genes involved in DNA repair and replication, and another of approximately one million base pairs, containing a kidney disease gene. The Livermore scientists are making use of the I.M.A.G.E. cDNA resource to sequence the cDNA from these regions, along with the associated segments of the genome. In addition, Livermore scientists have targeted DNA repair gene regions throughout the genome and, in many cases, have done comparative sequencing of these genes in other species, especially the mouse. Such comparative sequencing has identified conserved sequence elements that might act as regulatory regions for these genes and has also assisted in the identification of gene function.

How good is good enough?

The goal of most sequencing to date has been to guarantee an error rate below 1 in 10,000, sometimes even 1 in 100,000. However, the difference between one human being and another is more like one base pair in five hundred, so most researchers now agree that one error in a thousand is a more reasonable standard. To assure a higher level of confidence, and perhaps to uncover important individual differences, the most biologically or medically important regions would still be sequenced more exhaustively, but using this lowered standard would greatly reduce the cost of acquiring sequence data for the bulk of human DNA.

With this philosophy in mind, Los Alamos scientists have begun a project to determine the cost and throughput of a low-redundancy sequencing strategy known as sample sequencing (SASE, or "sassy"). Clones are selected from the high-resolution Los Alamos cosmid map, and then physically broken into 3000-base-pair subclones -- much as in other sequencing approaches. In contrast to, say, shotgun sequencing, though, only a small random set of the subclones is then selected for sequencing. Sequence fragments already known -- end sequences, sequence-tagged sites, and so forth -- are used as the starting points. The result is sequence coverage for about 70 percent of the original cosmid clone, enough to allow identification of genes and ESTs, thus pinpointing the most critical targets for later, more thorough sequencing efforts. Further, the SASE-derived sequences provide enough information for researchers elsewhere to pursue just such comprehensive efforts, using whole genomic DNA.
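The relation between redundancy and coverage in random sampling strategies can be made concrete with a standard idealization. The text does not say how the 70 percent figure was obtained. The sketch below assumes the Lander-Waterman model, which is brought in here for illustration and is not taken from the source. In that model reads land uniformly at random, and a clone sequenced to c-fold redundancy has an expected covered fraction of 1 - exp(-c). The function names are hypothetical.

```python
import math


def expected_coverage(redundancy):
    """Expected fraction of a clone covered at the given fold redundancy.

    Assumes the idealized Lander-Waterman model: reads placed uniformly
    at random, with no cloning bias and no minimum overlap requirement.
    """
    return 1.0 - math.exp(-redundancy)


def redundancy_for_coverage(fraction):
    """Fold redundancy needed, under the same model, to cover a fraction."""
    return -math.log(1.0 - fraction)


if __name__ == "__main__":
    for c in (1, 2, 3, 7, 10):
        print(f"{c:>2}-fold redundancy -> {expected_coverage(c):.4f} covered")
    needed = redundancy_for_coverage(0.70)
    print(f"70 percent coverage needs about {needed:.2f}-fold redundancy")
```

Under these assumptions, roughly 1.2-fold redundancy already gives about 70 percent coverage, while closing the last gaps by random sampling alone takes many times more sequencing. This fits the qualitative point that low-redundancy sampling is far cheaper than a complete shotgun sequence.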
In addition, the cost of SASE sequencing is only one-tenth the cost of obtaining a complete sequence, and a genomic region can be "sampled" ten times as fast.

As the first major target of SASE analysis, Los Alamos scientists chose a cosmid contig of four million base pairs at the end (the telomere) of the short arm of chromosome 16. By early 1996, over 1.4 million base pairs had been sequenced, and a gene, EST, or suspected coding region had been located on every cosmid sampled. Los Alamos is building on the SASE effort by using SASE sequence data as the basis for an efficient primer walking strategy for detailed genomic sequencing. The first application of this strategy, to a telomeric region on the long arm of chromosome 7, proved to be as efficient as typical shotgun sequencing, but it required only two- to threefold redundancy to produce a complete sequence, in contrast to the seven- to tenfold redundancy required in shotgun approaches. The resulting 230,000-base-pair sequence is the second-longest stretch of contiguous human DNA sequence ever produced.

In a sense, though, even a complete genome sequence -- the ultimate physical map -- is only a start in understanding the human genome. The deepest mystery is how the potential of 100,000 genes is regulated and controlled, how blood cells and brain cells are able to perform their very different functions with the same genetic program, and how these and countless other cell types arise in the first place from a single undifferentiated egg cell. A first step toward solving these subtle mysteries, though, is a more complete physical picture of the master molecules that lie at the heart of it all.

Ethical Issues23

Controversies That Never End

There are many ethical issues that are raised as a direct result of our knowledge, understanding and usage of genome technology. First and foremost is what will be done with this information? Who has a right to have it? Should potential employers be given this information? Should insurance carriers be given this information? Will this technology absolve some and indict others of their responsibilities to society, family, government? Since this society, based on historical documentation, can't or won't free itself from the deleterious clutches of the "isms" (racism, sexism, ageism, genderism), would such information merely serve as another discriminating mechanism to ostracize individuals from mainstream society?

Treatment and Medicine

Genaissance Pharmaceuticals, Inc., located in New Haven, announced they have detected an "astonishing" variance at the genetic level in 82 unrelated people from four racial backgrounds -- white, black, Asian, Hispanic. Their study of 313 genes, out of 30,000 identified by human genome scientists, found that for each gene, there were on average 14 versions that could be inherited by a given person from parents. Gerald Vovis, Genaissance chief technology officer and senior vice president, felt this might explain why there is such a wide variance in how people respond to medication. Vovis foresees a day when doctors will take a sample of blood, do a total genetic examination, and have that guide in prescribing treatment. Another upside to this technology is that side effects produced by the ingestion of medication could be minimized or eradicated altogether. The downside is that some unscrupulous individual having access to that information could misuse or exploit that individual24.

23 Current Controversies in the Biological Sciences: Case Studies of Policy Challenges from New Technologies. Karen F. Greif, Jon F. Merz. 2007. MIT Press. Boston, Mass.
24 MSNBC Reuter. Huge gene Variation found in Humans: Find May Explain Differing Responses to Medication. July 12, 2001. http://www.msnbc.com/news.
Insurance Companies

If life insurance companies had this information, how might that impact society? Anyone who has ever sought life insurance is familiar with the little indicators that can prohibit your ability to become insured, or cause you to pay abhorrently elevated fees. Smoking, obesity, coronary heart disease, cancer, AIDS, and a host of other maladies can increase your rates. The only up side to the current system is, if you have something you don't know about, it cannot be used against you. Now imagine if insurance companies had access to your genetic composition? You could potentially be penalized now for what may be coming twenty years (or never) down the road.

Employment

What if potential employers had this information? When you currently apply for a job, your "employability" is based on skill, experience, references, and education and sometimes who-you-know. With access to your genetic code, any predisposition to alcoholism, cancer, Huntington's disease, Multiple Sclerosis, could become grounds for not being hired. Would it be ethical for the employer to deny you the opportunity to make money today, based on something that might not take place in the next five to ten years? From the potential employer's perspective, the answer would probably be a resounding yes. That will not sit well with a potential employee who will need to be self-supporting.

Moral Issues

Science and religion have been at odds for eons. The fundamental difference between the two is buttressed in philosophy. Science believes in the tangible and concrete. Religion is predicated on all faith. Faith is a belief in that which is not seen, or experienced. Science relies on empirical data. Religion sets its sights on a reality that has no basis in logic, but rather emotional rectitude. Science says reality must be grounded in fact. Religion says I believe therefore it is real. Religion has promoted the notion that an omnipotent, omnipresent, and omniscient deity created all life here on Earth. Darwin believed that first man evolved. Changes came about because of natural selection. When Charles Robert Darwin first presented his book of theories entitled, "On the Origin of Species by Means of Natural Selection" in 1859, it was met fairly much as it is today. Today, the naturalist feels that Darwin has been vindicated.

The human genome project confirms the theory on evolution. …none of these headlines capture the most basic, the most important consequence of mapping out all of our genes. The genome reveals, indisputably and beyond any serious doubt that Darwin was right mankind evolved over a long period of time from primitive animal ancestors. Our genes show that scientific creationism cannot be true. The response to all those who thump their bible and say there is no proof, no test and no evidence in support of evolution is, "The proof is right here, in our genes." Eric Lander of the Whitehead Institute in Cambridge, Mass., said that if you look at our genome it is clear that "evolution …must make new genes from old parts." The core recipe of humanity carries clumps of genes that show we are descended from bacteria. There is no other way to explain the jerry-rigged nature of the genes that control key aspects of our development. No one can look at how the book of life is written and not come away fully understanding that our genetic instructions have evolved from the same programs that guided the development of earlier animals. Our genetic instructions have been slowly assembled from the genetic instructions that made jellyfish, dinosaurs, wooly mammoths and our primate ancestors. There is, as the scientists who cracked the genome all agreed, no other possible explanation.

Sure the business side of cracking
our genetic code is fascinating. And we all need to be sure that our government does not leave us in the genetic lurch without laws to ensure our privacy and protect us against genetic discrimination25.

The debate further intensifies as religion frowns on the notion that man will attempt to play God. Gene mapping will make it possible to do away with perceived flaws, or defects in children. Change the eye colouring, change the hair texture, add to the aptitude of the child, and by all means, let's make the child athletic and very aesthetically pleasing. Even those who are not particularly religious fear a resurgence of Adolf Hitler's vision of creating the perfect race. Though we have the technology, is it moral to take away the variety that nature provides? Will scientists one day perceive certain ethnic groups as being unwanted flaws?

Conclusion

There is no doubt that the Human Genome Project, started in 1990, left humankind hanging at the precipice of eminent power and direction. The collaboration of the U.S. Department of Energy and the National Institutes of Health seems to have been a good merger. We have sequenced 3.1 billion letters of DNA, and shown that humans have 30,000 to 40,000 genes, only about twice as many as fruit flies. We are spawning new scientific fields of study like "proteomics," the study of the production of proteins. Clearly our bio-technical advances are working. During the last twenty years, we have seen a thirty-percent increase in the number of centenarians. The potential for eliminating illnesses with debilitating effects on adults and children is clearly a good reason to continue. We are correcting past wrongs, freeing those who have been incarcerated unjustly, thanks to our continuing breakthroughs with DNA. Historical truths once vehemently denied, like the Thomas Jefferson debauchery, now pierce the veneer of American piety, courtesy of the genome factor, which proved he fathered several of Sally Hemings' children.

Clearly the genie is out of the bottle and there is no way of stopping its progress. As more and more companies enter the arena, will there be a way to control what goes on? Will the quality and validity be retained as more for-profit businesses like Celera Genomics enter the picture? Only time will tell. The public at large must indeed become more knowledgeable so that an eye can be kept on Big Brother. One salient thought keeps me from totally embracing this new technology: can we fallible creatures objectively and responsibly handle this knowledge?

25 Caplan, Arthur. 'Darwin vindicated!' Cracking of human genome confirms theory of evolution. MSNBC. February 21, 2001.
Marine Genome Sequencing Project – Nanoflagellates

More than 80% of the Earth's living organisms are found only in aquatic ecosystems, and we know little about their biochemical characteristics. Our challenge as a nation is to discover the life-enhancing and lifesaving properties these unique organisms possess. Genome sequencing projects among marine organisms will enable scientists to differentiate populations and address emerging diseases to protect fishery and ecological resources. Seafood-borne illness adversely affects public health and coastal economies. Molecular technology can be used to develop rapid diagnostics that ensure the safety of the seafood we eat and the vitality of the seafood industry. We also have to develop the biological technology needed to identify sources of ecological stress and to develop strategies to protect and restore coastal resources.

One example of genome sequencing among marine organisms is the nanoflagellates. Nanoflagellates are a group of marine microbes that prey on other microbes, such as bacteria and phytoplankton, for survival. These predatory protists play a critical role in marine carbon cycling. An international team of investigators led by Monterey Bay Aquarium Research Institute's Alexandra Worden will investigate the genetic mechanisms behind the processes of predation, digestion, and biomass incorporation by protists that determine the fate of phytoplankton and bacteria, to bridge the gap in our knowledge about this important player in the marine food web.

Another example is the metagenome that lurks inside the giant Pacific shipworm, Bankia setacea26.

26 Advances in Marine Biology. Sir Frederick Stratten Russell, Sir Maurice Yonge. 1971. Academic Press. London, UK.

Shipworms, wood-boring marine bivalves, have been nicknamed "termites of the sea." These animals are capable of feeding solely on wood, utilizing a highly efficient system of symbiotic lignocellulose degradation that is biologically, functionally, and evolutionarily distinct from those found in termites, ruminants, and all other cellulose-consuming animals. As in termites, the ability of shipworms to consume wood depends on symbiotic bacteria that provide enzymes, including cellulases and other hydrolases critical for digestion of wood by the host and potentially valuable for commercial bioconversion of lignocellulose to ethanol. Analysis of the shipworm symbiont community metagenome will provide important insights into the composition and function of this unique lignocellulose-degrading bacterial community and will allow valuable comparisons to the recently sequenced termite symbiont metagenome. Unlike termites, shipworms accomplish the complete degradation of lignocellulose with a simple intracellular
consortium of just a few related types of microbes. The project was proposed by Daniel Distel of the Ocean Genome Legacy Foundation.

Another marine organism is Botryococcus braunii. It is a colony-forming green microalga, less than 10 micrometers in size, that synthesizes long-chain liquid hydrocarbon compounds and sequesters them in the extracellular matrix of the colony to afford buoyancy. A type of B. braunii produces a family of compounds termed botryococcenes, which hold promise as an alternative energy source. Botryococcenes have already been converted to fuel suitable for internal combustion engines. Geochemical analysis has shown that botryococcenes, presumably from ancient B. braunii communities, also comprise a portion of the hydrocarbon masses in several modern-day petroleum and coal deposits. While algae have been recognized for their role in carbon sequestration and for biofuels production, little information, either genetic or metabolic, has been reported for this particular organism. This project, led by Andrew Koppisch and colleagues from Los Alamos National Laboratory and five other institutions, will target the identification of specific metabolic pathways responsible for hydrocarbon synthesis to alleviate bottlenecks in biofuels production.

The newly sequenced genome of a one-celled, planktonic marine organism, reported on Feb. 14, 2008 in the journal Nature, is already telling scientists about the evolutionary changes that accompanied the jump from one-celled life forms to multicellular animals like us.

In the Nature paper and a complementary Science paper also released that week, University of California, Berkeley, biologists Nicole King, Daniel Rokhsar and their colleagues present their first draft of the genome of a choanoflagellate called Monosiga brevicollis, and their first comparisons with the genes of multicellular animals, the so-called metazoans. The sequencing and analysis was performed by the Department of Energy Joint Genome Institute (JGI) in Walnut Creek, Calif., in collaboration with researchers from UC Berkeley and eight other institutions.

"Choanoflagellates are the closest living unicellular relatives of animals and, as such, they hold a key to understanding the origins and evolution of animals," said King, an assistant professor of integrative biology and of molecular and cell biology, and a 2005 MacArthur "genius" Award winner. "They help shed light on the biology and genome content of the unicellular organisms from which we evolved, and that, because choanoflagellates and animals shared a common ancestor between 600 million and a billion years ago, can help us learn about our history and the history of life on Earth, which has been dominated by one-celled organisms."

According to King, biologists know almost nothing about these organisms, aside from the fact that they are an important food for krill, which are the main source of food for baleen whales. Yet, by consuming large quantities of bacteria, choanoflagellates play a major role in the carbon cycle of the oceans.

One finding confirmed by the sequencing is that choanoflagellates have many genes that, in animals, produce proteins essential to cell-to-cell signalling and in determining which cells stick to one another. Some of these proteins, called cadherins, evolved for linking cells together. Since Monosiga does not form colonies as do some other choanoflagellates, these proteins' roles are a mystery, King said.

"Choanoflagellates show no hint of multicellularity, but they have 23 genes for cadherin proteins, about the same as the fruit fly or the mouse," King said. "In animals, they are the glue that prevents clumps of cells from falling apart."
In the Science paper, King and graduate student Monika Abedin report that some of these proteins are found around the base of the choanoflagellate cell, where the choanoflagellate attaches to surfaces, and around the tentacles, where bacteria are captured and ingested. Perhaps, they argue, the last single-celled ancestor of all animals (including humans) employed these ancient cadherin proteins to bind and eat bacteria, while more complex metazoans adopted these proteins for gluing cells into a larger, many-celled creature.

"The transition to multicellularity likely rested upon the co-option of diverse transmembrane and secreted proteins to new functions in intercellular signaling and adhesion," they wrote in Science.

Choanoflagellates are found abundantly in salt and fresh water around the world, where they gorge on bacteria. The cells are egg-shaped with a single long tail or flagellum at one end, surrounded at its base by a collar of tentacles (choano comes from the Greek word for collar) that capture bacteria. The flagellum propels the choanoflagellate through the water and also washes bacteria towards the tentacles. At about 10 microns across, they are about the size of another one-celled eukaryote, yeast. While yeast is well known to genetics researchers, however, choanoflagellates are not, a situation King hopes will change now that the genome is sequenced.

Because choanoflagellates resemble the feeding cells of sponges, which are among the most primitive of animals, biologists 165 years ago proposed that these organisms were very distant ancestors of multicelled animals.

"Choanoflagellates really are a unique window back in time to the origin of animals and humans, because the fossil record is not there. They are our best way of triangulating on that last unicellular ancestor of animals," said Dan Rokhsar, UC Berkeley professor of molecular and cell biology and program head for computational genomics at JGI. King and Rokhsar also are members of UC Berkeley's Center for Integrative Genomics.

King and Rokhsar successfully proposed the choanoflagellate for sequencing several years ago as part of the Department of Energy's Microbial Genome Program, and in the intervening years, King worked on isolating enough uncontaminated DNA for sequencing.

The draft genome, completed and annotated in 2007, consists of about 9,200 genes. It is similar in size to the genomes of fungi and diatoms, but much smaller than the genomes of metazoans. Humans, for example, have about 25,000 genes.
Interestingly, the choanoflagellate has nearly as many introns (noncoding regions once referred to as "junk" DNA) in its genes as humans do in their genes, and often in the same spots. Introns have to be snipped out before a gene can be used as a blueprint for a protein and have been associated mostly with higher organisms.

The choanoflagellate genome, like the genomes of many seemingly simple organisms sequenced in recent years, shows a surprising degree of complexity, King said. Many genes involved in the central nervous system of higher organisms, for example, have been found in simple organisms that lack a centralized nervous system. Likewise, choanoflagellates have five immunoglobulin domains, though they have no immune system; collagen, integrin and cadherin domains, though they have no skeleton or matrix binding cells together; and proteins called tyrosine kinases that are a key part of signaling between cells, even though Monosiga is not known to communicate, or at least does not form colonies.

These findings are helping King and her colleagues assemble a picture of what the original common ancestor of humans and choanoflagellates looked like and also get hints about the first animals.

"It is remarkable to what extent we can figure out how those animal ancestors must have been able to stick together and communicate with each other, at least in ways that allow you to make hypotheses about what those first steps toward animals looked like," Rokhsar said, noting a similar situation with the starlet sea anemone, Nematostella vectensis, sequenced in 2007.

Nevertheless, it is not always easy determining which genes were in the last common ancestor of choanoflagellates and humans, and which are new. Choanoflagellates and humans have been evolving for the same length of time, so differences between the genomes may reflect genes that have been lost by choanoflagellates as much as genes gained by humans.

Comparison of the Monosiga genome to that of other organisms, including another choanoflagellate (a colony-former called Proterospongia, whose genome is due to be sequenced by the National Institutes of Health), may answer such questions.

King has hopes that the Monosiga genome will answer many questions of animal evolution and illuminate the biology of this poorly understood aquatic creature.

"This is a new era, where we start with a genome to understand the biology of an organism," King said. "The genome is the toehold."
Future Perspectives and Conclusion

When is a genome project finished?

When sequencing a genome, there are usually regions that are difficult to sequence (often regions with highly repetitive DNA). Thus, "completed" genome sequences are rarely ever complete, and terms such as "working draft" or "essentially complete" have been used to more accurately describe the status of such genome projects. Even when every base pair of a genome sequence has been determined, there are still likely to be errors present because DNA sequencing is not a completely accurate process. It could also be argued that a complete genome project should include the sequences of mitochondria and (for plants) chloroplasts, as these organelles have their own genomes.

It is often reported that the goal of sequencing a genome is to obtain information about the complete set of genes in that particular genome sequence. The proportion of a genome that encodes genes may be very small (particularly in eukaryotes such as humans, where coding DNA may only account for a few percent of the entire sequence). However, it is not always possible (or desirable) to sequence only the coding regions separately. Also, as scientists understand more about the role of this noncoding DNA (often referred to as junk DNA), it will become more important to have a complete genome sequence as a background to understanding the genetics and biology of any given organism.

In many ways genome projects do not confine themselves to only determining a DNA sequence of an organism. Such projects may also include gene prediction to find out where the genes are in a genome, and what those genes do. There may also be related projects to sequence ESTs or mRNAs to help find out where the genes actually are.

Future Perspectives

Historically, when sequencing eukaryotic genomes (such as the worm Caenorhabditis elegans) it was common to first map the genome to provide a series of landmarks across the genome. Rather than sequencing a chromosome in one go, it would be sequenced piece by piece (with the prior knowledge of approximately where that piece is located on the larger chromosome). Changes in technology, and in particular improvements to the processing power of computers, mean that genomes can now be "shotgun sequenced" in one go (there are caveats to this approach, though, when compared to the traditional approach).

Improvements in DNA sequencing technology have meant that the cost of sequencing a new genome has steadily fallen (in terms of cost per base pair), and newer technology has also meant that genomes can be sequenced far more quickly.

When research agencies decide what new genomes to sequence, the emphasis has been on species which have either a relevance to human health (for example, pathogenic bacteria or vectors of disease such as mosquitoes) or commercial importance (such as livestock and crop plants). Secondary emphasis is placed on species whose genomes will help answer important questions in molecular evolution (such as the common chimpanzee).

In the future, it is likely that it will become even cheaper and quicker to sequence a genome. This will allow for complete genome sequences to be determined from many different individuals of the same species. For humans, this will allow us to better understand aspects of human genetic diversity.
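The shotgun approach mentioned above depends on computers piecing random fragments back together by their overlapping ends to form contigs. As a rough illustration only, the sketch below merges toy reads with a greedy longest-overlap rule. The function names (`overlap`, `greedy_assemble`), the `min_len` threshold and the example reads are all hypothetical, chosen for illustration; real assemblers also deal with sequencing errors, repeats and reverse-complement strands, which this sketch ignores.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy assembler: repeatedly merge the two sequences with the largest overlap.

    Stops when no pair overlaps by at least `min_len` bases and returns the
    remaining sequences (the "contigs").
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs
        # Join the pair, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Three overlapping reads taken from a made-up 15-base sequence.
    print(greedy_assemble(["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]))
```

When the reads do not cover the whole sequence, the function returns more than one contig; these correspond to the gaps that a sequencing project would then need to close by other means.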