The Human Genome Project

By, Anu S

• What is a genome?
• Brief introduction to human genome
• Why human genome project?
• Goals of human genome project
• Techniques involved in human genome
1. Clone-clone sequencing
2. Celera shotgun sequencing
• Role of bioinformatics in HGP
• Genes and their role in the body
• Ethical, Legal, and Social Implications
• Advantages and Disadvantages of human genome project
• Conclusion
• Reference

What is a genome?
• The entire genetic makeup of the cell nucleus of any organism is called a genome.
• Genes carry the information for making all of the proteins required by the body for growth and maintenance.
• The genome also encodes rRNA and tRNA, which are involved in protein synthesis.

The Human Genome
• Early estimates put the number of genes coding for functional proteins at ~35,000-50,000; the finished sequence points to an estimated 20,000-25,000
• Includes non-coding sequences located between genes, which make up the vast majority of the DNA in the genome (~95%)
• The particular order of nucleotide bases (As, Gs, Cs, and Ts) determines the amino acid composition of proteins
• Information about DNA variations (polymorphisms) among individuals can lend insight into new technologies for diagnosing, treating, and preventing diseases that afflict humankind
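As a rough illustration of how base order determines amino acid order, the sketch below reads a DNA string three bases at a time. The codon table is deliberately partial (only the five standard codons used here), and the two input sequences are invented toy examples, not real genes.

```python
# Partial codon table: only the codons used in the toy sequences below.
# The full standard genetic code has 64 codons.
CODON_TABLE = {
    "ATG": "Met", "GCT": "Ala", "GGA": "Gly", "TGT": "Cys", "TAA": "Stop",
}

def translate(dna):
    """Read a coding-strand DNA string codon by codon until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "Stop":
            break
        protein.append(residue)
    return protein

# Same bases, different order: the two toy sequences differ only in the
# order of their third and fourth codons.
print(translate("ATGGCTGGATGTTAA"))
print(translate("ATGGCTTGTGGATAA"))
```

Swapping two codons swaps the corresponding amino acids, which is the sense in which the order of bases fixes the composition of the protein.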

History of human genome project
• The Human Genome Project officially started in June 1990
• The project was proposed to run for 15 years
• The countries that took part in the Human Genome Project are: France, Germany, Japan, China, the UK and the USA
• The draft was completed in 2000
• The Human Genome Project was completed in April 2003

Why human genome project
• Most inherited diseases are rare, but taken together, the more than 3,000 disorders known to result from single altered genes rob millions of healthy and productive lives.
• Today, little can be done to treat, let alone cure, most of these diseases. But having a gene in hand allows scientists to study its structure and characterize the molecular alterations, or mutations, that result in disease.
• Progress in understanding the causes of cancer
• Gene mutations probably play a role in many of today's most common diseases, such as heart disease, diabetes, immune system disorders, and birth defects.
• These diseases are believed to result from complex interactions between genes and environmental factors.
• When genes for diseases have been identified, scientists can study how specific environmental factors, such as food, drugs, or pollutants, interact with those genes.

What Goals Were Established for the Human Genome Project When it Began in 1990?
• Identify all of the genes in human DNA.
• Determine the sequence of the 3 billion chemical nucleotide bases that make up human DNA.
• Store this information in databases.
• Develop faster, more efficient sequencing technologies.
• Develop tools for data analysis.
• Address the ethical, legal, and social issues (ELSI) that arise from the project.

Techniques involved in human genome
• DNA sequencing
• Restriction fragment length polymorphisms (RFLPs)
• Yeast artificial chromosomes (YAC)
• Bacterial artificial chromosomes (BAC)
• The polymerase chain reaction (PCR)
• Electrophoresis
• Clone-clone sequencing
• Celera shotgun sequencing

DNA sequencing
• DNA sequencing, the process of determining the exact order of the 3 billion chemical building blocks (called bases and abbreviated A, T, C, and G) that make up the DNA of the 24 different human chromosomes, was the greatest technical challenge in the Human Genome Project.
• Achieving this goal has helped reveal the estimated 20,000-25,000 human genes within our DNA as well as the regions controlling them.
• The resulting DNA sequence maps are being used by 21st century scientists to explore human biology and other complex phenomena.
• Sequencing can be done by four methods:
1. Maxam and Gilbert method of sequencing
2. Sanger's method of sequencing
3. Pyrosequencing
4. Automated sequencing

Restriction fragment length polymorphism
• Restriction fragment length polymorphisms (RFLPs) were the first type of molecular markers used in linkage studies.
• RFLPs arise because mutations can create or destroy the sites recognized by specific restriction enzymes, leading to variations between individuals in the length of restriction fragments produced from identical regions of the genome.
• Differences in the sizes of restriction fragments between individuals can be detected by Southern blotting with a probe specific for a region of DNA known to contain an RFLP.
• The segregation and meiotic recombination of such DNA polymorphisms can be followed like typical genetic markers.
• RFLP analysis of a family can detect the segregation of an RFLP, which can then be tested for statistically significant linkage to the allele for an inherited disease or some other human trait of interest.
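The way a single mutation changes restriction fragment lengths can be sketched as follows. This is a minimal illustration, assuming the enzyme EcoRI (recognition sequence GAATTC, cutting after the G); the two allele sequences are invented and `fragment_lengths` is a hypothetical helper, not part of any library.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence; the enzyme cuts after the G

def fragment_lengths(dna, site=ECORI_SITE, cut_offset=1):
    """Return the lengths of the fragments left after cutting at every site."""
    cuts = []
    start = dna.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = dna.find(site, start + 1)
    edges = [0] + cuts + [len(dna)]
    return [b - a for a, b in zip(edges, edges[1:])]

# Toy 50-base allele with two EcoRI sites.
allele_a = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 8
# Same region with a point mutation (C -> T) that destroys the first site.
allele_b = allele_a.replace("GAATTC", "GAATTT", 1)

print(fragment_lengths(allele_a))
print(fragment_lengths(allele_b))
```

Allele A yields three fragments (11, 26 and 13 bases) while allele B yields two (37 and 13 bases). On a Southern blot, that difference in fragment sizes is what distinguishes the two alleles.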

Yeast artificial chromosome
• This method was first described in 1983 by Murray and Szostak.
• A yeast artificial chromosome (YAC) is a vector used to clone large DNA fragments (larger than 100 kb and up to 3000 kb). It is an artificially constructed chromosome and contains the telomeric, centromeric, and replication origin sequences needed for replication and preservation in yeast cells.
• YACs are built from an initial circular plasmid, which is linearised using restriction enzymes; DNA ligase can then add a sequence or gene of interest within the linear molecule by the use of cohesive ends. Use of different regions of DNA in different YACs allows the rapid determination of the sequence, or order of the constituents, of the DNA.

Bacterial artificial chromosome
• A bacterial artificial chromosome (BAC) is a DNA construct, based on a functional fertility plasmid (or F-plasmid), used for transforming and cloning in bacteria, usually E. coli.
• F-plasmids play a crucial role because they contain partition genes that promote the even distribution of plasmids after bacterial cell division. The usual insert size of a BAC is 150-350 kbp, but it can be greater than 700 kbp.
• BACs are often used to sequence the genomes of organisms in genome projects, for example the Human Genome Project. A short piece of the organism's DNA is amplified as an insert in BACs, and then sequenced. Finally, the sequenced parts are rearranged in silico, resulting in the genomic sequence of the organism.

Polymerase chain reaction
• Using the polymerase chain reaction (PCR), millions of copies of a specific DNA segment can be made in a test tube.
• PCR is also an automated process. Many physical mapping strategies depend on creating an array of linear DNA overlaps.
• Multiple copies of DNA fragments are needed to complete the mapping process.
• PCR can be applied for forensic purposes as well.
• From a very tiny amount of DNA, the polymerase chain reaction can be used to produce more copies of the DNA for analysis.
• Most mapping techniques in the Human Genome Project (HGP) rely on PCR.
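The "millions of copies" figure follows from the fact that each PCR cycle can at best double the number of target molecules. The sketch below is an idealized model, not a description of any particular protocol; the `efficiency` parameter (fraction of templates copied per cycle) is an assumption added for illustration.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized PCR yield: each cycle multiplies the copies by (1 + efficiency)."""
    return initial_copies * (1 + efficiency) ** cycles

# One starting molecule, perfect doubling in every cycle.
print(pcr_copies(1, 20))
# Ten starting molecules, 30 cycles.
print(pcr_copies(10, 30))
```

With perfect doubling, a single molecule gives 2^20 (about one million) copies after 20 cycles, which is why a very small starting sample can be enough for analysis.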

Clone-clone sequencing
• When whole genome sequencing work on humans and other organisms was initiated in the late 1980s, it was decided that large segments (clones) of genomic DNA (produced by partial digestion) should first be aligned in a linear order on the chromosomes as overlapping segments, which can then be used as landmarks for sequencing data. The sequences of individual clones can thus be conveniently coalesced to obtain the DNA sequence covering an entire chromosome.
• Large DNA segments are cloned in BAC vectors and these BACs are used for construction of physical maps, since the physical position of each clone on a chromosome is defined in the form of ordered BACs. In the late 1980s and early 1990s, such clone-based maps were considered necessary and useful for complete genome sequencing and were therefore prepared for several animal and plant genomes.
• Using these clone-based maps, whole genome sequencing was successfully completed in several eukaryotes including yeast (S. cerevisiae), a nematode (C. elegans) and a higher plant (Arabidopsis thaliana). Such clone-based maps also contributed, though partly, to the whole genome sequencing of Drosophila melanogaster, the mouse and humans.

Once the BACs are physically mapped, the physical maps can be utilized for whole genome sequencing using the following steps:
(i) BAC clones are selected from the whole genome BAC map, using suitable algorithms (software), so that a minimum number of BAC clones with minimum overlap is used to cover the entire genome. This is often described as selection of the minimum tiling path. In the case of the human genome, 10,000 to 20,000 BACs were selected to generate a working draft.
(ii) BAC clones are used for subcloning, so that small inserts of a manageable size for sequencing are available in cosmid or plasmid vectors (DNA segments longer than 500-800 base pairs cannot be sequenced directly in manual or automated sequencers). These subclones are subjected to shotgun (random) sequencing without ordering them within the BAC clone, so that many subclones are sequenced to ensure sequencing of all parts of a BAC.
This approach has been used to sequence the genomes of yeast and a nematode, C. elegans, and also partly the genomes of the fruit fly, the mouse and humans. In this approach, every part of the genome is actually sequenced roughly 4-5 times to ensure that no part of the genome is left out.
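The clone selection in step (i) can be sketched as a greedy interval-cover problem. This is only an illustration of the idea, not the software actually used in the project: the clone names and coordinates are invented, and `minimum_tiling_path` is a hypothetical helper.

```python
def minimum_tiling_path(clones, region_start, region_end):
    """Greedy cover of [region_start, region_end] with mapped clones.

    clones is a list of (name, start, end) tuples taken from a physical map.
    At each step, the clone that starts inside the covered part and reaches
    furthest to the right is added to the path.
    """
    path = []
    covered = region_start
    remaining = sorted(clones, key=lambda c: c[1])
    i = 0
    while covered < region_end:
        best = None
        # Scan every clone that begins at or before the covered position.
        while i < len(remaining) and remaining[i][1] <= covered:
            if best is None or remaining[i][2] > best[2]:
                best = remaining[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone map at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path

# Invented BAC map for a 450 kb region (coordinates in kb).
bac_map = [
    ("BAC1", 0, 150), ("BAC2", 50, 180), ("BAC3", 100, 300),
    ("BAC4", 160, 320), ("BAC5", 290, 450), ("BAC6", 310, 460),
]
print(minimum_tiling_path(bac_map, 0, 450))
```

For this toy map, three of the six clones (BAC1, BAC3, BAC5) are enough to cover the region, so only those would go forward to subcloning and shotgun sequencing.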

Celera shot gun sequencing

• Celera was founded in 1998 by Craig Venter, with the mission to sequence the human genome and provide clients with early access to the resulting data.
• Using state-of-the-art sequencing technology supplied by Applied Biosystems and sophisticated internally developed informatics, Celera pioneered the application of "shotgun" sequencing.
• Whole-genome shotgun sequencing involves shearing or cleavage (partial digestion) of genomic DNA followed by cloning, to produce a genomic library.
• This is followed by sequencing of cloned DNA fragments at random, followed by assembly of the fragment sequences into larger units on the basis of their overlaps.
• The technique is described as shotgun assembly.
• This approach does not require a physical map of the genome for whole genome sequencing.

• Craig Venter also made use of publicly available hierarchical shotgun DNA sequence data generated by the International Human Genome Sequencing Consortium (IHGSC).
• The sequences were initially obtained in the form of 140 sequenced contigs, each contig consisting of 2-20 overlapping clones and representing different non-overlapping portions of the genome (a contig is a set of contiguous overlapping clones, each contig having two to more than 25 clones, and a singleton is a clone not incorporated into any contig).
• The gaps between these contigs were filled later. For this purpose, the genomic library was searched for singletons whose end sequences match those of the ends of two different contigs. If such a clone (singleton) is available, its sequence will fill the gap between two contigs. As many as 99 gaps were filled in this manner.
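Assembly of fragment sequences on the basis of their overlaps can be sketched with a simple greedy merge. This is a toy illustration only, not Celera's assembler: the reads are invented, and `overlap` and `greedy_assemble` are hypothetical helpers that ignore sequencing errors, reverse strands and repeats.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads

# Three invented reads sampled from the toy sequence ATGCGTACGTTAGC.
print(greedy_assemble(["GTTAGC", "ATGCGTAC", "GTACGTTA"]))
```

The three reads merge into the single contig ATGCGTACGTTAGC. Reads with no sufficient overlap are returned unmerged, which is the toy analogue of contigs separated by gaps.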

Difference between the clone by clone and Celera shotgun method

Clone by clone method
• It requires a physical map of the whole genome (the crude map).
• Many copies of randomly cut genome fragments are taken (150,000 bp).
• These fragments are inserted into BACs and a library is constructed.
• The DNA is fingerprinted to give each piece a unique identification.
• Each BAC is then randomly broken into 1,500 bp pieces, which are placed in another artificial piece of DNA called M13, and an M13 library is constructed.
• The M13 libraries are then sequenced.
• These sequences are fed into a computer program called PHRAP that looks for common sequences.
• The above steps are repeated 4-5 times.

Celera shotgun method
• It moves straight to the job of sequencing.
• The genome is shredded into pieces of 2,000 bp, and a second set of 10,000 bp pieces is then generated.
• These fragments are inserted into a suitable vector and a library is constructed.
• The 2,000 bp and 10,000 bp plasmid libraries are sequenced.
• Computer algorithms assemble the sequenced fragments into continuous stretches resembling each chromosome.
• The above steps are repeated 8-9 times.

Role of bioinformatics in HGP
• One of the key research areas was bioinformatics. Without the annotation provided via bioinformatics, the information gleaned from the HGP is not very useful.
• Informatics is the creation, development, and operation of databases and other computing tools to collect, organize, and interpret data.
• Continued investment in current and new databases and analytical tools is critical to the future usefulness of HGP data.
• Databases must adapt to the evolving needs of the scientific community and must allow queries to be answered easily.
• Planners suggest developing a human genome database, analogous to model organism databases, that will link to phenotypic information.
• Also needed are databases and analytical tools for studying the expanding body of gene-expression and functional data, for modeling complex biological networks and interactions, and for collecting and analyzing sequence-variation data.

Genes and their role in the body
• F5
• Position: 1q23
• Full name: coagulation factor V
• Role in the body:
1. Coagulation factor V is an essential component of the blood coagulation cascade.
2. Blood coagulation is initiated either by trauma or by damage to blood vessels and culminates in the conversion of a circulating protein called fibrinogen into its derivative fibrin, the substance of blood clots.
3. Factor V co-operates with another coagulation factor, known as factor X, to convert the inactive polypeptide prothrombin into the active enzyme thrombin.
4. This enzyme then converts fibrinogen into fibrin and allows blood clots to form.
5. Interestingly, factor V is also cleaved by thrombin so there is a positive feedback loop between the two enzymes - blood clotting stimulates more blood clotting. This amplifies the coagulation cascade and results in rapid clotting when required.


Role in disease:
• Defects in the F5 gene generally block the coagulation cascade and result in prolonged bleeding, either externally or into body cavities.
• One particular class of mutation (factor V Leiden mutations) has the opposite effect - these mutations predispose the patient to frequent clotting events, manifesting as deep vein thrombosis.
• This is because factor V also helps to inhibit blood clotting (it acts as an anticoagulant).
• It does this by interacting with another anticoagulant protein called activated protein C (APC).
• Were it not for such regulation, blood clotting would run out of control every time we suffered a minor injury.
• Leiden mutations in F5 specifically prevent interaction between factor V and APC, and therefore affect its anticoagulant activity but not its role in the coagulation pathway.

• RHO
• Position: 3q21-q24
• Full name: rhodopsin (opsin 2, rod pigment)
• Role in the body:
– Rhodopsin is a membrane-spanning protein expressed in the light-sensitive rod cells (photoreceptor cells) of the retina.
– The protein is functional when it is chemically attached to another molecule called retinal, which is derived from vitamin A.
– The fully assembled protein facilitates the perception of dim light.

Role in disease:
– Rhodopsin is required for normal photoreceptor development.
– The absence of rhodopsin (or the presence of a defective rhodopsin) results in retinal degeneration, a condition known as retinitis pigmentosa, which is a major cause of blindness in developed countries.
– About 15 per cent of retinal degeneration in humans is caused by mutations in the RHO gene.
– Retinal degeneration can be slowed by supplementing the diet with vitamin A, as the presence of excess retinal may help to stabilize the protein.

• HD
• Position: 4p16
• Full name: Huntington's disease
• Role in the body:
– The HD gene is expressed widely in the body and produces two distinct mRNAs.
– The larger of the two transcripts is expressed preferentially in the brain and encodes a protein called huntingtin.
– The precise role of the protein is unknown but it is associated with microtubules and synaptic vesicles.
– Microtubules are components of the cytoskeleton that give structural stability to the cell and facilitate the transport of molecules and other components between cell compartments, while synaptic vesicles are required for communication between neurons.
– It is therefore possible that the protein is involved in the transport of substances from the cell body to the synapses. The protein may also play a role in apoptosis (deliberately programmed cell death).

Role in disease:
– The HD gene first came to notice as a candidate for Huntington's disease, a neurodegenerative disorder in which certain neurons are progressively destroyed, leading to dementia.
– The mutation that causes the disease is not a point mutation or a deletion as might be expected, but an expansion of a trinucleotide repeat.
– There is a series of repeats (in this case the sequence CAG) within the coding region of the gene that can expand or contract from generation to generation.
– This produces huntingtin proteins with variable numbers of glutamine residues, a so-called polyglutamine tract.
– Once the number of repeats exceeds 35, it becomes unstable and can increase rapidly in subsequent generations.
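Counting CAG repeats in the HD gene can be sketched with a regular expression. This is a minimal illustration: the sequences are synthetic stand-ins rather than real HD gene sequence, `longest_cag_run` and `repeat_is_unstable` are hypothetical helper names, and the threshold of 35 is the figure quoted in this section.

```python
import re

def longest_cag_run(dna):
    """Number of CAG units in the longest uninterrupted run of CAG repeats."""
    runs = re.findall(r"(?:CAG)+", dna)
    return max((len(run) // 3 for run in runs), default=0)

def repeat_is_unstable(dna, threshold=35):
    """True when the repeat count exceeds the threshold given in the text."""
    return longest_cag_run(dna) > threshold

# Synthetic stand-ins for a coding region with a CAG tract.
typical = "ATG" + "CAG" * 20 + "CCG"
expanded = "ATG" + "CAG" * 40 + "CCG"

print(longest_cag_run(typical), repeat_is_unstable(typical))
print(longest_cag_run(expanded), repeat_is_unstable(expanded))
```

The first sequence carries 20 repeats and falls below the threshold; the second carries 40 and is flagged. Each CAG codon encodes glutamine, so the repeat count equals the length of the polyglutamine tract.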

• XIST
• Full name: X(inactive)-specific transcript
• Role in the body:
– The XIST gene is unusual in that it encodes a functional RNA molecule rather than a protein.
– Most genes are transcribed to produce mRNAs that serve as templates for protein synthesis.
– In the case of the XIST gene, the RNA itself carries out a function in the cell. The function of the XIST RNA is intriguing.
– It is expressed from one of the two X-chromosomes in female cells just prior to inactivation.
– It appears to coat the active regions of the chromosome from which it is expressed and promote histone modifications that favour the formation of heterochromatin.
– This results in the transcriptional repression of large parts of the chromosome.

• Role in disease:
– Rare females have been identified with multiple congenital malformations and severe mental retardation that appear to result from the presence of a small X-chromosome derivative that lacks the XIST gene and therefore cannot be inactivated.
– The multiple symptoms reflect the doubling in dosage of a large number of X-linked genes.
– Other families have been identified in which there is skewed inactivation of either the paternal or maternal X-chromosome.
– It has been suggested that up to 18 per cent of spontaneous abortions may result from skewed X-inactivation.

• SRY
• Position: Yp11
• Full name: sex-determining region Y
• Role in the body:
– The product of the SRY gene is a transcription factor - a protein that controls gene expression.
– It is also known as the testis-determining factor and is required to initiate male development.
– Following SRY expression in the sex-neutral genital ridge of the embryo, other transcription factors are synthesized.
– One of these is called steroidogenic factor 1 and is encoded by the NR5A1 gene on chromosome 9. It helps to activate genes that facilitate the synthesis of male sex hormones, such as anti-Mullerian hormone and testosterone.

Role in disease:
– The absence of the SRY gene in XY individuals leads to complete gonadal dysgenesis, producing adults with streaks of gonadal tissue where ovaries would normally be found and a complete set of Mullerian ducts (fallopian tubes, uterus).
– The external appearance of such individuals is female.
– Translocation of SRY to the tip of the X chromosome results in male development in XX individuals. However, other genes on the Y chromosome are required for sperm development, so XX SRY males are generally sterile.

Ethical, Legal, and Social Implications
• Fairness in the use of genetic information by insurers, employers, courts, schools, adoption agencies, and the military, among others.
• Privacy and confidentiality of genetic information.
• Psychological impact and stigmatization due to an individual's genetic differences.
• Reproductive issues including adequate informed consent for complex and potentially controversial procedures, use of genetic information in reproductive decision making, and reproductive rights.
• Clinical issues including the education of doctors and other health service providers, patients, and the general public in genetic capabilities, scientific limitations, and social risks; and implementation of standards and quality-control measures in testing procedures.
• Uncertainties associated with gene tests for susceptibilities and complex conditions (e.g., heart disease) linked to multiple genes and gene-environment interactions.
• Conceptual and philosophical implications regarding human responsibility, free will vs genetic determinism, and concepts of health and disease.
• Health and environmental issues concerning genetically modified (GM) foods and microbes.
• Commercialization of products including property rights (patents, copyrights, and trade secrets) and accessibility of data and materials.

Advantages and Disadvantages of human genome project
• Advantages:
– improving our knowledge of gene expression,
– elucidating the function of the large proportion of DNA we know little about,
– discovering possible means of diagnosis for some genetic diseases,
– discovering possible treatments for currently untreatable genetic diseases,
– discovering new tools and techniques for genetic research,
– generating the ability to go directly from a trait to a gene,
– identifying genetically validated therapeutic targets, which would improve the cost-benefit ratio in pharmaceutical discovery,
– investigating the development of drug resistance in bacteria,
– investigating antigenic variation and host-parasite interaction at both the host and parasite level.

• Disadvantages:
– the cost (the money could be spent elsewhere),
– the anguish resulting from knowing that a person has an untreatable genetic disease,
– the use or misuse of genetic information by such organisations as insurance companies and employers,
– the ownership of genetic test results,
– the patenting of human genes and DNA,
– the increasing gap between rich and poor countries in the quality of life and the level of health and disease treatment,
– the exploitation of isolated populations in the search for disease genes,
– the ethics of accumulating genotypic profiles of people (can they be used for anything that the researcher wants?),
– decisions about the ownership of data by 'affected' or donor individuals,
– the ethics of germline gene therapy,
– the ethics of somatic gene therapy,
– the costs of genetic treatment versus benefit to the community.

• Human Genome Project research will help solve one of the greatest mysteries of life: how does one fertilized egg "know" to give rise to so many different specialized cells, such as those making up muscles, brain, heart, eyes, skin, blood, and so on?
• For a human being or any organism to develop normally, a specific gene or sets of genes must be switched on in the right place in the body at exactly the right moment in development.
• Information generated by the Human Genome Project will shed light on how this intimate dance of gene activity is choreographed into the wide variety of organs and tissues that make up a human being.


Biotechnology by clark
