DNA Sequencing and Human Genome Project

De La Salle University Dasmariñas
Deoxyribonucleic Acid (DNA) Sequencing and Human Genome Project
A Review Paper presented to Dr. MARGEL BONIFACIO, Professor, Physical Sciences Department, College of Science
In Partial Fulfilment of the Requirements for the course Biochemistry
ROQUE, Jason R. HERNANDEZ, Ritz Hendrie C.
September 2011

TABLE OF CONTENTS
Abstract
Introduction
Definition of Terms
Background
Discussion
DNA Preparation
Sequencing Reaction
Strand Separation
Primer Annealing
Primer Extension
Chain Termination
Putting it all together
Capillary Electrophoresis
Computer Analysis
Summary of the Procedures
Conclusion
References

ABSTRACT
Deoxyribonucleic acid (DNA) sequencing is one of the core methods of the Human Genome Project. The method consists of several steps that together make genome sequencing useful to science and medicine: DNA preparation, the sequencing reaction (strand separation, primer annealing, primer extension, and chain termination), capillary electrophoresis, and computer analysis. Every step must be carried out correctly and in order to obtain the desired DNA sequence.
INTRODUCTION
Definition of Terms
DNA
Deoxyribonucleic acid is the hereditary material in humans and almost all other organisms. Nearly every cell in a person's body has the same DNA. Most DNA is located in the cell nucleus (where it is called nuclear DNA), but a small amount of DNA can also be found in the mitochondria (where it is called mitochondrial DNA or mtDNA). [1]
Plasmid
A 'plasmid' is a small, circular piece of DNA that is often found in bacteria. This innocuous molecule might help the bacteria survive in the presence of an antibiotic, for example, due to the genes it carries. To scientists, however, plasmids are important because (i) we can isolate them in large quantities, (ii) we can cut and splice them, adding whatever DNA we choose, (iii) we can put them back into bacteria, where they'll replicate along with the bacteria's own DNA, and (iv) we can isolate them again, getting billions of copies of whatever DNA we inserted into the plasmid. Plasmids are limited to sizes of 2.5-20 kilobases (kb), in general. [2]
BAC
The term 'BAC' is an acronym for 'Bacterial Artificial Chromosome', and in principle, it is used like a plasmid. We construct BACs that carry DNA from humans, mice, or any other source, and we insert the BAC into a host bacterium. As with the plasmid, when we grow that bacterium, we replicate the BAC as well. Huge pieces of DNA can be easily replicated using BACs, usually on the order of 100-400 kilobases (kb). Using BACs, scientists have cloned (replicated) major chunks of human DNA. This, as you will see later, is critical to the Human Genome Project. [2]
Vector
The 'vector' is generally the basic type of DNA molecule used to replicate your DNA, like a plasmid or a BAC. [2]
Insert
The 'insert' is a piece of DNA we've purposely put into another (a 'vector') so that we can replicate it. Consequently, the insert is usually the interesting part. In the case of the Human Genome Project or other sequencing projects, the insert is the part we want to sequence, the part we don't know. Usually we know the complete DNA sequence of the vector. [2]
Shotgun Sequencing
Shotgun sequencing is a method for determining the sequence of a very large piece of DNA. The basic DNA sequencing reaction can only get the sequence of a few hundred nucleotides. For larger ones (like BAC DNA), we usually fragment the DNA and insert the resultant pieces into a convenient vector (a plasmid, usually) to replicate them. After we sequence the fragments, we try to deduce from them the sequence of the original BAC DNA. [2]
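The fragmenting step described above can be illustrated with a short Python sketch. It is only a toy model under stated assumptions: the function names, the fragment lengths, and the example sequence are hypothetical, and random slicing of a string stands in for the physical shearing of BAC DNA.

```python
import random

def shotgun_fragments(sequence, n_fragments, min_len, max_len, seed=0):
    """Cut random fragments out of `sequence` (a stand-in for shearing DNA).

    Returns a list of (start, fragment) pairs. Assumes min_len <= len(sequence).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    fragments = []
    for _ in range(n_fragments):
        length = rng.randint(min_len, min(max_len, len(sequence)))
        start = rng.randint(0, len(sequence) - length)
        fragments.append((start, sequence[start:start + length]))
    return fragments

def coverage(sequence_length, fragments):
    """Count how many fragments cover each position of the original sequence."""
    depth = [0] * sequence_length
    for start, frag in fragments:
        for i in range(start, start + len(frag)):
            depth[i] += 1
    return depth

if __name__ == "__main__":
    bac = "ATGCGTACGTTAGCCTAGGCTTAACGGATCCGATTACAGGCT"  # toy stand-in for BAC DNA
    frags = shotgun_fragments(bac, n_fragments=12, min_len=8, max_len=15)
    depth = coverage(len(bac), frags)
    print("mean coverage:", sum(depth) / len(depth))
    print("uncovered positions:", depth.count(0))
```

Because the fragments are random, some positions may be covered several times and others not at all, which is why many overlapping fragments are sequenced in practice.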
Background
What is DNA Sequencing? DNA sequencing, the process of determining the exact order of the 3 billion chemical building blocks (called bases and abbreviated A, T, C, and G) that make up the DNA of the 24 different human chromosomes, was the greatest technical challenge in the Human Genome Project. Achieving this goal has helped reveal the estimated 20,000-25,000 human genes within our DNA as well as the regions controlling them. The resulting DNA sequence maps are being used by 21st Century scientists to explore human biology and other complex phenomena.
DNA sequencing is a relatively new technology. Since the invention of the microscope, it has been known that the central part of every cell (human, animal, plant, or any other living organism) holds a small piece of information-carrying matter that probably contains the blueprint for how each cell in the body is formed. In 1944, deoxyribonucleic acid was identified as the chemical making up this tiny central encyclopedia found in every cell, and its abbreviation, DNA, became a household word. [3]
Automated DNA sequencing is a core research tool used by almost every research biochemistry lab. It is used to determine the sequence of DNA, or the genetic code, that serves as the blueprint of life for every organism on Earth. [4]
Nucleic acid sequencing is a relatively late arrival for the sequencing of biological macromolecules, for up until the late 1980s protein sequencing was the primary tool for obtaining coding information found in the molecules of life. Protein sequencing is a slow and expensive endeavor, and it could easily take a year or more to sequence a protein of 500 amino acids. Today the sequence of a protein can be determined from DNA analyses in just a few days. Because of the straightforward and repetitive nature of the procedure, the sequencing itself is typically performed in centralized facilities where automated machines carry out the reactions and data analysis. [4]
Figure 1. Nucleic Acid Sequencing
Meeting Human Genome Project Sequencing goals by 2003 required continual improvements in sequencing speed, reliability, and costs. Previously, standard methods were based on separating DNA fragments by gel electrophoresis, which was extremely labor intensive and expensive.
Newer gel-based sequencers instead use multiple tiny (capillary) tubes to run standard electrophoretic separations. These separations are much faster because the tubes dissipate heat well and allow the use of much higher electric fields to complete sequencing in shorter times. [5]
Most DNA sequencing reactions use dideoxynucleotides (ddNTPs) to stop DNA synthesis at specific nucleotides. For example, if ddCTP is incorporated into a growing strand of DNA, the lack of a free 3' OH group prevents the next nucleotide from being added, and the chain terminates. In automated sequencing we use a different fluorescent label attached to each of the four dideoxynucleotides (ddA, ddC, ddG and ddT). Thus we can determine the terminal base in each fragment of DNA. [6]
DISCUSSION
Scientists have developed a number of biochemical and genetic techniques by which DNA can be separated, rearranged, and transferred from one cell to another. Some of these laboratory methods help scientists study the properties of genes in nature, for example, by comparing DNA from different animals to find out whether those animals are closely related to each other or only distant relatives. Other DNA techniques provide tools for genetic engineering, the alteration of genes in an organism. These tools are used in industry to develop commercial products, such as hardier crops, microbes that can break down oil slicks or decompose garbage, and improved medicines.
Figure 2. DNA Molecule
A DNA molecule consists of a ladder, formed of sugars and phosphates, and four nucleotide bases: adenine (A), thymine (T), cytosine (C), and guanine (G). The genetic code is specified by the order of the nucleotide bases, and each gene possesses a unique sequence of base pairs. Scientists use these base sequences to locate the position of genes on chromosomes and to construct a map of the entire human genome. [2]
Gene libraries are often necessary in genetic engineering to isolate a DNA segment when its details are not fully known, e.g., in order to determine its nucleotide sequence. In this case, one can use what are known as DNA libraries. A DNA library consists of a large number of vector DNA molecules containing different fragments of foreign DNA. For example, it is possible to take all of the mRNA molecules present in a cell and reverse-transcribe them into DNA. These DNA fragments (known as copy DNA or cDNA) are then randomly introduced into vector molecules.
A library of genomic DNA can be established by cleaving the total DNA from a cell into small fragments using restriction endonucleases, and then incorporating these into vector DNA. Suitable vectors for gene libraries include, for example, bacteriophages (phages for short). Phages are viruses that only infect bacteria and are replicated by them. Gene libraries have the advantage that they can be searched for specific DNA segments, using hybridization with oligonucleotides.
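As an illustration of cleavage by a restriction endonuclease, the following Python sketch cuts a DNA string at every occurrence of a recognition site. The enzyme EcoRI (which recognizes GAATTC and cuts after the first G) is used here only as a familiar example and is not named in the sources above; the function name is hypothetical, and the sketch looks at one strand only, ignoring the staggered cut on the opposite strand.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut `dna` at every occurrence of `site`, `cut_offset` bases into the site.

    The defaults model EcoRI on a single strand (G^AATTC).
    """
    fragments = []
    last_cut = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[last_cut:cut])  # piece up to the cut point
        last_cut = cut
        pos = dna.find(site, pos + 1)        # look for the next site
    fragments.append(dna[last_cut:])         # remainder after the last cut
    return fragments

if __name__ == "__main__":
    for fragment in digest("AAGAATTCTTGAATTCGG"):
        print(fragment)
```

Joining the fragments in order gives back the original sequence, which is a quick check that no bases were lost in the toy digest.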
Since it was discovered that DNA is the material in the cell that carries our genetic information, understanding DNA has become a primary focus of genetic research. Our chromosomes, or genome, consist of neatly wound strands of DNA. All living organisms, from bacteria to human beings, contain DNA in each of their cells. Each cell contains the entire genetic code for that organism. [4]
Figure 3. Illustration of the four (4) building blocks of DNA

DNA consists of just four building blocks, or nucleotides. These four building blocks, known by their abbreviations A, T, G and C are used as the alphabet to write our genetic code. All the instructions needed to build our bodies are encoded using just these four letters.
Figure 4. Genome / Gene Sizes
Genomes come in a variety of sizes. Viruses, which cannot live without a host cell, have the smallest genomes, while higher-order organisms such as plants and animals have genomes that are billions of bases long. Like genomes, individual genes can vary greatly in size, from several hundred bases to millions of bases. The average human gene is about 3,000 bases long, although only about 1,000 to 2,000 bases actually encode protein. These protein-encoding stretches of DNA are called exons.
Introns, which are intervening stretches of DNA that are not fully understood, make up the rest of the gene. The largest human gene, dystrophin, a muscle protein implicated in muscular dystrophy, is 2.4 million bases in length. Viral and bacterial DNA sequences, which do not contain introns, are typically the shortest genes.
Figure 5. Exons and Introns
The dideoxy DNA sequencing procedure was invented by Frederick Sanger and his colleagues in 1977. With a few improvements, this method is still used today. This elegant procedure, which can be fully automated, allows large sequencing centers to read over 1,000 bases of DNA sequence per second, a feat which now allows scientists to sequence even large genomes within the span of years, rather than decades. [4]
Figure 6. Dideoxy DNA Sequencing

First, DNA has to be extracted from the cells of the organism being studied. The sequencing reaction is then performed on the DNA, and the sequenced DNA strands are sorted by size using capillary electrophoresis. Finally, the DNA code is read by a computer, which displays the data for scientists to use.
Figure 7. Sequencing Framework
DNA Preparation
Before it can be sequenced, DNA needs to be purified from cells. First, the cells and their nuclei are broken open. This can be accomplished by mechanical methods, such as grinding, or by chemical methods that break apart cell membranes. The DNA floating around in this soup is still coated with protective proteins. The DNA can be selectively removed from this soup by precipitating it, and the DNA-binding proteins can then be cleaned away.
Figure 8. Procedures of DNA Preparation
Very large pieces of DNA, such as whole chromosomes or genomes, are cut into smaller pieces and stored in vectors (plasmids), which are larger pieces of DNA with the ability to be reproduced when placed in host cells such as bacteria.
Bacteria containing a vector are placed in culture medium, where they multiply a million-fold or more. Each time a bacterium divides, the DNA vector placed inside is also copied. In this way, the target DNA can be multiplied exponentially. Each of the copied DNA pieces is called a clone. [4]
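The "million-fold" figure above follows from simple doubling arithmetic, sketched below in Python. The function names are hypothetical, and the calculation assumes that every division exactly doubles the number of vector copies, which is an idealization.

```python
import math

def doublings_needed(fold_increase):
    """Smallest number of doublings that reaches at least `fold_increase`."""
    return math.ceil(math.log2(fold_increase))

def copies_after(doublings, starting_copies=1):
    """Copies present after a number of ideal doublings."""
    return starting_copies * 2 ** doublings

if __name__ == "__main__":
    n = doublings_needed(1_000_000)
    print(n, "doublings give", copies_after(n), "copies")
```

Under this idealized assumption, about 20 rounds of division are enough to turn one copy of the target DNA into more than a million.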

Sequencing Reaction
The sequencing reaction itself consists of four steps, which will be covered in detail in this section. First, the double-stranded DNA is separated into single strands, and a small starter piece of DNA called a primer binds to one of the strands, called the template strand.
In the extension step, a new DNA strand is made that is complementary to the template strand. Starting at the primer, DNA polymerase uses the template strand as a guide to recreate the second DNA strand.
The termination step is the key to the sequencing reaction. Strand extension is halted by the incorporation of a dye-labeled terminator nucleotide, which identifies the base at the position where strand extension stopped. When many strand termination reactions are performed together, each of the bases in a DNA strand can be identified. [4]
Strand Separation
Double-stranded DNA needs to be denatured, or separated into single strands, before it can be sequenced. This process is accomplished by heating the DNA, which disrupts the hydrogen bonds and van der Waals forces that hold the two chains of DNA together in a double helix.
Figure 9. Strand Separation by Heating
Primer Annealing
Next, a small single-stranded DNA piece of about 20 bases, called an oligonucleotide, is annealed to the denatured template strand. This oligonucleotide is needed to prime the next step, DNA extension. In theory, the two DNA strands that were separated in the preceding step could just snap back together. This is avoided by using a rapid cooling process, which gives the small oligonucleotides an advantage over long DNA strands in annealing. In addition, a large excess of primers is used to again ensure that the primers will out-compete the complementary DNA strand for annealing to the template.
Figure 10. Primer Annealing
The oligonucleotide primer must be of complementary sequence to the template strand in order to bind by base-pair interactions.
Figure 11. Complementary Sequence
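The requirement that the primer be complementary to the template can be checked with a few lines of Python. This is a minimal sketch: the function names and sequences are made up for illustration, both strands are written 5' to 3', and only exact matches are considered (real primers tolerate some mismatch).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def find_binding_site(template, primer):
    """Return the 0-based index on `template` where `primer` can anneal, or -1.

    A primer anneals antiparallel to the template, so the site on the
    template reads as the reverse complement of the primer.
    """
    return template.find(reverse_complement(primer))

if __name__ == "__main__":
    template = "GGATCCATGCAAGTTCGA"
    primer = "TCGAAC"  # reverse complement of the last six template bases
    print(find_binding_site(template, primer))
```

A primer that is not complementary to any part of the template returns -1, which corresponds to a primer that would fail to anneal.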
Primer Extension
During the extension phase, a bacterial DNA polymerase enzyme begins assembling a new DNA chain from the individual nucleotide building blocks, or dNTPs, provided in the reaction mixture.
Figure 12. New DNA Chain: dNTPs
The nucleotides are added in the order specified by the complementary bases in the template strand. DNA polymerase cannot start copying a template strand without a small piece of DNA to start the extension process. This is why the primer was added in the previous step.
Figure 13. Copying the Template Strand by means of DNA Polymerase
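The extension step can be pictured as reading the template one base at a time and appending the complementary nucleotide to the primer. The Python sketch below is an illustration only: the function name is hypothetical, and the template is written 3' to 5' so that the new strand reads left to right in the 5' to 3' direction.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def extend_primer(template_3to5, primer):
    """Extend `primer` (5'->3') along a template written 3'->5'.

    Raises ValueError if the primer does not pair with the start of the
    template, mirroring the fact that polymerase needs an annealed primer.
    """
    expected = "".join(COMPLEMENT[b] for b in template_3to5[:len(primer)])
    if primer != expected:
        raise ValueError("primer is not complementary to the start of the template")
    new_strand = list(primer)
    for base in template_3to5[len(primer):]:
        new_strand.append(COMPLEMENT[base])  # add the dNTP that pairs with the template base
    return "".join(new_strand)

if __name__ == "__main__":
    print(extend_primer("TACGGATC", "ATG"))
```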
Chain Termination
The reaction mixture also contains small amounts of each of the four dideoxynucleotides, or ddNTPs, which lack the 3' hydroxyl group necessary for chain extension. Whenever a dideoxynucleotide is incorporated into a growing DNA chain, it terminates chain growth because another nucleotide cannot be attached to it. Each of the four ddNTPs is labeled with a different dye, which can later be detected using a special laser.
Figure 14. Dideoxynucleotides or ddNTPs
The DNA polymerase occasionally incorporates a labeled dideoxynucleotide into a growing DNA strand. This does not happen very often, because the concentration of dideoxynucleotides is much lower than the concentration of dNTPs in the reaction mixture. The ratio of dNTPs to ddNTPs is carefully balanced to get just the right number of chain termination events.
Figure 15. Addition of ddNTPs to the Template Strand
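The effect of the dNTP-to-ddNTP ratio can be explored with a small simulation. The Python sketch below is a simplified model, not a description of any real instrument: `dd_fraction` is a hypothetical parameter giving the chance that a ddNTP rather than a dNTP is added at each position, the template is written 3' to 5', and a lowercase letter marks the dye-labeled terminator.

```python
import random
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesize_with_terminators(template_3to5, primer, dd_fraction, rng):
    """Extend the primer until a ddNTP is incorporated or the template ends.

    Returns (strand, terminated). A lowercase final letter marks the ddNTP.
    """
    strand = primer
    for base in template_3to5[len(primer):]:
        new_base = COMPLEMENT[base]
        if rng.random() < dd_fraction:
            return strand + new_base.lower(), True  # chain termination event
        strand += new_base  # ordinary dNTP, extension continues
    return strand, False

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for a reproducible run
    template = "TACGGATCCGTTAAGC"
    lengths = Counter()
    for _ in range(1000):
        strand, terminated = synthesize_with_terminators(template, "ATG", 0.1, rng)
        if terminated:
            lengths[len(strand)] += 1
    for length in sorted(lengths):
        print(length, lengths[length])
```

In this model a high `dd_fraction` produces mostly short fragments and a very low one produces few terminated fragments at all, which is one way to see why the ratio has to be balanced.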
Putting it all together
An actual sequencing reaction mixture contains thousands of DNA template strands, which are all being sequenced simultaneously. Simply by chance, some annealed primers will only be extended a few nucleotides before the chain extension is terminated by the addition of a ddNTP. However, other primers will form a long chain of DNA before a ddNTP is incorporated. Thus, there will be a population of DNA strands in the reaction, some very short, some very long, and every possible length in between. To further increase the yield of sequenced strands, the sequencing reaction is performed in a thermal cycler, which cycles through the heating and cooling steps dozens of times, in effect repeating the sequencing reaction many times in one experiment.
Figure 16. Simultaneous DNA Sequencing
Capillary Electrophoresis
The newly synthesized DNA strands, each labeled with one of four dyes, are now sorted by length using capillary electrophoresis. First, the reaction mixture is heated to keep the newly synthesized single strands from annealing with the template DNA strand. The dye-labeled single strands are loaded onto a tiny capillary tube containing a viscous, gel-like material.
Figure 17. Heating of the Reaction Mixture
An electrical current pulls the negatively charged DNA strands through the capillary. This tube is not much thicker than a human hair and is 1 to 3 feet long, sufficient to separate strands that differ in length by only one base. Because of the small dimensions involved, preparation of the capillary and loading of the sample are computer controlled.
Figure 18. Capillary Tube
Shorter DNA strands migrate through the gel material more quickly, and come out the bottom of the capillary first, while longer strands become tangled in the gel material and take longer to emerge out the bottom.
Figure 19. Migration of the Strands
As the strands emerge out the bottom of the capillary they pass through a laser beam that excites the fluorescent dye attached to the dideoxynucleotide at the end of each strand. This causes the dye to fluoresce, or glow, at a specific wavelength, or color. This color is then detected by a photocell, which feeds the information to the computer. [4]
Figure 20. Detection of ddNTPs
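The logic of reading a sequence from size-sorted fragments can be sketched in Python. In this simplified model each fragment is reduced to a hypothetical (length, terminal base) pair, where the terminal base stands in for the dye color seen by the detector; the function name and the example data are assumptions made for illustration.

```python
def read_sequence(fragments):
    """Recover the synthesized sequence from (length, terminal_base) pairs.

    Fragments emerge shortest first, so sorting by length and reading the
    terminal base of each length class gives the bases in order.
    """
    base_at_length = {}
    for length, base in fragments:
        previous = base_at_length.setdefault(length, base)
        if previous != base:
            raise ValueError(f"conflicting bases at length {length}")
    return "".join(base_at_length[n] for n in sorted(base_at_length))

if __name__ == "__main__":
    detected = [(23, "C"), (21, "A"), (22, "T"), (21, "A"), (24, "G")]
    print(read_sequence(detected))
```

The order in which fragments are listed does not matter, because the separation by length is what restores the order of the bases.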
Computer Analysis
The computer displays the information received from the photocell as an electropherogram, which is a tracing of signal received by the photodetector in each of the four wavelengths. Although the real colors seen by the photodetector are close to green, yellow, orange and red, the computer assigns false colors to each of the four tracings to make it easier to tell them apart. It also prints the letter of the appropriate base below each of the signal peaks. Because successive peaks correspond to DNA segments differing in length by one nucleotide, the sequence of peaks reveals the sequence of bases in the original DNA sample. [4]
Figure 21. Electropherogram of DNA Sequence
Table 1. Fluorescence Color Assignment
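How a computer might turn the four tracings into letters can be sketched as follows. This is a deliberately simple model and not the algorithm used by any particular sequencer: each peak is represented as a dictionary of signal intensities for the four bases, the strongest signal is called, and the hypothetical `min_ratio` threshold marks a peak as ambiguous ("N") when the two strongest signals are too close.

```python
def call_bases(peaks, min_ratio=2.0):
    """Call one base per peak from four-channel signal intensities.

    `peaks` is a list of dicts such as {"A": 900, "C": 40, "G": 30, "T": 20}.
    A peak is called "N" when the strongest signal is zero or is less than
    `min_ratio` times the second strongest.
    """
    calls = []
    for signals in peaks:
        ranked = sorted(signals.items(), key=lambda item: item[1], reverse=True)
        (best_base, best), (_, second) = ranked[0], ranked[1]
        if best > 0 and best >= min_ratio * second:
            calls.append(best_base)
        else:
            calls.append("N")  # ambiguous peak
    return "".join(calls)

if __name__ == "__main__":
    trace = [
        {"A": 900, "C": 40, "G": 30, "T": 20},
        {"A": 50, "C": 60, "G": 800, "T": 10},
        {"A": 400, "C": 350, "G": 20, "T": 30},
    ]
    print(call_bases(trace))
```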
Summary of Procedure
Today, dideoxy sequencing is the method of choice to sequence very long strands of DNA. DNA is purified from the cells of the organism of interest, and placed into cloning vectors, which allow the DNA to be multiplied by bacterial hosts. Each clone is then individually sequenced. This method can be done manually or can be fully automated, depending on how much DNA needs to be sequenced.
The DNA is denatured and a small DNA oligonucleotide is annealed at one end of the sequence of interest on the template strand. The DNA polymerase extends the oligonucleotide, using the template strand to guide incorporation of nucleotides. Once in a while, a dideoxynucleotide will be incorporated into the growing DNA strand. Because it is missing the 3' hydroxyl group, the dideoxynucleotide will prevent the DNA chain from being extended further. In addition, each dideoxynucleotide has a different color label. Consequently, each terminated DNA chain is colored according to the nucleotide at its end.
When the chains are separated by length by capillary electrophoresis, individual chains of increasing length can be identified by their color. A laser at the bottom of the capillary excites the fluorescent labels as they come out of the capillary. The fluorescence color then tells the computer which base is represented, and the computer records each base, one by one, on a graph called an electropherogram.
CONCLUSION
Overlapping sequence data from many clones is analyzed by powerful computers, which regenerate the full-length sequence by piecing the short sequences together like a puzzle. Entire genomes can be sequenced in this manner. Genome sequencing projects, representing many different organisms, hold the promise of unprecedented advances in industry and medicine. Microbial genomes may encode enzymes that could help make industrial processes more efficient. Human genome sequences are helping us to better understand human metabolism and disease and may make it easier to treat genetic diseases or design better drugs in the future.
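The idea of piecing short sequences together like a puzzle can be illustrated with a greedy overlap merge in Python. This is a toy sketch and not the method used by genome centers: the function names and the minimum overlap are assumptions, reads are assumed to be error-free and from one strand, and a read lying wholly inside another read is not handled.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left; the remaining pieces stay separate
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

if __name__ == "__main__":
    print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"]))
```

If the reads do not overlap enough, the function returns several separate pieces instead of one sequence, which mirrors the gaps that remain when coverage is too low.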
REFERENCES
1. Retrieved from http://ghr.nlm.nih.gov/handbook/basics/dna on September 23, 2011
2. Retrieved from http://seqcore.brcf.med.umich.edu/doc/educ/dnapr/mbglossary/mbgloss.html on September 23, 2011
3. Retrieved from http://www.dnasequencing.com/ on September 23, 2011
4. Retrieved from http://www.wiley.com/college/pratt/0471393878/student/animations/dna_sequencing/index.html on September 23, 2011
5. Retrieved from http://www.ornl.gov/sci/techresources/Human_Genome/faq/seqfacts.shtml on September 23, 2011
6. Retrieved from http://bioweb.uwlax.edu/genweb/molecular/theory/dna_sequencing/dna_sequencing.html on September 23, 2011