A Review Paper presented to Dr. MARGEL BONIFACIO Professor, Physical Sciences Department College of Science
September 2011
Digitally signed by Jason Raquin Roque DN: cn=Jason Raquin Roque, o, ou, email=jason_mike15@yahoo. com, c=PH Date: 2012.05.15 14:25:24 +08'00'
Abstract
Introduction
Discussion
  DNA Preparation
  Sequencing Reaction
  Strand Separation
  Primer Annealing
  Primer Extension
  Chain Termination
  Putting it all together
  Capillary Electrophoresis
  Computer Analysis
  Summary of the Procedures
Conclusion
References
ABSTRACT
Deoxyribonucleic acid (DNA) sequencing is one of the core methods of the Human Genome Project. The method consists of several steps that make the Human Genome Project useful to science and medicine: DNA preparation, sequencing reaction, strand separation, primer annealing, primer extension, chain termination, putting it all together, capillary electrophoresis, and computer analysis. All of the steps must be carried out in order to obtain the desired DNA sequence.
INTRODUCTION
Definition of Terms
DNA
Deoxyribonucleic acid is the hereditary material in humans and almost all other organisms. Nearly every cell in a person's body has the same DNA. Most DNA is located in the cell nucleus (where it is called nuclear DNA), but a small amount of DNA can also be found in the mitochondria (where it is called mitochondrial DNA or mtDNA). [1]
Plasmid
A 'plasmid' is a small, circular piece of DNA that is often found in bacteria. This innocuous molecule might help the bacteria survive in the presence of an antibiotic, for example, due to the genes it carries. To scientists, however, plasmids are important
BAC
The term 'BAC' is an acronym for 'Bacterial Artificial Chromosome', and in principle, it is used like a plasmid. We construct BACs that carry DNA from humans, mice, or any other source, and we insert the BAC into a host bacterium. As with the plasmid, when we grow that bacterium, we replicate the BAC as well. Huge pieces of DNA can be easily replicated using BACs, usually on the order of 100-400 kilobases (kb). Using BACs, scientists have cloned (replicated) major chunks of human DNA. This, as you will see later, is critical to the Human Genome Project. [2]
Vector
The 'vector' is generally the basic type of DNA molecule used to replicate your DNA, like a plasmid or a BAC. [2]
Insert
The 'insert' is a piece of DNA we've purposely put into another (a 'vector') so that we can replicate it. Consequently, the 'insert' is usually the interesting part. In the case of the Human
Shotgun Sequencing
Shotgun sequencing is a method for determining the sequence of a very large piece of DNA. The basic DNA sequencing reaction can only get the sequence of a few hundred nucleotides. For larger ones (like BAC DNA), we usually fragment the DNA and insert the resultant pieces into a convenient vector (a plasmid, usually) to replicate them. After we sequence the fragments, we try to deduce from them the sequence of the original BAC DNA. [2]
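The step of deducing the original sequence from the sequenced fragments can be illustrated with a toy example. The sketch below is a minimal greedy overlap merge, not the algorithm of any real assembler. The function names (`overlap`, `greedy_assemble`), the `min_len` threshold, and the sample sequence are all assumptions made for illustration, and the reads are assumed to be error-free and taken from a single strand.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that is also a prefix of `b`.

    Overlaps shorter than `min_len` are ignored (the function returns 0).
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two reads that share the longest overlap.

    Returns the remaining contigs; a single contig means all reads
    were joined into one sequence.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining pair overlaps
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical 22-base "original" and three overlapping reads cut from it.
original = "ATGCGTACGTTAGCCTAGGCTA"
reads = [original[12:22], original[0:10], original[6:16]]
print(greedy_assemble(reads))
```

Real fragments are hundreds of bases long, contain read errors, and come from both strands, so production assemblers use far more careful methods; the sketch only shows why overlapping fragments are enough to recover the order.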
Background
What is DNA Sequencing?
DNA sequencing, the process of determining the exact order of the 3 billion chemical building blocks (called bases and abbreviated A, T, C, and G) that make up the DNA of the 24 different human chromosomes, was the greatest technical challenge in the Human Genome Project. Achieving this goal has helped reveal the estimated 20,000-25,000 human genes within our DNA as well as the regions controlling them. The resulting DNA sequence maps are being used by 21st-century scientists to explore human biology and other complex phenomena.
DNA sequencing is a newer technology; it has been known since the invention of the microscope that some central part of the human cell
Automated DNA sequencing is a core research tool used by almost every research biochemistry lab. It is used to determine the sequence of DNA, or the genetic code, that serves as the blueprint of life for every organism on Earth. [4]
Nucleic acid sequencing is a relatively late arrival among the methods for sequencing biological macromolecules: up until the late 1980s, protein sequencing was the primary tool for obtaining the coding information found in the molecules of life. Protein sequencing is a slow and expensive endeavor, and it could easily take a year or more to sequence a protein of 500 amino acids. Today the sequence of a protein can be determined from DNA analyses in just a few days. Because of the straightforward and repetitive nature of the procedure, the sequencing itself is typically performed in centralized facilities where automated machines carry out the reactions and data analysis. [4]
Meeting Human Genome Project Sequencing goals by 2003 required continual improvements in sequencing speed, reliability, and costs. Previously, standard methods were based on separating DNA fragments by gel electrophoresis, which was extremely labor intensive and expensive.
Capillary sequencers instead run the same kind of gel-based electrophoretic separation inside multiple tiny (capillary) tubes. These separations are much faster because the tubes dissipate heat well and allow the use of much higher electric fields to complete sequencing in shorter times. [5]
Most DNA sequencing reactions use dideoxynucleotides (ddNTPs) to stop DNA synthesis at specific nucleotides. For example, if a ddCTP is incorporated into a growing strand of DNA, the lack of a free 3' OH group prevents the next nucleotide from being added, and the chain terminates. In automated sequencing, a different fluorescent label is attached to each of the four dideoxynucleotides (ddA, ddC, ddG and ddT). Thus we can determine the terminal base in each fragment of DNA. [6]
DISCUSSION
Scientists have developed a number of biochemical and genetic techniques by which DNA can be separated, rearranged, and transferred from one cell to another. Some of these laboratory methods help scientists study the properties of genes in nature, for example, by comparing DNA from different animals to find out whether those animals are closely related to each other or only distant relatives. Other DNA techniques provide tools for genetic engineering, the
A DNA molecule consists of a ladder, formed of sugars and phosphates, and four nucleotide bases: adenine (A), thymine (T), cytosine (C), and guanine (G). The genetic code is specified by the order of the nucleotide bases, and each gene possesses a unique sequence of base pairs. Scientists use these base sequences to locate the position of genes on chromosomes and to construct a map of the entire human genome. [2]
Gene libraries are often necessary in genetic engineering to isolate a DNA segment when its details are not fully known, e.g., in order to determine its nucleotide sequence. In this case, one can use what are known as DNA libraries. A DNA library consists of a large number of vector DNA molecules containing
A library of genomic DNA can be established by cleaving the total DNA from a cell into small fragments using restriction endonucleases, and then incorporating these fragments into vector DNA. Suitable vectors for gene libraries include, for example, bacteriophages (phages for short). Phages are viruses that only infect bacteria and are replicated by them. Gene libraries have the advantage that they can be searched for specific DNA segments, using hybridization with oligonucleotides.
Since it was discovered that DNA is the material in the cell that carries our genetic information, understanding DNA has become a primary focus of genetic research. Our chromosomes, or genome, consist of neatly wound strands of DNA. All living organisms, from bacteria to human beings, contain DNA in each of their cells. Each cell contains the entire genetic code for that organism. [4]
Genomes come in a variety of sizes. Viruses, which cannot live without a host cell, have the smallest genomes, while higher-order organisms such as plants and animals have genomes that are billions of bases long. Like genomes, individual genes can vary greatly in size, from several hundred bases to millions of bases. The average human gene is about 3,000 bases long, although only about 1,000 to 2,000 bases actually encode protein. These protein-encoding stretches of DNA are called exons.
Introns, which are intervening stretches of DNA that are not fully understood, make up the rest of the gene. The largest human gene, dystrophin, a muscle protein implicated in muscular dystrophy, is 2.4 million bases in length.
The dideoxy DNA sequencing procedure was invented by Frederick Sanger and his colleagues in 1977. With a few improvements, this method is still used today. This elegant procedure, which can be fully automated, allows large sequencing centers to read over 1,000 bases of DNA sequence per second, a feat which now allows scientists to sequence even large genomes within the span of years, rather than decades. [4]
DNA Preparation
Before it can be sequenced, DNA needs to be purified from cells. First, the cells and their nuclei are broken open. This can be accomplished by mechanical methods, such as grinding, or by chemical methods that break apart cell membranes. The DNA floating around in this soup is still coated with protective proteins. The DNA can be selectively removed
Very large pieces of DNA, such as whole chromosomes or genomes, are cut into smaller pieces and stored in vectors (plasmids), which are larger pieces of DNA with the ability to be reproduced when placed in host cells such as bacteria.
Bacteria containing a vector are placed in culture medium, where they multiply a million-fold or more. Each time a bacterium divides, the DNA vector placed inside is also copied. In this way, the target DNA can be multiplied exponentially. Each of the copied DNA pieces is called a clone. [4]
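As a rough check on the "million-fold" figure, the snippet below counts how many rounds of division are needed for a given fold increase. It is a back-of-the-envelope sketch that assumes every bacterium divides in each round and that each division exactly doubles the number of vector copies; the function name `doublings_needed` is made up for this illustration.

```python
def doublings_needed(fold_increase: int) -> int:
    """Count the doublings required to reach at least `fold_increase` copies."""
    copies, divisions = 1, 0
    while copies < fold_increase:
        copies *= 2      # each round of division doubles the copy number
        divisions += 1
    return divisions


print(doublings_needed(1_000_000))  # 2**20 = 1,048,576, so this prints 20
```

Under these idealized assumptions, about 20 rounds of division are enough for a million-fold increase, which is why the text describes the multiplication as exponential.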
Sequencing Reaction
The sequencing reaction itself consists of four steps, which will be covered in detail in this section. First, the double-stranded DNA is separated into single strands, and a small starter piece of DNA called a primer binds to one of the strands, called the template strand.
In the extension step, a new DNA strand is made that is complementary to the template strand. Starting at the primer, DNA polymerase uses the template strand as a guide to recreate the second DNA strand.
The termination step is the key to the sequencing reaction. Strand extension is halted by the incorporation of a dye-labeled terminator nucleotide, which identifies the base at the position where strand extension stopped. When many strand termination reactions are performed together, each of the bases in a DNA strand can be identified. [4]
Strand Separation
Double-stranded DNA needs to be denatured, or separated into single strands, before it can be sequenced. This process is accomplished by heating the DNA, which disrupts the hydrogen bonds and van der Waals forces that hold the two chains of DNA together in a double helix.
Primer Annealing
Next, a small single-stranded DNA piece of about 20 bases, called an oligonucleotide, is annealed to the denatured template strand. This oligonucleotide is needed to prime the next step, DNA extension. In theory, the two DNA strands that were separated in the preceding step could just snap back together. This is avoided by using a rapid cooling process, which gives the small oligonucleotides an advantage over long DNA strands in annealing. In addition, a large excess of primers is used to ensure that the primers will out-compete the complementary DNA strand for annealing to the template.
The oligonucleotide primer must be of complementary sequence to the template strand in order to bind by base-pair interactions.
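The complementarity requirement can be made concrete with a short sketch. It assumes both sequences are written in the usual 5' to 3' direction, so the primer binds wherever the template contains the primer's reverse complement. The names `reverse_complement` and `find_annealing_site` and the example sequences are hypothetical, and real annealing also tolerates some mismatches, which this exact-match search ignores.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_annealing_site(template: str, primer: str) -> int:
    """Return the index in `template` where `primer` can base-pair, or -1.

    Both arguments are written 5' to 3'; only perfect matches are found.
    """
    return template.find(reverse_complement(primer))


template = "GATTACAGGCCTTAAGC"   # made-up template strand
primer = "GCTTAAGG"              # pairs with the last 8 bases of the template
print(find_annealing_site(template, primer))      # index of the binding site
print(find_annealing_site(template, "AAAAAAAA"))  # -1: no complementary site
```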
Primer Extension
During the extension phase, a bacterial DNA polymerase enzyme begins assembling a new DNA chain from the individual nucleotide building blocks, or dNTPs, provided in the reaction mixture.
The nucleotides are added in the order specified by the complementary bases in the template strand. DNA polymerase cannot start copying a template strand without a small piece of DNA to start the extension process. This is why the primer was added in the previous step.
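A minimal sketch of this extension step follows. To keep the indexing simple, it assumes the template is written in the 3' to 5' direction, so the new strand grows left to right in its own 5' to 3' direction. The function name `extend_primer` and the sequences are illustrative assumptions, not part of any sequencing software.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def extend_primer(template_3to5: str, primer: str) -> str:
    """Extend `primer` along the template, adding one complementary base at a time.

    Raises ValueError if the primer does not pair with the start of the
    template, mirroring the fact that polymerase needs a bound primer.
    """
    expected = "".join(PAIR[b] for b in template_3to5[:len(primer)])
    if primer != expected:
        raise ValueError("primer is not complementary to the template")
    strand = primer
    for base in template_3to5[len(primer):]:
        strand += PAIR[base]  # polymerase adds the base that pairs with the template
    return strand


print(extend_primer("TACGGATC", "ATG"))  # prints ATGCCTAG
```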
Chain Termination
The reaction mixture also contains small amounts of each of the four dideoxynucleotides, or ddNTPs, which lack the 3' hydroxyl group necessary for chain extension. Whenever a dideoxynucleotide is incorporated into a growing DNA chain, it terminates chain growth because another nucleotide cannot be attached to it. Each of the four ddNTPs is labeled with a different dye, which can later be detected using a laser.
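The set of products that chain termination can generate is easy to enumerate. The sketch below lists every possible terminated fragment for a short, made-up strand, labeling each with the dideoxynucleotide at its end. The function name `terminated_fragments` and the `"ddX"` label strings are assumptions for illustration; the actual dyes are not modeled.

```python
def terminated_fragments(new_strand: str, primer_len: int) -> list[tuple[str, str]]:
    """List every fragment that chain termination could produce.

    Each fragment is the primer plus one or more added bases, and its last
    base is the labeled dideoxynucleotide that stopped extension.
    """
    fragments = []
    for end in range(primer_len + 1, len(new_strand) + 1):
        fragment = new_strand[:end]
        label = "dd" + fragment[-1]  # terminator identity, e.g. "ddC"
        fragments.append((fragment, label))
    return fragments


# Hypothetical fully extended strand with a 3-base primer.
for fragment, label in terminated_fragments("ATGCCTAG", primer_len=3):
    print(len(fragment), fragment, label)
```

Each position after the primer yields exactly one fragment length, and the label on that fragment names the base at that position.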
The DNA polymerase occasionally incorporates a labeled dideoxynucleotide into a growing DNA strand. This does not happen very often, because the concentration of dideoxynucleotides is much lower than the concentration of dNTPs in the reaction mixture. The ratio of dNTPs to ddNTPs is carefully balanced to get just the right number of chain termination events.
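Why the ratio matters can be seen with a simple probability model. The sketch assumes that at every position the polymerase picks a ddNTP with the same probability `p_dd`, independent of the base and of earlier steps. This is a simplification introduced here, and the value 0.01 is an arbitrary example rather than a figure from a real protocol. Under that assumption, fragment lengths follow a geometric distribution.

```python
def termination_probability(position: int, p_dd: float) -> float:
    """Probability that extension stops exactly at `position` (1-based).

    The chain must survive `position - 1` additions of normal dNTPs and
    then incorporate a ddNTP.
    """
    return (1 - p_dd) ** (position - 1) * p_dd


def expected_extension_length(p_dd: float) -> float:
    """Mean number of bases added before termination (geometric mean 1/p)."""
    return 1 / p_dd


p_dd = 0.01  # assumed chance of picking a ddNTP at any one position
print(termination_probability(1, p_dd))
print(termination_probability(500, p_dd))
print(expected_extension_length(p_dd))
```

In this model, too much ddNTP (large `p_dd`) gives mostly short fragments, while too little leaves few fragments terminated near the primer, which is consistent with the need for a carefully balanced ratio.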
An actual sequencing reaction mixture contains thousands of DNA template strands, which are all being by chance, some annealed
However, other primers will form a long chain of DNA before a ddNTP is incorporated. Thus, there will be a
some very long, and every possible length in between. To further increase the yield of sequenced strands, the sequencing reaction is performed in a thermal cycler, which cycles through the heating and cooling steps dozens of times, in effect repeating the sequencing reaction many times in one experiment.
Capillary Electrophoresis
The newly synthesized DNA strands, each labeled with one of four dyes, are now sorted by length using capillary electrophoresis. First, the reaction mixture is heated to keep the newly synthesized single strands from annealing with the template DNA strand. The dye-labeled single
An electrical current pulls the negatively charged DNA strands through the capillary. This tube is not much thicker than a human hair and is 1 to 3 feet long, sufficient to separate strands that differ in length by only one base. Because of the small dimensions involved, preparation of the capillary and loading of the sample are computer controlled.
strands become tangled in the gel material and take longer to emerge out the bottom.
As the strands emerge out the bottom of the capillary they pass through a laser beam that excites the fluorescent dye attached to the dideoxynucleotide at the end of each strand. This causes the dye to fluoresce, or glow, at a specific wavelength, or color. This color is then detected by a photocell, which feeds the information to the computer. [4]
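The logic of reading a sequence from size-sorted strands can be sketched in a few lines. The example assumes an idealized separation in which strands emerge strictly in order of length and every length is present exactly once; the function name `read_by_length` and the strands are invented for illustration.

```python
def read_by_length(strands: list[str]) -> str:
    """Sort strands shortest first and read the terminal base of each.

    Shorter strands emerge from the capillary first, so reading the end
    label of each strand in order of emergence spells out the sequence.
    """
    ordered = sorted(strands, key=len)
    return "".join(strand[-1] for strand in ordered)


# Terminated strands in arbitrary order, as they would be in the reaction tube.
mixture = ["ATGCCT", "ATGC", "ATGCCTAG", "ATGCC", "ATGCCTA"]
print(read_by_length(mixture))  # prints CCTAG, the bases added after the primer ATG
```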
Computer Analysis
The computer displays the information received from the photocell as an electropherogram, which is a tracing of signal received by the photodetector in each of the four wavelengths. Although the real colors seen by the photodetector are close to green, yellow, orange and red, the
Summary of Procedure
Today, dideoxy sequencing is the method of choice to sequence very long strands of DNA. DNA is purified from the cells of the organism of interest, and placed into cloning vectors, which allow the DNA to be multiplied by bacterial hosts. Each clone is then individually sequenced.
The DNA is denatured and a small DNA oligonucleotide is annealed at one end of the sequence of interest on the template strand. The DNA polymerase extends the oligonucleotide, using the template strand to guide incorporation of nucleotides. Once in a while, a dideoxynucleotide will be incorporated into the growing DNA strand. Because it is missing the 3' hydroxyl group, the dideoxynucleotide will prevent the DNA chain from being extended further. In addition, each dideoxynucleotide has a different color label. Consequently, each terminated DNA chain is colored according to the nucleotide at its end.
When the chains are separated by length by capillary electrophoresis, individual chains of increasing length can be identified by their color. A laser at the bottom of the capillary excites the fluorescent labels as they come out of the capillary. The fluorescence color then tells the computer which base is represented, and the computer records each base, one by one, on a graph called an electropherogram.
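The final step, turning fluorescence colors into bases, can be sketched as picking the strongest color channel at each peak. The color-to-base mapping in `DYE_TO_BASE` is hypothetical (the actual dye assignment depends on the chemistry used and is not given here), as are the function name `call_bases` and the intensity values. Real base-calling software also handles noise, overlapping peaks, and quality scores, none of which is modeled.

```python
# Hypothetical dye assignment; real instruments define their own mapping.
DYE_TO_BASE = {"green": "A", "yellow": "G", "orange": "C", "red": "T"}


def call_bases(peaks: list[dict[str, float]]) -> str:
    """Call one base per peak from four-channel fluorescence intensities."""
    calls = []
    for peak in peaks:
        strongest = max(peak, key=peak.get)  # channel with the highest signal
        calls.append(DYE_TO_BASE[strongest])
    return "".join(calls)


# Made-up intensities for four successive peaks, in order of emergence.
peaks = [
    {"green": 0.90, "yellow": 0.10, "orange": 0.00, "red": 0.05},
    {"green": 0.10, "yellow": 0.00, "orange": 0.80, "red": 0.10},
    {"green": 0.00, "yellow": 0.70, "orange": 0.20, "red": 0.10},
    {"green": 0.05, "yellow": 0.10, "orange": 0.00, "red": 0.95},
]
print(call_bases(peaks))  # prints ACGT under the assumed mapping
```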
CONCLUSION
Overlapping sequence data from many clones is analyzed by powerful computers, which regenerate the full-length sequence by piecing the short sequences together like a puzzle. Entire genomes can be sequenced in this manner. Genome sequencing projects, representing many different organisms, hold the promise of unprecedented advances in industry and medicine. Microbial genomes may encode enzymes that could help make industrial processes more efficient. Human genome sequences are helping us to better understand human
REFERENCES
1. Retrieved from http://ghr.nlm.nih.gov/handbook/basics/dna on September 23, 2011
2. Retrieved from http://seqcore.brcf.med.umich.edu/doc/educ/dnapr/mbglossary/mbgloss.html on September 23, 2011
3. Retrieved from http://www.dnasequencing.com/ on September 23, 2011
4. Retrieved from http://www.wiley.com/college/pratt/0471393878/student/animations/dna_sequencing/index.html on September 23, 2011
5. Retrieved from http://www.ornl.gov/sci/techresources/Human_Genome/faq/seqfacts.shtml on September 23, 2011
6. Retrieved from http://bioweb.uwlax.edu/genweb/molecular/theory/dna_sequencing/dna_sequencing.html on September 23, 2011