DNA sequencing

The term DNA sequencing refers to methods for determining the order of the nucleotide bases (adenine, guanine, cytosine, and thymine) in a molecule of DNA. The first DNA sequences were obtained by academic researchers in the early 1970s, using laborious methods based on 2-dimensional chromatography. Following the development of dye-based sequencing methods with automated analysis, DNA sequencing has become easier and orders of magnitude faster. Knowledge of the DNA sequences of genes and other parts of the genome has become indispensable for basic research into biological processes, as well as in applied fields such as diagnostic or forensic research. The advent of DNA sequencing has significantly accelerated biological research and discovery. The speed attained with modern DNA sequencing technology was instrumental in sequencing the human genome in the Human Genome Project. Related projects, often carried out through scientific collaboration across continents, have generated the complete DNA sequences of many animal, plant, and microbial genomes.

[Figure: DNA sequence trace]

RNA sequencing, which is technically easier to perform than DNA sequencing, was one of the earliest forms of nucleotide sequencing. The major landmark of RNA sequencing is the sequence of the first complete gene and the complete genome of Bacteriophage MS2, identified and published by Walter Fiers and his coworkers at the University of Ghent (Ghent, Belgium), between 1972[1] and 1976.[2]

Prior to the development of rapid DNA sequencing methods in the mid-1970s by Frederick Sanger at the University of Cambridge, in England, and Walter Gilbert and Allan Maxam at Harvard,[3][4] a number of laborious methods were used. For instance, in 1973, Gilbert and Maxam reported the sequence of 24 base pairs using a method known as wandering-spot analysis.[5]

The chain-termination method developed by Sanger and coworkers in 1977 soon became the method of choice, owing to its relative ease and reliability.[6][7]

Maxam-Gilbert sequencing

In 1976-1977, Allan Maxam and Walter Gilbert developed a DNA sequencing method based on chemical modification of DNA and subsequent cleavage at specific bases.[3] Although Maxam and Gilbert published their chemical sequencing method two years after the ground-breaking paper of Sanger and Coulson on plus-minus sequencing,[6][8] Maxam-Gilbert sequencing rapidly became more popular, since purified DNA could be used directly, while the initial Sanger method required that each read start be cloned for production of single-stranded DNA. However, with the improvement of the chain-termination method (see below), Maxam-Gilbert sequencing has fallen out of favour because of its technical complexity, which prevents its use in standard molecular biology kits, its extensive use of hazardous chemicals, and difficulties with scale-up.

The method requires radioactive labelling at one end and purification of the DNA fragment to be sequenced. Chemical treatment generates breaks at a small proportion of one or two of the four nucleotide bases in each of four reactions (G, A+G, C, C+T). Thus a series of labelled fragments is generated, from the radiolabelled end to the first 'cut' site in each molecule. The fragments in the four reactions are arranged side by side in gel electrophoresis for size separation. To visualize the fragments, the gel is exposed to X-ray film for autoradiography, yielding a series of dark bands, each corresponding to a radiolabelled DNA fragment, from which the sequence may be inferred.
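The way a sequence is inferred from the four Maxam-Gilbert lanes can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names are hypothetical, and each band is indexed by the position of the cleaved base counted from the labelled end, rather than by a measured fragment length. A band present in both the G and A+G lanes is read as G, a band in A+G alone as A, a band in both C and C+T as C, and a band in C+T alone as T.

```python
def simulate_lanes(seq):
    """Build idealised lane contents for a known sequence (for demonstration only)."""
    lanes = {"G": set(), "A+G": set(), "C": set(), "C+T": set()}
    for pos, base in enumerate(seq, start=1):
        if base == "G":
            lanes["G"].add(pos)
            lanes["A+G"].add(pos)
        elif base == "A":
            lanes["A+G"].add(pos)
        elif base == "C":
            lanes["C"].add(pos)
            lanes["C+T"].add(pos)
        elif base == "T":
            lanes["C+T"].add(pos)
        else:
            raise ValueError(f"unexpected base {base!r}")
    return lanes


def read_maxam_gilbert(lanes):
    """Infer the sequence from band positions in the G, A+G, C and C+T lanes."""
    positions = sorted(set().union(*lanes.values()))
    seq = []
    for pos in positions:
        if pos in lanes["G"]:        # band in G (and A+G): guanine
            seq.append("G")
        elif pos in lanes["A+G"]:    # band in A+G only: adenine
            seq.append("A")
        elif pos in lanes["C"]:      # band in C (and C+T): cytosine
            seq.append("C")
        else:                        # band in C+T only: thymine
            seq.append("T")
    return "".join(seq)


lanes = simulate_lanes("GATTACA")
print(read_maxam_gilbert(lanes))
```

Real gels are noisier than this idealised model, so the sketch shows only the lane logic, not a practical base caller.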

Also sometimes known as 'chemical sequencing', this method originated in the study of DNA-protein interactions (footprinting), nucleic acid structure and epigenetic modifications to DNA, and within these it still has important applications.

Chain-termination methods

[Figure: Part of a radioactively labelled sequencing gel]

Because the chain-terminator method (or Sanger method, after its developer Frederick Sanger) is more efficient and uses fewer toxic chemicals and lower amounts of radioactivity than the method of Maxam and Gilbert, it rapidly became the method of choice. The key principle of the Sanger method was the use of dideoxynucleotide triphosphates (ddNTPs) as DNA chain terminators.

The classical chain-termination method requires a single-stranded DNA template, a DNA primer, a DNA polymerase, radioactively or fluorescently labeled nucleotides, and modified nucleotides that terminate DNA strand elongation. The DNA sample is divided into four separate sequencing reactions, each containing all four of the standard deoxynucleotides (dATP, dGTP, dCTP and dTTP) and the DNA polymerase. To each reaction is added only one of the four dideoxynucleotides (ddATP, ddGTP, ddCTP, or ddTTP). These are the chain-terminating nucleotides: they lack the 3'-OH group required for the formation of a phosphodiester bond between two nucleotides, so their incorporation terminates DNA strand extension and produces DNA fragments of varying length.

The newly synthesized and labeled DNA fragments are heat denatured and separated by size (with a resolution of just one nucleotide) by gel electrophoresis on a denaturing polyacrylamide-urea gel, with each of the four reactions run in one of four individual lanes (lanes A, T, G, C). The DNA bands are then visualized by autoradiography or UV light, and the DNA sequence can be read directly off the X-ray film or gel image. In the gel image, X-ray film was exposed to the gel, and the dark bands correspond to DNA fragments of different lengths. A dark band in a lane indicates a DNA fragment that is the result of chain termination after incorporation of a dideoxynucleotide (ddATP, ddGTP, ddCTP, or ddTTP). The relative positions of the different bands among the four lanes are then used to read (from bottom to top) the DNA sequence.
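Reading a four-lane gel from bottom to top amounts to sorting the bands by fragment length and noting which lane each band sits in. The following is a minimal sketch of that idea; the function name and the input format (a mapping from lane to the set of fragment lengths observed, counted in bases added beyond the primer) are assumptions made for illustration.

```python
def read_sanger_gel(lanes):
    """Read the newly synthesised strand 5'->3' from four-lane band lengths.

    lanes maps each base ('A', 'C', 'G', 'T') to the set of fragment lengths
    seen in that lane. The shortest fragment migrates farthest, so reading
    bottom to top is the same as reading in order of increasing length.
    """
    base_at_length = {}
    for base, lengths in lanes.items():
        for n in lengths:
            if n in base_at_length:
                raise ValueError(f"two lanes show a band at length {n}")
            base_at_length[n] = base
    return "".join(base_at_length[n] for n in sorted(base_at_length))


gel = {"A": {1, 5}, "C": {2}, "G": {3, 4}, "T": {6}}
print(read_sanger_gel(gel))  # ACGGAT
```

The result is the sequence of the strand synthesized from the primer, which is complementary to the template strand.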

[Figure: DNA fragments are labeled with a radioactive or fluorescent tag on the primer (1), in the new DNA strand with a labeled dNTP, or with a labeled ddNTP.]

Technical variations of chain-termination sequencing include tagging with nucleotides containing radioactive phosphorus for radiolabelling, or using a primer labeled at the 5' end with a fluorescent dye. Dye-primer sequencing facilitates reading in an optical system for faster and more economical analysis and automation. The later development by Leroy Hood and coworkers[9][10] of fluorescently labeled ddNTPs and primers set the stage for automated, high-throughput DNA sequencing.

[Figure: Sequence ladder by radioactive sequencing compared to fluorescent peaks]

Chain-termination methods have greatly simplified DNA sequencing. For example, chain-termination-based kits are commercially available that contain the reagents needed for sequencing, pre-aliquoted and ready to use. Limitations include non-specific binding of the primer to the DNA, affecting accurate read-out of the DNA sequence, and DNA secondary structures affecting the fidelity of the sequence.

Dye-terminator sequencing

[Figure: Capillary electrophoresis]

Dye-terminator sequencing utilizes labelling of the chain terminator ddNTPs, which permits sequencing in a single reaction, rather than four reactions as in the labelled-primer method. In dye-terminator sequencing, each of the four dideoxynucleotide chain terminators is labelled with a fluorescent dye, each with a different fluorescence emission wavelength.

Owing to its greater expediency and speed, dye-terminator sequencing is now the mainstay in automated sequencing. Its limitations include dye effects due to differences in the incorporation of the dye-labelled chain terminators into the DNA fragment, resulting in unequal peak heights and shapes in the electronic DNA sequence trace chromatogram after capillary electrophoresis (see figure). This problem has been addressed with the use of modified DNA polymerase enzyme systems and dyes that minimize incorporation variability, as well as methods for eliminating "dye blobs". The dye-terminator sequencing method, along with automated high-throughput DNA sequence analyzers, is now being used for the vast majority of sequencing projects.

Challenges

Common challenges of DNA sequencing include poor quality in the first 15-40 bases of the sequence and deteriorating quality of sequencing traces after 700-900 bases. Base calling software typically gives an estimate of quality to aid in quality trimming.

In cases where DNA fragments are cloned before sequencing, the resulting sequence may contain parts of the cloning vector. In contrast, PCR-based cloning and emerging sequencing technologies based on pyrosequencing often avoid using cloning vectors.

Automation and sample preparation

[Figure: View of the start of an example dye-terminator read]

Automated DNA-sequencing instruments (DNA sequencers) can sequence up to 384 DNA samples in a single batch (run) in up to 24 runs a day. DNA sequencers carry out capillary electrophoresis for size separation, detection and recording of dye fluorescence, and data output as fluorescent peak trace chromatograms. Sequencing reactions by thermocycling, cleanup and re-suspension in a buffer solution before loading onto the sequencer are performed separately. A number of commercial and non-commercial software packages can trim low-quality DNA traces automatically. These programs score the quality of each peak and remove low-quality base peaks (generally located at the ends of the sequence). The accuracy of such algorithms is below that of visual examination by a human operator, but sufficient for automated processing of large sequence data sets.

Large-scale sequencing strategies

Current methods can directly sequence only relatively short (300-1000 nucleotides long) DNA fragments in a single reaction.[11] The main obstacle to sequencing DNA fragments above this size limit is insufficient power of separation for resolving large DNA fragments that differ in length by only one nucleotide.
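The automatic quality trimming described in this section can be illustrated with a short sketch. It assumes that the base caller supplies one numeric quality score per base (higher is better, for example a Phred-style score) and that a fixed cut-off of 20 is acceptable. Both the function name and the threshold are illustrative assumptions, not the behaviour of any particular trimming program.

```python
def trim_low_quality_ends(bases, quals, min_q=20):
    """Strip bases from both ends of a read until a base with quality >= min_q is met.

    Low-quality bases in the interior of the read are left untouched.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    start, end = 0, len(bases)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return bases[start:end], quals[start:end]


read = "NNACGTACGTNN"
scores = [5, 8, 30, 35, 40, 12, 38, 40, 33, 25, 9, 4]
trimmed, kept_scores = trim_low_quality_ends(read, scores)
print(trimmed)  # ACGTACGT
```

Production tools generally use sliding windows or error-probability sums rather than a single per-base cut-off, so this shows only the basic idea of removing poor-quality ends.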

[Figure: Genomic DNA is fragmented into random pieces and cloned as a bacterial library. DNA from individual bacterial clones is sequenced and the sequence is assembled by using overlapping DNA regions.]

Large-scale sequencing aims at sequencing very long DNA pieces, such as whole chromosomes. Common approaches consist of cutting (with restriction enzymes) or shearing (with mechanical forces) large DNA fragments into shorter DNA fragments. The fragmented DNA is cloned into a DNA vector, and amplified in Escherichia coli. Short DNA fragments purified from individual bacterial colonies are individually sequenced and assembled electronically into one long, contiguous sequence. This method does not require any pre-existing information about the sequence of the DNA and is referred to as de novo sequencing. Gaps in the assembled sequence may be filled by primer walking. The different strategies have different tradeoffs in speed and accuracy; shotgun methods are often used for sequencing large genomes, but their assembly is complex and difficult, particularly with sequence repeats often causing gaps in genome assembly.

New sequencing methods

High-throughput sequencing

The high demand for low-cost sequencing has driven the development of high-throughput sequencing technologies that parallelize the sequencing process, producing thousands or millions of sequences at once.[12][13] High-throughput sequencing technologies are intended to lower the cost of DNA sequencing beyond what is possible with standard dye-terminator methods.

In vitro clonal amplification

Molecular detection methods are not sensitive enough for single molecule sequencing, so most approaches use an in vitro cloning step to amplify individual DNA molecules. Emulsion PCR isolates individual DNA molecules along with primer-coated beads in aqueous droplets within an oil phase. Polymerase chain reaction (PCR) then coats each bead with clonal copies of the DNA molecule, followed by immobilization for later sequencing. Emulsion PCR is used in the methods by Margulies et al. (commercialized by 454 Life Sciences), Shendure and Porreca et al. (also known as "polony sequencing") and SOLiD sequencing (developed by Agencourt, now Applied Biosystems).[14][15][16] Another method for in vitro clonal amplification is bridge PCR, where fragments are amplified upon primers attached to a solid surface. The single-molecule method developed by Stephen Quake's laboratory (later commercialized by Helicos) skips this amplification step, directly fixing DNA molecules to a surface.[17]

Parallelized sequencing

DNA molecules are physically bound to a surface, and sequenced in parallel. Sequencing by synthesis, like dye-termination electrophoretic sequencing, uses a DNA polymerase to determine the base sequence. Reversible terminator methods (used by Illumina and Helicos) use reversible versions of dye-terminators, adding one nucleotide at a time and detecting fluorescence at each position in real time, with repeated removal of the blocking group to allow polymerization of another nucleotide. Pyrosequencing (used by 454) also uses DNA polymerization, adding one nucleotide species at a time and detecting and quantifying the number of nucleotides added to a given location through the light emitted by the release of attached pyrophosphates.[14][18]

Sequencing by ligation

This enzymatic sequencing method uses a DNA ligase to determine the target sequence.[15][16][19] Used in the polony method and in the SOLiD technology, it uses a pool of all possible oligonucleotides of a fixed length, labeled according to the sequenced position. Oligonucleotides are annealed and ligated; the preferential ligation by DNA ligase for matching sequences results in a signal informative of the nucleotide at that position.

Microfluidic Sanger Sequencing

In microfluidic Sanger sequencing the entire thermocycling amplification of DNA fragments, as well as their separation by electrophoresis, is done on a single chip (approximately 10 cm in diameter), thus reducing reagent usage as well as cost. In some instances researchers have shown that they can increase the throughput of conventional sequencing through the use of microchips. Research will still need to be done in order to make this use of technology effective.

Other sequencing technologies

Sequencing by hybridization is a non-enzymatic method that uses a DNA microarray. A single pool of DNA whose sequence is to be determined is fluorescently labeled and hybridized to an array containing known sequences. Strong hybridization signals from a given spot on the array identify its sequence in the DNA being sequenced.[20] Mass spectrometry may be used to determine mass differences between DNA fragments produced in chain-termination reactions.[21]

DNA sequencing methods currently under development include labeling the DNA polymerase,[22] reading the sequence as a DNA strand transits through nanopores,[23][24] and microscopy-based techniques, such as AFM or electron microscopy, that are used to identify the positions of individual nucleotides within long DNA fragments (>5,000 bp) by nucleotide labeling with heavier elements (e.g., halogens) for visual detection and recording.[25]

In October 2006, the X Prize Foundation established an initiative to promote the development of full genome sequencing technologies, called the Archon X Prize, intending to award $10 million to "the first Team that can build a device and use it to sequence 100 human genomes within 10 days or less, with an accuracy of no more than one error in every 100,000 bases sequenced, with sequences accurately covering at least 98% of the genome, and at a recurring cost of no more than $10,000 (US) per genome."[26]

Sanger sequencing

[Figure: Part of a radioactively labelled sequencing gel]

In chain terminator sequencing (Sanger sequencing), extension is initiated at a specific site on the template DNA by using a short oligonucleotide 'primer' complementary to the template at that region. The oligonucleotide primer is extended using a DNA polymerase, an enzyme that replicates DNA. Included with the primer and DNA polymerase are the four deoxynucleotide bases (DNA building blocks), along with a low concentration of a chain terminating nucleotide (most commonly a di-deoxynucleotide). Limited incorporation of the chain terminating nucleotide by the DNA polymerase results in a series of related DNA fragments that are terminated only at positions where that particular nucleotide is used. The fragments are then size-separated by electrophoresis in a slab polyacrylamide gel, or more commonly now, in a narrow glass tube (capillary) filled with a viscous polymer.

[Figure: View of the start of an example dye-terminator read]

An alternative to the labelling of the primer is to label the terminators instead, commonly called 'dye terminator sequencing'. The major advantage of this approach is that the complete sequencing set can be performed in a single reaction, rather than the four needed with the labeled-primer approach. This is accomplished by labelling each of the dideoxynucleotide chain-terminators with a separate fluorescent dye, which fluoresces at a different wavelength. This method is easier and quicker than the dye primer approach, but may produce more uneven data peaks (different heights), due to a template dependent difference in the incorporation of the large dye chain-terminators. This problem has been significantly reduced with the introduction of new enzymes and dyes that minimize incorporation variability. This method is now used for the vast majority of sequencing reactions as it is both simpler and cheaper. The major reason for this is that the primers do not have to be separately labelled (which can be a significant expense for a single-use custom primer), although this is less of a concern with frequently used 'universal' primers.

Sanger Method for DNA Sequencing

DNA sequencing, first devised in 1975, has become a powerful technique in molecular biology, allowing analysis of genes at the nucleotide level. For this reason, this tool has been applied to many areas of research. For example, the polymerase chain reaction (PCR), a method which rapidly produces numerous copies of a desired piece of DNA, requires first knowing the flanking sequences of this piece. Another important use of DNA sequencing is identifying restriction sites in plasmids. Knowing these restriction sites is useful in cloning a foreign gene into the plasmid. Before the advent of DNA sequencing, molecular biologists had to sequence proteins directly; now amino acid sequences can be determined more easily by sequencing a piece of cDNA and finding an open reading frame. In eukaryotic gene expression, sequencing has allowed researchers to identify conserved sequence motifs and determine their importance in the promoter region. Furthermore, a molecular biologist can utilize sequencing to identify the site of a point mutation. These are only a few examples illustrating the way in which DNA sequencing has revolutionized molecular biology.

Dideoxynucleotide sequencing represents only one method of sequencing DNA. It is commonly called Sanger sequencing since Sanger devised the method. This technique utilizes 2',3'-dideoxynucleotide triphosphates (ddNTPs), molecules that differ from deoxynucleotides by having a hydrogen atom attached to the 3' carbon rather than an OH group (Figure 1). These molecules terminate DNA chain elongation because they cannot form a phosphodiester bond with the next deoxynucleotide.

In order to perform the sequencing, one must first convert double stranded DNA into single stranded DNA. This can be done by denaturing the double stranded DNA with NaOH. A Sanger reaction consists of the following: a strand to be sequenced (one of the single strands which was denatured using NaOH), DNA primers (short pieces of DNA that are both complementary to the strand which is to be sequenced and radioactively labelled at the 5' end), a mixture of a particular ddNTP (such as ddATP) with its normal dNTP (dATP in this case), and the other three dNTPs (dCTP, dGTP, and dTTP). The concentration of ddATP should be 1% of the concentration of dATP. The logic behind this ratio is that after DNA polymerase is added, the polymerization will take place and will terminate whenever a ddATP is incorporated into the growing strand. If the ddATP is only 1% of the total concentration of dATP, a whole series of labeled strands will result (Figure 1). Note that the lengths of these strands are dependent on the location of the base relative to the 5' end.

This reaction is performed four times using a different ddNTP for each reaction. When these reactions are completed, a polyacrylamide gel electrophoresis (PAGE) is performed. One reaction is loaded into one lane for a total of four lanes (Figure 2). The gel is transferred to a nitrocellulose filter and autoradiography is performed so that only the bands with the radioactive label on the 5' end will appear. In PAGE, the shortest fragments will migrate the farthest. Therefore, the bottom-most band indicates that its particular dideoxynucleotide was added first to the labeled primer. In Figure 2, for example, the band that migrated the farthest was in the ddATP reaction mixture. Therefore, ddATP must have been added first to the primer, and its complementary base, thymine, must have been the base present on the 3' end of the sequenced strand. One can continue reading in this fashion. Note in Figure 2 that if one reads the bases from the bottom up, one is reading the 5' to 3' sequence of the strand complementary to the sequenced strand. The sequenced strand can be read 5' to 3' by reading top to bottom the bases complementary to those on the gel.

Figure 1. This figure shows the structure of a dideoxynucleotide (notice the H atom attached to the 3' carbon). Also depicted in this figure are the ingredients for a Sanger reaction. Notice the different lengths of labeled strands produced in this reaction.

Figure 2. This figure is a representation of an acrylamide sequencing gel. Notice that the sequence of the strand of DNA complementary to the sequenced strand is 5' to 3' ACGCCCGAGTAGCCCAGATT while the sequence of the sequenced strand, 5' to 3', is AATCTGGGCTACTCGGGCGT.
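The relationship between the two sequences quoted for Figure 2 can be checked with a short reverse-complement sketch. The helper name below is illustrative; the two sequences are the ones given in the caption.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


gel_read = "ACGCCCGAGTAGCCCAGATT"        # read bottom to top (complementary strand, 5'->3')
sequenced = reverse_complement(gel_read)  # the strand that was actually sequenced, 5'->3'
print(sequenced)  # AATCTGGGCTACTCGGGCGT
```

Complementing each base and reversing the order is equivalent to reading the gel from top to bottom while substituting the complementary base for each band.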

Figure 7-29. Sanger (dideoxy) method for sequencing DNA fragments. (a) A single strand of the DNA to be sequenced (blue line) is hybridized to a 5′-end-labeled synthetic deoxyribonucleotide primer. The primer is elongated in four separate reaction mixtures containing the four normal deoxyribonucleoside triphosphates (dNTPs) plus one of the four dideoxyribonucleoside triphosphates (ddNTPs) in a ratio of 100 to 1. A ddNTP molecule can add at the position of the corresponding normal dNTP, but when this occurs, chain elongation stops because the ddNTP lacks a 3′ hydroxyl. In time, each reaction mixture will contain a mixture of prematurely terminated chains ending at every occurrence of the ddNTP (yellow). (b) Three of the labeled chains that would be generated in the presence of ddGTP from the specific DNA sequence shown in blue. (c) An actual autoradiogram of a polyacrylamide gel in which more than 300 bases can be read. Each reaction was carried out in duplicate using Sequenase™, a commercial preparation of the DNA polymerase from bacteriophage T7. [Part (c) courtesy of United States Biochemical Corporation.]

Microfluidic Sanger Sequencing

The completion of the Human Genome Project (HGP) has been a cornerstone in the advancement of biological studies. The outcomes of obtaining a complete reference map (including the sequence) of the human genome have ushered in the post-genome era of studies. Genomics will (if it hasn't already) revolutionize medicine, forensics, molecular biology, biotechnology, and many other related and even unrelated disciplines in the future.[1][2]

Sequencing of DNA has largely been based on dideoxy chain termination developed by Sanger et al.[3] However, the ability of the HGP in obtaining the full human genomic sequence meant that modifications were required to be made to this method. In particular, the incorporation of technological innovation, making sequencing automated and high-throughput, made this decade-long worldwide effort successful.[4] Briefly, in its modern inception, high-throughput genome sequencing (also referred to as Whole Genome Shotgun Sequencing) involves fragmenting the genome into small single-stranded pieces, followed by amplification of the fragments by Polymerase Chain Reaction (PCR). Adopting the Sanger method, each DNA fragment is irreversibly terminated with the incorporation of a fluorescently labeled dideoxy chain-terminating nucleotide, thereby producing a DNA "ladder" of fragments that each differ in length by one base and bear a base-specific fluorescent label at the terminal base. Amplified base ladders are then separated by Capillary Array Electrophoresis (CAE) with automated, in situ "finish-line" detection of the fluorescently labeled ssDNA fragments, which provides an ordered sequence of the fragments. These sequence reads are then computer assembled into overlapping or contiguous sequences (termed "contigs") which resemble the full genomic sequence once fully assembled.[5]

Rapid technological developments have now emerged as a result of the HGP. In particular, Massively Parallel Sequencing (MPS) approaches such as those now in wide commercial use (Illumina/Solexa, Roche/454 Pyrosequencing, and ABI SOLiD) are proving to be attractive tools for sequencing. Typically MPS methods can only obtain short read lengths (35 bp with Illumina platforms to a maximum of 200-300 bp by 454 Pyrosequencing). Sanger methods, on the other hand, achieve read lengths of approximately 800 bp (typically 500-600 bp with non-enriched DNA). The longer read lengths in Sanger methods display significant advantages over MPS tools, especially in terms of sequencing repetitive regions of the genome. The challenge of short-read sequence data is particularly an issue in sequencing new genomes (de novo) and in sequencing highly rearranged genome segments, typically those seen in cancer genomes or in regions of chromosomes that exhibit structural variation.[6]

Microfluidic Sanger Sequencing

Microfluidic Sanger sequencing is a lab-on-a-chip application for DNA sequencing, in which the Sanger sequencing steps (thermal cycling, sample purification, and capillary electrophoresis) are integrated on a wafer-scale chip using nanoliter-scale sample volumes. This technology generates long and accurate sequence reads, while obviating many of the significant shortcomings of the conventional Sanger method (e.g. high consumption of expensive reagents, reliance on expensive equipment, personnel-intensive manipulations, etc.) by integrating and automating the Sanger sequencing steps.

Applications of Microfluidic Sequencing Technologies

Other useful applications of DNA sequencing include single nucleotide polymorphism (SNP) detection, single-strand conformation polymorphism (SSCP) heteroduplex analysis, and short tandem repeat (STR) analysis. Resolving DNA fragments according to differences in size and/or conformation is the most critical step in studying these features of the genome.[5]

Device design

[Figure: A microfluidic sequencing chip developed by Richard Mathies and colleagues (University of California, Berkeley)[7].]

The sequencing chip has a four-layer construction, consisting of three 100-mm-diameter glass wafers (on which device elements are microfabricated) and a polydimethylsiloxane (PDMS) membrane. Reaction chambers and capillary electrophoresis channels are etched between the top two glass wafers, which are thermally bonded. Three-dimensional channel interconnections and microvalves are formed by the PDMS and bottom manifold glass wafer.

The device consists of three functional units, each corresponding to one of the Sanger sequencing steps. The Thermal Cycling (TC) unit comprises a 250-nanoliter reaction chamber with integrated resistive temperature detector, microvalves, and a surface heater. Movement of reagent between the top all-glass layer and the lower glass-PDMS layer occurs through 500-μm-diameter via-holes. After thermal cycling, the reaction mixture undergoes purification in the capture/purification chamber, and then is injected into the capillary electrophoresis (CE) chamber. The CE unit consists of a 30-cm capillary which is folded into a compact switchback pattern via 65-μm-wide turns.

Sequencing chemistry

• Thermal cycling
In the TC reaction chamber, dye-terminator sequencing reagent, template DNA, and primers are loaded and thermal-cycled for 35 cycles (at 95°C for 12 seconds and at 60°C for 55 seconds).

• Purification
The charged reaction mixture (containing extension fragments, template DNA, and excess sequencing reagent) is conducted through a capture/purification chamber at 30°C via a 33-V/cm electric field applied between capture outlet and inlet ports. The capture gel through which the sample is driven consists of 40 μM of oligonucleotide (complementary to the primers) covalently bound to a polyacrylamide matrix. Extension fragments are immobilized by the gel matrix, and excess primer, template, free nucleotides, and salts are eluted through the capture waste port. The capture gel is heated to 67-75°C to release extension fragments.

• Capillary electrophoresis
Extension fragments are injected into the CE chamber where they are electrophoresed through a 125-167-V/cm field.

Platforms

The Apollo 100 platform (Microchip Biotechnologies Inc., Dublin, CA)[8] integrates the first two Sanger sequencing steps (thermal cycling and purification) in a fully automated system. The manufacturer claims that samples are ready for capillary electrophoresis within three hours of the sample and reagents being loaded into the system. The Apollo 100 platform requires sub-microliter volumes of reagents.

Comparisons to other sequencing techniques

The ultimate goal of high-throughput sequencing is to develop systems that are low-cost, and extremely efficient at obtaining extended (longer) read lengths. Longer read lengths from each single electrophoretic separation substantially reduce the cost associated with de novo DNA sequencing.
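The computer assembly of sequence reads into contigs, mentioned earlier in this section, can be illustrated with a toy greedy overlap merge. This is a sketch under strong assumptions (error-free reads, all from the same strand, no repeats) and is not how production assemblers work; the function names and the minimum overlap of three bases are chosen only for illustration.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that is also a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTGA']
```

With short reads, a repeated region longer than the read length produces ambiguous overlaps that a greedy merge cannot resolve, which is one way to see why longer reads simplify de novo assembly.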

The Sanger Method
By Sarah Obenrader
This web page was produced as an assignment for an undergraduate course at Davidson College.

Background Information
DNA sequencing enables us to perform a thorough analysis of DNA because it provides us with the most basic information of all: the sequence of nucleotides. With this knowledge, for example, we can locate regulatory and gene sequences, make comparisons between homologous genes across species and identify mutations. Scientists recognized that this could potentially be a very powerful tool, and so there was competition to create a method that would sequence DNA. Then in 1974, two methods were independently developed by an American team and an English team to do exactly this. The Americans, led by Maxam and Gilbert, used a "chemical cleavage protocol", while the English, led by Sanger, designed a procedure similar to the natural process of DNA replication. Even though both teams shared the 1980 Nobel Prize, Sanger's method became the standard because of its practicality (Speed, 1992).

Sanger's method, which is also referred to as dideoxy sequencing or chain termination, is based on the use of dideoxynucleotides (ddNTPs) in addition to the normal nucleotides (dNTPs) found in DNA. Dideoxynucleotides are essentially the same as nucleotides except they contain a hydrogen atom on the 3' carbon instead of a hydroxyl group (OH). These modified nucleotides, when integrated into a sequence, prevent the addition of further nucleotides. This occurs because a phosphodiester bond cannot form between the dideoxynucleotide and the next incoming nucleotide, and thus the DNA chain is terminated (Speed, 1992).

The Method
Before the DNA can be sequenced, it has to be denatured into single strands using heat. Next a primer is annealed to one of the template strands. This primer is specifically constructed so that its 3' end is located next to the DNA sequence of interest. Either this primer or one of the nucleotides should be radioactively or fluorescently labeled so that the final product can be detected on a gel (Russell, 2002). Once the primer is attached to the DNA, the solution is divided into four tubes labeled "G", "A", "T" and "C". Then reagents are added to these samples as follows:

"G" tube: all four dNTPs, ddGTP and DNA polymerase
"A" tube: all four dNTPs, ddATP and DNA polymerase
"T" tube: all four dNTPs, ddTTP and DNA polymerase
"C" tube: all four dNTPs, ddCTP and DNA polymerase

As shown above, each tube contains a different ddNTP, at about one-hundredth the concentration of the normal precursors (Russell, 2002). As the DNA is synthesized, nucleotides are added on to the growing chain by the DNA polymerase. However, on occasion a dideoxynucleotide is incorporated into the chain in place of a normal nucleotide, which results in a chain-terminating event. For example, if we looked at only the "G" tube, we might find a mixture of the following products:

Figure 1: An example of the potential fragments that could be produced in the "G" tube. The fragments are all different lengths due to the random integration of the ddGTPs (Metzenberg).

The key to this method is that all the reactions start from the same nucleotide and end with a specific base. Thus in a solution where the same chain of DNA is being synthesized over and over again, the new chain will terminate at all positions where the nucleotide has the potential to be added because of the integration of the dideoxynucleotides (Russell, 2002). In this way, bands of all different lengths are produced. Once these reactions are completed, the DNA is once again denatured in preparation for electrophoresis. The contents of each of the four tubes are run in separate lanes on a polyacrylamide gel in order to separate the different sized bands from one another. After the contents have been run across the gel, the gel is then exposed to either UV light or X-ray film, depending on the method used for labeling the DNA.

Figure 2: This is a polyacrylamide gel of the reactions in the "G" tube (the same sequences seen in Figure 1). The blue section indicates the primer, the black section indicates the newly synthesized strand and the red denotes a ddGTP, which terminated the chain (Metzenberg).

As shown in Figure 2, smaller fragments are produced when the ddNTP is added closer to the primer; because these chains are smaller, they migrate faster across the gel. The longer fragments of DNA travel shorter distances than the smaller fragments because of their heavier molecular weight. If all of the reactions from the four tubes are combined on one gel, the actual DNA sequence in the 5' to 3' direction can be determined by reading the banding pattern from the bottom of the gel up. It is important to remember, though, that this sequence is complementary to the template strand from the beginning.

Figure 3: This is an autoradiogram of a dideoxy sequencing gel. The letters over the lanes indicate which dideoxynucleotide was used in the sample being represented by that lane. When you read from the bottom up, you are reading the complementary sequence of the template strand (Metzenberg).

Automated Sequencing
With the many advancements in technology that we have achieved since 1974, it is no surprise that the Sanger method has become outdated. However, the new technology that has emerged to replace this method is based on the same principles as Sanger's method. Automated sequencing has been developed so that more DNA can be sequenced in a shorter period of time. With the automated procedures the reactions are performed in a single tube containing all four ddNTPs, each labeled with a different color dye (Russell, 2002).

Figure 4: In automated sequencing, the oligonucleotide primers can be "end-labeled" with different color dyes, one for each ddNTP. These dyes fluoresce at different wavelengths, which are read via a machine (Metzenberg).

As in Sanger's method, the DNA is separated on a gel, but the fragments are all run in the same lane as opposed to four different ones.

Figure 5: Results of gel electrophoresis for the dye-labeled DNA in automated sequencing. The image on the left shows what the gel looks like if the four reactions are run in different lanes, as opposed to the image on the right, which shows a gel where all the DNA is run in one lane (Metzenberg).

Since the four dyes fluoresce at different wavelengths, a laser then reads the gel to determine the identity of each band according to the wavelengths at which it fluoresces. The results are then depicted in the form of a chromatogram, which is a diagram of colored peaks that correspond to the nucleotide in that location in the sequence (Russell, 2002).
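The four-tube chain-termination reaction and the bottom-up gel reading described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the template sequence is made up, the function names (`synthesize`, `run_tube`, `read_gel`) are hypothetical, and the per-position termination probability of 0.01 simply mirrors the "about one-hundredth" ddNTP concentration mentioned in the text rather than any measured incorporation rate.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesize(template_3to5):
    """New strand, written 5'->3', that the polymerase builds along the template."""
    return "".join(COMPLEMENT[base] for base in template_3to5)

def run_tube(new_strand, dd_base, p_terminate, n_molecules, rng):
    """Return the set of fragment lengths produced in one ddNTP tube."""
    lengths = set()
    for _ in range(n_molecules):
        for position, base in enumerate(new_strand, start=1):
            # A ddNTP can only take the place of its matching normal nucleotide.
            if base == dd_base and rng.random() < p_terminate:
                lengths.add(position)  # chain terminated at this position
                break
        # Molecules that never pick up a ddNTP run to full length and are ignored.
    return lengths

def read_gel(lanes):
    """Read bands from the bottom of the gel (shortest fragment) upward."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)

rng = random.Random(0)        # fixed seed so the run is repeatable
template = "TACGGATCCA"       # hypothetical template, written 3'->5'
new_strand = synthesize(template)
# One tube per dideoxynucleotide: "G", "A", "T" and "C".
lanes = {dd: run_tube(new_strand, dd, 0.01, 5000, rng) for dd in "GATC"}
sequence = read_gel(lanes)
print(sequence)
```

Because every fragment starts at the primer and ends at a known base, sorting the bands by length across the four lanes recovers the newly synthesized strand, which is the complement of the template.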

Figure 6: Results from an automated sequence shown in the form of a chromatogram. The colors represent the four bases: blue is C, green is A, black is G and red is T (Metzenberg).

Sanger Dideoxy Method
This method is based on DNA replication.
Controlled termination at specific sites can be achieved with the use of 2',3'-dideoxy analogs of the four nucleotides.
These analogs lack the 3'-OH group required to form the next phosphodiester bond with the incoming nucleotide.
DNA replication terminates at the site where a dideoxy analog is incorporated.
DNA replication terminated at different sites will produce DNA fragments of variable lengths.
DNA replication is performed in four separate tubes, each containing:
i. Single-stranded DNA to be sequenced
ii. DNA polymerase
iii. Primers
iv. The four dNTPs (dATP, dCTP, dTTP and dGTP)
v. Small amount of one of the four 2',3'-dideoxy analogs (ddATP or ddCTP or ddTTP or ddGTP)
Either the primers or the dNTPs are radiolabeled with 32P.
The amount of dideoxy analog added is small enough (~1% of total dNTP) that termination will occur only occasionally. The correct nucleotide will be inserted sometimes and the dideoxy analog other times. In this way, all possible DNA fragments will be produced.

http://www.mcb.mcgill.ca/~hallett/GEP/Lecture15/Image31.gif

The products of all four reactions will be separated by gel electrophoresis in four separate lanes. Polyacrylamide gels are used to separate fragments containing up to 1000bp. More porous agarose gels are used to resolve mixtures of larger fragments, up to 20kb.
Smaller DNA fragments run faster (towards the positive electrode, as DNA is negatively charged) and appear at the bottom of the gel. The base sequence of the new DNA is read from the autoradiogram of the gel in the 5'→3' direction starting from the smallest fragment.

Fluorescent Detection of Oligonucleotides
Fluorescent detection is a highly effective alternative for visualizing DNA. This method eliminates the use of radioactive reagents and can be readily automated.
Either fluorescent-tagged terminators (dideoxy analogs) or fluorescent-tagged primers can be used. When using fluorescent-tagged terminators, each of the four dideoxynucleotides should carry a tag with a different color. If fluorescent-tagged primer is the choice, the primers in each of the four separate mixtures should carry tags of different colors.
The DNA mixture will be separated by gel electrophoresis. Lasers will be used to activate the fluorescent dideoxy analogs or primers and a detector to distinguish the colors. The last base incorporated into the DNA can be determined from the color of the detected DNA fragment.

Adapted from: http://www.licor.com/bio/Images/IR2Schem.jpg

http://www.dls.ym.edu.tw/ol_biology2/ultranet/FluorDideoxySeq.gif

New Developments
More robust and high-throughput methods are currently being developed to meet the need of whole genome projects, where millions of bases need to be sequenced. One of the growing techniques developed is pyrosequencing.
Pyrosequencing is based on the detection of released pyrophosphate (PPi) during DNA synthesis. The DNA template can be immobilized on a solid phase and the four nucleotides are added in a stepwise fashion, with only one of the four dNTPs in the reaction mixture at a time. If the nucleotide is incorporated, PPi is released. Subsequent addition of ATP sulfurylase converts PPi to ATP, which provides energy for luciferase to generate light. The light is easily detected by a photodiode, photomultiplier tube, or a charge-coupled device (CCD) camera. Because the added nucleotide is known, the sequence of nucleotides can be determined.
Genome Research Vol. 11, Issue 1, 3-11, January 2001
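The stepwise nucleotide addition used in pyrosequencing can be sketched as a short simulation. The sketch below rests on assumptions that are not in the text: the dispensation order "ACGT", the function names `pyrosequence` and `decode`, the example template, and the idealization that the light signal is exactly proportional to the number of nucleotides incorporated in one dispensation.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def pyrosequence(template_3to5, dispensation_order="ACGT", max_cycles=50):
    """Return a flowgram: one (nucleotide, light units) pair per dispensation."""
    # Bases the polymerase must add, in order, to copy the template.
    to_add = [COMPLEMENT[base] for base in template_3to5]
    position = 0
    flowgram = []
    for _ in range(max_cycles):
        for nucleotide in dispensation_order:
            incorporated = 0
            # Only the dispensed nucleotide is present, so extension continues
            # only while the next required base matches it.
            while position < len(to_add) and to_add[position] == nucleotide:
                incorporated += 1   # each incorporation releases one PPi
                position += 1
            flowgram.append((nucleotide, incorporated))
            if position == len(to_add):
                return flowgram
    return flowgram

def decode(flowgram):
    """Because the dispensed nucleotide is known, the signal gives the sequence."""
    return "".join(nucleotide * light for nucleotide, light in flowgram)

flowgram = pyrosequence("TACGGATCCA")   # hypothetical template, written 3'->5'
called = decode(flowgram)
print(flowgram)
print(called)
```

A dispensation with zero light means the nucleotide was not incorporated; a run of identical bases shows up as a single, proportionally stronger signal in this idealized model.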


The methods described in this chapter provide some useful approaches for DNA sequencing of templates produced by PCR. These procedures have been employed successfully for large-scale DNA sequencing of cosmid fragments subcloned in plasmid or M13 vectors, and for sequence analysis of cDNAs cloned in bacteriophage lambda vectors. In addition, the method describing direct sequencing from PEG-precipitated PCR product has been used successfully for analysis of Caenorhabditis elegans genomic and cDNA sequences. It is important to reiterate that for every combination of amplification primer pair and target DNA, there is an optimal method for PCR amplification; the ability to sequence the products of any PCR experiment directly will also vary. A coupled PCR/DNA sequencing method that works well for one experimental system may work quite poorly with others. Hence, a few days or hours spent optimizing PCR amplification conditions and selecting the best DNA sequencing method for the target DNA of interest will be time well spent.

Radiolabeled sequencing gel preparation, loading, and electrophoresis (26,29)
To prepare polyacrylamide gels for DNA sequencing, the appropriate amount of urea is dissolved by heating in water and electrophoresis buffer, the respective amount of deionized acrylamide-bisacrylamide solution is added, and ammonium persulfate and TEMED are added to initiate polymerization. Immediately after the addition of the polymerizing agents, the gel solution is poured between two glass plates, taped together and separated by thin spacers corresponding to the desired thickness of the gel, taking care to avoid and eliminate air bubbles. Prior to taping, these glass plates are cleaned with Alconox detergent and hot water, are rinsed with double distilled water, and dried with a Kimwipe. Typically, the notched glass plate is treated with a silanizing reagent and then rinsed with double distilled water. After pouring, the gel immediately is laid horizontally and a well forming comb is inserted into the gel and held in place by metal clamps. The polyacrylamide gels are allowed to polymerize for at least 30 minutes prior to use. After polymerization, the comb and the tape at the bottom of the gel are removed. The vertical electrophoresis apparatus is assembled by clamping the top and bottom buffer wells onto the gel, and adding running buffer to the buffer chambers. The wells are cleaned by circulating buffer into the wells with a syringe and, immediately prior to the loading of each sample, the urea in each well is suctioned out with a mouth pipette.

Each base-specific sequencing reaction terminated with the short termination mix is loaded using a mouth pipette onto a 0.15 mm X 50 cm X 20 cm, denaturing 5% polyacrylamide gel and electrophoresed for 2.25 hours at 22 mA. The reactions terminated with the long termination mix typically are divided in half and loaded onto two 0.15 mm X 70 cm X 20 cm denaturing 4% polyacrylamide gels. One gel is electrophoresed at 15 mA for 8-9 hours and the other is electrophoresed for 20-24 hours at 15 mA. After electrophoresis, the glass plates are separated and the gel is blotted to Whatman paper, covered with plastic wrap, dried by heating on a Hoefer vacuum gel drier, and exposed to X-ray film. Depending on the intensity of the signal and whether the radiolabel is 32-P or 35-S, exposure times varied from 4 hours to several days. After exposure, the films are developed by processing in developer and fixer solutions, rinsed with water, and air dried. The autoradiogram then is placed on a light-box and the sequence is manually read and the data typed into a computer.

C. Taq-polymerase catalyzed cycle sequencing using fluorescent-labeled dye primers (10,26)
Each base-specific fluorescent-labeled cycle sequencing reaction routinely included approximately 100 or 200 ng Biomek isolated single-stranded DNA for A and C or G and T reactions, respectively. Double-stranded cycle sequencing reactions similarly contained approximately 200 or 400 ng of plasmid DNA, isolated using either the standard alkaline lysis or the diatomaceous earth modified alkaline lysis procedures. All reagents except template DNA are added in one pipetting step from a premix of previously aliquotted stock solutions stored at -20°C (see Appendix B). To prepare the reaction premixes, reaction buffer is combined with the base-specific nucleotide mixes. Prior to use, the base-specific reaction premixes are thawed and combined with diluted Taq DNA polymerase and the individual fluorescent end-labeled universal primers (see Appendix C) to yield the final reaction mixes, that are sufficient for 24 template samples.

Once the above mixes are prepared, four aliquots of single or double-stranded DNA are pipetted into the bottom of each 0.2 ml thin-walled reaction tube, corresponding to the A, C, G, and T reactions, and then an aliquot of the respective reaction mixes is added to the side of each tube. These tubes are part of a 96-tube/retainer set tray in a microtiter plate format, which fits into a Perkin Elmer Cetus Cycler 9600. Strip caps are sealed onto the tube/retainer set and the plate is centrifuged briefly. The plate then is placed in the cycler whose heat block had been preheated to 95°C, and the cycling program immediately started. The cycling protocol consisted of 15-30 cycles of seven temperatures:
95°C denaturation
55°C annealing
72°C extension
95°C denaturation
72°C extension
95°C denaturation, and
72°C extension,
linked to a 4°C final soak file.

At this stage, the reactions frequently are frozen and stored at -20°C for up to several days. Prior to pooling and precipitation, the plate is centrifuged briefly to reclaim condensation. The primer and base-specific reactions are pooled into ethanol, and the DNA is precipitated and dried. These sequencing reactions could be stored for several days at -20°C.

D. Taq-polymerase catalyzed cycle sequencing using fluorescent-labeled dye terminator reactions
One of the major problems in DNA cycle sequencing is that when fluorescent primers (1) are used the reaction conditions are such that the nested fragment set distribution is highly dependent upon the template concentration in the reaction mix. We have recently observed that the nested fragment set distribution for the DNA cycle sequencing reactions using the fluorescent labelled terminators (8) is much less sensitive to DNA concentration than that obtained with the fluorescent labelled primer reactions as described above. In addition, the fluorescent terminator reactions require only one reaction tube per template while the fluorescent labelled primer reactions require one reaction tube for each of the four terminators. This latter point allows the fluorescent labelled terminator reactions to be pipetted easily in a 96 well format. The protocol used, as described below, is easily interfaced with the 96 well template isolation and 96 well reaction clean-up procedures also described herein.

By performing all three of these steps in a 96 well format, the overall procedure is highly reproducible and therefore less error prone.

E. Sequenase[TM] catalyzed sequencing with dye-labeled terminators (29-32)
Single-stranded dye-terminator reactions required approximately 2 µg of phenol extracted M13-based template DNA. The DNA is denatured and the primer annealed by incubating DNA, primer, and buffer at 65°C. After the reaction cooled to room temperature, alpha-thio-deoxynucleotides, fluorescent-labeled dye-terminators, and diluted Sequenase[TM] DNA polymerase are added and the mixture is incubated at 37°C. The reaction is stopped by adding ammonium acetate and ethanol, and the DNA fragments are precipitated and dried. To aid in the removal of unincorporated dye-terminators, the DNA pellet is rinsed twice with ethanol. The dried sequencing reactions could be stored up to several days at -20°C.

Double-stranded dye-terminator reactions required approximately 5 µg of diatomaceous earth modified-alkaline lysis midi-prep purified plasmid DNA. The double-stranded DNA is denatured by incubating the DNA in sodium hydroxide at 65°C, and after incubation, primer is added and the reaction is neutralized by adding an acid-buffer. Reaction buffer, alpha-thio-deoxynucleotides, fluorescent-labeled dye-terminators, and diluted Sequenase[TM] DNA polymerase then are added and the reaction is incubated at 37°C. Ammonium acetate is added to stop the reaction and the DNA fragments similarly are precipitated, rinsed, dried, and stored.

F. Fluorescent-labeled sequencing gel preparation, pre-electrophoresis, sample loading, electrophoresis, data collection, and analysis on the ABI 373A DNA sequencer
Polyacrylamide gels for fluorescent DNA sequencing are prepared as described above except that the gel mix is filtered prior to polymerization. Optically-ground, low fluorescence glass plates are carefully cleaned with hot water, distilled water, and ethanol to remove potential fluorescent contaminants prior to taping. Denaturing 6% polyacrylamide gels are poured into 0.3 mm X 89 cm X 52 cm taped plates and fitted with 36 well forming combs. After polymerization, the tape and the comb are removed from the gel and the outer surfaces of the glass plates are cleaned with hot water, and rinsed with distilled water and ethanol. The gel is assembled into an ABI sequencer, and the plates are checked by laser-scanning. If baseline alterations are observed on the ABI-associated Macintosh computer display, the plates are recleaned. Subsequently, the buffer wells are attached, electrophoresis buffer is added, and the gel is pre-electrophoresed for 10-30 minutes at 30 W.

Prior to sample loading, the pooled and dried reaction products are resuspended in formamide/EDTA loading buffer by vortexing and then heated at 90°C. A sample sheet is created within the ABI data collection software on the Macintosh computer which indicated the number of samples loaded and the fluorescent-labeled mobility file to use for sequence data processing. After cleaning the sample wells with a syringe, the odd-numbered sequencing reactions are loaded into the respective wells using a micropipettor equipped with a flat-tipped gel-loading tip. The gel then is electrophoresed for 5 minutes before the wells are cleaned again and the even-numbered samples are loaded. The filter wheel used for dye-primers and dye-terminators is specified on the ABI 373A CPU, also where electrophoresis conditions are adjusted. Typically electrophoresis and data collection are for 10 hours at 30 W on the ABI 373A that is fitted with a heat-distributing aluminum plate in contact with the outer glass gel plate in the region between the laser stop and the sample loading wells (26).

After data collection, an image file is created by the ABI software which related the fluorescent signal detected to the corresponding scan number. The software then determined the sample lane positions based on the signal intensities. After the lanes are tracked, the cross-section of data for each lane are extracted and processed by baseline subtraction, mobility calculation, spectral deconvolution, and time correction. On the Macintosh computer, the collected data can be viewed in several formats. The overall graphics image of the gel can be displayed to assess the accuracy of lane tracking, and the data from each sample lane can be viewed as either a four-color raw fluorescent signal versus scan number, as a chromatogram of processed sequence data, or as a string of nucleotides. After processing, the sequence data files are transferred to a SPARCstation 2 using NFS Share.

G. Double-stranded sequencing of cDNA clones containing long poly(A) tails using anchored poly(dT) primers

Sequencing double stranded DNA templates has become a common and efficient procedure (10) for rapidly obtaining sequence data while avoiding preparation of single stranded DNA. Double stranded templates of cDNAs containing long poly(A) tracts are difficult to sequence with vector primers which anneal downstream of the poly(A) tail. Sequencing with these primers results in a long poly(T) ladder followed by a sequence which is difficult to read. In an attempt to solve this problem we synthesized three primers which contain (dT)17 and either (dA) or (dC) or (dG) at the 3' end. We reasoned that the presence of these three bases at the 3' end would 'anchor' the primers at the upstream end of the poly(A) tail and allow sequencing of the region immediately upstream of the poly(A) region. Using this protocol, over 300 bp of readable sequence could be obtained. Sequencing of the opposite strand of these cDNAs using insert-specific primers occurred directly upstream of the poly(A) region. The ability to directly obtain sequence immediately upstream from the poly(A) tail of cDNAs should be of particular importance to large scale efforts to generate sequence-tagged sites (STSs) (11) from cDNAs (12,13). We have applied this approach to several other poly(A)-containing cDNA clones with similar results.

H. cDNA sequencing based on PCR and random shotgun cloning
The following is a rapid and efficient method for sequencing cloned cDNAs based on PCR amplification (14), random shotgun cloning (1,3,15), and automated fluorescent sequencing (16). This method was developed in our laboratory because once the sequence of a genomic DNA containing cosmid is obtained and putative exons are predicted, the corresponding cDNAs should be sequenced in a timely manner. However, the presently implemented directed cDNA sequencing strategies, i.e. primer walking (17) and exonuclease III deletion (18), are both time consuming and labor intensive, while the alternative, i.e. randomly shearing the intact plasmid followed by shotgun sequencing (1,3,15), leads to a significant number of clones containing the original cDNA cloning vector rather than the desired cDNA insert.

This is a PCR-based approach where the "universal" forward and/or reverse priming sites were excluded from the resulting PCR product by choosing a primer pair that lay between the usual "universal" forward and reverse priming sites and the multiple cloning sites of the Stratagene Bluescript vector. These two PCR primers, with the sequence 5'-TCGAGGTCGACGGTATCG-3' for the forward or -16bs primer and 5'-GCCGCTCTAGAACTAGTG-3' for the reverse or +19bs primer, now have been used to amplify sufficient quantities of cDNA inserts in the 1.2 to 3.4 kb size range so that the random shotgun sequencing approach described below could be implemented.

Automated Fluorescent DNA Sequencing
Background
Automated fluorescent DNA sequencing using capillary DNA sequencing machines like the ABI/PRISM 3100 Genetic Analyzers in the CGC is based on the use of a different colored fluorescent dye for each of the four DNA bases. Attaching these dyes to the four dideoxynucleotide terminators used in the standard Sanger chain termination DNA sequencing reaction results in all the fragments ending in one of the four fluorescent dye-labeled terminators corresponding to the dideoxy bases. The use of four different dyes allows the sequencing reaction to be performed in a single tube and the resulting fragments to be loaded in a single well, resulting in greater sample capacity per "lane" as compared to what is possible with radioactive labeled fragments. These fragments are then separated on a liquid denaturing gel pumped into each capillary. The fluorescence is detected as the fragments electrophorese through a transparent section of the capillary which runs in front of a CCD camera, while being excited with laser light.

The dyes can also be attached to the primer used to initiate the extension by a DNA polymerase. This method is used less frequently since dye-labeled primers are expensive to manufacture and four separate reactions must be set up for each sequence (they can still be separated in a single lane though). The resulting sequence is, however, generally cleaner than dye-terminator sequencing.

The dye-terminator approach can be used with different types of DNA polymerases. Sequenase (T7) polymerase has been a widely used enzyme for DNA sequencing with radioactive nucleotides because of its high processivity and low error rate. However, using this type of enzyme for fluorescent DNA sequencing is not optimal because the amount of fluorescent DNA produced is low and thus a large amount of template is required to produce good fluorescent signals. The enzyme currently in general use for automated fluorescent DNA sequencing is a variant of Taq DNA polymerase. The thermostability of Taq polymerase allows the sequencing reactions to be carried out like PCR reactions, and the reactions are called "cycle sequencing" reactions. Cycle sequencing reactions are analogous to PCRs except only a single primer is used and only single stranded products are generated. The advantage of using the thermostable Taq polymerase over the use of Sequenase is that multiple rounds of sequencing can be performed without the need to add fresh enzyme. This allows the use of much less template DNA, making this the method of choice in the majority of circumstances. Modifications (point mutations affecting amino acid residues in or near the active site) to the Taq DNA polymerase have enabled it to incorporate the fluorescent dye-labeled terminators more evenly and efficiently, resulting in very even peak heights over a sequence read. Attempts have been made to develop polymerases which have some of the advantageous properties of Sequenase but are thermostable like Taq. One of these efforts is described in more detail at http://www.atp.nist.gov/eao/sp950-2/chapt3-2.htm

Another important innovation was the introduction of dideoxy terminators that use two fluorescent dye labels to take advantage of fluorescence resonance energy transfer (FRET) (i.e. ABI BigDye). Taking advantage of FRET using the dual-labeled terminators has allowed improvement of signal intensity and spectral separation to the point where very small amounts of templates can be used. Fluorescent sequencing methods are now robust enough that sequencing large templates such as BACs, P1s, etc., that used to be impractical have become reasonably easy, and sequencing plasmids and PCR products is trivial.

The picture below is a screen capture from the data collection software. You can see that the cycle sequencing reaction products from 16 wells of a 96-well plate are being electrophoresed simultaneously. The large pane at the bottom of the screen shows the fluorescent DNA bands which have been separated up to this point in each of the 16 capillaries lined up next to each other. Since the 3100 is a 16 capillary machine, it takes 6 runs to process a full plate of samples.

A typical processed electropherogram generated using ABI BigDye v3.1 cycle sequencing chemistry and run on a 3100 is shown at right. The G bases are displayed in black, the A's in green, the C's in blue and the T's in red.

Abstract
DNA sequencing by synthesis (SBS) on a solid surface during polymerase reaction offers a paradigm to decipher DNA sequences. We report here the construction of such a DNA sequencing system using molecular engineering approaches. In this approach, four nucleotides (A, C, G, T) are modified as reversible terminators by attaching a cleavable fluorophore to the base and capping the 3′-OH group with a small chemically reversible moiety so that they are still recognized by DNA polymerase as substrates. We found that an allyl moiety can be used successfully as a linker to tether a fluorophore to 3′-O-allyl-modified nucleotides, forming chemically cleavable fluorescent nucleotide reversible terminators, 3′-O-allyl-dNTPs-allyl-fluorophore, for application in SBS. The fluorophore and the 3′-O-allyl group on a DNA extension product, which is generated by incorporating 3′-O-allyl-dNTPs-allyl-fluorophore in a polymerase reaction, are removed simultaneously in 30 s by Pd-catalyzed deallylation in aqueous buffer solution. This one-step dual-deallylation reaction thus allows the reinitiation of the polymerase reaction and increases the SBS efficiency. DNA templates consisting of homopolymer regions were accurately sequenced by using this class of fluorescent nucleotide analogues on a DNA chip and a four-color fluorescent scanner.
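The incorporate, image, cleave cycle of sequencing by synthesis with reversible terminators can be sketched as a toy model. Everything named here is an assumption made for illustration: the class `ExtensionProduct`, the function `sequence_by_synthesis`, and the dye colors (borrowed from the electropherogram convention above, not from the SBS chemistry itself). The sketch only shows why a 3′ cap that allows exactly one incorporation per cycle makes homopolymer runs readable one base at a time.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hypothetical dye assignment, one color per base.
DYE = {"G": "black", "A": "green", "C": "blue", "T": "red"}
BASE_FOR_COLOR = {color: base for base, color in DYE.items()}

class ExtensionProduct:
    """Growing strand whose 3' end may be capped and dye-labeled."""

    def __init__(self):
        self.bases = []
        self.capped = False
        self.fluorophore = None

    def incorporate(self, base):
        # A capped 3' end cannot be extended, so one cycle adds one base.
        if self.capped:
            raise RuntimeError("3' end is capped; cleave before extending")
        self.bases.append(base)
        self.capped = True
        self.fluorophore = DYE[base]

    def image(self):
        """Scanner step: report the color currently attached."""
        return self.fluorophore

    def cleave(self):
        """Remove the dye and the 3' cap together so extension can resume."""
        self.capped = False
        self.fluorophore = None

def sequence_by_synthesis(template_3to5):
    product = ExtensionProduct()
    calls = []
    for template_base in template_3to5:
        product.incorporate(COMPLEMENT[template_base])
        calls.append(BASE_FOR_COLOR[product.image()])
        product.cleave()
    return "".join(calls)

homopolymer_read = sequence_by_synthesis("TTTTGCA")  # hypothetical template
print(homopolymer_read)
```

Each cycle yields exactly one base call, so a run of four identical bases is read as four separate cycles rather than as one signal whose intensity has to be interpreted.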

Shotgun sequencing
In genetics, shotgun sequencing, also known as shotgun cloning, is a method used for sequencing long DNA strands. It is named by analogy with the rapidly-expanding, quasi-random firing pattern of a shotgun.

Since the chain termination method of DNA sequencing can only be used for fairly short strands (100 to 1000 basepairs), longer sequences must be subdivided into smaller fragments, and subsequently re-assembled to give the overall sequence. Two principal methods are used for this: chromosome walking, which progresses through the entire strand, piece by piece, and shotgun sequencing, which is a faster but more complex process, and uses random fragments.

In shotgun sequencing [1] [2], DNA is broken up randomly into numerous small segments, which are sequenced using the chain termination method to obtain reads. Multiple overlapping reads for the target DNA are obtained by performing several rounds of this fragmentation and sequencing. Computer programs then use the overlapping ends of different reads to assemble them into a continuous sequence [1].

Example
For example, consider the following two rounds of shotgun reads:

Strand                    Sequence
Original                  AGCATGCTGCAGTCATGCTTAGGCTA
First shotgun sequence    AGCATGCTGCAGTCATGCT-------
                          -------------------TAGGCTA
Second shotgun sequence   AGCATG--------------------
                          ------CTGCAGTCATGCTTAGGCTA
Reconstruction            AGCATGCTGCAGTCATGCTTAGGCTA

In this extremely simplified example, none of the reads cover the full length of the original sequence; however, the four reads can be assembled into the original sequence using the overlap of their ends to align and order them. In reality, this process uses enormous amounts of information that are rife with ambiguities and sequencing errors. Assembly of complex genomes is additionally complicated by the great abundance of repetitive sequence, meaning similar short reads could come from completely different parts of the sequence.

Many overlapping reads for each segment of the original DNA are necessary to overcome these difficulties and accurately assemble the sequence. For example, to complete the Human Genome Project, most of the human genome was sequenced at 12X or greater coverage; that is, each base in the final sequence was present, on average, in 12 reads. Even so, current methods have failed to isolate or assemble reliable sequence for approximately 1% of the (euchromatic) human genome.

Whole genome shotgun sequencing
Whole genome shotgun sequencing for small (4000 to 7000 basepair) genomes was already in use in 1979 [1]. Broader application benefited from pairwise end sequencing, known colloquially as double-barrel shotgun sequencing. As sequencing projects began to take on longer and more complicated DNAs, multiple groups began to realize that useful information could be obtained by sequencing both ends of a fragment of DNA.
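The way overlapping read ends are used to rebuild the original sequence in the example can be sketched with a naive greedy merge. This is an illustration only, not how production assemblers work: the function names `overlap` and `greedy_assemble` and the minimum overlap of 3 bases are assumptions, and a greedy strategy like this can misassemble repetitive sequence and ignores sequencing errors entirely.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest end overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Reads fully contained in another read add no new sequence.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no remaining overlaps: the result stays as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads

# The four reads from the two shotgun rounds in the example.
shotgun_reads = [
    "AGCATGCTGCAGTCATGCT",
    "TAGGCTA",
    "AGCATG",
    "CTGCAGTCATGCTTAGGCTA",
]
contigs = greedy_assemble(shotgun_reads)
print(contigs)
```

In this toy case the two short reads are contained in the longer ones, and the remaining two reads share a 13-base overlap, so a single merge recovers the original 26-base sequence.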

Although sequencing both ends of the same fragment and keeping track of the paired data was more cumbersome than sequencing a single end of two distinct fragments, the knowledge that the two sequences were oriented in opposite directions and were about the length of a fragment apart from each other was valuable in reconstructing the sequence of the original target fragment. The first published description of the use of paired ends was in 1990 [3] as part of the sequencing of the human HPRT locus, although the use of paired ends was limited to closing gaps after the application of a traditional shotgun sequencing approach. The first theoretical description of a pure pairwise end sequencing strategy, assuming fragments of constant length, was in 1991 [4]. At the time, there was community consensus that the optimal fragment length for pairwise end sequencing would be three times the sequence read length. In 1995 Roach et al. [5] introduced the innovation of using fragments of varying sizes, and demonstrated that a pure pairwise end-sequencing strategy would be possible on large targets. The strategy was subsequently adopted by The Institute for Genomic Research (TIGR) to sequence the genome of the bacterium Haemophilus influenzae in 1995 [6], and then by Celera Genomics to sequence the Drosophila melanogaster (fruit fly) genome in 2000 [7], and subsequently the human genome.

To apply the strategy, high-molecular-weight DNA is sheared into random fragments, size-selected (usually 2, 10, 50, and 150 kb), and cloned into an appropriate vector. The clones are then sequenced from both ends using the chain termination method, yielding two short sequences. Each sequence is called an end-read or read, and two reads from the same clone are referred to as mate pairs. Since the chain termination method usually can only produce reads between 500 and 1000 bases long, mate pairs will rarely overlap in all but the smallest clones.

The original sequence is reconstructed from the reads using sequence assembly software. First, overlapping reads are collected into longer composite sequences known as contigs. Contigs can be linked together into scaffolds by following connections between mate pairs. The distance between contigs can be inferred from the mate pair positions if the average fragment length of the library is known and has a narrow window of deviation.

Proponents of this approach argue that it is possible to sequence the whole genome at once using large arrays of sequencers, which makes the whole process much more efficient than more traditional approaches. Detractors argue that although the technique quickly sequences large regions of DNA, its ability to correctly link these regions is suspect, particularly for genomes with repeating regions. As sequence assembly programs become more sophisticated and computing power becomes cheaper, it may be possible to overcome this limitation.

Coverage

Coverage is the average number of reads representing a given nucleotide in the reconstructed sequence. It can be calculated from the length of the original genome (G), the number of reads (N), and the average read length (L) as NL / G. For example, a hypothetical genome with 2,000 base pairs reconstructed from 8 reads with an average length of 500 nucleotides will have 2x redundancy. This parameter also enables one to estimate other quantities, such as the percentage of the genome covered by reads (sometimes also called coverage). A high coverage in shotgun sequencing is desired because it can overcome errors in base calling and assembly. The subject of DNA sequencing theory addresses the relationships of such quantities.

Next-Generation Sequencing

Although shotgun sequencing was the most advanced technique for sequencing genomes from about 1995 to 2005, other technologies, called next-generation sequencing, then started surfacing. These technologies produce shorter reads (anywhere from 25 to 500 bp) but many hundreds of thousands or millions of reads in a relatively short time (on the order of a day). This results in high coverage, but the assembly process is much more computationally expensive. These technologies outperform traditional shotgun sequencing in the volume of data produced and in the relatively short time it takes to sequence a whole genome. The major disadvantage is that the accuracies are usually lower, although this is compensated by the high coverage.
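The coverage formula NL / G from the Coverage section can be checked with a short Python sketch. The function names are hypothetical. The second function goes beyond the text: it assumes the idealized model from DNA sequencing theory in which reads land uniformly at random on the genome, so that the expected fraction of bases covered by at least one read is approximately 1 - e^(-c) for coverage c. Real libraries deviate from this model.

```python
import math


def coverage(num_reads: int, avg_read_len: float, genome_len: float) -> float:
    """Average number of reads per base: N * L / G."""
    return num_reads * avg_read_len / genome_len


def expected_fraction_covered(c: float) -> float:
    """Expected fraction of the genome covered by at least one read.

    Assumption (not stated in the text): reads are placed uniformly at
    random, so the number of reads over a base is roughly Poisson with mean c.
    """
    return 1.0 - math.exp(-c)


# The worked example from the text: 8 reads of 500 nt on a 2,000 bp genome.
c = coverage(8, 500, 2000)
print(c)  # -> 2.0
print(round(expected_fraction_covered(c), 3))  # -> 0.865
```

Under this assumed model, 2x redundancy still leaves roughly one base in seven unsequenced, which is one way to see why high coverage is desired.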
