Introduction:
The chemical nature of DNA was established in the 1950s: it is the hereditary material in all living organisms and is written in a simple four-letter code of nucleotides. Since then, sequencing, or "reading", the genetic code has been of increasing interest to scientists. RNA sequencing was one of the earliest forms of nucleotide sequencing. Its major landmark is the sequence of the first complete gene and the complete genome of bacteriophage MS2, identified and published by Walter Fiers.

Prior to the mid-1970s no method existed by which DNA could be directly sequenced. Knowledge about gene and genome organization was based upon studies of prokaryotic organisms, and the primary means of obtaining a DNA sequence was so-called reverse genetics, in which the amino acid sequence of the gene product of interest is back-translated into a nucleotide sequence based upon the appropriate codons. Given the degeneracy of the genetic code, in which most amino acids are specified by more than one codon, this process is ambiguous at best. A number of other laborious methods were also used; for instance, in 1973 Gilbert and Maxam reported the sequence of 24 base pairs using a method known as wandering-spot analysis.

In the mid-1970s two methods were developed for directly sequencing DNA: the Maxam-Gilbert chemical cleavage method, developed by Walter Gilbert and Allan Maxam at Harvard, and the Sanger chain-termination method, developed by Frederick Sanger in England. The chain-termination method, published by Sanger and coworkers in 1977, soon became the method of choice owing to its relative ease and reliability. Technical variations of chain-termination sequencing include tagging with nucleotides containing radioactive phosphorus for radiolabelling, or using a primer labeled at the 5' end with a fluorescent dye.

The high demand for low-cost sequencing has since driven the development of high-throughput sequencing technologies that parallelize the sequencing process, producing thousands or millions of sequences at once. These technologies are intended to lower the cost of DNA sequencing beyond what is possible with standard dye-terminator methods.

Need for gene sequencing:

• Understanding a particular DNA sequence can shed light on a genetic condition and offer hope for the eventual development of treatment.
• An alteration in a DNA sequence can lead to an altered or non-functional protein, and hence to a harmful effect in a plant or animal.
• Simple point mutations can cause altered protein shape and function.
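The last point can be illustrated with a small sketch. A well-known case is the sickle-cell mutation in the beta-globin gene, where a single base change turns a GAG codon (glutamic acid) into GTG (valine). The nine-base fragment, the partial codon table, and the function names below are chosen only for illustration and are not a full gene or a complete genetic code.

```python
# Partial codon table (standard genetic code); only the codons needed here.
CODON_TABLE = {"CCT": "Pro", "GAG": "Glu", "GTG": "Val"}


def translate(dna):
    """Translate a coding-strand DNA string three bases at a time."""
    usable = len(dna) - len(dna) % 3          # ignore any trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TABLE[codon] for codon in codons]


def point_mutate(dna, position, new_base):
    """Return a copy of dna with the base at a 0-based position replaced."""
    return dna[:position] + new_base + dna[position + 1:]


normal = "CCTGAGGAG"                    # illustrative fragment, not a full gene
mutant = point_mutate(normal, 4, "T")   # GAG -> GTG in the second codon

print(translate(normal))   # ['Pro', 'Glu', 'Glu']
print(translate(mutant))   # ['Pro', 'Val', 'Glu']
```

Only one of the nine bases differs, yet the encoded amino acid changes. Sequencing is what allows such a difference to be found.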

Terminology related to sequencing:
DNA: A nucleic acid that carries the genetic information in the body's cells. It is made up of four similar chemicals called bases, abbreviated A, T, C, and G, that are repeated over and over in pairs.
DNA sequencing: Determination of the order of the nucleotide bases (adenine, guanine, cytosine, and thymine) in a molecule of DNA.
Gene: A distinct portion of a cell's DNA that codes for a type of protein or for an RNA chain.
Gene sequencing: A process in which the individual nucleotide bases in an organism's DNA are identified.
Genome: The complete copy of an organism's chromosomal and extrachromosomal genetic instructions.
Genome sequencing: Breaking the whole genome into small pieces, sequencing the pieces, and then reassembling them in the proper order to arrive at the sequence of the whole genome.
Genomics: The sequencing of genomes, the determination of the complete set of proteins encoded by an organism, and the study of how genes and metabolic pathways function in an organism.

Historical facts in DNA sequencing:
1953 Discovery of the structure of the DNA double helix.
1972 Development of recombinant DNA technology, which permits isolation of defined fragments of DNA; prior to this, the only accessible samples for sequencing were from bacteriophage or virus DNA.
1977 The first complete DNA genome to be sequenced is that of bacteriophage φX174.
1977 Allan Maxam and Walter Gilbert publish "DNA sequencing by chemical degradation". Frederick Sanger, independently, publishes "DNA sequencing by enzymatic synthesis".
1980 Frederick Sanger and Walter Gilbert receive the Nobel Prize in Chemistry.
1984 Medical Research Council scientists decipher the complete DNA sequence of the Epstein-Barr virus, 170 kb.
1986 Leroy E. Hood's laboratory at the California Institute of Technology and Smith announce the first semi-automated DNA sequencing machine.
1987 Applied Biosystems markets the first automated sequencing machine, the model ABI 370.
1990 The U.S. National Institutes of Health (NIH) begins large-scale sequencing trials on Mycoplasma capricolum, Escherichia coli, Caenorhabditis elegans, and Saccharomyces cerevisiae (at 75 US cents per base).
1995 Craig Venter, Hamilton Smith, and colleagues at The Institute for Genomic Research (TIGR) publish the first complete genome of a free-living organism, the bacterium Haemophilus influenzae, obtained by the shotgun method.
1995 Richard Mathies et al. publish fluorescence energy transfer dye-based sequencing.
1996 Pal Nyren and his student Mostafa Ronaghi at the Royal Institute of Technology in Stockholm publish their method of pyrosequencing.
1998 Phil Green and Brent Ewing of the University of Washington publish "phred" for sequencer data analysis.
1999 Completion of sequencing of human chromosome 22.
2000 Completion of the rough draft of the human genome.

Different sequencing methods:

Chemical cleavage method:
In 1976–1977, Allan Maxam and Walter Gilbert developed a DNA sequencing method based on chemical modification of DNA and subsequent cleavage at specific bases, taking advantage of a two-step process. It involves piperidine and two chemicals that selectively attack purines and pyrimidines. Purines react with dimethyl sulfate and pyrimidines react with hydrazine in such a way as to break the glycosidic bond between the deoxyribose sugar and the base, displacing the base (Step 1). Piperidine then catalyzes phosphodiester bond cleavage where the base has been displaced (Step 2).

Applying these selective reactions to DNA sequencing involves creating a single-stranded DNA substrate carrying a radioactive label on the 5' end. This labeled substrate is subjected to four separate cleavage reactions, each of which creates a population of labeled cleavage products ending at known nucleotides. The chemicals used for cleavage are:
1) Dimethyl sulfate, which breaks DNA at G
2) Acid (pH 2.0), which breaks DNA at A and G
3) Hydrazine, which breaks DNA at T and C
4) Hydrazine in salt, which breaks DNA at C

The reactions are loaded on high-percentage polyacrylamide gels. To visualize the fragments, the gel is exposed to X-ray film for autoradiography, yielding a series of dark bands, each corresponding to a radiolabelled DNA fragment, from which the sequence may be inferred. Electrophoresis, whether in an acrylamide or an agarose matrix, resolves nucleic acid fragments in inverse order of length; that is, smaller fragments run faster through the gel matrix than larger fragments. The dark autoradiographic bands on the film therefore represent the 5' to 3' DNA sequence when read from bottom to top.

Base calling: Interpreting the banding pattern relative to the four chemical reactions. For example, a band present in both the C only and the C + T reaction lanes is called a C. If the band is present in the C + T reaction lane but not in the C only reaction lane, it is called a T. The same decision process applies to the G only and the G + A reaction lanes.

Chain termination method:
At about the same time as Maxam-Gilbert DNA sequencing was being developed, Fred Sanger was developing an alternative method. Rather than using chemical cleavage reactions, Sanger opted for a method involving dideoxynucleotides. These chain-terminating nucleotides contain a dideoxyribose sugar that lacks the 3'-OH group required for the formation of a phosphodiester bond between two nucleotides, so their incorporation terminates DNA strand extension and results in DNA fragments of varying length. The method therefore requires a single-stranded DNA template, a DNA primer, a DNA polymerase, radioactively or fluorescently labeled nucleotides, and modified nucleotides that terminate DNA strand elongation.

The DNA sample is divided into four separate sequencing reactions, each containing all four of the standard deoxynucleotides (dATP, dGTP, dCTP and dTTP) and the DNA polymerase. To each reaction only one of the four dideoxynucleotides (ddATP, ddGTP, ddCTP, or ddTTP) is added. Provided the ratio of the dideoxynucleotide to the corresponding deoxynucleotide is properly set, each reaction produces a population of fragments of different lengths that all end in the same dideoxynucleotide. The newly synthesized and labeled DNA fragments are heat denatured and separated by size (with a resolution of just one nucleotide) by gel electrophoresis on a denaturing polyacrylamide-urea gel, with each of the four reactions run in one of four individual lanes (lanes A, T, G, C). The DNA bands are then visualized by autoradiography or UV light, and the DNA sequence can be read directly off the X-ray film or gel image. A dark band in a lane indicates a DNA fragment that is the result of chain termination after incorporation of a dideoxynucleotide (ddATP, ddGTP, ddCTP, or ddTTP). The relative positions of the different bands among the four lanes are then used to read the DNA sequence from bottom to top.
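The four-lane readout described above can be sketched in a few lines. This is an idealized model, not a description of any real gel: it assumes that termination yields a band at every position, that the primer anneals immediately 3' of the region shown, and that band position depends only on fragment length. The function names and the example template are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sanger_lanes(template, primer_len=0):
    """Map each ddNTP lane to the fragment lengths it would contain.

    The polymerase builds the reverse complement of the template, and a
    fragment in lane X ends wherever X occurs in that new strand.
    """
    new_strand = reverse_complement(template)
    lanes = {base: [] for base in "ACGT"}
    for i, base in enumerate(new_strand):
        lanes[base].append(primer_len + i + 1)
    return lanes


def read_gel(lanes):
    """Read bands from bottom (shortest fragment) to top (longest)."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


template = "GATTACA"                      # hypothetical template, 5'->3'
lanes = sanger_lanes(template)
read = read_gel(lanes)
print(lanes)                              # lane -> fragment lengths
print(read)                               # sequence of the synthesized strand
print(reverse_complement(read))           # recovers the template
```

The sequence read from the gel is that of the newly synthesized strand, so it has to be reverse-complemented to recover the template.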






Differences between Maxam-Gilbert and Sanger method:
1) Unlike in the Maxam-Gilbert method, each lane is base-specific in Sanger's method.
2) Autoradiography is the same, but base calling is easier.
3) In Sanger's method, the fragments on the gel are the complement of the actual template.
4) A major improvement ushered in by Sanger sequencing was the elimination of some of the dangerous chemicals, such as hydrazine.
5) Efficiency is higher than in the chemical cleavage method; when dealing with nucleic acids, enzymatic processes are more efficient than chemical processes.

Technical variations of chain-termination sequencing include tagging with nucleotides containing radioactive phosphorus for radiolabelling, or using a primer labeled at the 5' end with a fluorescent dye. Dye-primer sequencing facilitates reading in an optical system for faster and more economical analysis and automation.
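Differences 1 and 2 can be made concrete with a sketch of the Maxam-Gilbert base-calling rule. Two of its four lanes (A + G and C + T) are not base-specific, so a call has to combine the lanes in which a band of a given length appears. The gel representation (fragment length mapped to a set of lane names) and the "N" fallback for an inconsistent pattern are assumptions made for this example.

```python
# Lane pattern -> base call, following the rule that a band in both the
# specific lane and the mixed lane is the specific base, while a band in
# the mixed lane alone is the other base of the pair.
CALL_RULES = {
    frozenset({"G", "A+G"}): "G",
    frozenset({"A+G"}): "A",
    frozenset({"C", "C+T"}): "C",
    frozenset({"C+T"}): "T",
}


def call_base(lanes_with_band):
    """Call one base from the set of lanes showing a band at one length."""
    return CALL_RULES.get(frozenset(lanes_with_band), "N")  # N = uncallable


def read_maxam_gilbert(gel):
    """Read a gel (fragment length -> lanes with a band) from bottom to top."""
    return "".join(call_base(gel[length]) for length in sorted(gel))


# Hypothetical banding pattern for a four-base stretch.
gel = {
    1: {"G", "A+G"},
    2: {"A+G"},
    3: {"C+T"},
    4: {"C", "C+T"},
}
print(read_maxam_gilbert(gel))  # GATC
```

In a Sanger gel each band sits in exactly one base-specific lane, so the lookup table collapses to reading off the lane name.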

Cycle sequencing:
Cycle sequencing is a modification of the traditional Sanger sequencing method. The principles are the same as in Sanger sequencing: dideoxynucleotides are used in a polymerization reaction to create a nested set of DNA fragments with a dideoxynucleotide at the 3' terminus of each fragment. The key difference is that cycle sequencing employs a thermostable DNA polymerase, which can be heated to 95 °C and still retain activity. The advantage of using such a polymerase is that the sequencing reaction can be repeated over and over again in the same tube by heating the mixture to denature the DNA and then allowing it to cool so that the primers anneal and new strands are polymerized. Thus, less template DNA is needed than for conventional sequencing reactions. Furthermore, the repeated heating and cooling can be done in a DNA thermal cycler.

Advantages:
• Works with ssDNA and dsDNA, and thus eliminates the need for M13 vectors
• Requires only small amounts of template
• Can be set up in microtitre plates or microtubes
• Can use internal labeling with [α-32P], [α-33P], or [35S], or a 5'-end-labeled primer
• Can be adapted for rapid screening
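The claim that cycle sequencing needs less template follows from simple arithmetic. Because only one primer is used, amplification is linear rather than exponential as in PCR: each template strand can give at most one new terminated fragment per cycle. The sketch below assumes this idealized model; the function names, the efficiency parameter, and the numbers are hypothetical.

```python
def terminated_products(template_molecules, cycles, efficiency=1.0):
    """Estimate labeled fragments produced by linear (single-primer) cycling."""
    return template_molecules * cycles * efficiency


def template_needed(target_products, cycles, efficiency=1.0):
    """Template molecules required to reach a target number of fragments."""
    return target_products / (cycles * efficiency)


# One round of conventional sequencing versus 25 thermal cycles.
print(terminated_products(1000, cycles=1))    # 1000.0
print(terminated_products(1000, cycles=25))   # 25000.0
print(template_needed(25000, cycles=25))      # 1000.0
```

Under these assumptions, 25 cycles give the same signal from one twenty-fifth of the template that a single-round reaction would need.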

High-throughput sequencing:
The high demand for low-cost sequencing has driven the development of high-throughput sequencing technologies that parallelize the sequencing process, producing thousands or millions of sequences at once. The dye-terminator sequencing method, along with automated high-throughput DNA sequence analyzers, is now used for the vast majority of sequencing projects.

Dye-terminator sequencing:
This semi-automated approach labels the chain-terminating ddNTPs, which permits sequencing in a single reaction rather than the four reactions needed in the labelled-primer method. Each of the four dideoxynucleotide chain terminators is labelled with a different fluorescent dye, and each dye fluoresces at a different wavelength. Owing to its greater expediency and speed, dye-terminator sequencing is now the mainstay in automated sequencing.

Automation and sample preparation:
The most dramatic advance in sequencing, and the one that carried DNA sequencing into a high-throughput environment, was the introduction of automated sequencing using fluorescence-labeled dideoxy-terminators. In 1986, Leroy Hood and colleagues reported a DNA sequencing method in which the radioactive labels, autoradiography, and manual base calling were all replaced by fluorescent labels, laser-induced fluorescence detection, and computerized base calling. In their method, the primer was labeled with one of four different fluorescent dyes, and each labeled primer was placed in a separate sequencing reaction with one of the four dideoxynucleotides plus all four deoxynucleotides. Once the reactions were complete, the four reactions were pooled and run together in one lane of a polyacrylamide sequencing gel. A four-color laser-induced fluorescence detector scanned the gel as the reaction fragments migrated past. The fluorescence signature of each fragment was then sent to a computer, where software performed the base calling. This method was commercialized in 1987 by Applied Biosystems.

Automated DNA-sequencing instruments (DNA sequencers) can sequence up to 384 DNA samples in a single batch (run), in up to 24 runs a day. A number of commercial and non-commercial software packages can trim low-quality DNA traces automatically. These programs score the quality of each peak and remove low-quality base peaks (generally located at the ends of the sequence). Base-calling error rates are best estimated by PHRED for slab-gel-based sequencing and by LifeTrace for capillary sequencing.
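Quality scoring and end trimming can be sketched as follows. The sketch uses the Phred-style convention in which a quality score Q corresponds to an estimated error probability P through Q = -10 log10(P), so Q20 means roughly one error in 100 calls. The trimming rule (drop bases from each end until one meets a threshold) is a deliberately simple assumption; real trimming programs use more elaborate criteria, and the function names and example values are hypothetical.

```python
def phred_to_error(q):
    """Convert a Phred-style quality score to an estimated error probability."""
    return 10 ** (-q / 10)


def trim_low_quality(bases, quals, min_q=20):
    """Remove bases from both ends until a base with quality >= min_q is met."""
    start, end = 0, len(bases)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return bases[start:end], quals[start:end]


bases = "ACGTACGT"                          # hypothetical read
quals = [5, 10, 30, 40, 35, 30, 12, 8]      # hypothetical per-base scores
trimmed_bases, trimmed_quals = trim_low_quality(bases, quals)
print(trimmed_bases, trimmed_quals)         # GTAC [30, 40, 35, 30]
print(phred_to_error(20))                   # about 0.01
```

Low-quality bases inside the read are kept in this sketch; only the ends are removed, matching the observation that poor peaks are generally located there.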


Capillary electrophoresis:
In the early 1990s Harold Swerdlow and colleagues reported the use of capillaries to obtain DNA sequences. Capillaries are small, with an inner diameter of about 50 μm, and they dissipate heat very efficiently owing to their high surface-area-to-volume ratio. This means that a capillary-based system can be run at much higher voltages, dramatically lowering run times. Most importantly, capillary systems can be fully automated, which gel-based systems could not: slab-gel dye-terminator sequencing is only semi-automated, with automation limited to base calling.

Capillaries can be flushed out after a run and refilled for the next run without anyone having to touch the capillary (Gupta P K 2009). DNA sequencing reactions can be carried out in a single reaction tube and are ready for loading once the reaction reagents have been filtered out. The sequencing reaction is loaded into the capillary and a constant electrical current is applied. The resolved fragments migrate past an optical window where a laser excites the dye terminators, a detector collects the fluorescence emission wavelengths, and software interprets the emission wavelengths as nucleotides.
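The last step, interpreting emission as nucleotides, can be sketched in its simplest form: for each detected peak, call the base whose dye channel is strongest. The channel order, the intensity values, and the function names are assumptions for illustration. Real base callers also correct for spectral overlap between dyes, dye-dependent mobility shifts, and uneven peak spacing, none of which is modeled here.

```python
# Assumed mapping from detector channel index to base (hypothetical).
CHANNEL_TO_BASE = ("A", "C", "G", "T")


def call_peak(intensities):
    """Call the base whose fluorescence channel is brightest at one peak."""
    strongest = max(range(len(CHANNEL_TO_BASE)), key=lambda i: intensities[i])
    return CHANNEL_TO_BASE[strongest]


def call_trace(peaks):
    """Call a sequence from peaks listed in order of migration past the window."""
    return "".join(call_peak(peak) for peak in peaks)


# Hypothetical four-channel intensities for four successive peaks.
peaks = [
    (900, 50, 30, 20),
    (10, 40, 800, 60),
    (15, 20, 25, 700),
    (30, 850, 40, 10),
]
print(call_trace(peaks))  # AGTC
```

Because shorter fragments reach the optical window first, the order of the peaks in time is already the order of the bases in the synthesized strand.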



Alternative sequencing methods (Primrose and Twyman 2003):

Pyrosequencing:
Pyrosequencing is a method of DNA sequencing based on the "sequencing by synthesis" principle. It differs from Sanger sequencing in that it relies on the detection of pyrophosphate release on nucleotide incorporation, rather than on chain termination with dideoxynucleotides. "Sequencing by synthesis" involves taking a single strand of the DNA to be sequenced and then synthesizing its complementary strand enzymatically. The template DNA is immobilized, and solutions of A, C, G, and T nucleotides are added sequentially, each being removed after its reaction. Inorganic pyrophosphate (PPi) is released as a result of nucleotide incorporation by the polymerase. The released PPi is converted to ATP by ATP sulfurylase, and the ATP provides the energy for luciferase to oxidize luciferin and generate light. Light is produced only when the nucleotide solution complements the first unpaired base of the template. Because the added nucleotide is known, the sequence of the template can be determined.

Sequencing by ligation:
DNA ligase is an enzyme that joins together the ends of DNA molecules. Although commonly represented as joining two pairs of ends at once, as in the ligation of restriction enzyme fragments, ligase can also join the ends on only one of the two strands. Sequencing by ligation relies upon the sensitivity of DNA ligase to base-pairing mismatches. The target molecule to be sequenced is a single strand of unknown DNA sequence, flanked on at least one end by a known sequence. A short "anchor" strand is brought in to bind the known sequence. A mixed pool of probe oligonucleotides, eight or nine bases long, is then brought in; the probes are labeled (typically with fluorescent dyes) according to the position that will be sequenced. These molecules hybridize to the target DNA sequence next to the anchor sequence, and DNA ligase preferentially joins a probe to the anchor when its bases match the unknown DNA sequence. Based on the fluorescence produced by the ligated probe, one can infer the identity of the nucleotide at this position in the unknown sequence.

Sequencing by hybridization:
Sequencing by hybridization is a non-enzymatic method that uses a DNA microarray. A single pool of DNA whose sequence is to be determined is fluorescently labeled and hybridized to an array containing known sequences. A strong hybridization signal from a given spot on the array identifies that spot's sequence in the DNA being sequenced (Lizardi 2008).

Next generation sequencing methods (Hardiman 2008):
• Mass spectrometric sequencing
• Direct visualization of single DNA molecules by atomic force microscopy (AFM)
• Single-molecule real-time (SMRT) sequencing techniques
• Readout of cellular gene expression
• Use of DNA chips or microarrays
• Nanopore sequencing

Nanopore sequencing is based on the electrical perturbations generated by a single strand of DNA as it passes through a pore more than a thousand times smaller than the diameter of a human hair. Physicists used mathematical calculations and computer modeling of the motions and electrical fluctuations of DNA molecules to determine how to distinguish each of the four different bases (A, G, C, T) that constitute a strand of DNA. They based their calculations on a pore about a nanometer in diameter, made in silicon nitride and surrounded by two pairs of tiny gold electrodes. The electrodes would record the electrical current perpendicular to the DNA strand as the DNA passed through the pore. Because each DNA base is structurally and chemically different, each base creates its own distinct electronic signature (Lagerquist 2010).

Some commercial sequencers:

• Roche/454 FLX pyrosequencer - pyrosequencing
• Illumina Genome Analyzer - sequencing by synthesis
• Applied Biosystems SOLiD sequencer - sequencing by ligation
• Helicos HeliScope
• Pacific Biosciences SMRT - zero-mode waveguide principle
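The pyrosequencing principle named for the first instrument in this list can be sketched as a toy simulation. It is a model of the chemistry described under pyrosequencing, not of any instrument's software. It assumes ideal conditions in which the light signal is exactly proportional to the number of nucleotides incorporated in one dispensation, so that a run of identical bases gives a proportionally stronger signal. The function names, the dispensation order, and the example template are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def flowgram(template_3to5, flow_order="ACGT"):
    """Simulate light signals as nucleotides are dispensed one at a time.

    The template is written 3'->5' so that the growing strand (5'->3') is
    its base-by-base complement. Returns (nucleotide, signal) pairs.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3to5)
    if any(base not in flow_order for base in new_strand):
        raise ValueError("flow_order must contain every base to be incorporated")
    signals, pos = [], 0
    while pos < len(new_strand):
        for nucleotide in flow_order:
            incorporated = 0
            # Incorporate as long as the dispensed nucleotide matches.
            while pos < len(new_strand) and new_strand[pos] == nucleotide:
                incorporated += 1
                pos += 1
            signals.append((nucleotide, incorporated))  # 0 means no light
            if pos == len(new_strand):
                break
    return signals


def decode(signals):
    """Rebuild the synthesized strand from the recorded signals."""
    return "".join(nucleotide * count for nucleotide, count in signals)


signals = flowgram("CCTAAG")
print(signals)
print(decode(signals))  # GGATTC
```

Dispensations that give no light (signal 0) are as informative as those that do: they show that the next unpaired template base is not complementary to the nucleotide just added.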

References:
Gupta P K 2009. Cell and Molecular Biology. 3rd edition. Rastogi Publications.
Hardiman G 2008. Ultra-high-throughput sequencing, microarray-based genomic selection and pharmacogenomics. Pharmacogenomics 9 (1): 5-9.
Primrose S B and Twyman 2003. Principles of Genome Analysis and Genomics. 3rd edition. Blackwell Publishing Co.
Lagerquist J 2010. Nanopore-based sequence-specific detection of duplex DNA for genomic profiling. Nano Letters, April 2010 (online journal).
Lizardi P M 2008. A new hybridization-based technique offers advantage in sequencing genomes. Nature Biotechnology 26: 648-650.