Gene Sequencing Methods (Word Document)

Introduction:
Since the discovery in the 1950s of the chemical nature of DNA, namely that it is
written in a simple four-letter code of nucleotides and is the hereditary material in
all living organisms, sequencing, or "reading" the genetic code, has become of
increasing interest to scientists. RNA sequencing was one of the earliest forms of
nucleotide sequencing. The major landmark of RNA sequencing is the sequence of
the first complete gene and the complete genome of Bacteriophage MS2,
identified and published by Walter Fiers. Prior to the mid-1970s no method
existed by which DNA could be directly sequenced. Knowledge about gene and
genome organization was based upon studies of prokaryotic organisms and the
primary means of obtaining DNA sequence was so-called reverse genetics in
which the amino acid sequence of the gene product of interest is back-translated
into a nucleotide sequence based upon the appropriate codons. Given the
degeneracy of the genetic code, this process can be tricky at best. In the mid-
1970s two methods were developed for directly sequencing DNA. These were the
Maxam-Gilbert chemical cleavage method and the Sanger chain-termination
method. Before these rapid methods were developed by Frederick Sanger in England
and by Walter Gilbert and Allan Maxam at Harvard, a number of laborious methods
were used. For instance, in 1973, Gilbert and Maxam reported the sequence of 24
base pairs using a method known as wandering-spot analysis. The chain-termination
method published by Sanger and coworkers in 1977 soon became the method of choice,
owing to its relative ease and reliability. Technical variations of
chain-termination sequencing include tagging with nucleotides containing
radioactive phosphorus for radiolabelling, or using a primer labeled at the 5' end
with a fluorescent dye. The high demand for low-cost sequencing has since driven
the development of high-throughput sequencing technologies that parallelize the
sequencing process, producing thousands or millions of sequences at once.
High-throughput sequencing technologies are intended to lower the cost of DNA
sequencing beyond what is possible with standard dye-terminator methods.
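To make the remark about degeneracy concrete, the following minimal sketch counts how many different nucleotide sequences could encode a given peptide. The codon counts are those of the standard genetic code; the function name and the example peptides are illustrative assumptions, not part of the original text.

```python
# Number of codons that encode each amino acid in the standard genetic code
# (one-letter amino acid symbols; stop codons are not included).
CODONS_PER_AMINO_ACID = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_back_translations(peptide: str) -> int:
    """Return how many distinct DNA sequences could encode the peptide."""
    total = 1
    for amino_acid in peptide.upper():
        total *= CODONS_PER_AMINO_ACID[amino_acid]
    return total


if __name__ == "__main__":
    # Hypothetical four-residue peptide Met-Leu-Ser-Arg.
    print(count_back_translations("MLSR"))
```

Even a short peptide rich in leucine, serine or arginine has hundreds of candidate coding sequences, which is why back-translation alone rarely pins down the true DNA sequence.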
Need for gene sequencing:

Understanding a particular DNA sequence can shed light on a genetic
condition and offer hope for the eventual development of treatment.
An alteration in a DNA sequence can lead to an altered or non-functional
protein, and hence to a harmful effect in a plant or animal.
Simple point mutations can cause altered protein shape and function.
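The effect of a point mutation can be illustrated with a short sketch. The example is modeled on the well-known sickle-cell substitution (GAG to GTG, glutamate to valine); the codon table is deliberately partial, and the function names are assumptions made for illustration.

```python
# Partial codon table (DNA coding strand); only the codons used below.
CODON_TABLE = {
    "CCT": "Pro", "GAG": "Glu", "GAA": "Glu", "GTG": "Val",
    "ACT": "Thr", "AAG": "Lys",
}


def translate(dna: str) -> list:
    """Translate a coding-strand DNA string codon by codon."""
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3)]


def point_mutate(dna: str, position: int, new_base: str) -> str:
    """Return a copy of dna with the base at a 0-based position replaced."""
    return dna[:position] + new_base + dna[position + 1:]


if __name__ == "__main__":
    normal = "CCTGAGGAG"                      # Pro-Glu-Glu
    missense = point_mutate(normal, 4, "T")   # GAG -> GTG
    silent = point_mutate(normal, 5, "A")     # GAG -> GAA
    print(translate(normal), translate(missense), translate(silent))
```

The first substitution changes one amino acid, while the second leaves the protein unchanged because both codons encode glutamate.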
Terminology related to sequencing:
DNA
A nucleic acid that carries the genetic information in the body's cells. It is made up of
four similar chemicals called bases, abbreviated A, T, C, and G, that are
repeated over and over in pairs.
DNA sequencing
Determination of the order of the nucleotide bases (adenine, guanine, cytosine,
and thymine) in a molecule of DNA.
Gene
A gene is a distinct portion of a cell's DNA that codes for a type of protein or for
an RNA chain.
Gene sequencing
Gene sequencing is a process in which the individual base nucleotides in an
organism's DNA are identified.
Genome
Complete copy of chromosomal and extrachromosomal genetic instructions.
Genome sequencing:
Breaking the whole genome into small pieces, sequencing the pieces and then
reassembling them in proper order to arrive at the sequence of the whole genome.
Genomics:
Sequencing of genomes, determination of the complete set of proteins encoded by
an organism and functioning of genes and metabolic pathways in an organism.
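The definition of genome sequencing above (break the genome into pieces, sequence them, reassemble them in order) can be sketched as a toy greedy assembler. This is a simplification under stated assumptions: the reads are error-free, all come from the same strand, and every overlap is unambiguous. The function names, the minimum overlap of 3 bases and the example reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len: int = 3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    reads = ["GTACGTTA", "ATGCGTAC", "GTTAGC"]
    print(greedy_assemble(reads))
```

Real assemblers must also cope with sequencing errors, repeats and reads from both strands, none of which this sketch attempts.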
Historical facts in DNA sequencing:
1953 Discovery of the structure of the DNA double helix.
1972 Development of recombinant DNA technology, which permits
isolation of defined fragments of DNA; prior to this, the only accessible
samples for sequencing were from bacteriophage or virus DNA.
1977 The first complete DNA genome to be sequenced is that of
bacteriophage φX174.
1977 Allan Maxam and Walter Gilbert publish "DNA sequencing by
chemical degradation". Frederick Sanger, independently, publishes "DNA
sequencing by enzymatic synthesis".
1980 Frederick Sanger and Walter Gilbert receive the Nobel Prize in
Chemistry.
1984 Medical Research Council scientists decipher the complete DNA
sequence of the Epstein-Barr virus, 170 kb.
1986 Leroy E. Hood's laboratory at the California Institute of Technology
and Smith announce the first semi-automated DNA sequencing machine.
1987 Applied Biosystems markets the first automated sequencing machine, the
model ABI 370.
1990 The U.S. National Institutes of Health (NIH) begins large-scale
sequencing trials on Mycoplasma capricolum, Escherichia coli,
Caenorhabditis elegans, and Saccharomyces cerevisiae (at 75 cents
(US)/base).
1995 Craig Venter, Hamilton Smith, and colleagues at The Institute for
Genomic Research (TIGR) publish the first complete genome of a free-living
organism, the bacterium Haemophilus influenzae, sequenced by the shotgun
method.
1995 Richard Mathies et al. publish fluorescence energy transfer dye-based
sequencing.
1996 Pål Nyrén and his student Mostafa Ronaghi at the Royal Institute of
Technology in Stockholm publish their method of pyrosequencing.
1998 Phil Green and Brent Ewing of the University of Washington publish
phred for sequencer data analysis.
1999 Completion of sequencing of human chromosome 22.
2000 Completion of a rough draft of the human genome.
Different sequencing methods:
Chemical cleavage method:
In 1976-1977, Allan Maxam and Walter Gilbert developed a DNA
sequencing method based on chemical modification of DNA and subsequent
cleavage at specific bases, taking advantage of a two-step catalytic process. It
involves piperidine and two chemicals that selectively attack purines and
pyrimidines. Purines react with dimethyl sulfate and pyrimidines react
with hydrazine in such a way as to break the glycosidic bond between the
deoxyribose sugar and the base, displacing the base (Step 1). Piperidine then
catalyzes phosphodiester bond cleavage where the base has been displaced (Step 2).
Applying these selective reactions to DNA sequencing involves creating a
single-stranded DNA substrate carrying a radioactive label on the 5' end. This
labeled substrate is subjected to four separate cleavage reactions, each of
which creates a population of labeled cleavage products ending at known
nucleotides. The cleavage chemicals are 1) dimethyl sulfate, which breaks DNA at G,
2) acid (pH 2.0), at A and G, 3) hydrazine, at T and C, and 4) hydrazine in salt,
which breaks DNA at C. The reactions are loaded on high-percentage
polyacrylamide gels. To visualize the fragments, the gel is exposed to X-ray film
for autoradiography, yielding a series of dark bands, each corresponding to a
radiolabelled DNA fragment, from which the sequence may be inferred.
Since electrophoresis, whether in an acrylamide or an agarose matrix,
resolves nucleic acid fragments in the inverse order of length, that is, smaller
fragments run faster in the gel matrix than larger fragments, the dark
autoradiographic bands on the film represent the 5' to 3' DNA sequence when
read from bottom to top.
Base calling: Interpreting the banding pattern relative to the four chemical
reactions. For example, a band in the lanes corresponding to the C only and the C
+ T reactions would be called a C. If the band was present in the C + T reaction
lane but not in the C only reaction lane it would be called a T. The same decision
process would obtain for the G only and the G + A reaction lanes.
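The base-calling rules just described can be written down as a small lookup. The sketch below assumes an idealized gel in which each band position is recorded as the set of lanes showing a band; the lane labels and function names are chosen here for illustration only.

```python
def call_base(lanes_with_band) -> str:
    """Call one base from the set of Maxam-Gilbert lanes that show a band."""
    lanes = frozenset(lanes_with_band)
    if lanes == {"G", "A+G"}:
        return "G"   # band in both the G-only and the A+G lanes
    if lanes == {"A+G"}:
        return "A"   # band in A+G but not in G-only
    if lanes == {"C", "C+T"}:
        return "C"   # band in both the C-only and the C+T lanes
    if lanes == {"C+T"}:
        return "T"   # band in C+T but not in C-only
    return "N"       # pattern not covered by the rules: leave uncalled


def read_gel(bands_bottom_to_top) -> str:
    """Read the 5' to 3' sequence from band patterns listed bottom to top."""
    return "".join(call_base(band) for band in bands_bottom_to_top)


if __name__ == "__main__":
    gel = [{"G", "A+G"}, {"A+G"}, {"C+T"}, {"C", "C+T"}]
    print(read_gel(gel))
```

Returning "N" for any unexpected pattern is a design choice of this sketch; a manual reader would instead re-inspect the film.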
Chain termination method:
At about the same time as Maxam-Gilbert DNA sequencing was being
developed, Fred Sanger was developing an alternative method. Rather than using
chemical cleavage reactions, Sanger opted for a method involving
dideoxynucleotides: chain-terminating nucleotides that lack the 3'-OH group
required for the formation of a phosphodiester bond between two nucleotides.
Their incorporation terminates DNA strand extension and results in DNA
fragments of varying length. The method therefore requires a single-stranded DNA
template, a DNA primer, a DNA polymerase, radioactively or fluorescently labeled
nucleotides, and modified nucleotides that terminate DNA strand elongation.
The DNA sample is divided into four separate sequencing reactions, each
containing all four of the standard deoxynucleotides (dATP, dGTP, dCTP and
dTTP) and the DNA polymerase. To each reaction only one of the four
dideoxynucleotides (ddATP, ddGTP, ddCTP, or ddTTP) is added. Provided the
ratio of the dideoxynucleotide to the corresponding deoxynucleotide is properly
set, each reaction produces a population of fragments all ending in the same
dideoxynucleotide.
The newly synthesized and labeled DNA fragments are heat denatured, and
separated by size (with a resolution of just one nucleotide) by gel electrophoresis
on a denaturing polyacrylamide-urea gel with each of the four reactions run in one
of four individual lanes (lanes A, T, G, C); the DNA bands are then visualized by
autoradiography or UV light, and the DNA sequence can be directly read off the
X-ray film or gel image. A dark band in a lane indicates a DNA fragment that is
the result of chain termination after incorporation of a dideoxynucleotide (ddATP,
ddGTP, ddCTP, or ddTTP). The relative positions of the different bands among
the four lanes are then used to read (from bottom to top) the DNA sequence.
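Reading a four-lane Sanger gel can be sketched as follows. The sketch assumes an idealized gel in which each lane is given as a list of fragment lengths (shorter fragments lie nearer the bottom); the data layout, function names and example values are illustrative assumptions. Because the fragments are copies synthesized on the template, the template itself is the reverse complement of the sequence read.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_sanger_gel(lanes: dict) -> str:
    """Read the synthesized strand 5' to 3' from a four-lane gel.

    lanes maps each base (the ddNTP used in that reaction) to the lengths
    of the fragments seen in that lane.
    """
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, i.e. the template strand 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    gel = {"A": [2, 5], "C": [4], "G": [1, 6], "T": [3]}
    synthesized = read_sanger_gel(gel)
    print(synthesized, reverse_complement(synthesized))
```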
[Figure: sequencing gel with four lanes labelled A, G, T and C]
Differences between Maxam-Gilbert and Sanger method:

1) Unlike in the Maxam-Gilbert method, each lane is base-specific in Sanger's
method.
2) Autoradiography is the same, but base calling is easier.
3) The sequence fragments on the gel are the complement of the actual template
in Sanger's method.
4) A major improvement ushered in by Sanger sequencing was the elimination of
some of the dangerous chemicals, like hydrazine.
5) It is more efficient than the chemical cleavage method. When dealing with nucleic
acids, enzymatic processes are more efficient than chemical processes.
Technical variations of chain-termination sequencing include tagging with
nucleotides containing radioactive phosphorus for radiolabelling, or using a primer
labeled at the 5' end with a fluorescent dye. Dye-primer sequencing facilitates
reading in an optical system for faster and more economical analysis and
automation.
Cycle sequencing
Cycle sequencing is a modification of the traditional Sanger sequencing
method. The principles are the same as in Sanger sequencing: dideoxynucleotides
are used in a polymerization reaction to create a nested set of DNA fragments with
dideoxynucleotides at the 3' terminus of each fragment. The key difference is that
cycle sequencing employs a thermostable DNA polymerase which can be heated
to 95 °C and still retain activity. The advantage of using such a polymerase is that
the sequencing reaction can be repeated over and over again in the same tube by
heating the mixture to denature the DNA and then allowing it to cool to anneal the
primers and polymerize new strands. Thus, less template DNA is needed than
for conventional sequencing reactions. Furthermore, the repeated heating and
cooling can be done in a DNA thermal cycler.
Advantages:
Works with ssDNA and dsDNA and thus eliminates the need for M13
phage
Requires only small amounts of template
Can be set up in microtitre plates or microtubes
Can use internal labeling with [α-32P], [α-33P], or [35S], or with a 5'-end
labeled primer
Can be adapted for rapid screening
High-throughput sequencing
The high demand for low-cost sequencing has driven the development of
high-throughput sequencing technologies that parallelize the sequencing process,
producing thousands or millions of sequences at once. The dye-terminator
sequencing method, along with automated high-throughput DNA sequence
analyzers, is now being used for the vast majority of sequencing projects.
Dye-terminator sequencing
This is a semi-automated system that labels the chain
terminator ddNTPs, which permits sequencing in a single reaction rather than four
reactions as in the labelled-primer method. In dye-terminator sequencing, each of
the four dideoxynucleotide chain terminators is labelled with a fluorescent dye,
each of which emits light at a different wavelength. Owing to
its greater expediency and speed, dye-terminator sequencing is now the mainstay
in automated sequencing.
Automation and sample preparation:
The most dramatic advance in sequencing and the one that carried DNA
sequencing into a high throughput environment was the introduction of automated
sequencing using fluorescence-labeled dideoxy-terminators. In 1986, Leroy Hood
and colleagues reported on a DNA sequencing method in which the radioactive
labels, autoradiography, and manual base calling were all replaced by fluorescent
labels, laser induced fluorescence detection, and computerized base calling. In
their method, the primer was labeled with one of four different fluorescent dyes
and each was placed in a separate sequencing reaction with one of the four
dideoxynucleotides plus all four deoxynucleotides. Once the reactions were
complete, the four reactions were pooled and run together in one lane of a
polyacrylamide sequencing gel. A four-color laser induced fluorescence detector
scanned the gel as the reaction fragments migrated past. The fluorescence
signature of each fragment was then sent to a computer where the software was
trained to perform base calling. This method was commercialized in 1987 by
Applied Biosystems. Automated DNA-sequencing instruments (DNA sequencers)
can sequence up to 384 DNA samples in a single batch (run) in up to 24 runs a
day. A number of commercial and non-commercial software packages can trim
low-quality DNA traces automatically. These programs score the quality of each
peak and remove low-quality base peaks (generally located at the ends of the
sequence). The programs giving the best estimates of base-calling error rates are
PHRED for slab-gel based sequencing and LifeTrace for capillary sequencing.
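The end-trimming step described above can be sketched with Phred-style quality scores, where a score Q corresponds to an estimated error probability P through Q = -10 log10(P). The threshold of 20 (1 error in 100), the function names and the example read are assumptions made for illustration; real trimming programs use more elaborate window-based rules.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to an estimated error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert an estimated error probability to a Phred quality score."""
    return -10 * math.log10(p)


def trim_low_quality_ends(bases: str, quals, min_q: int = 20):
    """Strip bases from both ends until one with quality >= min_q is met."""
    start, end = 0, len(bases)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return bases[start:end], list(quals[start:end])


if __name__ == "__main__":
    read = "ACGTACGT"
    scores = [5, 10, 30, 40, 35, 30, 12, 8]
    print(trim_low_quality_ends(read, scores))
```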
[Figure: chromatogram]
Capillary electrophoresis:
In the early 1990s Harold Swerdlow and colleagues reported the use of
capillaries to obtain DNA sequences. Capillaries are small, with a 50 µm inner
diameter, and they dissipate heat very efficiently due to their high surface area
to volume ratios. This means that a capillary-based system can be run at much
higher voltages, dramatically lowering run times. Most importantly, capillary
systems can be automated, whereas automation is a major limitation of gel-based
systems (dye-terminator sequencing on gels is only semi-automated, and only at the
base-calling step). Capillaries can be flushed out after a run and refilled for the
next run without having to touch the capillary (Gupta P K 2009). DNA sequencing
reactions can be carried out in a single reaction tube and prepared for loading
once the reaction reagents have been filtered out. The sequencing reaction is
loaded into the capillary, a constant electrical current is applied through the
capillary, and the resolved fragments migrate past an optical window where a laser
excites the dye terminator, a detector collects the fluorescence emission
wavelengths, and software interprets the emission wavelengths as nucleotides.
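The final step, software interpreting fluorescence as nucleotides, can be sketched as picking the strongest of four dye channels at each peak. The channel order, the intensity values and the 2:1 dominance rule are hypothetical choices for illustration; commercial base callers are considerably more sophisticated.

```python
# Assumed order of the four dye channels reported for every peak.
CHANNELS = ("A", "C", "G", "T")


def call_peaks(peaks, min_ratio: float = 2.0) -> str:
    """Call one base per peak from four-channel fluorescence intensities.

    A base is called only if its channel is at least min_ratio times
    stronger than the runner-up; otherwise the peak is reported as N.
    """
    calls = []
    for intensities in peaks:
        ranked = sorted(zip(intensities, CHANNELS), reverse=True)
        (top, base), (second, _) = ranked[0], ranked[1]
        if top > 0 and top >= min_ratio * second:
            calls.append(base)
        else:
            calls.append("N")
    return "".join(calls)


if __name__ == "__main__":
    trace = [
        (900, 30, 20, 10),   # clear A
        (15, 40, 850, 25),   # clear G
        (10, 20, 30, 700),   # clear T
        (400, 380, 10, 5),   # A and C too close to call
    ]
    print(call_peaks(trace))
```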
[Figure: capillary electrophoresis schematic showing capillaries, detectors and output signal]
Alternative sequencing methods: (Primrose and Twyman 2003)
Pyrosequencing:
It is a method of DNA sequencing based on the "sequencing by synthesis"
principle. It differs from Sanger sequencing, relying on the detection of
pyrophosphate release on nucleotide incorporation, rather than chain termination
with dideoxynucleotides. "Sequencing by synthesis" involves taking a single
strand of the DNA to be sequenced and then synthesizing its complementary
strand enzymatically. The template DNA is immobile, and solutions of A, C, G,
and T nucleotides are added and removed after the reaction, sequentially.
Inorganic PPi is released as a result of nucleotide incorporation by polymerase.
The released PPi is subsequently converted to ATP by ATP sulfurylase, which
provides the energy to luciferase to oxidize luciferin and generate light. Light is
produced only when the nucleotide solution complements the first unpaired base
of the template. Because the added nucleotide is known, the sequence of the
template can be determined.
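The cycle of nucleotide additions can be simulated in a few lines. The sketch assumes an idealized reaction in which the light signal equals the number of nucleotides incorporated in one dispensation (so a run of identical bases gives a proportionally stronger signal), and it takes the template written in the direction in which the polymerase reads it. The dispensation order, function names and example template are illustrative assumptions.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def pyrogram(template: str, dispensation_order: str = "ACGT", max_cycles: int = 50):
    """Simulate light signals for sequential nucleotide dispensations.

    Returns a list of (nucleotide, signal) pairs, where signal is the
    number of nucleotides incorporated during that dispensation.
    """
    pos = 0
    signals = []
    for _ in range(max_cycles):
        for nucleotide in dispensation_order:
            if pos >= len(template):
                return signals  # template fully copied
            incorporated = 0
            while pos < len(template) and COMPLEMENT[template[pos]] == nucleotide:
                incorporated += 1
                pos += 1
            signals.append((nucleotide, incorporated))
    return signals


def decode(signals) -> str:
    """Rebuild the synthesized strand from the recorded signals."""
    return "".join(nucleotide * count for nucleotide, count in signals)


if __name__ == "__main__":
    signals = pyrogram("TTGCA")
    print(signals, decode(signals))
```

Dispensations with a signal of zero correspond to nucleotide solutions that did not complement the next unpaired template base and therefore produced no light.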
Sequencing by ligation
DNA ligase is an enzyme that joins together ends of DNA molecules.
Although commonly represented as joining two pairs of ends at once, as in the
ligation of restriction enzyme fragments, ligase can also join the ends on only one
of the two strands. Sequencing by ligation relies upon the sensitivity of DNA
ligase to base-pairing mismatches. The target molecule to be sequenced is a
single strand of unknown DNA sequence, flanked on at least one end by a known
sequence. A short "anchor" strand is brought in to bind the known sequence. A
mixed pool of probe oligonucleotides (eight or nine bases long) is then brought in,
labeled (typically with fluorescent dyes) according to the position that will be
sequenced. These molecules hybridize to the target DNA sequence, next to the
anchor sequence, and DNA ligase preferentially joins the molecule to the anchor
when its bases match the unknown DNA sequence. Based on the fluorescence
produced by the molecule, one can infer the identity of the nucleotide at this
position in the unknown sequence.
Sequencing by hybridization
It is a non-enzymatic method that uses a DNA microarray. A single pool of
DNA whose sequence is to be determined is fluorescently labeled and hybridized
to an array containing known sequences. A strong hybridization signal from a given
spot on the array identifies its sequence in the DNA being sequenced (Lizardi
2008).
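One way to picture how the array signals lead back to a sequence is the sketch below. It assumes an idealized array carrying every probe of length k, perfect hybridization (a spot lights up exactly when its sequence occurs in the target), and probes written as the target sequence they detect rather than as its complement. Reconstruction by chaining probes that overlap by k-1 bases is a common textbook simplification, not a method stated in the text above; all names are hypothetical.

```python
def spectrum(target: str, k: int) -> set:
    """Set of length-k probes (k >= 2) that would give a signal for target."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def reconstruct(kmers, k: int):
    """Chain probes by (k-1)-base overlaps; return None if ambiguous."""
    kmers = set(kmers)
    suffixes = {m[1:] for m in kmers}
    # The first probe is the only one whose prefix is no other probe's suffix.
    starts = [m for m in kmers if m[:-1] not in suffixes]
    if len(starts) != 1:
        return None
    seq = starts[0]
    remaining = kmers - {seq}
    while remaining:
        candidates = [m for m in remaining if m[:-1] == seq[-(k - 1):]]
        if len(candidates) != 1:
            return None  # no unique extension: the spectrum is ambiguous
        seq += candidates[0][-1]
        remaining.remove(candidates[0])
    return seq


if __name__ == "__main__":
    signals = spectrum("ATGCGTAC", 3)
    print(sorted(signals), reconstruct(signals, 3))
```

Repetitive targets produce the same probe at several positions, so the chain is no longer unique; the sketch reports such cases as None instead of guessing.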
Next generation sequencing methods (Hardiman 2008)
Mass spectrometric sequencing
Direct visualization of single DNA molecules by atomic force
microscopy (AFM)
Single-molecule real-time (SMRT) sequencing techniques
Readout of cellular gene expression
Use of DNA chips or microarrays
Nanopore sequencing
Nanopore sequencing is based on the electrical perturbations generated by a
single strand of DNA as it passes through a pore more than a thousand times
smaller than the diameter of a human hair. Physicists have used mathematical
calculations and computer modeling of the motions and electrical fluctuations
of DNA molecules to determine how to distinguish each of the four different
bases (A, G, C, T) that constitute a strand of DNA. They based their
calculations on a pore about a nanometer in diameter made in silicon nitride,
surrounded by two pairs of tiny gold electrodes. The electrodes would record
the electrical current perpendicular to the DNA strand as the DNA passed
through the pore. Because each DNA base is structurally and chemically
different, each base creates its own distinct electronic signature (Lagerquist
2010).
Some commercial sequencers

Roche/454 FLX pyrosequencer - pyrosequencing
Illumina Genome Analyzer - sequencing by synthesis
Applied Biosystems SOLiD sequencer - sequencing by ligation
Helicos HeliScope
Pacific Biosciences SMRT - zero-mode waveguide principle
References:
Gupta P K 2009 Cell and Molecular Biology. 3rd edition. Rastogi Publications.
Hardiman G 2008 Ultra-high-throughput sequencing, microarray-based genomic
selection and pharmacogenomics. Pharmacogenomics 9 (1): 5-9.
http://www.integratedDNAtechnologies.com
http://www.appliedbiosystems.com
http://www.biostudio.com
http://www.biologyanimations.com
Primrose S B and Twyman 2003 Principles of Genome Analysis and Genomics. 3rd
edition. Blackwell Publishing Co.
Lagerquist J 2010 Nanopore-based sequence-specific detection of duplex DNA for
genomic profiling. Nano Letters, April 2010 (online journal).
Lizardi P M 2008 A new hybridization-based technique offers advantage in sequencing
genomes. Nature Biotechnology 26: 648-650.
