
Almost immediately after the structure of DNA was elucidated by Watson and Crick, the mechanism by
which genetic information was maintained within a cell and used to create proteins became apparent.
This mechanism has become known as the "Central Dogma of Molecular Biology".  The Central Dogma
has three main parts:

1. Genetic information is preserved and transmitted to new cells and offspring by a duplication process
called replication.  Replication takes place before cell division (mitosis, reviewed above), so that each
daughter cell receives a complete copy.

2. Genetic information stored in the nucleus is made available to the rest of the cell by the creation of
numerous temporary copies known as messenger RNA (mRNA) through a process known
as transcription.  mRNA is similar to DNA in that it consists of a long, specific sequence of nucleotides.  It
differs in that it is single-stranded, contains the sugar ribose rather than deoxyribose in its backbone,
and utilizes the base uracil in place of thymine.  Transcription is a major part of gene expression,
the "turning on" of genes in appropriate cells at appropriate times.

3.  In the cytoplasm, ribosomes construct specific proteins by interpreting the sequence of bases in
mRNA.  This process is known as translation.  The genetic code which allows ribosomes to assemble the
correct amino acids in the correct order is the subject of the following section. 
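
The base substitution described in part 2 can be sketched in a few lines of Python.  This is only an
illustration: the function name transcribe is hypothetical, and the sketch assumes the input is the
sense (coding) strand, so that the mRNA has the same sequence with U in place of T.

```python
def transcribe(sense_strand):
    """Return the mRNA sequence corresponding to a DNA sense strand.

    Assumption: the input is the sense (coding) strand, so the mRNA
    reads the same except that uracil (U) replaces thymine (T).
    """
    return sense_strand.upper().replace("T", "U")


mrna = transcribe("ATGGGCTAA")
print(mrna)  # AUGGGCUAA
```

A real cell builds the mRNA by pairing bases against the complementary template strand; the string
replacement above only reproduces the end result.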

Since proteins are the structural core of the cell and since proteins (in the form of enzymes) control
nearly all of the cell's metabolism, the ability to specify protein structure makes DNA the primary
determinant of the structure and function of cells.  The Central Dogma is a major organizing principle in
molecular biology and the organization of DNA in cells and genes cannot be fully understood except in
its context.

The genetic code


Table 1 Universal codon table.  One-letter amino acid abbreviations follow names

One of the most important discoveries in biology was the means by which a DNA sequence specified the
sequence of amino acids in a protein.  Through experimentation, it was found that consecutive groups of
three nucleotides, known as codons, determined the particular amino acids that would occur
sequentially in a polypeptide.  The relationship between particular codons and particular amino acids
was found to be the same for nearly all living organisms, and this relationship has become known as
the genetic code (Table 1).  In the table, all codons within a section of the table code for the listed amino
acid (e.g. GGU, GGC, GGA, and GGG all code for glycine).  Also note that the codons in this table are
specified for mRNA (the intermediary between DNA and protein production).  To determine the codon
as specified in the original DNA, simply substitute T (thymine) for U (uracil). 
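
As a rough sketch of how such a table can be used, the Python below builds the standard codon table
and reads an mRNA three bases at a time.  The names CODON_TABLE, translate and dna_codon are
hypothetical, the one-letter string is the commonly used layout of the standard code with bases
ordered U, C, A, G, and "*" marks a stop codon; the example assumes the standard code applies.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acids for the 64 codons in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA from its first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def dna_codon(mrna_codon):
    """Give the DNA form of an mRNA codon by substituting T for U."""
    return mrna_codon.replace("U", "T")


print(translate("AUGGGUGGCUAA"))  # MGG
print(dna_codon("GGU"))           # GGT
```

The four glycine codons named in the text all map to "G" in this table, and AUG maps to "M"
(methionine).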

Reading frames
Fig. 15  Detailed diagram of the organization of the beginning of a gene (strand direction descriptors
follow the NCBI convention).  Note that "positive" and "negative" strand designations are relative to the
p arm of the chromosome, while the "sense" strand designation is relative to the orientation of a
particular gene.

A particular strand of DNA could be divided into codons three different ways depending on the starting
position from which the nucleotide triples are demarcated.  Since genes can be coded for on either of
the complementary strands, a double-stranded piece of DNA can thus have a total of six different frames
of reference for demarcating codons.  Each of these six is called a reading frame. 
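
The six reading frames can be listed explicitly with a short Python sketch.  The function names and
the "+1" to "-3" labels are hypothetical; here "+" simply means the strand as given and "-" its
reverse complement, which is an assumption of the example rather than the chromosome-based
convention of Fig. 15.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, read in its own 5' to 3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def codons(strand, offset):
    """Split a strand into complete triplets starting at the given offset."""
    return [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]


def six_frames(dna):
    """Return the codons of all six reading frames, keyed by frame label."""
    frames = {}
    for label, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{label}{offset + 1}"] = codons(strand, offset)
    return frames


for name, triplets in six_frames("ATGAAACCCGGG").items():
    print(name, " ".join(triplets))
```

Incomplete triplets at the end of a frame are dropped, so frames two and three of each strand hold
one codon fewer than frame one in this short example.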

Nearly all DNA sequences coding for proteins begin with the same codon, ATG, which codes for the amino acid
methionine.  This codon is also known as the start codon.  In Fig. 15, the ATG sequence on the negative
strand demarcates the beginning of a coding sequence.  (Since the gene in the example is coded on the
negative strand, for that gene the negative strand is the sense strand.)  The reading frame which
contains that methionine is therefore the one correct reading frame (out of the possible six), which
codes for the protein.  The other five reading frames are essentially random gibberish.  Methionine will
therefore appear at the beginning of these coding sequences, since ATG is the codon that signals the start
of protein coding, but not all ATG codons in a sequence are start codons.  In addition to its role in the
initiation of translation, methionine is also simply an amino acid like the other 19 amino acids.

Because of their random nature, the other five possible reading frames derived from a DNA sequence
will often contain stop codons that are generated by chance.  If those reading frames actually coded for
proteins, the stop codons would indicate to the ribosomes that they should stop adding amino acids to
the polypeptide. Since it would not make sense biologically for translation to stop after so short a time,
the presence of many stop codons can serve as an indication that a particular reading frame does not
legitimately code for a protein sequence.  In a DNA sequence composed of a random series of
nucleotides, stop codons should occur by chance on average every 21 codons (4^3 = 64 possible combinations
of three nucleotides divided by the three possible stop codons).  Therefore it is unlikely that any reading
frame would continue for a distance of much longer than 21 codons without being interrupted by a stop
codon unless it actually coded for a protein.  A segment of DNA that contains a reading frame in which a
long sequence is not interrupted by any stop codon is called an open reading frame (ORF).  Therefore,
the first step in searching a new genome sequence for unknown protein-coding genes is usually to
identify ORFs.  Note: In order for a segment of DNA to be considered an ORF, the stretch of DNA lacking
a stop codon must be much longer than what would be likely to occur by chance in a random sequence
of nucleotides (i.e. many more than 21 codons, which means hundreds of nucleotides).
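
The arithmetic and the search described above can be sketched in Python as follows.  This is a
simplified illustration, not a production gene finder: the names find_orfs and min_codons are
hypothetical, the default threshold of 100 codons is an arbitrary assumption, each ORF is taken to
run from the first ATG in a frame to the next in-frame stop codon, and a run that reaches the end
of the sequence without a stop codon is ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expected_codons_between_stops():
    """64 possible codons divided by 3 stop codons, about 21."""
    return 4 ** 3 / len(STOP_CODONS)


def find_orfs(dna, min_codons=100):
    """Scan all six frames for ATG...stop runs of at least min_codons codons.

    Returns tuples (strand, frame, start, end, n_codons); start and end are
    0-based positions on the scanned strand, and n_codons excludes the stop.
    """
    dna = dna.upper()
    orfs = []
    strands = (("+", dna), ("-", dna.translate(COMPLEMENT)[::-1]))
    for label, strand in strands:
        for offset in range(3):
            start = None  # position of the ATG that opened the current run
            for i in range(offset, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((label, offset + 1, start, i + 3, n_codons))
                    start = None
    return orfs


print(round(expected_codons_between_stops()))  # 21
toy = "CC" + "ATG" + "GCA" * 5 + "TAA" + "G"
print(find_orfs(toy, min_codons=5))
```

The toy sequence is far too short to count as a real ORF; the low threshold is used only so that
the example produces a result.  With the default threshold the same call returns an empty list.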

You should be aware that there are genes that do not code for proteins.  Most notably, these are genes
that code for non-messenger RNAs (e.g. transfer RNA, or tRNA, and ribosomal RNA, or rRNA).  In the case of
RNA-coding genes, the RNA transcript is the final structural product and is not subsequently translated.
Thus tRNA and rRNA genes do not contain codons, nor does the term reading frame have any relevance
to them. 

Gene Structure

The genetic structure described in the previous section is relatively simple.  However, eukaryotic genes
have a relatively complicated structure that includes additional components beyond the coding
sequence.  A transcribed but untranslated stretch of mRNA preceding the start codon is necessary for the
attachment of the ribosomes that translate the mRNA.  Untranslated mRNA is also present after the stop
codon.  In addition to these flanking regions, untranslated sequences are present within the portion of the
gene that contains the coding sequence.  These regions, which in some cases may be quite long, are known
as introns, and the coding segments are called exons (Fig. 16).  Before translation, introns are spliced
out of the sequence.  The joined exons are then used by the ribosomes in the translation process.  The
precise purpose of introns is not fully understood, although they allow for the formation of variant forms of
a gene's product through a process called "alternative splicing".  It is known, however, that their
sequence generally evolves at a much faster rate than that of exons because most mutations in introns have
no effect on the amino acid sequence of the protein coded for by the gene.
Fig. 16 Production of messenger RNA through transcription and RNA splicing
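
The removal of introns can be illustrated with a minimal Python sketch.  The function name splice,
the toy sequence and the exon coordinates are all hypothetical; exon positions are assumed to be
known in advance and are given as 0-based, end-exclusive pairs, whereas a real cell recognizes
intron boundaries from signals in the transcript itself.

```python
def splice(pre_mrna, exons):
    """Join the exon segments of a transcript, discarding everything between them.

    exons is a list of (start, end) pairs, 0-based and end-exclusive,
    listed in the order in which they occur in the transcript.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy transcript: exon 1 (6 bases), one intron (12 bases), exon 2 (6 bases).
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "UUUUAA"
exons = [(0, 6), (18, 24)]

print(splice(pre_mrna, exons))  # AUGGCCUUUUAA
```

Passing a different subset of exon coordinates to the same function gives a different joined
sequence, which is the basic idea behind alternative splicing.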
