
• Transcription is the process of making RNA from DNA.

• Since DNA and RNA use the same four-letter alphabet of bases (A, C, G,
and T, except that RNA uses U instead of T), copying a gene from DNA onto
RNA simply transcribes the instructions for making a protein. It does not
translate them into another language or alphabet.
• In eukaryotes, DNA remains in the nucleus, but proteins are built in
the cytoplasm. So the instructions contained in DNA must be
transcribed onto an RNA (ribonucleic acid) molecule, which moves
into the cytoplasm and serves as the template for making a protein.
• Unlike replication, which makes a copy of a complete chromosome,
transcription only makes a copy of the gene that is to be expressed as
a protein.
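To make the base-for-base copying concrete, here is a minimal Python sketch that treats transcription as string manipulation. It assumes the gene is supplied 5' to 3' as either the coding strand or the template strand; the function names are hypothetical, and the sketch ignores everything a real cell does around transcription (promoters, RNA processing, export from the nucleus).

```python
# Illustrative sketch: transcription as string manipulation.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe_coding_strand(coding_dna: str) -> str:
    """mRNA (5'->3') matching a coding-strand DNA sequence: same letters, U for T."""
    return coding_dna.upper().replace("T", "U")


def transcribe_template_strand(template_dna: str) -> str:
    """mRNA (5'->3') built from a template strand given 5'->3'.

    RNA polymerase reads the template 3'->5' and pairs each base with its
    complement, so the mRNA is the reverse complement of the template,
    with U in place of T.
    """
    complement = "".join(COMPLEMENT[base] for base in template_dna.upper())
    return complement[::-1].replace("T", "U")


if __name__ == "__main__":
    coding = "ATGCTGTAA"
    template = "TTACAGCAT"  # reverse complement of the coding strand
    print(transcribe_coding_strand(coding))      # AUGCUGUAA
    print(transcribe_template_strand(template))  # AUGCUGUAA
```

Both calls yield the same mRNA, which is the point: the alphabet stays the same, and only T is swapped for U.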
• RNA is a chemical similar to a single strand of DNA. In
RNA, the letter U, which stands for uracil, is substituted
for T in the genetic code. RNA delivers DNA's genetic
message to the cytoplasm of a cell where proteins are
made.
• Messenger RNA (mRNA) is a template for protein
synthesis. Each set of three bases, called a codon, specifies
a certain amino acid in the sequence of amino acids that
make up the protein. The sequence of a strand of mRNA
is based on the sequence of a complementary strand of
DNA.
• Translation is the process of converting the sequence of nucleotide bases in
an mRNA (copied from DNA) into a sequence of amino acids in a protein.
• RNA contains four bases, and is used as the template for making proteins that
are made from varying combinations of 20 amino acids. Amino acids within a chain
are also called residues. A short chain of amino acids is called a peptide, and a
longer chain is called a polypeptide.
• A codon is a set of three bases in a DNA or RNA sequence that specify a single
amino acid.
• Each triplet codon codes for a specific amino acid. Some amino acids can be
coded for by more than one triplet codon (i.e., the genetic code is degenerate).
E.g., the triplet codons CTA, CTC, CTG, and CTT all code for the amino acid
Leucine, which can be abbreviated as Leu or L. The single-letter abbreviations
are used in sequence database records. Also, notice that the first two letters of a
triplet codon are usually the most important: when an amino acid is coded for by
more than one triplet codon, it is usually the last letter that varies.
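Degeneracy is easy to see in code. This Python sketch holds only a small excerpt of the standard genetic code, written as DNA codons (T rather than U); the dictionary and function names are hypothetical, and a codon missing from the excerpt raises `KeyError`. The two extra leucine codons, TTA and TTG, differ from CTA/CTC/CTG/CTT at the first position, which is why the "last letter varies" pattern is a tendency rather than a rule.

```python
# Excerpt of the standard genetic code (DNA codons); NOT a complete table.
CODON_TABLE = {
    "ATG": "M",                                       # methionine
    "CTA": "L", "CTC": "L", "CTG": "L", "CTT": "L",  # leucine
    "TTA": "L", "TTG": "L",                           # leucine, first base differs
    "GCA": "A", "GCC": "A", "GCG": "A", "GCT": "A",  # alanine
    "TAA": "*", "TAG": "*", "TGA": "*",               # stop codons
}


def translate(dna: str) -> str:
    """Translate codon by codon (single-letter codes), stopping at a stop codon."""
    protein = []
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def codons_for(amino_acid: str) -> list[str]:
    """All codons in the excerpt that code for the given single-letter amino acid."""
    return sorted(codon for codon, aa in CODON_TABLE.items() if aa == amino_acid)


print(translate("ATGCTAGCCCTGTAA"))  # MLAL
print(codons_for("L"))  # ['CTA', 'CTC', 'CTG', 'CTT', 'TTA', 'TTG']
```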
• The genetic code is the set of instructions in a gene
that tell the cell how to make a specific protein.
• A, T, G, and C are the "letters" of the DNA code. They
stand for the chemicals adenine, thymine, guanine,
and cytosine, respectively, that make up the
nucleotide bases of DNA.
• The DNA code for each gene combines the four
chemicals in various ways to spell out 3-letter "words"
(codons) that specify which amino acid is needed at
every step in making a protein.
• A protein is a large complex molecule made up
of one or more chains of amino acids. Proteins
perform a wide variety of activities in the cell.
• Amino acids are a group of 20 different kinds
of small molecules that link together in long
chains to form proteins. They are often referred to as
the "building blocks" of proteins.
Gene control
• Genes are not constantly being transcribed in order to make proteins. Rather, genes can be turned on
or off in a cell. If they are turned on, the rate of transcription, and therefore protein production, can
vary.
• There are a number of factors that initiate gene transcription, and regulate the rate at which initiation
occurs.
• Upstream of the coding sequence is a promoter region of DNA, to which a protein called a
transcription factor binds.
• The promoter contains a region called the TATA box, a short sequence of T-A and A-T base pairs
recognized by the general transcription factor TFIID. The TATA box is typically about 25 base pairs
upstream from the transcription start site.
• In eukaryotic organisms, a general transcription factor must bind to a promoter before the RNA
polymerase enzyme can initiate transcription.
• The RNA polymerase then transcribes the gene from genomic DNA (on a chromosome in the nucleus)
to mRNA (which moves through nuclear pores into the cytoplasm, where it can be translated into
protein). Note that the stop codon (TAA, TAG, or TGA in the DNA) does not end transcription: RNA
polymerase transcribes past it, and the stop codon instead marks where translation of the mRNA ends.
• In addition to transcription factors, there are thousands of different regulatory proteins. Some
regulatory proteins stimulate gene transcription (e.g., gene activator proteins), while others repress it
(e.g., gene repressor proteins).
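The TATA box can be located with a simple pattern search. The sketch below assumes the commonly cited consensus TATA(A/T)A(A/T) and a known transcription start site position; both the pattern and the function `find_tata_boxes` are illustrative assumptions, and real promoter prediction is considerably more involved.

```python
import re

# Assumed TATA-box consensus TATA(A/T)A(A/T), written as a regular expression.
TATA_PATTERN = re.compile(r"TATA[AT]A[AT]")


def find_tata_boxes(dna: str, tss_index: int) -> list[tuple[int, str]]:
    """Return (bases upstream of the TSS, motif) for each TATA-like match.

    `dna` is coding-strand sequence read 5' to 3', and `tss_index` is the
    0-based position of the transcription start site within it. Distance
    is counted from the first base of the motif to the start site.
    """
    hits = []
    for match in TATA_PATTERN.finditer(dna.upper()):
        if match.start() < tss_index:  # keep only motifs upstream of the TSS
            hits.append((tss_index - match.start(), match.group()))
    return hits


promoter = "GC" * 10 + "TATAAAA" + "G" * 18 + "ATGC"
print(find_tata_boxes(promoter, tss_index=45))  # [(25, 'TATAAAA')]
```

`finditer` returns non-overlapping matches only, which is adequate for a sketch but could miss overlapping candidate motifs.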
• Two examples of technologies that facilitate the measurement of gene
expression are:
• Expressed Sequence Tags (ESTs)
- Expressed Sequence Tags (ESTs) are short (usually about 300-500 bp),
single-pass sequence reads from the 5' and 3' ends of mRNA.
- Typically ESTs are produced in large batches. They represent a snapshot of
genes expressed in a given tissue and/or at a given developmental stage.
- If a gene is very active, it will produce a large number of mRNAs. Therefore,
the number of mRNA fragments from that gene in the library will be high
compared to a less active gene.
- In this way, ESTs are tags (some coding, others not) of mRNA, representing
the genes that are being expressed in a given cell.
- Note: mRNA -> cDNA:
• mRNA is not stable enough to be sequenced directly, so an
enzyme called reverse transcriptase is used to convert mRNA
back into a synthetic molecule called "cDNA," or complementary
DNA.
• cDNA libraries are built from different types of tissues, e.g.,
normal prostate, precancerous prostate, malignant prostate. The
library contains short fragments of a sampling of the mRNAs
produced in that tissue, and therefore represents the genes that
are being actively transcribed.
• The cDNA is sequenced and deposited into GenBank. However,
the molecule type at the top of the GenBank record is still shown
as mRNA, since that is the molecule that occurs in nature.
• Microarrays
• "Microarray technology is one of several developing approaches to comparatively analyze
genome-wide patterns of mRNA expression. For mRNA expression studies, the ultimate
goal is to develop arrays which contain every gene in a genome against which mRNA
expression levels can be quantitatively assessed." (From the NHGRI page, "About the
Microarray Project")
• Microarrays contain a wide sampling, or array, of cDNA from human genes. Then, DNA
"probes" from a cell or tissue of interest are tagged with a fluorescent label and allowed
to bind to the sample of human genes on the slide. Some of the cDNAs on the slide might
not bind with any probes. That will be true if the gene represented by that cDNA is not
being expressed by the cell of interest. On the other hand, some of the cDNAs will bind
with a large quantity of fluorescent probes, if that gene is highly expressed in the cell of
interest. Other cDNAs will bind with a moderate quantity of probes.
• The slides are put into a scanning microscope that can measure the brightness of each
fluorescent dot. Brightness reveals how much of a specific DNA fragment is present, which
indicates how active the corresponding gene is.
Definitions
Deletion: loss of a piece of DNA from a chromosome.
Deletion of a gene or part of a gene can lead to a
disease or abnormality.
Duplication: production of one or more copies of any
piece of DNA, including a gene or even an entire
chromosome.
Insertion: a type of chromosomal abnormality in
which a DNA sequence is inserted into a gene,
disrupting the normal structure and function of that
gene.
Translocation: breakage and removal of a large
segment of DNA from one chromosome, followed by
the segment's attachment to a different chromosome.
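The four definitions can be mimicked on plain strings, with a string standing in for a stretch of chromosome. This Python sketch is illustrative only: the function names and positions are hypothetical, duplication is modeled as a single tandem copy (one of the possibilities the definition allows), and translocation is modeled one-way as defined above, although real translocations are often reciprocal.

```python
def delete(seq: str, start: int, end: int) -> str:
    """Deletion: remove seq[start:end]."""
    return seq[:start] + seq[end:]


def duplicate(seq: str, start: int, end: int) -> str:
    """Duplication: add one extra copy of seq[start:end] directly after it."""
    return seq[:end] + seq[start:end] + seq[end:]


def insert(seq: str, pos: int, fragment: str) -> str:
    """Insertion: place a DNA fragment at position pos, disrupting the sequence."""
    return seq[:pos] + fragment + seq[pos:]


def translocate(donor: str, start: int, end: int,
                recipient: str, pos: int) -> tuple[str, str]:
    """Translocation: cut donor[start:end] and attach it to recipient at pos."""
    segment = donor[start:end]
    return delete(donor, start, end), insert(recipient, pos, segment)


gene = "AAACCCGGG"
print(delete(gene, 3, 6))               # AAAGGG
print(duplicate(gene, 3, 6))            # AAACCCCCCGGG
print(insert(gene, 3, "TTT"))           # AAATTTCCCGGG
print(translocate(gene, 3, 6, "TTTT", 4))  # ('AAAGGG', 'TTTTCCC')
```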
• The amino acid sequence of a protein determines its three-dimensional shape.
• The structure of a protein can be described in several levels. The summaries below are
based on definitions found in:
• Smith, A.D., et al., eds. 1997. Oxford Dictionary of Biochemistry and Molecular Biology.
New York: Oxford University Press.
• Primary structure - the linear sequence of residues (amino acids) in a polypeptide chain.
• Secondary structure - the arrangement of a polypeptide chain into more or less regular
hydrogen-bonded structures. It has two basic elements:
• Alpha helix - spiral configuration of a polypeptide chain with 3.6 residues (amino acids)
per turn. The helix may be left-handed or right-handed, and the latter is more common.
• Beta strand - an extended stretch of polypeptide chain that forms hydrogen bonds with
adjacent strands. Two or more strands may interact to form a beta sheet.
• Tertiary structure - the level of protein structure at which an entire polypeptide chain has
folded into a three-dimensional structure. In multi-chain proteins, the term tertiary
structure applies to the individual chains.
• Quaternary structure - the fourth order of complexity of structural organization exhibited
by protein molecules, and refers to the arrangement in space of the complete protein,
without regard to the internal geometry of the subunits. Quaternary structure is possessed
only when the molecule is made of at least two subunits that are separable.
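The 3.6-residues-per-turn figure supports a quick back-of-the-envelope calculation. This sketch assumes an idealized alpha helix with a rise of about 1.5 angstroms per residue along the helix axis (a typical textbook value, not taken from this section), which gives a pitch of about 3.6 × 1.5 = 5.4 angstroms per turn. The function name is hypothetical.

```python
RESIDUES_PER_TURN = 3.6   # from the alpha helix definition above
RISE_PER_RESIDUE = 1.5    # angstroms along the axis (assumed typical value)


def helix_geometry(n_residues: int) -> tuple[float, float]:
    """Return (turns, length in angstroms) for an ideal alpha helix of n residues."""
    turns = n_residues / RESIDUES_PER_TURN
    length = n_residues * RISE_PER_RESIDUE
    return turns, length


turns, length = helix_geometry(18)
print(f"18 residues: {turns:.1f} turns, about {length:.0f} angstroms long")
```

Real helices deviate from these ideal values, so the result is an estimate for a regular helix only.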
Genome Maps
• Various types of maps exist, based on different mapping methodologies. The
scales and resolutions differ as well. Some examples include:
• cytogenetic map
• genetic map
• physical maps
  • radiation hybrid map
  • clone-based map
  • sequence map
Definition
A cytogenetic map is the visual appearance of a chromosome when stained and
examined under a microscope. Particularly important are visually distinct regions,
called light and dark bands, which give each of the chromosomes a unique
appearance. This feature allows a person's chromosomes to be studied in a clinical test
known as a karyotype, which allows scientists to look for chromosomal alterations.
• Definition
• Each human chromosome has a short arm ("p" for "petit") and long
arm ("q" for "queue"), separated by a centromere. The ends of the
chromosome are called telomeres.
• Each chromosome arm is divided into regions, or cytogenetic bands,
that can be seen using a microscope and special stains. The
cytogenetic bands are labeled p1, p2, p3, q1, q2, q3, etc., counting
from the centromere out toward the telomeres. At higher resolutions,
sub-bands can be seen within the bands. The sub-bands are also
numbered from the centromere out toward the telomere.
• For example, the cytogenetic map location of the CFTR gene is 7q31.2,
which indicates it is on chromosome 7, q arm, band 3, sub-band 1, and
sub-sub-band 2.
• The ends of the chromosomes are labeled ptel and qtel. For example,
the notation 7qtel refers to the end of the long arm of chromosome 7.
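The notation can be unpacked mechanically. This Python sketch parses locations such as 7q31.2 and 7qtel into the parts named above (chromosome, arm, band, sub-band, sub-sub-band). The regular expression and field names are assumptions that cover only the simple forms used in this section; ranges and deeper sub-band levels are rejected with `ValueError`.

```python
import re

# Hypothetical pattern for simple locations: chromosome, arm, then either
# "tel" or up to two band digits with an optional single sub-band digit.
LOCATION = re.compile(r"^(\d{1,2}|X|Y)([pq])(tel|\d{1,2}(?:\.\d)?)$")
LEVELS = ["band", "sub-band", "sub-sub-band"]  # the terms used in this section


def parse_location(loc: str) -> dict:
    """Split a location like '7q31.2' or '7qtel' into its parts."""
    match = LOCATION.match(loc)
    if match is None:
        raise ValueError(f"unrecognized cytogenetic location: {loc!r}")
    chromosome, arm, rest = match.groups()
    parts = {"chromosome": chromosome, "arm": arm}
    if rest == "tel":
        parts["telomere"] = True
        return parts
    # Each digit is one level, numbered outward from the centromere.
    for level, digit in zip(LEVELS, rest.replace(".", "")):
        parts[level] = int(digit)
    return parts


print(parse_location("7q31.2"))
print(parse_location("7qtel"))
```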
• Definition
• A genetic map, also known as a linkage map, is a
chromosome map of a species that shows the position
of its known genes and/or markers relative to each
other, rather than as specific physical points on each
chromosome.
• One tool used in creating genetic linkage maps is a
pedigree -- a simplified diagram of a family's genealogy
that shows family members' relationships to each other
and how a particular trait or disease has been inherited.
• Source: definition from the National Human Genome
Research Institute (NHGRI) Glossary of Genetic Terms.
• Scale: centiMorgans (cM)
• A centiMorgan is a unit of genetic distance that
represents a 1% probability of recombination during
meiosis.
• E.g., if two genes are 1 cM apart, there is a 1% chance
they will break apart during meiosis. If two genes are
20 cM apart, there is a 20% chance they will break
apart during meiosis. One cM is equivalent, on
average, to a physical distance of approximately 1
megabase in the human genome. This is just an
average because genetic recombination rates vary
along different parts of the chromosomes.
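The two rules of thumb above (1 cM ≈ 1% recombination, 1 cM ≈ 1 Mb in humans) can be written as simple conversions. This Python sketch uses hypothetical function names. The linear rule is only reasonable for short distances, because the observed recombination frequency between two loci never exceeds 50%; and the megabase figure is a genome-wide average, as noted above.

```python
MB_PER_CM_HUMAN = 1.0  # genome-wide average from the text; varies locally


def recombination_chance(cm: float) -> float:
    """Chance (0-1) that two loci are separated in one meiosis, using 1 cM = 1%.

    Valid only for short distances; not a true probability for large cM values.
    """
    return cm / 100.0


def approx_megabases(cm: float, mb_per_cm: float = MB_PER_CM_HUMAN) -> float:
    """Rough physical distance implied by a genetic distance."""
    return cm * mb_per_cm


for distance in (1, 20):
    print(f"{distance} cM: {recombination_chance(distance):.0%} chance, "
          f"~{approx_megabases(distance):.0f} Mb")
```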
• Scale: centiRay (cR)
• CentiRays are not a constant unit: the physical distance represented by one centiRay
depends on the dose of radiation used to break the DNA into fragments.
• E.g., if two genes are 1 centiRay apart (at a given dose of radiation), then there is
a 1% chance of getting a break between them.
• E.g., 1cR8000 = 1% chance that two things will break apart at a dose of 8000 rads.
• Source: The example above is from the glossary of: Strachan, T. and Read, A.P.
1997. Human Molecular Genetics. New York: John Wiley & Sons.
• Conversion of centiRay (cR) --> megabase (Mb) --> kilobase (Kb)
• The average Kb length per centiRay for the cR3000 and cR10000 panels can be
estimated as follows. Keep in mind that this gives an average value for the whole
human genome; the actual ratio can vary along portions of a chromosome:
• GB4: Total cR3000 for the genome is 11,524, so:
• 3,200 Mb / 11,524 cR ≈ 0.28 Mb/cR × 1,000 Kb/Mb ≈ 280 Kb/cR
• G3: Total cR10000 for the genome is 125,853, so:
• 3,200 Mb / 125,853 cR ≈ 0.025 Mb/cR × 1,000 Kb/Mb ≈ 25 Kb/cR
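The same arithmetic as a small Python function, so it can be rerun for another panel total. The 3,200 Mb genome size and the two panel totals are taken from the text; the function name is hypothetical.

```python
GENOME_MB = 3_200  # approximate human genome size used in the text


def kb_per_cr(total_cr: float, genome_mb: float = GENOME_MB) -> float:
    """Genome-wide average kilobases per centiRay for a radiation hybrid panel."""
    mb_per_cr = genome_mb / total_cr
    return mb_per_cr * 1_000  # 1,000 Kb per Mb


gb4 = kb_per_cr(11_524)   # GB4, cR3000 total from the text
g3 = kb_per_cr(125_853)   # G3, cR10000 total from the text
print(f"GB4: {gb4:.0f} Kb/cR, G3: {g3:.0f} Kb/cR")
```

Without first rounding Mb/cR to 0.28, GB4 comes out at about 278 Kb/cR rather than 280; both are genome-wide averages, so the difference is immaterial.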
• Euchromatin
• "The fraction of the nuclear genome which
contains transcriptionally active DNA and
which, unlike heterochromatin, adopts a
relatively extended conformation."
• Heterochromatin
• "A chromosomal region that remains highly
condensed throughout the cell cycle and
shows little or no evidence of active gene
expression."
• Draft Sequence
• A region of sequence which still contains gaps, and in which sections of
DNA might still be of unknown order and orientation.
• The sections of DNA are grouped together into a single GenBank
submission because they have been sequenced from the same clone.
Once the order and orientation of the sections is determined, and as
the gaps are filled, the sequence will move into the finished phase.
• The draft sequence in the illustration above is shown in green, while
finished sequence is shown in shades of orange or red (depending on
the length of the finished segment of sequence).
• The phases of sequencing are described on the High-Throughput
Genomic Sequences (HTG) page. Draft sequence can be either phase 1
or phase 2.
• Finished Sequence
• A region of sequence which has been completely
sequenced. It contains no gaps, and the order and
orientation of all the sequence subsections are known.
• The finished sequence in the illustration above is shown in
shades of orange or red (depending on the length of the
finished segment of sequence), while the draft sequence is
shown in green.
• The phases of sequencing are described on the High-
Throughput Genomic Sequences (HTG) page. Finished
sequence is phase 3.
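Gaps in a draft sequence are commonly written as runs of the letter N (unknown bases). Assuming that convention, this Python sketch checks only the "no gaps" part of the finished-sequence definition. It cannot check order and orientation, which a plain string does not record, and the function names are hypothetical.

```python
import re


def count_gaps(sequence: str, min_run: int = 1) -> int:
    """Count runs of 'N' (unknown bases) that are at least min_run long."""
    return sum(1 for run in re.finditer(r"N+", sequence.upper())
               if len(run.group()) >= min_run)


def has_no_gaps(sequence: str) -> bool:
    """True if no N runs remain: one of the conditions for finished sequence."""
    return count_gaps(sequence) == 0


draft = "ACGTACGT" + "N" * 10 + "GGCCTTAA" + "N" * 5 + "ACGT"
print(count_gaps(draft), has_no_gaps(draft))  # 2 False
print(has_no_gaps("ACGTACGTGGCCTTAAACGT"))    # True
```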
