
Gene, Proteins, and Genetic Code

Protein Synthesis in a Cell


Protein and Amino Acids
Protein

GOT Ecoli
A protein sequence
>gi|7228451|dbj|BAA92411.1| EST AU055734(S20025) corresponds to a region …
MCSYIRYDTPKLFTHVTKTPPKNQVSNSINDVGSRRATDRSVASCSSEKSVGTMSVKNASSISFEDIEKSISNWKIPKVN
IKEIYHVDTDIHKVLTLNLQTSGYELELGSENISVTYRVYYKAMTTLAPCAKHYTPKGLTTLLQTNPNNRCTTPKTLKWD
EITLPEKWVLSQAVEPKSMDQSEVESLIETPDGDVEITFASKQKAFLQSRPSVSLDSRPRTKPQNVVYATYEDNSDEPSI
SDFDINVIELDVGFVIAIEEDEFEIDKDLLKKELRLQKNRPKMKRYFERVDEPFRLKIRELWHKEMREQRKNIFFFDWYE
SSQVRHFEEFFKGKNMMKKEQKSEAEDLTVIKKVSTEWETTSGNKSSSSQSVSPMFVPTIDPNIKLGKQKAFGPAISEEL
VSELALKLNNLKVNKNINEISDNEKYDMVNKIFKPSTLTSTTRNYYPRPTYADLQFEEMPQIQNMTYYNGKEIVEWNLDG
FTEYQIFTLCHQMIMYANACIANGNKEREAANMIVIGFSGQLKGWWNNYLNETQRQEILCAVKRDDQGRPLPDRDGNGNP
TELKEGFHMEEKDEPIQEDDQVVGTIQKYTKQKWYAEVMYRFIDGSYFQHITLIDSGADVNCIREDEILDQLVQTKREQV
VNSIYLHDNSFPKSMDLPDQKITEKRAKLQDIPHHEERLLDYREKKSRDGQDKLPMEVEQSMATNKNTKILLRAWLLST

A protein sequence may have a few hundred to several thousand amino acids.
Protein synthesis
Genetic code
Example: ATT CAC AGT GGA translates to I H S G
Notes on translation
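• Three reading frames per strand (six counting both strands)
• The third base of a codon often does not change the amino acid
• Translation reads the sequence 5' -> 3'
• Start and stop codons
• Open Reading Frame (ORF)
• Each gene is an ORF, but not all ORFs are
genes.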
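The notes above can be made concrete with a small sketch. It assumes the standard genetic code and scans only the forward strand. The names `translate` and `find_orfs` and the `min_codons` cutoff are illustrative choices, not part of the slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed table).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
STOP_CODONS = {c for c, aa in CODON_TABLE.items() if aa == "*"}


def translate(dna):
    """Translate a DNA string 5'->3', three bases at a time ('X' = unknown codon)."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def find_orfs(dna, min_codons=2):
    """Return (frame, start, end) for each ATG...stop stretch on the forward strand.

    min_codons is a hypothetical length cutoff counted before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):                      # the three forward reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None                    # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    print(translate("ATTCACAGTGGA"))
    for frame, start, end in find_orfs("CCATGAAATTTTGACC"):
        print(frame, start, end, translate("CCATGAAATTTTGACC"[start:end]))
```

Running the same scan on the reverse complement would cover the other three frames. A real gene finder would also have to decide which of several in-frame ATGs is the true start.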
The Central Dogma of Molecular Biology

replication (DNA -> DNA)
DNA --transcription--> RNA --translation--> Protein
genotype                                    phenotype
Exception – retroviruses
replication (DNA -> DNA)
DNA --transcription--> RNA --translation--> Protein
DNA <--reverse transcription-- RNA
genotype                                    phenotype
Biology
DNA (Genotype) -> Protein (Phenotype)
Genes
• One gene encodes one protein (or sometimes an
RNA).
• Like a program, it starts with a start codon (e.g.
ATG), then each group of three bases codes for one
amino acid. A stop codon (e.g. TGA) marks the end
of the gene.
• Genes are dense in prokaryotes and sparse in
eukaryotes.
• In the middle of a eukaryotic gene, there are
introns that are spliced out after transcription.
The parts that are kept are called exons. Identifying
them is part of the task of gene finding.
Gene related diseases
• Hemophilia: gene on the X chromosome.
• Sickle-Cell Anemia: single nucleotide mutation in the first
exon of the beta-globin gene (removes a restriction site). 1 in 12
African Americans are carriers. (Homozygotes have the disease.)
• BRCA1 gene (chr. 17q) – responsible for about ½ of inherited
breast cancer (inherited cases are about 10% of breast cancer)
• Fragile X syndrome (intellectual disability) – 1 in 1250 males,
1 in 2500 females (dominant, but females have a partially
expressed good copy of the gene). FMR-1 gene: more than 200
trinucleotide repeats cause the disease.
• P53 gene: chr. 17p, encodes a tumor suppressor protein.
Genetic Test
• Example:
http://www.myriad.com/index.php
• Pros and cons:
• May make it possible to avoid the disease or diagnose it early.
• Can make you unhappier
• Can help insurance companies discriminate against
carriers of a defective gene
• ……
Possible ways of gene test
• First amplify the gene by PCR, then
• Sequence it
• Measure its length
• Cut it with a restriction enzyme
• Or
• Use a PCR primer placed at the mutation site.
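As a rough illustration of the restriction-enzyme test, the sketch below checks whether a recognition site is present in an amplified fragment. The fragments are the commonly cited codons 4 to 8 of the beta-globin gene (normal GAG versus sickle GTG at codon 6), and `CCTNAGG` is the recognition sequence usually given for MstII. Treat both as illustrative assumptions. The function name `site_positions` is made up for this example.

```python
import re

# Only the IUPAC symbols needed here; N matches any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def site_positions(seq, site):
    """Return 0-based start positions of a recognition site in seq."""
    pattern = "".join(IUPAC[ch] for ch in site.upper())
    # A lookahead keeps overlapping occurrences from being skipped.
    return [m.start() for m in re.finditer("(?=" + pattern + ")", seq.upper())]


# Short illustrative fragments around codon 6 of beta-globin (assumed).
NORMAL = "ACTCCTGAGGAGAAG"   # ... CCT GAG GAG ...
SICKLE = "ACTCCTGTGGAGAAG"   # ... CCT GTG GAG ...

if __name__ == "__main__":
    for name, frag in (("normal", NORMAL), ("sickle", SICKLE)):
        cuts = site_positions(frag, "CCTNAGG")
        print(name, "site present" if cuts else "site lost", cuts)
```

In the lab the same information is read from fragment lengths on a gel: the allele that has lost the site gives one longer fragment instead of two shorter ones.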
Gene Prediction and Annotation
Prokaryotes
1. Start/stop codon (ORF)
2. Promoters
3. Content
4. Sequence similarity
Start Codon
• May miss short genes.
• Do not know which start codon to use.
• Overlapping ORFs in different reading frames.
Promoters
<-- upstream                                                    downstream -->
5'-XXXXPPPPPPXXXXXXXXXPPPPPPXXXXGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGXXXX-3'
       -35            -10       Gene to be transcribed

-10:   T    A    T    A    A    T     (Pribnow box)
      77%  76%  60%  61%  56%  82%
-35:   T    T    G    A    C    A
      69%  79%  61%  56%  54%  54%
These rules are only approximately correct.
In prokaryotes, the promoter consists of two short sequences at positions -10 and -35
upstream of the gene, that is, before the gene in the direction of transcription. The
sequence at -10 is called the Pribnow box and usually consists of the six nucleotides
TATAAT. The Pribnow box is essential for starting transcription in prokaryotes.
The other sequence, at -35, usually consists of the six nucleotides TTGACA. Its presence
allows a very high transcription rate.
Scoring a 6-mer as Pribnow box
• Computers work with exact formulae, not
English descriptions.
• We need a “score function” to measure the
likelihood that a 6-mer is a Pribnow box
An example function for Pribnow
box fitness evaluation

log()
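One possible score function, not necessarily the one intended here, is a log-odds sum over the six positions. It uses the consensus percentages listed for the -10 box. Two assumptions are added: at each position the remaining probability is split equally among the three non-consensus bases, and the background frequency of every base is 0.25. The names `pribnow_score` and `best_hexamer` are illustrative.

```python
import math

CONSENSUS = "TATAAT"
# Frequency of the consensus base at each position (from the -10 table).
CONSENSUS_FREQ = [0.77, 0.76, 0.60, 0.61, 0.56, 0.82]
BACKGROUND = 0.25  # assumed uniform background base frequency


def position_prob(pos, base):
    """Probability of base at pos; non-consensus bases share the remainder (assumption)."""
    p = CONSENSUS_FREQ[pos]
    return p if base == CONSENSUS[pos] else (1.0 - p) / 3.0


def pribnow_score(hexamer):
    """Sum over positions of log(p_i(base) / background); higher means more box-like."""
    hexamer = hexamer.upper()
    if len(hexamer) != 6 or set(hexamer) - set("ACGT"):
        raise ValueError("expected six bases from A, C, G, T")
    return sum(math.log(position_prob(i, b) / BACKGROUND)
               for i, b in enumerate(hexamer))


def best_hexamer(seq):
    """Slide a 6-base window over seq; return (start, hexamer, score) of the top window."""
    seq = seq.upper()
    windows = [(i, seq[i:i + 6]) for i in range(len(seq) - 5)]
    start, hexamer = max(windows, key=lambda w: pribnow_score(w[1]))
    return start, hexamer, pribnow_score(hexamer)


if __name__ == "__main__":
    for h in ("TATAAT", "TATGAT", "GCGCGC"):
        print(h, round(pribnow_score(h), 3))
    print(best_hexamer("GGCGTATAATGCGC"))
```

A positive score means the 6-mer looks more like the -10 box than like background. Since the rules are only approximately correct, a usable threshold would have to be calibrated on known promoters.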
Content I – codon bias
• A codon XYZ occurs with different frequencies in
coding regions and non-coding regions
• Different amino acids have different frequencies.
• Different codons for the same amino acid have different frequencies.
• In non-coding regions the frequency is approx. p(X)*p(Y)*p(Z)
http://www.kazusa.or.jp/codon/
Codon bias
• First, use many known genes of the organism or
similar organisms to train a codon frequency table.
• Each codon ci has a frequency f(ci).
• Second, compute the background frequency of
each base bf(X) for X=A,C,G,T
• The “significance” of a codon c=XYZ is then
log( f(c) / (bf(X)*bf(Y)*bf(Z)) ).
• High average significance in a region is an
indication of a gene.
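The procedure above can be sketched as follows, taking the logarithm of the ratio so that codons favoured in known genes score positive. The pseudocount is an added assumption that keeps unseen codons and bases from producing log(0) or a division by zero. All function names are illustrative.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"
ALL_CODONS = ["".join(p) for p in product(BASES, repeat=3)]


def codon_frequencies(genes, pseudocount=1.0):
    """Train f(c) from known genes read in frame; pseudocount is an assumption."""
    counts = Counter()
    for gene in genes:
        gene = gene.upper()
        for i in range(0, len(gene) - 2, 3):
            codon = gene[i:i + 3]
            if set(codon) <= set(BASES):
                counts[codon] += 1
    total = sum(counts.values()) + pseudocount * len(ALL_CODONS)
    return {c: (counts[c] + pseudocount) / total for c in ALL_CODONS}


def base_frequencies(sequence, pseudocount=1.0):
    """Background frequency bf(X) of each base in a sequence."""
    counts = Counter(b for b in sequence.upper() if b in BASES)
    total = sum(counts.values()) + pseudocount * len(BASES)
    return {b: (counts[b] + pseudocount) / total for b in BASES}


def significance(codon, f, bf):
    """log( f(c) / (bf(X) * bf(Y) * bf(Z)) ) for codon c = XYZ."""
    x, y, z = codon
    return math.log(f[codon] / (bf[x] * bf[y] * bf[z]))


def average_significance(region, f, bf):
    """Mean codon significance of a region read in frame 0."""
    region = region.upper()
    codons = [region[i:i + 3] for i in range(0, len(region) - 2, 3)]
    codons = [c for c in codons if c in f]
    if not codons:
        return 0.0
    return sum(significance(c, f, bf) for c in codons) / len(codons)


if __name__ == "__main__":
    toy_genes = ["ATGGCTGCTGCTAAA", "ATGGCTAAAGCTGCT"]  # made-up training data
    f = codon_frequencies(toy_genes)
    bf = {b: 0.25 for b in BASES}                        # assumed uniform background
    print(average_significance("GCTGCTGCT", f, bf))
    print(average_significance("CCCCCCCCC", f, bf))
```

In practice the average would be computed in a sliding window and in each reading frame, and the training set would be thousands of genes rather than two toy strings.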
Content II - Hidden Markov Model (HMM)
Eukaryotes
• The basic idea is similar to prokaryotes
• Difference:
DNA-specific transcription factors
• These are the basis of gene-regulatory
networks
• Another hot area in bioinformatics
Splicing
These rules are only approximately correct.
• Consensus sequences have been identified as necessary but
not sufficient for splicing. In vertebrates, these sequences
are (the slash identifies the exon-intron or intron-exon
junction):
• (C or A)AG/GT(A or G)AGT "donor" splice site
• (T or C)n N (C or T)AG/G "acceptor" splice site, where
(T or C)n is a run of n pyrimidines and N is any base.
• A third sequence, which in yeast is TACTAAC, is necessary
within the intron sequence.
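The two consensus patterns translate directly into regular expressions. The sketch below reports candidate junction positions. It assumes a minimum pyrimidine run of six for the acceptor site (the text gives no value for n), and the function names and toy sequence are illustrative. Because the consensus is necessary but not sufficient, every hit is only a candidate.

```python
import re

# Donor: (C or A)AG / GT(A or G)AGT ; junction falls after the first three bases.
DONOR = re.compile(r"(?=([CA]AG)(GT[AG]AGT))")


def donor_sites(seq):
    """0-based index of the first intron base for each candidate donor site."""
    return sorted({m.end(1) for m in DONOR.finditer(seq.upper())})


def acceptor_sites(seq, min_tract=6):
    """0-based index of the first exon base for each candidate acceptor site.

    min_tract is an assumed minimum length for the (T or C)n run.
    """
    pattern = re.compile(r"(?=([CT]{%d,}[ACGT][CT]AG)(G))" % min_tract)
    # Several window starts inside one pyrimidine run give the same junction,
    # so the positions are collected in a set.
    return sorted({m.end(1) for m in pattern.finditer(seq.upper())})


if __name__ == "__main__":
    exon1, exon2 = "ATGCCCAAG", "GCCTAA"
    intron = "GTAAGT" + "TACTAACA" + "TTTTTTTTTCAG"   # toy intron, made up
    pre_mrna = exon1 + intron + exon2
    donors = donor_sites(pre_mrna)
    acceptors = acceptor_sites(pre_mrna)
    print(donors, acceptors)
    # Splice out the intron between the first donor and first acceptor.
    print(pre_mrna[:donors[0]] + pre_mrna[acceptors[0]:])
```

On real genomic sequence this kind of exact matching produces many false positives and misses sites that deviate from the consensus, which is why gene finders combine it with content statistics.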
Gene Prediction Software
• Try GENSCAN at
http://genes.mit.edu/GENSCAN.html
using the sequence at
• http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=3253144
• Did GENSCAN work well?
