GOT Ecoli
A protein sequence
>gi|7228451|dbj|BAA92411.1| EST AU055734(S20025) corresponds to a region …
MCSYIRYDTPKLFTHVTKTPPKNQVSNSINDVGSRRATDRSVASCSSEKSVGTMSVKNASSISFEDIEKSISNWKIPKVN
IKEIYHVDTDIHKVLTLNLQTSGYELELGSENISVTYRVYYKAMTTLAPCAKHYTPKGLTTLLQTNPNNRCTTPKTLKWD
EITLPEKWVLSQAVEPKSMDQSEVESLIETPDGDVEITFASKQKAFLQSRPSVSLDSRPRTKPQNVVYATYEDNSDEPSI
SDFDINVIELDVGFVIAIEEDEFEIDKDLLKKELRLQKNRPKMKRYFERVDEPFRLKIRELWHKEMREQRKNIFFFDWYE
SSQVRHFEEFFKGKNMMKKEQKSEAEDLTVIKKVSTEWETTSGNKSSSSQSVSPMFVPTIDPNIKLGKQKAFGPAISEEL
VSELALKLNNLKVNKNINEISDNEKYDMVNKIFKPSTLTSTTRNYYPRPTYADLQFEEMPQIQNMTYYNGKEIVEWNLDG
FTEYQIFTLCHQMIMYANACIANGNKEREAANMIVIGFSGQLKGWWNNYLNETQRQEILCAVKRDDQGRPLPDRDGNGNP
TELKEGFHMEEKDEPIQEDDQVVGTIQKYTKQKWYAEVMYRFIDGSYFQHITLIDSGADVNCIREDEILDQLVQTKREQV
VNSIYLHDNSFPKSMDLPDQKITEKRAKLQDIPHHEERLLDYREKKSRDGQDKLPMEVEQSMATNKNTKILLRAWLLST
[Figure: the central dogma. DNA replicates; DNA is transcribed into RNA; RNA is translated into protein. DNA is the genotype, protein the phenotype.]
Exception: retroviruses
[Figure: central dogma diagram (replication, transcription, translation; DNA, RNA, protein; genotype, phenotype), repeated for the retrovirus exception.]
Biology
DNA (genotype) → Protein (phenotype)
Genes
• One gene encodes one protein (or sometimes an RNA).
• Like a program, a gene starts with a start codon (e.g.
ATG); each following group of three nucleotides (a codon)
then codes for one amino acid, and a stop codon (e.g. TGA)
marks the end of the gene.
• Genes are dense in prokaryotes and sparse in
eukaryotes.
• In the middle of a eukaryotic gene there are
introns, which are spliced out (as junk) after
transcription. The parts that are kept are called exons.
Locating them is the task of gene finding.
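The start codon / triplet / stop codon reading described above can be sketched in a few lines of Python. This is only an illustration: the function name `translate_gene` is made up here, the standard genetic code is assumed, and introns are ignored (the input is treated as an already spliced or prokaryotic sequence).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_gene(dna, start_codon="ATG"):
    """Translate from the first start codon to the first in-frame stop codon.

    Returns the protein as a string of one-letter amino acid codes, or None
    if there is no start codon or no in-frame stop codon after it.
    """
    dna = dna.upper()
    start = dna.find(start_codon)
    if start == -1:
        return None
    protein = []
    for i in range(start, len(dna) - 2, 3):
        # Unknown triplets (e.g. containing N) are reported as "X".
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":
            return "".join(protein)
        protein.append(amino_acid)
    return None  # ran off the end without meeting a stop codon


print(translate_gene("CCATGGCCAAATGATT"))
```

Here the reading frame is fixed by the first ATG found, so the call above reads ATG, GCC, AAA and stops at TGA.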
Gene related diseases
• Hemophilia: gene on the X chromosome.
• Sickle-cell anemia: a single-nucleotide mutation in the first
exon of the beta-globin gene (it removes a restriction enzyme
cutting site). 1 in 12 African Americans are carriers;
homozygotes develop the disease.
• BRCA1 gene (chr. 17q): responsible for half of inherited
breast cancer (inherited cases being 10% of all breast cancer).
• Fragile X syndrome (intellectual disability): 1 in 1250 males,
1 in 2500 females (dominant, but females have a partially
expressed good gene). FMR-1 gene: more than 200 trinucleotide
repeats cause the disease.
• P53 gene: chr. 17p, encodes a tumor suppressor protein.
Genetic Test
• Example:
http://www.myriad.com/index.php
• Pros and cons:
• May make it possible to avoid the disease or diagnose it early.
• Can make you unhappier.
• Can help insurance companies discriminate against
carriers of defective genes.
• ……
Possible ways of gene test
• First amplify the gene by PCR, then do one of:
• Sequence it
• Measure its length
• Cut it with a restriction enzyme
• Or
• Use a PCR primer placed at the mutation site.
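As a rough illustration of the restriction enzyme route, the sketch below cuts an amplified fragment at every occurrence of a recognition pattern and reports the fragment lengths; a mutation that removes the cutting site changes the pattern of lengths. Everything specific here is an assumption for illustration: the source does not name an enzyme, so the pattern CTNAG (cut after the first base) is a stand-in, the flanking sequences are invented, and only the core change GAG to GTG is modelled on the sickle-cell mutation.

```python
import re


def fragment_lengths(amplicon, site_regex, cut_offset):
    """Lengths of the fragments left after cutting at every site match.

    cut_offset is the cut position relative to the start of each match.
    Overlapping sites are not handled in this simplified sketch.
    """
    cuts = [m.start() + cut_offset for m in re.finditer(site_regex, amplicon)]
    edges = [0] + cuts + [len(amplicon)]
    return [right - left for left, right in zip(edges, edges[1:])]


# Assumed recognition pattern CTNAG, cut after the first C.
SITE = "CT[ACGT]AG"
CUT_OFFSET = 1

# Invented flanks around the core; not real beta-globin amplicons.
normal_allele = "ACGTACGTAC" + "CCTGAGGAG" + "TTGCATTGCA"
mutant_allele = "ACGTACGTAC" + "CCTGTGGAG" + "TTGCATTGCA"

print(fragment_lengths(normal_allele, SITE, CUT_OFFSET))
print(fragment_lengths(mutant_allele, SITE, CUT_OFFSET))
```

The normal allele is cut into two fragments while the mutant stays in one piece, which is the difference a length measurement on a gel would pick up.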
Gene Prediction and Annotation
Prokaryotes
1. Start/stop codon (ORF)
2. Promoters
3. Content
4. Sequence similarity
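Step 1 (start/stop codon, ORF) can be sketched as a scan of the three forward reading frames for an ATG followed by an in-frame stop codon. This is a minimal sketch, not a full gene finder: the name `find_orfs` and the `min_codons` threshold are illustrative choices, the three standard stop codons are assumed, and the reverse strand is not scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Find open reading frames on the forward strand.

    Returns a list of (start, end, frame) tuples, where dna[start:end]
    runs from the start codon through the stop codon (end is exclusive).
    min_codons counts the codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGGCCAAATGATT"))
```

In practice the length threshold would be much larger, since short ORFs occur by chance; the other three criteria (promoters, content, similarity) are what separate real genes from chance ORFs.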
Start Codon
Position -10 (Pribnow box):
  T    A    T    A    A    T
  77%  76%  60%  61%  56%  82%

Position -35:
  T    T    G    A    C    A
  69%  79%  61%  56%  54%  54%

These rules are only approximately correct.
In prokaryotes, the promoter consists of two short sequences at positions -10 and -35
upstream of the gene, that is, before the gene in the direction of transcription. The
sequence at -10 is called the Pribnow box and usually consists of the six nucleotides
TATAAT. The Pribnow box is absolutely essential to start transcription in prokaryotes.
The other sequence, at -35, usually consists of the six nucleotides TTGACA. Its presence
allows a very high transcription rate.
Scoring a 6-mer as Pribnow box
• Computers work with exact formulae, not
English descriptions.
• We need a “score function” to measure the
likelihood that a 6-mer is a Pribnow box
An example function for Pribnow
box fitness evaluation
log()
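The formula on this slide did not survive, so the following is only one plausible score function of the kind described, not necessarily the one intended. It is a log-odds sum over the six positions, using the -10 consensus percentages given earlier. Two loud assumptions: the source gives only the frequency of the consensus base at each position, so the remaining probability is split evenly over the other three bases, and the background frequency of every base is taken to be 0.25.

```python
import math

CONSENSUS = "TATAAT"
# Frequency of the consensus base at each -10 position (from the slide).
CONSENSUS_FREQ = [0.77, 0.76, 0.60, 0.61, 0.56, 0.82]
BACKGROUND = 0.25  # assumption: uniform base composition


def pribnow_score(six_mer):
    """Log-odds score of a 6-mer against the Pribnow box consensus.

    Positive scores mean the 6-mer looks more like a Pribnow box than
    like background sequence under the assumptions stated above.
    """
    if len(six_mer) != 6:
        raise ValueError("expected a 6-mer")
    score = 0.0
    for base, consensus_base, freq in zip(six_mer.upper(), CONSENSUS, CONSENSUS_FREQ):
        # Assumption: non-consensus bases share the leftover probability equally.
        p = freq if base == consensus_base else (1.0 - freq) / 3.0
        score += math.log(p / BACKGROUND)
    return score


for candidate in ("TATAAT", "TATACT", "GCGCGC"):
    print(candidate, round(pribnow_score(candidate), 3))
```

A window of length 6 can be slid along the region upstream of a candidate start codon, and the best-scoring position taken as the putative Pribnow box.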
Content I – codon bias
• A codon XYZ occurs with different frequencies in
coding regions and non-coding regions
• Different amino acids have different frequencies.
• Different codons for the same amino acid have different frequencies.
• In non-coding regions, the frequency of XYZ is approximately p(X)*p(Y)*p(Z)
http://www.kazusa.or.jp/codon/
Codon bias
• First, use many known genes of the organism or of
similar organisms to train a codon frequency table.
• Each codon c_i has a frequency f(c_i).
• Second, compute the background frequency
bf(X) of each base X = A, C, G, T.
• The “significance” of a codon c = XYZ is then
log( f(c) / (bf(X)*bf(Y)*bf(Z)) ).
• A high average significance in a region is an
indication of a gene.
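The codon bias procedure can be sketched as follows. The significance is taken as log( f(c) / (bf(X)*bf(Y)*bf(Z)) ), reading the dash in front of the formula as a bullet marker rather than a minus sign, since a high value is said to indicate a gene. The function names and the small floor `pseudo` for codons never seen in training are assumptions made for this sketch, and the training genes are assumed to be given in frame.

```python
import math
from collections import Counter


def codon_frequencies(genes):
    """Train f(c): relative frequency of each codon over known in-frame genes."""
    counts = Counter()
    for gene in genes:
        gene = gene.upper()
        for i in range(0, len(gene) - 2, 3):
            counts[gene[i:i + 3]] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def base_frequencies(dna):
    """Compute bf(X): background frequency of each base in a sequence."""
    counts = Counter(dna.upper())
    total = sum(counts[b] for b in "ACGT")
    return {b: counts[b] / total for b in "ACGT"}


def codon_significance(codon, f, bf, pseudo=1e-6):
    """log( f(c) / (bf(X)*bf(Y)*bf(Z)) ), with a floor for unseen codons."""
    expected = bf[codon[0]] * bf[codon[1]] * bf[codon[2]]
    return math.log(max(f.get(codon, 0.0), pseudo) / expected)


def average_significance(region, f, bf):
    """Mean codon significance over a region read in frame 0."""
    region = region.upper()
    codons = [region[i:i + 3] for i in range(0, len(region) - 2, 3)]
    if not codons:
        raise ValueError("region shorter than one codon")
    return sum(codon_significance(c, f, bf) for c in codons) / len(codons)


# Toy data, invented for illustration only.
f = codon_frequencies(["ATGGCCGCCTGA"])
bf = base_frequencies("ACGT")
print(average_significance("GCCGCC", f, bf))
print(average_significance("TTTTTT", f, bf))
```

To scan a genome, the average would be computed in a sliding window for each reading frame, and windows with a high average flagged as candidate coding regions.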
Content II - Hidden Markov Model (HMM)
Eukaryotes
• The basic idea is similar to that for prokaryotes
• Difference:
DNA-specific transcription factors