Professional Documents
Culture Documents
Zerrin Işık
zerrin@cs.deu.edu.tr
Motifs and HMMs
Identifying Motifs
Genes are turned on or off by regulatory proteins
These proteins bind to upstream regulatory regions of
genes to either attract or block an RNA polymerase
Regulatory protein (TF) binds to a short DNA sequence
called a motif (TFBS)
So, finding the same motif in multiple genes’ regulatory
regions suggests a regulatory relationship amongst those
genes
Multiple Alignments and Protein Family
Classification
Multiple alignment of a protein family shows variations
in conservation along the length of a protein
Example: after aligning many globin proteins, the
biologists recognized that the helices region in globins
are more conserved than others.
Motif Search
Sequence motifs represent critical positions that are conserved in
evolution, so search algorithms employing motifs may be used to
identify more divergent sequences than methods based on global
sequence similarity
PSI-BLAST (similarity search using PSSM - Position Specific
Scoring Matrix)
HMM of protein family (a very brief introduction)
Motifs: Profiles and Consensus
a G g t a c T t Line up the patterns by their
C c A t a c g t
Alignment a c g t T A g t
start indexes
a c g t C c A t
C c g t a c g G s = (s1, s2, …, st)
_________________
Construct matrix profile with
A 3 0 1 0 3 1 1 0
Profile C 2 4 0 0 1 4 0 0 frequencies of each nucleotide
G 0 1 4 0 0 0 3 1 in columns
T 0 0 0 5 1 0 1 4
eIj(b) = p(b)
vMj-1(i-1) + log(aMj-1,Mj )
vMj(i) = log (eMj(xi)/p(xi)) + max vIj-1(i-1) + log(aIj-1,Mj )
vDj-1(i-1) + log(aDj-1,Mj )
A path through an
edit graph and the
corresponding path
through a profile
HMM
Making a Collection of HMM for Protein
Families
https://github.com/EddyRivasLab/hmmer
Sequence Pattern (Motif) Discovery
Finding patterns in multiple alignments, or in unaligned
sequences
eMotif (a protein pattern database); eBLOCKs
Gibbs and MEME
To infer patterns in unaligned sequences
Gibbs program starts with a fixed pattern length of W and a random set of
locations of the pattern in given input sequences (i.e., the initial pattern is
random); and then one sequence is selected at a time randomly and an
attempt is made to improve its pattern position.
MEME uses many similar concepts, but uses the EM (expectation
maximization) method.
Utilization of Multiple Alignments
Residue conservation
Jalview
Subfamilies
SCI-PHY
FunShift