of multiple datasets
How can we integrate multiple datasets?
[Figure: published data, genetic data, expression data, evolutionary data and proteomics data arranged around a "?"]
• Thomas Bayes's work on what is now called Bayes' theorem was published by Richard Price in 1763
• Mathematics of probabilities
• A hot topic in science in the early 18th century
• Many people at the time were interested in mathematics, statistics and probabilities because of gambling!
Bayes’ theorem
$$P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}$$
• Say:
• 75% of women have long hair
• 15% of men have long hair
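The two figures above can be plugged straight into Bayes' theorem to ask how likely it is that a person with long hair is a woman. The sketch below assumes that men and women are equally likely a priori (P(W) = 0.5); that prior is an assumption made here for illustration, and the variable names are hypothetical.

```python
# Figures from the example above.
p_long_given_woman = 0.75   # P(L | W)
p_long_given_man = 0.15     # P(L | M)

# Assumption (not taken from the example): equal prior for women and men.
p_woman = 0.5               # P(W)
p_man = 1.0 - p_woman       # P(M)

# Total probability of long hair: P(L) = P(L|W)P(W) + P(L|M)P(M).
p_long = p_long_given_woman * p_woman + p_long_given_man * p_man

# Bayes' theorem: P(W | L) = P(L | W) * P(W) / P(L).
p_woman_given_long = p_long_given_woman * p_woman / p_long

print(f"P(L)     = {p_long:.3f}")              # 0.450
print(f"P(W | L) = {p_woman_given_long:.3f}")  # 0.833
```

Under this prior, long hair raises the probability that the person is a woman from 0.5 to about 0.83.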
Bayes’ theorem
• What if your friend told you that this person was also wearing high heels?
• We can use P(W|L) as the new prior!
$$P(W \mid L \,\&\, H) = \frac{P(H \mid W) \times P(W \mid L)}{P(H)}$$
$$\frac{P(W \mid L)}{P(M \mid L)} = \frac{\dfrac{P(L \mid W) \times P(W)}{P(L)}}{\dfrac{P(L \mid M) \times P(M)}{P(L)}}$$
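In the ratio of the two posteriors the shared term P(L) cancels, so the posterior odds are the prior odds multiplied by a likelihood ratio. A minimal sketch of this odds form follows, including the high-heels update. The heel probabilities and the equal prior are hypothetical values chosen for illustration, and the sketch assumes that hair length and heels are independent given the person's sex.

```python
def posterior_odds(prior_odds, likelihood_ratios):
    """Multiply the prior odds by each likelihood ratio in turn."""
    odds = prior_odds
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds


def odds_to_probability(odds):
    """Convert odds of W against M into the probability of W."""
    return odds / (1.0 + odds)


# Assumption: equal prior, so the prior odds P(W) / P(M) are 1.
prior_odds = 0.5 / 0.5

# Likelihood ratio for long hair, from the example: P(L | W) / P(L | M).
lr_long_hair = 0.75 / 0.15

# Hypothetical figures for high heels, P(H | W) and P(H | M).
p_heels_given_woman = 0.20
p_heels_given_man = 0.01
lr_heels = p_heels_given_woman / p_heels_given_man

odds_hair = posterior_odds(prior_odds, [lr_long_hair])
odds_hair_and_heels = posterior_odds(prior_odds, [lr_long_hair, lr_heels])

print(f"Odds W:M after long hair:       {odds_hair:.1f}")
print(f"Odds W:M after hair and heels:  {odds_hair_and_heels:.1f}")
print(f"P(W | L & H) = {odds_to_probability(odds_hair_and_heels):.3f}")
```

Using the first posterior as the new prior and multiplying all likelihood ratios at once give the same result, which is why new observations can be added one at a time.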
• Sperm cells
• Cerebral cavities, bronchia & Fallopian tubes
• Retina: cones and rods
Bayesian integration on SysCilia data
• Tandem Affinity Purifications & SILAC
• Yeast two-hybrid screens
• Ciliary evolutionary co-occurrence
• Gene presence/absence profiles matching ciliary presence/absence
• System co-expression
• Genes with XBOX transcription factor binding sites
Bayesian integration of multiple observations
$$\log(a \times b) = \log(a) + \log(b)$$
$$\frac{P(T \mid \lnot f_i)}{P(F \mid \lnot f_i)} = \frac{P(\text{Cilium})}{P(\lnot \text{Cilium})} \times \frac{P(\lnot f_i \mid T)}{P(\lnot f_i \mid F)}$$
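Because the logarithm turns products into sums, several observations can be combined by adding their log likelihood ratios to the log prior odds. The sketch below illustrates this for one gene. It assumes the datasets are independent given the ciliary status of the gene, and every number in it (the prior and the per-dataset probabilities) is hypothetical, not taken from the SysCilia data.

```python
import math

# Hypothetical prior: assumed fraction of genes that are ciliary, P(Cilium).
prior_cilium = 0.05

# Hypothetical probabilities for one gene's observations, given as
# (P(observation | T), P(observation | F)). "Not detected" rows use the
# probabilities of the feature being absent.
observations = {
    "TAP/SILAC: detected": (0.30, 0.02),
    "Yeast two-hybrid: not detected": (0.90, 0.97),
    "Co-expression: detected": (0.40, 0.10),
}


def log_posterior_odds(prior, likelihoods):
    """Log prior odds plus the sum of the log likelihood ratios."""
    log_odds = math.log(prior / (1.0 - prior))
    for p_given_true, p_given_false in likelihoods:
        log_odds += math.log(p_given_true / p_given_false)
    return log_odds


score = log_posterior_odds(prior_cilium, observations.values())

# Convert the log odds back to a posterior probability of being ciliary.
posterior = 1.0 / (1.0 + math.exp(-score))

print(f"log posterior odds: {score:.3f}")
print(f"posterior P(T | observations): {posterior:.3f}")
```

A missing interaction lowers the score only slightly here, because absence is almost as likely for ciliary as for non-ciliary genes in these made-up figures.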
• When a dataset gives a numerical value rather than a yes/no result, we can divide the values into categories.
For instance:
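$$\frac{P(T \mid f_i \in (0, 0.1))}{P(F \mid f_i \in (0, 0.1))} = \frac{P(\text{Cilium})}{P(\lnot \text{Cilium})} \times \frac{P(f_i \in (0, 0.1) \mid T)}{P(f_i \in (0, 0.1) \mid F)}$$
$$\frac{P(T \mid f_i \in [0.1, 0.2))}{P(F \mid f_i \in [0.1, 0.2))} = \frac{P(\text{Cilium})}{P(\lnot \text{Cilium})} \times \frac{P(f_i \in [0.1, 0.2) \mid T)}{P(f_i \in [0.1, 0.2) \mid F)}$$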
• $P(f_i \mid T) / P(f_i \mid F)$ then simply becomes the ratio for the category that the observed value falls into
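One way to obtain a likelihood ratio per category is to count how many known ciliary (T) and known non-ciliary (F) genes fall into each bin. The sketch below is an illustration under stated assumptions, not the procedure used on the SysCilia data: the bin edges, the toy scores and the pseudocount (added so that an empty bin does not produce a zero or infinite ratio) are all hypothetical.

```python
from bisect import bisect_right


def bin_index(score, edges):
    """Index k of the half-open bin [edges[k], edges[k + 1]) holding score.

    Scores outside the range are clamped to the first or last bin.
    """
    k = bisect_right(edges, score) - 1
    return min(max(k, 0), len(edges) - 2)


def bin_likelihood_ratios(true_scores, false_scores, edges, pseudocount=1.0):
    """Estimate P(bin | T) / P(bin | F) for every bin from training scores."""
    n_bins = len(edges) - 1
    true_counts = [pseudocount] * n_bins
    false_counts = [pseudocount] * n_bins
    for s in true_scores:
        true_counts[bin_index(s, edges)] += 1
    for s in false_scores:
        false_counts[bin_index(s, edges)] += 1
    true_total = sum(true_counts)
    false_total = sum(false_counts)
    return [
        (t / true_total) / (f / false_total)
        for t, f in zip(true_counts, false_counts)
    ]


# Hypothetical bin edges and toy training scores.
edges = [0.0, 0.1, 0.2, 1.0]
known_ciliary = [0.05, 0.02, 0.08, 0.15]
known_non_ciliary = [0.15, 0.5, 0.7, 0.9, 0.3, 0.05]

ratios = bin_likelihood_ratios(known_ciliary, known_non_ciliary, edges)

# Look up the ratio for a new gene with score 0.12, which falls in [0.1, 0.2).
new_gene_ratio = ratios[bin_index(0.12, edges)]
print([round(r, 3) for r in ratios], round(new_gene_ratio, 3))
```

The looked-up ratio is then used like any other likelihood ratio when the datasets are combined.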
[Figure legend: Ciliary, Predicted, Non-ciliary]
The Bayesian integration enriches for more known ciliary genes than any of the individual datasets. We can control the false discovery rate.
ROC curve and performance of individual datasets
AUC: 0.86
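The area under the ROC curve (AUC) can be read as the probability that a randomly chosen known ciliary gene scores higher than a randomly chosen non-ciliary gene. A minimal sketch of that pairwise calculation is given below. The scores are toy values chosen for illustration and do not reproduce the AUC reported above.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical integrated scores (for example, log posterior odds).
ciliary_scores = [3.2, 1.5, 0.4]
non_ciliary_scores = [0.9, -0.3, -1.1, 0.1]

auc = roc_auc(ciliary_scores, non_ciliary_scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two gene sets.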
Application of the Bayesian integration
• Predicting causative genes in ciliopathy disease loci or exome data
• Predict which genes are likely involved in ciliary function, and which are not
• Example BBS5 locus (182 genes):
Conclusion
• Bayesian integration is a powerful way to predict novel ciliary genes by
objective evaluation and integration of experimental datasets
• New datasets can easily be incorporated