
Gene Co-expression Networks
Gene co-expression networks

Gene Expression Data
• Identify co-expressed genes
• Classify new datasets
• Discover regulatory networks

Courtesy of Macmillan Publishers Limited. Used with permission.
Source: Baker, Monya. "Gene Data to Hit Milestone." Nature 487, no. 7407 (2012): 282-3.
Gene co-expression network

Schaefer et al., 2017
Outline
• Gene expression
– Distance metrics
– Clustering
– Modules
• Bayesian networks
• Regression
• Mutual Information
• Evaluation on real and simulated data
Distance Metrics

[Figure: expression (y-axis) vs. time (x-axis) for several genes]
Expression data as multidimensional vectors

Each gene's expression profile is a vector with one entry per experiment:
XA = (1, 0.5, -1, 0.25, …)
XB = (0.2, 0.4, -1.2, 0.05, …)

[Figure: genes A and B plotted as points, with coordinates XA,1, XB,1 (expression in experiment 1) and XA,2, XB,2 (expression in experiment 2)]

What is a natural way to compare these vectors?
Euclidean distance
• XA,k = expression of gene A in condition k
• d(A, B) = sqrt( Σk (XA,k − XB,k)² )

[Figure: genes A and B as points in the plane of expression in experiment 1 vs. expression in experiment 2]
Distance
• Metrics for distance have a formal definition:
– d(x, y) ≥ 0
– d(x, y) = 0 if and only if x = y
– d(x, y) = d(y, x)
– Triangle inequality: d(x, z) ≤ d(x, y) + d(y, z)
• The triangle inequality need not hold for a measure of “similarity.”
• Distance ~ dissimilarity = 1 − similarity
Distance Metrics
Can we capture the similarity of these patterns?

[Figure: expression vs. time for genes with similar patterns]
Pearson Correlation
• Xi,j = expression of gene i in condition j
• ZA = z-score of gene A across all experiments:

ZA,j = (XA,j − μA) / σA

where μA and σA are the mean and standard deviation of gene A's expression.
• The z-score is the normalized expression of gene A.
• ZA is a vector over all experiments.
Pearson Correlation
• Xi,j = expression of gene i in condition j
• Zi,j = z-score of gene i in one experiment j
• Pearson correlation over all N experiments:

rA,B = (1/N) Σj ZA,j ZB,j

– ranges from +1 (perfect correlation) to −1 (anti-correlated)
• Distance = 1 − rA,B
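To make the two metrics concrete, here is a minimal Python sketch (the gene profiles are hypothetical values) computing the Euclidean distance and the 1 − r correlation distance for a pair of expression vectors:

```python
import numpy as np

def euclidean_distance(x, y):
    """Euclidean distance between two expression vectors."""
    return np.sqrt(np.sum((x - y) ** 2))

def pearson_distance(x, y):
    """Correlation distance 1 - r between two expression vectors."""
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    r = np.mean(zx * zy)          # Pearson r as the mean product of z-scores
    return 1.0 - r

# Toy expression profiles over six time points (hypothetical values)
xa = np.array([1.0, 2.0, 3.0, 2.5, 2.0, 1.5])
xc = xa + 1.0                      # same shape, shifted baseline
print(euclidean_distance(xa, xc))  # large: the vectors are far apart
print(pearson_distance(xa, xc))    # ~0: the patterns are perfectly correlated
```

Note the contrast: the shifted gene is distant in the Euclidean sense but identical after z-scoring, which is exactly the situation in the next two slides.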
[Figure: raw expression of genes A and C over time (left) and their z-scores ZA, ZC (right); after z-scoring, the two profiles overlap]
[Figure: expression of genes A, B, C over time (left) and their z-scores ZA, ZB, ZC (right)]

RA,B = −0.01
RA,C = 0.999
RB,C = −0.03
[Figure: expression of genes A, B, D over time (left) and their z-scores ZA, ZB, ZD (right); gene D mirrors gene A]

RA,B = −0.01
RA,D = −1.0
RB,D = 0.007
Distance Metrics

[Figure: expression vs. time; the 1 − r correlation distance captures the similarity of the patterns]
Missing Data
• What if a particular data point is missing? (Back in the old days: there was a bubble or a hair on the array.)
– ignore that gene in all samples
– ignore that sample for all genes
– replace the missing value with a constant
– “impute” a value
• example: find the K most similar genes (or arrays) using the available data; set the missing value to the mean of those K genes’ (arrays’) values at that position
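A minimal sketch of the KNN-style imputation described above, assuming a genes × samples matrix with NaN marking missing values (real imputation tools add refinements such as weighting neighbors by distance):

```python
import numpy as np

def knn_impute(X, K=5):
    """Fill missing values (NaN) in a genes-x-samples matrix by averaging
    the K most similar genes, measured on the shared observed samples."""
    X = X.copy()
    for g in range(X.shape[0]):
        missing = np.isnan(X[g])
        if not missing.any():
            continue
        obs = ~missing
        # Euclidean distance to every other gene over the observed columns
        dists = np.full(X.shape[0], np.inf)
        for h in range(X.shape[0]):
            if h == g:
                continue
            shared = obs & ~np.isnan(X[h])
            if shared.any():
                dists[h] = np.sqrt(np.mean((X[g, shared] - X[h, shared]) ** 2))
        neighbors = np.argsort(dists)[:K]
        for j in np.where(missing)[0]:
            vals = X[neighbors, j]
            vals = vals[~np.isnan(vals)]
            if vals.size:
                X[g, j] = vals.mean()   # mean of the K most similar genes
    return X
```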
Clustering
• Multiple mechanisms could lead to up-regulation in any one condition
• Goal: find genes that have “similar” expression over many conditions
• How do you define “similar”?
Clustering
• Intuitive idea that we want to find an underlying grouping
• In practice, this can be hard to define and implement
• An example of unsupervised learning
Clustering 8,600 human genes based on the time course of expression following serum stimulation of fibroblasts

Key: Black = little change; Green = down; Red = up (relative to the initial time point)

[Figure: clustered heat map; rows are genes, columns are time points]

Clusters of genes enriched for:
(A) cholesterol biosynthesis
(B) the cell cycle
(C) the immediate-early response
(D) signaling and angiogenesis
(E) wound healing and tissue remodeling

© American Association for the Advancement of Science. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: Iyer, Vishwanath R., Michael B. Eisen, et al. "The Transcriptional Program in the Response of Human Fibroblasts to Serum." Science 283, no. 5398 (1999): 83-7.
Why cluster?
• Cluster genes (rows)
– Measure expression at multiple time-points, different conditions, etc.
– Similar expression patterns may suggest similar functions of genes
• Cluster samples (columns)
– e.g., expression levels of thousands of genes for each tumor sample
– Similar expression patterns may suggest biological relationships among samples
Hierarchical clustering
Two types of approaches:
• Agglomerative (bottom-up: merge)
• Divisive (top-down: split)

© source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Agglomerative Clustering Algorithm
• Step 1: Initialization: each data point is in its own cluster.
• Step 2: Merge the two most similar clusters.
• Step 3: Repeat step 2 until there is only one cluster.
Agglomerative Clustering Algorithm
• Step 1: Initialization: each data point is in its own cluster.
• Step 2: Merge the two most similar clusters.
• Step 3: Repeat step 2 until there is only one cluster.

If distance is defined for a vector, how do I compare clusters?
Linkage: comparing clusters Y and Z (with point A in Y and point B in Z)
• Single linkage: d(Y, Z) = min over all pairs A ∈ Y, B ∈ Z of d(A, B)
• Complete linkage: d(Y, Z) = max over all pairs A ∈ Y, B ∈ Z of d(A, B)
• UPGMC (Unweighted Pair Group Method using Centroids)
– Define the distance as the distance between the cluster centroids
• UPGMA (Unweighted Pair Group Method with Arithmetic Mean): the average of the pairwise distances:
d(Y, Z) = (1 / (|Y| |Z|)) Σ(A∈Y) Σ(B∈Z) d(A, B)
[Figure: single linkage (minimum pairwise distance) vs. complete linkage (maximum pairwise distance) illustrated on two clusters]
• If clusters exist and are compact, the choice of linkage should not matter.
• Single linkage will “chain” together groups connected by one intermediate point.
• Complete linkage will not combine two groups if even one pair of points is distant.
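The agglomerative procedure and the linkage choices above map directly onto SciPy's hierarchical clustering routines; a small sketch with hypothetical data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy expression matrix: 6 genes x 4 conditions (hypothetical values)
X = np.array([
    [1.0, 2.0, 3.0, 2.0],
    [1.1, 2.1, 3.1, 2.1],   # nearly identical to gene 0
    [5.0, 4.0, 3.0, 4.0],
    [5.2, 4.1, 2.9, 4.2],   # nearly identical to gene 2
    [0.0, 0.1, 0.0, 0.1],
    [0.1, 0.0, 0.1, 0.0],   # nearly identical to gene 4
])

# 'single', 'complete', and 'average' (UPGMA) implement the linkages above;
# 'centroid' corresponds to UPGMC
tree = linkage(X, method="average", metric="euclidean")

# 'Slice' the dendrogram at a distance cutoff to get discrete clusters
labels = fcluster(tree, t=1.0, criterion="distance")
print(labels)   # genes that co-vary end up with the same label
```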
Interpreting the Dendrogram
• This produces a binary tree or dendrogram
• The final cluster is the root and each data item is a leaf
• The heights of the bars indicate how close the items are
• Can ‘slice’ the tree at any distance cutoff to produce discrete clusters
• The dendrogram represents the results of the clustering; its usefulness in representing the data is mixed
• The results will always be hierarchical, even if the data are not

[Figure: dendrogram with distance on the vertical axis and data items (genes, etc.) along the horizontal axis]
K-means clustering
• Advantage: gives sharp partitions of the data
• Disadvantage: need to specify the number of clusters (K)
• Goal: find a set of K clusters that minimizes the distance of each point in a cluster to that cluster's mean:

Σ(i=1..K) Σ(x ∈ cluster i) ‖x − μi‖²
K-means clustering algorithm
• Initialize: choose K points as cluster means
• Repeat until convergence:
– Assignment: place each point Xi in the cluster with the closest mean.
– Update: recalculate the mean for each cluster.
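A minimal NumPy sketch of the two alternating steps (illustrative only; library implementations such as scikit-learn's KMeans add convergence checks and multiple restarts):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    # Initialize: choose k data points as the starting means
    means = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment: index of the closest mean for every point
        d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update: recompute each cluster mean (keep old mean if a cluster empties)
        for i in range(k):
            if (labels == i).any():
                means[i] = X[labels == i].mean(axis=0)
    return labels, means
```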
[Figures: successive k-means iterations; points change cluster assignment and the means move until convergence]
What if you choose the wrong K?
[Figures: k-means run with too many clusters splits natural groups]
Big steps in the objective function occur when we are dividing the data into natural clusters.

Smaller steps occur when we are overclustering.
What if we choose pathologically bad initial positions?
[Figures: successive k-means iterations starting from pathologically bad initial means]
What if we choose pathologically bad initial positions?
Often, the algorithm gets a reasonable answer, but not always!
Convergence
• K-means always converges.
• The assignment and update steps each either reduce the objective function or leave it unchanged.
Convergence
• However, it doesn’t always find the same solution.

[Figure: two different K=2 partitions of the same data]
Fuzzy K-means

[Figure: a K=2 fuzzy clustering; points between the clusters get intermediate memberships]
K-means:
• Initialize: choose k points as cluster means
• Repeat until convergence:
– Assignment: place each point Xi in the cluster with the closest mean.
– Update: recalculate the mean for each cluster.

Fuzzy k-means:
• Initialize: choose k points as cluster means
• Repeat until convergence:
– Assignment: calculate the probability of each point belonging to each cluster.
– Update: recalculate the mean for each cluster using these probabilities.
Fuzzy k-means

ui,j = membership of point j in cluster i (memberships are non-negative and sum to 1 over clusters)

A standard form (fuzzy c-means) weights points by inverse distance to each cluster mean:

ui,j = (1/‖xj − μi‖²)^(1/(r−1)) / Σk (1/‖xj − μk‖²)^(1/(r−1))

μi = Σj (ui,j)^r xj / Σj (ui,j)^r

Larger values of r make the clusters more fuzzy.

Relationship to EM and Gaussian mixture models.
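A sketch of the two fuzzy steps, using the standard fuzzy c-means membership form given above (the function names are mine):

```python
import numpy as np

def fuzzy_memberships(X, means, r=2.0):
    """One fuzzy assignment step: soft membership of every point in every
    cluster, weighted by inverse squared distance (fuzzifier r)."""
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    w = (1.0 / np.maximum(d2, 1e-12)) ** (1.0 / (r - 1.0))
    return w / w.sum(axis=1, keepdims=True)   # rows sum to 1

def fuzzy_update(X, U, r=2.0):
    """One update step: membership-weighted cluster means."""
    W = U ** r
    return (W.T @ X) / W.sum(axis=0)[:, None]
```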


Limits of k-means
K-means uses Euclidean distance
• Gives most weight to the largest cluster
• Can’t be used if data are qualitative
• The centroid usually does not represent any datum
K-means:
• The best clustering minimizes the within-cluster Euclidean distance of points from the centroids.

K-medoids:
• The best clustering minimizes the within-cluster dissimilarity of points from the medoids (exemplars).

Medoid: the data point in each cluster that minimizes the sum of distances of all points in that cluster from itself.
K-medoids clustering
• Initialize: choose k points as cluster medoids
• Repeat until convergence:
– Assignment: place each point Xi in the cluster with the closest medoid.
– Update: recalculate the medoid for each cluster.
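The only change from k-means is the update step; a sketch of the medoid computation (Euclidean distance is used here for illustration, but any dissimilarity works):

```python
import numpy as np

def update_medoid(cluster_points):
    """Pick the point minimizing the summed distance to all other points
    in the cluster -- the medoid update step."""
    d = np.linalg.norm(
        cluster_points[:, None, :] - cluster_points[None, :, :], axis=2)
    return cluster_points[d.sum(axis=1).argmin()]
```

Because the medoid is always an actual data point, this also addresses the "centroid usually does not represent any datum" limitation above.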
So What?
• Clusters could reveal underlying biological processes not evident from the complete list of differentially expressed genes
• Clusters could be co-regulated. How could we find upstream factors?
Reconstructing Regulatory Networks

Courtesy of Elsevier B.V. Used with permission.
Source: Sumazin, Pavel, Xuerui Yang, et al. "An Extensive MicroRNA-mediated Network of RNA-RNA Interactions Regulates Established Oncogenic Pathways in Glioblastoma." Cell 147, no. 2 (2011): 370-81.
Clustering vs. “modules”
• Clusters are purely phenomenological – no claim of causality
• The term “module” is used to imply a more mechanistic connection

[Figure: transcription factor A and transcription factor B as alternative upstream causes of a set of genes with correlated expression]
Courtesy of Macmillan Publishers Limited. Used with permission.
Source: Marbach, Daniel, James C. Costello, et al. "Wisdom of Crowds for Robust Gene Network Inference." Nature Methods 9, no. 8 (2012): 796-804. doi:10.1038/nmeth.2016
Outline
• Gene expression
– Distance metrics
– Clustering
– Modules
• Bayesian networks
• Regression
• Mutual Information
• Evaluation on real and simulated data
Bayesian Networks
• Predict unknown variables from observations
• A “natural” way to think about biological networks
Bayesian Networks
• Bayesian networks are a tool for reasoning with probabilities
• They consist of a graph (network) and a set of probabilities
• Both can be “learned” from the data

Bayes’ theorem
• Bayes’ theorem (Bayes’ law or Bayes’ rule) describes the probability of an event based on prior knowledge of conditions that might be related to the event.
Bayes’ theorem
• It can be formulated mathematically as:

P(A|B) = P(B|A) P(A) / P(B)

• or

P(A,B) = P(A|B) P(B) = P(B|A) P(A)

• A and B are observed events
• P(B) is the probability of observing B on its own
• P(A,B) is the joint probability of A and B, the likelihood of events A and B occurring together
• P(A|B) is a conditional probability, the likelihood of event A occurring given that B is observed
Bayes’ theorem
• It can be formulated mathematically as:

P(A|B) = P(B|A) P(A) / P(B)

• or

P(A,B) = P(A|B) P(B) = P(B|A) P(A)

• Another equivalent formulation expands the denominator using the law of total probability:

P(A|B) = P(B|A) P(A) / [ P(B|A) P(A) + P(B|¬A) P(¬A) ]
Bayesian Network (BN)
• A class of graphical models that consist of two parts, <G, P> or <G, ϴ>
• G is a directed acyclic graph (DAG) made up of nodes corresponding to random variables X and links (directed edges) representing conditional dependencies between them

An example of a DAG.
Bayesian Network (BN)
• A class of graphical models that consist of two parts, <G, P> or <G, ϴ>
• P is a set of conditional probability distributions (CPDs), one for each node conditional on its parents.
• It follows the Markov property of BNs: each node is conditionally independent of its ancestors given its parents.

An example of a DAG.
Bayesian Network (BN)
• A class of graphical models that consist of two parts, <G, P> or <G, ϴ>
• P is a set of conditional probability distributions (CPDs), one for each node conditional on its parents.
• CPD for node A: P(A|B,C)
• CPD for node B: P(B|F)
• CPD for node C: P(C)
• What are the CPDs for nodes D, E, F and G?

An example of a DAG.
Bayesian Network (BN)
• A class of graphical models that consist of two parts, <G, P> or <G, ϴ>
• P is a set of conditional probability distributions (CPDs), one for each node conditional on its parents.
• Given the Markov property, the joint probability distribution of the random variables in G decomposes into a product of conditional probability distributions across all nodes:

P(A,B,C,D,E,F,G,H) = P(A|B,C) P(B|F) P(C) P(D|A,G) P(E|A) P(F) P(G) P(H|E)

• ϴ is the set of parameters in P.

An example of a DAG.
Bayesian Network (BN)
• A class of graphical models that consist of two parts, <G, P> or <G, ϴ>
• ϴ is the set of parameters in P
• The number of parameters in a Bayesian network is the number of free conditional probability entries in the network (as shown below).

[Figure: conditional probability tables for each node.] The conditional probabilities for nodes C and F have only one free value each, because P(C=No) or P(F=No) is simply one minus that value.
Bayesian Network (BN)
• A class of graphical models that consist of two parts, <G, P> or <G, ϴ>
• ϴ is the set of parameters in P
• The number of parameters equals

Σ(i=1..n) (ri − 1) qi

• n: number of nodes
• ri is the number of levels (possible values) of node i
• qi is the number of value combinations of node i’s parents

An example of a DAG.
Bayesian Network (BN)
• Number of parameters: Σ(i=1..n) (ri − 1) qi, with ri the number of levels of node i and qi the number of value combinations of node i’s parents.

P(A,B,C,D,E,F,G,H) = P(A|B,C) P(B|F) P(C) P(D|A,G) P(E|A) P(F) P(G) P(H|E)

Suppose all nodes have binary values, that is, their levels are all 2.

Then the number of parameters in P(A|B,C) is:
(rA − 1) qA = (2−1) × (2×2) = 4
and the number of parameters in P(B|F) is:
(rB − 1) qB = (2−1) × 2 = 2
Bayesian Network (BN)
• Number of parameters: Σ(i=1..n) (ri − 1) qi

P(A,B,C,D,E,F,G,H) = P(A|B,C) P(B|F) P(C) P(D|A,G) P(E|A) P(F) P(G) P(H|E)

Suppose all nodes have binary values, that is, their levels are all 2.

Number of parameters for P(A,B,C,D,E,F,G,H) = 4 + 2 + 1 + 4 + 2 + 1 + 1 + 2 = 17

How many parameters if there were no independencies between nodes?
If all nodes are binary, it would be 2ⁿ − 1 (255 for these eight nodes).
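A small sketch that reproduces the parameter count for this example (the parent sets follow the joint factorization above):

```python
# Parent sets from the factorization
# P(A|B,C) P(B|F) P(C) P(D|A,G) P(E|A) P(F) P(G) P(H|E)
parents = {
    "A": ["B", "C"], "B": ["F"], "C": [], "D": ["A", "G"],
    "E": ["A"], "F": [], "G": [], "H": ["E"],
}
levels = {node: 2 for node in parents}   # all nodes binary

def n_parameters(parents, levels):
    """Sum of (r_i - 1) * q_i over all nodes, where q_i is the product
    of the parents' level counts (1 for a node with no parents)."""
    total = 0
    for node, pa in parents.items():
        q = 1
        for p in pa:
            q *= levels[p]
        total += (levels[node] - 1) * q
    return total

print(n_parameters(parents, levels))   # 17, vs. 2**8 - 1 = 255 with no structure
```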
Quick exercise:
What is the joint probability for the network?
How many parameters does it have?
What is the probability of observing (S=1, C=1, B=1, X=1, D=1)?
Parameter Learning
• Parameter learning determines the probability distribution of each node in a BN.
• Once the structure of the network has been learnt from the data, estimating the parameters of the global distribution is greatly simplified by the application of the Markov property.
Parameter Learning
• One can also classify the value of a target variable using the posterior probability distribution and an observation of the other variables.
• The predicted class is the one that has the largest posterior probability.
• Posterior probability: given the observation of the other variables, what is the probability of the target variable?
Parameter Learning
• Example: for an observation of the input variables (X1=x1, X2=x2, …, Xn−1=xn−1),
• the predicted class of target T is the c satisfying

argmax_c P(T=c | X1=x1, X2=x2, …, Xn−1=xn−1)
Parameter Learning
• Example: a BN with all nodes having values YES or NO

[Figure: the eight-node DAG with its conditional probability tables.]

Suppose we observe an event (B=Yes, C=Yes, D=Yes, E=No, F=Yes, G=No, H=Yes).
What is P(A=Yes)?
Parameter Learning
• Example: a BN with all nodes having values YES or NO. The posterior is proportional to the product of the conditional probabilities for each node:

P(A=Y | B=Y,C=Y,D=Y,E=N,F=Y,G=N,H=Y)
∝ P(A=Y, B=Y,C=Y,D=Y,E=N,F=Y,G=N,H=Y)
= P(A=Y|B=Y,C=Y) P(B=Y|F=Y) P(C=Y) P(D=Y|A=Y,G=N) P(E=N|A=Y) P(F=Y) P(G=N) P(H=Y|E=N)
= 0.5 × 0.43 × 0.59 × 0.8 × (1−0.67) × 0.68 × (1−0.59) × 0.23
= 0.002

P(A=N | B=Y,C=Y,D=Y,E=N,F=Y,G=N,H=Y)
∝ P(A=N, B=Y,C=Y,D=Y,E=N,F=Y,G=N,H=Y)
= P(A=N|B=Y,C=Y) P(B=Y|F=Y) P(C=Y) P(D=Y|A=N,G=N) P(E=N|A=N) P(F=Y) P(G=N) P(H=Y|E=N)
= (1−0.5) × 0.43 × 0.59 × 0.5 × (1−0.1) × 0.68 × (1−0.59) × 0.23
= 0.004

Normalizing the two products gives P(A=Yes | evidence); here the evidence favors A=No.
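A sketch that reproduces this calculation and adds the final normalization (the numbers are the CPT entries used on the slide):

```python
# Factors that do not depend on A are common to both products:
# P(B=Y|F=Y) P(C=Y) P(F=Y) P(G=N) P(H=Y|E=N)
common = 0.43 * 0.59 * 0.68 * (1 - 0.59) * 0.23

# P(A=Y|B=Y,C=Y) P(D=Y|A=Y,G=N) P(E=N|A=Y)
p_yes = 0.5 * 0.8 * (1 - 0.67) * common
# P(A=N|B=Y,C=Y) P(D=Y|A=N,G=N) P(E=N|A=N)
p_no = (1 - 0.5) * 0.5 * (1 - 0.1) * common

print(p_yes, p_no)              # ~0.002 and ~0.004
print(p_yes / (p_yes + p_no))   # posterior P(A=Yes | evidence) ≈ 0.37
```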
Parameter Learning
• Example: a BN with all nodes having values YES or NO

What is P(A=Yes | F=Yes)?
A depends only on B and C, and B depends on F, so the rest of the network drops out.

Ancestor (sub)graph: a subgraph of the Bayes net containing only the variables of interest and their ancestors.
Ancestor graph shortcut: any variable that is not in the ancestor graph for the set of variables of interest can be removed from the summation.
Parameter Learning
• Example: a BN with all nodes having values YES or NO

P(A=Yes | F=Yes) = Σ(B,C) P(A=Yes | B, C) P(C) P(B | F=Yes)

The sum runs over the four value combinations of B and C:

B    C
Yes  Yes
Yes  No
No   Yes
No   No
Is the p53 pathway activated?

Courtesy of Looso et al. License: CC-BY.
Source: Looso, Mario, Jens Preussner, et al. "A De Novo Assembly of the Newt Transcriptome Combined with Proteomic Validation Identifies New Protein Families Expressed During Tissue Regeneration." Genome Biology 14, no. 2 (2013): R16.
Is the p53 pathway activated?
Possible evidence:
• Known p53 targets are up-regulated
– Could another pathway also cause this?
• Genes for members of the signaling pathway are expressed (ATM, ATR, CHK1, …)
– Might be true under many conditions where the pathway has not yet been activated
• Genes for members of the signaling pathway are differentially expressed
– Still does not prove a change in activity

Courtesy of Looso et al. License: CC-BY.
Source: Looso, Mario, Jens Preussner, et al. "A De Novo Assembly of the Newt Transcriptome Combined with Proteomic Validation Identifies New Protein Families Expressed During Tissue Regeneration." Genome Biology 14, no. 2 (2013): R16.
Is the p53 pathway activated?
• Formulate the problem probabilistically
• Compute
– P(p53 pathway activated | data)
• How?
– Relatively easy to compute P(X up | TF up)
– How?

[Figure: a TF node with target nodes X1, X2, X3]
Is the p53 pathway activated?
• Formulate the problem probabilistically
• Compute
– P(p53 pathway activated | data)
• How?
– Relatively easy to compute P(X up | TF up)
– Look over lots of experiments and tabulate:
• X1 up & TF up
• X1 up & TF not up
• X1 not up & TF not up
• X1 not up & TF up

[Figure: a TF node with target nodes X1, X2, X3]
Is the p53 pathway activated?
• Formulate the problem probabilistically
• Compute
– P(p53 pathway activated | data)
• How?
– Relatively easy to compute P(X up | TF up)
– P(TF up | X up) = P(X up | TF up) P(TF up) / P(X up)

[Figure: a TF node with target nodes X1, X2, X3]
Is the p53 pathway activated?
• Formulate the problem probabilistically
• Compute
– P(p53 pathway activated | data)
• How?
– Even with P(TF up | X up), how do we compare this explanation of the data to other possible explanations?
– Can we include upstream data?

[Figure: a TF node with target nodes X1, X2, X3]
Application to Gene Networks
• Which pathway activated this set of genes?
• Either A or B or both would produce similar, but not identical, results.
• Bayes nets estimate conditional probability tables from lots of gene expression data.
• How often is TF B2 expressed when TF B1 is expressed, etc.?

[Figure: two candidate pathways, TF A1 → TF A2 and TF B1 → TF B2, upstream of the same target genes]
Learning Models from Data
• Searching for the BN structure: NP-complete
– Too many possible structures to evaluate all of them, even for very small networks.
– Many algorithms have been proposed.
– Incorporating prior knowledge can reduce the search space:
• Which nodes should regulate transcription?
• Which should cause changes in phosphorylation?
– Intervention experiments help
© American Association for the Advancement of Science. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: Sachs, Karen, Omar Perez, et al. "Causal Protein-signaling Networks Derived From Multiparameter Single-cell Data." Science 308, no. 5721 (2005): 523-9.

• Without interventions, all we can say is that X and Y are correlated.
• Interventions allow us to determine which is the parent.
Fig. 1. Bayesian network modeling with single-cell data

If we don’t measure “Y”, can we still model the data?
The relationship of X to Z and W will be noisy and might be missed.

© American Association for the Advancement of Science. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: Sachs, Karen, Omar Perez, et al. "Causal Protein-signaling Networks Derived From Multiparameter Single-cell Data." Science 308, no. 5721 (2005): 523-9.

Outline
• Gene expression
– Distance metrics
– Clustering
– Modules
• Bayesian networks
• Regression
• Mutual Information
• Evaluation on real and simulated data
Regression-based models

© source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.

Predicted expression: Yg = fg(X_Tg) + ε
Assume that the expression of gene g is some function of the expression of its transcription factors X_Tg = {Xt : t ∈ Tg}
Xi = measured expression of the i-th gene
X_Ti = measured expression of the set of TFs potentially regulating gene i
fg is an arbitrary function
ε is noise
Regression-based models

© source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.

fg is frequently assumed to be a linear function:

fg(X_Tg) = Σ(t ∈ Tg) βt,g Xt

The values of the βt,g reflect the influence of each TF on gene g.
How do we discover the values of the βt,g?

BMC Syst Biol. 2012 Nov 22; 6:145. doi:10.1186/1752-0509-6-145.
TIGRESS: Trustful Inference of Gene REgulation using Stability Selection.
Regression-based models

Yg = Σ(t ∈ Tg) βt,g Xt + ε

Define an objective function: sum over M training data sets and N genes, and find the parameters that minimize the “residual sum of squares” between the observed (X) and predicted (Y) expression levels:

RSS = Σ(j=1..M) Σ(i=1..N) (Xi,j − Yi,j)²
Regression-based models

Yg = Σ(t ∈ Tg) βt,g Xt + ε        RSS = Σ(j=1..M) Σ(i=1..N) (Xi,j − Yi,j)²

Problems:
Standard regression will produce many very small values of β, which makes interpretation difficult.
β values can be unstable to changes in the training data.

Solutions:
Subset selection and coefficient shrinkage
• see Section 3.4 of Hastie, Tibshirani and Friedman, “The Elements of Statistical Learning,” for general approaches, and “TIGRESS: Trustful Inference of Gene REgulation using Stability Selection” (doi:10.1186/1752-0509-6-145) for a successful DREAM challenge entry.
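TIGRESS itself combines LARS regression with stability selection; a simpler shrinkage method in the same spirit is the lasso, sketched here on simulated data (all values hypothetical):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_samples, n_tfs = 200, 50

# Hypothetical data: expression of 50 candidate TFs across 200 samples,
# with the target gene driven by TFs 3 and 17 plus noise
X_tf = rng.normal(size=(n_samples, n_tfs))
y = 2.0 * X_tf[:, 3] - 1.5 * X_tf[:, 17] + 0.1 * rng.normal(size=n_samples)

# The L1 penalty shrinks most coefficients exactly to zero,
# leaving a short, interpretable list of candidate regulators
model = Lasso(alpha=0.1).fit(X_tf, y)
candidates = np.nonzero(model.coef_)[0]
print(candidates)   # ideally recovers TFs 3 and 17
```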
Outline
• Gene expression
– Distance metrics
– Clustering
– Modules
• Bayesian networks
• Regression
• Mutual Information
• Evaluation on real and simulated data
Quick Review of Information Theory

Information content of an event E: I(E) = log2( 1 / P(E) )

Rare letters have higher information content.

© Califon Productions, Inc. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Quick Review of Information Theory

Information content of an event E: I(E) = log2( 1 / P(E) )

Entropy is the expected information content, evaluated over all possible outcomes:

H(S) = Σi pi I(si) = Σi pi log2( 1 / pi )
Mutual Information
• Does knowing variable X reduce the uncertainty in variable Y?
• Example:
– P(Rain) depends on P(Clouds)
– P(target expressed) depends on P(TF expressed)

I(x, y) = H(x) + H(y) − H(x, y)

• I(x,y) = 0 means the variables are independent
• Reveals non-linear relationships that are missed by correlation
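A histogram-based sketch of this quantity; the example mirrors the non-linear case on the next slide, where correlation is near zero but mutual information is not (bin count matters, and real pipelines use kernel or k-NN estimators to reduce bias):

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Histogram estimate of I(x, y) = H(x) + H(y) - H(x, y) in bits."""
    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()                         # joint distribution
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)  # marginals
    return entropy(px) + entropy(py) - entropy(pxy.ravel())

# A non-linear relationship: zero correlation, high mutual information
a = np.random.default_rng(0).uniform(-1, 1, 5000)
b = a ** 2
print(np.corrcoef(a, b)[0, 1])    # near 0
print(mutual_information(a, b))   # clearly above 0
```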
Mutual information detects non-linear relationships

Mutual information = 1.7343
Correlation coefficient = −0.0464

No correlation, but knowing A reduces the uncertainty in the distribution of B.

© source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
ARACNe
ARACNe
• Find TF-target relationships using mutual information
• How do you recognize a significant value of MI?
– randomly shuffle the expression data
– compute the distribution of mutual information in the shuffled data
ARACNe
• Data processing inequality (DPI)
– Eliminate indirect interactions
– If G2 regulates G1 and G3, then I(G1,G3) > 0, but the G1–G3 edge adds no insight
– Remove the edge with the smallest mutual information in each triple
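A sketch of DPI pruning on a toy three-gene network (the dictionary layout and function name are mine, not ARACNe's API; the real tool also applies a tolerance so borderline edges survive):

```python
from itertools import combinations

def dpi_prune(mi):
    """In every fully connected triple of genes, mark the edge with the
    smallest mutual information for removal (presumed indirect).
    mi is a dict {(g1, g2): MI} with each pair stored in sorted order."""
    def get(a, b):
        return mi.get(tuple(sorted((a, b))))
    genes = sorted({g for pair in mi for g in pair})
    to_remove = set()
    for a, b, c in combinations(genes, 3):
        triple = [(a, b), (a, c), (b, c)]
        vals = [get(*e) for e in triple]
        if all(v is not None for v in vals):          # fully connected triple
            to_remove.add(triple[vals.index(min(vals))])
    return {e: v for e, v in mi.items() if e not in to_remove}

# Toy example: G2 regulates G1 and G3; the G1-G3 edge is indirect
mi = {("G1", "G2"): 0.8, ("G2", "G3"): 0.7, ("G1", "G3"): 0.3}
print(dpi_prune(mi))   # the weakest edge (G1, G3) is removed
```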
Limitations
• Need huge expression datasets
• Can’t find:
– modulators that do not change in expression
– modulators that are highly correlated with their targets
– modulators that both activate and repress
Huge networks!
This is just the nearest neighbors of one node of interest from ARACNe!

Nature Medicine 18, 436–440 (2012) doi:10.1038/nm.2610

Courtesy of Macmillan Publishers Limited. Used with permission.
Source: Della Gatta, Giusy, Teresa Palomero, et al. "Reverse Engineering of TLX Oncogenic Transcriptional Networks Identifies RUNX1 as Tumor Suppressor in T-ALL." Nature Medicine 18, no. 3 (2012): 436-40.
Outline
• Gene expression
– Distance metrics
– Clustering
– Modules
• Bayesian networks
• Regression
• Mutual Information
• Evaluation on real and simulated data
AUPR = area under the precision-recall curve

Courtesy of Macmillan Publishers Limited. Used with permission.
Source: Marbach, Daniel, James C. Costello, et al. "Wisdom of Crowds for Robust Gene Network Inference." Nature Methods 9, no. 8 (2012): 796-804. doi:10.1038/nmeth.2016
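A sketch of how AUPR is computed from edge confidence scores against a gold-standard network, using scikit-learn (the scores and labels are hypothetical):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

# Each candidate edge gets a confidence score from the inference method;
# true_edges marks which edges exist in the gold-standard network.
true_edges = np.array([1, 1, 0, 1, 0, 0, 0, 1, 0, 0])
scores     = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.3, 0.2, 0.1])

precision, recall, _ = precision_recall_curve(true_edges, scores)
print(auc(recall, precision))   # area under the precision-recall curve
```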
Thoughts on Gene Expression Data
• Useful for classification and clustering
• Not sufficient for reconstructing regulatory networks
• Can we infer levels of proteins from gene expression?
mRNA levels do not predict protein levels

[Figure: scatter plot of protein expression levels (molecules/cell, log-scale base 10) vs. mRNA expression levels (arbitrary units, log-scale base 10), showing only weak correlation]

© Royal Society of Chemistry. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: de Sousa Abreu, Raquel, Luiz O. Penalva, et al. "Global Signatures of Protein and mRNA Expression Levels." Molecular Biosystems 5, no. 12 (2009): 1512-26. doi:10.1039/b908315d

Source: Ning, Kang, Damian Fermin, et al. "Comparative Analysis of Different Label-free Mass Spectrometry Based Protein Abundance Estimates and Their Correlation with RNA-Seq Gene Expression Data." Journal of Proteome Research 11, no. 4 (2012): 2261-71.
