Gene Expression Networks
Gene co-expression networks
Gene Expression Data
• Identify co-expressed genes
• Classify new datasets
• Discover regulatory networks
Distance Metrics
[Figure: expression vs. time profiles for several genes]
Expression data as multidimensional vectors
[Figure: each gene's expression profile plotted as a vector, one axis per experiment]
Euclidean Distance
• XA,k = expression of gene A in condition k
• d(A,B) = sqrt( Σk (XA,k − XB,k)² )
[Figure: genes A and B plotted as points with coordinates (XA,1, XA,2) and (XB,1, XB,2) in the plane of experiments 1 and 2]
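As a quick sketch (with made-up expression values), the Euclidean distance between two genes' expression vectors can be computed as:

```python
import numpy as np

# Expression of genes A and B across 4 conditions (made-up values).
x_a = np.array([2.0, 3.5, 1.0, 0.5])
x_b = np.array([1.5, 3.0, 1.5, 1.0])

# Euclidean distance: square root of summed squared differences over conditions.
d = np.sqrt(np.sum((x_a - x_b) ** 2))
print(d)  # 1.0
```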
Distance
• Metrics for distance have a formal definition:
– d(x, y) ≥ 0
– d(x, y) = 0 if and only if x = y
– d(x, y) = d(y, x)
– Triangle inequality: d(x, z) ≤ d(x, y) + d(y, z)
• The triangle inequality need not hold for a measure of “similarity.”
• Distance ~ dissimilarity = 1 − similarity
Distance Metrics
Can we capture the similarity of these patterns?
[Figure: expression vs. time profiles whose shapes are similar]
Pearson Correlation
Pearson Correlation
• Xi,j = expression of gene i in condition j
• Zi,j = z-score of gene i in experiment j:
  Zi,j = (Xi,j − mean(Xi)) / std(Xi)
• Pearson correlation over all N experiments:
  RA,B = (1/N) Σj ZA,j ZB,j
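A minimal sketch of this computation on made-up expression vectors (the z-score here uses the population standard deviation, ddof = 0):

```python
import numpy as np

# Expression of two genes across N = 5 conditions (made-up values).
x_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x_b = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

def zscore(x):
    # Population z-score: subtract the mean, divide by the std (ddof=0).
    return (x - x.mean()) / x.std()

# Pearson correlation = mean of the product of z-scores.
r = np.mean(zscore(x_a) * zscore(x_b))
print(round(r, 3))  # 1.0
```

Since x_b is a scaled copy of x_a, the profiles have identical z-scores and the correlation is 1 even though their magnitudes differ.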
[Figure: expression of genes A and C vs. time (left), and the corresponding z-scores ZA and ZC vs. time (right)]
[Figure: expression of genes A, B, and C vs. time, and the corresponding z-scores ZA, ZB, and ZC]
RA,B = −0.01
RA,C = 0.999
RB,C = −0.03
[Figure: expression of genes A, B, and D vs. time, and the corresponding z-scores ZA, ZB, and ZD]
RA,B = −0.01
RA,D = −1.0
RB,D = 0.007
Distance Metrics
• Distance = 1 − correlation
[Figure: expression vs. time profiles compared using 1 − correlation as the distance]
Missing Data
• What if a particular data point is missing? (Back in the old days: there was a bubble or a hair on the array.)
– Ignore that gene in all samples
– Ignore that sample for all genes
– Replace the missing value with a constant
– “Impute” a value
• Example: compute the K most similar genes (arrays) using the available data; set the missing value to the mean of the value for these K genes (arrays)
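A minimal sketch of the K-nearest-neighbor imputation idea above, on a toy expression matrix (the data, the helper name impute_knn, and k = 2 are illustrative choices):

```python
import numpy as np

# Rows = genes, columns = samples; np.nan marks a missing value (toy data).
X = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, np.nan, 4.1],
    [0.9, 1.9, 3.1, 3.9],
    [5.0, 1.0, 0.5, 2.0],
])

def impute_knn(X, k=2):
    """Fill each missing entry with the mean of that column over the k genes
    most similar to the incomplete gene (Euclidean distance on the columns
    observed in both genes)."""
    X = X.copy()
    for i, row in enumerate(X):
        miss = np.isnan(row)
        if not miss.any():
            continue
        obs = ~miss
        # Distance to every other complete-on-these-columns gene.
        dists = []
        for j, other in enumerate(X):
            if j == i or np.isnan(other[obs]).any():
                continue
            dists.append((np.linalg.norm(row[obs] - other[obs]), j))
        neighbors = [j for _, j in sorted(dists)[:k]]
        for c in np.where(miss)[0]:
            X[i, c] = X[neighbors, c].mean()
    return X

X_filled = impute_knn(X, k=2)
print(round(X_filled[1, 2], 2))  # mean of X[0, 2] and X[2, 2] -> 3.05
```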
Clustering
• Multiple mechanisms could lead to up-regulation in any one condition
• Goal: find genes that have “similar” expression over many conditions.
• How do you define “similar”?
Clustering
• Intuitive idea that we want to find an underlying grouping
• In practice, this can be hard to define and implement.
• An example of unsupervised learning
Clustering 8600 human genes based on time course
of expression following
serum stimulation of fibroblasts
© American Association for the Advancement of Science. All rights reserved. This content is excluded
from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: Iyer, Vishwanath R., Michael B. Eisen, et al. "The Transcriptional Program in the Response
of Human Fibroblasts to Serum." Science 283, no. 5398 (1999): 83-7.
Why cluster?
Hierarchical clustering
Two types of approaches:
•Agglomerative
•Divisive
© source unknown. All rights reserved. This content is excluded from our Creative
Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Agglomerative Clustering Algorithm
• Step 1: Initialization: each data point is in its own cluster.
• Step 2: Merge the two most similar clusters.
• Step 3: Repeat step 2 until there is only one cluster.
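The three steps above can be sketched in a few lines; this toy version uses 1-D points and single linkage (both illustrative choices):

```python
# A minimal sketch of agglomerative (bottom-up) clustering with
# single linkage, on 1-D points for clarity.
def agglomerate(points):
    """Merge clusters until one remains; return the merge order."""
    clusters = [[p] for p in points]           # Step 1: one point per cluster
    merges = []
    while len(clusters) > 1:                   # Step 3: repeat until one cluster
        # Step 2: find the pair of clusters with the smallest
        # single-linkage distance (closest pair of members).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

# Two obvious groups: {1.0, 1.2} and {5.0, 5.1}.
for left, right, d in agglomerate([1.0, 1.2, 5.0, 5.1]):
    print(left, right, round(d, 2))
```

The closest pairs merge first, and the final merge joins the two natural groups at a much larger distance, exactly the heights a dendrogram would record.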
Agglomerative Clustering Algorithm
• Clusters Y, Z with A in Y and B in Z
• Single linkage: dY,Z = min{ d(A,B) : A ∈ Y, B ∈ Z }
• Complete linkage: dY,Z = max{ d(A,B) : A ∈ Y, B ∈ Z }
• UPGMC (Unweighted Pair Group Method using Centroids)
– Define distance as the distance between the cluster centroids
• UPGMA (Unweighted Pair Group Method with Arithmetic Mean): average of pairwise distances:
  dY,Z = (1 / (|Y||Z|)) ΣA∈Y ΣB∈Z d(A,B)
[Figure: clusters Y (containing A) and Z (containing B)]
• Single linkage = min{dY,Z}
• Complete linkage = max{dY,Z}
• If clusters exist and are compact, it should not
matter.
• Single linkage will “chain” together groups
with one intermediate point.
• Complete linkage will not combine two groups
if even one point is distant.
Interpreting the Dendrogram
• This produces a binary tree or dendrogram
• The final cluster is the root and each data item is a leaf
• The heights of the bars indicate how close the items are
• Can ‘slice’ the tree at any distance cutoff to produce discrete clusters
• The dendrogram represents the results of the clustering; its usefulness in representing the data is mixed.
• The results will always be hierarchical, even if the data are not.
[Figure: dendrogram with distance on the vertical axis and data items (genes, etc.) as leaves]
K-means clustering
• Advantage: gives sharp partitions of the data
• Disadvantage: need to specify the number of clusters (K).
• Goal: find a set of K clusters that minimizes the distance of each point in a cluster to the cluster mean:
  Σk Σi∈Ck ||Xi − μk||²
K-means clustering algorithm
• Initialize: choose k points as cluster means
• Repeat until convergence:
– Assignment: place each point Xi in the cluster with the closest mean.
– Update: recalculate the mean for each cluster
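A minimal sketch of this iteration (Lloyd's algorithm) on a toy 2-D dataset; the data and seed are made up:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm: assign points to the nearest mean,
    then recompute each mean, until the means stop moving."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)]  # initialize
    for _ in range(iters):
        # Assignment step: index of the closest mean for each point.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: mean of the points in each cluster.
        new_means = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_means, means):
            break
        means = new_means
    return labels, means

# Two well-separated blobs in 2-D.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
labels, means = kmeans(X, k=2)
print(labels)  # points 0,1 share one label; points 2,3 share the other
```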
What if you choose the wrong K?
Big steps occur when we are dividing
data into natural clusters
What if we choose pathologically
bad initial positions?
What if we choose pathologically
bad initial positions?
Often, the algorithm gets a
reasonable answer, but not always!
Convergence
• K-means always converges.
• The assignment and update steps always
either reduce the objective function or leave it
unchanged.
Convergence
• However, it doesn’t always find the same solution.
[Figure: two different K-means solutions with K = 2]
Fuzzy K-means
[Figure: fuzzy cluster assignments with K = 2]
K-means
• Initialize: choose k points as cluster means
• Repeat until convergence:
– Assignment: place each point Xi in the cluster with the closest mean.
– Update: recalculate the mean for each cluster

Fuzzy k-means
• Initialize: choose k points as cluster means
• Repeat until convergence:
– Assignment: calculate the probability of each point belonging to each cluster.
– Update: recalculate the mean for each cluster using these probabilities
K-means vs. fuzzy k-means
[Figure: cluster assignments under each method]
K-medoids clustering
• Initialize: choose k data points as cluster medoids
• Repeat until convergence:
– Assignment: place each point Xi in the cluster with the closest medoid.
– Update: recalculate the medoid for each cluster
So What?
• Clusters could reveal underlying biological
processes not evident from complete list of
differentially expressed genes
• Clusters could be co-regulated. How could we
find upstream factors?
Reconstructing Regulatory Networks
Clustering vs. “modules”
• Clusters are purely phenomenological – no claim of causality
• The term “module” is used to imply a more mechanistic connection
[Figure: transcription factor A and transcription factor B each driving a set of genes with correlated expression]
Courtesy of Macmillan Publishers Limited. Used with permission.
Source: Marbach, Daniel, James C. Costello, et al. "Wisdom of Crowds for
Robust Gene Network Inference." Nature Methods 9, no. 8 (2012): 796-804.
Bayesian Networks
Bayesian Networks
• Bayesian Networks are a tool for reasoning
with probabilities
• Consist of a graph (network) and a set of
probabilities
• These can be “learned” from the data
Bayes’ theorem
• Bayes’ theorem (Bayes’ law or Bayes’ rule)
describes the probability of an event based
on prior knowledge of conditions that might
be related to the event.
Bayes’ theorem
• P(A|B) = P(B|A) P(A) / P(B)
• or
• P(A,B) = P(A|B) P(B) = P(B|A) P(A)
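A quick numeric sanity check of Bayes' theorem, with made-up probabilities:

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B), with made-up numbers.
p_a = 0.3                      # prior P(A)
p_b_given_a = 0.8              # likelihood P(B|A)
p_b_given_not_a = 0.2          # P(B|not A)

# Total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_b, 2), round(p_a_given_b, 3))  # 0.38 0.632
```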
An example of a DAG.
Bayesian Network (BN)
• A class of graphical models that consist of two parts, <G, P> or <G, ϴ>
• P is a set of conditional probability distributions (CPDs), one for each node conditional on its parents.
• A BN satisfies the Markov property: each node is conditionally independent of its non-descendants given its parents.
Bayesian Network (BN)
• A class of graphical models that consist of two parts, <G, P> or <G, ϴ>
• P is a set of conditional probability distributions (CPDs), one for each node conditional on its parents.
• CPD for node A: P(A|B,C)
• CPD for node B: P(B|F)
• CPD for node C: P(C)
• What are the CPDs for nodes D, E, F, and G?
Bayesian Network (BN)
• A class of graphical models that consist of two parts, <G, P> or <G, ϴ>
• P is a set of conditional probability distributions (CPDs), one for each node conditional on its parents.
• Given the Markov property, the joint probability distribution of the random variables in G can be decomposed into a product of conditional probability distributions across all nodes:
  P(A,B,C,D,E,F,G,H) = P(A|B,C) P(B|F) P(C) P(D|A,G) P(E|A) P(F) P(G) P(H|E)
• ϴ is a set of parameters in P.
Bayesian Network (BN)
• A class of graphical models that consist of two parts, <G, P> or <G, ϴ>
• ϴ is a set of parameters in P.
• The number of parameters in a Bayesian network is the number of free conditional probability entries in the network.
Bayesian Network (BN)
• A class of graphical models that consist of two parts, <G, P> or <G, ϴ>
• ϴ is a set of parameters in P
• Number of parameters equals Σi (ri − 1) qi
• n: number of nodes
• ri is the number of levels (possible values) of node i
• qi is the number of value combinations of node i’s parents
• Suppose all nodes have binary values, that is, their levels are all 2.
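Assuming the count is Σi (ri − 1) qi, i.e., the free entries per conditional table (a common convention; a count of all table entries would instead use ri · qi), the total for the example DAG with all-binary nodes can be checked in a few lines. The parent sets follow the joint factorization above:

```python
# Parameter count for the example DAG (A|B,C; B|F; C; D|A,G; E|A; F; G; H|E),
# assuming every node is binary (r_i = 2). Free parameters per node:
# (r_i - 1) * q_i, where q_i is the number of parent-value combinations.
parents = {
    "A": ["B", "C"], "B": ["F"], "C": [], "D": ["A", "G"],
    "E": ["A"], "F": [], "G": [], "H": ["E"],
}
r = {node: 2 for node in parents}  # binary levels

def n_parameters(parents, r):
    total = 0
    for node, pa in parents.items():
        q = 1
        for p in pa:               # parent-value combinations
            q *= r[p]
        total += (r[node] - 1) * q
    return total

print(n_parameters(parents, r))  # 17
```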
Parameter Learning
• Example: a BN with all nodes taking YES or NO values
Parameter Learning
• Example: a BN with all nodes taking YES or NO values
Suppose we observe the event (B=Yes, C=Yes, D=Yes, E=No, F=Yes, G=No, H=Yes).
What is P(A=Yes)?
Parameter Learning
• Example: a BN with all nodes taking YES or NO values
P(A=Y | B=Y, C=Y, D=Y, E=N, F=Y, G=N, H=Y)
∝ P(A=Y, B=Y, C=Y, D=Y, E=N, F=Y, G=N, H=Y)   (product of the conditional probability for each node)
= P(A=Y|B=Y,C=Y) P(B=Y|F=Y) P(C=Y) P(D=Y|A=Y,G=N) P(E=N|A=Y) P(F=Y) P(G=N) P(H=Y|E=N)
= 0.5 × 0.43 × 0.59 × 0.8 × (1−0.67) × 0.68 × (1−0.59) × 0.23
= 0.002;
P(A=N | B=Y, C=Y, D=Y, E=N, F=Y, G=N, H=Y)
∝ P(A=N, B=Y, C=Y, D=Y, E=N, F=Y, G=N, H=Y)
= P(A=N|B=Y,C=Y) P(B=Y|F=Y) P(C=Y) P(D=Y|A=N,G=N) P(E=N|A=N) P(F=Y) P(G=N) P(H=Y|E=N)
= (1−0.5) × 0.43 × 0.59 × 0.5 × (1−0.1) × 0.68 × (1−0.59) × 0.23
= 0.004.
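Normalizing the two unnormalized products gives the posterior probability of A; the numbers below are the conditional probabilities from the example:

```python
# Unnormalized products for A=Yes and A=No, using the example's
# conditional probability values.
p_a_yes = 0.5 * 0.43 * 0.59 * 0.8 * (1 - 0.67) * 0.68 * (1 - 0.59) * 0.23
p_a_no = (1 - 0.5) * 0.43 * 0.59 * 0.5 * (1 - 0.1) * 0.68 * (1 - 0.59) * 0.23

# Posterior: divide by the sum so the two probabilities add to 1.
posterior = p_a_yes / (p_a_yes + p_a_no)
print(round(p_a_yes, 4), round(p_a_no, 4), round(posterior, 2))  # 0.0021 0.0037 0.37
```

So given this evidence, A=Yes has posterior probability of roughly 0.37.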
Parameter Learning
• Example: a BN with all nodes taking YES or NO values
• Ancestor (sub)graph: a subgraph of the Bayes net where only the variables of interest and their ancestors are drawn
• Ancestor graph shortcut: any variable that is not in the ancestor graph for the set of variables of interest can be removed from the summation.
Parameter Learning
• Example: a BN with all nodes taking YES or NO values
P(A=Yes | F=Yes) = ΣB,C P(A=Yes | B, C) P(C) P(B | F=Yes)
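A sketch of this marginalization. Only P(A=Y|B=Y,C=Y) = 0.5, P(B=Y|F=Y) = 0.43, and P(C=Y) = 0.59 come from the example's tables; the remaining P(A=Yes|B,C) entries are made-up placeholders:

```python
# Ancestor-graph shortcut: to get P(A=Yes | F=Yes) we only need A's
# ancestors (B, C, F), so the sum runs over B and C alone.
p_b_yes = 0.43                 # P(B=Yes | F=Yes), from the example
p_c_yes = 0.59                 # P(C=Yes), from the example
p_a_given_bc = {               # P(A=Yes | B, C)
    (True, True): 0.5,         # from the example
    (True, False): 0.7,        # hypothetical
    (False, True): 0.2,        # hypothetical
    (False, False): 0.1,       # hypothetical
}

p_a_yes = sum(
    p_a_given_bc[(b, c)]
    * (p_b_yes if b else 1 - p_b_yes)
    * (p_c_yes if c else 1 - p_c_yes)
    for b in (True, False)
    for c in (True, False)
)
print(round(p_a_yes, 4))  # 0.3409
```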
Is the p53 pathway activated?
Possible Evidence
• Known p53 targets are up-regulated
– Could another pathway also cause this?
• Genes for members of the signaling pathway are expressed (ATM, ATR, CHK1, …)
– Might be true under many conditions where the pathway has not yet been activated
• Genes for members of the signaling pathway are differentially expressed
– Still does not prove a change in activity
Courtesy of Looso et al. License: CC-BY.
Source: Looso, Mario, Jens Preussner, et al. "A De Novo Assembly of the Newt Transcriptome Combined with Proteomic
Validation Identifies New Protein Families Expressed During Tissue Regeneration." Genome Biology 14, no. 2 (2013): R16.
Is the p53 pathway activated?
• Formulate the problem probabilistically
• Compute
– P(p53 pathway activated | data)
• How?
– Relatively easy to compute P(X up | TF up)
– How?
[Figure: TF with targets X1, X2, X3]
Is the p53 pathway activated?
• Formulate the problem probabilistically
• Compute
– P(p53 pathway activated | data)
• How?
– Relatively easy to compute P(X up | TF up)
– Look over lots of experiments and tabulate:
• X1 up & TF up
• X1 up & TF not up
• X1 not up & TF not up
• X1 not up & TF up
[Figure: TF with targets X1, X2, X3]
Is the p53 pathway activated?
• Formulate the problem probabilistically
• Compute
– P(p53 pathway activated | data)
• How?
– Relatively easy to compute P(X up | TF up)
– P(TF up | X up) = P(X up | TF up) P(TF up) / P(X up)
[Figure: TF with targets X1, X2, X3]
Is the p53 pathway activated?
• Formulate the problem probabilistically
• Compute
– P(p53 pathway activated | data)
• How?
– Even with P(TF up | X up), how do we compare this explanation of the data to other possible explanations?
– Can we include upstream data?
[Figure: TF with targets X1, X2, X3]
Application to Gene Networks
• Which pathway activated this set of genes?
• Either A or B or both would produce similar but not identical results.
• Bayes nets estimate conditional probability tables from lots of gene expression data.
• How often is TF B2 expressed when TF B1 is expressed, etc.
[Figure: candidate pathways involving TF A1, TF A2, TF B1, and TF B2, converging on the same set of genes]
Learning Models from Data
• Searching for the BN structure: NP-complete
– Too many possible structures to evaluate all of them, even for very small networks.
– Many algorithms have been proposed
– Incorporating some prior knowledge can reduce the search space.
• Which nodes should regulate transcription?
• Which should cause changes in phosphorylation?
– Intervention experiments help
© American Association for the Advancement of Science. All rights reserved. This content is excluded
from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: Sachs, Karen, Omar Perez, et al. "Causal Protein-signaling Networks Derived From
Multiparameter Single-cell Data." Science 308, no. 5721 (2005): 523-9.
Fig. 1. Bayesian network modeling with single-cell data
© American Association for the Advancement of Science. All rights reserved. This content is excluded
from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Source: Sachs, Karen, Omar Perez, et al. "Causal Protein-signaling Networks Derived From
Multiparameter Single-cell Data." Science 308, no. 5721 (2005): 523-9.
Regression-based models
© source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Predicted expression: Yg = fg(XTg) + ε
Assume that the expression of gene g is some function of the expression of its transcription factors, XTg = {Xt : t ∈ Tg}
• Xg = measured expression of gene g
• XTg = measured expression of the set of TFs potentially regulating gene g
• fg is an arbitrary function
• ε is noise
Regression-based models
© source unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
• Linear model: fg(XTg) = Σt∈Tg βt,g Xt
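A minimal sketch of fitting such a linear model by ordinary least squares, on synthetic data (two hypothetical TFs with true weights 2 and −1):

```python
import numpy as np

# Fit the expression of a target gene as a weighted sum of its candidate
# TFs' expression. Data are synthetic: y = 2*tf1 - 1*tf2 + noise.
rng = np.random.default_rng(0)
n = 50
tf = rng.normal(size=(n, 2))                   # expression of two TFs
y = 2.0 * tf[:, 0] - 1.0 * tf[:, 1] + 0.01 * rng.normal(size=n)

# Ordinary least squares estimate of the coefficients beta_{t,g}.
beta, *_ = np.linalg.lstsq(tf, y, rcond=None)
print(np.round(beta, 2))  # approximately [ 2. -1.]
```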
Regression-based models
Quick Review of Information Theory
Information content of an event E: I(E) = log2(1 / P(E))
© Califon Productions, Inc. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/help/faq-fair-use/.
Quick Review of Information Theory
Information content of an event E: I(E) = log2(1 / P(E))
Entropy, evaluated over all possible outcomes: H(S) = Σi pi I(si) = Σi pi log2(1 / pi)
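The entropy formula can be checked directly; a fair coin gives exactly one bit:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = sum_i p_i * log2(1 / p_i)."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# A fair coin carries one bit of uncertainty; a biased coin carries less.
print(entropy([0.5, 0.5]))            # 1.0
print(round(entropy([0.9, 0.1]), 3))  # 0.469
```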
Mutual Information
• Does knowing variable X reduce the uncertainty in variable Y?
• Example:
– P(Rain) depends on P(Clouds)
– P(target expressed) depends on P(TF expressed)
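One way to make this concrete is to compute mutual information from a discrete joint distribution; the joint probabilities below are made up:

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    for a joint distribution given as a 2-D table of probabilities."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi

# Rain and clouds (made-up joint probabilities): knowing "clouds"
# reduces uncertainty about "rain", so I > 0.
joint_dependent = [[0.4, 0.1],    # clouds: rain / no rain
                   [0.05, 0.45]]  # clear:  rain / no rain
# Independent variables give I = 0.
joint_independent = [[0.25, 0.25],
                     [0.25, 0.25]]
print(round(mutual_information(joint_dependent), 3))  # 0.397
print(mutual_information(joint_independent))          # 0.0
```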
Mutual information detects
non-linear relationships
ARACNe
• Find TF-target relationships using mutual
information
ARACNe
• Data processing inequality
– Eliminate indirect interactions
– If G2 regulates G1 and G3, I(G1,G3) > 0 but adds no insight
– Remove the edge with the smallest mutual information in each triple
[Figure: triangle of genes G1, G2, G3 with the indirect edge G1–G3 removed]
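A toy sketch of the pruning rule: for each fully connected triple, drop the edge with the smallest mutual information (the MI values are made up):

```python
from itertools import combinations

# Data processing inequality step: for every fully connected triple of
# genes, drop the edge with the smallest mutual information.
mi = {
    ("G1", "G2"): 0.8,   # direct: G2 regulates G1
    ("G2", "G3"): 0.7,   # direct: G2 regulates G3
    ("G1", "G3"): 0.3,   # indirect, should be pruned
}

def edge(a, b):
    return tuple(sorted((a, b)))

def dpi_prune(mi):
    edges = set(mi)
    genes = sorted({g for e in mi for g in e})
    for trio in combinations(genes, 3):
        trio_edges = [edge(a, b) for a, b in combinations(trio, 2)]
        if all(e in edges for e in trio_edges):       # fully connected triple
            weakest = min(trio_edges, key=lambda e: mi[e])
            edges.discard(weakest)
    return edges

print(sorted(dpi_prune(mi)))  # [('G1', 'G2'), ('G2', 'G3')]
```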
Limitations
• Need huge expression datasets
• Can’t find:
– modulators that do not change in expression
– modulators that are highly correlated with the target
– modulators that both activate and repress
Huge networks!
This is just the nearest neighbors of one node of interest from ARACNe!
Nature Medicine 18, 436–440 (2012). doi:10.1038/nm.2610
Outline
• Gene expression
– Distance metrics
– Clustering
– Modules
• Bayesian networks
• Regression
• Mutual Information
• Evaluation on real and simulated data
AUPR = area under the precision-recall curve
Approach
mRNA levels do not predict protein levels
[Figure: protein expression levels (molecules/cell, log-scale base 10) vs. mRNA levels]
Raquel de Sousa Abreu, Luiz Penalva, Edward Marcotte, and Christine Vogel, Mol. BioSyst., 2009. doi:10.1039/b908315d
Source: Ning, Kang, Damian Fermin, et al. "Comparative Analysis of Different Label-free Mass Spectrometry Based Protein Abundance Estimates and Their Correlation with RNA-Seq Gene Expression Data." Journal of Proteome Research 11, no. 4 (2012): 2261-71.