Dynamic Bayesian Network and Nonparametric Regression Model For Inferring Gene Networks From Time

Dynamic Bayesian Network and Nonparametric Regression Model
for Inferring Gene Networks from Time Series Microarray Data
SunYong Kim (1) , Seiya Imoto (1) , and Satoru Miyano (1)
(1)
Human Genome Center, Institute of Medical Science, The University of Tokyo,
4-6-1 Shirokane-dai, Minato-ku, Tokyo 108-8639 - Japan
{sunk, imoto, miyano}@ims.u-tokyo.ac.jp
keywords: Microarray, Gene networks, Dynamic Bayesian networks, Nonparametric regression
Introduction
A Bayesian network is a promising method for modeling relations among a large number of random variables and has
received considerable attention from the studies of gene network estimation using microarray gene expression data. Imoto et
al. [2, 3] proposed a Bayesian network and nonparametric regression model for capturing nonlinear relations between genes
without discretizing microarray data. However, a problem of Bayesian networks is that it cannot construct cyclic regulations,
while real gene networks may have many feedback loops. For a solution to this problem, we propose a dynamic Bayesian
network and nonparametric regression model [4] for estimating a gene network with cyclic regulations from time series
microarray data. We also derive a criterion for selecting a network from a Bayesian statistical viewpoint. The effectiveness
of our method is displayed though a real data application of Saccharomyces cerevisiae gene expression data.
Dynamic Bayesian Network and Nonparametric Regression Model
Let X be an n×p time series microarray data matrix, where n and p are the number of microarrays and genes, respectively.
Assuming the first order Markov relation between time points, the joint density can be decomposed as f (x11 , · · · , xnp ) =
f1 (x1 )f2 (x2 |x1 ) × · · · × fn (xn |xn−1 ), where xi = (xi1 , · · · , xip )T is an observation vector at time i. The conditional
density fi (xi |xi−1 ) can be decomposed as fi (xi |xi−1 ) = g1 (xi1 |pi−1,1 ) × · · · × gp (xip |pi−1,p ), where pi−1,j denotes the
expression values of the parents of jth gene at time i − 1. For modeling each gj (·), we assume a nonparametric additive
regression model based on B-splines, and f (X) can then be formulated as
 ( )
Yp n
Y 1 (xij − µ(pi−1,j ))2
f (x11 , · · · , xnp ; θ G ) = f1 (x1 )  q exp − ,
2σ 2
2
j=1 i=2 2πσj j
PMj1 (j) (j) (j) PMjqj (j) (j) (j) (j) (j)
where µ(pi−1,j ) = m=1 γm1 bm1 (pi−1,1 ) + ... + m=1 γmqj bmqj (pi−1,qj ), γ1k , ..., γMjk k are coefficient parameters and
(j) (j)
{b1k (·), ..., bMjk k (·)} is a prescribed set of B-splines.
Criterion for Learning Network Structure
When the network structure is given, we can construct a gene network by using the proposed model. However, the true
gene network is usually unknown, and we need to infer the optimal network structure from data. We derive a criterion for
evaluating a network structure from a Bayesian statistical viewpoint. By using the Laplace approximation for integrals, the
criterion, named BNRCdynamic can be expressed as
½ Z ¾
BNRCdynamic (G) = −2 log πprior (G) f (x11 , · · · , xnp ; θG )π(θ G |λ)dθ G
≈ −2 log πprior (G) − r log(2π/n) + log |Jλ (θ̂ G )| − 2nlλ (θ̂ G |X),
FUS3 FAR1 FUS3 FAR1 FUS3 FAR1
MBP1 CLN1 CLN2 MBP1 CLN1 CLN2 MBP1 CLN1 CLN2
SWI6 CDC28 SWI6 CDC28 SWI6 CDC28
CLN3 CLN3 CLN3
CDC28 SIC1 CDC28 SIC1 CDC28 SIC1

MBP1 MBP1 MBP1
SWI6 SWI6 SWI6

CLB5 CLB6 CLB5 CLB6 CLB5 CLB6
CDC28 CDC28 CDC28
CDC20 CDC6 CDC20 CDC6 CDC20 CDC6
(a) Target pathway (b) Result of the BN model (c) Result of the DBN model
Figure 1. Cell cycle pathway compiled in KEGG.
where π(θ G |λ) and πprior (G) are the prior distribution of the parameter θ G and the prior probability of the network
G, respectively, λ is the hyper parameter vector, r is the dimension of θ G , lλ (θ G |X) = log f (x11 , · · · , xnp ; θ G )/n +
log π(θ G |λ)/n, Jλ (θ G ) = −∂ 2 {lλ (θ G |X)}/∂θ G ∂θ TG and θ̂ G is the mode of lλ (θ G |X). We select a network which
minimizes the BNRCdynamic score.
Real Data Application
We apply the proposed method to the Saccharomyces cerevisiae cell cycle data collected by Spellman et al. [5]. The target
network is a part of cell cycle pathway compiled in the KEGG database [1], around CDC28 (YBR160w; cyclin-dependent
protein kinase). This network contains 45 genes and the partial pathway registered in KEGG is shown in Figure 1 (a). Figure
1 (b) and (c) are the resulting networks of the Bayesian network (BN) model [2, 3] and the dynamic Bayesian network
(DBN) model [4] respectively. A shaded region represents the genes which compose a complex. We consider the edges
inside these regions as correct edges since the genes inside the same region will co-express with some delay. We indicate a
correct estimation by an edge attached with a circle. A triangle represents either a misdirected edge or an edge skipping at
most one gene. A cross represents a wrong estimation. We observed that our dynamic Bayesian network model succeeded in
decreasing the number of false positives compared with the Bayesian network model.
References
[1] http://www.genome.ad.jp/kegg/.
[2] S. Imoto, T. Goto, and S. Miyano. Estimation of genetic networks and functional structures between genes by using bayesian network
and nonparametric regression. Pacific Symposium on Biocomputing, 7:175–186, 2002.
[3] S. Imoto, S. Kim, T. Goto, S. Aburatani, K. Tashiro, S. Kuhara, and S. Miyano. Bayesian network and nonparametric heteroscedastic
regression for nonlinear modeling of genetic network. Proc. 1st IEEE Computer Society Bioinformatics Conference, pages 219–227,
2002.
[4] S. Kim, S. Imoto, and S. Miyano. Dynamic bayesian network and nonparametric regression for nonlinear modeling of gene networks
from time series gene expression data. Proc. Computational Methods in Systems Biology, Lecture Note in Computer Science, Springer-
Verlag, 2602:104–113, 2003.
[5] P. Spellman, G. Sherlock, M. Zhang, V. Iyer, K. Anders, M. Eisen, P. Brown, D. Botstein, and B. Futcher. Comprehensive identification
of cell cycle-regulated genes of the yeast saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell,
9:3273–3297, 1998.

Dynamic Bayesian Network and Nonparametric Regression Model For Inferring Gene Networks From Time

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Dynamic Bayesian Network and Nonparametric Regression Model For Inferring Gene Networks From Time

Uploaded by

Copyright:

Available Formats

Dynamic Bayesian Network and Nonparametric Regression Model

for Inferring Gene Networks from Time Series Microarray Data

keywords: Microarray, Gene networks, Dynamic Bayesian networks, Nonparametric regression

Dynamic Bayesian Network and Nonparametric Regression Model

Criterion for Learning Network Structure

MBP1 CLN1 CLN2 MBP1 CLN1 CLN2 MBP1 CLN1 CLN2

SWI6 CDC28 SWI6 CDC28 SWI6 CDC28

CLN3 CLN3 CLN3

CDC28 SIC1 CDC28 SIC1 CDC28 SIC1

SWI6 SWI6 SWI6

CDC28 CDC28 CDC28

CDC20 CDC6 CDC20 CDC6 CDC20 CDC6

Figure 1. Cell cycle pathway compiled in KEGG.

Real Data Application

You might also like