1 views

Uploaded by Sudipta Bhadra

Paper on GRN using Bspline

- Time Series Laboratory Exercise I
- ECONOMETRIC ANALYSIS TIME SERIES - LISTINET.pdf
- Stata 02 04timeseriesanalysis 120605154536 Phpapp02
- time series ppt.pdf
- Exercises for July 16
- phps09a
- Data Mining on Heathcare Management System
- Predictability of Nonstationary Time Series Using Wavelet and EMD Based ARMA Models
- iandfct62014syl
- Sixth Form Trials 2018 Semi-Final Draft.pdf
- Comparacion Funcional de Cobre
- 1. FORECASTING_Introduction.ppt
- Problem Posing Rubric
- Lecture6_StockSimulationExcel
- Advanced Systems Lab Exercises From the Book – VISki
- Chap19 Time-Series Analysis and Forecasting
- 2015 Rocada Dumitrescu
- 1_2017_Curves_Representation.pdf
- Training Data Selection for Short Term Load Forecasting
- 1.3 3D Modeling

You are on page 1of 5

Haixin Wang, James E. Glover and Lijun Qian

Abstract In this paper, the quantitative analysis of timeseries gene expression data on inference of gene regulatory

networks is performed. Time-series gene data are modeled by

the B-Spline algorithm to improve the overall smooth expression

curves which can further reduce over-fitting. The effect of the

different sizes of observed time-series data on gene regulatory

networks inference is analyzed. The stochastic errors introduced

by the B-Spline algorithm to the system are evaluated. The

precision of different sizes of time-series data on parameter

estimations is compared. With application of the B-Spline to

generate continuous curves, simulation results can be much

more accurate and inference results are significantly improved.

Both synthetic data and experimental data from microarray

measurements are used to demonstrate the effectiveness of the

proposed method.

I. I NTRODUCTION

ENETIC regulatory networks (GRNs) are collections

of DNA segments in a cell which interact with each

other and with other substances in the cell, thereby governing

gene transcriptions. In light of the recent development of

high-throughput DNA microarray technology, it becomes

possible to discover GRNs, which are complex and nonlinear

in nature. Specifically, the increasing existence of microarray

time-series data makes possible the characterization of dynamic nonlinear regulatory interactions among genes. The

modeling, analysis and control of GRNs are critical for

finding medicine for gene-related diseases.

One issue during the inference of GRNs is that not enough

time-series experiment data are available [1]. This makes

many inference processes not realizable. Many models are

restricted by limited data. B-Spline can reconstruct unobserved gene expression data or recover the missing data.

Compared to the B-Spline algorithm, simple algorithms, for

example linear interpolation, can lead to poor estimation for

the missing time-series data, especially when the sampled

data is not uniform [2].

In this paper, the ordinary differential equation (ODE)

model using S-system is adopted to infer GRNs [3]. Different sizes of time-series data are analyzed. The B-Spline

algorithm is introduced to analyze the raw data. The effects

of the time-series data to infer gene regulatory networks are

analyzed in synthetic model and microarray experimental

expression data.

Haixin Wang and James E. Glover are with the Department of Mathematics and Computer Science, Fort Valley State University, Fort Valley,

Georgia, USA (email: wangh@fvsu.edu).

Lijun Qian is with the Department of Electrical and Computer Engineering, Prairie View A&M University, Prairie View, Texas, USA (email:

LiQian@pvamu.edu).

II. I NTRODUCTION

TO

B-S PLINE

ALGORITHM

in image processing, computer-aided design and statistical

analysis [4], [5]. It is one of the most flexible formulations for

many engineering applications. A spline curve is a sequence

of curve segments that are connected together to form a

single continuous curve. In this study, it is a time-related

sequence [6]. The basis B-Splines of degree m can be

expressed as

S(t) =

N m1

X

i=0

control points, and the normalized B-Spline basis bi,k (t) can

be expressed using the Cox-deBoor recursion formula [6]:

1, di t di+1

bi,1 (t) =

0, otherwise

bi,k (t) =

+

.

di+k1 di

di+k di+1

satisfying

d0 d1 dN ,

the corresponding control points (Pi ) need to be calculated.

Suppose t [dm , dkm ] and P0i = Pi for i = m, , k m

h1

Phi = (1 h,i )Ph1

i1 + h,i Pi

where h = 1, , k; i = 0, , N m 1;

h,i =

t Pi

.

pi+k+1h Pi

points with line segments. From those control points, the

corresponding data points can be calculated.

In this study, we select the uniform cubic B-Spline to insert

time-series data, and k is chosen to be 4. Then the basis can

be expressed as

1 3 3 1

pi1

1 3 6 3 0 pi

Si (t) = u3 u2 u 1

3 0 pi+1

6 3 0

1

4

1 0

pi+2

(1)

where u [0, 1].

The general process of inserting time series data via Bspline algorithm is shown in Algorithm 1.

Require:

The raw data: d0 dN

The insertion data length between two insertion data

points: (di , di+1 )

Spline order: k = 4

Ensure: The data set Si

1) Calculate the control points Pi from the original data

di 2) Determine the insertion data Si according to equation (1) with initial conditions: spline order and insertion

data length

Microarray

Experiments

Apply B-Spline

Algorithm to

Gene Expression

Data

Genetic

Programming

Kalman Filter

Yes

REGULATORY NETWORKS

In general, modeling gene regulatory networks is a nonlinear identification problem. Assuming there are N genes

of interest and xi denotes the state (such as the microarray

reading) of the ith gene, then the dynamics of the GRN may

be modeled as

dxi

= fi (x1 , x2 , , xN )

i = 1, 2, , N

(2)

dt

where the nonlinear functions fi need to be determined

from time-series microarray measurements. In this study, we

assume that the functions (fi , i) are in the form [7]:

fi =

Li

X

[wij ij (x1 , x2 , , xN )]

i = 1, 2, , N (3)

j=1

where Li is the number of terms in fi , wij are the parameters to be estimated and ij (x1 , x2 , , xN ) is the j th

component of the nonlinear function fi . A two-step nested

optimization procedure is proposed to identify the nonlinear differential equation for each individual gene. Genetic

programming (GP) is applied to determine the nonlinear

parameters (global optimization) and then the corresponding parameters associated with each term are estimated by

Kalman filtering (local optimization) in each iteration. Such

a decomposition of the problem into a structural part solved

by GP and a parameter optimization part solved by Kalman

filtering reduces the complexity significantly and speeds

up convergence. In this paper the S-system is adopted for

function fi . The S-system model is given by [3]:

N

N

Y

Y

dxi

gi,j

h

= i

xj i

xj i,j , (i = 1, ..., N )

dt

j=1

j=1

Data Sturcture

Model Parameters

Entire Model

No

Evaluation by Fitness Function?

Yes

Fig. 1. The flowchart for GRN identification using GP, Kalman Filtering

and B-Spline

successful in modeling GRNs [9], [10], [11], [12], [13], [14].

Hence, the S-system model is adopted for modeling GRNs

in this paper. More importantly, the B-Spline algorithm is

introduced into the process to improve the quality of the

time-series data sets. The general process of inferring the

gene regulatory networks is shown in Fig. 1.

IV. T HE

B-S PLINE

the B-Spline algorithm are discussed in the optimization

estimation process.

(4)

rate constants, gi,j and hi,j are the exponential parameters

called kinetic orders. If gi,j > 0, gene j will induce the

expression of gene i. On the contrary, gene j will inhibit

the expression of gene i if gi,j < 0. hi,j will have the

opposite effects on controlling gene expressions compared to

gi,j . S-system is a quantitative model which is characterized

by power-law functions. It has the rich structure capability of

capturing various dynamics in many biochemical systems [8].

For microarray experiments, the time-series data are important to infer gene regulatory networks. The sampling data

sizes from the experiment have a significant impact on the

resulted GRNs. For insufficient data sizes, the inferred GRNs

can have errors. Hence, the effects of the various data sizes

on the inference of GRNs are analyzed in this section. In

order to show the relation, the coefficient of determination

(CoD) is applied [15]. It provides a measure of how well

future outcomes are likely to be predicted by the model. The

0.9

0.9

BSplline

BSpline

BSpline

0.8

0.8

Linear

BSpline

Linear

Linear

0.7

0.7

MSE Error

Average CoD

data4

0.6

0.5

0.6

0.5

0.4

0.4

0.3

0.3

0.2

0.2

0.1

0.1

0

0

0

0

10

12

14

16

18

10

15

20

25

20

Fig. 2.

CoD is given by

SSerr

N

X

(yi fi )2

N

X

(yi y)2

i=0

SStot

Fig. 3.

microarray experiment is usually very small. Therefore, it

is necessary to introduce the B-spline into the process of

inference of GRNs. It is also noticed that B-Spline can reach

a better result than that of a linear interpolation algorithm

before approaching the upper bound.

i=0

= 1

SSerr

SStot

observed data and data from the inferred model. SStot is

the sum of the squared errors between observed data and its

mean. R2 is the CoD that indicates how well the regressing

line fits the data. An R2 = 1.0 indicates that the regression

line perfectly fits the data. yi is the observed data, fi is the

modelled data and y is the mean of the observed data. The

process is shown by Algorithm 2.

Algorithm 2 The general process of analyzing the effect of

data size to infer GRNs

Require:

The observed data xi with size n

Ensure: CoD of each GRNs

1) Apply two-step method to infer GRNs

2) Analyze CoD for each GRNs

As an example, the result is shown in Fig. 2 for the

following synthetic model:

x1

x2

1.2

0.2

= x1.5

1 x2 x2

0.5

= x0.1

1 x1 x2

From Fig. 2, the averaged CoD from the estimator becomes larger with the increased amount of sampling data.

The CoD does not change much after certain number of

samples, which implies that the CoD has an upper bound.

It is observed that increasing the number of sampling points

yields more exact estimates, within the upper bound. Note

B-Spline algorithm is introduced to interpolate more data

from the original sampling data with some missing data. It is

necessary to analyze whether or not the B-Spline algorithm

introducesP

cumulative data errors. The mean square errors

N

(M SE = i=0 (yi fi )2 ) for different sizes of the data in BSpline algorithm are stochastically calculated and compared

to the observed data. The result is shown in Fig.3. The

synthetic data range is [0.1, 1]. The MSE range by BSpline algorithm is [0.02, 0.25]. The MSE range by linear

interpolation is [0.029, 0.81]. From Fig. 3, it is observed that

the MSE remains very small and it does not increase with

interpolated data by B-Spline algorithm. Therefore, B-Spline

algorithm introduce very limited data errors, which have little

effect on the inference result of GRNs.

C. The effect of data size on parameter optimization estimation

After applying B-Spline to interpolate data, it is necessary

to analyze the parameter estimation under different interpolation sizes. The general process is shown by Algorithm 3.

The effect of the total data size is analyzed using the

same synthetic model. The data size is the total data size

after applying the B-Spline algorithm with different sampling

sizes. The stochastic CoD is analyzed with different intervals

from two adjacent control points. The result is shown by

Fig.4. It clearly shows that the CoD is larger with larger

sampling size. However, both interpolation algorithms are

upper bounded. Fig. 4 shows that B-Spline can get much

better results in comparison to linear interpolation.

2500

HAP1

CYB2

CYC7

CYT1

COX5A

x1 Bspline

x2 Bspline

x3 Bspline

x4 Bspline

x5 Bspline

2000

Concentration Level

data size on parameter optimization

Require:

The observed data xi

Ensure: Stochastical errors of the estimated parameters

1) Choose the sampling size N for B-Spline algorithm

2) Apply B-Spline algorithm with the size N sampling

points

3) Apply Kalman Filter to calculate the parameters

4) Calculate the stochastical errors of the estimated parameters

1500

1000

500

50

100

150

200

250

300

350

Time

BSpline

0.95

CoD of Estimated Parameters

BSpline

Fig. 5.

Linear

Linear

0.9

0.85

0.8

0.75

0.7

0

20

40

60

80

100

120

140

160

180

200

Data Size

Fig. 4.

During this part of the simulation, we consider timeseries gene-expression data corresponding to yeast protein

synthesis. Five genes (HAP1(x1 ), CYB2(x2 ), CYC7(x3 ),

CYT1(x4 ), COX5A(x5 )) are selected because the relations

among them have been revealed by biological experiments [16]. After the B-Spline algorithm is applied to the

experimental data, the trace of the time-series microarray

measurement data from [17] is shown by Fig. 5. Because the

processes using B-splines algorithm generates enough data

from the original experiment data, inference results for each

iteration are stable. Therefore, unlike [3], the complicated

processes using Z-scores are not necessary to process the

simulation results, and B-splines is useful for the experiment

data processing which improves the inference with less

complicated process and more stable results.

The relationships among the 5 genes are shown by the

branch pathway model given in Fig. 6. We observe that

HAP1 represses CYC7, and CYB2 activates CYC7. It is

also observed that HAP1 activates COX5A and CYT1. These

observations are in agreement with the biological experiment

findings in [16], [18].

VI. C ONCLUSIONS

In this paper, the effects of the B-Spline algorithm on the

gene time-series data and inference process are analyzed.

Fig. 6.

analyzed. The introduction of B-Spline algorithm yields

better results in both cases. In further studies, B-Spline

algorithm will be applied to analyze the effects of noisy data

on genetic programming during the inference process.

R EFERENCES

[1] T. Chen, V. Filkov, and S. S. Skiena, Identifying gene regulatory

networks from experimental data, pp. 94103, 1999.

[2] J. Aach and G. M. Church, Aligning gene expression time series with

time warping algorithms, BIOINF: Bioinformatics, vol. 17, pp. 495

508, 2001.

[3] H. Wang, L. Qian, and E. Dougherty, Inference of gene regulatory

networks using s-system: A unified approach, IEEE 2007 Symposium

on Computational Intelligence in Bioinformatics and Computational

Biology, 2007.

[4] S. Samadi, M. Ahmad, and M. Swamy, Characterization of b-spline

digital filters, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS,

vol. 51, no. 4, pp. 808816, 2004.

[5] G. S. Knmar, P. K. Kalra, and S. G. Dhande, Parameter optimization

for b-spline curve fitting using genetic algorithms, IEEE Evolutionary

Computation, vol. 3, pp. 18711878, 2003.

[6] H. Prautzsch, W. Boehm, and M. Paluszny, Bezier and B-Spline

Techniques. Springer, 2002.

differential equation models for gene regulatory networks using genetic

programming and kalman filtering, IEEE Transactions on Signal

Processing, vol. 56, no. 8, 2008.

[8] E. Voit and T. Radivoyevitch, Bioinformatical systems analysis of

genome-wide expression data, Bioinformatics, vol. 16, pp. 1023

1037, 2000.

[9] C. Spieth, F. Streichertr, N. Spee, and Zell., A memetic inference

method for gene regulatory networks based on s-systems, Proceedings

of the IEEE Congress on Evolutionary Computation, pp. 152157,

2004.

[10] N. Norman and H. Iba, Inference of gene regulatory networks using

s-system and differential evolution, Proceedings of the Genetic and

Evolutionary Computation Conference, pp. 439446, 2005.

[11] S. Kimura, K. Ide, A. Kashihara, M. Kano, M. Hatakeyama, R. Masu,

N. Nakagawa, S. Yokoyama, S. Kuramitsu, and A. Konagaya, Inference of s-system models of genetic networks using a cooperative

coevolutionary algorithm, Bioinformatics, vol. 21(7), pp. 11541163,

2005.

Dynamic modeling of genetic networks using genetic algorithm and

s-system, Bioinformatics, vol. 19(5), pp. 643650, 2003.

[13] S. Kimura, K. Ide, A. Kashihara, M. Kano, M. Hatakeyama, R. Masui,

N. Nakagawa, S. Yokoyama, S. Kuramitsu, and A. Konagaya, Inference of s-system models of genetic networks from noisy time-series

data, Chem-Bio Informatics Journal, vol. 4(1), pp. 114, 2004.

[14] J. Almeida and E. Voit, Neural-network-based parameter estimation in

s-system models of biological networks, Genome Informatics, vol. 14,

pp. 114123, 2003.

[15] W. Mendenhall and T. Sincich, A Second Course in Statistics: Regression Analysis. Prentice Hall, 2003.

[16] P. Woolf and Y. Wang, A fuzzy logic approach to analyzing gene

expression data, Physiol. Genomics, vol. 3, pp. 915, 2000.

[17] http://www.wanghaixin.com/ss/fulldata.htm,

[18] J. Schneider and L. Guarente, Regulation of the yeast cyti gene encoding cytochrome cl by hap1 and hap2/3/4, Molecular and Cellular

Biology, vol. 11(10), pp. 49344942, 1991.

- Time Series Laboratory Exercise IUploaded byAl-Ahmadgaid Asaad
- ECONOMETRIC ANALYSIS TIME SERIES - LISTINET.pdfUploaded byedwardeduardo10
- Stata 02 04timeseriesanalysis 120605154536 Phpapp02Uploaded byHector Garcia
- time series ppt.pdfUploaded byAbhishek Yada
- Exercises for July 16Uploaded byJoshua Corpuz Jola
- phps09aUploaded byVinay Kr
- Data Mining on Heathcare Management SystemUploaded byInternational Journal for Scientific Research and Development - IJSRD
- Predictability of Nonstationary Time Series Using Wavelet and EMD Based ARMA ModelsUploaded bybhanutez
- iandfct62014sylUploaded byalokjase
- Sixth Form Trials 2018 Semi-Final Draft.pdfUploaded bynicole
- Comparacion Funcional de CobreUploaded byPatricia Mejia Solier
- 1. FORECASTING_Introduction.pptUploaded byJamal Farah
- Problem Posing RubricUploaded byAnabel Sta Cruz
- Lecture6_StockSimulationExcelUploaded byHarry Kwong
- Advanced Systems Lab Exercises From the Book – VISkiUploaded byChandana Napagoda
- Chap19 Time-Series Analysis and ForecastingUploaded byMahesh Ramalingam
- 2015 Rocada DumitrescuUploaded byPlescan Cristina Elena
- 1_2017_Curves_Representation.pdfUploaded bySreehari Nambiar KT
- Training Data Selection for Short Term Load ForecastingUploaded byCristian Klen
- 1.3 3D ModelingUploaded byLucas Lee
- 434Uploaded byAnaAAAM
- ONDULETASUploaded byFISORG
- 1-s2.0-S0960148116308977-mainUploaded byJulián Camacho-Gómez
- BriciuAndrei-DiurnalsemidiurnalandfortnightlytidalcomponentsinorthotidalproglacialriversUploaded byniga maria anca
- Forecasting DPUploaded bymeagon_cj
- ForecastUploaded byAnkita Sahu
- j 0342054059Uploaded byinventionjournals
- Data mining 1Uploaded bytinamishu
- Silent RiskUploaded bybigshanto
- 051021Uploaded bysriashokcute

- Tracking and Tracing Books _ InnoCentive Custom Challenge Division.pdfUploaded bySudipta Bhadra
- Scientific Voyage 2(1) [Article ID - 47]Uploaded bySudipta Bhadra
- Intro Face Detect RecognitionUploaded bySudipta Bhadra
- IT_0016_ENUploaded bySudipta Bhadra
- IT_0014_ENUploaded bySudipta Bhadra
- IT_0013_ENUploaded bySudipta Bhadra
- IT_0012_ENUploaded bySudipta Bhadra
- IT_0011_ENUploaded bySudipta Bhadra
- Webinar of ACR Grand ChallengeUploaded bySudipta Bhadra
- Lab 01Uploaded byCharu Roopa Varadarajan
- 05470558Uploaded bySudipta Bhadra
- 04578259Uploaded bySudipta Bhadra

- CV -- Qifeng Weng 9-29-11Uploaded byİlker Soyulu
- ASIC Interview Question & Answer_ ASIC VerificationUploaded byprodip7
- ANALYSIS OF FIBER REINFORCED PLASTIC NEEDLE GATE FOR K.T. WEIRSUploaded byIJIRAE- International Journal of Innovative Research in Advanced Engineering
- EC-Council University Catalog 2017-V6Uploaded byEC-Council University
- Discrete Vol1Uploaded byHuyHoàng
- Reloj en tiempo real con pic 16f628a.docxUploaded bydidproel
- ASON ManagementUploaded bysaja44
- Fire Escape's 314 Area Code BBS Directory - June 1999Uploaded byFire Escape
- The Impact of New Media on Intercultural Communication in GlobalUploaded byAhMei
- 978-1-63057-094-1-3Uploaded bygouravbhatia200189
- Basic QA Testing Interview Questions and AnswersUploaded byPriya Mach
- 286549922-Bernard-Tschumi-Architecture-and-Disjunction.pdfUploaded byJorge Luis Nuñez
- How to Start a Difficult ConversationUploaded byabirbhab
- Antisocial Behavior and HypnosisUploaded byMansonCaseFile
- 1. Norsen. John S. Bell’s concept of local causalityUploaded byFrancesca Leto
- makalah centosUploaded byErristhya Darmawan
- User OptionsUploaded byJustin Galipeau
- Magnetospheric Eternally Collapsing Object (Jeffy Hodgson)Uploaded byDr Abhas Mitra
- 4846D-SD Tech ManUploaded byStevenRodriguez
- FUJITSU H730 SPECS SHEETUploaded byAleksandar Vasic
- 75126579-Lecture-6-The-Engineer-s-Transit-and-Theodolite.pdfUploaded byBryle James Nanglegan
- ContentGuard Holdings v. Amazon.com et. al.Uploaded byPriorSmart
- ıuuıu 56Uploaded bydob14
- As ISO IEC 15444.2-2004 Information Technology - JPEG 2000 Image Coding System ExtensionsUploaded bySAI Global - APAC
- Practical Applications and Solutions Using LabVIEW SoftwareUploaded byMaleny Garcia
- AES Discussion SheetUploaded byAjmal Salim
- day three madeline hunter lesson planUploaded byapi-269729950
- Ob Paper Final (1)Uploaded byZaheer Hendricks
- People and Environment 2009 - Landscape Architecture ProgrammeUploaded bycalderdavid35
- Hit the Reset Button in Your Brain - NYTimesUploaded byjavicpb