
INFERRING GENE REGULATORY NETWORKS USING HETEROGENEOUS MICROARRAY DATASETS

A PROJECT REPORT

Submitted by

S.P.SUGANYA DEVI (80705104097)
L.SUGANYA (80705104098)
U.SUGANYA (80705104099)

in partial fulfillment for the award of the degree of

BACHELOR OF ENGINEERING
in

COMPUTER SCIENCE AND ENGINEERING

J.J.COLLEGE OF ENGINEERING AND TECHNOLOGY, AMMAPETTAI, TIRUCHIRAPPALLI-620 009

ANNA UNIVERSITY:: CHENNAI 600 025
MAY 2005

ANNA UNIVERSITY : CHENNAI 600 025
BONAFIDE CERTIFICATE

Certified that this project report “INFERRING GENE REGULATORY NETWORKS USING HETEROGENEOUS MICROARRAY DATASETS” is the bonafide work of “S.P.SUGANYA DEVI, L.SUGANYA AND U.SUGANYA” who carried out the project work under my supervision.

SIGNATURE
<<Name>>
HEAD OF THE DEPARTMENT
Dept. of Computer Science & Engg.,
J.J.College of Engg. & Tech.,
Ammapettai, Tiruchirappalli-620009.

SIGNATURE
<<Name>>
SUPERVISOR
Assistant Professor,
Dept. of Computer Science & Engg.,
J.J.College of Engg. & Tech.,
Ammapettai, Tiruchirappalli-620009.

ACKNOWLEDGEMENT

ABSTRACT

Inferring Gene Regulatory Networks (GRNs) is critical for describing the intrinsic relationships between genes in the course of evolution and for discovering the group behavior of a certain set of genes. Recent developments in the high-throughput microarray technique give researchers a chance to monitor the expression patterns of thousands of genes simultaneously. As an increasing number of microarray data sets become available online, the integration of multiple microarray data sets from various data sources (e.g. different tissues, species, and conditions) for GRN inference becomes very important for achieving more accurate and reliable GRN modeling. This paper reviews recent developments in integrating multiple microarray data sets and proposes a new method to infer GRNs using multiple microarray data sets.

TABLE OF CONTENTS

CHAPTER NO.   TITLE

              ABSTRACT
              LIST OF TABLES
              LIST OF FIGURES
              LIST OF SYMBOLS

1.            INTRODUCTION
              1.1 LITERATURE SURVEY
                  1.1.1
                  1.1.2
              1.2

2.            PROPOSED SYSTEM
              2.1 MicroArray Dataset
                  2.1.1 Formatting MicroArray Dataset
                  2.1.2 Loading MicroArray Dataset
                  2.1.3 Reading Gene values from MicroArray
              2.2 Implementing Correlation Signature Method
                  2.2.1 Capturing Landmark Genes
              2.3 Modifying MicroArray Dataset
              2.4 Implementing GRN SigCalc Method
              2.5 Capturing Activator and Repressor Genes
              2.6 Constructing Gene Regulatory Network

3.            CONCLUSION
              FUTURE WORK

4.            APPENDICES
              A.1 SOURCE CODE
              A.2 SNAP SHOTS
              REFERENCES

LIST OF FIGURES
FIGURE NO   NAME OF THE FIGURE

LIST OF TABLES
TABLE NO   NAME OF THE TABLE

LIST OF ABBREVIATIONS

CHAPTER 1

INTRODUCTION

Understanding the genetic causes behind the phenotypic characteristics of organisms is one of the most important objectives in genetic research; in other words, the goal is to discover the exact ways in which genetic components, i.e., genes and proteins, interact to make a complex living system. Recently, high-throughput techniques such as microarrays have greatly helped researchers obtain a closer look at the interactions between genes. Unlike traditional genetic and molecular approaches, which usually examine and collect data on a single gene, the microarray technique monitors the expression patterns of tens of thousands of genes in parallel [2], [9]. Data collected with this technique are known as gene expression data and are well suited to both qualitative and quantitative modeling and simulation. The interactions between genes can be illustrated with a Gene Regulatory Network (GRN). A GRN contains a collection of genes that interact with each other and can elucidate the effect of the nature and topology of interactions on the systemic properties of organisms. However, GRNs constructed from single microarray data sets are often hard to interpret and unreliable due to the lack of enough samples. The ability to integrate heterogeneous microarray data sets is therefore becoming very important to bioinformatics researchers who want to infer more reliable GRNs with statistically robust models. After reviewing recent efforts on this topic, this paper will present a novel GRN inference method using the SigCalc algorithm [6].

1.1 LITERATURE SURVEY

Singular Value Decomposition:
Wang et al. presented a method to find the GRN structure that is most consistent with all involved data sets [11]. This method uses linear differential equations to describe a GRN, as shown in Equation 1:

$$\dot{x}(t) = J\,x(t) + b(t), \qquad t = t_1, \ldots, t_m \tag{1}$$

where $x(t) \in \mathbb{R}^n$ holds the expression levels of the $n$ genes at time $t$, $J = (J_{ij})_{n \times n} = \partial f(x)/\partial x$ is an $n \times n$ Jacobian matrix or connectivity matrix, and $b = (b_1, \ldots, b_n)^T \in \mathbb{R}^n$ is a vector representing the external stimuli or environmental conditions. A particular solution of Equation 1 can be derived for each data set using Singular Value Decomposition (SVD) [1]. Taking the sparse structure of GRNs into account, the inference of a GRN is formulated as an optimization problem whose objective function combines forced matching and sparsity terms over multiple data sets. Here forced matching means forcing the final solution of J to match the SVD solution, whereas sparsity means the matrix J should be sparse (in other words, most of the elements of J should be zero). The experimental results showed that the generated GRNs were promising and biologically meaningful [11].
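The linear model above can be made concrete with a short sketch. It only evaluates the right-hand side J x + b for a given state and measures how sparse a connectivity matrix is; it does not reproduce the SVD-based optimization of Wang et al. The class name LinearGrnModel and its methods are hypothetical names introduced for illustration.

```java
/** Illustrative sketch of the linear GRN model dx/dt = J x + b (names are hypothetical). */
class LinearGrnModel {

    /** Returns the derivative J x + b for the expression state x. */
    static double[] derivative(double[][] j, double[] x, double[] b) {
        int n = x.length;
        double[] dx = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = b[i];                 // external stimulus on gene i
            for (int k = 0; k < n; k++) {
                sum += j[i][k] * x[k];         // influence of gene k on gene i
            }
            dx[i] = sum;
        }
        return dx;
    }

    /** Fraction of zero entries in J, a simple measure of sparsity. */
    static double sparsity(double[][] j) {
        int zeros = 0;
        int total = 0;
        for (double[] row : j) {
            for (double v : row) {
                total++;
                if (v == 0.0) zeros++;
            }
        }
        return (double) zeros / total;
    }

    public static void main(String[] args) {
        double[][] j = {{0.0, 1.0}, {-1.0, 0.0}};
        double[] dx = derivative(j, new double[] {1.0, 2.0}, new double[] {0.5, 0.0});
        System.out.println(java.util.Arrays.toString(dx) + " sparsity=" + sparsity(j));
    }
}
```

A zero entry J[i][k] means that gene k has no direct influence on gene i, which is why a sparse J corresponds to a compact network.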

Evolving Connectionist System (ECOS):
Goh and Kasabov utilized the Evolving Connectionist System (ECOS) to integrate multiple data sets [5]. An ECOS is a neural network that can continuously adjust its structure by interacting with its environment and other systems. The system evolves along with incoming information whose distribution is unknown. ECOS uses an objective function to optimize the system performance over time. In terms of modeling, ECOS allows new data to be added incrementally, so that connectionist systems can be built for online adaptive learning in which new data from various sources are added to the system. In addition to applying the ECOS model to all the data sets, Goh and Kasabov also normalized all data sets to achieve better results.

Clustering:
Filkov and Skiena proposed a method to combine microarray data from various experiments on an equal basis using the concept of consensus clustering [4]. Clustering has been popular and widely applied in analyzing biological data [3] for years, and many clustering results for the same organism are available in public repositories today. Several clusterings of the same genes can be used to extract more information about the groups that the genes co-belong to than the individual clusterings themselves. Consensus clustering is an algorithm that uses the various source-specific clusterings of the data (or the meta-data) for the same organism both to provide an integrated view of the data and to eliminate misclassifications due to errors in the individual data sets. Mathematically, clusterings are set partitions, and therefore consensus clustering can be formalized as a set partition problem: given n partitions, p1, p2, . . . , pn, and the symmetric difference distance d on any two partitions, find a consensus partition p that minimizes

$$D = \sum_{i=1}^{n} d(p_i, p).$$

This is known in the literature as the median partition problem and has been proved to be NP-Complete. Filkov and Skiena provided three heuristics to find this consensus clustering and demonstrated the algorithm's efficiency experimentally on both clean and noisy data.
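The objective D can be illustrated with a minimal sketch, assuming each partition is stored as an array of cluster labels (label[i] is the cluster of gene i) and that the symmetric difference distance counts the gene pairs that are co-clustered in one partition but not in the other. The class ConsensusDistance and its method names are hypothetical; the sketch evaluates D for a candidate partition and does not implement the heuristics of Filkov and Skiena.

```java
/** Illustrative sketch of the median partition objective (names are hypothetical). */
class ConsensusDistance {

    /** Symmetric difference distance: number of pairs co-clustered in exactly one partition. */
    static int distance(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("partitions must cover the same genes");
        }
        int d = 0;
        for (int i = 0; i < a.length; i++) {
            for (int j = i + 1; j < a.length; j++) {
                boolean togetherInA = a[i] == a[j];
                boolean togetherInB = b[i] == b[j];
                if (togetherInA != togetherInB) d++;
            }
        }
        return d;
    }

    /** Objective D: sum of distances from each input partition to the candidate consensus. */
    static int totalDistance(int[][] partitions, int[] candidate) {
        int total = 0;
        for (int[] p : partitions) {
            total += distance(p, candidate);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] p1 = {0, 0, 1, 1};
        int[] p2 = {0, 0, 0, 1};
        System.out.println(totalDistance(new int[][] {p1, p2}, p1));
    }
}
```

Evaluating D for one candidate is cheap (quadratic in the number of genes); the difficulty of the median partition problem lies in searching over all possible candidates.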

PROPOSED SYSTEM:
Formatting MicroArray Dataset:
A typical microarray experiment has tens of samples, while each sample contains expression data for thousands of genes. The asymmetry between the number of samples and the number of genes is called the "curse of dimensionality" and often causes problems in statistical data processing. On the other hand, an increasing amount of microarray data is being generated daily. Data collected from microarray experiments are usually heterogeneous. That is, the data often come from different tissues, treatment strategies, and stages of disease development, and from experiments conducted in different labs that apply different microarray technologies and protocols. The integration of different microarray data sets is creating a new challenge for bioinformatics researchers. The dataset has to be formatted as an Excel sheet. The column headings denote the different samples and the row headings denote the different genes.

Loading MicroArray Dataset:
The MicroArray Dataset is loaded into the system by choosing the file at run time.

Reading Gene values from MicroArray Dataset:
The system reads the gene values from the input dataset and stores them temporarily for further processing.

Table 1.1 A Typical Microarray Example

        S1     S2     S3     S4
G1      0.73   0.88   0.69   0.71
G2      0.80   0.75   0.71   0.82
G3      0.01   0.05   0.09   0.03    *L1*
G4      0.37   0.32   0.41   0.35
G5      0.91   0.85   0.83   0.87
G6      0.02   0.07   0.12   0.08    *L2*
G7      0.76   0.87   0.92   0.95
G8      0.75   0.84   0.77   0.89
G9      0.92   0.86   0.84   0.96
G10     0.14   0.03   0.06   0.16
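The layout described above (sample names in the header row, gene names in the first column) can be read with a few lines of code. The sketch below is a simplified stand-in that parses whitespace-delimited text instead of an Excel sheet; like the servlet in Appendix A.1, it skips the header row and the first column when collecting values. The class ExpressionTable and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: a gene-by-sample table parsed from plain text (names are hypothetical). */
class ExpressionTable {
    final List<String> genes = new ArrayList<>();   // row headings
    final List<double[]> rows = new ArrayList<>();  // one expression vector per gene

    /** Parses text whose first line is the sample header and whose other lines are "gene v1 v2 ...". */
    static ExpressionTable parse(String text) {
        ExpressionTable table = new ExpressionTable();
        String[] lines = text.trim().split("\\R");
        for (int i = 1; i < lines.length; i++) {    // line 0 holds the sample names
            String[] cells = lines[i].trim().split("\\s+");
            double[] values = new double[cells.length - 1];
            for (int c = 1; c < cells.length; c++) { // cell 0 holds the gene name
                values[c - 1] = Double.parseDouble(cells[c]);
            }
            table.genes.add(cells[0]);
            table.rows.add(values);
        }
        return table;
    }

    public static void main(String[] args) {
        ExpressionTable t = parse("Gene S1 S2 S3 S4\nG3 0.01 0.05 0.09 0.03\nG6 0.02 0.07 0.12 0.08");
        System.out.println(t.genes + " " + t.rows.get(0).length + " samples");
    }
}
```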

Implementing Correlation Signature Method:

Kang et al. presented a correlation-based algorithm, SigCalc, to provide a new interpretation of gene expression data and to integrate heterogeneous microarray data sets [6]. In this algorithm, they first defined the concept of a correlation signature. The correlation signature captures the correlations between a gene and a set of landmark genes. Different methods can be used to choose the landmark genes, for example, the genes from a particular pathway or genes reported in the literature as associated with a certain disease, such as lung cancer, with high probability. The correlations are defined to be the similarities and dissimilarities between gene vectors (rows in Table 1.1). Any convenient distance metric can be used in the calculation of correlations, e.g. Euclidean distance, Cosine correlation, Pearson correlation, and Mean-Expression distance. The selection of a proper distance metric depends on the application environment [6]. For instance, although Euclidean distance is a popular way to measure the distance between two vectors, it fails to capture the natural bias of gene expression data. Thus, if we focus on the fluctuation of the expression levels rather than the absolute expression values, correlation metrics may achieve more accurate results than Euclidean distance.

All correlation signature values of a gene form a vector, called the gene signature vector. The expression level of a gene can then be represented by its correlation to a set of landmark genes. A typical microarray data set is usually represented by a matrix. The rows are the measurements associated with individual genes while the columns are the measurements associated with the samples. Each entry represents the expression level of one gene in one sample. Typically, an asymmetric relationship exists between the number of genes and samples, i.e., the number of genes (in thousands) is much larger than the number of samples (in tens).

Capturing Landmark Genes:
SigCalc assigns a correlation signature to each gene in a microarray data set. Without loss of generality, G3 and G6 in Table 1.1 are selected as the landmark genes L1 and L2, based on their average values. The correlation signature values for the genes are shown in Table 1.2, where Sig(Gi) represents the signature vector of gene Gi. For example, Sig(G1) = [0.54, 0.61]. We also list the average value of the correlation signature for each gene in the last column. The Pearson correlation is used for this calculation. This popular metric measures the tendency of two vectors of variables to increase or decrease together. Its mathematical definition is as follows:

$$r(X, Y) = \frac{\sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{m} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{m} (y_i - \bar{y})^2}}$$

where $X = (x_1, \ldots, x_m)$ and $Y = (y_1, \ldots, y_m)$ represent two gene row vectors in our context, and $\bar{x}$ and $\bar{y}$ are their mean values. In this paper, we use the same correlation for the correlation signature calculation. For the dissimilarity measure, one simply changes the correlation into the distance form described in [6]; the values in Table 1.2 are consistent with

$$d(X, Y) = \frac{1 - r(X, Y)}{2}.$$

This correlation distance has a range of [0, 1]. A distance close to zero implies that the two vectors are correlated, while a distance close to one implies that the two vectors are inversely correlated. There is no correlation between the two vectors if the value of the correlation distance is 0.5. We can interpret the value using the regulation rule between genes. For instance, Sig(G7) = [0.21, 0.08] may imply that G7 is activated by the landmark genes G3 and G6. On the other hand, Sig(G9) = [0.90, 0.75] may imply that G9 is repressed by G3 and G6.
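The signature calculation can be sketched as follows. The sketch assumes that the correlation distance is (1 − r)/2, where r is the Pearson correlation; this form is an assumption chosen because it has the stated range [0, 1], gives 0.5 for uncorrelated vectors, and reproduces the values of Table 1.2 to two decimals (for example Sig(G7) ≈ [0.21, 0.08]). The class CorrelationSignature and its method names are hypothetical.

```java
/** Illustrative sketch of the correlation signature calculation (names are hypothetical). */
class CorrelationSignature {

    /** Pearson correlation coefficient of two equal-length vectors. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("vectors must be non-empty and of equal length");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        if (sxx == 0 || syy == 0) {
            throw new IllegalArgumentException("a constant vector has no defined correlation");
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Assumed distance form: 0 = correlated, 0.5 = uncorrelated, 1 = inversely correlated. */
    static double distance(double[] x, double[] y) {
        return (1.0 - pearson(x, y)) / 2.0;
    }

    /** Signature vector of one gene: its distance to each landmark gene. */
    static double[] signature(double[] gene, double[][] landmarks) {
        double[] sig = new double[landmarks.length];
        for (int j = 0; j < landmarks.length; j++) {
            sig[j] = distance(gene, landmarks[j]);
        }
        return sig;
    }

    public static void main(String[] args) {
        double[] g3 = {0.01, 0.05, 0.09, 0.03}; // landmark L1 in Table 1.1
        double[] g6 = {0.02, 0.07, 0.12, 0.08}; // landmark L2 in Table 1.1
        double[] g7 = {0.76, 0.87, 0.92, 0.95};
        System.out.println(java.util.Arrays.toString(signature(g7, new double[][] {g3, g6})));
    }
}
```

Because the Pearson correlation removes the mean and scale of each vector, a gene with low absolute expression (such as G3) and a gene with high absolute expression (such as G7) can still be close in this distance when they rise and fall together.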


Table 1.2 The Correlation Signature

            L1(A)   L2(A)   Average
Sig(G1)     0.54    0.61    0.59
Sig(G2)     0.95    0.83    0.89
Sig(G3)     0       0.04    0.02
Sig(G4)     0.26    0.30    0.28
Sig(G5)     0.97    0.97    0.97
Sig(G6)     0.04    0       0.02
Sig(G7)     0.21    0.08    0.14
Sig(G8)     0.56    0.39    0.47
Sig(G9)     0.90    0.75    0.82
Sig(G10)    0.85    0.72    0.78

Modifying MicroArray Dataset:
We rank the genes in Table 1.2 by the Average column in descending order. Since a value of 0.5 means that no correlation exists between a gene vector and the landmark genes (in other words, the landmark genes neither activate nor repress the gene), we use a threshold parameter θ = 0.1 to exclude the genes whose average value lies within 0.5 ± θ. The modified gene expression vectors are shown in Table 1.3.

Table 1.3 The Modified MicroArray Data

        S1     S2     S3     S4
G2      0.80   0.75   0.71   0.82
G3      0.01   0.05   0.09   0.03    *L1*
G5      0.91   0.85   0.83   0.87    *L3*
G6      0.02   0.07   0.12   0.08    *L2*
G7      0.76   0.87   0.92   0.95
G9      0.92   0.86   0.84   0.96
G10     0.14   0.03   0.06   0.16
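The exclusion rule can be sketched as a simple filter that keeps a gene only when its average signature is at least θ away from 0.5. The class SignatureFilter and its method are hypothetical names. Applied to the Average column of Table 1.2 with θ = 0.1, this rule removes G1 and G8; Table 1.3 additionally omits G4, which this rule alone does not reproduce.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of the threshold filter on average signatures (names are hypothetical). */
class SignatureFilter {

    /** Keeps the genes whose average signature is at least theta away from 0.5. */
    static List<String> keepRegulated(String[] genes, double[] average, double theta) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < genes.length; i++) {
            // |average - 0.5| < theta means "no clear activation or repression"
            if (Math.abs(average[i] - 0.5) >= theta) {
                kept.add(genes[i]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] genes = {"G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8", "G9", "G10"};
        double[] average = {0.59, 0.89, 0.02, 0.28, 0.97, 0.02, 0.14, 0.47, 0.82, 0.78};
        System.out.println(keepRegulated(genes, average, 0.1));
    }
}
```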

Implementing GRN SigCalc Method:
Given k microarray data sets, our goal is to construct a GRN which contains all the activating and repressing relationships between the genes and the landmark genes. Without loss of generality, let mi (n × m), i = 1, . . . , k represent the microarray data sets, where n is the number of genes and m is the number of samples. We concatenate the k data sets to obtain a bigger matrix M (n × r), where r = m × k. The gene vectors (row vectors) can be represented as G = {g1, g2, . . . , gn}. Also let L = {l1, l2, . . . , lt} represent the initial landmark gene set. The correlation signatures between each gene and the landmark gene li can be represented as

SIG = {Sig(g1, li), Sig(g2, li), . . . , Sig(gn, li)}, where i = 1, . . . , t

We also use the following notation:

Avg(Sigi) : the average signature value for gi
AvgAct(Sigi) : the average signature value between gi and the activators
AvgRep(Sigi) : the average signature value between gi and the repressors
MAX(a, b) : the maximum value between a and b
Diff(a, b) : the difference between a and b

do
    for each gi in G
        for each lj in L
            calculate Sig(gi, lj)
        end for
        calculate Avg(Sigi)
        if |Avg(Sigi) − 0.5| < θ
            remove gi from G
        end if
    end for
    for each gi in G
        calculate AvgAct(Sig(gi))
        calculate AvgRep(Sig(gi))
        if Diff(AvgAct(Sig(gi)), AvgRep(Sig(gi))) < δ
            remove gi from G
        end if
    end for
    add gx with MAX(Diff(Avg(ACT), Avg(REP))) to L
    if activators activate gx
        add gx to ACT
    else
        add gx to REP
    end if
    add gx to the GRN with the incoming lines from ACT and outgoing lines to elements in REP set
until elements in M have been all processed
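The concatenation step can be sketched as follows, assuming that every data set lists the same n genes in the same row order. The class DatasetConcat and its method are hypothetical names; the sketch also accepts data sets with different numbers of samples, in which case r is the sum of the sample counts rather than m × k.

```java
/** Illustrative sketch of concatenating k gene-by-sample matrices column-wise (names are hypothetical). */
class DatasetConcat {

    /** Returns M (n x r), where row g is the gene vector g of every data set joined together. */
    static double[][] concatenate(double[][]... datasets) {
        int n = datasets[0].length;
        int r = 0;
        for (double[][] d : datasets) {
            if (d.length != n) {
                throw new IllegalArgumentException("all data sets must contain the same genes");
            }
            r += d[0].length;
        }
        double[][] m = new double[n][r];
        int offset = 0;
        for (double[][] d : datasets) {
            for (int g = 0; g < n; g++) {
                System.arraycopy(d[g], 0, m[g], offset, d[g].length);
            }
            offset += d[0].length;   // next data set starts after these samples
        }
        return m;
    }

    public static void main(String[] args) {
        double[][] m1 = {{0.73, 0.88}, {0.80, 0.75}};
        double[][] m2 = {{0.69, 0.71}, {0.71, 0.82}};
        System.out.println(java.util.Arrays.deepToString(concatenate(m1, m2)));
    }
}
```

After concatenation each gene vector is longer, so the correlation signature of a gene is computed over the samples of all data sets at once.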


Capturing Activator and Repressor Genes:
We then select the gene with the highest rank score, G5 in this example, add it to the landmark gene list, and mark it as a repressor. Note that in this particular example, other than G3 and G6, which are already in the landmark gene list, we also have two ranking scores (for G4 and G7) smaller than 0.5 − θ. G4 and G7 are activators. Therefore, we need to choose the lowest of these scores (1 − 0.14 = 0.86 for G7 in this case), compare it with the highest ranking score among the repressors (0.97 for G5), and select the gene with the higher value (G5) to be the new landmark gene. The underlying assumption is that the gene that is most activated or repressed is selected as the next landmark gene, whether it is an activator or a repressor. With the new landmark gene G5, we have a new correlation signature table, Table 1.4. In Table 1.4, L1(R) and L2(R) mean that L1 and L2 are repressors while L3(A) means that L3 is an activator. Average(R) represents the average for all the repressors while Average(A) represents the average for all the activators. Diff represents the difference between Average(A) and Average(R).


Table 1.4 The Correlation Signature with a New Landmark gene

            L1(R)   L2(R)   L3(A)   Average(R)   Average(A)   Diff
Sig(G2)     0.95    0.83    0.11    0.89         0.11         0.78
Sig(G3)     0       0.04    0.97    0.02         0.97         0.95
Sig(G5)     0.97    0.97    0       0.97         0            0.97
Sig(G6)     0.04    0       0.97    0.02         0.97         0.95
Sig(G7)     0.21    0.08    0.87    0.14         0.87         0.72
Sig(G9)     0.90    0.75    0.15    0.82         0.15         0.67
Sig(G10)    0.85    0.72    0.14    0.78         0.14         0.64

We will select the gene with the highest Diff value to be the next landmark gene (except the ones that are already in the landmark gene set). This means we select the one that is mutually regulated by the activating genes and repressing genes to the greatest degree. In this example, it is G2.
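The two selection steps described in this section can be sketched as follows. The first method picks the gene whose average signature is farthest from 0.5, which is equivalent to comparing 1 − (lowest activator score) with the highest repressor score; the second picks the gene with the largest absolute difference between Average(A) and Average(R). Both skip genes that are already landmarks. The class LandmarkSelection and its method names are hypothetical, and the input values are taken from Tables 1.2 and 1.4.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch of choosing the next landmark gene (names are hypothetical). */
class LandmarkSelection {

    /** Returns the non-landmark gene that is most strongly activated or repressed. */
    static String strongestRegulated(String[] genes, double[] average, Set<String> landmarks) {
        String best = null;
        double bestScore = -1;
        for (int i = 0; i < genes.length; i++) {
            if (landmarks.contains(genes[i])) continue;
            // a low average (activated) is mirrored to 1 - average before comparing
            double score = Math.max(average[i], 1.0 - average[i]);
            if (score > bestScore) {
                bestScore = score;
                best = genes[i];
            }
        }
        return best;
    }

    /** Returns the non-landmark gene with the largest |Average(A) - Average(R)|. */
    static String largestDiff(String[] genes, double[] avgAct, double[] avgRep, Set<String> landmarks) {
        String best = null;
        double bestDiff = -1;
        for (int i = 0; i < genes.length; i++) {
            if (landmarks.contains(genes[i])) continue;
            double diff = Math.abs(avgAct[i] - avgRep[i]);
            if (diff > bestDiff) {
                bestDiff = diff;
                best = genes[i];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] genes = {"G2", "G3", "G5", "G6", "G7", "G9", "G10"};
        double[] average = {0.89, 0.02, 0.97, 0.02, 0.14, 0.82, 0.78};
        Set<String> landmarks = new HashSet<>(Arrays.asList("G3", "G6"));
        System.out.println(strongestRegulated(genes, average, landmarks));
    }
}
```

A threshold δ on the difference, as used in the algorithm listing, could be added to largestDiff by ignoring genes whose difference falls below δ.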


Constructing Gene Regulatory Network:
We continue this procedure until all the genes have either been added to the landmark gene set or excluded from the working microarray data set. At the end of the loop, we obtain a gene activating/repressing relation graph. This is illustrated in Figure-1, where solid lines and dashed lines represent activating and repressing relationships, respectively. Note that we can use δ as a threshold for Diff to limit the final GRN size. That is, we can reduce the GRN size by removing, in each loop, the genes whose Diff value is below the threshold.


GRN-SigCalc can be verified by constructing GRNs from training data sets and testing the activating/repressing relationships against the testing data sets. When multiple data sets are available, we can split these sets into several groups and construct GRNs separately.

CHAPTER 3

CONCLUSION

This project constructs GRNs using microarray data sets from heterogeneous data sources. Since the microarray data sources are obtained from different organisms and differ in sample backgrounds, quality control standards, microarray technologies, etc., the integration of the data is a difficult task. On the other hand, due to the curse of dimensionality, GRNs constructed from a single microarray experiment are often not convincing and lack a robust statistical basis. Although the data from different sources may vary greatly, the relationships between the genes presumably remain. GRN-SigCalc integrates multiple microarray data sets to construct GRNs using the correlation signature concept. In addition, this method utilizes the sparsity of GRNs to remove from the network the genes that are not highly correlated with other genes. The resulting GRNs are compact and can represent both activating and repressing relationships. The size of the GRNs can be further reduced by increasing the threshold values θ and δ in the algorithm. GRN-SigCalc can be verified by constructing GRNs from training data sets and testing the activating/repressing relationships against the testing data sets. When multiple data sets are available, we can split these sets into several groups and construct GRNs separately.

FUTURE WORK:
The GRN-SigCalc method can be improved by importing the resulting GRNs into a neural network and evolving the neural network to achieve higher accuracy. This is especially useful when noise in the data sets is unavoidable.

APPENDICES

A.1 SOURCE CODE

SigcalcFirstServlet.java

import java.io.*;
import java.lang.reflect.Array;
import java.net.*;
import java.util.ArrayList;
import javax.servlet.*;
import javax.servlet.http.*;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.*;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

/**
 *
 * @author
 * @version
 */
public class SigcalcFirstServlet extends HttpServlet {

    static int len=0;
    static int no_of_row=0;

    /** Processes requests for both HTTP <code>GET</code> and <code>POST</code> methods.
     * @param request servlet request
     * @param response servlet response
     */
    protected void processRequest(HttpServletRequest request, HttpServletResponse response)
    throws ServletException, IOException {
        response.setContentType("text/html;charset=UTF-8");
        PrintWriter out = response.getWriter();
        //String FileName="C:/Documents and Settings/mekala.FOCUS/Desktop/Book1.xls";
        String FileName=request.getParameter("fname");
        int sno=Integer.parseInt(request.getParameter("sno"));
        sno=sno-1;
        try {
            HttpSession ses=request.getSession();
            ses.setAttribute("FILENAME",FileName);
            ses.setAttribute("SNO",String.valueOf(sno));
            InputStream in= new FileInputStream(FileName);
            POIFSFileSystem fs=new POIFSFileSystem(in);
            HSSFWorkbook wb=new HSSFWorkbook(fs);
            HSSFSheet sheet=wb.getSheetAt(sno);
            ArrayList al=new ArrayList();
            int no_of_rows=sheet.getPhysicalNumberOfRows();
            System.out.println("NUMBER OF ROWS"+no_of_rows);
            HSSFRow row;
            HSSFCell cell;
            String s;
            int rows;
            rows = sheet.getPhysicalNumberOfRows();
            if(rows>0) {
                int cols = 0;
                int tmp = 0;
                // To find number of columns in data sheet
                for(int i = 0; i < rows; i++) {
                    row = sheet.getRow(i);
                    if(row != null) {
                        tmp = sheet.getRow(i).getPhysicalNumberOfCells();
                        if(tmp > cols) cols = tmp;
                    }
                }
                int k=0;
                // Let len be number of samples in Micro array Dataset
                len=cols-1;
                System.out.println("len: "+len+" row :"+rows);
                // To retrieve the values row by row
                for(int r = 0; r < rows; r++) {
                    if(r==1) {
                        System.out.println(" ");
                        System.out.println("-------------------------------------------");
                    }
                    row = sheet.getRow(r);
                    System.out.println(" ");
                    // To retrieve the value column by column from each Cell
                    if(row != null) {
                        for(int c = 0; c < cols; c++) {
                            if(c==1) {
                                System.out.print(" | \t");
                            }
                            cell = row.getCell((short)c);
                            if(cell != null) {
                                if(cell.getCellType() ==HSSFCell.CELL_TYPE_STRING) {
                                    String cellvalue=cell.getStringCellValue();
                                    System.out.print(cellvalue+"\t");
                                    if(r!=0 && c!=0) {
                                        // values are added to al arraylist
                                        al.add(cellvalue);
                                    }
                                } else if (cell.getCellType() == HSSFCell.CELL_TYPE_NUMERIC) {
                                    double cellvalue1=cell.getNumericCellValue();
                                    System.out.print(cellvalue1+"\t");
                                    if(r!=0 && c!=0) {
                                        // values are added to al arraylist
                                        al.add(new Double(cellvalue1));
                                    }
                                }
                            } else {
                                System.out.println("*********Null Value:"+"row= "+r+"Col= " +c);
                            }
                        }
                        k++;
                    } else {
                        rows++;
                    }
                }
                System.out.println("\n"+al+"\n"+al.size());
                request.setAttribute("ALLVALUES",al);
                request.setAttribute("col",String.valueOf(len));
            }
            System.out.println(" Page Forwarded ");
            RequestDispatcher rd=getServletContext().getRequestDispatcher("/sigcalcfirst.jsp");
            rd.forward(request,response);
        } catch(Exception e) {
            System.out.print("Sigcalc First Servlet Exception:"+e);
        }
        out.close();
    }

    // <editor-fold defaultstate="collapsed" desc="HttpServlet methods. Click on the + sign on the left to edit the code.">
    /** Handles the HTTP <code>GET</code> method.
     * @param request servlet request
     * @param response servlet response
     */
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
    throws ServletException, IOException {
        processRequest(request, response);
    }

    /** Handles the HTTP <code>POST</code> method.
     * @param request servlet request
     * @param response servlet response
     */
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
    throws ServletException, IOException {
        processRequest(request, response);
    }

    /** Returns a short description of the servlet.
     */
    public String getServletInfo() {
        return "Short description";
    }
    // </editor-fold>
}

GeneRegulatoryNetworkServlet.java

import java.awt.*;
import java.awt.geom.Rectangle2D;
import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
import java.util.Arrays;
import java.util.ArrayList;
import Acme.JPM.Encoders.GifEncoder;

/**
 *
 * @author
 * @version
 */
public class GeneRegulatoryNetworkServlet extends HttpServlet {

    Frame frame = null;
    Graphics g = null;
    Graphics g1 = null;

    public void init(ServletConfig config) throws ServletException {
        super.init(config);
        // Construct a reusable unshown frame
        frame = new Frame();
        frame.addNotify();
    }

    protected void processRequest(HttpServletRequest req, HttpServletResponse res)
    throws ServletException, IOException {
        ServletOutputStream out = res.getOutputStream();
        try {
            System.out.println("GeneRegulatoryNetworkServlet");
            HttpSession ses=req.getSession();
            ArrayList lmg=(ArrayList)ses.getAttribute("LANDMARK1");
            ArrayList lmg1=(ArrayList)ses.getAttribute("LANDMARK0");
            ArrayList lg=new ArrayList();
            ArrayList ag=(ArrayList)ses.getAttribute("Activator");
            ArrayList rg=(ArrayList)ses.getAttribute("Repressor");
            ArrayList xa=new ArrayList();
            ArrayList ya=new ArrayList();
            ArrayList xr=new ArrayList();
            ArrayList yr=new ArrayList();
            ArrayList act=new ArrayList();
            ArrayList rep=new ArrayList();
            System.out.println("LANDMARK1:"+lmg);
            System.out.println("LANDMARK0:"+lmg1);
            lg=lmg1;
            int si=ag.size()+rg.size();
            int gm=0,rm=0,ym = 0;
            // Get the image location from the path info
            String url= getServletContext().getRealPath("/");
            System.out.println("URL:"+url);
            String rp=java.io.File.separator+"build";
            url=url.replace(rp,"");
            System.out.println("URL:"+url);
            String ysource = url+"images"+java.io.File.separator+"y.GIF";
            String gsource = url+"images"+java.io.File.separator+"g1.GIF";
            String rsource = url+"images"+java.io.File.separator+"r1.GIF";
            System.out.println("ysource:"+ysource);
            System.out.println("gsource:"+gsource);
            System.out.println("rsource:"+rsource);
            if (ysource == null) {
                throw new ServletException("Extra path information " + "must point to an image");
            }
            // Load the image (from bytes to an Image object)
            MediaTracker mt = new MediaTracker(frame); // frame acts as ImageObserver
            Image yimage = Toolkit.getDefaultToolkit().getImage(ysource);
            Image rimage = Toolkit.getDefaultToolkit().getImage(rsource);
            Image gimage = Toolkit.getDefaultToolkit().getImage(gsource);
            mt.addImage(yimage, 0);
            mt.addImage(rimage, 1);
            mt.addImage(gimage, 2);
            try {
                mt.waitForAll();
            } catch (InterruptedException e) {
                getServletContext().log(e, "Interrupted while loading image");
                throw new ServletException(e.getMessage());
            }
            // Construct a matching-size off screen graphics context
            int w = yimage.getWidth(frame);
            int h = yimage.getHeight(frame);
            int w1 = gimage.getWidth(frame);
            int h1 = gimage.getHeight(frame);
            int w2 = rimage.getWidth(frame);
            int h2 = rimage.getHeight(frame);
            //frame.setBackground(Color.blue);
            Image offscreen = frame.createImage(800,900);
            g = offscreen.getGraphics();
            g1=offscreen.getGraphics();
            System.out.println("image width, height:"+w+":"+h);
            System.out.println("image width, height:"+w1+":"+h1);
            System.out.println("image width, height:"+w2+":"+h2);
            // Draw the image to the off-screen graphics context
            int count=1;
            //-----------------> display the activator and repressor in color genes page 20
            g.setFont(new Font("arial", Font.BOLD , 17));
            g.setColor(Color.RED);
            g.drawString("Gene Regulatory Network", 200, 30);
            g.setColor(Color.DARK_GRAY);
            g.drawString("LandMark Gene", 30, 60);
            g.setFont(new Font("arial", Font.BOLD , 12));
            int j=0;
            int count1=0;
            int x1,x2,y1,y2;
            int y3=0;
            // Landmark genes: five icons per row, labelled with the gene number
            for(int i=0;i<lg.size();i++) {
                if(count1 ==5) {
                    y3=y3+33;
                    j=0;
                    count1=0;
                }
                x1=(30+ (j*35));
                y1=80+y3;
                x2=(30+ (j*35)+(w/2) - 7);
                y2=(80+h/2)+3+y3;
                g.drawImage(yimage, x1, y1, frame);
                g.drawString("G"+lg.get(i).toString(),x2,y2);
                count1++;
                j++;
                ym=y2;
            }
            System.out.println("YMax:"+ym);
            g.setFont(new Font("arial", Font.BOLD , 17));
            g.drawString("Activator Gene", 250, 60);
            g.setFont(new Font("arial", Font.BOLD , 12));
            j=0;
            count1=0;
            y3=0;
            // Activator genes: same grid layout, second column block
            for(int i=0;i<ag.size();i++) {
                if(count1 ==5) {
                    y3=y3+33;
                    j=0;
                    count1=0;
                }
                x1=(250+ (j*35));
                y1=80+y3;
                x2=(250+ (j*35)+(w1/2) - 7);
                y2=(80+h1/2)+3+y3;
                g.drawImage(gimage, x1,y1, frame);
                g.drawString("G"+ag.get(i).toString(),x2,y2);
                count1++;
                j++;
                gm=y2;
            }
            System.out.println("GMax:"+gm);
            g.setFont(new Font("arial", Font.BOLD , 17));
            g.drawString("Repressor Gene", 470, 60);
            g.setFont(new Font("arial", Font.BOLD , 12));
            j=0;
            count1=0;
            y3=0;
            // Repressor genes: same grid layout, third column block
            for(int i=0;i<rg.size();i++) {
                if(count1 ==5) {
                    y3=y3+33;
                    j=0;
                    count1=0;
                }
                x1=(470+ (j*35));
                y1=80+y3;
                x2=(470+ (j*35)+(w2/2) - 7);
                y2=(80+h2/2)+3+y3;
                g.drawImage(rimage, x1,y1, frame);
                g.drawString("G"+rg.get(i).toString(),x2,y2);
                count1++;
                j++;
                rm=y2;
            }
            System.out.println("RMax:"+rm);
            // Draw the network caption below the last row of landmark genes
            int max= ym+50;
            g.setFont(new Font("arial", Font.BOLD , 15));
            g.setColor(Color.blue);
            g.drawString("The Inference of GRN",150,max);
            g.setColor(Color.DARK_GRAY);
            g.setFont(new Font("arial", Font.BOLD , 12));

A.2 SNAP SHOTS

INPUT SCREEN

Loading Microarray Dataset

Reading Gene values from MicroArray Dataset

REFERENCES

[1] Alter, O., Brown, P., and Botstein, D. (2000) Singular value decomposition for genome-wide expression data processing and modeling. Proceedings of the National Academy of Sciences of the United States of America, 97(18), 10101–10106.
[2] Chen, J., Wu, R., Yang, P., Huang, J., Sher, Y., Han, M., Kao, W., Lee, P., Chiu, T., Chang, F., Chu, Y., Wu, C., and Peck, K. (1998) Profiling expression patterns and isolating differentially expressed genes by cDNA microarray system with colorimetry detection. Genomics, 51, 313–324.
[3] Eisen, M., Spellman, P., Brown, P., and Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America, 95(25), 14863–14868.
[4] Filkov, V. and Skiena, S. (2004) Integrating microarray data by consensus clustering. International Journal on Artificial Intelligence Tools, 13(4), 863–880.
[5] Goh, L. and Kasabov, N. (2003) Integrated gene expression analysis of multiple microarray data sets based on a normalization technique and on adaptive connectionist model. IEEE Proceedings, IJCNN’2003, Vol. 3, 1724–1728.
[6] Kang, J., Yang, J., Xu, W., and Chopra, P. (2005) Integrating heterogeneous microarray data sources using correlation signatures. In Proceedings of Data Integration in the Life Sciences, Second International Workshop (DILS 2005), 105–120.
[7] Kauffman, S. (1996) At Home in the Universe: The Search for Laws of Self-Organization and Complexity. Oxford University Press.
[8] Keedwell, E. and Narayanan, A. (2005) Discovering gene networks with a neural-genetic hybrid. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2(3), 231–242.
[9] Schena, M., Shalon, D., Heller, R., Chai, A., Brown, P., and Davis, R. (1996) Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proceedings of the National Academy of Sciences of the United States of America, 93(20), 10614–10619.
[10] Thieffry, D., Huerta, A. M., Pérez-Rueda, E., and Collado-Vides, J. (1998) From specific gene regulation to genomic networks: a global analysis of transcriptional regulation in Escherichia coli. BioEssays, 20(5), 433–440.
[11] Wang, Y., Joshi, T., Zhang, X.-S., Xu, D., and Chen, L. (2006) Inferring gene regulatory networks from multiple microarray datasets. Bioinformatics, 22(19), 2413–2420.