
BIOINFORMATICS APPLICATIONS NOTE
Vol. 19 no. 17 2003, pages 2332–2333
DOI: 10.1093/bioinformatics/btg321

MetaGeneAlyse: analysis of integrated transcriptional and metabolite data

Carsten O. Daub1,∗, Sebastian Kloska2 and Joachim Selbig1
1 Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Golm, Germany and 2 Scienion AG, Volmerstraße 7a, 12489 Berlin, Germany

Received on March 26, 2003; revised on May 22, 2003; accepted on June 5, 2003

Downloaded from https://academic.oup.com/bioinformatics/article/19/17/2332/206374 by guest on 07 May 2021


∗ To whom correspondence should be addressed.
Bioinformatics 19(17) © Oxford University Press 2003; all rights reserved.

ABSTRACT
Summary: New techniques in sample preparation allow high-throughput analysis of samples on the transcriptional as well as on the metabolic level. We present a service accessible via the web that allows the analysis of integrated data sets that combine gene-expression data and metabolic data. After uploading, data sets can be normalized and clustered by various methods, and the results can be graphically visualized. All calculations are carried out on a server, so even time- and memory-consuming analyses can be done independently of the performance of the client.
Availability: The service is accessible via web-interface at http://metagenealyse.mpimp-golm.mpg.de/
Contact: daub@mpimp-golm.mpg.de

INTRODUCTION
Highly powerful experimental techniques allow large-scale and parallel profiling of biological samples. Profiling on the transcriptomic level is done using cDNA or oligonucleotide hybridization arrays, which provide information on the concentrations of mRNA (Baldwin et al., 1999). These microarray experiments are used to identify differentially expressed genes, to make functional annotations of co-expressed genes and to reconstruct regulatory networks. Recently, large-scale analysis of metabolite levels became available, which gives additional insight into the internal state of the biochemical system (Fiehn et al., 2000). Metabolic properties have been used to distinguish different genotypes (Taylor et al., 2002). Additionally, only integrated analysis of metabolites and transcripts will reveal regulatory properties on a systems level (Weckwerth and Fiehn, 2002; Weckwerth, 2003). Results from both expression data and metabolite data can be used to elucidate different levels of cellular regulation, leading to mechanistic insights into genetic and physiological control. In one example, the identification of specific genes required for the biosynthesis of a secondary metabolite that is used as a clinical agent showed the importance of the integrated analysis (Askenazi et al., 2003).

OVERVIEW
We have developed a web-based service which allows the analysis of integrated data sets containing gene-expression data and metabolite data. At the web front-end, a tab-separated data set can be uploaded. The file format is limited to an ASCII text table containing genes and/or metabolites in rows and experiments in columns. Data normalization is an essential preprocessing step in the comparison of transcriptional and metabolic data, since both technologies provide different measurement categories. We therefore implemented several normalization algorithms like normalization to the maximum, mean, median, variance, standard deviation, vector norm, root mean square, z-score, and others. The normalized data set can be analysed directly with k-means clustering and principal component analysis. Different types of hierarchical clustering, which are based on the prior calculation of a distance matrix, are also implemented. The underlying distance matrix is then calculated as a first step using a distance measure (e.g. Euclidean or Manhattan distance, Pearson’s correlation coefficient or mutual information). The distance matrices can also be downloaded to be post-processed by the user. Different file formats for the download are available to import the distance matrix directly into graph visualization tools. Analyses on shuffled data sets are useful for the calculation of significance levels; our service therefore allows the shuffling of uploaded data sets. Documentation is available in PDF format and can be downloaded from the web site.

IMPLEMENTATION
MetaGeneAlyse is a service that runs on a multiprocessor Linux server under the web-server Apache. It consists of a compilation of Perl scripts for the dynamic generation of web pages and performance-optimized C++ programs for time-consuming calculations. For the graphical visualization of clustering results we use the statistics package R (http://www.r-project.org/). The generated figures can be displayed and downloaded in JPEG and PDF format. Time-consuming calculations of distance matrices are handled by a job queuing system, whereas smaller jobs are calculated on-the-fly. This job management system allows the user to upload and analyse large data files containing up to 6000 genes and metabolites.

Freely available tools for the exploration of microarray data which run under Java on the user-side have been reported (Dysvik and Jonassen, 2001; Sturn et al., 2002). An advantage of our service is the calculation of large distance matrices, which becomes computationally expensive and requires a large working memory for larger data sets. Such data sets could not be analysed with software running on client-side workstations.

The analysis of a data set consists of several steps from normalization to the choice of a distance measure and a clustering method. Each combination of these algorithms will tend to emphasize different types of regulation inherent in the data. The comparison of results obtained from different combinations might give additional complementary information. For instance, the results obtained by hierarchical clustering based on different similarity measures like the Pearson correlation coefficient and mutual information can be compared to detect potential non-linear correlations in the data. Also, after different normalization steps, the same clustering algorithms can be applied to analyse the impact of normalization on the data.

REFERENCES
Askenazi,M., Driggers,E.M., Holtzmann,A.D., Norman,C.T., Iverson,S., Zimmer,P.D., Boers,M., Blomquist,R.P., Martinez,J.E., Monreal,W.A. et al. (2003) Integrated transcriptional and metabolite profiles to direct the engineering of lovastatin-producing fungal strains. Nat. Biotech., 21, 150–156.
Baldwin,D., Crane,V. and Rice,D. (1999) A comparison of gel-based, nylon filter and microarray techniques to detect differential RNA expression in plants. Curr. Opin. Plant Biol., 2, 96–103.
Dysvik,B. and Jonassen,I. (2001) J-express: exploring gene expression data using Java. Bioinformatics, 17, 369–370.
Fiehn,O., Kopka,J., Dörmann,P., Altmann,T., Trethewey,N.R. and Willmitzer,L. (2000) Metabolite profiling for plant functional genomics. Nat. Biotech., 18, 1157–1161.
Sturn,A., Quackenbush,J. and Trajanoski,Z. (2002) Genesis: cluster analysis of microarray data. Bioinformatics, 18, 207–208.
Taylor,J., King,R.D., Altmann,T. and Fiehn,O. (2002) Application of metabolomics to plant genotype discrimination using statistics and machine learning. Bioinformatics, 18, 241S–248S.
Weckwerth,W. (2003) Metabolomics in systems biology. Annu. Rev. Plant. Biol., 54, 669–689.
Weckwerth,W. and Fiehn,O. (2002) Patent PCT/EP03/00196.
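
ILLUSTRATIVE SKETCH
The steps described in the text (normalization, choice of a distance measure, shuffling for significance estimates) can be sketched in a few lines of Python. This is not the MetaGeneAlyse code, which the text describes as Perl scripts and C++ programs. All function names, the toy data, the use of the population standard deviation in the z-score, the conversion of Pearson's r to a distance as 1 - r, and the row-wise shuffling scheme are assumptions made for illustration only.

```python
import math
import random
import statistics


def z_score(profile):
    """Rescale one row (a gene or metabolite) to zero mean and unit standard deviation."""
    mean = statistics.fmean(profile)
    sd = statistics.pstdev(profile)  # population SD; an assumption of this sketch
    if sd == 0:
        raise ValueError("a constant profile cannot be z-scored")
    return [(x - mean) / sd for x in profile]


def euclidean(a, b):
    """Euclidean distance between two profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a, b):
    """1 - r, so perfectly correlated profiles are at distance 0 (assumed convention)."""
    za, zb = z_score(a), z_score(b)
    r = sum(x * y for x, y in zip(za, zb)) / len(za)
    return 1.0 - r


def distance_matrix(rows, dist):
    """Full pairwise distance matrix over all rows."""
    n = len(rows)
    return [[dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


def shuffle_rows(rows, seed=0):
    """Permute the values within each row independently (one possible shuffling scheme)."""
    rng = random.Random(seed)
    return [rng.sample(row, len(row)) for row in rows]


# Hypothetical toy data: rows are genes/metabolites, columns are experiments.
gene = [1.0, 2.0, 3.0, 4.0]
metabolite = [100.0, 200.0, 300.0, 400.0]  # same shape, different measurement scale
opposite = [4.0, 3.0, 2.0, 1.0]
rows = [gene, metabolite, opposite]

raw = distance_matrix(rows, euclidean)
normalized = distance_matrix([z_score(r) for r in rows], euclidean)
correlation = distance_matrix(rows, pearson_distance)
shuffled = shuffle_rows(rows, seed=1)
null = distance_matrix(shuffled, pearson_distance)
```

On the raw values the gene and the metabolite are far apart in Euclidean distance purely because of their measurement scales, whereas after z-score normalization their profiles coincide. Distances computed on the shuffled rows give a rough reference against which observed distances can be compared.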
