
R-025

Use of the Mantel test for analysis of functional gene array data from environmental samples
Sanghoon Kang1,3, Joon Jin Song2, Gene S. Wickham1, Christopher S. Schadt1 and Jizhong Zhou3
1Oak Ridge National Laboratory, Oak Ridge, TN; 2University of Arkansas, Fayetteville, AR; 3University of Oklahoma, Norman, OK

ABSTRACT

Functional gene arrays (FGAs) are a powerful tool for monitoring microbial communities of environmental samples in a high-throughput fashion, but analyzing massive microarray data to infer the relationships between microbial communities and ecosystem functional processes is very difficult. To address this difficulty, the Mantel test, which is an effective approach for determining the association between different dissimilarity matrices, was evaluated with microarray data from different types of samples. The method is simple but efficient for spatial structure analysis when used with geographic distance matrices. In particular, the Mantel test can provide quantitative results when parametric methods cannot be used because of a lack of replication. We applied this method to analyze the spatial relationship between microbial community structure and ecosystem functional processes in forest soils. A total of 25 samples collected from two 1-km transects crossing at the center were analyzed for soil characteristics and microbial functions. Suites of measurements were summarized into a single dissimilarity matrix each for the soil and microbial portions, and compared with the distance matrix by the Mantel test. In this way, the overall correlation between the multivariate data of soil characteristics and microbial functions was determined (P = 0.631). When compared with the distance matrix, the Mantel test indicated weak spatial autocorrelation in microbial functions (P = 0.532) and strong spatial autocorrelation in soil characteristics (P = 0.002). Leading applications of the Mantel test are inconsistent in their significance tests, and the difference stems from a misunderstanding of the behavior of Mantel's rM in comparison with Pearson's r. We performed a simulation study in univariate and bivariate cases and concluded that the Mantel test is a one-tailed test in the upper tail of the reference distribution; thus the R packages ade4, ecodist and vegan are acceptable in that sense for significance testing.

FUNCTIONAL GENE ARRAY

                                 Number of Probes
Gene Category                  Unique    Group     Total
Carbon Degradation              2,532      276     2,808
Carbon Fixation                   584      215       799
Metal Resistance/Reduction      4,039      507     4,546
Methane/Methanogenesis            437      333       770
Nitrogen Fixation               1,225        0     1,225
Nitrogen Metabolism               865      902     1,767
Nitrogen Reduction              1,805      501     2,306
Organic Contaminant             6,920    1,087     8,007
Perchlorate Remediation            21        0        21
Sulfur Reduction                1,286      329     1,615
Total                          19,714    4,150    23,864

Table 1. Overall summary of FGAII probes. This probe set targets 10,498 environmental functional genes.

Fig 1. Schematic diagram of FGAII hybridization (diagram labels: label nucleic acids, DNA or RNA, with a fluorescent molecule; hybridize to array). Nucleic acid from environmental samples is prepared, labeled with fluorescent dye (Cy5 or Cy3) and hybridized with FGAII. Washing after incubation removes all unhybridized DNA, and the array output is pre-processed for downstream analysis.

Fig 2. An image of one subgrid of FGAII hybridized with a forest soil sample from Oak Ridge Reservation. One subgrid contains 576 spots (24 × 24) and is one of 48 subgrids on the array (4 × 12).

FGAII is the second-generation, custom-designed 50-mer oligonucleotide microarray covering 10,498 environmental genes with multiple probes (~69% with 3 or more). This major update from the first generation enables very extensive investigation of environmental functions from a single experiment. Co-developed and adapted procedures for DNA preparation and hybridization (e.g., rolling circle amplification and a hybridization machine) overcame obstacles related to sensitivity and specificity.

The remaining issues relate mainly to pre-processing of hybridization data and effective statistical analysis of the data to their full extent. Efforts have been made to develop and adapt ideas and methods from the fields of genomic microarray statistics, multivariate statistics, vegetation ecology, mathematical ecology, geostatistics and machine learning.

MANTEL TEST FOR FGA DATA

Quantified data from the scanned image of an FGAII hybridization go through a series of pre-processing steps before downstream data analysis. Typical pre-processing involves signal-to-noise ratio detection, normalization and outlier detection. Pre-processed data have the conventional format of multivariate data, but with a much larger dimension in the species direction (~1,000 genes or so).

One of the most important uses of FGAII data is comparison with actual environmental measurements, e.g., as an entire group and as specific groups expected to be related. The subsequent analysis would be a selection of important (highly correlated) functional gene(s) and environmental measurement(s) by means of vector fitting, Bio-Env and constrained ordination (example in poster R-024).

The Mantel test is a multivariate correlation analysis that compares dissimilarity matrices of the same dimension. Dissimilarity matrices constructed from the multivariate FGAII data and from a suite of soil characteristic measurements are therefore perfect candidates for the test. The standardized Mantel statistic (rM) is bounded between -1 and 1:

$$ r_M = \frac{1}{d-1} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( \frac{x_{ij} - \bar{x}}{s_x} \right) \left( \frac{y_{ij} - \bar{y}}{s_y} \right), \qquad d = \frac{n(n-1)}{2} $$

where i and j are row and column indices, n is the number of samples, and d is the number of elements in the upper triangle of each matrix.

Since the Mantel test is not based on any distribution and the elements of the matrices are not independent, the significance test is performed by random permutations and the location of the original Mantel statistic on the reference distribution. Like Pearson's product-moment correlation analysis, the Mantel test can be performed in partial form; for example, the distance effect (spatial autocorrelation) among sampling locations can be controlled by partialling out the distance matrix.

CASE STUDY

A deciduous forest at Oak Ridge Reservation (ORR) was sampled along two transects crossing at the center to study the spatial distribution of the relationship between microbial functions and soil characteristics. Each sample was analyzed for a suite of soil characteristics (e.g., N fixation rate, total C, respiration rate) and with FGAII. The output multivariate data were converted into dissimilarity matrices by using Euclidean distance (soil characteristics) and the Sørensen index (FGAII at the functional gene variant and functional gene levels). Richness and diversity indices were also converted into dissimilarity matrices by using Euclidean distance. The Mantel test was performed with the function "mantel" of the ecodist (v.1.01) and vegan (v.1.6-10) packages of R.

Fig 3. Map of sampling locations at ORR.

                     Community          Geochemistry       Distance
Community     fgv    -                  0.0824 (0.2410)    0.0735 (0.2710)
              fg     -                  0.2434 (0.0489)    0.1391 (0.1140)
Geochemistry  fgv    0.0554 (0.3140)    -                  0.4510 (0.0076)
              fg     0.2694 (0.0464)    -                  0.4510 (0.0076)
Distance      fgv    0.1122 (0.1790)    0.4477 (0.0087)    -
              fg     0.2622 (0.0460)    0.4633 (0.0061)    -

Table 2. Summary of the simple and partial Mantel tests among microbial functions (community), soil characteristics (geochemistry) and distance. Cells below the diagonal are results of partial Mantel tests holding constant the component not corresponding to the cell. Values in parentheses are P values; significant correlations (P < 0.05) are identified by bold font. fgv: functional gene variants. fg: functional genes.

A. Functional gene variants
                      Richness                 Diversity                Distance
Richness              -                                                 r = 0.1181, P = 0.118
Diversity             r = 0.9079, P < 0.001    -                        r = 0.1624, P = 0.129
Community structure   r = 0.5797, P < 0.001    r = 0.5983, P < 0.001    r = 0.07347, P = 0.271

B. Functional genes
                      Richness                 Diversity                Distance
Richness              -                                                 r = 0.1203, P = 0.177
Diversity             r = 0.9742, P < 0.001    -                        r = 0.09486, P = 0.236
Community structure   r = 0.7955, P < 0.001    r = 0.8525, P < 0.001    r = 0.1391, P = 0.114

Table 3. Summary of the Mantel test among different microbial community indices and distance for functional gene variants (A) and functional genes (B).

Results with functional gene variants were so noisy that no clear relationship was observed, but the results were much clearer with functional gene level data, which are basically the summation of functional gene variants. Comparison with the partial Mantel tests suggests that every component contributed somewhat negatively to the correlation between the other two components. For example, this would indicate weak negative spatial autocorrelation in the relationship between soil microbial functional genes (their composition) and geochemical measurements. Further analysis with the Mantel correlogram confirmed the existence of negative autocorrelation, especially at closer distances.

All summary indices of the microbial community were highly spatially autocorrelated at both the functional gene variant and functional gene levels. As expected, the summary indices were significantly correlated with one another.

SIMULATION

Fig 4. Scattergram between simulated univariate data (X-Y) and their distance matrices (dX-dY) under three scenarios (Pearson's r = 0.99, r = 0 and r = -0.99).

The simulation indicates that the value of Mantel's rM corresponds almost perfectly to that of Pearson's r when r ≥ 0 and squared Euclidean distance is used (the data are also normally distributed). When r < 0, the two coefficients behave oppositely, in that negative r corresponds to positive rM. This suggests that although both coefficients are bounded between -1 and 1, a negative rM does not mean negative correlation but weaker correlation. As many authors note (e.g., Spatial Ecology, by Fortin & Dale), negative rM is not very common.

Fig 5. Scattergram between Pearson's r and Mantel's rM.

Five applications that perform the Mantel test were investigated: the R packages ade4 (mantel.randtest), ecodist (mantel) and vegan (mantel), The R Package by P. Legendre, and the program RT by B. Manly. There was no difference in performance among them, but the three R packages provide appropriate P values for the significance test. In fact, ecodist (mantel) provides all possible P values (two one-tailed and one two-tailed) for situations in which the user has a priori knowledge of the direction of the correlation. Legendre's The R Package automatically assumes negative correlation when rM < 0 and therefore tests significance in the lower tail of the reference distribution. This might lead to misinterpretation when the a priori sign of the correlation is not obvious, which is not uncommon.

Simulation was performed using mvrnorm from the MASS package of R.

SUMMARY

• FGAII is a new tool for rather comprehensive investigation of the environmental functions of microbial communities.
• The Mantel test can be used for correlation analysis between two multivariate data sets of microbial functions and environmental measurements.
• Results of the Mantel test might be further analyzed by using gradient analysis methods, causal modeling and correlogram analysis, according to the interest of researchers.
• The simulation study suggests that a negative Mantel's rM indicates lower correlation, not negative correlation, so the significance test should be done on the upper tail of the reference distribution.
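
The standardized Mantel statistic and its permutation-based, upper-tail significance test can be sketched in a few lines. The sketch below is not the ecodist, vegan or ade4 implementation; it is a minimal standard-library Python illustration, and the function names (`mantel_statistic`, `mantel_test`) and the toy transect data are hypothetical. It assumes square, symmetric dissimilarity matrices of the same dimension, and computes rM as the Pearson correlation of the upper-triangle elements.

```python
import math
import random


def upper_triangle(matrix):
    """Return the d = n(n-1)/2 elements above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n - 1) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_statistic(dx, dy):
    """Standardized Mantel statistic rM between two dissimilarity matrices."""
    return pearson(upper_triangle(dx), upper_triangle(dy))


def permute(matrix, order):
    """Reorder rows and columns of a square matrix with the same permutation."""
    return [[matrix[a][b] for b in order] for a in order]


def mantel_test(dx, dy, permutations=999, seed=0):
    """Return (rM, upper-tail P value) from random permutations of dx."""
    rng = random.Random(seed)
    observed = mantel_statistic(dx, dy)
    n = len(dx)
    at_least = 1  # the observed value is part of the reference distribution
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        if mantel_statistic(permute(dx, order), dy) >= observed:
            at_least += 1
    return observed, at_least / (permutations + 1)


# Hypothetical toy data: 8 sites along a transect and one measured variable.
positions = [0, 1, 2, 3, 4, 5, 6, 7]
values = [0.1, 2.2, 3.9, 6.1, 8.0, 9.8, 12.1, 14.0]
distance = [[abs(a - b) for b in positions] for a in positions]
dissimilarity = [[abs(a - b) for b in values] for a in values]

r_m, p_upper = mantel_test(distance, dissimilarity)
print(f"rM = {r_m:.3f}, upper-tail P = {p_upper:.3f}")
```

Rows and columns are permuted together so that each permuted matrix is still a valid dissimilarity matrix. The P value counts only permuted statistics at or above the observed one, which is the upper-tail test recommended in the summary.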

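
The comparison of Pearson's r with Mantel's rM under squared Euclidean distance can be illustrated with a small simulation. The poster's simulation used mvrnorm from the MASS package of R; the sketch below swaps in Python's standard library and draws bivariate normal pairs by hand, so the function name `simulate`, the sample size and the seed are assumptions made only for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(rho, n=60, seed=0):
    """Draw n bivariate normal pairs with correlation rho; return (r, rM).

    rM is computed between the squared Euclidean distances of X and of Y,
    taken over all n(n-1)/2 pairs of observations.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    r = pearson(xs, ys)
    dx = [(xs[i] - xs[j]) ** 2 for i in range(n - 1) for j in range(i + 1, n)]
    dy = [(ys[i] - ys[j]) ** 2 for i in range(n - 1) for j in range(i + 1, n)]
    return r, pearson(dx, dy)


# The three scenarios of Fig 4.
for rho in (0.99, 0.0, -0.99):
    r, r_m = simulate(rho)
    print(f"rho = {rho:+.2f}  Pearson r = {r:+.3f}  Mantel rM = {r_m:+.3f}")
```

In this sketch a strongly negative r gives a strongly positive rM, because the distances of X and Y grow together whatever the sign of the underlying correlation. That is the behavior behind testing significance in the upper tail only.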