Improved Statistical Test for Multiple-Condition Microarray Data Integrating ANOVA with Clustering

Jared Friedman¹

This research was done at The Rockefeller University and supervised by Brian Kirk²

1. 180 East End Ave, New York, NY, 10128. Email: vicvic@att.net. Phone: 212-249-0134
2. Email: KirkB@Rockefeller.edu. Phone: 212-327-7064
 
Summary
Microarrays are exciting new biological instruments. Microarrays promise to have important applications in many fields of biological research, but they are currently fairly inaccurate instruments, and this inaccuracy increases their expense and reduces their usefulness. This research introduces a new technique for analyzing data from some types of microarray experiments that promises to produce more accurate results. Essentially, the method works by identifying patterns in the genes.
Abstract
Motivation: Statistical significance analysis of microarray data is currently a pressing problem. A multitude of statistical techniques have been proposed, and in the case of simple two-condition experiments these techniques work well, but in the case of multiple-condition experiments there is additional information that none of these techniques take into account. This information is the shape of the expression vector for each gene, i.e., the ordered set of expression measurements, and its usefulness lies in the fact that genes that are actually affected biologically by some experimental circumstance tend to fall into a relatively small number of clusters. Genes that do appear to fall into these clusters should be selected for by the significance test; those that do not should be selected against.

Results: Such a test was successfully designed and tested using a large number of artificially generated data sets. Where the above assumption of the correlation between clustering and significance is true, this test gives considerably better performance than conventional measures. Where the assumption is not entirely true, the test is robust to the deviation.
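To make the setting concrete, the following is a minimal sketch of a conventional per-gene measure for a multiple-condition experiment: an ordinary one-way ANOVA F statistic, together with the expression vector taken here as the ordered list of per-condition means. It is not the test proposed in this paper. The function names, the data layout (one list of replicates per condition), and the toy numbers are all assumptions made for illustration.

```python
from statistics import mean


def expression_vector(groups):
    """Ordered per-condition mean expression for one gene (assumed layout)."""
    return [mean(g) for g in groups]


def anova_f(groups):
    """Ordinary one-way ANOVA F statistic for one gene.

    groups: list of conditions, each a list of replicate measurements.
    """
    k = len(groups)                    # number of conditions
    n = sum(len(g) for g in groups)    # total number of measurements
    grand = mean(x for g in groups for x in g)
    # Between-condition sum of squares: spread of condition means.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-condition sum of squares: replicate noise.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    if ms_within == 0:
        # Degenerate case: replicates identical within every condition.
        return float("inf") if ms_between > 0 else 0.0
    return ms_between / ms_within


# Hypothetical toy data: 4 conditions, 3 replicates each.
genes = {
    "flat_gene": [[1.0, 1.2, 0.8], [1.1, 0.9, 1.0],
                  [0.9, 1.0, 1.1], [1.0, 1.1, 0.9]],
    "responding_gene": [[1.0, 1.2, 0.8], [2.1, 1.9, 2.0],
                        [3.0, 3.2, 2.8], [2.0, 2.1, 1.9]],
}

for name, groups in genes.items():
    print(name, expression_vector(groups), anova_f(groups))
```

A measure of this kind scores each gene in isolation. It uses the size of the differences between conditions relative to replicate noise, but it ignores whether the shape of a gene's expression vector resembles that of other genes.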
Introduction
Microarrays are part of an exciting new class of biotechnologies that promise to allow monitoring of genome-wide expression levels (1). Although the technique presented in this paper is quite general and would, in principle, work with other types of data, it has been developed for microarray data. Microarray experiments work by testing expression levels of some set of genes in organisms exposed to at least two different conditions, and usually then trying to determine which (if any) of these genes are actually changing in response to the change in experimental condition (2). Unfortunately, the random variation in microarrays is often large relative to the changes one is trying to detect. This makes it very hard to eliminate false positives, genes whose random fluctuations make them erroneously seem to respond to the experimental condition, and false negatives, genes whose random fluctuations mask the fact that they are actually responding to the experimental condition. It is standard statistical practice to handle this problem by using replicates, i.e., several microarrays run in identical settings (1-9). Although replicates are highly effective, the expense of additional replicates makes it worthwhile to minimize the number needed by using more effective statistical tests.

When there are only two conditions in an experiment, the conventional statistical test for differential expression is the t-test (2,6). But Long et al. (5, also 6,7) realized that by running a separate t-test on each gene, valuable information is lost, namely the fact that the actual population standard deviation of all the genes is rather similar. Due to the small number of replicates in microarray experiments, there are usually a few genes that, by chance, appear to have extremely low standard deviations and thus register even a very small change as highly significant by a conventional t-test. In reality, and as additional replicates would confirm, the standard deviations of these genes are much higher, and the change is not at all significant. In response to this issue, Long et al. derived, using Bayesian statistics, a method of transforming standard deviations that, in essence, moves outlying standard deviations closer to the mean standard deviation. Unfortunately, the resulting algorithm, which they call Cyber-T, works only with two-condition experiments. Since this research is concerned with multiple-condition experiments, it was necessary to expand the method to work with this type of experiment, the standard statistical test for which is called ANOVA (ANalysis Of VAriance) (8). Here we report the successful extension of the Cyber-T algorithm to multiple-condition experiments, creating the algorithm we call Cyber-ANOVA.

In microarray experiments involving several conditions, the expression vector of each gene, formed by consecutive measurements of its expression level, has a distinctive shape, which contains useful information. The more conditions in the experiment (and thus numbers in the expression vector), the more distinctive the shape will be. Often biologists are interested in finding