Improved Statistical Test

Published by Jared Friedman
My Intel paper. Now it is finally published.

More info:

Published by: Jared Friedman on Oct 11, 2006
Copyright: Attribution Non-commercial







Improved Statistical Test for Multiple-Condition Microarray Data Integrating ANOVA with Clustering
Jared Friedman
This research was done at The Rockefeller University and supervised by Brian Kirk.
1. 180 East End Ave, New York, NY, 10128. Email: vicvic@att.net. Phone: 212-249-0134
2. Email: KirkB@Rockefeller.edu. Phone: 212-327-7064
Microarrays are exciting new biological instruments. Microarrays promise to have important applications in many fields of biological research, but they are currently fairly inaccurate instruments, and this inaccuracy increases their expense and reduces their usefulness. This research introduces a new technique for analyzing data from some types of microarray experiments that promises to produce more accurate results. Essentially, the method works by identifying patterns in the genes.
Motivation: Statistical significance analysis of microarray data is currently a pressing problem. A multitude of statistical techniques have been proposed, and in the case of simple two-condition experiments these techniques work well, but in the case of multiple-condition experiments there is additional information that none of these techniques take into account. This information is the shape of the expression vector for each gene, i.e., the ordered set of expression measurements, and its usefulness lies in the fact that genes that are actually affected biologically by some experimental circumstance tend to fall into a relatively small number of clusters. Genes that do appear to fall into these clusters should be selected for by the significance test; those that do not should be selected against.
Results: Such a test was successfully designed and tested using a large number of artificially generated data sets. Where the above assumption of the correlation between clustering and significance is true, this test gives considerably better performance than conventional measures. Where the assumption is not entirely true, the test is robust to the deviation.
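To make the terms above concrete, the following is a minimal sketch of the expression vector of one gene and of the conventional one-way ANOVA F statistic computed for that gene on its own. It is an illustration only, not the test developed in this paper: the function names and the input layout (one list of replicate measurements per condition) are assumptions made for the example, and no clustering information is used.

```python
from statistics import mean


def expression_vector(replicates_by_condition):
    """Ordered per-condition mean expression for one gene.

    replicates_by_condition is a hypothetical layout: one inner list of
    replicate measurements for each experimental condition, in order.
    """
    return [mean(reps) for reps in replicates_by_condition]


def anova_f(replicates_by_condition):
    """Conventional one-way ANOVA F statistic for a single gene.

    Assumes at least two conditions, more measurements than conditions,
    and nonzero within-condition variation (otherwise this divides by zero).
    """
    k = len(replicates_by_condition)
    n_total = sum(len(reps) for reps in replicates_by_condition)
    grand_mean = sum(sum(reps) for reps in replicates_by_condition) / n_total
    cond_means = expression_vector(replicates_by_condition)

    # Variation of the condition means around the grand mean.
    ss_between = sum(
        len(reps) * (m - grand_mean) ** 2
        for reps, m in zip(replicates_by_condition, cond_means)
    )
    # Variation of the replicates around their own condition mean.
    ss_within = sum(
        (x - m) ** 2
        for reps, m in zip(replicates_by_condition, cond_means)
        for x in reps
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


if __name__ == "__main__":
    # Made-up measurements: three conditions, two replicates each.
    responding_gene = [[1.0, 1.2], [2.0, 2.2], [3.0, 3.2]]
    flat_gene = [[2.0, 2.2], [2.2, 2.0], [2.0, 2.2]]
    print(expression_vector(responding_gene))
    print(anova_f(responding_gene), anova_f(flat_gene))
```

Because the F statistic is computed separately for every gene, it sees only the size of the differences between conditions relative to replicate noise; it does not use the shape of the expression vector relative to other genes, which is the additional information the abstract refers to.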
Microarrays are part of an exciting new class of biotechnologies that promise to allow monitoring of genome-wide expression levels (1). Although the technique presented in this paper is quite general and would, in principle, work with other types of data, it has been developed for microarray data. Microarray experiments work by testing expression levels of some set of genes in organisms exposed to at least two different conditions, and usually then trying to determine which (if any) of these genes are actually changing in response to the change in experimental condition (2). Unfortunately, the random variation in microarrays is often large relative to the changes one is trying to detect. This makes it very hard to eliminate false positives, genes whose random fluctuations make them erroneously seem to respond to the experimental condition, and false negatives, genes whose random fluctuations mask the fact that they are actually responding to the experimental condition. It is standard statistical practice to handle this problem by using replicates, i.e., several microarrays run in identical settings (1-9). Although replicates are highly effective, the expense of additional replicates makes it worthwhile to minimize the number needed by using more effective statistical tests.
When there are only two conditions in an experiment, the conventional statistical test for differential expression is the t-test (2,6). But Long et al. (5, also 6,7) realized that by running a separate t-test on each gene, valuable information is lost, namely the fact that the actual population standard deviations of the genes are rather similar. Due to the small number of replicates in microarray experiments, there are usually a few genes that, by chance, appear to have extremely low standard deviations and thus register even a very small change as highly significant by a conventional t-test. In reality, and as additional replicates would confirm, the standard deviations of these genes are much higher, and the change is not at all significant. In response to this issue, Long et al. derived, using Bayesian statistics, a method of transforming standard deviations that, in essence, moves outlying standard deviations closer to the mean standard deviation. Unfortunately, the resulting algorithm, which they call Cyber-T, works only with two-condition experiments. Since this research is concerned with multiple-condition experiments, it was necessary to expand the method to work with this type of experiment, the standard statistical test for which is called ANOVA (ANalysis Of VAriance) (8). Here we report the successful extension of the Cyber-T algorithm to multiple-condition experiments, creating the algorithm we call Cyber-ANOVA.
In microarray experiments involving several conditions, the expression vector of each gene formed by consecutive measurements of its expression level has a distinctive shape, which contains useful information. The more conditions in the experiment (and thus numbers in the
