Grasping at Flaws



Published by: terrabyte on Jan 16, 2011
Copyright: Traditional Copyright: All rights reserved







Even if you’re not a statistician, you may one day find yourself in the position of reviewing a statistical analysis that was done by someone else. It may be an associate, someone who works for you, or even a competitor. Don’t panic. Critiquing someone else’s work has got to be one of the easiest jobs in the world. After all, your boss does it all the time (http://statswithcats.wordpress.com/2010/11/14/you-can-lead-a-boss-to-data-but-you-can%e2%80%99t-make-him-think/). Doing it in a constructive manner is another story.
Don’t expect to find a major flaw in a multivariate analysis of variance, or a neural network, or a factor analysis. Look for the simple and fundamental errors of logic and performance. It’s probably what you are best suited for and will be most useful to the report writer who can no longer see the statistical forest through the numerical trees.
So here’s the deal. I’ll give you some bulletproof leads on what criticisms to level on that statistical report you’re reading. In exchange, you must promise to be gracious, forgiving, understanding, and, above all, constructive in your remarks. If you don’t, you will be forever cursed to receive the same manner of comments that you dish out.
With that said, here are some things to look for.
The Red-Face Test
Start with an overall look at the calculations and findings. Not infrequently, there is a glaring error that is invisible to all the poor folks who have been living with the analysis 24/7 for the last several months. The error is usually simple, obvious once detected, very embarrassing, and enough to send them back to their computers. Look for:
Wrong number of samples: Either samples were unintentionally omitted or replicates were included when they shouldn’t have been.
Unreasonable means: Calculated means look too high or low, sometimes by a lot. The cause may be a mistaken data entry, an incorrect calculation, or an untreated outlier.
Nonsensical conclusions: A stated conclusion seems counterintuitive or unlikely given known conditions. This may be caused by a lost sign on a correlation or regression coefficient, a misinterpreted test probability, or an inappropriate statistical design or analysis.
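The first two checks are easy to run yourself if the report lists its values. Here is a minimal sketch in Python, assuming you know how many samples there should have been and roughly what a believable mean looks like. The function name, `expected_n`, and `plausible_range` are made up for illustration; they are not from any particular package.

```python
from statistics import mean

def red_face_checks(values, expected_n, plausible_range):
    """Return a list of warnings; an empty list means nothing jumped out.

    expected_n and plausible_range are the reviewer's own expectations
    (hypothetical inputs), not something the data can tell you.
    """
    low, high = plausible_range
    warnings = []
    # Wrong number of samples: omitted samples or stray replicates.
    if len(values) != expected_n:
        warnings.append(f"sample count is {len(values)}, expected {expected_n}")
    # Unreasonable mean: often a data entry slip or an untreated outlier.
    avg = mean(values)
    if not low <= avg <= high:
        warnings.append(f"mean {avg:.2f} is outside {low} to {high}")
    return warnings

# Made-up example: one sample is missing and 99.0 looks like a typo for 9.9.
for warning in red_face_checks([9.8, 10.1, 10.4, 99.0, 10.0], 6, (9.0, 11.0)):
    print(warning)
```

The third item, nonsensical conclusions, is a judgment call that no script will make for you.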
Nobody Expects the Sample Inquisition
Start with the samples. If you can cast doubt on the representativeness of the samples, everything else done after that doesn’t matter. If you are reviewing a product from a mathematically trained statistician, probably the only place to look for difficulties is in the samples. There are a few reasons for this. First, a statistician may not be familiar with some of the technical complexities of sampling the medium or population being investigated. Second, he or she may have been handed the dataset with little or no explanation of the methods used to generate the data. Third, he or she will probably get everything else right. Focus on what the data analyst knows the least about.
[Photo caption: I feel like I'm being watched.]
[Photo caption: It helps to know where to look for hidden trouble.]
Data Alone Do Not an Analysis Make
Unless you see the report writer counting on his or her fingers, don’t worry about calculations being correct. There’s so much good statistical software available that getting the calculations right shouldn’t be a problem (http://statswithcats.wordpress.com/2010/06/27/weapons-of-math-production/). It should be sufficient to simply verify that he or she used tested statistical software.
Likewise, don’t bother asking for the data unless you plan to redo the analysis. You won’t be able to get much out of a quick look at a database, especially if it is large. Even if you redo the analysis, you may not make the same decisions about outliers and other data issues, which will lead to slightly different results (http://statswithcats.wordpress.com/2010/10/17/the-data-scrub-3/). Waste your time on other things.
Descriptive Statistics
Descriptive statistics are usually the first place you might notice something amiss in a dataset. Be sure the report provides means, variances, minimums, maximums, and numbers of samples. Anything else is gravy. Look for obvious data problems like a minimum that’s way too low or a maximum that’s way too high. Be sure the sample sizes are correct. Watch out for the analysis that claims to have a large number of samples but also a large number of grouping factors. The total number of samples might be sufficient, but the number in each group may be too small to be analyzed reliably.
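If the report gives you the grouping labels, counting samples per group takes only a few lines. This is a rough sketch: the cutoff `min_per_group=5` is an arbitrary illustration, not a number from this article, and the site and dose labels are invented.

```python
from collections import Counter

def small_groups(group_labels, min_per_group=5):
    """Return {group: count} for every group with fewer than min_per_group samples.

    min_per_group is an illustrative threshold chosen for this sketch.
    """
    counts = Counter(group_labels)
    return {group: n for group, n in counts.items() if n < min_per_group}

# Hypothetical design: 30 samples split by site (A, B) and dose (low, high).
labels = (
    [("A", "low")] * 12
    + [("A", "high")] * 10
    + [("B", "low")] * 6
    + [("B", "high")] * 2
)
print(len(labels), "samples in total")
print("thin groups:", small_groups(labels))
```

Thirty samples sounds respectable until you notice that one of the four groups has only two.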
You might be provided a matrix with dozens of correlation coefficients (http://statswithcats.wordpress.com/2010/11/28/secrets-of-good-correlations/). For any correlation that is important to the analysis in the report, be sure you get a t-test to determine whether the correlation coefficient is different from zero, and a plot of the two correlated variables to verify that the relationship between the two variables is linear and there are no outliers.
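If the report gives a correlation and a sample size but no test, you can compute the test statistic yourself. The sketch below uses the standard formula t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The function names are my own, and the plain-Python Pearson calculation is only there so the example stands alone.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t(r, n):
    """t statistic for the null hypothesis that the true correlation is zero.

    Has n - 2 degrees of freedom; undefined when r is exactly 1 or -1.
    """
    return r * math.sqrt((n - 2) / (1 - r * r))

# A reported r of 0.5 from 27 samples (illustrative numbers).
print(round(correlation_t(0.5, 27), 2))
```

Compare the result with a t table for n - 2 degrees of freedom. The t statistic says nothing about linearity or outliers, so you still need the plot.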
Regression models are one of the most popular types of statistical analyses conducted by non-statisticians. Needless to say, there are usually quite a few areas that can be critiqued. Here are probably the most common errors.
If the ratio of data points to predictor variables isn’t at least 10 to 1, the model will be unstable (http://statswithcats.wordpress.com/2010/07/17/purrfect-resolution/).
There should be an intercept term in the model unless there is a compelling theoretical reason not to include it. When an intercept is omitted, the coefficient of determination is artificially inflated and the model will look better than it really is.
Look at the variation of the predictions, usually expressed as the standard error of estimate (http://statswithcats.wordpress.com/2010/12/19/you%e2%80%99re-off-to-be-a-wizard/). You might have an accurate predictive model that lacks enough precision to be useful.
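To see why a missing intercept flatters a model, here is a small sketch with one predictor and made-up data in which y hovers around 10 no matter what x does. It assumes the no-intercept R-squared is computed against the uncentered sum of squares of y, which is a common convention but may not be what the report writer's software does. All function names are hypothetical.

```python
import math

def enough_data(n_points, n_predictors, min_ratio=10):
    """The 10-to-1 rule of thumb for data points per predictor."""
    return n_points / n_predictors >= min_ratio

def fit_with_intercept(x, y):
    """Least-squares line y = a + b*x; returns (R-squared, standard error of estimate)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - mean_y) ** 2 for b in y)  # variation around the mean of y
    return 1 - sse / sst, math.sqrt(sse / (n - 2))

def r2_through_origin(x, y):
    """Least-squares line y = b*x, scored against the uncentered sum of squares."""
    slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    sse = sum((b - slope * a) ** 2 for a, b in zip(x, y))
    return 1 - sse / sum(b * b for b in y)  # variation around zero, not the mean

# Made-up data: y stays near 10 regardless of x.
x = [1, 2, 3, 4, 5, 6]
y = [10.2, 9.7, 10.5, 9.9, 10.4, 10.1]
r2, see = fit_with_intercept(x, y)
print("enough data:", enough_data(len(x), 1))
print("R-squared with intercept:", round(r2, 2), "standard error:", round(see, 2))
print("R-squared through origin:", round(r2_through_origin(x, y), 2))
```

The line through the origin fits these data much worse, yet its R-squared comes out far higher because it is measured against a different baseline.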
Statistical Tests
Statistical tests are often done by report writers with no notion of what they mean. Look for some description of the null hypothesis (the assumption the test is trying to disprove) for the test. It doesn’t matter if it is in words or mathematical shorthand. Does it make sense? For example, if the analysis is trying to prove that a pharmaceutical is effective, the null hypothesis should be that the pharmaceutical is not effective. After that, look for the test statistics and probabilities. If you don’t understand what they mean, just be sure they were reported. If you want to take it to the next step, look for violations of statistical assumptions (http://statswithcats.wordpress.com/2010/10/03/assuming-the-worst/).
Analysis of Variance
An ANOVA is like a testosterone-induced, steroid-driven, rampaging horde of statistical tests. There are many, many ways the analysis can be misspecified, miscalculated, misinterpreted, and misapplied. You’ll probably never find most kinds of ANOVA flaws unless you’re a professional statistician, so stick with the simple stuff.
A good ANOVA will include the traditional ANOVA summary table, an analysis of deviations from assumptions, and a power analysis. You hardly ever get the last two items. Not getting the ANOVA table in one form or another is cause for suspicion. It might be that there was something wrong in the analysis, or the data analyst didn’t know it should be included.
If the ANOVA design doesn’t have the same number of samples in each cell, the design is termed unbalanced. That’s not a fatal flaw, but violations of assumptions are more serious for unbalanced designs.
If the sample sizes are very small, only large differences can be detected in the means of the parameter being investigated. In this case, be suspicious of finding no significant differences when there should be some.
Assumptions Giveth and Assumptions Taketh Away
Statistical models usually make at least four assumptions: the model is linear, and the errors (residuals) from the model are independent, Normally distributed, and have the same variance for all groups. A first-class analysis will include some mention of violations of assumptions. Violating an assumption does not necessarily invalidate a model but may require that some caveats be placed on the results.
The independence assumption is the most critical. This is usually addressed by using some form of randomization to select samples. If you’re dealing with spatial or temporal data, you probably have a problem unless some additional steps were taken to compensate for autocorrelation.
Equality of variances is a bit more tricky. There are tests to evaluate this assumption, but they may not have been cited by the report writer. Here’s a rule of thumb. If the largest variance in an ANOVA group or regression level is twice as big as the smallest variance, you might have a problem. If the difference is a factor of five or more, you definitely have a problem.
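The variance rule of thumb is simple enough to check by hand, but here is a minimal Python sketch of it. The function name and verdict strings are mine, and the groups are made-up numbers. The thresholds of 2 and 5 are the ones from the rule of thumb. Treating anything under 2 as fine is my reading, not a guarantee.

```python
from statistics import variance

def variance_ratio_check(groups):
    """Apply the 2x / 5x rule of thumb to the sample variances of each group.

    Returns (ratio of largest to smallest variance, verdict).
    Assumes every group has at least two values and a nonzero variance.
    """
    variances = [variance(group) for group in groups]
    ratio = max(variances) / min(variances)
    if ratio >= 5:
        verdict = "definite problem"
    elif ratio >= 2:
        verdict = "possible problem"
    else:
        verdict = "probably fine"
    return ratio, verdict

# Made-up groups: the second is twice as spread out as the first.
groups = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
]
print(variance_ratio_check(groups))
```

If the report only lists standard deviations, square them first. A standard deviation ratio of 2 is already a variance ratio of 4.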