Samples and Potato Chips

Published by terrabyte
Samples are like potato chips. You're never satisfied with just one. Every one you take makes you want more. And you're never sure you've had enough until you've had way too many.

Published by: terrabyte on Aug 08, 2010
Copyright: All rights reserved.

Betcha Can’t Take Just One
One observation. One test sample. One subject. One measurement. One of anything isn’t that satisfying. You’ll always want more to replicate the experience, to find out if there is consistency. Maybe you take just a few. If you sense a pattern, you can build your observations into an anecdote, a story. Many statistical analyses, in fact, grow out of anecdotal evidence. You just can’t stop at the storytelling stage.
Statistics are antidotes to those anecdotes. Politicians, preachers, and parents can get away with telling tales to illustrate points they want to make. Their followers trust them and want to believe them whether they are telling the truth or not. Other professionals, though, can’t rely on their audience having such unquestioning faith.
Scientists rely on hard data to test their hypotheses. Educators need test scores so they can grade on a curve. Businessmen want to see the numbers before they spend their money (your money, not so much). So you can pretty much expect that once you start collecting data, you’re going to want more.
Want More
You know you want more data, so first you estimate how
many more samples you’ll need to get that Purrfect
Say you’ve estimated that you need 1,000 samples to do a statistical analysis. You package your sampling and analysis plan into a proposal and give it to your client. One thing you can bet on is that your client won’t want to spend the money to collect that many samples. So what can you do? Here are a few suggestions:
Change the Study: Lower your confidence (1 minus the false positive error rate you’ll allow) and power (1 minus the false negative error rate you’ll allow). If you do this, look out for misleading test results, because both false positives and false negatives become more likely. You can also look for bigger effects (e.g., differences between means, size of targets, and so on). You won’t get the resolution you wanted, but it could be a good start. Also consider limiting the study area, level of detail, or analysis scope. Sometimes you can trade other project costs, like meetings and deliverables, for a few more samples.
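To see why these levers matter, here is a minimal sketch using a standard normal-approximation sample-size formula for comparing two means. The formula, the function name `samples_per_group`, and the numbers (a standard deviation of 10, effects of 2 and 4 units) are assumptions for illustration, not the author’s calculation.

```python
from math import ceil
from statistics import NormalDist

def samples_per_group(confidence, power, effect, sigma):
    """Approximate samples per group for a two-sided, two-sample comparison of means.

    Assumed normal-approximation formula (not from the article):
        n = 2 * ((z_{1 - alpha/2} + z_{power}) * sigma / effect) ** 2
    where alpha = 1 - confidence is the false positive rate you allow.
    """
    alpha = 1.0 - confidence
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)                # quantile for the desired power
    return ceil(2.0 * ((z_alpha + z_beta) * sigma / effect) ** 2)

# Hypothetical inputs, chosen only to show the trade-offs.
sigma = 10.0
for confidence, power, effect in [(0.95, 0.90, 2.0), (0.90, 0.80, 2.0), (0.95, 0.90, 4.0)]:
    n = samples_per_group(confidence, power, effect, sigma)
    print(f"confidence={confidence:.2f} power={power:.2f} effect={effect}: {n} per group")
```

With these made-up inputs, relaxing confidence and power cuts the requirement noticeably, and doubling the effect you look for cuts it by roughly a factor of four, because n scales with 1/effect².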
[Image: Chips all gone. Want more.]

Take Smaller Bites: Take as many samples as you can and use the information to decide what to do next. This is sometimes the aim of a pilot study. You can use the samples collected during a pilot study to estimate more precisely how many more samples you’ll need to get the statistical resolution you want. You might also be able to collect samples in phases or change the implementation schedule to accommodate your client’s budget cycle.
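A pilot study’s main payoff is a better estimate of variability. The sketch below assumes the goal is a confidence interval for a mean with a chosen half-width (margin of error) and uses the normal approximation n = (z·s/E)², where s is the pilot standard deviation. The pilot values and the function name `samples_for_margin` are hypothetical.

```python
from math import ceil
from statistics import NormalDist, stdev

def samples_for_margin(pilot, margin, confidence=0.95):
    """Total samples so a CI for the mean has half-width `margin`.

    Uses the pilot standard deviation in the normal approximation
    n = (z * s / margin) ** 2 (an assumed approach; a t-based
    iteration is more exact when n is small).
    """
    s = stdev(pilot)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return ceil((z * s / margin) ** 2)

# Hypothetical pilot measurements (illustrative only).
pilot = [12.1, 9.8, 11.4, 10.6, 13.0, 8.9, 10.2, 11.7, 9.5, 12.4]
total_needed = samples_for_margin(pilot, margin=0.5)
more_needed = max(0, total_needed - len(pilot))
print(f"pilot sd={stdev(pilot):.2f}, total n={total_needed}, more to collect={more_needed}")
```

The pilot samples count toward the total, so only the difference has to go into the next phase’s budget.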
Use Supporting Data: There may be historical data available that you can use to reassess the number of samples you’ll need and even to augment the samples you plan to collect, provided the quality of the historical data is appropriate. You can also consider surrogate sampling, in which you correlate the results of many inexpensive observations or measurements with the few expensive samples your client can afford.
Control Variance: If you think about it, the reason you need more samples in the first place is that you need to improve precision (not accuracy). So think harder about how you can reduce any extraneous variability in the data generation process. Standardized procedures and training of the data collectors might eliminate the need for quite a few samples.
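Because the required sample size grows with the square of the standard deviation, trimming variability introduced by data collection pays off quickly. This sketch assumes collection error is independent of the natural variability being studied, so the two variances add; the function name `samples_needed` and the numbers are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def samples_needed(sigma_process, sigma_collection, margin, confidence=0.95):
    """Samples for a CI half-width of `margin` on a mean.

    Assumes collection error (sampling, handling, measurement) is independent
    of the natural variability, so variances add.
    """
    sigma_total = sqrt(sigma_process**2 + sigma_collection**2)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return ceil((z * sigma_total / margin) ** 2)

# Hypothetical: natural variability sd = 3; collection error sd = 4 before
# standardizing procedures and training, 1 after.
before = samples_needed(3.0, 4.0, margin=0.5)
after = samples_needed(3.0, 1.0, margin=0.5)
print(f"before: {before} samples, after: {after} samples")
```

With these made-up numbers the requirement drops from 385 to 154 samples. This only buys precision; a biased collection procedure stays biased no matter how many samples you take.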
Too Many
Can you eat too many potato chips? Of course you can. It’s happened to many of us. Likewise, you can have too many samples, which presents its own set of challenges. Here are five:
Information Overload: Statistical software tends to be very efficient, but when you have tens of thousands of samples, you start to see performance slow a bit. What’s more important, though, is the inefficiency you run into when you scrub your dataset, especially if you use a lot of spreadsheet array formulas. Be patient. You can use the waiting time to read a good book (http://statswithcats.wordpress.com/2010/05/29/stats-with-cats-whats-inside/).
Chasing Tails: In any dataset, about 5% of the observations may be influential, not to mention the outliers and errors that you’ll have to check to determine whether they should be corrected, removed from the dataset, or left alone. This is a very time-consuming process. With a small dataset, you may have to investigate just a few samples. With a 1,000-record dataset, you may have to investigate 50 samples. This is part of why data scrubbing can represent most of the work in a data analysis project.
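The scaling in that arithmetic is easy to reproduce. This sketch uses a simple z-score screen (values more than 1.96 standard deviations from the mean), which for roughly normal data flags about 5% of records. It is an assumed stand-in, not a true influence diagnostic such as Cook’s distance; `flag_for_review` is a hypothetical name and the data are simulated.

```python
import random
from statistics import mean, stdev

def flag_for_review(values, z_cut=1.96):
    """Indices of values more than z_cut standard deviations from the mean.

    A simple univariate screen; it does not measure influence on a fitted model.
    """
    m, s = mean(values), stdev(values)
    if s == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - m) / s > z_cut]

rng = random.Random(42)  # fixed seed so the illustration is repeatable
for n in (20, 1000):
    data = [rng.gauss(50.0, 10.0) for _ in range(n)]
    print(f"n={n}: {len(flag_for_review(data))} records to investigate")
```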
Data Intimacy: When you’re working with only a few dozen samples, you get to know each data point. You can look at plots and tables and see how individual details fit into a bigger picture. You can’t do that with a thousand data points. Sometimes you can get around this problem by dividing the data into groups and working with the groups, or by analyzing a higher level of hierarchical data.
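A minimal sketch of working with groups instead of individual points: collapse the records into one summary row per group (count, mean, and range). The record layout (`site` and `value` fields), the function name `summarize_by_group`, and the simulated data are assumptions.

```python
import random
from collections import defaultdict
from statistics import mean

def summarize_by_group(records, key="site", value="value"):
    """Collapse many records into one summary row per group (hypothetical field names)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec[value])
    return {
        g: {"n": len(vals), "mean": mean(vals), "min": min(vals), "max": max(vals)}
        for g, vals in sorted(groups.items())
    }

# Simulated dataset: 1,000 records spread across five hypothetical sites.
rng = random.Random(7)
records = [{"site": rng.choice("ABCDE"), "value": rng.gauss(100.0, 15.0)} for _ in range(1000)]

for site, s in summarize_by_group(records).items():
    print(f"{site}: n={s['n']:4d} mean={s['mean']:6.1f} range=({s['min']:.1f}, {s['max']:.1f})")
```

Five summary rows are something you can get to know again, and any group that looks odd tells you where to dig into the individual records.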
Graphic Mud: It’s tough to see patterns with only a few samples, but plotting thousands of samples can be just as perplexing. You won’t be able to use small plots, like matrix plots. Even with full-scale plots, it will be difficult to see subtle differences in data point markers, like size, shape, and even color. Points will overwrite each other, so you won’t be able to tell if there is one point at a graph location or a hundred points stacked on top of each other. And even the best statistical software will choke when trying to print graphs with thousands of data points. Solving this problem usually involves plotting group means or only randomly selected records from the data matrix.
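Randomly thinning the data, and counting how many points pile up at each spot, can be sketched without any plotting library. `thin_for_plotting` draws a reproducible random subset; `overplot_counts` tallies points per grid cell, and those counts could drive marker size or shading in whatever plotting tool you use. The function names, cell size, and simulated data are assumptions.

```python
import random
from collections import Counter

def thin_for_plotting(points, k, seed=0):
    """Return k randomly chosen points, or all points if there are k or fewer."""
    if len(points) <= k:
        return list(points)
    return random.Random(seed).sample(points, k)

def overplot_counts(points, cell=1.0):
    """Count points per square grid cell of side `cell` to reveal stacked markers."""
    return Counter((int(x // cell), int(y // cell)) for x, y in points)

# Simulated data recorded to whole units, so many points land on identical spots.
rng = random.Random(1)
points = [(round(rng.gauss(0, 2)), round(rng.gauss(0, 2))) for _ in range(5000)]

subset = thin_for_plotting(points, 500)
stacks = overplot_counts(points)
print(f"plotting {len(subset)} of {len(points)} points")
print("most crowded cells:", stacks.most_common(3))
```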
