There's Something About Variance
Accuracy and precision are important concepts in statistics. When you measure data for an analysis, you’ll notice that even under similar conditions, you can get dissimilar results. That lack of precision is called variability. Variability is everywhere; it’s a normal part of life. In fact, it is the spice in the soup. Without variability, all wines would taste the same. Every race would end in a tie. Even statistics might lose its charm. So a bit of variability isn’t such a bad thing. The important question, though, is what kind of variability?
Published by: terrabyte on Aug 08, 2010
Imagine practicing hitting a target using darts, bow and arrow, pistol, cannon, missile launcher, or whatever. You aim for the center of the target. If your shots land where you aimed, you are considered to be accurate. If all your shots land near each other, you are considered to be precise. The two properties are not linked. You can be accurate but not precise, precise but not accurate, neither accurate nor precise, or both accurate and precise.

Accuracy and precision also apply to statistics calculated from data. If you're trying to determine some characteristic of a population (i.e., a population parameter), you want your statistical estimates of the characteristic to be both accurate and precise.

The same also applies to the data themselves. When you start measuring data for an analysis, you'll notice that even under similar conditions, you can get dissimilar results. That lack of precision is called variability. Variability is everywhere; it's a normal part of life. In fact, it is the spice in the soup. Without variability, all wines would taste the same. Every race would end in a tie. Even statistics might lose its charm. Your doctor wouldn't tell you that you have about a year to live; he'd say don't make any plans for January 11 after 6:13 PM EST. So a bit of variability isn't such a bad thing. The important question, though, is what kind of variability?
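The four combinations from the target-practice picture can be sketched with a short simulation. This is a hypothetical Python illustration, not from the original post; the offset and spread values are made-up numbers, where the offset plays the role of inaccuracy and the spread plays the role of imprecision.

```python
import random
import statistics

random.seed(42)

def shot_group(offset, spread, n=500):
    """Simulate n one-dimensional shots at a target centered at 0.
    offset: systematic miss (inaccuracy); spread: random scatter (imprecision)."""
    return [offset + random.gauss(0, spread) for _ in range(n)]

for label, offset, spread in [
    ("accurate and precise", 0.0, 0.1),
    ("accurate, not precise", 0.0, 2.0),
    ("precise, not accurate", 3.0, 0.1),
    ("neither", 3.0, 2.0),
]:
    shots = shot_group(offset, spread)
    # the mean tracks the offset; the standard deviation tracks the spread
    print(f"{label:22s} mean={statistics.mean(shots):+.2f} "
          f"stdev={statistics.stdev(shots):.2f}")
```

The group mean lands near the offset regardless of the spread, and the standard deviation tracks the spread regardless of the offset, which is the sense in which the two properties are not linked.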
The Inevitability of Variability
Before going further, let me clarify something. Statisticians discuss variability using a variety of terms, including errors, uncertainty, deviations, distortions, residuals, noise, inexactness, dispersion, scatter, spread, perturbations, fuzziness, and differences. To nonprofessionals, many of these terms hold pejorative connotations. But variability isn't bad … it's just misunderstood.
 
Suppose you're sitting in your living room one cold winter night contemplating the high cost of heating oil. The thermostat reads 68 degrees F, but you're still shivering. Maybe the thermostat is broken. Maybe the heater is malfunctioning or you need more insulation. You need a warmer place to sit while you read An Inconvenient Truth, so you grab a thermometer from the medicine cabinet and start measuring temperatures around the room. It's 115 degrees at the radiator, 68 degrees at your chair, 59 degrees at the window, and 69 degrees at the stairs. You keep measuring. It's 73 degrees at the fish tank, 67 degrees at the couch and bookcase, 82 degrees at the TV, and 60 degrees at the door. That's a lot of variation!

Think of those temperature readings as the summation of five components:
Characteristic of Population — the portion of a data value that is the same between a sample and the population. This part of a data value forms the patterns in the population that you want to uncover. If you think of the living room space as the population you're measuring, the characteristic temperature would be the 68 degrees at your chair where you want to read.
Natural Variability — the inherent differences between a sample and the population. This part of a data value is the uncertainty or variability in population patterns. In a completely deterministic world, there would be no natural variability. You would read the same value at every point where you took a measurement. But in the real world, if you made the same measurement again and again, you probably would get different values. If all other types of variation were controlled, these differences would be the natural or inherent variability.
Sampling Variability — differences between a sample and the population attributable to how uncharacteristic (nonrepresentative) the sample is of the population. Minimizing sampling error requires that you understand the population you are trying to evaluate. The sampling variability in the living room would be attributable to where you took the temperature readings. For example, the radiator and TV are heat sources. The door and window are heat sinks. Furthermore, if all the readings were taken at eye level, the areas near the ceiling and floor would not have been adequately represented. The floor may be a few degrees cooler because the denser cold air sinks, displacing the warmer air upward, which is why the air at the ceiling is warmer.
Measurement Variability — differences between a sample and the population attributable to how data were measured or otherwise generated. Minimizing measurement error requires that you understand measurement scales and the actual process and instrument you use to generate data. Using an oral thermometer for the living room measurements may have been expedient but not entirely appropriate. The temperatures you wanted to measure are at the low end of the thermometer's range and may be less accurate than around 98 degrees. Also, the thermometer is slow to reach equilibrium and can't be read with more than one decimal place of precision. Use a digital infrared laser-point thermometer next time. More accurate. More precise. More fun.
Environmental Variability — differences between a sample and the population attributable to extraneous factors. Minimizing environmental variance is difficult because there are so many causes and because the causes are often impossible to anticipate or control. For example, the heating system may go on and off unexpectedly. Your own body heat adds to the room temperature, and walking around the living room taking measurements mixes the ceiling and floor air, which adds variability to the temperatures.

When you analyze data, you usually want to evaluate characteristics of some population and the natural variability associated with the population. Ideally, you don't want to be misled by any extraneous variability that might be introduced by the way you select your samples (or patients, items, or other entities), measure (generate or collect) the data, or experience uncontrolled transient events or conditions. That's why it's so important to understand the ways of variability.
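The component view above can be mimicked with a toy simulation. Everything below is invented for illustration: the component magnitudes, the location biases, and the idea of "controlling" sampling variability by reading only at the chair are all assumptions, not measurements from the post.

```python
import random
import statistics

random.seed(1)

CHARACTERISTIC = 68.0  # population pattern you want: degrees F at your chair

def reading(location_bias):
    """One temperature reading = characteristic + sampling effect
    + natural + measurement + environmental variability (all hypothetical)."""
    sampling = location_bias               # e.g., +47 near the radiator
    natural = random.gauss(0, 0.5)         # inherent variability
    measurement = random.gauss(0, 1.0)     # oral-thermometer error
    environmental = random.gauss(0, 0.8)   # heater cycling, body heat, drafts
    return CHARACTERISTIC + sampling + natural + measurement + environmental

# sampling variability dominates when you read at heat sources and sinks
locations = {"chair": 0.0, "radiator": 47.0, "window": -9.0, "TV": 14.0}
print({spot: round(reading(bias), 1) for spot, bias in locations.items()})

# with the sampling effect controlled (every reading at the chair),
# the remaining spread is natural + measurement + environmental
controlled = [reading(0.0) for _ in range(1000)]
print(round(statistics.mean(controlled), 1),
      round(statistics.stdev(controlled), 2))
```

Controlling one component (here, where you measure) shrinks the spread but never removes it; the other components remain in every reading.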
 
Variability versus Bias
Remember target practice? If there is little variation in your aim, the deviations from the center of the target would be random in distance and direction. Your aim would be accurate and precise. But what if the sight on your weapon were misaligned? Your shots would not be centered on the center of the target. Instead there would be a systematic deviation caused by the misaligned sight. Your shots would all be inaccurate, by roughly the same distance and direction from the center. That systematic deviation is called bias. You may not even have known there was a problem with the sight before shooting, although you would probably suspect something after all the misses.
 
Bias usually carries the connotation of being a bad thing. It usually is. It may be why 19th Century British Prime Minister Benjamin Disraeli mistakenly associated statistics with lies and damn lies. But if the systematic deviation is a good thing because it fixes another bias, it's called a correction. For example, you could add a correction, an intentional bias in the direction opposite the bias introduced by the weapon sight, to compensate for the inaccuracy. So bias can be good (in a way) or bad, intentional or not, but it's always systematic. On the other hand, a bias applied to only selected data is a form of exploitation, and is nearly always intentional and a very bad thing.

So the relationships to remember are:

Variance → Imprecision
Bias → Inaccuracy

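These two relationships can be checked numerically with a toy model of the misaligned sight (the bias and spread values below are assumptions). The average miss estimates the bias, the scatter estimates the variance, and the mean squared error splits into bias squared plus variance:

```python
import random
import statistics

random.seed(7)

TARGET = 0.0
SIGHT_BIAS = 1.5   # hypothetical misaligned-sight offset
SPREAD = 0.5       # hypothetical random scatter in aim

shots = [TARGET + SIGHT_BIAS + random.gauss(0, SPREAD) for _ in range(10_000)]
errors = [s - TARGET for s in shots]

bias = statistics.mean(errors)            # systematic deviation -> inaccuracy
variance = statistics.pvariance(errors)   # random scatter -> imprecision
mse = statistics.mean(e * e for e in errors)

print(f"bias ~ {bias:.2f}, variance ~ {variance:.2f}")
print(f"MSE {mse:.2f} = bias^2 + variance = {bias**2 + variance:.2f}")

# a correction: an intentional bias opposite the sight's bias
corrected = [s - SIGHT_BIAS for s in shots]
print(f"corrected bias ~ {statistics.mean(corrected):.2f}")
```

Subtracting the known offset restores accuracy but leaves the variance untouched: a correction fixes bias, not imprecision.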
Most statistical techniques are unbiased themselves, as long as you meet their assumptions. If something goes wrong, you can't blame the statistics. You may have to look in the mirror, though. During the course of any statistical analysis, there are many decisions that have to be made, primarily involving data. Whatever the decisions are, such as deleting or keeping an outlier, there will be some impact on precision and perhaps even accuracy. In an ideal world, the sum of the decisions wouldn't add appreciably to the variability. Often, though, data analysts want to be conservative, so they make decisions they believe are counter to their expectations. But when they don't get the results they expected, they go back and try to tweak the analysis. At that point they have lost all chance of doing an objective analysis and are little better than analysts with vested interests who apply their biases from the start. Avoiding such analysis bias requires no more than to make decisions based solely on statistical principles. This sounds simple but it isn't always so.
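A toy example of how much one such decision can move the results (the data values are invented, echoing the living-room readings):

```python
import statistics

# hypothetical sample with one suspect value: is 115 a measurement
# blunder, or a real reading taken at the radiator?
data = [67, 68, 68, 69, 70, 68, 67, 115]

kept = data
dropped = [x for x in data if x != 115]

for label, d in [("outlier kept", kept), ("outlier dropped", dropped)]:
    print(f"{label:16s} mean={statistics.mean(d):6.2f} "
          f"stdev={statistics.stdev(d):6.2f}")
```

Either choice is defensible, but only if the rule for making it rests on statistical principles decided before you see which answer you like better.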
 
Sometimes bias isn’t the fault of the data analyst, as in the case o
reporting bias
. In professionalcircles the most common form of reporting bias is probably not reporting non-significant results.Some investigators will repeat a study again and again, continually fine-tuning the study designuntil they reach their nirvana of statistical significance. Seriously, is there any real difference
 between probabilities of significance of 0.051 versus 0.049? But you can’t fault the investigatorsalone. Some professional journals won’t publish negative results, and professionals who don’t
publish perish. Can you imagine the pressure on an investigator looking for a significant resultfor some new business venture, like a pharmaceutical? He might take subtle actions to help hiscause then not report
everything
he did.
That’s a form o
f reporting bias.Perhaps the most common form of reporting bias in nonprofessional circles is cherry picking, the
 practice of reporting just those findings that are favorable to the reporter’s position. Cherry
picking is very common in studies of controversial topics such as climate change, marijuana, andalternative medicine. Virtually all political discussions use information that was cherry picked.
Given that someone else’s reporting bias is after 
-the-analysis, why is it important to youranalysis? The answer is that it
’s how you can be misled in planning your statistical study. Never 
trust a secondary source if you can avoid it. Never trust a source of statistics or a statistical
analysis that doesn’t report variance and sample size along with the results.
And alwaysremember:
s
tatistics don’t lie; people do.
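The repeat-until-significant habit can be simulated. This sketch assumes a fair-coin "study" analyzed with a normal-approximation z-test; the study size, number of reruns, and trial counts are all arbitrary choices for illustration:

```python
import math
import random

random.seed(123)

def p_value(n_flips=100):
    """Two-sided p-value for a fair-coin null (normal approximation)."""
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    z = (heads - n_flips / 2) / math.sqrt(n_flips / 4)
    return math.erfc(abs(z) / math.sqrt(2))

TRIALS = 1000

# honest investigators: run one study and report it
honest = sum(p_value() < 0.05 for _ in range(TRIALS)) / TRIALS

# persistent investigators: rerun up to 5 times, report only a "success"
persistent = sum(
    any(p_value() < 0.05 for _ in range(5)) for _ in range(TRIALS)
) / TRIALS

print(f"false-positive rate, one study:      ~{honest:.2f}")
print(f"false-positive rate, best of 5 runs: ~{persistent:.2f}")
```

Every study here is run under the null, so any "significant" result is a false positive; taking the best of several reruns inflates that rate well past the nominal 5 percent, which is exactly why unreported reruns mislead.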
 
Join the Stats With Cats group on Facebook.
