
Published by terrabyte on Jan 31, 2011

Copyright: Traditional Copyright: All rights reserved


LIMITS OF CONFUSION

A confidence interval is the numerical interval around the mean of a sample from a population that has a certain confidence of including the mean of the entire population.

"Say what?" OK, let's take it one point at a time.

Say you collect 30 water samples from a lake. Oh wait. That use of the word sample will be confusing to some people. A sample is a portion of a population, but the word can refer to an individual piece of a population or a collection of pieces of the population (http://statswithcats.wordpress.com/2010/07/0). It's like the word fish: one fish, two fish, school of fish, and so on. Anyway, say you collect 30 aliquots (i.e., samples) of water from the lake and analyze the aliquots for iron content. Then, you sum the 30 iron concentrations and divide by 30 to get the mean iron concentration of your collection of aliquots (i.e., sample). But you don't really care about the mean iron concentration of your collection of 30 aliquots. What you want to know is the average iron concentration of all the water in the lake. No problem. You can use the mean iron concentration of the 30 aliquots as an approximation of the mean iron concentration of the lake (population).

Now, that would be fine for most people except for neurotic individuals who don't understand the Central Limit Theorem. These persons have a couple of options. They can go back to the lake and collect 30 more aliquots of water (this is sometimes referred to as a working vacation if the collection of fish samples is also involved), then recalculate the mean, and see what they get. They can do the same thing again, and again, and again (referred to as a vacation if the consumption of beer and potato chips is involved, http://statswithcats.wordpress.com/2010/07/26/samples-and-potato-chips/) until they have a good idea of what the lake's mean iron concentration might be. (Note: If the neurotic individuals can get someone else to pay for everything, they are called consultants. If the neurotic individuals can get everyone else to pay for everything, they are called politicians.)

For people who can't afford to collect more samples of samples, there's an alternative approach called resampling. It's the computer equivalent of a cushy government contract for data collection. In a resampling approach, you would collect the 30 aliquots of lake water, analyze them for iron content, and calculate the mean of your sample. Then you would have specialized software select values from the original 30 samples to create a new dataset (the process is called bootstrapping if the software draws 30 values at random with replacement, or jackknifing if it leaves out one value at a time; feel free to google away), from which you could calculate a new mean iron concentration. Then do that again, and again, and again until you have enough means to say how variable the mean iron concentration is.
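For the curious, here is a minimal sketch of the two resampling schemes in Python, using only the standard library. The 30 iron values are made up (drawn from a normal distribution with mean 50 and standard deviation 10) purely for illustration, and the function names are hypothetical. Bootstrapping draws 30 values with replacement; jackknifing leaves out one value at a time. For a mean, the jackknife standard error works out to exactly the sample standard deviation divided by the square root of the number of samples, which makes a handy sanity check.

```python
import math
import random
import statistics

# Hypothetical data: 30 made-up iron concentrations, NOT measurements from the article.
_rng = random.Random(42)
iron = [round(_rng.gauss(50, 10), 1) for _ in range(30)]


def bootstrap_means(data, n_resamples=2000, seed=0):
    """Bootstrap: resample len(data) values WITH replacement and record each resample's mean."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]


def jackknife_means(data):
    """Jackknife: the mean of each leave-one-out subset (n subsets of size n - 1)."""
    total, n = sum(data), len(data)
    return [(total - x) / (n - 1) for x in data]


def jackknife_se(data):
    """Jackknife standard error: sqrt((n - 1) / n * sum((m_i - m_bar) ** 2))."""
    means = jackknife_means(data)
    n = len(data)
    m_bar = statistics.fmean(means)
    return math.sqrt((n - 1) / n * sum((m - m_bar) ** 2 for m in means))


# The spread of the resampled means estimates how variable the mean itself is.
boot_se = statistics.stdev(bootstrap_means(iron))
print(f"sample mean        {statistics.fmean(iron):.2f}")
print(f"bootstrap SE       {boot_se:.3f}")
print(f"jackknife SE       {jackknife_se(iron):.3f}")
print(f"s / sqrt(n)        {statistics.stdev(iron) / math.sqrt(len(iron)):.3f}")
```

The bootstrap figure will wobble a little with the random seed; the jackknife figure will not, because it involves no randomness.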

Cat whiskers are like confidence intervals. They let the cat know how big its spread is.

A third alternative, which involves no fishing, less computer time, and as much beer as you need, is to calculate a confidence interval. First, calculate the mean and standard deviation of the 30 iron concentrations. Then calculate a confidence interval around the sample mean using the formula

Sample Mean ± t-value × (Sample Standard Deviation ÷ square root of the Number of Samples)
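The formula translates directly into a few lines of Python. This is a sketch with a hypothetical helper name. The inputs are the lake example's numbers given below (mean 50, standard deviation 10, 30 aliquots), and 2.756 is the tabulated two-sided 99% t-value for 29 degrees of freedom.

```python
import math


def confidence_limits(mean, sd, n, t_value):
    """Return (lower, upper) = mean -/+ t_value * sd / sqrt(n)."""
    half_width = t_value * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Lake example: mean 50, standard deviation 10, 30 aliquots, 99% two-sided t for 29 df.
lower, upper = confidence_limits(mean=50.0, sd=10.0, n=30, t_value=2.756)
print(f"{lower:.2f} to {upper:.2f}")  # 44.97 to 55.03
```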

In the lake example, the mean, standard deviation, and number of samples would be calculated from the iron concentrations determined in the aliquots of lake water. The t-value would be calculated using software or selected from a table of values of the t-distribution on the basis of:

Degrees-of-freedom: the number of samples minus one. In this case, 30 water aliquots minus 1 equals 29.

Alpha: one minus the confidence level (your confidence that the interval you calculate will include the population mean), divided by the number of limits you will calculate; in this case, two, because you want upper and lower limits.

The boundaries of a confidence interval are called the upper confidence limit and the lower confidence limit. For example, if:

Mean iron concentration were 50
Standard deviation of iron concentration were 10
t-value for 29 degrees-of-freedom (based on 30 iron concentrations) and alpha of .005 (based on 99% confidence for a two-sided limit) were 2.756

the 99% lower confidence limit would be 44.97 (i.e., 50 - (2.756 * (10/√30))) and the 99% upper confidence limit would be 55.03 (i.e., 50 + (2.756 * (10/√30))). You would have about 99% confidence that this interval would include the mean iron concentration of the lake.

But what if you think 45 to 55 is too wide a range for the lake's mean iron concentration? What can you do? You could go back to the lake and collect another 30 samples and try again. Better yet, you could go back to the lake and take 120 or even more samples (http://statswithcats.wordpress.com/2010/07/11/30-samples-standard-suggestion-or-superstition/), but that's a lot of expensive work vacation.

Look back at the formula for the confidence limits. The limits are calculated from the mean, the standard deviation, the number of samples, and the t-value. If you're not going back to the lake, you can't change the mean, the standard deviation, or the number of samples. That leaves the t-value. The t-value would be based on the degrees-of-freedom and the confidence. The degrees-of-freedom are determined from the number of samples, so that's still no help. BUT, the choice of the confidence is yours.

Consider this. If you choose the confidence level to be:

99%, the confidence limits would be 44.97 to 55.03
95%, the confidence limits would be 46.27 to 53.73
90%, the confidence limits would be 46.90 to 53.10

Or for that matter,

50%, the confidence limits would be 48.75 to 51.25

although it wouldn't be very useful if your interval only had a 50% chance of including the real mean iron concentration of the lake.

Consider the analogy of a nearsighted man playing a ring-toss game at a carnival. The location of the peg he will toss his ring at is like the mean of a population of possible measurements. The diameter of the peg is like the inherent variability of the population of measurements. The fuzziness with which he sees the peg because of his nearsightedness is like the additional variation associated with sampling, measurement, and environmental variability (http://statswithcats.wordpress.com/2010/08/01/there%E2%80%99s-something-about-variance/). The size of the ring he tosses is like the size of the confidence interval. If he wanted to be very confident that he could toss a ring over the peg, he would use a large ring to give him that confidence (i.e., the higher the confidence, the larger the confidence interval).

The man cannot change the location and diameter of the peg (i.e., the population values are fixed). However, he would have a greater chance of success if he could see better (i.e., extraneous variation in the samples is controlled, http://statswithcats.wordpress.com/2010/09/05/the-heart-and-soul-of-variance-control/ ; http://statswithcats.wordpress.com/2010/09/19/it%E2%80%99s-all-in-the-technique) or if he could use a very large ring (i.e., a relatively wide confidence interval). If the ring (the confidence interval) becomes too large, though, the game becomes meaningless. Thus, there must be some limits on how large the ring should be.

By convention, most statistical inferences, including confidence intervals, use a 95% confidence level. Sometimes either a 90% level (resulting in a smaller confidence interval) or a 99% level (resulting in a larger interval) is used. A 90% level would be more appropriate when the consequences of not including the true population value in the interval are relatively minor. Confirmatory inferences, on the other hand, often use a 99% confidence level. When in doubt, use 95%.

Some people dislike putting confidence limits around means they calculate. Limits show how imprecise data, and statistics calculated from them, actually are. But if you are going to make an informed decision, you have to know not just the evidence, but the reliability
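To see how the choice of confidence drives the limits, the sketch below gets the t-value "using software" rather than a table. It is an illustration under stated assumptions, not a replacement for a statistics library: it integrates the t density numerically with Simpson's rule, inverts it by bisection, and then applies the confidence-limit formula to the lake example (mean 50, standard deviation 10, 30 aliquots). The function names are hypothetical, and the last lines assume, hypothetically, that 120 aliquots would show the same mean and standard deviation. In practice, a library call such as SciPy's `scipy.stats.t.ppf(1 - alpha, df)`, with alpha already divided by two as in the text, does the same job.

```python
import math


def t_cdf(x, df, steps=1000):
    """P(T <= x) for Student's t with `df` degrees of freedom, valid for x >= 0.

    Uses 0.5 for the mass below zero (the density is symmetric) plus
    Simpson's rule on [0, x]; `steps` must be even.
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_critical(confidence, df):
    """Two-sided t-value: the t with P(T <= t) = 1 - (1 - confidence) / 2."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):  # bisection works because the CDF increases with x
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def limits(mean, sd, n, confidence):
    """Return (t-value, lower limit, upper limit) with n - 1 degrees of freedom."""
    t = t_critical(confidence, n - 1)
    half_width = t * sd / math.sqrt(n)
    return t, mean - half_width, mean + half_width


for conf in (0.99, 0.95, 0.90, 0.50):
    t, lower, upper = limits(50.0, 10.0, 30, conf)
    print(f"{conf:.0%}: t = {t:.3f}, limits {lower:.2f} to {upper:.2f}")
# 99%: t = 2.756, limits 44.97 to 55.03
# 95%: t = 2.045, limits 46.27 to 53.73
# 90%: t = 1.699, limits 46.90 to 53.10
# 50%: t = 0.683, limits 48.75 to 51.25

# Hypothetical: 120 aliquots with the same mean and standard deviation.
_, lo30, hi30 = limits(50.0, 10.0, 30, 0.99)
_, lo120, hi120 = limits(50.0, 10.0, 120, 0.99)
print(f"99% width: n=30 -> {hi30 - lo30:.2f}, n=120 -> {hi120 - lo120:.2f}")
```

Quadrupling the number of samples doubles the square root in the denominator and also lowers the t-value a bit, so the interval ends up slightly less than half as wide.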

Obsidian in a 90% confidence drawer.
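What "95% confidence" buys you can be made concrete with a small simulation. This sketch assumes a hypothetical lake whose iron concentrations are normally distributed with a true mean of 50 and a standard deviation of 10, draws many sets of 30 aliquots, builds an interval from each set with the formula above, and counts how often the interval includes the true mean. The function name and the simulated population are assumptions for illustration; the t-values are the tabulated two-sided values for 29 degrees of freedom.

```python
import math
import random
import statistics


def coverage(t_value, true_mean=50.0, true_sd=10.0, n=30, trials=4000, seed=1):
    """Fraction of simulated samples whose interval mean +/- t * s / sqrt(n) contains true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        m = statistics.fmean(sample)
        half_width = t_value * statistics.stdev(sample) / math.sqrt(n)
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / trials


# Tabulated two-sided t-values for 29 degrees of freedom.
for label, t in (("95%", 2.045), ("50%", 0.683)):
    print(f"{label} intervals that caught the true mean: {coverage(t):.1%}")
```

Expect roughly 95% and roughly 50%, give or take a percentage point of simulation noise. That is the ring-toss game played thousands of times: the bigger ring lands on the peg more often.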
