STATISTICS
SEBASTIAN GUSTAVO MORENO BARÓN
WHAT IS AN ESTIMATOR?
An estimator is a statistic used to estimate some unknown quantity (a parameter) of the population. You
can also think of an estimator as the rule that creates an estimate. For
example, the sample mean (x̄) is an estimator for the population mean, μ.
The quantity that is being estimated (i.e. the one you want to know) is called
the estimand. For example, let’s say you wanted to know the average height of
children in a certain school with a population of 1000 students. You take a sample
of 30 children, measure them and find that the mean height is 56 inches. The
sample mean is your estimator, and 56 inches is the estimate it produces. You use it
to estimate that the population mean (your estimand) is about 56 inches.
POINT ESTIMATE VS INTERVAL ESTIMATE
An estimate can be a range of values (like a confidence interval) or a single
value (like a sample standard deviation). When it is a range of values,
it’s called an interval estimate. For the height example above, you might
add a confidence interval of a couple of inches either way, say 54 to 58
inches. When it is a single value, like 56 inches, it’s called a point
estimate.
TYPES
Estimators can be described in several ways:
Biased: an estimator whose expected value differs from the parameter it
estimates, so that on average it overestimates or underestimates.
Efficient: an estimator with small variance (the one with the smallest possible
variance is also called the “best”). Inefficient estimators can give good
results as well, but they usually require much larger samples.
Invariant: an estimator that is not easily changed by transformations of the
data, like simple data shifts.
Shrinkage: a raw estimate that’s improved by combining it with other
information. See also: the James-Stein estimator.
Sufficient: a statistic that captures all the information the sample contains
about the population parameter, so that knowing the full data set adds nothing
more for estimating it.
Unbiased: an estimator whose expected value equals the parameter it
estimates; on average it neither underestimates nor overestimates.
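The difference between a biased and an unbiased estimator can be seen by simulation. A minimal sketch, assuming normally distributed data: the sample variance computed with divisor n is biased (its expected value is (n − 1)/n · σ²), while the version with divisor n − 1 is unbiased. The function name and parameter values are illustrative choices, not part of the original slides.

```python
import random


def average_variance_estimates(sigma=2.0, n=5, trials=20_000, seed=1):
    """Average two variance estimators over many simulated samples.

    Returns (mean of divide-by-n estimates, mean of divide-by-(n-1) estimates).
    """
    rng = random.Random(seed)
    total_biased = total_unbiased = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)  # sum of squared deviations
        total_biased += ss / n          # biased: expected value (n-1)/n * sigma^2
        total_unbiased += ss / (n - 1)  # unbiased: expected value sigma^2
    return total_biased / trials, total_unbiased / trials


biased, unbiased = average_variance_estimates()
print(f"true variance 4.0, divide by n: {biased:.2f}, divide by n-1: {unbiased:.2f}")
```

With σ = 2 and n = 5 the true variance is 4, so the divide-by-n average should land close to 3.2 and the divide-by-(n − 1) average close to 4.0.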
GERMAN TANK PROBLEM
The German Tank Problem is a way to estimate the total size of a population
from a small sample. It’s commonly used in AP Statistics to teach about
estimators. The problem was originally developed by the Allies during World
War II, when it was used to estimate the total number of German tanks from the
serial numbers of a small number of captured, destroyed, or observed tanks. It
was later extended to estimate the number of factories and other manufactured
parts. Today, the formula has found wide-reaching applications, such as
estimating the number of iPhones sold.
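The estimator usually taught for this problem is N̂ = m + m/k − 1, where m is the largest serial number observed and k is the number of observations. A minimal sketch, assuming the items are numbered 1, 2, …, N and the observed serials are distinct (sampled without replacement); the serial numbers in the usage line are hypothetical.

```python
def german_tank_estimate(serials):
    """Estimate the total count N of items numbered 1..N from observed serials.

    Uses the minimum-variance unbiased estimator m + m/k - 1, where m is the
    largest observed serial and k is the number of distinct observations.
    """
    k = len(serials)
    if k == 0 or len(set(serials)) != k:
        raise ValueError("need at least one serial, with no repeats")
    m = max(serials)
    return m + m / k - 1


# Hypothetical serial numbers from four captured tanks
print(german_tank_estimate([19, 40, 42, 60]))  # 74.0
```

The m/k − 1 term corrects for the fact that the largest observed serial almost always falls short of the true maximum N.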
WHAT IS A PARAMETER?
[Figure: Descriptive, Estimators, Test]
WHAT IS A POINT ESTIMATE?
In simple terms, any statistic can be a point estimate: a single value, computed
from a sample, that serves as an estimate of some parameter of the population.
For example:
The sample standard deviation (s) is a point estimate of the
population standard deviation (σ).
WHAT IS AN INTERVAL ESTIMATE?
As an example, let’s say you wanted to find out what proportion of senior
citizens smoke cigarettes. You can’t survey every senior citizen on the planet
(due to time and budget constraints), so you take a sample of 1000 senior
citizens and find that 10% of them smoke cigarettes. Although you’ve only taken
a sample, you can use that figure to estimate that “about” 10% of the whole
population smokes. In reality, the true figure is unlikely to be exactly 10%
(as you only sampled a small percentage of people), but it’s probably
somewhere around there, perhaps between 5 and 15%. That “somewhere
between 5 and 15%” is an interval estimate.
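The 10% figure can also be turned into a formal interval estimate. A minimal sketch, assuming a simple random sample and the common normal-approximation (Wald) formula p̂ ± z_{α/2}·√(p̂(1 − p̂)/n); the function name is illustrative. With n = 1000 and p̂ = 0.10, the 95% interval comes out to roughly 8.1% to 11.9%, tighter than the loose 5 to 15% range used for intuition.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a population proportion."""
    p_hat = successes / n                                # sample proportion
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2} from N(0,1)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)           # margin of error
    return p_hat - margin, p_hat + margin


# Smoking example: 10% of a sample of 1000 seniors, i.e. 100 smokers
low, high = wald_interval(100, 1000)
print(f"95% CI for the proportion: {low:.1%} to {high:.1%}")
```

This approximation is generally considered reasonable only when n·p̂ and n·(1 − p̂) are both reasonably large (a common rule of thumb is at least 10 each).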
CONFIDENCE INTERVALS
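In statistics, a confidence interval (CI) is a type of interval estimate,
computed from the observed data, that might contain the true value of an
unknown population parameter. The procedure is built so that, over repeated
samples, a stated proportion of such intervals (the confidence level, e.g. 95%)
contain the true value.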
EXAMPLE
A 2008 Gallup survey found that TV ownership may be good for wellbeing. The
poll reported a 95% confidence level with a margin of error of +/-3, which means
that if Gallup repeated the poll over and over, using the same techniques, about
95% of the intervals built this way (the published figure +/-3) would contain the
true population value. The 95% is the confidence level and the +/-3 is called
the margin of error. At the beginning of the article you’ll see statistics (and
bar graphs); at the bottom you’ll see the confidence intervals. For example,
“For the European data, one can say with 95% confidence that the true population
for wellbeing among those without TVs is between 4.88 and 5.26.” The confidence
interval here is “between 4.88 and 5.26”.
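CONFIDENCE INTERVAL FOR THE MEAN WITH KNOWN VARIANCE
For a random sample $X_1, \dots, X_n$ from a population with known standard
deviation $\sigma$, the standardized sample mean
$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ follows $N(0,1)$ (exactly for a normal
population, approximately for large $n$), so a $100(1-\alpha)\%$ confidence
interval for $\mu$ is
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
where $z_{\alpha/2}$ is the value of $N(0,1)$ that leaves area $\alpha/2$ to its right.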
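The known-variance interval x̄ ± z_{α/2}·σ/√n translates directly into code. A minimal sketch using only the standard library; `z_interval` is an illustrative name, and the usage line plugs in the figures from workshop problem 1 (σ = 3 known, n = 48, x̄ = 71), where the known σ is used rather than the sample value s = 2.8.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for a population mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}: area alpha/2 to its right
    margin = z * sigma / sqrt(n)             # margin of error
    return xbar - margin, xbar + margin


# Workshop problem 1: sigma = 3 (known), n = 48, sample mean = 71
low, high = z_interval(71, 3, 48, 0.95)
print(f"95% CI: ({low:.2f}, {high:.2f})")  # 95% CI: (70.15, 71.85)
```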
WORKSHOP
1) Among various ethnic groups, the standard deviation of heights is known to
be approximately three inches. We wish to construct a 95% confidence
interval for the mean height of male Swedes. Forty-eight male Swedes are
surveyed. The sample mean is 71 inches. The sample standard deviation is
2.8 inches.
2) The SAT scores from a random sample of 91 high school seniors were
analyzed and found to have a mean of 545 and a standard deviation of 75.
Find a 90% confidence interval for the population mean SAT score.
3) The ACT scores from a random sample of 61 high school seniors were
analyzed and found to have a mean of 25.1 and a standard deviation of 3.6.
Find a 95% confidence interval for the population mean ACT score.
175, 177, 180, 165, 170, 170, 181, 169, 165, 190, 170, 171
CONFIDENCE INTERVAL FOR PROPORTIONS
,…
We wish to study accident rates in a construction company in a year, taking
January 11
February 22
March 23
April 18
May 17
June 15
July 15
August 24
September 11
October 15
November 25
December 13
CONFIDENCE INTERVAL FOR THE VARIANCE
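For a random sample $X_1, \dots, X_n$ from a normal population, a
$100(1-\alpha)\%$ confidence interval for the variance is
$$\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}$$
where
$S^2$ = Sample variance.
$\sigma^2$ = Population variance.
$\chi^2_{\alpha/2,\,n-1}$ = chi-square value with $n-1$ degrees of freedom that leaves area $\alpha/2$ to its right.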
In a company, the quality control results for the density of certain materials
are given:
SAMPLE DENSITY
1 1.1
2 1.13
3 2.21
4 1.16
5 1.68
6 2.16
7 2.23
8 2.29
9 1.28
10 0.88
According to these data, construct a 95% confidence interval for the variance of the process.
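The density exercise can be checked numerically. A minimal sketch, assuming normally distributed densities and the standard chi-square interval (n − 1)S²/χ²_{α/2,n−1} ≤ σ² ≤ (n − 1)S²/χ²_{1−α/2,n−1}. Python's standard library has no chi-square quantile function, so the hypothetical helpers `chi2_cdf` and `chi2_ppf` compute one via the series for the regularized incomplete gamma function plus bisection; with SciPy available, `scipy.stats.chi2.ppf` would serve the same purpose.

```python
from math import exp, lgamma, log
from statistics import variance


def chi2_cdf(x, df):
    """Chi-square CDF: regularized lower incomplete gamma P(df/2, x/2) via its power series."""
    if x <= 0:
        return 0.0
    a, t = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while abs(term) > abs(total) * 1e-15 and n < 10_000:
        n += 1
        term *= t / (a + n)
        total += term
    return total * exp(-t + a * log(t) - lgamma(a))


def chi2_ppf(q, df):
    """Inverse chi-square CDF by bisection (the CDF is increasing in x)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < q:  # grow the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def variance_interval(data, confidence=0.95):
    """Chi-square confidence interval for a normal population variance."""
    n = len(data)
    s2 = variance(data)                            # sample variance S^2 (divisor n - 1)
    alpha = 1 - confidence
    upper_crit = chi2_ppf(1 - alpha / 2, n - 1)    # chi^2_{alpha/2, n-1}
    lower_crit = chi2_ppf(alpha / 2, n - 1)        # chi^2_{1-alpha/2, n-1}
    return (n - 1) * s2 / upper_crit, (n - 1) * s2 / lower_crit


densities = [1.1, 1.13, 2.21, 1.16, 1.68, 2.16, 2.23, 2.29, 1.28, 0.88]
low, high = variance_interval(densities)
print(f"S^2 = {variance(densities):.4f}, 95% CI for sigma^2: ({low:.4f}, {high:.4f})")
```

For these ten values S² is about 0.317 and the interval is roughly 0.15 to 1.06; the interval is not symmetric around S² because the chi-square distribution is skewed.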
CONFIDENCE INTERVAL FOR THE RATIO OF VARIANCES FROM TWO DIFFERENT POPULATIONS
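For independent random samples of sizes $n_1$ and $n_2$ from two normal
populations, a $100(1-\alpha)\%$ confidence interval for the ratio of variances is
$$\frac{S_1^2}{S_2^2}\cdot\frac{1}{F_{\alpha/2,\,n_1-1,\,n_2-1}} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{S_1^2}{S_2^2}\cdot\frac{1}{F_{1-\alpha/2,\,n_1-1,\,n_2-1}}$$
where
$S_1^2$ = Sample variance of population 1.
$S_2^2$ = Sample variance of population 2.
$\sigma_1^2$ = Population variance of population 1.
$\sigma_2^2$ = Population variance of population 2.
$F_{\alpha/2,\,n_1-1,\,n_2-1}$ = F value with $n_1-1$ and $n_2-1$ degrees of freedom that leaves area $\alpha/2$ to its right.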
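The ratio-of-variances interval (S₁²/S₂²)/F_{α/2,n₁−1,n₂−1} ≤ σ₁²/σ₂² ≤ (S₁²/S₂²)/F_{1−α/2,n₁−1,n₂−1} can be sketched the same way. This assumes independent normal samples; the F quantiles come from the regularized incomplete beta function (a continued fraction evaluated with the modified Lentz method) plus bisection, since the standard library has no F distribution. All function names are hypothetical helpers; `scipy.stats.f.ppf` would be the usual library alternative.

```python
from math import exp, lgamma, log
from statistics import variance

_TINY = 1e-300  # guard against division by zero in the continued fraction


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),            # even term
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd term
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > _TINY else _TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > _TINY else _TINY
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _beta_cf(a, b, x) / a
    return 1.0 - bt * _beta_cf(b, a, 1.0 - x) / b


def f_cdf(x, d1, d2):
    """CDF of the F distribution with d1 and d2 degrees of freedom."""
    return 0.0 if x <= 0 else betainc(d1 / 2.0, d2 / 2.0, d1 * x / (d1 * x + d2))


def f_ppf(q, d1, d2):
    """Inverse F CDF by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < q:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if f_cdf(mid, d1, d2) < q else (lo, mid)
    return (lo + hi) / 2.0


def variance_ratio_interval(sample1, sample2, confidence=0.95):
    """Confidence interval for sigma1^2 / sigma2^2 from two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    ratio = variance(sample1) / variance(sample2)   # S1^2 / S2^2
    alpha = 1 - confidence
    f_upper = f_ppf(1 - alpha / 2, n1 - 1, n2 - 1)  # F_{alpha/2, n1-1, n2-1}
    f_lower = f_ppf(alpha / 2, n1 - 1, n2 - 1)      # F_{1-alpha/2, n1-1, n2-1}
    return ratio / f_upper, ratio / f_lower


# Sanity check: F(1, 10) is the square of t(10), so this should be about 4.96
print(round(f_ppf(0.95, 1, 10), 3))
```

Calling `variance_ratio_interval(machine_1_lengths, machine_2_lengths)` with two lists of measured lengths would give the interval; an interval that contains 1 suggests the data are consistent with equal variances.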
MACHINE 1 MACHINE 2
SAMPLE LENGTH SAMPLE LENGTH SAMPLE LENGTH SAMPLE LENGTH SAMPLE LENGTH