
INFERENTIAL STATISTICS
SEBASTIAN GUSTAVO MORENO BARÓN
WHAT IS AN ESTIMATOR?
An estimator is a statistic that estimates some fact about the population. You
can also think of an estimator as the rule that creates an estimate. For
example, the sample mean (x̄) is an estimator for the population mean, μ.

The quantity that is being estimated (i.e. the one you want to know) is called
the estimand. For example, let’s say you wanted to know the average height of
children in a certain school with a population of 1000 students. You take a sample
of 30 children, measure them and find that the mean height is 56 inches. The
sample mean is your estimator, and 56 inches is the estimate it produces. You
use it to estimate that the population mean (your estimand) is about 56 inches.
POINT ESTIMATE VS INTERVAL ESTIMATE
Estimates can be a range of values (like a confidence interval) or a single
value (like the sample standard deviation). When an estimate is a range of
values, it’s called an interval estimate. For the height example above, you
might add a confidence interval of a couple of inches either way, say 54 to 58
inches. When it is a single value, like 56 inches, it’s called a point
estimate.
TYPES
Estimators can be described in several ways:
Biased: an estimator whose expected value differs from the parameter, so that
on average it overestimates or underestimates.
Efficient: an estimator with small variance (among unbiased estimators, the one
with the smallest possible variance is also called the “best”). Inefficient
estimators can give you good results as well, but they usually require much
larger samples.
Invariant: an estimator that is not easily changed by transformations, like
simple data shifts.
Shrinkage: a raw estimate that’s improved by combining it with other
information. See also: the James-Stein estimator.
Sufficient: a statistic that captures all the information about the population
parameter that the sample contains.
Unbiased: an estimator whose expected value equals the parameter, so that on
average it neither underestimates nor overestimates.
German Tank Problem
The German Tank Problem is a way to estimate the total population size from a
small sample. It’s commonly used in AP statistics to teach about estimators.
The problem was originally developed by the Allies during World War II, when it
was used to estimate the total number of German tanks from a small number of
serial numbers from captured, destroyed, or observed tanks. It was extended to
estimate the number of factories and other manufactured parts. Today, the
formula has been applied to wide-reaching applications like estimating the
number of iPhones sold.
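
The slides do not state the formula itself. A minimal sketch, assuming the usual textbook setting (items numbered 1 to N, a sample of k serial numbers drawn without replacement, largest observed serial m), uses the standard estimate N̂ = m + m/k − 1. The function name and the serial numbers below are hypothetical, chosen only for illustration.

```python
def german_tank_estimate(serials):
    """Estimate the population size N from observed serial numbers.

    Assumes serials run from 1 to N and were sampled without replacement.
    """
    k = len(serials)   # sample size
    m = max(serials)   # largest serial number observed
    return m + m / k - 1


# Hypothetical serial numbers, not taken from the slides.
observed = [19, 40, 42, 60]
estimate = german_tank_estimate(observed)
print(estimate)  # 60 + 60/4 - 1 = 74.0
```

The idea is that the largest serial seen is always at or below N, so the estimate pushes it upward by the average gap between observed serials.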
WHAT IS A PARAMETER?

It’s a value that tells you something about a population and is the opposite
of a statistic, which tells you something about a small part of the
population.
A parameter never changes, because everyone (or everything) was surveyed to find the
parameter. For example, you might be interested in the average age of everyone in your
class. Maybe you asked everyone and found the average age was 25. That’s a
parameter, because you asked everyone in the class. Now let’s say you wanted to know
the average age of everyone in your grade or year. If you use that information from your
class to take a guess at the average age, then that information becomes a statistic.
That’s because you can’t be sure your guess is correct (although it will probably be
close!).
WHAT IS A STATISTIC?
A statistic is a piece of data from a portion of a population. It’s the opposite
of a parameter. A parameter is data from a census. A census
surveys everyone.

[Diagram labels: Descriptive, Estimators, Test]
WHAT IS A POINT ESTIMATE?
In simple terms, any statistic can be a point estimate. A statistic is an estimator
of some parameter in a population. For example:
The sample standard deviation (s) is a point estimate of the
population standard deviation (σ).

The sample mean (x̄) is a point estimate of the population mean (μ).

The sample variance (s²) is a point estimate of the population variance (σ²).

In more formal terms, the estimate occurs as a result of point estimation
applied to a set of sample data. Points are single values, in comparison
to interval estimates, which are a range of values. For example, a confidence
interval is one example of an interval estimate.
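
As a small illustration of the three point estimates listed above, the sketch below computes x̄, s² and s with Python's standard `statistics` module. The sample values are hypothetical heights in inches, invented for this example and not taken from the slides.

```python
import statistics

# Hypothetical sample of heights in inches (illustrative only).
sample = [54, 55, 56, 57, 58]

x_bar = statistics.mean(sample)    # point estimate of the population mean
s2 = statistics.variance(sample)   # sample variance (divides by n - 1)
s = statistics.stdev(sample)       # sample standard deviation

print(x_bar, s2, s)
```

Note that `statistics.variance` and `statistics.stdev` use the n − 1 denominator, which matches the sample variance s² used in these slides.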
WHAT IS AN INTERVAL ESTIMATE?

As an example, let’s say you wanted to find out what proportion of senior
citizens smoke cigarettes. You can’t survey every senior citizen on the planet
(due to time constraints and finances), so you take a sample of 1000 senior
citizens and find that 10% of them smoke cigarettes. Although you’ve only taken
a sample, you can use that figure to estimate that “about” 10% of
the whole population smoke cigarettes. In reality, it’s unlikely to be exactly
10% (as you only sampled a small percentage of people), but it’s probably
somewhere around there, perhaps between 5 and 15%. That “somewhere
between 5 and 15%” is an interval estimate.
CONFIDENCE INTERVALS
In statistics, a confidence interval (CI) is a type of interval estimate,
computed from the statistics of the observed data, that might contain the
true value of an unknown population parameter.
EXAMPLE
A 2008 Gallup survey found that TV ownership may be good for wellbeing. The
poll reported a 95% confidence level with a margin of error of +/-3, which means
that if Gallup repeated the poll over and over, using the same techniques, about
95% of the resulting intervals would contain the true population value. The 95%
is the confidence level and the +/-3 is called a margin of error. At the
beginning of the article you’ll see statistics (and bar graphs). At the bottom
of the article you’ll see the confidence intervals. For example, “For the
European data, one can say with 95% confidence that the true population for
wellbeing among those without TVs is between 4.88 and 5.26.” The confidence
interval here is “between 4.88 and 5.26”.
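CONFIDENCE INTERVAL FOR THE MEAN WITH KNOWN VARIANCE
Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$ with $\sigma^2$ known. Then

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1),$$

and a $100(1-\alpha)\%$ confidence interval for $\mu$ is

$$\bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}.$$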
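
A minimal sketch of the known-variance interval, x̄ ± z·σ/√n, using only the Python standard library. The function name `z_interval` is hypothetical. The inputs reuse the earlier school example (x̄ = 56 inches, n = 30), while σ = 5.5 inches is an assumed value chosen purely for illustration; the slides do not give it.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# x_bar and n come from the height example; sigma = 5.5 is an assumption.
low, high = z_interval(x_bar=56, sigma=5.5, n=30)
print(f"{low:.2f} to {high:.2f}")  # roughly 54 to 58 inches
```

With these assumed inputs the interval comes out close to the "54 to 58 inches" mentioned earlier for the height example.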
 
WORKSHOP
1) Among various ethnic groups, the standard deviation of heights is known to
be approximately three inches. We wish to construct a 95% confidence
interval for the mean height of male Swedes. Forty-eight male Swedes are
surveyed. The sample mean is 71 inches. The sample standard deviation is
2.8 inches.

2) The SAT scores from a random sample of 91 high school seniors were
analyzed and found to have a mean of 545 and a standard deviation of 75.
Find a 90% confidence interval.

3) The ACT scores from a random sample of 61 high school seniors were
analyzed and found to have a mean of 25.1 and a standard deviation of 3.6.
Find a 95% confidence interval.

4) An IQ test was administered to a random sample of 101 sixth graders. The
sample mean was 104.2 with a standard deviation of 12. Find a 99%
confidence interval.
5) The average height of a random sample of 21 adult males was 70 inches with
a standard deviation of 2.2 inches. Find a 99% confidence interval.
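CONFIDENCE INTERVAL FOR THE MEAN WITH UNKNOWN VARIANCE
Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$ with $\sigma^2$ unknown, and let $S$ be the sample standard deviation. Then

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1},$$

and a $100(1-\alpha)\%$ confidence interval for $\mu$ is

$$\bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}.$$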
 
A researcher wants to estimate the mean height of a population. From previous
studies, it’s known that height has approximately a normal distribution. The
researcher collected the following data:

175, 177, 180, 165, 170, 170, 181, 169, 165, 190, 170, 171
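
The slide does not say which confidence level is wanted, so the sketch below assumes 95%. It computes x̄ ± t·S/√n for the twelve heights above. The critical value t(0.025, 11) = 2.201 is taken from a standard t table rather than computed, because the Python standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, stdev

heights = [175, 177, 180, 165, 170, 170, 181, 169, 165, 190, 170, 171]

n = len(heights)
T_CRIT = 2.201  # t table value for alpha/2 = 0.025 and n - 1 = 11 df (95% level assumed)

x_bar = mean(heights)   # sample mean
s = stdev(heights)      # sample standard deviation (n - 1 denominator)
margin = T_CRIT * s / sqrt(n)

t_low, t_high = x_bar - margin, x_bar + margin
print(f"mean = {x_bar:.2f}, s = {s:.2f}")
print(f"95% CI: {t_low:.2f} to {t_high:.2f}")
```

For a different confidence level, only `T_CRIT` needs to change to the matching table value with 11 degrees of freedom.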
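CONFIDENCE INTERVAL FOR PROPORTIONS
Let $X_1, \dots, X_n$ be a random sample from a Bernoulli distribution with success probability $p$, and let $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} X_i$ be the sample proportion. For large $n$, an approximate $100(1-\alpha)\%$ confidence interval for $p$ is

$$\hat{p} \pm z_{\alpha/2}\,\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}.$$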
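
A minimal sketch of the large-sample interval for a proportion, p̂ ± z·√(p̂(1 − p̂)/n), applied to the earlier smoking example (1000 senior citizens sampled, 10% smokers). The function name `proportion_interval` is hypothetical, and the 95% level is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(p_hat, n, confidence=0.95):
    """Large-sample (normal approximation) interval for a proportion."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Figures from the senior-citizen example: n = 1000, 10% smokers.
p_low, p_high = proportion_interval(p_hat=0.10, n=1000)
print(f"{p_low:.3f} to {p_high:.3f}")
```

Under these assumptions the interval is noticeably narrower than the informal "between 5 and 15%" guess given in that example.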
We wish to study accident rates in a construction company in a year, taking the following monthly figures:

January 11
February 22
March 23
April 18
May 17
June 15
July 15
August 24
September 11
October 15
November 25
December 13
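CONFIDENCE INTERVAL FOR THE VARIANCE
Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$, where

$S^2$ = Sample variance.

$\sigma^2$ = Population variance.

Then

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1},$$

and a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is

$$\left( \frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}},\ \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}} \right),$$

where $\chi^2_{a,\,n-1}$ is the value with upper-tail probability $a$ under the chi-square distribution with $n-1$ degrees of freedom.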
In a company, quality control results of density in certain materials are given:

SAMPLE  DENSITY
1       1.1
2       1.13
3       2.21
4       1.16
5       1.68
6       2.16
7       2.23
8       2.29
9       1.28
10      0.88

According to this, create a 95% confidence interval for the variance of the process.
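
One way to work this example is sketched below, assuming the densities come from an approximately normal process. The interval is ((n − 1)S² / upper critical value, (n − 1)S² / lower critical value). The two chi-square critical values for 9 degrees of freedom (19.023 and 2.700) are copied from a standard table, since the Python standard library does not provide chi-square quantiles.

```python
from statistics import variance

density = [1.1, 1.13, 2.21, 1.16, 1.68, 2.16, 2.23, 2.29, 1.28, 0.88]

n = len(density)
s2 = variance(density)  # sample variance S^2 (n - 1 denominator)

# Chi-square table values for n - 1 = 9 degrees of freedom, 95% level.
CHI2_UPPER = 19.023  # upper-tail probability 0.025
CHI2_LOWER = 2.700   # upper-tail probability 0.975

var_low = (n - 1) * s2 / CHI2_UPPER
var_high = (n - 1) * s2 / CHI2_LOWER
print(f"S^2 = {s2:.4f}")
print(f"95% CI for the variance: {var_low:.3f} to {var_high:.3f}")
```

The interval is not symmetric around S² because the chi-square distribution is skewed.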
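CONFIDENCE INTERVAL FOR THE RATIO OF VARIANCES FROM TWO DIFFERENT POPULATIONS
Let $X_1, \dots, X_{n_1}$ and $Y_1, \dots, Y_{n_2}$ be independent random samples from two normal populations, where

$S_1^2$ = Sample variance of population 1.

$S_2^2$ = Sample variance of population 2.

$\sigma_1^2$ = Population variance of population 1.

$\sigma_2^2$ = Population variance of population 2.

Then

$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1},$$

and a $100(1-\alpha)\%$ confidence interval for $\sigma_1^2/\sigma_2^2$ is

$$\left( \frac{S_1^2}{S_2^2}\cdot\frac{1}{F_{\alpha/2,\,n_1-1,\,n_2-1}},\ \frac{S_1^2}{S_2^2}\cdot F_{\alpha/2,\,n_2-1,\,n_1-1} \right),$$

where $F_{\alpha/2,\,a,\,b}$ is the value with upper-tail probability $\alpha/2$ under the F distribution with $a$ and $b$ degrees of freedom.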

MACHINE 1               MACHINE 2
SAMPLE  LENGTH (m)      SAMPLE  LENGTH (m)
1       7.19            1       6.12
2       6.19            2       6.22
3       6.73            3       6.75
4       7.78            4       7.97
5       6.89            5       6.13
6       6.75            6       7.49
7       7.96            7       7.39
8       6.99            8       6.21
9       7.63
10      6.61
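
A sketch of an interval for the ratio of the two machine variances, under several assumptions that the slides do not confirm: a 95% confidence level, normal populations, and samples 9 and 10 belonging to machine 1. The F critical values (4.823 for 9 and 7 degrees of freedom, 4.197 for 7 and 9) are standard table values for upper-tail probability 0.025, since the Python standard library has no F quantile function.

```python
from statistics import variance

# Assumption: samples 9 and 10 in the table belong to machine 1.
machine_1 = [7.19, 6.19, 6.73, 7.78, 6.89, 6.75, 7.96, 6.99, 7.63, 6.61]
machine_2 = [6.12, 6.22, 6.75, 7.97, 6.13, 7.49, 7.39, 6.21]

s2_1 = variance(machine_1)  # sample variance, machine 1 (9 df)
s2_2 = variance(machine_2)  # sample variance, machine 2 (7 df)
ratio = s2_1 / s2_2

# F table values, upper-tail probability 0.025 (95% level assumed).
F_9_7 = 4.823  # numerator df = 9, denominator df = 7
F_7_9 = 4.197  # numerator df = 7, denominator df = 9

ratio_low = ratio / F_9_7
ratio_high = ratio * F_7_9
print(f"S1^2 / S2^2 = {ratio:.3f}")
print(f"95% CI for the variance ratio: {ratio_low:.3f} to {ratio_high:.3f}")
```

If the interval contains 1, the data are consistent with the two machines having equal variances at the assumed level.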
WORKSHOP
1) A company has a warehouse with 754 beams. The beams are supposed to be
between 8 m and 10 m long. You take a sample of 50 beams and get the
following results.

SAMPLE  LENGTH   SAMPLE  LENGTH   SAMPLE  LENGTH   SAMPLE  LENGTH   SAMPLE  LENGTH
1       8.336    11      7.725    21      8.653    31      10.708   41      8.202
2       7.623    12      7.923    22      10.979   32      9.032    42      9.961
3       9.778    13      9.312    23      10.885   33      8.919    43      10.885
4       10.767   14      9.831    24      8.464    34      8.107    44      7.543
5       7.569    15      7.434    25      7.702    35      8.644    45      9.186
6       9.456    16      9.241    26      7.578    36      7.67     46      8.787
7       8.267    17      9.524    27      7.114    37      9.292    47      7.472
8       10.468   18      8.124    28      8.991    38      8.002    48      8.293
9       7.077    19      10.235   29      10.905   39      9.542    49      9.718
10      7.278    20      10.13    30      8.893    40      9.919    50      10.1

Using the previous results, construct:
a) A 90% confidence interval for the mean of the length.
b) A 95% confidence interval for the variance of the length.
c) A 99% confidence interval for the percentage of defective beams.
d) A 95% confidence interval for the ratio of the variances of the lengths,
comparing the data above with the following data from a second warehouse
analyzed in the company.
Give conclusions about the accuracy of the beams. Would you take any action?
SAMPLE  LENGTH   SAMPLE  LENGTH   SAMPLE  LENGTH   SAMPLE  LENGTH
1       8.592    11      10.524   21      9.753    31      7.942
2       8.494    12      8.284    22      10.507   32      8.179
3       10.491   13      10.42    23      8.549    33      9.159
4       9.027    14      8.148    24      7.139    34      7.89
5       9.428    15      9.408    25      10.934   35      7.346
6       7.082    16      10.88    26      7.734    36      7.247
7       8.427    17      10.008   27      8.43     37      7.81
8       9.779    18      9.209    28      9.724    38      7.376
9       10.385   19      10.127   29      10.047   39      9.171
10      10.65    20      9.829    30      8.597    40      10.797
STATISTICAL PROCESS CONTROL
ATTRIBUTE CONTROL CHART – P CHART
“The grace of the Lord Jesus Christ, the love of God and the fellowship of
the Holy Spirit be with you all”
2 Corinthians 13:14
