(Introduction to Statistics)
Estimation is the statistical procedure which uses sample information to estimate the value of a
population parameter such as the population mean, population standard deviation or population
proportion.
Estimation
Two key terms: confidence interval and confidence level
Confidence level, c: the probability that the CI actually contains the parameter (μ for us), in the sense that if we take a large number of samples of size n and calculate a CI for each sample, a proportion c of those CIs will contain the parameter.
Significance level, α = 1 - c: the probability that our CI does not contain the true value of the population parameter (again, if we repeatedly take different samples and calculate a CI for each).
Ex:
Let’s say we take a sample from a population where we do not know μ, but we know σ² = 9 (so σ = 3).
We want a 95% confidence interval for μ. Suppose we take 1000 samples of size n = 36 and calculate 1000 CIs.
Since 95% = 0.95, we expect 0.95 × 1000 = 950 CIs to contain μ. Since 100% - 95% = 5% = 0.05, we expect 0.05 × 1000 = 50 CIs not to contain μ.
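The repeated-sampling idea in this example can be checked by simulation. The sketch below assumes a normal population with a hypothetical true mean μ = 50 (any value works), uses σ = 3 and n = 36 from the example, and builds 1000 intervals of the form x̄ ± z·σ/√n. The function name and the random seed are illustrative choices, not part of the notes.

```python
import random
import statistics
from statistics import NormalDist

def simulate_coverage(mu, sigma, n, conf, num_intervals, seed=0):
    """Draw num_intervals samples of size n from a Normal(mu, sigma) population,
    build a z-interval for each, and count how many contain mu (hypothetical helper)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf((1 + conf) / 2)  # critical value, about 1.96 for 95%
    margin = z * sigma / n ** 0.5             # same margin for every sample since sigma is known
    hits = 0
    for _ in range(num_intervals):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits

# Assumed true mean mu = 50; sigma^2 = 9 so sigma = 3; n = 36 as in the example.
covered = simulate_coverage(mu=50, sigma=3, n=36, conf=0.95, num_intervals=1000)
print(covered, "of 1000 intervals contain mu;", 1000 - covered, "do not")
```

The count will be close to 950 but rarely exactly 950: 95% describes the long-run behavior of the method, not a guarantee for any particular batch of 1000 intervals.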
Confidence Interval
A confidence interval for a population mean, μ, is an interval expected to capture the true population mean a certain percentage of the time. This percentage is called the confidence level. The confidence levels we will discuss are 90%, 95%, and 99%.
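As a sketch, assuming σ is known and the conditions under Assumptions hold, the interval for μ is x̄ ± z·σ/√n. The function `z_interval` and the sample mean x̄ = 20 below are hypothetical; σ = 3 and n = 36 match the earlier example. It prints the interval at each of the three confidence levels.

```python
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf):
    """Confidence interval for mu when sigma is known: xbar +/- z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf((1 + conf) / 2)  # critical value for confidence level conf
    margin = z * sigma / n ** 0.5
    return xbar - margin, xbar + margin

# Hypothetical sample mean of 20 from n = 36 observations with sigma = 3.
for conf in (0.90, 0.95, 0.99):
    low, high = z_interval(20, 3, 36, conf)
    print(f"{conf:.0%}: ({low:.2f}, {high:.2f})")
```

Higher confidence levels give wider intervals: to be more confident of capturing μ, the interval has to cover more ground.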
Assumptions
1) We have a simple random sample of size n from a population of X values.
2) The value of σ is known.
3) If the population of X is normally distributed, then we can use any value of n.
4) If the distribution of X is unknown or is some other (non-normal) distribution, then we require a sample size n≥30.
Note: If the distribution of X is highly skewed and not mound shaped, a sample of n≥50 or even
100 may be required.
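The assumptions and the note above amount to a small decision rule. The hypothetical helper below encodes it as a rule of thumb; it assumes the sample is already a simple random sample, and it uses 50 as the cutoff for highly skewed populations even though the note says 100 may sometimes be needed.

```python
def z_interval_conditions_met(sigma_known, population_normal, n, highly_skewed=False):
    """Rule-of-thumb check of the z-interval assumptions (hypothetical helper).
    Assumes the sample is a simple random sample (assumption 1)."""
    if not sigma_known:
        return False       # assumption 2 fails: sigma must be known
    if population_normal:
        return n >= 1      # assumption 3: any sample size works
    if highly_skewed:
        return n >= 50     # note: 50 is the lower end; 100 may be required
    return n >= 30         # assumption 4: unknown or non-normal distribution

print(z_interval_conditions_met(True, True, 10))                        # normal population, small n
print(z_interval_conditions_met(True, False, 20))                       # unknown shape, n too small
print(z_interval_conditions_met(True, False, 40, highly_skewed=True))   # skewed, n below 50
```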
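Margin of error (maximum error of estimate): E = z·σ/√n, the largest distance we expect, at confidence level c, between the sample mean x̄ and μ. Since we do not know μ, we can never know the exact size of the actual error |x̄ - μ|; E only bounds it with confidence c.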
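A short sketch of how the margin of error E = z·σ/√n responds to the sample size, assuming σ = 3 (as in the earlier example) and 95% confidence. The function name is hypothetical. Because n sits under a square root, quadrupling n only halves E.

```python
from statistics import NormalDist

def margin_of_error(sigma, n, conf):
    """Maximum error of estimate E = z * sigma / sqrt(n) for a z-interval (hypothetical helper)."""
    z = NormalDist().inv_cdf((1 + conf) / 2)
    return z * sigma / n ** 0.5

# sigma = 3 as in the earlier example; n = 144 is four times n = 36.
for n in (36, 144):
    print(n, round(margin_of_error(3, n, 0.95), 3))
```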
Critical value: for a CI with confidence level c, the critical value z is the number such that the area under the standard normal curve between -z and z equals c.
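Since the area c sits in the middle, the two tails share 1 - c equally, so z is the (1 + c)/2 quantile of the standard normal distribution. A minimal sketch using Python's `statistics.NormalDist` computes the critical values for the three confidence levels in these notes and checks that the area between -z and z really is c. The function name `critical_z` is hypothetical.

```python
from statistics import NormalDist

def critical_z(conf):
    """Return z such that the area under the standard normal curve between -z and z is conf."""
    # Each tail holds (1 - conf) / 2, so z is the (1 + conf) / 2 quantile.
    return NormalDist().inv_cdf((1 + conf) / 2)

std_normal = NormalDist()
for conf in (0.90, 0.95, 0.99):
    z = critical_z(conf)
    area = std_normal.cdf(z) - std_normal.cdf(-z)  # should equal conf
    print(f"c = {conf:.2f}: z = {z:.3f}, area = {area:.4f}")
```

This reproduces the familiar table values 1.645, 1.960 and 2.576.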
Sample Size
The goal is to determine the minimum sample size (n) required to achieve
1) A confidence level of C
2) A maximum error of E
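Solving E = z·σ/√n for n gives n = (z·σ/E)², where z is the critical value for confidence level c. Because n must be a whole number and a smaller n would exceed the maximum error, the result is always rounded up. The sketch below assumes σ is known; the target values (95% confidence, σ = 3, E = 0.5) and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def min_sample_size(conf, sigma, max_error):
    """Smallest n with z * sigma / sqrt(n) <= max_error, i.e. n = ceil((z * sigma / E)^2)."""
    z = NormalDist().inv_cdf((1 + conf) / 2)
    return math.ceil((z * sigma / max_error) ** 2)  # always round up, never down

# Hypothetical target: 95% confidence, sigma = 3, maximum error E = 0.5.
print(min_sample_size(0.95, 3, 0.5))  # 139
```

Here (z·σ/E)² is about 138.29, so 138 observations would not quite reach E = 0.5 and the answer is 139.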
Degrees of Freedom of a statistic are the number of free choices used in computing the statistic.
The degrees of freedom, denoted by df, for a sample of size n is one less than the sample size. Thus, for a sample of size n, the degrees of freedom are given by the formula: df = n - 1
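The "free choices" idea can be seen directly: once the sample mean is fixed, only n - 1 values can be chosen freely and the last one is forced. This is why the sample standard deviation s, which is computed around x̄, has n - 1 degrees of freedom. The helper and the numbers below are hypothetical.

```python
def forced_last_value(free_values, mean, n):
    """Given the sample mean and n - 1 freely chosen values, return the value the last one must take."""
    assert len(free_values) == n - 1, "exactly n - 1 values can be chosen freely"
    return n * mean - sum(free_values)

# Hypothetical: n = 4 values must average 10, so only 3 can be picked freely (df = 3).
last = forced_last_value([7, 12, 9], mean=10, n=4)
print(last)  # 12, since 7 + 12 + 9 + 12 = 40 = 4 * 10
```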
Student’s t-distribution
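The t-distribution, also known as Student’s t-distribution, is a bell-shaped, symmetric distribution centered at 0, with the greatest concentration of values near the center and fewer in the tails. Compared with the standard normal curve it has heavier tails; its exact shape depends on the degrees of freedom, and it approaches the standard normal distribution as df increases. It is used in place of z when σ is unknown and is estimated by the sample standard deviation s.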
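To see the heavier tails in numbers, the sketch below computes 95% critical values of the t-distribution for several degrees of freedom and compares them with z ≈ 1.96. Python's standard library has no t-distribution, so this builds one from scratch under stated assumptions: the t density formula, Simpson's rule for the area from 0 to t, and bisection to invert it. All function names are hypothetical. In practice a t-table or SciPy's `scipy.stats.t.ppf` would be used instead.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: the left half (0.5) plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def critical_t(conf, df):
    """t such that the area between -t and t is conf, found by bisection."""
    target = (1 + conf) / 2
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)
for df in (5, 10, 30, 100):
    print(f"df = {df:>3}: t = {critical_t(0.95, df):.3f}  (z = {z:.3f})")
```

The t critical values shrink toward 1.96 as df grows, matching the statement that the t-distribution approaches the standard normal for large samples.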
Distributional Requirements
One or more of the following requirements must be met to use a t-distribution:
1) X has a normal distribution, or
2) n≥30.
Why?
The t-distribution is only valid if we can assume the distribution of x̄ is normal. This is true in either of the following cases:
1) If n<30, then we cannot use the CLT.
Thus, for x̄ to be normally distributed, X itself has to be normally distributed.
2) If n≥30, then by the CLT, x̄ is approximately normally distributed.
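Case 2 can be illustrated by simulation. The sketch below assumes a right-skewed exponential population (an illustrative choice; its skewness is 2) and measures the skewness of many simulated sample means for several n. For this population theory gives a skewness of 2/√n for x̄, so the distribution of x̄ becomes more symmetric, one symptom of approaching normality, as n grows. The function names, seed and number of repetitions are hypothetical.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values, m)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def skewness_of_means(n, reps=20000, seed=1):
    """Skewness of reps sample means, each from n draws of an exponential (skewed) population."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
    return skewness(means)

for n in (2, 5, 30):
    print(n, round(skewness_of_means(n), 2))
```

With n = 2 the sample means are still clearly skewed, while by n = 30 the skewness is much closer to 0, which is why n≥30 is the usual threshold for relying on the CLT.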