Professional Documents
Culture Documents
Confidence Intervals
INTRODUCTION TO CONFIDENCE INTERVAL
−
0.1 1.64 X 1.64 / n
−
0.05 1.96 X 1.96 / n
−
0.02 2.33 X 2.33 / n
−
0.01 2.58 X 2.58 / n
Example
Note that 4.73 is the upper limit of the 95% confidence interval from part (a), thus
the probability that the population mean is greater than 4.73 is approximately
0.025.
Confidence Interval for Population Proportion
If X1, X2, …, Xn are from Bernoulli trials with probability of success p, that
is, E(Xi) = p and Var(Xi) = p × q (where q = 1 – p), then the sampling
distribution of probability of success (say p̂ ) for a large sample size
follows an approximate normal distribution with mean p and standard
error pq / n , where n is the sample size.
ˆ − p
p
The variable pq / n converges to a standard normal distribution. Note
that the standard deviation of the sampling distribution of proportions
depends on the value ofp̂ p which is unknown. However, for large sample
size, the estimate value will converge to the actual value p.
CI for Population Proportion
• As a rule of thumb, we set the value of n such that n × p × q 10 (few
authors suggest that n × p × q should be at least 15) .
ˆ q
p ˆ ˆ q
p ˆ
ˆ − Z / 2
p p p
ˆ + Z/ 2
n n
That is, the 95% confidence interval for p is (0.6102, 0.7898). That
is, we are 95% confident that the interval (0.6102, 0.7898) contains
the true population proportion of the customers who pay by cash
CI for Population Mean when Standard Deviation is Unknown
• William Gossett (Student, 1908) proved that if the population follows a normal distribution
and the standard deviation is calculated from the sample, then the statistic given in Eq will
follow a t-distribution with (n − 1) degrees of freedom
−
X −
t =
S / n
• Here S is the standard deviation estimated from the sample (standard error). The t-
distribution is very similar to standard normal distribution; it has a bell shape and its mean,
median, and mode are equal to zero as in the case of standard normal distribution. The
major difference between the t-distribution and the standard normal distribution is that t-
distribution has broad tail compared to standard normal distribution. However, as the
degrees of freedom increases the t-distribution converges to standard normal distribution.
CI for Population Mean when Standard Deviation is Unknown
• The (1 − )100% confidence interval for mean from a population that follows
normal distribution when the population mean is unknown is given by
− S
X t / 2, n −1
n
• In above Eq, the value t/2,n − 1 is the value of t under t-distribution for which
the cumulative probability F(t) = 0.025 when the degrees of freedom is (n − 1).
• Here the degrees of freedom is (n − 1) since standard deviation is estimated
from the sample
• The absolute values of t/2,n−1 for different values of
are shown in Table along with corresponding Z/2
values.
|t/2,10| |t/2,50| |t/2,500| |Z/2|
Solution
−
We know thatX = 24 , n = 70, S = 3.8 and t0.025, 69 = 1.995
Thus the 95% confidence interval for the size of the basket is
(23.09,24.91).
Summary
a) The point estimate of population parameters give unique value,
however, data scientists would like to know the range of values the
population parameter is likely to take. Interval estimates provide
better insights about the population parameter.