
Statistical Estimation

Sampling Distribution
Distribution of all possible values of a statistic computed from samples of
the same size randomly selected from the same population.

Due to random variation, different samples from the same population will
have different sample means.

If we repeatedly take samples of the same size n from a population, the
means of the samples form a sampling distribution of means for samples of size n.

A sampling distribution serves to answer probability questions about sample statistics.


A. Sampling distribution of sample mean

• Suppose we have a population of size N = 4, consisting of the ages
of four outpatients.

x, Age (years): 18, 20, 22, 24

$\mu = \frac{\sum x_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$

$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = 2.236$

Now consider all possible samples of size n = 2.

• 16 possible samples (with replacement)

1st    2nd Observation
Obs    18      20      22      24
18     18,18   18,20   18,22   18,24
20     20,18   20,20   20,22   20,24
22     22,18   22,20   22,22   22,24
24     24,18   24,20   24,22   24,24

• 16 sample means

1st    2nd Observation
Obs    18   20   22   24
18     18   19   20   21
20     19   20   21   22
22     20   21   22   23
24     21   22   23   24
Sampling distribution of all sample means

Sample mean (x̄)   Freq   P(x̄)
18                 1      0.0625
19                 2      0.1250
20                 3      0.1875
21                 4      0.2500
22                 3      0.1875
23                 2      0.1250
24                 1      0.0625

[Figure: bar chart of the distribution of the 16 sample means; horizontal axis: sample mean x̄ (18 to 24), vertical axis: P(x̄) (0 to 0.3)]
Summary measures of this sampling distribution:
Add the 16 sample means & divide by 16.
Also calculate the SD of the sample means.

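$\mu_{\bar{x}} = \frac{\sum \bar{x}_i}{N} = \frac{18 + 19 + 20 + \cdots + 24}{16} = 21$

$\sigma_{\bar{x}} = \sqrt{\frac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$

Here N = 16 is the number of sample means.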
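The enumeration above can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the original slides.

```python
from collections import Counter
from itertools import product
from math import sqrt

population = [18, 20, 22, 24]  # ages of the four outpatients
n = 2                          # sample size

# All ordered samples of size n drawn with replacement: 4**2 = 16 samples.
samples = list(product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

# Frequency and probability of each distinct sample mean.
freq = Counter(sample_means)
for mean in sorted(freq):
    print(mean, freq[mean], freq[mean] / len(samples))

# Mean and SD of the 16 sample means (dividing by 16, as in the slides).
mu_xbar = sum(sample_means) / len(sample_means)
sigma_xbar = sqrt(sum((m - mu_xbar) ** 2 for m in sample_means) / len(sample_means))
print(mu_xbar, round(sigma_xbar, 2))  # 21.0 1.58
```

The SD here divides by the number of sample means rather than by one less, because the 16 means are treated as a complete distribution and not as a sample.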
Properties

1. The mean of the sampling distribution of means is the same as the
population mean, μ.

2. The SD of the sampling distribution of means is σ / √n (the standard error of the mean).

3. The shape of the sampling distribution of means is approximately a
normal curve, regardless of the shape of the population distribution,
provided n is large enough (central limit theorem).

• In practice, the approximation is a workable one if n is 30 or more.
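Properties 1 and 2 can be checked exactly for the four-age population by enumerating every sample with replacement for several sample sizes. The sketch below is illustrative only: the helper name `sampling_distribution_summary` is an assumption, and the script does not test property 3 (normality), which concerns large n.

```python
from itertools import product
from math import isclose, sqrt

population = [18, 20, 22, 24]
N = len(population)
mu = sum(population) / N
sigma = sqrt(sum((x - mu) ** 2 for x in population) / N)

def sampling_distribution_summary(n):
    """Return (mean, SD) of the means of all ordered samples of size n,
    drawn with replacement from `population`."""
    means = [sum(s) / n for s in product(population, repeat=n)]
    m = sum(means) / len(means)
    sd = sqrt(sum((v - m) ** 2 for v in means) / len(means))
    return m, sd

for n in range(1, 5):
    m, sd = sampling_distribution_summary(n)
    # Columns: n, mean of sample means, SD of sample means, sigma / sqrt(n)
    print(n, m, round(sd, 4), round(sigma / sqrt(n), 4))
```

For each n, the third and fourth columns should agree, and the second column should equal μ = 21.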


Sampling Distribution of proportion

The sampling distribution of the sample proportion p possesses the
following properties.

• The mean of the sampling distribution of p is equal to the
population proportion P.

• The standard deviation of the sampling distribution of p is √(P(1 − P) / n)
(called the standard error of the proportion).

• Provided n is large enough, the shape of the sampling distribution
of p is approximately normal.
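The first two properties can be verified numerically if the number of successes x in a sample of size n is assumed to follow a binomial distribution, so that p = x / n. The values P = 0.3 and n = 50 below are hypothetical and chosen only for illustration.

```python
from math import comb, isclose, sqrt

def proportion_sampling_distribution(P, n):
    """Exact distribution of p = x / n when x ~ Binomial(n, P).
    Returns a dict mapping each possible p to its probability."""
    return {x / n: comb(n, x) * P**x * (1 - P) ** (n - x) for x in range(n + 1)}

P, n = 0.3, 50  # hypothetical population proportion and sample size
dist = proportion_sampling_distribution(P, n)

mean_p = sum(p * prob for p, prob in dist.items())
sd_p = sqrt(sum((p - mean_p) ** 2 * prob for p, prob in dist.items()))

# Compare with the stated properties: mean = P, SD = sqrt(P(1 - P) / n).
print(round(mean_p, 4), round(sd_p, 4), round(sqrt(P * (1 - P) / n), 4))
```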
Statistical Inference

 Statistical inference includes (methods of making inference)

 1. Estimation

 2. Hypothesis testing
Sample statistic → Population parameter
Statistical Estimation

Estimation is the process of determining a likely value for a
population parameter based on information collected from the
sample, i.e. the use of sample statistics to estimate population parameters.

E.g.
• The proportion of smokers among all people aged 15 to 24
in the population
• The mean level of a certain enzyme among healthy men
Point Estimation

A single numerical value is used to estimate the corresponding
population parameter.
• x̄ is an estimator of the population mean μ
• S is an estimator of the population standard deviation σ
• p is an estimator of the population proportion π


Point estimation…
• From a single sample we can calculate a sample statistic to estimate a
single parameter (a point estimate).
• The point estimate for the population mean µ is

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

• The point estimate for the population proportion is given by

$p = \frac{x}{n}$

where x is the total number of successes (events).


Interval estimation
• Interval estimation is a statement that a population parameter has a value
lying between two specified limits.

• The value of the sample statistic varies from sample to sample, so
a single-value estimate of the parameter is generally not sufficient on its own.

• We need to take into account the sample-to-sample variation of the statistic.

• A confidence interval defines an interval within which the true population
parameter is likely to fall (interval estimate).
Confidence interval…

A (1 − α)100% confidence interval for an unknown population mean
and population proportion is given as follows:

$\left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$ for estimating a mean
(if σ is unknown, it can be estimated by s).

$\left[p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\ p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right]$ for estimating a proportion.
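The two interval formulas can be sketched in code as follows. The function names (`z_critical`, `mean_ci`, `proportion_ci`) and the input numbers are assumptions made for illustration; the critical value is taken from the standard normal distribution in Python's `statistics` module.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z_(alpha/2) for the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_ci(xbar, sd, n, confidence=0.95):
    """Normal-based CI for a mean; sd is sigma, or s when sigma is unknown."""
    margin = z_critical(confidence) * sd / sqrt(n)
    return xbar - margin, xbar + margin

def proportion_ci(p, n, confidence=0.95):
    """Large-sample CI for a proportion."""
    margin = z_critical(confidence) * sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Hypothetical inputs, not taken from the slides.
print(mean_ci(xbar=21.0, sd=2.236, n=36))
print(proportion_ci(p=0.5, n=100))
```

Both functions rely on the normal approximation, so they are only appropriate when n is large enough for the sampling distribution to be approximately normal.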

 The 95% confidence interval is interpreted in such a way that,
under the conditions assumed for underlying distribution, you are
95% confident that the interval contains the true parameter.

• A 90% CI is narrower than a 95% CI, since we are only 90% certain
that the interval includes the population parameter.

• A 99% CI is wider than a 95% CI; the extra width means that
we can be more certain that the interval will contain the
population parameter.
• To obtain higher confidence from the same sample, we must be
willing to accept a larger margin of error (a wider interval).

 For a given confidence level (i.e. 90%, 95%, 99%) the width of the
confidence interval depends on the standard error of the estimate which
in turn depends on the:
1. Sample size: the larger the sample size, the narrower the confidence
interval and the more precise our estimate.

• Lack of precision means that in repeated sampling the values of the sample
statistic are spread out or scattered, so the result of sampling is not repeatable.
• You can make the precision as high as you want by taking a large
enough sample.
• The margin of error decreases as √n increases.
2. Standard deviation: the more variation among the individual
values, the wider the confidence interval and the less precise the estimate.
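The effect of confidence level, sample size, and standard deviation on the margin of error can be illustrated numerically. This is a sketch for the CI of a mean; the baseline values (SD of 10, n of 100) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sd, n, confidence):
    """Half-width of the normal-based CI for a mean: z_(alpha/2) * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd / sqrt(n)

sd, n = 10.0, 100  # hypothetical baseline values

# Higher confidence level -> wider interval.
for confidence in (0.90, 0.95, 0.99):
    print("confidence", confidence, round(margin_of_error(sd, n, confidence), 2))

# Larger sample -> narrower interval (quadrupling n halves the margin).
for size in (25, 100, 400):
    print("n", size, round(margin_of_error(sd, size, 0.95), 2))

# Larger standard deviation -> wider interval.
for spread in (5.0, 10.0, 20.0):
    print("sd", spread, round(margin_of_error(spread, n, 0.95), 2))
```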
3. C.I. for a population proportion (large sample size)
• A 100(1 − α)% C.I. for π is

$p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$

Example:

A study on dental health practice: of 300 adults interviewed, 123
said that they regularly had a dental check-up twice a year.
What is the 95% C.I. for π?
• p = 123/300 = 0.41 is a point estimate of π.

• α = 0.05 ⇒ z0.025 = 1.96

• The 95% C.I. is 0.41 ± 1.96 √(0.41 × 0.59 / 300) = 0.41 ± 0.056 = (0.354, 0.466).

Example 2: An epidemiologist is worried about the ever-increasing
trend of malaria in a certain locality and wants to estimate the
proportion of persons infected in the peak malaria transmission
period.
• He takes a random sample of 150 persons in that locality
during the peak transmission period and finds that 60 of them are
positive for malaria.
Find the a) 95%, b) 90% and c) 99% confidence intervals
for the proportion of infected people in the whole locality
during the peak malaria transmission period.

Solution:

Sample proportion = 60 / 150 = 0.4

Standard error = √(0.4 × 0.6 / 150) = 0.04

a) A 95% C.I. for the population proportion (the proportion of
infected people in the whole locality) = 0.4 ± 1.96 (0.04) = 0.4
± 0.078 = (0.322, 0.478).
b) A 90% C.I. for the population proportion = 0.4 ± 1.64 (0.04) = 0.4
± 0.066 = (0.334, 0.466).

c) A 99% C.I. for the population proportion = 0.4 ± 2.58 (0.04) = 0.4
± 0.103 = (0.297, 0.503).
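The three intervals in this example can be recomputed in a few lines. This sketch takes the critical values from the standard normal distribution without rounding them first, so the multipliers differ slightly from the table values 1.64 and 2.58, but the intervals agree to three decimal places.

```python
from math import sqrt
from statistics import NormalDist

x, n = 60, 150                 # malaria-positive persons, sample size
p = x / n                      # sample proportion, 0.4
se = sqrt(p * (1 - p) / n)     # standard error of the proportion, about 0.04

intervals = {}
for confidence in (0.95, 0.90, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    intervals[confidence] = (round(p - z * se, 3), round(p + z * se, 3))
    print(confidence, intervals[confidence])
# 0.95 (0.322, 0.478)
# 0.9 (0.334, 0.466)
# 0.99 (0.297, 0.503)
```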
4. C.I. for the difference between two population
proportions (large sample size)
A 100(1 − α)% C.I. for π1 − π2 is

$(p_1 - p_2) \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
Example
 Two hundred patients suffering from a certain disease were randomly
divided into two equal groups. Of the first group, who received the
standard treatment, 78 recovered within three days. Out of the other
100, who were treated by a new method, 90 recovered within three
days. The physician wished to estimate the true difference in the
proportions who would recover within three days.
Solution:
The estimate of the difference in the population proportions is

p1 – p2 = 0.78 – 0.90 = -0.12

• The 95% C.I. is -0.12 ± 1.96 √(0.78 × 0.22 / 100 + 0.90 × 0.10 / 100) = -0.12 ± 0.10 = (-0.22, -0.02).

• We are 95% sure that the difference is between -0.22 and -0.02.

• Note that the negative signs merely reflect the fact that better
results were obtained by using the new treatment.
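The interval for the difference in recovery proportions can be checked with the sketch below. The function name `diff_proportions_ci` is an assumption; it uses the usual large-sample standard error for two independent proportions, with each sample proportion's variance estimated separately.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportions_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample CI for pi1 - pi2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Standard treatment: 78 of 100 recovered; new method: 90 of 100 recovered.
low, high = diff_proportions_ci(78, 100, 90, 100)
print(round(low, 2), round(high, 2))  # -0.22 -0.02
```

Because the whole interval lies below zero, the data are consistent with a higher recovery proportion under the new method.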
