
Inferences Concerning Means

Estimation of mean
Parameter: Population mean $\mu$
Data: A random sample $x_1, x_2, \ldots, x_n$
Estimator: $$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Standard error of estimate: $$\frac{S}{\sqrt{n}}$$

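Maximum Error of Estimate ($\sigma$ known)

For large n, the standardized sample mean
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
is a random variable having approximately the standard normal
distribution N(0,1).
We can assert with probability $1 - \alpha$ (the area under the PDF) that
$$-z_{\alpha/2} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}$$
will be satisfied, or we can write
$$\frac{|\bar{x} - \mu|}{\sigma/\sqrt{n}} \le z_{\alpha/2}$$
where $z_{\alpha/2}$ is such that the area under the normal PDF curve to
its right equals $\alpha/2$ ($0 < \alpha < 1$; $\alpha$ is usually a small number).
[Figure: standard normal PDF with area $1 - \alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$, and area $\alpha/2$ in each tail]
The error of the estimate, $|\bar{x} - \mu|$, will be smaller than
$z_{\alpha/2}\,\sigma/\sqrt{n}$ with probability $1 - \alpha$.
The most widely used values for $(1 - \alpha)$ are 0.95 and 0.99:

$\pm z_{\alpha/2}$: $\pm z_{0.025} = \pm 1.96$, $\pm z_{0.005} = \pm 2.575$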
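
As a minimal sketch of the maximum-error bound above, the following Python snippet computes the critical value from the standard normal distribution and then the bound (critical value times sigma over root n). The function names and the numbers sigma = 2.0 and n = 36 are illustrative assumptions, not part of the notes. Software returns about 2.576 for the 99% case, while the table value quoted above is rounded to 2.575.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(alpha: float) -> float:
    """Standard normal value with area alpha/2 to its right."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def max_error(sigma: float, n: int, alpha: float) -> float:
    """Maximum error of estimate z * sigma / sqrt(n) (sigma known, large n)."""
    return z_critical(alpha) * sigma / sqrt(n)

# Illustrative (hypothetical) values: sigma = 2.0, n = 36
for alpha in (0.05, 0.01):
    print(f"1 - alpha = {1 - alpha:.2f}: z = {z_critical(alpha):.3f}, "
          f"max error = {max_error(2.0, 36, alpha):.3f}")
```

A higher degree of confidence gives a larger critical value and therefore a wider error bound for the same sample.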

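Determination of Sample Size n

$$P\left(|\bar{x} - \mu| \le z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
We can assert with probability $1 - \alpha$ that the error
will be within some prescribed quantity $\varepsilon$:
$$P\left(|\bar{x} - \mu| \le \varepsilon\right) = 1 - \alpha$$
Hence, $$\varepsilon = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
The sample size n required so that the error is at most $\varepsilon$ with
probability $1 - \alpha$ is
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{\varepsilon}\right)^2$$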
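Maximum Error of Estimate
(normal population, $\sigma^2$ unknown)

$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}}$$
We can assert with $(1 - \alpha)$ confidence (times 100 to
convert to %) that the error made in using $\bar{x}$ to
estimate $\mu$ is within
$$\varepsilon = t_{\alpha/2}\frac{S}{\sqrt{n}}$$
Sample size required
$$n = \left(\frac{t_{\alpha/2}\,S}{\varepsilon}\right)^2$$
Because $t_{\alpha/2}$ depends on n through its n - 1 degrees of freedom,
this formula is usually applied iteratively.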

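The sample-size rule for the known-sigma case can be sketched in Python as follows. The function name and the numbers (sigma = 2.0, a tolerated error of 0.5, 95% confidence) are hypothetical and chosen only for illustration. The result is rounded up, since n must be an integer and rounding down would leave the error bound slightly above the prescribed quantity.

```python
from math import ceil
from statistics import NormalDist

def sample_size_known_sigma(sigma: float, error: float, alpha: float) -> int:
    """Smallest n for which z * sigma / sqrt(n) <= error (sigma known)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / error) ** 2)

# Illustrative (hypothetical) values: sigma = 2.0, tolerated error 0.5, 95% confidence
n = sample_size_known_sigma(sigma=2.0, error=0.5, alpha=0.05)
print(f"required sample size: {n}")
```

The sketch does not cover the unknown-sigma case, where the t value changes with n.
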
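Confidence Interval of Mean ($\sigma$ known)

Suppose that we have a large random sample ($n \ge 30$)
from a population with unknown $\mu$ and known $\sigma$.
$$P\left(-z_{\alpha/2} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha$$
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
[Figure: standard normal PDF with the central region between $-z_{\alpha/2}$ and $z_{\alpha/2}$]

When a sample has been obtained and $\bar{x}$ has been
calculated, we can claim with $(1 - \alpha)$ confidence that the
interval from $\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n}$ to $\bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}$ contains $\mu$.
- Confidence interval for $\mu$: the interval itself
- Degree of confidence: $(1 - \alpha)$
- Confidence limits: the two endpoints of the interval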
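$(1 - \alpha)$ Confidence Interval for $\mu$
- Large sample, $\sigma$ unknown
$$\bar{x} - z_{\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{S}{\sqrt{n}}$$

- Small sample, $\sigma$ unknown
Assume the sample is from a normal population; $t_{\alpha/2}$ has n - 1 d.o.f.
$$\bar{x} - t_{\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\frac{S}{\sqrt{n}}$$
[Figure: t-distribution PDF with the central region between $-t_{\alpha/2}$ and $t_{\alpha/2}$]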
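
The interval formulas above share one shape: sample mean plus or minus a critical value times S over root n. A minimal Python sketch follows, assuming the caller looks up the critical value (z for a large sample, t with n - 1 d.o.f. for a small sample from a normal population). The function name and the six data values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def mean_ci(data, crit):
    """Two-sided interval x_bar -/+ crit * S / sqrt(n).

    crit is the z or t critical value for the chosen confidence level,
    supplied by the caller from a table.
    """
    n = len(data)
    x_bar = mean(data)
    half_width = crit * stdev(data) / sqrt(n)  # stdev uses the n - 1 divisor
    return x_bar - half_width, x_bar + half_width

# Hypothetical small sample (n = 6); t-table value 2.571 for 5 d.o.f., 95% two-sided
sample = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3]
lower, upper = mean_ci(sample, crit=2.571)
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```

Passing the critical value in keeps the sketch within the standard library; a statistics package could compute the t value directly.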

Interpretation of Confidence Intervals for $\mu$

Before the observations are made, $\bar{X}$ and S are random
variables, so
1. The interval from $\bar{X} - t_{\alpha/2}\,S/\sqrt{n}$ to $\bar{X} + t_{\alpha/2}\,S/\sqrt{n}$ is a random
interval. It is centered at $\bar{X}$, and its length is
proportional to S.
2. The interval will cover the true $\mu$ with probability $(1 - \alpha)$.

Once the observations are made and we have the numerical
values of $\bar{x}$ and S,
3. The calculated interval is fixed. However, in any
particular application, we have no way of knowing whether $\mu$ is
covered or not. We are $(1 - \alpha)$ confident that the interval
will cover $\mu$.
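
The repeated-sampling reading of points 1 to 3 can be checked with a small simulation. The sketch below draws many samples from a normal population with a known mean, builds a 95% t-interval from each, and counts how often the interval covers the mean. The population parameters, sample size, seed and number of repetitions are all hypothetical choices; 2.262 is the t-table value for 9 d.o.f.

```python
import random
from math import sqrt
from statistics import mean, stdev

MU, SIGMA = 50.0, 5.0   # hypothetical "true" population parameters
N = 10                  # sample size
T_CRIT = 2.262          # t-table value, 9 d.o.f., 95% two-sided
REPS = 2000             # number of repeated samples

rng = random.Random(1)  # fixed seed so the run is repeatable
covered = 0
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = mean(sample)
    half_width = T_CRIT * stdev(sample) / sqrt(N)
    if x_bar - half_width < MU < x_bar + half_width:
        covered += 1

coverage = covered / REPS
print(f"Fraction of intervals covering mu: {coverage:.3f}")
```

Each individual interval either covers the mean or does not; only the long-run fraction is close to 0.95.
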
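Assuming $\alpha = 0.05$, about 95% of the confidence intervals
constructed from repeated samples will cover the actual mean $\mu$.
[Figure: confidence intervals from repeated samples plotted against the true mean $\mu$ on the $\bar{x}$ axis]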

Common Misunderstandings (Non-examinable)

Confidence intervals are frequently misunderstood, and published
studies have shown that even professional scientists often
misinterpret them.

A 95% confidence interval does not mean that for a given realized
interval there is a 95% probability that the population parameter
lies within the interval. Once an experiment is done and an interval
calculated, this interval either covers the parameter value or it
does not; it is no longer a matter of probability.

A 95% confidence interval does not mean that 95% of the sample
data lie within the interval.

Plus others…

Extracted from:
https://en.wikipedia.org/wiki/Confidence_interval#Meaning_and_interpretation
Some Philosophy (Non-examinable)

Two schools of statistics:

Classical (frequentist) statistics     Bayesian statistics

What we learn in this module           Don't worry too much about this
                                       approach (just fun to know)

Population parameters are fixed        Data is fixed
(although generally unknown)

Data is random (following              Population parameters are
some distribution)                     regarded as uncertain

Probabilities are interpreted as       Probabilities express our personal
frequency of an event                  uncertainties about an event

Confidence interval                    Credible interval

Recall Bayes' theorem

$$P(A \mid B) = \frac{P(B \mid A)}{P(B)}\,P(A)$$

P(A) = Prior probability, based on our belief before evidence
is taken into account

P(A|B) = Posterior probability, obtained after some evidence
is taken into account

"Bayesian updating" allows us to improve our probability
estimate as new information is acquired.
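Example
Daily dissolved oxygen (DO) concentration for a stream at a
station has been recorded for 30 days. The daily level of DO
concentration is known to vary with $\sigma = 2.05$ mg/l. From the
sample of 30 observations, the sample mean is calculated to
be $\bar{x} = 2.52$ mg/l. Determine the 99% confidence interval for
the mean daily DO concentration.

$1 - \alpha = 0.99$, $\alpha = 0.01$, $\alpha/2 = 0.005$
$z_{\alpha/2} = z_{0.005} = 2.575$
[Figure: standard normal PDF with area $1 - \alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$, and area $\alpha/2 = 0.005$ in each tail]
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$2.52 - 2.575\left(\frac{2.05}{\sqrt{30}}\right) < \mu < 2.52 + 2.575\left(\frac{2.05}{\sqrt{30}}\right)$$

99% confidence interval: $1.56 < \mu < 3.48$ mg/l

Table lookup: $z_{0.005}$ is the value with cumulative probability
1 - 0.005 = 0.995.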
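
The arithmetic of the DO example can be re-run with a few lines of Python. This is only a check under the example's own numbers; the variable names are illustrative. The critical value is computed from the standard normal distribution (about 2.576) instead of the rounded table value 2.575, which does not change the limits at two decimals.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, x_bar = 2.05, 30, 2.52   # values from the DO example
alpha = 0.01                       # 99% confidence

z = NormalDist().inv_cdf(1 - alpha / 2)   # cumulative probability 0.995
half_width = z * sigma / sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width
print(f"99% CI: {lower:.2f} < mu < {upper:.2f} mg/l")
```
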
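One-sided Confidence Limit for $\mu$
1. $(1 - \alpha)$ lower confidence limit ($\mu$ will be > this limit with
the degree of confidence $1 - \alpha$)
e.g. material strength, capacity of a highway, or of a
flood channel

- For population with known $\sigma$
$$\langle\mu)_{1-\alpha} = \bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}}$$
- For normal population with unknown $\sigma$
$$\langle\mu)_{1-\alpha} = \bar{x} - t_{\alpha}\frac{S}{\sqrt{n}}$$
[Figure: PDF with area $1 - \alpha$ to the right of $-z_{\alpha}$]

2. $(1 - \alpha)$ upper confidence limit
e.g. in determining the wind load on a structure, we would
like to state with a high degree of confidence that the
mean wind load will not exceed a certain limit

- For population with known $\sigma$
$$(\mu\rangle_{1-\alpha} = \bar{x} + z_{\alpha}\frac{\sigma}{\sqrt{n}}$$
- For normal population with unknown $\sigma$
$$(\mu\rangle_{1-\alpha} = \bar{x} + t_{\alpha}\frac{S}{\sqrt{n}}$$
[Figure: PDF with area $1 - \alpha$ to the left of $z_{\alpha}$]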
Example
Test results for 15 randomly selected specimens of 1 cm
diameter A36 steel yielded $\bar{x} = 2200$ kgf (kilogram-force) and
S = 220 kgf. The manufacturer is required to specify the 95%
lower confidence limit of the mean yield strength $\mu$.

$1 - \alpha = 0.95$, $\alpha = 0.05$
Use t-distribution with d.o.f. = 15 - 1 = 14
$t_{0.05} = 1.761$
95% lower confidence limit
$$\langle\mu)_{0.95} = \bar{x} - t_{\alpha}\frac{S}{\sqrt{n}} = 2200 - 1.761\,\frac{220}{\sqrt{15}} \approx 2100 \text{ kgf}$$
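
A short Python check of the steel example, assuming the t-table value 1.761 (14 d.o.f., 5% in the upper tail) is supplied by hand. The function name is illustrative.

```python
from math import sqrt

def lower_confidence_limit(x_bar, s, n, t_alpha):
    """One-sided lower limit x_bar - t_alpha * S / sqrt(n)
    (normal population, sigma unknown)."""
    return x_bar - t_alpha * s / sqrt(n)

# Steel example: n = 15, x_bar = 2200 kgf, S = 220 kgf, table value 1.761
limit = lower_confidence_limit(x_bar=2200, s=220, n=15, t_alpha=1.761)
print(f"95% lower confidence limit: {limit:.0f} kgf")
```

An upper limit follows the same pattern with the minus sign replaced by a plus.
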

Test of Hypotheses

• Hypothesis testing is a statistical method for making
inferential decisions about the population based on
sample data.
• Why hypothesis testing? Because when population
parameters are estimated from small samples,
sampling errors are inevitable. We need a sound
statistical basis to determine whether any discrepancy
could have arisen due to chance alone.
• Useful in many engineering applications. For example,
an engineer may be interested to know whether the yield
strength of a structural component meets certain
requirements.

Example: A consumer protection agency wants to test a paint
manufacturer's claim that the average drying time of his new
"fast drying" paint is 20 minutes. It instructs a member of its
staff to take 36 boards and paint them with paint from 36
different one-gallon cans of the paint, with the intention of
rejecting the claim if the mean of the drying times exceeds
20.75 minutes; otherwise, it will accept the claim.

Criterion: reject the claim if $\bar{x} > 20.75$ min; otherwise accept it.

Two possibilities where sample information may lead to a
wrong assessment:
1) The sample mean $\bar{x} > 20.75$ min even though the true mean
drying time $\mu = 20$ min.

2) The sample mean $\bar{x} \le 20.75$ min even though the true mean
drying time is, say, $\mu = 21$ min.

First, let us investigate the chances that the criterion may
lead to a wrong decision.
Assume that the standard deviation of the drying time is
$\sigma = 2.4$ minutes.

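Possibility (1), large sample: $P(\bar{x} > 20.75) = 1 - P(\bar{x} \le 20.75)$

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.4}{\sqrt{36}} = 0.4, \qquad Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{20.75 - 20}{0.4} = 1.875$$

From the Table of Normal Distribution Function,
the probability of erroneously rejecting the
hypothesis $\mu = 20$ min is
approximately (1 - 0.9696) = 0.0304, i.e. about 0.03.
[Figure: sampling distribution of $\bar{x}$ centered at $\mu = 20$; accept the claim that $\mu = 20$ to the left of 20.75, reject it to the right; right-tail area 0.0304]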
Possibility (2): the criterion fails to detect that $\mu \ne 20$ min.
Suppose that the true mean drying time is $\mu = 21$ min. The
probability of getting a sample mean $\bar{x} \le 20.75$ min,
and hence erroneously accepting the claim that $\mu = 20$ min,
is obtained as follows, with $\sigma_{\bar{x}} = 0.4$:
$$Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{20.75 - 21}{0.4} = -0.625$$

From the Table of Normal Distribution Function, the
probability of erroneously accepting
the hypothesis $\mu = 20$ min is approximately 0.2660.
[Figure: sampling distribution of $\bar{x}$ centered at $\mu = 21$; accept the claim that $\mu = 20$ to the left of 20.75, reject it to the right; left-tail area 0.2660]
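
Both error probabilities in the paint example can be reproduced without a table. The following is a minimal Python sketch using the example's numbers; it models the sample mean as normal with standard deviation 0.4, as the large-sample argument above does. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, criterion = 2.4, 36, 20.75   # values from the paint example
se = sigma / sqrt(n)                    # standard error of the sample mean (0.4)

# Possibility (1): reject (x_bar > criterion) although mu = 20
alpha = 1 - NormalDist(mu=20, sigma=se).cdf(criterion)
# Possibility (2): accept (x_bar <= criterion) although mu = 21
beta = NormalDist(mu=21, sigma=se).cdf(criterion)

print(f"P(erroneous rejection)  = {alpha:.4f}")
print(f"P(erroneous acceptance) = {beta:.4f}")
```
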

Summary of hypothesis testing:
(We refer to the hypothesis being tested as hypothesis H.)

              Accept H                    Reject H
H is true     Correct decision            Type I error ($\alpha$)
H is false    Type II error ($\beta$)     Correct decision

$\alpha$ = P(Type I error) = P(reject H | H is true)
= level of significance; $\alpha$ is usually set at 0.05 or 0.01

$\beta$ = P(Type II error) = P(fail to reject H | H is false)

The probability of committing a Type II error is impossible to
compute unless we have a specific alternative hypothesis.
Role of $\alpha$, $\beta$, and Sample Size
In the paint example, the probability of committing a Type I error is
$\alpha = 0.03$ and the probability of committing a Type II error is $\beta = 0.27$.
Ideally, we would like to use a procedure for which the Type I and
Type II error probabilities are both small.
e.g. If we set the criterion at 20.5 instead of 20.75:
$$Z = \frac{20.5 - 20}{0.4} = 1.25, \qquad \alpha = 0.11$$
$$Z = \frac{20.5 - 21}{0.4} = -1.25, \qquad \beta = 0.11$$
For a fixed sample size, a decrease in the probability of one error will
usually result in an increase in the probability of the other error.
The probability of committing both types of errors can be
reduced by increasing the sample size.
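
The trade-off can be explored numerically. The sketch below wraps the paint-example calculation in a function and evaluates it for the two criteria discussed above, plus one larger sample size. The function name and the value n = 100 are hypothetical and not taken from the notes.

```python
from math import sqrt
from statistics import NormalDist

def error_probabilities(criterion, n, sigma=2.4, mu0=20.0, mu1=21.0):
    """Return (alpha, beta) for the rule 'reject mu = mu0 if x_bar > criterion'."""
    se = sigma / sqrt(n)
    alpha = 1 - NormalDist(mu0, se).cdf(criterion)  # reject although mu = mu0
    beta = NormalDist(mu1, se).cdf(criterion)       # accept although mu = mu1
    return alpha, beta

for criterion, n in [(20.75, 36), (20.5, 36), (20.5, 100)]:
    a, b = error_probabilities(criterion, n)
    print(f"criterion = {criterion}, n = {n}: alpha = {a:.3f}, beta = {b:.3f}")
```

Moving the criterion at fixed n trades one error for the other, while the larger sample lowers both.
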

Null Hypotheses and Significance Test

We often formulate hypotheses to be tested as a
single value for a parameter.

Example            Alternative Hypothesis H1     Null Hypothesis H0
Yield strength     $\mu < 200$ MPa               $\mu = 200$ MPa
Wind load          $\mu > 5$ MN                  $\mu = 5$ MN
E of concrete      $\mu \ne 30$ GPa              $\mu = 30$ GPa

The term "null hypothesis" is used for any hypothesis set
up primarily to see whether it can be rejected.
H0 : hypothesis that chance alone could be responsible
for the results (usually no difference)
H1 : opposite to H0

Steps of hypothesis testing
1. Formulate a null hypothesis H0 and an appropriate
alternative hypothesis H1 which we accept when the null
hypothesis must be rejected

2. Specify the probability of a Type I error (level of
significance)

3. Construct a criterion for testing the null hypothesis against
the given alternative based on the sampling distribution of
an appropriate statistic

4. Calculate the value of the statistic on which the decision is
to be based from the data

5. Decide whether or not to reject the null hypothesis

In hypothesis testing, we make use of the probability
distribution of the variable of concern to determine the
rejection or non-rejection of the hypothesis.

Critical region – area under the probability density function in
which H0 is rejected.

Critical values – define the boundary of the critical region.


Hypothesis Concerning One Mean
Statistic for test concerning mean, $\sigma$ known:

$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
Critical Regions for Testing $\mu = \mu_0$
(normal population and $\sigma$ known)

Alternative Hypothesis H1     Reject H0 if
$\mu < \mu_0$                 $Z < -z_{\alpha}$
$\mu > \mu_0$                 $Z > z_{\alpha}$
$\mu \ne \mu_0$               $Z < -z_{\alpha/2}$ or $Z > z_{\alpha/2}$

Example
Test whether the thermal conductivity of a certain kind of brick is
0.340 as claimed by the manufacturer, at the 0.05 level of
significance, given $\sigma = 0.01$, $n = 35$, $\bar{x} = 0.343$.

1. Null Hypothesis H0 : $\mu = 0.340$
   Alternative Hypothesis H1 : $\mu \ne 0.340$
2. $\alpha = 0.05$
3. Criterion: Reject H0 if $Z < -z_{0.025} = -1.96$ or $Z > z_{0.025} = 1.96$,
where
$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
4. Calculation:
$$Z = \frac{0.343 - 0.340}{0.01/\sqrt{35}} = 1.77$$
5. Decision: $-1.96 < Z = 1.77 < 1.96$, so
H0 cannot be rejected. The difference between $\bar{x}$ and $\mu_0$ may be due to chance.
Note: The test did not establish $\mu = 0.340$. It only concludes that
H0 : $\mu = 0.340$ cannot be rejected.
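
The five steps of the brick example map directly onto a few lines of code. This is a minimal Python sketch of a two-sided Z test with known sigma; the function name is illustrative, and the critical value is computed from the standard normal distribution instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_test(x_bar, mu0, sigma, n, alpha):
    """Return (Z, reject) for H0: mu = mu0 against H1: mu != mu0, sigma known."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    return z, abs(z) > z_crit

# Brick example: mu0 = 0.340, sigma = 0.01, n = 35, x_bar = 0.343, alpha = 0.05
z, reject = two_sided_z_test(x_bar=0.343, mu0=0.340, sigma=0.01, n=35, alpha=0.05)
print(f"Z = {z:.2f}, reject H0: {reject}")
```
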
Statistic for Large Sample Test Concerning Mean
($\sigma$ unknown, large samples, $n \ge 30$)
$$Z = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}$$
Critical regions for testing $\mu = \mu_0$ (large samples) are
the same as those for $\sigma$ known.

Critical Regions for Testing $\mu = \mu_0$ (large sample)

Alternative Hypothesis H1     Reject H0 if
$\mu < \mu_0$                 $Z < -z_{\alpha}$
$\mu > \mu_0$                 $Z > z_{\alpha}$
$\mu \ne \mu_0$               $Z < -z_{\alpha/2}$ or $Z > z_{\alpha/2}$


Statistic for Small Sample Test Concerning Mean
(normal population, $\sigma$ unknown, small samples, $n < 30$)
$$t = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}$$
t is a random variable having the t-distribution with n - 1 d.o.f.

Critical Region for Testing $\mu = \mu_0$
(normal population, $\sigma$ unknown): 1-sample t-test

Alternative Hypothesis H1     Reject H0 if
$\mu < \mu_0$                 $t < -t_{\alpha}$
$\mu > \mu_0$                 $t > t_{\alpha}$
$\mu \ne \mu_0$               $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
Example
The specification of a certain kind of ribbon calls for a mean
breaking strength of 180 MPa. If 5 pieces of randomly selected
ribbon have a mean breaking strength of 169.5 MPa with a
standard deviation of 5.7 MPa, test the null hypothesis $\mu = 180$ MPa
against the alternative hypothesis $\mu < 180$ MPa at the 0.01 level of
significance. Assume that the population distribution is normal.
$\mu_0 = 180$, $n = 5$, $\bar{x} = 169.5$, $S = 5.7$, d.o.f. = 5 - 1 = 4

1. H0 : $\mu = 180$
   H1 : $\mu < 180$
2. $\alpha = 0.01$
3. Criterion: Reject H0 if $t < -t_{0.01} = -3.747$, where
$$t = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}$$
4. Calculation:
$$t = \frac{169.5 - 180}{5.7/\sqrt{5}} = -4.12$$
5. Decision: since t = -4.12 < -3.747, H0 must be rejected at
$\alpha = 0.01$. The strength is below specification.

Notes:
$t_{1-\alpha} = -t_{\alpha}$
When d.o.f. $\to \infty$, the t-distribution approaches the standard normal:
$t_{0.025} \to z_{0.025} = 1.96$
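
The ribbon example can be sketched in Python as a left-tailed one-sample t-test. The function name is illustrative, and the critical value 3.747 (4 d.o.f., 1% in the tail) is taken from a t-table and passed in by hand, since the standard library has no t-distribution.

```python
from math import sqrt

def left_tailed_t_test(x_bar, mu0, s, n, t_alpha):
    """Return (t, reject) for H0: mu = mu0 against H1: mu < mu0.

    t_alpha is the table value with n - 1 d.o.f.; H0 is rejected if t < -t_alpha.
    """
    t = (x_bar - mu0) / (s / sqrt(n))
    return t, t < -t_alpha

# Ribbon example: n = 5, x_bar = 169.5, S = 5.7, mu0 = 180, table value 3.747
t, reject = left_tailed_t_test(x_bar=169.5, mu0=180, s=5.7, n=5, t_alpha=3.747)
print(f"t = {t:.2f}, reject H0: {reject}")
```
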


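Relation between Hypothesis Tests and
Confidence Intervals (Limits)
• $(1 - \alpha)$ confidence interval for $\mu$ (small sample,
$\sigma$ unknown, normal population)
$$\bar{x} - t_{\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\frac{S}{\sqrt{n}}$$
• Level $\alpha$ test of H0 : $\mu = \mu_0$ vs H1 : $\mu \ne \mu_0$
Critical region: reject H0 if
$$|t| = \left|\frac{\bar{x} - \mu_0}{S/\sqrt{n}}\right| > t_{\alpha/2}$$
Non-rejection region:
$$|t| = \left|\frac{\bar{x} - \mu_0}{S/\sqrt{n}}\right| \le t_{\alpha/2} \iff \bar{x} - t_{\alpha/2}\frac{S}{\sqrt{n}} \le \mu_0 \le \bar{x} + t_{\alpha/2}\frac{S}{\sqrt{n}}$$
[Figure: t-distribution PDF with acceptance region of area $1 - \alpha$ between $-t_{\alpha/2}$ and $t_{\alpha/2}$, and area $\alpha/2$ in each tail]

H0 will not be rejected at level $\alpha$ if $\mu_0$ lies within the $(1 - \alpha)$
(usually converted to percentage) confidence interval for $\mu$.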
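
The equivalence can be checked numerically. The sketch below reuses the ribbon data from the earlier example, but with a two-sided test at the 0.05 level, which is an assumption made here for illustration and differs from the one-sided test in that example. The candidate values of the hypothesized mean are hypothetical; 2.776 is the t-table value for 4 d.o.f.

```python
from math import sqrt

x_bar, s, n = 169.5, 5.7, 5   # ribbon data from the earlier example
t_crit = 2.776                # t-table value, 4 d.o.f., 2.5% in each tail
se = s / sqrt(n)
lower, upper = x_bar - t_crit * se, x_bar + t_crit * se   # 95% CI

def not_rejected(mu0):
    """Two-sided t-test decision: True if H0: mu = mu0 is NOT rejected."""
    return abs((x_bar - mu0) / se) <= t_crit

for mu0 in (160, 165, 170, 175, 180):
    inside = lower <= mu0 <= upper
    print(f"mu0 = {mu0}: inside CI = {inside}, "
          f"H0 not rejected = {not_rejected(mu0)}")
```

For every candidate value the two columns agree: the test fails to reject exactly when the value lies inside the interval.
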
Set       Data                      Sample mean    Sample variance
Set 1     x1, x2, x3, …, x20
Set 2     x1, x2, x3, …, x20
…
Set 50    x1, x2, x3, …, x20

̅ ,…,
