
Statistical Methods

Descriptive Statistics
Descriptive Statistics consists of the tools
and techniques designed to describe data,
such as charts, graphs, and numerical
measures.

Descriptive Statistics
(Histogram)

Descriptive Statistics
AVERAGE (MEAN)
The sum of all the values divided by the number of values. In equation form:

Mean = (Σᵢ xᵢ) / n = (sum of all data values) / (number of data values)

where:
n = number of data values
xᵢ = ith data value
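The mean formula above can be sketched in a few lines of code; the data values here are hypothetical, chosen only to illustrate the calculation.

```python
# Computing the mean as defined above: sum of values / number of values.
# The data values are hypothetical, for illustration only.
values = [4, 8, 6, 5, 7]

n = len(values)            # number of data values
mean = sum(values) / n     # sum of all data values divided by n

print(mean)  # 6.0
```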

Inferential Statistics

Inferential Statistics consists of techniques that allow a decision-maker to reach a conclusion about the characteristics of a larger data set (the population) based upon a subset (a sample) of those data.

Inferential Statistics
Involves:
 Estimation
 Hypothesis testing
Purpose:
 Make inferences about population characteristics

Inference Process
A sample is drawn from the population; sample statistics are then used to form estimates and tests about the population.

Why Study
Sampling Distributions
►Sample statistics are used to estimate population parameters
 e.g.: X̄ = 50 estimates the population mean µ
►Problem: different samples provide different estimates
 Large samples give better estimates, but a large sample costs more
 How good is the estimate?
►Approach to a solution: the theoretical basis is the sampling distribution

Sampling Distribution

►The theoretical probability distribution of a sample statistic
►A sample statistic is a random variable
 e.g. the sample mean, the sample proportion
►Results from taking all possible samples of the same size

Developing Sampling
Distributions
►Assume there is a population of four individuals: A, B, C, D
►Population size N = 4
►Random variable X = age of individuals
►Values of X: 18, 20, 22, 24, measured in years

Developing Sampling
Distributions
Summary Measures for the Population Distribution

µ = (Σᵢ Xᵢ) / N = (18 + 20 + 22 + 24) / 4 = 21

σ = √[ Σᵢ (Xᵢ − µ)² / N ] = 2.236

The population distribution is uniform: each of the ages 18 (A), 20 (B), 22 (C), 24 (D) has probability 1/4.
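The population summary measures above can be checked directly; note the population standard deviation divides by N, not N − 1.

```python
import math

# Verifying the population summary measures for the ages 18, 20, 22, 24.
ages = [18, 20, 22, 24]
N = len(ages)

mu = sum(ages) / N                                       # population mean
sigma = math.sqrt(sum((x - mu) ** 2 for x in ages) / N)  # population SD, divisor N

print(mu, round(sigma, 3))  # 21.0 2.236
```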

Developing Sampling Distributions

All Possible Samples of Size n=2
(taken with replacement: N² = K = 16 samples)

1st Obs \ 2nd Obs:   18      20      22      24
18                 18,18   18,20   18,22   18,24
20                 20,18   20,20   20,22   20,24
22                 22,18   22,20   22,22   22,24
24                 24,18   24,20   24,22   24,24

16 Sample Means

1st Obs \ 2nd Obs:   18   20   22   24
18                   18   19   20   21
20                   19   20   21   22
22                   20   21   22   23
24                   21   22   23   24

Developing Sampling Distributions

Sampling Distribution of All Sample Means
The 16 sample means form the sampling distribution of X̄, with probabilities P(X̄=18) = 1/16, P(19) = 2/16, P(20) = 3/16, P(21) = 4/16, P(22) = 3/16, P(23) = 2/16, P(24) = 1/16: a symmetric, triangular shape centered at 21.

Developing Sampling Distributions

Summary Measures of the Sampling Distribution

µ_X̄ = (Σᵢ X̄ᵢ) / K = (18 + 19 + 19 + … + 24) / 16 = 21

σ_X̄ = √[ Σᵢ (X̄ᵢ − µ_X̄)² / K ]
    = √[ ((18 − 21)² + (19 − 21)² + … + (24 − 21)²) / 16 ] = 1.58
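The sampling distribution above can be reproduced by brute force, enumerating all 16 samples with replacement and computing the two summary measures.

```python
import math
from itertools import product

# Enumerating all 16 samples of size 2 (with replacement) from {18, 20, 22, 24}
# and computing the summary measures of the sampling distribution of the mean.
ages = [18, 20, 22, 24]
sample_means = [(a + b) / 2 for a, b in product(ages, repeat=2)]
K = len(sample_means)                 # 16 samples

mu_xbar = sum(sample_means) / K
sigma_xbar = math.sqrt(sum((m - mu_xbar) ** 2 for m in sample_means) / K)

print(K, mu_xbar, round(sigma_xbar, 2))  # 16 21.0 1.58
```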

Comparing the Population with its
Sampling Distribution

Population (N = 4):       µ = 21, σ = 2.236
Sample means (n = 2):     µ_X̄ = 21, σ_X̄ = 1.58

The population distribution is uniform over 18, 20, 22, 24; the distribution of sample means is centered at the same value 21 but is less spread out and more peaked.

Properties of Summary Measures

► µ_X̄ = µ
  i.e. X̄ is an unbiased estimator of µ
► The standard error (standard deviation) of the sampling distribution is

  σ_X̄ = σ / √n

  which is less than the standard error of other unbiased estimators
► For sampling with replacement: as n increases, σ_X̄ decreases

Unbiasedness
[Figure: an unbiased sampling distribution is centered at µ; a biased one is centered away from µ.]

Less Variability
[Figure: the sampling distribution of the mean is narrower than the sampling distribution of the median.]

Effect of Large Sample
[Figure: a larger sample size gives a narrower sampling distribution around µ than a smaller sample size.]

When the Population is Normal

Population distribution: µ = 50, σ = 10
Central tendency: µ_X̄ = µ
Variation (sampling with replacement): σ_X̄ = σ / √n

Sampling distributions:
  n = 4:  σ_X̄ = 5
  n = 16: σ_X̄ = 2.5
  (both centered at µ_X̄ = 50)

When the Population
is Not Normal

Population distribution: µ = 50, σ = 10
Central tendency: µ_X̄ = µ
Variation (sampling with replacement): σ_X̄ = σ / √n

Sampling distributions:
  n = 4:  σ_X̄ = 5
  n = 30: σ_X̄ = 1.8
  (both centered at µ_X̄ = 50)

Central Limit Theorem

As the sample size gets large enough, the sampling distribution becomes almost normal regardless of the shape of the population.

How Large is Large Enough?
►For most distributions, n>30
►For fairly symmetric distributions, n>15
►For normal distribution, the sampling
distribution of the mean is always normally
distributed

Sampling Distribution of the Sample
Mean
The sampling distribution of the sample mean X̄ is the probability distribution of the means of all possible random samples of n observations that can be drawn from a given population with mean µ and variance σ².

POPULATION: mean = µ, variance = σ²
From it, draw repeated samples and compute x̄₁, x̄₂, x̄₃, …; each is a value of the random variable X̄.

SAMPLING DISTRIBUTION OF MEANS
Population: 10, 12, 14, 16, 18, 20
Draw all possible random samples of size 2 (without replacement): 15 samples

No  Sample  Mean        Means  f  Relative Frequency
1   10,12   11          11     1  0.0667
2   10,14   12          12     1  0.0667
3   10,16   13          13     2  0.1333
4   10,18   14          14     2  0.1333
5   10,20   15          15     3  0.2
6   12,14   13          16     2  0.1333
7   12,16   14          17     2  0.1333
8   12,18   15          18     1  0.0667
9   12,20   16          19     1  0.0667
10  14,16   15
11  14,18   16
12  14,20   17
13  16,18   17
14  16,20   18
15  18,20   19
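The 15 samples above can be enumerated with `itertools.combinations` and the frequency table of means reproduced:

```python
from collections import Counter
from itertools import combinations

# Enumerating all 15 samples of size 2 drawn without replacement from the
# population {10, 12, ..., 20}, and tabulating the frequencies of the means.
population = [10, 12, 14, 16, 18, 20]
means = [(a + b) / 2 for a, b in combinations(population, 2)]

freq = Counter(means)
print(len(means))   # 15
print(freq[15])     # 3  (the mean 15 occurs three times, as in the table)
```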

Properties of the Sampling Distribution
of the Sample Mean

If a random sample of size n is taken from a population with mean µ and standard deviation σ, then the sampling distribution of the sample mean X̄:

 Is normal, if the sampled population is normal
 Has mean µ_X̄ = µ
 Has standard deviation σ_X̄ = σ / √n

The Central Limit Theorem
If a random sample of size n is taken from a population, then the sampling distribution of sample means is approximately normally distributed with mean µ_X̄ = µ and standard deviation σ_X̄ = σ / √n.

Random sample (x₁, x₂, …, xₙ) → sample mean X̄

As n → large:
Population distribution (µ, σ) → sampling distribution of the sample mean (µ_X̄ = µ, σ_X̄ = σ/√n)

Function of Hypothesis Testing

Hypothesis testing begins with an assumption, called a hypothesis, that we make about a population parameter. Say that we assume a certain value for a population mean.
To test the validity of our assumption we:
• Collect sample data
• Produce sample statistics
• Use this information to decide how likely it is that our hypothesized population parameter is correct
We then determine the difference between the hypothesized value and the actual value of the sample mean.

Function of Hypothesis Testing (Cont…)
Then we judge whether the difference is significant or
non-significant.
Unfortunately, the difference between the hypothesized
population parameter and the actual statistic is more often
neither so large that we automatically reject our
hypothesis nor so small that we just as quickly say don’t
reject it.
So in hypothesis testing, as in most significant real life
decisions, clear-cut solutions are the exception, not the
rule.

Function of Hypothesis Testing (Cont…)

When to Reject the Hypothesis or Not?

Suppose I say that the average F.Sc marks of the students of UAF are at least 90 percent. How can you test the validity of my hypothesis?
Using sampling methods we could calculate the marks of a sample of students. If we did this and the sample statistic came out to be 95 percent, we would readily say "don't reject the statement".
However, if the sample statistic were 46 percent, we would reject the statement.
We can interpret both these outcomes, 95 percent and 46 percent, using our common sense.

The Basic Problem

Now suppose that our sample statistic reveals a mark of 88 percent. This value is relatively close to 90 percent, but is it close enough for us to not reject the hypothesis?
Whether we reject the hypothesis or not, we can't be absolutely certain that our decision is correct; therefore, we will have to learn to deal with uncertainty in our decision making.

Hypothesis Testing
Testing of Hypothesis

A procedure which enables us to decide, on the basis of information obtained from a sample taken from the population, whether to reject or not reject a specified statement (hypothesis) regarding the value of a population parameter is known as testing of hypothesis.

Hypothesis Testing
Statistical Hypothesis
An assumption made about a population parameter which may or may not be true.
Null Hypothesis or Maintained Hypothesis
Denoted by the symbol Ho, it is the hypothesis which is tested for possible rejection under the assumption that it is true. The null hypothesis always contains some form of an equality sign.
Alternative Hypothesis or Research Hypothesis
The complement of the null hypothesis, denoted by H1 (or Ha). The alternative hypothesis never contains the sign of equality and is always in an inequality form.

Hypothesis Testing
Basic Strategy in Hypothesis Testing
The basic strategy in statistical hypothesis testing is to
attempt to support the Research/Alternative hypothesis
by contradicting the null hypothesis.
Reasoning in Hypothesis Testing
The null hypothesis should be regarded as true and should be rejected only when the sample data give strong evidence against it. The alternative hypothesis is the hypothesis which we are trying to support. A null hypothesis is thus tested against an alternative hypothesis.

Hypothesis Testing

Error of Inference

Whenever sample evidence is used to draw a conclusion, there is a risk of making a wrong decision because of sampling. Such errors are called inferential errors, because they entail drawing an incorrect inference from the sample about the value of the population parameter.


Hypothesis Testing
Significance Level
The probability of committing a Type-I error is called the level of significance, denoted by α. The level of significance is also called the size of the test. By α = 5% we mean that there are 5 chances in 100 of incorrectly rejecting a true null hypothesis. To put it another way, we are 95% confident of making the correct decision.
Level of Confidence
The probability of not committing a Type-I error, (1 − α), is called the level of confidence, or confidence coefficient.
Power of a Test
The probability of not committing a Type-II error, (1 − β), is called the power of the test.

Hypothesis Testing

A cut-off value often used is 0.05; that is, reject the null hypothesis when the p-value is less than 0.05. For example, suppose you do a t-test of the null hypothesis that µ equals 5, versus the alternative hypothesis that it does not equal 5. You would reject the null hypothesis that µ equals 5 if the test yields a very small (for example, less than 0.05) p-value.

Hypothesis Testing

Mathematically:

If P-value < level of significance (α):
 Reject Ho at the α level. Statistically, we say the results are significant.

If P-value > level of significance (α):
 Do not reject Ho at the α level. Statistically, we say the results are non-significant.

Hypothesis Testing

Test Statistic
The statistic on which the decision of rejecting or not rejecting the null hypothesis is based.

Rejection Region / Critical Region (CR)
That part of the sampling distribution of a statistic for which the null hypothesis is rejected.

Non-rejection Region / Non-critical Region
That part of the sampling distribution of a statistic for which the null hypothesis is not rejected.

General Procedure for Hypothesis Testing
• Formulate the null and alternative hypotheses
• Decide upon a significance level
• Choose an appropriate test statistic and find its value
• Determine the Critical Region (CR). The location of the CR depends upon the form of the alternative hypothesis. Choose the location of the CR on the basis of the direction in which the inequality sign points:
  • If >, choose the right tail as the CR
  • If <, choose the left tail as the CR
  • If ≠, choose a two-tailed CR

General Procedure for Hypothesis Testing

• Reject the null hypothesis if the computed value of the test statistic falls in the CR; otherwise don't reject the null hypothesis. Then state the decision in managerial terms.

STEPS FOR TEST OF HYPOTHESIS

1) Construction of hypotheses
2) Level of significance
3) Test statistic
4) Decision rule
5) Conclusion

1/5 Construction of hypotheses
[Null and Alternative Hypotheses]
The null hypothesis, denoted Ho, is any hypothesis which is to be tested for possible rejection or nullification under the assumption that it is true. The null hypothesis always contains some form of an equality sign.

The alternative hypothesis, denoted H1 (or Ha), is the complement of the null hypothesis. The alternative hypothesis never contains the sign of equality and is always in an inequality form.

1/5 Construction of hypotheses
[One-sided and two-sided hypotheses]

One-sided, greater than (right tail):  Ho: µ ≤ 50   Ha: µ > 50
One-sided, less than (left tail):      Ho: µ ≥ 50   Ha: µ < 50
Two-sided, not equal to:               Ho: µ = 50   Ha: µ ≠ 50

2/5 Level of significance
[Type I and Type II errors]

On the basis of sample information, we may reject a true statement about the population or fail to reject a false one:
Type I error  = Reject Ho | Ho is true
Type II error = Do not reject Ho | Ho is false

2/5 Level of significance
[Type I and Type II errors]

                       State of Nature
Decision               Ho True            Ho False
Reject Ho              Type I Error       Correct Decision
Do not Reject Ho       Correct Decision   Type II Error

P(Type I error) = α    P(Type II error) = β
α and β are inversely related to each other
1 − α = level of confidence
1 − β = power of the test

3/5 Test Statistic

►A statistic on which the decision of rejecting or not rejecting the null hypothesis is based is called a test statistic.
►In testing of hypothesis, the sampling distribution of the test statistic is derived under the assumption that the null hypothesis is true.

4/5 Decision Rule

► Critical region / Rejection region (RR)
The critical region is that part of the sampling distribution of a statistic for which Ho is rejected. A null hypothesis is rejected if the value of the test statistic is not consistent with Ho. The CR is associated with H1.
► Non-rejection region (AR)
The non-rejection region is that part of the sampling distribution of a statistic for which Ho is not rejected.
► Critical Values
The values that separate the rejection and non-rejection regions are called critical values.

5/5 Result
Reject Ho if the calculated value of the test statistic falls in the rejection region; otherwise don't reject Ho.

Tests of hypothesis included in the course

► About a single population mean µ
► About the difference between population means µ1 − µ2
► About a single population proportion P
► About the difference between population proportions P1 − P2
► About several proportions
► About several means

Assumptions
► The parent population should be normal or the sample size should be large
► The sample should be random

EXAMPLE: It is claimed that an automobile is driven on the average more than 12,000 miles per year. To test this claim, a random sample of 100 automobile owners is asked to keep a record of the miles they travel. Would you agree with the claim if the random sample showed an average of 12,500 miles and a standard deviation of 2,400 miles?

SAMPLE: n = 100, X̄ = 12500, S = 2400

Construction of hypotheses:
  Ho: µ ≤ 12000
  H1: µ > 12000
Level of significance: α = 5%
Test statistic:
  Z = (X̄ − µ) / √(S²/n) = (12500 − 12000) / √(2400²/100) = 2.083
Decision rule: Reject Ho if Zcal ≥ Zα
Result: As Zcal = 2.083 > Z.05 = 1.645, reject Ho and conclude that the claim is true.
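A minimal sketch of the Z test above, using the values from the mileage example:

```python
import math

# One-sample Z test for the mileage claim; values taken from the example above.
n, xbar, s, mu0 = 100, 12500, 2400, 12000

z = (xbar - mu0) / math.sqrt(s**2 / n)
reject = z >= 1.645          # critical value Z_0.05 for a right-tailed test

print(round(z, 3), reject)   # 2.083 True
```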

EXAMPLE: It has been found from experience that the mean breaking strength of a particular brand of thread is 9.63 N with a standard deviation of 1.40 N. Recently a sample of 36 pieces of thread showed a mean breaking strength of 8.93 N. Can we conclude that the thread has become inferior?

POPULATION: σ = 1.40
SAMPLE: n = 36, X̄ = 8.93

Construction of hypotheses:
  Ho: µ ≥ 9.63
  H1: µ < 9.63
Level of significance: α = 5%
Test statistic:
  Z = (X̄ − µ) / √(σ²/n) = (8.93 − 9.63) / √(1.40²/36) = −3
Decision rule: Reject Ho if Zcal ≤ −Zα
Result: As Zcal = −3 < −Z.05 = −1.645, reject Ho and conclude that the thread has become inferior.

28
EXAMPLE: The mean lifetime of bulbs produced by a company has in the past been 1,120 hours. A sample of 9 electric light bulbs recently chosen from a supply of newly produced bulbs showed a mean lifetime of 1,170 hours with a standard deviation of 120 hours. Test whether the mean lifetime of the bulbs has changed.

SAMPLE: n = 9, X̄ = 1170, S = 120

Construction of hypotheses:
  Ho: µ = 1120
  H1: µ ≠ 1120
Level of significance: α = 5%
Test statistic:
  t = (X̄ − µ) / √(S²/n) = (1170 − 1120) / √(120²/9) = 1.25
Decision rule: Reject Ho if |tcal| ≥ tα/2(n−1)
Result: As |tcal| = 1.25 < t.025(8) = 2.306, don't reject Ho and conclude that the mean life has not changed.
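The two-sided t test above can be sketched the same way; the critical value 2.306 is the one quoted in the example.

```python
import math

# One-sample two-sided t test for the bulb lifetimes; values from the example above.
n, xbar, s, mu0 = 9, 1170, 120, 1120

t = (xbar - mu0) / math.sqrt(s**2 / n)
reject = abs(t) >= 2.306     # critical value t_0.025(8) quoted in the example

print(round(t, 2), reject)   # 1.25 False
```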

EXAMPLE: Workers at a production facility are required to assemble a certain part in 2.3 minutes in order to meet production criteria. The assembly time per part is assumed to be normally distributed. Six workers are selected at random and their assembly times (in minutes) are recorded. The manager wants to determine whether the mean assembly time meets the production criterion.

Worker    1     2     3     4     5     6     TOTAL
Time      2     2.4   1.7   1.9   2.8   1.8   12.6
(X−X̄)²   0.01  0.09  0.16  0.04  0.49  0.09  0.88

X̄ = ΣX / n = 12.6 / 6 = 2.1
S² = Σ(X − X̄)² / (n − 1) = 0.88 / 5 = 0.176

Construction of hypotheses:
  Ho: µ ≤ 2.3
  H1: µ > 2.3
Level of significance: α = 5%
Test statistic:
  t = (X̄ − µ) / √(S²/n) = (2.1 − 2.3) / √(0.176/6) = −1.166
Decision rule: Reject Ho if tcal ≥ tα(n−1)
Result: As tcal = −1.166 < t.05(5) = 2.015, don't reject Ho and conclude that the assembly time meets the production criterion.
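The same test can be recomputed from the raw assembly times rather than the pre-tabulated sums:

```python
import math

# One-sample t test for the assembly-time example, from the raw data.
times = [2, 2.4, 1.7, 1.9, 2.8, 1.8]
n = len(times)
mu0 = 2.3

xbar = sum(times) / n
s2 = sum((x - xbar) ** 2 for x in times) / (n - 1)   # sample variance, divisor n-1
t = (xbar - mu0) / math.sqrt(s2 / n)

print(round(xbar, 1), round(s2, 3), round(t, 2))  # 2.1 0.176 -1.17
```
Exact computation gives t ≈ −1.17; the slide's −1.166 is the same value at a coarser rounding.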

Test of hypothesis for a population mean

Ho       H1       Conditions                      Test Statistic            Reject Ho if
µ ≤ µo   µ > µo   Pop. variance known             Z = (X̄ − µo)/√(σ²/n)     Zcal > Zα
µ ≥ µo   µ < µo   Pop. variance unknown,          Z = (X̄ − µo)/√(S²/n)     Zcal < −Zα
                  large sample
µ = µo   µ ≠ µo   Pop. variance unknown,          Z = (X̄ − µo)/√(S²/n)     Zcal > Zα/2 or Zcal < −Zα/2
                  large sample
µ = µo   µ ≠ µo   Pop. variance unknown,          t = (X̄ − µo)/√(S²/n)     tcal > tα/2(n−1) or tcal < −tα/2(n−1)
                  small sample, pop. normal

Population and Sample
Proportions

Population: X1, X2, …, XN       Sample: x1, x2, …, xn

Population proportion: P = X / N
Sample proportion:     p̂ = x / n

Example: Sample Proportion
117 out of 500 sampled students from UAF are not in favour of the semester system in the university.

n = 500, number of students surveyed
X = 117, number of students who disfavour

p̂ = X / n = 117 / 500 = 0.234 (disfavour)
q̂ = (n − X) / n = 383 / 500 = 0.766 (favour)
p̂ + q̂ = 0.234 + 0.766 = 1 (always)

TEST OF HYPOTHESIS FOR A POPULATION PROPORTION

EXAMPLE: A manufacturer claimed that at least 95% of the equipment which he supplied to a factory conformed to specification. An examination of a sample of 200 pieces of equipment revealed that 18 were faulty. Test his claim at the 5% level.

SAMPLE: n = 200, X = 200 − 18 = 182, p̂ = 182/200 = 0.91

Construction of hypotheses:
  Ho: P ≥ 0.95
  H1: P < 0.95
Level of significance: α = 5%
Test statistic:
  Z = (p̂ − Po) / √(PoQo/n) = (0.91 − 0.95) / √((0.95)(0.05)/200) = −2.60
Decision rule: Reject Ho if Zcal ≤ −Zα
Result: As Zcal = −2.60 < −Z.05 = −1.645, reject Ho and conclude that the manufacturer's claim is not correct.
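A sketch of the one-proportion Z test above, with the values from the equipment example:

```python
import math

# One-proportion Z test for the equipment claim; values from the example above.
n, x = 200, 200 - 18          # 182 conforming pieces
p_hat = x / n                 # 0.91
p0, q0 = 0.95, 0.05

z = (p_hat - p0) / math.sqrt(p0 * q0 / n)
reject = z <= -1.645          # left-tailed test at alpha = 5%

print(round(z, 2), reject)    # -2.6 True
```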

Example: Out of 500 students from UAF, 400 are in favour of the semester system in the university. Can we conclude that the proportion of students from the university in favour of the semester system is at most 70%?

SAMPLE: n = 500, X = 400, p̂ = 400/500 = 0.80

Construction of hypotheses:
  Ho: P ≤ 0.70
  H1: P > 0.70
Level of significance: α = 5%
Test statistic:
  Z = (p̂ − Po) / √(PoQo/n) = (0.80 − 0.70) / √((0.70)(0.30)/500) = 4.87
Decision rule: Reject Ho if Zcal > Zα
Result: As Zcal = 4.87 > Z.05 = 1.645, reject Ho and conclude that the proportion is more than 70%.

TEST OF HYPOTHESIS FOR THE DIFFERENCE BETWEEN POPULATION MEANS

EXAMPLE: The average salary of 50 workers from Masood Textile is Rs. 7,000 with a standard deviation of Rs. 500, and the average salary of 70 workers from Shahzad Textile is Rs. 6,800 with a standard deviation of Rs. 300. On the basis of this sample information, can we conclude that Masood Textile is paying more to its workers than Shahzad Textile? Use a 5% level of significance.

SAMPLE: n1 = 50, X̄1 = 7000, S1 = 500;  n2 = 70, X̄2 = 6800, S2 = 300

Construction of hypotheses:
  Ho: µ1 ≤ µ2
  H1: µ1 > µ2
Level of significance: α = 5%
Test statistic:
  Z = [(X̄1 − X̄2) − (µ1 − µ2)] / √(S1²/n1 + S2²/n2)
    = [(7000 − 6800) − 0] / √(500²/50 + 300²/70) = 2.52
Decision rule: Reject Ho if Zcal ≥ Zα
Result: As Zcal = 2.52 > Zα = 1.645, reject Ho and conclude that the average salary of workers at Masood Textile is higher than at Shahzad Textile.

Example: The strength of ropes made out of cotton yarn and coir gave, on measurement, the following values:
Cotton: 7.5 5.4 10.6 9.0 6.1 10.2 7.9 9.7 7.1 8.5
Coir:   8.3 6.1 9.6 10.4 6.4 10.0 7.9 8.9 7.5 9.7
Test whether there is a significant difference in the strength of the two types of ropes at the 5% level of significance. Assume the population variances are equal.

SAMPLE: n1 = 10, X̄1 = 8.2, S1² = 2.98;  n2 = 10, X̄2 = 8.48, S2² = 2.25

Construction of hypotheses:
  Ho: µ1 = µ2
  H1: µ1 ≠ µ2
Level of significance: α = 5%
Pooled variance:
  Sp² = [(n1 − 1)S1² + (n2 − 1)S2²] / [(n1 − 1) + (n2 − 1)] = (26.78 + 20.24) / (9 + 9) = 2.612
Test statistic:
  t = [(X̄1 − X̄2) − (µ1 − µ2)] / √(Sp²(1/n1 + 1/n2))
    = [(8.2 − 8.48) − 0] / √(2.612(1/10 + 1/10)) = −0.38
Decision rule: Reject Ho if |tcal| ≥ tα/2(n1+n2−2)
Result: As |tcal| = 0.38 < t.025(18) = 2.101, don't reject Ho and conclude that there is no significant difference between the ropes made from cotton and coir yarn.

Example: Six horses were fed on diet A, and five on diet B. The gains in weight for the individual horses were:
Diet A (X1): 30 30 28 38 28 26
Diet B (X2): 40 34 38 32 26
Can we conclude that diet B is better than diet A for increasing weight? Assume the population variances are unequal.

SAMPLE: n1 = 6, X̄1 = 30, S1² = 17.6;  n2 = 5, X̄2 = 34, S2² = 30

Construction of hypotheses:
  Ho: µ1 ≥ µ2
  H1: µ1 < µ2
Level of significance: α = 5%
Degrees of freedom (with w1 = S1²/n1 = 2.93 and w2 = S2²/n2 = 6):
  df = (w1 + w2)² / [w1²/(n1 − 1) + w2²/(n2 − 1)] ≈ 7
Test statistic:
  t = [(X̄1 − X̄2) − (µ1 − µ2)] / √(S1²/n1 + S2²/n2)
    = [(30 − 34) − 0] / √(17.6/6 + 30/5) = −1.34
Decision rule: Reject Ho if tcal ≤ −tα(df)
Result: As tcal = −1.34 > −tα(7) = −1.895, don't reject Ho and conclude that diet B is not better than diet A.
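The unequal-variance test above, including its degrees-of-freedom formula, can be recomputed from the raw diet data:

```python
import math

# Unequal-variance (Welch) two-sample t test for the diet data above.
diet_a = [30, 30, 28, 38, 28, 26]
diet_b = [40, 34, 38, 32, 26]

def var(xs):                       # sample variance, divisor n-1
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

n1, n2 = len(diet_a), len(diet_b)
w1, w2 = var(diet_a) / n1, var(diet_b) / n2
df = (w1 + w2) ** 2 / (w1**2 / (n1 - 1) + w2**2 / (n2 - 1))   # Welch df
t = (sum(diet_a)/n1 - sum(diet_b)/n2) / math.sqrt(w1 + w2)

print(round(t, 2), round(df))   # -1.34 7
```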

Test of hypothesis for comparing two
means of NORMAL POPULATIONS

Ho            H1            Conditions                            Test Statistic                                    Reject Ho if
µ1−µ2 ≤ µd    µ1−µ2 > µd    Pop. variances known,                 Z = [(X̄1−X̄2) − µd] / √(σ1²/n1 + σ2²/n2)        Zcal > Zα
                            independent samples
µ1−µ2 ≥ µd    µ1−µ2 < µd    Pop. variances unknown,               Z = [(X̄1−X̄2) − µd] / √(S1²/n1 + S2²/n2)        Zcal < −Zα
                            large independent samples
µ1−µ2 = µd    µ1−µ2 ≠ µd    Pop. variances unknown but equal,     t = [(X̄1−X̄2) − µd] / √(Sp²(1/n1 + 1/n2)),      tcal > tα/2(df) or tcal < −tα/2(df)
                            small independent samples             df = n1 + n2 − 2
µ1−µ2 = µd    µ1−µ2 ≠ µd    Samples not independent               t = [(X̄1−X̄2) − µd] / √(Sd²/n)                  tcal > tα/2(n−1) or tcal < −tα/2(n−1)
                            (samples are paired)

TEST OF HYPOTHESIS FOR THE DIFFERENCE BETWEEN POPULATION PROPORTIONS

A sample of 150 light bulbs produced by company A showed 12 defective bulbs, while a sample of 100 light bulbs produced by company B showed 4 defective bulbs. Is there a significant difference between the proportions of defective bulbs produced by the two companies at the 5% level of significance?
Let X1 = number of defective bulbs produced by company A = 12
Let X2 = number of defective bulbs produced by company B = 4

SAMPLE: n1 = 150, X1 = 12, p̂1 = 0.08;  n2 = 100, X2 = 4, p̂2 = 0.04
Pooled proportion: p̂c = (X1 + X2)/(n1 + n2) = (12 + 4)/(150 + 100) = 0.064, q̂c = 1 − 0.064 = 0.936

Construction of hypotheses:
  Ho: P1 = P2
  H1: P1 ≠ P2
Level of significance: α = 5%
Test statistic:
  Z = [(p̂1 − p̂2) − (P1 − P2)] / √(p̂c q̂c (1/n1 + 1/n2))
    = [(0.08 − 0.04) − 0] / √((0.064)(0.936)(1/150 + 1/100)) = 1.27
Decision rule: Reject Ho if |Zcal| ≥ Zα/2
Result: As |Zcal| = 1.27 < Zα/2 = 1.96, don't reject Ho.
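A sketch of the pooled two-proportion Z test above, with the values from the defective-bulb example:

```python
import math

# Two-proportion Z test (pooled) for the defective-bulb example above.
n1, x1 = 150, 12
n2, x2 = 100, 4

p1, p2 = x1 / n1, x2 / n2
pc = (x1 + x2) / (n1 + n2)          # pooled proportion
qc = 1 - pc

z = (p1 - p2) / math.sqrt(pc * qc * (1/n1 + 1/n2))
reject = abs(z) >= 1.96             # two-tailed test at alpha = 5%

print(round(z, 2), reject)          # 1.27 False
```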

TEST OF HYPOTHESIS FOR THE DIFFERENCE BETWEEN POPULATION PROPORTIONS
A machine puts out 16 imperfect articles in a sample of 500. After the machine is overhauled, it puts out 97 perfect articles in a sample of 100. Has the machine been improved?
Let X1 = number of imperfect articles before overhaul = 16
Let X2 = number of imperfect articles after overhaul = 100 − 97 = 3

SAMPLE: n1 = 500, X1 = 16, p̂1 = 0.032;  n2 = 100, X2 = 3, p̂2 = 0.03
Pooled proportion: p̂c = (X1 + X2)/(n1 + n2) = (16 + 3)/(500 + 100) = 0.0317, q̂c = 1 − 0.0317 = 0.968

Construction of hypotheses:
  Ho: P1 ≤ P2
  H1: P1 > P2
Level of significance: α = 5%
Test statistic:
  Z = [(p̂1 − p̂2) − (P1 − P2)] / √(p̂c q̂c (1/n1 + 1/n2))
    = [(0.032 − 0.03) − 0] / √((0.0317)(0.968)(1/500 + 1/100)) = 0.11
Decision rule: Reject Ho if Zcal ≥ Zα
Result: As Zcal = 0.11 < Z.05 = 1.645, don't reject Ho and conclude that the machine has not been improved.

POPULATION PARAMETER?

Point Estimate (single value)     Interval Estimate (range of values)

INTERVAL ESTIMATE
An interval estimate for a population parameter is a rule for determining an interval in which the parameter is likely to fall. The corresponding estimate is called an interval estimate. Usually a probability expressing some level of confidence is attached to the interval estimate when it is formed.

Example: A researcher wishes to estimate the average amount of money that a student from the university spends on food per day. A random sample of 36 students is selected, and the sample mean is found to be Rs. 45 with a standard deviation of Rs. 3. Estimate 90% confidence limits for the average amount of money that students from the university spend on food per day.

SAMPLE: n = 36, X̄ = 45, S = 3;  90% C.I., α = 10%

X̄ ± Zα/2 (S/√n) = 45 ± (1.645)(0.5) = (44.18, 45.82)
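The 90% confidence interval above can be reproduced directly:

```python
import math

# 90% confidence interval for the mean daily food spending; values from the example.
n, xbar, s = 36, 45, 3
z = 1.645                      # Z_{alpha/2} for a 90% interval

se = s / math.sqrt(n)          # standard error = 0.5
lower, upper = xbar - z * se, xbar + z * se

print(round(lower, 2), round(upper, 2))  # 44.18 45.82
```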

Interpretation of Confidence Interval

[Figure: several 95% C.I.s for the population mean, each from a different sample; most cover µ, a few do not.]
If we construct 100 C.I.s, one from each sample of the same size, then about 95 of the constructed C.I.s will contain the unknown parameter and about 5 may not.

► The sample mean is at the center of the interval estimate.
► The width W of the interval, i.e. the distance between the end points, is U.L − L.L.
► The width W is determined by the probability content, the S.D., and the sample size. Therefore the following results hold:

► For a given probability content and S.D., the bigger the sample size n, the narrower the confidence interval for the population mean.
► For a given probability content and sample size, the smaller the standard deviation, the narrower the confidence interval for the population mean.
► For a given S.D. and sample size, the smaller the probability content (1 − α), the narrower the confidence interval for the population mean.

Example: The following data represent the daily milk production of a random sample of 10 cows from a particular breed: 12, 15, 11, 13, 16, 19, 15, 16, 18, 15. Construct a 90% C.I. for the average milk production of all the cows of that particular breed.

SAMPLE: n = 10, X̄ = 15, S² = 22.89;  90% C.I., α = 10%

X̄ ± tα/2(n−1) (S/√n) = 15 ± (1.833)(1.51) = (12.23, 17.77)

Example: In a sample of 500 individuals in a certain area, 41 were found to be unemployed. Compute a 99% C.I. for the rate of unemployment in that area.

SAMPLE: n = 500, X = 41, p̂ = 41/500 = 0.082;  99% C.I., α = 1%

p̂ ± Zα/2 √(p̂q̂/n) = 0.082 ± Z.005 √((0.082)(0.918)/500)
                  = 0.082 ± (2.58)(0.0123)
                  = (0.050, 0.114), i.e. (5%, 11.4%)

Example: A test in Statistics was given to 50 girls and 75 boys. The girls made an average grade of 76 with a standard deviation of 6, while the boys made an average grade of 82 with a standard deviation of 8. Find a 96% confidence interval for the difference µ1 − µ2, where µ1 is the mean of all boys and µ2 is the mean of all girls who might take this test.

SAMPLE: n1 = 75, X̄1 = 82, S1 = 8;  n2 = 50, X̄2 = 76, S2 = 6;  96% C.I., α = 4%

(X̄1 − X̄2) ± Zα/2 √(S1²/n1 + S2²/n2) = 6 ± (2.054)(1.254) = (3.42, 8.58)

Example: A random sample of 20 plants from Variety I showed a mean height of 63 cm with a standard deviation of 6 cm, while another random sample of 25 plants from Variety II showed a mean height of 60 cm with a standard deviation of 2 cm. Construct a 90% confidence interval for the difference between the two variety means. (Assume the population variances are unequal.)

SAMPLE: n1 = 20, X̄1 = 63, S1 = 6;  n2 = 25, X̄2 = 60, S2 = 2;  90% C.I., α = 10%

(X̄1 − X̄2) ± tα/2(df) √(S1²/n1 + S2²/n2)

With w1 = S1²/n1 = 1.8 and w2 = S2²/n2 = 0.16:
df = (w1 + w2)² / [w1²/(n1 − 1) + w2²/(n2 − 1)] ≈ 23

(63 − 60) ± t.05(23) √(36/20 + 4/25) = 3 ± (1.71)(1.4) = (0.606, 5.394)

Determination of sample size for estimating a
population mean
► Example: The standard deviation of the lifetimes of tubes is estimated as 100 hours. How large a sample should one take in order to be
1) 95% confident that the error in the estimated mean lifetime will not exceed 20 hours
2) 90% confident that the error in the estimated mean lifetime will not exceed 20 hours
3) 95% confident that the error in the estimated mean lifetime will not exceed 10 hours
4) 99% confident that the error in the estimated mean lifetime will not exceed 20 hours
5) 95% confident that the width of the confidence interval is 30 hours

Solution: n = (Zα/2)² σ² / e²

1) 1−α = 95%, e = 20, σ = 100:  n = (1.96)²(100)² / (20)² = 96.04 ≈ 96
2) 1−α = 90%, e = 20, σ = 100:  n = (1.645)²(100)² / (20)² = 67.65 ≈ 68
3) 1−α = 95%, e = 10, σ = 100:  n = (1.96)²(100)² / (10)² = 384.16 ≈ 384
4) 1−α = 99%, e = 20, σ = 100:  n = (2.58)²(100)² / (20)² = 166.41 ≈ 166
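The sample-size formula can be wrapped in a small helper that reproduces the four cases worked above:

```python
# Sample-size formula n = (Z_{alpha/2})^2 * sigma^2 / e^2,
# reproducing the four cases worked in the example above.
def sample_size(z, sigma, e):
    return (z ** 2) * (sigma ** 2) / (e ** 2)

n1 = sample_size(1.96, 100, 20)    # 95% confidence, error 20
n2 = sample_size(1.645, 100, 20)   # 90% confidence, error 20
n3 = sample_size(1.96, 100, 10)    # 95% confidence, error 10
n4 = sample_size(2.58, 100, 20)    # 99% confidence, error 20

print(round(n1, 2), round(n2, 2), round(n3, 2), round(n4, 2))
# 96.04 67.65 384.16 166.41
```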

How large should the sample be?
It depends on the following factors:
► How precise we want the confidence interval estimate to be (the margin of error / width of the confidence interval)
► How confident we want to be that the interval estimate is correct (the level of confidence)
► How variable the population being sampled is (the variance of the population)

 The greater the desired level of confidence, the larger the sample
 The smaller the error, the larger the sample
 The greater the variation in the population, the larger the sample

Choosing the test statistic for comparing two population means:

1. Population variances known?
   Yes → Z = [(X̄1 − X̄2) − (µ1 − µ2)] / √(σ1²/n1 + σ2²/n2)
2. No, but sample sizes large?
   Yes → Z = [(X̄1 − X̄2) − (µ1 − µ2)] / √(S1²/n1 + S2²/n2)
3. No, but population variances equal?
   Yes → t = [(X̄1 − X̄2) − (µ1 − µ2)] / √(Sp²(1/n1 + 1/n2))
4. No → t = [(X̄1 − X̄2) − (µ1 − µ2)] / √(S1²/n1 + S2²/n2)

Mr. Fahid is the training manager of a light engineering firm which employs a considerable
number of skilled machine operators. The firm is constantly making efforts to improve the
quality of its product and so recently Mr. Fahid has introduced a new refresher training
course for workers who have been on the same machine for a long time. The first group has
now completed the course and returned to normal work and Mr. Fahid would like to assess
the effect if any which the training has had upon the standard of its work so that he can
decide whether to make such courses a regular event
Category Improved Did not improved Sub Total
Under 35 17 4 21
(21x40)/60=(14) (21x20)/60=(7)
35-50 17 7 24
(24x40)/60=(16) (24x20)/60=(8)
Over 50 6 9 15
(15x40)/60=(10) (15x20)/60=(05)
Sub Total 40 20 60

1) Construction of hypotheses
Ho: Same rate of improvement in all three age groups, i.e. no association
(independence) between age group and improvement
H1: Rate of improvement is not the same in all age groups, i.e. the two
attributes (age group and improvement) are not independent

2) Test Statistic
χ² = Σ (O − E)²/E

 O    E    (O−E)²/E
17   14     0.643
17   16     0.063
 6   10     1.600
 4    7     1.286
 7    8     0.125
 9    5     3.200
60   60     χ² = 6.92

Decision Rule:- Reject Ho if χ²cal ≥ χ²α[(r-1)x(c-1)]
Result:- As χ²cal = 6.92 > χ².05(2) = 5.99, reject Ho and conclude that the two
attributes are not independent, i.e. the improvement rate differs across age groups.
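The χ² statistic above can be re-computed with a short Python sketch (pure standard library; the observed counts are taken from the example table):

```python
# Chi-square test of independence for the training-course table
rows = [[17, 4],   # under 35: improved / did not improve
        [17, 7],   # 35-50
        [6, 9]]    # over 50

row_tot = [sum(r) for r in rows]
col_tot = [sum(c) for c in zip(*rows)]
grand = sum(row_tot)

chi2 = 0.0
for i, r in enumerate(rows):
    for j, obs in enumerate(r):
        exp = row_tot[i] * col_tot[j] / grand   # E = (row total x col total) / grand total
        chi2 += (obs - exp) ** 2 / exp

print(round(chi2, 2))   # 6.92, exceeding chi^2_.05(2) = 5.99
```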

Comparing more than two population means

We could use two-sample t-tests to test the equality of more
than two population means, but this procedure
 Requires a large number of two-sample t-tests
 Performing many two-sample t-tests at level α tends to inflate
the overall α risk.
For example, to test the equality of 10 population
means we would have to perform 45 t-tests.
If the tests are independent and each uses α = 0.05, the
per-test risks sum to 45(0.05) = 2.25, and the probability of
at least one false rejection is 1 − (0.95)⁴⁵ ≈ 0.90.
We therefore require a procedure for testing hypotheses
about the equality of several population means
simultaneously
– we can use the F-distribution in ANOVA, which yields a single test
statistic for comparing all the means, so that the overall risk of Type-I
error is controlled
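The counts above can be verified in a few lines of Python (standard library only):

```python
from math import comb

k, alpha = 10, 0.05
pairs = comb(k, 2)                       # number of pairwise t-tests: C(10, 2)
bonferroni_sum = pairs * alpha           # sum of per-test risks
familywise = 1 - (1 - alpha) ** pairs    # P(at least one false rejection)

print(pairs)                      # 45
print(round(bonferroni_sum, 2))   # 2.25
print(round(familywise, 2))       # 0.9
```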

42
Analysis of Variance
(ANOVA)
Analysis of Variance is a procedure that partitions the
total variability present in the data set into meaningful
and distinct components. Each component represents
the variation due to a recognized source of variation; in
addition, one component represents the variation due to
uncontrolled factors and random errors associated with
the response measurements.
NORMALITY:- The k populations from which the samples are drawn should be normal
INDEPENDENCE:- The k samples should be independent
RANDOMNESS:- The k samples should be random
HOMOSCEDASTICITY (Common Variance):- The k populations should have a
common variance

One-Way ANOVA
Four groups of students (all of approximately the same attributes)
were subjected to different teaching techniques and tested at
the end of a specified period of time. Due to drop-outs in the
experimental groups (sickness, transfers, etc.) the number of
students varied from group to group.

Method 1   Method 2   Method 3   Method 4
   65         75         59         94
   87         69         78         89
   73         83         67         80
   79         81         62         88
   81         72         83
   69         79         76
              90
  454        549        425        351     Grand Total = 1779

Do the data provide sufficient evidence to indicate a
difference in the mean achievements for the 4
teaching techniques?
43
Graphical view of data

Construction of hypotheses
Ho: µ1 = µ2 = µ3 = µ4 (i.e. mean achievements from the 4 methods are the same)
H1: At least two µ's are different

Level of significance
α = 5%

Test Statistic
F = S²b / S²w

ANOVA TABLE
Source of Variation (S.O.V)   DF        SS       MSS=SS/df         Fcal
Between Methods               4-1 =3    712.6    237.5  (S²b)      3.77*
Within Methods (Error)        22-3=19   1196.6   63.0   (S²w=MSE)
TOTAL                         23-1=22   1909.2

CALCULATION FOR ANOVA TABLE ?

44
Using the data above:

Correction Factor = (G.T)²/Obs = (1779)²/23 = 137601.8

Total SS = (65)² + (87)² + … + (88)² − CF = 139511 − 137601.8 = 1909.2

Between Methods SS = (454)²/6 + (549)²/7 + (425)²/6 + (351)²/4 − CF
                   = 138314.4 − 137601.8 = 712.6

Within Methods SS = Total SS − Between Methods SS = 1909.2 − 712.6 = 1196.6

S.O.V             DF        SS       MSS=SS/df   Fcal
Between Methods   4-1=3     712.6    237.5       237.5/63 = 3.77
Within Methods    19        1196.6   63
TOTAL             23-1=22   1909.2

Decision Rule:- Reject Ho if Fcal ≥ Fα(3,19)

Result:- As Fcal = 3.77 > F.05(3,19) = 3.13, reject Ho and conclude that
there is a difference in the mean achievements for the four teaching methods.
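The whole one-way ANOVA table can be reproduced from the raw scores with a short Python sketch (standard library only; it follows the slide's correction-factor method):

```python
# One-way ANOVA for the four teaching methods
groups = [
    [65, 87, 73, 79, 81, 69],        # Method 1
    [75, 69, 83, 81, 72, 79, 90],    # Method 2
    [59, 78, 67, 62, 83, 76],        # Method 3
    [94, 89, 80, 88],                # Method 4
]

N = sum(len(g) for g in groups)                         # 23 observations
cf = sum(sum(g) for g in groups) ** 2 / N               # correction factor
total_ss = sum(x * x for g in groups for x in g) - cf
between_ss = sum(sum(g) ** 2 / len(g) for g in groups) - cf
within_ss = total_ss - between_ss

k = len(groups)
f = (between_ss / (k - 1)) / (within_ss / (N - k))      # MSS ratio
print(round(between_ss, 1), round(within_ss, 1))        # 712.6 1196.6
print(round(f, 2))                                      # 3.77
```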

Two-Way ANOVA
The Black Rock candy company was planning a test of three new candy
flavors (A, B, C). In the test the company also wished to measure the effect of
three different retail price levels (79 cents, 89 cents, 99 cents). Because
each flavor was to be tested at each price, a total of nine different flavor-
price combinations were to be tested. The following data represent the
number of candies sold (in hundreds).

Price    A    B    C   Total
79       8   13    5     26
89       4   18    6     28
99       4   22   10     36
Total   16   53   21     90

Do the data provide sufficient evidence to indicate
a difference in the means for flavors and prices?

45
Construction of hypotheses
Ho: µ1 = µ2 = µ3 i.e. all the flavors have equal sales
H′o: µ′1 = µ′2 = µ′3 i.e. all the prices have equal sales
H1: At least two µ's are different
H′1: At least two µ′'s are different

Test Statistics
F  = S²1 / S²3   to test Ho
F′ = S²2 / S²3   to test H′o

ANOVA TABLE
Source of Variation (S.O.V)   DF        SS       MSS=SS/df   Fcal
Between Flavours              3-1=2     268.67   134.33      11.51
Between Price                 3-1=2     18.67    9.33        0.8
Error                         8-2-2=4   46.67    11.67
TOTAL                         9-1=8     334

CALCULATION FOR ANOVA TABLE ?

Using the data above:

C.F = (G.T)²/Obs = (90)²/9 = 900
Total SS = (8)² + (4)² + … + (10)² − CF = 1234 − 900 = 334

Flavors SS = (16)²/3 + (53)²/3 + (21)²/3 − CF = 1168.67 − 900 = 268.67
Price SS   = (26)²/3 + (28)²/3 + (36)²/3 − CF = 918.67 − 900 = 18.67
Error SS = Total SS − Flavor SS − Price SS = 334 − 268.67 − 18.67 = 46.67
ANOVA TABLE
Source of Variation (S.O.V)   DF        SS       MSS=SS/df   Fcal
Bet. Flavours                 3-1=2     268.67   134.33      11.51*
Bet. Price                    3-1=2     18.67    9.33        0.8 ns
Error                         8-2-2=4   46.67    11.67
TOTAL                         9-1=8     334

Decision Rule:-
Reject Ho if Fcal ≥ Fα(2,4) = 6.94
Reject H′o if F′cal ≥ Fα(2,4) = 6.94

Result:- As 11.51 > 6.94, reject Ho (mean sales differ among flavors); as
0.8 < 6.94, do not reject H′o (no significant difference among prices).
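The two-way table can likewise be reproduced in Python (standard library only; rows are the price levels, columns the flavors):

```python
# Two-way ANOVA for the candy example (one observation per cell)
table = [
    [8, 13, 5],    # 79 cents
    [4, 18, 6],    # 89 cents
    [4, 22, 10],   # 99 cents
]
r, c = len(table), len(table[0])
cf = sum(sum(row) for row in table) ** 2 / (r * c)          # (90)^2 / 9 = 900

total_ss = sum(x * x for row in table for x in row) - cf    # 334
flavor_ss = sum(sum(col) ** 2 / r for col in zip(*table)) - cf
price_ss = sum(sum(row) ** 2 / c for row in table) - cf
error_ss = total_ss - flavor_ss - price_ss

mse = error_ss / ((r - 1) * (c - 1))
print(round(flavor_ss / (c - 1) / mse, 2))  # 11.51  (flavors: significant)
print(round(price_ss / (r - 1) / mse, 2))   # 0.8    (prices: not significant)
```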

46
Effect of Degrees of Freedom on the
t-distribution

• The shape of the t distribution depends on the degrees of freedom.
• As the number of degrees of freedom increases, the spread
of the t distribution decreases and the t curve approaches
the standard normal curve.
• For approximately n ≥ 30, the t and standard normal distributions
are nearly the same.
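This convergence can be checked numerically, a sketch assuming SciPy is available; the printed values match the t-table below:

```python
from scipy.stats import norm, t

# Upper 2.5% critical value of t shrinks toward the normal value 1.96
for df in (5, 10, 30, 120):
    print(df, round(t.ppf(0.975, df), 3))   # 2.571, 2.228, 2.042, 1.98
print(round(norm.ppf(0.975), 3))            # 1.96
```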

Z-TABLE

Upper-tail area α:  0.005   0.01   0.025   0.05   0.1
z:                  2.58    2.33   1.96    1.64   1.28

47
PERCENTAGE POINT OF STUDENT'S t-DISTRIBUTION

Alpha (upper-tail area)
d.f. 0.250 0.100 0.050 0.025 0.010 0.005
1 1.000 3.078 6.314 12.706 31.821 63.657
2 0.816 1.886 2.920 4.303 6.965 9.925
3 0.765 1.638 2.353 3.182 4.541 5.841
4 0.741 1.533 2.132 2.776 3.747 4.604
5 0.727 1.476 2.015 2.571 3.365 4.032
6 0.718 1.440 1.943 2.447 3.143 3.707
7 0.711 1.415 1.895 2.365 2.998 3.499
8 0.706 1.397 1.860 2.306 2.896 3.355
9 0.703 1.383 1.833 2.262 2.821 3.250
10 0.700 1.372 1.812 2.228 2.764 3.169
11 0.697 1.363 1.796 2.201 2.718 3.106
12 0.695 1.356 1.782 2.179 2.681 3.055
13 0.694 1.350 1.771 2.160 2.650 3.012
14 0.692 1.345 1.761 2.145 2.624 2.977
15 0.691 1.341 1.753 2.131 2.602 2.947
16 0.690 1.337 1.746 2.120 2.583 2.921
17 0.689 1.333 1.740 2.110 2.567 2.898
18 0.688 1.330 1.734 2.101 2.552 2.878
19 0.688 1.328 1.729 2.093 2.539 2.861
20 0.687 1.325 1.725 2.086 2.528 2.845
21 0.686 1.323 1.721 2.080 2.518 2.831
22 0.686 1.321 1.717 2.074 2.508 2.819
23 0.685 1.319 1.714 2.069 2.500 2.807
24 0.685 1.318 1.711 2.064 2.492 2.797
25 0.684 1.316 1.708 2.060 2.485 2.787
26 0.684 1.315 1.706 2.056 2.479 2.779
27 0.684 1.314 1.703 2.052 2.473 2.771
28 0.683 1.313 1.701 2.048 2.467 2.763
29 0.683 1.311 1.699 2.045 2.462 2.756
30 0.683 1.310 1.697 2.042 2.457 2.750
40 0.681 1.303 1.684 2.021 2.423 2.704
60 0.679 1.296 1.671 2.000 2.390 2.660
120 0.677 1.289 1.658 1.980 2.358 2.617

5 PERCENT POINTS OF F DISTRIBUTION

v1 (numerator df) →
v2 ↓     1       2       3       4       5       6       12      24
1 161.448 199.500 215.707 224.583 230.162 233.986 243.906 249.052
2 18.513 19.000 19.164 19.247 19.296 19.330 19.413 19.454
3 10.128 9.552 9.277 9.117 9.013 8.941 8.745 8.639
4 7.709 6.944 6.591 6.388 6.256 6.163 5.912 5.774
5 6.608 5.786 5.409 5.192 5.050 4.950 4.678 4.527
6 5.987 5.143 4.757 4.534 4.387 4.284 4.000 3.841
7 5.591 4.737 4.347 4.120 3.972 3.866 3.575 3.410
8 5.318 4.459 4.066 3.838 3.687 3.581 3.284 3.115
9 5.117 4.256 3.863 3.633 3.482 3.374 3.073 2.900
10 4.965 4.103 3.708 3.478 3.326 3.217 2.913 2.737
11 4.844 3.982 3.587 3.357 3.204 3.095 2.788 2.609
12 4.747 3.885 3.490 3.259 3.106 2.996 2.687 2.505
13 4.667 3.806 3.411 3.179 3.025 2.915 2.604 2.420
14 4.600 3.739 3.344 3.112 2.958 2.848 2.534 2.349
15 4.543 3.682 3.287 3.056 2.901 2.790 2.475 2.288
16 4.494 3.634 3.239 3.007 2.852 2.741 2.425 2.235
17 4.451 3.592 3.197 2.965 2.810 2.699 2.381 2.190
18 4.414 3.555 3.160 2.928 2.773 2.661 2.342 2.150
19 4.381 3.522 3.127 2.895 2.740 2.628 2.308 2.114
20 4.351 3.493 3.098 2.866 2.711 2.599 2.278 2.082
21 4.325 3.467 3.072 2.840 2.685 2.573 2.250 2.054
22 4.301 3.443 3.049 2.817 2.661 2.549 2.226 2.028
23 4.279 3.422 3.028 2.796 2.640 2.528 2.204 2.005
24 4.260 3.403 3.009 2.776 2.621 2.508 2.183 1.984
25 4.242 3.385 2.991 2.759 2.603 2.490 2.165 1.964
26 4.225 3.369 2.975 2.743 2.587 2.474 2.148 1.946
27 4.210 3.354 2.960 2.728 2.572 2.459 2.132 1.930
28 4.196 3.340 2.947 2.714 2.558 2.445 2.118 1.915
29 4.183 3.328 2.934 2.701 2.545 2.432 2.104 1.901
30 4.171 3.316 2.922 2.690 2.534 2.421 2.092 1.887
40 4.085 3.232 2.839 2.606 2.449 2.336 2.003 1.793
60 4.001 3.150 2.758 2.525 2.368 2.254 1.917 1.700
120 3.920 3.072 2.680 2.447 2.290 2.175 1.834 1.608


v1: numerator degrees of freedom (columns); v2: denominator degrees of freedom (rows)

48
