# Statistical Methods

Descriptive Statistics
Descriptive Statistics consists of the tools and techniques designed to describe data, such as charts, graphs, and numerical measures.

Descriptive Statistics
[Figure: histogram]

Descriptive Statistics
AVERAGE
The sum of all the values divided by the number of values. In equation form:

$$\text{Mean} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{\text{sum of all data values}}{\text{number of data values}}$$

where:

n = number of data values
$x_i$ = ith data value

Inferential Statistics
Inferential Statistics consists of techniques that allow a decision-maker to reach a conclusion about characteristics of a larger data set (Population) based upon a subset (Sample) of those data.

Inferential Statistics
Involves:
► Estimation
► Hypothesis testing
Purpose: to reach conclusions about an unknown population.

Inference Process
[Figure: a sample is drawn from the population; the sample statistic is then used for estimates and tests about the population.]

Why Study Sampling Distributions
► Sample statistics are used to estimate population parameters
   e.g.: $\bar{X} = 50$ estimates the population mean $\mu$
► Problems:
   Different samples provide different estimates
   Large samples give better estimates, but large samples cost more
   How good is the estimate?
► Approach to solution: the theoretical basis is the sampling distribution

Sampling Distribution
► Theoretical probability distribution of a sample statistic
► Sample statistic is a random variable
   Sample mean, sample proportion
► Results from taking all possible samples of the same size

Developing Sampling Distributions
► Assume there is a population …
► Population size N = 4
► Random variable X = age of individuals
► Values of X: 18, 20, 22, 24, measured in years
[Figure: four individuals labelled A, B, C, D]

Developing Sampling Distributions
Summary Measures for the Population Distribution

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$$

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} = 2.236$$

[Figure: P(X) against X for A (18), B (20), C (22), D (24); uniform distribution]

Developing Sampling Distributions
All Possible Samples of Size n = 2

| 1st Obs \ 2nd Obs | 18 | 20 | 22 | 24 |
|---|---|---|---|---|
| 18 | 18,18 | 18,20 | 18,22 | 18,24 |
| 20 | 20,18 | 20,20 | 20,22 | 20,24 |
| 22 | 22,18 | 22,20 | 22,22 | 22,24 |
| 24 | 24,18 | 24,20 | 24,22 | 24,24 |

16 Sample Means

| 1st Obs \ 2nd Obs | 18 | 20 | 22 | 24 |
|---|---|---|---|---|
| 18 | 18 | 19 | 20 | 21 |
| 20 | 19 | 20 | 21 | 22 |
| 22 | 20 | 21 | 22 | 23 |
| 24 | 21 | 22 | 23 | 24 |

$K = N^n = 4^2 = 16$ samples, taken with replacement

Developing Sampling Distributions
Sampling Distribution of All Sample Means
The 16 sample means give the following sample means distribution:

| $\bar{X}$ | 18 | 19 | 20 | 21 | 22 | 23 | 24 |
|---|---|---|---|---|---|---|---|
| $P(\bar{X})$ | 1/16 | 2/16 | 3/16 | 4/16 | 3/16 | 2/16 | 1/16 |

[Figure: Sample Means Distribution, $P(\bar{X})$ against $\bar{X}$]

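Developing Sampling Distributions
Summary Measures of Sampling Distribution

$$\mu_{\bar{X}} = \frac{\sum_{i=1}^{K} \bar{X}_i}{K} = \frac{18 + 19 + 19 + \cdots + 24}{16} = 21$$

$$\sigma_{\bar{X}} = \sqrt{\frac{\sum_{i=1}^{K} (\bar{X}_i - \mu_{\bar{X}})^2}{K}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$$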
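The summary measures above can be checked by brute force. The following is a minimal sketch, assuming the four ages from the slides and sampling with replacement; the helper names `mean` and `population_sd` are illustrative, not part of the original material.

```python
from itertools import product
from math import sqrt

population = [18, 20, 22, 24]  # ages of A, B, C, D from the slides
n = 2                          # sample size

# Every ordered sample of size n drawn with replacement (N**n = 16 samples).
samples = list(product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

def mean(values):
    return sum(values) / len(values)

def population_sd(values):
    """Standard deviation with divisor len(values), as in the slide formulas."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

mu = mean(population)
sigma = population_sd(population)
mu_xbar = mean(sample_means)
sigma_xbar = population_sd(sample_means)

print(len(samples), mu, round(sigma, 3))
print(mu_xbar, round(sigma_xbar, 2), round(sigma / sqrt(n), 2))
```

The last printed pair shows that the standard deviation of the 16 sample means agrees with $\sigma/\sqrt{n}$.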

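Comparing the Population with its Sampling Distribution

| | Population (N = 4) | Sample Means Distribution (n = 2) |
|---|---|---|
| Mean | $\mu = 21$ | $\mu_{\bar{X}} = 21$ |
| Standard deviation | $\sigma = 2.236$ | $\sigma_{\bar{X}} = 1.58$ |

[Figure: P(X) against X for A (18), B (20), C (22), D (24), next to $P(\bar{X})$ against $\bar{X}$ = 18, 19, …, 24]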

Properties of Summary Measures
► $\mu_{\bar{X}} = \mu$, i.e. $\bar{X}$ is unbiased
► The standard error (standard deviation) of the sampling distribution, $\sigma_{\bar{X}}$, is less than the standard error of other unbiased estimators
► For sampling with replacement: $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
   As n increases, $\sigma_{\bar{X}}$ decreases

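Unbiasedness
[Figure: $P(\bar{X})$ against $\bar{X}$ for an unbiased estimator, centred at µ, and a biased estimator, centred at $\mu_{\bar{X}} \ne \mu$]

Less Variability
[Figure: sampling distribution of the median and sampling distribution of the mean, both centred at µ]

Effect of Large Sample
[Figure: $P(\bar{X})$ against $\bar{X}$ for a smaller and a larger sample size, both centred at µ]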
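When the Population is Normal
Central Tendency: $\mu_{\bar{X}} = \mu$
Variation (sampling with replacement): $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$

Population Distribution: µ = 50, σ = 10
Sampling Distributions: $\mu_{\bar{X}} = 50$; n = 4 gives $\sigma_{\bar{X}} = 5$; n = 16 gives $\sigma_{\bar{X}} = 2.5$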

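When the Population is Not Normal
Central Tendency: $\mu_{\bar{X}} = \mu$
Variation (sampling with replacement): $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$

Population Distribution: µ = 50, σ = 10
Sampling Distributions: $\mu_{\bar{X}} = 50$; n = 4 gives $\sigma_{\bar{X}} = 5$; n = 30 gives $\sigma_{\bar{X}} = 1.8$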

Central Limit Theorem
As the sample size gets large enough, the sampling distribution of $\bar{X}$ becomes almost normal regardless of the shape of the population.

How Large is Large Enough?
► For most distributions, n > 30
► For fairly symmetric distributions, n > 15
► For a normal distribution, the sampling distribution of the mean is always normally distributed

Sampling Distribution of the Sample Mean
The sampling distribution of the sample mean $\bar{x}$ is the probability distribution of the means of all possible random samples of n observations that can be drawn from a given population with mean µ and variance σ².
[Figure: a population with mean µ and variance σ², from which many samples are drawn, each giving its own $\bar{x}$]

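SAMPLING DISTRIBUTION OF MEANS
Population: 10, 12, 14, 16, 18, 20. Draw all possible random samples of size 2 (without replacement).

| No | Sample | Mean |
|---|---|---|
| 1 | 10, 12 | 11 |
| 2 | 10, 14 | 12 |
| 3 | 10, 16 | 13 |
| 4 | 10, 18 | 14 |
| 5 | 10, 20 | 15 |
| 6 | 12, 14 | 13 |
| 7 | 12, 16 | 14 |
| 8 | 12, 18 | 15 |
| 9 | 12, 20 | 16 |
| 10 | 14, 16 | 15 |
| 11 | 14, 18 | 16 |
| 12 | 14, 20 | 17 |
| 13 | 16, 18 | 17 |
| 14 | 16, 20 | 18 |
| 15 | 18, 20 | 19 |

| Means | f | Relative Frequency |
|---|---|---|
| 11 | 1 | 0.0667 |
| 12 | 1 | 0.0667 |
| 13 | 2 | 0.1333 |
| 14 | 2 | 0.1333 |
| 15 | 3 | 0.2 |
| 16 | 2 | 0.1333 |
| 17 | 2 | 0.1333 |
| 18 | 1 | 0.0667 |
| 19 | 1 | 0.0667 |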

Properties of the Sampling Distribution of the Sample Mean
If a random sample of size n is taken from a population with mean µ and standard deviation σ, then the sampling distribution of the sample mean $\bar{x}$
► is normal, if the sampled population is normal
► has mean $\mu_{\bar{x}} = \mu$
► has standard deviation $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

The Central Limit Theorem
If a random sample of size n is taken from a population with mean µ and standard deviation σ, then, as n becomes large, the sampling distribution of sample means is approximately normally distributed with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.
[Figure: a random sample (x1, x2, …, xn) is drawn from the population distribution (µ, σ); as n → large, the sampling distribution of the sample mean is approximately normal with ($\mu_{\bar{x}} = \mu$, $\sigma_{\bar{x}} = \sigma/\sqrt{n}$)]

Function of Hypothesis Testing
Hypothesis testing begins with an assumption, called a hypothesis, that we make about a population parameter. Say that we assume a certain value for a population mean. To test the validity of our assumption we
• Collect sample data
• Produce sample statistics
• Use this information to decide how likely it is that our hypothesized population parameter is correct.
We then determine the difference between the hypothesized value and the actual value of the sample mean.

Function of Hypothesis Testing (Cont…)
Then we judge whether the difference is significant or non-significant. Unfortunately, the difference between the hypothesized population parameter and the actual statistic is more often neither so large that we automatically reject our hypothesis nor so small that we just as quickly decide not to reject it. So in hypothesis testing, as in most significant real-life decisions, clear-cut solutions are the exception, not the rule.

Function of Hypothesis Testing (Cont…)
When to Reject the Hypothesis and When Not to Reject It?
Suppose I say that the average F.Sc marks of the students of UAF are at least 90 percent. How can you test the validity of my hypothesis? Using sampling methods, we could calculate the average marks of a sample of students. If we did this and the sample statistic came out to be 95 percent, we would readily decide not to reject the statement. However, if the sample statistic were 46 percent, we would reject the statement. We can interpret both these outcomes, 95 percent and 46 percent, using our common sense.

The Basic Problem
Now suppose that our sample statistic reveals a mark of 88 percent. This value is relatively close to 90 percent, but is it close enough for us not to reject the hypothesis? Whether we reject the hypothesis or not, we can't be absolutely certain that our decision is correct; therefore, we will have to learn to deal with uncertainty in our decision making.

Hypothesis Testing
Testing of Hypothesis
A procedure which enables us to decide, on the basis of information obtained from a sample taken from the population, whether or not to reject a specified statement or hypothesis regarding the value of a population parameter is known as testing of hypothesis.

Hypothesis Testing
Statistical Hypothesis
An assumption made about the population parameter, which may or may not be true.
Null Hypothesis or Maintained Hypothesis
Denoted by the symbol Ho, it is any hypothesis which is to be tested for possible rejection under the assumption that it is true. The null hypothesis always contains some form of an equality sign.
Alternative Hypothesis or Research Hypothesis
The complement of the null hypothesis, denoted by H1 (or Ha). The alternative hypothesis never contains the sign of equality and is always in an inequality form.

Hypothesis Testing
Basic Strategy in Hypothesis Testing
The basic strategy in statistical hypothesis testing is to attempt to support the Research/Alternative hypothesis by contradicting the null hypothesis.
Reasoning in Hypothesis Testing
The null hypothesis should be regarded as true and should be rejected only when the sample data give strong evidence against it. The alternative hypothesis is the hypothesis which we are willing to accept when the null hypothesis is rejected. A null hypothesis is thus tested against an alternative hypothesis.

Hypothesis Testing
Error of Inference
Whenever sample evidence is used to draw a conclusion, there is a risk of making a wrong decision because of sampling. Such errors are called Inferential Errors, because they entail drawing an incorrect inference from the sample about the value of the population parameter.

Hypothesis Testing
Significance Level
The probability of committing a Type-I error is called the level of significance, denoted by α. The level of significance is also called the size of the test. By α = 5% we mean that there are 5 chances in 100 of incorrectly rejecting a true null hypothesis. Put another way, when the null hypothesis is true we are 95% confident of making the correct decision.

Level of Confidence
The probability of not committing a Type-I error, (1 − α), is called the level of confidence, or confidence coefficient.

Power of a Test
The probability of not committing a Type-II error, (1 − β), is called the power of the test.

Hypothesis Testing
A cut-off value often used is 0.05, that is, reject the null hypothesis when the p-value is less than 0.05. For example, suppose you do a t-test to test the null hypothesis that µ equals 5, versus the alternative hypothesis that it does not equal 5. You would reject the null hypothesis that µ equals 5 if the test yields a very small (for example, less than 0.05) p-value.

Hypothesis Testing
Mathematically
If P-value < level of significance (α): reject Ho at the α level. Statistically, we say that the results are significant.
If P-value > level of significance (α): do not reject Ho at the α level. Statistically, we say that the results are non-significant.

Hypothesis Testing
Test Statistic
The statistic on which the decision to reject or not reject the null hypothesis is based.
Rejection Region/Critical Region (CR)
That part of the sampling distribution of a statistic for which the null hypothesis is rejected.
Non-rejection Region/Non-critical Region
That part of the sampling distribution of a statistic for which the null hypothesis is not rejected.

General Procedure for Hypothesis Testing
• Formulate the null and alternative hypotheses
• Decide upon a significance level, α
• Choose an appropriate test statistic and find its value
• Determine the Critical Region (CR). The location of the CR depends upon the form of the alternative hypothesis. Choose the location of the CR on the basis of the direction in which the inequality sign points:
   • If >, choose the right tail as the CR
   • If <, choose the left tail as the CR
   • If ≠, choose a two-tailed CR
• Reject the null hypothesis if the computed value of the test statistic falls in the CR; otherwise don't reject the null hypothesis. Then state the decision in managerial terms.

STEPS FOR TEST OF HYPOTHESIS
1) Construction of hypotheses
2) Level of significance
3) Test statistic
4) Decision rule
5) Conclusion

1/5 Construction of hypotheses
[Null and Alternative Hypotheses]
The null hypothesis, denoted H0, is any hypothesis which is to be tested for possible rejection or nullification under the assumption that it is true. The null hypothesis always contains some form of an equality sign. The complement of the null hypothesis is called the alternative hypothesis, denoted Ha or H1. The alternative hypothesis never contains the sign of equality and is always in an inequality form.

1/5 Construction of hypotheses
[One sided and two sided hypothesis]
One-Sided, Greater Than (Right Tail): H0: µ ≤ 50, Ha: µ > 50
One-Sided, Less Than (Left Tail): H0: µ ≥ 50, Ha: µ < 50
Two-Sided, Not Equal To: H0: µ = 50, Ha: µ ≠ 50

2/5 Level of significance
[Type I and Type II errors]
On the basis of sample information, we may reject a true statement about the population or fail to reject a false one.
Type I error = Reject H0 when H0 is true
Type II error = Don't reject H0 when H0 is false

| Decision on the basis of sample information | State of nature: H0 True | State of nature: H0 False |
|---|---|---|
| Reject H0 | Type I Error | Correct Decision |
| Do not Reject H0 | Correct Decision | Type II Error |

P(Type I error) = α, P(Type II error) = β
For a fixed sample size, α and β are inversely related to each other
1 − α = level of confidence, 1 − β = power of the test

3/5 Test Statistic
► A statistic on which the decision to reject or not reject the null hypothesis is based is called a test statistic
► In testing of hypothesis, the sampling distribution of the test statistic is based on the assumption that the null hypothesis is true.

4/5 Decision Rule
► Critical region/Rejection region
Critical region is that part of the sampling distribution of a statistic for which Ho is rejected. A null hypothesis is rejected if the value of the test statistic is not consistent with Ho. The CR is associated with H1.
► Non-rejection Region
Non-rejection region is that part of the sampling distribution of a statistic for which Ho is not rejected.
Critical Values:
The values that separate the rejection and non-rejection regions are called critical values.
[Figure: sampling distribution with the critical value separating the non-rejection region (AR) from the rejection region (RR)]

5/5 Result
Reject Ho if the calculated value of the test statistic falls in the rejection region; otherwise don't reject Ho.

Test of hypothesis included in the course
► Single population mean (µ)
► Difference between population means (µ1 − µ2)
► Single population proportion (P)
► Difference between population proportions (P1 − P2)
► Several proportions
► Several means

Assumptions
► Parent population should be normal or sample size should be large
► The sample should be random

EXAMPLE:- It is claimed that an automobile is driven on average more than 12,000 miles per year. To test this claim, a random sample of 100 automobile owners is asked to keep a record of the miles they travel. Would you agree with the claim if the random sample showed an average of 12,500 miles and a standard deviation of 2,400 miles?

SAMPLE: n = 100, $\bar{X}$ = 12500, S = 2400
Construction of hypotheses: Ho: µ ≤ 12000, H1: µ > 12000
Level of significance: α = 5%
Test Statistic:

$$Z = \frac{\bar{X} - \mu}{\sqrt{s^2/n}} = \frac{12500 - 12000}{\sqrt{(2400)^2/100}} = 2.083$$

Decision Rule:- Reject Ho if Zcal ≥ Zα
Result:- As Zcal > Z.05 = 1.645, reject Ho and conclude that the claim is true.
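The calculation in this example can be reproduced with the Python standard library. This is a minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the function name `z_statistic` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sd, n):
    """One-sample Z statistic: (x_bar - mu0) / sqrt(sd^2 / n)."""
    return (sample_mean - mu0) / (sd / sqrt(n))

alpha = 0.05
z_cal = z_statistic(sample_mean=12500, mu0=12000, sd=2400, n=100)

# Right-tail critical value Z_alpha from the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - alpha)

reject_h0 = z_cal >= z_crit
print(round(z_cal, 3), round(z_crit, 3), reject_h0)
```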

EXAMPLE:- It has been found from experience that the mean breaking strength of a particular brand of thread is 9.63N with a standard deviation of 1.40N. Recently a sample of 36 pieces of thread showed a mean breaking strength of 8.93N. Can we conclude that the thread has become inferior?

POPULATION: σ = 1.40
SAMPLE: n = 36, $\bar{X}$ = 8.93
Construction of hypotheses: Ho: µ ≥ 9.63, H1: µ < 9.63
Level of significance: α = 5%
Test Statistic:

$$Z = \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} = \frac{8.93 - 9.63}{\sqrt{(1.40)^2/36}} = -3$$

Decision Rule:- Reject Ho if Zcal ≤ −Zα
Result:- As Zcal < −Z.05 = −1.645, reject Ho and conclude that the thread has become inferior.

EXAMPLE:- The mean lifetime of bulbs produced by a company has in the past been 1120 hours. A sample of 9 electric light bulbs recently chosen from a supply of a newly produced batch showed a mean lifetime of 1170 hours with a standard deviation of 120 hours. Test that the mean lifetime of the bulbs has not changed.

SAMPLE: n = 9, $\bar{X}$ = 1170, S = 120
Construction of hypotheses: Ho: µ = 1120, H1: µ ≠ 1120
Level of significance: α = 5%
Test Statistic:

$$t = \frac{\bar{X} - \mu}{\sqrt{S^2/n}} = \frac{1170 - 1120}{\sqrt{(120)^2/9}} = 1.25$$

Decision Rule:- Reject Ho if | tcal | ≥ tα/2(n−1)

Result:- As | tcal | < t.025(8) = 2.306, don't reject Ho and conclude that the mean life has not changed.

EXAMPLE:- Workers at a production facility are required to assemble a certain part in 2.3 minutes in order to meet production criteria. The assembly time per part is assumed to be normally distributed. Six workers are selected at random and the time each takes to assemble the part is recorded. The assembly times (in minutes) for the six workers are as follows. The manager wants to determine whether the mean assembly time meets the production criteria.

| Worker | 1 | 2 | 3 | 4 | 5 | 6 | TOTAL |
|---|---|---|---|---|---|---|---|
| Time (X) | 2.0 | 2.4 | 1.7 | 1.9 | 2.8 | 1.8 | 12.6 |
| $(X - \bar{X})^2$ | 0.01 | 0.09 | 0.16 | 0.04 | 0.49 | 0.09 | 0.88 |

$$\bar{X} = \frac{\sum X}{n} = \frac{12.6}{6} = 2.1, \qquad S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{0.88}{5} = 0.176$$

SAMPLE: n = 6, $\bar{X}$ = 2.1, S² = 0.176
Construction of hypotheses: Ho: µ ≤ 2.3, H1: µ > 2.3
Level of significance: α = 5%
Test Statistic:

$$t = \frac{\bar{X} - \mu}{\sqrt{S^2/n}} = \frac{2.1 - 2.3}{\sqrt{0.176/6}} = -1.168$$

Decision Rule:- Reject Ho if tcal ≥ tα(n−1)
Result:- As tcal < t.05(5) = 2.015, don't reject Ho and conclude that the assembly time meets the production criteria.
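For a small sample, the t statistic can be computed directly from the raw observations. The sketch below uses the six assembly times from the example above; the critical value 2.015 is copied from the slides' t-table lookup, since the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance

times = [2.0, 2.4, 1.7, 1.9, 2.8, 1.8]  # assembly times in minutes
mu0 = 2.3                                # hypothesized mean under Ho

n = len(times)
x_bar = mean(times)
s2 = variance(times)                     # sample variance, divisor n - 1

t_cal = (x_bar - mu0) / sqrt(s2 / n)

t_crit = 2.015                           # t_{0.05}(5), taken from the slides
reject_h0 = t_cal >= t_crit              # right-tailed test
print(round(x_bar, 2), round(s2, 3), round(t_cal, 3), reject_h0)
```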

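Test of hypothesis for population mean

| Ho | H1 | Population variance / sample size | Test statistic | Decision rule: reject Ho if |
|---|---|---|---|---|
| $\mu \le \mu_o$ | $\mu > \mu_o$ | Pop. variance known | $Z = \dfrac{\bar{X} - \mu_o}{\sqrt{\sigma^2/n}}$ | Zcal > Zα |
| $\mu \ge \mu_o$ | $\mu < \mu_o$ | Pop. variance unknown; large sample | $Z = \dfrac{\bar{X} - \mu_o}{\sqrt{S^2/n}}$ | Zcal < −Zα |
| $\mu = \mu_o$ | $\mu \ne \mu_o$ | Pop. variance unknown; large sample | $Z = \dfrac{\bar{X} - \mu_o}{\sqrt{S^2/n}}$ | Zcal > Zα/2 OR Zcal < −Zα/2 |
| $\mu = \mu_o$ | $\mu \ne \mu_o$ | Pop. variance unknown; small sample; population is normal | $t = \dfrac{\bar{X} - \mu_o}{\sqrt{S^2/n}}$ | tcal > tα/2(n−1) OR tcal < −tα/2(n−1) |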
Population and Sample Proportions
Population: X1, X2, …, XN, with proportion P
Sample: x1, x2, …, xn, with proportion $\hat{p}$

Population Proportion: $P = \dfrac{X}{N}$
Sample Proportion: $\hat{p} = \dfrac{X}{n}$

Example: Sample Proportion
117 out of 500 sampled students from UAF are not in favour of the semester system in the university.
n = 500, number of students surveyed
X = 117, number of students who disfavour

$$\hat{p} = \frac{X}{n} = \frac{117}{500} = 0.234 \ \text{(disfavour)}, \qquad \hat{q} = \frac{n - X}{n} = \frac{383}{500} = 0.766 \ \text{(favour)}$$

Always: $\hat{p} + \hat{q} = 0.234 + 0.766 = 1$

TEST OF HYPOTHESIS FOR POPULATION PROPORTION
EXAMPLE:- A manufacturer claimed that at least 95% of the equipment which he supplied to a factory conformed to specification. An examination of a sample of 200 pieces of equipment revealed that 18 were faulty. Test his claim at 5%.

SAMPLE: n = 200, X = 182, $\hat{p}$ = 182/200 = 0.91
Construction of hypotheses: Ho: P ≥ 0.95, H1: P < 0.95
Level of significance: α = 5%
Test Statistic:

$$Z = \frac{\hat{p} - P_o}{\sqrt{P_o Q_o / n}} = \frac{0.91 - 0.95}{\sqrt{(0.95)(0.05)/200}} = -2.60$$

Decision Rule:- Reject Ho if Zcal ≤ −Zα
Result:- As Zcal < −Z.05 = −1.645, reject Ho and conclude that the manufacturer's claim is not correct.
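The one-sample proportion test can be sketched in the same way. The function name `proportion_z` is illustrative; the standard error uses the hypothesized proportion P₀, as in the slide formula.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(successes, n, p0):
    """Z statistic for Ho about a single proportion, using P0*Q0/n under Ho."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

alpha = 0.05
# 18 of 200 pieces were faulty, so 182 conformed to specification.
z_cal = proportion_z(successes=182, n=200, p0=0.95)

# Left-tailed test: the critical value is the lower alpha quantile, -Z_alpha.
z_crit = NormalDist().inv_cdf(alpha)
reject_h0 = z_cal <= z_crit
print(round(z_cal, 2), round(z_crit, 3), reject_h0)
```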

Example:- Out of 500 students from UAF, 400 are in favour of the semester system in the university. Can we conclude that the proportion of students from the university in favour of the semester system is at most 70%?

SAMPLE: n = 500, X = 400, $\hat{p}$ = 400/500 = 0.80
Construction of hypotheses: Ho: P ≤ 0.70, H1: P > 0.70
Level of significance: α = 5%
Test Statistic:

$$Z = \frac{\hat{p} - P_o}{\sqrt{P_o Q_o / n}} = \frac{0.80 - 0.70}{\sqrt{(0.70)(0.30)/500}} = 4.88$$

Decision Rule:- Reject Ho if Zcal > Zα
Result:- As Zcal > Z0.05 = 1.645, we reject Ho and conclude that the proportion is more than 70%.

TEST OF HYPOTHESIS FOR DIFFERENCE BETWEEN POPULATION MEANS

EXAMPLE:- The average salary of 50 workers from Masood textile is Rs. 7,000 with a standard deviation of Rs. 500, and the average salary of 70 workers from Shahzad textile is Rs. 6,800 with a standard deviation of Rs. 300. On the basis of this sample information, can we conclude that Masood textile is paying more to its workers than Shahzad textile? Use the 5% level of significance.

SAMPLE: n1 = 50, n2 = 70, $\bar{X}_1$ = 7000, $\bar{X}_2$ = 6800, S1 = 500, S2 = 300
Construction of hypotheses: Ho: µ1 ≤ µ2, H1: µ1 > µ2
Level of significance: α = 5%
Test Statistic:

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} = \frac{(7000 - 6800) - (0)}{\sqrt{\dfrac{500^2}{50} + \dfrac{300^2}{70}}} = 2.52$$

Decision Rule:- Reject Ho if Zcal ≥ Zα
Result:- As Zcal = 2.52 > Zα = 1.645, reject Ho and conclude that the average salary of all workers in Masood textile is more than the average salary of workers in Shahzad textile.
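The same decision can be reached through the p-value rule described earlier in these notes (reject Ho when the p-value is below α). A minimal sketch for the salary example, with the illustrative helper `two_sample_z`:

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, diff0=0.0):
    """Large-sample Z statistic for Ho about mu1 - mu2 (hypothesized value diff0)."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - diff0) / se

alpha = 0.05
z_cal = two_sample_z(7000, 500, 50, 6800, 300, 70)

# Right-tailed p-value: probability of a Z at least this large when Ho is true.
p_value = 1 - NormalDist().cdf(z_cal)
reject_h0 = p_value < alpha
print(round(z_cal, 2), round(p_value, 4), reject_h0)
```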

Example:-The strength of ropes made out of cotton yarn and coir gave on measurement the following values
Cotton :7.5 5.4
Coir 8.3 6.1 10.6 9.6 9.0 10.4 6.1 6.4 10.2 10.0 7.9 7.9 9.7 7.1 8.5 8.9 7.5 9.7

Test whether there is a significant difference in the strength of the two types of ropes at the 5% level of significance. Assume the population variances are equal.

SAMPLE: n1 = 10, n2 = 10, $\bar{X}_1$ = 8.2, $\bar{X}_2$ = 8.48, S1² = 2.98, S2² = 2.25
Construction of hypotheses: Ho: µ1 = µ2, H1: µ1 ≠ µ2
Level of significance: α = 5%
Pooled variance:

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{26.78 + 20.236}{9 + 9} = 2.612$$

Test Statistic:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(8.2 - 8.48) - (0)}{\sqrt{2.612\left(\dfrac{1}{10} + \dfrac{1}{10}\right)}} = -0.39$$

Decision Rule:- Reject Ho if | tcal | ≥ tα/2(n1+n2−2)
Result:- As | tcal | = 0.39 < t0.025(18) = 2.101, don't reject Ho and conclude that there is no significant difference between the ropes made from cotton yarn and coir.

Example:- Six horses were fed on diet A and five on diet B. The gains in weight for the individual horses were as shown:
Diet A (X1): 30 30 28 38 28 26
Diet B (X2): 40 34 38 32 26
Can we conclude that diet B is better than diet A for increasing weight? Assume the population variances are unequal.

SAMPLE: n1 = 6, n2 = 5, $\bar{X}_1$ = 30, $\bar{X}_2$ = 34, S1² = 17.6, S2² = 30
Construction of hypotheses: Ho: µ1 ≥ µ2, H1: µ1 < µ2
Level of significance: α = 5%
Test Statistic:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} = \frac{(30 - 34) - (0)}{\sqrt{\dfrac{17.6}{6} + \dfrac{30}{5}}} = -1.34$$

$$df = \frac{[w_1 + w_2]^2}{\dfrac{(w_1)^2}{n_1 - 1} + \dfrac{(w_2)^2}{n_2 - 1}} \approx 7$$

where w1 = S1²/n1 = 2.93 and w2 = S2²/n2 = 6.

Decision Rule:- Reject Ho if tcal ≤ −tα(df)
Result:- As tcal = −1.34 > −tα(7) = −1.895, don't reject Ho and conclude that diet B is not better than diet A.
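When the variances are not assumed equal, both the statistic and the approximate degrees of freedom come from w1 and w2. The sketch below recomputes them from the raw weight gains; the critical value −1.895 is the table value quoted in the slides, and rounding the degrees of freedom down to 7 follows the slides as well.

```python
from math import sqrt
from statistics import mean, variance

diet_a = [30, 30, 28, 38, 28, 26]
diet_b = [40, 34, 38, 32, 26]

n1, n2 = len(diet_a), len(diet_b)
w1 = variance(diet_a) / n1   # S1^2 / n1
w2 = variance(diet_b) / n2   # S2^2 / n2

# t statistic with separate (unpooled) variances.
t_cal = (mean(diet_a) - mean(diet_b)) / sqrt(w1 + w2)

# Approximate degrees of freedom, as in the df formula of the example.
df = (w1 + w2) ** 2 / (w1 ** 2 / (n1 - 1) + w2 ** 2 / (n2 - 1))

t_crit = -1.895              # -t_{0.05}(7), taken from the slides
reject_h0 = t_cal <= t_crit  # left-tailed test
print(round(t_cal, 2), round(df, 2), reject_h0)
```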

Test of hypothesis for comparing two means of NORMAL POPULATIONS

| Ho | H1 | Population variance / sample size | Test statistic | Decision rule: reject Ho if |
|---|---|---|---|---|
| $\mu_1 - \mu_2 \le \mu_d$ | $\mu_1 - \mu_2 > \mu_d$ | Pop. variances known; samples are independent | $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - \mu_d}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$ | Zcal > Zα |
| $\mu_1 - \mu_2 \ge \mu_d$ | $\mu_1 - \mu_2 < \mu_d$ | Pop. variances unknown; samples are large; samples are independent | $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - \mu_d}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$ | Zcal < −Zα |
| $\mu_1 - \mu_2 = \mu_d$ | $\mu_1 - \mu_2 \ne \mu_d$ | Pop. variances unknown and equal; samples are small; samples are independent | $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - \mu_d}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$ | tcal > tα/2(df) OR tcal < −tα/2(df), df = n1 + n2 − 2 |
| $\mu_1 - \mu_2 = \mu_d$ | $\mu_1 - \mu_2 \ne \mu_d$ | Samples are not independent (samples are paired) | $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - \mu_d}{\sqrt{S_d^2/n}}$ | tcal > tα/2(n−1) OR tcal < −tα/2(n−1) |

TEST OF HYPOTHESIS FOR DIFFERENCE BETWEEN POPULATION PROPORTIONS

A sample of 150 light bulbs produced by company A showed 12 defective bulbs, while a sample of 100 light bulbs produced by company B showed 4 defective bulbs. Is there a significant difference between the proportions of defective bulbs produced by the two companies at the 5% level of significance?
Let X1 = number of defective bulbs in the sample from company A = 12
Let X2 = number of defective bulbs in the sample from company B = 4

SAMPLE: n1 = 150, n2 = 100, X1 = 12, X2 = 4, $\hat{p}_1$ = 0.08, $\hat{p}_2$ = 0.04
Construction of hypotheses: Ho: P1 = P2, H1: P1 ≠ P2
Level of significance: α = 5%
Pooled proportion:

$$\hat{p}_c = \frac{X_1 + X_2}{n_1 + n_2} = \frac{12 + 4}{150 + 100} = 0.064, \qquad \hat{q}_c = 1 - 0.064 = 0.936$$

Test Statistic:

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\hat{p}_c \hat{q}_c\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(0.08 - 0.04) - (0)}{\sqrt{(0.064)(0.936)\left(\dfrac{1}{150} + \dfrac{1}{100}\right)}} = 1.27$$

Decision Rule:- Reject Ho if | Zcal | ≥ Zα/2
Result:- As | Zcal | = 1.27 < Zα/2 = 1.96, we don't reject Ho.
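A short sketch of the pooled two-proportion test, using the counts from the light-bulb example. The function name `two_proportion_z` is illustrative; the pooled estimate is used in the standard error because Ho states that the two proportions are equal.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for Ho: P1 = P2, with the pooled proportion in the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p_c = (x1 + x2) / (n1 + n2)   # pooled proportion
    q_c = 1 - p_c
    return (p1 - p2) / sqrt(p_c * q_c * (1 / n1 + 1 / n2))

alpha = 0.05
z_cal = two_proportion_z(12, 150, 4, 100)

# Two-tailed test: compare |Z| with Z_{alpha/2}.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z_cal) >= z_crit
print(round(z_cal, 2), round(z_crit, 2), reject_h0)
```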

TEST OF HYPOTHESIS FOR DIFFERENCE BETWEEN POPULATION PROPORTIONS
A machine puts out 16 imperfect articles in a sample of 500. After the machine is overhauled, it puts out 97 perfect articles in a sample of 100. Has the machine been improved?
Let X1 = number of imperfect articles in the sample before the machine is overhauled = 16
Let X2 = number of imperfect articles in the sample after the machine is overhauled = 100 − 97 = 3

SAMPLE: n1 = 500, n2 = 100, X1 = 16, X2 = 3, $\hat{p}_1$ = 0.032, $\hat{p}_2$ = 0.03
Construction of hypotheses: Ho: P1 ≤ P2, H1: P1 > P2
Level of significance: α = 5%
Pooled proportion:

$$\hat{p}_c = \frac{X_1 + X_2}{n_1 + n_2} = \frac{16 + 3}{500 + 100} = 0.0317, \qquad \hat{q}_c = 1 - 0.0317 = 0.968$$

Test Statistic:

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\hat{p}_c \hat{q}_c\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(0.032 - 0.03) - (0)}{\sqrt{(0.0317)(0.968)\left(\dfrac{1}{500} + \dfrac{1}{100}\right)}} = 0.10$$

Decision Rule:- Reject Ho if Zcal ≥ Zα
Result:- As Zcal < Z0.05 = 1.645, we don't reject Ho and conclude that the machine has not been improved.

Estimating an unknown POPULATION PARAMETER
► Point Estimate (single value)
► Interval Estimate (range of values)

INTERVAL ESTIMATE
An interval estimation for population parameter is a rule for determining an interval in which the parameter is likely to fall. The corresponding estimate is called interval estimate. Usually a probability of some confidence is attached with the interval estimate when it is formed.

Example:- A researcher wishes to estimate the average amount of money that a student from the university spends on food per day. A random sample of 36 students is selected and the sample mean is found to be Rs. 45 with a standard deviation of Rs. 3. Estimate 90% confidence limits for the average amount of money that the students from the university spend on food per day.

SAMPLE: n = 36, $\bar{X}$ = 45, S = 3; 90% C.I, α = 10%

$$\bar{X} \pm Z_{\alpha/2}\frac{S}{\sqrt{n}} = 45 \pm (1.645)\frac{3}{\sqrt{36}} = 45 \pm (1.645)(0.5)$$

(44.18, 45.82)
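A large-sample confidence interval for a mean can be sketched as follows. The helper name `mean_confidence_interval` is illustrative, and the interval uses the normal quantile, which matches the slides only because n = 36 is treated as a large sample.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(x_bar, sd, n, confidence):
    """Return (lower, upper) limits x_bar -/+ Z_{alpha/2} * sd / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sd / sqrt(n)
    return x_bar - margin, x_bar + margin

lower, upper = mean_confidence_interval(x_bar=45, sd=3, n=36, confidence=0.90)
print(round(lower, 2), round(upper, 2))
```

Calling the same function with a higher confidence level or a smaller n widens the interval, which is the behaviour described in the interpretation notes that follow.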

Interpretation of Confidence Interval

[Figure: some 95% C.Is for the population mean µ]

If we construct 100 C.Is, one from each sample of the same size, then about 95 of these C.Is are expected to contain the unknown parameter and about 5 may not contain it.

The sample mean is at the center of the interval estimate. The width W of the interval, i.e. the distance between the end points, is U.L − L.L, so W is determined by the probability content, the S.D, and the sample size. Therefore the following results hold.
► For a given probability content and S.D, the bigger the sample size n, the narrower the confidence interval for the population mean.
► For a given probability content and sample size, the smaller the standard deviation, the narrower the confidence interval for the population mean.
► For a given S.D and sample size, the smaller the probability content (1 − α), the narrower the confidence interval for the population mean.

Example:- The following data represent the daily milk production of a random sample of 10 cows from a particular breed: 12, 15, 11, 13, 16, 19, 15, 16, 18, 15. Construct a 90% C.I for the average milk production of all the cows of that particular breed.

SAMPLE: n = 10, $\bar{X}$ = 15, S² = 56/9 = 6.22; 90% C.I, α = 10%

$$\bar{X} \pm t_{\alpha/2}(n-1)\sqrt{\frac{S^2}{n}} = 15 \pm (1.833)\sqrt{\frac{6.22}{10}} = 15 \pm (1.833)(0.789)$$

(13.55, 16.45)

Example:- In a sample of 500 individuals in a certain area, 41 were found to be unemployed. Compute a 99% C.I for the rate of unemployment in that area.

SAMPLE: n = 500, X = 41, $\hat{p}$ = 41/500 = 0.082; 99% C.I, α = 1%

$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.082 \pm Z_{.005}\sqrt{\frac{(0.082)(0.918)}{500}} = 0.082 \pm (2.58)(0.0123)$$

(0.05, 0.114) OR (5%, 11.4%)

Example:- A test in Statistics was given to 50 girls and 75 boys. The girls made an average grade of 76 with a standard deviation of 6, while the boys made an average grade of 82 with a standard deviation of 8. Find a 96% confidence interval for the difference µ1 − µ2, where µ1 is the mean of all boys and µ2 is the mean of all girls who might take this test.

SAMPLE: n1 = 75, n2 = 50, $\bar{X}_1$ = 82, $\bar{X}_2$ = 76, S1 = 8, S2 = 6; 96% C.I, α = 4%

$$(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} = 6 \pm (2.054)(1.254)$$

(3.42, 8.58)

Example:- A random sample of 20 plants from Variety I showed a mean height of 63 cm with a standard deviation of 6 cm, while another random sample of 25 plants from Variety II showed a mean height of 60 cm with a standard deviation of 2 cm. Construct a 90% confidence interval for the difference between the two variety means. (Assume population variances are unequal.)

SAMPLE: n1 = 20, n2 = 25, $\bar{X}_1$ = 63, $\bar{X}_2$ = 60, S1 = 6, S2 = 2; 90% C.I, α = 10%

$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}(df)\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$

$$df = \frac{[w_1 + w_2]^2}{\dfrac{(w_1)^2}{n_1 - 1} + \dfrac{(w_2)^2}{n_2 - 1}} = 22.4 \approx 22$$

where w1 = S1²/n1 = 1.8 and w2 = S2²/n2 = 0.16.

$$(63 - 60) \pm t_{0.05}(22)\sqrt{\frac{6^2}{20} + \frac{2^2}{25}} = 3 \pm (1.717)(1.4)$$

(0.596, 5.404)

Determination of sample size for estimating population mean
Example:- The standard deviation of the lifetimes of tubes is estimated as 100 hours. How large a sample should one take in order to be
1) 95% confident that the error in the estimated mean lifetime will not exceed 20 hours
2) 90% confident that the error in the estimated mean lifetime will not exceed 20 hours
3) 95% confident that the error in the estimated mean lifetime will not exceed 10 hours
4) 99% confident that the error in the estimated mean lifetime will not exceed 20 hours
5) 95% confident that the width of the confidence interval is 30 hours

Solution: the required sample size is

$$n = \frac{(Z_{\alpha/2})^2 \sigma^2}{e^2}$$

rounded up to the next whole number so that the error bound is met.

1) 1 − α = 95%, α = 5%, e = 20, σ = 100:
$$n = \frac{(1.96)^2 (100)^2}{(20)^2} = 96.04, \text{ so } n = 97$$

2) 1 − α = 90%, α = 10%, e = 20, σ = 100:
$$n = \frac{(1.645)^2 (100)^2}{(20)^2} = 67.65, \text{ so } n = 68$$

3) 1 − α = 95%, α = 5%, e = 10, σ = 100:
$$n = \frac{(1.96)^2 (100)^2}{(10)^2} = 384.16, \text{ so } n = 385$$

4) 1 − α = 99%, α = 1%, e = 20, σ = 100:
$$n = \frac{(2.58)^2 (100)^2}{(20)^2} = 166.41, \text{ so } n = 167$$
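The sample-size formula is easy to wrap in a small helper. This sketch takes the tabulated Z values used in the solution (1.96, 1.645, 2.58) as inputs and rounds up, since a fractional sample size must be rounded up for the stated error bound to hold. For part 5, reading "width 30 hours" as a maximum error of e = 15 is an assumption, because the slides give no worked solution for it.

```python
from math import ceil

def sample_size(z, sigma, max_error):
    """Smallest whole n with z * sigma / sqrt(n) <= max_error."""
    return ceil((z * sigma / max_error) ** 2)

sigma = 100  # estimated standard deviation of tube lifetimes, in hours

n1 = sample_size(1.96, sigma, 20)   # part 1: 95% confidence, e = 20
n2 = sample_size(1.645, sigma, 20)  # part 2: 90% confidence, e = 20
n3 = sample_size(1.96, sigma, 10)   # part 3: 95% confidence, e = 10
n4 = sample_size(2.58, sigma, 20)   # part 4: 99% confidence, e = 20
n5 = sample_size(1.96, sigma, 15)   # part 5: width 30 read as e = 15 (assumption)

print(n1, n2, n3, n4, n5)
```

Halving the allowed error (part 3 against part 1) roughly quadruples the required sample size, because e enters the formula squared.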

How large the sample should be depends on the following factors
► How precise do we want the confidence interval estimate to be? (Margin of error, i.e. width of the confidence interval)
► How confident do we want to be that the interval estimate is correct? (Level of confidence)
► How variable is the population being sampled? (Variance of the population)

The greater the desired level of confidence, the larger the sample
The smaller the error, the larger the sample
The greater the variation in the population, the larger the sample

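Population variances known?
► Yes:
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
► No: Sample sizes large?
   ► Yes:
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
   ► No: Population variances equal?
      ► Yes:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
      ► No:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$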

Mr. Fahid is the training manager of a light engineering firm which employs a considerable number of skilled machine operators. The firm is constantly making efforts to improve the quality of its product and so recently Mr. Fahid has introduced a new refresher training course for workers who have been on the same machine for a long time. The first group has now completed the course and returned to normal work and Mr. Fahid would like to assess the effect if any which the training has had upon the standard of its work so that he can decide whether to make such courses a regular event
Category Under 35 35-50 Over 50 Sub Total Improved 17 (21x40)/60=(14) 17 (24x40)/60=(16) 6 (15x40)/60=(10) 40 Did not improved 4 (21x20)/60=(7) 7 (24x20)/60=(8) 9 (15x20)/60=(05) 20 Sub Total 21 24 15 60

1) Construction of hypotheses Ho : Same rate of improvement in all three age groups OR No association (independence) between age group and improvement H1: Rate of improvement is not same in all age groups OR The two attributes (age group and improvement) are not independent

2) Test statistic

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 6.92$$

| O | E | (O − E)²/E |
|---|---|---|
| 17 | 14 | 0.643 |
| 17 | 16 | 0.063 |
| 6 | 10 | 1.600 |
| 4 | 7 | 1.286 |
| 7 | 8 | 0.125 |
| 9 | 5 | 3.200 |
| Total: 60 | 60 | 6.92 |

Decision rule: Reject H0 if χ²cal ≥ χ²α[(r − 1)(c − 1)].
Result: As χ²cal = 6.92 > χ².05(2) = 5.99, we reject H0 and conclude that the two attributes are not independent, i.e. the improvement rate differs between age groups.
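The chi-square calculation for the training example can be reproduced in a few lines of Python. This is a minimal sketch using only the observed counts from the contingency table; the variable names are illustrative, and the critical value 5.99 is taken from the example rather than computed.

```python
# Observed counts: rows = outcome, columns = age group (under 35, 35-50, over 50).
observed = [
    [17, 17, 6],  # improved
    [4, 7, 9],    # did not improve
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(col_totals) - 1)

critical_value = 5.99  # chi-square(0.05) with 2 d.f., as quoted in the example
reject_h0 = chi_square >= critical_value

print(round(chi_square, 2), df, reject_h0)
```

The same loop works for any r × c table, since the degrees of freedom and expected counts are derived from the table's shape and margins.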

Comparing more than two population means
We could apply the two-sample t-test repeatedly to test the equality of more than two population means, but this procedure:
• requires a large number of two-sample t-tests;
• tends to inflate the overall α risk, because every test is performed at level α.

For example, to test the equality of 10 population means we have to perform 45 t-tests. If each test uses α = 0.05, the expected number of false rejections is 45(0.05) = 2.25, and if the tests are independent the probability of at least one Type I error is 1 − (0.95)^45 ≈ 0.90. We therefore require a procedure for testing the equality of several population means simultaneously.
• ANOVA uses the F-distribution to yield a single test statistic for comparing all the means, so that the overall risk of a Type I error is controlled.
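The arithmetic behind the 10-means example can be sketched as follows. The product 45 × 0.05 = 2.25 is best read as the expected number of false rejections; the chance of at least one false rejection is a probability and is computed here under the assumption that the 45 tests are independent. Variable names are illustrative.

```python
import math

k = 10                     # number of population means
alpha = 0.05               # significance level of each pairwise t-test
n_tests = math.comb(k, 2)  # number of distinct pairs of means

expected_false_rejections = n_tests * alpha
# Probability of at least one Type I error, assuming independent tests.
familywise_error = 1 - (1 - alpha) ** n_tests

print(n_tests, round(expected_false_rejections, 2), round(familywise_error, 2))
```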


Analysis of Variance (ANOVA)
Analysis of Variance is a procedure that partitions the total variability present in a data set into meaningful and distinct components. Each component represents the variation due to a recognized source of variation; in addition, one component represents the variation due to uncontrolled factors and random errors associated with the response measurements.

Assumptions:
• NORMALITY: The k populations from which the samples are drawn should be normal.
• INDEPENDENCE: The k samples should be independent.
• RANDOMNESS: The k samples should be random.
• HOMOSCEDASTICITY (common variance): The k populations should have a common variance.

One-Way ANOVA
Four groups of students (all with approximately the same attributes) were subjected to different teaching techniques and tested at the end of a specified period of time. Because of dropouts in the experimental groups (sickness, transfers, etc.), the number of students varied from group to group.

| Method | Scores | Total |
|---|---|---|
| Method 1 | 65, 87, 73, 79, 81, 69 | 454 |
| Method 2 | 75, 69, 83, 81, 72, 79, 90 | 549 |
| Method 3 | 59, 78, 67, 62, 83, 76 | 425 |
| Method 4 | 94, 89, 80, 88 | 351 |
| Grand total | | 1779 |

Do the data provide sufficient evidence to indicate a difference in the mean achievements for the 4 teaching techniques?


Graphical view of data

Construction of hypotheses
H0: µ1 = µ2 = µ3 = µ4 (i.e. the mean achievements from the 4 methods are the same)
H1: At least two µ's are different

Level of significance
α = 5%

Test statistic

$$F = \frac{S_b^2}{S_w^2}$$

ANOVA TABLE

| Source of Variation (S.O.V) | Degrees of Freedom (DF) | Sum of Squares (SS) | Mean Sum of Squares (MSS = SS/DF) | Fcal |
|---|---|---|---|---|
| Between Methods | 4 − 1 = 3 | 712.6 | 237.5 (S²b) | 3.77* |
| Within Methods (Error) | 22 − 3 = 19 | 1196.6 | 63.0 (S²w, MSE) | |
| TOTAL | 23 − 1 = 22 | 1909.2 | | |

CALCULATION FOR ANOVA TABLE

Correction Factor (CF) = (G.T.)²/(number of observations) = (1779)²/23 = 137601.8

Total SS = (65)² + (87)² + … + (88)² − CF = 139511 − 137601.8 = 1909.2

$$\text{Between Methods SS} = \frac{(454)^2}{6} + \frac{(549)^2}{7} + \frac{(425)^2}{6} + \frac{(351)^2}{4} - CF = 138314.4 - 137601.8 = 712.6$$

Within Methods SS = Total SS − Between Methods SS = 1909.2 − 712.6 = 1196.6

| SOV | DF | SS | MSS = SS/DF | Fcal |
|---|---|---|---|---|
| Between Methods | 4 − 1 = 3 | 712.6 | 237.5 | 237.5/63 = 3.77 |
| Within Methods | 19 | 1196.6 | 63 | |
| TOTAL | 23 − 1 = 22 | 1909.2 | | |

Decision rule: Reject H0 if Fcal ≥ Fα(3,19).
Result: As Fcal = 3.77 > F.05(3,19) = 3.13, we reject H0 and conclude that there is a difference in the mean achievements for the four teaching methods.
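The one-way ANOVA arithmetic above (correction factor, sums of squares, mean squares, F) can be reproduced with plain Python. This is a minimal sketch built from the scores in the example; the variable names are illustrative, and the comparison with the tabulated F value is left to the reader.

```python
groups = {
    "Method 1": [65, 87, 73, 79, 81, 69],
    "Method 2": [75, 69, 83, 81, 72, 79, 90],
    "Method 3": [59, 78, 67, 62, 83, 76],
    "Method 4": [94, 89, 80, 88],
}

all_scores = [x for scores in groups.values() for x in scores]
n_total = len(all_scores)  # 23 observations
k = len(groups)            # 4 methods

# Correction factor: (grand total)^2 / number of observations.
correction_factor = sum(all_scores) ** 2 / n_total

total_ss = sum(x ** 2 for x in all_scores) - correction_factor
between_ss = sum(sum(s) ** 2 / len(s) for s in groups.values()) - correction_factor
within_ss = total_ss - between_ss

ms_between = between_ss / (k - 1)       # S_b^2
ms_within = within_ss / (n_total - k)   # S_w^2 (MSE)
f_cal = ms_between / ms_within

print(round(total_ss, 1), round(between_ss, 1), round(within_ss, 1), round(f_cal, 2))
```

Because the group sizes are read from the data, the same code handles unequal sample sizes such as the 6, 7, 6 and 4 students here.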

Two-Way ANOVA
The Black Rock candy company was planning a test of three new candy flavors (A, B, C). In the test the company also wished to measure the effect of three different retail price levels (79 cents, 89 cents, 99 cents). Because each flavor was to be tested at each price, a total of nine different flavor-price combinations were to be tested. The following data represent the number of candies sold (in hundreds).

| Price (cents) | A | B | C | Total |
|---|---|---|---|---|
| 79 | 8 | 13 | 5 | 26 |
| 89 | 4 | 18 | 6 | 28 |
| 99 | 4 | 22 | 10 | 36 |
| Total | 16 | 53 | 21 | 90 |

Do the data provide sufficient evidence to indicate a difference in mean sales among the flavors and among the prices?


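Construction of hypotheses
H0: µ1 = µ2 = µ3 (i.e. all the flavors have equal sales)
H1: At least two µ's are different
H′0: µ′1 = µ′2 = µ′3 (i.e. all the prices have equal sales)
H′1: At least two µ′'s are different

Test statistics

$$F = \frac{S_1^2}{S_3^2} \ \text{to test } H_0, \qquad F' = \frac{S_2^2}{S_3^2} \ \text{to test } H'_0$$

where S²1, S²2 and S²3 are the mean sums of squares for flavors, prices and error respectively.

ANOVA TABLE

| Source of Variation (S.O.V) | Degrees of Freedom (DF) | Sum of Squares (SS) | Mean Sum of Squares (MSS = SS/DF) | Fcal |
|---|---|---|---|---|
| Between Flavours | 3 − 1 = 2 | 268.67 | 134.33 | 11.51 |
| Between Price | 3 − 1 = 2 | 18.67 | 9.33 | 0.8 |
| Error | 8 − 2 − 2 = 4 | 46.67 | 11.67 | |
| TOTAL | 9 − 1 = 8 | 334 | | |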

CALCULATION FOR ANOVA TABLE

C.F. = (G.T.)²/(number of observations) = (90)²/9 = 900

Total SS = (8)² + (4)² + … + (10)² − CF = 1234 − 900 = 334

$$\text{Flavors SS} = \frac{(16)^2}{3} + \frac{(53)^2}{3} + \frac{(21)^2}{3} - CF = 1168.67 - 900 = 268.67$$

$$\text{Price SS} = \frac{(26)^2}{3} + \frac{(28)^2}{3} + \frac{(36)^2}{3} - CF = 918.67 - 900 = 18.67$$

Error SS = Total SS − Flavors SS − Price SS = 334 − 268.67 − 18.67 = 46.67
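ANOVA TABLE

| Source of Variation (S.O.V) | Degrees of Freedom (DF) | Sum of Squares (SS) | Mean Sum of Squares (MSS = SS/DF) | Fcal |
|---|---|---|---|---|
| Bet. Flavours | 3 − 1 = 2 | 268.67 | 134.33 | 11.51* |
| Bet. Price | 3 − 1 = 2 | 18.67 | 9.33 | 0.8ns |
| Error | 8 − 2 − 2 = 4 | 46.67 | 11.67 | |
| TOTAL | 9 − 1 = 8 | 334 | | |

(* significant at the 5% level; ns = not significant)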

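Decision rule:
Reject H0 if Fcal ≥ Fα(2,4) = 6.94
Reject H′0 if F′cal ≥ Fα(2,4) = 6.94

Result: As Fcal = 11.51 > 6.94, we reject H0 and conclude that mean sales differ among the flavors. As F′cal = 0.8 < 6.94, we do not reject H′0; the data do not provide sufficient evidence of a difference in mean sales among the price levels.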
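The two-way ANOVA calculation for the candy example (one observation per flavor-price cell) can be sketched in Python as below. The layout of the `sales` dictionary and the variable names are illustrative; the critical value 6.94 is the tabulated F.05(2,4) quoted in the decision rule.

```python
# Sales (in hundreds) by flavor; list positions are prices 79, 89, 99 cents.
sales = {
    "A": [8, 4, 4],
    "B": [13, 18, 22],
    "C": [5, 6, 10],
}

rows = list(sales.values())
n_flavors, n_prices = len(rows), len(rows[0])

grand_total = sum(sum(row) for row in rows)
cf = grand_total ** 2 / (n_flavors * n_prices)  # correction factor

total_ss = sum(x ** 2 for row in rows for x in row) - cf
flavor_ss = sum(sum(row) ** 2 for row in rows) / n_prices - cf
price_ss = sum(sum(col) ** 2 for col in zip(*rows)) / n_flavors - cf
error_ss = total_ss - flavor_ss - price_ss

df_flavor, df_price = n_flavors - 1, n_prices - 1
df_error = df_flavor * df_price

ms_error = error_ss / df_error
f_flavor = (flavor_ss / df_flavor) / ms_error  # tests H0 (flavors)
f_price = (price_ss / df_price) / ms_error     # tests H'0 (prices)

critical_value = 6.94  # F(0.05; 2, 4) from the F table
print(round(f_flavor, 2), f_flavor >= critical_value)
print(round(f_price, 2), f_price >= critical_value)
```

With a single observation per cell there is no separate interaction term, so the residual after removing flavor and price effects serves as the error sum of squares, as in the calculation above.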


Effect of Degrees of Freedom on the t-distribution

• The shape of the t distribution depends on the degrees of freedom.
• As the number of degrees of freedom increases, the spread of the t distribution decreases and the t curve approaches the standard normal curve.
• For approximately n ≥ 30, the t and standard normal distributions are nearly the same.
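The convergence of t to the standard normal can be seen by comparing upper 2.5% points. The sketch below takes a handful of t values from the Student's t table in these notes (they are copied, not computed) and subtracts the corresponding normal point obtained from Python's standard library; the selection of degrees of freedom is arbitrary.

```python
from statistics import NormalDist

# Upper 2.5% point of the standard normal distribution.
z_025 = NormalDist().inv_cdf(0.975)

# Upper 2.5% points of Student's t, copied from the t table in these notes.
t_025 = {1: 12.706, 5: 2.571, 10: 2.228, 30: 2.042, 60: 2.000, 120: 1.980}

# Gap between the t and normal critical values at each d.f.
gaps = {df: round(t - z_025, 3) for df, t in t_025.items()}

for df, gap in gaps.items():
    print(f"d.f. = {df:>3}: t - z = {gap}")
```

The gap shrinks steadily as the degrees of freedom grow, which is why the normal table is often used in place of the t table for large samples.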

Z-TABLE

| α (upper-tail area) | 0.005 | 0.01 | 0.025 | 0.05 | 0.1 |
|---|---|---|---|---|---|
| z | 2.58 | 2.33 | 1.96 | 1.64 | 1.28 |
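The entries of the Z-table are upper-tail percentage points of the standard normal distribution and can be regenerated with Python's standard library. This is a small sketch; `NormalDist.inv_cdf` returns the value below which a given probability lies, so the upper-tail point for α is taken at 1 − α.

```python
from statistics import NormalDist

alphas = [0.005, 0.01, 0.025, 0.05, 0.1]

# z such that P(Z > z) = alpha for a standard normal Z.
z_values = {a: NormalDist().inv_cdf(1 - a) for a in alphas}

for a, z in z_values.items():
    print(f"alpha = {a}: z = {z:.3f}")
```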


PERCENTAGE POINTS OF STUDENT'S t-DISTRIBUTION

| d.f. | α = 0.250 | α = 0.100 | α = 0.050 | α = 0.025 | α = 0.010 | α = 0.005 |
|---|---|---|---|---|---|---|
| 1 | 1.000 | 3.078 | 6.314 | 12.706 | 31.821 | 63.657 |
| 2 | 0.816 | 1.886 | 2.920 | 4.303 | 6.965 | 9.925 |
| 3 | 0.765 | 1.638 | 2.353 | 3.182 | 4.541 | 5.841 |
| 4 | 0.741 | 1.533 | 2.132 | 2.776 | 3.747 | 4.604 |
| 5 | 0.727 | 1.476 | 2.015 | 2.571 | 3.365 | 4.032 |
| 6 | 0.718 | 1.440 | 1.943 | 2.447 | 3.143 | 3.707 |
| 7 | 0.711 | 1.415 | 1.895 | 2.365 | 2.998 | 3.499 |
| 8 | 0.706 | 1.397 | 1.860 | 2.306 | 2.896 | 3.355 |
| 9 | 0.703 | 1.383 | 1.833 | 2.262 | 2.821 | 3.250 |
| 10 | 0.700 | 1.372 | 1.812 | 2.228 | 2.764 | 3.169 |
| 11 | 0.697 | 1.363 | 1.796 | 2.201 | 2.718 | 3.106 |
| 12 | 0.695 | 1.356 | 1.782 | 2.179 | 2.681 | 3.055 |
| 13 | 0.694 | 1.350 | 1.771 | 2.160 | 2.650 | 3.012 |
| 14 | 0.692 | 1.345 | 1.761 | 2.145 | 2.624 | 2.977 |
| 15 | 0.691 | 1.341 | 1.753 | 2.131 | 2.602 | 2.947 |
| 16 | 0.690 | 1.337 | 1.746 | 2.120 | 2.583 | 2.921 |
| 17 | 0.689 | 1.333 | 1.740 | 2.110 | 2.567 | 2.898 |
| 18 | 0.688 | 1.330 | 1.734 | 2.101 | 2.552 | 2.878 |
| 19 | 0.688 | 1.328 | 1.729 | 2.093 | 2.539 | 2.861 |
| 20 | 0.687 | 1.325 | 1.725 | 2.086 | 2.528 | 2.845 |
| 21 | 0.686 | 1.323 | 1.721 | 2.080 | 2.518 | 2.831 |
| 22 | 0.686 | 1.321 | 1.717 | 2.074 | 2.508 | 2.819 |
| 23 | 0.685 | 1.319 | 1.714 | 2.069 | 2.500 | 2.807 |
| 24 | 0.685 | 1.318 | 1.711 | 2.064 | 2.492 | 2.797 |
| 25 | 0.684 | 1.316 | 1.708 | 2.060 | 2.485 | 2.787 |
| 26 | 0.684 | 1.315 | 1.706 | 2.056 | 2.479 | 2.779 |
| 27 | 0.684 | 1.314 | 1.703 | 2.052 | 2.473 | 2.771 |
| 28 | 0.683 | 1.313 | 1.701 | 2.048 | 2.467 | 2.763 |
| 29 | 0.683 | 1.311 | 1.699 | 2.045 | 2.462 | 2.756 |
| 30 | 0.683 | 1.310 | 1.697 | 2.042 | 2.457 | 2.750 |
| 40 | 0.681 | 1.303 | 1.684 | 2.021 | 2.423 | 2.704 |
| 60 | 0.679 | 1.296 | 1.671 | 2.000 | 2.390 | 2.660 |
| 120 | 0.677 | 1.289 | 1.658 | 1.980 | 2.358 | 2.617 |

5 PERCENT POINTS OF F DISTRIBUTION

V1 = numerator degrees of freedom (columns); V2 = denominator degrees of freedom (rows)

| V2 \ V1 | 1 | 2 | 3 | 4 | 5 | 6 | 12 | 24 |
|---|---|---|---|---|---|---|---|---|
| 1 | 161.448 | 199.500 | 215.707 | 224.583 | 230.162 | 233.986 | 243.906 | 249.052 |
| 2 | 18.513 | 19.000 | 19.164 | 19.247 | 19.296 | 19.330 | 19.413 | 19.454 |
| 3 | 10.128 | 9.552 | 9.277 | 9.117 | 9.013 | 8.941 | 8.745 | 8.639 |
| 4 | 7.709 | 6.944 | 6.591 | 6.388 | 6.256 | 6.163 | 5.912 | 5.774 |
| 5 | 6.608 | 5.786 | 5.409 | 5.192 | 5.050 | 4.950 | 4.678 | 4.527 |
| 6 | 5.987 | 5.143 | 4.757 | 4.534 | 4.387 | 4.284 | 4.000 | 3.841 |
| 7 | 5.591 | 4.737 | 4.347 | 4.120 | 3.972 | 3.866 | 3.575 | 3.410 |
| 8 | 5.318 | 4.459 | 4.066 | 3.838 | 3.687 | 3.581 | 3.284 | 3.115 |
| 9 | 5.117 | 4.256 | 3.863 | 3.633 | 3.482 | 3.374 | 3.073 | 2.900 |
| 10 | 4.965 | 4.103 | 3.708 | 3.478 | 3.326 | 3.217 | 2.913 | 2.737 |
| 11 | 4.844 | 3.982 | 3.587 | 3.357 | 3.204 | 3.095 | 2.788 | 2.609 |
| 12 | 4.747 | 3.885 | 3.490 | 3.259 | 3.106 | 2.996 | 2.687 | 2.505 |
| 13 | 4.667 | 3.806 | 3.411 | 3.179 | 3.025 | 2.915 | 2.604 | 2.420 |
| 14 | 4.600 | 3.739 | 3.344 | 3.112 | 2.958 | 2.848 | 2.534 | 2.349 |
| 15 | 4.543 | 3.682 | 3.287 | 3.056 | 2.901 | 2.790 | 2.475 | 2.288 |
| 16 | 4.494 | 3.634 | 3.239 | 3.007 | 2.852 | 2.741 | 2.425 | 2.235 |
| 17 | 4.451 | 3.592 | 3.197 | 2.965 | 2.810 | 2.699 | 2.381 | 2.190 |
| 18 | 4.414 | 3.555 | 3.160 | 2.928 | 2.773 | 2.661 | 2.342 | 2.150 |
| 19 | 4.381 | 3.522 | 3.127 | 2.895 | 2.740 | 2.628 | 2.308 | 2.114 |
| 20 | 4.351 | 3.493 | 3.098 | 2.866 | 2.711 | 2.599 | 2.278 | 2.082 |
| 21 | 4.325 | 3.467 | 3.072 | 2.840 | 2.685 | 2.573 | 2.250 | 2.054 |
| 22 | 4.301 | 3.443 | 3.049 | 2.817 | 2.661 | 2.549 | 2.226 | 2.028 |
| 23 | 4.279 | 3.422 | 3.028 | 2.796 | 2.640 | 2.528 | 2.204 | 2.005 |
| 24 | 4.260 | 3.403 | 3.009 | 2.776 | 2.621 | 2.508 | 2.183 | 1.984 |
| 25 | 4.242 | 3.385 | 2.991 | 2.759 | 2.603 | 2.490 | 2.165 | 1.964 |
| 26 | 4.225 | 3.369 | 2.975 | 2.743 | 2.587 | 2.474 | 2.148 | 1.946 |
| 27 | 4.210 | 3.354 | 2.960 | 2.728 | 2.572 | 2.459 | 2.132 | 1.930 |
| 28 | 4.196 | 3.340 | 2.947 | 2.714 | 2.558 | 2.445 | 2.118 | 1.915 |
| 29 | 4.183 | 3.328 | 2.934 | 2.701 | 2.545 | 2.432 | 2.104 | 1.901 |
| 30 | 4.171 | 3.316 | 2.922 | 2.690 | 2.534 | 2.421 | 2.092 | 1.887 |
| 40 | 4.085 | 3.232 | 2.839 | 2.606 | 2.449 | 2.336 | 2.003 | 1.793 |
| 60 | 4.001 | 3.150 | 2.758 | 2.525 | 2.368 | 2.254 | 1.917 | 1.700 |
| 120 | 3.920 | 3.072 | 2.680 | 2.447 | 2.290 | 2.175 | 1.834 | 1.608 |
