
SAMPLING THEORY

SAMPLING:

Every statistical investigation aims at collecting information about some collection of individuals or of their attributes. In statistical language such a collection is called a population.
Eg:- Products turned out by a machine; lives of electric bulbs manufactured by a company.
A population is finite or infinite according as the number of its elements is finite or infinite. In most situations the population may be considered infinitely large. A finite subset of a population is called a “SAMPLE”, and the process of selecting such samples is called “SAMPLING”.

PARAMETERS AND STATISTICS:

In a statistical investigation our ultimate interest generally lies in one or more characteristics possessed by the population, that is, in statistical measures such as the mean and variance of the population.

Statistical measures calculated on the basis of population values are called “Parameters”. The corresponding measures computed on the basis of sample values are called “Statistics”.

SAMPLING DISTRIBUTION:

Consider all possible samples of size ‘n’ which can be drawn at random from a given population, and for each sample compute the sample mean. The means of the samples will not be identical. The distribution so formed is called the “Sampling Distribution” of means. Similarly we can have the sampling distribution of standard deviations etc. In general, such a distribution is called the sampling distribution of a statistic.

STANDARD ERROR:

The standard deviation of the sampling distribution of a statistic is called its “Standard Error”. The standard error is used to assess the difference between expected and observed values. Thus the standard deviation of the sampling distribution of means is called the standard error of means.

If the number of elements in a sample ‘n’ is greater than or equal to 30, the sample is called a Large Sample; otherwise it is a Small Sample. The sampling distribution of a statistic for large samples is very nearly a normal distribution.
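The ideas of sampling distribution and standard error can be illustrated on a tiny example. The sketch below is only an illustration under stated assumptions: the five-element population is hypothetical, samples are drawn without replacement, and the finite-population correction factor used for comparison is a standard result that these notes do not state.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

# Hypothetical finite population (not taken from the notes).
population = [2, 4, 6, 8, 10]
n = 2  # sample size

# Sampling distribution of the mean: one mean per possible sample of size n.
sample_means = [mean(sample) for sample in combinations(population, n)]

mu = mean(population)       # population mean (a parameter)
sigma = pstdev(population)  # population standard deviation (a parameter)
N = len(population)

mean_of_means = mean(sample_means)
standard_error = pstdev(sample_means)  # S.D. of the sampling distribution

# Assumption: for sampling without replacement from a finite population,
# S.E. = (sigma / sqrt(n)) * sqrt((N - n) / (N - 1)).
theoretical_se = sigma / sqrt(n) * sqrt((N - n) / (N - 1))

print(sample_means)
print(mean_of_means, standard_error, theoretical_se)
```

The sample means differ from sample to sample, their average equals the population mean, and their standard deviation is the standard error of the mean.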

TEST OF HYPOTHESIS:

To reach decisions about a population on the basis of sample information, we make certain assumptions about the population involved, which may or may not be true. Such assumptions are called statistical hypotheses. Testing a hypothesis means a process for deciding whether to accept or reject the hypothesis.

NULL HYPOTHESIS:

The hypothesis formulated for the sake of rejecting it, under the assumption that it is true, is called the NULL HYPOTHESIS and is denoted by H0.

ERRORS:

If a hypothesis is rejected while it should have been accepted, a type-1 error has been committed. On the other hand, if the hypothesis is accepted while it should have been rejected, a type-2 error has been made.

            H0 accepted         H0 rejected
H0 true     Correct decision    Type-1 error
H0 false    Type-2 error        Correct decision

LEVEL OF SIGNIFICANCE (L O S):

The probability level below which we reject the hypothesis is known as the level of significance. The region in which a sample value falling leads to rejection of the hypothesis is known as the “CRITICAL REGION”.

TEST OF SIGNIFICANCE:

The procedure which enables us to decide whether to accept or to reject the hypothesis is called a “TEST OF SIGNIFICANCE”.

ALTERNATE HYPOTHESIS:

It is the hypothesis complementary to the null hypothesis, to be accepted when H0 is rejected, and is denoted by H1.

PROCEDURE FOR TEST OF HYPOTHESIS:

1) Define the Null hypothesis H0 suitable to the problem. The Alternate hypothesis H1 is also to be formed after careful study of the problem and of the nature of the test (one tailed or two tailed).

If θ is the population parameter and θ0 its hypothesized value, then

H0 : θ = θ0

a) H1: θ ≠ θ0 -------- two tailed
b) H1: θ > θ0 -------- right tailed
c) H1: θ < θ0 -------- left tailed

2) The level of significance (α) is fixed, or taken from the problem if specified, and the critical value Zα is noted.

3) Test statistic

$$Z = \frac{t - E(t)}{S.E.(t)}$$

t = sample value
E(t) = population value
S.E.(t) = standard error of t

4) Comparison is made between |z| and |Zα|

If |z| <|Zα|, accept H0


If |z| >|Zα|, reject H0 or accept H1

5) Conclusion

Level of Significance

NATURE OF TEST      1%              5%
TWO TAILED          |Zα| = 2.58     |Zα| = 1.96
RIGHT TAILED        Zα = 2.33       Zα = 1.645
LEFT TAILED         Zα = -2.33      Zα = -1.645
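The five-step procedure and the table of critical values can be sketched in code. This is a minimal illustration, not part of the original notes: the function names `critical_value`, `z_statistic` and `reject_h0` are hypothetical, and the critical values are computed from the standard normal distribution with Python's `statistics.NormalDist` instead of being read from the table.

```python
from statistics import NormalDist

def critical_value(alpha, tail):
    """Critical value Z_alpha of the standard normal for the given tail."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        return std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right' or 'left'")

def z_statistic(t, expected, std_error):
    """Z = (t - E(t)) / S.E.(t)."""
    return (t - expected) / std_error

def reject_h0(z, alpha, tail):
    """Step 4: compare the test statistic with the critical value."""
    c = critical_value(alpha, tail)
    if tail == "two":
        return abs(z) > c
    if tail == "right":
        return z > c
    return z < c  # left tailed: c is negative

for tail in ("two", "right", "left"):
    print(tail, round(critical_value(0.01, tail), 2), round(critical_value(0.05, tail), 3))
```

Rounded to the precision of the table, the computed values agree with 2.58, 1.96, 2.33, 1.645 and their negatives.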

TEST 1:

Test of significance of the difference between sample proportion and population proportion

X = number of successes in n trials
p = X/n = proportion of successes in the sample
P = proportion of successes in the population, Q = 1 - P

Test statistic

$$Z = \frac{p - P}{\sqrt{PQ/n}}$$

95% confidence limits, with q = 1 - p:

$$\left(p - 1.96\sqrt{\frac{pq}{n}},\; p + 1.96\sqrt{\frac{pq}{n}}\right)$$

PROBLEMS
1) Experience has shown that 20% of a manufactured product is of top
quality. In one day’s production of 400 articles, only 50 are of top quality.
Show that either the production of day chosen was not a representative
sample or the hypothesis of 20% was wrong at 5% level of significance
based on particular day production; find also the 95% confidence limits
for the percentage of top quality production?

Sol:

p = proportion of top quality in the sample = 50/400 = 1/8
n = 400, P = 1/5, Q = 4/5
Step 1:
H0 : P = 20/100 = 1/5 (20% are of top quality), where P = population proportion
H1 : P ≠ 1/5 Two tailed
Step 2: level of significance 5%, Zα = 1.96
Step 3: Test statistic

$$Z = \frac{p - P}{\sqrt{PQ/n}} = \frac{\frac{1}{8} - \frac{1}{5}}{\sqrt{\frac{1}{5}\cdot\frac{4}{5}\cdot\frac{1}{400}}} = \frac{-3/40}{1/50} = \frac{-3 \times 50}{40} = -3.75$$

Step 4:
|z| = 3.75 > 1.96 = Zα
Therefore |z| > Zα
Reject H0 at 5% los
Step 5:
That is, either the production of the day chosen was not a representative sample or the hypothesis of 20% was wrong.

95% confidence limits

$$\left(p - 1.96\sqrt{\frac{pq}{n}},\; p + 1.96\sqrt{\frac{pq}{n}}\right)$$

$$\frac{1}{8} - 1.96\sqrt{\frac{1}{8}\cdot\frac{7}{8}\cdot\frac{1}{400}} \le P \le \frac{1}{8} + 1.96\sqrt{\frac{1}{8}\cdot\frac{7}{8}\cdot\frac{1}{400}}$$

0.093 ≤ P ≤ 0.157
9.3% ≤ P ≤ 15.7%
Therefore the 95% confidence limits for the percentage of top quality product are 9.3 and 15.7.

======== ======== ========
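The arithmetic of Test 1 can be checked with a short script. This is a sketch only; the helper names `proportion_z` and `proportion_limits` are hypothetical and simply encode the formulas Z = (p - P)/√(PQ/n) and p ± k√(pq/n) from this section.

```python
from math import sqrt

def proportion_z(x, n, P):
    """Z statistic for a sample proportion x/n against population proportion P."""
    p = x / n
    return (p - P) / sqrt(P * (1 - P) / n)

def proportion_limits(x, n, k=1.96):
    """Confidence limits p -/+ k*sqrt(pq/n); k = 1.96 gives 95% limits."""
    p = x / n
    half_width = k * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Top-quality articles: 50 out of 400, hypothesised proportion 20%.
z_quality = proportion_z(50, 400, 0.20)
low, high = proportion_limits(50, 400)
print(round(z_quality, 2), round(low, 3), round(high, 3))

# Die thrown 9000 times, 3240 throws of 3 or 4, using P = 1/3 unrounded.
z_die = proportion_z(3240, 9000, 1 / 3)
print(round(z_die, 2))
```

For the die, using P = 1/3 without rounding gives a somewhat smaller Z than rounding P to 0.33, but it is still far above 1.96.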


2) A cubical die is thrown 9000 times and a throw of 3 or 4 is observed 3240 times. Show that the die cannot be regarded as an unbiased one at 5% level of significance, and find the extreme limits between which the probability of a throw of 3 or 4 lies.
Sol:
P = population proportion = 2/6 = 1/3 ≈ 0.33, Q ≈ 0.67
p = sample proportion = 3240/9000 = 0.36
n = 9000
Step 1:
H0 : P = 1/3, i.e. the die is unbiased
H1 : P ≠ 1/3 Two tailed

Step 2: level of significance 5%, Zα = 1.96

Step 3: Test statistic

$$Z = \frac{p - P}{\sqrt{PQ/n}} = \frac{0.36 - 0.33}{\sqrt{\frac{0.33 \times 0.67}{9000}}} = 6.05$$

(With P = 1/3 left unrounded, Z ≈ 5.37; the conclusion is the same.)

Step 4:
|z| = 6.05 > 1.96 = Zα
Therefore |z| > Zα
Reject H0 at 5% los
Step 5: The die cannot be regarded as unbiased at 5% los.

The area under the standard normal curve from -3 to 3 is 0.9973 (99.73%), so for the extreme limits the critical value is |Zα| = 3.

Extreme limits

$$\left(p - 3\sqrt{\frac{pq}{n}},\; p + 3\sqrt{\frac{pq}{n}}\right)$$

$$0.36 - 3\sqrt{\frac{0.36 \times 0.64}{9000}} \le P \le 0.36 + 3\sqrt{\frac{0.36 \times 0.64}{9000}}$$

0.345 ≤ P ≤ 0.375
==== ==== =====

TEST 2:
Test of significance of the difference between two sample proportions
Let p1, p2 be the proportions of success in two large samples of sizes n1, n2 respectively, drawn from the same population or from two populations with the same proportion.

Test statistic

$$Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad \text{where } P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}, \; Q = 1 - P$$

PROBLEMS
1) Before an increase in excise duty on tea, 800 people out of a sample of 1000 were consumers of tea. After the increase in duty, 800 people were consumers of tea in a sample of 1200 persons. Find whether there is a significant decrease in the consumption of tea after the increase in the duty, at 1% los.

Sol: Let p1 = proportion of consumers of tea before the increase in excise duty = 800/1000 = 0.8
p2 = proportion after the increase in duty = 800/1200 = 0.667
Step 1: H0 : p1 = p2
H1: p1 > p2 Right tailed
Step 2: level of significance 1%, Zα = 2.33
Step 3: Test statistic

$$Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

$$P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{800 + 800}{1000 + 1200} = 0.7273$$

Q = 1 – P = 1 – 0.7273 = 0.2727

$$Z = \frac{0.8 - 0.667}{\sqrt{(0.7273)(0.2727)\left(\frac{1}{1000} + \frac{1}{1200}\right)}} = 6.99$$

Step 4:
|z| = 6.99 > 2.33 = Zα
Therefore |z| > Zα, so reject H0 at 1% los
Step 5: There is a significant decrease in the consumption of tea after the increase in excise duty.
============
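The two-proportion test can be checked numerically. The sketch below uses a hypothetical helper `two_proportion_z` that works from the raw counts, so p2 = 800/1200 is not rounded before it enters the formula.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for the difference of two sample proportions x1/n1 and x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    P = (x1 + x2) / (n1 + n2)  # pooled proportion, equal to (n1*p1 + n2*p2)/(n1 + n2)
    Q = 1 - P
    return (p1 - p2) / sqrt(P * Q * (1 / n1 + 1 / n2))

# Tea consumers: 800 of 1000 before the duty increase, 800 of 1200 after.
z_tea = two_proportion_z(800, 1000, 800, 1200)
print(round(z_tea, 2))
print(z_tea > 2.33)  # right-tailed test at 1% los
```

With unrounded proportions the statistic is close to 7, well beyond the critical value 2.33, so the decrease is significant.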

TEST 3:
Test of significance of the difference between sample mean and population mean
Sample mean = x̄
Population mean = µ
Population standard deviation = σ
Size of the sample = n

Test statistic

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

95% confidence limits

$$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\; \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$

NOTE: If σ is not known, replace σ by the sample standard deviation s:

$$Z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

PROBLEMS
1) A sample of 100 students is taken from a large population. The mean height of the students in this sample is 160 cms. Can it reasonably be regarded that, in the population, the mean height is 165 cms and the standard deviation is 10 cms?

Sol: Given n = 100, x̄ = 160, µ = 165, σ = 10

Step 1:
H0 : µ = 165, i.e. the difference between x̄ and µ is not significant
H1 : µ ≠ 165 Two tailed
Step 2: level of significance 5%, Zα = 1.96
Step 3: Test statistic

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{160 - 165}{10/\sqrt{100}} = -5$$

Step 4: |z| = 5 > 1.96 = Zα
Therefore |z| > Zα
Reject H0 at 5% los
Step 5:
The sample cannot be regarded as drawn from a population with mean height 165 cms.
====== ========= ========

2) The mean breaking strength of the cables supplied by a manufacturer is 1800, with a standard deviation of 100. By a new technique in the manufacturing process, it is claimed that the breaking strength of the cables has increased. To test this claim, a sample of 50 cables is tested and it is found that the mean breaking strength is 1850. Can we support the claim at 1% los?

Sol: Given n = 50, x̄ = 1850, µ = 1800, σ = 100

Step 1: H0 : µ = 1800
H1 : µ > 1800 Right tailed
Step 2: level of significance 1%, Zα = 2.33
Step 3: Test statistic

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{1850 - 1800}{100/\sqrt{50}} = 3.54$$

Step 4: |z| = 3.54 > 2.33 = Zα
Therefore |z| > Zα
Reject H0 at 1% los

Step 5: There is a significant increase in the breaking strength of the cables manufactured by the new process. The claim of the manufacturer is accepted.
====== ====== ========
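Both problems under Test 3 use the same statistic, so a single helper is enough to check them. The function name `mean_z` is hypothetical; it encodes Z = (x̄ - µ)/(σ/√n).

```python
from math import sqrt

def mean_z(sample_mean, population_mean, sd, n):
    """Z statistic for a large-sample mean against a hypothesised population mean."""
    return (sample_mean - population_mean) / (sd / sqrt(n))

z_height = mean_z(160, 165, 10, 100)   # heights of 100 students
z_cable = mean_z(1850, 1800, 100, 50)  # breaking strength of 50 cables

print(z_height)            # two-tailed test at 5%: compare |Z| with 1.96
print(round(z_cable, 2))   # right-tailed test at 1%: compare Z with 2.33
```

In both cases the statistic lies in the critical region, so H0 is rejected.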

TEST 4:
Test of significance of the difference between the means of two samples

Let x̄1, x̄2 be the means of two large samples of sizes n1, n2 drawn from two populations with the same mean and variances σ1², σ2² respectively.

Test statistic

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

NOTE: If the samples are drawn from the same population with standard deviation σ,

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

PROBLEMS

1) In a random sample of size 500, the mean is found to be 20. In another independent sample of size 400, the mean is 15. Could the samples have been drawn from the same population with standard deviation 4, at 1% los?

Sol: Given n1 = 500, x̄1 = 20
n2 = 400, x̄2 = 15, σ = 4

Step 1:
H0 : µ1 = µ2, i.e. the samples have been drawn from the same population.
H1 : µ1 ≠ µ2 Two tailed
Step 2: level of significance 1%, Zα = 2.58
Step 3: Test statistic

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{20 - 15}{4\sqrt{\frac{1}{500} + \frac{1}{400}}} = 18.63$$

Step 4:
|z| = 18.63 > 2.58 = Zα
Therefore |z| > Zα
Reject H0 at 1% los
Step 5:
The two samples are not drawn from the same population.
========== ======== =======

2) A sample of heights of 6400 Englishmen has a mean of 170 cm and a standard deviation of 6.4 cm, while a sample of heights of 1600 Americans has a mean of 172 cm and a standard deviation of 6.3 cm. Do the data indicate that Americans are on an average taller than Englishmen, at 1% los?

Sol: Given n1 = 6400, x̄1 = 170, s1 = 6.4,
n2 = 1600, x̄2 = 172, s2 = 6.3

Step 1: H0 : µ1 = µ2
H1 : µ1 < µ2 Left tailed
Step 2: Level of significance 1%, Zα = -2.33
Step 3: Test statistic

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{170 - 172}{\sqrt{\frac{(6.4)^2}{6400} + \frac{(6.3)^2}{1600}}} = -11.32$$

Step 4: |z| = 11.32 > 2.33 = |Zα|
Therefore |z| > |Zα|
Reject H0 at 1% los
Step 5: From these samples it is concluded that on an average Americans are taller than Englishmen.
======= ==== ==========
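The two worked problems of Test 4 can be reproduced with one function, since the same-population case is just σ1 = σ2 = σ. The helper name `two_mean_z` is hypothetical.

```python
from math import sqrt

def two_mean_z(mean1, mean2, sd1, sd2, n1, n2):
    """Z statistic for the difference between the means of two large samples."""
    return (mean1 - mean2) / sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Samples of size 500 and 400 from a population with standard deviation 4.
z_same_population = two_mean_z(20, 15, 4, 4, 500, 400)

# Heights of 6400 Englishmen and 1600 Americans.
z_heights = two_mean_z(170, 172, 6.4, 6.3, 6400, 1600)

print(round(z_same_population, 2))
print(round(z_heights, 2))
```

The first value is compared with the two-tailed critical value and the second with the left-tailed critical value at the 1% level.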

SMALL SAMPLE TESTS


Small Sample:

If the size of the sample is less than 30 (n < 30), then the sample is known as a small sample.

Student t-distribution:
A random variable ‘t’ is said to follow Student’s t-distribution if its density function is

$$f(t) = \frac{1}{\sqrt{\nu}\, B\left(\frac{\nu}{2}, \frac{1}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}, \quad -\infty < t < \infty$$

‘ν’ gives the degrees of freedom of the t-distribution.

Properties of t-distribution:

1) The probability curve of the t-distribution is similar to the standard normal curve and is symmetric about t = 0 (bell-shaped and asymptotic to the t-axis).
2) For sufficiently large values of ν, the t-distribution tends to the standard normal distribution.
3) The mean of the t-distribution is zero.
4) The variance is ν/(ν - 2), if ν > 2 (it is greater than 1, but tends to 1 as ν → ∞).

Degrees of freedom:

The no. of independent variables used to compute the test statistic is known as the number of degrees of freedom.

In general, the no. of degrees of freedom is ν = n - k, where n is the no. of observations in the sample and ‘k’ is the no. of constraints imposed on them.
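The density and the listed properties can be checked numerically. The following is a rough sketch, not part of the original notes: `t_pdf`, `normal_pdf` and `t_variance` are hypothetical helper names, the beta function is evaluated through log-gamma values, and the variance is approximated by a simple midpoint rule on a truncated interval.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(t, v):
    """Density of Student's t-distribution with v degrees of freedom."""
    # B(v/2, 1/2) = Gamma(v/2) * Gamma(1/2) / Gamma((v + 1)/2), via logs for stability.
    log_beta = lgamma(v / 2) + lgamma(0.5) - lgamma((v + 1) / 2)
    return (1 + t * t / v) ** (-(v + 1) / 2) / (sqrt(v) * exp(log_beta))

def normal_pdf(z):
    """Density of the standard normal distribution."""
    return exp(-z * z / 2) / sqrt(2 * pi)

def t_variance(v, limit=400.0, steps=200_000):
    """Midpoint-rule estimate of the variance (the mean is 0 by symmetry)."""
    h = 2 * limit / steps
    total = 0.0
    for i in range(steps):
        t = -limit + (i + 0.5) * h
        total += t * t * t_pdf(t, v)
    return total * h

print(t_pdf(1.3, 9), t_pdf(-1.3, 9))       # property 1: symmetric about t = 0
print(t_pdf(0.0, 1000), normal_pdf(0.0))   # property 2: close to normal for large v
print(t_variance(5), 5 / (5 - 2))          # property 4: variance v/(v - 2)
```

The truncation limit and step count are arbitrary choices that are adequate for ν = 5; heavier tails (ν close to 2) would need a wider interval.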

Uses of t-distribution:

The t-distribution is used to test the significance of the difference between

1) The mean of a small sample and the mean of the population

2) The means of two small samples and

3) The coefficient of correlation (in the small sample and that in the
population, assumed zero).

Test 1:
Test of significance of the difference between sample mean and population mean

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n - 1}}$$

where x̄ = sample mean, s = sample standard deviation (computed with divisor n),
μ = population mean, n = size of the sample
Degrees of freedom ν = n - 1

Test 2:
Test of significance of the difference between the means of two small samples drawn from the same normal population.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

with degrees of freedom ν = n1 + n2 - 2

t-test:

1) ‘t’ is the test statistic for a small sample, ‘α’ is the level of significance and ‘ν’ is the degrees of freedom.

2) If |t| < tα(ν) (critical value), accept H0.

3) If |t| > tα(ν), reject H0.
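The two-sample statistic of Test 2 has no worked problem in these notes, so the sketch below applies it to two small hypothetical samples (the data and the helper name `pooled_t` are assumptions made purely for illustration). The sample variances use divisor n, matching the n1·s1² + n2·s2² form of the formula, and the critical value 2.26 for 9 degrees of freedom at 5% is the table value quoted in the worked problems.

```python
from math import sqrt
from statistics import mean, pvariance

def pooled_t(sample1, sample2):
    """t statistic and degrees of freedom for two small samples."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq = pvariance(sample1)  # variance with divisor n1
    s2_sq = pvariance(sample2)  # variance with divisor n2
    pooled = (n1 * s1_sq + n2 * s2_sq) / (n1 + n2 - 2)
    t = (mean(sample1) - mean(sample2)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical data, not taken from the notes.
sample_a = [10, 12, 14, 16, 18]
sample_b = [11, 13, 15, 17, 19, 21]

t_value, dof = pooled_t(sample_a, sample_b)
T_TABLE_5_PERCENT_9_DOF = 2.26  # two-tailed table value for 9 degrees of freedom
print(round(t_value, 3), dof)
print("reject H0" if abs(t_value) > T_TABLE_5_PERCENT_9_DOF else "accept H0")
```

The standard library has no t-table, which is why the critical value is passed in as a constant instead of being computed.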

PROBLEMS
1) A machinist is expected to make engine parts with an axle diameter of 1.75 cms. A random sample of 10 parts shows a mean diameter of 1.85 cms with a standard deviation of 0.1 cms. On the basis of this sample, would you say that the work of the machinist is inferior?

Sol: Given population mean µ = 1.75 cm
No. of items n = 10
Mean of the sample x̄ = 1.85 cm
Sample standard deviation s = 0.1

a) Step 1:
H0 : µ = 1.75
H1 : µ ≠ 1.75 (two tailed test)

Step 2:
Los 5%, degrees of freedom ν = n - 1 = 10 - 1 = 9
Critical value t0.975(9) = 2.26

Step 3: Test statistic

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n - 1}} = \frac{1.85 - 1.75}{0.1/\sqrt{9}} = 3$$

Step 4: |t| = 3 > 2.26 = t0.975(9)
Therefore |t| > t0.975(9)
Reject H0 at 5% los
Step 5:
At 5% LOS the sample supports the view that the work of the machinist is inferior.

b) Step 1:
H0 : µ = 1.75
H1 : µ ≠ 1.75 (two tailed test)

Step 2:
Los 1%, degrees of freedom ν = n - 1 = 10 - 1 = 9
Critical value t0.995(9) = 3.25

Step 3: Test statistic

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n - 1}} = \frac{1.85 - 1.75}{0.1/\sqrt{9}} = 3$$

Step 4:
|t| = 3 < 3.25 = t0.995(9)
Therefore |t| < t0.995(9)
Accept H0 at 1% los
Step 5:
At 1% LOS we cannot conclude that the work of the machinist is inferior.

===== ========== ======== ======
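The machinist problem shows that the decision depends on the level of significance. A minimal sketch follows; the helper name `one_sample_t` is hypothetical, and the critical values 2.26 (5%) and 3.25 (1%) for 9 degrees of freedom are table values supplied by hand.

```python
from math import sqrt

def one_sample_t(sample_mean, population_mean, s, n):
    """t = (x_bar - mu) / (s / sqrt(n - 1)) with n - 1 degrees of freedom."""
    return (sample_mean - population_mean) / (s / sqrt(n - 1)), n - 1

t_value, dof = one_sample_t(1.85, 1.75, 0.1, 10)

# Two-tailed table values for 9 degrees of freedom.
critical = {"5%": 2.26, "1%": 3.25}
decisions = {
    level: ("reject H0" if abs(t_value) > value else "accept H0")
    for level, value in critical.items()
}
print(round(t_value, 2), dof, decisions)
```

The same statistic t = 3 leads to rejection at 5% but acceptance at 1%.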

3) A random sample of 10 boys had the following IQ's: 70, 120, 110, 101, 88, 83, 95, 98, 107, 100. Do the data support the assumption of a population mean IQ of 100? Find a reasonable range in which most of the mean IQ values of samples of 10 boys lie.

Sol: Sample mean x̄ = Σxi / n = 972/10 = 97.2

Step 1: Null hypothesis H0: μ = 100
H1: μ ≠ 100

Step 2: ν = n - 1 = 9
5% los, t0.975(9) = 2.26

Step 3: Test statistic

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n - 1}}$$

xi      xi - x̄    (xi - x̄)²
70      -27.2     739.84
120     22.8      519.84
110     12.8      163.84
101     3.8       14.44
88      -9.2      84.64
83      -14.2     201.64
95      -2.2      4.84
98      0.8       0.64
107     9.8       96.04
100     2.8       7.84
Total             1833.60

$$s^2 = \frac{1}{n}\sum (x_i - \bar{x})^2 = \frac{1}{10}(1833.6) = 183.36$$

Then s = 13.54, and

$$t = \frac{97.2 - 100}{13.54/\sqrt{9}} = -0.62$$

|t| = 0.62

Step 4: |t| < t0.975(9) = 2.26, so accept H0

Step 5: The data support the assumption that the mean IQ of the population is 100.

95% confidence limits

$$\bar{x} - t_\alpha(\nu)\frac{s}{\sqrt{n - 1}} \le \mu \le \bar{x} + t_\alpha(\nu)\frac{s}{\sqrt{n - 1}}$$

$$\left(97.2 - 2.26 \times \frac{13.54}{\sqrt{9}},\; 97.2 + 2.26 \times \frac{13.54}{\sqrt{9}}\right)$$

= (87.00, 107.40)
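The IQ problem starts from raw data, so the sum of squared deviations is worth recomputing by machine. In the sketch below, `statistics.pstdev` gives the standard deviation with divisor n, which is the convention that pairs with the s/√(n - 1) form of the test statistic; the table value 2.26 is entered by hand.

```python
from math import sqrt
from statistics import mean, pstdev

iq = [70, 120, 110, 101, 88, 83, 95, 98, 107, 100]
mu0 = 100       # hypothesised population mean
t_table = 2.26  # two-tailed 5% value for 9 degrees of freedom

n = len(iq)
x_bar = mean(iq)
s = pstdev(iq)  # standard deviation with divisor n
sum_sq = sum((x - x_bar) ** 2 for x in iq)

std_error = s / sqrt(n - 1)
t_value = (x_bar - mu0) / std_error
low, high = x_bar - t_table * std_error, x_bar + t_table * std_error

print(round(sum_sq, 2), round(s, 2), round(t_value, 2))
print(round(low, 2), round(high, 2))
```

Note that s/√(n - 1) with divisor n is numerically identical to the alternative convention of dividing the sum of squares by n - 1 and using √n in the test statistic.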

F-Test (or) Snedecor’s F-distribution:

The probability density function of the random variable ‘F’ is given by

$$f(F) = \frac{\left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} F^{\frac{\nu_1}{2} - 1}}{B\left(\frac{\nu_1}{2}, \frac{\nu_2}{2}\right)\left(1 + \frac{\nu_1}{\nu_2}F\right)^{(\nu_1 + \nu_2)/2}}, \quad F > 0$$

ν1, ν2 are the degrees of freedom.

Properties of the F-curve:

1) The mean of the F-distribution is ν2/(ν2 - 2) (ν2 > 2).
2) The variance is

$$\frac{2\nu_2^2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 2)^2(\nu_2 - 4)} \quad (\nu_2 > 4)$$

3) The total area under the F-curve is 1 sq. unit.

Use of F-distribution:

The F-distribution is used to test the equality of the variances of the populations from which two small samples have been drawn.

F-test:

Test of significance of the difference between population variances for small samples:

s1² = sample variance of 1st sample, n1 = size of 1st sample
s2² = sample variance of 2nd sample, n2 = size of 2nd sample

Estimates of the population variances:

$$\hat{\sigma}_1^2 = \frac{n_1 s_1^2}{n_1 - 1} \text{ (1st population)}, \quad \hat{\sigma}_2^2 = \frac{n_2 s_2^2}{n_2 - 1} \text{ (2nd population)}$$

Consider

$$F = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2}, \quad \text{with } \hat{\sigma}_1^2 > \hat{\sigma}_2^2 \text{ so that } F > 1$$

Degrees of freedom ν1 = n1 - 1, ν2 = n2 - 1
L.O.S α, critical value Fα(ν1, ν2)
H0 : σ1² = σ2², H1 : σ1² ≠ σ2²
If F < Fα(ν1, ν2), accept H0.

PROBLEMS
1) Two independent samples of 8 and 7 items respectively had the following values

Sample-1   9   11   13   11   15   9   12   14
Sample-2   10  12   10   14   9    8   10

Do the two estimates of population variance differ significantly at 5% los?
Sol: Given n1 = 8, n2 = 7

xi      yi      xi²     yi²
9       10      81      100
11      12      121     144
13      10      169     100
11      14      121     196
15      9       225     81
9       8       81      64
12      10      144     100
14              196
Total   94      73      1138    785

$$\bar{x} = \frac{1}{n_1}\sum x_i = \frac{94}{8} = 11.75, \quad \bar{y} = \frac{1}{n_2}\sum y_i = \frac{73}{7} = 10.43$$

$$s_1^2 = \frac{1}{n_1}\sum x_i^2 - (\bar{x})^2 = \frac{1138}{8} - (11.75)^2 = 4.19$$

$$s_2^2 = \frac{1}{n_2}\sum y_i^2 - (\bar{y})^2 = \frac{785}{7} - (10.43)^2 = 3.39$$

$$\hat{\sigma}_1^2 = \frac{n_1}{n_1 - 1}s_1^2 = \frac{8}{7} \times 4.19 = 4.79, \quad \hat{\sigma}_2^2 = \frac{n_2}{n_2 - 1}s_2^2 = \frac{7}{6} \times 3.39 = 3.95$$

Step 1: H0 : σ1² = σ2²
H1 : σ1² ≠ σ2²

Step 2: ν1 = n1 - 1 = 8 - 1 = 7, ν2 = n2 - 1 = 7 - 1 = 6
α = 0.05, F0.05(7, 6) = 4.21

Step 3:

$$F = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2} = \frac{4.79}{3.95} = 1.21$$

Step 4: F = 1.21 < F0.05(7, 6) = 4.21, so accept H0

Step 5: The two estimates of population variance do not differ significantly at 5% los.
========= ======= ======== ==========

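3) Two random samples drawn from two normal populations gave the following observations. Test whether the two populations have the same variance.

Sample-1   20  16  26  27  23  22  18  24  25  19
Sample-2   17  23  32  25  22  24  28  18  31  33  20  27

xi      yi      xi²     yi²
20      17      400     289
16      23      256     529
26      32      676     1024
27      25      729     625
23      22      529     484
22      24      484     576
18      28      324     784
24      18      576     324
25      31      625     961
19      33      361     1089
        20              400
        27              729
Total   220     300     4960    7814

$$\bar{x} = \frac{1}{n_1}\sum x_i = \frac{220}{10} = 22, \quad \bar{y} = \frac{1}{n_2}\sum y_i = \frac{300}{12} = 25$$

$$s_1^2 = \frac{1}{n_1}\sum x_i^2 - (\bar{x})^2 = \frac{4960}{10} - 22^2 = 12$$

$$s_2^2 = \frac{1}{n_2}\sum y_i^2 - (\bar{y})^2 = \frac{7814}{12} - 625 = 26.17$$

$$\hat{\sigma}_1^2 = \frac{n_1}{n_1 - 1}s_1^2 = \frac{10}{9} \times 12 = 13.33, \quad \hat{\sigma}_2^2 = \frac{n_2}{n_2 - 1}s_2^2 = \frac{12}{11} \times 26.17 = 28.55$$

Step 1: H0 : σ1² = σ2², H1 : σ1² ≠ σ2²
Step 2: ν1 = n1 - 1 = 10 - 1 = 9; ν2 = n2 - 1 = 12 - 1 = 11
α = 0.05. Because the larger estimate (sample 2) goes in the numerator, the critical value is F0.05(11, 9) ≈ 3.10 (the value F0.05(9, 11) = 2.90 leads to the same decision).

Step 3:

$$F = \frac{\hat{\sigma}_2^2}{\hat{\sigma}_1^2} = \frac{28.55}{13.33} = 2.14$$

Step 4: F = 2.14 < 3.10 = F0.05(11, 9), so accept H0

Step 5: The two populations have the same variance at 5% los.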
======== ============ ============ ====
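Both F-test problems can be recomputed from the raw samples. In this sketch the helper name `variance_ratio` is hypothetical; it relies on the fact that `statistics.variance` (divisor n - 1) equals the estimate n·s²/(n - 1) used in these notes, and it always places the larger estimate in the numerator, returning the degrees of freedom in the matching order.

```python
from statistics import variance

def variance_ratio(sample1, sample2):
    """Return (F, numerator d.f., denominator d.f.) with the larger estimate on top."""
    # statistics.variance uses divisor n - 1, i.e. it equals n * s^2 / (n - 1).
    estimates = [(variance(s), len(s) - 1) for s in (sample1, sample2)]
    (larger, df_num), (smaller, df_den) = sorted(estimates, reverse=True)
    return larger / smaller, df_num, df_den

# Problem with samples of 8 and 7 items.
f1, df1_num, df1_den = variance_ratio(
    [9, 11, 13, 11, 15, 9, 12, 14],
    [10, 12, 10, 14, 9, 8, 10],
)

# Problem with samples of 10 and 12 items.
f2, df2_num, df2_den = variance_ratio(
    [20, 16, 26, 27, 23, 22, 18, 24, 25, 19],
    [17, 23, 32, 25, 22, 24, 28, 18, 31, 33, 20, 27],
)

print(round(f1, 2), df1_num, df1_den)
print(round(f2, 2), df2_num, df2_den)
```

In the second problem the larger estimate comes from the second sample, so the numerator has 11 degrees of freedom and the denominator 9. The critical values themselves still have to be read from an F-table.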

Chi – Square Distribution (or) Test: (𝝌𝟐 - distribution)

If x1, x2, …, xn are independent standard normal random variables, then (x1² + x2² + … + xn²) follows a probability distribution called the “chi-square distribution” with ‘n’ degrees of freedom.

The probability density function of the χ²-distribution with ν degrees of freedom is

$$f(\chi^2) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)} (\chi^2)^{\frac{\nu}{2} - 1} e^{-\chi^2/2}, \quad 0 < \chi^2 < \infty$$

Uses:

1) The χ²-distribution is used to test goodness of fit (i.e., it is used to judge whether a given sample may reasonably be regarded as a sample from a certain hypothetical population).
2) It is used to test the independence of attributes (i.e., if a population is known to have two attributes (or traits), then the χ²-distribution is used to test whether the two attributes are associated or independent, based on a sample drawn from the population).

𝝌𝟐 −Test of goodness of Fit:

On the basis of the hypothesis assumed about the population, we find the expected frequencies Ei (i = 1, 2, …, n) corresponding to the observed frequencies Oi (i = 1, 2, …, n), such that ΣEi = ΣOi. It is known that

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

follows the χ²-distribution with ν degrees of freedom, where ν = no. of independent frequencies.

→ If χ² < χ²α(ν) (critical value), accept that the fit is good.

→ If χ² > χ²α(ν) (critical value), conclude that the fit is poor.

Conditions for the validity of 𝝌𝟐 -test:

1) The no. of observations ‘N’ in the sample must be reasonably large (N ≥ 50).
2) Individual frequencies must not be too small, i.e. Oi ≥ 10. If Oi < 10, it is combined with the neighbouring frequencies so that the combined frequency is ≥ 10.
3) The no. of classes ‘n’ must be neither too small nor too large, i.e. 4 ≤ n ≤ 16.

Problems
1) The following data show the defective articles produced by 4 machines.

Machine             A     B     C     D     Total
Production time     1     1     2     3     7
No. of defectives   12    30    63    98    203

Do the figures indicate a significant difference in the performance of the machines?

Sol: H0: The machines perform equally, i.e. the defectives are proportional to production time. Based on H0, the expected frequencies (total = 203) are

        A            B            C            D
Ei      1/7 × 203    1/7 × 203    2/7 × 203    3/7 × 203
Ei      29           29           58           87

Oi          12      30      63      98
Ei          29      29      58      87
Oi-Ei       -17     1       5       11
(Oi-Ei)²    289     1       25      121

$$\chi^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} = \frac{289}{29} + \frac{1}{29} + \frac{25}{58} + \frac{121}{87} = 11.82$$

LOS 5%, ν = n - 1 = 3

χ²0.95(3) = 7.81

χ² = 11.82 > 7.81, so reject H0

There is a significant difference between the machines.

======= ========= ========
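Goodness-of-fit problems where the expected frequencies follow a given ratio all have the same shape, so they can be checked with two small helpers. The names `expected_from_ratio` and `chi_square` are hypothetical; the second data set is the 9:3:3:1 problem with observed counts 882, 313, 287 and 118, which total 1600.

```python
def expected_from_ratio(ratio, total):
    """Split the total frequency in the given ratio."""
    return [total * r / sum(ratio) for r in ratio]

def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Defectives from four machines, production times in the ratio 1:1:2:3.
machines_obs = [12, 30, 63, 98]
machines_exp = expected_from_ratio([1, 1, 2, 3], sum(machines_obs))
chi_machines = chi_square(machines_obs, machines_exp)

# Four groups expected in the ratio 9:3:3:1.
groups_obs = [882, 313, 287, 118]
groups_exp = expected_from_ratio([9, 3, 3, 1], sum(groups_obs))
chi_groups = chi_square(groups_obs, groups_exp)

CRITICAL_5_PERCENT_3_DOF = 7.81  # table value for 3 degrees of freedom
print(machines_exp, round(chi_machines, 2), chi_machines > CRITICAL_5_PERCENT_3_DOF)
print(groups_exp, round(chi_groups, 2), chi_groups > CRITICAL_5_PERCENT_3_DOF)
```

The first statistic exceeds the table value 7.81 and the second does not, so H0 is rejected for the machines and accepted for the 9:3:3:1 ratio.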

3) Theory predicts that the proportion of beams in four groups A, B, C, D should be 9:3:3:1. In an experiment among 1600 beams, the numbers in the four groups were 882, 313, 287 and 118. Does the experiment support the theory?
Sol: H0: The proportions of beams in the four groups are 9:3:3:1. Based on H0, the expected frequencies (total 1600) are

        A              B              C              D
Ei      9/16 × 1600    3/16 × 1600    3/16 × 1600    1/16 × 1600
Ei      900            300            300            100

Oi          882     313     287     118
Ei          900     300     300     100
Oi-Ei       -18     13      -13     18
(Oi-Ei)²    324     169     169     324

$$\chi^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} = \frac{324}{900} + \frac{169}{300} + \frac{169}{300} + \frac{324}{100} = 4.73$$

At los 5%, ν = n - 1 = 3

χ²0.95(3) = 7.81, χ² = 4.73 < 7.81
Accept H0

∴ The experiment supports the theoretical proportions at 5% los.

======== ========= ========

3. A survey of 800 families with 4 children each revealed the following distribution

No. of boys        0     1     2     3     4
No. of girls       4     3     2     1     0
No. of families    32    178   290   236   64

Is the result consistent with the hypothesis that male and female births are equally probable? Test by using the χ²-test for goodness of fit.

Sol: H0: male and female births are equally probable
p = prob. of a boy = 1/2
q = prob. of a girl = 1/2
N = total frequency = 800, n = 4
X = no. of boys in a family
X = 0, 1, 2, 3, 4

$$P(X = r) = {}^{n}C_r\, p^r q^{n - r} = {}^{4}C_r \left(\frac{1}{2}\right)^r \left(\frac{1}{2}\right)^{4 - r} = {}^{4}C_r \left(\frac{1}{2}\right)^4$$

r     P(X=r)    Theoretical frequency Ei = N × P(X=r)
0     1/16      800 × 1/16 = 50
1     1/4       800 × 1/4 = 200
2     3/8       800 × 3/8 = 300
3     1/4       800 × 1/4 = 200
4     1/16      800 × 1/16 = 50

Oi          32      178     290     236     64
Ei          50      200     300     200     50
Oi-Ei       -18     -22     -10     36      14
(Oi-Ei)²    324     484     100     1296    196

$$\chi^2 = \sum_{i=1}^{5} \frac{(O_i - E_i)^2}{E_i} = 19.63$$

Los 5%, ν = (no. of classes) - 1 = 5 - 1 = 4

χ²0.95(4) = 9.49 ⇒ χ² > χ²0.95(4). Then reject H0.

This sample does not support the hypothesis that male and female births in a family are equally probable.

==========
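The binomial expected frequencies and the resulting χ² can be verified directly. This sketch assumes the hypothesis of equally probable male and female births (p = 1/2) and counts the degrees of freedom as the number of classes minus one, i.e. 5 - 1 = 4; the variable names are illustrative only.

```python
from math import comb

families = 800
children = 4
p_boy = 0.5  # under H0, boys and girls are equally probable

observed = [32, 178, 290, 236, 64]  # families with 0, 1, 2, 3, 4 boys

# Binomial probabilities P(X = r) and theoretical frequencies N * P(X = r).
expected = [
    families * comb(children, r) * p_boy ** r * (1 - p_boy) ** (children - r)
    for r in range(children + 1)
]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1  # only the total is fixed, so one constraint

print(expected, round(chi_sq, 2), dof)
```

With 4 degrees of freedom the 5% table value is 9.49, which the computed statistic exceeds by a wide margin.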

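4. Fit a Poisson distribution to the following data and also test the goodness of fit

x     0     1     2     3     4     5     6     7     Total
f     314   335   204   86    29    9     3     0     980

Sol: H0: The data are in accordance with a Poisson distribution.

Mean = Σxi fi / Σfi = 1180/980 ≈ 1.2

λ = 1.2

$$P(x = r) = \frac{\lambda^r e^{-\lambda}}{r!}, \quad r = 0, 1, 2, 3, \ldots$$

Theoretical frequencies Ei = 980 × P(x = r)

x     P(x=r)                   Ei
0     e^(-1.2)                 295.2
1     (1.2) e^(-1.2)           354.2
2     (1.2)² e^(-1.2) / 2!     212.5
3     (1.2)³ e^(-1.2) / 3!     85.0
4     (1.2)⁴ e^(-1.2) / 4!     25.5
5     (1.2)⁵ e^(-1.2) / 5!     6.1
6     (1.2)⁶ e^(-1.2) / 6!     1.2
7     (1.2)⁷ e^(-1.2) / 7!     0.2

Classes whose frequency is less than 10 are merged with the neighbouring class, giving a single class x ≥ 4.

Oi        314     335     204     86      41
Ei        295.2   354.2   212.5   85.0    33.1
Oi-Ei     18.8    -19.2   -8.5    1.0     7.9

n = 5 classes

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \approx 4.5$$

ν = n - 2 = 5 - 2 = 3 (one degree of freedom is lost for the total and one for estimating λ from the data)

χ²0.95(3) = 7.81

χ² < χ²0.95(3), so accept H0

The Poisson distribution is a good fit.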
==========
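The Poisson fit involves enough arithmetic that a numerical check is useful. The sketch below makes several assumptions that should be read as such: the frequency for x = 3 is taken as 86 (the value with which the frequencies add up to the stated total of 980), λ is rounded to one decimal, the classes x ≥ 4 are merged into one class, and the degrees of freedom are counted as classes minus 2.

```python
from math import exp, factorial

# Observed frequencies for x = 0..7 (x = 3 taken as 86 so that the total is 980).
freq = [314, 335, 204, 86, 29, 9, 3, 0]
total = sum(freq)

# Estimate lambda by the sample mean, rounded to one decimal place.
lam = round(sum(x * f for x, f in enumerate(freq)) / total, 1)

# Expected frequencies for x = 0, 1, 2, 3, then one merged class for x >= 4.
expected = [total * exp(-lam) * lam ** r / factorial(r) for r in range(4)]
expected.append(total - sum(expected))
observed = freq[:4] + [sum(freq[4:])]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 2  # one constraint for the total, one for the estimated lambda

print(total, lam, [round(e, 1) for e in expected])
print(observed, round(chi_sq, 2), dof)
```

The statistic is compared with the 5% table value for 3 degrees of freedom, 7.81, and falls below it.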
