SAMPLING:
SAMPLING DISTRIBUTION:
Consider all possible samples of size n that can be drawn at random from a
given population, and compute the mean of each sample. The sample means will
not all be identical. The distribution they form is called the “Sampling
Distribution” of the mean. Similarly, we can have sampling distributions of the
standard deviation and of other statistics. In general, such a distribution is
called the sampling distribution of a statistic.
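The definition above can be illustrated with a minimal sketch. The population values and the sample size below are hypothetical, chosen only to keep the enumeration small; samples are assumed to be drawn without replacement.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Hypothetical population, used only for illustration.
population = [2, 4, 6, 8, 10]
n = 2  # sample size

# Mean of every possible sample of size n (drawn without replacement).
sample_means = [mean(sample) for sample in combinations(population, n)]

# Frequency table of the sample means: the sampling distribution of the mean.
sampling_distribution = Counter(sample_means)
for value in sorted(sampling_distribution):
    print(value, sampling_distribution[value])

# The mean of the sampling distribution coincides with the population mean.
print(mean(sample_means), mean(population))
```

The sample means differ from sample to sample, but their average agrees with the population mean.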
STANDARD ERROR:
The standard deviation of the sampling distribution of a statistic is called
its standard error.
If the number of elements in a sample, n, is greater than or equal to 30, the
sample is called a Large Sample; otherwise it is a Small Sample. The sampling
distribution of large samples is very nearly normal.
TEST OF HYPOTHESIS:
NULL HYPOTHESIS:
The hypothesis that is formulated for the sake of rejecting it, under the
assumption that it is true, is called the NULL HYPOTHESIS and is denoted by H0.
ERRORS:
If a hypothesis is rejected when it should have been accepted, a type-1 error
has been committed. On the other hand, if the hypothesis is accepted when it
should have been rejected, a type-2 error has been made.
            H0 accepted         H0 rejected
H0 true     Correct decision    Type-1 error
H0 false    Type-2 error        Correct decision
The probability level below which we reject the hypothesis is known as the
level of significance. The region in which a sample value must fall for the
hypothesis to be rejected is known as the “CRITICAL REGION”.
TEST OF SIGNIFICANCE:
ALTERNATE HYPOTHESIS:
H0 : θ = θ0
3) Test statistic
$$ Z = \frac{t - E(t)}{S.E.(t)} $$
where t = sample value, E(t) = population value and S.E.(t) = standard error of t
5) Conclusion
Level of Significance
NATURE OF TEST 1% 5%
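The critical values Zα used in the large-sample tests of these notes (for example 1.96 and 2.33) come from the standard normal distribution. A minimal sketch of how they can be recomputed is given here; the helper name `critical_value` is an assumption made for illustration, and `statistics.NormalDist` is the Python standard-library normal distribution.

```python
from statistics import NormalDist

def critical_value(alpha, two_tailed):
    """Positive critical value of the standard normal at level alpha."""
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.01, 0.05):
    print(f"{alpha:.0%}  two-tailed: {critical_value(alpha, True):.3f}  "
          f"one-tailed: {critical_value(alpha, False):.3f}")
```

For a left-tailed test the same one-tailed value is used with a negative sign.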
TEST 1:
95% confidence limits:
$$ \left( p - 1.96\sqrt{\frac{pq}{n}},\ p + 1.96\sqrt{\frac{pq}{n}} \right) $$
PROBLEMS
1) Experience has shown that 20% of a manufactured product is of top
quality. In one day’s production of 400 articles, only 50 are of top quality.
Show that either the production of the day chosen was not a representative
sample or the hypothesis of 20% was wrong, at the 5% level of significance.
Based on that day’s production, find also the 95% confidence limits for
the percentage of top-quality production.
Sol:
Step 4:
|z| = 3.75 > 1.96 = Zα
Therefore |z| > 𝑧𝛼
Reject H0 at 5% los
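Step 5:
That is, either the production of the day chosen was not a
representative sample or the hypothesis of 20% was wrong.
95% confidence limits:
$$ \left( p - 1.96\sqrt{\frac{pq}{n}},\ p + 1.96\sqrt{\frac{pq}{n}} \right) $$
With p = 50/400 = 1/8, q = 7/8 and n = 400:
$$ \frac{1}{8} - 1.96\sqrt{\frac{1}{8}\cdot\frac{7}{8}\cdot\frac{1}{400}} \le P \le \frac{1}{8} + 1.96\sqrt{\frac{1}{8}\cdot\frac{7}{8}\cdot\frac{1}{400}} $$
0.093 ≤ P ≤ 0.157
9.3% ≤ P ≤ 15.7%
Therefore the 95% confidence limits for the percentage of top-quality
production are 9.3% and 15.7%.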
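The arithmetic of this problem can be checked with a short sketch. The function names are assumptions made for illustration; the test statistic uses the hypothesised proportion 20% in the standard error, while the confidence limits use the sample proportion, as in the worked solution.

```python
from math import sqrt

def proportion_z(p_sample, p0, n):
    """Z statistic for a sample proportion against a hypothesised value p0."""
    return (p_sample - p0) / sqrt(p0 * (1 - p0) / n)

def confidence_limits(p_sample, n, z=1.96):
    """Limits p - z*sqrt(pq/n) and p + z*sqrt(pq/n) from the sample proportion."""
    margin = z * sqrt(p_sample * (1 - p_sample) / n)
    return p_sample - margin, p_sample + margin

p = 50 / 400                      # observed proportion of top-quality articles
z = proportion_z(p, 0.20, 400)    # hypothesised proportion is 20%
low, high = confidence_limits(p, 400)

print(f"z = {z:.2f}")
print(f"95% limits: {low:.3f} to {high:.3f}")
```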
Step 4:
|z| = 6 > 1.96 = Zα
Therefore |z| > Zα
Reject H0 at 5% los
Step 5: The die cannot be regarded as unbiased at 5% los.
Area from -3 to 3 is 0.9973, i.e. 99.73%
Critical value |Zα| = 3
Extreme limits:
$$ \left( p - 3\sqrt{\frac{pq}{n}},\ p + 3\sqrt{\frac{pq}{n}} \right) $$
$$ 0.36 - 3\sqrt{\frac{0.36 \times 0.64}{9000}} \le P \le 0.36 + 3\sqrt{\frac{0.36 \times 0.64}{9000}} $$
0.345 ≤ P ≤ 0.375
==== ==== =====
TEST 2:
Test of the significance of the difference between two sample proportions
Let p1, p2 be the proportions of successes in two large samples of sizes n1, n2
respectively, drawn from the same population or from two populations with the
same proportion. Then
$$ \text{Test statistic } Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad \text{where } P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2},\ Q = 1 - P $$
PROBLEMS
1) Before an increase in excise duty on tea, 800 people out of a sample
of 1000 were consumers of tea. After the increase in duty, 800 people
were consumers of tea in a sample of 1200 persons. Find whether
there is a significant decrease in the consumption of tea after the
increase in duty, at 1% los.
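With p1 = 800/1000 = 0.8, p2 = 800/1200 = 0.6667, P = 1600/2200 = 0.7273 and Q = 0.2727:
$$ Z = \frac{0.8 - 0.6667}{\sqrt{(0.7273)(0.2727)\left(\frac{1}{1000} + \frac{1}{1200}\right)}} = 6.99 $$
Step 4:
|z| = 6.99 > 2.33 = Zα
Therefore |z| > Zα, so reject H0 at 1% los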
Step 5: There is a significant decrease in the consumption of tea after
the increase in excise duty.
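As a check on the tea problem, the pooled two-proportion statistic can be computed directly from the counts. This is a sketch with an assumed helper name; with p2 kept unrounded as 800/1200 the statistic is about 6.99, well above the critical value 2.33.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for the difference of two sample proportions (pooled P)."""
    p1, p2 = x1 / n1, x2 / n2
    P = (n1 * p1 + n2 * p2) / (n1 + n2)   # pooled proportion
    Q = 1 - P
    return (p1 - p2) / sqrt(P * Q * (1 / n1 + 1 / n2))

# 800 of 1000 before the duty increase, 800 of 1200 after it.
z = two_proportion_z(800, 1000, 800, 1200)
print(f"z = {z:.2f}, reject H0 at 1% (right-tailed): {z > 2.33}")
```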
============
TEST 3:
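Test of the significance of the difference between sample mean and
population mean
Sample mean = x̄
Population mean = µ
Population standard deviation = σ
Size of the sample = n
$$ \text{Test statistic } Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} $$
95% confidence limits:
$$ \left( \bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}} \right) $$
If σ is not known, the sample standard deviation s is used in its place:
$$ \text{Test statistic } Z = \frac{\bar{x} - \mu}{s / \sqrt{n}} $$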
PROBLEMS
1) A sample of 100 students is taken from a large population. The mean
height of the students in this sample is 160 cm. Can it reasonably be
regarded that, in the population, the mean height is 165 cm and the
standard deviation is 10 cm?
Step 1:
H0 : x̄ = µ, i.e. the difference between x̄ and µ is not significant
H1 : x̄ ≠ µ (two-tailed)
Step 2: Level of significance 5%, Zα = 1.96
Step 3:
$$ \text{Test statistic } Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{160 - 165}{10 / \sqrt{100}} = -5 $$
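The statistic of Step 3 and the comparison with the critical value from Step 2 can be sketched as follows; the function name is an assumption made for illustration.

```python
from math import sqrt

def mean_z(sample_mean, mu, sigma, n):
    """Z statistic for a sample mean against a population mean mu."""
    return (sample_mean - mu) / (sigma / sqrt(n))

z = mean_z(160, 165, 10, 100)
z_alpha = 1.96                    # two-tailed critical value at 5%, from Step 2
reject_h0 = abs(z) > z_alpha

print(f"z = {z:.2f}, reject H0: {reject_h0}")
```

Since |z| exceeds 1.96, the sketch indicates rejection of H0 at the 5% level.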
manufacturing process, it is claimed that the breaking strength of the
cable has increased. To test this claim, a sample of 50 cables is tested
and it is found that the mean breaking strength is 1850. Can we
support that claim at 1% los?
Step 1: H0 : x̄ = µ
H1 : x̄ > µ (right-tailed)
Step 2: Level of significance 1%, Zα = 2.33
Step 3:
$$ \text{Test statistic } Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{1850 - 1800}{100 / \sqrt{50}} = 3.54 $$
TEST 4:
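Test for significance of the difference between the means of two samples
Let x̄1, x̄2 be the means of two large samples of sizes n1, n2 drawn from two
populations with the same mean and with variances σ1², σ2² respectively.
$$ \text{Test statistic } Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} $$
If the two populations have a common standard deviation σ:
$$ \text{Test statistic } Z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$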
PROBLEMS
n2 = 400, x̄2 = 15, σ = 4
Step 1:
H0 : x̄1 = x̄2, i.e. the samples have been drawn from the same
population.
H1 : x̄1 ≠ x̄2 (two-tailed)
Step 2: Level of significance 1%, Zα = 2.58
Step 3:
$$ \text{Test statistic } Z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{20 - 15}{4\sqrt{\frac{1}{500} + \frac{1}{400}}} = 18.63 $$
Step 4:
|z| = 18.63 > 2.58 = Zα
Therefore |z| > Zα
Reject H0 at 1% los
Step 5:
The two samples are not drawn from the same population.
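The value of Z in Step 3 can be reproduced with a small sketch. The helper name is an assumption; the figures n1 = 500 and x̄1 = 20 are taken from the computation in Step 3.

```python
from math import sqrt

def two_mean_z_common_sigma(mean1, n1, mean2, n2, sigma):
    """Z for two sample means when both populations share the s.d. sigma."""
    return (mean1 - mean2) / (sigma * sqrt(1 / n1 + 1 / n2))

z = two_mean_z_common_sigma(20, 500, 15, 400, 4)
print(f"z = {z:.2f}")
```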
========== ======== =======
Step 1: H0 : x̄1 = x̄2
H1 : x̄1 < x̄2 (left-tailed)
Step 2: Level of significance 1%, Zα = 2.33
Step 3:
$$ \text{Test statistic } Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{170 - 172}{\sqrt{\frac{(6.4)^2}{6400} + \frac{(6.3)^2}{1600}}} = -11.32 $$
Therefore |z| > Zα
Reject H0 at 1% los
Step 5: From these samples it is concluded that, on average, Americans
are taller than Englishmen.
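When each sample supplies its own standard deviation, the statistic uses s1²/n1 + s2²/n2 under the root. The sketch below, with an assumed helper name, recomputes the value for the two samples of heights (means 170 and 172, standard deviations 6.4 and 6.3, sizes 6400 and 1600).

```python
from math import sqrt

def two_mean_z(mean1, s1, n1, mean2, s2, n2):
    """Z for two large-sample means with separate standard deviations."""
    return (mean1 - mean2) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

z = two_mean_z(170, 6.4, 6400, 172, 6.3, 1600)
z_alpha = 2.33                    # one-tailed critical value at 1%
print(f"z = {z:.2f}, reject H0 (left-tailed): {z < -z_alpha}")
```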
======= ==== ==========
If the size of the sample is less than 30 (n < 30), the sample is known
as a small sample.
Student t-distribution:
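A random variable t is said to follow Student’s t-distribution with ν degrees
of freedom if its density function is
$$ f(t) = \frac{1}{\sqrt{\nu}\,\beta\left(\frac{1}{2}, \frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}, \quad -\infty < t < \infty $$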
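A minimal numerical sketch of this density is given below. The function names are assumptions; the beta function is written through the gamma function, β(a, b) = Γ(a)Γ(b)/Γ(a + b), and a simple trapezoidal rule is used to check that the area under the curve is close to 1.

```python
from math import gamma, sqrt

def beta(a, b):
    """Beta function expressed through the gamma function."""
    return gamma(a) * gamma(b) / gamma(a + b)

def t_density(t, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    return (1 + t * t / nu) ** (-(nu + 1) / 2) / (sqrt(nu) * beta(0.5, nu / 2))

def area_under_curve(nu, lower=-50.0, upper=50.0, steps=20000):
    """Trapezoidal estimate of the integral of the density on [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.5 * (t_density(lower, nu) + t_density(upper, nu))
    for k in range(1, steps):
        total += t_density(lower + k * h, nu)
    return total * h

print(t_density(0.0, 9))          # height of the curve at t = 0 for nu = 9
print(area_under_curve(5))        # should be close to 1
```

The curve is symmetric about t = 0, which the sketch reflects because t enters only through t².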
Properties of t-distribution:
4) The variance is $\frac{\nu}{\nu - 2}$, if ν > 2,
Degrees of freedom:
Uses of t-distribution:
3) The coefficient of correlation (in the small sample and that in the
population, assumed zero).
Test 1:
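Test of significance of the difference between sample mean and population
mean
$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}} $$
with notation as before and degrees of freedom ν = n − 1.
For the difference between the means of two small samples with sample standard
deviations s1, s2:
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} $$
and degrees of freedom ν = n1 + n2 − 2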
t-test:
PROBLEMS
1) A machinist is expected to make engine parts with axle diameter of
cms with a standard deviation of 0.1 cms. On the basis of this sample,
would you say that the work of the machinist is inferior?
Sol: Given population mean, µ=1.75cm
No of items n=10
a) Step 1:
H0 : x̄ = µ
Step 2:
Los 5%, critical value
b) Step 1: H0 : x̄ = µ
Step 3:
$$ \text{Test statistic } t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}} = \frac{1.85 - 1.75}{0.1 / \sqrt{9}} = 3 $$
Step 4:
|t| = 3 < 3.25 = t(0.995)(9)
Therefore |t| < t(0.995)(9)
Accept H0 at 1% los
Step 5:
We cannot conclude that the work of the machinist is inferior.
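The small-sample statistic of Step 3 can be recomputed as a sketch; the function name is an assumption, and the inputs (sample mean 1.85, s = 0.1, n = 10, µ = 1.75) are those appearing in the solution. The critical value 3.25 is t(0.995)(9) from Step 4.

```python
from math import sqrt

def one_sample_t(sample_mean, mu, s, n):
    """t statistic with s computed using divisor n, hence sqrt(n - 1) below."""
    return (sample_mean - mu) / (s / sqrt(n - 1))

t = one_sample_t(1.85, 1.75, 0.1, 10)
t_critical_1pct = 3.25            # t(0.995)(9), two-tailed 1% level

print(f"t = {t:.2f}, accept H0 at 1%: {abs(t) < t_critical_1pct}")
```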
3) A random sample of 10 boys had the following IQs: 70, 120, 110, 101,
88, 83, 95, 98, 107, 100. Do the data support the assumption of a
population mean IQ of 100? Find a reasonable range in which most of
the mean IQ values of samples of 10 boys lie.
Sol: Sample mean
$$ \bar{x} = \frac{\sum x_i}{n} = \frac{972}{10} = 97.2 $$
H1 : µ ≠ 100
5% los, t0.975(9) = 2.26
Step 3: Test statistic
$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}} $$
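xi      xi − x̄ = A     A²
70      -27.2          739.84
120     22.8           519.84
110     12.8           163.84
101     3.8            14.44
88      -9.2           84.64
83      -14.2          201.64
95      -2.2           4.84
98      0.8            0.64
107     9.8            96.04
100     2.8            7.84
Total                  1833.60
$$ s^2 = \frac{1}{n}\sum (x_i - \bar{x})^2 = \frac{1833.6}{10} = 183.36 $$
Then s = 13.54, and
$$ t = \frac{97.2 - 100}{13.54 / \sqrt{9}} = -0.62 $$
|t| = 0.62 < 2.26 = t0.975(9), so H0 is accepted at 5% los: the data support
a population mean IQ of 100.
95% confidence limits:
$$ \left( 97.2 - 2.26\,\frac{13.54}{\sqrt{9}},\ 97.2 + 2.26\,\frac{13.54}{\sqrt{9}} \right) = (87.00, 107.40) $$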
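The IQ problem can be recomputed from the raw data as a cross-check. This sketch assumes the convention used for the t statistic in these notes: s is the standard deviation with divisor n (`statistics.pstdev`), paired with √(n − 1) in the denominator. The critical value 2.26 is t0.975(9) as quoted in the solution.

```python
from math import sqrt
from statistics import mean, pstdev

iq = [70, 120, 110, 101, 88, 83, 95, 98, 107, 100]
n = len(iq)
mu = 100                          # hypothesised population mean

x_bar = mean(iq)
s = pstdev(iq)                    # standard deviation with divisor n
standard_error = s / sqrt(n - 1)

t = (x_bar - mu) / standard_error
t_critical = 2.26                 # t0.975(9)
limits = (x_bar - t_critical * standard_error,
          x_bar + t_critical * standard_error)

print(f"mean = {x_bar}, s = {s:.2f}, t = {t:.2f}")
print(f"95% limits: {limits[0]:.2f} to {limits[1]:.2f}")
print("accept H0:", abs(t) < t_critical)
```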
F-Test (or) Snedecor’s F-distribution:
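$$ f(F) = \frac{\left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} F^{\frac{\nu_1}{2} - 1}}{\beta\left(\frac{\nu_1}{2}, \frac{\nu_2}{2}\right)\left(1 + \frac{\nu_1}{\nu_2}F\right)^{(\nu_1 + \nu_2)/2}}, \quad F > 0 $$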
Properties: F-curve
1) The mean of the F-distribution is $\frac{\nu_2}{\nu_2 - 2}$ (ν2 > 2)
2) Variance = $\frac{2\nu_2^2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 2)^2(\nu_2 - 4)}$ (ν2 > 4)
Use of F-distribution:
F-test:
Consider
$$ F = \frac{\sigma_1^2}{\sigma_2^2} $$
if σ1² > σ2², so that F > 1
Degrees of freedom ν1 = n1 − 1, ν2 = n2 − 1
Critical value F(ν1, ν2) at L.O.S. α
If |F| < F(ν1, ν2), accept H0
H0 : σ1² = σ2², H1 : σ1² ≠ σ2²
PROBLEMS
1) Two independent samples of 8 and 7 items respectively had the
following values:
Sample-1 9 11 13 11 15 9 12 14
Sample-2 10 12 10 14 9 8 10
Do the two estimates of population variance differ significantly at 5% Los?
Sol: Given n1=8, n2=7
xi      yi      xi²     yi²
9       10      81      100
11      12      121     144
13      10      169     100
11      14      121     196
15      9       225     81
9       8       81      64
12      10      144     100
14              196
Total   94      73      1138    785
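$$ \bar{x} = \frac{1}{n_1}\sum x_i = \frac{1}{8}(94) = 11.75, \quad \bar{y} = \frac{1}{n_2}\sum y_i = \frac{1}{7}(73) = 10.43 $$
$$ s_1^2 = \frac{1}{n_1}\sum x_i^2 - (\bar{x})^2 = \frac{1}{8}(1138) - (11.75)^2 = 4.19 $$
$$ s_2^2 = \frac{1}{n_2}\sum y_i^2 - (\bar{y})^2 = \frac{1}{7}(785) - \left(\frac{73}{7}\right)^2 = 3.39 $$
$$ \sigma_1^2 = \frac{n_1}{n_1 - 1}\,s_1^2 = \frac{8}{7} \times 4.19 = 4.79 $$
$$ \sigma_2^2 = \frac{n_2}{n_2 - 1}\,s_2^2 = \frac{7}{6} \times 3.39 = 3.95 $$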
Step 1: H0 : σ1² = σ2²
H1 : σ1² ≠ σ2²
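The variance estimates and the F ratio for the two samples of 8 and 7 items can be sketched as follows. `statistics.variance` uses the divisor n − 1, which is the same quantity as n·s²/(n − 1) in the working above; the larger estimate is placed in the numerator so that F > 1. No critical value is assumed here; it would be read from an F table with the printed degrees of freedom.

```python
from statistics import variance

sample1 = [9, 11, 13, 11, 15, 9, 12, 14]
sample2 = [10, 12, 10, 14, 9, 8, 10]

# Estimates of the population variance (divisor n - 1).
var1 = variance(sample1)
var2 = variance(sample2)

# Larger estimate in the numerator, with matching degrees of freedom.
if var1 >= var2:
    F, df_num, df_den = var1 / var2, len(sample1) - 1, len(sample2) - 1
else:
    F, df_num, df_den = var2 / var1, len(sample2) - 1, len(sample1) - 1

print(f"var1 = {var1:.2f}, var2 = {var2:.2f}")
print(f"F = {F:.2f} with ({df_num}, {df_den}) degrees of freedom")
```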
3) Two random samples drawn from two normal populations gave the
following observations
Sample-1 20 16 26 27 23 22 18 24 25 19
Sample-2 17 23 32 25 22 24 28 18 31 33 20 27
xi      yi      xi²     yi²
20      17      400     289
16      23      256     529
26      32      676     1024
27      25      729     625
23      22      529     484
22      24      484     576
18      28      324     784
24      18      576     324
25      31      625     961
19      33      361     1089
        20              400
        27              729
Total   220     300     4960    7814
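$$ \bar{x} = \frac{1}{n_1}\sum x_i = 22, \quad \bar{y} = \frac{1}{n_2}\sum y_i = 25 $$
$$ s_1^2 = \frac{1}{n_1}\sum x_i^2 - (\bar{x})^2 = \frac{1}{10} \times 4960 - 22^2 = 12 $$
$$ s_2^2 = \frac{1}{n_2}\sum y_i^2 - (\bar{y})^2 = \frac{1}{12}(7814) - 625 = 26.17 $$
$$ \sigma_1^2 = \frac{n_1}{n_1 - 1}\,s_1^2 = \frac{10}{9} \times 12 = 13.33, \quad \sigma_2^2 = \frac{n_2}{n_2 - 1}\,s_2^2 = \frac{12}{11} \times 26.17 = 28.55 $$
Step 1: H0 : σ1² = σ2², H1 : σ1² ≠ σ2²
Step 2: ν1 = n1 − 1 = 10 − 1 = 9; ν2 = n2 − 1 = 12 − 1 = 11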
$$ f(\chi^2) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\,(\chi^2)^{\frac{\nu}{2} - 1}\,e^{-\chi^2/2}, \quad 0 < \chi^2 < \infty $$
Uses:
Ei (i=1, 2 …n), corresponding to the observed frequencies
$$ \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} $$
1) The number of observations N in the sample must be reasonably large (N ≥ 50).
2) Individual frequencies must not be too small, i.e. Oi ≥ 10. If Oi < 10,
it is combined with the neighbouring frequencies so that the combined
frequency is ≥ 10.
3) The number of classes n must be neither too small nor too large, i.e. 4 ≤ n ≤ 16.
Problems
1) The following data show defective articles produced by 4 machines.
Machine             A     B     C     D     Total
Production time     1     1     2     3     7
No. of defectives   12    30    63    98    203
Do the figures indicate a significant difference in the performance of the
machines?
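Sol: H0: Production rates of the machines are the same. Based on H0, the
expected frequencies are proportional to the production times (total = 203):
            A            B            C            D
Ei          (1/7)×203    (1/7)×203    (2/7)×203    (3/7)×203
Ei          29           29           58           87
Oi          12           30           63           98
Oi−Ei       -17          1            5            11
(Oi−Ei)²    289          1            25           121
$$ \chi^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} = \frac{289}{29} + \frac{1}{29} + \frac{25}{58} + \frac{121}{87} = 11.82 $$
LOS 5%, ν = n − 1 = 3
χ²(0.95)(3) = 7.81
Since χ² = 11.82 > 7.81, H0 is rejected: the figures indicate a significant
difference in the performance of the machines.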
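The expected frequencies and the χ² statistic for the four machines can be recomputed with a short sketch. The variable names are assumptions; expected counts are taken proportional to production time, as the null hypothesis requires, and 7.81 is the tabulated value quoted above.

```python
production_time = [1, 1, 2, 3]    # machines A, B, C, D
observed = [12, 30, 63, 98]       # number of defectives

total_defectives = sum(observed)
total_time = sum(production_time)

# Under H0 the defectives are shared out in proportion to production time.
expected = [total_defectives * t / total_time for t in production_time]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical = 7.81                   # chi-square(0.95) with 3 degrees of freedom

print("expected:", expected)
print(f"chi-square = {chi_square:.2f}, reject H0: {chi_square > critical}")
```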
Expected frequencies (total 1600):
        A              B              C              D
Ei:     (9/16)×1600    (3/16)×1600    (3/16)×1600    (1/16)×1600
$$ \chi^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} = 4.72 $$
At Los 5%, ν = n − 1 = 3
χ²(0.95)(3) = 7.81, χ² = 4.72 < 7.81
Accept H0
3. A survey of 800 families with 4 children each revealed the following distribution:
No. of boys        0     1     2     3     4
No. of girls       4     3     2     1     0
No. of families    32    178   290   236   64
Is the result consistent with the hypothesis that male and female births are
equally probable? Test by using the χ²-test for goodness of fit.
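Sol: H0: male and female births are equally probable
p = prob. of a boy = 1/2
q = prob. of a girl = 1/2
N = total frequency = 800, n = 4
X = no. of boys in a family
X = 0, 1, 2, 3, 4
$$ P(X = r) = \binom{n}{r} p^r q^{n-r} = \binom{n}{r}\left(\frac{1}{2}\right)^r\left(\frac{1}{2}\right)^{n-r} = \binom{n}{r}\left(\frac{1}{2}\right)^n = \binom{4}{r}\left(\frac{1}{2}\right)^4 $$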
r    P(X = r)    N × P(X = r)         Theoretical frequency Ei
0    1/16        800 × 1/16 = 50      50
1    1/4         800 × 1/4 = 200      200
2    3/8         800 × 3/8 = 300      300
3    1/4         800 × 1/4 = 200      200
4    1/16        800 × 1/16 = 50      50
$$ \chi^2 = \sum_{i=1}^{5} \frac{(O_i - E_i)^2}{E_i} = 19.63 $$
Los 5%, ν = (number of classes) − 1 = 5 − 1 = 4
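The binomial expected frequencies and the resulting χ² for the 800 families can be verified with the sketch below; the variable names are assumptions, and `math.comb` supplies the binomial coefficient.

```python
from math import comb

observed = [32, 178, 290, 236, 64]    # families with 0, 1, 2, 3, 4 boys
N = sum(observed)                     # total number of families
n = 4                                 # children per family
p = 0.5                               # probability of a boy under H0

# Expected frequency for r boys: N * C(n, r) * p^r * (1 - p)^(n - r).
expected = [N * comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(n + 1)]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print("expected:", expected)
print(f"chi-square = {chi_square:.2f}")
```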
probable.
==========
4. Fit a Poisson distribution to the following data and also test the
goodness of fit.
x    0     1     2     3    4    5   6   7   Total
f    314   335   204   56   29   9   3   0   980
λ = 1.2
$$ P(X = r) = \frac{e^{-\lambda}\,\lambda^r}{r!}, \quad r = 0, 1, 2, 3, \ldots $$
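The theoretical frequencies follow by multiplying each Poisson probability by the total frequency. The sketch below assumes λ = 1.2 and N = 980 as stated in the problem; the function name is chosen for illustration.

```python
from math import exp, factorial

lam = 1.2     # Poisson parameter used in the solution
N = 980       # total frequency stated in the problem

def poisson_probability(r, lam):
    """P(X = r) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam ** r / factorial(r)

# Theoretical frequencies for x = 0, 1, ..., 7.
theoretical = [N * poisson_probability(r, lam) for r in range(8)]

for r, frequency in enumerate(theoretical):
    print(r, round(frequency, 2))
```

In a goodness-of-fit test these frequencies would then be compared with the observed ones, pooling small classes as the conditions for the χ² test require.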
Theoretical frequencies
𝝌𝟐 (0.95)(2)=5.99
==========