
BASIC STATISTICS
(A Text Book for Intermediate Classes)

PART-II

By
GHULAM HUSSAIN KIANI
Ex. Associate Professor of Statistics
MUHAMMAD SALEEM AKHTAR
Govt. Gordon College, Rawalpindi

MAJEED BOOK DEPOT
Head Office: 22-Urdu Bazar, Lahore

Ph:Q42-37 3 1 1 484 gt gSSl 87


ut{touE PUBuCATt0fia
Al-Mustafa plaza, 212-C Sunny
Aminpur Bazar, Center Al-Hanif plaza.
Road,
6!n Sateltite Town.
Faisatabad University Road,
Rawalpindi Gujrarrwala
Ph: A41-2643322 Ph:051-4429949 Sargodha.
Ph: 055-3825612 Ph: 048-374OUg
I
All Rights Reserved
No part of this book may be reproduced or transmitted in any form or
by any means, electronic or mechanical, including photocopying, recording,
or any information storage and retrieval system, without written permission
of the authors or publishers.

REVISED EDITION 2011-2012

Publishers: Kashif Mukhtar
Majeed Book Depot
Urdu Bazar, Lahore.
Printers: Z.A. Printers, Lahore.
Copy: 500
Composing: Muhammad Khurshid Khan & Shahid Ayub Ghanna
Price: Rs. 225/-
PREFACE

Basic Statistics - Part II has been written to serve as the text for the students of Intermediate level class XII. It has been written according to the new syllabus approved by the Ministry of Education (Curriculum Wing), Government of Pakistan, Islamabad. The book will meet the requirements of all the Education Boards in Pakistan.

The students of M.A. Economics, [illegible], M.Sc. Psychology, Business Administration and B.B.A., and the students of various areas of social sciences can read their courses from this book. They can benefit from this book because the lessons in the book have been discussed in a simple and lucid manner.

For the students who do not have the class-room facility, this book is a good gift in their hands. The students of Allama Iqbal Open University who are taking up courses of BBA and B.A. will find this book of tremendous value. They can prepare their lessons from this book without intensive class-room lectures. The book is really 'basic', [illegible] and language-wise, and anybody who is interested to learn the basic theory of statistics will find the book a beneficial guide.

The entire book has been written in a simple manner. Special attention has been given to theory of sampling, hypothesis testing and estimation. The theoretical concepts have been made clear with illustrative examples. Efforts have been made to keep the examples close to the situations of practical life so that greater interest is created.

We are extremely grateful to our colleagues who have done the arduous task of the proof reading. Nonetheless some errors and omissions still appear here and there in the book. We shall be grateful if such errors are brought to our notice for prompt correction. We shall welcome suggestions and healthy criticism.
[The remainder of the acknowledgements is illegible in the source. The legible fragments mention colleagues in Rawalpindi and Islamabad, the publishers Messrs. Majeed Book Depot, and Khurshid Khan, the computer operator.]

August 1, 2011
Ghulam Hussain Kiani
Muhammad Saleem Akhtar
CONTENTS

11.5.6. Known Probability
11.5.7. Non-Zero Probability
11.6. Probability and Non-Probability Sampling
11.6.1.
11.6.2. Sampling without Replacement
11.6.3. Combinations
11.6.4. Permutations
11.6.5. Simple Random Sample
11.6.6. Difference between Random Sample and Simple Random Sample
11.6.7. Selection of Simple Random Sample
11.7. Errors
11.7.1. Sampling Errors
11.7.2. Reducing the Sampling Errors
11.7.3. Non-Sampling Errors
11.8. Sampling Distributions
11.8.1.
11.8.2. Sampling Distribution of $\bar{X}$
11.8.3. Sampling Distribution of $s^2$ and $S^2$
11.8.4. Sampling Distribution of Difference between two Means
11.8.5. Proportion
11.8.6. Sampling Distribution of Proportion
11.8.7. Sampling Distribution of Difference between $\hat{p}_1$ and $\hat{p}_2$
Short Definitions
Multiple Choice Questions
Short Questions
Exercises
Chapter 12
Statistical Inference - Estimation
12.1. Introduction
12.2. Statistical Inference
12.2.1. Approaches of Statistical Inference
12.3.1. Point Estimator and Point Estimate
12.3.2. Point Estimation
12.3.3. Unbiasedness
12.4. Interval Estimation
12.4.1. Confidence Coefficient
12.5. [illegible] of Confidence Interval
12.5.1. Selection of Proper Confidence Interval
12.6.1. Meaning of the Confidence Interval
12.7. Confidence Interval Estimate for Population Mean $\mu$, Population Normal (Small Sample)
12.8. Confidence Interval Estimate for the Difference between two Population Means (Large Samples)
12.9. Confidence Interval Estimate for the Difference between two Population Means - Populations Normal (Small Samples)
12.10. Confidence Interval for the Difference between two Population Means - Dependent Samples
12.11. Proportion
12.12. Confidence Interval Estimate for Population Proportion p (Large Sample)
12.13. Confidence Interval Estimate for the Difference between two Population Proportions (Large Samples)
Multiple Choice Questions
Short Questions
Exercises
Chapter 13
Statistical Inference - Testing of Hypotheses
13.1. Introduction
13.2. Statistical Hypothesis
13.2.1. Null Hypothesis
13.2.2. Alternative Hypothesis
13.2.4. Composite Hypothesis
13.2.5. Acceptance and Rejection of Null Hypothesis
13.2.6. Test Statistic
13.2.8. Two-Tailed Test
13.2.9. One-Tailed Test
13.3. Errors in Testing of Hypothesis
13.3.1. Type I Error
13.3.2. Type II Error
13.3.3. Relation between $\alpha$ and $\beta$
13.4. Level of Significance
13.5. Formulating $H_0$ and $H_1$ and Making Critical Region
13.6. General Procedure for Testing of Hypothesis
13.7. Hypothesis Testing - Population Mean $\mu$, $\sigma$ Known (Large Sample)
13.8. Hypothesis Testing - Population Mean $\mu$, $\sigma$ not Known
13.9. Hypothesis Testing - Population Mean $\mu$, $\sigma$ Known
13.10. Hypothesis Testing - Population Mean $\mu$, $\sigma$ Unknown, Normal Population (Small Sample)
13.11. Hypothesis Testing - Difference between two Population Means $\mu_1 - \mu_2$, $\sigma_1^2$ and $\sigma_2^2$ Known (Large Samples)
13.12. Hypothesis Testing - Difference between two Population Means $\mu_1 - \mu_2$, $\sigma_1^2$ and $\sigma_2^2$ Unknown (Large Samples)
13.13. Test about $\mu_1 - \mu_2$, $\sigma_1^2$ and $\sigma_2^2$ Known, Populations Normal
13.14. Test about $\mu_1 - \mu_2$, $\sigma_1^2$ and $\sigma_2^2$ not Known, Populations Normal (Small Samples)
13.15. Test about $\mu_1 - \mu_2$, Dependent Samples, Populations Normal
13.16. Test of Population Proportion p (Large Sample)
13.17. Test of Difference between two Population Proportions, $p_1 - p_2$ (Large Samples)
13.18. Choice of Proper Test-Statistic
Short Questions

m
.RegressionandCorreIation-..........:....
14.L Introduction..........
14.2. Mathematical Model or Equatiorr..............
c o
,.. 165

.
" """" ""'
t
14.3. Non-Linear Mode1......... 19?

o
14.4. Statistical Model . .. ... ..............................' """' 168

p
""":"""""""" 16e

s
L4'4'L. Independent and Depend.en, v;#i;;.....

g
14.4"2. CauseandEft'ectneiation
....... .....".....:........... .'."'"'.'"',i1;

o
L4.5. Regression.......................
14.5.L.SimpleLinearRegression.......'.....

bl .
.
L4.5.2. Purpose of RegressionAnalysis............

3
14.5.3. Scatter Diagram
. .....................,........ l;; l

4
,i!
li
........... 173
74.6. Fitting a Linear Regression Line_the
9
:i

9 y "'
Method of Least Squares..
t
........... 125
14.6.1. Properties of the Regression Line..."....;...............i.....r..1...........

ta "
14.6.2. Regression Equation of X on 1Tg

s
......... .. .. ........,.... .
"" 17e

/: /
t4.7. Introductt"".-....._...._:.-..:: _ .
14'8' correlation.,........'......................................."..:........................:..
ili
s
14.8.1. Measurement of Correlation...........".

tt p
14.8.2. Ferfect positive Correl.ation......."..... ........ 1g4
......,. lg5

h
14,8.11. Perfact Negative Correlation.............
....... lgb
""":""""' """ 185
14'8.5. Seatter Diagrancs
14.9. Correlation Coeffi.cient for Sample Data......
1.g7
14.9. 1. Causa.tion in Correiation.".
14.g.2. Spurious Correlatio",...."..":."..:.."................""". "" 191
14.9.g. Changeof Origin '"""""" 19i

14.9.5. Change of Origin and $cale ........ 792


14.9.6,jr'irraLirrearBegression}lelation'...:...".., tgS
14.9.?. 'r,for Random Variables
.-._.-:_..". " .......... l'u
I93
(r)
14.10. Relation between br*, b* and r ................ 193
14.11. Properties of Correlation Coefficient r ............. ........ 194
[aShortDefinitions...........................'..
uS Multiple Choice Questions .....;.. 199
sS Questions
Short ........ 205

m
Exercises.............. ............ 2L0-2LG

o
Chapter 15
Association
15.1.1. Notation for Attributes
15.1.2. One Attribute
15.1.3. Two Attributes
15.1.4. Positive and Negative Classes
15.1.5. Order of Classes
15.1.6. Ultimate Class Frequencies
15.1.7. Lower Order Frequencies in Terms of
15.1.8. Higher Order Frequencies into Lower Order Frequencies
15.2. Consistency
15.3. Independence of Attributes
15.3.1. Independence Defined
15.3.2. Another Definition of Independence
15.4. Coefficient of Association
15.5.1. Test of Independence
15.5.2. Direct Formula for Calculating $\chi^2$ in 2 x 2 Contingency Table
15.6. Contingency Table of Higher Order
15.7. Limitations of $\chi^2$
15.8. Rank Correlation
Short Definitions
Short Questions
Exercises
Chapter 16
Time Series
16.1. Introduction
16.2. Purpose of Time Series
16.2.1. Graph of the Time Series
16.3. Components of a Time Series
16.3.2. Seasonal Variation
16.3.3. Cyclical Variations
16.3.4. Irregular Variations
16.4. Analysis of Time Series
16.5.1. The Method of Free-hand Curve
16.5.2. The Method of Semi-Averages
16.5.3. The Method of Moving Averages
16.5.4. Method of Least Squares
16.5.5. Fitting a Straight Line
16.5.6. Coding of the Time Periods
16.5.7. Change of Origin in Coding
16.6. Fitting of Second Degree Parabola
Short Definitions
Link with Time Series Components
Multiple Choice Questions
Short Questions
Chapter 17
17.1. Introduction to Computers
17.1.1. Computer Capabilities and its Uses
17.2. Computer History
17.3. Types of Computers
17.3.2.
17.3.3. Hybrid Computer
17.4.
17.4.1.
17.4.2. Minicomputers
17.4.3. Microcomputers
17.4.4. Super Computers
17.5.
17.6. Computer Hardware
17.6.1. Input Unit
17.6.2.
17.6.3. Secondary Storage
17.6.4.
17.7. Computer Software
17.7.1.
17.7.2. System Software
17.7.3. Application Software
17.8. Basic Idea of Writing and Running a Computer Program
17.8.1. Program Design
17.8.2.
17.8.3.
17.8.4. Documentation, Implementation and Maintenance
17.9.
17.9.1.
17.9.2. Binary Number System
17.9.3. Octal Number System
17.9.4. Hexadecimal Number System
17.10. Binary Number System as a Foundation of Computer
Multiple Choice Questions
Statistical Tables
Chapter 10

NORMAL DISTRIBUTION
10.1 INTRODUCTION
The discovery of normal distribution goes back to the seventeenth and eighteenth centuries and is associated with the names of De Moivre (1667 - 1754), Laplace (1749 - 1827) and Gauss (1777 - 1855). At that time, it received the attention of mathematicians and natural and social scientists. Its application to biological data was pioneered at a later date by Sir Francis Galton (1822 - 1911). The normal distribution, also called the normal curve, is widely used in research in the biological, physical and social sciences. In real life we quite often come across the distributions close to this distribution and hence the word "normal" is used for it. The word normal is not to be used as the opposite to the word abnormal. Normal distribution is also called 'mother of distributions' because various other distributions are generated from this distribution. This distribution makes the base for inferential statistics, a branch of statistics in which we draw conclusions about the populations on the basis of information gained from the sample study.
10.2 NORMAL DISTRIBUTION
Normal distribution was first described in 1733 by De Moivre as being the limiting form of the binomial density as the number of trials becomes infinite. This discovery did not get much attention and the distribution was "discovered" again by both Laplace and Gauss about a half century later. Both men dealt with problems of astronomy, and each derived the normal distribution as a distribution that seemingly described the behavior of errors in astronomical measurements. The distribution is often referred to as the "Gaussian" distribution.
One of the most important examples of a continuous probability distribution is the normal distribution, also called normal curve or Gaussian distribution. The curve is defined by the equation

$$Y = f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}}, \qquad -\infty < X < \infty \qquad (10.1)$$

where, $\mu$ = mean of the distribution, a parameter.
$\sigma$ = standard deviation of the distribution, a parameter.
$\pi$ = a constant approximately equal to 3.14159
$e$ = a constant approximately equal to 2.71828
X = abscissa, measurement or score marked on horizontal axis
Y = ordinate, height of curve corresponding to an assigned value of X
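As a rough illustration of equation (10.1), the ordinate Y can be computed directly from X, $\mu$ and $\sigma$. The Python sketch below is not part of the original text; the function name `normal_pdf` and the sample values $\mu$ = 100, $\sigma$ = 10 are assumptions chosen only for demonstration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Ordinate Y of the normal curve (10.1) at abscissa x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # distance from the mean in standard units
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Assumed sample values: mean 100, standard deviation 10.
height_at_mean = normal_pdf(100, 100, 10)
height_one_sigma_above = normal_pdf(110, 100, 10)
height_one_sigma_below = normal_pdf(90, 100, 10)

print("Ordinate at the mean:", round(height_at_mean, 4))
print("Ordinate one sigma above:", round(height_one_sigma_above, 4))
print("Ordinate one sigma below:", round(height_one_sigma_below, 4))
```

The two ordinates at equal distances from the mean come out equal, which reflects the symmetry of the curve about X = $\mu$.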
The total area bounded by the curve (10.1) and the X-axis is one. The area under the curve between two ordinates X = a and X = b, where a < b, represents the probability that X lies between a and b and this probability is denoted by P(a < X < b).
When the variable X is expressed in terms of standard units or standard normal variate $Z = \frac{X-\mu}{\sigma}$, then equation (10.1) is replaced by the so-called standard form

$$Y = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{Z^{2}}{2}} \qquad (10.2)$$
s
In this case we say that Z is normally distributed. The mean of standard normal variate Z is zero and its variance is one. The value of Z is zero when X = $\mu$. A graph of this standardized normal curve is shown in Fig. 10.1. In this graph we have indicated the areas included between Z = -1 and +1, Z = -2 and +2, Z = -3 and +3, which are 68.27%, 95.45% and 99.73% respectively. The areas under this curve bounded by the ordinates at Z = 0 and any positive value of Z are given in the table. From this table the area between any two ordinates can be found by using the symmetry of the curve about Z = 0.

[Fig. 10.1: standard normal curve with the areas 68.27%, 95.45% and 99.73% marked between Z = -1 and +1, Z = -2 and +2, Z = -3 and +3]
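The three percentages quoted for Fig. 10.1 can be checked numerically. The sketch below is an illustration only, not the book's method: it relies on the identity $P(-k < Z < k) = \mathrm{erf}(k/\sqrt{2})$ and on the error function in Python's standard `math` module. The function name `area_within` is an assumption.

```python
import math


def area_within(k):
    """Area under the standard normal curve between Z = -k and Z = +k."""
    return math.erf(k / math.sqrt(2.0))


# Print the percentage of area within 1, 2 and 3 standard units of the mean.
for k in (1, 2, 3):
    print("Z = -%d to +%d: %.2f %%" % (k, k, 100.0 * area_within(k)))
```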
10.3 PROPERTIES OF THE NORMAL DISTRIBUTION
(1) It is symmetrical about the ordinate at X = $\mu$. It means that the central ordinate at X = $\mu$ divides the curve into two equal parts.
(2) The arithmetic mean, median and mode coincide.
(3) The lower and upper quartiles are equidistant from the mean and are at a distance of 0.6745 $\sigma$:
$Q_1 = \mu - 0.6745\,\sigma$, $Q_3 = \mu + 0.6745\,\sigma$
(4) The mean deviation is 0.7979 $\sigma$ or 4/5 $\sigma$ (approximately).
(5) The quartile deviation, also called the probable error, is equal to 0.6745 $\sigma$ or 2/3 $\sigma$ (approximately).
(6) The ordinate is highest at the mean $\mu$.
(7) The normal curve has two points of inflection (the points where the curve changes its direction) which lie at a distance of one $\sigma$ above the mean $\mu$ and one $\sigma$ below the mean $\mu$.
(8) The curve is asymptotic to the base line. It means that it continues to approach but never reaches the base line.
(9) In a normal curve, if the nth moment is odd, the value of this odd moment will always be zero. This is because the normal curve is symmetrical, and for a symmetrical distribution the sum of the positive deviations from $\mu$ will always be equal to the sum of the negative deviations from $\mu$, and thus they will cancel out each other. If the nth moment is even, the formula is

$$\mu_n = \frac{n!}{\left(\frac{n}{2}\right)!}\left(\frac{\sigma^{2}}{2}\right)^{n/2} \qquad \text{(where n is even)}$$

It follows that $\mu_2 = \sigma^2$, $\mu_3 = 0$, $\mu_4 = 3\sigma^4$, $\beta_1 = \frac{\mu_3^2}{\mu_2^3} = 0$, $\gamma_1 = 0$, i.e. skewness is zero, and $\beta_2 = \frac{\mu_4}{\mu_2^2} = 3$, $\gamma_2 = 0$, i.e. the normal curve has zero kurtosis. The normal distribution is also called mesokurtic.
(10) The total area under the normal curve is unity.
(11) Area properties of the normal distribution. In a normal distribution:
$\mu \pm 0.6745\,\sigma$ covers 50% area
$\mu \pm 1\,\sigma$ covers 68.27% area
$\mu \pm 2\,\sigma$ covers 95.45% area
$\mu \pm 3\,\sigma$ covers 99.73% area
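The moment property can be illustrated with a short computation. The sketch below is an illustration only; it uses the standard result that the central moment of even order n of a normal distribution equals $\frac{n!}{2^{n/2}\,(n/2)!}\,\sigma^{n}$ and that every odd central moment is zero. The function name `central_moment` and the value $\sigma$ = 10 are assumptions.

```python
import math


def central_moment(n, sigma):
    """nth moment about the mean of a normal distribution."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n % 2 == 1:
        return 0.0  # odd moments vanish because the curve is symmetrical
    k = n // 2
    return math.factorial(n) / (2 ** k * math.factorial(k)) * sigma ** n


sigma = 10  # assumed value for illustration
mu2 = central_moment(2, sigma)
mu3 = central_moment(3, sigma)
mu4 = central_moment(4, sigma)

beta1 = mu3 ** 2 / mu2 ** 3  # moment ratio measuring skewness
beta2 = mu4 / mu2 ** 2       # moment ratio measuring kurtosis

print("mu2 =", mu2, " mu3 =", mu3, " mu4 =", mu4)
print("beta1 =", beta1, " beta2 =", beta2)
```

Whatever value of $\sigma$ is used, the two moment ratios stay at 0 and 3, which is why the normal curve serves as the reference for skewness and kurtosis.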
10.4 STANDARD NORMAL DISTRIBUTION
The properties of the normal curve permit us to define a standardized distribution in terms of the variable Z defined as

$$Z = \frac{X-\mu}{\sigma}$$

This is equivalent to measuring the distance of X from the mean $\mu$ using the standard deviation $\sigma$ as the unit of measuring distance. The variable Z is termed as the standard normal variate and plays a very important role in statistics. The probability density function in terms of Z is

$$p(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^{2}}{2}}$$

The mean of random variable Z is zero and its variance is unity. If we know the mean $\mu$ and the standard deviation $\sigma$, we can calculate Z corresponding to any value of X, and the corresponding area from the central ordinate to the value of Z.
10.5 USE OF THE AREA TABLE
The table "areas under the standard normal curve" gives the areas for various values of Z. For example, Z = -1 to 0 and Z = 0 to +1 each gives the area 0.34134, as shown in the following figure.

Fig. 10.2

As the curve is symmetrical, the same area table can be used for negative values of Z. The area from Z = 0 to Z = 1 is 0.34134; similarly the area between Z = -1 and Z = 0 is also 0.34134.
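For readers without the printed table at hand, its entries can be approximated by computation. The sketch below is an illustration only: it assumes that each table entry is the area between Z = 0 and the given Z, which equals $\frac{1}{2}\,\mathrm{erf}(|Z|/\sqrt{2})$. The function name `table_area` and the choice of rows printed are assumptions.

```python
import math


def table_area(z):
    """Area under the standard normal curve between 0 and z.

    The absolute value is used because the curve is symmetrical,
    so the same entry serves for negative z.
    """
    return 0.5 * math.erf(abs(z) / math.sqrt(2.0))


# Print a small extract of the table for Z = 0.0, 0.5, ..., 3.0.
for step in range(0, 7):
    z = step * 0.5
    print("Z = %.1f   area = %.5f" % (z, table_area(z)))
```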

Example 10.1.
In a normal distribution mean is 100 and standard deviation is 10. Find:
(i) the mean deviation (ii) the quartile deviation
(iii) the third and fourth moments about mean (iv) moment ratios ($\beta_1$ and $\beta_2$)
(v) the lower and upper quartiles (vi) the median and mode
(vii) the values of points of inflection
(viii) the value of the maximum ordinate correct to four places of decimal.
Solution:
Here, $\mu$ = 100, $\sigma$ = 10, $\sigma^2 = \mu_2$ = 100
(i) Mean deviation = 0.7979 $\sigma$ = 0.7979 (10) = 7.979
(ii) Quartile deviation = 0.6745 $\sigma$ = 0.6745 (10) = 6.745
(iii) Third moment about mean = $\mu_3$ = 0, because all odd order moments about mean in a normal distribution are zero, i.e. $\mu_1 = \mu_3 = \mu_5 = \dots = 0$.
Fourth moment about mean = $\mu_4 = 3\sigma^4 = 3(10)^4 = 30000$
(iv) $\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{0}{(100)^3} = 0$ and $\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{30000}{(100)^2} = 3$
(v) $Q_1 = \mu - 0.6745\,\sigma = 100 - 0.6745(10) = 93.255$
$Q_3 = \mu + 0.6745\,\sigma = 100 + 0.6745(10) = 106.745$
(vi) Mean = Median = Mode = 100, because in a normal distribution the mean, median and mode coincide.
(vii) Normal distribution has two points of inflection which lie at a distance of one $\sigma$ above the mean $\mu$ and one $\sigma$ below the mean $\mu$, i.e.
$\mu - \sigma = 100 - 10 = 90$ and $\mu + \sigma = 100 + 10 = 110$
(viii) Maximum ordinate $= \frac{1}{\sigma\sqrt{2\pi}} = \frac{1}{10\sqrt{2\pi}} = \frac{1}{25.066} = 0.0399$
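The quantities worked out in Example 10.1 depend only on $\mu$ and $\sigma$, so they can be gathered in one small routine. The sketch below is an illustration only; the function name `normal_summary` and the dictionary keys are assumptions. The mean deviation is computed from the exact factor $\sqrt{2/\pi}$, which is approximately 0.7979, and the quartiles use the constant 0.6745 given in the text.

```python
import math


def normal_summary(mu, sigma):
    """Summary measures of a normal distribution with mean mu and s.d. sigma."""
    quartile_distance = 0.6745 * sigma  # constant taken from the text
    return {
        "mean_deviation": sigma * math.sqrt(2.0 / math.pi),  # about 0.7979 sigma
        "quartile_deviation": quartile_distance,
        "lower_quartile": mu - quartile_distance,
        "upper_quartile": mu + quartile_distance,
        "inflection_points": (mu - sigma, mu + sigma),
        "max_ordinate": 1.0 / (sigma * math.sqrt(2.0 * math.pi)),
    }


summary = normal_summary(100, 10)
for name, value in summary.items():
    print(name, "=", value)
```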
Example 10.2.
In a normal distribution, mean is zero and the standard deviation is 1. Write down its equation and find the value of the maximum ordinate correct to four places of decimal.
Solution:
The equation of the normal curve with mean $\mu$ and standard deviation $\sigma$ is

$$Y = f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}}, \qquad -\infty < X < \infty$$

When $\mu$ = 0 and $\sigma$ = 1, the equation of the normal curve will be

$$Y = f(X) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}X^{2}}$$

We know that the maximum ordinate is at X = $\mu$, and $\mu$ = 0, so the value of the maximum ordinate is

$$Y = \frac{1}{\sqrt{2\pi}}\, e^{0} = \frac{1}{\sqrt{2\pi}} \qquad (\text{since } e^{0} = 1)$$

$$= \frac{1}{\sqrt{2(3.1416)}} = \frac{1}{2.5066} = 0.3989$$
Example 10.3.
Find the area under the normal curve in each of the cases:
(i) Between Z = 0 and Z = 1.2 (ii) Between Z = -0.68 and Z = 0
(iii) Between Z = -0.46 and Z = 2.21 (iv) Between Z = 0.81 and Z = 1.94
(v) To the left of Z = -0.6 (vi) To the right of Z = -1.28
(vii) To the right of Z = 2.05 and to the left of Z = -1.44
Solution:
(i) Between Z = 0 and Z = 1.2
Required area = Area between Z = 0 and Z = 1.2 is 0.3849

(ii) Between Z = -0.68 and Z = 0
Required area = Area between Z = -0.68 and Z = 0 is 0.2518

(iii) Between Z = -0.46 and Z = 2.21
Required area = (Area between Z = -0.46 and Z = 0) + (Area between Z = 0 and Z = 2.21)
= 0.1772 + 0.4864 = 0.6636

(iv) Between Z = 0.81 and Z = 1.94
Required area = (Area between Z = 0 and Z = 1.94) - (Area between Z = 0 and Z = 0.81)
= 0.4738 - 0.2910 = 0.1828

(v) To the left of Z = -0.6
Required area = (Area to left of Z = 0) - (Area between Z = -0.6 and Z = 0)
= 0.5 - 0.2258 = 0.2742

(vi) To the right of Z = -1.28
Required area = (Area between Z = -1.28 and Z = 0) + (Area to right of Z = 0)
= 0.3997 + 0.5 = 0.8997

(vii) To the right of Z = 2.05 and to the left of Z = -1.44
Required area = Total area - (Area between Z = -1.44 and Z = 0) - (Area between Z = 0 and Z = 2.05)
= 1 - 0.4251 - 0.4798 = 0.0951
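The areas in Example 10.3 can be cross-checked without the table. The sketch below is an illustration only and does not reproduce the book's table lookup: it computes the cumulative area $\Phi(z)$ from the error function and takes differences. The names `phi` and `area_between` are assumptions, and small differences in the fourth decimal place against the printed table are to be expected.

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def area_between(z1, z2):
    """Area under the standard normal curve between z1 and z2 (z1 < z2)."""
    if z1 > z2:
        raise ValueError("z1 must not exceed z2")
    return phi(z2) - phi(z1)


areas = {
    "(i)": area_between(0.0, 1.2),
    "(ii)": area_between(-0.68, 0.0),
    "(iii)": area_between(-0.46, 2.21),
    "(iv)": area_between(0.81, 1.94),
    "(v)": phi(-0.6),                           # area to the left of Z = -0.6
    "(vi)": 1.0 - phi(-1.28),                   # area to the right of Z = -1.28
    "(vii)": 1.0 - area_between(-1.44, 2.05),   # the two tails together
}

for part, area in areas.items():
    print(part, round(area, 4))
```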
Example 10.4.
Given a normal distribution with $\mu$ = 40 and $\sigma$ = 6, find
(a) the area below 32 (b) the area above 27 (c) the area between 42 and 51.
Solution:
Here, $\mu$ = 40, $\sigma$ = 6, $Z = \frac{X-\mu}{\sigma} = \frac{X-40}{6}$

(a) $Z = \frac{32-40}{6} = \frac{-8}{6} = -1.33$
$P(X < 32) = P(Z < -1.33)$
$= P(-\infty < Z < 0) - P(-1.33 < Z < 0)$
$= 0.5 - 0.4082 = 0.0918$

(b) $Z = \frac{27-40}{6} = \frac{-13}{6} = -2.17$
$P(X > 27) = P(Z > -2.17)$
$= P(-2.17 < Z < 0) + P(0 < Z < \infty)$
$= 0.4850 + 0.5 = 0.9850$

(c) $Z_1 = \frac{X_1-40}{6} = \frac{42-40}{6} = \frac{2}{6} = 0.33$
$Z_2 = \frac{X_2-40}{6} = \frac{51-40}{6} = \frac{11}{6} = 1.83$
$P(42 < X < 51) = P(0.33 < Z < 1.83)$
$= P(0 < Z < 1.83) - P(0 < Z < 0.33)$
$= 0.4664 - 0.1293 = 0.3371$
Example 10.5.
A newspaper stall sells an average of 400 papers per day. Assume that these sales are normally distributed with a standard deviation of 25. For each of the following probability questions use graphs with both X and Z axes and indicate the corresponding areas under the normal curve. What is the probability that on a given day:
(a) more than 420 papers will be sold? (b) at most 410 papers will be sold?
(c) less than 395 papers will be sold? (d) between 390 and 405 papers will be sold?
Solution:
Here, $\mu$ = 400, $\sigma$ = 25, $Z = \frac{X-\mu}{\sigma} = \frac{X-400}{25}$

(a) $Z = \frac{420-400}{25} = 0.8$
$P(X > 420) = P(Z > 0.8)$
$= P(0 < Z < \infty) - P(0 < Z < 0.8)$
$= 0.5 - 0.2881 = 0.2119$

(b) $Z = \frac{410-400}{25} = 0.4$
$P(X \le 410) = P(Z \le 0.4)$
$= P(-\infty < Z < 0) + P(0 < Z \le 0.4)$
$= 0.5 + 0.1554 = 0.6554$

(c) $Z = \frac{395-400}{25} = -0.2$
$P(X < 395) = P(Z < -0.2)$
$= P(-\infty < Z < 0) - P(-0.2 < Z < 0)$
$= 0.5 - 0.0793 = 0.4207$

(d) $Z_1 = \frac{X_1-400}{25} = \frac{390-400}{25} = -0.4$
$Z_2 = \frac{X_2-400}{25} = \frac{405-400}{25} = 0.2$
$P(390 < X < 405) = P(-0.4 < Z < 0.2)$
$= P(-0.4 < Z < 0) + P(0 < Z < 0.2)$
$= 0.1554 + 0.0793 = 0.2347$
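The two steps used in Example 10.5, converting X to Z and then reading an area, can be combined in a short routine. The sketch below is an illustration only, not the book's table method; the function names `phi`, `prob_below` and `prob_between` are assumptions.

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_below(x, mu, sigma):
    """P(X < x) for a normal variable with mean mu and s.d. sigma."""
    return phi((x - mu) / sigma)  # standardize, then take the left-hand area


def prob_between(x1, x2, mu, sigma):
    """P(x1 < X < x2) for a normal variable with mean mu and s.d. sigma."""
    return prob_below(x2, mu, sigma) - prob_below(x1, mu, sigma)


mu, sigma = 400, 25  # daily sales from Example 10.5

p_more_than_420 = 1.0 - prob_below(420, mu, sigma)
p_at_most_410 = prob_below(410, mu, sigma)
p_less_than_395 = prob_below(395, mu, sigma)
p_390_to_405 = prob_between(390, 405, mu, sigma)

print("(a)", round(p_more_than_420, 4))
print("(b)", round(p_at_most_410, 4))
print("(c)", round(p_less_than_395, 4))
print("(d)", round(p_390_to_405, 4))
```

Because X is continuous, "at most 410" and "less than 410" give the same probability, so one function serves for both kinds of question.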
Example 10.6.
The heights of freshmen students at a military academy are normally distributed with a mean of 5 feet 10 inches and a standard deviation of 2 inches.
(a) What is the proportion of freshmen at the academy who are taller than 6 feet 3 inches?
(b) What is the proportion of freshmen who are less than 5 feet 7 inches?
(c) What is the proportion of freshmen between 5 feet 8 inches and 6 feet 0 inches?
Solution:
Here, $\mu$ = 70 inches, $\sigma$ = 2 inches, $Z = \frac{X-70}{2}$

(a) $Z = \frac{75-70}{2} = 2.5$
$P(X > 75) = P(Z > 2.5)$
$= P(0 < Z < \infty) - P(0 < Z < 2.5)$
$= 0.5 - 0.4938 = 0.0062$ or 0.62%

(b) $Z = \frac{67-70}{2} = -1.5$
$P(X < 67) = P(Z < -1.5)$
$= P(-\infty < Z < 0) - P(-1.5 < Z < 0)$
$= 0.5 - 0.4332 = 0.0668$ or 6.68%

(c) $Z_1 = \frac{X_1-70}{2} = \frac{68-70}{2} = -1$
$Z_2 = \frac{X_2-70}{2} = \frac{72-70}{2} = +1$
$P(68 < X < 72) = P(-1 < Z < 1)$
$= P(-1 < Z < 0) + P(0 < Z < 1)$
$= 0.3413 + 0.3413 = 0.6826$ or 68.26%
10.6 NORMAL FREQUENCY DISTRIBUTION
Sometimes we have to convert the normal probability distribution into a normal frequency distribution. When the probability distribution is multiplied with the total number of observations N, we get the normal frequency distribution. The normal probability distribution is

$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}}$$

whereas the normal frequency distribution is

$$N \cdot f(X) = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}}$$

For example, we know that the probability is 0.6827 that the random variable X will fall in the interval $\mu - \sigma$ to $\mu + \sigma$. The probability 0.6827 can be converted into the percentage of observations which lie between $\mu - \sigma$ and $\mu + \sigma$. This percentage is 68.27%. If the total number of observations is 1000, the interval $\mu - \sigma$ to $\mu + \sigma$ will contain 683 observations, i.e. 0.6827 x 1000 = 683.
Example 10.7.
In an intelligence test administered on 1000 children, the average I.Q. was 42 and standard deviation 24.
(a) Find the number of children exceeding a score of 50.
(b) Find the number of children lying between the scores 30 and 54.
Solution:
Here, $\mu$ = 42, $\sigma$ = 24, N = 1000, $Z = \frac{X-\mu}{\sigma} = \frac{X-42}{24}$

(a) $Z = \frac{50-42}{24} = 0.33$
$P(X > 50) = P(Z > 0.33)$
$= P(0 < Z < \infty) - P(0 < Z < 0.33)$
$= 0.5 - 0.1293 = 0.3707$
Hence the expected number of children exceeding a score of 50
$= N \cdot P(X > 50) = 1000(0.3707) = 370.7$ or 371 approximately.

(b) $Z_1 = \frac{X_1-42}{24} = \frac{30-42}{24} = -0.5$
$Z_2 = \frac{X_2-42}{24} = \frac{54-42}{24} = +0.5$
$P(30 < X < 54) = P(-0.5 < Z < 0.5)$
$= P(-0.5 < Z < 0) + P(0 < Z < 0.5)$
$= 0.1915 + 0.1915 = 0.3830$
Hence the expected number of children lying between the scores 30 and 54
$= N \cdot P(30 < X < 54) = 1000(0.3830) = 383$.
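Expected frequencies such as those in Example 10.7 follow by multiplying a normal probability by N. The sketch below is an illustration only; the names `phi` and `z_score` are assumptions. Z is rounded to two decimal places before the area is taken, an assumption made here to mimic reading a two-decimal area table; without that rounding the count in part (a) would come out slightly smaller.

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_score(x, mu, sigma, decimals=2):
    """Standard normal variate for x, rounded as when using a printed table."""
    return round((x - mu) / sigma, decimals)


N, mu, sigma = 1000, 42, 24  # values from Example 10.7

# (a) expected number of children scoring above 50
expected_above_50 = N * (1.0 - phi(z_score(50, mu, sigma)))

# (b) expected number of children scoring between 30 and 54
expected_30_to_54 = N * (phi(z_score(54, mu, sigma)) - phi(z_score(30, mu, sigma)))

print("Above 50:", round(expected_above_50))
print("Between 30 and 54:", round(expected_30_to_54))
```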

10.7 THE NORMAL APPROXIMATION TO THE BINOMIAL DISTRIBUTION
The (continuous) normal distribution provides a close approximation to the (discrete) binomial distribution when n, the number of trials, is very large and p, the probability of a success on an individual trial, is close to 1/2. To provide a theoretical foundation for this argument, let us make the following statement, a proof of which can be found in most of the texts in mathematical statistics.
If X is a random variable having a binomial distribution with the parameters n and p, then $Z = \frac{X - np}{\sqrt{npq}}$ approaches the standard normal distribution when n approaches infinity. Strictly speaking, this statement applies when n approaches infinity, but the normal distribution is often used to approximate binomial probabilities even when n is fairly small. A good rule of thumb is to use this approximation only when np and nq are both equal to or greater than 5. The procedure to follow in using a normal approximation to the binomial is as follows:
Step 1. Compute $\mu = np$ and $\sigma = \sqrt{npq}$.
Step 2. Apply a continuity correction factor to convert a discrete (binomial) random variable into a continuous (normal) random variable, so that the standardized normal Z transformation is

$$Z = \frac{\left(X - \frac{1}{2}\right) - \mu}{\sigma} \quad \text{or} \quad Z = \frac{\left(X + \frac{1}{2}\right) - \mu}{\sigma}$$

Step 3. Use a standard normal table to find the probabilities corresponding to Z in order to obtain the binomial b(x; n, p). For example,

$$P(X = a) = P\left[\frac{\left(a - \frac{1}{2}\right) - \mu}{\sigma} \le Z \le \frac{\left(a + \frac{1}{2}\right) - \mu}{\sigma}\right]$$

$$P(X \le b) = P\left[Z \le \frac{\left(b + \frac{1}{2}\right) - \mu}{\sigma}\right]$$

$$P(X \ge c) = P\left[Z \ge \frac{\left(c - \frac{1}{2}\right) - \mu}{\sigma}\right]$$

where a, b and c are some values of the random variable X.
10.8 INVERSE USE OF THE AREA TABLE
The area table of normal distribution is designed to give the areas for various values of the standard normal variate Z. But this same table can also be used to read the values of Z for a certain given area under the normal curve. This is called inverse use of the area table. Suppose there are 95% observations less than a certain point, say X ($P_{95}$). Clearly the area between $\mu$ and X is 0.45. From the area table we read the value of Z corresponding to the area equal to 0.45, which is Z = 1.645.
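The inverse reading of the table can be imitated by searching for the Z whose left-hand area equals a given proportion. The sketch below is an illustration only: it uses simple bisection on the cumulative area, which is a technique chosen here for clarity and not the book's table lookup. The function names are assumptions.

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_for_area_below(p, lo=-10.0, hi=10.0, iterations=100):
    """Value of Z that has proportion p of the area below it (0 < p < 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if phi(mid) < p:
            lo = mid  # too little area to the left, so move right
        else:
            hi = mid  # enough area to the left, so move left
    return (lo + hi) / 2.0


z_95 = z_for_area_below(0.95)
print("Z with 95% of the area below it:", round(z_95, 3))
```

Once Z is known, the corresponding point on the original scale follows from $X = \mu + Z\sigma$.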
Example 10.8.
A random variable X is normally distributed with mean = 40 and standard deviation = 4.
(i) Find a point that has 97% of the distribution below it.
(ii) Find a point that has 62.2% of the distribution below it.
(iii) Find a point that has 90% of the distribution above it.
(iv) Find two points containing the middle 98% area.
(v) Find two points containing the middle 95% area.
(vi) I.ind p2s, pss, p* and prr.
Solution:
Here, μ = 40, σ = 4, Z = (X - μ)/σ = (X - 40)/4
(i) P97 is a point having 97 percent of the area below it. Area table shows this point to be
1.88 = (X - 40)/4
X - 40 = 4(1.88)
X = 40 + 7.52 = 47.52
Thus, P97 = 47.52
(ii) P62.2 is a point having 62.2 percent of the area below it. Area table shows this point to be
0.3108 = (X - 40)/4
X - 40 = 4(0.3108)
X = 40 + 1.2432 = 41.2432
Thus, P62.2 = 41.2432
(iii) P10 is a point having 90% of the area above it. Area table shows this point to be
-1.28 = (X - 40)/4
X - 40 = 4(-1.28)
X = 40 - 5.12 = 34.88
(iv) 98% area under the normal curve means that there is 0.49 area each to the left and right of the mean. From the area table, the value of Z for which the area between 0 and Z is 0.49 is 2.33. Since X1 will be on the left of the mean, Z1 will be negative, and since X2 is on the right of the mean, Z2 will be positive.
Z1 = (X1 - 40)/4          Z2 = (X2 - 40)/4
-2.33 = (X1 - 40)/4       +2.33 = (X2 - 40)/4
X1 - 40 = 4(-2.33)        X2 - 40 = 4(2.33)
X1 = 40 - 9.32 = 30.68    X2 = 40 + 9.32 = 49.32
Hence the two points are 30.68 and 49.32.

(v) 95% area under the normal curve means that there is 0.4750 area each to the left and right of the mean. From the area table, the value of Z for which the area between 0 and Z is 0.4750 is 1.96. Since X1 will be on the left of the mean, Z1 will be negative, and since X2 is on the right of the mean, Z2 will be positive.
Z1 = (X1 - 40)/4          Z2 = (X2 - 40)/4
-1.96 = (X1 - 40)/4       +1.96 = (X2 - 40)/4
X1 - 40 = 4(-1.96)        X2 - 40 = 4(1.96)
X1 = 40 - 7.84 = 32.16    X2 = 40 + 7.84 = 47.84
Hence the two points are 32.16 and 47.84.


(vi) P20 is a point having 20 percent of the area below it. Area table shows this point to be
-0.8415 = (X - 40)/4
X - 40 = 4(-0.8415)
X = 40 - 3.366 = 36.634
Thus, P20 = 36.634.
P80 is a point having 80 percent of the area below it. Area table shows this point to be
0.8415 = (X - 40)/4
X - 40 = 4(0.8415)
X = 40 + 3.366 = 43.366
Thus, P80 = 43.366.
P95 is a point having 95 percent of the area below it. Area table shows this point to be
1.645 = (X - 40)/4
X - 40 = 4(1.645)
X = 40 + 6.58 = 46.58
Thus, P95 = 46.58.
P99 is a point having 99 percent of the area below it. Area table shows this point to be
2.33 = (X - 40)/4
X - 40 = 4(2.33)
X = 40 + 9.32 = 49.32
Thus, P99 = 49.32.

t p
Example 10.9.
In a normal distribution 31% of the items are under 54 and 8% are over 76. Find the mean and the standard deviation of the distribution.
Solution:
Let μ = Mean, σ = Standard deviation of the normal distribution.
Z = (X - μ)/σ,   Z1 = (X1 - μ)/σ = (54 - μ)/σ,   Z2 = (X2 - μ)/σ = (76 - μ)/σ
Since 31% of the items are under 54, the area to the left of the ordinate at X = 54 is 0.31. Therefore, the area between X = 54 and the mean μ is 0.5 - 0.31 = 0.19.
Then the corresponding value of Z1 is 0.4958, i.e.
Z1 = -0.4958 = (54 - μ)/σ
(We have taken Z1 to be negative because it falls on the left of the mean.)
54 - μ = -0.4958σ or μ - 0.4958σ = 54 ...... (1)
Again, it is given that 8% of the items are over 76. Therefore, the area under the normal curve between μ and 76 is 0.42 (or 42%).
The corresponding value of Z2 is 1.4053, i.e. Z2 = 1.4053 = (76 - μ)/σ
(We have taken Z2 to be positive because it falls on the right of the mean ordinate.)
76 - μ = 1.4053σ or μ + 1.4053σ = 76 ...... (2)
Solving equations (1) and (2), we get 1.9011σ = 22 or σ = 11.57
Substituting σ = 11.57 in equation (1), we get
μ - 0.4958(11.57) = 54 or μ = 54 + 5.7364 or μ = 59.7364
Hence, Mean = 59.74 and S.D. = 11.57

Example 10.10.
In a normal distribution the lower quartile is 10 and the upper quartile is 22. Find the mean and standard deviation of the distribution.
Solution: Here, Q1 = 10 and Q3 = 22
The two quartiles are given by
Q1 = μ - 0.6745σ and Q3 = μ + 0.6745σ
Substituting the values of Q1 and Q3, we get
μ - 0.6745σ = 10 ...... (1)      μ + 0.6745σ = 22 ...... (2)
Solving equations (1) and (2), we get 2μ = 32 or μ = 16
Substituting μ = 16 in equation (1), we get
16 - 0.6745σ = 10 or σ = 8.9
Thus, the mean and standard deviation of the normal distribution are 16 and 8.9 respectively.
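Both of the preceding examples solve two simultaneous equations of the form X = μ + Zσ. The sketch below does the same thing in Python. It is an illustration only: the function name `mean_sd_from_two_points` is hypothetical, and `NormalDist.inv_cdf` from the standard library is assumed in place of the area table, so the results may differ from the hand-worked answers in the last decimal place.

```python
from statistics import NormalDist


def mean_sd_from_two_points(x1, area_below_x1, x2, area_below_x2):
    """Solve x1 = mu + z1*sigma and x2 = mu + z2*sigma for mu and sigma."""
    z1 = NormalDist().inv_cdf(area_below_x1)  # Z value for the first point
    z2 = NormalDist().inv_cdf(area_below_x2)  # Z value for the second point
    sigma = (x2 - x1) / (z2 - z1)             # subtract the two equations
    mu = x1 - z1 * sigma                      # substitute back into the first
    return mu, sigma


# 31% of the items under 54 and 8% over 76 (so 92% are under 76).
mu_a, sigma_a = mean_sd_from_two_points(54, 0.31, 76, 0.92)

# Lower quartile 10 and upper quartile 22.
mu_b, sigma_b = mean_sd_from_two_points(10, 0.25, 22, 0.75)

print(round(mu_a, 2), round(sigma_a, 2))
print(round(mu_b, 2), round(sigma_b, 2))
```

The function expects the area below each point, so an "over" percentage must first be converted by subtracting it from 1.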
SHORT DEFINITIONS
Normal Distribution
A normal distribution is a particular idealized, smooth, bell-shaped histogram with all of the randomness removed. It represents an ideal data set that has lots of numbers concentrated in the middle of the range and trails off symmetrically on both sides. A data set follows a normal distribution if it resembles the smooth, symmetric, bell-shaped normal curve, except for some randomness. The normal distribution plays an important role in statistical theory and practice.
Standard Normal Distribution
The distribution of a normal random variable with mean zero and standard deviation one is called a standard normal distribution.
or
A normal distribution that has a mean of zero and standard deviation of one is called the standard normal distribution. If Z is the standard normal random variable, then Z has the probability distribution f(z) = (1/√(2π)) e^(-z²/2), -∞ < z < ∞.
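As a small illustration of the standard normal density f(z) = (1/√(2π)) e^(-z²/2), the sketch below evaluates its ordinates in Python. The function name `standard_normal_pdf` is hypothetical, and the comparison with the standard library's `NormalDist` is included only as a cross-check.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def standard_normal_pdf(z):
    """Ordinate of the standard normal curve at z."""
    return exp(-z * z / 2) / sqrt(2 * pi)


max_ordinate = standard_normal_pdf(0)   # highest point, at the mean Z = 0
ordinate_at_1 = standard_normal_pdf(1)  # ordinate at Z = 1 (same at Z = -1)

# Cross-check against the library implementation.
library_value = NormalDist().pdf(1)
print(round(max_ordinate, 4), round(ordinate_at_1, 4), round(library_value, 4))
```

Because z enters the formula only through z², the ordinates at Z = 1 and Z = -1 are equal, which reflects the symmetry of the curve about the mean.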
MULTIPLE CHOICE QUESTIONS
1. A normal distribution has the mean μ = 200. If 70 percent of the area under the curve lies to the left of 220, the area to the right of 220 is:
(a) 0.3 (b) 0.5
(c) 0.2 (d) 0.7
2. Given a normal distribution with μ = 100 and σ² = 100, the area to the left of 100 is:
(a) one (b) equal to 0.5
(c) less than 0.5 (d) greater than 0.5
3. A random variable has a normal distribution with the mean μ = 400. If 80 percent of the area under the curve lies to the left of 500, the area between 400 and 500 is:
(a) 0.5 (b) 0.2
(c) 0.3 (d) zero
4. In a normal distribution mean is 100 and standard deviation is 10. The values of points of inflection are:
(a) 100 and 110 (b) 80 and 120
(c) 90 and 110 (d) none of the above
5. If X is a normal variate with mean 20 and variance 16, the respective values of β1 and β2 are:
(a) 0 and 3 (b) 3 and 1
(c) 0.5 and 1 (d) 3 and 3
6. A random variable X is normally distributed with μ = 70 and σ² = 25. The third moment about arithmetic mean is:
(a) zero (b) less than zero
(c) greater than zero (d) none of the above
7. if X.is N(t,,iS), the fourth cent.ai ' the above
-- -' ,";;";;l;, Bd "
.; . ^t-'i_'"
^'
(a/ ua (b) 75
L,u
l.L (c:\ Rrr (d) 100
-\
8. In a standard normal distribution, P(Z > mean) is:
(a) more than 0.5 (b) less than 0.5
(c) equal to 0.5 (d) difficult to tell
9. Given a standardized normal distribution (with a mean of zero and a standard deviation of one), P(Z < variance) is equal to:
(a) 0.8413 (b) o.B.1lB !=e^1LX
i;\ j;fi7 r'/ s (cr) o'ooo, 'tLy=

'-g er= f
t'/
10 ancl X is N(10' 2Q)' thc. ttteatr of '
,.,
'o
1o):

l
(a) 50 (?) Go 6^
(c) ?o
*1b
.
lI. If X is a normal .anciom variablc

3
-:iJ,
'J = 7, rr'y = X - ?" then stanrlard cleviation
p = 50 arru
a.d Dr'dlrLrili-r
standar.cl dcviation

4
ofy is;'JV

,. iii J rL r r 9 [] ',1 ,f,,',i;;i;,, b-


t9 (b)
area tc the left of (p+3; for a nonnal distnbution
tr.;
\-/ (,r, o.lti
l'h.. is api_.roximately equal to:

a
:

t
0'il1
ic) 0.50

s
r3'
(a) o.tgtb / , '41*
distribution with * =
,5:: ::^ru&,obabliry

/
ffif ;1iral or a varue-srepr.e,.

: /L\ .,...,- /rfl-.f,-


s
(c) o6elb :=Et>; [:] '
a,V+' g-,[ ','''
, ^

tt p i
313i3
14. For a normal distribution with mean μ and standard deviation σ:
(a) Approximately 5% of values are outside the range (μ - 2σ) to (μ + 2σ)
(b) Approximately 5% of values are greater than (μ + 2σ)
(c) Approximately 5% of values are outside the range (μ - σ) to (μ + σ)
(d) Approximately 5% of values are less than (μ - 2σ)
15' - a";
The cl^stributio. a prop.e' probability distribution o1
'o.nal
random variable, the total-is a cohti,uous
area uncler it_," .u.r" ii*; i.,
.-. (a) equal to one ..ni i^" -"ll
lra'. 11 L!-o't'u' f:l"lP,,V distlibuti<.rn of a continuous ranclonr variablc, rht, v.Juc
J :t.:,undard deviation)is: *__,/
(.al zeto /L\ r^^--r
(c) greaterthanzerb-\ [:] HHf;;",.
*""_: '*:*-;J:::::::'

t] '> a'
17. The value of e is approximately equal to:
(a) 2.7183 (b) 2.1783
(c) 2.8173 (d) 3.1416
18. The value of π is approximately equal to:
(a) 3.4116 (b) 3.1416
(c) 3.1614 (d) 3.6416

19. The normal probability distribution with mean np and variance npq may be used to approximate the binomial distribution if both np and nq are:
(a) greater than 5 (b) less than 5
(c) equal to 5 (d) difficult to tell
20. The parameters of the normal distribution are:
(a) μ and σ² (b) μ and σ
(c) np and nq (d) n and p

o
21. The median of a normal distribution corresponds to a value of Z which is:
(a) 0 (b) 1
(c) 0.5 (d) -0.5
22. The mean and standard deviation of the standard normal distribution are respectively:
(a) 0 and 1 (b) 1 and 0
(c) μ and σ² (d) π and e
23. If a normal distribution with μ = 200 has P(X > 225) = 0.1587, then P(X < 175) is equal to:
(a) 0.3413 (b) 0.8413
(c) 0.1587 (d) 0.5000
24. Given a random variable X which is normally distributed with a mean and variance both equal to 100. The value of mean deviation is approximately equal to:
(a) 7 (b) 8
(c) 8.5 (d) 9
25. If X is a normal variate with mean 50 and standard deviation 3, the value of quartile deviation is approximately equal to:
(a) 1 (b) 1.5
(c) 2 (d) 2.5
26. In a normal probability distribution for a continuous random variable, the value of mean deviation is approximately equal to:
(a) 2/3 (b) 2/3 σ
(c) 4/5 (d) 4/5 σ

27. In a normal distribution whose mean is μ and standard deviation σ, the value of quartile deviation is approximately equal to:
(a) 4/5 (b) 4/5 σ
(c) 2/3 σ (d) 2/3
28. In a normal distribution, the lower and upper quartiles are equidistant from the mean and are at a distance of:
(a) 0.7979 (b) 0.7979 σ
(c) 0.6745 (d) 0.6745 σ
29, In a normal curve, the or:clinate

o
is highest at:

p
h) mean C (b) r,a'iance

s
(c) standard deviation (rl)
g
e,

30. The total area of the normal probability density function is equal to:
(a) 0 (b) 0.5
(c) 1 (d) 0.25
31. The normal curve is symmetrical and for a symmetrical distribution, the values of all odd order moments about mean will always be:
(a) 1 (b) 0.5
(c) 0.25 (d) 0
32. In a normal curve μ ± 0.6745σ covers:
(a) 50% area (b) 68.27% area
(c) 95.45% area (d) 99.73% area
33. The skewness and kurtosis of the normal distribution are respectively:
(a) zero and zero (b) zero and one
(c) one and zero (d) one and one
34. In a normal curve, the highest point on the curve occurs at the mean, μ, which is also the:
(a) median and mode (b) geometric mean and harmonic mean
(c) lower and upper quartiles (d) variance and standard deviation
35. The normal probability density function curve is symmetrical about the mean, μ, i.e. the area to the right of the mean is the same as the area to the left of the mean. This means that P(X < μ) = P(X > μ) is equal to:
(a) 0 (b) 1
(c) 0.5 (d) 0.25


36. The shape of the normal curve depends upon the value of:
(a) standard deviation (b) Q1
(c) mean deviation (d) quartile deviation
37. In a standard normal distribution, the value of mode is:
(a) equal to zero (b) less than zero
(c) greater than zero (d) exactly one
38. In a standard normal distribution, the area to the left of Z = 1 is:
(a) 0.6413 (b) 0.7413
(c) 0.8413 (d) 0.3413
39. The semi inter quartile range for a standard normal random variable Z is:
(a) 0.6745 (b) 0.6745 σ
(c) 0.7979 (d) 0.7979 σ
40. The lower and upper quartiles for a standardized normal variate are respectively:
(a) -0.6745 σ and 0.6745 σ (b) -0.6745 and 0.6745
(c) -0.7979 σ and 0.7979 σ (d) -0.7979 and 0.7979
41. The value of ttre standard deviation o of a normal distributron is alrvays:

9
(a) zero
equal to greater rhan zero

9
(c) zero (d)
t (b) X=p+o '',..--1
Iess than equal to r . t Ii')

a
42. The maximum ordinate of a normal curve is at:
(a) X = μ (b) X = μ + σ
(c) X = μ - 2σ (d) X = σ²
43. If X ~ N(100, 64), then standard deviation σ is:
(a) 100 (b) 64
(c) 8 (d) 100 - 64 = 36
44. The value of second moment about the mean in a normal distribution is 5. The fourth moment about the mean in the distribution is:
(a) 5 (b) 15
(c) 25 (d) 75
45. Most of the area under the normal curve with parameters μ and σ lies between:
(a) μ - 0.5σ and μ + 0.5σ (b) μ - σ and μ + σ
(c) μ - 2σ and μ + 2σ (d) μ - 3σ and μ + 3σ
46. If X is a normal random variable having mean μ, then E|X - μ| is equal to:
(a) variance (b) standard deviation
(c) quartile deviation (d) mean deviation
lCha 101 Normal Distribution
-*4i.__,If
X is a normal random variable having rnean p, the E (X - p;t';
(a) o,2 (b) o"
(c) 3o'r (d) Fr
48. Which of the f.llowing is possible in normal distribution:
(a) o<0 ' (b) o=0
(c)l, o>0 (d) o>n

49. The range of normal distribution is:
(a) 0 to n (b) 0 to ∞
(c) -1 to +1 (d) -∞ to +∞
50. The range of standard normal distribution is:
(a) 0 to n (b) 0 to ∞
(c) 0 to k (d) -∞ to +∞

#log
51. In the normal distribution, the iralue of the maximum ordinate is equal to:

#(a) (b)
(d) b
(c)
# 3 . ft,
4(b) #..
'{3:rrh the inate at points of inflection of the normal curve is equal to:

9
9 (d) ;hr E\
a t
t
"__"-t\

53. P(μ - σ < X < μ + σ) is equal to:
(a) 0.5000 (b) 0.6827
(c) 0.9545 (d) 0.9973
54. In a normal curve μ ± 2σ covers:
(a) 50% area (b) 68.27% area
(c) 95.45% area (d) 99.73% area
55. If X is N(μ, σ²), the percentage of the area contained within the limits μ ± 3σ is:
(a) 50% (b) 68.27%
(c) 95.45% (d) 99.73%
56. The probabilily dcnsity function of the standard normal distribution is:
,, IJ .t2
1 !!_
(a) --+e 2o (b) -+e 4

w:ln oV:Z7re

1 -",'!2
.^ft- _22
(c) u (d)
1
e4
VStr ^JZ"
\
Basic Statlstlcs Part-II

67, The equation of the normal frequency distribution is:

-i(t,)' +.i(+)'
(a)
;h" (b)
vzTr

G) #'-*(?)' (d) +" j('"r'


m
6VZ7I

o
58. If X is N(μ, σ²) and if Y = a + bX, then mean and variance of Y are respectively:
(a) μ and σ² (b) a + μ and bσ²
(c) a + bμ and σ² (d) a + bμ and b²σ²
;;il, o
-f ,]'it,,,ili
x1.{,'',.11

p
6r) ;;; J,-.;ffiil variat., tr,u)i'pr ; .n.",',i,

s
(a) 0.0260 ".rmar
(b) 0,4760 ,_ 1'
"l ',---
^1 o L-

(c) 0,96
og (d)' 0.9760 ?t

(b) l
60. lf a ctandard normal variatc, then P(: 2,676 SZ S + 2.676;7 ts equal to:

(d).b
Z Le

(a) 0,9961 o.ee


(e) 0.4961
3
4(b)
0,4949

9
61. If Z is a standard normal variate, then P(-1.645 ≤ Z ≤ +1.645) is equal to:
(a) 0.90 (b) 0.95
(c) 0.98 (d) 0.99
62. If Z is a standard normal variate, then P(-2.33 ≤ Z ≤ +2.33) is equal to:
(a) 0.4901 (b) 0.6827
(c) 0.9545 (d) 0.9802
63. In a normal distribution:
(a) mean = median = mode (b) mean < median < mode
(c) mean > median > mode (d) mean ≠ median ≠ mode
64. In a normal distribution Q1 = 20 and Q3 = 40, then mean is equal to:
(a) 20 (b) 30
(c) 40 (d) 60

66, The value of maximum ordinate in etandard nonnal eliutriltutiun ie e qual to:

#
I
(a) ('0) ::
ttZre

(c) :L r/zo
(d)
1

s!2n
-
1. (a) 2. (b) 3. (c) 4. (c) 5. (a) 6. (a) 7. (b) 8. (c)
9. (a) 10. (b) 11. (a) 12. (d) 13. (d) 14. (a) 15. (a) 16. (c)
17. (a) 18. (b) 19. (a) 20. (b) 21. (a) 22. (a) 23. (c) 24. (b)
25. (c) 26. (d) 27. (c) 28. (d) 29. (a) 30. (c) 31. (d) 32. (a)
33. (a) 34. (a) 35. (c) 36. (a) 37. (a) 38. (c) 39. (a) 40. (b)
41. (b) 42. (a) 43. (c) 44. (d) 45. (d) 46. (d) 47. (a) 48. (c)
49. (d) 50. (d) 51. (d) 52. (b) 53. (b) 54. (c) 55. (d) 56. (c)
57. (d) 58. (d) 59. (c) 60. (b) 61. (a) 62. (d) 63. (a) 64. (b)
65. (c)

og
1. Guests at a large hotel stay for an average of 9 days with a standard deviation of 2.4 days. Among 1000 guests how many can be expected to stay less than 7 days? Assume that length of stay is normally distributed.
Ans. 203

9 tiailiouioi G;;rffiil;ffi;ii;
2' A rnaehine which altotnutieally paeks potatocd into baga ir

t
hnow.n to opcrage
with a mcan of r0 kg, and

a
deviatioo

t
'tandard
fincl thc pcroenrage of bags weight morc i.s,

/: / s
Ang.60ot
3. The heights of a large sample of men were found to be approximately normally distributed with mean 67.56 inches and standard deviation 2.57 inches. Find the height exceeded by 5% of the men.
Ans. 71.79

:*,:flf.y'forh
rll8t'r'ibuted witlr standard dcvtation B,7E
rability of waltlng to go lnto
more than 20 minurec iB 0,0ggg: tr thc warring trme
Ir normdly
minutes, flnd thi iiai-*iitrog dmo,
Ans, ll,6l
5. If X is normally distributed with a mean of 4 and a standard deviation of 4, find the probability that X is less than 6.
Ans. 0.6915
6. Find the probability that the value of a standard normal variable is less than 2.
Ans. 0.9772
7. Find the probability that the value of a standard normal variable exceeds 1.5.
Ans. 0.0668
8. The scores made by candidates in a certain test are normally distributed with mean equal to 500 and standard deviation equal to 100. Find the probability that a score will be greater than 700.
Ans. 0.0228
9. A manufacturer of pipe knows that the pipe lengths it produces vary in diameter and that the diameters are normally distributed. The mean diameter is 1 inch and the probability that a length of pipe will have a diameter exceeding 1.1 inches is 0.1587. Find the variance of the diameters.
Ans. 0.01 (inches)²
10. The mean wages of a certain group of workers working in a factory is Rs.285 with standard deviation of Rs.50. Find the percentage of workers who get above Rs.200.
Ans. 95.54%
11. For a normal distribution the first moment about 10 is 40 and the fourth moment about 50 is 48. Find the standard deviation of the distribution.
Ans. 2
12. In a normal distribution μ = 163 and Q3 = 171.094. Find the standard deviation.
Ans. 12
13. In a normal distribution the lower quartile is 10 and the standard deviation is 10. Find the mean of the distribution.
Ans. 16.745
14. A normal distribution has the mean μ = 85 and standard deviation σ = 4.5. Find the value of Q3.
Ans. 88.04
15. If X is a normal random variable with mean μ = 113.49 and standard deviation σ = 20, find the value of Q1.
Ans. 100
16. Define normal distribution.
17. Write down any five properties of normal distribution.
18. Write down the equation of the normal curve
(i) with mean μ and standard deviation σ
(ii) with mean 50 and standard deviation 10.
19. Define the normal probability density function.
20. Define the normal frequency distribution.
21. Define the standard normal distribution.
22. What is the relationship between the binomial distribution and the normal distribution?
23. Describe the important properties of the normal distribution.
24. What is a standardized normal variate?
25. Write down the ordinates of the standard normal curve at
(i) Z = 1 (ii) Z = -1.
26. Explain why odd order moments about mean equal zero for the normal distribution.
27. Explain why β1 equals zero for the normal distribution.
28. The normal curve is defined by the equation
f(X) = (1/(σ√(2π))) e^(-(X - μ)²/(2σ²))
where μ = ____, σ = ____, π = ____, e = ____, X = ____
29. Define the points of inflection in a normal distribution.
30. Sketch the normal curve and then place the values for the means on the respective X and Z scales. Verify that the area under the normal curve between the mean and 2 standard deviations above and below it is 0.9544.
31. Sketch and verify that the area under the normal curve between the mean and 3 standard deviations above and below it is 0.9973.
32. When is it appropriate to use a normal approximation to the binomial distribution?
33. Write down the basic properties of the standard normal curve.
34. Complete the following table for the normal curve with parameters μ and σ. Draw four graphs illustrating your results.

Given X-values                   Corresponding Z-scores    Area between X-values
μ - 0.6745σ and μ + 0.6745σ      -0.6745 and +0.6745       0.50
μ - 1σ and μ + 1σ
μ - 2σ and μ + 2σ
μ - 3σ and μ + 3σ
EXERCISES
1. If the random variable Z has the standard normal distribution, find:
(i) P(Z < 1.46) (ii) P(Z > 1.46) (iii) P(Z < -1.48)
(iv) P(Z > -1.96) (v) P(0.65 < Z < 1.99) (vi) P(0 < Z < 1.15)
(vii) P(-1.32 < Z < 1.65) (viii) P(-1.25 < Z < 0).
Ans. (i) 0.9279 (ii) 0.0721 (iii) 0.0694 (iv) 0.9750 (v) 0.2345
(vi) 0.3749 (vii) 0.8571 (viii) 0.3944.
2. In a normal distribution, M.D. = 3.9895. Find the standard deviation, quartile deviation, and second and fourth moments about mean of the normal distribution.
Ans. S.D. = 5, Q.D. = 3.3725, μ2 = 25, μ4 = 1875.
3. In a normal distribution with μ = 20 and σ = 5, find the area:
(iv) between 12 and 18 (v) between 30 and 42
Ans. (i) 0.4772 (ii) 0.8413 (iii) 0.0228 (iv) 0.2898 (v) 0.0228

b
d. The mean safes of all the different branches of a big cloth shop is Rs.10000 with

3 .
pencoDtage/proportion of shops the sales of which are between Rs.11000 and
R8.12000.

9 4
9
Ang.zlf.93 %.

t
tlXis a normal variate with mean I aryd standard deviation 3, find the probability

ta
s
Ans. t I o'iazftilo.zasz

/: /
6. In a normal distribution the mean is five and the variance is one. Write down its equation. Also find the value of maximum ordinate correct to two places of decimals.
Ans. f(x) = (1/√(2π)) e^(-(x - 5)²/2), 0.40
7. The scores made by candidates in a certain test are normally distributed with mean 500 and standard deviation 100. What percent of the candidates received scores (i) greater than 700 (ii) less than 400 (iii) between 400 and 600 (iv) which differ from mean by more than 150?
Ans. (i) 2.28% (ii) 15.87% (iii) 68.26% (iv) 13.36%
8. If the average height of miniature poodles is 30 centimeters, with a standard deviation of 4.1 centimeters, what percentage of miniature poodles exceeds 35 centimeters in height, assuming that the heights follow a normal distribution and can be measured to any desired degree of accuracy?
Ans. 11.12%
9. The diameters of bolts manufactured by a company are normally distributed with mean 0.25 inches and standard deviation 0.02 inches. A bolt is considered defective if its diameter is < 0.20 or > 0.28 inches. Find the percentage of defective bolts manufactured by the company.
Ans. 7.3%
10. If the weights of ball bearings are normally distributed with mean 0.6140 newtons and standard deviation 0.0025 newtons, determine the percentage of ball bearings with weights (i) between 0.610 and 0.618 newtons (ii) greater than 0.617 newtons (iii) less than 0.608 newtons.
Ans. (i) 89.04% (ii) 11.51% (iii) 0.82%
11. Let X be a normal random variable with mean = 16 and standard deviation = 5. Determine:
(i) P(between 11 and 21) (ii) P(at least 26)
(iii) P(less than or equal to 6) (iv) P(at most 21)
Ans. (i) 0.6826 (ii) 0.0228 (iii) 0.0228 (iv) 0.8413
12. In a certain examination 3000 students appeared. The average marks obtained were 50% and standard deviation was 5%. How many students do you expect to obtain:
(i) More than 60% marks (ii) Less than 40% marks
(iii) Between 40% and 60% marks?
Ans. (i) 68 (ii) 68 (iii) 2864.

t9 l-/
t'g. A..r-L the mean height of soldiers poffi,.{inches with a variance of I inches.

a
How many soldiers in a regiment
,"Y { lO0[would you expect to be cjver six feet
errs/sg
s t " ' N
14. The mean life of stockings used by an army was 40 days, with a standard deviation of 8 days. Assume the life of the stockings follows a normal distribution. If 100000 pairs are issued, how many would need replacement
(i) before 35 days? (ii) after 46 days?
Ans. (i) 26600 (ii) 22660.
15. Given that μ = 300 and σ² = 100. Find
(i) the area above 314 (ii) the two values that contain the middle 75% area
(iii) Q1 and Q3 of the normal distribution.
Ans. (i) 0.0808 (ii) 288.5, 311.5 (iii) 293.255, 306.745
16. A random variable X is normally distributed with mean = 70 and S.D. = 5.
(i) Find a point that has 87.9% of the distribution below it.
(ii) Find a point that has 81.7% of the distribution above it.
(iii) Find two such points between which the central 70% of the distribution lies.
(iv) Find two such points between which the central 90% of the distribution lies.
Ans. (i) 75.85 (ii) 65.48 (iii) 64.815, 75.185 (iv) 61.775, 78.225
17. In a normal distribution μ = 40 and σ = 3.8. Find:
(i) two points such that the curve has a 98% chance of falling between them
(ii) the chance that a single observation will be less than 38.6.
Ans. (i) 31.146, 48.854 (ii) 0.3557
18. If X is N(24, 16), then find the:
(i) 33rd percentile (ii) 9th decile.
Ans. (i) 22.24 (ii) 29.12.
19. In a normal distribution 30% of the values are under 50 and 10% are over 70. Find the mean and standard deviation of the distribution.
Ans. (55.81, 11.09)
20. Given a normal distribution with μ = 40 and σ = 6, find the value X that has:
(i) 38% of the area below it.
(ii) 5% of the area above it.
Ans. (i) 38.167 (ii) 49.87
l
J" '

b
is

.
students whose average r'i"'ight 150
21.Assume that We have a large number of th,"
lbs. and that the weights u'" "o'*'llv d"t"'b"e

3
If y"
i* ;;;; lt""Y
;;,d io a ru. w$ I !+v *re
f9*l:;l,lii
n d a r ddEi: a tion of

4
ffi ,:l?.'il il JT.i#r ffi
s t a
. ./ !\ - o.'l\tn -r

99
t
t,lrlD. , r .vv .._
i ?1 \.\,
22. In a normal distribution 25% of the items are under 60 and 15% are over 90. Find the mean and standard deviation of the distribution.
Ans. (71.82, 17.53)
23. In a "normal distribution" the lower and upper quartiles are 18 and 26 respectively. Find its mean and standard deviation. Also find mean deviation.
Ans. 22, 5.93, 4.73
24. In a normal distribution 7% of the items are under 35 and 89% are under 63. What is the mean and standard deviation of the distribution?
Ans. 50.29, 10.33
25. The heights of a large sample of men were found to be approximately normally distributed with mean 67.56 inches and standard deviation 2.57 inches. What height is exceeded by 5% of the men?
Ans. 71.79.
Chapter 11
SAMPLING AND SAMPLING DISTRIBUTIONS
11.1 INTRODUCTION
In our daily life we quite often have to examine some given material. We examine fruit before we purchase it, and we make a small study of the material whenever we have to purchase something. Even children check the sweets, pencils, bats, rubbers and other items when they have to purchase them. This approach is applied in different fields of life. The products of factories are inspected to ensure the desired quality of the products. Medicines are manufactured on a commercial scale when their effects have been tested on patients. Different fertilizers are tested on agricultural plots and different foods are tested on animals. Small dams are constructed in laboratories to study the life and other characteristics of big dams before they are actually constructed. Some colour may be applied on a wall, on a door or cloth etc., and the result of the colour is observed before it is applied on a large scale. Cement, steel and bricks are examined before using them in different places. This process of inspection is very wide and is commonly used on various occasions. But this job is never done on a very large scale. This process is carried out on a small scale. On the basis of this small study, we form an opinion about the entire material under study.

11.2 POPULATION
The word population or statistical population is used for all the individuals or objects on which we have to make some study. We may be interested to know the quality of bulbs produced in a factory. The entire product of the factory in a certain period is called a population. We may be interested in the level of education in primary schools. All the children in the primary schools will make a population. The population may contain living or non-living things. The entire lot of anything under study is called population. All the fruit trees in a garden, all the patients in a hospital and all the cattle in a cattle farm are examples of populations in different studies.
11.2.1 FINITE POPULATION
A population is called finite if it is possible to count its individuals. It may also be called a countable population. The number of vehicles crossing a bridge every day, the number of deaths per year and the number of words in a book are finite populations. The number of units in a finite population is denoted by N. Thus N is the size of the population.
11.2.2 INFINITE POPULATION
Sometimes it is not possible to count the units contained in the population. Such a population is called infinite or uncountable. Let us suppose that we want to examine whether a coin is true or not. We shall toss it a very large number of times to observe the number of heads. All the tosses will make an infinite or countably infinite population. The number of germs in the body of a patient of malaria is perhaps something which is uncountable.
11.2.3 TARGET AND SAMPLED POPULATION
Suppose we have to make a study about the problems of the families living in rented houses in a certain big city. All the families living in rented houses make up our target population. The entire target population may not be considered for the purpose of selecting a sample from the population. Some families may not be interested to be included in the sample. We may ignore some part of the target population to reduce the cost of study. The population out of which the sample is selected is called sampled population or studied population.

l o
11.3 SA}IPLE
Any part of the popuiation is callecl a sarnple. A study of the san[tle ehables
us

. b
population^ The number of units
tc rnake some decisionsabout the properiies ef the

3
bv n' A good
included in the sotrtple is called th,: size of the sonLple and is denoted
population . A sctntple

4
sornple is that or," *hi.h speaks about the qualities of ttre
This process

9
study ieads us to make some inferences about the populatio'n measures'

9
is sornpling.

t
"ailed AND STA ISTIC
11,3.1 PARAMETER

a
Any measnre of the population is cali.eri puratneter and the rvord sladis/ic is
used for any
s t
i,alue calculaied t'rom the sample The populaticn mean pt is a

/: /
pot.ctrneter and the sample mean X ,s a t;t'ai,i'st,ic. The sample mea.n
X is used to
r'"i""u o' ts a ltaronreter
s
estimate the population mean pt' Sirnilarll' ilte F;opulation
*nJ tn" .r*pt" r,ariance S3 is a stotisti,c; In gerlerai ihe syrrrbol 0 is used for a

tt p
paranteter is
poirarneter anrl the syrnbol 6 iu ,."".1 fcrr a siati.stic. The vaiue of the

h
mostly"unknown anrl-the sarrple starist:r: is '":secl to ura.kr: snme i.nferences about the
trnknorvn paronreler
11.3.2 SAMPLING FRACTION
If size of the population is N and size of the sample is n, the ratio $n/N$ is called
the sampling fraction. If N = 100 and n = 10, the ratio $n/N = 10/100 = 1/10$. It means
that on the average 10 units of the population will be represented by one unit in
the sample.
If the sampling fraction is multiplied by 100, we get the sampling fraction in
percentage form. Thus $\frac{n}{N} \times 100 = \frac{10}{100} \times 100 = 10\%$. It
means 10% of the population is included in the sample.
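The arithmetic of the sampling fraction can be checked with a few lines of Python. This is only an illustrative sketch: the function name `sampling_fraction` is our own choice and is not part of the text, and the values N = 100, n = 10 are the ones used in the example above.

```python
from fractions import Fraction

def sampling_fraction(n, N):
    """Return the sampling fraction n/N as an exact fraction (hypothetical helper)."""
    if not 0 < n <= N:
        raise ValueError("sample size must satisfy 0 < n <= N")
    return Fraction(n, N)

f = sampling_fraction(10, 100)   # n = 10 units drawn from N = 100
percent = f * 100                # sampling fraction in percentage form

print(f)        # 1/10
print(percent)  # 10
```

Using exact fractions avoids rounding, so the result reads directly as "one unit of the sample represents 10 units of the population".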

11.4 COMPLETE COUNT
If we collect information about all the individuals in the population, the study
is called complete count or complete enumeration. The word census is also used for
the entire population study. In statistical studies the complete count is usually
avoided. If size of the population is large, the complete count requires a lot of time
and a lot of funds. The complete count is mostly difficult for various reasons. Suppose
we want to make a study about the cattle in the cattle farms in our country. We are
interested in the average cost of their food for a certain period. We want to link their
cost of food with their sale price. This is of course an important study. It is very
difficult to collect and maintain the information about each and every animal in the
farms. If at all we are able to do it, the study may not be of much use. The desired
information can be obtained from a sample of the cattle of reasonable size.
11.4.1 POPULATION CENSUS
A complete count of the human population is called population census. In
Pakistan, the first population census was conducted in 1951 and the second was
conducted in 1961. The third census of population could not be conducted in 1971
because of agitations in the then East Pakistan. It was conducted in 1972. The 4th
census was conducted in 1981. The fifth population census was conducted in 1998. A
lot of information is collected about the human population through the population
census conducted regularly after every 10 years. The census reports give information
about various characteristics of the population e.g., the urban and rural population,
the skilled and un-skilled labour force, the agricultural labour force and the
industrial workers, level of education and illiteracy in the country, geographical
distribution of the population, age and sex distribution of the population etc.
11.5 SAMPLE SURVEY
If it is not essential to conduct the complete enumeration, then a sample of
some suitable size is selected from the population and the study is carried out on the
sample. This study is called sample survey. Most of the research work is done
through sample surveys. The opinion of the voters in favour of certain proposed
election candidates is obtained through sample surveys.
11.5.1 ADVANTAGES OF SAMPLING
Sampling has some advantages over the complete count. These are:
(i) Need for Sampling
Sometimes there is a need for sampling. Suppose we want to inspect the eggs,
the bullets, the missiles and the tires of some firm. The study may be such that the
objects are destroyed during the process of inspection. Obviously, we cannot afford to
destroy all the eggs and the bullets etc. We have to take care that the wastage
should be minimum. This is possible only in sample study. Thus sampling is
essential when the units under study are destroyed.
(ii) Saves Time and Cost
As the size of the sample is small as compared to the population, the time and
cost involved in a sample study are much less than in the complete count. For complete
count huge funds are required. There is always the problem of finances. A small
sample can be studied in a limited time and total cost of sample study is very small.
For complete count, we need a big team of supervisors and enumerators who are to
be trained and they are to be paid properly for the work they do. Thus the sample
study requires less time and less cost.
(iii) Reliability
If we collect the information about all the units of population, the collected
information may be true. But we are never sure about it. We do not know whether
the information is true or is completely false. Thus we cannot say anything with
confidence about the quality of information. We say that the reliability of a complete
count cannot be assessed. This is a very important advantage of sampling. The
inference about the population parameters is possible only when the data is collected
from a selected sample.
(iv) Sometimes the experiments are done on sample basis. The fertilizers, the seeds
and the medicines are initially tested on samples and if found useful, then they are
applied on large scale. Most of the research work is done on the samples.
(v) Sample data is also used to check the accuracy of the census data.
11.5.2 LIMITATIONS OF SAMPLING
Sometimes the information about each and every unit of the population is
required. This is possible only through the complete enumeration because the
sample will not serve the purpose. Some examples in which the sampling is not
allowed are:
(i) To conduct the elections, we need a complete list of the voters. The candidates
participating in the election will not accept the results prepared from a sample.
With increase in literacy, the people may become statistically minded and they
may become willing to accept the results prepared from the sample. In advanced
countries the opinion polls are frequently conducted and unofficially the people
accept the results of sample surveys.
(ii) Tax is collected from all the tax payers. A complete list of all the tax payers is
required. The telephone, gas and electricity bills are sent to all the consumers. A
complete list of the owners of land and property is always prepared to maintain
the records. The position of stocks in factories requires complete entries of all
the items in the stock.
11.5.3 SAMPLE DESIGN
In sample studies, we have to make a plan regarding the size of the sample,
selection of the sample, collection of the sample data and preparation of the final
results based on the sample study. The whole procedure involved is called the
sample design. The term sample survey is used for a detailed study of the sample. In
general, the term sample survey is used for any study conducted on the sample
taken from some real world data.
11.5.4 SAMPLING FRAME
A complete list of all the units of the population is called the sampling frame. A
unit of population is a relative term. If all the workers in a factory make a
population, a single worker is a unit of the population. If all the factories in a
country are being studied for some purpose, a single factory is a unit of the
population of factories. The sampling frame contains all the units of the population.
It is to be defined clearly as to which units are to be included in the frame. The
frame provides a base for the selection of the sample.
11.5.5 EQUAL PROBABILITY
The term equal probability is frequently used in the theory of sampling. This
term is quite often not understood correctly. It is thought to be close to 'equal' in
meaning. It is not true always. Suppose there is a population of 50 (N = 50) students
in a class. We select any one student. Every student has probability 1/50 of being
selected. Then a second student is selected. Now, there are 49 students in the
population and every student has 1/49 probability of being selected. When the first
student is selected, all the students have equal (1/50) chance of selection and when
the second student is selected, again all the students have equal (1/49) chance of
selection. But 1/50 is not equal to 1/49. Thus equal probability of selection means the
probability when the individual is selected from the remaining available units in the
population. At the time of selecting a unit, the probability of selection is equal. It is
called equal probability of selection.
11.5.6 KNOWN PROBABILITY
In sampling theory the term known probability is used in random (probability)
sampling. Let us explain it by taking an example. Suppose there are 300 workers in
a certain factory out of which 200 are skilled and 100 are un-skilled. We have to
select one sample (sub-sample) out of skilled workers and one sample out of
un-skilled workers. When the first worker out of skilled workers is selected, each
worker has a probability of selection equal to 1/200. Similarly when the first worker
out of un-skilled workers is selected, each worker has a probability of selection equal
to 1/100. Both these probabilities are known, though they are not equal.
11.5.7 NON-ZERO PROBABILITY
Suppose we have a population of 500 students out of which 50 are
non-intelligent. We have decided to select an intelligent student from the population.
The probability of selecting an intelligent student is 1/450 which is non-zero. In this
example, we have decided to exclude the non-intelligent students from the
population for the purpose of selecting a sample. Thus probability of selecting a
non-intelligent student is zero.
11.6 PROBABILITY AND NON-PROBABILITY SAMPLING
The term probability sampling is used when the selection of the sample is
purely based on chance. The human mind has no control on the selection or
non-selection of the units for the sample. Every unit of the population has known
non-zero probability of being selected for the sample. The probability of selection may
be equal or unequal but it should be non-zero and should be known. The probability
sampling is also called the random sampling (not simple random sampling). Some
examples of random sampling are:
(i) Simple random sampling
(ii) Stratified random sampling
(iii) Systematic random sampling.
In non-probability sampling, the sample is not based on chance. It is rather
determined by some person. We cannot assign to an element of population the
probability of its being selected in the sample. Somebody may use his personal
judgement in the selection of the sample. In this case the sampling is called
judgement sampling. A drawback in non-probability sampling is that such a sample
cannot be used to determine the error. No statistical method can be used to
draw inference from this sample. But it should be remembered that judgement
sampling becomes essential in some situations. Suppose we have to take a small
sample from a big heap of coal. We cannot make a list of all the pieces of coal. The
upper part of the heap will have perhaps big pieces of coal. We have to use our
judgement in selecting a sample to have an idea about the quality of coal. The
non-probability sampling is also called non-random sampling.
11.6.1 SAMPLING WITH REPLACEMENT
Sampling is called with replacement when a unit selected at random from the
population is returned to the population and then a second element is selected at
random. Whenever a unit is selected, the population contains all the same units. A
unit may be selected more than once. There is no change at all in the size of the
population at any stage. We can assume that a sample of any size can be selected
from the given population of any size. This is only a theoretical concept and in
practical situations the sample is not selected by using this scheme of selection.
Suppose the population size N = 5 and sample size n = 2, and sampling is done with
replacement. Out of 5 elements, the first element can be selected in 5 ways. The
selected unit is returned to the main lot and now the second unit can also be selected
in 5 ways. Thus in total there are 5 x 5 = 25 samples or pairs which are possible.
Suppose a container contains 3 good bulbs denoted by G1, G2 and G3 and 2 defective
bulbs denoted by D1 and D2. If any two bulbs are selected with replacement, there
are 25 possible samples listed in Table 11.1.

Table 11.1

        G1      G2      G3      D1      D2
G1      G1G1    G1G2    G1G3    G1D1    G1D2
G2      G2G1    G2G2    G2G3    G2D1    G2D2
G3      G3G1    G3G2    G3G3    G3D1    G3D2
D1      D1G1    D1G2    D1G3    D1D1    D1D2
D2      D2G1    D2G2    D2G3    D2D1    D2D2

The number of possible samples is given by $N^n = 5^2 = 25$. The selected sample will
be any one of the 25 samples. Each sample has equal probability 1/25 of
selection. A sample selected in this manner is called simple random sample.
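The 25 ordered pairs of Table 11.1 can be generated mechanically. The sketch below uses Python's standard `itertools.product`; the variable names are our own and the bulb labels are those of the example.

```python
from itertools import product
from fractions import Fraction

# Population of N = 5 bulbs: 3 good (G) and 2 defective (D)
bulbs = ["G1", "G2", "G3", "D1", "D2"]
n = 2  # sample size

# Sampling with replacement: every ordered pair, repeats allowed
samples_wr = list(product(bulbs, repeat=n))

# Each of the N**n samples is equally likely
prob_each = Fraction(1, len(samples_wr))

print(len(samples_wr))   # 25
print(prob_each)         # 1/25
print(samples_wr[:5])    # first row of the table: G1 paired with each bulb
```

The list contains pairs such as ("G1", "G1"), which can occur only because the first bulb is returned to the lot before the second draw.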

11.6.2 SAMPLING WITHOUT REPLACEMENT
Sampling is called without replacement when a unit is selected at random from
the population and it is not returned to the main lot. First unit is selected out of a
population of size N and the second unit is selected out of the remaining population
of N - 1 units and so on. Thus the size of the population goes on decreasing as the
sample size n increases. The sample size n cannot exceed the population size N. The
unit once selected for a sample cannot be repeated in the same sample. Thus all the
units of the sample are distinct from one another. A sample without replacement can
be selected either by using the idea of permutations or combinations. Depending
upon the situation, we write all possible permutations or combinations. If the
different arrangements of the units are to be considered, then the permutations
(arrangements) are written to get all possible samples. If the arrangement of units is
of no interest, we write the combinations to get all possible samples.
11.6.3 COMBINATIONS
Let us again consider a lot (population) of 5 bulbs with 3 good (G1, G2 and G3)
and 2 defective (D1 and D2) bulbs. Suppose we have to select two bulbs in any order.
There are ${}^5C_2 = \frac{5!}{2!\,3!} = 10$ possible combinations or samples. These
combinations (samples) are listed as:
G1G2, G1G3, G1D1, G1D2, G2G3, G2D1, G2D2, G3D1, G3D2, D1D2.
There are 10 possible samples and each of them has probability of selection
equal to 1/10. The selected sample will be any one of these 10 samples. The sample
selected in this manner is also called simple random sample. In general, the number
of samples by combinations is equal to ${}^NC_n = \frac{N!}{n!\,(N-n)!}$.
11.6.4 PERMUTATIONS
Each combination generates a number of arrangements (permutations). Thus
in general the number of permutations is greater than the number of combinations.
In the previous example of bulbs, if the order of the selected bulbs is to be considered
then the number of samples by permutations is given by
${}^5P_2 = \frac{5!}{(5-2)!} = 20$. These samples are:
G1G2  G2G1  G1G3  G3G1  G2G3  G3G2  G1D1  D1G1  G1D2  D2G1
G2D1  D1G2  G2D2  D2G2  G3D1  D1G3  G3D2  D2G3  D1D2  D2D1
Each sample has probability of selection equal to 1/20. The selected sample
keeping in view the order of the bulbs will be any one of these 20 samples. A sample
selected in this manner is also called simple random sample because each sample
has equal probability of being selected.
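The two counts for sampling without replacement, 10 unordered and 20 ordered samples, can be verified with the standard library. This is a small sketch under the assumption of Python 3.8 or later (for `math.comb` and `math.perm`); the variable names are illustrative.

```python
from itertools import combinations, permutations
from math import comb, perm

bulbs = ["G1", "G2", "G3", "D1", "D2"]
N, n = len(bulbs), 2

# Order ignored: each sample is a set of two distinct bulbs
unordered = list(combinations(bulbs, n))
# Order considered: each arrangement counts as a separate sample
ordered = list(permutations(bulbs, n))

print(len(unordered), comb(N, n))   # 10 10
print(len(ordered), perm(N, n))     # 20 20

# Every combination of 2 bulbs gives 2! = 2 arrangements
print(len(ordered) // len(unordered))   # 2
```

Neither list contains a pair such as ("G1", "G1"), because a bulb once drawn is not returned to the lot.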
11.6.5 SIMPLE RANDOM SAMPLE
Simple random sample (SRS) is a special case of a random sample. A sample is
called simple random sample if each unit of the population has an equal chance of
being selected for the sample. Whenever a unit is selected for the sample, the units
of the population are equally likely to be selected. It must be noted that the
probability of selecting the first element is not to be compared with the probability
of selecting the second unit. When the first unit is selected, all the units of the
population have the equal chance of selection which is 1/N. When the second unit is
selected, all the remaining (N - 1) units of the population have 1/(N - 1) chance of
selection.
Another way of defining a simple random sample is that if we consider all
possible samples of size n, then each possible sample has equal probability of being
selected.
If sampling is done with replacement, there are $N^n$ possible samples and each
sample has probability of selection equal to $1/N^n$. If sampling is done without
replacement with the help of combinations then there are ${}^NC_n$ possible samples
and each sample has probability of selection equal to $1/{}^NC_n$. If samples are made
with permutations, each sample has probability of selection equal to $1/{}^NP_n$.
Strictly speaking, the sample selected without replacement is called simple
random sample.
11.6.6 DIFFERENCE BETWEEN RANDOM SAMPLE AND SIMPLE
RANDOM SAMPLE
If each unit of the population has known (equal or un-equal) probability of
selection in the sample, the sample is called a random sample. If each unit of the
population has equal probability of being selected for the sample, the sample
obtained is called simple random sample.
11.6.7 SELECTION OF SIMPLE RANDOM SAMPLE
A simple random sample is usually selected without replacement. The
following methods are used for the selection of a simple random sample:
(i) Lottery Method
This is an old classical method but it is a powerful technique and modern
methods of selection are very close to this method. All the units of the population are
numbered from 1 to N. This is called sampling frame. These numbers are written on
the small slips of paper or the small round metallic balls. The paper slips or the
metallic balls should be of the same size otherwise the selected sample will not be
truly random. The slips or the balls are thoroughly mixed and a slip or ball is picked
up. Again the population of slips is mixed and the next unit is selected. In this
manner, the number of slips equal to the sample size n are selected. The units of the
population which appear on the selected slips make the simple random sample. This
method of selection is commonly used when size of the population is small. For a
large population there is a big heap of paper slips and it is difficult to mix the slips
properly.
(ii) Using a Random Number Table
All the units of the population are numbered from 1 to N or from 0 to N - 1. We
consult the random number table to take a simple random sample. Suppose the size
of the population is 80 and we have to select a random sample of 8 units. The units
of the population are numbered from 01 to 80. We read two-digit numbers from the
table of random numbers. We can take a start from any column or row of the table.
Let us consult the random number table given in this book. Two-digit numbers are
taken from the table. Any number above 80 will be ignored and if any number is
repeated, we shall not record it if sampling is done without replacement. Let us read
the first two columns of the table. The random numbers from the table are 10, 37, 08,
12, 66, 31, 63 and 73. The two numbers 99 and 85 have not been recorded because
the population does not contain these numbers. The units of the population whose
numbers have been selected constitute the simple random sample. Let us suppose
that the size of the population is 100. If the units are numbered from 001 to 100, we
shall have to read 3-digit random numbers. From the first 3 columns of the random
number table, the random numbers are 100, 375, 084, 990, 128 and so on. We find
that most of the numbers are above 100 and we are wasting our time while reading
the table. We can avoid it by numbering the units of the population from 00 to 99. In
this way, we shall read 2-digit numbers from the table. Thus if N is 100, 1000 or
10000, the numbering is done from 00 to 99, 000 to 999 or 0000 to 9999.
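The rule for reading a random number table (skip numbers outside the frame, skip repeats, stop after n units) can be written as a short routine. The sketch below is illustrative only: the function name `select_from_table` is hypothetical, and the list `table_pairs` is an assumed reading order in which the two rejected numbers 99 and 85 sit among the eight accepted numbers quoted in the text.

```python
def select_from_table(numbers, N, n):
    """Pick n distinct unit numbers in 1..N from a stream of table entries.

    Entries outside 1..N are ignored; repeated entries are not recorded
    (sampling without replacement).
    """
    selected = []
    for num in numbers:
        if num < 1 or num > N:
            continue          # the population has no unit with this number
        if num in selected:
            continue          # already drawn once
        selected.append(num)
        if len(selected) == n:
            break
    return selected

# Two-digit entries as read from the table (positions of 99 and 85 are assumed)
table_pairs = [10, 37, 8, 99, 12, 66, 31, 85, 63, 73]

sample_units = select_from_table(table_pairs, N=80, n=8)
print(sample_units)   # [10, 37, 8, 12, 66, 31, 63, 73]
```

If the stream runs out before n units are found, the routine simply returns fewer units, which corresponds to reading further columns of the table.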

(iii) Using the Computer
The facility of selecting a simple random sample is available on the computers.
The computer is used for selecting a sample of prize-bond winners, a sample of Haj
applicants, a sample of applicants for residential plots and for various other
purposes.
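As a rough illustration of selection by computer, Python's standard `random` module can draw a simple random sample without replacement from a numbered frame. The helper name `simple_random_sample` and the seed value are our own assumptions; the sizes N = 80 and n = 8 are those of the random number table example.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Draw n distinct unit numbers from the frame 1..N (hypothetical helper)."""
    rng = random.Random(seed)             # a seed makes the draw repeatable
    return rng.sample(range(1, N + 1), n)

units = simple_random_sample(N=80, n=8, seed=2011)
print(sorted(units))
```

`random.Random.sample` never repeats a unit, so it plays the role of the well-mixed slips of the lottery method.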

11.7 ERRORS
Suppose we are interested in the value of a population parameter, the true
value of which is $\theta$ but is unknown. The knowledge about $\theta$ can be obtained
either from sample data or from the population data. In both cases, there is a
possibility of not reaching the true value of the parameter. The difference between
the calculated value (from sample data or from population data) and the true value of
the parameter is called error. Thus error is something which cannot be determined
accurately if the population is large and the units of the population are to be
measured. Suppose we are interested to find the total production of wheat in
Pakistan in a certain year. Sufficient funds and time are at our disposal and we
want to get the 'true' figure about production of wheat. The maximum we can do is
that we contact all the farmers and suppose all the farmers give maximum
cooperation and supply the information as honestly as possible. But the information
supplied by the farmers will have errors in most of the cases. Thus we may not be
able to identify the 'true' figure. In spite of all efforts, we shall be in darkness. The
calculated or the observed figure may be good for all practical purposes but we can
never claim that the true value of the parameter has been obtained. If the study of the
units is based on 'counting', maybe we can get the true figure of the population
parameter. There are two kinds of errors: (i) sampling errors or random errors
(ii) non-sampling errors.
11.7.1 SAMPLING ERRORS
These are the errors which occur due to the nature of sampling. The sample
selected from the population is one of all possible samples. Any value calculated
from the sample is based on the sample data and is called sample statistic. The
sample statistic may or may not be close to the population parameter. If the statistic
is $\hat{\theta}$ and the true value of the population parameter is $\theta$, then the
difference $\hat{\theta} - \theta$ is called sampling error. It is important to note that a
statistic is a random variable and it may take any value. A particular example of
sampling error is the difference between the sample mean $\bar{X}$ and the population
mean $\mu$. Thus sampling error is also a random term. The population parameter is
usually not known, therefore the sampling error is estimated from the sample data.
The sampling error is due to the reason that a certain part of the population goes to
the sample. Obviously, a part of the population cannot give the true picture of the
properties of the population. But one should not get the impression that a sample
always gives the result which is full of errors. We can design a sample and collect
the sample data in a manner so that the sampling errors are reduced. The sampling
errors can be reduced by the following methods:
(i) by increasing the size of the sample (ii) by stratification.
(i) By Increasing the Size of the Sample
The sampling error can be reduced by increasing the sample size. If the sample
size n is equal to the population size N, then the sampling error is zero.
(ii) By Stratification
When the population contains homogeneous units, a simple random sample is
likely to be representative of the population. But if the population contains
dissimilar units, a simple random sample may fail to be representative of all kinds of
units in the population. To improve the result of the sample, the sample design is
modified. The population is divided into different groups containing similar units.
These groups are called strata. From each group (stratum), a sub-sample is selected
in a random manner. Thus all the groups are represented in the sample and
sampling error is reduced. It is called stratified random sampling. The size of the
sub-sample from each stratum is frequently in proportion to the size of the stratum.
Suppose a population consists of 1000 students out of which 600 are intelligent and
400 are non-intelligent. We are assuming here that we do have this much
information about the population. A stratified sample of size n = 100 is to be
selected. The sizes of the strata are denoted by N1 and N2 respectively and the sizes of
the samples from the strata may be denoted by n1 and n2. It is written as under:

Stratum No.   Size of stratum        Size of sample from each stratum
1             N1 = 600               n1 = n x N1 / N = 100 x 600 / 1000 = 60
2             N2 = 400               n2 = n x N2 / N = 100 x 400 / 1000 = 40
              N1 + N2 = N = 1000     n1 + n2 = n = 100

The size of the sample from each stratum has been calculated according to the
size of the stratum. This is called proportional allocation. In the above sample
design, the sampling fraction in the population is $n/N = 100/1000 = 1/10$ and the
sampling fraction in both the strata is also 1/10. Thus this design is also called
fixed sampling fraction. This modified sample design is frequently used in sample
surveys. But this design requires some prior information about the units of the
population. On the basis of this information, the population is divided into different
strata. If the prior information is not available then the stratification is not
applicable.
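Proportional allocation is a one-line formula, $n_i = n N_i / N$, and is easy to check by computer. The following sketch uses exact fractions; the function name `proportional_allocation` and the stratum labels are illustrative choices, not notation from the text.

```python
from fractions import Fraction

def proportional_allocation(strata_sizes, n):
    """Return the sub-sample size n_i = n * N_i / N for every stratum."""
    N = sum(strata_sizes.values())
    return {name: Fraction(n * size, N) for name, size in strata_sizes.items()}

strata = {"intelligent": 600, "non-intelligent": 400}
allocation = proportional_allocation(strata, n=100)

for name, n_i in allocation.items():
    # the sampling fraction inside each stratum equals n/N
    print(name, n_i, n_i / strata[name])
```

In this example both sub-sample sizes are whole numbers (60 and 40). When they are not, some rounding rule is needed, which the sketch deliberately leaves out.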

11.7.2 NON-SAMPLING ERRORS
There are certain sources of errors which occur both in sample survey as well
as in the complete enumeration. These errors are of common nature. Suppose we
study each and every unit of the population. The population parameter under study
is the population mean and the 'true' value of it is $\mu$ which is unknown.
We hope to get the value $\mu$ by a complete count of all the units of the population.
We get a value called 'calculated' or 'observed' value of the population mean. This
observed value may be denoted by $\mu_{cal}$. The difference between $\mu_{cal}$ and
$\mu$ (true) is called non-sampling error. Even if we study the population units under
ideal conditions, there may still be a difference between the observed value of the
population mean and the true value of the population mean. Non-sampling errors
may occur due to many reasons. Some of them are:
(i) The units of the population may not be defined properly. Suppose we have to
carry out a study about skilled labour force in our country. Who is a skilled
person? Some people do more than one job. Some do the secretariat jobs as well
as the technical jobs. Some are skilled but they are doing the job of un-skilled
worker. Thus it is important to clearly define the units of the population
otherwise there will be non-sampling errors both in the population count and
the sample study.
(ii) There may be poor response on the part of respondents. The people do not
supply correct information about their income, children, their age and
property etc. These errors are likely to be of higher magnitude in the population
study than in the sample study. To reduce these errors the respondents are to be [...]
(iii) The things in human hands are likely to be mis-handled. The enumerators may
be careless or they may not be able to maintain uniformity from place to place.
The data may not be collected properly from the population or from the sample.
These errors are likely to be more serious in the population data than in the
sample data.
(iv) Another serious error is due to 'bias'. Bias means an error on the part of the
enumerator or the respondent when the data is being collected. Bias may be
intentional or un-intentional. An enumerator may not be capable of reporting
the correct data. If he is asked to report about the condition of crops in different
areas after heavy rains, his assessments may be biased due to lack of [...] or he
may be inclined to give wrong reports. Bias is a serious error and cannot be
reduced by increasing the sample size. Bias may be present in the sample study
as well as the population study.
11.8 SAMPLING DISTRIBUTIONS
Suppose we have a finite population and we draw all possible simple random
samples of size n without replacement or with replacement. For each sample we
calculate some statistic (sample mean $\bar{X}$ or proportion $\hat{p}$ etc.). All possible
values of the statistic make a probability distribution which is called the sampling
distribution. The number of all possible samples is usually very large and obviously
the number of statistics (any function of the sample) will be equal to the number of
samples if one and only one statistic is calculated from each sample. In fact, in
practical situations, the sampling distribution has very large number of values. The
shape of the sampling distribution depends upon the size of the sample, the nature
of the population and the statistic which is calculated from all possible simple
random samples. Some of the famous sampling distributions are:
(i) Binomial distribution, (ii) Normal distribution, (iii) t-distribution,
(iv) Chi-square distribution, (v) F-distribution.
These distributions are called the derived distributions because they are derived
from all possible samples.
11.8.1 STANDARD ERROR
The standard deviation of some statistic is called the standard error of that
statistic. If the statistic is $\bar{X}$, the standard deviation of all possible values of
$\bar{X}$ is called standard error of $\bar{X}$, which may be written as S.E.($\bar{X}$) or
$\sigma_{\bar{X}}$. Similarly, if the sample statistic is proportion $\hat{p}$, the standard
deviation of all possible values of $\hat{p}$ is called standard error of $\hat{p}$ and is
denoted by $\sigma_{\hat{p}}$ or S.E.($\hat{p}$).
11.8.2 SAMPLING DISTRIBUTION OF $\bar{X}$
The probability distribution of all possible values of $\bar{X}$ calculated from all
possible simple random samples is called the sampling distribution of $\bar{X}$. In brief,
we shall call it distribution of $\bar{X}$. The mean of this distribution is called expected
value of $\bar{X}$ and is written as $E(\bar{X})$ or $\mu_{\bar{X}}$. The standard deviation
(standard error) of this distribution is denoted by S.E.($\bar{X}$) or $\sigma_{\bar{X}}$ and the
variance of $\bar{X}$ is denoted by Var($\bar{X}$) or $\sigma^2_{\bar{X}}$. The distribution of
$\bar{X}$ has some important properties as under:
(i) An important property of the distribution of $\bar{X}$ is that it is a normal
distribution when the size of the sample is large. When the sample size n is more
than 30, we call it a large sample size. The shape of the population distribution
does not matter. The population may be normal or non-normal, the distribution
of $\bar{X}$ is normal for n > 30. But this is true when the number of samples is
very large. As the distribution of random variable $\bar{X}$ is normal, $\bar{X}$ can be
transformed into standard normal variable Z where
$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$.
The distribution of $\bar{X}$ has the t-distribution when the population is normal
and $n \le 30$. Diagram (a) shows the normal distribution and the t-distribution.
(ii) The mean of the distribution of $\bar{X}$ is equal to the mean of the population.
Thus $E(\bar{X}) = \mu_{\bar{X}} = \mu$ (population mean). This relation is true for small
as well as large sample size in sampling without replacement and with
replacement.
(iii) The standard error (standard deviation) of $\bar{X}$ is related with the standard
deviation of population $\sigma$ through the relations:
$\text{S.E.}(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
This is true when population is infinite which means N is very large or the
sampling is done with replacement from finite or infinite population.
$\text{S.E.}(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$
This is true when sampling is without replacement from finite population. The
above two equations between $\sigma_{\bar{X}}$ and $\sigma$ are true both for small as well
as large sample sizes.
Basic Statistics Part-II
Example 11.1
Draw all possible samples of size 2 without replacement from a population
consisting of 3, 6, 9, 12, 15. Form the sampling distribution of sample means and
verify the results:
(i) $E(\bar{X}) = \mu$ (ii) $\text{Var}(\bar{X}) = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right)$
Solution:
We have population values 3, 6, 9, 12, 15, population size N = 5 and sample size
n = 2. Thus, the number of possible samples which can be drawn without
replacement is

$\binom{N}{n} = \binom{5}{2} = 10$

Sample No.   Sample Values   Sample Mean ($\bar{X}$)
    1           3, 6              4.5
    2           3, 9              6.0
    3           3, 12             7.5
    4           3, 15             9.0
    5           6, 9              7.5
    6           6, 12             9.0
    7           6, 15            10.5
    8           9, 12            10.5
    9           9, 15            12.0
   10          12, 15            13.5

The sampling distribution of the sample mean $\bar{X}$ and its mean and variance are:

 $\bar{X}$      f    $f(\bar{X})$    $\bar{X} f(\bar{X})$    $\bar{X}^2 f(\bar{X})$
  4.5     1     1/10      4.5/10       20.25/10
  6.0     1     1/10      6.0/10       36.00/10
  7.5     2     2/10     15.0/10      112.50/10
  9.0     2     2/10     18.0/10      162.00/10
 10.5     2     2/10     21.0/10      220.50/10
 12.0     1     1/10     12.0/10      144.00/10
 13.5     1     1/10     13.5/10      182.25/10
 Total   10      1         90/10      877.5/10

$E(\bar{X}) = \sum \bar{X} f(\bar{X}) = \frac{90}{10} = 9$

$\text{Var}(\bar{X}) = \sum \bar{X}^2 f(\bar{X}) - \left[\sum \bar{X} f(\bar{X})\right]^2 = \frac{877.5}{10} - \left(\frac{90}{10}\right)^2 = 87.75 - 81 = 6.75$
[Chapter 11] Sampling and Sampling Distributions
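The mean and variance of the population are:

 X      3    6    9    12    15    $\sum X = 45$
 X²     9   36   81   144   225    $\sum X^2 = 495$

$\mu = \frac{\sum X}{N} = \frac{45}{5} = 9$ and $\sigma^2 = \frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2 = \frac{495}{5} - \left(\frac{45}{5}\right)^2 = 99 - 81 = 18$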

Verification:
(i) $E(\bar{X}) = \mu = 9$
(ii) $\text{Var}(\bar{X}) = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right) = \frac{18}{2}\left(\frac{5-2}{5-1}\right) = 6.75$
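
The verification in Example 11.1 can also be checked by listing every sample mechanically. The following is a minimal Python sketch, not part of the original text; the variable names are illustrative, and exact fractions are used so that no rounding is involved.

```python
from itertools import combinations
from fractions import Fraction

population = [3, 6, 9, 12, 15]
N, n = len(population), 2

# All C(N, n) equally likely samples drawn without replacement
means = [Fraction(sum(s), n) for s in combinations(population, n)]

# Mean and variance of the sampling distribution of the sample mean
e_xbar = sum(means) / len(means)
var_xbar = sum(m * m for m in means) / len(means) - e_xbar ** 2

# Population mean and variance (divisor N)
mu = Fraction(sum(population), N)
sigma2 = Fraction(sum(x * x for x in population), N) - mu ** 2
fpc = Fraction(N - n, N - 1)          # finite population correction

print(e_xbar, mu)                     # 9 9
print(var_xbar, sigma2 / n * fpc)     # 27/4 27/4  (that is, 6.75)
```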
Example 11.2
Random samples of size three are drawn without replacement from the
population consisting of four numbers 4, 5, 5, 7. Find the sample mean $\bar{X}$ for each
sample and make the sampling distribution of $\bar{X}$. Calculate the mean and standard
deviation of this sampling distribution. Compare your calculations with population
parameters.
Solution:
We have population values 4, 5, 5, 7, population size N = 4 and sample size n = 3.
Thus, the number of possible samples which can be drawn without replacement is

$\binom{N}{n} = \binom{4}{3} = 4$
Sample No.   Sample Values   Sample Mean ($\bar{X}$)
    1          4, 5, 5           14/3
    2          4, 5, 7           16/3
    3          4, 5, 7           16/3
    4          5, 5, 7           17/3

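The sampling distribution of the sample mean $\bar{X}$ and its mean and standard
deviation are:

 $\bar{X}$      f    $f(\bar{X})$    $\bar{X} f(\bar{X})$    $\bar{X}^2 f(\bar{X})$
 14/3     1     1/4       14/12        196/36
 16/3     2     2/4       32/12        512/36
 17/3     1     1/4       17/12        289/36
 Total    4      1        63/12        997/36

$E(\bar{X}) = \sum \bar{X} f(\bar{X}) = \frac{63}{12} = 5.25$

$\text{S.E.}(\bar{X}) = \sqrt{\sum \bar{X}^2 f(\bar{X}) - \left[\sum \bar{X} f(\bar{X})\right]^2} = \sqrt{\frac{997}{36} - \left(\frac{63}{12}\right)^2} = 0.3632$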
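The mean and standard deviation of the population are:

$\sum X = 21$, $\sum X^2 = 115$

$\mu = \frac{\sum X}{N} = \frac{21}{4} = 5.25$ and $\sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2} = \sqrt{\frac{115}{4} - \left(\frac{21}{4}\right)^2} = 1.0897$

Hence $E(\bar{X}) = \mu = 5.25$ and $\text{S.E.}(\bar{X}) = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{1.0897}{\sqrt{3}}\sqrt{\frac{4-3}{4-1}} = 0.3632$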
Example 11.3
Draw all possible samples of size two with replacement from the population 2,
2, 8. Show that the population mean is equal to the mean of means of all samples
and the population variance is twice the variance of sample means.
Solution:
We have population values 2, 2, 8, population size N = 3 and sample size n = 2.
Thus, the number of possible samples which can be drawn with replacement
is

$N^n = 3^2 = 9$

Sample No.   Sample Values   Sample Mean ($\bar{X}$)
    1           2, 2               2
    2           2, 2               2
    3           2, 8               5
    4           2, 2               2
    5           2, 2               2
    6           2, 8               5
    7           8, 2               5
    8           8, 2               5
    9           8, 8               8

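The sampling distribution of the sample mean $\bar{X}$ and its mean and variance are:

 $\bar{X}$     Tally    f    $f(\bar{X})$    $\bar{X} f(\bar{X})$    $\bar{X}^2 f(\bar{X})$
  2      ||||     4     4/9        8/9          16/9
  5      ||||     4     4/9       20/9         100/9
  8      |        1     1/9        8/9          64/9
 Total            9      1        36/9         180/9

$E(\bar{X}) = \sum \bar{X} f(\bar{X}) = \frac{36}{9} = 4$

$\text{Var}(\bar{X}) = \sum \bar{X}^2 f(\bar{X}) - \left[\sum \bar{X} f(\bar{X})\right]^2 = \frac{180}{9} - \left(\frac{36}{9}\right)^2 = 20 - 16 = 4$

$2\,\text{Var}(\bar{X}) = 2(4) = 8$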
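The mean and variance of the population are:

 X      2    2     8    $\sum X = 12$
 X²     4    4    64    $\sum X^2 = 72$

$\mu = \frac{\sum X}{N} = \frac{12}{3} = 4$ and $\sigma^2 = \frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2 = \frac{72}{3} - \left(\frac{12}{3}\right)^2 = 24 - 16 = 8$

Hence $E(\bar{X}) = \mu = 4$ and $\sigma^2 = 2\,\text{Var}(\bar{X}) = 8$.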
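
The result of Example 11.3 is the with-replacement relation $\text{Var}(\bar{X}) = \sigma^2/n$ with n = 2. A short Python sketch (illustrative names, not from the original text) lists the $N^n$ ordered samples and confirms it with exact fractions.

```python
from itertools import product
from fractions import Fraction

population = [2, 2, 8]
N, n = len(population), 2

# product() lists all N**n ordered samples drawn with replacement
means = [Fraction(sum(s), n) for s in product(population, repeat=n)]

e_xbar = sum(means) / len(means)
var_xbar = sum(m * m for m in means) / len(means) - e_xbar ** 2

mu = Fraction(sum(population), N)
sigma2 = Fraction(sum(x * x for x in population), N) - mu ** 2

print(len(means))          # 9
print(e_xbar, mu)          # 4 4
print(var_xbar, sigma2)    # 4 8
```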
Example 11.4
A population has the values 10, 12, 14, 16, 18 and 20. Draw all possible
samples of size 2 without replacement and calculate the sample mean $\bar{X}$ for each
sample. Write the sampling distribution of $\bar{X}$. Find the following probabilities:
(i) $\bar{X}$ will be greater than 16.
(ii) $\bar{X}$ will differ from $\mu$ by less than 3 units.
(iii) Sampling error will be less than 2.
(iv) $\bar{X}$ will be equal to $\mu$.
Solution:
All possible samples of size 2 will be equal to $\binom{6}{2} = \frac{6!}{2!\,4!} = 15$.
The samples, their means and necessary calculations are as under:

Sample No.   Sample Values   Sample Mean ($\bar{X}$)
    1          10, 12             11
    2          10, 14             12
    3          10, 16             13
    4          10, 18             14
    5          10, 20             15
    6          12, 14             13
    7          12, 16             14
    8          12, 18             15
    9          12, 20             16
   10          14, 16             15
   11          14, 18             16
   12          14, 20             17
   13          16, 18             17
   14          16, 20             18
   15          18, 20             19
Sampling Distribution of $\bar{X}$

 $\bar{X}$      f    $f(\bar{X})$
 11      1     1/15
 12      1     1/15
 13      2     2/15
 14      2     2/15
 15      3     3/15
 16      2     2/15
 17      2     2/15
 18      1     1/15
 19      1     1/15
 Total  15      1

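Population mean $\mu = \frac{\sum X}{N} = \frac{90}{6} = 15$

(i) $P(\bar{X} > 16) = \frac{2}{15} + \frac{1}{15} + \frac{1}{15} = \frac{4}{15}$
(ii) $\bar{X}$ will differ from $\mu$ by less than 3 units if $\bar{X}$ is greater than 12 and is less
than 18. Thus
$P[\,|\bar{X} - \mu| < 3\,] = P(12 < \bar{X} < 18) = \frac{2}{15} + \frac{2}{15} + \frac{3}{15} + \frac{2}{15} + \frac{2}{15} = \frac{11}{15}$
(iii) The sampling error will be less than 2 if the random variable $\bar{X}$ is greater than
13 and less than 17. Thus
$P(13 < \bar{X} < 17) = P(14 \le \bar{X} \le 16) = \frac{2}{15} + \frac{3}{15} + \frac{2}{15} = \frac{7}{15}$
(iv) $P(\bar{X} = \mu) = P(\bar{X} = 15) = \frac{3}{15}$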
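
The four probabilities of Example 11.4 can be read off the sampling distribution programmatically. This is a small Python sketch under the same assumptions as the example (15 equally likely samples); the helper `prob` is a hypothetical name introduced here for illustration.

```python
from itertools import combinations
from fractions import Fraction
from collections import Counter

population = [10, 12, 14, 16, 18, 20]
n = 2
mu = Fraction(sum(population), len(population))   # population mean, 15

# Sampling distribution of the mean: value -> probability
means = [Fraction(sum(s), n) for s in combinations(population, n)]
dist = {m: Fraction(f, len(means)) for m, f in Counter(means).items()}

def prob(event):
    """Total probability of the sample means for which event(mean) is true."""
    return sum(p for m, p in dist.items() if event(m))

p_i = prob(lambda m: m > 16)              # (i)   mean greater than 16
p_ii = prob(lambda m: abs(m - mu) < 3)    # (ii)  within 3 units of mu
p_iii = prob(lambda m: abs(m - mu) < 2)   # (iii) sampling error below 2
p_iv = prob(lambda m: m == mu)            # (iv)  mean equal to mu

print(p_i, p_ii, p_iii, p_iv)   # 4/15 11/15 7/15 1/5
```

Note that 3/15 is printed in lowest terms as 1/5.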
Example 11.5
Certain tubes produced by a company have a mean lifetime of 900 hours and a
standard deviation of 100 hours. The company sends out 2000 lots of 100 tubes each.
Compute the mean and standard deviation of the sampling distribution of the
sample mean $\bar{X}$ if sampling is done: (i) with replacement (ii) without replacement.
Solution:
Here N = 2000, n = 100, $\mu$ = 900, $\sigma$ = 100
(i) Sampling with replacement
$\mu_{\bar{X}} = \mu = 900$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{100}} = 10$
(ii) Sampling without replacement
$\mu_{\bar{X}} = \mu = 900$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{100}{\sqrt{100}}\sqrt{\frac{2000-100}{2000-1}} = 9.75$
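
The two standard-error formulas used in Example 11.5 differ only by the finite population correction. A minimal Python sketch follows; the function name `standard_error_mean` and its signature are assumptions made for illustration, not something defined in the text.

```python
import math

def standard_error_mean(sigma, n, N=None):
    """Standard error of the sample mean.

    N=None means sampling with replacement (or a very large population);
    otherwise the correction sqrt((N - n) / (N - 1)) is applied.
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

se_with = standard_error_mean(100, 100)             # with replacement
se_without = standard_error_mean(100, 100, N=2000)  # without replacement

print(round(se_with, 2), round(se_without, 2))      # 10.0 9.75
```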
11.8.3 SAMPLING DISTRIBUTION OF $s^2$ AND $S^2$
Suppose we draw all possible samples of size n from a finite population and
calculate the sample variance $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$ for each sample. The mean of the
sampling distribution of $s^2$ is denoted by $E(s^2)$ or $\mu_{s^2}$. It can be shown that if
sampling is with replacement, then $E(s^2) = \mu_{s^2} = \sigma^2$. Thus $s^2$ is an unbiased estimator
of $\sigma^2$. The sample variance $S^2$ is defined as $S^2 = \frac{\sum (X - \bar{X})^2}{n}$. If samples are drawn with
replacement, it can be shown that:

$E(S^2) = \left(\frac{n-1}{n}\right)\sigma^2$

Thus $S^2$ is a biased estimator of $\sigma^2$. In case of sampling without replacement,
we have the following relations:

$E(s^2) = \left(\frac{N}{N-1}\right)\sigma^2$ or $E(S^2) = \left(\frac{n-1}{n}\right)\left(\frac{N}{N-1}\right)\sigma^2$

Example 11.6
A population consists of three numbers 10, 12, 14. Take all possible samples of
size two with replacement from this population. Find the mean and the unbiased
variance for each sample. Show that $E(s^2) = \sigma^2$ where $s^2 = \sum (X - \bar{X})^2 / (n - 1)$.
Solution:
We have population values 10, 12, 14, population size N = 3 and sample size
n = 2. Thus, the number of possible samples which can be drawn with replacement is
$N^n = 3^2 = 9$.
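Sample No.   Sample Values   Sample Mean $\bar{X} = \frac{\sum X}{n}$   Sample Variance $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
    1          10, 10        (10 + 10)/2 = 10      [(10 - 10)² + (10 - 10)²]/(2 - 1) = 0
    2          10, 12        (10 + 12)/2 = 11      [(10 - 11)² + (12 - 11)²]/(2 - 1) = 2
    3          10, 14        (10 + 14)/2 = 12      [(10 - 12)² + (14 - 12)²]/(2 - 1) = 8
    4          12, 10        (12 + 10)/2 = 11      [(12 - 11)² + (10 - 11)²]/(2 - 1) = 2
    5          12, 12        (12 + 12)/2 = 12      [(12 - 12)² + (12 - 12)²]/(2 - 1) = 0
    6          12, 14        (12 + 14)/2 = 13      [(12 - 13)² + (14 - 13)²]/(2 - 1) = 2
    7          14, 10        (14 + 10)/2 = 12      [(14 - 12)² + (10 - 12)²]/(2 - 1) = 8
    8          14, 12        (14 + 12)/2 = 13      [(14 - 13)² + (12 - 13)²]/(2 - 1) = 2
    9          14, 14        (14 + 14)/2 = 14      [(14 - 14)² + (14 - 14)²]/(2 - 1) = 0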
The sampling distribution of the sample variance $s^2$ and its mean is:

 $s^2$    Tally    f    $f(s^2)$    $s^2 f(s^2)$
  0     |||      3     3/9        0
  2     ||||     4     4/9        8/9
  8     ||       2     2/9       16/9
 Total           9      1        24/9

$E(s^2) = \sum s^2 f(s^2) = \frac{24}{9} = 2.67$

The variance of the population is:

 X      10     12     14    $\sum X = 36$
 X²    100    144    196    $\sum X^2 = 440$

$\sigma^2 = \frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2 = \frac{440}{3} - \left(\frac{36}{3}\right)^2 = 2.67$

Hence $E(s^2) = \sigma^2 = 2.67$
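
Example 11.6 can be reproduced by enumerating the nine samples and averaging both versions of the sample variance. The Python sketch below is illustrative only (the helper `sum_sq_dev` is a name chosen here); it shows the divisor n - 1 giving $\sigma^2$ exactly and the divisor n giving $\left(\frac{n-1}{n}\right)\sigma^2$.

```python
from itertools import product
from fractions import Fraction

population = [10, 12, 14]
N, n = len(population), 2
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N    # population variance

def sum_sq_dev(sample):
    """Sum of squared deviations of a sample about its own mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)

samples = list(product(population, repeat=n))          # with replacement

# Average of the two variance versions over all equally likely samples
e_s2 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)  # divisor n - 1
e_S2 = sum(sum_sq_dev(s) / n for s in samples) / len(samples)        # divisor n

print(sigma2, e_s2, e_S2)   # 8/3 8/3 4/3
```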


Example 11.7
A population consists of five values 4, 6, 8, 10, 12. Take all possible samples of
size two without replacement from this population and verify that

$E(S^2) = \left(\frac{n-1}{n}\right)\left(\frac{N}{N-1}\right)\sigma^2$

Solution:
We have population values 4, 6, 8, 10, 12, population size N = 5 and sample
size n = 2. Thus, the number of possible samples which can be drawn without
replacement is $\binom{N}{n} = \binom{5}{2} = 10$.
Sample No.   Sample Values   Sample Mean $\bar{X} = \frac{\sum X}{n}$   Sample Variance $S^2 = \frac{\sum (X - \bar{X})^2}{n}$
    1           4, 6         (4 + 6)/2 = 5         [(4 - 5)² + (6 - 5)²]/2 = 1
    2           4, 8         (4 + 8)/2 = 6         [(4 - 6)² + (8 - 6)²]/2 = 4
    3           4, 10        (4 + 10)/2 = 7        [(4 - 7)² + (10 - 7)²]/2 = 9
    4           4, 12        (4 + 12)/2 = 8        [(4 - 8)² + (12 - 8)²]/2 = 16
    5           6, 8         (6 + 8)/2 = 7         [(6 - 7)² + (8 - 7)²]/2 = 1
    6           6, 10        (6 + 10)/2 = 8        [(6 - 8)² + (10 - 8)²]/2 = 4
    7           6, 12        (6 + 12)/2 = 9        [(6 - 9)² + (12 - 9)²]/2 = 9
    8           8, 10        (8 + 10)/2 = 9        [(8 - 9)² + (10 - 9)²]/2 = 1
    9           8, 12        (8 + 12)/2 = 10       [(8 - 10)² + (12 - 10)²]/2 = 4
   10          10, 12        (10 + 12)/2 = 11      [(10 - 11)² + (12 - 11)²]/2 = 1
The sampling distribution of the sample variance $S^2$ and its mean is:

 $S^2$     f    $f(S^2)$    $S^2 f(S^2)$
  1      4     4/10        4/10
  4      3     3/10       12/10
  9      2     2/10       18/10
 16      1     1/10       16/10
 Total  10      1         50/10

$E(S^2) = \mu_{S^2} = \sum S^2 f(S^2) = \frac{50}{10} = 5$

The variance of the population is:

 X      4     6     8     10     12    $\sum X = 40$
 X²    16    36    64    100    144    $\sum X^2 = 360$

$\sigma^2 = \frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2 = \frac{360}{5} - \left(\frac{40}{5}\right)^2 = 72 - 64 = 8$

Hence $E(S^2) = \left(\frac{n-1}{n}\right)\left(\frac{N}{N-1}\right)\sigma^2 = \left(\frac{1}{2}\right)\left(\frac{5}{4}\right)(8) = 5$
Example 11.8
A population of 10 numbers has a mean of 100 and a standard deviation of 10.
If samples of size 5 are drawn from this population, find the mean of the sampling
distribution of variances when sampling is done
(i) with replacement (ii) without replacement.
Solution:
Here N = 10, $\mu$ = 100, $\sigma$ = 10, $\sigma^2$ = 100, n = 5
(i) Sampling with replacement
$E(S^2) = \mu_{S^2} = \left(\frac{n-1}{n}\right)\sigma^2 = \left(\frac{4}{5}\right)(100) = 80$
(ii) Sampling without replacement
$E(S^2) = \mu_{S^2} = \left(\frac{n-1}{n}\right)\left(\frac{N}{N-1}\right)\sigma^2 = \left(\frac{4}{5}\right)\left(\frac{10}{9}\right)(100) = 88.89$
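
The expectation of the divisor-n sample variance can be wrapped in one small function and checked against a full enumeration. The Python sketch below is a hedged illustration: `expected_S2` is a hypothetical helper, the enumeration uses the five-value population 4, 6, 8, 10, 12, and the last two lines reuse the figures N = 10, n = 5, $\sigma^2$ = 100.

```python
from itertools import combinations
from fractions import Fraction

def expected_S2(sigma2, n, N=None):
    """E(S^2) for the divisor-n sample variance.

    N=None means sampling with replacement; otherwise the factor
    N / (N - 1) for sampling without replacement is applied.
    """
    value = Fraction(n - 1, n) * sigma2
    if N is not None:
        value *= Fraction(N, N - 1)
    return value

# Check by listing every sample of size 2 drawn without replacement
population = [4, 6, 8, 10, 12]
N, n = len(population), 2
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

def S2(sample):
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / len(sample)

samples = list(combinations(population, n))
e_S2 = sum(S2(s) for s in samples) / len(samples)
print(e_S2, expected_S2(sigma2, n, N))               # 5 5

# Figures for N = 10, n = 5, sigma^2 = 100
print(float(expected_S2(100, 5)))                    # 80.0
print(round(float(expected_S2(100, 5, N=10)), 2))    # 88.89
```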
11.8.4 SAMPLING DISTRIBUTION OF DIFFERENCE BETWEEN TWO
MEANS
Suppose there is a population with mean $\mu_1$ and variance $\sigma_1^2$. Another
population has the mean $\mu_2$ and variance $\sigma_2^2$. All possible simple random samples of
size $n_1$ are selected from the first population and the sample means $\bar{X}_1$ for each
sample are calculated. Similarly, all possible simple random samples of size $n_2$ are
selected from the second population and the sample means $\bar{X}_2$ are calculated. The
difference $(\bar{X}_1 - \bar{X}_2)$ is another random variable and its distribution is called the
sampling distribution of $\bar{X}_1 - \bar{X}_2$. Some properties of this distribution are:

(i) The mean of the distribution of $\bar{X}_1 - \bar{X}_2$ is equal to the difference $\mu_1 - \mu_2$. Thus

$E(\bar{X}_1 - \bar{X}_2) = \mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$

Similarly the distribution of $\bar{X}_2 - \bar{X}_1$ has the mean $\mu_{\bar{X}_2 - \bar{X}_1} = \mu_2 - \mu_1$.
If $\mu_1 = \mu_2$ then $E(\bar{X}_1 - \bar{X}_2) = 0$.
The above relations are true for any type of population with any sample size,
small or large, and the samples may be drawn without replacement or with
replacement.

(ii) When samples are selected without replacement from a finite population,
the standard error of $\bar{X}_1 - \bar{X}_2$ has the following relation with $\sigma_1^2$ and $\sigma_2^2$:

$\text{S.E.}(\bar{X}_1 - \bar{X}_2) = \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1}\left(\frac{N_1 - n_1}{N_1 - 1}\right) + \frac{\sigma_2^2}{n_2}\left(\frac{N_2 - n_2}{N_2 - 1}\right)}$

When samples are drawn with replacement or they are drawn from infinite
populations ($N_1$ and $N_2$ are very large), the relation becomes:

$\text{S.E.}(\bar{X}_1 - \bar{X}_2) = \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

It may be noted that in practical life, $N_1$ and $N_2$ are usually very large and the
fractions $\frac{N_1 - n_1}{N_1 - 1}$ and $\frac{N_2 - n_2}{N_2 - 1}$ are almost equal to unity. Thus in the subsequent
chapter, we shall frequently use the relation $\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$.

(iii) The sampling distribution of $\bar{X}_1 - \bar{X}_2$ is a normal distribution when $n_1 > 30$ and
$n_2 > 30$. The sample sizes $n_1$ and $n_2$ may be equal or unequal but both should be
large. The difference $(\bar{X}_1 - \bar{X}_2)$ is a random variable with normal
distribution and the standard normal variable Z can be written as

$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

The distribution of $\bar{X}_1 - \bar{X}_2$ has the t-distribution when both $n_1$ and $n_2$ are small
in size.
Example 11.9
Draw all possible random samples of size $n_1 = 2$ without replacement from the
finite population 2, 2, 6. Similarly, draw all possible random samples of size $n_2 = 2$
without replacement from the population 1, 1, 2, 4.
(i) Find the possible differences between the sample means of the two populations.
(ii) Construct the sampling distribution of $\bar{X}_1 - \bar{X}_2$ and compute its mean and
variance.
(iii) Verify that: $E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$ and
$\text{Var}(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1}\left(\frac{N_1 - n_1}{N_1 - 1}\right) + \frac{\sigma_2^2}{n_2}\left(\frac{N_2 - n_2}{N_2 - 1}\right)$
Solution:
Population I: 2, 2, 6                       Population II: 1, 1, 2, 4
Population size $N_1 = 3$                     Population size $N_2 = 4$
Sample size $n_1 = 2$                         Sample size $n_2 = 2$
The number of possible samples which        The number of possible samples which
can be drawn without replacement            can be drawn without replacement
$= \binom{N_1}{n_1} = \binom{3}{2} = 3$                       $= \binom{N_2}{n_2} = \binom{4}{2} = 6$
From Population I

Sample No.   Sample Values   Sample Mean ($\bar{X}_1$)
    1           2, 2               2
    2           2, 6               4
    3           2, 6               4

From Population II

Sample No.   Sample Values   Sample Mean ($\bar{X}_2$)
    1           1, 1              1.0
    2           1, 2              1.5
    3           1, 4              2.5
    4           1, 2              1.5
    5           1, 4              2.5
    6           2, 4              3.0
(i) The 18 possible differences $\bar{X}_1 - \bar{X}_2$ are shown in the following table.

              $\bar{X}_1$
 $\bar{X}_2$       2       4       4
 1.0      1.0     3.0     3.0
 1.5      0.5     2.5     2.5
 2.5     -0.5     1.5     1.5
 1.5      0.5     2.5     2.5
 2.5     -0.5     1.5     1.5
 3.0     -1.0     1.0     1.0

The sampling distribution of differences between sample means $\bar{X}_1 - \bar{X}_2$ and its
mean and variance are computed below.

 $\bar{X}_1 - \bar{X}_2 = d$     f    $f(d)$    $d\,f(d)$    $d^2 f(d)$
    -1.0         1    1/18    -1.0/18     1.0/18
    -0.5         2    2/18    -1.0/18     0.5/18
     0.5         2    2/18     1.0/18     0.5/18
     1.0         3    3/18     3.0/18     3.0/18
     1.5         4    4/18     6.0/18     9.0/18
     2.5         4    4/18    10.0/18    25.0/18
     3.0         2    2/18     6.0/18    18.0/18
    Total       18     1       24/18      57/18

$E(\bar{X}_1 - \bar{X}_2) = E(d) = \sum d\,f(d) = \frac{24}{18} = \frac{4}{3}$

$\text{Var}(\bar{X}_1 - \bar{X}_2) = \text{Var}(d) = \sum d^2 f(d) - \left[\sum d\,f(d)\right]^2 = \frac{57}{18} - \left(\frac{24}{18}\right)^2 = \frac{25}{18}$

(iii) The mean and variance of the first population are:
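 $X_1$     2    2     6    $\sum X_1 = 10$
 $X_1^2$    4    4    36    $\sum X_1^2 = 44$

$\mu_1 = \frac{\sum X_1}{N_1} = \frac{10}{3}$ and $\sigma_1^2 = \frac{\sum X_1^2}{N_1} - \left(\frac{\sum X_1}{N_1}\right)^2 = \frac{44}{3} - \left(\frac{10}{3}\right)^2 = \frac{32}{9}$

The mean and variance of the second population are:

 $X_2$     1    1    2     4    $\sum X_2 = 8$
 $X_2^2$    1    1    4    16    $\sum X_2^2 = 22$

$\mu_2 = \frac{\sum X_2}{N_2} = \frac{8}{4} = 2$ and $\sigma_2^2 = \frac{\sum X_2^2}{N_2} - \left(\frac{\sum X_2}{N_2}\right)^2 = \frac{22}{4} - \left(\frac{8}{4}\right)^2 = \frac{3}{2}$

$\mu_1 - \mu_2 = \frac{10}{3} - 2 = \frac{4}{3}$

$\frac{\sigma_1^2}{n_1}\left(\frac{N_1 - n_1}{N_1 - 1}\right) + \frac{\sigma_2^2}{n_2}\left(\frac{N_2 - n_2}{N_2 - 1}\right) = \frac{32/9}{2}\left(\frac{3 - 2}{3 - 1}\right) + \frac{3/2}{2}\left(\frac{4 - 2}{4 - 1}\right) = \frac{8}{9} + \frac{1}{2} = \frac{25}{18}$

Hence $E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2 = \frac{4}{3}$ and $\text{Var}(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1}\left(\frac{N_1 - n_1}{N_1 - 1}\right) + \frac{\sigma_2^2}{n_2}\left(\frac{N_2 - n_2}{N_2 - 1}\right) = \frac{25}{18}$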
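
The 18 differences and the two verifications can be generated directly from the populations 2, 2, 6 and 1, 1, 2, 4. This Python sketch is illustrative; `sample_means` and `mean_var` are helper names introduced here, and exact fractions are used throughout.

```python
from itertools import combinations, product
from fractions import Fraction

pop1, n1 = [2, 2, 6], 2
pop2, n2 = [1, 1, 2, 4], 2

def sample_means(pop, n):
    """Means of all samples of size n drawn without replacement."""
    return [Fraction(sum(s), n) for s in combinations(pop, n)]

def mean_var(values):
    """Mean and variance (divisor = number of values) of a list."""
    m = sum(values) / len(values)
    return m, sum(v * v for v in values) / len(values) - m ** 2

# Every pairing of a sample from population I with one from population II
diffs = [a - b for a, b in product(sample_means(pop1, n1), sample_means(pop2, n2))]
e_d, var_d = mean_var(diffs)

mu1, var1 = mean_var([Fraction(x) for x in pop1])
mu2, var2 = mean_var([Fraction(x) for x in pop2])
N1, N2 = len(pop1), len(pop2)
formula = (var1 / n1) * Fraction(N1 - n1, N1 - 1) + (var2 / n2) * Fraction(N2 - n2, N2 - 1)

print(len(diffs), e_d, mu1 - mu2)   # 18 4/3 4/3
print(var_d, formula)               # 25/18 25/18
```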
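Example 11.10
Given $N_1$ = 800, $N_2$ = 600, $n_1$ = 200, $n_2$ = 124, $\mu_1$ = 1800, $\mu_2$ = 1600, $\sigma_1$ = 200 and
$\sigma_2$ = 124. Compute the mean and standard error of the sampling distribution of
the difference $\bar{X}_1 - \bar{X}_2$ if sampling is done (i) with replacement (ii) without replacement.
Solution:
(i) Sampling with replacement
$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 = 1800 - 1600 = 200$
$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{(200)^2}{200} + \frac{(124)^2}{124}} = 18$
(ii) Sampling without replacement
$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 = 1800 - 1600 = 200$
$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1}\left(\frac{N_1 - n_1}{N_1 - 1}\right) + \frac{\sigma_2^2}{n_2}\left(\frac{N_2 - n_2}{N_2 - 1}\right)} = \sqrt{\frac{(200)^2}{200}\left(\frac{800 - 200}{800 - 1}\right) + \frac{(124)^2}{124}\left(\frac{600 - 124}{600 - 1}\right)} = 15.77$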
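
The two standard errors of Example 11.10 can be computed with one function. The sketch below is a minimal Python illustration; the name `se_diff_means` and its keyword arguments are assumptions made here, not notation from the text.

```python
import math

def se_diff_means(sigma1, n1, sigma2, n2, N1=None, N2=None):
    """Standard error of the difference between two sample means.

    Pass N1 and N2 for sampling without replacement from finite
    populations; leave them as None for sampling with replacement.
    """
    v1 = sigma1 ** 2 / n1
    v2 = sigma2 ** 2 / n2
    if N1 is not None and N2 is not None:
        v1 *= (N1 - n1) / (N1 - 1)     # finite population corrections
        v2 *= (N2 - n2) / (N2 - 1)
    return math.sqrt(v1 + v2)

se_with = se_diff_means(200, 200, 124, 124)
se_without = se_diff_means(200, 200, 124, 124, N1=800, N2=600)

print(se_with)                 # 18.0
print(round(se_without, 2))    # 15.77
```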
11.8.5 PROPORTION
What is a proportion? Suppose there are 1000 students in a school out of which
600 are male and 400 are female. The ratio of 600 to the total is called the
proportion of males and is denoted by p. Thus proportion of males = p = $\frac{600}{1000}$ = 0.6
and proportion of females = q = $\frac{400}{1000}$ = 0.4.
Let us denote male by a success and female by a failure. If the male students are
assigned the number 1 and females are assigned the number 0, then the population
contains 600 ones and 400 zeros. This can be written as below in the form of a
probability distribution called the Bernoulli distribution. Let us calculate the mean of this
distribution.

 Random Variable (X)    f(X)                   X f(X)
         0              400/1000 = 0.4           0
         1              600/1000 = 0.6          0.6

$E(X)$ = Mean = $\sum X f(X)$ = 0.6

Thus the proportion of the population called the binomial population is equal
to the mean of the population containing 0s and 1s.

p
11.8.6 SAMPLING DISTRIBUTION OF PROPORTION
Suppose there is a finite population in which the proportion of successes is p
and the proportion of failures is q. Suppose we draw all possible samples of size n
from the population and calculate the sample proportion $\hat{p}$ for each sample. The
sampling distribution of $\hat{p}$ has the following properties:
(i) The mean of the sampling distribution of $\hat{p}$ is equal to the population
proportion p. Thus $E(\hat{p}) = \mu_{\hat{p}} = p$.
This relation is true in sampling with replacement and without replacement for
any sample size.
(ii) The standard error of $\hat{p}$ is related to the population parameters p and q
through the equations:

$\text{S.E.}(\hat{p}) = \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}\sqrt{\frac{N-n}{N-1}}$ (True for sampling without replacement)

and $\text{S.E.}(\hat{p}) = \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$ (True for sampling with replacement
or when N is very large)

(iii) The shape of the distribution of $\hat{p}$ is normal when n > 30. The value of Z can be
calculated from $\hat{p}$, where $Z = \frac{\hat{p} - p}{\sqrt{pq/n}}$.
Warning: It may be noted that when n is small, the distribution of $\hat{p}$ is not
the t-distribution.
Example 11.11
A population consists of five numbers 2, 5, 6, 7, 9. Take all possible samples of
size 3 from this population without replacement and compute the proportion of odd
numbers for each sample. Verify that: (i) $\mu_{\hat{p}} = p$ (ii) $\sigma^2_{\hat{p}} = \frac{pq}{n}\left(\frac{N-n}{N-1}\right)$
Solution:
We have population values 2, 5, 6, 7, 9, population size N = 5 and sample size
n = 3. Thus, the number of possible samples which can be drawn without
replacement is $\binom{N}{n} = \binom{5}{3} = 10$. Let $\hat{p}$ represent the proportion of odd numbers in
the sample.

Sample No.   Sample Values   Sample Proportion ($\hat{p}$)
    1          2, 5, 6             1/3
    2          2, 5, 7             2/3
    3          2, 5, 9             2/3
    4          2, 6, 7             1/3
    5          2, 6, 9             1/3
    6          2, 7, 9             2/3
    7          5, 6, 7             2/3
    8          5, 6, 9             2/3
    9          5, 7, 9             3/3
   10          6, 7, 9             2/3
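The sampling distribution of the sample proportion $\hat{p}$ and its mean and
variance are:

 $\hat{p}$      f    $f(\hat{p})$    $\hat{p} f(\hat{p})$    $\hat{p}^2 f(\hat{p})$
 1/3     3    3/10      3/30        3/90
 2/3     6    6/10     12/30       24/90
 3/3     1    1/10      3/30        9/90
 Total  10     1       18/30       36/90

$\mu_{\hat{p}} = \sum \hat{p} f(\hat{p}) = \frac{18}{30} = 0.6$

$\sigma^2_{\hat{p}} = \sum \hat{p}^2 f(\hat{p}) - \left[\sum \hat{p} f(\hat{p})\right]^2 = \frac{36}{90} - \left(\frac{18}{30}\right)^2 = 0.40 - 0.36 = 0.04$

Population proportion $p = \frac{X}{N} = \frac{3}{5} = 0.6$, $q = 1 - p = 0.4$,
where X represents the number of odd digits in the population.

$\frac{pq}{n}\left(\frac{N-n}{N-1}\right) = \frac{(0.6)(0.4)}{3}\left(\frac{5-3}{5-1}\right) = 0.04$

Hence (i) $\mu_{\hat{p}} = p = 0.6$
(ii) $\sigma^2_{\hat{p}} = \frac{pq}{n}\left(\frac{N-n}{N-1}\right) = 0.04$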
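
The verification for the proportion of odd numbers can be repeated by enumeration. A minimal Python sketch follows, using the population 2, 5, 6, 7, 9 and samples of size 3; the helper `is_odd` and the variable names are illustrative choices.

```python
from itertools import combinations
from fractions import Fraction

population = [2, 5, 6, 7, 9]
N, n = len(population), 3

def is_odd(x):
    return x % 2 == 1

# Sample proportion of odd numbers in each of the C(5, 3) = 10 samples
p_hats = [Fraction(sum(is_odd(x) for x in s), n)
          for s in combinations(population, n)]

e_p = sum(p_hats) / len(p_hats)
var_p = sum(v * v for v in p_hats) / len(p_hats) - e_p ** 2

p = Fraction(sum(is_odd(x) for x in population), N)   # population proportion
q = 1 - p
formula = p * q / n * Fraction(N - n, N - 1)

print(e_p, p)            # 3/5 3/5
print(var_p, formula)    # 1/25 1/25  (that is, 0.04)
```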
Example 11.12
A finite population contains 4 smokers denoted by $S_1$, $S_2$, $S_3$ and $S_4$ and 2 non-
smokers denoted by $N_1$ and $N_2$. Draw all possible random samples of size 2 without
replacement from the population and calculate the proportion of smokers $\hat{p}$ in each
sample. Write the probability distribution (sampling distribution) of $\hat{p}$ and find the
following probabilities:
(i) $\hat{p}$ is more than $\frac{1}{2}$ (ii) $\hat{p}$ is equal to p (iii) $\hat{p} = \frac{1}{2}$ (iv) that both are smokers.
Solution:
We have population values $S_1$, $S_2$, $S_3$, $S_4$, $N_1$, $N_2$, population size N = 6 and sample
size n = 2. Thus, the number of possible samples which can be drawn without
replacement is $\binom{N}{n} = \binom{6}{2} = 15$.

Sample No.   Sample Values   Sample Proportion ($\hat{p}$)
    1          $S_1$, $S_2$            2/2
    2          $S_1$, $S_3$            2/2
    3          $S_1$, $S_4$            2/2
    4          $S_1$, $N_1$            1/2
    5          $S_1$, $N_2$            1/2
    6          $S_2$, $S_3$            2/2
    7          $S_2$, $S_4$            2/2
    8          $S_2$, $N_1$            1/2
    9          $S_2$, $N_2$            1/2
   10          $S_3$, $S_4$            2/2
   11          $S_3$, $N_1$            1/2
   12          $S_3$, $N_2$            1/2
   13          $S_4$, $N_1$            1/2
   14          $S_4$, $N_2$            1/2
   15          $N_1$, $N_2$             0
The sampling distribution of the sample proportion $\hat{p}$ is:

 $\hat{p}$      f    $f(\hat{p})$
  0      1    1/15
 1/2     8    8/15
 2/2     6    6/15
 Total  15     1

Population proportion $p = \frac{4}{6} = \frac{2}{3}$

(i) $P(\hat{p} > \frac{1}{2}) = \frac{6}{15}$ (ii) $P(\hat{p} = p) = 0$
(iii) $P(\hat{p} = \frac{1}{2}) = \frac{8}{15}$ (iv) P(both are smokers) = $\frac{6}{15}$
Example 11.13
If samples of n = 200 observations are to be drawn from a large population
N = 2500 in which the population proportion is 20%, determine the expected mean
and standard deviation of the sampling distribution of proportions when sampling
is done (i) with replacement (ii) without replacement.
Solution:
Here N = 2500, n = 200, p = 0.20,
q = 1 - p = 1 - 0.20 = 0.80
(i) Sampling with replacement
$E(\hat{p}) = p = 0.20$ and $\text{S.E.}(\hat{p}) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.20)(0.80)}{200}} = 0.0283$
(ii) Sampling without replacement
$E(\hat{p}) = p = 0.20$ and $\text{S.E.}(\hat{p}) = \sqrt{\frac{pq}{n}}\sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{(0.20)(0.80)}{200}}\sqrt{\frac{2500 - 200}{2500 - 1}} = 0.0271$
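
The standard error of a sample proportion follows the same pattern as that of a sample mean. The Python sketch below is illustrative; `se_proportion` is a hypothetical helper, evaluated here with p = 0.20, n = 200 and N = 2500.

```python
import math

def se_proportion(p, n, N=None):
    """Standard error of the sample proportion.

    N=None means sampling with replacement (or very large N);
    otherwise the finite population correction is applied.
    """
    se = math.sqrt(p * (1 - p) / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

se_with = se_proportion(0.20, 200)
se_without = se_proportion(0.20, 200, N=2500)

print(round(se_with, 4))      # 0.0283
print(round(se_without, 4))   # 0.0271
```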

11.8.7 SAMPLING DISTRIBUTION OF DIFFERENCE BETWEEN $\hat{p}_1$ AND $\hat{p}_2$
Suppose there are two populations with proportions $p_1$ and $p_2$ and all possible
simple random samples of sizes $n_1$ and $n_2$ are selected from the populations
respectively. The sample proportions calculated from the samples are $\hat{p}_1$ and $\hat{p}_2$. The
difference $\hat{p}_1 - \hat{p}_2$ is a random variable and its distribution is called the sampling
distribution of $\hat{p}_1 - \hat{p}_2$. The properties of this distribution are:
(i) The mean of the distribution of $\hat{p}_1 - \hat{p}_2$ is equal to the difference between $p_1$
and $p_2$. Thus $\mu_{\hat{p}_1 - \hat{p}_2} = E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2$.
This relation is true for any sample size and for sampling with and without
replacement.
(ii) The standard error of the distribution of $(\hat{p}_1 - \hat{p}_2)$ has the following relation with
population parameters:

$\text{S.E.}(\hat{p}_1 - \hat{p}_2) = \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1}\left(\frac{N_1 - n_1}{N_1 - 1}\right) + \frac{p_2 q_2}{n_2}\left(\frac{N_2 - n_2}{N_2 - 1}\right)}$
(True for sampling without replacement)

and $\text{S.E.}(\hat{p}_1 - \hat{p}_2) = \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$
(True for sampling with replacement or when N is very large)

(iii) The distribution of $\hat{p}_1 - \hat{p}_2$ has the normal distribution when both $n_1$ and $n_2$ are
large in size. When $n_1$ and $n_2$ are small in size, the distribution of $\hat{p}_1 - \hat{p}_2$ does
not form any standard distribution. The random difference $(\hat{p}_1 - \hat{p}_2)$ can be
transformed into the standard normal variable Z where

$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$
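Example 11.14
Given the data: $N_1$ = 6, $n_1$ = 3, $X_1$ = 3, $N_2$ = 5, $n_2$ = 2, $X_2$ = 2.
Find $E(\hat{p}_1 - \hat{p}_2)$ and $\text{Var}(\hat{p}_1 - \hat{p}_2)$ if sampling is done
(i) with replacement (ii) without replacement
Solution:
Here $N_1$ = 6, $n_1$ = 3, $X_1$ = 3, $p_1 = \frac{X_1}{N_1} = \frac{3}{6} = 0.5$, $q_1 = 1 - p_1 = 0.5$
$N_2$ = 5, $n_2$ = 2, $X_2$ = 2, $p_2 = \frac{X_2}{N_2} = \frac{2}{5} = 0.4$, $q_2 = 1 - p_2 = 0.6$
(i) Sampling with replacement
$E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2 = 0.5 - 0.4 = 0.1$
$\text{Var}(\hat{p}_1 - \hat{p}_2) = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2} = \frac{(0.5)(0.5)}{3} + \frac{(0.4)(0.6)}{2} = 0.0833 + 0.12 = 0.2033$
(ii) Sampling without replacement
$E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2 = 0.5 - 0.4 = 0.1$
$\text{Var}(\hat{p}_1 - \hat{p}_2) = \frac{p_1 q_1}{n_1}\left(\frac{N_1 - n_1}{N_1 - 1}\right) + \frac{p_2 q_2}{n_2}\left(\frac{N_2 - n_2}{N_2 - 1}\right) = \frac{(0.5)(0.5)}{3}\left(\frac{6 - 3}{6 - 1}\right) + \frac{(0.4)(0.6)}{2}\left(\frac{5 - 2}{5 - 1}\right) = 0.05 + 0.09 = 0.14$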
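
The variance of the difference between two sample proportions can be evaluated exactly with fractions. The following Python sketch is illustrative only; `var_diff_proportions` is a name introduced here, and the inputs are $p_1 = 3/6$, $n_1 = 3$, $N_1 = 6$ and $p_2 = 2/5$, $n_2 = 2$, $N_2 = 5$.

```python
from fractions import Fraction

def var_diff_proportions(p1, n1, p2, n2, N1=None, N2=None):
    """Variance of the difference between two sample proportions.

    Pass N1 and N2 for sampling without replacement; leave them as
    None for sampling with replacement.
    """
    v1 = p1 * (1 - p1) / n1
    v2 = p2 * (1 - p2) / n2
    if N1 is not None and N2 is not None:
        v1 *= Fraction(N1 - n1, N1 - 1)
        v2 *= Fraction(N2 - n2, N2 - 1)
    return v1 + v2

p1, p2 = Fraction(3, 6), Fraction(2, 5)
var_with = var_diff_proportions(p1, 3, p2, 2)
var_without = var_diff_proportions(p1, 3, p2, 2, N1=6, N2=5)

print(p1 - p2)        # 1/10
print(var_with)       # 61/300  (about 0.2033)
print(var_without)    # 7/50    (0.14)
```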
DEFINITIONS
Population
A population is the totality of the observations of interest in a particular problem.
or
The population is a set of data that characterizes some phenomenon.
Finite Population
If a population has a finite number of elements, it is called a finite population. For
example, human population, number of chairs in a college.
Infinite Population
If a population has an infinite number of elements, it is called an infinite population.
For example, number of points on a line, number of stars in the sky.
Target Population
A population about which we want to get some information is called the target
population.
Sampled Population
A population from which a sample is drawn is called the sampled population.
Sample
A sample is a subset of data selected from a population.
or
A sample is a subset of the population that contains measurements obtained by an
experiment.
Random Sample
A sample selected by random sampling is called a random sample.
or
If a sample is selected from a population whose sampling units have known
probability of selection, equal or unequal, such a sample is said to be a random sample.
Sampling
Sampling is the process of drawing a sample from the population.
Random Sampling
The process of selecting samples from a group on the basis of chance or luck is
called random sampling.
or
A method of selecting samples so that each sample of a given size in a population
has an equal chance of being selected.
Sampling Units
Sampling units are nonoverlapping collections of elements from the population.
or
The basic elements that constitute a population are known as sampling units.
Simple Random Sample
A simple random sample is one in which every item from a population has the same
chance of selection as any other item.
or
A sample selected in such a manner that each possible sample of a specified size has
an equal chance of being selected.
Simple Random Sampling
A procedure for selecting members from a population in such a manner that each
drawing gives every available member an equal chance of selection.
or
A method of selecting items from a population so that every possible sample of a
specified size has an equal chance of being selected.
Stratified Random Sampling
In stratified random sampling the population is first divided into subgroups, called
strata, and a random sample is then taken from each stratum.
or
A stratified random sample is obtained by partitioning the sampling units in the
population into nonoverlapping subpopulations called strata. Random samples are
then selected from each stratum.
Parameter
A parameter is a numerical descriptive measure of a population.
or
A parameter is any measure which describes a population.
Statistic
A statistic is a quantity calculated from the observations in a sample.
or
A measure computed on the basis of sample data is termed a statistic.
Census
The study of all the data points in a population is called a census.
or
To study all the individual observations of the entire population is called a census.
Objectives of Sampling
(i) To get information about a population without examining each and
every unit of the population.
(ii) To find reliability of estimates derived from the sample.
Advantages of Sampling
(i) Sampling is cheaper than a complete count.
(ii) The data are collected and analyzed more quickly.
(iii) Sampling saves time.
(iv) A higher quality of labour with better supervision can be employed due to
the reduced volume of material.
(v) A small fraction of the population sometimes gives comprehensive and detailed
results.
Sampling Design
A sampling design is a definite statistical plan which has all steps taken in the
selection of the sample and the method of estimation.
or
The sampling design specifies the method of collecting the sample.
Sampling Frame
A sampling frame is a list of all sampling units in the population.
or
A list of the sampling units for a study is called a sampling frame.
Probability Sampling
A probability sampling is one in which the sampling units are chosen on the basis of
known probabilities.
or
When each and every element of the population has a known probability of being
selected in the sample, the sampling is said to be probability sampling.
Non-Probability Sampling
The sampling is called non-probability sampling when the procedure of selecting
the elements from the population is not based on probability but personal judgement
is involved in selection.
Sampling with Replacement
If an object is selected from the population and is replaced before the next
object is selected, such a selection is known as sampling with replacement.
or
Sampling is said to be with replacement when we draw a sampling unit from a
population and return it to the population before the next unit is drawn. In sampling
with replacement, a unit can be chosen more than once in a sample.
Sampling without Replacement
Sampling without replacement is performed when an object is not replaced in the
population after it has been selected.
or
Sampling is said to be without replacement when we draw a sampling unit from a
population and do not return it to the population before the next unit is drawn. In
sampling without replacement, a unit cannot be chosen more than once in a
sample.
Permutation
A permutation is an arrangement in which the order of the objects selected from a
specific pool of objects is important.
or
A permutation is an ordered arrangement of objects.
Combination
A combination is a collection of a group of objects without regard to order.
or
A combination is an arrangement of objects without regard to order.
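
The difference between permutations and combinations is easy to see by counting both for one small pool. This is a brief Python sketch (Python 3.8 or later for `math.perm` and `math.comb`); the pool of four letters is an arbitrary illustration.

```python
import math
from itertools import combinations, permutations

pool = ["A", "B", "C", "D"]
r = 2

ordered = list(permutations(pool, r))     # order matters: (A, B) != (B, A)
unordered = list(combinations(pool, r))   # order ignored: only (A, B)

print(len(ordered), math.perm(len(pool), r))     # 12 12
print(len(unordered), math.comb(len(pool), r))   # 6 6
```

The combination count is the one that gives the number of samples of size r drawn without replacement when the order of drawing is ignored.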
Sampling Error
The sampling error is the difference between a population parameter and a sample
statistic.
or
The difference between a sample statistic and its corresponding population
parameter is called sampling error.
Non-Sampling Error
All types of error other than sampling error, such as measurement error, interviewer
error and processing error, are called non-sampling error.
or
Non-sampling error is introduced by bias, consciously or unconsciously, on the part of
the researcher. This is due to improper sample selection, improper questionnaires, etc.
Bias
The difference between the mean or expected value of a statistic and the value of the
parameter being estimated is called bias.
or
Bias means a systematic component of error which deprives a statistical result of its
representativeness.
Sampling Distribution
The distribution of all possible values that can be assumed by some statistic,
computed from samples of the same size randomly drawn from the same population,
is called the sampling distribution of that statistic.
or
A probability distribution consisting of all possible values of a sample statistic is
known as a sampling distribution.
Standard Error
The standard deviation of the sampling distribution for a statistic is called the
standard error.
or
The standard deviation of any estimator is called the standard error of the
estimator.
Sampling Distribution of the Mean
If we take all possible samples of a given size from a population and determine the
mean of each sample, the probability distribution of the sample means is called the
sampling distribution of the mean.
or
A probability distribution of all possible sample means of a given sample size is
known as the sampling distribution of the mean.
Central Limit Theorem
If all samples of a specified size are selected from any population, the sampling
distribution of the sample mean is approximately a normal distribution. This
approximation improves with larger samples.
or
If the sample size is large, the theoretical sampling distribution of the mean can be
approximated closely with a normal distribution.
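
The central limit theorem can be illustrated by simulation. The Python sketch below is only an illustration under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a deliberately skewed, non-normal choice), the sample size n = 40 and the number of samples are arbitrary, and the seed is fixed so the run is repeatable.

```python
import math
import random
import statistics

random.seed(0)          # fixed seed so the run is repeatable
n = 40                  # sample size (a large sample, n > 30)
num_samples = 5000      # number of simulated samples

# Mean of each simulated sample from the skewed population
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

# Theory: mean of sample means = mu = 1, standard error = sigma / sqrt(n)
print(mean_of_means, sd_of_means, 1 / math.sqrt(n))
```

The mean of the simulated sample means should lie close to 1 and their standard deviation close to $1/\sqrt{40} \approx 0.158$; a histogram of `sample_means` would look roughly bell-shaped even though the population itself is strongly skewed.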
Population Proportion
The fraction of values in a population which has a specific attribute is called the
population proportion.
Sample Proportion
A sample proportion is the fraction of items in a sample that has the attribute of
interest.
Baslc Statlstlcs Part'U

MULTIPLB - CHOICE QU
Sample is a eub'eet of:
(a) population (b) data
(c) set (d) distribution,
List of all the units of the population is called:
(a) random samPling (b) bias
(c) samplirrg frame (d) probabrlitysamJrling.
8. Any calculation on the samplu data ie called:
(a) parameter (b) statistic
om
(c) X (d) error.
t . c
o
Any measure of the populatinn is called:

p
(a) tinite (b) para.Irreter

s
(c) without replaccment (d) random'

g
The difference between a stntistic ayrd the parameter is callerl:

o
(b) sampling error
l
(a) probabilitY
(d) non-random.
b
(c) random

.
Probability distribution of a statiirtic is called:

3
(a) sannplinB (b) parameter

4
(c) data (d) samphng distribution'

9
Sian,larrl deviation of the sampling distribution of a etatistic re called:

9I
(b) ,dispereion

t
(a) serious ertor
(c) standard error (rI) difference.

ta
s
If we obtain a point estinrate fo, o population mean p, the difference botween

X and p is callcd:
/: /
s
(a) sx6ndal{ 6rrror ft) bias
(d) difficult to tcll

tt p
(u) error of estima[ton
A'clistribution fcrrmed tiy all possible values of a statistic ie called:
' (b) hypergeometric distribution

h
(a) binomial distribution
(c) rrormul rlistribution (d) sampling digtribution
10. In probability sampling, probability of selecting an item from the population is
known and is:
(a) equal to zero (b) non-zero
(c) equal to one (d) all of the above
11. A population about which we want to get some information is called:
(a) finite population (b) infinite population
(c) sampled population (d) target population
12. Study of population is called:
(a) parameter (b) statistic
(c) error (d) census
[Chapter 11] Sampling and Sampling Distributions
13. For making voters list in Pakistan we need:
(a) sampling error (b) standard error
(c) census (d) simple random sampling
14. Sampling based upon equal probability is called:
(a) probability sampling (b) systematic sampling
(c) simple random sampling (d) stratified random sampling
15. In sampling with replacement, an element can be chosen:
(a) less than once (b) more than once
(c) only once (d) difficult to tell
16. In sampling without replacement, an element can be chosen:
(a) less than once (b) more than once
(c) only once (d) difficult to tell

17. In sampling with replacement, the following is always true:
(a) $n = N$ (b) $n < N$
(c) $n > N$ (d) all of the above

3
18. Suppose a finite population has 6 items and 2 items are selected at random
without replacement, then all possible samples will be:
(a) 6 (b) 12
(c) 15 (d) 36
19. Suppose a finite population contains 7 items and 3 items are selected at
random without replacement, then all possible samples will be:
(a) 21 (b) 35
(c) 14 (d) 7
20. A population contains N items and all possible samples of size n are selected
without replacement. The possible number of samples will be:
(a) $N$ (b) $P_N$
(c) $^{N}C_{n}$ (d) $N^{n}$
21. Suppose a finite population contains 4 items and 2 items are selected at
random with replacement, then all possible samples will be:
(a) 6 (b) 16
(c) 8 (d) 4
22. A population contains 2 items and 4 items are selected at random with
replacement, then all possible samples will be:
(al 1€ (b) I
iri) tfla (rl) 4
23. Suppose a population has N items and n items are selected with replacement.
Number of all possible samples will be:
(a) $N^{n}$ (b) $^{N}C_{n}$
(c) $N$ (d) $n$
24. In random sampling, the probability of selecting an item from the population
is:
(a) unknown (b) known
(c) undecided (d) one
25. Random sampling is also called:
(a) probability sampling (b) non-probability sampling
(c) sampling error (d) random error.
26. Non-random sampling is also called:
(a) biased sampling (b) non-probability sampling
(c) random sampling (d) representative sample
27. Sampling error can be reduced by:
(a) non-random sampling (b) increasing the population
(c) decreasing the sample size (d) increasing the sample size.

g
28. If N is the size of the population and n is the size of the sample, then sampling
fraction is:
(a)
l
(b) N,

b
nN

.
n Nen
(e) N (d)

29. The finite population correction factor is:

9
N+R
(a)
Iffi G) N+

9
1

t
6[-n \-Ir

a
(c)
Ii*= (d)
n*1

s t X

G\Fi ://
30. In sampling with replacement, the standard error of $\bar{X}$ is equal to:

(a) 6 /ii--o (b) ; .ol

G)fr s
tt p
(d) 4.
fr N

h
31. In sampling with replacement, standard error of the sample proportion $\hat{p}$ is
equal to:
(a) {*s (b)

, (e) \F (d)
N:N
N-1

88. Ifpl=pz=pand Dr * nr; then S.E (0, - 6r) ia equal to:

,-\ ElllJ. + IL!91


\4, nr n2 (r,) ?-Y
EllLt + E!ll! IL Fa)
1
(e) (d) ,ql..t
/IJoo(;
tll tu nt)
RI,

m
,l

34. The selection of cricket team for the world cup is called:
(a) random sampling (b) systematic sampling
(c) purposive sampling (d) cluster sampling
35. Random sampling is also called:
(a) probability sampling (b) judgment sampling
(c) quota sampling (d) sequential sampling
36. A complete list of all the sampling units is called:
(a) sampling design (b) sampling frame
(c) population frame (d) cluster
37. A plan for obtaining a sample from a population is called:
(a) population design (b) sampling design
(c) sampling frame (d) sampling distribution
38. If a survey is conducted by a sampling design, it is called:
(a) sample survey (b) population survey
(c) systematic survey (d) none of the above
39. The difference between the expected value of a statistic and the value of the
parameter being estimated is called a:
(a) sampling error (b) non-sampling error
(c) standard error (d) bias
40. The standard deviation of any sampling distribution is called:
(a) standard error (b) non-sampling error
(c) type-I error (d) type-II error
41. The standard error increases when sample size is:
(a) increased (b) decreased
(c) fixed (d) more than 30

42. The mean of sampling distribution of means is equal to:

(d) x (b) p
(e) B (d) noRe ofthc abovc
43. The mean of the sample means is exactly equal to the:
(a) sample mean (b) population mean
(c) weighted mean (d) combined mean

44. (Sum of all sample means) / (Total number of samples) is equal to:
(a) $E(\bar{X})$ (b) $\mu$
(c) both (a) and (b) (d) none of the above
45. A sample which is free from bias is called:
(a) biased (b) unbiased
(c) positively biased (d) negatively biased
46. If $E(\bar{X}) = \mu$ then bias is:
(a) positive (b) negative
(c) zero (d) 100%
47. If $E(\bar{X}) = 10$ and $\mu = 10$ then bias is equal to:
(a) 0 (b) 10
(c) 20 (d) difficult to tell
48. If $\bar{X} = 10$ and $\mu = 12$ then sampling error is equal to:
(a) 10 (b) 22
(c) 12 (d) 2

t
49, Thc etandard dcviation of ihc distribution of earnplc tneaRs is equal to:

a
t
(a) o'lfi (h,) 16 ln

s
(d) s/n

/: /
(e) s Nn
Ifn= 26,ss=26andX=26, then ctandard errsr sf X wifl bcl

s
I0,
(b)
p
(a) 2F 6

t
'0
t
(s) 1(d)

x'hsu*ES#iaaalled;
(a) unbiaaed sample varlanee (b) populatien variense
(e) bieeed sample vflriaRes (d) all sf the Bbove

re-l*
"n =
ElX.i* le enlled:
E,
n: I
(a) unbiascd samBle varianse (b) true varianse
(e) bialed eample variRnsP (el) varlenQe sf meanc
B, If H(e*) * B end sB s I then binE will bg;
(a) 6 $) B

(e) a (d) I
I1 54. ln sampling without ,.Olu.u*ort, the standard error of sampling distribution
of sample proportion $ is equal to:

I (a) b'(#J 0) Y(ilfr)


(c)#m ,(d) *(*--J
55. Wheh saurpling is done without replaeement oO is equal to:

(a) * om
c
,0,.
fr
fr1[t= (d);[--T t .
o
(c)
56. In case of sampling with replacement or. " is equal
Pr-Pz - '
s
to:
p
{r+ aFog
l
(a, (b)

. b
3
(c) (d)

9 4(b)
57. The distribution of the rneans of sarnples of size 4, taken from a populati,on

9 (d)
with a standard deviation o, has a standard doviation of:

t
(a) o ot4

ta
(c) c/2 o'12

s
58. In sampling ivith replacemen6,, of is eeual to:

/: /
r_;,

s
(b)

tt p
o:l+ ol

h
(c) (d)

59.. When sampling is done with or without replacement, E($, - 0, it equal to:
(a) 0,-0, 'i (b) pr - pr
(c) Pr * Pz (d) prPz
60. In case of sampling with replacement, E(S2) is equal to:
(a)'(--l) ' (ur (,,--u- j "'
(c) (N-) (d) *
"'
{
r


61. In sampling without rcplacement, the expected value of 52 is equal to:

(a) (Y)ffi" (b) (*_J(#'J ",

62,
(c)
When
k*)(*"
sampling
)owith is done
(d)

replacement, then pr.z is equal to:

m
(a) (b) *

o
o2

. ffi c
lcz
(c) (a) o'

t
\n-
(b) (#J", o
68. In sampling without replacement, pr:l is equal to:

(d) (#) s
p
og
l
",

G)b
64. When eampling is done with or without replacctncnt,
i, i* cquitl to;

.
1tO,
-

3
(a) - pz Fr Fr + Itz

(c) - Pz ltr

9 4 (d)
ff-ff
9
GE. If X represent the number of units having the specified charactcristic and n is

(b) *at
the size of the sample, then sarnple proportion $ is cqual to:

t
n X+o o
(a)
s
(c) (d)

/
x r/n

/
(a)* s: tu)*
G6. If X represent the number of units having the specified characteristic and N is
the size of the population, then population proportion p is equal to:

tt p
(c) (d) x o:l
N N

1. (a) 2. (c) 3. (b) 4. (b) 5. (b) 6. (d) 7. (c) 8. (c)
9. (d) 10. (b) 11. (d) 12. (d) 13. (c) 14. (c) 15. (b) 16. (c)
17. (d) 18. (c) 19. (b) 20. (c) 21. (b) 22. (a) 23. (a) 24.
25. (a) 26. (b) 27. (d) 28. (c) 29. (c) 30. (c) 31. (d) 32.
33. (d) 34. (c) 35. (a) 36. (b) 37. (b) 38. (a) 39. (d) 40. (a)
41. (b) 42. (b) 43. (b) 44. (c) 45. (b) 46. (c) 47. (a) 48. (d)
49. (c) 50. (c) 51. (c) 52. (a) 53. (d) 54. (c) 55. (c) 56. (c)
57. (c) 58. (c) 59. (b) 60. (a) 61. (c) 62. (a) 63. (b) 64. (a)
65. (b) 66. (c)
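Several of the questions above ask how many samples are possible with and without replacement. The counts can be confirmed by listing the samples directly, as in the following sketch. The function names are made up for this illustration; samples drawn without replacement are treated as unordered (combinations) and samples drawn with replacement as ordered, which is the convention the answer key appears to follow.

```python
from itertools import combinations, product
from math import comb


def count_without_replacement(N, n):
    """Number of unordered samples of size n from N distinct items."""
    return len(list(combinations(range(N), n)))


def count_with_replacement(N, n):
    """Number of ordered samples of size n when each item is put back."""
    return len(list(product(range(N), repeat=n)))


print(count_without_replacement(6, 2))  # 15, the same as comb(6, 2)
print(count_without_replacement(7, 3))  # 35, the same as comb(7, 3)
print(count_with_replacement(4, 2))     # 16, the same as 4 ** 2
print(count_with_replacement(2, 4))     # 16, the same as 2 ** 4
```

For large N the enumeration becomes slow, and the closed forms `comb(N, n)` and `N ** n` give the same counts at once.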

SHORT QUESTIONS
1. Given $\mu = 6$ and $n = 80$. Find $\mu_{\bar{X}}$.
Ans. 6
2. Given $n = 36$ and $\sigma = 6$. Find $\sigma_{\bar{X}}$.
Ans. 1
3. Given $n = 25$ and $\sigma_{\bar{X}} = 5$. Find the value of $\sigma^2$.
Ans. 625
4. Given $\mu_1 = 10$ and $\mu_2 = 6$. Find $\mu_{\bar{X}_1 - \bar{X}_2}$.
Ans. 4
5. Given $n_1 = 30$, $n_2 = 25$, $\sigma_1^2 = 300$ and $\sigma_2^2 = 150$. Find $\sigma^2_{\bar{X}_1 - \bar{X}_2}$.
Ans. 16
6. Given $N = 300$, $n = 100$ and $\sigma^2 = 200$. If sampling is done without replacement
then find the value of $\sigma_{\bar{X}}$.
Ans. 1.16
7. Given $N = 310$, $n = 100$ and $\sigma^2_{\bar{X}} = 35$. If sampling is done without replacement,
then find $\sigma^2$.
Ans. 5150

a t
8. Given
It
s t
3, Dr = 2, N, = .1, D,r = 2, g? = g/g nnct ol = 6l4.lf rnmpling ir done

/: /
.-
without ruflo."*unt, then fincl the valul of o1
Ii- xz

s
Ans. 1.08 ra

tt p
9. Given $N = 7$, $n = 2$ and $\sigma^2 = 16$. If sampling is done without replacement, then
find $E(S^2)$.
Ans. 9.33
10. Given $N = 7$, $n = 2$ and $\sigma^2 = 16$. If sampling is done without replacement, then
find $\mu_{s^2}$.
Ans. 18.67
11. Given $\mu = 6$, $n = 2$ and $\sigma^2 = 10.8$. Find $E(S^2)$.
Ans. 5.4
12. Given $\mu = 6$, $n = 2$ and $\sigma^2 = 10.8$. Find $E(s^2)$.
Ans. 10.8
13. Given $N = 7$, $n = 3$, $\mu_{\hat{p}} = 3/7$. Find the value of population proportion $p$.
Ans. 3/7
14. Given $N = 7$, $n = 3$ and $p = 3/7$. If sampling is done without replacement, find
$\sigma^2_{\hat{p}}$.
Ans. 0.0544
15. Given $n = 5$ and $p = 0.5$. Find $\sigma^2_{\hat{p}}$.
Ans. 0.05
16. Given $p_1 = 2/3$, $n_1 = 2$, $p_2 = 1/2$ and $n_2 = 2$. Find $\mu_{\hat{p}_1 - \hat{p}_2}$.
Ans. 1/6
17. Given $p_1 = 2/3$, $n_1 = 2$, $p_2 = 1/2$ and $n_2 = 2$. Find $\sigma^2_{\hat{p}_1 - \hat{p}_2}$.
Ans. 0.24
18. Given $N_1 = 4$, $n_1 = 2$, $N_2 = 4$, $n_2 = 2$, $p_1 = 1/2$ and $p_2 = 1/4$. If sampling is done
without replacement, find S.E.$(\hat{p}_1 - \hat{p}_2)$.
Ans. 0.3819
l o
19. What is the value of the finite population correction t'actor r'"'hi:t: :t -' lli rttld
N = 125.
. b
Ans.0.93

43
20. Differentiate between sampling with and without replacement.
21. Distinguish between probability and non-probability sampling.
22. Differentiate between parameter and statistic.
23. Distinguish between population and sample.
24. Distinguish between sampling and non-sampling errors.
25. Differentiate between simple random sampling and
sampling.
26. Explain the term sampling frame.
27. Define the standard error.
28. Distinguish between simple random sample and simple random sampling.
29. Explain the term sampling design.
30. Differentiate between finite and infinite populations.
31. Define the sampling distribution.
32. Differentiate between random sample and simple random sample.
33. Write down the advantages of sampling.
34. Write down the basic aims of sampling.
35. Define the terms sample and sampling.
36. Define the sampling distribution of means.
37. Describe the properties of the sampling distribution of sample means.
38. Define the sampling distribution of sample proportion and describe its
properties.
39. What is meant by bias?
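The numerical short questions above rest on a handful of standard-error formulas. The sketch below collects them in small helper functions so the printed answers can be re-derived. The function names are invented for this illustration, and the formulas are the usual ones for this chapter, taken here as assumptions: $\sigma^2_{\bar{X}} = \sigma^2/n$ and $\sigma^2_{\hat{p}} = pq/n$, each multiplied by $(N-n)/(N-1)$ when sampling is without replacement, and $E(S^2) = \frac{n-1}{n}\sigma^2$, multiplied by $N/(N-1)$ without replacement.

```python
def fpc(N, n):
    """Finite population correction (N - n)/(N - 1), applied to a variance."""
    return (N - n) / (N - 1)


def var_mean(sigma2, n, N=None):
    """Variance of the sample mean; pass N for sampling without replacement."""
    v = sigma2 / n
    return v * fpc(N, n) if N is not None else v


def var_prop(p, n, N=None):
    """Variance of the sample proportion; pass N for sampling without replacement."""
    v = p * (1 - p) / n
    return v * fpc(N, n) if N is not None else v


def expected_S2(sigma2, n, N=None):
    """Expected value of S^2 (divisor n); pass N for sampling without replacement."""
    e = (n - 1) / n * sigma2
    return e * N / (N - 1) if N is not None else e


# Re-deriving some of the printed answers:
print(round(var_mean(300, 30) + var_mean(150, 25), 2))  # variance of a difference of means
print(round(var_mean(200, 100, N=300) ** 0.5, 2))       # standard error, without replacement
print(round(expected_S2(16, 2, N=7), 2))                # E(S^2), without replacement
print(round(var_prop(3 / 7, 3, N=7), 4))                # variance of p-hat, without replacement
```

Leaving `N` as `None` gives the with-replacement (or infinite population) value, so a single function covers both cases.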
EXERCISES
1. A population consists of five numbers 3, 7, 11, 15 and 19. Take all possible
samples of size two without replacement from this population. Find the mean
and standard deviation of the sampling distribution of means.
Ans: $\mu_{\bar{X}} = 11$, $\sigma_{\bar{X}} = 3.46$
2. Take all possible samples of size 3 without replacement from the population 2,
6, 8, 12 and 14. Form sampling distribution of mean and find its mean and
variance. Verify that: $\mu_{\bar{X}} = \mu$ and $\sigma^2_{\bar{X}} = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right)$
Ans: $\mu_{\bar{X}} = 8.4$, $\sigma^2_{\bar{X}} = 3.04$, $\mu = 8.4$, $\sigma^2 = 18.24$
3. Draw all possible samples of size two without replacement from the population
16, 18, 20 and 22. Calculate their means and prepare the frequency
distribution of sample mean. Compute mean and variance of frequency
distribution of mean and compare them with population mean and variance.
Ans: $\mu_{\bar{X}} = 19$, $\sigma^2_{\bar{X}} = 1.67$, $\mu = 19$, $\sigma^2 = 5$
4. A population consists of four numbers 5, 6, 7 and 8. Take all possible samples
of size three without replacement from this population. Calculate mean and
variance of sample means and compare them with population mean and
variance.
Ans: $\mu_{\bar{X}} = 6.5$, $\sigma^2_{\bar{X}} = 0.14$, $\mu = 6.5$, $\sigma^2 = 1.25$
5. A population consists of two elements 24 and 35. Take all possible samples of
size two with replacement and find their means. Make a sampling distribution
of sample means and find its mean and standard deviation. Verify that:
(i) $\mu_{\bar{X}} = \mu$ (ii) $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
Ans: $\mu_{\bar{X}} = 29.5$, $\sigma_{\bar{X}} = 3.89$, $\mu = 29.5$, $\sigma = 5.5$
6. A population consists of three values 20, 40 and 60.
(i) Take all possible samples of size 2, which can be drawn with replacement
from this population and find means of these samples.
(ii) Make a frequency distribution of the sample mean and show that the
variance of this distribution is equal to the population variance divided by
the sample size.
Ans: Var$(\bar{X}) = 133.33$, $\sigma^2 = 266.67$

A population conrists of values B, 8 and |Z' Tako


']lry.tibt3-i:T.l]tt :|;':f"3
XrtITffil;;J; ...,r,prii,s' with.':plu:onl,lI: I::lll:",.:,i:T::X
ffiih.,iilflf r.iiuiA pro"i thit etnndurd orror ol'rnean thc squurc - ie root
sia!'
of populaUon varianco dividod by tho samplc
.t
; Ar*S,Ed) I l.?6, ot=8,2222
2, A'Then show

m
=(ffi;;;'--.U *rtble ramploe of aizo-8eample
with replaccmcnt'from

o
; ;h"t tii population mean = moan of moans

c
(ii) rtandard error = population S'D'AF

.
'

ot
p
would be
a populntion are 6 nnd 2.16 reepectively' Whnt
if
'i.. L
r' If Jcan and varianco

s
;ffi;il; .;;;i ruati ii eumploe of eizc 4 arc clrawn with rcplaccment'

g
i{:'

l o
=nuo'"*tn*
of population ar-e ? and 8.16 respectively. what wo*Id
be
{,*t''6)
b
," '- ;;pi;r
.
(i from
Irffift error of mean if arc drawn without roplacement of size
:- ,0\
3
poiulation of size

4
,.J
irr 9.8.6) = o.48llo {n

9
if (i) sayp.lg of 36 ie diawn
$lhat will be the mean and variance of eample means {t;'s,6, 7' (ii) Slrnple of 4 is

9
, u.
'-'*tf,
t
,"pfl*rmnt frgm tho potrulation 1, zia,.a, 4', 4, (i).

a
fii*o iiin ut rrplalrcment from the population given in

s t
//
*1|iatili :
,'i;l'fr:[f d:ltlnfirffi"'i!:iii&#,1'ffi ,':l
p s
"#ffi;,stJl.'l}ffi oach-Fre obtaiired, what would be the expected nt€an

t
"ft[".iuaonh of tho' r'oculting earnpling distribution of menne if
t
end *anaaiJ-aeviiiion
;1rpUnf *|[ dilii) *ith ,.plocompnt (ii) without replacemcnt?
h
A[(i)Xf s 66, cI= 0-.6 - (ii)p*='66, or= 0'5
a tn-enn of 68'O.inches
laTh! hslght of 1600 etudente:aqq,normallV distributed withsamplee of eize 26 are
end e.t"nA"ri-i"ri"iion of 2.6 inchee.-lf 300 randorn
the expectcd" mean and stsndard
d'rn hom d;-;;;;fiti;n;'laomtrinu
'*rptini'aittriUution of meane if ryrnnling is done
'

tri.ti,ri Of ffi
(l) sith rcplacement (ii) without roplace*1nt' '
''
A1;Op;a 08, c*= Q.$;:"i (il)g;='68, of = 0'6 '''''
n
'

la, data.on.earninga of industry workero'''


--' ffrai.a"r"f Buneau of Statistlcg.collects
qf workerg in the indusJry is Re1$6p0' S1tpP389
iil ""ininst
' . thrt 'o*Il;;Llt
n ruch workers arc to be sclectecl nt rnndom., I-ct X donotc the
mean
t...
(+= {-L-r g,a
/\
(,@\i-(w-c
-25
P)
Dlstrlbutlons {Tl V^(-( )
weekly salary gll thc workers chosen. Asnuming n populntion strlndarcl
cleviation of Rs. foo, ,tnn thc rncan ilo I in ,Ut'i-
strrioarcl deviation or
(i) g=25 (ii)n=100 (iii)n=40d Si
Ans: (i) F* = 1{00, o*- = 40 (ii) Ff = iObO, of = 20 (iii) }r1= 1600, oX = _10
15. Suppose a random sample of size n is taken from a population of size N.
(a) Assume $n = 1$. Compute $\sigma_{\bar{X}}$, if the sampling is done
(i) with replacement (ii) without replacement.
(b) Assume $n = N$. Compute $\sigma_{\bar{X}}$, when sampling is done without replacement.
Ans: (a) (i) $\sigma_{\bar{X}} = \sigma$ (ii) $\sigma_{\bar{X}} = \sigma$ (b) $\sigma_{\bar{X}} = 0$
16. A population consists of the three numbers 2, 4, 6. Consider all possible samples
of size two which can be drawn with replacement from this population. Find the
mean of the sampling distribution of variances.
Ans. $\mu_{S^2} = 1.3333$
17. A population of 7 numbers has a mean of 40 and a standard deviation of 3. If
samples of size 5 are drawn from this population and the variance $S^2 = \frac{\sum (X - \bar{X})^2}{n}$
of each sample is computed, find the mean of the sampling distribution of
variances if sampling is: (i) with replacement (ii) without replacement.
Ans. (i) 7.2 (ii) 8.4
18. A population consists of four values 4, 10, 14, 20. Take all possible samples of
size two without replacement from this population and verify that
$\mu_{S^2} = \left(\frac{N}{N-1}\right)\left(\frac{n-1}{n}\right)\sigma^2$
Ans. $\mu_{S^2} = 22.67$, $\sigma^2 = 34$
19. A population consists of three numbers 4, 6, 8. Take all possible samples of size
two with replacement from this population. Find the mean and the unbiased
variance for each sample. Show that (i) $\mu_{\bar{X}} = \mu$ and (ii) $\mu_{s^2} = \sigma^2$.
Ans. $\mu_{\bar{X}} = 6$, $\mu_{s^2} = 2.67$, $\mu = 6$, $\sigma^2 = 2.67$
20. A population consists of four values 4, 6, 8, 10. Take all possible samples of size
two without replacement from this population and verify that $E(s^2) = \left(\frac{N}{N-1}\right)\sigma^2$,
where $s^2 = \sum (X - \bar{X})^2/(n - 1)$.
Ans: $E(s^2) = 6.67$, $\sigma^2 = 5$.

21. Let $\bar{X}_1$ represent the mean of a random sample of size $n_1 = 2$ with replacement
from a finite population consisting of values 4, 8. Similarly, let $\bar{X}_2$ represent the
mean of a random sample of size $n_2 = 2$ with replacement from another finite
population consisting of values 2, 4. Form a sampling distribution of $\bar{X}_1 - \bar{X}_2$.
Verify that: (i) $E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$ (ii) Var$(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
Ans. $E(\bar{X}_1 - \bar{X}_2) = 3$, Var$(\bar{X}_1 - \bar{X}_2) = 2.5$, $\mu_1 = 6$, $\sigma_1^2 = 4$, $\mu_2 = 3$, $\sigma_2^2 = 1$

p
22. Draw all possible random samples of size $n_1 = 2$ without replacement from a
finite population consisting of 3, 6, 9. Similarly draw all possible random samples
of size $n_2 = 2$ without replacement from another finite population consisting of
2, 4, 6.
(a) Find the possible differences between the sample means of the two
populations.
(b) Construct the sampling distribution of $\bar{X}_1 - \bar{X}_2$ and compute its mean and
variance.
Verify that: (i) $\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$ (ii) $\sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1}\left(\frac{N_1 - n_1}{N_1 - 1}\right) + \frac{\sigma_2^2}{n_2}\left(\frac{N_2 - n_2}{N_2 - 1}\right)$
Ans: $\mu_{\bar{X}_1 - \bar{X}_2} = 2$, $\sigma^2_{\bar{X}_1 - \bar{X}_2} = 2.1667$, $\mu_1 = 6$, $\sigma_1^2 = 6$, $\mu_2 = 4$, $\sigma_2^2 = 2.67$
20( = 300' nr = 100
)' 6z= 250' Nr = 400' Nz.l^rrinrinn
Given pr = 4500,"p2= 4000, or =

:ffiJii";sdil#;;;";iiiii';",i"
'88. of the
Iil'l',ul=u61",i"rli",i,l?'"".0;.''d *'un ilu ---r -l1l,1Tt.
1::::""n
^r^-,r^-,r or

tt p
tho means if sarnpling ':-l"nu
"r
(r) with rdptacemcRt Gi) without replacement

24. h
Ans: (i) 600 and 40.62 (ii) 500 and 36'69
Given Nr = 125, nr = 30, pt = ?8' oi = 150'
200, l-r0, pz = 85 and

Xr) when sampling is done


ol2 = 200. compute E(Xz - Xr) and var(xz -
(l) with rePlacement (ii) without rePlacement
and 9
(i) ?, auu
Ahs: (r,
Ans: r (ir) q'rs 6'85
\u,, 7' and v'.--v
4' 6-''1I and 10. Draw all possible
ilfi, Th""" are frve digits in a population-,-!i'e;^-l2' ri-t rtlooortion t0)
t0) or
of
nnd llra oornnle p'oportion
lli"J;Tlr};:";ffi;ffi."i,"a
ai*tt in pach sample' Verify that:
the sampre

"r"ri pll /N-n\


(D' E(0) = p(population proportion) (ii) S.E.(0) = n \.N-1,/

Ans:E(0) = 0.6, s.E(0) = o'2, P = 0'6



26. Draw all possible sarnples of size three without replacetnent froln the
populatlon'A, 4, b and.7. Calculate ptoptrtioh of odd nutnbers in cach.-
s;tttrul& ,t
andverifythat: (i) E($) = p (ii) var(01 ='T(H=)- --
where fi and p are proportions of odd numbers in sarnple ancl population,
' respectivelY.

27. In a private lodge, there live five friends and their marital status is U, M, M, U,
M where U and M stand for unmarried and married respectively. Find the
proportion of married friends in the population. Take all possible samples of
two friends without replacement from this population and find the proportions
of married friends in each sample. Make the sampling distribution of the
sample proportion and verify that: (i) $\mu_{\hat{p}} = p$ (ii) $\sigma^2_{\hat{p}} = \frac{pq}{n}\left(\frac{N-n}{N-1}\right)$
Ans: $\mu_{\hat{p}} = 0.6$, $\sigma^2_{\hat{p}} = 0.09$, $p = 0.6$

.
28. (a) Draw all possible samples of two letters each, with replacement from the
letters of the word "NEW".
(b) Find proportion of letter "E" in each sample.
(c) Make sampling distribution of proportions obtained in part (b).
(d) Find mean and variance of the distribution.
(e) Verify that: (i) $\mu_{\hat{p}} = p$ (ii) $\sigma^2_{\hat{p}} = \frac{pq}{n}$
Ans. $\mu_{\hat{p}} = 1/3$, $\sigma^2_{\hat{p}} = 1/9$, $p = 1/3$
s
29. Suppose a random sample of size $n = 80$ is taken from a population of size
$N = 500$. The proportion in the population being calculated is 72.3%. Find the
mean and standard deviation of the distribution of sample proportions when
sampling is done (i) with replacement (ii) without replacement.
Ans: (i) $\mu_{\hat{p}} = 0.723$, $\sigma_{\hat{p}} = 0.050$ (ii) $\mu_{\hat{p}} = 0.723$, $\sigma_{\hat{p}} = 0.046$
30. Find the mean and standard deviation of the sampling distribution of
proportions for $n = 100$ and a population proportion of
(i) 20% (ii) 40% (iii) 50% (iv) 90%
Ans: (i) $\mu_{\hat{p}} = 0.20$, $\sigma_{\hat{p}} = 0.04$ (ii) $\mu_{\hat{p}} = 0.40$, $\sigma_{\hat{p}} = 0.05$
(iii) $\mu_{\hat{p}} = 0.50$, $\sigma_{\hat{p}} = 0.05$ (iv) $\mu_{\hat{p}} = 0.90$, $\sigma_{\hat{p}} = 0.03$
of size
3l. Let $, represerit the proportion of even nutnbers in a randorn Salnple
or = 2 without replacement frorn a finite popul:rt.ion consisting of values 4, 6,
!I
of
Similarly, Iet $, represent the proportion of cvcn nutnbcrs in a randotn s;arnpie

size n, = 2 without replacement from another finite population consisting of


valuee 2, 2, 5. Form a sampling dfitribution of (fi - fir). Verify that:
'
(i) E($, - 0J = pr - pg (ii) var (0, - 0,) = ? (H) - T (PJ
Ans. E(fl, - 0r) = o, Var ($, - 0J = l/9, p, = 213, P:r = 2/3
82. Solve directly:
(i) Given N = 1500, n = 30, lt = 22,4. Find'[*.
om
(ii) $iven n = 25, o* = 2'5' Find o?

t . c
(iiiFGiven N= 310, n= 100, ltX = 24000
p o
oe = 5000. Fincl o!x (without

(it)
, replacetnent).

g s
Given F = 6, oe = 8, n = 2. Irind p1 ancl pru whcn sampling is done with

l o
b
replacement and Sz = E( X-I

.
;21n.
whcn sampli'g is donc with

3
(V).,lciren Fr = b, Fsz = 4, n = Z Fincl p and o2

9 4
replacement and Sz = E( X-X ;21n.

9
(vi) GivenN = 5, n= 2, p= 6, o2 = 8. F'ind p1 and prz when sampling is done

a t
t
without replacement ancl Sz = E( X-X ;2ln'

s
:2,
f,,2= 2,Fr = 6, ltz= 2, o\= 2.67, oi= Fincl lt*,

/
(vii) Given ill O.OZ.
- *,
andof,-*r.
:/
p s = 0.3, o? = 0.36, o?r= 9.16. Find lrr -
t
(viii) Given n, = 49, n, = 36, FX, - f, ltz

h t ,nd oX, _ x,
(ix) Given px,
- xr= 4, $.t=
6, or = 2.25, Nr = 30, N, = 25, n, = 4, nr= 4,

6.25. Fincl p, ancl o, when sampling is done without replacetnent'


9x, - X, =

&1 Given N = 5, n = ,.p =


2,lrl 2/5. Find p and of, when sampling is clone without
, p

a.
replacement.
(xi) Given pr= LlZ, 'pz = 1/3, Nr = 3, trr = 2, Nz = 3, tz= 2. Find B(0, - 0J nnd
S.E. 0r) *hon sarnpling is done without rcplaccment'
6r -
Ans. (i) 22.4 (ii) 156,25 (iii) 33.98 (iv) 6, 4 (v) 5, 8 (vi) 6, 5
/,,ii\ ,{ 11.67
(vii) 4, A1 /rriii\ o 1O82 (ix) 10,
R 0.1082
(viii) o0'3, 10. 13.166 (x)
k\215.0.09 (xi) 1/6,0.3436.
215, O'09 (xi)
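Most of the exercises above ask for every possible sample to be listed and the resulting sampling distribution to be compared with a formula. As a hedged sketch of that procedure, the code below works through the population 3, 7, 11, 15, 19 with samples of size two drawn without replacement, using exact fractions so that the check is free of rounding. The helper names are assumptions made for this example.

```python
from fractions import Fraction
from itertools import combinations

population = [3, 7, 11, 15, 19]
n = 2
N = len(population)


def mean(values):
    """Exact arithmetic mean as a Fraction."""
    values = list(values)
    return Fraction(sum(values), len(values))


def variance(values):
    """Variance with divisor equal to the number of values (population style)."""
    values = list(values)
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


# Every unordered sample of size n, and the mean of each one.
sample_means = [mean(s) for s in combinations(population, n)]

mu_xbar = mean(sample_means)       # mean of the sampling distribution
var_xbar = variance(sample_means)  # variance of the sampling distribution

mu = mean(population)
sigma2 = variance(population)
formula = sigma2 / n * Fraction(N - n, N - 1)  # (sigma^2 / n) * (N - n)/(N - 1)

print(mu_xbar, mu)                    # both equal 11
print(var_xbar, formula)              # both equal 12
print(round(float(var_xbar) ** 0.5, 2))  # standard error, 3.46
```

Replacing `combinations(population, n)` by `itertools.product(population, repeat=n)` and dropping the correction factor gives the with-replacement version of the same check.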
Chapter 12
STATISTICAL INFERENCE
ESTIMATION
12.1 INTRODUCTION
A person selected at random from a certain place shows the colour, habits and
language of the people of his area. He gives some information about all the people.
We say that he is the representative of all the people of his area. Whatever we gain
from this person is the inference about the lot of people among whom he has been
selected. From a particular person, we gain information about the main group of
people. The information travels from the particular individual to the general group
of people. To get better information about the people, we may select more than one
person out of them. We say that the sample size has been increased. A large sample
usually contains greater information about the population. In our daily life, if an
individual says something, we may not accept it. We may have some doubts about it.
But if many people say something, we feel like accepting whatever they say. The
same logic works in statistical reasoning. Every individual from a statistical
population speaks about the properties of the population. Every drop of water from
an ocean has certain characteristics which lead us to some conclusions about the
water in the ocean. A single drop of water or oil is sometimes sufficient to give a clear
picture about these liquids in the big containers. In statistical studies, if the
population contains individuals whose characteristics are similar, then a small
simple random sample is sufficient to give the required information about the
properties of the population. We decide about the sample size according to the
nature of population and the nature of the study. The conclusions based on sample
data are related to probability.
12.2 STATISTICAL INFERENCE
The information gained from the sample data is used to reach some conclusions
about the characteristics of the population. This process is called statistical
inference. Statistical inference is based on the principles of the sampling theory.
There are two approaches of statistical inference, namely:
(i) Estimation of parameters (ii) Hypothesis testing or testing of hypothesis.
In this chapter we discuss estimation; hypothesis testing will be discussed
in the next chapter.
12.2.1 APPROACHES OF STATISTICAL INFERENCE
There are two approaches of statistical inference called estimation and testing
of hypothesis. These two approaches are completely different as far as their
conclusions are concerned but both are based on the theory of probability and the
principles of .u.rrpting thcor.'*' They start wirl: a sr'r1:;f '.
the .1a1n':::'::",lii:i:::,^li:
ffiJ;;:"#fi;ffi';; ;i;; ,;;;;oo"""1'o'ate paths ,!'11':':-j,:,.ij: ::::.f'::*::
irffi,h"^;;;;;;;-. of, stnokers i* o,-ir cou*trv. Tl-ris rs a^prohlem l-t]'tl:"f::
t"ireE of some size claims that
ffi; .r7;,1"r;r,,. !:"u;; u ,r,untrr*ti;.re' of .opp*t rr L^1*.
H'J;;-'*il;:;;;;t';'"I1,i."i,,"i;'. i' 10 kgu;'rricr
i{is clailn is to }'rc tesrt''d rvith thc- help
ir1'pctiresis testing'
oi uorr-r. test!. Thisls a probleru rv6icit
corncs

12.3 ESTIMATION
Statistical inference about the unknown values of the population parameters is
called estimation of parameters. Suppose we are interested in knowing the average life
of tires of a certain firm. This means we want an estimate of something which is not
known to us. It is a problem of estimation. The minimum and maximum cholesterol
level of persons is also a problem of estimation. These estimates are provided by the
sample observations. Estimation of parameters is done by two methods which are:
(i) Point estimation (ii) Interval estimation.

l o
12.3.1 POINT ESTIMATOR AND POINT ESTIMATE
A point estimate is a value calculated from the sample data. A single value
calculated from a sample or samples is called a point estimate. Consider a simple
random sample of size 4 with values 10, 15, 20 and 25. The mean of this sample is
(10 + 15 + 20 + 25)/4 = 17.5. Thus the sample mean 17.5 is a point estimate of the
unknown population parameter $\mu$. Suppose we are interested in knowing the percentage of
children under 5 years who take tea regularly with the breakfast. We have taken a
simple random sample of 100 children and 60 are found to be habitual of taking tea.
The sample proportion 60/100 = 0.6 is called the point estimate of the unknown
population proportion $p$. If the parameter is $\theta$, the specific value calculated from the
sample is called a point estimate of $\theta$. Suppose we have two samples of sizes $n_1$ and
$n_2$ and we have calculated their proportions $\hat{p}_1$ and $\hat{p}_2$; the difference $(\hat{p}_1 - \hat{p}_2)$ is also
a point estimate of the actual difference between the population parameters. This
unknown parameter may be denoted by $(p_1 - p_2)$.
The word estimator is used in general for a statistic. An estimator is based on the
sample and in general differs from sample to sample. It is a random variable with a
probability distribution called the sampling distribution. If the unknown population
parameter is denoted by $\theta$, its estimator is generally denoted by $\hat{\theta}$ (theta hat). If the
parameter is the population mean $\mu$, the sample mean $\bar{X} = \sum X/n$ (with any value) is
the estimator of $\mu$. Median and mode are also estimators of $\mu$. The sample proportion
$\hat{p}$ is an estimator of population proportion $p$. The sample variances $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$ and
$S^2 = \frac{\sum (X - \bar{X})^2}{n}$ are estimators of population variance $\sigma^2$. We have already learnt in
chapter 11 that $E(s^2) = \sigma^2$ and $E(S^2) \neq \sigma^2$. At some later stage we shall call $s^2$ an
unbiased estimator of $\sigma^2$ and $S^2$ a biased estimator of $\sigma^2$.
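The point estimates in this section can be reproduced with a few lines of code. The sketch below uses the sample 10, 15, 20, 25 and the 60-out-of-100 tea example from the text; the variable names are chosen only for this illustration. `statistics.variance` divides by n - 1 (the text's $s^2$) and `statistics.pvariance` divides by n (the text's $S^2$).

```python
import statistics

sample = [10, 15, 20, 25]

x_bar = statistics.fmean(sample)    # point estimate of the population mean
s2 = statistics.variance(sample)    # divisor n - 1, the estimator s^2
S2 = statistics.pvariance(sample)   # divisor n, the estimator S^2

tea_drinkers, children = 60, 100
p_hat = tea_drinkers / children     # point estimate of the population proportion

print("x_bar =", x_bar)
print("s^2 =", round(s2, 2), " S^2 =", S2)
print("p_hat =", p_hat)
```

Each printed number is a point estimate: a single value that would change if a different sample were drawn, which is why the rule that produces it (the estimator) is treated as a random variable.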
12.3.2 POINT ESTIMATION
Point estimation is a process of getting a single value from the sample as an
estimate of the unknown population parameter. A point estimate, in general, is not
equal to the population parameter. Point estimation is of great importance in
practical life. In our daily life it is quite common that we make use of the point
estimates without referring to the term statistical inference. The percentage of
people living in rented houses is a problem of point estimation. The percentage of
bottles properly filled is provided by a point estimate. The percentage of babies who
are born with physical defects and the percentage of children who do not get
admission in the schools are the areas of point estimation. A serious drawback in
point estimation is that the amount of error cannot be calculated in point estimation.
12.3.3 UNBIASEDNESS
When a large number of random samples of a given size are taken and the value of the estimator is calculated for each sample, the average of these values may be equal to the population parameter. An estimator having this property is called unbiased estimator. In other words, the estimator $\hat{\theta}$ is called unbiased estimator of the parameter $\theta$ if $E(\hat{\theta}) = \theta$. The estimator is a statistic with a probability distribution which is called the sampling distribution of the statistic. An estimator is called an unbiased estimator of the population parameter if the mean of its sampling distribution is equal to the parameter. The sample mean $\bar{X}$ is an unbiased estimator of the population mean $\mu$. It means $E(\bar{X}) = \mu$. The sample proportion $\hat{p}$ is also unbiased estimator of the population proportion $p$ because $E(\hat{p}) = p$. The sample variance $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$ is unbiased estimator of the population variance $\sigma^2$ but $S^2 = \frac{\sum (X - \bar{X})^2}{n}$ is a biased estimator of $\sigma^2$ and the bias is equal to the difference $E(S^2) - \sigma^2$.
Unbiasedness is one of the important properties of good point estimators. Other properties of good point estimators are consistency, efficiency and sufficiency. These properties will not be discussed in this book.
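The idea of unbiasedness can be checked by a small simulation. The sketch below is only an illustration under assumed values (a normal population with mean 50 and standard deviation 10, samples of size 5, 20000 repetitions); none of these numbers comes from the text. It averages the two variance estimators, one with divisor n - 1 and one with divisor n, over many samples.

```python
import random

def sample_variances(sample):
    """Return (s2, S2): sum of squares divided by n - 1 and by n."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)
    return ss / (n - 1), ss / n

def average_variances(mu=50.0, sigma=10.0, n=5, repeats=20000, seed=1):
    """Average both estimators over many random samples (assumed settings)."""
    rng = random.Random(seed)
    total_s2 = 0.0
    total_S2 = 0.0
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        s2, S2 = sample_variances(sample)
        total_s2 += s2
        total_S2 += S2
    return total_s2 / repeats, total_S2 / repeats

avg_s2, avg_S2 = average_variances()
# The population variance is sigma**2 = 100 in this assumed setting.
print("average of s^2 (divisor n-1):", round(avg_s2, 1))
print("average of S^2 (divisor n):  ", round(avg_S2, 1))
```

The average of the divisor n - 1 estimator should settle near 100, while the divisor n estimator settles near (n - 1)/n times that value, which is the bias described above.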
12.3.4 IMPORTANCE OF UNBIASEDNESS
Unbiasedness plays a great role in statistical inference. The next chapter is about the testing of hypothesis wherein we shall learn that the hypothesis about the population parameter is in fact the hypothesis about the sampling distribution of the estimator of that parameter. If we infer that the mean of the sampling distribution of $\bar{X}$ is, say, 150, it means that the mean of the population is also 150. This is due to the unbiasedness of $\bar{X}$. If $\bar{X}$ were a biased estimator, some very important tests of hypotheses about $\mu$ would not have been possible.
12.4 INTERVAL ESTIMATION
If a random interval is calculated so that it contains the unknown parameter with a known probability, then the interval is called confidence interval estimate or simply confidence interval for the parameter. The process of finding such intervals is called interval estimation. The interval estimation has gained a lot of importance in statistical inference. It is based on random sampling. Thus the confidence interval constructed is a random term because it is based on the sample data.
A drawback in point estimation is that it does not provide the estimate of error. No assurance is attached to the point estimate. Point estimate is a single value and it is wrong to think that a single value will be equal to the value of the unknown parameter. The interval estimate has some assurance of containing the population parameter. Confidence interval tells us with a known degree of confidence as to where the population parameter actually lies.
12.4.1 CONFIDENCE COEFFICIENT
The probability attached to the confidence interval is called confidence coefficient or level of confidence. It is denoted by $1 - \alpha$. If $\alpha$ is specified as 0.05, then $1 - \alpha = 1 - 0.05 = 0.95$ or 95 %. We can speak of confidence coefficient in terms of unity or in terms of percentage. The confidence coefficients which are commonly used are 90 %, 95 %, 98 % and 99 %.
To make a confidence interval estimate of the parameter $\theta$, we adopt the following procedure.
(i) We take a random sample of size n with observations $X_1, X_2, X_3, \ldots, X_n$ from a population with unknown parameter $\theta$.
(ii) The point estimator of $\theta$ is decided. Let it be $\hat{\theta}$. The point estimate, denoted by $\hat{\theta}_0$, is calculated from the sample.
(iii) The confidence coefficient is decided. Let it be $(1 - \alpha)$.
(iv) Let the interval be denoted by (L, U) where L is the lower limit and U is the upper limit.
(v) A certain procedure of calculating L and U is adopted so that the probability is $(1 - \alpha)$ that the interval (L, U) contains the parameter $\theta$. In symbols, we may write $P(L < \theta < U) = 1 - \alpha$ where $\alpha$ lies between 0 and 1. The interval (L, U) is called the $100(1 - \alpha)$ % confidence interval, and L and U are called the lower and upper confidence limits of the parameter $\theta$. L and U are random terms based on the sample data.
The lower limit L and the upper limit U are calculated from the point estimate $\hat{\theta}_0$. Thus, as a general rule,

$$L = \hat{\theta}_0 - k\,(\text{Standard error of } \hat{\theta}) \quad \text{and} \quad U = \hat{\theta}_0 + k\,(\text{Standard error of } \hat{\theta})$$


where k depends upon the shape of the sampling distribution of $\hat{\theta}$ and the confidence coefficient $1 - \alpha$. The estimator $\hat{\theta}$ may be the sample mean $\bar{X}$, the sample proportion $\hat{p}$, the difference between means $\bar{X}_1 - \bar{X}_2$ or the difference between proportions $\hat{p}_1 - \hat{p}_2$.
12.5.1 SELECTION OF PROPER CONFIDENCE INTERVAL
For making the confidence interval estimate for some parameter, we have to use the appropriate formula. Some intervals are based on the normal distribution and some are based on the t-distribution. It is in fact the sampling distribution of the statistic which decides the formula. If the sampling distribution of the statistic is a normal distribution, then the standard normal variate Z is used in the interval, and if the distribution is 't', then the random variable 't' is used in the formula. It is important to note that it is the sampling distribution which decides the proper formula. It is not the parent population which decides the interval, though the shape of the population distribution also plays its role in determining the proper interval.
We have to examine the following points for making the confidence interval for the population mean $\mu$.
(i) Parent Population
What is the shape of the population which is sampled? Is it normal, approximately normal for practical purposes or known to be non-normal?
(ii) Sample Size
The sample size n plays an important role in the statistical inference about $\mu$. When n > 30, it is called large sample size. According to the Central Limit Theorem, the sampling distribution of $\bar{X}$ tends to normality by increasing the sample size.
(iii) $\sigma$ is Known or Unknown
If $\sigma$ is known, the distribution of $\bar{X}$ can be assumed normal even if $n \le 30$ provided the population is normal. If $\sigma$ is unknown and $n \le 30$, the distribution of $\bar{X}$ is not assumed to be normal.
t - DISTRIBUTION
The sampling distribution of $\bar{X}$ forms t-distribution under the following conditions:
(i) The simple random sample of small size is drawn from a normal population with mean $\mu$. This assumption is very important for any inference about $\bar{X}$.
(ii) The sample $X_1, X_2, X_3, \ldots, X_n$ is selected at random.
(iii) If there are two populations under consideration, both are normal with equal variances.
If $X_1, X_2, X_3, \ldots, X_n$ is a random sample of size n from a normal population with mean $\mu$ and variance $\sigma^2$ and $\bar{X}$ is the sample mean, then the random variable 't' is defined as

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

where s is the sample standard deviation defined by $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$. The random variable 't' forms the t-distribution with (n - 1) degrees of freedom (d.f.).
The t-distribution is symmetrical about its mean zero like the normal distribution. The shape of the t-distribution changes by increasing the sample size. When the sample size is sufficiently large, the t-distribution tends to the normal distribution.
Tables are available from which we read the t-values for given values of $\alpha$. If $\alpha = 0.05$ and degrees of freedom is 9, then from the t-table, we read under column 0.05 and against 9 degrees of freedom. We get 1.833. It is written as $t_{0.05}(9) = 1.833$. Similarly $t_{0.025}(8) = 2.306$.
[Figure 12.1: t-distribution curve centred at t = 0 with the points $-t_{\alpha/2}(\text{d.f.})$ and $t_{\alpha/2}(\text{d.f.})$ marked]
12.6 CONFIDENCE INTERVAL ESTIMATE OF POPULATION MEAN $\mu$ (LARGE SAMPLE)
$\sigma$ Known
Let us consider a population (normal or non-normal) with mean $\mu$ which is unknown and standard deviation $\sigma$ which is assumed to be known. A simple random sample of size n is selected from the population and the sample mean $\bar{X}$ is calculated. When the sample size is large (n > 30), the sampling distribution of $\bar{X}$ is a normal distribution with mean $\mu_{\bar{X}} = \mu$ and standard error $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$. It is assumed here that the population is infinite or it is very large. The population may or may not be normal. The normal distribution of $\bar{X}$ is shown below:
The random variable $\bar{X}$ can be transformed into standard normal variable Z where $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$. The random variable Z can take any value between $-\infty$ to $+\infty$. Let us mark two points $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ on Z-scale, where $\alpha$ lies between 0 and 1. $-Z_{\alpha/2}$ is a
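point on the left of which the area under the normal distribution of Z is $\frac{\alpha}{2}$ and $Z_{\alpha/2}$ cuts off an area $\frac{\alpha}{2}$ to the right. Thus the area of the normal curve between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ is $1 - \alpha$, the total area under the normal curve being unity. Out of all possible values of Z, $100(1 - \alpha)$ % of the values occupy the space marked $1 - \alpha$. Thus the probability is $(1 - \alpha)$ that the random variable Z will have a value between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$. This probability statement can be written in symbols as $P[-Z_{\alpha/2} < Z < Z_{\alpha/2}] = 1 - \alpha$. Putting $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, we get

$$P\left[-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right] = 1 - \alpha$$

Without proof, we write the confidence interval for $\mu$ which is

$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

[Figure 12.2: standard normal curve with the points $Z = -Z_{\alpha/2}$, 0 and $Z = Z_{\alpha/2}$ marked]
In this $100(1 - \alpha)$ % confidence interval, the lower confidence limit is $L = \bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ and the upper confidence limit is $U = \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$. The interval can also be written as $\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.
For 95 % confidence interval for $\mu$, $\alpha = 0.05$, $\frac{\alpha}{2} = 0.025$ and $Z_{0.025} = 1.96$ (from area table of normal distribution).
[Figure 12.3: standard normal curve with area 0.4750 on each side of Z = 0, between Z = -1.96 and Z = 1.96]

$$\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$$

For 99 % confidence interval for $\mu$, $\alpha = 0.01$, $\frac{\alpha}{2} = 0.005$ and $Z_{0.005} = 2.58$ (from area table of normal distribution).

$$\bar{X} - 2.58\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 2.58\frac{\sigma}{\sqrt{n}}$$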
This interval is wider than the 95 % interval. A very wide interval, which gives the most probable confidence limits for $\mu$, is

$$\bar{X} - 3\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 3\frac{\sigma}{\sqrt{n}}$$

This interval is almost certain to contain the true value of $\mu$.
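The table values quoted above can be cross-checked numerically. The following is a small sketch using Python's standard library `statistics.NormalDist`; the helper names `z_value` and `level_for` are hypothetical and introduced only for illustration. It recovers the Z multipliers for the 95 % and 99 % intervals and shows what confidence level corresponds to the multiplier 3.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def z_value(level):
    """Z_{alpha/2} for a two-sided interval with confidence coefficient `level`."""
    alpha = 1 - level
    return std_normal.inv_cdf(1 - alpha / 2)

def level_for(z):
    """Area of the standard normal curve between -z and +z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

print(round(z_value(0.95), 2))   # multiplier for the 95 % interval
print(round(z_value(0.99), 2))   # multiplier for the 99 % interval
print(round(level_for(3), 4))    # confidence coefficient when Z = 3
```

With Z = 3 the area between the limits is about 0.9973, which is why the interval with multiplier 3 is described as almost certain to contain $\mu$.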
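When the population is finite or sampling is done without replacement, the standard error of $\bar{X}$ is $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$. When N is given, the confidence interval for $\mu$ would become

$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$$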

12.6.1 THE MEANING OF CONFIDENCE INTERVAL
Let us consider 95 % confidence interval for $\mu$. The interval has $L = \bar{X} - 1.96\frac{\sigma}{\sqrt{n}}$ and $U = \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$. The probability is 0.95 that the random interval (L, U) contains the unknown parameter $\mu$. This confidence interval means that if samples of size n were repeatedly taken from the population and if the interval $\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$ were calculated for each sample, 95 out of 100 such intervals would contain the population mean $\mu$. There are 5 % chances that we shall get an interval which will not cover the value of $\mu$. Suppose $\bar{X} = 80$, $\sigma = 24$ and n = 36. Let us calculate 95 % confidence interval. We get

$$\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} = 80 - 1.96\left(\frac{24}{\sqrt{36}}\right) = 80 - 7.84 = 72.16$$

$$\bar{X} + 1.96\frac{\sigma}{\sqrt{n}} = 80 + 1.96\left(\frac{24}{\sqrt{36}}\right) = 80 + 7.84 = 87.84$$

Thus 95 % confidence interval for $\mu$ is (72.16 to 87.84). We call it 95 % confidence interval for this sample. At this stage we cannot say that probability is 0.95 that the interval (72.16, 87.84) contains the value of $\mu$. Before tossing a die, we can say that probability is 1/6 that 4 will appear on the die. But when a die has been tossed and the face 4 has been observed or has not been observed, then we are not in a situation of probability. Now we cannot say that probability of getting 4 on the die is 1/6. If 2 % of the drivers violate the traffic rules, then the probability is 2/100 = 0.02 that a driver will violate the traffic rules. But when a driver has made a mistake, it has no concern with the probability of 0.02. In all situations, when something has happened or it has not happened, it is now ridiculous to talk about the probability of its happening. When Mr. A has died, we do not say that probability of his survival is, say, 0.90. When a confidence interval has been constructed from the sample data, it is now something which has happened or which has been determined. The different possibilities are not involved now. The calculated interval is now not a random variable. It is the realized value of the random interval.
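The repeated-sampling meaning of a 95 % confidence interval can be illustrated by simulation. The sketch below assumes, purely for illustration, a normal population with $\mu = 80$ and $\sigma = 24$ and samples of size n = 36; the function name `coverage` and the number of repetitions are likewise assumptions. It counts how often the random interval covers the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(mu=80.0, sigma=24.0, n=36, level=0.95, repeats=5000, seed=7):
    """Fraction of simulated intervals X-bar +/- z*sigma/sqrt(n) that contain mu."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / sqrt(n)          # same half-width for every sample
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / repeats

fraction_covering = coverage()
print(fraction_covering)  # should be close to 0.95
```

Each simulated interval either contains $\mu$ or it does not; the figure 0.95 describes the long-run proportion of intervals that do, not any single calculated interval.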
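Example 12.1.
(a) An electrical firm manufactures light bulbs that have a length of life with mean $\mu$ and a standard deviation of 40 hours. If a sample of 100 bulbs has an average life of 780 hours, find a 95 % confidence interval for the population mean of all bulbs produced by this firm.
(b) A random sample of size n = 400 is selected without replacement from a population of size N = 2000 with $\sigma = 4$, and the sample mean is found to be $\bar{X} = 80$. Construct a 90 % confidence interval for the true mean of the population.
Solution:
(a) A $100(1 - \alpha)$ % confidence interval for $\mu$ is

$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Here $\bar{X} = 780$, $\sigma = 40$, n = 100, $1 - \alpha = 0.95$ or $\alpha = 0.05$ and $\frac{\alpha}{2} = 0.025$
From the area table of normal distribution, we have $Z_{\alpha/2} = Z_{0.025} = 1.96$
Hence the 95 % confidence interval for $\mu$ is

$$780 - 1.96\left(\frac{40}{\sqrt{100}}\right) < \mu < 780 + 1.96\left(\frac{40}{\sqrt{100}}\right)$$

$$780 - 7.84 < \mu < 780 + 7.84$$

$$772.16 < \mu < 787.84$$

(b) A $100(1 - \alpha)$ % confidence interval for $\mu$ is

$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$$

Here n = 400, N = 2000, $\sigma = 4$, $\bar{X} = 80$, $1 - \alpha = 0.90$ or $\alpha = 0.10$ and $\frac{\alpha}{2} = 0.05$
From the area table of normal distribution, we have $Z_{\alpha/2} = Z_{0.05} = 1.645$
Hence the 90 % confidence interval for $\mu$ is

$$80 - 1.645\left(\frac{4}{\sqrt{400}}\right)\sqrt{\frac{2000 - 400}{2000 - 1}} < \mu < 80 + 1.645\left(\frac{4}{\sqrt{400}}\right)\sqrt{\frac{2000 - 400}{2000 - 1}}$$

$$80 - 0.294 < \mu < 80 + 0.294$$

$$79.706 < \mu < 80.294$$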
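Both parts of Example 12.1 can be reproduced with a few lines of code. This is a minimal sketch: the function name `z_interval` and its optional argument `N` (population size, used for the finite population correction) are illustrative choices, and the Z value is computed with `statistics.NormalDist` instead of being read from the area table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, level, N=None):
    """Confidence limits for the mean when sigma is known.

    If the population size N is supplied, the standard error is multiplied
    by the finite population correction sqrt((N - n) / (N - 1)).
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return xbar - z * se, xbar + z * se

part_a = z_interval(780, 40, 100, 0.95)          # Example 12.1 (a)
part_b = z_interval(80, 4, 400, 0.90, N=2000)    # Example 12.1 (b)
print([round(v, 2) for v in part_a])
print([round(v, 3) for v in part_b])
```

Small differences in the last decimal place from hand calculation can arise because the table value of Z is rounded.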

$\sigma$ Unknown
In the previous article it was assumed that $\sigma$ is known. In practical situations, $\sigma$ is usually not known. When $\sigma$ is not known, we can replace it by the sample standard deviation s. In this case the confidence interval for $\mu$ is

$$\bar{X} - Z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{s}{\sqrt{n}}$$

It is important to note that this interval estimate can be used only when n is large. The population may or may not be normal. When the population is finite, the interval for $\mu$ would become

$$\bar{X} - Z_{\alpha/2}\frac{s}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} < \mu < \bar{X} + Z_{\alpha/2}\frac{s}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$$

This interval can be calculated when N is given.

Example 12.2.
The heights of a random sample of 50 college students showed a mean of 174.5 centimeters and a standard deviation of 6.9 centimeters. Construct a 98 % confidence interval for the mean height of all college students.
Solution:
A $100(1 - \alpha)$ % confidence interval for $\mu$ is

$$\bar{X} - Z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{s}{\sqrt{n}}$$

Here $\bar{X} = 174.5$, s = 6.9, n = 50, $1 - \alpha = 0.98$ or $\alpha = 0.02$ and $\frac{\alpha}{2} = 0.01$
From the area table of normal distribution, we have $Z_{\alpha/2} = Z_{0.01} = 2.326$
Hence the 98 % confidence interval for $\mu$ is

$$174.5 - 2.326\left(\frac{6.9}{\sqrt{50}}\right) < \mu < 174.5 + 2.326\left(\frac{6.9}{\sqrt{50}}\right)$$

$$174.5 - 2.27 < \mu < 174.5 + 2.27$$

$$172.23 < \mu < 176.77$$
Example 12.3.
The systolic blood pressure of 90 men has a mean of 128.9 mm of mercury and a standard deviation of 17 mm of mercury. Assuming that these are a random sample of blood pressures, calculate a 99 % confidence interval for the mean blood pressure in the population.
Solution:
A $100(1 - \alpha)$ % confidence interval for $\mu$ is

$$\bar{X} - Z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{s}{\sqrt{n}}$$

Here $\bar{X} = 128.9$, s = 17, n = 90, $1 - \alpha = 0.99$ or $\alpha = 0.01$ and $\frac{\alpha}{2} = 0.005$
From the area table of normal distribution, we have $Z_{\alpha/2} = Z_{0.005} = 2.575$
Hence the 99 % confidence interval for $\mu$ is

$$128.9 - 2.575\left(\frac{17}{\sqrt{90}}\right) < \mu < 128.9 + 2.575\left(\frac{17}{\sqrt{90}}\right)$$

$$128.9 - 4.61 < \mu < 128.9 + 4.61$$

$$124.29 < \mu < 133.51$$
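As a check on Examples 12.2 and 12.3, the large-sample interval with the sample standard deviation s in place of $\sigma$ can be sketched as below. The function name `large_sample_interval` is hypothetical, and the Z multiplier is computed rather than taken from the table, so the limits may differ from the hand results in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_interval(xbar, s, n, level):
    """X-bar +/- Z * s / sqrt(n); intended only for large n (n > 30)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * s / sqrt(n)
    return xbar - margin, xbar + margin

heights = large_sample_interval(174.5, 6.9, 50, 0.98)   # Example 12.2
pressure = large_sample_interval(128.9, 17, 90, 0.99)   # Example 12.3
print([round(v, 2) for v in heights])
print([round(v, 2) for v in pressure])
```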

12.7 CONFIDENCE INTERVAL ESTIMATE FOR POPULATION MEAN
$\sigma$ Known, Population Normal
Here we are stressing that the population is normal. If n is small and $\sigma$ is known, then the random variable Z can be used in the interval if the population is normal. The confidence interval for $\mu$ is the same as for the large sample. For the convenience of students, the $100(1 - \alpha)$ % confidence interval for $\mu$ is written here, i.e.,

$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

$\sigma$ Unknown, Population Normal
This is an important case in which the random variable Z cannot be used. When n is small and the population is normal with unknown $\sigma$, the random variable

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

has the t-distribution with (n - 1) degrees of freedom. Let us mark two points $-t_{\alpha/2}(n-1)$ and $t_{\alpha/2}(n-1)$ on the t-scale in the figure. Using tables of the t-distribution, we can find $-t_{\alpha/2}(n-1)$ and $t_{\alpha/2}(n-1)$. The area of the t-distribution between $-t_{\alpha/2}(n-1)$ and $t_{\alpha/2}(n-1)$ is $(1 - \alpha)$.
[Figure 12.4: t-distribution curve with the points $-t_{\alpha/2}(n-1)$, t = 0 and $t_{\alpha/2}(n-1)$ marked]
Thus the probability is $(1 - \alpha)$ that the random variable 't' will fall between $-t_{\alpha/2}(n-1)$ and $t_{\alpha/2}(n-1)$. We can write the probability statement as:

$$P\left[-t_{\alpha/2}(n-1) < t < t_{\alpha/2}(n-1)\right] = 1 - \alpha$$

Putting the value of t we have

$$P\left[-t_{\alpha/2}(n-1) < \frac{\bar{X} - \mu}{s/\sqrt{n}} < t_{\alpha/2}(n-1)\right] = 1 - \alpha$$
From this inequality within the brackets we can get the confidence interval for $\mu$. Thus $100(1 - \alpha)$ % confidence interval for $\mu$ is

$$\bar{X} - t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}}$$

Table 12.1 can be used to decide whether the random variable 'Z' or 't' is to be used in making the confidence interval for $\mu$.
Table 12.1.
                     n - Large        n - Small
$\sigma$ known       Z                Z (normal population)
$\sigma$ unknown     Z                t (normal population)

Example 12.4.
A random sample of n = 20 from a normal population gives the sample mean 140 and the sample standard deviation, s = 8. Construct a 98 % confidence interval for the population mean.
Solution:
Here n is small, therefore the confidence interval is based on t-distribution and a $100(1 - \alpha)$ % confidence interval for $\mu$ is

$$\bar{X} - t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}}$$

Here $\bar{X} = 140$, s = 8, n = 20, $1 - \alpha = 0.98$ or $\alpha = 0.02$ and $\frac{\alpha}{2} = 0.01$
From the t-table, we have $t_{\alpha/2}(n-1) = t_{0.01}(19) = 2.539$
Hence the 98 % confidence interval for $\mu$ is

$$140 - 2.539\left(\frac{8}{\sqrt{20}}\right) < \mu < 140 + 2.539\left(\frac{8}{\sqrt{20}}\right)$$

$$140 - 4.54 < \mu < 140 + 4.54$$

$$135.46 < \mu < 144.54$$
Example 12.5.
A machine is producing metal pieces that are cylindrical in shape. A sample of pieces is taken and their diameters are 1.01, 0.97, 1.03, 1.04, 0.99, 0.98, 0.99, 1.01 and 1.03 centimeters. Find a 99 % confidence interval for the mean diameter of pieces produced by this machine, assuming an approximate normal population.
Solution:
A $100(1 - \alpha)$ % confidence interval for $\mu$ is

$$\bar{X} - t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}}$$

Here $\sum X = 9.05$, $\sum X^2 = 9.1051$, n = 9, $\bar{X} = \frac{\sum X}{n} = \frac{9.05}{9} = 1.0056$,

$$s^2 = \frac{1}{n-1}\left[\sum X^2 - \frac{(\sum X)^2}{n}\right] = \frac{1}{8}\left[9.1051 - \frac{(9.05)^2}{9}\right] = 0.0006, \quad s = 0.0245,$$

$1 - \alpha = 0.99$ or $\alpha = 0.01$ and $\frac{\alpha}{2} = 0.005$
From the t-table, we have $t_{\alpha/2}(n-1) = t_{0.005}(8) = 3.355$
Hence the 99 % confidence interval for $\mu$ is

$$1.0056 - 3.355\left(\frac{0.0245}{\sqrt{9}}\right) < \mu < 1.0056 + 3.355\left(\frac{0.0245}{\sqrt{9}}\right)$$

$$1.0056 - 0.0274 < \mu < 1.0056 + 0.0274$$

$$0.9782 < \mu < 1.0330$$
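The small-sample calculation of Example 12.5 can be traced step by step in code. The sketch below makes two assumptions that should be read with care: the nine diameters are taken to be the values consistent with $\sum X = 9.05$ and $\sum X^2 = 9.1051$, and the t multiplier 3.355 is entered as a constant from the t-table because the Python standard library has no t-distribution. The function name `t_interval` is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Assumed data: nine diameters (cm) consistent with the sums in the solution.
diameters = [1.01, 0.97, 1.03, 1.04, 0.99, 0.98, 0.99, 1.01, 1.03]

T_0005_8DF = 3.355  # t-table value for alpha/2 = 0.005 with 8 degrees of freedom

def t_interval(data, t_value):
    """X-bar +/- t * s / sqrt(n), with s computed using divisor n - 1."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)              # sample standard deviation (divisor n - 1)
    margin = t_value * s / sqrt(n)
    return xbar - margin, xbar + margin

lower, upper = t_interval(diameters, T_0005_8DF)
print(round(lower, 3), round(upper, 3))
```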
12.8 CONFIDENCE INTERVAL ESTIMATE FOR THE DIFFERENCE BETWEEN TWO POPULATION MEANS (LARGE SAMPLES)
$\sigma_1^2$ and $\sigma_2^2$ Known
Consider two large populations with means $\mu_1$ and $\mu_2$ which are unknown and variances $\sigma_1^2$ and $\sigma_2^2$ which are assumed to be known. The populations may or may not be normal. Two independent random samples of sizes $n_1$ and $n_2$ are selected from the populations and sample means $\bar{X}_1$ and $\bar{X}_2$ are calculated. The point estimator of the difference between $\mu_1$ and $\mu_2$ is given by the statistic $\bar{X}_1 - \bar{X}_2$. The statistic $\bar{X}_1 - \bar{X}_2$ is an unbiased estimator of $\mu_1 - \mu_2$ and has the normal distribution with mean $\mu_1 - \mu_2$ and standard error $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$. The standard normal variable of $(\bar{X}_1 - \bar{X}_2)$ is

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

The probability is $(1 - \alpha)$ that the value of random variable Z will fall between two selected points $-Z_{\alpha/2}$ and $Z_{\alpha/2}$. We can write the probability statement as

$$P\left[-Z_{\alpha/2} < Z < Z_{\alpha/2}\right] = 1 - \alpha$$

or

$$P\left[-Z_{\alpha/2} < \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} < Z_{\alpha/2}\right] = 1 - \alpha$$

We can simplify this inequality to get the $100(1 - \alpha)$ % confidence interval for $\mu_1 - \mu_2$ which is

$$(\bar{X}_1 - \bar{X}_2) - Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

This interval estimate can be written as

$$(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

If we want to get the confidence interval of $\mu_2 - \mu_1$, we shall use the interval in which the difference $(\bar{X}_2 - \bar{X}_1)$ is used. Thus the confidence interval for $\mu_2 - \mu_1$ is

$$(\bar{X}_2 - \bar{X}_1) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

In the numerical questions, the values of $\bar{X}_1$ and $\bar{X}_2$ are usually positive but the difference $(\bar{X}_1 - \bar{X}_2)$ or $(\bar{X}_2 - \bar{X}_1)$ may be positive or negative. The confidence limits of $(\mu_1 - \mu_2)$ or $(\mu_2 - \mu_1)$ are sometimes negative.
4
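Example 12.6.
A random sample of size $n_1 = 25$ taken from a normal population with a standard deviation $\sigma_1 = 5$ has a mean $\bar{X}_1 = 80$. A second random sample of size $n_2 = 36$, taken from a different normal population with a standard deviation $\sigma_2 = 3$, has a mean $\bar{X}_2 = 75$. Find a 94 % confidence interval for $\mu_1 - \mu_2$.
Solution:
A $100(1 - \alpha)$ % confidence interval for $\mu_1 - \mu_2$ is

$$(\bar{X}_1 - \bar{X}_2) - Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Here $1 - \alpha = 0.94$ or $\alpha = 0.06$ and $\frac{\alpha}{2} = 0.03$
From the area table of normal distribution, we have $Z_{\alpha/2} = Z_{0.03} = 1.88$
Hence the 94 % confidence interval for $\mu_1 - \mu_2$ is

$$(80 - 75) - 1.88\sqrt{\frac{25}{25} + \frac{9}{36}} < \mu_1 - \mu_2 < (80 - 75) + 1.88\sqrt{\frac{25}{25} + \frac{9}{36}}$$

$$5 - 2.1 < \mu_1 - \mu_2 < 5 + 2.1$$

$$2.9 < \mu_1 - \mu_2 < 7.1$$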
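The two-sample interval of Example 12.6 can be sketched in code as follows. The function name `two_mean_z_interval` and its argument names are hypothetical; the function takes the two variances directly, so the same sketch would also serve, as an approximation, when large-sample variances $s_1^2$ and $s_2^2$ are substituted for the population variances.

```python
from math import sqrt
from statistics import NormalDist

def two_mean_z_interval(xbar1, var1, n1, xbar2, var2, n2, level):
    """(X1-bar - X2-bar) +/- Z * sqrt(var1/n1 + var2/n2)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sqrt(var1 / n1 + var2 / n2)     # standard error of the difference
    diff = xbar1 - xbar2
    return diff - z * se, diff + z * se

# Example 12.6: sigma1 = 5, sigma2 = 3, so the variances are 25 and 9.
low, high = two_mean_z_interval(80, 5 ** 2, 25, 75, 3 ** 2, 36, 0.94)
print(round(low, 1), round(high, 1))
```

Swapping the order of the two samples gives the interval for $\mu_2 - \mu_1$, whose limits are the negatives of the limits shown here.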
$\sigma_1^2$ and $\sigma_2^2$ Unknown
When the population variances $\sigma_1^2$ and $\sigma_2^2$ are not given, they are estimated by the sample variances $s_1^2$ and $s_2^2$. The populations may or may not be normal. The confidence interval for $(\mu_1 - \mu_2)$ becomes

$$(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Example 12.7.
Construct a 95 % confidence interval for the true difference between the average time between breakdowns of two kinds of devices, given that a random sample of 40 devices of type A on the average lasted 208 hours of continuous use between breakdowns with a standard deviation of 26 hours, and that a random sample of 50 devices of type B lasted on the average 192 hours with a standard deviation of 22 hours.
Solution:
A $100(1 - \alpha)$ % confidence interval for $\mu_1 - \mu_2$ is

$$(\bar{X}_1 - \bar{X}_2) - Z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + Z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Here $n_1 = 40$, $\bar{X}_1 = 208$, $s_1 = 26$, $s_1^2 = 676$,
$n_2 = 50$, $\bar{X}_2 = 192$, $s_2 = 22$, $s_2^2 = 484$,
$1 - \alpha = 0.95$ or $\alpha = 0.05$ and $\frac{\alpha}{2} = 0.025$
From the area table of normal distribution, we have $Z_{\alpha/2} = Z_{0.025} = 1.96$
Hence the 95 % confidence interval for $\mu_1 - \mu_2$ is

$$(208 - 192) - 1.96\sqrt{\frac{676}{40} + \frac{484}{50}} < \mu_1 - \mu_2 < (208 - 192) + 1.96\sqrt{\frac{676}{40} + \frac{484}{50}}$$

$$16 - 10.1 < \mu_1 - \mu_2 < 16 + 10.1$$

$$5.9 < \mu_1 - \mu_2 < 26.1$$


12.9 CONFIDENCE INTERVAL ESTIMATE FOR THE DIFFERENCE BETWEEN TWO POPULATION MEANS - POPULATIONS NORMAL (SMALL SAMPLES)
$\sigma_1^2$ and $\sigma_2^2$ Known
When the populations are normal and their variances are known, the formula for confidence interval for $(\mu_1 - \mu_2)$ for small samples is the same as for large samples. To keep the different confidence intervals in their proper order, the interval is highlighted in this section and is written here again. Thus $100(1 - \alpha)$ % confidence interval for $(\mu_1 - \mu_2)$ is

$$(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
$\sigma_1^2$ and $\sigma_2^2$ Unknown
When $\sigma_1^2$ and $\sigma_2^2$ are not known, the random variable Z cannot be used. For independent random samples from normal populations, with small sample sizes, the statistic $(\bar{X}_1 - \bar{X}_2)$ has the t-distribution with $(n_1 + n_2 - 2)$ degrees of freedom. But we have to make another assumption that the variances of the populations are equal i.e., $\sigma_1^2 = \sigma_2^2 = \sigma^2$ (say). The population variance $\sigma^2$ which is common for both the populations can be estimated by a pooled estimator $s_p^2$ where

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2}$$

$s_1^2$ and $s_2^2$ are the unbiased sample variances. Thus, if independent samples of small sizes are drawn from the normal populations with $\sigma_1^2 = \sigma_2^2$, the statistic $(\bar{X}_1 - \bar{X}_2)$ has the t-distribution with $(n_1 + n_2 - 2)$ degrees of freedom, where

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

[Figure 12.5: t-distribution curve with the points $-t_{\alpha/2}(\text{d.f.})$, t = 0 and $t_{\alpha/2}(\text{d.f.})$ marked]
When $n_1 = n_2 = n$, we can write

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{2}{n}}}$$

The probability is $(1 - \alpha)$ that the random variable t will fall between $-t_{\alpha/2}(n_1 + n_2 - 2)$ and $t_{\alpha/2}(n_1 + n_2 - 2)$. The probability statement for the random variable t is:

$$P\left[-t_{\alpha/2}(n_1 + n_2 - 2) < t < t_{\alpha/2}(n_1 + n_2 - 2)\right] = 1 - \alpha$$
Now, we directly write the $100(1 - \alpha)$ % confidence interval for $(\mu_1 - \mu_2)$ which is

$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}(n_1 + n_2 - 2)\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Example 12.8.
The following summary statistics are recorded about the strength of two types of synthetic rubber.
Type I     $n_1 = 16$    $\bar{X}_1 = 15.3$    $s_1 = 4.4$
Type II    $n_2 = 9$     $\bar{X}_2 = 13.8$    $s_2 = 3.9$
Assume that the distributions of strengths for the two types of rubber are normal with equal variances. Compute a 99 % confidence interval for the difference $\mu_1 - \mu_2$.
Solution:
A $100(1 - \alpha)$ % confidence interval for $\mu_1 - \mu_2$ is

$$(\bar{X}_1 - \bar{X}_2) - t_{\alpha/2}(\nu)\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + t_{\alpha/2}(\nu)\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Here $n_1 = 16$, $\bar{X}_1 = 15.3$, $s_1 = 4.4$, $s_1^2 = 19.36$,
$n_2 = 9$, $\bar{X}_2 = 13.8$, $s_2 = 3.9$, $s_2^2 = 15.21$

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(16 - 1)(19.36) + (9 - 1)(15.21)}{16 + 9 - 2} = \frac{290.4 + 121.68}{23} = \frac{412.08}{23} = 17.916, \quad s_p = \sqrt{17.916} = 4.233$$

$1 - \alpha = 0.99$ or $\alpha = 0.01$ and $\frac{\alpha}{2} = 0.005$, $\nu = n_1 + n_2 - 2 = 16 + 9 - 2 = 23$
From the t-table, we have $t_{\alpha/2}(\nu) = t_{0.005}(23) = 2.807$
Hence the 99 % confidence interval for $\mu_1 - \mu_2$ is

$$(15.3 - 13.8) - (2.807)(4.233)\sqrt{\frac{1}{16} + \frac{1}{9}} < \mu_1 - \mu_2 < (15.3 - 13.8) + (2.807)(4.233)\sqrt{\frac{1}{16} + \frac{1}{9}}$$

$$1.5 - 4.95 < \mu_1 - \mu_2 < 1.5 + 4.95$$

$$-3.45 < \mu_1 - \mu_2 < 6.45$$
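The pooled-variance calculation of Example 12.8 can be followed in code. In this sketch the function names `pooled_variance` and `pooled_t_interval` are hypothetical, and the t multiplier 2.807 (23 degrees of freedom, $\alpha/2 = 0.005$) is entered as a table constant because the Python standard library does not provide the t-distribution.

```python
from math import sqrt

def pooled_variance(s1, n1, s2, n2):
    """s_p^2 = [(n1-1)s1^2 + (n2-1)s2^2] / (n1 + n2 - 2)."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)

def pooled_t_interval(xbar1, s1, n1, xbar2, s2, n2, t_value):
    """(X1-bar - X2-bar) +/- t * s_p * sqrt(1/n1 + 1/n2), equal variances assumed."""
    sp = sqrt(pooled_variance(s1, n1, s2, n2))
    margin = t_value * sp * sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

T_0005_23DF = 2.807  # t-table value for alpha/2 = 0.005 with 23 degrees of freedom

sp2 = pooled_variance(4.4, 16, 3.9, 9)
lo_limit, hi_limit = pooled_t_interval(15.3, 4.4, 16, 13.8, 3.9, 9, T_0005_23DF)
print(round(sp2, 3), round(lo_limit, 2), round(hi_limit, 2))
```

Because the interval contains zero, these data do not rule out equal mean strengths for the two types of rubber at the 99 % level.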



12.10 CONFIDENCE INTERVAL FOR THE DIFFERENCE BETWEEN TWO POPULATION MEANS - DEPENDENT SAMPLES (PAIRED OBSERVATIONS)
Suppose we give a test to a sample of students and the marks obtained by them are denoted by X where X takes the values $X_1, X_2, X_3, \ldots, X_n$. The students are given some extra coaching and again they are given the test of the same difficulty and the marks obtained by them are denoted by Y where Y takes the values $Y_1, Y_2, Y_3, \ldots, Y_n$. The marks obtained in the first test are called 'before' and the marks obtained in the second test are called 'after' observations. These two sets of marks are in pairs like $(X_1, Y_1), (X_2, Y_2), (X_3, Y_3), \ldots, (X_n, Y_n)$ and are called paired observations. Obviously the Y values depend upon the X values, hence the term 'samples are dependent'.
Let us write the paired observations in the following and calculate the difference $d_i$ for each pair.
$X_i$      $Y_i$      $d_i = X_i - Y_i$
$X_1$      $Y_1$      $X_1 - Y_1 = d_1$
$X_2$      $Y_2$      $X_2 - Y_2 = d_2$
$X_3$      $Y_3$      $X_3 - Y_3 = d_3$
...        ...        ...
$X_n$      $Y_n$      $X_n - Y_n = d_n$
The mean of 'd' values is denoted by $\bar{d}$ where $\bar{d} = \sum d_i / n$. We can think of a population of $X_i$ and $Y_i$ observations with means $\mu_1$ and $\mu_2$, and the population of random differences $d_i$ with mean $\mu_d$ and standard deviation $\sigma_d$. It is required to calculate the confidence interval for the mean $\mu_d$. The distribution of $\bar{d}$ leads to the t-distribution with (n - 1) degrees of freedom; $\bar{d}$ has mean $\mu_d$ and standard error $\frac{\sigma_d}{\sqrt{n}}$. The random variable $\bar{d}$ can be transformed into the random variable t where $t = \frac{\bar{d} - \mu_d}{\sigma_d/\sqrt{n}}$.
The standard deviation $\sigma_d$ is unknown and is replaced by its sample estimate $s_d$ where $s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$. Thus $t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}}$. The random variable t lies between $-t_{\alpha/2}(n-1)$ and $t_{\alpha/2}(n-1)$ with a probability of $(1 - \alpha)$. We can write the probability statement

$$P\left[-t_{\alpha/2}(n-1) < t < t_{\alpha/2}(n-1)\right] = 1 - \alpha$$

or

$$P\left[-t_{\alpha/2}(n-1) < \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}} < t_{\alpha/2}(n-1)\right] = 1 - \alpha$$
The terms within the brackets can be written as:

$$P\left[\bar{d} - t_{\alpha/2}(n-1)\frac{s_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{\alpha/2}(n-1)\frac{s_d}{\sqrt{n}}\right] = 1 - \alpha$$

Thus $100(1 - \alpha)$ % confidence interval for $\mu_d$ is

$$\bar{d} \pm t_{\alpha/2}(n-1)\frac{s_d}{\sqrt{n}}$$
Example 12.9.
The following data give paired yields of two varieties of wheat. Each pair was planted in a different locality.
Locality       1     2     3     4     5
Variety I     40    25    37    43    46
Variety II    47    27    33    40    52
Compute a 95 % confidence interval for the mean difference between the yields of the two varieties, assuming the differences of yields to be approximately normally distributed.
Solution:
A $100(1 - \alpha)$ % confidence interval for $\mu_d = \mu_1 - \mu_2$ is

$$\bar{d} - t_{\alpha/2}(n-1)\frac{s_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{\alpha/2}(n-1)\frac{s_d}{\sqrt{n}}$$

The necessary calculations are given below:
$X_1$                  40    25    37    43    46
$X_2$                  47    27    33    40    52
$d_i = X_1 - X_2$      -7    -2     4     3    -6    $\sum d_i = -8$
$d_i^2$                49     4    16     9    36    $\sum d_i^2 = 114$

$$\bar{d} = \frac{\sum d_i}{n} = \frac{-8}{5} = -1.6$$

$$s_d^2 = \frac{1}{n-1}\left[\sum d_i^2 - \frac{(\sum d_i)^2}{n}\right] = \frac{1}{4}\left[114 - \frac{(-8)^2}{5}\right] = \frac{1}{4}(101.2) = 25.3,$$

$s_d = 5.03$, $1 - \alpha = 0.95$ or $\alpha = 0.05$ and $\frac{\alpha}{2} = 0.025$
From the t-table, we have $t_{\alpha/2}(n-1) = t_{0.025}(4) = 2.776$
Hence the 95 % confidence interval for $\mu_d = \mu_1 - \mu_2$ is

$$-1.6 - 2.776\left(\frac{5.03}{\sqrt{5}}\right) < \mu_d < -1.6 + 2.776\left(\frac{5.03}{\sqrt{5}}\right)$$

$$-1.6 - 6.24 < \mu_d < -1.6 + 6.24$$

$$-7.84 < \mu_d < 4.64$$
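The paired-difference method of Example 12.9 reduces to a one-sample t-interval on the differences, which the sketch below makes explicit. The yields are entered as read from the example (an assumption to verify against the printed table, chosen so that $\sum d_i = -8$ and $\sum d_i^2 = 114$), the name `paired_interval` is hypothetical, and the multiplier 2.776 is the t-table value for 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

variety_1 = [40, 25, 37, 43, 46]   # assumed reading of the yields, Variety I
variety_2 = [47, 27, 33, 40, 52]   # assumed reading of the yields, Variety II

T_0025_4DF = 2.776  # t-table value for alpha/2 = 0.025 with 4 degrees of freedom

def paired_interval(x, y, t_value):
    """d-bar +/- t * s_d / sqrt(n), where d_i = x_i - y_i."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)                  # divisor n - 1
    margin = t_value * s_d / sqrt(n)
    return d, d_bar - margin, d_bar + margin

differences, d_low, d_high = paired_interval(variety_1, variety_2, T_0025_4DF)
print(differences, round(d_low, 2), round(d_high, 2))
```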

12.11 PROPORTION
Suppose a population is divided into two groups. The observations in the first group are called 'successes' and the observations of the second group are called 'failures'. For example the people may be divided into literates and illiterates. The proportion of successes in the population is defined as

$$\text{Proportion of successes} = \frac{\text{number of successes in the population}}{\text{total number of observations in the population}}$$

This proportion is denoted by p. The proportion of 'failures' is denoted by q and q = 1 - p or q + p = 1. Let us see how q + p = 1. Let N denote the total number of observations in the population. We have,
N = number of successes + number of failures. Divide both sides by N:

$$\frac{N}{N} = \frac{\text{number of successes}}{N} + \frac{\text{number of failures}}{N}$$

$$1 = p + q$$

Suppose a random sample of size n is selected from the population. Let there be X successes in the sample. The ratio X/n is the sample proportion and is denoted by $\hat{p}$. Thus $\hat{p} = X/n$, where $\hat{p}$ is a random variable and X is also a random variable.
POINT ESTIMATE
The sample proportion $\hat{p}$ calculated from a sample is the point estimate of the population proportion p. The statistic $\hat{p}$ is unbiased estimator of p. Hence $E(\hat{p}) = p$.
12.12 CONFIDENCE INTERVAL ESTIMATE FOR POPULATION PROPORTION p (LARGE SAMPLE)
Suppose a population proportion is p which is unknown. A random sample of size n (n > 30) is selected from the population and the sample proportion $\hat{p}$ is calculated. The statistic $\hat{p}$ is the estimator of p. The distribution of $\hat{p}$ is approximately normal with mean $\mu_{\hat{p}} = p$ and standard error $\sqrt{\frac{pq}{n}}$. Thus the random variable $\hat{p}$ can be transformed into random variable Z, where $Z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}}$. When n is large, the terms p and q in the denominator can be replaced by their sample estimates $\hat{p}$ and $\hat{q}$. Thus

$$Z = \frac{\hat{p} - p}{\sqrt{\frac{\hat{p}\hat{q}}{n}}}$$
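We take two points on Z-scale. These are $-Z_{\alpha/2}$ and $Z_{\alpha/2}$. The area of the normal curve between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ is $(1 - \alpha)$. The random variable Z will fall between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ with a probability of $(1 - \alpha)$.
[Figure 12.6: standard normal curve with the points $-Z_{\alpha/2}$, Z = 0 and $Z_{\alpha/2}$ marked]
This statement can be expressed in the following form:

$$P\left[-Z_{\alpha/2} < Z < Z_{\alpha/2}\right] = 1 - \alpha \quad \text{or} \quad P\left[-Z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{\frac{\hat{p}\hat{q}}{n}}} < Z_{\alpha/2}\right] = 1 - \alpha$$

The terms within the brackets can be written as

$$P\left[\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right] = 1 - \alpha$$

Thus the limits of the $100(1 - \alpha)$ % confidence interval estimate for p are

$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \quad \text{and} \quad \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

For 95 % confidence interval we have $\alpha = 0.05$, $\alpha/2 = 0.025$ and $Z_{0.025} = 1.96$, $-Z_{0.025} = -1.96$. Thus the limits of the 95 % confidence interval for p are $\hat{p} - 1.96\sqrt{\frac{\hat{p}\hat{q}}{n}}$ and $\hat{p} + 1.96\sqrt{\frac{\hat{p}\hat{q}}{n}}$. For most probable confidence limits we take Z = 3.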
Example 12.10.
A random sample of 200 persons from a city was interviewed and 50 of them were found to be literate. Calculate a 90 % confidence interval for the proportion of literate persons in the city. Also calculate a confidence interval for the proportion of illiterate persons in the city.
Solution:
A $100(1 - \alpha)$ % confidence interval for p (literate persons) is

$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

Here n = 200, X = 50 (number of literate persons), $\hat{p} = \frac{X}{n} = \frac{50}{200} = 0.25$,
$\hat{q} = 1 - \hat{p} = 0.75$, $1 - \alpha = 0.90$ or $\alpha = 0.10$ and $\alpha/2 = 0.05$
From the area table of normal distribution, we have $Z_{\alpha/2} = Z_{0.05} = 1.645$
Hence the 90 % confidence interval for p (literate persons) is

$$0.25 - 1.645\sqrt{\frac{(0.25)(0.75)}{200}} < p < 0.25 + 1.645\sqrt{\frac{(0.25)(0.75)}{200}}$$

$$0.25 - 0.05 < p < 0.25 + 0.05$$

$$0.2 < p < 0.3$$

Also
A $100(1 - \alpha)$ % confidence interval for p (illiterate persons) is

$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

Here n = 200, X = 150 (number of illiterate persons),
$\hat{p} = \frac{X}{n} = \frac{150}{200} = 0.75$, $\hat{q} = 1 - \hat{p} = 0.25$
Hence the 90 % confidence interval for p (illiterate persons) is

$$0.75 - 1.645\sqrt{\frac{(0.75)(0.25)}{200}} < p < 0.75 + 1.645\sqrt{\frac{(0.75)(0.25)}{200}}$$

$$0.75 - 0.05 < p < 0.75 + 0.05$$

$$0.7 < p < 0.8$$
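The large-sample interval for a proportion in Example 12.10 can be sketched as follows. The function name `proportion_interval` is hypothetical, and Z is computed with `statistics.NormalDist` rather than read from the area table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, level):
    """p-hat +/- Z * sqrt(p-hat * q-hat / n); intended for large n."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

literate = proportion_interval(50, 200, 0.90)      # 50 literate out of 200
illiterate = proportion_interval(150, 200, 0.90)   # the remaining 150
print([round(v, 2) for v in literate])
print([round(v, 2) for v in illiterate])
```

The two intervals mirror each other: since the sample proportions add to 1 and share the same standard error, the limits for illiterate persons are 1 minus the limits for literate persons.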
12.13 CONFIDENCE INTERVAL ESTIMATE FOR THE DIFFERENCE BETWEEN TWO POPULATION PROPORTIONS (LARGE SAMPLES)
Suppose there are two populations having proportions $p_1$ and $p_2$ which are unknown. It is required to calculate the confidence interval for the difference $(p_1 - p_2)$. Two independent random samples of size $n_1$ and $n_2$ are selected from the populations and the sample proportions are calculated which are $\hat{p}_1$ and $\hat{p}_2$ respectively. The statistic $(\hat{p}_1 - \hat{p}_2)$ is the estimator of the parameter $(p_1 - p_2)$. When $n_1$ and $n_2$ are large, the random variable $(\hat{p}_1 - \hat{p}_2)$ has the normal distribution with mean $(p_1 - p_2)$ and standard error $\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$. The standard normal random variable Z is written as

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$$

The probability is $(1 - \alpha)$ that the random variable Z will take on a value between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$. This statement can be written as below:

$$P\left[-Z_{\alpha/2} < Z < Z_{\alpha/2}\right] = 1 - \alpha \quad \text{or} \quad P\left[-Z_{\alpha/2} < \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}} < Z_{\alpha/2}\right] = 1 - \alpha$$
Cha Statistical Inference Estimation 101
The terrns within the brnckcts can be written as:

u[,U,- fin - zi P rQr-;,=.t)r


PzQi
;; -Pz"(0,-0J+
Thris 100 (1 - a) % confidence interval estimate for (p, p,
- is

(ff, - $J - ,t\F.T< p - pc < (0,- 0, * ,;VT-Y


But the termB Pr, Qr, p, attd qr al'e for the populatione and are unknown. For
om
large sample sizes, they can he estirnaleel by their sample ostimatee which arc
ff,, ff,,

t . c
o
$, anrl Q, respucti'clv.'l'hus Lhc co,fidcnca lirnics for (p,- py) are:

p
,AN AA

s
* /P rQ r
'Ls'\ l.:- PzQ,r
Lower Limit (], - $r) - +

, ZtV? - Tg
-

o
f"-i-

l
A A

Upper Lirnir = (ii, -


b
6,rl
Example 12.11.
Consider two pain relieving drugs compared on two independent samples of 1000 individuals each. Suppose 750 of those individuals receiving drug I and 800 of those receiving drug II reported some pain relief. Construct a 90 % confidence interval for the difference between population proportions.
Solution:
A $100(1 - \alpha)$ % confidence interval for $p_1 - p_2$ is

$$(\hat{p}_1 - \hat{p}_2) - Z_{\alpha/2}\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}} < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + Z_{\alpha/2}\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$

Here $n_1 = 1000$, $X_1 = 750$, $\hat{p}_1 = X_1/n_1 = 750/1000 = 0.75$, $\hat{q}_1 = 1 - \hat{p}_1 = 0.25$,
$n_2 = 1000$, $X_2 = 800$, $\hat{p}_2 = X_2/n_2 = 800/1000 = 0.80$, $\hat{q}_2 = 1 - \hat{p}_2 = 0.20$,
$1 - \alpha = 0.90$ or $\alpha = 0.10$ and $\alpha/2 = 0.05$.
From the area table of the normal distribution, we have $Z_{\alpha/2} = Z_{0.05} = 1.645$.
Hence the 90 % confidence interval for $p_1 - p_2$ is

$$(0.75 - 0.80) - 1.645\sqrt{\frac{(0.75)(0.25)}{1000} + \frac{(0.80)(0.20)}{1000}} < p_1 - p_2 < (0.75 - 0.80) + 1.645\sqrt{\frac{(0.75)(0.25)}{1000} + \frac{(0.80)(0.20)}{1000}}$$

$$-0.05 - 0.03 < p_1 - p_2 < -0.05 + 0.03$$
$$-0.08 < p_1 - p_2 < -0.02$$
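As a cross-check on Example 12.11, the confidence limits for the difference between two proportions can be computed in a few lines of Python. This is only a sketch: the function name `two_proportion_ci` is hypothetical, and the critical value comes from the standard library instead of the printed area table.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence):
    """Large-sample confidence interval for p1 - p2 (independent samples)."""
    p1, p2 = x1 / n1, x2 / n2
    q1, q2 = 1 - p1, 1 - p2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error estimated from the sample proportions
    se = sqrt(p1 * q1 / n1 + p2 * q2 / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Drug I: 750 of 1000 report relief; drug II: 800 of 1000; 90 % confidence
low, high = two_proportion_ci(750, 1000, 800, 1000, 0.90)
print(round(low, 2), round(high, 2))
```

Both limits are negative, so at the 90 % level the interval does not contain zero. This agrees with the result $-0.08 < p_1 - p_2 < -0.02$.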
SHORT DEFINITIONS
Statistical Inference
The process by which decision makers reach conclusions about a population based on sample information collected from the population.
or
A statistical inference is a decision, estimate, prediction, or generalization about the population based on information contained in a sample.
Estimation
Estimation is the process by which we attempt to determine the value of a population parameter from sample information.
or
Estimation is a process by which we get information about an unknown population parameter by using sample values.
Estimate
An estimate is the numerical value calculated from sample data.
or
An estimate is the numerical value of the estimator.
Estimator
An estimator is a rule that tells how to calculate an estimate based on the measurements contained in a sample.
or
An estimator is a statistic that specifies how to use the sample data to estimate an unknown parameter of the population.
Point Estimate
A point estimate is a number giving an estimate of the population parameter, based on a sample.
or
A point estimate consists of a single sample statistic that is used to estimate the true population parameter.
Interval Estimate
An interval estimate is the range of values within which the value of the parameter is expected to lie.
or
An estimate expressed by a range of values within which the true value of the population parameter is believed to lie, is referred to as an interval estimate.
Error of Estimation
The distance between an estimate and the estimated parameter is called the error of estimation.
Unbiased Estimator
An estimator is unbiased if its expected value is equal to the population parameter being estimated.
or
An estimator of a population parameter is said to be unbiased if the mean of its sampling distribution is equal to the parameter.
Biased Estimator
If the mean of the estimator is not equal to the population parameter, the estimator is said to be biased.
or
An estimator $\hat{\theta}$ is said to be biased if the expected value of the estimator is not equal to the population parameter being estimated i.e; $E(\hat{\theta}) \ne \theta$.
Confidence Interval
A confidence interval is a range of values within which the population parameter is expected to occur.
or
An interval that estimates a population parameter within a range of possible values with a specified probability.
Confidence Limits
The two endpoints of a confidence interval are called confidence limits.
Level of Confidence
The probability that the population parameter is included within the confidence interval is called the level of confidence.
or
The probability of correctly accepting the null hypothesis, $(1 - \alpha)$, is called the level of confidence.
Degrees of Freedom
Degrees of freedom is the number of values that are free to vary after we have placed certain restrictions upon the data.
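The definitions of unbiased and biased estimators can be illustrated by listing every possible sample from a very small population. The sketch below assumes a made-up population {1, 2, 3} and samples of size 2 drawn with replacement. Neither the data nor the variable names come from the text. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]   # hypothetical population, chosen only for illustration
n = 2                    # sample size, sampling with replacement

mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)


def sample_variance(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / divisor


# All equally likely ordered samples of size n
samples = list(product(population, repeat=n))

# Mean of the sampling distribution of each estimator
expected_mean = sum(Fraction(sum(s), n) for s in samples) / len(samples)
expected_s2_n_minus_1 = sum(sample_variance(s, n - 1) for s in samples) / len(samples)
expected_s2_n = sum(sample_variance(s, n) for s in samples) / len(samples)

print(mu, expected_mean)              # sample mean: expected value equals mu
print(sigma2, expected_s2_n_minus_1)  # divisor n - 1: expected value equals sigma^2
print(sigma2, expected_s2_n)          # divisor n: expected value falls short of sigma^2
```

In this small case the sample mean and the variance with divisor n - 1 have expected values equal to the parameters they estimate, whereas the variance with divisor n does not, which is what "biased" means in the definition above.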

MULTIPLE CHOICE QUESTIONS

1. The process of making estimates about the population parameter from a sample is called:
(a) statistical independence (b) statistical inference
(c) statistical hypothesis (d) statistical decision
2. Statistical inference has two branches namely:
(a) level of confidence and degrees of freedom
(b) biased estimator and unbiased estimator
(c) point estimate and interval estimate
(d) estimation of parameter and testing of hypothesis
3. Estimation is of two types:
(a) one sided and two sided (b) type I and type II
(c) point estimation and interval estimation
(d) biased and unbiased
4. A formula or rule used for estimating the parameter is called:
(a) estimation (b) estimate
(c) estimator (d) interval estimate
5. Statistic is an estimator and its calculated value is called:
(a) biased estimate (b) estimation
(c) interval estimate (d) estimate
6. Estimate is the observed value of an:
(a) unbiased estimator (b) estimator
(c) estimation (d) interval estimation
7. The process of using sample data to estimate the values of unknown population parameters is called:
(a) estimate (b) estimator
(c) estimation (d) interval estimate
8. The numerical value which we determine from the sample for population parameter is called:
(a) estimation (b) estimate
(c) estimator (d) confidence coefficient
9. A single value used to estimate a population value is called:
(a) interval estimate (b) point estimate
(c) confidence interval (d) level of confidence
10. An interval calculated from the sample data and it is likely to contain the value of parameter with some probability is called:
(a) interval estimate (b) point estimate
(c) level of confidence (d) degrees of freedom
11. A range of values within which the population parameter is expected to occur is called:
(a) confidence coefficient (b) confidence interval
(c) confidence limits (d) level of significance
12. The end points of a confidence interval are called:
(a) confidence coefficient (b) confidence limits
(c) error of estimation (d) parameters
,u l;;
(a) level of confidence (b) confidence coefficient
(c) both (a) and (b) (d) confidence limits
14. If the mean of the estimator is not equal to the population parameter, the estimator is said to be:
(a) unbiased (b) biased
(c) positively biased (d) negatively biased
15. If $\hat{\theta}$ is the estimator of the parameter $\theta$, then $\hat{\theta}$ is called unbiased if:
(a) $E(\hat{\theta}) > \theta$ (b) $E(\hat{\theta}) < \theta$
(c) $E(\hat{\theta}) \ne \theta$ (d) $E(\hat{\theta}) = \theta$
16. Estimates given in the form of confidence intervals are called:
(a) point estimates (b) interval estimates
(c) confidence limits (d) degrees of freedom
17. $(1 - \alpha)$ is called:
(a) critical value (b) level of significance
(c) level of confidence (d) interval estimate
18. If $(1 - \alpha)$ is increased, the width of a confidence interval is:
(a) decreased (b) increased
(c) constant (d) same
19. By decreasing the sample size, the confidence interval becomes:
(a) narrower (b) wider
(c) fixed (d) all of the above
20. Confidence interval becomes narrow by increasing the:
(a) sample size (b) population size
(c) level of confidence (d) degrees of freedom
21. By increasing the sample size, the precision of confidence interval is:
(a) increased (b) decreased
(c) same (d) unchanged
22. The distance between an estimate and the estimated parameter is called:
(a) sampling error (b) error of estimation
(c) bias (d) standard error
23. The number of values that are free to vary after we have placed certain restrictions upon the data is called:
(a) degrees of freedom (b) confidence coefficient
(c) number of parameters (d) number of samples
24. A 95 % confidence interval for the mean of a population is such that:
(a) It contains 95 % of the values in the population
(b) There is a 95 % chance that it contains all the values in the population
(c) There is a 95 % chance that it contains the mean of the population
(d) There is a 95 % chance that it contains the standard deviation of the population.
25. A confidence interval will be widened if:
(a) The confidence level is increased and the sample size is reduced
(b) The confidence level is increased and the sample size is increased
(c) The confidence level is decreased and the sample size is increased
(d) The confidence level is decreased and the sample size is decreased.
26. A statistician calculates a 95 % confidence interval for $\mu$ when $\sigma$ is known. The confidence interval is Rs. 18000 to Rs. 22000, the amount of the sample mean $\bar{X}$ is:
(a) Rs. 18000 (b) Rs. 20000
(c) Rs. 22000 (d) Rs. 40000
27. If the population standard deviation $\sigma$ is known, the confidence interval for the population mean $\mu$ is based on:
(a) the Poisson distribution (b) the t-distribution
(c) $\chi^2$-distribution (d) the normal distribution
28. If the population standard deviation $\sigma$ is unknown, and the sample size is small i.e; n < 30, the confidence interval for the population mean $\mu$ is based on:
(a) the t-distribution (b) the normal distribution
(c) the binomial distribution (d) the hypergeometric distribution
29. The shape of the t-distribution depends upon the:
(a) sample size (b) population size
(c) parameters (d) degrees of freedom
30. If the population standard deviation $\sigma$ is doubled, the width of the confidence interval for the population mean $\mu$ (i.e; the upper limit of the confidence interval - lower limit of the confidence interval) will be:
(a) divided by 2 (b) multiplied by $\sqrt{2}$
(c) doubled (d) decreased
31. The following statistics are unbiased estimators:
(a) the sample mean (b) the sample variance $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
(c) the sample proportion (d) all the above
32. A statistic is an unbiased estimator of a parameter if:
(a) E(statistic) = parameter (b) E(mean) = variance
(c) E(variance) = mean (d) E(sample mean) = proportion
33. Which of the following is a biased estimator?

(b) 0=*
(c)-'=$ g' = I(Xit' (d)
34. If the observations are paired and the number of pairs is n, then degrees of freedom is equal to:
(a) n (b) n - 1
(c) $n_1 + n_2 - 2$ (d) n + 2
35. If $\alpha = 0.10$ and n = 15; $t_{\alpha/2}$ equals:
(a) 1.761 (b) 1.753
(c) 1.771 (d) 2.145
36. If $n_1 = 16$, $n_2 = 9$ and $\alpha = 0.01$; $t_{\alpha/2}$ equals:
(a) 2.787 (b) 2.807
(c) 2.797 (d) 2.767
37. In t-distribution for two independent samples $n_1 = n_2 = n$, then the degrees of freedom is equal to:
(a) 2n - 1 (b) 2n - 2
(c) 2n + 1 (d) n - 1
38. If $1 - \alpha = 0.90$, then value of $Z_{\alpha/2}$ is:
(a) 1.96 (b) 2.575
(c) 1.645 (d) 2.326
39. If the population standard deviation $\sigma$ is known and the sample size n is less

hX* Z"*
than or equal to or more than 30, the confidence interval for the population
mean $\mu$ is:

(a).
' (b) X* Zr+
z{n 2Vn
rfr,
(c) --S
X* tc(--F (d/ p+
z" 1 /p-q
zVn ivn
40. If the population standard deviation $\sigma$ is unknown and the sample size n is greater than 30, the confidence interval for the population mean $\mu$ is:
(a) X* (b) x*zo*
uvn (c) x*to* \-/ x*tn#
zVn (d) "- TV,
41. If the population standard deviation $\sigma$ is unknown and the sample size is less than or equal to 30, the confidence interval for the population mean $\mu$ is:
s (b) x*t**\n
\-/ X+ Zs*
(a) 2 !n
2

sd
(c) X+to7 (d) b*tqor-z)su

m
2!n

42. A student calculates a 90 % confidence interval for population mean $\mu$ when population standard deviation $\sigma$ is unknown and n = 9. The confidence interval is -24.3 cents to 64.3 cents, the sample mean $\bar{X}$ is:
(a) 40 (b) 24.3
(c) 64.3 (d) 20
43. A 95 % confidence interval for population proportion p is 32.4 % to 47.6 %, the value of sample proportion $\hat{p}$ is:
(a) 40 % (b) 32.4 %
(c) 47.6 % (d) 80 %
4
44. If we have normal populations with known population standard deviations $\sigma_1$ and $\sigma_2$, the confidence interval estimate for the difference between two

9 ft) tx'-x,) t'r{i:.;,


population means $(\mu_1 - \mu_2)$ is:

t
Pvr/grqw4v.....-:---:.,;_-fz"-z r;:.----.

a
(a) (x, x,.,*fH*
t
// s
s :
tt p
45. If the population standard deviations $\sigma_1$ and $\sigma_2$ are unknown and sample sizes $n_1, n_2 > 30$, the $100(1 - \alpha)$ % confidence interval for $\mu_1 - \mu_2$ is:

h(a) G,-x,) * z*\/H


Illr
n1, D2 / d\.,,
Il2 > -9!"'lfidence
30, the 100 (1
ulrs avv \r tTIz EE
C.+e (b) -x,l .
zv ';\.* rx'
E-.i
' n, .*,-r,) - r;\fffi (d) 1x'-x').';!{."2
46. If the sample size is large, the confidence interval estimate of a population proportion p is:
Xr: i*z!\;
(a) xtz;ft (b) /k_q

(c) (d) fi * zu{iffi


$ *,2s. 2
2
47. If $n_1, n_2 \le 30$, the confidence interval estimate for the difference of two population means $(\mu_1 - \mu_2)$ when population standard deviations $\sigma_1$, $\sigma_2$ are unknown but equal, in the case of pooled variances, is:

(a) (Xr- X,) + Zs


/s? s;
f (b) (Xr - _ r r8,,,,1
Xr)
E=
/= * :1
z\v, Vr, nz

!,, otim
i) -,
(c) (d) .
- X2)
't*,t'\m 1x, - xrl ,; .

.c
(Xr +

48. The confidence interval estimate for the difference of two population means $\mu_1 - \mu_2 = \mu_d$ in case of paired observations, small sample ($n \le 30$), is:

s p
(b) r{*5i,- r)tt
Si

(d) lo
f2- g
b
(c) tg,"-,,!T&
.
d+ b+ts(n_z)sr,

fi.^ ^"4
3
49. If the sample size is large, the confidence interval estimate for the difference between two population proportions $p_1 - p_2$ is:

9 9 fi;.;"
(a) - 6, ze1 /+.
0,) +
i v nr
a t +tn, &) zg\ lry+*t
-, v
(0, - fi,r + nr'tn,

s t
(c)

:/ /
*zrvfis,+
1fi, - fi,) $r0,

1. (b)   2. (d)   3. (c)   4. (c)   5. (d)   6. (b)   7. (c)   8. (b)
9. (b)   10. (a)  11. (b)  12. (b)  13. (c)  14. (b)  15. (d)  16. (b)
17. (c)  18. (b)  19. (b)  20. (a)  21. (a)  22. (b)  23. (a)  24. (c)
25. (a)  26. (b)  27. (d)  28. (a)  29. (d)  30. (c)  31. (d)  32. (a)
33. (d)  34. (b)  35. (a)  36. (b)  37. (b)  38. (c)  39. (a)  40. (b)
41. (b)  42. (d)  43. (a)  44. (b)  45. (c)  46. (b)  47. (c)  48. (b)
49. (a)
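Questions 26, 42 and 43 rest on one fact. Every interval in this chapter has the form (point estimate) ± (margin), so the point estimate is the midpoint of the two limits and the margin is half the width. A minimal Python sketch of that reasoning follows. The helper name `centre_and_margin` is our own.

```python
def centre_and_margin(lower, upper):
    """Recover the point estimate and the margin of error from the limits
    of a symmetric interval of the form estimate +/- margin."""
    centre = (lower + upper) / 2
    margin = (upper - lower) / 2
    return centre, margin


# Question 26: limits Rs. 18000 and Rs. 22000
mean_26, margin_26 = centre_and_margin(18000, 22000)
# Question 43: limits 32.4 % and 47.6 %
p_hat_43, margin_43 = centre_and_margin(32.4, 47.6)

print(mean_26, margin_26)
print(round(p_hat_43, 1), round(margin_43, 1))
```

The same midpoint rule applies to every symmetric interval in the chapter, whether it is built from Z or from t.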
SHORT QUESTIONS

1. Given n = 64, $\bar{X} = 42.7$, $\sigma = 8$ and $Z_{\alpha/2} = 1.645$. Find the confidence interval for $\mu$.
Ans. $41.1 < \mu < 44.3$
2. Given n = 16, $\bar{X} = 52.5$, $\sigma = 10$ and $1 - \alpha = 0.90$. Compute the 90 % confidence interval for $\mu$.
Ans. $48.4 < \mu < 56.6$
3. Given N = 500, n = 100, $\bar{X} = 60$, $\sigma = 5$ and $Z_{0.025} = 1.96$. Find the 95 % confidence interval for $\mu$.
Ans. $59.1 < \mu < 60.9$
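The first three short questions can be worked numerically as below. This is a sketch under one stated assumption: question 3 gives the population size N = 500, and its printed answer appears consistent with multiplying the standard error by the finite population correction $\sqrt{(N - n)/(N - 1)}$. That is inferred from the numbers, not stated in the question. The function name and the optional `population_size` argument are ours.

```python
from math import sqrt


def mean_ci_known_sigma(x_bar, sigma, n, z, population_size=None):
    """Confidence interval for a population mean when sigma is known.

    If population_size is given, the standard error is multiplied by the
    finite population correction sqrt((N - n) / (N - 1)).
    """
    se = sigma / sqrt(n)
    if population_size is not None:
        se *= sqrt((population_size - n) / (population_size - 1))
    return x_bar - z * se, x_bar + z * se


q1 = mean_ci_known_sigma(42.7, 8, 64, 1.645)
q2 = mean_ci_known_sigma(52.5, 10, 16, 1.645)
q3 = mean_ci_known_sigma(60, 5, 100, 1.96, population_size=500)

for low, high in (q1, q2, q3):
    print(round(low, 1), round(high, 1))
```

Without the correction, question 3 would give limits of about 59.0 and 61.0, so the correction is what brings the result to the printed 59.1 and 60.9.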
4. Given n = 144, $\bar{X} = 750$, S = 6 and $Z_{0.01} = 2.326$. Compute the 98 % confidence interval for $\mu$.
Ans. $748.837 < \mu < 751.163$
5. Given n = 16, $\bar{X} = 80$, s = 3 and $t_{0.01(15)} = 2.602$. Construct the 98 % confidence interval for population mean $\mu$.
Ans. $78.05 < \mu < 81.95$
6. Given $n_1 = 32$, $n_2 = 50$, $\bar{X}_1 = 125$, $\bar{X}_2 = 100$, $\sigma_1^2 = 16$, $\sigma_2^2 = 25$ and $Z_{0.005} = 2.575$. Find the 99 % confidence interval for the difference between population means $\mu_1 - \mu_2$.
Ans. $22.425 < \mu_1 - \mu_2 < 27.575$
7. Given $n_1 = 48$, $\bar{X}_1 = 90$, $S_1^2 = 12$, $n_2 = 72$, $\bar{X}_2 = 85$, $S_2^2 = 18$ and $Z_{0.025} = 1.96$. Find the 95 % confidence interval for $\mu_1 - \mu_2$.
Ans. $3.6 < \mu_1 - \mu_2 < 6.4$
8. Given the following summary statistics for independent random samples from two populations:
$n_1 = 16$, $\bar{X}_1 = 60$, $s_1^2 = 36$, $n_2 = 9$, $\bar{X}_2 = 50$, $s_2^2 = 25$, $s_p = 5.67$ and $t_{0.05(23)} = 1.714$.
Find the 90 % confidence interval for $\mu_1 - \mu_2$.
Ans. $5.95 < \mu_1 - \mu_2 < 14.05$
9. Given $\bar{d} = 3$, n = 9, $s_d = 3$ and $t_{0.025(8)} = 2.306$. Find the 95 % confidence interval for $\mu_d = \mu_1 - \mu_2$.
Ans. $0.694 < \mu_d < 5.306$
10. Given n = 500, $\hat{p} = 0.30$ and $Z_{0.03} = 1.88$. Find the 94 % confidence interval for the population proportion p.
Ans. $0.26 < p < 0.34$
11. Given $n_1 = 400$, $\hat{p}_1 = 0.8$, $n_2 = 400$, $\hat{p}_2 = 0.6$ and $Z_{0.04} = 1.75$. Find the 92 % confidence interval for the difference between population proportions.
Ans. $0.14 < p_1 - p_2 < 0.26$
12. Determine the critical value of Z in each of the following circumstances:
(a) $1 - \alpha = 0.90$ (b) $1 - \alpha = 0.92$ (c) $1 - \alpha = 0.94$
(d) $1 - \alpha = 0.96$ (e) $1 - \alpha = 0.98$ (f) $1 - \alpha = 0.99$
Ans. (a) 1.645 (b) 1.75 (c) 1.88 (d) 2.054 (e) 2.326 (f) 2.575
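The critical values asked for in question 12 are normally read from the area table of the normal distribution. As a hedged alternative, they can be obtained from Python's standard library, as sketched below. Because the table is entered at rounded probabilities, the tabulated answers (for example 1.75, 1.88 and 2.575) may differ from the computed values by about one unit in the third decimal place.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

critical_values = {}
for level in (0.90, 0.92, 0.94, 0.96, 0.98, 0.99):
    alpha = 1 - level
    # Z_{alpha/2} leaves an area of alpha/2 in the upper tail
    critical_values[level] = standard_normal.inv_cdf(1 - alpha / 2)

for level, z in critical_values.items():
    print(level, round(z, 3))
```

The loop shows that a higher level of confidence always requires a larger critical value, which is why the interval widens as $(1 - \alpha)$ increases.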
13. State the formula which is used to calculate a 95 % confidence interval for the population mean $\mu$, when the population standard deviation $\sigma$ is known.
Ans. $\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$
14. State the formula which is used to calculate a 90 % confidence interval for the population mean $\mu$, when the population standard deviation $\sigma$ is unknown and n = 10.
Ans. $\bar{X} - 1.833\frac{s}{\sqrt{10}} < \mu < \bar{X} + 1.833\frac{s}{\sqrt{10}}$
15. State the formula which is used to calculate a 92 % confidence interval for the population proportion p, when n > 30.
Ans. $\hat{p} - 1.75\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + 1.75\sqrt{\frac{\hat{p}\hat{q}}{n}}$
16. Determine the critical value of 't' in each of the following circumstances:
(a) $1 - \alpha = 0.95$, n = 12 (b) $1 - \alpha = 0.98$, $n_1 = 10$, $n_2 = 12$
(c) $1 - \alpha = 0.90$, n = 16 (d) $1 - \alpha = 0.99$, $n_1 = 6$, $n_2 = 6$
Ans. (a) 2.201 (b) 2.528 (c) 1.753 (d) 3.169
17. If $\bar{X} = 100$, $\sigma = 8$ and n = 64, set up a 95 % confidence interval estimate of the population mean $\mu$.
Ans. $98.04 < \mu < 101.96$
18. State the formula which is used to calculate a 98 % confidence interval for the difference between two population means $\mu_1 - \mu_2$, when population variances are known with any sample size.
Ans. $(\bar{X}_1 - \bar{X}_2) - 2.326\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + 2.326\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
19. State the formula which is used to calculate a 95 % confidence interval for the mean of a population of paired differences for n = 9.
Ans. $\bar{d} - 2.306\frac{s_d}{\sqrt{9}} < \mu_d < \bar{d} + 2.306\frac{s_d}{\sqrt{9}}$
20. Distinguish between point estimate and interval estimate.
21. Explain the terms estimate and estimator.
22. What is meant by estimation?
23. Explain what is meant by confidence interval.
24. Explain what is meant by unbiased estimator.
25. What do you mean by unbiased estimator? Give at least two examples.
26. Define the terms point estimate and interval estimate.
27. Write all the confidence intervals for population mean with small and large samples, population standard deviation being known and unknown.
28. What do you know about statistical inference?
29. Differentiate between biased estimator and unbiased estimator.
30. What is meant by unbiasedness?
31. What is the procedure followed in the construction of a confidence interval?
32. Why is a confidence interval estimate of a parameter more useful than a point estimate?
EXERCISES
1. An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed with a standard deviation of 42 hours. If a random sample of 49 bulbs has an average life of 800 hours, find a 95 % confidence interval for the population mean of all bulbs produced by this firm.
Ans. $788.24 < \mu < 811.76$
2. Find a 90 % confidence interval for the mean of a normal distribution if $\sigma = 2$ and a sample of size 8 gave the values 9, 14, 10, 13, 7, 11, 11, 13.
Ans. $9.84 < \mu < 12.16$
3. An advertising agency wants to estimate the average income of families located in a particular area of a low-income section of Karachi. There are 1000 families in this area, and the agency chooses a random sample of 100. The mean income of these families is Rs. 1800 per month. Compute a 95 % confidence interval for the population mean, if the population standard deviation is known to be Rs. 200.
Ans. $1762.79 < \mu < 1837.21$
4. The population of scores of 10-year old children in a psychological performance test is known to have a standard deviation 5.2. If a random sample of size 20 shows a mean of 16.9, find a 95 % confidence interval for the mean score of the population, assuming that the population is normal.
Ans. $14.62 < \mu < 19.18$
5. A tire manufacturer wants to estimate the mean weight of the tires produced by one of its plants. He takes a random sample of 100 tires produced at this plant and finds that the sample mean is 48.1 lbs. and the sample standard deviation is 0.12 lbs. Calculate a 95 % confidence interval for the population mean.
Ans. $48.076 < \mu < 48.124$
6. Compute a 90 % confidence interval for the population mean, if n = 36, $\sum X = 5400$ and $\sum (X - \bar{X})^2 = 1260$.
Ans. $148.355 < \mu < 151.645$
7. The heights of a random sample of 64 college students showed a mean of 172 centimeters and a standard deviation of 6.5 centimeters. Find a 92 % confidence interval for the mean height of all college students.
Ans. $170.578 < \mu < 173.422$
8. The hourly wages of 144 workcre of a largc fhctory wqre reeoreled, and the
sarnple mesn and standarci dcviation were found to be lill" ?8.59 and Ra,6.'i1
re$pectively. Ilind a "99 % eorrfiricrrec intervul f'or rh* mean wages of faetorl'
workcrs,
Ans.2?.08<p<24.06
9. A random sample of 9 cigarettes of a certain brand has an average nicotine content of 3.6 milligrams and a standard deviation of 0.9 milligrams. Construct a 95 % confidence interval for the true average nicotine content of this particular brand of cigarettes, assuming an approximate normal distribution.
Ans. $2.91 < \mu < 4.29$
10. A certain machine is used to produce items whose weights are assumed to be normally distributed. Suppose that the variability in the weight of the output of the machine is unknown. For a random sample of size 16, $\bar{X}$ is found to be 282 grams and s is 8 grams. Find a 95 % confidence interval for the true population mean.
Ans. $277.738 < \mu < 286.262$
11. A firm wants to estimate the mean lifetime of a particular kind of tool. From the previous experience, it is known that the lifetimes are normally distributed. It draws a random sample of four of these tools and finds that their lifetimes are 7.9, 9.3, 10.8 and 11.4 years. Calculate a 95 % confidence interval for the mean lifetime of this kind of tool.
Ans. $7.35 < \mu < 12.35$
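Exercise 11 is a small-sample case in which $\sigma$ is unknown, so the interval is $\bar{X} \pm t_{\alpha/2(n-1)}\, s/\sqrt{n}$. The sketch below works it from the raw lifetimes. The function name is our own, and the critical value $t_{0.025(3)} = 3.182$ is supplied by hand from a t-table because Python's standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(data, t_critical):
    """Confidence interval for a mean from raw data when sigma is unknown.

    t_critical is t_{alpha/2} with n - 1 degrees of freedom, read from a t-table.
    """
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation, divisor n - 1
    margin = t_critical * s / sqrt(n)
    return x_bar - margin, x_bar + margin


lifetimes = [7.9, 9.3, 10.8, 11.4]   # years, from Exercise 11
t_0025_3 = 3.182                      # table value for 3 degrees of freedom

low, high = t_interval(lifetimes, t_0025_3)
print(round(low, 2), round(high, 2))
```

With only four observations the t value is much larger than 1.96, which is why the interval is about five years wide despite a sample standard deviation of roughly 1.6 years.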

12. A random sample of size $n_1 = 36$ taken from a normal population with a variance $\sigma_1^2 = 9$, has a mean $\bar{X}_1 = 75$. A second random sample of size $n_2 = 25$, taken from a different normal population with a variance $\sigma_2^2 = 25$, has a mean $\bar{X}_2 = 70$. Find a 98 % confidence interval for $\mu_1 - \mu_2$.
Ans. $2.4 < \mu_1 - \mu_2 < 7.6$
13. Two independent samples of 100 machinists and 100 carpenters are taken to estimate the difference between the weekly wages of the two categories of workers. The relevant data are given below:

              Sample mean wage    Population variance
Machinists         345                  196
Carpenters         340                  204

Compute 95 % and 99 % confidence limits for the true difference between the average wages for machinists and carpenters. Which interval is wider?
Ans. 95 % confidence interval: $1.08 < \mu_1 - \mu_2 < 8.92$
99 % confidence interval: $-0.15 < \mu_1 - \mu_2 < 10.15$. Therefore the 99 % interval is wider.
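Exercise 13 can be checked with the known-variance formula $(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. The sketch below uses the table values 1.96 and 2.575 quoted earlier in the chapter. The function name is hypothetical.

```python
from math import sqrt


def mean_difference_ci(x1, var1, n1, x2, var2, n2, z):
    """Confidence interval for mu1 - mu2 with known population variances."""
    se = sqrt(var1 / n1 + var2 / n2)
    diff = x1 - x2
    return diff - z * se, diff + z * se


# Machinists: mean 345, variance 196; carpenters: mean 340, variance 204; n = 100 each
ci_95 = mean_difference_ci(345, 196, 100, 340, 204, 100, z=1.96)
ci_99 = mean_difference_ci(345, 196, 100, 340, 204, 100, z=2.575)

width_95 = ci_95[1] - ci_95[0]
width_99 = ci_99[1] - ci_99[0]

print([round(v, 2) for v in ci_95], round(width_95, 2))
print([round(v, 2) for v in ci_99], round(width_99, 2))
```

The standard error is the same for both intervals, so the difference in width comes entirely from the larger critical value at the 99 % level.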
14. A standardized statistics test was given to 75 boys and 50 girls. The boys made an average grade of 82 with a variance of 64, while the girls made an average grade of 76 with a variance of 36. Find a 99 % confidence interval for $\mu_1 - \mu_2$, where $\mu_1$ stands for mean score of all boys and $\mu_2$ stands for mean score of all girls.
Ans. $2.77 < \mu_1 - \mu_2 < 9.23$
15. Two independent random samples of the diameters of tires are drawn, one from a batch of tires produced at plant A, and another from a batch of tires produced at plant B. The results are as follows:

Sample      Sample size    Sample mean (inches)    Sample variance (inches)²
Plant A        100               50.7                      0.09
Plant B        100               50.3                      0.04

Calculate a 95 % confidence interval for the difference between the mean diameter of the entire batch produced at plant A and the mean diameter of the entire batch produced at plant B.
Ans. $0.33 < \mu_1 - \mu_2 < 0.47$
16. Suppose that for a random sample of size 8 from population I the sample mean and standard deviation were 14.9 and 4.17, respectively, while a random sample of size 5 from population II yielded a sample mean of 10.6 and a sample standard deviation of 3.62. Assuming that the populations are normally distributed with equal variances, compute a 90 % confidence interval for the difference between the population means.
Ans. $0.22 < \mu_1 - \mu_2 < 8.38$
17. The following summary statistics are recorded for independent random samples from two populations:

Sample I     $n_1 = 9$    $\bar{X}_1 = 16.18$    $s_1 = 1.54$
Sample II    $n_2 = 6$    $\bar{X}_2 = 4.22$     $s_2 = 1.37$

Assume that populations are normal with the identical standard deviations. Calculate a 95 % confidence interval for $\mu_1 - \mu_2$.
Ans. $10.28 < \mu_1 - \mu_2 < 13.64$
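Exercise 17 uses two small samples with a common but unknown standard deviation, so the two sample variances are pooled and the t-distribution with $n_1 + n_2 - 2$ degrees of freedom is used. A minimal sketch follows. The function name is ours, and the critical value $t_{0.025(13)} = 2.160$ is taken from a t-table and passed in by hand.

```python
from math import sqrt


def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_critical):
    """Confidence interval for mu1 - mu2 with equal unknown variances.

    t_critical is t_{alpha/2} with n1 + n2 - 2 degrees of freedom.
    """
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    margin = t_critical * sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin


# Sample I: n = 9, mean 16.18, s = 1.54; Sample II: n = 6, mean 4.22, s = 1.37
low, high = pooled_t_interval(16.18, 1.54, 9, 4.22, 1.37, 6, t_critical=2.160)
print(round(low, 2), round(high, 2))
```

Pooling is appropriate here only because the exercise states that the two populations have identical standard deviations.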
18. It is claimed that a new diet will reduce a person's weight by 5 kilograms on the average in a period of 8 weeks. The weights of 4 men who were given this diet were recorded before and after an 8 week period:

Weights before
Weights after

Compute a 90 % confidence interval for the mean difference in the weights. Assume the distribution of weights to be approximately normal.
Ans. $-0.19 < \mu_d < 8.19$

To compare two treatments, a matched-pair ,&Reriment was conducted with 12


1.9.
pairs of subjects, and the paircd differences lt tl"te response to treatment B from
*
ihe response to treatment A were recorded/2, 5, 6, 8, :- 6, 4, l'8' 12, L7, - I ' 16'
12. Construct a g5 % confidence interval for the mean difference of the responses
bo the two treatments.
t
Ans. - 0.98 < p, < 11'48 |

20. A random sample of 300 cigarette smokers is selected and 75 are found to have a preference for 'Gold leaf'. Find a 98 % confidence interval for the fraction of the population of cigarette smokers who prefer Gold leaf.
Ans. $0.19 < p < 0.31$
21. A tire manufacturer draws a random sample of 160 tires produced by a new process and finds that 40 percent wear better than required specifications. Construct a 90 % confidence interval for the population proportion of the tires produced by the new process that will wear better than required specifications.
Ans. $0.34 < p < 0.46$
22. Find a 95 % confidence interval for p if 24 heads are obtained in 40 tosses of a coin.
Ans. $0.45 < p < 0.75$
23. In a random sample of 1000 homes in a certain city, it is found that 228 are heated by oil. Find a 99 % confidence interval for the proportion of homes in this city that are heated by oil.
Ans. $0.194 < p < 0.262$
24. A firm has factories in Karachi and Lahore. It picks a random sample of 100 workers from each factory. In Karachi, 32 percent say that they buy the firm's product; in Lahore, 27 percent. Construct a 95 % confidence interval for the difference between the proportion of workers in Karachi and Lahore who say they buy the firm's product.
Ans. $-0.08 < p_1 - p_2 < 0.18$
25. A poll is taken among the residents of a city and the surrounding semi-urban area to determine the feasibility of a proposal to construct a civic center. If 240 of 500 city residents favour the proposal and 120 of 200 semi-urbans favour it, find a 95 % confidence interval for the true difference in the fractions favouring the proposal to construct the civic center.
Ans. $-0.20 < p_1 - p_2 < -0.04$
Chapter 13
STATISTICAL INFERENCE
TESTING OF HYPOTHESES

13.1 INTRODUCTION
Statistical inference consists of estimation of parameters and testing of hypotheses. Estimation has already been discussed in the previous chapter and in this chapter our lesson is about the testing of hypotheses. Point estimation and interval estimation as discussed earlier have their own fields of application. Sometimes there is a situation in which the point estimation and the interval estimation are either not required or the estimation of parameters does not provide any inference. For example, the following situations require inference which is not possible by methods of estimation.
(i) The ingredients of a medicine have been changed to improve the effectiveness of the medicine. In this situation both the point estimation and the interval estimation fail to answer the question about the improvement of the medicine. In this case we have to take help from the sample data to decide whether or not the medicine has been improved.
(ii) A manufacturer of tires claims that the average life of his tires is at least 15000 kilometers. The life of tires is an important factor to settle the price of the tires. It is a big information if we prove with reasonable amount of confidence that the life of the tires is not more than 15000 kilometers. The answer is not provided by a point estimate or by an interval estimate of the life of the tires. What we shall have to do is that we shall examine the claim of the manufacturer on the basis of the experiment conducted on the sample of tires. A certain procedure will be adopted to reach some conclusion. This is what we shall call the test of hypothesis about the life of tires.
13.2 STATISTICAL HYPOTHESES
Any opinion or idea may be formed about the population under study. Consider the following statements: average consumption of sugar per month for a consumer is 1 kg; intelligent parents have intelligent children; tall fathers have tall sons; average life of the people of Pakistan is higher than that of India; proper greasing increases the life of ceiling fans; use of coffee increases chances of heart attack; one variety of seed is better than the other; a medicine of allergy gives relief to 80 % of the people; more than 25 % people are literate in our country; only 60 % people will go to the polling stations for voting. These statements are the questions in different fields of life and these questions are to be answered after proper investigations. These questions have come up in the process of investigation and, in this way, the hypotheses are generated during various studies. When an assumption is explained in the form of a statement about the distribution of a population or populations, it is called a statistical hypothesis. In simple words, a statistical hypothesis is a statement about the unknown value of the population parameter. The statement may be true or false.

o
13.2.1 NULL HYPOTHESIS
The hypothesis which is to be tested is called the null hypothesis. It is denoted by $H_0$. It is a starting point in the investigations. A statement which we hope will be rejected is taken as a hypothesis. Modern approach is different. Today any hypothesis we wish to test is called null hypothesis and is denoted by $H_0$. In this book we shall follow the old convention. Any hypothesis will be called null hypothesis only when we hope to reject it. Thus the null hypothesis is framed for possible rejection.
Tall fathers have tall children. We shall assume that tall fathers do not have tall children. This will be considered as null hypothesis and denoted by $H_0$. We hope that $H_0$ will be rejected on the basis of sample data.
Use of coffee increases chances of heart attack. To start with we shall assume that heart attack has no link with the use of coffee. This will be taken as $H_0$ and we hope it will be rejected by the sample data.
13.2.2 ALTERNATIVE HYPOTHESIS
The hypothesis which is accepted when the null hypothesis has been rejected is called the alternative hypothesis. It is denoted by $H_1$ or $H_A$. Whatever we are expecting from the sample data is taken as the alternative hypothesis. "More than 25 % people are literate in our country". We are hoping to get this result from the sample. It will be taken as an alternative hypothesis $H_1$ and the null hypothesis $H_0$ will be that 25 % or less than that are literate. To be more specific, $H_0$ will be "25 % or less are literate" and $H_1$ will be "more than 25 % are literate". It is written as:
$H_0: p \le 0.25$ (25 % or less)    $H_1: p > 0.25$ (more than 25 %)
To keep the things simple, we can write $H_0$ in the form of equality, i.e. $H_0: p = 0.25$, but it is important to write $H_1$ with proper direction of inequality. Thus we write $H_0: p = 0.25$, $H_1: p > 0.25$.
In this case $H_1$ contains the inequality more than ( > ). We shall explain later that $H_1$ may be written with inequality less than ( < ) or not equal ( $\ne$ ). In general, if the hypothesized value of the population parameter $\theta$ is $\theta_0$, then $H_1$ can be written in three different ways.
For $H_0: \theta = \theta_0$:    $H_1: \theta \ne \theta_0$    $H_1: \theta > \theta_0$    $H_1: \theta < \theta_0$
But this is the simple approach which is allowed for the students. Another way of writing the above hypotheses $H_0$ and $H_1$ is
(a) $H_0: \theta = \theta_0$, $H_1: \theta \ne \theta_0$    (b) $H_0: \theta \le \theta_0$, $H_1: \theta > \theta_0$    (c) $H_0: \theta \ge \theta_0$, $H_1: \theta < \theta_0$
[Chapter 13] Statistical Inference Testing of Hypotheses
The alternative hypothesis $H_1$ never contains the sign of equality. Thus $H_1$ will not contain the '=', '$\le$' or '$\ge$' signs. The equality sign '=' and inequalities like '$\le$' and '$\ge$' are used for writing $H_0$.
13.2.3 SIMPLE HYPOTHESIS
If a hypothesis has a single value for the population parameter, it is called simple hypothesis. The breaking strength of copper wire is 10 kg. Here $H_0: \mu = 10$ kg has a single specified value, so $H_0$ is a simple hypothesis. Similarly $\mu_1 - \mu_2 = 10$ and $p = 0.6$ are simple hypotheses.
13.2.4 COMPOSITE HYPOTHESIS
The hypothesis is called composite if it specifies a range of values for the population parameter. For example, the hypotheses $(\mu_1 - \mu_2) > 10$ and $p \le 0.6$ are composite.
13.2.5 ACCEPTANCE AND REJECTION OF NULL HYPOTHESIS
The given hypothesis is tested with the help of the sample data. A simple random sample has the full freedom of giving any value to its statistic. The sample is not aware of our plans. We decide about our hypothesis on the basis of the sample statistic. If the sample does not support the null hypothesis, we reject it on probability basis and accept the alternative hypothesis. If the sample does not oppose the hypothesis, the hypothesis is accepted. But here 'accept' does not mean the acceptance of null hypothesis but only means that the sample has not strongly opposed it. "Not opposed" does not mean that the sample has strongly supported the hypothesis. The support of the sample in favour of the hypothesis cannot be established. When the hypothesis is rejected, it is rejected with a high probability. Thus rejection of $H_0$ is a strong decision and it leads us to the acceptance of $H_1$. But acceptance of $H_1$ is not like the acceptance of $H_0$. The acceptance of null hypothesis does not give us a certain strong decision. It is a situation which may require some further investigations. At this stage, many factors are to be taken into account. The sample size and certain other things not yet discussed help us to do something more about the null hypothesis before it is finally accepted. Thus rejection is a decision, though not necessarily a correct one, and acceptance is not a decision in any sense of the word.
There is a modern approach in which the terms rejection and acceptance are not used. This modern approach is beyond the level of this book. But it remains true in its place that acceptance of a null hypothesis is a weak decision whereas rejection is a strong evidence of the sample against the null hypothesis. When the null hypothesis is rejected, it means the sample has done some statistical work but when the null hypothesis is accepted, it means the sample is almost silent. This behaviour of the sample should not be used in favour of the null hypothesis.
13.2.6 TEST STATISTIC
A statistic is calculated from the sample. To begin with we assume that the hypothesis about the population parameter is true. We compare the value of the statistic with the hypothetical value of the parameter. If the difference between them is small, the hypothesis is accepted and if the difference between them is large, the hypothesis is rejected. A statistic on which the decision can be based whether to accept or reject a hypothesis is called test statistic. Some of the test statistics to be discussed in this book are $Z$, $t$ and $\chi^2$ (chi-square).
13.2.7 ACCEPTANCE AND REJECTION REGIONS
The values of the test statistic which we think do not agree with the given hypothesis are called the critical region or rejection region. The values of the test statistic which support the hypothesis form the acceptance region. The area of the rejection region is equal to $\alpha$ and the area of the acceptance region is $(1 - \alpha)$. These two regions are separate from each other and both regions combined together make the complete sampling distribution of the statistic. These regions are separated by a value (or values), which is called critical value (or values).
13.2.8 TWO-TAILED TEST
When the rejection region is taken on both ends of the sampling distribution, the test is called two-sided test or two-tailed test. When we are using a two-sided test, half of the rejection region equal to $\alpha/2$ is taken on the right side and the other half equal to $\alpha/2$ is taken on the left side of the sampling distribution. Suppose the sampling distribution of the statistic is a normal distribution and we have to test the hypothesis $H_0: \theta = \theta_0$ against the alternative hypothesis $H_1: \theta \ne \theta_0$ which is two-sided. $H_0$ is rejected when the calculated value of $Z$ is greater than $Z_{\alpha/2}$ or it is less than $-Z_{\alpha/2}$. Thus the critical region is $Z > Z_{\alpha/2}$ or $Z < -Z_{\alpha/2}$, which can also be written as $|Z| > Z_{\alpha/2}$; the acceptance region is $-Z_{\alpha/2} < Z < Z_{\alpha/2}$. When $H_0$ is rejected, then $H_1$ is accepted. Two-sided test is shown in Fig. 13.1.

[Figure 13.1: Two-sided test. Rejection regions of area $\alpha/2$ lie below the lower critical value $-Z_{\alpha/2}$ and above the upper critical value $Z_{\alpha/2}$; the acceptance region of area $(1 - \alpha)$ is centred at $Z = 0$.]

13.2.9 ONE-TAILED TEST
When the alternative hypothesis $H_1$ is one-sided like $\theta > \theta_0$ or $\theta < \theta_0$, then the rejection region is taken only on one side of the sampling distribution. It is called one-tailed test or one-sided test. When $H_1$ is one-sided to the right like $\theta > \theta_0$, the entire rejection region equal to $\alpha$ is taken in the right end of the sampling distribution. The test is called one-sided to the right. The hypothesis $H_0$ is rejected if the calculated value of a statistic, say $Z$, falls in the rejection region. The critical value is $Z_\alpha$ which has the area equal to $\alpha$ to its right. The rejection region and acceptance region are shown in Fig. 13.2. The null hypothesis $H_0$ is rejected when $Z$ (calculated) $> Z_\alpha$.

[Figure 13.2: One-sided to the right. Rejection region of area $\alpha$ in the right tail; acceptance region of area $(1 - \alpha)$.]

If the alternative hypothesis is one-sided to the left like $\theta < \theta_0$, the entire rejection region equal to $\alpha$ is taken on the left tail of the sampling distribution. The test is called one-sided or one-tailed to the left. The critical value is $-Z_\alpha$ which cuts off the area equal to $\alpha$ to its left. The critical region is $Z < -Z_\alpha$ and is shown in Fig. 13.3.

[Figure 13.3: One-sided to the left. Rejection region of area $\alpha$ in the left tail.]
For some important values of $\alpha$, the critical values of $Z$ for two-tailed and one-tailed tests are given below:

                              Critical values of Z
$\alpha$        Two-sided test         One-sided to the right    One-sided to the left
0.10 (10 %)     -1.645 and +1.645      +1.282                    -1.282
0.05 (5 %)      -1.96 and +1.96        +1.645                    -1.645
0.02 (2 %)      -2.326 and +2.326      +2.054                    -2.054
0.01 (1 %)      -2.575 and +2.575      +2.326                    -2.326
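
The tabulated critical values can be checked against the standard normal distribution. The following is a minimal sketch in Python, assuming the standard-library class `statistics.NormalDist`; the helper name `critical_values` is introduced here only for illustration and does not come from the text.

```python
from statistics import NormalDist

def critical_values(alpha):
    """Return (two-sided, one-sided) critical values of Z for a given alpha."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    two_sided = std_normal.inv_cdf(1 - alpha / 2)  # area alpha/2 in each tail
    one_sided = std_normal.inv_cdf(1 - alpha)      # area alpha in a single tail
    return two_sided, one_sided

for alpha in (0.10, 0.05, 0.02, 0.01):
    two_sided, one_sided = critical_values(alpha)
    print(f"alpha={alpha:.2f}  two-sided: +/-{two_sided:.3f}  one-sided: {one_sided:.3f}")
```

The left-tail critical value is the negative of the one-sided value. For the two-sided 1 % test the computed value is about 2.576, which the table gives as 2.575.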
13.3 ERRORS IN TESTING OF HYPOTHESIS
The null hypothesis $H_0$ is accepted or rejected on the basis of the value of the test-statistic which is a function of the sample. The test statistic may land in acceptance region or rejection region. If the calculated value of test-statistic, say $Z$, is small (insignificant), i.e., $Z$ is close to zero, or we can say $Z$ lies between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ in a two-sided alternative test ($H_1: \theta \ne \theta_0$), the hypothesis is accepted. If the calculated value of the test-statistic $Z$ is large (significant), $H_0$ is rejected and $H_1$ is accepted. In this rejection plan or acceptance plan, there is the possibility of making any one of the two errors which are called Type I and Type II errors.

13.3.1 TYPE I ERROR


The null hypothesis $H_0$ may be true but it may be rejected. This is an error and is called Type I error. When $H_0$ is true, the test-statistic, say $Z$, can take any value between $-\infty$ and $+\infty$. But we reject $H_0$ when $Z$ lies in the rejection region, while the rejection region is also included in the interval $-\infty$ to $+\infty$. In a two-sided $H_1$ (like $\theta \ne \theta_0$), the hypothesis is rejected when $Z$ is less than $-Z_{\alpha/2}$ or $Z$ is greater than $Z_{\alpha/2}$. When $H_0$ is true, $Z$ can fall in the rejection region with a probability equal to the area of the rejection region, $\alpha$. Thus it is possible that $H_0$ is rejected while $H_0$ is true. This is called Type I error. The probability is $(1 - \alpha)$ that $H_0$ is accepted when $H_0$ is true. It is called correct decision. We can say that Type I error has been committed when:
(i) an intelligent student is not promoted to the next class.
(ii) a good player is not allowed to play the match.
(iii) an innocent person is punished.
(iv) a driver is punished for no fault of his.
(v) a good worker is not paid his salary in time.
These are the examples from practical life. These examples are quoted to make a point clear to the students.
$\alpha$ (ALPHA)
The probability of making Type I error is denoted by $\alpha$ (alpha). When a null hypothesis is rejected, we may be wrong in rejecting it or we may be right in rejecting it. We do not know whether $H_0$ is true or false. Whatever our decision will be, it will have the support of probability. A true hypothesis has some probability of rejection and this probability is denoted by $\alpha$. This probability is also called the size of Type I error.

13.3.2 TYPE II ERROR

The null hypothesis $H_0$ may be false but it may be accepted. It is an error and is called Type II error. The value of the test-statistic may fall in the acceptance region when $H_0$ is in fact false. Suppose the hypothesis being tested is $H_0: \theta = \theta_0$, and $H_0$ is false and the true value of $\theta$ is $\theta_1$ or $\theta_{true}$. If the difference between $\theta_0$ and $\theta_1$ is very large then the chance is very small that $\theta_0$ (wrong) will be accepted. In this case the true sampling distribution of the statistic will be quite away from the sampling distribution under $H_0$. There will be hardly any test-statistic which will fall in the acceptance region of $H_0$. When the true distribution of the test-statistic overlaps the acceptance region of $H_0$, then $H_0$ is accepted though $H_0$ is false. If the difference between $\theta_0$ and $\theta_1$ is small, then there is a high chance of accepting $H_0$. This action will be an error of Type II.
$\beta$ (BETA)
The probability of making Type II error is denoted by $\beta$. Type II error is committed when $H_0$ is accepted while $H_1$ is true. The value of $\beta$ can be calculated only when we happen to know the true value of the population parameter being tested.
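
To illustrate how $\beta$ follows once a true parameter value is supplied, here is a minimal sketch for a right-sided $Z$ test of a mean with known standard deviation. The function name `type_ii_error` and all numerical figures are hypothetical, chosen only for illustration; the normal-distribution calls are from Python's standard library.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu1, sigma, n, alpha):
    """Beta for the right-sided Z test of H0: mu = mu0 against H1: mu > mu0,
    when the true population mean is mu1 (assumed known for this calculation)."""
    standard_error = sigma / sqrt(n)
    # H0 is accepted when the sample mean does not exceed this cut-off.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * standard_error
    # Probability that the sample mean falls at or below the cut-off
    # when its true sampling distribution is centred at mu1.
    return NormalDist(mu1, standard_error).cdf(cutoff)

# Hypothetical figures, not taken from the text.
beta = type_ii_error(mu0=50, mu1=53, sigma=10, n=64, alpha=0.05)
print(f"beta = {beta:.4f}")
```

Moving `mu1` further from `mu0` in this sketch makes the returned value smaller, which matches the remark that a large difference between the hypothesized and true values leaves little chance of accepting a false $H_0$.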
13.3.3 RELATION BETWEEN $\alpha$ AND $\beta$
Suppose we have to test $H_0: \mu = \mu_0$ against the alternative $H_1: \mu > \mu_0$. A random sample of size $n$ is selected from the population and the sample mean $\bar{X}$ is calculated. The sample size $n$ is large and therefore the sampling distribution of $\bar{X}$ is normal with mean $\mu$. To start with we assume that $H_0: \mu = \mu_0$ is true and $\bar{X}$ has the distribution as shown on the left side of Fig. 13.4.

[Figure 13.4: Two sampling distributions of $\bar{X}$, one under $H_0$ (left) and one under $H_1$ (right), with the areas $(1 - \alpha)$, $\alpha$, $\beta$ and $(1 - \beta)$ marked.]

Fig. 13.4 has two sampling distributions: one is on the left side and the other is on the right side. When the null hypothesis $H_0: \mu = \mu_0$ is being tested, there are the following four possibilities.

(i) $H_0$ is true and $\bar{X}$ falls in the area marked $(1 - \alpha)$ in Fig. 13.4. The hypothesis $H_0$ is accepted and this is called correct decision. Probability of this correct decision is $(1 - \alpha)$. We may or may not make this decision.
(ii) $H_0$ is true and $\bar{X}$ falls in the area marked $\alpha$. This is the area of the distribution on the left side. Now $H_0$ is true but it will be rejected because $\bar{X}$ falls in the rejection region. This is an error of Type I and this error will be committed with the probability of $\alpha$. We do not know whether we have committed an error or not.
(iii) $H_0$ is false. The true value of $\mu$ is, say, $\mu_1$ and the true distribution of $\bar{X}$ is the distribution on the right side in Fig. 13.4. Now suppose $\bar{X}$ falls in the area marked $(1 - \beta)$. This is outside the acceptance region of the distribution on the left side. Thus $H_0: \mu = \mu_0$ is rejected and the probability of this action is $(1 - \beta)$. It is called correct decision when $H_0$ is false. In fact, $\bar{X}$ belongs to some distribution. When we take a hypothesis $H_0$, this is an assumption about the mean of the distribution of $\bar{X}$. If the true distribution of $\bar{X}$ is on the right side, then some area of this distribution is falling on the acceptance region of the hypothetical distribution on the left side. This area is marked as $\beta$.
(iv) $H_0$ is false and the value of $\bar{X}$ falls in the area marked $\beta$. In this case $H_0$ is accepted because $\bar{X}$ has fallen in the acceptance region of the first distribution. Thus $H_0$, being false, may be accepted with probability of $\beta$.
If the distribution on the right side is shifted to the right, $\beta$ will decrease and if this distribution is shifted to the left, $\beta$ will increase. Thus the value of $\beta$ depends upon the true value of population mean $\mu$. In a certain given situation when $n$ is fixed, the value of $\beta$ increases when $\alpha$ is decreased. Thus if we want to decrease $\alpha$, we shall do it at the risk of increasing $\beta$. $\alpha$-error and $\beta$-error are also called $\alpha$-risk and $\beta$-risk respectively. Which risk do we want to keep at minimum level? This depends upon the costs of committing $\alpha$-error and $\beta$-error. Suppose we are hesitant of rejecting $H_0$ when it is true, then we shall take $\alpha$ at a small level. In most of the tests, $\alpha$ is fixed at a small level like 0.01 (1 %) or 0.05 (5 %).
The following table shows four possible decisions in a certain test of hypothesis.

                     $H_0$ is True       $H_0$ is False
$H_0$ is Accepted    Correct decision    Type II error
$H_0$ is Rejected    Type I error        Correct decision

When we test a hypothesis, our decision will fall in any one of the above four boxes. The four possible decisions in terms of probabilities are shown below in a tabular form:

                     $H_0$ True          $H_0$ False
$H_0$ is Accepted    $(1 - \alpha)$      $\beta$
$H_0$ is Rejected    $\alpha$            $(1 - \beta)$

It may be noted that $\alpha$ is an area in the right tail of the distribution under $H_0$ and $\beta$ is the area in the left tail of the distribution under $H_1$. Thus $\alpha + \beta \ne 1$ in general. In some special case, and that too very rarely, $\alpha + \beta$ may be equal to 1. The value of $\alpha$ is kept small, so the probability is small that our decision will fall in the box marked $\alpha$. But when our decision has fallen in the box marked $\alpha$, it is a powerful decision against $H_0$.
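
The four boxes of the decision table can also be approximated by simulation. The sketch below is only an illustration under stated assumptions: it draws the sample mean directly from its large-sample normal distribution, uses a right-sided $Z$ test, and relies on hypothetical figures and a hypothetical function name `simulate_decisions`.

```python
import random
from math import sqrt
from statistics import NormalDist

def simulate_decisions(mu0, mu_true, sigma, n, alpha, trials=20000, seed=1):
    """Return the fraction of simulated samples in which H0: mu = mu0 is
    rejected by the right-sided Z test, when the true mean is mu_true."""
    rng = random.Random(seed)
    standard_error = sigma / sqrt(n)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(trials):
        # For large n the sample mean is normal with mean mu_true.
        xbar = rng.gauss(mu_true, standard_error)
        z = (xbar - mu0) / standard_error
        if z > z_alpha:
            rejections += 1
    return rejections / trials

# Hypothetical figures, not taken from the text.
type_i_rate = simulate_decisions(50, 50, 10, 64, alpha=0.05)        # H0 true
type_ii_rate = 1 - simulate_decisions(50, 53, 10, 64, alpha=0.05)   # H0 false
print(f"H0 true : accepted {1 - type_i_rate:.3f}, rejected {type_i_rate:.3f}")
print(f"H0 false: accepted {type_ii_rate:.3f}, rejected {1 - type_ii_rate:.3f}")
```

Rerunning the second call with a smaller `alpha` gives a larger acceptance rate for the false $H_0$, which is the trade-off between $\alpha$ and $\beta$ described in this section.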
13.4 LEVEL OF SIGNIFICANCE
The $\alpha$-risk is the probability of rejecting a true null hypothesis. It is also called the significance level or level of significance of the test. It is denoted by $\alpha$ and its level is usually 1 % or 5 %. The value of $\alpha$ is usually decided before the selection of the sample.
13.5 FORMULATING $H_0$ AND $H_1$ AND MAKING CRITICAL REGION
Now that we have discussed the different terms used in the testing of hypothesis, we are in a position to discuss a point which is sometimes quite confusing. The question is how to formulate the null hypothesis $H_0$ and the alternative hypothesis $H_1$. We elaborate this point here and repeat certain points already discussed in this chapter about the framing of $H_0$ and $H_1$. Let us consider some cases.

(i) A machine has been producing components with mean length of 3 cm, which is the required standard. A new machinery has been installed and it is required to test the hypothesis that the mean length of the components is the same. It is obvious that in this case $H_0$ and $H_1$ will be:
$H_0: \mu = 3$ cm    $H_1: \mu \ne 3$ cm
$H_1$ contains the inequality '$\ne$' which means that the rejection region is taken in both ends of the sampling distribution.
The test-statistic used is $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$.
The null hypothesis $H_0$ is rejected if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$. It is called two-tailed test with rejection region on both sides. $H_0$ is rejected when the sample mean $\bar{X}$ is sufficiently larger than 3 cm or sufficiently smaller than 3 cm.

[Figure 13.5: Two-tailed rejection regions below $-Z_{\alpha/2}$ and above $Z_{\alpha/2}$, with the distribution centred at $Z = 0$.]
(ii) Suppose that we want to test whether the mean $\mu$ of a normal distribution exceeds a specified value $\mu_0$. We set up the null and alternative hypotheses as follows:
$H_0: \mu = \mu_0$    $H_1: \mu > \mu_0$
The null hypothesis $H_0$ and the alternative hypothesis $H_1$ in this case can also be written as
$H_0: \mu \le \mu_0$    $H_1: \mu > \mu_0$
$H_1$ is complement of $H_0$ and the area of the distribution under $H_0$ and $H_1$ makes the complete distribution. In this case, the region of rejection is taken in the right tail of the distribution.
The test-statistic is $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$. The null hypothesis $H_0$ is rejected when the calculated value of $Z$ is greater than the critical value $Z_\alpha$.

[Figure 13.6: Rejection region in the right tail beyond $Z_\alpha$, with the distribution centred at $Z = 0$.]
(iii) At least 60 % of the people are in favour of English as medium of instruction. The sampling distribution of the proportion $\hat{p}$ is divided into two parts: (1) at least 60 %, (2) less than 60 %.
We have a serious doubt about the statement and we hope to disprove it. The proportion of the people $p \ge 0.6$ is to be tested. The idea or suggestion of at least 60 % ($p \ge 0.6$) will be rejected if the sample gives the result well below 60 %. The rejection region is decided by $H_1$ which is one-sided to the left. Thus we frame $H_0$ and $H_1$ as:
$H_0: p \ge 0.6$    $H_1: p < 0.6$
In this case the entire critical region lies in the left tail. If $H_1: p < 0.6$ is true then the sample proportion $\hat{p}$ should lie in the rejection region.
The test statistic used here is $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$, where $p_0 = 0.6$ and $q_0 = 1 - p_0$. The hypothesis $H_0$ is rejected if $Z < -Z_\alpha$.

[Figure 13.7: Rejection region in the left tail below $-Z_\alpha$, with the distribution centred at $p = 0.6$ ($Z = 0$).]
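
As a sketch of the left-sided proportion test in case (iii), the following Python snippet evaluates the statistic for an invented sample. The sample size 400 and the sample proportion 0.55 are hypothetical figures, and `proportion_z` is an illustrative helper name; only $p_0 = 0.6$ comes from the text.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(p_hat, p0, n):
    """Z statistic for a sample proportion under H0 with hypothesized value p0."""
    q0 = 1 - p0
    return (p_hat - p0) / sqrt(p0 * q0 / n)

# Hypothetical sample: 400 people, 55 % of them in favour.
n, p_hat, p0, alpha = 400, 0.55, 0.60, 0.05

z = proportion_z(p_hat, p0, n)
z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
reject = z < -z_alpha                      # left-tailed rejection rule

print(f"Z = {z:.3f}, critical value = {-z_alpha:.3f}")
print("Reject H0" if reject else "Do not reject H0")
```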
Example 13.1.
Indicate the type of errors committed in the following cases:
(i) $H_0: \mu = 500$, $H_1: \mu \ne 500$. $H_0$ is rejected while $H_0$ is true.
(ii) $H_0: \mu = 500$, $H_1: \mu < 500$. $H_0$ is accepted while true value of $\mu = 600$.
Answer:
(i) The hypothesis $\mu = 500$ is true and it has been rejected. Type I error has been committed.
(ii) $H_0$ is false and has been accepted. Type II error has been committed.
13.6 GENERAL PROCEDURE FOR TESTING OF HYPOTHESIS
Following are the main steps involved in the testing of a hypothesis about the population parameter.
1. Formulating null hypothesis $H_0$:
First of all we have to identify the problem and then we frame the hypothesis which we think shall be rejected. Suppose the population parameter is $\theta$ about which we have to frame the hypothesis. We specify a value $\theta_0$ for the unknown parameter. The null hypothesis $H_0$ can be written in three ways as shown below:
(i) $H_0: \theta = \theta_0$    (ii) $H_0: \theta \le \theta_0$    (iii) $H_0: \theta \ge \theta_0$
In some particular situation any one of the above three forms of $H_0$ is taken. The important thing about $H_0$ is that $H_0$ always contains some form of an equality sign such as '=', '$\ge$' or '$\le$'. As $H_0$ always contains sign of equality of some type, some people always write $H_0$ as $H_0: \theta = \theta_0$ and they do not write the inequality contained in $H_0$.
Alternative hypothesis $H_1$:
The alternative hypothesis $H_1$ is the opposite or complement of $H_0$. $H_0$ and $H_1$ combined together make the entire sampling distribution. Both $H_0$ and $H_1$ are equally important and they are to be defined properly and clearly. As $H_1$ is complement of $H_0$, $H_1$ stands decided when $H_0$ has been fixed. For example, for each form of $H_0$, the corresponding form of $H_1$ is given below:
(i) If $H_0: \theta = \theta_0$ then $H_1: \theta \ne \theta_0$
(ii) If $H_0: \theta \le \theta_0$ then $H_1: \theta > \theta_0$
(iii) If $H_0: \theta \ge \theta_0$ then $H_1: \theta < \theta_0$
2. Level of significance $\alpha$:
It is the probability of rejecting $H_0$ when $H_0$ is true. It is denoted by $\alpha$. It makes the size of the critical region.
3. Test-statistic:
The test-statistic depends upon the shape of the sampling distribution of the statistic. If the sampling distribution is a normal distribution, the test-statistic to be used is $Z$ and, if it is a t-distribution, the test-statistic to be used is $t$. Other test-statistics are used for other sampling distributions.
4. Critical region:
Critical region or rejection region is decided by $H_1$. The size of critical region is equal to $\alpha$.
(i) If the alternative hypothesis is $H_1: \theta \ne \theta_0$, the rejection region is taken in both ends of the sampling distribution. Each side has rejection region equal to $\alpha/2$. It is called two-sided rejection region. The rejection regions are separated from the acceptance region by the two critical values.
(ii) When $H_1$ is $\theta > \theta_0$, the rejection region of size $\alpha$ is taken only in the right side. It is called one-sided to the right. The rejection region is separated from the acceptance region by a critical value of test-statistic.
(iii) When $H_1$ is $\theta < \theta_0$, the rejection region of size $\alpha$ is taken only on the left side. It is called one-sided to the left.
5. Computations:
The relevant test-statistic is calculated from the sample data. The calculated value is to be compared with the tabulated value.
6. Conclusion:
If the calculated value of test-statistic lies in the rejection region, the null hypothesis $H_0$ is rejected and $H_1$ is accepted. If the calculated value of the test-statistic falls in the acceptance region, we say that $H_0$ is accepted but it is not acceptance in the real sense of the word. The word acceptance only means that the sample has not provided sufficient information against the null hypothesis.
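
The six steps can be sketched as one small function for the case where the test-statistic is $Z$. This is a minimal illustration, not a definitive implementation: the function name `z_test`, the labels "two-sided", "greater" and "less", and the numerical figures are all assumptions introduced here.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Carry out a Z test of H0 about a mean and return (z, critical, reject).

    alternative: "two-sided" (H1: mu != mu0), "greater" (H1: mu > mu0)
    or "less" (H1: mu < mu0).
    """
    # Steps 3 and 5: compute the test-statistic from the sample figures.
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    # Step 4: the critical region is decided by the alternative hypothesis.
    if alternative == "two-sided":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    elif alternative == "greater":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif alternative == "less":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z < -critical
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    # Step 6: the conclusion is whether Z lies in the rejection region.
    return z, critical, reject

# Hypothetical figures, not taken from the text.
z, critical, reject = z_test(xbar=52, mu0=50, sigma=8, n=64, alpha=0.05)
print(f"Z = {z:.2f}, critical value = {critical:.3f}, reject H0: {reject}")
```

When `reject` is false, the result should be read as in step 6: the sample has not provided sufficient information against $H_0$.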

13.7 HYPOTHESIS TESTING - POPULATION MEAN $\mu$, $\sigma$ KNOWN (LARGE SAMPLE)
Suppose a population has the mean $\mu$ which is unknown and the standard deviation $\sigma$ which is known. A large sample of size $n$ is selected from the population and sample mean $\bar{X}$ is calculated. We are required to test a hypothesis that the population mean $\mu$ has the specified value $\mu_0$. The steps of the procedure are listed below:
1. We frame the null hypothesis $H_0$ and the alternative hypothesis $H_1$. Three different forms of $H_0$ and $H_1$ are possible which are:
(a) $H_0: \mu = \mu_0$ and $H_1: \mu \ne \mu_0$
(b) $H_0: \mu \le \mu_0$ and $H_1: \mu > \mu_0$
(c) $H_0: \mu \ge \mu_0$ and $H_1: \mu < \mu_0$
2. Level of significance $\alpha$ is decided.
3. Test-statistic:
When sample size is large, the sampling distribution of $\bar{X}$ is the normal distribution with mean $\mu$ and the standard error $\sigma/\sqrt{n}$. The population may or may not be normal. The test-statistic to be used is $Z$ where
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
4. Critical region:
The critical region depends upon the alternative hypothesis. There are three possible rejection plans. We discuss all the three turn by turn.
(a) When $H_1$ is $\mu \ne \mu_0$, the rejection region equal to $\alpha/2$ in size is taken on both ends of the sampling distribution as shown in Fig. 13.8. The critical values which separate the critical regions from the central acceptance region are $-Z_{\alpha/2}$ and $Z_{\alpha/2}$. The critical value $-Z_{\alpha/2}$ has the area on its left equal to $\alpha/2$ and the critical value $+Z_{\alpha/2}$ has area on its right equal to $\alpha/2$. $H_0$ is rejected if the calculated value of $Z$ lies in rejection region. The rejection region is $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$. When $\alpha = 0.05$, then $-Z_{\alpha/2} = -Z_{0.025} = -1.96$ and $Z_{0.025} = 1.96$.

[Figure 13.8: Rejection regions in both tails beyond $-Z_{\alpha/2}$ and $Z_{\alpha/2}$, with the distribution centred at $\mu = \mu_0$.]
(b) When $H_1$ is $\mu > \mu_0$, the rejection region equal to $\alpha$ is taken in the right end of the distribution as shown in Fig. 13.9. The test plan is called one-tailed to the right. The hypothesis is rejected when the calculated value of $Z$ is greater than $Z_\alpha$, where $Z_\alpha$ is the critical point on the right of which the area is equal to $\alpha$.

[Figure 13.9: Rejection region in the right tail beyond $Z_\alpha$, with the distribution centred at $Z = 0$.]

(c) When $H_1$ is $\mu < \mu_0$, the rejection region equal to $\alpha$ is taken in the left end of the distribution as shown in Fig. 13.10. The rejection plan is called one-tailed to the left. The hypothesis is rejected when the calculated value of $Z$ is less than the critical value $-Z_\alpha$, where $-Z_\alpha$ is a critical point on the left of which the area is $\alpha$. The rejection region is $Z < -Z_\alpha$.

[Figure 13.10: Rejection region in the left tail below $-Z_\alpha$, with the distribution centred at $Z = 0$.]

Corresponding to each null hypothesis, the alternative hypothesis and the rejection regions are given below:

     Null hypothesis            Alternative hypothesis                  Rejection region
(a)  $H_0: \mu = \mu_0$         $H_1: \mu \ne \mu_0$ (two-sided)        $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$
(b)  $H_0: \mu \le \mu_0$       $H_1: \mu > \mu_0$ (one-sided)          $Z > Z_\alpha$
(c)  $H_0: \mu \ge \mu_0$       $H_1: \mu < \mu_0$ (one-sided)          $Z < -Z_\alpha$
5. Computations:
The value of $Z$ is calculated by using the formula $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$.
6. Conclusion:
If the value of $Z$ lies in the acceptance region, the hypothesis is accepted. But acceptance is just an indication that the sample data has failed to provide evidence against the null hypothesis. If the value of $Z$ lies in rejection region the hypothesis is rejected. A true null hypothesis would be rejected in this way with a probability of only $100\alpha$ %.
Example 13.2.
Past records show that the average score of students in statistics is 57 with standard deviation 10. A new method of teaching is employed and a random sample of 70 students is selected. The sample average is 60. Can we conclude on the basis of these results, at 5 % level of significance, that the average score has increased?
Solution:
1. Null hypothesis: $H_0: \mu = 57$    Alternative hypothesis: $H_1: \mu > 57$
2. Level of significance: $\alpha = 0.05$
3. Test-statistic: $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
4. Critical region: $Z > 1.645$. Here we use one-sided test to the right. The hypothesis $H_0: \mu = 57$ will be rejected if $Z$ lies in the rejection region.
(From the area table of normal distribution, we have $Z_\alpha = Z_{0.05} = 1.645$)
5. Computations: Here $n = 70$, $\bar{X} = 60$, $\sigma = 10$, and hence
$$Z = \frac{60 - 57}{10/\sqrt{70}} = \frac{3\sqrt{70}}{10} = 2.51$$
6. Conclusion: Since the calculated value of $Z = 2.51$ falls in the critical region, we reject our null hypothesis $H_0: \mu = 57$ at 5 % level of significance and we may conclude that the average score has increased.
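
The arithmetic of Example 13.2 can be checked with a few lines of Python. This is only a verification sketch using the figures of the example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Figures from Example 13.2.
n, xbar, mu0, sigma, alpha = 70, 60, 57, 10, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))
z_alpha = NormalDist().inv_cdf(1 - alpha)  # right-tail critical value

print(f"Z = {z:.2f}, critical value = {z_alpha:.3f}")
print("Reject H0" if z > z_alpha else "Do not reject H0")
```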

Example 13.3.
An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed with a mean of 812 hours and a standard deviation of 40 hours. Test the hypothesis that $\mu = 812$ hours against the alternative $\mu \ne 812$ hours if a random sample of 36 bulbs has an average life of 800 hours. Use a 5 % level of significance.
Solution:
1. Null hypothesis: $H_0: \mu = 812$    Alternative hypothesis: $H_1: \mu \ne 812$
2. Level of significance: $\alpha = 0.05$
3. Test-statistic: $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
4. Critical region: $|Z| > 1.96$ ($Z < -1.96$ or $Z > 1.96$)
(From the area table of normal distribution, we have $Z_{\alpha/2} = Z_{0.025} = 1.96$)
5. Computations: Here $n = 36$, $\bar{X} = 800$, $\sigma = 40$, and hence
$$Z = \frac{800 - 812}{40/\sqrt{36}} = \frac{-12}{40}(6) = -1.8$$
6. Conclusion: Since the calculated value of $Z = -1.8$ falls in the acceptance region, $H_0: \mu = 812$ is not rejected.
13.8 HYPOTHESIS TESTING - POPULATION MEAN $\mu$, $\sigma$ NOT KNOWN (LARGE SAMPLE)
This is an important case in which $\sigma$ is not known. When sample size $n$ is large, the population may be normal or not, the sampling distribution of $\bar{X}$ is the normal distribution with mean $\mu$ and standard error $\sigma/\sqrt{n}$. But when $\sigma$ is unknown, it is estimated by the sample standard deviation $S$ and the estimated standard error is $S/\sqrt{n}$. The $Z$-statistic becomes
$$Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$
where $S^2$ is the sample variance. The remaining procedure is exactly the same as discussed earlier. The only difference is that $S$ is used in place of $\sigma$ in the calculation of $Z$.
Example 13.4.
A home heating oil delivery company would like to estimate the annual usage for its customers who live in single-family homes. A sample of 100 customers indicated an average annual usage of 1103 gallons and a sample standard deviation of 327.8 gallons. At the 1 % level of significance, is there evidence that the average annual usage exceeds 1000 gallons per year?
Solution:
1. Null hypothesis: $H_0: \mu \le 1000$    Alternative hypothesis: $H_1: \mu > 1000$
2. Level of significance: $\alpha = 0.01$
3. Test-statistic: $Z = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$
4. Critical region: $Z > 2.326$
(From the area table of normal distribution, we have $Z_\alpha = Z_{0.01} = 2.326$)
5. Computations: Here $n = 100$, $\bar{X} = 1103$, $S = 327.8$, and hence
$$Z = \frac{1103 - 1000}{327.8/\sqrt{100}} = \frac{103}{327.8}(10) = 3.14$$
6. Conclusion: Since the calculated value of $Z = 3.14$ falls in the critical region, we reject our null hypothesis $H_0: \mu \le 1000$ at 1 % level of significance and we may conclude that the average annual usage exceeds 1000 gallons per year.
Example 13.5.
A sample of 42 measurements was taken in order to test the null hypothesis that the population mean equals 8.5 against the alternative that it is different from 8.5. The sample mean and standard deviation were found to be 8.79 and 1.27, respectively. Perform the hypothesis test using 0.01 as the level of significance.
Solution:
1. Null hypothesis: $H_0: \mu = 8.5$    Alternative hypothesis: $H_1: \mu \ne 8.5$
2. Level of significance: $\alpha = 0.01$
3. Test-statistic: $Z = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$
4. Critical region: $|Z| > 2.575$ ($Z < -2.575$ or $Z > 2.575$)
(From the area table of normal distribution, we have $Z_{\alpha/2} = Z_{0.005} = 2.575$)
5. Computations: Here $n = 42$, $\bar{X} = 8.79$, $S = 1.27$, and hence
$$Z = \frac{8.79 - 8.5}{1.27/\sqrt{42}} = 1.48$$
6. Conclusion: Since the calculated value of $Z = 1.48$ falls in the acceptance region, we accept our null hypothesis $H_0: \mu = 8.5$ at 1 % level of significance.

13.9 HYPOTHESIS TESTING - POPULATION MEAN $\mu$, $\sigma$ KNOWN - NORMAL POPULATION (SMALL SAMPLE)
Sometimes the hypothesis is about a population which is normal and whose standard deviation $\sigma$ is known. In this case the Z-test is used for both small and large sample sizes. Thus

$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

The procedure for testing the population mean $\mu$ is the same as discussed earlier.
13.10 HYPOTHESIS TESTING - POPULATION MEAN $\mu$, $\sigma$ UNKNOWN - NORMAL POPULATION (SMALL SAMPLE)
When the standard deviation of the population is not known, it is estimated by the sample standard deviation 's' where $s = \sqrt{\dfrac{1}{n-1}\sum (X - \bar{X})^2}$. The procedure runs as follows:
1. The different forms of hypotheses are
   (a) $H_0: \mu = \mu_0$ and $H_1: \mu \ne \mu_0$
   (b) $H_0: \mu \le \mu_0$ and $H_1: \mu > \mu_0$
   (c) $H_0: \mu \ge \mu_0$ and $H_1: \mu < \mu_0$
2. Level of significance $\alpha$ is decided.
3. Test - statistic:
   When the population is normal and the sample size n is small, the standardized sample mean has the t-distribution with (n - 1) degrees of freedom. The test-statistic is
   $$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$
4. Critical region:
   The critical region is based on the alternative hypothesis.
   (a) For the alternative hypothesis $H_1: \mu \ne \mu_0$, the rejection region is two-sided as shown in Fig. 13.11. The two critical values $-t_{\alpha/2\,(n-1)}$ and $t_{\alpha/2\,(n-1)}$ are read from the t-table under $\alpha/2$ and against (n - 1) degrees of freedom. The critical region is $t > t_{\alpha/2\,(n-1)}$ or $t < -t_{\alpha/2\,(n-1)}$.
       [Figure 13.11: rejection regions in both tails, beyond $-t_{\alpha/2\,(n-1)}$ and $t_{\alpha/2\,(n-1)}$, centred at t = 0]
   (b) When $H_1$ is $\mu > \mu_0$, the rejection region is taken on the extreme right side of the sampling distribution as shown in Fig. 13.12. The critical value $t_{\alpha\,(n-1)}$ is read from the t-table under $\alpha$ and against (n - 1) degrees of freedom. The critical region is $t > t_{\alpha\,(n-1)}$.
       [Figure 13.12: rejection region in the right tail, beyond $t_{\alpha\,(n-1)}$]
   (c) When $H_1$ is $\mu < \mu_0$, the entire rejection region is taken on the left side of the sampling distribution as shown in Fig. 13.13. The critical value $-t_{\alpha\,(n-1)}$ is read from the t-table under $\alpha$ and against (n - 1) degrees of freedom. The critical region is $t < -t_{\alpha\,(n-1)}$.
       [Figure 13.13: rejection region in the left tail, below $-t_{\alpha\,(n-1)}$]
5. Computations:
   The test-statistic 't' is calculated from the sample data where $t = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$.
6. Conclusion:
   The null hypothesis $H_0$ is rejected in favour of $H_1$ when the value of t lies in the rejection region. $H_0$ is accepted when the value of t lies in the acceptance region.
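The six steps above can be sketched in a few lines of code. This is an illustrative outline only: the function names and the `alternative` labels are assumptions made for the sketch, and the critical value is passed in as a number read from the t-table, since the standard library has no t-distribution. The figures used are those of Example 13.6.

```python
from math import sqrt


def one_sample_t(xbar, mu0, s, n):
    """t statistic with (n - 1) degrees of freedom for a single mean."""
    return (xbar - mu0) / (s / sqrt(n))


def in_critical_region(t, t_table, alternative):
    """Decide rejection given a positive table value t_table.

    alternative is one of 'two-sided', 'greater', 'less' (labels assumed here);
    for 'two-sided', t_table must be the alpha/2 table value.
    """
    if alternative == "two-sided":
        return abs(t) > t_table
    if alternative == "greater":
        return t > t_table
    if alternative == "less":
        return t < -t_table
    raise ValueError("unknown alternative: " + alternative)


# Example 13.6: n = 16, mean 34000, s = 2000, H1: mu < 35000, alpha = 0.05
t_value = one_sample_t(xbar=34000, mu0=35000, s=2000, n=16)
reject = in_critical_region(t_value, t_table=1.753, alternative="less")
print(t_value, reject)
```

Passing the table value explicitly keeps the sketch close to the manual procedure, where the critical value is looked up against (n - 1) degrees of freedom.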
Example 13.6.
A manufacturing company making automobile tires claims that the average life of its product is 35000 miles. A random sample of 16 tires was selected, and it was found that the mean life was 34000 miles with a standard deviation s = 2000 miles. Test hypothesis $H_0: \mu = 35000$ against the alternative $H_1: \mu < 35000$ at $\alpha = 0.05$.
Solution:
1. Null hypothesis: $H_0: \mu = 35000$    Alternative hypothesis: $H_1: \mu < 35000$
2. Level of significance: $\alpha = 0.05$
3. Test - statistic: $t = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$
4. Critical region: $t < -1.753$
   (From the t-table, we have $-t_{\alpha\,(n-1)} = -t_{0.05\,(15)} = -1.753$)
5. Computations: Here $n = 16$, $\bar{X} = 34000$, $s = 2000$, and hence
   $$t = \frac{34000 - 35000}{2000/\sqrt{16}} = \frac{-1000}{2000}(4) = -2$$
6. Conclusion: Since the calculated value of t = -2 falls in the critical region, we reject our null hypothesis $H_0: \mu = 35000$ at 5 % level of significance.
Example 13.7.
A random sample of 8 cigarettes of a certain brand has an average nicotine content of 4.2 milligrams and a standard deviation of 1.4 milligrams. Is this in line with the manufacturer's claim that the average nicotine content does not exceed 3.5 milligrams? Use 1 % level of significance and assume the distribution of nicotine contents to be normal.
Solution:
1. Null hypothesis: $H_0: \mu \le 3.5$    Alternative hypothesis: $H_1: \mu > 3.5$
2. Level of significance: $\alpha = 0.01$
3. Test - statistic: $t = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$
4. Critical region: $t > 2.998$
   (From the t-table, we have $t_{\alpha\,(n-1)} = t_{0.01\,(7)} = 2.998$)
5. Computations: Here $n = 8$, $\bar{X} = 4.2$, $s = 1.4$, and hence
   $$t = \frac{4.2 - 3.5}{1.4/\sqrt{8}} = \frac{0.7}{1.4}\sqrt{8} = 1.414$$
6. Conclusion: Since the calculated value of t = 1.414 falls in the acceptance region, we accept our null hypothesis $H_0: \mu \le 3.5$ at 1 % level of significance.
13.11 HYPOTHESIS TESTING - DIFFERENCE BETWEEN TWO POPULATION MEANS $\mu_1 - \mu_2$, $\sigma_1^2$ AND $\sigma_2^2$ KNOWN (LARGE SAMPLES)
Suppose there are two populations (normal or non-normal) with means $\mu_1$ and $\mu_2$ which are unknown and variances $\sigma_1^2$ and $\sigma_2^2$ which are known. Two large random samples of sizes $n_1$ and $n_2$ are selected from the populations and the sample means $\bar{X}_1$ and $\bar{X}_2$ are calculated. The difference $(\bar{X}_1 - \bar{X}_2)$ is a random variable and its distribution is normal with mean $\mu_1 - \mu_2$ and standard error $\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$.
The procedure for testing the hypothesis $\mu_1 - \mu_2 = 0$ is explained below.
1. The null and the alternative hypotheses which are possible are
   (a) $H_0: \mu_1 - \mu_2 = 0$ (or $\mu_1 = \mu_2$) and $H_1: \mu_1 - \mu_2 \ne 0$ (or $\mu_1 \ne \mu_2$)
   (b) $H_0: \mu_1 - \mu_2 \le 0$ (or $\mu_1 \le \mu_2$) and $H_1: \mu_1 - \mu_2 > 0$ (or $\mu_1 > \mu_2$)
   (c) $H_0: \mu_1 - \mu_2 \ge 0$ (or $\mu_1 \ge \mu_2$) and $H_1: \mu_1 - \mu_2 < 0$ (or $\mu_1 < \mu_2$)
2. Level of significance $\alpha$ is decided.
3. Test - statistic:
   The distribution of $(\bar{X}_1 - \bar{X}_2)$ is normal, therefore the test-statistic to be used is Z, where
   $$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
4. Critical region:
   For each alternative hypothesis $H_1$, there is a rejection region as explained earlier.
5. Computations:
   The Z-statistic is calculated using the sample data. Sometimes the null hypothesis states some difference between $\mu_1$ and $\mu_2$ and the difference is denoted by $\Delta$. In that case $H_0$ is $\mu_1 - \mu_2 = \Delta$ (say) and
   $$Z = \frac{(\bar{X}_1 - \bar{X}_2) - \Delta}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
6. Conclusion:
   The hypothesis is rejected if the calculated value of Z lies in the rejection region. If Z lies in the acceptance region, the hypothesis is accepted.
Example 13.8.
Suppose you wish to estimate the effects of a certain sleeping pill on men and women. Two samples are independently taken, and the relevant data are shown below:

                        Men                      Women
Sample size             $n_1 = 36$               $n_2 = 64$
Sample mean             $\bar{X}_1 = 8.75$       $\bar{X}_2 = 7.25$
Population variance     $\sigma_1^2 = 9$         $\sigma_2^2 = 4$

Test the null hypothesis $H_0: \mu_1 = \mu_2$ against the alternative hypothesis $H_1: \mu_1 > \mu_2$ at $\alpha = 0.05$.
Solution:
1. Null hypothesis: $H_0: \mu_1 = \mu_2$    Alternative hypothesis: $H_1: \mu_1 > \mu_2$
2. Level of significance: $\alpha = 0.05$
3. Test - statistic: $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
4. Critical region: $Z > 1.645$
   (From the area table of normal distribution, we have $Z_\alpha = Z_{0.05} = 1.645$)
5. Computations: Here $n_1 = 36$, $\bar{X}_1 = 8.75$, $\sigma_1^2 = 9$, $n_2 = 64$, $\bar{X}_2 = 7.25$, $\sigma_2^2 = 4$, and hence
   $$Z = \frac{(8.75 - 7.25) - 0}{\sqrt{\dfrac{9}{36} + \dfrac{4}{64}}} = \frac{1.5}{0.559} = 2.683$$
6. Conclusion: Since the calculated value of Z = 2.683 falls in the critical region, we reject our null hypothesis $H_0: \mu_1 = \mu_2$ at 5 % level of significance.
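As a check on Example 13.8, the two-sample Z statistic with known variances might be computed as follows. The function name `two_sample_z` and its `delta` parameter (the hypothesized difference, zero by default) are illustrative assumptions, not notation from the text.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, var1, var2, n1, n2, delta=0.0):
    """Z statistic for mu1 - mu2 with known (or large-sample) variances."""
    standard_error = sqrt(var1 / n1 + var2 / n2)
    return ((xbar1 - xbar2) - delta) / standard_error


# Example 13.8: men vs women, H1: mu1 > mu2, alpha = 0.05
z = two_sample_z(xbar1=8.75, xbar2=7.25, var1=9, var2=4, n1=36, n2=64)
z_crit = NormalDist().inv_cdf(1 - 0.05)  # right-tail critical value
reject_h0 = z > z_crit
print(round(z, 3), round(z_crit, 3), reject_h0)
```

Because only the variances enter the standard error, the same function can be used with sample variances in place of population variances when both samples are large.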
Example 13.9.
Two astronomers recorded observations on a certain star. The mean of 30 observations obtained by the first astronomer is 8.85 and the mean of 40 observations made by the second astronomer is 8.20. Past experience shows that each astronomer obtained readings with variance of 1.2. Using $\alpha = 0.01$, can we say that the difference between the two results is significant?
Solution:
1. Null hypothesis: $H_0: \mu_1 = \mu_2$    Alternative hypothesis: $H_1: \mu_1 \ne \mu_2$
2. Level of significance: $\alpha = 0.01$
3. Test - statistic: $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$    ($\because \sigma_1^2 = \sigma_2^2 = \sigma^2$)
4. Critical region: $|Z| > 2.575$ ($Z < -2.575$ and $Z > 2.575$)
   (From the area table of normal distribution, we have $Z_{\alpha/2} = Z_{0.005} = 2.575$)
5. Computations: Here $n_1 = 30$, $\bar{X}_1 = 8.85$, $n_2 = 40$, $\bar{X}_2 = 8.20$, $\sigma^2 = 1.2$, $\sigma = 1.10$, and hence
   $$Z = \frac{(8.85 - 8.20) - 0}{1.10\sqrt{\dfrac{1}{30} + \dfrac{1}{40}}} = \frac{0.65}{0.27} = 2.407$$
6. Conclusion: Since the calculated value of Z = 2.407 falls in the acceptance region, we accept our null hypothesis $H_0: \mu_1 = \mu_2$ at 1 % level of significance. We may conclude that the difference between the two results is insignificant.
13.12 HYPOTHESIS TESTING - DIFFERENCE BETWEEN TWO POPULATION MEANS $\mu_1 - \mu_2$, $\sigma_1^2$ AND $\sigma_2^2$ UNKNOWN (LARGE SAMPLES)
When the population variances $\sigma_1^2$ and $\sigma_2^2$ are unknown, they are estimated by their sample variances $S_1^2$ and $S_2^2$ and the test-statistic to be used becomes

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

This formula is used only for large sample sizes but the populations may or may not be normal. The procedure for testing $H_0$ is the same as explained earlier.
Example 13.10.
Suppose that two randomly selected samples yield the following information:

              Sample I              Sample II
Size          $n_1 = 82$            $n_2 = 41$
Mean          $\bar{X}_1 = 50$      $\bar{X}_2 = 55$
Variance      $S_1^2 = 405$         $S_2^2 = 324$

Test the null hypothesis that the two population means are equal, that is, $H_0: \mu_1 = \mu_2$ against the alternative hypothesis $H_1: \mu_1 < \mu_2$ at $\alpha = 0.01$.
Solution:
1. Null hypothesis: $H_0: \mu_1 = \mu_2$    Alternative hypothesis: $H_1: \mu_1 < \mu_2$
2. Level of significance: $\alpha = 0.01$
3. Test - statistic: $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$
4. Critical region: $Z < -2.326$
   (From the area table of normal distribution, we have $-Z_\alpha = -Z_{0.01} = -2.326$)
5. Computations: Here $n_1 = 82$, $\bar{X}_1 = 50$, $S_1^2 = 405$, $n_2 = 41$, $\bar{X}_2 = 55$, $S_2^2 = 324$, and hence
   $$Z = \frac{(50 - 55) - 0}{\sqrt{\dfrac{405}{82} + \dfrac{324}{41}}} = \frac{-5}{3.583} = -1.40$$
6. Conclusion: Since the calculated value of Z = -1.40 falls in the acceptance region, we accept our null hypothesis $H_0: \mu_1 = \mu_2$ at 1 % level of significance.

13.13 TEST ABOUT $\mu_1 - \mu_2$, $\sigma_1^2$ AND $\sigma_2^2$ KNOWN, POPULATIONS NORMAL (SMALL SAMPLES)
In case of small sample sizes, we can use the Z-test for testing the difference between $\mu_1$ and $\mu_2$ when $\sigma_1^2$ and $\sigma_2^2$ are known, provided the populations are normal. The Z-test used is

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
13.14 TEST ABOUT $\mu_1 - \mu_2$, $\sigma_1^2$ AND $\sigma_2^2$ NOT KNOWN, POPULATIONS NORMAL (SMALL SAMPLES)
This is a case which is different from the previous three cases. Here the conditions are that:
(i) the populations are normal,
(ii) $\sigma_1^2$ and $\sigma_2^2$ are unknown but assumed to be equal,
(iii) the sample sizes $n_1$ and $n_2$ are small and the samples are selected independently.
The variances $\sigma_1^2$ and $\sigma_2^2$ are unknown but $\sigma_1^2 = \sigma_2^2 = \sigma^2$. The parameter $\sigma^2$ is estimated from the sample variances. The sample estimator of $\sigma^2$ is $s_p^2$, where

$$s_p^2 = \frac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \quad \text{and} \quad s_p = \sqrt{s_p^2}$$

$s_p^2$ is called the pooled estimator of the common population variance $\sigma^2$. The standardized difference $(\bar{X}_1 - \bar{X}_2)$ has the t-distribution with $(n_1 + n_2 - 2)$ degrees of freedom, where

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

The tabulated value of 't' for $n_1 + n_2 - 2$ degrees of freedom is read from the t-table.
For $H_1: \mu_1 \ne \mu_2$, the critical values are $-t_{\alpha/2\,(n_1+n_2-2)}$ and $t_{\alpha/2\,(n_1+n_2-2)}$.
For $H_1: \mu_1 > \mu_2$, the critical value is $t_{\alpha\,(n_1+n_2-2)}$.
For $H_1: \mu_1 < \mu_2$, the critical value is $-t_{\alpha\,(n_1+n_2-2)}$.
The null hypothesis $H_0$ is rejected when the calculated value of t lies in the rejection region.

Example 13.11.
Two samples are randomly selected from two classes of students who have been taught by different methods. An examination is given and the results are shown as follows:

               Class I               Class II
Sample size    $n_1 = 8$             $n_2 = 10$
Mean           $\bar{X}_1 = 95$      $\bar{X}_2 = 97$
Variance       $s_1^2 = 47$          $s_2^2 = 30$

On the assumption that the test scores of the two classes of students have identical variances, determine whether the two different methods of teaching are equally effective at $\alpha = 0.01$.
Solution:
1. Null hypothesis: $H_0: \mu_1 = \mu_2$    Alternative hypothesis: $H_1: \mu_1 \ne \mu_2$
2. Level of significance: $\alpha = 0.01$
3. Test - statistic: $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
4. Critical region: $|t| > 2.921$ ($t < -2.921$ and $t > 2.921$)
   (From the t-table, we have $t_{\alpha/2\,(n_1+n_2-2)} = t_{0.005\,(16)} = 2.921$)
5. Computations: Here $n_1 = 8$, $\bar{X}_1 = 95$, $s_1^2 = 47$, $n_2 = 10$, $\bar{X}_2 = 97$, $s_2^2 = 30$,
   $$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(8 - 1)47 + (10 - 1)30}{8 + 10 - 2} = \frac{599}{16} = 37.4375,$$
   $s_p = 6.12$, and hence
   $$t = \frac{(95 - 97) - 0}{6.12\sqrt{\dfrac{1}{8} + \dfrac{1}{10}}} = \frac{-2}{2.9030} = -0.689$$
6. Conclusion: Since the calculated value of t = -0.689 falls in the acceptance region, we accept our null hypothesis $H_0: \mu_1 = \mu_2$ at 1 % level of significance. On the basis of the evidence, we may conclude that the two different methods of teaching are equally effective.
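The pooled-variance calculation of Example 13.11 can be reproduced with a short sketch. The helper names `pooled_variance` and `pooled_t` are illustrative assumptions; the critical value 2.921 is the table value quoted in the example and is supplied by hand.

```python
from math import sqrt


def pooled_variance(s1_sq, s2_sq, n1, n2):
    """Pooled estimator of the common variance from two sample variances."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


def pooled_t(xbar1, xbar2, s1_sq, s2_sq, n1, n2, delta=0.0):
    """t statistic with (n1 + n2 - 2) degrees of freedom."""
    sp = sqrt(pooled_variance(s1_sq, s2_sq, n1, n2))
    return ((xbar1 - xbar2) - delta) / (sp * sqrt(1 / n1 + 1 / n2))


# Example 13.11: Class I vs Class II, two-sided test at alpha = 0.01
sp_sq = pooled_variance(47, 30, 8, 10)
t_value = pooled_t(95, 97, 47, 30, 8, 10)
reject_h0 = abs(t_value) > 2.921  # table value for 16 degrees of freedom
print(sp_sq, round(t_value, 3), reject_h0)
```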
13.15 TEST ABOUT $\mu_1 - \mu_2$, DEPENDENT SAMPLES, POPULATIONS NORMAL
Suppose there are two populations with means $\mu_1$ and $\mu_2$ which are unknown. Two random samples of sizes $n_1$ and $n_2$ are selected. It is further assumed that the samples are dependent. Suppose we record the blood pressures of a sample of some patients. The patients are given a treatment for some time and again their blood pressures are recorded. These two sets of observations are called dependent samples. The first set of observations is called 'before' and the second set of observations is called 'after' observations. These observations are in pairs. If $X_1, X_2, X_3, \ldots, X_n$ are the 'before' observations and $Y_1, Y_2, Y_3, \ldots, Y_n$ are the 'after' observations, then the paired observations are $(X_1, Y_1), (X_2, Y_2), (X_3, Y_3), \ldots, (X_n, Y_n)$. Let us find the differences between the paired values. Let the differences be
$d_1 = X_1 - Y_1$, $d_2 = X_2 - Y_2$, $d_3 = X_3 - Y_3$, ..., $d_n = X_n - Y_n$.
The mean of the sample 'd' values is denoted by $\bar{d}$. Suppose the corresponding parameter of the difference between paired observations in the populations is denoted by $\mu_D$. The various steps of the procedure are:
1. Three different forms of null and alternative hypotheses are
   (a) $H_0: \mu_D = 0$ (or $\mu_1 = \mu_2$) and $H_1: \mu_D \ne 0$ (or $\mu_1 \ne \mu_2$)
   (b) $H_0: \mu_D \le 0$ (or $\mu_1 \le \mu_2$) and $H_1: \mu_D > 0$ (or $\mu_1 > \mu_2$)
   (c) $H_0: \mu_D \ge 0$ (or $\mu_1 \ge \mu_2$) and $H_1: \mu_D < 0$ (or $\mu_1 < \mu_2$)
   Sometimes we have to examine whether the differences of the paired observations in the population have some specified value, say $\Delta$. In that case $\mu_D = \Delta$.
2. Level of significance $\alpha$ is decided.
3. Test-statistic:
   The standardized $\bar{d}$ has the t-distribution with (n - 1) degrees of freedom:
   $$t = \frac{\bar{d} - \mu_D}{s_d/\sqrt{n}}, \quad \text{where} \quad s_d^2 = \frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right]$$
4. Critical region:
   Corresponding to each $H_1$, there is a critical region.
5. Computations: The test-statistic t is calculated where $t = \dfrac{\bar{d} - \mu_D}{s_d/\sqrt{n}}$. When $H_0$ is $\mu_D = 0$, then $t = \dfrac{\bar{d}}{s_d/\sqrt{n}}$.
6. Conclusion:
   The hypothesis $H_0: \mu_D = 0$ is rejected if the calculated value of 't' lies in the rejection region.
Example 13.12.
Suppose that a shoe company wanted to test material for the soles of shoes. For each pair of shoes the new material was placed on one shoe and the old material was placed on the other shoe. After a given period of time a random sample of ten pairs of shoes was selected and the wear was measured on a ten-point scale with the following results:

Pair number     1   2   3   4   5   6   7   8   9   10
New material    2   4   5   7   7   5   9   8   8   7
Old material    4   5   3   8   9   4   7   8   5   6
Differences    -2  -1  +2  -1  -2  +1  +2   0  +3  +1

At the 0.05 level of significance, is there evidence that the average wear is higher for the new material than the old material?
Solution:
1. Null hypothesis: $H_0: \mu_{new} \le \mu_{old}$ or $\mu_D = \mu_{new} - \mu_{old} \le 0$
   Alternative hypothesis: $H_1: \mu_{new} > \mu_{old}$ or $\mu_D = \mu_{new} - \mu_{old} > 0$
2. Level of significance: $\alpha = 0.05$
3. Test - statistic: $t = \dfrac{\bar{d} - \mu_D}{s_d/\sqrt{n}}$
4. Critical region: $t > 1.833$
   (From the t-table, we have $t_{\alpha\,(n-1)} = t_{0.05\,(9)} = 1.833$)
5. Computations:
   Let $X_1$ = new material and $X_2$ = old material.
   The necessary calculations are given below:

   $X_1$              2   4   5   7   7   5   9   8   8   7
   $X_2$              4   5   3   8   9   4   7   8   5   6
   $d = X_1 - X_2$   -2  -1  +2  -1  -2  +1  +2   0  +3  +1
   $d^2$              4   1   4   1   4   1   4   0   9   1

   Here $n = 10$, $\sum d = 3$, $\sum d^2 = 29$, $\bar{d} = \dfrac{\sum d}{n} = \dfrac{3}{10} = 0.3$,
   $$s_d^2 = \frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right] = \frac{1}{10-1}\left[29 - \frac{(3)^2}{10}\right] = 3.1222,$$
   $s_d = 1.77$, and hence
   $$t = \frac{0.3 - 0}{1.77/\sqrt{10}} = \frac{0.3}{1.77}\sqrt{10} = 0.536$$
6. Conclusion: Since the calculated value of t = 0.536 falls in the acceptance region, we accept our null hypothesis $H_0: \mu_{new} \le \mu_{old}$ at 5 % level of significance. On the basis of the evidence, we may conclude that the average wear is not higher for the new material than the old material.
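A paired t computation of this kind can be sketched with the standard library. The data lists below are the wear scores of Example 13.12 as read from its table, and the function name `paired_t` is an illustrative assumption. `statistics.stdev` uses the (n - 1) divisor, which matches the formula for $s_d$ used in the text.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second, mu_d=0.0):
    """t statistic with (n - 1) degrees of freedom for paired observations."""
    d = [x - y for x, y in zip(first, second)]
    return (mean(d) - mu_d) / (stdev(d) / sqrt(len(d)))


# Example 13.12: wear scores for ten pairs of shoes
new_material = [2, 4, 5, 7, 7, 5, 9, 8, 8, 7]
old_material = [4, 5, 3, 8, 9, 4, 7, 8, 5, 6]

differences = [x - y for x, y in zip(new_material, old_material)]
t_value = paired_t(new_material, old_material)
reject_h0 = t_value > 1.833  # table value for 9 degrees of freedom
print(sum(differences), round(t_value, 3), reject_h0)
```

Without intermediate rounding the statistic comes out near 0.537; the 0.536 in the worked solution reflects rounding $s_d$ to 1.77 before dividing. The decision is the same either way.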
Example 13.13.
Two varieties of wheat are each planted in ten localities with differences in yield as follows: 2, 4, 2, 2, 3, 6, 2, 2, 4, 3. Test the hypothesis that the population mean difference is zero, using $\alpha = 0.01$.
Solution:
1. Null hypothesis: $H_0: \mu_1 = \mu_2$ or $\mu_D = \mu_1 - \mu_2 = 0$
   Alternative hypothesis: $H_1: \mu_1 \ne \mu_2$ or $\mu_D = \mu_1 - \mu_2 \ne 0$
2. Level of significance: $\alpha = 0.01$
3. Test - statistic: $t = \dfrac{\bar{d} - \mu_D}{s_d/\sqrt{n}}$
4. Critical region: $|t| > 3.250$ ($t < -3.250$ and $t > 3.250$)
   (From the t-table, we have $t_{\alpha/2\,(n-1)} = t_{0.005\,(9)} = 3.250$)
5. Computations: Here $n = 10$, $\sum d = 30$, $\sum d^2 = 106$, $\bar{d} = \dfrac{\sum d}{n} = \dfrac{30}{10} = 3$,
   $$s_d^2 = \frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right] = \frac{1}{10-1}\left[106 - \frac{(30)^2}{10}\right] = 1.7778,$$
   $s_d = 1.33$, and hence
   $$t = \frac{3 - 0}{1.33/\sqrt{10}} = \frac{3}{1.33}\sqrt{10} = 7.133$$
6. Conclusion: Since the calculated value of t = 7.133 falls in the critical region, we reject our null hypothesis $H_0: \mu_1 = \mu_2$ at 1 % level of significance.
13.16 TEST OF POPULATION PROPORTION (LARGE SAMPLE)
Let us consider a binomial population with a proportion p which is unknown and we have to test a hypothesis about the unknown population parameter. A random sample of size n (n > 30) is selected from the population and the sample proportion $\hat{p}$ is calculated. When the sample size is large, the distribution of $\hat{p}$ is approximately normal with mean p and standard error $\sqrt{\dfrac{pq}{n}}$. The random variable Z can be calculated from $\hat{p}$. Thus

$$Z = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}}$$

The random variable Z is used as test statistic and the value of Z makes a base for the acceptance or rejection of the null hypothesis about the population proportion. The procedure for testing p runs as below:
1. We frame a hypothesis about the population proportion p. Let us specify a value $p_0$ for the population parameter p. The null hypothesis $H_0$ and the alternative hypothesis $H_1$ can take any one of the following three forms:
   (a) $H_0: p = p_0$ and $H_1: p \ne p_0$
   (b) $H_0: p \le p_0$ and $H_1: p > p_0$
   (c) $H_0: p \ge p_0$ and $H_1: p < p_0$
2. Level of significance is decided. It is denoted by $\alpha$.
3. Test-statistic: The test-statistic used in this case is $Z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}$, where $q_0 = 1 - p_0$.
   The sample proportion $\hat{p}$ can also be written as $\hat{p} = \dfrac{X}{n}$, where 'X' is the number of successes in the sample of size n. Putting $\hat{p} = \dfrac{X}{n}$ in the above formula for Z, we get
   $$Z = \frac{\dfrac{X}{n} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} = \frac{X - n p_0}{\sqrt{n p_0 q_0}}$$
   Thus $Z = \dfrac{X - n p_0}{\sqrt{n p_0 q_0}}$ can also be used as test-statistic for testing the population proportion.
4. Critical region:
   The critical region depends upon the alternative hypothesis $H_1$. The three forms of $H_1$ are:
   (a) $H_1$ is $p \ne p_0$. In this case the rejection region is taken in both ends of the sampling distribution. The rejection region on each side is equal to $\alpha/2$. The two critical values $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ separate the critical region from the acceptance region as shown in Fig. 13.14. $H_0$ is rejected when the calculated value of Z lies in the rejection region, that is, when $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$. The values between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ form the acceptance region. The test is called two-sided.
       [Figure 13.14: rejection regions of area $\alpha/2$ in both tails, beyond $-Z_{\alpha/2}$ and $Z_{\alpha/2}$, centred at Z = 0]
   (b) $H_1$ is $p > p_0$. In this case the rejection region is taken only in the right side of the sampling distribution. The test is called one-sided to the right. The critical value between the acceptance region and the rejection region is $Z_\alpha$ as shown in Fig. 13.15. The values above $Z_\alpha$ form the critical region and the values less than $Z_\alpha$ form the acceptance region, whereas $Z_\alpha$ itself is the critical value and should not be used for acceptance or rejection of $H_0$.
       [Figure 13.15: acceptance region of area $(1 - \alpha)$ and rejection region in the right tail beyond $Z_\alpha$, with $p = p_0$ at Z = 0]
   (c) When $H_1$ is $p < p_0$, the entire rejection region falls in the left side of the sampling distribution. The test is called one-sided to the left. The critical value $-Z_\alpha$ is a point between the critical region and the acceptance region as shown in Fig. 13.16. The values less than $-Z_\alpha$ form the critical region. $H_0$ is rejected when the Z value calculated from the sample data falls in the rejection region, otherwise the null hypothesis $H_0$ is accepted with the usual meaning of the term 'acceptance'. The rejection region is $Z < -Z_\alpha$.
       [Figure 13.16: rejection region in the left tail below $-Z_\alpha$, with $p = p_0$ at Z = 0]
5. Computation
6. Conclusion
Example 13.14.
In a poll of 1000 voters selected at random from all the voters in a certain district, it is found that 518 voters are in favour of a particular candidate. Test the null hypothesis that the proportion of all the voters in the district who favour the candidate is equal to or less than 50 percent against the alternative that it is greater than 50 percent at $\alpha = 0.05$.
Solution:
1. Null hypothesis: $H_0: p \le 0.50$    Alternative hypothesis: $H_1: p > 0.50$
2. Level of significance: $\alpha = 0.05$
3. Test - statistic: $Z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}$
4. Critical region: $Z > 1.645$
   (From the area table of normal distribution, we have $Z_\alpha = Z_{0.05} = 1.645$)
5. Computations: Here $n = 1000$, $X = 518$, $\hat{p} = \dfrac{X}{n} = \dfrac{518}{1000} = 0.518$, $p_0 = 0.50$, $q_0 = 1 - p_0 = 0.50$, and hence
   $$Z = \frac{0.518 - 0.50}{\sqrt{\dfrac{(0.50)(0.50)}{1000}}} = \frac{0.018}{0.016} = 1.125$$
6. Conclusion: Since the calculated value of Z = 1.125 falls in the acceptance region, we accept our null hypothesis $H_0: p \le 0.50$ at 5 % level of significance.
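The proportion test of Example 13.14 can be sketched as below. The function name `proportion_z` is an illustrative assumption; it uses the count form $Z = (X - n p_0)/\sqrt{n p_0 q_0}$ given in Section 13.16, which is algebraically the same as the $\hat{p}$ form.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(successes, n, p0):
    """Z statistic for a single proportion, large-sample normal approximation."""
    q0 = 1 - p0
    return (successes - n * p0) / sqrt(n * p0 * q0)


# Example 13.14: 518 of 1000 voters, H1: p > 0.50, alpha = 0.05
z = proportion_z(successes=518, n=1000, p0=0.50)
z_crit = NormalDist().inv_cdf(1 - 0.05)
reject_h0 = z > z_crit
print(round(z, 3), round(z_crit, 3), reject_h0)
```

Computed without rounding the standard error, Z is about 1.138; the 1.125 in the worked solution comes from rounding the standard error 0.0158 up to 0.016. Both values lie in the acceptance region.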
Example 13.15.
At a certain college it is estimated that at most 25 % of the students ride bicycles to class. Does this seem to be a valid estimate, if in a random sample of 90 college students, 28 are found to ride bicycles to class? Use a 5 % level of significance.
Solution:
1. Null hypothesis: $H_0: p \le 0.25$    Alternative hypothesis: $H_1: p > 0.25$
2. Level of significance: $\alpha = 0.05$
3. Test - statistic: $Z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}$
4. Critical region: $Z > 1.645$
   (From the area table of normal distribution, we have $Z_\alpha = Z_{0.05} = 1.645$)
5. Computations: Here $n = 90$, $X = 28$, $\hat{p} = \dfrac{X}{n} = \dfrac{28}{90} = 0.31$, $p_0 = 0.25$, $q_0 = 1 - p_0 = 0.75$, and hence
   $$Z = \frac{0.31 - 0.25}{\sqrt{\dfrac{(0.25)(0.75)}{90}}} = \frac{0.06}{0.0456} = 1.32$$
6. Conclusion: Since the calculated value of Z = 1.32 falls in the acceptance region, we accept our null hypothesis $H_0: p \le 0.25$ at 5 % level of significance. On the basis of the evidence, we may conclude that at most 25 % of the students ride bicycles to class.

13.17 TEST OF DIFFERENCE BETWEEN TWO POPULATION PROPORTIONS, $p_1 - p_2$ (LARGE SAMPLES)
Suppose there are two binomial populations with proportions $p_1$ and $p_2$ which are unknown. Two independent large random samples of sizes $n_1$ and $n_2$ are selected from the populations and the sample proportions $\hat{p}_1$ and $\hat{p}_2$ are calculated. The difference $(\hat{p}_1 - \hat{p}_2)$ is a random variable and has approximately the normal distribution with mean $p_1 - p_2$ and standard error $\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$.
The procedure for testing the difference between $p_1$ and $p_2$ is given below:
1. Three forms of the hypotheses are as below:
   (a) $H_0: p_1 - p_2 = 0$ (or $p_1 = p_2$) and $H_1: p_1 - p_2 \ne 0$ (or $p_1 \ne p_2$)
   (b) $H_0: p_1 - p_2 \le 0$ (or $p_1 \le p_2$) and $H_1: p_1 - p_2 > 0$ (or $p_1 > p_2$)
   (c) $H_0: p_1 - p_2 \ge 0$ (or $p_1 \ge p_2$) and $H_1: p_1 - p_2 < 0$ (or $p_1 < p_2$)
2. Level of significance is decided and is denoted by $\alpha$.
3. Test-statistic:
   The random variable Z is used as test statistic where
   $$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}}$$
   but Z as defined above is only in theory. In actual practice when $H_0$ is $p_1 - p_2 = 0$ (or $p_1 = p_2$), the values of $p_1$, $q_1$, $p_2$ and $q_2$ are not known because these are all unknown parameters. When $H_0$ is $p_1 = p_2$, we assume that the common population proportion for both populations is $p_c$. This proportion $p_c$ is estimated by $\hat{p}_c$ by pooling the data from both samples. Thus
   $$\hat{p}_c = \frac{X_1 + X_2}{n_1 + n_2} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}, \qquad \hat{q}_c = 1 - \hat{p}_c$$
   Thus the test-statistic used in actual practice is
   $$Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\dfrac{\hat{p}_c \hat{q}_c}{n_1} + \dfrac{\hat{p}_c \hat{q}_c}{n_2}}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c \hat{q}_c \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
   When $H_0$ is $p_1 - p_2 = \Delta$ (say), then the test statistic used is
   $$Z = \frac{(\hat{p}_1 - \hat{p}_2) - \Delta}{\sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}}$$
4. Critical region:
   The critical region depends upon the alternative hypothesis $H_1$. For the three forms of $H_1$, the rejection regions are:
   (a) When $H_1$ is $p_1 - p_2 \ne 0$ or $p_1 \ne p_2$, the rejection region is taken in both ends of the sampling distribution. The critical values are $-Z_{\alpha/2}$ and $Z_{\alpha/2}$. The values greater than $Z_{\alpha/2}$ and less than $-Z_{\alpha/2}$ form the rejection region. The values which lie between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ form the acceptance region. $H_0$ is rejected if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$. When $H_0$ is $p_1 - p_2 = 0$, it does not make any difference whether we take $(\hat{p}_1 - \hat{p}_2)$ or $(\hat{p}_2 - \hat{p}_1)$ in the test-statistic.
   (b) When $H_1$ is $p_1 - p_2 > 0$ or $p_1 > p_2$, the entire rejection region is taken in the right side of the curve. It is called a one-tailed test to the right. The critical value is $Z_\alpha$ and if Z lies in the rejection region, the hypothesis $(p_1 - p_2) \le 0$ (or $p_1 \le p_2$) is rejected and $H_1: p_1 > p_2$ is accepted. It is important to note that if $H_1$ is $p_2 > p_1$, then the difference $(\hat{p}_2 - \hat{p}_1)$ is used in the test statistic. Thus
       $$Z = \frac{\hat{p}_2 - \hat{p}_1}{\sqrt{\hat{p}_c \hat{q}_c \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
       The rejection region is $Z > Z_\alpha$.
   (c) When $H_1$ is $(p_1 - p_2) < 0$ (or $p_1 < p_2$), the rejection region equal to $\alpha$ is taken in the extreme left side. The critical value is $-Z_\alpha$ and the hypothesis $H_0: (p_1 - p_2) \ge 0$ is rejected and $H_1: (p_1 - p_2) < 0$ is accepted. The critical region is $Z < -Z_\alpha$.
5. Computation
6. Conclusion
Example 13.16.
A cigarette-manufacturing firm distributes two brands of cigarettes. It is found that 56 of 200 smokers prefer brand 'A' and that 30 of 150 smokers prefer brand 'B'. Test the hypothesis at 0.05 level of significance that brand 'A' outsells brand 'B' by 10 % against the alternative hypothesis that the difference is less than 10 %.
Solution:
1. Null hypothesis: $H_0: p_1 - p_2 \ge 0.10$
   Alternative hypothesis: $H_1: p_1 - p_2 < 0.10$
2. Level of significance: $\alpha = 0.05$
3. Test - statistic: $Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - \Delta}{\sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}}$
4. Critical region: $Z < -1.645$
   (From the area table of normal distribution, we have $-Z_\alpha = -Z_{0.05} = -1.645$)
5. Computations: Here $n_1 = 200$, $X_1 = 56$ (number of smokers who prefer brand A), $n_2 = 150$, $X_2 = 30$ (number of smokers who prefer brand B),
   $\hat{p}_1 = \dfrac{56}{200} = 0.28$, $\hat{q}_1 = 1 - \hat{p}_1 = 0.72$, $\hat{p}_2 = \dfrac{30}{150} = 0.2$, $\hat{q}_2 = 1 - \hat{p}_2 = 0.8$, and hence
   $$Z = \frac{(0.28 - 0.2) - 0.10}{\sqrt{\dfrac{(0.28)(0.72)}{200} + \dfrac{(0.2)(0.8)}{150}}} = \frac{-0.02}{0.0455} = -0.44$$
6. Conclusion: Since the calculated value of Z = -0.44 falls in the acceptance region, we accept our null hypothesis $H_0: p_1 - p_2 \ge 0.10$ at 5 % level of significance and we may conclude that brand 'A' outsells brand 'B'.

Example 13.17.
A random sample of 150 high school students was asked whether they would turn to their fathers or their mothers for help with a homework assignment in Mathematics, and another random sample of 150 high school students was asked the same question with regard to a homework assignment in English. Use the results shown in the following table and the 0.01 level of significance to test whether or not there is a difference between the true proportions of high school students who turn to their fathers rather than their mothers for help in these two subjects.

            Mathematics    English
Mother          59            85
Father          91            65

Solution:
1. Null hypothesis: $H_0: p_1 = p_2$ or $p_1 - p_2 = 0$
   Alternative hypothesis: $H_1: p_1 \ne p_2$ or $p_1 - p_2 \ne 0$
2. Level of significance: $\alpha = 0.01$
3. Test - statistic: $Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}_c \hat{q}_c \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
4. Critical region: $|Z| > 2.575$ ($Z < -2.575$ and $Z > 2.575$)
   (From the area table of normal distribution, we have $Z_{\alpha/2} = Z_{0.005} = 2.575$)
5. Computations: Here $n_1 = 150$, $X_1 = 91$, $n_2 = 150$, $X_2 = 65$,
   $\hat{p}_1 = \dfrac{X_1}{n_1} = \dfrac{91}{150}$, $\hat{p}_2 = \dfrac{X_2}{n_2} = \dfrac{65}{150}$,
   $$\hat{p}_c = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} = \frac{91 + 65}{300} = \frac{156}{300} = 0.52, \qquad \hat{q}_c = 1 - \hat{p}_c = 1 - 0.52 = 0.48,$$
   and hence
   $$Z = \frac{\left(\dfrac{91}{150} - \dfrac{65}{150}\right) - 0}{\sqrt{(0.52)(0.48)\left(\dfrac{1}{150} + \dfrac{1}{150}\right)}} = 3.003$$
6. Conclusion: Since the calculated value of Z = 3.003 falls in the critical region, we reject our null hypothesis $H_0: p_1 = p_2$ at 1 % level of significance and we may conclude that there is a difference between the true proportions of high school students.
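The pooled two-proportion statistic can be sketched as follows, using the counts of Example 13.17 (91 of 150 and 65 of 150). The function name `pooled_proportion_z` is an illustrative assumption; it covers only the case where the null hypothesis is $p_1 = p_2$, so that the pooled estimate $\hat{p}_c$ applies.

```python
from math import sqrt
from statistics import NormalDist


def pooled_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 = p2 using the pooled proportion estimate."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    pc_hat = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion
    qc_hat = 1 - pc_hat
    standard_error = sqrt(pc_hat * qc_hat * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error


# Example 13.17: fathers chosen for Mathematics (91/150) vs English (65/150)
z = pooled_proportion_z(91, 150, 65, 150)
z_crit = NormalDist().inv_cdf(1 - 0.01 / 2)  # two-sided, alpha = 0.01
reject_h0 = abs(z) > z_crit
print(round(z, 2), round(z_crit, 3), reject_h0)
```

The unrounded statistic is close to 3.00, in line with the worked solution, and well beyond the two-sided critical value.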
13.18 CHOICE OF PROPER TEST - STATISTIC
In a given situation, we have to choose the proper test-statistic. For example, the population mean $\mu$ can be tested with the help of the Z-test and the t-test. The testing of hypotheses, along with other things, mainly depends upon the sample size. The sample size plays a major role in the testing of hypothesis. The following table can be used for guidance in choosing the proper test-statistic.

                       n - Large     n - Small
$\sigma$ - Known       Z - test      Z - test
$\sigma$ - Unknown     Z - test      t - test
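The guidance table can be expressed as a small lookup. This is a sketch under stated assumptions: the function name `choose_test` is hypothetical, and the cut-off of 30 between "small" and "large" samples is an assumed convention passed as a parameter, since the table itself does not state one.

```python
def choose_test(sigma_known, n, large_cutoff=30):
    """Return 'Z-test' or 't-test' following the guidance table.

    large_cutoff is an assumed boundary between small and large samples.
    """
    if sigma_known or n >= large_cutoff:
        return "Z-test"
    return "t-test"


print(choose_test(sigma_known=False, n=16))   # small sample, sigma unknown
print(choose_test(sigma_known=False, n=100))  # large sample, sigma unknown
print(choose_test(sigma_known=True, n=8))     # small sample, sigma known
```

For small samples the Z-test with known $\sigma$ and the t-test both rely on the population being normal, as stated in Sections 13.9 and 13.10.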
T DEFTNITIONS 1

t
Hypothesis
-i?*u^unt v
purpose of testing'
about a population parameter developed for the
/or "l
//Hypothesis is a statement which may or may not apper
ars be true after conclusion' t
.A

SYpothesis Testing
r

m
.rfi;9|;;.liu" of'ffithbsis testing is to check the validitv of a statement about
a

o
populaiion parameter. I

c
or

.
theoryto deter5nine whether tr

t
A procedure based on sample evidence and probability

o
hypothesis'testing' T
the hypothesis is ,"".onible statement or not is called
"

p
T
Statistical HYPothesis value of a population
A statistical hypothesis is a statement about the numerical
s
l't

parameter

og or T

is l
about a population'
A statistical hypothesis is a quantitative statement is

b
.-
.
X"l'jf'ilr"Jlffis is any hyporhesis which tested for possible rejection or A

3
acceptance Lnder the assumption that it is true'
T

4
or a

9
of a population parameter'
The null hypdthesis is a statement about the value

evidence't9
Alternati.rl ffvpothesis or Research Hypothesis which the researcher wants T
The altern"ti"Jrlv"p-"ii"Jrli rtraUy the hypothesis for

a
le

t
i
to gather suPPorting C

s
or

/
statement specifuing that the populatiori parameter
is some value other t'han the A

/
\rA "

*iri.t :
ci
@the nullhYPothesis

hypothesis s
9.
- Simple HYPothesis

tt p
called siinPle
A hypothesis ,pu.rfy a1 values of parameters of a distribution is ,,: A
d.
, or .e

h
T
uniquely specifies
A hypothesis is said io be a simple hypothgsis if the hypothesis A
the distribution from whichthesample is taken' ,str
\Gti*po"ite HYPothesis ,w
does not completely specify the
A hypothesis is said to be a composite hyirothesis if it
probability distribution. A
or
of a distribution is di
A hypothesis which does not specify aIL values of parameters -l
calledcompositehypothqsis. '
C
r
\l -- .q, i -\
Significance Level or Level of Significance
The probability of rejecting a true null hypothesis is called the significance level α.
or
The probability of making a type I error is called the significance level of the hypothesis test and is denoted by α (alpha).
[Chapter 13] Statistical Inference: Testing of Hypotheses
Tests of Significance
A significance test is a statistical test laying down the procedure for deciding whether to accept or reject a statistical hypothesis.
Test Statistic
A statistic used as a basis for deciding whether the null hypothesis should be rejected is called a test statistic.
or
The sample quantity on which the decision to support H₀ or H₁ is based is called the test statistic.
Rejection Region
The rejection region is the set of possible computed values of the test statistic for which the null hypothesis will be rejected.
or
The set of values for the test statistic that lead to rejection of the null hypothesis H₀.
Acceptance Region
The set of values for the test statistic that lead to acceptance of the null hypothesis is called the acceptance region.
or
The portion of the area under a curve that includes those values of a statistic that lead to acceptance of the null hypothesis.
One-Tailed Test
A statistical test in which the critical region is at one end of the sampling distribution is called a one-tailed test.
or
A one-tailed test of hypothesis is one in which the alternative hypothesis is directional, and includes either the symbol "<" or ">".
Two-Tailed Test
A two-tailed test of hypothesis is one in which the alternative hypothesis does not specify departure from H₀ in a particular direction; such an alternative is written with the symbol "≠".
or
A statistical test in which the critical region is located at both ends of the sampling distribution is known as a two-tailed test.
Critical Value
The value which separates the rejection and acceptance regions is called the critical value of the test statistic.
or
The dividing point between the region where the null hypothesis is rejected and the region where it is accepted is said to be the critical value.
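The critical values that separate the rejection and acceptance regions can be computed instead of being read from a table. The following is a minimal Python sketch under the assumption that the test statistic follows the standard normal distribution; the function names `critical_values` and `decision` are illustrative and do not come from the text.

```python
from statistics import NormalDist


def critical_values(alpha, tail="two"):
    """Critical value(s) of Z at significance level alpha.

    tail is 'two', 'right' or 'left' (names chosen for this sketch).
    """
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "two":
        c = z.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
        return (-c, c)
    if tail == "right":
        return (z.inv_cdf(1 - alpha),)
    if tail == "left":
        return (z.inv_cdf(alpha),)
    raise ValueError("tail must be 'two', 'right' or 'left'")


def decision(z_calc, alpha, tail="two"):
    """Return 'reject H0' if z_calc falls in the rejection region."""
    crit = critical_values(alpha, tail)
    if tail == "two":
        reject = z_calc <= crit[0] or z_calc >= crit[1]
    elif tail == "right":
        reject = z_calc >= crit[0]
    else:
        reject = z_calc <= crit[0]
    return "reject H0" if reject else "accept H0"


if __name__ == "__main__":
    print(critical_values(0.05))          # two-tailed, about -1.96 and +1.96
    print(critical_values(0.05, "right"))  # right-tailed, about 1.645
    print(decision(2.4, 0.05, "right"))
```

With α = 0.05 the two-tailed critical values are about ±1.96 and the one-tailed value is about 1.645, the figures normally taken from the normal table.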
Type I Error
If we reject a true null hypothesis, the error is called a type I error.
or
Type I error is the rejection of H₀ when it is true.
Type II Error
If we accept a false null hypothesis, the error is called a type II error.
or
Acceptance of H₀ when it is false is known as type II error.
Power of a Test
The power of a test is the probability of rejecting the null hypothesis when it is false.
or
The power of a test is the probability that the test will lead to a rejection of the null hypothesis H₀ when, in fact, the alternative hypothesis H₁ is true.
Power Curve
A graph of the probability of rejecting H₀ for all possible values of the population parameter not satisfying the null hypothesis is known as the power curve.
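To make the idea of a power curve concrete, the sketch below tabulates the power of a right-tailed Z-test of H₀: μ = μ₀ against H₁: μ > μ₀ when σ is known. It is only an illustration: the numbers (μ₀ = 100, σ = 15, n = 36, α = 0.05) and the function name `power_right_tailed` are assumptions made for this example, not values taken from the text.

```python
from math import sqrt
from statistics import NormalDist


def power_right_tailed(mu_true, mu0, sigma, n, alpha):
    """P(reject H0) for a right-tailed Z-test when the true mean is mu_true."""
    z = NormalDist()
    se = sigma / sqrt(n)              # standard error of the sample mean
    z_crit = z.inv_cdf(1 - alpha)     # critical value of Z
    # H0 is rejected when Z >= z_crit; under mu_true the statistic Z is
    # normal with mean (mu_true - mu0) / se and standard deviation 1.
    return 1 - z.cdf(z_crit - (mu_true - mu0) / se)


# Hypothetical setting: mu0 = 100, sigma = 15, n = 36, alpha = 0.05
curve = [(mu, round(power_right_tailed(mu, 100, 15, 36, 0.05), 3))
         for mu in (102, 104, 106, 108, 110)]

if __name__ == "__main__":
    for mu, p in curve:
        print(mu, p)  # power rises towards 1 as mu moves away from mu0
```

Plotting the pairs in `curve` gives the power curve; 1 minus each value is the probability β of a type II error at that value of μ. At μ = μ₀ itself the function returns α, the probability of a type I error.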

MULTIPLE-CHOICE QUESTIONS

1. A statement about a population developed for the purpose of testing is called:
(a) hypothesis (b) hypothesis testing
(c) level of significance (d) test-statistic
2. Any hypothesis which is tested for the purpose of rejection under the assumption that it is true is called:
(a) null hypothesis (b) alternative hypothesis
(c) statistical hypothesis (d) composite hypothesis
3. A statement about the value of a population parameter is called:
(a) null hypothesis (b) alternative hypothesis
(c) simple hypothesis (d) composite hypothesis
4. Any statement whose validity is tested on the basis of a sample is called:
(a) null hypothesis (b) alternative hypothesis
(c) statistical hypothesis (d) simple hypothesis
5. A quantitative statement about a population is called:
(a) research hypothesis (b) composite hypothesis
(c) simple hypothesis (d) statistical hypothesis
6. A statement that is accepted if the sample data provide sufficient evidence that the null hypothesis is false is called:
(a) simple hypothesis (b) composite hypothesis
(c) statistical hypothesis (d) alternative hypothesis
7. The alternative hypothesis is also called:
(a) null hypothesis (b) statistical hypothesis
(c) research hypothesis (d) simple hypothesis
8. A hypothesis that specifies all the values of parameter is called:
(a) simple hypothesis (b) composite hypothesis
(c) statistical hypothesis (d) none of the above
9. The hypothesis μ < 10 is a:
(a) simple hypothesis (b) composite hypothesis
(c) alternative hypothesis (d) difficult to tell
10. If a hypothesis specifies the population distribution, it is called:
(a) simple hypothesis (b) composite hypothesis
(c) alternative hypothesis (d) none of the above

11. The probability of rejecting the null hypothesis when it is true is called:
(a) level of confidence (b) level of significance
(c) power of the test (d) difficult to tell
12. The dividing point between the region where the null hypothesis is rejected and the region where it is not rejected is said to be:
(a) critical region (b) critical value
(c) acceptance region (d) significant region
13. If the critical region is located equally in both sides of the sampling distribution of test-statistic, the test is called:
(a) one tailed (b) two tailed
(c) right tailed (d) left tailed
14. The choice of one-tailed test and two-tailed test depends upon:
(a) null hypothesis (b) alternative hypothesis
(c) none of these (d) composite hypothesis
15. A rule or formula that provides a basis for testing a null hypothesis is called:
(a) test-statistic (b) population statistic
(c) both of these (d) none of the above
16. The test statistic is equal to:
(a) (Sample - Population) / [denominator illegible in source]
(b) (Sample statistic - Parameter) / (Standard error of the statistic)
(c) (Sample mean - Population mean) / (Population standard deviation)
(d) (Statistic - E(Statistic)) / (Variance of the statistic)
17. 1 - α is also called:
(a) confidence coefficient (b) power of the test
(c) size of the test (d) level of significance
18. If H₀ is true and we reject it, it is called:
(a) type-I error (b) type-II error
(c) standard error (d) sampling error
19. The probability associated with committing type-I error is:
(a) β (b) α
(c) 1 - β (d) 1 - α
20. 1 - α is the probability associated with:
(a) type-I error (b) type-II error
(c) level of confidence (d) level of significance
21. Level of significance is also called:
(a) power of the test (b) size of the test
(c) level of confidence (d) confidence coefficient
22. The probability of rejecting H₀ when it is false is called:
(a) power of the test (b) size of the test
(c) level of confidence (d) confidence coefficient
23. In testing hypothesis α + β is always equal to:
(a) zero (b) one
(c) two (d) difficult to tell
24. The significance level is the risk of:
(a) rejecting H₀ when H₀ is correct (b) accepting H₀ when H₀ is correct
(c) rejecting H₁ when H₁ is correct (d) rejecting H₀ when H₁ is correct
25. An example of a two-sided alternative hypothesis is:
(a) H₁: μ < 0 (b) H₁: μ > 0
(c) H₁: μ ≥ 0 (d) H₁: μ ≠ 0
26. If the magnitude of the calculated value of t is less than the tabulated value of t and H₁ is two-sided, we should:
(a) reject H₀ (b) accept H₁
(c) not reject H₀ (d) difficult to tell
27. Accepting a null hypothesis H₀:
(a) proves that H₀ is true (b) proves that H₀ is false
(c) implies that H₀ is likely to be true (d) proves that μ < 0
28. The chance of rejecting a true hypothesis decreases when sample size is:
(a) decreased (b) increased
(c) constant (d) both (a) and (b)
29. The equality condition always appears in:
(a) null hypothesis (b) simple hypothesis
(c) alternative hypothesis (d) both (a) and (b)
30. Which hypothesis is always in an inequality form?
(a) null hypothesis (b) alternative hypothesis
(c) simple hypothesis (d) composite hypothesis
31. Which of the following is not a composite hypothesis?
(a) μ > μ₀
(c) μ = μ₀
32. P(Type I error) is equal to:
(a) 1 - α
(c) α
33. P(Type II error) is equal to:
(a) α
(c) 1 - α
34. The power of the test is equal to:
(a) α
(c) 1 - α
35. The degree of confidence is equal to:
(a) α
(c) 1 - α
36. α/2 is called:
(a) one tailed significance level (b) two tailed significance level
(c) left tailed significance level (d) right tailed significance level
37. In an unpaired samples t-test with sample sizes n₁ = 11 and n₂ = 11, the value of tabulated t should be obtained for:
(a) 10 degrees of freedom (b) 21 degrees of freedom
(c) 22 degrees of freedom (d) 20 degrees of freedom
38. In analyzing the results of an experiment involving seven paired samples, tabulated t should be obtained for:
(a) 13 degrees of freedom (b) 6 degrees of freedom
(c) 12 degrees of freedom (d) 14 degrees of freedom
39. The purpose of statistical inference is:
(a) to collect sample data and use them to formulate hypotheses about a population
(b) to draw conclusions about populations and then collect sample data to support the conclusions
(c) to draw conclusions about populations from sample data
(d) to draw conclusions about the known value of population parameter
40. Suppose that the null hypothesis is true and it is rejected; this is known as:
(a) a type-I error, and its probability is β
(b) a type-I error, and its probability is α
(c) a type-II error, and its probability is α
(d) a type-II error, and its probability is β
41. An advertising agency wants to test the hypothesis that the proportion of adults in Pakistan who read a Sunday Magazine is 25 percent. The null hypothesis is that the proportion reading the Sunday Magazine is:
(a) different from 25 % (b) equal to 25 %
(c) less than 25 % (d) more than 25 %
42. If the mean of a particular population is μ₀, Z = (X̄ - μ₀) / (σ/√n) is distributed:
(a) as a standard normal variable, if the population is non-normal
(b) as a standard normal variable, if the sample is large
(c) as a standard normal variable, if population is normal
(d) as the t-distribution with v = n - 1 degrees of freedom
43. If μ₁ and μ₂ are means of two populations, Z = [(X̄₁ - X̄₂) - (μ₁ - μ₂)] / √(σ₁²/n₁ + σ₂²/n₂) is distributed:
(a) as a standard normal variable, if both samples are independent and less than 30
(b) as a standard normal variable, if both populations are normal
(c) as both (a) and (b) state
(d) as the t-distribution with n₁ + n₂ - 2 degrees of freedom
44. If the population proportion equals p₀, then Z = (p̂ - p₀) / √(p₀q₀/n) is distributed:
(a) as a standard normal variable, if n > 30
(b) as a Poisson variable
(c) as the t-distribution with v = n - 1 degrees of freedom
(d) as a χ²-distribution with v degrees of freedom
45. Given H₀: μ = μ₀, H₁: μ ≠ μ₀, α = 0.05 and we reject H₀; the absolute value of the Z-statistic must have equalled or been beyond what value?
(a) 1.96 (b) 1.65
(c) 2.58 (d) 2.33
46. Given μ₀ = 130, X̄ = 150, σ = 25 and n = 4; what test statistic is appropriate?
(a) t (b) Z
(c) χ² (d) F
1. (a) 2. (a) 3. (a) 4. (c) 5. (d) 6. (d) 7. (c) 8. (a)
9. (b) 10. (a) 11. (b) 12. (b) 13. (b) 14. (b) 15. (a) 16. (b)
17. (a) 18. (a) 19. (b) 20. (c) 21. (b) 22. (a) 23. (d) 24. (a)
25. (d) 26. (c) 27. (c) 28. (b) 29. (d) 30. (b) 31. (c) 32. (c)
33. (b) 34. (d) 35. (c) 36. (b) 37. (d) 38. (b) 39. (c) 40. (b)
41. (b) 42. (?) 43. (b) 44. (a) 45. (a) 46. (b)
SHORT QUESTIONS
1. Given X̄ = 100, σ_X̄ = 16 and μ₀ = 90. Find Z.
Ans. 0.62
2. Given σ = 80, n = 625, μ₀ = 350 and X̄ = 356. Find Z.
Ans. 1.88
3. Given H₀: μ = 12, H₁: μ > 12, n = 64, X̄ = 15, σ = 10 and α = 0.05. Find Z and make the statistical decision.
4. Given H₀: μ = 150, n = 36, X̄ = 160, S = 60 and α = 0.05. Find Z and make the statistical decision.
Ans. Z = 1, accept H₀
5. Given X̄ = 120, μ₀ = 100, s = 34.75 and n = 25. Find t.
Ans. 2.88
6. Given H₀: μ = μ₀, H₁: μ ≠ μ₀, α = 0.05, t = -2.08 and n = 26. Make the statistical decision.
Ans. reject H₀ and assert H₁
7. Given H₀: μ = 10, H₁: μ ≠ 10, n = 16, X̄ = 10.5, s = 0.75 and α = 0.05. Find t and make the statistical decision.
Ans. t = 2.67, reject H₀
8. Given σ₁² = 150, σ₂² = 180, n₁ = 30 and n₂ = 30. Find σ_(X̄₁-X̄₂).
Ans. 3.32
9. Given H₀: μ₁ = μ₂, X̄₁ = 6.53, X̄₂ = 4.44 and σ_(X̄₁-X̄₂) = 0.78. Find Z.
Ans. 2.68
10. Given X̄₁ = 26, X̄₂ = 18, σ_(X̄₁-X̄₂) = 3.41, H₀: μ₁ ≤ μ₂ and α = 0.05. Find Z and make the statistical decision.
Ans. Z = 2.35, reject H₀

11. Given H₀: μ₁ = μ₂, H₁: μ₁ ≠ μ₂, n₁ = 100, X̄₁ = 14, σ₁² = 4, n₂ = 150, X̄₂ = 11, σ₂² = 9 and α = 1 %. Find Z and make the statistical decision.
Ans. Z = 9.5, reject H₀
12. Given H₀: μ₁ ≥ μ₂, H₁: μ₁ < μ₂, n₁ = 60, X̄₁ = 75.6, S₁ = 25, n₂ = 40, X̄₂ = 89.2, S₂ = 30 and α = 0.05. Find Z and make the statistical decision.
Ans. Z = -2.37, reject H₀
13. Given H₀: μ₁ = μ₂, H₁: μ₁ ≠ μ₂, X̄₁ = 15, X̄₂ = 23, s_p = 11.48, n₁ = 19, n₂ = 23 and α = 0.05. Find t and make the statistical decision.
Ans. t = -2.25, reject H₀
14. Given s₁² = 1.43, s₂² = 5.21, n₁ = 10 and n₂ = 10. Find s_p.
Ans. 1.82
15. Given X̄₁ = 84, X̄₂ = 77, n₁ = 31, n₂ = 41, H₀: μ₁ = μ₂ and s_(X̄₁-X̄₂) = 3.07. Find t.
Ans. 2.28
16. Given ΣX₁ = 671, ΣX₁² = 38275, n₁ = 12, ΣX₂ = 551, ΣX₂² = 31707 and n₂ = 10. Find s_(X̄₁-X̄₂).
Ans. 4.4
17. Given H₀: μ₂ - μ₁ = 10, H₁: μ₂ - μ₁ > 10, n₁ = 10, n₂ = 18, X̄₁ = 10, X̄₂ = 25, s_p = 31.68 and α = 0.05. Find t and make the statistical decision.
Ans. t = 0.40, accept H₀
18. Given H₀: μ₁ = μ₂, H₁: μ₁ ≠ μ₂, n = 10, d̄ = -0.5, s_d = 3.44 and α = 0.05. Find t and make the statistical decision.
Ans. t = -0.46, accept H₀
19. Given H₀: p = 0.5, H₁: p ≠ 0.5, p̂ = 0.54, n = 1340 and α = 0.02. Find Z and make the statistical decision.
Ans. Z = 2.93, reject H₀
20. Given H₀: p ≥ 0.85, H₁: p < 0.85, n = 400, p̂ = 0.81 and α = 0.01. Find Z and make the statistical decision.
Ans. Z = -2.23, accept H₀
21. Given H₀: p₁ = p₂, H₁: p₁ ≠ p₂, p̂₁ = 0.30, p̂₂ = 0.25, n₁ = 1200, n₂ = 900 and α = 0.05. Find Z and make the statistical decision.
Ans. Z = 2.53, reject H₀
22. Given H₀: p₁ - p₂ ≥ 0.10, H₁: p₁ - p₂ < 0.10, n₁ = 200, n₂ = 150, p̂₁ = 0.28, p̂₂ = 0.20 and α = 0.05. Find Z and make the statistical decision.
Ans. Z = -0.44, accept H₀
23. Describe the procedure for testing hypothesis about mean of a normal population when population standard deviation is known.
24. Explain the general procedure for testing of hypothesis regarding the population mean when population standard deviation is unknown and the sample size is large.
25. Describe the procedure for testing hypothesis about mean of a normal population when population standard deviation is unknown and the sample size is small.
26. Describe the procedure for testing equality of means of two normal populations when population standard deviations are known and sample sizes are large or small.
27. Describe the procedure for testing equality of means of two normal populations when σ₁ = σ₂ but unknown for small samples.
28. Describe the procedure for testing hypothesis about two means with paired observations.
29. Explain the general procedure for testing of hypothesis regarding the population proportion p for a large sample.
30. Explain the general procedure for testing of hypothesis about the difference between two population proportions for large samples.
31. Distinguish between null hypothesis and alternative hypothesis.
32. Differentiate between type I error and type II error.
33. Differentiate between one-tailed test and two-tailed test.
34. What is meant by critical region?
35. Differentiate between simple hypothesis and composite hypothesis.
36. Differentiate between acceptance region and rejection region.
37. Define null hypothesis and describe the general procedure for its testing.
38. What is meant by test-statistic?
39. Explain the terms hypothesis and tests of hypothesis.
40. Explain the terms level of significance and tests of significance.
41. What is meant by a statistical hypothesis?
42. Explain the difference between one-sided and two-sided tests. When should each be used?
43. Explain with example the difference between acceptance region and rejection region.
44. What is meant by critical value?
45. Define the terms power of a test and power curve.
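The numerical short questions above reduce to a handful of test-statistic formulas. The following Python sketch collects them so that the stated answers can be checked; the function names are our own, and the formulas are the usual large-sample ones (the two-proportion version assumes the pooled estimate is used under H₀: p₁ = p₂).

```python
from math import sqrt


def z_one_mean(xbar, mu0, sigma, n):
    """Z (or t, if sigma is replaced by s) for a single mean."""
    return (xbar - mu0) / (sigma / sqrt(n))


def z_two_means(x1, x2, var1, var2, n1, n2, delta0=0.0):
    """Z for the difference of two means; var1, var2 are variances."""
    return ((x1 - x2) - delta0) / sqrt(var1 / n1 + var2 / n2)


def z_one_proportion(p_hat, p0, n):
    """Z for a single proportion, standard error computed under H0."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def z_two_proportions_pooled(p1, p2, n1, n2):
    """Z for H0: p1 = p2 using the pooled sample proportion."""
    pc = (n1 * p1 + n2 * p2) / (n1 + n2)
    return (p1 - p2) / sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))


if __name__ == "__main__":
    print(z_one_mean(356, 350, 80, 625))                   # Short Question 2
    print(z_two_means(75.6, 89.2, 25**2, 30**2, 60, 40))   # Short Question 12
    print(z_one_proportion(0.54, 0.5, 1340))               # Short Question 19
    print(z_two_proportions_pooled(0.30, 0.25, 1200, 900)) # Short Question 21
```

Each printed value agrees with the corresponding answer (1.88, -2.37, 2.93 and 2.53) to two decimal places.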
EXERCISES
1. A sample of 900 plants is found to have a mean of 34 cm. Can it be reasonably regarded as a random sample from a large population with mean 32 cm and standard deviation 23 cm? Use 5 % level of significance.
Ans. Z = 2.61, H₀: μ = 32, H₁: μ ≠ 32; reject H₀
2. Suppose that the variance of the IQs of the high school students in a certain city is 225. A random sample of 36 students has a mean IQ of 106. If the level of significance is chosen at 0.05, should we conclude that the IQs of the high school students in this city are higher than 100?
Ans. H₀: μ ≤ 100, H₁: μ > 100; Z = 2.4; reject H₀
3. Suppose that scores on an aptitude test used for determining admission to graduate study in statistics are known to be normally distributed with a mean of 500 and a population standard deviation of 100. If a random sample of 64 applicants from a college has a sample mean of 537, is there any evidence that their mean score is different from the mean expected of all applicants? Use α = 0.01.
Ans. H₀: μ = 500, H₁: μ ≠ 500, Z = 2.96; reject H₀
4. Let X ~ N(μ, 100) and X̄ be the mean of a random sample of 64 observations of X, given that X̄ = 15. Test H₀: μ = 12 against the alternative H₁: μ > 12. Use α = 0.05.
Ans. Z = 2.4; reject H₀
5. A random sample of 64 drinks from a soft-drink machine has an average content of 21.9 deciliters, with a standard deviation of 1.42 deciliters. Test the hypothesis that μ = 22.2 deciliters against the alternative hypothesis μ < 22.2, at the 5 % level of significance.
Ans. Z = -1.69; reject H₀
6. A random sample of 200 trucks were driven on the average 16300 miles a year with a sample standard deviation of 3100 miles. Test the null hypothesis that the average truck mileage in the population is 17000 miles a year against the alternative hypothesis that the average is less. Use the 5 % level of significance.
Ans. H₀: μ = 17000, H₁: μ < 17000, Z = -3.19; reject H₀
7. A manufacturer of detergent claims that the mean weight of a particular box of detergent is 3.25 pounds. A random sample of 64 boxes revealed a sample average of 3.238 pounds with a standard deviation of 0.117 pounds. Using the 1 % level of significance, is there evidence that the average weight of the boxes is different from 3.25 pounds?
Ans. H₀: μ = 3.25, H₁: μ ≠ 3.25, Z = -0.82; accept H₀
8. Past experience indicates that the time for high school seniors to complete a standardized test is a normal random variable with a mean of 35 minutes. If a random sample of 20 high school seniors took an average of 33.1 minutes to complete this test with a standard deviation of 4.3 minutes, test the hypothesis at the 1 % level of significance that μ = 35 minutes against the alternative that μ < 35 minutes.
Ans. t = -1.976; accept H₀
9. A random sample of 10 from a population gave X̄ = 20 and the sum of squares of deviations from the mean is 144. Test H₀: μ = 19.5 against H₁: μ > 19.5 at α = 0.05.
Ans. t = 0.395; accept H₀
10. Given the following information, what is your conclusion in testing each of the indicated null and alternative hypotheses?
(i)
(ii)
(iii)
Ans. (i) t = - ...; accept H₀ (ii) t = ...; accept H₀ (iii) t = 1.5; accept H₀
11. Suppose you wish to estimate the difference between the daily wages for machinists and carpenters. Two independent samples of 50 people each are respectively taken, and the relevant data are shown as follows:

                       Machinists   Carpenters
Sample size                50           50
Sample mean              172.5        170.0
Population variance        98          102

Should we reject the null hypothesis that the daily wages for machinists and carpenters are the same in favor of the alternative that they are different at α = 0.05?
Ans. H₀: μ₁ = μ₂, H₁: μ₁ ≠ μ₂, Z = 1.25; accept H₀
12. A random sample of 100 workers in a large farm took an average of 14 minutes to complete a task. A random sample of 150 workers in another large farm took an average of 11 minutes to complete the task. Can it be assumed at 5 % level of significance that the average time taken by the workers in the two farms is same, if the standard deviations of all the workers of first farm and second farm are 2 minutes and 3 minutes respectively?
Ans. H₀: μ₁ = μ₂, H₁: μ₁ ≠ μ₂, Z = 9.49; reject H₀
13. A tire manufacturer wishes to test two types of tires. Fifty tires of type 'A' have a mean life of 24000 miles with S² = 6250000. Forty tires of type 'B' have a mean life of 26000 miles with S² = 9000000. Is there a significant difference between the two sample means? Use α = 0.05.
Ans. H₀: μ₁ = μ₂, H₁: μ₁ ≠ μ₂, Z = -3.38; reject H₀
14. A carpet manufacturer is studying differences between two of its major outlet stores. The company is particularly interested in the time it takes before customers receive carpeting that has been ordered from the plant. Data concerning a sample of delivery times for the most popular type of carpet are summarized as follows:

          A            B
X̄     34.3 days    43.7 days
S      2.4 days     3.1 days
n        41           31

At the 0.01 level of significance, is there evidence of a difference in the average delivery times for the two outlet stores?
Ans. H₀: μ₁ = μ₂, H₁: μ₁ ≠ μ₂, Z = -14; reject H₀
15. Two random samples taken independently from normal populations with an identical variance yield the following results:

             Sample I       Sample II
Size         n₁ = 10        n₂ = 18
Mean         X̄₁ = 10        X̄₂ = 25
Variance     s₁² = 1200     s₂² = 900

Test the hypothesis that the true difference between the population means is 10, that is, H₀: μ₂ - μ₁ = 10, against the alternative H₁: μ₂ - μ₁ > 10 at the 5 % level of significance.
Ans. t = 0.40; accept H₀
16. The means of two random samples of sizes 9 and 7 respectively are 196.42 and 198.82 respectively. The sums of the squares of the deviations from the mean are 26.94 and 18.73 respectively. Assume that the two samples are drawn from normal populations with identical variance. Test H₀: μ₁ = μ₂ against the alternative H₁: μ₁ < μ₂ at the 5 % level of significance.
Ans. t = -2.631; reject H₀

17. In an examination, a class of 18 students had a mean of 70 with s = 6. Another class of 21 had a mean of 77 with s = 8 in the same examination. Is there reason to believe that one class is significantly better than the other? Consider the students as samples from one population. Use a 5 % level of significance.
Ans. H₀: μ₁ = μ₂, H₁: μ₁ ≠ μ₂, t = -3.05; reject H₀
18. The weights of 4 persons before they stopped smoking and 5 weeks after they stopped smoking are as follows:

Person     1     2     3     4
Before    148   176   153   118
After     154   176   150   120

Use the t-test for paired observations to test the hypothesis at 0.05 level of significance that giving up smoking has no effect on a person's weight.
Ans. H₀: μ₁ = μ₂, H₁: μ₁ ≠ μ₂, t = -0.662; accept H₀
19. An experiment was performed with five hop plants. One half of each plant was pollinated and the other half was non-pollinated. The yield of the seed of each hop plant is tabulated as follows:

Pollinated        0.78   0.76   0.43   0.92   0.86
Non-pollinated    0.21   0.12   0.32   0.29   0.30

Determine at the 5 % level of significance whether the pollinated half of the plant gives a higher yield in seed than the non-pollinated half.
Ans. H₀: μ₁ ≤ μ₂, H₁: μ₁ > μ₂, t = 5.102; reject H₀
20. Let X designate the defective parts produced by an automatic machine. From a randomly selected sample of 50 parts, 10 are defective. Let p be the true proportion of all the parts that are defective; test the null hypothesis H₀: p = 0.1 against the alternative hypothesis H₁: p ≠ 0.1 at α = 0.01.
Ans. Z = 2.359; accept H₀
21. A coin is tossed 20 times resulting in 5 heads. Is this sufficient evidence to reject the hypothesis at the 5 % level of significance that the coin is balanced in favour of the alternative that heads occur less than 50 % of the times?
Ans. H₀: p = 0.5, H₁: p < 0.5, Z = -2.236; reject H₀
22. A random sample of 200 workers was selected from a population and 140 workers were found to be skilled. The factory owner claimed that at least 80 % workers were skilled in his factory. Is it possible to reject the claim of the factory owner at 5 % level of significance?
Ans. H₀: p ≥ 0.80, H₁: p < 0.80, Z = -3.534; reject H₀
23. An electric company claimed that at least 85 % of the parts which it supplied conformed to specifications. A sample of 400 parts was tested and 75 did not meet specifications. Can we accept the company's claim at 1 % level of significance?
Ans. H₀: p ≥ 0.85, H₁: p < 0.85, Z = -2.095; accept H₀
24. An expert is interested in the proportion of males and females in a population that have a certain minor blood disorder. In a random sample of 100 males, 31 are found to be afflicted, whereas only 24 of 100 females tested appear to have the disorder. Can we conclude at the 0.01 level of significance that the proportion of men in the population afflicted with this blood disorder is significantly greater than the proportion of women afflicted?
Ans. H₀: p₁ ≤ p₂, H₁: p₁ > p₂, Z = 1.109; accept H₀
25. Studies comparing the mathematical abilities of male and female students have produced conflicting conclusions. The distribution of grades earned in introductory statistics by a random sample of students at one institution is given below. Use these data to test the hypothesis that there is no difference in the population proportion of males and females that receive grade A. Let α = 0.05.

Grade      A    B    C    D    E    Total
Males     15   18   16    8   11     68
Females   20   16   19   12   15     82

Ans. H₀: p₁ = p₂, H₁: p₁ ≠ p₂, Z = -0.331; accept H₀
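The small-sample exercises rest on two t statistics: the paired-observations t and the pooled two-sample t. The sketch below is a hedged illustration with function names of our own. It reads the Exercise 18 table as Before = 148, 176, 153, 118 and After = 154, 176, 150, 120, a reading that reproduces the stated answer, and it treats the variances in Exercise 15 as unbiased sample variances.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """t statistic and degrees of freedom for paired observations."""
    d = [a - b for a, b in zip(first, second)]  # differences within pairs
    n = len(d)
    # stdev() divides by n - 1, i.e. it is the sample standard deviation
    return mean(d) / (stdev(d) / sqrt(n)), n - 1


def pooled_t(x1, x2, s1_sq, s2_sq, n1, n2, delta0=0.0):
    """t statistic and degrees of freedom assuming equal population variances."""
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df)  # pooled std. dev.
    return ((x1 - x2) - delta0) / (sp * sqrt(1 / n1 + 1 / n2)), df


if __name__ == "__main__":
    # Exercise 18: weights before and after giving up smoking
    print(paired_t([148, 176, 153, 118], [154, 176, 150, 120]))
    # Exercise 15: H0: mu2 - mu1 = 10, so sample II is passed first
    print(pooled_t(25, 10, 900, 1200, 18, 10, delta0=10))
```

The first call gives t of about -0.662 with 3 degrees of freedom and the second gives t of about 0.40 with 26 degrees of freedom, matching the answers to Exercises 18 and 15.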

Chapter 14
REGRESSION AND CORRELATION

14.1 INTRODUCTION
There are some statistical tools with the help of which we study a single variable. The averages, the measures of dispersion, the moments etc. are calculated for the frequency distribution of a single variable. There are certain tools with the help of which two or more than two variables or attributes are studied. What do we study when there are two or more than two variables or attributes? In the chapter on association, we shall discuss the mutual relationship between qualitative variables. The qualitative variables are also called attributes. The attributes are studied by a statistical tool χ² (read as chi-square). In the present chapter and in the next chapter we shall discuss the tools which are used for the study of two variables. Cases of more than two variables are beyond the level of this book, and therefore will not be covered in this book. There are two different techniques which are used for the study of two or more than two variables. These are regression and correlation. Both study the behaviour of the variables but they differ in their end results. Regression studies the relationship where dependence is necessarily involved. One variable has the dependence on a certain number of variables. Regression can be used for predicting the values of the variable which depends upon other variables. Correlation attempts to study the strength of the mutual relationship between two variables. In correlation we assume that the variables are random and dependence of any nature is not involved.

14.2 MATHEMATICAL MODEL OR EQUATION
Regression involves the study of equations. First we talk about some simple equations or models. The simplest mathematical model or equation is the equation of straight line.
Example 14.1.
Suppose a shopkeeper is selling pencils. He sells one pencil for Rs. 2. Table 14.1 gives the number of pencils sold and the sale price of the pencils.
Table 14.1.
Number of pencils sold    0   1   2   3   4   5
Sale price (Rs.)          0   2   4   6   8   10


Let us examine the two variables given in Table 14.1. For the sake of our convenience, we can give some names to the variables given in the table. Let X denote the number of pencils sold and S (S for sale) denote the amount realised by selling X pencils. Thus,

X    0   1   2   3   4   5
S    0   2   4   6   8   10

The information written above can be presented in some other forms as well. For example we can write an equation describing the above relation between X and S. It is very simple to write the equation. The algebraic equation connecting X and S is, S = 2X.
It is called mathematical equation or mathematical model in which S depends upon X. Here X is called independent variable and S is called dependent variable. There is exact relation between X and S. When 2 pencils are sold, the sale price is Rs. 4. Neither less than 4 nor more than 4. The above model is called deterministic mathematical model because we can determine the value of S without any error by putting the value of X in the equation. The sale S is said to be function of X. This statement in symbolic form is written as: S = f(X)
It is read as 'S is function of X'. It means that S depends upon X and only X and no other element. The data in Table 14.1 can be presented in the form of a graph as shown in figure 14.1.
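The statement S = f(X) can be read quite literally as a function that returns the sale price for any number of pencils. A minimal Python sketch follows; the names `sale_price` and `table_14_1` are our own.

```python
def sale_price(pencils_sold, price_per_pencil=2):
    """Deterministic model S = 2X: sale amount in rupees for X pencils."""
    return price_per_pencil * pencils_sold


# Rebuild Table 14.1 from the model: X = 0, 1, ..., 5
table_14_1 = {x: sale_price(x) for x in range(6)}

if __name__ == "__main__":
    for x, s in table_14_1.items():
        print(x, s)
```

Because the model is deterministic, the same X always yields the same S, with no error term.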

Figure 14.1

The main features of the graph in figure 14.1 are:
(i) The graph lies in the first quadrant because all the values of X and S are positive.
(ii) It is an exact straight line. But all graphs are not in the form of a straight line. It could be some curve also.
(iii) All the points (pairs of X and S) lie on the straight line.
(iv) The line passes through the origin.
(v) Take any point P on the line and draw a perpendicular line PQ which joins P with the X-axis. Let us find the ratio PQ/OQ. Here PQ = 6 units and OQ = 3 units. Thus PQ/OQ = 6/3 = 2 units.
It is called the slope of the line and in general it is denoted by 'b'. The slope of the line is the same at all points on the line. The slope 'b' is equal to the change in the dependent variable (here S) for a unit change in X. The relation S = 2X is also called linear equation between X and S.
Example 14.2.
Suppose a carpenter wants to make some wooden toys for the small children. He has purchased some wood and some other material for Rs. 20. The cost of making each toy is Rs. 5. Table 14.2 gives the information about the number of toys made and the cost of the toys.
Table 14.2.
Number of Toys    0    1    2    3    4    5
Cost of Toys     20   25   30   35   40   45

Let X denote the number of toys and Y denote the cost of the toys. What is the algebraic relation between X and Y? When X = 0, Y = 20. This is called fixed or starting cost and it may be denoted by 'a'. For each additional toy, the cost is Rs. 5. Thus Y and X are connected through the following equation:
Y = 20 + 5X
It is called equation of straight line. It is also mathematical model of deterministic nature. Let us make the graph of the data in Table 14.2. Figure 14.2 is the graph of the data in Table 14.2.

Figure 14.2: the line Y = 20 + 5X

Let us note some important features of the graph obtained in Figure 14.2.
(i) The line AB does not pass through the origin. It passes through the point 'A' on the Y-axis. The distance between A and the origin 'O' is called the 'intercept' and is usually denoted by 'a'.
(ii) Take any point P on the line and complete a triangle PQA as shown in the figure. Let us find the ratio between the perpendicular PQ and the base AQ of this triangle. The ratio is
PQ/AQ = 5 units.
This ratio is denoted by 'b' in the equation of the straight line. Thus the equation of the straight line Y = 20 + 5X has the intercept a = 20 and slope b = 5. In general, when the values of the intercept and slope are not known, we write the equation of the straight line as Y = a + bX. It is also called the linear equation between X and Y, and the relation between X and Y is called linear. The equation Y = a + bX may also be called the exact linear model between X and Y or simply the linear model between X and Y. The value of Y can be determined completely when X is given. The relation Y = a + bX is, therefore, called the deterministic linear model between X and Y. In statistics, when we use the term 'linear model', we shall not mean a mathematical model as described above.
Another property of the exact linear model is that, for equally spaced values of X, the first differences of the Y-variable are constant. The first differences of the Y-variable in Table 14.2 are calculated as below:

X     Y     First differences ΔY
0     20
            25 - 20 = 5
1     25
            30 - 25 = 5
2     30
            35 - 30 = 5
3     35
            40 - 35 = 5
4     40
            45 - 40 = 5
5     45

It means that when all the points of the pairs (Xi, Yi) lie on a straight line, the first differences ΔY are exactly constant. We shall take help from this property later on. In certain observed data, when the first differences are constant or almost constant, we shall consider the observed data to be close to a straight line and we would like to find the equation of that line.
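The first-difference check described above can be sketched in a few lines of Python. This is only an illustration; the helper name first_differences is an assumption made for this sketch and is not part of the text.

```python
def first_differences(values):
    """Return the successive differences values[i+1] - values[i]."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

# Cost of toys from Table 14.2 for X = 0, 1, ..., 5 (equally spaced X).
cost = [20 + 5 * x for x in range(6)]
diffs = first_differences(cost)

print(cost)   # [20, 25, 30, 35, 40, 45]
print(diffs)  # [5, 5, 5, 5, 5]
```

Every first difference equals the slope b = 5, which is what an exact straight line produces when X increases in unit steps.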
14.3 NON-LINEAR MODEL
Let us consider an equation Y = 10 + 5X²
By putting the values X = 0, 1, 2, 3, 4 in this equation, we find the values of Y as given in Table 14.3 below. The first and second differences are also calculated in Table 14.3.

Table 14.3.
X     Y     First differences ΔY     Second differences Δ²Y
0     10
            15 - 10 = 5
1     15                             15 - 5 = 10
            30 - 15 = 15
2     30                             25 - 15 = 10
            55 - 30 = 25
3     55                             35 - 25 = 10
            90 - 55 = 35
4     90

The second differences are exactly constant. The general quadratic equation or model is written as
Y = a + bX + cX²   (c ≠ 0)
It is also called the second degree parabola or second degree curve. The graph of the data is shown below in Figure 14.3.

Figure 14.3 (graph of the curve Y = 10 + 5X²)

Figure 14.3 is not a straight line. It is a curve, or we say that the model Y = 10 + 5X² is non-linear. The students are advised to remember that if, in certain observed data, the second differences are constant or almost constant, we find the second degree curve close to the observed data. We shall use this criterion in time series.
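The second-difference criterion can be checked numerically. The sketch below recomputes Table 14.3; the function name differences and its order argument are assumptions introduced only for this illustration.

```python
def differences(values, order=1):
    """Apply the first-difference operation `order` times."""
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values

# Y = 10 + 5X^2 for X = 0, 1, 2, 3, 4, as in Table 14.3.
y = [10 + 5 * x**2 for x in range(5)]
first = differences(y, 1)
second = differences(y, 2)

print(y)       # [10, 15, 30, 55, 90]
print(first)   # [5, 15, 25, 35]
print(second)  # [10, 10, 10]
```

The first differences change from row to row, so the data are not linear, while the second differences are constant, which points to a second degree curve.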
14.4 STATISTICAL MODEL
A statistical model is also a mathematical model, but the difference is that a statistical model always contains an error term or random term in the mathematical equation. What is an error term? Let us take an example from practical life to explain this term.
Suppose there are 10 agricultural plots of land with the same fertility. It is assumed that the plots are similar in all respects and that the conditions remain as constant as possible from plot to plot. We decide to put 5 kg of fertilizer in each plot. The yields of rice from each plot are recorded. Let X denote the amount of fertilizer and Y denote the yield of rice. For a single fixed value of X there are 10 corresponding figures of yields of rice. If 5 kg of fertilizer is applied in a very large number of plots, then the yields will follow a normal distribution with some mean denoted by $\mu_{Y|5}$. The mean $\mu_{Y|5}$ is the mean of the Y values when X is fixed at 5 kg. This mean is also denoted by E(Y). Some yields are above E(Y) and some are below E(Y). The difference between the yield Y and E(Y) is called the error term or the random term. It is also called the residual and may be denoted by $\varepsilon_i$. The yields of rice are a random variable with a certain probability distribution. The random errors are calculated from this random variable. A random variable calculated from another random variable is also a random variable. Thus the errors $\varepsilon_i$ are random variables, and they are taken to be normally distributed with mean zero. Thus $E(\varepsilon_i) = 0$. Table 14.4 shows certain yields of rice $Y_i$ for a given value of X and the mean E(Y), and the errors $\varepsilon_i$ are calculated as below:
Table 14.4.
Amount of fertilizer, X    Yield of rice (kg), Yi    Average E(Y)    Error εi = Yi - E(Y)
                           40                                        40 - 55 = -15
                           40                                        40 - 55 = -15
                           50                                        50 - 55 = -5
                           50                                        50 - 55 = -5
5 kg                       50                        55 kg           50 - 55 = -5
                           60                                        60 - 55 = 5
                           60                                        60 - 55 = 5
                           60                                        60 - 55 = 5
                           70                                        70 - 55 = 15
                           70                                        70 - 55 = 15
                                                                     Σεi = 0

In Table 14.4 we have taken only 10 values of Yi. In actual practice the number of Y values corresponding to a fixed value of X is very large. In this table we see that there is no equation which links the X value with the given Y values. There is in fact no relation of mathematical nature between X and the individual values of Y. The individual values of Y cannot be determined by any mathematical equation. If we change the amount of fertilizer, we shall obtain another set of Y values (distribution of Y values) for the different yields of rice. Thus for each value of X, there is a normal distribution of Y values. This fact is illustrated in Figure 14.4.

Figure 14.4
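The error calculation of Table 14.4 can be reproduced with a short Python sketch. The variable names are assumptions chosen for this illustration; the yields are the ten values listed in the table.

```python
# Yields of rice (kg) from the ten plots of Table 14.4, all with X = 5 kg.
yields = [40, 40, 50, 50, 50, 60, 60, 60, 70, 70]

# The average of the observed yields plays the role of E(Y) in the table.
mean_yield = sum(yields) / len(yields)

# Each error is the observed yield minus the mean.
errors = [y - mean_yield for y in yields]
total_error = sum(errors)

print(mean_yield)   # 55.0
print(errors)
print(total_error)  # 0.0
```

The deviations from the mean always add up to zero, which is why the last row of the table shows a total of 0.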

At each value of X, there is a normal distribution with mean E(Y). The Y-values in the same distribution differ from their mean E(Y) and the difference is called the error term. If population data on two variables X and Y are under consideration, then a linear statistical model or equation can be written as:
$Y_i = \alpha + \beta X_i + \varepsilon_i$
where $\alpha$ is the intercept, $\beta$ is the slope of the line and $\varepsilon_i$ (epsilon) is the error term, which may take positive or negative values. The line AB in Figure 14.4, which passes through the E(Y)'s, is called the regression line. The observed value $Y_i$ can also be written as
$Y_i = E(Y_i) + \varepsilon_i$
This equation contains a random term $\varepsilon_i$ on the right side. Thus the variable $Y_i$ is random because it depends on $\varepsilon_i$.
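To see how the error term makes Y random, the model can be simulated. The sketch below is an illustration only: the values alpha = 30, beta = 5 and sigma = 10 are assumptions (chosen so that E(Y) = 55 at X = 5, as in the rice example), and the function name simulate_yields is hypothetical.

```python
import random

def simulate_yields(alpha, beta, x_values, sigma, seed=0):
    """Generate Y_i = alpha + beta * X_i + e_i with normal errors e_i."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [alpha + beta * x + rng.gauss(0, sigma) for x in x_values]

# A very large number of plots, all receiving X = 5 kg of fertilizer.
x_values = [5] * 10000
y = simulate_yields(30.0, 5.0, x_values, 10.0)

# Errors are the deviations of Y from the deterministic part alpha + beta * X.
errors = [yi - (30.0 + 5.0 * xi) for yi, xi in zip(y, x_values)]
mean_error = sum(errors) / len(errors)

print(round(sum(y) / len(y), 2))  # close to E(Y) = 55
print(round(mean_error, 2))       # close to 0
```

Although X is the same for every plot, the simulated yields differ from plot to plot, and their errors average out to approximately zero.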
14.4.1 INDEPENDENT AND DEPENDENT VARIABLES
The variable whose value is decided by the experimenter is called the fixed variable or independent variable. It is also called the regressor or predictor. The variable which is influenced by the independent variable is called the dependent variable. It is also called the regressand or predictand. This variable is of random nature and cannot be determined exactly for a given value of X. It is also called a random variable.
14.4.2 CAUSE AND EFFECT RELATION
In a relation in which one variable is independent and the other is dependent, some people use the terms 'cause' and 'effect'. In the previous example of production of rice for a given dosage of fertilizer, the amount of fertilizer is the 'cause' and 'production of rice' is the 'effect'. Thus in this regression relation, we can say that there is a 'cause' and 'effect' relation between the variables. Some special food may be tested on poultry birds. The amount of food is the 'cause' and the weight of the birds is the 'effect'. The 'effect' variable is also called the response variable. But there may be a regression relation between two variables X and Y in which there is no cause and effect (causal) relationship between them. In some cases a change in X does cause a change in Y, but this does not always happen. Sometimes the change in Y is not caused by a change in X. The dependence of Y on X should not be interpreted as a cause and effect relation between X and Y. In regression analysis the word dependence means that there is a distribution of Y values for a given single value of X. For a given height of 60 inches for men, there may be a very large number of people with different weights. The distribution of these weights depends upon the fixed value of X. It is in this sense that the word dependence is used. Thus dependence does not mean response (effect) due to some cause. Some examples are discussed here to elaborate the idea.
(i) The sun rises and the shining sun increases the temperature. Let temperature be denoted by X. With an increase in X, the ice on the mountains melts and the average thickness of ice, Y, decreases. It is possible that the thickness of ice decreases due to the increase in temperature. But it is also possible that the thickness of ice is decreasing due to the weight and hardening of the ice. We may be regressing the thickness Y against the temperature X only, whereas another important factor is being ignored. In this type of problem, more than one independent variable is used simultaneously to estimate the unknown parameters.
(ii) We may think that an increase in the number of workers (X) is increasing the production of fans (Y) in the factory. The increase in Y may be due to a change in the administration and some changes in the leave rules and other benefits.
In a regression relation there may or may not be a causal relation between X and Y. The cause and effect relation between two variables is also called causation. It is important to note that the statistical method of regression analysis is silent about the cause and effect relation between the variables. Sometimes it is not possible to identify which variable is the 'cause' and which one is the 'effect'. In fact, the answer is to be searched not in regression analysis but in some other area of relationship between the variables.
14.5 REGRESSION
Regression is concerned with the study of relationships among variables. The aim of regression (or regression analysis) is to make models for prediction and for making other inferences. Two variables or more than two variables may be treated in a regression study. The term regression was introduced by the English scientist, Sir Francis Galton, who analyzed the heights of sons and the average heights of their parents. Galton concluded that the sons of very tall (or short) parents were generally taller (or shorter) than the average but not as tall (or short) as their parents. His work was published in 1885 under the title "Regression Towards Mediocrity in Hereditary Stature". According to his conclusion, "regression towards mediocrity" means that the sons' heights tended towards the average rather than taking the extreme values. But now the word regression is used in a much broader sense. It is the statistical study of the relationship among variables.

14.5.1 SIMPLE LINEAR REGRESSION
Suppose we want to study the dependence of the Y variable on a single independent variable X. The variable Y depends on X and is also subject to unaccountable errors. This study is covered by simple linear regression. For population data the simple linear regression model is written as $Y_i = \alpha + \beta X_i + \varepsilon_i$, where $\alpha$ is the intercept, $\beta$ is the slope and $\varepsilon_i$ (epsilon) is the error term. On a sample basis the simple linear regression model is written as $Y_i = a + bX_i + e_i$, where 'a' is the intercept in the sample and is the estimate of the population parameter $\alpha$. The parameter $\beta$ is estimated by the sample value 'b', and $e_i$ is the error term in the equation.
14.5.2 PURPOSE OF REGRESSION ANALYSIS
There is no statistical problem when the parameters of the regression model are known. A statistical problem arises when some of the parameters are not known. The study of regression aims at the following:
(i) The regression models contain unknown parameters. These parameters are estimated in regression analysis.
(ii) The value of the dependent variable can be predicted when the value of the independent variable is fixed.
(iii) Certain hypotheses about the parameters $\alpha$ and $\beta$ are tested. Confidence intervals for $\alpha$ and $\beta$ are constructed.
14.5.3 SCATTER DIAGRAM
A scatter diagram is a graphic picture of the sample data. Suppose a random sample of n pairs of observations has the values (X1, Y1), (X2, Y2), (X3, Y3), ..., (Xn, Yn). These points are plotted on a rectangular co-ordinate system, taking the independent variable on the X-axis and the dependent variable on the Y-axis. Whatever be the name of the independent variable, it is to be taken on the X-axis. Suppose the plotted points are as shown in Figure 14.5 (a). Such a diagram is called a scatter diagram. In this figure, we see that when X has a small value, Y is also small, and when X takes a large value, Y also takes a large value. This is called a direct or positive relationship between X and Y. The plotted points cluster around a straight line. It appears that if a straight line is drawn passing through the points, the line will be a good approximation for representing the original data. Suppose we draw a line AB to represent the scattered points. The line AB rises from left to right and has a positive slope. This line can be used to establish an approximate relation between the random variable Y and the independent variable X. It is a non-mathematical method in the sense that different persons may draw different lines. This line is called the regression line obtained by inspection or judgement.
Figure 14.5. Scatter diagrams: (a) Positive and Linear, (b) Negative and Linear, (c) Negative Non-Linear, (d) No Relationship

Making a scatter diagram and drawing a line or curve is the primary investigation to assess the type of relationship between the variables. The knowledge gained from the scatter diagram can be used for further analysis of the data. In most of the cases the diagrams are not as simple as in Figure 14.5 (a). There are quite complicated diagrams and it is difficult to choose a proper mathematical model for representing the original data. The scatter diagram gives an indication of the appropriate model which should be used for further analysis with the help of the method of least squares. Figure 14.5 (b) shows that the points in the scatter diagram are falling from the top left corner to the right. This is a relation called inverse or indirect. The points are in the neighbourhood of a certain line called the regression line.
As long as the scattered points show a closeness to a straight line of some direction, we draw a straight line to represent the sample data. But when the points do not lie around a straight line, we do not draw the regression line. Figure 14.5 (c) shows that the plotted points have a tendency to fall from left to right in the form of a curve. This is a relation called non-linear or curvilinear. This type of relation will not be discussed in this book.
Figure 14.5 (d) shows points which apparently do not follow any pattern. If X takes a small value, Y may take a small or large value. There seems to be no sympathy between X and Y. Such a diagram suggests that there is no relationship between the two variables. But there is one point to be remembered: figures like 14.5 (d) sometimes have a relationship of circular nature.

14.6 FITTING A LINEAR REGRESSION LINE - THE METHOD OF LEAST SQUARES
The linear regression line is $Y = \alpha + \beta X$, which contains the parameters $\alpha$ and $\beta$. If we know the values of $\alpha$ and $\beta$, then this line is determined. But the values of $\alpha$ and $\beta$ are usually unknown and we have to find the regression line by estimating $\alpha$ and $\beta$ from the sample data. Suppose we have a random sample of n pairs of observations (X1, Y1), (X2, Y2), (X3, Y3), ..., (Xn, Yn) and we are required to find the regression line of Y on X. These observations are plotted in Figure 14.6. Let us draw a line AB passing through the plotted points. The values of the dependent variable which lie on the line are denoted by $\hat{Y}$. Thus the estimated regression line of sample data is $\hat{Y} = a + bX$, where 'a' and 'b' represent the estimates of the true values of $\alpha$ and $\beta$.

Figure 14.6 (plotted points (X1, Y1), (X2, Y2), ..., (Xi, Yi), ..., (Xn, Yn) and the line AB)

The difference between $Y_i$ and $\hat{Y}_i$ is called the error, which is denoted by $e_i$. The sum of squares of errors (SSE) can be written as
$SSE = (Y_1 - \hat{Y}_1)^2 + (Y_2 - \hat{Y}_2)^2 + \dots + (Y_i - \hat{Y}_i)^2 + \dots + (Y_n - \hat{Y}_n)^2$
$= e_1^2 + e_2^2 + \dots + e_i^2 + \dots + e_n^2 = \sum e_i^2$

We have to find that line which is best fitting for the sample data. This best fitting line is obtained by using the principle of least squares. The principle of least squares is that "the best fitting line is that one for which the sum of squares of errors is minimum". This means we have to minimize
$SSE = \sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2$, but $\hat{Y}_i = a + bX_i$ and so $SSE = \sum (Y_i - a - bX_i)^2$
Thus SSE is a function of 'a' and 'b'. Each line has some values of 'a' and 'b'. Those values of 'a' and 'b' are required for which SSE is minimum. The values of 'a' and 'b' are calculated from the following two equations called the normal equations:
$\sum Y = na + b\sum X$ and $\sum XY = a\sum X + b\sum X^2$
Solving these normal equations simultaneously, we get the values of 'a' and 'b' which minimize SSE. We can calculate the values of 'a' and 'b' by using the formulas as below:
$b = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \dfrac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}}$
If the numerator and denominator are multiplied by n, which is convenient for computational purposes, we get
$b = \dfrac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$ and $a = \dfrac{(\sum X^2)(\sum Y) - (\sum X)(\sum XY)}{n\sum X^2 - (\sum X)^2}$
The slope 'b' is also called the regression coefficient of Y on X and is denoted by $b_{yx}$. Putting the values of 'a' and 'b' in the regression equation $\hat{Y} = a + bX$, we can find the $\hat{Y}$ values which lie on the regression line. The $\hat{Y}$ values are called the estimated values. The linear regression equation $\hat{Y} = a + bX$ can be used to estimate the values of the dependent variable when the value (or values) of X is known. The calculated values of 'a' and 'b' are the estimates of the unknown parameters $\alpha$ and $\beta$ and are used for inference about $\alpha$ and $\beta$. The inference about $\alpha$ and $\beta$ will not be discussed in this book.
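The computational formulas for 'a' and 'b' translate directly into code. The following is a minimal sketch, not a definitive implementation; the function name least_squares_fit is an assumption, and the toy data of Table 14.2 are used only to check the arithmetic.

```python
def least_squares_fit(xs, ys):
    """Return (a, b) for the line Y-hat = a + bX using the summation formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all X values are equal, so the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom
    a = (sum_x2 * sum_y - sum_x * sum_xy) / denom
    return a, b

# Table 14.2 lies exactly on Y = 20 + 5X, so the fit should recover a = 20, b = 5.
a, b = least_squares_fit([0, 1, 2, 3, 4, 5], [20, 25, 30, 35, 40, 45])
print(a, b)  # 20.0 5.0
```

The guard on the denominator is a design choice of this sketch: when every X is the same, no unique least squares line exists.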
How to Write Normal Equations
The derivation of the normal equations is beyond the level of this book. However, the normal equations can be written directly as explained below:
We write the equation of the straight line, i.e., $Y = a + bX$ ..... (1)
We want to write the normal equation for 'a'. The coefficient of 'a' is 1, so the above equation is multiplied by 1 and then the summation $\sum$ is applied. Thus we get
$\sum Y = na + b\sum X$   ($a + a + \dots + a = \sum a = na$)
This is called the normal equation for 'a'. To find the normal equation for 'b', equation (1) is multiplied by X, which is the coefficient of b in equation (1), and then the summation $\sum$ is applied. We get
$\sum XY = a\sum X + b\sum X^2$
This is called the normal equation for 'b'.
Example 14.3.
Construct an equation of the line of regression (using normal equations) of yield of rice on water from the data given in the following table, which shows the amount of water applied in inches and the yield of rice in tons per acre in an experimental farm. Estimate the most probable yield of rice for 36 inches of water.

Water (X)            10     16     22     28     34     40     46
Yield of rice (Y)    2.25   2.85   2.95   3.15   3.40   3.80   4.00

Solution:
The regression line of yield of rice (Y) on water (X) is $\hat{Y} = a + bX$
The normal equations are: $\sum Y = na + b\sum X$ and $\sum XY = a\sum X + b\sum X^2$
The necessary calculations are given below:

X       Y       XY       X²
10      2.25    22.5     100
16      2.85    45.6     256
22      2.95    64.9     484
28      3.15    88.2     784
34      3.40    115.6    1156
40      3.80    152.0    1600
46      4.00    184.0    2116
ΣX = 196   ΣY = 22.4   ΣXY = 672.8   ΣX² = 6496

Substituting the values in the normal equations, we have
22.4 = 7a + 196b ..... (1)        672.8 = 196a + 6496b ..... (2)
To solve these two equations, we multiply equation (1) by 28 and subtract it from equation (2):
672.8 = 196a + 6496b
627.2 = 196a + 5488b
45.6 = 1008b or b = 45.6/1008 = 0.05 (0.0452 rounded to two decimal places)
Substituting b = 0.05 in equation (1), we get
22.4 = 7a + 196(0.05) or 22.4 = 7a + 9.8 or 7a = 22.4 - 9.8
or 7a = 12.6 or a = 12.6/7 = 1.8
Hence the regression line of Y on X is $\hat{Y} = 1.8 + 0.05X$
To estimate the most probable yield of rice, we put X = 36 in the above equation and get
$\hat{Y}$ = 1.8 + 0.05(36) = 1.8 + 1.8 = 3.6
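The worked solution rounds b to 0.05 before solving for a. As a cross-check, the normal equations of Example 14.3 can be solved without rounding. The sketch below uses Python's exact fractions; the variable names are assumptions made for this illustration.

```python
from fractions import Fraction

water = [10, 16, 22, 28, 34, 40, 46]
# Yields are entered as strings so that Fraction keeps them exact.
rice = [Fraction(v) for v in ["2.25", "2.85", "2.95", "3.15", "3.40", "3.80", "4.00"]]

n = len(water)
sum_x = sum(water)                                  # 196
sum_y = sum(rice)                                   # 22.4
sum_xy = sum(x * y for x, y in zip(water, rice))    # 672.8
sum_x2 = sum(x * x for x in water)                  # 6496

# Eliminate 'a' from the two normal equations, then back-substitute.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n
y_hat_36 = a + b * 36

print(float(b))         # about 0.0452
print(float(a))         # about 1.933
print(float(y_hat_36))  # about 3.56
```

Without rounding, the slope is 19/420 and the intercept is 29/15, so the estimated yield at 36 inches is about 3.56 tons per acre rather than 3.6. The difference comes only from rounding b to two decimal places in the hand calculation.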
Example 14.4.
Show that for the following data the sum of errors and the sum of squares of errors are zero.

X    1    2    3    4    5
Y    0    1    2    3    4

Solution:
The equation of the least squares line is $\hat{Y} = a + bX$
The normal equations are
$\sum Y = na + b\sum X$ and $\sum XY = a\sum X + b\sum X^2$
The necessary calculations are given below:

X     Y     XY     X²     Ŷ = X - 1     (Y - Ŷ)     (Y - Ŷ)²
1     0     0      1      0             0           0
2     1     2      4      1             0           0
3     2     6      9      2             0           0
4     3     12     16     3             0           0
5     4     20     25     4             0           0
ΣX = 15   ΣY = 10   ΣXY = 40   ΣX² = 55   ΣŶ = 10   Σ(Y - Ŷ) = 0   Σ(Y - Ŷ)² = 0

Substituting the values in the normal equations, we have
10 = 5a + 15b ..... (1)        40 = 15a + 55b ..... (2)
To solve these two equations, we multiply equation (1) by 3 and subtract it from equation (2):
40 = 15a + 55b
30 = 15a + 45b
10 = 10b or b = 10/10 = 1
Substituting b = 1 in equation (1), we get
10 = 5a + 15(1) or 5a = 10 - 15 = -5 or a = -5/5 = -1
Hence the fitted least squares line is $\hat{Y} = X - 1$
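The last three columns of the table in Example 14.4 can be verified with a few lines of Python. This is only a check of the arithmetic; the variable names are assumptions.

```python
xs = [1, 2, 3, 4, 5]
ys = [0, 1, 2, 3, 4]

# Fitted line from Example 14.4: Y-hat = -1 + 1 * X.
a, b = -1, 1
fitted = [a + b * x for x in xs]

# Errors are observed minus fitted values.
errors = [y - f for y, f in zip(ys, fitted)]
sum_errors = sum(errors)
sse = sum(e * e for e in errors)

print(fitted)      # [0, 1, 2, 3, 4]
print(sum_errors)  # 0
print(sse)         # 0
```

Both totals are zero here because every observed point lies exactly on the fitted line. For data that do not lie on a line, the sum of errors is still zero but the sum of squares of errors is positive.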
Another Form of Regression Equation
We know that the linear regression equation is
$\hat{Y} = a + bX$ ..... (1)
The normal equation for 'a' is
$\sum Y = na + b\sum X$
Dividing both sides by n, we get
$\dfrac{\sum Y}{n} = \dfrac{na}{n} + \dfrac{b\sum X}{n}$ or $\bar{Y} = a + b\bar{X}$ ..... (2)
It means that the regression line passes through the point $(\bar{X}, \bar{Y})$.
Subtracting equation (2) from equation (1), we get
$\hat{Y} - \bar{Y} = b(X - \bar{X})$
which is another way of writing the regression equation of Y on X. From equation (2) we can write
$a = \bar{Y} - b\bar{X}$
The regression coefficient b may be written as $b_{yx}$, which means that the coefficient belongs to an equation in which X is the independent variable and Y is the dependent variable.
Calculation of Sum of Squares of Errors
The sum of squares of errors defined by $SSE = \sum (Y - \hat{Y})^2$ can also be calculated as below:
$SSE = \sum Y^2 - a\sum Y - b\sum XY$
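The shortcut formula can be compared with the direct definition of SSE. The sketch below uses a small made-up data set (the five pairs are hypothetical and are not taken from the text) whose points do not lie exactly on a line.

```python
import math

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# Least squares estimates from the computational formulas.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# SSE from the definition and from the shortcut formula.
sse_direct = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
sse_shortcut = sum_y2 - a * sum_y - b * sum_xy

print(round(sse_direct, 6), round(sse_shortcut, 6))  # both about 2.4
```

The two results agree up to floating-point rounding. The shortcut holds only when 'a' and 'b' are the least squares values, because it relies on the normal equations.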
14.6.1 PROPERTIES OF THE REGRESSION LINE
The regression line $\hat{Y} = a + bX$ has the following properties:
(i) We know that $\bar{Y} = a + b\bar{X}$. This shows that the line passes through the means $\bar{X}$ and $\bar{Y}$.
(ii) The sum of errors is equal to zero. The regression equation is $\hat{Y} = a + bX$ and the sum of deviations of observed Y from estimated $\hat{Y}$ is
$\sum (Y - \hat{Y}) = \sum (Y - a - bX) = \sum Y - na - b\sum X = 0$   [since $\sum Y = na + b\sum X$]
When $\sum (Y - \hat{Y}) = 0$, it means that $\sum Y = \sum \hat{Y}$.
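Both properties can be checked numerically on the rice data of Example 14.3. The sketch below is an illustration under the assumption that the slope is computed from deviations about the means and the intercept from a = Y-bar - b X-bar; the variable names are not from the text.

```python
xs = [10, 16, 22, 28, 34, 40, 46]
ys = [2.25, 2.85, 2.95, 3.15, 3.40, 3.80, 4.00]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope from deviations about the means, intercept from a = Y-bar - b * X-bar.
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
a = mean_y - b * mean_x

# Property (i): the fitted value at X-bar equals Y-bar.
value_at_mean_x = a + b * mean_x

# Property (ii): the errors add up to zero (up to floating-point rounding).
errors = [y - (a + b * x) for x, y in zip(xs, ys)]
sum_errors = sum(errors)

print(round(value_at_mean_x, 6), round(mean_y, 6))
print(round(sum_errors, 9))
```

In floating-point arithmetic the sum of errors may differ from zero by a tiny amount, so the comparison is made with a tolerance rather than exact equality.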
14.6.2 REGRESSION EQUATION OF X ON Y
There are some special cases in which X and Y can each be assumed to be the independent variable, turn by turn. This is possible when both X and Y are random variables: to estimate some Y-value, X is assumed to be independent, and to estimate some X-value, Y is taken as independent. When X and Y can be interchanged, the regression coefficient of Y on X is denoted by $b_{yx}$ and the intercept of Y on X is denoted by $a_{yx}$. When Y is the independent variable, the regression equation of X on Y can be written as:
$\hat{X} = a_{xy} + b_{xy}Y$
where $b_{xy}$ is the regression coefficient of X on Y. The normal equations for the regression of X on Y are
$\sum X = na_{xy} + b_{xy}\sum Y$ and $\sum XY = a_{xy}\sum Y + b_{xy}\sum Y^2$
The regression coefficient $b_{xy}$ and the intercept $a_{xy}$ can be directly calculated by the relations
$b_{xy} = \dfrac{n\sum XY - (\sum X)(\sum Y)}{n\sum Y^2 - (\sum Y)^2}$ and $a_{xy} = \dfrac{(\sum X)(\sum Y^2) - (\sum Y)(\sum XY)}{n\sum Y^2 - (\sum Y)^2}$
The regression equation of X on Y can be written as $\hat{X} - \bar{X} = b_{xy}(Y - \bar{Y})$
Also $\sum (X - \hat{X}) = 0$, $\sum X = \sum \hat{X}$ and $\bar{X} = a_{xy} + b_{xy}\bar{Y}$ or $a_{xy} = \bar{X} - b_{xy}\bar{Y}$
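The two regression lines are in general different. The sketch below fits both of them to a small made-up data set (the five pairs are hypothetical, used only for illustration); the variable names are assumptions.

```python
# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# Regression of Y on X: Y-hat = a_yx + b_yx * X.
b_yx = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a_yx = sum_y / n - b_yx * sum_x / n

# Regression of X on Y: X-hat = a_xy + b_xy * Y (the roles of X and Y are swapped).
b_xy = (n * sum_xy - sum_x * sum_y) / (n * sum_y2 - sum_y ** 2)
a_xy = sum_x / n - b_xy * sum_y / n

print(b_yx, b_xy)  # 0.6 1.0
print(a_xy)        # -1.0
```

Both coefficients share the same numerator; only the denominator changes from the X sums to the Y sums. Note that $b_{xy}$ is not simply the reciprocal of $b_{yx}$ unless the points lie exactly on a line.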

Various Formulas for the Calculation of Regression Coefficients
Different forms of the formulas for the regression coefficient of Y on X are:
$b_{yx} = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \dfrac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}}$
$= \dfrac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} = \dfrac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$
$= \dfrac{S_{xy}}{S_x^2}$, where $S_{xy} = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{n}$ and $S_x^2 = \dfrac{\sum (X - \bar{X})^2}{n}$
When the calculations are to be reduced by change of origin, we use $D_x$ and $D_y$, where $D_x = X - A$ and $D_y = Y - B$, and A and B are some constants. $b_{yx}$ can be calculated as below:
$b_{yx} = \dfrac{\sum D_xD_y - \dfrac{(\sum D_x)(\sum D_y)}{n}}{\sum D_x^2 - \dfrac{(\sum D_x)^2}{n}} = \dfrac{n\sum D_xD_y - (\sum D_x)(\sum D_y)}{n\sum D_x^2 - (\sum D_x)^2}$
When change of origin and scale is used, then $U = \dfrac{X - A}{h}$ and $V = \dfrac{Y - B}{k}$, where h and k are the scale constants, and
$b_{vu} = \dfrac{\sum UV - \dfrac{(\sum U)(\sum V)}{n}}{\sum U^2 - \dfrac{(\sum U)^2}{n}} = \dfrac{n\sum UV - (\sum U)(\sum V)}{n\sum U^2 - (\sum U)^2}$, with $b_{yx} = \dfrac{k}{h}\,b_{vu}$
The regression equation of Y on X can be written as:
$\hat{Y} - \bar{Y} = b_{yx}(X - \bar{X})$
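The effect of coding the data can be checked on the rice data of Example 14.3. In the sketch below the constants A = 28, B = 3, h = 6 and k = 1/20 are assumptions chosen only to make the coded values small; the helper name slope is also hypothetical. Exact fractions are used so that the comparison is not affected by rounding.

```python
from fractions import Fraction

def slope(xs, ys):
    """Regression coefficient of ys on xs from the summation formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    return (n * sxy - sx * sy) / (n * sx2 - sx ** 2)

xs = [Fraction(x) for x in [10, 16, 22, 28, 34, 40, 46]]
ys = [Fraction(v) for v in ["2.25", "2.85", "2.95", "3.15", "3.40", "3.80", "4.00"]]

A, B = 28, 3                 # assumed origins
h, k = 6, Fraction(1, 20)    # assumed scale constants

# Change of origin only: the slope is unchanged.
dx = [x - A for x in xs]
dy = [y - B for y in ys]
b_origin = slope(dx, dy)

# Change of origin and scale: rescale the coded slope by k / h.
u = [(x - A) / h for x in xs]
v = [(y - B) / k for y in ys]
b_vu = slope(u, v)
b_yx = b_vu * k / h

print(b_origin, b_vu, b_yx)  # 19/420 38/7 19/420
```

A change of origin leaves the regression coefficient unchanged, while a change of scale multiplies it by h/k, so the coded coefficient has to be multiplied by k/h to return to the original units.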

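Similarly, the formulas for the regression coefficient of X on Y are
$b_{xy} = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2} = \dfrac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sum Y^2 - \dfrac{(\sum Y)^2}{n}}$
$= \dfrac{n\sum XY - (\sum X)(\sum Y)}{n\sum Y^2 - (\sum Y)^2} = \dfrac{\sum XY - n\bar{X}\bar{Y}}{\sum Y^2 - n\bar{Y}^2}$
$= \dfrac{S_{xy}}{S_y^2}$, where $S_{xy} = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{n}$ and $S_y^2 = \dfrac{\sum (Y - \bar{Y})^2}{n}$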
When the idea of change of origin is applied