
FOUNDER

Shri Jagdishprasad Jhabarmal Tibrewala


(Neither Born Nor Died, Visited this Planet during 1926-19(5)

One of the eminent Past Presidents of Shri Rajasthani Seva Sangh, he helped the Rajasthani community to preserve their culture and heritage though being away from their motherland. He had a dynamic personality. He established various businesses in the fields of Chemicals, Engineering, Plastics and Electricals, etc., which have a global presence today and contribute to the development of the Indian economy. He served Mumbai through various Trusts and Organisations like the Marwadi Sammelan, the Rajasthani Vidhyarthi Griha, etc.

His contribution to the development of the magnificent Shri Khemisati Mandir at Jhunjhunu is an icon for any social worker. Shri Rajasthani Seva Sangh Trust, under his leadership, made a substantial contribution to the development of a temple, school and college at J. U. Nagar. Education was very close to his heart. He started a school at Jhunjhunu in his father's name, and he had a vision to make Jhunjhunu an education hub with a University at Jhunjhunu. We are all very proud that we are in the process of fulfilling the dream of Shri Jagdishprasad Jhabarmal Tibrewala, the beloved Past President of Shri Rajasthani Seva Sangh, for the overall development of Jhunjhunu, with the support of the members of Jhunjhunu Pragati Sangh and other organisations which are interested in the development not only of the Shekhawati region but of all Rajasthan. In a very short span of time the University is going to spread its wings all over India, thus fulfilling his mission and completing the dream of the late Shri Shriniwas Bagarka, who established Shri Rajasthani Seva Sangh.
Quantitative Techniques
(For Ph.D. Course Work)


Shri Jagadishprasad Jhabarmal Tibrewala University


Jhunjhunu - Churu - Bishau Road, Vidyanagari, Chudela,
Dist. Jhunjhunu, Rajasthan - 333 001


Chancellor's Message

I am trying to reach you through this book. This book can be used as self-learning material to enhance research skills and quality. This book is thoroughly based on the syllabus prescribed by the University Grants Commission. Our experts have tried their best to give you excellent study material. The terms have been explained in a lucid manner. I hope that my gentle scholars will enhance their research methodology through this sincere effort. Some solved numerical problems have also been added to illustrate data collection and data sampling methods for designing quality research.

With best wishes & warm regards.

Sd/.
Vinod D. Tibrewala
Chancellor
Preface

Quantitative techniques, employing mathematical and statistical tools, have been gaining considerable importance in the context of research in any stream of study, including social sciences. Modern research lays heavy stress on critical and analytical study of a subject with the help of scientific tools of analysis. It is against this background that JJT University has made the study of Quantitative Techniques mandatory for all research scholars pursuing a Ph.D. This study material has been written keeping in mind the course content prescribed by the University as well as the specific requirements of the student community.
The author has made every endeavour to make the book practice-oriented. The book contains a number of solved cases as well as examples for the practice of students. Furthermore, conscious efforts have been made to present the material in a language that can be easily understood by all users, including those without any statistical background.
It is hoped that the research community at JJT University will find the publication very helpful in their research work. Nevertheless, any critical comments and suggestions from readers will be highly appreciated, as this will enable us to further improve the quality of the publication.
The author is immensely grateful to the Honourable Chancellor of JJT University, Shri Vinod Tibrewala, for his constant encouragement and guidance in bringing out this work. Valuable inputs provided from time to time by Dr. N.N. Pandey, Principal, Prahladrai Dalmia Lions College of Commerce and Economics, are also gratefully acknowledged. The author is also thankful to Ms. Vanashree Valecha, Ms. Rakhee Kelaskar and Dr. (Mrs.) Anju Singh for their valuable help. The author would also like to place on record the untiring help extended by Dr. Balwant Singh in the expeditious printing and publication of this material.
INDEX

Sr. No. Title Page No.

1. Module I 1-9

2. Module II 10-34

3. Module III 35-52

4. Module IV 53-58

5. Solved Examples 59-133

6. Exercise (Try Yourself) 134-139

7. Statistical Table 140-152

8. Reference (Colour Page) 153

Module I

Application of Central Tendency


Application of central tendency and dispersion, coefficient of correlation, coefficient of determination and non-determination, calculation of standard error of estimate.

What are the measures of Central Tendency?

Measures of central tendency are nothing but statistical averages. An average tells us the value about which the items have a tendency to cluster. It is representative of the mass of data and is useful in comparing different distributions. Averages have a general tendency to lie at the centre, and hence they are termed 'measures of central tendency'. The requisites of a good measure of central tendency include simplicity of definition, ease of computation, capability of further algebraic treatment, sampling stability and freedom from undue influence by extreme observations.

Types: The measures of central tendency, or averages, can be classified as (i) algebraic averages and (ii) positional averages.

Algebraic averages require an algebraic formula to compute, while positional averages can be located from graphs. Algebraic averages cannot be obtained from a graph.

Amongst mean, median and mode, the mean falls in the algebraic average category, while the median and mode are positional averages.

The mean is the simplest measure of central tendency and is widely used. It is used in summarizing the essential features of a series and enables data to be compared. It is easy to define and simple to understand. It is based on all the observations and hence is treated as a good representative of the distribution. It has sampling stability and also the capability of further algebraic treatment. Its limitations are that it cannot be obtained for an 'open-end' class interval distribution and that it is unduly affected by extreme observations. Sometimes it gives absurd results: it may be a value which is not part of the distribution. Especially in economics and social studies, where direct quantitative measurements are possible, the mean is a better average than the others.

Broadly, the mean is of three types: arithmetic mean, geometric mean and harmonic mean.

The term 'mean' used alone indicates the arithmetic mean.

The geometric mean is defined as the nth root of the product of the n values.

Its application is in the determination of the average percentage of change. Whenever ratios or percentages are to be averaged, the geometric mean can be used; in the construction of index numbers, for instance, the geometric mean is often used.

Occasionally a frequency distribution is encountered that is skewed to the right, but if the logarithms of the X values are used, with the class intervals of the logs kept constant, the curve becomes symmetrical. In such a situation the geometric mean may be appropriate.

The geometric mean can also be used in averaging rates of change.

The harmonic mean is defined as the reciprocal of the arithmetic mean of the reciprocals of the values of the items of a series. It has limited applications, particularly where time and rate are involved. It is used in cases like time and motion study, where time is variable and distance is constant.
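The three means can be compared in a short Python sketch using the standard-library `statistics` module (`geometric_mean` requires Python 3.8 or later). The figures are hypothetical: growth factors 1.10 and 1.20 stand for a 10% rise followed by a 20% rise, and 40 and 60 km/h are speeds over two equal distances.

```python
from statistics import mean, geometric_mean, harmonic_mean

# Arithmetic mean: sum of the values divided by their number.
marks = [40, 50, 60, 70, 80]
am = mean(marks)

# Geometric mean: nth root of the product of n values. Suited to averaging
# ratios or rates of change, e.g. a 10% rise followed by a 20% rise.
growth_factors = [1.10, 1.20]
gm = geometric_mean(growth_factors)

# Harmonic mean: reciprocal of the mean of the reciprocals. Suited to rates
# where the distance is constant, e.g. equal distances at 40 and 60 km/h.
speeds = [40, 60]
hm = harmonic_mean(speeds)

print(am)            # 60
print(round(gm, 4))  # 1.1489, i.e. an average growth of about 14.9% per period
print(hm)            # 48.0
```

The harmonic mean of the speeds, 48 km/h, is the true average speed over the whole journey; the arithmetic mean of 50 km/h would overstate it.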

The median is the value of the middle item of the series when the series is arranged in ascending or descending order. The median is particularly useful for qualitative phenomena that can be ranked but not measured directly, for example in estimating intelligence. The median is not useful where items need to be assigned relative importance and weights, and it is not frequently used in sampling statistics.

There are two specific situations where the median serves as a valuable alternative to the mean. These occur when

(1) there are a few extreme scores in the distribution, and

(2) some scores have undetermined values.

In psychology this often occurs in learning experiments where you are measuring the number of errors or the amount of time required for an individual to solve a particular problem. In an open-end class interval frequency distribution the mean cannot be computed, and hence in such cases the median is preferred.

The mode is the value which occurs most frequently in the distribution. It is easy to compute and it can be used with any scale of measurement. This flexibility matters when scores are measured on a nominal scale: it is then impossible to calculate either the mean or the median, so the mode is used to describe central tendency. For example, the mode describes the typical, or most representative, academic major for a sample of students. Because the mode identifies the most typical value or case, it often produces a more sensible measure of central tendency.

Thus, comparing mean, median and mode, we note that the mean is the most commonly used average, as it takes all the observations into consideration. It can be a good representative of the distribution. The goal of central tendency is to find a single value that best represents the distribution. Besides being a good representative, the mean has the added advantage of being a good measure for the purposes of inferential statistics. Specifically, whenever you take a sample from a population, the sample mean gives a good indication of the value of the population mean. The mean also satisfies the majority of the requisites of an ideal average, so it is the superior one amongst all. But there are certain situations where the mean cannot be computed; then the median or mode can be used.
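The sketch below contrasts the three averages on a small hypothetical data set of solution times that contains one extreme score, and then applies the mode to nominal data (academic majors), where the mean and median cannot be computed.

```python
from statistics import mean, median, mode

# Hypothetical times (minutes) taken by six subjects to solve a problem;
# one subject took far longer than the rest (an extreme score).
times = [4, 5, 5, 6, 7, 45]
print(mean(times))    # 12: pulled up by the single extreme score
print(median(times))  # 5.5: unchanged however large the extreme score is
print(mode(times))    # 5: the most frequent value

# Nominal data: only the mode is meaningful here.
majors = ["Commerce", "Science", "Commerce", "Arts", "Commerce"]
print(mode(majors))   # Commerce
```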

Dispersion
An average can represent a distribution only as its best single representative. There are situations where averages fail to distinguish between distributions. Consider the following case.

Four candidates, Sanchit, Saurabh, Smita and Seema, score marks in three tests as follows.

Test    Sanchit   Saurabh   Smita   Seema
I          80        95       80      98
II         95        80       80      92
III        65        65       80      50

Compared on the basis of the average, all four candidates appear equal as far as scores are concerned, since the average score for everyone is 80. If studied minutely, we see that Smita is the most consistent, followed by Sanchit and Saurabh, and then Seema. So here there is a need to study the scatter of the values around the average, and this is defined as dispersion. Thus dispersion means the scatter or spread of the individual values from their average in the distribution. The measures used to measure dispersion are known as measures of dispersion.

Requisites of a good measure of dispersion:

1) It should be easy to understand and calculate.

2) It should be based on all the observations.

3) It should not be affected much by sampling fluctuations.

4) It should not be affected much by extreme observations.

5) It should be capable of further algebraic/mathematical treatment.

Measures of dispersion can be either absolute or relative.

Absolute measures are with respect to the given distribution and hence are expressed in the corresponding units of measurement, while relative measures are free from any measurement units; they are pure numbers.

To compare distributions which differ in units of measurement, relative measures are always used. The relative measures are generally referred to as 'coefficients'.

Following is the list of absolute and relative measures of dispersion with their formulas.

1. Range: $\text{Range} = H - S$, where $H$ is the highest value and $S$ the smallest value in the data.
   Relative measure: $\text{Coeff. of Range} = \dfrac{H - S}{H + S}$

2. Quartile Deviation, or Semi-Interquartile Range: $\text{Q.D.} = \dfrac{Q_3 - Q_1}{2}$
   Relative measure: $\text{Coeff. of Q.D.} = \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$

3. Mean Deviation about an average.
   Relative measure: Coefficient of Mean Deviation (the mean deviation divided by the average about which it is taken).

4. Standard Deviation $\sigma$, or Variance $\sigma^2$.
   Relative measure: Coefficient of Variation, $\text{CV} = \dfrac{\sigma}{\bar{X}} \times 100$

The utility of the range is that it gives an idea of variability very quickly. But it is affected very greatly by sampling fluctuations, so the range is mostly used as a rough measure of variability.

If open-end class intervals are given, the suitable measure is the quartile deviation.

Just as the mean is the principal measure of central tendency, the standard deviation (or variance) is the principal measure of dispersion.

The standard deviation concept is very important in further analysis. It is defined as the root mean squared deviation from the mean:

$$\sigma = \sqrt{\frac{1}{N}\sum (X - \bar{X})^2}, \qquad \text{Coeff. of Variation} = \frac{\sigma}{\bar{X}} \times 100$$
The coefficient of variation is a very important measure, used to compare different distributions on the basis of consistency, homogeneity, variability and uniformity.

A smaller CV indicates more consistency, more uniformity and more stability, while a larger CV indicates more variability and more heterogeneity.
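A minimal Python sketch of the absolute and relative measures listed above follows. It assumes population formulas (division by N, as in the σ formula) and takes quartiles from `statistics.quantiles`, whose default 'exclusive' method is only one of several conventions, so quartile-based values may differ slightly from a hand computation that uses another textbook rule. The data set is hypothetical.

```python
from statistics import mean, pstdev, quantiles

def dispersion_summary(data):
    """Absolute and relative measures of dispersion for a list of values."""
    h, s = max(data), min(data)
    q1, _, q3 = quantiles(data, n=4)           # Q1, Q2, Q3 (exclusive method)
    xbar = mean(data)
    md = mean(abs(x - xbar) for x in data)     # mean deviation about the mean
    sd = pstdev(data)                          # sqrt(sum((X - Xbar)^2) / N)
    return {
        "range": h - s,
        "coeff_range": (h - s) / (h + s),
        "qd": (q3 - q1) / 2,
        "coeff_qd": (q3 - q1) / (q3 + q1),
        "mean_dev": md,
        "coeff_mean_dev": md / xbar,
        "sd": sd,
        "cv": sd / xbar * 100,
    }

summary = dispersion_summary([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in summary.items():
    print(f"{name:>15}: {value:.3f}")
```

For this data the mean is 5 and σ is 2, so the CV is 40%; a second series with a smaller CV would be the more consistent of the two.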

Quartiles were introduced in the above discussion. Let us look at quartiles in brief.

Quartiles are the partition values which divide the distribution into four equal parts when the data are arranged in order. There are three quartiles in all, denoted by Q1, Q2 and Q3.

Q1 is known as the lower quartile and divides the distribution such that 25% of the observations have values less than Q1 and 75% have values above Q1. Q2 is known as the second quartile and divides the distribution such that 50% of the observations have values less than Q2 and 50% have values above Q2. In other words, Q2 is nothing but the median of the distribution.

Q3 is known as the upper quartile and divides the distribution such that 75% of the observations have values less than Q3 and 25% have values above Q3.

Q1 and Q3 provide the limits for the central 50% of observations.

The computation of quartiles is similar to that of the median, with minor changes.

The other partition values are deciles (D1, D2, ..., D9) and percentiles (P1, P2, ..., P99).

The interpretations and computational procedures for deciles and percentiles are similar to those of the median. Deciles divide the distribution into 10 equal parts, while percentiles divide it into 100 equal parts. The fifth decile D5 and the fiftieth percentile P50 are the median of the distribution. The tenth percentile is D1, the 25th percentile is Q1, and so on.

Partition values can be used to determine limits for a desired percentage of central observations. For example, Q1 and Q3 provide limits for the central 50% of observations, while P20 (or D2) and P80 (or D8) provide limits for the central 60% of observations.
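The partition values can be obtained in Python with `statistics.quantiles`, which returns the n - 1 cut points that divide the data into n equal groups. The data below (the integers 1 to 99) are hypothetical and chosen only so that the cut points come out as round numbers; with other data the results depend on the interpolation method used.

```python
from statistics import quantiles, median

data = list(range(1, 100))   # hypothetical data: 1, 2, ..., 99

q = quantiles(data, n=4)     # [Q1, Q2, Q3]
d = quantiles(data, n=10)    # [D1, ..., D9]
p = quantiles(data, n=100)   # [P1, ..., P99]

# Q2, D5 and P50 all coincide with the median.
print(q[1], d[4], p[49], median(data))   # 50.0 50.0 50.0 50
# Limits for the central 60% of observations: P20 (= D2) and P80 (= D8).
print(p[19], d[1], p[79], d[7])          # 20.0 20.0 80.0 80.0
```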

Correlation Coefficient and Coefficient of Determination
The measures studied in the above sections relate to univariate study only.

In practice we come across a large number of problems involving two or more variables. The variables are said to be correlated if a change in one variable is accompanied by a change in the other, either in the same direction or in the opposite direction. The degree of relationship between the variables under consideration is measured through correlation analysis. The measure of correlation is called the correlation coefficient.

Correlation analysis attempts to determine the "degree of relationship between the variables".

Thus correlation is a statistical device which helps us in analysing the covariation of two or more variables.

The problem of analyzing the relation between different series should be broken into three steps:

1. Determining whether a relation exists and, if it does, measuring it.

2. Testing whether it is significant.

3. Establishing the cause and effect relation if any.

Correlation is described or classified in different ways. Some of the important ways are as
follows.

i) Positive or Negative

ii) Simple, Multiple or Partial

iii) Linear or Non-linear.

Methods of studying correlation:

a) Scatter diagram method.

b) Karl Pearson's coefficient of correlation (r).

c) Concurrent deviation method.

d) Spearman's rank correlation coefficient (R).

Karl Pearson's correlation coefficient is also known as the product moment correlation coefficient. It is based on the following assumptions:

• The variables are related by a linear relationship.

• The two variables under study are affected by a large number of independent causes so as to form a normal distribution.

• There is a cause and effect relationship between the forces affecting the distribution of the items in the two series.

Amongst the mathematical methods used for measuring the degree of relationship, Karl Pearson's method is the most popular. The correlation coefficient summarizes in one figure not only the degree of correlation but also its direction, positive or negative.

Interpretation of the Coefficient of Correlation

The coefficient of correlation lies between -1 and +1 (both inclusive).

When r = -1, it implies a perfect negative linear relationship between the variables.

When r = +1, it implies a perfect positive linear relationship between the variables.

When r = 0, it implies no linear relationship between the variables, i.e. the variables are uncorrelated.

A value of r closer to 0 indicates a weak relationship or weak association between the variables, while r closer to -1 or +1 indicates strong association between the variables.

The full interpretation of r depends upon circumstances, one of which is the size of the sample.
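A direct Python transcription of the product moment formula, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² Σ(y - ȳ)²), is sketched below with hypothetical data. The last case shows that r = 0 rules out only a linear relationship: y = x² is completely determined by x, yet r is zero.

```python
from math import sqrt

def pearson_r(x, y):
    """Karl Pearson's product moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))          # 1.0: perfect positive
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))          # -1.0: perfect negative
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0, although y = x**2
```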

The probable error of the coefficient of correlation helps to determine the reliability of the value of the coefficient.

$$\text{P.E.}_r = 0.6745\,\frac{1 - r^2}{\sqrt{N}}$$

where N is the number of pairs of observations and r is the coefficient of correlation.

• If the value of r is less than the probable error, there is no evidence of correlation, i.e. the value of r is not at all significant.

• If the value of r is more than six times the probable error, the existence of correlation is practically certain, i.e. the value of r is significant.

• The limits (r - P.E., r + P.E.) are the limits within which the population correlation coefficient is expected to lie.

The measure of probable error can be properly used only when the following conditions exist.

1. The data must approximate a normal frequency curve.

2. The statistical measure for which the P.E. is computed must have been calculated from a sample.

3. The sample must have been selected in an unbiased manner and the individual items must be independent.
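The probable-error rules can be sketched in Python as follows. The values r = 0.8 and N = 25 are hypothetical; `abs(r)` is used so that the rules also apply to negative correlations (an assumption), and the label returned for the in-between case (r between P.E. and 6 P.E.) is our own, since the rules above are silent about it.

```python
from math import sqrt

def probable_error(r, n):
    """P.E. of r = 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

def assess_r(r, n):
    """Apply the probable-error rules of thumb to a correlation r from N pairs."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        verdict = "not significant"
    elif abs(r) > 6 * pe:
        verdict = "significant"
    else:
        verdict = "inconclusive"   # hypothetical label: the rules give no verdict here
    return pe, (r - pe, r + pe), verdict

pe, limits, verdict = assess_r(0.8, 25)
print(round(pe, 4), [round(v, 4) for v in limits], verdict)
```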

COEFFICIENT OF DETERMINATION AND NON-DETERMINATION

If r is the coefficient of correlation, then r² is the coefficient of determination. It is used to measure explained variation; it is defined as the ratio of the explained variance to the total variance.

$$\text{Coefficient of determination} = r^2 = \frac{\text{Explained variance}}{\text{Total variance}}$$

$$\text{Coefficient of non-determination} = K^2 = 1 - \text{Coefficient of determination} = 1 - r^2$$

The coefficient of non-determination is denoted by K², and its square root K is known as the coefficient of alienation.

According to Tuttle, "the coefficient of correlation has been grossly overrated and is used entirely too much. Its square, the coefficient of determination, is a much more useful measure of the linear covariation of two variables. The reader should develop the habit of squaring every correlation coefficient he finds cited or stated before coming to any conclusion about the extent of the linear relationship between the two correlated variables."

Consider the following table of r and r².

r     .90   .80   .70   .60   .50   .40   .30   .20   .10
r²    .81   .64   .49   .36   .25   .16   .09   .04   .01

The following things are noted.

1. As r decreases from 1 towards 0, r² decreases more rapidly.

2. The value r = 0.707 implies r² = 0.499849 ≈ 0.5, i.e. half the variance in Y is accounted for by X.

3. If r = 0.6, 36% of the total variation is explained, while if r = 0.3 only 9% of the total variation is explained.

The value of r² is never negative. It cannot tell whether the relationship between the two variables is positive or negative; for that purpose r has to be computed.
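The quantities r², K² = 1 - r² and K can be tabulated with a few lines of Python. The r values are taken from the table above, and the last check shows that r² is the same for r and -r, which is why r itself is still needed for the direction.

```python
from math import sqrt

def determination(r):
    """Return (r^2, K^2, K): coefficients of determination,
    non-determination and alienation for a correlation r."""
    r2 = r * r
    k2 = 1 - r2
    return r2, k2, sqrt(k2)

for r in (0.9, 0.707, 0.6, 0.3):
    r2, k2, k = determination(r)
    print(f"r = {r:5.3f}   r^2 = {r2:.4f}   K^2 = {k2:.4f}   K = {k:.4f}")

# The sign of r is lost on squaring:
print(determination(-0.6)[0] == determination(0.6)[0])  # True
```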

STANDARD ERROR OF AN ESTIMATE
The sampling distribution of a statistic is generated from a population distribution, known or assumed. The same population may generate an infinite number of sampling distributions for the statistic, one for each sample size n. A population may also generate sampling distributions for two or more different statistics. The standard deviation of the sampling distribution of a statistic is known as the standard error of the estimate.

It is used as an instrument in testing a given hypothesis.

The standard error provides an idea about the unreliability of a sample estimate.

The reciprocal of the S.E., 1/S.E., is a measure of reliability or precision.

The S.E. can also be used to determine the limits within which the parameter values are expected to lie.

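Following are some formulae for the standard error of estimates.

Estimate                        Standard Error
Mean                            σ/√n (population standard deviation divided by √n)
Number of successes             √(npq)
Proportion of successes         √(pq/n)
Coefficient of correlation r    (1 - r²)/√n

Here n is the sample size, p the proportion of successes and q = 1 - p.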
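The standard-error formulae in the table translate directly into Python. The sample figures below (σ = 10 with n = 100, p = 0.5 with n = 100, r = 0.8 with n = 25) are hypothetical.

```python
from math import sqrt

def se_mean(sigma, n):
    """S.E. of the mean: population standard deviation / sqrt(n)."""
    return sigma / sqrt(n)

def se_successes(n, p):
    """S.E. of the number of successes: sqrt(n p q), with q = 1 - p."""
    return sqrt(n * p * (1 - p))

def se_proportion(p, n):
    """S.E. of the proportion of successes: sqrt(p q / n)."""
    return sqrt(p * (1 - p) / n)

def se_correlation(r, n):
    """S.E. of the correlation coefficient: (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / sqrt(n)

se = se_mean(10, 100)
print(se, 1 / se)                # the S.E. and its reciprocal, the precision
print(se_successes(100, 0.5))    # 5.0
print(se_proportion(0.5, 100))
print(se_correlation(0.8, 25))
```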

Module II

Forecasting Techniques
Multiple Correlation & Multiple Regression
Time Series Analysis.

2.1 Forecasting Techniques

2.11 Introduction:

Forecasting is a key element of management decision making. Since the ultimate effectiveness of any decision depends upon the sequence of events following the decision, an ability to forecast those events should permit an improved choice over that which would otherwise be made.
Forecasting techniques are useful in

a) inventory management, to estimate the usage rate for each part in order to determine procurement quantities;
b) production planning, to forecast unit sales for each item by delivery period for a number of months in the future.

Forecasting is an integral part of the planning process. Correct forecasting is not usually possible because of the uncertainty which inevitably attaches to the future. By forecasting we only try to minimize the impact of uncertainty. Thus forecasting is only a means of attempting to reduce the uncertainty of the future, not of eliminating it.

Forecasting can be both short-term and long-term.

2.12 Forecasting Techniques

Various techniques are available for forecasting. The choice of a method is generally dictated by data availability and/or by the urgency of the forecast. Many times forecasters are forced to use a less reliable method because the data required by more reliable methods are not available. If the use of better techniques is time-consuming and forecasts are urgently needed, forecasts are made on the basis of easy but less reliable techniques.

Following are some commonly used techniques:


1. Historical Analogy Method.
2. Executive Opinion Method.
3. Survey Techniques.
4. Barometric Techniques.
5. Regression Analysis.
6. Time series Analysis.
7. Exponential Smoothing.
8. Input- Output Models.

2.13 Regression Analysis

Regression technique is a tool to isolate the causal relationship between variables. Several regression models are available to test and establish a statistically satisfactory fit between the dependent variable and a specific range of independent variables. Forecasts are made by substituting values of the independent variables in the equation and then computing the dependent variable. These methods are useful for long-term forecasting, but are relatively more sophisticated and expensive to use.

Regression analysis is one of the scientific techniques used for prediction. According to M. M. Blair, "Regression analysis is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data."

Regression analysis confined to the study of only two variables at a time is termed simple regression. Regression analysis studying more than two variables at a time is known as multiple regression.

In this analysis there are two types of variables: dependent variables and independent variables.

The variable whose value is influenced, or is to be predicted, is called the dependent variable.

The variable which influences the values, or is used for prediction, is called the independent variable.

In regression analysis, the independent variable is also known as the regressor or explanatory variable, while the dependent variable is known as the regressed or explained variable.

The estimates are made with the help of equations known as regression equations. The regression equations are obtained in accordance with the principle of least squares, which consists in minimizing the sum of the squares of the deviations between the given observed values of the variable and their corresponding estimated values given by the line of best fit.
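The least-squares principle for the simple (two-variable) case can be sketched in Python as below. The slope is Σ(x - x̄)(y - ȳ)/Σ(x - x̄)² and the intercept is ȳ - b x̄; the data are hypothetical and chosen to lie exactly on a line so that the fitted values are easy to check.

```python
def least_squares_line(x, y):
    """Fit y = a + b*x by minimising the sum of squared deviations
    between the observed y and the estimates a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx          # regression coefficient of y on x
    a = my - b * mx        # the line passes through (mean of x, mean of y)
    return a, b

x = [1, 2, 3, 4, 5]        # independent (explanatory) variable
y = [3, 5, 7, 9, 11]       # dependent (explained) variable
a, b = least_squares_line(x, y)
print(a, b)                # 1.0 2.0
print(a + b * 6)           # forecast for x = 6: 13.0
```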

2.14 MULTIPLE CORRELATION:

While dealing with three or more variables we need to apply multiple correlation. We may be interested in finding the association between the yield of wheat per acre and both the amount of rainfall and the average daily temperature. We shall be trying to make estimates of the value of one of these variables based on the values of all the others. The variable whose value we are trying to estimate is termed the dependent variable, and all the variables on which our estimates are based are known as independent variables.

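The coefficient of multiple linear correlation is represented by R, and it is common to add subscripts designating the variables involved. Thus $R_{1.234}$ would represent the coefficient of multiple linear correlation between $X_1$ on the one hand and $X_2$, $X_3$, $X_4$ on the other. We use the following formulae for multiple correlation:

$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{23}^2}}$$

$$R_{2.13} = \sqrt{\frac{r_{12}^2 + r_{23}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{13}^2}}$$

$$R_{3.12} = \sqrt{\frac{r_{13}^2 + r_{23}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{12}^2}}$$

Equivalently, in terms of the partial correlation coefficients $r_{13.2}$ and $r_{23.1}$,

$$R_{1.23}^2 = r_{12}^2 + r_{13.2}^2\,(1 - r_{12}^2), \qquad R_{3.12}^2 = r_{13}^2 + r_{23.1}^2\,(1 - r_{13}^2)$$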
A coefficient of multiple correlation lies between 0 and 1, i.e. it is never negative. A value closer to 1 indicates a better linear relationship between the variables. If the coefficient of multiple correlation is 1, the relationship is called perfect. A coefficient of 0 indicates no linear relationship between the variables, but a non-linear relationship between the variables cannot be ruled out.
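The three formulae differ only in which correlations play which role, so a single Python function can serve for all of them. The correlations below are hypothetical; a negative quantity under the square root (which `math.sqrt` rejects with a `ValueError`) would signal an inconsistent set of correlations.

```python
from math import sqrt

def multiple_r(r_ab, r_ac, r_bc):
    """R_{a.bc}: multiple correlation of variable a on variables b and c,
    computed from the three zero-order correlations."""
    return sqrt((r_ab ** 2 + r_ac ** 2 - 2 * r_ab * r_ac * r_bc) / (1 - r_bc ** 2))

r12, r13, r23 = 0.6, 0.5, 0.4          # hypothetical zero-order correlations
R1_23 = multiple_r(r12, r13, r23)      # X1 on X2 and X3
R2_13 = multiple_r(r12, r23, r13)      # X2 on X1 and X3
R3_12 = multiple_r(r13, r23, r12)      # X3 on X1 and X2
print(round(R1_23, 4), round(R2_13, 4), round(R3_12, 4))
```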

EXAMPLE 1: From the data relating to the yield of dry bark ($X_1$), height ($X_2$) and girth ($X_3$) for 18 cinchona plants, the following correlation coefficients were obtained: $r_{12} = 0.77$, $r_{13} = 0.72$ and $r_{23} = 0.52$. Find the partial correlation coefficient $r_{12.3}$ and the multiple correlation coefficient $R_{1.23}$.

SOLUTION: We have,

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} = \frac{0.77 - 0.72 \times 0.52}{\sqrt{(1 - 0.72^2)(1 - 0.52^2)}} = \frac{0.77 - 0.3744}{\sqrt{(1 - 0.5184)(1 - 0.2704)}} = \frac{0.3956}{\sqrt{0.35137536}} = \frac{0.3956}{0.5928} = 0.667$$

And,

$$R_{1.23}^2 = \frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{23}^2} = \frac{(0.77)^2 + (0.72)^2 - 2 \times 0.77 \times 0.72 \times 0.52}{1 - (0.52)^2} = \frac{0.534724}{1 - 0.2704} = 0.733$$

Hence, $R_{1.23} = \sqrt{0.733} = 0.856$.
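The arithmetic of Example 1 can be checked with the sketch below. `partial_r` implements the first-order partial correlation formula used above and `multiple_r` the formula for R1.23; both function names are our own.

```python
from math import sqrt

def partial_r(r_ab, r_ac, r_bc):
    """r_{ab.c}: correlation of a and b with c held constant."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

def multiple_r(r_ab, r_ac, r_bc):
    """R_{a.bc}: multiple correlation of a on b and c."""
    return sqrt((r_ab ** 2 + r_ac ** 2 - 2 * r_ab * r_ac * r_bc) / (1 - r_bc ** 2))

r12, r13, r23 = 0.77, 0.72, 0.52             # Example 1
print(round(partial_r(r12, r13, r23), 3))    # 0.667
print(round(multiple_r(r12, r13, r23), 3))   # 0.856
```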

EXAMPLE 2: The following zero-order correlation coefficients are given: $r_{12} = 0.98$, $r_{13} = 0.44$, $r_{23} = 0.54$. Calculate the multiple correlation coefficient treating $X_1$ as the dependent variable and the second ($X_2$) and third ($X_3$) variables as independent.

SOLUTION:

$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{23}^2}} = \sqrt{\frac{(0.98)^2 + (0.44)^2 - 2(0.98)(0.44)(0.54)}{1 - (0.54)^2}} = \sqrt{\frac{0.9604 + 0.1936 - 0.4657}{0.7084}} = \sqrt{0.9716} = 0.986$$

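As a cross-check on Example 2, R1.23 can also be obtained from the identity R²1.23 = r²12 + r²13.2 (1 - r²12), which builds the multiple correlation from the simple correlation with X2 and the partial correlation with X3 (X2 held constant). The sketch below compares the two routes; the helper name is our own.

```python
from math import sqrt

def partial_r(r_ab, r_ac, r_bc):
    """r_{ab.c}: partial correlation of a and b with c held constant."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

r12, r13, r23 = 0.98, 0.44, 0.54

# Direct formula for R_{1.23}
direct = sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2))

# Identity route: r_{13.2} holds X2 constant
r13_2 = partial_r(r13, r12, r23)
via_partial = sqrt(r12 ** 2 + r13_2 ** 2 * (1 - r12 ** 2))

print(round(direct, 3), round(via_partial, 3))  # 0.986 0.986
```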
2.15 ADVANTAGES AND LIMITATIONS OF MULTIPLE CORRELATION ANALYSIS:

Advantages:

1. The coefficient of multiple correlation serves as a measure of the degree of association between one variable taken as the dependent variable and a group of other variables taken as independent variables.

2. The coefficient of multiple correlation also serves as a measure of the goodness of fit of the calculated plane of regression and, consequently, as a measure of the general degree of accuracy of estimates made by reference to the equation for the plane of regression.

Solving (i), (ii) and (iii),
$b_{13.2} = -0.623$, $b_{12.3} = 0.389$ and $a_{1.23} = 16.479$
Hence the required regression equation is,
$$X_1 = 16.479 + 0.389X_2 - 0.623X_3$$

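EXAMPLE 4: Given the following, determine the regression equation of (i) $X_1$ on $X_2$ and $X_3$, and (ii) $X_2$ on $X_1$ and $X_3$, where

$r_{12} = 0.8$, $r_{13} = 0.6$, $r_{23} = 0.5$, $\sigma_1 = 10$, $\sigma_2 = 8$, $\sigma_3 = 5$.

SOLUTION: (i) The regression equation of $X_1$ on $X_2$ and $X_3$ is given by

$$X_1 = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3 \qquad (1)$$

If the variables $X_1$, $X_2$ and $X_3$ are measured as deviations from their respective means, $a_{1.23}$ vanishes and we write $x_1 = X_1 - \bar{X}_1$, $x_2 = X_2 - \bar{X}_2$ and $x_3 = X_3 - \bar{X}_3$. Then (1) transforms into

$$x_1 = b_{12.3}x_2 + b_{13.2}x_3 \qquad (2)$$

Then,

$$b_{12.3} = \frac{r_{12} - r_{13}r_{23}}{1 - r_{23}^2} \times \frac{\sigma_1}{\sigma_2} = \frac{0.8 - 0.6 \times 0.5}{1 - (0.5)^2} \times \frac{10}{8} = \frac{0.5}{0.75} \times \frac{10}{8} = \frac{20}{24} = 0.833$$

$$b_{13.2} = \frac{r_{13} - r_{12}r_{23}}{1 - r_{23}^2} \times \frac{\sigma_1}{\sigma_3} = \frac{0.6 - 0.8 \times 0.5}{1 - (0.5)^2} \times \frac{10}{5} = \frac{0.2}{0.75} \times 2 = 0.533$$

∴ the required regression equation is

$$x_1 = 0.833x_2 + 0.533x_3$$

(ii) The regression equation of $X_2$ on $X_1$ and $X_3$ is $x_2 = b_{21.3}x_1 + b_{23.1}x_3$, where

$$b_{21.3} = \frac{r_{12} - r_{13}r_{23}}{1 - r_{13}^2} \times \frac{\sigma_2}{\sigma_1} = \frac{0.8 - 0.5 \times 0.6}{1 - (0.6)^2} \times \frac{8}{10} = \frac{0.5}{0.64} \times \frac{8}{10} = 0.625$$

and

$$b_{23.1} = \frac{r_{23} - r_{12}r_{13}}{1 - r_{13}^2} \times \frac{\sigma_2}{\sigma_3} = \frac{0.5 - 0.6 \times 0.8}{1 - (0.6)^2} \times \frac{8}{5} = \frac{0.02}{0.64} \times \frac{8}{5} = 0.05$$

Hence the required regression equation of $X_2$ on $X_1$ and $X_3$ is

$$x_2 = 0.625x_1 + 0.05x_3$$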
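The four partial regression coefficients of Example 4 follow a single pattern, b_ab.c = (r_ab - r_ac r_bc)/(1 - r_bc²) × σa/σb, which the sketch below (function name our own) applies to the given data.

```python
def partial_b(r_ab, r_ac, r_bc, s_a, s_b):
    """b_{ab.c}: coefficient of X_b in the regression of X_a on X_b and X_c."""
    return (r_ab - r_ac * r_bc) / (1 - r_bc ** 2) * s_a / s_b

r12, r13, r23 = 0.8, 0.6, 0.5
s1, s2, s3 = 10, 8, 5

b12_3 = partial_b(r12, r13, r23, s1, s2)   # X1 on X2, X3 held constant
b13_2 = partial_b(r13, r12, r23, s1, s3)   # X1 on X3, X2 held constant
b21_3 = partial_b(r12, r23, r13, s2, s1)   # X2 on X1, X3 held constant
b23_1 = partial_b(r23, r12, r13, s2, s3)   # X2 on X3, X1 held constant

print(round(b12_3, 3), round(b13_2, 3), round(b21_3, 3), round(b23_1, 3))
# 0.833 0.533 0.625 0.05
```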
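EXAMPLE 5: The regression of $X_3$ on $X_1$ and $X_2$ is to be found, where $r_{12} = 0.28$, $r_{23} = 0.49$, $r_{31} = 0.51$, $\sigma_1 = 2.7$, $\sigma_2 = 2.4$ and $\sigma_3 = 2.7$.

SOLUTION: The regression equation of $X_3$ on $X_1$ and $X_2$ (with the variables measured as deviations from their means) is

$$x_3 = b_{31.2}x_1 + b_{32.1}x_2 \qquad (1)$$

where $b_{31.2} = r_{31.2}\,\dfrac{\sigma_{3.12}}{\sigma_{1.23}}$ and $b_{32.1} = r_{32.1}\,\dfrac{\sigma_{3.12}}{\sigma_{2.13}}$.

Now,

$$r_{31.2} = \frac{r_{13} - r_{23}r_{12}}{\sqrt{(1 - r_{23}^2)(1 - r_{12}^2)}} = \frac{0.51 - 0.49 \times 0.28}{\sqrt{(1 - 0.49^2)(1 - 0.28^2)}} = 0.4455$$

$$r_{32.1} = \frac{r_{23} - r_{13}r_{12}}{\sqrt{(1 - r_{13}^2)(1 - r_{12}^2)}} = \frac{0.49 - 0.51 \times 0.28}{\sqrt{(1 - 0.51^2)(1 - 0.28^2)}} = 0.4205$$

$$\sigma_{3.12} = \sigma_3\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{31.2}^2} = 2.7\sqrt{1 - 0.49^2}\,\sqrt{1 - 0.4455^2} = 2.1072$$

$$\sigma_{1.23} = \sigma_1\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2} = 2.7\sqrt{1 - 0.28^2}\,\sqrt{1 - 0.4455^2} = 2.3206$$

(using $r_{13.2} = r_{31.2} = 0.4455$). Also,

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} = \frac{0.28 - 0.51 \times 0.49}{\sqrt{(1 - 0.51^2)(1 - 0.49^2)}} = 0.0401$$

$$\sigma_{2.13} = \sigma_2\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{12.3}^2} = 2.4\sqrt{1 - 0.49^2}\,\sqrt{1 - 0.0401^2} = 2.0904$$

Therefore,

$$b_{31.2} = 0.4455 \times \frac{2.1072}{2.3206} = 0.4045, \qquad b_{32.1} = 0.4205 \times \frac{2.1072}{2.0904} = 0.4239$$

Hence, the required regression line is

$$x_3 = 0.4045x_1 + 0.4239x_2$$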
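The route through partial correlations and partial standard deviations used in Example 5 is algebraically equivalent to the direct formula b31.2 = (r13 - r12 r23)/(1 - r12²) × σ3/σ1 (and similarly for b32.1). The sketch below computes both at full precision; they agree with each other, and with the worked values to within rounding. Variable names are our own.

```python
from math import sqrt

def partial_r(r_ab, r_ac, r_bc):
    """r_{ab.c}: partial correlation of a and b with c held constant."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

r12, r13, r23 = 0.28, 0.51, 0.49
s1, s2, s3 = 2.7, 2.4, 2.7

# Route used in the text: partial correlations and partial S.D.s
r31_2 = partial_r(r13, r23, r12)
r32_1 = partial_r(r23, r13, r12)
r12_3 = partial_r(r12, r13, r23)
s3_12 = s3 * sqrt(1 - r23 ** 2) * sqrt(1 - r31_2 ** 2)
s1_23 = s1 * sqrt(1 - r12 ** 2) * sqrt(1 - r31_2 ** 2)   # r13.2 = r31.2
s2_13 = s2 * sqrt(1 - r23 ** 2) * sqrt(1 - r12_3 ** 2)
b31_2 = r31_2 * s3_12 / s1_23
b32_1 = r32_1 * s3_12 / s2_13

# Direct route from the zero-order correlations
b31_2_direct = (r13 - r12 * r23) / (1 - r12 ** 2) * s3 / s1
b32_1_direct = (r23 - r12 * r13) / (1 - r12 ** 2) * s3 / s2

print(round(b31_2, 4), round(b31_2_direct, 4))
print(round(b32_1, 4), round(b32_1_direct, 4))
```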

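EXAMPLE 6: In a trivariate distribution, $\sigma_1 = 2$, $\sigma_2 = \sigma_3 = 3$, $r_{12} = 0.7$, $r_{23} = r_{31} = 0.5$. Find (i) $b_{12.3}$ and (ii) $b_{13.2}$.

SOLUTION:
(i) $b_{12.3} = r_{12.3}\,\dfrac{\sigma_{1.3}}{\sigma_{2.3}}$. Now,

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} = \frac{0.7 - (0.5)(0.5)}{\sqrt{(1 - 0.5^2)(1 - 0.5^2)}} = \frac{0.45}{0.75} = 0.6$$

$$\sigma_{1.3} = \sigma_1\sqrt{1 - r_{13}^2} = 2\sqrt{1 - 0.25} = 1.732, \qquad \sigma_{2.3} = \sigma_2\sqrt{1 - r_{23}^2} = 3\sqrt{1 - 0.25} = 2.598$$

Then,

$$b_{12.3} = 0.6 \times \frac{1.732}{2.598} = 0.4$$

(ii) $b_{13.2} = r_{13.2}\,\dfrac{\sigma_{1.2}}{\sigma_{3.2}}$, where

$$r_{13.2} = \frac{r_{13} - r_{12}r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}} = \frac{0.5 - 0.7 \times 0.5}{\sqrt{(1 - 0.7^2)(1 - 0.5^2)}} = \frac{0.15}{0.6185} = 0.2425$$

$$\sigma_{1.2} = \sigma_1\sqrt{1 - r_{12}^2} = 2\sqrt{1 - 0.49} = 1.428, \qquad \sigma_{3.2} = \sigma_3\sqrt{1 - r_{23}^2} = 3\sqrt{1 - 0.25} = 2.598$$

$$b_{13.2} = 0.2425 \times \frac{1.428}{2.598} = 0.133$$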

EXAMPLE 7: The correlation coefficient between a general intelligence test and school achievement in a group of children from 8 to 14 years of age is 0.80. The correlation between the general intelligence test and age in the same group is 0.70, and the correlation between school achievement and age is 0.60. What is the correlation between general intelligence and school achievement in children of the same age?

SOLUTION: We are given:

Correlation between the general intelligence test and school achievement, $r_{12} = 0.80$

Correlation between the general intelligence test and age, $r_{13} = 0.70$

Correlation between school achievement and age, $r_{23} = 0.60$

We have to find the correlation between general intelligence and school achievement in children of the same age, i.e. with age held constant:

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} = \frac{0.8 - 0.7 \times 0.6}{\sqrt{(1 - 0.49)(1 - 0.36)}} = \frac{0.8 - 0.42}{\sqrt{0.3264}} = \frac{0.38}{0.5713} = 0.665$$

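EXAMPLE 8 : An instructor of mathematics wishes to determine the relationship of grades on the final
examination to grades on the quizzes given during the semester. Calling X1, X2, X3 the
grades of a student on the first quiz, second quiz and final examination respectively, he
made the following computations for a total of 120 students:

X̄1 = 6.8; X̄2 = 7.0; X̄3 = 74

S1 = 1.0; S2 = 0.80; S3 = 9.0

r12 = 0.60; r13 = 0.70; r23 = 0.65.

(i) Find the least squares regression equation of X3 on X1 and X2.

(ii) Estimate the final grades of two students who scored respectively 9 and 7, and 4 and 8,
in the quizzes.

SOLUTION : (i) The regression equation of X3 on X1 and X2 is

$$X_3 - \bar X_3 = \frac{r_{13} - r_{12}\,r_{23}}{1 - r_{12}^2}\cdot\frac{S_3}{S_1}(X_1 - \bar X_1) + \frac{r_{23} - r_{12}\,r_{13}}{1 - r_{12}^2}\cdot\frac{S_3}{S_2}(X_2 - \bar X_2)$$

$$X_3 - 74 = \frac{0.70 - 0.60 \times 0.65}{1 - 0.36}\cdot\frac{9.0}{1.0}(X_1 - 6.8) + \frac{0.65 - 0.60 \times 0.70}{1 - 0.36}\cdot\frac{9.0}{0.80}(X_2 - 7.0)$$

$$X_3 - 74 = 4.36\,(X_1 - 6.8) + 4.04\,(X_2 - 7.0)$$

i.e.

$$X_3 = 16.07 + 4.36\,X_1 + 4.04\,X_2$$

(ii) Final grade of the student who scored 9 and 7 marks, i.e. X1 = 9, X2 = 7:

$$X_3 = 16.07 + 4.36(9) + 4.04(7) = 83.59 \approx 84$$

Final grade of the student who scored 4 and 8 marks, i.e. X1 = 4, X2 = 8:

$$X_3 = 16.07 + 4.36(4) + 4.04(8) = 65.83 \approx 66$$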
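The standard formula for the regression of X3 on X1 and X2 uses only means, standard deviations and simple correlations. A minimal Python sketch of it is shown here, with the Example 8 summary figures. The function name is hypothetical. Carrying the coefficients unrounded gives an intercept of about 16.06 instead of 16.07, and the estimated grades still round to 84 and 66.

```python
def regression_x3_on_x1_x2(means, sds, r12, r13, r23):
    """Least-squares plane X3 = a + b1*X1 + b2*X2 from means, standard deviations
    and simple correlations."""
    m1, m2, m3 = means
    s1, s2, s3 = sds
    denom = 1 - r12 ** 2
    b1 = (r13 - r12 * r23) / denom * (s3 / s1)  # partial regression coefficient of X1
    b2 = (r23 - r12 * r13) / denom * (s3 / s2)  # partial regression coefficient of X2
    a = m3 - b1 * m1 - b2 * m2                  # the plane passes through the means
    return a, b1, b2


a, b1, b2 = regression_x3_on_x1_x2((6.8, 7.0, 74.0), (1.0, 0.80, 9.0), 0.60, 0.70, 0.65)
print(f"X3 = {a:.2f} + {b1:.2f} X1 + {b2:.2f} X2")
for x1, x2 in [(9, 7), (4, 8)]:
    print(f"quiz marks {x1} and {x2}: estimated final grade {a + b1 * x1 + b2 * x2:.2f}")
```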

EXAMPLE 9 : Suppose a computer has found, for a given set of values of X1, X2 and X3, r12 = 0.91,
r13 = 0.33 and r23 = 0.81.

Explain whether these computations may be said to be free from errors.

SOLUTION :

$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} = \frac{0.91 - 0.33 \times 0.81}{\sqrt{1 - 0.1089}\,\sqrt{1 - 0.6561}} = \frac{0.91 - 0.2673}{\sqrt{0.8911}\,\sqrt{0.3439}} = \frac{0.6427}{0.5536} = 1.161$$

Since the value of r12.3 cannot exceed one, the computations given in the question are not free from
errors.
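A quick screen for this kind of error is sketched below. It assumes that all three coefficients come from the same data. In that case the 3 × 3 correlation matrix must have a non-negative determinant. Algebraically, (1 − r13²)(1 − r23²)(1 − r12.3²) equals that determinant, so |r12.3| exceeds 1 exactly when the determinant is negative. The helper names are illustrative.

```python
import math


def partial_corr(r12, r13, r23):
    """First-order partial correlation r12.3."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def corr_matrix_det(r12, r13, r23):
    """Determinant of [[1, r12, r13], [r12, 1, r23], [r13, r23, 1]]."""
    return 1 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2 * r12 * r13 * r23


r12, r13, r23 = 0.91, 0.33, 0.81  # values reported in Example 9
p = partial_corr(r12, r13, r23)
d = corr_matrix_det(r12, r13, r23)
print(f"r12.3 = {p:.3f}, determinant = {d:.4f}")
if d < 0:
    print("inconsistent: no real data set can have these three correlations")
```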
EXAMPLE 10: Find the regression equation of X3 on X1 and X2 and estimate X3 when X1 = 10 and
X2 = 6 from the following table:

X1     3     5     6     8    12    14
X2    16    10     7     4     3     2
X3    90    72    54    42    30    12

SOLUTION: The regression equation of X3 on X1 and X2 can be written as

$$X_3 - \bar X_3 = \left(\frac{r_{13} - r_{12}\,r_{23}}{1 - r_{12}^2}\right)\frac{\sigma_3}{\sigma_1}(X_1 - \bar X_1) + \left(\frac{r_{23} - r_{12}\,r_{13}}{1 - r_{12}^2}\right)\frac{\sigma_3}{\sigma_2}(X_2 - \bar X_2)$$

Here X̄1 = 48/6 = 8, X̄2 = 42/6 = 7 and X̄3 = 300/6 = 50. Write x1 = X1 - 8, x2 = X2 - 7 and x3 = X3 - 50.

X1    x1    x1²    X2    x2    x2²
 3    -5     25    16     9     81
 5    -3      9    10     3      9
 6    -2      4     7     0      0
 8     0      0     4    -3      9
12     4     16     3    -4     16
14     6     36     2    -5     25
ΣX1 = 48, Σx1 = 0, Σx1² = 90;  ΣX2 = 42, Σx2 = 0, Σx2² = 140

X3    x3    x3²     x1x2    x2x3    x1x3
90    40    1600     -45     360    -200
72    22     484      -9      66     -66
54     4      16       0       0      -8
42    -8      64       0      24       0
30   -20     400     -16      80     -80
12   -38    1444     -30     190    -228
ΣX3 = 300, Σx3 = 0, Σx3² = 4008, Σx1x2 = -100, Σx2x3 = 720, Σx1x3 = -582

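Now,

$$\sigma_3 = \sqrt{\frac{\sum x_3^2}{N}} = \sqrt{\frac{4008}{6}} = \sqrt{668} = 25.85, \qquad \sigma_1 = \sqrt{\frac{90}{6}} = 3.87, \qquad \sigma_2 = \sqrt{\frac{140}{6}} = 4.83$$

$$r_{12} = \frac{\sum x_1 x_2}{\sqrt{\sum x_1^2 \sum x_2^2}} = \frac{-100}{\sqrt{90 \times 140}} = -0.891$$

$$r_{13} = \frac{\sum x_1 x_3}{\sqrt{\sum x_1^2 \sum x_3^2}} = \frac{-582}{\sqrt{90 \times 4008}} = -0.969$$

$$r_{23} = \frac{\sum x_2 x_3}{\sqrt{\sum x_2^2 \sum x_3^2}} = \frac{720}{\sqrt{140 \times 4008}} = 0.961$$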

Then the regression equation of X3 on X1 and X2 is

$$X_3 - 50 = \left[\frac{-0.969 - (-0.891)(0.961)}{1 - (-0.891)^2}\right]\frac{25.85}{3.87}(X_1 - 8) + \left[\frac{0.961 - (-0.891)(-0.969)}{1 - (-0.891)^2}\right]\frac{25.85}{4.83}(X_2 - 7)$$

or

$$X_3 - 50 = -3.654\,(X_1 - 8) + 2.535\,(X_2 - 7)$$

or

$$X_3 = 61.49 - 3.654\,X_1 + 2.535\,X_2$$

When X1 = 10 and X2 = 6, the estimated value of X3 will be

$$X_3 = 61.49 - 3.654(10) + 2.535(6) = 61.49 - 36.54 + 15.21 = 40.16 \approx 40$$
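A minimal Python sketch, assuming only the standard library, solves the same problem directly from the data of Example 10 (X1 = 3, 5, 6, 8, 12, 14; X2 = 16, 10, 7, 4, 3, 2; X3 = 90, 72, 54, 42, 30, 12). It applies Cramer's rule to the two normal equations in deviation form:

- Σx1² b1 + Σx1x2 b2 = Σx1x3
- Σx1x2 b1 + Σx2² b2 = Σx2x3

Working from unrounded sums gives roughly X3 = 61.40 - 3.646 X1 + 2.538 X2 and an estimate of about 40.17 at X1 = 10, X2 = 6. This is close to the hand computation. The small differences come from rounding the correlation coefficients and standard deviations. The function name is illustrative.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares fit y = a + b1*x1 + b2*x2 via the normal equations in deviation form."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))
    # s11*b1 + s12*b2 = s1y and s12*b1 + s22*b2 = s2y, solved by Cramer's rule
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


X1 = [3, 5, 6, 8, 12, 14]
X2 = [16, 10, 7, 4, 3, 2]
X3 = [90, 72, 54, 42, 30, 12]
a, b1, b2 = fit_two_predictors(X1, X2, X3)
print(f"X3 = {a:.2f} + ({b1:.3f}) X1 + ({b2:.3f}) X2")
print(f"estimate at X1 = 10, X2 = 6: {a + b1 * 10 + b2 * 6:.2f}")
```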

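EXAMPLE No. 11: If r12 = 0.65, r13 = 0.6 and r23 = 0.4, calculate the value of r23.1.

SOLUTION :

$$r_{23.1} = \frac{r_{23} - r_{12}\,r_{13}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13}^2}} = \frac{0.4 - 0.65 \times 0.6}{\sqrt{1 - 0.4225}\,\sqrt{1 - 0.36}} = \frac{0.01}{0.608} = 0.016, \text{ i.e. about } 0.02$$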
EXAMPLE No. 12 : The simple correlation coefficients between temperature (X1), corn yield (X2) and
rainfall (X3) are r12 = 0.59, r13 = 0.46 and r23 = 0.77. Calculate the partial correlation
coefficient r12.3 and the multiple correlation coefficient R1.23.

SOLUTION : We have r12 = 0.59, r13 = 0.46 and r23 = 0.77.

$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} = \frac{0.59 - 0.46 \times 0.77}{\sqrt{1 - (0.46)^2}\,\sqrt{1 - (0.77)^2}} = \frac{0.2358}{\sqrt{0.7884}\,\sqrt{0.4071}} = \frac{0.2358}{0.8879 \times 0.6380} = \frac{0.2358}{0.5665} = 0.4162$$

Again,

$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}} = \sqrt{\frac{(0.59)^2 + (0.46)^2 - 2(0.59)(0.46)(0.77)}{1 - (0.77)^2}} = \sqrt{\frac{0.3481 + 0.2116 - 0.4180}{1 - 0.5929}} = \sqrt{\frac{0.1417}{0.4071}} = \sqrt{0.348} = 0.590$$
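The two results of Example 12 can be reproduced with the short Python sketch below. The helper names are illustrative. The sketch also checks the standard identity 1 - R1.23² = (1 - r13²)(1 - r12.3²), which links the partial and multiple correlation coefficients and gives an independent arithmetic check.

```python
import math


def partial_corr(r12, r13, r23):
    """First-order partial correlation r12.3."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def multiple_corr_1_23(r12, r13, r23):
    """Multiple correlation R1.23 of X1 on X2 and X3 from the simple correlations."""
    return math.sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2))


r12, r13, r23 = 0.59, 0.46, 0.77  # temperature, corn yield, rainfall
p = partial_corr(r12, r13, r23)
R = multiple_corr_1_23(r12, r13, r23)
print(f"r12.3 = {p:.4f}, R1.23 = {R:.3f}")
print("identity holds:", math.isclose(1 - R ** 2, (1 - r13 ** 2) * (1 - p ** 2)))
```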
2.18 Time Series Analysis Method

Time series refer to numerical data recorded at successive intervals of time in the past.

A time series is an arrangement of numerical data in chronological order. Time series data show
certain definite patterns, which can be meaningfully analysed for the purpose of projection. There are several ways
of using time series data for forecasting purposes, such as:

(i) Extrapolation of sales patterns into the future. Current sales levels are taken as the base, and
projections are made on the assumption that the pattern will continue in future. Extrapolative
forecasting involves determining a trend curve appropriate for the product being
forecast, from which projections can be made.

(ii) Time series smoothing, which is done to minimize the influence of extreme values in the historical
data that might have been caused by random factors. The smoothing process brings out
the underlying pattern in the time series data. The underlying pattern may be horizontal or
may involve some fluctuations or a trend, i.e. a steady increase or a steady decrease. The
techniques of simple moving averages and weighted moving averages are used for
smoothing horizontal patterns. Where there is an evident trend in the time series, the least
squares method is used.

(iii) Analysis of the various components of the time series: the trend, seasonal, cyclical and random or erratic
fluctuations. The trend factor is the long-term underlying movement of the time series. It may
be a steady-state trend or a growing or declining trend. The cyclical factor is the periodic
ups and downs in the observations, forming a cycle every few years.

The seasonal factor is the periodic pattern in the data during the course of a year. The random
factor arises out of erratic events which do not occur frequently. Here the techniques of moving averages and
simple linear or non-linear regression equations can be applied to isolate the above factors, and they can
be cumulatively analysed to forecast future sales.

Time series may be used for both short-term and long-term forecasts, but they are more useful for
short-term forecasts.

2.19 TIME SERIES


A Time Series is a set of observations taken at specified times, usually (but not
always) at equal intervals. Thus a set of data depending on time (which may be a year,
quarter, month, week, day, etc.) is called a Time Series.
Examples of Time Series are:
• The annual production of tea/coffee in India over the last 10 years;
• The monthly sales of a chemical industry for the last 6 months;
• The daily closing price of a share on the Bombay Stock Exchange;
• Yearly price or quantity index numbers.
Analysis of Time Series helps us to understand the past behaviour of time series
data; one can understand the changes that took place in the past. With the knowledge
of the past behaviour, it would be possible, within certain limits, to forecast
the probable future variations (or movements) of such data. Thus it helps in planning
future operations.

With the help of Time Series Analysis, we can compare the actual
performance with the expected performance and analyse the causes of variation.

Analysis of Time Series shows that the observed values of the variable
fluctuate from time to time. The fluctuations are due to various factors (or
forces), like changes in the habits and tastes of people, weather conditions, etc. Under the
action of these forces, the values of the variable change with time.
The object of time series analysis is to isolate and ascertain these forces (i.e.
the various components).

2.20 COMPONENTS OF TIME SERIES


Fluctuations in a Time Series are mainly due to four basic types of
variations.

These four types of movements are called the four components or elements
of Time Series.

The four components are:

1. Secular Trend or Trend (T).
2. Seasonal Variation (S).
3. Cyclical Variation or Cyclical Fluctuation (C).
4. Irregular or Random Movement (I).

The changes in Time Series data are the result of the combined effect of these four
components.

In traditional or classical time series analysis, a multiplicative
relationship between the four components is usually assumed, i.e. any particular
observation is considered to be the product of the effects of the four components. Symbolically,

$$Y = T \times S \times C \times I$$

where Y = the result of the four components (or original data).

Instead of the multiplicative model, some statisticians may prefer the additive model

$$Y = T + S + C + I$$

in which Y is the sum of the four components.
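Taking logarithms of both sides shows how the two models are related. Provided every component is positive, the multiplicative model becomes an additive model on the logarithmic scale:

$$\log Y = \log T + \log S + \log C + \log I$$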

2.212 SEMI-AVERAGE METHOD
In the Semi-Average Method, the given data is first divided into two parts (preferably equal), and
an average (i.e. A.M.) for each part is found. These two averages are then plotted on graph paper as
points against the mid-points of the time intervals covered by the respective two parts. The two points
are joined by a straight line. This straight line is the required trend line, and the distances of the line from
the horizontal axis OX give the trend values.

Although this method is simple to apply, it may lead to poor results when used indiscriminately. It
is applicable only where the trend is linear or approximately linear.

EXAMPLE 1 : Draw a trend line by the Semi-Average Method using the following data:

Year                                  1973   1974   1975   1976   1977   1978
Production of Steel (lakh tonnes)      253    260    255    263    259    264

SOLUTION : The average production of steel for the first three years

$$= \frac{253 + 260 + 255}{3} = \frac{768}{3} = 256 \text{ lakh tonnes}$$

and the average production for the last three years

$$= \frac{263 + 259 + 264}{3} = \frac{786}{3} = 262 \text{ lakh tonnes}$$

Thus we get two points, 256 and 262, which are plotted against the respective middle years (mid-points)
1974 and 1977 of the two parts 1973-75 and 1976-78. By joining these two points, the required trend line is
obtained (see fig 8.2 given below).

[Fig. 8.2: Trend line by the semi-average method, production of steel (lakh tonnes) against year]
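The semi-average calculation in Example 1 can be sketched in Python as follows. This is a minimal illustration. It assumes an even number of equally spaced years so that the two halves are equal, and the function name is hypothetical. The slope between the two plotted points (here 2 lakh tonnes per year) gives the trend value for every year on the line.

```python
def semi_average_trend(years, values):
    """Semi-average trend: mean of each half plotted at the mid-point of that half."""
    n = len(values)
    if n % 2:
        raise ValueError("this sketch assumes an even number of observations")
    half = n // 2
    avg1 = sum(values[:half]) / half
    avg2 = sum(values[half:]) / half
    mid1 = sum(years[:half]) / half  # middle year of the first part
    mid2 = sum(years[half:]) / half  # middle year of the second part
    slope = (avg2 - avg1) / (mid2 - mid1)  # change in trend per year
    return (mid1, avg1), (mid2, avg2), slope


years = [1973, 1974, 1975, 1976, 1977, 1978]
steel = [253, 260, 255, 263, 259, 264]
p1, p2, slope = semi_average_trend(years, steel)
trend = {y: p1[1] + slope * (y - p1[0]) for y in years}
print(p1, p2, slope)
print(trend)
```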
2.213 MOVING AVERAGE METHOD
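For a given sequence of numbers Y1, Y2, Y3, ..., we define moving totals of order N by the sums

$$Y_1 + Y_2 + \cdots + Y_N,\quad Y_2 + Y_3 + \cdots + Y_{N+1},\quad Y_3 + Y_4 + \cdots + Y_{N+2},\quad \ldots$$

and moving averages of order N by the sequence of arithmetic means

$$\frac{Y_1 + Y_2 + \cdots + Y_N}{N},\quad \frac{Y_2 + Y_3 + \cdots + Y_{N+1}}{N},\quad \frac{Y_3 + Y_4 + \cdots + Y_{N+2}}{N},\quad \ldots$$
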
In the Moving Average Method, a series of moving averages of a specific order is calculated. Starting
from the beginning of the given series, an average is calculated over a specific number of years (for yearly data) or
time intervals (called the period). This average is placed against the mid-point of the time interval. Keeping
the period fixed, the process is repeated by dropping the first yearly figure of the given values and adding
the figure of the next year not yet included. We continue in this way till the end of the series is
reached.

If the period of the moving average is odd, the moving totals and moving averages correspond to the
given years. But if the period is even, a two-point moving average of the moving averages must be
found to centre them, i.e. to synchronize the moving averages with the original data (see Example
2(ii) given below).

This method is commonly used for measuring trend. By using moving averages of appropriate
orders, cyclical, seasonal and irregular movements may be eliminated, leaving only the trend
movement.

If the moving averages are strongly affected by extreme values, a weighted moving average with
appropriate weights is sometimes used.

Advantages

(i) This method is used to measure trend, seasonal, cyclical and irregular fluctuations.
(ii) The moving average method is easy to apply, as it does not involve any difficult
calculation.
(iii) If an appropriate period is chosen (i.e., if the period of the moving average coincides with
the period of the cyclical fluctuations), then these fluctuations are automatically eliminated
from the data by this method.
(iv) The choice of the period of the moving average is made by observing the oscillatory
movements in the data and not by the personal judgement of the statistician.
(v) This method is quite flexible in the sense that when a few more observations are added to
the given data, the trend values already obtained are not affected; only some more trend
values are added to the series.

Limitations (or Disadvantages)

(i) Some trend values at the beginning and at the end of the series cannot be determined.
(ii) It is not easy to determine the period of the moving average when the oscillatory movement
does not exhibit any regular periodic cycle.
(iii) This method cannot be used to forecast future trend values, as the moving averages do not
follow any mathematical law.
(iv) This method is suitable only for finding a linear trend. Non-linear trend values obtained by this
method are biased and deviate from the actual trend values.
(v) This method may generate cycles or other movements which were not present in the
original data.

Example 2.
(i) Obtain the five-year moving averages for the following series of observations:

Year                       1967   1968   1969   1970   1971   1972   1973   1974
Annual Sales (Rs. '000)     3.6    4.3    4.3    3.4    4.4    5.4    3.4    2.4

(ii) Construct also the 4-year centered moving average.

SOLUTION.
(i) TABLE 1
CALCULATIONS OF 5-YEAR MOVING AVERAGES

Year    Annual Sales    5-year moving    5-year moving average
(1)     (Rs. '000)      total            (Rs. '000)
        (2)             (3)              (4) = (3) ÷ 5
1967    3.6             -                -
1968    4.3             -                -
1969    4.3             20.0             4.00
1970    3.4             21.8             4.36
1971    4.4             20.9             4.18
1972    5.4             19.0             3.80
1973    3.4             -                -
1974    2.4             -                -
Note that the first moving total 20.0 in column 3 is the sum of the first 5 values 3.6, 4.3,
4.3, 3.4, 4.4. The second moving total is 4.3 + 4.3 + 3.4 + 4.4 + 5.4 = 21.8. It can also be easily
obtained by adding (5.4 - 3.6), i.e. 1.8, to the first moving total. Similarly, the 3rd moving total is
21.8 + (3.4 - 4.3) = 20.9, and so on.
NOTE: The five-year moving averages (or trend values) for the years 1969-1972 are shown in
column 4. (Note that the moving averages correspond to the given years.) For the other years 1967, 1968
and 1973, 1974, moving averages cannot be determined.

(ii) First Method

TABLE 2 CALCULATIONS OF 4-YEAR CENTERED MOVING AVERAGES
Year   Annual Sales   4-year moving   2-point moving total    4-year centered moving
(1)    (Rs. '000)     total           of col. 3 (centered)    average (Rs. '000)
       (2)            (3)             (4)                     (5) = (4) ÷ 8
1967   3.6            -               -                       -
1968   4.3            -               -                       -
                      15.6
1969   4.3                            32.0                    4.00
                      16.4
1970   3.4                            33.9                    4.24
                      17.5
1971   4.4                            34.1                    4.26
                      16.6
1972   5.4                            32.2                    4.03
                      15.6
1973   3.4            -               -                       -
1974   2.4            -               -                       -

In the above table, the 4-year moving totals in col. 3 are shown against the mid-points of the time intervals.
As the moving totals do not correspond to the given years, 2-point moving totals of col. 3 are found
in col. 4 to centre them (i.e., to synchronize them with the original data).

Second Method
TABLE 3 CALCULATIONS FOR 4-YEAR CENTERED MOVING AVERAGES
Year   Annual Sales   4-year moving   4-year moving   2-year moving total    4-year centered
(1)    (Rs. '000)     total           average         of col. 4 (centered)   moving average
       (2)            (3)             (4)             (5)                    (6) = (5) ÷ 2
1967   3.6            -               -               -                      -
1968   4.3            -               -               -                      -
                      15.6            3.9
1969   4.3                                            8.0                    4.0
                      16.4            4.1
1970   3.4                                            8.5                    4.2
                      17.5            4.4
1971   4.4                                            8.6                    4.3
                      16.6            4.2
1972   5.4                                            8.1                    4.0
                      15.6            3.9
1973   3.4            -               -               -                      -
1974   2.4            -               -               -                      -
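The moving totals and averages of Example 2 can be reproduced with the Python sketch below. The function names are illustrative. The centered version follows the first method: 2-point moving totals of the 4-year totals, divided by 8. Averaging pairs of 4-year moving averages (the second method) gives exactly the same numbers before rounding, because (t1/4 + t2/4)/2 = (t1 + t2)/8. The one-decimal figures of the second method differ only because of intermediate rounding.

```python
def moving_totals(values, n):
    """Moving totals of order n: the sum of every run of n consecutive values."""
    return [sum(values[i:i + n]) for i in range(len(values) - n + 1)]


def moving_averages(values, n):
    """Moving averages of order n (each moving total divided by n)."""
    return [t / n for t in moving_totals(values, n)]


def centered_moving_averages(values, n):
    """Centered moving averages for an even period n: 2-point moving totals of the
    n-period moving totals, divided by 2n."""
    totals = moving_totals(values, n)
    return [(a + b) / (2 * n) for a, b in zip(totals, totals[1:])]


sales = [3.6, 4.3, 4.3, 3.4, 4.4, 5.4, 3.4, 2.4]  # Example 2, 1967-1974, Rs. '000
print([round(v, 2) for v in moving_averages(sales, 5)])           # trend values, 1969-1972
print([round(v, 3) for v in centered_moving_averages(sales, 4)])  # trend values, 1969-1972
```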

EXAMPLE 3.

Find the trend of the following series using a three-year weighted moving average with weights 1, 2, 1:

Year:      1    2    3    4    5    6    7
Values:    2    4    5    7    8   10   13

SOLUTION.

TABLE 4 CALCULATIONS OF 3-YEAR WEIGHTED MOVING AVERAGE

Year   Values   3-year weighted moving total     3-year weighted moving average
(1)    (2)      (3)                              (4) = (3) ÷ 4
1      2        -                                -
2      4        2×1 + 4×2 + 5×1 = 15             3.75
3      5        4×1 + 5×2 + 7×1 = 21             5.25
4      7        5×1 + 7×2 + 8×1 = 27             6.75
5      8        7×1 + 8×2 + 10×1 = 33            8.25
6      10       8×1 + 10×2 + 13×1 = 41           10.25
7      13       -                                -
Col. 4 = Col. 3 ÷ total weight, where total weight = 1 + 2 + 1 = 4.
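The 1, 2, 1 weighted moving average of Table 4 can be sketched in Python as follows. The function name is hypothetical. Requiring an odd number of weights, so that each result sits against the middle year of its window, is an assumption of this sketch.

```python
def weighted_moving_average(values, weights):
    """Weighted moving average; each result belongs to the middle year of its window."""
    k = len(weights)
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd number of weights")
    total_weight = sum(weights)
    return [
        sum(w * v for w, v in zip(weights, values[i:i + k])) / total_weight
        for i in range(len(values) - k + 1)
    ]


print(weighted_moving_average([2, 4, 5, 7, 8, 10, 13], [1, 2, 1]))
# [3.75, 5.25, 6.75, 8.25, 10.25]
```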

EXAMPLE 4.

For the following series of observations, verify that the 4-year centered moving average is
equivalent to a 5-year weighted moving average with weights 1, 2, 2, 2, 1 respectively:

Year                  1   2   3   4   5   6   7   8   9   10   11
Sales (Rs. '000)      2   6   1   5   3   7   2   6   4    8    3

SOLUTION.

TABLE 5 CALCULATIONS OF 4-YEAR CENTERED MOVING AVERAGE

Year   Annual Sales   4-year moving   2-year moving       4-year centered
(1)    (Rs. '000)     total           total of col. 3     moving average
       (2)            (3)             (4)                 (5) = (4) ÷ 8
1      2              -               -                   -
2      6              -               -                   -
                      14
3      1                              29                  3.625
                      15
4      5                              31                  3.875
                      16
5      3                              33                  4.125
                      17
6      7                              35                  4.375
                      18
7      2                              37                  4.625
                      19
8      6                              39                  4.875
                      20
9      4                              41                  5.125
                      21
10     8              -               -                   -
11     3              -               -                   -
TABLE 6 CALCULATION OF 5-YEAR WEIGHTED MOVING AVERAGE

Year   Annual Sales   5-year weighted moving total          5-year weighted moving
(1)    (Rs. '000)     (3)                                   average (4) = (3) ÷ 8
       (2)
1      2              -                                     -
2      6              -                                     -
3      1              2×1 + 6×2 + 1×2 + 5×2 + 3×1 = 29      3.625
4      5              6×1 + 1×2 + 5×2 + 3×2 + 7×1 = 31      3.875
5      3              1×1 + 5×2 + 3×2 + 7×2 + 2×1 = 33      4.125
6      7              5×1 + 3×2 + 7×2 + 2×2 + 6×1 = 35      4.375
7      2              3×1 + 7×2 + 2×2 + 6×2 + 4×1 = 37      4.625
8      6              7×1 + 2×2 + 6×2 + 4×2 + 8×1 = 39      4.875
9      4              2×1 + 6×2 + 4×2 + 8×2 + 3×1 = 41      5.125
10     8              -                                     -
11     3              -                                     -

From the last columns of Tables 5 and 6, we see that the 4-year centered moving average is
equivalent to a 5-year weighted moving average with weights 1, 2, 2, 2, 1 respectively.
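The equivalence also follows algebraically. Adding two consecutive 4-year totals gives (Y1 + Y2 + Y3 + Y4) + (Y2 + Y3 + Y4 + Y5) = Y1 + 2Y2 + 2Y3 + 2Y4 + Y5. Dividing by 8 is exactly the 1, 2, 2, 2, 1 weighted average. The Python sketch below checks this numerically on the data of Example 4. The function names are illustrative.

```python
def moving_totals(values, n):
    """Sums of every run of n consecutive values."""
    return [sum(values[i:i + n]) for i in range(len(values) - n + 1)]


def centered_4_year(values):
    """2-point moving totals of 4-year moving totals, divided by 8."""
    t = moving_totals(values, 4)
    return [(a + b) / 8 for a, b in zip(t, t[1:])]


def weighted_5_year(values, weights=(1, 2, 2, 2, 1)):
    """5-year weighted moving average with the given weights."""
    k, total = len(weights), sum(weights)
    return [sum(c * v for c, v in zip(weights, values[i:i + k])) / total
            for i in range(len(values) - k + 1)]


sales = [2, 6, 1, 5, 3, 7, 2, 6, 4, 8, 3]
centered = centered_4_year(sales)
weighted = weighted_5_year(sales)
print(centered)
print(centered == weighted)  # True
```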

2.214 METHOD OF LEAST SQUARES

This method is widely used for the measurement of trend.

Linear Trend
Let (X1, Y1), (X2, Y2), ..., (XN, YN) be N pairs of observations, where Yi represents the time series
and Xi represents time. Suppose the equation of the straight line to be fitted to the time series data by the
Method of Least Squares is

Y = a + bX        (1)

For a given value of X, say X1, the corresponding value of Y obtained from (1) is a + bX1. The
difference E1 = Y1 - (a + bX1), or Y1 - a - bX1, which may be positive, negative or zero, is called an error
or residual.
Similarly we obtain
E2 = Y2 - a - bX2, ..., EN = YN - a - bXN.

By the Principle of Least Squares, the line of best fit is obtained when the sum of the squares
of the differences Ei between the observed values Yi and the corresponding calculated values a + bXi is
minimum, i.e. when

$$S = \sum_{i=1}^{N} E_i^2 = \sum_{i=1}^{N} (Y_i - a - bX_i)^2$$

is minimum. Setting the partial derivatives of S with respect to a and b equal to zero, we obtain the normal equations

$$\sum Y = Na + b\sum X \qquad (2)$$

$$\sum XY = a\sum X + b\sum X^2 \qquad (3)$$

Solving these two equations, a and b can be determined. Substituting these values of a and b in
(1), the required equation of the straight-line trend is obtained. From this equation, we can compute the
trend values.
If we take the mid-point in time as the origin, the negative values in the first half of the series
balance out the positive values in the second half, so that ΣX = 0. The normal equations (2) and (3) then
reduce to
ΣY = Na and ΣXY = bΣX²,
so that

$$a = \frac{\sum Y}{N}, \qquad b = \frac{\sum XY}{\sum X^2}$$
EXAMPLE 5.
Determine the equation of a straight line which best fits the following data:

Year                  1974   1975   1976   1977   1978
Sales (Rs. '000)        35     56     79     80     40

Compute the trend values for all the years from 1974 to 1978.
SOLUTION. Let the equation of the straight line of best fit, with the origin at the middle year 1976 and
the unit of X as 1 year, be
Y = a + bX        (1)
By the Method of Least Squares, the values of a and b are given by
a = ΣY/N and b = ΣXY/ΣX²        (2)
Here N = number of years = 5.

TABLE 7 CALCULATIONS FOR THE LINE OF BEST FIT

Year    Sales in Rs. '000 (Y)     X     X²     XY
1974    35                       -2     4     -70
1975    56                       -1     1     -56
1976    79                        0     0       0
1977    80                        1     1      80
1978    40                        2     4      80
Total   ΣY = 290              ΣX = 0   ΣX² = 10   ΣXY = 34

Using (2), a = ΣY/N = 290/5 = 58, and b = ΣXY/ΣX² = 34/10 = 3.4.
From (1), the required equation of the best-fitted straight line is Y = 58 + 3.4X.

Year    X     Trend values Yc = 58 + 3.4X
1974   -2     58 + 3.4 × (-2) = 51.2
1975   -1     58 + 3.4 × (-1) = 54.6
1976    0     58 + 3.4 × 0 = 58.0
1977    1     58 + 3.4 × 1 = 61.4
1978    2     58 + 3.4 × 2 = 64.8

NOTE. Unless otherwise specified, we shall assume that the values of Y refer to mid-year values, i.e. as on
July 1. Thus in Example 5, X = 0 corresponds to July 1, 1976, X = -1 to July 1, 1975, X = 1 to July 1,
1977, etc.
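The short-cut formulas a = ΣY/N and b = ΣXY/ΣX² can be sketched in Python using the sales figures of Example 5 (35, 56, 79, 80, 40). This is a minimal illustration. It assumes an odd number of consecutive years, so that the middle year can serve as the origin and ΣX = 0. The function name is hypothetical.

```python
def linear_trend(years, values):
    """Least-squares straight-line trend Y = a + bX with X = 0 at the middle year.

    Assumes an odd number of consecutive years, so that sum(X) == 0 and the
    short-cut formulas a = sum(Y)/N and b = sum(XY)/sum(X^2) apply.
    """
    n = len(years)
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd number of years")
    mid = years[n // 2]
    xs = [year - mid for year in years]
    a = sum(values) / n
    b = sum(x * y for x, y in zip(xs, values)) / sum(x * x for x in xs)
    trend = {year: a + b * x for year, x in zip(years, xs)}
    return a, b, trend


a, b, trend = linear_trend([1974, 1975, 1976, 1977, 1978], [35, 56, 79, 80, 40])
print(f"Y = {a:g} + {b:g} X")
print({year: round(value, 1) for year, value in trend.items()})
```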
Module III
Parametric Tests

Theory of Estimation: Point and Interval. Testing of Hypothesis: Large and Small
Sample Tests.
Parametric Tests: t-Test, F-Test, Chi-square Test, ANOVA. Probability Distributions:
Binomial, Poisson and Normal Distribution.

3.1 Theory of Estimation

Some basic concepts/definitions related to estimation:

Sampling theory is the study of relationships between a population and samples drawn from the
population.

Sampling theory helps us to determine whether the differences between two samples are actually
due to chance variation or whether they are really significant.

(1) Parameter : It is a statistical measure based on all the units of the population. For example,
population mean, population standard deviation, proportion of defectives in the population, etc.

(2) Statistic : It is a statistical measure based on all the units selected in a sample. For example,
sample mean, sample standard deviation, etc.

Consider the case of selecting 100 houses from the city of Mumbai to study the effect of the Internet on
children. Let us assume that Mumbai city consists of 50000 houses having an internet connection. Here
50000 is the population size, and 100 is the size of the sample selected from these 50000.

Now consider any statistical measure, say the average age of users or the standard deviation or variance of their age.
If it is obtained from all 50000 users, it is a 'parameter'; if it is calculated from the
selected 100 houses, it is a sample statistic or simply a statistic.

Since the units selected in two or more samples drawn from a population are not the same, the
value of a statistic varies from sample to sample, whereas the parameter always remains constant. This variation
in the value of a statistic is called sampling fluctuation.
A parameter has no sampling fluctuation.

(3) Sampling Distribution of a Statistic.

A sampling distribution is a theoretical distribution that expresses the functional relation between
each of the distinct values of the sample statistic and the corresponding probability, over all the different
possible samples of size n from the same population.

In other words, the frequency distribution or probability distribution of a sample statistic is
called the sampling distribution of the statistic. For such a distribution, the mean and standard deviation
are the key characteristics, and they play an important role in the theory of
estimation.
If the mean of the sampling distribution, i.e. the expectation of the statistic, is equal to the value of the
parameter, the statistic is said to have the property of unbiasedness.

The standard deviation of the sampling distribution is termed the Standard Error of an estimate.

It is used as a tool in tests of hypothesis. It gives an idea about the reliability and the precision of a
sample, and it helps in determining the limits within which the parameters are expected to lie.

It is possible to draw valid conclusions about the population parameter from the sampling
distribution of a statistic.
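The ideas of sampling fluctuation and standard error can be made concrete with a small simulation. Everything in it is hypothetical: a made-up population of 50,000 values (loosely, ages of internet users) and repeated samples of 100 drawn from it. Each sample gives a different mean, while the population mean stays fixed. The standard deviation of the sample means comes out close to σ/√n. Sampling here is without replacement, so the true standard error is very slightly smaller by the finite population correction, which is negligible in this case.

```python
import random
import statistics

random.seed(42)

# Hypothetical finite population (N = 50,000); illustrative only.
population = [random.gauss(30, 8) for _ in range(50_000)]
mu = statistics.fmean(population)      # parameter: fixed
sigma = statistics.pstdev(population)  # parameter: fixed

n, n_samples = 100, 2_000
# Each sample gives a different value of the statistic (sampling fluctuation).
sample_means = [statistics.fmean(random.sample(population, n)) for _ in range(n_samples)]

print(f"population mean (parameter):       {mu:.3f}")
print(f"first three sample means:          {[round(m, 3) for m in sample_means[:3]]}")
print(f"mean of the sample means:          {statistics.fmean(sample_means):.3f}")
print(f"SD of sample means (simulated SE): {statistics.stdev(sample_means):.3f}")
print(f"sigma / sqrt(n) (theoretical SE):  {sigma / n ** 0.5:.3f}")
```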

Types of Estimation :

It is difficult to obtain population parameters in many studies; in such cases it is essential to
estimate them as accurately as possible. In estimation there are two types of estimates: the point estimate
and the interval estimate.

Point Estimation:

A point estimate is a single value that is used to estimate the unknown parameter, e.g. the sample
mean is a point estimate of the population mean.

Interval Estimation:

In this type, instead of a single value, a pair of values is obtained as the estimate.
This pair defines an interval or range within which the parameter lies with a certain confidence (probability).
Such an interval is known as a confidence interval, and the two values are known as confidence limits. The
probability or confidence, generally expressed as a percentage, is usually 95% or 99%. The higher the probability,
the higher the confidence. The standard error plays a very important role in determining the confidence limits and
hence the confidence interval.
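For a large sample, the standard error of the mean is s/√n. The confidence limits are x̄ ± z·SE, where z is the two-sided normal critical value (about 1.96 for 95% and 2.58 for 99%). The sketch below uses Python's standard `statistics.NormalDist`. The sample figures (n = 100, x̄ = 12.5, s = 3.0) are hypothetical, chosen only to show the calculation, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(xbar, sd, n, confidence=0.95):
    """Large-sample confidence interval xbar +/- z * sd / sqrt(n) (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    se = sd / math.sqrt(n)                          # standard error of the mean
    return xbar - z * se, xbar + z * se


# Hypothetical large sample: n = 100, sample mean 12.5, sample SD 3.0
for level in (0.95, 0.99):
    lo, hi = mean_confidence_interval(12.5, 3.0, 100, level)
    print(f"{level:.0%} confidence interval: ({lo:.3f}, {hi:.3f})")
```

As the text notes, the 99% interval is wider than the 95% interval: higher confidence comes at the cost of a less precise range.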

3.11 Difference between estimator and estimate.

Any sample statistic which is used to estimate a population parameter is called an estimator.
An estimate is a specific observed value of the statistic. An estimate is formed by taking a sample and
computing the value taken by the estimator in that sample.

3.12 Properties or Characteristics of a Good Estimator.

a) Unbiasedness:
Let T denote an estimator and θ denote a parameter.
Thus T can be the sample mean / proportion / standard deviation etc., and θ can be the population mean /
population proportion / population standard deviation etc.

T is said to be an unbiased estimate of θ if E(T) = θ.

In other words, if on average T is the same as θ, it is said to be an unbiased estimate.

b) Consistency :
If V(T) → 0 as n → ∞ (for an unbiased T), then T is said to be a consistent estimator.
That is, as the sample size increases, the variance approaches zero, showing that the spread or dispersion diminishes
as the sample size becomes large. In other words, as the sample size increases, the
difference between T and θ becomes smaller and smaller.

c) Efficiency:
Efficiency is measured by variance. Among unbiased estimators, the one with the smallest variance is the efficient estimator.

d) Sufficiency :
A sufficient statistic is an estimator that utilizes all the information a sample contains about the
parameter to be estimated.
Among all the estimators, the sample mean x̄ and the sample proportion p are sufficient statistics for the
population mean μ and the population proportion P.

Also, x̄ and p possess all the above four properties.

The method of maximum likelihood generally provides estimators with these desirable properties (at least in large samples).
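Unbiasedness and consistency can both be seen in a small simulation from a hypothetical normal population (mean 10, variance 4). The sample variance with divisor n - 1 is unbiased for σ². The version with divisor n underestimates it by the factor (n - 1)/n. The variance of the sample mean, σ²/n, shrinks towards zero as n grows. The seeds, sample sizes and replication counts are arbitrary choices for this sketch.

```python
import random
import statistics

random.seed(7)
SIGMA2 = 4.0           # true variance of the hypothetical population (normal, mean 10, sd 2)
n, reps = 5, 10_000

biased, unbiased = [], []
for _ in range(reps):
    x = [random.gauss(10, 2) for _ in range(n)]
    biased.append(statistics.pvariance(x))   # divisor n
    unbiased.append(statistics.variance(x))  # divisor n - 1

print(f"mean of divisor-n estimates:     {statistics.fmean(biased):.3f}  (theory {SIGMA2 * (n - 1) / n:.1f})")
print(f"mean of divisor-(n-1) estimates: {statistics.fmean(unbiased):.3f}  (theory {SIGMA2:.1f})")

# Consistency: the variance of the sample mean shrinks like SIGMA2 / size.
var_of_mean = {}
for size in (10, 100, 1000):
    means = [statistics.fmean(random.gauss(10, 2) for _ in range(size)) for _ in range(500)]
    var_of_mean[size] = statistics.variance(means)
    print(f"n = {size:4d}: variance of sample mean = {var_of_mean[size]:.4f} (theory {SIGMA2 / size:.4f})")
```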

3.2: Testing of Hypothesis

Hypothesis testing is one of the important aspects of any research study. Because the data are collected
through sampling and not by complete enumeration, the purpose of hypothesis testing
is to determine the accuracy of a hypothesis.

The accuracy of a hypothesis is evaluated by determining the statistical likelihood that the data
reveal a true difference and not random sampling error.

There are two approaches to hypothesis testing: (i) the classical or sampling theory approach and
(ii) the Bayesian approach.

The classical or sampling theory approach is the most widely used in research applications. This approach
represents an objective view of probability in which decision making rests totally on an analysis of the
available sampling data. A hypothesis is established, and it is rejected or accepted based on the sample data
collected.

Bayesian statistics are an extension of the classical approach. Here also sampling data are used for
decision making, but the researcher goes beyond them to consider all other available information. The
additional information consists of subjective probability estimates stated in terms of degree of belief.
These subjective estimates are based on general experience rather than on specific data collected. They are
expressed as a prior distribution that can be revised after the sample information is gathered. The revised
estimate is known as the posterior distribution. Various decision rules can then be established, costs
and other estimates can be introduced, and these elements are used to judge the decision alternatives in the hypothesis
testing procedure.

In classical tests of significance two kinds of hypotheses are used: the null hypothesis and the
alternative hypothesis.

Null Hypothesis:

It is a statement that no difference exists between the parameter and the statistic. The "no
difference" type hypothesis is termed the null hypothesis and is denoted by H0.

Alternative Hypothesis :

It is the logical opposite of the null hypothesis. It is denoted by H1 or HA. The alternative
hypothesis may take several forms depending on the objective of the researcher. It may be of the "not equal
to", "greater than" or "less than" type, and these types are used to decide whether the underlying
test is two-tailed or one-tailed.

If H1 or HA is of the "not equal to" (≠) type, the underlying test is two-tailed, two-sided or
non-directional.

Otherwise the test is one-tailed, one-sided or directional.

H0 and H1 (or HA) are complementary to each other: if H0 is rejected, H1 is accepted, and vice versa.

Based on sample results, H0 may be accepted or rejected, and H0 may in fact be true or false.
Thus the following four situations arise:

                          H0 True              H0 False
Decision   Accept H0      Correct Decision     Type II Error
           Reject H0      Type I Error         Correct Decision

The error committed in rejecting a true hypothesis is termed a Type I Error. The probability of committing a
Type I Error is denoted by α and is known as the level of significance. The standard values of α are 5% and
1%.

α = P[Type I Error] = P[Reject H0 | H0 is true].

In quality control, the Type I Error is termed the Producer's Risk.

The error committed in accepting a false hypothesis is termed a Type II Error. The probability of
committing a Type II Error is denoted by β.

β = P[Type II Error] = P[Accept H0 | H0 is false].

1 - β is known as the power of the test.
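To make α, β and power concrete, the sketch below takes a right-tailed z-test for a mean with known σ, H0: μ = μ0 against H1: μ > μ0. All numbers are hypothetical: μ0 = 50, true mean 54, σ = 10, n = 25. With α = 0.05 the test rejects H0 when x̄ exceeds about 53.29, giving β ≈ 0.36 and power ≈ 0.64. Raising α lowers β. The function name is illustrative.

```python
import math
from statistics import NormalDist


def type_errors_one_sided(mu0, mu1, sigma, n, alpha):
    """Right-tailed z-test of H0: mu = mu0 against H1: mu > mu0 with sigma known.
    Returns the critical sample mean, beta at mu = mu1, and the power 1 - beta."""
    se = sigma / math.sqrt(n)
    crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se  # reject H0 if xbar > crit
    beta = NormalDist(mu1, se).cdf(crit)               # P(accept H0 | mu = mu1)
    return crit, beta, 1 - beta


for alpha in (0.01, 0.05, 0.10):
    crit, beta, power = type_errors_one_sided(50, 54, 10, 25, alpha)
    print(f"alpha = {alpha:.2f}: reject if xbar > {crit:.2f}, beta = {beta:.3f}, power = {power:.3f}")
```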

3.21 Statistical Testing Procedure:
It is a step-by-step procedure as follows:

1. State the null hypothesis.
2. State the alternative hypothesis and decide whether the test is one-tailed or two-tailed.
3. Select the desired level of significance. The most common levels are 0.05 and 0.01. The exact level
to choose is largely determined by how much α risk one is willing to accept and the effect that this
choice has on β risk. The larger the α, the lower the β.
4. Choose the statistical test.
To test a hypothesis one must choose an appropriate statistical test. There are various criteria for
choosing a test, such as power efficiency, the nature of the population, the method of sampling, the type of
measurement scale used, and so on.
5. Obtain the critical test value (table value) for the specified level of significance.
6. Compute the value of the test statistic.
7. Decision: if the calculated value (taken in absolute value for a two-tailed z or t test) is less than the
table value, H0 is accepted at the specified level of significance; otherwise it is rejected.
8. Interpret the results and draw conclusions.
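Steps 5 to 7 of the procedure above can be sketched for a large-sample z-test of a mean. This is an illustration only. The function name, the `tail` parameter and the sample figures (n = 64, x̄ = 52, σ = 8, H0: μ = 50) are hypothetical, and σ is treated as known or well estimated from a large sample. Here z = 2.0, so H0 is rejected at the 5% level (table value 1.96) but accepted at the 1% level (table value about 2.58).

```python
import math
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Large-sample z-test for a mean; tail is 'two', 'right' or 'left'."""
    z_calc = (xbar - mu0) / (sigma / math.sqrt(n))  # step 6: test statistic
    std = NormalDist()
    if tail == "two":                               # H1: mu != mu0
        z_table = std.inv_cdf(1 - alpha / 2)        # step 5: critical value
        reject = abs(z_calc) > z_table
    elif tail == "right":                           # H1: mu > mu0
        z_table = std.inv_cdf(1 - alpha)
        reject = z_calc > z_table
    else:                                           # H1: mu < mu0
        z_table = std.inv_cdf(1 - alpha)
        reject = z_calc < -z_table
    return z_calc, z_table, "reject H0" if reject else "accept H0"  # step 7


print(z_test_mean(52.0, 50, 8, 64))              # 5% level, two-tailed
print(z_test_mean(52.0, 50, 8, 64, alpha=0.01))  # 1% level, two-tailed
```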

3.22 Tests of Significance :
Generally there are two classes of significance tests:
parametric and non-parametric.
Parametric tests are more powerful because their data are derived from interval and ratio
measurements.
Non-parametric tests are used to test hypotheses with nominal and ordinal data.
The above paragraph introduces different levels of measurement, which we now discuss.

A measurement scale can be defined as a set of numbers or symbols developed in a manner that
facilitates the assigning of these numbers or symbols to the units under research on the basis of
certain rules.
The design of a measurement scale depends upon the objective of the research and the
mathematical calculations that a researcher expects to perform on the data collected by using the scales.
The different types of measurement scale are as follows:

A) Nominal Scale:
This type uses numbers or letters to identify different objects. The objects are first segregated into mutually
exclusive and collectively exhaustive categories, and a number is then assigned to each
category for identification.

B) Ordinal Scale:
An ordinal scale is used to arrange objects in a particular order. It can be used, for example, for ranking
brands based on their quality.

C) Interval Scale:
This is similar to an ordinal scale and is used for arranging objects in a particular order,
where the intervals between the points on the scale are equal, i.e. adjacent points on the scale are
located at equal distances.
D) Ratio Scale:
Ratio scales have a fixed zero point and equal intervals. These scales are used for
representing age, weight, height, etc. For example, when age is represented on a ratio scale, the
difference between 10 years and 20 years is the same as the difference between 30 years and 40
years, and, because of the fixed zero, 20 years is twice 10 years.

Comparing IIbo\'c four Iypes follo"ing lITCsome rmdings.

-7 Nominal data is numerical in name only. They do not share any properties of the numbers which
we deal in ordinary arithmetic. e.g. we can record marital status as 1,2.3,4 depending on whether
the person is single, manied, widowed or divorced. But we can not write 4>2 or 3<4. Also 1+3~,
4~2=2 and so on. In such situations we are restricted to use mode as the measure of eentml
tendency. There is no generally used measure of dispersion for nominal scales. Chi. square lest is
the most common test which can be utilil.ed. Also for correlation the contingency coofficiem can be
worked out.

-7 In those situations where we can not do anything expect set up inequalities, the data is referred as
ordinal data. Ordinal scales only pennit the ranking of items from highest to lowest ordinal
measures have no absolute value. The real difference between adjacent ranks may not be equal. In
this situation median is the appropriate measures of central tendency. A p.:rcentilc or quartile
measures is used for measuring dispersion. Correlation arc restricted to I'llnk correlation
coefficients. Non" plll'llmetric tests of significance can be usen.

-7 Interval scales can hav.:: an arbitrary zero but it is not possible to determine for them an ab~olute
zero or unique origin. And this is the limitation of Interval scale. It does not have the capacity to
measore the complete absence of characteri~tic. The Fahrenheit seale is an example of an interval
scale. Increa:;e in tempemlUre from 4' to 8 • ami 30' 10 38' is same but we can not say that
30'temperature is 5 times Wann than 6".

-7 Interval :;cale provides more powerful mea~urement than ordinal scale. Mean is applllpriate
measures of central tendency and standard deviation is the most widely used measure of dispersion.
Product moment correlation technique is appropriate to ~tudy correlation and t-test, F-test arc
generally used test for significance.

-7 Ratio scales represents the actual amounts of variables. Generally all statistical techniques arc
usable with ratio scales.

→ Selection of a measurement scale requires decisions in six key areas: i) study objective ii) response form iii) degree of preference iv) data properties v) number of dimensions vi) scale construction.

If the nature of the variables permits, the researcher should use the scale that provides the most precise description. Researchers in the physical sciences have the advantage of describing variables on ratio scales, but the behavioural sciences are generally limited to describing variables on interval scales, a form which is less precise.

The scales should be reliable, valid, sensitive, generalizable and relevant.
Reliability is the degree to which a scale is error free and produces consistent results.
Validity is the ability of a scale or instrument to measure what it is intended to measure.
Sensitivity is the instrument's ability to measure the variability in responses accurately.
Relevance is the suitability of using a particular scale for measuring a variable. Thus Relevance = Reliability × Validity.

3.3 Parametric Tests


Assumptions for Parametric Tests

1. The observations must be independent, meaning that the selection of any one case should not affect the chances of any other case being included in the sample.
2. The observations should be drawn from normally distributed populations.
3. These populations should have equal variances.
4. The measurement scale should be interval or ratio, so that arithmetic operations can be used with it.

The researcher is responsible for reviewing the assumptions pertinent to the chosen test.
Performing diagnostic checks on the data allows the researcher to select the most appropriate test.

The Z-test or t-test is used to determine the statistical significance of the difference between a sampling distribution of the mean and a parameter.

The Z-test is a large sample test. Generally, if the sample size exceeds 30 it is said to be a large sample; otherwise it is a small sample and the t-distribution is used. This is because, for small samples, information about the population standard deviation is lacking and its estimate from the sample is less reliable.

When the sample size approaches 120, the sample standard deviation becomes a very good estimate of the population standard deviation.

Beyond 120, the t and Z distributions are virtually identical.

For characteristics like the average and the proportion, tests based on the Z or t distribution are the most appropriate.
3.31 Z-tests

I. One Sample Mean Test

To test
$$H_0: \mu = \mu_0$$
$$H_1: \mu \ne \mu_0 \quad (\text{or } \mu < \mu_0 \text{ or } \mu > \mu_0)$$

Here $\mu$ denotes the population mean and $\mu_0$ denotes the specified value of the population mean.

The available data include $\bar X$ (sample mean), $\sigma$ (population standard deviation) or s (sample standard deviation), and the sample size n.

The test statistic is
$$Z = \frac{\bar X - \mu_0}{\sigma/\sqrt{n}} \quad \text{or} \quad Z = \frac{\bar X - \mu_0}{s/\sqrt{n}}$$

The critical values of Z depend upon
(i) the level of significance, 5% or 1%;
(ii) whether the test is two-tailed or one-tailed (the sign in $H_1$: ≠, or < / >).

The following table gives the critical values:

Type of test                              Level of significance
                                          5%        1%
Two-tailed ($H_1$ has the ≠ sign)         1.96      2.58
One-tailed ($H_1$ has the > or < sign)    1.64      2.33

Criteria for decision

If |Z| ≤ critical value, then $H_0$ is accepted at the specified level of significance.

Otherwise $H_0$ is rejected.
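As an illustration, the one sample mean test and its decision rule can be sketched in Python using only the standard library (Python 3.8+ for `statistics.NormalDist`). This is a minimal sketch under stated assumptions, not the book's own procedure: the function names `critical_z` and `one_sample_z` are hypothetical, and the critical values are computed from the normal distribution instead of being read from the table, so the one-tailed 5% value comes out as 1.645 rather than the rounded 1.64. The figures used are those of Ex.14 among the solved examples (x̄ = 390, μ0 = 400, σ = 20, n = 100).

```python
from math import sqrt
from statistics import NormalDist


def critical_z(alpha: float, two_tailed: bool = True) -> float:
    """Critical value of Z at level of significance alpha (hypothetical helper)."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)


def one_sample_z(x_bar, mu0, sd, n, alpha=0.05, two_tailed=True):
    """Return (Z, critical value, accept_H0) for H0: mu = mu0.

    sd is sigma if known, otherwise the sample standard deviation s.
    As in the text, |Z| is compared with the critical value.
    """
    z = (x_bar - mu0) / (sd / sqrt(n))
    crit = critical_z(alpha, two_tailed)
    return z, crit, abs(z) <= crit


# Reproduce the table of critical values (two-tailed, one-tailed)
for alpha in (0.05, 0.01):
    print(alpha, round(critical_z(alpha), 2), round(critical_z(alpha, False), 3))

z, crit, accept = one_sample_z(x_bar=390, mu0=400, sd=20, n=100)
print(z, round(crit, 2), accept)  # -5.0 1.96 False -> H0 rejected
```

For a one-tailed test, the sign of Z should also agree with the direction stated in $H_1$; the |Z| rule assumes this.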

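II. Two Sample Mean Test

$$H_0: \mu_1 = \mu_2$$
$$H_1: \mu_1 \ne \mu_2 \quad (\text{or } \mu_1 < \mu_2 \text{ or } \mu_1 > \mu_2)$$

Available data: for the two samples, their sample sizes $n_1$, $n_2$ with means $\bar X_1$, $\bar X_2$, and either the population standard deviations (which may or may not be known) or the sample standard deviations.
The computation of the test statistic Z is done as follows.

Case I: If both the samples are drawn from the same population with standard deviation $\sigma$,
$$Z = \frac{\bar X_1 - \bar X_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

Case II: Population standard deviation unknown. Let $s_1^2$, $s_2^2$ denote the sample variances:
$$s_1^2 = \frac{\sum (x_1 - \bar X_1)^2}{n_1 - 1}, \qquad s_2^2 = \frac{\sum (x_2 - \bar X_2)^2}{n_2 - 1}$$
Then work out the pooled estimate
$$S^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
and
$$Z = \frac{\bar X_1 - \bar X_2}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

Case III: If the two samples are drawn from two different populations with standard deviations $\sigma_1$ and $\sigma_2$,
$$Z = \frac{\bar X_1 - \bar X_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

The rest of the procedure is the same as above.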
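The three cases differ only in the standard error of $\bar X_1 - \bar X_2$, so one small function can cover them. A minimal sketch, assuming the hypothetical helper name `two_sample_z` and made-up summary figures (means 52 and 50, σ1 = 4, σ2 = 3, n1 = 64, n2 = 36); it is an illustration, not a prescribed implementation.

```python
from math import sqrt


def two_sample_z(x1_bar, x2_bar, n1, n2, sd1, sd2=None):
    """Z for H0: mu1 = mu2 (hypothetical helper).

    Case III: sd1 and sd2 are the two population standard deviations.
    Case I:   pass only sd1 (common sigma); the denominator then equals
              sigma * sqrt(1/n1 + 1/n2).
    Case II:  pass the pooled estimate S as sd1 in the same way.
    """
    if sd2 is None:
        sd2 = sd1
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (x1_bar - x2_bar) / se


# Hypothetical data: case III with sigma1 = 4, sigma2 = 3
print(round(two_sample_z(52, 50, 64, 36, 4, 3), 3))
# Hypothetical data: case I with common sigma = 5
print(round(two_sample_z(10, 8, 50, 50, 5), 3))
```

The first value exceeds 2.58, so with these made-up figures $H_0$ would be rejected even at the 1% level.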

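III. One Sample Proportion Test

$$H_0: P = P_0$$
$$H_1: P \ne P_0 \quad (\text{or } P < P_0 \text{ or } P > P_0)$$

P denotes the desired (population) proportion and $P_0$ is its specified value.
$$Z = \frac{p - P_0}{\sqrt{P_0 Q_0 / n}}$$
where p = sample proportion and $Q_0 = 1 - P_0$.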
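A minimal sketch of the one sample proportion statistic, with the standard error computed from $P_0$ and $Q_0$ as in the formula above. The function name `one_proportion_z` and the sample (55 successes in 100 trials) are hypothetical.

```python
from math import sqrt


def one_proportion_z(p_hat, p0, n):
    """Z for H0: P = P0, with S.E. computed from P0 and Q0 = 1 - P0."""
    q0 = 1 - p0
    return (p_hat - p0) / sqrt(p0 * q0 / n)


# Hypothetical sample: 55 successes in 100 trials, testing P0 = 0.5
print(round(one_proportion_z(0.55, 0.5, 100), 4))
```

Here Z is about 1, below 1.96, so $H_0$ would be accepted at the 5% level.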

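IV. Two Sample Proportion Test

$$H_0: P_1 = P_2$$
$$H_1: P_1 \ne P_2 \quad (\text{or } P_1 > P_2 \text{ or } P_1 < P_2)$$

The test statistic is as follows.

a) If the populations are heterogeneous,
$$Z = \frac{p_1 - p_2}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$$
where $p_1$, $p_2$ are the sample proportions, $q_1 = 1 - p_1$ and $q_2 = 1 - p_2$.

b) If the populations are similar with respect to the given attribute, the best estimate of the population proportion is obtained as
$$p_0 = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}, \qquad q_0 = 1 - p_0$$
and
$$Z = \frac{p_1 - p_2}{\sqrt{p_0 q_0\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$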
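Both versions of the two sample proportion statistic can be sketched as follows. The helper name `two_proportion_z` is hypothetical; the figures are those of Ex.7 among the solved examples (62% of 500 workers versus 59% of 400). Both values come out a little above 0.91, well below 1.96.

```python
from math import sqrt


def two_proportion_z(p1, n1, p2, n2, pooled=True):
    """Z for H0: P1 = P2 (hypothetical helper)."""
    if pooled:
        # case (b): pooled estimate p0 of the common proportion
        p0 = (n1 * p1 + n2 * p2) / (n1 + n2)
        se = sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    else:
        # case (a): heterogeneous populations
        se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


# Figures of Ex.7: 62% of 500 versus 59% of 400
print(round(two_proportion_z(0.62, 500, 0.59, 400), 4))
print(round(two_proportion_z(0.62, 500, 0.59, 400, pooled=False), 4))
```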

In all the above four tests the probability distribution of the test statistic is the Normal distribution, so let us discuss the Normal distribution.

3.311 Normal Distribution

This is the most widely used probability distribution. It is applicable to a continuous random variable.

A random variable is a real valued function defined over a sample space. With every value of the random variable there is an associated probability.

If a random variable takes only isolated values, such as integers, it is known as a discrete random variable. If a random variable can assume any value within a range, it is known as a continuous random variable.

Among the most widely used probability distributions, the Normal distribution is for continuous random variables, and the Binomial and Poisson distributions are for discrete random variables.

The probability distribution of a random variable is either a tabular form or a functional form showing the probabilities distributed over the various values of the random variable, such that the individual probabilities lie between 0 and 1 and the total probability is 1 (unity).

For discrete probability distributions the tabular or functional form is referred to as the probability mass function (p.m.f.), and for continuous random variables the function is referred to as the probability density function (p.d.f.).

To write the p.m.f. or p.d.f. we require the parameter(s) of the distribution.

Parameters specify the distribution completely.

The Normal distribution has two parameters, the mean $\mu$ and the standard deviation $\sigma$ (or the variance $\sigma^2$).
A random variable X is said to follow the Normal distribution with parameters $\mu$ and $\sigma$ (or $\sigma^2$) if its p.d.f. is given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \qquad -\infty < x < \infty$$

and it is written as $X \sim N(\mu, \sigma)$.

The frequency curve obtained for the various values of X and f(x) is known as the Normal curve.
For a normal distribution, if the mean is 0 and the standard deviation is 1, then that variable or variate is known as the Standard Normal Variate (SNV). It is generally denoted by t or Z.

Thus $Z \sim N(0, 1)$.

The p.d.f. of Z (or t) is
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad -\infty < z < \infty$$

The graph of f(z) is known as the standard normal curve.

Properties of Normal Distribution / Normal Curve

1. The p.d.f. is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \qquad -\infty < x < \infty,\ \sigma > 0$$
2. For the Normal distribution, mean = median = mode = $\mu$.
3. The Normal curve is a bell-shaped symmetric curve, symmetric about $X = \mu$ (or $Z = 0$).
4. The total area under the curve is unity, i.e.
$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$
The area under the curve between two values and the probability that X (or Z) lies between those values are the same concept. Thus
$$\int_a^b f(z)\,dz = P[a \le Z \le b]$$
5. The normal curve is asymptotic, i.e. the two tails of the curve never touch the X axis but approach it ever more closely as X moves away from the mean.
6. Following are some standard areas under the normal curve:
Area between $X = \mu \pm \sigma$ is 0.6826 or 68.26%
Area between $X = \mu \pm 2\sigma$ is 0.9545 or 95.45%
Area between $X = \mu \pm 3\sigma$ is 0.9973 or 99.73%
7. Under the standard normal curve the areas are as follows:
Area between $Z = \pm 1$ is 0.6826
Area between $Z = \pm 2$ is 0.9545
Area between $Z = \pm 3$ is 0.9973
8. Any normal variate X with mean $\mu$ and standard deviation $\sigma$ can be converted into the corresponding SNV
$$Z = \frac{X - \mu}{\sigma}$$
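The standard areas in properties 6 to 8 can be checked with `statistics.NormalDist` (Python 3.8+). This is a small sketch; the helper name `area_within` and the distribution N(50, 5) are hypothetical. It prints 0.6827, 0.9545 and 0.9973; the first differs from the 0.6826 quoted above only in the last digit, because the table value is truncated rather than rounded.

```python
from statistics import NormalDist

snv = NormalDist(mu=0, sigma=1)   # Z ~ N(0, 1)


def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return snv.cdf(k) - snv.cdf(-k)


for k in (1, 2, 3):
    print(f"P(-{k} <= Z <= {k}) = {area_within(k):.4f}")

# A hypothetical X ~ N(50, 5): the area within mu +/- 2 sigma equals area_within(2),
# because Z = (X - mu) / sigma maps 40..60 onto -2..2
x_dist = NormalDist(mu=50, sigma=5)
print(f"{x_dist.cdf(60) - x_dist.cdf(40):.4f}")
```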

3.312 Binomial Distribution

This is a discrete probability distribution. The random variable X under study is said to follow the Binomial distribution with parameters n and p under the following assumptions.

1) Each trial must result in only two outcomes, success and failure, i.e. each trial must be a Bernoulli trial.
2) Let p denote the probability of success; then 0 < p < 1, and p should remain constant in all repeated trials.
3) Let n denote the number of times the trial is repeated. All trials are independent of each other and n is finite.

If X is defined as the 'number of successes' in the experiment, then X is said to have the Binomial distribution with parameters n and p, written $X \sim B(n, p)$, and its probability mass function is given by
$$P(X = r) = \binom{n}{r} p^r q^{n-r}, \qquad r = 0, 1, \ldots, n, \quad q = 1 - p$$

Properties of Binomial Distribution

1) The mean of the distribution is $np$.
2) The variance of the distribution is $npq$.
3) For $p = \frac{1}{2}$ the distribution is symmetric.
4) The mode of the distribution is an integer lying between $M - 1$ and $M$, where $M = (n + 1)p$. If M itself is an integer, the distribution has two modes, M and M − 1, and it is a bimodal distribution.
5) If $n \to \infty$ and $p \to 0$ (with $np$ remaining finite), the distribution approaches the Poisson distribution.
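The p.m.f. and properties 1), 2) and 4) can be checked numerically with `math.comb` (Python 3.8+). A minimal sketch; the helper names and the parameters n = 10, p = 0.3 are hypothetical. Here M = (n + 1)p = 3.3, so the mode is 3, and the printed mean and variance are np = 3 and npq = 2.1.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(X = r) = nCr * p^r * q^(n - r), with q = 1 - p."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


def binomial_summary(n, p):
    """Mean, variance and (first) mode computed directly from the p.m.f."""
    probs = [binomial_pmf(r, n, p) for r in range(n + 1)]
    mean = sum(r * pr for r, pr in enumerate(probs))
    var = sum((r - mean) ** 2 * pr for r, pr in enumerate(probs))
    mode = max(range(n + 1), key=lambda r: probs[r])
    return mean, var, mode


n, p = 10, 0.3                      # hypothetical parameters
mean, var, mode = binomial_summary(n, p)
print(round(mean, 6), round(var, 6), mode)
```

With n = 9 and p = 1/2, M = 5 is an integer, and `binomial_pmf(4, 9, 0.5)` equals `binomial_pmf(5, 9, 0.5)`: the bimodal case of property 4).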

3.313 Poisson Distribution

This is a limiting case of the Binomial distribution. If n is infinite or very large and p is very small, then the product np will be a moderate value, say $\lambda$; this $\lambda$ is the parameter of the Poisson distribution. Thus $X \sim P(\lambda)$ and the p.m.f. is
$$P(X = r) = \frac{e^{-\lambda}\lambda^{r}}{r!}, \qquad r = 0, 1, 2, \ldots, \quad \lambda > 0$$

Properties of Poisson Distribution

1) For the Poisson distribution, mean = variance = $\lambda$.
2) For the Poisson distribution the mode is an integer lying between $\lambda - 1$ and $\lambda$. If $\lambda$ itself is an integer, the distribution has two modes, $\lambda - 1$ and $\lambda$.

The Poisson distribution should meet the following criteria.

(i) Independence: the number of times an event occurs in any time interval is independent of the number of times it occurs in any disjoint time interval.

(ii) In a very small time interval, say t to t + h where h is infinitely small, the probability that the event occurs once is approximately $\lambda h$, where $\lambda$ is the average rate at which the event occurs per unit of time.

(iii) The chance of two or more occurrences of the event in a very small interval t to t + h is insignificant in comparison to $\lambda h$, the chance of one occurrence.

Examples of random variables following the Poisson distribution:

(i) The number of telephone calls per minute at a switchboard.

(ii) The number of printing mistakes per page of a text book.

(iii) The number of vehicles passing a certain point in one minute.

(iv) The number of persons born blind per year in a large city.

(v) The number of defective articles manufactured by a company, etc.
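A short numerical check of the Poisson p.m.f., its mean and variance, and its use as an approximation to the Binomial when n is large and p is small. The rate λ = 3 and the values n = 1000, p = 0.003 (so that np = 3) are hypothetical, chosen only for illustration.

```python
from math import comb, exp, factorial


def poisson_pmf(r, lam):
    """P(X = r) = e^(-lambda) * lambda^r / r!"""
    return exp(-lam) * lam ** r / factorial(r)


lam = 3.0                            # hypothetical average rate
rs = range(60)                       # terms beyond r = 59 are negligible here
probs = [poisson_pmf(r, lam) for r in rs]
mean = sum(r * pr for r, pr in zip(rs, probs))
var = sum((r - mean) ** 2 * pr for r, pr in zip(rs, probs))
print(round(mean, 6), round(var, 6))          # mean and variance both equal lambda

# Binomial with large n and small p, where np = lambda
n, p = 1000, 0.003
for r in range(5):
    binom = comb(n, r) * p ** r * (1 - p) ** (n - r)
    print(r, round(binom, 4), round(poisson_pmf(r, lam), 4))
```

Because λ = 3 is an integer, `poisson_pmf(2, 3.0)` and `poisson_pmf(3, 3.0)` are equal (up to rounding), illustrating the two modes λ − 1 and λ.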

3.32 Student's t-test

We have so far discussed the large sample Z-test for $n \ge 30$.

If $n \ge 30$ and the population is normally distributed, then even if the population standard deviation is not known, the sampling distribution of the mean is (approximately) normal. In the case of small samples, however, another probability distribution similar to the Normal distribution, known as Student's t-distribution, is applied.

Student's t-distribution is one of the important continuous probability distributions. It was introduced by W.S. Gosset under the pen name 'Student' and is used for testing hypotheses on small samples.

For small sample tests the concept of 'degrees of freedom' is introduced.

The degrees of freedom is the number of values that may be chosen independently or freely while the given conditions are still satisfied. As a rule, if n is the sample size and one parameter is specified, the degrees of freedom is n − 1; if two parameters are specified, it is n − 2, and so on. If there are two samples of sizes $n_1$ and $n_2$ with specified means of two populations, then the degrees of freedom will be $(n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$.

The probability density function (p.d.f.) of a random variable following the t-distribution with $\nu = n - 1$ degrees of freedom is
$$f(t) = K\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \qquad -\infty < t < \infty$$
where K is a constant.

Properties of t-distribution

(i) The t-distribution is symmetrical about the line t = 0.
(ii) The t-distribution is asymptotic to the t-axis.
(iii) The shape of the t-curve changes with the degrees of freedom, i.e. with the sample size.
(iv) The t-distribution has a greater dispersion than the standard normal distribution.
(v) It is unimodal with Mean = Median = Mode.

Uses of t-test

The t-test is used when (i) the sample size n is small, i.e. n < 30, (ii) the variance of the population is not known, (iii) the sample is a random one, and (iv) the population is normal. It is used

(i) to test for a specified mean,
(ii) to test for equality of the means of two independent samples drawn from two normal populations, the standard deviations of the populations being unknown, and
(iii) to test the significance of the difference between the means of paired data.

The procedure for tests of significance is the same as for the Z-test (the general procedure), except that the critical value is obtained for the required degrees of freedom at the specified level of significance. The calculated value of the test statistic is compared with the critical value:

If $|t|_{\text{calculated}} \le t_{\text{table}}$ (critical value), $H_0$ is accepted.
Otherwise $H_0$ is rejected.

The formulae for the test statistics and the corresponding degrees of freedom are given below.

I) To test a specified mean

$H_0: \mu = \mu_0$
$$t = \frac{\bar X - \mu_0}{s/\sqrt{n}}, \qquad \text{d.f.} = n - 1$$
where s is the sample standard deviation.
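A minimal sketch of the one sample t statistic, assuming s is computed with divisor n − 1 (as `statistics.stdev` does). The helper name `one_sample_t` and the data are hypothetical. The Python standard library has no t quantile function, so the critical value must still be read from the t table (or, if SciPy is installed, from `scipy.stats.t.ppf`).

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(data, mu0):
    """Return (t, d.f.) for H0: mu = mu0; s uses divisor n - 1."""
    n = len(data)
    s = stdev(data)
    t = (mean(data) - mu0) / (s / sqrt(n))
    return t, n - 1


t, df = one_sample_t([12, 14, 11, 15, 13], mu0=12)   # hypothetical sample
print(round(t, 4), df)   # compare |t| with the table value for 4 d.f.
```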

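II) To test equality of two means (two samples)

$n_1 < 30$, $n_2 < 30$, $H_0: \mu_1 = \mu_2$, $\sigma_1, \sigma_2$ unknown.

Available information: $\bar X_1, \bar X_2$ (two sample means), $n_1, n_2$ (sample sizes), $s_1^2, s_2^2$ (sample variances, computed with divisors $n_1$ and $n_2$).

Test statistic
$$t = \frac{\bar X_1 - \bar X_2}{\text{S.E.}(\bar X_1 - \bar X_2)}, \qquad \text{S.E.}(\bar X_1 - \bar X_2) = S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
and
$$S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}, \qquad \text{d.f.} = n_1 + n_2 - 2$$

If the observations of the two samples are available, the steps to calculate $\bar X_1$, $\bar X_2$ and S.E. are as follows. Let $x_{11}, x_{12}, \ldots, x_{1n_1}$ be the observations of sample I and $x_{21}, x_{22}, \ldots, x_{2n_2}$ the observations of sample II.

① Obtain $\sum x_1$, $\sum x_2$ and hence $\bar X_1 = \sum x_1 / n_1$, $\bar X_2 = \sum x_2 / n_2$.
② Obtain the columns $x_1 - \bar X_1$, $(x_1 - \bar X_1)^2$ and $x_2 - \bar X_2$, $(x_2 - \bar X_2)^2$.
③ Obtain $\sum (x_1 - \bar X_1)^2$ and $\sum (x_2 - \bar X_2)^2$.
④ Obtain
$$S^2 = \frac{\sum (x_1 - \bar X_1)^2 + \sum (x_2 - \bar X_2)^2}{n_1 + n_2 - 2}$$
⑤ Test statistic
$$t = \frac{\bar X_1 - \bar X_2}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$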
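Steps ① to ⑤ translate directly into code. A minimal sketch; the helper name `two_sample_t` and the two small samples are hypothetical. For these data $S^2 = 4$ and t = √6 ≈ 2.45 with 4 d.f.

```python
from math import sqrt


def two_sample_t(x1, x2):
    """Pooled two sample t statistic following steps 1-5; returns (t, d.f.)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2             # step 1: the two means
    ss1 = sum((x - m1) ** 2 for x in x1)            # steps 2-3: sums of squares
    ss2 = sum((x - m2) ** 2 for x in x2)
    s = sqrt((ss1 + ss2) / (n1 + n2 - 2))           # step 4: pooled S
    t = (m1 - m2) / (s * sqrt(1 / n1 + 1 / n2))     # step 5: test statistic
    return t, n1 + n2 - 2


t, df = two_sample_t([10, 12, 14], [6, 8, 10])      # hypothetical samples
print(round(t, 4), df)
```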

III) Paired t-test

$H_0: \mu_1 = \mu_2$, i.e. there is no difference between the averages before and after.
Available information: n pairs of observations $(x_i, y_i)$, $i = 1, 2, \ldots, n$.
d.f. = n − 1

Steps to calculate the test statistic:

① Obtain $d_i = x_i - y_i$ for $i = 1, 2, \ldots, n$.
② Obtain $\bar d = \frac{\sum d_i}{n}$.
③ Obtain $d_i - \bar d$ and $(d_i - \bar d)^2$.
④ Obtain $\sum (d_i - \bar d)^2$ and hence
$$s = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar d)^2}{n - 1}}$$
⑤ Test statistic
$$t = \frac{\bar d}{s/\sqrt{n}}$$
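The same steps for paired data, as a minimal sketch. The helper name `paired_t` and the before/after figures are hypothetical. Here $\bar d = 2.25$, $\sum (d_i - \bar d)^2 = 2.75$, and t comes out at about 4.70 with 3 d.f.

```python
from math import sqrt


def paired_t(x, y):
    """Paired t statistic following steps 1-5; returns (t, d.f.)."""
    d = [a - b for a, b in zip(x, y)]                 # step 1: differences
    n = len(d)
    d_bar = sum(d) / n                                # step 2: mean difference
    ss = sum((di - d_bar) ** 2 for di in d)           # steps 3-4
    s = sqrt(ss / (n - 1))
    return d_bar / (s / sqrt(n)), n - 1               # step 5


t, df = paired_t([5, 7, 6, 9], [3, 4, 5, 6])         # hypothetical before/after
print(round(t, 3), df)
```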
Module IV
Non-Parametric Tests

Non-parametric tests have fewer and less stringent assumptions. They do not specify normally distributed populations or homogeneity of variance. Non-parametric tests are the only ones usable with nominal data, and they are the only technically correct tests to use with ordinal data, although parametric tests are sometimes employed in this case. Non-parametric tests may also be used for interval and ratio data. They are easy to use and understand.

Parametric tests have greater efficiency when their use is appropriate, but even in such cases non-parametric tests often achieve an efficiency of up to 95%.

Chi-square can be used as a non-parametric statistic and is used frequently for cross-tabulations or contingency tables. Its applications include testing for differences between proportions in populations and testing for independence.

Non-parametric tests are also known as distribution-free tests.

Some important non-parametric tests with their application areas:

1. Test concerning some single value for the given data (one sample sign test)
2. Test concerning no difference among two or more sets of data (two sample sign test, Fisher-Irwin test, rank sum test)
3. Test of a hypothesis of a relationship between variables (rank correlation, Kendall's coefficient and other tests of dependence)
4. Test of a hypothesis concerning variation in the given data (tests similar to ANOVA, such as the Kruskal-Wallis test)
5. Tests of randomness of a sample based on the theory of runs (one sample runs test)
6. Tests of hypothesis to determine whether categorical data shows dependency (chi-square test for independence of attributes)

4.1 Sign Tests
a) One sample sign test

To test $H_0: \mu = \mu_0$.

On the basis of a sample of size n, we replace the value of each and every item of the sample with a (+) sign if it is greater than $\mu_0$ and with a (−) sign if it is less than $\mu_0$. If a value is equal to $\mu_0$, discard it.

After doing this, we test the null hypothesis that these + and − signs are values of a random variable having a Binomial distribution with $p = \frac{1}{2}$.

If the sample is small, Binomial probabilities can be used.

For a large sample, the Normal approximation to the Binomial can be used.
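For a small sample, the exact Binomial probabilities with p = 1/2 can be summed directly. A minimal sketch of a two-tailed version; the helper name `sign_test` and the sample are hypothetical. For these data there are 7 plus and 3 minus signs (the value equal to 13 is discarded), and the two-tailed probability is 0.34375, so $H_0$ would not be rejected at the 5% level.

```python
from math import comb


def sign_test(data, mu0):
    """One sample sign test using exact Binomial(n, 1/2) probabilities.

    Returns (number of + signs, number of - signs, two-tailed p-value).
    """
    plus = sum(1 for x in data if x > mu0)
    minus = sum(1 for x in data if x < mu0)   # values equal to mu0 are discarded
    n = plus + minus
    k = max(plus, minus)
    # P(X >= k) for X ~ B(n, 1/2), doubled for a two-tailed test
    tail = sum(comb(n, r) for r in range(k, n + 1)) / 2 ** n
    return plus, minus, min(1.0, 2 * tail)


# Hypothetical sample tested against mu0 = 13
print(sign_test([12, 15, 9, 18, 14, 20, 16, 11, 17, 19, 13], mu0=13))
```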

b) Two sample sign test

To test $H_0: \mu_1 = \mu_2$.

Here the data is given as n pairs of observations. The steps are as follows.

(i) Change the paired values into signs, + or −, according to the sign of (X − Y). Count the total number of + signs and the total number of − signs. Discard those pairs with X − Y = 0.

(ii) Obtain the observed proportion of pluses, p, and hence q = 1 − p. Here the sample size n is the number of pluses plus the number of minuses.

(iii) Obtain
$$\text{S.E.} = \sqrt{\frac{pq}{n}}$$

(iv) Calculate the test statistic with $p_0 = \frac{1}{2}$:
$$Z = \frac{p - p_0}{\text{S.E.}}$$

If $|Z|_{\text{calculated}} < 1.96$ (or 2.58), then $H_0$ is accepted at the 5% l.o.s. (or 1% l.o.s.).

Otherwise $H_0$ is rejected.
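Steps (i) to (iv) can be sketched as follows, using the standard error $\sqrt{pq/n}$ of step (iii). The helper name `two_sample_sign_z` and the paired scores are hypothetical; with only nine usable pairs the normal approximation is rough, and the data serve only to show the arithmetic.

```python
from math import sqrt


def two_sample_sign_z(x, y):
    """Large sample two sample sign test following steps (i) to (iv)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # discard X - Y = 0
    n = len(diffs)                                      # pluses + minuses
    p = sum(1 for d in diffs if d > 0) / n              # proportion of pluses
    q = 1 - p
    if p * q == 0:
        raise ValueError("all signs are the same; S.E. = sqrt(pq/n) is zero")
    se = sqrt(p * q / n)
    return (p - 0.5) / se                               # p0 = 1/2


# Hypothetical paired scores; the tied pair (7, 7) is discarded
before = [8, 7, 9, 6, 8, 9, 7, 8, 6, 9]
after = [6, 7, 8, 7, 5, 6, 6, 9, 5, 7]
print(round(two_sample_sign_z(before, after), 3))
```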

4.2 RUN TEST

Definition of run: A run is a succession of identical letters/symbols which is followed or preceded by a different letter/symbol or by no letter/symbol at all. For example, consider the sequence

AAAABBBJJJKKKFFOOOOOPPPPLLLMMMMMM

whose runs, numbered in order, are (1) AAAA (2) BBB (3) JJJ (4) KKK (5) FF (6) OOOOO (7) PPPP (8) LLL (9) MMMMMM.

Here in all there are 9 runs, of the letters A, B, J, K, F, O, P, L and M respectively. The first run is of 4 A's, the second of 3 B's, and so on up to the 9th run of 6 M's.

The total number of runs appearing in an arrangement is often a good indication of a possible lack of randomness.

If there are too few runs, a definite grouping, clustering or trend may be suspected.

If there are too many runs, some sort of repeated alternating pattern may be suspected. Thus it may be possible to show that too many or too few runs in a sample indicate something other than chance when the items were selected.

The number of runs r is a statistic with its own special sampling distribution and its own test. For a sequence made up of $n_1$ symbols of one kind and $n_2$ symbols of another kind, the mean and standard error of r are
$$\text{Mean} = \mu_r = \frac{2n_1 n_2}{n_1 + n_2} + 1$$
$$\text{S.E.} = \sigma_r = \sqrt{\frac{2n_1 n_2\,(2n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2\,(n_1 + n_2 - 1)}}$$

Also, the sampling distribution of r can be approximated by a normal distribution if either $n_1$ or $n_2$ is larger than 20:
$$Z = \frac{r - \mu_r}{\text{S.E.}}$$
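A minimal sketch of the runs test for a sequence of two kinds of symbols, using the mean and standard error formulas above. The helper name `runs_test` and the short sequence are hypothetical; the sequence is far too short for the normal approximation and only shows the arithmetic (r = 5 runs, mean 6, Z ≈ −0.67).

```python
from math import sqrt


def runs_test(seq):
    """Runs test for a sequence of exactly two kinds of symbols.

    Returns (r, mean of r, Z); the normal approximation is only
    recommended when n1 or n2 exceeds 20.
    """
    symbols = sorted(set(seq))
    if len(symbols) != 2:
        raise ValueError("the runs test needs exactly two kinds of symbols")
    n1, n2 = seq.count(symbols[0]), seq.count(symbols[1])
    # a new run starts wherever a symbol differs from the one before it
    r = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    mean_r = 2 * n1 * n2 / (n1 + n2) + 1
    var_r = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / (
        (n1 + n2) ** 2 * (n1 + n2 - 1))
    return r, mean_r, (r - mean_r) / sqrt(var_r)


print(runs_test("AABBBABBAA"))   # hypothetical sequence
```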

4.3 Test for Independence of Attributes
Let the observations be classified according to two attributes, and let the observed frequencies $O_{ij}$ in the different categories be shown in a two-way table called a contingency table. We have to test whether the two attributes are independent or not.

Under the null hypothesis $H_0$ that the two attributes are independent, the expected frequency for a cell is calculated as
$$\text{Expected frequency of any cell} = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$$

Thus expected frequencies for all cells are calculated.

Test statistic
$$\chi^2 = \sum_{j=1}^{n}\sum_{i=1}^{m} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \sum_{j=1}^{n}\sum_{i=1}^{m} \frac{O_{ij}^2}{E_{ij}} - N$$
where
$$\sum_i\sum_j O_{ij} = \sum_i\sum_j E_{ij} = N, \qquad E_{ij} = \frac{R_i \times C_j}{N}$$

There are m rows and n columns in an m × n contingency table.
$R_i$ denotes the ith row total.
$C_j$ denotes the jth column total.
$E_{ij}$ = expected frequency of the (i, j)th cell.
N is the grand total.
d.f. = (m − 1)(n − 1)

If $\chi^2_{\text{calculated}} < \chi^2_{\text{table}}$, we say that $H_0$ is accepted, i.e. the attributes are independent.

Otherwise $H_0$ is rejected, i.e. the attributes are not independent.
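The expected frequencies and the χ² statistic can be computed cell by cell. A minimal sketch; the helper name `chi_square_independence` and the 2 × 2 table are hypothetical. For this table χ² ≈ 0.794 with 1 d.f., below the 5% table value of 3.84, so independence would be accepted.

```python
def chi_square_independence(table):
    """Return (chi-square, d.f.) for an m x n table of observed frequencies."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)                              # grand total N
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = rows[i] * cols[j] / n          # E_ij = R_i * C_j / N
            chi2 += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


chi2, df = chi_square_independence([[10, 20], [30, 40]])   # hypothetical data
print(round(chi2, 4), df)
```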

2 × 2 Contingency Table and Yates' Correction

Let A and B denote the presence of two attributes, and α and β denote not A and not B respectively.

a, b, c, d are the cell frequencies as under:

                A          Not A (α)    Total
B               a          b            a + b
Not B (β)       c          d            c + d
Total           a + c      b + d        a + b + c + d = N

Here
$$\chi^2 = \frac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$$
and d.f. = 1.

In a 2 × 2 contingency table, if any cell frequency is less than 5, Yates' correction can be applied:
$$\chi^2_{\text{corrected}} = \frac{N\left(|ad - bc| - \frac{N}{2}\right)^2}{(a+b)(c+d)(a+c)(b+d)}$$
d.f. = 1.
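The 2 × 2 shortcut formula and its Yates-corrected form can be compared directly. A minimal sketch; the helper name `chi_square_2x2` and the cell frequencies a = 3, b = 7, c = 8, d = 2 (two cells below 5) are hypothetical. The correction lowers χ² from about 5.05 to about 3.23, which changes the 5% decision (table value 3.84 for 1 d.f.).

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for a 2 x 2 table, optionally with Yates' correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = diff - n / 2        # |ad - bc| - N/2
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


print(round(chi_square_2x2(3, 7, 8, 2), 4))
print(round(chi_square_2x2(3, 7, 8, 2, yates=True), 4))
```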

SPEARMAN'S RANK CORRELATION PROCEDURE
Spearman's rank correlation coefficient is a measure of association based on the ordinal feature of data. Among the various statistical methods based on ranks, Spearman's rank correlation procedure was the earliest to be developed. The method is also simple to use and easy to apply, and it has proved to be as powerful as Karl Pearson's correlation coefficient when the assumptions of parametric methods are violated.
It is denoted by R and given by
$$R = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$
where D is the difference between the two ranks of a pair and n is the number of pairs.

The standard error of the rank correlation coefficient is given by
$$\text{S.E.} = \frac{1}{\sqrt{n - 1}}$$
Spearman's rank correlation coefficient may be employed as a test statistic to test a hypothesis of no association between two populations. We assume that pairs of observations have been randomly selected; therefore the hypothesis of no association between the populations implies a random assignment of ranks within each sample. Each random assignment represents a sample point associated with the experiment, and a value of R could be calculated for each. Thus it is possible to calculate the probability that R assumes a large positive or large negative value due solely to chance, and thereby suggests an association between the populations when none exists.
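A minimal sketch of Spearman's R, assuming there are no tied values (ties would need average ranks, which this sketch does not handle). The helper names and the five pairs of observations are hypothetical. Here ΣD² = 4 and R = 0.8; dividing by the S.E. $1/\sqrt{n-1}$ gives 1.6, although with only five pairs such a normal comparison is only indicative.

```python
from math import sqrt


def ranks(values):
    """Rank 1 = smallest value; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_r(x, y):
    """R = 1 - 6 * sum(D^2) / (n (n^2 - 1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


x = [35, 40, 52, 61, 70]     # hypothetical observations
y = [60, 55, 72, 68, 80]
r = spearman_r(x, y)
print(round(r, 2), round(r * sqrt(len(x) - 1), 2))   # R and R / S.E.
```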

Some remarks about R

1. This test makes no assumptions about the probability relation between the two variables, and hence it is less restrictive than the test of correlation based on Pearson's coefficient, which requires that the variables be normally distributed.
2. This test ignores the stronger information contained in data which have an interval or ratio scale.
3. As the sample size gets larger, the data manipulations required for non-parametric procedures are sometimes laborious unless appropriate computer software is available.
4. A collection of tabulated critical values for a variety of non-parametric tests under situations dealing with small and large n is not readily available.
Solved Examples on Modules III and IV

Ex.1 A population consists of four values 0, 2, 4, 6. Draw all possible samples (with replacement) of size 2 from the population and hence find the sampling distribution of sample means.

Sol. The total number of samples will be $4^2 = 16$. Samples of size 2 and their sample means are shown below.

S.No.  Sample values  Total  Sample mean     S.No.  Sample values  Total  Sample mean
1      0, 0           0      0               9      4, 0           4      2
2      0, 2           2      1               10     4, 2           6      3
3      0, 4           4      2               11     4, 4           8      4
4      0, 6           6      3               12     4, 6           10     5
5      2, 0           2      1               13     6, 0           6      3
6      2, 2           4      2               14     6, 2           8      4
7      2, 4           6      3               15     6, 4           10     5
8      2, 6           8      4               16     6, 6           12     6

As shown above, 16 possible samples of size 2 can be drawn with replacement. Hence each of the 16 sample means occurs with probability 1/16.

Sample mean 1 is repeated twice and hence occurs with probability
$$2 \times \frac{1}{16} = \frac{1}{8}$$

Similarly, sample means 0 and 6 occur once each, 2 and 4 thrice each, 5 twice and 3 four times. The probability distribution of the sample mean $\bar x$ is given below.

Sample mean ($\bar x$)   0      1      2      3      4      5      6
Probability (p)          1/16   2/16   3/16   4/16   3/16   2/16   1/16
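The sampling distribution in Ex.1 can be reproduced by enumerating all 16 samples with `itertools.product` and counting the means with exact fractions. A small sketch for checking the table; note that `Fraction` prints reduced forms, so 2/16 appears as 1/8 and 4/16 as 1/4.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [0, 2, 4, 6]
samples = list(product(population, repeat=2))          # with replacement: 4^2 = 16
counts = Counter(Fraction(sum(s), 2) for s in samples)  # frequency of each sample mean
dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for m, pr in dist.items():
    print(m, pr)
```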
Ex.2 A population consists of the four numbers 3, 4, 2, 5. Consider all possible distinct samples (without replacement) of size two and verify that the population mean is equal to the mean of the sample means.

Sol. The population mean is
$$\mu = \frac{3 + 4 + 2 + 5}{4} = 3.5$$

All possible distinct samples of size two (without replacement) and the corresponding sample means are shown in the following table:

Sample No.  Sample values  Total of the sample values  Sample mean
(Col. 1)    (Col. 2)       (Col. 3)                    [Col. 3 ÷ 2]
1           3, 4           7                           3.5
2           3, 2           5                           2.5
3           3, 5           8                           4.0
4           4, 2           6                           3.0
5           4, 5           9                           4.5
6           2, 5           7                           3.5
Total                                                  21.0

[Sampling used in the above table is random sampling without replacement, and the number of samples is ${}^4C_2 = 6$.]

$$\text{Mean of sample means} = \frac{21.0}{6} = 3.5$$

Hence population mean = 3.5 = the mean of the sample means.
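The verification in Ex.2 can be repeated with `itertools.combinations`, which generates the ${}^4C_2 = 6$ distinct samples without replacement. A small sketch; it prints the number of samples, the population mean and the mean of the sample means.

```python
from itertools import combinations
from statistics import mean

population = [3, 4, 2, 5]
sample_means = [mean(s) for s in combinations(population, 2)]   # 4C2 = 6 samples
print(len(sample_means), mean(population), mean(sample_means))
```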


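Ex.4 In a random sample of 81 items taken from a large consignment, some were found to be defective. If the standard error of the proportion of defective items in the sample is 1/18, find 95% confidence limits for the percentage of defective items in the consignment.

Sol.
$$\text{S.E.} = \sqrt{\frac{PQ}{n}}, \qquad Q = 1 - P$$
where P = proportion of defectives.
$$\frac{1}{18} = \sqrt{\frac{P(1-P)}{81}} = \frac{\sqrt{P(1-P)}}{9} \;\Rightarrow\; \sqrt{P(1-P)} = \frac{1}{2}$$
$$\Rightarrow 4P(1 - P) = 1 \;\Rightarrow\; 4P^2 - 4P + 1 = 0 \;\Rightarrow\; (2P - 1)^2 = 0 \;\Rightarrow\; P = \frac{1}{2}$$
$$P = \frac{1}{2} \;\Rightarrow\; Q = 1 - P = \frac{1}{2}$$

95% confidence limits are
$$\frac{1}{2} \pm 1.96 \times \frac{1}{18} = 0.5 \pm 0.1089 = 0.6089 \text{ and } 0.3911$$
i.e. the percentage of defective items in the consignment lies between 39.11% and 60.89%.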
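The arithmetic of Ex.4 can be checked as follows: recover PQ from the given standard error, solve the quadratic for P, and form the 95% limits. A small sketch; the `max()` guard only absorbs floating-point rounding, since the discriminant is exactly zero in this example.

```python
from math import sqrt

n, se = 81, 1 / 18
pq = n * se ** 2                  # from SE = sqrt(PQ / n): PQ = n * SE^2 = 1/4
# P solves P^2 - P + PQ = 0; the discriminant 1 - 4PQ is zero here, so P = 1/2
# (max() guards against a tiny negative value caused by floating-point rounding)
p = (1 + sqrt(max(0.0, 1 - 4 * pq))) / 2
lower, upper = p - 1.96 * se, p + 1.96 * se
print(round(p, 4), round(lower, 4), round(upper, 4))   # 0.5 0.3911 0.6089
```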

Ex.5 The financial controller for Home Electronics is concerned about rising personnel costs. Recruiting expenses appear to be too high, and the controller suspects that an undue number of applicants are being examined for each new position. From the recently filled positions he sampled 36 and found that the mean number of applicants screened was 38, with a standard deviation of 4.5. Construct a 95% confidence interval for the mean number of applicants screened for each new job at Home Electronics.

Sol. $\text{S.E.} = s/\sqrt{n} = 4.5/\sqrt{36} = 0.75$, and
$$Z = \frac{\bar x - \mu}{\text{S.E.}} = \pm 1.96 \;\Rightarrow\; \frac{38 - \mu}{0.75} = \pm 1.96$$
$$\mu = 38 \pm 1.96 \times 0.75 = 38 \pm 1.47 = 39.47,\ 36.53$$

The 95% confidence interval can approximately be taken as 36 to 40. Ans.
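A minimal sketch of the large sample interval x̄ ± Z·s/√n used in Ex.5. The helper name `mean_confidence_interval` is hypothetical.

```python
from math import sqrt


def mean_confidence_interval(x_bar, s, n, z=1.96):
    """Large sample interval x_bar +/- Z * s / sqrt(n) (hypothetical helper)."""
    se = s / sqrt(n)
    return x_bar - z * se, x_bar + z * se


low, high = mean_confidence_interval(38, 4.5, 36)
print(round(low, 2), round(high, 2))   # 36.53 39.47
```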

Ex.6 The business manager of a large company wants to check the inventory records against the physical inventories by a sample survey. He wants (i) to be 95% confident and (ii) to be almost sure that the maximum sampling error is not more than 5% above or below the true proportion of inaccurate records. The proportion of inaccurate records is estimated at 20% from past experience. Determine the sample size.

Sol.
$$P = \frac{20}{100} = \frac{1}{5}, \qquad Q = 1 - P = \frac{4}{5}$$
$$\text{S.E. of proportion} = \sqrt{\frac{PQ}{n}} = \sqrt{\frac{\frac{1}{5} \times \frac{4}{5}}{n}} = \frac{2}{5\sqrt{n}}$$
where n is the sample size.

The maximum allowable error is 5%, i.e. 0.05, so $p - P = \pm 0.05$.

(i) For the 95% confidence level Z = 1.96:
$$\frac{0.05}{2/(5\sqrt{n})} = 1.96 \;\Rightarrow\; \sqrt{n} = \frac{1.96 \times 2}{5 \times 0.05} = 15.68 \;\Rightarrow\; n \approx 246 \text{ (approx.)}$$

(ii) If the manager wants to be almost sure, then Z = 3:
$$\frac{0.05}{2/(5\sqrt{n})} = 3 \;\Rightarrow\; \sqrt{n} = \frac{3 \times 2}{5 \times 0.05} = 24 \;\Rightarrow\; n = 576$$
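Both parts of Ex.6 use $n = (Z\sqrt{PQ}/E)^2$, rounded up to a whole number of records. A minimal sketch; the helper name `sample_size_for_proportion` is hypothetical, and the inner `round(..., 6)` only removes floating-point noise so that an exact 576 is not pushed up to 577.

```python
import math


def sample_size_for_proportion(p, e, z):
    """n = (Z * sqrt(PQ) / E)^2, rounded up (hypothetical helper)."""
    q = 1 - p
    n = (z * math.sqrt(p * q) / e) ** 2
    # round first to absorb floating-point noise, then round up
    return math.ceil(round(n, 6))


print(sample_size_for_proportion(0.20, 0.05, 1.96))  # 246
print(sample_size_for_proportion(0.20, 0.05, 3))     # 576
```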

Ex.7 A company has its head office at Calcutta and a branch at Bombay. The personnel director wanted to know if the workers at the two places would like the introduction of a new plan of work, and a survey was conducted for this purpose. Out of a sample of 500 workers at Calcutta, 62% favoured the new plan. At Bombay, out of a sample of 400 workers, 41% were against the new plan. Is there any significant difference between the two groups in their attitude towards the new plan at the 5% level?

Sol. Let $P_1$ and $P_2$ be the population proportions in Calcutta and Bombay respectively who favour the new plan.

Let the null hypothesis be $H_0: P_1 = P_2$.

The alternative hypothesis is $H_1: P_1 \ne P_2$.

We have
$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}, \qquad \text{S.E.}(p_1 - p_2) = \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

Here $n_1 = 500$, $p_1 = 0.62$, $n_2 = 400$, $p_2 = \frac{100 - 41}{100} = 0.59$.
$$p = \frac{500 \times 0.62 + 400 \times 0.59}{500 + 400} = \frac{546}{900} = 0.607, \qquad q = 1 - p = 0.393$$
$$\text{S.E.}(p_1 - p_2) = \sqrt{0.607 \times 0.393\left(\frac{1}{500} + \frac{1}{400}\right)} = \sqrt{0.607 \times 0.393 \times \frac{9}{2000}} = \sqrt{0.00107} = 0.0327$$
$$Z = \frac{p_1 - p_2}{\text{S.E.}(p_1 - p_2)} = \frac{0.62 - 0.59}{0.0327} = 0.917$$

Since |Z| = 0.917 < 1.96, the null hypothesis is accepted at the 5% level of significance, and we conclude that there is no significant difference between the two groups in their attitude towards the new plan.

Ex.8 If it costs a rupee to draw one member of a sample, how much would it cost, in sampling from a universe with mean 100 and standard deviation 10, to take sufficient members to ensure that the mean of the sample would in all probability be within 0.01% of the true value? Also find the additional cost to double the precision.

Sol. We know that mean ± 3 standard errors covers 99.73% of the total area or cases (leaving out only 0.27%), which in other words amounts to overall coverage in all probability.

The standard error of the mean of the sample is
$$\sigma_{\bar x} = \frac{\sigma_p}{\sqrt{n}}$$
where $\sigma_p$ denotes the S.D. of the population and n the number of members (or items) in the sample.

In all probability the difference between the sample mean and the population mean should be 3 times the S.E., i.e. $3\sigma_p/\sqrt{n}$, and the given value of it is 0.01% of the mean (i.e. 0.01% of 100), i.e. 0.01. So
$$\frac{3\sigma_p}{\sqrt{n}} = 0.01 \;\Rightarrow\; \frac{3 \times 10}{\sqrt{n}} = 0.01 \;\Rightarrow\; \sqrt{n} = \frac{30}{0.01} = 3000 \;\Rightarrow\; n = 9{,}000{,}000$$

So the number of members sufficient to ensure that the mean of the sample is in all probability within 0.01% of the true value is 9,000,000, and consequently the total cost will be Rs. 90 lakhs.

To double the precision means to halve the standard error. In order to halve the standard error (i.e. double the precision), the number of members in the sample should be fourfold, i.e. 36,000,000. Since the question asks for the additional cost, it will be Rs. 36,000,000 − Rs. 9,000,000 = Rs. 27,000,000.

Note: Precision. Precision is defined as the degree of accuracy with which the sample mean can estimate the population mean, as revealed by the standard error of the mean.

As the standard error decreases, the precision with which the sample mean can be used to estimate the population mean increases, i.e.
$$\text{Precision} \propto \frac{1}{\text{S.E.}_{\bar x}}$$
∴ if precision is doubled, the S.E. will be halved ⇒ the sample size will become four times, because for a given population the S.D. is fixed, and to halve
$$\text{S.E.} = \frac{\sigma}{\sqrt{n}}$$
the sample size n will have to be made 4 times.

For a fixed sample size, reduction in the interval width causes greater precision, i.e. doubling the precision means reducing the interval to half.

Ex.9 If it costs Rs. 40 to draw one unit of a sample, how much would it cost, in sampling from a universe with mean 100 and standard deviation 10, to take a sufficient number of units to ensure that the mean of the sample would, at the 5% level of significance, be within 1% of the true value? Find the extra cost to double the precision.

Sol. Mean of the sample − mean of the universe
$$= |\text{observed value} - \text{expected value}| = \frac{1}{100} \times 100 = 1$$

S.E. of mean (for a large sample) $= \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{n}}$

$$Z = \frac{|\bar x - \mu|}{\text{S.E.}_{\bar x}} = \frac{1}{10/\sqrt{n}} = \frac{\sqrt{n}}{10} = 1.96 \qquad [\text{at the 5\% level of significance } Z = 1.96]$$
$$\Rightarrow \sqrt{n} = 19.6 \;\Rightarrow\; n = 384.16 \approx 384 \text{ (approx.)}$$

Cost of drawing 384 units = 384 × 40 = Rs. 15,360.

If the precision is doubled, the sample size = 384.16 × 4 = 1536.64 ≈ 1537 (approx.).

Extra cost = (1537 − 384) × 40 = Rs. 46,120. Ans.

If the fraction in the sample size is ignored (i.e. 1536 units instead of 1537), then the extra cost = Rs. 46,080.
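Ex.8, Ex.9 and Ex.13 all rest on $n = (Z\sigma/E)^2$, and doubling the precision multiplies n by 4. A minimal sketch reproducing the figures of Ex.8 and Ex.9; the helper name `sample_size_for_mean` is hypothetical.

```python
def sample_size_for_mean(sigma, e, z):
    """n = (Z * sigma / E)^2 (hypothetical helper)."""
    return (z * sigma / e) ** 2


# Ex.8: Z = 3 ("in all probability"), sigma = 10, E = 0.01% of 100 = 0.01, Re 1 per member
n8 = round(sample_size_for_mean(10, 0.01, 3))
print(n8, (4 * n8 - n8) * 1)               # members, and extra cost in Rs. to double precision

# Ex.9: Z = 1.96, sigma = 10, E = 1% of 100 = 1, Rs. 40 per unit
n9 = sample_size_for_mean(10, 1, 1.96)
cost = round(n9) * 40
extra = (round(4 * n9) - round(n9)) * 40   # doubling precision quadruples n
print(round(n9, 2), cost, extra)
```

The printed values, 9,000,000 members with Rs. 27,000,000 extra, and 384.16 units costing Rs. 15,360 with Rs. 46,120 extra, agree with the worked answers.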

Ex.10 1,800 persons of a certain age group were observed to have a standard deviation of 9.2 beats per minute. Assign the limits for the standard deviation of the population, assuming the above sample of 1,800 persons came from a normally distributed universe.

Sol. Standard error of standard deviation:
$$\sigma_s = \frac{\sigma}{\sqrt{2n}}$$

Substituting the values, we get
$$\sigma_s = \frac{9.2}{\sqrt{2 \times 1800}} = \frac{9.2}{60} = 0.153$$

As thrice the standard error covers almost the total number of cases (to be exact, 99.73% of cases), the population standard deviation should not differ by more than ±3 S.E., i.e. it should remain within 9.2 ± 3(0.153):

9.2 − 3(0.153) = 9.2 − 0.459 = 8.741
9.2 + 3(0.153) = 9.2 + 0.459 = 9.659

Hence the limits of the population standard deviation are 8.741 and 9.659, i.e. the parameter standard deviation should lie between these two (minimum and maximum) values.

Ex.11 A sample study of 2,500 couples gives a correlation coefficient of 0.45. Estimate the limits to the correlation in the universe.

Sol. The S.E. of the correlation coefficient is
$$\text{S.E.}_r = \frac{1 - r^2}{\sqrt{n}}$$
where r is the correlation coefficient. Substituting the values, we get
$$\text{S.E.}_r = \frac{1 - (0.45)^2}{\sqrt{2500}} = \frac{1 - 0.2025}{50} = \frac{0.7975}{50} = 0.01595 \approx 0.016$$

In all probability, the parameter coefficient of correlation should not differ from the sample correlation coefficient by more than thrice the S.E., as sample r ± 3 S.E. would cover 99.73% of the total population. So the limits to the coefficient of correlation are

0.45 − 3(0.016) = 0.45 − 0.048 = 0.402
0.45 + 3(0.016) = 0.45 + 0.048 = 0.498

Thus we can confidently expect that the parameter (population) coefficient of correlation should be within the limits 0.402 and 0.498.

Note: It should be noted that $\text{S.E.}_r = (1 - r^2)/\sqrt{n}$ should be used only when r is moderate, say less than 0.5, and n is large; otherwise the t-test of the significance of r should be used.

Ex.12 A correlation coefficient of 0.2 is obtained from a random sample of 1,600 pairs of observations.
Do you think this value of correlation coefficient Is si8nificant?

Sol. To conclude whether the value of r = 0.2 is significant, I.e., whether the observed pairs are really
correlated. It is necessary to find out the value of r which may arise on account of chance when 1,600
pairs are observed, presuming that the observed pairs are uncorrelated.

On the hypothesis that the pairs are uncorrelated, viz., r = 0:

$$\text{S.E.}_r = \frac{1 - r^2}{\sqrt{n}} = \frac{1 - 0}{\sqrt{1600}} = \frac{1}{40} = 0.025$$

We know that 3 S.E. cover 99.73% of cases; therefore, the upper limit of r on account of sampling fluctuations will be 3(0.025) = 0.075. But the observed value of r is 0.2, which is many times this value, so we can safely conclude that the value r = 0.2 is highly significant, i.e., the observed pairs are really correlated.
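The test above reduces to comparing |r| with three standard errors computed under r = 0. A minimal sketch (variable names are illustrative):

```python
import math

r, n = 0.2, 1600
se0 = 1 / math.sqrt(n)          # (1 - r**2) / sqrt(n) with r = 0 under the hypothesis
chance_limit = 3 * se0          # largest |r| plausibly due to sampling fluctuation
significant = abs(r) > chance_limit
print(se0, chance_limit, significant)
```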

Ex.13 Mr. X wants to determine the average time to complete a certain job. The past records show that the population standard deviation is 10 days. Determine the sample size so that Mr. X may be 95% confident that the sample average remains within ±2 days of the population average.

(Critical value of Z at 95% confidence is 1.96 from the standard normal area table.)

Sol. The size of the sample is given by the formula

$$n = \left(\frac{Z\sigma}{E}\right)^2$$

where σ = population standard deviation,

Z = the normal variate value corresponding to the given confidence level,

E = sampling error = observed value of the mean − expected value of the mean.

Here Z = 1.96, σ = 10 and E = 2. Therefore,

$$n = \left(\frac{1.96 \times 10}{2}\right)^2 = (9.8)^2 = 96.04 \approx 96 \quad \text{(Ans.)}$$
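A small Python sketch of the sample-size formula n = (Zσ/E)². The text rounds 96.04 to 96; a common conservative convention, shown here as an alternative rather than as the author's method, is to round up so that the margin of ±2 days is guaranteed. The helper name `sample_size` is hypothetical.

```python
import math

def sample_size(z, sigma, e):
    """n = (z * sigma / e) ** 2 for estimating a mean within ±e."""
    return (z * sigma / e) ** 2

n_exact = sample_size(1.96, 10, 2)
n_safe = math.ceil(n_exact)   # rounding up never lets the error exceed e
print(round(n_exact, 2), n_safe)
```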

Ex.14 A manufacturer of ball point pens claims that a certain pen he manufactures has a mean writing life of 400 pages with a standard deviation of 20 pages. A purchasing agent selects a sample of 100 pens and puts them to test. The mean writing life for the sample was 390 pages. Should the purchasing agent reject the manufacturer's claim at 5% level?

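Sol. Let the null hypothesis H0 be that the mean writing life of ball pens is 400 pages, H0: μ = 400.

Alternative hypothesis H1: the mean writing life of ball pens is not 400 pages, H1: μ ≠ 400.

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{390 - 400}{20/\sqrt{100}} = \frac{-10}{2} = -5$$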

The tabulated value of z at 5% level of significance (two-tailed) is 1.96.

Since the calculated value |z| = 5 is more than the tabulated value at 5% level, the claim of the manufacturer is rejected. The purchasing agent should reject the manufacturer's claim that the mean writing life of pens is 400 pages.
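A minimal sketch of the one-sample z-test used above, treating σ = 20 as known; the function name `one_sample_z` is hypothetical.

```python
import math

def one_sample_z(xbar, mu0, sigma, n):
    """z = (xbar - mu0) / (sigma / sqrt(n)), with sigma treated as known."""
    return (xbar - mu0) / (sigma / math.sqrt(n))

z = one_sample_z(390, 400, 20, 100)
reject = abs(z) > 1.96   # two-tailed test at 5%
print(z, reject)
```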

Ex.15 A manufacturer claimed that at least 95% of the equipment which he supplied to a factory conformed to specifications. An examination of a sample of 200 pieces of equipment revealed that 18 were faulty. Test his claim at a significance level of (i) 0.05, (ii) 0.01.

Sol. The null hypothesis is that the proportion of equipment conforming to specifications is 95%, i.e., H0: P = 0.95. The alternative hypothesis is that it is less than 95%, H1: P < 0.95. Now the number of pieces found not faulty = 200 − 18 = 182.

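The proportion of equipment conforming to specifications, i.e., the observed value, is

$$p = \frac{182}{200} = 0.91$$

Assuming H0 to be true, the expected value is P = 0.95, and

$$\text{S.E.}_p = \sqrt{\frac{PQ}{n}} = \sqrt{\frac{0.95 \times 0.05}{200}} = 0.0154$$

$$z = \frac{p - P}{\text{S.E.}_p} = \frac{0.91 - 0.95}{0.0154} = \frac{-0.04}{0.0154} \approx -2.60$$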

(i) z = −2.60 is less than −1.645; therefore, at 5% level, the claim is not justified.
(ii) z = −2.60 is less than −2.33; therefore, at 1% level, the claim is not justified.

Note: Since we are interested in checking only the lower proportion, a one-tailed test has been used.
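A minimal sketch of the one-sample test for a proportion, with the S.E. computed under H0 as in the text; the helper name `proportion_z` is hypothetical and the critical values are the one-tailed ones quoted above.

```python
import math

def proportion_z(x, n, p0):
    """One-sample z for a proportion; S.E. computed under H0: P = p0."""
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)
    return p_hat, se, (p_hat - p0) / se

p_hat, se, z = proportion_z(182, 200, 0.95)
for alpha, z_crit in [(0.05, -1.645), (0.01, -2.33)]:   # left-tailed critical values
    print(alpha, "reject H0" if z < z_crit else "do not reject H0")
```

The same helper covers the later single-proportion examples: `proportion_z(318, 600, 0.5)` (53% of 600 in Ex.21) gives z ≈ 1.47, and `proportion_z(26, 400, 0.05)` (Ex.23) gives z ≈ 1.376.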

Ex. 16 In random samples of 600 and 1000 men from two cities, 400 and 600 men are found to be literate.
Do the data indicate (at 5% level of significance) that the populations are significantly different in the
percentage of literacy?

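Sol. Null hypothesis H0: P1 = P2; alternative hypothesis H1: P1 ≠ P2.

Here n1 = 600, p1 = 400/600 = 2/3; n2 = 1000, p2 = 600/1000 = 3/5, so p1 − p2 = 2/3 − 3/5 = 1/15.

If H0 is true, the best estimate of the common value of P is given by

$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{400 + 600}{600 + 1000} = \frac{1000}{1600} = \frac{10}{16}, \qquad q = 1 - \frac{10}{16} = \frac{6}{16}$$

$$\text{S.E.}(p_1 - p_2) = \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{\frac{10}{16} \cdot \frac{6}{16}\left(\frac{1}{600} + \frac{1}{1000}\right)} = \frac{1}{40}$$

$$z = \frac{p_1 - p_2}{\text{S.E.}(p_1 - p_2)} = \frac{1/15}{1/40} = \frac{40}{15} = 2.67$$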

This value of z is greater than 1.96 (the critical value at 5% level), so it is significant, and we conclude that the difference between the two populations in percentage of literacy is significant.
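A minimal sketch of the pooled two-proportion z statistic used above; `two_proportion_z` is a hypothetical helper name, and it takes counts rather than proportions.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic for H0: P1 = P2."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)          # pooled estimate of P under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return se, (p1 - p2) / se

se, z = two_proportion_z(400, 600, 600, 1000)
print(round(se, 4), round(z, 2))
```

The same helper can be applied to Ex.17 (`two_proportion_z(675, 900, 800, 1000)`, z ≈ −2.61 without rounding p to 0.78), Ex.19 (`two_proportion_z(74, 157, 65, 172)`, z ≈ 1.71) and Ex.24 (`two_proportion_z(120, 200, 108, 200)`, z ≈ 1.21). Small differences from the printed values come from the text rounding intermediate results.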

Ex.17 A firm found with the help of a sample survey in a city (size of sample 900) that 3/4 of the population consumes things produced by them. The firm thus advertised the goods in papers and on radio. After one year, a sample of size 1000 reveals that the proportion of consumers of the goods produced by the firm is 4/5. Is this significant to indicate that the advertisement was effective?

Sol. The null hypothesis is that the proportions of consumption before and after the advertisement were equal, H0: P1 = P2; alternative hypothesis H1: P1 < P2.

Here n1 = 900, 3/4 of 900 = 675, p1 = 675/900 = 0.75.

Here n2 = 1000, 4/5 of 1000 = 800, p2 = 800/1000 = 0.8.

If H0 is true, the best estimate of the common value of P is given by

$$p = \frac{0.75 \times 900 + 0.8 \times 1000}{900 + 1000} = \frac{675 + 800}{1900} = 0.78, \qquad q = 0.22$$

$$\text{S.E.}(p_1 - p_2) = \sqrt{0.78 \times 0.22\left(\frac{1}{900} + \frac{1}{1000}\right)} = 0.019$$

$$z = \frac{p_1 - p_2}{\text{S.E.}} = \frac{0.75 - 0.8}{0.019} = \frac{-0.05}{0.019} = -2.63$$

Here H1: P1 < P2 is one-sided, and for this test the critical regions are z ≤ −1.645 at 5% level and z ≤ −2.33 at 1% level.
Now this value z = −2.63 < −2.33, so it is significant even at 1% level. We reject H0 and conclude that the proportion of consumption increased after the advertisement, i.e., the advertisement was effective.

We can also say that |z| > 2.33; therefore, the null hypothesis can be rejected.

Ex.18 In an infantile paralysis epidemic, 500 persons contracted the disease. 300 received no serum treatment, and of them 75 became paralysed. Of those who received serum treatment, 65 became paralysed. Was serum treatment effective?

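Sol. Let P1 and P2 be the proportions paralysed with and without serum treatment respectively. We have the null hypothesis H0 that the serum treatment is not effective, i.e., P1 = P2, and the alternative hypothesis H1: P1 < P2.

Number of persons who received the serum = 500 − 300 = 200

Number of persons who did not receive the serum = 300

The proportion of persons who became paralysed after receiving the serum:

$$p_1 = \frac{65}{200} = 0.325$$

The proportion of persons who became paralysed without receiving the serum:

$$p_2 = \frac{75}{300} = 0.25$$

Under H0, the pooled estimate is

$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{65 + 75}{500} = 0.28, \qquad q = 1 - 0.28 = 0.72$$

$$\text{S.E.}(p_1 - p_2) = \sqrt{0.28 \times 0.72\left(\frac{1}{200} + \frac{1}{300}\right)} = 0.041$$

$$z = \frac{p_1 - p_2}{\text{S.E.}} = \frac{0.325 - 0.25}{0.041} = 1.83$$

Since H1: P1 < P2 calls for a left-tailed test, the critical region at 5% level of significance is z < −1.645. The calculated value z = 1.83 is positive (the serum group actually shows the higher proportion paralysed), so it does not fall in the critical region.

Hence at 5% level of significance the null hypothesis is accepted, i.e., there is no evidence that serum treatment reduced the proportion of persons getting paralysed; the serum treatment was not effective.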
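A minimal Python check of Ex.18, assuming sample 1 is the serum group; the helper name is illustrative. It makes the direction of the test explicit: with H1: P1 < P2, only a sufficiently negative z leads to rejection, so a positive z can never show the serum to be effective.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic for H0: P1 = P2."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Sample 1 = serum group (65 of 200 paralysed), sample 2 = no serum (75 of 300).
z = two_proportion_z(65, 200, 75, 300)
effective = z < -1.645   # left-tailed critical value at 5%
print(round(z, 2), effective)
```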
Ex.19 On a certain day, 74 trains arrived on time at Delhi station during the rush hours and 83 were late. At New Delhi there were 65 on time and 107 late. Is there any difference in the proportions arriving on time at the two stations?

Sol. Let the null hypothesis be that there is no difference in the proportions of trains arriving on time at Delhi and New Delhi railway stations, i.e., H0: P1 = P2.

Alternative hypothesis H1: P1 ≠ P2.

Total number of trains arriving at Delhi station = 74 + 83 = 157

Total number of trains arriving at New Delhi station = 65 + 107 = 172

Proportion of trains reaching Delhi station on time:

$$p_1 = \frac{74}{157} = 0.471$$

Proportion of trains reaching New Delhi station on time:

$$p_2 = \frac{65}{172} = 0.378$$

S.E. of the difference of proportions:

$$\text{S.E.}(p_1 - p_2) = \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where p is the mean (pooled) proportion of trains reaching on time:

$$p = \frac{74 + 65}{157 + 172} = \frac{139}{329} = 0.422, \qquad q = 1 - 0.422 = 0.578$$

$$\therefore \text{S.E.}(p_1 - p_2) = \sqrt{0.422 \times 0.578\left(\frac{1}{157} + \frac{1}{172}\right)} = 0.054$$

$$z = \frac{p_1 - p_2}{\text{S.E.}(p_1 - p_2)} = \frac{0.471 - 0.378}{0.054} = 1.722$$

The value of z at 5% level of significance (two-tailed) is 1.96.


Since the calculated value of z is less than the tabulated value, the null hypothesis is accepted at 5% level, i.e., at 5% level of significance there is no significant difference between the proportions of trains arriving on time at Delhi and New Delhi railway stations.

Ex.20 In a random selection of 64 of the 600 road crossings in a town, the mean number of automobile accidents per year was found to be 4.2 and the sample S.D. was 0.8. Construct a 95% confidence interval for the mean number of automobile accidents per crossing per year.

Sol. Since the sample is drawn from a finite population of N = 600 crossings, the finite population correction is applied:

$$\text{S.E. of mean} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} = \frac{0.8}{\sqrt{64}}\sqrt{\frac{600 - 64}{600 - 1}} = 0.1\sqrt{\frac{536}{599}} = 0.0946$$

x̄ = 4.2
For 95% confidence, the value of z = 1.96.

∴ The 95% confidence interval for the mean will be 4.2 ± 1.96 × 0.0946 = 4.2 ± 0.1854, i.e., 4.0146 to 4.3854.
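A minimal sketch of a confidence interval for a mean with the finite population correction √((N − n)/(N − 1)); the helper name `mean_ci_finite` is hypothetical.

```python
import math

def mean_ci_finite(xbar, s, n, N, z=1.96):
    """CI for a mean, S.E. multiplied by the finite population correction."""
    se = (s / math.sqrt(n)) * math.sqrt((N - n) / (N - 1))
    return se, xbar - z * se, xbar + z * se

se, lower, upper = mean_ci_finite(4.2, 0.8, 64, 600)
print(f"S.E. = {se:.4f}; 95% CI {lower:.4f} to {upper:.4f}")
```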

Ex.21 A sample of 600 persons selected at random from a large city shows that the percentage of males in the sample is 53. It is believed that the ratio of males to the total population in the city is 1/2. Test whether this belief is confirmed by the observation.

Sol. We have the null hypothesis H0 that the ratio of males to the total population in the city is 1/2 = 0.5.

Let P0 = 1/2 = 0.5 and p1 = 53% = 0.53; when P = 0.5, Q = 1 − 0.5 = 0.5.

$$\text{S.E.}_p = \sqrt{\frac{PQ}{n}} = \sqrt{\frac{\frac{1}{2} \times \frac{1}{2}}{600}} = 0.0204$$

$$z = \frac{p_1 - P_0}{\text{S.E.}_p} = \frac{0.53 - 0.5}{0.0204} = 1.47$$

At 5% level of significance the value of z is 1.96, which is more than 1.47. Hence at 5% level of significance there is no significant difference between the observed value and the hypothesised value, and we accept our null hypothesis. Hence the belief that the ratio of males to the total population is 1/2 is supported by the sample observations.
Ex.22 In order to make a survey of buying habits, two markets A and B are chosen at two different parts of a city.

400 women shoppers are chosen at random in market A. Their average weekly expenditure on food is found to be Rs. 250 with an S.D. of Rs. 40. These figures are Rs. 220 and Rs. 55 respectively in market B, where also 400 women shoppers are chosen at random. Test at 1% level of significance whether the average weekly food expenditures of the two populations of shoppers are equal.

Sol. Given:

                Sample I          Sample II
Sample size     n1 = 400          n2 = 400
Sample mean     x̄1 = Rs. 250      x̄2 = Rs. 220
Sample S.D.     σ1 = Rs. 40       σ2 = Rs. 55

Let μ1 = mean of the first population,

and μ2 = mean of the second population.

Null hypothesis H0: μ1 = μ2; alternative hypothesis H1: μ1 ≠ μ2.

Assuming H0 to be true,

$$\text{S.E.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{1600}{400} + \frac{3025}{400}} = 3.4$$

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\text{S.E.}} = \frac{250 - 220}{3.4} = \frac{30}{3.4} = 8.82$$

This value of z is greater than 2.58, the critical value at 1% level of significance, so it lies in the rejection region. So we reject the null hypothesis and conclude that the average expenditures of the two populations of shoppers are not equal.
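A minimal sketch of the large-sample z-test for two means, with S.E. = √(σ1²/n1 + σ2²/n2) as in the text; `two_mean_z` is a hypothetical helper name.

```python
import math

def two_mean_z(m1, s1, n1, m2, s2, n2):
    """Large-sample z for H0: mu1 = mu2."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return se, (m1 - m2) / se

se, z = two_mean_z(250, 40, 400, 220, 55, 400)
print(round(se, 2), round(z, 2), abs(z) > 2.58)   # 2.58: two-tailed 1% critical value
```

The same function reproduces Ex.25: `two_mean_z(68.50, 2.5, 1200, 68.58, 3.0, 1500)` gives |z| ≈ 0.76.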

Ex.23 A supplier of components to the electronics industry makes a sophisticated product which sometimes fails immediately when it is used. He controls his manufacturing process so that the proportion of faulty products is supposed to be only 5%. Out of 400 components in one batch, 26 prove to be faulty. Has the process gone out of control to produce too many faulty components?
Sol. Let the null hypothesis be that the process has not gone out of control, i.e., the proportion of faulty components = 0.05.

Alternative hypothesis: the process has gone out of control and the proportion of defective components is more than 0.05, i.e., the process produces too many faulty components.

Expected proportion of faulty components = 5/100 = 0.05 = P0

Q0 = 1 − 0.05 = 0.95

Observed proportion of faulty components = 26/400 = 0.065 = p

$$z = \frac{p - P_0}{\text{S.E.}(p)} = \frac{0.065 - 0.05}{\sqrt{\dfrac{0.05 \times 0.95}{400}}} = \frac{0.015}{0.2179/20} = \frac{0.015 \times 20}{0.2179} = 1.376$$

The value of z at 1% level of significance for a one-tailed test is 2.33, and at 5% level of significance it is 1.65.

Since the calculated value of z is less than the tabulated value at both levels, we can reasonably expect that the process has not gone out of control, at both 5% and 1% level of significance.

Ex.24 A margarine firm has invited 200 men and 200 women to see if they can distinguish margarine from butter. It is found that 120 of the women, but only 108 of the men, can. Investigate whether there is any evidence of sex difference in taste discrimination.

Sol. We have the null hypothesis H0 that there is no evidence of sex difference in taste discrimination, i.e., P1 = P2.

Proportion of women who can distinguish margarine from butter:

$$p_1 = \frac{120}{200} = \frac{3}{5} = 0.6$$

Proportion of men who can distinguish margarine from butter:

$$p_2 = \frac{108}{200} = 0.54$$

$$p = \frac{120 + 108}{200 + 200} = \frac{228}{400} = 0.57$$

$$\text{S.E.}(p_1 - p_2) = \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{0.57 \times (1 - 0.57)\left(\frac{1}{200} + \frac{1}{200}\right)} = \sqrt{0.57 \times 0.43 \times 0.01} = 0.0495$$

$$z = \frac{0.6 - 0.54}{0.0495} = 1.21$$

The value of z for a two-tailed test at 1% level of significance is 2.58, and at 5% level of significance it is 1.96.

Since the calculated value is less than the tabulated value, we accept our null hypothesis. Hence there is no evidence of sex difference in taste discrimination.

Ex.25 Random samples drawn from two places gave the following data relating to the heights of adult males:

                                     Place A    Place B
Mean height (inches)                 68.50      68.58
Standard deviation (inches)          2.5        3.0
Number of items in sample            1200       1500

Test, at 5% level, whether the mean height is the same for adults in the two places. (Table value of z at 5% level for a two-tailed test is 1.96.)

Sol. We set H0: μ1 = μ2.

The standard error of the difference between the means of the samples is given by

$$\text{S.E.} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{(2.5)^2}{1200} + \frac{(3.0)^2}{1500}} = \sqrt{0.0052 + 0.006} = 0.1058$$

$$z = \frac{\text{Difference of means}}{\text{S.E. of difference of means}} = \frac{68.58 - 68.50}{0.1058} = \frac{0.08}{0.1058} = 0.76$$

The computed value of z being less than the table value, we cannot reject the null hypothesis, and so we conclude that the mean height is the same for adults in the two places.

Ex.26 In a certain city 380 men out of 800 were found to be smokers. Discuss whether this information supports the view that the majority of men in this city are non-smokers.

(Use 95% confidence level, for which the critical value of z is 1.96, as given in the standard normal area table.)

Sol. Let p denote the proportion of smokers in the city. Then from the given information, the sample proportion is

$$p_1 = \frac{380}{800} = 0.475$$

Let us construct a 95% confidence interval for the population proportion p.

The standard error is given by

$$\text{S.E.}_p = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.475 \times 0.525}{800}} = 0.01766$$

Hence the confidence limits for p, the proportion of smokers, at 95% confidence level are

0.475 ± 1.96 × 0.01766

= 0.4404 to 0.5096

The view that the majority of men in the city are non-smokers is equivalent to the view that a minority of men in the city are smokers, which amounts to the situation that the proportion p of smokers should be less than 0.5. Since the confidence limits for p include values more than 0.5 in the present case, the given information does not support the view that the majority of men in the city are non-smokers at 95% confidence level.
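A minimal sketch of the large-sample confidence interval for a proportion, using the sample proportion in the S.E. as the text does; the helper name `proportion_ci` is hypothetical.

```python
import math

def proportion_ci(x, n, z=1.96):
    """Large-sample CI p ± z * sqrt(p * q / n) based on the sample proportion."""
    p = x / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

lower, upper = proportion_ci(380, 800)
majority_non_smokers = upper < 0.5   # supported only if the whole interval lies below 0.5
print(round(lower, 4), round(upper, 4), majority_non_smokers)
```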

Ex.27 A random sample of 16 values from a normal population showed a mean of 41.5 cm and a sum of squares of deviations from the mean equal to 135 cm². Show that the assumption of a mean of 43.5 cm for the population is not reasonable and that the 95% fiducial limits for the mean are 39.9 and 43.1 cm.

Sol. We have the null hypothesis H0 that the sample has been drawn from a population whose mean is 43.5 cm.

Unbiased estimate of S.D.:

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{135}{15}} = 3$$

$$|t| = \frac{|\bar{X} - \mu|}{S/\sqrt{n}} = \frac{|41.5 - 43.5| \times 4}{3} = 2.666$$

d.f. = 16 − 1 = 15

The tabulated value of |t| for 15 d.f. at 5% level of significance is 2.13, which is less than the calculated value of |t|.

Hence the null hypothesis that the sample has been drawn from a normal population with mean 43.5 cm is rejected.

95% confidence limits for the population mean are

$$41.5 \pm 2.13 \times \frac{3}{4} \qquad \left[\text{S.E.}_{\bar{X}} = \frac{S}{\sqrt{n}}\right]$$

i.e., 41.5 ± 1.6, i.e., 39.9 to 43.1 (Ans.)
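A minimal sketch of the one-sample t-test and fiducial limits computed from the sum of squared deviations; the table value 2.13 is the one quoted in the text, not computed here.

```python
import math

n, xbar, ss, mu0 = 16, 41.5, 135.0, 43.5
s = math.sqrt(ss / (n - 1))      # unbiased S.D. from the sum of squared deviations
se = s / math.sqrt(n)
t = (xbar - mu0) / se
t_table = 2.13                   # 15 d.f., 5%, two-tailed (value quoted in the text)
lower, upper = xbar - t_table * se, xbar + t_table * se
print(round(t, 3), round(lower, 2), round(upper, 2))
```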

Ex.28 You are given the gain in weights (lbs) of cows fed on two diets X and Y.

GAIN IN WEIGHT (lbs)

Diet X: 25 32 30 32 24 14 32

Diet Y: 24 34 22 30 42 31 40 30 32 35

Test, at 5% level, whether the two diets differ as regards their effect on mean increase in weight. (Tabulated value of t for 15 degrees of freedom at 5% = 1.753)
Sol. Let us take the null hypothesis that the two diets X and Y do not differ significantly as regards their effect on increase in weight. Applying the t-test of difference of means:

$$t = \frac{\bar{X} - \bar{Y}}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

$$\bar{X} = \frac{25 + 32 + 30 + 32 + 24 + 14 + 32}{7} = \frac{189}{7} = 27, \qquad \bar{Y} = \frac{\sum Y}{n_2} = \frac{320}{10} = 32$$

X     X − 27   (X − 27)²     Y     Y − 32   (Y − 32)²
25    −2       4             24    −8       64
32    +5       25            34    +2       4
30    +3       9             22    −10      100
32    +5       25            30    −2       4
24    −3       9             42    +10      100
14    −13      169           31    −1       1
32    +5       25            40    +8       64
                             30    −2       4
                             32    0        0
                             35    +3       9
ΣX = 189, Σ(X − 27) = 0, Σ(X − 27)² = 266;  ΣY = 320, Σ(Y − 32) = 0, Σ(Y − 32)² = 350

Common variance:

$$S^2 = \frac{\sum (X - \bar{X})^2 + \sum (Y - \bar{Y})^2}{n_1 + n_2 - 2} = \frac{266 + 350}{7 + 10 - 2} = 41.07, \qquad S = 6.408$$

$$t = \frac{27 - 32}{6.408}\sqrt{\frac{7 \times 10}{7 + 10}} = \frac{-5}{6.408} \times 2.029 = -1.583, \qquad |t| = 1.583 \quad \text{[absolute value]}$$

d.f. = n1 + n2 − 2 = 7 + 10 − 2 = 15

For 15 d.f., t0.05 = 1.753. The calculated value of |t| is less than the table value, so the null hypothesis is accepted. Hence the two diets do not differ significantly with regard to their effect on mean increase in weight.
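A minimal sketch of the pooled two-sample t-test from raw data, following the common-variance formula above; `pooled_t` is a hypothetical helper. The text compares |t| with 1.753, which is the one-tailed 5% value for 15 d.f.; the two-tailed value, 2.131 (quoted in Ex.34), leads to the same conclusion.

```python
import math

def pooled_t(x, y):
    """Two-sample t with pooled variance (Σ(x-x̄)² + Σ(y-ȳ)²) / (n1 + n2 - 2)."""
    n1, n2 = len(x), len(y)
    mx, my = sum(x) / n1, sum(y) / n2
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    s = math.sqrt(ss / (n1 + n2 - 2))
    t = (mx - my) / (s * math.sqrt(1 / n1 + 1 / n2))
    return s, t, n1 + n2 - 2

diet_x = [25, 32, 30, 32, 24, 14, 32]
diet_y = [24, 34, 22, 30, 42, 31, 40, 30, 32, 35]
s, t, df = pooled_t(diet_x, diet_y)
print(round(s, 3), round(t, 3), df)
```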

Ex.29 The following data show the cost per square foot of floor area concerning randomly selected 7 schools and 5 office blocks from those completed during the period 1984 to 1989.

Building Type Cost Per Square Foot (Rs.)

Schools

Office blocks
"
37
31 26 27
" 38 37

37 35

Do the data support the hypothesis that the cost per square foot for office blocks was greater than that for schools? Test at 5% level of significance using the t-test.

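Sol. Let us take the null hypothesis that the cost per square foot for office blocks was not greater than that for schools, H0: μ2 ≤ μ1, against H1: μ2 > μ1, where μ1 and μ2 denote the mean costs for schools and office blocks respectively. Applying the t-test of difference of means, the worksheet of deviations from the means gives the following totals:

Schools (X1):        n1 = 7,  ΣX1 = 210,  X̄1 = 30,  Σ(X1 − X̄1)² = 192
Office blocks (X2):  n2 = 5,  ΣX2 = 185,  X̄2 = 37,  Σ(X2 − X̄2)² = 38

$$S = \sqrt{\frac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\frac{192 + 38}{7 + 5 - 2}} = \sqrt{23} = 4.796$$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{30 - 37}{4.796}\sqrt{\frac{7 \times 5}{7 + 5}} = \frac{-7 \times 1.708}{4.796} = -2.49, \qquad |t| = 2.49$$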

Degrees of freedom = n1 + n2 − 2 = 10

For 10 degrees of freedom, the table value of t at 5% level of significance for a one-tailed (right-tail) test is t0.05 = 1.812. The calculated value of |t| is greater than the table value, so the null hypothesis is rejected. The cost per square foot for office blocks was greater than that for schools.
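A minimal sketch that works from the summary totals given in the solution (n, ΣX and Σ(X − X̄)² for each building type); `pooled_t_from_sums` is a hypothetical helper. Because the alternative is that office blocks cost more, the decision uses office − school, i.e., −t.

```python
import math

def pooled_t_from_sums(n1, sum1, ss1, n2, sum2, ss2):
    """Pooled t from sample sizes, totals and sums of squared deviations."""
    m1, m2 = sum1 / n1, sum2 / n2
    s = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    t = (m1 - m2) / (s * math.sqrt(1 / n1 + 1 / n2))
    return s, t, n1 + n2 - 2

s, t, df = pooled_t_from_sums(7, 210, 192, 5, 185, 38)   # schools, then office blocks
reject = -t > 1.812   # right-tailed test on (office - school), 10 d.f., 5%
print(round(s, 3), round(t, 3), df, reject)
```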

Ex.30 From a large population of unemployed youths, a random sample of 25 is selected and an intelligence test given to them. From the test data, it was found that the average I.Q. is 97 with a standard deviation of 12. Are these data consistent with the hypothesis that the unemployed youths were selected from a population of average intelligence, that is, a population with an I.Q. of 100?

Sol. We formulate the null hypothesis that the sample is selected from a population of average intelligence of 100.

H0: μ = 100

The standard error of mean:

$$\text{S.E. of } \bar{X} = \frac{S}{\sqrt{N - 1}} = \frac{12}{\sqrt{24}} = \frac{12}{4.899} = 2.45$$

The critical ratio t is given by

$$t = \frac{\bar{X} - \mu}{\text{S.E. of } \bar{X}}$$

Substituting the values, we get

$$|t| = \frac{|97 - 100|}{2.45} = 1.224$$

The degrees of freedom = N − 1 = 25 − 1 = 24.

For 24 degrees of freedom, the table value of t for a one-tailed test at 5% level of significance is 1.711. The computed value of |t| is less than the table value of t, so it falls in the acceptance region. Hence our null hypothesis is accepted, i.e., the data are consistent with the sample of unemployed youths having been taken from a population of average intelligence with an I.Q. of 100.
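A minimal sketch of the t statistic used when the sample S.D. has been computed with divisor n, so that S.E. = S/√(n − 1); the helper name `t_from_biased_sd` is hypothetical and the table value 1.711 is taken from the text.

```python
import math

def t_from_biased_sd(xbar, mu0, s, n):
    """t = (xbar - mu0) / (s / sqrt(n - 1)), where s uses divisor n."""
    return (xbar - mu0) / (s / math.sqrt(n - 1))

t = t_from_biased_sd(97, 100, 12, 25)
accept = abs(t) < 1.711   # one-tailed 5% table value for 24 d.f.
print(round(t, 3), accept)
```

Ex.32 uses the same formula: `t_from_biased_sd(147, 140, 16, 26)` gives 7/3.2 ≈ 2.19.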

Ex.31 A certain refined edible oil is packed in tins holding 16 kg each. The filling machine can maintain this but with a standard deviation of 0.5 kg. Samples of 25 are taken from the production line. If a sample mean is (i) 16.35 kg, (ii) 15.85 kg, can we be 95% sure that the sample has come from a population of 16 kg tins?

Sol. H0: μ = 16 kg

$$\text{S.E.}_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{0.5}{\sqrt{25}} = 0.1$$

Tabulated value of z for 95% confidence = 1.96, as σ is known.

Limits of the population mean are 16 ± 1.96(0.1), i.e., from 15.804 to 16.196 kg.

With 95% confidence we can say that

(i) If the sample mean is 16.35 kg, then the sample does not belong to the population of 16 kg tins.
(ii) If the sample mean is 15.85 kg, then the sample belongs to the population of 16 kg tins.
Note: Please note the difference in the formula used for the S.E. of mean in Ex.30 and Ex.31. In Ex.30 the S.D. of the sample is given, whereas in Ex.31 the S.D. of the population is given, though both examples deal with small samples.
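A minimal sketch of the acceptance region used above: with σ known, sample means within 16 ± 1.96 × 0.1 kg are treated as consistent with a population of 16 kg tins.

```python
import math

mu0, sigma, n, z = 16.0, 0.5, 25, 1.96
se = sigma / math.sqrt(n)                  # sigma is known, so divisor sqrt(n)
lower, upper = mu0 - z * se, mu0 + z * se  # 95% acceptance region for the sample mean
for xbar in (16.35, 15.85):
    verdict = "consistent with" if lower <= xbar <= upper else "not from"
    print(f"{xbar} kg: {verdict} a population of 16 kg tins")
```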

Ex.32 A soap manufacturing company was distributing a particular brand of soap through a large number of retail shops. Before a heavy advertisement campaign, the mean sales per week per shop was 140 dozens. After the campaign, a sample of 26 shops was taken and the mean sales was found to be 147 dozens with standard deviation 16. What conclusion do you draw on the impact of advertisement on sales? Use 5% significance level.

Sol. We set up the hypotheses

H0: μ = 140, i.e., the campaign is not effective

H1: μ > 140

It is given that S = 16, n = 26, x̄ = 147.

The unbiased estimate of the S.E. of the mean is given by

$$\text{S.E.}_{\bar{X}} = \frac{S}{\sqrt{n - 1}} = \frac{16}{\sqrt{25}} = 3.2, \qquad \text{where } S \text{ is the S.D. of the sample}$$

$$t = \frac{\bar{X} - \mu}{\text{S.E.}_{\bar{X}}} = \frac{147 - 140}{3.2} = \frac{7}{3.2} = 2.19$$

From the table, for 25 degrees of freedom, t0.05 = 1.708 (for a one-tailed test).

Since the computed value of t > t0.05, we reject the null hypothesis, i.e., the advertisement may be considered to have changed the average sales volume, or we can say the campaign had an impact on sales.

Ex.33 Two salesmen A and B are working in a certain district. From a sample survey conducted by the Head Office, the following results were obtained. State whether there is any significant difference in the average sales between the two salesmen:

                          A       B
No. of sales              20      18
Average sales (Rs.)       170     205
Sample S.D. (Rs.)         20      25

Sol. We set the null and alternative hypotheses as follows:

H0: μ1 = μ2, i.e., there is no difference in the average sales.

H1: μ1 ≠ μ2

Unbiased estimate of the common variance:

$$S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \frac{20 \times 400 + 18 \times 625}{36} = \frac{19250}{36} = 534.72, \qquad S = 23.12$$

Note (on Ex.32): A one-tailed test is used there because, under normal circumstances, sales can be expected to increase as a result of the campaign. However, if we use a two-tailed test, then t0.05 = 2.06 for 25 d.f. Even then we conclude that the campaign was effective. But the one-tailed test is more suitable there.

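$$t = \frac{170 - 205}{23.12\sqrt{\dfrac{1}{20} + \dfrac{1}{18}}} = -4.66, \qquad |t| = 4.66$$

Since the calculated value of |t| is much greater than 3 (the table value for 36 d.f. at 5% level, two-tailed, is only about 2.03), the null hypothesis is rejected, and we conclude that the average sales of the two salesmen are significantly different.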
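A minimal sketch of the pooled t-test when the given S.D.s were computed with divisor n, so that S² = (n1s1² + n2s2²)/(n1 + n2 − 2) as in Ex.33; `pooled_t_biased` is a hypothetical helper.

```python
import math

def pooled_t_biased(m1, s1, n1, m2, s2, n2):
    """Pooled t when s1, s2 are sample S.D.s computed with divisor n."""
    s = math.sqrt((n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2))
    t = (m1 - m2) / (s * math.sqrt(1 / n1 + 1 / n2))
    return s, t

s, t = pooled_t_biased(170, 20, 20, 205, 25, 18)
print(round(s, 2), round(t, 2))
```

Ex.36 uses the same pooling: `pooled_t_biased(1134, 35, 8, 1024, 40, 7)` gives t ≈ 5.29.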

Ex.34 Two types of batteries are tested for their length of life, and the following data are obtained.

          No. of samples    Mean life (hours)    Variance
Type A    9                 600                  121
Type B    8                 640                  144

Is there a significant difference in the two means? The value of t for 15 degrees of freedom at 5% level is 2.131.
Sol. The null and alternative hypotheses are:

H0: μ1 = μ2, i.e., the two types of batteries are identical, i.e., statistically there is no difference between their mean lives.

H1: μ1 ≠ μ2, i.e., the two types of batteries are different with regard to their mean life.

An unbiased estimate of the common population S.D. is given by

$$S_p = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}$$

S1², S2² and n1, n2 being respectively the sample variances and the corresponding sample sizes.

Here n1 = 9, S1² = 121, n2 = 8, S2² = 144.

$$S_p = \sqrt{\frac{(9 - 1) \times 121 + (8 - 1) \times 144}{9 + 8 - 2}} = \sqrt{\frac{968 + 1008}{15}} = \sqrt{131.733} = 11.47$$

The standard error of the difference between the two means is given by

$$\text{S.E.} = S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 11.47\sqrt{\frac{1}{9} + \frac{1}{8}} = 11.47\sqrt{0.236} = 11.47 \times 0.486 = 5.57$$

Note: In some books the formula for the common variance is (n1S1² + n2S2²)/(n1 + n2 − 2). In the above formula an unbiased estimate of S.D. has been assumed.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\text{S.E.}}$$

where x̄1 and x̄2 are respectively the means of the first and the second sample.

$$t = \frac{600 - 640}{5.57} = \frac{-40}{5.57} = -7.18, \qquad |t| = 7.18$$

Degrees of freedom = 9 + 8 − 2 = 15.

Table value of t for 15 d.f. at 5% level of significance (two-tailed test) = 2.131

Since the computed value of |t| is more than the table value, the difference between the means is significant.
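A minimal sketch of the pooled t-test when the given variances are unbiased (divisor n − 1), as in Ex.34; `pooled_t_unbiased` is a hypothetical helper. The text reports −7.18 after rounding S_p and the S.E.; unrounded arithmetic gives about −7.17.

```python
import math

def pooled_t_unbiased(m1, var1, n1, m2, var2, n2):
    """Pooled t when var1, var2 are unbiased sample variances."""
    sp = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    return sp, se, (m1 - m2) / se

sp, se, t = pooled_t_unbiased(600, 121, 9, 640, 144, 8)
print(round(sp, 2), round(se, 2), round(t, 2))
```

Ex.38 follows the same route: `pooled_t_unbiased(550, 100, 5, 500, 81, 7)` gives t ≈ 9.07.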

Ex.35 Ten objects are chosen at random from a large population, and their weights are found to be (in gms) 63, 63, 64, 65, 66, 69, 69, 70, 70, 71. In the light of the above data, discuss the suggestion that the mean weight in the universe is 65 gms.

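Sol. We have the null hypothesis H0: μ = 65 gms.

Weight (x)    d = x − 66    d²
63            −3            9
63            −3            9
64            −2            4
65            −1            1
66            0             0
69            3             9
69            3             9
70            4             16
70            4             16
71            5             25
              Σd = 10       Σd² = 98

$$\text{Mean} = \bar{x} = 66 + \frac{\sum d}{n} = 66 + \frac{10}{10} = 67$$

$$\text{Sample S.D.} = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = \sqrt{\frac{98}{10} - \left(\frac{10}{10}\right)^2} = \sqrt{8.8} = 2.966$$

$$\text{Unbiased estimate of S.E. of mean} = \frac{2.966}{\sqrt{10 - 1}} = \frac{2.966}{3} = 0.989$$

$$t = \frac{\bar{x} - \mu}{\text{S.E.}} = \frac{67 - 65}{0.989} = 2.02$$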

The tabulated value of t for 9 degrees of freedom at 1% level of significance (two tails) is 3.25. Since the calculated value is less than the tabulated value, we accept our null hypothesis, and the mean weight in the universe is likely to be 65 gms.
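A minimal sketch of the same one-sample t-test computed directly from the raw weights with Python's `statistics` module. `pstdev` uses divisor n, matching the text's sample S.D. of √8.8, and the S.E. then divides by √(n − 1); the table value 3.25 is quoted from the text.

```python
import math
from statistics import mean, pstdev

weights = [63, 63, 64, 65, 66, 69, 69, 70, 70, 71]
n = len(weights)
xbar = mean(weights)
s = pstdev(weights)            # divisor n, as in the text
se = s / math.sqrt(n - 1)      # unbiased estimate of the S.E. of the mean
t = (xbar - 65) / se
accept = abs(t) < 3.25         # two-tailed 1% table value for 9 d.f.
print(xbar, round(s, 3), round(se, 3), round(t, 2), accept)
```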

Ex.36 Samples of two types of electric bulbs were tested for length of life, and the following data were obtained:

                                      Type I    Type II
Number in sample                      8         7
Mean of sample (in hours)             1,134     1,024
Standard deviation of sample (hours)  35        40

Test at 5% level whether the difference in the sample means is significant. (Table values of t for 13 degrees of freedom = 2.16, for 14 degrees of freedom = 2.15 and for 15 degrees of freedom = 2.13 at 5% level for two tail areas, and 1.77, 1.76 and 1.75 respectively for one tail area.)

Sol. Let μ1 and μ2 be the means of the length of life of the two types of bulbs, namely Type I and Type II; then the null and alternative hypotheses are

H0: μ1 = μ2;  H1: μ1 ≠ μ2

We will test the significance of the difference in sample means by the t-test, as in Type I and Type II the numbers of items in the samples are only 8 and 7 respectively. An unbiased estimate of the common population standard deviation is given by

$$S_p = \sqrt{\frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2 - 2}}$$

Since it is given that S1 = 35 and S2 = 40, n1 = 8 and n2 = 7, we get

$$S_p = \sqrt{\frac{8 \times 35^2 + 7 \times 40^2}{8 + 7 - 2}} = \sqrt{\frac{9800 + 11200}{13}} = \sqrt{1615.4} = 40.1$$

Hence the standard error of the difference between the means is given by

$$\text{S.E.} = S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 40.1\sqrt{\frac{1}{8} + \frac{1}{7}} = 20.8$$

The test statistic t can be computed by the formula

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\text{S.E.}}$$

It is given that x̄1 = 1,134 and x̄2 = 1,024. Hence

$$t = \frac{1134 - 1024}{20.8} = 5.288$$

Since the computed value of t is more than the table value t0.05 (= 2.16) for 13 degrees of freedom, the difference is significant. Hence the null hypothesis is rejected, and therefore the two types of electric bulbs differ significantly in their mean lives.

Ex.37 Two kinds of manure were applied to sixteen one-acre plots, other conditions remaining the same. The yields in quintals are given below:

Manure I:   18  20  36  50  49  36  34  49  41
Manure II:  29  28  26  35  30  44  46

Is there any significant difference between the mean yields? Use 5% significance level.

Sol. We have the null hypothesis H0 that the mean yields of the two kinds of manure do not differ significantly.

Let the yields with manure I be denoted by X1 and those with manure II by X2.

X1    X1 − 37   (X1 − 37)²     X2    X2 − 34   (X2 − 34)²
18    −19       361            29    −5        25
20    −17       289            28    −6        36
36    −1        1              26    −8        64
50    +13       169            35    +1        1
49    +12       144            30    −4        16
36    −1        1              44    +10       100
34    −3        9              46    +12       144
49    +12       144
41    +4        16
333             1134           238             386

X̄1 = 333/9 = 37, X̄2 = 238/7 = 34

$$\text{Sample variance } S_1^2 = \frac{\sum (X_1 - \bar{X}_1)^2}{n_1} = \frac{1134}{9} = 126$$

$$\text{Sample variance } S_2^2 = \frac{\sum (X_2 - \bar{X}_2)^2}{n_2} = \frac{386}{7} = 55.14$$

An unbiased estimate of the common population S.D. is given by

$$S_p = \sqrt{\frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{9 \times 126 + 7 \times 55.14}{9 + 7 - 2}} = \sqrt{\frac{1520}{14}} = 10.42$$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{37 - 34}{10.42\sqrt{\dfrac{1}{9} + \dfrac{1}{7}}} = 0.57$$

The tabulated value of t0.05 for 14 d.f. is 2.14.

Since the calculated value of t is much less than the tabulated value at 5% level of significance, we accept our null hypothesis that there is no significant difference between the mean yields of the two kinds of manure.
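A minimal sketch confirming that the two pooling formulas used in this chapter agree: n·S² with divisor-n variances (the route taken in Ex.37) equals (n − 1)·S² with unbiased variances (the route taken in Ex.38). It relies on `statistics.pvariance` (divisor n) and `statistics.variance` (divisor n − 1).

```python
import math
from statistics import mean, pvariance, variance

manure_1 = [18, 20, 36, 50, 49, 36, 34, 49, 41]
manure_2 = [29, 28, 26, 35, 30, 44, 46]
n1, n2 = len(manure_1), len(manure_2)

# Text's route: variances with divisor n, pooled as (n1*S1² + n2*S2²) / (n1 + n2 - 2)
sp_text = math.sqrt((n1 * pvariance(manure_1) + n2 * pvariance(manure_2)) / (n1 + n2 - 2))
# Equivalent route: unbiased variances, pooled as ((n1-1)*s1² + (n2-1)*s2²) / (n1 + n2 - 2)
sp_alt = math.sqrt(((n1 - 1) * variance(manure_1) + (n2 - 1) * variance(manure_2)) / (n1 + n2 - 2))

t = (mean(manure_1) - mean(manure_2)) / (sp_text * math.sqrt(1 / n1 + 1 / n2))
print(round(sp_text, 2), round(sp_alt, 2), round(t, 3))
```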

Ex.38 The following data pertain to two types of tube bulbs tested for their length of life:

Type       Sample size    Mean life (hours)    Variance of life
Type I     5              550                  100
Type II    7              500                  81

Test whether there is a significant difference between the two means at 5% level.

Sol. The null and alternative hypotheses are H0: μ1 = μ2 and H1: μ1 ≠ μ2.

An unbiased estimate of the common population standard deviation is given by (assuming the given variances to be unbiased)

$$S_p = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{400 + 486}{5 + 7 - 2}} = 9.413$$

The standard error of the difference between the two means is given by

$$\text{S.E.} = S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 9.413 \times 0.586 = 5.516$$

$$\text{Computed } t = \frac{550 - 500}{5.516} = 9.065$$

The table value of t at 5% level for 10 d.f. is 2.228. Since the calculated value is much greater than the table value, the difference is significant, and hence we reject the null hypothesis. Hence the two means differ significantly.

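Note the difference between Ex.37 and Ex.38. In Ex.37 the formula used for the common population S.D. is

$$S_p = \sqrt{\frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2 - 2}}$$

and in Ex.38 the formula used is

$$S_p = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}$$

In fact the formula for the unbiased estimate of the common population variance of two series X and Y is

$$S^2 = \frac{\sum (X - \bar{X})^2 + \sum (Y - \bar{Y})^2}{n_1 + n_2 - 2}$$

Both formulas reduce to this one: the first when S1², S2² are computed with divisor n, the second when they are computed with divisor n − 1, which is the unbiased estimate of variance.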
Ex.39 Three samples of five, four and five motor car tyres are drawn respectively from three brands A, B and C manufactured by three machines. The lifetimes of these tyres (in '000 miles) are given below. Test whether the average lifetimes of the three brands of tyres are equal or not.

A     B     C
45    41    44
42    40    42
43    42    38
44    43    43
42          39

Sol.

Let the null hypothesis be H0: the average lifetimes of the three brands of tyres are equal.

Let us subtract 40 from each of the given values. Then the coded data are given below.

TABLE

Calculations for Analysis of Variance

Sample I          Sample II         Sample III
X1     X1²        X2     X2²        X3     X3²
5      25         1      1          4      16
2      4          0      0          2      4
3      9          2      4          −2     4
4      16         3      9          3      9
2      4          -      -          −1     1
16     58         6      14         6      34
ΣX1    ΣX1²       ΣX2    ΣX2²       ΣX3    ΣX3²

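T = Sum of the values in the three samples = 16 + 6 + 6 = 28

$$\text{Correction Factor} = \frac{T^2}{N} = \frac{(28)^2}{14} = 56$$

$$\text{SST} = \text{Total sum of the squares} = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 - \frac{T^2}{N} = 58 + 14 + 34 - 56 = 50$$

SSB = Sum of the squares between the samples

$$\text{SSB} = \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} - \frac{T^2}{N} = 51.2 + 9 + 7.2 - 56 = 11.4; \qquad \nu_1 = \text{d.f.} = 3 - 1 = 2$$

SSW = Sum of the squares within the samples

$$\text{SSW} = \text{SST} - \text{SSB} = 50 - 11.4 = 38.6; \qquad \nu_2 = \text{d.f.} = N - k = 14 - 3 = 11$$

$$\therefore \text{MSB} = \frac{\text{SSB}}{\nu_1} = \frac{11.4}{2} = 5.7 \quad \text{and} \quad \text{MSW} = \frac{\text{SSW}}{\nu_2} = \frac{38.6}{11} = 3.51$$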
", 2 'h II
TABLE

Analysis of Variance Table (or ANOVA Table)

Source of variation    Sum of squares (SS)    Degrees of freedom (d.f.)    Mean squares (MS)    Test statistic
Between samples        11.4                   2                            5.7                  F = 5.7/3.51 = 1.624
Within samples         38.6                   11                           3.51
Total                  50                     13

The tabulated value of F for ν1 = 2 and ν2 = 11 at 5% level is 3.98. We see that the calculated value of F, i.e., 1.624, is less than the tabulated value 3.98 at 5% level. Hence we accept the null hypothesis H0 and conclude that the average lifetimes of the three brands of tyres are equal.
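A minimal sketch of one-way ANOVA using the correction-factor shortcut from the text; `one_way_anova` is a hypothetical helper. Coding the data (subtracting 40) does not change any sum of squares, which the last line checks by running the uncoded data too.

```python
def one_way_anova(groups):
    """One-way ANOVA via the correction factor T²/N.
    Returns (SST, SSB, SSW, F, (df_between, df_within))."""
    values = [v for g in groups for v in g]
    N, k = len(values), len(groups)
    cf = sum(values) ** 2 / N                              # correction factor
    sst = sum(v * v for v in values) - cf
    ssb = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    ssw = sst - ssb
    F = (ssb / (k - 1)) / (ssw / (N - k))
    return sst, ssb, ssw, F, (k - 1, N - k)

tyres = [[45, 42, 43, 44, 42], [41, 40, 42, 43], [44, 42, 38, 43, 39]]
coded = [[v - 40 for v in g] for g in tyres]               # subtract 40, as in the text
sst, ssb, ssw, F, df = one_way_anova(coded)
print(round(sst, 2), round(ssb, 2), round(ssw, 2), round(F, 3), df)
print(round(one_way_anova(tyres)[3], 3))                   # same F without coding
```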

Ex.40 The Amrit Merchandising Co. wishes to test whether its three salesmen A, B and C tend to make sales of the same size or whether they differ in their selling ability as measured by the average size of their sales. During the last week there have been 14 sales calls: A made 5 calls, B made 4 calls and C made 5 calls. Following are the weekly sales records (in Rs.) of the three salesmen:

A: 300 400 300 500 0

B: 600 300 300 400

C: 700 300 400 600 500

Perform the analysis and draw your conclusion.

Sol.

Let the null hypothesis be H0: the three salesmen tend to make sales of the same size.

Let us divide each observation by the common factor 100. Then the coded data and their squares are given in the following table:

TABLE

Calculations for Analysis of Variance

Sample I          Sample II         Sample III
X1     X1²        X2     X2²        X3     X3²
3      9          6      36         7      49
4      16         3      9          3      9
3      9          3      9          4      16
5      25         4      16         6      36
0      0          -      -          5      25
15     59         16     70         25     135
ΣX1    ΣX1²       ΣX2    ΣX2²       ΣX3    ΣX3²


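T = Sum of all the observations in the samples = 15 + 16 + 25 = 56

$$\text{Correction Factor} = \frac{T^2}{N} = \frac{(56)^2}{14} = 224$$

$$\text{SST} = \text{Total sum of the squares} = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 - \frac{T^2}{N} = 59 + 70 + 135 - 224 = 40$$

SSB = Sum of the squares between the samples

$$\text{SSB} = \frac{(15)^2}{5} + \frac{(16)^2}{4} + \frac{(25)^2}{5} - 224 = 45 + 64 + 125 - 224 = 10; \qquad \nu_1 = \text{d.f.} = 3 - 1 = 2$$

SSW = Sum of the squares within the samples

$$\text{SSW} = \text{SST} - \text{SSB} = 40 - 10 = 30; \qquad \nu_2 = \text{d.f.} = N - k = 14 - 3 = 11$$

$$\therefore \text{MSB} = \frac{\text{SSB}}{\nu_1} = \frac{10}{2} = 5 \quad \text{and} \quad \text{MSW} = \frac{\text{SSW}}{\nu_2} = \frac{30}{11} = 2.73$$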
TABLE

Analysis of Variance Table (or ANOVA Table)

Source of variation    Sum of squares (SS)    Degrees of freedom (d.f.)    Mean squares (MS)    Test statistic
Between samples        10                     2                            5                    F = 5/2.73 = 1.83
Within samples         30                     11                           2.73
Total                  40                     13

The tabulated value of F for ν1 = 2 and ν2 = 11 at 5% level is 3.98. We see that the calculated value of F, i.e., 1.83, is less than the tabulated value 3.98 at 5% level. Hence we accept the null hypothesis H0 and conclude that the three salesmen tend to make sales of the same size.
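As a cross-check of the shortcut formulas, here is a minimal sketch that computes the same sums of squares from their definitions: SSB from the group means' deviations around the grand mean, SSW from each value's deviation around its own group mean. Variable names are illustrative.

```python
from statistics import mean

sales = {
    "A": [300, 400, 300, 500, 0],
    "B": [600, 300, 300, 400],
    "C": [700, 300, 400, 600, 500],
}
groups = [[v / 100 for v in g] for g in sales.values()]   # coded: divide by 100
grand_mean = mean(v for g in groups for v in g)
# Between samples: size-weighted squared deviations of group means from the grand mean
ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Within samples: squared deviations of each value from its own group mean
ssw = sum((v - mean(g)) ** 2 for g in groups for v in g)
N, k = sum(len(g) for g in groups), len(groups)
F = (ssb / (k - 1)) / (ssw / (N - k))
print(ssb, ssw, round(F, 2))
```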

Ex.41. An experimenter wished to study the effect of four fertilizers on the yield of a crop. He divided the field into 24 plots and assigned each fertilizer at random to 6 plots. Part of his calculations are shown below:

Source          d.f.    SS      MS      F       F (tabulated, 5%)
Fertilizers     ..      2940    ..      ..      3.10
Within Group    ..      ..      ..
Total           ..      6212

(a) Complete the above table by filling in the values marked by dots (..).
(b) Test at the 5% level whether the fertilizers differ significantly.

Sol.

(a) Here N = Total no. of observations = 24, k = No. of samples = 4.

∴ Total d.f. = N - 1 = 24 - 1 = 23; d.f. for fertilizers (i.e., d.f. for between the groups) = k - 1 = 4 - 1 = 3;

∴ d.f. for within group = N - k = 24 - 4 = 20

SSB = Sum of the squares between the fertilizers = 2940

∴ SSW = Sum of the squares within the group = SST - SSB = 6212 - 2940 = 3272.

$$MSB = \text{Mean square between fertilizers} = \frac{SSB}{\text{d.f.}} = \frac{2940}{3} = 980$$

$$MSW = \text{Mean square within the group} = \frac{SSW}{\text{d.f.}} = \frac{3272}{20} = 163.6$$

$$\therefore F = \frac{MSB}{MSW} = \frac{980}{163.6} = 5.99$$

The required completed table is given below.

TABLE

Analysis of Variance Table

Source          d.f.    SS      MS      F       F (tabulated, 5%)
Fertilizers     3       2940    980     5.99    3.10
Within Group    20      3272    163.6
Total           23      6212

(b) We see that the calculated value of F, i.e., 5.99, is greater than the tabulated value 3.10 of F at the 5% level with d.f. ν1 = 3 and ν2 = 20. Hence we conclude that the fertilizers differ significantly.
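Completing a partial ANOVA table is pure bookkeeping on degrees of freedom and sums of squares, which the following sketch makes explicit. The function name `complete_anova` is hypothetical, and 3.10 is the tabulated F quoted in the example rather than a computed quantile.

```python
def complete_anova(n_total, k, ss_between, ss_total):
    """Fill in the missing entries of a one-way ANOVA table."""
    df_between, df_within, df_total = k - 1, n_total - k, n_total - 1
    ss_within = ss_total - ss_between
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df": (df_between, df_within, df_total),
        "ss_within": ss_within,
        "ms": (ms_between, ms_within),
        "F": ms_between / ms_within,
    }


table = complete_anova(n_total=24, k=4, ss_between=2940, ss_total=6212)
F_TAB_5PCT = 3.10  # tabulated F for (3, 20) d.f., quoted in the table
print(table)
print("significant" if table["F"] > F_TAB_5PCT else "not significant")
```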

Ex.42. A company appoints four salesmen A, B, C and D and observes their sales in three seasons: summer, winter and monsoon. The figures (in lakhs) are given in the following table:

                     Salesman
Season       A      B      C      D      Total
Summer       36     36     21     35     128
Winter       28     29     31     32     120
Monsoon      26     28     29     29     112
Total        90     93     81     96     360

Carry out an analysis of variance.

Sol.

Let the null hypothesis be H0: there is no significant difference between salesmen or between seasons.

The given data are first coded by subtracting 30 from each observation and then classified according to two factors: (i) salesmen and (ii) seasons, in the following table.


TABLE

Calculations for Analysis of Variance

Season \ Salesman    A      B      C      D      Season Total
Summer               6      6      -9     5      8
Winter               -2     -1     1      2      0
Monsoon              -4     -2     -1     -1     -8
Salesman Total       0      3      -9     6      0 = T (Grand Total)

$$\text{Correction Factor} = \frac{T^2}{N} = \frac{(0)^2}{12} = 0$$

SSC = Sum of squares between salesmen

$$SSC = \frac{(0)^2}{3} + \frac{(3)^2}{3} + \frac{(-9)^2}{3} + \frac{(6)^2}{3} - \frac{T^2}{N} = 0 + 3 + 27 + 12 - 0 = 42$$

and d.f. = c - 1 = 4 - 1 = 3.

SSR = Sum of squares between seasons

$$SSR = \frac{(8)^2}{4} + \frac{(0)^2}{4} + \frac{(-8)^2}{4} - \frac{T^2}{N} = 16 + 0 + 16 - 0 = 32$$

and d.f. = r - 1 = 3 - 1 = 2.

SST = Total sum of squares

$$SST = \sum X^2 - \frac{T^2}{N} = (36 + 4 + 16 + 36 + 1 + 4 + 81 + 1 + 1 + 25 + 4 + 1) - 0 = 210$$

and d.f. = N - 1 = 12 - 1 = 11.

∴ SSE = SST - (SSC + SSR) = 210 - (42 + 32) = 136,

and d.f. = 11 - (3 + 2) = 6.

The analysis of variance table is given below.

TABLE

Analysis of Variance Table

Source of            Sum of     Degrees of    Mean        Test Statistic
Variation            Squares    Freedom       Squares
                     (SS)       (d.f.)        (MS)
Between salesmen     42         3             14          F = 22.67/14 = 1.62
Between seasons      32         2             16          F = 22.67/16 = 1.42
Residual             136        6             22.67
Total                210        11
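Since the residual mean square (22.67) exceeds both the salesmen and the seasons mean squares, each F ratio is formed with the residual mean square in the numerator, so the numerator has 6 degrees of freedom.

The table value of F for ν1 = 6 and ν2 = 3 degrees of freedom at the 5% level is 8.94. Since the calculated value 1.62 is less than the tabulated value at the 5% level, we conclude that there is no significant difference between the salesmen.

Again, the table value of F for ν1 = 6 and ν2 = 2 degrees of freedom at the 5% level is 19.33. Since the calculated value of F (1.42) is less than the tabulated value at 5%, we conclude that there is no significant difference between the seasons.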
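A minimal sketch of the two-way (one observation per cell) analysis above, working on the uncoded figures: subtracting 30 from every value leaves all sums of squares unchanged, so the results should match the coded calculation. The helper names are hypothetical; following the worked solution, each F ratio puts the larger mean square in the numerator.

```python
def two_way_anova(table):
    """Two-way ANOVA with one observation per cell; table[i][j] = row i, column j."""
    r, c = len(table), len(table[0])
    values = [x for row in table for x in row]
    cf = sum(values) ** 2 / (r * c)                        # correction factor
    sst = sum(x * x for x in values) - cf
    ss_rows = sum(sum(row) ** 2 for row in table) / c - cf
    ss_cols = sum(sum(col) ** 2 for col in zip(*table)) / r - cf
    sse = sst - ss_rows - ss_cols                          # residual
    ms = {
        "rows": ss_rows / (r - 1),
        "cols": ss_cols / (c - 1),
        "error": sse / ((r - 1) * (c - 1)),
    }
    return sst, ss_rows, ss_cols, sse, ms


def f_ratio(ms_a, ms_b):
    """Larger mean square over the smaller, as in the worked solution."""
    return max(ms_a, ms_b) / min(ms_a, ms_b)


# Rows: summer, winter, monsoon; columns: salesmen A, B, C, D (in lakhs)
sales = [[36, 36, 21, 35], [28, 29, 31, 32], [26, 28, 29, 29]]
sst, ss_seasons, ss_salesmen, sse, ms = two_way_anova(sales)
f_salesmen = f_ratio(ms["cols"], ms["error"])   # compare with 8.94 for (6, 3) d.f.
f_seasons = f_ratio(ms["rows"], ms["error"])    # compare with 19.33 for (6, 2) d.f.
print(round(sst), round(ss_seasons), round(ss_salesmen), round(sse))
print(round(f_salesmen, 2), round(f_seasons, 2))
```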
EX.43. Apply the technique of analysis of variance to the following data showing the yield of 3 varieties of a crop, each from 4 blocks, and test whether the mean yields of the varieties are equal or not. Also test the equality of the block means.

                    Blocks
Varieties    I      II     III    IV
A            4      8      6      8
B            5      5      7      8
C            6      7      9      5

Given: $F_{0.05}$ = 5.143, $F_{0.01}$ = 10.925 for d.f. (2, 6); $F_{0.05}$ = 19.33 for d.f. (6, 2); $F_{0.05}$ = 4.757, $F_{0.01}$ = 9.779 for d.f. (3, 6).

Sol. Let the null hypothesis be H0: the mean yields of the varieties are equal, and the block means are equal.

The given data are first coded by subtracting 5 from each observation and then classified according to two factors: (i) blocks and (ii) varieties.

TABLE

Two-way classification of coded data

Varieties \ Blocks    I      II     III    IV     Variety Total
A                     -1     3      1      3      6
B                     0      0      2      3      5
C                     1      2      4      0      7
Block Total           0      5      7      6      18 = T

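$$\text{Correction Factor} = \frac{T^2}{N} = \frac{(18)^2}{12} = 27$$

SSC = Sum of squares between blocks

$$SSC = \frac{(0)^2}{3} + \frac{(5)^2}{3} + \frac{(7)^2}{3} + \frac{(6)^2}{3} - \frac{T^2}{N} = 36.67 - 27 = 9.67$$

and d.f. = c - 1 = 4 - 1 = 3.

SSR = Sum of squares between varieties

$$SSR = \frac{(6)^2}{4} + \frac{(5)^2}{4} + \frac{(7)^2}{4} - \frac{T^2}{N} = 27.5 - 27 = 0.5$$

and d.f. = r - 1 = 3 - 1 = 2.

SST = Total sum of squares

$$SST = \sum X^2 - \frac{T^2}{N} = 54 - 27 = 27$$

and d.f. = N - 1 = 12 - 1 = 11.

∴ SSE = SST - (SSC + SSR) = 27 - (9.67 + 0.5) = 16.83,

and d.f. = 11 - (3 + 2) = 6.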

TABLE

Analysis of Variance Table

Source of             Sum of     Degrees of    Mean        Test Statistic
Variation             Squares    Freedom       Squares
                      (SS)       (d.f.)        (MS)
Between blocks        9.67       3             3.22        F = 3.22/2.81 = 1.15
Between varieties     0.5        2             0.25        F = 2.81/0.25 = 11.24
Residual              16.83      6             2.81
Total                 27         11
Since the calculated value of F for blocks (viz. 1.15) is less than the tabulated value 4.757 at the 5% level for (3, 6) d.f., we conclude that the block means are equal.

Again, since the calculated value of F for varieties (viz. 11.24) is less than the tabulated value 19.33 at the 5% level for (6, 2) d.f., we conclude that the mean yields of the varieties are also equal.
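The sketch below checks two points made in this example: coding by subtracting 5 leaves every sum of squares unchanged, and the 3-d.f. component belongs to the 4 blocks while the 2-d.f. component belongs to the 3 varieties. The helper name `sums_of_squares` is hypothetical. Without intermediate rounding the varieties ratio is about 11.22 rather than 11.24; the conclusion is the same.

```python
def sums_of_squares(table):
    """Return (SST, SS rows, SS columns, SSE) for a two-way table, one value per cell."""
    r, c = len(table), len(table[0])
    values = [x for row in table for x in row]
    cf = sum(values) ** 2 / (r * c)
    sst = sum(x * x for x in values) - cf
    ss_rows = sum(sum(row) ** 2 for row in table) / c - cf
    ss_cols = sum(sum(col) ** 2 for col in zip(*table)) / r - cf
    return sst, ss_rows, ss_cols, sst - ss_rows - ss_cols


yields = [[4, 8, 6, 8], [5, 5, 7, 8], [6, 7, 9, 5]]     # rows: varieties A, B, C
coded = [[x - 5 for x in row] for row in yields]        # subtract 5, as in the text
raw_ss, coded_ss = sums_of_squares(yields), sums_of_squares(coded)
sst, ss_varieties, ss_blocks, sse = coded_ss
ms_blocks, ms_varieties, ms_error = ss_blocks / 3, ss_varieties / 2, sse / 6
f_blocks = ms_blocks / ms_error            # compare with 4.757 for (3, 6) d.f.
f_varieties = ms_error / ms_varieties      # compare with 19.33 for (6, 2) d.f.
print([round(v, 2) for v in raw_ss], [round(v, 2) for v in coded_ss])
print(round(f_blocks, 2), round(f_varieties, 2))
```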

Ex.44. An IQ test was administered to 5 persons before and after they were trained. The results are given below:

Candidates               I      II     III    IV     V
IQ before training       110    120    123    132    125
IQ after training        120    118    125    136    121

Test whether there is any change in IQ after the training programme.

($t_{0.01}(4) = 4.6$)

Sol. Let us apply the paired t-test.

Let the null hypothesis be H0: μx = μy, i.e., there is no significant effect of the training.

The alternative hypothesis is H1: μx ≠ μy, i.e., the training changes the IQ.

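Candidates   IQ before       IQ after        d = y - x    d²
             training (x)    training (y)
I            110             120             10           100
II           120             118             -2           4
III          123             125             2            4
IV           132             136             4            16
V            125             121             -4           16
Total                                        Σd = 10      Σd² = 140

$$\bar{d} = \frac{\sum d}{n} = \frac{10}{5} = 2$$

$$t = \frac{\sum d\,\sqrt{n-1}}{\sqrt{n\sum d^2 - (\sum d)^2}} = \frac{10\sqrt{4}}{\sqrt{5 \times 140 - 100}} = \frac{20}{\sqrt{600}} = 0.82$$

and d.f. = n - 1 = 5 - 1 = 4.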

Thus t = 0.82 < 4.6, the tabulated value at the 1% level with 4 d.f. (two-tailed).

Since the calculated value of t is less than the tabulated value with 4 d.f. at the 1% level, we accept H0 at the 1% level and conclude that there is no significant change in IQ after the training programme.
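The paired t statistic used above can be sketched in a few lines of Python. This assumes the pairs are given in matching order; `paired_t` is a hypothetical helper, and 4.6 is the two-tailed 1% value quoted in the example.

```python
import math


def paired_t(before, after):
    """Paired t statistic for d = after - before, with n - 1 d.f."""
    d = [y - x for x, y in zip(before, after)]
    n = len(d)
    sum_d, sum_d2 = sum(d), sum(v * v for v in d)
    t = sum_d * math.sqrt(n - 1) / math.sqrt(n * sum_d2 - sum_d ** 2)
    return t, n - 1


t, df = paired_t([110, 120, 123, 132, 125], [120, 118, 125, 136, 121])
T_TAB = 4.6  # two-tailed 1% value for 4 d.f., quoted in the example
print(round(t, 2), df, "reject H0" if abs(t) > T_TAB else "accept H0")
```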

EX.45. A certain stimulus administered to each of 12 patients resulted in the following increases in blood pressure:

5, 2, 8, -1, 3, 0, -2, 1, 5, 0, 4 and 6

Can it be concluded that the stimulus will, in general, be accompanied by an increase in blood pressure? (Given: for 11 d.f., $t_{0.01}$ = 2.7)

Sol. Here d = y - x = 5, 2, 8, -1, 3, 0, -2, 1, 5, 0, 4, 6.

Let the null hypothesis be H0: μx = μy, i.e., there is no significant difference in blood pressure before and after administering the stimulus, i.e., the stimulus is not effective.

The alternative hypothesis is H1: μy > μx, i.e., the stimulus increases the blood pressure.

Now d = 5, 2, 8, -1, 3, 0, -2, 1, 5, 0, 4, 6; Σd = 31

Now d² = 25, 4, 64, 1, 9, 0, 4, 1, 25, 0, 16, 36; Σd² = 185.

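$$\bar{d} = \frac{\sum d}{n} = \frac{31}{12} = 2.58$$

$$t = \frac{\sum d\,\sqrt{n-1}}{\sqrt{n\sum d^2 - (\sum d)^2}} = \frac{31\sqrt{11}}{\sqrt{12 \times 185 - (31)^2}} = \frac{102.82}{35.48} = 2.90$$

and d.f. = n - 1 = 12 - 1 = 11.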

$t_{0.01}$ for 11 d.f. = 2.7 (one-tailed test).

Since the calculated value of t (2.90) is greater than the tabulated value with 11 d.f. at the 1% level, we reject the null hypothesis H0 and conclude that the stimulus will, in general, be accompanied by an increase in blood pressure.
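When only the differences are recorded, as here, the same t can be obtained from the unbiased standard deviation of d. A minimal sketch follows; the helper name is hypothetical, and 2.7 is the one-tailed 1% value quoted in the example.

```python
import math


def t_from_differences(d):
    """t = mean(d) / (s / sqrt(n)), with s the unbiased S.D. of d."""
    n = len(d)
    mean = sum(d) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in d) / (n - 1))
    return mean / (s / math.sqrt(n)), n - 1


d = [5, 2, 8, -1, 3, 0, -2, 1, 5, 0, 4, 6]
t, df = t_from_differences(d)
print(round(t, 2), df, "reject H0" if t > 2.7 else "accept H0")  # one-tailed, 1%
```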

Ex.46. The sales data of an item in six shops before and after a special promotional campaign are as under:

Shops              A      B      C      D      E      F
Before campaign    53     28     31     48     50     42
After campaign     58     29     30     55     56     45

Can the campaign be judged to be a success? Test at the 5% level of significance.

Sol. The campaign will be a success if there is a significant increase in the average sales after the campaign. In this case we have to consider the significance on one side only, i.e., an increase in sales.

                 SALES
Shops    Before campaign (x1)   After campaign (x2)   d = x1 - x2   d²
A        53                     58                    -5            25
B        28                     29                    -1            1
C        31                     30                    1             1
D        48                     55                    -7            49
E        50                     56                    -6            36
F        42                     45                    -3            9
n = 6                                                 Σd = -21      Σd² = 121

Null hypothesis H0: μ1 = μ2, i.e., there is no difference in the average sales before and after the campaign.

Alternative hypothesis H1: μ1 < μ2, i.e., average sales have improved after the campaign.

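$$\bar{d} = \frac{\sum d}{n} = \frac{-21}{6} = -3.5$$

$$\sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = \sqrt{\frac{121}{6} - (3.5)^2} = 2.8136$$

$$t = \frac{|\bar{d}|}{\sigma/\sqrt{n-1}} = \frac{3.5}{2.8136/\sqrt{5}} = 2.78$$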

The tabulated value of t for 5 d.f. at the 5% level of significance (one-tailed test) is 2.015. The computed value of t = 2.78 being more than the table value, we reject H0 and conclude that the sales campaign has been a success.
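The solution uses the standard deviation with divisor n together with √(n - 1) in the denominator. The sketch below, a plain numerical check and not part of the original method, confirms that this gives exactly the same t as the more common unbiased s (divisor n - 1) with √n.

```python
import math

before = [53, 28, 31, 48, 50, 42]
after = [58, 29, 30, 55, 56, 45]
d = [x1 - x2 for x1, x2 in zip(before, after)]
n = len(d)
mean = sum(d) / n
sigma = math.sqrt(sum(v * v for v in d) / n - mean ** 2)   # divisor n, as in the text
s = math.sqrt(sum((v - mean) ** 2 for v in d) / (n - 1))  # unbiased, divisor n - 1
t_sigma = abs(mean) / (sigma / math.sqrt(n - 1))
t_s = abs(mean) / (s / math.sqrt(n))
print(round(sigma, 4), round(t_sigma, 2), round(t_s, 2), t_sigma > 2.015)
```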

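Important Note:

Some authors use the letter s for the standard deviation and calculate the unbiased estimate of the S.D. by using the formula

$$s = \sqrt{\frac{\sum (d - \bar{d})^2}{n-1}} \quad \text{and then} \quad S.E. = \frac{s}{\sqrt{n}}$$

But the general formula for the S.E. for the paired t-test remains the same, i.e.,

$$S.E. = \frac{\sqrt{n\sum d^2 - (\sum d)^2}}{n\sqrt{n-1}} = \sqrt{\frac{\sum d^2 - n\bar{d}^2}{n(n-1)}}$$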
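The equality of the S.E. expressions in the note follows from the identity $\sum (d-\bar{d})^2 = \sum d^2 - n\bar{d}^2$ and $\bar{d} = \sum d / n$; a short derivation:

```latex
\frac{s^2}{n}
  = \frac{\sum (d - \bar{d})^2}{n(n-1)}
  = \frac{\sum d^2 - n\bar{d}^2}{n(n-1)}
  = \frac{\sum d^2 - (\sum d)^2/n}{n(n-1)}
  = \frac{n\sum d^2 - (\sum d)^2}{n^2(n-1)}
```

Taking square roots of the first and last expressions gives $S.E. = s/\sqrt{n} = \sqrt{n\sum d^2 - (\sum d)^2}\,/\,(n\sqrt{n-1})$.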

Ex.47. 10 accountants were given intensive coaching and four tests were conducted in a month. The scores of tests 1 and 4 are given below:

Serial No. of accountants    1    2    3    4    5    6    7    8    9    10
Marks in 1st test            50   42   51   42   60   41   70   55   62   38
Marks in 4th test            62   40   61   52   68   51   64   63   72   50

Does the score from test 1 to test 4 show an improvement? Test at the 5% level of significance. (The value of t for 9 d.f. at the 5% level is 1.833 for a one-tailed test and 2.262 for a two-tailed test.)

Sol. Let us denote the score of the first test with suffix 1 and that of the fourth test with suffix 4. Then, taking the null hypothesis that there is no improvement, we can write:

H0: μ1 = μ4 (i.e., there is no improvement)

H1: μ4 > μ1 (i.e., the coaching has resulted in improvement)

Since we have matched pairs, we use the paired t-test and work out the test statistic t given by:

$$t = \frac{\bar{d}}{S.E.}, \quad \text{where } \bar{d} = \frac{\sum d}{n} \text{ and } S.E. = \sqrt{\frac{\sum d^2 - n(\bar{d})^2}{n(n-1)}}$$

Marks in 1st test (x1)   Marks in 4th test (x4)   Difference d = x4 - x1   Difference squared d²
50                       62                       12                       144
42                       40                       -2                       4
51                       61                       10                       100
42                       52                       10                       100
60                       68                       8                        64
41                       51                       10                       100
70                       64                       -6                       36
55                       63                       8                        64
62                       72                       10                       100
38                       50                       12                       144
n = 10                                            Σd = 72                  Σd² = 856

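$$\bar{d} = \frac{\sum d}{n} = \frac{72}{10} = 7.2$$

$$S.E. = \sqrt{\frac{856 - 10(7.2)^2}{10(10-1)}} = 1.937$$

$$\text{Hence } t = \frac{\bar{d}}{S.E.} = \frac{7.2}{1.937} = 3.717$$

Degrees of freedom = n - 1 = 10 - 1 = 9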
As H1 is one-sided, we shall apply a one-tailed test (in the right tail, because H1 is of the "greater than" type) for determining the rejection region at the 5% level.

The observed value of t (3.717) is more than 1.833 and hence lies in the rejection region. Accordingly, we reject H0 (i.e., we accept H1) and conclude that coaching has improved the standard.
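A short Python sketch of the same right-tailed paired test, using the S.E. form quoted in the solution; 1.833 is the one-tailed 5% value for 9 d.f. given in the example, not a computed quantile.

```python
import math

test1 = [50, 42, 51, 42, 60, 41, 70, 55, 62, 38]
test4 = [62, 40, 61, 52, 68, 51, 64, 63, 72, 50]
d = [b - a for a, b in zip(test1, test4)]           # improvement from test 1 to test 4
n = len(d)
d_bar = sum(d) / n
se = math.sqrt((sum(v * v for v in d) - n * d_bar ** 2) / (n * (n - 1)))
t = d_bar / se
T_ONE_TAILED_5PCT = 1.833                           # 9 d.f., quoted in the example
print(sum(d), sum(v * v for v in d), round(se, 3), round(t, 3), t > T_ONE_TAILED_5PCT)
```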

Ex.48. A company claims that the weight of their product is 10 kg. A sample of items taken from a lot supplied by the company has shown the following weights:

10.2, 9.7, 10.3, 10.0, 9.8, 9.7, 9.6, 9.7, 9.4

Is there any statistical evidence to support the claim of the company about the weight of the item?

d.f.                 11       10       9        8
$t_{0.05}$ value     1.796    1.812    1.833    1.860

Sol. Mean weight of the sample

$$\bar{x} = \frac{\sum x}{n} = \frac{98}{10} = 9.8$$

If $x_1, x_2, \ldots, x_{10}$ denote the weights of the 10 items in the sample, then the sum of their squares is

$$\sum x^2 = 961.48$$

$$\sum x^2 - n\bar{x}^2 = 961.48 - 10(9.8)^2 = 1.08, \qquad \sqrt{1.08} = 1.039 \approx 1.04$$

Let the null hypothesis be that the mean weight of the item is 10 kg. The alternative hypothesis is that the mean weight ≠ 10 kg.

$$t = \frac{(\bar{x} - \mu)\sqrt{n(n-1)}}{\sqrt{\sum x^2 - n\bar{x}^2}} = \frac{(9.8 - 10)\sqrt{10 \times 9}}{1.04} = -1.824$$
At the 5% level of significance with 9 d.f., the tabulated value = 1.833.

Since the calculated value |t| = 1.824 is less than the tabulated value, we can accept the claim of the company that the weight of the item is 10 kg.

Note: In the above example, the tabulated value 1.833 corresponds to the 10% level of significance if we consider a two-tailed test, and to the 5% level of significance if we consider a one-tailed test.
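The one-sample t statistic can be computed directly from the summary totals used in the solution (n = 10, Σx = 98, Σx² = 961.48). The sketch below does only that; `one_sample_t` is a hypothetical helper. Without rounding √1.08 to 1.04, |t| comes out at about 1.826, still below 1.833.

```python
import math


def one_sample_t(n, sum_x, sum_x2, mu0):
    """t = (x_bar - mu0) * sqrt(n(n-1)) / sqrt(sum_x2 - n * x_bar^2)."""
    x_bar = sum_x / n
    ss = sum_x2 - n * x_bar ** 2          # sum of squared deviations from the mean
    return (x_bar - mu0) * math.sqrt(n * (n - 1)) / math.sqrt(ss)


t = one_sample_t(n=10, sum_x=98.0, sum_x2=961.48, mu0=10.0)
print(round(t, 3), abs(t) < 1.833)
```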

EX.49. Eight students were given a test in statistics, and after one month's coaching they were given another test of a similar nature. The following table gives the difference in their marks in the second test over the first:

Roll Number:           1     2     3     4     5     6     7     8
Difference in Marks:   4     -2    6     -8    12    5     -7    2

Is the difference in marks statistically significant?

Sol. Σd² = 16 + 4 + 36 + 64 + 144 + 25 + 49 + 4 = 342

Σd = 4 - 2 + 6 - 8 + 12 + 5 - 7 + 2 = 12

$$\bar{d} = \frac{\sum d}{n} = \frac{12}{8} = 1.5$$

$$\sum d^2 - n(\bar{d})^2 = 342 - 8(1.5)^2 = 342 - 18 = 324$$

Let the null hypothesis be that the training is not effective, i.e., there is no significant difference in the marks of the two tests.

Alternative hypothesis: training is effective.

$$t = \frac{\bar{d}\sqrt{n(n-1)}}{\sqrt{\sum d^2 - n(\bar{d})^2}} = \frac{1.5\sqrt{8 \times 7}}{\sqrt{324}} = 0.623$$
At the 5% level of significance with 7 d.f., the tabulated value of t is 2.365 for a two-tailed test and 1.895 for a one-tailed test. Since the calculated value is less than the tabulated value, we accept the null hypothesis and conclude that the training is not effective, i.e., the difference in the marks is not statistically significant at the 5% level of significance.
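The same test sketched in Python, evaluating both critical values quoted in the solution so that the two-tailed and one-tailed decisions can be compared side by side (both critical values are taken from the text, not computed).

```python
import math

d = [4, -2, 6, -8, 12, 5, -7, 2]
n = len(d)
d_bar = sum(d) / n
ss = sum(v * v for v in d) - n * d_bar ** 2          # 342 - 18 = 324
t = d_bar * math.sqrt(n * (n - 1)) / math.sqrt(ss)
critical = {"two-tailed": 2.365, "one-tailed": 1.895}  # 7 d.f., 5% level
decisions = {k: ("reject H0" if t > v else "accept H0") for k, v in critical.items()}
print(round(t, 3), decisions)
```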

EX.50. A certain drug administered to 10 patients showed the following additional hours of sleep:

-1.0, 0.5, 2.7, -0.6, 1.2, 1.8, 1.6, 3.5, 0.2, -1.7

Can it be concluded that the drug does produce additional hours of sleep?

(Given: $t_{0.025}$ = 2.262 for 9 d.f. and 2.228 for 10 d.f.)

Sol. Here n = 10,

Σd = -1.0 + 0.5 + 2.7 - 0.6 + 1.2 + 1.8 + 1.6 + 3.5 + 0.2 - 1.7 = 8.2, so that $\bar{d}$ = 0.82

Σd² = 1 + 0.25 + 7.29 + 0.36 + 1.44 + 3.24 + 2.56 + 12.25 + 0.04 + 2.89 = 31.32

$$\sum d^2 - n(\bar{d})^2 = 31.32 - 10(0.6724) = 31.32 - 6.724 = 24.596$$

Let the null hypothesis be that the drug is not effective, i.e., the drug does not produce any additional hours of sleep.

The alternative hypothesis is that the drug is effective.

$$t = \frac{\bar{d}\sqrt{n(n-1)}}{\sqrt{\sum d^2 - n(\bar{d})^2}} = \frac{0.82\sqrt{90}}{\sqrt{24.596}} = 1.57$$

Since the calculated value of t (1.57) is less than $t_{0.025}$ = 2.262 with 9 d.f., i.e., at the 5% level of significance, the null hypothesis cannot be rejected. Hence we conclude that the drug does not produce any additional hours of sleep.
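A sketch of the computation above. Because the alternative ("the drug is effective") is one-sided, the one-tailed 5% value 1.833 for 9 d.f. (quoted in Ex.47) could also be used; the code checks both and, for these data, the decision is the same.

```python
import math

hours = [-1.0, 0.5, 2.7, -0.6, 1.2, 1.8, 1.6, 3.5, 0.2, -1.7]
n = len(hours)
d_bar = sum(hours) / n
ss = sum(h * h for h in hours) - n * d_bar ** 2      # 31.32 - 6.724
t = d_bar * math.sqrt(n * (n - 1)) / math.sqrt(ss)
# 2.262 is the value given in this example; 1.833 is the one-tailed
# 5% value for 9 d.f. quoted in Ex.47.
print(round(d_bar, 2), round(ss, 3), round(t, 2), t < 2.262, t < 1.833)
```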

EX.51. A certain stimulant was administered to 10 patients in a hospital, and the changes in their blood pressure were as follows:

-2, -3, +5, +3, +1, 0, -2, +4, -3, +5.

Can it be concluded from the above data that the stimulant has an impact on blood pressure?

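Sol. Σd = -2 - 3 + 5 + 3 + 1 + 0 - 2 + 4 - 3 + 5 = 8

$$\bar{d} = \frac{\sum d}{n} = \frac{8}{10} = 0.8$$

Σd² = 4 + 9 + 25 + 9 + 1 + 0 + 4 + 16 + 9 + 25 = 102

$$t = \frac{\bar{d}\sqrt{n(n-1)}}{\sqrt{\sum d^2 - n(\bar{d})^2}} = \frac{0.8\sqrt{90}}{\sqrt{102 - 6.4}} = \frac{0.8\sqrt{90}}{9.777} = 0.776$$

$t_{0.025}$ for 9 d.f. = 2.262 > 0.776. Hence the stimulant is not effective, i.e., it has no significant impact on blood pressure.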
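Python's standard `statistics` module gives the same result directly: `statistics.stdev` uses the divisor n - 1, so t = mean / (s / √n). A minimal sketch, with 2.262 taken from the text:

```python
import math
import statistics

changes = [-2, -3, 5, 3, 1, 0, -2, 4, -3, 5]
n = len(changes)
s = statistics.stdev(changes)                  # sample S.D. with divisor n - 1
t = statistics.mean(changes) / (s / math.sqrt(n))
T_TAB = 2.262                                  # t_0.025 for 9 d.f., from the text
print(round(t, 3), "significant" if abs(t) > T_TAB else "not significant")
```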

Ex.52. A certain diet newly introduced to each of 12 pigs resulted in the following increases of body weight:

6, 3, 8, -2, 3, 0, -1, 1, 6, 0, 5 and 4.

Can you conclude that the diet is effective in increasing the weight of the pigs? (Given: $t_{0.05}$ for 11 d.f. = 2.20)

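Sol. Let x: 6, 3, 8, -2, 3, 0, -1, 1, 6, 0, 5, 4, so that Σx = 33.

Σx² = 36 + 9 + 64 + 4 + 9 + 0 + 1 + 1 + 36 + 0 + 25 + 16 = 201

$$\bar{x} = \frac{33}{12} = 2.75, \qquad s = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{201}{12} - (2.75)^2} = 3.031$$

H0: μ = 0, i.e., the diet is not effective.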

Now the test statistic is

$$t = \frac{\bar{x}\sqrt{n-1}}{s} = \frac{2.75 \times 3.3166}{3.031} = 3.009$$

At the 5% level of significance (d.f. = 11), the tabulated value of t is 2.20 (two-tailed test), and the calculated value is greater than the tabulated value, so we reject H0 and conclude that the diet is effective. [In fact, it is a question of the paired t-test.]

Note: If we consider a one-tailed test, the value of t is 1.796. Since we are interested only in the increase in weight, we should prefer the one-tailed test, because the two-tailed test will simply test the change in weight.
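The note's comparison of one-tailed and two-tailed decisions can be sketched as follows, keeping the text's standard deviation with divisor n and √(n - 1) in the numerator; both critical values are the ones quoted in the example.

```python
import math

gain = [6, 3, 8, -2, 3, 0, -1, 1, 6, 0, 5, 4]
n = len(gain)
x_bar = sum(gain) / n
sigma = math.sqrt(sum(g * g for g in gain) / n - x_bar ** 2)  # divisor n, as in the text
t = x_bar * math.sqrt(n - 1) / sigma
for label, critical in (("two-tailed", 2.20), ("one-tailed", 1.796)):
    print(label, round(t, 3), "reject H0" if t > critical else "accept H0")
```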

Ex.53. The numbers of car accidents per month in a metropolitan city were found to be as below:

20, 17, 12, 6, 7, 15, 8, 5, 16 and 14. Use the chi-square test to check whether these frequencies are in agreement with the belief that the occurrence of accidents was the same during the 10-month period. Test at the 5% level of significance.

Sol. We have the null hypothesis, H0, that the occurrence of accidents was the same during the ten-month period.

Total number of accidents in 10 months

= 20 + 17 + 12 + 6 + 7 + 15 + 8 + 5 + 16 + 14 = 120

$$\text{Expected number (average number of accidents) per month} = \frac{120}{10} = 12$$

$$\chi^2 = \sum_{i=1}^{10} \frac{(O_i - E_i)^2}{E_i}$$

$$= \frac{(20-12)^2}{12} + \frac{(17-12)^2}{12} + \frac{(12-12)^2}{12} + \frac{(6-12)^2}{12} + \frac{(7-12)^2}{12} + \frac{(15-12)^2}{12} + \frac{(8-12)^2}{12} + \frac{(5-12)^2}{12} + \frac{(16-12)^2}{12} + \frac{(14-12)^2}{12}$$

$$= \frac{64 + 25 + 0 + 36 + 25 + 9 + 16 + 49 + 16 + 4}{12} = \frac{244}{12} = 20.33$$

Degrees of freedom = 10 - 1 = 9

The tabulated value of χ² for 9 degrees of freedom at the 5% level of significance is 16.919 (19.02 if the 5% is split equally between the two tails). Since the calculated value 20.33 is more than the tabulated value, we reject our null hypothesis and say that the given data do not support the belief that the number of accidents was the same during the 10-month period.
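A minimal chi-square goodness-of-fit sketch for equal expected frequencies. The critical value 16.919 (5%, 9 d.f.) is a tabulated figure typed in by hand, not computed.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


accidents = [20, 17, 12, 6, 7, 15, 8, 5, 16, 14]
expected = [sum(accidents) / len(accidents)] * len(accidents)   # 12 per month
chi2 = chi_square(accidents, expected)
df = len(accidents) - 1
print(round(chi2, 2), df, "reject H0" if chi2 > 16.919 else "accept H0")
```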

Ex.54. A sample analysis of the examination results of 500 students was made. It was found that 180 students had failed, 170 had secured a third class, 110 were placed in the second class and 40 got a first class. Are these figures commensurate with the general examination result, which is in the ratio 4:3:2:1 for the various categories respectively? Answer at α = 0.05. (Table values of chi-square at α = 0.05 for 2 d.f., 3 d.f. and 4 d.f. are 5.99, 7.81 and 9.49 respectively.)

Sol. We have the null hypothesis, H0, that the results of the examination were commensurate with the general examination result, which is in the ratio 4:3:2:1.

Total number of students = 500

Observed number of failed students = 180

$$\text{Expected number of failed students} = \frac{500 \times 4}{4+3+2+1} = 200$$

Observed number of students getting third class = 170

$$\text{Expected number of students getting third class} = \frac{500 \times 3}{10} = 150$$

Observed number of students getting second class = 110

$$\text{Expected number of students getting second class} = \frac{500 \times 2}{10} = 100$$

Observed number of students getting first class = 40

$$\text{Expected number of students getting first class} = \frac{500 \times 1}{10} = 50$$

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = \frac{(180-200)^2}{200} + \frac{(170-150)^2}{150} + \frac{(110-100)^2}{100} + \frac{(40-50)^2}{50}$$

= 2 + 2.67 + 1 + 2 = 7.67

The degrees of freedom = 4 - 1 = 3, and the value of χ² at 3 d.f. for α = 0.05 is given to be 7.81.

As the calculated value is less than the tabulated value, we accept our null hypothesis and say that the observed figures are quite commensurate with the general examination result.
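Expected frequencies from a hypothesised ratio are just n × (part / sum of parts). The sketch below uses `fractions.Fraction` so that the 8/3 term appears exactly rather than as 2.67; 7.81 is the tabulated value quoted in the example.

```python
from fractions import Fraction

observed = {"failed": 180, "third": 170, "second": 110, "first": 40}
ratio = {"failed": 4, "third": 3, "second": 2, "first": 1}
n, parts = sum(observed.values()), sum(ratio.values())
expected = {k: Fraction(n * r, parts) for k, r in ratio.items()}
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
print({k: int(v) for k, v in expected.items()}, float(chi2), float(chi2) < 7.81)
```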
Ex.55. The following table shows the distribution of goals in football matches:

No. of goals:      0     1     2     3     4     5     6     7
No. of matches:    95    158   108   63    40    9     5     2

Fit a Poisson distribution and test the goodness of fit.

Sol. We have the null hypothesis, H0, that a Poisson distribution can be fitted to the data.

Total number of matches = 95 + 158 + 108 + 63 + 40 + 9 + 5 + 2 = 480

Total number of goals = 95 × 0 + 158 × 1 + 108 × 2 + 63 × 3 + 40 × 4 + 9 × 5 + 5 × 6 + 2 × 7 = 812

$$\text{Average number of goals per match} = m = \frac{812}{480} = 1.7 \text{ (approx.)}$$

The expected frequencies of the Poisson distribution are computed from the expression

$$\text{Expected frequency} = N\left(\frac{e^{-m} m^x}{x!}\right), \quad \text{where } N = 480,\ m = 1.7$$

∴ For the Poisson distribution we have

$$\text{Expected frequency} = 480 \times \frac{e^{-1.7}(1.7)^x}{x!}$$

where x = 0, 1, 2, 3, 4, 5, 6 and 7.

Working out the successive terms of this distribution, we get the following frequencies (results expressed to the nearest whole number):

No. of goals        Observed frequency    Expected frequency
0                   95                    88
1                   158                   150
2                   108                   126
3                   63                    72
4                   40                    30
5, 6, 7 (pooled)    9 + 5 + 2 = 16        14

Since no expected frequency should be less than 5, we have pooled the last three frequencies.

$$\chi^2 = \frac{(95-88)^2}{88} + \frac{(158-150)^2}{150} + \frac{(108-126)^2}{126} + \frac{(63-72)^2}{72} + \frac{(40-30)^2}{30} + \frac{(16-14)^2}{14}$$

χ² = 0.56 + 0.43 + 2.57 + 1.12 + 3.33 + 0.29 = 8.30

Then the number of degrees of freedom = 6 - 2 = 4. (Please note: one degree of freedom is lost because the total frequency is fixed, and one more because the mean m is estimated from the data.)

For 4 degrees of freedom at the 5% level of significance, the table value of χ² = 9.488, while the calculated value of χ² = 8.30. Since the calculated value is less than the tabulated value, the difference between expected and observed frequencies is not significant and can be ignored. Hence the fit is good.
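A sketch of the Poisson fit in Python. Unlike the hand calculation, it keeps the unrounded mean 812/480 and unrounded expected counts, and pools "5 or more goals" into the last class so the expected counts add up to 480. Because of these choices, χ² comes out near 8.15 rather than 8.30; the conclusion against the tabulated 9.488 (quoted in the text) is unchanged.

```python
import math

goals = [0, 1, 2, 3, 4, 5, 6, 7]
matches = [95, 158, 108, 63, 40, 9, 5, 2]
N = sum(matches)
m = sum(g * f for g, f in zip(goals, matches)) / N       # 812 / 480
probs = [math.exp(-m) * m ** x / math.factorial(x) for x in range(5)]
expected = [N * p for p in probs]
expected.append(N - sum(expected))       # pool "5 or more goals" into one class
observed = matches[:5] + [sum(matches[5:])]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1               # one more d.f. lost for estimating m
print(round(m, 4), [round(e, 1) for e in expected], round(chi2, 2), df)
```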

Ex.56. In experiments on pea breeding, Mendel obtained the following frequencies of seeds: 315 round and yellow, 101 wrinkled and yellow, 108 round and green and 32 wrinkled and green, total 556. Theory predicts that the frequencies should be in the proportions 9:3:3:1. Examine the correspondence between theory and experiment.

Sol. We have the null hypothesis, H0, that the frequencies are in the proportion of 9:3:3:1.

On the basis of the hypothesis that the seeds are in the proportions of 9:3:3:1, the expected frequencies of seeds of the four categories are:

$$\frac{556 \times 9}{16} = 312.75 \approx 313, \quad \frac{556 \times 3}{16} = 104.25 \approx 104, \quad \frac{556 \times 3}{16} \approx 104, \quad \frac{556 \times 1}{16} = 34.75 \approx 35$$

Observed frequency    Expected frequency
315                   313
101                   104
108                   104
32                    35

Substituting the observed and expected frequencies in the expression

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$

we get

$$\chi^2 = \frac{(315-313)^2}{313} + \frac{(101-104)^2}{104} + \frac{(108-104)^2}{104} + \frac{(32-35)^2}{35}$$

= 0.013 + 0.086 + 0.154 + 0.257 = 0.51

The number of degrees of freedom:

d.f. = 4 - 1 = 3

For 3 degrees of freedom at the 5% level of significance, the table value of χ² = 7.815, which is much greater than the computed value of χ² = 0.51. Therefore, the difference between observed and expected frequencies is not significant and may be ignored. Hence there is a close correspondence between theory and experiment.
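The sketch below compares χ² using the rounded expected frequencies (as in the text) with χ² using the exact values 312.75, 104.25, 104.25 and 34.75. Rounding changes the statistic slightly, but both values are far below the tabulated 7.815.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [315, 101, 108, 32]          # round-yellow, wrinkled-yellow, round-green, wrinkled-green
ratio = [9, 3, 3, 1]
n = sum(observed)                        # 556
expected_exact = [n * r / sum(ratio) for r in ratio]
expected_rounded = [round(e) for e in expected_exact]
print(expected_exact, expected_rounded)
print(round(chi_square(observed, expected_rounded), 3),
      round(chi_square(observed, expected_exact), 3))
```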

Ex.57. 50 students selected at random from 500 students enrolled in a computer crash programme were classified according to age and grade points, giving the following data:

                                  Age (in years)
Grade Points     20 and under     21-30     Above 30     Total
Upto 5.0         …                5         …            10
5.1 to 7.5       …                …         …            20
7.6 to 10.0      4                8         8            20
Total            15               20        15           50

Test at the 5% level of significance the hypothesis that age and grade points are independent. Table values of χ² (chi-square) at the 5% level:

d.f.     4        5         6         7         8         9
χ²       9.488    11.070    12.592    14.067    15.507    16.919

Sol. Let H0: age and grade points are independent.

H1: age and grade points are not independent.

On the basis of H0, the expected frequencies are obtained as below*:

                                  Age (in years)
Grade Points     20 and under     21-30     Above 30     Total
Upto 5.0         3                4         3            10
5.1 to 7.5       6                8         6            20
7.6 to 10.0      6                8         6            20
Total            15               20        15           50

Since values less than 5 occur in some cells of the expected frequencies, we have to amalgamate these cells with their neighbours. After amalgamation, the new frequencies are as shown below.

* Expected frequencies of each cell are calculated by the formula

$$E = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}, \quad \text{e.g., } \frac{10 \times 15}{50} = 3 \text{ for the cell in the first row and first column.}$$

OBSERVED

                                  Age (in years)
Grade Points     20 and under     21-30     Above 30     Total
Upto 7.5         11               12        7            30
7.6 to 10.0      4                8         8            20
Total            15               20        15           50

EXPECTED

                                  Age (in years)
Grade Points     20 and under     21-30     Above 30     Total
Upto 7.5         9                12        9            30
7.6 to 10.0      6                8         6            20
Total            15               20        15           50

$$\chi^2 = \frac{(11-9)^2}{9} + \frac{(12-12)^2}{12} + \frac{(7-9)^2}{9} + \frac{(4-6)^2}{6} + \frac{(8-8)^2}{8} + \frac{(8-6)^2}{6} = 0.444 + 0 + 0.444 + 0.667 + 0 + 0.667 = 2.22$$

The value of $\chi^2_{0.05}$ for (2 - 1)(3 - 1) = 1 × 2 = 2 d.f. is 5.991. Since the computed value is less than the table value, we accept H0 and conclude that grade points and age are two independent qualities.
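A general r × c contingency-table sketch, applied to the amalgamated 2 × 3 table. The helper names are hypothetical; the expected counts come from the row-total × column-total / grand-total rule described above, and 5.991 is the tabulated value quoted in the text.

```python
def expected_table(observed):
    """Expected counts under independence: row total x column total / grand total."""
    row_totals = [sum(r) for r in observed]
    col_totals = [sum(c) for c in zip(*observed)]
    grand = sum(row_totals)
    return [[rt * ct / grand for ct in col_totals] for rt in row_totals]


def chi_square_contingency(observed):
    expected = expected_table(observed)
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected


# Amalgamated table: rows "Upto 7.5", "7.6 to 10.0"; columns = the three age groups
observed = [[11, 12, 7], [4, 8, 8]]
chi2, df, expected = chi_square_contingency(observed)
print(expected, round(chi2, 2), df, "accept H0" if chi2 < 5.991 else "reject H0")
```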
Ex.58. Of the patients admitted consecutively to a hospital, every fifth one is treated by a new therapy and the remainder by the old therapy. Of the 100 patients treated by the new therapy 20% die, whereas 30% of those treated by the old therapy die. Test at the 5% level of significance whether the difference in fatality is more than can be attributed to chance. (Use the χ² test.)

Sol. Let us construct a 2 × 2 contingency table of observed frequencies from the given information as below*:

               Not Alive    Alive    Total
New Therapy    20           80       100
Old Therapy    120          280      400
Total          140          360      500

[30% of 400 = 120] We will formulate the hypotheses as below:

H0: The difference in fatality is just due to chance.

H1: The difference in fatality is more than can be attributed to chance.

On the assumption that H0 is true, we can find the expected frequencies as below:

EXPECTED FREQUENCIES

               Not Alive                 Alive                     Total
New Therapy    (100 × 140)/500 = 28      (100 × 360)/500 = 72      100
Old Therapy    (400 × 140)/500 = 112     (400 × 360)/500 = 288     400
Total          140                       360                       500
* If one patient is treated by the new therapy, then 4 patients are treated by the old therapy and the total number of patients admitted is 5. If 100 patients are treated by the new therapy, then the number of patients treated by the old therapy = 400 and the total number of patients = 500.

For a contingency table of r rows and c columns, degrees of freedom = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1.

The value of χ² will then be

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = \frac{(20-28)^2}{28} + \frac{(80-72)^2}{72} + \frac{(120-112)^2}{112} + \frac{(280-288)^2}{288}$$

= 2.286 + 0.889 + 0.571 + 0.222 = 3.968

From the table, we find that the table value of χ² for 1 d.f. at the 5% level of significance is 3.84. Computed χ² > table χ² value. Hence H0 is rejected. The inference is that the difference in fatality is more than can be attributed to chance.
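For a 2 × 2 table [[a, b], [c, d]] there is a well-known shortcut, χ² = N(ad - bc)² / ((a+b)(c+d)(a+c)(b+d)), which (without Yates' continuity correction, as in the text) equals the cell-by-cell sum. The sketch checks that both give 3.968 here; 3.84 is the tabulated value quoted above.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square for the 2 x 2 table [[a, b], [c, d]] (no Yates correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_cellwise(table):
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(2) for j in range(2))


# rows: new therapy, old therapy; columns: not alive, alive
table = [[20, 80], [120, 280]]
quick = chi_square_2x2(20, 80, 120, 280)
print(round(quick, 3), round(chi_square_cellwise(table), 3), quick > 3.84)
```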

Ex.59. 100 students randomly selected from the 1000 students enrolled in an MBA programme were cross-classified by age and grade point. Accordingly, the following data were compiled:

                                   Age (in years)
Grade Point     25 and under     26-28     Over 28     Total
Up to 3.0       6                9         5           20
3.1 to 3.5      18               14        8           40
3.6 to 4.0      11               12        17          40
Total           35               35        30          100

At the 5% level of significance, test the hypothesis that age and grade points are independent.

Sol. We set H0: age and grade points are independent.

H1: age and grade points are not independent.

On the basis of H0, we can obtain the expected frequencies as shown below:

                                   Age (in years)
Grade Points    25 and under     26-28     Over 28     Total
Up to 3.0       7                7         6           20
3.1 to 3.5      14               14        12          40
3.6 to 4.0      14               14        12          40
Total           35               35        30          100

[Expected frequency = (Row total)(Column total)/Grand total]

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = \frac{(6-7)^2}{7} + \frac{(9-7)^2}{7} + \frac{(5-6)^2}{6} + \frac{(18-14)^2}{14} + \frac{(14-14)^2}{14} + \frac{(8-12)^2}{12} + \frac{(11-14)^2}{14} + \frac{(12-14)^2}{14} + \frac{(17-12)^2}{12} = 6.369$$

At the 5% level, for (c - 1)(r - 1) = 2 × 2 = 4 d.f., the table value of χ² is 9.488. The computed value of χ², being less than this, is insignificant. Accordingly, we cannot reject H0, and we conclude that age and grade points are independent.
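The 3 × 3 calculation can be checked exactly with `fractions.Fraction`, which shows that the expected table consists of whole numbers and that χ² is exactly 535/84 ≈ 6.369. The tabulated 9.488 for 4 d.f. is taken from the text.

```python
from fractions import Fraction

observed = [[6, 9, 5], [18, 14, 8], [11, 12, 17]]   # rows: grade-point bands
rows = [sum(r) for r in observed]
cols = [sum(c) for c in zip(*observed)]
n = sum(rows)
expected = [[Fraction(r * c, n) for c in cols] for r in rows]
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(rows) - 1) * (len(cols) - 1)
print([[int(e) for e in row] for row in expected], round(float(chi2), 3), df)
```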

Ex.60. A chemical extraction plant processes sea water to collect sodium chloride and magnesium. It is known that sea water contains sodium chloride, magnesium and other elements in the ratio 62:4:34. A sample of 200 tonnes of sea water has resulted in 130 tonnes of sodium chloride and 6 tonnes of magnesium. Are these data consistent with the known composition of sea water at the 5% level?

Sol. We set H0: the composition of sea water is 62:4:34, and

H1: the composition is not 62:4:34,

where H0 is the null hypothesis and H1 is the alternative hypothesis. As per the null hypothesis, in 200 tonnes of sea water we expect 124, 8 and 68 tonnes of sodium chloride, magnesium and other elements respectively.

We now compute the χ² value.

Element            Observed quantity    Expected quantity    (O - E)    (O - E)²    (O - E)²/E
                   (tonnes) O           (tonnes) E
Sodium chloride    130                  124                  6          36          36/124 = 0.290
Magnesium          6                    8                    -2         4           4/8 = 0.500
Other elements     64                   68                   -4         16          16/68 = 0.235
Total              200                  200                                         χ² = 1.025

As there are three types of elements, n = 3. The degrees of freedom = n - 1 = 3 - 1 = 2. At the 5% level, for 2 degrees of freedom, the table value of chi-square is 5.991. The computed value is less than this table value. Accordingly, it can be concluded that the observed data are consistent with the known composition of sea water.
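A short sketch of the composition test. The "other elements" quantity is obtained by subtraction (200 - 130 - 6 = 64), as in the table above; the critical value 5.991 is the tabulated figure from the text.

```python
observed = {"sodium chloride": 130, "magnesium": 6}
total = 200
observed["other elements"] = total - sum(observed.values())   # 64 tonnes
ratio = {"sodium chloride": 62, "magnesium": 4, "other elements": 34}
expected = {k: total * r / sum(ratio.values()) for k, r in ratio.items()}
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in ratio)
print(expected, round(chi2, 3), "consistent" if chi2 < 5.991 else "not consistent")
```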

Ex.61. A sample of 300 undergraduate and 300 postgraduate students of a university were asked to give their opinion towards autonomous colleges. 190 of the undergraduate and 210 of the postgraduate students favoured the autonomous status.

Present the above facts in the form of a frequency table and test, at the 5% level, whether the opinions of undergraduate and postgraduate students on the autonomous status of colleges are independent. (Table value of chi-square at the 5% level for 1 d.f. is 3.84.)

Sol. We set

H0: Opinions on autonomous status and level of graduation are independent.

H1: Opinions on autonomous status and level of graduation are not independent.

Let us now form a contingency table of observed frequencies in which expected frequencies are shown within brackets.

                  Favour       Not in Favour    Total
Undergraduate     190 (200)    110 (100)        300
Postgraduate      210 (200)    90 (100)         300
Total             400          200              600 = Sample size

Expected frequencies are computed by multiplying the row total and the column total and then dividing the product by the sample size.

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = \frac{(190-200)^2}{200} + \frac{(110-100)^2}{100} + \frac{(210-200)^2}{200} + \frac{(90-100)^2}{100} = 0.5 + 1 + 0.5 + 1 = 3$$

d.o.f. = (r - 1)(c - 1) = 1; $\chi^2_{0.05}$ for 1 d.o.f. = 3.84.

The computed value of χ² being less than the table value, the differences between observed and expected frequencies are not large enough to reject the hypothesis of independence at the 0.05 level of significance. Hence the two attributes are independent, i.e., opinion on autonomous status does not depend on whether the student is an undergraduate or a postgraduate.

Note: It may be recalled that a contingency table is a two-way table of frequencies corresponding to two factors of classification.
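In a 2 × 2 table with fixed margins only one expected frequency is free (hence 1 d.f.); the other three follow by subtraction from the totals. The sketch below illustrates this for the opinion table; 3.84 is the tabulated value quoted in the example.

```python
# Opinion table: rows = undergraduate, postgraduate; columns = favour, not in favour
observed = [[190, 110], [210, 90]]
row_totals = [sum(r) for r in observed]            # 300, 300
col_totals = [sum(c) for c in zip(*observed)]      # 400, 200
n = sum(row_totals)                                # 600
e11 = row_totals[0] * col_totals[0] / n            # the single "free" expected value
expected = [[e11, row_totals[0] - e11],
            [col_totals[0] - e11, n - row_totals[0] - col_totals[0] + e11]]
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
print(expected, chi2, "independent" if chi2 < 3.84 else "not independent")
```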

Ex.62. Calculate the expected frequencies for the following data, presuming the two attributes, viz., condition of home and condition of child, as independent:

                         Condition of Home
Condition of Child       Clean     Dirty
Clean                    70        50
Fairly clean             80        20
Dirty                    35        45

Use the chi-square test at the 5% level to state whether the two attributes are independent.

(Table values of chi-square at 5% for 2 d.f. is 5.991, for 3 d.f. is 7.815 and for 4 d.f. is 9.488.)

Sol. An expected frequency E corresponding to each cell in the table will be given by

$$E = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total (i.e., sample size)}}$$

We form the table of expected frequencies with the help of the above rule and write the expected frequencies in each cell within brackets.

                          Condition of Home
Condition of Child        Clean                            Dirty                            Total
Clean                     70 [(185 × 120)/300 = 74]        50 [(115 × 120)/300 = 46]        120
Fairly clean              80 [(185 × 100)/300 = 61.67]     20 [(115 × 100)/300 = 38.33]     100
Dirty                     35 [(185 × 80)/300 = 49.33]      45 [(115 × 80)/300 = 30.67]      80
Total                     185                              115                              300

Let us set the null and the alternative hypotheses as follows:

H0: No association exists between the attributes, i.e., the two attributes are independent.

H1: An association exists between the attributes.

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = \frac{(70-74)^2}{74} + \frac{(50-46)^2}{46} + \frac{(80-61.67)^2}{61.67} + \frac{(20-38.33)^2}{38.33} + \frac{(35-49.33)^2}{49.33} + \frac{(45-30.67)^2}{30.67}$$

= 0.2162 + 0.3478 + 5.4482 + 8.7657 + 4.1627 + 6.6954

= 25.636

d.o.f. = (r - 1)(c - 1) = 2 × 1 = 2.

The computed value of χ², being more than 5.991, is significant, and hence H0 is rejected. Therefore the two attributes are not independent, i.e., an association exists between the attributes.
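The sketch below prints each cell's observed and expected frequency, which is the first part of this example, and accumulates χ² as it goes. With unrounded expected frequencies χ² is about 25.65 (the text's 25.636 uses expected values rounded to two decimals); the decision against 5.991 is the same.

```python
observed = {"clean": (70, 50), "fairly clean": (80, 20), "dirty": (35, 45)}
home_totals = [sum(col) for col in zip(*observed.values())]   # clean, dirty homes
n = sum(home_totals)
chi2 = 0.0
for child, counts in observed.items():
    row_total = sum(counts)
    for o, col_total in zip(counts, home_totals):
        e = row_total * col_total / n          # expected frequency under independence
        chi2 += (o - e) ** 2 / e
        print(child, o, round(e, 2))
print("chi-square =", round(chi2, 3), "significant" if chi2 > 5.991 else "not significant")
```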

Ex.63. Out of 800 persons, 25% were literate and 300 had travelled beyond the limits of their district. 40% of the literates were among those who had not travelled. Prepare a 2 × 2 table and test at the 5% level of significance whether there is any relation between travelling and literacy.

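Sol. We have the null hypothesis, H0, that literacy and travelling are independent, and H1, the alternative hypothesis, that travelling and literacy are related, i.e., not independent.

$$\text{Number of literate persons} = \frac{25}{100} \times 800 = 200$$

Literates who had not travelled = 40% of 200 = 80, so literates who had travelled = 200 - 80 = 120.

              Travelled    Not travelled    Total
Literate      120          80               200
Illiterate    180          420              600
Total         300          500              800

On the assumption that literacy and travelling are independent, the expected frequency table will be as under:

              Travelled                Not travelled            Total
Literate      (200 × 300)/800 = 75     (200 × 500)/800 = 125    200
Illiterate    (600 × 300)/800 = 225    (600 × 500)/800 = 375    600
Total         300                      500                      800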
$$\chi^2 = \frac{(120-75)^2}{75} + \frac{(80-125)^2}{125} + \frac{(180-225)^2}{225} + \frac{(420-375)^2}{375} = 27.0 + 16.2 + 9.0 + 5.4 = 57.6$$

At the 5% level for 1 d.o.f., the table value of chi-square is 3.841, but our computed value is much larger than this. Hence the assumption of independence cannot be accepted. Accordingly, we conclude that there is a relation between travelling and literacy.
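Most of the work in this example is building the 2 × 2 table from percentages. The sketch below does that step explicitly and then computes χ²; 3.841 is the tabulated value quoted in the text.

```python
total, literate_share, travelled = 800, 0.25, 300
literate = round(total * literate_share)                   # 200
literate_not_travelled = round(0.40 * literate)           # 80
literate_travelled = literate - literate_not_travelled    # 120
observed = [[literate_travelled, literate_not_travelled],
            [travelled - literate_travelled,
             total - travelled - literate_not_travelled]]
rows = [sum(r) for r in observed]
cols = [sum(c) for c in zip(*observed)]
chi2 = sum((observed[i][j] - rows[i] * cols[j] / total) ** 2
           / (rows[i] * cols[j] / total) for i in range(2) for j in range(2))
print(observed, round(chi2, 1), "related" if chi2 > 3.841 else "independent")
```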

Ex.64. To test the efficiency of a new drug, a controlled experiment was conducted wherein 300 patients were administered the new drug and 200 other patients were not given the drug. The patients were monitored and the results were obtained as follows:

                      Cured    Condition worsened    No effect    Total
Given the drug        200      40                    60           300
Not given the drug    120      30                    50           200
Total                 320      70                    110          500

Use the χ² (chi-square) test for finding the effect of the drug.

Sol. Null hypothesis H0: the drug is not effective.

OBSERVED FREQUENCIES (O)

           Cured    Condition worsened    No effect    Total
Drug       200      40                    60           300
No drug    120      30                    50           200
Total      320      70                    110          500

EXPECTED FREQUENCIES (E)

           Cured    Condition worsened    No effect    Total
Drug       192      42                    66           300
No drug    128      28                    44           200
Total      320      70                    110          500

Expected frequency for each cell has been calculated by using the formula

$$E = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total (i.e., sample size)}}$$

We set H0: there is no effect of the drug.

H1: there is an effect of the drug.

COMPUTATION OF CHI-SQUARE

O      E      (O - E)    (O - E)²    (O - E)²/E
200    192    +8         64          0.3333
120    128    -8         64          0.5000
40     42     -2         4           0.0952
30     28     +2         4           0.1429
60     66     -6         36          0.5454
50     44     +6         36          0.8182
                                     χ² = 2.4350

dJ. = {2 -1) (3 -1) E 1, tabulated value of X'O.OSfor 2 dJ,,, 5,99

Since the calculated value of χ² is less than the tabulated value, the effect of the drug is not significant, and we accept our null hypothesis that the drug is not effective.
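A minimal Python sketch of the computation table above, assuming the observed counts of Ex.64; it prints one line per cell with O, E, O − E and (O − E)²/E, mirroring the table's columns. The dictionary layout and variable names are illustrative choices, not from the text.

```python
# Ex.64: per-cell chi-square contributions, laid out like the
# "Computation of chi-square" table in the text.
observed = {
    ("Drug", "Cured"): 200, ("Drug", "Worsened"): 40, ("Drug", "No effect"): 60,
    ("No drug", "Cured"): 120, ("No drug", "Worsened"): 30, ("No drug", "No effect"): 50,
}
rows = ["Drug", "No drug"]
cols = ["Cured", "Worsened", "No effect"]

row_tot = {r: sum(observed[r, c] for c in cols) for r in rows}
col_tot = {c: sum(observed[r, c] for r in rows) for c in cols}
grand = sum(row_tot.values())

chi2 = 0.0
for r in rows:
    for c in cols:
        o = observed[r, c]
        e = row_tot[r] * col_tot[c] / grand      # e.g. 300 * 320 / 500 = 192
        contrib = (o - e) ** 2 / e
        chi2 += contrib
        print(f"{r:8} {c:10} O={o:4} E={e:6.1f} O-E={o - e:+5.1f} (O-E)^2/E={contrib:.4f}")

df = (len(rows) - 1) * (len(cols) - 1)
print(f"chi-square = {chi2:.4f}, d.f. = {df}")
```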

Ex.65. The table given below shows the data obtained during an epidemic of cholera:

                  Attacked   Not attacked
Inoculated           42          232
Non-inoculated      106          748

Test the effectiveness of inoculation in preventing the attack of cholera.

(Given: χ²0.05 = 3.841 for 1 d.f.; 5.991 for 2 d.f.; 7.815 for 3 d.f.)

Sol. We have the null hypothesis H0 that the inoculation is not effective in preventing the attack of cholera.

H1: Alternative hypothesis that the inoculation is effective in preventing the attack of cholera.

                  Attacked   Not attacked   Row total
Inoculated         42 (A)      232 (C)         274
Non-inoculated    106 (B)      748 (D)         854
Column total        148          980          1128 = Grand total

Grand total (sample size) = 1128

$$\text{Expected frequency} = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$$

Expected frequency of cell A = (274 × 148)/1128 = 35.95 ≈ 36

Expected frequency of cell B = 148 − 36 = 112

Expected frequency of cell C = 274 − 36 = 238

Expected frequency of cell D = 980 − 238 = 742

Cell   Observed freq. O   Expected freq. E   (O − E)²/E
 A            42                 36           (42 − 36)²/36 = 1.0000
 B           106                112           (106 − 112)²/112 = 0.3214
 C           232                238           (232 − 238)²/238 = 0.1513
 D           748                742           (748 − 742)²/742 = 0.0485
Total                                                          1.5212

Degrees of freedom = (No. of rows − 1) × (No. of columns − 1) = (2 − 1)(2 − 1) = 1

The calculated value of χ² is less than the tabulated value of χ² at 1 d.f. at the 5% level of significance (3.841).

We can presume that the null hypothesis is valid and the inoculation is not effective in preventing the attack of cholera.
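The shortcut used above (compute one expected frequency from the formula, round it, and obtain the other three by subtraction from the marginal totals) can be sketched as follows. This is an illustrative sketch assuming the cell labels A to D of the table; keeping E(A) = 35.95 unrounded would give a slightly different χ² (about 1.55), with the same conclusion.

```python
# Ex.65: expected frequencies for a 2x2 table, computing only cell A
# from the formula and the rest by subtraction from the marginal totals
# (the shortcut used in the text, with E(A) rounded to the nearest integer).
obs = {"A": 42, "B": 106, "C": 232, "D": 748}
row_inoculated = obs["A"] + obs["C"]      # 274
col_attacked = obs["A"] + obs["B"]        # 148
col_not_attacked = obs["C"] + obs["D"]    # 980
grand = sum(obs.values())                 # 1128

exp = {}
exp["A"] = round(row_inoculated * col_attacked / grand)   # 35.95 rounds to 36
exp["B"] = col_attacked - exp["A"]                          # 112
exp["C"] = row_inoculated - exp["A"]                        # 238
exp["D"] = col_not_attacked - exp["C"]                      # 742

chi2 = sum((obs[k] - exp[k]) ** 2 / exp[k] for k in obs)
print(exp, round(chi2, 4))
```

Only one cell had to be computed from the formula, which is another way of seeing that a 2×2 table has 1 degree of freedom.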

Ex.66. A market analyst took a sample of 20 markets in a large city in an attempt to determine how much variation there is in butter prices. The 20 prices that were quoted to him for the butter yielded the sample values x̄ = 100 and s = 9. The problem now is to find a 95% confidence interval for the standard deviation of all the market prices.

Sol. Suppose a large number of such samples of 20 prices were taken and their standard deviations computed, the sample mean being of no interest here.

(n − 1)s² = 19 × 9² = 1539

From the χ² distribution table, it is found that for 19 d.o.f., χ²0.025 = 32.85 and χ²0.975 = 8.91.

The required confidence interval is therefore given by

$$\frac{1539}{32.85} \le \sigma^2 \le \frac{1539}{8.91}$$

i.e., 46.85 ≤ σ² ≤ 172.73

or 6.84 ≤ σ ≤ 13.14. Ans.
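A small sketch of the interval computation, assuming the two χ² table values for 19 d.f. quoted in the text (32.85 and 8.91) are typed in by hand rather than computed; the variable names are our own.

```python
import math

# Ex.66: 95% confidence interval for the population standard deviation.
# Chi-square table values for 19 d.f. are taken from the text.
n, s = 20, 9
chi2_upper = 32.85   # cuts off 2.5% in the upper tail, 19 d.f.
chi2_lower = 8.91    # cuts off 2.5% in the lower tail, 19 d.f.

ss = (n - 1) * s ** 2                 # (n - 1) s^2 = 1539
var_low, var_high = ss / chi2_upper, ss / chi2_lower
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)
print(f"{var_low:.2f} <= sigma^2 <= {var_high:.2f}")
print(f"{sd_low:.2f} <= sigma <= {sd_high:.2f}")
```

Note that the larger table value gives the lower limit, because it sits in the denominator.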

Ex.67. A sample of 101 light bulbs yielded a standard deviation of 80 hours burning time, whereas long experience with the particular brand showed a standard deviation of 90 hours. Using α = 0.05, test if there is any difference in the standard deviation.

Sol. d.o.f. = 101 − 1 = 100

$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{100 \times (80)^2}{(90)^2} = 79.0$$

Tabulated value of χ²0.05 for 100 d.o.f. = 124.3

Since the calculated value is less than the tabulated value, we accept the null hypothesis that there is no significant difference in the S.D.
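The single-variance test statistic can be sketched in a few lines. This assumes the critical value 124.3 used in the text (5%, 100 d.f.) is supplied by hand; the `decision` strings are illustrative.

```python
# Ex.67: chi-square test for a single population variance.
# H0: sigma = 90 hours; critical value 124.3 (5%, 100 d.f.) as used in the text.
n, s, sigma0 = 101, 80, 90
df = n - 1
chi2 = df * s ** 2 / sigma0 ** 2      # (n - 1) s^2 / sigma0^2
critical = 124.3
decision = "reject H0" if chi2 > critical else "do not reject H0"
print(f"chi-square = {chi2:.2f} with {df} d.f.: {decision}")
```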

Note: Identification of the test statistic and its distribution.

Ex.68. Weights in kg of 10 students are given below:

38, 40, 45, 53, 47, 43, 55, 48, 52, 49

Can we say that the variance of the distribution of weights of all students, from which the above sample of 10 students was drawn, is equal to 20 square kg?

Sol.

 Xi    Xi − X̄ (X̄ = 47)    (Xi − X̄)²
 38          −9                81
 40          −7                49
 45          −2                 4
 53          +6                36
 47           0                 0
 43          −4                16
 55          +8                64
 48          +1                 1
 52          +5                25
 49          +2                 4
ΣXi = 470                   Σ(Xi − X̄)² = 280
n = 10

Sample mean X̄ = ΣXi/n = 470/10 = 47

Statement of hypothesis: H0: σ² = σ0² = 20

Specification of the significance level: let us consider a 5% significance level.

Statement of the decision rule: at the 5% significance level, for 10 − 1 = 9 d.o.f., χ²0.025 = 19.02.

Performing the calculations:

$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{\sum (X_i - \bar{X})^2}{\sigma_0^2} = \frac{280}{20} = 14.0$$

Making a statistical decision: since 14.0 < 19.02, H0 cannot be rejected.

Making an administrative decision: the data indicate that the population variance may be 20 square kg.
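Starting from the raw weights, a minimal sketch of the same test looks like this. It relies on the identity (n − 1)s² = Σ(Xi − X̄)², so no separate s² is needed; the critical value 19.02 is the one quoted in the text, entered by hand.

```python
# Ex.68: test H0: sigma^2 = 20 from the raw weights (kg) of 10 students.
weights = [38, 40, 45, 53, 47, 43, 55, 48, 52, 49]
sigma0_sq = 20
n = len(weights)
mean = sum(weights) / n                          # 470 / 10 = 47
ss = sum((x - mean) ** 2 for x in weights)       # sum of squared deviations = 280
chi2 = ss / sigma0_sq                            # equals (n - 1) s^2 / sigma0^2
critical = 19.02     # upper 2.5% point for 9 d.f., as used in the text
print(mean, ss, chi2, chi2 < critical)
```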

Ex.69. In a survey of 200 boys, of which 75 were intelligent, 40 had skilled fathers, while 85 of the unintelligent boys had unskilled fathers. Do these figures support the hypothesis that skilled fathers have intelligent boys? Use the χ² test. The value of χ² for 1 degree of freedom at the 5% level is 3.84.

Sol. Number of unintelligent boys = 200 − 75 = 125

Let us prepare a table showing the observations.

Observed frequencies

                     Intelligent boys   Unintelligent boys   Total
Skilled fathers             40                  40              80
Unskilled fathers           35                  85             120
Total                       75                 125             200
The following table gives the expected frequencies:

Expected frequencies

                     Intelligent boys      Unintelligent boys   Total
Skilled fathers      (75 × 80)/200 = 30    80 − 30 = 50            80
Unskilled fathers    75 − 30 = 45          120 − 45 = 75          120
Total                75                    125                    200
We set the null and the alternative hypotheses as follows:

H0: No association exists between skilled fathers and intelligent boys.

H1: An association exists between skilled fathers and intelligent boys.


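$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = \frac{(40-30)^2}{30} + \frac{(35-45)^2}{45} + \frac{(40-50)^2}{50} + \frac{(85-75)^2}{75}$$

$$= \frac{100(15 + 10 + 9 + 6)}{450} = \frac{4000}{450} = 8.89$$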

This value is much higher than 3.84, the given table value, and is significant. Therefore, we reject the null hypothesis and consequently accept the alternative hypothesis. Thus the given data support the hypothesis that skilled fathers have intelligent boys, i.e., an association exists between skilled fathers and intelligent boys.
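For a 2×2 table there is a well-known closed form that avoids computing the expected frequencies at all; it gives the same value as summing (O − E)²/E over the four cells (no Yates' correction). The function name `chi_square_2x2` is our own, and the sketch assumes rows are skilled/unskilled fathers and columns are intelligent/unintelligent boys.

```python
# Ex.69: for a 2x2 table [[a, b], [c, d]] the chi-square statistic
# (without Yates' correction) can be computed directly as
#     N (ad - bc)^2 / ((a + b)(c + d)(a + c)(b + d)),
# which equals the sum of (O - E)^2 / E over the four cells.

def chi_square_2x2(a, b, c, d):
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: skilled / unskilled fathers; columns: intelligent / unintelligent boys
a, b = 40, 40
c, d = 35, 85
chi2 = chi_square_2x2(a, b, c, d)
print(round(chi2, 2))
```

Here ad − bc = 3400 − 1400 = 2000, so χ² = 200 × 2000² / (80 × 120 × 75 × 125) = 8.89, matching the text.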

Ex.70. A certain drug is claimed to be effective in curing colds. In an experiment on 164 people with colds, half of them were given the drug and half of them were given sugar pills. The patients' reactions to the treatment are recorded in the following table. Test the hypothesis that the drug is not better than sugar pills for curing colds.

               Helped   Harmed   No effect
Drug             52       10        20
Sugar pills      44       12        26
Sol. The table showing the observed frequencies:

               Helped    Harmed    No effect   Total
Drug           52 (A)    10 (B)     20 (C)       82
Sugar pills    44 (D)    12 (E)     26 (F)       82
Total            96        22         46        164
We have the null hypothesis H0 that the drug is not effective in curing colds.

Expected frequency corresponding to cell A = (82 × 96)/164 = 48

Other expected frequencies have been computed similarly, using

$$\text{Expected frequency} = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$$

Table showing observed frequencies and expected frequencies:

               Helped        Harmed        No effect     Total
Drug           52 (A = 48)   10 (B = 11)   20 (C = 23)     82
Sugar pills    44 (D = 48)   12 (E = 11)   26 (F = 23)     82
Total            96            22            46           164
χ² may now be calculated as follows:

$$\chi^2 = \frac{(52-48)^2}{48} + \frac{(10-11)^2}{11} + \frac{(20-23)^2}{23} + \frac{(44-48)^2}{48} + \frac{(12-11)^2}{11} + \frac{(26-23)^2}{23} = 1.63$$
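Now the degrees of freedom are calculated as (r − 1)(c − 1), where r = number of rows and c = number of columns.

In this problem, therefore, d.o.f. = (2 − 1)(3 − 1) = 2. From the table, for 2 d.o.f., χ²0.05 = 5.991. Since the calculated value of χ² is less than this table value, H0 is accepted: the drug is not better than sugar pills for curing colds.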
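A compact sketch of the whole procedure for a 2×3 table, including the decision rule. It assumes the table value 5.991 (5%, 2 d.f.) is entered from the text, since the standard library has no χ² quantile function.

```python
# Ex.70: 2x3 chi-square test with the decision rule written out.
# Rows: drug, sugar pills; columns: helped, harmed, no effect.
observed = [[52, 10, 20],
            [44, 12, 26]]
table_value = 5.991    # chi-square at 5% for 2 d.f., from the text

row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]
grand = sum(row_tot)
expected = [[rt * ct / grand for ct in col_tot] for rt in row_tot]
chi2 = sum((o - e) ** 2 / e
           for orow, erow in zip(observed, expected)
           for o, e in zip(orow, erow))
df = (len(observed) - 1) * (len(observed[0]) - 1)

if chi2 < table_value:
    print(f"chi-square = {chi2:.2f} < {table_value}: H0 is not rejected")
else:
    print(f"chi-square = {chi2:.2f} >= {table_value}: H0 is rejected")
```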

Ex.71. A Bombay film director claims that his films are liked equally by males and females. An opinion survey of a random sample of 1000 film-goers revealed the following results:

           Liked   Disliked
Males       402       193
Females     245       160

Is the film director's claim supported by the data?

(Given: χ²0.05 = 3.84, 5.99, 7.81 for 1, 2, 3 d.f. respectively)
Sol. The table of results is given below:

           Liked     Disliked   Total (Row)
Males      402 (A)   193 (B)       595
Females    245 (C)   160 (D)       405
Total        647       353        1000 = Grand total

Expected frequency of cell A = (595 × 647)/1000 = 385

Table showing observed values and expected values (in brackets) is as under:

           Liked       Disliked    Row total
Males      402 (385)   193 (210)      595
Females    245 (262)   160 (143)      405
Total        647         353         1000 = Grand total
$$\chi^2 = \frac{(402-385)^2}{385} + \frac{(193-210)^2}{210} + \frac{(245-262)^2}{262} + \frac{(160-143)^2}{143}$$

$$= 0.751 + 1.376 + 1.103 + 2.021 = 5.251$$

H0: The director's claim is supported by the data.

H1: The director's claim is not supported by the data.

Degrees of freedom = (2 − 1)(2 − 1) = 1

χ²0.05 at 1 d.o.f. = 3.84

Since the calculated value is more than the tabulated value, the claim is not supported by the data, i.e., the films of the director are not liked equally by males and females.
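The text rounds each expected frequency to a whole number (385, 210, 262, 143). The sketch below, an illustration only with a helper `chi_square` of our own naming, shows how much that rounding moves the statistic: with unrounded expected frequencies χ² comes out at about 5.27 instead of 5.25, and either value exceeds 3.84.

```python
# Ex.71: effect of rounding the expected frequencies to whole numbers.
observed = [[402, 193],     # males: liked, disliked
            [245, 160]]     # females: liked, disliked


def chi_square(obs, round_expected=False):
    """Pearson chi-square; optionally round each E as done in the text."""
    rows = [sum(r) for r in obs]
    cols = [sum(c) for c in zip(*obs)]
    n = sum(rows)
    total = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            e = r * c / n
            if round_expected:
                e = round(e)
            total += (obs[i][j] - e) ** 2 / e
    return total


print(round(chi_square(observed, round_expected=True), 3))
print(round(chi_square(observed), 3))
```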

Ex.72. 1600 families were selected at random in a city to test the belief that high income families usually send their children to public schools and low income families often send their children to government schools.

The following results were obtained:

                 School
Income     Public   Government   Total
Low          494        506       1000
High         162        438        600
Total        656        944       1600

Test whether income and type of schooling are independent.

Sol. H0: Income and type of schooling are independent.

Expected frequencies are shown in brackets.

          Public      Government
Low       494 (410)   506 (590)
High      162 (246)   438 (354)

$$\chi^2 = \frac{(494-410)^2}{410} + \frac{(506-590)^2}{590} + \frac{(162-246)^2}{246} + \frac{(438-354)^2}{354}$$

$$= 17.21 + 11.96 + 28.68 + 19.93 = 77.78$$

This is much more than the tabulated value of χ² for 1 d.o.f. at all levels of significance.

Hence the null hypothesis is rejected, and we can say that high income families usually send their children to public schools.
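The "O (E)" layout used above can be generated directly. This sketch assumes the observed table of Ex.72; the row and column labels are only for printing, and the formatting choice (`:g` to drop trailing zeros) is ours.

```python
# Ex.72: observed frequencies with expected frequencies in brackets,
# in the same "O (E)" layout the text uses, followed by chi-square.
rows = ["Low", "High"]
cols = ["Public", "Government"]
observed = [[494, 506],
            [162, 438]]

row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]
grand = sum(row_tot)

chi2 = 0.0
for i, name in enumerate(rows):
    cells = []
    for j in range(len(cols)):
        e = row_tot[i] * col_tot[j] / grand
        chi2 += (observed[i][j] - e) ** 2 / e
        cells.append(f"{observed[i][j]} ({e:g})")
    print(f"{name:5}", "  ".join(cells))
print(f"chi-square = {chi2:.2f}")
```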

TRY YOURSELF

1. Memory capacity of 9 students was tested before and after training. State at the 5% level of significance whether the training was effective, from the following scores:

S. No.    1    2    3    4    5    6    7    8    9
Before   10    …    9    3    7   12   16   17    4
After    12   17    8    5    6   11   18   20    3

2. The sales data of an item in six shops before and after a special promotional campaign are as follows. Can the campaign be judged to be a success? Use the 5% level of significance.

Shop      A    B    C    D    E    F
Before   53    …   31   48   50   42
After    58    …   30   55    …   45

3. Samples of sales in similar shops in two towns are taken for a new product, with the following results. Is there any evidence of difference in sales in the two towns? Use the 5% level of significance for testing this difference between the means of the two samples.

Town   Mean sales   Variance   Size of sample
A          …           5.3           5
B         61           4.8           7

4. A personnel manager is interested in trying to determine whether absenteeism is greater on one day of the week than on another. His records for the past year show the following sample:

Day of the week    Mon   Tue   Wed   Thu   Fri
No. of absentees    66    57    54    48    75

Test whether the absenteeism is uniformly distributed.

5. The contingency table below summarizes the results obtained in a study conducted by a research organisation with respect to the performance of four competing brands of toothpaste among the users. Test whether the incidence of cavities is independent of the brand of toothpaste used.

Cavities    Colgate   Pepsodent   Close-up   Anchor
0              9         13          17        11
1-5           63         70          85        82
> 5           28         37          48        37

6. The following table gives the number of good and bad parts produced by each of three shifts in a factory. Is there any association between the shift and the quality of parts produced?

Shift      Good   Bad
Day         900   130
Evening     700   170
Night       400   200

7. An investment consultancy firm finds that 87% of 150 investors in city A prefer equity investment and 65.9% of 120 investors in city B prefer equity investment against debt investment. Test whether the two cities differ in the proportion of investors preferring equity.

8. A film director claims that his films are liked equally by males and females. An opinion survey of a random sample of 1000 film-goers revealed the following results:

           Liked   Disliked
Males       402       193
Females     245       160

9. 400 women shoppers are chosen at random in market A. Their average weekly expenditure on food is found to be Rs. 250 with a standard deviation of Rs. 40. These figures are Rs. 220 and Rs. 55 respectively in market B, where 600 women shoppers are chosen at random. Use the 1% level of significance to test whether the average weekly food expenditures of the two populations of shoppers are equal.

10. A sample of heights of 6400 soldiers has a mean of 67.45 inches and a standard deviation of 2.56 inches, while a sample of heights of 1600 sailors has a mean of 68.55 inches and an S.D. of 2.52 inches. Do the data indicate that the sailors are on average taller than the soldiers?

11. A stenographer claims that she can take dictation at the rate of 120 words per minute. Can we reject her claim on the basis of 100 trials in which she demonstrates a mean of 116 words with a standard deviation of 115 words? Use 1% l.o.s.

12. The following data give the yields on 12 plots of land in three samples, each of 4 plots, under three varieties of seeds A, B and C.

32    …   31   30
30   24   32   26
26   27   25   30

Apply the technique of analysis of variance to test whether the difference in the average yields under the three varieties is significant or not.

13. A consumer magazine was interested in determining whether any difference existed in the average life of four different brands of transistor batteries. A random sample of 4 batteries of each brand was tested with the following results (in hours):

Brand I   Brand II   Brand III   Brand IV
  12         14          12          14
  15         17          19          21
  18         12          10          19
  20          …           …          20

Is there any significant difference in the average life of the four brands at the 5% level?

14. An experimenter wanted to study the effect of 3 fertilizers on the yield of a crop. He divided the field into 12 plots and assigned each fertilizer at random to 4 plots. Part of his calculations are shown below:

Source          d.f.   SS     MS     F      F (table, 5%)
Fertilizers     ....   …      ....   ....   4.26
Within groups   ....   ....   ....
Total           ....   176

(a) Complete the above table by filling the gaps shown by .....
(b) Test at the 5% level to see whether the fertilizers differ significantly.

15. A manager of a mercantile firm wishes to test whether its three salesmen A, B, C tend to make sales of the same size or differ in their selling ability. During a week there have been 14 sales calls: A made 5 calls, B made 4 calls and C made 5 calls. Following are the sales data for the week of the three salesmen:

A: 500 400 700 gOO 600


B: 300 700 400 600
C: 500 300 500 400 300
Perform the analysis of variance and draw your conclusion.

16. Three varieties A, B, C of a crop are tested in a randomized block design with four replications. The yields are given below in pounds:

Variety Replications (Blocks) Total

1 2 3 4
A
B ,,
6 4
6
8
6
6
10
24

C 5 10 9 "
32

Test whether there are differences between varieties. Test also whether the yield of A differs significantly from that of B.

17. The following table gives the number of refrigerators sold by 4 salesmen of Kelvinator (India) Ltd. in the three months January, February and March in the year 1978:

Month A B C D

January 50 40 48 39

February 46 48 50 45

March 39 44 40 39

Is there a significant difference in the sales made by the four salesmen?

18. Four different manufacturing processes were tried at three different stations, and the average measurement of a quality characteristic of the product under each process was obtained as in the following table. Perform the analysis of variance of the data and test for the difference between the processes.

Station        Processes
            A     B     C     D
I           7    14    11    11
II         15    16     …    10
III         8     …    10    12

19. Suppose that we are interested in establishing the yield-producing ability of four types of soyabeans A, B, C and D. We have three blocks of land X, Y and Z, which may be different in fertility. Each block of land is divided into four plots, and the four types are assigned to the plots in each block by a random procedure. The following results are obtained:

Block    A    B    C    D
X        5    9   11   10
Y        4    7    8   10
Z        3    5    8    9

Test whether A, B, C and D are significantly different.

20. The following data give the number of units produced per day by 4 workers A, B, C, D using four different types of machines M1, M2, M3, M4.

M, M,
A
B
45
'0
42
32
'"
38
34
C 43 36 40
D 36 38 36

(a) Test whether the mean productions of the four different types of machines are equal.
(b) Test also whether the four workers differ with respect to mean productivity.

21. Set up an ANOVA table for the following per hectare yields for three varieties of wheat, each grown on four plots:

Per hectare yield (in '00 kg)

Plot of land   Variety of wheat
               A1   A2   A3
1               6    5    5
2               7    5    4
3               3    3    3
4               8    7    4

Also work out the F-ratio.

22. Perform a two-way ANOVA on the data given below:

                     Treatment 1
                   (i)   (ii)   (iii)
Treatment 2 (i)     30    26     38
            (ii)    24    29     28
            (iii)   33    24     35
            (iv)    36    31     30
            (v)     27    35     33

(Use the coding method, subtracting 30 from the given numbers.)

23. A manufacturer of ball point pens claims that a certain pen he manufactures has a mean writing life of 400 pages with a standard deviation of 20 pages. A purchasing agent selects a sample of 100 pens and puts them to test. The mean writing life for the sample was 390 pages. Should the purchasing agent reject the manufacturer's claim at 5% l.o.s.?

24. A manufacturer claimed that at least 95% of the equipment which he supplied to a factory conformed to specifications. An examination of a sample of 200 pieces of equipment revealed that 18% were faulty. Test his claim at a significance level of 5%.

25. In an infantile paralysis epidemic, 500 persons contracted the disease. 300 received no serum treatment, and of them 75 became paralysed. Of those who received serum treatment, 65 became paralysed. Was serum treatment effective?

26. Out of 800 persons, 25% were literate and 300 had travelled beyond the limits of their district. 40% of the literates were among those who had not travelled. Is there any relation between travelling and literacy?

27. In a survey of 200 boys, of which 75 were intelligent, 40 had skilled fathers, while 85 of the unintelligent boys had unskilled fathers. Do these figures support the hypothesis that skilled fathers have intelligent boys?

Table 1: Area Under the Normal Curve

An entry in the table is the proportion under the entire curve which is between z = 0 and a positive value of z. Areas for negative values of z are obtained by symmetry.
Table 2: Critical Values of Student's t-Distribution (rows: d.f. from 1 to infinity; columns: levels of significance for two-tailed and one-tailed tests)
Table 3: Critical Values of χ² (probability under H0 that χ² ≥ chi square)

Note: For degrees of freedom larger than 30, the quantity √(2χ²) − √(2 d.f. − 1) may be used as a normal variate with unit variance.
Table 4(a): Critical Values of F-Distribution (at 5 per cent)

v1 = degrees of freedom for greater variance; v2 = degrees of freedom for smaller variance.
Table 4(b): Critical Values of F-Distribution (at 1 per cent)

v1 = degrees of freedom for greater variance; v2 = degrees of freedom for smaller variance.
Table 5: Values for Spearman's Rank Correlation (rs) for Combined Areas in Both Tails
Table 6: values tabulated as [W1 − Min W1] or [Max W1 − W1]
Table 7: Critical Values of T in the Wilcoxon Matched-Pairs Test (by level of significance for one-tailed and two-tailed tests)
Table 8: Cumulative Binomial Probabilities: P(r ≤ c | n, p)
Table 9: Selected Critical Values of S in the Kendall's Coefficient of Concordance (values at the 5% and 1% levels of significance, with some additional values for N = 3)
Table 10: Table Showing Critical Values of A-Statistic for any Given Value of n − 1, Corresponding to Various Levels of Probability

(A is significant at a given level if it is ≤ the value shown in the table.)

(Source: …, Volume XLVI, 1955, p. 226.)

COURSE CONTEI'TS OF QlJAI'T1TATlVE TECIINIQUES


(100 MARKS)

Module I Application of Central tendency


Application of central tendency and (ental dispersion, Co-effcient of correlation,
coefficient of determination and non-determination, calculation of standard error
of estimate.

Module 11 Forecastingtechniques
Multiple correlation and Multiple regression. Time series anaiysis

Module III - Parametric Tests

Theory of estimation: point and interval estimation; testing of hypotheses; large and small sample tests; parametric tests: t-test, chi-square test, ANOVA; probability distributions: binomial, Poisson and normal distributions.

Module IV - Non-parametric Tests
Sign test, rank correlation, chi-square test, run test.
Course material prepared by

Ms. Swati Subhash Desai
M.Sc. (Applied Statistics)
Associate Professor,
Department of Mathematics, Statistics and Computers,
Prahladrai Dalmia Lions College of Commerce and Economics,
Sundernagar, S.V. Road, Malad (W).

ABOUT THE AUTHOR

Ms. Swati Subhash Desai is an Associate Professor at Prahladrai Dalmia Lions College of Commerce and Economics, Malad (W), Mumbai-64. She did her postgraduation (M.Sc.) in Applied Statistics at Poona University. Her areas of interest include, among others, Operations Research, Quantitative Methods and Research Methodology.

She has over … years of teaching experience. She has also been teaching at the postgraduate level since … at various management institutes of repute, such as Prin. L.N. Welingkar Institute of Management and Research, Nanasaheb Gawde Institute of Management, Dr. Bedekar Institute of Management, Padmashree Vasantdada Patil Institute of Management, K.V.A., etc.

She is also actively associated with the Institute of Distance Education, University of Mumbai, in her capacity as a course writer for Mathematical and Statistical Techniques at the F.Y.B.Com. level and Integrated Approaches to Operations Research at PGDORM, and also as a faculty member for these courses.

She is also the author of Operations Research, which has been designed as a textbook for T.Y.B.M.S. Sem VI, University of Mumbai.

She is also serving as a Guest Faculty at J.J.T. University and at the Regional Training Institute, Mumbai, Indian Audit and Accounts Department, Government of India.

She has participated in various National Level Seminars/Symposia and presented a number of technical papers at such seminars. Quite a few of her papers have been published in prestigious journals, e.g. "Role of Quantitative Techniques in Industry" in ENTIRE RESEARCH (ISSN 0975-5020) and "Applications of Linear Programming Problems and Non-Linear Programming Problems" in VARIORUM Multi-Disciplinary e-Research Journal (ISSN 0976-9714).

She was nominated for the Masters training program on life skills, citizenship and civics organized by RGNIYD to represent Maharashtra.

Rs. 250/-
