You are on page 1of 335

Pradip Kumar Sahu · Santi Ranjan Pal

Ajit Kumar Das

Estimation
and
Inferential
Statistics
Estimation and Inferential Statistics
Pradip Kumar Sahu Santi Ranjan Pal

Ajit Kumar Das

Estimation and Inferential


Statistics

123
Pradip Kumar Sahu Ajit Kumar Das
Department of Agricultural Statistics Department of Agricultural Statistics
Bidhan Chandra Krishi Viswavidyalaya Bidhan Chandra Krishi Viswavidyalaya
Mohanpur, Nadia, West Bengal Mohanpur, Nadia, West Bengal
India India

Santi Ranjan Pal


Department of Agricultural Statistics
Bidhan Chandra Krishi Viswavidyalaya
Mohanpur, Nadia, West Bengal
India

ISBN 978-81-322-2513-3 ISBN 978-81-322-2514-0 (eBook)


DOI 10.1007/978-81-322-2514-0

Library of Congress Control Number: 2015942750

Springer New Delhi Heidelberg New York Dordrecht London


© Springer India 2015
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made.

Printed on acid-free paper

Springer (India) Pvt. Ltd. is part of Springer Science+Business Media (www.springer.com)


Preface

Nowadays one can hardly find any field where statistics is not used. With a given
sample, one can infer about the population. The role of estimation and inferential
statistics remains pivotal in the study of statistics. Statistical inference is concerned
with problems of estimation of population parameters and test of hypotheses. In
statistical inference, drawing a conclusion about the population takes place on the
basis of a portion of the population. This book is written, keeping in mind the need
of the users, present availability of literature to cater to these needs, their merits and
demerits under a constantly changing scenario. Theories are followed by relevant
worked-out examples which help the user grasp not only the theory but also
practice them.
This work is a result of the experience of the authors in teaching and research
work for more than 20 years. The wider scope and coverage of the book will help
not only the students, researchers and professionals in the field of statistics but also
several others in various allied disciplines. All efforts are made to present the
“estimation and statistical inference”, its meaning, intention and usefulness. This
book reflects current methodological techniques used in interdisciplinary research,
as illustrated with many relevant research examples. Statistical tools have been
presented in such a manner, with the help of real-life examples, that the fear factor
about the otherwise complicated subject of statistics will vanish. In its seven
chapters, theories followed by examples will make the readers to find most suitable
applications.
Starting from the meaning of the statistical inference, its development, different
parts and types have been discussed eloquently. How someone can use statistical
inference in everyday life has remained the main point of discussion in examples.
How someone can draw conclusions about the population under varied situations,
even without studying each and every unit of the population, has been discussed
taking numerous examples. All sorts of inferential problems have been discussed, at
one place supported by examples, to help the students not only in meeting their
examination need and research requirement, but also in daily life. One can hardly
get such a compilation of statistical inference in one place. The step-by-step

v
vi Preface

procedure will immensely help not only the graduate and Ph.D. students but also
other researchers and professionals. Graduate and postgraduate students,
researchers and the professionals in various fields will be the user of the book.
Researchers in medical and social and other disciplines will be greatly benefitted
from the book. The book would also help students in various competitive
examinations.
Written in a lucid language, the book will be useful to graduate, postgraduate
and research students and practitioners in diverse fields including medical, social
and other sciences. This book will also cater the need for preparation in different
competitive examinations. One can find hardly a single book, in which all topics
related to estimation and inference are included. Numerous relevant examples for
related theories are added features of this book. An introduction chapter and an
annexure are special features of this book which will help readers in getting basic
ideas and plugging the loopholes of the readers. Chapter-wise summary of the
content of the proposed book is presented below.

Estimation and Inferential Statistics


• Chapter 1: The chapter relates to introduction to the theory of point estimation
and inferential statistics. Different criteria for a good estimator are discussed.
The chapters also present real-life worked-out problems that help the reader
understand the subject. Compared to partial coverage of this topic in most books
on statistical inference, this book aims at elaborate coverage about the subject of
point estimation.
• Chapter 2: This chapter deals with different methods of estimation like least
square method, method of moments, method of minimum χ2 and method of
maximum likelihood estimation. Not all these methods are equally good and
applicable in all situations. Merits, demerits and applicability of these methods
have been discussed in one place, which otherwise have remained mostly dis-
persed or scattered in the competing literature.
• Chapter 3: Testing of hypotheses has been discussed in this chapter. This
chapter is characterized by typical examples in different forms and spheres
including Type A1 testing, which is mostly overlooked in many of the available
literature. This has been done in this book.
• Chapter 4: The essence and technique of likelihood ratio test has been discussed
in this chapter. Irrespective of the nature of tests for hypotheses (simple and
composite), this chapter emphasizes how easily the test could be performed,
supported by a good number of examples. Merits and drawbacks have also been
discussed. Some typical examples are discussed in this chapter that one can
hardly find in any other competing literature.
Preface vii

• Chapter 5: This chapter deals with interval estimation, techniques of interval


estimation under different situations, problems and prospects of different
approaches of interval estimation has been discussed with numerous examples
in one place.
• Chapter 6: This chapter deals with non-parametric methods of testing
hypotheses. All types of non-parametric tests have been put together and dis-
cussed in detail. In each case, suitable examples are the special feature of this
chapter.
• Chapter 7: This chapter is devoted to the discussion of decision theory. This
discussion is particularly useful to students and researchers interested in infer-
ential statistics. In this chapter, attempt has been made to present the decision
theory in an exhaustive manner, keeping in mind the requirement and the
purpose of the reader for whom the book is aimed at. Bayes and mini-max
method of estimation have been discussed in the Annexure. Most of the
available literature on inferential statistics lack due attention on these important
aspects of inference. In this chapter, the importance and utilities of the above
methods have been discussed in detail, supported with relevant examples.
• Annexure: The authors feel that the Annexure portion would be an asset to
varied types of readers of this book. Related topics, proofs, examples, etc.,
which could not be provided in the text itself, during the discussion of various
chapter for the sake of maintenance of continuity and flow are provided in this
section. Besides many useful proofs and derivations, this section includes
transformation of statistics, large sample theories, exact tests related to binomial,
Poisson population, etc. This added section will be of much help to the readers.
In each chapter, theories are followed by examples from applied fields, which
will help the readers of this book to understand the theories and applications of
specific tools. Attempts have been made to familiarize the problems with examples
on each topic in a lucid manner. During the preparation of this book, a good number
of books and articles from different national and international journals have been
consulted. Efforts have been made to acknowledge and provide these in the bib-
liography section. An inquisitive reader may find more material from the literature
cited.
The primary purpose of the book is to help students of statistics and allied fields.
Sincere efforts have been made to present the material in the simplest and
easy-to-understand form. Encouragements, suggestions and help received from our
colleagues at the Department of Agricultural Statistics, Bidhan Chandra Krishi
Viswavidyalaya are sincerely acknowledged. Their valuable suggestions towards
improvement of the content helped a lot and are sincerely acknowledged. The
authors thankfully acknowledge the constructive suggestions received from the
reviewers towards the improvement of the book. Thanks are also due to Springer
viii Preface

for the publication of this book and for continuous monitoring, help and suggestion
during this book project. The authors acknowledge the help, cooperation, encour-
agement received from various corners, which are not mentioned here. The effort
will be successful, if this book is well accepted by the students, teachers,
researchers and other users to whom this book is aimed at. Every effort has been
made to avoid errors. Constructive suggestions from the readers in improving the
quality of this book will be highly appreciated.

Mohanpur, Nadia, India Pradip Kumar Sahu


Santi Ranjan Pal
Ajit Kumar Das
Contents

1 Theory of Point Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Sufficient Statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Unbiased Estimator and Minimum-Variance
Unbiased Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4 Consistent Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.5 Efficient Estimator. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

2 Methods of Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.2 Method of Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.3 Method of Maximum Likelihood . . . . . . . . . . . . . . . . . . . . . . . 48
2.4 Method of Minimum v2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.5 Method of Least Square . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

3 Theory of Testing of Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . . 63


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.2 Definitions and Some Examples . . . . . . . . . . . . . . . . . . . . . . . . 63
3.3 Method of Obtaining BCR . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.4 Locally MPU Test. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.5 Type A1 (Uniformly Most Powerful Unbiased) Test . . . . . . . . . 97

4 Likelihood Ratio Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.1.1 Some Selected Examples . . . . . . . . . . . . . . . . . . . . . . . . 104

5 Interval Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
5.2 Confidence Interval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

ix
x Contents

5.3 Construction of Confidence Interval. . . . . . . . . . . . . . . . . . . . . . 132


5.4 Shortest Length Confidence Interval and Neyman’s Criterion . . . . 138

6 Non-parametric Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
6.2 One-Sample Non-parametric Tests. . . . . . . . . . . . . . . . . . . . . . . 146
6.2.1 Chi-Square Test (i.e Test for Goodness of Fit) . . . . . . . . . 146
6.2.2 Kolmogrov–Smirnov Test . . . . . . . . . . . . . . . . . . . . . . . 147
6.2.3 Sign Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
6.2.4 Wilcoxon Signed-Rank Test . . . . . . . . . . . . . . . . . . . . . . 151
6.2.5 Run Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
6.3 Paired Sample Non-parametric Test . . . . . . . . . . . . . . . . . . . . . . 156
6.3.1 Sign Test (Bivariate Single Sample Problem)
or Paired Sample Sign Test . . . . . . . . . . . . . . . . . . . . . . 156
6.3.2 Wilcoxon Signed-Rank Test . . . . . . . . . . . . . . . . . . . . . . 157
6.4 Two-Sample Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
6.5 Non-parametric Tolerance Limits . . . . . . . . . . . . . . . . . . . . . . . 168
6.6 Non-parametric Confidence Interval for nP . . . . . . . . . . . . . . . . . 170
6.7 Combination of Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
6.8 Measures of Association for Bivariate Samples . . . . . . . . . . . . . . 174

7 Statistical Decision Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181


7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
7.2 Complete and Minimal Complete Class of Decision Rules . . . . . . 189
7.3 Optimal Decision Rule. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
7.4 Method of Finding a Bayes Rule . . . . . . . . . . . . . . . . . . . . . . . 199
7.5 Methods for Finding Minimax Rule. . . . . . . . . . . . . . . . . . . . . . 208
7.6 Minimax Rule: Some Theoretical Aspects . . . . . . . . . . . . . . . . . 226
7.7 Invariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237

References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
About the Authors

P.K. Sahu is associate professor and head of the Department of Agricultural


Statistics, Bidhan Chnadra Krishi Viswavidyalaya (a state agriculture university),
West Bengal. With over 20 years of teaching experience, Dr. Sahu has published
over 70 research papers in several international journals of repute and has guided
several postgraduate students and research scholars. He has authored four books:
Agriculture and Applied Statistics, Vol. 1, and Agriculture and Applied Statistics,
Vol. 2 (both published with Kalyani Publishers), Gender, Education,
Empowerment: Stands of Women (published with Agrotech Publishing House) and
Research Methodology: A Guide for Researchers In Agricultural Science, Social
Science and Other Related Fields (published by Springer) as well as contributed a
chapter to the book Modelling, Forecasting, Artificial Neural Network and Expert
System in Fisheries and Aquaculture, edited by Ajit Kumar Roy and Niranjan
Sarangi (Daya Publishing House). Dr. Sahu has presented his research papers in
several international conferences. He also visited the USA, Bangladesh, Sri Lanka,
and Vietnam to attend international conferences.
S.R. Pal is former eminent professor at the Department of Agricultural Statistics at
R.K. Mission Residential College and Bidhan Chandra Krishi Viswavidyalaya (a
state agriculture university). An expert in agricultural statistics, Prof. Pal has over
35 years of teaching experience and has guided several postgraduate students and
research scholars. He has several research papers published in statistics and related
fields in several international journals of repute. With his vast experience in
teaching, research and industrial advisory role, Prof. Pal has tried to incorporate the
problems faced by the users, students, and researchers in this field.
A.K. Das is professor at the Department of Agricultural Statistics, Bidhan Chandra
Krishi Viswavidyalaya (a state agriculture university). With over 30 years of
teaching experience, Prof. Das has a number of good research articles to his credit
published in several international journals of repute and has guided several

xi
xii About the Authors

postgraduate students and research scholars. He has coauthored a book, Agriculture


and Applied Statistics, Vol. 2 (both published with Kalyani Publishers), and con-
tributed a chapter to the book, Modelling, Forecasting, Artificial Neural Network
and Expert System in Fisheries and Aquaculture, edited by Ajit Kumar Roy and
Niranjan Sarangi (Daya Publishing House).
Introduction

In a statistical investigation, it is known that for reasons of time or cost, one may
not be able to study each individual element of the population. In such a situation, a
random sample should be taken from the population, and the inference can be
drawn about the population on the basis of the sample. Hence, statistics deals with
the collection of data and their analysis and interpretation. In this book, the problem
of data collection is not considered. We shall take the data as given, and we study
what they have to tell us. The main objective is to draw a conclusion about the
unknown population characteristics on the basis of information on the same char-
acteristics of a suitably selected sample. The observations are now postulated to be
the values taken by random variables. Let X be a random variable which describes
the population under investigation and F be the distribution function of X. There are
two possibilities. Either X has a distribution function of Fh with a known functional
form (except perhaps for the parameter h, which may be vector), or X has a
distribution function F about which we know nothing (except perhaps that F is, say,
absolutely continuous). In the former case, let H be the set of possible values of
unknown parameter h, then the job of statistician is to decide on the basis of
suitably selected samples, which member or members of the family fFh ; h 2 Hg
can represent the distribution function of X. These types of problems are called
problems of parametric statistical inference. The two principal areas of statistical
inference are the “area of estimation of parameters” and the “tests of statistical
hypotheses”. The problem of estimation of parameters involves both point and
interval estimation. Diagrammatically, let us show components and constituents of
statistical inference as in chart.

xiii
xiv Introduction

Problem of Point Estimation

The problem of point estimation relates to the estimating formula of a parameter


based on random sample of size n from the population. The method basically
comprises of finding out an estimating formula of a parameter, which is called the
estimator of the parameter. The numerical value, which is obtained on the basis of a
sample while using the estimating formula, is called estimate. Suppose, for an
example, that a random variable X is known to have a normal distribution Nðl; r2 Þ;
but we do not know one of the parameters, say l. Suppose further that a sample
X1 ; X2 ; . . .; Xn is taken on X. The problem of point estimation is to pick a statistic
T ðX1 ; X2 ; . . .; Xn Þ that best estimates the parameter l. The numerical value of T
when the realization is x1 ; x2 ; . . .; xn is called an estimate of l, while the statistic T is
called an estimator of l. If both l and r2 are unknown, we seek a joint statistic
T ¼ ðU; V Þ as an estimate of ðl; r2 Þ.
Example Let X1 ; X2 ; . . .; Xn be a random sample from any distribution Fh for which
the mean exists and is equal to h. We may want to estimate the mean h of distri-
bution. For this purpose, we may compute the mean of the observations
x1 ; x2 ; . . .; xn , i.e., say

1X n
x ¼ xi :
n i¼1

This x can be taken as the point estimate of h.


Example Let X1 ; X2 ; . . .; Xn be a random sample from Poisson’s distribution with
parameter k, i.e., PðkÞ, where k is not known. Then the mean of the observations
x1 ; x2 ; . . .; xn , i.e.,

1X n
x ¼ xi
n i¼1

is a point estimate of k.
Introduction xv

Example Let X1 ; X2 ; . . .; Xn be a random sample from a normal distribution with


parameters l and r2 , i.e., N ðl; r2 Þ, where both l and r2 are unknown. l and r2 are
the mean and variance respectively of the normal distribution. In this case, we may
take a joint statistics ðx; s2 Þ as a point estimate of N ðl; r2 Þ, where

1X n
x ¼ xi ¼ sample mean
n i¼1

and

1 X n
s2 ¼ ðx1  xÞ2 ¼ sample mean square.
n  1 i¼1

Problem of Interval Estimation

In many cases, instead of point estimation, we are interested in constructing of a


family of sets that contain the true (unknown) parameter value with a specified
(high) probability, say 100ð1  aÞ%. This set is taken to be an interval, which is
known as confidence interval with a confidence coefficient ð1  aÞ and the tech-
nique of constructing such intervals is known as interval estimation.
Let X1 ; X2 ; . . .; Xn be a random sample from any distribution Fh . Let hð xÞ and
hðxÞ be functions of x1 ; x2 ; . . .; xn . If P½hð xÞ\h\hð xÞ ¼ 1  a, then ðhð xÞ; hð xÞÞ is

called a 100ð1  aÞ% confidence interval for h, whereas hð xÞ and hð xÞ are,
respectively, called lower and upper limits for h.
Example Let X1 ; X2 ; . . .; Xn be random sample from N ðl; r2 Þ, whereas both l
and r2 are unknown. We can find 100ð1  aÞ% confidence interval of l. To esti-
mate the population mean l and population variance r2 , we may take the observed
sample mean

1X n
x ¼ xi
n i¼1

and the observed sample mean square

1 X n
s2 ¼ ðxi  xÞ2
n  1 i¼1
xvi Introduction

respectively. 100ð1  aÞ% confidence interval of l is given by


s
x  ta2;n1 pffiffiffi
n
a
where ta2;n1 is the upper 2 point of the t-distribution with ðn  1Þ d.f.

Problem of Testing of Hypothesis

Besides point estimation and interval estimation, we are often required to decide
which value among a set of values of a parameter is true for a given population
distribution, or we may be interested in finding out the relevant distribution to
describe a population. The procedure by which a decision is taken regarding the
plausible value of a parameter or the nature of a distribution is known as the testing
of hypotheses. Some examples of hypothesis, which can be subjected to statistical
tests, are as follows:
1. The average length of life l of electric light bulbs of a certain brand is equal to
some specified value l0 .
2. The average number of bacteria killed by tests drops of germicide is equal to
some number.
3. Steel made by method A has a mean hardness greater than steel made by
method B.
4. Penicillin is more effective than streptomycin in the treatment of disease X.
5. The growing period of one hybrid of corn is more variable than the growing
period for other hybrids.
6. The manufacturer claims that the tires made by a new process have mean life
greater than the life of a tire manufactured by an earlier process.
7. Several varieties of wheat are equally important in terms of yields.
8. Several brands of batteries have different lifetimes.
9. The characters in the population are uncorrelated.
10. The proportion of non-defective items produced by machine A is greater than
that of machine B.
The examples given are simple in nature, and are well established and have
well-accepted decision rules.

Problems of Non-parametric Estimation

So far we have assumed in statistical inference (parametric) that the distribution of the
random variable being sampled is known except for some parameters. In practice, the
functional form of the distribution is unknown. Here, we are not concerned to the
Introduction xvii

techniques of estimating the parameters directly, but with certain pertinent hypothesis
relating to the properties of the population, such as equalities of distribution, tests of
randomness of the sample without making any assumption about the nature of the
distribution function. Statistical inference under such a setup is called non-parametric.

Bayes Estimator
In case of parametric inference, we consider density function f ðx=hÞ, where h is a
fixed unknown quantity which can take any value in parametric space H. In
Bayesian approach, it is assumed that h itself is a random variable and density f ðx=hÞ
is the density of x for a given h. For example, suppose we are interested in estimating
P, the fraction of defective items in a consignment. Consider a collection of lots,
called superlots. It may happen that the parameter P may differ from lot to lot. In the
classical approach, we consider P as a fixed unknown parameter, whereas in
Bayesian approach, we say that P varies from lot to lot. It is random variable having
a density f ðPÞ, say. Bayes method tries to use this additional information about P.
Example Let X1 ; X2 ; . . . Xn be a random sample from PDF

1
f ðx; a; bÞ ¼ xa1 ð1  xÞb1 ; 0\x\1; a; b [ 0:
bða; bÞ

Find the estimators of a and b by the method of moments.


Answer
We know

a   a ð a þ 1Þ
E ð xÞ ¼ l11 ¼ and E x2 ¼ l12 ¼
aþb ða þ bÞða þ b þ 1Þ

Hence

a aða þ 1Þ 1Xn
¼ x; ¼ x2
aþb ða þ bÞða þ b þ 1Þ n i¼1 i

Solving, we get
P 2 
b ð x  1Þ xi  nx xb
b
b¼ P and b

ð xi  xÞ2 1x

Example Let X1 ; X2 ; . . . Xn be a random sample from PDF

1 x=h rj1
f ðx; h; r Þ ¼ pffiffi e x ; x [ 0; h [ 0; r [ 0
hr r
xviii Introduction

Find estimator of θ and r by


(i) Method of moments
(ii) Method of maximum likelihood
Answer
Here

  1X n
Eð xÞ ¼ l11 ¼ rh ; E x2 ¼ l12 ¼ r ðr þ 1Þh2 and m11 ¼ x; m12 ¼ x2
n i¼1 i

Hence
1X n
rh ¼ x; r ðr þ 1Þh2 ¼ x2
n i¼1 i

Solving, we get
P
n
2 ðxi  xÞ2
nx b i¼1
br ¼ n and h¼
P 2 nx
ð xi  xÞ
i¼1

P
n
1h xi Q
n
1pffi
(i) L ¼ hnr ð rÞ
n e i¼1 xir1
i¼1

pffiffiffi P
n P
n
(ii) log L ¼ nr log h  n log n  1h xi þ ðr  1Þ wgxi
i i¼1

Now,
@ log L nr nx x
¼ þ 2¼0)b

@h h h r

Or
pffiffi
@ log L @ log r X n
¼ n log h  n þ log xi
@r @r i¼1
sð r Þ X
n
¼ n log r  n pffiffi  n log x þ log xi
r i

It is, however, difficult to solve the equation

@ log L
¼0
@r
Introduction xix

and to get the estimate of r. Thus, for this example, estimators of θ and r are more
easily obtained by the method of moments than the method of maximum likelihood.
Example Find the estimators of α and β by the method of moments.
Proof We know

aþb ðb  aÞ2
E ð xÞ ¼ l11 ¼ and Vð xÞ ¼ l2
2 12

Hence

aþb ð b  aÞ 2 1 X n
¼x and ¼ ðxi  xÞ2
2 12 n i¼1

Solving, we get
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
P P
3 ðxi  xÞ2 b 3 ð xi  x Þ 2
b
a ¼x and b ¼ xþ
n n

Example If a sample of size one be drawn from the PDF f ðx; bÞ ¼


2
b2
ðb  xÞ; 0\x\b find b
b, the MLE of β and b the estimator of β based on
method of moments. Show that b b is biased but b is unbiased. Show that the
efficiency of b
b with respect to b is 2/3.
Solution
Here suppose

2
L¼ ð b  xÞ
b2

Then
LogL ¼ Log2  2 log b þ logðb  xÞ

Or
@ log L 2 1
¼ 2 þ ¼ 0 ) b ¼ 2x
@b b bx

Now,
Zb
2   b
E ð xÞ ¼ bx  x2 dx ¼
b 3
0
xx Introduction

Hence
b
¼ x ) b ¼ 3x
3

Thus the estimator of β based on method of moments is given as b ¼ 3x. Now,


  b 2b
E bb ¼2 ¼ 6¼ b
3 3
b
E ðb  Þ ¼ 3  ¼ b
3

Hence b
b is biased but b is unbiased.
Again

Zb
  2   b2
E x2 ¼ 2 bx2  x3 dx ¼
b 6
0

Therefore,

b2 b2 b2
Vð xÞ ¼  ¼
6 9 18

Solving, we get

b2
Vðb Þ ¼ 9V ð xÞ ¼
9
  2
V b
b ¼ 4V ð xÞ ¼ b2
9

Hence
    h   i2
M b
b ¼V b b þ E bb b
 2
2 2 2
¼ b þ bb
9 3
1
¼ b2
3

Thus the efficiency of b


b with respect to b is 2/3.
Example Let ðx1 ; x2 ; . . . xn Þ be a given sample of size n. It is to be tested whether
the sample comes from some Poisson distribution with unknown mean μ. How do
you estimate μ by the method of modified minimum chi-square?
Introduction xxi

Solution
Let x1 ; x2 ; . . . xn be arranged in K groups such that there are ni observations with
x ¼ i; i ¼ r þ 1; . . .; r þ k  2; nL observations x  r, and nu observations with
x r þ k  1; so that the smallest and the largest values of x that are fewer are
pooled together and

X
rþ k2
nL þ ni þ nu ¼ n
i¼r þ 1

Let

eu li
pi ðlÞ ¼ Pðx ¼ iÞ ¼
i!
Xn
pL ðlÞ ¼ Pðx  rÞ ¼ pi ðlÞ
i¼0
X
1
pu ðlÞ ¼ Pðx r þ k  1Þ ¼ pi ðlÞ
i¼r þ k1

Now, by using

Xk
ni @pi ðhÞ
¼ 0 j ¼ 1; 2; . . .:p
p ðhÞ @hj
i¼1 i

We have
r 
P  P
1  
i
l  1 pi ðlÞ X
rþ k2   i
l  1 pi ðlÞ
i i¼r þ k1
nL i¼0 P þ ni  1 þ nu P
1 ¼0
r
l
pi ðlÞ i¼r þ 1 pi ðlÞ
i¼0 i¼r þ k1

Since there is only one parameter, i.e., p ¼ 1 we get the only above equation.
Solving,we get

P
r P
1
ipi ðlÞ X
rþ k2 ipi ðlÞ
i¼0 i¼r þ k1

n^ nL P r þ ini þ nu P 1
pi ðlÞ i¼r þ 1 pi ðlÞ
i¼0 i¼r þ k1

¼ sum of all x0 s

^ is approximately the sample mean x


Hence l
xxii Introduction

Example In general, we consider n uncorrelated observations y1 ; y2 ; . . .yn such that


Eðyi Þ ¼ b1 x1i þ b2 x2i þ . . .. . .. . .. . . þ bk xki and V(yi ) = r2 ; i ¼ 1; 2; . . .. . .; n; x1i ¼
18i; where b1 ; b2 . . .. . .. . .. . .bk and r2 are unknown parameters. If Y and b stand
for column vectors of the variables yi and parameters bj and if X ¼ ðxji Þ be an
ðn  kÞ matrix of known coefficients xji the above equation can be written as

EðYÞ ¼ Xb and V(e) = Eðee0 Þ ¼ r2 I

Where e ¼ Y  Xb is an ðn  1Þ vector of error random variable with EðeÞ ¼ 0 and


I is an ðn  nÞ identity matrix. The least square method requires that b0 s be calculated
such that / ¼ ee0 ¼ ðY  Xb Þ0 ðY  Xb Þbe the minimum. This is satisfied when

@/
¼ 0 on 2X 0 ðY  Xb Þ ¼ 0
@b

^ ¼ ðX 0 XÞ1 X 0 Y.
The least square estimators b0 s is thus given by the vector b
Example Let yi ¼ b1 x1i þ b2 x2i þ . . .. . .. . .. . . þ bk xki ; i ¼ 1; 2; . . .. . .; n or
Eðyi Þ ¼ b1 x1i þ b2 x2i ; x1i ¼ 1 for all i. Find the least square estimates of b1 and b2 .
Prove that the method of maximum likelihood and the method of least square are
identical for the case of normal distribution.
Solution
In matrix notation, we have
0 1 01
1 x21 y1
B1  
B x22 C
C  b1
B y2 C
B C
EðY) = Xb where X ¼ B .. .. C; b ¼ and Y ¼B . C
@. . A b2 @ .. A
1 x2n yn

Now,

^ ¼ ðX 0 XÞ1 X 0 Y
b

Here
0 1
1 x21
 B 1 x22 C  P 
1 1 ... 1 B C n x2i
0
XX¼ B. C
.. C ¼ P P
x21 x22 ... x2n B
@ .. . A x2i x22i
1 x2n
 P 
0 yi
XY¼ P
x2i yi
Introduction xxiii

Then
 P 2 P  P 
^ ¼ P 1 x2i  x2i yi
b P P P
n x2i  ð x2i Þ2
2  x2i n x2i yi
P 2 P P P 
1 x2i yi  x2i x2i yi
¼ P 2 P P P P P
n x2i  ð x2i Þ2  x2i yi þ n x2i yi

Hence
P P P P
^ ¼n x2i yi  x2i yi
b 2 P P
n x22i  ð x2i Þ2
P P
x2i yi  nx2y
¼ P 2
x2i  nx22
P
ðx2i  x2 Þðyi  yÞ
¼ P
ðx2i  x2 Þ2

and
P P PP
^ ¼ x22i yi x2i x2i yi
b 1 P P
n x2  ð x2i Þ2
P 2 2i P
y x2i  x2 x2i yi
¼ P 2
x2i  nx2
P
ynx22  x2 x2i yi
¼ y þ P
x22i  nx22
^
¼ y  x2 b 2

Let yi be an independent Nðb1 þ b2 xi ; r2 Þ variate, i ¼ 1; 2; . . .. . .; n so that


Eðyi Þ ¼ b1 þ b2 xi: The estimators of b1 and b2 are obtained by the method of least
square on minimizing

X
n
/¼ ðyi  b1  b2 xi Þ2
i¼1

The likelihood estimate is


 n P
1 2
e2r2 ðyi b1 b2 xi Þ
1
L¼ pffiffiffiffiffiffi
2pr
xxiv Introduction

P
n
L is maximum when ðyi  b1  b2 xi Þ2 is minimum. By the method of
i¼1 Pn
maximum likelihood we choose b1 and b2 such that ðyi  b1  b2 xi Þ2 ¼ / is
i¼1
minimum. Hence both the methods of least square and maximum likelihood esti-
mator are identical.
Chapter 1
Theory of Point Estimation

1.1 Introduction

In carrying out any statistical investigation, we start with a suitable probability


model for the phenomenon that we seek to describe (The choice of the model is
dictated partly by the nature of the phenomenon and partly by the way data on the
phenomenon are collected. Mathematical simplicity is also a point that is given
some consideration in choosing the model). In general, model takes the form of
specification of the joint distribution function of some random variables
X 1 ; X 2 ; . . . X n (all or some of which may as well be multidimensional). According
to the model, the distribution function F is supposed to be some (unspecified)
member of a more or less general class F of distribution functions.
Example 1.1 In many situations, we start by assuming that X 1 ; X 2 ; . . . X n are iid
(independently and identically distributed) unidimensional r.v’s (random variables)
with a common but unspecified distribution function, F1, say. In other words, the
model states that F is some member of the class of all distribution functions of the
form
Y
n
Fðx1 ; x2 ; . . .; xn Þ ¼ F1 ðxi Þ:
i¼1

Example 1.2 In traditional statistical practice, it is frequently assumed that


X 1 ; X 2 . . . X n have each the normal distribution (but its mean and/or variance being
left unspecified), besides making the assumption that they are iid r.v’s.
In carrying out the statistical investigation, we then take as our goal, the task of
specifying F more completely than is done by the model. This task is achieved by
taking a set of observations on the r.v’s X 1 ; X 2 ; . . . ; X n . These observations are the
raw material of the investigation and we may denote them, respectively, by
x1 ; x2 ; . . . ; xn . These are used to make a guess about the distribution function F,
which is partly unknown.

© Springer India 2015 1


P.K. Sahu et al., Estimation and Inferential Statistics,
DOI 10.1007/978-81-322-2514-0_1
2 1 Theory of Point Estimation

The process is called Statistical Inference, being similar to the process of inductive
inference as envisaged in classical logic. For here too the problem is to know the
general nature of the phenomenon under study (as represented by the distribution of
the r.v’s) on the basis of the particular set of observations. The only difference that in a
statistical investigation induction is achieved within a probabilistic framework.
Probabilistic considerations enter into the picture in three ways. Firstly, the model used
to represent the field of study is probabilistic. Second, certain probabilistic principles
provide the guidelines in making the inference. Third, as we shall see in the sequel, the
reliability of the conclusions also is judged in probabilistic terms.
Random Sampling
Consider a statistical experiment that culminate in outcomes x which are the values
assumed by a r.v. X. Let F be the distribution function of X. One can also obtain n
independent observations on X. This means that the n values observed as
x1 ; x2 ; . . . ; xn are assumed by the r.v. X [This can be obtained by replicating the
experiment under (more or less) identical conditions]. Again each xi may be
regarded as the value assumed by a r.v. Xi, i = 1 (1)n, where X 1 ; X 2 ; . . . X n are
independent random variables with common distribution function F. The set
X 1 ; X 2 ; . . . X n of iid r.v’s is known as a random sample from the distribution
function F. The set of values ðx1 ; x2 ; . . . ; xn Þ is called a realization of the sample
ðX 1 ; X 2 ; . . .; X n Þ.
Parameter and Parameter Space
A constant which changes its value from one situation to another is knownpa-
rameter. The set of all admissible values of a parameter is often called the parameter
space. Parameter is denoted by θ (θ may be a vector). We denote the parameter
space by H.
Example 1.3
(a) Let y ¼ 2x þ h. Here, θ is a parameter and

H ¼ fh;  / \ h \ /g:

(b) Let x  bð1; pÞ. Here, p is a parameter and

H ¼ fp; 0 \ p \ 1g:

(c) Let x  PðkÞ Here, λ is a parameter and

H ¼ fk; k [ 0g:

(d) Let x  Nðl0 ; r2 Þ, μ0 is a known constant.


Here, σ is a parameter and H ¼ fr; r [ 0g:
 r Þ, both μ and σ are unknown.
(e) Let x  Nðl; 2

l   
Here, h ¼ is a parameter and H ¼ lr ; 1 \ l \ 1; r [ 0
r
1.1 Introduction 3

Family of distributions
Let X  Fh where h e H. Then the set of distribution functions {Fθ, h e H} is called
a family of distribution functions.
Similarly, we define family of p.d.f’s and family of p.m.f’s.
Remark
(1) If functional form of Fθ is known, then θ can be taken as an index.
(2) In the theory of estimation, we restrict ourselves to the case H  Rk when k is
the number of unknown functionally unrelated parameters.

Statistic
A statistic is a function of observable random variable which must be free from
unknown parameter(s), that is a Borel measurable function of sample observations
X ¼ ðx1 ; x2 ; . . . ; xn Þ 2 Rn f : Rn ! Rk is often called a statistic.


Example 1.4 Let X 1 ; X 2. . . X n be a random sample from Nðl; r2 Þ. Thus


P P P P
X i ; X 2i ; X i ; X 2i each of these is a statistic.
Estimator and estimate
Any statistic which is used to estimate (or to guess) τ(θ), a function of unknown
parameter θ, is said to be an estimator of τ(θ). The experimentally determined value
of an estimator is called an estimate.
Example 1.5 Let X 1 ; X 2 ; . . .; X 5 be a random sample from P(λ).
P
 ¼ 1 5 Xi .
An estimator of λ is X 5 i¼1
Suppose the experimentally determined values are X1 ¼ 1; X2 ¼ 4; X3 ¼ 2;
X4 ¼ 6 and X5 ¼ 0.
Then the estimate of λ is 1 þ 4 þ 52 þ 6 þ 0 ¼ 2:6.

1.2 Sufficient Statistic

In statistics, the job of a statistician is to interpret the data that he has collected and to
draw statistically valid conclusion about the population under investigation. But, in
many cases the raw data, which are too numerous and too costly to store, are not
suitable for this purpose. Therefore, the statistician would like to condense the data by
computing some statistics and to base his analysis on these statistics so that there is no
loss of relevant information in doing so, that is the statistician would like to choose
those statistics which exhaust all information about the parameter, which is contained
in the sample. Keeping this idea in mind, we define sufficient statistics as follows:
Definition Let X ¼ ðX 1 ; X 2 ; . . . ; X n Þ be a random sample from fF h ; h 2 Hg.

A statistic Tð X Þ is said to be sufficient for θ [or for the family of distribution

fF h ; h 2 Hg] iff the conditional distribution of X given T is free from θ.

4 1 Theory of Point Estimation

Illustration 1.1 Suppose we want to study the nature of a coin. To do this, we


want to estimate p, the probability of getting head in a single toss. To estimate p,
n tosses are performed. Suppose the results are X 1 ; X 2 ; . . . ; X n where

0 if tail appears
Xi ¼
1 if head appears ðin ith tossÞ:

Intuitively, it sums unnecessary to mention the order of occurrences of head. To


estimate p, it is enough to keep the record of the number of heads. So the statistic
T = ΣXi should be sufficient for p.
Again, conditional distribution of X1 ¼ x1 ; X2 ¼ x2 ; . . .; X n ¼ xn given Tð X Þ ¼

t where t ¼ T ðX1 ¼ x1 ; X2 ¼ x2 ; . . . ; Xn ¼ xn Þ is given by
(
PðX1 ¼ x1 ;X2 ¼ x2 ;...;Xn ¼ xn ;T¼tÞ
PðT ¼ tÞ if T ¼ t
0 otherwise
8 P P
>
> xi
ð1pÞ
n xi
>
< n
p
if T ¼ t
¼ p ð1pÞ
t nt
>
> t
>
:
0 otherwise
8 1
>   if T ¼ t
< n
¼
>
:
t
0 otherwise

which is free from parameter p.


So from definition of sufficient statistics, we observe that Σxi is a sufficient
statistic for p.
Illustration 1.2 Let X1 ; X2 ; . . .; Xn be a random samples from N(μ, 1) where μ
is unknown. Consider an orthogonal transformation of the form

X1 þ X2 þ    þ Xn
y1 ¼ p
n

ðk  1ÞX k  ðX 1 þ X 2 þ    þ X k1 Þ
and yk ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ; k ¼ 2ð1Þn:
k ð k  1Þ
pffiffiffi
Clearly, y1  Nð n l; 1Þ and each of yk  Nð0; 1Þ:
Again, y1, y2,…, yn are independent.
Note that the joint distribution of y2 ; y3 ; . . .; yn does not involve μ, i.e. y2 ; . . .; yn
do not provide any information on μ. So to estimate μ, we use either the obser-
vations on X 1 ; X 2 ; . . . X n or simply the observed value of y1. So any analysis based
on y1, is just as effective as the analysis that is based on all observed values on
X 1 ; X 2 ; . . . X n . Hence, we can suggest that y1 is a sufficient statistic for μ.
1.2 Sufficient Statistic 5

From the above discussion, we see that the conditional distribution of


ðy2 ; y3 ; . . .; yn Þ given y1 is same as the unconditional distribution of
ðy2 ; y3 ; . . . ; yn Þ. Hence, the conditional distribution of X given y1 will be free from

μ.
Thus according to the definition of sufficient statistics, y1 will be a sufficient
statistic for μ.
However, this approach is not always fruitful. To overcome this, we consider a
necessary and sufficient condition for a statistic to be sufficient.
We first consider the Fisher–Neyman criterion for the existence of a sufficient
statistic for a parameter.
Let X ¼ ðX 1 ; X 2 ; . . .; X n Þ be a random sample from a population with contin-

uous distribution function F h ; h 2 H. Let Tð X Þ be a statistic whose probability

density function is fgfTð x Þ; hgg. Then Tð X Þ is a sufficient statistic for h iff the
 
joint probability density function f ð x ; hÞ of X 1 ; X 2 ; . . .; X n can be expressed as

f ð X ; hÞ ¼ gfTð x Þ; hghð x Þ whose, for every fixed value of Tð x Þ, hð x Þ does not
    
depend upon h.
Example 1.5 Let X 1 ; X 2 ; . . .; X n be a random sample from the distribution that has
probability mass function  
P
f ðx; hÞ ¼ h ð1  hÞ ; x ¼ 0; 1; 0 \ h \ 1. The statistic T X ¼ ni¼1 X i has
x 1x

the probability mass function

n!
gðt; hÞ ¼ ht ð1  hÞnt ; t ¼ 0; 1; 2; . . .; n
t!ðn  tÞ!
Thus the joint probability mass function of X 1 ; X 2 ; . . .; X n may be written


f x ; h ¼ hx1 þ x2 þ  þ xn  ð1  hÞnðx1 þ x2 þ  þ xn Þ

n! t!ðn  tÞ!
¼ ht ð1  hÞnt   
t!ðn  tÞ! n!
 
By Fisher–Neyman criterion, T X ¼ X 1 þ X 2 þ    þ X n is a sufficient

statistic for h. In some cases, it is quite tedious to find the p.d.f or p.m.f of a certain
statistic which is or is not a sufficient statistic for h. This problem can be avoided if
we use the following
Fisher–Neymann factorization theorem
Let X ¼ ðX 1 ; X 2 ; . . . X n Þ be a random sample from a population with c.d.f.

F h ; h e H. Furthermore, let all X 1 ; X 2 ; . . . X n are of discrete type or of continuous
6 1 Theory of Point Estimation

type. Then a statistic Tð x Þ will be sufficient for θ or for fF h ; h e Hg iff the joint

p.m.f. or p.d.f. f ð X ; hÞ, of X 1 ; X 2 ; . . . X n can be expressed as


f ð X ; 0Þ ¼ gfTð X Þ; hg  hð x Þ
  

where the first factor gfTð x Þ; hg is a function of θ and x only through T( x ) and for
 
fixed T( x ) the second factor h( x ) is free from θ and is non-negative.
 

Remark 1.1 When we say that a function is free from θ, we do not only mean that θ
does not appear in the functional form but also the domain of the function does not
involve θ.
e.g. the function

1=2; h  1\x\hþ1
f ðxÞ ¼
0 otherwise

does depend upon θ.


      
0
Corollary 1.1 Let T X be a sufficient statistic for θ and T X = ψ T X
  
 
be a one-to-one function of T. Then T 0 X is also sufficient for the same

parameters θ.
Proof Since T is sufficient for θ, by factorization theorem, we have

f ð x ; hÞ ¼ gfTð x Þ; hg  hð x Þ
  



Since the function T 0 x is one-to-one

h i
f ð x ; hÞ ¼ g w1 fT 0 ð x Þg; h hð x Þ:
  
h

Clearly, the first factor of R.H.S. depends on h and x only through T 0 ð x Þ and
 
the second factor h( x ) is free from θ and is non-negative.

Therefore, according to factorizability criterion, we can say that T 0 ð x Þ is also

sufficient for the same parameter θ.
Example 1.6 Let X1 ; X2 ; . . . Xn be a random sample from b(1, π). We show that 1/n
ΣXi is a sufficient statistic for π.
1.2 Sufficient Statistic 7

P.m.f. of x is

hX ð1  hÞ1X if x ¼ 0; 1 ½h  p 
fh ðxÞ ¼
0 otherwise

where 0 \ h \ 1, i.e. the parameter space is H ¼ ð0; 1Þ. Writing fh ðxÞ in the
form

1 if x ¼ 0; 1
fh ðxÞ ¼ C ðxÞhx ð1  hÞ1x with CðxÞ ¼
0 Otherwise:

We find that joint p.m.f. of X 1 ; X 2 ; . . . X n is

P fh ðxi Þ ¼ hRXi ð1  hÞnRXi P Cðxi Þ


i i
¼ gh ð t Þ h ð x 1 ; x 2 ; . . . x n Þ ðSayÞ
Q
where t ¼ Rxi ; gh ðtÞ ¼ ht ð1  hÞnt and h ðx1 ; x2 ; . . .; xn Þ ¼ i Cðxi Þ:
Hence, the factorization criterion is met by the joint distribution, implying that
T = ΣXi is sufficient for θ. So is T/n, the sample proportion of successes being
one-to-one correspondence with T.
Example 1.7 Let X1,…, Xn be a random sample from P(λ). We show that 1/n ΣXi is
a sufficient statistic for λ.
The p.m.f. of the Poisson distribution is

eh hx
fh ðxÞ ¼ x! if x ¼ 0; 1; 2. . .½h  k
0 otherwise

where 0 \ h /, i.e. H ¼ ð0; /Þ


Let us write the p.m.f. in the form fh ðxÞ ¼ CðxÞ eh hX
1
if x ¼ 0; 1; 2; . . .
with CðxÞ ¼ x!
0 otherwise

We may represent the joint p.m.f. of X 1 ; X 2 ; . . . X n as


P
P fh ðxi Þ ¼ enh h Xi
P Cðxi Þ
i i
¼ gh ðtÞhðx1 ; x2 . . .xn Þ; ðSayÞ
P
where t ¼ xi ; gh ðtÞ ¼ enh ht and hðx1 ; x2 ; . . .xn Þ ¼ P Cðxi Þ:
i
8 1 Theory of Point Estimation

The factorizability condition is thus observed to hold, so that T = ΣXi is sufficient


for θ; so is T/n ¼ X, the sample mean.
Example 1.8 Let X 1 ; X 2 ; . . . ; X n be a random sample from Nðl; r2 Þ. Show that
(i) if σ is known, ΣXi is a sufficient statistic for μ, (ii) if μ is known RðX i  lÞ2 is a
sufficient statistic for σ2, and (iii) if both μ and σ are unknown RX i ; RX 2i is a
sufficient statistic for ðl; r2 Þ.
Ans. (i) we may take the variance to be σ2 and the unknown mean to be μ,
varying over the space H ¼ ð /; /Þ. Here, the joint p.d.f. of X 1 ; X 2 ; . . . ; X n is
Q n 1  12 ðxi lÞ2 o
pffiffiffiffi e 2r
i r 2p

P
1  12 ðxi lÞ2
¼  pffiffiffiffiffiffin e
2r
i

r 2p
8 P 2 9
nðxlÞ2
<  i ðxi xÞ 1 =
¼ e 2r2 e 2r2  pffiffiffiffiffi n
: ðr 2nÞ ;
¼ gl ðtÞ  h ðx1 ; x2 ; . . .; xn Þ; ðSayÞ

 2
where t ¼ x; gl ðtÞ ¼ enðXlÞ =2r2
P
 1 ðxi xÞ2
and hðx1 ; x2 ; . . . ; xn Þ ¼ p1ffiffiffiffi n e 2r2
ð r 2p Þ

Thus the factorizability condition holds with respect P to T ¼ X, the sample mean,
which is therefore sufficient for μ. So the sum is i X i
(ii) The unknown variance σ2 = θ, say, is supposed to vary over H ¼ ð0; /Þ.
The joint p.d.f. of X 1 ; X 2 ; . . . X n may be written as
P
Y 1  12 ðxi lÞ2

1
1
2r2
ðxi lÞ2
pffiffiffiffiffiffi e 2r ¼ pffiffiffiffiffiffi n e i ¼ gh ðtÞ hðx1 ; x2 . . . xn Þ; say;
i r 2p ðr 2pÞ

P P 2
ðxi  lÞ2 ; gh ðtÞ ¼ pffiffiffiffi1 n e2r2 ðxi lÞ
1
where t ¼
ð 2prÞ
i
P
r  h and h x ¼ 1. Hence, T ¼
2
ðX i  lÞ2 is a sufficient statistic for θ. So

is
P
S20 ¼ 1n i ðxi  lÞ2 , which is in this situation commonly used to estimates σ2.
1.2 Sufficient Statistic 9

(iii) Taking the unknown mean and variance to be θ1 and θ2, respectively, we
now have for θ a vector θ = (θ1, θ2) varying over the parameter space (which is a
half-plane) H ¼ fðh1 ; h2 Þ = / \ h1 \ /; 0 \ h2 \ /g.
The jt. p.d.f. of X 1 ; X 2 ; . . . ; X n may now be written as
Y 1 2

1 2h1 ½nðxh1 Þ2 þ ðn1Þs2 
pffiffiffiffiffiffiffiffiffiffi e2h2 ðxi h1 Þ
1
¼ n e 2

i 2ph2 ð2ph2 Þ 2

X .
¼ gh ðt1 ; t2 Þ hð x Þ; say; where t1 ¼ x; t2 ¼ s2 ¼ ðxi  xÞ2 ðn  1Þ

i
1

2h1 ½nðxh1 Þ2 þ ðn1Þs2 
gh ð t 1 ; t 2 Þ ¼ n e 2 and h x ¼ 1 :
ð2ph2 Þ2 

The factorizability condition is thus observed to hold with regard to the statistics
T 1 ¼ X, the sample mean and T2 ¼ s2, the sample variance.  
Hence, X and s2 are jointly sufficient for θ1 and θ2, i.e. RX i ; RX 2i is a joint
sufficient statistic for (μ, σ2).
Example 1.9 Let X 1 ; X 2 ; . . . ; X n be a random sample from R(0, θ).
Show that X ðnÞ ¼ max Xi is a sufficient statistic for θ.
1in
Ans.: The jt. p.d.f. of x1 ; x2; . . .; xn is

 1
hn if 0 \ xi \ h 8i
f x;h ¼
 0otherwise
 1
if 0 \ xðnÞ \ h
hn
¼
0 otherwise

1   1 if a \ x \ b
¼ n I ð0;hÞ xðnÞ where I ða;bÞ ðxÞ ¼
h 0 otherwise
¼ gfxðnÞ ; hg  hð x Þ; say


where gfxðnÞ; hg ¼ h1n I ð0;hÞ fxðnÞ g & hð x Þ ¼ 1.



Note that g{x(n), θ} is a function of θ and x only through x(n) whereas for fixed

xðnÞ ; hð x Þ is free from θ.

Hence, x(n) is a sufficient statistic for θ.
Example 1.10 Let X 1 ; X 2 ; . . . ; X n be a random sample from R(θ1, θ2). Show that
fX ð1Þ ; X ðnÞ g is a sufficient statistic for θ = (θ1, θ2) where Xð1Þ ¼ min X i ,
1in
X ðnÞ ¼ max X i .
1  i n
10 1 Theory of Point Estimation

Solution Joint p.d.f. of X 1 ; X 2 ; . . . ; X n is


(

1
if h1 \ xi \ h2 8i
ðh2 h1 Þn
f x;h ¼
 0 otherwise
(
1
ðh2 h1 Þn
if h1 \ xð1Þ \ xðnÞ \ h2
¼
0 otherwise
1
¼ I1ðh1 ;1Þ fxð1Þ gI2ð1;h2 Þ fxðnÞ g
ðh2  h1 Þn

where

1 if h1 \ xð1Þ \ /
I1ðh1 ;1Þ fxð1Þ g ¼
0 otherwise

1 if  / \ xn \ h2
and I 2 ð /; h2 ÞfxðnÞ g ¼ , i.e. f ð x ; hÞ ¼ g½fxð1Þ ; xðnÞ g;
0 otherwise 


ðh1 ; h2 Þh x where g½fxð1Þ ; xðnÞ g; ðh1 ; h2 Þ ¼ ðh2 h
1

n I 1ð0 ;/Þ fxð1ÞgI 2ð1;h Þ

1 2
 
xðnÞ and h x ¼ 1.

Note that g is a function of (θ1, θ2) and x only through {x(1), x(n)} where as for



fixed {x(1), x(n)}, h x is free from θ.



Hence, {x(1), x(n)} is a sufficient statistic for (θ1, θ2).
Example 1.11 Let X 1 ; X 2 ; . . .; X n be a random sample from a population having
p.d.f.

eðx hÞ ; x[ h
f ðx; hÞ ¼
0 otherwise

Show that X ð1Þ ¼ min X i is a sufficient statistic for θ.


1in
Solution The p.d.f. can equivalently be written as

eðXhÞ ; xð1Þ [ h
f ðx;hÞ ¼
0 otherwise
1.2 Sufficient Statistic 11

Now, the joint p.d.f. of X 1 ; X 2 ; . . . ; X n is


8 P
<  ðxi hÞ
f ð x ;hÞ ¼ e i ; xð1Þ [ h
 :
0 otherwise
P 
 ðxi  hÞ     1 if xð1Þ [ h
f ð x ; hÞ ¼ e i I ðh;/Þ xð1Þ where I ðh;/Þ xð1Þ ¼
 0 otherwise


¼ g fxð1Þ ; hg : h x ; say

P  

where gfxð1Þ ; hg ¼ e i
ðxi hÞ
I ðh;/Þ xð1Þ and h x ¼ 1.

Note that g{x(i), θ} is a function of θ and x only through x(1) and for fixed x(1), h

(x) is free from θ. Hence, according to factorizability criterion, x(1) is a sufficient
statistic for θ.
Note In the above three problems the domain of the probability density depends
upon the parameter h. In this situation, we should aware to apply Fisher–Neyman
factorization

theorem
and we should give
proper
consideration to the domain of the
function h x for every fixed value of T X . In these situations, it is better to use
 
Fisher–Neyman criterion. Let us solve Example 1.10 by using Fisher–Neyman
criterion:

1
f ð xÞ ¼ ; h1 \ x \ h2
h 2  h1
Let X ð1Þ ¼ min X i ¼ y1 ; X ðnÞ ¼ max X i ¼ y2
1in 1in
The joint p.d.f. of y1 ; y2 is

nð n  1Þ
gð y 1 ; y 2 ; h1 ; h2 Þ ¼ ðy  y1 Þn2 ; h1 \ y1 \ y2 \ h2
ð h2  h1 Þ n 2

The joint p.d.f. of X 1 ; X 2 ; . . .; X n is



1
f x ; h1 ; h2 ¼
 ð h2  h1 Þ n
nð n  1Þ  n2 1
¼ n xðnÞ  xð1Þ  n2
ð h2  h1 Þ nðn  1Þ xðnÞ  xð1Þ
 

¼ g xðnÞ ; xð1Þ ; h1 ; h2 h x

12 1 Theory of Point Estimation
 
By the Fisher–Neyman criterion, xð1Þ ; xðnÞ is a sufficient statistic for
h ¼ ð h1 ; h2 Þ
Example 1.12 Let X  Nð0; r2 Þ, show that jX j is sufficient for σ.
Solution

1 X2
f ðx; rÞ ¼ pffiffiffiffiffiffiffiffi e2r2 ; r[0
2kr
1 jX j2
¼ pffiffiffiffiffi e 2r2  1
2kr
¼ gðt; rÞ hðxÞ; hðxÞ ¼ 1

where g(t, σ) is a function of σ and x only through t ¼ jxj and for fixed t, h(x) = 1 is
free from σ.
Hence, by Fisher–Neymam factorization theorem, jX j is sufficient for σ.
Example 1.13 Let X 1 ; X 2 ; . . . X n be a random sample from a double-exponential
distribution whose p.d.f. may be taken as fh ðX Þ ¼ 12 exp ðjxi  hjÞ, and the
unknown parameter θ varies over the space H ¼ ð  /; /Þ. 
Q P
In this case, the joint p.d.f. is i f h ðxi Þ ¼ 21n exp  i jxi  hj .
For no single statistic T, it is now not possible to express the joint p.d.f. in the
form gθ(t) h(x1, x2, … xn).
Hence, there exists no statistic T which taken alone is sufficient for θ. The whole
set X1, X2, …, Xn, or the set X(1), X(2), … X(n), is of course sufficient.
Remarks 1.2 A single sufficient statistic does not always exist.
e.g. Let X1, X2,…, Xn be a random sample from a population having p.d.f.
1
f ðx;hÞ ¼ h; k h \ x \ ðk þ 1Þ h; k [ 0
0 otherwise
Here, no single sufficient statistic for θ exists. In fact, {x(1), xn)} is sufficient for
θ.
Remark 1.3 Not all functions for sufficient statistic are sufficient. For example, in
2
random sampling from N(μ, σ2), σ2 being known, X is not sufficient for μ. (Is X
sufficient for μ2 ?)
Remark 1.4 Not all statistic are sufficient.
Let X1, X2 be a random sample from P(λ). Then X1 + 2X2 is not sufficient for λ,
because in particular, say

PfX 1 ¼ 0; X 2 ¼ 1 X 1 þ 2X 2 ¼ 2g
PfX 1 ¼ 0; X 2 ¼ 1j X 1 þ 2X 2 ¼ 2g ¼
PfX 1 þ 2X 2 ¼ 2g
1.2 Sufficient Statistic 13

PfX 1 ¼ 0; X 2 ¼ 1g
¼
PfX 1 ¼ 0; X 2 ¼ 1g þ PfX 1 ¼ 2; X 2 ¼ 0g
k k
e e k
¼ 2
ek  ek  k þ ek  ek  k2!

¼ k þ2 2 which depends upon λ.


Remarks 1.5 Let h ¼ ðh1 ; h2 ; . . .; hk Þ and T ¼ ðT 1 ; T 2 ; . . . T m Þ. Further, let T
 
be a sufficient statistic for h . Then we cannot put any restriction on m, i.e. m ≥ k,

the number of parameters involved in the distribution. Even if m = k, then we
cannot say that Ti of T is sufficient for θi of h . It is better to say that (T1, T2, … Tm)
 
are jointly sufficient for (θ1, θ2, … θk).
Let X1, X2,…, Xn be a random sample from N(μ, σ2). Here, ΣXi and ΣX2i are
jointly sufficient for μ and σ2.
Remarks 1.6 The whole set of observations X ¼ ðX 1 ; X 2 ; . . . ; X n Þ is always

sufficient for h . But we do not consider this to be real sufficient statistic when

another sufficient statistic exists. There are a few situations where the whole set of
observations is a sufficient statistic. [As shown in the example of
double-exponential distribution].
Remarks 1.7 The set of all order statistics T{X(1), X(2), …, X(n)}, X(1) < X(2),
…, < X(n), is sufficient for the family.
Conditional distribution of ð X =T ¼ tÞ ¼ n!1 because for each T = t, we have n-

tuples of the form (x1, x2, … xn).
Remarks 1.8: Distribution

admitting

sufficient
statistic Let X1, X2, …, Xn be a
random sample from f x ; h and T X be a sufficient statistic for θ (θ is a scalar).
 
According to factorization theorem,
X
log f ðxi ; hÞ ¼ log g ðT; hÞ þ log hð x Þ

i
Differentiating w.r.t. θ, we have
X @ log f ðxi ; hÞ @ log gðT; hÞ
¼ ¼ GðT; hÞ; ðsayÞ ð1:1Þ
i
@h @h

Put a particular value of h in (1.1).


14 1 Theory of Point Estimation

X
n
Then we have uðxi Þ ¼ GðTÞ ð1:2Þ
i¼1

Now differentiating (1.1) and (1.2) w.r.t. xi, we have

@ 2 log f ðxi ; hÞ @GðT; hÞ @T


¼  ð1:3Þ
@h@xi @T @xi

@uðxi Þ @GðT Þ @T
¼  ð1:4Þ
@xi @T @xi

(1.3) and (1.4) give us

@ 2 log f ðxi ; hÞ @uðxi Þ @GðT; hÞ=@T


= ¼ 8i ð1:5Þ
@h @xi @xi @GðT Þ=@T

Since the R.H.S. of (1.5) is free from xi, we can write

@GðT; hÞ @ GðTÞ
¼ k1 ðhÞ
@T @T
) GðT; hÞ ¼ GðT Þk1 ðhÞ þ k2 ðhÞ
P
@ i logf ðxi ; hÞ
) ¼ GðT Þk1 ðhÞ þ k2 ðhÞ
@h Z Z
X

) log f ðxi ; hÞ ¼ GðT Þ k1 ðhÞdh þ k2 ðhÞdh þ c x

i
Y

) f ðxi ; hÞ ¼ A x eh1 GðT Þ þ h2

i



where A x ¼ a function of x
 

θ1 = a function of θ, and
θ2 = another function of θ.
Thus if a distribution is to have a sufficient statistic for its parameter, it must be
of the form

f ðx; hÞ ¼ eB1 ðhÞuðxÞ þ B2 ðhÞ þ RðxÞ : ð1:6Þ

(1.6) is known as Koopman form.


Example Show, by expressing a Poisson p.m.f. in Koopman form, that Poisson
distribution possesses a sufficient statistic for itsparameter k.
1.2 Sufficient Statistic 15

k x
Here, f ðx; kÞ ¼ e x!k ¼ ek þ x log klog x!
which is of the form eB1 ðhÞuðxÞ þ B2 ðhÞ þ RðxÞ .
Hence, there exists a sufficient statistic for k.
Completeness A family of distributions is said to be complete

if E ½gðX Þ ¼ 0 8h2H
) PfgðxÞ ¼ 0g ¼ 1 8h2H

A statistic T is said to be complete if family of distributions of T is complete.


Examples 1.14P(a) Let X1, X2,…, Xn be a random sample from b(1, π), 0 < π < 1.
Then T ¼ ni¼ 1 X i is a complete statistic.

As E ½gðT Þ ¼ 0 8 p 2 ð0; 1Þ
X n  
) gðtÞ nt pt ð1  pÞnt ¼ 0
t¼0
X
n n 
p t
) ð 1  pÞ n gð t Þ ¼ 0 8 p 2 ð0; 1Þ
t¼0
t
1p
) gð t Þ ¼ 0 for t ¼ 0; 1; 2. . . n 8 p 2 ð0; 1Þ
) P fgð t Þ ¼ 0g ¼ 1 8p
(b) Let X * N (0, σ2). Then X is not complete

as; E ð X Þ ¼ 0 ; P ð X ¼ 0Þ ¼ 1 8 r2

(c) If X  Uð0; hÞ, then X is a complete statistic [or R(0, θ)].


A statistic is said to be complete sufficient statistic if it is complete as well as
sufficient. P
If (X1, X2,…, Xn) is a random sample from b (1, π), 0 < π < 1, then
P T ¼ X i is
also sufficient. So T is a complete sufficient statistic where T ¼ Xi.
Minimal Sufficient Statistic
A statistic T is said to be minimal sufficient if it is a function of every other
sufficient statistics.
The sufficiency principle
A sufficient statistic for a parameter h is a statistic that, in a certain sense, captures
all the information about h contained in the sample. Any additional information in
the sample, besides the value of sufficient statistic, does not contain any more
information about h. These considerations lead to the data reduction technique
knownas sufficiency
 principle.
If T X is a sufficient statistic for h, then any inference about h should depend

 
on the sample X only through the value T X , that is, if x and y are two sample
   
16 1 Theory of Point Estimation


 
points such that T x ¼ T y , then the inference about h should be the same
 
whether X ¼ x or X ¼ y is observed.
   
 
Definition (Sufficient statistic) A statistic T X is a sufficient statistic for h if the

 
conditional distribution of the sample X given the value of T X does not depend
 
on h.

Factorization theorem: Let f x jh denote the joint pdf/pmf of a sample X .
 
 

A statistic T X is a sufficient statistic for h iff 9 functions gðtjhÞ and h x such
 
that for all sample points X and all parameter values h,




f x jh ¼ gðtjhÞh x
 
     
Result: If T X is a function of T 0 X , then T 0 X is sufficient which
  
 
implies that T X is sufficient.

     
i.e. sufficiency of T 0 X ) sufficiency of T X ; a function of T 0 X
  
 
0 0 0
Proof Let fBt0 jt 2 s g and fAt jt 2 sg be the partitions induced by T X and

 
T X , respectively. h

   
Since T X is a function of T 0 X , for t0 2 s0 ) Bt0  At , for some 8t 2 s.
 
 
0
Thus Sufficiency of T X

 
, Conditional distribution of X ¼ x given T 0 X ¼ t0 is independent of h,
  
8t0 2 s0
, Conditional distribution of X ¼ x given X 2 Bt0 is independent of h, 8t0 2 s0
  
) Conditional distribution of X ¼ x given X 2 At (for some 8t 2 s) is inde-
  
pendent of h, 8t 2 s  
, Conditional distribution of X ¼ x given T X ¼ t is independent of h,
  
8t 2 s.
1.2 Sufficient Statistic 17
 
, Sufficiency of T X .


Sufficient statistic for an Exponential family of distributions: 


Let X1 ; X2 ; . . .; Xn be i.i.d. observations from a pdf/pmf f xjh that belongs to
an exponential family given by
      !
X
k
f xjh ¼ hð xÞc h exp xi h ti ð xÞ
i¼1

where h ¼ ðh1 ; h2 ; . . .hd Þ, d  k. Then




  !
X
n   X
n  
T X ¼ t1 Xj ; . . .; tk X j

j¼1 j¼1

is a (complete) sufficient statistic for h .




Minimal sufficient statistic


When we introduced the concept of sufficiency, we said that our objective was to
condense the data without losing any information about the parameter. In any
problem, there are, in fact, many sufficient statistics. In general, we have to consider
the choice between alternative sets of sufficient statistics. In a sample of n obser-
vations, we always have a set of n sufficient statistics [viz., the observations X ¼
  
ðX1 ; X2 ; . . .; Xn Þ themselves or the order statistics Xð1Þ ; Xð2Þ ; . . .; XðnÞ ] for the
kð 1Þ parameters of the distributions. For example, in sampling from N ðl; r2 Þ
distribution with both l and r2 unknown, there are, in fact, three sets of jointly
sufficient statistic: the observations X ¼ ðX1 ; X2 ; . . .; Xn Þ, the order statistics
  
 s2 Þ. We naturally prefer the jointly sufficient statistic
Xð1Þ ; Xð2Þ ; . . .; XðnÞ and ðX;
 s2 Þ since they condense the data more than either of the other two. Sometimes,
ðX;
though not always, there will be a set of sð\nÞ statistics sufficient for the
parameters. Often s ¼ k but s may be <k also.
The question that we might ask is as follows: Does 9 a set of sufficient statistic
that condenses the data more than ðX;  s2 Þ? The answer is there does not. The notion
that we are alluding to is of minimum set of sufficient statistics, which we label
minimal sufficient statistic. In other words, we have to ask: what is the smallest
number s of statistics that constitute a sufficient set in any problem? It may be said
in general that a sufficient statistic T may expected to be minimal sufficient if it has
the same dimensions (i.e. the same number of components) as h.
18 1 Theory of Point Estimation

Statistics and partition


It may be noted that every statistic induces a partition of  x. The same is true for a set
of statistics; a set of statistics induces a partition of  x. Loosely speaking, the
condensation of data that a statistic or a set of statistics exhibits can be measured by
the number of subsets in the partition induced by the statistic or a set of statistics. If
a set of statistics has fewer subsets (co-sets) in its induced partition than the induced
partition of another set of statistics, then we say that the first statistic condenses the
data more than the later. Still loosely speaking, a minimal sufficient set of statistics
is then a sufficient set of statistics that has fewer subsets (co-sets) in its partition
than the induced partition of any other set of sufficient statistics. So a set of
sufficient statistic is minimal if no other set of sufficient statistics condenses the data
more without losing sufficiency.
Thus T is minimal sufficient if any further reduction of data is not possible
without losing sufficiency, i.e. T is minimal sufficient if there does not exist a
function U ¼ wðT Þ such that U is sufficient.
Definition (Minimal sufficient statistic) A sufficient statistic Tð X Þ is called minimal

sufficient if, for every other sufficient statistic T 0 ð X Þ, Tð X Þ is a function of T 0 ð X Þ.
  
To say that Tð X Þ is a function of T 0 ð X Þ simply means that if T 0 ð x Þ ¼ T 0 ð y Þ,
   
then Tð x Þ ¼ Tð y Þ. In terms of the partition sets, if Bt0 jt0 2 s0 are the partition sets
 
for T 0 ð X Þ and At jt 2 T are the partition sets for Tð X Þ, then the above definition of
 
minimal sufficient statistic states that every Bt0 is a subset of some At . Thus, the
partition associated with a minimal sufficient statistic is coarsest possible partition
for a sufficient statistic, and a minimal sufficient statistic achieves the greatest
possible data reduction for a sufficient statistic.
Example Let Xi ði ¼ I; 2; . . .; nÞ  independent PðhÞ distribution. Then T ¼
Pn
i ¼ 1 Xi is sufficient for h and, in fact, it is minimal sufficient.
Pn
Since T ¼ i ¼ 1 Xi is minimal sufficient; therefore, any further reduction of the
data is not possible without losing sufficiency, i.e. there does not exist a function
U ¼ wðT Þ such that U is sufficient. Suppose that T is sufficient and if possible, 9 a
function

U ¼ wðtÞ3wðt1 Þ ¼    ¼ wðtk Þ ¼ u:

Then
8 ðnhÞti
>
>
<Pk
ti !
if t ¼ ti ði ¼ 1; 2; . . .; kÞ
ðnhÞti
Ph ½T ¼ tjU ¼ u ¼
>
> i¼1
ti !

:
0 otherwise
! depends on h ;
1.2 Sufficient Statistic 19
Pn
so that U is not sufficient retaining sufficiency. Hence, T ¼ i¼1 Xi is minimal
sufficient statistic.
Remark 1 Since minimal sufficient statistic is a function of sufficient statistic,
therefore, a minimal sufficient statistic is also sufficient.
Remark 2 Minimal sufficient statistic is not unique since any one-to-one function of
minimal sufficient statistic is also a minimal sufficient statistic.
Definition of minimal sufficient statistic does not help us to find a minimal
sufficient statistic except for verifying whether a given statistic is minimal sufficient
statistic. Fortunately, the following result of Lehman and Scheffe (1950) gives an
easier way to find a minimal sufficient statistic.


Theorem Let f x jh be the pmf/pdf of a sample X . Suppose 9 a function
 
 
  
T X 3 for every two sample points x and , and the ratio of f x jh
y f y jh
    

 
is constant as a function of h (i.e. independent of h) iff T x ¼ T y . Then
 
 
T X is minimal sufficient statistic.


 
Proof Let us assume f x jh [ 0; x 2  x and h. First, we show that T X is a
  
n o
sufficient statistic. Let s ¼ t=t ¼ Tð x Þ; x 2 
x be the image of  x under Tð x Þ.
 n
o 

Define the partition sets induced by Tð X Þ as At ¼ x jT x ¼ t . For each


  
At , choose and fix one element x 2 At . For any x 2 
x, x is the fixed element
t   Tð x Þ


that is in the same set At , as x . Since x and x


are in the same set At ,
  T x
0 1 , 0

1



T x ¼ T @ x
A and, hence, f x jh f @ x
jhA is constant as a
 T x  T x
 



f x jh
x by h x ¼   and h

function of h. Thus, we can define a function on 

f x jh
T ð x Þ
does not depend on h.   h
Define a function on s by gðtjhÞ ¼ f x jh . Then
t
20 1 Theory of Point Estimation
 

f x jh f x jh
 Tð x Þ 
f ð x jhÞ ¼ 
 ¼ gðtjhÞhð x Þ and by factorization theorem, Tð X Þ
  
f x jh
 Tð x Þ


is sufficient for h. Now to show that Tð X Þ is minimal, let T 0 ð X Þ be any other


 
sufficient statistic. By factorization theorem, 9 functions g0 and h0 such that


f ð x jhÞ ¼ g0 T 0 ð x Þjh h0 ð x Þ
  

Let x and y be any two sample points with T 0 ð x Þ ¼ T 0 ð y Þ. Then


   


f ð x jhÞ g0 ðT 0 x jhÞ h0 ð x Þ h0 x
  
¼ ¼  :
f ð y jhÞ g0 ðT 0 y jhÞh0 y
   h0 y


Since this ratio does not depend on h, the assumptions of the theorem imply
Tð x Þ ¼ Tð y Þ. Thus Tð x Þ is a function of T 0 ð x Þ and Tð x Þ is minimal.
    

Example (Normal minimal sufficient statistic) Let X1 ; X2 ; . . .Xn be iid Nðl; r2 Þ,


both l and r2 unknown. Let x and y denote two sample points, and let ðx; s2x Þ and
  

ðy; s2y Þ be the sample means and variances corresponding to the x and y samples,
  
respectively. Then we must have
  

n=2
f x jl; r 2 exp ð2pr2 Þ
 n ðx  lÞ 2
þ ð n  1 Þs 2
x ð 2r 2
Þ


 ¼   
n=2
f y jl; r2 ð2pr Þ
2 2
exp  nðy  lÞ þ ðn  1Þs y 2 ð2r Þ2
 
   
 2 
¼ exp n x  y þ 2nlðx  yÞ  ðn  1Þ s x  s y
2 2 2
2r2
 

This ratio will be constant as a function of l and r2 iff x ¼ y and s2x ¼ s2y , i.e.
 
 s2 Þ is a minimal sufficient
ðx; s2x Þ  ðy; s2y Þ. Then, by the above theorem, ðX;
 

statistic for ðl; r2 Þ.


Remark Although minimal sufficiency ) sufficiency, the converse is not neces-
Pn true.PnFor 2a random sample X1 ; X2 ; . . .Xn from Nðl; lÞ distribution,
sarily
i ¼ 1 Xi ; i ¼ 1 Xi is sufficient but not minimal sufficient statistic. In fact,
1.2 Sufficient Statistic 21

Pn P P
i¼1 Xi and ni¼ 1 Xi2 are each singly sufficient for l; ni¼ 1 Xi2 being minimal.
(This particular example also establishes the fact that single sufficiency does not
imply minimal sufficiency.)

1.3 Unbiased Estimator and Minimum-Variance


Unbiased Estimator

Let X be a random variable having c.d.f. F h ; h 2 H. The functional form of Fθ is


known, but the parameter θ is unknown. Here, we wish to find the true value of θ on
the basis of the experimentally determined values x1, x2,…, xn, corresponding to a
random sample X1, X2,…, Xn from Fθ. Sine the observed values x1, x2, …, xn
change from one case to another, leading to different estimates in different cases, we
cannot expect that the estimate in each case will be good in the sense of having
small deviation from the true value of the parameter. So, we first choose an esti-
mator T of θ such that the following condition holds:

PfjT  hj \ cg PfjT 0  hj \ cg 8 h 2 H and 8 c ð1:7Þ

where T 0 is any rival estimator.


Surely, (1.7) is an ideal condition, but the mathematical handling of (1.7) is very
difficult. So we require some simpler condition. Such a condition is based on mean
square error (m.s.e.). In this case, an estimator will be best if its m.s.e. is least. In
other words, an estimator T will be best in the sense of m.s.e. if

EðT  hÞ2  E ðT 0  hÞ 8 h and for any rival estimator T 0


2
ð1:8Þ

It can readily be shown that there exists no T for which (1.8) holds. [e.g. Let θ0
be a value of θ and consider T 0 ¼ h0 . Note that m.s.e. of T 0 at θ = θ0 is ‘0’, but m.s.
e. of T 0 for other values of θ may be quite large.]
To sidetrack this, we introduce the concept of unbiasedness.
Actually, we choose an estimator on the basis of a set of criteria. Such a set of
criteria must depend on the purpose for which we want to choose an estimator.
Usually, a set consists of the following criteria: (i) unbiasedness; (ii) mini-
mum-variance unbiased estimator; (iii) consistency, and (iv) efficiency.
Unbiasedness
An estimator T is said to be an unbiased estimator (u.e.) of h ½or cðhÞ iff
E ðT Þ ¼ h ½or cðhÞ 8h 2 H.
Otherwise, it will be called a biased estimator. The quantity b(θ, T) = Eθ (T) − θ
is called the bias. A function γ(θ) is estimable if it has an unbiased estimator.
22 1 Theory of Point Estimation

Let X1, X2,…, Xn be a random sample from a population with mean μ and
Pn
variance σ2. Then X and s2 ¼ 1  2 are u.e’s of μ and σ2,
n1 i ¼ 1 ðXi  X Þ
respectively.
Note
(i) Every individual observation is an unbiased estimator of population mean.
(ii) Every partial mean is an unbiased estimator of population mean.
Pk Pk
1
(iii) Every partial sample variance [ e.g. k1  2 
1 ðXi  Xk Þ ; Xk ¼ k
1
1 Xi and
k < n] is an unbiased estimator of σ .
2

Example 1.15 Let X1, X2,…, Xn be a random sample from N(μ, σ2). Then X and
Pn
s2 ¼ n1  2
1 ðXi  X Þ are u.e’s for μ and σ , respectively. But estimator s ¼
1 2
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
Pn ffi
 2
1 ðXi  X Þ is a biased estimator of σ.
1
hqffiffiffiffiffiffi
n1
  1 i
The bias bðs; rÞ ¼ r n1 2
Cðn=2Þ C n1
2 1 .

Remark 1.9 An unbiased estimator may not exist.


Example (a) Let X  b ð1; pÞ; 0 \ p \ 1.
Then there is no estimator T(X) for which E fT ðX Þg ¼ p2 8 p 2 ð0; 1Þ
i.e. π2 is not estimable. Similarly, p1 has no unbiased estimator.
m hm 
f ðx; hÞ ¼ x hnx
 ; x ¼ 0; 1; 2 . . . n
(b) For n
h ¼ m; m þ 1; . . .
then there is no unbiased estimator for θ.
Remark 1.10 Usually, unbiased estimator is not unique. Starting from two unbiased
estimators, we can construct an infinite number of unbiased estimators.
Example Let X1, X2,…, Xn be a random sample from P(λ). Then both X  and
P
s2 ¼ n1 n  2
1 ðXi  XÞ are unbiased estimators of λ as mean = variance = λ for P(λ).
1


Let Ta ¼ aX þ ð1  aÞs2 ; 0  a  1. Here, Ta is an unbiased estimator of λ.
Remark 1.11 An unbiased estimator may be absurd.
Example Let X  PðkÞ. Then T ðX Þ ¼ ð2ÞX is an unbiased estimator of e−3λ since

X ek kx
E fT ðX Þg ¼ ð2Þx
x
x!
X ð2kÞx
¼ ek ¼ ek  e2k ¼ e3k :
x
x!
1.3 Unbiased Estimator and Minimum-Variance Unbiased Estimator 23

Note that T ðX Þ [ 0 for even X


\ 0 for odd X:
 
∴ T(X) which is an estimator of a positive quantity e3k [ 0 may occasionally be
negative.
k
Example Let X * P(λ).
( Construct an unbiased estimator of e .
1 if x ¼ 0
Ans Let TðxÞ ¼
0; otherwise

) E fT ðX Þg ¼ 1  P ðX ¼ 0Þ þ O  PðX 6¼ 0Þ ¼ ek 8k:

∴ T(X) is an unbiased estimator of ek .


Remark 1.12 Mean square error of an unbiased estimator (i.e. variance of unbiased
estimator) may be greater than that of a biased estimator and then we prefer the
biased estimator.

EðT  hÞ2 ¼ E½T  EðT Þ þ fE ðT Þ  hg2


¼ V ðT Þ þ b2 ðT; hÞ where bðT; hÞ ¼ E ðT Þ  h:

Let T1 be a biased estimator and T2 an unbiased estimator, i.e. E(T1) ≠ θ but


E(T2) = θ.

) MSE ðT 1 Þ ¼ V ðT 1 Þ þ b2 ðT 1 ; hÞ
MSE ðT 2 Þ ¼ V ðT 2 Þ
if V ðT 2 Þ [ V ðT 1 Þ þ b2 ðT 1 ; hÞ; then we prefer T 1 :

e.g. Let X1, X2,…, Xn be a random sample from N(μ, σ2). Then s2 ¼
P P
1  2
i ðX i  XÞ is an unbiased estimator of σ . Clearly, n þ 1
2 1  2
i ðX i  XÞ ¼ n þ 1s
n1 2
n1
is a biased estimator of σ .
2


ðn  1ÞS2 ðn  1ÞS2
As  v 2
n1 ) V ¼ 2 ð n  1Þ
r2 r2
  2
) V s2 ¼ r4 ¼ MSE of s2 :
n1
24 1 Theory of Point Estimation



2
On the other hand, MSE of n1 2
nþ1 s ¼ V n1 2
nþ1 s þ n1 2
nþ1 r  r2

 
n1 2 2 4r4 2r4 2r4 2r4
¼ r4 þ ¼ ð n  1 þ 2Þ ¼ \
nþ1 n  1 ð n þ 1Þ 2
ðn þ 1Þ 2 nþ1 n1

⇒ MSE of s2 > MSE of n1 2


nþ1 s , i.e. MSE (Unbiased estimator) > MSE (biased
estimator).
Remark 1.13: Pooling of information Let Ti be an unbiased estimator of θ obtained
from the ith source, i = 1, 2 …, k. Suppose Ti’s are independent and V(Ti) = σ2i < σ2
8i. Then T k ¼ 1k ¼ ðT 1 þ T 2 þ    þ T k Þ is also an unbiased estimator of θ with
  P
V T k ¼ k12 k1 r2i \ rk ! 0 as k !/.
2

The implication of this statement is that T k gets closer and closer to the true
value of the parameter as k → ∝ (k becomes larger and larger).
On the other hand if Ti’s are biased estimator with common bias β, then T k
approaches to the wrong value θ + β instead of the true value θ even if k → ∝.
Problem 1.1 Let X1, X2, …, Xn be a random sample from bð1; ^Þ.
Show that
XðX1Þ 2
(i) nðn1Þ is an unbiased estimator of ^
XðnXÞ
(ii) is an unbiased estimator of ^ð1  ^Þ
nðn1Þ
P
where X = number of success in n tosses = ni¼ 1 Xi .
Minimum-VarianceUnbiased Estimator (MVUE)  
Let U be the set of all u.e’s (T) of θ with E T 2 \ / 8h 2 H, and then an
estimator T0 2 U will be called a minimum-variance unbiased estimator (MVUE) of
θ{or γ(θ)} if V(T0) ≤ V(T) 8θ and for every T 2 U.
Result 1.1 Let U be the set of all u.e’s (T) of θ with E(T2) < ∝, 8θ 2 Θ.
Furthermore, let U0 be the class of all u.e’s (v) of ‘0’ {Zero} with E(v2) < ∝ 8θ, i.e.
U0 = {v: E(v) = 0 8θ and E(v2) < ∝].
Then an estimator T0 2 U will be an MVUE of θ iff

CovðT 0 ; vÞ ¼ E ðT 0 vÞ ¼ 0 8 h; 8v 2 U 0 :

Proof Only if part Given that T0 is an MVUE of θ, we have to prove that

E ðT 0 vÞ ¼ 0 8h; 8v 2 U 0 ð1:9Þ

Suppose the statement (1.9) is wrong.


∴ E(T0v) ≠ 0 for some θ0 and for some v0 2 U0. h
1.3 Unbiased Estimator and Minimum-Variance Unbiased Estimator 25

Note that for every real λ, T0 + λv0 is an u.e. of θ.


Again, T0 + λv0 2 U, as E(T0 + λv0)2 < ∞
Now V00(T0 + λvo) = V00(T0) + λ2 E00(V20) + 2λE(v0T0).
Choose a particular setting k ¼  EE00 ðTV0 v2 0 Þ assuming E00(v20) > 0
00 ð 0 Þ

(If E00(v20) = 0 then P00(v0 = 0) = 1, and hence E00(T0v0) = 0)


E 200 ðT o vo Þ
We have V 00 ðT 0 þ kv0 Þ ¼ V 00 ðT 0 Þ  E o ðV 2o Þ
\ V 00 ðT 0 Þ which contradicts the
fact that T0 is a minimum-variance unbiased estimator of θ.
(if part) It is given that Cov(T0v) = 0 8 θ, 8 v 2 U0. We have to prove that T0 is
an MVUE of θ. Let T be an estimator belonging to U, then (T0 − T) 2 U0.
∴ From the given condition, Cov(T0, T0 − T) = 0
 
) V ðT 0 ÞCovðT 0 ; T Þ ¼ 0 ) Cov T 0; T ¼ V ðT 0 Þ ð1:10Þ

Now, V ðT 0 T Þ 0

) V ðT 0 Þ þ V ðT Þ  2 CovðT 0 ; T Þ 0
) V ðT 0 Þ þ V ðT Þ  2 V ðT 0 Þ 0 ðby ð1:10ÞÞ
) V ðT Þ V ðT 0 Þ 8h 2 H:

Since T is an arbitrary member of U so that result.


Result 1.2 Minimum-variance unbiased estimator is unique.
Proof Suppose T1 and T2 are MVUE’s of θ.
Then

E fT 1ðT 1T 2 Þg ¼ 0 ðfrom Result 1.1Þ


) E T 21 ¼ E ðT 1 T 2 Þ ) qT 1 T 2 ¼ 1

as V(T1) = V(T2) 8θ ⇒ T1 = βT2 + α with probability 1.


Now V ðT 1 Þ ¼ b2 V ðT 2 Þ 8h ) b2 ¼ 1 ) b ¼ 1 ðas qT 1 T 2 ¼ 1Þ:
Again EðT 1 Þ ¼ bE ðT 2 Þ þ a 8h )a ¼ 0
as E ðT 1 Þ ¼ EðT 2 Þ ¼ h and b ¼ 1 ) P ðT 1 ¼ T 2 Þ ¼ 1 h
Remark Correlation coefficient between T1 and T2 (where T1 is an MVUE of θ and
T2 is any unbiased estimator of θ) is always non-negative.

E fT 1 ðT 1 T 2 Þg ¼ 0 . . . ðfrom Result 1:1Þ


) Cov ðT 1 ; T 2 Þ ¼ V ðT 1 Þ 0:
26 1 Theory of Point Estimation

Result 1.3 Let T1 be an MVUE of γ1(θ) and T2 be an MVUE of γ2(θ). Then


αT1 + βT2 will be an MVUE of αγ1(θ) + βγ2(θ).
Proof Let v be an u.e. of zero.
Then

EðT 1 vÞ ¼ 0 ¼ E ðT 2 vÞ

Now

EfðaT 1 þ bT 2 Þvg ¼ aEðT 1 vÞ þ bEðT 2 vÞ ¼ 0


) ðaT 1 þ bT 2 Þ is an MVUE of ac1 ðhÞ þ bc2 ðhÞ:
h
Result 1.4: (Rao–Cramer inequality) Let X1, X2,…, Xn be a random sample from
a population having p.d.f. f ðx; hÞ; h 2 H. Assume that θ is a non-degenerate open
interval on the real line. Let
 Q T be an unbiased
 estimator of γ(θ). Again assume that
the joint p.d.f. f ð x ; hÞ ¼ ni¼ 1 f ðxi ; hÞ of X ¼ ðX1 ; X2 ; . . .; Xn Þ satisfies the
 
following regularity conditions:
@f ð x ;hÞ

(a) @h exists
@
R R @f ð x ;hÞ
(b) @hf ð x ; hÞd x ¼ @h d 
x
 
R
R @f ð x ;hÞ
@
(c) @h T x f ð x ; hÞd x ¼ Tð x Þ @h d x
    
and
(d) 0 < I(θ) < ∝

@ log f ð X ;hÞ 2
where IðhÞ ¼ E @h

, information on θ supplied by the sample of size n.
Then

ðc0 ðhÞÞ2
V ðT Þ 8h:
IðhÞ
R
Proof Since 1 ¼ Rn f ð x ; hÞd x
 
∴ We have from the condition (b)



Z @f x ; h Z @ log f x ; h

 
O ¼ dx ¼ f x;h dx ð1:11Þ
@h  @h  
n n
R R
h
1.3 Unbiased Estimator and Minimum-Variance Unbiased Estimator 27
 
Again, since T X is an u.e. of γ(θ), we have from condition (c)


Z @ log f ð x ; hÞ

c0 ðhÞ ¼ Tð x Þ f ð x ; hÞd x ð1:12Þ
 @h  
n
R

Now, (1.12)–(1.11). γ(θ) gives us


Z @ log f ð x ; hÞ

c0 ðhÞ ¼ ½Tð x Þ  cðhÞ f ð x ; hÞd x
 @h  
n
R
 )
@ log f ð X ; hÞ

¼ Cov Tð X Þ; :
 @h

From the result, [Cov(X, Y)]2 ≤ V(X) V(Y), we have


" ( )# 2
@ log f ð X ; hÞ
0 2 
fc ðhÞg ¼ Cov Tð X Þ;
 @h
( )
@ log f ð X ; hÞ

 VfTð X Þg  V
 @h
 !2 !
@ log f ð X ; hÞ @ log f ð X ; hÞ
 
¼ V Tð X Þ  E as from ð1:11Þ E ¼ 0
 @h @h
¼ V fT ð X Þ g  I ð hÞ
fc 0 ð hÞ g2
) V ðT Þ ; 8 h: h
I ðhÞ

Remark 1 If the variables are of discrete type, the underlying condition and the
proof of the Cramer–Rao inequality will also be similar, only the multiple integrals
being replaced by multiple sum.
Remark 2 For any set of estimators T, having expectation γ(θ),

fc0 ðhÞg2
MSE ¼ EðT  hÞ2 ¼ V ðT Þ þ B2 ðT; hÞ þ B2 ðT; hÞ ¼
IðhÞ
½1 þ B0 ðT; hÞ2
þ B2 ðT; hÞ ½where cðhÞ ¼ h þ BðT; hÞ:
IðhÞ
28 1 Theory of Point Estimation


Remark 3 Assuming that f x ; h is differentiable not only once but also twice, we

have

Z @ 2 log f ð x ; hÞ Z (@ log f ð x ; hÞ)2


 
0¼ f ð x ; hÞd x þ f ð x ; hÞd x
@h2   @h  
n n
R R
( )
@ 2 log f ð x ; hÞ

) IðhÞ ¼ E :
@h2

Remark 4 Since X1, X2,…, Xn are iid random variables,


( )
@ 2 log f ð x ; hÞ

IðhÞ ¼ n E :
@h2

Remark 5 An estimator T for which the Cramer–Rao lower bound is attained is


often called a minimum-variance bound estimator (MVBE). In this case, we have

@ log f ð x ; hÞ

¼ kðhÞfT  cðhÞg:
@h

Note that every MVBE is an MVUE, but the converse may not be true.
Remark 6 Distributions admitting an MVUE
A distribution having an MVUE of λ(θ) must satisfy
@ log f ðx;hÞ
@h ¼ kðhÞfT  cðhÞg. It is a differential equation. So
Z Z
log f ðx; hÞ ¼ T kðhÞ dh  kðhÞ cðhÞdh þ cðxÞ

) f ðx; hÞ ¼ AeTh1 þ h2

where hi ; i ¼ 1; 2 are functions of h and A ¼ ecðxÞ


Note If T be a sufficient statistic for θ, then

L ¼ gðT; hÞ hðx1 ; x2 ; . . . ; xn Þ

@ log L @ log gðT;hÞ


or; ¼ ð1:13Þ
@h @h

which is a function of T and h.


Now the condition that T be an MVB unbiased estimator of h is that / ¼
@ log L
@h ¼ BðThÞ which is a linear function of T and h and V(T) = 1/B. Thus if there
exists an MVB unbiased estimator of h it is also sufficient. The converse is not
1.3 Unbiased Estimator and Minimum-Variance Unbiased Estimator 29

necessarily true. Equation (1.13) may be a non-linear functions of T and h in which


case T is not an MVB unbiased estimator of h. Thus the existence of MVB unbiased
estimator implies the existence of a sufficient estimator, but the existence of suf-
ficient statistic does not necessarily imply the existence of an MVB unbiased
estimator. It also follows that the distribution possessing an MVB unbiased esti-
mator for its parameter can be expressed in Koopman form. Thus, when
0
L ¼ eA T þ BðhÞ þ Rðx1 ;x2 ;...xn Þ , T is an MVB unbiased estimator of h with variance
0
1=ð@A
@h Þ, which is also MVB.

Example x1 ; x2 ; . . .; xn is a random sample from Nðl; 1Þ


Pn Pn

n 12 ðxi lÞ2 12 ðxi xÞ2 þ nx þ nl2 2nlx n2 log 2p
Here, L ¼ p1ffiffiffiffi e i¼1
2p
¼e i¼1

Take A0 T ¼ nlx where T ¼ x: MVB ¼ 1=ð@ðnlÞ


@l Þ ¼ n.
1

Example x1 ; x2 ; . . .; xn is a random sample


 
from b(1, π) distribution.
Pn P
Here, P P xi log p þ ðn xi Þ logð1pÞ
nx log p 1p þ n logð1pÞ
n xi
L¼p xi
i ð1  pÞ ii¼1
¼e i
¼e :
0
Take ¼ A T ¼ p
n log 1p x where T ¼ x ¼ k
n, k = number of successes in n trials.
 
@n log 1p
p
pð1  pÞ
MVB ¼ 1 ¼ :
@p n

Remark 7 A necessary condition for satisfying the regularity conditions is that the
domain of positive p.d.f. must be free from h.

2
Example 1.16 Let X  U½0; h, let us compute nE @logf@hðx;hÞ which is hn2 . So
Cramer–Rao lower bound, in this case, for the variance of an unbiased estimator of
2
h is hn (apparant).
Now, we consider an estimator. T ¼ n þn 1 X ðnÞ ; X ðnÞ ¼ maxðX1 ; X 2 ;. . .; X n Þ: P.d.f.
 n1 1
of X ðnÞ is fxðnÞ ðxÞ ¼ n hx  h ; 0  x  h:

3h
Zh
  n n x nþ1
)E X ðnÞ ¼ n xn dx ¼ n : 5 ¼ n h:
h h nþ1 nþ1
0 0

) T ¼ n þn 1 X ðnÞ is an unbiased estimator of θ. It can be shown that


2 2
VðTÞ ¼ nðnhþ 2Þ \ hn .
30 1 Theory of Point Estimation

This is not surprising because the regularity conditions do not hold here.
Actually, here @f @h
ðx;hÞ
exists for h 6¼ x but not for θ = x since

1
f ðx; hÞ ¼ if h x
h
¼0 if h \ x:

Result 1.5: Rao–Blackwell Theorem Let fF h ; h 2 Hg be a family of distribution


functions and h be any estimator of cðhÞ in U which is the class of unbiased
estimators (h) with Eh ðh2 Þ \ 1 8h:. Let T be a sufficient statistic for the family
fF h ; h 2 Hg. Then Eðh=TÞ is free from h and will be an unbiased estimator of cðhÞ.
Moreover, V fEðh=TÞg  VðhÞ 8h; h 2 H:
The equality sign holds iff h ¼ Eðh=TÞ with probability ‘1’.
Proof Since T is a sufficient for the family fF h ; h 2 Hg; conditional distribution of
h given T must be independent of h.
)Eðh=TÞ will be free from h.
 
Now; cðhÞ ¼ EðhÞ ¼ E T Eh=T ðh=TÞ 8 h
i:e: cðhÞ ¼ EfEðh=TÞg 8h

) Eðh=TÞ is an unbiased estimator of cðhÞ.


Again we know that VðhÞ ¼ V fEðhjTÞg þ EfVðhjTÞg

) VðhÞ V fEðhjTÞg ðsince VðhjTÞ 0Þ

‘=’ holds iff VðhjTÞ ¼ 0:, i.e. iff h ¼ EðhjTÞ with probability ‘1’
h i
VðhjT Þ ¼ Eh=T fh  EðhjT Þg2
h
Result 1.6: Lehmann–Scheffe Theorem If T be a complete sufficientstatistic for θ
and if h be an unbiased estimator of cðhÞ, then EðhjTÞ will be an MVUE of cðhÞ.
Proof Let both h1 ; h2 2 U ¼ ½h : EðhÞ ¼ cðhÞ; Eðh2 Þ \ 1.
Then E fEðh1 jTÞg ¼ cðhÞ ¼ E fEðh2 jTÞg (from Result 1.5).
Hence, E fEðh1 jTÞ  Eðh2 jTÞg ¼ 0. . .8h
 
) P Eðh1 jTÞ ¼ Eðh2 jTÞ ¼ 1 ð* T is completeÞ:

) EðhjTÞ is unique for any h 2 U.


Again, applying Result 1.5, we have V fEðhjTÞg  VðhÞ8 h 2 U: Now since
EðhjTÞ is unique, it will be an MVUE of cðhÞ. h
Remark 1.14 The implication of Result 1.5 is that if we are given an unbiased
estimator h, then we can improve upon h by forming the new estimatorEðhjTÞ
1.3 Unbiased Estimator and Minimum-Variance Unbiased Estimator 31

based on h and the sufficient statistic T. This process of finding an improved


estimator starting from an unbiased estimator has been called Blackwellization.
Problem 1.2 Let X 1 ; X 2 ; . . .; X n be a random sample from Nðl; r2 Þ; r2 known. Let
cðlÞ ¼ l2 .
(a) Show that the variance of any unbiased estimator of l2 cannot be less than
4l2 r2
n
2 2 4l2 r2 2r4
(b) Show that T ¼ X  rn is an MVUE of l2 with variance n þ n .


1 if X ¼ 0
Example 1.17 Let X  PðkÞ, then show that dðxÞ ¼ is the only
0 otherwise
unbiased estimator of cðkÞ ¼ ek . Is it an MVUE of ek ?
Answer
Let h(x) be an unbiased estimator of ek ¼ h, say.
Then E fhðxÞg ¼ h 8h
 x
X
1
h loge 1h
) hð x Þ ¼h 8h
x¼0
x!

1 if x ¼ 0
) hð x Þ ¼
0 if x 6¼ 0
) hðxÞ; i:e:; dðxÞ is the only unbiased estimator of ek .
Here, unbiased estimator of ek is unique and its variance exists. Therefore, dðxÞ
will be an MVUE of cðkÞ¼ ek .
" #
X
1
2
E fhðxÞg ¼ 1:Pðx ¼ 0Þ þ 0: Pðx ¼ iÞ ¼ ek \ 1
i¼1

Remark 1.15 MVUE may not be very sensible.

Example Let X 1 ; X 2 ; . . .; X n be a random sample from N ðl; 1Þ, and then T ¼




2 2
X  1n is an MVUE of l2 . Note that X  1n may occasionally be negative, so that
an MVUE of l2 is not very sensible in this case.
Remark 1.16 An MVUE may not exist even though an unbiased estimator does
exist.
Example Let
PfX ¼ 1g ¼ h and PfX ¼ ng ¼ ð1  hÞ2 hn ; n ¼ 0; 1; 2; . . .; 0\ h \ 1.
32 1 Theory of Point Estimation

No MVUE of θ exists even though an unbiased estimator of θ exists.



1 if X ¼ 1
e.g. T ðX Þ ¼
0 otherwise

Bhattacharya system of Lower Bounds (Sankhya A (1946))


(Generalization of Cramer–Rao lower bound)
Regularity conditions
A family of distribution P ¼ ff h ðxÞ; h 2 Xg is said to satisfy Bhattacharya
regularity conditions if
1. θ lies in an open interval Ω of real line R. Ω may be infinite;
@i
2. @h i f h ðxÞ exists for almost all x and 8h; i ¼ 1; 2;. . .k;

Z Z i
@i @
3: f h ðxÞdx ¼ f h ðxÞdx 8h; i ¼ 1; 2. . .k; and
@h i
@hi
  i ¼ 1; 2;. . .k
4. V k
k ðhÞ ¼ mij ðhÞ ;
j ¼ 1; 2;. . .k
exists and is positive definite 8h where

1 @i @j
mij ðhÞ ¼ E h f ð xÞ f ð xÞ :
f h ðxÞ @hi h @h j h

For i = 1, Bhattacharya regularity conditions ≡ Cramer–Rao regularity


conditions.
Theorem 1.1 Let P ¼ff h ðxÞ; h 2 Xg be a family of distributions satisfying
above-mentioned regularity conditions and gðhÞ be a real valued, estimable, and
k times differentiable function of θ. Let T be an unbiased estimator of gðhÞ satisfying
R R
5. @h@ i tðxÞf h ðxÞdx ¼ tðxÞ @h@ i f h ðxÞdx
Then

Varh ðT Þ g0 V 1 g 8h

where
n o @i
g0 ¼ gð1Þ ðhÞ; gð2Þ ðhÞ; . . .; gðkÞ ðhÞ ; gðiÞ ðhÞ ¼ i gðhÞ
@h
1.3 Unbiased Estimator and Minimum-Variance Unbiased Estimator 33

Proof

1 @i
Define bi ðx;hÞ ¼ f ð xÞ
f h ðxÞ @hi h
Z
1 @i
E ½bi ðx;hÞ ¼ f ðxÞ  f h ðxÞdx ¼ 0
f h ðxÞ @hi h
2
1 @i
V ½bi ðx;hÞ ¼ E½bi ðx;hÞ2 ¼ E f ð x Þ ¼ mii ðhÞ
f h ðxÞ @hi h
 
Cov bi ; bj ¼ mij ðhÞ
Z Z
1 @i @i
CovðT;bi Þ ¼ t ð xÞ f h ðxÞ : f h ðxÞdx ¼ tðxÞf h ðxÞdx ¼ gðiÞ ðhÞ
f h ðxÞ @h i
@hi
0 1
T
B C
B 1C
b
B C
B b2 C
B C
B C
Let Rðk þ 1Þxðk þ 1Þ ¼ DispB  C
B C
B  C
B C
B C
@  A
bk
8 ð1Þ ð2Þ 9
>
> V h ðT Þ g ð hÞ g ð hÞ . . . gð k Þ ð hÞ >
>
>
> gð1Þ ðhÞ >
>
>
> m11 m12 ... m1k > >
>
>
>
< gð2Þ ðhÞ >
=  
m21 m22 ... m 21 V h ðT Þ g0
¼ ¼
> 
>   ... ... >
> g V
>
> >
>
>
> >
>
>    ... ... >
>
>
: ðkÞ ;
g ð hÞ mk1 mk2 ... mkk
 
jRj ¼ jV j V h ðT Þ  g0 V 1 g
as jRj 0; jV j 0
) V h ðT Þ  g0 V 1 g 0 i:e: V h ðT Þ g0 V 1 g 8h:

0 2 0 2
Cor: For k ¼ 1 V h ðTÞ fvg11ðhÞg fg ðhÞg
ðhÞ ¼ IðhÞ = Cramer–Rao lower bound. h
P
Case of equality holds when j j ¼ 0

X
X h
X Xi
)R \ k þ 1 or R  k; RðVÞ ¼ k R =rank of

X
X
R RðVÞ ) R ¼ k:
34 1 Theory of Point Estimation

 0 P
Lemma 1.1 Let X ¼ x1 ; x2 ; . . .; xp ; DðXÞ ¼ p
p
P
is of rank rð  pÞ iff x1 ; x2 ; . . .; xp satisfies (p − r) linear restrictions of the
form
    n o
a11 x1  Eðx1 Þ þ a12 x2  Eðx2 Þ þ    þ a1p xp  Eðxp Þ ¼ 0
    n o
a21 x1  Eðx1 Þ þ a22 x2  Eðx2 Þ þ    þ a2p xp  Eðxp Þ ¼ 0
:
:
    n o
apr;1 x1  Eðx1 Þ þ apr;2 x2  Eðx2 Þ þ    þ apr;p xp  Eðxp Þ ¼ 0

with probability 1.
Put p = k + 1, r = k; x1 ¼ T, x2 ¼ b1 ; . . .; xp ¼ bk .
Then RðRÞ ¼ k iff T; b1 ; b2 ; . . .; bk satisfy one restriction with probability ‘1’ of
the form

a1 fT  EðTÞg þ a2 fb1  Eðb1 Þg þ    þ ak þ 1 fbk  Eðbk Þg ¼ 0


) a1 fT  gðhÞg þ a2 b1 þ    þ ak þ 1 bk ¼ 0
) T  gðhÞ ¼ b1 b1 þ b2 b2 þ    þ bk bk ¼ b 0 b
 

where b 0 ¼ ðb1 ; b2 ;. . .; bk Þ and b ¼ ðb1 ; b2 ; . . .; bk ; Þ0 .


 

Result
T  gðhÞ ¼ b0 b with probability ‘1’ ) T  gðhÞ ¼ g0 V 1 b with probability ‘1’.
Proof

T  gðhÞ ¼ b0 b ) V h ðTÞ ¼ g0 V 1 g
 
Consider V h b0 b  g0 V 1 b ¼ V h ðT  g0 V 1 bÞ
¼ V h ðTÞ þ g0 V 1 VðbÞV 1 g  2g0 V 1 CovðT; bÞ

¼ g0 V 1 g þ g0 V 1 g  2g0 V 1 g ¼ 0 ) b0 b ¼ g0 V 1 b with probability


0 ‘1’.
1 h
gð1Þ
B gð2Þ C
 ð1Þ ð2Þ  1 B
B
C
C
A series of lower bounds: g V g ¼ g ; g ; . . .; g V B : C gives
0 1 ðkÞ
B C
@ : A
gðkÞ
1
nth lower bound ¼ gðnÞ 0 V n gðnÞ ¼ Dn ; n ¼ 1; 2;::; k
1.3 Unbiased Estimator and Minimum-Variance Unbiased Estimator 35

Theorem 1.2 The sequence fDn g is a non-decreasing sequences, i.e. Dn þ 1 Dn 8n


Proof The (n+1)th lower bound D ¼ g0n þ 1 V n þ 1 1 gn þ 1 where
n o n þ1 
g0n þ 1 ¼ gð1Þ ðhÞ ; gð2Þ ðhÞ ; . . .; gðnÞ ðhÞ ; gðn þ 1Þ ðhÞ ¼ g1n ; gn þ 1
0 1
m11 m12 m1n m1;n þ 1
B m21 m22 m2n m2;n þ 1 C
B C  
B: : : : C mn
Vnþ1 ¼B C ¼ V0n
B: : : : C mn mn þ 1;n þ 1
B C
@ mn1 mn2 :: mnn mn;n þ 1 A
mn þ 1;1 mn þ 1;2 :: mn þ 1;n mn þ 1;n þ 1


where m0n ¼ mn þ 1;1 mn þ 1;2 . . . mn þ 1;n .

Now Dn þ 1 ¼ g0n þ 1 V 1
n þ 1 gn þ 1

¼ g0n þ 1 C 0 ðC 0 Þ1 V 1 1
n þ 1 C Cgn þ 1 for any non symmetric matrix C n þ 1xn þ 1
0 0 1
¼ ðCgn þ 1 Þ ðCVn þ 1 C Þ ðCgn þ 1 Þ

!
In o

Choose C ¼
m0n V 1 1
n
!
In o  g   gn

 n
) Cgn þ 1 ¼ ¼
m0n V 1
n 1 gn þ 1 gn þ 1  m0n V 1
n gn
! ! !
In o Vn mn I n V 1n mn
CVn þ 1 C 0 ¼ 
m0n V 1 1 m0n mn þ 1;n þ 1; o 1
n 
! ! !
In o Vn o Vn o
  
¼ ¼
m0n V 1
n 1 m0n mn þ 1;n þ 1  m0n V 1
n mn
o

E n þ 1;n þ 1

Since V n þ 1 is positive definite, CV n þ 1 C0 is also +ve definite h


!
V 1
n o
0 1 
E n þ 1;n þ 1; [ 0; ðCV n þ 1 C Þ ¼
o E 1
n þ 1;n þ 1

36 1 Theory of Point Estimation

Then
0 1
V 1 o  
  n  g0n
Dn þ 1 ¼ gn ; g nþ1
 m0n V 1 @ A
n gn
o E 1 gn þ 1  m0n V 1
n gn
 n þ 1;n þ 1
n   1 o g0n

¼ gn V n 1 ; gn þ 1  m0n V 1
n gn E n þ 1;n þ 1
gn þ 1  m0n V 1
n gn
 2
gn þ 1  m0n V 1
n gn
¼ g0n V n 1 gn þ g0n V n 1 gn ¼ Dn
E1
n þ 1;n þ 1

i.e. Dn þ 1 Dn :
If there exists no unbiased estimator T of gðhÞ for which V(T) attains the nth
Bhattacharya’s Lower Bound (BLB), then one can try to find a sharper lower bound
by considering the (n + 1)th BLB. In case the lower bound is attained at nth stage,
then Dn þ 1 ¼ Dn . However, Dn þ 1 ¼ Dn does not imply that the lower bound is
attained at the nth stage.
Example 1.18 X 1 ; X 2 ; . . .; X n is a random sample from iid Nðh; 1Þ
P 2
f h ðxÞ ¼ Const: e2 ðxi hÞ ; gðhÞ ¼ h2
1

 
1 1
X  N h; i.e. EðXÞ ¼ h; VðXÞ ¼
n n

2 1 2 1
) EðX Þ  E2 ðXÞ ¼ ) EðX Þ ¼h2 þ
  n n
2 1 2 1
)E X  ¼ h2 ; T ¼X  :
n n

@ P 2 X
f h ðxÞ ¼ Const: e2 ðxi hÞ 
1
ð x i  hÞ
@h
1 @ X
b1 ¼ f h ðxÞ ¼ ð x i  hÞ
f h ðxÞ @h
1 @2 nX o2
b2 ¼ f ðxÞ ¼ ð x i  h Þ n
f h ðxÞ @h2 h
 
Eðb1 Þ ¼ 0; Eðb2 Þ¼ 0; E b21 ¼ n;
nX o3 nX o
Eðb1 b2 Þ ¼ E ðxi  hÞ  nE ðxi  hÞ ¼ 0
  nX o4 nX o2
E b22 ¼ E ðxi  hÞ þ n2 2nE ð x i  hÞ
¼ 3n2 þ n2  2n  n ¼ 2n2
1.3 Unbiased Estimator and Minimum-Variance Unbiased Estimator 37

  1 
n 0 1 0
V¼ )V ¼ n
1
0 2n2 0 2n2

gðhÞ ¼ h2 ; gð1Þ ðhÞ ¼ 2h; gð2Þ ðhÞ ¼ 2 )g0 ¼ ð2h; 2Þ


!  !
1
0 2h 2h=
D2 ¼ g V0 1
g ¼ ð2h; 2Þ n
1
¼ ð2h; 2Þ  n
0 2 1 2
2n2 n
4h2 2
¼ þ 2:
n n
2
 2  2
 2  v21;k ; k ¼ nh2 ; V nX
nX  ¼ 2 þ 4h
 ¼ 2 þ 4nh2 ; V X
n2 n
Lower bound is attained if b0 b ¼ T  gðhÞ ¼ g0 V 1 b:
! P
1
0 ð x i  hÞ
T  gðhÞ ¼ ð2h; 2Þ n
P
0 1
2n2 ½ ðxi  hÞ2 n

x  h 1
¼ ð2h; 2Þ 1 ¼ 2hðx  hÞ þ ðx  hÞ2 
2 ðx  hÞ2  2n
1 n
1 1
¼ ðx  hÞð2h þ x  hÞ  ¼ x2  h2  :
n n

Theorem 1.3 Let fh ðxÞ is of the exponential, i.e.

fh ðxÞ ¼ hðxÞek1 ðhÞtðxÞ þ k2 ðhÞ such that k10 ðhÞ 6¼ 0: ð1:14Þ

Then the variance of an unbiased estimator of gðhÞ; say b


g ðxÞ, attains the kth lower
bound but not ðk  1Þth if b
g ðxÞ is a polynomial of degree k in t(x).
Proof If fh ðxÞ is of form (1.14), then

@  
fh ðxÞ ¼ fh ðxÞ k10 ðhÞtðxÞ þ k20 ðhÞ
@h
1 @
b1 ¼ fh ðxÞ ¼ k10 ðhÞtðxÞ þ k20 ðhÞ
fh ðxÞ @h
@2 h 2  00 i
0 0 00
f h ðxÞ ¼ fh ðxÞ k 1 ðhÞtðxÞ þ k 2 ðhÞ þ k 1 ðhÞtðxÞ þ k 2 ðhÞ
@h2
1 @2  2  
b2 ¼ fh ðxÞ ¼ k10 ðhÞtðxÞ þ k20 ðhÞ þ k100 ðhÞtðxÞ þ k200 ðhÞ
fh ðxÞ @h 2

1 @i
 i h
Generally, bi ¼ fh ðxÞ f ðxÞ ¼ k10 ðhÞtðxÞ þ k20 ðhÞ þ Pi1 ftðxÞ; hg
@hi h
where Pi1 ftðxÞ; hg ¼ a polynomial in t(x) of degree at most (i − 1).
38 1 Theory of Point Estimation

iP
1
Let Pi1 ftðxÞ; hg ¼ Qij ðhÞ:t j ðxÞ
j¼0
Then

 i X
i1
bi ¼ k10 ðhÞtðxÞ þ k20 ðhÞ þ Qij ðhÞ:t j ðxÞ
j¼0
i1  
X j  ij Xi1 ð1:15Þ
i
¼ k10 ðhÞ :t j ðxÞ  k20 ðhÞ þ Qij ðhÞ:t j ðxÞ
j¼0 j j¼0

¼ a polynomial in tðxÞ of degree i since k10 ðhÞ 6¼ 0

Condition of equality in BLB


Variance of b
g ðxÞ attains the kth BLB but not the ðk  1Þth BLB iff

X
k
b
g ðxÞ ¼ a0 ðhÞ þ ai ðhÞbi ð1:16Þ
i¼1

with ak ðhÞ 6¼ 0:
Proof Only if part Given that b g ðxÞ is of the form (1.16), we have to show that
^
gðxÞ is a polynomial of degree k in t(x). From (1.15), bi is a polynomial of degree
i in t(x). So by putting the value of bi in (1.16), we get ^gðxÞ as a polynomial of
degree k in t(x) since ak ¼ 0.
if part Given that h

X
k
^gðxÞ ¼ Cj  t j ðxÞ
j¼0 ð1:17Þ
½Ck 6¼ 0 ¼ a polynomial of degree k in tðxÞ

It is sufficient to show that we can write ^gðxÞ in the form of (1.16)

X
k X
k X
i1
a0 ðhÞ þ ai ðhÞbi ¼ a0 ðhÞ þ ai ðhÞ Qij ðhÞ  t j ðxÞ
i¼0 i¼0 j¼0
!
X
k X
i i  j  ij
þ ai ðhÞ k10 ðhÞ t j ðxÞ k20 ðhÞ
i¼1 j¼0 j
1.3 Unbiased Estimator and Minimum-Variance Unbiased Estimator 39

from (1.15)
!
X
k   jXk i  0 ij X
k1 X k
¼ t j ðxÞ k10 ðhÞ ai ðhÞ k2 ðhÞ þ t j ðxÞ ai ðhÞQij
j¼0 i¼j j j¼0 i¼j þ 1
!
 k X i1   jXk i  0 ij X
k1 X k
¼ tk ðxÞ k10 ðhÞ ak ðhÞ þ t j ðxÞ k10 ðhÞ ai ðhÞ k2 ðhÞ þ t j ðxÞ ai ðhÞQij ðhÞ
j¼0 i¼j j j¼0 i¼j þ 1
" ! !#
 k X i1  j X k i  0  j  0 ij
¼ tk ðxÞ k10 ðhÞ ak ðhÞ þ t j ðxÞ k10 ðhÞ aj ðhÞ þ ai ðhÞ k1 ðhÞ  k2 ðhÞ þ Qij ðhÞ
j¼0 i¼j þ 1 j

ð1:18Þ

Equating coefficients of t j from (1.17) and (1.18), we get


 k
Ck ¼ ak ðhÞ k10 ðhÞ
Ck
) ak ð hÞ ¼  k 6¼ 0 and
k10 ðhÞ
! !
Pk i   j   ij
Cj  i¼j þ 1 ai ð hÞ k10 ðhÞ  k20 ðhÞ þ Qij ðhÞ
j
aj ð hÞ ¼
fk 0 ðhÞg j
for j ¼ 0; 1; . . .; k  1

As such a choice of aj ðhÞ exists with ak ðhÞ 6¼ 0, the result follows.


Result 1 If there exists an unbiased estimator of gðhÞ say ^gðxÞ such that g^ðxÞ is a
polynomial of degree k in t(x), then
Dk ¼ kth BLB to the variance of an unbiased estimator of gðhÞ ¼ Var fgðxÞg:
Result 2 If there does not exist any polynomial in t(x) which is an unbiased
estimator of gðhÞ, then it is not possible to find any unbiased estimator of gðhÞ
where variance attains BLB for some k.

1.4 Consistent Estimator

An estimation procedure should be such that the accuracy of an estimate increases


with the sample size. Keeping this idea in mind, we define consistency as follows.
Definition An estimator Tn is said to be (weakly) consistent for cðhÞ if for any two
positive numbers 2 and d there exists an n0 (depending upon 2, d) such that
PrfjTn  cðhÞj  2g [ 1  d whenever n n0 and for all h 2 H, i.e. if
Pr
Tn !cðhÞ as n ! 1
40 1 Theory of Point Estimation

Result 1.7 (Sufficient condition for consistency):


An estimator Tn will be consistent for cðhÞ if EðTn Þ ! cðhÞ and VðTn Þ ! 0 as
n ! 1.
Proof By Chebysheff’s inequality, for any 20 [ 0

VðTn Þ
PrfjTn  EðTn Þj  20 g [ 1  :
202
Now jT n  cðhÞj  jT n  E ðT n Þj þ jE ðT n Þ  cðhÞj
jT n  E ðT n Þj  20 ) jT n  cðhÞj  20 þ jEðT n Þ  cðhÞj
Hence, h
  
PrfjT n  cðhÞj  20 þ jE ðT n Þ  cðhÞjg Pr T n  EðT n Þ  20 [ 1
VðT n Þ
 : ð1:19Þ
202

Since E ðT n Þ ! cðhÞ and VðT n Þ ! 0 as n ! 1, for any pair of two positive


numbers ð200 ; dÞ, we can find an n0 (depending on ð200 ; dÞ) such that

jE ðT n Þ ! cðhÞj  200 ð1:20Þ

and

VðT n Þ  202 d ð1:21Þ

whenever n n0 . For such n0

jTn  cðhÞj  20 þ jEðTn Þ  cðhÞj ) jTn  cðhÞj  20 þ 200


V(Tn Þ ð1:22Þ
and 1 02 1  d
2

Now from (1.19) and (1.22), we have PrfjT n  cðhÞj  20 þ 200 g

PrfjT n  cðhÞj  20 þ jE ðT n Þ  cðhÞjg [ 1  d:

Taking 2 ¼ 20 þ 200
) PrfjT n  cðhÞj  2g [ 1  d whenever n n0
Since, 20 200 and d are arbitrary positive numbers, the proof is complete.
(It should be remembered that consistency is a large sample criterion)
Example 1.19 Let X 1 ; X 2 ; . . .; X n be a random sample from a population mean l
P
and standard deviation r. Then X n ¼ 1n i X i is a consistent estimator of l.
    2
Proof E X n ¼ l; V X n ¼ rn ! 0 as n ! 1. Sufficient condition of consistency
holds. ) X n will be consistent for l. h
1.4 Consistent Estimator 41

Alt
By Chebysheff’s inequality, for any 2
   r2
Pr X n  l  2 [ 1 
n 22
Now for any d, we can find an n0 so that
  
Pr X n  l  2 [ 1  d whenever n n0 (here d ¼ nr22 )
2

Example 1.20 Show that in random sampling from a normal population, the sample
mean is a consistent estimator of population mean.
   n pffiffio
Proof For any 2 ð [ 0Þ; Pr X n  l  2 ¼ Pr jZ j  2 r n

p
2 n

Zr
1 X n  l pffiffiffi
pffiffiffiffiffiffie2t dt
12
¼ where Z ¼ n  N ð0; 1Þ
pffi
2p r
2 n
r
h
Hence, we can choose an n0 depending on any two positive numbers 2 and d
such that
PrfjX  n  lj  2g [ 1  d whenever n n0
)X n Pr
!l as n ! 1 )X  n is consistent for l.

Example 1.21 Show that for random sampling from the Cauchy population with
density function
f ðx;lÞ ¼ p1 1 þ ð1xlÞ2 ;  1 \ x\ 1; the sample mean is not a consistent esti-
mator of l but the sample median is a consistent estimator of l.
Answer
Let X 1 ; X 2 ; . . .; X n be a random sample from f ðx; lÞ ¼ p1 1 þ ð1xlÞ2 : It can be shown
that the sample mean X  is distributed as x.

Z2
   1 1 2  
) Pr X n  l  2 ¼ dZ ¼ tan1 2 taking Z ¼X  l
p 1þZ 2 p
2

which is free from n.   


Since this probability does not involve n, Pr X n  l  2 cannot always be
greater than 1  d; and however large n may be.
It can be shown that for the sample median X ~ n;
   
    2
~n ¼ l þ 0 1 ;
E X ~n ¼ 0 1 þ p
V X
n n 4n
42 1 Theory of Point Estimation

   
) Since E X~ n ! l and V X ~ n ! 0 as n ! 1; sufficient condition for consistent
~
estimator holds. )X n ; is consistent for l.
Remark 1.17 Consistency is essentially a large sample criterion.
Remark 1.18 Let T n be a consistent estimator of cðhÞ and wfyg be a continuous
function. Then wfT n g will be a consistent estimator of wfcðhÞg.
Proof Since T n is a consistent estimator of cðhÞ; for any two +ve numbers 21 and d;
we can find an n0 such that h
PrfjT n  cðhÞj  21 g [ 1  d whenever n n0 :
Now wfT n g is a continuous function of T n : Therefore, for any 2, we can choose
an 21 such that

jT n  cðhÞj  21 ) jwfT n g  wfcðhÞgj  2 :

) PrfjwfT n g  wfcðhÞgj  2g PrfjT n  cðhÞj  21 g [ 1  d whenever


n n0
i.e. PrfjwfT n g  wfcðhÞgj  2g [ 1  d whenever n n0 :
Remark 1.19 A consistent estimator is not unique
For example, if T n is a consistent estimator of h, then for any fixed a and b
T 0n ¼ na
nb T n is also consistent for h.

Remark 1.20 A consistent estimator is not necessarily unbiased, e.g. U : f ðx; hÞ ¼


h ; 0 \ x\ h; consistent estimator of h is X ðnÞ ¼ max1  i  n X i : But it is not
1

unbiased.
Remark 1.21 An unbiased estimator is not necessarily consistent, e.g.
f ð xÞ ¼ 12 ejxhj ; 1 \ x \ 1.
Xð1Þ þ XðnÞ
An unbiased estimator of h is 2 ; but it is not consistent.
Remark 1.22 A consistent estimator may be meaningless,

0 if n  1010
e.g. Let Tn0 ¼
Tn if n 1010

If Tn is consistent, then Tn0 is also consistent, but Tn0 is meaningless for any practical
purpose.
Remark 1.23 If T1 and T2 are consistent estimators of c1 ðhÞ and c2 ðhÞ; then
ðiÞðT1 þ T2 Þ is consistent for c1 ðhÞ þ c2 ðhÞ and
ðiiÞT 1 T 2 is consistent for c1 ðhÞc2 ðhÞ:
Proof (i) Since T1 and T2 are consistent for c1 ðhÞ and c2 ðhÞ; we can always choose
an n0 much that
1.4 Consistent Estimator 43

PrfjT1  c1 ðhÞj  21 g [ 1  d1

and

PrfjT2  c2 ðhÞj  22 g [ 1  d2

whenever n n0
21 ; 22 ; d1 ; d2 are arbitrary positive numbers.

Now jT1 þ T2  c1 ðhÞ  c2 ðhÞj  jT1  c1 ðhÞj þ jT2  c2 ðhÞj

 21 þ 22 ¼ 2; ðsayÞ

) PrfjT1 þ T2  c1 ðhÞ  c2 ðhÞj  2g PrfjT1  c1 ðhÞj  21 ; jT2  c2 ðhÞj  22 g


PrfjT1  c1 ðhÞj  21 g þ PrfjT2  c2 ðhÞj  22 g  1

½*PðABÞ Pð AÞ þ PðBÞ  1

1  d1 þ 1  d2  1 ¼ 1  ðd1 þ d2 Þ ¼ 1  d for n n0

) PrfjT1 þ T2  c1 ðhÞ  c2 ðhÞj  2g [ 1  d for n n0

Hence, T1 þ T2 is consistent estimator of c1 ðhÞ þ c2 ðhÞ.


(ii) Again jT1  c1 ðhÞj  21 and jT2  c2 ðhÞj  22

) jT1 T2  c1 ðhÞc2 ðhÞj ¼ jfT1  c1 ðhÞgfT2  c2 ðhÞg þ T2 c1 ðhÞ þ T1 c2 ðhÞ  2c1 ðhÞc2 ðhÞj
 jfT1  c1 ðhÞgfT2  c2 ðhÞgj þ jc1 ðhÞjjT2  c2 ðhÞj þ jc2 ðhÞjjT1  c1 ðhÞj
 21 22 þ jc1 ðhÞj 22 þ jc2 ðhÞj 21 ¼2 ðsayÞ

) PrfjT1 T2  c1 ðhÞc2 ðhÞj  2g PrfjT1  c1 ðhÞj  21 ; jT2  c2 ðhÞj  22 g


PrfjT1  c1 ðhÞj  21 g þ PrfjT2  c2 ðhÞj  22 g  1
1  d1 þ 1  d2  1 ¼ 1  ðd1 þ d2 Þ ¼ 1  d whenever n n0

) T1 T2 is consistent for c1 ðhÞc2 ðhÞ: h


Example 1.22 Let X1 ; X2 ; . . .; Xn be a random sample from the distribution of X
for which the moments of order 2r ðl02r Þ exist. Then show that
P
n
(a) m0r ¼ 1n Xir is a consistent estimator of l0r , and
P 1
(b) mr ¼ 1n ðXi  X  Þr is a consistent estimator of lr . These can be proved using
the following results.
44 1 Theory of Point Estimation

    l0  l02
As E m0r ¼ l0r and V m0r ¼ 2r r
n
  
)V m0r ! 0 as n ! 1 )m0r is consistent for l0r and E ðmr Þ ¼ lr þ 0 1n
 
1  1
V ðm r Þ ¼ l2r  l2r  2rlr1 lr þ 1 þ r 2 l2r1 l2 þ 0 2 ! 0 as n ! 1
n n
)mr is consistent for lr :
l2
(c) Also it can be shown that b1 and b2 are consistent estimators of b1 ¼ l33 and
2
b2 ¼ ll42 :
2

1.5 Efficient Estimator

Suppose the regularity conditions hold for the family of distribution


ff ðx; hÞ; h 2 Hg: Let an unbiased estimator of cðhÞ be T. Then the efficiency of T is
given by
.
fc 0 ð hÞ g2
I ð hÞ
V ðT Þ

It is denoted by eff. (T)/or e(T). Clearly, 0  eðT Þ  1:


An estimator T will be called (most) efficient if eff(T) = 1. An estimator T of cðhÞ
is said to be asymptotically efficient if E ðT Þ ! cðhÞ and eff (T) ! 1 as n ! 1:
Let T1 and T2 be two unbiased

estimators of cðhÞ. Then the efficiency of T1
relative to T2 is given by eff. T1=T2 ¼ VV ððTT21 ÞÞ :

Remark 1.24 An MVBE will be efficient.


Remark 1.25 In many cases, MVBE does not exist even though the family satisfies
the regularity conditions. Again in many cases, the regularity conditions do not
hold. In such cases, the above definition fails. If MVUE exists, we take it as an
efficient estimator.
Remark 1.26 The efficiency measure has an appealing property of determining the
relative sample sizes needed to attain the same precision of estimation as measured
by variance.
e.g.: Suppose an estimator T1 is 80 % efficient and V ðT1 Þ ¼ nc ; where c depends
upon h. Then, V ðT0 Þ ¼ 0:8 nc : Thus the estimator based on a sample of size 80 will
be just as good as an estimator T1 based on a sample of size 100.
1.5 Efficient Estimator 45

Example 1.23 Let T1 and T2 be two unbiased estimators of h with efficiency e1 and
e2 , respectively. If q denotes the correlation coefficient between T1 and T2 , then
pffiffiffiffiffiffiffiffiffi pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi pffiffiffiffiffiffiffiffiffi pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
e1 e2  ð1  e1 Þð1  e2 Þ  q  e1 e2 þ ð1  e1 Þð1  e2 Þ

Proof For any real ‘a’, T ¼ aT 1 þ ð1  aÞT 2 will also be an unbiased estimator of
h. Now
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
V ðT Þ ¼ a2 V ðT 1 Þ þ ð1  aÞ2 V ðT 2 Þ þ 2að1  aÞq V ðT 1 ÞV ðT 2 Þ:

Suppose T0 be an MVUE of h with variance V0 . Then V ðT Þ V0

V0 V0 V0
) a2 þ ð1  aÞ2 þ 2að1  aÞq pffiffiffiffiffiffiffiffiffi V0
e1 e2 e1 e2
     
1 1 2q 1 q 1
) a2 þ  pffiffiffiffiffiffiffiffiffi  2a  pffiffiffiffiffiffiffiffiffi þ  1 0
e1 e2 e1 e2 e2 e1 e2 e2
!2 !2
pffiffiffiffiffiffi pffiffiffiffiffiffi
e2  e1 e2
q
e2  e1 e2
q
e2  1
1 1 1
) a 1 þ 1  1 0
e1 þ e2  e1 e2
1 p2qffiffiffiffiffiffi 1 ffiffiffiffiffiffi
e1 þ e2  e1 e2
p2q
e1 þ e2  e1 e2
1 p2qffiffiffiffiffiffi

1 pqffiffiffiffiffi
e2  e1 e2
Taking a ¼ 1 ffiffiffiffiffi, we get
þ e1 pe2qe
e1 2 1 2

    2
1 1 1 2q 1 q
1 þ  pffiffiffiffiffiffiffiffiffi   pffiffiffiffiffiffiffiffiffi 0
e2 e1 e2 e1 e2 e2 e1 e2
 
p ffiffiffiffiffiffiffiffi
ffi e 1 e1
) q2 þ 2q e1 e2  þ ð1  e 2 Þ 1 þ 0
e2 e2
pffiffiffiffiffiffiffiffiffi
) q2  2q e1 e2  1 þ e1 þ e2  0
pffiffiffiffiffiffiffiffiffi 2
) ð q  e1 e2 Þ  ð 1  e1 Þ ð 1  e2 Þ  0
 pffiffiffiffiffiffiffiffiffi pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
) q  e1 e2   ð1  e1 Þð1  e2 Þ Hence, the result. h
Remark 1.27 The correlation coefficient between T and the most efficient estimator
pffiffiffi
is e where e is the efficiency of the unbiased estimator T. Put e2 ¼ e and e1 = 1 in
 pffiffiffiffiffiffiffiffiffi pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
the above inequality; q  e1 e2   ð1  e1 Þð1  e2 Þ and easily we get the
result.
Chapter 2
Methods of Estimation

2.1 Introduction

In chapter one, we have discussed different optimum properties of good point


estimators viz. unbiasedness, minimum variance, consistency and efficiency which
are the desirable properties of a good estimator. In this chapter, we shall discuss
different methods of estimating parameters which are expected to provide estima-
tors having some of these important properties. Commonly used methods are:
1. Method of moments
2. Method of maximum likelihood
3. Method of minimum v2
4. Method of least squares
In general, depending on the situation and the purpose of our study we apply any
one of the methods that may be suitable among the above-mentioned methods of
point estimation.

2.2 Method of Moments

The method of moments, introduced by K. Pearson is one of the oldest methods of


estimation. Let (X1, X2,…Xn) be a random sample from a population having p.d.f.
(or p.m.f) f(x,θ), θ = (θ1, θ2,…, θk). Further, let the first k population moments about
zero exist as explicit function of θ, i.e. l0r ¼ l0r ðh1 ; h2 ; . . .; hk Þ; r = 1, 2,…,k. In the
method of moments, we equate k sample moments with the corresponding popu-
lation moments. Generally, the first k moments are taken because the errors due to
sampling increase with the order of the moment. Thus, we get k equations
l0r ðh1 ; h2 ; . . .; hk Þ; ¼ m0r ; r = 1, 2,…, k. Solving these equations we get the method
P P
of moment estimators (or estimates) as m0r ¼ 1n ni¼1 X ri (or m0r ¼ 1n ni¼1 xri ).

© Springer India 2015 47


P.K. Sahu et al., Estimation and Inferential Statistics,
DOI 10.1007/978-81-322-2514-0_2
48 2 Methods of Estimation

If the correspondence between l0r and θ is one-to-one and the inverse function is
 
hi ¼ f i l01 ; l02 ; . . .; l0k , i = 1, 2,.., k then, the method of moment estimate becomes
 
^
hi ¼ f i m01 ; m02 ; . . .; m0k . Now, if the function fi() is continuous, then by the weak
law of large numbers, the method of moment estimators will be consistent.
This method gives maximum likelihood estimators when f(x, θ) = exp
(b0 + b1x + b2x2 + ….) and so, in this case it gives efficient estimator. But the
estimators obtained by this method are not generally efficient. This is one of the
simplest methods. Therefore, these estimates can be used as a first approximation to
get a better estimate. This method is not applicable when the theoretical moments
do not exist as in the case of Cauchy distribution.
Example 2.1 Let X1 ; X2 ; . . .Xn be a random sample from p.d.f.
f ðx; a; bÞ ¼ bða;b
1
Þx
a1
ð1  xÞb1 ; 0\x\1; a; b [ 0: Find the estimators of a and
b by the method of moments.
Solution

We know E ð xÞ ¼ l11 ¼ a þa b and Eðx2 Þ ¼ l12 ¼ ða þ baÞððaaþþ1bÞ þ 1Þ :


P
Hence, a þa b ¼ x; ða þ abðÞaðaþþ1bÞ þ 1Þ ¼ 1n ni¼1 x2i
P 2
ðx  1Þð xi  nxÞ ^
By solving, we get ^b ¼ P 2 and ^a ¼ 1xb x :
ðxi  xÞ

2.3 Method of Maximum Likelihood

This method of estimation is due to R.A. Fisher. It is the most important general
method of estimation. Let X ¼ ðX 1 ; X 2 ; . . .; X n Þ denote a random sample with joint
    
p.d.f or p.m.f. f x ; h ; h 2 H (θ may be a vector). The function f x ; h , con-
 
sidered as a function of θ, is called the likelihood function. In this case, it is denoted
by L(θ). The principle of maximum likelihood consists of choosing an estimate, say
^
h; within the admissible range of θ, that maximizes the likelihood. ^h is called the
maximum likelihood estimate (MLE) of θ. In other words, ^h will be an MLE of θ if
 
L h^  LðhÞ8 h 2 H:

In practice, it is convenient to work with logarithm. Since log-function is a


monotone function, ^h satisfies
2.3 Method of Maximum Likelihood 49

 
log L ^h  log LðhÞ8 h 2 H:

Again, if log LðhÞ is differentiable within H and ^h is an interior point, then ^h will
be the solution of

@log LðhÞ 0
¼ o; i ¼ 1; 2; . . .; k; h k1 = ðh1 ; h2 ; . . .; hk Þ .
@hi

These equations are known as likelihood equations.


Problem 2.1 Let ðX 1 ; X 2 ; . . .; X n Þ be a random sample from b(m, p ), (m known).
Pn
^¼ mn
Show that p 1
i¼1 X i is an MLE of p.

Problem 2.2 Let ðX 1 ; X 2 ; . . .; X n Þ be a random sample from P (λ). Show that


^ P
k ¼ 1n ni¼1 X i is an MLE of λ.

  2.3 Let ðX 1 ; X 2 ; . . .; X n Þ be a random


Problem
P
sample from N ðl; r2 Þ. Show that
P
 s2 is an MLE of ðl; r2 Þ, where X
X;  ¼ 1 n X i and s2 ¼ 1 n ðX i  X  Þ2 :
n i¼1 n i¼1

Example 2.2 Let ðX 1 ; X 2 ; . . .; X n Þ be a random sample from a population having


p.d.f f ðx; hÞ ¼ 12 ejxhj ; 1\x\1:
Show that the sample median X ~ is an MLE of θ.

Answer
Pn
LðhÞ ¼ Const: e i¼1
jxi hj

P
Maximization of L(θ) is equivalent to the minimization of ni¼1 jxi  hj: Now,
Pn
~
i¼1 jxi  hj will be least when h ¼ X; the sample median as the mean deviation
about the median is least. X ~ will be an MLE of θ.
Properties of MLE
(a) If a sufficient statistic exists, then the MLE will be a function of the sufficient
statistic.
n   o
Proof Let T be a sufficient statistic for the family f X ; h ; h 2 H

Qn n   o  
By the factorisation theorem, we have f ð x i ; hÞ ¼ g T X ; h h X :
 
n   i¼1 o n   o
To find MLE, we maximize g T x ; h with respect to h. Since g T X ; h
  

is a function of h and x only through T X ; the conclusion follows immediately.h


 
50 2 Methods of Estimation

Remark 2.1 Property (a) does not imply that an MLE is itself a sufficient statistic.
Example 2.3 Let X1, X2,…,Xn be a random sample from a population having p.d.f.
 
1 8 hxhþ1
f X; h ¼ :
 0 Otherwise

1 if h  MinX i  MaxX i  h þ 1
Then, LðhÞ ¼ :
0 Otherwise
Any value of θ satisfying MaxX i  1  h  MinX i will be an MLE of θ. In
particular, Min Xi is an MLE of θ, but it is not sufficient for θ. In fact, here
ðMinXi ; MaxXi Þ is a sufficient statistic.
(b) If T is the MVBE, then the likelihood equation will have a solution T.
 
@ log f X ;h

Proof Since T is an MVBE, ¼ ðT  hÞkðhÞ
  @h
@ log f X ;h

Now, @h ¼0

) h ¼ T ½* kðhÞ 6¼ 0:

(c) Let T be an MLE of θ and d ¼ wðhÞ be a one-to-one function of θ. Then,


d ¼ wðT Þ will be an MLE of d. h
n  o
Proof Since T is an MLE of θ, L T X  LðhÞ8h;

Since the correspondence between θ and d is one-to-one, inverse function must
exist. Suppose the inverse function is h ¼ w1 ðdÞ:
 
Thus, LðhÞ ¼ L w1 ðdÞ ¼ L1 ðdÞ (say)
Now, 8 0
2 19 3
 1  <   = n  o
L1 ðd Þ ¼ L w ðd Þ ¼ L4w1 w@T X A 5¼ L T X  LðhÞ ¼ L1 ðdÞ.
:  ; 

Therefore, ‘d’ is an MLE of d.
(d) Suppose the p.d.f. (or p.m.f.) f(x, θ) satisfies the following regularity
conditions:

(i) For almost all x, @f ð@h


x; hÞ @ f ðx; hÞ @ f ðx; hÞ
2 3
; ; @h3 exists 8 h 2 H.
2 @h 3
2

@f ðx; hÞ @ f ðx; hÞ
(ii) @h \A1 ðxÞ; @h2 \A2 ðxÞ and @ f@hðx;3 hÞ \BðxÞ;
where A1(x) and A2(x) are integrable functions of x and
R1
BðxÞf ðx; hÞdx \ M; a finite quantity
1
R1  2
@ log f ðx; hÞ
iii) @h f ðx; hÞdx is a finite and positive quantity.
1

If ^
hn is an MLE of θ on the basis of a sample of size n, from a population having
p.d.f. (or p.m.f.) f(x,θ) which satisfies the above regularity conditions, then
2.3 Method of Maximum Likelihood 51

pffiffiffi^ 
n hn  h is asymptotically normal with mean ‘0’ and variance
1  2 1
R @ log f ðx; hÞ
@h f ðx; hÞdx : Also, ^hn is asymptotically efficient and consistent.
1
(e) An MLE may not be unique. h

1 if h  x  h þ 1
Example 2.4 Let f ðx; hÞ ¼ :
0 Otherwise

1 if h  min xi  max xi  h þ 1
Then, LðhÞ ¼
0 Otherwise

1 if max xi  1  h  min xi
i.e. LðhÞ ¼
0 Otherwise

Clearly, for any value of θ, say T a ¼ aðMaxxi  1Þ þ ð1  aÞMinxi ; 0  a  1;


L(θ) will be maximized. For fixed α, T a will be an MLE. Thus, we observe that an
infinite number of MLE exist in this case.
(f) An MLE may not be unbiased.
Example 2.5
1
if 0  x  h
f ðx; hÞ ¼ h :
0 Otherwise
 1
if max xi  h
Then, LðhÞ ¼ hn :
0 Otherwise

L( θ )

Max Xi θ

From the figure, it is clear that the likelihood L(θ) will be the largest when
θ = Max Xi. Therefore Max Xi will be an MLE of θ. Note that EðMax X i Þ ¼
n þ 1 h 6¼ h: Therefore, here MLE is a biased estimator.
n

(g) An MLE may be worthless.


52 2 Methods of Estimation

Example 2.6

1 3
f ðx; pÞ ¼ px ð1  pÞ1x ; x ¼ 0; 1; p 2 ;
4 4
 
p if x ¼ 1 p ¼ 34 if x ¼ 1
Then; LðpÞ ¼ i.e. LðpÞ will be maximized at
1p if x ¼ 0 p ¼ 14 if x ¼ 0

Thus, T ¼ 2X 4þ 1 will be an MLE of θ.


Now, EðT Þ ¼ 2p 4þ 1 6¼ p: Thus, T is a biased estimator of π.

MSE of T ¼ E ðT  pÞ2
2
2x þ 1 1
¼E  p ¼ Ef2ðx  pÞ þ 1  2pg2
4 16
1 n o
¼ E 4ðx  pÞ2 þ ð1  2pÞ2 þ 4ðx  pÞð1  2pÞ
16
1 n o 1
¼ 4pð1  pÞ þ ð1  2pÞ2 ¼
16 16

Now, we consider a trivial estimator dðxÞ ¼ 12.


 2 1  
MSE of dðxÞ ¼ 12  p  16 = MSE of T 8p 2 14 ; 34
Thus, in the sense of mean square error MLE is meaningless.
(h) An MLE may not be consistent
Example 2.7

hx ð1  hÞ1x if h is rational
f ðx; hÞ ¼
ð1  hÞx h1x if h is rational 0\h\1; x ¼ 0; 1:

An MLE of θ is ^hn ¼ X. Here, ^hn is not a consistent estimator of h.


(i) The regularity conditions in (d) are not necessary conditions.
Example 2.8

1 1\x\1
f ðx; hÞ ¼ ejxhj ;
2 1\h\1

Here, regularity conditions do not hold. However, the MLE (=sample median) is
asymptotically normal and efficient.

Example 2.9 Let X 1 ; X 2 ; . . .; X n be a random sample from f ðx; a; bÞ ¼


bebðxaÞ ; a  x\1 and b [ 0.
Find MLE’s of α, β.
2.3 Method of Maximum Likelihood 53

Solution
P
n
b ðxi aÞ
Lða; bÞ ¼ b e n i¼1

X
n
loge Lða; bÞ ¼ n loge b  b ð x i  aÞ
i¼1

@ log L P @ log L
@b ¼ bn  ðxi  aÞ and @a ¼ nb:
@ log L
Now, ¼ 0 gives us β = 0 which is nonadmissible. Thus, the method of
@a
differentiation fails here.
Now, from the expression of L(α, β), it is clear that for fixed β(>0), L(α, β)
becomes maximum when α is the largest. The largest possible value of α is
X(1) = Min xi.  
Now, we maximize L X ð1Þ ; b with respect to β. This can be done by consid-
ering the method of differentiation.

 
@ log L xð1Þ ; b n X n
¼0)  ðxi  min xi Þ ¼ 0 ) b ¼ P
@b b ðxi  min xi Þ
 
So, the MLE of (α, β) is minxi ; P n
:
ðxi minxi Þ
Example 2.10 Let X 1 ; X 2 ; . . .; X n be a random sample from f ðx; a; bÞ ¼

ba; axb
1

0; Otherwise
(a) Show that the MLE of (α, β) is (Min Xi, Max Xi).
(b) Also find the estimators of a and b by the method of moments.
Proof

1
ðaÞLða; bÞ ¼ if a  Minxi \Maxxi  b ð2:1Þ
ð b  aÞ n

It is evident from (2.1), that the likelihood will be made as large as possible
when (β − α) is made as small as possible. Clearly, α cannot be larger than Min xi
and β cannot be smaller than Max xi; hence, the smallest possible value of (β − α) is
(Max xi − Min xi). Then the MLE’S of α and β are ^a ¼ Min xi and b ^ ¼ Max xi ,
respectively.
2
(b) We know E ð xÞ ¼ l11 ¼ a þ2 b and V ð xÞ ¼ l2 ¼ ðb 12aÞ
2 P
Hence, a þ2 b ¼ x and ðb 12aÞ ¼ 1n ni¼1 ðxi  xÞ2
54 2 Methods of Estimation

qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
P ffi
2
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
P ffi
2
ðxi xÞ ðxi xÞ
By solving, we get ^a ¼ x 
3
^ ¼ x þ
and b
3
n n

Successive approximation for the estimation of parameter

It frequently happens that the likelihood equation is by no means easy to solve.


A general method in such cases is to assume a trial solution and correct it by an
extra term to get a more accurate solution. This process can be repeated until we get
the solution to a sufficient degree of accuracy.

Let L denote the likelihood and h be the MLE.

Then @ log
@h
L

¼ 0. Suppose h0 is a trial solution of @ log
@h ¼ 0
L
h¼h
Then
L @ log L log L
0 ¼ @ log þ ðh  h0 Þ @ @h + terms involving ðh  h0 Þ
2

@h 
¼ @h 2
h¼h h¼h0 h¼h0
with powers higher
than unity.
@ log L log L
þ ðh  h0 Þ @ @h
2
) 0 ’ @h 2 ; neglecting the terms involving
h¼h0 h¼h0

ðh  h0 Þ with powers higher than unity.  2 
L
) 0 ’ @ log
@h ðh  h0 ÞIðh0 Þ; where IðhÞ ¼ E @ @h
log L
2 :
h¼h0
Thus, the first approximate value of θ is
8 9
> @ log L >
< @h =
hð1Þ ¼ h0 þ
h¼h0
:
: I ð h0 Þ >
> ;

Example 2.11 Let X 1 ; X 2 ; . . .; X n be a random sample from f ðx; hÞ ¼ p 1


f1 þ ðx  hÞ2 g
@ log f ðx; hÞ Pn
Here, @h ¼ 1 þ2ððxhÞ
xhÞ2
; and so the likelihood equation is ðxi  hÞ
i¼1 1 þ ðxi  hÞ2 ¼ 0;
clearly it is difficult to solve for θ.
So, we consider successive approximation method.
In this case, IðhÞ ¼ n2 :
P
Here, the first approximation is hð1Þ ¼ h0 þ 4n ni¼1 1 þðxði xh0hÞ Þ2 ;
i 0

h0 being a trial solution.


Usually, we take h0 = sample median.
2.4 Method of Minimum v2 55

2.4 Method of Minimum v2

This method may be used when the population is grouped into a number of mu-
tually exclusive and exhaustive class and the observations are given in the form of
frequencies.
Suppose there are k classes and pi ðhÞ is the probability of an individual
belonging to the ith class. Let f i denote the sample frequency. Clearly,
Pk Pk
i¼1 pi ðhÞ ¼ 1 and i¼1 f i ¼ n:
The discrepancy between observed frequency and the corresponding expected
frequency is measured by the Pearsonian v2 , which is given by
P 2 P f2
v2 ¼ ki¼1 ff i npnpi ðhi ðÞhÞg ¼ npi iðhÞ  n:
The principle of the method of minimum v2 consists of choosing an estimate of
θ, say ^h, we first consider the minimum v2 equations @v
2

@hi ¼ 0; i = 1, 2,…,r and


hi ¼ ith component of θ.
It can be shown that for large n, the min v2 equations and the likelihood
equations are identical and provides identical estimates.
The method of minimum v2 , is found to be more troublesome to apply in many
cases, and has no improvement on the maximum likelihood method. This method
can be used when maximum likelihood equations are difficult to solve. In particular
situations, this method may be simple. To avoid the difficulty in minimum v2
method, we consider another measure of discrepancy, which is given by v02 ¼
Pk ff i  npi ðhÞg2 02
i¼1 f ; v is called modified Pearsonian v2 . Now, we minimize, instead
i

of v2 , with respect to θ.
It can be shown that for large n the estimates obtained by min v2 would also be
approximately equal to the MLE’s. Difficulty arises if some of the classes are
empty. In this case, we minimize

X ff  npi ðhÞg2
v002 ¼ i
+ 2M;
i:f 6¼0
fi
i

where M = sum of the expected frequencies of the empty classes.


Example 2.12 Let ðx1 ; x2 ; . . .; xn Þ be a given sample of size n. It is to be tested
whether the sample comes from some Poisson distribution with unknown mean l.
How do you estimate l by the method of modified minimum chi-square?
Solution

Let x1 ; x2 ; . . .; xn be arranged in k groups such that there are


ni observations with x ¼ i; i ¼ r þ 1; . . .; r þ k  2
nL observations x  r
56 2 Methods of Estimation

nu observations with x  r þ k  1 so that the smallest and the largest values of


P þ k2
x, which are fewer, are pooled together andnL þ ri¼r þ 1 ni þ nu ¼ n:
el li P
Let pi ðlÞ ¼ Pð x ¼ iÞ ¼ i! , pL ðlÞ ¼ Pðx  r Þ ¼ ri¼0 pi ðlÞ and pu ðlÞ ¼
P1
Pðx  r þ k  1Þ ¼ i¼r þ k1 pi ðlÞ:
Pr i
Pk ni @pi ðhÞ ð 1Þpi ðlÞ
Now using i¼1 pi ðhÞ @hj ¼ 0; j ¼ 1; 2; . . .p we have nL
P
i¼0 l
r þ
p ðlÞ
  P1 i¼0 i
Pr þ k2 ð 1Þpi ðlÞ
i

i¼r þ 1 ni l  1 þ nu
i P
i¼r þ k1 l
1 ¼ 0.
i¼r þ k1
pi ð lÞ
Since there is only one parameter, i.e. p ¼ 1 we get the only above equation. By
solving, we get

P
r P
1
ipi ðlÞ X
rþ k2 ipi ðlÞ
i¼0 i¼r þ k1

n^ nL P r þ ini þ nu P 1
pi ðlÞ i¼r þ 1 pi ðlÞ
i¼0 i¼r þ k1

= sum of all x’s


^ is approximately the sample mean x.
Hence, l

2.5 Method of Least Square

In the method of least square , we consider the estimation of parameters using some
specified form of the expectation  and second moment  of the observations. For
fitting a curve of the form y ¼ f x; b0 ; b1 ; . . .; bp to the data ðxi ; yi Þ; i = 1, 2,…n,
we may use the method of least squares. This method consists of minimizing the
sum of squares.  
P
S ¼ ni¼1 e2i , where ei ¼ yi  f xi ; b0 ; b1 ; . . .; bp ; i = 1, 2,…,n with respect to
P 2 P 2
b0 ; b1 ; . . .; bp : Sometimes, we minimize wi ei instead of ei . In that case, it is
called a weighted least square method.
To minimize S, we consider (p + 1) first order partial derivatives and get (p + 1)
equations in (p + 1) unknowns. Solving these equations, we get the least square
estimates of b0i s.
In general, the least square estimates do not have any optimum properties even
asymptotically. However,  in case of linear estimation this method provides good
estimators. When f xi ; b0 ; b1 ; . . .; bp is a linear function of the parameters and the
2.5 Method of Least Square 57

x-values are known, least square estimators will be BLUE. Again, if we assume that
e0i s are independently and identically normally distributed, then a linear estimator of
0
the form a b will be MVUE for the entire class of unbiased estimators. In general,
 
we consider n uncorrelated observations y1 ; y2 ; . . .yn such that E ðyi Þ ¼
b1 x1i þ b2 x2i þ þ bk xki :

V ðyi Þ ¼ r2 ; i ¼ 1; 2; . . .. . .; n; x1i ¼ 18i;



where b1 ; b2 . . .. . .. . .bk and r2 are unknown parameters. If Y  and
 b stand for
column vectors of the variables yi and parameters bj and if X ¼ xji be an ðn  kÞ
matrix of known coefficients xji then the above equation can be written as

EðY Þ ¼ Xb

V ðeÞ ¼ E ðee0 Þ ¼ r2 I

where e ¼ Y  Xb is an ðn  1Þ vector of error random variable with E ðeÞ ¼ 0


and I is an ðn  nÞ identity matrix. The least square method requires that b0 s be such
calculated that / ¼ e0 e ¼ ðY  Xb Þ0 ðY  Xb Þ be the minimum. This is satisfied
when

@/
¼0
@b

Or; 2X 0 ðY  Xb Þ ¼ 0:

^ ¼ ðX 0 X Þ1 X 0 Y:
The least square estimators b 0 s is thus given by the vector b
Example 2.13 Let yi ¼ b1 x1i þ b2 x2i þ ei ; i ¼ 1; 2; . . .. . .; n or Eðyi Þ ¼ b1 x1i þ
b2 x2i ; x1i ¼ 1 for all i.
Find the least square estimates of b1 and b2 . Prove that the method of maximum
likelihood and the method of least square are identical for the case of normal
distribution.
Solution
In matrix notation we have 0 1 0 1
1 x21 y1
B1 
B x22 C
C b1
B y2 C
B C
E ðY Þ ¼ Xb ; where X ¼ B .. 
.. C; b ¼ and Y ¼ B . C
@. . A b2 @ .. A
1 x2n yn
58 2 Methods of Estimation

Now,

^ ¼ ðX 0 X Þ1 X 0 Y
b
0 1
1 x21
B P 
... 1 B1 x22 C
C
Here, X 0 X ¼
1 1
B. .. C ¼ Pn P x2i
x21 x22 . . . x2n @ .. . A x2i x22i
1 x2n

P 
X0Y ¼ P yi
x2i yi
P 2 P  P 
^ ¼ P 1 x  x2i yi
)b P P 2i P
n x22i  ð x2i Þ2  x2i n x2i yi
P 2 P P P 
1 x2i yi  x2i x2i yi
¼ P 2 P P P P P
n x2i  ð x2i Þ2  x2i yi þ n x2i yi

Hence,
P
P P P P P
^ ¼n x2i yi  x2i yi x2i yi  nx2y
b P 2 P ¼ P 2
2
n x2i  ð x2i Þ 2 x2i  nx22
P
ðx2i  x2 Þðyi  yÞ
¼ P
ðx2i  x2 Þ2
P P P P
^ ¼ x22i yi  x2i x2i yi
b 1 P P
n x22i  ð x2i Þ2
P P
y x22i  x2 x2i yi
¼ P 2
x2i  nx2
P
ynx22  x2 x2i yi
¼ y þ P
x22i  nx22
^
¼ y  x2 b 2
2.5 Method of Least Square 59

Let yi be an independent N ðb1 þ b2 xi ; r2 Þ variate, i ¼ 1; 2; . . .. . .; n so that


E ðyi Þ ¼ b1 þ b2 xi : The estimators of b1 and b2 are obtained by the method of least
square on minimizing

X
n
/¼ ðyi  b1  b2 xi Þ2
i¼1

The likelihood estimate is


n P
1 2
pffiffiffiffiffiffiffiffi e2r2 ðyi b1 b2 xi Þ
1

2pr
Pn 2
L is maximum when i¼1 ðyi  b1  b2 xi Þ is minimum. By the method of
P
maximum likelihood, we choose b1 and b2 such that ni¼1 ðyi  b1  b2 xi Þ2 ¼ / is
minimum. Hence, both the methods of least square and maximum likelihood
estimator are identical.
Example 2.14 Let X1 ; X2 ; . . .Xn be a random sample from p.d.f.
f ðx; h; r Þ ¼ hr C1ðrÞ ex=h xr1 ; x [ 0; h [ 0; r [ 0:
Find estimator of h and r by
(i) Method of moments
(ii) Method of maximum likelihood

Answer

(i) Here, E ð xÞ ¼ l11 ¼ rh; E ðx2 Þ ¼ l12 ¼ r ðr þ 1Þh2

1 Xn 2
m11 ¼ x; m12 ¼ x
i¼1 i
n
Pn
Hence, rh ¼ x; r ðr þ 1Þh2 ¼ 1n 2
i¼1 xi Pn
ðx xÞ2
By solving, we get ^r ¼ Pn nx
2
2 and ^
h ¼ i¼1 i
ðx  
x Þ n
x
i¼1 i
Pn Q n
(ii) L ¼ hnr ðC1ðrÞÞn eh i¼1 xi
1
xr1
i
i¼1
P P
log L ¼ nr log h  n log Cðr Þ  1h ni xi þ ðr  1Þ ni¼1 log xi
Now, @ log L ¼  nr þ nx2 ¼ 0 ) ^h ¼ x
@h h h r
60 2 Methods of Estimation

@ log L @ log Cðr Þ X n


¼  n log h  n þ log xi
@r @r i¼1
C0 ðr Þ X
n
¼ n log r  n  n log x þ log xi
Cðr Þ i

It is, however, difficult to solve the equation @ log


@r ¼ 0 and to get the estimate of r.
L

Thus, for this example estimators of h and r are more easily obtained by the method
of moments than the method of maximum likelihood.
Example 2.15 If a sample of size one is drawn from the p.d.f f ðx; bÞ ¼
2
b2
ðb  xÞ; 0\x\b:
^ the MLE of b and b , the estimator of b based on method of moments. Show
Find b,
^ is biased, but b is unbiased. Show that the efficiency of b
that b ^ w.r.t. b is 2/3.

Solution

2
L¼ ð b  xÞ
b2

log L ¼ log 2  2 log b þ logðb  xÞ

@ log L 2 1
¼ þ ¼ 0 ) b ¼ 2x
@b b bx
a
Thus, the MLE of b is given by b = 2x.
Rb
Now, Eð xÞ ¼ b22 0 ðbx  x2 Þdx ¼ b3
Hence, b3 ¼ x ) b ¼ 3x
Thus, the estimator of b based on method of moment is given by b ¼ 3x:
Now,
 
^ ¼ 2  b ¼ 2b 6¼ b
E b
3 3
 b
E ðb Þ ¼ 3  ¼ b
3
^ is biased but b is unbiased.
Hence, b
2.5 Method of Least Square 61

Again,

Zb
  2   b2
E x2 ¼ 2 bx2  x3 dx ¼
b 6
0

b2 b2 b2
) V ð xÞ ¼ ¼
6 9 18
b2
V ðb Þ ¼ 9V ð xÞ ¼
2
  2 2
^
V b ¼ 4V ð xÞ ¼ b
9
    h   i2
^ ^
M b ¼ V b þ E b b ^
2
2 2 2
¼ b þ bb
9 3
1
¼ b2
3
^ with respect to b is 2 :
Thus, the efficiency of b 3
Chapter 3
Theory of Testing of Hypothesis

3.1 Introduction

Consider a random sample from an infinite or a finite population. From such a


sample or samples we try to draw inference regarding population. Suppose the form
of the distribution of the population is Fh which is assumed to be known but the
parameter h is unknown. Inferences are drawn about unknown parameters of the
distribution. In many practical problems, we are interested in testing the validity of
an assertion about the unknown parameter h. Some hypothesis is made regarding
the parameters and it is tested whether it is acceptable in the light of sample
observations. As for examples, suppose we are interested in introducing a high
yielding rice variety. We have at our disposal a standard variety having average
yield x quintal per acre. We want to know whether the average yield for the new
variety is higher than x. Similarly, we may be interested to check the claim of a tube
light manufacturer about the average life hours achieved by a particular brand.
A problem of this type is usually referred to as a problem of testing of hypothesis.
Testing of hypothesis is closely linked with estimation theory in which we seek the
best estimator of unknown parameter. In this chapter, we shall discuss the problem
of testing of hypothesis.

3.2 Definitions and Some Examples

In this section, some aspects of statistical hypotheses and tests of statistical


hypothesis will be discussed.
Let q ¼ fpð xÞg be a class of all p.m.f or p.d.f. In testing problem pð xÞ is
unknown, but ρ is known. Our objective is to provide more information about pð xÞ
on the basis of X ¼ x. That is, to know whether pð xÞ 2 q  q:

© Springer India 2015 63


P.K. Sahu et al., Estimation and Inferential Statistics,
DOI 10.1007/978-81-322-2514-0_3
64 3 Theory of Testing of Hypothesis

Definition 1 A hypothesis is a conjecture or assertion about pð xÞ. It is of two types,


viz., Null hypothesis (H) and alternative hypothesis (K).
Null hypothesis (H): A hypothesis that is tentatively set up is called null
hypothesis. Alternative to H is called Alternative hypothesis.
H and K are such that H \ K ¼ u and H [ Kq: We also write H as

H : pð xÞ 2 qH  q
q \ qK ¼ u and qH [ qK q
and K as K : pð xÞ 2 qK  q H

Labeling of the distribution


   
Write q ¼ pð xÞ ¼ p x=h ; h 2 H . Then ‘h’ is called the labelling parameter of the
distribution and ‘H’ is called the parameter space.
Example
Pm 3.1 X  binðm; pÞ , X1 ; X2 ; . . .Xm are i.i.d Bernoulli (p) ) X ¼
X
i¼1 i  bin ðm; pÞ, m is known, h ¼ p, H ¼ ½0; 1; outcome space

x ¼ f0; 1; 2; . . .mg  f0; 1gX f0; 1gX. . .X f0; 1g

  P m P
m

xi m xi
m x mx
pðx=hÞ ¼ p ð 1  pÞ Or p x =h ¼ p i¼1 ð1  pÞ i¼1
x 

    
m x m x
q¼ p ð1  pÞmx ; p 2 ½0; 1 is known but p ð1  pÞmx is
x x
unknown if p is unknown.
Example 3.2 Let X1 ; X2 ; . . .Xn1 are i.i.d Pðk1 Þ and Y1 ; Y2 ; . . .Yn2 are i.i.d Pðk2 Þ.
Also they are independent and n1 and n2 are known.
Now,

X ¼ ðX1 ; X2 ; . . .Xn1 ; Y1 ; Y2 ; . . .Yn2 Þ; n ¼ n1 þ n2

x ¼ ½f0; 1; . . .1gn1 X ½f0; 1; . . .1gn2




h ¼ ðk1 ; k2 Þ; H ¼ ð0; 1ÞX ð0; 1Þ ¼ fðk1 ; k2 Þ : 0\k1 ; k2 \1g


P P
Y n1 Yn2 x y
  k1 i k2 j  ð n k þ n k Þ
pðx=hÞ ¼ pðxi =k1 Þ p y j k2 ¼ Q Q e 1 1 2 2
i¼1 j¼1
xi ! yj !
     
q ¼ p x=h ; 0\k1 ; k2 \1 is known but p x=h is unknown if h is unknown.
Example 3.3 X1 ; X2 ; . . .Xn are i.i.d N ðl;
 r2Þ. X ¼ ðX1 ; X2 ; . . .Xn Þ; n 1; h ¼ ðl; r2 Þ
or flg (if r2 is known) or r2 (if l is known), H ¼ ðl; r2 Þ :
0  
1\l\1; r [ 0g or fl : 1\l\1g  R or r2 : r2 [ 0 :
2

x ¼ Rn : n-dimensional Euclidean space.


3.2 Definitions and Some Examples 65

Pn
 1 ðxi lÞ2
pðx=hÞ ¼ ð2pÞ n=2 n 2r2 1
r e or
P
n
12 ðxi lÞ2
¼ ð2pÞn=2 e 1 ; r2 ¼ 1 or
P
n
 1 x2
¼ ð2pÞ n=2 n 2r2 1 i
r e ; l ¼ 0: or
   
q ¼ p x=h ; 1\l\1; r2 [ 0 or
   
p x=h ; 1\l\1 or
   2   
p x=h ; r [ 0 all are known but unknown is p x=h for fixed θ (Unknown).

Parametric set up
pð xÞ ¼ pðx=hÞ; h 2 H: Then we can find HH ð HÞ and HK ð HÞ such that

HH \ HK ¼ / and pH ¼ fpðx=hÞ; h 2 HH g; pK ¼ fpðx=hÞ; h 2 HK g

So,

H : p 2 pH , H : h 2 H H
K : p 2 pK , K : h 2 H K :

Definition 2 Now a hypothesis H  is called


i. Simple if H  contains just one parametric point, i.e. H  specifies the distribution
fpðx=hÞg completely.
ii. Composite if H  contains more than one parametric point, i.e. H  cannot
specify the distribution fpðx=hÞg completely.

Example 3.4 X1 ; X2 ; . . .Xn are i.i.d N ðl; r2 Þ: Consider the following hypothesis
ðH  Þ:
1. H : l ¼ 0; r2 ¼ 1 : H  ) H  N ð0; 1Þ
2. H : l
0; r2 ¼ 1
3. H : l ¼ 0; r2 [ 0
4. H : r2 ¼ r20
5. H : lþr ¼ 0
The first one is a simple hypothesis and the remaining are composite hypotheses.
Definition 3 Let x be the observed value of the random variable X with probability
model pðx=hÞ; h unknown. Wherever X ¼ x is observed, pðx=hÞ is a function of h
only and is called the likelihood of getting such a sample. It is simply called the
likelihood function and often denoted by LðhÞ or Lðh=xÞ:
66 3 Theory of Testing of Hypothesis

Definition 4 Test It is a rule for the acceptance or the rejection of the null
hypothesis (H) on the basis of an observed value of X.
Definition 5 Non-randomized test
Let x be a subset of 
x such that

X 2 x ) The rejection of H
X2
x  x ) The acceptance of H:

Then x is called the critical region or a test for H against K. Test ‘x’ means a
rule determined by x. Note that x does not depend on the random experiment (that
is on X). So it is called a non-randomized test.
Definition 6 Randomized test:
It consists in determining a function /ð xÞ
such that
(i) 0
/ð xÞ
1 8x 2 
x
(ii) H is rejected with probability /ð xÞ whenever X ¼ x is observed.
Such a ‘/ð xÞ’ is called ‘Critical function’ or ‘test function’ or simply ‘test’ for
H against K. Here the function /ð xÞ depends on the random experiment (that is on
X). So that name ‘randomised’ is justified.
e.g. (i) and (ii) ⇒ whenever X ¼ x is observed, perform a Bernoullian trial with
probability of success /ð xÞ. If the trial results in success, reject H; otherwise H is
accepted. Thus /ð xÞ represents the probability of rejection of H.
If /ð xÞ is non-randomized with critical region ‘x’, then we have
x ¼ fx : /ð xÞ ¼ 1g

x  x ¼ fx :/ðxÞ ¼ 0g (Acceptance region).
Detailed study on Non-randomized test
If x is Non-randomized test then it implies H is rejected iff X 2 x. In many cases,
we get a statistic T ¼ T ð X Þ such that for some C or C1 and C2 ,
½X 2 x , ½X : T [ C  or ½X : T\C  or ½X : T\C1 or : T [ C2 ; C1 \C2 :
Such a ‘T’ is called ‘test statistic’.
The event ½T [ C  is called right tail test based on T.
The event ½T\C is called left tail test based on T.
The event ½T\C1 or T [ C2  is called two tailed test based on ‘T’.
Sometimes C1 and C2 are such that PfT\C1 =hg ¼ PfT [ C2 =hg8h 2 HH then
the test ½T\C1 or T [ C2  is called equal-tail test based on T.
3.2 Definitions and Some Examples 67

Definition 7 Power Function


Let X be a random variable with pðx=hÞ as p.d.f or p.m.f of X, h 2 H; x 2 
x

Testing problem

H : h 2 HH versus K : h 2 HK fHH \ HK ¼ ;g

Let x be a test for H against K.


Then the function given by

Px ðhÞ ¼ Probabilityfrejecting H under hg:


¼ PfX 2 x=hg; h 2 H

is called power function (a function of h) of the test ‘x’


For a given h 2RHK , Px ðhÞ is called the power of ‘x’ at h. For continuous case,
P
we have Px ðhÞ ¼ x pðx=hÞdx and for discrete case we have Px ðhÞ ¼ x pðx=hÞ:
A test ‘x’ is called size-α if

Px ðhÞ
a 8h 2 HH ½a : 0\a\1 ð3:1Þ

and it is called strictly size-α if

Px ðhÞ
a 8h 2 HH and Px ðhÞ ¼ a for some h 2 HH ð3:2Þ

(3.1) , Sup Px ðhÞ


a and (3.2) , Sup Px ðhÞ ¼ a:
h2HH h2HH
The quantity SupfPx ðhÞ; h 2 HH g is called the size of the test. Sometimes ‘a’ is
called the level of the test ‘x’

Some Specific cases


(i) h: Real-valued; testing problem H : h ¼ h0 (Simple) or H : h
h0
(Composite) against K : h [ h0 ; x: A test; Px ðhÞ: Power function; Size of the
test: Px ðh0 Þ or Sup Px ðhÞ
h
h0
(ii) h ¼ ðh1 ; h2 Þ : 2 component vector
Testing problem: H: h1 ¼ h01 (given) against K : h1 [ h01 (composite)
x: A test
 
Pw ðhÞ: power function = Pw ðh1 ; h2 Þ ¼ Pw h01 ; h2 (at H) = A function of h2 .
Thus, the power function (under H) is still unknown. The quantity sup
  
Pw h01 ; h2 ; h2 2 Space of h2 g is known and is called the size of the test. For, e.g.
take N ðl; r2 Þ distribution and consider the problem of testing H: l ¼ 0 against K:
l [ 0, then the size of the test is
68 3 Theory of Testing of Hypothesis

n 2
o
Sup Pw ðl; r Þ l ¼ 0; 0\r2 \1 :
 
Example 3.5 X1 ; X2 ; . . .; Xn are i.i.d N l; r02 ; H : l
l0 against K : l [ l0 .

 r0
x  ðX1 ; X2 ; . . .; Xn Þ : X [ l0 þ pffiffiffi
n


Pw ðhÞ ¼ Pw ðlÞ ¼ P X [ l0 þ pffiffin l
r 0

pffiffiffi pffiffiffi 
nðX  lÞ nðl  l0 Þ
¼P [ þ 1j l
r0 r0
pffiffiffi 
nðl  l0 Þ
¼ P Z [1  jZ  N ð0; 1Þ
r0
pffiffiffi 
nðl  l0 Þ
¼U 1
r0
 pffiffiffi 
nðl  l0 Þ
Size of x ¼ Sup Pw ðlÞ ¼ Sup U 1
l
l0 l
l0 r0
¼ Uð1Þ ¼ size of x for testing H : l ¼ l0 :

Example 3.6
 
X1 ; X2 ; . . .; Xn are i.i.d N l; r02 .
H : l ¼ l0 against K : l [ l0 .


 r0 a 2 ð0; 1Þ
x¼ ðX1 ; X2 ; . . .; Xn Þ : X [ l0 þ pffiffiffi sa ;
n U ðsa Þ ¼ a

Pw ðl0 Þ ¼ size of x for testing H


 pffiffiffi 
r0 nð X  l0 Þ

¼ P X [ l0 þ pffiffiffi sa jl0 ¼ P [ sa jl0
n r0
¼ PfZ [ sa jZ  N ð0; 1Þg ¼ a:
) Test is exactly size 0 a0 :

Example 3.7 X1 ; X2 ; . . .; Xn are i.i.d. N ðl; r2 Þ


H : l ¼ l0 against K : l [ l0

 [ cg
x : fðX1 ; X2 ; . . .; Xn Þ; X ð3:3Þ

where ‘c’ is such that the test is of size 0.05.


3.2 Definitions and Some Examples 69

n o
Pw ðl0 Þ ¼ size of x for testing H ¼ P X [ c l
0
pffiffiffi  pffiffiffi 
n X  l0 ðc  l0 Þ n
¼P [ =l0
r0 r0
pffiffiffi 
ðc  l0 Þ n
¼P Z[ =Z  N ð0; 1Þ ¼ 0:05ðgivenÞ
r0
pffiffiffi
ðc  l0 Þ n 1:645r0
) ¼ s0:05 ’ 1:645 )c ¼ l0 þ pffiffiffi
r0 n
ð3:4Þ

Test given (3.3) and (3.4) is strictly (exactly) of size 0.05.


(or, level of significance of the test is 0.05).
Example 3.8 X1 and X2 are i.i.d. according to (, Bernoulli (1, p)).

f x=p ¼ px ð1  pÞx ; x ¼ 0; 1

Testing problem, H : p ¼ 12 against K : p [ 12.


Consider the test x ¼ fðX1 ; X2 Þ : X1 þ X2 ¼ 2g
Accept H if ðX1 ; X2 Þ 62 x
Test statistic: T ¼ X1 þ X2  binð2; pÞ
     2
x T ¼ 2 1
Size of the test is P ðX1 ; X2 Þ 2 p¼2 ¼P
1
p ¼ 2 ¼ 2 ¼ 0:25
1

If we take x ¼ fðX1 ; X2 Þ : X1 þ X2 ¼ 1; 2g i.e. accept H if ðX1 ; X2 Þ 62 x: We get


 2  2
size ¼ 2: 12 þ 12 ¼ 0:75:
Let us take another test of the form:
x : Reject H if X1 þ X2 ¼ 2
xB : Reject H if X1 þ X2 ¼ 1 with probability 1
2
A: Accept H if X1 þ X2 ¼ 0
Sample space ¼ f0; 1; 2g ¼ x þ xB þ A

1
Size ¼ 1: PfðX1 þ X2 Þ ¼ 2g þ PðX1 þ X2 ¼ 1Þ þ 0: PðX1 þ X2 ¼ 0Þ
2
¼ 0:25 þ 0:25 ¼ 0:50

The test given above is called a randomized test.


Definition 8 Power function of a randomized test:
70 3 Theory of Testing of Hypothesis

Consider /ð xÞ as a randomized test which is equivalent to probability of


rejection of H when the observed value of (X = x) and E as an Event of rejection of
H. Then PðE jX ¼ xÞ ¼ /ðxÞ: Power function of / is

P/ ðhÞ ¼ Probability fRejection of H under h using the function /g


n o
¼ P E=h
Z n o  
¼ P E=x; h p x=h dx; when X is continuous ð3:5Þ


X n o  
¼ P E=x; h p x=h ; when X is discrete ð3:6Þ


In case of (3.5), we get:


Z

 
P/ ðhÞ ¼ /ð xÞ p x=h dx As P E=x; h ¼ P E=x ¼ /ð xÞ

¼ E h / ð xÞ
P  
In case of (3.6), we get: P/ ðhÞ ¼ /ð xÞ:p x=h ¼ Eh /ð xÞ

In either case we have P/ ðhÞ ¼ Eh /ð xÞ8h 2 H:

Special cases

1. Suppose /ð xÞ takes only two values, viz. 0 and 1. In that case, we say /ð xÞ is
non-randomized with critical region x ¼ fx : /ð xÞ ¼ 1g.
In that case

P/ ðhÞ ¼ 1:Ph f/ð xÞ ¼ 1g þ 0: Ph f/ð xÞ ¼ 0g


¼ Ph fX 2 xg ¼ Px ðhÞ:
) / is generalization of x.
2. Suppose /ð xÞ takes three values, viz 0, a and I according as
x 2 A; x 2 wB and x 2 x. In that case /ð xÞ is called randomized test having the
boundary region WB . The power function of this test is
P/ ðhÞ ¼ Ph fX 2 xg þ aPh fX 2 wB g.

(1) ) no need of post randomization: Non-randomised test.


(2) ) requires post randomization: randomized test.

Example 3.9 X1 ; X2 ; . . .; Xn are i.i.d. Bernoulli (1, p), n = 25. Testing problem:
H : p ¼ 12 against K : p 6¼ 12.
3.2 Definitions and Some Examples 71

Consider the following tests:


9
P
25
/ð xÞ ¼ 1 if xi [ 12 >>
=
1
(1) Non-randomized
P15 >
¼ 0 if xi
12 >
;
1

9
P
25
>
/ðxÞ ¼ 1 if xi [ c >
>
>
>
>
1 >
=
P
25
(2) ¼ a if xi ¼ c
>
>
1 >
>
P25 >
>
¼ 0 if xi \c >
;
1

Find c and a such that Ep¼12 /ð xÞ ¼ 0:05. Randomized if a 2 ð0; 1Þ and


Non-randomized if a ¼ 0 or 1. In case of (1), size = Ep¼12 /ð xÞ ¼
25 
P
P xi [ 12jp ¼ 12 ¼ 0:50001:
1
In case of (2), we want to get (c, a) such that Ep¼12 /ð xÞ ¼ 0:05.
( ) ( )
X
25 X
25
, Pp¼12 xi [ c þ aPp¼12 xi ¼ c ¼ 0:05
1 1

nP o
25
By inspection we find that Pp¼12 1 x i [ 17 ¼ 0:02237 and
nP o
25
Pp¼12 1 xi [ 16 ¼ 0:0546. Hence, c = 17.
P15 
0:05Pp¼1 x [c
Now, a ¼ P25
2 1 i
 ¼ 0:050:02237 ¼ 0:8573:
P xi ¼c 0:03223
1
Thus the test given by

X
25
/ð xÞ ¼ 1 if xi [ 17
1
X
25
¼ 0:8573 if xi ¼ 17
1
X
25
¼ 0 if xi \17
1

is randomized and of size 0.05.


But the test given by
72 3 Theory of Testing of Hypothesis

X
25
/ðxÞ ¼ 1 if xi 17
1
X
25
¼0 if xi \17
1

is non-randomized and of size 0.0546 (at the level 0.06).

Performance of x
Our object is to choose x such that Px ðhÞ8h 2 HH and ð1  Px ðhÞÞ8h 2 HK are as
small as possible. While performing a test x we reach any of the following
decisions:
I. Observe X = x, Accept H when θ actually belongs to HH : A correct decision.
II. Observe X = x, Reject H when θ actually belongs to HH : An incorrect
decision.
III. Observe X = x, Accept H when θ actually belongs to HK : An incorrect
decision.
IV. Observe X = x, Reject H when θ actually belongs to HK : A correct decision.
An incorrect decision of the type as stated in II above is called type-I error and
an incorrect decision of the type as stated in III above is called type-II error. Hence,
the performance of x is measured by the following:
(a) Size of type-I error = Probability {Type-I error} ¼ Sup PfX 2 x=hg ¼
h2HH
Sup Px ðhÞ
h2HH

(b) Size of type-II error = Probability {Type-II error } ¼ PfX 2 


x  xg 8h 2 HK

¼ 1  Px ðhÞ 8h 2 HK :
So we want to minimize simultaneously both the errors. In practice, it is not
possible to minimize both of them simultaneously, because the minimization of one
leads to the increase of the other.
Thus the conventional procedure: Choose ‘x’ such that, for a given a 2
ð0; 1Þ; Px ðhÞ
a 8h 2 HH and 1  Px ðhÞ 8h 2 HK is as low as possible, i.e.,
Px ðhÞ 8h 2 HK is as high as possible. ‘x’ satisfying above (if it exists) is called an
optimum test at the level α.
Suppose HH ¼ fh0 g a single point set and HK ¼ fh1 g a single point set.
The above condition thus reduces to: Px ðh1 Þ = maximum subject to Px ðh0 Þ
a.
Definition 9
1. For testing H : h 2 HH against K: h ¼ h1 62 HH , a test ‘x0 ’ is said to be most
powerful (MP) level ‘α’ 2 ð0; 1Þ if
3.2 Definitions and Some Examples 73

Px0 ðhÞ
a8h 2 HH ð3:7Þ
and

Px0 ðh1 Þ Px ðh1 Þ 8x satisfying ð3:7Þ ð3:8Þ


In particular, if H : h ¼ h0 , (3.7) and (3.8) reduce to
Px0 ðh0 Þ
a and Px0 ðh1 Þ Px ðh1 Þ 8x satisfying first condition.
2. A test ‘x0 ’ is said to be MP size-α, if Suph2HH Px0 ðhÞ ¼
a and Px0 ðh1 Þ Px ðh1 Þ 8x satisfying Px ðhÞ
a8h 2 HH . Again if HH ¼ fh0 g,
we get the above condition as Px0 ðh0 Þ ¼ a and Px0 ðh1 Þ Px ðh1 Þ8x satisfying
Px ðh0 Þ
a.
3. For testing H : h 2 HH against K : h 2 HK ; HK \ HH ¼ /, a test ‘x0 ’ is said to
be Uniformly Most Powerful (UMP) level ‘α’ if

Px0 ðhÞ
a8h 2 HH ð3:9Þ

Px0 ðh1 Þ Px ðh1 Þ8h1 2 HK and 8x satisfying (3.9). i.e. ‘x0 ’is said to be UMP
size-α if Sup Px0 ðhÞ ¼ a and Px0 ðh1 Þ Px ðh1 Þ 8h1 2 HK and 8x satisfying
h2XH
Sup Px ðhÞ
a. Again if HH ¼ fh0 g, the aforesaid conditions reduce to
h2XH

(a) Px0 fh0 g


a and Px0 fh1 g Px fh1 g 8h1 2 HK and 8x satisfying Px fh0 g
a.
(b) Px0 fh0 g ¼ a and Px0 fh1 g Px fh1 g 8h1 2 HK and 8x satisfying Px fh0 g
a.
4. A test x is said to be unbiased if (under testing problem: H : h ¼ h0 against
K : h ¼ h1 ; ðh1 6¼ h0 Þ)
Px ðh1 Þ Px ðh0 Þ (⇒power ≥ size), i.e. it is said to be unbiased size-α if
Px ðh0 Þ ¼ a and Px ðh1 Þ a. If K : h 2 HK is composite, the above relation
reduces to (A) Px ðh1 Þ Px ðh0 Þ 8h1 2 HK ðBÞ Px ðh1 Þ a8h1 2 HK where
a ¼ Px ðh0 Þ.
5. For testing H : h ¼ h0 against K : h 2 HK 6 3 h0 , a test x is said to be
Uniformly Most Powerful Unbiased (UMPU) size-α if
(i) Px ðh0 Þ ¼ a; (ii) Px ðh1 Þ a 8h1 2 HK and (iii) Px ðh1 Þ Px ðh1 Þ 8h1 2
HK 8x satisfying (i) and (ii).

3.3 Method of Obtaining BCR

The definition of most powerful critical region, i.e. best critical region (BCR) of
size α does not provide a systematic method of determining it. The following
lemma, due to Neyman and Pearson, provides a solution of the problem if we,
however, test a simple hypothesis against a simple alternative.
The Neyman–Pearson Lemma maybe stated as follows:
74 3 Theory of Testing of Hypothesis

For testing H : h ¼ h0 against K : h ¼ h1 ; h0 ; h1 2 H; h1 6¼ h0 ,


for some a 2 ð0; 1Þ, let x0 be a subset of 
x. Suppose x0 satisfies the following
conditions:

(i) If x 2 x0 ; p x=h1 kp x=h0 ðInside x0 Þ



(ii) If x 2  x  x0 ; p x=h1 \kp x=h0 ðOutside x0 Þ

(x: observed value of X) where kð [ 0Þ is such that Px0 ðh0 Þ ¼ a. Then


Px0 ðh1 Þ Px fh1 g8x satisfying Px ðh0 Þ
a. That means ‘x0 ’ is a MP size-α test.
Proof (Continuous case)
Z
Z

Px0 ðh1 Þ  Px ðh1 Þ ¼ p x=h1 dx  p x=h1 dx


x0 x
Z
Z
Z
Z

¼ p x=h1 dx þ p x=h1 dx  p x=h1 dx p x=h1 dx


x0 x x0 \x xx0 x\x0
Z
Z

¼ p x=h1 dx  p x=h1 dx
x0 x xx0

ð3:10Þ
h

Now, x 2 x0  x , x 2 inside x0 ) p x=h1 kp x=h0


Z
Z

) p x=h1 dx k p x=h0 dx
x0 x x0 x

x 2 x  x0 , x 2 outside x0 ) p x=h1 \kP x=h0


Z
Z

) p x=h1 dx\k P x=h0 dx


xx0 xx0

Hence R.H.S of (3.10)


2 3
Z
Z

k4 p x=h0 dx  p =h0 dx5


x
x0 x xx0
2 3
Z
Z

¼ k4 p x=h0 dx  p x=h0 dx5


x0 x
2 3
Z

¼ k4a  p x=h0 dx5 ¼ k½a  Px ðh0 Þ


x
kða  aÞ ¼ 0 as Px ðh0 Þ
a:
3.3 Method of Obtaining BCR 75

Hence we get Px0 ðh1 Þ  Px ðh1 Þ 0

, Px0 ðh1 Þ Px ðh1 Þ


R
(Similar result can be obtained for the discrete case replacing by R)

Notes

1. Define Y ¼ ppððxjh 1Þ
xjh0 Þ. If the random variable Y is continuous, we can always find a
k such that, for a 2 ð0; 1Þ P½Y k ¼ a. If the random variable Y is discrete, we
sometimes find k such that P½Y k ¼ a.
But, in most of the cases, we have (assuming that P½Y k 6¼ a)
Ph0 ðY k1 Þ\a and Ph0 ðY k2 Þ [ a; k1 [ k2 ð) PðY kÞ ¼ a has no
solution).
In that case we get a non-randomized test 0 x0 0 of level a given by

pð xjh1 Þ
x0 ¼ x: k1 ; Px0 ðh0 Þ
a:
pð xjh0 Þ

In order to get a size-a test, we proceed as follows:


(i) Reject H if Y k1
(ii) Accept H if Y\k2
(iii) Acceptance (or Rejection) depends on the random experiment whenever
Y ¼ k2 .
Random experiment: when Y ¼ k2 is observed, perform a random experiment
with probability of success

a  Ph0 fY k1 g
P¼a¼ :
P fY ¼ k2 g

If the experiment results in success reject H; otherwise accept H. Hence, we get


the following randomized test:

pð xjh1 Þ
/0 ð xÞ ¼ 1 if k1
pð xjh0 Þ
a  Ph0 fY k1 g pðx=h1 Þ
¼a¼ if ¼ k2
Ph0 fY ¼ k2 g pðx=h0 Þ
pðx=h1 Þ
¼ 0 if \k2 :
pðx=h0 Þ
Test /0 ð xÞ is obviously of size-a.
2. k ¼ 0 ) Px0 ðh1 Þ ¼ 1 ) x0 is a trivial MP test.
3. If the test (x0 ) given by N–P lemma is independent of h1 2 Hk that does not
include h0 , the test is UMP size-a.
4. Test (x0 ) is unbiased size-a.
76 3 Theory of Testing of Hypothesis

Proof x0 ¼ fX : pð xjh1 Þ [ kpð xjh0 Þg we want to show Px0 ðh1 Þ a.


R Take k ¼ 0.
Then x0 ¼ fX : pðx=h1 Þ [ 0g. In that case Px0 ðh1 Þ ¼ pð xjh1 Þdx ¼
x0
R R
pð xjh1 Þdx ¼ pð xjh1 Þdx ¼ 1 [ a.
fx:pðxjh1 Þ [ 0g 
x
) Test is trivially unbiased.
So throughout we assume that k [ 0.
Now

pð xjh1 Þ [ kpð xjh0 Þ ½As inside x0 : pð xjh1 Þ [ kpð xjh0 Þ


Z Z
) pð xjh1 Þdx k pð xjh0 Þdx ¼ ka
x0 x0

, Px0 ðh1 Þ ka ð3:11Þ

Again

pð xjh1 Þ
kpð xjh0 Þ ½As outside x0 : pð xjh1 Þ
kpð xjh0 Þ
Z Z
) pð xjh1 Þdx
k pð xjh0 Þdx
x0c x0c

, 1  Px0 ðh1 Þ
kð1  aÞ ð3:12Þ

P ðh1 Þ a
(3.11) ÷ (3.12) ) 1Px0x ðh Þ 1a , Px0 ðh1 Þ a:
0 1
) Test is unbiased.
Conclusion MP test is unbiased. Let x0 be a MP size-a test. Then, with
probability one, the test is equivalent to (assuming that ppððxjh 1Þ
xjh0 Þ has continuous dis-
tribution under h0 and h1 ) x0 ¼ fx : pð xjh1 Þ [ kpð xjh0 Þg where k is such that
Px0 ðh0 Þ ¼ a 2 ð0; 1Þ. h
 
Example 3.10 X1 ; X2 ; . . .Xn are i.i.d. N l; r20 ; 1\l\1; r0 = known. (without
any loss of generality take r0 ¼ 1).
X ¼ ðX1 ; X2 ; . . .Xn Þ observed value of X ¼ x ¼ ðx1 ; x2 ; . . .; xn Þ. To find UMP
size-a test for H : l ¼ l0 against K : l [ l0 . Take any l1 [ l0 and find MP
size-a test for
H : l ¼ l0 against K : l ¼ l1 ;

Solution


n 1 P
n
ðxi lÞ2
1 2
p x=l ¼ pffiffiffiffiffiffi e 1 :
2p
3.3 Method of Obtaining BCR 77

Then


P
n
1
ðxi l0 Þ2 Pn
p x=l 2
e 1
1
2 ðl1 l0 Þ ð2xi l1 l0 Þ
1

¼ P n ¼e 1
p x=l 1
2 ðxi l1 Þ2
0
e 1
 
1X
¼ enxðl1 l0 Þ2ðl1 l0 Þ
n 2 2
* x ¼ xi
n

Hence, by N–P lemma, the MP size-a test is given by

n

o
x0 ¼ x : p x=l [ kp x=l ð3:13Þ
1 0

where k is such that

Px0 ðl0 Þ ¼ a ð3:14Þ


n o
ð3:13Þ , x : enxðl1 l0 Þ2ðl1 l0 Þ [ k
n 2 2
ð3:15Þ

loge k 1
, x : x [ þ ðl1 þ l0 Þ as l1 [ l0
n ð l1  l0 Þ 2
, fx : x [ cg; say ð3:16Þ

By (3.16),

ð3:14Þ , P x [ c=l ¼ a
0
pffiffiffi pffiffiffi  
nðx  l0 Þ nðc  l0 Þ
,P [ l0 ¼ a
1 1
 
  N l0 ; 1 under H)
(X1 ; X2 ; . . .Xn are i.i.d N ðl0 ; 1Þ under H ) X n
 pffiffiffi 
, P Z [ nðc  l0 ÞjZ  N ð0; 1Þ ¼ a
2 1 3
Z
pffiffiffi
) nðc  l0 Þ ¼ sa 4 N ðZjð0; 1ÞÞdz ¼ a5
sa
1
, c ¼ l0 þ pffiffiffi sa ð3:17Þ
n

Test given by (3.16) and (3.17) is MP size-a for H : l ¼ l0 against


K : l ¼ l1 ð [ l0 Þ.
78 3 Theory of Testing of Hypothesis

The test is independent of any μ1(>μ0). Hence it is UMP size-α for H : l ¼ l0


against K : l [ l0 .

Observations
1. Power function of the test given by (3.16) and (3.17) is
 
 [ l0 þ psa 
Px0 ðlÞ ¼ PðX 2 x0 jlÞ ¼ Pr: X ffiffiffil
n
 pffiffiffi 
¼ P Z [ nðl0  lÞ þ sa jZ  N ð0; 1Þ
Z1
¼ N ðZjð0; 1ÞÞdz
pffiffi
sa  nðll0 Þ
pffiffiffi
(Under any l : ðX1 ; X2 ; . . .; Xn Þ are i.i.d. N ðl; 1Þ ) nðx  lÞ  N ð0; 1Þ)
 pffiffiffi 
¼ 1  U sa  nðl  l0 Þ :

Hence, for any fixed lð [ l0 Þ

Px0 ðlÞ ! 1 as n ! 1 ð3:18Þ

and for any fixed lð\l0 Þ

Px0 ðlÞ ! 0 as n ! 1 ð3:19Þ

(3.18) ) test is consistent against any l [ l0 .


Definition 10
1. For testing H : h ¼ h0 against K : h ¼ h1 , a test x (based on n observations) is
said to be consistent if the power Px ðh1 Þ of the test tends to ‘1’ as n ! 1.
pffiffiffi
2. Px0 ðlÞ ¼ 1  Uðsa  nðl  l0 ÞÞ which increases as l increases for fixed n.

) Px0 ðlÞ [ 1  Uðsa Þ for all l [ l0


¼ 1  ð 1  aÞ ¼ a
) x0 is unbiased.
3. Px0 ðlÞ\Px0 ðl0 Þ for all l\l0
¼a
) Power \a for any l\l0
That is, test x0 is biased for testing H : l ¼ l0 against K : l\l0 .
4. From (3.15), if l1 \l0 , we get x0 to be equivalent to

fx : x\c0 g ð3:20Þ
3.3 Method of Obtaining BCR 79


and Px0 ðl0 Þ is equivalent to PfX\c 0
jl0 g ¼ a
sa
) c0 ¼ l0  pffiffiffi ð3:21Þ
n
(by the same argument as before while finding c)
Test given by (3.20) and (3.21) is independent of any l1 \l0 . Hence it is UMP
size-a for H : l ¼ l0 against K : l\l0 .
n o
5. (i) UMP size-a for H : l ¼ l against K : l [ l is x ¼ x : x [ l þ psaffiffi
0 0 0 0 n

n test for H :ol ¼ l0 against K : l\l0


(ii) UMP size-a
is x0 ¼ x : x\l0  psaffiffin
0

Clearly, x0 6¼ x00 (xo is biased for H against l\l0 and x00 is biased for
H against l [ l0 ).
There does not exist any test which is UMP for H : l ¼ l0 against
K : l 6¼ l0 .

Example 3.11 X1 ; X2 ; . . .Xn are i.i.d. N ðl0 ; r2 Þ; r2 [ 0 and l0 is known (without


any loss of generality we take l0 ¼ 0)
X ¼ ðX1 ; X2 ; . . .; Xn Þ, observed value = x ¼ ðx1 ; x2 ; . . .; xn Þ
Testing problem: H : r ¼ r0 against K : r [ r0 .
To find UMP size-a test for H against K : r [ r0 we take any r1 [ r0 .

Solution
P
n
 
n 12 x2i
Here p x=r ¼ p1ffiffiffiffi e
2r
i
r 2x
Hence

 
x   P
n
p =r1 r0 n 1
2 xi2 1

r2 r2
1


¼ e i 0 1
ð3:22Þ
x
p =r0 r 1

By N–P lemma MP size-a test is given by


8
9
<
p x=r1 =
w0 ¼ x :
[k ð3:23Þ
: p x= ;
r0

where kð [ 0Þ
is such that

Pw0 ðr0 Þ ¼ a ð3:24Þ


80 3 Theory of Testing of Hypothesis


 
P
n
x
p =r
n 2 xi r2 r2
1 2 1 1

Now,
1

[ k , rr01 e i 0 1
[ k [from (3.22)]
p x=r
0


2
r0
X
n
2 loge k n loge r1
, xi2 [
 ½As r1 [ r0 
i
1
 1 1
r20
 1
r21
ð3:25Þ
r 2 r0
2
1

¼ c ðsayÞ

Hence (3.23) and (3.24) are equivalent to


( )
X
n
w0 ¼ x: x2i [ c ð3:26Þ
i

( )
X
n
and P x2i [ c=r0 ¼a ð3:27Þ
i

Under any r2 ; X1 ; X2 ; . . .; Xn are i.i.d. N ð0; r2 Þ

P
n
x2i
i
)  v2n
r2

Hence (3.27)
2 3
Z1  
c 6 1 y 7
e 2 y21 dy ¼ a5
n
)  v2n;a 4
r20 Cðn=2Þ2n=2
v2n;a

,c¼ r20 v2n;a ð3:28Þ

Thus the test given by


( )
X
n
w0 ¼ x: x2i [ r20 v2n;a ð3:29Þ
i

is MP size-a for H : r ¼ r0 against K : r ¼ r1 . Test is independent of any


r1 [ r0 . Hence it is UMP size-a for H : r ¼ r0 against K : r1 [ r0
3.3 Method of Obtaining BCR 81

Observations
1. Under any r20 ;

1X n
x2 ¼ Yn  v2n
r0 i i
2

) EðYn Þ ¼ n; V ðYn Þ ¼ 2n

Hence, from the asymptotic theory of v2 , for large n under H


ffiffiffiffi
Ypn n
is asymptotically N ð0; 1Þ.
2n
n o
ffiffiffiffi [ vp
n;a n
2
So, for large n; w0 ¼ x : Ypn n2n
ffiffiffiffi
2n
v2n;a n pffiffiffiffiffi
and pffiffiffi 2n
ffi ’ sa i.e. vn;a ’ sa 2n þ n
2

Thus, (3.29) can be approximated by


( )
X
n pffiffiffiffiffi

x0 ¼ x: x2i [ r02 sa 2n þ n
i¼1
2. UMP size-α test for H : r2 ¼ r20 against K : r2 [ r20 is
( )
X
n
w0 ¼ x: x2i [ r20 v2n;a
i
UMP size-a test for H : r2 ¼ r20 against K : r2 \r20 is
2 3
( ) Z1
X
n 6 7
w ¼ x: x2i \r20 v2n;1a 6 f ðv2n Þdv2n ¼ 1  a7
4 5
i
v2n;ð1aÞ

Clearly, w0 6¼ w . Hence there does not exist UMP test for H : r2 ¼ r20 against
K : r2 6¼ r20 .
The power function of the test w0 is
( ) Z1
 2 Xn .  
Pw0 r ¼ P xi [ r0 vn;a r ¼
2 2 2 2
f v2n dv2n
i
r2
0 v2
r2 n;a

Clearly, Pw0 ðr2 Þ increases


 2  as r 2 increases.
2

Also Pw0 ðr Þ
Pw0 r0 ¼ a 8r : r2
r20
2

Test is biased ) w0 cannot be recommended for H : r2 ¼ r20 against


K : r2 \r20 .
Similarly w is biased (Here Pw ðr2 Þ increases as r2 decreases) and hence it
cannot be recommended for H : r2 ¼ r20 against K : r2 [ r20 .
82 3 Theory of Testing of Hypothesis

P
n
Next observe that 1n x2i is a consistent estimator of r2 . That means, for fixed r2 ,
i
Pn
r2 v 2
as n ! 11n x2i ! r2 in probability and 0 nn;a ! r20 . Thus if r2 [ r20 , we get
n i . 
P 2
Lt P xi [ r20 v2n;a r2 ¼ 1 implying that the test w0 is consistent against
n!1 i
K : r2 [ r20 .
Similarly the test w is consistent against K : r2 \r20

eX =2 against
2
Example 3.12 Find the MP size-a test for H : X  p1ffiffiffiffi
2p
K : X  12 ejX j

Answer MP size-a test is given by (Using N–P lemma)

x0 ¼ fx : pðx=K Þ [ kpðx=H Þg ð3:30Þ

where k is such that

Px0 ðH Þ ¼ a ð3:31Þ

Now,
rffiffiffi
pðx=K Þ p x2 jxj
¼ e2 [k
pðx=H Þ 2
rffiffiffi
p x2
, loge þ  j xj [ loge k
2 2
n p
o
, x2  2j xj þ loge  2 loge k [ 0
2
, x 2  2j x j þ C [ 0 ð3:32Þ

Using (3.32), (3.31) is equivalent to


 
P x2  2j xj þ C [ 0=H ¼ a ð3:33:Þ

Test given by (3.32) and (3.33.) is MP size-a.


To find ‘C’ we proceed as follows:
     
P x2  2jxj þ C [ 0=H ¼ PH x2  2jxj þ C [ 0\x\0 þ PH x2  2jxj þ C [ 0\x [ 0
 2   2 
¼ PH x þ 2x þ C [ 0\x\0 þ PH x  2x þ C [ 0\x [ 0
3.3 Method of Obtaining BCR 83

Now, under H, X ∼ N(0, 1)


   
) PH x2 þ 2x þ C [ 0\x\0 ¼ PH x2  2x þ C [ 0 \ x [ 0

Thus (3.33) is equivalent to


  a
PH x2  2x þ C [ 0 \ x [ 0 ¼ ð3:34Þ
2

Writing gð xÞ ¼ x2  2x þ C ) g00 ð xÞ ¼ 2 and g0 ð xÞ ¼ 0 at x ¼ 1


) gð xÞ is minimum at x ¼ 1
 
) x2  2x þ C [ 0\x [ 0 , x\x1 ðcÞ or x [ x2 ðcÞ

g(x)

x<0 x1(c) x2 (c) x>0

where x1 ðcÞ\x2 ðcÞ are the roots of

x2  2x þ C ¼ 0
pffiffiffiffiffiffiffiffiffiffiffiffiffi
2 4  4c pffiffiffiffiffiffiffiffiffiffiffi
)x¼ ¼1 1c
2
pffiffiffiffiffiffiffiffiffiffiffi pffiffiffiffiffiffiffiffiffiffiffi
So, x1 ðcÞ ¼ 1  1  c and x2 ðcÞ ¼ 1 þ 1  c
Hence (3.34)
n pffiffiffiffiffiffiffiffiffiffiffio n pffiffiffiffiffiffiffiffiffiffiffio a
, PH 0\x\1  1  c þ PH x [ 1 þ 1  c ¼
2
pffiffiffiffiffiffiffiffiffiffiffi
1 pffiffiffiffiffiffiffiffiffiffiffi
a
, U 1  1  c  þ1  U 1þ 1  c ¼
2 2
pffiffiffiffiffiffiffiffiffiffiffi
pffiffiffiffiffiffiffiffiffiffiffi
a 1 1a
, U 1þ 1  c  U 1  1  c ¼ 1   ¼ ð3:35Þ
2 2 2

Test given by (3.32) and (3.35) is MP size-a.


 
Example 3.13 Let X be a single observation from the p.d.f. p x=h ¼ ph h2 þ1 x2 ; h [ 0.
Find the UMP size-a test for H : h ¼ h0 against K : h [ h0
84 3 Theory of Testing of Hypothesis

Answer Take any h1 [ h0 and consider the ratio

pðx=h1 Þ h1 h20 þ x2 h1 h20 þ x2


¼ 2 ¼  2   
pðx=h0 Þ h0 h1 þ x2 h0 h1  h20 þ h20 þ x2
h1 1
¼ 2 2
; which is a strictly increasing function of x2 ði:e:; j xjÞ:
h0 1 þ h21 h0
h0 þ x2

Since h1 [ h0 , hence we can find a ‘C’ such that

pðx=h1 Þ
[ k , j xj [ C ð3:36Þ
pðx=h0 Þ

where C is such that

P½j xj [ C=h0  ¼ a ð3:37Þ

Z1   1
h0 dx a 1 x a
, 2 ¼ , tan1 ¼
p h0 þ x 2 2 p h0 C 2
C
    
1 p 1 C a 2 1 C
,  tan ¼ , 1  tan ¼a ð3:38Þ
p 2 h0 2 p h0

Test given by (3.36) and (3.38) is MP size-a. As the test is independent of any
h1 [ h0 , it is UMP size-a for H : h ¼ h0 against K : h [ h0 . Power function is
given by

ZC
h
Px0 ðhÞ ¼ Pfj X j [ C=hg ¼ 1   dx
p h2 þ x 2
C

Example 3.14 X is a single observation from Cauchy p.d.f .f ðx=hÞ ¼ p 1


,
fðxhÞ2 þ 1g
we are to find MP size-a test for H : h ¼ h0 against K : h ¼ h1 ð [ h0 Þ.
Answer X  CouchyðhÞ ) Y ¼ X  h0  Couchyðh  h0 ¼ dÞ. Hence H : h ¼
h0 , H : d ¼ 0 using Y-observation. So, without any loss of generality we take
H : h ¼ 0 and for the sake of simplicity we take h1 ¼ 1.
Here, by N–P lemma, MP test has the critical region x ¼
fx : pðx=h1 Þ [ kpðx=h0 Þg with Ph0 ðX 2 xÞ ¼ a 2 ð0; 1Þ
3.3 Method of Obtaining BCR 85

Here

pðx=h1 Þ 1 þ x2
[k , [k
pðx=h0 Þ 1 þ ð x  1Þ 2
 
, 1 þ x2 [ k 1 þ x2  2x þ 1
, x2 ð1  kÞ þ 2kx þ 1  2k [ 0 ð3:39Þ

Several cases
(a) k ¼ 1 , x [ 0, hence the size of the test is PðX [ 0=h ¼ 0Þ ¼ 12.
(b) 0\k\1 if we write gð xÞ ¼ ð1  kÞx2 þ 2kx þ ð1  2kÞ,
we have, g0 ð xÞ ¼ 2ð1  kÞx þ 2k ¼ 0 ) x ¼  1kk
00
g ð xÞ ¼ 2ð1  kÞ [ 0, this means that the curve y ¼ gð xÞ has a minimum at
k
x ¼  1k .

Shape of the curve is:

Here x1 \x2 are the roots of gð xÞ ¼ 0. Clearly, test is given by x\x1 or x [ x2


such that

PfX\x1 =h ¼ 0g þ PfX [ x2 =h ¼ 0g ¼ a ð3:40Þ

We take those values of x1 ; x2 that satisfies (3.40). Eventually, it is not possible


to get x1 ; x2 for any given a. It exists for some specific values of a.
(c) If k [ 1, in that case g00 ð xÞ ¼ 2ð1  kÞ\0, thus y ¼ gð xÞ has the maximum at
 k 
x ¼  1k [ 0. As shown in (b) above here also we can find x1 and x2 the
two roots of gð xÞ ¼ 0 and the test will be given by x [ x1 or x\x2 with
Pfx1 \X\x2 =h ¼ hg ¼ a. Taking h1 ¼ 2, it can be shown that the MP test
for H : h ¼ 0 against h ¼ 2 is completely different. Hence based on single
observation there does not exist UMP test for H : h ¼ 0 against K : h [ 0
86 3 Theory of Testing of Hypothesis

Randomized Test Testing problem H : h ¼ h0 against K : h ¼ h1 . If the ran-


dom variable Y ¼ ppððX=h 1Þ
X=h0 Þ is continuous under h ¼ h0 , we can always find kð [ 0Þ
such that for a given a 2 ð0; 1Þ; Ph0 ðY [ kÞ ¼ a.
On the other hand, if the random variable Y is discrete under h ¼ h0 , it may not be
always possible to find k such that, for a given a 2 ð0; 1Þ Ph0 ðY [ kÞ ¼ a. In that
case, we modify the non-randomized test x0 ¼ fx : pðx=h1 Þ [ kpðx=h0 Þg by using
following functions:
9 8 9
¼ 1; if pðx=h1 Þ [ kpðx=h0 Þ = <Y [k=
/0 ð xÞ ¼ a; if pðx=h1 Þ ¼ kpðx=h0 Þ , Y ¼ k ð3:41Þ
; : ;
¼ 0; if pðx=h1 Þ\kpðx=h0 Þ Y\k

where ‘a’ and ‘k’ are such that

Ph0 fY [ kg þ aPh0 fY ¼ kg ¼ a ð3:42Þ

The function given by (3.41) and (3.42) is called the randomized test corre-
sponding to non-randomized test x0 . It states that, after observing Y ði.e; X Þ

Reject H if Y [ k
Accept H if Y\k

Perform random experiment with probability of success = a, if Y ¼ k.


Occurrence of success ) Rejection of H and
Occurrence of failure ) Acceptance of H.
Now we can show that the test given by (3.41) and (3.42) is MP size-a among all
 
tests / satisfying Eh0 /ð xÞ
a. Observe that /0 ð xÞ ¼ 0 ) /0 ð xÞ  /ð xÞ 08x :
 
pðx=h1 Þ [ kpðx=h0 Þ and /0 ð xÞ ¼ 0 ) /0 ð xÞ  /ð xÞ
08x : pðx=h1 Þ\kpðx=h0 Þ.
Hence, for all x, we have,
 0 
/ ð xÞ  /ð xÞ ½pðx=h1 Þ  kpðx=h0 Þ 0
Z
 0 
) / ð xÞ  /ð xÞ ½pðx=h1 Þ  kpðx=h0 Þdx 0

, Eh1 /0 ð xÞ  Eh1 /ð xÞ  ka þ kEh0 /ð xÞ 0


, Eh1 /0 ð xÞ  Eh1 /ð xÞ kða  Eh0 /ð xÞÞ 0; As k [ 0 and Eh0 /ð xÞ
a
, P/0 ðh1 Þ P/ ðh1 Þ
3.3 Method of Obtaining BCR 87

) /0 is MP size-a among all / : Eh0 /ð xÞ


a.

Example 3.15 X1 ; X2 ; . . .. . .; Xn are i.i.d according to f ðx=hÞ ¼ hx ð1  hÞ1x ; x ¼


0; 1 To find UMP size-a test for H : h ¼ h0 ; against K : h [ h0 .

Answer Take any h1 [ h0 . To get MP size-a test for H : h ¼ h0 ; against


K : h ¼ h1 , we consider the ratio

P P
Qn xi n xi
pðx=h1 Þ f ð x =h Þ h ð 1  h Þ
¼ Qi¼1 ¼ 1P
i 1 1
Y¼ n P
pðx=h0 Þ i¼1 f ðxi =h0 Þ
xi n xi
 n  s h0 ð 1  h0 Þ
1  h1 h1 ð1  h0 Þ
¼
1  h0 h0 ð1  h1 Þ
P
where s ¼ xi . Observe that Y is a discrete r.v. under any h.

Hence by the N–P lemma, MP size-a test is given by


9
1; if Y [ k =
/0 ð xÞ ¼ a; if Y ¼ k
;
0; if Y\k
  
1  h1 n h1 ð 1  h0 Þ s
,
or k; ð3:43Þ
1  h0 h0 ð 1  h1 Þ

where ‘k’ and ‘a’ are such that Eh0 /0 ð xÞ ¼ a

, Ph0 fY [ kg þ aPh0 fY ¼ kg ¼ a; ð3:44Þ

Now,
 n 
1  h1 h1 ð 1  h0 Þ s

or k
1  h0 h0 ð 1  h1 Þ
 h 0 i
1  h1 h 1 ð 1  h0 Þ
, n log þ s log
or k0 k ¼ loge k
1  h0 h 0 ð 1  h1 Þ
0
k log 1h 1

, s
or n on
n 1h0 o
log hh10 ðð1h0Þ
log hh10 ðð1h 0Þ

 1h1 Þ 1h1 Þ  
h1 ð 1  h0 Þ
¼ C; ðsayÞ; As; h1 [ h0 ) log [0
h0 ð 1  h1 Þ
88 3 Theory of Testing of Hypothesis

Hence (3.43) and (3.44) are equivalent to


8 9
< 1; if s [ C >
> =
/0 ð xÞ ¼ a; if s ¼ C ð3:45Þ
>
: >
;
0; if s\C

and

Ph0 fs [ C g þ aPh0 fs ¼ C g ¼ a ð3:46Þ


P
Under any h, s ¼ n1 Xi  binðn; hÞ. Hence from (3.46) we have, either,
n  
P ns
Ph0 fs [ C g ¼ a , s h0 ð 1  h0 Þ
n s
¼a)a¼0
cþ1
or,

Ph0 fs Cg\a\Ph0 fs C g
P  
a  nc þ 1 ns hs0 ð1  h0 Þns ð3:47Þ
)a¼   c
n h ð1  h Þnc
c 0 0

Test given by (3.45) and (3.47) is MP size-a for H : h ¼ h0 against


K : h ¼ h1 ð [ h0 Þ. Test is independent of any h1 ð [ h0 Þ. Hence it is UMP size-a
for H : h ¼ h0 against K : h [ h0

Observation
n o
h1 ð1h0 Þ
1. For h1 \h0 ) log h0 ð1h1 Þ \0
In that case (3.43) and (3.44) are equivalent to
8 9
< 1; if s\C >
> =
/ ð xÞ ¼ a; if s ¼ C and Ph0 fs\C g þ aPh0 fs ¼ C g ¼ a:
>
: >
;
0; if s [ C
We can get UMP for H : h ¼ h0 against K : h\h0 by similar arguments.
Obviously /0 6¼ / . So there does not exist a single test which is UMP for
H : h ¼ h0 against K : h 6¼ h0
2. By De Moivre–Laplace limit theorem, for large n, pffiffiffiffiffiffiffiffiffiffiffiffi
Snh
is approximately N(0, 1).
nhð1hÞ
Hence, from (3.45) and (3.46), we get,

C  nh0 pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ’ sa ) C ’ nh0 þ sa nh0 ð1  h0 Þ
nh0 ð1  h0 Þ
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
Then approximately size-a test is : Reject H if s [ nh0 þ sa nh0 ð1  h0 Þ,
Accept H otherwise.
3.3 Method of Obtaining BCR 89

3. Power function of test given by (3.45) and (3.46) is:

PðhÞ ¼ Eh /0 ð X Þ
¼ Ph fS [ cg þ aPh fS ¼ cg
X n n  s ns  
¼ s h ð 1  hÞ þ a nc hc ð1  hÞnc
S¼c þ 1
½Can be obtained using Biometrika table
X n n  s ns
X
n  
ns
¼ ð 1  aÞ s h ð 1  hÞ þa s h ð1  hÞ
n s

S¼c þ 1 S¼c
¼ ð1  aÞIh ðc þ 1; n  cÞ þ aIh ðc; n  c þ 1Þ;
½Can be obtained using incomplete Beta function table:
Observe, as Ih ðm; nÞ is an increasing function of h, the Power function PðhÞ
increases with h.

Example 3.16 Let X be a single 1 3 observation. To find MP size-a test for H :


X  Rð0; 1Þ against K : X  R 2 ; 2
Answer pðx=H Þ ¼0; 1; if 0\x\1
otherwise

1; if 1=2 \ x \ 3=2
pðx=K Þ ¼0; otherwise

As the ratio pðx=K Þ=pðx=H Þ is discrete, MP test for H against K is given by:
9
¼ 1 if; pðx=K Þ [ kpðx=H Þ =
/0 ð xÞ ¼ a; if pðx=K Þ ¼ kpðx=H Þ ð3:48Þ
;
¼ 0; if; pðx=K Þ\kpðx=H Þ

where ‘a’ and ‘k’ are such that


E H / 0 ð xÞ ¼ a ð3:49Þ

Taking
1
k\1; 0\x
) pðx=K Þ ¼ 0 and pðx=H Þ ¼ 1
2
) pðx=k Þ\kpðx=H Þ ) /0 ð xÞ ¼ 0
1
\x\1 ) pðx=K Þ ¼ pðx=H Þ ¼ 1=2
2
) pðx=K Þ [ kpðx=H Þ ) /0 ð xÞ ¼ 1
3
1
x\ ) pðx=K Þ ¼ 1 and pðx=H Þ ¼ 0
2
) pðx=K Þ [ kpðx=H Þ ) /0 ð xÞ ¼ 1
90 3 Theory of Testing of Hypothesis
1 
So, for k\1, we get EH /0 ð X Þ ¼ 1: PH 2 \X\1 þ 1: PH ðX 1Þ ¼ 12. Thus it
is a trivial test of size 0.5.
Taking k\1;

1 9
0\x
) /0 ð xÞ ¼ 0 >
>
>
2 >
>
=
1
\x\1 ) /0 ð xÞ ¼ 0 EH /0 ð X Þ ¼ 0 and it is a trivial test of size 0:
2 >
>
>
>
3 >
1
x
) /0 ð xÞ ¼ 1 ;
2

Taking k ¼ 1;
o\x
12 ) /0 ð xÞ ¼ 0: we always accept H.
1
x\ 32 ) /0 ð xÞ ¼ 1: We always reject H.

2 \x\1 ) pðx=K Þ ¼ kpðx=H Þ ) We perform a random experiment with


1

probability of success ‘a’ determined by EH /0 ð xÞ ¼ a.


 
1
, a:PH \X\1 ¼ a , a ¼ 2a
2

Thus the randomized test given by /0 ð xÞ

19>
¼ 0; if 0\x
>
2>>
>
=
1
¼ 2a; if \x\1 is MP size-a test:
2 >
>
>
3>>
¼ 1; if1
x\ ;
2

3.4 Locally MPU Test

The optimum region is obtained by the use of the following:


Generalization of N–P lemma.
Theorem 2 Let g0 ; g1 ; g2 ; . . .gm be ðm þ 1Þ non-negative
R integrable functions on
the sample space  x . Let x be any region such that x gi ð xÞdx ¼ Ci ; i ¼ 1ð1Þm
where Ci ’s are all known numbers.
Suppose x0 be a subset of  x such that:
Pm
Inside x0 : g0 ð xÞ [ ki gi ð xÞ;
1
3.4 Locally MPU Test 91

P
m
Outside x0 : g0 ð xÞ
ki gi ð xÞ, where k1 ; k2 ; . . .. . .km are so chosen that
R 1
gi ð xÞdx ¼ Ci ; i ¼ 1ð1Þm.
x0
R R
Then we have g0 ð xÞdx g0 ð xÞdx.
x0 x
This is called generalized Neyman–Pearson Lemma.
Proof
Z Z
g0 ð xÞdx  g0 ð xÞdx
x0 x
Z Z  
x0  x ¼ x0  x\x0 ¼ insidex0
¼ g0 ð xÞdx  g0 ð xÞdx. . .. . .. . .ð1Þ
x  x0 ¼ x  x\x0 ¼ outsidex0
x0 x xx0

X
m
x 2 x0  x ) g0 ð xÞ [ ki gi ð xÞ
1
Z Z ( )
X
m
) g0 ð xÞdx ki gðxi Þ dx
1
x0 x x0 x
8 9
Xm < Z =
¼ ki gi ð xÞdx
i¼1
: ;
x0 x

X
m
x 2 x 0  x ) g0 ð x Þ
ki gi ð x Þ
1
( ) 8 9
Z Z X
m X
m < Z =
) g0 ð xÞdx
ki gi ðxÞ dx ¼ ki gi ð xÞdx
1 1
: ;
xx0 xx0 xx0

Hence L.H.S of (1)


8 9 8 9
Z Z X
m < Z = X m < Z =
g0 ð xÞdx  g0 ð xÞdx ki gi ð xÞdx  ki gi ð xÞdx
i
: ; i
: ;
x0 x xx0 x0 x xx0
2 3
X
m Z Z
¼ ki 4 gi ðxÞdx  gi ð xÞdx5
i
x0 x xx0
2 3
X
m Z Z X
m
¼ ki 4 gi ð xÞdx  gi ð xÞdx5 ¼ k i ðCi  Ci Þ ¼ 0
i i
x0 x

) Hence the proof. h


92 3 Theory of Testing of Hypothesis

Locally Best Tests


1. One-sided case: For the family fpðx=hÞ; h 2 Hg, sometimes we cannot find
UMP size-α test for H : h ¼ h0 against K : h [ h0 or h\h0 .
For example, if X1 ; X2 ; . . .; Xn ðn 1Þ are i.i.d according to the p.d.f.

1 1
f ðx=hÞ ¼ ; ð1\h\1; 1\x\1Þ
p 1 þ ðx  hÞ2

we cannot find UMP size-a test for H : h ¼ h0 against h [ h0 or h\h0 .


In that case, we can find an e [ 0 for which there exists a critical region x0 such
that Px0 ðh0 Þ ¼ a and Px0 ðhÞ Px ðhÞ8h : h0 \h\h0 þ e and 8x : Px ðh0 Þ ¼ a.
Construction Let pðx=hÞ be such that, for every x, ddh px ðhÞ exists and is
continuous in the neighborhood of h0 . Then we have, by mean value theorem, for
any h [ h0

d
Px ðhÞ ¼ Px ðh0 Þ þ ðh  h0 Þ Px ðhÞ ; h0 \h \h
dh h¼h

¼ Px ðh0 Þ þ ðh  h0 ÞP0x ðh Þ; ðsayÞ ð3:50Þ

Similarly,

Px0 ðhÞ ¼ Px0 ðh0 Þ þ ðh  h0 ÞP0x0 ðh Þ; ðsayÞ ð3:51Þ

Let x0 be such that Px0 ðh0 Þ ¼ a and P0x0 ðh0 Þ is maximum, i.e.
P0x0 ðh0 Þ P0x ðh0 Þ8x : Px0 ðh0 Þ
¼ a: Then comparing (3.50) and (3.51), we get an
e [ 0, such that Pxc ðhÞ Px ðhÞ8h : h0 \h\h0 þ e. Such a x0 is called locally
most powerful size-α test for H : h ¼ h0 against h [ h0 .
Now our problem is to choose x0 such that
Z
Px0 ðh0 Þ ¼ a , pðx=h0 Þdx ¼ a ð3:52Þ
x0

and
Z Z
dpðx=hÞ dpðx=hÞ
P0x0 ðh0 Þ P0x ðh0 Þ , dx dx
dh0 dh0
x
Z0 x
Z
0
, p ðx=h0 Þdx p0 ðx=h0 Þdx
x0 x

R
where x satisfies Px ðh0 Þ ¼ a , pðx=h0 Þdx ¼ a
x
3.4 Locally MPU Test 93

In the generalized N–P lemma, take m = 1 and set g0 ðxÞ ¼ p0 ðx=h0 Þ,


g1 ðxÞ ¼ pðx=h0 Þ, C1 ¼ a, k1 ¼ k.
Then we get,
)
Inside x0 : p0 ðx=h0 Þ [ kpðx=h0 Þ
ð3:53Þ
Outside x0 : p0 ðx=h0 Þ
kpðx=h0 Þ
R R
Finally, p0 ðx=h0 Þdx p0 ðx=h0 Þdx
x0 x
where x0 and x satisfy

Px0 ðh0 Þ ¼ Px ðh0 Þ ¼ a ð3:54Þ

Thus the test given by (3.53) and (3.54) is locally most powerful size-a for
H : h ¼ h0 against h [ h0 .
Note If UMP test exists for H : h ¼ h0 against h [ h0 ) LMP test corre-
sponding to the said problem must be identical to the UMP test. But the converse
may not be true.
Example 3.17 X1 ; X2 ; . . .Xn are i.id Nðh; 1Þ. H : h ¼ h0 against h [ h0 .
LMP test is provided by

x0 ¼ fx : p0 ðx=h0 Þ [ kpðx=h0 Þg ð3:55Þ

where k is such that


Z
pðx=h0 Þdx ¼ a ð3:56Þ
x0

It can be observed that

p0 ðx=h0 Þ [ kpðx=h0 Þ
1
, p0 ðx=h0 Þ [ k
pðx=h0 Þ ð3:57Þ
d
, ½loge pðx=h0 Þ [ k
dh0
P
Here pðx=hÞ ¼ ð2pÞn=2 e2 ðxi hÞ
1 2

1Xn
) log pðx=hÞ ¼ const.  ðxi  hÞ2
2 1


P
n
) d logðx=h
dh ¼ ðxi  h0 Þ, hence by (3.57), (3.55)
0
1
94 3 Theory of Testing of Hypothesis

, x0 ¼ fx : x [ k0 g

and (3.56) , Ph0 fx [ k0 g ¼ a


pffiffiffi pffiffiffi  pffiffiffi
, Ph0 nðx  h0 Þ [ nðk0  h0 Þ ¼ a ) nðk0  h0 Þ ¼ sa
n o
i.e, k0 ¼ h0 þ p1ffiffin sa . Thus x0 , x0 ¼ x : x [ h0 þ p1ffiffin sa
which is identical to the UMP test for H : h ¼ h0 against h [ h0 .

General case: Let X1 ; X2 ; . . .Xn be i.i.d with p.d.f f ðx=hÞ.


To find LMP for H : h ¼ h0 against h [ h0
Qn
Here pðx=hÞ ¼ f ðxi =hÞ;
i¼1
LMP test is given by the critical region:
x ¼ fx : p0 ðx=h0 Þ [ kpðx=h0 Þg, where k such that Px ðh0 Þ ¼ a
Now,

p0 ðx=h0 Þ
p0 ðx=h0 Þ [ kpðx=h0 Þ , [k
pðx=h0 Þ
d log pðx=h0 Þ
, [k ½pðx=hÞ [ 0
dh0

P
n 0
f ðxi =h0 Þ P
n 0
, f ðxi =h0 Þ [ k, ðsayÞ , yi [ k, where yi ¼ ff ðxðxii=h
=h0 Þ

.
1 1
Now, under H, yi 0 s is i.i.d with
Z Z
f 0 ðxi =h0 Þ
Eh0 fyi g ¼ f ðxi =h0 Þdx ¼ f 0 ðx=h0 Þdx
f ðxi =h0 Þ
Z
d d
¼ f ðx=h0 Þdx ¼ ð1Þ ¼ 0
dh0 dh0
Z 
f 0 ðxi =h0 Þ 2
Vh0 fyi g ¼ f ðx=h0 Þdx
f ðx=h0 Þ
Z 
@ log f ðx=h0 Þ 2
¼ f ðx=h0 Þdx
dh0
¼ Iðh0 Þ ½Fisher0 s information:
Pn
y
Hence, by Central Limit Theorem, for large n, pffiffiffiffiffiffiffiffiffi
1 i
 Nð0; 1Þ, under H. So,
nIðh0 Þ
for large n, the above test can be approximated by
3.4 Locally MPU Test 95

( )
X
n
f 0 ðxi =h0 Þ pffiffiffiffiffiffiffiffiffiffiffiffiffi
x¼ x: [ sa nIðh0 Þ :
i¼1
f ðxi =h0 Þ

Locally Best test: Two-sided case


For testing H : h ¼ h0 against h 6¼ h0 corresponding to the family fpðx=hÞ; h 2 Hg,
it is being known that (expect some cases) there does not exist any test which is
UMP for both sided alternatives. e.g. taking l ¼ 0 against l 6¼ 0 for Nðl; 1Þ and
taking r2 ¼ 1 against r2 6¼ 1 for Nð0; r2 Þ etc.
In those cases, we can think of a test which is UMP in a neighbourhood of h0 .
Thus a test ‘w0 ’ is said to be locally best (of size a) for H : h ¼ h0 against K : h 6¼ h0
if there exists an t [ 0 for which
(i) Pw0 ðh0 Þ ¼ a
(ii) Pw ðhÞ a 8h : jh  h0 j\t and 8w satisfying (i).
Let pðx=hÞ be such that, for a chosen w
(i) P0w ðhÞ exists in the neighbourhood of h0 ;
(ii) P00w ðhÞ exists and is continuous at (in the neighbourhood) h0 .
Then we have, by Taylor’s Theorem

ðh  h0 Þ2 00 k
Pw ðhÞ ¼ Pw0 ðh0 Þ þ ðh  h0 ÞP0w ðh0 Þ þ Pw ðh Þ;
2!
jh  h0 j\jh  h0 j

Let w0 be such that


(i) Pw0 ðh0 Þ ¼ a (size condition)
(ii) P0w0 ðh0 Þ ¼ 0 (Locally unbiased condition)
(iii) P00w0 ðh0 Þ is maximum
Then we can find an t [ 0 such that 8h : jh  h0 j \t, We have
Pw0 ðhÞ Pw ðhÞ8w satisfying (i) and (ii).
2
Now Pw ðhÞ ¼ Pw ðh0 Þ þ ðh  h0 ÞP0w ðh0 Þ þ ðhh
2!

P00w ðhk0 Þ þ g and g ! 0 as
h ! h0 .
To get Pw0 ðhÞ Pw ðhÞ8h : jh  h0 j\t we must have P00w0 ðh0 Þ P00w ðh0 Þ [due to
continuity of Pw ðhÞ].
Then w0 is called locally Most Powerful unbiased size-a test if
(i) Pw0 ðh0 Þ ¼ a
(ii) P0w0 ðh0 Þ ¼ 0:
(iii) P00w0 ðh0 Þ P00w ðh0 Þ8w satisfying (i) and (ii).
96 3 Theory of Testing of Hypothesis

Construction
Z
Z

Pw ðh0 Þ ¼ p x=h0 dx; P0w ðh0 Þ ¼ p0 x=h0 dx;


w w
Z

P00w ðh0 Þ ¼ p00 x=h0 dx:


w

Let us set in generalized N–P lemma




g0 ðxÞ ¼ p00 x=h0 ; g1 ðxÞ ¼ p x=h0 ; g2 ðxÞ ¼ p0 x=h0 ;

c1 ¼ a; c2 ¼ 0

Then
n


o
w0 ¼ x : p00 x=h0 [ k1 p x=h0 þ k2 p0 x=h0
R R
where k1 and k2 are such that g1 ðxÞdx ¼ a; g2 ðxÞdx ¼ 0:
R Rw0 w0
Then we have g0 ðxÞdx g0 ðxÞdx provided ‘w’ satisfies (i) and (ii).
w0 w

, P00w0 ðh0 Þ P00w ðh0 Þ

Example 3.18 X1 ; X2 ; . . .Xn are i.i.d. Nðl; 1Þ. To find LMPU test for H : l ¼ l0
against K:l 6¼ l0 :
P n
  1
n 12 ðxi lÞ2
Answer Here p x=h ¼ pffiffiffiffi 2p
e 1

  P
n
 
0 1 n
1
2 ðxi lÞ2  
x
p =h ¼ pffiffiffiffiffiffi nðx  lÞe ¼ nðx  lÞp x=h
2p
 
00 x
   
p =h ¼ ½nðx  lÞ2 p x=h  np x=h

LMPU size-a test is


n


o Z
w0 ¼ x : p00 x=h0 [ k1 p x=h0 þ k2 p0 x=h0 pðx=h0 Þdx ¼ a ð3:58Þ
x0
3.4 Locally MPU Test 97

and
Z
p0 ðx=h0 Þdx ¼ 0 ð3:59Þ
x0

n o
¼ x : ½nðx  l0 Þ2 n [ k1 þ k2 nðx  l0 Þ
n pffiffiffi 2 pffiffiffi o
¼ x: nðx  l0 Þ [ k01 þ k02 nðx  l0 Þ
 
¼ x : y2 [ k01 þ k02 y ;
pffiffiffi
y ¼ nðx  l0 Þ  Nð0; 1Þ under H
(3.59) ⟺
Z
yNðy=0; 1Þdy ¼ 0 ð3:60Þ
y2 [ k01 þ k02 y

Now the LHS of (3.60) is zero irrespective of choice of any ðk01 ; k02 Þ since
y
Nð =0; 1Þ is symmetrical about ‘0’.
Here, we can safely take k02 ¼ 0 without affecting size condition. Then our test
 
reduces to w0 : x : y2 [ k01  fx : jyj [ cg and hence (3.58) is equivalent to
R
Nðy=0; 1Þdy ¼ a ) c ¼ sa=2
jyj [ c
Then we obtain LMPU test for H : l ¼ l0 against l 6¼ l0 .
A test which is locally most powerful and locally unbiased is called a Type A
test and corresponding critical region ‘w0 ’ is said to be Type-A critical region

3.5 Type A1 (Uniformly Most Powerful Unbiased) Test


 
Let p x=h ; h 2 H : Real parameter family of distributions.
Testing problem: H : h ¼ h0 against K : h 6¼ h0 :
T ð X Þ ¼ T : Test statistic.
(i) Right tail test based on T is UMP for H : h ¼ h0 against h [ h0 (in most of the
cases)
(ii) Left tail test based on T is UMP for H : h ¼ h0 against h\h0 (in most of the
cases)
P P
[As for example N ðl; 1Þ; N ð0; r2 Þ …. T ¼ xi ; T ¼ x2i etc. and for
Bðn; pÞ; T ¼ x etc:]
There  not exist a single test which is UMP for H : h ¼ h0 against h 6¼ h0 :
 does
If p x=h has monotone likelihood ratio in T ð X Þ; i.e.
98 3 Theory of Testing of Hypothesis

x=
h1
" T ð xÞ for h1 [ h0 ; then (i) and (ii) are satisfied.
p

p x=h
0
In that case, we try to choose a test w0 for which
(i) Pw0 ðh0 Þ ¼ a
(ii) Pw0 ðhÞ a8h 6¼ h0
(iii) Pw0 ðhÞ Pw ðhÞ8h 6¼ h0 8w satisfying (i) and (ii)

 is called UMPU size-a test for H : h ¼ h0 against h 6¼ h0 :


Such a test
Let p x=h be such that, for every test w;
d
dh ½Pw ðhÞ exists;
and
Z Z Z
d d dpðx=hÞ
½Pw ðhÞ ¼ pðx=hÞdx ¼ dx ¼ p0 ðx=hÞdx ð3:61Þ
dh dh dh
w w w

Then unbiasedness of a test


d
w) Pw ðhÞ ¼ 0 ð3:62Þ
dh0

Thus, if a test w0 satisfies (i), (ii) and (iii); under (3.61), w0 also satisfies (i) and (iii).
Test satisfying (i), (iii) and (3.62) is called type-A1 test.
For exponential distribution, if type-A1 test exists, then it must be unbiased. But
this is not true in general.
Construction Our problem is to get w0 such that
R

(i) p x=h0 dx ¼ a;
R 0

w0

(ii) p x=h0 dx ¼ 0
R  
w0
R  
(iii) p x=h dx p x=h dx 8w satisfying (i) and (ii) and 8h 6¼ h0
w0 w
 

In generalized N–P Lemma, put g0 ¼ p x=h ; g1 ¼ p x=h0 ; g2 ¼ p0 x=h0 ;


c1 ¼ 1; c2 ¼ 0. n

o
 
Then, define w0 ¼ x : p x=h [ k1 p x=h0 þ k2 p0 x=h0 and hence
R   R  
p x=h dx p x=h dx8w satisfying (i) and (ii) and 8h 6¼ h0 :
w0 w
For exponential distribution, it is always possible to have such region w0 (which
means type-A1 test exists).
Example 3.19 X1; X2; . . .; Xn are i.i.d. Nðl; 1Þ. We test H : l ¼ l0 against
K : l 6¼ l0
3.5 Type A1 (Uniformly Most Powerful Unbiased) Test 99

1
P
n
ðxi lÞ2
pðx=hÞ ¼ ð2pÞn=2 e
2

P
n
X
n 1
ðxi lÞ2 X
n
p ðx=hÞ ¼ ð2pÞn=2
0 2
ðxi  lÞe i ¼ ðxi  lÞpðx=hÞ
i¼1 i

Then type-A1 region (test) is given by

w0 ¼ fx : pðx=hÞ [ k1 pðx=h0 Þ þ k2 nðx  l0 Þpðx=h0 Þg

1
P 2
e 2 ðxi lÞ
n 2
pðx=hÞ e 2 ðxlÞ
P ¼ e2ðll0 Þf2xðl0 þ lÞg
n
¼ ¼
pðx=h0 Þ e12 ðxi l0 Þ2 en2 ðxl0 Þ2
pffiffiffi
) w0 ¼ fx : eðll0 Þt [ k01 þ k02 tg where t ¼ nðx  l0 Þ
¼ fx : edi [ k01 þ k02 tg

where k00 1 and k01 are such that


Z Z
x
pð =h0 Þdx ¼ a; nðx  lÞpðx=h0 Þdx ¼ 0
w0 w0

Z
, Nðt=0; 1Þdt ¼ a ð3:63Þ
wo

Z
, tNðt=0; 1Þdt ¼ 0 ð3:64Þ
w0

Writing gðtÞ ¼ edt  k01  k02 t we have g0 ðtÞ ¼ dedt  k02 and g00 ðtÞ ¼ d2 edt [ 08t
) y ¼ gðtÞ has a single minimum (global minimum).
Now if we take a\0:5, because of (3.63) and since the distribution of t is sym-
metric about 0 under H0 our shape of the curve will be like as shown below. From the
graph, we observe that c1 \c2 ; gðtÞ [ 0 for t\c1 and t [ c2 and gðtÞ
0 otherwise.
y=g(t)

c1 c2 t
100 3 Theory of Testing of Hypothesis

Hence w0 is equivalent
R to wo ¼ fx : t\c1 or t [ c2 g
(3.63) , Nðt=0; 1Þdt ¼ a and (3.64)
t\c1 ;t [ c2

Z
, tNðt=0; 1Þdt ¼ 0 ð3:65Þ
t\c1; t [ c2

Now, as T  Nð0; 1Þ, we take w0 as

w0 ¼ fx : t\  c and t [ cg ð3:66Þ

where c is such that


Z
Nðt=0; 1Þdt ¼ a ) c ¼ sa=2 ð3:67Þ
jt j [ c

Here (3.65) is automatically satisfied. Hence test given by (3.66) and (3.67) is
type-A1 (which is UMPU).
Example 3.20

X1 ; X2 ; . . .. . .; Xn are i.i.d: Nð0; r2 Þ:

Testing Problem, H : r2 ¼ r20 against K : r2 6¼ r20


  n Pn
1 1 2
pðx=hÞ ¼ pffiffiffiffiffiffiffi e2r2 i xi
r 2P
0Pn 1
x2i
B i C 1
p0 ðx=hÞ ¼ B C
@ r2  nA 2r2 pð =hÞ
x

Thus,
 Pn 2  
x x i xi 1 x

w0 ¼ x : pð =hÞ [ k1 pð =h0 Þ þ k2 n p =h0


r20 2r20
( Pn 2
pðx= Þ x
¼ x : x h [ k01 þ k02 tg; t ¼ i 2 i
pð =ho Þ r0

P
n
x2

pðx= Þ r
n i
i r2
ð1 02 Þ
As x h ¼
0 2r2 r
e 0
pð =h0 Þ r
3.5 Type A1 (Uniformly Most Powerful Unbiased) Test 101

n r
n d o
e2t [ k01 þ k02 t
0
w0 ¼ x :
r
 n dt
Now as before the curve y ¼ gðtÞ ¼ rr0 e 2  k01  k02 t has a single minimum.
n o
Here P T [ 0=h ¼ 1
) Shape of the curve gðtÞ will be as shown below

y = g(t)

d1 d2 t

which means there exists d1 and d2


such that w0 is equivalent to w0 ¼ fx : t\d1 or t [ d2 g
Here d1 and d2 are such that
Z
Zd2
p x=h0 dx ¼ a , fv2n ðtÞdt ¼ 1  a ð3:68Þ
w0 d1

and
Z
Z
p0 x=h0 dx ¼ 0 , ðt  nÞfv2n ðtÞdt ¼ 0 ð3:69Þ
w0 t\d1 or t [ d2

R R
(3.69) ⇔ tfv2n ðtÞdt ¼ n fv2n ðtÞdt ¼ na, by (3.68)
t\d1 or t [ d2 t\d1 or d2

Zd2
, tfvn2 ðtÞdt ¼ ð1  aÞn
d1

Zd2
, fvn2þ 2 ðtÞdt ¼ ð1  aÞ ð3:70Þ
d1

Thus UMPU (a type-A1 ) size a test is


x0 ¼ fx : t\d1 or t [ d2 g such that
   
P d1\ vn2 \d2 ¼ 1  a and P d1 \vx2þ 2 \d2 ¼ 1  a:

Note in this example ) type A1 test , type-A test.


102 3 Theory of Testing of Hypothesis

 
Example 3.21 X1 ; X2 ; . . .; Xn are i.i.d with p x=h ¼ hehx . Find Type-A and Type-
A1 test for H : h ¼ h0 against h 6¼ h0 .
Answer proceed as Examples 3.19 and 3.20 and hence get
( )
X
n X
n
x0 ¼ x: Xi \c1 or Xi [ c2
1 1

where c1 and c2 are such that



c1 c2
P  1\v2n \
2
 1 ¼ 1  a and
2h0 2h0

c1 c2
P  1\v2n þ 2 \
2
 1 ¼ 1  a:
2h0 2h0
Chapter 4
Likelihood Ratio Test

4.1 Introduction

In the previous chapter we have seen that UMP or UMP-unbiased tests exist only
for some special families of distributions, while they do not exist for other families.
Further, computations of UMP-unbiased tests in K-parameter family of distribution
are usually complex. Neyman and Pearson (1928) suggested a simple method for
testing a general testing problem.
Consider X  pðxjhÞ, where h is a real parameter or a vector of parameters,
h 2 H:
A general testing problem is

H : h 2 H0 Against K : h 2 H1 :

Here, H and K may be treated as the subsets of H. These are such that H \ K ¼
/ and H [ K  H. Given that X ¼ x; pðxjhÞ is a function of h and is called like-
lihood function. Likelihood test for H against K is provided by the statistic

Sup pðxjhÞ
h2H
L ð xÞ ¼ ;
Sup pðxjhÞ
h2H[K

which is called the likelihood ratio criterion for testing H against K. It is known that
(i) pðxjhÞ  08h
(ii) Sup pðxjhÞ  Sup pðxjhÞ.
h2H h2H[K

Obviously 0  Lð xÞ  1. The numerator in Lð xÞ measures the best explanation


that the observation X comes from some population under H and the denominator

© Springer India 2015 103


P.K. Sahu et al., Estimation and Inferential Statistics,
DOI 10.1007/978-81-322-2514-0_4
104 4 Likelihood Ratio Test

measures the best explanation of X to come from some population covered under
H [ K. Higher values of the numerator correspond to the better explanation of
X given by H compared to the overall best possible explanation of X, which results
in larger values in Lð xÞ leading to acceptance of H. That is, Lð xÞ would be larger
under H than under K. Indeed, smaller values of Lð xÞ will lead to the rejection of
H. Hence, our test procedure is:
Reject H iff Lð xÞ\C
and accept H otherwise,
where C is such that PfLð xÞ\CjH g ¼ a 2 ð0; 1Þ:
If the distribution of Lð xÞ is continuous, then the size a is exactly attained and no
randomization on the boundary is needed. If the distribution is discrete, the size
may not attain a and one may require randomization. In this case, we have C from
the relation

PfLð xÞ\CjH g  a:

Here, we reject H if Lð xÞ\C;


accept H if Lð xÞ [ C;
and reject with probability ‘a’ iff Lð xÞ ¼ C.
Thus, we have PfLð xÞ\CjH g þ aPfLð xÞ ¼ CjH g ¼ a.
The likelihood ratio tests are useful, especially when h is a vector of parameters
and the testing involves only some of them. This test criterion is very popular
because of its computational simplicity. Moreover, this criterion proves to be a
powerful alternative for testing vector valued parameters that involve nuisance
parameters. Generally, the likelihood ratio tests result in optimal tests, whenever
they exist. An LR test is generally UMP, if an UMP test at all exists. In many cases
the LR tests are unbiased, although this is not universally true. However, it is
difficult to compute the exact null distribution of the test statistic Lð xÞ in many
cases. Therefore, a study of large sample properties of Lð xÞ becomes necessary
where maximum likelihood estimators follow normal distribution under certain
regularity conditions. We mention the following large sample property of the
likelihood ratio test statistic without proof.
Under H, the asymptotic distribution of 2 loge Lð xÞ is distributed as v2 with
degrees of freedom equal to the difference between the number of independent
parameters in H and the number in H0 .
Drawback: Likelihood ratio test is constructed completely by intuitive argu-
ment. So, it may not satisfy all the properties that are satisfied by a test obtained
from N–P theory; it also may not be unbiased.

4.1.1 Some Selected Examples

Example 4.1 Let X be a binomial bðn; hÞ random variable. Find the size-a likeli-
hood ratio test for testing H : h  h0 against K : h [ h0
4.1 Introduction 105

Solution Here, H0 ¼ fh : 0  h  h0 g and H ¼ fh : 0  h  1g.


The likelihood ratio test statistic is given as

Sup pðxjhÞ Sup pðxjhÞ


h2H h  h0
L ð xÞ ¼ ¼
Sup pðxjhÞ Sup pðxjhÞ
H[K H
 
n x
Sup h ð1  hÞnx
h  h0 x
¼  
n x
Sup h ð1  hÞnx
H x
_
The MLE of h for h 2 H is h ¼ nx.
For, h 2 H0 , we have

_ x x
hH ¼ if  h0
n n
x
¼ h0 if [ h0
n

Thus, we have
     
n n x x
nx x nx
Sup h ð 1  hÞ ¼
x
1
H x x n n
8 
> n    nx
  >
> x x
1  nx if x
 h0
n x < x n n
nx
Sup h ð 1  hÞ ¼  
h  h0 x >
> n x
>
: h ð1  h0 Þnx if x
[ h0
x 0 n

(
1 if x
n  h0
So, Lð xÞ ¼ hx0 ð1h0 Þnx
if x
[ h0
ðnxÞ ð1nxÞ
x nx
n

It can be observed that Lð xÞ  1 when x [ nh0 and Lð xÞ ¼ 1 when x  nh0 . This


shows that Lð xÞ is the decreasing function of x. Thus, Lð xÞ\C iff x [ C 0 and the
likelihood ratio test rejects H0 if x [ C 0 where C0 is obtained as
Ph0 ðX [ C 0 Þ ¼ a. Since X is a discrete random variable, C 0 is obtained such that

Ph0 ðX [ C 0 Þ  a\Ph0 ðX [ C0  1Þ:

Example 4.2 Let X1 ; . . .; Xn be a random sample from a normal distribution with


mean l and variance r2 . Find out the likelihood ratio test of
(a) H : l ¼ l0 aganist K : l 6¼ l0 when r2 is known.
(b) H : l ¼ l0 aganist K : l 6¼ l0 when r2 is unknown.
106 4 Likelihood Ratio Test

Solution
(a) Here, H0 ¼ fl0 g; H ¼ fl : 1\l\1g
P
n
 n=2 2r2 1
ðXi lÞ2
pðxjhÞ ¼ pðxjlÞ ¼ ð2pÞn=2 r2 e i¼1

The likelihood ratio test statistic is given as

Sup pðxjlÞ Sup pðxjlÞ


H H0
Lð xÞ ¼ ¼
Sup pðxjlÞ Sup pðxjlÞ
H[K H

The maximum likelihood estimate of l for l 2 H is x.


P
n
 1 ðXi l0 Þ2
n=2 2 n=2 2r2 i¼1
So, Sup pðxjlÞ ¼ ð2pÞ ðr Þ e
H0
P
n
 1
ðXi xÞ2
n=2
and Sup pðxjlÞ ¼ ð2pÞn=2 ðr2 Þ
2r2
e i¼1 :
H
This gives
Pn
ðXi l0 Þ2
e2r2
1
i¼1 2
¼ e2r2 ðxl0 Þ
n
L ð xÞ ¼ Pn
 1
ðXi xÞ2
e 2r2 i¼1

The rejection region Lð xÞ\C gives


n
 2 ðx  l0 Þ2 \C1
2r
pffiffiffi
nðx  l0 Þ
or; [ C2
r

Thus, the likelihood ratio test is given as


( pffiffi

1 if nðxrl0 Þ [ C2
/ðxÞ ¼
0 Otherwise

where the constant C2 is obtained by the size condition

El0 ½/ðxÞ ¼ a

pffiffiffi
nðx  l0 Þ

or; P l0
r [ C2 ¼ a
4.1 Introduction 107

pffiffi
Now, under H : l ¼ l0 , the statistic nðxrl0 Þ follows N ð0; 1Þ distribution. Since
the distribution is symmetrical about 0, C2 must be the upper a=2—point of the
distribution. Finally, the test is given as
( pffiffi

1 if nðxrl0 Þ [ sa=2
/ðxÞ ¼
0 Otherwise

(b) Here, H0 ¼ ðl; r2 Þ : l ¼ l0 ; r2 [ 0
 
H¼ l; r2 : 1\l\1; r2 [ 0

In this case,
  Pn
  1 n=2 2r2 ðXi lÞ
1 2

Sup p xjl; r 2
¼ Sup e i¼1

H0 H0 2pr2

MLE of r2 given l ¼ l0 is given as


1X n
s20 ¼ ð X i  l0 Þ 2 :
n i¼1

 n=2
e 2 .
n
This gives Sup pðxjl0 ; r2 Þ ¼ 1
2ps02
H0
 n=2 Pn
ðXi lÞ2
e2r2
1
Further, Sup pðxjl; r2 Þ ¼ Sup pðxjl; r2 Þ ¼ Sup p 1
2pr2
i¼1
H l;r2 l;r2
Pn
The MLE of l and r are given as x and2 1
i¼1
 Þ2 ¼ ðn1Þ s2 ; where
ðXi  X
Pn n n
s ¼ n1  2
i¼1 ðXi  X Þ . We have,
2 1

!n=2

 1
e 2
n
Sup p xjl; r2 ¼
l;r2 2p ðn1Þ 2
n s

Hence, likelihood ratio test statistic is given as


 n2
e2
1  n n=2
2ps20 ðn  1Þs2
Lð xÞ ¼  n2 ¼
ns20
e2
1 n
ðn1Þ 2
2p n s
!n=2 " #n=2
ðn  1Þs2 ðn  1Þs2
¼ Pn ¼ Pn
i¼1 ðX i  l 0 Þ2 nðx  l0 Þ þ
2  2
i¼1 ðXi  X Þ
" #n=2
ðn  1Þs2
¼
nðx  l0 Þ2 þ ðn  1Þs2
108 4 Likelihood Ratio Test

The LR critical region is given as

Lð xÞ\C
nðx  l0 Þ2
or; 2
[ C1
pffiffiffi s
nðx  l0 Þ
or; [ C2 :
s

The likelihood ratio test is given as


( pffiffi

1 if nðxsl0 Þ
/ðx; sÞ ¼ [ C2 ;
0 otherwise

where the constant C2 is obtained by the size condition



pffiffiffi
nðx  l0 Þ
Pl0 [ C2 ¼ a:

s
pffiffi
Now under H : l ¼ l0 , the statistic nðxsl0 Þ is distributed as, t with (n − 1) d.f
which is symmetric about 0. Hence, C2 ¼ ta2;n1 . Finally, the test is given as

( pffiffi

1 if nðxsl0 Þ
/ðx; sÞ ¼ [ ta2;n1
0 otherwise

Example 4.3 X1 ; X2 . . .Xn are i.i.d N ðl; r2 Þ. Derive LR test for H : l  l0 against
K : l [ l0 .
Answer h ¼ ðl; r2 Þ; H ¼ h : 1\l\1; r2 [ 0
P
n
    n 2r2
1
ðXi lÞ2
Here, pðx=hÞ ¼ p x l; r2 ¼ ð2pÞ2 ðr2 Þ 2 e
n
1

Likelihood ratio criterion


     
Sup x l; r2 Sup x l; r2
H l  l0
Lð xÞ ¼ ¼ ½H [ K ¼ H ¼ ðl  l0 Þ[ðl [ l0 Þ
Supðx=l; r2 Þ Supðx=l; r2 Þ
H[K H
  
p x l ^2H
^H ; r
¼
pðx=^ ^2 Þ
l; r
 
where l ^2H : MLE ðl; r2 Þ under H
^H ; r
^; r
ðl ^2 Þ: MLE of ðl; r2 Þ under H [ K (in the unrestricted case).
4.1 Introduction 109

For ðl; r2 Þ 2 H [ K, i.e. H, we have

_ _2 1Xn
ð n  1Þ 2
l ¼ x; r ¼ ðxi  xÞ2 ¼ s
n 1 n

For ðl; r2 Þ 2 H, we have

_ n1 2
lH ¼ x if x  l0 and r
^2H ¼ s if x  l0
n
1X n
¼ l0 if x [ l0 and r
^2 H ¼ s20 ¼ ðxi  l0 Þ2 if x [ l0 :
n 1
 .  n  n
_ 2 2 n
^2 ¼ ð2pÞ 2 n1
Thus, we have p x l; r n s e2

 .   n2
_ n2 n  1 2 n
^ H ¼ ð2pÞ
p x lH ; r 2
s e 2 if x  l0
n
n n n
¼ ð2pÞ2 s20 2 e2 if x [ l0

So, Lð xÞ ¼ 1 if x  l0

n1 2 n=2
n s
¼ ; if x [ l0
s20

Hence, we reject H if Lð xÞ\C; where Cð\1Þ is such that

PfLð xÞ\C=l ¼ l0 g ¼ a 2 ð0; 1Þ ð4:1Þ


n1 2 n=2
s
, n2 \C and x [ l0
s0
" #n=2
ðn  1Þs2
, \C and x [ l0
ðn  1Þs2 þ nðx  l0 Þ2
nðx  l0 Þ2
, 1þ [ C 0 and x [ l0
ðn  1Þs2
pffiffiffi
nðx  l0 Þ
, [ C00 ð4:2Þ
s
npffiffi 00
o
Thus, (4.1) is , P nðxsl0 Þ [ C =l ¼ l0 ¼ a
110 4 Likelihood Ratio Test

8 9
>
> p ffiffi
ffi >
>
< nðx  l Þ 00
=
, P rP C
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi [ =l ¼ l
0
¼a
>
> n 0>
>
: ðx xÞ2
1 i ;
n1
00
) C ¼ ta;n1
pffiffi
Hence, reject H iff nðxsl0 Þ [ ta;n1
) Test can be carried out by using students’ t-statistic.
Example 4.4 X1 ; X2 . . .Xn are i.i.d N ðl; r2 Þ; 1\l\1; r2 [ 0: Find the LR test
for

I. H : r2 ¼ r20 against K : r2 [ r20


II. H : r2 ¼ r20 against K : r2 6¼ r20
Answer I. h ¼ ðl; r2 Þ
Pn
   n  12 ðxi lÞ2
p x=l; r2 ¼ Likelihood function ¼ ð2pÞn=2 r2
=2 2r
e i¼1

 n=2  1 ½P ðxi xÞ2 þ nðxlÞ2 


¼ ð2pÞn=2 r2 e 2r2

Likelihood ratio is:


     
Supl;r2 ¼r20 p x l; r2 p x l^H ; r20
Lð xÞ ¼ ¼ ;
Supl;r2  r20 pðx=l; r2 Þ ^2 Þ
l; r
pðx=^

^ = (MLE of l overall h : r2  r20 ) = x


where l
 n1
s2 ; if n1
n s  r0
2 2
^ ¼ MLE of r ¼
r 2 2 n
r0 ; if n s \r0
2 n1 2 2

^H = MLE of l under 8
l H = x
>
> 
ðn1ÞS2
< ðr20 Þn=2 e 2r20
Hence we get, Lð xÞ ¼ Þ 2 n=2 n
; if n1 2
n s  r20
>
> ððn1
n s Þ e 2
: 1; if n1 s2 \r2
n 0
4.1 Introduction 111

Now we apply LR technique: reject H iff

Lð xÞ\C: ð4:3Þ

where C (<1) is such that


PfLð xÞ\C=H g ¼ a 2 ð0; 1Þ ð4:4Þ
n1 2 n2 ðn1Þs2
s  2 ðn  1Þs2
ð 1Þ , n 2 e 2r0 \C 0 iff n
r0 r20

, u2 e2 \C 
n u
if u  n ð4:5Þ

Writing gðuÞ ¼ u2 e2 ; u  0


n u

n
n n u2 u
g0 ðuÞ ¼ u21 e2  e2 ¼ 0
u

2 2
)nu¼0,u¼n

The curve y = g(u) has a maximum and the shape of the curve is

g(u)=c'

u-1 u= n u0

From the graph, it is clear that gðuÞ\C  , u\u1 or u [ u0 where


0\u1 \n\u0 :
Hence, (4.5) , u [ u0 and (4.4) , PH ðU [ u0 Þ ¼ a


2 As under H;
, P vn1 [ u0 ¼ a
U  v2n1
) u0 ¼ v2n1;a
P
Thus, LR test is: reject H if ni¼1 ðxi  xÞ2 [ r20 v2n1;a :
ðn1Þs2

Supl;r2 ¼r2 pðx=l;r2 Þ ðr2 Þn=2 e 2r20
¼ K u2 e  2
n u
II. Lð xÞ ¼ Sup 0pðx=l;r2 Þ ¼ n1 0
n=2
l;r2 ð n s2 Þ en=2 n n U o
Our test is to reject H if u2 e2 \C0 ; where C 0 is such that PH U 2 e 2 \C0 ¼ a:
n u

From the graph, we observe that the line y = C′ cuts the curve y = g(u) at two
points: u1 and u0 such that 0\ u1 \n\ u0 . Hence, the test is equivalent to reject
H iff u\u1 or u [ u0 ; where
   
PH v2n1 \u1 þ PH v2n1 [ u0 ¼ a:
112 4 Likelihood Ratio Test

Although v2 is not symmetric, for the matter of simplicity, equal error proba-
bilities a=2 are attached to both left-and right-sided critical regions. Thus, u1 ¼
v2n1;1a and u0 ¼ v2n1;a .
2 2

Example
 4.5 Let X1 ; X2 ; . . .Xn1 and Y1 ; Y2 ; . . .Yn2 be two independent samples drawn
from N l1 ; r21 and N l2 ; r22 ; respectively. Find out the likelihood ratio test of
(a) H : r21 ¼ r22 against K : r21 [ r22
(b) H : r21 ¼ r22 against K : r21 6¼ r22 ; when l1 and l2 are unknown
 
(a) Here, h ¼ l1 ; l2 ; r21 ; r22
 
H0 ¼ l1 ; l2 ; r21 ; r22 : 1\l1 ; l2 \1; r21 ¼ r22 ¼ r2 [ 0
 
H ¼ l1 ; l2 ; r21 ; r22 : 1\li \1; r2i [ 0; i ¼ 1; 2
P
n1
2
P
n2
2
n1 þ n2
1
 1
ðXi l1 Þ  ðyi l2 Þ
pðx; yjhÞ ¼ ð2pÞ rn 1 n2 2r2 2r2
1 r2 e
2 1 i¼1 2 i¼1

 _ 
Sup pðx; yjhÞ
p x; y hH
h2H
Lðx; yÞ ¼ ¼  _
Sup pðx; yjhÞ
h2H[K
p x; y h
_
where hH = MLE of h under H
_
h ¼ MLE of h under H [ K:

Under H, we obtain MLEs


P P
_ _ _2 ðxi  xÞ2 þ ðyi  yÞ2
l1H ¼ x; l2H ¼ y; rH ¼
n 1 þ n2

Under H [ K, MLEs are


8 P
>
>
_ _ _2
l1 ¼ x; l2 ¼ y; r1 ¼ n11 ðxi  xÞ2
>
>
>
< _2 P _2
r
r2 ¼ n12 ðyi  yÞ2 if _12  1
> r2
>
>
>
> _ _ _2
_2
r1
: l1 ¼ x; l2 ¼ y; rH if _2 \1
r2

 _
 n1 þ n2  n1 þ2 n2 n þ n
_ 2
Hence, p x; yjhH ¼ ð2pÞ 2 rH e 2
1 2

8
n þn  n21  2 n22 n þ n
>
> ð Þ 1 2 2 _2
r
_
r2 e 2
1 2
_2
r1
1
 _
 < 2p 1 if _2
r2
and p x; yjh ¼ n1 þ n2  2   1 2 2
n þn
>
> n1 þ n2 _2
r1
: ð2pÞ 2 r _
H e 2 if _2
\1
r2
4.1 Introduction 113
8
> _2 n21 _2 n22
>
> r1 r2 _2
r1
>
<   n1 þ n2 if 1
_2
_2 r2
Therefore, Lðx; yÞ ¼
2
rH
>
>
>
>
_2
r1
: 1 if _2
\1
r2

8 nP on1=2 nP on2=2
>
> ðxi xÞ2 ðyi yÞ2 P
>
> ðxi  xÞ2=
>
>
n1 n2
P n1  1
> if
< nP
> 2
P
ðxi xÞ þ
2 oðn1 þ n2 Þ=2
ðyi yÞ ðyi  yÞ2=
¼ n1 þ n2
n2
>
> P
>
> ðxi  xÞ2=
>
> n1 \1
>
> P
:1 if
ðyi  yÞ2=
n2
8
> P n1=2
>
> 2
P
>
> P ðxi xÞ2 ðxi  xÞ2=
>
>
n1 þ n2
ðyi yÞ
> ðn1 þ n2 Þ 2
> if P n1  1
>
< n 1= n 2=
n 2 n 2  P n1 þ2 n2 ðyi  yÞ2=
ðxi xÞ2
¼ 1þP n2
1 2

>
> ðyi yÞ2
P
>
>
>
> ðxi  xÞ2=
>
> n1 \1
>
> 1 if P
: ðyi  yÞ2=
n2

Now, under the null hypothesis H : r21 ¼ r22 ¼ r2 ; consider the statistic
P .
ðxi  xÞ2
ðn1  1Þ s21
F¼P 2.
¼ 2  Fn1 1;n2 1 :
ðyi  yÞ s2
ð n2  1Þ

On writing Lð xÞ in terms of F, we have


8
>  n1=2
>
>
>
> ðn þ n Þ=2 n1 1
F
< ð n1 þ n2 Þ 1 2

n2 1
if F n1 ðn2 1Þ
Lðx; yÞ ¼ n 1=2 n 2=2 ðn1 þ n2 Þ=2 n2 ðn1 1Þ
n1 n2
>
> 1 þ
n1 1
F
>
>
n2 1
>
:1 if F \ nn12 ððnn21 1Þ
1Þ

Now we apply LR technique: reject

H iff L ðx; yÞ\C ð4:6Þ

where C(<1) is such that


114 4 Likelihood Ratio Test

PfLðx; yÞ\CjH g ¼ a 2 ð0; 1Þ: ð4:7Þ


 n1 =2
n1 1
n2 1 F n1 ð n2  1Þ
ð4:6Þ )  ðn1 þ n2 Þ=2 \C1 if F ;
n1 1
n2 ð n1  1Þ
1þ n2 1 F

2  n1 =2 3
n1 1
6 n2 1F n1 ðn2 1Þ7
where C1 is such that P4 ðn1 þ n2 Þ=2 \C1 and F n2 ðn1 1Þ5 ¼ a:
n 1
1 þ n1 1F
 n1=2
2

n1 1
n2 1F
Writing gðF Þ ¼  ðn1 þ n2 Þ=2
n 1
1 þ n1 1F
2

 n21 1  n1 þ2 n2  n21


0 n1 n1  1 n1  1 n1  1 n1  1 n1 þ n2
) g ðF Þ ¼ F 1þ F  F
2 n2  1 n2  1 n2  1 n2  1 2
 n1 þ2 n2 1
n1  1 n1  1
1þ F ¼0
n2  1 n2  1
   
n1  1 n1  1
) n1 1 þ F  ðn 1 þ n 2 Þ F¼0
n2  1 n2  1
n1 ðn2  1Þ
)F¼
n2 ðn1  1Þ

The curve gðF Þ has single maximum at F ¼ nn12 ððnn21 1Þ


1Þ and the shape of the curve is

From the graph, we observe that gðF Þ ¼ C1 and F  nn12 ððnn21 1


1Þ
Þ
 
) F [ d0 [ nn12 ððnn12 1 Þ
1Þ . The constant d0 is obtained by the size condition

Pr21 ¼r22 fF [ d0 g ¼ a
4.1 Introduction 115

This gives d0 ¼ Fn1 1;n2 1;a


(
s21
1 if [ Fn1 1;n2 1;a
) LR test is given by /ðx; yÞ ¼ s22
0 otherwise

(b) Similarly, for testing H : r21 ¼ r22 against K : r21 6¼ r22 the LR test is equivalent
to

s21 s21
\d 1 or [ d0 :
s22 s22

These constants d 1 and d0 are obtained by the size condition

PH fF\d 1g ¼ PH fF [ d0 g ¼ a=2

This gives d 1 ¼ Fn1 1;n2 1;1a2 and d0 ¼ Fn1 1;n2 1;a=2 : The LR test is, there-
fore, given as
8
>
>
s21
\Fn1 1;n2 1;1a=2
<1 if s22
/ðx; yÞ ¼ s21
[ Fn1 1;n2 1;a=2
>
>
or s22
:
0 otherwise

Example 4.6 Let X1; X2 ; . . .; Xn1 and


 Y1 ; Y2 ; . . .Yn2 be two independent samples
drawn from N l1 ; r1 and N l2 ; r2 ; respectively. Obtain the likelihood ratio test of
2 2

(a) H : l1 ¼ l2 against K : l1 6¼ l2 when r21 and r22 are known


(b) H : l1 ¼ l2 against K : l1 6¼ l2 when r21 ¼ r22 ¼ r2 but unknown
(c) H : l1  l2 against K : l1 \l2 when r21 ¼ r22 ¼ r2 but unknown

Solution
 
(a) Here, h ¼ l1 ; l2 ; r21 ; r22

H0 ¼ fðl1 ¼ l2 ¼ lÞ; 1\l\1g


H ¼ fðl1 ; l2 Þ; 1\li \1; i ¼ 1; 2g

P n1 P n2
n1 þ n2  1
ðXi l1 Þ2  1
ðyi l2 Þ2

rn1 n2 2r2 2r2
pðx; yjhÞ ¼ ð2pÞ 2
1 r2 e
1 i¼1 2 i¼1

ML estimators of l1 and l2 under H are given as


_ _
l1 ¼ x and l2 ¼ y
116 4 Likelihood Ratio Test

P
n1 n2 P
n1 þ n2  1
ðxi xÞ2  1
ðyi yÞ2

rn1 n2 2r2 2r2
Sup pðx; yjhÞ ¼ ð2pÞ 2
1 r2 e
1 i¼1 2 i¼1

ðl1 ;l2 Þ2H

Under l 2 H0 ;

P
n1 P n2
n1 þ n2  1
ðXi lÞ2  1
ðyi lÞ2

rn 1 n2 2r2 2r2
pðx; yjlÞ ¼ ð2pÞ 2
1 r2 e
1 i¼1 2 i¼1

On taking log, we get

1 X 1 X
log pðx; yjlÞ ¼ k  2
ð x i  lÞ 2  2 ð y i  lÞ 2 ;
2r1 2r2

where k is a constant which is independent of l.


The ML estimator for l is obtained as

d
log pðx; yjlÞ ¼ 0
dl
1X n1
1X
) 2 ð xi  l Þ þ 2 ð y i  lÞ ¼ 0
r1 1 r2
 
1 1 n1 n2
) 2 n1x þ 2 n2y ¼ þ l
r1 r2 r21 r22
n1 x n2 y r21 r22
þ
_ r21 r22 n1 yþ n2 x
) lH ¼ ¼ :
n1
þ n2 r21 r22
r21 r22 n1 þ n2

This gives

P
n1 _ 2 Pn2 _ 2
n þn  1
xi l H  1
yi l H
12 2
rn 1 n2
2 2
pðx; yjhÞ ¼ ð2pÞ 1 r2 e
2r 2r
Sup 1 i¼1 2 i¼1

ðl1 ;l2 Þ2H0

LR test L(x, y) is given as



 2
 2
P _ P _
 1
ðxi xÞ2 þ n1 xl H  1
ðyi yÞ2 þ n2 yl H
2r2 2r2
e 1 2

Lðx; yÞ ¼ P P
 1
ðxi xÞ2  1
ðyi yÞ2
2r2 2r2
e 1 2
 _ 2  _ 2
n1 n2
 xl H  yl H
¼e 2r2
1
2r2
2
4.1 Introduction 117

"r2 #2
 2
n1 ð
1 x yÞ
_
Now, x  lH ¼ r2 r2
n1 þ n2
1 2

22 32
 2 r2
ð 
y  
x Þ
y  lH ¼ 4n2r2 r2 5
_

n1 þ n2
1 2

 2 1
n1 _ n2 _ 2 r1 r22
2 2
) 2 ðx  lH Þ þ 2 ðy  lH Þ ¼ ðx  yÞ þ
r1 r2 n1 n2
2 1
r r2
12 1 þ n2 ðxyÞ2
Thus, Lðx; yÞ ¼ e 2n1

The rejection region Lðx; yÞ\C gives


 1
1 r21 r22
 þ ðx  yÞ2 \C1
2 n1 n2
0 1
ðx  
y Þ 2
or; @ r2 r2 A [ C2
n þ n2
1 2

1

x  y

or; qffiffiffiffiffiffiffiffiffiffiffiffiffiffi [ C3
r21 r22
n1 þ n2

We know that under H : l1 ¼ l2 ; qxffiffiffiffiffiffiffiffiffi


y ffi
2 2
 N(0, 1). Hence, the likelihood ratio
r r
1
n1 þ n2
2

test has critical region


8 9
> >
< x  y =

q
x ¼ ðx; yÞ : 2 ffiffiffiffiffiffiffiffiffiffiffiffiffiffi [ C3 ;
>
: r1 r22 >
;
n1 þ n2

where C3 is such that


2 3

x  y
6 7
PH 4 qffiffiffiffiffiffiffiffiffiffiffiffiffiffi [ C3 5 ¼ a
r21 r22
n1 þ n2

This gives C3 ¼ sa2 :


118 4 Likelihood Ratio Test

Finally, the LR test is given as


8
<1 if qjxffiffiffiffiffiffiffiffiffi
yj
ffi [ sa2
r 2 2r
/ðx; yÞ ¼ 1 þ n2
: n1 2
0 Otherwise
 
(b) Here h ¼ l1 ; l2 ; r21 ; r22
 
H0 ¼ l; r2 ; 1\l\1; r2 [ 0
 
H¼ l1 ; l2 ; r2 ; 1\li \1; r2 [ 0; i ¼ 1; 2

For ðl1 ; l2 ; r2 Þ 2 H;

n
 n1 þ2 n2 P 1 P
n2

 
2 2
 12 ðxi l1 Þ þ ðyi l2 Þ
1 2r
p x; yjl1 ; l2 ; r2 ¼ e 1 1
2pr2

On taking log, we get


" #
n1 þ n2   1 X n1
2
X
n2
2
log p ¼  log 2pr  2
2
ð x i  l1 Þ þ ð y i  l2 Þ
2 2r 1 1

The ML estimators for ðl1 ; l2 ; r2 Þ 2 H are given as

d log p _
¼ 0 ) l1 ¼ x
dl1
d log p _
¼ 0 ) l2 ¼ y
dl2
P P
d log p _2 ðxi  xÞ2 þ ðyi  yÞ2
¼ 0 ) r ¼
dr2 n1 þ n2
 n1 þ2 n2  ðn1 þ n2 Þ
  1 _2
e2ðn1 þ n2 Þ
2 1
) Sup p x; yjl1 ; l2 ; r ¼ 2
r
ðl1 ;l2 ;r Þ2H
2 2p

For ðl; r2 Þ 2 H0 ;


 n1 þ2 n2 P
n1 P
n2

 
2 2
 12 ðxi lÞ þ ðyi lÞ
1 2r
p x; yjl; r2 ¼ e 1 1
2pr2
" #
n1 þ n2   1 X n1
2
X
n2
2
) log p ¼  log 2pr  2
2
ð x i  lÞ þ ð y i  lÞ
2 2r 1 1
4.1 Introduction 119

Now,

d log p _ n1x þ n2y


¼ 0 ) lH ¼
dl n1 þ n2
" #
Xn1  2 Xn2  2
d log p _2 1 _ _
¼ 0 ) rH ¼ xi  lH þ yi  lH
dr2 n1 þ n2 1 1
" #
1 Xn1  2 X n2  2
2 _ 2 _
¼ ðxi  xÞ þ n1 x  lH þ ðyi  yÞ þ n2 y  lH
n1 þ n2 1 1

 2  2 n o2
_ n1 x þ n2 y n2 ðx  yÞ
Here, x  lH ¼ x  n1 þ n2 ¼ n1 þ n2

 2   
n1x þ n2y 2

n1 ðy  xÞ 2
_
y  lH ¼ y  ¼
n1 þ n2 n1 þ n2

This gives
"     #
  1 X
n1
n2 ðx  yÞ 2 X n2
n1 ðy  xÞ 2
_2 2 2
rH ¼ ðxi  xÞ þ n1 þ ðyi  yÞ þ n2
n1 þ n2 1 n1 þ n2 1
n1 þ n2
" #
1 X
n1 X
n2
n1 n2
¼ ðxi  xÞ2 þ ðyi  yÞ2 þ ðx  yÞ2
n1 þ n2 1 1
n 1 þ n2

 1 n1 þ2 n2 _ 2 
ðn1 þ n2 Þ

e2ðn1 þ n2 Þ
2 1
Therefore, Sup pðx; yjl; r Þ ¼ 2p 2
rH
ðl;r2 Þ2H0
Hence we get,

Sup pðx; yjl; r2 Þ


ðl;r2 Þ2H0
Lðx; yÞ ¼
Sup pðx; yjl1 ; l2 ; r2 Þ
ðl1 ;l2 ;r2 Þ2H

_2
!n1 þ2 n2 Pn1 P
r ðxi  xÞ2 þ n12 ðyi  yÞ2
¼ _2
¼ Pn 1 P 1

rH xÞ2 þ n12 ðyi  yÞ2 þ n1n1þn2n2 ðx  yÞ2


1 ð xi  

2 3
6 7
6 7
6 1 7
¼6 7
6 n n ðx  yÞ2 7
41 þ nP 1 2 o5
n1 P
ðn1 þ n2 Þ 1 ðxi   xÞ2 þ n12 ðyi  yÞ2
120 4 Likelihood Ratio Test

2 3
6 7
6 1 7
¼6
6
7
7
ðxyÞ
P
2
P n2
41 þ 5
ðn11 þ n12 Þðn1 þ n2 2Þf ðyi yÞ2 g
n1
1
ðxi xÞ2 þ 1
ðn1 þ n2 2Þ

   
  N l1 ; r2 and Y  N l2 ; r2
We know, X n1 n2

  
  Y  N l1  l2 ; r2 1 þ 1
X
n1 n2
 ðl1 l2 Þ
Thus, X ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
rY
   N ð0; 1Þ
r 1
n1 þ n1
2
P
Again, 1
ðXi  X Þ2  v2n 1
r2
P 1

and r12 ðYi  Y Þ  v2n2 1 :


2
hP P i
Therefore, r12 ðX i  X Þ2 þ ðYi  Y Þ  v2n1 þ n2 2
2

Thus, under H : l1 ¼ l2 ;
. ffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ðX  Y Þ q
n1 þ n2
r 1 1

t ¼ rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
hP i ffi  tn1 þ n2 2 ;
P
1
ðX i  X Þ þ 2  2
ðYi  Y Þ n1 þ n2 2 1
r2

ðx  yÞ
or; t ¼ rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
  ffi  n1 þ n2  2
n1 þ n2 s
1 1 2

P P
ðxi  xÞ2 þ ðyi  yÞ2 ðn1  1Þs21 þ ðn2  1Þs22
where s ¼
2
n1 þ n2 2 ¼ n1 þ n2  2
So, Lðx; yÞ ¼ 1
2
1 þ n þt n 2
1 2
Thus, the rejection region Lðx; yÞ\C gives

t2
1þ [ C1
n1 þ n2  2
or; t2 [ C2
or; jtj [ C3

Therefore, the LR test is given as jtj [ C3 ;where C3 is obtained as


PH ½jtj [ C3  ¼ a:
4.1 Introduction 121

This gives C3 ¼ tn1 þ n2 2; a=2 : Hence, LR test is given as


8 jx  yj
>
<1 if rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
  [ tn1 þ n2  2;a=2
/ðx; yÞ ¼
s
1
þ 1
>
:
n1 n2

0 Otherwise

(c) Proceeding similarly as in (b), for testing H : l1  l2 against K : l1 \l2 the


LR test is given as

8 x  y
>
<1 if rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
  \  tn1 þ n2  2;a
s
/ðx; yÞ ¼ 1
þ 1

>
n1 n2
:
0 Otherwise

Example 4.7 Suppose xij  N ðli ; r2 Þ; j ¼ 1ð1Þni ; i ¼ 1ð1Þk independently. This is


one-way classified data.
We are to find LR test for H : l1 ¼ l2 ¼ ¼ lk against K: l0i are not all equal.
Answer Here, h ¼ ðl1 ; l2 ; . . .lk ; r2 Þ and H ¼ fh : 1\li \1; i ¼ 1ð1Þk;
r [ 0g: Observe that H[K ¼ H: Likelihood functions ¼ pð x=hÞ ¼
2
P
k Pni
ðxij li Þ
2
n=2 n
 12 P
ð2pÞ r e ; n ¼ k1 ni :
2r
1 1
 
Sup pðx=hÞ p x=h
H
Likelihood ratio is Lð xÞ ¼ h2H x ¼ ;
Sup pð =hÞ pðx=hÞ
h2H

where h and hH are, respectively, the MLEs of hð2 HÞ and hð2 H Þ:


P
ni
Now, h 2 H) MLEs are li ¼ xi ¼ n1i xij
1


 
@ log p x=h
as ¼ 0 ) li ¼ xi
@li
1X k X ni
 2 within S:S: W
r2 ¼ xij  xi ¼ ¼ :ðsayÞ:
n 1 1 n n
"   #
@ log p x=h 1 X
k X ni
  2
Here ¼ 0 ) r2 ¼ xij  xi
@r2 n 1 1

Again h 2 H ) h ¼ ðl; r2 Þ; where l is the common value of li ’s.


122 4 Likelihood Ratio Test

Here, we have MLEs

1Xk X
ni
1Xk
lH ¼ xij ¼ nixi ¼ x; ðsayÞ
n 1 1 n 1
1X k Xni
 2 Total S:S: T
r2H ¼ xij  x ¼ ¼
n 1 1 n n
" #
1 X k  2 W þB
¼ Wþ ni xi  x ¼
n 1
n
" #
Xk  2
B¼ ni xi  x ¼ BetweenðmeansÞ S:S :
1

 . 
Hence, we get p x ^h ¼ ð2pÞn=2 ðrÞn e2 :
n

 . 
p x ^hH ¼ ð2pÞn=2 ðr
^H Þn e2
n

 2 n=2
^
r
So, Lð xÞ ¼ ^2H
r
and therefore reject H iff Lð xÞ\c; PH fLð xÞ\cg ¼
a 2 ð0; 1Þð0 \ c \ 1Þ

^2H
r W þB B
, [ c0 , [ c0 , [ c00
r 2 W W
B=
, T  ¼ k  1 [ c000
W=
nk

The size condition now reduces to PH fT  [ c000 g ¼ a under H.


Under H, T   F ðk  1; n  kÞ

)c000 ¼ Fa;ðk1;nkÞ

B=
So, our LR test is W k  1 [ Fa;ðk1;nkÞ as rejection of H.
=n  k
Note It is the same as ANOVA test.
Special case: (i) l ¼ l0 (given)
 n=2 "P P  2 #n=2
k ni
^2
r 1 xij   xi
Lð xÞ ¼ ¼ Pk Pn 
1
2
^H 2
r
1 xij  l0
i
1
" #n=2
W
¼ P
W þ k1 ni ðxi  l0 Þ2
4.1 Introduction 123

Test reduces to reject H iff


Pk

1
ni ðxi  l0 Þ2
T ¼ k 1
[ Fa;ðk;nkÞ
W=
ðn  kÞ

(ii) Common value is unknown ðlÞ but r2 ¼ r20 is known.


 ni 

P
k P 2 n o
exp  2r1 2 xij  x exp  2r1 2 ðB þ WÞ
0 1
Lð xÞ ¼  1
¼ n0 o
P ni 
k P 2
exp  2r2
1
xij  xi exp  2rW
2
0
0
1 1
 
B
¼ exp  2
2r0

Hence LR test is: reject H iff

B
[ v2a;k1 :
r20

(iii) If min ðn1 ; n2 ; . . .; nk Þ is large, we can approximate the distribution of


2 log Lð xÞ by v2 —distribution with d.f. k − 1.
Note The above hypothesis is equivalent to homogeneity of k univariate normal
population.
Example 4.8
 
Suppose xij  N li ; r2i ; j ¼ 1ð1Þni ; i ¼ 1ð1Þk independently:

Obtain the likelihood ratio test of H : r21 ¼ r22 ¼ r23 ¼ ¼ r2k against K: not
all ri ’s are equal. 
Answer h ¼ l1 ; l2 ; . . .; lk ; r21 ; r22 ; . . .; r2k


H ¼ h : 1\li \1; r2i [ 0; i ¼ 1ð1Þk
8 9
<  12 P ðxij li Þ2 =

ni

  ð2pÞ
n =2 Q P
k
Likelihood function = p x=h ¼ Q k
; n¼
2r
ni e i ni
k
ð r2 2
i Þ
i¼1
: ;
i¼1
124 4 Likelihood Ratio Test


@ log pðx=hÞ 2
Pn i  
Now, @li ¼0) 2r2i j¼1 ^i ¼ xi
xij  li ð1Þ ¼ 0 ) l
and
 
@ log p x=h ni 1 X ni
 2
) þ xij  li ¼ 0
@ri2 2
2ri 4
2ri j¼1
1X ni
  2 ni  1 2
^2i ¼
)r xij  xi ¼ s
ni j¼1 ni i
. Q  2 n2i
Hence, for h 2 H we get p x ^h ¼ ð2pÞn=2 ki¼1 r ^i
 
 
x
¼ Sup p =h
h2H

 
Under H, p x=h reduces to

P
k P
ni

  ðxij li Þ
2
 12
p x=h ¼ ð2pÞn=2 rn e ;
2r

P ni 
k P 2 P
k
_ _2
where from we get liH ¼ xi and rH ¼ 1n xij  xi ¼ 1n ðni  1Þs2i
  .   2 n=2 n=2
Hence, Sup p x=h ¼ p x ^hH ¼ ð2pÞn=2 r ^H e
h2H
Hence, likelihood ratio is

Qk nðni 1Þs2i o 2i
n
 n=2
r^2H i¼1 ni
Lð xÞ ¼ Qk ¼h in=2 :
ni
^2i Þ 2 Pk
i¼1 ðr i¼1 ðni  1Þsi
1 2
n

The distribution of the statistic obtained in Lð xÞ is difficult to calculate.


Therefore, we could only say about its asymptotic distribution, i.e. 2 loge Lð xÞ:
Pk
ðni  1Þs2i X k
ð ni  1Þ 2
So; 2 loge Lð xÞ ¼ n loge i¼1
 ni loge si
n i¼1
ni

is distributed as v22kðk þ 1Þ¼k1 for large ni’s.


It has been suggested by Bartlett (1937) that the above approximation to
Chi-square for large n can be improved if we replace the ML estimators of r2 ’s by
unbiased estimators, i.e. if we replace ni by ni  1 and n by n − k in the above
expression. So,
4.1 Introduction 125

Pk
ðni  1Þs2i X k
2 loge Lð xÞ ¼ ðn  k Þ loge i¼1
 ðni  1Þ loge s2i
nk i¼1
X Pk
k
s 2
ðni  1Þs2i
¼ ðni  1Þ loge 2 ; where s ¼ Pi¼1
2
k
i¼1 ðni  1Þ
i¼1
si

Bartlett has also suggested that Chi-square approximation will hold goof for ni as
low as four or five if the above statistic is divided by t; where
" #
1 Xk
1 1
t ¼ 1þ  Pk
3ðk  1Þ i¼1 ni  1 i¼1 ðni  1Þ

PK 2
ðni 1Þ loge s2
i¼1 s
Hence, T ¼ t
i
 v2k1
So, we reject H approximately at level a iff

T [ v2k1;a

It is noted that the rapidity of convergence for this statistic T towards v2 is


greater than that of 2 loge Lð xÞ:

Example 4.9 X1 ; X2 . . .Xn are i.i.d. with density r1 erðxlÞ


1
for x  l: Find the LR
test for testing
I. H : l ¼ l0 against K: l 6¼ l0
II. H : r ¼ r0 against K: r ¼6 r0

Solution
I. Likelihood function
P
n
  1 r
1
ðxi lÞ
p x=l; r ¼ n e i if xi [ l8i
r
¼ 0 otherwise

MLEs of l and r are given as


^ ¼ y1 ; y1 ¼ 1st order statistic
l
Pn
^ ¼ 1n ðyi  y1 Þ; y1 \y2 \yn are the order statistics of x1 ; x2 . . .; xn :
r
i¼1
Then
   
Sup p x=l; r ¼ p x=l ^Þn en
^ ¼ ðr
^; r
l;r
126 4 Likelihood Ratio Test
 
Under H, p x=l; r reduces to

P
n
  r1 ðxi l0 Þ
p x=l ; r ¼ rn e i
0

P
n Pn
^H ¼ 1n ðxi  l0 Þ ¼ 1n ðyi  l0 Þ
) MLE of r is r
  
i  i
x x
Then Sup p =l; r ¼ p =l ; r ¼ ðr^H Þn en
l;r2H 0 ^H

Then likelihood ratio is given as


 
Sup p x=l; r  n
l;r2H ^
r
Lð xÞ ¼   ¼
Sup p x=l; r ^H
r
l;r

Now, reject H iff


P
ð yi  y1 Þ
Lð xÞ\C , P \c
ð y i  l0 Þ
P
ð yi  y1 Þ n ð y 1  l0 Þ
,P \c , P [ c0
ð y i  y 1 Þ þ nð y 1  l 0 Þ ð yi  y1 Þ

Under

H : l ¼ l0
nðy1  l0 Þ=
P 2  F2;2n2 :
ðyi  y1 Þ=
2n  2

Hence, we reject H iff


n= ðy1  l Þ
T ¼P 2 0
[ Fa;2;2n2
ðyi  y1 Þ=
ð2n  2Þ

II. As earlier, it can be shown that the test has acceptance region as

1X n
c1 \T ¼ ðyi  y1 Þ\c2 ;
r0 i
4.1 Introduction 127

where c1 and c2 are determined that



PH c1 \v22n2 \c2 ¼ 1  a

PH c1 \v22n \c2 ¼ 1  a

Example 4.10 Let ðX11 ; X21 Þ; ðX12 ; X22 Þ. . .; ðX1n ; X2n Þ be a random sample from a
bivariate normal distribution with means l1 and l2 , variances r21 and r22 and cor-
relation coefficient q. Find the likelihood ratio test of
H : q ¼ 0 against K : q 6¼ 0
Solution
 
Here, h ¼ l1 ; l2 ; r21 ; r22 ; q
 
H ¼ l1 ; l2 ; r21 ; r22 ; q : 1\li \1; r2i [ 0; 1\q\1; i ¼ 1; 2
 
H0 ¼ l1 ; l2 ; r21 ; r22 ; q : 1\li \1; r2i [ 0; q ¼ 0; i ¼ 1; 2
In H, ML estimators for l1 ; l2 ; r21 ; r22 and q are
_ _ _2 P _2 P _
l1 ¼ x1 ; l2 ¼ x2 ; r1 ¼ 1n ðx1i  x1 Þ2 ; r2 ¼ 1n ðx2i  x2 Þ2 : and q¼
P
ðx1i x1 Þðx2i x2 Þ
P P 1=2 ¼ r.
f ðx1i x1 Þ2 ðx2i x2 Þ2 g
Thus,
h 2 2 i
 pffiffiffiffiffiffiffiffiffiffiffiffiffin 2 1r
1
r
n^ r
n^
1 þ 2 2r:n:r

e ð Þ r^1 r^2
_ _
Sup pð xjhÞ ¼ 2pr1 r2 1  r 2
2 2 2

h2H
 pffiffiffiffiffiffiffiffiffiffiffiffiffin
en
_ _
¼ 2pr1 r2 1  r 2

In Ho ; ML estimators for l1 ; l2 ; r21 and r22 are

_ _ 1X _2 _2 1X
l1H ¼ x1 ; l2H ¼ x2 ; r1H ¼ðx1i  x1 Þ2 ; r2H ¼ ðx2i  x2 Þ2
n n
h i
 n 1 n^r21H þ n^r22H
_ _
Thus, Sup pð xjhÞ ¼ 2pr1H r2H
2 2 2
e r^1H r^2H
h2H

 n
en
_ _
¼ 2pr1H r2H

Hence, the LR is given as

Sup pð xjhÞ
h2H0  n=2
Lð xÞ ¼ ¼ 1  r2
Sup pð xjhÞ
h2H
128 4 Likelihood Ratio Test

The LR critical region is given as

Lð xÞ\C
 n=2
or; 1  r2 \C
or; 1  r 2 \C1
or; r 2 [ C2
or; jr j [ C3 ;

where C3 is obtained as

PH ½jr j [ C3  ¼ a

Thus, the test of H : q ¼ 0 against K : q 6¼ 0 is based on r, the distribution of the


sample correlation coefficient and its distribution for q ¼ 0 is symmetric about 0.
Thus, PH ½r\  C3  ¼ PH ½r [ C3  ¼ a=2
Equivalently, the critical region for H : q ¼ 0 against K : q 6¼ 0 is
pffiffiffiffiffiffiffiffiffiffiffi
jr j n  2
pffiffiffiffiffiffiffiffiffiffiffiffiffi [ k:
1  r2
pffiffiffiffiffiffi
Since rpffiffiffiffiffiffiffi
n2ffi
1r2
has the t-distribution with (n − 2) d.f. when q ¼ 0, the constant k is
given as

pffiffiffiffiffiffiffiffiffiffiffi
jr j n  2
PH pffiffiffiffiffiffiffiffiffiffiffiffiffi [ k ¼ a:
1  r2

This gives k ¼ tn2;a=2 :


Note For example, if n = 4 and a = 0.05 then

Zc3 Z1
1 1
dr ¼ dr ¼ 0:025
2 2
1 c3

gives C3 = 0.95. Hence, H is rejected at 5 % level of significance if r based on a


sample of size four is such that jr j [ 0:95:

Example 4.11 Let X1 ; X2 ; . . .; Xn be a random sample form density f ð xÞ ¼


1 x=k
ke ; x [ 0; k [ 0: Find the likelihood ratio test of H : k ¼ k against K : k 6¼ k :
0 0
4.1 Introduction 129

Solution Here, h ¼ ðkÞ; H ¼ fk : k [ 0g

H 0 ¼ fk : k ¼ k0 g
_
In H, MLE of k is k ¼ x
Thus, the LR test is

Sup pðxjkÞ nx


1 k0
H0 kno e xn n x
Lð xÞ ¼ ¼ ¼ e k0 :en
Sup pðxjkÞ 1 n
xn :e
kn0
H

The rejection region Lð xÞ\C gives

xn enx=k0 \C1 ;


nx
writing gðxÞ ¼ xn e k0 :
It shows that the curve y ¼ gðxÞ has single maximum at x ¼ k0 and the shape of
the curve is

The graph shows the critical regions 0\ x \d0 or d1 \ x \1 corresponding to


the critical region Lð xÞ\C: The constants d0 and d1 are obtained by the size
condition

PH ½d0 \x\d1  ¼ 1  a:

In this problem, X  Gð1; kÞ

Mx ðtÞ ¼ ð1  ktÞ1
 
kt n
) Mx ðtÞ ¼ 1 
n
  
  G n; k ; i.e. f ðxÞ ¼ 1 n n enkx xn1 :
Thus, X n CðnÞ k
One can find the values of d0 and d1 from the gamma distribution table under H0 .
Chapter 5
Interval Estimation

5.1 Introduction

Inpoint estimation when a random sample ðX1 ; X2 ; . . .; Xn Þ is drawn from a popu-


lation having distribution function Fh and h is the unknown parameter (or the set of
unknown parameter). We try to estimate the parametric function cðhÞ by means of a
single value, say t, the value of a statistic T corresponding to the observed values
ðx1 ; x2 ; . . .; xn Þ of the random variables ðX1 ; X2 ; . . .; Xn Þ. This estimate may differ
from the exact value of cðhÞ in the given population. In other words, we take t as an
estimate of cðhÞ such that jt  cðhÞj is small with high probability. In the point
estimate we try to choose a unique point in the parameter space which can rea-
sonably be considered as the true value of the parameter. Instead of unique estimate
of the parameter we are interested in constructing a family of sets that contain the
true (unknown) parameter value with a specified (high) probability. In many
problems of statistical inference we are not interested only in estimating the
parameter or testing some hypothesis concerning the parameter, we also want to get
a lower or an upper bound or both, for the real-valued parameter. Here two limits
are computed from the set of observations, say t1 and t2 and it is claimed with a
certain degree of confidence (measured in probabilistic terms) that the true value of
cðhÞ lies between t1 and t2 . Thus we get an interval ðt1 ; t2 Þ which we expect would
include the true value of cðhÞ. So this type of estimation is called intervalestimation.
In this chapter we discuss the problem of interval estimation.

5.2 Confidence Interval

An interval depending on a random variable X is called a random interval. For


example, (X, 2X) is a random interval. Note that, 12  X  1 , X  1  2X:

© Springer India 2015 131


P.K. Sahu et al., Estimation and Inferential Statistics,
DOI 10.1007/978-81-322-2514-0_5
132 5 Interval Estimation

A confidence interval (CI) of h is a random interval which covers the true value
of the parameter h with specified degrees of confidence (assurance). In other words,
 
a random interval I ð X Þ ¼ hð X Þ; hð X Þ satisfying
n  o
Prh h 2 I X  1  a8h 2 H ð7:1Þ


will be a confidence interval for θ at confidence level (1 − α). If equality in (7.1)


holds then (1 − α) will be called confidence coefficient. hð X Þ and hð X Þ are the lower
and upper confidence limits respectively.
Let I ð X Þ ¼ ½hð X Þ; 1 be a random interval such that Prh fh 2 I ð X Þg ¼
Prh fh  hð X Þg  1  a 8 h 2 H: Then hð X Þ is called the lower confidence bound
of θ at confidence level (1 − α). Similarly we can define upper confidence bound
 
hð X Þ such that Prh fh 2 I ð X Þg ¼ Prh h  hð X Þ  1  a 8 h 2 H; corresponding
 
to a random interval I ð X Þ ¼ a; hð X Þ :
Remark 1 In making the probability statement we do not mean θ is a random
variable. Indeed, θ is a constant. All that is meant here is that the probability is
 
(1 − α) that the random interval hð X Þ; hð X Þ will cover θ whatever the true value
of θ may be. More specifically, it is asserted that about 100(1 − α)% statements of
 
the form h 2 hð X Þ; hð X Þ should be correct.
b
Remark 2 In thepoint estimation, we  choose
 an estimate, say h ð xÞ; on the basis of a
b 
sample x such that the difference  h x  h is small with high probability. In other
 
words, in the point estimator we try to choose a unique point in the parameter space
which can reasonably be considered as the true value of theparameter. On the other

hand, in the interval estimation, we choose a subset of the parameter space, say I x ;

on the basis of a sample x which reasonably includes the true value of the parameter.
  
More specifically in interval estimation, we choose an interval I x ; such that

n  o
Prh h 2 I x  1a 8 h:


5.3 Construction of Confidence Interval

Method I
A simple procedure for finding a confidence interval
Let T be a statistic and WðT; hÞ be a function of T and θ. Suppose the distribution of
WðT; hÞ is free from θ. Then it is always possible to choose two constants K1 and K2
ðK1  K2 Þ such that
5.3 Construction of Confidence Interval 133

PrfwðT; hÞ\K1 g\a1 and PrfwðT; hÞ [ K2 g\a2 where a1 ; a2 [ 0 and


a1 þ a2 ¼ a:
Hence PrfK1  wðT; hÞ  K2 g  1  ða1 þ a2 Þ ¼ 1  a:
Suppose it is possible to convert the inequality K1  wðT; hÞ  K2 into the form
hðT Þ  h  hðT Þ:
 
Then Pr hðT Þ  h  hðT Þ  1  a: This fact gives us a (1 −α) level confidence
interval for θ.
Example 5.1 Let X1 ; X2 ; . . .; Xn be a random sample form N ðl; r2 Þ. Find ð1  aÞ
level confidence interval for l when (i) r2 is known and (ii) when r2 is unknown.
Solution (i) Suppose r2 is known.
pffiffi
We take wðT; hÞ ¼ nðrxlÞ which is an N ð0; 1Þ variate. Hence the distribution of
wðT; hÞ is independent of h. We can choose k1 and k2 from N ð0; 1Þ such that

pffiffiffi
nðx  lÞ
P s1a1   sa 2 ¼ 1  ð a1 þ a 2 Þ ¼ 1a
r
h i
So, x  sa2 prffiffin ; x  s1a1 prffiffin is a ð1  aÞ level confidence interval for l if r is
known.
(ii) Suppose r2 is unknown:
pffiffi
We take wðT; hÞ ¼ nðxslÞ which is student’s t statistic with d.f ðn  1Þ where
P
s2 ¼ n11
ðxi  xÞ2 : The distribution of wðT; hÞ is independent of h. Again we
choose k1 and k2 using a t-distribution with (n − 1) d.f such that

pffiffiffi
nðx  lÞ
P tn1;1a1   tn1;a2 ¼ 1  ð a1 þ a2 Þ ¼ 1a
s


s s
) P x  tn1;a2 pffiffiffi  l  x  tn1;1a1 pffiffiffi ¼ ð1  aÞ:
n n
h i
So x  tn1;a2 psffiffin ; x  tn1;1a1 psffiffin is a ð1  aÞ level confidence interval for l, if
r2 is unknown.
Example 5.2 Let X1 ; X2 ; . . .; Xn be a random sample from N ðl; r2 Þ. Find ð1  aÞ
level confidence interval for r2 when (i) l is known and (ii) l is unknown.
Solution (i) SupposeP l is known.
ðxi lÞ2
We take wðT; hÞ ¼ r2 which is distributed as v2 with n d.f. Thus its
distribution is independent of h. We can choose k1 and k2 from v2 distribution with
n d.f such that
134 5 Interval Estimation

" P #
ð x i  lÞ 2
P v2n;1a1   v2n;a2 ¼ 1  ð a1 þ a2 Þ ¼ 1a
r2
"P P #
ðxi  lÞ2 ð x i  lÞ 2
)P  r2  ¼ 1a
v2n;a2 v2n;1a1
P P
ðxi lÞ2 ðxi lÞ2
Thus v2n;a ; v2n;1a
is 100(1 − a)% confidence interval of r2 when l is
2 1

known. (ii) Suppose l is unknown.P


ðxi xÞ2
We take the function wðT; hÞ ¼ r2 which is distributed as v2 with (n − 1)
d.f. This distribution
is independent of h. Proceeding as in (i),
P P
ðxi xÞ2 ðxi xÞ2
v2
; v2 is 100ð1  aÞ% confidence interval of r2 when l is
n1;a2 n1;1a1

unknown.
Example 5.3 Let X1 ; X2 ; . . .; Xn be a random sample from density function f ðxjhÞ ¼
1
h ; 0 \ x \ h: Find 100ð1  aÞ% confidence interval of h.
Solution The likelihood function is L ¼ h1n . This is maximum when h is the
smallest; but h cannot be less than xðnÞ , the maximum of sample observations. Thus
_
h ¼ xð n Þ :
_
The p.d.f of h is given by
n1
_ nh
_
_
h h ¼ n ; 0\h\h:
h
_
x
Let u ¼ ðhnÞ ¼ hh : so that gðuÞ ¼ nun1 ; 0 < u < 1.
Thus the distribution of u is independent of h.
We find u1 and u2 such that

P½u1 \ u \ u2  ¼ 1  ða1 þ a2 Þ ¼ 1a

Ru1 R1
where gðuÞdu ¼ a1 , gðuÞdu ¼ a2
0

_
u2
_ _

h h h
i.e. P u1 \ h \u2 ¼ 1  a: ) P u2 \h\ u1 ¼ 1  a
 
Thus, max u2 ; u1 is a 100ð1  aÞ% confidence interval for h.
Xi max Xi

 
Example 5.4 X1 ; X2 ; . . .; Xn is a random sample from a G 1h ; 1 distribution having
p.d.f.
5.3 Construction of Confidence Interval 135

1
f ðx/hÞ ¼ ex=h ; x  0:
h

Find 100ð1  aÞ% confidence


Pn interval of h.
xi
Solution Let t = i¼1
h ¼ n
x
h which is a G(1, n) variate having p.d.f. g
t n1
(t) = CðnÞ e t ; 0  t\1:
1

Thus the distribution of t is independent of h. We find k1 and k2 such that




nx
P k1 \t ¼ \k2 ¼ 1  ða1 þ a2 Þ ¼ 1  a
h

where

Zk1 Z1
gðtÞdt ¼ a1 ; gðtÞdt ¼ a2
0 k2

i.e.


nx nx
P \h\ ¼1a
k2 k1

Method 2: Confidence based methods: A general approach:


Let T be a statistic and t1 ðhÞ and t2 ðhÞ be two quantities such that PrfT\t1 ðhÞg\a1
and PrfT [ t2 ðhÞg\a2 ; a1 ; a2 [ 0; a1 þ a2 ¼ a: The equation T ¼ t1 ðhÞ and
T ¼ t2 ðhÞ give us two curves as

θ(t) B(t,θ(t))

θ
T = t 2 (θ)
θ(t) A(t, θ(t))

Suppose t be the observed value of the statistic T. Draw a perpendicular at


T = t. It intersects the curves at A and B. Suppose the co-ordinates of A and B are
 
ðt; hðtÞÞ and t; hðtÞ respectively. According to the construction

t1 ðhÞ  T  t2 ðhÞ , hðtÞ  h  hðtÞ:


 
) Pr½t1 ðhÞ  T  t2 ðhÞ ¼ Pr hðtÞ  h  hðT Þ  1  a:

This fact gives us ð1  aÞ level Confidence Interval for h.


136 5 Interval Estimation

Note 1: To avoid the drawing one may consider inverse interpolation formula.
Note 2: If the L.H.S’s of the Eq. (7.1) can be given explicit expression in terms of h
and if the equations can be solved for h uniquely, then roots are the confidence
limits for h at confidence level ð1  aÞ.
Example 5.5 Let X1 ; X2 ; . . .; Xn be a random sample from density function f ðxjhÞ ¼
h;0\x\h: Find 100ð1  aÞ% confidence interval of h.
1

Solution The likelihood function is L ¼ h1n : This is maximum when h is the


smallest; but h cannot be less than xðnÞ ; the maximum of sample observations. Thus
_
h ¼ xð n Þ :
_
The p.d.f of h is given by
n1
_ nh
_
_
h h ¼ n ; 0\h\h:
h

We find k1 ðhÞ and k2 ðhÞ such that


h _
i
P k1 ðhÞ\h\k2 ðhÞ ¼ 1  ða1 þ a2 Þ ¼ 1  a

where

Z
k1 ðhÞ
 
h ^h d ^h = a1 ð7:2Þ
0

and
Zh  
h h^ d ^h = a2 ð7:3Þ
k2 ðhÞ

^n k ðhÞ
From (7.2), hhn 01 ¼ a1 or, k1 ðhÞ ¼ hða1 Þ1=n
^ n
From (7.3),¼ a2 or, 1 − ½k2hðhn Þ ¼ a2 or, k2 ðhÞ ¼ hð1  a2 Þ1=n . Therefore,
hn h
hn k2 ðhÞ
h _
i h i
1=n ^
h ^h
P ha1 \h\hð1  a2 Þ1=n ¼ 1  a or, P 1=n \h\ 1=n ¼ 1  a:
ð1a2 Þ ða1 Þ
Note We can get the confidence interval of h by the Method I which is given in
Example 5.3.
Large sample confidence interval: Let the asymptoticn distribution pofffiffi a statistic
o
Tn be normal with mean h and variance r nðhÞ ; then Pr s1a1  ðTnrh Þ n
2

ðhÞ  sa2 ’
1  a1 þ a2 ¼ 1  a (say).
This fact gives us a confidence interval for h at confidence level ð1  aÞ
approximately.
5.3 Construction of Confidence Interval 137

Example 5.6 X1 ; X2 ; . . .; Xn is a large random sample from PðkÞ. Find the


100ð1  aÞ% confidence interval for k. Px i
enk k
Solution Likelihood function is L ¼ x1 !x2 !:::::xn !
_
MLE of k ¼ k ¼ x
 _ 1 k
V k ¼  ¼
@ 2 log L n
E @k2

Thus, p ffiffiffiffiffiffi ! N ð0; 1Þ as n ! 1


xk
k=n


Hence P s1a1 \ p ffiffiffiffiffiffi \sa2 ¼ 1  ða1 þ a2 Þ ¼ 1  a
xk
k=n

h pffiffiffiffiffiffiffi pffiffiffiffiffiffiffii
) P x  sa2 x=n\l\x  s1a1 x=n ¼ 1  a

using the approximation k ¼ ^k ¼x in the denominator. So 100ð1  aÞ% confi-


qffiffiffiffiffiffi qffiffiffiffiffiffi
dence interval for k is from x  sa2 x=n to x  s1a1 x=n.
Method 3 Method based on Chebysheff’s inequality:
By Chebysheff’s inequality, Pr½jT  EðT Þj  erT  [ 1  e12 . Now setting 1 
e2 ¼ 1  a; we can construct confidence interval.
1

Example 5.7 Consider the problem of Example 5.3. Find the 100ð1  aÞ% confi-
dence interval of h by using the method of Chebysheff’s inequality.
Solution
_ _ 2
We have E h ¼ n þn 1 h and E h  h ¼ h2 ðn þ 1Þ2ðn þ 2Þ
By applying Chebysheff’s inequality we get
2 _  3
  rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
h  h
6  ð n þ 1Þ ð n þ 2Þ 7 1
P4 \ 25 [ 1  2 :
h 2 2

_ p _
Since h ! h; we replace h by h and for moderately large n,
2 _  3
  rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
 h  h  ð n þ 1Þ ð n þ 2Þ
6 7 1
P4 _ \ 25 [ 1  2 :
h 2 2

Choosing 1  212 ¼ 1  a or 2 ¼ p1ffiffi : we have


a
138 5 Interval Estimation

" pffiffiffi pffiffiffi #


_1 _ 2 _ 1 _ 2
P h  pffiffiffi h pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi \h\h þ pffiffiffi h pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi [ 1  a
a ð n þ 1Þ ð n þ 2Þ a ð n þ 1Þ ð n þ 2Þ

_
Again pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
1 ffi ’ 1n for large n and the fact that h  h, we have
ðn þ 1Þðn þ 2Þ
h_ _
 qffiffii _  qffiffi
P h\h\h 1 þ 1n 2a [ 1  a: Thus h ¼ max xi ; max xi 1 þ 1n 2a is an
approximate 1  a level confidence interval for h.

5.4 Shortest Length Confidence Interval


and Neyman’s Criterion

From the above discussion, it is clear that ð1  aÞ level C.I is not unique. In fact,
infinite number of C.I’s can be constructed by simple method [Because the equation
a1 þ a2 ¼ a; a1  0; a2  0 has infinite number of solution for ða1 ; a2 Þ]. Again
for different choice of statistic, we get different confidence intervals. For example,
in r.s. from

1 x
f ðx; hÞ ¼ eh ; 0 \ x \ 1
h

P
2 Xi
is a ð1  aÞ level lower confidence bound for h
v22n;a
P
 v2 2n since M.g.f. of X = ð1  thÞ1 ) M.g.f of
2 Xi
[As 2X
h  v 2
2 ) h
2X
h ¼ ð1  2tÞ1 = M.g.f. of v2 2 ].
On the other hand, að1  aÞ level confidence bound for h based on Xð1Þ ¼ minXi
i
2nXð1Þ
is v22;a
.
So we need some optimality criteria to choose one of the ð1  aÞ level confi-
dence intervals.
1. Shortest length confidence interval [Wilk’s criterion]
 
A ð1  aÞ level confidence interval I ðhÞ ¼ hðT Þ; hðT Þ based on T will be of
shortest length if the inequality
hð T Þ  hð T Þ  
 h ðT Þ  h ðT Þ; for all h holds for every other ð1  aÞ level C.I.
  
h ðT Þ; 
h ðT Þ based on the same statistic T.
Example 5.8 On the basis of an r.s. from N ðl; r2 Þ; r2 being unknown, a ð1  aÞ
 given by
h C.I. for l based on X is
level i
  sa2 prffiffi ; X
X   s1a1 prffiffi ; a1 ; a2  0 and a1 þ a2 ¼ a: The length of the
n n
interval is ðsa2  s1a1 Þ prffiffi : n
5.4 Shortest Length Confidence Interval and Neyman’s Criterion 139

To find the shortest length confidence interval, we minimize ðsa2  s1a1 Þ


subject to a1 þ a2 ¼ a; a1 ; a2  0:
pffiffi
Owing to the symmetry of the distribution nðrxlÞ about ‘0’, the quantity sa2 
s1a1 will be a minimum when sa=2 ¼ s1a1 , i.e., when a1 ¼ a2 ¼ a=2: Thus the
h i
shortest length ð1  aÞ level C.I. for l based on x is x  sa=2 prffiffin ; x þ sa=2 prffiffin :

Remarks Occasionally, the length of a C.I is a random quantity. In this case, we


minimize its expected length. e.g. In random sampling from N ðl; r2 Þ; (both l and
hr are unknown), a ð1i  aÞ level C.I for h l is given i by
2

  t
X  ta ;n1 psffiffi ; X psffiffi : This length of the C.I is t psffiffi :
2 n 1a ;n1 1 n a ;n1  t1a ;n1 2 1 n
which is a random quantity.
h So, to find i the shortest (expected) length C.I, we minimize
ta2 ;n1  t1a1 ;n1 EpðsffiffinÞ subject to a1 ; a2  0 and a1 þ a2 ¼ a: Owing to the sym-
metry of tn1 distribution about ‘0’, the minimum is attained at a1 ¼ a2 ¼ a=2:
Therefore
h the required shortest i expected length confidence interval is
 sffiffi 
X  ta=2;n1 n ; X þ ta=2;n1 n :
p psffiffi

Example 5.9 Consider the problem discussed in Example 5.2. On the basis of a
random sample from N ðl; r2 Þ; l being known, a ð1  aÞ level CI for r2 is given by

P P
ðxi lÞ2 ðxi lÞ2
vn;a
2 ; v 2 , a1 ; a2  0 and a1 þ a2 ¼ a: The length of the interval is

1
2 n;1a

P
v2
1
 v21 ðXi  lÞ2 which has the expected value
n;1a1 n;a2

" #
1 1
 nr2 :
v2n;1a1 v2n;a2


We wish to minimize 1
v2n;1a
 v21 ;
n;a2
1

Rv2
2

subject to f ðv2 Þdv2 ¼ 1  a; where v21 ¼ v2n;1a1 ; v22 ¼ v2n;a2 and f ðv2 Þ is the
v21
p.d.f. of a chi-square r.v. with
2 n d.f. 3
v22
R
Now let / ¼ v12  v12 þ k4 f ðv2 Þdv2  ð1  aÞ5
1 2
v21

where k is a Lagrangian multiplier. We get


140 5 Interval Estimation

d/ 1  
¼  4  kf v21 ¼ 0
dv21 v1

d/ 1  
¼ 4 þ kf v22 ¼ 0
dv2 v2
2

1 1
)k ¼   ¼  4  2
v41 f v21 v2 f v2

Hence v21 and v22 are such that the equation

Zv2
2

     
v41 f v21 ¼ v42 f v22 is to be satisfied and f v2 dv2 ¼ 1  a:
v21

It is very difficult to find the actual values of v21 and v22 . In practice, the equal tails

P P
ðxi lÞ2 ðxi lÞ2
interval, v2
; v2 , is used.
n;a=2 n;1a=2

PSimilarly if l is unknown, the equal tail confidence interval,


P
ðxi xÞ2 ðxi xÞ2
v2
; v2 , is employed.
n1;a=2 n1;1a=2

Example 5.10 Consider the problem discussed in Example in 5.3. A ð1  aÞ level


C.I. for h is given by
max xi ; max xi ; a ; a  0; a þ a ¼ a: The length L of the interval is
u2 u1 1 2 1 2
 
u1  u2 max xi .
1 1

We minimize L subject to

Zu2
nun1 du ¼ un2  un1 ¼ 1  a
u1

This implies 1  a\un2

) ð1  aÞ1=n \u2  1

and
5.4 Shortest Length Confidence Interval and Neyman’s Criterion 141


dL dL du1 1
¼ max xi þ
du2 du1 du2 u22

1 nu2n1 1
¼ max xi  2 n1 þ 2
u1 nu1 u2
un1 þ 1  u2n þ 1
¼ max xi \0;
u22 un1 þ 1

so that the minimum occurs at u2 ¼ 1: When u. 1=n : Thus a 1  a level


2 ¼ 1;u1 ¼ a

confidence interval is given by max x ; max x a1=n : This confidence interval has
i i

the smallest length among all confidence intervals for h based on max xi

2. Neyman’s criterion
Let I1 ð X Þ and I2 ð X Þ be two ð1  aÞ levelconfidence intervals for h. I1 ð X Þ will be
accurate (or shorter) than I2 ð X Þ if

Ph0 fh 2 I1 ð X Þg  Ph0 fh 2 I2 ð X Þg8h; h0 2 H; h 6¼ h0 ðh0 ¼ true valueÞ

A ð1  aÞ level C.I. I ð X Þ is said to be most accurate (UMA) (or shortest) if


Ph0 fh 2 I ð X Þg  Ph0 fh 2 I  ð X Þg8h; h0 2 H; h 6¼ h0 for any other ð1  aÞ level C.I.
I  ð X Þ.
A ð1  aÞ level C.I. I ð X Þ is said to be unbiased if,

Ph0 fh 2 I ð X Þg  1  a ¼ Ph0 fh0 2 I ð X Þg8h; h0 2 H; h 6¼ h0

i.e. Probability (containing wrong value of θ) ≤ Probability (containing true


value of θ).
Implication An unbiased confidence interval includes true value more often than
it does contain wrong value.
A ð1  aÞ level unbiased C.I. I ð X Þ is said to be most accurate amongst the class
of unbiased ð1  aÞ level if Ph0 fh 2 I ð X Þg  Ph0 fh 2 I  ð X Þg8h; h0 2 H;h 6¼ h0 for
any other ð1  aÞ level unbiased C.I. I  ð X Þ
Relation between non randomized test and confidence interval
Theorem 5.1 Suppose Aðh0 Þ denoted the acceptance region of a level a test for
testing H0 :h ¼ h0 n o
Define S x ¼ h= x 2 AðhÞ
  

Then S x will be a ð1  aÞ level confidence interval for h.



 
Proof By the construction of S x , we have

142 5 Interval Estimation

 
x 2 AðhÞ , h 2 S x
 
n  o n o
)Ph h 2 S x ¼ Ph x 2 AðhÞ  1  a8h:
 

Note The implication of this theorem is that for a fixed x , the confidence region
  

S x is that set of values h0 for which the hypothesis H0 : h ¼ h0 is accepted when



x is the observed value of x
 
 
Theorem 5.2 Let S x be a ð1  aÞ level confidence interval for h: Define AðhÞ ¼
n  o 

x =h 2 S x : Then Aðh0 Þ will be an acceptance region of a level a


 
non-randomized test for testing H0 : h ¼ h0 .
Proof By the construction of AðhÞ, we have
 
x 2 AðhÞ , h 2 S x
 
n o n  o
)Ph x 2 AðhÞ ¼ Ph h 2 S x  1  a8h:
 

Relation between UMP non-randomized test and UMA confidence interval


Theorem 5.3 Suppose Aðh0 Þ denoted the acceptance  region
 n of an UMP,olevel-a
non-randomized test for testing H0 : h ¼ h0 . Define S x ¼ h= x 2 AðhÞ . Then
   

S x will be an UMA ð1  aÞ level confidence interval for h:



 
Proof By Theorem 5.1, it is clear that the level of set S x is ð1  aÞ.

Consider another acceptance region A ðh0 Þ of a level a non-randomized test for
testing H0: h¼ hn0 o  
Let S x ¼ h= x 2 A ðhÞ , then the level of S x is also ð1  aÞ.
  
Since AðhÞ is the acceptance region of a UMP non-randomized test, we can write,
n o n o
Ph0 x 2 AðhÞ  Ph0 x 2 A ðhÞ ðh 6¼ h0 Þ
 
n  o n  o
) Ph0 h 2 S x  Ph0 h 2 S x 8ðh 6¼ h0 Þh; h0 2 H
 

 
Since S x is arbitrary the proof follows immediately.

5.4 Shortest Length Confidence Interval and Neyman’s Criterion 143
 
Theorem 5.4 Let S x be an UMA ð1  aÞ level confidence interval for h. Define
n  o
AðhÞ ¼ x =h 2 S x : Then Aðh0 Þ will be an acceptance region of a level-a
 
UMP test for testing H0 : h ¼ h0 .
Proof According to the construction of AðhÞ, Aðh0 Þ will be the acceptance region of
a level-a non-randomized test for testing H0 : h ¼ h0 .  
Now corresponding to another ð1  aÞ level C.I. S x ,

n  o
let A ðhÞ ¼ x : h 2 S x :
 

Then A ðh0 Þ will be also an acceptance region of a level-a non-randomized test


for testing H0 : 
h ¼h0
Now since S x is an UMA ð1  aÞ level C.I. for h,

n  o n  o
Ph0 h 2 S x  Ph0 h 2 S x 8ðh 6¼ h0 Þh; h0 2 H; h 6¼ h0
 
n o n o
) Ph0 x 2 AðhÞ  Ph0 x 2 A ðhÞ
 

which implies that Aðh0 Þ will be the acceptance region of level-a UMP
non-randomized test for testing H0 : h ¼ h0 , since A ðhÞ is arbitrary.
Relation between UMPU non-randomized test and UMAU confidence interval
Theorem 3.5 Let Aðh0 Þ be the acceptance region   of nan UMPU olevel-a
non-randomized test for testing H0 : h ¼ h0 . Define S x ¼ h= x 2 AðhÞ . Then
   

S x will be an UMAU ð1  aÞ level confidence interval for h.



 
Proof According to construction of S x it will be a ð1  aÞ level Confidence
  n  o
Interval for h. Let S x ¼ h= x 2 A ðhÞ corresponding to any other accep-

 
tance region A ðh0 Þ of a level-a non-randomized test for testing H0 : h ¼ h0 .
Now since Aðh0 Þ is the acceptance region of a level-a UMPU non-randomized
testfor testing H0 : h ¼ h0
n o n o
Ph0 x 2 AðhÞ  Ph0 x 2 A ðh0 Þ  1  a8h; h0 2 H; h 6¼ h0
 
n  o n  o
) Ph0 h 2 S x  Ph0 h 2 S x 1  a
 

 
i.e., S x is a UMAU ð1  aÞ level C.I. for h, since A ðh0 Þ is arbitrary.

144 5 Interval Estimation
 
Theorem 5.6 Let S x be an UMAU ð1  aÞ level confidence interval for h.
n   o

Define AðhÞ ¼ x =h 2 S x : Then Aðh0 Þ will be an acceptance region of a


 
level-a UMPU test for testing H0 : h ¼ h0 .
Proof According to the construction of AðhÞ, Aðh0 Þ will be the acceptance region of
a level-a non-randomized test for testing H0 : h ¼ h0 .  
Now, corresponding to any other ð1  aÞ level C.I. S x for h,
n  o 
  
let A ðhÞ ¼ x =h 2 S x ; then A ðhÞ will also be an acceptance region of a
 
level-a non-randomized
  test for testing H0 : h ¼ h0 .
Since S x is UMAU ð1  aÞ level C.I.

n  o n  o
)Ph0 h 2 S x  Ph0 h 2 S x  1  a8h; h0 2 H; h 6¼ h0
 
n o n o
) Ph0 x 2 AðhÞ  Ph0 x 2 A ðh0 Þ  1  a
 

i.e. Aðh0 Þ will be an acceptance region of a level-a UMPU test for testing
H 0 : h ¼ h0 .
Example 5.11 Let X1 ; X2 ; . . .; Xn be a r.s. from Rð0; hÞ: The UMP level-a
non-randomized test for testing H0 : h ¼ h0 against h 6¼ h0 is given by the critical
pffiffiffi
region xðnÞ [ h0 or xðnÞ  h0 n a:
n  pffiffiffi o

Let AðhÞ ¼ x h n a\xðnÞ  h

  n o
Define S x ¼ hj x 2 AðhÞ
 
 pffiffiffi 
¼ hjh n a\xðnÞ  h
 
xð n Þ
¼ hjxðnÞ  h\ p ffiffiffi
n
a
n o
Thus, by Theorem 5.3, Sð xÞ ¼ hjxðnÞ  h\ pðnnffiffiaÞ will be a ð1  aÞ level UMA
x

confidence interval for h.


Chapter 6
Non-parametric Test

6.1 Introduction

In parametric tests we generally assume a particular form of the population dis-


tribution (say, normal distribution) from which a random sample is drawn and we
try to construct a test criterion (for testing hypothesis regarding parameter of the
population) and the distribution of the test criterion depends upon the parent
population.
In non-parametric tests the form of the parent population is unknown. We only
assume that the population, from which a random sample is drawn, is continuous
and try to develop a test criterion whose distribution is independent of the popu-
lation distribution under the hypothesis under consideration. A non-parametric test
is concerned with the form of the population but not with any parametric value.
A test procedure is said to be distribution free if the statistic used has a distri-
bution which does not depend upon the form of the distribution of the parent
population from which the sample is drawn. So in such procedure assumptions
regarding the population are not necessary.
Note Sometimes the term ‘distribution free’ is used instead of non-parametric. But
we should make some distinction between them.
In fact, the terms ‘distribution free’ and ‘non-parametric’ are not synonymous.
The term ‘distribution free’ is used to indicate the nature of the distribution of the
test statistic whereas the term ‘non-parametric’ is used to indicate the type of
hypothesis problem investigated.
Advantages and disadvantages of non-parametric method over parametric
method
Advantages
(i) Non-parametric methods are readily comprehensible, very simple and easy to
apply and do not require complicated sample theory.

© Springer India 2015 145


P.K. Sahu et al., Estimation and Inferential Statistics,
DOI 10.1007/978-81-322-2514-0_6
146 6 Non-parametric Test

(ii) No assumption is made about the form of frequency function of the parent
population from which the sample is drawn.
(iii) No parametric technique will be applicable to the data which are mere clas-
sification (i.e. which are measured in nominal scale), while non-parametric
method exists to deal with such data.
(iv) Since the socio-economic data are not, in general, normally distributed,
non-parametric tests have found applications in psychometry, sociology and
educational statistics.
(v) Non-parametric tests are available to deal with data which are given in ranks
or whose seemingly numerical scores have the strength of the ranks. For
example, no parametric test can be applied if the scores are given in grades
such as A, B, C, D, etc.
Disadvantages
(i) Non-parametric test can be used only if the measurements are nominal and
ordinal. Even in that case, if a parametric test exists it is more powerful than
the non-parametric test.
In other words, if all the assumptions of a statistical model are satisfied by the
data and if the measurements are of required strength, then non-parametric
tests are wasteful of time and data.
(ii) No non-parametric method exists for testing interactions in ANOVA model
unless special assumptions about the additivity of the model are made.
(iii) Non-parametric tests are designed to test statistical hypothesis only but not for
estimating parameters.

6.2 One-Sample Non-parametric Tests

In this section we consider the following one-sample non-parametric tests:


(i) Chi-square test
(ii) Kolmogorov–Smirnov test
(iii) Sign test
(iv) Wilcoxon signed-rank test
(v) Run test

6.2.1 Chi-Square Test (i.e Test for Goodness of Fit)

Let n sample observations are continuous measurements grouped in k class intervals


or observations themselves are frequency of k mutually exclusive events
6.2 One-Sample Non-parametric Tests 147

A1 ; A2 ; . . .; Ak such that S ¼ A1 þ A2 þ    þ Ak is the space of the variable under


consideration. The form of the distribution is not known. We want to test Ho :
FðxÞ ¼ F0 ðxÞ against: H1 : FðxÞ 6¼ F0 ðxÞ. Here Fo ðxÞ is specified with all its
parameters.
Under H0 we can obtain the probability ðpi Þ of a random observation from F0 to
belong in the ith class Ai ði ¼ 1; 2; . . .kÞ: The expected frequency in ith class is
ei ¼ npi for i ¼ 1; 2;. . .; k. These are compared with the observed frequencies xi .
Pearson suggested the statistic.

X
k
ðxi  npi Þ2
v2 ¼ :
i¼1
npi

If the agreement between the observed ðxi Þ and expected frequencies ðei Þ is
close, then the differences ðxi  npi Þ will be small and consequently v2 will be
small. Otherwise it will be large. The larger the value of v2 the more likely is that
the observed frequencies did not come from the population under H0 . This means
that the test is always right-sided. It can be shown that for large samples the
sampling distribution of v2 under H0 follows chi-square distribution with (k − 1) d.
f. The approximation holds good if every ei  5. In case there are some ei \5, we
have to combine adjacent classes till the expected frequency in the combined class
is at least 5. Then k will be the actual number of classes used in computing v2 . Thus
the null hypothesis H0 is rejected if Cal v2 [ v2a;k1 .

6.2.2 Kolmogrov–Smirnov Test

Let X1 ; X2 ; . . .; Xn be a sample from continuous distribution function FðxÞ. We are


to test H0 : FðxÞ ¼ F0 ðxÞ 8 x against H1 : FðxÞ 6¼ F0 ðxÞ for some x.
Suppose Fn ðxÞ is the sample (empirical) distribution function corresponding to
any given x; that is, if the number of observation  x is k, then

k
Fn ðxÞ ¼ :
n

Test statistic under H0 is given by

Dn ¼ Sup jFn ðxÞ  F0 ðxÞj


x

which is known as Kolmogorov–Smirnov statistic.


The distribution of Dn does not depend on F0 as long as F0 is continuous. H0 is
rejected if Dn [ Dn;a .
148 6 Non-parametric Test

Similarly, the one-sided KS statistics for one-sided alternatives are the


following:
(i) for the alternative H þ : FðxÞ  F0 ðxÞ 8 x the appropriate statistic is

Dnþ ¼ Sup½Fn ðxÞ  F0 ðxÞ


x
(ii) for the alternative H  : FðxÞ  F0 ðxÞ 8 x the appropriate statistic is

D
n ¼ Sup½F0 ðxÞ  Fn ðxÞ
x

The statistics Dnþ and D n have the same distribution because of symmetry. The
test rejects H0 if Dnþ [ Dn;a þ
when alternative is FðxÞ  F0 ðxÞ 8 x and rejects H0 if
Dn [ D 
n;a when alternative is FðxÞ  F0 ðxÞ 8 x at the level a.

6.2.3 Sign Test

FðxÞ is continuous distribution function of the parent population, which is con-


tinuous. FðxÞ is unknown, from which we draw a random sample ðx1 ; x2 ; . . .; xn Þ.
We define fp ¼ pth order population quantile.

) Pr ½X  np  ¼ p i:e: Pr ½X  np  0 ¼ p:

Assumption FðxÞ is continuous in the neighbourhood of fp . To test H 0 : fp ¼ fp 0 .

Case 1 H 1 : np [ np 0
To perform the test we consider the number of positive quantities among
     
x1  np 0 ; x2  np 0 ; . . .; xn  np 0 . Sample values equal to np 0 are ignored.
Suppose S = total number of + signs, we note that, under H 0

Pr½X  np 0  0 ¼ p
 
) Pr X  np 0 [ 0 ¼ 1  p ¼ q; say:

) Under H 0 ; S  Bðn;hqÞ i  
Also, under H 1 ; Pr X  n0p \p, i.e. Pr X  np 0 [ 0 [ q. Suppose, under
h i
H 1 ; Pr X  n0p [ 0 ¼ q0 where q0 [ q.
) Under H 1 ; S  Bðn; q0 Þ where q0 [ q.
6.2 One-Sample Non-parametric Tests 149

Hence a large value of S indicates the rejection of H 0 .

8
< 1 if S[s
So the test is /ðSÞ ¼ a if S ¼ s
:
0 if S\s

where ‘s’ and ‘a’ are such that


(I) Pr½S [ s=H 0 \a  Pr½S  s=H 0 
(II) E H 0 /ðSÞ ¼ a
   
From (I) we get s and from (II) a ¼ Pr S [ s=H 0 þ a Pr S ¼ s=H 0
 
a  Pr S [ s=H 0
)a¼   :
Pr S ¼ s=H 0

Hence test is given by S [ s ) Rejection of H 0


S\s ) Acceptance of H 0
S ¼ s ) To draw a random number with probability of rejection ‘a’ and
probability of acceptance 1 − a.
Case 2

H 2 ¼ np \np 0 or np ¼ np 0 \np 0

Under H 2

Pr½X  np 0  ¼ p

) Pr½X  np 0  [ p

or, Pr½X  np 0  0 [ p, i.e. Pr½X  np 0 [ 0\1  p ¼ q.


 
Suppose under H 2 ; Pr X  np 0 [ 0 ¼ q0 where q0 \q.
) Under H 2 ; S  Bðn; q0 Þ where q0 \q:
So a small value of S 8 indicates the rejection of H 0 .
> 1 if S \ s
<
So our test is /ðSÞ ¼ a if S ¼ s
>
:
0 if S [ s
150 6 Non-parametric Test

where ‘s’ and ‘a’ are such that


   
Pr S\s=H 0 \a  Pr S  s=H 0 and E H 0 /ðSÞ ¼ a
   
i.e: Pr S\s=H 0 þ a Pr S ¼ s=H 0 ¼ a or,
 
a  Pr S\s=H 0
a¼  
Pr S ¼ s=H 0

i.e. if S\s ) reject H 0


S [ s ) accept H 0
S ¼ s ) draw a random number with probability of rejection ‘a’ and probability
of acceptance (1 − a).
Large sample test
Under H 0 ; S  Bðn; qÞ
) under H 0 s ¼ Spffiffiffiffiffiffi
npq  Nð0; 1Þ
nq

) x0 : s\sa

Case 3 H 3 : np 6¼ np 0

Under H 3 ; Pr½X  np 0  6¼ p ) Pr½X  np 0 [ 0 6¼ q


Suppose under H 3 ; Pr½Xnp 0 [ 0 ¼ q0 where q0 6¼ q
) Under H 3 ; S  Bðn; q0 Þ where q0 6¼ q:
So a small or a large value of S indicates the rejection of H 0 . Here the test is
8
>
> 1 if S\s1
>
>
< a1 if S ¼ s1
/ ðSÞ ¼ 0 if s1 \ S \ s2
>
>
>
> a if S ¼ s2
: 2
1 if S [ s2

where s1 and s2 are such that

Pr½S\s1 =H 0 \a1  Pr½S  s1 =H 0 ;


   
Pr S [ s2 =H 0 \a2  Pr S  s2 =H 0

and a1 þ a2 ¼ a. For simplicity we take a1 þ a2 ¼ a=2.


‘a1’ and ‘a2’ are such that
   
a= ¼ Pr S\s1 =H þ a1 Pr S ¼ s1 =H
2 0 0

a
 Pr½S\s1 =H 0 
) a1 ¼ 2
Pr½S ¼ s1 
6.2 One-Sample Non-parametric Tests 151

a    
¼ Pr S [ s2 =H 0 þ a2 Pr S ¼ s2 =H 0
2
a
 Pr½S [ s2 =H 0 
) a2 ¼ 2
Pr½S ¼ s2 =H 0 

Thus, we reject H 0 if S\s1 or S [ s2 .


We accept H 0 if s1 \S\s2 and random or no conclusion if S ¼ s1 or S ¼ s2 .
Large sample test: Under H 0 ; S  Bðn; qÞ,
) under H 0 ; s ¼ pffiffiffiffiffi

npq  Nð0; 1Þ
Snq

x0 : jsj [ sa2:

Note p ¼ 12 ; np ¼ n12 ¼ median:


 
Under H 0 ; S  B n; 12 and then S is symmetric about n2 : Therefore for two sided
test in case of Case 3,
2  s1 ¼ s2  2 ) s1 ¼ n  s2 and hence a1 ¼ a2 :
n n

6.2.4 Wilcoxon Signed-Rank Test

Another similar modification of the sign test is the Wilcoxon signed-rank test. This
is used to test the hypothesis that observations have come from symmetrical pop-
ulation with a common specified median, say, l0 . Thus the problem is to test
H0 : l ¼ l0 : The signed-rank statistic T þ is computed as follows:
1. Subtract l0 from each observation.
2. Rank the resulting differences in order of size, discarding sign.
3. Restore the sign of the original difference to the corresponding rank.
4. Obtain T þ , the sum of the positive ranks.
Similarly, T  is the sum of the negative ranks. Then under H0 , we expect T þ
and T  to be the same. We also note that

X
n
nðn þ 1Þ
T þ þ T ¼ i¼ :
i¼1
2

The statistic T þ (or T  ) is known as the Wilcoxon statistic. A large value of T þ


(or equivalently, a small value of T  ) means that most of the large deviation from
l0 are positive and therefore we reject H0 in favour of the alternative H1 : l [ l0 .
Thus the test rejects H0 at the level a if T þ \C1 when H1 : l\l0
if T þ [ C2 when H1 : l [ l0
if T þ \C3 or T þ [ C4 when H1 : l 6¼ l0
where C1 ; C2 ; C3 and C4 are such that
152 6 Non-parametric Test

P½T þ \C1  ¼ a

P½T þ [ C2  ¼ a

P½T þ \C3  þ P½T þ [ C4  ¼ a:

6.2.5 Run Test

Suppose we have a set of observations ðX1 ; X2 . . .; Xn Þ. We are to test H0 : The set of


observations are random against H1 : They are not random.
We replace each observation either by ‘+’ or ‘−’ sign according as it is larger or
smaller than the median of the sample observations. Any observation equal to
median is simply discarded. A run is defined to be a sequence of values of the same
kind bounded by the values of other kind. We compute the total number of runs ‘r’.
Too many values of ‘r’ as well as too small values of ‘r’ give an indication of
non-randomness. Thus the test rejects H0 at the level a if r < r1 or r > r2 where r1
and r2 are such that

P½r\r1  ¼ a=2; P½r [ r2  ¼ a=2:

The one-sample run test is based on the order or sequence in which the indi-
vidual scores or observations originally were obtained.
Example 6.1 The theory predicts that the proportion of peas in the four groups A,
B, C and D should be 9:3:3:1. In an experiment among 556 peas, the numbers in the
four groups were 315, 108, 101 and 32. Does the experimental result support the
theory?
Solution If P1, P2, P3 and P4 be the proportions of peas in the four classes in the
whole population of peas, then the null hypothesis to be tested is

9 3 3 1
H0 : P1 ¼ ; P2 ¼ ; P3 ¼ ; P4 ¼
16 16 16 16

The test statistic under H0 is given by

X
k
ðxi  np0 Þ2
v2 ¼ i
with ðk  1Þ d:f
i¼1
np0i
Xk
x2i
¼ n
i¼1
np0i
6.2 One-Sample Non-parametric Tests 153

The expected frequencies are


9
e1 ¼ np01 ¼ 556X ¼ 312:75
16
3
e2 ¼ np02 ¼ 556X ¼ 104:25
16
3
e3 ¼ np03 ¼ 556X ¼ 104:25
16
1
e4 ¼ np04 ¼ 556X ¼ 34:75
16
3152 1082 1012 322
So, v2 ¼ þ þ þ  556
312:75 104:25 104:25 34:75
¼ 556:47  556 ¼ 0:47 with 3 d:f:

From the table we have v20:05;3 ¼ 7:815. Since the calculated value of v2 , i.e. 0.47
is less than the tabulated value, i.e. 7.815, it is not significant. Hence the null
hypothesis may be accepted at 5 % level of significance and we may conclude that
the experimental result supports the theory.
Example 6.2 Can the following sample be reasonably regarded as coming from a
uniform distribution on the interval (35,70): 36, 42, 44, 50, 64, 58, 56, 50, 37, 48,
52, 63, 57, 43, 39, 42, 47, 61, 53, 58? Use Kolmogorov–Smirnov test.
Solution Here we test H0 : FðxÞ ¼ F0 ðxÞ for all x, where F0 ðxÞ is the distribution
function of the uniform distribution on the interval (35,70). Now
F0 ð xÞ ¼ 0 if x  35
x  35
¼ if 35\x\70
35
¼ 1 if x  70

Rearranging the data in increasing order of magnitude, we have the following


results:
x F0 ðxÞ F20 ðxÞ jF20 ð xÞ  F0 ð xÞj
36 1/35 1/20 3/140
37 2/35 2/20 6/140
39 4/35 3/20 5/140
42 7/35 4/20 0
42 7/35 5/20 7/140
43 8/35 6/20 10/140
44 9/35 7/20 13/140
47 12/35 8/20 8/140
48 13/35 9/20 11/140
50 15/35 10/20 10/140
(continued)
154 6 Non-parametric Test

(continued)
50 15/35 11/20 17/140
52 17/35 12/20 16/140
53 18/35 13/20 19/140
56 21/35 14/20 14/140
57 22/35 15/20 17/140
58 23/35 16/20 20/140
58 23/35 17/20 27/140
61 26/35 18/20 22/140
63 28/35 19/20 21/140
64 29/35 20/20 24/140

27
D20 ¼ SupjF20 ðxÞ  F0 ðxÞj ¼ ¼ 0:1929:
x 140

Let us take a ¼ 0:05: Then from the table D20;0:05 ¼ 0:294. Since
0.1929 < 0.294, we accept H0 at 5 % level of significance. So we can conclude that
the given data has come from a uniform distribution on the interval (35,70).
Example 6.3 The following data represent the yields of maize in q/ha recorded
from an experiment.
16.4, 19.2, 24.5, 15.4, 17.3, 23.6, 22.7, 20.9, 18.2
Test whether the median yield (M) is 20 q/ha.
Solution We test H0 : M ¼ 20 against H1 : M 6¼ 20. To test H0 , we find the dif-
ference ðX  20Þ and write their signs

  þ  þ þ þ

Here n = 9 and r = number of ‘+’ sign = 4. This r will be binomial variate with
parameters n = 9 and p = 0.5.
To test H0 against H1 : M 6¼ 20  H1 : p 6¼ 0:5; the critical region x will be
0 0
given by r  ra=2 and r  ra=2 , where r a=2 is the smallest integer and ra=2 is the
largest integer such that
   X9   9
9 1 a
P r  ra=2 H0 ¼  ¼ 0:025
x¼ra=2 x 2 2
ra=2 1   9
X 9 1
i.e.,  0:975
x¼0 x 2
h  i X ra=2   9
0

0  9 1 a
and P r  ra=2 H0 ¼  ¼ 0:025
x¼0 x 2 2
6.2 One-Sample Non-parametric Tests 155

0
From the table we have ra=2  1 ¼ 7, i.e. ra=2 ¼ 8 and ra=2 ¼ 1. Here
0
ra=2 ¼ 1\r ¼ 4\ra=2 ¼ 8, so H0 is accepted at 5 % level of significance.

Example 6.4 For the problem given in Example 6.3, test H0 : M ¼ 20 against
H1 : M 6¼ 20 by using Wilcoxon signed-rank test.
Solution The differences Xi  20 are
−3.6, −0.8, 4.5, −4.6, −2.7, 3.6, 2.7, 0.9, −1.8
The order sequence of numbers ignoring the sign and their ranks with original
signs are as follows:

0.8 0.9 1.8 2.7 2.7 3.6 3.6 4.5 4.6


−1 2 −3 4.5 −4.5 6.5 −6.5 8 −9

Thus, T þ = The sum of the positive ranks = 21 and T  = The sum of negative
ranks = 24.
We note that T þ þ T  ¼ nðn 2þ 1Þ ¼ 45
To test H0 : M ¼ 20 against H1 : M 6¼ 20, the critical region ω will be given by
T þ [ C4 and T þ \C3 at the level a. Here we take a ¼ 0:05:
From the table we have P½T þ [ 39  0:025 and

P½T þ \6  0:025

Since T þ ¼ 21 lies between 6 and 39 (table values), we accept H0 . It means that


the median yield of maize is 20 q/ha.
Example 6.5 Test whether the observations
21, 19, 22, 18, 20, 24, 15, 32, 35, 28, 30 are random.
Solution We test H0 : The observation are random against H1 : The observations
are not random.
The sample values are arranged in increasing order.

15; 18; 19; 20; 21; 22; 24; 28; 30; 32; 35

) Median = 22
Each original observation is replaced by ‘+’ or ‘−’ sign according as it is larger
or smaller than the median, i.e. 22. Any observation equal to median is simply
discarded. Thus we have from the original observation
21 19 22 18 20 24 15 32 35 28 30
- - x - - + - + + + +
156 6 Non-parametric Test

Thus number of runs = r = 4, number of ‘+’ signs = n1 = 5 and number of ‘−’


signs = n2 = 5. From table for n1 = 5, n2 = 5 any observed r of 2 or less or of 10 or
more is in the region of rejection at 5 % level of significance. So H0 is accepted, i.e.
the observations are random.
Example 6.6 The males (M) and females (F) were queued in front of the railway
reservation counter in the order below

M F F M M M F M F F M M F M

Test whether the order of males and females in the queue was random.
Solution Here null hypothesis is
H0 : The order of males and females in the queue was random against
H1 : The order of males and females in the queue was not random.
For the given sequence,
MFFMMMFMFFMMFM
we have,
n1 = number of males = 8
n2 = number of females = 6
r = number of runs = 9
Since the observed value of r = 9 lies between the critical values 3 and 12, we
accept H0 at 5 % level of significance. It means that the order of males and females
in the queue was random.

6.3 Paired Sample Non-parametric Test

In this section we consider the following paired sample non-parametric tests:


(i) Sign test.
(ii) Wilcoxon signed-rank test.

6.3.1 Sign Test (Bivariate Single Sample Problem) or Paired


Sample Sign Test

Suppose we have a bivariate population with continuous distribution function


F(x,y) which is unknown but continuous. The ordinary sign test for the location
parameter of a univariate population is equally applicable to a paired sample
problem. This is the non-parametric version of paired ‘t’ test.
6.3 Paired Sample Non-parametric Test 157

We draw a random sample ðx1 ; y1 Þ; ðx2 ; y2 Þ; . . .; ðxn ; yn Þ from F(x, y). To test
H 0 : np ðx  yÞ ¼ np 0 writing z ¼ x  y ) H 0 : np ðzÞ ¼ np 0 , i.e. H 0 : np ¼ np 0 ,
writing np ðzÞ ¼ np .
Assumption z ¼ x  y is continuous in the neighbourhood of np ðzÞ. Note that
   
Pr z  np ¼ p ) Pr znp [ 0 ¼ q; q ¼ 1  p. We define S = total number of
     
positive signs among z1  np 0 ; z2  np 0 ; . . .; zn  np 0 .
 
) Under H 0 , Pr znp 0 [ 0 ¼ q and S  Bðn; qÞ. Proceed for Case 1, Case 2
and Case 3 as worked out already in Sect. 6.2
Note Since np ðx  yÞ is not necessarily equal to np ðxÞ  np ðyÞ, the paired sample
sign test is a test for the quantile difference (but not for the difference of the
quantiles), whereas the paired ‘t’ test is a test for the mean difference (and also for
the difference of the means).

6.3.2 Wilcoxon Signed-Rank Test

This is another test used on matched pairs. It is more powerful than the sign test
because it gives more weight to large numerical differences between the members
of a pair than to small differences. Under matched-paired samples, the differences
d within n paired sample values ðx1i ; x2i Þ for i ¼ 1; 2; . . .; n are assumed to have
come from continuous and symmetric population differences. If Md is the median of
the population of differences, then the null hypotheses is that Md ¼ 0 and the
alternative hypothesis is one of Md [ 0; Md \0 or Md 6¼ 0:
The observed differences di ¼ x1i  x2i are ranked in increasing order of abso-
lute magnitude and the sum of ranks is computed for all the differences of like sign.
The test statistic T is the smaller of these two rank-sums. Paris with di ¼ 0 are not
counted. On the null hypothesis, the expected value of the two ranks-sums would be
equal. If the positive rank-sum is the smaller and is equal to or less than the table
value, the null hypothesis will be rejected at the corresponding level of significance
a in favour of the alternative hypothesis that Md [ 0. If the negative rank-sum is the
smaller, the alternative will be that Md \0. If a two-tailed test is required, the
alternative being that Md 6¼ 0, the given levels of significance should be doubled.
Example 6.7 For nine animals, tested under control conditions and experimental
conditions, the following values of a measured variable were observed:
Animal 1 2 3 4 5 6 7 8 9
Control (x1) 21 24 26 32 55 82 46 55 88
Experimental (x2) 18 9 23 26 82 199 42 30 62
158 6 Non-parametric Test

Test whether a significant difference exists between the medians, using (i) the
sign test and (ii) the Wilcoxon signed-ranks test.
Solution Let h be the median of the distribution of differences. Our null hypothesis
will be H0 : h ¼ 0 against H1 : h 6¼ 0.
(i) Let di ¼ x1i  x2i be the difference of the values under control and experi-
mental conditions.

di : 3; 15; 3; 6; 27; 117; 4; 25; 26

Here we have 7 ‘+’ signs among 9 non-zero values. Under Ho , number(r) of ‘+’
signs will follow a binomial distribution with parameters n = 9 and p = 0.5. To test
H0 : h ¼ 0  H0 : p ¼ 0:5 against H1 : h 6¼ 0  H1 : p 6¼ 0:5; the critical region x
0 0
will be given by r  ra=2 and r  ra=2 where ra=2 is the smallest integer and ra=2 is
the largest integer such that.

   X9   9
9 1 a
P r  ra=2 H0 ¼  ¼ 0:025
x¼ra=2 x 2 2
ra 1   9
X
2
9 1
i:e:;  0:975
x¼0 x 2
h  i X ra=2   90

0  9 1 a
and P r  ra=2 H0 ¼  ¼ 0:025
x¼0 x 2 2

From the table we get ra=2  1 ¼ 7 ) ra=2 ¼ 8 and r 0 a=2 ¼ 1: For our example
r = 7 which lies between ra=2 ð¼8Þ and r 0 a=2 ð¼1Þ: So H0 is accepted.
(ii) The observed differences di ¼ x1i  x2i are ranked in increasing order of
absolute magnitude and the sum of the ranks is computed for all the difference of
like sign. Thus

di 3 15 3 6 −27 −117 4 25 26
Rank 1.5 5 1.5 4 8 9 3 6 7

The test statistic T is the smaller of these two rank-sums (one for positive di and
one for negative di ). Here T = 17. From the table, we reject H0 at a ¼ 0:05 if either
T > 39 or T < 6. Since T > 6 and < 39, we accept H0 .
6.4 Two-Sample Problem 159

6.4 Two-Sample Problem

Case 1 The two populations differ in location only:


We take two univariate populations with continuous distribution functions F 1 ðxÞ
and F 2 ðxÞ which are unknown but continuous.
Assumption The two populations differ only in location.
To test H 0 : F 1 ðxÞ ¼ F 2 ðxÞ against H 1 : F 2 ðxÞ is located to the right of F 1 ðxÞ ,
H 0 : F 1 ðxÞ ¼ F 2 ðxÞ against H 1 : F 1 ðxÞ  F 2 ðxÞ:
We draw a random sample ðx1 ; x2 ; . . .; xn1 Þ of size n1 from the first population
and another sample ðxn1 þ 1 ; xn1 þ 2 ; . . .; xn1 þ n2 Þ of size n2 from the second popula-
tion. We write, F 1 ðxÞ ¼ F 2 ðxÞ and F 2 ðxÞ ¼ Fðx  dÞ, d is unknown location
parameter. So we are to test H 0 : d ¼ 0 against H 1 : d [ 0.
A. Wilcoxon–Mann Whitney Rank-Sum Test
We pooled the two samples and give them ranks. Suppose ðR1 ; R2 ; . . .; Rn1 Þ and
ðRn1 þ 1 ; Rn1 þ 2 ; . . .; Rn1 þ n2 Þ be the ranks of the 1st and 2nd sample observations
respectively.
[Example (10,7,9,11,3), n1 ¼ 5 is sample 1 and (20,5,17,8), n2 ¼ 4 is the sample 2.

3 \ 5 \ 7 \ 8 \ 9 \ 10 \ 11 \ 17 \ 20
# # # # # # # # #
Ranks 1 2 3 4 5 6 7 8 9

) ðR1 ¼ 6; R2 ¼ 3; R3 ¼ 5; R4 ¼ 7; R5 ¼ 1Þ are the 1st sample ranks and ðR6 ¼


9; R7 ¼ 2; R8 ¼ 8; R9 ¼ 4Þ are the 2nd sample ranks.]
If there is any tie then the corresponding observation is ignored. Let
S1 ; S2 ; . . .; Sn2 be the ordered ranks of the 2nd sample observations, i.e.
S1 \ S1 \ . . .\ Sn2 .
[In the example above 2 < 4 < 8 < 9 ) R7 ¼ S1 ; R9 ¼ S2 ; R8 ¼ S3 ; R6 ¼ S4 ]
P
n2 P
n2
Define T = sum of the ranks of the 2nd sample observations ¼ Rn1 þ j ¼ Sj
j¼1 j¼1
If H 1 is true, then it is expected that the second sample observations are gen-
erally of higher ranks and hence T will be large. So a right tail test will be
appropriate here.
Hence for testing H 0 : d ¼ 0 against H 1 : d [ 0, x0 : T [ ta where ta is such
that Pr½T [ ta =H 0   a. Similarly for H 0 : d ¼ 0 against H 2 : d\0, x0 : T\ta 0
where ta 0 is such that Pr½T\ta 0 =H 0   a, and for H 0 : d ¼ 0 against H 3 : d 6¼
0; x0 : T\t1 ; T [ t2 where t1 and t2 are such that

P½T\t1 =H 0  þ P½T [ t2 =H 0   a:

Null distribution of T: Under H 0 all the nð¼ n1 þ n2 Þ, observations


x1 ; x2 ; . . .; xn1 ; xn1 þ 1 ; xn1 þ 2 ; . . .; xn1 þ n2 are i.i.d. so that the second sample ranks can
be considered as a random sample of size n2 without replacement from (1, 2,…,n).
160 6 Non-parametric Test

) l = population mean ¼ n þ2 1 and r2 ¼ Variance ¼ n 121.


2


T nþ1 n2 ðn þ 1Þ
)E =H 0 ¼ l ¼ ) EðT=H 0 Þ ¼
n2 2 2

T n  n2 r 2
n1 n  1 n1 ðn þ 1Þ
2
V =H 0 ¼  ¼  ¼
n2 n  1 n2 n  1 12  n2 12  n2
n1 n2 ðn þ 1Þ
) VðT=H 0 Þ¼ :
12

Hence, if n is large, under H 0


n2 ðn þ 1Þ
s ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
T2
asymptotically  Nð0; 1Þ
n1 n2 ðn þ 1Þ=12
) For H 0 : d ¼ 0 against H 1 : d [ 0 ) x0 : s [ sa
H 0 : d ¼ 0 against H 2 : d\0 ) x0 : s\  sa
and H 0 : d ¼ 0 against H 3 : d 6¼ 0 ) x0 : jsj [ sa=2
Mann–Whitney
An alternative 8description of the test is more convenient.
< 1 if xn1 þ j [ xi
Let gðxi ; xn1 þ j Þ ¼ 0 otherwise i ¼ 1ð1Þn1
:
j ¼ 1ð1Þn2
U = no. of pairs in which 2nd sample observation is greater than 1st sample
observation

X
n2 X
n1
¼ gðxi ; xn1 þ j Þ
j¼1 i¼1

P
n2 P
n1
¼ gðRi ; Rn1 þ j Þ; [no. of pairs in which 2nd sample ranks are greater than
j¼1 i¼1
1st sample ranks]
n2 X
X n1
¼ gðRi ; Sj Þ
j¼1 i¼1


n n
P
n2 P1 P1
¼ gðRi ; Sj Þ , ½ gðRi ; Sj Þ = no. of 1st sample ranks which are less than Sj ]
j¼1 i¼1 i¼1

X
n2
Xn2
n2 ðn2 þ 1Þ
¼ ðSj  1Þ  ðj  1Þ ¼ ðSj  jÞ ¼ T
j¼1 1
2

n2 ðn2 þ 1Þ n1 n2
) EðU=H 0 Þ = E(T/H0 Þ ¼ :
2 2
6.4 Two-Sample Problem 161

n1 n2 ðn þ 1Þ
VðU=H0 Þ ¼ VðT=H0 Þ ¼
12

Hence, for large n, under H0


U  n12n2 a
s ¼ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  Nð0; 1Þ
n1 n2 ðn þ 1Þ
12

Therefore
(1) For H0 : d¼ 0 against H1 : d [ 0; x0 : s [ sa
(2) For H0 : d ¼ 0 against H2 : d\0; x0 : s\  sa
(3) For H0 : d ¼ 0 against H3 : d 6¼ 0; x0 : jsj [ sa=2
B. Mood’s Median Test
Here we test H0 : F1 ðxÞ ¼ F2 ðxÞ against H1 : F1 ðxÞ  F2 ðxÞ, i.e. H0 : d ¼ 0 against
H1 : d [ 0.
We draw a sample ðx1 ; x2 ; . . .; xn1 Þ of size n1 from the 1st population and another
sample ðxn1 þ 1 ; xn1 þ 2 ; . . .; xn1 þ n2 Þ of size n2 from the 2nd population.
We mix the two samples and arrange them in ascending order of magnitude. Say
xð1Þ \ xð2Þ \    \ xðnÞ & xðmÞ ¼ combined sample median.
Define T ¼ total no: of 2nd sample size [ xðmÞ
¼ total no: of 2nd sample ranks [ m
Here T is the test statistic.
Under H1 , T would be too large and hence a right tail test is appropriate.
So for H1 : d [ 0 ) x0 : T [ ta where, ta is such that PH0 ½T  ta   a
for H2 : d\0 ) x0 : T\ta 0 where PH0 ½T  ta   a and
for H3 : d 6¼ 0 ) x0 : T  t1 and T  t2 where t1 , t2 are such that
PH0 ½T  t1  þ PH0 ½T  t2   a.
Null distribution of T: We want to get PðT ¼ t=H0 Þ.
Note that the totality of the pooled ranks (1, 2,.., n) is comprised of two subsets:
f1; 2; . . .; mg and fm þ 1; m þ 2; . . .; ng. Under H0 , the second sample ranks
represent a random sample without replacement of size n2 from the entire set. Since
T = no. of 2nd sample ranks exceeding m, the probability that there will be just
t number of members from 2nd subset in the random sample of size n2 is given by
the hypergeometric law:
 
nm m
t n t
) PðT ¼ t=H0 Þ ¼  2
n
n2
n2 ðn  mÞ n1 n2 mðn  mÞ
) EðT=H0 Þ ¼ and VðT=H0 Þ ¼ :
n n2 ðn  1Þ
162 6 Non-parametric Test

As n ! 1; mn ’ 12 and then EðT=H0 Þ ’ n22 and VðT=H0 Þ ’ n4n


1 n2
.
) For large n, under H0

T  n2=2
s ¼ pnffiffiffiffiffiffi
ffi a N ð0; 1Þ
1 n2 
4n

) for H1 : d [ 0 ) x0 : s [ sa
for H1 : d\0 ) x0 : s\  sa
and for H3 : d 6¼ 0 ) x0 : jsj [ sa=2 :

Case II The two populations differ in every respect, i.e. with respect to location,
dispersion, skewness, kurtosis, etc.
C. Wald–Wolfowitz Run test
H0 : F1 ðxÞ ¼ F2 ðxÞ against H1 : F1 ðxÞ 6¼ F2 ðxÞ
Here also we arrange the combined sample in ascending order xð1Þ \
xð2Þ \ . . .\ xðnÞ .
Suppose ðR1 ; . . .; Rn1 Þ be the ranks of the 1st sample observation and
ðRn1 þ 1 ; . . .; Rn1 þ n2 Þ be the ranks of the 2nd sample observation. According to the
ordered arrangement,
we write za ¼ 0 if xðaÞ comes from 1st sample
= 1 if xðaÞ comes from 2nd sample.

We note that, 1st sample can be written as xðR1 Þ ; xðR2 Þ ; . . .; xðRn1 Þ and the 2nd
n o
sample can be written as xðRn1 þ 1 Þ ; xðRn1 þ 2 Þ ; . . .; xðRn1 þ n2 Þ .

) za ¼ 0 if a 2 ðR1 ; R2 ; . . .Rn1 Þ
¼ 1 if a 2 ðRn1 þ 1 ; Rn2 þ 2 ; . . .Rn1 þ n2 Þ:

So z1 ; z2 ; . . .; zn is a sequence of 0’s and 1’s and are determined by


ðR1 ; R2 ; . . .; Rn Þ. Let U = number of ‘0’ runs and V = number of ‘1’ runs and
W = U + V = total number of runs.
Here W is our test statistic.
The idea is that if the populations are identical, then the 1st sample and 2nd
sample ranks would get thoroughly mixed up, i.e. the runs of ‘0’ and ‘1’ would be
mixed up thoroughly, i.e. W would be too large. On the other hand, if the two
populations are not identical, i.e. if H0 is not true, then the arrangement of runs
will be patching. So x would be too small. Hence a left tail test would be
appropriate.
6.4 Two-Sample Problem 163

Hence x0 : W  xa where xa is such that PH0 ½W  xa   a. It can be shown


that under H0
8
> 0 if ju  vj  2
>
> n1 1 n2 1
> ðu1
>
>
>
>  Þðv1 Þ if ju  vj ¼ 1
>
< n
Pr½U ¼ u; V ¼ v ¼ n1
>
>
> 2ðu1 Þðv1
1 n2 1
>  Þ if u  v ¼ 0
n 1
>
>
>
> n
>
:
n1
 n2 1 
2 n1 1 m1
) PH0 ½W ¼ 2m ¼ PH0 fu ¼ m; v ¼ mg ¼ m1  and
n
n1

PH0 ½W ¼ 2m þ 1 ¼PH0 fu ¼ m; v ¼ m þ 1g þ PH0 fu ¼ m þ 1; v ¼ mg


   
n1  1 n2  1 n1  1 n2  1
þ
m1 m m m1
¼ 
n
n1

It can be shown that EðW=H0 Þ ¼ 2nn1 n2 þ 1;



2n1 n2 2n1 n2
VðW=H0 Þ ¼ 1 :
nðn  1Þ n

For large n1 and n2 , under H0


W  EH0 ðWÞ a
s ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi  Nð0; 1Þ: ð6:1Þ
VH0 ðWÞ

(Note: Since U and V are not independent, so the traditional CLT for W = U + V is
not applicable here. Still (6.1) is true here as shown by Wald and Wolfowitz using
Strilings’ approximation). We write k1 ¼ nn1 and k2 ¼ nn2 ) k1 þ k2 ¼ 1

) EðW=H0 Þ ¼ 2nk1 k2 þ 1 ’ 2nk1 k2 and VðW=H0 Þ ’ 4nk21 k22 :

1 k2ffi
Then s ¼ p ffiffiffiffiffiffiffiffiffiffiffiffi
W2nk
a N ð0; 1Þ
4nk1 k2 
2 2

x0 : s   sa :
D. Kolmogorov–Smirnov test
Let X1 ; X2 ; . . .; Xn1 be from F1 and Xn1 þ 1 ; Xn1 þ 2 ; . . .; Xn be from F2 . We are to test
H0 : F1 ðxÞ ¼ F2 ðxÞ8x against
164 6 Non-parametric Test

H1 : F1 ðxÞ  F2 ðxÞ8x; F1 ðxÞ [ F2 ðxÞ for some x


Or, H2 : F1 ðxÞ  F2 ðxÞ8x; F1 ðxÞ\F2 ðxÞ for some x
Or, H3 : F1 ðxÞ 6¼ F2 ðxÞ8x; for some x.
Let ‘#’ symbol implies the number of cases satisfying a stated condition.

#xa  x; a ¼ 1; 2; . . .; n1
F1n1 ðxÞ ¼
n1
#xb  x; b ¼ n1 þ 1; n1 þ 2; ::; n2
F2n2 ðxÞ ¼
n2

Test statistic

n1 ;n2 ¼ SupfF1n1 ðxÞ  F2n2 ðxÞg for H1
x
D
n1 ;n2 ¼ SupfF2n2 ðxÞ  F1n1 ðxÞg for H2
x
Dn1 ;n2 ¼ Sup jF1n1 ðxÞ  F2n2 ðxÞj
x
n o
¼ max Dþ n1 ;n2 ; D
n1 ;n2 for H3

Let 2nd sample ranks be Rn1 þ 1 ; . . .; Rn and ordered ranks be S1 \S2 \::\Sn2 .
Similarly for 1st sample ranks are R1 ; R2 ; . . .; Rn1 and ordered ranks are S01 \S02 ::\S0n1 .
Then Dþn1 ;n2 ¼ SupfF1n1 ðxÞ  F2n2 ðxÞg ¼ max Sup fF1n1 ðxÞ  F2n2 ðxÞg
x i¼0;1;::;n1 Xs0  x\S0 þ 1
i i



i S0i  i
¼ max max  ;0 :
i¼1;::;n1 n1 n2

 
Sj j
Similarly, D
n1 ;n2 ¼ max 0; max j
n2  n1 :
j¼1;::;n2

n o
Dn1 ;n2 ¼ max Dnþ1 ;n2 ; D
n1 ;n2 :

Under H0 , Dnis uniform and Dþ ; D and Doare distribution free. [Under H0,
distribution of ðs1 ; s2 ; . . .sn2 Þ; ðs01 ; s02 ; s03 ; . . .s0n1 Þ is independent of (F1 = F2)].
Critical region: under H0, we expect that D+, D− and D are very small. Hence right
tailed test based on D’s would be appropriate.
Asymptotic distribution
qffiffiffiffiffiffiffiffiffiffi 
þ
! 1  e2z as min ðn1 ; n2 Þ! 1; z [ 0
2
For one-sided test PH0 n1 n2
D
n1 þ n2 n1 ;n2  z
qffiffiffiffiffiffiffiffiffiffi
Practically we find a z such that e2z ¼ a and reject H0 if n1n1þn2n2 (observed
2


n1 ;n2 )  z.
6.4 Two-Sample Problem 165
qffiffiffiffiffiffiffiffiffiffi 
P
1
ð1Þi1 e2i
2 2
For two sided test PH0 n1 n2
n1 þ n2 Dn1 ;n2 z ! 1  2 z
as min
i¼1
ðn1 ; n2 Þ ! 1:
Advantages of K–S test over Homogeneity v2 test are as follows
1. K–S test is applicable to ungrouped data, while v2 is applicable to grouped data
only.
2. Under H0 K–S is exactly distribution free, while v2 is asymptotically distribu-
tion free.
3. K–S test is consistent against any alternative, while v2 is so for specific alter-
native only.
Example 6.8 Twelve 4-year-old boys and twelve 4-year-old girls were observed
during two 15 min play sessions and each child’s play during these two periods was
scored as follows for incidence and degree of aggression:
Boys : 86; 69; 72; 65; 113; 65; 118; 45; 141; 104; 41; 50
Girls : 55; 40; 22; 58; 16; 7; 9; 16; 26; 36; 20; 15

Test the hypothesis that there were sex differences in the amount of aggression
shown, using (a) the Wald-Wolfowitz runs test, (b) the Mann–Whitney–Wilcoxon
test and (c) the Kolmogorov–Smirnov test.
Solution We want to test H0 : incidence and degree of aggression are the same in
four-year olds of both sexes against H1 : four-year-old boys and four-year-old girls
display differences in incidence and degree of aggression.

(a) Wald–Wolfowitz runs test


We combine the scores of boys (B’s) and girls (G’s) in a single-ordered series, we
may determine the number of runs of G’s and B’s. The ordered series is given below.

Score 7 9 15 16 16 20 22 26 36 40 41 45 50 55 58
Groups G G G G G G G G G G B B B G G
Runs _________________1___________________________ ____2_____ __3___
Score 65 65 69 72 86 104 113 118 141
Groups B B B B B B B B B
Runs __________________4______________________

Each run is underlined and we observe that r = 4.


From the table for n1 = 12, n2 = 12, we reject H0 at a ¼ 0:05 if r  7. Since our
value of r is smaller than 7, we may reject H0 . So we can conclude that boys and
girls display differences in aggression.
166

7 9 15 16 16 20 22 26 36 40 41 45 50 55 58 65 65 69 72 86 104 113 118 141


G G G G G G G G G G B B B G G B B B B B B B B B
Rank 1 2 3 4.5 4.5 6 7 8 9 10 11 12 13 14 15 16.5 16.5 18 19 20 21 22 23 24
6 Non-parametric Test
6.4 Two-Sample Problem 167

(b) Mann–Whitney–Wilcoxon test


The pooled sample and the ranks are given below:
The sum of the ranks for the observations corresponding to the boys is

R1 ¼ 11 þ 12 þ 13 þ 16:5 þ 16:5 þ 18 þ 19 þ 20 þ 21 þ 22 þ 23 þ 24 ¼ 216

and that for girls is

R2 ¼ 1 þ 2 þ 3 þ 4:5 þ 4:5 þ 6 þ 7 þ 8 þ 9 þ 10 þ 14 þ 15 ¼ 84

The smaller rank-sum is 84. This corresponds to girls.


Hence

n2 ðn2 þ 1Þ
U ¼ n1 n2 þ  R2
2
¼ 144 þ 78  84 ¼ 138

Or, equivalently,

n1 ðn1 þ 1Þ
U ¼ n1 n2 þ  R1
2
¼ 144 þ 78  216 ¼ 6

The test statistic is given by the smaller of the two quantities. Here U = 6. The
other value of U can be obtained from the relation U 0 = n1 n2 − U = 144 – 6 = 138.
The critical value of U for a two-tailed test at a = 0.05 and n1 ¼ n2 = 12 is 37. The
observed U = 6 is less than the table value. Hence it is significant at 5 % level.
Hence H0 is rejected.
(c) Kolmogorov–Smirnov test
The scores of the boys and girls are presented in two frequency distributions shown
below:

Score (x) No. of boys No. of girls F12 ðxÞ G12 ðxÞ jF12 ðxÞ  G12 ðxÞj
7–20 0 6 0 6/12 6/12
21–34 0 2 0 8/12 8/12
35–48 2 2 2/12 10/12 8/12
49–62 1 2 3/12 12/12 9/12
63–76 4 0 7/12 12/12 5/12
77–90 1 0 8/12 12/12 4/12
91–104 1 0 9/12 12/12 3/12
105–118 2 0 11/12 12/12 1/12
119–132 0 0 11/12 12/12 1/12
133–146 1 0 12/12 12/12 0
168 6 Non-parametric Test

D12;12 ¼ SupjF12 ðxÞ  G12 ðxÞj ¼ 9=12. From the table, the critical value for
n1 ¼ n2 ¼ 12 at level a ¼ 0:05 is D12;12;05 ¼ 6=12. Since D12;12 [ D12;12;0:5 , we
reject H0 .

6.5 Non-parametric Tolerance Limits

We draw a random sample ðX1 ; X2 ; . . .; Xn Þ from a distribution with distribution


function F(x) which is continuous. We define functions of sample observations
L ¼ Lðx1 ; x2 ; . . .; xn Þ and U ¼ Uðx1 ; x2 ; . . .; xn Þ such that L < U.

If Pr½PrðL  X  UÞ  b ¼ c ð6:2Þ

then the interval (L,U) is called 100 b% tolerance interval with tolerance coefficient
c. L and U are called lower and upper tolerance limits respectively. If the deter-
mination of c does not depend upon F then the limit (L,U) are called non-parametric
(distribution free) tolerance limits. We note that, (6.2) can be written as,

PrfFðUÞ  FðLÞ  bg ¼ c ð6:3Þ

that is a tolerance interval (L,U) for a continuous distribution having c.d.f. F


(x) with tolerance coefficient c is a random interval such that the probability is c that
the area between the endpoints of the interval (L,U) is at least a certain pre-assigned
quantity ‘b’.
If L and U are two-order statistics say xðrÞ and xðsÞ , (r < s), then (6.3) is

equivalent to Pr FðxðsÞ Þ  FðxðrÞ Þ  b ¼ c.
Wilks has shown that the order statistics provide non-parametric tolerance limits,
while it is Robbins who has shown that it is only the order statistics which provide
distribution free tolerance limits.
Determination of Tolerance Limits
Joint distribution of xðrÞ ; xðsÞ is

n!  r1
g xðrÞ ; xðsÞ ¼ FðxðrÞ Þ
ðr  1Þ!ðs  r  1Þ!ðn  sÞ!
 sr1  ns
FðxðsÞ Þ  FðxðrÞ Þ 1  FðxðsÞ Þ f ðxðrÞ Þf ðxðsÞ Þ; xðrÞ \xðsÞ

Putting U ¼ FðxðrÞ Þ and V ¼ FðxðsÞ Þ we get,

n!
gðu; vÞ ¼ ur1 ðv  uÞsr1 ð1  vÞns ; 0\u\v\1:
ðr  1Þ!ðs  r  1Þ!ðn  sÞ!
6.5 Non-parametric Tolerance Limits 169

U¼W U¼W 0\y\1


Again we put )
V U ¼Y V ¼ W þ Y; 0\W\1  y:

n!
) gðw; yÞ ¼ wr1 ysr1 ð1  w  yÞns
ðr  1Þ!ðs  r  1Þ!ðn  sÞ!
Z1y
n!
) gðyÞ ¼ ysr1 wr1 ð1  w  yÞns dw
ðr  1Þ!ðs  r  1Þ!ðn  sÞ!
0
Z1
n!
¼ ysr1 ð1  yÞr1 tr1 ð1  yÞns ð1  tÞns ð1  yÞdt
ðr  1Þ!ðs  r  1Þ!ðn  sÞ!
0
Z1
n!
¼ ysr1 ð1  yÞn þ rs tr1 ð1  tÞns dt
ðr  1Þ!ðs  r  1Þ!ðn  sÞ!
0
Cðn þ 1Þ
¼ ysr1 ð1  yÞn þ rs
Cðs  rÞCðn þ r  s þ 1Þ
1
¼ ysr1 ð1  yÞn þ rs ; 0\y\1:
bðs  r; n þ r  s þ 1Þ
 
) Pr FðxðsÞ Þ  FðxðrÞ Þ  b ¼ c
, Pr½y  b ¼ c , Pr½y  b ¼ 1  c

Zb
i:e: gðyÞdy ¼ 1  c
0
Rb
ysr1 ð1  yÞn þ rs dy
0
i:e: ¼1c
bðs  r; n þ r  s þ 1Þ

i:e: Ib ðs  r; n þ r  s þ 1Þ ¼ 1  c ð6:4Þ

For given b; c and n we choose r and s satisfying (6.4) such that r + s = n + 1


that is xðrÞ and xðsÞ are symmetrically placed.
Particular case: r = 1, s = n; Then (6.4) ) Ib ðn  1; 2Þ ¼ 1  c
Rb
tn2 ð1  tÞdt
0
i:e: 1  c ¼
bððn  1Þ; 2Þ
 n1
b bn Cðn  1ÞCð2Þ
i:e: 1  c ¼  =
n1 n Cðn þ 1Þ
nbn1  ðn  1Þbn
¼ nðn  1Þ
nðn  1Þ
) 1  c ¼ nbn1 ð1  bÞ þ bn
170 6 Non-parametric Test

that is 1  c ’ nbn1 ð1  bÞ as 0\b\1 and n ! 1. So for large ‘n’,


1  c ’ nbn1 ð1  bÞ:
For given b and c, one can find n from this relationship.
Alternative
For Bin(n,p), we know
!
X
c n
px qnx ¼I q ðn  c; c þ 1Þ
x¼0 x
¼1  I p ðc þ 1; n  cÞ

Then ð6:4Þ ) c ¼1  I b ðs  r; n þ r  s þ 1Þ
!
X n
sr1
¼ bx ð1  bÞnx :
x¼0 x

So for given n, b and c we can find s and r such that xðrÞ and xðsÞ are sym-
metrically placed.

6.6 Non-parametric Confidence Interval for nP

Suppose F(x) is continuous and arandom sample ðx1 ; x2 ; . . .; xn Þ is drawn from it.
np is the p-th order quantile. So P X  np ¼ p. Define X ðrÞ and X ðsÞ as the rth and
 
sth order statistics, r < s. Then X ðrÞ ; X ðsÞ is said to be 100(1 − α)% confidence
interval for np if

 
Pr X ðrÞ  np  X ðsÞ ¼ 1  a ð6:5Þ

     
Now; Pr X ðrÞ  np  X ðsÞ ¼ Pr np  X ðsÞ  Pr np  X ðrÞ
   
¼ Pr X ðsÞ  np  Pr X ðrÞ  np
   
¼ 1  Pr X ðsÞ \np  1 þ Pr X ðrÞ \np
   
¼ Pr X ðrÞ \np  Pr X ðsÞ \np
 
¼ Pr at least r of the observations \np
 
 Pr at least s of the observations \np
6.6 Non-parametric Confidence Interval for nP 171

!
X
s1
n nx
¼ px ð1  pÞ ð6:6Þ
x¼r x
! !
X
s1 n nx X
r1 n nx
¼ p ð1  pÞ
x
 px ð1  pÞ
x¼0 x x¼0 x
¼ 1  I p ðs; n  s þ 1Þ  1 þ I p ðr; n  r þ 1Þ
¼ I p ðr; n  r þ 1Þ  I p ðs; n  s þ 1Þ
 
Since, Pr X ðrÞ  np  X ðsÞ ¼ 1  a, so r and s are such that

1  a ¼ I p ðr; n  r þ 1Þ  I p ðs; n  s þ 1Þ ð6:7Þ

Given a and n, the selection of r and s satisfying (6.7) is not unique. We select
that pair of r and s for which (s − r) is minimum.
For symmetrically placed order statistics xðrÞ and xðsÞ , we select that pair of (r,
s) such that r + s = n + 1 ⇒ s − 1 = n − r.
!
X
nr
n x nx
) From ð6:7Þ 1  a ¼ p ð1  pÞ :
r x

From this relation one can find r and hence s = n + 1 − r.


Note If in (6.7) the exact probability ð1  aÞ is not attained then we choose that
pair of r and s such that
 
Pr X ðrÞ  np  X ðsÞ  1  a i:e: I p ðr; n  r þ 1Þ  I p ðs; n  s þ 1Þ  1  a:

Non-parametric confidence interval for n1=2 (=median) using sign test


The sign test technique can be applied to obtain a class interval estimate for the
unknown population median n1=2 . Suppose X ð1Þ ; X ð2Þ ; . . .; X ðnÞ be the order statis-
tics. We consider the testing problem H 0 : n1=2 ¼ n0 against H 1 : n1=2 6¼ n0 :
 
Define; S ¼ total no: of þ ve signs among XðiÞ  n0 8i ¼ 1ð1Þn

The ordinary sign test is


8
>
> 1 if s\s1
>
>
< a1 if s ¼ s1
/ðsÞ ¼ 0 if s1 \s\s2
>
>
> a2
> if s ¼ s2
:
1 if s [ s2
172 6 Non-parametric Test

where s1 and s2 are such that


9
a
Pr ½s\s1 =H0 \  Pr ½s  s1 =H0  >
=
2 ð6:8Þ
a
Pr ½s [ s2 =H0 \  Pr ½s  s2 =H0  > ;
2
a a
Pr½s\s1 =H0  Pr½s [ s2 =H0 
Also a1 and a2 are such that a1 ¼ 2 P½s¼s1 =H0  and a2 ¼ 2 P½s¼s2 =H0 
We accept H0 if s1 \s\s2 and so

Pr½s1 \s\s2  ¼ 1  a

i:e:; Pr½s1 þ 1  s  s2  1 ¼ 1  a ð6:9Þ

In order to obtain a confidence interval for n1=2 we need only to translate the
inequality in the LHS of (6.9) to an equivalent statement involving
 the order
statistics and n1=2 . We have seen earlier that 1  a ¼ Pr XðrÞ  np  XðsÞ
!
sP
1 n x nx
¼ p ð1pÞ .
x¼r x
Now, for
!
1   X s1 n 1 n
p ¼ ; 1  a ¼ Pr XðrÞ  n1=2  XðsÞ ¼
2 x¼r x 2

1
¼ Pr½r  S  s  1 as S  B n; under H0 :
2
 
) Pr XðrÞ  n1=2  XðsÞ ¼ Pr½r  S  s  1 ¼ 1  a ð6:10Þ

Comparing (6.9) and (6.10), we can write


h i
Pr Xðs1 þ 1Þ  n12  Xðs2 Þ ¼ 1  a
   
) 100ð1  aÞ% C.I. for n1=2 using sign test is Xðs1 þ 1Þ ; Xðs2 Þ ¼ Xðs1 þ 1Þ ; Xðns1 Þ

fsince S is symmetric about n=2; n2  s1 ¼ s2  n=2
For large samples, (6.9) is equivalent to
6.6 Non-parametric Confidence Interval for nP 173

" #
s1 þ 1  n=2 S  n2 s2  1  n2
Pr pffiffiffiffiffi  pffiffiffiffiffi  pffiffiffiffiffi ¼1a
n=4 n=4 n=4
" #
s1 þ 1  n=2  0:5 s2  1  n2 þ 0:5
or, Pr pffiffiffiffiffi s pffiffiffiffiffi ¼1a
n=4 n=4

s1 þ 1  n=2  0:5 s2  1  n2 þ 0:5


) pffiffiffiffiffi ¼ sa=2 and pffiffiffiffiffi ¼ sa=2
n=4 n=4

pffiffiffiffiffi pffiffiffiffiffi
i:e: s1 ¼ n=2  0:5  n=4s
a=2 & s2 ¼ =2 þ 0:5 þ
n n=4s
a=2 ð6:11Þ

So, 100ð1  aÞ% C.I. for n1=2 using sign test is


   
Xðs1 þ 1Þ ; Xðs2 Þ ¼ Xðs1 þ 1Þ ; Xðns1 Þ where s1 and s2 are given by (6.11).

6.7 Combination of Tests

When several tests of the same hypothesis H0 are made on the basis of independent
sets of data, it is quite likely that some of the tests will dictate rejection of the
hypothesis (at the chosen level of significance) while the others will dictate its
acceptance. In such a case, one would naturally like to have a means of combining
the results of the individual tests to reach a firm, overall decision. While one may
well apply the same test to the combined set of data, what we are envisaging is a
situation where only the values of the test statistics used are available.
Let us denote by Ti the statistic used in making the ith test (say, for i = 1, 2,...,k).
Commonly T1 ; T2 ; . . .; Tk will be statistics defined in the same way (like v2
statistics or t-statistics), but with varying sampling distributions simply because
they are based on varying sample sizes. To fix ideas, let us assume that in each case
the test requires that H0 be rejected if, and only if, the observed value of the
corresponding statistic be too large. Consider, in this situation, the probabilities
yi ¼ Pr½Ti [ ti =H0 , for i = 1,2,…,k.
Provided Ti has a continuous distribution under H0 , say with probability density
R1
function gi ðtÞ, so that yi ¼ gi ðtÞdt; where ti is a randomly taken value of Ti , yi has
ti
the rectangular distribution over the interval [0,1] under H0 and hence 2 loge yi
Pk
has the v2 distribution with df = 2. Consequently Pk ¼ 2 loge yi has, under H0 ,
i¼1
the v2 distribution with 2k degrees of freedom. This statistic is used as the test
statistic for making the combined test. One would reject H0 if, and only if, the
observed value of Pk exceeds v2a;2k .
174 6 Non-parametric Test

The case where each individual test requires rejection of H0 if, and only if, the
observed value of the corresponding test statistic is too small, or the case where
each individual test requires rejection of H0 if, and only if, the observed value of the
test statistic is either too large or too small, is to be similarly dealt with. The reason
is that, if Ti have continuous distributions under H0 , then ui ¼ Pr½Ti \ti =H0  and
vi ¼ Pr½jTi j [ jti j=H0  are also rectangularly distributed over (0,1). This implies
Pk
that the statistic Pk to be appropriate to these situations, viz., Pk ¼ 2 loge ui
i¼1
P
k
and Pk ¼ 2 loge vi , are also distributed as v2 statistics with df = 2k under H0 . In
i¼1
each of these cases also, the overall decision will be to reject H0 if, and only if, the
observed value of the respective Pk exceeds v2a;2K .
Example 6.9 In order to test whether the mean height ðlÞ of a variety of paddy
plants, when fully grown, is 60 cm, or less than 60 cm, five experimenters made
independent (student’s) t-tests with their respective data. The probabilities of the t-
statistics (with the appropriate df in each case) to be less than their respective
observed values are 0.023, 0.061, 0.07, 0.105 and 0.007. If the tests are made at 5 %
level, then the hypothesis H0 : l ¼ 60 cm, has to be accepted in three cases out of
the five.
In order to combine the results of the 5 tests, we note that log yi , for i = 1, 2, 3, 4
and 5, are 2:36173; 2:78533; 2:23045; 1:02119 and 3:84510, respectively. Hence for
P5 P5
the data, loge ui ¼ 10 þ 2:24380 ¼ 7:75620, so that Pk ¼  2 loge ui ¼
i¼1 1
2:30259 15:5124 ¼ 35:719.
This is to be compared with v2:05;10 ¼ 18:307 and v2:01;10 ¼ 23:205: Since the
observed value of Pk exceeds the tabulated values, the combined result of the
experimenter’s tests leads to the rejection of H0 at both 5 % and the 1 % level. In
other words, in the light of all 5 experimenters’ data, we may conclude that the
mean height at the variety of paddy plant is less than 60 cm.

6.8 Measures of Association for Bivariate Samples

A. Spearman’s rank correlation coefficient


In many situations, the individuals are ranked by two judges or the measurements
taken for two variables are assigned ranks within the samples independently. Now it
is desired to know the extent of association between the ranks. The method of
calculating the association between ranks was given by Charles Edward Spearman
in 1906 and is known as Spearman’s rank correlation.
Let ðX1 ; Y1 Þ; ðX2 ; Y2 Þ; . . .; ðXn ; Yn Þ: be a sample from a bivariate population. If
the sample values X1 ; X2 ; . . .; Xn and Y1 ; Y2 ; . . .; Yn are each ranked from 1 to n in
6.8 Measures of Association for Bivariate Samples 175

increasing order of magnitude separately and if the X’s and Y’s have continuous
distribution functions, we get a unique set of rankings. The data will then reduce to
n pairs of ranking. Let us write

R1a ¼ Rank of Xa ; a ¼ 1; 2; . . .; n:
R2a ¼ Rank of Ya ; a ¼ 1; 2; . . .; n:

Pearsonian coefficient of correlation between the ranks R1a ’s and R2a ’s is called
the Spearman’s rank correlation coefficient rs which is given by
Pn
 1 ÞðR2a  R
ðR1a  R 2Þ
a¼1
rs ¼
nP Pn o1=2
n
a¼1
 1 Þ2
ðR1a  R  2
a¼1 ðR2a  R2 Þ
n 
P  
12 R1a  n þ2 1 R2a  n þ2 1
a¼1
¼
nðn2  1Þ

If for n individuals, Da ¼ R1a  R2a , is the difference between ranks of the ath
individual for a ¼ 1; 2; . . .; n, the formula for Spearman’s rank correlation is

P
n
6 D2a
i¼1
rs ¼ 1  :
nðn2  1Þ

The value of rs lies between −1 and +1. If X, Y are independent then Eðrs Þ ¼ 0.
Also Population Spearman’s rank correlation coefficient, i.e. qs ¼ 0 ) Eðrs Þ ¼ 0:
Kendall in 1962 derived the frequency function of rs and gave exact critical value
rs . But the approximate test of rs which is the same as t-test for Pearsonian cor-
relation coefficient is good enough for all practical purposes. Here we test H0 :
qs ¼ 0 against H1 : qs 6¼ 0. The test statistic
pffiffiffiffiffiffi
t ¼ rp
s ffiffiffiffiffiffiffi
n2ffi
has (n − 2) d.f. The decision about H0 is taken in the usual way. For
1rs2
pffiffiffiffiffiffiffiffiffiffiffi
large samples under H0, the random variable Z ¼ rs n  1 has approximately a
standard normal distribution. The approximation is good for n  10.
B. Kendall’s rank correlation coefficient
Kendall’s rank correlation coefficient τ is suitable for the paired ranks as in case of
Spearman’s rank correlation. Let ðX1 ; Y1 Þ; ðX2 ; Y2 Þ; . . .; ðXn ; Yn Þ be a sample from a
bivariate population.  
For any two pairs ðXi ; Yi Þ and Xj ; Yj we say that the relation is perfect con-
cordance if
176 6 Non-parametric Test

Xi \Xj whenever Yi \Yj or Xi [ Xj whenever Yi [ Yj and that the relation is


perfect discordance if Xi [ Xj whenever Yi \Yj or Xi \Xj whenever Yi [ Yj .
Let pc and pd be the probability of perfect concordance and of perfect discor-
dance respectively defined by
 
pc ¼P ðXj  Xi ÞðYj  Yi Þ [ 0
 
and pd ¼P ðXj  Xi ÞðYj  Yi Þ\0 :

The measure of association between the random variables X and Y defined by

s ¼ pc  pd

is known as Kendall’s tau ðsÞ


It is noted that
s ¼ 0 if X and Y are independent.
= +1 if X and Y be in prefect concordance.
= −1 if X and Y be in prefect discordance.
We now need to find an estimate of s from the sample.
Using sample observations, Kendall’s measure of association becomes

1 X
n X
T¼ ! sðxj  xi Þsðyj  yi Þ ð6:12Þ
n 1  i\j  n
2

where sðrÞ ¼ 1 if r [ 0
¼ 0 if r ¼ 0
¼  1 if r\0
  
Naturally E s xj  xi sðyj  yi Þ ¼ pc  pd ¼ s
The statistic T defined in (6.12) is known as Kendall’s sample tau ðsÞ coefficient.
The procedure for calculating T consists of the following steps:
Step 1: Arrange the rank of the first set (X) in ascending order and rearrange the
ranks of the second set (Y) in such a way that n pairs of rank remain the same.
Step 2: After operating Step 1, the ranks of X are in natural order. Now we are left
to determine how many pairs of ranks on the set Y are in their natural order and how
many are not. A number is said to be in natural order if it is smaller than the
succeeding number and is coded as +1 and also if it is greater than its succeeding
 it will not be taken in natural order and will be coded as −1. In this
number then
n
way all pairs of the set (Y) will be considered and assigned the values +1
2
and −1.
6.8 Measures of Association for Bivariate Samples 177

Step 3: Find the sum ‘S’ of all the coded values.


Step 4: The formula for Kendall’s rank correlation coefficient-T is

S Actual value 2S
T¼ ¼ ¼
n Maximum possible value nðn  1Þ
2

Here we test H0 : s ¼ 0 against


 H1 : s6¼ 0. Thus we reject H0 if the observed
value of jTj [ ta=2 where P jT [ ta=2 jH0 ¼ a. The values of ta are given in the
table for selected values of n and a. Values for 4  n  10 are tabulated by Kendall.
4ðn2Þ
þ 1s . If n ! 1 under
2
It can be shown that EðTÞ ¼ s and VðTÞ ¼ 9nðn1Þ
n
2
pffiffi
H0 : s ¼ 0, 2 T  Nð0; 1Þ and we can test the independence of x and y.
3 n

Remark An important difference between T and rs is that T provides an unbiased


estimate of s, whereas rs is not an unbiased estimate of qs :
Example 6.10 Following are the ranks awarded to seven debators in a competition
by two judges.
Debators A B C D E F G
Ranks by judge I (x) 3 2 1 6 7 4 5
Ranks by judge II (y) 5 6 3 7 4 2 1

Compute (i) Spearmen’s rank correlation coefficient ðrs Þ and Kendall’s sample
tau coefficient (T) and test their significance.
Solution (i) First we find di ¼ xi  yi 8i which are
d : −2 −4 −2 −1 3 2 4
P
Also, 7i¼1 di2 ¼P54
6 d2
thus rs ¼ 1  nðn2 1Þi ¼ 1  6 54
7 48 ¼ 0:036
To test H0 : qs ¼ 0 against H1 : qs 6¼ 0, the statistic
pffiffiffiffiffiffiffiffiffiffiffi pffiffiffiffiffiffiffiffiffiffiffi
rs n  2 0:036 7  2
t5 ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffi ¼ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ¼ 0:080
1  rs2 1  ð0:036Þ2

From the table, t0.025,5 = 2.571. Calculated value of |t| = 0.080 < 2.571, hence we
accept H0. It means there is a dissociation between the ranks awarded by two
judges.
178 6 Non-parametric Test

(ii) We write below ranks of x in natural order and ranks of y correspondingly

x 1 2 3 4 5 6 7
y 3 6 5 2 1 7 4

For this problem, n = 7


For S, take the rank 3 and give +1 or −1 value for all pairs with subsequent ranks
of y. 3 < 6, give a number +1; 3 < 5, again +1; 3 > 2, give a number −1 and so on.
Then choose 6 and take the pairs (6,5), (6,2), (6,1), (6,7) and (6,4) and continue the
process till we reach the last pair (7,4). Proceeding in this manner,
S = (+1 +1 –1 –1 +1 +1) + (–1 –1 –1 +1 –1) + (–1 –1 +1 –1) + (–
1 +1 +1) + (+1 +1) + (–1)
= 2 – 3 – 2 + 1 + 2 – 1 = –1
Thus T ¼ S ! ¼ 1 2
7 6 ¼ 0:048
n
2
To test the significance of T, we test
H0 : s ¼ 0 against H1 : s 6¼ 0.
From the table, for n = 7 we have t0.025 = 0.62. Since |T| = 0.048 < 0.62, we
accept H0 . It reveals that there is no association between the ranks awarded by two
judges.
Example 6.11 A random sample of 12 couples showed the following distribution of
heights (in inches):
Couple no. 1 2 3 4 5 6 7 8 9 10 11 12
Husband 80 70 73 72 62 65 74 71 63 64 68 67
height
Wife height 72 60 76 62 63 46 68 71 61 65 66 67

(a) Compute rs and T.


(b) Test the hypothesis that the heights of husband and wife are independent using
rs as well as T. In each case use the normal approximation.

Solution (a) The heights of husband and wife are each ranked from 1 to 12 in
increasing order of magnitude separately and let us denote their ranks by xi and yi
respectively (i = 1,2, …, 12).

xi : 12 7 10 9 1 4 11 8 2 3 6 5
yi : 11 2 12 4 5 1 9 10 3 6 7 8
di ¼ x i  y i :1 5 2 5 4 3 2 2 1 3 1 3
X
di2 ¼ 108:
6.8 Measures of Association for Bivariate Samples 179
P
6 di26 108
Thus rs ¼ 1  ¼ 1  12
nðn2 1Þ 143 ¼ 0:6224
We write below the ranks of x in natural order and ranks of y correspondingly

xi : 1 2 3 4 5 6 7 8 9 10 11 12
yi : 5 3 6 1 8 7 2 10 4 12 9 11

n
Total number of scores = ¼ 12 2 11 ¼ 66
2
Actual score = S = 3 + 6 + 3 + 8 + 1 + 2 + 5 + 0 + 3 – 2 + 1 = 30 (procedure for
calculations of S is explained in Example 6.10 (ii))
Thus, T ¼ 3060 ¼ 0:4545
(b) To test H0 : qs ¼ 0 against H1 : qs 6¼ 0, the approximate test statistic is
pffiffiffiffiffiffiffiffiffiffiffi
Z ¼ rs n  1
pffiffiffiffiffi
¼ 0:6224 11 ¼ 2:06  Nð0; 1Þ

Since Cal |Z|, i.e., 2.06 > Z0.025 = 1.96, hence we reject H0 .
It means that the heights of husband and wife are not independent.
To test H0 : s ¼ 0 against H1 : s 6¼ 0, the approximate test statistic is

3 pffiffiffi 3 pffiffiffiffiffi
Z¼ n T¼ 12 0:4545 ¼ 2:36  Nð0; 1Þ
2 2

Since Cal |Z|, i.e., 2.36 > Z0.025 = 1.96, hence we reject H0 . Hence we can
conclude that there is an association between the heights of husband and wife.
Chapter 7
Statistical Decision Theory

7.1 Introduction

In this chapter we discuss the problems of point estimation, hypothesis testing and
interval estimation of a parameter from a different standpoint.
Before we start the discussion, let us first define certain terms commonly used in
statistical inerence problem and decision theory. Let X 1 ; X 2 ; . . .; X n denote a ran-
dom sample of size n from a distribution that has the p.d.f. f(x, θ), where θ is an
unknown state of nature or an unknown parameter and H is the set of all possible
values of θ, i.e. parameter space (known).
To make some inference about θ, i.e. to take some decisions or action about θ,
the statistician takes an action on the basis of the sample point ðx1 ; x2 ; . . .; xn Þ.
Let us define

 ¼ the set of all possible actions for statistician ðaction spaceÞ


 to choose an action a from :

So, θ = true state of nature and a = action taken by the statistician.


The value L(θ, a) is the loss incurred by taking action ‘a’ when θ is true.
Equivalently, it is a measure of the degree of undesirability of choosing an action
‘a’ when θ is true and this gives a preference pattern over ɶ for given θ, i.e. the
smaller the loss the better the action under θ. L(θ, a) is a real-valued function on Θ x
ɶ = Loss function. Thus (Θ, ɶ, L) is the basic element in our discussion.
Example 7.1 Let θ = average life length of electric bulbs produced in a factory and
H ¼ ð0; 1Þ:
Point estimation of θ
To estimate the value of θ ≡ to choose one value from ð0; 1Þ; so a ¼ ð0; 1Þ.
Observe life lengths of some randomly selected bulbs.
Define L(θ, a) = ðh  aÞ2 = squared error loss function

© Springer India 2015 181


P.K. Sahu et al., Estimation and Inferential Statistics,
DOI 10.1007/978-81-322-2514-0_7
182 7 Statistical Decision Theory

(or) = jh  aj = absolute error loss function


(or) = wðhÞðh  aÞ2 = weighted squared error loss function where
wðhÞ = a known function of θ.
Desired nature of (L, θ) graph should be a convex function with minimum at θ
and increasing in jh  aj.

L(θ,a)

Testing of hypothesis of θ
To test H 0 : h  h0 (a given value of θ) against

H 1 : h [ h0

 ¼ fa0 ; a1 g where a0 = accept H0 and a1 = accept H1. Here, simple (0–1) loss
function is as

a0 a1
h  h0 0 1
h [ h0 1 0
9
a0 a1 =
l00 \l01
or, assigned value loss function is as h  h0 l00 l01
; l11 \l10
h [ h0 l10 l11
9
a0 a1 =
w1 ðhÞ " in h0  h
or, a ð0  xÞ type loss function is as h  h0 0 x1 ðhÞ
; w2 ðhÞ " in h  h0
h [ h0 x2 ðhÞ 0

Interval estimation
Here, we are to choose one interval from ð0; 1Þ.

So;  ¼ The set of all possible intervals of ð0; 1Þ


¼ ða1 ; a2 Þ:

1 if h 62 a
Lðh; aÞ ¼ or, may be Lðh; aÞ ¼ a2  a1 ¼ length of the interval.
0 if h 2 a
7.1 Introduction 183

Let R ¼ A random experiment performed


X ¼ Random outcomes of the experiment ¼ Random variable or vector
x ¼ Observed value of X

x ¼ Sample space

The probability distribution of X depends on θ, (say)

Ph : Ph ½X 2 A or; F h ðxÞ ¼ Ph ½X  x
or; f h ðxÞ ¼ p.d.f or p.m.f of X:

The statistician observes the value x of X to take his decision. If X = x is observed


the statistician takes an action d ðxÞ 2 , d ðxÞ : x!
where

d ðxÞ ¼ A decision rule in its simplest form


¼ A non-randomized decision rule:

If d(x) = action taken; loss incurred under h ¼ Lðh; d ðxÞÞ. If d(x) = decision rule,
then loss incurred (under θ) ¼ Lðh; d ðxÞÞ (random quantity) = a real-valued random
variable. Expected loss (under θ) ¼ E h Lðh; d ðxÞÞ ¼ Rd ðhÞ = risk of d(x) under θ.

) Rd ðhÞ : h 2 H ! Risk function of d ðxÞ:

Let us restrict to rule d(x) for which Rd ðhÞ\18h and let D = the set of all such
d(x)’s. Rd ðhÞ gives a preference pattern D for given θ. The smaller the risk the better
is the decision rule d(x).
X
Thus, ðH; ; LÞ ! ðH; D; RÞ
Example 7.2 Point estimation of real h :  ¼ H
d(x): 
x ! ɶ(Θ); d(x) = point estimator of θ.
For squared error loss Rd ðhÞ ¼ E h ðd ðxÞ  hÞ2 ¼ MSE of d(x) under θ.
Example 7.3  ¼ fa0 ; a1 g; ai ¼ accept Hi , i = 0, 1,
 
d ð xÞ : 
x ! a0; a1
x0 ¼ fx : d ð xÞ ¼ a0 g ¼ acceptance region


x1 ¼ fx : d ð xÞ ¼ a1 g ¼ rejection region

) d ðxÞ ¼ a0 if x 2 
x0
a1 if x 2 
x1


x0 and 
x1 are disjoint and 
x0 þ 
x1¼ 
x
184 7 Statistical Decision Theory

For ð01Þ loss if h 2 H0 ; Rd ðhÞ ¼ Ph fd ð xÞ ¼ a1 g


¼ Ph fX 2 
x1 g
¼ Probability of first kind of error:

If h 2 H1 ; Rd ðhÞ ¼ Ph fd ð xÞ ¼ a0 g ¼ Ph fX 2 
x0 g
¼ 1  Ph fX 2  x1 g ¼ Probability of type 2 error:
Interval estimation of real θ
ɶ = set of all possible intervals of H.

d ð xÞ : 
x!

d ð xÞ ¼ ðd1 ð xÞ; d2 ð xÞÞ



1 if h 62 a
Lðh; aÞ ¼
0 if h 2 a

Then Rd ðhÞ ¼ Ph fh 62 d ð xÞg ¼ 1  Ph fh 2 d ð xÞg


If Lðh; aÞ ¼ a2  a1
then Rd ðhÞ ¼ Eh ½d2 ð xÞ  d1 ð xÞ = Expected length of d(x).
Thus, (Θ, ɶ, L) = Basic element of a statistical decision problem.
X = observable random variable; for each x, d ð xÞ 2 , i.e. d : 
x!
d(x) = a non-randomized decision rule.

Rd ðhÞ ¼ Eh ðLðh; d ð xÞÞÞ ¼ Risk of d ðxÞ

D = the set of all non-randomized decision rules (with finite risks 8h)

Randomized Decision Rules


Randomized action
Example 7.4 Let H ¼ fh1 ; h2 g; ɶ ¼ fa1 ; a2 ; a3 g and
a1 a2 a3
Loss function as h1 1 4 3
h2 4 1 3
Neither a1 nor a2 is better than a3 for every value of h. Now define an action
1
a : a ¼ a1 with probability
2
1
¼ a2 with probability
2
7.1 Introduction 185

The expected loss for a is


1 1
Lðh1 ; a Þ ¼ Lðh1 ; a1 Þ þ Lðh1 ; a2 Þ ¼ 2:5
2 2
1 1
Lðh2 ; a Þ ¼ Lðh2 ; a1 Þ þ Lðh2 ; a2 Þ ¼ 2:5
2 2

Thus a is to be preferred to a3 both under h1 and h2 . Such an a is called


randomized action.
Generally, by randomized action a we mean actually a probability distribution
over ɶ and loss due to a randomized action a is
Lðh; a Þ ¼ ELðh; zÞ where z is a random variable with probability distribution a
over ɶ.

Advantages of considering randomized actions


1. Extends the class of actions, i.e. allows more flexibility for the statistician.
2. The set of all randomized actions is convex, i.e. if a1 ; a2 are two randomized
actions, then for every 0  a  1; aa1 þ ð1  aÞa2 is also a randomized action
     
with L h; aa1 þ ð1  aÞa2 ¼ aL h; a1 þ ð1  aÞL h; a2 .
We shall consider only randomized actions a for which Lðh; a Þ is finite 8h and
shall denote by ɶ* the set of all such randomized actions.
Note Clearly ɶ  ɶ* because a non-randomized action ‘a’ ≡ A probability dis-
tribution over ɶ degenerate at the point ‘a’.

First definition of randomized decision rule


Let X = observable random variable
x = observed value of X
For each x, let dð xÞ 2 ɶ*, i.e. d : 
x ! ɶ*
d ¼ dð xÞ = a (behavioural) randomized decision rule.
Rd ðhÞ ¼ Risk of d at h ¼ Eh Lðh; dð xÞÞ
We shall consider only behavioural rules d for which Rd ðhÞ is finite 8h and shall
denote by D as the class of all such behavioural rules. Clearly, D  D.
Example 7.5 Test of hypothesis problem H0 : h 2 H0 against H1 : h 2 H1 .

 ¼ fa0 ; a1 g; ai ¼ accept H i ; i ¼ 0; 1; . . .

A typical randomized action a ¼ /


where / = probability of accepting H 1
1  / = probability of accepting H 0 ; 0  /  1.

A typical behavioural decision rule: d ¼ dðxÞ ¼ /ðxÞ


where /ðxÞ ¼ probability of accepting H 1 for X ¼ x
186 7 Statistical Decision Theory

1  /ðxÞ ¼ probability of accepting H 0 for X ¼ x

0  /ðxÞ  1. For 0–1 loss, Lðh; a Þ ¼ /  1 þ ð1  /Þ  0 ¼ / for h 2 H0

Lðh; a Þ ¼ /  0 þ ð1  /Þ  1 ¼ 1  / for h 2 H1

R/ ðhÞ ¼ Eh /ð xÞ for h 2 H0
¼ Eh ½1  /ð xÞ for h 2 H1

Second definition of randomized decision rule


Let X = observable random variable; x = observed value of X.
D = the set of all non-randomized decision rules.
d ¼ A probability distribution over D
¼ A randomized ðmixedÞ decision rule with Rd ðhÞ ¼ ERz ðhÞ ¼ Risk of d at h where
Z ¼ A random variable with probability distribution d over D:

Example 7.6  ¼ fa1 ; a2 g 


x ¼ fx 1 ; x 2 g
D ¼ fd 1 ; d 2 ; d 3 ; d 4 g

d 1 : d 1 ð x 1 Þ ¼ a1 ; d 1 ð x 2 Þ ¼ a1
d 2 : d 2 ð x 1 Þ ¼ a2 ; d 2 ð x 2 Þ ¼ a2
d 3 : d 3 ð x 1 Þ ¼ a1 ; d 3 ð x 2 Þ ¼ a2
d 4 : d 4 ð x 1 Þ ¼ a2 ; d 4 ð x 2 Þ ¼ a1

A typical mixed decision rule is d ¼ ðp1 ; p2 ; p3 ; p4 Þ


P
4
pi 0 8 i ¼ 1ð1Þ4, pi ¼ 1 where
1

pi ¼ probability of choosing non-randomized rule d i

X
4
R d ð hÞ ¼ pi Rd i ðhÞ:
i¼1

We shall consider only mixed rules d for which Rd ðhÞ is finite 8h and shall
denote by D as the class of all such mixed decision rules. Clearly, D  D since a
non-randomized rule d = a probability distribution over D degenerate at d.
First mode of randomization:
X
ðH; ; LÞ ! ðH;  ; LÞ !ðH; D; RÞ
7.1 Introduction 187

Second mode of randomization:


X
ðH; ; LÞ !ðH; D; LÞ ! ðH; D ; RÞ

Note The two modes of randomization can be considered to be equivalent in the


sense that given any d 2 D one can find a d 2 D with Rd ðhÞ ¼ Rd ðhÞ 8h and
conversely.
Example 7.7  ¼ fa1 ; a2 g,  x ¼ fx 1 ; x 2 g
D ¼ fd 1 ; d 2 ; d 3 ; d 4 g as defined earlier.
P
4
A typical d 2 D is d ¼ ðp1 ; p2 ; p3 ; p4 Þ; pi 0 for i = 1(1)4, pi ¼ 1, where
1
pi ¼ probability of choosing d i .
n . X o
D ¼ ðp1 ; p2 ; p3 ; p4 Þ pi 08i; pi ¼ 1

A typical d 2 D is d ¼ ð/1 ; /2 Þ; 0  /1 ; /2  1, where /i ¼ /ðxi Þ = probabil-


ity of taking action a1 if X ¼ xi ;

1  /i ¼ probability of taking action a2 if X ¼ xi :

D ¼ fð/1 ; /2 Þ=0  /1 ; /2  1g

If one chooses a d 2 D ,

a1 is chosen with probability p1 þ p3 for X ¼ x1
a2 is chosen with probability 1  ðp1 þ p3 Þ ¼ p2 þ p4

a1 is chosen with probability p1 þ p4 for X ¼ x2
a2 is chosen with probability 1  ðp1 þ p4 Þ ¼ p2 þ p3

Thus, d can be considered to be equivalent to a d 2 D with /1 ¼ p1 þ p3 ,


/2 ¼ p1 þ p4 .
Similarly, a d 2 D can be considered to be equivalent to a d ¼ D with
p1 þ p3 ¼ /1 , p1 þ p4 ¼ /2 .

Advantages of considering randomized rules


1. Extends the class of decision rules, i.e. allows more flexibility to the statistician
2. The set of all randomized rules is convex, i.e. if d1 ; d2 2 D (or D ) then
ad1 þ ð1  aÞd2 2 D (or D ).
For every 0  a  1 and Rad1 þ ð1aÞd2 ðhÞ ¼ aRd1 ðhÞ þ ð1  aÞRd2 ðhÞ8h.
Thus, h 2 H; a 2 ; Lðh; aÞ; ðH; ; LÞ
X = observable random variable
P ¼ fPh =h 2 Hg ¼ family of probability distribution of X
188 7 Statistical Decision Theory

d(x) = a non-randomized decision rule


D = the class of all non-randomized decision rules
dðX Þ = a behavioural or randomized decision rule
D = the class of all behavioural rules
D = the class of all randomized rules
D and D are equivalent classes.
We shall hereafter denote both D and D as D.
Note D
D
Let d 2 D, Rd ðhÞ = risk function of d; h 2 H.
Goodness of a d is measured by risk function.

A natural ordering of decision rules


Let d1 ; d2 2 D
1. d1 is said to be equivalent to d2 ðd1  d2 Þ if Rd1 ðhÞ ¼ Rd2 ðhÞ 8h 2 H
2. d1 is at least as good as d2 ðd1 d2 Þ if Rd1 ðhÞ  Rd2 ðhÞ 8h 2 H
3. d1 is said to be better than d2 ðd1 [ d2 Þ if Rd1 ðhÞ  Rd2 ðhÞ 8h 2 H with strict
inequality for at least one θ.

Note
1. d1 d2 ) either d1 [ d2 , or d1  d2
d1 [ d2 ) d1 d2
2. d1 [ d2 ; d2 [ d3 ) d1 [ d3 , similarly for case
3. It may so happen that neither d1 [ ðor Þd2 nor d2 [ ðor Þd1 . In such case d1
and d2 are non-comparable. Thus [ ðor Þ gives a partial ordering of rules
2D

Example 7.8 X N ðh; 1Þ


To estimate h; H ¼  ¼ ð1; 1Þ:
Lðh; aÞ ¼ ðh  aÞ2 ¼ squared error loss. For any real constant C, let d c ðX Þ ¼
CX ¼ A non-randomized rule (Fig. 7.1).

Rdc ðX Þ ¼ Eh ½CX  h2 ¼ E h ½C ðX  hÞ  hð1  C Þ2


¼ C 2 E h ðX  hÞ2 þ h2 ð1  C Þ2 2C ð1  C Þh  Eh ðX  hÞ
¼ C 2 þ h2 ð 1  C Þ 2

Fig. 7.1 Rd ( θ)
R dc ( θ )
Rd (θ )
1
2

R d1 ( θ )

θ
7.1 Introduction 189

For C = 1, Rd 1 ðhÞ ¼ 1 8h.


For C > 1, Rd c ðhÞ [ 1 ¼ Rd 1 ðhÞ 8h ) d 1 [ d c
If C ¼ 12 ; Rd1=2 ðhÞ ¼ 14 þ h4
2

Here neither d 1 [ d 1=2 , nor d1=2 [ d 1


Hence d 1 and d 1=2 are non-comparable.

Admissibility of Decision Rules


Definition A d 2 D is said to be an admissible decision rule if there does not exist
any d0 2 D such that d0 [ d. Otherwise d is said to be inadmissible, i.e. d is said to
be an inadmissible rule if there exists a d0 2 D such that d0 [ d.
In the above example, for any C > 1, d c is inadmissible as d 1 [ d c .
Note Admissibility is the minimum requirement for any reasonably good decision
rule though the criterion is of negative nature.

7.2 Complete and Minimal Complete Class of Decision


Rules

Definition Let C(D) be a class of decision rules. C is said to be a complete class


of decision rule if given any d 62 C such that a d0 2 C exists such that d0 [ d
(Fig. 7.2).
C is said to be minimal complete if
Fig. 7.2

(i) C is complete and


(ii) No proper sub-class of C is complete.
Significance
If a complete class of C is available one can restrict to this class only for finding a
reasonable decision rule and thus reduce the problem.
A minimal complete class, if exists, provides maximal reduction to this extent.
Note A minimal complete class does not necessarily exist.

Some relationship between a complete (or a minimal complete) class and the
class of all admissible rules
Let A = the class of all admissible rules.
Result 1 For any complete class C, A  C, i.e. any complete class C contains all
admissible rules.
190 7 Statistical Decision Theory

Proof Let d 2 A. If possible let d 62 C. So there exists a d0 2 C such that d0 [ d )


d is inadmissible, which is a contradiction as we have assumed d is an admissible
rule. So d 2 A ) d 2 C, i.e. A  C. h
Result 2 If A is complete, then A is minimal complete.
Proof Assume A is complete. Result 1 ) No proper sub-class of A can be com-
plete. Hence A is minimal complete. h
Result 3 If a minimal complete class C exists, then C  A.
Proof Let C be a minimal complete class. Then C is complete. By Result 1, A  C.
So it is enough to prove that C  A. Suppose this is not true. Then there exists a d0
such that d0 2 C but

d0 62 A ð7:1Þ

This will imply that there exists a d1 2 C such that

d1 [ d0 ð7:2Þ

(Since d0 62 A, i.e. d0 is inadmissible. Hence, there exists a d such that d [ d0 . If


d 2 C, take d ¼ d1 . If d 62 C, there exists a d1 2 C such that d1 [ d [ d0 . Thus, in
all cases there exists a d1 2 C such that d1 [ d0 ).
Let us define C 1 ¼ C  fd0 g.

Let us define C 1 ¼ C  fd0 g: Then it follows that C1 is also complete ð7:3Þ

(Let d 62 C1 h
Case 1 d ¼ d0 . By (7.2), there exists a d1 2 C and hence d1 2 C 1 such that
d1 [ d0 :
Case 2 d 6¼ d0 . Then d 62 C, so there exists a d0 2 C such that d0 [ d:
A: d0 ¼ d0 . By (7.2), there exists a d1 2 C and hence 2 C1 such that
d1 [ d0 [ d.
B: d0 6¼ d0 . d0 2 C 1 . Hence, there exists a d0 2 C 1 such that d0 [ d:
Thus, given any d 62 C 1 in all cases there will exist a d0 2 C1 such that d0 [ d )
C 1 is complete)
Now (7.3) contradicts that C is minimal complete and hence (7.1) must be false
) C  A: So C  A.

Result 2 + Result 3 gives us ) A minimal complete class exists iff A is complete


and in this case C  A.
Corollary 1 A minimal complete class, if it exists, is unique.
Proof Let C = a minimal complete class. C  A which is unique. h
7.2 Complete and Minimal Complete Class of Decision Rules 191

Corollary 2 Let C be a minimal complete class and let d 2 C. Then if d0 d, d0


also 2 C.
Proof C  A, d 2 C , d 2 A. d0 d ) d0 also 2 A and hence 2 C. h
Corollary 3 If d is admissible and d0 d then d0 is also admissible.

Essential complete class and minimal essential complete class


Definition Let CðDÞ be a class of decision rules. Then C is said to be an essential
complete class if given any d 62 C there exists a d0 2 C such that d0 d.

C is said to be minimal essential complete class if


(i) C is essential complete; and (ii) No proper sub-class of C is essential complete.

Note A complete class C is also essential complete since d0 [ d ) d0 d.


Result 1 Let A = the class of all admissible rules and C = an essential complete
class.
If d 2 A but 62 C, then there exists a d0 d such that d0 2 C (and hence 2 A).
Proof Let d 2 A but 62 C. Then there exists a d0 2 C such that d0 d. But as d 2 A,
it is impossible that d0 [ d. So, d0 d. h
Result 2 Let C be minimal essential complete and let d 2 C. If d0 d, then d0 62 C.
Proof If possible, let d0 2 C. Define C 0 ¼ C  fd0 g then C0 will be also essential
complete. This contradicts that C is minimal essential complete. Hence d0 62 C. h
Note Let D1 ðDÞ be a class of decision rules. D1 is said to be an equivalent class
if all rules 2 D1 are equivalent to each other, but no rule 2 D  D1 is equivalent to
a rule 2 D1 . Then D can be considered as the disjoint union of some equivalent
classes.
Then,
(i) If C = a min. complete class then C does or does not entirely contain an
equivalent class (by Corollary 2)
(ii) If C = a minimal essential complete class then C contains at most one rule
from each equivalent class (by Result 2)
Further if d 2 C and in C, d is replaced by d0 d, then resultant class is also
minimal essential complete.
So,
(a) A minimal complete class A minimal essential complete class (by (i) and
(ii) above)
(b) A min. essential complete class is not necessarily unique. (by 2nd part of
(ii) above).
192 7 Statistical Decision Theory

If C be a complete class such that C contains no proper essentially complete


sub-class, then C is minimal complete and is also minimal essential complete.
Example 7.9 Examples of complete and essential complete class
(1) Essential completeness of the class of rules based on a sufficient statistic:
Let d  dð xÞ 2 D. For such x, dðxÞ is a probability distribution over ɶ. T = t(x) = a
statistic. d is said to be based on T if dðxÞ is a function of t(x), i.e. dðxÞ ¼ dðx0 Þ
whenever tðxÞ ¼ tðx0 Þ.
Such a rule can be denoted by dðT Þ: T is said to be a sufficient statistic if the
conditional probability distribution of X given T is the same 8h.
Let T = a sufficient statistic and D0 = the class of rules based on T.
Lemma 1 For any d 2 D, there exists a d0 2 D0 such that d0 d: [Cor. D0 is an
essential complete class]
Proof Let d 2 D
For each given value t of T we define a probability distribution d0 ðtÞ over ɶ as
follows:
Observe the value of a random variable X 0 having the probability distribution the
same as the conditional probability distribution of X given T = t (which is inde-
pendent of h) and then if X 0 ¼ x0 choose an action a 2 ɶ according to the proba-
bility distribution dðx0 Þ. h

Clearly, d0 ðT Þ ¼ a decision rule based on T, i.e. 2 D0 .


Also, Lðh; d0 ðtÞÞ ¼ E fLðh; dðxÞÞ=T ¼ tg

) Rd0 ðhÞ ¼ E h Lðh; dðT ÞÞ ¼ Eh E fLðh; d0 ðxÞÞ=T g

¼ E h Lðh; dðxÞÞ ¼ Rd ðhÞ i:e:; d0 d

Thus, given any d 2 D we can find a d0 2 D0 D0 such that d0 d.


(2) Essential completeness of the class of non-randomized rules for convex
(strictly convex) loss. Let Rk ¼ k-dimensional real space. S  Rk .
S is said to be a convex subset if for any two x ; y 2 S and for any 0  a  1,

a x þ ð1  aÞ y also 2 S (Fig. 7.3).

Fig. 7.3 (a) (b)


x y~ x
~ y~
~
S

Convex Non-Convex
7.2 Complete and Minimal Complete Class of Decision Rules 193

Let
 S = a convex subset of Rk .
f x ¼ a real-valued function defined on S (Fig. 7.4).

Fig. 7.4


f x is said to be a convex function if for any two x ; y 2 S and for any

0  a  1,



f a x þ ð1  aÞ y  af x þ ð1  aÞf y ð7:4Þ


If strict inequality holds in (7.4) whenever x 6¼ y , f x is said to be strictly

convex.
f ð xÞ ¼ x2 ; ex ; jxj; x 2 R1
Examples 7.10 |fflffl{zfflffl} convex
Strictly convex

Lemma 2 (Jensen’s inequality) Let S = a convex subset of Rk ; f x ¼ a

real-valued convex function defined on S. Let Z ¼ a random variable, such that
h i    
P Z 2 S ¼ 1 and E Z exists. Then (i) E Z 2 S; (ii) Ef Z f E Z

If f is strictly convex, then strict inequality holds in (ii) unless the distribution of
Z is degenerate.

Let ɶ = a convex subset of Rk . The loss function Lðh; aÞ is said to be convex (or
strictly convex) if for each given h, Lðh; aÞ is a convex (or strictly convex) function
of a.
194 7 Statistical Decision Theory

Example 7.11

 ¼ R1 ; Lðh; aÞ ¼ ðh  aÞ2 or jh  aj
# #
Strictly convex convex

Let d 2 D. For each x, dðxÞ is a probability distribution over ɶ. Let Z x ¼ a


random variable with probability distribution dðxÞ over ɶ.
We assume that Ezx exists for each

x2x ð7:5Þ
Let D = the class of all non-randomized rules D  D.
Lemma 3 Let ɶ = a convex subset of Rk and the loss function be convex. Then for
each d 2 D satisfying (7.5) there exists a d 0 2 D; viz, d 0 ðxÞ ¼ EZ x such that
d 0 d. If the loss function is strictly convex, then d 0 [ d unless d itself 2 D.
Corollary 1 Let ɶ = a convex subset of Rk , the loss function be strictly convex and
every d 2 D satisfying (7.5), then D (=the class of all non-randomized rules) is
essential complete.
Proof of Lemma 3 Let d 2 D
dðxÞ ¼ a probability distribution over ɶ. For each x, Z x ¼ a random variable
with probability distribution dðxÞ. Define d 0 ðxÞ ¼ EZ x . By (i) of Lemma 2, d 0 ðxÞ 2
ɶ 8x, i.e. d 0 ¼ d 0 ðxÞ 2 D. Also, by (ii) of Lemma 2 Lðh; d 0 ðxÞÞ ¼ Lðh; EZ x Þ 
ELðh; Z x Þ ¼ Lðh; dðxÞÞ.

) Rd0 ðhÞ ¼ E h Lðh; d 0 ðxÞÞ  Eh ðL; h; dðxÞÞ ¼ Rd ðhÞ8h


ð7:6Þ
) d0 d

If the loss function is strictly convex, strict inequality holds in (7.6) for at least
one h unless Z x -distribution is degenerate, i.e. 8x except possibly for x 2 A such
that Ph ½x 2 A ¼ 0 8h, in which case it means that d itself 2 D ) d 0 [ d unless d
itself 2 D h
Corollary 2 Let ɶ = a convex subset of Rk , the loss function is strictly convex and
every d 2 D satisfying (7.5). Let T be a sufficient statistic and D0 = the class of
non-randomized rules based on T, D0  D. Then D0 is essential complete
(complete).
Proof Let d 2 D.
D0 = the class of all randomized decision rules based on T. By Lemma 1, there
exists a d0 ¼ d0 ðT Þ 2 D0 such that d0 d. For each t, d0 ðT Þ is a probability dis-
tribution over ɶ. Define Z t ¼ a random variable with probability distribution d0 ðtÞ
and d 0 ðtÞ ¼ EZ t . As in proof of Lemma 3, d 0 ðtÞ 2 ɶ, i.e. d 0 ¼ d 0 ðT Þ 2 D0 and
d 0 d0 ( [ d0 for strictly convex loss function unless d0 2 D; d). Thus, given
7.2 Complete and Minimal Complete Class of Decision Rules 195

any d 2 D, there exists a d 0 2 D0 such that d 0 d (> d for strictly convex loss
function unless d 2 D0 ).
) D0 is essential complete (complete). h
Note On the condition stated by (7.5)
Let d 2 D, Z x ¼ a random variable with probability distribution dðxÞ over ɶ.
Lðh; dðxÞÞ ¼ ELðh; Z x Þ which exists for each x and h. This in many cases implies
(7.5) holds.
Example 7.12 k ¼ 1; Lðh; aÞ ¼ ðh  aÞ2 ɶ ¼ R1
ELðh; Z x Þ ¼ E ðh  Z x Þ2 exists 8x and 8h
) EZ x exists 8x.

Lðh; aÞ ¼ jh  aj

ELðh; Z x Þ ¼ E jZ x  hj E jZ x j  h

i.e. E jZ x j  h þ E jZ x  hj:
Thus E jZ x  hj exists 8x and 8h ) (7.5) holds.
For K 2, ɶ = ¼ Rk ¼ X

 Xk 2

L h; a ¼ j ai  hi j 2 ¼ a  h

i¼1




EL h ; Zx ¼ E
Zx  h which exists 8x and 8h ) (7.5) holds.

Proposition Suppose for some h


Lðh; aÞ C 1 jaj þ C2 for some C 1 ð [ 0Þ; C2 . Then ELðh; Z x Þ exists 8x ) (7.5)
holds.
This fact gives a sufficient condition on loss function for (7.5) to hold (Fig. 7.5).

Fig. 7.5
L(θ,a)
c1 a +c2

Rao-Blackwell Theorem

Let T = a sufficient statistic.


D = the class of random values.
D0 = the class of random vales based on T.
D = the class of non-random values.
196 7 Statistical Decision Theory

D0 = the class of non-random values based on T.

Let d 2 D satisfy E ðd ðxÞ=T ¼ tÞ exists ð7:7Þ

Lemma 4 (Rao-Blackwell Theorem) Let ɶ be a convex subset of Rk and let the


loss function be convex. For any d 2 D satisfying (7.7), there exists a d 0 2 D0 , viz,
d 0 ðtÞ ¼ E ðd ðxÞ=T ¼ tÞ. If the loss function be strictly convex d 0 [ d unless d itself
2 D0 .
Proof d 0 ðtÞ ¼ E ðd ðxÞ=T ¼ tÞ is independent of h.

Lðh; d 0 ðtÞÞ ¼ Lðh; Eðd ðxÞ=T ¼ tÞÞ


 E fLðh; d ðxÞ=T ¼ tÞg by Lemma 2:
) Rd 0 ðhÞ ¼ E h Lðh; d 0 ðT ÞÞ  E h E fLðh; d ðxÞ=T ¼ tÞg
¼ E h Lðh; d ðxÞÞ ¼ Rd ðhÞ 8h
) d0 d

If L is strictly convex, ‘=’ in the above inequality 8h


iff d is a function of t, i.e. d itself 2 D0 implying that d 0 [ d unless d itself
2 D0 . h
Corollary Let ɶ be a convex subset of Rk and the loss function be convex. Let
every d 2 D satisfy (7.6) and every d 2 D satisfy (7.7), then D0 is essential com-
plete. If the loss function be strictly convex, D0 is complete.
Proof Let d 2 D
By Lemma 3, there exists a d 2 D such that d d. Also, by Lemma 4 there exists a
d 0 2 D such that d 0 d d. Thus given any d 2 D, there exists a d 0 2 D0 such
that d 0 d ) D0 is essentially complete. If the loss function is strictly convex
d 0 [ d unless d itself 2 D0 ) D0 is complete. h
Note on condition (7.7) For every d 2 D, Rd ðhÞ ¼ Eh Lðh; d ðxÞÞ exists 8h.
This generally implies that Eh ðd ðxÞÞ exists 8h.
) Eðd ðxÞ=T ¼ tÞ exists, i.e. (7.7) holds.
Example 7.13 To estimate a real parameter h; X ¼ ɶ ¼ ð1; 1Þ

Lðh; aÞ ¼ ðh  aÞ2

Rd ðhÞ ¼ Eh ðd ðxÞ  hÞ2 exists 8h ) E h ðd ðxÞÞ exists 8h ) (7.7) holds.


Similarly, it can be shown for absolute error loss Lðh; aÞ ¼ jh  aj
Proposition Let for some h, Lðh; aÞ C 1 jaj þ C 2 for some constant C 1 ð [ 0Þ and
C2 .
7.2 Complete and Minimal Complete Class of Decision Rules 197

Then Rd ðhÞ exists 8h ) E h ðd ðxÞÞ exists ) (7.7) holds. Thus the proposition
gives a sufficient condition on loss function for (7.7) to hold.

7.3 Optimal Decision Rule

d1 d2 if Rd1 ðhÞ  Rd2 ðhÞ 8h and it is a natural partial ordering of decision rules.
d0 2 D is said to be best or optimal if d0 d 8 d 2 D, but generally such an optimal
rule does not exist.
Example To estimate a real parameter h, X ¼  ¼ ð1; 1Þ. Let
Lðh; aÞ ¼ ðh  aÞ2 . If possible, suppose there exists a best rule, say d0 . Consider
any given value of h, say h0 and define d 0 ðxÞ ¼ h0 8x. Clearly, Rd 0 ðh0 Þ ¼ 0 )
Rd0 ðh0 Þ ¼ 0 where d0 d 0 . Since h0 is arbitrary we must have Rd0 ðhÞ ¼ 0; 8h
which is generally impossible.
) generally there does not exist a best rule.
So to find a reasonably good decision rule we need some additional principles.
Two such principles are generally followed:
(i) Restriction principle
(ii) Linear ordering principle
Restriction principle Put some reasonable restrictions on decision rules, i.e.
consider a reasonable restricted sub-class of decision rules having good overall
performances and then try to find a best in this restricted sub-class.
Two restriction criteria often used are
(i) Unbiasedness and
(ii) Invariance
Linear ordering principle
For every d replace the risk function by a representative number and then compare
the rules in terms of these representative numbers.
If representative number of d1  representative number of d2 , then we prefer d1
to d2 . d0 is considered to be optimal if representative number of d0  representative
number of d 8 d 2 D.
Thus a linear ordering principle  is a way of specifying representative number
Note Any linear ordering principle should not disagree with partial ordering
principle, i.e. if d1 d2 we must have representative number of d1  as repre-
sentative number of d2 .
Two linear ordering principles that are used in general are
(i) Bayes principle
(ii) Minimax principle
198 7 Statistical Decision Theory

Bayes Principle Let X may be finite or countable P


sðhÞ : h 2 X ! a suitable weight function over X. sðhÞ 0 8h and sðhÞ ¼ 1.
h2X
Take representative number as weighted average risk
X
¼ sðhÞRd ðhÞ ¼ r ðs; dÞ
h2X

sðhÞ ¼ a p:m:f of a ðdiscreteÞ distribution over X


¼ prior p:m:f of h:
r ðs; dÞ ¼ Bayes risk of d with respect to s:

If X ¼ a non-degenerate interval of Rk ,
sðhÞ = p.d.f of a (continuous)
R distribution over X.
Bayes risk of d ¼ rðs; dÞ ¼ Rd ðhÞsðhÞdh.
X
d0 S are compared with respect to rðs; dÞ, i.e. if rðs; d1 Þ  rðs; d2 Þ, then d1 is
preferred to d2 . A d0 2 D is considered to be optimum if it minimizes rðs; dÞ with
respect to d 2 D. Such a d0 is called a Bayes rule with respect to prior s.
Definition A rule d0 2 D is said to be a Bayes rule with respect to a prior s if it
minimizes Bayes risk (w.r.t. s) rðs; dÞ w.r.t. d 2 D, i.e. if rðs; d0 Þ ¼ inf rðs; dÞ.
d21

Note
1. A Bayes rule may or may not exist. If it exists, inf  min.
2. A Bayes rule depends on prior s.
3. A Bayes rule, even if exists, may not be unique.
4. Bayes principle does not disagree with partial ordering principle, i.e.
Rd1 ðhÞ  Rd2 ðhÞ 8h ) rðs; d1 Þ  rðs; d2 Þ whatever be s.
Minimax principle
For a d 2 D, representative number is taken as
Sup Rd ðhÞ ¼ Max: Risk that may be incurred due to choice of d. d1 is preferred
h2X
to d2 if Sup Rd1 ðhÞ  Sup Rd2 ðhÞ.
h2X h2X
d0 is considered to be optimum if it minimizes Sup Rd ðhÞ with respect to d 2 D.
h2X
Such a d0 is called a “Minimax Rule”.
Definition A rule d0 2 D is said to be a minimax rule if it minimizes Sup Rd ðhÞ
h2X
with respect to d 2 D, i.e. if
Sup Rd0 h ¼ inf Sup Rd h.
h2X d2D h2X
7.3 Optimal Decision Rule 199

Notes
1. A minimax rule may or may not exist.
2. A minimax rule does not involve any prior s.
3. A minimax rule, even if exists, may not be unique.
4. Minimax principle doesn’t disagree with partial ordering principle

i.e. Rd1 ðhÞ  Rd2 ðhÞ 8h


) Sup Rd1 ðhÞ  Sup Rd2 ðhÞ
h2X h2X

7.4 Method of Finding a Bayes Rule

s = a given prior.
To find a Bayes rule d0 with respect to s  to find a rule d0 that minimizes Bayes
risk rðs; dÞ with respect to d.
Proposition If a Bayes rule d0 with respect to a given prior s exists, then there
exists a non-randomized rule d 0 which is Bayes with respect to s.

Implication For finding a Bayes rule, we can without any loss of generality con-
sider non-randomized rules only.
Proof Let d0 be a Bayes rule with respect to s. d0 may be considered as a prob-
ability distribution over D (=the class of non-randomized rules).
Let Z = a random variable with probability distribution d0 over D. Then

rðs; d0 Þ ¼ E z rðs; zÞ ð7:8Þ


P P
[Let X be finite or countable. r ðs; d0 Þ ¼ sðhÞRd0 ðhÞ ¼ sðhÞEz Rz ðhÞ
h2X h2X
X
¼ Ez sðhÞRz ðhÞðassuming that it is permissibleÞ
h2X
¼ Ez r ðs; zÞ

Similarly, we can show if when X ¼ a non-degenerate interval of Rk ].

Now d0 is Bayes ) r ðs; d0 Þ  r ðs; dÞ 8 d 2 D


) r ðs; d0 Þ  r ðs; d Þ 8d 2 D as D  D
ð7:9Þ
) r ðs; d0 Þ  r ðs; zÞ 8 values of z:
) r ðs; d0 Þ  E z r ðs; zÞ ¼ r ðs; d0 Þ
200 7 Statistical Decision Theory

by (7.8).
We must have equality in (7.9), and consequently Z must 2 D0 with probability
1, where D0 ¼ fd=d 2 D;r ðs; d Þ ¼ r ðs; d0 Þg. h
Consider any d 0 2 D0 ,
then r ðs; d 0 Þ ¼ r ðs; d0 Þ ¼ inf r ðs; dÞ (since d0 is Bayes)
d2D
) d 0 is also Bayes. This proves the Proposition.
Note It is clear from the proof that
(1) A randomized Bayes rule = A probability distribution over D0 ; i.e. the class of
non-randomized Bayes rules.
(2) If a non-randomized Bayes rule is unique, i.e. D0 consists of a single d 0 , then a
Bayes rule is unique and is d 0 .

Method of finding Bayes rule


sðhÞ ¼ a prior distribution of h.
To minimize r ðs; dÞ with respect to d 2 D,
Without any loss of generality we may restrict to non-randomized rules only. So
we are to minimize r ðs; d Þ with respect to d 2 D.
Let X be Rcountable and  x be also countable (If 
x is an open interval of Rk ,
replace Σ by ).
Then for any d 2 D
X X X
r ðs; d Þ ¼ sðhÞRd ðhÞ ¼ sðhÞ pðx=hÞLðh; d ð xÞÞ
h2X h2X x2
x
XX ð7:10Þ
¼ sðhÞpðx=hÞLðh; d ð xÞÞ
x h2X
x2

assuming it is permissible.
PSuppose there exists a d0 ¼ d0 ð xÞ such that for each x, d0 ð xÞ minimizes
sðhÞpðx=hÞLðh; d ð xÞÞ with respect to d ð xÞ 2 ɶ.
h2X
Then clearly, d0 minimizes (7.10) w.r.t. d 2 D ) d0 is Bayes rule with respect
to s.
pðx=hÞ ¼ conditional p:m:f of X given h.
sðhÞ ¼ marginal p:m:f of h.
pðx=hÞsðhÞ ¼ Joint p:m:f of X and h.
X
pð x Þ ¼ pðx=hÞsðhÞ ¼ marginal p:m:f of X:
h2X

pðx=hÞsðhÞ
qðh=xÞ ¼ ¼ conditional ðPosteriorÞ p:m:f: of h given X ¼ x
pð x Þ

if pð xÞ [ 0 ½pð xÞ ¼ 0 , sðhÞpðx=hÞ ¼ 08h 2 X.


7.4 Method of Finding a Bayes Rule 201
P
To minimize sðhÞpðx=hÞLðh; dð xÞÞ with respect to d ð xÞ 2 ɶ
h2XP
, to min. pð xÞ qðh=xÞLðh; dð xÞÞ with respect to d ð xÞ 2 ɶ.
P h2X
, min qðh=xÞLðh; dð xÞÞ w.r.t. d ð xÞ 2 ɶ.
h2X
(It is conditional (posterior) loss given X = x), i.e. EfLðh; d ð xÞÞ=X ¼ xg.
Thus if there exists a d0  d0 ð xÞ such that for each x, d0 ð xÞ gives min
E fLðh; d ðxÞÞ=X ¼ xg ¼ Conditional ðposteriorÞ loss given X ¼ x w:r:t: d ð xÞ 2 :
Then d0 is a Bayes rule.
If the minimizing d0 ð xÞ is unique for each x, then d0 is the unique Bayes rule.
[Let X be an open interval of RkR and  x be also an open interval of Rk
(If x is countable, replace Σ by )
Then for any d 2 D
Z
r ðs; d Þ ¼ sðhÞRd ðhÞd ðhÞ
X
2 3
Z Z
¼ sðhÞ4 pðx=hÞLðh; d ð xÞÞdx5dh ð7:11Þ
X 
x
Z Z
¼ sðhÞpðx=hÞLðh; d ð xÞÞdhdx
U X

(assuming this to be permissible)


R Suppose there exists a d0  d0 ð xÞ such that for each x, d0 ð xÞ minimize
sðhÞpðx=hÞLðh; d ð xÞÞdh with respect to d ð xÞ 2 .
X
Then clearly, d0 minimizes (7.11) with respect to d 2 D ) d0 is Bayes rule with
respect to s.
pðx=hÞ ¼ conditional p:d:f of Xgiven h.
sðhÞ ¼ marginal p:d:f of h
pðx=hÞsRðhÞ ¼ Joint p:d:f of Xand h.
pð xÞ ¼ sðhÞpðx=hÞdh ¼ marginal p:d:f of X:
X
pðx=hÞsðhÞ
qðh=xÞ ¼ p ð xÞ ¼ conditional ðposteriorÞ p:d:f of h given X = x, if pð xÞ [ 0.
R
To minimize sðhÞpðx=hÞLðh; d ð xÞÞdh with respect to d ð xÞ 2 
RX
, min. pð xÞ qðh=xÞLðh; d ð xÞÞdh with respect to d ð xÞ 2 
R X
, min. qðh=xÞLðh; d ð xÞÞdh with respect to d ð xÞ 2 
X
which is to min conditional (posterior) loss given X = x.
i.e. E ðLðh; d ð xÞÞ=X ¼ xÞ.
202 7 Statistical Decision Theory

Thus if there exists a d0  d0 ð X Þ such that for each x, d0 ð xÞ min


E fLðh; d ð xÞÞ=X ¼ xg = conditional (posterior) loss given X = x with respect to
d ð xÞ 2 ɶ then d0 is a Bayes rule.
If the minimizing d0 ð xÞ is unique for each x, then d0 is unique Bayes rule]
Summary To min r ðs; d Þ with respect to d 2 D

r ðs; d Þ ¼ Eh Rd ðhÞ
h i
¼ Eh EX=h Lðh; d ð xÞÞ *Rd ðhÞEX=h Lðh; d ð xÞÞ
¼ EX Eh=X Lðh; d ð xÞÞ min for each X ¼ x with respect to d ð xÞ 2 

If d0 ð xÞ is the minimizing, then d0 ¼ d0 ð xÞ is Bayes rule.

Applications
1 Estimation of a real parameter h for squared error loss. To estimate a real
parameter h where X ¼ ɶ ¼ R1 or an open interval of it.
Lðh; aÞ ¼ ðh  aÞ2 , sðhÞ ¼ a prior p.d.f of h
R
To min. E fLðh; d ð xÞÞ=X ¼ xg ¼ ðh  d ð xÞÞ2 qðh=xÞdh w.r.t. d ð xÞ 2 .
X
Clearly, minimizing d0 ð xÞ is given by
R
 Z hpðh=xÞsðhÞdh
h X
d0 ð xÞ ¼ E =X ¼ x ¼ hqðh=xÞdh ¼ R
sðhÞpðh=xÞdh
X X

Thus, here unique Bayes rule is d0 where

d0 ð xÞ ¼ Mean of the posterior distribution of h given X ¼ x:

Example 7.14

X Rð0; hÞ; 0\h\1

To estimate h under squared error loss


Let sðhÞ ¼ prior p:d:f of h ¼ heh ; h [ 0

1
pðx=hÞ ¼ ; 0\x\h
h
7.4 Method of Finding a Bayes Rule 203

qðh=xÞ ¼ conditional P:d:f of h given X ¼ x


eh
¼ R1 ; x\h\1
eh dh
x

¼ eh ex

R1
heh dh
Mean of the posterior distribution of h given (X = x) ¼ ex x

R1 h
heh j1
x þ e dh
x
þ ex
¼ e x
x
¼ xe ex ¼ x þ 1.
Thus unique Bayes estimator of h w.r.t. s is d 0 ðxÞ ¼ X þ 1.
Example 7.15 X Binðn; hÞ, n given, 0\h\1
To estimate h under squared error loss.


n x
pðx=hÞ ¼ h ð1  hÞnx ; x ¼ 0; 1; . . .; n
x
Let sðhÞ ¼ prior p:d:f of h

1
¼ ha1 ð1  hÞb1 ; a; b [ 0
Bða; bÞ
¼ Beta prior

qðh=xÞ ¼ posterior distribution of h given X ¼ x




n 1
hx þ a1 ð1  hÞnx þ b1
x Bða;bÞ
¼

n R1 x þ a1
Bða;bÞ h
1
ð1  hÞnx þ b1 dh
x 0
1
¼ hx þ a1 ð1  hÞnx þ b1 ; 0\h\1:
Bðx þ a; n  x þ bÞ

d 0 ðxÞ ¼ mean of posterior distribution of h given ðX ¼ xÞ


Z1
1
¼ hx þ a ð1  hÞnx þ b1 dh
Bðx þ a; n  x þ bÞ
0
Bðx þ a þ 1; n  x þ bÞ xþa
¼ ¼ :
Bðx þ a; n  x þ bÞ aþbþn

Thus the unique Bayes estimator of h w.r.t. Beta ða; bÞ prior is d 0 ðxÞ ¼ n þX þ a
a þ b.
204 7 Statistical Decision Theory

Particular case if a ¼ 1; b ¼ 1 sðhÞ ¼ 1 80\h\1, i.e. uniform prior.


Unique Bayes estimator is Xn þþ 21 :
Example 7.16 Let X Poisson (h), 0\h\1
To estimate h under squared error loss.
To find Bayes estimator w.r.t. Þ prior.
i.e. sðhÞ 1 eah hb1 ; h 0
Let sðhÞ ¼ Keah hb1 ; h 0
R1
as sðhÞdh ¼ 1 ) K aÞbb ¼ 1 ) K ¼ a
b

Þb
0

hx
pðx=hÞ ¼ eh ; x ¼ 0; 1; . . .
x!
eh hx ab ah b1
x! Þ ðbÞ e h
qðh=xÞ ¼ R1
ab 1
: x! eð1 þ aÞh hx þ b1 dh
Þ ð bÞ
0
ð1 þ aÞh x þ b1
e :h xþb
¼ ð 1 þ aÞ
Þ ð x þ bÞ

Z1
ð 1 þ aÞ x þ b
d0 ð x Þ ¼ eð1 þ aÞh hx þ b dh
Þ ðx þ b Þ
0
xþb
ð 1 þ aÞ Þ ð x þ b þ 1Þ xþb
¼  xþbþ1
¼
Þ ðx þ bÞ ð 1 þ aÞ 1þa

) Unique Bayes estimator of h w.r.t. Þ prior is

xþb
d 0 ð xÞ ¼ :
1þa

Notes
1. d 0 ðxÞ is also (unique) Bayes if Lðh; aÞ ¼ cðh  aÞ2 1ðh  aÞ2 , c = a given
constant
2. If a sufficient statistic T exists we may consider rules based on T only (because
of essential completeness of rules based on T) and then may find Bayes rule
based on T.

Example 7.17 X ¼ ðX 1 ; X 2 ; ::; X n Þ; X 1 ; X 2 ; ::; X n i.i.d. N ðh; 1Þ 1\h\1


To estimate h under squared error loss.
7.4 Method of Finding a Bayes Rule 205
 
T ¼ X ¼ min. Sufficient statistic N h; 1n .

sðhÞ :¼ h N ð0; r2 Þ; r2 ð [ 0Þ is known.

pðt=hÞsðhÞ ¼ Cont: e2ðthÞ  eh =2r


n 2 2 2

2
 
n2t2 h2 n þ 12 þ nht
¼ Cont: e e r

 2
2 þ1 nr2
n2 r 2 t 2 nr h
n2t2 þ t
¼ cont: e 2ðnr2 þ 1Þ
e 2r2 nr2 þ 1

qðh=tÞ ¼ Posterior p:d:f of h given t


 2
2 þ1 nr2
n2 r2 t2 nr h
n2t2 þ t
2ðnr2 þ 1Þ
Const.e e 2r2 nr2 þ 1

¼  2
n2t2 þ n2 r 2 t 2
2ðnr2 þ 1Þ
R nr
2 þ1
2r2
h nr2
nr2 þ 1
t
Const.e e
 2
2 þ1 nr2
nr h t
¼ Const.e 2r2 nr2 þ 1



nr2 r2
) given t; h N t; 2
nr þ 1 nr þ 1
2

2
Posterior mean ¼ Kx; K ¼ nrnr
2 þ1 :

) (Unique) Bayes estimator of h ¼ Kx ¼ d 0 ðxÞ


Also, Min. Bayes risk = Bayes risk of d 0 ¼ Eh Ex=h ðd 0  hÞ2

r2 r2
¼ Ex E h=x ðd 0  hÞ2 ¼ Ex ¼
nr2 þ 1 nr2 þ 1

Applications
2. Estimation of a real h under weighted squared error loss: h ¼ a real parameter.
To estimate h, X ¼  ¼ some interval of R1
Let Lðh; aÞ ¼ wðhÞðh  aÞ2 ; wðhÞ [ 0
d 0 be Bayes if for each x, d 0 ðxÞ minimizes
Z
Eh=X¼x wðhÞðh  d ðxÞÞ2 ¼ wðhÞðh  d ðxÞÞ2 qðh=xÞdh

with respect to d ðRxÞ 2 


hwðhÞqðh=xÞdh
Clearly, d 0 ðxÞ ¼ R .
wðhÞqðh=xÞdh
206 7 Statistical Decision Theory

Example 7.18 X Bin ðn; hÞ, n known, 0\h\1


2
To estimate h with Lðh; aÞ ¼ hðha Þ
ð1hÞ ¼ wðhÞðh  aÞ
2

where wðhÞ ¼ hð1h


1
Þ
Let sðhÞ ¼ 1 80\h\1; i.e. uniform prior.


n x
h ð1  hÞnx
x hx ð1  hÞnx
qðh=xÞ ¼
¼ ; 0\h\1
n R x Bðx þ 1; n  x þ 1Þ
h ð1  hÞnx dh
x

R1 hx ð1hÞnx
R h hð1h
1
Þ  Bðx þ 1;nx þ 1Þ dh
hwðhÞqðh=xÞdh 0
d 0 ð xÞ ¼ R ¼ 1
wðhÞqðh=xÞdh R 1 hx ð1hÞnx
hð1hÞ  Bðx þ 1;nx þ 1Þ dh
0
R1
hx ð1  hÞnx1 dh
0 Bðx þ 1; n  xÞ
¼ ¼
R1 Bðx; n  xÞ
hx1 ð1  hÞnx1 dh
0
x
¼ ; for x ¼ 1; 2; ::; n  1
n
Z Z1
For x ¼ 0; 2
wðhÞðh  d 0 ð0ÞÞ qðh=x ¼ 0Þdh1 ðh  cÞ2 h1 ð1  hÞn1 dh
0
¼ finite if c ¼ 0 ½by taking d 0 ð0Þ ¼ c
¼ 1 if c 6¼ 0
Z
x
) wðhÞðh  d 0 ð0ÞÞ2 qðh=x ¼ 0Þdh is min for d 0 ð0Þ ¼ 0 ¼ :
n
R
Similarly, for x = n, wðhÞðh  d 0 ð0ÞÞ2 qðh=x ¼ nÞdh is min for d 0 ðnÞ ¼ 1 ¼ nx
Thus for every x ¼ 0; 1; 2; . . .; n; d 0 ðxÞ ¼ nx minimizes
R
wðhÞðh  d 0 ðxÞÞ2 qðh=xÞdh with respect to d ðxÞ 2 
) d 0 ðxÞ ¼ nx ð¼ minimum variance unbiased estimator or maximum likelihood
estimator of hÞ is unique Bayes rule.

Application
3. Estimation of a real h under absolute error loss. To estimate h ¼ a real
parameter, X ¼  some interval of R1
Let Lðh; aÞ ¼ jh  aj
d0 ¼ d0 ðX Þ be Bayes if for each x d 0 ðxÞ minimizes Eh=X¼x jh  d ðxÞj with respect
to d ðxÞ 2 
7.4 Method of Finding a Bayes Rule 207

Clearly, d 0 ðxÞ = median of the posterior distribution of h given X = x. If the


median of the posterior distribution is unique, then d 0 is the unique Bayes rule.
Example 7.19 X ¼ ðX 1 ; X 2 ; . . .; X n Þ, X 1 ; X 2 ; . . .; X n i.i.d. N ðh; 1Þ; a\h\a
To estimate h under absolute error loss, without loss of any generality we restrict
to rules based on T ¼ X. Let sðhÞ : h N ð0; r2 Þ; r2 ð [ 0Þ known.
2
Median of posterior distribution of h given t ¼ kx; k ¼ nrnr2 þ1

) ðUniqueÞ Bayes estimator of h is kx.

Application

4. Estimation of function of h:
To estimate gðhÞ ¼ a real-value function of h.

 ¼ X ¼ the set of possible values of gðhÞ

Let Lðh; aÞ ¼ ðgðhÞ  aÞ2 ! squared error loss.


d 0 be Bayes if it minimizes E h=X¼x fgðhÞ  d ðxÞg2 with respect to d ðxÞ 2 ӕ for
each given x. Clearly, d 0 ðxÞ ¼ Eh=x¼x fgðhÞg
) d 0 ðxÞ ¼ E h=x fgðhÞg is (unique) Bayes.
Similarly, we can find it for weighted squared error loss or for absolute error
loss.
Example 7.20 X ¼ ðX 1 ; X 2 Þ; X i ’s independent and X i Binðni ; hi Þ; where n1 ; n2
known, 0\h1 ; h2 \1; h ¼ ðh1 ; h2 Þ
To estimate gðhÞ ¼ h1  h2 under squared error loss.
sðhÞ : h1 ; h2 independent, hi Rð0; 1Þ; i ¼ 1; 2



n1 n1 x1 n2
h1 ð 1  h1 Þ
x1
h2 x2 ð1  h2 Þn2 x2
x1 x2
qðh=xÞ ¼ 1 1


R R n1 n1 x1 n2
h1 ð 1  h1 Þ
x1
h2 x2 ð1  h2 Þn2 x2 dh1 dh2
0 0 x 1 x 2
hx11 ð1  h1 Þn1 x1 hx22 ð1  h2 Þn2 x2
¼ ; 0\h1 ; h2 \1:
Bðx1 þ 1; n1  x1 þ 1ÞBðx2 þ 1; n2  x2 þ 1Þ

i.e. posterior distribution of h is, h1 ; h2 independent and hi Bðxi þ 1;


ni  xi þ 1Þ; i ¼ 1; 2

d 0 ðxÞ ¼ Eh=x ðh1  h2 Þ ¼ E h1 =x ðh1 Þ  E h2 =x ðh2 Þ


x1 þ 1 x 2 þ 1
¼ 
n1 þ 2 n2 þ 2
208 7 Statistical Decision Theory

Thus (unique) Bayes estimator of h1  h2 is


X1 þ 1 X2 þ 1
d 0 ðX Þ ¼  :
n1 þ 2 n2 þ 2

7.5 Methods for Finding Minimax Rule

I. Geometric or Direct Method

We find geometrically or directly a rule d0 such that

Sup Rd0 ðhÞ ¼ inf Sup Rd ðhÞ:


h2X d2D h2X

Let X ¼ fh1 ; h2 ; . . .; hk g; d 2 D; S = risk set

y ¼ risk point of d

Sup Rd ðhÞ ¼ max yj


h2X 1jk

ð1Þ ð2Þ
Two risk points y ; y may be considered to be equivalent if

max yj ð1Þ ¼ max yj ð2Þ :


1jk 1jk

A risk point y0 is said to be a minimax point if max yj0 ¼ inf max yj :


1jk y 2S 1  j  k

If y0 is a minimax point and d0 is a rule with risk point y0 , then d0 is minimax.


  
For any real C, let Qc ¼ Qðc;c;::;cÞ ¼ y yj  c8j ¼ 1; 2; ::k :

All risk points lying on the boundary of a Qc are equivalent points (Figs. 7.6 and
7.7).
Fig. 7.6 Equalizer line z1=z 2

(c,c)

Qc
7.5 Methods for Finding Minimax Rule 209

Fig. 7.7

Equivalent points
S

Minimax Point

(For any such point y , max yj = c)



Let C0 ¼ inf fC=Qc \ S 6¼ /g:
Any risk point 2 boundary of Qc0 is a minimax point. Any rule d0 with risk point
y0 is minimax.

Notes
1. If S does not contain its boundary points, a minimax rule may not exist.
2. A minimax point may not be unique

(1,1) (2,1)

S
All Minimax
Points

(1,0) (2,0)
210 7 Statistical Decision Theory

3. A minimax point does not necessarily lie on the equalizer line (Figs. 7.8 and 7.9).

Example 7.20 Let X ¼ fh1 ; h2 g  ¼ fa1 ; a2 g


Loss is (0–1). 
x ¼ f0; 1; 2; . . .g
Ph1 ½X ¼ x ¼ 0 if x = 0

1
¼ if x 1
2x

(1,1) (2,1)

S
All Minimax

(1,0)
(2,0)

C0
Minimax Points

Fig. 7.8
7.5 Methods for Finding Minimax Rule 211

C0

=Minimax Point

Fig. 7.9

1
Ph2 ½X ¼ x ¼ ; x ¼ 0; 1; . . .
2x þ 1

Let d 2 D: dða2 =xÞ ¼ dðxÞ dða1 =xÞ ¼ 1  dðxÞ


0  dð x Þ  1

X
1
y 1 ¼ R d ð h1 Þ ¼ dðxÞ  Ph1 ðX ¼ xÞ
x¼1
X1
y 2 ¼ R d ð h2 Þ ¼ f1  dðxÞg  Ph2 ðX ¼ xÞ
x¼0
1 1X 1
1
¼ 1  dð 0Þ   dð x Þ x
2 2 x¼1 2

To find dðxÞ such that y1 ¼ y2 ¼ 13 we take

1
dð x Þ ¼ 8x 1; dð 0Þ ¼ 1
3
212 7 Statistical Decision Theory

Thus, a minimax rule is given as

dða2 =0Þ ¼ 1 dða1 =0Þ ¼ 0


1 2
dða2 =xÞ ¼ dða1 =xÞ ¼ ; x ¼ 1; 2; . . .
3 3

Example 7.21  
1 1
X¼ h1 ¼ ; h2 ¼
4 2
 
1 1
¼ a1 ¼ ; a2 ¼
4 2

a1 a2
Loss matrix: h1 1 4
h2 3 2

Let X ¼ 0 with probability h
h ¼ h1 ; h2
¼ 1 with probability ð1  hÞ
Then D ¼ fd 1 ; d 2 ; d 3 ; d 4 g
d 1 ð0Þ ¼ d 1 ð1Þ ¼ a1 ; d 2 ð0Þ ¼ d 2 ð1Þ ¼ a2 ; d 3 ð0Þ ¼ a1 but d 3 ð1Þ ¼ a2 and
d 4 ð0Þ ¼ a2 but d 4 ð1Þ ¼ a1
Rd 1 ðh1 Þ ¼ 1; Rd 1 ðh2 Þ ¼ 3; Rd2 ðh1 Þ ¼ 4; Rd2 ðh2 Þ ¼ 2

1 3 1
R d 3 ð h1 Þ ¼  1þ  4 ¼ 3
4 4 4
1 1 1
R d 3 ð h2 Þ ¼  3þ  2 ¼ 2
2 2 2
1 3 3
R d 4 ð h1 Þ ¼  4þ  1 ¼ 1
4 4 4
1 1 1
R d 4 ð h2 Þ ¼  2þ  3 ¼ 2
2 2 2

) S0 ¼ the set of risk points of all non-randomized rules





1 1 3 1
¼ ð1; 3Þ; ð4; 2Þ; 3 ; 2 ; 1 ; 2 :
4 2 4 2

If y2 ¼ my1 þ C using = lined points m ¼  29, c ¼ 26


9  26
Here to find a minimax rule means to find a rule with risk point 26
11 ; 11
(Fig. 7.10)
7.5 Methods for Finding Minimax Rule 213

(4,1)
Minmax point=

Fig. 7.10

Let d 2 D. For X = x

d ¼ a1 with probability dðxÞ


¼ a2 with probability 1  dðxÞ

Let dð0Þ ¼ u; dð1Þ ¼ v, d ffi ðu; vÞ; 0  u; v  1

dða1 =0Þ ¼ u; dða1 =1Þ ¼ v

dða2 =0Þ ¼ 1  u; dða2 =1Þ ¼ 1  v:

1 3
Rd ðh1 Þ ¼ fu  1 þ ð1  uÞ  4g þ fv  1 þ ð1  vÞ4g
4 4
1
¼ ð16  3u  9vÞ
4
1 1
R d ð h2 Þ ¼fu  3 þ ð1  uÞ  2g þ fv  3 þ ð1  vÞ2g
2 2
1
¼ ðu þ v þ 4Þ
2
26
Rd ðh1 Þ ¼ Rd ðh2 Þ ¼
11 ð7:12Þ
1 26 24
) ð16  3u  9vÞ ¼ ) u þ 3v ¼
4 11 11
1 26 8
and ð u þ v þ 4Þ ¼ ) uþv ¼ ð7:13Þ
2 11 11
214 7 Statistical Decision Theory

(7.12), (7.13) gives the unique solution u = 0, v ¼ 11


8
:
Thus, the unique minimax rule is given as

8 3
dða1 =0Þ ¼ 0; dða2 =0Þ ¼ 1; dða1 =1Þ ¼ ; dða2 =1Þ ¼ :
11 11

Note The unique minimax rule is purely randomized. Thus, unlike Bayes rules, a
minimax rule may be purely randomized, i.e. although a minimax rule exists, no
non-randomized rule is minimax.
Alternative (direct/or Algebraic approach)

Let us take the same Example 7.21 (Fig. 7.11)

Fig. 7.11

8
11 D2
v
D
3
D1 11

1
Sup Rd ðhÞ ¼ maxfRd ðh1 Þ; Rd ðh2 Þg ¼ ð16  3u  9vÞ
h2X 4

if 14 ð16  3u  9vÞ 12 ðu þ v þ 4Þ,


i.e. if 5u þ 11v  8 and ¼ 12 ðu þ v þ 4Þ if
2 ðu þ v þ 4Þ 4 ð16  3u  9vÞ, i.e. if
5u þ 11v 8
1 1

Let D1 ¼ fd  ðu; vÞ=5u þ 11v  8g

D2 ¼ fd  ðu; vÞ=5u þ 11v [ 8g ) D1 þ D2 ¼ D

For d 2 D1 , Sup Rd ðhÞ ¼ 16  3u


4
 9v
h2X
163u9v
Now Inf Sup Rd ðhÞ ¼ Inf 4
d2D1 h2H 0  u; v  1
5u þ 11v  8

 
16  3u  9v 1 9ð8  5uÞ
¼ inf inf ¼ inf 16  3u 
0  u  1 0  v  85u 4 0u1 4 11
11
7.5 Methods for Finding Minimax Rule 215

1 104 26
¼ inf ð12u þ 104Þ ¼ ¼
0u1 4 44 11

and inf attained for u = 0, v ¼ 8 115:0 ¼ 11


8

Similarly, inf Sup Rd ðhÞ ¼ 11, which is for u = 0, v ¼ 11


26 8
d2D2 h2H
Finally, inf Sup Rd ðhÞ
d2D h2H

 
26
¼ min inf Sup Rd ðhÞ; inf Sup Rd ðhÞ ¼
d2D1 h2H d2D2 h2X 11

and inf is attained if u = 0, v ¼ 11


8
.
Thus, the unique minimax rule is given as u = 0, v ¼ 11 8

i.e. dða1 =0Þ ¼ 0; dða2 =0Þ ¼ 1; dða1 =1Þ ¼ 11 ; dða2 =1Þ ¼ 11


8 3

II. Use of Bayes rule

A rule d0 is said to be an equalizer rule if Rd0 ðhÞ ¼ Const 8h 2 X.


Result 1 If an equalizer rule d0 is Bayes (w.r.t some prior s), then d0 is minimax. If
d0 is unique Bayes (w.r.t. s), then d0 is unique minimax (and hence admissible).
Proof Rd0 ðhÞ ¼ c 8h ) Sup Rd0 ðhÞ ¼ c and r ðs; d0 Þ ¼ c
h2H
Minimaxiety: If possible let d0 be not minimax, so there exists a d1 such that

Sup Rd1 ðhÞ\ Sup Rd0 ðhÞ ¼ c


h2H h2H
) Rd1 ðhÞ  Sup Rd1 ðhÞ ¼ c8h
h2H
) r ðs; d1 Þ\c ¼ r ðs; d0 Þ

But this contradicts that d0 is Bayes w.r.t. s. Hence, d0 is minimax.

Unique minimaxiety
If possible let d0 be not unique minimax. So there exists another d1 which is also
minimax, i.e. there exists another d1 such that

Sup Rd1 ðhÞ ¼ Sup Rd1 ðhÞ ¼ c


h2H h2H
) Rd1 ðhÞ  Sup Rd1 ðhÞ ¼ c8h
h2H
) r ðs; d1 Þ  c ¼ r ðs; d0 Þ

i.e. r ðs; d1 Þ ¼ r ðs; d0 Þ (* d0 is Bayes w.r.t. s)


) d1 is also Bayes w.r.t s, but this contradicts that d0 is unique Bayes w.r.t. s.
216 7 Statistical Decision Theory

Hence d0 is unique minimax.


X ¼ 1 with probabilityh
Example 7.22 Let
¼ 0 with prob 1  h: 0\h\1
To estimate θ under squared error loss.
Let d ðxÞ ¼ a non-randomized rule.
Let d ð1Þ ¼ u; d ð0Þ ¼ v; 0\u; v\1.

d  ðu; vÞ

Equalizer rule Rd ðhÞ ¼ hðu  hÞ2 þ ð1  hÞðv  hÞ2


 
¼ h2 ð1 þ 2v  2uÞ þ h u2  v2  2v þ v2
The rule is equalizer iff 1 þ 2v  2u ¼ 0; or,
1 þ 2v
u¼ ð7:14Þ
2
2
and u2  v2  2v ¼ 0 or ð1 þ42vÞ  v2  2v ¼ 0 (using 7.14)
Or 14  v ¼ 0 ) v ¼ 14 ) u ¼ 34
Thus, the only equalizer non-randomized rule is
3 1
d ð1Þ ¼ ; d ð0Þ ¼ :
4 4

Bayes rule: Let s ¼ a prior distribution


 
E ðhÞ ¼ m1 and E h2 ¼ m2
 
) r ðs; d Þ ¼ ERd ðhÞ ¼ m2 ð1 þ 2v  2uÞ þ m1 u2  v2  2v þ v2

@rðs;d Þ
Now @u ¼ 0 ) 2m2 þ 2m1 u ¼ 0 ) u ¼ m2
m1

@r ðs; d Þ
¼ 0 ) 2m2  2m1 v  2m1 þ 2v ¼ 0
@v
m1  m2
)v¼
1  m1

Thus, the unique Bayes rule w.r.t. s is


m2 m1  m2  
d ð 1Þ ¼ ; d ð 0Þ ¼ where m1 ¼ E ðhÞ; m2 ¼ E h2
m1 1  m1

Hence, the equalizer non-randomized rule is unique Bayes w.r.t. a s such that
1  m2
m2
m1 ¼ 34 and m1m 1
¼ 14.
) m1 ¼ 2 and m2 ¼ 38.
1
7.5 Methods for Finding Minimax Rule 217

 
[For example, let s ¼ B 12 ; 12 prior α = β m1 ¼ a þa b ¼ 12 and m2 ¼
aða þ 1Þ
ða þ bÞða þ b þ 1Þ ¼ 38 and the equalizer non-randomized rule is unique Bayes w.r.t.
1 1
B 2 ; 2 prior].
Thus the non-randomized rule d 0 ð1Þ ¼ 34 ; d 0 ð0Þ ¼ 14 is equalizer as well as
unique Bayes (w.r.t some prior) ) d 0 ðX Þ is minimax (unique).
Example 7.23

X Binðn; hÞ; n known and 0\h\1

To estimate h under squared error loss.

sab ¼ Bða; bÞ prior a; b [ 0

The unique Bayes rule w.r.t. sab is

X þa
d ab ðX Þ ¼
nþaþb
 2
X þa
Rd ab ðhÞ ¼ E h h
nþaþb
E h fðx  nhÞ  hða þ bÞ þ ag2
¼
ð n þ a þ bÞ 2
Eh ðx  nhÞ2 þ h2 ða þ bÞ2 þ a2  2haða þ bÞ
½* E h ðx  nhÞ ¼ 0 )
ð n þ a þ bÞ 2
n o
h i h2 ða þ bÞ2 n þ hfn  2aða þ bÞg þ a2
* E h ðx  nhÞ2 ¼ nhð1  hÞ )
ð n þ a þ bÞ 2

d ab is equalizer iff
 pffiffi
ða þ bÞ2 ¼ n a ¼ 2n
, pffiffi
2aða þ bÞ ¼ n b ¼ 2n

Thus the rule


pffiffi
X þ n
d pffin;pffin ðX Þ ¼ p2ffiffiffi
2 2 nþ n
pffiffiffi
pffiffi pffiffi
X þ n=2
is equalizer as well as unique Bayes (w.r.t B n
; n
prior). Hence pffiffi is the
2 2 nþ n
unique minimax estimator of h.
218 7 Statistical Decision Theory

Example 7.24 Let X Binðn; hÞ; n be known 0\h\1


2
To estimate h under loss function Lðh; aÞ ¼ hðhð1ahÞ Þ
E h ðX  h Þ
2

Let d 0 ðX Þ ¼ Xn ; Rd 0 ðhÞ ¼ hðn1hÞ ¼ 1n 8h


i.e. d 0 is an equalizer rule.
Also, d 0 is unique Bayes w.r.t. R(0,1) prior.
Hence d 0 ðX Þ ¼ Xn is the unique minimax estimator of h.
Result 2 If an equalizer rule d0 is extended Bayes, then it is minimax.
Example 7.25 X 1 ; X 2 ; . . .; X n i.i.d N ðh; 1Þ; 1\h\1
To estimate h under squared error loss. Let d 0 ¼ X; Rd 0 ðhÞ ¼ 1n 8h, i.e. d 0 is
equalizer. Also, d 0 is extended Bayes. Hence X is minimax.
Proof of Result 2

Rd0 ðhÞ ¼ c 8h

So, Sup Rd0 ðhÞ ¼ c ) r ðs; d0 Þ ¼ c 8s:


h2H
Also, d0 is extended Bayes
) given any 2 [ 0; there exists a prior s2 such that

c ¼ r ðs2 ; d0 Þ  inf r ðs2 ; dÞ þ 2


d2D
ð7:15Þ
or; inf r ðs2 ; dÞ c 2
d2D

if possible let d0 be not minimax.


So there exists a d1 such that

Sup Rd1 ðhÞ\ Sup Rd0 ðhÞ ¼ c ð7:16Þ


h2H h2H

(7.16) implies there exists an 2 such that

Sup Rd1 ðhÞ\c 2


h2H
) Rd1 ðhÞ\c 2 8h; since Rd ðhÞ  Sup Rd ðhÞ
h2H ð7:17Þ
) r ðs; d1 Þ\c 2 whatever be s:
) inf r ðs; dÞ\c 2 whatever be s
d2D
 
* inf r ðs; dÞ  r ðs; d1 Þ
d2D

(7.17) contradicts (7.15). Hence d0 must be minimax. h


7.5 Methods for Finding Minimax Rule 219

Result 3 Let d0 be such that


(i) Rd0 ðhÞ  c 8h for some real constant c.
(ii) d0 is Bayes (unique Bayes) w.r.t. a prior s0 such that r ðs0 ; d0 Þ ¼ c
Then d0 is minimax (unique minimax).
Corollary 1 Let d0 be such that
ðiÞ0 Rd0 ðhÞ ¼ c 8h (This is in fact Result 1)
ðiiÞ0 d0 is Bayes (unique Bayes) w.r.t. a prior s0 .
Then d0 is minimax (unique minimax). ðiÞ0 ; ðiiÞ0 ) ðiÞ; ðiiÞ
^
Corollary 2 Let d0 be such that
Rd0 ðhÞ ¼ c 8h 2 H0 ðHÞ
ðiÞ0
 c 8h 2 H  H0
ðiiÞ0 d0 is Bayes (unique Bayes) w.r.t. a s0 such that Prfh 2 H0 g ¼ 1
Then d0 is minimax (unique minimax)
ðiÞ0 ; ðiiÞ0 ; ) ðiÞ; ðiiÞ
Note For H0 ¼ H, Corollary 2 ) Corollary 1
Proof of Result 3 For any d and any s,

Sup Rd ðhÞ r ðs; dÞ ð7:18Þ


h2X
 
As Rd ðhÞ  Sup Rd ðhÞ8h ) r ðs; dÞ  Sup Rd ðhÞ
h2H h2H

For d ¼ d0 and s ¼ s0 ; (7.18)

) r ðs0 ; d0 Þ  Sup Rd0 ðhÞ by (ii) ð7:19Þ


h2H

Also ðiÞ ) Sup Rd0 ðhÞ  c ð7:20Þ


h2H

(7.19), (7.20)

) Sup Rd0 ðhÞ ¼ c ð7:21Þ


h2H
220 7 Statistical Decision Theory

So, minimaxiety of d0 :

For any; d; Sup Rd ðhÞ r ðs0 ; dÞ by ð7:18Þ


h2H
r ð s 0 ; d0 Þ ðSince; d0 is Bayes w:r:t: s0 Þ
¼ cðbyðiiÞÞ
¼ Sup Rd0 ðhÞ by ð7:21Þ
h2H

) d0 is minimax.
Unique minimaxiety of d0 : For any dð6¼ d0 Þ

Sup Rd ðhÞ r ðs0 ; dÞ ðbyð7:18ÞÞ


h2X
[ r ðs0 ; d0 Þ ðSince d0 is unique Bayes w:r:t:s0 Þ
¼ c ðby ðiiÞÞ
¼ Sup Rd0 ðhÞ by ð7:21Þ
h2X

Thus Sup Rd ðhÞ [ Sup Rd0 ðhÞ


h2X h2X
8dð6¼ d0 Þ ) d0 is unique minimax. h
Example 7.26 Let X Binðn; h1 Þ n be known.
Y Binðn; h2 Þ 0\h1 ; h2 \1; h1 ; h2 are unknown.
To estimate h1  h2 under squared error loss, we can expect a rule of the form
aX + bY + c to be minimum. However, no rule of this form is an equalizer rule. So
Result 1 (or Corollary 1) cannot be applied. But Corollary 2 can be applied as
follows:
Step 1: To find an equalizer Bayes rule in some H0 ðHÞ. Let
H0 ¼ fh1 ; h2 =0\h1 ; h2 \1; h1 þ h2 ¼ 1g. Restricting to H0 , let us write h1 ¼ h;
h2 ¼ 1  h.
Thus, we have,
3
X Binðn; hÞ
7 independent:
Y Binðn; 1  hÞ 5
or n  Y Binðn; hÞ
Without any loss of generality we may restrict ourselves to rules based on
Z ¼ X þ ðn  Y Þ Binð2n; hÞ (Sufficient statistic) pffiffi pffiffi
If X Binðn; hÞ, an equalizer and unique Bayes (w.r.t. Bin 2n ; 2n prior)
pffi
Xþ n
estimator of h under squared error loss is n þ p2ffiffin.
pffiffiffiffi pffiffiffiffi
If Z Binð2n; hÞ; an equalizer and unique Bayes (w.r.t. Bin 22n ; 22n prior)
pffiffiffi
Z þ 2n
estimator of h under squared error loss is 2n þ p2ffiffiffi
2n
ffi.
7.5 Methods for Finding Minimax Rule 221

To estimate now h1  h2 ¼ 2h  1; consider the following:


Lemma Under squared error loss, if d0 is an equalizer Bayes (unique) estimator of
gðhÞ, then d0 ¼ ad0 þ b is an equalizer Bayes (unique) estimator of
g ðhÞ ¼ agðhÞ þ b.
Proof For any estimator d of gðhÞ we can define an induced estimator, viz. d ¼
ad0 þ b of g ðhÞ ¼ agðhÞ þ b and vice versa.
Under squared error loss, Rd ðhÞ ¼ a2 Rd ðhÞ

r ðs; dÞ ¼ a2 r ðs; d Þ

Hence, d0 is equalizer ) d ¼ ad0 þ b is equalizer. h


d0 is Bayes (unique) w.r.t. s ) d0 ¼ ad0 þ b is Bayes (unique).
By the Lemma, an equalizer Bayes (unique) estimator of 2h  1 is
 pffiffiffiffi
2 z þ 22n 2ðX  YÞ
pffiffiffiffiffi ¼ pffiffiffiffiffi
2n þ 2n 2n þ 2n

Thus, if we restrict to H0 , an equalizer Bayes (unique) estimator of h1  h2 is


2ðXY Þ
pffiffiffiffi
¼ d 0 (say)
2n þ 2n
Step 2: Rd0 ðh1 ; h2 Þ  c 8ðh1 ; h2 Þ 2 H where c ¼ Rd 0 ðh1 ; h2 Þ for ðh1 ; h2 Þ 2 H0 .
Proof For ðh1 ; h2 Þ 2 H
 2
2ðX  Y Þ
Rd 0 ðh1 ; h2 Þ ¼ E h1 ;h2 pffiffiffiffiffi  ðh1  h2 Þ
2n þ 2n
n pffiffiffiffiffi o 2  pffiffiffiffiffi 2
¼ E h 2ðX  nh1 Þ  2ðY  nh2 Þ  2nðh1  h2 Þ 2n þ 2n

4E h ðX  nh1 Þ2 þ 4E h ðY  nh2 Þ2 þ 2nðh1  h2 Þ2


¼  pffiffiffiffiffi2
2n þ 2n
2h1 ð1  h1 Þ þ 2h2 ð1  h2 Þ þ ðh1  h2 Þ2 Numerator
¼  pffiffiffiffiffi2 ¼ :
1 þ 2n Dinominator

Now Numerator ¼ 2h1 þ 2h2  h12  h22  2h1 h2


¼ 1  f1  h1  h2 g2  1
0
¼ 0 holds iff h1 þ h2 ¼ 1

1
R d 0 ð h1 ; h2 Þ ¼  pffiffiffiffiffi2 ¼ c 8ðh1 ; h2 Þ 2 H0
Hence, 1 þ 2n h
\c 8ðh1 ; h2 Þ 2 H  H0
222 7 Statistical Decision Theory

2ðXY Þ
By Corollary 2, Step 1 + Step 2 gives us d 0 ¼ 2n pffiffiffiffi is the unique minimax
þ 2n
estimator of h1  h2 .
Result 4 Let d0 be such that
(i) Rd0 ðhÞ  c 8h 2 H, c = a real constant
(ii) There exists a sequence of Bayes rules fdn g w.r.t. sequence of priors fsn g
such that r fsn ; dn g ! c. Then d0 is a minimax.

Proof For any d and any s,

Sup Rd ðhÞ r ðs; dÞ ð7:22Þ


h2H

(as was in the Proof of Result 3) h


(7.22) ) For any d,

Sup Rd ðhÞ r ðsn ; dÞ r ðsn ; dn Þ ! c by ðiiÞ


h2H

ðSince; dn is Bayes w:r:t: sn priorÞ ð7:23Þ

For d ¼ d0
(7.23) ) Sup Rd0 ðhÞ c and also condition ðiÞ ) Sup Rd0 ðhÞ  c
h2H h2H

) Sup Rd0 ðhÞ ¼ c ð7:24Þ


h2H

Then (7.23), (7.24) ) for any d,

Sup Rd ðhÞ c ¼ Sup Rd0 ðhÞ


h2H h2H

i.e. d0 is minimax.
Example 7.27 Let X 1 ; X 2 ; ::; X n i.i.d N ðh; 1Þ; 1\h\1
To estimate h under squared error loss,
Let d 0 ¼ X; Rd 0 ðhÞ ¼ 1n 8h
(i) is satisfied with c ¼ 1n.
Let sr : N ð0; r2 Þ prior
2
d r ¼ Bayes estimator of h w.r.t. sr ¼ 1 þnrnr2 X
r ðsr ; d r Þ ¼ 1 þr nr2 ! 1n ¼ c as r2 ! 1
2

Thus (ii) is satisfied.


Hence d 0 ¼ X is minimax.
7.5 Methods for Finding Minimax Rule 223

Example 7.28 Let X Poisson (θ), 0 < θ < α.


2
To estimate θ with Lðh; aÞ ¼ ðh h aÞ .
(Apply Result 4 to prove that d 0 ¼ X is minimax)
2
Hint Rd0 ðhÞ ¼ Eh ðXh hÞ ¼ 1 ) (i) is satisfied with c = 1. Take
sab ðhÞ1eah  hb1 ; 0\h\1. d ab ðX Þ ¼ Bayes estimator of h w.r.t. sab ¼
x þ b1
 
1 þ a r sab ; d ab ! 1 ¼ c as a ! 0, b ! 1. Hence d 0 ¼ X is minimax.

Other Methods: Use of Cramer-Rao inquality.

Result 1 If an equalizer rule d0 is admissible, then d0 is minimax.


Proof Rd 0 ðhÞ ¼ c 8h ) Sup Rd 0 ðhÞ ¼ c.
h2H
If possible let d0 be not minimax. Then there exists a d1 such that
Sup Rd1 ðhÞ\ Sup Rd0 ðhÞ ¼ c
h2H h2H

) Rd1 ðhÞ\C ¼ Rd0 ðhÞ 8h ð7:25Þ

(7.25) ) d1 [ d0 ; which contradicts that d0 is admissible. Hence, d0 is minimax.


To estimate a real-valued parameter h under squared error loss. H ¼ an open
interval of R1 . Without any loss of generality we can restrict ourselves to
non-randomized rules only (since the loss function is convex). h
Let d ðX Þ ¼ a non-randomized rule.

bd ðhÞ ¼ E h ðd ðX ÞÞ  h ¼ Bias of d ðX Þ

By C--R inequality Rd ðhÞ ¼ MSEh ðd ðX ÞÞ


¼ b2d ðhÞ þ V h ðd ðX ÞÞ
 2 .
b2d ðhÞ þ 1 þ b0d ðhÞ I ð hÞ 8h
¼ C d ð hÞ ðsayÞ

IðhÞ ¼ Fisher's information function:

Result 2 Let d 0 be a non-randomized rule such that


(i) MSE of d 0 attains C–R lower bound, i.e.

Rd0 ðhÞ ¼ C d0 ðhÞ 8h


224 7 Statistical Decision Theory

(ii) For any non-randomized rule d 1 ,

C d1 ðhÞ  Cd 0 ðhÞ 8h
) bd 1 ð hÞ ¼ bd 0 ð hÞ 8h

Then d 0 is admissible.
If further, d 0 is equalizer, then d 0 is minimax.
Proof Result I ) proves that it is minimax h
Proof of admissibility If possible let d 0 be inadmissible.
Then there exists a d 1 such that

Rd 1 ðhÞ  Rd 0 ðhÞ 8h with strict inequality for at least one h ð7:26Þ

(7.26) ⇒

Cd 1 ðhÞ  Rd 1 ðhÞ  Rd 0 ðhÞ ¼ C d 0 ðhÞ 8h ð7:27Þ


|fflfflfflfflfflfflfflfflfflfflffl
ffl{zfflfflfflfflfflfflfflfflfflfflfflffl} |fflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflffl}

by C–R inequality and by (i)


(7.27) ⇒

bd 1 ðhÞ ¼ bd 0 ðhÞ 8h by ðiiÞ

) Cd 1 ðhÞ ¼ Cd 0 ðhÞ 8h ð7:28Þ

(7.27) and (7.28) ⇒

Cd 0 ðhÞ  Rd 1 ðhÞ  Rd0 ðhÞ ¼ Cd 0 ðhÞ 8h ð7:29Þ

We must have equality in (7.29) everywhere, implying that Rd 1 ðhÞ ¼ Rd 0 ðhÞ8h:


Thus, strict inequality in (7.26) cannot hold for any h, i.e. there cannot be any d 1
such that d 1 [ d 0 .
Hence d 0 is admissible. h
Example 7.29 Let x1 ; x2 . . .xn i.i.d N ðh; 1Þ; 1\h\1. To estimate h under
squared error loss.
 is sufficient ) it is enough to restrict to n.r. rules based on X
X  only.
Let d 0 ¼ d 0 ðX Þ ¼ X.

Rd 0 ðhÞ ¼ 1n 8 h, i.e. d 0 is equalizer.
Also, bd0 ðhÞ ¼ 0 8 h, i.e. d 0 is unbiased.

1
Rd 0 ðhÞ ¼ C d 0 ðhÞ ¼ 8h
n
7.5 Methods for Finding Minimax Rule 225

i.e. condition (i) of Result 2 is satisfied [Here IðhÞ ¼ n].


Let d ¼ d ðX Þ be any n.r rule based on X.

Lemma C d ðhÞ  C d 0 ðhÞ 8 h.


) bd ðhÞ ¼ 0 8 h, i.e. d is also unbiased.
Lemma ) Condition (ii) of Result 2 is also satisfied.
Hence, (i) d 0 is admissible.
(ii) d 0 is minimax.
Also, (iii) d 0 is unique minimax.
[Proof of (iii): Let d 1 ¼ d 1 ðxÞ be another minimax rule.
Then

1
Sup Rd1 ðhÞ ¼ Sup Rd 0 ðhÞ ¼ ¼ C d 0 ð hÞ
h2H h2H n
) C d1 ðhÞ  Rd 1 ðhÞ  Sup Rd 1 ðhÞ ¼ Cd 0 ðhÞ8h
|fflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflffl} h2H

By C–R inequality
) Cd 1 ðhÞ  Cd 0 ðhÞ 8 h ) bd1 ðhÞ ¼ 0 8 h (By Lemma).
) d 1 is an unbiased estimator of h. But since X  is complete d 0 is the unique
 is the unique minimax esti-
unbiased estimator of h; i.e., d 1 ¼ d 0 , Hence d 0 ¼ X
mator of h]
Proof of Lemma Writing bd ðhÞ ¼ bðhÞ
Let Cd ðhÞ  Cd 0 ðhÞ ¼ 1n 8 h
.
i:e:; b2 ðhÞ þ f1 þ b0 ðhÞg
2
n  1=n 8 h ð7:30Þ

ð7:30Þ ) b0 ðhÞ  0 8 h i:e:; bðhÞ is non-increasing ð7:31Þ

1 2b0 ðhÞ .
 b2 ðhÞ þ f1 þ b0 ðhÞg n  1=n
2
½as ð7:30Þ) þ
n n
2b0 ðhÞ
)  0 ) b0 ðhÞ  0
n

Also ð7:30Þ ) b2 ðhÞ þ 2b0 ðhÞ  0 ð7:32Þ

½As ð7:30Þ)nb2 ðhÞ þ b0 ðhÞ þ 2b0 ðhÞ  0


2

)b2 ðhÞ þ 2b0 ðhÞ  nb2 ðhÞ þ b0 ðhÞ þ 2b0 ðhÞ  0


2

b0 ðhÞ
Now ð7:32Þ )  1
8 h such that bðhÞ 6¼ 0
b0 ðhÞ
2 2
226 7 Statistical Decision Theory

d 1 1
Or b ð hÞ 8 h such that bðhÞ 6¼ 0 ð7:33Þ
dh 2
ð7:31Þ; ð7:33Þ ) bðhÞ ! 0 as h ! 1 ð7:34Þ

Finally (7.31), (7.34) ) bðhÞ ¼ 0 8 h, which proves the Lemma. h

7.6 Minimax Rule: Some Theoretical Aspects

A statistical decision problem  A game between statistician and nature.

H ¼ the set of possible actions for nature:

 ¼ the set of possible actions for statistician:

Lðh; aÞ ¼ Loss ðto the statisticianÞ if the statistician chooses an action ‘a’ and
nature chooses an action ‘h’.
A randomized action for the statistician = a probability distribution over ɶ.
The statistician observes the value of a r.v. X. If X = x is observed, the statistician
chooses a randomized action dðxÞ.

dðxÞ ¼ a randomized rule for statistician:

s ¼ a prior distribution: ¼ a probability distribution over H:


¼ a randomized action for the nature:
If the statistician chooses a randomized rule d and the nature chooses a ran-
domized action s, then the statistician’s expected loss is
cðs; dÞ ¼ Bayes risk of d w:r:t: s.
Result 1 For any d 2 D,
Sup Rd ðhÞ ¼ Sup cðs; dÞ where H = the set of all possible s’s.
h2H s2H

Proof

Rd ðhÞ  Sup Rd ðhÞ 8 h


h2H
) cðs; dÞ  Sup Rd ðhÞ 8 s ð7:35Þ
h2H
) Sup cðs; dÞ  Sup Rd ðhÞ
s2H h2H

Consider a prior s0 which chooses a particular value h with probability 1.


7.6 Minimax Rule: Some Theoretical Aspects 227

Then r ðs0 ; dÞ ¼ Rd ðhÞ


Hence, Sup r ðs; dÞ r ðs0 ; dÞ ¼ Rd ðhÞ 8 h
s2H
Thus Rd ðhÞ  Sup r ðs; dÞ 8 h
s2H

) Sup Rd ðhÞ  Sup r ðs; dÞ ð7:36Þ


h2H s2H

(7.35), (7.36) ) Sup Rd ðhÞ ¼ Sup r ðs; dÞ, hence the proof.
h2H s2H
A rule d0 is minimax if it minimizes

Sup Rd ðhÞw:r:t d 2 D
h2X

Or; Sup cðs; dÞ w:r:t d 2 D ½ by Result 1


s2H

i.e. if Sup r ðs; dÞ ¼ Inf Sup r ðs; dÞ ¼ m (say)


s2H d2D s2H

m ¼ Upper value of the game:

Thus, if a statistician chooses a minimax rule d0 , his expected loss is at most m


whatever be the action chosen by nature.
Similarly, a prior s0 is said to be a maximum rule for the nature or a least
favourable prior for the statistician if s0 maximizes inf r ðs; dÞw:r:t s, i.e. if
d

inf r ðs0 ; dÞ ¼ Sup inf r ðs; dÞ ¼ m ðSay)


d s d

m ¼ Lower value of the game:

If nature chooses a least favourable s0 , then expected loss (of the statistician) is
at least m whatever be the rule the statistician chooses. h
Result 2 m  m
Proof

r ðs; dÞ  Sup r ðs; dÞ 8 s; d


s
) inf r ðs; dÞ  inf Sup r ðs; dÞ ¼ m 8 s
d d s
) Sup inf r ðs; dÞ  m
s d

) m  m

The statistical game is said to have a value m if m ¼ m ¼ m, h


228 7 Statistical Decision Theory

Result 3 if the statistical game has a value and a least favourable prior s0 and a
minimax rule d0 exists, then d0 is Bayes w.r.t. s0 .
Proof m ¼ inf r ðs0 ; dÞ  r ðs0 ; d0 Þ  Sup r ðs; d0 Þ ¼ m
d s
If m ¼ m, then ‘=’ must hold every where implying
inf r ðs0 ; dÞ ¼ r ðs0 ; d0 Þ ) d0 is Bayes w.r.t. s0 . h
d
Minimax theorem Let H be finite and the risk set S be bounded below. Then the
statistical game will have a value and a least favourable prior s0 exists.
If further, S is closed from below an admissible minimax rule d0 exists and d0
Bayes w.r.t. s0 .
Thus if H is finite and S is bounded below as well as closed from below, then
(i) A minimax rule exists
(ii) An admissible minimax rule exists and
(iii) A minimax rule is Bayes (w.r.t least favourable prior s0 ).

Result 4 Suppose there exists a rule d0 such that


(i) Rd0 ðhÞ  c 8 h
(ii) d0 is Bayes w.r.t. some s0 and r ðs0 ; d0 Þ ¼ c,
then
(a) d0 is minimax
(b) s0 is least favourable prior.

Proof
(a) Proved earlier
(b) To show inf r ðs0 ; dÞ inf r ðs; dÞ 8 s ðbÞ
d d
Now (i) ) r ðs; d0 Þ  c 8 s
) inf r ðs; dÞ  r ðs; d0 Þ  c ¼ r ðs0 ; d0 Þ ¼ inf r ðs0 ; dÞ 8 s by ðiiÞ
d d
This proves (b). h

7.7 Invariance

Many statistical decision problems are invariant w.r.t. some transformations of X. In


such case it seems reasonable to restrict to decision rules, which are also invariant
w.r.t. similar transformations. Such a decision rule is called an invariant decision
rule and in many problems a best rule exists within the class of invariant rules.
7.7 Invariance 229

Example 7.30 X N ðh; 1Þ; 1\h\1


We are to estimate h under the squared error loss.
Suppose one considers a transformation of X, viz., X 0 ¼ X þ c, c = a given
constant and considers the problem of estimating h0 ¼ h þ c on the basis of
X 0 N ðh0 ; 1Þ under the squared error loss.
For an action ‘a’ for the first problem, there is an action a0 ¼ a þ c for the second
problem and vice versa with Lðh; aÞ ¼ Lðh0 ; a0 Þ. Thus the two problems may be
considered to be equivalent in the sense that ðH; ɶ, L) ðH0 ; ɶ′, L′).
Now let d ¼ d ð X Þ ¼ a reasonable estimator of h on the basis of X. Then
d ðX 0 Þ ¼ d ðX þ cÞ should be a reasonable estimator of h0 on the basis of X 0 . Also, if
d ð xÞ ¼ a reasonable estimate of h on the basis of X ¼ x then d ð xÞ þ c should be a
reasonable estimate for h0 . The two estimates are identical if

d ðx þ cÞ ¼ d ð xÞ þ c ð7:37Þ

An estimator d ð X Þ is said to be a location invariant or an equivariant if (7.37)


holds 8x8c.
d ð X Þ is an equivariant estimator iff d ð X Þ ¼ X þ K ¼ dK ð X Þ (say) for some
constant K.
[If d ð X Þ ¼ X þ K, then (7.37) is satisfied 8x8c. Let (7.37) be satisfied 8x8c. For
c ¼ x, ðiÞ ) d0 ¼ dðxÞ  x; or d ð xÞ ¼ x þ K, K ¼ dð0Þ Rdk ðhÞ ¼
E ðX þ K  hÞ2 ¼ 1 þ K 2 which is minimum when K = 0. Thus
Rd0 ðhÞ  Rdk ðhÞ8h8K.
) d0 ð X Þ ¼ X is the best within the class of equivariant estimators.]

Invariant statistical decision problems


ðH; ɶ, L) X = a r.v. and x = observed value of X 2  x (=sample space)
Ph = A probability distribution over  x depending on h.
P ¼ fPh =h 2 Hg = family of probability distribution.
A statistical decision problem  ðH; a; LÞ and P,
Groups of transformation of X(or  x)
Y ¼ gðXÞ = a transformation of X
gðxÞ ¼ a single valued function of x.
g: x! x , g = a transformation on  x
We assume that g is measurable so that g(x) is an r.v. g is said to be an onto
transformation if the range of g(x) is  x is 
x, i.e.  x.
g is said to be 1:1 if gðx1 Þ ¼ gðx2 Þ ) x1 ¼ x2 .
Example 7.31  x ¼ R1 ; gðxÞ ¼ x þ c; c ¼ a real constant. This g is 1:1 and onto.
The identity transformation e is defined as eðxÞ ¼ x. Let g1 ; g2 be two trans-
formations on  x. Then the composition of g2 ; g1 , denoted by g2 g1 is defined as
g2 g1 ðxÞ ¼ g2 ½g1 ðxÞ.
Example 7.32  x ¼ R1
g1 ðxÞ ¼ x þ c1 and g2 ðxÞ ¼ c2 , c1, c2 are real constants. g1 g2 ðxÞ ¼ x þ c1 þ c2
230 7 Statistical Decision Theory

Clearly, g1 g2 g3 ¼ g1 ðg2 g3 Þ ¼ ðg1 g2 Þg3


Also ge ¼ eg ¼ g
If g is a transformation on  x, then the inverse transformation of g, denoted by
g1 , is the transformation g such that
gg1 ¼ g1 g ¼ e.
In the example, g1 1 ðxÞ ¼ x  c1 .

Note g1 exists iff g is 1:1 and onto.


Let G = a class of transformation on 
x
Definition G is called a group of transformations if G is closed under the com-
positions and inverses, i.e. if

i. g1 ; g2 2 G ) g2 g1 2 G
ii. g 2 G ) g1 2 G.

Note Let G be a group of transformations, then every g 2 G is 1:1 and onto (since
g1 exists).
Also, the identity transformation e always 2 G [if g 2 G, then g1 2 G,
e ¼ g1 g 2 G].
Example 7.33  x ¼ R1
gc ðxÞ ¼ x þ c; c ¼ a real constant.
Let G ¼ fgc =  1\c\1g

gc1 ; gc2 2 G ) gc1 gc2 2 G½Asgc1 gc2 ðxÞ ¼ x þ c1 þ c2 ; c1 þ c2 ¼ c

 1 
gc 2 G ) g1
c 2 G Asgc ðxÞ ¼ x þ ðcÞ

Hence, G is a group of transformation which is Additive or Location group.


Example 7.34 
x ¼ R1 , gc ðxÞ ¼ cx where c = a positive real constant

gc1 gc2 ðxÞ ¼ c1 c2 x

1
g1
c ðxÞ ¼ x
c

Let G ¼ fge =0\c\1g

gc 1 ; gc 2 2 G ) gc 1 gc 2 2 G

gc 2 G ) g1
c 2G
7.7 Invariance 231

Thus G is a group of transformations.


These are multiplicative or group under scale transformation.
Example 7.35 
x ¼ R1 , ga;b ¼ a þ bx
 
G ¼ ga;b =  1\a\1; 0\b\1

G is a group transformation.
It is a group under both location and scale transformation.
Example 7.36 x ¼ f0; 1; 2. . .ng
Let gðxÞ ¼ n  x

G ¼ fe; gg

eg ¼ g 2 G; g1 ðxÞ ¼ x ¼ eðxÞ 2 G

Also e1 2 G [Trivially]


Hence, G is a group of transformation.
Example 7.37 x ¼ ðx1 ; x2 ; x3 ; . . .. . .xn Þ
x0 = The set of possible values of xi


x¼ x0 x. . .. . .. . .:x
x0 x x0

Let i ¼ ði1 ; i2 ; i3 ; . . .. . .in Þ be a permutation of 1; 2. . .n


Let gi ð xÞ ¼ ðxi1 ; xi2 . . .. . .xin Þ

G ¼ fgi =i 2 the set of all possible permutation of ð1; 2. . .nÞg

G is a group of transformations. It is a permutation group.


The invariance of a statistical decision problem is considered to be w.r.t a given
group transformations G on  x:

Invariance of P Let G = a given group of transformations on 


x:
Definition P ¼ fPh =h 2 Hg is said to be invariant w.r.t G if for any g 2 G and any
h 2 H ði:e:; any Ph 2 PÞ there exists a unique h0 2 H (i.e. a unique Ph0 2 P) such
that probability distribution of y ¼ gð xÞ is Ph0 when the probability distribution of
X is Ph .
This unique h0 determined by g and h is denoted by gðhÞ.
Example 7.38 X Nðh; 1Þ; 1\h\1

P ¼ fNðh; 1Þ=  1\h\1g

Let G ¼ fgc =  1\c\1g where gc ðxÞ ¼ x þ c


If X Nðh; 1Þ then Y ¼ gc ðxÞ Nðh þ c ¼ h0 ; 1Þ
232 7 Statistical Decision Theory

h0 is uniquely determined by c and h:


Thus P is invariant under G with

gc ðhÞ ¼ h þ c:
Example 7.39 X expðhÞ; 0\h\1
P.d.f of X under h is 1h eh ; x [ 0
x

P ¼ fexpðhÞ=0\h\1g

Let G ¼ fgc =0\c\1g where gc ðxÞ ¼ cx


If X expðhÞ, then gc ðxÞ expðchÞ, i.e. ch ¼ h0 : h0 is uniquely determined by
c and h. Thus P is invariant under G with gc ðhÞ ¼ ch:
Example 7.40 Let X Binðn; hÞ, n known, 0\h\1

P ¼ fBinðn; hÞ=0\h\1g

Let G = a group of transformations on  x ¼ fe; gg where gðxÞ ¼ n  x


If X Binðn; hÞ then eðxÞ Binðn; h ¼ h0 Þ and gðxÞ Binðn; 1  h ¼ h0 Þ
h0 is uniquely determined by h and member of G. Thus P is invariant under
G with eðhÞ ¼ h; 
gðhÞ ¼ 1  h:

Invariance of loss function

Let G = a group of transformations on 


x
Let P be invariant w.r.t G with induced group of transformations on H as
 ¼ f
G g=g 2 Gg:
Definition The loss function L is said to be invariant w.r.t G if for each g 2 G and
each a 2 , there exists a unique a0 2  such that

Lðh; aÞ ¼ LðgðhÞ; a0 Þ 8h 2 H:

This unique a0 determined by g and ‘a’ is denoted by gðaÞ:


Example 7.41 X Nðh; 1Þ; 1\h\1

G ¼ fgc =  1\c\1g; gc ¼ x þ c

P is invariant w.r.t
G with G  ¼ fgc =  1\c\1g; gc ðhÞ ¼ c þ h:
To estimate h under Lðh; aÞ ¼ ðh  aÞ2
For any gc 2 G; a 2 , there is an a0 ¼ a þ c 2 
such that Lðh; aÞ ¼ Lðgc ðhÞ; a0 Þ 8h 2 X:
a0 is uniquely determined by a and c. Hence the loss function is invariant w.r.t G.
7.7 Invariance 233

Example 7.42 X expðhÞ; 0\h\1

G ¼ fgc =0\c\1g; gc ðxÞ ¼ cx

P is invariant w.r.t. G with


G ¼ fgc =0\c\1g, gc ðhÞ ¼ ch:
 2
To estimate h with Lðh; aÞ ¼ 1  ah
For a0 ¼ ca, Lðh; aÞ ¼ Lðgc ðhÞ; a0 Þ 8h 2 X:
This a0 is uniquely determined by a and c. Hence the loss function is invariant w.
r.t. G.
Example 7.43 X Binðn; hÞ, 0\h\1

G ¼ fe; gg; eðxÞ ¼ x; gðxÞ ¼ n  x

P is invariant w.r.t. G with


 ¼ fe; 
G gg, eðhÞ ¼ h, gðhÞ ¼ 1  h:
To estimate h under squared error loss.
Then Lðh; aÞ ¼ LðeðhÞ; a0 Þ where a0 ¼ a
and Lðh; aÞ ¼ LðgðhÞ; a0 Þ where a0 ¼ 1  a: a0 is uniquely determined by a
member of G. Thus L is invariant w.r.t. G.
Invariance of a statistical decision problem:
A statistical decision problem  ðH; ; LÞ and P
G = A group of transformation of  x.
Definition A Statistical decision problem is said to be invariant under G if
(i) P is invariant under G
and (ii) L is invariant under G.
Thus as already shown
i. X Nðh; 1Þ to estimate h under squared error loss
G ¼ fge =  1\c\1g; ge ðxÞ ¼ x þ c

the problem is invariant under G.

ii: X expðhÞ; 0\h\1


 2
To estimate h under Lðh; aÞ ¼ 1  ah ; ge ðxÞ ¼ cx the problem is invariant
under G.
iii. X Binðn; hÞ; n is known, 0\h\1
To estimate h under squared error loss with G ¼ fe; gg
e(x) = x, g(x) = n − x, the problem is invariant under G.

Example 7.44 X Nðl; r2 Þ; 1\l\1; r2 [ 0


To test H0 : l  0 against H1 : l [ 0, i.e. h 2 H0 against h 2 H1
234 7 Statistical Decision Theory

H0 ¼ fh ¼ ðl; r2 Þ=l  0g

H1 ¼ fh ¼ ðl; r2 Þ=l [ 0g; H ¼ H0 þ H1

Let G ¼ fge =0\c\1g; gc ðxÞ ¼ cx


= A group of transformation on 
x

X Nðl; r2 Þ

) gc ðxÞ Nðc l; c2 r2 Þ 2 P

P is invariant under G with gc ðhÞ ðc l; c2 r2 Þ h 2 Hi ,


Note ge ðhÞ 2 Hi ; i ¼ 0; 1; 2; . . .. . .:ðiÞ
i.e. both P0 and P1 are invariant under G
where Pi ¼ fPh =h 2 Hi g; i ¼ 0; 1:
Also, Lðh; aÞ ¼ Lðgc ðhÞ; a0i Þ; i ¼ 0; 18h 2 H by (i)
⇒ Loss is invariant under G
Note To test H0 : h 2 H0 against H1 : h 2 H1 ; H0 ; H1 , disjoint, H0 þ H1 ¼ H

a ¼ fa0 ; a1 g; ai ¼ accept H i :

a0 a1
h 2 H0 0 L0
h 2 H1 L1 0

Let the loss function be 0–Li


Let G = a group of transformation on 
x

P ¼ fPh =h 2 Hg; Pi ¼ fPh =h 2 Hi g

Let both P0 and P1 be invariant under G, then P is invariant under G.


Also, h 2 Hi , gðhÞ 2 Hi ; i ¼ 0; 1
Hence, Lðh; ai Þ ¼ LðgðhÞ; ai Þ; i ¼ 0; 18h 2 H
L is invariant under G
A test of hypothesis problem (with 0–Li loss) is said to be invariant under G if
both P0 and P1 are invariant under G.

Invariant decision rule

Let G = a group of transformation on x. The problem is invariant under G with


corresponding group of induced transformations on H and a.
7.7 Invariance 235

Let g 2 G

Let dðXÞ ¼ a be reasonable n.r. rule for the original problem. dðgðxÞÞ should be
a reasonable rule for the transformed problem. Also if for X = x, dðxÞ 2  is a
reasonable action for the original problem, then for gðXÞ ¼ gðxÞ; ~gðdðXÞÞ should be
a reasonable action in the transformed problem.
These two agree if dðgðxÞÞ ¼ ~gðdðxÞÞ. . .. . .:ðiiÞ
A non-randomized rule is said to be an invariant non-randomized rule if
(ii) holds 8x 2 x8g 2 G.
We thus get a class of n.r. decision rules as
DI = the class of invariant n.r. rules.
Appendix

A.1 Exact Tests Related to Binomial Distribution

A.1.1 We have an infinite population for which π = unknown proportion of indi-


viduals having certain character, say A. We are to test H0 : p ¼ p0 .
For doing this we draw a sample of size n. Suppose x = no. of individuals in the
sample have character A. The sufficient statistic x is used for testing H0 : p ¼ p0 .
Suppose x0 is the observed value of x. Then x  binðn;! pÞ.
P n
(a) H1 : p [ p0 ; x0 : P½x  x0 =H0   a i:e:; px0 ð1  p0 Þnx  a
x  x0 x
!
P n
(b) H2 : p\p0 ; x0 : P½x  x0 =H0   a i:e:; px0 ð1  p0 Þnx  a
x  x0 x

(c) H3 : p 6¼ p0 ; where p0 ¼ 12 may be of our interest.


h n i

x0 : P x    d0 =H0  a
h 2 i h i
n n
i:e:; P x  þ d0 =H0 þ P x   d0 =H0  a
2 ! 2 !
 n  n 
X n 1 X n 1  n
i:e:; þ  a where d0 ¼ x0  
x  n2 þ d0 x 2 x  n2d0 x 2 2

Note
(1) For other values of π0 the exact test cannot be obtained as binomial distri-
bution is symmetric only when p ¼ 12.
(2) For some selected n and π the binomial probability sums considered above are
given in Table 37 of Biometrika (Vol. 1)

© Springer India 2015 237


P.K. Sahu et al., Estimation and Inferential Statistics,
DOI 10.1007/978-81-322-2514-0
238 Appendix

A.1.2 Suppose we have two infinite populations with π1 and π2 as the unknown
proportion of individuals having character A. We are to test H0 : p1 ¼ p2 .
To do this we draw two samples from two populations having sizes n1 and n2.
Suppose x1 and x2 as the random variables denoting the no. of individuals in the 1st
and 2nd samples with character A.
To test H0 : p1 ¼ p2 we make use of the statistics x1 and x2 such that x1 þ x2 ¼ x
(constant), say.
Under H0 : p1 ¼ p2 ¼ p (say),
!
n1
f ðx1 Þ ¼ p:m:f: of x1 ¼ px1 ð1  pÞn1 x1
x1
!
n2
f ðx2 Þ ¼ p:m:f: of x2 ¼ px2 ð1  pÞn2 x2
x2
 
n1 þ n2
f ðxÞ ¼ p:m:f: of x ¼ px ð1  pÞn1 þ n2 x :
x

The conditional distribution of x1 given x has p.m.f.


! !
n1 n2
x1 x2
f ðx1 =xÞ ¼   , which is hypergeometric and independent of p.
n1 þ n2
x
Suppose the observed values of x1 and x are x10 and x0 respectively.

(a) H1 : p1 [ p2 ; x0 : P½x1  x10 =x ¼ x0   a


! 
n1 n2
X x1 x0  x1
i:e:;   a
x1  x10 n1 þ n2
x0

(b) H2 : p1 \p2 ; x0 : P½x1  x10 =x ¼ x0   a


! 
n1 n2
X x1 x0  x1
i:e:;   a
x1  x10 n1 þ n2
x0
Appendix 239

(c) H3 : p1 6¼ p2 ; exact test is not available.


Note The above probabilities can be obtained from the tables of hypergeometric
distributions (Standard University Press).

A.2 Exact Tests Related to Poisson Distribution

A.2.1 Suppose we have a Poisson population with unknown parameter k. We draw


a random sample ðx1 ; x2 ; . . .; xn Þ of size n from this population. Here, we are to test
H 0 : k ¼ k0 . P
To develop a test we make use of the sufficient statistic y ¼ ni¼1 xi ; which is
itself distributed as Poisson with parameter nk. The p.m.f. of y under H0 is therefore
y
f ðyÞ ¼ enk0 ðnky!0 Þ ; y ¼ 0; 1; 2. . .
Suppose y0 is the observed value of y.
(a) H1 : k [ k0 ; x0 : P½y  y0 =k ¼ k0   a

X ðnk0 Þy
i:e:; enk0  a:
y  y0
y!

(b) H2 : k\k0 ; x0 : P½y  y0 =k ¼ k0   a

X ðnk0 Þy
i:e:; enk0  a:
y  y0
y!

(c) H3 : k 6¼ k0 : exact test is not available.


Note These probabilities may be obtained from Table 7 of Biometrika (Vol. 1)
A.2.2 Suppose we have two populations Pðk1 Þ and Pðk2 Þ. We draw a random
sample ðx11 ; x12 ; . . .; x1n1 Þ of size n1 from Pðk1 Þ and another random sample
ðx21 ; x22 ; . . .; x2n2 Þ of size
Pn12 from Pðk2 Þ. We are to Ptest H0 : k1 ¼ k2 ¼ k (say).
Here we note that y1 ¼ ni¼1 x1i  Pðn1 k1 Þ and y2 ¼ ni¼1
2
x2i  Pðn2 k2 Þ.
To develop a test we shall make use of the sufficient statisticsy1 and y2 but shall
concentrate only on those for which y ¼ y1 þ y2 = constant. Under H0 the p.m.f. of
y1 ; y2 and y are

ðn1 kÞy1 ðn2 kÞy2 fðn1 þ n2 Þkgy


f ðy1 Þ ¼ en1 k ; f ðy2 Þ ¼ en2 k and f ðyÞ ¼ eðn1 þ n2 Þk
y1 ! y2 ! y!
240 Appendix

The conditional distribution of y1 given y has the p.m.f. as


yy1 y1
en2 k ðnðyy
2 kÞ
1 Þ!
en1 k ðn1ykÞ
1!
f ðy1 =yÞ ¼ fðn1 þ n2 Þkgy
eðn1 þ n2 Þk y!
y! ny11 ny22
¼
y1 !y2 ! ðn1 þ n2 Þy
!  y1  y2  
y n1 n1 n1
¼ 1  bin y; free of k:
y1 n1 þ n2 n1 þ n2 n1 þ n2

So this may be regarded as sufficient statistic. Suppose the observed values of y1


and y are y10 and y0 respectively. We consider the conditional p.m.f. f ðy1 =y0 Þ for
testing H0 .

(a) H1 : k1 [ k2 ; x0 : P½y1  y10 =y ¼ y0   a

! y1  y0 y1


X y0 n1 n2
i:e:; a
y1  y10 y1 n1 þ n2 n1 þ n2

(b) H2 : k1 \k2 ; x0 : P½y1  y10 =y ¼ y0   a

! y1  y0 y1


X y0 n1 n2
i:e:;  a:
y1  y10 y1 n1 þ n2 n1 þ n2

(c) H3 : k 6¼ k0 : exact test is not available.

A.3 A Test for Independence of Two Attributes

In many investigations one is faced with the problem of judging whether two
qualitative characters, say A and B, may be said to be independent. Let us denote the
forms of A by Ai fi ¼ 1ð1Þkg and the forms of B by Bj fj ¼ 1ð1Þlg, and the prob-
ability associated with the cell Ai Bj in the two-way classification of the population
Appendix 241
P
by pij . The probability associated with Ai is then pi0 ¼ j pij and that associated
P
with Bj is p0j ¼ i pij . We show the concerned distribution in the following table:

A B Total
B1 B2 …. Bj …. Bl
A1 p11 p12 …. pij …. p1l p10
A2 p21 p22 …. p2j …. p2l p20
. . . . . .
. . . . . .
Ai pi1 pi2 …. pij …. pil pi0
. . . . . .
. . . . . .
Ak pk1 pk2 …. pkj …. pkl pk0
Total p01 p02 …. p0j …. p0l 1

 
where pij ¼ P A ¼ Ai ; B ¼ Bj  8ði; jÞ
 
pi0 ¼ PðA ¼ Ai Þ and p0j ¼ P B ¼ Bj

We are to test H 0 : A and B are independent , H 0 : pij ¼ pi0  p0j 8ði; jÞ


To do this we draw a random sample of size n. Let nij P = observed frequency for
the cell Ai Bj . The marginal frequency of Ai is ni0 ¼ j nij and that of Bj is
P
n0j ¼ i nij . Note that the joint p.m.f. of nij is multinomial, i.e.
  YY
i ¼ 1ð1Þk i ¼ 1ð1Þk n!
f nij ; pij ; ¼Q Q ðpij Þnij :
j ¼ 1ð1Þl j ¼ 1ð1Þl i j ðn ij Þ! i j

Under H 0 : pij ¼ pio  poj 8ði; jÞ


  Y Y
i ¼ 1ð1Þk n!
f nij ; ¼Q Q ðpio Þnio ðpoj Þnoj
j ¼ 1ð1Þl i j ðn ij Þ! i j
n! Y
f ðni0 Þ ¼ Q ðpi0 Þni0 8i ¼ 1ð1Þk
i ðn i0 Þ! i
n! Y
f ðn0j Þ ¼ Q ðp0j Þn0j 8j ¼ 1ð1Þl
j ðn 0j Þ! j

The conditional Q distribution


Q of nij keeping marginals fixed is, under
f ðnij Þ ðni0 Þ! ðn0j Þ!
H 0 ; f ðni0 Þf ðn0j Þ ¼ n!i Q Q ðnj Þ!
i j ij
242 Appendix

This may be used for testing H 0 . Keeping marginal frequencies fixed we change
the cell-frequencies and calculate the corresponding probabilities. If the sum of the
probabilities  a, then we reject H 0 .

A.4 Problems Related to Univariate


Normal Distribution

Suppose we have a normal population with mean l and standard deviation r. We


draw a random sample ðx1 ; x2 ; . . .; xn Þ of size n from this population. Here x ¼
Pn P P
1
n 1 xi ; s ¼ n
2 1
xÞ2 and s02 ¼ n 1 1 i ðxi  xÞ2 .
i ðxi  
A.4.1 To test H 0 : l ¼ l0 .
pffiffi
Case I r known: we note that nðxlÞ r  Nð0; 1Þ
pffiffi
Under H 0 ; s ¼ nðxr l0 Þ  Nð0; 1Þ:

H 1 : l [ l0 ; x0 : s [ sa
H 2 : l\l0 ; x0 : s\sa
H 3 : l 6¼ l0 ; x0 : jsj [ sa=2

100ð1  aÞ% confidence interval for l (when H 0 is rejected) is x prffiffin sa=2
pffiffi
Case II r unknown: Here we estimate r by s’ and nðxs0 lÞ  tn1 .
pffiffi
Under H 0 t ¼ nðxs0 l0 Þ  tn1 :

H 1 : l [ l0 ; x0 : t [ ta;n1
H 2 : l\l0 ; x0 : t\ta;n1
H 3 : l 6¼ l0 ; x0 : jtj [ ta=2;n1
0

100ð1  aÞ% confidence interval for l is x ps ffiffin ta=2 ; n  1
A.4.2 To test H 0 : r ¼ r0P
: P
ðxi  lÞ2 ðxi  lÞ2
Case I l known: we know r2  v2n , under H 0 ; v2 ¼ r20
 v2n .

H 1 : r [ r0 ; x0 : v2 [ v2a;n
H 2 : r\r0 ; x0 : v2 \v21a;n
H 3 : r 6¼ r0 ; x0 : v2 [ v2a=2;n
Appendix 243

or,

v2 \v21a=2;n
" P #
ðxi  lÞ2
P v1a=2;n \
2
\va=2;n ¼ 1  a
2
r2
"P P #
ðxi  lÞ2 ðxi  lÞ2
i:e:; P \r \2
¼1a
v2a=2;n v21a=2;n

P)100ð1  aÞ% confidence


P
interval for r2 when l is known is
ðxi lÞ2 ðxi lÞ2
v 2 ; v2
.
a=2;n 1a=2;n
P P
ðxi  xÞ2 ðxi xÞ2
Case II l unknown: we know r2  v2n1 under H 0 ; v2 ¼ r20
 v2n1

H 1 : r [ r0 ; x0 : v2 [ v2a;n1
H 2 : r\r0 ; x0 : v2 \v21a;n1
H 3 : r 6¼ r0 ; x0 : v2 [ v2a=2;n1 :

or,

v2 [ v21a=2;n1
" P #
ðxi  xÞ2
P v1a=2;n1 \
2
\va=2;n1 ¼ 1  a
2
r2
"P P #
ðxi  xÞ2 ðxi  xÞ2
i:e., P \r \ 2
2
¼1a
v2a=2;n1 v1a=2;n1

P) 100ð1 P
 aÞ% confidence interval for r2 when l is unknown is
ðxi xÞ2 ðxi xÞ2
v2
; v2 .
a=2;n1 1a=2;n1

A.5 Problems Relating Two Univariate Normal


Distributions

Suppose we have two independent populations Nðl; r21 Þ and Nðl2 ; r22 Þ. We draw a
random sample ðx11 ; x12 ; . . .; x1n1 Þ of size n1 from the first population and another
random sample ðx21 ; x22 ; . . .; x2n2 Þ of size n2 from the second population.
244 Appendix

Now, we have for the 1st and the 2nd samples

1X n1
1X n2
x1 ¼ x1i and x2 ¼ x2i
n1 i¼1 n2 i¼1
1 X n1
1 X n2
s02
1 ¼ ðx1i  x1 Þ2 s02
2 ¼ ðx2i  x2 Þ2
n1  1 i¼1 n2  1 i¼1

respectively.
(I) H 0 : 11 l1 þ 12 l2 ¼ 13 :
Case I r1 ; r2 known:
11x1 þ 12x2  ð11 l1 þ 12 l2 Þ
We find that sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  Nð0; 1Þ
121 r21 122 r22
þ
n1 n2
11x1 þ 12x2  13
Under H 0 ; s ¼ sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  Nð0; 1Þ
121 r21 122 r22
þ
n1 n2

) H 1 : 11 l1 þ 12 l2 [ 13 ; x0 : s [ sa
H 2 : 11 l1 þ 12 l2 \13 ; x0 : s\sa
H 3 : 11 l1 þ 12 l2 6¼ 13 ; x0 : jsj [ sa=2

Also, ð1  aÞ100 % confidence interval for ð11 l1 þ 12 l2 Þ is


2 sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 3
1 2 r2 1 2 r2
411x1 þ 12x2 1 1
þ 2 2 sa=2 5
n1 n2

Case II r1 ; r2 unknown:
Fisher’s t-test: We assume r1 ¼ r2 ¼ r, say.
ðn1 1Þs02 02
1 þ ðn2 1Þs2
r2 is estimated by ðn1 þ n2 2Þ ¼ s02 say

11x1 þ 12x2  ð11 l1 þ 12 l2 Þ


Also, sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  tn1 þ n2  2
2 2
1 1
s0 1
þ 2
n1 n2
11x1 þ 12x2  13
Under H 0 ; t ¼ sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  tn1 þ n2 2
121 122
s0 þ
n1 n2
This t is known as Fisher’s t when 11 ¼ 1, 12 ¼ 1.
Appendix 245

H 1 : 11 l1 þ 12 l2 [ 13 ; x0 : t [ ta;n1 þ n2 2
H 2 : 11 l1 þ 12 l2 \13 ; x0 : t\  ta;n1 þ n2 2
H 3 : 11 l1 þ 12 l2 6¼ 13 ; x0 : jtj [ ta=2 ;n1 þ n2 2

Also 100(1 − α)% confidence interval for 11 l1 þ 12 l2 is


0 sffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1
1 2 1 2
@11x1 þ 12x2 s0 1
þ 2 ta=2;n1 þ n2 2 A:
n1 n2

Note–I The above procedure may also be applicable when r1 and r2 are not
 r2 
equal provided 1  r12 \ 0.4—theoretical investigation in this area verifies this.
2
Note–II when homoscedasticity assumption r1 ¼ r2 is not tenable then we
require the alternative procedure and the corresponding problem is known as the
Fisher-Behren problem.
Note–III For 11 ¼ 1 and 12 ¼ 1 we get the test procedure for the difference
between the two means. Also for testing the ratio of the means, i.e. for testing
H 0 : ll1 ¼ k; say, we start with ðx1  kx2 Þ.
2
(II) H 0 : rr12 ¼ n0 :
P
1
ðx1i l1 Þ2
Case I l1 ; l2 known: n11 P ðx l Þ2 : r12  F n1 ;n2
n1 2i 2 1
r2
2
P
ðx1i l Þ2 =n1
) Under H 0 ; F ¼ P ðx l1 Þ2 n : n12  F n1 ;n2
2i 2 = 2 0

r1
H1 : [ n0 ; x0 : F [ F a;n1 ;n2
r2
r1
H 2 : \n0 ; x0 : F\F 1a;n1 ;n2
r2
r1
H3 : 6¼ n0 ; x0 : F [ F a=2;n1 ;n2 or; F\F 1a=2;n1 ;n2 :
r2
P 
ðx1i l1 Þ2 =n1 r22
P
Also, P F 1a=2;n1 ;n2 \ ðx l Þ2 n : r2 \F a=2;n1 ;n2 ¼ 1  a
2i 2 = 2 1

rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
P rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
P 
ðx1i  l1 Þ2 ðx1i  l1 Þ2
Or, P P n2 r1
\ r2 \ n P n2
¼1a
n 1 ðx  l Þ2 F
2i 2 a=2;n1 ;n2 ðx  l Þ2 F
1 2i 2 1a=2;n1 ;n2

r1
This provides the 100ð1  aÞ% confidence interval for r2 when l1 ; l2 are
known.
Case II l1 ; l2 unknown:
P
1
ðx1i x1 Þ2
We have n111 P ðx x Þ2 : r12  F n1 1;n2 1
n2 1 2i 2 1
r2
2
246 Appendix

s02
1 r2
2
i:e:; :  F n1 1;n2 1
s02
2 r1
2

s02
under H 0 ; F ¼ s102 : e12  F n1 1;n2 1
2 o

r1
) H1 : [ n0 ; x0 : F [ F a;n1 1;n2 1
r2
r1
H2 : \n0 ; x0 : F\F 1a;n1 1;n2 1
r2
r1
H3 : 6¼ n0 ; x0 : F [ F a=2;n1 1;n2 1 or F\F 1a=2;n1 1;n2 1 :
r2

2 3
6 s02 7
Also, P4F 1a=2;n1 1;n2 1 \ s102 : 1 2 \F a=2;n1 1;n2 1 5 ¼ 1  a
2 r1
r2

rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
s02 r1 s02
Or, P s02 F
1
\ r2 \ s02 F
1
¼1a
2 a=2;n1 1;n2 1 2 1a=2;n1 1;n2 1

i.e., 100ð1  aÞ% confidence interval for rr12 , when l1 ; l2 are unknown, is
2 3
 
s01 0 s01 0
6 s2 s2 7
4qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ffi ; qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi5:
F a=2;n1 1;n2 1 F 1a=2;n1 1;n2 1

A.6 Problems Relating to Bivariate Normal


Distributions

Suppose in a given population the variables x and y are distributed in the bivariate
normal form N 2 ðlx; ly ; rx ; ry ; qÞ. Let ðx1 ; y1 Þ; ðx2 ; y2 Þ; . . .; ðxn ; yn Þ be the values of
x and y observed in a sample of size n drawn from this population. We shall
suppose that the n pairs of sample observations are random and independent. We
shall also assume that all the parameters are unknown.
We have for the sample observations
1X 1X 1 X
x¼ xi ; y ¼ yi ; s02
x ¼ ðxi  xÞ2 ;
n i n i n1 i
P
1 X
2 2
i ðxi  xÞ ðyi  yÞ
1
02 2
sy ¼ ðyi  yÞ ; and r xy ¼ n1
n1 i s0x s0y
Appendix 247

(1) To test H 0 : q ¼ 0:
pffiffiffiffiffiffi
We know when q ¼ 0; t ¼ rpffiffiffiffiffiffiffi
n2ffi
2
 tn2
1r

H 1 : q [ 0; x0 : t [ ta;n2
H 2 : q\0; x0 : t [  ta;n2
H 3 : q 6¼ 0; x0 : jtj [ ta=2;n2

Note For testing q ¼ q0 ð6¼0Þ, exact test is difficult to get as for q 6¼ 0 the
distribution of r is complicated in nature. But for moderately large n one can use the
large sample test which will be considered later.
(2) H 0 : lx  ly ¼ n0
Define z ¼ x  y ) lz ¼ lx  ly i.e., we are to test H 0 : lz ¼ n0 . Also
pffiffi P
nðzlz Þ
note that s0z  tn1 where s02
z ¼ n1
1
zÞ2 ¼ s02
i ðzi  
02 0
x þ sy  2sxy
P pffiffi
nðzn0 Þ
s0xy ¼ n1
1 2 2
i ðxi xÞ ðyi yÞ . Under H 0 ; t ¼ s0  tn1 :
z

For H 1 : lx ly [ n0 ; x0 : t [ ta;n1

H 2 : lx ly \n0 ; x0 : t\  ta;n1


H 3 : lx ly 6¼ n0 ; x0 : jtj [ ta=2;n1

s0
Also, 100ð1  aÞ% confidence interval for lz ¼ lx  ly is z pzffiffin ta=2;n1
(3) H 0 : llx ¼ g0 : we write g¼ llx .
y y

To test H 0 : g ¼ g0 , we take z ¼ xgy ) lz ¼ lx  gly ¼ 0.


z ¼ x  gy = a function of g.
s02 02 2 02 0
z ¼ sz þ g sy  2gsxy = a function of g.
pffiffi pffiffi
nðzlz Þ
Now, s0z  tn1 : i:e:; sn0 z  tn1 ð*lz ¼ 0Þ
pffiffi z

Under H 0 ; t ¼ sn0 z0  tn1 where z0 ¼ x  g0y


z0

s02 02 02 0
z0 ¼ sx þ g0 sy  2g0 sxy

So for H 1 : llx [ g0 ; x0 : t [ ta;n1


y

lx
H2 : \g0 ; x0 : t \  ta;n1
ly
l
H 3 : x 6¼ g0 ; x0 : jtj [ ta=2;n1 :
ly
248 Appendix

h pffiffi i
Again P ta=2;n1 \ sn0 z \ta=2;n1 ¼ 1  a
z
hpffiffi  i
 nz
i.e., P  s0 \ta=2;n1 ¼ 1  a or P½wðgÞ\0 ¼ 1  a.
 2
z
z2
Solving the equation wðgÞ ¼ n s02  ta=2;n1 ¼ 0 which is a quadratic equation
z
in g, one can get two roots g1 and g2 ð [ g1 Þ. Now if wðgÞ is a convex function and
g1 and g2 are real, then P½g1 \g\g2  ¼ 1  a. If wðgÞ is a concave function, then
P½g\g1 ; g [ g2  ¼ 1  a. But if g1 and g2 be imaginary then from the given sample
100ð1  aÞ%Confidence interval does not exist.
(4) Test for the ratio n ¼ rrxy :
We write u ¼ x þ ny; v ¼ x  ny

) Covðu; vÞ ¼ r2x  n2 r2y ) quv ¼ 0


pffiffiffiffiffiffi
Then, rp
uv ffiffiffiffiffiffiffiffi
n2ffi
2
 tn2
1ruv
P
1
ðui uÞðvi vÞ
where r uv ¼ n su sv = a function of n. We are to test H 0 : rrxy ¼ n0 , i.e.
H 0 : n ¼ n0 .
pffiffiffiffiffiffi
r0uv n2
) under H0, t ¼ p ffiffiffiffiffiffiffiffiffi
02
 tn 2
1ruv
where r 0uv ¼ value of r uv under n ¼ n0 .
For H 1 : n [ n0 ; x0 : t [ ta;n2
H 2 : n\n0 ; x0 : t\ta;n2
H 3 : n 6¼ n0 ; x0 : jtj [ ta=2;n2 :
pffiffiffiffiffiffi

Also, P ta=2;n2 \ rp
uv ffiffiffiffiffiffiffiffi
n2ffi
2
\t a=2;n2 ¼ 1  a
1ruv
r2 ðn2Þ
Solving the equation wðnÞ ¼ uv1r2  t2a=2;n2 ¼ 0, (which is a quadratic in n)
uv
one can get two roots n1 and n2 ð [ n1 Þ. If these roots are real and wðnÞ is a convex
function, then Pðn1 \n\n2 Þ ¼ 1  a. Again if wðnÞ is concave,
Pðn\n1 ; n [ n2 Þ ¼ 1  a. But if n1 and n2 are not real, then 100ð1  aÞ%
Confidence interval does not exist so far as the given sample is concerned.
(5) rx ; ry ; q are known:
H 0 : lx ¼ l0x ; ly ¼ l0y against H1 : H0 is not true. We know that
" 2
     #
1 x  lx x  lx y  ly y  ly 2
Qðx; yÞ ¼ 2q þ  v22
1  q2 rx rx ry ry
!
r2x r2y
ðx; yÞ  N 2 lx ; ly ; ; ; q
n n
"       #
n x  lx 2 x  lx y  ly y  ly 2
) Qðx; yÞ ¼ 2q þ  v22
1  q2 rx rx ry ry
Appendix 249

Under H 0 ,
2 ! ! 3
    0 2
n x  l 0 2
x  l 0 y  l 0
y  l
v2 ¼ 4 x
2q x y
þ
y 5  v2
1  q2 rx rx ry ry 2

Hence, the critical region is x0 : v2 [ v2a;2 .

A.7 Problems Relating to k-Univariate Normal


Distributions

Suppose there are k-populations Nðl1 ; r21 Þ; Nðl2 ; r22 Þ; . . .Nðlk ; r2k Þ. We draw a
random sample of size ni from the ith population with ni (  2 for at least one i).
Define
xij ¼ jth observation of ith sample, i = 1,2,...,k; j = 1,2,..., ni
Pi
xi ¼ ith sample mean ¼ n1i nj¼1 xij
02
Pi  2
si ¼ ith sample variance ¼ ni 1 nj¼1
1
xij  xi
(I) We are to test H 0 : l1 ¼ l2 ¼    ¼ lk ð¼lÞ, say against H 1 . There is at least
one inequality in H 0 .
Assumption r1 ¼ r2 ¼    ¼ rk ð¼rÞ say.
Note that xi  N li ; rni
2

pffiffiffi
n ðx l Þ
) i ri i  Nð0; 1Þ 8i and are independent.
ðn 1Þs02
Also, i r2 i  v2ni 1 (xi and s0i are independent.)
Under H 0
X k
ni ðxi  lÞ2
 v2k 
i¼1
r2
these two v2 are independent.
Xk
ðni  1Þs02
and i
 v2nk
i¼1
r2

But the unknown l is estimated by


1X X

l nixi ¼ xðsayÞ; n ¼ ni
n i

Pk P
) Under H 0 ; ni ðxi  xÞ2  r2 v2k1 and ki¼1 ðni  1Þs02
i  r vnk .
2 2
P
i¼1
2
ni ðxi  xÞ =
) Under H 0 ; F ¼ P k  1  F k1;nk .
02
i ðn i  1Þs i =n  k
x0 : F [ F a;k1;nk . If H 0 is rejected, then we may be interested to test H 0 : li ¼
lj against H 1 : li 6¼ lj 8ði; jÞ.
250 Appendix

  
  2 1 1
xi  xj  N li  lj ; r þ
ni nj
xi  xj  ðli  lj Þ
) qffiffiffiffiffiffiffiffiffiffiffiffiffi  Nð0; 1Þ
r n1i þ n1j
P
ðni 1Þs02 ðxi xpj Þðli lj Þ
ffiffiffiffiffiffiffiffi
Unknown r2 is estimated by r ^2 ¼ n  k i ¼ s02 , say )  tnk .
s0 ni þ nj
1 1
ðp xi xj Þ
) under H 0 ; t ¼ 0 ffiffiffiffiffiffiffiffi  tnk :
ni þ nj
1 1
s
 
) x0 : jtj [ ta=2;nk . Also, 100ð1  aÞ% confidence interval for li  lj is
n  qffiffiffiffiffiffiffiffiffiffiffiffiffi o
xi xj s0 n1i þ n1j ta=2;nk .
(II)Bartlett’s test To test H 0 : r1 ¼ r2 ¼    ¼ rk ð¼ rÞ, say against H 1 : There
is at least one inequality in H 0 .
P
Define ci ¼ ni  1 and c ¼ ki¼1 ci ¼ n  k. Bartlett’s test statistic M is such that
( )
X
k
c s02 X
k
M ¼ c loge i i
 ci loge s02
i¼1
c i¼1
i

Under H 0 M  v2k1 (approximately) provided none of ci is small. For small


P
samples M 0 ¼ n M o  v2k1 under H 0 where c1 ¼ ki¼1 c1  1c and
c i
1 þ 3ðk1Þ
1

x0 : M 0 [ v2a;k1 :

A.8 Test for Regression

Suppose the sample values of x and y are arranged in arrays of y according to the
fixed values of x as given below:

x1 x2 . . . xi ... xk
y1 y21 . . . yi1 ... yk1
y12 y22 . . . yi2 yk2
: : ...
: :
: :
y1n1 y2n1 yini ... yknk
Appendix 251

Pni P
Define yi0 ¼ 1
ni j¼1 yij ; y00 ¼ 1n i niyi0 ¼ y

1X X
x¼ ni x i ; n ¼ ni
n i i
P
ni ðy  y00 Þ2
e2yx ¼ P iP i0
i j ð yij  y00 Þ2
qffiffiffiffiffiffi
eyx ¼ þ e2yx ¼ sample correlation ratio:
P P
j ðyij   y00 Þðxi  xÞ
1
i
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
r ¼ rn n
on P offi
P P
j ðyij  
2 2
1
n i y00 Þ 1
n i ni ð x i  x Þ

 
We assume yij xi  N 1 ðli ; r2 Þ , i.e. Eðyij xi Þ ¼ li: .
(I) Test for regression: H 0 There does not exist any regression of y on x.

, H 0 : l1 ¼ l2 ¼    ¼ lk :
qffiffiffiffiffiffi
Define g2yx ¼ VðEðy=xÞÞ
VðyÞ ; gyx ¼ þ g2yx ¼ population correlation ratio.
) To test H 0 is equivalent to test H 0 : g2yx ¼ 0 against H 1 : g2yx [ 0
We note that
XX XX X
ðyij  y00 Þ2 ¼ ðyij  yi0 Þ2 þ ni ðyi0  y00 Þ2
i j i j i

Under H 0
XX X
SSB ¼ e2yx ðyij  y00 Þ2 ¼ ni ðyi0  y00 Þ2  r2  v2k1
i j i
XX XX
SSw ¼ 1  e2yx ðyij  y00 Þ2 ¼ ðyij  yi0 Þ2  r2 :v2nk
i j i j
h i
=ðk1Þ e2yx B =ðk1Þ
) Under H 0 :F ¼ ð1e2 Þ nk  F k1;nk : F ¼ SS
yx =
SSW =nk

) x0 : F [ F a;k1;nk :

(II) If H 0 is rejected then we may be interested in testing whether the regression


is linear, i.e. we are to test

H 0 : li ¼ a þ bxi 8i
H 1 : li 6¼ a þ bxi
252 Appendix

P P P
We note that, e2yx ðyij  y00 Þ2 ¼ i ni ðyi0  y00 Þ2
nP P i j
o2
P P ðy 
y Þ ðx i xÞ
^2 P ni ðxi xÞ2
Pj
ij 00
Also, r 2 i j ðyij  y00 Þ2 ¼ ¼b
i

ni ðxi xÞ2
PP i
ðyij y00 Þðxi xÞ

where b i Pj
2
ni ðxi xÞ
P P
i
P
) e2yx r 2
ðyij  y00 Þ2 ¼ ^ 2 P ni ðxi xÞ2  r2 :v2 under
ni ðyi0  y00 Þ2  b
i j i i k2

H0. P P
P P
Also, e2yx i j ðyij  y00 Þ2 and e2yx  r 2 i y00 Þ2 are independent.
j ðyij  
ðe2yx r2 Þ=ðk2Þ
) under H 0 ; F ¼ ð1e2yx Þ=nk
 F k2;nk

) x0 : F [ F a;k2;nk

A.9 Tests Relating to Simple Linear


Regression Equation

Regression of y on x is established and it is linear, i.e. Eðy=xÞ ¼a þ bx; say

) Eðy=x ¼ xi Þ ¼ a þ bxi ; i ¼ 1ð1Þn:


y=x  Nða þ bx; r2 Þ

Least square (LS) regression line is given by


PY = a + bx, where a, b are the LS
ðy yÞðxi xÞ
estimates of α and β, i.e. a ¼ y  bx and b ¼ Pi ðx xÞ2 ¼ Sxyxx :
S

2 i

)y  N a þ bx; rn and b  N b; Srxx . Also they are independent.


2

) ‘a’ is normal with EðaÞ ¼ EðyÞ  xEðbÞ ¼ a.

r2 r2 r2   r2 X 2
VðaÞ ¼ VðyÞ þ x2 VðbÞ ¼ þ x2 ¼ Sxx þ nx2 ¼ xi
n Sxx nsxx nsxx

x2
i.e., a  N a; r2 1n þ Sxx
a  a0
H 01 : a ¼ a0 : under H 01 , t ¼ sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  tn2
1 x2
^
r þ
n Sxx
P 2
^ ¼
where r 2
ðyi  a  bxi Þ =ðn  2Þ
Appendix 253

) H 11 : a [ a0 ; x0 : t [ ta;n2
H 21 : a\a0 ; x0 : t\ta;n2
H 31 : a 6¼ a0 ; x0 : jtj [ ta=2;n2 :
qffiffiffiffiffiffiffiffiffiffiffiffiffi
Also, 100ð1  aÞ% confidence interval for a is a r ^ 1n þ Sxxx ta=2 ; n  2 :
2

pffiffiffiffiffi
H 02 : b ¼ b0 : under H 02 ; t ¼ ðbb0r^Þ Sxx  tn2

) H 12 : b [ b0 ; x0 : t [ ta;n2
H 22 : b\b0 ; x0 : t\ta;n2
H 32 : b 6¼ b0 ; x0 : jtj [ ta=2;n2 :

Also, 100ð1  aÞ% confidence interval for b is


 
^
r
b pffiffiffiffiffiffi  ta=2;n2
Sxx

H 03 : a ¼ a0 ; b¼b0 : Covða; bÞ ¼ Cov (y  bx; bÞ

r2
¼ xVðbÞ ¼  x :
Sxx
! ( ! P 2 !)
r 2
a a xi
 rSxxx
2

)  N2 ; nSxx
b b
 rSxxx r 2 2
Sxx
! ( ! P 2  )
a a xi x  n
; nSr xx
2
i.e.,  N2
b b x  n n
P 2  X
r 2
xi x  n
Let ¼
nSxx x  n n
!0 !
aa X1 aa
)  v22 :
bb bb
 
n Pnx
P r2
P1 adj nSxx
nx x2
Now, ¼ j P j ¼ r2 2 P 2 i2 2
ðnSxx Þ ðn xi  n x Þ
254 Appendix

  
nSxx n nx 1 n nx
¼ 2 P 2
¼ 2 P 2
r nSxx nx xir nx xi
!0 ! !0   !
a  a X1 a  a 1 aa n nx aa
) ¼ 2 P 2
bb bb r bb nx xi bb
h X i
) nða  aÞ2 þ 2nxða  aÞðb  bÞ þ ðb  bÞ2 x2i  r2 v22
P
Again, n1 ðyi  a  bxi Þ2  r2  v2n2
) under H 03 ,
n P o.
nðaa0 Þ2 þ 2nxða  a0 Þðb  b0 Þ þ ðb  b0 Þ2 x2i 2
F¼ P .  F 2;n2
ðyi  a  bxi Þ2 ðn  2Þ
) w0 : F [ F a;2;n2 :

A.10 Tests Relating to Multiple and Partial


Correlation Coefficient
 
P
Suppose x px1  N p l ; pxp
 
q1:23...p = population multiple correlation coefficient of X 1 on X 2 ; X 3 ; . . .; X p
r 1:23...p = sample multiple correlation coefficient of X 1 on X 2 ; X 3 ; . . .; X p based on
a sample of size n ð  p þ 1Þ 0 1
1 r 12 r 13 . . . r 1p
1=2 B 1 r 23 . . . r 1p C
B C
¼ 1  RjRj11 where R ¼ BB ... ...C C and R11 = cofactor of r 11
@ ... ...A
1
in R.
r2 =ðp1Þ

If q1:23...p ¼ 0 then F ¼ 1:23...p  F p1;np .
ð1r1:23...p Þ ðnpÞ
2

To test
H 0 : q1:23...p ¼ 0 against H 1 : q1:23...p [ 0

)w0 : F [ F a;p1;np :

q12:34 ...p = population partial correlation coefficient of X 1 and X 2 eliminating the


effect of X 3 ; . . .; X p .
Appendix 255

r 12:34 ...p = sample partial correlation coefficient of X 1 andX 2 eliminating the effect
of X 3 ; . . .; X p
¼  pffiffiffiffiffiffiffiffiffi
R12 ffi
R R
. If q12:34 ...p then
11 22

pffiffiffiffiffiffiffiffiffiffiffi
r 12:34...p n  p
t ¼ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi  tnp
1  r 212:34...p

Thus for testing H 0 : q12:34 ...p ¼ 0 against

H 1 : q12:34 ...p [ 0; x0 : t [ ta;np


H 2 : q12:34 ...p \0; x0 : t\  ta;np
H 3 : q12:34 ...p 6¼ 0; x0 : jtj [ ta=2;np :

A.11 Problems Related to Multiple Regression


 
We consider a set of variables y; x1 ; x2 ; . . .; xp , where y is stochastic and
 
x1 ; x2 ; . . .; xp are nonstochastic. Let the multiple regression of y on x1 ; x2 ; . . .; xp be
  
E y x1 ; x2 ; . . .; xp ¼ b0 þ b1 x1 þ b2 x2 :: þ bp xp ðA:1Þ

where b0 ; b1 ; b2 ; bp are constants. In fact,


bi = partial regression coefficient of y on xi eliminating the effects of
xj ; j 6¼ i ¼ 1; 2;. . .p:
Define riy ¼ Cov(xi ; yÞ;rij ¼ Cov(xi ; xj Þ; ryy ¼ vðyÞ; qiy ¼ correlation of
ðxi ; yÞ; qij ¼ correlation of ðxi ; xj Þ; i ¼ 1; 2; ::p and j ¼ 1ð1Þp
We write r px1 ¼ ðr1y ; r2y ; . . .; rpy ; Þ0

0 ð1Þ 1
r11 r12 . . . r1p
Ppxp B r21 r22 . . . r2p C
¼B C
@ . . . . . . . . . . . . A = variance-covariance matrix of x1 ; x2 ; ::; xp
rP1 rP2 . . . rpp
We write
0 1
ryy ry1 ry2 ryp
B
Xp þ 1xp þ 1 B r1y r11 r12 r1p C
C
0
¼B
B r2y r21 r22 r2p C
C
@... ... ... ... A
0 1 rpy rp1 rp2 rpp
ryy r0
 ð1Þ
¼@ P A = variance–covariance matrix of y, x1 ; x2 ; . . .; xp .
r pxp
 ð1Þ
256 Appendix

Similarly, we0write 1
qyy qy1 qy2 qyp
B q1y q11 q12 q1p C
B C
q0p þ 1xp þ 1 ¼ B q
B 2y q21 q22 q2p C
C = correlation matrix of y,
@... ... ... ... A
qpy qp1 qp2 qpp
x1 ; x2 ; . . .; xp . 
P  P
Now,  0  ¼ product of the diagonal element of 0 jq0 j

¼ ðryy r11 r22 . . .rpp Þjq0 j


P
j j P
)jq0 j ¼ ðryy r11 r220...rpp Þ. Also, j j ¼ ðr P11 r22 . . .rpp Þx Cofactor of qyy in q0 .
j j
) Cofactor of qyy in q0 ¼ r11 r22 ...rpp

jq 0 j
) q2y:12...p ¼ 1 
Cofactor of qyy in q0
P  P 
  ðryy r11 r22 . . .rpp Þ  
¼1 P 0
 ¼1 P
0
j j ðr11 r22 . . .rpp Þ ryy j j
P1 P
ryy  r 0 r 0
r0ð1Þ 1 rð1Þ rð1Þ b X1
 ð1Þ  ð1Þ 
¼1 ¼ ¼ as b ¼ r :
ryy ryy ryy   ð1Þ
P
r0 b b0 b X1 X X 
 ð1Þ 
) q2y:12...p ¼ ¼ 
; b¼ r ) b ¼ r ) r0 b ¼ b0 b
ryy ryy   ð1Þ   ð1Þ  ð1Þ   

Suppose we are given the set of observations


 
ya ; x1a ; x2a ; . . .; xpa ; a ¼ 1ð1Þn; n [ p þ 1:
Pn Pn  
Define xi ¼ 1n a¼1 xia ; Sij ¼ a¼1 ðxia  xi Þ xja  xj
Xn  
Siy ¼ a¼1
ðxia  xi Þ ya  yj : 8i; j ¼ 1ð1Þp

0 1
S11 S12 ... S1p
B S21 S22 ... S2p C
Spxp ¼B
@...
C which is positive definite.
... ... ... A
Sp1 Sp2 ... Spp

^ þb
Estimated regression equation of y on x1 ; x2 ; . . .; xp is y ¼ b ^ x1 þ
0 1
^ ^
b2 x2 þ    þ bp xp
Appendix 257

^ ;b
where b ^ ^ ^
0 1 ; b2 ; . . .; bp are the solutions of the following normal equations:
9
S1y ¼b ^ S11 ^ S12
b þ  þ ^ S1p >
b >
1 2 p >
S2y ¼b ^ S21 ^ S22
b þ  þ ^ S2p =
b
1 2 p ðA:2Þ
... ... ... >
>
Spy ¼b ^ Sp1 ^ Sp2
b þ  þ ^ Spp >
b ;
1 2 p

^ ¼yb
and b ^ x1  b
^ x2      b ^ xp
0 1 2 p
We write y nx1
¼ ðy1 ; y2 ; . . .; yn Þ0

0 1
x11  x1 x21  x2 . . . xp1  xp
B x12  x1 x22  x2 . . . xp2  xp C
K nxp ¼B
@ ...
C
... ... ... A
x1n  x1 x2n  x2 . . . xpn  xp
0
^ px1 ¼ b
b ^ ;b
^ ^
1 2 ; ::; bp


Pn
Note that Siy ¼ a¼1 ðxia  xi Þya ; (A.2) reduces to

^ ¼ K0 y ) b
Sb ^ ¼ S1 K 0 y
   

^ ;b
)b ^ ^
1 2 ; . . .; bp are linear functions of y1 ; y2 ; . . .; yn which are normal.
  
^ ^ ;D b
) b  Np E b ^
 
 
^ ¼ S1 K 0 y ¼ S1 K 0 y  y 2
Now, b
   
0
where 2 ¼ ð1; 1; . . .; 1Þ and K 0 2 ¼ 0
  

   
^ 1 0
) E b ¼ S K E y  y 2
  

Eðya Þ ¼ b0 þ b1 x1a þ    þ bp xpa


EðyÞ ¼ b0 þ b1x1 þ    þ bpxp
Eðya  yÞ¼b1 ðx1a  x1 Þ þ    þ bp ðxpa  xp Þ
0 1
y1  y
  B y  y C
B 2 C
E y  y 2 ¼ E B C ¼ Kb
  @ : A 

yn  y
258 Appendix

 
)E b ^ ¼ S1 K 0 K b
^ ¼ S1 S b ¼ b ½* K 0 K ¼ S
   

^
D b ¼ S K r I n KS ¼ r S K KS1 ¼ r2 S1 SS1 ¼ r2 S1
1 0 2 1 2 1 0

 
^ ¼ S1 K 0 y  Np b ; r2 S1
)b
  

 ij 
We write S1 ¼ S ^ Þ ¼ r2 Sii
) Vðb i
and

Cov b ^
^ ;b
i j ¼r S 8i; j ¼ 1ð1Þp
2 ij

 
^  N 1 b ; r2 Sii i ¼ 1ð1Þp
)bi i

^ ¼y  b
Again, b ^ x1  b
^ x2  b
^ xp
0 1 2 p
!
X
p X
p X
p
^ ¼ EðyÞ
)E b E ^ xi ¼
b b0 þ bixi  bixi ¼ b0
0 i
i¼1 i¼1 i¼1
X
^ ¼V y
V b ^ xi
b
0 i
 
 
^ ; x 0 ¼ x1 ; x2 ; . . .; xp
¼ V y  x0b
  
 
¼ rn
2
0 ^ x (as y and b
þ x D b ^ are independent)
   

r2 1
¼ þ x 0 r2 S1 x ¼ r2 þ x 0 S1 x :
n   n  

^ is also a linear combination of normal variables.


Thus b0
 
^  N 1 b ; r2 1 þ x 0 S1 x
)b 0 0
n  

^ þ Pb
p
Again Y ¼ b ^ xi ; ) Y  N 1 ðEðYÞ; VðYÞÞ
0 i
1
P
^ þ
where EðYÞ ¼ E b ^ xi ¼ b þ P b xi ¼ nx ; (say)
E b
0 i 0 i
Appendix 259

! " #
X
p X X
VðYÞ ¼ V b^0 þ ^ xi
b i ¼ V y  ^ xi þ
b i
^ xi
b i
i¼1 i i
" # !
X X
¼ V y þ ^
bi ðxi  xi Þ ¼ VðyÞ þ V ^
bi ðxi  xi Þ
i i

r2 X X
p p
 
¼ þ ðxi  xi Þ xj  xj Cov b ^
^ ;b
i j
n 1 1
0 
r2 X X   2 ij
p p
2 1 1
¼ þ ðxi  xi Þ xj  xj r S ¼ r þ x  x S ð x  xÞ
n n  
1 1
  
X 1 0
) Y  N1 b0 þ bi xi ¼ nx ; r2 þ x  x S1 ð x  xÞ
n  

To get different test procedures r2 is estimated as

X p 2
1 ^ b ^ x1a þ . . . þ b ^ xpa
^2 ¼
r ya  b
n  p  1 a¼1 0 1 p

( )2
1 Xn X
¼ ðya  yÞ  ^ ðxia  xi Þ
b
n  p  1 a¼1 i
i

" #
1 X n XX
¼ Syy  ^ ^
bi bj ðxia  xi Þðxia  xjÞ
np1 a¼1 i j
" # 
1 XX 1 0
¼ Syy  ^ ^
bi bj Sij ¼ ^
Syy  b S b^
np1 i j
np1  
0
P
b b
(Note that q2y:12:::p ¼  ryy  )
(1) H 01 : b1 ¼ b2 ¼    ¼ bp ¼ 0
) x1 ; x2 ; . . .; xp are not worthwhile in predicting y.

^0 P b
b ^
 
* q2y:12...p ¼ ) b1 ¼ b2 ¼    ¼ bp ¼ 0 ) q2y:12...p ¼ 0
ryy

So the problem
is to test H 01 : q2y:12...p ¼ 0 against H 1 : q2y:12...p [ 0
Now Syy 1  r 2y:12...p ¼ Syy  b ^0 P b ^
 

n
X
¼ ^ b
ya  b ^ x1a      b
^ xpa  r2 v2
0 1 p np1
a¼1
260 Appendix
 
^ 0S b
Also, Syy r 2y:12...p ¼ b ^ 0S b
^ ¼ Syy  Syy  b ^
   
 
) Syy ¼ ^ 0S b
Syy  b ^ þ Syy r 2y:12...p
 

Syy  r2 v2n1 ) Syy r 2y:12...p  r2 v2p


.
r 2y:12...p p
) F1 ¼ .  F p;np1
1  r 2y:12...p ðn  1  pÞ
) x0 : F 1 [ F a;p;np1

(2) H 0 : b0 ¼ b against H 1 : b0 6¼ b
^ b
b
Under H 0 ; t ¼ rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
0
 tnp1
1 0 1
^
r þ x S x
n   
^2 ¼ np1
where r 1 ^ 0S b
Syy  b ^ ¼ S02
y:12...p , say
 

X
^ ¼y
and b ^ xi ¼ y x 0 b
b ^
0 i
 

x0 : jtj [ ta=2;np1

(3) H 0 : bi ¼ b0i against H 1 : bi 6¼ b0i 8i ¼ 1ð1Þp


 
^  N 1 b ; r2 Sii
b i i

^ b0
b ipffiffiffiiffi
Under H 0 ; t ¼  tnp1 :
r
^ Sii

x0 : jtj [ ta=2;np1

100ð1  aÞ% confidence interval for bi is


pffiffiffiffiffi
b^ r
^ S ii
t a=2;np1
i

(4) H 0 : bi  bj ¼ d0 against H 1 : bi  bj 6¼ d0 : 8i 6¼ j ¼ 1ð1Þp


  
^ b
b ^  N 1 b  b ; r2 Sii þ Sjj  2Sij
i j i j

ðb^i b^j Þd


) under H 0 ; t ¼ pffiffiffiffiffiffiffiffiffiffi ffi 0  tnp1
r
^
ii jj
S þ S 2Sij
Appendix 261

) x0 : jtj [ ta=2;np1
 
100ð1  aÞ% confidence interval for bi  bj is
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
b^ b
^ r ^ Sii þ Sjj  2Sij ta=2;np1
i j

(5) H 0 : EðYÞ ¼ nx ¼ n0
Yn0
Under H 0 ; t ¼ rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
0 ffi  tn  p  1
r
^ nþ
1
x  x S1 x  x
   

x0 : jtj [ ta=2;np1 :

100ð1  aÞ% confidence interval for nx is


rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi !
1 0
1

^
Y r þ x  x S x  x ta=2;n  p  1
n    

^ þ Pp b
where Y ¼ b ^
0 i¼1 i xi

A.12 Distribution of the Exponent


of the Multivariate Normal Distribution
 
Ppxp P
Let x px1
 Np l px1
; ; is positive definite.
 
The p.d.f. of x is


0 P
1
1 12 xl xl 1\xi \1
f x ¼ p ffiffiffiffiffiffiffiffi
P e     ; and 0\ri \1
 ð2pÞp=2 j j 1\li \1

X1
Qð x Þ ¼ ð x  l Þ0 ð x  lÞ
    

P
Since is positive definite, there exists a nonsingular matrix V pxp such that
P1
¼ VV 0 :
) Qð x Þ ¼ ð x  l Þ0 VV 0 ð x  l Þ ¼ y 0 y where y ¼ V 0 ð x  l Þ
         
262 Appendix

X
p
¼ y2i :
i¼1
   rffiffiffiffiffiffiffiffiffiffi
X
@ x1 ; x2 ; . . .; xp  1 1  

jJ j ¼   ¼ ¼ q ffiffiffiffiffiffiffiffiffiffiffiffiffi ¼  
@ ðy1 ; y2 ; . . .; yn Þ  jV j P1 
 

12 y 0 y  pffiffiffiffiffiffiffiffi
P
) p.d.f. of y is f ð y Þ ¼ p
1 ffiffiffiffiffiffiffiffi
P e   j j
  ð2pÞ p=2
j j
P
P

1 12 y21
¼ e 1 :
ð2pÞp=2

Now y1 ; y2 ; . . .; yp are i.i.d Nð0; 1Þ

X
p
) y2i  v2p ; i.e., Qð x Þ  v2p

1

P P
If we now want to find the distribution of Q
ð x Þ ¼ x 0 1 x, since is
  P
positive definite, there exists a non-singular matrix V pxp such that 1 ¼ VV 0 .
) Q
ð x Þ ¼ x 0 VV 0 x ¼ z 0 z where Z ¼ V 0 x .
   
P 
pffiffiffiffiffiffiffiffi
Here also jJ j ¼ j j.
X1
)ð x  l Þ0 ð x  l Þ¼ ð x  l Þ0 VV 0 ð x  l Þ
       
 0  
¼ V 0 x V 0 l I p V 0 x V 0 l
   
 0  
¼ z  V 0 l Ip z  V 0 l
   
0
1 12 z V 0 l I p z V 0 l
) f ðzÞ ¼ e    

ð2pÞp=2

) z1 ; z2 ; . . .; zp are normal with common variance unity but with means given by
E z ¼ V0 l.
 
 0  
Pp 2
) 1 zi  non-central v2p with non-centrality parameter V 0 l V0 l ¼
 
P
l 0 VV 0 l ¼ l 0 1 l.
  
Appendix 263

A.13 Large Sample Distribution of Pearsonian


Chi-Square

Events A1 A2 …. Ai …… Ak Total
Probability P1 P2 …. Pi …… Pk 1
Frequency n1 n2 …. ni …… nk n

n! Y k
) f ðn1 ; n2 ; . . .; nk Þ ¼ Q pni i
n !
i i i¼1

ni  Binðn; pi Þ
P 2
Pearsonian chi-square statistic is v2 ¼ ki¼1 ðni npnpi Þ
i
Using Stirling’s approximation to factorials
pffiffiffiffiffiffi n n þ 1
2pe n 2 Y k
f ðn1 ; n2 ; . . .; nk Þ ’ Q pffiffiffiffiffiffi ni þ 12
pni i
k n
1 2pe ni
i 1
n þ 12
Qk n i
n 1 pi
¼ k1 Q n þ1
k
ð2pÞ 2 1 ni i 2
pffiffiffi Y k   1
n npi ni þ 2
¼ k1 Q pffiffiffiffiffiffi
ð2pÞ 2 npi 1 ni

Xk    
1 npi
) loge f ðn1 ; n2 ; . . .; nk Þ ’ C þ ni þ logc ðA:3Þ
i¼1
2 ni
pffiffi

where C ¼ loge n
k1 Qk pffiffiffiffiffi
ð2pÞ 2
1
npi

ni npi
We write di ¼ p ffiffiffiffiffiffiffi
np q ; qi ¼ 1  pi
i i
264 Appendix

rffiffiffiffiffiffi
pffiffiffiffiffiffiffiffiffiffi ni qi
) ni ¼ npi þ di npi qi ) ¼ 1 þ di
npi npi
 rffiffiffiffiffiffi1
np qi
) i ¼ 1 þ di
ni npi
X k    rffiffiffiffiffiffi
1 pffiffiffiffiffiffiffiffiffiffi qi
) loge f ðn1 ; n2 ; . . .; nk Þ¼ C  npi þ þ di npi qi logc 1 þ di
1
2 npi
0 1
Xk   rffiffiffiffiffiffi 3=2
1 pffiffiffiffiffiffiffiffiffiffi qi d2 q d3 q
¼C npi þ þ di npi qi @di  i i þ i i     A;
2 np i 2np i 3ðnpi Þ 3=2
1
 qffiffiffiffiffi
 qi 
Provided di np \1
i

k 
X rffiffiffiffiffiffi 
pffiffiffiffiffiffiffiffiffiffi 1 qi 1 1 q d3
¼C di npi qi þ di  d2i qi  d2i i þ d2i qi þ piffiffiffi ð  Þ þ   
1
2 npi 2 4 npi n

k 
X rffiffiffiffiffiffi 
pffiffiffiffiffiffiffiffiffiffi 1 qi 1 1 q d3
¼C di npi qi þ di þ d2i qi  d2i i þ piffiffiffi ð  Þ    ðA:4Þ
1
2 npi 2 4 npi n
P pffiffiffiffiffiffiffiffiffiffi Pk
Note that di npi qi ¼ 1 ðni  npi Þ ¼ n  n ¼ 0;
pffiffiffi d3 d2
we assume that d3i ¼ 0ð nÞ i:e:; piffiffin ! 0; pdiffiffin ! 0 ) ni ! 0
) All the terms in the R.H.S of (A.4) tends to zero except 12 d2i qi , thus (A.4)
implies

1X k Pk 2
d2i qi ) f ’ eC  e2 1 di qi
1
loge f ’ C 
2 1
pffiffiffi Pk ðni npi Þ2
n 12
)f ’ k1 Q pffiffiffiffiffiffi e
i¼1 npi

ð2pÞ 2 k1 npi

1 Pk ðni npi Þ2 1
2 1
¼ pffiffiffiffiffi e
k1
i¼1 npi
: Qk1 pffiffiffiffiffiffi ðA:5Þ
ð2pÞ pk
2
1 npi
P
We note that k1 ðni  npi Þ ¼ 0
P
i.e., nk  npk ¼  1k1 ðni  npi Þ

X
k
ðni  np Þ2 X
k1
ðni  np Þ2 ðnk  npk Þ2
) i
¼ i
þ
i¼1
npi i¼1
npi npk
Appendix 265

nP o2
k1
X
k1
ðni  np Þ2 1 ðni  npi Þ
¼ i
þ ðA:6Þ
1
npi npk

We use the transformation ðn1 ; n2 ; . . .; nk1 Þ ! ðx1 ; x2 ; . . .; xk1 Þ


where xi ¼ npi np
ffiffiffiffi
ffii
np ; i ¼ 1ðlÞk  1:
i

 
@ ðn1 ; n2 ; . . .; nk1 Þ   pffiffiffiffiffiffiffiffiffiffiffi

jJ j ¼   ¼ Diag pffiffiffiffiffiffiffi
np1 . . . npk1 
@ ðx ; x ; . . .; x Þ 
1 2 k1
Y
k 1
pffiffiffiffiffiffi
¼ npi
Pk1 pffiffiffiffiffi 2
Pk ðni npi Þ2 Pk1 npi xi
(A.6) ) 1 npi ¼ 1 x2i þ 1
npk
" #
X
k1
1 X k1 k1 X
X pffiffiffiffiffiffiffiffi
¼ þ x2i px þ
2
pi pj x i x j
1
pk 1 i i i6¼j¼1
k1 
X  Xk1 X pffiffiffiffiffiffiffi

pi pj
p
¼ 1 þ i x2i þ xi xj
1
pk i6¼j¼1
pk
¼ x 0A x
 

where
0 pffiffiffiffiffiffiffi pffiffiffiffiffiffiffi pffiffiffiffiffiffiffiffiffiffi 1
p1 p1 p2 p1 p3 p1 pk1
1þ pk ffi . . .
B
pk pk pffiffiffiffiffiffi pffiffiffiffiffiffiffiffiffi
pk ffi
C
B 1 þ pp2
p2 p3 p2 pk1
C
Ak1xk1 ¼ B pk . . . pk C
@ A
k
... ...
1 þ ppk1
k

0 1
1 þ a21 a1 a2 a1 a3 . . . a1 ak1
B a1 a2 1 þ a22 a2 a3 . . . a2 ak1 C qffiffiffiffi
¼B
@
C where ai ¼ pi 8i ¼ 1ð1Þk  1
... ... A pk

a1 ak1 a2 ak1 a3 ak1 1 þ a2k1

Now
 
 a1 þ a1 a2 a3 . . . ak1 
 1   
 a1 a2 þ a12 a3 . . . ak1  Rows

jAj ¼ ða1 a2 ::ak1 Þ ;

 . . .: ... ... ...  ai
 a1 a2 a3 1 
ak1 þ ak1
266 Appendix

 
1þ 1
1 1. . . 1 
 a21 
 
1 1þ 1
1. . . 1 

2
a22 
¼ ða1 a2    ak1 Þ  1 1þ 1 
1 1 
... ... ...
a23
... 
 
1 1þ 
 1 1 1
a2k1 

 2 
 a1 þ 1 a21 ... a21 
 2 
 a a2 þ 1
2
... a22 
¼  2  (ith row X a2 )

 . 2. . ... ... i

 a a2k1 ... a2k1 þ1 
k1

 Pk1 P Pk1 
1þ a2i 1 þ 1k1 a2i ... 1þ a2i 
 1 1
 
¼ 
a22 1 þ a22 ... a22  (1st row R1 ¼ P Ri =
... ... ... 
 
 a2k1 a2k1 ... 1 þ a2k1 
sum of all rows)
 
 1 1 ... 1 
 2 
Xk1  a a22 þ 1 ... a22 
¼ 1þ a2i ¼  2 ;

1
 . 2. . ... ... 
a a2k1 ... 1 þ a2k1 
k1

 
 1 1 ... 1 

Pk1 2  0 1 ... 0 
¼ 1 þ 1 ai ¼  ;
 . . .: . . .: ... . . . 
 0 0 ... 1 

P
k
X
k1 X
k1 pi
pi 1
¼ 1þ a2i ¼ 1 þ ¼ 1 ¼
1
pk pk pk
12 x 0 A x
) (A.5) ) f ðx1 ; x1 ; . . .; xk1 Þ ¼ 1
k1 p ffiffiffiffi e   Qk11 pffiffiffiffiffi jJ j
ð2pÞ 2 pk npi
pffiffiffiffiffiffi
1

jAj 12 x 0 A x
¼ k1 e  

ð2pÞ 2

P1
12 x 0 x P
¼ k1 P 1=2 e
1   where A1 ¼ :
ð2pÞ 2 j j
Appendix 267

P
Since 1 is positive definite therefore there exists a non-singular V such that
P1
¼ VV 0 .
P
) x 0 1 x ¼ x 0 vv0 x ¼ y 0 y where y ¼ v0 x .
       
Using transformation ðx1 ; x2 ; . . .; xk1 Þ ! ðy1 ; y2 ; . . .; yk1 Þ
X1=2
1  
jJ j ¼ ¼ 
jV j
12 y 0 y
) f ðy1 ; y2 ; . . .; yk1 Þ ¼ 1
k1 e  ) y1 ; y2 ; . . .; yk1 are i.i.d. N(0, 1)
ð2pÞ 2

) y 0 y  v2k1
 

X1 X
k
ðni  np Þ2
) x0 x  v2k1 ) i
 v2k1
  npi
i¼1
 qffiffiffiffiffi
 qi 
Note This approximate distribution is valid if di np \1
i

i.e., d2i qi \npi ) if d2i \ np npi


q , i.e. if Maxdi \ q
i 2
i i

n np
Again di ¼ pi ffiffiffiffiffiffiffii , using normal approximation the effective range of di is (−3, 3)
npi qi
)d2i 9
i.e.,

Maxd2i ¼ 9

So the approximate distribution will be valid if 9\ np qi , i.e. if npi [ 9ð1  pi Þ ,


i

i.e. if npi [ 9. So the approximation is valid if the expected frequency for each
event is at least 10.
Again, if we consider the effective range of di as (−2, 2), then the approximation
is valid if the expected frequency for each event is at least 5.
It has been found by enquiry that if the expected frequencies are greater than 5
then the approximation is good enough.
If the expected frequencies of some classes be not at least 5 then some of the
adjacent classes are pooled such that the expected frequencies of all classes after
coalition are at least 5. If k
be the no. of classes after coalition, then
Pk
ðn
i np
i Þ2
i¼1 np
 v2k
1 ;
i
where n
i ¼ observed frequency after coalition,
np
i ¼expected frequency after coalition.
Uses of Pearsonian-v2 :
(1) Test for goodness of fit:
268 Appendix

Classes Probability Frequency


A1 p1 n1
A2 p2 n2
. . .
. . .
. . .
Ai pi ni
: : :
Ak pk nk
Total 1 n

We are to test H 0 : pi ¼ p0i  8i


Under H 0 ; expected frequencies are np0i : We assume np0i  58i.
Pk ðni np0i Þ2
) Under H 0 ; i¼1 np0i
 v2k1
P n2
i.e. ki¼1 npi 0  n  v2k1 :
Pi 2
i.e. v2 ¼ ki¼1 OE  n  v2k1
where O = observed frequency (ni )
E = Expected frequency (np0i )

) x0 : v2 [ v2a;k1

Note Suppose the cell probabilities p1 ; p2 ; . . .; pk depend on unknown parame-


ters h s1 ¼ ðh1 ; h2 ; . . .; hs Þ0 and suppose ^h be an efficient estimator of h . Then
  
n o2
Pk ni npi ð h^Þ
i¼1
  v2ðk1sÞ :
npi ^
h


(2) Test for homogeneityof similarly classified populations

Classes Population
P1 P2 ………. Pj ………. Pl
A1 p11 p12 ………. p1j ………. p1l
A2 p21 p22 ………. p2j ………. p2l
. . . ………. . ………. .
. . . ………. . ………. .
Ai pi1 pi2 pij pij
. . . . .
. . . . .
. .
Ak pk1 pk2 pkj pkl
Total 1 1 ………. 1 ………. 1
Appendix 269

where pij ¼ the probability that an individual selected from jth population will
belong to ith class.
We are to test H 0 : pi1 ¼ pi2 ¼    ¼ pil ð¼ pi sayÞ 8i ¼ 1ð1Þk. To do this we
draw a sample of size n and classify as shown below:

Classes Population Total


P1 P2 …. Pj …. Pl
A1 n11 n12 n1j n1l n10
A2 n21 n22 n2j n2l n20
. . . . . . . .
. . . . . . . .
Ai ni1 ni2 nij nil ni0
. . . . . . . .
. . . . . . . .
Ak nkl nk2 …. nkj …. nkl nk0
Total n01 n01 …. n0j …. n0l n

For the jth population the Pearsonian chi-square statistic is


 2
Xk
nij  n0j pij
 vð2k1Þ  8j ¼ 1ð1Þ1
i¼1
n0j pij
 2
Xl X k
nij  n0j pij
)  v21ðk1Þ
j¼1 i¼1
n 0j p ij

Pl Pk fnij n0j pi g2
) Under H 0 ; v2 ¼ j¼1 i¼1 n0j pi  v21ðk1Þ
pi ’s are unknown and they are estimated by p^i ¼ nni0 8i ¼ 1ð1Þk:
P P fnij n0j ni0 g
) under H 0 ; v2 ¼ 1j¼1 ki¼1 n ni0n  v21ðk1Þðk1Þ ¼ v2ðk1Þðl1Þ
0j n

as the d.f. will be reduced by (k − 1) since we are to estimate any (k − 1) of


P
p1 ; p2 ; . . .; pk as k1 pi ¼ 1:

) x0 : v2 [ v2a;ðk1Þð11Þ
270 Appendix

(3) Test for independenceof two attributes

A B Total
B1 B2 ………. Bj ………. Bl
A1 p11 p12 ………. p1j ………. p1l p10
A2 p21 p22 ………. p2j ………. p2l p20
. . . ………. . ………. . .
. . . ………. . ………. . .
Ai pi1 pi2 pij pi1 pi0
. . . . . .
. . . . . .
Ak pk1 pk2 pkj pkl pk0
Total p01 p02 ………. p0j ………. p0l 1

We are to test
H 0 : A and B are independent, i.e. to test

H 0 : pij ¼ pi0 xp0j 8ði; jÞ

To do this we draw a sample of size n and suppose the sample observations be


classified as shown below:

A B Total
B1 B2 ………. Bj ………. Bl
A1 n11 n12 ………. n1j ………. n1l n10
A2 n21 n22 ………. n2j ………. n2l n20
. . . ………. . ………. . .
. . . ………. . ………. . .
Ai ni1 ni2 nij nil ni0
. . . . . .
. . . . . .
Ak nk1 nk2 nkj nkl nk0
Total n01 n02 ………. n0j ………. n0l n

n! Y Y  nij
Pðn11 ; . . .; nkl Þ ¼ Q Q   pij
i j nij ! i j

 
E nij ¼ npij ;
Appendix