
Estimation and Inferential Statistics

Pradip Kumar Sahu · Santi Ranjan Pal · Ajit Kumar Das

Pradip Kumar Sahu
Department of Agricultural Statistics
Bidhan Chandra Krishi Viswavidyalaya
Mohanpur, Nadia, West Bengal
India

Santi Ranjan Pal
Department of Agricultural Statistics
Bidhan Chandra Krishi Viswavidyalaya
Mohanpur, Nadia, West Bengal
India

Ajit Kumar Das
Department of Agricultural Statistics
Bidhan Chandra Krishi Viswavidyalaya
Mohanpur, Nadia, West Bengal
India

DOI 10.1007/978-81-322-2514-0

© Springer India 2015

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part

of the material is concerned, speciﬁcally the rights of translation, reprinting, reuse of illustrations,

recitation, broadcasting, reproduction on microﬁlms or in any other physical way, and transmission

or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar

methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this

publication does not imply, even in the absence of a speciﬁc statement, that such names are exempt from

the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this

book are believed to be true and accurate at the date of publication. Neither the publisher nor the

authors or the editors give a warranty, express or implied, with respect to the material contained herein or

for any errors or omissions that may have been made.

Preface

Nowadays one can hardly ﬁnd any ﬁeld where statistics is not used. With a given

sample, one can infer about the population. The role of estimation and inferential

statistics remains pivotal in the study of statistics. Statistical inference is concerned

with problems of estimation of population parameters and test of hypotheses. In

statistical inference, drawing a conclusion about the population takes place on the

basis of a portion of the population. This book has been written keeping in mind the needs of users, the literature presently available to cater to these needs, and its merits and demerits under a constantly changing scenario. Theories are followed by relevant worked-out examples that help the user grasp not only the theory but also its practice.

This work is a result of the experience of the authors in teaching and research

work for more than 20 years. The wider scope and coverage of the book will help

not only the students, researchers and professionals in the ﬁeld of statistics but also

several others in various allied disciplines. All efforts are made to present the

“estimation and statistical inference”, its meaning, intention and usefulness. This

book reflects current methodological techniques used in interdisciplinary research,

as illustrated with many relevant research examples. Statistical tools have been

presented in such a manner, with the help of real-life examples, that the fear factor

about the otherwise complicated subject of statistics will vanish. In its seven chapters, theories followed by examples will help the readers find the most suitable applications.

Starting from the meaning of statistical inference, its development, different parts and types have been discussed eloquently. How one can use statistical inference in everyday life has remained the main point of discussion in the examples. How one can draw conclusions about the population under varied situations, even without studying each and every unit of the population, has been discussed with numerous examples. All sorts of inferential problems have been discussed in one place, supported by examples, to help students not only in meeting their examination needs and research requirements but also in daily life. One can hardly

get such a compilation of statistical inference in one place. The step-by-step



procedure will immensely help not only graduate and Ph.D. students but also other researchers and professionals. Graduate and postgraduate students, researchers and professionals in various fields will be the users of the book. Researchers in medical, social and other disciplines will benefit greatly from the book. The book will also help students in various competitive examinations.

Written in a lucid language, the book will be useful to graduate, postgraduate

and research students and practitioners in diverse ﬁelds including medical, social

and other sciences. This book will also cater to the needs of those preparing for different competitive examinations. One can hardly find a single book in which all topics

related to estimation and inference are included. Numerous relevant examples for

related theories are added features of this book. An introduction chapter and an

annexure are special features of this book; they will help readers get the basic ideas and fill gaps in their understanding. A chapter-wise summary of the

content of the proposed book is presented below.

• Chapter 1: The chapter relates to introduction to the theory of point estimation

and inferential statistics. Different criteria for a good estimator are discussed.

The chapter also presents real-life worked-out problems that help the reader understand the subject. Compared to the partial coverage of this topic in most books on statistical inference, this book aims at an elaborate coverage of the subject of point estimation.

• Chapter 2: This chapter deals with different methods of estimation like least

square method, method of moments, method of minimum χ2 and method of

maximum likelihood estimation. Not all these methods are equally good or applicable in all situations. The merits, demerits and applicability of these methods have been discussed in one place, whereas they otherwise remain mostly dispersed or scattered across the competing literature.

• Chapter 3: Testing of hypotheses has been discussed in this chapter. This

chapter is characterized by typical examples in different forms and spheres, including Type A1 testing, which is mostly overlooked in the available literature but is treated in this book.

• Chapter 4: The essence and technique of the likelihood ratio test have been discussed in this chapter. Irrespective of the nature of the hypotheses tested (simple or

composite), this chapter emphasizes how easily the test could be performed,

supported by a good number of examples. Merits and drawbacks have also been

discussed. Some typical examples are discussed in this chapter that one can

hardly ﬁnd in any other competing literature.


• Chapter 5: This chapter takes up interval estimation. Interval estimation under different situations, and the problems and prospects of different approaches to interval estimation, have been discussed with numerous examples in one place.

• Chapter 6: This chapter deals with non-parametric methods of testing

hypotheses. All types of non-parametric tests have been put together and dis-

cussed in detail. In each case, suitable examples are the special feature of this

chapter.

• Chapter 7: This chapter is devoted to the discussion of decision theory. This

discussion is particularly useful to students and researchers interested in infer-

ential statistics. In this chapter, an attempt has been made to present decision theory in an exhaustive manner, keeping in mind the requirements and purposes of the readers at whom the book is aimed. Bayes and minimax methods of estimation have been discussed in the Annexure. Most of the available literature on inferential statistics lacks due attention to these important

aspects of inference. In this chapter, the importance and utilities of the above

methods have been discussed in detail, supported with relevant examples.

• Annexure: The authors feel that the Annexure portion would be an asset to

varied types of readers of this book. Related topics, proofs, examples, etc.,

which could not be provided in the text itself during the discussion of the various chapters, for the sake of continuity and flow, are provided in this section. Besides many useful proofs and derivations, this section includes

transformation of statistics, large sample theories, exact tests related to binomial and Poisson populations, etc. This added section will be of much help to the readers.

In each chapter, theories are followed by examples from applied ﬁelds, which

will help the readers of this book to understand the theories and applications of

specific tools. Attempts have been made to illustrate the problems with examples on each topic in a lucid manner. During the preparation of this book, a good number

of books and articles from different national and international journals have been

consulted. Efforts have been made to acknowledge and provide these in the bib-

liography section. An inquisitive reader may ﬁnd more material from the literature

cited.

The primary purpose of the book is to help students of statistics and allied ﬁelds.

Sincere efforts have been made to present the material in the simplest and most easily understood form. The encouragement, suggestions and help received from our colleagues at the Department of Agricultural Statistics, Bidhan Chandra Krishi Viswavidyalaya, are sincerely acknowledged; their valuable suggestions contributed much towards improving the content. The

authors thankfully acknowledge the constructive suggestions received from the

reviewers towards the improvement of the book. Thanks are also due to Springer


for publishing this book and for continuous monitoring, help and suggestions during the project. The authors also acknowledge the help, cooperation and encouragement received from various other quarters not mentioned here. The effort will be successful if this book is well accepted by the students, teachers, researchers and other users at whom it is aimed. Every effort has been

made to avoid errors. Constructive suggestions from the readers in improving the

quality of this book will be highly appreciated.

Pradip Kumar Sahu
Santi Ranjan Pal

Ajit Kumar Das

Contents

1 Theory of Point Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Sufﬁcient Statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 Unbiased Estimator and Minimum-Variance

Unbiased Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

1.4 Consistent Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

1.5 Efﬁcient Estimator. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

2 Methods of Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

2.2 Method of Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

2.3 Method of Maximum Likelihood . . . . . . . . . . . . . . . . . . . . . . . 48

2.4 Method of Minimum χ2 . . . . . . . . . . . . . . . . . . . . . . . 55

2.5 Method of Least Square . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

3.2 Deﬁnitions and Some Examples . . . . . . . . . . . . . . . . . . . . . . . . 63

3.3 Method of Obtaining BCR . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

3.4 Locally MPU Test. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

3.5 Type A1 (Uniformly Most Powerful Unbiased) Test . . . . . . . . . 97

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

4.1.1 Some Selected Examples . . . . . . . . . . . . . . . . . . . . . . . . 104

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

5.2 Conﬁdence Interval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131



5.4 Shortest Length Conﬁdence Interval and Neyman’s Criterion . . . . 138

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

6.2 One-Sample Non-parametric Tests. . . . . . . . . . . . . . . . . . . . . . . 146

6.2.1 Chi-Square Test (i.e., Test for Goodness of Fit) . . . . . . . . . 146

6.2.2 Kolmogorov–Smirnov Test . . . . . . . . . . . . . . . . . . . . . 147

6.2.3 Sign Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

6.2.4 Wilcoxon Signed-Rank Test . . . . . . . . . . . . . . . . . . . . . . 151

6.2.5 Run Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

6.3 Paired Sample Non-parametric Test . . . . . . . . . . . . . . . . . . . . . . 156

6.3.1 Sign Test (Bivariate Single Sample Problem)

or Paired Sample Sign Test . . . . . . . . . . . . . . . . . . . . . . 156

6.3.2 Wilcoxon Signed-Rank Test . . . . . . . . . . . . . . . . . . . . . . 157

6.4 Two-Sample Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

6.5 Non-parametric Tolerance Limits . . . . . . . . . . . . . . . . . . . . . . . 168

6.6 Non-parametric Confidence Interval for ξp . . . . . . . . . . . . . . . . . 170

6.7 Combination of Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

6.8 Measures of Association for Bivariate Samples . . . . . . . . . . . . . . 174

7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

7.2 Complete and Minimal Complete Class of Decision Rules . . . . . . 189

7.3 Optimal Decision Rule. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197

7.4 Method of Finding a Bayes Rule . . . . . . . . . . . . . . . . . . . . . . . 199

7.5 Methods for Finding Minimax Rule. . . . . . . . . . . . . . . . . . . . . . 208

7.6 Minimax Rule: Some Theoretical Aspects . . . . . . . . . . . . . . . . . 226

7.7 Invariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237

References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315

About the Authors

P.K. Sahu is professor at the Department of Agricultural Statistics, Bidhan Chandra Krishi Viswavidyalaya (a state agriculture university),

West Bengal. With over 20 years of teaching experience, Dr. Sahu has published

over 70 research papers in several international journals of repute and has guided

several postgraduate students and research scholars. He has authored four books:

Agriculture and Applied Statistics, Vol. 1, and Agriculture and Applied Statistics,

Vol. 2 (both published with Kalyani Publishers), Gender, Education,

Empowerment: Stands of Women (published with Agrotech Publishing House) and

Research Methodology: A Guide for Researchers in Agricultural Science, Social

Science and Other Related Fields (published by Springer) as well as contributed a

chapter to the book Modelling, Forecasting, Artiﬁcial Neural Network and Expert

System in Fisheries and Aquaculture, edited by Ajit Kumar Roy and Niranjan

Sarangi (Daya Publishing House). Dr. Sahu has presented his research papers in

several international conferences. He also visited the USA, Bangladesh, Sri Lanka,

and Vietnam to attend international conferences.

S.R. Pal is former eminent professor at the Department of Agricultural Statistics at

R.K. Mission Residential College and Bidhan Chandra Krishi Viswavidyalaya (a

state agriculture university). An expert in agricultural statistics, Prof. Pal has over

35 years of teaching experience and has guided several postgraduate students and

research scholars. He has several research papers published in statistics and related

ﬁelds in several international journals of repute. With his vast experience in

teaching, research and industrial advisory role, Prof. Pal has tried to incorporate the

problems faced by the users, students, and researchers in this ﬁeld.

A.K. Das is professor at the Department of Agricultural Statistics, Bidhan Chandra

Krishi Viswavidyalaya (a state agriculture university). With over 30 years of

teaching experience, Prof. Das has a number of good research articles to his credit

published in several international journals of repute and has guided several postgraduate students and research scholars. He has co-authored Agriculture and Applied Statistics, Vol. 1, and Agriculture and Applied Statistics, Vol. 2 (both published with Kalyani Publishers), and con-

tributed a chapter to the book, Modelling, Forecasting, Artiﬁcial Neural Network

and Expert System in Fisheries and Aquaculture, edited by Ajit Kumar Roy and

Niranjan Sarangi (Daya Publishing House).

Introduction

In a statistical investigation, it is known that for reasons of time or cost, one may

not be able to study each individual element of the population. In such a situation, a

random sample should be taken from the population, and the inference can be

drawn about the population on the basis of the sample. Hence, statistics deals with

the collection of data and their analysis and interpretation. In this book, the problem

of data collection is not considered. We shall take the data as given, and we study

what they have to tell us. The main objective is to draw a conclusion about the

unknown population characteristics on the basis of information on the same char-

acteristics of a suitably selected sample. The observations are now postulated to be

the values taken by random variables. Let X be a random variable which describes the population under investigation, and let F be the distribution function of X. There are two possibilities. Either X has a distribution function Fθ with a known functional form (except perhaps for the parameter θ, which may be a vector), or X has a distribution function F about which we know nothing (except perhaps that F is, say, absolutely continuous). In the former case, let Θ be the set of possible values of the unknown parameter θ; then the job of the statistician is to decide, on the basis of suitably selected samples, which member or members of the family {Fθ; θ ∈ Θ} can represent the distribution function of X. These types of problems are called problems of parametric statistical inference. The two principal areas of statistical inference are the "estimation of parameters" and the "tests of statistical hypotheses". The problem of estimation of parameters involves both point and interval estimation. Diagrammatically, the components and constituents of statistical inference may be shown as in a chart.



Point estimation is based on a random sample of size n from the population. The method basically comprises finding an estimating formula for a parameter; this formula is called the estimator of the parameter. The numerical value obtained on the basis of a sample while using the estimating formula is called an estimate. Suppose, for example, that a random variable X is known to have a normal distribution N(μ, σ²), but we do not know one of the parameters, say μ. Suppose further that a sample X1, X2, …, Xn is taken on X. The problem of point estimation is to pick a statistic T(X1, X2, …, Xn) that best estimates the parameter μ. The numerical value of T when the realization is x1, x2, …, xn is called an estimate of μ, while the statistic T is called an estimator of μ. If both μ and σ² are unknown, we seek a joint statistic T = (U, V) as an estimate of (μ, σ²).

Example Let X1, X2, …, Xn be a random sample from any distribution Fθ for which the mean exists and is equal to θ. We may want to estimate the mean θ of the distribution. For this purpose, we may compute the mean of the observations x1, x2, …, xn, i.e.,

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

Example Let X1, X2, …, Xn be a random sample from a Poisson distribution with parameter λ, i.e., P(λ), where λ is not known. Then the mean of the observations x1, x2, …, xn, i.e.,

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,$$

is a point estimate of λ.


Example Let X1, X2, …, Xn be a random sample from a normal distribution with parameters μ and σ², i.e., N(μ, σ²), where both μ and σ² are unknown; μ and σ² are the mean and the variance, respectively, of the normal distribution. In this case, we may take the joint statistic (x̄, s²) as a point estimate of (μ, σ²), where

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \text{sample mean}$$

and

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \text{sample mean square}.$$
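As a quick computational aside (not part of the original text), the following sketch computes these two point estimates for a simulated sample; the parameter values μ = 10 and σ = 2 are hypothetical.

```python
# A minimal sketch: point estimates of (mu, sigma^2) from a simulated
# normal sample; the true parameter values here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=2.0, size=50)

x_bar = sample.mean()          # point estimate of mu
s_sq = sample.var(ddof=1)      # sample mean square (divisor n - 1), estimates sigma^2

print(f"x-bar = {x_bar:.3f}, s^2 = {s_sq:.3f}")
```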

In interval estimation, instead of a single value we seek a family of sets that contain the true (unknown) parameter value with a specified (high) probability, say 100(1 − α)%. This set is taken to be an interval, which is known as a confidence interval with confidence coefficient (1 − α), and the technique of constructing such intervals is known as interval estimation.

Let X1, X2, …, Xn be a random sample from any distribution Fθ. Let θ̲(x) and θ̄(x) be functions of x1, x2, …, xn. If

$$P[\underline{\theta}(x) < \theta < \overline{\theta}(x)] = 1 - \alpha,$$

then (θ̲(x), θ̄(x)) is called a 100(1 − α)% confidence interval for θ, and θ̲(x) and θ̄(x) are called, respectively, the lower and upper confidence limits for θ.

Example Let X1, X2, …, Xn be a random sample from N(μ, σ²), where both μ and σ² are unknown. We can find a 100(1 − α)% confidence interval for μ. To estimate the population mean μ and the population variance σ², we may take the observed sample mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

and the sample mean square

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2.$$

A 100(1 − α)% confidence interval for μ is then given by

$$\bar{x} \mp t_{\alpha/2,\, n-1}\, \frac{s}{\sqrt{n}},$$

where t_{α/2, n−1} is the upper α/2 point of the t-distribution with (n − 1) d.f.
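A minimal sketch of this interval in Python, assuming simulated data; scipy's t quantile supplies t_{α/2, n−1}.

```python
# Sketch: 100(1 - alpha)% confidence interval for mu with sigma unknown.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=3.0, size=25)    # hypothetical sample
alpha = 0.05
n = x.size

x_bar, s = x.mean(), x.std(ddof=1)
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # upper alpha/2 point of t_{n-1}
half = t_crit * s / np.sqrt(n)

print(f"{100 * (1 - alpha):.0f}% CI for mu: ({x_bar - half:.3f}, {x_bar + half:.3f})")
```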

Besides point estimation and interval estimation, we are often required to decide

which value among a set of values of a parameter is true for a given population

distribution, or we may be interested in ﬁnding out the relevant distribution to

describe a population. The procedure by which a decision is taken regarding the

plausible value of a parameter or the nature of a distribution is known as the testing

of hypotheses. Some examples of hypotheses that can be subjected to statistical tests are as follows:

1. The average length of life l of electric light bulbs of a certain brand is equal to

some speciﬁed value l0 .

2. The average number of bacteria killed by test drops of germicide is equal to some number.

3. Steel made by method A has a mean hardness greater than steel made by

method B.

4. Penicillin is more effective than streptomycin in the treatment of disease X.

5. The growing period of one hybrid of corn is more variable than the growing

period for other hybrids.

6. The manufacturer claims that the tires made by a new process have mean life

greater than the life of a tire manufactured by an earlier process.

7. Several varieties of wheat are equally important in terms of yields.

8. Several brands of batteries have different lifetimes.

9. The characters in the population are uncorrelated.

10. The proportion of non-defective items produced by machine A is greater than

that of machine B.

The examples given are simple in nature; they are well established and have well-accepted decision rules.
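As an illustration, hypothesis 1 above can be checked with a one-sample t-test; the sketch below is only indicative (the data, μ0 and the choice of test are assumptions for illustration, not prescriptions from the text).

```python
# Sketch: testing H0: mu = mu_0 for the average bulb life (hypothesis 1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
lifetimes = rng.normal(loc=1000.0, scale=50.0, size=30)  # hypothetical lives (hours)
mu_0 = 1000.0

t_stat, p_value = stats.ttest_1samp(lifetimes, popmean=mu_0)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")  # a large p gives no evidence against H0
```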

So far, in (parametric) statistical inference, we have assumed that the distribution of the random variable being sampled is known except for some parameters. In practice, the functional form of the distribution may be unknown. Here, we are concerned not with the techniques of estimating the parameters directly, but with certain pertinent hypotheses relating to the properties of the population, such as equality of distributions or tests of randomness of the sample, without making any assumption about the nature of the distribution function. Statistical inference under such a setup is called non-parametric.

Bayes Estimator

In parametric inference, we consider the density function f(x|θ), where θ is a fixed unknown quantity which can take any value in the parameter space Θ. In the Bayesian approach, it is assumed that θ itself is a random variable and that f(x|θ) is the density of x for a given θ. For example, suppose we are interested in estimating P, the fraction of defective items in a consignment. Consider a collection of lots, called superlots. The parameter P may differ from lot to lot. In the classical approach, we consider P as a fixed unknown parameter, whereas in the Bayesian approach we say that P varies from lot to lot: it is a random variable having a density f(P), say. The Bayes method tries to use this additional information about P.

Example Let X1, X2, …, Xn be a random sample from the PDF

$$f(x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)}\, x^{\alpha-1}(1-x)^{\beta-1}, \quad 0 < x < 1, \;\; \alpha, \beta > 0.$$

Find the estimators of α and β by the method of moments.

Answer We know

$$E(X) = \mu_1' = \frac{\alpha}{\alpha+\beta} \quad \text{and} \quad E(X^2) = \mu_2' = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}.$$

Hence

$$\frac{\alpha}{\alpha+\beta} = \bar{x} \quad \text{and} \quad \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)} = \frac{1}{n}\sum_{i=1}^{n} x_i^2.$$

Solving, we get

$$\hat{\beta} = \frac{(\bar{x}-1)\left(\sum x_i^2 - n\bar{x}\right)}{\sum (x_i - \bar{x})^2} \quad \text{and} \quad \hat{\alpha} = \frac{\bar{x}\,\hat{\beta}}{1-\bar{x}}.$$
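The sketch below applies these moment estimators to simulated beta data; the true values α = 2, β = 5 are hypothetical and serve only to check that the formulas recover them for large n.

```python
# Sketch: method-of-moments estimates for the beta density derived above.
import numpy as np

rng = np.random.default_rng(3)
x = rng.beta(a=2.0, b=5.0, size=10_000)
n, x_bar = x.size, x.mean()

beta_hat = (x_bar - 1) * (np.sum(x**2) - n * x_bar) / np.sum((x - x_bar) ** 2)
alpha_hat = x_bar * beta_hat / (1 - x_bar)

print(f"alpha-hat = {alpha_hat:.3f}, beta-hat = {beta_hat:.3f}")  # near (2, 5)
```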

Example Let X1, X2, …, Xn be a random sample from the PDF

$$f(x; \theta, r) = \frac{1}{\theta^r \Gamma(r)}\, e^{-x/\theta} x^{r-1}, \quad x > 0, \; \theta > 0, \; r > 0.$$

Find the estimators of θ and r by

(i) the method of moments,
(ii) the method of maximum likelihood.

Answer

(i) Here

$$E(X) = \mu_1' = r\theta, \qquad E(X^2) = \mu_2' = r(r+1)\theta^2, \qquad m_1' = \bar{x}, \qquad m_2' = \frac{1}{n}\sum_{i=1}^{n} x_i^2.$$

Hence

$$r\theta = \bar{x} \quad \text{and} \quad r(r+1)\theta^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2.$$

Solving, we get

$$\hat{r} = \frac{n\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \quad \text{and} \quad \hat{\theta} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n\bar{x}}.$$

(ii) The likelihood is

$$L = \frac{1}{\theta^{nr}\{\Gamma(r)\}^n}\, e^{-\frac{1}{\theta}\sum_{i=1}^{n} x_i} \prod_{i=1}^{n} x_i^{r-1},$$

so that

$$\log L = -nr\log\theta - n\log\Gamma(r) - \frac{1}{\theta}\sum_{i} x_i + (r-1)\sum_{i=1}^{n}\log x_i.$$

Now,

$$\frac{\partial \log L}{\partial \theta} = -\frac{nr}{\theta} + \frac{n\bar{x}}{\theta^2} = 0 \;\Rightarrow\; \hat{\theta} = \frac{\bar{x}}{r}.$$

Also,

$$\frac{\partial \log L}{\partial r} = -n\log\theta - n\frac{\partial \log\Gamma(r)}{\partial r} + \sum_{i=1}^{n}\log x_i = n\log r - n\frac{\Gamma'(r)}{\Gamma(r)} - n\log\bar{x} + \sum_{i=1}^{n}\log x_i,$$

on substituting θ = x̄/r. The equation ∂ log L/∂r = 0 has no closed-form solution and must be solved numerically to get the estimate of r. Thus, for this example, the estimators of θ and r are more easily obtained by the method of moments than by the method of maximum likelihood.
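The following sketch computes the closed-form moment estimates and, for comparison, a numerical MLE. scipy's gamma fit is used here as a stand-in for the iterative likelihood solution; that choice, and the simulated data, are assumptions of this illustration.

```python
# Sketch: moment estimators of (r, theta) versus a numerical MLE.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.gamma(shape=3.0, scale=2.0, size=5_000)   # hypothetical: r = 3, theta = 2
n, x_bar = x.size, x.mean()
ss = np.sum((x - x_bar) ** 2)

r_mom, theta_mom = n * x_bar**2 / ss, ss / (n * x_bar)   # method of moments
r_mle, _, theta_mle = stats.gamma.fit(x, floc=0)         # MLE, location fixed at 0

print(f"MoM:  r = {r_mom:.3f}, theta = {theta_mom:.3f}")
print(f"MLE:  r = {r_mle:.3f}, theta = {theta_mle:.3f}")
```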

Example Let X1, X2, …, Xn be a random sample from the rectangular (uniform) distribution R(α, β). Find the estimators of α and β by the method of moments.

Answer We know

$$E(X) = \mu_1' = \frac{\alpha+\beta}{2} \quad \text{and} \quad V(X) = \mu_2 = \frac{(\beta-\alpha)^2}{12}.$$

Hence

$$\frac{\alpha+\beta}{2} = \bar{x} \quad \text{and} \quad \frac{(\beta-\alpha)^2}{12} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2.$$

Solving, we get

$$\hat{\alpha} = \bar{x} - \sqrt{\frac{3\sum (x_i - \bar{x})^2}{n}} \quad \text{and} \quad \hat{\beta} = \bar{x} + \sqrt{\frac{3\sum (x_i - \bar{x})^2}{n}}.$$

Example Given a single observation x from the density

$$f(x; \beta) = \frac{2}{\beta^2}(\beta - x), \quad 0 < x < \beta,$$

find β̂, the MLE of β, and β*, the estimator of β based on the method of moments. Show that β̂ is biased but β* is unbiased, and that the efficiency of β̂ with respect to β* is 2/3.

Solution Here the likelihood is

$$L = \frac{2}{\beta^2}(\beta - x),$$

so that

$$\log L = \log 2 - 2\log\beta + \log(\beta - x)$$

and

$$\frac{\partial \log L}{\partial \beta} = -\frac{2}{\beta} + \frac{1}{\beta - x} = 0 \;\Rightarrow\; \hat{\beta} = 2x.$$

Now,

$$E(X) = \frac{2}{\beta^2}\int_0^{\beta} (\beta x - x^2)\,dx = \frac{\beta}{3}.$$

Hence, equating E(X) to x, we get β/3 = x, i.e., β* = 3x. Then

$$E(\hat{\beta}) = \frac{2\beta}{3} \ne \beta \qquad \text{and} \qquad E(\beta^*) = 3\,\frac{\beta}{3} = \beta.$$

Hence β̂ is biased but β* is unbiased.

Again,

$$E(X^2) = \frac{2}{\beta^2}\int_0^{\beta} (\beta x^2 - x^3)\,dx = \frac{\beta^2}{6}.$$

Therefore,

$$V(X) = \frac{\beta^2}{6} - \frac{\beta^2}{9} = \frac{\beta^2}{18},$$

so that

$$V(\beta^*) = 9V(X) = \frac{\beta^2}{2} \quad \text{and} \quad V(\hat{\beta}) = 4V(X) = \frac{2\beta^2}{9}.$$

Hence the mean square error of β̂ is

$$M(\hat{\beta}) = V(\hat{\beta}) + \left[E(\hat{\beta}) - \beta\right]^2 = \frac{2\beta^2}{9} + \left(\frac{2\beta}{3} - \beta\right)^2 = \frac{1}{3}\beta^2,$$

while M(β*) = V(β*) = β²/2. The ratio M(β̂)/M(β*) = (β²/3)/(β²/2) = 2/3; hence the efficiency of β̂ with respect to β* is 2/3.
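These results are easy to confirm by simulation. The sketch below (not from the book) draws from f(x; β) via the inverse CDF x = β(1 − √(1 − u)), which follows from F(x) = (2βx − x²)/β², and checks the two expectations and the 2/3 MSE ratio; β = 4 is arbitrary.

```python
# Monte Carlo check: beta_hat = 2X is biased, beta_star = 3X is unbiased,
# and MSE(beta_hat)/MSE(beta_star) = (beta^2/3)/(beta^2/2) = 2/3.
import numpy as np

rng = np.random.default_rng(5)
beta = 4.0
u = rng.uniform(size=1_000_000)
x = beta * (1.0 - np.sqrt(1.0 - u))   # inverse-CDF draws from f(x; beta)

beta_hat, beta_star = 2 * x, 3 * x
print("E[beta_hat]  ~", beta_hat.mean(), " (theory: 2*beta/3 =", 2 * beta / 3, ")")
print("E[beta_star] ~", beta_star.mean(), " (theory: beta =", beta, ")")

mse_hat = np.mean((beta_hat - beta) ** 2)    # theory: beta^2 / 3
mse_star = np.mean((beta_star - beta) ** 2)  # theory: beta^2 / 2
print("MSE ratio    ~", mse_hat / mse_star, " (theory: 2/3)")
```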

Example Let (x1, x2, …, xn) be a given sample of size n. It is to be tested whether the sample comes from some Poisson distribution with unknown mean μ. How do you estimate μ by the method of modified minimum chi-square?

Solution Let x1, x2, …, xn be arranged in k groups such that there are n_i observations with x = i (i = r+1, …, r+k−2), n_L observations with x ≤ r, and n_U observations with x ≥ r+k−1, so that the smallest and the largest values of x, which are fewer in number, are pooled together, and

$$n_L + \sum_{i=r+1}^{r+k-2} n_i + n_U = n.$$

Let

$$p_i(\mu) = P(x = i) = \frac{e^{-\mu}\mu^i}{i!},$$

$$p_L(\mu) = P(x \le r) = \sum_{i=0}^{r} p_i(\mu), \qquad p_U(\mu) = P(x \ge r+k-1) = \sum_{i=r+k-1}^{\infty} p_i(\mu).$$

Now, by using

$$\sum_{i=1}^{k} \frac{n_i}{p_i(\theta)}\, \frac{\partial p_i(\theta)}{\partial \theta_j} = 0, \quad j = 1, 2, \ldots, p,$$

we have

$$n_L \left[\frac{\sum_{i=0}^{r} \frac{i}{\mu}\, p_i(\mu)}{\sum_{i=0}^{r} p_i(\mu)} - 1\right] + \sum_{i=r+1}^{r+k-2} n_i \left(\frac{i}{\mu} - 1\right) + n_U \left[\frac{\sum_{i=r+k-1}^{\infty} \frac{i}{\mu}\, p_i(\mu)}{\sum_{i=r+k-1}^{\infty} p_i(\mu)} - 1\right] = 0.$$

Since there is only one parameter (i.e., p = 1), we get only the above equation. Solving, we get

$$n\hat{\mu} = n_L\, \frac{\sum_{i=0}^{r} i\, p_i(\mu)}{\sum_{i=0}^{r} p_i(\mu)} + \sum_{i=r+1}^{r+k-2} i\, n_i + n_U\, \frac{\sum_{i=r+k-1}^{\infty} i\, p_i(\mu)}{\sum_{i=r+k-1}^{\infty} p_i(\mu)},$$

the right-hand side being the "sum of all x's", with the two pooled classes contributing their conditional expected totals. Hence μ̂ is essentially the sample mean, the equation being solved iteratively when pooling is present.

Let y1, y2, …, yn be n independent observations with

$$E(y_i) = \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} \quad \text{and} \quad V(y_i) = \sigma^2, \quad i = 1, 2, \ldots, n, \;\; x_{1i} = 1 \;\forall i,$$

where β1, β2, …, βk and σ² are unknown parameters. If Y and β stand for the column vectors of the variables y_i and the parameters β_j, and if X = (x_{ji}) is an (n × k) matrix of known coefficients x_{ji}, the above equation can be written as

$$E(Y) = X\beta, \qquad V(Y) = \sigma^2 I,$$

where I is an (n × n) identity matrix. The least squares method requires the β's to be calculated such that

$$\phi = e'e = (Y - X\beta)'(Y - X\beta)$$

is a minimum. This is satisfied when

$$\frac{\partial \phi}{\partial \beta} = 0, \quad \text{i.e.,} \quad -2X'(Y - X\beta) = 0.$$

The least squares estimator of β is thus given by the vector β̂ = (X′X)⁻¹X′Y.
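In code, β̂ = (X′X)⁻¹X′Y is one call to a linear solver. The numpy sketch below uses a hypothetical design matrix (first column all ones, since x_{1i} = 1) and simulated responses.

```python
# Sketch: least squares estimator beta_hat = (X'X)^{-1} X'Y via the normal equations.
import numpy as np

rng = np.random.default_rng(6)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # x_{1i} = 1 for all i
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # solves X'X b = X'y
print(beta_hat)   # close to beta_true
```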

Example Let y_i = β1 x_{1i} + β2 x_{2i} + ⋯ + βk x_{ki}, i = 1, 2, …, n, or E(y_i) = β1 + β2 x_{2i}, with x_{1i} = 1 for all i (the case k = 2). Find the least squares estimates of β1 and β2. Prove that the method of maximum likelihood and the method of least squares are identical for the case of the normal distribution.

Solution In matrix notation, we have E(Y) = Xβ, where

$$X = \begin{pmatrix} 1 & x_{21} \\ 1 & x_{22} \\ \vdots & \vdots \\ 1 & x_{2n} \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}, \qquad Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}.$$

Now β̂ = (X′X)⁻¹X′Y. Here

$$X'X = \begin{pmatrix} n & \sum x_{2i} \\ \sum x_{2i} & \sum x_{2i}^2 \end{pmatrix}, \qquad X'Y = \begin{pmatrix} \sum y_i \\ \sum x_{2i} y_i \end{pmatrix}.$$

Then

$$\hat{\beta} = \frac{1}{n\sum x_{2i}^2 - \left(\sum x_{2i}\right)^2} \begin{pmatrix} \sum x_{2i}^2 & -\sum x_{2i} \\ -\sum x_{2i} & n \end{pmatrix} \begin{pmatrix} \sum y_i \\ \sum x_{2i} y_i \end{pmatrix}.$$

Hence

$$\hat{\beta}_2 = \frac{n\sum x_{2i} y_i - \sum x_{2i} \sum y_i}{n\sum x_{2i}^2 - \left(\sum x_{2i}\right)^2} = \frac{\sum x_{2i} y_i - n\bar{x}_2 \bar{y}}{\sum x_{2i}^2 - n\bar{x}_2^2} = \frac{\sum (x_{2i} - \bar{x}_2)(y_i - \bar{y})}{\sum (x_{2i} - \bar{x}_2)^2}$$

and

$$\hat{\beta}_1 = \frac{\sum x_{2i}^2 \sum y_i - \sum x_{2i} \sum x_{2i} y_i}{n\sum x_{2i}^2 - \left(\sum x_{2i}\right)^2} = \frac{\bar{y}\sum x_{2i}^2 - \bar{x}_2 \sum x_{2i} y_i}{\sum x_{2i}^2 - n\bar{x}_2^2} = \bar{y} - \bar{x}_2 \hat{\beta}_2.$$
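A short check with simulated data shows that these closed-form expressions coincide with the matrix solution; the intercept 1.5 and slope 0.8 are hypothetical.

```python
# Sketch: closed-form slope/intercept versus the matrix route.
import numpy as np

rng = np.random.default_rng(7)
x2 = rng.uniform(0.0, 10.0, size=40)
y = 1.5 + 0.8 * x2 + rng.normal(scale=0.5, size=40)

b2 = np.sum((x2 - x2.mean()) * (y - y.mean())) / np.sum((x2 - x2.mean()) ** 2)
b1 = y.mean() - b2 * x2.mean()

X = np.column_stack([np.ones_like(x2), x2])
print(b1, b2, np.linalg.solve(X.T @ X, X.T @ y))  # the two routes agree
```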

Next, suppose the y_i are independently and normally distributed with E(y_i) = β1 + β2 x_i and common variance σ². The estimators of β1 and β2 are obtained by the method of least squares on minimizing

$$\phi = \sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_i)^2.$$

The likelihood is

$$L = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \beta_1 - \beta_2 x_i)^2}.$$

L is maximum when Σ(y_i − β1 − β2 x_i)² is minimum. By the method of maximum likelihood, we choose β1 and β2 such that Σ(y_i − β1 − β2 x_i)² = φ is minimum. Hence the methods of least squares and maximum likelihood give identical estimators.

Chapter 1

Theory of Point Estimation

1.1 Introduction

A statistical investigation starts with a model for the phenomenon that we seek to describe. (The choice of the model is dictated partly by the nature of the phenomenon and partly by the way data on the phenomenon are collected. Mathematical simplicity is also a point that is given some consideration in choosing the model.) In general, the model takes the form of a specification of the joint distribution function of some random variables X1, X2, …, Xn (all or some of which may as well be multidimensional). According to the model, the distribution function F is supposed to be some (unspecified) member of a more or less general class ℱ of distribution functions.

Example 1.1 In many situations, we start by assuming that X1, X2, …, Xn are iid (independently and identically distributed) unidimensional r.v.'s (random variables) with a common but unspecified distribution function, F1, say. In other words, the model states that F is some member of the class of all distribution functions of the form

$$F(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} F_1(x_i).$$

Example 1.2 The model may state, besides making the assumption that X1, X2, …, Xn are iid r.v.'s, that each has the normal distribution (its mean and/or variance being left unspecified).

In carrying out the statistical investigation, we then take as our goal the task of specifying F more completely than is done by the model. This task is achieved by taking a set of observations on the r.v.'s X1, X2, …, Xn. These observations are the raw material of the investigation, and we may denote them, respectively, by x1, x2, …, xn. These are used to make a guess about the distribution function F, which is partly unknown.


The process is called statistical inference; it is similar to the process of inductive inference as envisaged in classical logic, for here too the problem is to know the general nature of the phenomenon under study (as represented by the distribution of the r.v.'s) on the basis of a particular set of observations. The only difference is that in a statistical investigation induction is achieved within a probabilistic framework. Probabilistic considerations enter into the picture in three ways. First, the model used to represent the field of study is probabilistic. Second, certain probabilistic principles provide the guidelines in making the inference. Third, as we shall see in the sequel, the reliability of the conclusions is also judged in probabilistic terms.

Random Sampling

Consider a statistical experiment that culminates in outcomes x, which are the values assumed by a r.v. X. Let F be the distribution function of X. One can obtain n independent observations on X. This means that the n values observed, x1, x2, …, xn, are assumed by the r.v. X. (This can be achieved by replicating the experiment under (more or less) identical conditions.) Again, each x_i may be regarded as the value assumed by a r.v. X_i, i = 1(1)n, where X1, X2, …, Xn are independent random variables with common distribution function F. The set X1, X2, …, Xn of iid r.v.'s is known as a random sample from the distribution function F. The set of values (x1, x2, …, xn) is called a realization of the sample (X1, X2, …, Xn).

Parameter and Parameter Space

A constant which changes its value from one situation to another is known as a parameter. The set of all admissible values of a parameter is often called the parameter space. A parameter is denoted by θ (θ may be a vector). We denote the parameter space by Θ.

Example 1.3

(a) Let y = 2x + θ. Here, θ is a parameter and Θ = {θ; −∞ < θ < ∞}.
(b) Let x ~ b(1, p). Here, p is a parameter and Θ = {p; 0 < p < 1}.
(c) Let x ~ P(λ). Here, λ is a parameter and Θ = {λ; λ > 0}.
(d) Let x ~ N(μ, σ²) with μ known. Here, σ is a parameter and Θ = {σ; σ > 0}.
(e) Let x ~ N(μ, σ²), where both μ and σ are unknown. Here, θ = (μ, σ) is a parameter and Θ = {(μ, σ); −∞ < μ < ∞, σ > 0}.


Family of distributions

Let X ~ Fθ, where θ ∈ Θ. Then the set of distribution functions {Fθ; θ ∈ Θ} is called a family of distribution functions. Similarly, we define a family of p.d.f.'s and a family of p.m.f.'s.

Remark

(1) If the functional form of Fθ is known, then θ can be taken as an index.
(2) In the theory of estimation, we restrict ourselves to the case Θ ⊆ ℝᵏ, where k is the number of unknown functionally unrelated parameters.

Statistic

A statistic is a function of observable random variables which must be free from the unknown parameter(s); that is, a Borel measurable function f: ℝⁿ → ℝᵏ of the sample observations x = (x1, x2, …, xn) ∈ ℝⁿ is called a statistic. For example, ΣXi, ΣXi² and (ΣXi, ΣXi²) are each a statistic.

Estimator and estimate

Any statistic which is used to estimate (or to guess) τ(θ), a function of the unknown parameter θ, is said to be an estimator of τ(θ). The experimentally determined value of an estimator is called an estimate.

Example 1.5 Let X1, X2, …, X5 be a random sample from P(λ). An estimator of λ is X̄ = (1/5)Σᵢ₌₁⁵ Xᵢ. Suppose the experimentally determined values are X1 = 1, X2 = 4, X3 = 2, X4 = 6 and X5 = 0. Then the estimate of λ is (1 + 4 + 2 + 6 + 0)/5 = 2.6.

1.2 Sufficient Statistic

In statistics, the job of a statistician is to interpret the data that he has collected and to draw statistically valid conclusions about the population under investigation. In many cases, however, the raw data, which are too numerous and too costly to store, are not suitable for this purpose. Therefore, the statistician would like to condense the data by computing some statistics and to base his analysis on these statistics, provided there is no loss of relevant information in doing so; that is, the statistician would like to choose those statistics which exhaust all information about the parameter that is contained in the sample. Keeping this idea in mind, we define sufficient statistics as follows:

Definition Let X = (X1, X2, …, Xn) be a random sample from {Fθ; θ ∈ Θ}. A statistic T(X) is said to be sufficient for θ [or for the family of distributions {Fθ; θ ∈ Θ}] iff the conditional distribution of X given T is free from θ.


Illustration 1.1 Suppose a coin is tossed and we want to estimate p, the probability of getting a head in a single toss. To estimate p, n tosses are performed. Suppose the results are X1, X2, …, Xn, where

$$X_i = \begin{cases} 0 & \text{if a tail appears} \\ 1 & \text{if a head appears (in the } i\text{th toss).} \end{cases}$$

Intuitively, to estimate p it is enough to keep a record of the number of heads. So the statistic T = ΣXi should be sufficient for p.

Again, the conditional distribution of X1 = x1, X2 = x2, …, Xn = xn given T(X) = t, where t = T(x1, x2, …, xn), is given by

$$P\{X_1 = x_1, \ldots, X_n = x_n \mid T = t\} = \begin{cases} \dfrac{P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n, T = t)}{P(T = t)} & \text{if } \sum x_i = t \\ 0 & \text{otherwise} \end{cases}$$

$$= \begin{cases} \dfrac{p^{\sum x_i}(1-p)^{n-\sum x_i}}{\binom{n}{t} p^t (1-p)^{n-t}} = \dfrac{1}{\binom{n}{t}} & \text{if } \sum x_i = t \\ 0 & \text{otherwise,} \end{cases}$$

which does not depend on p. So, from the definition of sufficient statistics, we observe that Σxi is a sufficient statistic for p.
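The same point can be made computationally. The enumeration sketch below (illustrative, not from the book) lists P(X = x | T = t) for every binary outcome with Σxᵢ = t and shows that the answer, 1/C(n, t), does not change with p.

```python
# Sketch: the conditional distribution given T = sum(X_i) is free of p.
from itertools import product
from math import comb

def conditional_probs(n: int, t: int, p: float) -> list:
    """Exact P(X1=x1,...,Xn=xn | T=t) for every binary x with sum(x) = t."""
    p_t = comb(n, t) * p**t * (1 - p) ** (n - t)   # P(T = t)
    return [
        (p**t * (1 - p) ** (n - t)) / p_t          # joint probability / P(T = t)
        for x in product([0, 1], repeat=n)
        if sum(x) == t
    ]

for p in (0.2, 0.5, 0.9):
    print(p, conditional_probs(4, 2, p))           # each entry is 1/C(4,2) = 1/6
```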

Illustration 1.2 Let X1, X2, …, Xn be a random sample from N(μ, 1), where μ is unknown. Consider an orthogonal transformation of the form

$$y_1 = \frac{X_1 + X_2 + \cdots + X_n}{\sqrt{n}} \quad \text{and} \quad y_k = \frac{(k-1)X_k - (X_1 + X_2 + \cdots + X_{k-1})}{\sqrt{k(k-1)}}, \quad k = 2(1)n.$$

Clearly, y1 ~ N(√n μ, 1) and each yk ~ N(0, 1). Again, y1, y2, …, yn are independent.

Note that the joint distribution of y2, y3, …, yn does not involve μ, i.e., y2, …, yn do not provide any information on μ. So to estimate μ, we use either the observations on X1, X2, …, Xn or simply the observed value of y1. Thus any analysis based on y1 is just as effective as an analysis based on all the observed values of X1, X2, …, Xn. Hence, we can suggest that y1 is a sufficient statistic for μ.

Moreover, since y1, y2, …, yn are independent, the conditional distribution of (y2, y3, …, yn) given y1 is the same as the unconditional distribution of (y2, y3, …, yn). Hence, the conditional distribution of X given y1 will be free from μ. Thus, according to the definition of sufficient statistics, y1 will be a sufficient statistic for μ.

However, this approach is not always fruitful. To overcome this, we consider a

necessary and sufﬁcient condition for a statistic to be sufﬁcient.

We ﬁrst consider the Fisher–Neyman criterion for the existence of a sufﬁcient

statistic for a parameter.

Let X = (X1, X2, …, Xn) be a random sample from a population with continuous distribution function Fθ, θ ∈ Θ. Let T(X) be a statistic whose probability density function is g{T(x); θ}. Then T(X) is a sufficient statistic for θ iff the joint probability density function f(x; θ) of X1, X2, …, Xn can be expressed as

$$f(x; \theta) = g\{T(x); \theta\}\, h(x),$$

where, for every fixed value of T(x), h(x) does not depend upon θ.

Example 1.5 Let X1, X2, …, Xn be a random sample from the distribution with probability mass function f(x; θ) = θˣ(1 − θ)^{1−x}, x = 0, 1; 0 < θ < 1. The statistic T(X) = Σᵢ₌₁ⁿ Xᵢ has the probability mass function

$$g(t; \theta) = \frac{n!}{t!(n-t)!}\, \theta^t (1-\theta)^{n-t}, \quad t = 0, 1, 2, \ldots, n.$$

Thus the joint probability mass function of X1, X2, …, Xn may be written as

$$f(x; \theta) = \theta^{x_1 + x_2 + \cdots + x_n}(1-\theta)^{n-(x_1 + x_2 + \cdots + x_n)} = \left[\frac{n!}{t!(n-t)!}\, \theta^t (1-\theta)^{n-t}\right] \frac{t!(n-t)!}{n!}.$$

By the Fisher–Neyman criterion, T(X) = X1 + X2 + ⋯ + Xn is a sufficient statistic for θ. In some cases, it is quite tedious to find the p.d.f. or p.m.f. of a statistic in order to decide whether it is or is not sufficient for θ. This difficulty can be avoided if we use the following result.

Fisher–Neyman factorization theorem Let X = (X1, X2, …, Xn) be a random sample from a population with c.d.f. Fθ, θ ∈ Θ. Furthermore, let X1, X2, …, Xn all be of discrete type or all of continuous type. Then a statistic T(X) will be sufficient for θ, or for {Fθ; θ ∈ Θ}, iff the joint p.m.f. or p.d.f. f(x; θ) of X1, X2, …, Xn can be expressed as

$$f(x; \theta) = g\{T(x); \theta\}\, h(x),$$

where the first factor g{T(x); θ} is a function of θ and x only through T(x), and for fixed T(x) the second factor h(x) is free from θ and is non-negative.

Remark 1.1 When we say that a function is free from θ, we mean not only that θ does not appear in the functional form but also that the domain of the function does not involve θ. For example, the function

$$f(x) = \begin{cases} 1/2, & \theta - 1 < x < \theta + 1 \\ 0 & \text{otherwise} \end{cases}$$

does depend on θ.

Corollary 1.1 Let T(X) be a sufficient statistic for θ and T′(X) = ψ{T(X)} be a one-to-one function of T. Then T′(X) is also sufficient for the same parameter θ.

Proof Since T is sufficient for θ, by the factorization theorem we have

$$f(x; \theta) = g\{T(x); \theta\}\, h(x).$$

Since the function ψ is one-to-one,

$$f(x; \theta) = g\left[\psi^{-1}\{T'(x)\}; \theta\right] h(x).$$

Clearly, the first factor of the R.H.S. depends on θ and x only through T′(x), and the second factor h(x) is free from θ and is non-negative. Therefore, by the factorization criterion, we can say that T′(x) is also sufficient for the same parameter θ.

Example 1.6 Let X1, X2, …, Xn be a random sample from b(1, π). We show that (1/n)ΣXi is a sufficient statistic for π.

The p.m.f. of X is

$$f_\theta(x) = \begin{cases} \theta^x (1-\theta)^{1-x} & \text{if } x = 0, 1 \quad [\theta \equiv \pi] \\ 0 & \text{otherwise,} \end{cases}$$

where 0 < θ < 1, i.e., the parameter space is Θ = (0, 1). Writing fθ(x) in the form fθ(x) = C(x)θˣ(1−θ)^{1−x} with

$$C(x) = \begin{cases} 1 & \text{if } x = 0, 1 \\ 0 & \text{otherwise,} \end{cases}$$

we get

$$\prod_i f_\theta(x_i) = \theta^{\sum x_i}(1-\theta)^{n-\sum x_i} \prod_i C(x_i) = g_\theta(t)\, h(x_1, x_2, \ldots, x_n), \quad \text{(say)}$$

where t = Σxi, gθ(t) = θᵗ(1−θ)^{n−t} and h(x1, x2, …, xn) = Πᵢ C(xᵢ). Hence, the factorization criterion is met by the joint distribution, implying that T = ΣXi is sufficient for θ. So is T/n, the sample proportion of successes, being in one-to-one correspondence with T.

Example 1.7 Let X1, …, Xn be a random sample from P(λ). We show that (1/n)ΣXi is a sufficient statistic for λ.

The p.m.f. of the Poisson distribution is

$$f_\theta(x) = \begin{cases} \dfrac{e^{-\theta}\theta^x}{x!} & \text{if } x = 0, 1, 2, \ldots \quad [\theta \equiv \lambda] \\ 0 & \text{otherwise.} \end{cases}$$

Let us write the p.m.f. in the form fθ(x) = C(x)e^{−θ}θˣ with

$$C(x) = \begin{cases} \dfrac{1}{x!} & \text{if } x = 0, 1, 2, \ldots \\ 0 & \text{otherwise.} \end{cases}$$

Then

$$\prod_i f_\theta(x_i) = e^{-n\theta}\theta^{\sum x_i} \prod_i C(x_i) = g_\theta(t)\, h(x_1, x_2, \ldots, x_n), \quad \text{(say)}$$

where t = Σxi, gθ(t) = e^{−nθ}θᵗ and h(x1, x2, …, xn) = Πᵢ C(xᵢ). The factorization criterion thus holds, implying that T = ΣXi is sufficient for θ; so is T/n = X̄, the sample mean.

Example 1.8 Let X1, X2, …, Xn be a random sample from N(μ, σ²). Show that (i) if σ is known, ΣXi is a sufficient statistic for μ; (ii) if μ is known, Σ(Xi − μ)² is a sufficient statistic for σ²; and (iii) if both μ and σ are unknown, (ΣXi, ΣXi²) is a sufficient statistic for (μ, σ²).

Ans. (i) We may take the variance to be σ² and the unknown mean to be μ, varying over the space Θ = (−∞, ∞). Here, the joint p.d.f. of X1, X2, …, Xn is

$$\prod_i \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} = \frac{1}{(\sigma\sqrt{2\pi})^n}\, e^{-\frac{1}{2\sigma^2}\sum_i (x_i-\mu)^2} = e^{-\frac{n(\bar{x}-\mu)^2}{2\sigma^2}} \left\{ e^{-\frac{\sum_i (x_i-\bar{x})^2}{2\sigma^2}}\, \frac{1}{(\sigma\sqrt{2\pi})^n} \right\} = g_\mu(t)\, h(x_1, x_2, \ldots, x_n), \quad \text{(say)}$$

where t = x̄, g_μ(t) = e^{−n(x̄−μ)²/2σ²} and h(x1, x2, …, xn) = (σ√(2π))^{−n} e^{−Σ(xᵢ−x̄)²/2σ²}. Thus the factorizability condition holds with respect to T = X̄, the sample mean, which is therefore sufficient for μ. So is the sum Σᵢ Xᵢ.

(ii) The unknown variance σ² = θ, say, is supposed to vary over Θ = (0, ∞). The joint p.d.f. of X1, X2, …, Xn may be written as

$$\prod_i \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} = \frac{1}{(\sigma\sqrt{2\pi})^n}\, e^{-\frac{1}{2\sigma^2}\sum_i (x_i-\mu)^2} = g_\theta(t)\, h(x_1, x_2, \ldots, x_n), \quad \text{say},$$

where t = Σᵢ(xᵢ − μ)², gθ(t) = (√(2π)σ)^{−n} e^{−t/2σ²} (σ² ≡ θ) and h(x) = 1. Hence, T = Σ(Xᵢ − μ)² is a sufficient statistic for θ. So is S₀² = (1/n)Σᵢ(xᵢ − μ)², which in this situation is commonly used to estimate σ².

(iii) Taking the unknown mean and variance to be θ1 and θ2, respectively, we now have for θ a vector θ = (θ1, θ2) varying over the parameter space (which is a half-plane) Θ = {(θ1, θ2): −∞ < θ1 < ∞, 0 < θ2 < ∞}. The joint p.d.f. of X1, X2, …, Xn may now be written as

$$\prod_i \frac{1}{\sqrt{2\pi\theta_2}}\, e^{-\frac{(x_i-\theta_1)^2}{2\theta_2}} = \frac{1}{(2\pi\theta_2)^{n/2}}\, e^{-\frac{1}{2\theta_2}\left[n(\bar{x}-\theta_1)^2 + (n-1)s^2\right]} = g_\theta(t_1, t_2)\, h(x), \quad \text{say},$$

where t1 = x̄, t2 = s² = Σᵢ(xᵢ − x̄)²/(n − 1),

$$g_\theta(t_1, t_2) = \frac{1}{(2\pi\theta_2)^{n/2}}\, e^{-\frac{1}{2\theta_2}\left[n(t_1-\theta_1)^2 + (n-1)t_2\right]} \quad \text{and} \quad h(x) = 1.$$

The factorizability condition is thus observed to hold with regard to the statistics T1 = X̄, the sample mean, and T2 = s², the sample variance. Hence, X̄ and s² are jointly sufficient for θ1 and θ2, i.e., (ΣXi, ΣXi²) is a joint sufficient statistic for (μ, σ²).

Example 1.9 Let X1, X2, …, Xn be a random sample from R(0, θ). Show that X₍ₙ₎ = max₁≤ᵢ≤ₙ Xᵢ is a sufficient statistic for θ.

Ans.: The joint p.d.f. of x1, x2, …, xn is

$$f(x; \theta) = \begin{cases} \dfrac{1}{\theta^n} & \text{if } 0 < x_i < \theta \;\; \forall i \\ 0 & \text{otherwise} \end{cases} = \begin{cases} \dfrac{1}{\theta^n} & \text{if } 0 < x_{(n)} < \theta \\ 0 & \text{otherwise} \end{cases} = \frac{1}{\theta^n}\, I_{(0,\theta)}\!\left(x_{(n)}\right),$$

where

$$I_{(a,b)}(x) = \begin{cases} 1 & \text{if } a < x < b \\ 0 & \text{otherwise;} \end{cases}$$

i.e., f(x; θ) = g{x₍ₙ₎; θ} h(x), say, with h(x) = 1. Note that g{x₍ₙ₎, θ} is a function of θ and x only through x₍ₙ₎, whereas for fixed x₍ₙ₎, h(x) is free from θ. Hence, x₍ₙ₎ is a sufficient statistic for θ.

Example 1.10 Let X1, X2, …, Xn be a random sample from R(θ1, θ2). Show that {X₍₁₎, X₍ₙ₎} is a sufficient statistic for θ = (θ1, θ2), where X₍₁₎ = min₁≤ᵢ≤ₙ Xᵢ and X₍ₙ₎ = max₁≤ᵢ≤ₙ Xᵢ.

Ans.: The joint p.d.f. is

$$f(x; \theta) = \begin{cases} \dfrac{1}{(\theta_2-\theta_1)^n} & \text{if } \theta_1 < x_i < \theta_2 \;\; \forall i \\ 0 & \text{otherwise} \end{cases} = \begin{cases} \dfrac{1}{(\theta_2-\theta_1)^n} & \text{if } \theta_1 < x_{(1)} \le x_{(n)} < \theta_2 \\ 0 & \text{otherwise} \end{cases} = \frac{1}{(\theta_2-\theta_1)^n}\, I_{1(\theta_1,\infty)}\{x_{(1)}\}\, I_{2(-\infty,\theta_2)}\{x_{(n)}\},$$

where

$$I_{1(\theta_1,\infty)}\{x_{(1)}\} = \begin{cases} 1 & \text{if } \theta_1 < x_{(1)} < \infty \\ 0 & \text{otherwise} \end{cases} \quad \text{and} \quad I_{2(-\infty,\theta_2)}\{x_{(n)}\} = \begin{cases} 1 & \text{if } -\infty < x_{(n)} < \theta_2 \\ 0 & \text{otherwise;} \end{cases}$$

i.e., f(x; θ) = g[{x₍₁₎, x₍ₙ₎}; (θ1, θ2)] h(x), where g[{x₍₁₎, x₍ₙ₎}; (θ1, θ2)] = (θ2 − θ1)^{−n} I_{1(θ1,∞)}{x₍₁₎} I_{2(−∞,θ2)}{x₍ₙ₎} and h(x) = 1. Note that g is a function of (θ1, θ2) and x only through {x₍₁₎, x₍ₙ₎}, whereas for fixed {x₍₁₎, x₍ₙ₎}, h(x) is free from (θ1, θ2). Hence, {x₍₁₎, x₍ₙ₎} is a sufficient statistic for (θ1, θ2).

Example 1.11 Let X1, X2, …, Xn be a random sample from a population having p.d.f.

$$f(x; \theta) = \begin{cases} e^{-(x-\theta)}, & x > \theta \\ 0 & \text{otherwise.} \end{cases}$$

Show that X₍₁₎ = min₁≤ᵢ≤ₙ Xᵢ is a sufficient statistic for θ.

Solution The joint p.d.f. can be written as

$$f(x; \theta) = \begin{cases} e^{-\sum_i (x_i - \theta)} & \text{if } x_{(1)} > \theta \\ 0 & \text{otherwise} \end{cases} = e^{n\theta}\, I_{(\theta,\infty)}\!\left(x_{(1)}\right) \cdot e^{-\sum_i x_i} = g\{x_{(1)}; \theta\}\, h(x), \quad \text{say},$$

where

$$I_{(\theta,\infty)}\!\left(x_{(1)}\right) = \begin{cases} 1 & \text{if } x_{(1)} > \theta \\ 0 & \text{otherwise,} \end{cases}$$

g{x₍₁₎; θ} = e^{nθ} I_{(θ,∞)}(x₍₁₎) and h(x) = e^{−Σxᵢ}. Note that g{x₍₁₎; θ} is a function of θ and x only through x₍₁₎, and for fixed x₍₁₎, h(x) is free from θ. Hence, by the factorizability criterion, x₍₁₎ is a sufficient statistic for θ.

Note In the above three problems, the domain of the probability density depends upon the parameter θ. In such situations, we should be careful in applying the Fisher–Neyman factorization theorem and should give proper consideration to the domain of the function h(x) for every fixed value of T(X). In these situations, it is better to use the Fisher–Neyman criterion. Let us solve Example 1.10 by using the Fisher–Neyman criterion:

$$f(x) = \frac{1}{\theta_2 - \theta_1}, \quad \theta_1 < x < \theta_2.$$

Let X₍₁₎ = min₁≤ᵢ≤ₙ Xᵢ = y1 and X₍ₙ₎ = max₁≤ᵢ≤ₙ Xᵢ = y2. The joint p.d.f. of y1, y2 is

$$g(y_1, y_2; \theta_1, \theta_2) = \frac{n(n-1)}{(\theta_2 - \theta_1)^n}\, (y_2 - y_1)^{n-2}, \quad \theta_1 < y_1 < y_2 < \theta_2.$$

Now

$$f(x; \theta_1, \theta_2) = \frac{1}{(\theta_2 - \theta_1)^n} = \frac{n(n-1)}{(\theta_2 - \theta_1)^n}\left(x_{(n)} - x_{(1)}\right)^{n-2} \cdot \frac{1}{n(n-1)\left(x_{(n)} - x_{(1)}\right)^{n-2}} = g\left[\left\{x_{(1)}, x_{(n)}\right\}; \theta_1, \theta_2\right] h(x).$$

By the Fisher–Neyman criterion, {x₍₁₎, x₍ₙ₎} is a sufficient statistic for θ = (θ1, θ2).

Example 1.12 Let X ~ N(0, σ²). Show that |X| is sufficient for σ.

Solution

$$f(x; \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}} = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{|x|^2}{2\sigma^2}} \cdot 1 = g(t; \sigma)\, h(x), \qquad \sigma > 0, \;\; h(x) = 1,$$

where g(t, σ) is a function of σ and x only through t = |x|, and for fixed t, h(x) = 1 is free from σ. Hence, by the Fisher–Neyman factorization theorem, |X| is sufficient for σ.

Example 1.13 Let X1, X2, …, Xn be a random sample from a double-exponential distribution whose p.d.f. may be taken as fθ(x) = (1/2)exp(−|x − θ|), where the unknown parameter θ varies over the space Θ = (−∞, ∞). In this case, the joint p.d.f. is

$$\prod_i f_\theta(x_i) = \frac{1}{2^n}\, \exp\left(-\sum_i |x_i - \theta|\right).$$

For no single statistic T is it possible to express the joint p.d.f. in the form gθ(t)h(x1, x2, …, xn). Hence, there exists no statistic T which, taken alone, is sufficient for θ. The whole set X1, X2, …, Xn, or the set X₍₁₎, X₍₂₎, …, X₍ₙ₎, is of course sufficient.

Remarks 1.2 A single sufficient statistic does not always exist. For example, let X1, X2, …, Xn be a random sample from a population having p.d.f.

$$f(x; \theta) = \begin{cases} \dfrac{1}{\theta}, & k\theta < x < (k+1)\theta, \; k > 0 \\ 0 & \text{otherwise.} \end{cases}$$

Here, no single sufficient statistic for θ exists. In fact, {x₍₁₎, x₍ₙ₎} is sufficient for θ.

Remark 1.3 Not all functions of a sufficient statistic are sufficient. For example, in random sampling from N(μ, σ²), σ² being known, X̄² is not sufficient for μ. (Is X̄² sufficient for μ²?)

Remark 1.4 Not all statistics are sufficient. Let X1, X2 be a random sample from P(λ). Then X1 + 2X2 is not sufficient for λ, because, in particular,

$$P\{X_1 = 0, X_2 = 1 \mid X_1 + 2X_2 = 2\} = \frac{P\{X_1 = 0, X_2 = 1\}}{P\{X_1 = 0, X_2 = 1\} + P\{X_1 = 2, X_2 = 0\}} = \frac{e^{-\lambda}\, e^{-\lambda}\lambda}{e^{-\lambda}\, e^{-\lambda}\lambda + e^{-\lambda}\frac{\lambda^2}{2!}\, e^{-\lambda}} = \frac{2}{2+\lambda},$$

which depends on λ.
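A small numerical sketch confirms that this conditional probability moves with λ (the λ values chosen below are arbitrary), which is exactly why X1 + 2X2 fails to be sufficient.

```python
# Sketch: P(X1=0, X2=1 | X1 + 2*X2 = 2) = 2/(2 + lambda) depends on lambda.
from scipy import stats

def cond_prob(lam: float) -> float:
    p01 = stats.poisson.pmf(0, lam) * stats.poisson.pmf(1, lam)  # (X1, X2) = (0, 1)
    p20 = stats.poisson.pmf(2, lam) * stats.poisson.pmf(0, lam)  # (X1, X2) = (2, 0)
    return p01 / (p01 + p20)

for lam in (0.5, 1.0, 4.0):
    print(lam, cond_prob(lam), 2 / (2 + lam))   # the two columns match and vary
```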

Remarks 1.5 Let θ = (θ1, θ2, …, θk) and T = (T1, T2, …, Tm). Further, let T be a sufficient statistic for θ. Then we cannot put any restriction on m relative to k, the number of parameters involved in the distribution (we may have m ≥ k). Even if m = k, we cannot say that the component Ti of T is sufficient for θi of θ. It is better to say that (T1, T2, …, Tm) are jointly sufficient for (θ1, θ2, …, θk). For example, let X1, X2, …, Xn be a random sample from N(μ, σ²). Here, ΣXi and ΣXi² are jointly sufficient for μ and σ².

Remarks 1.6 The whole set of observations X = (X1, X2, …, Xn) is always sufficient for θ. But we do not consider this to be a real sufficient statistic when another sufficient statistic exists. There are a few situations where the whole set of observations is the only sufficient statistic [as shown in the example of the double-exponential distribution].

Remarks 1.7 The set of all order statistics T = {X₍₁₎, X₍₂₎, …, X₍ₙ₎}, X₍₁₎ ≤ X₍₂₎ ≤ ⋯ ≤ X₍ₙ₎, is sufficient for the family. The conditional probability of (X | T = t) is 1/n!, because for each T = t we have the n! permutations (x1, x2, …, xn).

Remarks 1.8: Distributions admitting a sufficient statistic Let X1, X2, …, Xn be a random sample from f(x; θ), and let T(X) be a sufficient statistic for θ (θ a scalar). By the factorization theorem,

$$\sum_i \log f(x_i; \theta) = \log g(T; \theta) + \log h(x).$$

Differentiating with respect to θ, we have

$$\sum_i \frac{\partial \log f(x_i; \theta)}{\partial \theta} = \frac{\partial \log g(T; \theta)}{\partial \theta} = G(T; \theta), \quad \text{(say)}. \tag{1.1}$$

In particular, for some fixed value θ0 of θ, writing u(xᵢ) = ∂ log f(xᵢ; θ)/∂θ evaluated at θ = θ0 and G(T) = G(T; θ0), we have

$$\sum_{i=1}^{n} u(x_i) = G(T). \tag{1.2}$$

Differentiating (1.1) and (1.2) with respect to xᵢ,

$$\frac{\partial^2 \log f(x_i; \theta)}{\partial \theta\, \partial x_i} = \frac{\partial G(T; \theta)}{\partial T}\, \frac{\partial T}{\partial x_i} \tag{1.3}$$

and

$$\frac{\partial u(x_i)}{\partial x_i} = \frac{\partial G(T)}{\partial T}\, \frac{\partial T}{\partial x_i}. \tag{1.4}$$

Dividing (1.3) by (1.4),

$$\frac{\partial^2 \log f(x_i; \theta)}{\partial \theta\, \partial x_i} \bigg/ \frac{\partial u(x_i)}{\partial x_i} = \frac{\partial G(T; \theta)}{\partial T} \bigg/ \frac{\partial G(T)}{\partial T} \quad \forall i. \tag{1.5}$$

The left-hand side of (1.5) involves, apart from θ, only xᵢ, whereas the right-hand side involves all the x's through T; hence each side must be a function of θ alone, k1(θ) say. Thus

$$\frac{\partial G(T; \theta)}{\partial T} = k_1(\theta)\, \frac{\partial G(T)}{\partial T} \;\Rightarrow\; G(T; \theta) = G(T)\, k_1(\theta) + k_2(\theta)$$

$$\Rightarrow\; \frac{\partial}{\partial \theta} \sum_i \log f(x_i; \theta) = G(T)\, k_1(\theta) + k_2(\theta)$$

$$\Rightarrow\; \sum_i \log f(x_i; \theta) = G(T) \int k_1(\theta)\, d\theta + \int k_2(\theta)\, d\theta + c(x)$$

$$\Rightarrow\; \prod_i f(x_i; \theta) = A(x)\, e^{\theta_1 G(T) + \theta_2},$$

where A(x) is a function of x, θ1 is a function of θ, and θ2 is another function of θ. Thus, if a distribution is to have a sufficient statistic for its parameter, it must be of the (Koopman) form

$$f(x; \theta) = e^{B_1(\theta)u(x) + B_2(\theta) + R(x)}.$$

Example Show, by expressing the Poisson p.m.f. in Koopman form, that the Poisson distribution possesses a sufficient statistic for its parameter λ.

Here,

$$f(x; \lambda) = \frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda + x\log\lambda - \log x!},$$

which is of the form e^{B₁(θ)u(x) + B₂(θ) + R(x)}. Hence, there exists a sufficient statistic for λ.

Completeness A family of distributions is said to be complete if

$$E[g(X)] = 0 \;\; \forall \theta \in \Theta \;\Rightarrow\; P\{g(X) = 0\} = 1 \;\; \forall \theta \in \Theta.$$

Examples 1.14 (a) Let X1, X2, …, Xn be a random sample from b(1, π), 0 < π < 1. Then T = Σᵢ₌₁ⁿ Xᵢ is a complete statistic, since

$$E[g(T)] = 0 \;\; \forall \pi \in (0, 1) \;\Rightarrow\; \sum_{t=0}^{n} g(t)\binom{n}{t}\pi^t (1-\pi)^{n-t} = 0$$

$$\Rightarrow\; (1-\pi)^n \sum_{t=0}^{n} g(t)\binom{n}{t}\left(\frac{\pi}{1-\pi}\right)^t = 0 \;\; \forall \pi \in (0, 1)$$

$$\Rightarrow\; g(t) = 0 \;\text{ for } t = 0, 1, 2, \ldots, n \;\Rightarrow\; P\{g(T) = 0\} = 1 \;\; \forall \pi.$$

(b) Let X ~ N(0, σ²). Then X is not complete, since E(X) = 0 for all σ² while P(X = 0) = 1 does not hold.

A statistic is said to be a complete sufficient statistic if it is complete as well as sufficient. If (X1, X2, …, Xn) is a random sample from b(1, π), 0 < π < 1, then T = ΣXᵢ is also sufficient. So T = ΣXᵢ is a complete sufficient statistic.

Minimal Sufficient Statistic

A statistic T is said to be minimal sufficient if it is a function of every other sufficient statistic.

The sufficiency principle

A sufficient statistic for a parameter θ is a statistic that, in a certain sense, captures all the information about θ contained in the sample. Any additional information in the sample, besides the value of the sufficient statistic, does not contain any more information about θ. These considerations lead to the data reduction technique known as the sufficiency principle.

If T(X) is a sufficient statistic for θ, then any inference about θ should depend on the sample X only through the value T(X); that is, if x and y are two sample points such that T(x) = T(y), then the inference about θ should be the same whether X = x or X = y is observed.

Definition (Sufficient statistic) A statistic T(X) is a sufficient statistic for θ if the conditional distribution of the sample X given the value of T(X) does not depend on θ.

Factorization theorem Let f(x|θ) denote the joint pdf/pmf of a sample X. A statistic T(X) is a sufficient statistic for θ iff there exist functions g(t|θ) and h(x) such that, for all sample points x and all parameter values θ,

$$f(x \mid \theta) = g(t \mid \theta)\, h(x).$$

Result If T(X) is a function of T′(X), then sufficiency of T(X) implies sufficiency of T′(X).

Proof Let {B_{t′} | t′ ∈ τ′} and {A_t | t ∈ τ} be the partitions induced by T′(X) and T(X), respectively. Since T(X) is a function of T′(X), for each t′ ∈ τ′ we have B_{t′} ⊆ A_t for some t ∈ τ. Thus,

sufficiency of T(X)
⟺ the conditional distribution of X = x given T(X) = t is independent of θ, ∀t ∈ τ
⟺ the conditional distribution of X = x given X ∈ A_t is independent of θ, ∀t ∈ τ
⟹ the conditional distribution of X = x given X ∈ B_{t′} (⊆ A_t for some t ∈ τ) is independent of θ, ∀t′ ∈ τ′
⟺ the conditional distribution of X = x given T′(X) = t′ is independent of θ, ∀t′ ∈ τ′
⟺ sufficiency of T′(X).

Let X1, X2, …, Xn be i.i.d. observations from a pdf/pmf f(x|θ) that belongs to an exponential family given by

$$f(x \mid \theta) = h(x)\, c(\theta)\, \exp\left(\sum_{i=1}^{k} w_i(\theta)\, t_i(x)\right).$$

Then

$$T(X) = \left(\sum_{j=1}^{n} t_1(X_j), \ldots, \sum_{j=1}^{n} t_k(X_j)\right)$$

is a sufficient statistic for θ.

When we introduced the concept of sufficiency, we said that our objective was to condense the data without losing any information about the parameter. In any problem there are, in fact, many sufficient statistics, and in general we have to consider the choice between alternative sets of sufficient statistics. In a sample of n observations, we always have a set of n sufficient statistics [viz., the observations X = (X1, X2, …, Xn) themselves, or the order statistics X₍₁₎, X₍₂₎, …, X₍ₙ₎] for the k(≥1) parameters of the distribution. For example, in sampling from a N(μ, σ²) distribution with both μ and σ² unknown, there are, in fact, three sets of jointly sufficient statistics: the observations X = (X1, X2, …, Xn), the order statistics X₍₁₎, X₍₂₎, …, X₍ₙ₎, and (X̄, s²). We naturally prefer the jointly sufficient statistic (X̄, s²), since it condenses the data more than either of the other two. Sometimes, though not always, there will be a set of s (< n) statistics sufficient for the parameters. Often s = k, but s may be < k as well.

The question that we might ask is as follows: does there exist a set of sufficient statistics that condenses the data more than (X̄, s²)? The answer is that there does not. The notion that we are alluding to is that of a minimum set of sufficient statistics, which we label a minimal sufficient statistic. In other words, we have to ask: what is the smallest number s of statistics that constitute a sufficient set in any problem? It may be said in general that a sufficient statistic T may be expected to be minimal sufficient if it has the same dimension (i.e., the same number of components) as θ.

18 1 Theory of Point Estimation

It may be noted that every statistic induces a partition of the sample space 𝒳. The same
is true for a set of statistics: a set of statistics induces a partition of 𝒳. Loosely speaking,
the condensation of data that a statistic or a set of statistics achieves can be measured by
the number of subsets in the partition induced by the statistic or set of statistics. If
one set of statistics has fewer subsets (co-sets) in its induced partition than another
set of statistics, then we say that the first set condenses the data more than the latter.
Still loosely speaking, a minimal sufficient set of statistics is then a sufficient set of
statistics that has fewer subsets (co-sets) in its partition than the induced partition of
any other set of sufficient statistics. So a set of sufficient statistics is minimal if no other
set of sufficient statistics condenses the data more without losing sufficiency.

Thus T is minimal sufficient if any further reduction of the data is not possible
without losing sufficiency, i.e. T is minimal sufficient if there does not exist any
function U = ψ(T) that condenses the data further and is still sufficient.

Definition (Minimal sufficient statistic) A sufficient statistic T(X) is called minimal
sufficient if, for every other sufficient statistic T′(X), T(X) is a function of T′(X).

To say that T(X) is a function of T′(X) simply means that if T′(x) = T′(y),
then T(x) = T(y). In terms of the partition sets, if {B_{t′} : t′ ∈ τ′} are the partition
sets for T′(X) and {A_t : t ∈ τ} are the partition sets for T(X), then the above definition
of a minimal sufficient statistic states that every B_{t′} is a subset of some A_t. Thus the
partition associated with a minimal sufficient statistic is the coarsest possible partition
for a sufficient statistic, and a minimal sufficient statistic achieves the greatest
possible data reduction for a sufficient statistic.

Example Let Xᵢ (i = 1, 2, ..., n) be independent observations from a P(θ) distribution.
Then T = Σᵢ₌₁ⁿ Xᵢ is sufficient for θ and, in fact, it is minimal sufficient.

Since T = Σᵢ₌₁ⁿ Xᵢ is minimal sufficient, no further reduction of the data is possible
without losing sufficiency, i.e. there does not exist a function U = ψ(T), condensing
the data further, such that U is sufficient. To see this, suppose, if possible, there exists
a function

U = ψ(t) with ψ(t₁) = ψ(t₂) = ⋯ = ψ(t_k) = u for distinct values t₁, t₂, ..., t_k.

Then, since T ~ P(nθ),

$$ P_\theta[T = t \mid U = u] = \begin{cases} \dfrac{(n\theta)^{t_i}/t_i!}{\sum_{i=1}^{k}(n\theta)^{t_i}/t_i!} & \text{if } t = t_i\ (i = 1, 2, \ldots, k)\\[1ex] 0 & \text{otherwise,} \end{cases} $$

which depends on θ, so that U is not sufficient. Hence no function of T condenses the
data further while retaining sufficiency, and T = Σᵢ₌₁ⁿ Xᵢ is a minimal sufficient
statistic.
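The sufficiency of T = ΣXᵢ can also be checked empirically: given T = t, the sample should look like a draw from a multinomial distribution with equal cell probabilities 1/n, whatever θ is. Below is a minimal simulation sketch of this fact (the function name and setup are our own illustration, not from the text).

```python
import numpy as np

def conditional_given_total(theta, n=3, t=6, reps=100_000, seed=None):
    """Empirical conditional probability of the sample point (2,2,2)
    given sum(X) = t, for X_i i.i.d. Poisson(theta). Theta-free in theory."""
    rng = np.random.default_rng(seed)
    x = rng.poisson(theta, size=(reps, n))
    keep = x[x.sum(axis=1) == t]              # condition on the sufficient statistic
    return np.all(keep == np.array([2, 2, 2]), axis=1).mean()

# Exact conditional probability is multinomial: 6!/(2!2!2!) * (1/3)^6,
# independent of theta.
exact = 90 / 3**6
for theta in (0.5, 2.0, 5.0):
    print(theta, round(conditional_given_total(theta, seed=1), 4),
          "exact:", round(exact, 4))
```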

Remark 1 Since a minimal sufficient statistic is a function of every sufficient statistic,
a minimal sufficient statistic is itself sufficient.

Remark 2 A minimal sufficient statistic is not unique, since any one-to-one function of
a minimal sufficient statistic is also a minimal sufficient statistic.

The definition of a minimal sufficient statistic does not help us to find one, except for
verifying whether a given statistic is minimal sufficient. Fortunately, the following
result of Lehmann and Scheffé (1950) gives an easier way to find a minimal
sufficient statistic.

Theorem Let f(x|θ) be the pmf/pdf of a sample X. Suppose there exists a function
T(X) such that, for every two sample points x and y, the ratio f(x|θ)/f(y|θ)
is constant as a function of θ (i.e. independent of θ) iff T(x) = T(y). Then
T(X) is a minimal sufficient statistic.

Proof Let us assume f(x|θ) > 0 for all x ∈ 𝒳 and all θ. First, we show that T(X) is a
sufficient statistic. Let τ = {t : t = T(x) for some x ∈ 𝒳} be the image of 𝒳 under T,
and let {A_t : t ∈ τ} be the partition sets induced by T. For each A_t, choose and fix one
element x_t ∈ A_t. For any x ∈ 𝒳, x_{T(x)} is the fixed element chosen from the set
A_{T(x)} that contains x. Since x and x_{T(x)} are in the same set A_t, we have
T(x) = T(x_{T(x)}) and, hence, f(x|θ)/f(x_{T(x)}|θ) is constant as a function of θ.
Thus we can define a function on 𝒳 by

h(x) = f(x|θ) / f(x_{T(x)}|θ),

and h does not depend on θ. Define a function on τ by g(t|θ) = f(x_t|θ). Then

f(x|θ) = f(x_{T(x)}|θ) · f(x|θ)/f(x_{T(x)}|θ) = g(T(x)|θ) h(x),

and, by the factorization theorem, T(X) is a sufficient statistic.

Now we show that T(X) is minimal. Let T′(X) be any other sufficient statistic. By the
factorization theorem, there exist functions g′ and h′ such that

f(x|θ) = g′(T′(x)|θ) h′(x).

Let x and y be any two sample points with T′(x) = T′(y). Then

f(x|θ)/f(y|θ) = g′(T′(x)|θ) h′(x) / {g′(T′(y)|θ) h′(y)} = h′(x)/h′(y).

Since this ratio does not depend on θ, the assumptions of the theorem imply
T(x) = T(y). Thus T(x) is a function of T′(x), and T(x) is minimal. ∎

Example Let X₁, X₂, ..., Xₙ be a random sample from N(μ, σ²), with
both μ and σ² unknown. Let x and y denote two sample points, and let (x̄, s²ₓ) and
(ȳ, s²ᵧ) be the sample means and variances corresponding to the x and y samples,
respectively. Then we must have

$$ \frac{f(x\mid\mu,\sigma^2)}{f(y\mid\mu,\sigma^2)} = \frac{(2\pi\sigma^2)^{-n/2}\exp\!\big[-\{n(\bar{x}-\mu)^2+(n-1)s_x^2\}/(2\sigma^2)\big]}{(2\pi\sigma^2)^{-n/2}\exp\!\big[-\{n(\bar{y}-\mu)^2+(n-1)s_y^2\}/(2\sigma^2)\big]} $$

$$ \qquad = \exp\!\Big[\big\{-n(\bar{x}^2-\bar{y}^2)+2n\mu(\bar{x}-\bar{y})-(n-1)(s_x^2-s_y^2)\big\}\big/(2\sigma^2)\Big]. $$

This ratio will be constant as a function of μ and σ² iff x̄ = ȳ and s²ₓ = s²ᵧ, i.e.
(x̄, s²ₓ) = (ȳ, s²ᵧ). Then, by the above theorem, (X̄, s²) is a minimal sufficient
statistic for (μ, σ²).

Remark Although minimal sufficiency ⟹ sufficiency, the converse is not necessarily
true. For a random sample X₁, X₂, ..., Xₙ from an N(μ, μ) distribution,
(Σᵢ₌₁ⁿ Xᵢ, Σᵢ₌₁ⁿ Xᵢ²) is sufficient but not a minimal sufficient statistic. In fact,
Σᵢ₌₁ⁿ Xᵢ and Σᵢ₌₁ⁿ Xᵢ² are each singly sufficient for μ, Σᵢ₌₁ⁿ Xᵢ² being minimal.
(This particular example also establishes the fact that single sufficiency does not
imply minimal sufficiency.)

Unbiased Estimator

Suppose X₁, X₂, ..., Xₙ is a random sample from a population with distribution
function F_θ whose form is known, but the parameter θ is unknown. Here, we wish to
find the true value of θ on the basis of the experimentally determined values
x₁, x₂, ..., xₙ corresponding to a random sample X₁, X₂, ..., Xₙ from F_θ. Since the
observed values x₁, x₂, ..., xₙ change from one case to another, leading to different
estimates in different cases, we cannot expect that the estimate in each case will be
good in the sense of having a small deviation from the true value of the parameter.
So, we first choose an estimator T of θ such that the following condition holds:

the estimate T is close to the true value of θ, whatever sample is observed.      (1.7)

Surely, (1.7) is an ideal condition, but the mathematical handling of (1.7) is very
difficult. So we require some simpler condition. Such a condition is based on the
mean square error (m.s.e.). In this sense, an estimator will be best if its m.s.e. is least;
in other words, an estimator T will be best in the sense of m.s.e. if

E_θ(T − θ)² ≤ E_θ(T′ − θ)²  ∀θ ∈ Θ and for every other estimator T′.      (1.8)

It can readily be shown that there exists no T for which (1.8) holds. [e.g. Let θ₀
be a value of θ and consider T₀ ≡ θ₀. Note that the m.s.e. of T₀ at θ = θ₀ is 0, but
the m.s.e. of T₀ for other values of θ may be quite large.]

To sidestep this, we introduce the concept of unbiasedness.

Actually, we choose an estimator on the basis of a set of criteria. Such a set of
criteria must depend on the purpose for which we want to choose an estimator.
Usually, the set consists of the following criteria: (i) unbiasedness; (ii) minimum-
variance unbiased estimation; (iii) consistency; and (iv) efficiency.

Unbiasedness

An estimator T is said to be an unbiased estimator (u.e.) of θ [or γ(θ)] iff
E_θ(T) = θ [or γ(θ)] ∀θ ∈ Θ.
Otherwise, it will be called a biased estimator. The quantity b(θ, T) = E_θ(T) − θ
is called the bias. A function γ(θ) is estimable if it has an unbiased estimator.

Let X₁, X₂, ..., Xₙ be a random sample from a population with mean μ and
variance σ². Then X̄ and s² = {1/(n−1)} Σᵢ₌₁ⁿ (Xᵢ − X̄)² are u.e.'s of μ and σ²,
respectively.

Note
(i) Every individual observation is an unbiased estimator of the population mean.
(ii) Every partial mean is an unbiased estimator of the population mean.
(iii) Every partial sample variance [e.g. {1/(k−1)} Σᵢ₌₁ᵏ (Xᵢ − X̄ₖ)², with
X̄ₖ = (1/k) Σᵢ₌₁ᵏ Xᵢ and k < n] is an unbiased estimator of σ².
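These unbiasedness claims are easy to check by Monte Carlo; the short sketch below (our own illustration, not from the text) averages X̄ and s² over many samples and compares them with μ and σ².

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma2, n, reps = 3.0, 4.0, 10, 200_000

x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)       # divisor n-1 gives the unbiased estimator

print("E(xbar) ~", xbar.mean())  # close to mu = 3
print("E(s^2)  ~", s2.mean())    # close to sigma^2 = 4
```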

Example 1.15 Let X₁, X₂, ..., Xₙ be a random sample from N(μ, σ²). Then X̄ and
s² = {1/(n−1)} Σᵢ₌₁ⁿ (Xᵢ − X̄)² are u.e.'s for μ and σ², respectively. But the estimator

s = √[{1/(n−1)} Σᵢ₌₁ⁿ (Xᵢ − X̄)²]

is a biased estimator of σ. Since (n−1)s²/σ² ~ χ²ₙ₋₁, the bias is

$$ b(s,\sigma) = \sigma\left[\sqrt{\tfrac{2}{n-1}}\,\frac{\Gamma(n/2)}{\Gamma\!\left(\tfrac{n-1}{2}\right)} - 1\right]. $$

Example (a) Let X ~ b(1, p), 0 < p < 1.
Then there is no estimator T(X) for which E{T(X)} = p² ∀p ∈ (0, 1),
i.e. p² is not estimable. Similarly, 1/p has no unbiased estimator.

(b) For the pmf

$$ f(x;\theta) = \binom{m}{x}\binom{\theta-m}{n-x}\Big/\binom{\theta}{n},\quad x = 0, 1, 2, \ldots, n;\ \ \theta = m, m+1, \ldots, $$

there is no unbiased estimator for θ.

Remark 1.10 Usually, an unbiased estimator is not unique. Starting from two unbiased
estimators, we can construct an infinite number of unbiased estimators.

Example Let X₁, X₂, ..., Xₙ be a random sample from P(λ). Then both X̄ and
s² = {1/(n−1)} Σᵢ₌₁ⁿ (Xᵢ − X̄)² are unbiased estimators of λ, as mean = variance = λ
for P(λ). Let T_a = aX̄ + (1 − a)s², 0 ≤ a ≤ 1. Then every T_a is an unbiased
estimator of λ.

Remark 1.11 An unbiased estimator may be absurd.

Example Let X ~ P(λ). Then T(X) = (−2)ˣ is an unbiased estimator of e^{−3λ}, since

$$ E\{T(X)\} = \sum_{x}(-2)^x\,\frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda}\sum_{x}\frac{(-2\lambda)^x}{x!} = e^{-\lambda}e^{-2\lambda} = e^{-3\lambda}. $$

But T(X) = (−2)ˣ < 0 for odd X. Thus T(X), which is an estimator of the positive
quantity e^{−3λ} > 0, may occasionally be negative.

Example Let X ~ P(λ). Construct an unbiased estimator of e^{−λ}.

Ans. Let T(x) = 1 if x = 0, and T(x) = 0 otherwise. Then E{T(X)} = P(X = 0) = e^{−λ}.
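A quick simulation (ours, for illustration) confirms that both estimators are unbiased, and shows how erratically (−2)^X behaves.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, reps = 0.5, 1_000_000
x = rng.poisson(lam, reps)

t_absurd = (-2.0) ** x                 # unbiased for exp(-3*lambda), often negative
t_indicator = (x == 0).astype(float)   # unbiased for exp(-lambda)

print(t_absurd.mean(), "vs", np.exp(-3 * lam))
print(t_indicator.mean(), "vs", np.exp(-lam))
print("share of negative values of (-2)^X:", (t_absurd < 0).mean())
```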

Remark 1.12 The mean square error of an unbiased estimator (i.e. the variance of the
unbiased estimator) may be greater than that of a biased estimator, and then we prefer
the biased estimator. For any estimator T of θ,

MSE(T) = E(T − θ)² = V(T) + b²(T, θ), where b(T, θ) = E(T) − θ.

Let T₁ be a biased and T₂ an unbiased estimator of θ, i.e. E(T₂) = θ. Then

MSE(T₁) = V(T₁) + b²(T₁, θ),   MSE(T₂) = V(T₂);

if V(T₂) > V(T₁) + b²(T₁, θ), then we prefer T₁.

e.g. Let X₁, X₂, ..., Xₙ be a random sample from N(μ, σ²). Then
s² = {1/(n−1)} Σᵢ (Xᵢ − X̄)² is an unbiased estimator of σ². Clearly,
{1/(n+1)} Σᵢ (Xᵢ − X̄)² = {(n−1)/(n+1)} s² is a biased estimator of σ².

As (n−1)s²/σ² ~ χ²ₙ₋₁ ⟹ V{(n−1)s²/σ²} = 2(n−1)

⟹ V(s²) = 2σ⁴/(n−1) = MSE of s².

On the other hand,

$$ \mathrm{MSE}\Big(\tfrac{n-1}{n+1}s^2\Big) = V\Big(\tfrac{n-1}{n+1}s^2\Big) + \Big(\tfrac{n-1}{n+1}\sigma^2-\sigma^2\Big)^2 = \Big(\tfrac{n-1}{n+1}\Big)^2\frac{2\sigma^4}{n-1} + \frac{4\sigma^4}{(n+1)^2} $$

$$ \qquad = \frac{2\sigma^4}{(n+1)^2}\,(n-1+2) = \frac{2\sigma^4}{n+1} < \frac{2\sigma^4}{n-1}. $$

Thus MSE{(n−1)s²/(n+1)} < MSE(s²), i.e. here MSE (unbiased estimator) > MSE (biased
estimator).
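The sketch below (our own) checks this comparison numerically for n = 10 and σ² = 1, where the theory gives MSE(s²) = 2/9 ≈ 0.222 and MSE of the divisor-(n+1) estimator = 2/11 ≈ 0.182.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2, reps = 10, 1.0, 500_000
x = rng.normal(0.0, 1.0, size=(reps, n))

ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
s2_unbiased = ss / (n - 1)
s2_biased = ss / (n + 1)

print("MSE unbiased:", ((s2_unbiased - sigma2) ** 2).mean())  # ~ 2/(n-1) = 0.222
print("MSE biased:  ", ((s2_biased - sigma2) ** 2).mean())    # ~ 2/(n+1) = 0.182
```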

Remark 1.13: Pooling of information Let Tᵢ be an unbiased estimator of θ obtained
from the ith source, i = 1, 2, ..., k. Suppose the Tᵢ's are independent and
V(Tᵢ) = σᵢ² < σ² ∀i. Then T̄ₖ = (1/k)(T₁ + T₂ + ⋯ + Tₖ) is also an unbiased
estimator of θ, with

V(T̄ₖ) = (1/k²) Σᵢ₌₁ᵏ σᵢ² < σ²/k → 0 as k → ∞.

The implication of this statement is that T̄ₖ gets closer and closer to the true
value of the parameter as k → ∞ (k becomes larger and larger).

On the other hand, if the Tᵢ's are biased estimators with common bias β, then T̄ₖ
approaches the wrong value θ + β instead of the true value θ, even as k → ∞.

Problem 1.1 Let X₁, X₂, ..., Xₙ be a random sample from b(1, π). Show that

(i) X(X − 1)/{n(n − 1)} is an unbiased estimator of π², and
(ii) X(n − X)/{n(n − 1)} is an unbiased estimator of π(1 − π),

where X = number of successes in n tosses = Σᵢ₌₁ⁿ Xᵢ.

Minimum-Variance Unbiased Estimator (MVUE)

Let U be the set of all u.e.'s (T) of θ with E(T²) < ∞ ∀θ ∈ Θ; then an estimator
T₀ ∈ U will be called a minimum-variance unbiased estimator (MVUE) of θ
{or γ(θ)} if V(T₀) ≤ V(T) ∀θ and for every T ∈ U.

Result 1.1 Let U be the set of all u.e.'s (T) of θ with E(T²) < ∞ ∀θ ∈ Θ.
Furthermore, let U₀ be the class of all u.e.'s (v) of '0' (zero) with E(v²) < ∞ ∀θ, i.e.
U₀ = {v : E(v) = 0 ∀θ and E(v²) < ∞}.
Then an estimator T₀ ∈ U will be an MVUE of θ iff

Cov(T₀, v) = E(T₀v) = 0  ∀θ, ∀v ∈ U₀.      (1.9)

Proof (only if part) Suppose T₀ is an MVUE of θ and (1.9) does not hold, i.e.
E(T₀v₀) ≠ 0 for some θ₀ and for some v₀ ∈ U₀.

Again, T₀ + λv₀ ∈ U, as E(T₀ + λv₀) = θ and E(T₀ + λv₀)² < ∞.
Now V_{θ₀}(T₀ + λv₀) = V_{θ₀}(T₀) + λ² E_{θ₀}(v₀²) + 2λ E_{θ₀}(v₀T₀).
Choose the particular setting λ = −E_{θ₀}(T₀v₀)/E_{θ₀}(v₀²), assuming E_{θ₀}(v₀²) > 0.
We have

V_{θ₀}(T₀ + λv₀) = V_{θ₀}(T₀) − E²_{θ₀}(T₀v₀)/E_{θ₀}(v₀²) < V_{θ₀}(T₀),

which contradicts the fact that T₀ is a minimum-variance unbiased estimator of θ.

(if part) It is given that Cov(T₀, v) = 0 ∀θ, ∀v ∈ U₀. We have to prove that T₀ is
an MVUE of θ. Let T be an estimator belonging to U; then (T₀ − T) ∈ U₀.
From the given condition, Cov(T₀, T₀ − T) = 0

⟹ V(T₀) − Cov(T₀, T) = 0 ⟹ Cov(T₀, T) = V(T₀).      (1.10)

Now, V(T₀ − T) ≥ 0
⟹ V(T₀) + V(T) − 2Cov(T₀, T) ≥ 0
⟹ V(T₀) + V(T) − 2V(T₀) ≥ 0   (by (1.10))
⟹ V(T) ≥ V(T₀) ∀θ ∈ Θ. ∎

Result 1.2 A minimum-variance unbiased estimator is unique.

Proof Suppose T₁ and T₂ are both MVUEs of θ, so that V(T₁) = V(T₂) ∀θ. Since
(T₁ − T₂) ∈ U₀, Result 1.1 gives E{T₁(T₁ − T₂)} = 0

⟹ E(T₁²) = E(T₁T₂) ⟹ Cov(T₁, T₂) = V(T₁) = V(T₂) ⟹ ρ_{T₁T₂} = 1,

so T₁ and T₂ are linearly related, say T₁ = bT₂ + a with probability 1.
Now V(T₁) = b²V(T₂) ∀θ ⟹ b² = 1 ⟹ b = 1 (as ρ_{T₁T₂} = 1).
Again, E(T₁) = bE(T₂) + a ∀θ ⟹ a = 0,
as E(T₁) = E(T₂) = θ and b = 1. Hence P(T₁ = T₂) = 1. ∎

Remark The correlation coefficient between T₁ and T₂ (where T₁ is an MVUE of θ
and T₂ is any unbiased estimator of θ) is always non-negative: since (T₁ − T₂) ∈ U₀,
E{T₁(T₁ − T₂)} = 0

⟹ Cov(T₁, T₂) = V(T₁) ≥ 0.

Result 1.3 If T₁ and T₂ are MVUEs of γ₁(θ) and γ₂(θ), respectively, then
αT₁ + βT₂ will be an MVUE of αγ₁(θ) + βγ₂(θ).

Proof Let v be any u.e. of zero. Then, by Result 1.1,

E(T₁v) = 0 = E(T₂v).

Now E{(αT₁ + βT₂)v} = αE(T₁v) + βE(T₂v) = 0 ∀θ, ∀v ∈ U₀, and αT₁ + βT₂ is
unbiased for αγ₁(θ) + βγ₂(θ)

⟹ (αT₁ + βT₂) is an MVUE of αγ₁(θ) + βγ₂(θ). ∎

Result 1.4: (Cramér–Rao inequality) Let X₁, X₂, ..., Xₙ be a random sample from
a population having p.d.f. f(x; θ), θ ∈ Θ. Assume that Θ is a non-degenerate open
interval on the real line. Let T be an unbiased estimator of γ(θ). Again, assume that
the joint p.d.f. f(x; θ) = Πᵢ₌₁ⁿ f(xᵢ; θ) of X = (X₁, X₂, ..., Xₙ) satisfies the
following regularity conditions:

(a) ∂f(x; θ)/∂θ exists;

(b) ∫ {∂f(x; θ)/∂θ} dx = ∂/∂θ ∫ f(x; θ) dx;

(c) ∫ T(x) {∂f(x; θ)/∂θ} dx = ∂/∂θ ∫ T(x) f(x; θ) dx;

and

(d) 0 < I(θ) < ∞,

where I(θ) = E[{∂ log f(X; θ)/∂θ}²] is the information on θ supplied by the sample
of size n. Then

V(T) ≥ {γ′(θ)}² / I(θ)  ∀θ.

Proof Since 1 = ∫_{Rⁿ} f(x; θ) dx, we have from condition (b)

0 = ∫ {∂f(x; θ)/∂θ} dx = ∫ {∂ log f(x; θ)/∂θ} f(x; θ) dx.      (1.11)

Again, since T(X) is an u.e. of γ(θ), we have from condition (c)

γ′(θ) = ∫ T(x) {∂ log f(x; θ)/∂θ} f(x; θ) dx.      (1.12)

Subtracting γ(θ) times (1.11) from (1.12),

γ′(θ) = ∫ [T(x) − γ(θ)] {∂ log f(x; θ)/∂θ} f(x; θ) dx
      = Cov{T(X), ∂ log f(X; θ)/∂θ}.

Hence, by the Cauchy–Schwarz inequality,

{γ′(θ)}² = [Cov{T(X), ∂ log f(X; θ)/∂θ}]²
         ≤ V{T(X)} · V{∂ log f(X; θ)/∂θ}
         = V{T(X)} · E[{∂ log f(X; θ)/∂θ}²]   (as, from (1.11), E{∂ log f(X; θ)/∂θ} = 0)
         = V{T(X)} I(θ)

⟹ V(T) ≥ {γ′(θ)}²/I(θ)  ∀θ. ∎

Remark 1 If the variables are of discrete type, the underlying conditions and the
proof of the Cramér–Rao inequality are similar, the multiple integrals simply being
replaced by multiple sums.

Remark 2 For any estimator T having expectation γ(θ) = θ + B(T, θ), where B(T, θ)
denotes the bias,

$$ \mathrm{MSE} = E(T-\theta)^2 = V(T) + B^2(T,\theta) \ \geq\ \frac{\{\gamma'(\theta)\}^2}{I(\theta)} + B^2(T,\theta) = \frac{\{1+B'(T,\theta)\}^2}{I(\theta)} + B^2(T,\theta). $$

Remark 3 Assuming that f(x; θ) is differentiable not only once but also twice,
differentiation of (1.11) gives

$$ 0 = \int \frac{\partial^2 \log f(x;\theta)}{\partial\theta^2}\,f(x;\theta)\,dx + \int \Big(\frac{\partial \log f(x;\theta)}{\partial\theta}\Big)^2 f(x;\theta)\,dx $$

$$ \Rightarrow\ I(\theta) = -E\Big\{\frac{\partial^2 \log f(x;\theta)}{\partial\theta^2}\Big\}. $$

For a random sample of size n, writing f for the density of a single observation,

$$ I(\theta) = -\,n\,E\Big\{\frac{\partial^2 \log f(x;\theta)}{\partial\theta^2}\Big\}. $$

An unbiased estimator T of γ(θ) whose variance attains the Cramér–Rao lower
bound for all θ is often called a minimum-variance bound estimator (MVBE). Since
equality in the Cauchy–Schwarz step requires exact linearity, in this case we have

∂ log f(x; θ)/∂θ = k(θ){T − γ(θ)}.

Note that every MVBE is an MVUE, but the converse may not be true.

Remark 6 Distributions admitting an MVBE
A distribution admitting an MVBE of γ(θ) must satisfy
∂ log f(x; θ)/∂θ = k(θ){T − γ(θ)}. This is a differential equation, so

log f(x; θ) = T ∫ k(θ) dθ − ∫ k(θ)γ(θ) dθ + c(x)

⟹ f(x; θ) = A e^{Tθ₁ + θ₂},

where θ₁ = ∫k(θ)dθ and θ₂ = −∫k(θ)γ(θ)dθ are functions of θ, and A = e^{c(x)}
depends on x alone, i.e. f is of the one-parameter exponential (Koopman) form.

Note If T is a sufficient statistic for θ, then by the factorization theorem

L = g(T, θ) h(x₁, x₂, ..., xₙ),

or,  ∂ log L/∂θ = ∂ log g(T, θ)/∂θ.      (1.13)

Now the condition that T be an MVB unbiased estimator of θ is that
φ = ∂ log L/∂θ = B(θ)(T − θ), a linear function of T and θ, with V(T) = 1/B(θ).
Thus if there exists an MVB unbiased estimator of θ, it is also sufficient. The converse
is not true: a sufficient statistic T may exist while φ is not linear in T, in which
case T is not an MVB unbiased estimator of θ. Thus the existence of an MVB unbiased
estimator implies the existence of a sufficient estimator, but the existence of a suf-
ficient statistic does not necessarily imply the existence of an MVB unbiased
estimator. It also follows that a distribution possessing an MVB unbiased esti-
mator for its parameter can be expressed in Koopman form. Thus, when

L = e^{A(θ)T + B(θ) + R(x₁, x₂, ..., xₙ)},

T is an MVB unbiased estimator of θ with variance 1/(∂A/∂θ), which is also the MVB.

Example Let X₁, X₂, ..., Xₙ be a random sample from N(μ, 1). Here

$$ L = (2\pi)^{-n/2} e^{-\frac{1}{2}\sum_{i=1}^n (x_i-\mu)^2} = e^{\,n\mu\bar{x} - n\mu^2/2}\cdot e^{-\frac{1}{2}\sum x_i^2 - \frac{n}{2}\log 2\pi}, $$

which is of the Koopman form with A(μ) = nμ and T = x̄. Hence x̄ is an MVB
unbiased estimator of μ with variance 1/(∂A/∂μ) = 1/n.

Example Let X₁, X₂, ..., Xₙ be a random sample from a b(1, π) distribution. Here

$$ L = \pi^{\sum_i x_i}(1-\pi)^{n-\sum_i x_i} = \exp\Big\{n\bar{x}\log\tfrac{\pi}{1-\pi} + n\log(1-\pi)\Big\}. $$

Take A(π) = n log{π/(1−π)} and T = x̄ = k/n, where k = number of successes in
n trials. Then

$$ \mathrm{MVB} = \frac{1}{\partial A/\partial\pi} = \frac{1}{n/\{\pi(1-\pi)\}} = \frac{\pi(1-\pi)}{n}. $$
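As a sanity check (our own sketch), the sample proportion indeed has variance π(1−π)/n, so it attains this bound.

```python
import numpy as np

rng = np.random.default_rng(7)
p, n, reps = 0.3, 25, 400_000

k = rng.binomial(n, p, size=reps)
prop = k / n

print("simulated V(prop):", prop.var())       # ~ p(1-p)/n
print("MVB p(1-p)/n:     ", p * (1 - p) / n)  # 0.0084
```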

Remark 7 A necessary condition for the regularity conditions to hold is that the
domain of positive p.d.f. must be free from θ.

Example 1.16 Let X ~ U[0, θ]. Here f(x; θ) = 1/θ for 0 ≤ x ≤ θ, so
∂ log f(x; θ)/∂θ = −1/θ and nE[{∂ log f(x; θ)/∂θ}²] = n/θ². Hence the Cramér–Rao
lower bound for the variance of an unbiased estimator of θ is, apparently, θ²/n.

Now consider the estimator T = {(n+1)/n} X₍ₙ₎, X₍ₙ₎ = max(X₁, X₂, ..., Xₙ).
The p.d.f. of X₍ₙ₎ is

f_{X₍ₙ₎}(x) = n xⁿ⁻¹/θⁿ,  0 ≤ x ≤ θ

$$ \Rightarrow\ E(X_{(n)}) = \frac{n}{\theta^n}\int_0^{\theta} x^n\,dx = \frac{n}{\theta^n}\cdot\frac{\theta^{n+1}}{n+1} = \frac{n}{n+1}\,\theta, $$

so T is unbiased for θ, and a similar computation gives

V(T) = θ²/{n(n + 2)} < θ²/n.

This is not surprising, because the regularity conditions do not hold here.
Actually, here ∂f(x; θ)/∂θ exists for θ ≠ x but not for θ = x, since

f(x; θ) = 1/θ if θ ≥ x,
        = 0  if θ < x.
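The following short simulation (ours) illustrates that V(T) falls below the nominal θ²/n bound.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 2.0, 10, 300_000

x = rng.uniform(0, theta, size=(reps, n))
t = (n + 1) / n * x.max(axis=1)

print("E(T):", t.mean(), " (theta =", theta, ")")
print("V(T):", t.var())                             # ~ theta^2/(n(n+2)) = 0.0333
print("nominal CR bound theta^2/n:", theta**2 / n)  # 0.4
```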

Result 1.5: (Rao–Blackwell theorem) Let {F_θ; θ ∈ Θ} be a family of distribution
functions and h be any estimator of γ(θ) in U, the class of unbiased
estimators (h) of γ(θ) with E_θ(h²) < ∞ ∀θ. Let T be a sufficient statistic for the family
{F_θ; θ ∈ Θ}. Then E(h|T) is free from θ and is an unbiased estimator of γ(θ).
Moreover, V{E(h|T)} ≤ V(h) ∀θ, θ ∈ Θ.
The equality sign holds iff h = E(h|T) with probability 1.

Proof Since T is sufficient for the family {F_θ; θ ∈ Θ}, the conditional distribution
of h given T must be independent of θ; hence E(h|T) is free from θ.

Now, γ(θ) = E(h) = E_T{E(h|T)} ∀θ, i.e. γ(θ) = E{E(h|T)} ∀θ, so E(h|T) is unbiased.

Again, we know that V(h) = V{E(h|T)} + E{V(h|T)} ≥ V{E(h|T)},
where '=' holds iff E{V(h|T)} = 0, i.e. iff V(h|T) = 0 with probability 1, i.e. iff
h = E(h|T) with probability 1, since

V(h|T) = E_{h|T}[{h − E(h|T)}²]. ∎

Result 1.6: (Lehmann–Scheffé theorem) If T is a complete sufficient statistic for θ
and h is an unbiased estimator of γ(θ), then E(h|T) will be an MVUE of γ(θ).

Proof Let h₁, h₂ ∈ U = {h : E(h) = γ(θ), E(h²) < ∞}.
Then E{E(h₁|T)} = γ(θ) = E{E(h₂|T)} (from Result 1.5).
Hence E{E(h₁|T) − E(h₂|T)} = 0 ∀θ

⟹ P{E(h₁|T) = E(h₂|T)} = 1  (∵ T is complete).

Again, applying Result 1.5, we have V{E(h|T)} ≤ V(h) ∀h ∈ U. Now, since
E(h|T) is unique, it will be an MVUE of γ(θ). ∎

Remark 1.14 The implication of Result 1.5 is that if we are given an unbiased
estimator h, then we can improve upon h by forming the new estimator E(h|T)
based on a sufficient statistic T. This process of finding an improved
estimator starting from an unbiased estimator has been called Blackwellization.

Problem 1.2 Let X₁, X₂, ..., Xₙ be a random sample from N(μ, σ²), σ² known. Let
γ(μ) = μ².

(a) Show that the variance of any unbiased estimator of μ² cannot be less than
4μ²σ²/n.

(b) Show that T = X̄² − σ²/n is an MVUE of μ², with variance 4μ²σ²/n + 2σ⁴/n².

Example 1.17 Let X ~ P(λ). Show that

d(x) = 1 if x = 0, and d(x) = 0 otherwise

is the only unbiased estimator of γ(λ) = e^{−λ}. Is it an MVUE of e^{−λ}?

Answer Let h(x) be an unbiased estimator of e^{−λ} = θ, say, so that λ = logₑ(1/θ).
Then E{h(X)} = θ ∀θ, i.e.

$$ \sum_{x=0}^{\infty} h(x)\,\frac{\{\log_e(1/\theta)\}^x}{x!}\,\theta = \theta\ \ \forall\theta \quad\Rightarrow\quad \sum_{x=0}^{\infty} h(x)\,\frac{\{\log_e(1/\theta)\}^x}{x!} = 1\ \ \forall\theta $$

⟹ h(x) = 1 if x = 0, and h(x) = 0 if x ≠ 0,

by comparing coefficients of the power series. Hence h(x), i.e. d(x), is the only
unbiased estimator of e^{−λ}.

Here the unbiased estimator of e^{−λ} is unique and its variance exists:

E{h(X)}² = 1²·P(X = 0) + 0²·Σᵢ₌₁^∞ P(X = i) = e^{−λ} < ∞.

Therefore d(x) is an MVUE of γ(λ) = e^{−λ}.

an MVUE of l2 is not very sensible in this case.

Remark 1.16 An MVUE may not exist even though an unbiased estimator does

exist.

Example Let

PfX ¼ 1g ¼ h and PfX ¼ ng ¼ ð1 hÞ2 hn ; n ¼ 0; 1; 2; . . .; 0\ h \ 1.

32 1 Theory of Point Estimation

1 if X ¼ 1

e.g. T ðX Þ ¼

0 otherwise

Bhattacharya System of Lower Bounds (Generalization of the Cramér–Rao lower bound)

Regularity conditions
A family of distributions P = {f_θ(x); θ ∈ Ω} is said to satisfy the Bhattacharya
regularity conditions if

1. θ lies in an open interval Ω of the real line R (Ω may be infinite);

2. ∂ⁱf_θ(x)/∂θⁱ exists for almost all x and ∀θ, i = 1, 2, ..., k;

3. ∫ {∂ⁱf_θ(x)/∂θⁱ} dx = ∂ⁱ/∂θⁱ ∫ f_θ(x) dx  ∀θ, i = 1, 2, ..., k; and

4. V_{k×k}(θ) = ((m_{ij}(θ))), i, j = 1, 2, ..., k, exists and is positive definite ∀θ,
where

$$ m_{ij}(\theta) = E\left[\frac{1}{f_\theta(x)}\frac{\partial^i f_\theta(x)}{\partial\theta^i}\cdot\frac{1}{f_\theta(x)}\frac{\partial^j f_\theta(x)}{\partial\theta^j}\right]. $$

Theorem 1.1 Let P = {f_θ(x); θ ∈ Ω} be a family of distributions satisfying the
above-mentioned regularity conditions, and let g(θ) be a real-valued, estimable, and
k times differentiable function of θ. Let T be an unbiased estimator of g(θ) satisfying

5. ∂ⁱ/∂θⁱ ∫ t(x) f_θ(x) dx = ∫ t(x) {∂ⁱf_θ(x)/∂θⁱ} dx.

Then

Var_θ(T) ≥ g′ V⁻¹ g  ∀θ,

where

g′ = (g⁽¹⁾(θ), g⁽²⁾(θ), ..., g⁽ᵏ⁾(θ)),  g⁽ⁱ⁾(θ) = ∂ⁱg(θ)/∂θⁱ.

Proof Define

βᵢ(x, θ) = {1/f_θ(x)} ∂ⁱf_θ(x)/∂θⁱ.

Then

E[βᵢ(x, θ)] = ∫ {1/f_θ(x)} {∂ⁱf_θ(x)/∂θⁱ} f_θ(x) dx = ∂ⁱ/∂θⁱ ∫ f_θ(x) dx = 0,

V[βᵢ(x, θ)] = E[βᵢ(x, θ)²] = m_{ii}(θ),  Cov(βᵢ, βⱼ) = m_{ij}(θ),

and

Cov(T, βᵢ) = ∫ t(x) {∂ⁱf_θ(x)/∂θⁱ} dx = ∂ⁱ/∂θⁱ ∫ t(x) f_θ(x) dx = g⁽ⁱ⁾(θ).

Let Σ_{(k+1)×(k+1)} be the dispersion matrix of (T, β₁, β₂, ..., β_k)′:

$$ \Sigma = \begin{pmatrix} V_\theta(T) & g^{(1)}(\theta) & g^{(2)}(\theta) & \cdots & g^{(k)}(\theta)\\ g^{(1)}(\theta) & m_{11} & m_{12} & \cdots & m_{1k}\\ g^{(2)}(\theta) & m_{21} & m_{22} & \cdots & m_{2k}\\ \vdots & & & & \vdots\\ g^{(k)}(\theta) & m_{k1} & m_{k2} & \cdots & m_{kk} \end{pmatrix} = \begin{pmatrix} V_\theta(T) & g'\\ g & V \end{pmatrix}. $$

Hence

|Σ| = |V| (V_θ(T) − g′V⁻¹g),

and, as |Σ| ≥ 0 and |V| > 0,

V_θ(T) − g′V⁻¹g ≥ 0, i.e. V_θ(T) ≥ g′V⁻¹g  ∀θ. ∎

Corollary For k = 1, V_θ(T) ≥ {g⁽¹⁾(θ)}²/m₁₁(θ) = {g′(θ)}²/I(θ), the Cramér–Rao
lower bound.

The case of equality holds when |Σ| = 0

⟹ R(Σ) < k + 1, i.e. R(Σ) ≤ k; also R(V) = k and R(Σ) ≥ R(V) ⟹ R(Σ) = k,

where R(·) denotes the rank of a matrix.

Lemma 1.1 Let X′ = (x₁, x₂, ..., x_p) with dispersion matrix D(X) = Σ_{p×p}. Then
Σ is of rank r(≤ p) iff x₁, x₂, ..., x_p satisfy (p − r) linear restrictions of the form

a₁₁{x₁ − E(x₁)} + a₁₂{x₂ − E(x₂)} + ⋯ + a₁p{x_p − E(x_p)} = 0
a₂₁{x₁ − E(x₁)} + a₂₂{x₂ − E(x₂)} + ⋯ + a₂p{x_p − E(x_p)} = 0
⋮
a_{p−r,1}{x₁ − E(x₁)} + a_{p−r,2}{x₂ − E(x₂)} + ⋯ + a_{p−r,p}{x_p − E(x_p)} = 0

with probability 1.

Put p = k + 1, r = k; x₁ = T, x₂ = β₁, ..., x_p = β_k.
Then R(Σ) = k iff T, β₁, β₂, ..., β_k satisfy one restriction, with probability 1,
of the form

a₁{T − g(θ)} + a₂β₁ + ⋯ + a_{k+1}β_k = 0

⟹ T − g(θ) = b₁β₁ + b₂β₂ + ⋯ + b_kβ_k = b′β.

Result T − g(θ) = b′β with probability 1 ⟹ T − g(θ) = g′V⁻¹β with probability 1.

Proof T − g(θ) = b′β ⟹ V_θ(T) = g′V⁻¹g (equality in the bound). Consider

V_θ(b′β − g′V⁻¹β) = V_θ(T − g′V⁻¹β)
                  = V_θ(T) + g′V⁻¹V(β)V⁻¹g − 2g′V⁻¹Cov(T, β)
                  = V_θ(T) + g′V⁻¹g − 2g′V⁻¹g = V_θ(T) − g′V⁻¹g = 0,

so b′β = g′V⁻¹β with probability 1. ∎

A series of lower bounds: writing g′ₙ = (g⁽¹⁾(θ), ..., g⁽ⁿ⁾(θ)) and Vₙ for the
corresponding n × n matrix ((m_{ij})),

nth lower bound = g′ₙ Vₙ⁻¹ gₙ = Δₙ,  n = 1, 2, ..., k.

We claim that Δₙ₊₁ ≥ Δₙ, i.e. the bounds are non-decreasing in n.

Proof The (n+1)th lower bound is Δₙ₊₁ = g′ₙ₊₁ Vₙ₊₁⁻¹ gₙ₊₁, where

g′ₙ₊₁ = (g⁽¹⁾(θ), g⁽²⁾(θ), ..., g⁽ⁿ⁾(θ), g⁽ⁿ⁺¹⁾(θ)) = (g′ₙ, gₙ₊₁)

and

$$ V_{n+1} = \begin{pmatrix} m_{11} & \cdots & m_{1n} & m_{1,n+1}\\ \vdots & & \vdots & \vdots\\ m_{n1} & \cdots & m_{nn} & m_{n,n+1}\\ m_{n+1,1} & \cdots & m_{n+1,n} & m_{n+1,n+1} \end{pmatrix} = \begin{pmatrix} V_n & m_n\\ m'_n & m_{n+1,n+1} \end{pmatrix}, $$

where m′ₙ = (m_{n+1,1}, m_{n+1,2}, ..., m_{n+1,n}). Now, for any nonsingular matrix
C of order (n+1) × (n+1),

Δₙ₊₁ = g′ₙ₊₁ C′ (C′)⁻¹ Vₙ₊₁⁻¹ C⁻¹ C gₙ₊₁ = (Cgₙ₊₁)′ (CVₙ₊₁C′)⁻¹ (Cgₙ₊₁).

Choose

$$ C = \begin{pmatrix} I_n & 0\\ -m'_n V_n^{-1} & 1 \end{pmatrix} \ \Rightarrow\ Cg_{n+1} = \begin{pmatrix} g_n\\ g_{n+1} - m'_n V_n^{-1} g_n \end{pmatrix}, $$

$$ CV_{n+1}C' = \begin{pmatrix} V_n & 0\\ 0 & E_{n+1,n+1} \end{pmatrix},\qquad E_{n+1,n+1} = m_{n+1,n+1} - m'_n V_n^{-1} m_n > 0, $$

$$ (CV_{n+1}C')^{-1} = \begin{pmatrix} V_n^{-1} & 0\\ 0 & E_{n+1,n+1}^{-1} \end{pmatrix}. $$

Then

$$ \Delta_{n+1} = \big(g'_n,\ g_{n+1} - m'_n V_n^{-1} g_n\big) \begin{pmatrix} V_n^{-1} & 0\\ 0 & E_{n+1,n+1}^{-1} \end{pmatrix} \begin{pmatrix} g_n\\ g_{n+1} - m'_n V_n^{-1} g_n \end{pmatrix} $$

$$ \qquad = g'_n V_n^{-1} g_n + \frac{\big(g_{n+1} - m'_n V_n^{-1} g_n\big)^2}{E_{n+1,n+1}} \ \geq\ g'_n V_n^{-1} g_n = \Delta_n, $$

i.e. Δₙ₊₁ ≥ Δₙ. ∎

If there exists no unbiased estimator T of g(θ) for which V(T) attains the nth
Bhattacharya Lower Bound (BLB), then one can try to find a sharper lower bound
by considering the (n+1)th BLB. In case the lower bound is attained at the nth stage,
then Δₙ₊₁ = Δₙ. However, Δₙ₊₁ = Δₙ does not imply that the lower bound is
attained at the nth stage.

Example 1.18 Let X₁, X₂, ..., Xₙ be a random sample of i.i.d. N(θ, 1) variables, so

f_θ(x) = Const. e^{−½Σ(xᵢ−θ)²},  and take g(θ) = θ².

Since X̄ ~ N(θ, 1/n), i.e. E(X̄) = θ and V(X̄) = 1/n,

E(X̄²) − E²(X̄) = 1/n ⟹ E(X̄²) = θ² + 1/n ⟹ E(X̄² − 1/n) = θ²,

so T = X̄² − 1/n is unbiased for θ². Here

∂f_θ(x)/∂θ = f_θ(x) · Σ(xᵢ − θ),

β₁ = {1/f_θ(x)} ∂f_θ(x)/∂θ = Σ(xᵢ − θ),

β₂ = {1/f_θ(x)} ∂²f_θ(x)/∂θ² = {Σ(xᵢ − θ)}² − n,

E(β₁) = 0, E(β₂) = 0, E(β₁²) = n,

E(β₁β₂) = E{Σ(xᵢ − θ)}³ − nE{Σ(xᵢ − θ)} = 0,

E(β₂²) = E{Σ(xᵢ − θ)}⁴ + n² − 2nE{Σ(xᵢ − θ)}² = 3n² + n² − 2n·n = 2n².

Hence

$$ V = \begin{pmatrix} n & 0\\ 0 & 2n^2 \end{pmatrix},\qquad V^{-1} = \begin{pmatrix} 1/n & 0\\ 0 & 1/(2n^2) \end{pmatrix}, $$

and, with g′ = (2θ, 2),

$$ \Delta_2 = (2\theta,\,2)\begin{pmatrix} 1/n & 0\\ 0 & 1/(2n^2) \end{pmatrix}\begin{pmatrix} 2\theta\\ 2 \end{pmatrix} = \frac{4\theta^2}{n} + \frac{2}{n^2}. $$

On the other hand, nX̄² ~ χ²₁,δ (non-central chi-square) with δ = nθ², so
V(nX̄²) = 2 + 4nθ² and V(X̄²) = 2/n² + 4θ²/n, i.e. V(T) = Δ₂: the second lower
bound is attained. Indeed, the lower bound is attained iff b′β = T − g(θ) = g′V⁻¹β:

$$ T - g(\theta) = (2\theta,\,2)\begin{pmatrix} 1/n & 0\\ 0 & 1/(2n^2) \end{pmatrix}\begin{pmatrix} \sum(x_i-\theta)\\ \{\sum(x_i-\theta)\}^2 - n \end{pmatrix} $$

$$ \qquad = 2\theta(\bar{x}-\theta) + (\bar{x}-\theta)^2 - \frac{1}{n} = (\bar{x}-\theta)(\bar{x}+\theta) - \frac{1}{n} = \bar{x}^2 - \theta^2 - \frac{1}{n}, $$

which indeed equals T − g(θ).
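A short simulation (ours) confirms that Var(X̄² − 1/n) matches Δ₂ = 4θ²/n + 2/n².

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n, reps = 1.0, 8, 500_000

x = rng.normal(theta, 1.0, size=(reps, n))
t = x.mean(axis=1) ** 2 - 1.0 / n

print("simulated V(T):", t.var())
print("Delta_2:       ", 4 * theta**2 / n + 2 / n**2)  # 0.53125
```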

Theorem Suppose f_θ(x) is of the one-parameter exponential form

f_θ(x) = exp{k₁(θ)t(x) + k₂(θ) + R(x)}.      (1.14)

Then the variance of an unbiased estimator ĝ(x) of g(θ) attains the kth lower
bound but not the (k−1)th iff ĝ(x) is a polynomial of degree k in t(x).

Proof If f_θ(x) is of form (1.14), then

∂f_θ(x)/∂θ = f_θ(x){k′₁(θ)t(x) + k′₂(θ)},

β₁ = {1/f_θ(x)} ∂f_θ(x)/∂θ = k′₁(θ)t(x) + k′₂(θ),

∂²f_θ(x)/∂θ² = f_θ(x)[{k′₁(θ)t(x) + k′₂(θ)}² + k″₁(θ)t(x) + k″₂(θ)],

β₂ = {1/f_θ(x)} ∂²f_θ(x)/∂θ² = {k′₁(θ)t(x) + k′₂(θ)}² + k″₁(θ)t(x) + k″₂(θ).

Generally,

βᵢ = {1/f_θ(x)} ∂ⁱf_θ(x)/∂θⁱ = {k′₁(θ)t(x) + k′₂(θ)}ⁱ + P_{i−1}{t(x), θ},

where P_{i−1}{t(x), θ} is a polynomial in t(x) of degree at most (i − 1). Let
P_{i−1}{t(x), θ} = Σ_{j=0}^{i−1} Q_{ij}(θ) tʲ(x). Then

$$ \beta_i = \sum_{j=0}^{i}\binom{i}{j}\{k'_1(\theta)\}^j\,t^j(x)\,\{k'_2(\theta)\}^{i-j} + \sum_{j=0}^{i-1} Q_{ij}(\theta)\,t^j(x). \tag{1.15} $$

The variance of ĝ(x) attains the kth BLB but not the (k−1)th BLB iff

ĝ(x) = a₀(θ) + Σᵢ₌₁ᵏ aᵢ(θ)βᵢ      (1.16)

with aₖ(θ) ≠ 0.

Proof (only if part) Given that ĝ(x) is of the form (1.16), we have to show that
ĝ(x) is a polynomial of degree k in t(x). From (1.15), βᵢ is a polynomial of degree
i in t(x). So, putting the value of βᵢ into (1.16), we get ĝ(x) as a polynomial of
degree k in t(x), since aₖ(θ) ≠ 0.

(if part) Given that

ĝ(x) = Σ_{j=0}^{k} Cⱼ tʲ(x)  [Cₖ ≠ 0], a polynomial of degree k in t(x),      (1.17)

we write, using (1.15) and interpreting the j = 0 term as including a₀(θ),

$$ a_0(\theta) + \sum_{i=1}^{k} a_i(\theta)\beta_i = t^k(x)\,a_k(\theta)\{k'_1(\theta)\}^k + \sum_{j=0}^{k-1} t^j(x)\Big[a_j(\theta)\{k'_1(\theta)\}^{j} + \sum_{i=j+1}^{k} a_i(\theta)\Big\{\binom{i}{j}\{k'_1(\theta)\}^{j}\{k'_2(\theta)\}^{i-j} + Q_{ij}(\theta)\Big\}\Big]. \tag{1.18} $$

Equating coefficients of (1.17) and (1.18),

Cₖ = aₖ(θ){k′₁(θ)}ᵏ ⟹ aₖ(θ) = Cₖ/{k′₁(θ)}ᵏ ≠ 0, and

$$ a_j(\theta) = \frac{C_j - \sum_{i=j+1}^{k} a_i(\theta)\Big[\binom{i}{j}\{k'_1(\theta)\}^{j}\{k'_2(\theta)\}^{i-j} + Q_{ij}(\theta)\Big]}{\{k'_1(\theta)\}^{j}} \quad\text{for } j = 0, 1, \ldots, k-1, $$

so ĝ(x) is of the form (1.16) with aₖ(θ) ≠ 0. ∎

Result 1 If there exists an unbiased estimator of g(θ), say ĝ(x), such that ĝ(x) is a
polynomial of degree k in t(x), then

Δₖ = kth BLB to the variance of an unbiased estimator of g(θ) = Var{ĝ(x)}.

Result 2 If there does not exist any polynomial in t(x) which is an unbiased
estimator of g(θ), then it is not possible to find any unbiased estimator of g(θ)
whose variance attains the BLB for any k.

1.4 Consistent Estimator

A good estimator should be such that its accuracy improves
with the sample size. Keeping this idea in mind, we define consistency as follows.

Definition An estimator Tₙ is said to be (weakly) consistent for γ(θ) if, for any two
positive numbers ε and δ, there exists an n₀ (depending upon ε, δ) such that

Pr{|Tₙ − γ(θ)| ≤ ε} > 1 − δ whenever n ≥ n₀, and for all θ ∈ Θ,

i.e. if Tₙ → γ(θ) in probability as n → ∞.

Sufficient condition: An estimator Tₙ will be consistent for γ(θ) if E(Tₙ) → γ(θ)
and V(Tₙ) → 0 as n → ∞.

Proof By Chebyshev's inequality, for any ε′ > 0,

Pr{|Tₙ − E(Tₙ)| ≤ ε′} > 1 − V(Tₙ)/ε′².

Now |Tₙ − γ(θ)| ≤ |Tₙ − E(Tₙ)| + |E(Tₙ) − γ(θ)|, so

|Tₙ − E(Tₙ)| ≤ ε′ ⟹ |Tₙ − γ(θ)| ≤ ε′ + |E(Tₙ) − γ(θ)|.

Hence,

Pr{|Tₙ − γ(θ)| ≤ ε′ + |E(Tₙ) − γ(θ)|} ≥ Pr{|Tₙ − E(Tₙ)| ≤ ε′} > 1 − V(Tₙ)/ε′².      (1.19)

Since E(Tₙ) → γ(θ) and V(Tₙ) → 0 as n → ∞, for any two positive
numbers (ε″, δ) we can find an n₀ (depending on ε″, δ) such that

|E(Tₙ) − γ(θ)| ≤ ε″  and  1 − V(Tₙ)/ε′² ≥ 1 − δ whenever n ≥ n₀.

Taking ε = ε′ + ε″,

⟹ Pr{|Tₙ − γ(θ)| ≤ ε} > 1 − δ whenever n ≥ n₀.

Since ε′, ε″ and δ are arbitrary positive numbers, the proof is complete. ∎
(It should be remembered that consistency is a large-sample criterion.)

Example 1.19 Let X₁, X₂, ..., Xₙ be a random sample from a population with mean μ
and standard deviation σ. Then X̄ₙ = (1/n) Σᵢ Xᵢ is a consistent estimator of μ.

Proof E(X̄ₙ) = μ and V(X̄ₙ) = σ²/n → 0 as n → ∞, so the sufficient condition for
consistency holds ⟹ X̄ₙ is consistent for μ. ∎

Alternatively, by Chebyshev's inequality, for any ε > 0,

Pr{|X̄ₙ − μ| ≤ ε} > 1 − σ²/(nε²).

Now, for any δ, we can find an n₀ such that

Pr{|X̄ₙ − μ| ≤ ε} > 1 − δ whenever n ≥ n₀ (here it suffices that σ²/(nε²) ≤ δ).

Example 1.20 Show that in random sampling from a normal population, the sample
mean is a consistent estimator of the population mean.

Proof For any ε (> 0),

$$ \Pr\{|\bar{X}_n-\mu|\leq\varepsilon\} = \Pr\Big\{|Z|\leq \frac{\varepsilon\sqrt{n}}{\sigma}\Big\} = \int_{-\varepsilon\sqrt{n}/\sigma}^{\varepsilon\sqrt{n}/\sigma}\frac{1}{\sqrt{2\pi}}e^{-t^2/2}\,dt,\quad\text{where } Z=\frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma}\sim N(0,1). $$

This probability tends to 1 as n → ∞. Hence we can choose an n₀, depending on any
two positive numbers ε and δ, such that

Pr{|X̄ₙ − μ| ≤ ε} > 1 − δ whenever n ≥ n₀

⟹ X̄ₙ → μ in probability as n → ∞ ⟹ X̄ₙ is consistent for μ. ∎

Example 1.21 Show that for random sampling from the Cauchy population with
density function

f(x; μ) = (1/π) · 1/{1 + (x − μ)²}, −∞ < x < ∞,

the sample mean is not a consistent estimator of μ, but the sample median is a
consistent estimator of μ.

Answer Let X₁, X₂, ..., Xₙ be a random sample from f(x; μ). It can be shown
that the sample mean X̄ₙ is distributed exactly as a single observation X. Hence,
taking Z = X̄ₙ − μ,

$$ \Pr\{|\bar{X}_n-\mu|\leq\varepsilon\} = \frac{1}{\pi}\int_{-\varepsilon}^{\varepsilon}\frac{dZ}{1+Z^2} = \frac{2}{\pi}\tan^{-1}\varepsilon. $$

Since this probability does not involve n, Pr{|X̄ₙ − μ| ≤ ε} cannot always be made
greater than 1 − δ, however large n may be.

It can be shown that for the sample median X̃ₙ,

E(X̃ₙ) = μ + O(1/n),  V(X̃ₙ) = π²/(4n) + O(1/n²).

Since E(X̃ₙ) → μ and V(X̃ₙ) → 0 as n → ∞, the sufficient condition for a consistent
estimator holds ⟹ X̃ₙ is consistent for μ.
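A simulation (our own sketch) makes the contrast vivid: as n grows, the Cauchy sample median concentrates around μ while the sample mean does not.

```python
import numpy as np

rng = np.random.default_rng(11)
mu, eps, reps = 0.0, 0.5, 20_000

for n in (10, 100, 1000):
    x = mu + rng.standard_cauchy(size=(reps, n))
    p_mean = (np.abs(x.mean(axis=1) - mu) <= eps).mean()
    p_med = (np.abs(np.median(x, axis=1) - mu) <= eps).mean()
    print(n, "P(|mean-mu|<=eps) =", round(p_mean, 3),
          " P(|median-mu|<=eps) =", round(p_med, 3))
# The mean's probability stays near (2/pi)*arctan(0.5) ~ 0.295 for every n;
# the median's probability tends to 1.
```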

Remark 1.17 Consistency is essentially a large-sample criterion.

Remark 1.18 Let Tₙ be a consistent estimator of γ(θ) and ψ{y} be a continuous
function. Then ψ{Tₙ} will be a consistent estimator of ψ{γ(θ)}.

Proof Since Tₙ is a consistent estimator of γ(θ), for any two positive numbers ε₁ and
δ we can find an n₀ such that

Pr{|Tₙ − γ(θ)| ≤ ε₁} > 1 − δ whenever n ≥ n₀.

Now, ψ is continuous, so for any ε we can choose an ε₁ such that
|Tₙ − γ(θ)| ≤ ε₁ ⟹ |ψ{Tₙ} − ψ{γ(θ)}| ≤ ε. Hence

Pr{|ψ{Tₙ} − ψ{γ(θ)}| ≤ ε} ≥ Pr{|Tₙ − γ(θ)| ≤ ε₁} > 1 − δ whenever n ≥ n₀. ∎

Remark 1.19 A consistent estimator is not unique.
For example, if Tₙ is a consistent estimator of θ, then for any fixed a and b,
T′ₙ = {(n − a)/(n − b)} Tₙ is also consistent for θ.

Remark 1.20 A consistent estimator need not be unbiased. For example, for
f(x) = 1/θ, 0 < x < θ, a consistent estimator of θ is X₍ₙ₎ = max₁≤ᵢ≤ₙ Xᵢ, but it is
not unbiased.

Remark 1.21 An unbiased estimator is not necessarily consistent, e.g. for

f(x) = ½ e^{−|x−θ|}, −∞ < x < ∞,

an unbiased estimator of θ is (X₍₁₎ + X₍ₙ₎)/2, but it is not consistent.

Remark 1.22 A consistent estimator may be meaningless.

e.g. Let T′ₙ = 0 if n ≤ 10¹⁰, and T′ₙ = Tₙ if n > 10¹⁰.

If Tₙ is consistent, then T′ₙ is also consistent, but T′ₙ is meaningless for any
practical purpose.

Remark 1.23 If T₁ and T₂ are consistent estimators of γ₁(θ) and γ₂(θ), then

(i) (T₁ + T₂) is consistent for γ₁(θ) + γ₂(θ), and
(ii) T₁T₂ is consistent for γ₁(θ)γ₂(θ).

Proof (i) Since T₁ and T₂ are consistent for γ₁(θ) and γ₂(θ), we can always choose
an n₀ such that

Pr{|T₁ − γ₁(θ)| ≤ ε₁} > 1 − δ₁ and Pr{|T₂ − γ₂(θ)| ≤ ε₂} > 1 − δ₂

whenever n ≥ n₀, where ε₁, ε₂, δ₁, δ₂ are arbitrary positive numbers. Now
|T₁ − γ₁(θ)| ≤ ε₁ and |T₂ − γ₂(θ)| ≤ ε₂ together imply
|(T₁ + T₂) − {γ₁(θ) + γ₂(θ)}| ≤ ε₁ + ε₂ = ε (say), so

Pr{|(T₁ + T₂) − (γ₁(θ) + γ₂(θ))| ≤ ε}
≥ Pr{|T₁ − γ₁(θ)| ≤ ε₁} + Pr{|T₂ − γ₂(θ)| ≤ ε₂} − 1   [∵ P(AB) ≥ P(A) + P(B) − 1]
≥ 1 − δ₁ + 1 − δ₂ − 1 = 1 − (δ₁ + δ₂) = 1 − δ for n ≥ n₀.

(ii) Again, |T₁ − γ₁(θ)| ≤ ε₁ and |T₂ − γ₂(θ)| ≤ ε₂

⟹ |T₁T₂ − γ₁(θ)γ₂(θ)| = |{T₁ − γ₁(θ)}{T₂ − γ₂(θ)} + γ₁(θ){T₂ − γ₂(θ)} + γ₂(θ){T₁ − γ₁(θ)}|
≤ |{T₁ − γ₁(θ)}{T₂ − γ₂(θ)}| + |γ₁(θ)||T₂ − γ₂(θ)| + |γ₂(θ)||T₁ − γ₁(θ)|
≤ ε₁ε₂ + |γ₁(θ)|ε₂ + |γ₂(θ)|ε₁ = ε (say).

Hence, as before,

Pr{|T₁T₂ − γ₁(θ)γ₂(θ)| ≤ ε} ≥ Pr{|T₁ − γ₁(θ)| ≤ ε₁} + Pr{|T₂ − γ₂(θ)| ≤ ε₂} − 1
≥ 1 − (δ₁ + δ₂) = 1 − δ whenever n ≥ n₀. ∎

Example 1.22 Let X₁, X₂, ..., Xₙ be a random sample from the distribution of X,
for which the moments of order 2r (μ′₂ᵣ) exist. Then show that

(a) m′ᵣ = (1/n) Σᵢ₌₁ⁿ Xᵢʳ is a consistent estimator of μ′ᵣ, and
(b) mᵣ = (1/n) Σᵢ₌₁ⁿ (Xᵢ − X̄)ʳ is a consistent estimator of μᵣ.

These can be proved using the following results:

E(m′ᵣ) = μ′ᵣ and V(m′ᵣ) = (μ′₂ᵣ − μ′ᵣ²)/n

⟹ V(m′ᵣ) → 0 as n → ∞ ⟹ m′ᵣ is consistent for μ′ᵣ; and E(mᵣ) = μᵣ + O(1/n),

V(mᵣ) = (1/n)(μ₂ᵣ − μᵣ² − 2rμᵣ₋₁μᵣ₊₁ + r²μᵣ₋₁²μ₂) + O(1/n²) → 0 as n → ∞

⟹ mᵣ is consistent for μᵣ.

(c) It can also be shown that b₁ and b₂ are consistent estimators of β₁ = μ₃²/μ₂³ and
β₂ = μ₄/μ₂².

1.5 Efficient Estimator

Let X₁, X₂, ..., Xₙ be a random sample from a regular family of distributions
{f(x; θ); θ ∈ Θ}, and let an unbiased estimator of γ(θ) be T. Then the efficiency of T
is given by

eff(T) = [{γ′(θ)}²/I(θ)] / V(T),

the ratio of the Cramér–Rao lower bound to the actual variance.

An estimator T will be called (most) efficient if eff(T) = 1. An estimator T of γ(θ)
is said to be asymptotically efficient if E(T) → γ(θ) and eff(T) → 1 as n → ∞.

Let T₁ and T₂ be two unbiased estimators of γ(θ). Then the efficiency of T₁
relative to T₂ is given by eff(T₁/T₂) = V(T₂)/V(T₁).

Remark 1.25 In many cases an MVBE does not exist even though the family
satisfies the regularity conditions. Again, in many cases the regularity conditions do
not hold. In such cases, the above definition fails. If an MVUE exists, we take it as
the efficient estimator.

Remark 1.26 The efficiency measure has the appealing property of determining the
relative sample sizes needed to attain the same precision of estimation as measured
by variance.

e.g. Suppose an estimator T₁ is 80 % efficient and V(T₁) = c/n, where c depends
upon θ, so that the most efficient estimator T₀ has V(T₀) = 0.8 c/n. Then T₀ based on
a sample of size 80 will be just as good as T₁ based on a sample of size 100.

Example 1.23 Let T₁ and T₂ be two unbiased estimators of θ with efficiencies e₁ and
e₂, respectively. If ρ denotes the correlation coefficient between T₁ and T₂, then

√(e₁e₂) − √{(1 − e₁)(1 − e₂)} ≤ ρ ≤ √(e₁e₂) + √{(1 − e₁)(1 − e₂)}.

Proof For any real 'a', T = aT₁ + (1 − a)T₂ will also be an unbiased estimator of
θ. Now

V(T) = a²V(T₁) + (1 − a)²V(T₂) + 2a(1 − a)ρ√{V(T₁)V(T₂)}.

Writing V₀ for the variance of the most efficient estimator, V(T₁) = V₀/e₁ and
V(T₂) = V₀/e₂, and V(T) ≥ V₀ for every a:

$$ \frac{a^2}{e_1} + \frac{(1-a)^2}{e_2} + \frac{2a(1-a)\rho}{\sqrt{e_1e_2}} \geq 1 $$

$$ \Rightarrow\ a^2\left(\frac{1}{e_1}+\frac{1}{e_2}-\frac{2\rho}{\sqrt{e_1e_2}}\right) - 2a\left(\frac{1}{e_2}-\frac{\rho}{\sqrt{e_1e_2}}\right) + \frac{1}{e_2} - 1 \geq 0\ \ \forall a. $$

Taking the minimizing value

$$ a = \left(\frac{1}{e_2}-\frac{\rho}{\sqrt{e_1e_2}}\right)\Big/\left(\frac{1}{e_1}+\frac{1}{e_2}-\frac{2\rho}{\sqrt{e_1e_2}}\right), $$

we get

$$ \left(\frac{1}{e_2}-1\right)\left(\frac{1}{e_1}+\frac{1}{e_2}-\frac{2\rho}{\sqrt{e_1e_2}}\right) - \left(\frac{1}{e_2}-\frac{\rho}{\sqrt{e_1e_2}}\right)^2 \geq 0 $$

$$ \Rightarrow\ \rho^2 - 2\rho\sqrt{e_1e_2} - 1 + e_1 + e_2 \leq 0 $$

$$ \Rightarrow\ (\rho-\sqrt{e_1e_2})^2 \leq (1-e_1)(1-e_2) $$

$$ \Rightarrow\ |\rho - \sqrt{e_1e_2}| \leq \sqrt{(1-e_1)(1-e_2)}. $$

Hence the result. ∎

Remark 1.27 The correlation coefficient between an unbiased estimator T and the
most efficient estimator is √e, where e is the efficiency of T. Put e₂ = e and e₁ = 1 in
the above inequality, √(e₁e₂) − √{(1−e₁)(1−e₂)} ≤ ρ ≤ √(e₁e₂) + √{(1−e₁)(1−e₂)},
and we easily get ρ = √e.

Chapter 2
Methods of Estimation

2.1 Introduction

In the previous chapter we have discussed the desirable properties of good point
estimators, viz. unbiasedness, minimum variance, consistency and efficiency.
In this chapter, we shall discuss different methods of estimating parameters which
are expected to provide estimators having some of these important properties.
Commonly used methods are:

1. Method of moments
2. Method of maximum likelihood
3. Method of minimum χ²
4. Method of least squares

In general, depending on the situation and the purpose of our study, we apply
whichever of the above-mentioned methods of point estimation is suitable.

2.2 Method of Moments

This is the oldest method of point estimation. Let (X₁, X₂, ..., Xₙ) be a random
sample from a population having p.d.f. (or p.m.f.) f(x, θ), θ = (θ₁, θ₂, ..., θₖ).
Further, let the first k population moments about zero exist as explicit functions of θ,
i.e. μ′ᵣ = μ′ᵣ(θ₁, θ₂, ..., θₖ), r = 1, 2, ..., k. In the method of moments, we equate
k sample moments with the corresponding population moments. Generally, the first
k moments are taken because the errors due to sampling increase with the order of
the moment. Thus, we get k equations μ′ᵣ(θ₁, θ₂, ..., θₖ) = m′ᵣ, r = 1, 2, ..., k, where
the sample moments are m′ᵣ = (1/n) Σᵢ₌₁ⁿ Xᵢʳ (or m′ᵣ = (1/n) Σᵢ₌₁ⁿ xᵢʳ). Solving
these equations, we get the method of moments estimators (or estimates).

If the correspondence between μ′ᵣ and θ is one-to-one, with inverse function
θᵢ = fᵢ(μ′₁, μ′₂, ..., μ′ₖ), i = 1, 2, ..., k, then the method of moments estimate becomes
θ̂ᵢ = fᵢ(m′₁, m′₂, ..., m′ₖ). Now, if the function fᵢ(·) is continuous, then by the weak
law of large numbers the method of moments estimators will be consistent.

This method gives maximum likelihood estimators when f(x, θ) = exp
(b₀ + b₁x + b₂x² + ⋯), and so in that case it gives efficient estimators. But the
estimators obtained by this method are not in general efficient. It is, however, one of
the simplest methods, so these estimates can be used as a first approximation to get a
better estimate. The method is not applicable when the theoretical moments do not
exist, as in the case of the Cauchy distribution.

Example 2.1 Let X₁, X₂, ..., Xₙ be a random sample from the p.d.f.

f(x; α, β) = {1/B(α, β)} x^{α−1} (1 − x)^{β−1}, 0 < x < 1; α, β > 0.

Find the estimators of α and β by the method of moments.

Solution Here μ′₁ = α/(α + β) and μ′₂ = α(α + 1)/{(α + β)(α + β + 1)}, so the
moment equations are

α/(α + β) = x̄,  α(α + 1)/{(α + β)(α + β + 1)} = (1/n) Σᵢ₌₁ⁿ xᵢ².

By solving, we get

β̂ = (x̄ − 1)(Σxᵢ² − nx̄)/Σ(xᵢ − x̄)²  and  α̂ = x̄β̂/(1 − x̄).
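A direct computational sketch (ours) of these formulas:

```python
import numpy as np

def beta_mom(x):
    """Method-of-moments estimates for Beta(alpha, beta) from sample x."""
    xbar = x.mean()
    beta_hat = (xbar - 1) * (np.sum(x**2) - len(x) * xbar) / np.sum((x - xbar) ** 2)
    alpha_hat = xbar * beta_hat / (1 - xbar)
    return alpha_hat, beta_hat

rng = np.random.default_rng(9)
x = rng.beta(2.0, 5.0, size=10_000)
print(beta_mom(x))   # close to (2.0, 5.0)
```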

2.3 Method of Maximum Likelihood

This method of estimation is due to R.A. Fisher. It is the most important general
method of estimation. Let X = (X₁, X₂, ..., Xₙ) denote a random sample with joint
p.d.f. or p.m.f. f(x; θ), θ ∈ Θ (θ may be a vector). The function f(x; θ), considered
as a function of θ, is called the likelihood function. In this case, it is denoted by L(θ).
The principle of maximum likelihood consists of choosing an estimate, say θ̂, within
the admissible range of θ, that maximizes the likelihood. θ̂ is called the maximum
likelihood estimate (MLE) of θ. In other words, θ̂ will be an MLE of θ if

L(θ̂) ≥ L(θ) ∀θ ∈ Θ.

Since log L(θ) is an increasing (monotone) function of L(θ), θ̂ equivalently satisfies

log L(θ̂) ≥ log L(θ) ∀θ ∈ Θ.

Again, if log L(θ) is differentiable within Θ and θ̂ is an interior point, then θ̂ will
be a solution of

∂ log L(θ)/∂θᵢ = 0, i = 1, 2, ..., k;  θ_{k×1} = (θ₁, θ₂, ..., θₖ)′.

Problem 2.1 Let (X₁, X₂, ..., Xₙ) be a random sample from b(m, p), m known.
Show that p̂ = {1/(mn)} Σᵢ₌₁ⁿ Xᵢ is an MLE of p.

Problem 2.2 Let (X₁, X₂, ..., Xₙ) be a random sample from P(λ). Show that
λ̂ = (1/n) Σᵢ₌₁ⁿ Xᵢ is an MLE of λ.

Problem 2.3 Let (X₁, X₂, ..., Xₙ) be a random sample from N(μ, σ²). Show that
(X̄, s²) is an MLE of (μ, σ²), where X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ and
s² = (1/n) Σᵢ₌₁ⁿ (Xᵢ − X̄)².

Example 2.2 Let (X₁, X₂, ..., Xₙ) be a random sample from the
p.d.f. f(x; θ) = ½ e^{−|x−θ|}, −∞ < x < ∞.
Show that the sample median X̃ is an MLE of θ.

Answer

L(θ) = Const. e^{−Σᵢ₌₁ⁿ |xᵢ−θ|}.

Maximization of L(θ) is equivalent to minimization of Σᵢ₌₁ⁿ |xᵢ − θ|. Now,
Σᵢ₌₁ⁿ |xᵢ − θ| will be least when θ = X̃, the sample median, as the mean deviation
about the median is least. Hence X̃ is an MLE of θ.
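A tiny numerical check (ours) that Σ|xᵢ − θ| is minimized at the sample median, over a grid of θ values:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.laplace(loc=1.0, scale=1.0, size=201)

grid = np.linspace(-2, 4, 2001)
loss = np.abs(x[:, None] - grid[None, :]).sum(axis=0)  # sum_i |x_i - theta|

print("grid minimizer:", grid[loss.argmin()])
print("sample median :", np.median(x))   # essentially the same point
```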

Properties of MLE

(a) If a sufficient statistic exists, then the MLE will be a function of the sufficient
statistic.

Proof Let T be a sufficient statistic for the family {f(x; θ); θ ∈ Θ}.
By the factorization theorem, we have Πᵢ₌₁ⁿ f(xᵢ; θ) = g{T(x); θ} h(x).
To find the MLE, we maximize g{T(x); θ} with respect to θ, since h(x) does not
involve θ. The maximizing value of θ therefore depends on the sample only through
T(x), i.e. the MLE is a function of the sufficient statistic. ∎

Remark 2.1 Property (a) does not imply that an MLE is itself a sufficient statistic.

Example 2.3 Let X₁, X₂, ..., Xₙ be a random sample from a population having p.d.f.

f(x; θ) = 1 if θ ≤ x ≤ θ + 1, and 0 otherwise.

Then

L(θ) = 1 if θ ≤ Min Xᵢ ≤ Max Xᵢ ≤ θ + 1, and 0 otherwise.

Any value of θ satisfying Max Xᵢ − 1 ≤ θ ≤ Min Xᵢ will be an MLE of θ. In
particular, Min Xᵢ is an MLE of θ, but it is not sufficient for θ. In fact, here
(Min Xᵢ, Max Xᵢ) is a sufficient statistic.

(b) If T is the MVBE, then the likelihood equation will have the solution T.

Proof Since T is an MVBE, ∂ log f(x; θ)/∂θ = (T − θ)k(θ).
Now, ∂ log f(x; θ)/∂θ = 0 ⟹ θ = T [∵ k(θ) ≠ 0]. ∎

(c) If the MLE T of θ is unique and d = ψ(θ) is a one-to-one function of θ, then
d̂ = ψ(T) will be an MLE of d.

Proof Since T is an MLE of θ, L{T(X)} ≥ L(θ) ∀θ.
Since the correspondence between θ and d is one-to-one, the inverse function must
exist; suppose the inverse function is θ = ψ⁻¹(d).
Thus, L(θ) = L{ψ⁻¹(d)} = L₁(d), say. Now,

L₁(d̂) = L{ψ⁻¹(d̂)} = L[ψ⁻¹{ψ(T(X))}] = L{T(X)} ≥ L(θ) = L₁(d) ∀d.

Therefore, d̂ is an MLE of d. ∎

(d) Suppose the p.d.f. (or p.m.f.) f(x, θ) satisfies the following regularity
conditions:

(i) ∂f(x; θ)/∂θ, ∂²f(x; θ)/∂θ², ∂³f(x; θ)/∂θ³ exist ∀θ ∈ Θ;

(ii) |∂f(x; θ)/∂θ| < A₁(x), |∂²f(x; θ)/∂θ²| < A₂(x) and |∂³f(x; θ)/∂θ³| < B(x),
where A₁(x) and A₂(x) are integrable functions of x and

∫₋∞^∞ B(x) f(x; θ) dx < M, a finite quantity;

(iii) ∫₋∞^∞ {∂ log f(x; θ)/∂θ}² f(x; θ) dx is a finite and positive quantity.

If θ̂ₙ is an MLE of θ on the basis of a sample of size n from a population having
p.d.f. (or p.m.f.) f(x, θ) which satisfies the above regularity conditions, then
√n(θ̂ₙ − θ) is asymptotically normal with mean 0 and variance

[∫₋∞^∞ {∂ log f(x; θ)/∂θ}² f(x; θ) dx]⁻¹.

Also, θ̂ₙ is asymptotically efficient and consistent.

(e) An MLE may not be unique.

Example 2.4 Let f(x; θ) = 1 if θ ≤ x ≤ θ + 1, and 0 otherwise. Then

L(θ) = 1 if θ ≤ min xᵢ ≤ max xᵢ ≤ θ + 1, and 0 otherwise,

i.e. L(θ) = 1 if max xᵢ − 1 ≤ θ ≤ min xᵢ, and 0 otherwise.

For any α, 0 ≤ α ≤ 1, consider T_α = α(max xᵢ − 1) + (1 − α) min xᵢ; T_α lies in the
interval over which L(θ) is maximized. So, for each fixed α, T_α is an MLE. Thus we
observe that an infinite number of MLEs exist in this case.

(f) An MLE may not be unbiased.

Example 2.5 Let

f(x; θ) = 1/θ if 0 ≤ x ≤ θ, and 0 otherwise.

Then

L(θ) = 1/θⁿ if max xᵢ ≤ θ, and 0 otherwise.

[Figure: L(θ) plotted against θ; it is zero for θ < Max Xᵢ and equals θ⁻ⁿ, a
decreasing function, for θ ≥ Max Xᵢ.]

From the figure, it is clear that the likelihood L(θ) is largest when
θ = Max Xᵢ. Therefore Max Xᵢ is an MLE of θ. Note that
E(Max Xᵢ) = {n/(n + 1)} θ ≠ θ. Therefore, here the MLE is a biased estimator.

(g) An MLE may be meaningless.

Example 2.6 Let

f(x; π) = πˣ(1 − π)^{1−x}, x = 0, 1; π ∈ [1/4, 3/4].

Then

L(π) = π if x = 1, and L(π) = 1 − π if x = 0,

i.e. L(π) is maximized at π̂ = 3/4 if x = 1 and at π̂ = 1/4 if x = 0. Hence the MLE
is T = (2X + 1)/4.

Now, E(T) = (2π + 1)/4 ≠ π. Thus T is a biased estimator of π.

MSE of T = E(T − π)²
= E{(2X + 1)/4 − π}² = (1/16) E{2(X − π) + 1 − 2π}²
= (1/16) E{4(X − π)² + (1 − 2π)² + 4(X − π)(1 − 2π)}
= (1/16) {4π(1 − π) + (1 − 2π)²} = 1/16.

Consider the trivial estimator d(x) ≡ 1/2. Then

MSE of d(x) = (1/2 − π)² ≤ 1/16 = MSE of T ∀π ∈ [1/4, 3/4].

Thus, in the sense of mean square error, the MLE is meaningless here.

(h) An MLE may not be consistent.

Example 2.7 Let

f(x; θ) = θˣ(1 − θ)^{1−x} if θ is rational,
        = (1 − θ)ˣ θ^{1−x} if θ is irrational;  0 < θ < 1, x = 0, 1.

(i) The regularity conditions in (d) are not necessary conditions.

Example 2.8 Let

f(x; θ) = ½ e^{−|x−θ|}, −∞ < x < ∞, −∞ < θ < ∞.

Here the regularity conditions do not hold. However, the MLE (= sample median) is
asymptotically normal and efficient.

Example 2.9 Let X₁, X₂, ..., Xₙ be a random sample from

f(x; α, β) = β e^{−β(x−α)}, α ≤ x < ∞ and β > 0.

Find the MLEs of α, β.

Solution

L(α, β) = βⁿ e^{−β Σᵢ₌₁ⁿ (xᵢ−α)}

logₑ L(α, β) = n logₑ β − β Σᵢ₌₁ⁿ (xᵢ − α)

∂ log L/∂β = n/β − Σ(xᵢ − α)  and  ∂ log L/∂α = nβ.

Now, ∂ log L/∂α = 0 gives us β = 0, which is not admissible. Thus the method of
differentiation fails here.

From the expression for L(α, β) it is clear that, for fixed β (>0), L(α, β)
becomes maximum when α is largest. The largest possible value of α is
x₍₁₎ = Min xᵢ. Now we maximize L(x₍₁₎, β) with respect to β; this can be done by
the method of differentiation:

∂ log L(x₍₁₎, β)/∂β = 0 ⟹ n/β − Σ(xᵢ − min xᵢ) = 0 ⟹ β = n/Σ(xᵢ − min xᵢ).

So the MLE of (α, β) is

(min xᵢ, n/Σ(xᵢ − min xᵢ)).

Example 2.10 Let X₁, X₂, ..., Xₙ be a random sample from

f(x; α, β) = 1/(β − α) if α ≤ x ≤ β, and 0 otherwise.

(a) Show that the MLE of (α, β) is (Min Xᵢ, Max Xᵢ).
(b) Also find the estimators of α and β by the method of moments.

Solution

(a) L(α, β) = 1/(β − α)ⁿ if α ≤ Min xᵢ ≤ Max xᵢ ≤ β.      (2.1)

It is evident from (2.1) that the likelihood will be made as large as possible
when (β − α) is made as small as possible. Clearly, α cannot be larger than Min xᵢ
and β cannot be smaller than Max xᵢ; hence the smallest possible value of (β − α) is
(Max xᵢ − Min xᵢ). Then the MLEs of α and β are α̂ = Min xᵢ and β̂ = Max xᵢ,
respectively.

(b) We know E(x) = μ′₁ = (α + β)/2 and V(x) = μ₂ = (β − α)²/12.
Hence the moment equations are

(α + β)/2 = x̄  and  (β − α)²/12 = (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)².

By solving, we get

α̂ = x̄ − √{3Σ(xᵢ − x̄)²/n}  and  β̂ = x̄ + √{3Σ(xᵢ − x̄)²/n}.

Sometimes the likelihood equation cannot be solved explicitly. A general method
in such cases is to assume a trial solution and correct it by an
extra term to get a more accurate solution. This process can be repeated until we get
the solution to a sufficient degree of accuracy.

Let L denote the likelihood and θ̂ the MLE, so that [∂ log L/∂θ]_{θ=θ̂} = 0. Suppose
θ₀ is a trial solution of ∂ log L/∂θ = 0. Then, expanding about θ₀,

0 = [∂ log L/∂θ]_{θ=θ̂} = [∂ log L/∂θ]_{θ=θ₀} + (θ̂ − θ₀)[∂² log L/∂θ²]_{θ=θ₀}
    + terms involving (θ̂ − θ₀) with powers higher than unity

⟹ 0 ≃ [∂ log L/∂θ]_{θ=θ₀} + (θ̂ − θ₀)[∂² log L/∂θ²]_{θ=θ₀},

neglecting the terms involving (θ̂ − θ₀) with powers higher than unity. Replacing
−∂² log L/∂θ² by its expectation I(θ) = E(−∂² log L/∂θ²),

0 ≃ [∂ log L/∂θ]_{θ=θ₀} − (θ̂ − θ₀) I(θ₀).

Thus the first approximate value of θ is

θ⁽¹⁾ = θ₀ + [∂ log L/∂θ]_{θ=θ₀} / I(θ₀).

Example 2.11 Let X₁, X₂, ..., Xₙ be a random sample from the Cauchy distribution

f(x; θ) = (1/π) · 1/{1 + (x − θ)²}.

Here ∂ log f(x; θ)/∂θ = 2(x − θ)/{1 + (x − θ)²}, and so the likelihood equation is

Σᵢ₌₁ⁿ (xᵢ − θ)/{1 + (xᵢ − θ)²} = 0;

clearly it is difficult to solve for θ. So we consider the successive approximation
method. In this case, I(θ) = n/2. Here the first approximation is

θ⁽¹⁾ = θ₀ + (4/n) Σᵢ₌₁ⁿ (xᵢ − θ₀)/{1 + (xᵢ − θ₀)²}.

Usually, we take θ₀ = sample median.
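This scoring iteration is straightforward to implement; the sketch below (our own) starts from the sample median and iterates the update a few times.

```python
import numpy as np

def cauchy_mle_scoring(x, n_iter=10):
    """Fisher-scoring iterations for the Cauchy location MLE:
    theta_new = theta + (4/n) * sum((x-theta)/(1+(x-theta)^2))."""
    theta = np.median(x)                  # trial solution theta_0
    n = len(x)
    for _ in range(n_iter):
        d = x - theta
        score = np.sum(d / (1 + d**2))    # half of d(logL)/d(theta)
        theta = theta + (4.0 / n) * score
    return theta

rng = np.random.default_rng(13)
x = 2.0 + rng.standard_cauchy(size=500)
print("median:", np.median(x), " scoring MLE:", cauchy_mle_scoring(x))
```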

2.4 Method of Minimum χ²

This method may be used when the population is grouped into a number of mu-
tually exclusive and exhaustive classes and the observations are given in the form of
frequencies.

Suppose there are k classes and pᵢ(θ) is the probability of an individual
belonging to the ith class. Let fᵢ denote the sample frequency. Clearly,
Σᵢ₌₁ᵏ pᵢ(θ) = 1 and Σᵢ₌₁ᵏ fᵢ = n.

The discrepancy between the observed frequencies and the corresponding expected
frequencies is measured by the Pearsonian χ², which is given by

χ² = Σᵢ₌₁ᵏ {fᵢ − npᵢ(θ)}²/{npᵢ(θ)} = Σᵢ₌₁ᵏ fᵢ²/{npᵢ(θ)} − n.

The principle of the method of minimum χ² consists of choosing as the estimate of
θ the value, say θ̂, that minimizes χ²; we first consider the minimum χ² equations

∂χ²/∂θᵢ = 0, θᵢ = ith component of θ.

It can be shown that for large n the minimum χ² equations and the likelihood
equations are identical and provide identical estimates.

The method of minimum χ² is found to be more troublesome to apply in many
cases, and provides no improvement on the maximum likelihood method. It can be
used when the maximum likelihood equations are difficult to solve, and in particular
situations it may be simpler. To avoid the difficulty in the minimum χ² method, we
consider another measure of discrepancy, given by

χ′² = Σᵢ₌₁ᵏ {fᵢ − npᵢ(θ)}²/fᵢ;

χ′² is called the modified Pearsonian χ². Now we minimize χ′², instead of χ², with
respect to θ.

It can be shown that for large n the estimates obtained by minimizing χ′² are also
approximately equal to the MLEs. A difficulty arises if some of the classes are
empty. In that case, we minimize

χ″² = Σ_{i: fᵢ≠0} {fᵢ − npᵢ(θ)}²/fᵢ + 2M,

where M denotes the total expected frequency of the classes with fᵢ = 0.

Example 2.12 Let (x₁, x₂, ..., xₙ) be a given sample of size n. It is to be tested
whether the sample comes from some Poisson distribution with unknown mean μ.
How do you estimate μ by the method of modified minimum chi-square?

Solution Group the observations into k classes:

n_L observations with x ≤ r;
nᵢ observations with x = i, i = r + 1, ..., r + k − 2;
n_U observations with x ≥ r + k − 1,

the extreme values of x, which are fewer, being pooled together, so that
n_L + Σᵢ₌ᵣ₊₁^{r+k−2} nᵢ + n_U = n.

Let pᵢ(μ) = P(x = i) = e^{−μ}μⁱ/i!, p_L(μ) = P(x ≤ r) = Σᵢ₌₀ʳ pᵢ(μ) and
p_U(μ) = P(x ≥ r + k − 1) = Σᵢ₌ᵣ₊ₖ₋₁^∞ pᵢ(μ). Noting that
∂pᵢ(μ)/∂μ = (i/μ − 1)pᵢ(μ), and using the estimating equations

Σᵢ₌₁ᵏ {nᵢ/pᵢ(θ)} ∂pᵢ(θ)/∂θⱼ = 0, j = 1, 2, ..., p,

we have

$$ \frac{n_L}{p_L(\mu)}\sum_{i=0}^{r}\Big(\frac{i}{\mu}-1\Big)p_i(\mu) + \sum_{i=r+1}^{r+k-2} n_i\Big(\frac{i}{\mu}-1\Big) + \frac{n_U}{p_U(\mu)}\sum_{i=r+k-1}^{\infty}\Big(\frac{i}{\mu}-1\Big)p_i(\mu) = 0. $$

Since there is only one parameter, i.e. p = 1, we get only the above equation. By
solving, we get

$$ \hat{\mu} = \frac{1}{n}\left[n_L\,\frac{\sum_{i=0}^{r} i\,p_i(\mu)}{p_L(\mu)} + \sum_{i=r+1}^{r+k-2} i\,n_i + n_U\,\frac{\sum_{i=r+k-1}^{\infty} i\,p_i(\mu)}{p_U(\mu)}\right]. $$

Hence μ̂ is approximately the sample mean x̄.

2.5 Method of Least Squares

In the method of least squares, we consider the estimation of parameters using some
specified form of the expectation and second moment of the observations. For
fitting a curve of the form y = f(x; β₀, β₁, ..., βₚ) to the data (xᵢ, yᵢ), i = 1, 2, ..., n,
we may use the method of least squares. This method consists of minimizing the
sum of squares

S = Σᵢ₌₁ⁿ eᵢ², where eᵢ = yᵢ − f(xᵢ; β₀, β₁, ..., βₚ), i = 1, 2, ..., n,

with respect to β₀, β₁, ..., βₚ. Sometimes we minimize Σwᵢeᵢ² instead of Σeᵢ²; in
that case it is called a weighted least squares method.

To minimize S, we consider the (p + 1) first-order partial derivatives and get (p + 1)
equations in (p + 1) unknowns. Solving these equations, we get the least squares
estimates of the βᵢ's.

In general, the least squares estimates do not have any optimum properties, even
asymptotically. However, in the case of linear estimation this method provides good
estimators. When f(xᵢ; β₀, β₁, ..., βₚ) is a linear function of the parameters and the
x-values are known, the least squares estimators will be BLUE. Again, if we assume
that the eᵢ's are independently and identically normally distributed, then a linear
estimator of the form a′β̂ will be MVUE within the entire class of unbiased
estimators. In general, we consider n uncorrelated observations y₁, y₂, ..., yₙ such
that

E(yᵢ) = β₁x₁ᵢ + β₂x₂ᵢ + ⋯ + βₖxₖᵢ, V(yᵢ) = σ²,

where β₁, β₂, ..., βₖ and σ² are unknown parameters. If Y and β stand for the
column vectors of the variables yᵢ and parameters βⱼ, and if X = ((xⱼᵢ)) is an (n × k)
matrix of known coefficients xⱼᵢ, then the above equation can be written as

E(Y) = Xβ,  i.e. Y = Xβ + e, with E(e) = 0 and

V(e) = E(ee′) = σ²I,

where I is the (n × n) identity matrix. The method of least squares requires the β's
to be so calculated that φ = e′e = (Y − Xβ)′(Y − Xβ) is minimum. This is satisfied
when

∂φ/∂β = 0, or −2X′(Y − Xβ) = 0.

The least squares estimator of β is thus given by the vector β̂ = (X′X)⁻¹X′Y.
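The normal equations translate directly into code; below is a minimal numpy sketch (ours) of β̂ = (X′X)⁻¹X′Y, checked against a known coefficient vector.

```python
import numpy as np

rng = np.random.default_rng(21)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta_true = np.array([1.0, 2.5])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Normal equations: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # close to [1.0, 2.5]
```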

Example 2.13 Let yᵢ = β₁x₁ᵢ + β₂x₂ᵢ + eᵢ, i = 1, 2, ..., n, or E(yᵢ) = β₁x₁ᵢ + β₂x₂ᵢ,
with x₁ᵢ = 1 for all i. Find the least squares estimates of β₁ and β₂. Prove that the
method of maximum likelihood and the method of least squares are identical for the
case of the normal distribution.

Solution In matrix notation, we have E(Y) = Xβ, where

$$ X = \begin{pmatrix} 1 & x_{21}\\ 1 & x_{22}\\ \vdots & \vdots\\ 1 & x_{2n} \end{pmatrix},\qquad \beta = \begin{pmatrix}\beta_1\\\beta_2\end{pmatrix},\qquad Y = \begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{pmatrix}. $$

Now, β̂ = (X′X)⁻¹X′Y. Here

$$ X'X = \begin{pmatrix} n & \sum x_{2i}\\ \sum x_{2i} & \sum x_{2i}^2 \end{pmatrix},\qquad X'Y = \begin{pmatrix} \sum y_i\\ \sum x_{2i}y_i \end{pmatrix} $$

$$ \Rightarrow\ \hat{\beta} = \frac{1}{n\sum x_{2i}^2 - (\sum x_{2i})^2} \begin{pmatrix} \sum x_{2i}^2 & -\sum x_{2i}\\ -\sum x_{2i} & n \end{pmatrix} \begin{pmatrix} \sum y_i\\ \sum x_{2i}y_i \end{pmatrix} = \frac{1}{n\sum x_{2i}^2 - (\sum x_{2i})^2} \begin{pmatrix} \sum x_{2i}^2\sum y_i - \sum x_{2i}\sum x_{2i}y_i\\ -\sum x_{2i}\sum y_i + n\sum x_{2i}y_i \end{pmatrix}. $$

Hence,

$$ \hat{\beta}_2 = \frac{n\sum x_{2i}y_i - \sum x_{2i}\sum y_i}{n\sum x_{2i}^2 - (\sum x_{2i})^2} = \frac{\sum x_{2i}y_i - n\bar{x}_2\bar{y}}{\sum x_{2i}^2 - n\bar{x}_2^2} = \frac{\sum(x_{2i}-\bar{x}_2)(y_i-\bar{y})}{\sum(x_{2i}-\bar{x}_2)^2} $$

and

$$ \hat{\beta}_1 = \frac{\sum x_{2i}^2\sum y_i - \sum x_{2i}\sum x_{2i}y_i}{n\sum x_{2i}^2 - (\sum x_{2i})^2} = \frac{\bar{y}\sum x_{2i}^2 - \bar{x}_2\sum x_{2i}y_i}{\sum x_{2i}^2 - n\bar{x}_2^2} = \bar{y} + \frac{\bar{y}\,n\bar{x}_2^2 - \bar{x}_2\sum x_{2i}y_i}{\sum x_{2i}^2 - n\bar{x}_2^2} = \bar{y} - \bar{x}_2\hat{\beta}_2. $$

Next, let E(yᵢ) = β₁ + β₂xᵢ, the yᵢ being independent N(β₁ + β₂xᵢ, σ²) variables. The
estimators of β₁ and β₂ are obtained by the method of least squares on minimizing

φ = Σᵢ₌₁ⁿ (yᵢ − β₁ − β₂xᵢ)².

The likelihood is

$$ L = \Big(\frac{1}{\sqrt{2\pi}\,\sigma}\Big)^{n} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-\beta_1-\beta_2 x_i)^2}. $$

L is maximum when Σᵢ₌₁ⁿ (yᵢ − β₁ − β₂xᵢ)² is minimum. Thus, by the method of
maximum likelihood, we choose β₁ and β₂ such that φ = Σᵢ₌₁ⁿ (yᵢ − β₁ − β₂xᵢ)² is
minimum. Hence the methods of least squares and maximum likelihood are
identical here.

Example 2.14 Let X₁, X₂, ..., Xₙ be a random sample from the p.d.f.

f(x; θ, r) = {1/(θʳ Γ(r))} e^{−x/θ} x^{r−1}, x > 0; θ > 0, r > 0.

Find estimators of θ and r by

(i) the method of moments;
(ii) the method of maximum likelihood.

Answer

(i) Here μ′₁ = rθ and μ′₂ = r(r + 1)θ², while m′₁ = x̄ and m′₂ = (1/n) Σᵢ₌₁ⁿ xᵢ².
Hence the moment equations are

rθ = x̄,  r(r + 1)θ² = (1/n) Σᵢ₌₁ⁿ xᵢ².

By solving, we get

r̂ = nx̄²/Σᵢ₌₁ⁿ (xᵢ − x̄)²  and  θ̂ = Σᵢ₌₁ⁿ (xᵢ − x̄)²/(nx̄).

(ii) The likelihood is

$$ L = \frac{1}{\theta^{nr}\{\Gamma(r)\}^n}\, e^{-\frac{1}{\theta}\sum_{i=1}^n x_i}\prod_{i=1}^n x_i^{\,r-1} $$

log L = −nr log θ − n log Γ(r) − (1/θ) Σᵢ xᵢ + (r − 1) Σᵢ₌₁ⁿ log xᵢ.

Now, ∂ log L/∂θ = −nr/θ + nx̄/θ² = 0 ⟹ θ̂ = x̄/r.

Substituting θ̂ = x̄/r,

∂ log L/∂r = −n log θ̂ − n Γ′(r)/Γ(r) + Σ log xᵢ
           = n log r − n Γ′(r)/Γ(r) − n log x̄ + Σ log xᵢ.

We solve ∂ log L/∂r = 0 numerically to get the estimate of r.

Thus, for this example, estimators of θ and r are more easily obtained by the method
of moments than by the method of maximum likelihood.
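A computational sketch (ours) of both routes; the ML equation in r is solved with a SciPy root-finder, using the digamma function for Γ′(r)/Γ(r).

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(17)
x = rng.gamma(shape=3.0, scale=2.0, size=5000)   # true r = 3, theta = 2
n, xbar = len(x), x.mean()

# Method of moments
r_mom = n * xbar**2 / np.sum((x - xbar) ** 2)
theta_mom = np.sum((x - xbar) ** 2) / (n * xbar)

# MLE: solve log r - digamma(r) + mean(log x) - log(xbar) = 0 in r
c = np.log(x).mean() - np.log(xbar)
r_mle = brentq(lambda r: np.log(r) - digamma(r) + c, 1e-3, 100.0)
theta_mle = xbar / r_mle

print("MOM:", r_mom, theta_mom)
print("MLE:", r_mle, theta_mle)
```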

Example 2.15 A sample of size one is drawn from the p.d.f.

f(x; β) = (2/β²)(β − x), 0 < x < β.

Find β̂, the MLE of β, and β*, the estimator of β based on the method of moments.
Show that β̂ is biased but β* is unbiased. Show that the efficiency of β̂ w.r.t. β* is 2/3.

Solution

L = (2/β²)(β − x)

∂ log L/∂β = −2/β + 1/(β − x) = 0 ⟹ β = 2x.

Thus the MLE of β is β̂ = 2x.

Now, E(x) = (2/β²) ∫₀^β (βx − x²) dx = β/3.

Hence β/3 = x ⟹ β = 3x, so the method of moments estimator is β* = 3x.

Now,

E(β̂) = 2·(β/3) = 2β/3 ≠ β,  E(β*) = 3·(β/3) = β.

Hence β̂ is biased but β* is unbiased. Again,

E(x²) = (2/β²) ∫₀^β (βx² − x³) dx = β²/6

⟹ V(x) = β²/6 − β²/9 = β²/18,

V(β*) = 9V(x) = β²/2,

V(β̂) = 4V(x) = 2β²/9,

M(β̂) = V(β̂) + {E(β̂) − β}² = 2β²/9 + (2β/3 − β)² = 2β²/9 + β²/9 = β²/3.

Thus M(β̂)/V(β*) = (β²/3)/(β²/2) = 2/3, i.e. the efficiency of β̂ with respect to β*
is 2/3.

Chapter 3
Theory of Testing of Hypothesis

3.1 Introduction

In statistical inference, on the basis of a sample or samples we try to draw inferences
regarding the population. Suppose the form of the distribution of the population is
F_θ, which is assumed to be known, but the parameter θ is unknown. Inferences are
drawn about unknown parameters of the distribution. In many practical problems,
we are interested in testing the validity of an assertion about the unknown parameter
θ. Some hypothesis is made regarding the parameters, and it is tested whether it is
acceptable in the light of sample observations. As examples: suppose we are
interested in introducing a high-yielding rice variety. We have at our disposal a
standard variety having average yield x quintal per acre. We want to know whether
the average yield for the new variety is higher than x. Similarly, we may be
interested in checking the claim of a tube light manufacturer about the average life
hours achieved by a particular brand. A problem of this type is usually referred to as
a problem of testing of hypothesis. Testing of hypothesis is closely linked with
estimation theory, in which we seek the best estimator of an unknown parameter. In
this chapter, we shall discuss the problem of testing of hypothesis.

hypothesis will be discussed.

Let q ¼ fpð xÞg be a class of all p.m.f or p.d.f. In testing problem pð xÞ is

unknown, but ρ is known. Our objective is to provide more information about pð xÞ

on the basis of X ¼ x. That is, to know whether pð xÞ 2 q q:

P.K. Sahu et al., Estimation and Inferential Statistics,

DOI 10.1007/978-81-322-2514-0_3

64 3 Theory of Testing of Hypothesis

viz., Null hypothesis (H) and alternative hypothesis (K).

Null hypothesis (H): A hypothesis that is tentatively set up is called null

hypothesis. Alternative to H is called Alternative hypothesis.

H and K are such that H \ K ¼ u and H [ Kq: We also write H as

H : pð xÞ 2 qH q

q \ qK ¼ u and qH [ qK q

and K as K : pð xÞ 2 qK q H

Write q ¼ pð xÞ ¼ p x=h ; h 2 H . Then ‘h’ is called the labelling parameter of the

distribution and ‘H’ is called the parameter space.

Example

Pm 3.1 X binðm; pÞ , X1 ; X2 ; . . .Xm are i.i.d Bernoulli (p) ) X ¼

X

i¼1 i bin ðm; pÞ, m is known, h ¼ p, H ¼ ½0; 1; outcome space

x ¼ f0; 1; 2; . . .mg f0; 1gX f0; 1gX. . .X f0; 1g

P m P

m

xi m xi

m x mx

pðx=hÞ ¼ p ð 1 pÞ Or p x =h ¼ p i¼1 ð1 pÞ i¼1

x

m x m x

q¼ p ð1 pÞmx ; p 2 ½0; 1 is known but p ð1 pÞmx is

x x

unknown if p is unknown.

Example 3.2 Let X1 ; X2 ; . . .Xn1 are i.i.d Pðk1 Þ and Y1 ; Y2 ; . . .Yn2 are i.i.d Pðk2 Þ.

Also they are independent and n1 and n2 are known.

Now,

P P

Y n1 Yn2 x y

k1 i k2 j ð n k þ n k Þ

pðx=hÞ ¼ pðxi =k1 Þ p y j k2 ¼ Q Q e 1 1 2 2

i¼1 j¼1

xi ! yj !

q ¼ p x=h ; 0\k1 ; k2 \1 is known but p x=h is unknown if h is unknown.

Example 3.3 X1 ; X2 ; . . .Xn are i.i.d N ðl;

r2Þ. X ¼ ðX1 ; X2 ; . . .Xn Þ; n 1; h ¼ ðl; r2 Þ

or flg (if r2 is known) or r2 (if l is known), H ¼ ðl; r2 Þ :

0

1\l\1; r [ 0g or fl : 1\l\1g R or r2 : r2 [ 0 :

2

3.2 Deﬁnitions and Some Examples 65

Pn

1 ðxi lÞ2

pðx=hÞ ¼ ð2pÞ n=2 n 2r2 1

r e or

P

n

12 ðxi lÞ2

¼ ð2pÞn=2 e 1 ; r2 ¼ 1 or

P

n

1 x2

¼ ð2pÞ n=2 n 2r2 1 i

r e ; l ¼ 0: or

q ¼ p x=h ; 1\l\1; r2 [ 0 or

p x=h ; 1\l\1 or

2

p x=h ; r [ 0 all are known but unknown is p x=h for ﬁxed θ (Unknown).

Parametric set up

pð xÞ ¼ pðx=hÞ; h 2 H: Then we can ﬁnd HH ð HÞ and HK ð HÞ such that

So,

H : p 2 pH , H : h 2 H H

K : p 2 pK , K : h 2 H K :

i. Simple if H contains just one parametric point, i.e. H speciﬁes the distribution

fpðx=hÞg completely.

ii. Composite if H contains more than one parametric point, i.e. H cannot

specify the distribution fpðx=hÞg completely.

Example 3.4 X1 ; X2 ; . . .Xn are i.i.d N ðl; r2 Þ: Consider the following hypothesis

ðH Þ:

1. H : l ¼ 0; r2 ¼ 1 : H ) H N ð0; 1Þ

2. H : l

0; r2 ¼ 1

3. H : l ¼ 0; r2 [ 0

4. H : r2 ¼ r20

5. H : lþr ¼ 0

The ﬁrst one is a simple hypothesis and the remaining are composite hypotheses.

Deﬁnition 3 Let x be the observed value of the random variable X with probability

model pðx=hÞ; h unknown. Wherever X ¼ x is observed, pðx=hÞ is a function of h

only and is called the likelihood of getting such a sample. It is simply called the

likelihood function and often denoted by LðhÞ or Lðh=xÞ:

66 3 Theory of Testing of Hypothesis

Deﬁnition 4 Test It is a rule for the acceptance or the rejection of the null

hypothesis (H) on the basis of an observed value of X.

Deﬁnition 5 Non-randomized test

Let x be a subset of

x such that

X 2 x ) The rejection of H

X2

x x ) The acceptance of H:

Then x is called the critical region or a test for H against K. Test ‘x’ means a

rule determined by x. Note that x does not depend on the random experiment (that

is on X). So it is called a non-randomized test.

Deﬁnition 6 Randomized test:

It consists in determining a function /ð xÞ

such that

(i) 0

/ð xÞ

1 8x 2

x

(ii) H is rejected with probability /ð xÞ whenever X ¼ x is observed.

Such a ‘/ð xÞ’ is called ‘Critical function’ or ‘test function’ or simply ‘test’ for

H against K. Here the function /ð xÞ depends on the random experiment (that is on

X). So that name ‘randomised’ is justiﬁed.

e.g. (i) and (ii) ⇒ whenever X ¼ x is observed, perform a Bernoullian trial with

probability of success /ð xÞ. If the trial results in success, reject H; otherwise H is

accepted. Thus /ð xÞ represents the probability of rejection of H.

If /ð xÞ is non-randomized with critical region ‘x’, then we have

x ¼ fx : /ð xÞ ¼ 1g

x x ¼ fx :/ðxÞ ¼ 0g (Acceptance region).

Detailed study on Non-randomized test

If x is Non-randomized test then it implies H is rejected iff X 2 x. In many cases,

we get a statistic T ¼ T ð X Þ such that for some C or C1 and C2 ,

½X 2 x , ½X : T [ C or ½X : T\C or ½X : T\C1 or : T [ C2 ; C1 \C2 :

Such a ‘T’ is called ‘test statistic’.

The event ½T [ C is called right tail test based on T.

The event ½T\C is called left tail test based on T.

The event ½T\C1 or T [ C2 is called two tailed test based on ‘T’.

Sometimes C1 and C2 are such that PfT\C1 =hg ¼ PfT [ C2 =hg8h 2 HH then

the test ½T\C1 or T [ C2 is called equal-tail test based on T.

3.2 Deﬁnitions and Some Examples 67

Let X be a random variable with pðx=hÞ as p.d.f or p.m.f of X, h 2 H; x 2

x

Testing problem

H : h 2 HH versus K : h 2 HK fHH \ HK ¼ ;g

Then the function given by

¼ PfX 2 x=hg; h 2 H

For a given h 2RHK , Px ðhÞ is called the power of ‘x’ at h. For continuous case,

P

we have Px ðhÞ ¼ x pðx=hÞdx and for discrete case we have Px ðhÞ ¼ x pðx=hÞ:

A test ‘x’ is called size-α if

Px ðhÞ

a 8h 2 HH ½a : 0\a\1 ð3:1Þ

Px ðhÞ

a 8h 2 HH and Px ðhÞ ¼ a for some h 2 HH ð3:2Þ

a and (3.2) , Sup Px ðhÞ ¼ a:

h2HH h2HH

The quantity SupfPx ðhÞ; h 2 HH g is called the size of the test. Sometimes ‘a’ is

called the level of the test ‘x’

(i) h: Real-valued; testing problem H : h ¼ h0 (Simple) or H : h

h0

(Composite) against K : h [ h0 ; x: A test; Px ðhÞ: Power function; Size of the

test: Px ðh0 Þ or Sup Px ðhÞ

h

h0

(ii) h ¼ ðh1 ; h2 Þ : 2 component vector

Testing problem: H: h1 ¼ h01 (given) against K : h1 [ h01 (composite)

x: A test

Pw ðhÞ: power function = Pw ðh1 ; h2 Þ ¼ Pw h01 ; h2 (at H) = A function of h2 .

Thus, the power function (under H) is still unknown. The quantity sup

Pw h01 ; h2 ; h2 2 Space of h2 g is known and is called the size of the test. For, e.g.

take N ðl; r2 Þ distribution and consider the problem of testing H: l ¼ 0 against K:

l [ 0, then the size of the test is

68 3 Theory of Testing of Hypothesis

n 2

o

Sup Pw ðl; r Þ l ¼ 0; 0\r2 \1 :

Example 3.5 X1 ; X2 ; . . .; Xn are i.i.d N l; r02 ; H : l

l0 against K : l [ l0 .

r0

x ðX1 ; X2 ; . . .; Xn Þ : X [ l0 þ pﬃﬃﬃ

n

Pw ðhÞ ¼ Pw ðlÞ ¼ P X [ l0 þ pﬃﬃn l

r 0

pﬃﬃﬃ pﬃﬃﬃ

nðX lÞ nðl l0 Þ

¼P [ þ 1j l

r0 r0

pﬃﬃﬃ

nðl l0 Þ

¼ P Z [1 jZ N ð0; 1Þ

r0

pﬃﬃﬃ

nðl l0 Þ

¼U 1

r0

pﬃﬃﬃ

nðl l0 Þ

Size of x ¼ Sup Pw ðlÞ ¼ Sup U 1

l

l0 l

l0 r0

¼ Uð1Þ ¼ size of x for testing H : l ¼ l0 :

Example 3.6

X1 ; X2 ; . . .; Xn are i.i.d N l; r02 .

H : l ¼ l0 against K : l [ l0 .

r0 a 2 ð0; 1Þ

x¼ ðX1 ; X2 ; . . .; Xn Þ : X [ l0 þ pﬃﬃﬃ sa ;

n U ðsa Þ ¼ a

pﬃﬃﬃ

r0 nð X l0 Þ

¼ P X [ l0 þ pﬃﬃﬃ sa jl0 ¼ P [ sa jl0

n r0

¼ PfZ [ sa jZ N ð0; 1Þg ¼ a:

) Test is exactly size 0 a0 :

H : l ¼ l0 against K : l [ l0

[ cg

x : fðX1 ; X2 ; . . .; Xn Þ; X ð3:3Þ

3.2 Deﬁnitions and Some Examples 69

n o

Pw ðl0 Þ ¼ size of x for testing H ¼ P X [ c l

0

pﬃﬃﬃ pﬃﬃﬃ

n X l0 ðc l0 Þ n

¼P [ =l0

r0 r0

pﬃﬃﬃ

ðc l0 Þ n

¼P Z[ =Z N ð0; 1Þ ¼ 0:05ðgivenÞ

r0

pﬃﬃﬃ

ðc l0 Þ n 1:645r0

) ¼ s0:05 ’ 1:645 )c ¼ l0 þ pﬃﬃﬃ

r0 n

ð3:4Þ

(or, level of signiﬁcance of the test is 0.05).

Example 3.8 X1 and X2 are i.i.d. according to (, Bernoulli (1, p)).

f x=p ¼ px ð1 pÞx ; x ¼ 0; 1

Consider the test x ¼ fðX1 ; X2 Þ : X1 þ X2 ¼ 2g

Accept H if ðX1 ; X2 Þ 62 x

Test statistic: T ¼ X1 þ X2 binð2; pÞ

2

x T ¼ 2 1

Size of the test is P ðX1 ; X2 Þ 2 p¼2 ¼P

1

p ¼ 2 ¼ 2 ¼ 0:25

1

2 2

size ¼ 2: 12 þ 12 ¼ 0:75:

Let us take another test of the form:

x : Reject H if X1 þ X2 ¼ 2

xB : Reject H if X1 þ X2 ¼ 1 with probability 1

2

A: Accept H if X1 þ X2 ¼ 0

Sample space ¼ f0; 1; 2g ¼ x þ xB þ A

1

Size ¼ 1: PfðX1 þ X2 Þ ¼ 2g þ PðX1 þ X2 ¼ 1Þ þ 0: PðX1 þ X2 ¼ 0Þ

2

¼ 0:25 þ 0:25 ¼ 0:50

Deﬁnition 8 Power function of a randomized test:

70 3 Theory of Testing of Hypothesis

rejection of H when the observed value of (X = x) and E as an Event of rejection of

H. Then PðE jX ¼ xÞ ¼ /ðxÞ: Power function of / is

n o

¼ P E=h

Z n o

¼ P E=x; h p x=h dx; when X is continuous ð3:5Þ

X n o

¼ P E=x; h p x=h ; when X is discrete ð3:6Þ

Z

P/ ðhÞ ¼ /ð xÞ p x=h dx As P E=x; h ¼ P E=x ¼ /ð xÞ

¼ E h / ð xÞ

P

In case of (3.6), we get: P/ ðhÞ ¼ /ð xÞ:p x=h ¼ Eh /ð xÞ

In either case we have P/ ðhÞ ¼ Eh /ð xÞ8h 2 H:

Special cases

1. Suppose /ð xÞ takes only two values, viz. 0 and 1. In that case, we say /ð xÞ is

non-randomized with critical region x ¼ fx : /ð xÞ ¼ 1g.

In that case

¼ Ph fX 2 xg ¼ Px ðhÞ:

) / is generalization of x.

2. Suppose /ð xÞ takes three values, viz 0, a and I according as

x 2 A; x 2 wB and x 2 x. In that case /ð xÞ is called randomized test having the

boundary region WB . The power function of this test is

P/ ðhÞ ¼ Ph fX 2 xg þ aPh fX 2 wB g.

(2) ) requires post randomization: randomized test.

Example 3.9 X1 ; X2 ; . . .; Xn are i.i.d. Bernoulli (1, p), n = 25. Testing problem:

H : p ¼ 12 against K : p 6¼ 12.

3.2 Deﬁnitions and Some Examples 71

9

P

25

/ð xÞ ¼ 1 if xi [ 12 >>

=

1

(1) Non-randomized

P15 >

¼ 0 if xi

12 >

;

1

9

P

25

>

/ðxÞ ¼ 1 if xi [ c >

>

>

>

>

1 >

=

P

25

(2) ¼ a if xi ¼ c

>

>

1 >

>

P25 >

>

¼ 0 if xi \c >

;

1

Non-randomized if a ¼ 0 or 1. In case of (1), size = Ep¼12 /ð xÞ ¼

25

P

P xi [ 12jp ¼ 12 ¼ 0:50001:

1

In case of (2), we want to get (c, a) such that Ep¼12 /ð xÞ ¼ 0:05.

( ) ( )

X

25 X

25

, Pp¼12 xi [ c þ aPp¼12 xi ¼ c ¼ 0:05

1 1

nP o

25

By inspection we ﬁnd that Pp¼12 1 x i [ 17 ¼ 0:02237 and

nP o

25

Pp¼12 1 xi [ 16 ¼ 0:0546. Hence, c = 17.

P15

0:05Pp¼1 x [c

Now, a ¼ P25

2 1 i

¼ 0:050:02237 ¼ 0:8573:

P xi ¼c 0:03223

1

Thus the test given by

X

25

/ð xÞ ¼ 1 if xi [ 17

1

X

25

¼ 0:8573 if xi ¼ 17

1

X

25

¼ 0 if xi \17

1

But the test given by

72 3 Theory of Testing of Hypothesis

X

25

/ðxÞ ¼ 1 if xi 17

1

X

25

¼0 if xi \17

1

Performance of x

Our object is to choose x such that Px ðhÞ8h 2 HH and ð1 Px ðhÞÞ8h 2 HK are as

small as possible. While performing a test x we reach any of the following

decisions:

I. Observe X = x, Accept H when θ actually belongs to HH : A correct decision.

II. Observe X = x, Reject H when θ actually belongs to HH : An incorrect

decision.

III. Observe X = x, Accept H when θ actually belongs to HK : An incorrect

decision.

IV. Observe X = x, Reject H when θ actually belongs to HK : A correct decision.

An incorrect decision of the type as stated in II above is called type-I error and

an incorrect decision of the type as stated in III above is called type-II error. Hence,

the performance of x is measured by the following:

(a) Size of type-I error = Probability {Type-I error} ¼ Sup PfX 2 x=hg ¼

h2HH

Sup Px ðhÞ

h2HH

x xg 8h 2 HK

¼ 1 Px ðhÞ 8h 2 HK :

So we want to minimize simultaneously both the errors. In practice, it is not

possible to minimize both of them simultaneously, because the minimization of one

leads to the increase of the other.

Thus the conventional procedure: Choose ‘x’ such that, for a given a 2

ð0; 1Þ; Px ðhÞ

a 8h 2 HH and 1 Px ðhÞ 8h 2 HK is as low as possible, i.e.,

Px ðhÞ 8h 2 HK is as high as possible. ‘x’ satisfying above (if it exists) is called an

optimum test at the level α.

Suppose HH ¼ fh0 g a single point set and HK ¼ fh1 g a single point set.

The above condition thus reduces to: Px ðh1 Þ = maximum subject to Px ðh0 Þ

a.

Deﬁnition 9

1. For testing H : h 2 HH against K: h ¼ h1 62 HH , a test ‘x0 ’ is said to be most

powerful (MP) level ‘α’ 2 ð0; 1Þ if

3.2 Deﬁnitions and Some Examples 73

Px0 ðhÞ

a8h 2 HH ð3:7Þ

and

In particular, if H : h ¼ h0 , (3.7) and (3.8) reduce to

Px0 ðh0 Þ

a and Px0 ðh1 Þ Px ðh1 Þ 8x satisfying ﬁrst condition.

2. A test ‘x0 ’ is said to be MP size-α, if Suph2HH Px0 ðhÞ ¼

a and Px0 ðh1 Þ Px ðh1 Þ 8x satisfying Px ðhÞ

a8h 2 HH . Again if HH ¼ fh0 g,

we get the above condition as Px0 ðh0 Þ ¼ a and Px0 ðh1 Þ Px ðh1 Þ8x satisfying

Px ðh0 Þ

a.

3. For testing H : h 2 HH against K : h 2 HK ; HK \ HH ¼ /, a test ‘x0 ’ is said to

be Uniformly Most Powerful (UMP) level ‘α’ if

Px0 ðhÞ

a8h 2 HH ð3:9Þ

Px0 ðh1 Þ Px ðh1 Þ8h1 2 HK and 8x satisfying (3.9). i.e. ‘x0 ’is said to be UMP

size-α if Sup Px0 ðhÞ ¼ a and Px0 ðh1 Þ Px ðh1 Þ 8h1 2 HK and 8x satisfying

h2XH

Sup Px ðhÞ

a. Again if HH ¼ fh0 g, the aforesaid conditions reduce to

h2XH

a and Px0 fh1 g Px fh1 g 8h1 2 HK and 8x satisfying Px fh0 g

a.

(b) Px0 fh0 g ¼ a and Px0 fh1 g Px fh1 g 8h1 2 HK and 8x satisfying Px fh0 g

a.

4. A test x is said to be unbiased if (under testing problem: H : h ¼ h0 against

K : h ¼ h1 ; ðh1 6¼ h0 Þ)

Px ðh1 Þ Px ðh0 Þ (⇒power ≥ size), i.e. it is said to be unbiased size-α if

Px ðh0 Þ ¼ a and Px ðh1 Þ a. If K : h 2 HK is composite, the above relation

reduces to (A) Px ðh1 Þ Px ðh0 Þ 8h1 2 HK ðBÞ Px ðh1 Þ a8h1 2 HK where

a ¼ Px ðh0 Þ.

5. For testing H : h ¼ h0 against K : h 2 HK 6 3 h0 , a test x is said to be

Uniformly Most Powerful Unbiased (UMPU) size-α if

(i) Px ðh0 Þ ¼ a; (ii) Px ðh1 Þ a 8h1 2 HK and (iii) Px ðh1 Þ Px ðh1 Þ 8h1 2

HK 8x satisfying (i) and (ii).

The deﬁnition of most powerful critical region, i.e. best critical region (BCR) of

size α does not provide a systematic method of determining it. The following

lemma, due to Neyman and Pearson, provides a solution of the problem if we,

however, test a simple hypothesis against a simple alternative.

The Neyman–Pearson Lemma maybe stated as follows:

74 3 Theory of Testing of Hypothesis

for some a 2 ð0; 1Þ, let x0 be a subset of

x. Suppose x0 satisﬁes the following

conditions:

Px0 ðh1 Þ Px fh1 g8x satisfying Px ðh0 Þ

a. That means ‘x0 ’ is a MP size-α test.

Proof (Continuous case)

Z

Z

x0 x

Z

Z

Z

Z

x0 x x0 \x xx0 x\x0

Z

Z

¼ p x=h1 dx p x=h1 dx

x0 x xx0

ð3:10Þ

h

Z

Z

) p x=h1 dx k p x=h0 dx

x0 x x0 x

Z

Z

xx0 xx0

2 3

Z

Z

x

x0 x xx0

2 3

Z

Z

x0 x

2 3

Z

x

kða aÞ ¼ 0 as Px ðh0 Þ

a:

3.3 Method of Obtaining BCR 75

R

(Similar result can be obtained for the discrete case replacing by R)

Notes

1. Deﬁne Y ¼ ppððxjh 1Þ

xjh0 Þ. If the random variable Y is continuous, we can always ﬁnd a

k such that, for a 2 ð0; 1Þ P½Y k ¼ a. If the random variable Y is discrete, we

sometimes ﬁnd k such that P½Y k ¼ a.

But, in most of the cases, we have (assuming that P½Y k 6¼ a)

Ph0 ðY k1 Þ\a and Ph0 ðY k2 Þ [ a; k1 [ k2 ð) PðY kÞ ¼ a has no

solution).

In that case we get a non-randomized test 0 x0 0 of level a given by

pð xjh1 Þ

x0 ¼ x: k1 ; Px0 ðh0 Þ

a:

pð xjh0 Þ

(i) Reject H if Y k1

(ii) Accept H if Y\k2

(iii) Acceptance (or Rejection) depends on the random experiment whenever

Y ¼ k2 .

Random experiment: when Y ¼ k2 is observed, perform a random experiment

with probability of success

a Ph0 fY k1 g

P¼a¼ :

P fY ¼ k2 g

the following randomized test:

pð xjh1 Þ

/0 ð xÞ ¼ 1 if k1

pð xjh0 Þ

a Ph0 fY k1 g pðx=h1 Þ

¼a¼ if ¼ k2

Ph0 fY ¼ k2 g pðx=h0 Þ

pðx=h1 Þ

¼ 0 if \k2 :

pðx=h0 Þ

Test /0 ð xÞ is obviously of size-a.

2. k ¼ 0 ) Px0 ðh1 Þ ¼ 1 ) x0 is a trivial MP test.

3. If the test (x0 ) given by N–P lemma is independent of h1 2 Hk that does not

include h0 , the test is UMP size-a.

4. Test (x0 ) is unbiased size-a.

76 3 Theory of Testing of Hypothesis

R Take k ¼ 0.

Then x0 ¼ fX : pðx=h1 Þ [ 0g. In that case Px0 ðh1 Þ ¼ pð xjh1 Þdx ¼

x0

R R

pð xjh1 Þdx ¼ pð xjh1 Þdx ¼ 1 [ a.

fx:pðxjh1 Þ [ 0g

x

) Test is trivially unbiased.

So throughout we assume that k [ 0.

Now

Z Z

) pð xjh1 Þdx k pð xjh0 Þdx ¼ ka

x0 x0

Again

pð xjh1 Þ

kpð xjh0 Þ ½As outside x0 : pð xjh1 Þ

kpð xjh0 Þ

Z Z

) pð xjh1 Þdx

k pð xjh0 Þdx

x0c x0c

, 1 Px0 ðh1 Þ

kð1 aÞ ð3:12Þ

P ðh1 Þ a

(3.11) ÷ (3.12) ) 1Px0x ðh Þ 1a , Px0 ðh1 Þ a:

0 1

) Test is unbiased.

Conclusion MP test is unbiased. Let x0 be a MP size-a test. Then, with

probability one, the test is equivalent to (assuming that ppððxjh 1Þ

xjh0 Þ has continuous dis-

tribution under h0 and h1 ) x0 ¼ fx : pð xjh1 Þ [ kpð xjh0 Þg where k is such that

Px0 ðh0 Þ ¼ a 2 ð0; 1Þ. h

Example 3.10 X1 ; X2 ; . . .Xn are i.i.d. N l; r20 ; 1\l\1; r0 = known. (without

any loss of generality take r0 ¼ 1).

X ¼ ðX1 ; X2 ; . . .Xn Þ observed value of X ¼ x ¼ ðx1 ; x2 ; . . .; xn Þ. To ﬁnd UMP

size-a test for H : l ¼ l0 against K : l [ l0 . Take any l1 [ l0 and ﬁnd MP

size-a test for

H : l ¼ l0 against K : l ¼ l1 ;

Solution

n 1 P

n

ðxi lÞ2

1 2

p x=l ¼ pﬃﬃﬃﬃﬃﬃ e 1 :

2p

3.3 Method of Obtaining BCR 77

Then

P

n

1

ðxi l0 Þ2 Pn

p x=l 2

e 1

1

2 ðl1 l0 Þ ð2xi l1 l0 Þ

1

¼ P n ¼e 1

p x=l 1

2 ðxi l1 Þ2

0

e 1

1X

¼ enxðl1 l0 Þ2ðl1 l0 Þ

n 2 2

* x ¼ xi

n

n

o

x0 ¼ x : p x=l [ kp x=l ð3:13Þ

1 0

n o

ð3:13Þ , x : enxðl1 l0 Þ2ðl1 l0 Þ [ k

n 2 2

ð3:15Þ

loge k 1

, x : x [ þ ðl1 þ l0 Þ as l1 [ l0

n ð l1 l0 Þ 2

, fx : x [ cg; say ð3:16Þ

By (3.16),

ð3:14Þ , P x [ c=l ¼ a

0

pﬃﬃﬃ pﬃﬃﬃ

nðx l0 Þ nðc l0 Þ

,P [ l0 ¼ a

1 1

N l0 ; 1 under H)

(X1 ; X2 ; . . .Xn are i.i.d N ðl0 ; 1Þ under H ) X n

pﬃﬃﬃ

, P Z [ nðc l0 ÞjZ N ð0; 1Þ ¼ a

2 1 3

Z

pﬃﬃﬃ

) nðc l0 Þ ¼ sa 4 N ðZjð0; 1ÞÞdz ¼ a5

sa

1

, c ¼ l0 þ pﬃﬃﬃ sa ð3:17Þ

n

K : l ¼ l1 ð [ l0 Þ.

78 3 Theory of Testing of Hypothesis

against K : l [ l0 .

Observations

1. Power function of the test given by (3.16) and (3.17) is

[ l0 þ psa

Px0 ðlÞ ¼ PðX 2 x0 jlÞ ¼ Pr: X ﬃﬃﬃl

n

pﬃﬃﬃ

¼ P Z [ nðl0 lÞ þ sa jZ N ð0; 1Þ

Z1

¼ N ðZjð0; 1ÞÞdz

pﬃﬃ

sa nðll0 Þ

pﬃﬃﬃ

(Under any l : ðX1 ; X2 ; . . .; Xn Þ are i.i.d. N ðl; 1Þ ) nðx lÞ N ð0; 1Þ)

pﬃﬃﬃ

¼ 1 U sa nðl l0 Þ :

Deﬁnition 10

1. For testing H : h ¼ h0 against K : h ¼ h1 , a test x (based on n observations) is

said to be consistent if the power Px ðh1 Þ of the test tends to ‘1’ as n ! 1.

pﬃﬃﬃ

2. Px0 ðlÞ ¼ 1 Uðsa nðl l0 ÞÞ which increases as l increases for ﬁxed n.

¼ 1 ð 1 aÞ ¼ a

) x0 is unbiased.

3. Px0 ðlÞ\Px0 ðl0 Þ for all l\l0

¼a

) Power \a for any l\l0

That is, test x0 is biased for testing H : l ¼ l0 against K : l\l0 .

4. From (3.15), if l1 \l0 , we get x0 to be equivalent to

fx : x\c0 g ð3:20Þ

3.3 Method of Obtaining BCR 79

and Px0 ðl0 Þ is equivalent to PfX\c 0

jl0 g ¼ a

sa

) c0 ¼ l0 pﬃﬃﬃ ð3:21Þ

n

(by the same argument as before while ﬁnding c)

Test given by (3.20) and (3.21) is independent of any l1 \l0 . Hence it is UMP

size-a for H : l ¼ l0 against K : l\l0 .

n o

5. (i) UMP size-a for H : l ¼ l against K : l [ l is x ¼ x : x [ l þ psaﬃﬃ

0 0 0 0 n

(ii) UMP size-a

is x0 ¼ x : x\l0 psaﬃﬃn

0

Clearly, x0 6¼ x00 (xo is biased for H against l\l0 and x00 is biased for

H against l [ l0 ).

There does not exist any test which is UMP for H : l ¼ l0 against

K : l 6¼ l0 .

any loss of generality we take l0 ¼ 0)

X ¼ ðX1 ; X2 ; . . .; Xn Þ, observed value = x ¼ ðx1 ; x2 ; . . .; xn Þ

Testing problem: H : r ¼ r0 against K : r [ r0 .

To ﬁnd UMP size-a test for H against K : r [ r0 we take any r1 [ r0 .

Solution

P

n

n 12 x2i

Here p x=r ¼ p1ﬃﬃﬃﬃ e

2r

i

r 2x

Hence

x P

n

p =r1 r0 n 1

2 xi2 1

r2 r2

1

¼ e i 0 1

ð3:22Þ

x

p =r0 r 1

8

9

<

p x=r1 =

w0 ¼ x :

[k ð3:23Þ

: p x= ;

r0

where kð [ 0Þ

is such that

80 3 Theory of Testing of Hypothesis

P

n

x

p =r

n 2 xi r2 r2

1 2 1 1

Now,

1

[ k , rr01 e i 0 1

[ k [from (3.22)]

p x=r

0

2

r0

X

n

2 loge k n loge r1

, xi2 [

½As r1 [ r0

i

1

1 1

r20

1

r21

ð3:25Þ

r 2 r0

2

1

¼ c ðsayÞ

( )

X

n

w0 ¼ x: x2i [ c ð3:26Þ

i

( )

X

n

and P x2i [ c=r0 ¼a ð3:27Þ

i

P

n

x2i

i

) v2n

r2

Hence (3.27)

2 3

Z1

c 6 1 y 7

e 2 y21 dy ¼ a5

n

) v2n;a 4

r20 Cðn=2Þ2n=2

v2n;a

( )

X

n

w0 ¼ x: x2i [ r20 v2n;a ð3:29Þ

i

r1 [ r0 . Hence it is UMP size-a for H : r ¼ r0 against K : r1 [ r0

3.3 Method of Obtaining BCR 81

Observations

1. Under any r20 ;

1X n

x2 ¼ Yn v2n

r0 i i

2

) EðYn Þ ¼ n; V ðYn Þ ¼ 2n

ﬃﬃﬃﬃ

Ypn n

is asymptotically N ð0; 1Þ.

2n

n o

ﬃﬃﬃﬃ [ vp

n;a n

2

So, for large n; w0 ¼ x : Ypn n2n

ﬃﬃﬃﬃ

2n

v2n;a n pﬃﬃﬃﬃﬃ

and pﬃﬃﬃ 2n

ﬃ ’ sa i.e. vn;a ’ sa 2n þ n

2

( )

X

n pﬃﬃﬃﬃﬃ

x0 ¼ x: x2i [ r02 sa 2n þ n

i¼1

2. UMP size-α test for H : r2 ¼ r20 against K : r2 [ r20 is

( )

X

n

w0 ¼ x: x2i [ r20 v2n;a

i

UMP size-a test for H : r2 ¼ r20 against K : r2 \r20 is

2 3

( ) Z1

X

n 6 7

w ¼ x: x2i \r20 v2n;1a 6 f ðv2n Þdv2n ¼ 1 a7

4 5

i

v2n;ð1aÞ

Clearly, w0 6¼ w . Hence there does not exist UMP test for H : r2 ¼ r20 against

K : r2 6¼ r20 .

The power function of the test w0 is

( ) Z1

2 Xn .

Pw0 r ¼ P xi [ r0 vn;a r ¼

2 2 2 2

f v2n dv2n

i

r2

0 v2

r2 n;a

2 as r 2 increases.

2

Also Pw0 ðr Þ

Pw0 r0 ¼ a 8r : r2

r20

2

K : r2 \r20 .

Similarly w is biased (Here Pw ðr2 Þ increases as r2 decreases) and hence it

cannot be recommended for H : r2 ¼ r20 against K : r2 [ r20 .

82 3 Theory of Testing of Hypothesis

P

n

Next observe that 1n x2i is a consistent estimator of r2 . That means, for ﬁxed r2 ,

i

Pn

r2 v 2

as n ! 11n x2i ! r2 in probability and 0 nn;a ! r20 . Thus if r2 [ r20 , we get

n i .

P 2

Lt P xi [ r20 v2n;a r2 ¼ 1 implying that the test w0 is consistent against

n!1 i

K : r2 [ r20 .

Similarly the test w is consistent against K : r2 \r20

eX =2 against

2

Example 3.12 Find the MP size-a test for H : X p1ﬃﬃﬃﬃ

2p

K : X 12 ejX j

Px0 ðH Þ ¼ a ð3:31Þ

Now,

rﬃﬃﬃ

pðx=K Þ p x2 jxj

¼ e2 [k

pðx=H Þ 2

rﬃﬃﬃ

p x2

, loge þ j xj [ loge k

2 2

n p

o

, x2 2j xj þ loge 2 loge k [ 0

2

, x 2 2j x j þ C [ 0 ð3:32Þ

P x2 2j xj þ C [ 0=H ¼ a ð3:33:Þ

To ﬁnd ‘C’ we proceed as follows:

P x2 2jxj þ C [ 0=H ¼ PH x2 2jxj þ C [ 0\x\0 þ PH x2 2jxj þ C [ 0\x [ 0

2 2

¼ PH x þ 2x þ C [ 0\x\0 þ PH x 2x þ C [ 0\x [ 0

3.3 Method of Obtaining BCR 83

) PH x2 þ 2x þ C [ 0\x\0 ¼ PH x2 2x þ C [ 0 \ x [ 0

a

PH x2 2x þ C [ 0 \ x [ 0 ¼ ð3:34Þ

2

) gð xÞ is minimum at x ¼ 1

) x2 2x þ C [ 0\x [ 0 , x\x1 ðcÞ or x [ x2 ðcÞ

g(x)

x2 2x þ C ¼ 0

pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

2 4 4c pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

)x¼ ¼1 1c

2

pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

So, x1 ðcÞ ¼ 1 1 c and x2 ðcÞ ¼ 1 þ 1 c

Hence (3.34)

n pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃo n pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃo a

, PH 0\x\1 1 c þ PH x [ 1 þ 1 c ¼

2

pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

1 pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

a

, U 1 1 c þ1 U 1þ 1 c ¼

2 2

pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

a 1 1a

, U 1þ 1 c U 1 1 c ¼ 1 ¼ ð3:35Þ

2 2 2

Example 3.13 Let X be a single observation from the p.d.f. p x=h ¼ ph h2 þ1 x2 ; h [ 0.

Find the UMP size-a test for H : h ¼ h0 against K : h [ h0

84 3 Theory of Testing of Hypothesis

¼ 2 ¼ 2

pðx=h0 Þ h0 h1 þ x2 h0 h1 h20 þ h20 þ x2

h1 1

¼ 2 2

; which is a strictly increasing function of x2 ði:e:; j xjÞ:

h0 1 þ h21 h0

h0 þ x2

pðx=h1 Þ

[ k , j xj [ C ð3:36Þ

pðx=h0 Þ

Z1 1

h0 dx a 1 x a

, 2 ¼ , tan1 ¼

p h0 þ x 2 2 p h0 C 2

C

1 p 1 C a 2 1 C

, tan ¼ , 1 tan ¼a ð3:38Þ

p 2 h0 2 p h0

Test given by (3.36) and (3.38) is MP size-a. As the test is independent of any

h1 [ h0 , it is UMP size-a for H : h ¼ h0 against K : h [ h0 . Power function is

given by

ZC

h

Px0 ðhÞ ¼ Pfj X j [ C=hg ¼ 1 dx

p h2 þ x 2

C

,

fðxhÞ2 þ 1g

we are to ﬁnd MP size-a test for H : h ¼ h0 against K : h ¼ h1 ð [ h0 Þ.

Answer X CouchyðhÞ ) Y ¼ X h0 Couchyðh h0 ¼ dÞ. Hence H : h ¼

h0 , H : d ¼ 0 using Y-observation. So, without any loss of generality we take

H : h ¼ 0 and for the sake of simplicity we take h1 ¼ 1.

Here, by N–P lemma, MP test has the critical region x ¼

fx : pðx=h1 Þ [ kpðx=h0 Þg with Ph0 ðX 2 xÞ ¼ a 2 ð0; 1Þ

3.3 Method of Obtaining BCR 85

Here

pðx=h1 Þ 1 þ x2

[k , [k

pðx=h0 Þ 1 þ ð x 1Þ 2

, 1 þ x2 [ k 1 þ x2 2x þ 1

, x2 ð1 kÞ þ 2kx þ 1 2k [ 0 ð3:39Þ

Several cases

(a) k ¼ 1 , x [ 0, hence the size of the test is PðX [ 0=h ¼ 0Þ ¼ 12.

(b) 0\k\1 if we write gð xÞ ¼ ð1 kÞx2 þ 2kx þ ð1 2kÞ,

we have, g0 ð xÞ ¼ 2ð1 kÞx þ 2k ¼ 0 ) x ¼ 1kk

00

g ð xÞ ¼ 2ð1 kÞ [ 0, this means that the curve y ¼ gð xÞ has a minimum at

k

x ¼ 1k .

such that

to get x1 ; x2 for any given a. It exists for some speciﬁc values of a.

(c) If k [ 1, in that case g00 ð xÞ ¼ 2ð1 kÞ\0, thus y ¼ gð xÞ has the maximum at

k

x ¼ 1k [ 0. As shown in (b) above here also we can ﬁnd x1 and x2 the

two roots of gð xÞ ¼ 0 and the test will be given by x [ x1 or x\x2 with

Pfx1 \X\x2 =h ¼ hg ¼ a. Taking h1 ¼ 2, it can be shown that the MP test

for H : h ¼ 0 against h ¼ 2 is completely different. Hence based on single

observation there does not exist UMP test for H : h ¼ 0 against K : h [ 0

86 3 Theory of Testing of Hypothesis

dom variable Y ¼ ppððX=h 1Þ

X=h0 Þ is continuous under h ¼ h0 , we can always ﬁnd kð [ 0Þ

such that for a given a 2 ð0; 1Þ; Ph0 ðY [ kÞ ¼ a.

On the other hand, if the random variable Y is discrete under h ¼ h0 , it may not be

always possible to ﬁnd k such that, for a given a 2 ð0; 1Þ Ph0 ðY [ kÞ ¼ a. In that

case, we modify the non-randomized test x0 ¼ fx : pðx=h1 Þ [ kpðx=h0 Þg by using

following functions:

9 8 9

¼ 1; if pðx=h1 Þ [ kpðx=h0 Þ = <Y [k=

/0 ð xÞ ¼ a; if pðx=h1 Þ ¼ kpðx=h0 Þ , Y ¼ k ð3:41Þ

; : ;

¼ 0; if pðx=h1 Þ\kpðx=h0 Þ Y\k

The function given by (3.41) and (3.42) is called the randomized test corre-

sponding to non-randomized test x0 . It states that, after observing Y ði.e; X Þ

Reject H if Y [ k

Accept H if Y\k

Occurrence of success ) Rejection of H and

Occurrence of failure ) Acceptance of H.

Now we can show that the test given by (3.41) and (3.42) is MP size-a among all

tests / satisfying Eh0 /ð xÞ

a. Observe that /0 ð xÞ ¼ 0 ) /0 ð xÞ /ð xÞ 08x :

pðx=h1 Þ [ kpðx=h0 Þ and /0 ð xÞ ¼ 0 ) /0 ð xÞ /ð xÞ

08x : pðx=h1 Þ\kpðx=h0 Þ.

Hence, for all x, we have,

0

/ ð xÞ /ð xÞ ½pðx=h1 Þ kpðx=h0 Þ 0

Z

0

) / ð xÞ /ð xÞ ½pðx=h1 Þ kpðx=h0 Þdx 0

, Eh1 /0 ð xÞ Eh1 /ð xÞ kða Eh0 /ð xÞÞ 0; As k [ 0 and Eh0 /ð xÞ

a

, P/0 ðh1 Þ P/ ðh1 Þ

3.3 Method of Obtaining BCR 87

a.

0; 1 To ﬁnd UMP size-a test for H : h ¼ h0 ; against K : h [ h0 .

K : h ¼ h1 , we consider the ratio

P P

Qn xi n xi

pðx=h1 Þ f ð x =h Þ h ð 1 h Þ

¼ Qi¼1 ¼ 1P

i 1 1

Y¼ n P

pðx=h0 Þ i¼1 f ðxi =h0 Þ

xi n xi

n s h0 ð 1 h0 Þ

1 h1 h1 ð1 h0 Þ

¼

1 h0 h0 ð1 h1 Þ

P

where s ¼ xi . Observe that Y is a discrete r.v. under any h.

9

1; if Y [ k =

/0 ð xÞ ¼ a; if Y ¼ k

;

0; if Y\k

1 h1 n h1 ð 1 h0 Þ s

,

or k; ð3:43Þ

1 h0 h0 ð 1 h1 Þ

Now,

n

1 h1 h1 ð 1 h0 Þ s

or k

1 h0 h0 ð 1 h1 Þ

h 0 i

1 h1 h 1 ð 1 h0 Þ

, n log þ s log

or k0 k ¼ loge k

1 h0 h 0 ð 1 h1 Þ

0

k log 1h 1

, s

or n on

n 1h0 o

log hh10 ðð1h0Þ

log hh10 ðð1h 0Þ

1h1 Þ 1h1 Þ

h1 ð 1 h0 Þ

¼ C; ðsayÞ; As; h1 [ h0 ) log [0

h0 ð 1 h1 Þ

88 3 Theory of Testing of Hypothesis

8 9

< 1; if s [ C >

> =

/0 ð xÞ ¼ a; if s ¼ C ð3:45Þ

>

: >

;

0; if s\C

and

P

Under any h, s ¼ n1 Xi binðn; hÞ. Hence from (3.46) we have, either,

n

P ns

Ph0 fs [ C g ¼ a , s h0 ð 1 h0 Þ

n s

¼a)a¼0

cþ1

or,

Ph0 fs Cg\a\Ph0 fs C g

P

a nc þ 1 ns hs0 ð1 h0 Þns ð3:47Þ

)a¼ c

n h ð1 h Þnc

c 0 0

K : h ¼ h1 ð [ h0 Þ. Test is independent of any h1 ð [ h0 Þ. Hence it is UMP size-a

for H : h ¼ h0 against K : h [ h0

Observation

n o

h1 ð1h0 Þ

1. For h1 \h0 ) log h0 ð1h1 Þ \0

In that case (3.43) and (3.44) are equivalent to

8 9

< 1; if s\C >

> =

/ ð xÞ ¼ a; if s ¼ C and Ph0 fs\C g þ aPh0 fs ¼ C g ¼ a:

>

: >

;

0; if s [ C

We can get UMP for H : h ¼ h0 against K : h\h0 by similar arguments.

Obviously /0 6¼ / . So there does not exist a single test which is UMP for

H : h ¼ h0 against K : h 6¼ h0

2. By De Moivre–Laplace limit theorem, for large n, pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

Snh

is approximately N(0, 1).

nhð1hÞ

Hence, from (3.45) and (3.46), we get,

C nh0 pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ ’ sa ) C ’ nh0 þ sa nh0 ð1 h0 Þ

nh0 ð1 h0 Þ

pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

Then approximately size-a test is : Reject H if s [ nh0 þ sa nh0 ð1 h0 Þ,

Accept H otherwise.

3.3 Method of Obtaining BCR 89

PðhÞ ¼ Eh /0 ð X Þ

¼ Ph fS [ cg þ aPh fS ¼ cg

X n n s ns

¼ s h ð 1 hÞ þ a nc hc ð1 hÞnc

S¼c þ 1

½Can be obtained using Biometrika table

X n n s ns

X

n

ns

¼ ð 1 aÞ s h ð 1 hÞ þa s h ð1 hÞ

n s

S¼c þ 1 S¼c

¼ ð1 aÞIh ðc þ 1; n cÞ þ aIh ðc; n c þ 1Þ;

½Can be obtained using incomplete Beta function table:

Observe, as Ih ðm; nÞ is an increasing function of h, the Power function PðhÞ

increases with h.

X Rð0; 1Þ against K : X R 2 ; 2

Answer pðx=H Þ ¼0; 1; if 0\x\1

otherwise

1; if 1=2 \ x \ 3=2

pðx=K Þ ¼0; otherwise

As the ratio pðx=K Þ=pðx=H Þ is discrete, MP test for H against K is given by:

9

¼ 1 if; pðx=K Þ [ kpðx=H Þ =

/0 ð xÞ ¼ a; if pðx=K Þ ¼ kpðx=H Þ ð3:48Þ

;

¼ 0; if; pðx=K Þ\kpðx=H Þ

E H / 0 ð xÞ ¼ a ð3:49Þ

Taking

1

k\1; 0\x

) pðx=K Þ ¼ 0 and pðx=H Þ ¼ 1

2

) pðx=k Þ\kpðx=H Þ ) /0 ð xÞ ¼ 0

1

\x\1 ) pðx=K Þ ¼ pðx=H Þ ¼ 1=2

2

) pðx=K Þ [ kpðx=H Þ ) /0 ð xÞ ¼ 1

3

1

x\ ) pðx=K Þ ¼ 1 and pðx=H Þ ¼ 0

2

) pðx=K Þ [ kpðx=H Þ ) /0 ð xÞ ¼ 1

90 3 Theory of Testing of Hypothesis

1

So, for k\1, we get EH /0 ð X Þ ¼ 1: PH 2 \X\1 þ 1: PH ðX 1Þ ¼ 12. Thus it

is a trivial test of size 0.5.

Taking k\1;

1 9

0\x

) /0 ð xÞ ¼ 0 >

>

>

2 >

>

=

1

\x\1 ) /0 ð xÞ ¼ 0 EH /0 ð X Þ ¼ 0 and it is a trivial test of size 0:

2 >

>

>

>

3 >

1

x

) /0 ð xÞ ¼ 1 ;

2

Taking k ¼ 1;

o\x

12 ) /0 ð xÞ ¼ 0: we always accept H.

1

x\ 32 ) /0 ð xÞ ¼ 1: We always reject H.

1

1

, a:PH \X\1 ¼ a , a ¼ 2a

2

19>

¼ 0; if 0\x

>

2>>

>

=

1

¼ 2a; if \x\1 is MP size-a test:

2 >

>

>

3>>

¼ 1; if1

x\ ;

2

Generalization of N–P lemma.

Theorem 2 Let g0 ; g1 ; g2 ; . . .gm be ðm þ 1Þ non-negative

R integrable functions on

the sample space x . Let x be any region such that x gi ð xÞdx ¼ Ci ; i ¼ 1ð1Þm

where Ci ’s are all known numbers.

Suppose x0 be a subset of x such that:

Pm

Inside x0 : g0 ð xÞ [ ki gi ð xÞ;

1

3.4 Locally MPU Test 91

P

m

Outside x0 : g0 ð xÞ

ki gi ð xÞ, where k1 ; k2 ; . . .. . .km are so chosen that

R 1

gi ð xÞdx ¼ Ci ; i ¼ 1ð1Þm.

x0

R R

Then we have g0 ð xÞdx g0 ð xÞdx.

x0 x

This is called generalized Neyman–Pearson Lemma.

Proof

Z Z

g0 ð xÞdx g0 ð xÞdx

x0 x

Z Z

x0 x ¼ x0 x\x0 ¼ insidex0

¼ g0 ð xÞdx g0 ð xÞdx. . .. . .. . .ð1Þ

x x0 ¼ x x\x0 ¼ outsidex0

x0 x xx0

X

m

x 2 x0 x ) g0 ð xÞ [ ki gi ð xÞ

1

Z Z ( )

X

m

) g0 ð xÞdx ki gðxi Þ dx

1

x0 x x0 x

8 9

Xm < Z =

¼ ki gi ð xÞdx

i¼1

: ;

x0 x

X

m

x 2 x 0 x ) g0 ð x Þ

ki gi ð x Þ

1

( ) 8 9

Z Z X

m X

m < Z =

) g0 ð xÞdx

ki gi ðxÞ dx ¼ ki gi ð xÞdx

1 1

: ;

xx0 xx0 xx0

8 9 8 9

Z Z X

m < Z = X m < Z =

g0 ð xÞdx g0 ð xÞdx ki gi ð xÞdx ki gi ð xÞdx

i

: ; i

: ;

x0 x xx0 x0 x xx0

2 3

X

m Z Z

¼ ki 4 gi ðxÞdx gi ð xÞdx5

i

x0 x xx0

2 3

X

m Z Z X

m

¼ ki 4 gi ð xÞdx gi ð xÞdx5 ¼ k i ðCi Ci Þ ¼ 0

i i

x0 x

92 3 Theory of Testing of Hypothesis

1. One-sided case: For the family fpðx=hÞ; h 2 Hg, sometimes we cannot ﬁnd

UMP size-α test for H : h ¼ h0 against K : h [ h0 or h\h0 .

For example, if X1 ; X2 ; . . .; Xn ðn 1Þ are i.i.d according to the p.d.f.

1 1

f ðx=hÞ ¼ ; ð1\h\1; 1\x\1Þ

p 1 þ ðx hÞ2

In that case, we can ﬁnd an e [ 0 for which there exists a critical region x0 such

that Px0 ðh0 Þ ¼ a and Px0 ðhÞ Px ðhÞ8h : h0 \h\h0 þ e and 8x : Px ðh0 Þ ¼ a.

Construction Let pðx=hÞ be such that, for every x, ddh px ðhÞ exists and is

continuous in the neighborhood of h0 . Then we have, by mean value theorem, for

any h [ h0

d

Px ðhÞ ¼ Px ðh0 Þ þ ðh h0 Þ Px ðhÞ ; h0 \h \h

dh h¼h

Similarly,

Let x0 be such that Px0 ðh0 Þ ¼ a and P0x0 ðh0 Þ is maximum, i.e.

P0x0 ðh0 Þ P0x ðh0 Þ8x : Px0 ðh0 Þ

¼ a: Then comparing (3.50) and (3.51), we get an

e [ 0, such that Pxc ðhÞ Px ðhÞ8h : h0 \h\h0 þ e. Such a x0 is called locally

most powerful size-α test for H : h ¼ h0 against h [ h0 .

Now our problem is to choose x0 such that

Z

Px0 ðh0 Þ ¼ a , pðx=h0 Þdx ¼ a ð3:52Þ

x0

and

Z Z

dpðx=hÞ dpðx=hÞ

P0x0 ðh0 Þ P0x ðh0 Þ , dx dx

dh0 dh0

x

Z0 x

Z

0

, p ðx=h0 Þdx p0 ðx=h0 Þdx

x0 x

R

where x satisﬁes Px ðh0 Þ ¼ a , pðx=h0 Þdx ¼ a

x

3.4 Locally MPU Test 93

g1 ðxÞ ¼ pðx=h0 Þ, C1 ¼ a, k1 ¼ k.

Then we get,

)

Inside x0 : p0 ðx=h0 Þ [ kpðx=h0 Þ

ð3:53Þ

Outside x0 : p0 ðx=h0 Þ

kpðx=h0 Þ

R R

Finally, p0 ðx=h0 Þdx p0 ðx=h0 Þdx

x0 x

where x0 and x satisfy

Thus the test given by (3.53) and (3.54) is locally most powerful size-a for

H : h ¼ h0 against h [ h0 .

Note If UMP test exists for H : h ¼ h0 against h [ h0 ) LMP test corre-

sponding to the said problem must be identical to the UMP test. But the converse

may not be true.

Example 3.17 X1 ; X2 ; . . .Xn are i.id Nðh; 1Þ. H : h ¼ h0 against h [ h0 .

LMP test is provided by

Z

pðx=h0 Þdx ¼ a ð3:56Þ

x0

p0 ðx=h0 Þ [ kpðx=h0 Þ

1

, p0 ðx=h0 Þ [ k

pðx=h0 Þ ð3:57Þ

d

, ½loge pðx=h0 Þ [ k

dh0

P

Here pðx=hÞ ¼ ð2pÞn=2 e2 ðxi hÞ

1 2

1Xn

) log pðx=hÞ ¼ const. ðxi hÞ2

2 1

0Þ

P

n

) d logðx=h

dh ¼ ðxi h0 Þ, hence by (3.57), (3.55)

0

1

94 3 Theory of Testing of Hypothesis

, x0 ¼ fx : x [ k0 g

pﬃﬃﬃ pﬃﬃﬃ pﬃﬃﬃ

, Ph0 nðx h0 Þ [ nðk0 h0 Þ ¼ a ) nðk0 h0 Þ ¼ sa

n o

i.e, k0 ¼ h0 þ p1ﬃﬃn sa . Thus x0 , x0 ¼ x : x [ h0 þ p1ﬃﬃn sa

which is identical to the UMP test for H : h ¼ h0 against h [ h0 .

To ﬁnd LMP for H : h ¼ h0 against h [ h0

Qn

Here pðx=hÞ ¼ f ðxi =hÞ;

i¼1

LMP test is given by the critical region:

x ¼ fx : p0 ðx=h0 Þ [ kpðx=h0 Þg, where k such that Px ðh0 Þ ¼ a

Now,

p0 ðx=h0 Þ

p0 ðx=h0 Þ [ kpðx=h0 Þ , [k

pðx=h0 Þ

d log pðx=h0 Þ

, [k ½pðx=hÞ [ 0

dh0

P

n 0

f ðxi =h0 Þ P

n 0

, f ðxi =h0 Þ [ k, ðsayÞ , yi [ k, where yi ¼ ff ðxðxii=h

=h0 Þ

0Þ

.

1 1

Now, under H, yi 0 s is i.i.d with

Z Z

f 0 ðxi =h0 Þ

Eh0 fyi g ¼ f ðxi =h0 Þdx ¼ f 0 ðx=h0 Þdx

f ðxi =h0 Þ

Z

d d

¼ f ðx=h0 Þdx ¼ ð1Þ ¼ 0

dh0 dh0

Z

f 0 ðxi =h0 Þ 2

Vh0 fyi g ¼ f ðx=h0 Þdx

f ðx=h0 Þ

Z

@ log f ðx=h0 Þ 2

¼ f ðx=h0 Þdx

dh0

¼ Iðh0 Þ ½Fisher0 s information:

Pn

y

Hence, by Central Limit Theorem, for large n, pﬃﬃﬃﬃﬃﬃﬃﬃﬃ

1 i

Nð0; 1Þ, under H. So,

nIðh0 Þ

for large n, the above test can be approximated by

3.4 Locally MPU Test 95

( )

X

n

f 0 ðxi =h0 Þ pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

x¼ x: [ sa nIðh0 Þ :

i¼1

f ðxi =h0 Þ

For testing H : h ¼ h0 against h 6¼ h0 corresponding to the family fpðx=hÞ; h 2 Hg,

it is being known that (expect some cases) there does not exist any test which is

UMP for both sided alternatives. e.g. taking l ¼ 0 against l 6¼ 0 for Nðl; 1Þ and

taking r2 ¼ 1 against r2 6¼ 1 for Nð0; r2 Þ etc.

In those cases, we can think of a test which is UMP in a neighbourhood of h0 .

Thus a test ‘w0 ’ is said to be locally best (of size a) for H : h ¼ h0 against K : h 6¼ h0

if there exists an t [ 0 for which

(i) Pw0 ðh0 Þ ¼ a

(ii) Pw ðhÞ a 8h : jh h0 j\t and 8w satisfying (i).

Let pðx=hÞ be such that, for a chosen w

(i) P0w ðhÞ exists in the neighbourhood of h0 ;

(ii) P00w ðhÞ exists and is continuous at (in the neighbourhood) h0 .

Then we have, by Taylor’s Theorem

ðh h0 Þ2 00 k

Pw ðhÞ ¼ Pw0 ðh0 Þ þ ðh h0 ÞP0w ðh0 Þ þ Pw ðh Þ;

2!

jh h0 j\jh h0 j

(i) Pw0 ðh0 Þ ¼ a (size condition)

(ii) P0w0 ðh0 Þ ¼ 0 (Locally unbiased condition)

(iii) P00w0 ðh0 Þ is maximum

Then we can ﬁnd an t [ 0 such that 8h : jh h0 j \t, We have

Pw0 ðhÞ Pw ðhÞ8w satisfying (i) and (ii).

2

Now Pw ðhÞ ¼ Pw ðh0 Þ þ ðh h0 ÞP0w ðh0 Þ þ ðhh

2!

0Þ

P00w ðhk0 Þ þ g and g ! 0 as

h ! h0 .

To get Pw0 ðhÞ Pw ðhÞ8h : jh h0 j\t we must have P00w0 ðh0 Þ P00w ðh0 Þ [due to

continuity of Pw ðhÞ].

Then w0 is called locally Most Powerful unbiased size-a test if

(i) Pw0 ðh0 Þ ¼ a

(ii) P0w0 ðh0 Þ ¼ 0:

(iii) P00w0 ðh0 Þ P00w ðh0 Þ8w satisfying (i) and (ii).

96 3 Theory of Testing of Hypothesis

Construction

Z

Z

w w

Z

w

c1 ¼ a; c2 ¼ 0

Then

n

o

w0 ¼ x : p00 x=h0 [ k1 p x=h0 þ k2 p0 x=h0

R R

where k1 and k2 are such that g1 ðxÞdx ¼ a; g2 ðxÞdx ¼ 0:

R Rw0 w0

Then we have g0 ðxÞdx g0 ðxÞdx provided ‘w’ satisﬁes (i) and (ii).

w0 w

Example 3.18 X1 ; X2 ; . . .Xn are i.i.d. Nðl; 1Þ. To ﬁnd LMPU test for H : l ¼ l0

against K:l 6¼ l0 :

P n

1

n 12 ðxi lÞ2

Answer Here p x=h ¼ pﬃﬃﬃﬃ 2p

e 1

P

n

0 1 n

1

2 ðxi lÞ2

x

p =h ¼ pﬃﬃﬃﬃﬃﬃ nðx lÞe ¼ nðx lÞp x=h

2p

00 x

p =h ¼ ½nðx lÞ2 p x=h np x=h

n

o Z

w0 ¼ x : p00 x=h0 [ k1 p x=h0 þ k2 p0 x=h0 pðx=h0 Þdx ¼ a ð3:58Þ

x0

3.4 Locally MPU Test 97

and

Z

p0 ðx=h0 Þdx ¼ 0 ð3:59Þ

x0

n o

¼ x : ½nðx l0 Þ2 n [ k1 þ k2 nðx l0 Þ

n pﬃﬃﬃ 2 pﬃﬃﬃ o

¼ x: nðx l0 Þ [ k01 þ k02 nðx l0 Þ

¼ x : y2 [ k01 þ k02 y ;

pﬃﬃﬃ

y ¼ nðx l0 Þ Nð0; 1Þ under H

(3.59) ⟺

Z

yNðy=0; 1Þdy ¼ 0 ð3:60Þ

y2 [ k01 þ k02 y

Now the LHS of (3.60) is zero irrespective of choice of any ðk01 ; k02 Þ since

y

Nð =0; 1Þ is symmetrical about ‘0’.

Here, we can safely take k02 ¼ 0 without affecting size condition. Then our test

reduces to w0 : x : y2 [ k01 fx : jyj [ cg and hence (3.58) is equivalent to

R

Nðy=0; 1Þdy ¼ a ) c ¼ sa=2

jyj [ c

Then we obtain LMPU test for H : l ¼ l0 against l 6¼ l0 .

A test which is locally most powerful and locally unbiased is called a Type A

test and corresponding critical region ‘w0 ’ is said to be Type-A critical region

Let p x=h ; h 2 H : Real parameter family of distributions.

Testing problem: H : h ¼ h0 against K : h 6¼ h0 :

T ð X Þ ¼ T : Test statistic.

(i) Right tail test based on T is UMP for H : h ¼ h0 against h [ h0 (in most of the

cases)

(ii) Left tail test based on T is UMP for H : h ¼ h0 against h\h0 (in most of the

cases)

P P

[As for example N ðl; 1Þ; N ð0; r2 Þ …. T ¼ xi ; T ¼ x2i etc. and for

Bðn; pÞ; T ¼ x etc:]

There not exist a single test which is UMP for H : h ¼ h0 against h 6¼ h0 :

does

If p x=h has monotone likelihood ratio in T ð X Þ; i.e.

98 3 Theory of Testing of Hypothesis

x=

h1

" T ð xÞ for h1 [ h0 ; then (i) and (ii) are satisﬁed.

p

p x=h

0

In that case, we try to choose a test w0 for which

(i) Pw0 ðh0 Þ ¼ a

(ii) Pw0 ðhÞ a8h 6¼ h0

(iii) Pw0 ðhÞ Pw ðhÞ8h 6¼ h0 8w satisfying (i) and (ii)

Such a test

Let p x=h be such that, for every test w;

d

dh ½Pw ðhÞ exists;

and

Z Z Z

d d dpðx=hÞ

½Pw ðhÞ ¼ pðx=hÞdx ¼ dx ¼ p0 ðx=hÞdx ð3:61Þ

dh dh dh

w w w

d

w) Pw ðhÞ ¼ 0 ð3:62Þ

dh0

Thus, if a test w0 satisﬁes (i), (ii) and (iii); under (3.61), w0 also satisﬁes (i) and (iii).

Test satisfying (i), (iii) and (3.62) is called type-A1 test.

For exponential distribution, if type-A1 test exists, then it must be unbiased. But

this is not true in general.

Construction Our problem is to get w0 such that

R

(i) p x=h0 dx ¼ a;

R 0

w0

(ii) p x=h0 dx ¼ 0

R

w0

R

(iii) p x=h dx p x=h dx 8w satisfying (i) and (ii) and 8h 6¼ h0

w0 w

c1 ¼ 1; c2 ¼ 0. n

o

Then, deﬁne w0 ¼ x : p x=h [ k1 p x=h0 þ k2 p0 x=h0 and hence

R R

p x=h dx p x=h dx8w satisfying (i) and (ii) and 8h 6¼ h0 :

w0 w

For exponential distribution, it is always possible to have such region w0 (which

means type-A1 test exists).

Example 3.19 X1; X2; . . .; Xn are i.i.d. Nðl; 1Þ. We test H : l ¼ l0 against

K : l 6¼ l0

3.5 Type A1 (Uniformly Most Powerful Unbiased) Test 99

1

P

n

ðxi lÞ2

pðx=hÞ ¼ ð2pÞn=2 e

2

P

n

X

n 1

ðxi lÞ2 X

n

p ðx=hÞ ¼ ð2pÞn=2

0 2

ðxi lÞe i ¼ ðxi lÞpðx=hÞ

i¼1 i

1

P 2

e 2 ðxi lÞ

n 2

pðx=hÞ e 2 ðxlÞ

P ¼ e2ðll0 Þf2xðl0 þ lÞg

n

¼ ¼

pðx=h0 Þ e12 ðxi l0 Þ2 en2 ðxl0 Þ2

pﬃﬃﬃ

) w0 ¼ fx : eðll0 Þt [ k01 þ k02 tg where t ¼ nðx l0 Þ

¼ fx : edi [ k01 þ k02 tg

Z Z

x

pð =h0 Þdx ¼ a; nðx lÞpðx=h0 Þdx ¼ 0

w0 w0

Z

, Nðt=0; 1Þdt ¼ a ð3:63Þ

wo

Z

, tNðt=0; 1Þdt ¼ 0 ð3:64Þ

w0

Writing gðtÞ ¼ edt k01 k02 t we have g0 ðtÞ ¼ dedt k02 and g00 ðtÞ ¼ d2 edt [ 08t

) y ¼ gðtÞ has a single minimum (global minimum).

Now if we take a\0:5, because of (3.63) and since the distribution of t is sym-

metric about 0 under H0 our shape of the curve will be like as shown below. From the

graph, we observe that c1 \c2 ; gðtÞ [ 0 for t\c1 and t [ c2 and gðtÞ

0 otherwise.

y=g(t)

c1 c2 t

100 3 Theory of Testing of Hypothesis

Hence w0 is equivalent

R to wo ¼ fx : t\c1 or t [ c2 g

(3.63) , Nðt=0; 1Þdt ¼ a and (3.64)

t\c1 ;t [ c2

Z

, tNðt=0; 1Þdt ¼ 0 ð3:65Þ

t\c1; t [ c2

w0 ¼ fx : t\ c and t [ cg ð3:66Þ

Z

Nðt=0; 1Þdt ¼ a ) c ¼ sa=2 ð3:67Þ

jt j [ c

Here (3.65) is automatically satisﬁed. Hence test given by (3.66) and (3.67) is

type-A1 (which is UMPU).

Example 3.20

n Pn

1 1 2

pðx=hÞ ¼ pﬃﬃﬃﬃﬃﬃﬃ e2r2 i xi

r 2P

0Pn 1

x2i

B i C 1

p0 ðx=hÞ ¼ B C

@ r2 nA 2r2 pð =hÞ

x

Thus,

Pn 2

x x i xi 1 x

r20 2r20

( Pn 2

pðx= Þ x

¼ x : x h [ k01 þ k02 tg; t ¼ i 2 i

pð =ho Þ r0

P

n

x2

pðx= Þ r

n i

i r2

ð1 02 Þ

As x h ¼

0 2r2 r

e 0

pð =h0 Þ r

3.5 Type A1 (Uniformly Most Powerful Unbiased) Test 101

n r

n d o

e2t [ k01 þ k02 t

0

w0 ¼ x :

r

n dt

Now as before the curve y ¼ gðtÞ ¼ rr0 e 2 k01 k02 t has a single minimum.

n o

Here P T [ 0=h ¼ 1

) Shape of the curve gðtÞ will be as shown below

y = g(t)

d1 d2 t

such that w0 is equivalent to w0 ¼ fx : t\d1 or t [ d2 g

Here d1 and d2 are such that

Z

Zd2

p x=h0 dx ¼ a , fv2n ðtÞdt ¼ 1 a ð3:68Þ

w0 d1

and

Z

Z

p0 x=h0 dx ¼ 0 , ðt nÞfv2n ðtÞdt ¼ 0 ð3:69Þ

w0 t\d1 or t [ d2

R R

(3.69) ⇔ tfv2n ðtÞdt ¼ n fv2n ðtÞdt ¼ na, by (3.68)

t\d1 or t [ d2 t\d1 or d2

Zd2

, tfvn2 ðtÞdt ¼ ð1 aÞn

d1

Zd2

, fvn2þ 2 ðtÞdt ¼ ð1 aÞ ð3:70Þ

d1

x0 ¼ fx : t\d1 or t [ d2 g such that

P d1\ vn2 \d2 ¼ 1 a and P d1 \vx2þ 2 \d2 ¼ 1 a:

102 3 Theory of Testing of Hypothesis

Example 3.21 X1 ; X2 ; . . .; Xn are i.i.d with p x=h ¼ hehx . Find Type-A and Type-

A1 test for H : h ¼ h0 against h 6¼ h0 .

Answer proceed as Examples 3.19 and 3.20 and hence get

( )

X

n X

n

x0 ¼ x: Xi \c1 or Xi [ c2

1 1

c1 c2

P 1\v2n \

2

1 ¼ 1 a and

2h0 2h0

c1 c2

P 1\v2n þ 2 \

2

1 ¼ 1 a:

2h0 2h0

Chapter 4

Likelihood Ratio Test

4.1 Introduction

In the previous chapter we have seen that UMP or UMP-unbiased tests exist only

for some special families of distributions, while they do not exist for other families.

Further, computations of UMP-unbiased tests in K-parameter family of distribution

are usually complex. Neyman and Pearson (1928) suggested a simple method for

testing a general testing problem.

Consider X pðxjhÞ, where h is a real parameter or a vector of parameters,

h 2 H:

A general testing problem is

H : h 2 H0 Against K : h 2 H1 :

Here, H and K may be treated as the subsets of H. These are such that H \ K ¼

/ and H [ K H. Given that X ¼ x; pðxjhÞ is a function of h and is called like-

lihood function. Likelihood test for H against K is provided by the statistic

Sup pðxjhÞ

h2H

L ð xÞ ¼ ;

Sup pðxjhÞ

h2H[K

which is called the likelihood ratio criterion for testing H against K. It is known that

(i) pðxjhÞ 08h

(ii) Sup pðxjhÞ Sup pðxjhÞ.

h2H h2H[K

that the observation X comes from some population under H and the denominator

P.K. Sahu et al., Estimation and Inferential Statistics,

DOI 10.1007/978-81-322-2514-0_4

104 4 Likelihood Ratio Test

measures the best explanation of X to come from some population covered under

H [ K. Higher values of the numerator correspond to the better explanation of

X given by H compared to the overall best possible explanation of X, which results

in larger values in Lð xÞ leading to acceptance of H. That is, Lð xÞ would be larger

under H than under K. Indeed, smaller values of Lð xÞ will lead to the rejection of

H. Hence, our test procedure is:

Reject H iff Lð xÞ\C

and accept H otherwise,

where C is such that PfLð xÞ\CjH g ¼ a 2 ð0; 1Þ:

If the distribution of Lð xÞ is continuous, then the size a is exactly attained and no

randomization on the boundary is needed. If the distribution is discrete, the size

may not attain a and one may require randomization. In this case, we have C from

the relation

PfLð xÞ\CjH g a:

accept H if Lð xÞ [ C;

and reject with probability ‘a’ iff Lð xÞ ¼ C.

Thus, we have PfLð xÞ\CjH g þ aPfLð xÞ ¼ CjH g ¼ a.

The likelihood ratio tests are useful, especially when h is a vector of parameters

and the testing involves only some of them. This test criterion is very popular

because of its computational simplicity. Moreover, this criterion proves to be a

powerful alternative for testing vector valued parameters that involve nuisance

parameters. Generally, the likelihood ratio tests result in optimal tests, whenever

they exist. An LR test is generally UMP, if an UMP test at all exists. In many cases

the LR tests are unbiased, although this is not universally true. However, it is

difﬁcult to compute the exact null distribution of the test statistic Lð xÞ in many

cases. Therefore, a study of large sample properties of Lð xÞ becomes necessary

where maximum likelihood estimators follow normal distribution under certain

regularity conditions. We mention the following large sample property of the

likelihood ratio test statistic without proof.

Under H, the asymptotic distribution of 2 loge Lð xÞ is distributed as v2 with

degrees of freedom equal to the difference between the number of independent

parameters in H and the number in H0 .

Drawback: Likelihood ratio test is constructed completely by intuitive argu-

ment. So, it may not satisfy all the properties that are satisﬁed by a test obtained

from N–P theory; it also may not be unbiased.

Example 4.1 Let X be a binomial bðn; hÞ random variable. Find the size-a likeli-

hood ratio test for testing H : h h0 against K : h [ h0

4.1 Introduction 105

The likelihood ratio test statistic is given as

h2H h h0

L ð xÞ ¼ ¼

Sup pðxjhÞ Sup pðxjhÞ

H[K H

n x

Sup h ð1 hÞnx

h h0 x

¼

n x

Sup h ð1 hÞnx

H x

_

The MLE of h for h 2 H is h ¼ nx.

For, h 2 H0 , we have

_ x x

hH ¼ if h0

n n

x

¼ h0 if [ h0

n

Thus, we have

n n x x

nx x nx

Sup h ð 1 hÞ ¼

x

1

H x x n n

8

> n nx

>

> x x

1 nx if x

h0

n x < x n n

nx

Sup h ð 1 hÞ ¼

h h0 x >

> n x

>

: h ð1 h0 Þnx if x

[ h0

x 0 n

(

1 if x

n h0

So, Lð xÞ ¼ hx0 ð1h0 Þnx

if x

[ h0

ðnxÞ ð1nxÞ

x nx

n

shows that Lð xÞ is the decreasing function of x. Thus, Lð xÞ\C iff x [ C 0 and the

likelihood ratio test rejects H0 if x [ C 0 where C0 is obtained as

Ph0 ðX [ C 0 Þ ¼ a. Since X is a discrete random variable, C 0 is obtained such that

mean l and variance r2 . Find out the likelihood ratio test of

(a) H : l ¼ l0 aganist K : l 6¼ l0 when r2 is known.

(b) H : l ¼ l0 aganist K : l 6¼ l0 when r2 is unknown.

106 4 Likelihood Ratio Test

Solution

(a) Here, H0 ¼ fl0 g; H ¼ fl : 1\l\1g

P

n

n=2 2r2 1

ðXi lÞ2

pðxjhÞ ¼ pðxjlÞ ¼ ð2pÞn=2 r2 e i¼1

H H0

Lð xÞ ¼ ¼

Sup pðxjlÞ Sup pðxjlÞ

H[K H

P

n

1 ðXi l0 Þ2

n=2 2 n=2 2r2 i¼1

So, Sup pðxjlÞ ¼ ð2pÞ ðr Þ e

H0

P

n

1

ðXi xÞ2

n=2

and Sup pðxjlÞ ¼ ð2pÞn=2 ðr2 Þ

2r2

e i¼1 :

H

This gives

Pn

ðXi l0 Þ2

e2r2

1

i¼1 2

¼ e2r2 ðxl0 Þ

n

L ð xÞ ¼ Pn

1

ðXi xÞ2

e 2r2 i¼1

n

2 ðx l0 Þ2 \C1

2r

pﬃﬃﬃ

nðx l0 Þ

or; [ C2

r

( pﬃﬃ

1 if nðxrl0 Þ [ C2

/ðxÞ ¼

0 Otherwise

El0 ½/ðxÞ ¼ a

pﬃﬃﬃ

nðx l0 Þ

or; P l0

r [ C2 ¼ a

4.1 Introduction 107

pﬃﬃ

Now, under H : l ¼ l0 , the statistic nðxrl0 Þ follows N ð0; 1Þ distribution. Since

the distribution is symmetrical about 0, C2 must be the upper a=2—point of the

distribution. Finally, the test is given as

( pﬃﬃ

1 if nðxrl0 Þ [ sa=2

/ðxÞ ¼

0 Otherwise

(b) Here, H0 ¼ ðl; r2 Þ : l ¼ l0 ; r2 [ 0

H¼ l; r2 : 1\l\1; r2 [ 0

In this case,

Pn

1 n=2 2r2 ðXi lÞ

1 2

Sup p xjl; r 2

¼ Sup e i¼1

H0 H0 2pr2

1X n

s20 ¼ ð X i l0 Þ 2 :

n i¼1

n=2

e 2 .

n

This gives Sup pðxjl0 ; r2 Þ ¼ 1

2ps02

H0

n=2 Pn

ðXi lÞ2

e2r2

1

Further, Sup pðxjl; r2 Þ ¼ Sup pðxjl; r2 Þ ¼ Sup p 1

2pr2

i¼1

H l;r2 l;r2

Pn

The MLE of l and r are given as x and2 1

i¼1

Þ2 ¼ ðn1Þ s2 ; where

ðXi X

Pn n n

s ¼ n1 2

i¼1 ðXi X Þ . We have,

2 1

!n=2

1

e 2

n

Sup p xjl; r2 ¼

l;r2 2p ðn1Þ 2

n s

n2

e2

1 n n=2

2ps20 ðn 1Þs2

Lð xÞ ¼ n2 ¼

ns20

e2

1 n

ðn1Þ 2

2p n s

!n=2 " #n=2

ðn 1Þs2 ðn 1Þs2

¼ Pn ¼ Pn

i¼1 ðX i l 0 Þ2 nðx l0 Þ þ

2 2

i¼1 ðXi X Þ

" #n=2

ðn 1Þs2

¼

nðx l0 Þ2 þ ðn 1Þs2

108 4 Likelihood Ratio Test

Lð xÞ\C

nðx l0 Þ2

or; 2

[ C1

pﬃﬃﬃ s

nðx l0 Þ

or; [ C2 :

s

( pﬃﬃ

1 if nðxsl0 Þ

/ðx; sÞ ¼ [ C2 ;

0 otherwise

pﬃﬃﬃ

nðx l0 Þ

Pl0 [ C2 ¼ a:

s

pﬃﬃ

Now under H : l ¼ l0 , the statistic nðxsl0 Þ is distributed as, t with (n − 1) d.f

which is symmetric about 0. Hence, C2 ¼ ta2;n1 . Finally, the test is given as

( pﬃﬃ

1 if nðxsl0 Þ

/ðx; sÞ ¼ [ ta2;n1

0 otherwise

Example 4.3 X1 ; X2 . . .Xn are i.i.d N ðl; r2 Þ. Derive LR test for H : l l0 against

K : l [ l0 .

Answer h ¼ ðl; r2 Þ; H ¼ h : 1\l\1; r2 [ 0

P

n

n 2r2

1

ðXi lÞ2

Here, pðx=hÞ ¼ p x l; r2 ¼ ð2pÞ2 ðr2 Þ 2 e

n

1

Sup x l; r2 Sup x l; r2

H l l0

Lð xÞ ¼ ¼ ½H [ K ¼ H ¼ ðl l0 Þ[ðl [ l0 Þ

Supðx=l; r2 Þ Supðx=l; r2 Þ

H[K H

p x l ^2H

^H ; r

¼

pðx=^ ^2 Þ

l; r

where l ^2H : MLE ðl; r2 Þ under H

^H ; r

^; r

ðl ^2 Þ: MLE of ðl; r2 Þ under H [ K (in the unrestricted case).

4.1 Introduction 109

_ _2 1Xn

ð n 1Þ 2

l ¼ x; r ¼ ðxi xÞ2 ¼ s

n 1 n

_ n1 2

lH ¼ x if x l0 and r

^2H ¼ s if x l0

n

1X n

¼ l0 if x [ l0 and r

^2 H ¼ s20 ¼ ðxi l0 Þ2 if x [ l0 :

n 1

. n n

_ 2 2 n

^2 ¼ ð2pÞ 2 n1

Thus, we have p x l; r n s e2

. n2

_ n2 n 1 2 n

^ H ¼ ð2pÞ

p x lH ; r 2

s e 2 if x l0

n

n n n

¼ ð2pÞ2 s20 2 e2 if x [ l0

So, Lð xÞ ¼ 1 if x l0

n1 2 n=2

n s

¼ ; if x [ l0

s20

n1 2 n=2

s

, n2 \C and x [ l0

s0

" #n=2

ðn 1Þs2

, \C and x [ l0

ðn 1Þs2 þ nðx l0 Þ2

nðx l0 Þ2

, 1þ [ C 0 and x [ l0

ðn 1Þs2

pﬃﬃﬃ

nðx l0 Þ

, [ C00 ð4:2Þ

s

npﬃﬃ 00

o

Thus, (4.1) is , P nðxsl0 Þ [ C =l ¼ l0 ¼ a

110 4 Likelihood Ratio Test

8 9

>

> p ﬃﬃ

ﬃ >

>

< nðx l Þ 00

=

, P rP C

ﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ [ =l ¼ l

0

¼a

>

> n 0>

>

: ðx xÞ2

1 i ;

n1

00

) C ¼ ta;n1

pﬃﬃ

Hence, reject H iff nðxsl0 Þ [ ta;n1

) Test can be carried out by using students’ t-statistic.

Example 4.4 X1 ; X2 . . .Xn are i.i.d N ðl; r2 Þ; 1\l\1; r2 [ 0: Find the LR test

for

II. H : r2 ¼ r20 against K : r2 6¼ r20

Answer I. h ¼ ðl; r2 Þ

Pn

n 12 ðxi lÞ2

p x=l; r2 ¼ Likelihood function ¼ ð2pÞn=2 r2

=2 2r

e i¼1

¼ ð2pÞn=2 r2 e 2r2

Supl;r2 ¼r20 p x l; r2 p x l^H ; r20

Lð xÞ ¼ ¼ ;

Supl;r2 r20 pðx=l; r2 Þ ^2 Þ

l; r

pðx=^

where l

n1

s2 ; if n1

n s r0

2 2

^ ¼ MLE of r ¼

r 2 2 n

r0 ; if n s \r0

2 n1 2 2

^H = MLE of l under 8

l H = x

>

>

ðn1ÞS2

< ðr20 Þn=2 e 2r20

Hence we get, Lð xÞ ¼ Þ 2 n=2 n

; if n1 2

n s r20

>

> ððn1

n s Þ e 2

: 1; if n1 s2 \r2

n 0

4.1 Introduction 111

Lð xÞ\C: ð4:3Þ

PfLð xÞ\C=H g ¼ a 2 ð0; 1Þ ð4:4Þ

n1 2 n2 ðn1Þs2

s 2 ðn 1Þs2

ð 1Þ , n 2 e 2r0 \C 0 iff n

r0 r20

, u2 e2 \C

n u

if u n ð4:5Þ

n u

n

n n u2 u

g0 ðuÞ ¼ u21 e2 e2 ¼ 0

u

2 2

)nu¼0,u¼n

The curve y = g(u) has a maximum and the shape of the curve is

g(u)=c'

u-1 u= n u0

0\u1 \n\u0 :

Hence, (4.5) , u [ u0 and (4.4) , PH ðU [ u0 Þ ¼ a

2 As under H;

, P vn1 [ u0 ¼ a

U v2n1

) u0 ¼ v2n1;a

P

Thus, LR test is: reject H if ni¼1 ðxi xÞ2 [ r20 v2n1;a :

ðn1Þs2

Supl;r2 ¼r2 pðx=l;r2 Þ ðr2 Þn=2 e 2r20

¼ K u2 e 2

n u

II. Lð xÞ ¼ Sup 0pðx=l;r2 Þ ¼ n1 0

n=2

l;r2 ð n s2 Þ en=2 n n U o

Our test is to reject H if u2 e2 \C0 ; where C 0 is such that PH U 2 e 2 \C0 ¼ a:

n u

From the graph, we observe that the line y = C′ cuts the curve y = g(u) at two

points: u1 and u0 such that 0\ u1 \n\ u0 . Hence, the test is equivalent to reject

H iff u\u1 or u [ u0 ; where

PH v2n1 \u1 þ PH v2n1 [ u0 ¼ a:

112 4 Likelihood Ratio Test

Although v2 is not symmetric, for the matter of simplicity, equal error proba-

bilities a=2 are attached to both left-and right-sided critical regions. Thus, u1 ¼

v2n1;1a and u0 ¼ v2n1;a .

2 2

Example

4.5 Let X1 ; X2 ; . . .Xn1 and Y1 ; Y2 ; . . .Yn2 be two independent samples drawn

from N l1 ; r21 and N l2 ; r22 ; respectively. Find out the likelihood ratio test of

(a) H : r21 ¼ r22 against K : r21 [ r22

(b) H : r21 ¼ r22 against K : r21 6¼ r22 ; when l1 and l2 are unknown

(a) Here, h ¼ l1 ; l2 ; r21 ; r22

H0 ¼ l1 ; l2 ; r21 ; r22 : 1\l1 ; l2 \1; r21 ¼ r22 ¼ r2 [ 0

H ¼ l1 ; l2 ; r21 ; r22 : 1\li \1; r2i [ 0; i ¼ 1; 2

P

n1

2

P

n2

2

n1 þ n2

1

1

ðXi l1 Þ ðyi l2 Þ

pðx; yjhÞ ¼ ð2pÞ rn 1 n2 2r2 2r2

1 r2 e

2 1 i¼1 2 i¼1

_

Sup pðx; yjhÞ

p x; y hH

h2H

Lðx; yÞ ¼ ¼ _

Sup pðx; yjhÞ

h2H[K

p x; y h

_

where hH = MLE of h under H

_

h ¼ MLE of h under H [ K:

P P

_ _ _2 ðxi xÞ2 þ ðyi yÞ2

l1H ¼ x; l2H ¼ y; rH ¼

n 1 þ n2

8 P

>

>

_ _ _2

l1 ¼ x; l2 ¼ y; r1 ¼ n11 ðxi xÞ2

>

>

>

< _2 P _2

r

r2 ¼ n12 ðyi yÞ2 if _12 1

> r2

>

>

>

> _ _ _2

_2

r1

: l1 ¼ x; l2 ¼ y; rH if _2 \1

r2

_

n1 þ n2 n1 þ2 n2 n þ n

_ 2

Hence, p x; yjhH ¼ ð2pÞ 2 rH e 2

1 2

8

n þn n21 2 n22 n þ n

>

> ð Þ 1 2 2 _2

r

_

r2 e 2

1 2

_2

r1

1

_

< 2p 1 if _2

r2

and p x; yjh ¼ n1 þ n2 2 1 2 2

n þn

>

> n1 þ n2 _2

r1

: ð2pÞ 2 r _

H e 2 if _2

\1

r2

4.1 Introduction 113

8

> _2 n21 _2 n22

>

> r1 r2 _2

r1

>

< n1 þ n2 if 1

_2

_2 r2

Therefore, Lðx; yÞ ¼

2

rH

>

>

>

>

_2

r1

: 1 if _2

\1

r2

8 nP on1=2 nP on2=2

>

> ðxi xÞ2 ðyi yÞ2 P

>

> ðxi xÞ2=

>

>

n1 n2

P n1 1

> if

< nP

> 2

P

ðxi xÞ þ

2 oðn1 þ n2 Þ=2

ðyi yÞ ðyi yÞ2=

¼ n1 þ n2

n2

>

> P

>

> ðxi xÞ2=

>

> n1 \1

>

> P

:1 if

ðyi yÞ2=

n2

8

> P n1=2

>

> 2

P

>

> P ðxi xÞ2 ðxi xÞ2=

>

>

n1 þ n2

ðyi yÞ

> ðn1 þ n2 Þ 2

> if P n1 1

>

< n 1= n 2=

n 2 n 2 P n1 þ2 n2 ðyi yÞ2=

ðxi xÞ2

¼ 1þP n2

1 2

>

> ðyi yÞ2

P

>

>

>

> ðxi xÞ2=

>

> n1 \1

>

> 1 if P

: ðyi yÞ2=

n2

Now, under the null hypothesis H : r21 ¼ r22 ¼ r2 ; consider the statistic

P .

ðxi xÞ2

ðn1 1Þ s21

F¼P 2.

¼ 2 Fn1 1;n2 1 :

ðyi yÞ s2

ð n2 1Þ

8

> n1=2

>

>

>

> ðn þ n Þ=2 n1 1

F

< ð n1 þ n2 Þ 1 2

n2 1

if F n1 ðn2 1Þ

Lðx; yÞ ¼ n 1=2 n 2=2 ðn1 þ n2 Þ=2 n2 ðn1 1Þ

n1 n2

>

> 1 þ

n1 1

F

>

>

n2 1

>

:1 if F \ nn12 ððnn21 1Þ

1Þ

114 4 Likelihood Ratio Test

n1 =2

n1 1

n2 1 F n1 ð n2 1Þ

ð4:6Þ ) ðn1 þ n2 Þ=2 \C1 if F ;

n1 1

n2 ð n1 1Þ

1þ n2 1 F

2 n1 =2 3

n1 1

6 n2 1F n1 ðn2 1Þ7

where C1 is such that P4 ðn1 þ n2 Þ=2 \C1 and F n2 ðn1 1Þ5 ¼ a:

n 1

1 þ n1 1F

n1=2

2

n1 1

n2 1F

Writing gðF Þ ¼ ðn1 þ n2 Þ=2

n 1

1 þ n1 1F

2

0 n1 n1 1 n1 1 n1 1 n1 1 n1 þ n2

) g ðF Þ ¼ F 1þ F F

2 n2 1 n2 1 n2 1 n2 1 2

n1 þ2 n2 1

n1 1 n1 1

1þ F ¼0

n2 1 n2 1

n1 1 n1 1

) n1 1 þ F ðn 1 þ n 2 Þ F¼0

n2 1 n2 1

n1 ðn2 1Þ

)F¼

n2 ðn1 1Þ

1Þ and the shape of the curve is

1Þ

Þ

) F [ d0 [ nn12 ððnn12 1 Þ

1Þ . The constant d0 is obtained by the size condition

Pr21 ¼r22 fF [ d0 g ¼ a

4.1 Introduction 115

(

s21

1 if [ Fn1 1;n2 1;a

) LR test is given by /ðx; yÞ ¼ s22

0 otherwise

(b) Similarly, for testing H : r21 ¼ r22 against K : r21 6¼ r22 the LR test is equivalent

to

s21 s21

\d 1 or [ d0 :

s22 s22

PH fF\d 1g ¼ PH fF [ d0 g ¼ a=2

This gives d 1 ¼ Fn1 1;n2 1;1a2 and d0 ¼ Fn1 1;n2 1;a=2 : The LR test is, there-

fore, given as

8

>

>

s21

\Fn1 1;n2 1;1a=2

<1 if s22

/ðx; yÞ ¼ s21

[ Fn1 1;n2 1;a=2

>

>

or s22

:

0 otherwise

Example Let $X_1,X_2,\ldots,X_{n_1}$ and $Y_1,Y_2,\ldots,Y_{n_2}$ be two independent samples drawn from $N(\mu_1,\sigma_1^2)$ and $N(\mu_2,\sigma_2^2)$, respectively. Obtain the likelihood ratio test of

(a) $H:\mu_1=\mu_2$ against $K:\mu_1\neq\mu_2$ when $\sigma_1^2$ and $\sigma_2^2$ are known
(b) $H:\mu_1=\mu_2$ against $K:\mu_1\neq\mu_2$ when $\sigma_1^2=\sigma_2^2=\sigma^2$ but unknown
(c) $H:\mu_1\ge\mu_2$ against $K:\mu_1<\mu_2$ when $\sigma_1^2=\sigma_2^2=\sigma^2$ but unknown

Solution

(a) Here $\theta=(\mu_1,\mu_2)$ with $\sigma_1^2,\sigma_2^2$ known, and

$$\Theta=\{(\mu_1,\mu_2):-\infty<\mu_i<\infty,\ i=1,2\},$$

$$p(x,y|\theta)=(2\pi)^{-\frac{n_1+n_2}{2}}\sigma_1^{-n_1}\sigma_2^{-n_2}\exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^{n_1}(x_i-\mu_1)^2-\frac{1}{2\sigma_2^2}\sum_{i=1}^{n_2}(y_i-\mu_2)^2\right].$$

The MLEs are $\hat{\mu}_1=\bar{x}$ and $\hat{\mu}_2=\bar{y}$, so that

$$\operatorname{Sup}_{\theta\in\Theta}p(x,y|\theta)=(2\pi)^{-\frac{n_1+n_2}{2}}\sigma_1^{-n_1}\sigma_2^{-n_2}\exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^{n_1}(x_i-\bar{x})^2-\frac{1}{2\sigma_2^2}\sum_{i=1}^{n_2}(y_i-\bar{y})^2\right].$$

Under $H_0$, $\mu_1=\mu_2=\mu$ and

$$p(x,y|\mu)=(2\pi)^{-\frac{n_1+n_2}{2}}\sigma_1^{-n_1}\sigma_2^{-n_2}\exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^{n_1}(x_i-\mu)^2-\frac{1}{2\sigma_2^2}\sum_{i=1}^{n_2}(y_i-\mu)^2\right],$$

$$\log p(x,y|\mu)=k-\frac{1}{2\sigma_1^2}\sum(x_i-\mu)^2-\frac{1}{2\sigma_2^2}\sum(y_i-\mu)^2,$$

where $k$ does not involve $\mu$. The ML estimator of $\mu$ is obtained from $\frac{d}{d\mu}\log p(x,y|\mu)=0$:

$$\frac{1}{\sigma_1^2}\sum_{1}^{n_1}(x_i-\mu)+\frac{1}{\sigma_2^2}\sum_{1}^{n_2}(y_i-\mu)=0
\;\Rightarrow\;\frac{n_1\bar{x}}{\sigma_1^2}+\frac{n_2\bar{y}}{\sigma_2^2}=\left(\frac{n_1}{\sigma_1^2}+\frac{n_2}{\sigma_2^2}\right)\mu$$

$$\Rightarrow\;\hat{\mu}_H=\frac{\dfrac{n_1\bar{x}}{\sigma_1^2}+\dfrac{n_2\bar{y}}{\sigma_2^2}}{\dfrac{n_1}{\sigma_1^2}+\dfrac{n_2}{\sigma_2^2}}.$$

This gives

$$\operatorname{Sup}_{H_0}p(x,y|\theta)=(2\pi)^{-\frac{n_1+n_2}{2}}\sigma_1^{-n_1}\sigma_2^{-n_2}\exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^{n_1}(x_i-\hat{\mu}_H)^2-\frac{1}{2\sigma_2^2}\sum_{i=1}^{n_2}(y_i-\hat{\mu}_H)^2\right],$$

$$L(x,y)=\frac{\exp\left[-\frac{1}{2\sigma_1^2}\left\{\sum(x_i-\bar{x})^2+n_1(\bar{x}-\hat{\mu}_H)^2\right\}-\frac{1}{2\sigma_2^2}\left\{\sum(y_i-\bar{y})^2+n_2(\bar{y}-\hat{\mu}_H)^2\right\}\right]}{\exp\left[-\frac{1}{2\sigma_1^2}\sum(x_i-\bar{x})^2-\frac{1}{2\sigma_2^2}\sum(y_i-\bar{y})^2\right]}$$

$$=\exp\left[-\frac{n_1}{2\sigma_1^2}(\bar{x}-\hat{\mu}_H)^2-\frac{n_2}{2\sigma_2^2}(\bar{y}-\hat{\mu}_H)^2\right].$$

Now,

$$\bar{x}-\hat{\mu}_H=\frac{\dfrac{n_2}{\sigma_2^2}(\bar{x}-\bar{y})}{\dfrac{n_1}{\sigma_1^2}+\dfrac{n_2}{\sigma_2^2}},\qquad
\bar{y}-\hat{\mu}_H=\frac{\dfrac{n_1}{\sigma_1^2}(\bar{y}-\bar{x})}{\dfrac{n_1}{\sigma_1^2}+\dfrac{n_2}{\sigma_2^2}}$$

$$\Rightarrow\;\frac{n_1}{\sigma_1^2}(\bar{x}-\hat{\mu}_H)^2+\frac{n_2}{\sigma_2^2}(\bar{y}-\hat{\mu}_H)^2=\frac{(\bar{x}-\bar{y})^2}{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}.$$

Thus, $L(x,y)=\exp\left[-\dfrac{1}{2}\cdot\dfrac{(\bar{x}-\bar{y})^2}{\sigma_1^2/n_1+\sigma_2^2/n_2}\right]$, and the rejection region $L(x,y)<C$ is equivalent to

$$\frac{(\bar{x}-\bar{y})^2}{\sigma_1^2/n_1+\sigma_2^2/n_2}>C_2,\quad\text{or, } \frac{|\bar{x}-\bar{y}|}{\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}}>C_3.$$

Under $H_0$, $\dfrac{\bar{x}-\bar{y}}{\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}}\sim N(0,1)$. Hence the likelihood ratio critical region is

$$\omega=\left\{(x,y):\frac{|\bar{x}-\bar{y}|}{\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}}>C_3\right\},\qquad
P_H\left[\frac{|\bar{x}-\bar{y}|}{\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}}>C_3\right]=\alpha\;\Rightarrow\;C_3=\tau_{\alpha/2},$$

$\tau_\alpha$ being the upper $\alpha$-point of $N(0,1)$. The LR test is thus

$$\phi(x,y)=\begin{cases}1 & \text{if } \dfrac{|\bar{x}-\bar{y}|}{\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}}>\tau_{\alpha/2}\\ 0 & \text{otherwise}\end{cases}$$
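For completeness, a minimal Python sketch of the known-variance test in (a); the helper name two_sample_z_test is ours:

    import numpy as np
    from scipy import stats

    def two_sample_z_test(x, y, var1, var2, alpha=0.05):
        # |xbar - ybar| / sqrt(var1/n1 + var2/n2) compared with tau_{alpha/2}
        z = (np.mean(x) - np.mean(y)) / np.sqrt(var1 / len(x) + var2 / len(y))
        tau = stats.norm.ppf(1 - alpha / 2)   # upper alpha/2 point of N(0,1)
        return z, abs(z) > tau                # statistic, reject H?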

(b) Here $\theta=(\mu_1,\mu_2,\sigma^2)$,

$$H_0=\left\{(\mu,\sigma^2):-\infty<\mu<\infty,\ \sigma^2>0\right\},\qquad
\Theta=\left\{(\mu_1,\mu_2,\sigma^2):-\infty<\mu_i<\infty,\ \sigma^2>0,\ i=1,2\right\}.$$

For $(\mu_1,\mu_2,\sigma^2)\in\Theta$,

$$p(x,y|\mu_1,\mu_2,\sigma^2)=\left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n_1+n_2}{2}}\exp\left[-\frac{1}{2\sigma^2}\left\{\sum_{1}^{n_1}(x_i-\mu_1)^2+\sum_{1}^{n_2}(y_i-\mu_2)^2\right\}\right],$$

$$\log p=-\frac{n_1+n_2}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\left[\sum_{1}^{n_1}(x_i-\mu_1)^2+\sum_{1}^{n_2}(y_i-\mu_2)^2\right],$$

$$\frac{\partial\log p}{\partial\mu_1}=0\Rightarrow\hat{\mu}_1=\bar{x},\qquad
\frac{\partial\log p}{\partial\mu_2}=0\Rightarrow\hat{\mu}_2=\bar{y},\qquad
\frac{\partial\log p}{\partial\sigma^2}=0\Rightarrow\hat{\sigma}^2=\frac{\sum(x_i-\bar{x})^2+\sum(y_i-\bar{y})^2}{n_1+n_2}$$

$$\Rightarrow\;\operatorname{Sup}_{(\mu_1,\mu_2,\sigma^2)\in\Theta}p(x,y|\mu_1,\mu_2,\sigma^2)=\left(\frac{1}{2\pi\hat{\sigma}^2}\right)^{\frac{n_1+n_2}{2}}e^{-\frac{n_1+n_2}{2}}.$$

For $(\mu,\sigma^2)\in H_0$,

$$p(x,y|\mu,\sigma^2)=\left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n_1+n_2}{2}}\exp\left[-\frac{1}{2\sigma^2}\left\{\sum_{1}^{n_1}(x_i-\mu)^2+\sum_{1}^{n_2}(y_i-\mu)^2\right\}\right]$$

$$\Rightarrow\log p=-\frac{n_1+n_2}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\left[\sum_{1}^{n_1}(x_i-\mu)^2+\sum_{1}^{n_2}(y_i-\mu)^2\right].$$

Now,

$$\frac{d\log p}{d\mu}=0\Rightarrow\hat{\mu}_H=\frac{n_1\bar{x}+n_2\bar{y}}{n_1+n_2},$$

$$\frac{d\log p}{d\sigma^2}=0\Rightarrow\hat{\sigma}_H^2=\frac{1}{n_1+n_2}\left[\sum_{1}^{n_1}(x_i-\hat{\mu}_H)^2+\sum_{1}^{n_2}(y_i-\hat{\mu}_H)^2\right]
=\frac{1}{n_1+n_2}\left[\sum_{1}^{n_1}(x_i-\bar{x})^2+n_1(\bar{x}-\hat{\mu}_H)^2+\sum_{1}^{n_2}(y_i-\bar{y})^2+n_2(\bar{y}-\hat{\mu}_H)^2\right].$$

Here,

$$(\bar{x}-\hat{\mu}_H)^2=\left(\bar{x}-\frac{n_1\bar{x}+n_2\bar{y}}{n_1+n_2}\right)^2=\left\{\frac{n_2(\bar{x}-\bar{y})}{n_1+n_2}\right\}^2,\qquad
(\bar{y}-\hat{\mu}_H)^2=\left\{\frac{n_1(\bar{y}-\bar{x})}{n_1+n_2}\right\}^2.$$

This gives

$$\hat{\sigma}_H^2=\frac{1}{n_1+n_2}\left[\sum_{1}^{n_1}(x_i-\bar{x})^2+\sum_{1}^{n_2}(y_i-\bar{y})^2+\frac{n_1n_2}{n_1+n_2}(\bar{x}-\bar{y})^2\right].$$

Therefore, $\operatorname{Sup}_{(\mu,\sigma^2)\in H_0}p(x,y|\mu,\sigma^2)=\left(\dfrac{1}{2\pi\hat{\sigma}_H^2}\right)^{\frac{n_1+n_2}{2}}e^{-\frac{n_1+n_2}{2}}$.

Hence we get

$$L(x,y)=\frac{\operatorname{Sup}_{(\mu,\sigma^2)\in H_0}p(x,y|\mu,\sigma^2)}{\operatorname{Sup}_{(\mu_1,\mu_2,\sigma^2)\in\Theta}p(x,y|\mu_1,\mu_2,\sigma^2)}
=\left(\frac{\hat{\sigma}^2}{\hat{\sigma}_H^2}\right)^{\frac{n_1+n_2}{2}}
=\left[\frac{\sum(x_i-\bar{x})^2+\sum(y_i-\bar{y})^2}{\sum(x_i-\bar{x})^2+\sum(y_i-\bar{y})^2+\frac{n_1n_2}{n_1+n_2}(\bar{x}-\bar{y})^2}\right]^{\frac{n_1+n_2}{2}}$$

$$=\left[\frac{1}{1+\dfrac{(\bar{x}-\bar{y})^2}{\left(\frac{1}{n_1}+\frac{1}{n_2}\right)(n_1+n_2-2)\,s^2}}\right]^{\frac{n_1+n_2}{2}}.$$

We know $\bar{X}\sim N(\mu_1,\sigma^2/n_1)$ and $\bar{Y}\sim N(\mu_2,\sigma^2/n_2)$ independently, so that

$$\bar{X}-\bar{Y}\sim N\left(\mu_1-\mu_2,\ \sigma^2\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right)\right)
\;\Rightarrow\;\frac{(\bar{X}-\bar{Y})-(\mu_1-\mu_2)}{\sigma\sqrt{\tfrac{1}{n_1}+\tfrac{1}{n_2}}}\sim N(0,1).$$

Again, $\frac{1}{\sigma^2}\sum_{1}^{n_1}(X_i-\bar{X})^2\sim\chi^2_{n_1-1}$ and $\frac{1}{\sigma^2}\sum_{1}^{n_2}(Y_i-\bar{Y})^2\sim\chi^2_{n_2-1}$ independently; therefore

$$\frac{1}{\sigma^2}\left[\sum_{1}^{n_1}(X_i-\bar{X})^2+\sum_{1}^{n_2}(Y_i-\bar{Y})^2\right]\sim\chi^2_{n_1+n_2-2}.$$

Thus, under $H:\mu_1=\mu_2$,

$$t=\frac{(\bar{X}-\bar{Y})\Big/\sigma\sqrt{\tfrac{1}{n_1}+\tfrac{1}{n_2}}}{\sqrt{\dfrac{1}{\sigma^2}\left[\sum(X_i-\bar{X})^2+\sum(Y_i-\bar{Y})^2\right]\Big/(n_1+n_2-2)}}\sim t_{n_1+n_2-2},$$

or,

$$t=\frac{\bar{x}-\bar{y}}{s\sqrt{\tfrac{1}{n_1}+\tfrac{1}{n_2}}},\qquad\text{where }
s^2=\frac{\sum(x_i-\bar{x})^2+\sum(y_i-\bar{y})^2}{n_1+n_2-2}=\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}.$$

So, $L(x,y)=\dfrac{1}{\left(1+\dfrac{t^2}{n_1+n_2-2}\right)^{\frac{n_1+n_2}{2}}}$.

Thus the rejection region $L(x,y)<C$ gives

$$1+\frac{t^2}{n_1+n_2-2}>C_1,\quad\text{or, } t^2>C_2,\quad\text{or, } |t|>C_3,$$

where $C_3$ is obtained from $P_H[|t|>C_3]=\alpha$.

The LR test for (b) is therefore

$$\phi(x,y)=\begin{cases}1 & \text{if } \dfrac{|\bar{x}-\bar{y}|}{s\sqrt{\tfrac{1}{n_1}+\tfrac{1}{n_2}}}>t_{n_1+n_2-2;\,\alpha/2}\\ 0 & \text{otherwise}\end{cases}$$

(c) Proceeding exactly as in (b), for $H:\mu_1\ge\mu_2$ against $K:\mu_1<\mu_2$ the LR test is given as

$$\phi(x,y)=\begin{cases}1 & \text{if } \dfrac{\bar{x}-\bar{y}}{s\sqrt{\tfrac{1}{n_1}+\tfrac{1}{n_2}}}<-t_{n_1+n_2-2;\,\alpha}\\ 0 & \text{otherwise}\end{cases}$$
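The two-sided test in (b) is exactly the classical pooled two-sample t test, so in practice it can be carried out directly; a minimal sketch with illustrative data:

    import numpy as np
    from scipy import stats

    x = np.array([12.1, 11.4, 12.8, 12.2, 11.9])   # illustrative data
    y = np.array([11.2, 10.9, 11.8, 11.1])
    t_stat, p_value = stats.ttest_ind(x, y, equal_var=True)  # pooled t of part (b)
    reject_two_sided = p_value < 0.05
    # For (c) (K: mu1 < mu2), reject when t_stat < -t_{n1+n2-2; alpha}:
    n1, n2 = len(x), len(y)
    reject_one_sided = t_stat < -stats.t.ppf(1 - 0.05, n1 + n2 - 2)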

Example Let $x_{ij}\sim N(\mu_i,\sigma^2)$, $j=1(1)n_i$, $i=1(1)k$, independently, the set-up of one-way classified data. We are to find the LR test for $H:\mu_1=\mu_2=\cdots=\mu_k$ against $K$: the $\mu_i$'s are not all equal.

Answer Here $\theta=(\mu_1,\mu_2,\ldots,\mu_k,\sigma^2)$ and $\Theta=\{\theta:-\infty<\mu_i<\infty,\ i=1(1)k,\ \sigma^2>0\}$. Observe that $H\cup K=\Theta$. The likelihood function is

$$p(x|\theta)=(2\pi)^{-n/2}\sigma^{-n}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\mu_i)^2\right],\qquad n=\sum_{1}^{k}n_i.$$

The likelihood ratio is

$$L(x)=\frac{\operatorname{Sup}_{\theta\in H}p(x|\theta)}{\operatorname{Sup}_{\theta\in\Theta}p(x|\theta)}=\frac{p(x|\hat{\theta}_H)}{p(x|\hat{\theta})}.$$

Now, $\theta\in\Theta$ gives the MLEs $\hat{\mu}_i=\bar{x}_i=\frac{1}{n_i}\sum_{1}^{n_i}x_{ij}$ (as $\frac{\partial\log p(x|\theta)}{\partial\mu_i}=0\Rightarrow\mu_i=\bar{x}_i$) and, from $\frac{\partial\log p(x|\theta)}{\partial\sigma^2}=0$,

$$\hat{\sigma}^2=\frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2=\frac{\text{within S.S.}}{n}=\frac{W}{n}\ \text{(say)}.$$

Under H, the MLEs are

$$\hat{\mu}_H=\frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i}x_{ij}=\frac{1}{n}\sum_{i=1}^{k}n_i\bar{x}_i=\bar{x}\ \text{(say)},$$

$$\hat{\sigma}_H^2=\frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x})^2=\frac{\text{Total S.S.}}{n}=\frac{T}{n}
=\frac{1}{n}\left[W+\sum_{i=1}^{k}n_i(\bar{x}_i-\bar{x})^2\right]=\frac{W+B}{n},$$

where $B=\sum_{1}^{k}n_i(\bar{x}_i-\bar{x})^2=$ Between (means) S.S.

Hence we get $p(x|\hat{\theta})=(2\pi)^{-n/2}(\hat{\sigma})^{-n}e^{-n/2}$ and $p(x|\hat{\theta}_H)=(2\pi)^{-n/2}(\hat{\sigma}_H)^{-n}e^{-n/2}$.

So, $L(x)=\left(\dfrac{\hat{\sigma}^2}{\hat{\sigma}_H^2}\right)^{n/2}$, and therefore we reject $H$ iff $L(x)<c$, where $P_H\{L(x)<c\}=\alpha\in(0,1)$, $0<c<1$:

$$L(x)<c\Leftrightarrow\frac{\hat{\sigma}_H^2}{\hat{\sigma}^2}>c'\Leftrightarrow\frac{W+B}{W}>c'\Leftrightarrow\frac{B}{W}>c''
\Leftrightarrow\frac{B/(k-1)}{W/(n-k)}>c'''.$$

Under H, $\dfrac{B/(k-1)}{W/(n-k)}\sim F(k-1,\,n-k)$, so $c'''=F_{\alpha;(k-1,n-k)}$.

So, our LR test takes $\dfrac{B/(k-1)}{W/(n-k)}>F_{\alpha;(k-1,n-k)}$ as the rejection region of H.

Note It is the same as the ANOVA test.
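Since the LR test coincides with the usual one-way ANOVA F test, it can be computed directly; a minimal sketch with illustrative data:

    import numpy as np
    from scipy import stats

    groups = [np.array([5.1, 4.8, 5.6]),            # illustrative samples
              np.array([6.0, 5.7, 6.3, 5.9]),
              np.array([4.9, 5.2, 5.0])]
    F, p = stats.f_oneway(*groups)                  # B/(k-1) over W/(n-k)
    reject = p < 0.05                               # reject H: mu_1 = ... = mu_k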

Special case: (i) $\mu=\mu_0$ (given), i.e. $H:\mu_1=\mu_2=\cdots=\mu_k=\mu_0$. Then $\hat{\mu}_H=\mu_0$ and

$$L(x)=\left(\frac{\hat{\sigma}^2}{\hat{\sigma}_H^2}\right)^{n/2}
=\left[\frac{\sum_{1}^{k}\sum_{1}^{n_i}(x_{ij}-\bar{x}_i)^2}{\sum_{1}^{k}\sum_{1}^{n_i}(x_{ij}-\mu_0)^2}\right]^{n/2}
=\left[\frac{W}{W+\sum_{1}^{k}n_i(\bar{x}_i-\mu_0)^2}\right]^{n/2}.$$

Proceeding as before, the LR test rejects $H$ iff

$$T=\frac{\sum_{1}^{k}n_i(\bar{x}_i-\mu_0)^2\big/k}{W/(n-k)}>F_{\alpha;(k,n-k)}.$$

(ii) $\sigma=\sigma_0$ (known): here only the $\mu_i$'s are estimated, and

$$L(x)=\frac{\exp\left[-\frac{1}{2\sigma_0^2}\sum_{1}^{k}\sum_{1}^{n_i}(x_{ij}-\bar{x})^2\right]}{\exp\left[-\frac{1}{2\sigma_0^2}\sum_{1}^{k}\sum_{1}^{n_i}(x_{ij}-\bar{x}_i)^2\right]}
=\frac{\exp\left[-\frac{1}{2\sigma_0^2}(B+W)\right]}{\exp\left[-\frac{W}{2\sigma_0^2}\right]}
=\exp\left[-\frac{B}{2\sigma_0^2}\right],$$

so that $L(x)<c\Leftrightarrow\dfrac{B}{\sigma_0^2}>\chi^2_{\alpha,k-1}$, since under $H$, $B/\sigma_0^2\sim\chi^2_{k-1}$. This agrees with the general asymptotic approximation of $-2\log L(x)$ by the $\chi^2$-distribution with d.f. $k-1$.

Note The above hypothesis is equivalent to the homogeneity of $k$ univariate normal populations.

Example 4.8

Suppose $x_{ij}\sim N(\mu_i,\sigma_i^2)$, $j=1(1)n_i$, $i=1(1)k$, independently. Obtain the likelihood ratio test of $H:\sigma_1^2=\sigma_2^2=\cdots=\sigma_k^2$ against $K$: not all the $\sigma_i^2$'s are equal.

Answer $\theta=(\mu_1,\mu_2,\ldots,\mu_k,\sigma_1^2,\sigma_2^2,\ldots,\sigma_k^2)$, $\Theta=\{\theta:-\infty<\mu_i<\infty,\ \sigma_i^2>0,\ i=1(1)k\}$.

The likelihood function is

$$p(x|\theta)=\frac{(2\pi)^{-n/2}}{\prod_{i=1}^{k}(\sigma_i^2)^{n_i/2}}\exp\left[-\sum_{i=1}^{k}\frac{1}{2\sigma_i^2}\sum_{j=1}^{n_i}(x_{ij}-\mu_i)^2\right],\qquad n=\sum_{1}^{k}n_i.$$

Now,

$$\frac{\partial\log p(x|\theta)}{\partial\mu_i}=0\Rightarrow\frac{1}{\sigma_i^2}\sum_{j=1}^{n_i}(x_{ij}-\mu_i)=0\Rightarrow\hat{\mu}_i=\bar{x}_i,$$

and

$$\frac{\partial\log p(x|\theta)}{\partial\sigma_i^2}=0\Rightarrow-\frac{n_i}{2\sigma_i^2}+\frac{1}{2\sigma_i^4}\sum_{j=1}^{n_i}(x_{ij}-\mu_i)^2=0
\Rightarrow\hat{\sigma}_i^2=\frac{1}{n_i}\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2=\frac{n_i-1}{n_i}s_i^2.$$

Hence, for $\theta\in\Theta$ we get

$$\operatorname{Sup}_{\theta\in\Theta}p(x|\theta)=p(x|\hat{\theta})=(2\pi)^{-n/2}\prod_{i=1}^{k}(\hat{\sigma}_i^2)^{-n_i/2}\,e^{-n/2}.$$

Under H ($\sigma_1^2=\cdots=\sigma_k^2=\sigma^2$), $p(x|\theta)$ reduces to

$$p(x|\theta)=(2\pi)^{-n/2}\sigma^{-n}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\mu_i)^2\right],$$

wherefrom we get $\hat{\mu}_{iH}=\bar{x}_i$ and

$$\hat{\sigma}_H^2=\frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2=\frac{1}{n}\sum_{i=1}^{k}(n_i-1)s_i^2.$$

Hence, $\operatorname{Sup}_{\theta\in H}p(x|\theta)=p(x|\hat{\theta}_H)=(2\pi)^{-n/2}(\hat{\sigma}_H^2)^{-n/2}e^{-n/2}$, and the likelihood ratio is

$$L(x)=\frac{(\hat{\sigma}_H^2)^{-n/2}}{\prod_{i=1}^{k}(\hat{\sigma}_i^2)^{-n_i/2}}
=\frac{\prod_{i=1}^{k}\left\{\frac{(n_i-1)s_i^2}{n_i}\right\}^{n_i/2}}{\left[\frac{1}{n}\sum_{i=1}^{k}(n_i-1)s_i^2\right]^{n/2}}.$$

The exact distribution of $L(x)$ is not available; therefore, we can only say something about its asymptotic distribution, i.e. about $-2\log_e L(x)$:

$$-2\log_e L(x)=n\log_e\frac{\sum_{i=1}^{k}(n_i-1)s_i^2}{n}-\sum_{i=1}^{k}n_i\log_e\frac{(n_i-1)s_i^2}{n_i}.$$

It has been suggested by Bartlett (1937) that the above approximation to Chi-square for large $n$ can be improved if we replace the ML estimators of the $\sigma^2$'s by unbiased estimators, i.e. if we replace $n_i$ by $n_i-1$ and $n$ by $n-k$ in the above expression. So,

$$-2\log_e L(x)=(n-k)\log_e\frac{\sum_{i=1}^{k}(n_i-1)s_i^2}{n-k}-\sum_{i=1}^{k}(n_i-1)\log_e s_i^2
=\sum_{i=1}^{k}(n_i-1)\log_e\frac{s^2}{s_i^2},$$

where $s^2=\dfrac{\sum_{i=1}^{k}(n_i-1)s_i^2}{\sum_{i=1}^{k}(n_i-1)}$.

Bartlett has also suggested that the Chi-square approximation will hold good for $n_i$ as low as four or five if the above statistic is divided by $t$, where

$$t=1+\frac{1}{3(k-1)}\left[\sum_{i=1}^{k}\frac{1}{n_i-1}-\frac{1}{\sum_{i=1}^{k}(n_i-1)}\right].$$

Hence, $T=\dfrac{\sum_{i=1}^{k}(n_i-1)\log_e\frac{s^2}{s_i^2}}{t}\sim\chi^2_{k-1}$ approximately.

So, we reject $H$ approximately at level $\alpha$ iff $T>\chi^2_{k-1;\alpha}$. The accuracy of the $\chi^2$ approximation for $T$ is greater than that of $-2\log_e L(x)$.
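Bartlett's corrected statistic $T$ is available directly in SciPy; a minimal sketch with illustrative data:

    import numpy as np
    from scipy import stats

    samples = [np.array([4.2, 5.1, 3.8, 4.9]),      # illustrative samples
               np.array([6.3, 5.9, 7.1, 6.6, 6.0]),
               np.array([5.0, 4.4, 5.6, 4.8])]
    T, p = stats.bartlett(*samples)                 # Bartlett's T, approx chi^2_{k-1}
    reject = p < 0.05                               # reject H: equal variances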

Example Let $x_1,x_2,\ldots,x_n$ be a random sample from $f(x|\mu,\sigma)=\frac{1}{\sigma}e^{-(x-\mu)/\sigma}$ for $x\ge\mu$. Find the LR test for testing

I. $H:\mu=\mu_0$ against $K:\mu\neq\mu_0$
II. $H:\sigma=\sigma_0$ against $K:\sigma\neq\sigma_0$

Solution

I. The likelihood function is

$$p(x|\mu,\sigma)=\frac{1}{\sigma^n}\exp\left[-\frac{1}{\sigma}\sum_{i=1}^{n}(x_i-\mu)\right]\ \text{if } x_i\ge\mu\ \forall i,\qquad =0\ \text{otherwise}.$$

The MLEs are

$$\hat{\mu}=y_1,\ \text{the first order statistic},\qquad
\hat{\sigma}=\frac{1}{n}\sum_{i=1}^{n}(y_i-y_1),$$

where $y_1<y_2<\cdots<y_n$ are the order statistics of $x_1,x_2,\ldots,x_n$. Then

$$\operatorname{Sup}_{\mu,\sigma}p(x|\mu,\sigma)=p(x|\hat{\mu},\hat{\sigma})=(\hat{\sigma})^{-n}e^{-n}.$$

Under H, $p(x|\mu,\sigma)$ reduces to

$$p(x|\mu_0,\sigma)=\sigma^{-n}\exp\left[-\frac{1}{\sigma}\sum_{i=1}^{n}(x_i-\mu_0)\right]$$

$$\Rightarrow\ \text{the MLE of }\sigma\ \text{is}\ \hat{\sigma}_H=\frac{1}{n}\sum_{i}(x_i-\mu_0)=\frac{1}{n}\sum_{i}(y_i-\mu_0).$$

Then $\operatorname{Sup}_{(\mu,\sigma)\in H}p(x|\mu,\sigma)=p(x|\mu_0,\hat{\sigma}_H)=(\hat{\sigma}_H)^{-n}e^{-n}$, and

$$L(x)=\frac{\operatorname{Sup}_{(\mu,\sigma)\in H}p(x|\mu,\sigma)}{\operatorname{Sup}_{\mu,\sigma}p(x|\mu,\sigma)}=\left(\frac{\hat{\sigma}}{\hat{\sigma}_H}\right)^{n},$$

$$L(x)<C\Leftrightarrow\frac{\sum(y_i-y_1)}{\sum(y_i-\mu_0)}<c
\Leftrightarrow\frac{\sum(y_i-y_1)}{\sum(y_i-y_1)+n(y_1-\mu_0)}<c
\Leftrightarrow\frac{n(y_1-\mu_0)}{\sum(y_i-y_1)}>c'.$$

Under $H:\mu=\mu_0$, $\dfrac{2n(y_1-\mu_0)}{\sigma}\sim\chi^2_2$ and $\dfrac{2\sum(y_i-y_1)}{\sigma}\sim\chi^2_{2n-2}$ independently, so that

$$\frac{n(y_1-\mu_0)}{\sum(y_i-y_1)/(n-1)}\sim F_{2,2n-2}.$$

Hence the LR test rejects $H$ iff

$$T=\frac{n(y_1-\mu_0)}{\sum(y_i-y_1)/(n-1)}>F_{\alpha;2,2n-2}.$$

II. As earlier, it can be shown that the test has acceptance region

$$c_1<T=\frac{1}{\sigma_0}\sum_{i=1}^{n}(y_i-y_1)<c_2,$$

where, since under $H$, $2T\sim\chi^2_{2n-2}$, the constants $c_1$ and $c_2$ are obtained from

$$P_H\left[2c_1<\chi^2_{2n-2}<2c_2\right]=1-\alpha.$$

Example 4.10 Let $(X_{11},X_{21}),(X_{12},X_{22}),\ldots,(X_{1n},X_{2n})$ be a random sample from a bivariate normal distribution with means $\mu_1$ and $\mu_2$, variances $\sigma_1^2$ and $\sigma_2^2$ and correlation coefficient $\rho$. Find the likelihood ratio test of

$$H:\rho=0\ \text{against}\ K:\rho\neq 0.$$

Solution

Here $\theta=(\mu_1,\mu_2,\sigma_1^2,\sigma_2^2,\rho)$,

$$\Theta=\{\theta:-\infty<\mu_i<\infty,\ \sigma_i^2>0,\ -1<\rho<1,\ i=1,2\},$$
$$H_0=\{\theta:-\infty<\mu_i<\infty,\ \sigma_i^2>0,\ \rho=0,\ i=1,2\}.$$

In $\Theta$, the ML estimators of $\mu_1,\mu_2,\sigma_1^2,\sigma_2^2$ and $\rho$ are

$$\hat{\mu}_1=\bar{x}_1,\quad\hat{\mu}_2=\bar{x}_2,\quad\hat{\sigma}_1^2=\frac{1}{n}\sum(x_{1i}-\bar{x}_1)^2,\quad\hat{\sigma}_2^2=\frac{1}{n}\sum(x_{2i}-\bar{x}_2)^2,$$
$$\hat{\rho}=\frac{\sum(x_{1i}-\bar{x}_1)(x_{2i}-\bar{x}_2)}{\left\{\sum(x_{1i}-\bar{x}_1)^2\sum(x_{2i}-\bar{x}_2)^2\right\}^{1/2}}=r.$$

Thus,

$$\operatorname{Sup}_{\theta\in\Theta}p(x|\theta)=\left(2\pi\hat{\sigma}_1\hat{\sigma}_2\sqrt{1-r^2}\right)^{-n}\exp\left[-\frac{1}{2(1-r^2)}\left\{\frac{n\hat{\sigma}_1^2}{\hat{\sigma}_1^2}+\frac{n\hat{\sigma}_2^2}{\hat{\sigma}_2^2}-2r\cdot n\cdot r\right\}\right]
=\left(2\pi\hat{\sigma}_1\hat{\sigma}_2\sqrt{1-r^2}\right)^{-n}e^{-n}.$$

Under $H_0$,

$$\hat{\mu}_{1H}=\bar{x}_1,\quad\hat{\mu}_{2H}=\bar{x}_2,\quad\hat{\sigma}_{1H}^2=\frac{1}{n}\sum(x_{1i}-\bar{x}_1)^2,\quad\hat{\sigma}_{2H}^2=\frac{1}{n}\sum(x_{2i}-\bar{x}_2)^2.$$

Thus,

$$\operatorname{Sup}_{\theta\in H_0}p(x|\theta)=\left(2\pi\hat{\sigma}_{1H}\hat{\sigma}_{2H}\right)^{-n}\exp\left[-\frac{1}{2}\left\{\frac{n\hat{\sigma}_{1H}^2}{\hat{\sigma}_{1H}^2}+\frac{n\hat{\sigma}_{2H}^2}{\hat{\sigma}_{2H}^2}\right\}\right]
=\left(2\pi\hat{\sigma}_{1H}\hat{\sigma}_{2H}\right)^{-n}e^{-n},$$

$$L(x)=\frac{\operatorname{Sup}_{\theta\in H_0}p(x|\theta)}{\operatorname{Sup}_{\theta\in\Theta}p(x|\theta)}=\left(1-r^2\right)^{n/2}.$$

The rejection region $L(x)<C$ gives

$$(1-r^2)^{n/2}<C,\quad\text{or, } 1-r^2<C_1,\quad\text{or, } r^2>C_2,\quad\text{or, } |r|>C_3,$$

where $C_3$ is obtained from $P_H[|r|>C_3]=\alpha$. Here $r$ is the sample correlation coefficient, and its distribution for $\rho=0$ is symmetric about 0, so $P_H[r<-C_3]=P_H[r>C_3]=\alpha/2$.

Equivalently, the critical region for $H:\rho=0$ against $K:\rho\neq 0$ is

$$\frac{|r|\sqrt{n-2}}{\sqrt{1-r^2}}>k.$$

Since $\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ has the $t$-distribution with $(n-2)$ d.f. when $\rho=0$, the constant $k$ is given as

$$P_H\left[\frac{|r|\sqrt{n-2}}{\sqrt{1-r^2}}>k\right]=\alpha.$$

Note For example, if $n=4$ and $\alpha=0.05$, the density of $r$ under $\rho=0$ is $f(r)=\tfrac{1}{2}$, $-1\le r\le 1$, so that

$$\int_{-1}^{-C_3}\frac{1}{2}\,dr=\int_{C_3}^{1}\frac{1}{2}\,dr=0.025\;\Rightarrow\;C_3=0.95,$$

i.e. $H$ is rejected only when a sample of size four is such that $|r|>0.95$.

Example Let $x_1,x_2,\ldots,x_n$ be a random sample from $f(x|\lambda)=\frac{1}{\lambda}e^{-x/\lambda}$, $x>0$, $\lambda>0$. Find the likelihood ratio test of $H:\lambda=\lambda_0$ against $K:\lambda\neq\lambda_0$.

Solution Here $H_0=\{\lambda:\lambda=\lambda_0\}$, and in $\Theta$ the MLE of $\lambda$ is $\hat{\lambda}=\bar{x}$. Thus the LR is

$$L(x)=\frac{p(x|\lambda_0)}{\operatorname{Sup}_{\lambda}p(x|\lambda)}
=\frac{\lambda_0^{-n}e^{-n\bar{x}/\lambda_0}}{\bar{x}^{-n}e^{-n}}
=\left(\frac{\bar{x}}{\lambda_0}\right)^{n}e^{-n\bar{x}/\lambda_0}\cdot e^{n}.$$

Writing $g(\bar{x})=\bar{x}^n e^{-n\bar{x}/\lambda_0}$, it is seen that the curve $y=g(\bar{x})$ has a single maximum at $\bar{x}=\lambda_0$, rising before it and falling after it. Hence the critical region $L(x)<C$ takes the form $\bar{x}<d_0$ or $\bar{x}>d_1$, where the constants $d_0$ and $d_1$ are obtained from the size condition

$$P_H[d_0<\bar{x}<d_1]=1-\alpha.$$

Since $M_X(t)=(1-\lambda t)^{-1}$, we get $M_{\bar{x}}(t)=\left(1-\frac{\lambda t}{n}\right)^{-n}$, i.e.

$$\bar{X}\sim G\left(\frac{n}{\lambda},\,n\right),\ \text{i.e.}\ f(\bar{x})=\frac{1}{\Gamma(n)}\left(\frac{n}{\lambda}\right)^{n}e^{-n\bar{x}/\lambda}\,\bar{x}^{\,n-1}.$$

One can find the values of $d_0$ and $d_1$ from the gamma distribution table under $H_0$.
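In practice, equal-tail gamma points are often used as an approximation to the exact LR cut-offs $d_0$ and $d_1$; a minimal Python sketch under that assumption (the values n, lam0 are illustrative):

    from scipy import stats

    n, lam0, alpha = 10, 2.0, 0.05
    # Under H, xbar ~ Gamma(shape = n, scale = lam0 / n); equal-tail cut-offs:
    d0 = stats.gamma.ppf(alpha / 2, a=n, scale=lam0 / n)
    d1 = stats.gamma.ppf(1 - alpha / 2, a=n, scale=lam0 / n)
    # reject H: lambda = lam0 when xbar < d0 or xbar > d1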

Chapter 5

Interval Estimation

5.1 Introduction

Let $(X_1,X_2,\ldots,X_n)$ be a random sample from a population having distribution function $F_\theta$, where $\theta$ is the unknown parameter (or the set of unknown parameters). We try to estimate the parametric function $\gamma(\theta)$ by means of a single value, say $t$, the value of a statistic $T$ corresponding to the observed values $(x_1,x_2,\ldots,x_n)$ of the random variables $(X_1,X_2,\ldots,X_n)$. This estimate may differ from the exact value of $\gamma(\theta)$ in the given population. In other words, we take $t$ as an estimate of $\gamma(\theta)$ such that $|t-\gamma(\theta)|$ is small with high probability. In point estimation we try to choose a unique point in the parameter space which can reasonably be considered as the true value of the parameter. Instead of a unique estimate of the parameter, we may be interested in constructing a family of sets that contain the true (unknown) parameter value with a specified (high) probability. In many problems of statistical inference we are not interested only in estimating the parameter or testing some hypothesis concerning the parameter; we also want to get a lower or an upper bound, or both, for the real-valued parameter. Here two limits, say $t_1$ and $t_2$, are computed from the set of observations, and it is claimed with a certain degree of confidence (measured in probabilistic terms) that the true value of $\gamma(\theta)$ lies between $t_1$ and $t_2$. Thus we get an interval $(t_1,t_2)$ which we expect would include the true value of $\gamma(\theta)$. This type of estimation is therefore called interval estimation. In this chapter we discuss the problem of interval estimation.

An interval with random end points is called a random interval. For example, $(X,2X)$ is a random interval. Note that $X\le 1\le 2X\Leftrightarrow\tfrac{1}{2}\le X\le 1$.


A confidence interval (CI) for $\theta$ is a random interval which covers the true value of the parameter $\theta$ with a specified degree of confidence (assurance). In other words, if a random interval $I(X)=\left[\underline{\theta}(X),\overline{\theta}(X)\right]$ satisfies

$$\Pr{}_{\theta}\left\{\theta\in I(X)\right\}\ge 1-\alpha\quad\forall\,\theta\in\Theta,\tag{5.1}$$

then $(1-\alpha)$ is called the confidence coefficient, and $\underline{\theta}(X)$ and $\overline{\theta}(X)$ are the lower and upper confidence limits, respectively.

Let $I(X)=[\underline{\theta}(X),\infty)$ be a random interval such that $\Pr_\theta\{\theta\in I(X)\}=\Pr_\theta\{\theta\ge\underline{\theta}(X)\}\ge 1-\alpha\ \forall\,\theta\in\Theta$. Then $\underline{\theta}(X)$ is called the lower confidence bound of $\theta$ at confidence level $(1-\alpha)$. Similarly, we can define an upper confidence bound $\overline{\theta}(X)$ such that $\Pr_\theta\{\theta\in I(X)\}=\Pr_\theta\{\theta\le\overline{\theta}(X)\}\ge 1-\alpha\ \forall\,\theta\in\Theta$, corresponding to the random interval $I(X)=(-\infty,\overline{\theta}(X)]$.

Remark 1 In making the probability statement we do not mean that $\theta$ is a random variable; indeed, $\theta$ is a constant. All that is meant here is that the probability is $(1-\alpha)$ that the random interval $\left[\underline{\theta}(X),\overline{\theta}(X)\right]$ will cover $\theta$, whatever the true value of $\theta$ may be. More specifically, it is asserted that about $100(1-\alpha)\%$ of statements of the form $\theta\in\left[\underline{\theta}(X),\overline{\theta}(X)\right]$ should be correct.

Remark 2 In point estimation, we choose an estimate, say $\hat{\theta}(x)$, on the basis of a sample $x$ such that the difference $\hat{\theta}(x)-\theta$ is small with high probability; that is, we try to choose a unique point in the parameter space which can reasonably be considered as the true value of the parameter. On the other hand, in interval estimation, we choose a subset of the parameter space, say $I(x)$, on the basis of a sample $x$, which reasonably includes the true value of the parameter. More specifically, in interval estimation we choose an interval $I(x)$ such that

$$\Pr{}_{\theta}\left\{\theta\in I(x)\right\}\ge 1-\alpha\quad\forall\,\theta.$$

Method I

A simple procedure for finding a confidence interval

Let $T$ be a statistic and $W(T,\theta)$ be a function of $T$ and $\theta$ whose distribution is free from $\theta$. Then it is always possible to choose two constants $K_1$ and $K_2$ $(K_1\le K_2)$ such that

$$\Pr\{W(T,\theta)<K_1\}\le\alpha_1,\qquad\Pr\{W(T,\theta)>K_2\}\le\alpha_2,\qquad\alpha_1+\alpha_2=\alpha.$$

Hence $\Pr\{K_1\le W(T,\theta)\le K_2\}\ge 1-(\alpha_1+\alpha_2)=1-\alpha$.

Suppose it is possible to convert the inequality $K_1\le W(T,\theta)\le K_2$ into the form $\underline{\theta}(T)\le\theta\le\overline{\theta}(T)$. Then $\Pr\left\{\underline{\theta}(T)\le\theta\le\overline{\theta}(T)\right\}\ge 1-\alpha$. This fact gives us a $(1-\alpha)$ level confidence interval for $\theta$.

Example 5.1 Let $X_1,X_2,\ldots,X_n$ be a random sample from $N(\mu,\sigma^2)$. Find a $(1-\alpha)$ level confidence interval for $\mu$ (i) when $\sigma^2$ is known and (ii) when $\sigma^2$ is unknown.

Solution (i) Suppose $\sigma^2$ is known.

We take $W(T,\theta)=\dfrac{\sqrt{n}(\bar{x}-\mu)}{\sigma}$, which is an $N(0,1)$ variate; hence the distribution of $W(T,\theta)$ is independent of $\theta$. We can choose constants from $N(0,1)$ such that

$$P\left[\tau_{1-\alpha_1}\le\frac{\sqrt{n}(\bar{x}-\mu)}{\sigma}\le\tau_{\alpha_2}\right]=1-(\alpha_1+\alpha_2)=1-\alpha,$$

$\tau_\alpha$ being the upper $\alpha$-point of $N(0,1)$. So $\left[\bar{x}-\tau_{\alpha_2}\frac{\sigma}{\sqrt{n}},\ \bar{x}-\tau_{1-\alpha_1}\frac{\sigma}{\sqrt{n}}\right]$ is a $(1-\alpha)$ level confidence interval for $\mu$ if $\sigma$ is known.

(ii) Suppose $\sigma^2$ is unknown.

We take $W(T,\theta)=\dfrac{\sqrt{n}(\bar{x}-\mu)}{s}$, which is Student's $t$ statistic with d.f. $(n-1)$, where $s^2=\frac{1}{n-1}\sum(x_i-\bar{x})^2$. The distribution of $W(T,\theta)$ is independent of $\theta$. Again we choose constants using the $t$-distribution with $(n-1)$ d.f. such that

$$P\left[t_{n-1;1-\alpha_1}\le\frac{\sqrt{n}(\bar{x}-\mu)}{s}\le t_{n-1;\alpha_2}\right]=1-(\alpha_1+\alpha_2)=1-\alpha$$

$$\Rightarrow P\left[\bar{x}-t_{n-1;\alpha_2}\frac{s}{\sqrt{n}}\le\mu\le\bar{x}-t_{n-1;1-\alpha_1}\frac{s}{\sqrt{n}}\right]=1-\alpha.$$

So $\left[\bar{x}-t_{n-1;\alpha_2}\frac{s}{\sqrt{n}},\ \bar{x}-t_{n-1;1-\alpha_1}\frac{s}{\sqrt{n}}\right]$ is a $(1-\alpha)$ level confidence interval for $\mu$ if $\sigma^2$ is unknown.
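For the usual equal-tails choice $\alpha_1=\alpha_2=\alpha/2$, both intervals of Example 5.1 are easy to compute; a minimal sketch with illustrative data:

    import numpy as np
    from scipy import stats

    x = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4])     # illustrative sample
    n, alpha = len(x), 0.05
    xbar = np.mean(x)
    # (i) sigma known, say sigma = 0.3:
    sigma = 0.3
    z = stats.norm.ppf(1 - alpha / 2)
    ci_known = (xbar - z * sigma / np.sqrt(n), xbar + z * sigma / np.sqrt(n))
    # (ii) sigma unknown:
    s = np.std(x, ddof=1)
    t = stats.t.ppf(1 - alpha / 2, n - 1)
    ci_unknown = (xbar - t * s / np.sqrt(n), xbar + t * s / np.sqrt(n))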

Example 5.2 Let $X_1,X_2,\ldots,X_n$ be a random sample from $N(\mu,\sigma^2)$. Find a $(1-\alpha)$ level confidence interval for $\sigma^2$ (i) when $\mu$ is known and (ii) when $\mu$ is unknown.

Solution (i) Suppose $\mu$ is known.

We take $W(T,\theta)=\dfrac{\sum(x_i-\mu)^2}{\sigma^2}$, which is distributed as $\chi^2$ with $n$ d.f.; thus its distribution is independent of $\theta$. We can choose constants from the $\chi^2$ distribution with $n$ d.f. such that

$$P\left[\chi^2_{n;1-\alpha_1}\le\frac{\sum(x_i-\mu)^2}{\sigma^2}\le\chi^2_{n;\alpha_2}\right]=1-(\alpha_1+\alpha_2)=1-\alpha$$

$$\Rightarrow P\left[\frac{\sum(x_i-\mu)^2}{\chi^2_{n;\alpha_2}}\le\sigma^2\le\frac{\sum(x_i-\mu)^2}{\chi^2_{n;1-\alpha_1}}\right]=1-\alpha.$$

Thus $\left[\dfrac{\sum(x_i-\mu)^2}{\chi^2_{n;\alpha_2}},\ \dfrac{\sum(x_i-\mu)^2}{\chi^2_{n;1-\alpha_1}}\right]$ is a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ when $\mu$ is known.

(ii) Suppose $\mu$ is unknown.

We take $W(T,\theta)=\dfrac{\sum(x_i-\bar{x})^2}{\sigma^2}$, which is distributed as $\chi^2$ with $(n-1)$ d.f.; this distribution is independent of $\theta$. Proceeding as in (i), $\left[\dfrac{\sum(x_i-\bar{x})^2}{\chi^2_{n-1;\alpha_2}},\ \dfrac{\sum(x_i-\bar{x})^2}{\chi^2_{n-1;1-\alpha_1}}\right]$ is a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ when $\mu$ is unknown.
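A minimal sketch of the equal-tails interval of case (ii), with illustrative data:

    import numpy as np
    from scipy import stats

    x = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4])     # illustrative sample
    n, alpha = len(x), 0.05
    ss = np.sum((x - np.mean(x))**2)                     # sum (x_i - xbar)^2
    lo = ss / stats.chi2.ppf(1 - alpha / 2, n - 1)       # divide by chi^2_{n-1;alpha/2}
    hi = ss / stats.chi2.ppf(alpha / 2, n - 1)           # divide by chi^2_{n-1;1-alpha/2}
    ci = (lo, hi)                                        # CI for sigma^2, mu unknown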

Example 5.3 Let $X_1,X_2,\ldots,X_n$ be a random sample from the density function $f(x|\theta)=\frac{1}{\theta}$, $0<x<\theta$. Find a $100(1-\alpha)\%$ confidence interval for $\theta$.

Solution The likelihood function is $L=\frac{1}{\theta^n}$. This is maximum when $\theta$ is smallest; but $\theta$ cannot be less than $x_{(n)}$, the maximum of the sample observations. Thus $\hat{\theta}=x_{(n)}$.

The p.d.f. of $\hat{\theta}$ is given by

$$h(\hat{\theta})=\frac{n\,\hat{\theta}^{\,n-1}}{\theta^n},\qquad 0<\hat{\theta}<\theta.$$

Let $u=\dfrac{x_{(n)}}{\theta}=\dfrac{\hat{\theta}}{\theta}$, so that $g(u)=nu^{n-1}$, $0<u<1$. Thus the distribution of $u$ is independent of $\theta$.

We find $u_1$ and $u_2$ such that $P\left[u_1<\dfrac{\hat{\theta}}{\theta}<u_2\right]=1-\alpha$, where $\int_0^{u_1}g(u)\,du=\alpha_1$ and $\int_{u_2}^{1}g(u)\,du=\alpha_2$. Therefore

$$P\left[\frac{\hat{\theta}}{u_2}<\theta<\frac{\hat{\theta}}{u_1}\right]=1-\alpha.$$

Thus $\left[\dfrac{\max_i x_i}{u_2},\ \dfrac{\max_i x_i}{u_1}\right]$ is a $100(1-\alpha)\%$ confidence interval for $\theta$.

Example 5.4 $X_1,X_2,\ldots,X_n$ is a random sample from a $G\left(\frac{1}{\theta},1\right)$ distribution having p.d.f.

$$f(x|\theta)=\frac{1}{\theta}e^{-x/\theta},\qquad x\ge 0.$$

Find a $100(1-\alpha)\%$ confidence interval for $\theta$.

Solution Let $t=\dfrac{\sum_{i=1}^{n}x_i}{\theta}=\dfrac{n\bar{x}}{\theta}$, which is a $G(1,n)$ variate having p.d.f. $g(t)=\frac{1}{\Gamma(n)}e^{-t}t^{n-1}$, $0\le t<\infty$. We choose $k_1$ and $k_2$ such that

$$P\left[k_1<t=\frac{n\bar{x}}{\theta}<k_2\right]=1-(\alpha_1+\alpha_2)=1-\alpha,$$

where $\int_0^{k_1}g(t)\,dt=\alpha_1$ and $\int_{k_2}^{\infty}g(t)\,dt=\alpha_2$, i.e.

$$P\left[\frac{n\bar{x}}{k_2}<\theta<\frac{n\bar{x}}{k_1}\right]=1-\alpha.$$

Hence $\left(\dfrac{n\bar{x}}{k_2},\ \dfrac{n\bar{x}}{k_1}\right)$ is a $100(1-\alpha)\%$ confidence interval for $\theta$.

Method II

Let $T$ be a statistic, and let $t_1(\theta)$ and $t_2(\theta)$ be two quantities such that $\Pr\{T<t_1(\theta)\}\le\alpha_1$ and $\Pr\{T>t_2(\theta)\}\le\alpha_2$; $\alpha_1,\alpha_2\ge 0$, $\alpha_1+\alpha_2=\alpha$. The equations $T=t_1(\theta)$ and $T=t_2(\theta)$ give us two curves in the $(t,\theta)$-plane.

[Figure: the curves $T=t_1(\theta)$ and $T=t_2(\theta)$, with the vertical line $T=t$ meeting them at $A\left(t,\underline{\theta}(t)\right)$ and $B\left(t,\overline{\theta}(t)\right)$.]

Draw the vertical line $T=t$. It intersects the curves at $A$ and $B$; suppose the coordinates of $A$ and $B$ are $\left(t,\underline{\theta}(t)\right)$ and $\left(t,\overline{\theta}(t)\right)$, respectively. According to the construction,

$$\Pr\left[t_1(\theta)\le T\le t_2(\theta)\right]=\Pr\left[\underline{\theta}(T)\le\theta\le\overline{\theta}(T)\right]\ge 1-\alpha.$$

Note 1: To avoid the drawing, one may use an inverse interpolation formula.

Note 2: If the left-hand sides of the equations in (5.1) can be given explicit expressions in terms of $\theta$, and if the equations can be solved for $\theta$ uniquely, then the roots are the confidence limits for $\theta$ at confidence level $(1-\alpha)$.

Example 5.5 Let $X_1,X_2,\ldots,X_n$ be a random sample from the density function $f(x|\theta)=\frac{1}{\theta}$, $0<x<\theta$. Find a $100(1-\alpha)\%$ confidence interval for $\theta$ by Method II.

Solution As in Example 5.3, the likelihood is maximum when $\theta$ is smallest; but $\theta$ cannot be less than $x_{(n)}$, the maximum of the sample observations. Thus $\hat{\theta}=x_{(n)}$, with p.d.f.

$$h(\hat{\theta})=\frac{n\,\hat{\theta}^{\,n-1}}{\theta^n},\qquad 0<\hat{\theta}<\theta.$$

We seek $k_1(\theta)$ and $k_2(\theta)$ such that $P\left[k_1(\theta)<\hat{\theta}<k_2(\theta)\right]=1-(\alpha_1+\alpha_2)=1-\alpha$, where

$$\int_0^{k_1(\theta)}h(\hat{\theta})\,d\hat{\theta}=\alpha_1\tag{5.2}$$

and

$$\int_{k_2(\theta)}^{\theta}h(\hat{\theta})\,d\hat{\theta}=\alpha_2.\tag{5.3}$$

From (5.2), $\dfrac{\{k_1(\theta)\}^n}{\theta^n}=\alpha_1$, or $k_1(\theta)=\theta\,\alpha_1^{1/n}$.

From (5.3), $1-\dfrac{\{k_2(\theta)\}^n}{\theta^n}=\alpha_2$, or $k_2(\theta)=\theta\,(1-\alpha_2)^{1/n}$. Therefore,

$$P\left[\theta\,\alpha_1^{1/n}<\hat{\theta}<\theta\,(1-\alpha_2)^{1/n}\right]=1-\alpha,
\quad\text{or,}\quad
P\left[\frac{\hat{\theta}}{(1-\alpha_2)^{1/n}}<\theta<\frac{\hat{\theta}}{\alpha_1^{1/n}}\right]=1-\alpha.$$

Note We can also get the confidence interval for $\theta$ by Method I, as shown in Example 5.3.

Large sample confidence interval: Let the asymptotic distribution of a statistic $T_n$ be normal with mean $\theta$ and variance $\frac{\sigma^2(\theta)}{n}$; then

$$\Pr\left\{\tau_{1-\alpha_1}\le\frac{(T_n-\theta)\sqrt{n}}{\sigma(\theta)}\le\tau_{\alpha_2}\right\}\simeq 1-(\alpha_1+\alpha_2)=1-\alpha\ \text{(say)}.$$

This fact gives us a confidence interval for $\theta$ at confidence level $(1-\alpha)$, approximately.

5.3 Construction of Conﬁdence Interval 137

100ð1 aÞ% conﬁdence interval for k. Px i

enk k

Solution Likelihood function is L ¼ x1 !x2 !:::::xn !

_

MLE of k ¼ k ¼ x

_ 1 k

V k ¼ ¼

@ 2 log L n

E @k2

xk

k=n

Hence P s1a1 \ p ﬃﬃﬃﬃﬃﬃ \sa2 ¼ 1 ða1 þ a2 Þ ¼ 1 a

xk

k=n

h pﬃﬃﬃﬃﬃﬃﬃ pﬃﬃﬃﬃﬃﬃﬃi

) P x sa2 x=n\l\x s1a1 x=n ¼ 1 a

qﬃﬃﬃﬃﬃﬃ qﬃﬃﬃﬃﬃﬃ

dence interval for k is from x sa2 x=n to x s1a1 x=n.
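For the equal-tails choice, the large-sample Poisson interval takes one line to compute; a minimal sketch with illustrative counts:

    import numpy as np
    from scipy import stats

    x = np.array([3, 5, 2, 4, 6, 3, 4, 5])               # illustrative Poisson counts
    n, alpha = len(x), 0.05
    xbar = np.mean(x)
    z = stats.norm.ppf(1 - alpha / 2)                    # tau_{alpha/2}
    ci = (xbar - z * np.sqrt(xbar / n), xbar + z * np.sqrt(xbar / n))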

Method 3: Method based on Chebyshev's inequality

By Chebyshev's inequality, $\Pr\left[|T-E(T)|\le\varepsilon\,\sigma_T\right]>1-\frac{1}{\varepsilon^2}$. Setting $1-\frac{1}{\varepsilon^2}=1-\alpha$, we can construct a confidence interval.

Example 5.7 Consider the problem of Example 5.3. Find a $100(1-\alpha)\%$ confidence interval for $\theta$ by using the method based on Chebyshev's inequality.

Solution

We have $E(\hat{\theta})=\dfrac{n}{n+1}\theta$ and $E(\hat{\theta}-\theta)^2=\dfrac{2\theta^2}{(n+1)(n+2)}$.

By applying a Chebyshev-type inequality to $\hat{\theta}$ about $\theta$ we get

$$P\left[\frac{|\hat{\theta}-\theta|}{\theta}\sqrt{\frac{(n+1)(n+2)}{2}}<\varepsilon\right]>1-\frac{1}{\varepsilon^2}.$$

Since $\hat{\theta}\xrightarrow{p}\theta$, we may replace $\theta$ by $\hat{\theta}$ inside the probability, so that for moderately large $n$,

$$P\left[\frac{|\hat{\theta}-\theta|}{\hat{\theta}}\sqrt{\frac{(n+1)(n+2)}{2}}<\varepsilon\right]>1-\frac{1}{\varepsilon^2}.$$

Setting $\frac{1}{\varepsilon^2}=\alpha$, i.e. $\varepsilon=\frac{1}{\sqrt{\alpha}}$,

$$P\left[\hat{\theta}-\frac{\hat{\theta}}{\sqrt{\alpha}}\sqrt{\frac{2}{(n+1)(n+2)}}<\theta<\hat{\theta}+\frac{\hat{\theta}}{\sqrt{\alpha}}\sqrt{\frac{2}{(n+1)(n+2)}}\right]>1-\alpha.$$

Again, $\frac{1}{\sqrt{(n+1)(n+2)}}\simeq\frac{1}{n}$ for large $n$, and using the fact that $\hat{\theta}\le\theta$, we have

$$P\left[\hat{\theta}<\theta<\hat{\theta}\left(1+\frac{1}{n}\sqrt{\frac{2}{\alpha}}\right)\right]>1-\alpha.$$

Thus $\left[\max_i x_i,\ \max_i x_i\left(1+\frac{1}{n}\sqrt{\frac{2}{\alpha}}\right)\right]$ is an approximate $(1-\alpha)$ level confidence interval for $\theta$.

5.4 Shortest Length Confidence Interval and Neyman's Criterion

From the above discussion, it is clear that a $(1-\alpha)$ level C.I. is not unique. In fact, an infinite number of C.I.'s can be constructed by the simple method [because the equation $\alpha_1+\alpha_2=\alpha$, $\alpha_1\ge 0$, $\alpha_2\ge 0$ has an infinite number of solutions for $(\alpha_1,\alpha_2)$]. Again, for different choices of statistic we get different confidence intervals. For example, in random sampling from

$$f(x;\theta)=\frac{1}{\theta}e^{-x/\theta},\qquad 0<x<\infty,$$

$\dfrac{2\sum X_i}{\chi^2_{2n;\alpha}}$ is a $(1-\alpha)$ level lower confidence bound for $\theta$

[As $\frac{2X}{\theta}\sim\chi^2_2$: the M.g.f. of $X$ is $(1-t\theta)^{-1}$, so the M.g.f. of $\frac{2X}{\theta}$ is $(1-2t)^{-1}$, the M.g.f. of $\chi^2_2$; hence $\frac{2\sum X_i}{\theta}\sim\chi^2_{2n}$].

On the other hand, a $(1-\alpha)$ level lower confidence bound for $\theta$ based on $X_{(1)}=\min_i X_i$ is $\dfrac{2nX_{(1)}}{\chi^2_{2;\alpha}}$.

So we need some optimality criteria to choose one of the $(1-\alpha)$ level confidence intervals.

1. Shortest length confidence interval [Wilks' criterion]

A $(1-\alpha)$ level confidence interval $I(\theta)=\left[\underline{\theta}(T),\overline{\theta}(T)\right]$ based on $T$ will be of shortest length if the inequality

$$\overline{\theta}(T)-\underline{\theta}(T)\le\overline{\theta}^{*}(T)-\underline{\theta}^{*}(T)\quad\text{for all }\theta$$

holds for every other $(1-\alpha)$ level C.I. $\left[\underline{\theta}^{*}(T),\overline{\theta}^{*}(T)\right]$ based on the same statistic $T$.

Example 5.8 On the basis of a random sample from $N(\mu,\sigma^2)$, $\sigma^2$ being known, a $(1-\alpha)$ level C.I. for $\mu$ based on $\bar{X}$ is given by $\left[\bar{X}-\tau_{\alpha_2}\frac{\sigma}{\sqrt{n}},\ \bar{X}-\tau_{1-\alpha_1}\frac{\sigma}{\sqrt{n}}\right]$, $\alpha_1,\alpha_2\ge 0$ and $\alpha_1+\alpha_2=\alpha$. The length of the interval is $(\tau_{\alpha_2}-\tau_{1-\alpha_1})\frac{\sigma}{\sqrt{n}}$.

We minimize this length subject to $\alpha_1+\alpha_2=\alpha$, $\alpha_1,\alpha_2\ge 0$. Owing to the symmetry of the distribution of $\frac{\sqrt{n}(\bar{x}-\mu)}{\sigma}$ about 0, the quantity $\tau_{\alpha_2}-\tau_{1-\alpha_1}$ is a minimum when $\tau_{\alpha_2}=-\tau_{1-\alpha_1}$, i.e., when $\alpha_1=\alpha_2=\alpha/2$. Thus the shortest length $(1-\alpha)$ level C.I. for $\mu$ based on $\bar{x}$ is $\left[\bar{x}-\tau_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x}+\tau_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$.

When the length of a confidence interval is random, we may instead minimize its expected length. E.g., in random sampling from $N(\mu,\sigma^2)$ (both $\mu$ and $\sigma^2$ unknown), a $(1-\alpha)$ level C.I. for $\mu$ is given by $\left[\bar{X}-t_{\alpha_2;n-1}\frac{s}{\sqrt{n}},\ \bar{X}-t_{1-\alpha_1;n-1}\frac{s}{\sqrt{n}}\right]$. The length of this C.I. is $\left(t_{\alpha_2;n-1}-t_{1-\alpha_1;n-1}\right)\frac{s}{\sqrt{n}}$, which is a random quantity. So, to find the shortest (expected) length C.I., we minimize $\left(t_{\alpha_2;n-1}-t_{1-\alpha_1;n-1}\right)\frac{E(s)}{\sqrt{n}}$ subject to $\alpha_1,\alpha_2\ge 0$ and $\alpha_1+\alpha_2=\alpha$. Owing to the symmetry of the $t_{n-1}$ distribution about 0, the minimum is attained at $\alpha_1=\alpha_2=\alpha/2$. Therefore the required shortest expected length confidence interval is $\left[\bar{X}-t_{\alpha/2;n-1}\frac{s}{\sqrt{n}},\ \bar{X}+t_{\alpha/2;n-1}\frac{s}{\sqrt{n}}\right]$.

Example 5.9 Consider the problem discussed in Example 5.2. On the basis of a random sample from $N(\mu,\sigma^2)$, $\mu$ being known, a $(1-\alpha)$ level C.I. for $\sigma^2$ is given by $\left[\dfrac{\sum(x_i-\mu)^2}{\chi^2_{n;\alpha_2}},\ \dfrac{\sum(x_i-\mu)^2}{\chi^2_{n;1-\alpha_1}}\right]$, $\alpha_1,\alpha_2\ge 0$ and $\alpha_1+\alpha_2=\alpha$. The length of the interval is

$$\left[\frac{1}{\chi^2_{n;1-\alpha_1}}-\frac{1}{\chi^2_{n;\alpha_2}}\right]\sum(X_i-\mu)^2,\quad\text{which has the expected value}\quad
n\sigma^2\left[\frac{1}{\chi^2_{n;1-\alpha_1}}-\frac{1}{\chi^2_{n;\alpha_2}}\right].$$

We wish to minimize

$$\frac{1}{\chi_1^2}-\frac{1}{\chi_2^2},\qquad\text{subject to}\quad\int_{\chi_1^2}^{\chi_2^2}f(\chi^2)\,d\chi^2=1-\alpha,$$

where $\chi_1^2=\chi^2_{n;1-\alpha_1}$, $\chi_2^2=\chi^2_{n;\alpha_2}$, and $f(\chi^2)$ is the p.d.f. of a chi-square r.v. with $n$ d.f.

Now let $\phi=\dfrac{1}{\chi_1^2}-\dfrac{1}{\chi_2^2}+\lambda\left[\int_{\chi_1^2}^{\chi_2^2}f(\chi^2)\,d\chi^2-(1-\alpha)\right]$. Then

$$\frac{d\phi}{d\chi_1^2}=-\frac{1}{\chi_1^4}-\lambda f(\chi_1^2)=0,\qquad
\frac{d\phi}{d\chi_2^2}=\frac{1}{\chi_2^4}+\lambda f(\chi_2^2)=0$$

$$\Rightarrow\;\lambda=-\frac{1}{\chi_1^4 f(\chi_1^2)}=-\frac{1}{\chi_2^4 f(\chi_2^2)},$$

i.e. $\chi_1^4 f(\chi_1^2)=\chi_2^4 f(\chi_2^2)$ is to be satisfied along with $\int_{\chi_1^2}^{\chi_2^2}f(\chi^2)\,d\chi^2=1-\alpha$.

It is very difficult to find the actual values of $\chi_1^2$ and $\chi_2^2$. In practice, the equal-tails interval $\left[\dfrac{\sum(x_i-\mu)^2}{\chi^2_{n;\alpha/2}},\ \dfrac{\sum(x_i-\mu)^2}{\chi^2_{n;1-\alpha/2}}\right]$ is used. Similarly, when $\mu$ is unknown, the equal-tails interval $\left[\dfrac{\sum(x_i-\bar{x})^2}{\chi^2_{n-1;\alpha/2}},\ \dfrac{\sum(x_i-\bar{x})^2}{\chi^2_{n-1;1-\alpha/2}}\right]$ is employed.

Example 5.10 Referring to Example 5.3, a $(1-\alpha)$ level C.I. for $\theta$ based on $\max_i x_i$ is given by $\left[\dfrac{\max_i x_i}{u_2},\ \dfrac{\max_i x_i}{u_1}\right]$, $\alpha_1,\alpha_2\ge 0$, $\alpha_1+\alpha_2=\alpha$. The length $L$ of the interval is $\left(\dfrac{1}{u_1}-\dfrac{1}{u_2}\right)\max_i x_i$.

We minimize $L$ subject to

$$\int_{u_1}^{u_2}nu^{n-1}\,du=u_2^n-u_1^n=1-\alpha,$$

which implies $u_1^n=u_2^n-(1-\alpha)$ and $(1-\alpha)^{1/n}\le u_2\le 1$. Now

$$\frac{dL}{du_2}=\max_i x_i\left[-\frac{1}{u_1^2}\frac{du_1}{du_2}+\frac{1}{u_2^2}\right]
=\max_i x_i\left[-\frac{1}{u_1^2}\cdot\frac{nu_2^{n-1}}{nu_1^{n-1}}+\frac{1}{u_2^2}\right]
=\max_i x_i\cdot\frac{u_1^{n+1}-u_2^{n+1}}{u_2^2\,u_1^{n+1}}<0,$$

since $u_1<u_2$. Hence $L$ is a decreasing function of $u_2$ and is minimized by taking $u_2$ as large as possible, i.e. $u_2=1$, so that $u_1=\alpha^{1/n}$. Therefore, the required confidence interval is given by $\left[\max_i x_i,\ \dfrac{\max_i x_i}{\alpha^{1/n}}\right]$. This confidence interval has the smallest length among all confidence intervals for $\theta$ based on $\max_i x_i$.

2. Neyman's criterion

Let $I_1(X)$ and $I_2(X)$ be two $(1-\alpha)$ level confidence intervals for $\theta$. $I_1(X)$ will be more accurate (or shorter) than $I_2(X)$ if

$$P_{\theta_0}\{\theta\in I_1(X)\}\le P_{\theta_0}\{\theta\in I_2(X)\}\quad\forall\,\theta,\theta_0\in\Theta,\ \theta\neq\theta_0.$$

A $(1-\alpha)$ level C.I. $I(X)$ is said to be unbiased if

$$P_{\theta_0}\{\theta\in I(X)\}\le 1-\alpha\quad\forall\,\theta,\theta_0\in\Theta,\ \theta\neq\theta_0$$

($\theta_0$ being the true value of $\theta$).

Implication An unbiased confidence interval includes the true value more often than it includes a wrong value.

A $(1-\alpha)$ level unbiased C.I. $I(X)$ is said to be most accurate amongst the class of unbiased $(1-\alpha)$ level C.I.'s if $P_{\theta_0}\{\theta\in I(X)\}\le P_{\theta_0}\{\theta\in I^{*}(X)\}\ \forall\,\theta,\theta_0\in\Theta,\ \theta\neq\theta_0$, for any other $(1-\alpha)$ level unbiased C.I. $I^{*}(X)$.

Relation between non-randomized tests and confidence intervals

Theorem 5.1 Suppose $A(\theta_0)$ denotes the acceptance region of a level-$\alpha$ test for testing $H_0:\theta=\theta_0$. Define $S(x)=\{\theta: x\in A(\theta)\}$. Then $S(x)$ is a $(1-\alpha)$ level confidence set for $\theta$.

Proof By the construction of $S(x)$, we have

$$x\in A(\theta)\Leftrightarrow\theta\in S(x)
\;\Rightarrow\;P_\theta\{\theta\in S(x)\}=P_\theta\{x\in A(\theta)\}\ge 1-\alpha\ \forall\,\theta.$$

Note The implication of this theorem is that, for a fixed $x$, the confidence region $S(x)$ consists of all those $\theta$ whose acceptance region contains the observed value $x$.

Theorem 5.2 Let $S(x)$ be a $(1-\alpha)$ level confidence interval for $\theta$. Define $A(\theta)=\{x:\theta\in S(x)\}$. Then $A(\theta_0)$ is the acceptance region of a level-$\alpha$ non-randomized test for testing $H_0:\theta=\theta_0$.

Proof By the construction of $A(\theta)$, we have

$$x\in A(\theta)\Leftrightarrow\theta\in S(x)
\;\Rightarrow\;P_\theta\{x\in A(\theta)\}=P_\theta\{\theta\in S(x)\}\ge 1-\alpha\ \forall\,\theta.$$

Relation between UMP non-randomized tests and UMA confidence intervals

Theorem 5.3 Suppose $A(\theta_0)$ denotes the acceptance region of a UMP level-$\alpha$ non-randomized test for testing $H_0:\theta=\theta_0$. Define $S(x)=\{\theta: x\in A(\theta)\}$. Then $S(x)$ is a uniformly most accurate (UMA) $(1-\alpha)$ level confidence set.

Proof By Theorem 5.1, it is clear that the level of the set $S(x)$ is $(1-\alpha)$. Consider another acceptance region $A^{*}(\theta_0)$ of a level-$\alpha$ non-randomized test for testing $H_0:\theta=\theta_0$, and let $S^{*}(x)=\{\theta: x\in A^{*}(\theta)\}$; then the level of $S^{*}(x)$ is also $(1-\alpha)$. Since $A(\theta)$ is the acceptance region of a UMP non-randomized test, we can write

$$P_{\theta_0}\{x\in A(\theta)\}\le P_{\theta_0}\{x\in A^{*}(\theta)\}\quad(\theta\neq\theta_0)$$
$$\Rightarrow P_{\theta_0}\{\theta\in S(x)\}\le P_{\theta_0}\{\theta\in S^{*}(x)\}\quad\forall\,(\theta\neq\theta_0),\ \theta,\theta_0\in\Theta.$$

Since $S^{*}(x)$ is arbitrary, the proof follows immediately.

Theorem 5.4 Let $S(x)$ be a UMA $(1-\alpha)$ level confidence interval for $\theta$. Define $A(\theta)=\{x:\theta\in S(x)\}$. Then $A(\theta_0)$ will be an acceptance region of a level-$\alpha$ UMP test for testing $H_0:\theta=\theta_0$.

Proof According to the construction of $A(\theta)$, $A(\theta_0)$ will be the acceptance region of a level-$\alpha$ non-randomized test for testing $H_0:\theta=\theta_0$. Now, corresponding to another $(1-\alpha)$ level C.I. $S^{*}(x)$, let $A^{*}(\theta)=\{x:\theta\in S^{*}(x)\}$. Since $S(x)$ is a UMA $(1-\alpha)$ level C.I. for $\theta$,

$$P_{\theta_0}\{\theta\in S(x)\}\le P_{\theta_0}\{\theta\in S^{*}(x)\}\quad\forall\,\theta,\theta_0\in\Theta,\ \theta\neq\theta_0$$
$$\Rightarrow P_{\theta_0}\{x\in A(\theta)\}\le P_{\theta_0}\{x\in A^{*}(\theta)\},$$

which implies that $A(\theta_0)$ will be the acceptance region of a level-$\alpha$ UMP non-randomized test for testing $H_0:\theta=\theta_0$, since $A^{*}(\theta)$ is arbitrary.

Relation between UMPU non-randomized tests and UMAU confidence intervals

Theorem 5.5 Let $A(\theta_0)$ be the acceptance region of a UMPU level-$\alpha$ non-randomized test for testing $H_0:\theta=\theta_0$. Define $S(x)=\{\theta: x\in A(\theta)\}$. Then $S(x)$ is a uniformly most accurate unbiased (UMAU) $(1-\alpha)$ level C.I. for $\theta$.

Proof According to the construction of $S(x)$, it will be a $(1-\alpha)$ level confidence interval for $\theta$. Let $S^{*}(x)=\{\theta: x\in A^{*}(\theta)\}$ corresponding to any other acceptance region $A^{*}(\theta_0)$ of a level-$\alpha$ non-randomized test for testing $H_0:\theta=\theta_0$. Now, since $A(\theta_0)$ is the acceptance region of a level-$\alpha$ UMPU non-randomized test for testing $H_0:\theta=\theta_0$,

$$P_{\theta_0}\{x\in A(\theta)\}\le P_{\theta_0}\{x\in A^{*}(\theta)\}\le 1-\alpha\quad\forall\,\theta,\theta_0\in\Theta,\ \theta\neq\theta_0$$
$$\Rightarrow P_{\theta_0}\{\theta\in S(x)\}\le P_{\theta_0}\{\theta\in S^{*}(x)\}\le 1-\alpha,$$

i.e., $S(x)$ is a UMAU $(1-\alpha)$ level C.I. for $\theta$, since $A^{*}(\theta_0)$ is arbitrary.

Theorem 5.6 Let $S(x)$ be a UMAU $(1-\alpha)$ level confidence interval for $\theta$. Define $A(\theta)=\{x:\theta\in S(x)\}$. Then $A(\theta_0)$ will be an acceptance region of a level-$\alpha$ UMPU test for testing $H_0:\theta=\theta_0$.

Proof According to the construction of $A(\theta)$, $A(\theta_0)$ will be the acceptance region of a level-$\alpha$ non-randomized test for testing $H_0:\theta=\theta_0$. Now, corresponding to any other $(1-\alpha)$ level C.I. $S^{*}(x)$ for $\theta$, let $A^{*}(\theta)=\{x:\theta\in S^{*}(x)\}$; then $A^{*}(\theta)$ will also be an acceptance region of a level-$\alpha$ non-randomized test for testing $H_0:\theta=\theta_0$. Since $S(x)$ is a UMAU $(1-\alpha)$ level C.I.,

$$P_{\theta_0}\{\theta\in S(x)\}\le P_{\theta_0}\{\theta\in S^{*}(x)\}\le 1-\alpha\quad\forall\,\theta,\theta_0\in\Theta,\ \theta\neq\theta_0$$
$$\Rightarrow P_{\theta_0}\{x\in A(\theta)\}\le P_{\theta_0}\{x\in A^{*}(\theta)\}\le 1-\alpha,$$

i.e. $A(\theta_0)$ will be an acceptance region of a level-$\alpha$ UMPU test for testing $H_0:\theta=\theta_0$.

Example 5.11 Let $X_1,X_2,\ldots,X_n$ be a r.s. from $R(0,\theta)$. The UMP level-$\alpha$ non-randomized test for testing $H_0:\theta=\theta_0$ against $\theta\neq\theta_0$ is given by the critical region $x_{(n)}>\theta_0$ or $x_{(n)}\le\theta_0\sqrt[n]{\alpha}$.

Let $A(\theta)=\left\{x:\theta\sqrt[n]{\alpha}<x_{(n)}\le\theta\right\}$, and define

$$S(x)=\{\theta: x\in A(\theta)\}
=\left\{\theta:\theta\sqrt[n]{\alpha}<x_{(n)}\le\theta\right\}
=\left\{\theta: x_{(n)}\le\theta<\frac{x_{(n)}}{\sqrt[n]{\alpha}}\right\}.$$

Thus, by Theorem 5.3, $S(x)=\left\{\theta: x_{(n)}\le\theta<\dfrac{x_{(n)}}{\sqrt[n]{\alpha}}\right\}$ will be a $(1-\alpha)$ level UMA confidence interval for $\theta$.

Chapter 6

Non-parametric Test

6.1 Introduction

In parametric tests we generally assume a particular form of the population distribution (say, the normal distribution) from which a random sample is drawn, and we try to construct a test criterion (for testing a hypothesis regarding a parameter of the population); the distribution of the test criterion depends upon the parent population.

In non-parametric tests the form of the parent population is unknown. We only

assume that the population, from which a random sample is drawn, is continuous

and try to develop a test criterion whose distribution is independent of the popu-

lation distribution under the hypothesis under consideration. A non-parametric test

is concerned with the form of the population but not with any parametric value.

A test procedure is said to be distribution free if the statistic used has a distri-

bution which does not depend upon the form of the distribution of the parent

population from which the sample is drawn. So in such procedure assumptions

regarding the population are not necessary.

Note Sometimes the term ‘distribution free’ is used instead of non-parametric. But

we should make some distinction between them.

In fact, the terms ‘distribution free’ and ‘non-parametric’ are not synonymous.

The term ‘distribution free’ is used to indicate the nature of the distribution of the

test statistic whereas the term ‘non-parametric’ is used to indicate the type of

hypothesis problem investigated.

Advantages and disadvantages of non-parametric method over parametric

method

Advantages

(i) Non-parametric methods are readily comprehensible, very simple and easy to

apply and do not require complicated sample theory.


(ii) No assumption is made about the form of frequency function of the parent

population from which the sample is drawn.

(iii) No parametric technique will be applicable to the data which are mere clas-

siﬁcation (i.e. which are measured in nominal scale), while non-parametric

method exists to deal with such data.

(iv) Since the socio-economic data are not, in general, normally distributed,

non-parametric tests have found applications in psychometry, sociology and

educational statistics.

(v) Non-parametric tests are available to deal with data which are given in ranks

or whose seemingly numerical scores have the strength of the ranks. For

example, no parametric test can be applied if the scores are given in grades

such as A, B, C, D, etc.

Disadvantages

(i) Non-parametric tests can be used when the measurements are only nominal or ordinal; but even then, if a parametric test exists, it is more powerful than the non-parametric test.

In other words, if all the assumptions of a statistical model are satisﬁed by the

data and if the measurements are of required strength, then non-parametric

tests are wasteful of time and data.

(ii) No non-parametric method exists for testing interactions in ANOVA model

unless special assumptions about the additivity of the model are made.

(iii) Non-parametric tests are designed to test statistical hypothesis only but not for

estimating parameters.

The following one-sample non-parametric tests are discussed in the next section:

(i) Chi-square test
(ii) Kolmogorov–Smirnov test
(iii) Sign test
(iv) Wilcoxon signed-rank test
(v) Run test

6.2 One-Sample Non-parametric Tests

(i) Chi-square test: Suppose a random sample of size $n$ drawn from the population under consideration is grouped into $k$ mutually exclusive classes $A_1,A_2,\ldots,A_k$, or the observations themselves are frequencies of $k$ mutually exclusive events. The form of the distribution is not known. We want to test $H_0: F(x)=F_0(x)$ against $H_1: F(x)\neq F_0(x)$. Here $F_0(x)$ is specified with all its parameters.

Under $H_0$ we can obtain the probability $p_i$ that a random observation from $F_0$ belongs to the $i$th class $A_i$ $(i=1,2,\ldots,k)$. The expected frequency in the $i$th class is $e_i=np_i$ for $i=1,2,\ldots,k$. These are compared with the observed frequencies $x_i$. Pearson suggested the statistic

$$\chi^2=\sum_{i=1}^{k}\frac{(x_i-np_i)^2}{np_i}.$$

If the agreement between the observed frequencies $(x_i)$ and the expected frequencies $(e_i)$ is close, then the differences $(x_i-np_i)$ will be small, and consequently $\chi^2$ will be small; otherwise it will be large. The larger the value of $\chi^2$, the more likely it is that the observed frequencies did not come from the population under $H_0$. This means that the test is always right-sided. It can be shown that for large samples the sampling distribution of $\chi^2$ under $H_0$ follows the chi-square distribution with $(k-1)$ d.f. The approximation holds good if every $e_i\ge 5$. In case there are some $e_i<5$, we have to combine adjacent classes till the expected frequency in the combined class is at least 5; then $k$ will be the actual number of classes used in computing $\chi^2$. Thus the null hypothesis $H_0$ is rejected if the calculated $\chi^2>\chi^2_{\alpha;k-1}$.

(ii) Kolmogorov–Smirnov test: Here we want to test $H_0: F(x)=F_0(x)\ \forall x$ against $H_1: F(x)\neq F_0(x)$ for some $x$. Suppose $F_n(x)$ is the sample (empirical) distribution function; that is, if the number of sample observations $\le x$ is $k$, then

$$F_n(x)=\frac{k}{n}.$$

The test statistic is $D_n=\operatorname{Sup}_x|F_n(x)-F_0(x)|$. The distribution of $D_n$ does not depend on $F_0$ as long as $F_0$ is continuous. $H_0$ is rejected if $D_n>D_{n;\alpha}$.


For one-sided alternatives the appropriate statistics are the following:

(i) for the alternative $H^+: F(x)\ge F_0(x)\ \forall x$, the appropriate statistic is $D_n^+=\operatorname{Sup}_x\left[F_n(x)-F_0(x)\right]$;

(ii) for the alternative $H^-: F(x)\le F_0(x)\ \forall x$, the appropriate statistic is $D_n^-=\operatorname{Sup}_x\left[F_0(x)-F_n(x)\right]$.

The statistics $D_n^+$ and $D_n^-$ have the same distribution because of symmetry. The test rejects $H_0$ if $D_n^+>D_{n;\alpha}^+$ when the alternative is $F(x)\ge F_0(x)\ \forall x$, and rejects $H_0$ if $D_n^->D_{n;\alpha}^-$ when the alternative is $F(x)\le F_0(x)\ \forall x$, at the level $\alpha$.

(iii) Sign test: Let $F(x)$ be the distribution function of the population, assumed continuous but unknown, from which we draw a random sample $(x_1,x_2,\ldots,x_n)$. We define $\xi_p=p$th order population quantile,

$$\Rightarrow\Pr[X\le\xi_p]=p,\ \text{i.e.}\ \Pr[X-\xi_p\le 0]=p.$$

We want to test $H_0:\xi_p=\xi_p^0$.

Case 1 $H_1:\xi_p>\xi_p^0$

To perform the test we consider the number of positive quantities among $x_1-\xi_p^0,x_2-\xi_p^0,\ldots,x_n-\xi_p^0$. Sample values equal to $\xi_p^0$ are ignored. Suppose $S=$ total number of $+$ signs. We note that, under $H_0$,

$$\Pr[X-\xi_p^0\le 0]=p\Rightarrow\Pr[X-\xi_p^0>0]=1-p=q,\ \text{say}.$$

$\therefore$ Under $H_0$, $S\sim B(n,q)$.

Also, under $H_1$, $\Pr[X\le\xi_p^0]<p$, i.e. $\Pr[X-\xi_p^0>0]>q$. Suppose, under $H_1$, $\Pr[X-\xi_p^0>0]=q'$ where $q'>q$.

$\therefore$ Under $H_1$, $S\sim B(n,q')$ where $q'>q$.


So a large value of $S$ indicates the rejection of $H_0$, and the test is

$$\phi(S)=\begin{cases}1 & \text{if } S>s\\ a & \text{if } S=s\\ 0 & \text{if } S<s\end{cases}$$

where $s$ and $a$ are such that

(I) $\Pr[S>s|H_0]<\alpha\le\Pr[S\ge s|H_0]$
(II) $E_{H_0}\phi(S)=\alpha$

From (I) we get $s$, and from (II), $\alpha=\Pr[S>s|H_0]+a\Pr[S=s|H_0]$

$$\Rightarrow a=\frac{\alpha-\Pr[S>s|H_0]}{\Pr[S=s|H_0]}.$$

Thus $S>s$ leads to rejection of $H_0$; $S<s$ leads to acceptance of $H_0$; and $S=s$ leads to a randomized decision: we draw a random number with probability of rejection $a$ and probability of acceptance $1-a$.

Case 2 $H_2:\xi_p<\xi_p^0$

Under $H_2$,

$$\Pr[X\le\xi_p^0]>p\Rightarrow\Pr[X-\xi_p^0>0]<q.$$

Suppose under $H_2$, $\Pr[X-\xi_p^0>0]=q'$ where $q'<q$.

$\therefore$ Under $H_2$, $S\sim B(n,q')$ where $q'<q$, so a small value of $S$ indicates the rejection of $H_0$. The test is

$$\phi(S)=\begin{cases}1 & \text{if } S<s\\ a & \text{if } S=s\\ 0 & \text{if } S>s\end{cases}$$


where $s$ and $a$ are such that

$$\Pr[S<s|H_0]<\alpha\le\Pr[S\le s|H_0]\quad\text{and}\quad E_{H_0}\phi(S)=\alpha,$$

i.e. $\Pr[S<s|H_0]+a\Pr[S=s|H_0]=\alpha$, or

$$a=\frac{\alpha-\Pr[S<s|H_0]}{\Pr[S=s|H_0]}.$$

Thus $S<s$ leads to rejection of $H_0$; $S>s$ leads to acceptance of $H_0$; and $S=s$ leads to drawing a random number with probability of rejection $a$ and probability of acceptance $(1-a)$.

Large sample test: Under $H_0$, $S\sim B(n,q)$,

$$\Rightarrow\ \text{under } H_0,\ \tau=\frac{S-nq}{\sqrt{npq}}\sim N(0,1)\ \text{approximately};\qquad\omega_0:\tau<-\tau_\alpha.$$

Case 3 $H_3:\xi_p\neq\xi_p^0$

Suppose under $H_3$, $\Pr[X-\xi_p^0>0]=q'$ where $q'\neq q$.

$\therefore$ Under $H_3$, $S\sim B(n,q')$ where $q'\neq q$. So a small or a large value of $S$ indicates the rejection of $H_0$. Here the test is

$$\phi(S)=\begin{cases}1 & \text{if } S<s_1\\ a_1 & \text{if } S=s_1\\ 0 & \text{if } s_1<S<s_2\\ a_2 & \text{if } S=s_2\\ 1 & \text{if } S>s_2\end{cases}$$

where $s_1$ and $s_2$ are such that

$$\Pr[S<s_1|H_0]<\alpha/2\le\Pr[S\le s_1|H_0],\qquad
\Pr[S>s_2|H_0]<\alpha/2\le\Pr[S\ge s_2|H_0],$$

and $a_1$ and $a_2$ are such that

$$\alpha/2=\Pr[S<s_1|H_0]+a_1\Pr[S=s_1|H_0]
\Rightarrow a_1=\frac{\alpha/2-\Pr[S<s_1|H_0]}{\Pr[S=s_1|H_0]},$$

$$\alpha/2=\Pr[S>s_2|H_0]+a_2\Pr[S=s_2|H_0]
\Rightarrow a_2=\frac{\alpha/2-\Pr[S>s_2|H_0]}{\Pr[S=s_2|H_0]}.$$

We accept $H_0$ if $s_1<S<s_2$, and take a randomized decision if $S=s_1$ or $S=s_2$.

Large sample test: Under $H_0$, $S\sim B(n,q)$,

$$\Rightarrow\ \text{under } H_0,\ \tau=\frac{S-nq}{\sqrt{npq}}\sim N(0,1)\ \text{approximately};\qquad\omega_0:|\tau|>\tau_{\alpha/2}.$$

In particular, for the median ($p=q=\tfrac{1}{2}$), under $H_0$, $S\sim B\left(n,\tfrac{1}{2}\right)$, and the distribution of $S$ is then symmetric about $\tfrac{n}{2}$. Therefore, for the two-sided test of Case 3, $s_1-\tfrac{n}{2}=-\left(s_2-\tfrac{n}{2}\right)$, i.e. $s_1=n-s_2$, and hence $a_1=a_2$.

(iv) Wilcoxon signed-rank test: Another similar modification of the sign test is the Wilcoxon signed-rank test. This is used to test the hypothesis that the observations have come from a symmetrical population with a common specified median, say $\mu_0$. Thus the problem is to test $H_0:\mu=\mu_0$. The signed-rank statistic $T^+$ is computed as follows:

1. Subtract $\mu_0$ from each observation.
2. Rank the resulting differences in order of size, discarding sign.
3. Restore the sign of the original difference to the corresponding rank.
4. Obtain $T^+$, the sum of the positive ranks.

Similarly, $T^-$ is the sum of the negative ranks. Under $H_0$, we expect $T^+$ and $T^-$ to be nearly the same. We also note that

$$T^++T^-=\sum_{i=1}^{n}i=\frac{n(n+1)}{2}.$$

A large value of $T^+$ (or equivalently, a small value of $T^-$) means that most of the large deviations from $\mu_0$ are positive, and therefore we reject $H_0$ in favour of the alternative $H_1:\mu>\mu_0$. Thus the test rejects $H_0$ at the level $\alpha$:

if $T^+<C_1$ when $H_1:\mu<\mu_0$;
if $T^+>C_2$ when $H_1:\mu>\mu_0$;
if $T^+<C_3$ or $T^+>C_4$ when $H_1:\mu\neq\mu_0$,

where $C_1,C_2,C_3$ and $C_4$ are such that

$$P[T^+<C_1]=\alpha,\qquad P[T^+>C_2]=\alpha,\qquad P[T^+<C_3]+P[T^+>C_4]=\alpha.$$

(v) Run test: Suppose we want to test $H_0$: the observations are random, against $H_1$: they are not random.

We replace each observation by a '+' or '−' sign according as it is larger or smaller than the median of the sample observations. Any observation equal to the median is simply discarded. A run is defined to be a sequence of values of the same kind bounded by values of the other kind. We compute the total number of runs, $r$. Too large a value of $r$ as well as too small a value of $r$ gives an indication of non-randomness. Thus the test rejects $H_0$ at the level $\alpha$ if $r<r_1$ or $r>r_2$, where $r_1$ and $r_2$ are such that

$$P[r<r_1|H_0]\le\alpha/2\quad\text{and}\quad P[r>r_2|H_0]\le\alpha/2.$$

The one-sample run test is based on the order or sequence in which the individual scores or observations originally were obtained.

Example 6.1 The theory predicts that the proportion of peas in the four groups A,

B, C and D should be 9:3:3:1. In an experiment among 556 peas, the numbers in the

four groups were 315, 108, 101 and 32. Does the experimental result support the

theory?

Solution If $P_1,P_2,P_3$ and $P_4$ are the proportions of peas in the four classes in the whole population of peas, then the null hypothesis to be tested is

$$H_0: P_1=\frac{9}{16},\ P_2=\frac{3}{16},\ P_3=\frac{3}{16},\ P_4=\frac{1}{16}.$$

The test statistic is

$$\chi^2=\sum_{i=1}^{k}\frac{(x_i-np_i^0)^2}{np_i^0}\ \text{with}\ (k-1)\ \text{d.f.}
=\sum_{i=1}^{k}\frac{x_i^2}{np_i^0}-n.$$


The expected frequencies are

$$e_1=np_1^0=556\times\frac{9}{16}=312.75,\qquad e_2=np_2^0=556\times\frac{3}{16}=104.25,$$
$$e_3=np_3^0=556\times\frac{3}{16}=104.25,\qquad e_4=np_4^0=556\times\frac{1}{16}=34.75.$$

So,

$$\chi^2=\frac{315^2}{312.75}+\frac{108^2}{104.25}+\frac{101^2}{104.25}+\frac{32^2}{34.75}-556=556.47-556=0.47\ \text{with 3 d.f.}$$

From the table we have $\chi^2_{0.05;3}=7.815$. Since the calculated value of $\chi^2$, i.e. 0.47, is less than the tabulated value 7.815, it is not significant. Hence the null hypothesis may be accepted at the 5% level of significance, and we may conclude that the experimental result supports the theory.
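The same computation in a minimal Python sketch:

    import numpy as np
    from scipy import stats

    observed = np.array([315, 108, 101, 32])
    expected = 556 * np.array([9, 3, 3, 1]) / 16    # 312.75, 104.25, 104.25, 34.75
    chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
    # chi2 is about 0.47 with 3 d.f.; p > 0.05, so H0 is not rejected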

Example 6.2 Can the following sample be reasonably regarded as coming from a

uniform distribution on the interval (35,70): 36, 42, 44, 50, 64, 58, 56, 50, 37, 48,

52, 63, 57, 43, 39, 42, 47, 61, 53, 58? Use Kolmogorov–Smirnov test.

Solution Here we test $H_0: F(x)=F_0(x)$ for all $x$, where $F_0(x)$ is the distribution function of the uniform distribution on the interval (35, 70). Now

$$F_0(x)=\begin{cases}0 & \text{if } x\le 35\\ \dfrac{x-35}{35} & \text{if } 35<x<70\\ 1 & \text{if } x\ge 70\end{cases}$$

Arranging the sample values in ascending order, we obtain the following results:

x     F0(x)   F20(x)   |F20(x) − F0(x)|
36    1/35    1/20     3/140
37    2/35    2/20     6/140
39    4/35    3/20     5/140
42    7/35    4/20     0
42    7/35    5/20     7/140
43    8/35    6/20     10/140
44    9/35    7/20     13/140
47    12/35   8/20     8/140
48    13/35   9/20     11/140
50    15/35   10/20    10/140
50    15/35   11/20    17/140
52    17/35   12/20    16/140
53    18/35   13/20    19/140
56    21/35   14/20    14/140
57    22/35   15/20    17/140
58    23/35   16/20    20/140
58    23/35   17/20    27/140
61    26/35   18/20    22/140
63    28/35   19/20    21/140
64    29/35   20/20    24/140

$$D_{20}=\operatorname{Sup}_x|F_{20}(x)-F_0(x)|=\frac{27}{140}=0.1929.$$

Let us take $\alpha=0.05$. From the table, $D_{20;0.05}=0.294$. Since $0.1929<0.294$, we accept $H_0$ at the 5% level of significance, and we can conclude that the given data may have come from a uniform distribution on the interval (35, 70).
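A minimal Python sketch of the same test; SciPy evaluates the supremum on both sides of each jump, so it should reproduce $D\approx 0.193$ here:

    import numpy as np
    from scipy import stats

    x = np.array([36, 42, 44, 50, 64, 58, 56, 50, 37, 48,
                  52, 63, 57, 43, 39, 42, 47, 61, 53, 58])
    # uniform on (35, 70) is 'uniform' with loc=35, scale=35
    D, p = stats.kstest(x, 'uniform', args=(35, 35))
    # D = Sup|F_n - F_0|; H0 is not rejected when p > 0.05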

Example 6.3 The following data represent the yields of maize in q/ha recorded

from an experiment.

16.4, 19.2, 24.5, 15.4, 17.3, 23.6, 22.7, 20.9, 18.2

Test whether the median yield (M) is 20 q/ha.

Solution We test $H_0: M=20$ against $H_1: M\neq 20$. To test $H_0$, we find the differences $(X-20)$ and write down their signs:

$$-\ \ -\ \ +\ \ -\ \ -\ \ +\ \ +\ \ +\ \ -$$

Here $n=9$ and $r=$ number of '+' signs $=4$. This $r$ will be a binomial variate with parameters $n=9$ and $p=0.5$.

To test $H_0$ against $H_1: M\neq 20$ (i.e. $H_1: p\neq 0.5$), the critical region $\omega$ will be given by $r\ge r_{\alpha/2}$ and $r\le r'_{\alpha/2}$, where $r_{\alpha/2}$ is the smallest integer and $r'_{\alpha/2}$ is the largest integer such that

$$P\left[r\ge r_{\alpha/2}\,|\,H_0\right]=\sum_{x=r_{\alpha/2}}^{9}\binom{9}{x}\left(\frac{1}{2}\right)^9\le\frac{\alpha}{2}=0.025,
\quad\text{i.e.}\quad\sum_{x=0}^{r_{\alpha/2}-1}\binom{9}{x}\left(\frac{1}{2}\right)^9\ge 0.975,$$

and

$$P\left[r\le r'_{\alpha/2}\,|\,H_0\right]=\sum_{x=0}^{r'_{\alpha/2}}\binom{9}{x}\left(\frac{1}{2}\right)^9\le\frac{\alpha}{2}=0.025.$$

From the table we have $r_{\alpha/2}-1=7$, i.e. $r_{\alpha/2}=8$, and $r'_{\alpha/2}=1$. Here $r'_{\alpha/2}=1<r=4<r_{\alpha/2}=8$, so $H_0$ is accepted at the 5% level of significance.
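A minimal Python sketch of the exact sign test, assuming SciPy 1.7 or later (earlier versions used the function binom_test instead):

    from scipy import stats

    n_plus, n_nonzero = 4, 9            # '+' signs among non-zero differences
    # under H0 the number of '+' signs is B(9, 0.5); two-sided exact test:
    res = stats.binomtest(n_plus, n_nonzero, p=0.5, alternative='two-sided')
    # res.pvalue > 0.05, so H0: M = 20 is accepted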

Example 6.4 For the problem given in Example 6.3, test H0 : M ¼ 20 against

H1 : M 6¼ 20 by using Wilcoxon signed-rank test.

Solution The differences $X_i-20$ are

$$-3.6,\ -0.8,\ 4.5,\ -4.6,\ -2.7,\ 3.6,\ 2.7,\ 0.9,\ -1.8$$

The ordered sequence of differences, ignoring sign, with ranks carrying the original signs, is

$$-1,\ 2,\ -3,\ 4.5,\ -4.5,\ 6.5,\ -6.5,\ 8,\ -9$$

Thus $T^+=$ the sum of the positive ranks $=2+4.5+6.5+8=21$ and $T^-=$ the sum of the negative ranks $=24$. We note that $T^++T^-=\frac{n(n+1)}{2}=45$.

To test $H_0: M=20$ against $H_1: M\neq 20$, the critical region $\omega$ will be given by $T^+>C_4$ or $T^+<C_3$ at the level $\alpha$; here we take $\alpha=0.05$. From the table we have $P[T^+>39]\le 0.025$ and $P[T^+<6]\le 0.025$. Since $6<T^+=21<39$, we accept $H_0$ at the 5% level of significance and conclude that the median yield of maize may be taken as 20 q/ha.
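A minimal Python sketch of the same test; note that with tied absolute differences SciPy falls back on a normal approximation for the p-value:

    import numpy as np
    from scipy import stats

    x = np.array([16.4, 19.2, 24.5, 15.4, 17.3, 23.6, 22.7, 20.9, 18.2])
    d = x - 20                            # differences from hypothesised median
    stat, p = stats.wilcoxon(d)           # stat = min(T+, T-) = 21 here
    # p > 0.05, so H0: M = 20 is accepted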

Example 6.5 Test whether the observations

21, 19, 22, 18, 20, 24, 15, 32, 35, 28, 30 are random.

Solution We test $H_0$: the observations are random, against $H_1$: the observations are not random. The sample values arranged in increasing order are

$$15,\ 18,\ 19,\ 20,\ 21,\ 22,\ 24,\ 28,\ 30,\ 32,\ 35\quad\Rightarrow\quad\text{Median}=22.$$

Each original observation is replaced by a '+' or '−' sign according as it is larger or smaller than the median, i.e. 22. Any observation equal to the median is simply discarded. Thus we have, from the original observations,

21 19 22 18 20 24 15 32 35 28 30
−  −  ×  −  −  +  −  +  +  +  +

Here $r=$ number of runs $=4$, number of '−' signs $=n_1=5$ and number of '+' signs $=n_2=5$. From the table, for $n_1=5$, $n_2=5$, any observed $r$ of 2 or less, or of 10 or more, is in the region of rejection at the 5% level of significance. Since $r=4$ lies between these limits, $H_0$ is accepted, i.e. the observations may be regarded as random.
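For larger samples the run count and its normal approximation can be computed directly; a small sketch, where the helper name runs_test is ours (for $n_1,n_2$ as small as 5 the exact tables used above are preferable):

    import numpy as np
    from scipy import stats

    def runs_test(signs):
        # signs: list of '+'/'-' after dropping values equal to the median
        r = 1 + sum(signs[i] != signs[i - 1] for i in range(1, len(signs)))
        n1, n2 = signs.count('-'), signs.count('+')
        n = n1 + n2
        mean = 2 * n1 * n2 / n + 1
        var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
        z = (r - mean) / np.sqrt(var)
        return r, 2 * stats.norm.sf(abs(z))    # runs, approximate two-sided p

    r, p = runs_test(['-', '-', '-', '-', '+', '-', '+', '+', '+', '+'])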

Example 6.6 The males (M) and females (F) were queued in front of the railway

reservation counter in the order below

M F F M M M F M F F M M F M

Test whether the order of males and females in the queue was random.

Solution Here null hypothesis is

H0 : The order of males and females in the queue was random against

H1 : The order of males and females in the queue was not random.

For the given sequence,

MFFMMMFMFFMMFM

we have,

n1 = number of males = 8

n2 = number of females = 6

r = number of runs = 9

Since the observed value of r = 9 lies between the critical values 3 and 12, we

accept H0 at 5 % level of signiﬁcance. It means that the order of males and females

in the queue was random.

6.3 Paired Sample Non-parametric Test

For paired samples the following non-parametric tests are used: (i) the sign test and (ii) the Wilcoxon signed-rank test.

(i) Paired sample sign test: Let $(X,Y)$ have a bivariate distribution function $F(x,y)$ which is unknown but continuous. The ordinary sign test for the location parameter of a univariate population is equally applicable to a paired-sample problem. This is the non-parametric version of the paired 't' test.


We draw a random sample $(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)$ from $F(x,y)$. To test $H_0:\xi_p(x-y)=\xi_p^0$, write $z=x-y$, so that $H_0:\xi_p(z)=\xi_p^0$, i.e. $H_0:\xi_p=\xi_p^0$, writing $\xi_p(z)=\xi_p$.

Assumption: $z=x-y$ is continuous in the neighbourhood of $\xi_p(z)$. Note that $\Pr\left[z\le\xi_p\right]=p\Rightarrow\Pr\left[z-\xi_p>0\right]=q$, $q=1-p$. We define $S=$ total number of positive signs among $z_1-\xi_p^0,z_2-\xi_p^0,\ldots,z_n-\xi_p^0$.

$\therefore$ Under $H_0$, $\Pr\left[z-\xi_p^0>0\right]=q$ and $S\sim B(n,q)$. Proceed as in Case 1, Case 2 and Case 3, worked out already in Sect. 6.2.

Note Since $\xi_p(x-y)$ is not necessarily equal to $\xi_p(x)-\xi_p(y)$, the paired sample sign test is a test for the quantile difference (but not for the difference of the quantiles), whereas the paired 't' test is a test for the mean difference (and also for the difference of the means).

This is another test used on matched pairs. It is more powerful than the sign test

because it gives more weight to large numerical differences between the members

of a pair than to small differences. Under matched-paired samples, the differences

d within n paired sample values ðx1i ; x2i Þ for i ¼ 1; 2; . . .; n are assumed to have

come from continuous and symmetric population differences. If Md is the median of

the population of differences, then the null hypotheses is that Md ¼ 0 and the

alternative hypothesis is one of Md [ 0; Md \0 or Md 6¼ 0:

The observed differences di ¼ x1i x2i are ranked in increasing order of abso-

lute magnitude and the sum of ranks is computed for all the differences of like sign.

The test statistic $T$ is the smaller of these two rank-sums. Pairs with $d_i=0$ are not counted. On the null hypothesis, the expected values of the two rank-sums would be

equal. If the positive rank-sum is the smaller and is equal to or less than the table

value, the null hypothesis will be rejected at the corresponding level of signiﬁcance

a in favour of the alternative hypothesis that Md [ 0. If the negative rank-sum is the

smaller, the alternative will be that Md \0. If a two-tailed test is required, the

alternative being that Md 6¼ 0, the given levels of signiﬁcance should be doubled.

Example 6.7 For nine animals, tested under control conditions and experimental

conditions, the following values of a measured variable were observed:

Animal:            1   2   3   4   5   6    7   8   9
Control (x1):      21  24  26  32  55  82   46  55  88
Experimental (x2): 18  9   23  26  82  199  42  30  62


Test whether a signiﬁcant difference exists between the medians, using (i) the

sign test and (ii) the Wilcoxon signed-ranks test.

Solution Let h be the median of the distribution of differences. Our null hypothesis

will be H0 : h ¼ 0 against H1 : h 6¼ 0.

(i) Let $d_i=x_{1i}-x_{2i}$ be the difference of the values under control and experimental conditions.

Here we have 7 '+' signs among 9 non-zero values. Under $H_0$, the number $r$ of '+' signs will follow a binomial distribution with parameters $n=9$ and $p=0.5$. To test $H_0:\theta=0$ (i.e. $H_0:p=0.5$) against $H_1:\theta\neq 0$ (i.e. $H_1:p\neq 0.5$), the critical region $\omega$ will be given by $r\ge r_{\alpha/2}$ and $r\le r'_{\alpha/2}$, where $r_{\alpha/2}$ is the smallest integer and $r'_{\alpha/2}$ is the largest integer such that

$$P\left[r\ge r_{\alpha/2}\,|\,H_0\right]=\sum_{x=r_{\alpha/2}}^{9}\binom{9}{x}\left(\frac{1}{2}\right)^9\le\frac{\alpha}{2}=0.025,
\quad\text{i.e.}\quad\sum_{x=0}^{r_{\alpha/2}-1}\binom{9}{x}\left(\frac{1}{2}\right)^9\ge 0.975,$$

and

$$P\left[r\le r'_{\alpha/2}\,|\,H_0\right]=\sum_{x=0}^{r'_{\alpha/2}}\binom{9}{x}\left(\frac{1}{2}\right)^9\le\frac{\alpha}{2}=0.025.$$

From the table we get $r_{\alpha/2}-1=7\Rightarrow r_{\alpha/2}=8$ and $r'_{\alpha/2}=1$. For our example, $r=7$, which lies between $r'_{\alpha/2}(=1)$ and $r_{\alpha/2}(=8)$. So $H_0$ is accepted.

(ii) The observed differences $d_i=x_{1i}-x_{2i}$ are ranked in increasing order of absolute magnitude, and the sum of the ranks is computed for all the differences of like sign. Thus:

d_i:   3    15   3    6   −27   −117   4   25   26
Rank:  1.5  5    1.5  4   8     9      3   6    7

The test statistic $T$ is the smaller of the two rank-sums (one for positive $d_i$ and one for negative $d_i$): the positive rank-sum is $1.5+5+1.5+4+3+6+7=28$ and the negative rank-sum is $8+9=17$, so $T=17$. From the table, we reject $H_0$ at $\alpha=0.05$ if either $T>39$ or $T<6$. Since $6<T=17<39$, we accept $H_0$.


6.4 Two-Sample Problem

We take two univariate populations with distribution functions $F_1(x)$ and $F_2(x)$, which are unknown but continuous.

Case I Assumption: The two populations differ only in location.

To test $H_0: F_1(x)=F_2(x)$ against $H_1$: $F_2(x)$ is located to the right of $F_1(x)$, i.e. $H_0: F_1(x)=F_2(x)$ against $H_1: F_1(x)\ge F_2(x)$.

We draw a random sample $(x_1,x_2,\ldots,x_{n_1})$ of size $n_1$ from the first population, and another sample $(x_{n_1+1},x_{n_1+2},\ldots,x_{n_1+n_2})$ of size $n_2$ from the second population. We write $F_1(x)=F(x)$ and $F_2(x)=F(x-\delta)$, where $\delta$ is an unknown location parameter. So we are to test $H_0:\delta=0$ against $H_1:\delta>0$.

A. Wilcoxon–Mann–Whitney Rank-Sum Test

We pool the two samples and rank the combined observations. Suppose $(R_1,R_2,\ldots,R_{n_1})$ and $(R_{n_1+1},R_{n_1+2},\ldots,R_{n_1+n_2})$ are the ranks of the 1st and 2nd sample observations, respectively.

[Example: Sample 1 is (10, 7, 9, 11, 3), $n_1=5$, and sample 2 is (20, 5, 17, 8), $n_2=4$. The pooled ordering is

$$3<5<7<8<9<10<11<17<20$$

with ranks $1,2,\ldots,9$. Thus $(R_1=6,R_2=3,R_3=5,R_4=7,R_5=1)$ are the 1st sample ranks and $(R_6=9,R_7=2,R_8=8,R_9=4)$ are the 2nd sample ranks.]

If there is any tie, then the corresponding observation is ignored. Let $S_1,S_2,\ldots,S_{n_2}$ be the ordered ranks of the 2nd sample observations, i.e. $S_1<S_2<\cdots<S_{n_2}$.

[In the example above, $2<4<8<9\Rightarrow R_7=S_1,\ R_9=S_2,\ R_8=S_3,\ R_6=S_4$.]

Define $T=$ sum of the ranks of the 2nd sample observations $=\sum_{j=1}^{n_2}R_{n_1+j}=\sum_{j=1}^{n_2}S_j$.

If $H_1$ is true, then it is expected that the second sample observations are generally of higher ranks, and hence $T$ will be large. So a right-tail test will be appropriate here.

Hence, for testing $H_0:\delta=0$ against $H_1:\delta>0$, $\omega_0: T>t_\alpha$, where $t_\alpha$ is such that $\Pr[T>t_\alpha|H_0]\le\alpha$. Similarly, for $H_0:\delta=0$ against $H_2:\delta<0$, $\omega_0: T<t'_\alpha$, where $t'_\alpha$ is such that $\Pr[T<t'_\alpha|H_0]\le\alpha$; and for $H_0:\delta=0$ against $H_3:\delta\neq 0$, $\omega_0: T<t_1$ or $T>t_2$, where $t_1$ and $t_2$ are such that

$$P[T<t_1|H_0]+P[T>t_2|H_0]\le\alpha.$$

Distribution of $T$ under $H_0$: under $H_0$, $x_1,x_2,\ldots,x_{n_1},x_{n_1+1},\ldots,x_{n_1+n_2}$ are i.i.d., so that the second sample ranks can be considered as a random sample of size $n_2$ drawn without replacement from $(1,2,\ldots,n)$, $n=n_1+n_2$.


Recalling the mean and variance of the finite population $(1,2,\ldots,n)$, viz. $\mu=\frac{n+1}{2}$ and $\sigma^2=\frac{n^2-1}{12}$, we get

$$E\left(\frac{T}{n_2}\,\Big|\,H_0\right)=\mu=\frac{n+1}{2}\Rightarrow E(T|H_0)=\frac{n_2(n+1)}{2},$$

$$V\left(\frac{T}{n_2}\,\Big|\,H_0\right)=\frac{n-n_2}{n-1}\cdot\frac{\sigma^2}{n_2}=\frac{n_1}{n-1}\cdot\frac{n^2-1}{12\,n_2}=\frac{n_1(n+1)}{12\,n_2}
\;\Rightarrow\;V(T|H_0)=\frac{n_1n_2(n+1)}{12}.$$

For large samples, under $H_0$,

$$\tau=\frac{T-\dfrac{n_2(n+1)}{2}}{\sqrt{n_1n_2(n+1)/12}}\ \text{is asymptotically}\ N(0,1).$$

$\therefore$ For $H_0:\delta=0$ against $H_1:\delta>0$, $\omega_0:\tau>\tau_\alpha$; against $H_2:\delta<0$, $\omega_0:\tau<-\tau_\alpha$; and against $H_3:\delta\neq 0$, $\omega_0:|\tau|>\tau_{\alpha/2}$.

Mann–Whitney statistic

An alternative description of the test is more convenient. Let

$$g(x_i,x_{n_1+j})=\begin{cases}1 & \text{if } x_{n_1+j}>x_i\\ 0 & \text{otherwise}\end{cases}\qquad i=1(1)n_1,\ j=1(1)n_2.$$

Then $U=$ number of pairs in which the 2nd sample observation is greater than the 1st sample observation

$$=\sum_{j=1}^{n_2}\sum_{i=1}^{n_1}g(x_i,x_{n_1+j})=\sum_{j=1}^{n_2}\sum_{i=1}^{n_1}g(R_i,R_{n_1+j})=\sum_{j=1}^{n_2}\sum_{i=1}^{n_1}g(R_i,S_j),$$

the number of pairs in which the 2nd sample ranks are greater than the 1st sample ranks. Since $\sum_{i=1}^{n_1}g(R_i,S_j)=$ number of 1st sample ranks which are less than $S_j$ $=(S_j-1)-(j-1)$, we get

$$U=\sum_{j=1}^{n_2}\left\{(S_j-1)-(j-1)\right\}=\sum_{j=1}^{n_2}(S_j-j)=T-\frac{n_2(n_2+1)}{2}$$

$$\Rightarrow\;E(U|H_0)=E(T|H_0)-\frac{n_2(n_2+1)}{2}=\frac{n_1n_2}{2},\qquad
V(U|H_0)=V(T|H_0)=\frac{n_1n_2(n+1)}{12}.$$

For large samples, under $H_0$,

$$\tau=\frac{U-\dfrac{n_1n_2}{2}}{\sqrt{n_1n_2(n+1)/12}}\ \stackrel{a}{\sim}\ N(0,1).$$

Therefore:

(1) For $H_0:\delta=0$ against $H_1:\delta>0$, $\omega_0:\tau>\tau_\alpha$;
(2) For $H_0:\delta=0$ against $H_2:\delta<0$, $\omega_0:\tau<-\tau_\alpha$;
(3) For $H_0:\delta=0$ against $H_3:\delta\neq 0$, $\omega_0:|\tau|>\tau_{\alpha/2}$.

B. Mood's Median Test

Here we test $H_0: F_1(x)=F_2(x)$ against $H_1: F_1(x)\ge F_2(x)$, i.e. $H_0:\delta=0$ against $H_1:\delta>0$.

We draw a sample $(x_1,x_2,\ldots,x_{n_1})$ of size $n_1$ from the 1st population and another sample $(x_{n_1+1},x_{n_1+2},\ldots,x_{n_1+n_2})$ of size $n_2$ from the 2nd population.

We mix the two samples and arrange them in ascending order of magnitude, say $x_{(1)}<x_{(2)}<\cdots<x_{(n)}$, with $x_{(m)}=$ the combined sample median. Define

$$T=\text{total number of 2nd sample observations}>x_{(m)}=\text{total number of 2nd sample ranks}>m.$$

Here $T$ is the test statistic. Under $H_1$, $T$ would be too large, and hence a right-tail test is appropriate.

So for $H_1:\delta>0$, $\omega_0: T>t_\alpha$, where $t_\alpha$ is such that $P_{H_0}[T\ge t_\alpha]\le\alpha$; for $H_2:\delta<0$, $\omega_0: T<t'_\alpha$, where $P_{H_0}[T\le t'_\alpha]\le\alpha$; and for $H_3:\delta\neq 0$, $\omega_0: T\le t_1$ or $T\ge t_2$, where $t_1,t_2$ are such that $P_{H_0}[T\le t_1]+P_{H_0}[T\ge t_2]\le\alpha$.

Null distribution of $T$: We want to get $P(T=t|H_0)$.

Note that the totality of the pooled ranks $(1,2,\ldots,n)$ is comprised of two subsets: $\{1,2,\ldots,m\}$ and $\{m+1,m+2,\ldots,n\}$. Under $H_0$, the second sample ranks represent a random sample without replacement of size $n_2$ from the entire set. Since $T=$ number of 2nd sample ranks exceeding $m$, the probability that there will be just $t$ members from the 2nd subset in the random sample of size $n_2$ is given by the hypergeometric law:

$$P(T=t|H_0)=\frac{\dbinom{n-m}{t}\dbinom{m}{n_2-t}}{\dbinom{n}{n_2}}$$

$$\Rightarrow\;E(T|H_0)=\frac{n_2(n-m)}{n}\quad\text{and}\quad V(T|H_0)=\frac{n_1n_2\,m(n-m)}{n^2(n-1)}.$$


For large $n$, $m\simeq\frac{n}{2}$, so that $E(T|H_0)\simeq\frac{n_2}{2}$ and $V(T|H_0)\simeq\frac{n_1n_2}{4n}$. Hence, for large $n$, under $H_0$,

$$\tau=\frac{T-\dfrac{n_2}{2}}{\sqrt{n_1n_2/(4n)}}\ \stackrel{a}{\sim}\ N(0,1)$$

$\therefore$ for $H_1:\delta>0$, $\omega_0:\tau>\tau_\alpha$; for $H_2:\delta<0$, $\omega_0:\tau<-\tau_\alpha$; and for $H_3:\delta\neq 0$, $\omega_0:|\tau|>\tau_{\alpha/2}$.

Case II The two populations differ in every respect, i.e. with respect to location, dispersion, skewness, kurtosis, etc.

C. Wald–Wolfowitz Run Test

Here we test $H_0: F_1(x)=F_2(x)$ against $H_1: F_1(x)\neq F_2(x)$.

Here also we arrange the combined sample in ascending order, $x_{(1)}<x_{(2)}<\cdots<x_{(n)}$. Suppose $(R_1,\ldots,R_{n_1})$ are the ranks of the 1st sample observations and $(R_{n_1+1},\ldots,R_{n_1+n_2})$ are the ranks of the 2nd sample observations. According to the ordered arrangement, we write

$$z_a=\begin{cases}0 & \text{if } x_{(a)}\ \text{comes from the 1st sample, i.e. } a\in(R_1,R_2,\ldots,R_{n_1})\\
1 & \text{if } x_{(a)}\ \text{comes from the 2nd sample, i.e. } a\in(R_{n_1+1},R_{n_1+2},\ldots,R_{n_1+n_2})\end{cases}$$

(We note that the 1st sample can be written as $x_{(R_1)},x_{(R_2)},\ldots,x_{(R_{n_1})}$ and the 2nd sample as $x_{(R_{n_1+1})},x_{(R_{n_1+2})},\ldots,x_{(R_{n_1+n_2})}$.)

Thus we get an arrangement of 0's and 1's. Let $U=$ number of '0' runs, $V=$ number of '1' runs, and $W=U+V=$ total number of runs. Here $W$ is our test statistic.

The idea is that if the populations are identical, then the 1st sample and 2nd sample ranks would get thoroughly mixed up, i.e. the runs of '0' and '1' would be thoroughly intermingled, and $W$ would be large. On the other hand, if the two populations are not identical, i.e. if $H_0$ is not true, then the arrangement of runs will be patchy, and $W$ would be too small. Hence a left-tail test would be appropriate.


Null distribution of $W$: It can be shown that, under $H_0$,

$$\Pr[U=u,V=v]=\begin{cases}
\dfrac{2\dbinom{n_1-1}{u-1}\dbinom{n_2-1}{v-1}}{\dbinom{n}{n_1}} & \text{if } u-v=0\\[2ex]
\dfrac{\dbinom{n_1-1}{u-1}\dbinom{n_2-1}{v-1}}{\dbinom{n}{n_1}} & \text{if } |u-v|=1\\[2ex]
0 & \text{if } |u-v|\ge 2
\end{cases}$$

$$\Rightarrow\;P_{H_0}[W=2m]=P_{H_0}\{U=m,V=m\}=\frac{2\dbinom{n_1-1}{m-1}\dbinom{n_2-1}{m-1}}{\dbinom{n}{n_1}}$$

and

$$P_{H_0}[W=2m+1]=\frac{\dbinom{n_1-1}{m-1}\dbinom{n_2-1}{m}+\dbinom{n_1-1}{m}\dbinom{n_2-1}{m-1}}{\dbinom{n}{n_1}}.$$

From this,

$$E(W|H_0)=\frac{2n_1n_2}{n}+1,\qquad
V(W|H_0)=\frac{2n_1n_2}{n(n-1)}\left(\frac{2n_1n_2}{n}-1\right).$$

For large samples, under $H_0$,

$$\tau=\frac{W-E_{H_0}(W)}{\sqrt{V_{H_0}(W)}}\ \stackrel{a}{\sim}\ N(0,1).\tag{6.1}$$

(Note: Since $U$ and $V$ are not independent, the traditional CLT for $W=U+V$ is not applicable here. Still, (6.1) is true, as shown by Wald and Wolfowitz using Stirling's approximation.) Writing $\lambda_1=\frac{n_1}{n}$ and $\lambda_2=\frac{n_2}{n}$, so that $\lambda_1+\lambda_2=1$, we have, approximately,

$$\tau=\frac{W-2n\lambda_1\lambda_2}{\sqrt{4n\,\lambda_1^2\lambda_2^2}}\ \stackrel{a}{\sim}\ N(0,1),$$

and the critical region is $\omega_0:\tau\le-\tau_\alpha$.

D. Kolmogorov–Smirnov Test

Let $X_1,X_2,\ldots,X_{n_1}$ be from $F_1$ and $X_{n_1+1},X_{n_1+2},\ldots,X_n$ be from $F_2$. We are to test $H_0: F_1(x)=F_2(x)\ \forall x$ against

$$H_1: F_1(x)\ge F_2(x)\ \forall x,\ F_1(x)>F_2(x)\ \text{for some } x;$$

or, $H_2: F_1(x)\le F_2(x)\ \forall x,\ F_1(x)<F_2(x)$ for some $x$;
or, $H_3: F_1(x)\neq F_2(x)$ for some $x$.

Let the symbol '#' denote the number of cases satisfying a stated condition. The empirical distribution functions are

$$F_{1n_1}(x)=\frac{\#\,x_a\le x;\ a=1,2,\ldots,n_1}{n_1},\qquad
F_{2n_2}(x)=\frac{\#\,x_b\le x;\ b=n_1+1,n_1+2,\ldots,n}{n_2}.$$

Test statistics:

$$D^+_{n_1,n_2}=\operatorname{Sup}_x\{F_{1n_1}(x)-F_{2n_2}(x)\}\ \text{for } H_1,$$
$$D^-_{n_1,n_2}=\operatorname{Sup}_x\{F_{2n_2}(x)-F_{1n_1}(x)\}\ \text{for } H_2,$$
$$D_{n_1,n_2}=\operatorname{Sup}_x|F_{1n_1}(x)-F_{2n_2}(x)|=\max\left\{D^+_{n_1,n_2},\ D^-_{n_1,n_2}\right\}\ \text{for } H_3.$$

Let the 2nd sample ranks be $R_{n_1+1},\ldots,R_n$ with ordered values $S_1<S_2<\cdots<S_{n_2}$, and similarly let the 1st sample ranks be $R_1,R_2,\ldots,R_{n_1}$ with ordered values $S'_1<S'_2<\cdots<S'_{n_1}$. Since the suprema are attained at the jump points,

$$D^+_{n_1,n_2}=\max\left\{0,\ \max_{i=1,\ldots,n_1}\left(\frac{i}{n_1}-\frac{S'_i-i}{n_2}\right)\right\},\qquad
D^-_{n_1,n_2}=\max\left\{0,\ \max_{j=1,\ldots,n_2}\left(\frac{j}{n_2}-\frac{S_j-j}{n_1}\right)\right\},$$

$$D_{n_1,n_2}=\max\left\{D^+_{n_1,n_2},\ D^-_{n_1,n_2}\right\}.$$

Under $H_0$, the joint distribution of the ranks $(S_1,S_2,\ldots,S_{n_2})$, $(S'_1,S'_2,\ldots,S'_{n_1})$ is independent of the common $F_1=F_2$, and hence $D^+$, $D^-$ and $D$ are distribution free.

Critical region: under $H_0$, we expect $D^+$, $D^-$ and $D$ to be very small. Hence right-tailed tests based on the $D$'s are appropriate.

Asymptotic distribution: for the one-sided test,

$$P_{H_0}\left[\sqrt{\frac{n_1n_2}{n_1+n_2}}\,D^+_{n_1,n_2}\le z\right]\to 1-e^{-2z^2}\ \text{as}\ \min(n_1,n_2)\to\infty,\ z>0.$$

Practically, we find a $z$ such that $e^{-2z^2}=\alpha$ and reject $H_0$ if $\sqrt{\frac{n_1n_2}{n_1+n_2}}\,\left(\text{observed } D^+_{n_1,n_2}\right)\ge z$.

6.4 Two-Sample Problem 165

qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

P

1

ð1Þi1 e2i

2 2

For two sided test PH0 n1 n2

n1 þ n2 Dn1 ;n2 z ! 1 2 z

as min

i¼1

ðn1 ; n2 Þ ! 1:

Advantages of K–S test over Homogeneity v2 test are as follows

1. K–S test is applicable to ungrouped data, while v2 is applicable to grouped data

only.

2. Under H0 K–S is exactly distribution free, while v2 is asymptotically distribu-

tion free.

3. K–S test is consistent against any alternative, while v2 is so for speciﬁc alter-

native only.

Example 6.8 Twelve 4-year-old boys and twelve 4-year-old girls were observed

during two 15 min play sessions and each child’s play during these two periods was

scored as follows for incidence and degree of aggression:

Boys : 86; 69; 72; 65; 113; 65; 118; 45; 141; 104; 41; 50

Girls : 55; 40; 22; 58; 16; 7; 9; 16; 26; 36; 20; 15

Test the hypothesis that there were sex differences in the amount of aggression

shown, using (a) the Wald-Wolfowitz runs test, (b) the Mann–Whitney–Wilcoxon

test and (c) the Kolmogorov–Smirnov test.

Solution We want to test H0 : incidence and degree of aggression are the same in

four-year olds of both sexes against H1 : four-year-old boys and four-year-old girls

display differences in incidence and degree of aggression.

We combine the scores of boys (B’s) and girls (G’s) in a single-ordered series, we

may determine the number of runs of G’s and B’s. The ordered series is given below.

Score 7 9 15 16 16 20 22 26 36 40 41 45 50 55 58

Groups G G G G G G G G G G B B B G G

Runs _________________1___________________________ ____2_____ __3___

Score 65 65 69 72 86 104 113 118 141

Groups B B B B B B B B B

Runs __________________4______________________

From the table for n1 = 12, n2 = 12, we reject H0 at a ¼ 0:05 if r 7. Since our

value of r is smaller than 7, we may reject H0 . So we can conclude that boys and

girls display differences in aggression.

166

G G G G G G G G G G B B B G G B B B B B B B B B

Rank 1 2 3 4.5 4.5 6 7 8 9 10 11 12 13 14 15 16.5 16.5 18 19 20 21 22 23 24

6 Non-parametric Test

6.4 Two-Sample Problem 167

The pooled sample and the ranks are given below:

The sum of the ranks for the observations corresponding to the boys is

R2 ¼ 1 þ 2 þ 3 þ 4:5 þ 4:5 þ 6 þ 7 þ 8 þ 9 þ 10 þ 14 þ 15 ¼ 84

Hence

n2 ðn2 þ 1Þ

U ¼ n1 n2 þ R2

2

¼ 144 þ 78 84 ¼ 138

Or, equivalently,

n1 ðn1 þ 1Þ

U ¼ n1 n2 þ R1

2

¼ 144 þ 78 216 ¼ 6

The test statistic is given by the smaller of the two quantities. Here U = 6. The

other value of U can be obtained from the relation U 0 = n1 n2 − U = 144 – 6 = 138.

The critical value of U for a two-tailed test at a = 0.05 and n1 ¼ n2 = 12 is 37. The

observed U = 6 is less than the table value. Hence it is signiﬁcant at 5 % level.

Hence H0 is rejected.

(c) Kolmogorov–Smirnov test

The scores of the boys and girls are presented in two frequency distributions shown

below:

Score (x) No. of boys No. of girls F12 ðxÞ G12 ðxÞ jF12 ðxÞ G12 ðxÞj

7–20 0 6 0 6/12 6/12

21–34 0 2 0 8/12 8/12

35–48 2 2 2/12 10/12 8/12

49–62 1 2 3/12 12/12 9/12

63–76 4 0 7/12 12/12 5/12

77–90 1 0 8/12 12/12 4/12

91–104 1 0 9/12 12/12 3/12

105–118 2 0 11/12 12/12 1/12

119–132 0 0 11/12 12/12 1/12

133–146 1 0 12/12 12/12 0

168 6 Non-parametric Test

D12;12 ¼ SupjF12 ðxÞ G12 ðxÞj ¼ 9=12. From the table, the critical value for

n1 ¼ n2 ¼ 12 at level a ¼ 0:05 is D12;12;05 ¼ 6=12. Since D12;12 [ D12;12;0:5 , we

reject H0 .

function F(x) which is continuous. We deﬁne functions of sample observations

L ¼ Lðx1 ; x2 ; . . .; xn Þ and U ¼ Uðx1 ; x2 ; . . .; xn Þ such that L < U.

If Pr½PrðL X UÞ b ¼ c ð6:2Þ

then the interval (L,U) is called 100 b% tolerance interval with tolerance coefﬁcient

c. L and U are called lower and upper tolerance limits respectively. If the deter-

mination of c does not depend upon F then the limit (L,U) are called non-parametric

(distribution free) tolerance limits. We note that, (6.2) can be written as,

(x) with tolerance coefﬁcient c is a random interval such that the probability is c that

the area between the endpoints of the interval (L,U) is at least a certain pre-assigned

quantity ‘b’.

If L and U are two-order statistics say xðrÞ and xðsÞ , (r < s), then (6.3) is

equivalent to Pr FðxðsÞ Þ FðxðrÞ Þ b ¼ c.

Wilks has shown that the order statistics provide non-parametric tolerance limits,

while it is Robbins who has shown that it is only the order statistics which provide

distribution free tolerance limits.

Determination of Tolerance Limits

Joint distribution of xðrÞ ; xðsÞ is

n! r1

g xðrÞ ; xðsÞ ¼ FðxðrÞ Þ

ðr 1Þ!ðs r 1Þ!ðn sÞ!

sr1 ns

FðxðsÞ Þ FðxðrÞ Þ 1 FðxðsÞ Þ f ðxðrÞ Þf ðxðsÞ Þ; xðrÞ \xðsÞ

n!

gðu; vÞ ¼ ur1 ðv uÞsr1 ð1 vÞns ; 0\u\v\1:

ðr 1Þ!ðs r 1Þ!ðn sÞ!

6.5 Non-parametric Tolerance Limits 169

Again we put )

V U ¼Y V ¼ W þ Y; 0\W\1 y:

n!

) gðw; yÞ ¼ wr1 ysr1 ð1 w yÞns

ðr 1Þ!ðs r 1Þ!ðn sÞ!

Z1y

n!

) gðyÞ ¼ ysr1 wr1 ð1 w yÞns dw

ðr 1Þ!ðs r 1Þ!ðn sÞ!

0

Z1

n!

¼ ysr1 ð1 yÞr1 tr1 ð1 yÞns ð1 tÞns ð1 yÞdt

ðr 1Þ!ðs r 1Þ!ðn sÞ!

0

Z1

n!

¼ ysr1 ð1 yÞn þ rs tr1 ð1 tÞns dt

ðr 1Þ!ðs r 1Þ!ðn sÞ!

0

Cðn þ 1Þ

¼ ysr1 ð1 yÞn þ rs

Cðs rÞCðn þ r s þ 1Þ

1

¼ ysr1 ð1 yÞn þ rs ; 0\y\1:

bðs r; n þ r s þ 1Þ

) Pr FðxðsÞ Þ FðxðrÞ Þ b ¼ c

, Pr½y b ¼ c , Pr½y b ¼ 1 c

Zb

i:e: gðyÞdy ¼ 1 c

0

Rb

ysr1 ð1 yÞn þ rs dy

0

i:e: ¼1c

bðs r; n þ r s þ 1Þ

i:e: Ib ðs r; n þ r s þ 1Þ ¼ 1 c ð6:4Þ

that is xðrÞ and xðsÞ are symmetrically placed.

Particular case: r = 1, s = n; Then (6.4) ) Ib ðn 1; 2Þ ¼ 1 c

Rb

tn2 ð1 tÞdt

0

i:e: 1 c ¼

bððn 1Þ; 2Þ

n1

b bn Cðn 1ÞCð2Þ

i:e: 1 c ¼ =

n1 n Cðn þ 1Þ

nbn1 ðn 1Þbn

¼ nðn 1Þ

nðn 1Þ

) 1 c ¼ nbn1 ð1 bÞ þ bn

170 6 Non-parametric Test

1 c ’ nbn1 ð1 bÞ:

For given b and c, one can ﬁnd n from this relationship.

Alternative

For Bin(n,p), we know

!

X

c n

px qnx ¼I q ðn c; c þ 1Þ

x¼0 x

¼1 I p ðc þ 1; n cÞ

Then ð6:4Þ ) c ¼1 I b ðs r; n þ r s þ 1Þ

!

X n

sr1

¼ bx ð1 bÞnx :

x¼0 x

So for given n, b and c we can ﬁnd s and r such that xðrÞ and xðsÞ are sym-

metrically placed.

Suppose F(x) is continuous and arandom sample ðx1 ; x2 ; . . .; xn Þ is drawn from it.

np is the p-th order quantile. So P X np ¼ p. Deﬁne X ðrÞ and X ðsÞ as the rth and

sth order statistics, r < s. Then X ðrÞ ; X ðsÞ is said to be 100(1 − α)% conﬁdence

interval for np if

Pr X ðrÞ np X ðsÞ ¼ 1 a ð6:5Þ

Now; Pr X ðrÞ np X ðsÞ ¼ Pr np X ðsÞ Pr np X ðrÞ

¼ Pr X ðsÞ np Pr X ðrÞ np

¼ 1 Pr X ðsÞ \np 1 þ Pr X ðrÞ \np

¼ Pr X ðrÞ \np Pr X ðsÞ \np

¼ Pr at least r of the observations \np

Pr at least s of the observations \np

6.6 Non-parametric Conﬁdence Interval for nP 171

!

X

s1

n nx

¼ px ð1 pÞ ð6:6Þ

x¼r x

! !

X

s1 n nx X

r1 n nx

¼ p ð1 pÞ

x

px ð1 pÞ

x¼0 x x¼0 x

¼ 1 I p ðs; n s þ 1Þ 1 þ I p ðr; n r þ 1Þ

¼ I p ðr; n r þ 1Þ I p ðs; n s þ 1Þ

Since, Pr X ðrÞ np X ðsÞ ¼ 1 a, so r and s are such that

Given a and n, the selection of r and s satisfying (6.7) is not unique. We select

that pair of r and s for which (s − r) is minimum.

For symmetrically placed order statistics xðrÞ and xðsÞ , we select that pair of (r,

s) such that r + s = n + 1 ⇒ s − 1 = n − r.

!

X

nr

n x nx

) From ð6:7Þ 1 a ¼ p ð1 pÞ :

r x

Note If in (6.7) the exact probability ð1 aÞ is not attained then we choose that

pair of r and s such that

Pr X ðrÞ np X ðsÞ 1 a i:e: I p ðr; n r þ 1Þ I p ðs; n s þ 1Þ 1 a:

The sign test technique can be applied to obtain a class interval estimate for the

unknown population median n1=2 . Suppose X ð1Þ ; X ð2Þ ; . . .; X ðnÞ be the order statis-

tics. We consider the testing problem H 0 : n1=2 ¼ n0 against H 1 : n1=2 6¼ n0 :

Define; S ¼ total no: of þ ve signs among XðiÞ n0 8i ¼ 1ð1Þn

8

>

> 1 if s\s1

>

>

< a1 if s ¼ s1

/ðsÞ ¼ 0 if s1 \s\s2

>

>

> a2

> if s ¼ s2

:

1 if s [ s2

172 6 Non-parametric Test

9

a

Pr ½s\s1 =H0 \ Pr ½s s1 =H0 >

=

2 ð6:8Þ

a

Pr ½s [ s2 =H0 \ Pr ½s s2 =H0 > ;

2

a a

Pr½s\s1 =H0 Pr½s [ s2 =H0

Also a1 and a2 are such that a1 ¼ 2 P½s¼s1 =H0 and a2 ¼ 2 P½s¼s2 =H0

We accept H0 if s1 \s\s2 and so

Pr½s1 \s\s2 ¼ 1 a

In order to obtain a conﬁdence interval for n1=2 we need only to translate the

inequality in the LHS of (6.9) to an equivalent statement involving

the order

statistics and n1=2 . We have seen earlier that 1 a ¼ Pr XðrÞ np XðsÞ

!

sP

1 n x nx

¼ p ð1pÞ .

x¼r x

Now, for

!

1 X s1 n 1 n

p ¼ ; 1 a ¼ Pr XðrÞ n1=2 XðsÞ ¼

2 x¼r x 2

1

¼ Pr½r S s 1 as S B n; under H0 :

2

) Pr XðrÞ n1=2 XðsÞ ¼ Pr½r S s 1 ¼ 1 a ð6:10Þ

h i

Pr Xðs1 þ 1Þ n12 Xðs2 Þ ¼ 1 a

) 100ð1 aÞ% C.I. for n1=2 using sign test is Xðs1 þ 1Þ ; Xðs2 Þ ¼ Xðs1 þ 1Þ ; Xðns1 Þ

fsince S is symmetric about n=2; n2 s1 ¼ s2 n=2

For large samples, (6.9) is equivalent to

6.6 Non-parametric Conﬁdence Interval for nP 173

" #

s1 þ 1 n=2 S n2 s2 1 n2

Pr pﬃﬃﬃﬃﬃ pﬃﬃﬃﬃﬃ pﬃﬃﬃﬃﬃ ¼1a

n=4 n=4 n=4

" #

s1 þ 1 n=2 0:5 s2 1 n2 þ 0:5

or, Pr pﬃﬃﬃﬃﬃ s pﬃﬃﬃﬃﬃ ¼1a

n=4 n=4

) pﬃﬃﬃﬃﬃ ¼ sa=2 and pﬃﬃﬃﬃﬃ ¼ sa=2

n=4 n=4

pﬃﬃﬃﬃﬃ pﬃﬃﬃﬃﬃ

i:e: s1 ¼ n=2 0:5 n=4s

a=2 & s2 ¼ =2 þ 0:5 þ

n n=4s

a=2 ð6:11Þ

Xðs1 þ 1Þ ; Xðs2 Þ ¼ Xðs1 þ 1Þ ; Xðns1 Þ where s1 and s2 are given by (6.11).

When several tests of the same hypothesis H0 are made on the basis of independent

sets of data, it is quite likely that some of the tests will dictate rejection of the

hypothesis (at the chosen level of signiﬁcance) while the others will dictate its

acceptance. In such a case, one would naturally like to have a means of combining

the results of the individual tests to reach a ﬁrm, overall decision. While one may

well apply the same test to the combined set of data, what we are envisaging is a

situation where only the values of the test statistics used are available.

Let us denote by Ti the statistic used in making the ith test (say, for i = 1, 2,...,k).

Commonly T1 ; T2 ; . . .; Tk will be statistics deﬁned in the same way (like v2

statistics or t-statistics), but with varying sampling distributions simply because

they are based on varying sample sizes. To ﬁx ideas, let us assume that in each case

the test requires that H0 be rejected if, and only if, the observed value of the

corresponding statistic be too large. Consider, in this situation, the probabilities

yi ¼ Pr½Ti [ ti =H0 , for i = 1,2,…,k.

Provided Ti has a continuous distribution under H0 , say with probability density

R1

function gi ðtÞ, so that yi ¼ gi ðtÞdt; where ti is a randomly taken value of Ti , yi has

ti

the rectangular distribution over the interval [0,1] under H0 and hence 2 loge yi

Pk

has the v2 distribution with df = 2. Consequently Pk ¼ 2 loge yi has, under H0 ,

i¼1

the v2 distribution with 2k degrees of freedom. This statistic is used as the test

statistic for making the combined test. One would reject H0 if, and only if, the

observed value of Pk exceeds v2a;2k .

174 6 Non-parametric Test

The case where each individual test requires rejection of H0 if, and only if, the

observed value of the corresponding test statistic is too small, or the case where

each individual test requires rejection of H0 if, and only if, the observed value of the

test statistic is either too large or too small, is to be similarly dealt with. The reason

is that, if Ti have continuous distributions under H0 , then ui ¼ Pr½Ti \ti =H0 and

vi ¼ Pr½jTi j [ jti j=H0 are also rectangularly distributed over (0,1). This implies

Pk

that the statistic Pk to be appropriate to these situations, viz., Pk ¼ 2 loge ui

i¼1

P

k

and Pk ¼ 2 loge vi , are also distributed as v2 statistics with df = 2k under H0 . In

i¼1

each of these cases also, the overall decision will be to reject H0 if, and only if, the

observed value of the respective Pk exceeds v2a;2K .

Example 6.9 In order to test whether the mean height ðlÞ of a variety of paddy

plants, when fully grown, is 60 cm, or less than 60 cm, ﬁve experimenters made

independent (student’s) t-tests with their respective data. The probabilities of the t-

statistics (with the appropriate df in each case) to be less than their respective

observed values are 0.023, 0.061, 0.07, 0.105 and 0.007. If the tests are made at 5 %

level, then the hypothesis H0 : l ¼ 60 cm, has to be accepted in three cases out of

the ﬁve.

In order to combine the results of the 5 tests, we note that log yi , for i = 1, 2, 3, 4

and 5, are 2:36173; 2:78533; 2:23045; 1:02119 and 3:84510, respectively. Hence for

P5 P5

the data, loge ui ¼ 10 þ 2:24380 ¼ 7:75620, so that Pk ¼ 2 loge ui ¼

i¼1 1

2:30259 15:5124 ¼ 35:719.

This is to be compared with v2:05;10 ¼ 18:307 and v2:01;10 ¼ 23:205: Since the

observed value of Pk exceeds the tabulated values, the combined result of the

experimenter’s tests leads to the rejection of H0 at both 5 % and the 1 % level. In

other words, in the light of all 5 experimenters’ data, we may conclude that the

mean height at the variety of paddy plant is less than 60 cm.

In many situations, the individuals are ranked by two judges or the measurements

taken for two variables are assigned ranks within the samples independently. Now it

is desired to know the extent of association between the ranks. The method of

calculating the association between ranks was given by Charles Edward Spearman

in 1906 and is known as Spearman’s rank correlation.

Let ðX1 ; Y1 Þ; ðX2 ; Y2 Þ; . . .; ðXn ; Yn Þ: be a sample from a bivariate population. If

the sample values X1 ; X2 ; . . .; Xn and Y1 ; Y2 ; . . .; Yn are each ranked from 1 to n in

6.8 Measures of Association for Bivariate Samples 175

increasing order of magnitude separately and if the X’s and Y’s have continuous

distribution functions, we get a unique set of rankings. The data will then reduce to

n pairs of ranking. Let us write

R1a ¼ Rank of Xa ; a ¼ 1; 2; . . .; n:

R2a ¼ Rank of Ya ; a ¼ 1; 2; . . .; n:

Pearsonian coefﬁcient of correlation between the ranks R1a ’s and R2a ’s is called

the Spearman’s rank correlation coefﬁcient rs which is given by

Pn

1 ÞðR2a R

ðR1a R 2Þ

a¼1

rs ¼

nP Pn o1=2

n

a¼1

1 Þ2

ðR1a R 2

a¼1 ðR2a R2 Þ

n

P

12 R1a n þ2 1 R2a n þ2 1

a¼1

¼

nðn2 1Þ

If for n individuals, Da ¼ R1a R2a , is the difference between ranks of the ath

individual for a ¼ 1; 2; . . .; n, the formula for Spearman’s rank correlation is

P

n

6 D2a

i¼1

rs ¼ 1 :

nðn2 1Þ

The value of rs lies between −1 and +1. If X, Y are independent then Eðrs Þ ¼ 0.

Also Population Spearman’s rank correlation coefﬁcient, i.e. qs ¼ 0 ) Eðrs Þ ¼ 0:

Kendall in 1962 derived the frequency function of rs and gave exact critical value

rs . But the approximate test of rs which is the same as t-test for Pearsonian cor-

relation coefﬁcient is good enough for all practical purposes. Here we test H0 :

qs ¼ 0 against H1 : qs 6¼ 0. The test statistic

pﬃﬃﬃﬃﬃﬃ

t ¼ rp

s ﬃﬃﬃﬃﬃﬃﬃ

n2ﬃ

has (n − 2) d.f. The decision about H0 is taken in the usual way. For

1rs2

pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

large samples under H0, the random variable Z ¼ rs n 1 has approximately a

standard normal distribution. The approximation is good for n 10.

B. Kendall’s rank correlation coefﬁcient

Kendall’s rank correlation coefﬁcient τ is suitable for the paired ranks as in case of

Spearman’s rank correlation. Let ðX1 ; Y1 Þ; ðX2 ; Y2 Þ; . . .; ðXn ; Yn Þ be a sample from a

bivariate population.

For any two pairs ðXi ; Yi Þ and Xj ; Yj we say that the relation is perfect con-

cordance if

176 6 Non-parametric Test

perfect discordance if Xi [ Xj whenever Yi \Yj or Xi \Xj whenever Yi [ Yj .

Let pc and pd be the probability of perfect concordance and of perfect discor-

dance respectively deﬁned by

pc ¼P ðXj Xi ÞðYj Yi Þ [ 0

and pd ¼P ðXj Xi ÞðYj Yi Þ\0 :

s ¼ pc pd

It is noted that

s ¼ 0 if X and Y are independent.

= +1 if X and Y be in prefect concordance.

= −1 if X and Y be in prefect discordance.

We now need to ﬁnd an estimate of s from the sample.

Using sample observations, Kendall’s measure of association becomes

1 X

n X

T¼ ! sðxj xi Þsðyj yi Þ ð6:12Þ

n 1 i\j n

2

where sðrÞ ¼ 1 if r [ 0

¼ 0 if r ¼ 0

¼ 1 if r\0

Naturally E s xj xi sðyj yi Þ ¼ pc pd ¼ s

The statistic T deﬁned in (6.12) is known as Kendall’s sample tau ðsÞ coefﬁcient.

The procedure for calculating T consists of the following steps:

Step 1: Arrange the rank of the ﬁrst set (X) in ascending order and rearrange the

ranks of the second set (Y) in such a way that n pairs of rank remain the same.

Step 2: After operating Step 1, the ranks of X are in natural order. Now we are left

to determine how many pairs of ranks on the set Y are in their natural order and how

many are not. A number is said to be in natural order if it is smaller than the

succeeding number and is coded as +1 and also if it is greater than its succeeding

it will not be taken in natural order and will be coded as −1. In this

number then

n

way all pairs of the set (Y) will be considered and assigned the values +1

2

and −1.

6.8 Measures of Association for Bivariate Samples 177

Step 4: The formula for Kendall’s rank correlation coefﬁcient-T is

S Actual value 2S

T¼ ¼ ¼

n Maximum possible value nðn 1Þ

2

H1 : s6¼ 0. Thus we reject H0 if the observed

value of jTj [ ta=2 where P jT [ ta=2 jH0 ¼ a. The values of ta are given in the

table for selected values of n and a. Values for 4 n 10 are tabulated by Kendall.

4ðn2Þ

þ 1s . If n ! 1 under

2

It can be shown that EðTÞ ¼ s and VðTÞ ¼ 9nðn1Þ

n

2

pﬃﬃ

H0 : s ¼ 0, 2 T Nð0; 1Þ and we can test the independence of x and y.

3 n

estimate of s, whereas rs is not an unbiased estimate of qs :

Example 6.10 Following are the ranks awarded to seven debators in a competition

by two judges.

Debators A B C D E F G

Ranks by judge I (x) 3 2 1 6 7 4 5

Ranks by judge II (y) 5 6 3 7 4 2 1

Compute (i) Spearmen’s rank correlation coefﬁcient ðrs Þ and Kendall’s sample

tau coefﬁcient (T) and test their signiﬁcance.

Solution (i) First we ﬁnd di ¼ xi yi 8i which are

d : −2 −4 −2 −1 3 2 4

P

Also, 7i¼1 di2 ¼P54

6 d2

thus rs ¼ 1 nðn2 1Þi ¼ 1 6 54

7 48 ¼ 0:036

To test H0 : qs ¼ 0 against H1 : qs 6¼ 0, the statistic

pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

rs n 2 0:036 7 2

t5 ¼ pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ ¼ qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ ¼ 0:080

1 rs2 1 ð0:036Þ2

From the table, t0.025,5 = 2.571. Calculated value of |t| = 0.080 < 2.571, hence we

accept H0. It means there is a dissociation between the ranks awarded by two

judges.

178 6 Non-parametric Test

x 1 2 3 4 5 6 7

y 3 6 5 2 1 7 4

For S, take the rank 3 and give +1 or −1 value for all pairs with subsequent ranks

of y. 3 < 6, give a number +1; 3 < 5, again +1; 3 > 2, give a number −1 and so on.

Then choose 6 and take the pairs (6,5), (6,2), (6,1), (6,7) and (6,4) and continue the

process till we reach the last pair (7,4). Proceeding in this manner,

S = (+1 +1 –1 –1 +1 +1) + (–1 –1 –1 +1 –1) + (–1 –1 +1 –1) + (–

1 +1 +1) + (+1 +1) + (–1)

= 2 – 3 – 2 + 1 + 2 – 1 = –1

Thus T ¼ S ! ¼ 1 2

7 6 ¼ 0:048

n

2

To test the signiﬁcance of T, we test

H0 : s ¼ 0 against H1 : s 6¼ 0.

From the table, for n = 7 we have t0.025 = 0.62. Since |T| = 0.048 < 0.62, we

accept H0 . It reveals that there is no association between the ranks awarded by two

judges.

Example 6.11 A random sample of 12 couples showed the following distribution of

heights (in inches):

Couple no. 1 2 3 4 5 6 7 8 9 10 11 12

Husband 80 70 73 72 62 65 74 71 63 64 68 67

height

Wife height 72 60 76 62 63 46 68 71 61 65 66 67

(b) Test the hypothesis that the heights of husband and wife are independent using

rs as well as T. In each case use the normal approximation.

Solution (a) The heights of husband and wife are each ranked from 1 to 12 in

increasing order of magnitude separately and let us denote their ranks by xi and yi

respectively (i = 1,2, …, 12).

xi : 12 7 10 9 1 4 11 8 2 3 6 5

yi : 11 2 12 4 5 1 9 10 3 6 7 8

di ¼ x i y i :1 5 2 5 4 3 2 2 1 3 1 3

X

di2 ¼ 108:

6.8 Measures of Association for Bivariate Samples 179

P

6 di26 108

Thus rs ¼ 1 ¼ 1 12

nðn2 1Þ 143 ¼ 0:6224

We write below the ranks of x in natural order and ranks of y correspondingly

xi : 1 2 3 4 5 6 7 8 9 10 11 12

yi : 5 3 6 1 8 7 2 10 4 12 9 11

n

Total number of scores = ¼ 12 2 11 ¼ 66

2

Actual score = S = 3 + 6 + 3 + 8 + 1 + 2 + 5 + 0 + 3 – 2 + 1 = 30 (procedure for

calculations of S is explained in Example 6.10 (ii))

Thus, T ¼ 3060 ¼ 0:4545

(b) To test H0 : qs ¼ 0 against H1 : qs 6¼ 0, the approximate test statistic is

pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

Z ¼ rs n 1

pﬃﬃﬃﬃﬃ

¼ 0:6224 11 ¼ 2:06 Nð0; 1Þ

Since Cal |Z|, i.e., 2.06 > Z0.025 = 1.96, hence we reject H0 .

It means that the heights of husband and wife are not independent.

To test H0 : s ¼ 0 against H1 : s 6¼ 0, the approximate test statistic is

3 pﬃﬃﬃ 3 pﬃﬃﬃﬃﬃ

Z¼ n T¼ 12 0:4545 ¼ 2:36 Nð0; 1Þ

2 2

Since Cal |Z|, i.e., 2.36 > Z0.025 = 1.96, hence we reject H0 . Hence we can

conclude that there is an association between the heights of husband and wife.

Chapter 7

Statistical Decision Theory

7.1 Introduction

In this chapter we discuss the problems of point estimation, hypothesis testing and

interval estimation of a parameter from a different standpoint.

Before we start the discussion, let us ﬁrst deﬁne certain terms commonly used in

statistical inerence problem and decision theory. Let X 1 ; X 2 ; . . .; X n denote a ran-

dom sample of size n from a distribution that has the p.d.f. f(x, θ), where θ is an

unknown state of nature or an unknown parameter and H is the set of all possible

values of θ, i.e. parameter space (known).

To make some inference about θ, i.e. to take some decisions or action about θ,

the statistician takes an action on the basis of the sample point ðx1 ; x2 ; . . .; xn Þ.

Let us deﬁne

to choose an action a from :

The value L(θ, a) is the loss incurred by taking action ‘a’ when θ is true.

Equivalently, it is a measure of the degree of undesirability of choosing an action

‘a’ when θ is true and this gives a preference pattern over ɶ for given θ, i.e. the

smaller the loss the better the action under θ. L(θ, a) is a real-valued function on Θ x

ɶ = Loss function. Thus (Θ, ɶ, L) is the basic element in our discussion.

Example 7.1 Let θ = average life length of electric bulbs produced in a factory and

H ¼ ð0; 1Þ:

Point estimation of θ

To estimate the value of θ ≡ to choose one value from ð0; 1Þ; so a ¼ ð0; 1Þ.

Observe life lengths of some randomly selected bulbs.

Deﬁne L(θ, a) = ðh aÞ2 = squared error loss function

P.K. Sahu et al., Estimation and Inferential Statistics,

DOI 10.1007/978-81-322-2514-0_7

182 7 Statistical Decision Theory

(or) = wðhÞðh aÞ2 = weighted squared error loss function where

wðhÞ = a known function of θ.

Desired nature of (L, θ) graph should be a convex function with minimum at θ

and increasing in jh aj.

L(θ,a)

Testing of hypothesis of θ

To test H 0 : h h0 (a given value of θ) against

H 1 : h [ h0

¼ fa0 ; a1 g where a0 = accept H0 and a1 = accept H1. Here, simple (0–1) loss

function is as

a0 a1

h h0 0 1

h [ h0 1 0

9

a0 a1 =

l00 \l01

or, assigned value loss function is as h h0 l00 l01

; l11 \l10

h [ h0 l10 l11

9

a0 a1 =

w1 ðhÞ " in h0 h

or, a ð0 xÞ type loss function is as h h0 0 x1 ðhÞ

; w2 ðhÞ " in h h0

h [ h0 x2 ðhÞ 0

Interval estimation

Here, we are to choose one interval from ð0; 1Þ.

¼ ða1 ; a2 Þ:

1 if h 62 a

Lðh; aÞ ¼ or, may be Lðh; aÞ ¼ a2 a1 ¼ length of the interval.

0 if h 2 a

7.1 Introduction 183

X ¼ Random outcomes of the experiment ¼ Random variable or vector

x ¼ Observed value of X

x ¼ Sample space

Ph : Ph ½X 2 A or; F h ðxÞ ¼ Ph ½X x

or; f h ðxÞ ¼ p.d.f or p.m.f of X:

the statistician takes an action d ðxÞ 2 , d ðxÞ : x!

where

¼ A non-randomized decision rule:

If d(x) = action taken; loss incurred under h ¼ Lðh; d ðxÞÞ. If d(x) = decision rule,

then loss incurred (under θ) ¼ Lðh; d ðxÞÞ (random quantity) = a real-valued random

variable. Expected loss (under θ) ¼ E h Lðh; d ðxÞÞ ¼ Rd ðhÞ = risk of d(x) under θ.

Let us restrict to rule d(x) for which Rd ðhÞ\18h and let D = the set of all such

d(x)’s. Rd ðhÞ gives a preference pattern D for given θ. The smaller the risk the better

is the decision rule d(x).

X

Thus, ðH; ; LÞ ! ðH; D; RÞ

Example 7.2 Point estimation of real h : ¼ H

d(x):

x ! ɶ(Θ); d(x) = point estimator of θ.

For squared error loss Rd ðhÞ ¼ E h ðd ðxÞ hÞ2 ¼ MSE of d(x) under θ.

Example 7.3 ¼ fa0 ; a1 g; ai ¼ accept Hi , i = 0, 1,

d ð xÞ :

x ! a0; a1

x0 ¼ fx : d ð xÞ ¼ a0 g ¼ acceptance region

x1 ¼ fx : d ð xÞ ¼ a1 g ¼ rejection region

) d ðxÞ ¼ a0 if x 2

x0

a1 if x 2

x1

x0 and

x1 are disjoint and

x0 þ

x1¼

x

184 7 Statistical Decision Theory

¼ Ph fX 2

x1 g

¼ Probability of first kind of error:

If h 2 H1 ; Rd ðhÞ ¼ Ph fd ð xÞ ¼ a0 g ¼ Ph fX 2

x0 g

¼ 1 Ph fX 2 x1 g ¼ Probability of type 2 error:

Interval estimation of real θ

ɶ = set of all possible intervals of H.

d ð xÞ :

x!

1 if h 62 a

Lðh; aÞ ¼

0 if h 2 a

If Lðh; aÞ ¼ a2 a1

then Rd ðhÞ ¼ Eh ½d2 ð xÞ d1 ð xÞ = Expected length of d(x).

Thus, (Θ, ɶ, L) = Basic element of a statistical decision problem.

X = observable random variable; for each x, d ð xÞ 2 , i.e. d :

x!

d(x) = a non-randomized decision rule.

D = the set of all non-randomized decision rules (with ﬁnite risks 8h)

Randomized action

Example 7.4 Let H ¼ fh1 ; h2 g; ɶ ¼ fa1 ; a2 ; a3 g and

a1 a2 a3

Loss function as h1 1 4 3

h2 4 1 3

Neither a1 nor a2 is better than a3 for every value of h. Now deﬁne an action

1

a : a ¼ a1 with probability

2

1

¼ a2 with probability

2

7.1 Introduction 185

1 1

Lðh1 ; a Þ ¼ Lðh1 ; a1 Þ þ Lðh1 ; a2 Þ ¼ 2:5

2 2

1 1

Lðh2 ; a Þ ¼ Lðh2 ; a1 Þ þ Lðh2 ; a2 Þ ¼ 2:5

2 2

randomized action.

Generally, by randomized action a we mean actually a probability distribution

over ɶ and loss due to a randomized action a is

Lðh; a Þ ¼ ELðh; zÞ where z is a random variable with probability distribution a

over ɶ.

1. Extends the class of actions, i.e. allows more flexibility for the statistician.

2. The set of all randomized actions is convex, i.e. if a1 ; a2 are two randomized

actions, then for every 0 a 1; aa1 þ ð1 aÞa2 is also a randomized action

with L h; aa1 þ ð1 aÞa2 ¼ aL h; a1 þ ð1 aÞL h; a2 .

We shall consider only randomized actions a for which Lðh; a Þ is ﬁnite 8h and

shall denote by ɶ* the set of all such randomized actions.

Note Clearly ɶ ɶ* because a non-randomized action ‘a’ ≡ A probability dis-

tribution over ɶ degenerate at the point ‘a’.

Let X = observable random variable

x = observed value of X

For each x, let dð xÞ 2 ɶ*, i.e. d :

x ! ɶ*

d ¼ dð xÞ = a (behavioural) randomized decision rule.

Rd ðhÞ ¼ Risk of d at h ¼ Eh Lðh; dð xÞÞ

We shall consider only behavioural rules d for which Rd ðhÞ is ﬁnite 8h and shall

denote by D as the class of all such behavioural rules. Clearly, D D.

Example 7.5 Test of hypothesis problem H0 : h 2 H0 against H1 : h 2 H1 .

¼ fa0 ; a1 g; ai ¼ accept H i ; i ¼ 0; 1; . . .

where / = probability of accepting H 1

1 / = probability of accepting H 0 ; 0 / 1.

where /ðxÞ ¼ probability of accepting H 1 for X ¼ x

186 7 Statistical Decision Theory

Lðh; a Þ ¼ / 0 þ ð1 /Þ 1 ¼ 1 / for h 2 H1

R/ ðhÞ ¼ Eh /ð xÞ for h 2 H0

¼ Eh ½1 /ð xÞ for h 2 H1

Let X = observable random variable; x = observed value of X.

D = the set of all non-randomized decision rules.

d ¼ A probability distribution over D

¼ A randomized ðmixedÞ decision rule with Rd ðhÞ ¼ ERz ðhÞ ¼ Risk of d at h where

Z ¼ A random variable with probability distribution d over D:

x ¼ fx 1 ; x 2 g

D ¼ fd 1 ; d 2 ; d 3 ; d 4 g

d 1 : d 1 ð x 1 Þ ¼ a1 ; d 1 ð x 2 Þ ¼ a1

d 2 : d 2 ð x 1 Þ ¼ a2 ; d 2 ð x 2 Þ ¼ a2

d 3 : d 3 ð x 1 Þ ¼ a1 ; d 3 ð x 2 Þ ¼ a2

d 4 : d 4 ð x 1 Þ ¼ a2 ; d 4 ð x 2 Þ ¼ a1

P

4

pi 0 8 i ¼ 1ð1Þ4, pi ¼ 1 where

1

X

4

R d ð hÞ ¼ pi Rd i ðhÞ:

i¼1

We shall consider only mixed rules d for which Rd ðhÞ is ﬁnite 8h and shall

denote by D as the class of all such mixed decision rules. Clearly, D D since a

non-randomized rule d = a probability distribution over D degenerate at d.

First mode of randomization:

X

ðH; ; LÞ ! ðH; ; LÞ !ðH; D; RÞ

7.1 Introduction 187

X

ðH; ; LÞ !ðH; D; LÞ ! ðH; D ; RÞ

sense that given any d 2 D one can ﬁnd a d 2 D with Rd ðhÞ ¼ Rd ðhÞ 8h and

conversely.

Example 7.7 ¼ fa1 ; a2 g, x ¼ fx 1 ; x 2 g

D ¼ fd 1 ; d 2 ; d 3 ; d 4 g as deﬁned earlier.

P

4

A typical d 2 D is d ¼ ðp1 ; p2 ; p3 ; p4 Þ; pi 0 for i = 1(1)4, pi ¼ 1, where

1

pi ¼ probability of choosing d i .

n . X o

D ¼ ðp1 ; p2 ; p3 ; p4 Þ pi 08i; pi ¼ 1

ity of taking action a1 if X ¼ xi ;

D ¼ fð/1 ; /2 Þ=0 /1 ; /2 1g

If one chooses a d 2 D ,

a1 is chosen with probability p1 þ p3 for X ¼ x1

a2 is chosen with probability 1 ðp1 þ p3 Þ ¼ p2 þ p4

a1 is chosen with probability p1 þ p4 for X ¼ x2

a2 is chosen with probability 1 ðp1 þ p4 Þ ¼ p2 þ p3

/2 ¼ p1 þ p4 .

Similarly, a d 2 D can be considered to be equivalent to a d ¼ D with

p1 þ p3 ¼ /1 , p1 þ p4 ¼ /2 .

1. Extends the class of decision rules, i.e. allows more flexibility to the statistician

2. The set of all randomized rules is convex, i.e. if d1 ; d2 2 D (or D ) then

ad1 þ ð1 aÞd2 2 D (or D ).

For every 0 a 1 and Rad1 þ ð1aÞd2 ðhÞ ¼ aRd1 ðhÞ þ ð1 aÞRd2 ðhÞ8h.

Thus, h 2 H; a 2 ; Lðh; aÞ; ðH; ; LÞ

X = observable random variable

P ¼ fPh =h 2 Hg ¼ family of probability distribution of X

188 7 Statistical Decision Theory

D = the class of all non-randomized decision rules

dðX Þ = a behavioural or randomized decision rule

D = the class of all behavioural rules

D = the class of all randomized rules

D and D are equivalent classes.

We shall hereafter denote both D and D as D.

Note D

D

Let d 2 D, Rd ðhÞ = risk function of d; h 2 H.

Goodness of a d is measured by risk function.

Let d1 ; d2 2 D

1. d1 is said to be equivalent to d2 ðd1 d2 Þ if Rd1 ðhÞ ¼ Rd2 ðhÞ 8h 2 H

2. d1 is at least as good as d2 ðd1 d2 Þ if Rd1 ðhÞ Rd2 ðhÞ 8h 2 H

3. d1 is said to be better than d2 ðd1 [ d2 Þ if Rd1 ðhÞ Rd2 ðhÞ 8h 2 H with strict

inequality for at least one θ.

Note

1. d1 d2 ) either d1 [ d2 , or d1 d2

d1 [ d2 ) d1 d2

2. d1 [ d2 ; d2 [ d3 ) d1 [ d3 , similarly for case

3. It may so happen that neither d1 [ ðor Þd2 nor d2 [ ðor Þd1 . In such case d1

and d2 are non-comparable. Thus [ ðor Þ gives a partial ordering of rules

2D

To estimate h; H ¼ ¼ ð1; 1Þ:

Lðh; aÞ ¼ ðh aÞ2 ¼ squared error loss. For any real constant C, let d c ðX Þ ¼

CX ¼ A non-randomized rule (Fig. 7.1).

¼ C 2 E h ðX hÞ2 þ h2 ð1 C Þ2 2C ð1 C Þh Eh ðX hÞ

¼ C 2 þ h2 ð 1 C Þ 2

Fig. 7.1 Rd ( θ)

R dc ( θ )

Rd (θ )

1

2

R d1 ( θ )

θ

7.1 Introduction 189

For C > 1, Rd c ðhÞ [ 1 ¼ Rd 1 ðhÞ 8h ) d 1 [ d c

If C ¼ 12 ; Rd1=2 ðhÞ ¼ 14 þ h4

2

Hence d 1 and d 1=2 are non-comparable.

Deﬁnition A d 2 D is said to be an admissible decision rule if there does not exist

any d0 2 D such that d0 [ d. Otherwise d is said to be inadmissible, i.e. d is said to

be an inadmissible rule if there exists a d0 2 D such that d0 [ d.

In the above example, for any C > 1, d c is inadmissible as d 1 [ d c .

Note Admissibility is the minimum requirement for any reasonably good decision

rule though the criterion is of negative nature.

Rules

of decision rule if given any d 62 C such that a d0 2 C exists such that d0 [ d

(Fig. 7.2).

C is said to be minimal complete if

Fig. 7.2

(ii) No proper sub-class of C is complete.

Signiﬁcance

If a complete class of C is available one can restrict to this class only for ﬁnding a

reasonable decision rule and thus reduce the problem.

A minimal complete class, if exists, provides maximal reduction to this extent.

Note A minimal complete class does not necessarily exist.

Some relationship between a complete (or a minimal complete) class and the

class of all admissible rules

Let A = the class of all admissible rules.

Result 1 For any complete class C, A C, i.e. any complete class C contains all

admissible rules.

190 7 Statistical Decision Theory

d is inadmissible, which is a contradiction as we have assumed d is an admissible

rule. So d 2 A ) d 2 C, i.e. A C. h

Result 2 If A is complete, then A is minimal complete.

Proof Assume A is complete. Result 1 ) No proper sub-class of A can be com-

plete. Hence A is minimal complete. h

Result 3 If a minimal complete class C exists, then C A.

Proof Let C be a minimal complete class. Then C is complete. By Result 1, A C.

So it is enough to prove that C A. Suppose this is not true. Then there exists a d0

such that d0 2 C but

d0 62 A ð7:1Þ

d1 [ d0 ð7:2Þ

d 2 C, take d ¼ d1 . If d 62 C, there exists a d1 2 C such that d1 [ d [ d0 . Thus, in

all cases there exists a d1 2 C such that d1 [ d0 ).

Let us deﬁne C 1 ¼ C fd0 g.

(Let d 62 C1 h

Case 1 d ¼ d0 . By (7.2), there exists a d1 2 C and hence d1 2 C 1 such that

d1 [ d0 :

Case 2 d 6¼ d0 . Then d 62 C, so there exists a d0 2 C such that d0 [ d:

A: d0 ¼ d0 . By (7.2), there exists a d1 2 C and hence 2 C1 such that

d1 [ d0 [ d.

B: d0 6¼ d0 . d0 2 C 1 . Hence, there exists a d0 2 C 1 such that d0 [ d:

Thus, given any d 62 C 1 in all cases there will exist a d0 2 C1 such that d0 [ d )

C 1 is complete)

Now (7.3) contradicts that C is minimal complete and hence (7.1) must be false

) C A: So C A.

and in this case C A.

Corollary 1 A minimal complete class, if it exists, is unique.

Proof Let C = a minimal complete class. C A which is unique. h

7.2 Complete and Minimal Complete Class of Decision Rules 191

also 2 C.

Proof C A, d 2 C , d 2 A. d0 d ) d0 also 2 A and hence 2 C. h

Corollary 3 If d is admissible and d0 d then d0 is also admissible.

Deﬁnition Let CðDÞ be a class of decision rules. Then C is said to be an essential

complete class if given any d 62 C there exists a d0 2 C such that d0 d.

(i) C is essential complete; and (ii) No proper sub-class of C is essential complete.

Result 1 Let A = the class of all admissible rules and C = an essential complete

class.

If d 2 A but 62 C, then there exists a d0 d such that d0 2 C (and hence 2 A).

Proof Let d 2 A but 62 C. Then there exists a d0 2 C such that d0 d. But as d 2 A,

it is impossible that d0 [ d. So, d0 d. h

Result 2 Let C be minimal essential complete and let d 2 C. If d0 d, then d0 62 C.

Proof If possible, let d0 2 C. Deﬁne C 0 ¼ C fd0 g then C0 will be also essential

complete. This contradicts that C is minimal essential complete. Hence d0 62 C. h

Note Let D1 ðDÞ be a class of decision rules. D1 is said to be an equivalent class

if all rules 2 D1 are equivalent to each other, but no rule 2 D D1 is equivalent to

a rule 2 D1 . Then D can be considered as the disjoint union of some equivalent

classes.

Then,

(i) If C = a min. complete class then C does or does not entirely contain an

equivalent class (by Corollary 2)

(ii) If C = a minimal essential complete class then C contains at most one rule

from each equivalent class (by Result 2)

Further if d 2 C and in C, d is replaced by d0 d, then resultant class is also

minimal essential complete.

So,

(a) A minimal complete class A minimal essential complete class (by (i) and

(ii) above)

(b) A min. essential complete class is not necessarily unique. (by 2nd part of

(ii) above).

192 7 Statistical Decision Theory

sub-class, then C is minimal complete and is also minimal essential complete.

Example 7.9 Examples of complete and essential complete class

(1) Essential completeness of the class of rules based on a sufﬁcient statistic:

Let d dð xÞ 2 D. For such x, dðxÞ is a probability distribution over ɶ. T = t(x) = a

statistic. d is said to be based on T if dðxÞ is a function of t(x), i.e. dðxÞ ¼ dðx0 Þ

whenever tðxÞ ¼ tðx0 Þ.

Such a rule can be denoted by dðT Þ: T is said to be a sufﬁcient statistic if the

conditional probability distribution of X given T is the same 8h.

Let T = a sufﬁcient statistic and D0 = the class of rules based on T.

Lemma 1 For any d 2 D, there exists a d0 2 D0 such that d0 d: [Cor. D0 is an

essential complete class]

Proof Let d 2 D

For each given value t of T we deﬁne a probability distribution d0 ðtÞ over ɶ as

follows:

Observe the value of a random variable X 0 having the probability distribution the

same as the conditional probability distribution of X given T = t (which is inde-

pendent of h) and then if X 0 ¼ x0 choose an action a 2 ɶ according to the proba-

bility distribution dðx0 Þ. h

Also, Lðh; d0 ðtÞÞ ¼ E fLðh; dðxÞÞ=T ¼ tg

(2) Essential completeness of the class of non-randomized rules for convex

(strictly convex) loss. Let Rk ¼ k-dimensional real space. S Rk .

S is said to be a convex subset if for any two x ; y 2 S and for any 0 a 1,

a x þ ð1 aÞ y also 2 S (Fig. 7.3).

x y~ x

~ y~

~

S

Convex Non-Convex

7.2 Complete and Minimal Complete Class of Decision Rules 193

Let

S = a convex subset of Rk .

f x ¼ a real-valued function deﬁned on S (Fig. 7.4).

Fig. 7.4

f x is said to be a convex function if for any two x ; y 2 S and for any

0 a 1,

f a x þ ð1 aÞ y af x þ ð1 aÞf y ð7:4Þ

If strict inequality holds in (7.4) whenever x 6¼ y , f x is said to be strictly

convex.

f ð xÞ ¼ x2 ; ex ; jxj; x 2 R1

Examples 7.10 |ﬄﬄ{zﬄﬄ} convex

Strictly convex

Lemma 2 (Jensen’s inequality) Let S = a convex subset of Rk ; f x ¼ a

real-valued convex function deﬁned on S. Let Z ¼ a random variable, such that

h i

P Z 2 S ¼ 1 and E Z exists. Then (i) E Z 2 S; (ii) Ef Z f E Z

If f is strictly convex, then strict inequality holds in (ii) unless the distribution of

Z is degenerate.

Let ɶ = a convex subset of Rk . The loss function Lðh; aÞ is said to be convex (or

strictly convex) if for each given h, Lðh; aÞ is a convex (or strictly convex) function

of a.

194 7 Statistical Decision Theory

Example 7.11

¼ R1 ; Lðh; aÞ ¼ ðh aÞ2 or jh aj

# #

Strictly convex convex

random variable with probability distribution dðxÞ over ɶ.

We assume that Ezx exists for each

x2x ð7:5Þ

Let D = the class of all non-randomized rules D D.

Lemma 3 Let ɶ = a convex subset of Rk and the loss function be convex. Then for

each d 2 D satisfying (7.5) there exists a d 0 2 D; viz, d 0 ðxÞ ¼ EZ x such that

d 0 d. If the loss function is strictly convex, then d 0 [ d unless d itself 2 D.

Corollary 1 Let ɶ = a convex subset of Rk , the loss function be strictly convex and

every d 2 D satisfying (7.5), then D (=the class of all non-randomized rules) is

essential complete.

Proof of Lemma 3 Let d 2 D

dðxÞ ¼ a probability distribution over ɶ. For each x, Z x ¼ a random variable

with probability distribution dðxÞ. Deﬁne d 0 ðxÞ ¼ EZ x . By (i) of Lemma 2, d 0 ðxÞ 2

ɶ 8x, i.e. d 0 ¼ d 0 ðxÞ 2 D. Also, by (ii) of Lemma 2 Lðh; d 0 ðxÞÞ ¼ Lðh; EZ x Þ

ELðh; Z x Þ ¼ Lðh; dðxÞÞ.

ð7:6Þ

) d0 d

If the loss function is strictly convex, strict inequality holds in (7.6) for at least

one h unless Z x -distribution is degenerate, i.e. 8x except possibly for x 2 A such

that Ph ½x 2 A ¼ 0 8h, in which case it means that d itself 2 D ) d 0 [ d unless d

itself 2 D h

Corollary 2 Let ɶ = a convex subset of Rk , the loss function is strictly convex and

every d 2 D satisfying (7.5). Let T be a sufﬁcient statistic and D0 = the class of

non-randomized rules based on T, D0 D. Then D0 is essential complete

(complete).

Proof Let d 2 D.

D0 = the class of all randomized decision rules based on T. By Lemma 1, there

exists a d0 ¼ d0 ðT Þ 2 D0 such that d0 d. For each t, d0 ðT Þ is a probability dis-

tribution over ɶ. Deﬁne Z t ¼ a random variable with probability distribution d0 ðtÞ

and d 0 ðtÞ ¼ EZ t . As in proof of Lemma 3, d 0 ðtÞ 2 ɶ, i.e. d 0 ¼ d 0 ðT Þ 2 D0 and

d 0 d0 ( [ d0 for strictly convex loss function unless d0 2 D; d). Thus, given

7.2 Complete and Minimal Complete Class of Decision Rules 195

any d 2 D, there exists a d 0 2 D0 such that d 0 d (> d for strictly convex loss

function unless d 2 D0 ).

) D0 is essential complete (complete). h

Note On the condition stated by (7.5)

Let d 2 D, Z x ¼ a random variable with probability distribution dðxÞ over ɶ.

Lðh; dðxÞÞ ¼ ELðh; Z x Þ which exists for each x and h. This in many cases implies

(7.5) holds.

Example 7.12 k ¼ 1; Lðh; aÞ ¼ ðh aÞ2 ɶ ¼ R1

ELðh; Z x Þ ¼ E ðh Z x Þ2 exists 8x and 8h

) EZ x exists 8x.

Lðh; aÞ ¼ jh aj

ELðh; Z x Þ ¼ E jZ x hj E jZ x j h

i.e. E jZ x j h þ E jZ x hj:

Thus E jZ x hj exists 8x and 8h ) (7.5) holds.

For K 2, ɶ = ¼ Rk ¼ X

Xk
2

L h; a ¼ j ai hi j 2 ¼
a h

i¼1

EL h ; Zx ¼ E

Zx h
which exists 8x and 8h ) (7.5) holds.

Lðh; aÞ C 1 jaj þ C2 for some C 1 ð [ 0Þ; C2 . Then ELðh; Z x Þ exists 8x ) (7.5)

holds.

This fact gives a sufﬁcient condition on loss function for (7.5) to hold (Fig. 7.5).

Fig. 7.5

L(θ,a)

c1 a +c2

Rao-Blackwell Theorem

D = the class of random values.

D0 = the class of random vales based on T.

D = the class of non-random values.

196 7 Statistical Decision Theory

loss function be convex. For any d 2 D satisfying (7.7), there exists a d 0 2 D0 , viz,

d 0 ðtÞ ¼ E ðd ðxÞ=T ¼ tÞ. If the loss function be strictly convex d 0 [ d unless d itself

2 D0 .

Proof d 0 ðtÞ ¼ E ðd ðxÞ=T ¼ tÞ is independent of h.

E fLðh; d ðxÞ=T ¼ tÞg by Lemma 2:

) Rd 0 ðhÞ ¼ E h Lðh; d 0 ðT ÞÞ E h E fLðh; d ðxÞ=T ¼ tÞg

¼ E h Lðh; d ðxÞÞ ¼ Rd ðhÞ 8h

) d0 d

iff d is a function of t, i.e. d itself 2 D0 implying that d 0 [ d unless d itself

2 D0 . h

Corollary Let ɶ be a convex subset of Rk and the loss function be convex. Let

every d 2 D satisfy (7.6) and every d 2 D satisfy (7.7), then D0 is essential com-

plete. If the loss function be strictly convex, D0 is complete.

Proof Let d 2 D

By Lemma 3, there exists a d 2 D such that d d. Also, by Lemma 4 there exists a

d 0 2 D such that d 0 d d. Thus given any d 2 D, there exists a d 0 2 D0 such

that d 0 d ) D0 is essentially complete. If the loss function is strictly convex

d 0 [ d unless d itself 2 D0 ) D0 is complete. h

Note on condition (7.7) For every d 2 D, Rd ðhÞ ¼ Eh Lðh; d ðxÞÞ exists 8h.

This generally implies that Eh ðd ðxÞÞ exists 8h.

) Eðd ðxÞ=T ¼ tÞ exists, i.e. (7.7) holds.

Example 7.13 To estimate a real parameter h; X ¼ ɶ ¼ ð1; 1Þ

Lðh; aÞ ¼ ðh aÞ2

Similarly, it can be shown for absolute error loss Lðh; aÞ ¼ jh aj

Proposition Let for some h, Lðh; aÞ C 1 jaj þ C 2 for some constant C 1 ð [ 0Þ and

C2 .

7.2 Complete and Minimal Complete Class of Decision Rules 197

Then Rd ðhÞ exists 8h ) E h ðd ðxÞÞ exists ) (7.7) holds. Thus the proposition

gives a sufﬁcient condition on loss function for (7.7) to hold.

d1 d2 if Rd1 ðhÞ Rd2 ðhÞ 8h and it is a natural partial ordering of decision rules.

d0 2 D is said to be best or optimal if d0 d 8 d 2 D, but generally such an optimal

rule does not exist.

Example To estimate a real parameter h, X ¼ ¼ ð1; 1Þ. Let

Lðh; aÞ ¼ ðh aÞ2 . If possible, suppose there exists a best rule, say d0 . Consider

any given value of h, say h0 and deﬁne d 0 ðxÞ ¼ h0 8x. Clearly, Rd 0 ðh0 Þ ¼ 0 )

Rd0 ðh0 Þ ¼ 0 where d0 d 0 . Since h0 is arbitrary we must have Rd0 ðhÞ ¼ 0; 8h

which is generally impossible.

) generally there does not exist a best rule.

So to ﬁnd a reasonably good decision rule we need some additional principles.

Two such principles are generally followed:

(i) Restriction principle

(ii) Linear ordering principle

Restriction principle Put some reasonable restrictions on decision rules, i.e.

consider a reasonable restricted sub-class of decision rules having good overall

performances and then try to ﬁnd a best in this restricted sub-class.

Two restriction criteria often used are

(i) Unbiasedness and

(ii) Invariance

Linear ordering principle

For every d replace the risk function by a representative number and then compare

the rules in terms of these representative numbers.

If representative number of d1 representative number of d2 , then we prefer d1

to d2 . d0 is considered to be optimal if representative number of d0 representative

number of d 8 d 2 D.

Thus a linear ordering principle is a way of specifying representative number

Note Any linear ordering principle should not disagree with partial ordering

principle, i.e. if d1 d2 we must have representative number of d1 as repre-

sentative number of d2 .

Two linear ordering principles that are used in general are

(i) Bayes principle

(ii) Minimax principle

198 7 Statistical Decision Theory

sðhÞ : h 2 X ! a suitable weight function over X. sðhÞ 0 8h and sðhÞ ¼ 1.

h2X

Take representative number as weighted average risk

X

¼ sðhÞRd ðhÞ ¼ r ðs; dÞ

h2X

¼ prior p:m:f of h:

r ðs; dÞ ¼ Bayes risk of d with respect to s:

If X ¼ a non-degenerate interval of Rk ,

sðhÞ = p.d.f of a (continuous)

R distribution over X.

Bayes risk of d ¼ rðs; dÞ ¼ Rd ðhÞsðhÞdh.

X

d0 S are compared with respect to rðs; dÞ, i.e. if rðs; d1 Þ rðs; d2 Þ, then d1 is

preferred to d2 . A d0 2 D is considered to be optimum if it minimizes rðs; dÞ with

respect to d 2 D. Such a d0 is called a Bayes rule with respect to prior s.

Deﬁnition A rule d0 2 D is said to be a Bayes rule with respect to a prior s if it

minimizes Bayes risk (w.r.t. s) rðs; dÞ w.r.t. d 2 D, i.e. if rðs; d0 Þ ¼ inf rðs; dÞ.

d21

Note

1. A Bayes rule may or may not exist. If it exists, inf min.

2. A Bayes rule depends on prior s.

3. A Bayes rule, even if exists, may not be unique.

4. Bayes principle does not disagree with partial ordering principle, i.e.

Rd1 ðhÞ Rd2 ðhÞ 8h ) rðs; d1 Þ rðs; d2 Þ whatever be s.

Minimax principle

For a d 2 D, representative number is taken as

Sup Rd ðhÞ ¼ Max: Risk that may be incurred due to choice of d. d1 is preferred

h2X

to d2 if Sup Rd1 ðhÞ Sup Rd2 ðhÞ.

h2X h2X

d0 is considered to be optimum if it minimizes Sup Rd ðhÞ with respect to d 2 D.

h2X

Such a d0 is called a “Minimax Rule”.

Deﬁnition A rule d0 2 D is said to be a minimax rule if it minimizes Sup Rd ðhÞ

h2X

with respect to d 2 D, i.e. if

Sup Rd0 h ¼ inf Sup Rd h.

h2X d2D h2X

7.3 Optimal Decision Rule 199

Notes

1. A minimax rule may or may not exist.

2. A minimax rule does not involve any prior s.

3. A minimax rule, even if exists, may not be unique.

4. Minimax principle doesn’t disagree with partial ordering principle

) Sup Rd1 ðhÞ Sup Rd2 ðhÞ

h2X h2X

s = a given prior.

To ﬁnd a Bayes rule d0 with respect to s to ﬁnd a rule d0 that minimizes Bayes

risk rðs; dÞ with respect to d.

Proposition If a Bayes rule d0 with respect to a given prior s exists, then there

exists a non-randomized rule d 0 which is Bayes with respect to s.

Implication For ﬁnding a Bayes rule, we can without any loss of generality con-

sider non-randomized rules only.

Proof Let d0 be a Bayes rule with respect to s. d0 may be considered as a prob-

ability distribution over D (=the class of non-randomized rules).

Let Z = a random variable with probability distribution d0 over D. Then

P P

[Let X be ﬁnite or countable. r ðs; d0 Þ ¼ sðhÞRd0 ðhÞ ¼ sðhÞEz Rz ðhÞ

h2X h2X

X

¼ Ez sðhÞRz ðhÞðassuming that it is permissibleÞ

h2X

¼ Ez r ðs; zÞ

) r ðs; d0 Þ r ðs; d Þ 8d 2 D as D D

ð7:9Þ

) r ðs; d0 Þ r ðs; zÞ 8 values of z:

) r ðs; d0 Þ E z r ðs; zÞ ¼ r ðs; d0 Þ

200 7 Statistical Decision Theory

by (7.8).

We must have equality in (7.9), and consequently Z must 2 D0 with probability

1, where D0 ¼ fd=d 2 D;r ðs; d Þ ¼ r ðs; d0 Þg. h

Consider any d 0 2 D0 ,

then r ðs; d 0 Þ ¼ r ðs; d0 Þ ¼ inf r ðs; dÞ (since d0 is Bayes)

d2D

) d 0 is also Bayes. This proves the Proposition.

Note It is clear from the proof that

(1) A randomized Bayes rule = A probability distribution over D0 ; i.e. the class of

non-randomized Bayes rules.

(2) If a non-randomized Bayes rule is unique, i.e. D0 consists of a single d 0 , then a

Bayes rule is unique and is d 0 .

sðhÞ ¼ a prior distribution of h.

To minimize r ðs; dÞ with respect to d 2 D,

Without any loss of generality we may restrict to non-randomized rules only. So

we are to minimize r ðs; d Þ with respect to d 2 D.

Let X be Rcountable and x be also countable (If

x is an open interval of Rk ,

replace Σ by ).

Then for any d 2 D

X X X

r ðs; d Þ ¼ sðhÞRd ðhÞ ¼ sðhÞ pðx=hÞLðh; d ð xÞÞ

h2X h2X x2

x

XX ð7:10Þ

¼ sðhÞpðx=hÞLðh; d ð xÞÞ

x h2X

x2

assuming it is permissible.

PSuppose there exists a d0 ¼ d0 ð xÞ such that for each x, d0 ð xÞ minimizes

sðhÞpðx=hÞLðh; d ð xÞÞ with respect to d ð xÞ 2 ɶ.

h2X

Then clearly, d0 minimizes (7.10) w.r.t. d 2 D ) d0 is Bayes rule with respect

to s.

pðx=hÞ ¼ conditional p:m:f of X given h.

sðhÞ ¼ marginal p:m:f of h.

pðx=hÞsðhÞ ¼ Joint p:m:f of X and h.

X

pð x Þ ¼ pðx=hÞsðhÞ ¼ marginal p:m:f of X:

h2X

pðx=hÞsðhÞ

qðh=xÞ ¼ ¼ conditional ðPosteriorÞ p:m:f: of h given X ¼ x

pð x Þ

7.4 Method of Finding a Bayes Rule 201

P

To minimize sðhÞpðx=hÞLðh; dð xÞÞ with respect to d ð xÞ 2 ɶ

h2XP

, to min. pð xÞ qðh=xÞLðh; dð xÞÞ with respect to d ð xÞ 2 ɶ.

P h2X

, min qðh=xÞLðh; dð xÞÞ w.r.t. d ð xÞ 2 ɶ.

h2X

(It is conditional (posterior) loss given X = x), i.e. EfLðh; d ð xÞÞ=X ¼ xg.

Thus if there exists a d0 d0 ð xÞ such that for each x, d0 ð xÞ gives min

E fLðh; d ðxÞÞ=X ¼ xg ¼ Conditional ðposteriorÞ loss given X ¼ x w:r:t: d ð xÞ 2 :

Then d0 is a Bayes rule.

If the minimizing d0 ð xÞ is unique for each x, then d0 is the unique Bayes rule.

[Let X be an open interval of RkR and x be also an open interval of Rk

(If x is countable, replace Σ by )

Then for any d 2 D

Z

r ðs; d Þ ¼ sðhÞRd ðhÞd ðhÞ

X

2 3

Z Z

¼ sðhÞ4 pðx=hÞLðh; d ð xÞÞdx5dh ð7:11Þ

X

x

Z Z

¼ sðhÞpðx=hÞLðh; d ð xÞÞdhdx

U X

R Suppose there exists a d0 d0 ð xÞ such that for each x, d0 ð xÞ minimize

sðhÞpðx=hÞLðh; d ð xÞÞdh with respect to d ð xÞ 2 .

X

Then clearly, d0 minimizes (7.11) with respect to d 2 D ) d0 is Bayes rule with

respect to s.

pðx=hÞ ¼ conditional p:d:f of Xgiven h.

sðhÞ ¼ marginal p:d:f of h

pðx=hÞsRðhÞ ¼ Joint p:d:f of Xand h.

pð xÞ ¼ sðhÞpðx=hÞdh ¼ marginal p:d:f of X:

X

pðx=hÞsðhÞ

qðh=xÞ ¼ p ð xÞ ¼ conditional ðposteriorÞ p:d:f of h given X = x, if pð xÞ [ 0.

R

To minimize sðhÞpðx=hÞLðh; d ð xÞÞdh with respect to d ð xÞ 2

RX

, min. pð xÞ qðh=xÞLðh; d ð xÞÞdh with respect to d ð xÞ 2

R X

, min. qðh=xÞLðh; d ð xÞÞdh with respect to d ð xÞ 2

X

which is to min conditional (posterior) loss given X = x.

i.e. E ðLðh; d ð xÞÞ=X ¼ xÞ.

202 7 Statistical Decision Theory

E fLðh; d ð xÞÞ=X ¼ xg = conditional (posterior) loss given X = x with respect to

d ð xÞ 2 ɶ then d0 is a Bayes rule.

If the minimizing d0 ð xÞ is unique for each x, then d0 is unique Bayes rule]

Summary To min r ðs; d Þ with respect to d 2 D

r ðs; d Þ ¼ Eh Rd ðhÞ

h i

¼ Eh EX=h Lðh; d ð xÞÞ *Rd ðhÞEX=h Lðh; d ð xÞÞ

¼ EX Eh=X Lðh; d ð xÞÞ min for each X ¼ x with respect to d ð xÞ 2

Applications

1 Estimation of a real parameter h for squared error loss. To estimate a real

parameter h where X ¼ ɶ ¼ R1 or an open interval of it.

Lðh; aÞ ¼ ðh aÞ2 , sðhÞ ¼ a prior p.d.f of h

R

To min. E fLðh; d ð xÞÞ=X ¼ xg ¼ ðh d ð xÞÞ2 qðh=xÞdh w.r.t. d ð xÞ 2 .

X

Clearly, minimizing d0 ð xÞ is given by

R

Z hpðh=xÞsðhÞdh

h X

d0 ð xÞ ¼ E =X ¼ x ¼ hqðh=xÞdh ¼ R

sðhÞpðh=xÞdh

X X

Example 7.14

Let sðhÞ ¼ prior p:d:f of h ¼ heh ; h [ 0

1

pðx=hÞ ¼ ; 0\x\h

h

7.4 Method of Finding a Bayes Rule 203

eh

¼ R1 ; x\h\1

eh dh

x

¼ eh ex

R1

heh dh

Mean of the posterior distribution of h given (X = x) ¼ ex x

R1 h

heh j1

x þ e dh

x

þ ex

¼ e x

x

¼ xe ex ¼ x þ 1.

Thus unique Bayes estimator of h w.r.t. s is d 0 ðxÞ ¼ X þ 1.

Example 7.15 X Binðn; hÞ, n given, 0\h\1

To estimate h under squared error loss.

n x

pðx=hÞ ¼ h ð1 hÞnx ; x ¼ 0; 1; . . .; n

x

Let sðhÞ ¼ prior p:d:f of h

1

¼ ha1 ð1 hÞb1 ; a; b [ 0

Bða; bÞ

¼ Beta prior

n 1

hx þ a1 ð1 hÞnx þ b1

x Bða;bÞ

¼

n R1 x þ a1

Bða;bÞ h

1

ð1 hÞnx þ b1 dh

x 0

1

¼ hx þ a1 ð1 hÞnx þ b1 ; 0\h\1:

Bðx þ a; n x þ bÞ

Z1

1

¼ hx þ a ð1 hÞnx þ b1 dh

Bðx þ a; n x þ bÞ

0

Bðx þ a þ 1; n x þ bÞ xþa

¼ ¼ :

Bðx þ a; n x þ bÞ aþbþn

Thus the unique Bayes estimator of h w.r.t. Beta ða; bÞ prior is d 0 ðxÞ ¼ n þX þ a

a þ b.

204 7 Statistical Decision Theory

Unique Bayes estimator is Xn þþ 21 :

Example 7.16 Let X Poisson (h), 0\h\1

To estimate h under squared error loss.

To ﬁnd Bayes estimator w.r.t. Þ prior.

i.e. sðhÞ 1 eah hb1 ; h 0

Let sðhÞ ¼ Keah hb1 ; h 0

R1

as sðhÞdh ¼ 1 ) K aÞbb ¼ 1 ) K ¼ a

b

Þb

0

hx

pðx=hÞ ¼ eh ; x ¼ 0; 1; . . .

x!

eh hx ab ah b1

x! Þ ðbÞ e h

qðh=xÞ ¼ R1

ab 1

: x! eð1 þ aÞh hx þ b1 dh

Þ ð bÞ

0

ð1 þ aÞh x þ b1

e :h xþb

¼ ð 1 þ aÞ

Þ ð x þ bÞ

Z1

ð 1 þ aÞ x þ b

d0 ð x Þ ¼ eð1 þ aÞh hx þ b dh

Þ ðx þ b Þ

0

xþb

ð 1 þ aÞ Þ ð x þ b þ 1Þ xþb

¼ xþbþ1

¼

Þ ðx þ bÞ ð 1 þ aÞ 1þa

xþb

d 0 ð xÞ ¼ :

1þa

Notes

1. d 0 ðxÞ is also (unique) Bayes if Lðh; aÞ ¼ cðh aÞ2 1ðh aÞ2 , c = a given

constant

2. If a sufﬁcient statistic T exists we may consider rules based on T only (because

of essential completeness of rules based on T) and then may ﬁnd Bayes rule

based on T.

To estimate h under squared error loss.

7.4 Method of Finding a Bayes Rule 205

T ¼ X ¼ min. Sufﬁcient statistic N h; 1n .

n 2 2 2

2

n2t2 h2 n þ 12 þ nht

¼ Cont: e e r

2

2 þ1 nr2

n2 r 2 t 2 nr h

n2t2 þ t

¼ cont: e 2ðnr2 þ 1Þ

e 2r2 nr2 þ 1

2

2 þ1 nr2

n2 r2 t2 nr h

n2t2 þ t

2ðnr2 þ 1Þ

Const.e e 2r2 nr2 þ 1

¼ 2

n2t2 þ n2 r 2 t 2

2ðnr2 þ 1Þ

R nr

2 þ1

2r2

h nr2

nr2 þ 1

t

Const.e e

2

2 þ1 nr2

nr h t

¼ Const.e 2r2 nr2 þ 1

nr2 r2

) given t; h N t; 2

nr þ 1 nr þ 1

2

2

Posterior mean ¼ Kx; K ¼ nrnr

2 þ1 :

Also, Min. Bayes risk = Bayes risk of d 0 ¼ Eh Ex=h ðd 0 hÞ2

r2 r2

¼ Ex E h=x ðd 0 hÞ2 ¼ Ex ¼

nr2 þ 1 nr2 þ 1

Applications

2. Estimation of a real h under weighted squared error loss: h ¼ a real parameter.

To estimate h, X ¼ ¼ some interval of R1

Let Lðh; aÞ ¼ wðhÞðh aÞ2 ; wðhÞ [ 0

d 0 be Bayes if for each x, d 0 ðxÞ minimizes

Z

Eh=X¼x wðhÞðh d ðxÞÞ2 ¼ wðhÞðh d ðxÞÞ2 qðh=xÞdh

hwðhÞqðh=xÞdh

Clearly, d 0 ðxÞ ¼ R .

wðhÞqðh=xÞdh

206 7 Statistical Decision Theory

2

To estimate h with Lðh; aÞ ¼ hðha Þ

ð1hÞ ¼ wðhÞðh aÞ

2

1

Þ

Let sðhÞ ¼ 1 80\h\1; i.e. uniform prior.

n x

h ð1 hÞnx

x hx ð1 hÞnx

qðh=xÞ ¼

¼ ; 0\h\1

n R x Bðx þ 1; n x þ 1Þ

h ð1 hÞnx dh

x

R1 hx ð1hÞnx

R h hð1h

1

Þ Bðx þ 1;nx þ 1Þ dh

hwðhÞqðh=xÞdh 0

d 0 ð xÞ ¼ R ¼ 1

wðhÞqðh=xÞdh R 1 hx ð1hÞnx

hð1hÞ Bðx þ 1;nx þ 1Þ dh

0

R1

hx ð1 hÞnx1 dh

0 Bðx þ 1; n xÞ

¼ ¼

R1 Bðx; n xÞ

hx1 ð1 hÞnx1 dh

0

x

¼ ; for x ¼ 1; 2; ::; n 1

n

Z Z1

For x ¼ 0; 2

wðhÞðh d 0 ð0ÞÞ qðh=x ¼ 0Þdh1 ðh cÞ2 h1 ð1 hÞn1 dh

0

¼ finite if c ¼ 0 ½by taking d 0 ð0Þ ¼ c

¼ 1 if c 6¼ 0

Z

x

) wðhÞðh d 0 ð0ÞÞ2 qðh=x ¼ 0Þdh is min for d 0 ð0Þ ¼ 0 ¼ :

n

R

Similarly, for x = n, wðhÞðh d 0 ð0ÞÞ2 qðh=x ¼ nÞdh is min for d 0 ðnÞ ¼ 1 ¼ nx

Thus for every x ¼ 0; 1; 2; . . .; n; d 0 ðxÞ ¼ nx minimizes

R

wðhÞðh d 0 ðxÞÞ2 qðh=xÞdh with respect to d ðxÞ 2

) d 0 ðxÞ ¼ nx ð¼ minimum variance unbiased estimator or maximum likelihood

estimator of hÞ is unique Bayes rule.

Application

3. Estimation of a real h under absolute error loss. To estimate h ¼ a real

parameter, X ¼ some interval of R1

Let Lðh; aÞ ¼ jh aj

d0 ¼ d0 ðX Þ be Bayes if for each x d 0 ðxÞ minimizes Eh=X¼x jh d ðxÞj with respect

to d ðxÞ 2

7.4 Method of Finding a Bayes Rule 207

median of the posterior distribution is unique, then d 0 is the unique Bayes rule.

Example 7.19 X ¼ ðX 1 ; X 2 ; . . .; X n Þ, X 1 ; X 2 ; . . .; X n i.i.d. N ðh; 1Þ; a\h\a

To estimate h under absolute error loss, without loss of any generality we restrict

to rules based on T ¼ X. Let sðhÞ : h N ð0; r2 Þ; r2 ð [ 0Þ known.

2

Median of posterior distribution of h given t ¼ kx; k ¼ nrnr2 þ1

Application

4. Estimation of function of h:

To estimate gðhÞ ¼ a real-value function of h.

d 0 be Bayes if it minimizes E h=X¼x fgðhÞ d ðxÞg2 with respect to d ðxÞ 2 ӕ for

each given x. Clearly, d 0 ðxÞ ¼ Eh=x¼x fgðhÞg

) d 0 ðxÞ ¼ E h=x fgðhÞg is (unique) Bayes.

Similarly, we can ﬁnd it for weighted squared error loss or for absolute error

loss.

Example 7.20 X ¼ ðX 1 ; X 2 Þ; X i ’s independent and X i Binðni ; hi Þ; where n1 ; n2

known, 0\h1 ; h2 \1; h ¼ ðh1 ; h2 Þ

To estimate gðhÞ ¼ h1 h2 under squared error loss.

sðhÞ : h1 ; h2 independent, hi Rð0; 1Þ; i ¼ 1; 2

n1 n1 x1 n2

h1 ð 1 h1 Þ

x1

h2 x2 ð1 h2 Þn2 x2

x1 x2

qðh=xÞ ¼ 1 1

R R n1 n1 x1 n2

h1 ð 1 h1 Þ

x1

h2 x2 ð1 h2 Þn2 x2 dh1 dh2

0 0 x 1 x 2

hx11 ð1 h1 Þn1 x1 hx22 ð1 h2 Þn2 x2

¼ ; 0\h1 ; h2 \1:

Bðx1 þ 1; n1 x1 þ 1ÞBðx2 þ 1; n2 x2 þ 1Þ

ni xi þ 1Þ; i ¼ 1; 2

x1 þ 1 x 2 þ 1

¼

n1 þ 2 n2 þ 2

208 7 Statistical Decision Theory

X1 þ 1 X2 þ 1

d 0 ðX Þ ¼ :

n1 þ 2 n2 þ 2

h2X d2D h2X

y ¼ risk point of d

h2X 1jk

ð1Þ ð2Þ

Two risk points y ; y may be considered to be equivalent if

1jk 1jk

1jk y 2S 1 j k

For any real C, let Qc ¼ Qðc;c;::;cÞ ¼ y yj c8j ¼ 1; 2; ::k :

All risk points lying on the boundary of a Qc are equivalent points (Figs. 7.6 and

7.7).

Fig. 7.6 Equalizer line z1=z 2

(c,c)

Qc

7.5 Methods for Finding Minimax Rule 209

Fig. 7.7

Equivalent points

S

Minimax Point

Let C0 ¼ inf fC=Qc \ S 6¼ /g:

Any risk point 2 boundary of Qc0 is a minimax point. Any rule d0 with risk point

y0 is minimax.

Notes

1. If S does not contain its boundary points, a minimax rule may not exist.

2. A minimax point may not be unique

(1,1) (2,1)

S

All Minimax

Points

(1,0) (2,0)

210 7 Statistical Decision Theory

3. A minimax point does not necessarily lie on the equalizer line (Figs. 7.8 and 7.9).

Loss is (0–1).

x ¼ f0; 1; 2; . . .g

Ph1 ½X ¼ x ¼ 0 if x = 0

1

¼ if x 1

2x

(1,1) (2,1)

S

All Minimax

(1,0)

(2,0)

C0

Minimax Points

Fig. 7.8

7.5 Methods for Finding Minimax Rule 211

C0

=Minimax Point

Fig. 7.9

1

Ph2 ½X ¼ x ¼ ; x ¼ 0; 1; . . .

2x þ 1

0 dð x Þ 1

X

1

y 1 ¼ R d ð h1 Þ ¼ dðxÞ Ph1 ðX ¼ xÞ

x¼1

X1

y 2 ¼ R d ð h2 Þ ¼ f1 dðxÞg Ph2 ðX ¼ xÞ

x¼0

1 1X 1

1

¼ 1 dð 0Þ dð x Þ x

2 2 x¼1 2

1

dð x Þ ¼ 8x 1; dð 0Þ ¼ 1

3

212 7 Statistical Decision Theory

1 2

dða2 =xÞ ¼ dða1 =xÞ ¼ ; x ¼ 1; 2; . . .

3 3

Example 7.21

1 1

X¼ h1 ¼ ; h2 ¼

4 2

1 1

¼ a1 ¼ ; a2 ¼

4 2

a1 a2

Loss matrix: h1 1 4

h2 3 2

Let X ¼ 0 with probability h

h ¼ h1 ; h2

¼ 1 with probability ð1 hÞ

Then D ¼ fd 1 ; d 2 ; d 3 ; d 4 g

d 1 ð0Þ ¼ d 1 ð1Þ ¼ a1 ; d 2 ð0Þ ¼ d 2 ð1Þ ¼ a2 ; d 3 ð0Þ ¼ a1 but d 3 ð1Þ ¼ a2 and

d 4 ð0Þ ¼ a2 but d 4 ð1Þ ¼ a1

Rd 1 ðh1 Þ ¼ 1; Rd 1 ðh2 Þ ¼ 3; Rd2 ðh1 Þ ¼ 4; Rd2 ðh2 Þ ¼ 2

1 3 1

R d 3 ð h1 Þ ¼ 1þ 4 ¼ 3

4 4 4

1 1 1

R d 3 ð h2 Þ ¼ 3þ 2 ¼ 2

2 2 2

1 3 3

R d 4 ð h1 Þ ¼ 4þ 1 ¼ 1

4 4 4

1 1 1

R d 4 ð h2 Þ ¼ 2þ 3 ¼ 2

2 2 2

1 1 3 1

¼ ð1; 3Þ; ð4; 2Þ; 3 ; 2 ; 1 ; 2 :

4 2 4 2

9 26

Here to ﬁnd a minimax rule means to ﬁnd a rule with risk point 26

11 ; 11

(Fig. 7.10)

7.5 Methods for Finding Minimax Rule 213

(4,1)

Minmax point=

Fig. 7.10

Let d 2 D. For X = x

¼ a2 with probability 1 dðxÞ

1 3

Rd ðh1 Þ ¼ fu 1 þ ð1 uÞ 4g þ fv 1 þ ð1 vÞ4g

4 4

1

¼ ð16 3u 9vÞ

4

1 1

R d ð h2 Þ ¼fu 3 þ ð1 uÞ 2g þ fv 3 þ ð1 vÞ2g

2 2

1

¼ ðu þ v þ 4Þ

2

26

Rd ðh1 Þ ¼ Rd ðh2 Þ ¼

11 ð7:12Þ

1 26 24

) ð16 3u 9vÞ ¼ ) u þ 3v ¼

4 11 11

1 26 8

and ð u þ v þ 4Þ ¼ ) uþv ¼ ð7:13Þ

2 11 11

214 7 Statistical Decision Theory

8

:

Thus, the unique minimax rule is given as

8 3

dða1 =0Þ ¼ 0; dða2 =0Þ ¼ 1; dða1 =1Þ ¼ ; dða2 =1Þ ¼ :

11 11

Note The unique minimax rule is purely randomized. Thus, unlike Bayes rules, a

minimax rule may be purely randomized, i.e. although a minimax rule exists, no

non-randomized rule is minimax.

Alternative (direct/or Algebraic approach)

Fig. 7.11

8

11 D2

v

D

3

D1 11

1

Sup Rd ðhÞ ¼ maxfRd ðh1 Þ; Rd ðh2 Þg ¼ ð16 3u 9vÞ

h2X 4

i.e. if 5u þ 11v 8 and ¼ 12 ðu þ v þ 4Þ if

2 ðu þ v þ 4Þ 4 ð16 3u 9vÞ, i.e. if

5u þ 11v 8

1 1

4

9v

h2X

163u9v

Now Inf Sup Rd ðhÞ ¼ Inf 4

d2D1 h2H 0 u; v 1

5u þ 11v 8

16 3u 9v 1 9ð8 5uÞ

¼ inf inf ¼ inf 16 3u

0 u 1 0 v 85u 4 0u1 4 11

11

7.5 Methods for Finding Minimax Rule 215

1 104 26

¼ inf ð12u þ 104Þ ¼ ¼

0u1 4 44 11

8

26 8

d2D2 h2H

Finally, inf Sup Rd ðhÞ

d2D h2H

26

¼ min inf Sup Rd ðhÞ; inf Sup Rd ðhÞ ¼

d2D1 h2H d2D2 h2X 11

8

.

Thus, the unique minimax rule is given as u = 0, v ¼ 11 8

8 3

Result 1 If an equalizer rule d0 is Bayes (w.r.t some prior s), then d0 is minimax. If

d0 is unique Bayes (w.r.t. s), then d0 is unique minimax (and hence admissible).

Proof Rd0 ðhÞ ¼ c 8h ) Sup Rd0 ðhÞ ¼ c and r ðs; d0 Þ ¼ c

h2H

Minimaxiety: If possible let d0 be not minimax, so there exists a d1 such that

h2H h2H

) Rd1 ðhÞ Sup Rd1 ðhÞ ¼ c8h

h2H

) r ðs; d1 Þ\c ¼ r ðs; d0 Þ

Unique minimaxiety

If possible let d0 be not unique minimax. So there exists another d1 which is also

minimax, i.e. there exists another d1 such that

h2H h2H

) Rd1 ðhÞ Sup Rd1 ðhÞ ¼ c8h

h2H

) r ðs; d1 Þ c ¼ r ðs; d0 Þ

) d1 is also Bayes w.r.t s, but this contradicts that d0 is unique Bayes w.r.t. s.

216 7 Statistical Decision Theory

X ¼ 1 with probabilityh

Example 7.22 Let

¼ 0 with prob 1 h: 0\h\1

To estimate θ under squared error loss.

Let d ðxÞ ¼ a non-randomized rule.

Let d ð1Þ ¼ u; d ð0Þ ¼ v; 0\u; v\1.

d ðu; vÞ

¼ h2 ð1 þ 2v 2uÞ þ h u2 v2 2v þ v2

The rule is equalizer iff 1 þ 2v 2u ¼ 0; or,

1 þ 2v

u¼ ð7:14Þ

2

2

and u2 v2 2v ¼ 0 or ð1 þ42vÞ v2 2v ¼ 0 (using 7.14)

Or 14 v ¼ 0 ) v ¼ 14 ) u ¼ 34

Thus, the only equalizer non-randomized rule is

3 1

d ð1Þ ¼ ; d ð0Þ ¼ :

4 4

E ðhÞ ¼ m1 and E h2 ¼ m2

) r ðs; d Þ ¼ ERd ðhÞ ¼ m2 ð1 þ 2v 2uÞ þ m1 u2 v2 2v þ v2

@rðs;d Þ

Now @u ¼ 0 ) 2m2 þ 2m1 u ¼ 0 ) u ¼ m2

m1

@r ðs; d Þ

¼ 0 ) 2m2 2m1 v 2m1 þ 2v ¼ 0

@v

m1 m2

)v¼

1 m1

m2 m1 m2

d ð 1Þ ¼ ; d ð 0Þ ¼ where m1 ¼ E ðhÞ; m2 ¼ E h2

m1 1 m1

Hence, the equalizer non-randomized rule is unique Bayes w.r.t. a s such that

1 m2

m2

m1 ¼ 34 and m1m 1

¼ 14.

) m1 ¼ 2 and m2 ¼ 38.

1

7.5 Methods for Finding Minimax Rule 217

[For example, let s ¼ B 12 ; 12 prior α = β m1 ¼ a þa b ¼ 12 and m2 ¼

aða þ 1Þ

ða þ bÞða þ b þ 1Þ ¼ 38 and the equalizer non-randomized rule is unique Bayes w.r.t.

1 1

B 2 ; 2 prior].

Thus the non-randomized rule d 0 ð1Þ ¼ 34 ; d 0 ð0Þ ¼ 14 is equalizer as well as

unique Bayes (w.r.t some prior) ) d 0 ðX Þ is minimax (unique).

Example 7.23

X þa

d ab ðX Þ ¼

nþaþb

2

X þa

Rd ab ðhÞ ¼ E h h

nþaþb

E h fðx nhÞ hða þ bÞ þ ag2

¼

ð n þ a þ bÞ 2

Eh ðx nhÞ2 þ h2 ða þ bÞ2 þ a2 2haða þ bÞ

½* E h ðx nhÞ ¼ 0 )

ð n þ a þ bÞ 2

n o

h i h2 ða þ bÞ2 n þ hfn 2aða þ bÞg þ a2

* E h ðx nhÞ2 ¼ nhð1 hÞ )

ð n þ a þ bÞ 2

d ab is equalizer iff

pﬃﬃ

ða þ bÞ2 ¼ n a ¼ 2n

, pﬃﬃ

2aða þ bÞ ¼ n b ¼ 2n

pﬃﬃ

X þ n

d pﬃn;pﬃn ðX Þ ¼ p2ﬃﬃﬃ

2 2 nþ n

pﬃﬃﬃ

pﬃﬃ pﬃﬃ

X þ n=2

is equalizer as well as unique Bayes (w.r.t B n

; n

prior). Hence pﬃﬃ is the

2 2 nþ n

unique minimax estimator of h.

218 7 Statistical Decision Theory

2

To estimate h under loss function Lðh; aÞ ¼ hðhð1ahÞ Þ

E h ðX h Þ

2

i.e. d 0 is an equalizer rule.

Also, d 0 is unique Bayes w.r.t. R(0,1) prior.

Hence d 0 ðX Þ ¼ Xn is the unique minimax estimator of h.

Result 2 If an equalizer rule d0 is extended Bayes, then it is minimax.

Example 7.25 X 1 ; X 2 ; . . .; X n i.i.d N ðh; 1Þ; 1\h\1

To estimate h under squared error loss. Let d 0 ¼ X; Rd 0 ðhÞ ¼ 1n 8h, i.e. d 0 is

equalizer. Also, d 0 is extended Bayes. Hence X is minimax.

Proof of Result 2

Rd0 ðhÞ ¼ c 8h

h2H

Also, d0 is extended Bayes

) given any 2 [ 0; there exists a prior s2 such that

d2D

ð7:15Þ

or; inf r ðs2 ; dÞ c 2

d2D

So there exists a d1 such that

h2H h2H

h2H

) Rd1 ðhÞ\c 2 8h; since Rd ðhÞ Sup Rd ðhÞ

h2H ð7:17Þ

) r ðs; d1 Þ\c 2 whatever be s:

) inf r ðs; dÞ\c 2 whatever be s

d2D

* inf r ðs; dÞ r ðs; d1 Þ

d2D

7.5 Methods for Finding Minimax Rule 219

(i) Rd0 ðhÞ c 8h for some real constant c.

(ii) d0 is Bayes (unique Bayes) w.r.t. a prior s0 such that r ðs0 ; d0 Þ ¼ c

Then d0 is minimax (unique minimax).

Corollary 1 Let d0 be such that

ðiÞ0 Rd0 ðhÞ ¼ c 8h (This is in fact Result 1)

ðiiÞ0 d0 is Bayes (unique Bayes) w.r.t. a prior s0 .

Then d0 is minimax (unique minimax). ðiÞ0 ; ðiiÞ0 ) ðiÞ; ðiiÞ

^

Corollary 2 Let d0 be such that

Rd0 ðhÞ ¼ c 8h 2 H0 ðHÞ

ðiÞ0

c 8h 2 H H0

ðiiÞ0 d0 is Bayes (unique Bayes) w.r.t. a s0 such that Prfh 2 H0 g ¼ 1

Then d0 is minimax (unique minimax)

ðiÞ0 ; ðiiÞ0 ; ) ðiÞ; ðiiÞ

Note For H0 ¼ H, Corollary 2 ) Corollary 1

Proof of Result 3 For any d and any s,

h2X

As Rd ðhÞ Sup Rd ðhÞ8h ) r ðs; dÞ Sup Rd ðhÞ

h2H h2H

h2H

h2H

(7.19), (7.20)

h2H

220 7 Statistical Decision Theory

So, minimaxiety of d0 :

h2H

r ð s 0 ; d0 Þ ðSince; d0 is Bayes w:r:t: s0 Þ

¼ cðbyðiiÞÞ

¼ Sup Rd0 ðhÞ by ð7:21Þ

h2H

) d0 is minimax.

Unique minimaxiety of d0 : For any dð6¼ d0 Þ

h2X

[ r ðs0 ; d0 Þ ðSince d0 is unique Bayes w:r:t:s0 Þ

¼ c ðby ðiiÞÞ

¼ Sup Rd0 ðhÞ by ð7:21Þ

h2X

h2X h2X

8dð6¼ d0 Þ ) d0 is unique minimax. h

Example 7.26 Let X Binðn; h1 Þ n be known.

Y Binðn; h2 Þ 0\h1 ; h2 \1; h1 ; h2 are unknown.

To estimate h1 h2 under squared error loss, we can expect a rule of the form

aX + bY + c to be minimum. However, no rule of this form is an equalizer rule. So

Result 1 (or Corollary 1) cannot be applied. But Corollary 2 can be applied as

follows:

Step 1: To ﬁnd an equalizer Bayes rule in some H0 ðHÞ. Let

H0 ¼ fh1 ; h2 =0\h1 ; h2 \1; h1 þ h2 ¼ 1g. Restricting to H0 , let us write h1 ¼ h;

h2 ¼ 1 h.

Thus, we have,

3

X Binðn; hÞ

7 independent:

Y Binðn; 1 hÞ 5

or n Y Binðn; hÞ

Without any loss of generality we may restrict ourselves to rules based on

Z ¼ X þ ðn Y Þ Binð2n; hÞ (Sufﬁcient statistic) pﬃﬃ pﬃﬃ

If X Binðn; hÞ, an equalizer and unique Bayes (w.r.t. Bin 2n ; 2n prior)

pﬃ

Xþ n

estimator of h under squared error loss is n þ p2ﬃﬃn.

pﬃﬃﬃﬃ pﬃﬃﬃﬃ

If Z Binð2n; hÞ; an equalizer and unique Bayes (w.r.t. Bin 22n ; 22n prior)

pﬃﬃﬃ

Z þ 2n

estimator of h under squared error loss is 2n þ p2ﬃﬃﬃ

2n

ﬃ.

7.5 Methods for Finding Minimax Rule 221

Lemma Under squared error loss, if d0 is an equalizer Bayes (unique) estimator of

gðhÞ, then d0 ¼ ad0 þ b is an equalizer Bayes (unique) estimator of

g ðhÞ ¼ agðhÞ þ b.

Proof For any estimator d of gðhÞ we can deﬁne an induced estimator, viz. d ¼

ad0 þ b of g ðhÞ ¼ agðhÞ þ b and vice versa.

Under squared error loss, Rd ðhÞ ¼ a2 Rd ðhÞ

r ðs; dÞ ¼ a2 r ðs; d Þ

d0 is Bayes (unique) w.r.t. s ) d0 ¼ ad0 þ b is Bayes (unique).

By the Lemma, an equalizer Bayes (unique) estimator of 2h 1 is

pﬃﬃﬃﬃ

2 z þ 22n 2ðX YÞ

pﬃﬃﬃﬃﬃ ¼ pﬃﬃﬃﬃﬃ

2n þ 2n 2n þ 2n

2ðXY Þ

pﬃﬃﬃﬃ

¼ d 0 (say)

2n þ 2n

Step 2: Rd0 ðh1 ; h2 Þ c 8ðh1 ; h2 Þ 2 H where c ¼ Rd 0 ðh1 ; h2 Þ for ðh1 ; h2 Þ 2 H0 .

Proof For ðh1 ; h2 Þ 2 H

2

2ðX Y Þ

Rd 0 ðh1 ; h2 Þ ¼ E h1 ;h2 pﬃﬃﬃﬃﬃ ðh1 h2 Þ

2n þ 2n

n pﬃﬃﬃﬃﬃ o 2 pﬃﬃﬃﬃﬃ 2

¼ E h 2ðX nh1 Þ 2ðY nh2 Þ 2nðh1 h2 Þ 2n þ 2n

¼ pﬃﬃﬃﬃﬃ2

2n þ 2n

2h1 ð1 h1 Þ þ 2h2 ð1 h2 Þ þ ðh1 h2 Þ2 Numerator

¼ pﬃﬃﬃﬃﬃ2 ¼ :

1 þ 2n Dinominator

¼ 1 f1 h1 h2 g2 1

0

¼ 0 holds iff h1 þ h2 ¼ 1

1

R d 0 ð h1 ; h2 Þ ¼ pﬃﬃﬃﬃﬃ2 ¼ c 8ðh1 ; h2 Þ 2 H0

Hence, 1 þ 2n h

\c 8ðh1 ; h2 Þ 2 H H0

222 7 Statistical Decision Theory

2ðXY Þ

By Corollary 2, Step 1 + Step 2 gives us d 0 ¼ 2n pﬃﬃﬃﬃ is the unique minimax

þ 2n

estimator of h1 h2 .

Result 4 Let d0 be such that

(i) Rd0 ðhÞ c 8h 2 H, c = a real constant

(ii) There exists a sequence of Bayes rules fdn g w.r.t. sequence of priors fsn g

such that r fsn ; dn g ! c. Then d0 is a minimax.

h2H

(7.22) ) For any d,

h2H

For d ¼ d0

(7.23) ) Sup Rd0 ðhÞ c and also condition ðiÞ ) Sup Rd0 ðhÞ c

h2H h2H

h2H

h2H h2H

i.e. d0 is minimax.

Example 7.27 Let X 1 ; X 2 ; ::; X n i.i.d N ðh; 1Þ; 1\h\1

To estimate h under squared error loss,

Let d 0 ¼ X; Rd 0 ðhÞ ¼ 1n 8h

(i) is satisﬁed with c ¼ 1n.

Let sr : N ð0; r2 Þ prior

2

d r ¼ Bayes estimator of h w.r.t. sr ¼ 1 þnrnr2 X

r ðsr ; d r Þ ¼ 1 þr nr2 ! 1n ¼ c as r2 ! 1

2

Hence d 0 ¼ X is minimax.

7.5 Methods for Finding Minimax Rule 223

2

To estimate θ with Lðh; aÞ ¼ ðh h aÞ .

(Apply Result 4 to prove that d 0 ¼ X is minimax)

2

Hint Rd0 ðhÞ ¼ Eh ðXh hÞ ¼ 1 ) (i) is satisﬁed with c = 1. Take

sab ðhÞ1eah hb1 ; 0\h\1. d ab ðX Þ ¼ Bayes estimator of h w.r.t. sab ¼

x þ b1

1 þ a r sab ; d ab ! 1 ¼ c as a ! 0, b ! 1. Hence d 0 ¼ X is minimax.

Proof Rd 0 ðhÞ ¼ c 8h ) Sup Rd 0 ðhÞ ¼ c.

h2H

If possible let d0 be not minimax. Then there exists a d1 such that

Sup Rd1 ðhÞ\ Sup Rd0 ðhÞ ¼ c

h2H h2H

To estimate a real-valued parameter h under squared error loss. H ¼ an open

interval of R1 . Without any loss of generality we can restrict ourselves to

non-randomized rules only (since the loss function is convex). h

Let d ðX Þ ¼ a non-randomized rule.

bd ðhÞ ¼ E h ðd ðX ÞÞ h ¼ Bias of d ðX Þ

¼ b2d ðhÞ þ V h ðd ðX ÞÞ

2 .

b2d ðhÞ þ 1 þ b0d ðhÞ I ð hÞ 8h

¼ C d ð hÞ ðsayÞ

(i) MSE of d 0 attains C–R lower bound, i.e.

224 7 Statistical Decision Theory

C d1 ðhÞ Cd 0 ðhÞ 8h

) bd 1 ð hÞ ¼ bd 0 ð hÞ 8h

Then d 0 is admissible.

If further, d 0 is equalizer, then d 0 is minimax.

Proof Result I ) proves that it is minimax h

Proof of admissibility If possible let d 0 be inadmissible.

Then there exists a d 1 such that

(7.26) ⇒

|ﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄ

ﬄ{zﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄ} |ﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄ{zﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄ}

(7.27) ⇒

Thus, strict inequality in (7.26) cannot hold for any h, i.e. there cannot be any d 1

such that d 1 [ d 0 .

Hence d 0 is admissible. h

Example 7.29 Let x1 ; x2 . . .xn i.i.d N ðh; 1Þ; 1\h\1. To estimate h under

squared error loss.

is sufﬁcient ) it is enough to restrict to n.r. rules based on X

X only.

Let d 0 ¼ d 0 ðX Þ ¼ X.

Rd 0 ðhÞ ¼ 1n 8 h, i.e. d 0 is equalizer.

Also, bd0 ðhÞ ¼ 0 8 h, i.e. d 0 is unbiased.

1

Rd 0 ðhÞ ¼ C d 0 ðhÞ ¼ 8h

n

7.5 Methods for Finding Minimax Rule 225

Let d ¼ d ðX Þ be any n.r rule based on X.

) bd ðhÞ ¼ 0 8 h, i.e. d is also unbiased.

Lemma ) Condition (ii) of Result 2 is also satisﬁed.

Hence, (i) d 0 is admissible.

(ii) d 0 is minimax.

Also, (iii) d 0 is unique minimax.

[Proof of (iii): Let d 1 ¼ d 1 ðxÞ be another minimax rule.

Then

1

Sup Rd1 ðhÞ ¼ Sup Rd 0 ðhÞ ¼ ¼ C d 0 ð hÞ

h2H h2H n

) C d1 ðhÞ Rd 1 ðhÞ Sup Rd 1 ðhÞ ¼ Cd 0 ðhÞ8h

|ﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄ{zﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄ} h2H

By C–R inequality

) Cd 1 ðhÞ Cd 0 ðhÞ 8 h ) bd1 ðhÞ ¼ 0 8 h (By Lemma).

) d 1 is an unbiased estimator of h. But since X is complete d 0 is the unique

is the unique minimax esti-

unbiased estimator of h; i.e., d 1 ¼ d 0 , Hence d 0 ¼ X

mator of h]

Proof of Lemma Writing bd ðhÞ ¼ bðhÞ

Let Cd ðhÞ Cd 0 ðhÞ ¼ 1n 8 h

.

i:e:; b2 ðhÞ þ f1 þ b0 ðhÞg

2

n 1=n 8 h ð7:30Þ

1 2b0 ðhÞ .

b2 ðhÞ þ f1 þ b0 ðhÞg n 1=n

2

½as ð7:30Þ) þ

n n

2b0 ðhÞ

) 0 ) b0 ðhÞ 0

n

2

2

b0 ðhÞ

Now ð7:32Þ ) 1

8 h such that bðhÞ 6¼ 0

b0 ðhÞ

2 2

226 7 Statistical Decision Theory

d 1 1

Or b ð hÞ 8 h such that bðhÞ 6¼ 0 ð7:33Þ

dh 2

ð7:31Þ; ð7:33Þ ) bðhÞ ! 0 as h ! 1 ð7:34Þ

Lðh; aÞ ¼ Loss ðto the statisticianÞ if the statistician chooses an action ‘a’ and

nature chooses an action ‘h’.

A randomized action for the statistician = a probability distribution over ɶ.

The statistician observes the value of a r.v. X. If X = x is observed, the statistician

chooses a randomized action dðxÞ.

¼ a randomized action for the nature:

If the statistician chooses a randomized rule d and the nature chooses a ran-

domized action s, then the statistician’s expected loss is

cðs; dÞ ¼ Bayes risk of d w:r:t: s.

Result 1 For any d 2 D,

Sup Rd ðhÞ ¼ Sup cðs; dÞ where H = the set of all possible s’s.

h2H s2H

Proof

h2H

) cðs; dÞ Sup Rd ðhÞ 8 s ð7:35Þ

h2H

) Sup cðs; dÞ Sup Rd ðhÞ

s2H h2H

7.6 Minimax Rule: Some Theoretical Aspects 227

Hence, Sup r ðs; dÞ r ðs0 ; dÞ ¼ Rd ðhÞ 8 h

s2H

Thus Rd ðhÞ Sup r ðs; dÞ 8 h

s2H

h2H s2H

(7.35), (7.36) ) Sup Rd ðhÞ ¼ Sup r ðs; dÞ, hence the proof.

h2H s2H

A rule d0 is minimax if it minimizes

Sup Rd ðhÞw:r:t d 2 D

h2X

s2H

s2H d2D s2H

whatever be the action chosen by nature.

Similarly, a prior s0 is said to be a maximum rule for the nature or a least

favourable prior for the statistician if s0 maximizes inf r ðs; dÞw:r:t s, i.e. if

d

d s d

If nature chooses a least favourable s0 , then expected loss (of the statistician) is

at least m whatever be the rule the statistician chooses. h

Result 2 m m

Proof

s

) inf r ðs; dÞ inf Sup r ðs; dÞ ¼ m 8 s

d d s

) Sup inf r ðs; dÞ m

s d

) m m

228 7 Statistical Decision Theory

Result 3 if the statistical game has a value and a least favourable prior s0 and a

minimax rule d0 exists, then d0 is Bayes w.r.t. s0 .

Proof m ¼ inf r ðs0 ; dÞ r ðs0 ; d0 Þ Sup r ðs; d0 Þ ¼ m

d s

If m ¼ m, then ‘=’ must hold every where implying

inf r ðs0 ; dÞ ¼ r ðs0 ; d0 Þ ) d0 is Bayes w.r.t. s0 . h

d

Minimax theorem Let H be ﬁnite and the risk set S be bounded below. Then the

statistical game will have a value and a least favourable prior s0 exists.

If further, S is closed from below an admissible minimax rule d0 exists and d0

Bayes w.r.t. s0 .

Thus if H is ﬁnite and S is bounded below as well as closed from below, then

(i) A minimax rule exists

(ii) An admissible minimax rule exists and

(iii) A minimax rule is Bayes (w.r.t least favourable prior s0 ).

(i) Rd0 ðhÞ c 8 h

(ii) d0 is Bayes w.r.t. some s0 and r ðs0 ; d0 Þ ¼ c,

then

(a) d0 is minimax

(b) s0 is least favourable prior.

Proof

(a) Proved earlier

(b) To show inf r ðs0 ; dÞ inf r ðs; dÞ 8 s ðbÞ

d d

Now (i) ) r ðs; d0 Þ c 8 s

) inf r ðs; dÞ r ðs; d0 Þ c ¼ r ðs0 ; d0 Þ ¼ inf r ðs0 ; dÞ 8 s by ðiiÞ

d d

This proves (b). h

7.7 Invariance

such case it seems reasonable to restrict to decision rules, which are also invariant

w.r.t. similar transformations. Such a decision rule is called an invariant decision

rule and in many problems a best rule exists within the class of invariant rules.

7.7 Invariance 229

We are to estimate h under the squared error loss.

Suppose one considers a transformation of X, viz., X 0 ¼ X þ c, c = a given

constant and considers the problem of estimating h0 ¼ h þ c on the basis of

X 0 N ðh0 ; 1Þ under the squared error loss.

For an action ‘a’ for the ﬁrst problem, there is an action a0 ¼ a þ c for the second

problem and vice versa with Lðh; aÞ ¼ Lðh0 ; a0 Þ. Thus the two problems may be

considered to be equivalent in the sense that ðH; ɶ, L) ðH0 ; ɶ′, L′).

Now let d ¼ d ð X Þ ¼ a reasonable estimator of h on the basis of X. Then

d ðX 0 Þ ¼ d ðX þ cÞ should be a reasonable estimator of h0 on the basis of X 0 . Also, if

d ð xÞ ¼ a reasonable estimate of h on the basis of X ¼ x then d ð xÞ þ c should be a

reasonable estimate for h0 . The two estimates are identical if

d ðx þ cÞ ¼ d ð xÞ þ c ð7:37Þ

holds 8x8c.

d ð X Þ is an equivariant estimator iff d ð X Þ ¼ X þ K ¼ dK ð X Þ (say) for some

constant K.

[If d ð X Þ ¼ X þ K, then (7.37) is satisﬁed 8x8c. Let (7.37) be satisﬁed 8x8c. For

c ¼ x, ðiÞ ) d0 ¼ dðxÞ x; or d ð xÞ ¼ x þ K, K ¼ dð0Þ Rdk ðhÞ ¼

E ðX þ K hÞ2 ¼ 1 þ K 2 which is minimum when K = 0. Thus

Rd0 ðhÞ Rdk ðhÞ8h8K.

) d0 ð X Þ ¼ X is the best within the class of equivariant estimators.]

ðH; ɶ, L) X = a r.v. and x = observed value of X 2 x (=sample space)

Ph = A probability distribution over x depending on h.

P ¼ fPh =h 2 Hg = family of probability distribution.

A statistical decision problem ðH; a; LÞ and P,

Groups of transformation of X(or x)

Y ¼ gðXÞ = a transformation of X

gðxÞ ¼ a single valued function of x.

g: x! x , g = a transformation on x

We assume that g is measurable so that g(x) is an r.v. g is said to be an onto

transformation if the range of g(x) is x is

x, i.e. x.

g is said to be 1:1 if gðx1 Þ ¼ gðx2 Þ ) x1 ¼ x2 .

Example 7.31 x ¼ R1 ; gðxÞ ¼ x þ c; c ¼ a real constant. This g is 1:1 and onto.

The identity transformation e is deﬁned as eðxÞ ¼ x. Let g1 ; g2 be two trans-

formations on x. Then the composition of g2 ; g1 , denoted by g2 g1 is deﬁned as

g2 g1 ðxÞ ¼ g2 ½g1 ðxÞ.

Example 7.32 x ¼ R1

g1 ðxÞ ¼ x þ c1 and g2 ðxÞ ¼ c2 , c1, c2 are real constants. g1 g2 ðxÞ ¼ x þ c1 þ c2

230 7 Statistical Decision Theory

Also ge ¼ eg ¼ g

If g is a transformation on x, then the inverse transformation of g, denoted by

g1 , is the transformation g such that

gg1 ¼ g1 g ¼ e.

In the example, g1 1 ðxÞ ¼ x c1 .

Let G = a class of transformation on

x

Deﬁnition G is called a group of transformations if G is closed under the com-

positions and inverses, i.e. if

i. g1 ; g2 2 G ) g2 g1 2 G

ii. g 2 G ) g1 2 G.

Note Let G be a group of transformations, then every g 2 G is 1:1 and onto (since

g1 exists).

Also, the identity transformation e always 2 G [if g 2 G, then g1 2 G,

e ¼ g1 g 2 G].

Example 7.33 x ¼ R1

gc ðxÞ ¼ x þ c; c ¼ a real constant.

Let G ¼ fgc = 1\c\1g

1

gc 2 G ) g1

c 2 G Asgc ðxÞ ¼ x þ ðcÞ

Example 7.34

x ¼ R1 , gc ðxÞ ¼ cx where c = a positive real constant

1

g1

c ðxÞ ¼ x

c

gc 1 ; gc 2 2 G ) gc 1 gc 2 2 G

gc 2 G ) g1

c 2G

7.7 Invariance 231

These are multiplicative or group under scale transformation.

Example 7.35

x ¼ R1 , ga;b ¼ a þ bx

G ¼ ga;b = 1\a\1; 0\b\1

G is a group transformation.

It is a group under both location and scale transformation.

Example 7.36 x ¼ f0; 1; 2. . .ng

Let gðxÞ ¼ n x

G ¼ fe; gg

Hence, G is a group of transformation.

Example 7.37 x ¼ ðx1 ; x2 ; x3 ; . . .. . .xn Þ

x0 = The set of possible values of xi

x¼ x0 x. . .. . .. . .:x

x0 x x0

Let gi ð xÞ ¼ ðxi1 ; xi2 . . .. . .xin Þ

The invariance of a statistical decision problem is considered to be w.r.t a given

group transformations G on x:

x:

Deﬁnition P ¼ fPh =h 2 Hg is said to be invariant w.r.t G if for any g 2 G and any

h 2 H ði:e:; any Ph 2 PÞ there exists a unique h0 2 H (i.e. a unique Ph0 2 P) such

that probability distribution of y ¼ gð xÞ is Ph0 when the probability distribution of

X is Ph .

This unique h0 determined by g and h is denoted by gðhÞ.

Example 7.38 X Nðh; 1Þ; 1\h\1

If X Nðh; 1Þ then Y ¼ gc ðxÞ Nðh þ c ¼ h0 ; 1Þ

232 7 Statistical Decision Theory

Thus P is invariant under G with

gc ðhÞ ¼ h þ c:

Example 7.39 X expðhÞ; 0\h\1

P.d.f of X under h is 1h eh ; x [ 0

x

P ¼ fexpðhÞ=0\h\1g

If X expðhÞ, then gc ðxÞ expðchÞ, i.e. ch ¼ h0 : h0 is uniquely determined by

c and h. Thus P is invariant under G with gc ðhÞ ¼ ch:

Example 7.40 Let X Binðn; hÞ, n known, 0\h\1

P ¼ fBinðn; hÞ=0\h\1g

If X Binðn; hÞ then eðxÞ Binðn; h ¼ h0 Þ and gðxÞ Binðn; 1 h ¼ h0 Þ

h0 is uniquely determined by h and member of G. Thus P is invariant under

G with eðhÞ ¼ h;

gðhÞ ¼ 1 h:

x

Let P be invariant w.r.t G with induced group of transformations on H as

¼ f

G g=g 2 Gg:

Deﬁnition The loss function L is said to be invariant w.r.t G if for each g 2 G and

each a 2 , there exists a unique a0 2 such that

Lðh; aÞ ¼ LðgðhÞ; a0 Þ 8h 2 H:

Example 7.41 X Nðh; 1Þ; 1\h\1

G ¼ fgc = 1\c\1g; gc ¼ x þ c

P is invariant w.r.t

G with G ¼ fgc = 1\c\1g; gc ðhÞ ¼ c þ h:

To estimate h under Lðh; aÞ ¼ ðh aÞ2

For any gc 2 G; a 2 , there is an a0 ¼ a þ c 2

such that Lðh; aÞ ¼ Lðgc ðhÞ; a0 Þ 8h 2 X:

a0 is uniquely determined by a and c. Hence the loss function is invariant w.r.t G.

7.7 Invariance 233

G ¼ fgc =0\c\1g, gc ðhÞ ¼ ch:

2

To estimate h with Lðh; aÞ ¼ 1 ah

For a0 ¼ ca, Lðh; aÞ ¼ Lðgc ðhÞ; a0 Þ 8h 2 X:

This a0 is uniquely determined by a and c. Hence the loss function is invariant w.

r.t. G.

Example 7.43 X Binðn; hÞ, 0\h\1

¼ fe;

G gg, eðhÞ ¼ h, gðhÞ ¼ 1 h:

To estimate h under squared error loss.

Then Lðh; aÞ ¼ LðeðhÞ; a0 Þ where a0 ¼ a

and Lðh; aÞ ¼ LðgðhÞ; a0 Þ where a0 ¼ 1 a: a0 is uniquely determined by a

member of G. Thus L is invariant w.r.t. G.

Invariance of a statistical decision problem:

A statistical decision problem ðH; ; LÞ and P

G = A group of transformation of x.

Deﬁnition A Statistical decision problem is said to be invariant under G if

(i) P is invariant under G

and (ii) L is invariant under G.

Thus as already shown

i. X Nðh; 1Þ to estimate h under squared error loss

G ¼ fge = 1\c\1g; ge ðxÞ ¼ x þ c

2

To estimate h under Lðh; aÞ ¼ 1 ah ; ge ðxÞ ¼ cx the problem is invariant

under G.

iii. X Binðn; hÞ; n is known, 0\h\1

To estimate h under squared error loss with G ¼ fe; gg

e(x) = x, g(x) = n − x, the problem is invariant under G.

To test H0 : l 0 against H1 : l [ 0, i.e. h 2 H0 against h 2 H1

234 7 Statistical Decision Theory

H0 ¼ fh ¼ ðl; r2 Þ=l 0g

= A group of transformation on

x

X Nðl; r2 Þ

) gc ðxÞ Nðc l; c2 r2 Þ 2 P

Note ge ðhÞ 2 Hi ; i ¼ 0; 1; 2; . . .. . .:ðiÞ

i.e. both P0 and P1 are invariant under G

where Pi ¼ fPh =h 2 Hi g; i ¼ 0; 1:

Also, Lðh; aÞ ¼ Lðgc ðhÞ; a0i Þ; i ¼ 0; 18h 2 H by (i)

⇒ Loss is invariant under G

Note To test H0 : h 2 H0 against H1 : h 2 H1 ; H0 ; H1 , disjoint, H0 þ H1 ¼ H

a ¼ fa0 ; a1 g; ai ¼ accept H i :

a0 a1

h 2 H0 0 L0

h 2 H1 L1 0

Let G = a group of transformation on

x

Also, h 2 Hi , gðhÞ 2 Hi ; i ¼ 0; 1

Hence, Lðh; ai Þ ¼ LðgðhÞ; ai Þ; i ¼ 0; 18h 2 H

L is invariant under G

A test of hypothesis problem (with 0–Li loss) is said to be invariant under G if

both P0 and P1 are invariant under G.

corresponding group of induced transformations on H and a.

7.7 Invariance 235

Let g 2 G

Let dðXÞ ¼ a be reasonable n.r. rule for the original problem. dðgðxÞÞ should be

a reasonable rule for the transformed problem. Also if for X = x, dðxÞ 2 is a

reasonable action for the original problem, then for gðXÞ ¼ gðxÞ; ~gðdðXÞÞ should be

a reasonable action in the transformed problem.

These two agree if dðgðxÞÞ ¼ ~gðdðxÞÞ. . .. . .:ðiiÞ

A non-randomized rule is said to be an invariant non-randomized rule if

(ii) holds 8x 2 x8g 2 G.

We thus get a class of n.r. decision rules as

DI = the class of invariant n.r. rules.

Appendix

viduals having certain character, say A. We are to test H0 : p ¼ p0 .

For doing this we draw a sample of size n. Suppose x = no. of individuals in the

sample have character A. The sufﬁcient statistic x is used for testing H0 : p ¼ p0 .

Suppose x0 is the observed value of x. Then x binðn;! pÞ.

P n

(a) H1 : p [ p0 ; x0 : P½x x0 =H0 a i:e:; px0 ð1 p0 Þnx a

x x0 x

!

P n

(b) H2 : p\p0 ; x0 : P½x x0 =H0 a i:e:; px0 ð1 p0 Þnx a

x x0 x

h n i

x0 : P x d0 =H0 a

h 2 i h i

n n

i:e:; P x þ d0 =H0 þ P x d0 =H0 a

2 ! 2 !

n n

X n 1 X n 1 n

i:e:; þ a where d0 ¼ x0

x n2 þ d0 x 2 x n2d0 x 2 2

Note

(1) For other values of π0 the exact test cannot be obtained as binomial distri-

bution is symmetric only when p ¼ 12.

(2) For some selected n and π the binomial probability sums considered above are

given in Table 37 of Biometrika (Vol. 1)

P.K. Sahu et al., Estimation and Inferential Statistics,

DOI 10.1007/978-81-322-2514-0

238 Appendix

A.1.2 Suppose we have two inﬁnite populations with π1 and π2 as the unknown

proportion of individuals having character A. We are to test H0 : p1 ¼ p2 .

To do this we draw two samples from two populations having sizes n1 and n2.

Suppose x1 and x2 as the random variables denoting the no. of individuals in the 1st

and 2nd samples with character A.

To test H0 : p1 ¼ p2 we make use of the statistics x1 and x2 such that x1 þ x2 ¼ x

(constant), say.

Under H0 : p1 ¼ p2 ¼ p (say),

!

n1

f ðx1 Þ ¼ p:m:f: of x1 ¼ px1 ð1 pÞn1 x1

x1

!

n2

f ðx2 Þ ¼ p:m:f: of x2 ¼ px2 ð1 pÞn2 x2

x2

n1 þ n2

f ðxÞ ¼ p:m:f: of x ¼ px ð1 pÞn1 þ n2 x :

x

! !

n1 n2

x1 x2

f ðx1 =xÞ ¼ , which is hypergeometric and independent of p.

n1 þ n2

x

Suppose the observed values of x1 and x are x10 and x0 respectively.

!

n1 n2

X x1 x0 x1

i:e:; a

x1 x10 n1 þ n2

x0

!

n1 n2

X x1 x0 x1

i:e:; a

x1 x10 n1 þ n2

x0

Appendix 239

Note The above probabilities can be obtained from the tables of hypergeometric

distributions (Standard University Press).

a random sample ðx1 ; x2 ; . . .; xn Þ of size n from this population. Here, we are to test

H 0 : k ¼ k0 . P

To develop a test we make use of the sufﬁcient statistic y ¼ ni¼1 xi ; which is

itself distributed as Poisson with parameter nk. The p.m.f. of y under H0 is therefore

y

f ðyÞ ¼ enk0 ðnky!0 Þ ; y ¼ 0; 1; 2. . .

Suppose y0 is the observed value of y.

(a) H1 : k [ k0 ; x0 : P½y y0 =k ¼ k0 a

X ðnk0 Þy

i:e:; enk0 a:

y y0

y!

X ðnk0 Þy

i:e:; enk0 a:

y y0

y!

Note These probabilities may be obtained from Table 7 of Biometrika (Vol. 1)

A.2.2 Suppose we have two populations Pðk1 Þ and Pðk2 Þ. We draw a random

sample ðx11 ; x12 ; . . .; x1n1 Þ of size n1 from Pðk1 Þ and another random sample

ðx21 ; x22 ; . . .; x2n2 Þ of size

Pn12 from Pðk2 Þ. We are to Ptest H0 : k1 ¼ k2 ¼ k (say).

Here we note that y1 ¼ ni¼1 x1i Pðn1 k1 Þ and y2 ¼ ni¼1

2

x2i Pðn2 k2 Þ.

To develop a test we shall make use of the sufﬁcient statisticsy1 and y2 but shall

concentrate only on those for which y ¼ y1 þ y2 = constant. Under H0 the p.m.f. of

y1 ; y2 and y are

f ðy1 Þ ¼ en1 k ; f ðy2 Þ ¼ en2 k and f ðyÞ ¼ eðn1 þ n2 Þk

y1 ! y2 ! y!

240 Appendix

yy1 y1

en2 k ðnðyy

2 kÞ

1 Þ!

en1 k ðn1ykÞ

1!

f ðy1 =yÞ ¼ fðn1 þ n2 Þkgy

eðn1 þ n2 Þk y!

y! ny11 ny22

¼

y1 !y2 ! ðn1 þ n2 Þy

! y1 y2

y n1 n1 n1

¼ 1 bin y; free of k:

y1 n1 þ n2 n1 þ n2 n1 þ n2

and y are y10 and y0 respectively. We consider the conditional p.m.f. f ðy1 =y0 Þ for

testing H0 .

X y0 n1 n2

i:e:; a

y1 y10 y1 n1 þ n2 n1 þ n2

X y0 n1 n2

i:e:; a:

y1 y10 y1 n1 þ n2 n1 þ n2

In many investigations one is faced with the problem of judging whether two

qualitative characters, say A and B, may be said to be independent. Let us denote the

forms of A by Ai fi ¼ 1ð1Þkg and the forms of B by Bj fj ¼ 1ð1Þlg, and the prob-

ability associated with the cell Ai Bj in the two-way classiﬁcation of the population

Appendix 241

P

by pij . The probability associated with Ai is then pi0 ¼ j pij and that associated

P

with Bj is p0j ¼ i pij . We show the concerned distribution in the following table:

A B Total

B1 B2 …. Bj …. Bl

A1 p11 p12 …. pij …. p1l p10

A2 p21 p22 …. p2j …. p2l p20

. . . . . .

. . . . . .

Ai pi1 pi2 …. pij …. pil pi0

. . . . . .

. . . . . .

Ak pk1 pk2 …. pkj …. pkl pk0

Total p01 p02 …. p0j …. p0l 1

where pij ¼ P A ¼ Ai ; B ¼ Bj 8ði; jÞ

pi0 ¼ PðA ¼ Ai Þ and p0j ¼ P B ¼ Bj

To do this we draw a random sample of size n. Let nij P = observed frequency for

the cell Ai Bj . The marginal frequency of Ai is ni0 ¼ j nij and that of Bj is

P

n0j ¼ i nij . Note that the joint p.m.f. of nij is multinomial, i.e.

YY

i ¼ 1ð1Þk i ¼ 1ð1Þk n!

f nij ; pij ; ¼Q Q ðpij Þnij :

j ¼ 1ð1Þl j ¼ 1ð1Þl i j ðn ij Þ! i j

Y Y

i ¼ 1ð1Þk n!

f nij ; ¼Q Q ðpio Þnio ðpoj Þnoj

j ¼ 1ð1Þl i j ðn ij Þ! i j

n! Y

f ðni0 Þ ¼ Q ðpi0 Þni0 8i ¼ 1ð1Þk

i ðn i0 Þ! i

n! Y

f ðn0j Þ ¼ Q ðp0j Þn0j 8j ¼ 1ð1Þl

j ðn 0j Þ! j

Q of nij keeping marginals ﬁxed is, under

f ðnij Þ ðni0 Þ! ðn0j Þ!

H 0 ; f ðni0 Þf ðn0j Þ ¼ n!i Q Q ðnj Þ!

i j ij

242 Appendix

This may be used for testing H 0 . Keeping marginal frequencies ﬁxed we change

the cell-frequencies and calculate the corresponding probabilities. If the sum of the

probabilities a, then we reject H 0 .

Normal Distribution

draw a random sample ðx1 ; x2 ; . . .; xn Þ of size n from this population. Here x ¼

Pn P P

1

n 1 xi ; s ¼ n

2 1

xÞ2 and s02 ¼ n 1 1 i ðxi xÞ2 .

i ðxi

A.4.1 To test H 0 : l ¼ l0 .

pﬃﬃ

Case I r known: we note that nðxlÞ r Nð0; 1Þ

pﬃﬃ

Under H 0 ; s ¼ nðxr l0 Þ Nð0; 1Þ:

H 1 : l [ l0 ; x0 : s [ sa

H 2 : l\l0 ; x0 : s\sa

H 3 : l 6¼ l0 ; x0 : jsj [ sa=2

100ð1 aÞ% conﬁdence interval for l (when H 0 is rejected) is x prﬃﬃn sa=2

pﬃﬃ

Case II r unknown: Here we estimate r by s’ and nðxs0 lÞ tn1 .

pﬃﬃ

Under H 0 t ¼ nðxs0 l0 Þ tn1 :

H 1 : l [ l0 ; x0 : t [ ta;n1

H 2 : l\l0 ; x0 : t\ta;n1

H 3 : l 6¼ l0 ; x0 : jtj [ ta=2;n1

0

100ð1 aÞ% conﬁdence interval for l is x ps ﬃﬃn ta=2 ; n 1

A.4.2 To test H 0 : r ¼ r0P

: P

ðxi lÞ2 ðxi lÞ2

Case I l known: we know r2 v2n , under H 0 ; v2 ¼ r20

v2n .

H 1 : r [ r0 ; x0 : v2 [ v2a;n

H 2 : r\r0 ; x0 : v2 \v21a;n

H 3 : r 6¼ r0 ; x0 : v2 [ v2a=2;n

Appendix 243

or,

v2 \v21a=2;n

" P #

ðxi lÞ2

P v1a=2;n \

2

\va=2;n ¼ 1 a

2

r2

"P P #

ðxi lÞ2 ðxi lÞ2

i:e:; P \r \2

¼1a

v2a=2;n v21a=2;n

P

interval for r2 when l is known is

ðxi lÞ2 ðxi lÞ2

v 2 ; v2

.

a=2;n 1a=2;n

P P

ðxi xÞ2 ðxi xÞ2

Case II l unknown: we know r2 v2n1 under H 0 ; v2 ¼ r20

v2n1

H 1 : r [ r0 ; x0 : v2 [ v2a;n1

H 2 : r\r0 ; x0 : v2 \v21a;n1

H 3 : r 6¼ r0 ; x0 : v2 [ v2a=2;n1 :

or,

v2 [ v21a=2;n1

" P #

ðxi xÞ2

P v1a=2;n1 \

2

\va=2;n1 ¼ 1 a

2

r2

"P P #

ðxi xÞ2 ðxi xÞ2

i:e., P \r \ 2

2

¼1a

v2a=2;n1 v1a=2;n1

P) 100ð1 P

aÞ% conﬁdence interval for r2 when l is unknown is

ðxi xÞ2 ðxi xÞ2

v2

; v2 .

a=2;n1 1a=2;n1

Distributions

Suppose we have two independent populations Nðl; r21 Þ and Nðl2 ; r22 Þ. We draw a

random sample ðx11 ; x12 ; . . .; x1n1 Þ of size n1 from the ﬁrst population and another

random sample ðx21 ; x22 ; . . .; x2n2 Þ of size n2 from the second population.

244 Appendix

1X n1

1X n2

x1 ¼ x1i and x2 ¼ x2i

n1 i¼1 n2 i¼1

1 X n1

1 X n2

s02

1 ¼ ðx1i x1 Þ2 s02

2 ¼ ðx2i x2 Þ2

n1 1 i¼1 n2 1 i¼1

respectively.

(I) H 0 : 11 l1 þ 12 l2 ¼ 13 :

Case I r1 ; r2 known:

11x1 þ 12x2 ð11 l1 þ 12 l2 Þ

We ﬁnd that sﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ Nð0; 1Þ

121 r21 122 r22

þ

n1 n2

11x1 þ 12x2 13

Under H 0 ; s ¼ sﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ Nð0; 1Þ

121 r21 122 r22

þ

n1 n2

) H 1 : 11 l1 þ 12 l2 [ 13 ; x0 : s [ sa

H 2 : 11 l1 þ 12 l2 \13 ; x0 : s\sa

H 3 : 11 l1 þ 12 l2 6¼ 13 ; x0 : jsj [ sa=2

2 sﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ 3

1 2 r2 1 2 r2

411x1 þ 12x2 1 1

þ 2 2 sa=2 5

n1 n2

Case II r1 ; r2 unknown:

Fisher’s t-test: We assume r1 ¼ r2 ¼ r, say.

ðn1 1Þs02 02

1 þ ðn2 1Þs2

r2 is estimated by ðn1 þ n2 2Þ ¼ s02 say

Also, sﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ tn1 þ n2 2

2 2

1 1

s0 1

þ 2

n1 n2

11x1 þ 12x2 13

Under H 0 ; t ¼ sﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ tn1 þ n2 2

121 122

s0 þ

n1 n2

This t is known as Fisher’s t when 11 ¼ 1, 12 ¼ 1.

Appendix 245

H 1 : 11 l1 þ 12 l2 [ 13 ; x0 : t [ ta;n1 þ n2 2

H 2 : 11 l1 þ 12 l2 \13 ; x0 : t\ ta;n1 þ n2 2

H 3 : 11 l1 þ 12 l2 6¼ 13 ; x0 : jtj [ ta=2 ;n1 þ n2 2

0 sﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ 1

1 2 1 2

@11x1 þ 12x2 s0 1

þ 2 ta=2;n1 þ n2 2 A:

n1 n2

Note–I The above procedure may also be applicable when r1 and r2 are not

r2

equal provided 1 r12 \ 0.4—theoretical investigation in this area veriﬁes this.

2

Note–II when homoscedasticity assumption r1 ¼ r2 is not tenable then we

require the alternative procedure and the corresponding problem is known as the

Fisher-Behren problem.

Note–III For 11 ¼ 1 and 12 ¼ 1 we get the test procedure for the difference

between the two means. Also for testing the ratio of the means, i.e. for testing

H 0 : ll1 ¼ k; say, we start with ðx1 kx2 Þ.

2

(II) H 0 : rr12 ¼ n0 :

P

1

ðx1i l1 Þ2

Case I l1 ; l2 known: n11 P ðx l Þ2 : r12 F n1 ;n2

n1 2i 2 1

r2

2

P

ðx1i l Þ2 =n1

) Under H 0 ; F ¼ P ðx l1 Þ2 n : n12 F n1 ;n2

2i 2 = 2 0

r1

H1 : [ n0 ; x0 : F [ F a;n1 ;n2

r2

r1

H 2 : \n0 ; x0 : F\F 1a;n1 ;n2

r2

r1

H3 : 6¼ n0 ; x0 : F [ F a=2;n1 ;n2 or; F\F 1a=2;n1 ;n2 :

r2

P

ðx1i l1 Þ2 =n1 r22

P

Also, P F 1a=2;n1 ;n2 \ ðx l Þ2 n : r2 \F a=2;n1 ;n2 ¼ 1 a

2i 2 = 2 1

rﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

P rﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

P

ðx1i l1 Þ2 ðx1i l1 Þ2

Or, P P n2 r1

\ r2 \ n P n2

¼1a

n 1 ðx l Þ2 F

2i 2 a=2;n1 ;n2 ðx l Þ2 F

1 2i 2 1a=2;n1 ;n2

r1

This provides the 100ð1 aÞ% conﬁdence interval for r2 when l1 ; l2 are

known.

Case II l1 ; l2 unknown:

P

1

ðx1i x1 Þ2

We have n111 P ðx x Þ2 : r12 F n1 1;n2 1

n2 1 2i 2 1

r2

2

246 Appendix

s02

1 r2

2

i:e:; : F n1 1;n2 1

s02

2 r1

2

s02

under H 0 ; F ¼ s102 : e12 F n1 1;n2 1

2 o

r1

) H1 : [ n0 ; x0 : F [ F a;n1 1;n2 1

r2

r1

H2 : \n0 ; x0 : F\F 1a;n1 1;n2 1

r2

r1

H3 : 6¼ n0 ; x0 : F [ F a=2;n1 1;n2 1 or F\F 1a=2;n1 1;n2 1 :

r2

2 3

6 s02 7

Also, P4F 1a=2;n1 1;n2 1 \ s102 : 12 \F a=2;n1 1;n2 1 5 ¼ 1 a

2 r1

r2

rﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ rﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

s02 r1 s02

Or, P s02 F

1

\ r2 \ s02 F

1

¼1a

2 a=2;n1 1;n2 1 2 1a=2;n1 1;n2 1

i.e., 100ð1 aÞ% conﬁdence interval for rr12 , when l1 ; l2 are unknown, is

2 3

s01 0 s01 0

6 s2 s2 7

4qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

ﬃ ; qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ5:

F a=2;n1 1;n2 1 F 1a=2;n1 1;n2 1

Distributions

Suppose in a given population the variables x and y are distributed in the bivariate

normal form N 2 ðlx; ly ; rx ; ry ; qÞ. Let ðx1 ; y1 Þ; ðx2 ; y2 Þ; . . .; ðxn ; yn Þ be the values of

x and y observed in a sample of size n drawn from this population. We shall

suppose that the n pairs of sample observations are random and independent. We

shall also assume that all the parameters are unknown.

We have for the sample observations

1X 1X 1 X

x¼ xi ; y ¼ yi ; s02

x ¼ ðxi xÞ2 ;

n i n i n1 i

P

1 X

2 2

i ðxi xÞ ðyi yÞ

1

02 2

sy ¼ ðyi yÞ ; and r xy ¼ n1

n1 i s0x s0y

Appendix 247

(1) To test H 0 : q ¼ 0:

pﬃﬃﬃﬃﬃﬃ

We know when q ¼ 0; t ¼ rpﬃﬃﬃﬃﬃﬃﬃ

n2ﬃ

2

tn2

1r

H 1 : q [ 0; x0 : t [ ta;n2

H 2 : q\0; x0 : t [ ta;n2

H 3 : q 6¼ 0; x0 : jtj [ ta=2;n2

Note For testing q ¼ q0 ð6¼0Þ, exact test is difﬁcult to get as for q 6¼ 0 the

distribution of r is complicated in nature. But for moderately large n one can use the

large sample test which will be considered later.

(2) H 0 : lx ly ¼ n0

Deﬁne z ¼ x y ) lz ¼ lx ly i.e., we are to test H 0 : lz ¼ n0 . Also

pﬃﬃ P

nðzlz Þ

note that s0z tn1 where s02

z ¼ n1

1

zÞ2 ¼ s02

i ðzi

02 0

x þ sy 2sxy

P pﬃﬃ

nðzn0 Þ

s0xy ¼ n1

1 2 2

i ðxi xÞ ðyi yÞ . Under H 0 ; t ¼ s0 tn1 :

z

H 3 : lx ly 6¼ n0 ; x0 : jtj [ ta=2;n1

s0

Also, 100ð1 aÞ% conﬁdence interval for lz ¼ lx ly is z pzﬃﬃn ta=2;n1

(3) H 0 : llx ¼ g0 : we write g¼ llx .

y y

z ¼ x gy = a function of g.

s02 02 2 02 0

z ¼ sz þ g sy 2gsxy = a function of g.

pﬃﬃ pﬃﬃ

nðzlz Þ

Now, s0z tn1 : i:e:; sn0 z tn1 ð*lz ¼ 0Þ

pﬃﬃ z

z0

s02 02 02 0

z0 ¼ sx þ g0 sy 2g0 sxy

y

lx

H2 : \g0 ; x0 : t \ ta;n1

ly

l

H 3 : x 6¼ g0 ; x0 : jtj [ ta=2;n1 :

ly

248 Appendix

h pﬃﬃ i

Again P ta=2;n1 \ sn0 z \ta=2;n1 ¼ 1 a

z

hpﬃﬃ i

nz

i.e., P s0 \ta=2;n1 ¼ 1 a or P½wðgÞ\0 ¼ 1 a.

2

z

z2

Solving the equation wðgÞ ¼ n s02 ta=2;n1 ¼ 0 which is a quadratic equation

z

in g, one can get two roots g1 and g2 ð [ g1 Þ. Now if wðgÞ is a convex function and

g1 and g2 are real, then P½g1 \g\g2 ¼ 1 a. If wðgÞ is a concave function, then

P½g\g1 ; g [ g2 ¼ 1 a. But if g1 and g2 be imaginary then from the given sample

100ð1 aÞ%Conﬁdence interval does not exist.

(4) Test for the ratio n ¼ rrxy :

We write u ¼ x þ ny; v ¼ x ny

pﬃﬃﬃﬃﬃﬃ

Then, rp

uv ﬃﬃﬃﬃﬃﬃﬃﬃ

n2ﬃ

2

tn2

1ruv

P

1

ðui uÞðvi vÞ

where r uv ¼ n su sv = a function of n. We are to test H 0 : rrxy ¼ n0 , i.e.

H 0 : n ¼ n0 .

pﬃﬃﬃﬃﬃﬃ

r0uv n2

) under H0, t ¼ p ﬃﬃﬃﬃﬃﬃﬃﬃﬃ

02

tn 2

1ruv

where r 0uv ¼ value of r uv under n ¼ n0 .

For H 1 : n [ n0 ; x0 : t [ ta;n2

H 2 : n\n0 ; x0 : t\ta;n2

H 3 : n 6¼ n0 ; x0 : jtj [ ta=2;n2 :

pﬃﬃﬃﬃﬃﬃ

Also, P ta=2;n2 \ rp

uv ﬃﬃﬃﬃﬃﬃﬃﬃ

n2ﬃ

2

\t a=2;n2 ¼ 1 a

1ruv

r2 ðn2Þ

Solving the equation wðnÞ ¼ uv1r2 t2a=2;n2 ¼ 0, (which is a quadratic in n)

uv

one can get two roots n1 and n2 ð [ n1 Þ. If these roots are real and wðnÞ is a convex

function, then Pðn1 \n\n2 Þ ¼ 1 a. Again if wðnÞ is concave,

Pðn\n1 ; n [ n2 Þ ¼ 1 a. But if n1 and n2 are not real, then 100ð1 aÞ%

Conﬁdence interval does not exist so far as the given sample is concerned.

(5) rx ; ry ; q are known:

H 0 : lx ¼ l0x ; ly ¼ l0y against H1 : H0 is not true. We know that

" 2

#

1 x lx x lx y ly y ly 2

Qðx; yÞ ¼ 2q þ v22

1 q2 rx rx ry ry

!

r2x r2y

ðx; yÞ N 2 lx ; ly ; ; ; q

n n

" #

n x lx 2 x lx y ly y ly 2

) Qðx; yÞ ¼ 2q þ v22

1 q2 rx rx ry ry

Appendix 249

Under H 0 ,

2 ! ! 3

0 2

n x l 0 2

x l 0 y l 0

y l

v2 ¼ 4 x

2q x y

þ

y 5 v2

1 q2 rx rx ry ry 2

Distributions

Suppose there are k-populations Nðl1 ; r21 Þ; Nðl2 ; r22 Þ; . . .Nðlk ; r2k Þ. We draw a

random sample of size ni from the ith population with ni ( 2 for at least one i).

Deﬁne

xij ¼ jth observation of ith sample, i = 1,2,...,k; j = 1,2,..., ni

Pi

xi ¼ ith sample mean ¼ n1i nj¼1 xij

02

Pi 2

si ¼ ith sample variance ¼ ni 1 nj¼1

1

xij xi

(I) We are to test H 0 : l1 ¼ l2 ¼ ¼ lk ð¼lÞ, say against H 1 . There is at least

one inequality in H 0 .

Assumption r1 ¼ r2 ¼ ¼ rk ð¼rÞ say.

Note that xi N li ; rni

2

pﬃﬃﬃ

n ðx l Þ

) i ri i Nð0; 1Þ 8i and are independent.

ðn 1Þs02

Also, i r2 i v2ni 1 (xi and s0i are independent.)

Under H 0

X k

ni ðxi lÞ2

v2k

i¼1

r2

these two v2 are independent.

Xk

ðni 1Þs02

and i

v2nk

i¼1

r2

1X X

^¼

l nixi ¼ xðsayÞ; n ¼ ni

n i

Pk P

) Under H 0 ; ni ðxi xÞ2 r2 v2k1 and ki¼1 ðni 1Þs02

i r vnk .

2 2

P

i¼1

2

ni ðxi xÞ =

) Under H 0 ; F ¼ P k 1 F k1;nk .

02

i ðn i 1Þs i =n k

x0 : F [ F a;k1;nk . If H 0 is rejected, then we may be interested to test H 0 : li ¼

lj against H 1 : li 6¼ lj 8ði; jÞ.

250 Appendix

2 1 1

xi xj N li lj ; r þ

ni nj

xi xj ðli lj Þ

) qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ Nð0; 1Þ

r n1i þ n1j

P

ðni 1Þs02 ðxi xpj Þðli lj Þ

ﬃﬃﬃﬃﬃﬃﬃﬃ

Unknown r2 is estimated by r ^2 ¼ n k i ¼ s02 , say ) tnk .

s0 ni þ nj

1 1

ðp xi xj Þ

) under H 0 ; t ¼ 0 ﬃﬃﬃﬃﬃﬃﬃﬃ tnk :

ni þ nj

1 1

s

) x0 : jtj [ ta=2;nk . Also, 100ð1 aÞ% conﬁdence interval for li lj is

n qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ o

xi xj s0 n1i þ n1j ta=2;nk .

(II)Bartlett’s test To test H 0 : r1 ¼ r2 ¼ ¼ rk ð¼ rÞ, say against H 1 : There

is at least one inequality in H 0 .

P

Deﬁne ci ¼ ni 1 and c ¼ ki¼1 ci ¼ n k. Bartlett’s test statistic M is such that

( )

X

k

c s02 X

k

M ¼ c loge i i

ci loge s02

i¼1

c i¼1

i

P

samples M 0 ¼ n M o v2k1 under H 0 where c1 ¼ ki¼1 c1 1c and

c i

1 þ 3ðk1Þ

1

x0 : M 0 [ v2a;k1 :

Suppose the sample values of x and y are arranged in arrays of y according to the

ﬁxed values of x as given below:

x1 x2 . . . xi ... xk

y1 y21 . . . yi1 ... yk1

y12 y22 . . . yi2 yk2

: : ...

: :

: :

y1n1 y2n1 yini ... yknk

Appendix 251

Pni P

Deﬁne yi0 ¼ 1

ni j¼1 yij ; y00 ¼ 1n i niyi0 ¼ y

1X X

x¼ ni x i ; n ¼ ni

n i i

P

ni ðy y00 Þ2

e2yx ¼ P iP i0

i j ð yij y00 Þ2

qﬃﬃﬃﬃﬃﬃ

eyx ¼ þ e2yx ¼ sample correlation ratio:

P P

j ðyij y00 Þðxi xÞ

1

i

ﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

r ¼ rn n

on P oﬃ

P P

j ðyij

2 2

1

n i y00 Þ 1

n i ni ð x i x Þ

We assume yij xi N 1 ðli ; r2 Þ , i.e. Eðyij xi Þ ¼ li: .

(I) Test for regression: H 0 There does not exist any regression of y on x.

, H 0 : l1 ¼ l2 ¼ ¼ lk :

qﬃﬃﬃﬃﬃﬃ

Deﬁne g2yx ¼ VðEðy=xÞÞ

VðyÞ ; gyx ¼ þ g2yx ¼ population correlation ratio.

) To test H 0 is equivalent to test H 0 : g2yx ¼ 0 against H 1 : g2yx [ 0

We note that

XX XX X

ðyij y00 Þ2 ¼ ðyij yi0 Þ2 þ ni ðyi0 y00 Þ2

i j i j i

Under H 0

XX X

SSB ¼ e2yx ðyij y00 Þ2 ¼ ni ðyi0 y00 Þ2 r2 v2k1

i j i

XX XX

SSw ¼ 1 e2yx ðyij y00 Þ2 ¼ ðyij yi0 Þ2 r2 :v2nk

i j i j

h i

=ðk1Þ e2yx B =ðk1Þ

) Under H 0 :F ¼ ð1e2 Þ nk F k1;nk : F ¼ SS

yx =

SSW =nk

) x0 : F [ F a;k1;nk :

is linear, i.e. we are to test

H 0 : li ¼ a þ bxi 8i

H 1 : li 6¼ a þ bxi

252 Appendix

P P P

We note that, e2yx ðyij y00 Þ2 ¼ i ni ðyi0 y00 Þ2

nP P i j

o2

P P ðy

y Þ ðx i xÞ

^2 P ni ðxi xÞ2

Pj

ij 00

Also, r 2 i j ðyij y00 Þ2 ¼ ¼b

i

ni ðxi xÞ2

PP i

ðyij y00 Þðxi xÞ

^¼

where b i Pj

2

ni ðxi xÞ

P P

i

P

) e2yx r 2

ðyij y00 Þ2 ¼ ^ 2 P ni ðxi xÞ2 r2 :v2 under

ni ðyi0 y00 Þ2 b

i j i i k2

H0. P P

P P

Also, e2yx i j ðyij y00 Þ2 and e2yx r 2 i y00 Þ2 are independent.

j ðyij

ðe2yx r2 Þ=ðk2Þ

) under H 0 ; F ¼ ð1e2yx Þ=nk

F k2;nk

) x0 : F [ F a;k2;nk

Regression Equation

y=x Nða þ bx; r2 Þ

PY = a + bx, where a, b are the LS

ðy yÞðxi xÞ

estimates of α and β, i.e. a ¼ y bx and b ¼ Pi ðx xÞ2 ¼ Sxyxx :

S

2 i

2

r2 r2 r2 r2 X 2

VðaÞ ¼ VðyÞ þ x2 VðbÞ ¼ þ x2 ¼ Sxx þ nx2 ¼ xi

n Sxx nsxx nsxx

x2

i.e., a N a; r2 1n þ Sxx

a a0

H 01 : a ¼ a0 : under H 01 , t ¼ sﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ tn2

1 x2

^

r þ

n Sxx

P 2

^ ¼

where r 2

ðyi a bxi Þ =ðn 2Þ

Appendix 253

) H 11 : a [ a0 ; x0 : t [ ta;n2

H 21 : a\a0 ; x0 : t\ta;n2

H 31 : a 6¼ a0 ; x0 : jtj [ ta=2;n2 :

qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

Also, 100ð1 aÞ% conﬁdence interval for a is a r ^ 1n þ Sxxx ta=2 ; n 2 :

2

pﬃﬃﬃﬃﬃ

H 02 : b ¼ b0 : under H 02 ; t ¼ ðbb0r^Þ Sxx tn2

) H 12 : b [ b0 ; x0 : t [ ta;n2

H 22 : b\b0 ; x0 : t\ta;n2

H 32 : b 6¼ b0 ; x0 : jtj [ ta=2;n2 :

^

r

b pﬃﬃﬃﬃﬃﬃ ta=2;n2

Sxx

r2

¼ xVðbÞ ¼ x :

Sxx

! ( ! P 2 !)

r 2

a a xi

rSxxx

2

) N2 ; nSxx

b b

rSxxx r 2 2

Sxx

! ( ! P 2 )

a a xi x n

; nSr xx

2

i.e., N2

b b x n n

P 2 X

r 2

xi x n

Let ¼

nSxx x n n

!0 !

aa X1 aa

) v22 :

bb bb

n Pnx

P r2

P1 adj nSxx

nx x2

Now, ¼ j P j ¼ r2 2 P 2 i2 2

ðnSxx Þ ðn xi n x Þ

254 Appendix

nSxx n nx 1 n nx

¼ 2 P 2

¼ 2 P 2

r nSxx nx xir nx xi

!0 ! !0 !

a a X1 a a 1 aa n nx aa

) ¼ 2 P 2

bb bb r bb nx xi bb

h X i

) nða aÞ2 þ 2nxða aÞðb bÞ þ ðb bÞ2 x2i r2 v22

P

Again, n1 ðyi a bxi Þ2 r2 v2n2

) under H 03 ,

n P o.

nðaa0 Þ2 þ 2nxða a0 Þðb b0 Þ þ ðb b0 Þ2 x2i 2

F¼ P . F 2;n2

ðyi a bxi Þ2 ðn 2Þ

) w0 : F [ F a;2;n2 :

Correlation Coefﬁcient

P

Suppose x px1 N p l ; pxp

q1:23...p = population multiple correlation coefﬁcient of X 1 on X 2 ; X 3 ; . . .; X p

r 1:23...p = sample multiple correlation coefﬁcient of X 1 on X 2 ; X 3 ; . . .; X p based on

a sample of size n ð p þ 1Þ 0 1

1 r 12 r 13 . . . r 1p

1=2 B 1 r 23 . . . r 1p C

B C

¼ 1 RjRj11 where R ¼ BB ... ...C C and R11 = cofactor of r 11

@ ... ...A

1

in R.

r2 =ðp1Þ

If q1:23...p ¼ 0 then F ¼ 1:23...p F p1;np .

ð1r1:23...p Þ ðnpÞ

2

To test

H 0 : q1:23...p ¼ 0 against H 1 : q1:23...p [ 0

)w0 : F [ F a;p1;np :

effect of X 3 ; . . .; X p .

Appendix 255

r 12:34 ...p = sample partial correlation coefﬁcient of X 1 andX 2 eliminating the effect

of X 3 ; . . .; X p

¼ pﬃﬃﬃﬃﬃﬃﬃﬃﬃ

R12 ﬃ

R R

. If q12:34 ...p then

11 22

pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

r 12:34...p n p

t ¼ qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ tnp

1 r 212:34...p

H 2 : q12:34 ...p \0; x0 : t\ ta;np

H 3 : q12:34 ...p 6¼ 0; x0 : jtj [ ta=2;np :

We consider a set of variables y; x1 ; x2 ; . . .; xp , where y is stochastic and

x1 ; x2 ; . . .; xp are nonstochastic. Let the multiple regression of y on x1 ; x2 ; . . .; xp be

E y x1 ; x2 ; . . .; xp ¼ b0 þ b1 x1 þ b2 x2 :: þ bp xp ðA:1Þ

bi = partial regression coefﬁcient of y on xi eliminating the effects of

xj ; j 6¼ i ¼ 1; 2;. . .p:

Deﬁne riy ¼ Cov(xi ; yÞ;rij ¼ Cov(xi ; xj Þ; ryy ¼ vðyÞ; qiy ¼ correlation of

ðxi ; yÞ; qij ¼ correlation of ðxi ; xj Þ; i ¼ 1; 2; ::p and j ¼ 1ð1Þp

We write r px1 ¼ ðr1y ; r2y ; . . .; rpy ; Þ0

0 ð1Þ 1

r11 r12 . . . r1p

Ppxp B r21 r22 . . . r2p C

¼B C

@ . . . . . . . . . . . . A = variance-covariance matrix of x1 ; x2 ; ::; xp

rP1 rP2 . . . rpp

We write

0 1

ryy ry1 ry2 ryp

B

Xp þ 1xp þ 1 B r1y r11 r12 r1p C

C

0

¼B

B r2y r21 r22 r2p C

C

@... ... ... ... A

0 1 rpy rp1 rp2 rpp

ryy r0

ð1Þ

¼@ P A = variance–covariance matrix of y, x1 ; x2 ; . . .; xp .

r pxp

ð1Þ

256 Appendix

Similarly, we0write 1

qyy qy1 qy2 qyp

B q1y q11 q12 q1p C

B C

q0p þ 1xp þ 1 ¼ B q

B 2y q21 q22 q2p C

C = correlation matrix of y,

@... ... ... ... A

qpy qp1 qp2 qpp

x1 ; x2 ; . . .; xp .

P P

Now, 0 ¼ product of the diagonal element of 0 jq0 j

P

j j P

)jq0 j ¼ ðryy r11 r220...rpp Þ. Also, j j ¼ ðr P11 r22 . . .rpp Þx Cofactor of qyy in q0 .

j j

) Cofactor of qyy in q0 ¼ r11 r22 ...rpp

jq 0 j

) q2y:12...p ¼ 1

Cofactor of qyy in q0

P P

ðryy r11 r22 . . .rpp Þ

¼1 P 0

¼1 P

0

j j ðr11 r22 . . .rpp Þ ryy j j

P1 P

ryy r 0 r 0

r0ð1Þ 1 rð1Þ rð1Þ b X1

ð1Þ ð1Þ

¼1 ¼ ¼ as b ¼ r :

ryy ryy ryy ð1Þ

P

r0 b b0 b
X1 X X

ð1Þ

) q2y:12...p ¼ ¼

; b¼ r ) b ¼ r ) r0 b ¼ b0 b

ryy ryy ð1Þ ð1Þ ð1Þ

ya ; x1a ; x2a ; . . .; xpa ; a ¼ 1ð1Þn; n [ p þ 1:

Pn Pn

Deﬁne xi ¼ 1n a¼1 xia ; Sij ¼ a¼1 ðxia xi Þ xja xj

Xn

Siy ¼ a¼1

ðxia xi Þ ya yj : 8i; j ¼ 1ð1Þp

0 1

S11 S12 ... S1p

B S21 S22 ... S2p C

Spxp ¼B

@...

C which is positive deﬁnite.

... ... ... A

Sp1 Sp2 ... Spp

^ þb

Estimated regression equation of y on x1 ; x2 ; . . .; xp is y ¼ b ^ x1 þ

0 1

^ ^

b2 x2 þ þ bp xp

Appendix 257

^ ;b

where b ^ ^ ^

0 1 ; b2 ; . . .; bp are the solutions of the following normal equations:

9

S1y ¼b ^ S11 ^ S12

b þ þ ^ S1p >

b >

1 2 p >

S2y ¼b ^ S21 ^ S22

b þ þ ^ S2p =

b

1 2 p ðA:2Þ

... ... ... >

>

Spy ¼b ^ Sp1 ^ Sp2

b þ þ ^ Spp >

b ;

1 2 p

^ ¼yb

and b ^ x1 b

^ x2 b ^ xp

0 1 2 p

We write y nx1

¼ ðy1 ; y2 ; . . .; yn Þ0

0 1

x11 x1 x21 x2 . . . xp1 xp

B x12 x1 x22 x2 . . . xp2 xp C

K nxp ¼B

@ ...

C

... ... ... A

x1n x1 x2n x2 . . . xpn xp

0

^ px1 ¼ b

b ^ ;b

^ ^

1 2 ; ::; bp

Pn

Note that Siy ¼ a¼1 ðxia xi Þya ; (A.2) reduces to

^ ¼ K0 y ) b

Sb ^ ¼ S1 K 0 y

^ ;b

)b ^ ^

1 2 ; . . .; bp are linear functions of y1 ; y2 ; . . .; yn which are normal.

^ ^ ;D b

) b Np E b ^

^ ¼ S1 K 0 y ¼ S1 K 0 y y 2

Now, b

0

where 2 ¼ ð1; 1; . . .; 1Þ and K 0 2 ¼ 0

^ 1 0

) E b ¼ S K E y y 2

EðyÞ ¼ b0 þ b1x1 þ þ bpxp

Eðya yÞ¼b1 ðx1a x1 Þ þ þ bp ðxpa xp Þ

0 1

y1 y

B y y C

B 2 C

E y y 2 ¼ E B C ¼ Kb

@ : A

yn y

258 Appendix

)E b ^ ¼ S1 K 0 K b

^ ¼ S1 S b ¼ b ½* K 0 K ¼ S

^

D b ¼ S K r I n KS ¼ r S K KS1 ¼ r2 S1 SS1 ¼ r2 S1

1 0 2 1 2 1 0

^ ¼ S1 K 0 y Np b ; r2 S1

)b

ij

We write S1 ¼ S ^ Þ ¼ r2 Sii

) Vðb i

and

Cov b ^

^ ;b

i j ¼r S 8i; j ¼ 1ð1Þp

2 ij

^ N 1 b ; r2 Sii i ¼ 1ð1Þp

)bi i

^ ¼y b

Again, b ^ x1 b

^ x2 b

^ xp

0 1 2 p

!

X

p X

p X

p

^ ¼ EðyÞ

)E b E ^ xi ¼

b b0 þ bixi bixi ¼ b0

0 i

i¼1 i¼1 i¼1

X

^ ¼V y

V b ^ xi

b

0 i

^ ; x 0 ¼ x1 ; x2 ; . . .; xp

¼ V y x0b

¼ rn

2

0 ^ x (as y and b

þ x D b ^ are independent)

r2 1

¼ þ x 0 r2 S1 x ¼ r2 þ x 0 S1 x :

n n

Thus b0

^ N 1 b ; r2 1 þ x 0 S1 x

)b 0 0

n

^ þ Pb

p

Again Y ¼ b ^ xi ; ) Y N 1 ðEðYÞ; VðYÞÞ

0 i

1

P

^ þ

where EðYÞ ¼ E b ^ xi ¼ b þ P b xi ¼ nx ; (say)

E b

0 i 0 i

Appendix 259

! " #

X

p X X

VðYÞ ¼ V b^0 þ ^ xi

b i ¼ V y ^ xi þ

b i

^ xi

b i

i¼1 i i

" # !

X X

¼ V y þ ^

bi ðxi xi Þ ¼ VðyÞ þ V ^

bi ðxi xi Þ

i i

r2 X X

p p

¼ þ ðxi xi Þ xj xj Cov b ^

^ ;b

i j

n 1 1

0

r2 X X 2 ij

p p

2 1 1

¼ þ ðxi xi Þ xj xj r S ¼ r þ x x S ð x xÞ

n n

1 1

X 1 0

) Y N1 b0 þ bi xi ¼ nx ; r2 þ x x S1 ð x xÞ

n

X p 2

1 ^ b ^ x1a þ . . . þ b ^ xpa

^2 ¼

r ya b

n p 1 a¼1 0 1 p

( )2

1 Xn X

¼ ðya yÞ ^ ðxia xi Þ

b

n p 1 a¼1 i

i

" #

1 X n XX

¼ Syy ^ ^

bi bj ðxia xi Þðxia xjÞ

np1 a¼1 i j

" #

1 XX 1 0

¼ Syy ^ ^

bi bj Sij ¼ ^

Syy b S b^

np1 i j

np1

0

P

b b

(Note that q2y:12:::p ¼ ryy )

(1) H 01 : b1 ¼ b2 ¼ ¼ bp ¼ 0

) x1 ; x2 ; . . .; xp are not worthwhile in predicting y.

^0 P b

b ^

* q2y:12...p ¼ ) b1 ¼ b2 ¼ ¼ bp ¼ 0 ) q2y:12...p ¼ 0

ryy

So the problem

is totest H 01 : q2y:12...p ¼ 0 against H 1 : q2y:12...p [ 0

Now Syy 1 r 2y:12...p ¼ Syy b ^0 P b ^

n

X

¼ ^ b

ya b ^ x1a b

^ xpa r2 v2

0 1 p np1

a¼1

260 Appendix

^ 0S b

Also, Syy r 2y:12...p ¼ b ^ 0S b

^ ¼ Syy Syy b ^

) Syy ¼ ^ 0S b

Syy b ^ þ Syy r 2y:12...p

.

r 2y:12...p p

) F1 ¼ . F p;np1

1 r 2y:12...p ðn 1 pÞ

) x0 : F 1 [ F a;p;np1

(2) H 0 : b0 ¼ b against H 1 : b0 6¼ b

^ b

b

Under H 0 ; t ¼ rﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

0

tnp1

1 0 1

^

r þ x S x

n

^2 ¼ np1

where r 1 ^ 0S b

Syy b ^ ¼ S02

y:12...p , say

X

^ ¼y

and b ^ xi ¼ y x 0 b

b ^

0 i

x0 : jtj [ ta=2;np1

^ N 1 b ; r2 Sii

b i i

^ b0

b ipﬃﬃﬃiﬃ

Under H 0 ; t ¼ tnp1 :

r

^ Sii

x0 : jtj [ ta=2;np1

pﬃﬃﬃﬃﬃ

b^ r

^ S ii

t a=2;np1

i

^ b

b ^ N 1 b b ; r2 Sii þ Sjj 2Sij

i j i j

) under H 0 ; t ¼ pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ ﬃ 0 tnp1

r

^

ii jj

S þ S 2Sij

Appendix 261

) x0 : jtj [ ta=2;np1

100ð1 aÞ% conﬁdence interval for bi bj is

pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

b^ b

^ r ^ Sii þ Sjj 2Sij ta=2;np1

i j

(5) H 0 : EðYÞ ¼ nx ¼ n0

Yn0

Under H 0 ; t ¼ rﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

0 ﬃ tn p 1

r

^ nþ

1

x x S1 x x

x0 : jtj [ ta=2;np1 :

rﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ !

1 0

1

^

Y r þ x x S x x ta=2;n p 1

n

^ þ Pp b

where Y ¼ b ^

0 i¼1 i xi

of the Multivariate Normal Distribution

Ppxp P

Let x px1

Np l px1

; ; is positive deﬁnite.

The p.d.f. of x is

0 P

1

1 12 xl xl 1\xi \1

f x ¼ p ﬃﬃﬃﬃﬃﬃﬃﬃ

P e ; and 0\ri \1

ð2pÞp=2 j j 1\li \1

X1

Qð x Þ ¼ ð x l Þ0 ð x lÞ

P

Since is positive deﬁnite, there exists a nonsingular matrix V pxp such that

P1

¼ VV 0 :

) Qð x Þ ¼ ð x l Þ0 VV 0 ð x l Þ ¼ y 0 y where y ¼ V 0 ð x l Þ

262 Appendix

X

p

¼ y2i :

i¼1

rﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

X

@ x1 ; x2 ; . . .; xp 1 1

jJ j ¼ ¼ ¼ q ﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ ¼

@ ðy1 ; y2 ; . . .; yn Þ jV j P1

12 y 0 y pﬃﬃﬃﬃﬃﬃﬃﬃ

P

) p.d.f. of y is f ð y Þ ¼ p

1 ﬃﬃﬃﬃﬃﬃﬃﬃ

P e j j

ð2pÞ p=2

j j

P

P

1 12 y21

¼ e 1 :

ð2pÞp=2

X

p

) y2i v2p ; i.e., Qð x Þ v2p

1

P P

If we now want to ﬁnd the distribution of Q

ð x Þ ¼ x 0 1 x, since is

P

positive deﬁnite, there exists a non-singular matrix V pxp such that 1 ¼ VV 0 .

) Q

ð x Þ ¼ x 0 VV 0 x ¼ z 0 z where Z ¼ V 0 x .

P

pﬃﬃﬃﬃﬃﬃﬃﬃ

Here also jJ j ¼ j j.

X1

)ð x l Þ0 ð x l Þ¼ ð x l Þ0 VV 0 ð x l Þ

0

¼ V 0 x V 0 l I p V 0 x V 0 l

0

¼ z V 0 l Ip z V 0 l

0

1 12 z V 0 l I p z V 0 l

) f ðzÞ ¼ e

ð2pÞp=2

) z1 ; z2 ; . . .; zp are normal with common variance unity but with means given by

E z ¼ V0 l.

0

Pp 2

) 1 zi non-central v2p with non-centrality parameter V 0 l V0 l ¼

P

l 0 VV 0 l ¼ l 0 1 l.

Appendix 263

Chi-Square

Events A1 A2 …. Ai …… Ak Total

Probability P1 P2 …. Pi …… Pk 1

Frequency n1 n2 …. ni …… nk n

n! Y k

) f ðn1 ; n2 ; . . .; nk Þ ¼ Q pni i

n !

i i i¼1

ni Binðn; pi Þ

P 2

Pearsonian chi-square statistic is v2 ¼ ki¼1 ðni npnpi Þ

i

Using Stirling’s approximation to factorials

pﬃﬃﬃﬃﬃﬃ n n þ 1

2pe n 2 Y k

f ðn1 ; n2 ; . . .; nk Þ ’ Q pﬃﬃﬃﬃﬃﬃ ni þ 12

pni i

k n

1 2pe ni

i 1

n þ 12

Qk n i

n 1 pi

¼ k1 Q n þ1

k

ð2pÞ 2 1 ni i 2

pﬃﬃﬃ Y k 1

n npi ni þ 2

¼ k1 Q pﬃﬃﬃﬃﬃﬃ

ð2pÞ 2 npi 1 ni

Xk

1 npi

) loge f ðn1 ; n2 ; . . .; nk Þ ’ C þ ni þ logc ðA:3Þ

i¼1

2 ni

pﬃﬃ

where C ¼ loge n

k1 Qk pﬃﬃﬃﬃﬃ

ð2pÞ 2

1

npi

ni npi

We write di ¼ p ﬃﬃﬃﬃﬃﬃﬃ

np q ; qi ¼ 1 pi

i i

264 Appendix

rﬃﬃﬃﬃﬃﬃ

pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ ni qi

) ni ¼ npi þ di npi qi ) ¼ 1 þ di

npi npi

rﬃﬃﬃﬃﬃﬃ1

np qi

) i ¼ 1 þ di

ni npi

X k rﬃﬃﬃﬃﬃﬃ

1 pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ qi

) loge f ðn1 ; n2 ; . . .; nk Þ¼ C npi þ þ di npi qi logc 1 þ di

1

2 npi

0 1

Xk rﬃﬃﬃﬃﬃﬃ 3=2

1 pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ qi d2 q d3 q

¼C npi þ þ di npi qi @di i i þ i i A;

2 np i 2np i 3ðnpi Þ 3=2

1

qﬃﬃﬃﬃﬃ

qi

Provided di np \1

i

k

X rﬃﬃﬃﬃﬃﬃ

pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ 1 qi 1 1 q d3

¼C di npi qi þ di d2i qi d2i i þ d2i qi þ piﬃﬃﬃ ð Þ þ

1

2 npi 2 4 npi n

k

X rﬃﬃﬃﬃﬃﬃ

pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ 1 qi 1 1 q d3

¼C di npi qi þ di þ d2i qi d2i i þ piﬃﬃﬃ ð Þ ðA:4Þ

1

2 npi 2 4 npi n

P pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ Pk

Note that di npi qi ¼ 1 ðni npi Þ ¼ n n ¼ 0;

pﬃﬃﬃ d3 d2

we assume that d3i ¼ 0ð nÞ i:e:; piﬃﬃn ! 0; pdiﬃﬃn ! 0 ) ni ! 0

) All the terms in the R.H.S of (A.4) tends to zero except 12 d2i qi , thus (A.4)

implies

1X k Pk 2

d2i qi ) f ’ eC e2 1 di qi

1

loge f ’ C

2 1

pﬃﬃﬃ Pk ðni npi Þ2

n 12

)f ’ k1 Q pﬃﬃﬃﬃﬃﬃ e

i¼1 npi

ð2pÞ 2 k1 npi

1 Pk ðni npi Þ2 1

2 1

¼ pﬃﬃﬃﬃﬃ e

k1

i¼1 npi

: Qk1 pﬃﬃﬃﬃﬃﬃ ðA:5Þ

ð2pÞ pk

2

1 npi

P

We note that k1 ðni npi Þ ¼ 0

P

i.e., nk npk ¼ 1k1 ðni npi Þ

X

k

ðni np Þ2 X

k1

ðni np Þ2 ðnk npk Þ2

) i

¼ i

þ

i¼1

npi i¼1

npi npk

Appendix 265

nP o2

k1

X

k1

ðni np Þ2 1 ðni npi Þ

¼ i

þ ðA:6Þ

1

npi npk

where xi ¼ npi np

ﬃﬃﬃﬃ

ﬃi

np ; i ¼ 1ðlÞk 1:

i

@ ðn1 ; n2 ; . . .; nk1 Þ pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

jJ j ¼ ¼ Diag pﬃﬃﬃﬃﬃﬃﬃ

np1 . . . npk1

@ ðx ; x ; . . .; x Þ

1 2 k1

Y

k 1

pﬃﬃﬃﬃﬃﬃ

¼ npi

Pk1 pﬃﬃﬃﬃﬃ 2

Pk ðni npi Þ2 Pk1 npi xi

(A.6) ) 1 npi ¼ 1 x2i þ 1

npk

" #

X

k1

1 X k1 k1 X

X pﬃﬃﬃﬃﬃﬃﬃﬃ

¼ þ x2i px þ

2

pi pj x i x j

1

pk 1 i i i6¼j¼1

k1

X Xk1 X pﬃﬃﬃﬃﬃﬃﬃ

ﬃ

pi pj

p

¼ 1 þ i x2i þ xi xj

1

pk i6¼j¼1

pk

¼ x 0A x

where

0 pﬃﬃﬃﬃﬃﬃﬃ pﬃﬃﬃﬃﬃﬃﬃ pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ 1

p1 p1 p2 p1 p3 p1 pk1

1þ pk ﬃ . . .

B

pk pk pﬃﬃﬃﬃﬃﬃ pﬃﬃﬃﬃﬃﬃﬃﬃﬃ

pk ﬃ

C

B 1 þ pp2

p2 p3 p2 pk1

C

Ak1xk1 ¼ B pk . . . pk C

@ A

k

... ...

1 þ ppk1

k

0 1

1 þ a21 a1 a2 a1 a3 . . . a1 ak1

B a1 a2 1 þ a22 a2 a3 . . . a2 ak1 C qﬃﬃﬃﬃ

¼B

@

C where ai ¼ pi 8i ¼ 1ð1Þk 1

... ... A pk

Now

a1 þ a1 a2 a3 . . . ak1

1

a1 a2 þ a12 a3 . . . ak1 Rows

jAj ¼ ða1 a2 ::ak1 Þ ;

. . .: ... ... ... ai

a1 a2 a3 1

ak1 þ ak1

266 Appendix

1þ 1

1 1. . . 1

a21

1 1þ 1

1. . . 1

2

a22

¼ ða1 a2 ak1 Þ 1 1þ 1

1 1

... ... ...

a23

...

1 1þ

1 1 1

a2k1

2

a1 þ 1 a21 ... a21

2

a a2 þ 1

2

... a22

¼ 2 (ith row X a2 )

. 2. . ... ... i

a a2k1 ... a2k1 þ1

k1

Pk1 P Pk1

1þ a2i 1 þ 1k1 a2i ... 1þ a2i

1 1

¼

a22 1 þ a22 ... a22 (1st row R1 ¼ P Ri =

... ... ...

a2k1 a2k1 ... 1 þ a2k1

sum of all rows)

1 1 ... 1

2

Xk1 a a22 þ 1 ... a22

¼ 1þ a2i ¼ 2 ;

1

. 2. . ... ...

a a2k1 ... 1 þ a2k1

k1

1 1 ... 1

Pk1 2 0 1 ... 0

¼ 1 þ 1 ai ¼ ;

. . .: . . .: ... . . .

0 0 ... 1

P

k

X

k1 X

k1 pi

pi 1

¼ 1þ a2i ¼ 1 þ ¼ 1 ¼

1

pk pk pk

12 x 0 A x

) (A.5) ) f ðx1 ; x1 ; . . .; xk1 Þ ¼ 1

k1 p ﬃﬃﬃﬃ e Qk11 pﬃﬃﬃﬃﬃ jJ j

ð2pÞ 2 pk npi

pﬃﬃﬃﬃﬃﬃ

1

jAj 12 x 0 A x

¼ k1 e

ð2pÞ 2

P1

12 x 0 x P

¼ k1 P 1=2 e

1 where A1 ¼ :

ð2pÞ 2 j j

Appendix 267

P

Since 1 is positive deﬁnite therefore there exists a non-singular V such that

P1

¼ VV 0 .

P

) x 0 1 x ¼ x 0 vv0 x ¼ y 0 y where y ¼ v0 x .

Using transformation ðx1 ; x2 ; . . .; xk1 Þ ! ðy1 ; y2 ; . . .; yk1 Þ

X1=2

1

jJ j ¼ ¼

jV j

12 y 0 y

) f ðy1 ; y2 ; . . .; yk1 Þ ¼ 1

k1 e ) y1 ; y2 ; . . .; yk1 are i.i.d. N(0, 1)

ð2pÞ 2

) y 0 y v2k1

X1 X

k

ðni np Þ2

) x0 x v2k1 ) i

v2k1

npi

i¼1

qﬃﬃﬃﬃﬃ

qi

Note This approximate distribution is valid if di np \1

i

q , i.e. if Maxdi \ q

i 2

i i

n np

Again di ¼ pi ﬃﬃﬃﬃﬃﬃﬃi , using normal approximation the effective range of di is (−3, 3)

npi qi

)d2i 9

i.e.,

Maxd2i ¼ 9

i

i.e. if npi [ 9. So the approximation is valid if the expected frequency for each

event is at least 10.

Again, if we consider the effective range of di as (−2, 2), then the approximation

is valid if the expected frequency for each event is at least 5.

It has been found by enquiry that if the expected frequencies are greater than 5

then the approximation is good enough.

If the expected frequencies of some classes be not at least 5 then some of the

adjacent classes are pooled such that the expected frequencies of all classes after

coalition are at least 5. If k

be the no. of classes after coalition, then

Pk

ðn

i np

i Þ2

i¼1 np

v2k

1 ;

i

where n

i ¼ observed frequency after coalition,

np

i ¼expected frequency after coalition.

Uses of Pearsonian-v2 :

(1) Test for goodness of ﬁt:

268 Appendix

A1 p1 n1

A2 p2 n2

. . .

. . .

. . .

Ai pi ni

: : :

Ak pk nk

Total 1 n

Under H 0 ; expected frequencies are np0i : We assume np0i 58i.

Pk ðni np0i Þ2

) Under H 0 ; i¼1 np0i

v2k1

P n2

i.e. ki¼1 npi 0 n v2k1 :

Pi 2

i.e. v2 ¼ ki¼1 OE n v2k1

where O = observed frequency (ni )

E = Expected frequency (np0i )

) x0 : v2 [ v2a;k1

ters h s1 ¼ ðh1 ; h2 ; . . .; hs Þ0 and suppose ^h be an efﬁcient estimator of h . Then

n o2

Pk ni npi ð h^Þ

i¼1

v2ðk1sÞ :

npi ^

h

Classes Population

P1 P2 ………. Pj ………. Pl

A1 p11 p12 ………. p1j ………. p1l

A2 p21 p22 ………. p2j ………. p2l

. . . ………. . ………. .

. . . ………. . ………. .

Ai pi1 pi2 pij pij

. . . . .

. . . . .

. .

Ak pk1 pk2 pkj pkl

Total 1 1 ………. 1 ………. 1

Appendix 269

where pij ¼ the probability that an individual selected from jth population will

belong to ith class.

We are to test H 0 : pi1 ¼ pi2 ¼ ¼ pil ð¼ pi sayÞ 8i ¼ 1ð1Þk. To do this we

draw a sample of size n and classify as shown below:

P1 P2 …. Pj …. Pl

A1 n11 n12 n1j n1l n10

A2 n21 n22 n2j n2l n20

. . . . . . . .

. . . . . . . .

Ai ni1 ni2 nij nil ni0

. . . . . . . .

. . . . . . . .

Ak nkl nk2 …. nkj …. nkl nk0

Total n01 n01 …. n0j …. n0l n

2

Xk

nij n0j pij

vð2k1Þ 8j ¼ 1ð1Þ1

i¼1

n0j pij

2

Xl X k

nij n0j pij

) v21ðk1Þ

j¼1 i¼1

n 0j p ij

Pl Pk fnij n0j pi g2

) Under H 0 ; v2 ¼ j¼1 i¼1 n0j pi v21ðk1Þ

pi ’s are unknown and they are estimated by p^i ¼ nni0 8i ¼ 1ð1Þk:

P P fnij n0j ni0 g

) under H 0 ; v2 ¼ 1j¼1 ki¼1 n ni0n v21ðk1Þðk1Þ ¼ v2ðk1Þðl1Þ

0j n

P

p1 ; p2 ; . . .; pk as k1 pi ¼ 1:

) x0 : v2 [ v2a;ðk1Þð11Þ

270 Appendix

A B Total

B1 B2 ………. Bj ………. Bl

A1 p11 p12 ………. p1j ………. p1l p10

A2 p21 p22 ………. p2j ………. p2l p20

. . . ………. . ………. . .

. . . ………. . ………. . .

Ai pi1 pi2 pij pi1 pi0

. . . . . .

. . . . . .

Ak pk1 pk2 pkj pkl pk0

Total p01 p02 ………. p0j ………. p0l 1

We are to test

H 0 : A and B are independent, i.e. to test

classiﬁed as shown below:

A B Total

B1 B2 ………. Bj ………. Bl

A1 n11 n12 ………. n1j ………. n1l n10

A2 n21 n22 ………. n2j ………. n2l n20

. . . ………. . ………. . .

. . . ………. . ………. . .

Ai ni1 ni2 nij nil ni0

. . . . . .

. . . . . .

Ak nk1 nk2 nkj nkl nk0

Total n01 n02 ………. n0j ………. n0l n

n! Y Y nij

Pðn11 ; . . .; nkl Þ ¼ Q Q pij

i j nij ! i j

E nij ¼ npij ;

Appendix