
On the Existence of Maximum Likelihood Estimators for the Binomial Response Models

Author(s): Mervyn J. Silvapulle


Source: Journal of the Royal Statistical Society. Series B (Methodological), 1981, Vol. 43, No. 3 (1981), pp. 310-313
Published by: Wiley for the Royal Statistical Society

Stable URL: https://www.jstor.org/stable/2984941

J. R. Statist. Soc. B (1981), 43, No. 3, pp. 310-313

On the Existence of Maximum Likelihood Estimators for the Binomial Response Models

By MERVYN J. SILVAPULLE
Australian National University, Canberra

[Received May 1980. Revised November 1980]

SUMMARY
Necessary and sufficient conditions are given for the existence of maximum likelihood
estimators of the linear regression parameter in binomial response (this includes Logit and
Probit) models.

Keywords: BINOMIAL RESPONSE; MAXIMUM LIKELIHOOD ESTIMATOR; EXISTENCE

1. INTRODUCTION
THE question of existence of maximum likelihood estimators (mle) for Logit models arose in an
analysis of the relationship of psychiatric "caseness" to scores on a psychiatric screening
questionnaire. Tennant (1977) administered the GHQ (General Health Questionnaire,
Goldberg, 1972) to 120 patients attending a General Practitioner's surgery, and also gave each
one a standardized psychiatric interview. From the interview, patients were classified as
Psychiatric Case/Non-case. In a secondary analysis of Tennant's data, Duncan-Jones and
Henderson (1978) fitted a Logit regression of "caseness" on GHQ Score, and obtained a good fit
with the model
$$\operatorname{Logit}\{\operatorname{Prob}(\text{Case})\} = \beta_1 + \beta_2 x, \tag{1}$$
where $x$ = GHQ Score and $\operatorname{Logit}(t) = \log\{t/(1 - t)\}$, for the full set of data. In a more detailed
analysis, Duncan-Jones encountered problems in attempting to fit a separate Logit regression
for males (Table 1), though the data for females gave a satisfactory fit. For illustration, we have
used the 12-item version of the GHQ Score. The same problem arose in these data when using
the 30-item version.

TABLE 1
Number of patients classified by the GHQ score and the outcome of a standardized psychiatric
interview (Case/Non-case)

GHQ score            0   1   2   3   4   5   6   7   8   9  10  11  12  Total

Males    Cases       0   0   1   0   1   3   0   2   0   0   1   0   0      8
         Non-cases  18   8   1   0   0   0   0   0   0   0   0   0   0     27
Females  Cases       2   2   4   3   2   3   1   1   3   1   0   0   0     22
         Non-cases  42  14   5   1   1   0   0   0   0   0   0   0   0     63

It is obvious that there is an indeterminacy if there is no overlap between values of $x$ for
which $y_i = 0$ (Non-case) and values of $x$ for which $y_i = 1$ (Case). In this paper we generalize this
idea and also show that a certain degree of overlap is a necessary and sufficient condition for the
existence of the mle.
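The naive "no overlap" check can be sketched as follows for the male data of Table 1. This is only an illustration: the helper names `score_range` and `ranges_disjoint` are hypothetical and not part of the paper. It shows that the two score ranges are not disjoint (they share the score 2), so a stronger notion of overlap is needed to explain the fitting problem.

```python
# Male counts from Table 1, indexed by GHQ score 0..12.
male_cases = [0, 0, 1, 0, 1, 3, 0, 2, 0, 0, 1, 0, 0]
male_noncases = [18, 8, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


def score_range(counts):
    """Smallest and largest GHQ score observed at least once (hypothetical helper)."""
    scores = [score for score, n in enumerate(counts) if n > 0]
    return min(scores), max(scores)


def ranges_disjoint(cases, noncases):
    """True when the two observed score ranges share no score at all."""
    c_lo, c_hi = score_range(cases)
    n_lo, n_hi = score_range(noncases)
    return c_hi < n_lo or n_hi < c_lo


print("cases range:    ", score_range(male_cases))
print("non-cases range:", score_range(male_noncases))
print("disjoint:       ", ranges_disjoint(male_cases, male_noncases))
```

The ranges touch only at the single score 2, which is the borderline situation the main theorem is designed to handle.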

2. THE MAIN RESULTS


Let $(y_1, x_1), \ldots, (y_n, x_n)$ be a set of $n$ observations, where $y_i$ is binary (i.e. $y_i = 0$ or 1) and $x_i$ is a
$p$-dimensional vector $(x_{i1}, \ldots, x_{ip})$. The Binomial Response Model that we are interested in is
$$\operatorname{Prob}(y_i = 1) = G(x_i \beta), \tag{2}$$


where $G$ is a distribution function, $\beta = (\beta_1, \ldots, \beta_p)$ and $x_i \beta$ is the usual inner product
$x_{i1}\beta_1 + \cdots + x_{ip}\beta_p$. For convenience, we shall assume that $y_i = 1$, $i = 1, \ldots, r$ and $y_i = 0$,
$i = r + 1, \ldots, n$ for some $0 \le r \le n$. Writing $l(\beta)$ for $-\log(\text{likelihood})$, we have
$$l(\beta) = -\sum_{i=1}^{r} \log G(x_i \beta) - \sum_{i=r+1}^{n} \log\{1 - G(x_i \beta)\}. \tag{3}$$
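As a minimal sketch, (3) can be evaluated numerically as below. The Logistic choice of $G$ is only a default for illustration, and the function names `logistic_cdf` and `neg_log_likelihood` are assumptions introduced here, not notation from the paper.

```python
import math


def logistic_cdf(t):
    """Logistic distribution function G(t) = 1 / (1 + exp(-t)), computed stably."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def neg_log_likelihood(beta, xs, ys, G=logistic_cdf):
    """Evaluate l(beta) of (3): minus the log likelihood of the binary responses."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = sum(xj * bj for xj, bj in zip(x, beta))  # inner product x_i beta
        p = G(eta)
        q = p if y == 1 else 1.0 - p  # probability of the observed response
        if q <= 0.0:
            return math.inf  # l(beta) is +infinity where a response has probability 0
        total -= math.log(q)
    return total


# Two toy observations (illustrative only): one Non-case at x = 0, one Case at x = 1.
xs = [(1.0, 0.0), (1.0, 1.0)]
ys = [0, 1]
print(neg_log_likelihood((0.0, 0.0), xs, ys))  # equals 2 * log(2) at beta = 0
```

Passing a different distribution function as `G` (for example a Normal or Cauchy one) gives the corresponding Probit or Cauchy version of $l(\beta)$.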

Since the parameter $\beta$ [...] that the design matrix [...] zero vector $x_i$ is an ad[...] $i = 1, \ldots, n$. The mle of [...] $R^p$ and $e$ denotes a unit vector in $R^p$. The notation "R-2.6.3 p 14" will be used to mean Theorem
(or Corollary, etc.) 2.6.3 which appears on page 14 of Rockafellar (1972); further, we will follow
the terminology of this book (see the Appendix to this paper).
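Let $S$, $F$ be the relative interiors of the convex cones generated by $x_1, \ldots, x_r$ and $x_{r+1}, \ldots, x_n$
respectively. Then (see the Appendix)

$$S = \Bigl\{\sum_{i=1}^{r} k_i x_i \Bigm| k_i > 0\Bigr\}, \qquad F = \Bigl\{\sum_{i=r+1}^{n} k_i x_i \Bigm| k_i > 0\Bigr\}.$$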

Our main result is as follows.

Theorem. Let the condition $\Pi$ be defined by

$\Pi$: $S \cap F \ne \emptyset$ or one of $S$, $F$ is $R^p$ ($\emptyset$ = empty set).

(i) The mle $\hat{\beta}$ of $\beta$ exists and the minimum set $\{\hat{\beta}\}$ is bounded only when $\Pi$ is satisfied.
(ii) Suppose that $l(\beta)$ is a proper closed convex function on $R^p$. Then the mle $\hat{\beta}$ exists and the
minimum set $\{\hat{\beta}\}$ is bounded if and only if $\Pi$ is satisfied.
(iii) Suppose that $-\log G$ and $-\log(1 - G)$ are convex and $x_{i1} = 1$ for every $i$. Then $\hat{\beta}$ exists
and the minimum set $\{\hat{\beta}\}$ is bounded if and only if $S \cap F \ne \emptyset$. Let us further assume that
$G$ is strictly increasing at every $t$ satisfying $0 < G(t) < 1$. Then $\hat{\beta}$ is uniquely defined if and
only if $S \cap F \ne \emptyset$.
As an application, let us consider the Logit ($G$ = Logistic) and Probit ($G$ = Normal) models.
Maximum likelihood estimation in these important models is discussed in Cox (1970) and
Finney (1971) respectively. It may be verified directly by evaluating the second derivatives that
$-\log G$ and $-\log\{1 - G\}$ are convex. Therefore, assuming that the response model includes a
constant term, it follows from part (iii) of the above theorem that the mle $\hat{\beta}$ is uniquely defined if
and only if $S \cap F \ne \emptyset$.
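For the special case $p = 2$ with $x_i = (1, \text{score}_i)$, the condition $S \cap F \ne \emptyset$ can be checked on the scores alone. Every element of $S$ has the form $t(1, g)$ with $t > 0$ and $g$ in the relative interior of the convex hull of the Case scores, and similarly for $F$. The sketch below relies on that reduction and assumes both groups are non-empty. The function names are hypothetical, and the general $p$-dimensional case would need a linear-programming check instead.

```python
def rel_interior(scores):
    """Relative interior of the convex hull of `scores` on the real line.

    Returned as (lo, hi): the single point {lo} when lo == hi, otherwise
    the open interval (lo, hi).
    """
    return min(scores), max(scores)


def cones_intersect(case_scores, noncase_scores):
    """True when S and F intersect, for x_i = (1, score_i) and non-empty groups."""
    s_lo, s_hi = rel_interior(case_scores)
    f_lo, f_hi = rel_interior(noncase_scores)
    s_point, f_point = s_lo == s_hi, f_lo == f_hi
    if s_point and f_point:
        return s_lo == f_lo
    if s_point:
        return f_lo < s_lo < f_hi
    if f_point:
        return s_lo < f_lo < s_hi
    return max(s_lo, f_lo) < min(s_hi, f_hi)  # two open intervals


# Distinct GHQ scores observed in Table 1.
male_case_scores, male_noncase_scores = [2, 4, 5, 7, 10], [0, 1, 2]
female_case_scores, female_noncase_scores = list(range(10)), [0, 1, 2, 3, 4]

print("males:  ", cones_intersect(male_case_scores, male_noncase_scores))
print("females:", cones_intersect(female_case_scores, female_noncase_scores))
```

For the males the open intervals (2, 10) and (0, 2) do not meet, so the check fails. For the females (0, 9) and (0, 4) overlap, so it succeeds.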
McFadden (1976) refers to the cases $G$ = Cauchy and $G$ = Uniform (0, 1). When $G$ is Cauchy,
$-\log G$ and $-\log\{1 - G\}$ are not convex. Therefore, $l(\beta)$ is not convex in general and it may
have multiple minima. It seems that this problem will arise whenever $G$ has tails heavier than
those of the Logistic distribution. For instance, suppose that $G(t) \sim |t|^{-q}$ as $t \to -\infty$ for some $q > 0$.
Then $(d^2/dt^2)\{-\log G(t)\} \sim -q\,t^{-2}$ for large negative $t$. Therefore, $-\log G$ is not convex. Now,
let us consider the case when $G$ = Uniform (0, 1) and $x_{i1} = 1$ for every $i$. Clearly, $-\log G$ and
$-\log\{1 - G\}$ are convex and $G$ is strictly increasing on (0, 1). Therefore (by part (iii) of the above
theorem) we conclude that the mle $\hat{\beta}$ exists uniquely if and only if $S \cap F \ne \emptyset$.
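The contrast between Logistic and Cauchy tails can be checked numerically. The sketch below approximates the second derivative of $-\log G$ by a central finite difference at one illustrative point in the left tail. The point $t = -3$, the step size and the function names are assumptions chosen for the illustration.

```python
import math


def neg_log_logistic(t):
    """-log G(t) for the Logistic distribution, i.e. log(1 + exp(-t))."""
    return math.log1p(math.exp(-t))


def neg_log_cauchy(t):
    """-log G(t) for the standard Cauchy distribution, G(t) = 1/2 + atan(t)/pi."""
    return -math.log(0.5 + math.atan(t) / math.pi)


def second_difference(f, t, h=1e-3):
    """Central finite-difference approximation to f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)


t = -3.0  # an illustrative point in the left tail
print("Logistic:", second_difference(neg_log_logistic, t))  # positive: locally convex
print("Cauchy:  ", second_difference(neg_log_cauchy, t))    # negative: not convex
```

A negative value at a single point is enough to rule out convexity, which is why one evaluation suffices for the Cauchy case.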
For most practical purposes the third part of the above theorem is sufficient. To give a simple
illustration of the general ideas involved in the main theorem, let us consider the set of data in
Table 1 for males. The model in consideration is (1), which is the same as (2) with $G$ the Logistic
distribution and $x_i = (1, (\text{GHQ Score})_i)$. The convex cones $S$ and $F$ are shown in Fig. 1. Since $F$ is
open (relative to the vector space spanned by all the $x_i$'s corresponding to Non-cases) it is the
cone which lies between (not including) OA and OB. Similarly, $S$ does not include OB and OC.
Clearly $S$ and $F$ are disjoint and are separated by the vector (1, 2). Hence, by the theorem, $\hat{\beta}$ does
not exist. Note that the vector $e = (-2, 1)$, which is orthogonal to (1, 2), is such that $x_i e \le 0$ for
Non-cases and $x_i e \ge 0$ for Cases, and so, from (3), $l(\beta + ke)$ is decreasing in $k$ for any $\beta$.

Fig. 1. A representation of the data in Table 1 for the males. [Figure not reproduced: the cones $S$ and $F$ in the $(x_1, x_2)$ plane, with rays OA through (1, 0), OB through (1, 2) and OC, and the vector $e$.]
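The behaviour of $l(\beta + ke)$ along $e = (-2, 1)$ for the male data can be illustrated numerically. This is a sketch under stated assumptions: the starting point $\beta = (0, 0)$ and the grid of $k$ values are arbitrary choices, and `softplus` is a helper introduced here to evaluate $\log(1 + e^{t})$ without overflow.

```python
import math

# Male counts from Table 1, indexed by GHQ score 0..12.
male_cases = [0, 0, 1, 0, 1, 3, 0, 2, 0, 0, 1, 0, 0]
male_noncases = [18, 8, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


def softplus(t):
    """log(1 + exp(t)), computed without overflow."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def neg_log_likelihood(beta):
    """Logit version of (3) for the male data with x_i = (1, GHQ score)."""
    total = 0.0
    for score in range(13):
        eta = beta[0] + beta[1] * score
        total += male_cases[score] * softplus(-eta)    # -log G(eta) for Cases
        total += male_noncases[score] * softplus(eta)  # -log{1 - G(eta)} for Non-cases
    return total


e = (-2.0, 1.0)    # direction from the text, orthogonal to (1, 2)
beta = (0.0, 0.0)  # arbitrary starting point (an assumption)
values = [neg_log_likelihood((beta[0] + k * e[0], beta[1] + k * e[1]))
          for k in range(0, 21, 2)]
for k, value in zip(range(0, 21, 2), values):
    print(k, value)
```

From this starting point the values keep decreasing towards $2 \log 2$, the contribution of the two observations at GHQ = 2, without ever attaining it, so no finite minimizer is reached along this direction.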
By contrast, if there is an additional observation which is either a Non-case with GHQ > 2 or
a Case with GHQ < 2, then $S \cap F$ is no longer $\emptyset$, and so, by the theorem, $\hat{\beta}$ exists. In this event
there is no vector separating $S$ and $F$ and thus no vector corresponding to $e$ above. Therefore, (3)
implies that for any $\beta$ and $e$, $l(\beta + ke)$ increases in $k$ for large $k$, that is, $l(\beta)$ increases in any
direction eventually. Now, since $l(\beta)$ is convex it is intuitively clear that it must have a minimum.
A figure similar to Fig. 1 for the females shows that $S \cap F \ne \emptyset$ and the mle $\hat{\beta}$ exists. Let us remark
here that if all the Non-cases correspond to $x_i = (1, 2)$ then $F$ is the line OB, not the empty set.
3. PROOF OF THE THEOREM


(i) Let $\{\hat{\beta}\}$ be non-empty and bounded. Suppose that $\Pi$ is not satisfied, that is $S \cap F = \emptyset$,
$S \ne R^p$ and $F \ne R^p$. Clearly, there exists a $\beta^*$ such that $l(\beta^*)$ is finite.
Case (a): Suppose that $F = \emptyset$ or $S = \emptyset$. Without loss of generality assume that $F = \emptyset$.
Since $S \ne R^p$, there exists an $e$ such that $x_i e$ is non-negative for $1 \le i \le n$ (R-11.7.3 p 101). Hence
$l(\beta^* + ke)$ is a decreasing function of $k$.
Case (b): $S \ne \emptyset$, $F \ne \emptyset$. There exists an $e$ such that

$x_i e \ge 0$ for $1 \le i \le r$ and $x_i e \le 0$ for $r + 1 \le i \le n$,

and not all of $x_i e$ are zero (R-11.3 p 97 and R-11.7 p 100). Hence, $l(\beta^* + ke)$ is decreasing in $k$.
Therefore, either $\hat{\beta}$ does not exist or $\{\hat{\beta}\}$ is unbounded. This is a contradiction, hence $\Pi$ is
satisfied.
(ii) The necessity of $\Pi$ is proved above, therefore we have to prove only the sufficiency. So
let us assume that $\Pi$ is satisfied. Since $l(\beta)$ is proper there is a $\beta^*$ such that $l(\beta^*)$ is finite. Let $e$ be an
arbitrary but fixed unit vector.
Case (a): $S = R^p$ or $F = R^p$. Without loss of generality assume that $S = R^p$. Clearly, there
exists a $j$, $1 \le j \le r$, such that $x_j e$ is negative. So, $l(\beta^* + ke)$ is increasing in $k$.
Case (b): $S \ne R^p$, $F \ne R^p$. Since $\Pi$ is satisfied, $S \cap F \ne \emptyset$. So, we cannot find a hyperplane
which separates $S$, $F$ properly (R-11.3 p 97). Therefore, $x_i e$ is negative for some $1 \le i \le r$ or
positive for some $r + 1 \le i \le n$. So, again $l(\beta^* + ke)$ is increasing in $k$. Therefore, $l(\beta)$ does not have
a direction of recession and the result follows from R-27.1(d) p 265.


(iii) Let $C$ be the convex set $\{\beta \mid x_i \beta > a$ for $1 \le i \le r$ and $x_i \beta < b$ for $r + 1 \le i \le n\}$, where
$-\infty \le a < b \le \infty$ and $0 < G(t) < 1$ only when $a < t < b$. Clearly, $l(\beta)$ is finite inside $C$ and $+\infty$
outside. The set $C$ is non-empty since $(c, 0, \ldots, 0) \in C$ whenever $a < c < b$. Convexity of $l(\beta)$ follows
from the convexity of $-\log G$ and $-\log\{1 - G\}$. Also, $l(\beta)$ is lower semicontinuous since it is
continuous on $C$ and approaches $\infty$ as $\beta$ approaches the boundary of $C$. Hence, $l(\beta)$ is a proper
closed convex function. Therefore, $\{\hat{\beta}\}$ is non-empty and bounded if and only if $S \cap F \ne \emptyset$. The
other part is straightforward since $l(\beta)$ is strictly convex on $C$ whenever $G$ is strictly increasing on
$(a, b)$.
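One step in the passage from part (ii) to part (iii) may be worth spelling out: when $x_{i1} = 1$ for every $i$, neither $S$ nor $F$ can be $R^p$, so the condition $\Pi$ reduces to $S \cap F \ne \emptyset$. The following is a sketch of that reasoning. The vector $u$ is notation introduced here for illustration and does not appear in the paper.

```latex
% Sketch: with x_{i1} = 1 for every i, neither S nor F can equal R^p.
\begin{align*}
u &= (1, 0, \ldots, 0), \qquad x_i u = x_{i1} = 1 > 0 \quad (i = 1, \ldots, n), \\
s &= \sum_{i=1}^{r} k_i x_i \in S,\ k_i > 0
   \;\Longrightarrow\; s\,u = \sum_{i=1}^{r} k_i > 0, \\
f &= \sum_{i=r+1}^{n} k_i x_i \in F,\ k_i > 0
   \;\Longrightarrow\; f\,u = \sum_{i=r+1}^{n} k_i > 0 .
\end{align*}
% Hence -u lies in neither S nor F, so S \ne R^p and F \ne R^p, and therefore
% \Pi \iff S \cap F \ne \emptyset .
```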

ACKNOWLEDGEMENT
I am grateful to Professor C. R. Heathcote, Mr P. Duncan-Jones and the referees for their
useful comments. Also, I am grateful to Dr C. Tennant for allowing me to use his data; the full
data have not been published previously and the data in Table 1 have been made available by
Mr P. Duncan-Jones.

APPENDIX
In this Appendix we shall explain most of the technical terms (not in their full generalities) in
Convex Analysis that are used in this paper. For precise definitions the reader is referred to
Rockafellar (1972). However we believe that this Appendix should be sufficient to understand
the essential points.
A convex cone $C$ in $R^p$ is a convex set such that $kx \in C$ whenever $x \in C$ and $k > 0$. The convex
cone $S_0$ generated by $x_1, \ldots, x_r$ is $\{k_1 x_1 + \cdots + k_r x_r \mid k_i \ge 0\}$. The relative interior $S$ of $S_0$ is the
interior of $S_0$ with respect to $\{x - y \mid x, y \in S_0\}$, which is the sub-vector space spanned by $S_0$.
Let $C_1$ and $C_2$ be non-empty sets in $R^p$ and $H = \{x \mid xe = 0\}$ be a hyperplane. The two closed
halfspaces associated with $H$ are defined as $\{x \mid xe \ge 0\}$ and $\{x \mid xe \le 0\}$. We say that $H$ separates
$C_1$ and $C_2$ if $C_1$ is contained in one closed halfspace and $C_2$ is contained in the other. In
addition, if $C_1 \cup C_2$ is not contained in $H$, then $H$ is said to separate $C_1$ and $C_2$ properly.
Let $f$ be a convex function on $R^p$. We say that $f$ is proper if it is finite on a non-empty convex
set $C$ and takes the value $+\infty$ outside $C$. A proper convex function is closed if it is lower
semicontinuous. The minimum set $\{\hat{x}\}$ of $f$ is $\{\hat{x} \in R^p \mid f(\hat{x}) = \inf f(y)$, where the infimum is taken over
$y \in R^p\}$. A direction of recession of a proper closed convex function $f$ is a unit vector $e \in R^p$ for
which there exists $x \in R^p$ such that $f(x) < \infty$ and $f(x + ke)$ is a non-increasing function of $k$ for
large $k$.

REFERENCES
COX, D. R. (1970). Analysis of Binary Data. London: Chapman and Hall.
DUNCAN-JONES, P. and HENDERSON, A. S. (1978). The use of a two-phase design in a population survey. Social
Psychiatry, 13, 231-237.
FINNEY, D. J. (1971). Probit Analysis, 3rd ed. Cambridge: Cambridge University Press.
GOLDBERG, D. P. (1972). The Detection of Psychiatric Illness by Questionnaire. (Institute of Psychiatry Maudsley
Monographs, No. 21) London: Oxford University Press.
McFADDEN, D. (1976). Quantal choice analysis, a survey. Ann. Econ. Soc. Meas., 4, 363-390.
ROCKAFELLAR, R. T. (1972). Convex Analysis. Princeton, N.J.: Princeton University Press.
TENNANT, C. (1977). The general health questionnaire: a valid index of psychological impairment in Australian
populations. Med. J. Aust., 2, 392-394.
