Bagkavos and Patil (2009)


Publisher: Taylor & Francis
Communications in Statistics - Theory and Methods
Publication details, including instructions for authors and
subscription information:
http://www.tandfonline.com/loi/lsta20
Variable Bandwidths for Nonparametric Hazard Rate Estimation
Dimitrios Bagkavos (a) & Prakash Patil (b)
(a) Accenture Marketing Sciences, Athens, Greece
(b) School of Mathematics and Statistics, The University of Birmingham, Birmingham, UK
Published online: 19 Mar 2009.
To cite this article: Dimitrios Bagkavos & Prakash Patil (2009) Variable Bandwidths for Nonparametric
Hazard Rate Estimation, Communications in Statistics - Theory and Methods, 38:7, 1055-1078, DOI:
10.1080/03610920802364088
To link to this article: http://dx.doi.org/10.1080/03610920802364088
Communications in Statistics—Theory and Methods, 38: 1055–1078, 2009
Copyright © Taylor & Francis Group, LLC
ISSN: 0361-0926 print/1532-415X online
DOI: 10.1080/03610920802364088
A smoothing parameter inversely proportional to the square root of the true density is known to produce kernel density estimates whose bias converges at a faster rate. We show that in the case of kernel-based nonparametric hazard rate estimation, a smoothing parameter inversely proportional to the square root of the true hazard rate leads to a mean square error rate of order $n^{-8/9}$, an improvement over the standard second-order kernel estimator. An adaptive version of such a procedure is considered and analyzed.
Keywords: Hazard rate estimation; Kernel; Mean integrated square error; Smoothness; Variable bandwidth.
Mathematics Subject Classification: Primary 62G05, 62N02; Secondary 62P10, 62G30.
1. Introduction
For a given random sample $X_1, X_2, \ldots, X_n$ of independent failure times with common probability density function $f(x)$ and distribution function $F(x)$, the most commonly used kernel-based estimator of the hazard rate, $\lambda(x) = f(x)/\{1 - F(x)\}$ (for $F(x) < 1$), is

$$\hat\lambda(x|h) = h^{-1}\sum_{i=1}^{n} K\!\left(\frac{x - X_{(i)}}{h}\right)(n - i + 1)^{-1} \tag{1}$$

where $K$ is any second-order positive kernel, $h$ denotes the bandwidth, and $X_{(i)}$ is the $i$th order statistic. Since Watson and Leadbetter (1964) introduced the above estimator, several of its asymptotic properties have been studied in the literature. For example, its mean square error (MSE) properties for complete samples are given in Rice and Rosenblatt (1976). Besides MSE, Tanner and Wong (1983) proved the asymptotic normality of the above estimator for censored data using the Hájek projection method, while Lo et al. (1989) achieved the same result via strong representation of the Kaplan–Meier estimator. Smoothing parameter selection by least squares cross-validation is investigated by Sarda and Vieu (1991) and Patil (1993). González-Manteiga et al. (1996) employed the bootstrap for bandwidth selection.

Received June 20, 2007; Accepted July 23, 2008

Address correspondence to Dimitrios Bagkavos, Accenture Marketing Sciences, 17 Rostoviou Str., Athens 11526, Greece; E-mail: dimitrios.bagkavos@accenture.com

If a second-order kernel and an asymptotically optimal bandwidth are used, then the MSE rate of convergence of $\hat\lambda(x|h)$ is $n^{-4/5}$. It would be possible to achieve a faster rate by using higher-order kernels. However, this leads to the undesirable feature of hazard rate estimates that can take negative values. In this direction, in the context of kernel density estimation, Abramson (1982a) introduced an estimate that uses a bandwidth inversely proportional to the square root of the true density. Hall and Marron (1988) used a variant of Abramson's estimate and considered its theoretical properties when the unknown density is bounded away from 0. In analogy to the proposals of Hall and Marron (1988), we allow the bandwidth in $\hat\lambda(x|h)$ to be inversely proportional to the square root of the true hazard rate itself. It should be noted that Hall et al. (1995) provided the fine tuning of the proposal in Hall and Marron (1988). Since in our case the effective interval over which the hazard rate is estimated is finite, the issues addressed in Hall et al. (1995) do not play a role. Although an estimator obtained by allowing the bandwidth in $\hat\lambda(x|h)$ to be inversely proportional to the square root of the true hazard rate is not useful in practice, it is still helpful to know that, as shown in Sec. 2, this estimator achieves a faster rate of convergence without taking negative values.
To construct a practically feasible estimator, we replace the unknown hazard
rate in the bandwidth function by its usual kernel estimate. In Sec. 3, it is shown
that such an estimator is almost as good as the ideal estimator as far as the rate of
convergence to the true curve is concerned. In Sec. 4, we illustrate the estimate for
distributional and real life data and a simulation study is carried out to assess and
compare its performance with other estimates.
It should be mentioned that Silverman (1986) considered variable bandwidth
for hazard rate estimation, albeit indirectly and without the theoretical analysis
considered here. In contrast to our direct approach, the hazard rate there
is estimated as a ratio of density and survival function estimators, with the
density estimate being based on variable bandwidth. The results obtained here for the estimator $\hat\lambda(x|h)$ could be derived for an estimator based on the approach described in Silverman (1986) and were discussed in Bagkavos (2003).
Another attempt to study variable smoothing in hazard rate estimation is by Tanner
(1983). Noting the difficulty in establishing theoretical properties of a hazard rate
estimator that uses variable smoothing, Tanner (1983) proves the strong consistency
of such an estimator with a deterministic form of variable smoothing. More recently,
Nielsen (2003) studied variable bandwidth hazard rate estimation under a counting
processes framework. There, the asymptotic properties of nine estimates that use
different bandwidth selection methods—three of which are based on Abramson’s
square root law—were derived and their performance was further examined by an
extensive simulation study. Specifically, estimator â9 of Nielsen (2003) is equivalent to the estimate we develop here, although given with a different mathematical formulation. The present research is a direct extension of the work of Hall and Marron (1988) to the hazard setting and as such extends the theoretical results of Nielsen (2003) to the setup that we have considered. In detail, we start with
the ideal estimate, which is based on (1) and as opposed to Nielsen (2003), take a
direct approach in calculating its asymptotic properties. We then define an estimate
that can be used in practice and quantify its difference with the ideal estimate as
a normally distributed random variable. Simultaneously, we prove the asymptotic
normality of the ideal estimate. Thus, the contribution that this article makes is to
provide asymptotic formulae for bias and variance when the variable bandwidth has
a deterministic form. Further, it establishes the convergence rate of an (adaptive)
variable bandwidth estimator when the variable bandwidth is stochastic, proves its
asymptotic normality and provides comparison between the adaptive and the ideal
estimators. Finally, it compares, via simulations, the developed method with other
methods with similar asymptotic properties.
2. Variable Bandwidth Estimator and Mean Square Error Analysis

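We modify estimator $\hat\lambda(x|h)$ defined in (1) by taking bandwidth proportional to $\lambda^{-1/2}$ to define a new estimator that makes use of variable bandwidth,

$$\tilde\lambda_n(x|h) = \frac{1}{h}\sum_{i=1}^{n}\frac{\lambda(X_{(i)})^{1/2}\,K\!\left(\frac{x - X_{(i)}}{h}\,\lambda(X_{(i)})^{1/2}\right)}{n - i + 1}. \tag{2}$$
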
This estimator is a natural extension of the density estimator defined in Hall and
Marron (1988) to the case of hazard rate estimation.
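
As a rough illustration of how estimators (1) and (2) differ computationally, the following Python sketch evaluates both at a single point. It is not the authors' code: the function names, the use of the biweight kernel, and the `true_hazard` argument are assumptions made for the example. The need to pass `true_hazard` is exactly what makes (2) an ideal rather than a practical estimator.

```python
import math


def biweight(z):
    """Biweight kernel: non-negative, second-order, supported on [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - z * z) ** 2 if abs(z) < 1.0 else 0.0


def fixed_bandwidth_hazard(x, sample, h, kernel=biweight):
    """Estimator (1): h^-1 * sum_i K((x - X_(i)) / h) / (n - i + 1)."""
    ordered = sorted(sample)  # order statistics X_(1) <= ... <= X_(n)
    n = len(ordered)
    total = 0.0
    for i, x_i in enumerate(ordered, start=1):
        total += kernel((x - x_i) / h) / (n - i + 1)
    return total / h


def ideal_variable_bandwidth_hazard(x, sample, h, true_hazard, kernel=biweight):
    """Estimator (2): the bandwidth at X_(i) is h * true_hazard(X_(i)) ** -0.5."""
    ordered = sorted(sample)
    n = len(ordered)
    total = 0.0
    for i, x_i in enumerate(ordered, start=1):
        root = math.sqrt(true_hazard(x_i))  # lambda(X_(i)) ** 0.5
        total += root * kernel((x - x_i) / h * root) / (n - i + 1)
    return total / h
```

For a constant hazard equal to 1 (the standard exponential distribution) every local bandwidth equals $h$, so the two functions return the same value; they differ only when the hazard varies with $x$.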
Note that the bandwidth $h(x) \equiv h\lambda(x)^{-1/2}$ used in the definition of $\tilde\lambda_n(x|h)$ involves the true hazard function. This makes the above estimator infeasible in practice. However, Theorem 2.1 below shows that this ideal variable bandwidth estimator has better bias, and hence better MSE (since the variances are of the same order), than its fixed bandwidth counterpart.
Unless stated otherwise, we assume standard kernel conditions, that is, the kernel function $K: \mathbb{R} \to \mathbb{R}$ satisfies:
A.1 $\int K(z)\,dz = 1$.
A.2 $K$ is non-negative, symmetric, and five times differentiable.
A.3 $K$ vanishes outside a compact interval.
We note here two implications of conditions A.2 and A.3. First, both the fourth moment of the kernel and its supremum are finite. Second, the first two derivatives of the kernel are bounded. Typically, estimation of the hazard function is done in $[0, T]$, where $T = \sup\{x : F(x) < 1 - \epsilon\}$ for some $\epsilon > 0$. Note though that near zero, kernel-based estimators are more biased than usual unless the true curve is equal or close to zero there. One way of handling this is to use special kernel functions, called boundary kernels, which are weight functions used only within the boundary region (see, for example, Simonoff, 1996, pp. 53–50 or Wand and Jones, 1995, pp. 46–49 for technical details and Simonoff, 1996, pp. 80–81 for a short overview of the literature on boundary kernels). Specifically, for the estimator we propose here, we conjecture that a boundary kernel constructed as indicated in Müller and Wang (1994, p. 75) would restore boundary bias to the bias levels claimed in the interior in Theorem 2.1 below. However, calculations of the asymptotic properties of the estimate along this route are long and involve tedious algebra, to the point that they distract from the main message. For this reason, we study the theoretical properties of the estimate in an interval $[\delta, T]$ for a small $\delta > 0$ which can decrease to zero as the bandwidth tends to zero.
The asymptotic mean and variance formulae of $\tilde\lambda_n(x|h)$ are summarized in the following theorem. The full proof is given in Bagkavos (2003); see Appendix A.1 for an outline.
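Theorem 2.1. Under assumptions A.1–A.3 for the kernel, and assuming that $\lambda$ has four continuous derivatives and that it is bounded away from zero on $[\delta, T]$,

$$E\tilde\lambda_n(x|h) = \lambda(x) + g(x)h^4\int z^4K(z)\,dz + o(h^4),$$

$$\mathrm{Var}\,\tilde\lambda_n(x|h) = \frac{\lambda(x)^{3/2}}{1 - F(x)}\,\frac{1}{nh}\int K^2(z)\,dz + o\{(nh)^{-1}\},$$

uniformly in $x \in [\delta, T]$ as $h \to 0$, $n \to +\infty$ and $nh \to +\infty$, with

$$g(x) = \frac{1}{24\lambda(x)^5}\Big\{24\lambda'(x)^4 - 36\lambda'(x)^2\lambda''(x)\lambda(x) + 6\lambda''(x)^2\lambda(x)^2 + 8\lambda'(x)\lambda(x)^2\lambda'''(x) - \lambda^{(4)}(x)\lambda(x)^3\Big\}.$$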
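
The bias factor $g(x)$ of Theorem 2.1 coincides with $\frac{1}{24}\frac{d^4}{dx^4}\{1/\lambda(x)\}$, which mirrors the density result of Hall and Marron (1988). The sketch below checks this identity for a power-law hazard $\lambda(x) = c\,x^p$, for which all derivatives are available in closed form. The hazard family, the function names, and the reading of the last term of $g$ as $-\lambda^{(4)}(x)\lambda(x)^3$ are assumptions made for the illustration.

```python
def g_theorem_2_1(lam, d1, d2, d3, d4):
    """g(x) of Theorem 2.1 from lambda(x) and its first four derivatives."""
    numerator = (24.0 * d1 ** 4
                 - 36.0 * d1 ** 2 * d2 * lam
                 + 6.0 * d2 ** 2 * lam ** 2
                 + 8.0 * d1 * lam ** 2 * d3
                 - d4 * lam ** 3)
    return numerator / (24.0 * lam ** 5)


def power_hazard_derivatives(x, c, p):
    """lambda(x) = c * x**p together with its first four derivatives."""
    lam = c * x ** p
    d1 = c * p * x ** (p - 1)
    d2 = c * p * (p - 1) * x ** (p - 2)
    d3 = c * p * (p - 1) * (p - 2) * x ** (p - 3)
    d4 = c * p * (p - 1) * (p - 2) * (p - 3) * x ** (p - 4)
    return lam, d1, d2, d3, d4


def fourth_derivative_of_reciprocal(x, c, p):
    """Closed form of d^4/dx^4 [1 / (c * x**p)]."""
    return p * (p + 1) * (p + 2) * (p + 3) * x ** (-p - 4) / c
```

For any $x > 0$, `g_theorem_2_1(*power_hazard_derivatives(x, c, p))` and `fourth_derivative_of_reciprocal(x, c, p) / 24` agree up to rounding. In particular $g \equiv 0$ when $1/\lambda$ is a polynomial of degree at most three, so the $h^4$ bias term vanishes in that case.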
Remark 2.1. From Theorem 2.1 the MSE distance between $\tilde\lambda_n(x|h)$ and $\lambda(x)$ can easily be shown to be $O(n^{-8/9})$. For this, first note that the values of $h$ (in the bandwidth function, $h/\lambda^{1/2}(x)$) which minimize the mean square error of $\tilde\lambda_n(x|h)$ are asymptotic to constant multiples of $n^{-1/9}$. Then, for such $h$'s,

$$E\{\tilde\lambda_n(x|h) - \lambda(x)\}^2 = O(n^{-8/9}).$$
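
The step from Theorem 2.1 to the rate in Remark 2.1 can be written out as follows. This is a sketch that keeps only the leading bias and variance terms; the shorthand $A(x)$, $B(x)$ and $h^*$ is introduced here for the illustration and does not appear in the original.

```latex
\begin{align*}
\mathrm{MSE}(h) &\approx A(x)\,h^{8} + \frac{B(x)}{nh},
\qquad A(x) = g(x)^2\Big(\int z^4K(z)\,dz\Big)^{2},
\quad B(x) = \frac{\lambda(x)^{3/2}}{1-F(x)}\int K^2(z)\,dz, \\
\frac{d}{dh}\,\mathrm{MSE}(h) = 0
&\iff 8A(x)\,h^{7} = \frac{B(x)}{nh^{2}}
\iff h^{*} = \Big(\frac{B(x)}{8A(x)}\Big)^{1/9} n^{-1/9}, \\
\mathrm{MSE}(h^{*}) &\approx A(x)\,h^{*8} + 8A(x)\,h^{*8}
= 9A(x)\Big(\frac{B(x)}{8A(x)}\Big)^{8/9} n^{-8/9}.
\end{align*}
```

The same balance for a fixed bandwidth, where the squared bias is of order $h^4$, gives $h^* \propto n^{-1/5}$ and the $n^{-4/5}$ rate quoted in the Introduction.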
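Remark 2.2. With $h \sim n^{-1/9}$ one could establish that as $n \to +\infty$

$$\frac{\sqrt{nh}\,\{\tilde\lambda_n(x|h) - E\tilde\lambda_n(x|h)\}}{\sigma(x)} \to N(0, 1)$$

where

$$\sigma^2(x) = \frac{\lambda(x)^{3/2}}{1 - F(x)}\int K^2(t)\,dt.$$

Thus, the distance between $\tilde\lambda_n(x|h)$ and $\lambda(x)$ is $O_p(n^{-4/9})$. For an outline see the proof of the asymptotic normality part of Theorem 3.2 given in the Appendix.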
Remark 2.3. In the discussion above, the bandwidth of the ideal estimator (i.e., $h$ in $h\lambda^{-1/2}(x)$) is denoted and treated as a constant. The case of a random bandwidth could be analyzed as follows. Assume that the kernel is compactly supported and twice continuously differentiable and that a random bandwidth $\hat h$ satisfies $n^{1/9}\hat h \to c$ in probability, where $0 < c < +\infty$. Then, using an argument similar to that of Abramson (1982b) for the proof of Eq. (2) there, one can show that

$$\tilde\lambda_n(x|\hat h) = \tilde\lambda_n(x|cn^{-1/9}) + o_p(n^{-4/9})$$

as $n \to +\infty$ for every $x \in [\delta, T]$.
3. Adaptive Estimation
In order to get a practically useful estimator we replace $\lambda(x)$ in (2) by a simple kernel estimator. This leads to the definition of the so-called adaptive estimator of the hazard rate function,

$$\hat\lambda_n(x|h_1, h_2) = \frac{1}{h_2}\sum_{i=1}^{n}\frac{\hat\lambda(X_{(i)}|h_1)^{1/2}\,K\!\left(\frac{x - X_{(i)}}{h_2}\,\hat\lambda(X_{(i)}|h_1)^{1/2}\right)}{n - i + 1} \tag{3}$$

where $\hat\lambda(x|h)$ is the 'pilot' estimator defined in (1). Practical choice of the bandwidths involved in (3) is discussed in Sec. 4.
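
A minimal sketch of the adaptive estimator (3) follows, assuming a biweight kernel and user-supplied bandwidths `h1` (pilot) and `h2` (main). The function names are hypothetical and the code favors readability over speed: the pilot is recomputed at every observation, so one evaluation costs on the order of $n^2$ kernel calls.

```python
import math


def biweight(z):
    """Biweight kernel: non-negative, second-order, supported on [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - z * z) ** 2 if abs(z) < 1.0 else 0.0


def pilot_hazard(x, ordered, h1):
    """Fixed-bandwidth pilot estimator (1); `ordered` must already be sorted."""
    n = len(ordered)
    total = sum(biweight((x - x_i) / h1) / (n - i + 1)
                for i, x_i in enumerate(ordered, start=1))
    return total / h1


def adaptive_hazard(x, sample, h1, h2):
    """Estimator (3): estimator (2) with the true hazard replaced by the pilot."""
    ordered = sorted(sample)
    n = len(ordered)
    total = 0.0
    for i, x_i in enumerate(ordered, start=1):
        # The pilot is strictly positive at an observation because the
        # observation's own kernel term contributes K(0) > 0.
        root = math.sqrt(pilot_hazard(x_i, ordered, h1))
        total += root * biweight((x - x_i) / h2 * root) / (n - i + 1)
    return total / h2
```

Because the kernel is non-negative, the result is non-negative for every `x`, which is the property that distinguishes this approach from higher-order kernels.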
Note that for the estimator based on random bandwidths we will replace $h_1, h_2$ in the definition of $\hat\lambda_n(x|h_1, h_2)$ by $\hat h_1$ and $\hat h_2$.
To quantify the distance between the ideal and adaptive estimators we prove that, up to terms of $o_p(n^{-4/9})$, the adaptive estimator equals the ideal estimator plus a remainder term, and that this term is asymptotically normally distributed. For that, first define

$$\lambda(x|h) = E\hat\lambda(x|h), \quad L(x) = K(x) + xK'(x) \quad \text{and} \quad L_1(z) = zK'(z).$$

Define also,

$$T(x|h_1, h_2) = \frac{1}{2nh_1h_2}\sum_{i=1}^{n} t(X_i|x, h_1, h_2) \tag{4}$$

Xi −u
K
− 21 h1
tu x h1 h2 = Xi − h1 Xi h1
1 − FXi

x − Xi 1
−1
×L Xi 2 1 − FXi
h2
Then we have the following theorem. A detailed proof is given in Bagkavos (2003); an outline is given in Appendix A.2.

Theorem 3.1. Assume that the kernel satisfies conditions A.1–A.3. Suppose that $\lambda > 0$ is three times differentiable with the third derivative satisfying a Lipschitz condition of unit order. Also, we assume that if the bandwidth is random then, with probability 1 as $n \to +\infty$,

$$n^{-a-1/5} < \hat h_1 < n^{a-1/5} \ \text{ where } a < \tfrac{1}{5}, \quad \text{and} \quad c_1 n^{-1/9} < \hat h_2 < c_2 n^{-1/9}$$

with $c_2 > c_1 > 0$. Then,

$$\hat\lambda_n(x|\hat h_1, \hat h_2) = \tilde\lambda_n(x|\hat h_2) + T(x|\hat h_1, \hat h_2) + o_p(n^{-4/9}). \tag{5}$$

Remark 3.1. In practice, one might choose h1 so that it satisfies the assumptions of
the theorem by squared error cross-validation; see Patil (1993) for details. On the
other hand, practical choice of h2 is more difficult. An appealing method would be
the “Solve-the-Equation Plug-In Approach” of Jones et al. (1996, Paragraph 2.3.1).
Although this method, extended to the hazard setting, would give a bandwidth as
required by the assumptions of the theorem, the theoretical study of this method is
beyond the scope of the present work.
Now, define

$$S(x|h_2) = \tilde\lambda_n(x|h_2) - E\tilde\lambda_n(x|h_2).$$

The next theorem is used to evaluate how much we lose in practice by using the adaptive estimator instead of the ideal. It shows that the remainder term, $T(x|h_1, h_2)$, is of the same order as the difference between $\tilde\lambda_n(x|h)$ and the true hazard $\lambda(x)$. Proof of the theorem is given in Sec. A.3.
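Theorem 3.2. Assume that the kernel satisfies conditions A.1, A.2 and that it has two bounded derivatives. Let $\lambda > 0$ be bounded on $[\delta, T]$ and continuous at $x \in [\delta, T]$. Suppose that for non-random bandwidths $h_1$, $h_2$ satisfying $n^{\epsilon}\max(h_1, h_2) \to 0$, $n^{1-\epsilon}\min(h_1, h_2) \to \infty$ for some $\epsilon > 0$ and $h_1h_2^{-1} \to 0$ we have $\hat h_1/h_1 \to 1$ and $\hat h_2/h_2 \to 1$ in probability. Then

$$(nh_2)^{1/2}\big(S(x|\hat h_2),\ T(x|\hat h_1, \hat h_2)\big) \xrightarrow{d} (N_1, N_2)$$

where $(N_1, N_2)$ is a bivariate normal distribution with mean 0 and covariance

$$\mathrm{Var}(N_1) = \frac{\lambda(x)^{3/2}}{1 - F(x)}\int K^2, \qquad \mathrm{Var}(N_2) = \frac{1}{4}\,\frac{\lambda(x)^{3/2}}{1 - F(x)}\int L^2,$$

$$\mathrm{Cov}(N_1, N_2) = \frac{1}{2}\,\frac{\lambda(x)^{3/2}}{1 - F(x)}\int KL.$$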
An immediate conclusion that can be drawn is that the remainder term does
not change the rate of convergence, but it does prevent the adaptive estimator from
achieving the same first-order properties as the ideal as explained in Remark 3.2
below.
Remark 3.2. Since the limiting pair $(N_1, N_2)$ has zero mean, we conclude that $\hat\lambda_n(x|h_1, h_2)$ and $\tilde\lambda_n(x|h)$ have the same asymptotic bias. Further, using the properties of kernels $K$ and $L$ it is not difficult to see that the covariance between $S$ and $T$ is positive. Hence, we conclude that the adaptive estimator has larger asymptotic variance than the ideal.
Remark 3.3. A consequence of Theorem 3.2 is that dependence of $T(x|\hat h_1, \hat h_2)$ on $\hat h_1$ vanishes in the limit. Therefore, the use of a more accurate estimator than $\hat\lambda(x|h)$ as pilot will not affect the asymptotic properties of $\hat\lambda_n(x|h_1, h_2)$.
Remark 3.4. The condition $h_1h_2^{-1} \to 0$ of Theorem 3.2 provides guidance for choosing the bandwidths. In practice we would choose a bigger value for $h_2$ than for $h_1$, i.e., $h_1 \sim n^{-1/5}$ and $h_2 \sim n^{-1/9}$, because in this way we achieve the fastest rate of convergence to the true hazard rate. For a detailed discussion of this issue in the density estimation setting, see Hall and Marron (1988).
4. Numerical Examples
In this section, we use distributional as well as real life data to exhibit the practical performance of the variable bandwidth estimator. Further, we conduct a simulation study in order to compare the performance of $\hat\lambda_n(x|h_1, h_2)$ with that of other well-known kernel hazard rate estimates. For this purpose, we use the following three different estimators.
1. The standard second-order kernel estimator $\hat\lambda(x|h)$. The bias, and hence the Mean Integrated Squared Error (MISE), of $\hat\lambda_n(x|h_1, h_2)$ is expected to be smaller than that of $\hat\lambda(x|h)$ in the interval $[\delta, T]$. However, note that for a hazard rate with second derivative zero, one does not expect $\hat\lambda_n(x|h_1, h_2)$ to provide any improvement over $\hat\lambda(x|h)$, because the leading $h^2$ bias term of $\hat\lambda(x|h)$ is proportional to that derivative.
2. The plug-in estimator, where the hazard rate is treated as a ratio of two functions and the estimator is obtained by plugging in estimators of the numerator and the denominator. That is,

$$\hat\lambda_a(x|h_1, h_2) = \frac{\tilde f(x|h_1, h_2)}{1 - \hat F(x)} \tag{6}$$

where

$$\tilde f(x|h_1, h_2) = n^{-1}h_2^{-1}\sum_{i=1}^{n}\hat f(X_i|h_1)^{1/2}\,K\!\left\{h_2^{-1}\hat f(X_i|h_1)^{1/2}(x - X_i)\right\},$$

$$\hat f(x|h_1) = n^{-1}h_1^{-1}\sum_{i=1}^{n}K\!\left(\frac{x - X_i}{h_1}\right),$$

$$\hat F(x) = n^{-1}\sum_{i=1}^{n}\mathcal{K}\!\left(\frac{x - X_i}{h}\right),$$

and $\mathcal{K}(u)$ is the cumulative distribution function of the kernel,

$$\mathcal{K}(u) = \int_{-\infty}^{u}K(v)\,dv.$$

This estimate was discussed in Silverman (1986). Further, in Bagkavos (2003) it was proved that this estimate achieves a bias reduction of the same order as estimator $\hat\lambda_n(x|h_1, h_2)$. Therefore, in an asymptotic sense $\hat\lambda_a(x|h_1, h_2)$ is expected to perform as well as $\hat\lambda_n(x|h_1, h_2)$. However, because of its construction, $\hat\lambda_a(x|h_1, h_2)$ uses a bandwidth $h_1$ that is optimal for density function estimation and not for hazard rate estimation. Although the optimal bandwidths for the hazard rate and the density function are of the same order, the constants are different. Thus, one expects $\hat\lambda_n(x|h_1, h_2)$ to perform better in finite sample studies. In order to alleviate this, the adjustment of Hazelton (2007, Sec. 4) is applied when choosing $h_1$ and $h_2$ for $\hat\lambda_a(x|h_1, h_2)$.
3. Finally, for comparison we also use a higher-order kernel estimate given by

$$\breve\lambda(x) = \frac{1}{h}\sum_{i=1}^{n}\frac{K_4\!\left(\frac{x - X_{(i)}}{h}\right)}{n - i + 1}$$

where $K_4(x)$ is a kernel of order 4, i.e., its first three moments are 0 and the fourth moment is different from 0. The kernel is given by

$$K_4(x) = \begin{cases}\dfrac{1}{2\sqrt{2\pi}}\,(3 - x^2)\,e^{-x^2/2}, & -4 < x < 4,\\[4pt] 0, & \text{elsewhere.}\end{cases}$$

Estimator $\breve\lambda(x)$ is motivated by the fact that for sufficiently smooth $\lambda(x)$, the asymptotic bias of estimator $\hat\lambda(x|h)$ is

$$E\hat\lambda(x|h) - \lambda(x) = \sum_{i=1}^{n}\frac{h^{2i}}{(2i)!}\,\lambda^{(2i)}(x)\int u^{2i}K(u)\,du + o(n^{-1}).$$

By choosing $K$ such that $\int u^{2i}K(u)\,du = 0$ for $i = 1, \ldots, k - 1$, bias can be reduced to any desired order. Here, since the kernel $K_4$ is such that its first three moments are 0, $\breve\lambda(x)$ achieves asymptotically the same bias reduction as estimators $\hat\lambda_n(x|h_1, h_2)$ and $\hat\lambda_a(x|h_1, h_2)$.
In fact, $\breve\lambda(x)$ will have the same order of MISE convergence as $\hat\lambda_n(x|h_1, h_2)$; it is because of its undesirable feature of producing hazard rate estimates with possibly negative values that we have proposed the estimator $\hat\lambda_n(x|h_1, h_2)$. Thus, its comparison with $\hat\lambda_n(x|h_1, h_2)$ will be most useful and, in terms of MISE, one would expect both $\breve\lambda(x)$ and $\hat\lambda_n(x|h_1, h_2)$ to have similar performances.
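
The moment conditions on $K_4$ and its negative lobes can be checked numerically. The sketch below integrates $u^rK_4(u)$ over $(-4, 4)$ with a composite Simpson rule; the helper names and the number of subintervals are assumptions of the example. Because the Gaussian-based kernel is truncated at $\pm 4$, the zeroth and second moments are only approximately 1 and 0.

```python
import math


def k4(x):
    """Fourth-order kernel K_4 from the text, truncated to (-4, 4)."""
    if abs(x) >= 4.0:
        return 0.0
    return (3.0 - x * x) * math.exp(-0.5 * x * x) / (2.0 * math.sqrt(2.0 * math.pi))


def simpson(func, a, b, subintervals=2000):
    """Composite Simpson rule on [a, b] with an even number of subintervals."""
    if subintervals % 2:
        raise ValueError("subintervals must be even")
    step = (b - a) / subintervals
    total = func(a) + func(b)
    for j in range(1, subintervals):
        total += (4.0 if j % 2 else 2.0) * func(a + j * step)
    return total * step / 3.0


def k4_moment(r):
    """r-th moment of K_4 over its support."""
    return simpson(lambda u: u ** r * k4(u), -4.0, 4.0)
```

With these definitions `k4_moment(0)` is close to 1, `k4_moment(2)` is close to 0, and `k4_moment(4)` is negative and clearly non-zero, while `k4(2.0)` is negative. The last fact is the source of the possible negative hazard estimates mentioned above.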
To select bandwidths we numerically minimize the Integrated Squared Error (ISE) for each sample, with respect to $h$ when estimating $\hat\lambda$ and $\breve\lambda$, or with respect to $h_1$ and $h_2$ simultaneously when estimating $\hat\lambda_n$. A similar procedure was used by Jones and Signorini (1997). The difference $\{\lambda_p(x) - \lambda(x)\}^2$, where $\lambda_p(x)$ is any of $\hat\lambda$, $\breve\lambda$, $\hat\lambda_n$, and $\lambda(x)$ is the true hazard function, is calculated at 100 equispaced grid points from the start to the end of the region of estimation. The bandwidth(s) are selected by a grid search in the interval $I = [\hat\sigma n^{-1/5}/4,\ 3\hat\sigma n^{-1/5}/2]$, where $\hat\sigma$ is the sample standard deviation. Forty equispaced potential bandwidths are selected from this interval and are used to calculate the differences $\{\lambda_p(x) - \lambda(x)\}^2$ at the grid points $x_1, \ldots, x_{100}$. The outcomes are then numerically integrated using Simpson's extended rule and the bandwidth that corresponds to the minimum difference is selected. The interval is extended if the minimum is at either endpoint. After the best bandwidth(s) are found, a possible improvement would be a quasi-Newton approach. Specifically, for estimator $\hat\lambda_a$, first the procedure described above is applied to $\{\tilde f(x|h_1, h_2) - f(x)\}^2$, with $f$ being the true density. Bandwidth $h$ is selected for $\hat F$ in the same fashion with obvious modifications. In this way, estimates $\tilde f$ and $\hat F$, and hence $\hat\lambda_a$, are formed and then the process is repeated on the difference $\{\hat\lambda_a(x) - \lambda(x)\}^2$ in order to readjust bandwidths $h_1$ and $h_2$ according to Hazelton (2007). While this grid search based method is not available in practice (since it depends on the true curve), it at least provides a lower bound on the error that can be achieved in practice and hence insight into the maximum potential of each estimator.
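
The oracle search described above can be sketched for the fixed-bandwidth estimator (1) as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses an odd number of grid points (101 rather than 100) so that the plain composite Simpson rule applies, it does not extend the search interval when the minimum falls on an endpoint, and all function names are hypothetical.

```python
import math
import statistics


def biweight(z):
    """Biweight kernel: non-negative, second-order, supported on [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - z * z) ** 2 if abs(z) < 1.0 else 0.0


def fixed_bandwidth_hazard(x, ordered, h):
    """Estimator (1); `ordered` must already be sorted."""
    n = len(ordered)
    return sum(biweight((x - x_i) / h) / (n - i + 1)
               for i, x_i in enumerate(ordered, start=1)) / h


def simpson(values, step):
    """Composite Simpson rule for an odd number of equally spaced values."""
    if len(values) < 3 or len(values) % 2 == 0:
        raise ValueError("need an odd number of points (at least 3)")
    odd = sum(values[1:-1:2])    # interior points with weight 4
    even = sum(values[2:-1:2])   # interior points with weight 2
    return step / 3.0 * (values[0] + 4.0 * odd + 2.0 * even + values[-1])


def oracle_bandwidth(sample, true_hazard, lower, upper,
                     n_points=101, n_bandwidths=40):
    """Grid search for the ISE-minimizing h over [lower, upper]."""
    ordered = sorted(sample)
    n = len(ordered)
    sigma = statistics.stdev(ordered)
    h_min = sigma * n ** -0.2 / 4.0        # left end of the interval I
    h_max = 3.0 * sigma * n ** -0.2 / 2.0  # right end of the interval I
    step = (upper - lower) / (n_points - 1)
    grid = [lower + j * step for j in range(n_points)]
    best_h, best_ise = None, math.inf
    for k in range(n_bandwidths):
        h = h_min + k * (h_max - h_min) / (n_bandwidths - 1)
        squared_error = [(fixed_bandwidth_hazard(x, ordered, h) - true_hazard(x)) ** 2
                         for x in grid]
        ise = simpson(squared_error, step)
        if ise < best_ise:
            best_h, best_ise = h, ise
    return best_h, best_ise
```

For the adaptive estimator the same loop would run over pairs $(h_1, h_2)$, which multiplies the cost by the number of candidate values of the second bandwidth.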
Using this bandwidth selection method, we estimate the hazard rate from five distributions with different shapes. Except for the higher-order estimate $\breve\lambda$, which uses kernel $K_4$, in all other cases we use the Biweight kernel. The sample size is $n = 800$ and we estimate the hazard rate in the interval $[\delta, T]$, where $T$ is the 95th percentile of the sample for each distribution and $\delta$ is the biggest of the bandwidths involved in estimation.
In Fig. 1, we estimate the hazard rate of the standard exponential distribution. In general, all estimates are quite close to the true hazard rate function and demonstrate more or less the same behavior. Estimator $\hat\lambda_n(x|h_1, h_2)$ appears to be closer to the true curve overall. In contrast, estimator $\hat\lambda_a(x|h_1, h_2)$ seems to somewhat underestimate the true hazard function. However, we feel that it is acceptable as an estimate of the true curve.
Next, in Fig. 2 we estimate the hazard rate of the $\chi^2_{12}$ distribution. We see that the estimators perform similarly from the start until about $x = 12$. From that point on, estimator $\hat\lambda(x|h)$ becomes increasingly variable. This is also true for estimator $\breve\lambda(x)$. Estimator $\hat\lambda_a(x|h_1, h_2)$ seems to underestimate the true curve from $x = 12$ onwards, while $\hat\lambda_n(x|h_1, h_2)$ is more stable and nearer to the true curve than any other estimate.
Figure 1. Comparison of $\hat\lambda_n(x|h_1, h_2)$ (variable bandwidth, dashed-dotted line), $\hat\lambda_a(x|h_1, h_2)$ (plug-in estimate, dotted line), $\breve\lambda(x)$ (higher-order, short dashed-dashed line), and $\hat\lambda(x|h)$ (ordinary estimate, short dashed line) estimating the hazard rate from the exponential distribution using a sample of 800 observations. The line parallel to the horizontal axis is the real curve.

Figure 2. [...] and $\hat\lambda(x|h)$ (ordinary estimate, short dashed line) estimating the hazard rate from the $\chi^2$ distribution with 12 degrees of freedom, using a sample of 800 observations. The solid line is the real curve.
In Fig. 3 we estimate the hazard rate of the Weibull(0.6, 1) distribution. Estimator $\hat\lambda(x|h)$ is somewhat more variable than the rest from about $x = 1$ onwards. Estimator $\hat\lambda_a(x|h_1, h_2)$ seems to underestimate the true curve from about $x = 0.4$ up to $x = 1.4$. Estimators $\hat\lambda_n(x|h_1, h_2)$ and, to a lesser extent, $\breve\lambda(x)$ are more stable and nearer to the true curve than any other estimate.
In Fig. 4, we estimate the hazard rate of the Lognormal distribution. Estimator $\hat\lambda_a(x|h_1, h_2)$ constantly underestimates the true hazard from $x = 0.6$ onwards. Overall, although $\hat\lambda_n(x|h_1, h_2)$ is not as good as the other estimates on 0–0.5, after that point it performs visibly better, with estimator $\breve\lambda(x)$ being quite close in performance.
In Fig. 5, we estimate the hazard rate of the $\frac{2}{3}N(3, 1) + \frac{1}{3}N(2, 0.2^2)$ distribution. From approximately $x = 1.5$ up to $x = 2.5$, it is clear that $\hat\lambda_n(x|h_1, h_2)$ adapts better to the curvature, with estimator $\hat\lambda_a(x|h_1, h_2)$ being second in performance. Moreover, from $x = 1$ to $x = 1.8$ [...] is marginally closer to the true curve.
We further examine the performance of the proposed variable bandwidth estimate by comparing the approximate MISE of estimators $\hat\lambda_n(x|h_1, h_2)$, $\hat\lambda_a(x|h_1, h_2)$, $\hat\lambda(x|h)$, and $\breve\lambda(x)$. We consider for this comparison the same five distributions and three different sample sizes, $n = 100$, 400, and 1,000. For each distribution and sample size we compute the Integrated Squared Error of each estimate, as described previously, for 100 different samples. In every iteration the sample in use is common to all four estimates. The average ISE across the 100 iterations is reported in the tables for each estimate. By concentrating on the interior to avoid the effect of boundary problems on our comparison, we have computed the MISE's of all estimators using as the interval $[\delta, T]$ the range between the 5th and the 95th percentiles.

Figure 3. [...] $\hat\lambda_a(x|h_1, h_2)$ (plug-in estimate, dotted line), [...] Weibull(0.6, 1) distribution, using a sample of 800 observations. The solid line is the real curve.

Figure 4. [...] $\hat\lambda_a(x|h_1, h_2)$ (plug-in estimate, dotted line), [...] Lognormal distribution, using a sample of 800 observations. The solid line is the real curve.

Figure 5. [...] $\frac{2}{3}N(3, 1) + \frac{1}{3}N(2, 0.2^2)$ distribution, using a sample of 800 observations. The solid line is the real curve.
The MISE comparisons of the different estimators are given in Tables 1–3. As expected, in terms of MISE $\hat\lambda_n(x|h_1, h_2)$ performs as well as $\breve\lambda(x)$ (and, importantly, would not produce negative values for the hazard rate estimates, which $\breve\lambda(x)$ would).
In Table 1, we report the approximate MISE's when estimating the hazard rate from the Exp(1) and $\chi^2_{12}$ distributions. In the exponential case, all estimates are quite close, with estimator $\hat\lambda_a(x|h_1, h_2)$ doing a little better for $n = 100$ while $\breve\lambda(x)$ is doing better for $n = 400$, 1,000. However, it should be noted that since it is a constant hazard rate, in reality one does not need to use any other estimator than $\hat\lambda(x|h)$. In the $\chi^2_{12}$ example, $\hat\lambda_n(x|h_1, h_2)$ is doing better for all sample sizes.
In Table 2, we report the approximate MISE's when estimating the hazard rate from the Weibull(0.6, 1) and LN(0, 1) distributions. In the Weibull case, estimator $\hat\lambda_n(x|h_1, h_2)$ has the lowest MISE figures although estimator $\breve\lambda(x)$ is quite close. In the Lognormal example, $\breve\lambda(x)$ is doing better for all sample sizes but $\hat\lambda_n(x|h_1, h_2)$ is not very far behind.
Table 1
Approximate MISE of $\hat\lambda_n(x|h_1, h_2)$, $\hat\lambda_a(x|h_1, h_2)$, $\hat\lambda(x|h)$, and $\breve\lambda(x)$ for samples from Exp(1) and $\chi^2_{12}$ of size 100, 400, and 1,000

Distribution     Sample size   MISE $\hat\lambda_n$   MISE $\hat\lambda_a$   MISE $\hat\lambda$   MISE $\breve\lambda$
Exp(1)           100           0.7471                 0.7239                 0.7338               0.7976
Exp(1)           400           0.4494                 0.4467                 0.4344               0.4967
Exp(1)           1000          0.1245                 0.1182                 0.1234               0.0973
$\chi^2_{12}$    100           0.04313                0.07633                0.201                0.0516
$\chi^2_{12}$    400           0.0199                 0.02891                0.1021               0.02877
$\chi^2_{12}$    1000          0.01024                0.0238                 0.0464               0.01278
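Table 2
Approximate MISE of $\hat\lambda_n(x|h_1, h_2)$, $\hat\lambda_a(x|h_1, h_2)$, $\hat\lambda(x|h)$, and $\breve\lambda(x)$ for samples from Weibull(0.6, 1) and LN(0, 1) of size 100, 400, and 1,000

Distribution       Sample size   MISE $\hat\lambda_n$   MISE $\hat\lambda_a$   MISE $\hat\lambda$   MISE $\breve\lambda$
Weibull(0.6, 1)    100           1.234                  1.373                  1.401                1.312
Weibull(0.6, 1)    400           0.812                  0.957                  1.231                0.876
Weibull(0.6, 1)    1000          0.612                  0.731                  0.865                0.634
LN(0, 1)           100           0.8791                 0.1103                 0.1049               0.8702
LN(0, 1)           400           0.0346                 0.0567                 0.0442               0.0312
LN(0, 1)           1000          0.0231                 0.0292                 0.0312               0.0197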
In Table 3, we report the approximate MISE's when estimating the hazard rate from the normal mixture $\frac{2}{3}N(3, 1) + \frac{1}{3}N(2, 0.2^2)$. It is seen that estimator $\hat\lambda_n(x|h_1, h_2)$ is doing better than the other estimates across all sample sizes considered, with estimator $\hat\lambda_a(x|h_1, h_2)$ being a close second.
It is interesting to note that the above simulation results are in tune with similar simulations carried out by Jones and Signorini (1997) in the density estimation context. Further, although in a slightly different framework, Nielsen (2003) through simulations provides insight into the performance of $\hat\lambda_n(x|h_1, h_2)$ using as benchmark the Mean Weighted Integrated Squared Error (MWISE) instead of the MISE we considered here, with weighting comprised of the observations remaining at risk.
In Fig. 6, we use the suicide data of Silverman (1986) to see the performance of the estimates on a real life application. The data are lengths of treatment spells (in days) of control patients in a suicide study. The solid line is estimator $\hat\lambda_n(x|h_1, h_2)$ and the dashed line is $\hat\lambda_a(x|h_1, h_2)$. Bandwidths for the pilot estimates were chosen by least squares cross-validation and bandwidths for the kernels of the adaptive estimates were chosen subjectively. This gave $h_2 = 23$ for $\hat\lambda_n(x|h_1, h_2)$ and $h_2 = 45$ for $\hat\lambda_a(x|h_1, h_2)$. In both cases we used the biweight kernel. We see that in general both estimates perform similarly. Both curves suggest a fall in hazard from approximately $x = 100$ to $x = 500$ and a subsequent increase from $x = 500$ onwards. We have to note here the unreliability of such estimates for large values of $x$ because of the denominator in their variance formulae. That is, the increase for $x > 500$ could be just a random effect. However, the two estimates have some differences, such as at about $x = 100$, where $\hat\lambda_a$ suggests a bimodal structure in contrast with $\hat\lambda_n$, which shows a unimodal curve. Moreover, at about $x = 500$, $\hat\lambda_n$ suggests a bigger drop in hazard than $\hat\lambda_a$.
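Table 3
Approximate MISE of $\hat\lambda_n(x|h_1, h_2)$, $\hat\lambda_a(x|h_1, h_2)$, $\hat\lambda(x|h)$, and $\breve\lambda(x)$ for samples from $\frac{2}{3}N(3, 1) + \frac{1}{3}N(2, 0.2^2)$ of size 100, 400, and 1,000

Sample size   MISE $\hat\lambda_n(x|h_1, h_2)$   MISE $\hat\lambda_a(x|h_1, h_2)$   MISE $\hat\lambda(x|h)$   MISE $\breve\lambda(x)$
100           0.5567                             0.5793                             0.743                     0.7892
400           0.143                              0.1491                             0.293                     0.3012
1000          0.07812                            0.8023                             0.1167                    0.13321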
Figure 6. Comparison of $\hat\lambda_n(x|h_1, h_2)$ (solid line) with estimator $\hat\lambda_a(x|h_1, h_2)$ (dashed line).
5. Conclusions
We showed that in the case of kernel-based nonparametric hazard rate estimation, a smoothing parameter inversely proportional to the square root of the true hazard rate leads to a mean square error rate of order $n^{-8/9}$. We discussed the practical implementation of such an estimate and applied it to distributional and real life data.
However, the results here leave open two important issues. First, an extension of
this work in the case of randomly right-censored data would be particularly useful
as this is frequently the case in lifetime studies. Second, an automatic bandwidth
selection rule for the adaptive estimate would be particularly interesting since this
would make the method useful to practitioners.
Appendix: Proofs
Here, we give an outline of the proofs of Theorems 2.1, 3.1, and 3.2. For the full proofs see Bagkavos (2003). A version of the present work with full proofs can be found on the statlib website: http://lib.stat.cmu.edu. On the “Main menu”, click the “Get software” link and then select “XLispStat Archive”. Browse to find the “variablebandwidth” link and click it to download the compressed file “variablebandwidth.zip”.
A.1 Proof of Theorem 2.1

From Watson and Leadbetter (1964) we have

ˆ h = uKh u − x1 − F n udu
x
Then,
1 x−Xi 1

n
Xi 2 K Xi 2
˜ n x h = h −1 h
n−i+1
i=1

x−u
= h−1 1 − F n uu1/2 K u1/2 udu
h

3 x−u 1
= h−1 u 2 K u 2 du + on−1
h
because Fx < 1 and for large n, Fxn = on−1 . Now, applying the
transformation x − u = hz yields
1

3
3 x − hz 2 1 x − hz 2
˜
n x h = x 2 K zx 2 dz + on−1
3 1
x 2 x 2
Set
1
x − z 2 1 h
uz = zx 2 = y and = (7)
x1/2
1
x 2
Then

˜ n x h = x u3 yKyuydy + on−1 (8)
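In order to analyze this expression for the mean further, we expand $u(\varepsilon y)$ in a Taylor series
around 0 and $K(y\,u(\varepsilon y))$ in a Taylor series around $y$, substitute back in (8),
collect the terms in $\varepsilon$, $\varepsilon^2$, $\varepsilon^3$, and $\varepsilon^4$, and integrate. Using A.1,
A.2, and A.3 and repeated integration by parts, the integrals of the coefficients of $\varepsilon$,
$\varepsilon^2$, and $\varepsilon^3$ are 0. Integrating the coefficient of $\varepsilon^4$ by parts, we finally get that
\[
E\{\tilde\lambda_n(x|h)\} = \lambda(x) + \lambda(x)\,\varepsilon^4\Big\{ -\frac{1}{12}u''''(0) + u'(0)u'''(0) + 5u'(0)^4
+ \frac{3}{4}u''(0)^2 - 6u'(0)^2u''(0) \Big\}\int y^4K(y)\,dy + o(\varepsilon^4).
\]
After some algebra, and using again $\varepsilon = h/\lambda^{1/2}(x)$, the bias expression follows.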
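The "some algebra" in the last step can be made explicit. The following worked identity is a reader's check rather than part of the original argument. It assumes $u(z) = \lambda^{1/2}(x-z)/\lambda^{1/2}(x)$, so that $u(0) = 1$ and $u^{-2}(z) = \lambda(x)/\lambda(x-z)$, and it assumes that the $\varepsilon^4$ coefficient is the bracket of derivatives of $u$ at 0 shown below.

```latex
% Differentiating w(z) = u(z)^{-2} four times and setting u(0) = 1 gives
%   w''''(0) = 120 u'^4 - 144 u'^2 u'' + 18 u''^2 + 24 u' u''' - 2 u'''' ,
% hence
\[
-\tfrac{1}{12}u''''(0) + u'(0)u'''(0) + 5u'(0)^4 + \tfrac{3}{4}u''(0)^2 - 6u'(0)^2u''(0)
  = \frac{1}{24}\,\frac{d^4}{dz^4}\,u^{-2}(z)\Big|_{z=0}
  = \frac{\lambda(x)}{24}\Big(\frac{1}{\lambda}\Big)^{(4)}(x).
\]
% With \varepsilon^4 = h^4 / \lambda^2(x), the leading bias term becomes
\[
\lambda(x)\,\varepsilon^4\cdot\frac{\lambda(x)}{24}\Big(\frac{1}{\lambda}\Big)^{(4)}(x)\int y^4K(y)\,dy
  = \frac{h^4}{24}\Big(\frac{1}{\lambda}\Big)^{(4)}(x)\int y^4K(y)\,dy .
\]
```

This is the hazard-rate analogue of the $O(h^4)$ bias obtained for the square-root-law density estimator in Hall and Marron (1988).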
As for the variance, set
\[
I_n(F) = \sum_{i=1}^n \frac{1}{n-i+1}\binom{n}{i-1} F^{i-1}(u)\{1-F(u)\}^{n-i+1}
\]
and note that for $0 < F < 1$
\[
nI_n(F) \to \frac{1}{1-F} \quad\text{as } n \to +\infty;
\]
see, for example, Lemma 6 in Watson and Leadbetter (1964). Using this we find

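\[
E\{\tilde\lambda_n(x|h)^2\} = \frac{1}{nh^2}\int \frac{\lambda^2(u)}{1-F(u)}\,K^2\Big(\frac{x-u}{h}\lambda^{1/2}(u)\Big)\,du + o(1).
\]
Set $Q(u) = \lambda^2(u)/\{1-F(u)\}$ and $x - u = hz$. Then,
\[
E\{\tilde\lambda_n(x|h)^2\} = \frac{1}{nh}\,Q(x)\int \frac{Q(x-hz)}{Q(x)}\,
K^2\Big(z\lambda^{1/2}(x)\,\frac{\lambda^{1/2}(x-hz)}{\lambda^{1/2}(x)}\Big)\,dz + o(1). \tag{9}
\]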
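Now, set
\[
\psi(z) = \frac{\lambda^2(x-z)}{1-F(x-z)}\Big\{\frac{\lambda^2(x)}{1-F(x)}\Big\}^{-1}
\]
and use (7) to get
\[
\psi(\varepsilon y) = \frac{\lambda^2(x-\varepsilon y)}{1-F(x-\varepsilon y)}\Big\{\frac{\lambda^2(x)}{1-F(x)}\Big\}^{-1}. \tag{10}
\]
Then,
\[
E\{\tilde\lambda_n(x|h)^2\} = \frac{1}{nh}\,\frac{\lambda^{3/2}(x)}{1-F(x)}\int \psi(\varepsilon y)\,K^2\big(y\,u(\varepsilon y)\big)\,dy + o\{(nh)^{-1}\}. \tag{11}
\]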
Expand $\psi(\varepsilon y)$ and $K(y\,u(\varepsilon y))$ in Taylor series around 0 and substitute in (11). After
some algebra, applying the definition of variance and taking into account the fact
that the square of the expectation of $\tilde\lambda_n(x|h)$ is of order $o(n^{-2})$, the second equation
of Theorem 2.1 follows.
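The limit $nI_n(F) \to 1/(1-F)$ used in the variance calculation can be checked numerically. The sketch below evaluates the binomial terms on the log scale to avoid overflow for large $n$; the function name and the chosen values of $F$ and $n$ are illustrative assumptions.

```python
import math


def n_times_I_n(F, n):
    """n * I_n(F), where I_n(F) = sum_{i=1}^n C(n, i-1) F^(i-1) (1-F)^(n-i+1) / (n-i+1)."""
    total = 0.0
    for i in range(1, n + 1):
        j = i - 1  # binomial index; note n - i + 1 == n - j
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                    + j * math.log(F) + (n - j) * math.log(1.0 - F))
        total += math.exp(log_term) / (n - j)
    return n * total


for n in (10, 100, 1000):
    print(n, n_times_I_n(0.5, n), 1.0 / (1.0 - 0.5))
```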
A.2 Proof of Theorem 3.1
Utilizing the fast convergence of the empirical cdf $F_n(x)$ to the true cdf $F(x)$, it is
easily seen that (5) is equivalent to
\[
\bar\lambda_n(x|\hat h_1,\hat h_2) = \bar\lambda_n(x|\hat h_2) + T(x|\hat h_1,\hat h_2) + o_p(n^{-4/9}) \tag{12}
\]
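where
\[
\bar\lambda_n(x|\hat h_1,\hat h_2) = \frac{1}{n\hat h_2}\sum_{i=1}^n
\frac{\hat\lambda(X_i|\hat h_1)^{1/2}\,K\big(\frac{x-X_i}{\hat h_2}\hat\lambda(X_i|\hat h_1)^{1/2}\big)}{1-F(X_i)},
\]
\[
\bar\lambda_n(x|\hat h_2) = \frac{1}{n\hat h_2}\sum_{i=1}^n
\frac{\lambda(X_i)^{1/2}\,K\big(\frac{x-X_i}{\hat h_2}\lambda(X_i)^{1/2}\big)}{1-F(X_i)}.
\]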
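Define $\delta(x)$ from the relation
\[
\hat\lambda(x|\hat h_1)^{1/2} = \lambda(x)^{1/2}\{1+\delta(x)\}. \tag{13}
\]
Also, set $b(x|h) = E\hat\lambda(x|h) - \lambda(x)$ and $D(x|h) = \hat\lambda(x|h) - E\hat\lambda(x|h)$. Working as in
Hall and Marron (1988) we can write
\[
\bar\lambda_n(x|\hat h_1,\hat h_2) = \bar\lambda_n(x|\hat h_2) + \tfrac{1}{2}\Delta_1(x|\hat h_1,\hat h_2)
+ \tfrac{1}{2}\Delta_2(x|\hat h_1,\hat h_2) + \Delta_3(x|\hat h_1,\hat h_2) \tag{14}
\]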
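where we define the $\Delta_i$, $i = 1, 2, 3$, to be
\[
\Delta_1(x|\hat h_1,\hat h_2) = \frac{1}{n\hat h_2}\sum_{i=1}^n
\frac{\lambda(X_i)^{-1/2}\,D(X_i|\hat h_1)\,L\big(\frac{x-X_i}{\hat h_2}\lambda(X_i)^{1/2}\big)}{1-F(X_i)},
\]
\[
\Delta_2(x|\hat h_1,\hat h_2) = \frac{1}{n\hat h_2}\sum_{i=1}^n
\frac{\lambda(X_i)^{-1/2}\,b(X_i|\hat h_1)\,L\big(\frac{x-X_i}{\hat h_2}\lambda(X_i)^{1/2}\big)}{1-F(X_i)},
\]
\[
|\Delta_3(x|\hat h_1,\hat h_2)| \le C_1\Delta_4(x|\hat h_1,\hat h_2)
\equiv \frac{C_1}{n\hat h_2}\,\sup_{x\in T}\big\{D(x|\hat h_1)^2 + b(x|\hat h_1)^2\big\}\sum_{i=1}^n I\big(|x-X_i| \le C_2\hat h_2\big).
\]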
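The theorem will be proved if we show that
\[
\sup_{x\in T,\,h_1\in\mathcal{H}_1,\,h_2\in\mathcal{H}_2} P\big\{|\Delta_i(x|h_1,h_2)| > \varepsilon n^{-4/9}\big\} = O(n^{-r}), \qquad \varepsilon, r > 0, \tag{15}
\]
for $i = 2, 4$ and
\[
\sup_{x\in T,\,h_1\in\mathcal{H}_1,\,h_2\in\mathcal{H}_2} P\big\{|\Delta_1(x|h_1,h_2) - 2T(x|h_1,h_2)| > \varepsilon n^{-4/9}\big\} = O(n^{-r}) \tag{16}
\]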
1 1 1 1
where for > > 0, 1 = h n−a− 5 ≤ h ≤ na− 5 and 2 = h n− 9 ≤ h ≤ n 9 .
Starting with $\Delta_4$, we see that we need to bound the functions $D$, $b$, and the sum.
We start with function D. Suppose that is a finite set, subset of T such that
= x ∈ T for some y ∈ T x − y <

> 0
An upper bound for E Dx h1 l for some appropriate constant C1 will be
2l 2l
1 1
Dx h1 ≤ C1
l
≤ C1 4
nh1 na+ 5
uniformly
in x ∈ T. Using an argument similar to that of Stone (1984), the fact
that i zi ≤ # sup zi , together with Corollary 2.2 from Hall and Heyde (1980,
p. 19) we have

− 49 4 1
P sup Dx h1 > n
2
≤ # −1 n 9 2 sup Dx h1 l
x∈ x∈
Using Hölder continuity of the kernel and choosing $l$ large we get
\[
P\Big\{\sup_{x\in T} D(x|h_1)^2 > \varepsilon n^{-4/9}\Big\} = O(n^{-r}), \qquad r > 0, \tag{17}
\]
uniformly in $h_1 \in \mathcal{H}_1$. The same result can be established for the sum by using
exponential bounds and for $b(x|h_1)$ by using the expression for the mean of $\hat\lambda$. This
proves (15) for $i = 4$. Now, calculations similar to those in the proof of Theorem 2.1
give
1

1 x h1 h2 = x − h2 z 2 bx − h2 z h1 L zx − h2 z1/2 dz = Oh21 h22
uniformly in x ∈ T. Working as in

4 and assuming l sufficiently large,
l
4l 1
P
2 x −
2 x > n−4/9 ≤ Cl−1 n 9 −r
l
1 h2 = On
h21 h2 2 + h2l l
h2
which proves (15) for $i = 2$. For $\Delta_1$, first define
K v−u 1
1
v− 2 1−Fv
h1
− h1 2 v h1 L x−v
h
v 2
mu v =
2
(18)
1 − Fv
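\[
m_1(u) = E\{m(u, X_j)\} \tag{19}
\]
and $M(u,v) = m(u,v) - m_1(u)$ for any $j = 1, 2, \ldots, n$. Therefore,
\[
\Delta_1(x|h_1,h_2) = \frac{1}{n^2h_1h_2}\Big\{\sum_{i\ne j} M(X_i,X_j) + \sum_{i=1}^n M(X_i,X_i)\Big\} + 2T(x|h_1,h_2). \tag{20}
\]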
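Hence, it remains to show that for $\varepsilon, r > 0$ we have
\[
\sup_{x\in T,\,h_1\in\mathcal{H}_1,\,h_2\in\mathcal{H}_2} P\Big\{\Big|\frac{1}{n^2h_1h_2}\sum_{i=1}^n M(X_i,X_i)\Big| > \varepsilon n^{-4/5}\Big\} = O(n^{-r}), \tag{21}
\]
\[
\sup_{x\in T,\,h_1\in\mathcal{H}_1,\,h_2\in\mathcal{H}_2} P\Big\{\Big|\frac{1}{n^2h_1h_2}\sum_{i\ne j} M(X_i,X_j)\Big| > \varepsilon n^{-4/5}\Big\} = O(n^{-r}). \tag{22}
\]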
To prove (21) first note that

n
n
nK 2 0 2 x−z 1
M 2 Xi Xi ≤ m2 Xi Xi ≤ L z 2 dz ≤ C6 nh2
i=1 i=1
1 − Fx3 h2
with C6 being a positive generic constant. Also,

z− 2
l
l x−z
n n
1
MXi Xi ≤ l
mXi Xi = n l l
K 0L z fzdz
2
i=1 i=1
1 − Fz2l h2
and thus,

n
MXi Xi l ≤ nC7 h2
i=1
where C7 is positive generic constant. Apply inequality (21.5) of Burkholder (1973)

with x = xl ,

n
f ∗ = sup MXi Xi di = MXi Xi − MXi Xi
i=1n i=1
and
21

n
Sf = di 2
i=1
gives
n l n 2l

n
2
MXi Xi ≤ MXi Xi + MXi Xi l
i=1 i=1 i=1
−l l
≤ n h1 h2 C6 nh2 + C7 nh2
2 2
Hence,
n
1
− 45
P MXi Xi > n
n2 h1 h2 i=1
l
1
≤ C7 l−l 2
n h1 h2
n 2l

n

× MXi Xi − MXi Xi + 2
MXi Xi − MXi Xi l
i=1 i=1
l
1
≤ C8 l−l nh2 l/2 + nh2
n 2 h1 h2
Choosing l sufficiently large completes the proof of (21). The proof of (22) is similar
though it involves a longer calculation.
A.3 Proof of Theorem 3.2
To avoid long technical details, we consider nonrandom bandwidths to
outline the proof and restrict ourselves to determining the asymptotic variances
and covariances. The random-bandwidth case can be handled by extending the
arguments in Abramson (1982b) to the hazard case.
Asymptotic normality of $S(x|h_2)$ can be established by the Hajek projection
method; for details see Bagkavos (2003). For the asymptotic normality of
$T(x|h_1,h_2)$, first note that this term is a sum of i.i.d. random variables and thus
Lindeberg’s theorem can be used. Alternatively, one can again use the Hajek
projection method. This leads to relatively easy technical arguments for employing
the Cramér–Wold device in order to establish joint asymptotic normality of $S(x|h_2)$
and $T(x|h_1,h_2)$.
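Recall the definitions of $m(u,v)$ and $m_1(u)$ from (18) and (19),
\[
m_1(u) = m_2(u) - m_3
\]
where
\[
\begin{aligned}
m_2(u) &= \int \frac{\lambda^{1/2}(v)}{1-F(v)}\,K\Big(\frac{u-v}{h_1}\Big)\,L\Big(\frac{x-v}{h_2}\lambda^{1/2}(v)\Big)\,dv \\
&= h_2\int \frac{\lambda^{1/2}(x-h_2z)}{1-F(x-h_2z)}\,K\Big(\frac{u-x+h_2z}{h_1}\Big)\,L\big(z\lambda^{1/2}(x-h_2z)\big)\,dz
\end{aligned}
\]
and
\[
m_3 = h_1h_2\int E\hat\lambda(x-vh_2|h_1)\,\lambda^{1/2}(x-vh_2)\,L\big(v\lambda^{1/2}(x-vh_2)\big)\,dv = O(h_1h_2).
\]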
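Thus, $T(x|h_1,h_2) = \frac{1}{2nh_1h_2}\sum_{i=1}^n m_2(X_i) - \frac{1}{2h_1h_2}m_3$ and its asymptotic variance will
now equal $\frac{1}{4}\frac{1}{n^2h_1^2h_2^2}\sum_{i=1}^n \mathrm{Var}\{m_2(X_i)\}$. It has been proved in Bagkavos (2003) that
\[
E\{m_2(X)\} = h_2h_1\lambda(x)\iint K(u)L(w)\,du\,dw = O(h_1h_2), \tag{23}
\]
which is negligible. Therefore,
\[
\mathrm{Var}\{T(x|h_1,h_2)\} \simeq \frac{1}{4}\,\frac{1}{n^2h_1^2h_2^2}\sum_{i=1}^n E\{m_2^2(X_i)\}. \tag{24}
\]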
Now rewrite $m_2(u)$ by substituting $z = v_1h$, where $h = h_1h_2^{-1}$, as
\[
m_2(u) = h_1\int \frac{\lambda^{1/2}(x-h_1v_1)}{1-F(x-h_1v_1)}\,K\Big(\frac{u-x+h_1v_1}{h_1}\Big)\,L\big(hv_1\lambda^{1/2}(x-h_1v_1)\big)\,dv_1.
\]
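Then
\[
\begin{aligned}
E\{m_2^2(X)\} = h_1^2\iiint &\frac{\lambda^{1/2}(x-h_1v_1)}{1-F(x-h_1v_1)}\,K\Big(\frac{u-x+h_1v_1}{h_1}\Big)\,L\big(hv_1\lambda^{1/2}(x-h_1v_1)\big) \\
&\times \frac{\lambda^{1/2}(x-h_1v_2)}{1-F(x-h_1v_2)}\,K\Big(\frac{u-x+h_1v_2}{h_1}\Big)\,L\big(hv_2\lambda^{1/2}(x-h_1v_2)\big)\,f(u)\,dv_1\,dv_2\,du.
\end{aligned}
\]
By substituting $h_1w = u - x + h_1v_1$ we get
\[
\begin{aligned}
E\{m_2^2(X)\} = h_1^3\iiint &\frac{\lambda^{1/2}(x-h_1v_1)}{1-F(x-h_1v_1)}\,\frac{\lambda^{1/2}(x-h_1v_2)}{1-F(x-h_1v_2)}\,K(w)\,L\big(hv_1\lambda^{1/2}(x-h_1v_1)\big) \\
&\times K(w+v_1-v_2)\,L\big(hv_2\lambda^{1/2}(x-h_1v_2)\big)\,f(x+h_1w-h_1v_1)\,dv_1\,dv_2\,dw.
\end{aligned}
\]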
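Next set $Q(x) = \lambda^{1/2}(x)/\{1-F(x)\}$. Then expand $Q$ and $f$ in Taylor series around $x$, and put
$a = \lambda^{1/2}(x)$ to get
\[
\begin{aligned}
E\{m_2^2(X)\} &\simeq h_1^3Q^2(x)f(x)\iiint K(w)\,L(ahv_1)\,K(w+v_1-v_2)\,L(ahv_2)\,dv_1\,dv_2\,dw \\
&= h_1^3Q^2(x)f(x)\iiint K(w)\,L(ahv_2+ahz)\,K(w+z)\,L(ahv_2)\,dz\,dv_2\,dw.
\end{aligned}
\]
Next substitute $ahv_2 = u$, then use the Taylor series expansion of $L$ around $u$ and
the fact that $\iint K(w)K(w+z)\,dw\,dz = 1$ to conclude that
\[
E\{m_2^2(X)\} \simeq h_1^2h_2\,\frac{\lambda^{3/2}(x)}{1-F(x)}\int L^2(u)\,du. \tag{25}
\]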
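Thus, substituting (25) in (24) we get
\[
\mathrm{Var}\{T(x|h_1,h_2)\} \simeq \frac{1}{4}\,\frac{1}{nh_2}\,\frac{\lambda^{3/2}(x)}{1-F(x)}\int L^2(u)\,du.
\]
Since $E\{S(x|h_2)\} = 0$, the covariance will be calculated from
\[
\begin{aligned}
\mathrm{Cov}\{S(x|h_2), T(x|h_1,h_2)\} &= E\{S(x|h_2)\,T(x|h_1,h_2)\} \\
&= E\{\tilde\lambda_n(x|h_2)\,T(x|h_1,h_2)\} - E\{\tilde\lambda_n(x|h_2)\}\,E\{T(x|h_1,h_2)\}.
\end{aligned} \tag{26}
\]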
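For the second term of the right-hand side of the last equation, considering only
the leading term of the product of the means of $\tilde\lambda_n$ and $m_2$ (respectively, from
Theorem 2.1 and Eq. (23)), we have
\[
E\{\tilde\lambda_n(x|h_2)\}\,E\{m_2(X_i)\} \simeq \lambda^2(x)\,h_1h_2\iint K(u)L(w)\,du\,dw
\]
and thus
\[
E\{\tilde\lambda_n(x|h_2)\}\,E\{T(x|h_1,h_2)\} \simeq \frac{\lambda^2(x)}{2}\iint K(u)L(w)\,du\,dw. \tag{27}
\]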
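To compute $E\{\tilde\lambda_n(x|h_2)\,T(x|h_1,h_2)\}$, separate the diagonal and nondiagonal terms.
For the diagonal term note that
\[
E\Big\{\frac{1}{n}\sum_{i=1}^n m_2(X_i)\,\frac{\lambda^{1/2}(X_i)\,K\big(\frac{x-X_i}{h_2}\lambda^{1/2}(X_i)\big)}{1-F_n(X_i)}\Big\}
= E\Big\{\sum_{i=1}^n m_2(X_{(i)})\,\frac{\lambda^{1/2}(X_{(i)})\,K\big(\frac{x-X_{(i)}}{h_2}\lambda^{1/2}(X_{(i)})\big)}{n-i+1}\Big\}.
\]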
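Let $f_{X_{(i)}}(u)$ denote the density of the $i$th order statistic. Then, with $Q(v) = \lambda^{1/2}(v)/\{1-F(v)\}$,
\[
\begin{aligned}
E\Big\{\sum_{i=1}^n m_2(X_{(i)})&\,\frac{\lambda^{1/2}(X_{(i)})\,K\big(\frac{x-X_{(i)}}{h_2}\lambda^{1/2}(X_{(i)})\big)}{n-i+1}\Big\} \\
&= E\Big\{\sum_{i=1}^n \int \frac{\lambda^{1/2}(v)}{1-F(v)}\,K\Big(\frac{X_{(i)}-v}{h_1}\Big)\,L\Big(\frac{x-v}{h_2}\lambda^{1/2}(v)\Big)\,dv
\times \frac{\lambda^{1/2}(X_{(i)})\,K\big(\frac{x-X_{(i)}}{h_2}\lambda^{1/2}(X_{(i)})\big)}{n-i+1}\Big\} \\
&= \sum_{i=1}^n \iint Q(v)\,K\Big(\frac{u-v}{h_1}\Big)\,L\Big(\frac{x-v}{h_2}\lambda^{1/2}(v)\Big)
\times \frac{\lambda^{1/2}(u)\,K\big(\frac{x-u}{h_2}\lambda^{1/2}(u)\big)}{n-i+1}\,f_{X_{(i)}}(u)\,du\,dv \\
&= \iint Q(v)\,K\Big(\frac{u-v}{h_1}\Big)\,L\Big(\frac{x-v}{h_2}\lambda^{1/2}(v)\Big)\,\lambda^{1/2}(u)\,K\Big(\frac{x-u}{h_2}\lambda^{1/2}(u)\Big)
\times \lambda(u)\{1-F^n(u)\}\,du\,dv \\
&\simeq \iint Q(v)\,K\Big(\frac{u-v}{h_1}\Big)\,L\Big(\frac{x-v}{h_2}\lambda^{1/2}(v)\Big)\,\lambda^{3/2}(u)\,K\Big(\frac{x-u}{h_2}\lambda^{1/2}(u)\Big)\,du\,dv = I.
\end{aligned}
\]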
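Now set $v = x - h_1y$, $u = x - h_2w$ and $hh_2 = h_1$. Then,
\[
I = h_1h_2\iint Q(x-h_1y)\,K\Big(\frac{hy-w}{h}\Big)\,L\big(hy\lambda^{1/2}(x-h_1y)\big)
\times \lambda^{3/2}(x-h_2w)\,K\big(w\lambda^{1/2}(x-h_2w)\big)\,dy\,dw.
\]
Next, use the substitution $hy - w = hu$ and then, expanding $Q$ and $\lambda$ in Taylor series
around $x$, note that
\[
I \simeq h_1h_2\,Q(x)\lambda^{3/2}(x)\iint K(u)\,L(aw)\,K(aw)\,du\,dw
\]
where $a$ is as defined before. Finally, set $aw = v$ to conclude that
\[
I \simeq h_1h_2\,\frac{\lambda^{3/2}(x)}{1-F(x)}\int K(v)L(v)\,dv. \tag{28}
\]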
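For the nondiagonal term, with similar arguments and algebra, but using the joint
density of $(X_i, X_j)$, one can derive that
\[
E\Big\{\frac{1}{2n^2h_1h_2^2}\sum_{i\ne j} m_2(X_j)\,\frac{\lambda^{1/2}(X_i)\,K\big(\frac{x-X_i}{h_2}\lambda^{1/2}(X_i)\big)}{1-F_n(X_i)}\Big\}
\simeq \frac{\lambda^2(x)}{2}\iint K(u)L(w)\,du\,dw. \tag{29}
\]
The result follows by combining (26)–(29).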
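The kernel constants $\int L^2(u)\,du$ and $\int K(v)L(v)\,dv$ that appear in the asymptotic variance and in (28) are easy to evaluate for a given kernel. The sketch below assumes $L(x) = K(x) + xK'(x)$, as in Hall and Marron (1988), and uses the Gaussian kernel purely as an illustrative choice; the integration routine and its limits are likewise assumptions made for the example.

```python
import math


def K(u):
    """Gaussian kernel (an assumed, illustrative choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def K_prime(u):
    """Derivative of the Gaussian kernel."""
    return -u * K(u)


def L(u):
    """L(u) = K(u) + u K'(u); for the Gaussian kernel this is (1 - u^2) K(u)."""
    return K(u) + u * K_prime(u)


def integrate(g, lo=-10.0, hi=10.0, steps=20000):
    """Composite trapezoidal rule on [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.5 * (g(lo) + g(hi))
    for k in range(1, steps):
        total += g(lo + k * width)
    return total * width


int_L = integrate(L)                          # L integrates to zero
int_L2 = integrate(lambda u: L(u) ** 2)       # constant in the variance of T
int_KL = integrate(lambda u: K(u) * L(u))     # constant in (28)
print(int_L, int_L2, int_KL)
```

For the Gaussian kernel the closed forms are $\int L^2 = 3/(8\sqrt{\pi})$ and $\int KL = 1/(4\sqrt{\pi})$, which the numerical values can be compared against.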
Acknowledgment
The authors would like to thank the referees for helpful suggestions and comments
that led to improvements of this research.
References
Abramson, I. (1982a). On bandwidth variation in kernel estimates – a square root law.
Ann. Statist. 10(4):1217–1223.
Abramson, I. (1982b). Arbitrariness of the pilot estimator in adaptive kernel methods.
J. Multivariate Anal. 12:562–567.
Bagkavos, D. (2003). Bias Reduction in Nonparametric Hazard Rate Estimation. PhD thesis,
The University of Birmingham, UK.
Burkholder, D. L. (1973). Distribution function inequalities for martingales. Ann. Probab.
1:19–42.
González-Manteiga, W., Cao, R., Marron, J. S. (1996). Bootstrap selection of the smoothing
parameter in nonparametric hazard rate estimation. J. Amer. Statist. Assoc. 91:
1130–1140.
Hajek, J. (1968). Asymptotic normality of simple linear statistics under alternatives.
Ann. Mathemat. Statist. 39:325–346.
Hall, P. (1990). On the bias of variable bandwidth curve estimators. Biometrika 77:529–535.
Hall, P., Heyde, C. (1980). Martingale Limit Theory and Its Applications. New York: Academic
Press.
Hall, P., Marron, J. S. (1988). Variable window width kernel estimates of probability
densities. Probab. Theor. Related Fields 80:37–49.
Hall, P., Hu, T. C., Marron, J. S. (1995). Improved variable window width kernel estimates
of probability densities. Ann. Statist. 23:1–10.
Hazelton, M. L. (2007). Bias reduction in kernel binary regression. Computat. Statist. Data
Anal. 51:4393–4402.
Jones, M. C., Signorini, D. F. (1997). A comparison of higher order bias kernel density
estimators. J. Amer. Statist. Assoc. 92(439):1063–1073.
Jones, M. C., Marron, S., Sheather, S. (1996). A brief survey of bandwidth selection for
density estimation. J. Amer. Statist. Assoc. 91(433):401–407.
Lo, Y., Mack, S., Wang, J. (1989). Density and hazard rate estimation for censored data via
strong representation of the Kaplan–Meier estimator. Probab. Theor. Related Fields
80:461–473.
Müller, H.-G., Wang, J. L. (1994). Hazard rate estimation under random censoring with
varying kernels and bandwidths. Biometrics 50:61–76.
Nielsen, J. P. (2003). Variable bandwidth kernel hazard estimators. Nonparametric Statist.
15(3):355–376.
Patil, P. N. (1993). Bandwidth choice for non parametric hazard rate estimation. J. Statist.
Plann. Infer. 35:15–30.
Rice, J., Rosenblatt, M. (1976). Estimation of the log-survivor function and hazard function.
Sankhya, Ser. A 38:60–78.
Ruppert, D., Cline, D. (1994). Bias reduction in kernel density estimation by smoothed
empirical transformations. Ann. Statist. 22(1):185–210.
Sarda, P., Vieu, P. (1991). Smoothing parameter selection in hazard rate estimation. Statist.
Probab. Lett. 11:429–434.
Silverman, B. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman
& Hall.
Simonoff, J. (1996). Smoothing Methods in Statistics. New York: Springer.
Stone, C. J. (1984). An asymptotically optimal window selection rule for kernel density
estimates. Ann. Statist. 12:1285–1297.
Tanner, M. (1983). A note on the variable kernel estimator of the hazard function from
randomly censored data. Ann. Statist. 11:994–998.
Tanner, M., Wong, W. (1983). The estimation of the hazard function from randomly
censored data by the kernel method. Ann. Statist. 11:989–993.
Wand, M. P., Jones, M. C. (1995). Kernel Smoothing. London: Chapman and Hall.
Watson, G., Leadbetter, M. (1964). Hazard analysis I. Biometrika 51:175–184.
