
Chapter 7

Change Point Model for Hazard Function

7.1 Introduction

In the previous chapters, we presented change point analyses for various
models. In this chapter, we introduce another change point problem, often
encountered in reliability analysis: estimating the change point in a failure
rate or hazard function.
Let T1 , T2 , . . . , Tn be a random sample of size n from a lifetime distribution
with probability distribution function F (·) and density function f (·). The
hazard function of F is defined as

r(t) = f (t)/(1 − F (t)) for t ≥ 0.

We are interested in whether the hazard function stays at a constant rate a for
0 ≤ t ≤ τ and then drops to a lower constant rate b for t > τ, where a > b ≥ 0.
Formally, we investigate whether

$$
r(t) = \begin{cases} a, & 0 \le t \le \tau, \\ b, & t > \tau, \end{cases} \tag{7.1}
$$

where τ is called the change point or threshold of the failure rate function
r(t).
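To make model (7.1) concrete, the following minimal Python sketch evaluates the hazard and survival functions and draws lifetimes by inverting the cumulative hazard. The function names and the parameter values (a = 3, b = 1, τ = 0.15) are illustrative assumptions, not part of the original text.

```python
import math
import random

def hazard(t, a, b, tau):
    """Piecewise-constant hazard r(t) of model (7.1)."""
    return a if t <= tau else b

def survival(t, a, b, tau):
    """Survival function 1 - F(t) = exp(-integral of r from 0 to t)."""
    if t <= tau:
        return math.exp(-a * t)
    return math.exp(-a * tau - b * (t - tau))

def sample_lifetime(a, b, tau, rng):
    """Draw one lifetime by inverting the cumulative hazard at an Exp(1) variate."""
    h = rng.expovariate(1.0)          # target value of the cumulative hazard
    if h <= a * tau:
        return h / a                  # failure at or before the change point
    if b == 0:
        return math.inf               # zero hazard after tau: no failure occurs
    return tau + (h - a * tau) / b    # failure after the change point

# Illustrative (assumed) parameter values and a fixed seed for reproducibility.
rng = random.Random(0)
sample = [sample_lifetime(3.0, 1.0, 0.15, rng) for _ in range(100_000)]
```

The sampler relies on the fact that the cumulative hazard evaluated at a lifetime follows a unit exponential distribution, so no rejection step is needed.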
This change point differs from the discrete change points usually discussed in
the change point literature. It is continuous in nature (a point in time), and
its “discrete” realization is the change point location we want to estimate.
For testing whether a new leukemia therapy produces a departure from
a constant relapse rate after induction of remission, Matthews and Farewell
(1982) proposed to study the above model for the relapse rate of leukemia
patients and estimated the threshold τ by a numerical method. Nguyen,
Rogers, and Walker (1984) estimated the parameters in the above model
by analyzing it as a mixture model. Matthews, Farewell, and Pyke (1985)

J. Chen and A.K. Gupta, Parametric Statistical Change Point Analysis: With Applications 189
to Genetics, Medicine, and Finance, DOI 10.1007/978-0-8176-4801-5_7,
© Springer Science+Business Media, LLC 2012

considered the inference of the change point in hazard models using a
score-statistic process. Basu, Ghosh, and Joshi (1988; BGJ) investigated the
estimation of such a change point by semiparametric methods. Ghosh and
Joshi (1992) further studied the asymptotic distribution of one of the
estimators proposed by BGJ. The simulation studies reported by these authors
show that the estimates do not perform as well as expected in all settings.
In a review article, Müller and Wang (1994) pointed out, based on the work of
Matthews and Farewell (1982), Worsley (1988), and Henderson (1990), that the
likelihood of the change point model (7.1) of hazard rate functions is
unbounded unless a > b. Such a hazard function is plotted in Figure 7.1.

Fig. 7.1 A typical hazard rate function with one change point

Furthermore, the likelihood-based estimate of τ does not have a closed form.
The Bayesian approach to estimating the change point τ in the hazard function
(7.1) therefore became quite popular, and much work has been done in the
Bayesian framework. For this reason, we present the Bayesian approach of Ghosh,
Joshi, and Mukhopadhyay (1993) for the estimation of τ in the following section.

7.2 The Bayesian Approach

For simplicity, we denote the order statistics of the random sample
T1, T2, ..., Tn, taken from a lifetime distribution with probability
distribution function F(·) and density function f(·), by t1, t2, ..., tn, with
the understanding that t1 ≤ t2 ≤ ··· ≤ tn.

Note that there are several classical estimators of the location τ of the
change point model (7.1) in the literature.
First, an estimator of τ, denoted τ̂, was obtained in Nguyen et al. (1984)
as the value τ̂ at which the stochastic process Yn(τ) approaches 0,
where for 0 ≤ r ≤ n − 1, Yn(t_r) = Xn(t_r) with

$$
X_n(t) = S(t)\left[(n-r)\log\frac{n}{n-r} - r\right] n^{-1}
         + r\,\frac{M(t)}{n} - Z_n \log\frac{n}{n-r},
$$

$$
M(t) = \sum_{i=r+1}^{n} \frac{t_i}{n-r},
$$

$$
S^2(t) = \sum_{i=r+1}^{n} \frac{t_i^2}{n-r} - M^2(t),
$$

and

$$
Z_n = (t_1 + \cdots + t_n)/n.
$$

For t_r ≤ t < t_{r+1}, Yn(t) is defined by linear interpolation. For t ≥ t_n, Yn(t)
is defined by

$$
Y_n(t) = X_n(t_n) + \frac{[X_n(t_n) - X_n(t_{n-1})](t - t_n)}{t_n - t_{n-1}}.
$$

The estimator τ̂ was shown to be consistent in Nguyen et al. (1984).
However, the justification for this estimator rests on its asymptotic properties.
As noted in Nguyen et al. (1984), for their model, the likelihood is unbounded
as the MLE τ̂ → Tn. For this and other reasons, Ghosh et al. (1993) proposed
a Bayesian approach for the estimation of the change position τ.
Let the sample {T1, T2, ..., Tn} be denoted by D, and let T0 = 0 and Tn+1 = ∞.
Corresponding to the hazard rate change point model (7.1), the pdf of each
random variable Ti is given by

$$
f_{T_i}(t_i; a, b, \tau) =
\begin{cases}
a e^{-a t_i}, & 0 \le t_i \le \tau, \\
b e^{-b t_i - (a-b)\tau}, & t_i > \tau.
\end{cases} \tag{7.2}
$$

Then, the likelihood function of the parameters a, b, τ over the natural
parameter space {(a, b, τ) | 0 < b < a < ∞, 0 < τ < ∞} is given by

$$
L(a, b, \tau \mid D) = \prod_{i=1}^{n} f_{T_i}(t_i; a, b, \tau)
= a^{R(\tau)} e^{-aQ(\tau)}\, b^{\,n-R(\tau)} e^{-b(T-Q(\tau))},
$$



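where

$$
\begin{aligned}
R(\tau) &= \sum_{i=1}^{n} I_{[T_i \le \tau]}, \\
M(\tau) &= \sum_{i=1}^{n} T_i\, I_{[T_i \le \tau]}, \\
Q(\tau) &= M(\tau) + (n - R(\tau))\tau, \\
T &= \sum_{i=1}^{n} T_i.
\end{aligned}
$$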
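The likelihood depends on the data only through R(τ), Q(τ), and T. A minimal Python sketch of this reduction follows; the function names and the six lifetimes are made up for illustration and are not taken from the original text. The second function sums the log density (7.2) term by term, so the two can be compared.

```python
import math

def sufficient_stats(times, tau):
    """Return R(tau), Q(tau), and T for a sample of lifetimes."""
    n = len(times)
    r = sum(1 for t in times if t <= tau)     # R(tau): failures up to tau
    m = sum(t for t in times if t <= tau)     # M(tau): their total lifetime
    q = m + (n - r) * tau                     # Q(tau): total exposure at rate a
    return r, q, sum(times)

def log_likelihood(times, a, b, tau):
    """log L(a, b, tau | D) written in terms of R(tau), Q(tau), and T."""
    n = len(times)
    r, q, total = sufficient_stats(times, tau)
    return r * math.log(a) - a * q + (n - r) * math.log(b) - b * (total - q)

def log_likelihood_direct(times, a, b, tau):
    """The same quantity, summed term by term from the density (7.2)."""
    out = 0.0
    for t in times:
        if t <= tau:
            out += math.log(a) - a * t
        else:
            out += math.log(b) - b * t - (a - b) * tau
    return out

data = [0.05, 0.11, 0.12, 0.40, 0.95, 1.70]   # made-up lifetimes
print(log_likelihood(data, 3.0, 1.0, 0.15))
print(log_likelihood_direct(data, 3.0, 1.0, 0.15))
```

Both calls should print the same value up to floating-point rounding.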

To obtain the Bayesian estimate of the change point location τ, Ghosh
et al. (1993) first placed the following prior distribution, π(a, b, τ), on the
parameters (a, b, τ) over the natural parameter space
{(a, b, τ) | 0 < b < a < ∞, 0 < τ < ∞}:

$$
\pi(a, b, \tau) = \frac{1}{ab}, \quad 0 < b < a < \infty,\ 0 < \tau < \infty.
$$
Then, the joint posterior distribution, π(a, b, τ|D), of (a, b, τ) can be
obtained as

$$
\begin{aligned}
\pi(a, b, \tau \mid D) &\propto L(a, b, \tau \mid D) \cdot \pi(a, b, \tau) \\
&= a^{R(\tau)-1} e^{-aQ(\tau)}\, b^{\,n-R(\tau)-1} e^{-b(T-Q(\tau))}.
\end{aligned} \tag{7.3}
$$

Next, we obtain the joint posterior of (b, τ) by integrating the
right-hand side (R.H.S.) of expression (7.3) with respect to a over its natural
range (b, ∞). Two situations have to be distinguished.
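(I). For 0 < τ < T1.
For this situation, the joint posterior, π(b, τ|D), of (b, τ) was derived in
Ghosh et al. (1993) as

$$
\begin{aligned}
\pi(b, \tau \mid D) &= \int_b^\infty \pi(a, b, \tau \mid D)\,da \\
&\propto \int_b^\infty \prod_{i=1}^{n} b e^{-b t_i - (a-b)\tau} \cdot \frac{1}{ab}\,da \\
&= \int_b^\infty b^n e^{-b\sum_{i=1}^{n} t_i - an\tau + bn\tau}\, a^{-1} b^{-1}\,da \\
&= \int_b^\infty a^{-1} e^{-an\tau}\, b^{n-1} e^{-b(T - n\tau)}\,da \\
&= b^{n-1} e^{-b(T - n\tau)} g(b),
\end{aligned} \tag{7.4}
$$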



where

$$
g(b) = \int_b^\infty a^{-1} e^{-an\tau}\,da.
$$
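The function g(b) can be related to a standard special function, which may help when (7.4) has to be evaluated numerically. The following identity is added here as a side remark and is not taken from Ghosh et al. (1993). Substituting u = nτa gives

$$
g(b) = \int_b^\infty a^{-1} e^{-an\tau}\,da
     = \int_{n\tau b}^{\infty} u^{-1} e^{-u}\,du
     = E_1(n\tau b),
$$

where E_1 denotes the exponential integral. Since E_1(x) grows only like −log x as x → 0+, g(b) is finite for every b > 0 and τ > 0.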

(II). For Ti < τ < Ti+1, i = 1, ..., n.
For this situation, the joint posterior, π(b, τ|D), of (b, τ) was also derived
in Ghosh et al. (1993) as

$$
\begin{aligned}
\pi(b, \tau \mid D) &\propto \int_b^\infty a^{R(\tau)-1} e^{-aQ(\tau)}\, b^{\,n-R(\tau)-1} e^{-b(T-Q(\tau))}\,da \\
&= \frac{(i-1)!}{(Q(\tau))^i} \sum_{j=0}^{i-1} \frac{(Q(\tau))^j}{j!}\, b^{\,n-i+j-1} e^{-Tb}.
\end{aligned} \tag{7.5}
$$

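Note that for Tn ≤ τ < ∞, we have i = n and Q(τ) = T, so the R.H.S. of
expression (7.5) becomes

$$
\text{R.H.S. of (7.5)} = \frac{(n-1)!}{T^n} \sum_{j=0}^{n-1} \frac{T^j}{j!}\, b^{\,j-1} e^{-Tb}. \tag{7.6}
$$

When integrating (7.6) with respect to b, the first term (j = 0) is, up to a
constant factor,

$$
\int_0^\infty b^{-1} e^{-Tb}\,db,
$$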

which diverges. For this reason, a restriction on the parameter b was imposed
by Ghosh et al. (1993). This restriction is that for a positive constant b0 ,
we require 0 < b0 ≤ b. With this restriction and the above-obtained joint
posterior π(b, τ |D) of (b, τ ), the posterior density of τ , π(τ |D), was obtained
in Ghosh et al. (1993) for the following three cases.

Case 1. When 0 < τ < T1.
In this case, using expression (7.4), the posterior density of τ, π(τ|D), was
given as

$$
\pi(\tau \mid D) \propto \int_{b_0}^\infty b^{n-1} e^{-b(T - n\tau)} g(b)\,db. \tag{7.7}
$$

Note that the integral in expression (7.7) does not have a closed form, so
the posterior of τ can only be obtained numerically. To ensure the existence
of π(τ|D) in this case, Ghosh et al. (1993) showed that π(τ|D) is finite and
established some further properties of π(τ|D) in their Propositions 4.1–4.5.
In the course of establishing these propositions, another restriction, namely
τ ≥ c > 0 for a constant c, was imposed on τ, because the model is
nonidentifiable if τ = 0.

Case 2. When Ti < τ < Ti+1, i = 1, ..., n − 1.
In this case, π(τ|D) was obtained by integrating expression (7.5) with
respect to b over the range [b0, ∞):

$$
\begin{aligned}
\pi(\tau \mid D) &\propto \int_{b_0}^\infty \frac{(i-1)!}{(Q(\tau))^i} \sum_{j=0}^{i-1} \frac{(Q(\tau))^j}{j!}\, b^{\,n-i+j-1} e^{-Tb}\,db \\
&= \frac{(i-1)!}{(Q(\tau))^i}\, e^{-Tb_0} \sum_{j=0}^{i-1} \left( \sum_{k=0}^{n-i+j-1} \frac{(Tb_0)^k}{k!} \right) \frac{(n-i+j-1)!\,(Q(\tau))^j}{j!\, T^{\,n-i+j}}.
\end{aligned} \tag{7.8}
$$
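Expression (7.8) is a finite double sum and can be evaluated directly. The sketch below, with hypothetical function names and made-up data, codes the integrand (7.5) and the closed form (7.8) side by side so that the latter can be checked against a numerical integral of the former over [b0, ∞). It returns the unnormalized density only.

```python
import math

def sufficient_stats(times, tau):
    """Return i = R(tau), Q(tau), and T for a sample of lifetimes."""
    n = len(times)
    i = sum(1 for t in times if t <= tau)
    m = sum(t for t in times if t <= tau)
    return i, m + (n - i) * tau, sum(times)

def joint_posterior_b_tau(times, tau, b):
    """Unnormalized joint posterior of (b, tau), expression (7.5)."""
    n = len(times)
    i, q, total = sufficient_stats(times, tau)
    series = sum(q ** j / math.factorial(j) * b ** (n - i + j - 1) for j in range(i))
    return math.factorial(i - 1) / q ** i * series * math.exp(-total * b)

def posterior_tau_case2(times, tau, b0):
    """Unnormalized posterior of tau, expression (7.8); needs 1 <= R(tau) <= n - 1."""
    n = len(times)
    i, q, total = sufficient_stats(times, tau)
    if not 1 <= i <= n - 1:
        raise ValueError("tau must lie between the smallest and largest lifetime")
    outer = 0.0
    for j in range(i):
        m = n - i + j - 1                      # power of b in the j-th term
        inner = sum((total * b0) ** k / math.factorial(k) for k in range(m + 1))
        outer += inner * math.factorial(m) * q ** j / (math.factorial(j) * total ** (m + 1))
    return math.factorial(i - 1) / q ** i * math.exp(-total * b0) * outer

data = [0.2, 0.5, 0.9, 1.4, 2.0]               # made-up lifetimes
print(posterior_tau_case2(data, tau=0.7, b0=0.1))
```

For larger n the factorials and powers can overflow or lose precision, so a practical implementation would probably work on the log scale; this sketch does not attempt that.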

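Case 3. When τ ≥ Tn.
In this case, π(τ|D) was obtained by integrating expression (7.6) with
respect to b over the range [b0, ∞):

$$
\begin{aligned}
\pi(\tau \mid D) &\propto \int_{b_0}^\infty \frac{(n-1)!}{T^n} \sum_{j=0}^{n-1} \frac{T^j}{j!}\, b^{\,j-1} e^{-Tb}\,db \\
&= \frac{(n-1)!}{T^n} \left[ h(b_0) + e^{-Tb_0} \sum_{j=1}^{n-1} \left( \sum_{k=0}^{j-1} \frac{(Tb_0)^k}{k!} \right) \frac{1}{j} \right],
\end{aligned} \tag{7.9}
$$

where

$$
h(b_0) = \int_{b_0}^\infty b^{-1} e^{-Tb}\,db.
$$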

Because π(τ|D) is constant over the interval [Tn, ∞), Ghosh et al.
(1993) placed a third restriction on τ: τ ≤ d < ∞ for a finite constant d.
Taking all of the considerations above together, the prior π(a, b, τ) on the
parameters (a, b, τ) is now restated as

$$
\pi(a, b, \tau) = \frac{1}{ab}, \quad 0 < b_0 \le b < a < \infty,\ 0 < c \le \tau \le d < \infty.
$$
Under this prior, the posterior π(τ|D) is given by expressions (7.7), (7.8),
and (7.9), respectively, for the three cases of τ. The Bayesian estimate
of the change point τ is then either the posterior mean or the posterior
mode. Ghosh et al. (1993) gave computational details on how to obtain
such an estimate using (7.7)–(7.9). The computation is quite tedious, as is
often the case for Bayesian computation. In the next section, an alternative
estimate of τ, given in Chen (2003), is therefore presented in light of the
Schwarz information criterion (SIC).

7.3 The Informational Approach

In the literature, the change point problem for the hazard function is often
formulated as the problem of testing

H0 : τ = 0 against H1 : τ > 0.

Several test statistics and estimates have been proposed for this inference.
The most interesting aspect of the inference is estimating the true change
point τ. For this reason, a model selection method using the SIC, proposed in
Chen (2003), is presented here.
Recall from the previous chapter that the SIC is defined as

$$
\mathrm{SIC} = -2 \log L(\hat\Theta) + p \log n,
$$

where p is the number of free parameters that need to be estimated under the
model. Its penalty term p log n depends on the sample size.
Now, we introduce an estimate for τ based on the SIC. If a lifetime random
variable T has a hazard function given by (7.1), then the
probability density function of T is

$$
f(t) =
\begin{cases}
a \exp\{-at\}, & 0 \le t \le \tau, \\
b \exp\{-a\tau - b(t - \tau)\}, & \tau < t < \infty.
\end{cases} \tag{7.10}
$$

Because a random sample T1, T2, ..., Tn is available from the lifetime
distribution in (7.2), the order statistics of this sample can be denoted by
T(1) ≤ T(2) ≤ ··· ≤ T(n). Now, let τ0 be a point such that T(k) ≤ τ0 < T(k+1)
for some k with k = 1, ..., n − 1. If such a k can be found, we can truncate
the values at τ0 and use it to estimate the true τ.
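The likelihood function L(·) of a, b, k is given by

$$
\begin{aligned}
L(a, b, k) &= L(a, b, k \mid t_{(1)}, \ldots, t_{(n)}) \\
&= f_{T_{(1)}, \ldots, T_{(n)}}(t_{(1)}, \ldots, t_{(n)}) \\
&= n! \left( \prod_{i=1}^{k} f(t_{(i)}) \right) \left( \prod_{j=k+1}^{n} f(t_{(j)}) \right) \\
&= n!\, a^k \exp\left\{ -a \sum_{i=1}^{k} t_{(i)} \right\} \cdot b^{\,n-k} \exp\left\{ -(n-k)a\tau_0 - b \sum_{j=k+1}^{n} (t_{(j)} - \tau_0) \right\},
\end{aligned}
$$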

where t(1), ..., t(n) are the sample realizations of the order statistics
T(1) ≤ T(2) ≤ ··· ≤ T(n). The log of the likelihood function is

$$
\begin{aligned}
l(a, b, k) &= \log L(a, b, \tau_0) \\
&= \log n! + k \log a - a \left[ \sum_{i=1}^{k} t_{(i)} + (n-k)\tau_0 \right] \\
&\quad + (n-k) \log b - b \left[ \sum_{j=k+1}^{n} t_{(j)} - (n-k)\tau_0 \right].
\end{aligned}
$$

For each k, the maximum likelihood estimates (MLEs) of a and b are
obtained as

$$
\hat a = \frac{k}{(n-k)\tau_0 + \sum_{i=1}^{k} t_{(i)}}
$$

and

$$
\hat b = \frac{n-k}{\sum_{j=k+1}^{n} t_{(j)} - (n-k)\tau_0}.
$$
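The step from the log-likelihood to these estimates can be written out as follows; this short derivation is added here as a sketch and uses only the log-likelihood l(a, b, k) given above. For fixed k and τ0, setting the partial derivatives to zero gives

$$
\frac{\partial l}{\partial a} = \frac{k}{a} - \left[ \sum_{i=1}^{k} t_{(i)} + (n-k)\tau_0 \right] = 0,
\qquad
\frac{\partial l}{\partial b} = \frac{n-k}{b} - \left[ \sum_{j=k+1}^{n} t_{(j)} - (n-k)\tau_0 \right] = 0,
$$

whose solutions are the two MLEs. Substituting them back, the two linear terms of l reduce to −k and −(n − k), so that

$$
l(\hat a, \hat b, \tau_0) = \log n! + k \log \hat a + (n-k) \log \hat b - n .
$$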
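Define the SIC for the change point hazard function model as

$$
\begin{aligned}
\mathrm{SIC}(k; \tau_0) &= -2\, l(\hat a, \hat b, \tau_0) + 3 \log n \\
&= -2(n-k) \log\left\{ \frac{n-k}{\sum_{j=k+1}^{n} t_{(j)} - (n-k)\tau_0} \right\}
   - 2k \log\left\{ \frac{k}{(n-k)\tau_0 + \sum_{i=1}^{k} t_{(i)}} \right\} \\
&\quad - 2 \log n! + 2n + 3 \log n,
\end{aligned}
$$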

with an empirical choice of τ0. The next step of the procedure is to find k̂
such that

$$
\mathrm{SIC}(\hat k; \tau_0) = \min_{1 \le k \le n-1} \mathrm{SIC}(k; \tau_0). \tag{7.11}
$$

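Then the true change point τ of the hazard is estimated by

$$
\hat\tau = \frac{\hat k \sum_{j=\hat k+1}^{n} t_{(j)} - (n - \hat k) \sum_{i=1}^{\hat k} t_{(i)}}{n(n - \hat k)}, \tag{7.12}
$$

which is the stationary point of SIC(k̂; τ) in τ; (7.12) is obtained by solving

$$
\left. \frac{\partial}{\partial \tau_0} \mathrm{SIC}(\hat k; \tau_0) \right|_{\tau_0 = \hat\tau} = 0.
$$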

This estimate (7.12) is optimal by the model selection principle. Three choices
of τ0 are suggested here:

τ01 = t(k), τ02 = (t(k) + t(k+1))/2, or τ03 = 0.618 t(k) + 0.382 t(k+1),

where τ03 is a weighted average that empirically puts more weight on the kth
order statistic value t(k).
This estimator of the change in the hazard function is appealing to
practitioners because of its simplicity and ease of computation.
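The procedure in (7.11) and (7.12) can be sketched in a few lines of Python. The function names, the `weight` parameterization of the three τ0 rules (weight 1, 0.5, and 0.618 for τ01, τ02, and τ03), and the data are assumptions made for illustration; this is not code from Chen (2003).

```python
import math

def sic(t, k, tau0):
    """SIC(k; tau0) for sorted lifetimes t, with k observations at or below tau0."""
    n = len(t)
    exposure_a = sum(t[:k]) + (n - k) * tau0     # denominator of a-hat
    exposure_b = sum(t[k:]) - (n - k) * tau0     # denominator of b-hat
    if exposure_a <= 0 or exposure_b <= 0:
        return math.inf                          # MLEs undefined (e.g. tied values)
    a_hat = k / exposure_a
    b_hat = (n - k) / exposure_b
    # Maximized log-likelihood: log n! + k log a-hat + (n - k) log b-hat - n.
    loglik = math.lgamma(n + 1) + k * math.log(a_hat) + (n - k) * math.log(b_hat) - n
    return -2.0 * loglik + 3.0 * math.log(n)

def tau0_choice(t, k, weight):
    """tau0 = weight * t_(k) + (1 - weight) * t_(k+1), using 1-based order statistics."""
    return weight * t[k - 1] + (1.0 - weight) * t[k]

def estimate_change_point(times, weight=0.618):
    """Return (k-hat, tau-hat) following (7.11) and (7.12)."""
    t = sorted(times)
    n = len(t)
    k_hat = min(range(1, n), key=lambda k: sic(t, k, tau0_choice(t, k, weight)))
    lower, upper = sum(t[:k_hat]), sum(t[k_hat:])
    tau_hat = (k_hat * upper - (n - k_hat) * lower) / (n * (n - k_hat))
    return k_hat, tau_hat

data = [0.05, 0.10, 0.12, 0.30, 0.45, 0.90, 1.60, 2.20]   # made-up lifetimes
k_hat, tau_hat = estimate_change_point(data)
print(k_hat, tau_hat)
```

The choice of τ0 enters only through the selection of k̂; once k̂ is fixed, formula (7.12) depends on the data alone.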

Table 7.1 The Estimates τ̂1, τ̂2, τ̂3 in Chen and τ̂ in GJ

a    b     True τ   τ̂1      (MSE)    τ̂2      (MSE)    τ̂3      (MSE)    τ̂ (GJ)   (MSE)
3    2     .15      .1229    .0033    .1270    .0123    .1276    .0129    .1379     .0078
3    2     .10      .1080    .0382    .1120    .0501    .1044    .0373    .1173     .0092
3    1     .15      .0547    .2017    .0533    .1918    .0533    .1918    .1828     .0119
3    1     .10      .1755    .8167    .1875    .9928    .1967    .0585    .1970     .0409
3    1.5   .15      .0082    .0201    .0152    .0297    .0076    .0203    .1584     .0085
3    1.5   .10      .0301    .0394    .0228    .0299    .0216    .0247    .1475     .0179
2    1.5   .20      .1548    .1932    .1555    .1652    .1461    .1362    .1605     .0197
2    1.5   .15      .1095    .0637    .1075    .0958    .1098    .0957    .1479     .0177
2    1.5   .10      .0751    .0640    .0776    .0796    .0783    .0796    .1538     .0254
2    1     .20      .3074    .0163    .3010    .0167    .3016    .0168    .2229     .0261
2    1     .15      .2144    .0069    .2093    .0066    .2096    .0066    .1923     .0316
2    1     .10      .1693    .0389    .1620    .0392    .1618    .0392    .2135     .0607
2    .5    .20      .5051    .1535    .5007    .1507    .4688    .1335    .3136     .0743
2    .5    .15      .3516    .0715    .3162    .0592    .3104    .0581    .3147     .1325
2    .5    .10      .2143    .0280    .2114    .0261    .2112    .0258    .3225     .2049
1    .5    .30      .6061    .1460    .6083    .2211    .6086    .2216    .4465     .1720
1    .5    .20      .3519    .1286    .3503    .2742    .3224    .1212    .4079     .2444
1    .5    .15      .3243    .7067    .3296    .8190    .3310    .8190    .3825     .2636

7.4 Simulation Results

A simulation study is given to show the feasibility of the new estimates τ̂1,
τ̂2, and τ̂3 (corresponding to τ01, τ02, and τ03) in comparison with the
estimates obtained in Ghosh and Joshi (1992; GJ) and in BGJ. The simulation
results are given in Table 7.1, where τ̂1, τ̂2, and τ̂3 are the new estimates
proposed in Section 7.3 and τ̂ is the estimate in Table I of GJ. One can observe
that the new estimates τ̂1, τ̂2, and τ̂3 are comparable with the estimate τ̂ in
GJ and with τ̂1, τ̂2 in BGJ, and that they are better than the estimates τ̂3 and
τ̂4 in BGJ (see the simulation results given in BGJ). However, the new estimates
perform better only when a ≥ 3, a − b ≥ 1, and τ ≥ .15. Thus, it is still
desirable to find good estimates for other values of a, b, and τ. A further
study of the properties of the new estimates τ̂1, τ̂2, and τ̂3 is also needed.
