
Explaining the hazard function, h(t), through statistical lenses

Let F(t) denote the cumulative distribution function (c.d.f.), with $0 \le F(t) \le 1$.
In our case, t is time and F(t) is the cumulative probability of an event up to time t. The longer the follow-up time, the greater the probability that the event will happen. In other words, the c.d.f. is simply the "cumulative incidence" (for incident events) or a …
Let $S(t) = 1 - F(t)$. Evidently, S(t) is the survival probability: the probability that the event will not happen until time t.
Here I drew an example of F(t). For the moment, ignore the equation behind the graph.
If we take the first derivative of a cumulative distribution function, we get the probability density function (p.d.f.). Let's call it f(t):
$$F'(t) = \frac{dF}{dt} = f(t).$$
Draw a small triangle on the graph.
That's a basic relation between a cumulative distribution function and a probability density function.
[Figure: Cumulative Distribution Function (cdf), F(t) against time t, axis from 0 to 20]
Here is the first derivative of that F(t).
Keep the c.d.f. and p.d.f. in the background for a moment. Now, we define the hazard at time t, h(t), as the probability of an event in the interval $[t, t+\Delta t]$ (per unit of time) when $\Delta t \to 0$. To find an expression for h(t), we should realize that h(t) must be a "conditional probability": it is conditional on not having the event up to time t (or, equivalently, conditional on surviving to time t).
From probability calculus, we know the formula of a conditional probability:
$$\Pr(A \mid B) = \frac{\Pr(A \text{ and } B)}{\Pr(B)}$$
where A = "having the event at time t" and B = "not having the event by time t".
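Here, $\Pr(A \text{ and } B) = \Delta F$, the probability that the event happens in $[t, t+\Delta t]$. Dividing by $\Delta t$ gives $\Delta F / \Delta t$, which tends to $dF/dt$ as $\Delta t \to 0$. That's the "delta probability of the event per unit time." Allison explains nicely why we have to use that ratio: we want to eliminate the influence of $\Delta t$, and we can't compute the probability at exactly time "t" (it is zero).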
$\Pr(B) = S(t)$, as defined earlier.
$$h(t) = \frac{dF/dt}{S(t)} = \frac{F'(t)}{S(t)} = \frac{f(t)}{S(t)}$$
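You can watch the limit in this definition numerically. The sketch below assumes the exponential c.d.f. that these notes introduce later, with λ = 0.2 (the value the notes say was used for the graphs). Any other differentiable F could be substituted. The names `F`, `S` and `hazard_approx` are illustrative only.

```python
import math

LAM = 0.2  # assumed rate; the value the notes use for their graphs

def F(t: float) -> float:
    """Assumed c.d.f.: exponential, F(t) = 1 - exp(-LAM * t)."""
    return 1.0 - math.exp(-LAM * t)

def S(t: float) -> float:
    """Survival function S(t) = 1 - F(t)."""
    return 1.0 - F(t)

def hazard_approx(t: float, dt: float) -> float:
    """Pr(event in [t, t+dt] | no event by t), divided by dt."""
    p_a_and_b = F(t + dt) - F(t)      # Pr(A and B) = delta F
    return p_a_and_b / S(t) / dt      # Pr(A | B) per unit time

# As dt shrinks, the ratio approaches the hazard (here 0.2) from below.
for dt in (1.0, 0.1, 0.01, 0.001):
    print(dt, round(hazard_approx(5.0, dt), 6))
```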
[Figure: Probability Density Function (pdf), the derivative of F(t), against time t, axis from 0 to 20]
Check the units of h(t). "dF" is a difference between probabilities. "dt" is a difference in time (time units). "S(t)" is a probability. So the units of h(t) are "probability/probability/time", which is 1/time (or time$^{-1}$). The hazard is not a probability! It is counts per time (which is a rate). Some people call it a "probability rate".
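Because the hazard has units of 1/time, its numerical value depends on the time unit, and it can exceed 1. A probability can do neither. This small sketch re-expresses a hypothetical hazard of 0.2 events per person-year; the rate and the helper `convert_rate` are assumptions for illustration.

```python
# Hypothetical hazard: 0.2 events per person-year (an assumed value, not data).
RATE_PER_YEAR = 0.2

def convert_rate(rate: float, old_unit_in_days: float, new_unit_in_days: float) -> float:
    """Re-express a rate (1/time) per a different time unit."""
    return rate * new_unit_in_days / old_unit_in_days

YEAR, MONTH, DECADE = 365.25, 365.25 / 12, 3652.5  # unit lengths in days

per_month = convert_rate(RATE_PER_YEAR, YEAR, MONTH)    # same hazard, smaller number
per_decade = convert_rate(RATE_PER_YEAR, YEAR, DECADE)  # same hazard, now above 1
print(f"{per_month:.4f} per month, {per_decade:.1f} per decade")
```

The same hazard is about 0.017 per month but 2 per decade. That is why a hazard is always reported together with its time unit.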
Now, all of this was generic. We have nothing specific until we make an assumption about F(t). What is that function? What function did I use to create the graphs above?
One possible function is the exponential. (I used it to draw the graphs.)
$$F(t) = 1 - e^{-\lambda t}$$
λ is a constant, but we don't know yet what it means. The function behaves reasonably, however. When t tends to 0, F(t) tends to 0, as it should: the cumulative probability of the event is small. When t tends to infinity, F(t) tends to 1: the event will happen "eventually". I showed that function in the graph for λ = 0.2.
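The lost graph can be partly recovered by tabulating F(t) for λ = 0.2. The sketch uses the time points of the surviving axis labels (0 to 20), which is an assumption about the plotted range. It also evaluates one large t to show F(t) approaching 1.

```python
import math

LAM = 0.2  # the value the notes report using for the graph

def F(t: float) -> float:
    """Exponential c.d.f., F(t) = 1 - exp(-LAM * t)."""
    return 1.0 - math.exp(-LAM * t)

# Time points matching the tick labels that survive from the lost figure.
for t in (0, 5, 10, 15, 20):
    print(f"t = {t:>2}: F(t) = {F(t):.3f}")
print(f"t = 200: F(t) = {F(200):.6f}")  # F(t) tends to 1 as t grows
```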
Now, derive f(t), S(t), and h(t):
$$f(t) = F'(t) = \lambda e^{-\lambda t}$$
$$S(t) = e^{-\lambda t}$$
$$h(t) = \frac{f(t)}{S(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda$$
So, we discovered that λ in F(t), as defined above, is what we called the hazard.
Allison's book explains this nicely, but he starts from the hazard and moves to the other functions. I think that the c.d.f. is the best starting point, pedagogically. From the c.d.f. we get to the p.d.f. and then to the hazard.
There are other hazard functions that are not constant:
Exponential: $h(t) = \lambda$ (this is what we did above)
Gompertz: $h(t) = \exp(\lambda + \alpha t)$
Weibull: $h(t) = \lambda t^{\alpha}$
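The three shapes can be written out as plain functions to see how they behave over time. This is a sketch using the notes' parametrization. In the Gompertz form, λ enters on the log scale. The parameter values in the loop are made up for illustration.

```python
import math

def h_exponential(t: float, lam: float) -> float:
    """Constant hazard: does not depend on t."""
    return lam

def h_gompertz(t: float, lam: float, alpha: float) -> float:
    """log h(t) = lam + alpha * t: increasing if alpha > 0, decreasing if alpha < 0."""
    return math.exp(lam + alpha * t)

def h_weibull(t: float, lam: float, alpha: float) -> float:
    """h(t) = lam * t**alpha for t > 0; alpha = 0 gives back the exponential."""
    return lam * t ** alpha

# Hypothetical parameters, chosen only to show the shapes.
for t in (1.0, 5.0, 10.0):
    print(t, h_exponential(t, 0.2),
          round(h_gompertz(t, -2.0, 0.1), 4),
          round(h_weibull(t, 0.2, 0.5), 4))
```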
If we allow for predictors, and define $\log h(t) = \mu$, then:
Exponential: $\log h(t) = \mu + \beta x$
Gompertz: $\log h(t) = \mu + \alpha t + \beta x$
Weibull: $\log h(t) = \mu + \alpha \log t + \beta x$
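In all three models the predictor enters additively on the log scale. So the ratio of hazards for x = 1 versus x = 0 is $e^{\beta}$ at every t, because μ and the time terms cancel. The sketch below checks this with a single binary predictor x; the parameter values are hypothetical and the helper names are illustrative.

```python
import math

def log_hazard(model: str, t: float, x: float, mu: float, beta: float, alpha: float = 0.0) -> float:
    """log h(t) for the three parametric models with one predictor x."""
    if model == "exponential":
        return mu + beta * x
    if model == "gompertz":
        return mu + alpha * t + beta * x
    if model == "weibull":
        return mu + alpha * math.log(t) + beta * x
    raise ValueError(f"unknown model: {model}")

def hazard_ratio(model: str, t: float, mu: float, beta: float, alpha: float = 0.0) -> float:
    """h(t | x=1) / h(t | x=0); the mu and time terms cancel in the difference."""
    return math.exp(log_hazard(model, t, 1.0, mu, beta, alpha)
                    - log_hazard(model, t, 0.0, mu, beta, alpha))

# Hypothetical parameter values, for illustration only.
MU, BETA, ALPHA = -3.0, 0.7, 0.5
for model in ("exponential", "gompertz", "weibull"):
    print(model, [round(hazard_ratio(model, t, MU, BETA, ALPHA), 4) for t in (0.5, 2.0, 10.0)])
```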
Maximum partial likelihood (Cox) makes no assumption about the shape of the baseline hazard.
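In the Cox model $h(t \mid x) = h_0(t)\,e^{\beta x}$. At each event time, the partial likelihood uses only the ratio of the failing subject's hazard to the summed hazards of everyone still at risk, and $h_0(t)$ cancels in that ratio. The sketch below is illustrative only, not a substitute for a survival package. It assumes one covariate, no tied event times, a made-up data set, and Newton-Raphson as the optimizer (my choice). Because only the ordering of times enters, squaring all times leaves the estimate unchanged.

```python
import math

# Hypothetical toy data: (follow-up time, event indicator, binary exposure x).
DATA = [(2, 1, 1), (3, 1, 0), (4, 1, 1), (5, 0, 0),
        (6, 1, 0), (7, 1, 1), (9, 1, 0), (10, 0, 1)]

def _risk_sums(data, t_i, beta):
    """Sums of w, x*w, x^2*w over the risk set {j: t_j >= t_i}, with w = exp(beta*x)."""
    s0 = s1 = s2 = 0.0
    for t_j, _, x_j in data:
        if t_j >= t_i:
            w = math.exp(beta * x_j)  # h0(t_i) would multiply every term and cancel
            s0 += w
            s1 += x_j * w
            s2 += x_j * x_j * w
    return s0, s1, s2

def log_partial_likelihood(data, beta):
    """Sum over events of beta*x_i - log(sum of exp(beta*x_j) over the risk set)."""
    total = 0.0
    for t_i, d_i, x_i in data:
        if d_i:
            s0, _, _ = _risk_sums(data, t_i, beta)
            total += beta * x_i - math.log(s0)
    return total

def score_and_information(data, beta):
    """First derivative and negative second derivative of the log partial likelihood."""
    u = info = 0.0
    for t_i, d_i, x_i in data:
        if d_i:
            s0, s1, s2 = _risk_sums(data, t_i, beta)
            mean = s1 / s0
            u += x_i - mean
            info += s2 / s0 - mean * mean
    return u, info

def fit_cox(data, beta=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson on the (concave) log partial likelihood."""
    for _ in range(max_iter):
        u, info = score_and_information(data, beta)
        step = u / info
        beta += step
        if abs(step) < tol:
            break
    return beta

beta_hat = fit_cox(DATA)
# A monotone transform of time (t -> t**2) gives the same estimate.
beta_hat_sq = fit_cox([(t * t, d, x) for t, d, x in DATA])
print(beta_hat, math.exp(beta_hat), beta_hat_sq)
```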
Additional notes on August …:
Here I show the similarity between the hazard (rate) and the probability when the follow-up is short and the rate is low (inspired by Symons MJ. Hazard rate ratio and prospective epidemiological studies. J Clin Epidemiol 2002;55:893-899). For this reason, the hazard rate ratio and the proportion ratio (and the odds ratio) tend to be …
Focus on a short interval since the start of follow-up, $[0, t]$:
The hazard of death is $h(t) = \lambda$.
And the cumulative hazard function is
$$\int_0^t h(u)\,du = -\ln S(t) = -\ln\left(e^{-\lambda t}\right) = \lambda t$$
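Aside: One way of proving the first equality above is to prove that the derivative of $\ln S(t)$ is $-h(t)$ (and to note that $\ln S(0) = \ln 1 = 0$). Keep the following rule of calculus in mind:
$$\frac{d \ln u}{dt} = \frac{1}{u}\,\frac{du}{dt}$$
$$\frac{d \ln S(t)}{dt} = \frac{1}{S(t)}\,\frac{dS(t)}{dt} = \frac{dS(t)/dt}{S(t)} = \frac{d[1 - F(t)]/dt}{S(t)} = \frac{-f(t)}{S(t)} = -h(t)$$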
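Both identities can be checked numerically along the pedagogical route the notes prefer: start from F, differentiate to get f, divide by S to get h, then integrate h. This sketch assumes the exponential example with λ = 0.2. It uses central differences and a midpoint-rule integral, so it never evaluates h at t = 0. The function names are illustrative.

```python
import math

LAM = 0.2  # assumed constant hazard, as in the exponential example

def F(t: float) -> float:
    return 1.0 - math.exp(-LAM * t)

def S(t: float) -> float:
    return 1.0 - F(t)

def h(t: float, eps: float = 1e-6) -> float:
    """h(t) = f(t) / S(t), with f(t) from a central difference of F."""
    f = (F(t + eps) - F(t - eps)) / (2 * eps)
    return f / S(t)

def cumulative_hazard(t: float, n: int = 1000) -> float:
    """Midpoint-rule approximation of the integral of h(u) du from 0 to t."""
    du = t / n
    return sum(h((k + 0.5) * du) for k in range(n)) * du

def dlnS_dt(t: float, eps: float = 1e-6) -> float:
    """Central difference of ln S(t)."""
    return (math.log(S(t + eps)) - math.log(S(t - eps))) / (2 * eps)

t = 10.0
print(cumulative_hazard(t), -math.log(S(t)), LAM * t)  # all approximately 2.0
print(dlnS_dt(t), -h(t))                                # both approximately -0.2
```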
The probability of death in that interval is simply F(t), the cumulative distribution function:
$$F(t) = 1 - e^{-\lambda t} \approx 1 - (1 - \lambda t) = \lambda t$$
based on the famous approximation $e^{-x} \approx 1 - x$ when x is small.
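This sketch shows how good the approximation is. It compares the exact probability $1 - e^{-x}$ with its approximation $x = \lambda t$ for a few made-up values of the product. Because $1 - e^{-x} < x$ for every $x > 0$, λt always slightly overstates the probability.

```python
import math

def exact_F(x: float) -> float:
    """Probability of the event when lambda * t = x."""
    return 1.0 - math.exp(-x)

for x in (0.001, 0.01, 0.1, 0.5, 1.0):
    rel_err = (x - exact_F(x)) / exact_F(x)  # how much lambda*t overstates F(t)
    print(f"lambda*t = {x:<5}: F(t) = {exact_F(x):.5f}, relative error of lambda*t = {rel_err:.2%}")
```

The relative error is about 0.5% when λt = 0.01. It is over 50% when λt = 1.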
So, for a small λ (low hazard rate) and a small t (short interval), namely a small λt, the following holds:
$$F(t) \ (\text{the probability of death}) \approx \lambda t = \text{cumulative } h(t)$$
BUT I DON'T UNDERSTAND. The HAZARD is λ, not λt. NEED TO THINK SOME MORE.
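One observation may help with the question just raised. It is a sketch under the constant-hazard assumption, not the paper's argument. Suppose two groups are compared over the same short interval. Then $F_1(t)/F_0(t) \approx \lambda_1 t / (\lambda_0 t)$, and t cancels. So the ratio of probabilities approximates the ratio of hazards, even though each probability approximates λt rather than λ. The rates 0.02 and 0.01 per unit time are made up.

```python
import math

LAM1, LAM0 = 0.02, 0.01  # hypothetical hazards: exposed, unexposed

def F(t: float, lam: float) -> float:
    """Probability of the event by time t under a constant hazard lam."""
    return 1.0 - math.exp(-lam * t)

for t in (0.5, 1.0, 2.0):
    cum_ratio = (LAM1 * t) / (LAM0 * t)   # ratio of cumulative hazards: t cancels
    prob_ratio = F(t, LAM1) / F(t, LAM0)  # ratio of probabilities (RR)
    print(t, cum_ratio, round(prob_ratio, 4))
```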
That paper also shows the following strict relation between HR, RR, and OR for a small …:
$$1 < \text{RR} < \text{HR} < \text{OR}$$
$$1 > \text{RR} > \text{HR} > \text{OR}$$
(The inequality between the OR and the RR is well known, but the middle position of the HR is not.) Of course, when follow-up is long, the whole story breaks and the RR and OR become useless, because proportions and odds mix causal forces with follow-up time (chapter 3).
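Under the constant-hazard (exponential) assumption used in these notes, you can check both the ordering and its drift over long follow-up numerically. This sketch does not show that the ordering holds more generally; that is the paper's claim. The hazards 0.02 and 0.01 per unit time and the helper `ratios` are hypothetical. The HR stays fixed at 2, while the RR shrinks toward 1 and the OR grows as follow-up lengthens.

```python
import math

def ratios(lam1: float, lam0: float, t: float) -> tuple:
    """(RR, HR, OR) after follow-up t, assuming constant hazards in both groups."""
    f1 = 1.0 - math.exp(-lam1 * t)  # probability of the event, exposed
    f0 = 1.0 - math.exp(-lam0 * t)  # probability of the event, unexposed
    rr = f1 / f0
    hr = lam1 / lam0
    odds_ratio = (f1 / (1.0 - f1)) / (f0 / (1.0 - f0))
    return rr, hr, odds_ratio

# Harmful exposure (hazard 0.02 vs 0.01); swap the arguments for a protective one.
for t in (1, 10, 50, 100):
    rr, hr, odds_ratio = ratios(0.02, 0.01, t)
    print(f"t = {t:>3}: RR = {rr:.3f}, HR = {hr:.3f}, OR = {odds_ratio:.3f}")
```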