
Event History Analysis

Event history analysis allows researchers to investigate not only occurrences of
certain political events but also the processes or histories of those events. In
essence, event history analysis estimates how histories of events or durations
leading up to the events (if events occur at all) change with different values of
independent variables (or covariates). Such estimates allow researchers to discuss
how included covariates increase or decrease the probability of the event
occurrence at any given time. Since event history analysis examines the duration
from the point at which the event first becomes possible to the time when the
event actually occurs (if it occurs), it is often called duration analysis. Alternatively, as
event history analysis models the probability of an observation surviving (not
experiencing the event), event history analysis is sometimes called survival
analysis. Below, this entry discusses some major features and possible applications
of event history analysis, taking into account both parametric and nonparametric
models and some recent extensions.

Many political science studies ask questions regarding occurrences of significant
political events. For instance, scholars in comparative politics are interested in why
some countries democratize while others fail to do so (democratization), or why
some countries are able to economically develop while others stay in poverty
(economic development). Likewise, scholars in international relations study why
some countries join many international organizations, while others join only a few
(international organization [IO] membership) and why some pairs of states are
more likely to engage in militarized interstate disputes than others (militarized
disputes). Scholars in American politics ask why some court nominees are
approved, while others are not (legislative and judicial branch relations), and what
makes an incumbent more likely to face a challenger (reelection). By answering
these questions, political scientists are able to identify systematic dynamics
influencing occurrences of these political events.

While these inquiries certainly enhance our understanding of interesting political
phenomena, the timing of occurrences of such events can provide even richer
information and thus help further enhance our understanding of the underlying
dynamics. For instance, not only identifying under what conditions a country is able
to democratize but also finding out under what conditions a country is able to do so
faster than other countries would help promote our understanding of
democratization. Event history analysis provides researchers with a unique
opportunity to exploit richer information of the histories of events.

Since event history analysis was introduced to political science, a growing
number of studies have used it in empirical investigations. With this
powerful new tool, many old research questions have been revisited, and many new
research questions have been developed and empirically tested. These questions
cut across sub-disciplines of political science, and examples of them are numerous.
Why do some coalition governments last longer than others? Why do some
militarized disputes last longer than others? Why do some periods of peace last
longer than others? Why do some economic sanctions last longer than others? Why
do some drugs take a longer time to get approved by the Food and Drug
Administration than other drugs? And why can some representatives maintain their
seats longer than others? The popularity of event history analysis is due to its
usefulness in understanding the underlying political dynamics of events and their
histories. Specifically, event history analysis has a few advantages over traditional
binary logit or probit analysis, on the one hand, and over ordinary least squares
regression (OLS), on the other.

Event history analysis has many advantages over binary logit or probit models.
First of all, event history analysis takes the history of an event seriously by taking
duration into account. Because how long it takes for an event to occur conveys
more information than whether the event occurs at all, event history analysis helps
researchers better understand the dynamics of the events under investigation.
Returning to a prior example, some
countries may successfully democratize less than a few decades after
independence, others may democratize after more than a few decades, while still
others may not be able to democratize at all. Conventional logit or probit models
cannot differentiate between those cases democratizing in less than a few decades
and those in more than a few decades. In contrast, event history analysis allows
researchers not only to investigate factors that make democratization more likely
but also to investigate factors that hasten democratization by exploiting richer data.

Likewise, one may think that OLS regression might be able to capture factors
influencing duration quite nicely, as it examines continuous dependent variables.
But event history/duration data pose several challenges for traditional OLS
estimation. For one, duration data are often right skewed and the OLS approach
requires an arbitrary transformation of data (most commonly taking a log) to deal
with a right-skewed data set. A more serious problem is data truncation. Data
truncation happens when researchers do not know either the exact entry time of an
observation (left truncated) or the end time—this is the case when events have not
yet occurred when the data collection is done (right censoring). That is, there could
be some observations that, at the time of data collection, are still ongoing. OLS
treats left-truncated observations as if they have an equivalent entry time with
other observations and copes with right-censoring problems either by dropping all
the observations that have not experienced the event or by capping the history by
assuming that the event has occurred at the conclusion of the period of data
collection. Both these arbitrary assumptions can cause biases. In contrast, event
history analysis takes into account these cases of left truncation and right
censoring. Finally, event history analysis can accommodate time-varying covariates
(TVCs), which are independent variables with different values over time.

In event history, an “event” is the primary phenomenon of interest, and the
“history of such an event” refers to the duration leading up to the event. If we think
that history can span across time T, then the probability of the event occurring at
any given time point t (t is an element of T) can be written as a probability density
function f(t). Then, the cumulative probability at time t is given by F(t), the
integral of f(t) from 0 to t. This is the probability of the event having occurred
between time 0 and t. Then, the probability of survival, or the event not having
occurred up to time t, can be simply written as S(t) = 1 – F(t). Given f(t) and S(t),
the hazard rate is the probability of the event happening at time t given that the
observation has not experienced the event until time t. With the simple notations
outlined above, the hazard rate h(t) is equal to f(t)/S(t), the conditional probability
of an event occurring, given that it has not happened up until time t.
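These relations can be made concrete with a small numerical sketch (not part of the original entry). For an exponential duration with a hypothetical rate λ = 0.1, f(t) = λe^(−λt) and S(t) = e^(−λt), so the hazard h(t) = f(t)/S(t) reduces to the constant λ at every time point:

```python
import math

def f(t, lam):
    """Probability density of an exponential duration with rate lam."""
    return lam * math.exp(-lam * t)

def survival(t, lam):
    """S(t) = 1 - F(t): probability the event has not occurred by time t."""
    return math.exp(-lam * t)

def hazard(t, lam):
    """h(t) = f(t) / S(t): conditional probability of the event at t,
    given survival up to t."""
    return f(t, lam) / survival(t, lam)

# The exponential hazard is flat: h(t) equals lam regardless of t.
for t in (1.0, 5.0, 25.0):
    print(round(hazard(t, 0.1), 6))
```

This flat hazard is exactly the assumption the exponential model discussed below imposes; richer models let h(t) vary with time.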

Going back to the democratization example, the hazard rate at 10 years after a
country's independence would be the probability of democratization at Year 10
divided by the probability of democratization not having happened for the past
decade in this country. Theoretically, various factors would presumably influence
the hazard rate. For instance, one may think that the level of economic
development, former colonial experience, and strength of military force all affect
the hazard rate, as well as the duration itself, and build the event history model
accordingly. Event history analysis then estimates the hazard rate for
democratization as the function of both these covariates and time. A baseline
hazard rate refers to the hazard rate as only a function of time t.

Parametric and Nonparametric Approaches


There are various modeling options for researchers who seek to use event history
analysis based on appropriateness to the data and background theories. First, the
choice between parametric and nonparametric approaches depends on how
confident researchers are of the shape of the baseline hazard, which ideally is
guided by theory. In essence, all parametric models make explicit assumptions
about the shape of the baseline hazard, while nonparametric models do not make
such assumptions.

Parametric models make assumptions for the baseline hazard rate once covariates
are included in the model. There are a wide variety of models, and models may be
nested in others. For instance, the exponential model assumes that the baseline
hazard is flat across time. This would mean that the probability of an event
occurring at time t conditional on the event not having occurred is constant over
time. The exact value then depends on included covariates. The Weibull model is
more flexible than the exponential model, and it allows that the baseline hazard
rate may be monotonically increasing, monotonically decreasing, or flat over time.
Thus, the Weibull model nests the exponential model. Both the Weibull and
exponential models assume proportional hazards as the changes of hazard rates
with changes of covariates are proportional to the baseline hazard rate. The
proportional hazards assumption should be tested. Since the Weibull model can be
flexible, there have been many studies that have used it. Yet, in some settings, the
monotonicity assumption may not be appropriate, and for those cases parametric
models without a monotonicity assumption can be used. These models include the
log-logistic and the lognormal models. These models allow hazard rates to first
increase and then decrease as t passes. Neither of these models has the
proportional hazards property.
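As a brief illustration (an addition to the original entry), the Weibull hazard can be written in one common parameterization as h(t) = λp(λt)^(p−1), where the shape parameter p governs monotonicity; the values of λ and p below are arbitrary:

```python
def weibull_hazard(t, lam, p):
    """Weibull hazard h(t) = lam * p * (lam * t)**(p - 1).

    p > 1: monotonically increasing hazard;
    p < 1: monotonically decreasing hazard;
    p = 1: constant hazard (reduces to the exponential model).
    """
    return lam * p * (lam * t) ** (p - 1)

# Increasing hazard (p > 1): later times are riskier.
assert weibull_hazard(2.0, 0.5, 1.5) > weibull_hazard(1.0, 0.5, 1.5)
# Decreasing hazard (p < 1): early times are riskiest.
assert weibull_hazard(2.0, 0.5, 0.5) < weibull_hazard(1.0, 0.5, 0.5)
# p = 1 recovers the flat exponential hazard equal to lam.
assert weibull_hazard(2.0, 0.5, 1.0) == weibull_hazard(1.0, 0.5, 1.0) == 0.5
```

Because setting p = 1 recovers the exponential hazard, the sketch also shows the nesting relationship described above.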

The generalized gamma model can be useful to adjudicate among different
parametric models, as several parametric models are nested within the generalized
gamma model. The exponential, the Weibull, the lognormal, and the gamma
models are all special forms of the generalized gamma model. When one has no a
priori theoretical justification about how the baseline hazard rate varies across
time, the generalized gamma model is likely to be particularly useful. If the fit is
correct, parametric models generally have smaller standard errors than their
semiparametric counterparts. However, as parametric models require assumptions
about the shape of the baseline hazard, when assumptions are not correct, the
estimation will be biased.

In comparison with the parametric models, nonparametric approaches do not
make assumptions about the shape of the baseline hazard. While it is true that
some parametric models are more flexible than others, they all still make
assumptions about the shape of the baseline hazard. Moreover, more flexible
models require more parameters to be estimated, and inclusion of such parameters
may be tenuous and cumbersome. Finally, nonparametric models will approximate
a correct parametric model; if the correct parametric model is the Weibull,
nonparametric estimates will closely approximate the Weibull; if the correct
parametric model is the exponential, nonparametric estimates will approximate the
exponential without specifying the baseline hazard. In general, given that there is
little theoretical justification for the shape of the baseline hazard in most political
science inquiries, there seems to be a rising consensus that there is little to lose
and much to gain in using nonparametric models in social science applications.
Thus, it may be better to leave the baseline hazard unspecified and instead focus
on how covariates of interest influence the hazard rate. This is the basic
justification of the Cox proportional hazards model, or simply the Cox model. Time-
varying covariates are easily handled in the Cox model as well.

The Cox model estimates the hazard rate of the ith observation at time t as a
function of both the unspecified baseline hazard and the covariates. As the hazard
rate is a product of the function of the covariates and the baseline hazard, the Cox
model is also a proportional hazards model, and this assumption needs to be
checked. The Cox model uses partial likelihood methods to estimate coefficients for
each covariate. The partial likelihood function for some given data is the product,
as n varies from 1 to N, of the hazard rate of the observation that experiences the
nth event over the sum of the hazard rates of all observations still at risk at that
time, where N is the number of observations that experience the event in the data
and K is the total number of observations. Then, the partial likelihood function is
maximized, and the coefficients are obtained. For instance,
imagine a data set with three observations (K = 3), where only two of them (N = 2)
experience the event. The partial likelihood is then given by the hazard rate of the
observation that experiences the event first over the sum of the hazard rates of all
three observations multiplied by the hazard rate of the observation that
experiences the event second over the sum of the hazard rates of the two
remaining observations.
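The three-observation example can be sketched in a few lines of Python (the covariate values and the coefficient β = 0.5 are hypothetical, chosen only for illustration). Under proportional hazards, h_i(t) = h0(t)exp(x_i β), and the unspecified baseline h0(t) cancels out of every ratio:

```python
import math

# Hypothetical toy data: three observations (K = 3), two of which (N = 2)
# experience the event; each has a single covariate value x.
x = {"a": 1.0, "b": 0.0, "c": -1.0}

def partial_likelihood(beta, event_order, risk_sets):
    """Cox partial likelihood: the product over events of
    exp(x_i * beta) / sum_{j in risk set} exp(x_j * beta).
    The baseline hazard cancels from each numerator and denominator."""
    pl = 1.0
    for i, risk in zip(event_order, risk_sets):
        pl *= math.exp(x[i] * beta) / sum(math.exp(x[j] * beta) for j in risk)
    return pl

# "a" fails first (all three at risk), then "b" (two remain); "c" is censored.
pl = partial_likelihood(0.5, ["a", "b"], [{"a", "b", "c"}, {"b", "c"}])
print(round(pl, 4))
```

Estimation then amounts to searching for the β that maximizes this product; with β = 0 every term is simply 1 over the size of the risk set.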

As illustrated, the Cox model uses information about which observation experiences
the event sooner than others. Thus, the interval between two observations
experiencing events does not provide any additional information. In this sense, the
Cox model is an ordered events model. Important advances have been made in
handling ties, which are observations that experience the event at the same time,
such as the Breslow, the Efron, and the exact discrete approximations.

Discrete and Continuous Time Approaches


Thus far, the discussion of event history analysis implicitly assumes continuous
time. While time is continuous in nature, in reality the data sets that researchers
use are often discrete. Event history data in discrete time contain the same
information in different forms—a series of binary observations in which zeros are
recorded for a nonevent, and ones are recorded for an event. In such a case,
researchers need to select a probability distribution to capture the binary decision.
The logistic distribution (as in a logit model) and the standard normal distribution
(as in a probit model) are typically used.
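The recoding into binary person-period rows might be sketched as follows (a minimal illustration added here, not a prescribed procedure; the observation labels and durations are invented):

```python
def to_person_period(obs_id, duration, event):
    """Expand one duration record into discrete-time binary rows:
    a zero for every period survived, with a one in the final period
    only if the event (rather than censoring) ended the spell."""
    rows = []
    for t in range(1, duration + 1):
        occurred = 1 if (t == duration and event) else 0
        rows.append((obs_id, t, occurred))
    return rows

# A country that democratizes in year 3, and one censored after year 2.
print(to_person_period("A", 3, True))
print(to_person_period("B", 2, False))
```

Each row can then be analyzed with an ordinary logit or probit model, with duration dependence handled as described below.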

Duration dependence still needs to be directly accounted for in the discrete time
model, which is generally handled by fitting smoothing functions such as spline
functions or lowess. Some argue for discrete time approaches because most social
scientists are more familiar with logit/probit models. Others argue for continuous
time approaches, since there duration dependence does not require extra modeling
steps. Both discrete and continuous time models can be appropriate and adequate.

Diagnostics
Model fit and assumptions should be tested for event history models, just as they
are for any other statistical model. Residual plots show the difference between the
observed values and the predicted values from the estimated model, and residual
measures can be used as graphical tests for various purposes. Cox-Snell residuals
are often used to evaluate the overall fit of the estimated model. When Cox-Snell
residuals plot roughly around the 45° reference line, the Cox model performs
reasonably well. When there is a systematic deviation of the plotted residuals from
the reference line, then there may be some omitted variables or problems with
model specifications. One should also note, however, that Cox-Snell residuals are
not definitive evidence of how well the estimated model fits.

Since the Cox model and some parametric models assume proportional hazards, it
is important to test this assumption. If it does not hold, the hazard ratio cannot
be interpreted as constant across time, and additional modeling
steps are required. Graphically, scaled Schoenfeld residuals can be used in testing
the proportional hazards assumption. Scaled Schoenfeld residuals can be thought of
as the difference between the expected values of the covariates and the observed
values of the covariates. Thus, when scaled Schoenfeld residuals are plotted across
time, one can see if any variable shows violations of the proportional hazards
assumption. When scaled Schoenfeld residuals are consistently gathered around the
zero line, the proportional hazards assumption is likely to hold. When residuals
systematically deviate from the zero line or show heterogeneity across time, it is
possible that the model is misspecified and the proportional hazards assumption
might not hold. There is a global statistical test to examine the proportional
hazards assumption as well. If the global test indicates a possible violation,
Harrell's rho can be used to focus on offending covariates individually. Typically, an
interaction with time and the offending covariate is included in the model to
account for the nonproportionality.

Extensions
Additional extensions make event history analysis even more useful. For instance,
an observation may experience an event more than once. It is also possible that an
observation can exit in more than one way. The model can also be extended to
include unobserved heterogeneity among observations, event dependence, or
spatial dependence.

Repeated Events
Repeated events arise when an observation can experience the event more than
once, as in the study of repeated interstate disputes between certain pairs of
countries. The analysis of repeated events takes into
account the fact that the first, second, third, and such other events are not
independent. Pairs of countries with two prior interstate disputes will likely have a
different probability of having another dispute than pairs of countries with no such
prior experiences of having an interstate dispute. Stratification by event number is
an appropriate way to model repeated events to allow the hazard rate to vary by
event number.

Competing Risks
Competing risks extensions to the basic Cox model allow analysts to consider
different types of events. For example, a member of Congress may leave office due
to defeat in the primary election, defeat in the general election, running for higher
office, or retiring. Economic sanctions may end with either a sanctioning failure—a
sanctioner lifting sanctions without achieving desired policy goals—or a sanctioning
success—a target accepting the demand from the sanctioner. These events are
distinct, and the covariate effects are likely to differ. The usual modeling strategy
for a competing risks model is to estimate separate models for each event while
treating observations experiencing other events as censored. One drawback of this
approach is that it assumes independence between different types of events. Yet,
often, this assumption is questionable. For instance, the hazard rate of sanctioning
failure may depend on the hazard rate of sanctioning success. The competing risks
model does not assume a time order for events to occur.

Unobserved Heterogeneity
In an ideal world, researchers would include all relevant covariates in the statistical
model to estimate the precise effects of each covariate. But it is often not the case
that all the relevant variables are available. Some may be unmeasurable, and
others may be unobservable. The frailty model extension of the Cox model provides
one way to account for unobserved heterogeneity across observations. The frailty
term can be thought of as an additional factor that measures how prone or “frail”
observations are to experiencing the event. The frailty can be modeled as either a
group- or individual-specific term.

Event Dependence
The conditional frailty model separates and accounts for both event dependence
and heterogeneity in repeated events models. Event dependence exists when the
occurrence of one event makes further events more or less likely. For example,
learning effects or damaging effects may make an event more or less likely to
occur. In short, the risk of an event may be a function of a prior event occurring.
Separating out event dependence and heterogeneity is helpful to analysts
methodologically in order to draw more accurate substantive conclusions.

Spatial Dependence
Finally, recent extensions incorporate spatial dependence into event history models.
This is an exciting development because it offers an effective approach to account
for spatial dependence in political event processes. The model allows for spatially
correlated random effects at neighboring locations. For example, scholars have
examined the effect of nearby civil unrest on the timing of outbreaks of additional
violence, and the diffusion of policy adoption across states on the timing of those
policy adoptions.

—Janet M. Box-Steffensmeier

—Byungwon Woo

Further Readings
Blossfeld, H., & Rohwer, G. (2002). Techniques of event history modeling: New
approaches to causal analysis (2nd ed.). Mahwah, NJ: Lawrence Erlbaum
Associates.

Box-Steffensmeier, J. M., De Boef, S., & Joyce, K. (2007). Event dependence and
heterogeneity in duration models: The conditional frailty model. Political Analysis,
15(3), 237–256.

Box-Steffensmeier, J. M., & Jones, B. S. (2004). Event history modeling: A guide
for social scientists. Cambridge, UK: Cambridge University Press.

Darmofal, D. (2009). Bayesian spatial survival models for political event
processes. American Journal of Political Science, 53(1), 241–257.

Hosmer, D. W., Lemeshow, S., & May, S. (2008). Applied survival analysis:
Regression modeling of time-to-event data (2nd ed.). Hoboken, NJ:
Wiley-Interscience.

Kleinbaum, D. G., & Klein, M. (2005). Survival analysis: A self-learning text (2nd
ed.). New York: Springer.

Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling
change and event occurrence. Oxford, UK: Oxford University Press.

Therneau, T., & Grambsch, P. (2000). Modeling survival data: Extending the Cox
model. New York: Springer-Verlag.

Entry Citation:

Box-Steffensmeier, Janet M., and Byungwon Woo. "Event History Analysis." International
Encyclopedia of Political Science. Ed. Bertrand Badie, Dirk Berg-Schlosser, and Leonardo
Morlino. Thousand Oaks, CA: SAGE, 2011. 856–861. SAGE Reference Online. Web. 23 Feb. 2012.
