Handout 7 PDF
1 The data
The data is from an experimental study of recidivism of 432 male prisoners, who were observed for a year after
being released from prison. Half of the prisoners were randomly given financial aid when they were released.
The main focus of the study was whether financial aid kept the prisoners from being rearrested.
- week: week of arrest after release, or censoring time.
- arrest: the event indicator, equal to 1 for those arrested during the period of the study and 0 for those
who were not arrested.
- fin: a dummy variable, equal to 1 if the individual received financial aid after release from prison, and 0
if he did not; financial aid was a randomly assigned factor manipulated by the researchers.
- age: in years at the time of release.
- race: a dummy variable coded 1 for blacks and 0 for others.
- wexp: a dummy variable coded 1 if the individual had full-time work experience prior to incarceration
and 0 if he did not.
- mar: a dummy variable coded 1 if the individual was married at the time of release and 0 if he was not.
- paro: a dummy variable coded 1 if the individual was released on parole and 0 if he was not.
- prio: number of prior convictions.
- educ: education, a categorical variable, with codes 2 (grade 6 or less), 3 (grades 6 through 9), 4 (grades
10 and 11), 5 (grade 12), or 6 (some post-secondary).
- emp1 - emp52: dummy variables coded 1 if the individual was employed in the corresponding week of
the study and 0 otherwise.
[Figure: proportion not rearrested (vertical axis, 0.70 to 1.00) against weeks since release (horizontal axis, 0 to 50).]
3 Model development
It is likely that we will have data for more covariates than we can reasonably expect to include in the model.
We must therefore decide on a method to select a subset.
1. We fit a multivariable model containing all variables that were significant in a univariable analysis at
the 20-25% level.
2. We use the p-values from the Wald statistics to remove variables from our model. We also confirm the
nonsignificance with a likelihood ratio test.
3. We check whether the removal has produced an important change in coefficients of other variables.
4. We check again all the variables that we removed.
5. We check for nonlinearity.
6. We look for interactions.
7. We check assumptions.
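Steps 2 and 3 above can be sketched numerically. The following is a minimal illustration, assuming a single variable is dropped (so the likelihood ratio statistic has 1 degree of freedom) and using made-up log-likelihoods and coefficients; the function names `lr_test_1df` and `pct_change` are hypothetical and the numbers are not results from the handout.

```python
import math

def lr_test_1df(loglik_full, loglik_reduced):
    """Likelihood ratio test for dropping ONE coefficient (1 degree of freedom)."""
    g = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # For a chi-square variable with 1 df, P(X > g) = erfc(sqrt(g / 2)).
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value

def pct_change(coef_full, coef_reduced):
    """Percent change in a remaining coefficient after another variable is removed."""
    return 100.0 * (coef_reduced - coef_full) / coef_full

# Hypothetical numbers, for illustration only.
g, p = lr_test_1df(loglik_full=-650.0, loglik_reduced=-650.5)
change = pct_change(coef_full=-0.40, coef_reduced=-0.43)
print(f"G = {g:.2f}, p = {p:.3f}, coefficient change = {change:.1f}%")
```

A large p-value supports removing the variable, while a sizeable percent change in another coefficient suggests the removed variable may be worth keeping as an adjustment. What counts as sizeable is a judgment call that the handout does not specify.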
C = W + (p − 2q),
where W = W(p) − W(p − q), W(p) is the Wald test statistic for the model containing all p variables, and
W(p − q) is the Wald test statistic for the subset model containing p − q of them.
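As a worked example of the arithmetic, the sketch below evaluates C for one candidate subset. The Wald statistics are invented for illustration, and q is read here as the number of variables dropped from the full model, which is an interpretation of the W(p − q) notation rather than something the handout states outright.

```python
def c_statistic(wald_full, wald_subset, p, q):
    """C = W + (p - 2q), with W = W(p) - W(p - q)."""
    w = wald_full - wald_subset
    return w + (p - 2 * q)

# Hypothetical values: a full model with p = 7 variables, q = 2 of them dropped.
c = c_statistic(wald_full=32.0, wald_subset=30.5, p=7, q=2)
print(c)  # 4.5
```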
4 Model diagnostics
- PH assumption
We will see that the PH assumption fails for the variable age. We therefore look at i) an interaction between
age and time and ii) age as a stratification variable.
- Influential observations
For each covariate we look at how much the regression coefficients change if we remove one observation.
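The leave-one-out idea can be sketched as follows. This is a deliberately small illustration, not the handout's software: it assumes a Cox model with a single covariate, Breslow handling of ties, a plain Newton-Raphson fit, and toy data invented for the example. The names `cox_fit` and `leave_one_out_changes` are hypothetical.

```python
import math

def cox_fit(times, events, x, iters=25, tol=1e-10):
    """Newton-Raphson fit of a one-covariate Cox model (Breslow ties)."""
    beta = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue  # censored subjects contribute only through risk sets
            # Risk set: everyone still under observation at time t_i.
            risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
            s0 = sum(math.exp(beta * x[j]) for j in risk)
            s1 = sum(x[j] * math.exp(beta * x[j]) for j in risk)
            s2 = sum(x[j] ** 2 * math.exp(beta * x[j]) for j in risk)
            score += x[i] - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

def leave_one_out_changes(times, events, x):
    """Change in the coefficient when each observation is removed in turn."""
    full = cox_fit(times, events, x)
    changes = []
    for k in range(len(times)):
        keep = [i for i in range(len(times)) if i != k]
        beta_k = cox_fit([times[i] for i in keep],
                         [events[i] for i in keep],
                         [x[i] for i in keep])
        changes.append(full - beta_k)
    return full, changes

# Toy data, for illustration only.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 0, 1, 1, 0, 1, 1]
x = [0.5, -1.0, 0.3, 1.2, -0.4, 0.0, 0.8, -0.7]
beta_full, changes = leave_one_out_changes(times, events, x)
for k, delta in enumerate(changes):
    print(f"observation {k}: change in coefficient = {delta:+.3f}")
```

Observations whose removal shifts the coefficient far more than the others are candidates for closer inspection. Statistical packages usually report an approximation to these changes instead of refitting the model n times.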
- Checking for nonlinearity
The martingale residual for individual i at time t_i is
M̂_i = δ_i − Ĥ(t_i, x, β̂),
where δ_i is the event indicator, Ĥ(t_i, x, β̂) is the estimated cumulative hazard for that individual, and t_i is the time at the
end of follow-up.
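A minimal sketch of the residual calculation is given below. To stay self-contained it assumes a model with no covariates, so the cumulative hazard is the Nelson-Aalen estimate; in the handout's setting the cumulative hazard would instead come from the fitted Cox model. The helper names and the toy data are hypothetical.

```python
def nelson_aalen_at(times, events):
    """Cumulative hazard H(t_i) for each subject under a no-covariate model."""
    event_times = sorted({t for t, d in zip(times, events) if d})
    increments = {}
    for s in event_times:
        at_risk = sum(1 for t in times if t >= s)
        n_events = sum(1 for t, d in zip(times, events) if d and t == s)
        increments[s] = n_events / at_risk
    return [sum(h for s, h in increments.items() if s <= t) for t in times]

def martingale_residuals(events, cum_hazards):
    """Residual = observed event indicator minus estimated cumulative hazard."""
    return [d - h for d, h in zip(events, cum_hazards)]

# Toy data, for illustration only: week of arrest or censoring, and arrest flag.
times = [5, 8, 8, 12, 20, 52, 52]
events = [1, 1, 0, 1, 1, 0, 0]
residuals = martingale_residuals(events, nelson_aalen_at(times, events))
for t, d, r in zip(times, events, residuals):
    print(f"week={t:2d} arrest={d} residual={r:+.3f}")
```

Plotting residuals like these against a covariate left out of the model, with a smoother, is one common way to judge whether that covariate should enter linearly.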
The time-dependent employment variable has an apparently huge effect. The hazard of rearrest is multiplied by a
factor of 0.265 (a decline of 73.5%) while an individual is employed.
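The conversion between a hazard ratio, its coefficient, and the percent change in hazard can be checked with a few lines. Only the value 0.265 comes from the text; the function name is hypothetical.

```python
import math

def percent_change_in_hazard(hazard_ratio):
    """Percent change in the hazard implied by a hazard ratio exp(beta)."""
    return 100.0 * (hazard_ratio - 1.0)

hr = 0.265                 # hazard ratio for being employed, from the text
beta = math.log(hr)        # corresponding Cox coefficient, about -1.328
print(round(percent_change_in_hazard(hr), 1))  # -73.5
```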
[Figure: two diagrams relating weekly employment to arrest at time t, one using employment at time t and one using employment at time t-1; label: "Ambiguous causality".]
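One way to use employment at time t-1 is to expand each subject into one row per week at risk and attach the previous week's employment indicator. The sketch below assumes this start/stop layout and a list `emp` holding emp1, emp2, ... in order; the function name and the example subject are hypothetical.

```python
def lagged_employment_rows(week, arrest, emp):
    """Expand one subject into weekly rows with employment lagged by one week.

    emp[k] is the 0/1 employment indicator for week k + 1 (emp1, emp2, ...).
    """
    rows = []
    for j in range(2, week + 1):
        rows.append({
            "start": j - 1,
            "stop": j,
            # The event is recorded only in the subject's final week.
            "arrest": 1 if (arrest == 1 and j == week) else 0,
            "employed_lag1": emp[j - 2],  # employment in week j - 1
        })
    return rows

# Hypothetical subject: arrested in week 4, employed in weeks 1 and 2 only.
for row in lagged_employment_rows(week=4, arrest=1, emp=[1, 1, 0, 0]):
    print(row)
```

Because week 1 has no preceding week, rows start at week 2, and a subject arrested in week 1 contributes no rows under this layout.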
6 Final model
After we introduced weekly employment into our model, the marriage variable became nonsignificant.
We therefore remove it. We also choose to use age as a stratification variable for ease of interpretation.
This means that subjects who received financial aid are arrested at a rate 29% lower than that of those who did
not.