
Residual Analysis Using R

For this exercise we will be using the following data

Data on the recurrence times to infection, at the point of insertion of the catheter, for kidney
patients using portable dialysis equipment. Catheters may be removed for reasons other than
infection, in which case the observation is censored. Each patient has exactly 2 observations.
(For the time being, we will ignore the fact that the two observations per patient are not independent.)

Format

patient: id
time: time
status: event status
age: in years
sex: 1=male, 2=female
disease: disease type (0=GN, 1=AN, 2=PKD, 3=Other)
frail: frailty estimate from original paper

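1. Load the survival package. It provides the data set "kidney" as well as the functions coxph(), Surv(), survfit() and cox.zph() used below.

>library(survival)

>data(cancer, package = "survival")

(kidney is lazy-loaded with the package, so it is available as soon as library(survival) has run. In recent versions of survival, data("kidney") may warn that the data set is not found, which is why data(cancer, ...) is used here.)

View the data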

>head(kidney)

2. Create the Cox proportional hazards model with age, sex and disease as covariates

>kfit0 <- coxph(Surv(time, status)~ age + sex + disease, kidney)


3. Create the residuals

Martingale Residuals

>martingaleres = resid(kfit0,type="martingale")

Cox-Snell Residuals (event status minus the martingale residual)

>coxsnellres=kidney$status-resid(kfit0,type="martingale")

The type argument of resid() selects the kind of residual. The choices include

"martingale", "deviance", "score", "schoenfeld", "dfbetas", "scaledsch"

Create the residuals for the last five types.
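
As a starting point for this exercise, the remaining residual types could be created as in the sketch below. The object names (devres, scoreres, schres, dfbres, sschres) are only illustrative and are not used elsewhere in this handout. The model is refitted so that the block runs on its own.

```r
library(survival)  # provides coxph(), Surv() and the kidney data

# Refit the model from step 2 so this block is self-contained
kfit0 <- coxph(Surv(time, status) ~ age + sex + disease, data = kidney)

devres   <- resid(kfit0, type = "deviance")    # one value per observation
scoreres <- resid(kfit0, type = "score")       # one row per observation
schres   <- resid(kfit0, type = "schoenfeld")  # one row per event
dfbres   <- resid(kfit0, type = "dfbetas")     # one row per observation
sschres  <- resid(kfit0, type = "scaledsch")   # one row per event

# Compare the shapes: per-observation versus per-event residuals
length(devres)
dim(scoreres)
dim(schres)
```

Note that the Schoenfeld-type residuals are defined only at event times, so they have fewer rows than the data set.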

3. Assessing the overall model fit with Cox-Snell residuals.

Compute the Kaplan-Meier estimate for the Cox-Snell residuals, using the original censoring indicators.
Then plot -log(S(t)) against the residuals (stored in KMfit$time).

>mysurv = Surv(coxsnellres,kidney$status)

>KMfit = survfit(mysurv ~1)

>plot(-log(KMfit$surv) ~ KMfit$time)

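If the points fall close to a straight line with slope = 1 and y-intercept = 0, the model is a good fit. The reason is that Cox-Snell residuals from a correct model behave like a censored sample from a unit exponential distribution, whose cumulative hazard is H(t) = t.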

Is this a good fit?
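
To make the comparison easier, the same plot can be drawn with the reference line added. This is a minimal sketch, not part of the original exercise: the names keep and cumhaz are illustrative, and points where the Kaplan-Meier estimate reaches 0 are dropped because -log(0) is infinite.

```r
library(survival)

# Refit the model and rebuild the Cox-Snell residuals so the block stands alone
kfit0 <- coxph(Surv(time, status) ~ age + sex + disease, data = kidney)
coxsnellres <- kidney$status - resid(kfit0, type = "martingale")

# Kaplan-Meier estimate treating the Cox-Snell residuals as the "times"
KMfit <- survfit(Surv(coxsnellres, kidney$status) ~ 1)

# Estimated cumulative hazard of the residuals, excluding S = 0
keep <- KMfit$surv > 0
cumhaz <- -log(KMfit$surv[keep])

plot(KMfit$time[keep], cumhaz,
     xlab = "Cox-Snell residual", ylab = "Estimated cumulative hazard")
abline(0, 1, lty = 2)  # reference line: intercept 0, slope 1
```

Departures from the dashed line are usually largest in the right tail, where few observations remain, so they should be judged with that in mind.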

4. Looking for outliers and influential points.

a. Plot Martingale residuals vs observation number.

>index = 1:nrow(kidney)

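>plot(martingaleres~index)

Martingale residuals can be at most 1 and have no lower bound. A very negative value marks an outlier (the patient went much longer without an event than the model predicts). A value close to 1 marks an unusual observation (the event happened much earlier than the model predicts).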

b. Plot deviance residuals vs observation number.

Analyze for outliers: look for values far away from the rest, or values whose absolute value is
greater than about 2 (a rule of thumb, to be applied within reason).
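
No code is given for this part, so here is one way it might be done. The sketch assumes the same model as in step 2; the names devianceres and flagged are illustrative, and the cutoff of 2 is the rule of thumb mentioned above rather than a formal test.

```r
library(survival)

kfit0 <- coxph(Surv(time, status) ~ age + sex + disease, data = kidney)
devianceres <- resid(kfit0, type = "deviance")
index <- seq_len(nrow(kidney))

plot(index, devianceres,
     xlab = "Observation number", ylab = "Deviance residual")
abline(h = c(-2, 2), lty = 2)  # rule-of-thumb bounds

# Rows outside the bounds, for closer inspection
flagged <- which(abs(devianceres) > 2)
kidney[flagged, ]
```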

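c. Influential points: use dfbetas. The result has one row per observation and one column per coefficient of kfit0; the plot below uses column 4, and the other columns can be checked the same way.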

>dfbetares = resid(kfit0,type="dfbetas")

>plot(dfbetares[,4]~index,type = "h")
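
Since there is one dfbetas column per coefficient, it may be convenient to look at all of them at once. The following sketch labels the columns with the coefficient names and draws one panel each; the 2 x 3 panel layout is an assumption that fits the five coefficients of this model.

```r
library(survival)

kfit0 <- coxph(Surv(time, status) ~ age + sex + disease, data = kidney)
dfbetares <- resid(kfit0, type = "dfbetas")
colnames(dfbetares) <- names(coef(kfit0))  # label columns by coefficient
index <- seq_len(nrow(kidney))

# One panel per coefficient
op <- par(mfrow = c(2, 3))
for (nm in colnames(dfbetares)) {
  plot(index, dfbetares[, nm], type = "h",
       xlab = "Observation number", ylab = "dfbetas", main = nm)
  abline(h = 0)
}
par(op)

# Observation with the largest absolute dfbetas for each coefficient
apply(abs(dfbetares), 2, which.max)
```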

5. Assessing the functional form of covariates (numerical variables)

Create the null model (model with no variables)

>kfitnull <- coxph(Surv(time, status)~1,kidney)

Find the Martingale residuals of the null model (note that this overwrites the martingaleres object created in step 3)

>martingaleres = resid(kfitnull,type="martingale")

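Plot the Martingale residuals vs the variate and overlay a loess line. A roughly straight loess line suggests the variate can enter the model linearly; a clear curve suggests a transformation.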

>plot(martingaleres~kidney$age)

>lw1=loess(martingaleres ~ kidney$age)

The variable age must be put in ascending order before drawing the line. Otherwise, lines() connects the fitted values in data order and the loess line zigzags across the plot.

>j = order(kidney$age)

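>lines(kidney$age[j],lw1$fitted[j])

Create a new variate called fakevariate using the rnorm() function, which draws random numbers
from a normal distribution, here with mean 35 and sd 18. Run the code several times to see how the loess line behaves for a variate that has no relationship with the outcome.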

>fakevariate = rnorm(length(kidney$age),35,18)

>plot(martingaleres~fakevariate)

>lw1=loess(martingaleres ~ fakevariate)

>j = order(fakevariate)

>lines(fakevariate[j],lw1$fitted[j])
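
Because the same plot is drawn for age and for fakevariate, the steps can be wrapped in a small function. The sketch below is one possible way to do this; plot_with_loess and nullres are hypothetical names introduced here, and the seed is arbitrary and only there to make one run reproducible.

```r
library(survival)

# Martingale residuals of the null model (no covariates)
kfitnull <- coxph(Surv(time, status) ~ 1, data = kidney)
nullres <- resid(kfitnull, type = "martingale")

# Hypothetical helper: scatter plot of residuals against x with a loess curve
plot_with_loess <- function(x, res, xlab) {
  plot(x, res, xlab = xlab, ylab = "Martingale residual (null model)")
  fit <- loess(res ~ x)
  j <- order(x)                # draw the curve from left to right
  lines(x[j], fitted(fit)[j])
  invisible(fit)
}

op <- par(mfrow = c(1, 2))
plot_with_loess(kidney$age, nullres, "age")
set.seed(1)                    # arbitrary seed, remove it to vary the draw
fakevariate <- rnorm(nrow(kidney), mean = 35, sd = 18)
plot_with_loess(fakevariate, nullres, "fakevariate")
par(op)
```

Placing the two panels side by side makes it easier to judge whether the pattern seen for age is stronger than what pure noise produces.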

6. Assessing the Proportional Hazards Assumption

a. Method 1: Plot log(-log(S(t))) vs log(Time)


Choose a categorical variable to plot by, usually treatment. We will use disease. Pass the
argument fun="cloglog" when plotting the survfit object to generate this. Make sure you put cloglog in quotes!

>plot(survfit(Surv(kidney$time, kidney$status)~kidney$disease),fun="cloglog",
      ylab = "log(-log(Survival))",xlab="log(time)",col = c("red","black","blue","purple"))

Under proportional hazards the curves should be roughly parallel. If the curves cross a lot, the assumption is likely violated.
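
With four curves it helps to know which colour belongs to which disease group. The sketch below adds a legend to the same plot; kmdisease and cols are illustrative names, and the legend position is an arbitrary choice.

```r
library(survival)

# Kaplan-Meier curves by disease type
kmdisease <- survfit(Surv(time, status) ~ disease, data = kidney)
cols <- c("red", "black", "blue", "purple")

plot(kmdisease, fun = "cloglog", col = cols,
     xlab = "log(time)", ylab = "log(-log(Survival))")

# The strata names are in the same order as the plotted curves
legend("topleft", legend = names(kmdisease$strata), col = cols, lty = 1)
```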

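b. Method 2: Plot Schoenfeld residuals for each variable vs time.

The cox.zph() function returns a table with test statistics and p-values to help assess the proportional hazards assumption; small p-values suggest a violation. Plotting the result shows the scaled Schoenfeld residuals against time with a smoothed curve, which should be roughly flat if the assumption holds.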

>schoenfeldres = cox.zph(kfit0)

>schoenfeldres

>plot(schoenfeldres)
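
The table can also be inspected programmatically. This is a sketch under two assumptions: the 5% threshold is a conventional choice rather than part of the exercise, and the exact rows and columns of the table depend on the installed version of survival (the p column and the GLOBAL row are used here). The name zph is illustrative.

```r
library(survival)

kfit0 <- coxph(Surv(time, status) ~ age + sex + disease, data = kidney)
zph <- cox.zph(kfit0)

zph$table                  # test statistics and p-values, plus a GLOBAL row
pvals <- zph$table[, "p"]
pvals[pvals < 0.05]        # entries flagged at the assumed 5% level

# Smoothed residual plots, several panels on one page
op <- par(mfrow = c(1, 3))
plot(zph)
par(op)
```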
