Uploaded by Franz Eigner

Six candidate markers were evaluated for their ability to predict survival of breast cancer patients. For simplicity, the expression values of these markers were dichotomized (above median, at or below median). Univariable and multivariable Cox regressions were used to examine associations between gene expression of PMP22 and the analyzed survival times, which were disease-free survival, overall survival and tumor specific survival. The analyses showed that the significant effect of PMP22 in multivariable Cox regressions seemed to be due to interactions with pN and, at least for overall survival, with pT. The statistical software R was used for the analyses.

Attribution Non-Commercial (BY-NC)


Finding new markers for predicting breast cancer survival

Author

Field of study (as per student record sheet): Statistics

Supervisor: Ao.Univ.-Prof.Mag.Dr. Georg Heinze

Statistical Consulting SS 08

Table of contents

1 Introduction
2 Description of Data
3 Methods
4 Survival Analysis
5 Literature

1 Introduction

In survival analysis, breast cancer survival is usually predicted using clinical variables and gene expressions as independent factors. This paper aims to find new gene expressions as prognostic markers for breast cancer survival. On behalf of Prof. Dan Cacsire Castillo-Tong and Prof. Zeilinger, six gene expressions (DDR2, EMP1, EMP2, EMP3, PMP22 and MKI67) were chosen as candidate markers and correlated with disease-free survival, overall survival and tumor specific survival of the patients. A candidate marker that proves satisfactory in our analyses could be used to develop a new score to improve prognosis for breast cancer.

2 Description of Data

A total of 250 breast cancer patients from the Department of Obstetrics and Gynaecology, Medical University of Vienna, were included in this study. Dates of diagnosis range from 03.03.1987 to 30.11.2001. As clinical variables, histologic type (HISTOTYPE), tumor size (pT), degree of spread to lymph nodes (pN), tumor grade (G) and age (AGE) were chosen. In addition to the six gene expressions described in the introduction, the gene expression of estrogen receptor (ER) was also quantified in the tumor tissues of the patients. The analyses were performed with the open-source statistical software package R (R Dev. Core Team, 2008) and the commercial SAS® software package (SAS Institute Inc., 2008).

Simple plausibility tests were performed to check the data. Patient 7763 was excluded from the data set because the results of all gene expressions were missing. Patient 7363 was removed because the date of recurrence was unknown. This left 248 patients for analysis.

To check the distribution of the gene expressions graphically, histograms were computed. The data were transformed by taking the logarithm to base 2 to obtain an approximately symmetric, normal-like distribution.

[Figure: histograms (y-axis: density) of the gene expression values, eight panels.]

To show the importance of the log transformation, the untransformed values of the gene expression MKI67 are also plotted. Afterwards, gene expressions were transformed into dichotomous variables by splitting at the median.
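The two preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the original analysis code: the function name and the expression values are hypothetical, and values equal to the median are assigned to the lower group, matching the "lower or equal" grouping used later in the text.

```python
import math
from statistics import median

def dichotomize_by_median(values):
    """Log2-transform the values and flag those above the median (1) or not (0)."""
    logged = [math.log2(v) for v in values]
    cut = median(logged)
    groups = [1 if x > cut else 0 for x in logged]
    return logged, cut, groups

# Hypothetical relative expression values (not taken from the study data).
expression = [0.05, 0.12, 0.25, 0.5, 1.0, 4.0]
logged, cut, groups = dichotomize_by_median(expression)
print("median of log2 values:", cut)
print("group per patient:", groups)
```

Because the logarithm is monotone, splitting at the median gives the same groups before and after the transformation; the transformation matters for the histograms, not for the dichotomization.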

Histologic type                       n
Invasive ductal carcinoma (IDC)     182
IDC and ILC                           6
Invasive lobular carcinoma (ILC)     40
Medullary                             5
Mucinous                              6
Unknown                               9
Total                               248

For the analysis of survival times, groupings were necessary because of the low number of cases in some subgroups. Concerning histologic type, the category IDC and ILC was classified as IDC, because for survival the more serious diagnosis is the relevant one. Medullary, Mucinous and Unknown were combined into a new category, Others and Unknown.

Tumor size      n
Mic             1
pT I           64
pT II         127
pT III         23
pT IV          14
Unknown        19
Total         248

The patient with the category Mic was assigned to pT I. For the analysis, pT III and pT IV were pooled and compared with the groups pT I and pT II.

Nodal status      n
pN0              95
pN1             123
Unknown          30
Total           248

Differentiation grade      n
G1                        34
G2                       122
G3                        71
Unknown                   21
Total                    248

Recurrence of disease        n
Recurrence of disease      109
No evidence of disease     139
Total                      248

Survival                            n
Alive at last observation         152
Dead at last observation           96
  Death as a result of disease     71
  Death of other cause             16
  Death of unknown cause            9
Total                             248

Years 27.8 48.0 58.1 69.4 89.6

For analysis, patients were divided into those aged 50 years or younger and those older than 50 years, since menopause usually starts around this age. At the time of diagnosis, 31% of all patients were aged 50 years or younger and 69% were older than 50 years.

Correlations

Gene expression values were grouped into values lower than or equal to the median and values greater than the median, and then compared between groups defined by histopathologic data using the χ² test. PMP22 seems to be strongly associated with differentiation grade (G) and nodal status (pN). Gene expression of EMP2 seems to be strongly associated with G, pT and pN. A strong association was also found between ER and G. Furthermore, no significant difference in the expression level of PMP22 was observed between patients aged 50 years or younger and patients older than 50 years. Additionally, correlations between gene expressions were estimated with Spearman's nonparametric correlation coefficient. The gene expressions are in general markedly correlated with each other.
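For a binary clinical variable such as pN, the χ² comparison of the two expression groups reduces to a 2x2 table. The following sketch uses hypothetical counts and Pearson's statistic without continuity correction; whether the original analysis applied a correction is not stated, so that choice is an assumption.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: rows = expression <= median / > median,
# columns = pN0 / pN1. Not taken from the study data.
stat, p_value = chi_square_2x2([[30, 20], [15, 35]])
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

Variables with more than two levels, such as G, need the general r x c form of the test with (r-1)(c-1) degrees of freedom.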


As is typical in clinical and epidemiological studies, survival times are censored because of a type I time restriction (Lagakos, 1979): the study continues until a prespecified time point (cut-off point). The date of the event of interest is known precisely only for those subjects who experience the event before the cut-off point. For the remaining subjects, it is only known that the time to the event is greater than the observation time. This is referred to as “administrative censoring”, and the incomplete data are called “right censored”. Besides the time restriction, incomplete data can also arise from patients who are lost to follow-up or drop out of the study.

Disease-free survival (DFS) was defined as the time from date of diagnosis to date of recurrence of disease (event) or, in case of no recurrence, to date of last gynecological examination (censored).

Overall survival (OS) was defined as the time from date of diagnosis to date of death (event) or, in case of no death, to date of last observation (censored).

Tumor specific survival (TS) follows the definition of OS, except that patients who died of causes unrelated to breast cancer were also treated as censored.
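The endpoint definitions above translate into a (time, event indicator) pair per patient. The sketch below is illustrative only: the function names, the example dates and the conversion of days to years with 365.25 are assumptions, and the handling of deaths of unknown cause, which the text does not specify, is left to the caller through the `death_from_tumor` flag.

```python
from datetime import date

def survival_record(diagnosis, event_date, last_seen):
    """Return (time in years, event indicator) for one patient and one endpoint.

    event_date is None when the event was not observed; the patient is then
    censored at last_seen.
    """
    end = event_date if event_date is not None else last_seen
    years = (end - diagnosis).days / 365.25
    return years, int(event_date is not None)

def tumor_specific_record(diagnosis, death_date, death_from_tumor, last_seen):
    """TS: a death not caused by the tumor is censored at the date of death."""
    if death_date is not None and not death_from_tumor:
        return survival_record(diagnosis, None, death_date)
    return survival_record(diagnosis, death_date, last_seen)

# Hypothetical patient who died of an unrelated cause five years after diagnosis.
diagnosis, death = date(1990, 1, 1), date(1995, 1, 1)
os_record = survival_record(diagnosis, death, death)
ts_record = tumor_specific_record(diagnosis, death, False, death)
print("OS:", os_record)  # counted as an event
print("TS:", ts_record)  # same time, but censored
```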


Survival probabilities were estimated using the method of Kaplan and Meier (1958).

                           prob.   lower CL   upper CL   number at risk
5-year survival     DFS    0.639   0.578      0.707      122
                    OS     0.754   0.700      0.812      156
                    TS     0.810   0.759      0.863      156
10-year survival    DFS    0.490   0.420      0.570       50
                    OS     0.589   0.524      0.663       71
                    TS     0.658   0.591      0.732       71
15-year survival    DFS    0.322   0.207      0.501        5
                    OS     0.400   0.290      0.553        7
                    TS     0.547   0.421      0.712        7

prob. … probability to survive

The analyses show that the probability of recurrence within 5 years is about 36.1%, the probability of death about 24.6% and the probability of death on account of the tumor about 19%. Over 10 and 15 years, the probabilities of recurrence and death increase steadily: at an approximately constant rate for disease-free survival and overall survival, and at a diminishing rate for tumor specific survival.

[Figure: Kaplan-Meier curves, cumulative survival against time in years (0 to 14), three panels.]

Median survival time is estimated from the survival curve for DFS, OS and TS.

        median survival time (years)   number at risk
DFS      9.8                           54
OS      13.5                           21
TS       -                              -

The median survival time is the time beyond which 50% of the patients in the population under study are expected to survive. As is evident in the graphs above, the tumor specific survival curve does not fall below 50%. The last tumor-related death in the study occurs after 20.6 years, at a survival rate of 0.547.

Follow-up time was evaluated using the “reverse Kaplan-Meier” method (KM-PF) (Schemper and Smith, 1996), which is calculated in the same way as the Kaplan-Meier estimate of the survival function, but with the meaning of the status indicator reversed. “Thus death (δ = 1) censors the true but unknown observation time of an individual, and censoring (δ = 0) is an endpoint. The unobservable follow-up time of a deceased patient is interpreted as the follow-up time that potentially would have been obtained had that patient not died.” (Schemper and Smith, 1996, p. 344)
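The reversal of the status indicator can be sketched in a few lines. This is an illustration with hypothetical data, not the original computation; the helper names are made up, and the median is taken as the first time at which the estimated curve reaches 0.5 or less, which is one common convention.

```python
def km_median(times, status):
    """First time at which the Kaplan-Meier estimate is 0.5 or below, else None."""
    data = sorted(zip(times, status))
    n_at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [s for (u, s) in data if u == t]  # everyone observed at time t
        d = sum(tied)                            # endpoints at time t
        if d:
            surv *= (n_at_risk - d) / n_at_risk
            if surv <= 0.5:
                return t
        n_at_risk -= len(tied)
        i += len(tied)
    return None  # the curve never reaches 0.5

def median_follow_up(times, died):
    """Reverse Kaplan-Meier: censoring becomes the endpoint, death censors."""
    reversed_status = [1 - d for d in died]
    return km_median(times, reversed_status)

# Hypothetical observation times in years and death indicators.
times = [1.0, 2.5, 4.0, 6.0, 7.5, 9.0]
died = [1, 0, 1, 0, 0, 0]
print("median follow-up:", median_follow_up(times, died))
print("median survival:", km_median(times, died))
```

In this toy data set the survival curve never drops to 0.5, so no median survival time exists, which mirrors the situation reported for tumor specific survival.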

Follow-up distribution

        time in years   number at risk
50%      9.8            74
25%     12.5            34

[Figure: proportion followed up (0.0 to 1.0) against time in years (0 to 20).]

3 Methods

The Kaplan-Meier estimator estimates the survival function from the survival data. It can be used to measure the survival probability for a certain amount of time after biopsy. The value of the survival function between successive distinct observed event times is assumed to be constant. For simplicity, explanations are restricted to the case where the event of interest is death.

Let S(t) be the probability that an individual from a given population will have a lifetime exceeding t. For a sample of size N from this population, let the observed times until death of the N sample members be

$$t_1 \le t_2 \le t_3 \le \dots \le t_N$$

Let T be the random variable that measures the time of death and let F(t) be its cumulative distribution function. Then the survival function is given by:

$$S(t) = P(T > t) = 1 - F(t)$$

The Kaplan-Meier estimator is the nonparametric maximum likelihood estimate of the survival function S(t). It is of the form

$$\hat{S}(t) = \prod_{t_i \le t} \frac{n_i - d_i}{n_i}$$

where $n_i$ is the number "at risk" just prior to time $t_i$, and $d_i$ is the number of deaths at time $t_i$. With censoring, $n_i$ is the number of survivors less the number of losses (censored cases). Only those surviving cases that are still being observed are "at risk" of an (observed) death.
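The product formula can be made concrete with a small sketch. The function name and the data are hypothetical; patients censored at an event time are counted as still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Return [(t_i, S(t_i))] at each distinct event time.

    times  : observed times (event or censoring)
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose observed time equals t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= (n_at_risk - deaths) / n_at_risk  # factor (n_i - d_i) / n_i
            curve.append((t, surv))
        n_at_risk -= leaving  # deaths and censored cases leave the risk set
    return curve

# Hypothetical data: six patients, two of them censored (event = 0).
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.3f}")
```

Censored observations never produce a step in the curve; they only reduce the number at risk for later event times.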


Cox regression models are a sub-class of survival models in statistics. They are used in this paper in the analysis of censored survival data to identify differences in survival due to prognostic factors. The basic model assumes that the hazard function for failure time T for an individual i with covariate vector $x_i' = (x_{1i}, x_{2i}, \dots, x_{ki}, \dots, x_{Ki})$ is

$$h(t, x_i) = h_0(t) \exp(\beta' x_i) \quad \text{for } i = 1, \dots, N$$

The covariates are assumed to be constant in time and to have independent effects on the hazard rate. The first part, $h_0(t)$, is a function of time and is assumed to be the same for all subjects. Its form is not specified by the Cox model. The second part depends on the individual covariate vector, where β is the unknown vector of effect parameters which has to be estimated. Cox (1972) introduced a method for estimating β, and hence the hazard ratio, without having to involve $h_0(t)$, by using partial likelihoods. Although $h_0(t)$ can take any form, the hazard ratio between two individuals can be calculated independently of $h_0(t)$:

$$\frac{h(t, x_1)}{h(t, x_2)} = \frac{h_0(t) \exp(\beta' x_1)}{h_0(t) \exp(\beta' x_2)} = \exp[\beta'(x_1 - x_2)]$$

The formula underlines the proportional hazards assumption, which means that the failure rates of any two individuals are proportional, i.e. their ratio does not depend on time. Although the risk of dying can vary over time, the risk ratio between two individuals is constant over the whole range of follow-up. $h_0(t)$ can be interpreted as the hazard function of a subject with all covariates equal to zero.

A second crucial assumption follows from the exponential function linking the covariates to the hazard. It leads to a multiplicative effect of a covariate on the hazard or, in terms of the logarithm of the hazard function, to an additive effect in the form of a constant distance over time. This assumption will later be relaxed by using interaction terms.
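For reference, the partial likelihood mentioned above can be written out as follows. This is a sketch for the case without tied event times, in notation introduced here rather than taken from the original: $\delta_i$ is the event indicator of individual i and $R(t_i)$ is the set of individuals still at risk just before $t_i$.

$$L(\beta) = \prod_{i:\, \delta_i = 1} \frac{\exp(\beta' x_i)}{\sum_{j \in R(t_i)} \exp(\beta' x_j)}$$

Each factor compares the hazard of the individual who fails at $t_i$ with the summed hazards of everyone at risk at that time. The baseline hazard $h_0(t_i)$ appears in both numerator and denominator and cancels, which is why β can be estimated without specifying it.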


4 Survival Analysis

The association of gene expression groups with survival times was assessed by estimating survival curves with the method of Kaplan and Meier (1958). The curves were compared by the log-rank test of Mantel-Haenszel (1959), and differences were quantified by estimating relative risks (crude hazard ratios) from univariate Cox regression analyses (Cox, 1972), which are closely related to log-rank tests. In order to evaluate gene expressions as independent prognostic factors for DFS, OS and TS, multivariable Cox regression analyses were used in addition.
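The log-rank comparison of two groups can be sketched as below. It is a plain implementation of the standard two-group statistic with hypothetical data, not the routine used for the results in this paper; the function name is made up, and the p-value uses the chi-square distribution with one degree of freedom.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)               # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, group a
        d = sum(1 for u, e, _ in data if u == t and e == 1)    # events at t
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    stat = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square, 1 degree of freedom
    return stat, p_value

# Hypothetical survival times in years; all events observed.
stat, p_value = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")
```

The function assumes that both groups contribute to the risk set at some event time; otherwise the variance is zero and the statistic is undefined.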

Estimates of the survival curve for censored data using the Kaplan-Meier method, and the predicted survivor function for a Cox proportional hazards model, were computed with the function survfit in the R package “survival” (Therneau et al., 2008). Cox proportional hazards regression models were fitted with the function coxph from the same package. The Efron approximation (1977) is used for the calculation of parameter estimates instead of the more common Breslow method (1974), “as it is much more accurate when dealing with tied death times, and it is as efficient computationally” (Therneau et al., 2008).


Kaplan-Meier Curves are plotted for all gene expressions using disease-free survival.

[Figure: Kaplan-Meier curves of cumulative disease-free survival against time in years (0 to 14), by expression level (≤ median vs. > median), six panels: ddr2, emp1, emp2, emp3, pmp22, mki67.]

Comment: Plots were cut off at 180 months (15 years). Higher-than-median gene expression levels are shown by a solid line; expression levels lower than or equal to the median are shown by a dashed line.

Kaplan-Meier survival curves for disease-free survival show only small differences in survival times between low and high gene expression levels. These differences are not statistically significant at the 5% level in univariate Cox regressions, as shown in the next chapter.

Kaplan-Meier curves for disease-free survival are plotted for all clinical variables except G.

[Figure: Kaplan-Meier curves of cumulative disease-free survival against time in years (0 to 14), four panels: pN (pN0 vs. pN1), ER (≤ median vs. > median), age (≤ 50 years vs. > 50 years) and pT (pT1, pT2, pT3).]

Comment: Plots were cut off at 180 months (15 years).

Differences in survival times between the levels of pN and of AGE seem to be quite large. One has to keep in mind that ordinary Kaplan-Meier curves reflect the unadjusted analysis, whereas multivariable Cox regression yields an estimate of an effect adjusted for prognostic covariates.


Crude hazard ratios (relative risks) can be calculated by exponentiating the estimated coefficients of the univariate Cox regressions. A hazard ratio estimate of 1 means that the compared groups do not differ in terms of survival, whereas a value lower than 1, for instance, indicates lower risk for the group with higher-than-median gene expression.
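The step from a regression coefficient to a hazard ratio with confidence limits can be sketched as follows. The coefficient and standard error are hypothetical, and the Wald-type interval on the log scale is an assumption about how the intervals were formed, although it is the usual default.

```python
import math

def hazard_ratio(coef, se, z=1.96):
    """Return (HR, lower, upper): exp of the coefficient and of its Wald limits."""
    return (math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se))

# Hypothetical coefficient and standard error, for illustration only.
hr, lower, upper = hazard_ratio(coef=0.75, se=0.27)
print(f"HR = {hr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

A coefficient of 0 gives a hazard ratio of exactly 1, and the interval is asymmetric around the hazard ratio because it is symmetric on the log scale.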

               Crude HR (CI)     Adj. HR (CI)
ddr2    OS     1.0 (0.7-1.5)     1.3 (0.8-2.1)
        TS     1.1 (0.7-1.8)     1.5 (0.9-2.6)
EMP1    OS     1.1 (0.8-1.7)     1.2 (0.7-1.9)
        TS     1.1 (0.7-1.7)     1.3 (0.8-2.2)
EMP2    OS     1.0 (0.6-1.4)     0.9 (0.5-1.5)
        TS     1.0 (0.6-1.6)     1.2 (0.6-2.1)
EMP3    OS     1.1 (0.8-1.7)     1.5 (0.9-2.3)
        TS     1.1 (0.7-1.7)     1.6 (0.9-2.7)
mki67   OS     1.1 (0.7-1.6)     1.0 (0.6-1.5)
        TS     1.1 (0.7-1.8)     1.0 (0.6-1.7)
PMP22   OS     1.2 (0.8-1.8)     2.1** (1.2-3.6)
        TS     1.3 (0.8-2.1)     3.2** (1.7-6.0)

** p<0.01
* p<0.05
HR: Hazard Ratio
adj. HR: HR in the multivariable Cox regression, adjusted for the clinical variables and ER
CI: Confidence Intervals (95%)

A marker is only important if it adds information to the survival prediction. Therefore, we adjust our analyses for established markers, which can be clinical variables as well as gene expressions. It is known that variables are sometimes only significant if adjusted for other effects. Indeed, multivariable Cox regressions reveal a weakly significant, independent effect of DDR2 on disease-free survival time, adjusting for nodal status, differentiation grade, tumor size, age and ER gene expression. More impressive is the impact of PMP22 gene expression on all three survival outcomes. Patients with higher PMP22 expression had shorter disease-free, overall and tumor specific survival than patients with lower PMP22 expression. The other gene expressions did not show a significant effect on any survival time in the multivariable case. Following these results, we concentrated our analyses on the gene expression PMP22.

Disease-free survival

                        Crude HR   CI          Adj. HR (1)   CI
PMP22 expression        1.0        (0.7-1.5)   1.7*          (1.1-3.0)
Nodal Status            2.7**      (1.7-4.3)   2.5**         (1.5-4.0)
Differentiation Grade   1.1        (0.8-1.4)   1.1           (0.7-1.5)
Tumor Size              1.3        (1.0-1.8)   1.3           (1.0-1.8)
Age>50                  -          -           -             -
ER expression           0.6*       (0.4-0.9)   -             -

(1) stratified by ER≤median and ER>median

Overall survival

                        Crude HR   CI          Adj. HR   CI
PMP22 expression        1.2        (0.8-1.8)   2.1**     (1.2-3.6)
Nodal Status            2.4**      (1.5-3.9)   2.2**     (1.3-3.7)
Differentiation Grade   0.8        (0.6-1.1)   0.9       (0.6-1.3)
Tumor Size              1.6**      (1.1-2.1)   1.6**     (1.1-2.3)
Age>50                  0.8        (0.6-1.3)   0.9       (0.6-1.5)
ER expression           0.6*       (0.4-1.0)   0.4**     (0.3-0.7)

Tumor specific survival

                        Crude HR   CI          Adj. HR   CI
PMP22 expression        1.3        (0.8-2.1)   3.2**     (1.7-6.0)
Nodal Status            3.7**      (2.0-6.9)   3.3**     (1.7-6.3)
Differentiation Grade   1.0        (0.7-1.5)   1.1       (0.7-1.7)
Tumor Size              1.6*       (1.1-2.2)   1.6*      (1.1-2.4)
Age>50                  0.5**      (0.3-0.8)   0.6       (0.4-1.0)
ER expression           0.5**      (0.3-0.9)   0.4**     (0.2-0.7)

** p<0.01
* p<0.05
HR: Hazard Ratio
adj. HR: HR in the multivariable Cox regression, adjusted for the clinical variables and ER
CI: Confidence Intervals (95%)

The prognostic value of PMP22 for all three survival outcomes is shown in the tables above, together with histological data and age of patients. In the univariate Cox regressions, the level of PMP22 is not associated with any survival outcome, whereas in the multivariable Cox regression patients with a higher expression level of PMP22 had significantly poorer disease-free survival than those with a lower expression level (p=0.025). Patients with higher-than-median expression of PMP22 had a 1.7-fold (95% confidence interval, 1.1-3.0) higher risk of relapse than those with expression at or below the median. Similar results were obtained for overall survival (p=0.006) and, more impressively, for tumor specific survival (p<0.001). Patients with higher-than-median PMP22 expression had a 3.2-fold (95% confidence interval, 1.7-6.0) higher risk of dying on account of the tumor.

An even larger independent effect on breast cancer survival was confirmed for nodal status, adjusting for PMP22, tumor size, differentiation grade and age of patient at diagnosis. Patients with negative nodal status tended to experience much better survival than those with nodal involvement (DFS: p<0.001, OS: p=0.003, TS: p<0.001).

A larger tumor size was associated with poorer overall survival (p=0.007) and tumor specific survival (p=0.018). An adverse but not statistically significant impact of tumor size on disease-free survival was also found.

Older patients tended to have better overall and tumor specific survival, but this finding is not statistically significant in the adjusted case (p=0.670 and p=0.058, respectively). A higher-than-median gene expression of ER is associated with better overall survival and better tumor specific survival. The same holds for its effect on disease-free survival in the univariate case (p=0.021). As we show in the next chapter, in the multivariable (adjusted) case the effect of ER on disease-free survival seemed to vary with time, as revealed by a correlation of Schoenfeld residuals with time (p=0.003). Therefore we stratified the multivariable Cox regression by ER. The remaining histological variable, differentiation grade, did not show a significant prognostic value in either the univariate or the multivariable case.

Differences between crude and adjusted hazard ratios could be due to correlations between the examined variables. In this case these differences are quite large.

To determine whether a fitted Cox regression model adequately describes the data, one has to check its fundamental assumptions: (1) proportional hazards, (2) a multiplicative effect of the covariates on the hazard and (3) linearity in the relationship between the log hazard and the covariates. Extensions of the model that relax these assumptions are now presented. Assumption (1) can be relaxed by stratification, assumption (2) by using interaction terms between the covariates and assumption (3) by using natural splines.

The proportional hazards assumption is crucial for Cox regression and means that the ratio between the hazards of two patient groups remains constant over the complete follow-up period. This implies that in Cox regression analysis a single relative risk is computed, which should apply to all recurrence or death times. One way to formally detect violations of the proportional hazards assumption is to test the significance of an interaction of a covariate with time. A different approach is to test the slope of partial residuals, as proposed by Schoenfeld (1980, cited after Marubini/Valsecchi, 1995, p. 244). This approach has the advantage that one does not have to specify the form of the interaction term. By partitioning both the time axis and the space of the covariate values, mutually exclusive categories of failure times with associated covariates are formed. The method compares the number of events observed with the number expected under the Cox model in each of the “cells” produced by this partition.

The function cox.zph in the R package “survival” tests the proportional hazards assumption for a Cox regression model by using scaled Schoenfeld residuals. These are calculated following the method of Grambsch and Therneau (1993), because they reflect the log hazard ratio function better than ordinary Schoenfeld residuals and are, furthermore, on the scale of the regression coefficients. Residuals are weighted by Grambsch and Therneau's "average variance" method. In detail, each residual is scaled by premultiplying it by a time-dependent variance matrix to obtain estimates of time-varying coefficients.

Plots are produced by the cox.zph function. The time-dependent coefficient Beta(t) indicates how the effect of each covariate varies with time; the reported p-value tests whether the slope of the partial residuals over time differs from zero.

Disease-free survival

[Figure: plots of Beta(t) against time; recoverable panel titles: Beta(t) for G, Beta(t) for pN, Beta(t) for pT.]

“The solid line is a smoothing-spline fit to the plot, with the broken lines representing a ± 2-standard-error band around the fit. Systematic departures from a horizontal line are indicative of non-proportional hazards.” (Fox, 2002, p. 13)

         rho      p-value
PMP22     0.03    0.75
ER        0.33    <0.01
G        <0.01    0.98
pN       -0.19    0.07
pT        0.04    0.74

Overall survival

[Figure: plots of Beta(t) against time; recoverable panel titles: Beta(t) for G, Beta(t) for pN, Beta(t) for pT.]

         rho      p-value
PMP22     0.16    0.11
ER        0.04    0.74
Age       0.02    0.87
G         0.11    0.31
pN       -0.04    0.73
pT        0.14    0.25

Tumor specific survival

[Figure: plots of Beta(t) against time; recoverable panel titles: Beta(t) for G, Beta(t) for pN, Beta(t) for pT.]

         rho      p-value
PMP22     0.19    0.09
ER        0.12    0.35
Age      -0.06    0.65
G        <0.01    0.98
pN        0.07    0.62
pT        0.26    0.07

The assumption of proportional hazards appears to be supported for nearly all covariates for all survival times. There is strong evidence of non-proportional hazards only for ER in the disease-free survival analysis.

To correct for non-proportional hazards, a stratified Cox model is used. Stratification can be used if non-proportional hazards are detected for a variable and the variable is not of interest in itself. The model is extended by stratifying the data into subgroups, each identified by a level of the factor, and applying the model

$$h_m(t, x) = h_{0m}(t) \exp(\beta' x)$$

where the suffix m indicates the stratum ($m = 1, \dots, M$). This model assumes that individuals within the m-th stratum who have different covariates still have proportional hazards, but individuals in different strata are permitted to have non-proportional hazards, because each stratum has its own baseline hazard function.

An alternative is to incorporate interactions between covariates and time into the Cox regression model. Such interactions are themselves time-dependent covariates. However, stratification has the advantage that one does not have to assume a particular form of interaction between the stratifying covariates and time. A disadvantage of stratification is “the resulting inability to examine the effects of the stratifying covariates”; therefore “stratification is most natural […] when the effect of the stratifying variable is not of direct interest” (Fox, 2002, p. 14).

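Disease-free survival, without stratification

         HR     lower CL   upper CL   p-value
PMP22    1.83   1.1        3.0        0.015
pN       2.60   1.6        4.2        <0.001
pT       1.25   0.9        1.7        0.160
G        1.04   0.7        1.4        0.830
ER       0.49   0.3        0.8        0.003

Disease-free survival, stratified by ER

         HR     lower CL   upper CL   p-value
PMP22    1.73   1.1        2.8        0.025
pN       2.48   1.5        4.0        <0.001
pT       1.31   1.0        1.8        0.096
G        1.05   0.7        1.5        0.800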

The analyses show that stratifying by ER does not change the coefficients substantially. The hazard ratio of PMP22 fell from 1.8 to 1.7. It may be that the time-dependent effect of ER was not very large.

If covariates are introduced into a Cox model without an interaction term, they are assumed to act independently and multiplicatively on the hazard. The introduction of an interaction term relaxes this assumption. Because ignoring relevant interactions would lead to a misspecification of the model, one has to test for interaction terms. First, all interaction terms between PMP22 and the clinical variables and ER are analysed for all survival times.

p-value          DFS    OS     TS
PMP22 x pN       0.82   0.62   0.78
PMP22 x G        0.20   0.55   0.28
PMP22 x pT       0.24   0.03   0.13
PMP22 x age      -      0.29   0.65
PMP22 x ER       0.55   0.70   0.77

Only one interaction term between PMP22 and the other variables appears to be significant: PMP22 with tumor size in the overall survival analysis. This interaction is analyzed further by drawing Kaplan-Meier curves showing the interaction between PMP22 and pT. Keep in mind that these Kaplan-Meier curves are unadjusted for all the other histological variables and ER.

[Figure: Kaplan-Meier curves of cumulative overall survival against time in years (0 to 14), by PMP22 expression (≤ median vs. > median), four panels.]

Note that, once the curves are drawn separately by tumor size, differences between survival curves due to PMP22 become apparent, at least for pT=2 and pT=3. This is in contrast to the univariate case (graph on the lower right), in which there seems to be no significant difference.

The analysis is extended to the adjusted case by including the interaction term in the multivariable Cox regression for overall survival.

without interaction

            coef    exp(coef)   lower CL   upper CL   p-value
PMP22        0.75   2.13        1.24       3.63       0.01
pT           0.47   1.60        1.14       2.26       0.01
ER          -0.82   0.44        0.27       0.74       <0.01
Age         -0.11   0.90        0.55       1.46       0.67
G           -0.11   0.90        0.63       1.29       0.57
pN           0.79   2.21        1.32       3.68       <0.01

with interaction

            coef    exp(coef)   lower CL   upper CL   p-value
PMP22       -0.90   0.41        0.08       1.95       0.26
pT          -0.07   0.93        0.51       1.70       0.82
ER          -0.80   0.45        0.27       0.75       <0.01
Age         -0.21   0.81        0.50       1.33       0.41
G           -0.08   0.93        0.64       1.34       0.68
pN           0.84   2.31        1.38       3.86       <0.01
PMP22:pT     0.80   2.23        1.07       4.64       0.03

After inclusion of the interaction term PMP22:pT, the coefficient for PMP22 on its own became insignificant. To calculate the impact of PMP22 on overall survival, one now has to consider the interaction term too. For instance, to calculate the effect of PMP22 for pT=1, one has to add the interaction coefficient (0.80) to the coefficient of PMP22 (-0.90), which results in -0.10. Exponentiating this result gives the (adjusted) hazard ratio.

Overall survival

         coef    HR     lower CL   upper CL   p-value
pT=1     -0.10   0.90   0.4         2.3       0.830
pT=2      0.70   2.02   1.2         3.5       0.011
pT=3      1.51   4.51   1.8        11.1       0.001

HR … Hazard Ratio
CL … confidence limit (95%)

Because the interaction effect between PMP22 and pT is positive, the interaction term grows with tumor size, and with it the impact of PMP22 on overall survival. For pT=1, no effect of PMP22 can be detected. For pT=2 and pT=3, PMP22 has an increasing influence on overall survival.

Confidence intervals were calculated from Cox regressions in which pT was recentered at the level of interest, so that the interaction term vanishes at that level. For example, to obtain the confidence interval for the effect of PMP22 at pT=1, a Cox regression was fitted with pT replaced by pT-1. The interaction term is then zero for pT=1, so the estimated coefficient of PMP22 also absorbs the interaction effect and represents the whole effect of PMP22 at pT=1, together with its standard error. This follows the method of Figueiras et al. (1998) and is shown in the following for pT=1.
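Why recentering works can be written out explicitly. The following is a sketch in notation that does not appear in the report: β1, β2 and β3 are assumed names for the coefficients of PMP22, pT and PMP22:pT, c is the pT level of interest, and the remaining covariates are omitted because they are unaffected.

```latex
\begin{aligned}
\beta_1\,\mathrm{PMP22} + \beta_2\,\mathrm{pT} + \beta_3\,\mathrm{PMP22}\cdot\mathrm{pT}
  &= (\beta_1 + c\,\beta_3)\,\mathrm{PMP22}
     + \beta_2\,(\mathrm{pT} - c)
     + \beta_3\,\mathrm{PMP22}\cdot(\mathrm{pT} - c)
     + c\,\beta_2
\end{aligned}
```

The constant cβ2 is absorbed into the baseline hazard, so the refitted model reports β1 + cβ3 as the coefficient of PMP22, while the coefficients of pT and of the interaction stay the same. With the rounded estimates and c=1 this gives -0.90 + 0.80 = -0.10.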


pT_1 = pT - 1

This Cox regression delivers the impact of PMP22 with pT=1 on overall survival.

              coef    exp(coef)  lower CL  upper CL  p-value
PMP22        -0.1     0.9        0.36      2.27      0.83
pT_1         -0.1     0.9        0.51      1.70      0.82
ER           -0.8     0.4        0.27      0.75      <0.01
age>50       -0.2     0.8        0.50      1.33      0.41
G            -0.1     0.9        0.64      1.34      0.68
pN            0.8     2.3        1.38      3.86      <0.01
PMP22:pT_1    0.8     2.2        1.08      4.64      0.03
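As a plausibility check, the printed confidence limits and p-value of the PMP22 row can be related to each other. The sketch below assumes that the limits are Wald intervals (coefficient plus or minus 1.96 standard errors on the log scale), which the report does not state explicitly. The function name is hypothetical.

```python
import math


def wald_from_ci(coef, lower, upper, z_crit=1.96):
    """Recover SE, z statistic and two-sided p-value from a coefficient and the
    95% confidence limits of its hazard ratio (Wald interval assumed)."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# PMP22 row of the recentered regression (pT_1 = pT - 1), values as printed.
se, z, p = wald_from_ci(coef=-0.1, lower=0.36, upper=2.27)
print(f"SE={se:.2f}  z={z:.2f}  p={p:.2f}")
```

Under this assumption the recovered p-value rounds to 0.83, in line with the value printed for PMP22.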

Here the effect of PMP22 is clearly not significant. Evaluating the other levels of pT in the same way, however, yields a significant effect of PMP22 for pT=2 (p-value = 0.011, 95% CL 1.2-3.5) and for pT=3 (p-value = 0.001, 95% CL 1.8-11.1). For the latter, the effect may also be positively correlated with time, as indicated by significant Schoenfeld residuals (p-value < 0.001). This would mean that the impact of PMP22 for pT=3 is larger at later observation times. The survival curves differ for pT=2 and pT=3, and a positive correlation with time can be seen for pT=3.

TS       Correlation coefficient (rho)   Schoenfeld Residual (p-value)
PMP22     0.37                           <0.001
ER        0.05                            0.624
Age      -0.05                            0.673
G         0.12                            0.252
pN       <0.01                            1.000
pT       -0.26                            0.019

[Figure: plot against Time accompanying the Schoenfeld residual table; y-axis ticks from -15 to 10.]


Because PMP22 and pN are highly correlated (see Chapter 2.2), Kaplan-Meier curves are computed to analyze the confounding between PMP22 and pN.

[Figure: Kaplan-Meier curves of cumulative survival (0.0 to 1.0) against time (0 to 14), one row of panels per survival outcome. Left panels: PMP22 (≤ median, > median) combined with nodal status (pN0, pN1). Right panels: PMP22 ≤ median versus > median.]


The graphical presentation of the confounding between PMP22 and pN gives results comparable to the estimated Cox regressions. The three plots on the right, which illustrate the crude effect of PMP22 on the three outcomes, show that survival time does not seem to be correlated with the level of PMP22, which underlines the non-significant hazard ratios from the univariate Cox regressions. The plots on the left show how the correlation between PMP22 and pN brings out differences between survival times. Patients with negative nodal status again tend to experience better survival than those with nodal involvement; however, PMP22 seems to explain additional differences in survival times. A higher-than-median expression of PMP22 has a clearly negative effect on survival times, both for pN=0 and for pN=1. In the multivariable Cox regression, the effect of PMP22 was adjusted not only for pN but also for differentiation grade, tumor size, age and ER expression, and a significant hazard ratio was again obtained for all three survival outcomes.

“Nonlinearity – that is, an incorrectly specified functional form in the parametric part of the model – is a potential problem in Cox regression as it is in linear and generalized linear models.” (Fox, 2000, p. 15) When the linear dependence of the log-hazard on the covariate is not believed to hold over its entire range, one may extend the predictor by a squared term to detect a possible departure from linearity. A more sophisticated approach is to use a spline function to model the relationship between the log-hazard and the predictors (Harrell et al., 1988; Durrleman and Simon, 1989; cited after Marubini and Valsecchi, 1995, p. 195).

Using the function rcs from the R package “Design” (Harrell, 2008), a restricted cubic spline function with linear tails (natural spline) for PMP22 is included in the model.

          DFS                 OS                  TS
          coef    p-value     coef    p-value     coef    p-value
PMP22      0.98   0.319        1.04   0.371        0.90   0.515
PMP22'    -0.97   0.717       -0.15   0.961        1.23   0.734
PMP22''    0.80   0.938       -3.69   0.752       -8.33   0.545
ER        -0.07   0.368       -0.17   0.094       -0.23   0.081
Age        -      -            0.01   0.176       -0.01   0.187
G         -0.02   0.912       -0.17   0.336       -0.02   0.914
pN         0.94   <0.001       0.76   0.004        1.23   <0.001
pT         0.12   0.449        0.34   0.050        0.29   0.137
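The three terms PMP22, PMP22' and PMP22'' correspond to a restricted cubic spline with four knots. The report uses R, so the following Python function is only an illustrative sketch of how such a basis can be constructed. The knot locations are hypothetical, and the scaling by the squared distance between the outer knots is an assumption about the normalisation.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis with linear tails for a single value x.

    Returns [x, x', x'', ...] with len(knots) - 1 entries; the nonlinear terms
    are scaled by (t_k - t_1)^2, an assumed normalisation.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("at least three knots are required")

    def cube_plus(u):
        # Truncated cubic: u^3 for positive u, zero otherwise.
        return u ** 3 if u > 0 else 0.0

    scale = (t[-1] - t[0]) ** 2
    gap = t[-1] - t[-2]
    basis = [float(x)]
    for j in range(k - 2):
        term = (
            cube_plus(x - t[j])
            - cube_plus(x - t[-2]) * (t[-1] - t[j]) / gap
            + cube_plus(x - t[-1]) * (t[-2] - t[j]) / gap
        )
        basis.append(term / scale)
    return basis


# Hypothetical knot locations on the PMP22 scale (not taken from the report).
KNOTS = [-1.2, -0.6, -0.2, 0.3]

row = rcs_basis(-0.4, KNOTS)    # one design-matrix row: PMP22, PMP22', PMP22''
below = rcs_basis(-1.5, KNOTS)  # left of the first knot the nonlinear terms vanish
print(row, below)
```

The two correction terms cancel the cubic and quadratic parts beyond the last knot, which is what makes the fitted curve linear in both tails.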


[Figure: three panels showing the log relative hazard (y-axis) over the range -1.5 to 0.5 (x-axis).]

For each survival time, one can test whether the model with the cubic spline function yields a significantly higher log-likelihood (LL) than the general model. The joint contribution of the cubic spline coefficients to the likelihood is evaluated with the likelihood ratio (LR) test, which gives the statistic:

$$Q_{LR} = -2\,\bigl(LL_{\text{general}} - LL_{\text{spline}}\bigr)$$

The statistic Q_LR is asymptotically chi-square distributed with two degrees of freedom.

       LL (general)   LL (spline)   Q_LR   p-value
DFS    -431.25        -429.81       2.90   0.235
OS     -371.25        -368.90       4.69   0.096
TS     -281.41        -278.62       5.57   0.062
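The likelihood ratio test can be retraced from the printed log-likelihoods. The sketch below assumes that the first number in each row is the log-likelihood of the general model and the second that of the spline model. It uses the closed form of the chi-square tail probability for two degrees of freedom, and the function name is illustrative.

```python
import math


def lr_test_2df(ll_general, ll_spline):
    """Likelihood ratio statistic and p-value for two extra parameters.

    For a chi-square distribution with two degrees of freedom the upper tail
    probability has the closed form exp(-q / 2).
    """
    q = -2.0 * (ll_general - ll_spline)
    return q, math.exp(-q / 2.0)


# Log-likelihoods as printed (general model, spline model).
reported = {
    "DFS": (-431.25, -429.81),
    "OS": (-371.25, -368.90),
    "TS": (-281.41, -278.62),
}
results = {name: lr_test_2df(*lls) for name, lls in reported.items()}
for name, (q, p) in results.items():
    print(f"{name}: Q_LR={q:.2f}  p={p:.3f}")
```

Because the log-likelihoods are rounded to two decimals, the recomputed statistics differ from the printed ones in the second decimal, but all three p-values stay above 0.05.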

The analyses show that, for all three survival times, the assumption of a linear effect of PMP22 gene expression cannot be rejected at the 5% significance level when cubic splines are used as the alternative.


Conclusion

Six candidate markers were evaluated for their ability to predict the survival of breast cancer patients. For simplicity, the expression values of these markers were categorized (above median, below median). The marker PMP22 did not prove satisfactory in univariate analyses. However, in multivariable Cox regressions, statistically significant correlations were found between gene expression of PMP22 and all of the analyzed survival times, namely disease-free survival, overall survival and tumor-specific survival. Further analyses showed that the significant effect of PMP22 in the multivariable Cox regressions seemed to be due to confounding by pN and, at least for overall survival, by pT.

5 Literature

Breslow, NE (1974) Covariance analysis of censored survival data. Biometrics 30: 89-99.

Cox, DR (1972) Regression models and life tables. J R Stat Soc B 34: 187-220.

Efron, B (1977) The efficiency of Cox's likelihood function for censored data. J Am Stat Assoc 72: 557-565.

Figueiras et al. (1998) ... confidence interval of effects in the presence of interactions. Statistics in Medicine 17: 2099-2105.

<http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf>

<http://biostat.mc.vanderbilt.edu/s/Design>, <http://biostat.mc.vanderbilt.edu/rms>

Kaplan, EL and Meier, P (1958) Nonparametric estimation from incomplete observations. J Am Stat Assoc 53: 457-481.

Lagakos, SW (1979) General right censoring and its impact on the analysis of survival data. Biometrics 35: 139-156.

Lam, P (2007) coxph: Cox Proportional Hazards Regression for Duration Dependent Variables, in Kosuke Imai, Gary King and Olivia Lau, “Zelig: Everyone’s Statistical Software” <http://gking.harvard.edu/zelig>

Mantel, N and Haenszel, W (1959) Statistical Aspects of the Analysis of Data from Retrospective Studies of Disease. Journal of the National Cancer Institute 22: 719-748.

Marubini, E and Valsecchi, MG (1995) Analysing survival data from clinical trials and observational studies. Wiley.

R Development Core Team (2008) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. <http://www.R-project.org>

SAS Institute Inc. (2008) SAS for Windows, Version 9.2. SAS Institute Inc., Cary, NC, USA.

Schemper, M and Smith, TL (1996) A note on quantifying follow-up in studies of failure time. Control Clin Trials 17: 343-346.

<http://www.mayo.edu/hsr/people/therneau/survival.ps> Mayo Foundation.

Therneau, TM and Grambsch, PM (2000) Modeling Survival Data: Extending the Cox Model. Springer.

Therneau and ported by Lumley T (2008) survival: Survival analysis, including penalised likelihood. R package version 2.34-1.
