
Case Study about Monoclonal Gammopathy of Undetermined Significance (MGUS): an Application of Survival Analysis using Kaplan Meier, Cox Regression and Logistic Regression



Introduction

Plasma cells are responsible for producing immunoglobulins, an important part of the immune defense. An estimated 10^6 different immunoglobulins circulate in the blood at any given time. When a patient has a plasma cell malignancy, the distribution of immunoglobulins is dominated by a single isotype, the product of the malignant clone, which appears as a spike on serum protein electrophoresis. Monoclonal gammopathy of undetermined significance (MGUS) is the presence of such a spike and is regarded as a premalignant plasma cell disorder. To study this disease, the researcher used non-parametric tests.

A non-parametric test is a hypothesis test in which the distribution of the data does not need to be described by parameters. Such tests are also called distribution-free tests because they have only a few assumptions to satisfy. Survival analysis, which is widely used in clinical studies, relies heavily on non-parametric methods.

Survival analysis is a statistical technique for analyzing time-to-event data, that is, for modeling the length of time until an event of interest occurs. The response variable is variously called the event time, failure time, or survival time. Models created by survival analysis are called survival models.

Survival models are widely used in a variety of health-related fields and in research in economics, the physical and biological sciences, and the social sciences (sociology, psychology, political science, and anthropology), as well as in engineering, where the approach is typically called reliability analysis (Thomas Ryan, 2015).

PASW, or Predictive Analytics Software (formerly SPSS), is applied statistical software that can be used to conduct a survival analysis, and one of the most commonly used non-parametric methods it offers is the Kaplan Meier method.

The researcher used a secondary data set that contains the natural history of 241 subjects with monoclonal gammopathy of undetermined significance (MGUS). The data were collected by Dr. Robert Kyle at the Mayo Clinic in Rochester, Minn.

The data frame has 241 observations on the following 12 variables.

id: subject id
age: age in years
sex: male or female
pcdx: for subjects who progress to a plasma cell malignancy, the subtype of malignancy: multiple myeloma (MM) is the most common, followed by amyloidosis (AM), macroglobulinemia (MA), and other lymphoproliferative disorders (LP)
pctime:
death: 1 = follow-up is until death
alb:
hgb:

Reference: http://vincentarelbundock.github.io/Rdatasets/datasets.html

In addition, the censoring status is the variable death: patients coded as 1 in PASW died. This is not a problem, since the study records each patient's follow-up until he or she dies. Moreover, the researcher created a dummy variable for each independent variable for use in the Cox Regression and Logistic Regression analyses.

Age
0 = 34 to 60 years old
1 = 61 to 90 years old

Sex
0 = Male
1 = Female

Patients who progress to a subtype of plasma cell malignancy
0 = Those who do not
1 = Those who progress

Creatinine at MGUS Diagnosis
0 = 0.6 to 1.4
1 = 1.5 to 6.4

Albumin Level
0 = 1 to 3.0
1 = 3.1 to 5.1

Hemoglobin
0 = 6.8 to 12.0
1 = 12.1 to 16.5

Monoclonal Spike Size
0 = 0.8 to 1.8
1 = 1.9 to 2.9

Monoclonal gammopathy of undetermined significance is defined by the presence of a serum M protein of 3.0 g/dL or less; the absence of lytic bone lesions, hypercalcemia, and anemia; and, if determined, a proportion of plasma cells in the bone marrow of 10% or less (Robert Kyle et al., 2004). This plasma cell disorder is present in 3 to 4 percent of people who are more than 50 years old (Rishi K. Wadhera et al., 2011). There are four subtypes of plasma cell malignancy. The most common is multiple myeloma, followed by amyloidosis, macroglobulinemia, and other lymphoproliferative disorders. The amount of abnormal protein produced helps to differentiate these subtypes.

Multiple myeloma is a plasma cell malignancy of the bone marrow that is characterized by the overproduction of monoclonal protein. It is one of the reasons why people develop anemia and hypercalcemia. The study by Rishi K. Wadhera et al. reported that MGUS occurs in seven percent of the patients who have multiple myeloma.

Amyloidosis occurs when amyloid, an abnormal protein, is produced in the bone marrow or in the organs. It is a rare disease that eventually affects the kidneys, liver, heart, and even the nervous system. Patients with severe amyloidosis commonly develop organ failure.

Macroglobulinemia, better known as Waldenstrom's macroglobulinemia, occurs when cancer cells produce massive amounts of an abnormal protein called macroglobulin. These cells commonly grow in the bone marrow, and the most common symptoms of the disorder are heavy sweating, particularly at night, itchiness, unexplained fever, and weight loss.

Lymphoproliferative disorders occur when lymphocytes, the primary cells of lymphoid tissue, are produced in excessive quantities. These disorders may occur in patients who have a low red blood cell count (anemia) and are usually inherited if the patient's parents have the same kind of disorder. According to the study by Robert Kyle et al. (July 2004), who also analyzed these Mayo Clinic data, sex, age, presence of organomegaly, light chain in urine, albumin level, and IgG subclass could not separate the patients with MGUS from the patients with a lymphoplasma cell disorder.

Albumin is a protein produced by the liver. It is commonly tested in patients who have liver disease. A low albumin level is associated with the body having a very small amount of nutrients.

Creatinine is a waste product that is filtered by the kidneys and excreted in urine. It is a product of creatine, an energy supplier of the muscles. For adults with a kidney disorder, dialysis is recommended when creatinine reaches 10.0 mg/dL, while for babies dialysis should be done when creatinine reaches 2.0 mg/dL.

Hemoglobin is the main component of red blood cells and is responsible for carrying oxygen from the lungs to the tissues of the body. A low hemoglobin level indicates anemia.

A monoclonal spike is the main sign that a patient has monoclonal gammopathy of undetermined significance. Serum protein electrophoresis separates the serum proteins based on their physical properties, and the resulting fractions are used to interpret what kind of plasma cell malignancy the patient has.

Limitations

From the data gathered, pctime was excluded because it has numerous missing values. Creatinine also has considerably more missing values than albumin, hemoglobin (hgb), and mspike, so the researcher used Select Cases in PASW to exclude the cases with missing creatinine. Cases that were not selected by PASW are excluded from the analysis, so of the 241 patients in the study, only 200 were analyzed.

Objectives

In Kaplan Meier: To test whether there is a significant difference between the survival curves of patients who do and do not progress to a subtype of plasma cell malignancy.

In Cox Regression and Logistic Regression: To form a regression model that can be used to predict the outcome in terms of hazard risk.

Statistical Analysis

Kaplan Meier

The Kaplan Meier method, or Kaplan Meier estimate, performs survival analysis by describing the distribution of the time-to-event data and producing a survival plot. In many clinical trials, participants leave the study before the event is observed: they stop coming to the hospital for follow-up, they are uncooperative, they move to another hospital, they die from another cause, or they simply have not experienced the event when the fixed study period ends. The data from these cases are still used in the analysis and are treated as censored data. Because censoring complicates survival analysis, the Kaplan Meier method is a good choice: it estimates survival over time even in the presence of many censored cases. In addition, the Log Rank test is a well-known non-parametric test for comparing two or more survival curves or functions. It tests whether the survival probabilities of the groups are the same across the follow-up period.
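The product-limit calculation behind the Kaplan Meier estimate can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `kaplan_meier` and the toy follow-up times are hypothetical and are not taken from the MGUS data or from PASW.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)            # subjects leaving the risk set at t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            survival *= 1.0 - d / at_risk   # product-limit step
            curve.append((t, survival))
        at_risk -= leaving[t]           # drop both deaths and censored cases
    return curve


# Hypothetical toy data (days); 0 in toy_events marks a censored case.
toy_times = [5, 8, 8, 12, 15, 20]
toy_events = [1, 1, 0, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Censored cases never lower the curve; they only shrink the number at risk for later event times, which is why the method can still use their partial follow-up.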

Advantages:

Estimates the expected event, failure, or survival time.
Creates survival plots.
Compares the distribution across the levels of an independent variable, and can analyze those levels separately within each level of a stratification variable.

Assumptions:

1. Censoring is independent of the event of interest.
2. Censored and uncensored cases have the same survival prospects.
3. The probability of the event of interest depends only on time, without the influence of covariates.
4. Cases that enter the study at different times (i.e. patients who were treated at different times) behave in a similar way.
5. The event studied (e.g. death) occurs at a specified time.

Hypothesis:

Ho: There is no significant difference between the survival curves.
Ha: There is a significant difference between the survival curves.

Decision Rule:

Reject the null hypothesis if the P-value of the Log Rank test is less than the alpha.

Cox Regression

The Log Rank test is helpful for comparing the survival curves of two or more groups, but Cox regression, also known as proportional hazards regression, allows the researcher to determine the effect of the independent variables on the hazard risk, and hence on the survival rate, of a patient. Cox Regression is designed for analyzing the time until an event or the time between events (Garson, G. D., 2013). It produces a regression model called the Cox model, which allows us to estimate the hazard ratio, or relative risk of death, associated with each independent variable. The final model from the analysis yields an equation for the hazard as a function of several explanatory variables (Stephen J Walters, May 2009).

Assumption:

1. The proportional hazards assumption must be met, meaning that the ratio of the hazards between the levels of each independent variable is constant over time.

Logistic Regression

Logistic Regression is used to assess the effect of multiple independent variables on a dependent variable that is usually dichotomous (meaning there are only two outcomes). There are two kinds of logistic regression model, namely binary logistic regression and multinomial logistic regression. If the dependent variable is dichotomous, binary logistic regression is used; if it has more than two categories, multinomial logistic regression is used. Here, we use binary logistic regression since our dependent variable, the censoring status, is dichotomous.

Logistic regression differs from Cox Regression in that it estimates odds ratios, while Cox regression estimates hazard ratios.

Assumptions of Binary Logistic:

1. The dependent variable must be discrete and dichotomous.
2. The category of the dependent variable that represents the desired event must be coded as 1.
3. The model should be fitted properly and should not include many insignificant variables.

RESULTS

Kaplan Meier

Case Processing Summary

Patients who proceed to                                    Censored
subtype plasma malignancy    Total N    N of Events      N    Percent
Did not progress                 151            107     44      29.1%
Amyloidosis                        7              7      0        .0%
Lymphoproliferative                4              2      2      50.0%
Macroglobulinemia                  5              4      1      20.0%
Multiple Myeloma                  33             31      2       6.1%
Overall                          200            151     49      24.5%

The table above summarizes the analysis data set in terms of events and censored cases. It presents the number of patients who experienced the event (151), the number who were censored (49), and the total number of patients (200). N of Events is the number of events that occurred; here our event is death, meaning that follow-up continued until the patient died. The censored cases are those who left the study for some reason, and we can assume that these patients were still alive at their last follow-up. The frequencies show that 7 patients have amyloidosis, 4 have a lymphoproliferative disorder, 5 have macroglobulinemia, 33 have multiple myeloma, and 151 did not progress to a subtype of plasma cell malignancy.

The survival table is a descriptive table that details the time until the event of interest occurred. It is sectioned by group, with one section for each subtype of plasma cell malignancy and one for the patients who do not progress. Only part of the table is shown because the full table is very large.

Time is the time at which the event or the censoring occurred; again, our event here is death. Status indicates whether the case experienced the event or was censored. Cumulative Proportion Surviving at the Time is the proportion of cases that survive from the start of the table up to that time. N of Cumulative Events is the number of cases that have experienced the terminal event from the start of the table up to that time. N of Remaining Cases is the number of cases for which neither the event nor censoring has yet happened.

Mean(a)

Patients who proceed to a                              95% Confidence Interval
subtype plasma malignancy    Estimate   Std. Error   Lower Bound   Upper Bound
Did not progress             5796.252      381.504      5048.504      6544.000
Amyloidosis                  4526.714      751.555      3053.666      5999.762
Lymphoproliferative          7399.250      799.452      5832.324      8966.176
Macroglobulinemia            4999.000     1097.063      2848.756      7149.244
Multiple Myeloma             5338.747      439.709      4476.917      6200.578
Overall                      5458.036      283.891      4901.608      6014.463

a. Estimation is limited to the largest survival time if it is censored.

Median

Patients who proceed to a                              95% Confidence Interval
subtype plasma malignancy    Estimate   Std. Error   Lower Bound   Upper Bound
Did not progress             5088.000      512.841      4082.832      6093.168
Amyloidosis                  3511.000      168.901      3179.955      3842.045
Lymphoproliferative          8052.000         .000             .             .
Macroglobulinemia            5234.000     2101.064      1115.915      9352.085
Multiple Myeloma             4996.000      239.438      4526.701      5465.299
Overall                      5068.000      230.517      4616.187      5519.813

The table above shows the mean and median survival time for each group. Here, the mean survival time is the estimated area under the survival curve over the interval from 0 to tmax (Klein & Moeschberger, 2003). The median survival time, on the other hand, is the smallest time at which the probability of survival drops to 50% (0.5) or below.

Looking at the 95% confidence intervals, we can see that they overlap. When confidence intervals overlap substantially, it is doubtful that there is a difference in the "average" survival time. In our table the intervals are separated by only small gaps, if any, and the interval for the lymphoproliferative group overlaps with that of the patients who do not progress to a plasma cell malignancy, so we can conclude that there is only a slight difference, or even no difference, between the average survival times of the groups.
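To make the two summary measures concrete, the sketch below computes the area under a step survival curve (the restricted mean) and the median as the first time the curve reaches 0.5 or below. The function names and the toy curve are hypothetical and serve only as an illustration; they are not the PASW implementation.

```python
def restricted_mean(curve, t_max):
    """Area under a step survival curve on [0, t_max].

    curve -- list of (time, survival probability) pairs, sorted by time,
             where the probability holds from that time onward
    """
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t > t_max:
            break
        area += prev_s * (t - prev_t)   # rectangle under the previous step
        prev_t, prev_s = t, s
    return area + prev_s * (t_max - prev_t)


def median_survival(curve):
    """Smallest time at which S(t) <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical step curve (days), not taken from the MGUS output.
toy_curve = [(5, 5 / 6), (8, 2 / 3), (12, 4 / 9), (20, 0.0)]
toy_mean = restricted_mean(toy_curve, t_max=20)
toy_median = median_survival(toy_curve)
```

When a curve never falls to 0.5, the median is undefined, which is one reason a group with heavy censoring can show missing values in a median table.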

Now, we will proceed to the most important figure, the survival plot or the survival

curves.

[Figure: survival curves for patients who proceed to a subtype of plasma cell malignancy and those who do not]

The figure above compares the survival distributions of patients who progress to a subtype of plasma cell malignancy and those who do not. It is the most important output because it reveals the differences between the survival distributions. The horizontal axis shows time, measured here in days, and the vertical axis shows the survival probability: 1.0 indicates a 100% survival probability, and values of 0.4 and below indicate a small probability of survival.

The survival curves look like staircases. Every step down in a curve means that a patient died, since death is the event that moves the survival curve down (Steve Dun, 2002). Patients with a lymphoproliferative disorder have the highest survival probability from their start time up to almost 6,000 days. It is worth noting that the start time for diagnosis in the lymphoproliferative group is late compared with the other groups (the amyloidosis group was the last to start). Even so, this group has the longest interval with a survival probability of 100%. After 8,000 days, however, the group's survival curve drops quickly to zero, which means that every patient in the group who was not censored had died by then. We can therefore conclude that the patients with a lymphoproliferative disorder in this study, except for those who were censored, did not survive beyond 8,000 days (21.9 years).

Those who do not progress to a subtype of plasma cell malignancy have the longest survival. From almost 9,000 days onward, the group's survival probability stays constant at 23%. This plateau means that no further deaths were observed among the patients still being followed; those patients were censored, which should not be read as evidence that they were cured.

The group of patients with multiple myeloma ranks second in length of survival. Although some of these patients reached almost 10,000 days, they eventually died. Hence, we can conclude that a patient with multiple myeloma has some probability of survival up to about 10,000 days, or 27.39 years.

Patients with macroglobulinemia have an intermediate length of survival compared with the other groups, since the end point of their curve lies in the middle. By almost 6,500 days (17.8 years), their survival probability has fallen to approximately 20%. Their survival curve does not reach a survival probability of 0 because the last patient in this group was censored rather than observed to die, so this should not be taken to mean that the patients were cured.

Patients with amyloidosis started latest of all the groups and show a constant risk. Constant risk means that the probability of dying does not change over time: the risk curve has a constant half-life and the patients get no relief, since the risk is always the same (Dun, 2002). All of the patients with amyloidosis died before 8,000 days (21.9 years).

In addition, the median survival time is the time at which the survival probability is 50%. In the figure above, the point marked on each survival curve indicates the median survival, and the broken lines indicate the corresponding number of days. For example, the median survival of patients with multiple myeloma was about 5,000 days (13.69 years).

To compare the survival distribution, we will look to the table of overall comparisons.

Overall Comparisons

                                   Chi-Square    df    Sig.
Log Rank (Mantel-Cox)                   3.709     4    .447
Breslow (Generalized Wilcoxon)          2.980     4    .561
Tarone-Ware                             2.904     4    .574

Test of equality of survival distributions for the different levels of Patients who proceed in plasma malignancy.

The table above shows three tests of whether the survival curves differ. The tests are similar, but they differ in the weight they assign to each time point. The Log Rank (Mantel-Cox) test is a test of equality of survival functions that gives equal weight to all time points. The Breslow test weights each time point by the number of cases at risk, so it gives more weight to earlier failures. The Tarone-Ware test falls in between, since it weights each time point by the square root of the number of cases at risk.
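The three weighting schemes can be compared in a small sketch. Note the simplifying assumption: the function below handles only two groups (1 degree of freedom), whereas the table above compares five groups with 4 degrees of freedom. The function name `weighted_logrank` and the toy data are hypothetical.

```python
import math


def weighted_logrank(times, events, groups, weight="logrank"):
    """Two-group weighted log-rank chi-square (df = 1) and its p-value.

    groups -- 0 or 1 for each subject
    weight -- "logrank" (w = 1), "breslow" (w = number at risk),
              or "tarone-ware" (w = square root of number at risk)
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e == 1})
    num, var = 0.0, 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)               # at risk, all
        n1 = sum(1 for u, _, g in data if u >= t and g == 1)   # at risk, group 1
        d = sum(1 for u, e, _ in data if u == t and e == 1)    # deaths at t
        d1 = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        w = {"logrank": 1.0, "breslow": float(n),
             "tarone-ware": math.sqrt(n)}[weight]
        num += w * (d1 - d * n1 / n)                           # observed - expected
        if n > 1:
            var += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("variance is zero; both groups must be at risk")
    chi2 = num * num / var
    p_value = math.erfc(math.sqrt(chi2 / 2))   # chi-square upper tail, df = 1
    return chi2, p_value


# Hypothetical toy data: follow-up in days, 1 = died, group labels 0/1.
toy_result = weighted_logrank(
    [6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 1], [1, 1, 1, 0, 0, 0]
)
```

Because the number at risk shrinks over time, the Breslow and Tarone-Ware weights are largest at the earliest event times, which is how they emphasize early failures.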

Decision:

Since the Log Rank test has a significance value of 0.447, which is greater than the alpha of 0.05, we fail to reject the null hypothesis.

Conclusion:

There is no significant difference between the survival curves of patients who progress to a subtype of plasma cell malignancy and those who do not.
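As a quick check on the decision, the reported Sig. values can be recomputed from the chi-square statistics. The sketch below uses the closed-form upper tail of the chi-square distribution for 4 degrees of freedom; the helper name `chi2_sf_df4` is hypothetical.

```python
import math


def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square variable with 4 degrees of freedom."""
    return math.exp(-x / 2) * (1 + x / 2)


# Chi-square statistics copied from the Overall Comparisons table.
reported = {"Log Rank": 3.709, "Breslow": 2.980, "Tarone-Ware": 2.904}
p_values = {name: round(chi2_sf_df4(x), 3) for name, x in reported.items()}
```

Rounded to three decimals, these come out at .447, .561, and .574, in agreement with the table, and all three exceed the alpha of 0.05.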

Cox Regression

Case Processing Summary

                                                                          N    Percent
Cases available in analysis   Event(a)                                  142      71.0%
                              Censored                                   45      22.5%
                              Total                                     187      93.5%
Cases dropped                 Cases with missing values                  13       6.5%
                              Cases with negative time                    0        .0%
                              Censored cases before the earliest
                              event in a stratum                          0        .0%
                              Total                                      13       6.5%
Total                                                                   200     100.0%

a. Dependent Variable: Days from diagnosis to last follow-up days

This table shows the frequencies of the analyzed data. There are 142 events (deaths) and 45 censored cases, for a total of 187 cases available in the analysis. The remaining 13 cases have missing values and were dropped, so 187 of the 200 cases were analyzed.

Now, we will proceed to the most important table, the Variables in the Equation.

Variables in the Equation

                                                               95.0% CI for Exp(B)
                   B       SE      Wald    df    Sig.   Exp(B)     Lower     Upper
dummyPCDX      -.524     .201     6.833     1    .009     .592      .399      .877
dummyage       1.289     .198    42.290     1    .000    3.629     2.461     5.351
dummycreat      .864     .305     8.051     1    .005    2.373     1.306     4.312
dummyalb       -.583     .181    10.353     1    .001     .558      .391      .796
dummyhgb       -.966     .212    20.701     1    .000     .381      .251      .577
dummymspike     .010     .187      .003     1    .957    1.010      .700     1.457
DummySex       -.448     .183     5.986     1    .014     .639      .446      .915

This table displays the difference in hazard risk between the groups. B, the coefficient, tells us whether the risk of those coded 1 in the dummy variable is higher (positive B) or lower (negative B) than that of those coded 0. Exp(B) is the hazard ratio, which expresses the size of the effect of each variable on the hazard risk. Sig. is the p-value that helps us determine whether the coefficient B, and hence Exp(B), is significant; for significance, the p-value must be less than the alpha of 0.05.
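The columns of the table are linked by simple formulas: Exp(B) is the exponential of B, the 95% confidence limits are the exponentials of B plus or minus 1.96 standard errors, and the Wald statistic is the square of B divided by its standard error. The sketch below applies them to the dummyage row; the function name is hypothetical, and small differences from the printed values are expected because B and SE are rounded to three decimals in the output.

```python
import math


def hazard_ratio_summary(b, se, z=1.96):
    """Return (Exp(B), lower limit, upper limit, Wald) for one coefficient."""
    hr = math.exp(b)
    lower = math.exp(b - z * se)
    upper = math.exp(b + z * se)
    wald = (b / se) ** 2
    return hr, lower, upper, wald


# B and SE for dummyage, copied from the Variables in the Equation table.
hr, lower, upper, wald = hazard_ratio_summary(1.289, 0.198)
```

A confidence interval for Exp(B) that excludes 1 corresponds to a p-value below 0.05, which gives a second way to read significance from the table.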

The Sig. column shows that all independent variables except the monoclonal spike are significant, that is, they contribute to the model. Before we create the model, however, we should check the Cox Regression assumption that the hazards are proportional for each independent variable. Independent variables that do not satisfy the assumption will not be included in the model.

The proportional hazards assumption states that the hazard ratio for an independent variable is constant over time. We can check this by:

1. Creating a log minus log plot for the independent variable.
2. Using Cox with a Time-Dependent Covariate.

The researcher used Cox with a Time-Dependent Covariate, for example:

                                                             95.0% CI for Exp(B)
                 B       SE      Wald    df    Sig.   Exp(B)     Lower     Upper
T_COV_        .000     .000      .002     1    .960    1.000     1.000     1.000
dummyage    -1.125     .385     8.539     1    .003     .325      .153      .690

The table above helps us see whether the variable satisfies the assumption. We do this by looking at the Sig. value, the p-value, of the time covariate (T_COV_). The null hypothesis for the proportional hazards assumption is that the hazards are proportional. If the p-value is significant, we reject the null hypothesis: non-proportionality of the hazards is present and the assumption is violated.

Here, the p-value of T_COV_ is 0.960, which is not significant. Hence, we fail to reject the null hypothesis that the hazards are proportional. The age variable therefore satisfies the proportional hazards assumption and will be included in the model.
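The first checking method, the log minus log plot, rests on a simple property: under proportional hazards, the log(-log S(t)) curves of two groups are separated by a constant vertical gap equal to the log of the hazard ratio. The sketch below demonstrates this with made-up exponential survival curves; the baseline rate is hypothetical and the ratio of 3.455 is borrowed from the age variable purely for illustration.

```python
import math


def log_minus_log(s):
    """log(-log S(t)) for a survival probability strictly between 0 and 1."""
    return math.log(-math.log(s))


baseline_rate = 0.0002    # hypothetical hazard per day in the reference group
hazard_ratio = 3.455      # constant ratio, assumed for illustration
days = [1000, 3000, 6000]

gaps = []
for t in days:
    s0 = math.exp(-baseline_rate * t)   # survival in the reference group
    s1 = s0 ** hazard_ratio             # survival under proportional hazards
    gaps.append(log_minus_log(s1) - log_minus_log(s0))
```

Every entry of `gaps` equals log(3.455), so the two curves would appear parallel on the plot; curves that cross or converge suggest that the assumption is violated.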

The following tables check the assumption for the other independent variables.

For Patients who progress to a plasma cell malignancy

Variables in the Equation

                                                             95.0% CI for Exp(B)
                 B       SE      Wald    df    Sig.   Exp(B)     Lower     Upper
T_COV_        .000     .000    12.637     1    .000    1.000     1.000     1.000
dummyPCDX    1.121     .421     7.095     1    .008    3.066     1.345     6.994

For Creatinine

Variables in the Equation

                                                             95.0% CI for Exp(B)
                 B       SE      Wald    df    Sig.   Exp(B)     Lower     Upper
T_COV_        .000     .000     4.135     1    .042    1.000      .999     1.000
dummycreat  -1.768     .454    15.180     1    .000     .171      .070      .415

For Albumin Level

Variables in the Equation

                                                             95.0% CI for Exp(B)
                 B       SE      Wald    df    Sig.   Exp(B)     Lower     Upper
T_COV_        .000     .000     3.729     1    .053    1.000     1.000     1.000
dummyalb     1.178     .349    11.390     1    .001    3.247     1.638     6.434

For Hemoglobin

Variables in the Equation

                                                             95.0% CI for Exp(B)
                 B       SE      Wald    df    Sig.   Exp(B)     Lower     Upper
T_COV_        .000     .000      .188     1    .665    1.000     1.000     1.000
dummyhgb      .940     .357     6.938     1    .008    2.560     1.272     5.152

For Sex

Variables in the Equation

                                                             95.0% CI for Exp(B)
                 B       SE      Wald    df    Sig.   Exp(B)     Lower     Upper
T_COV_        .000     .000     3.634     1    .057    1.000     1.000     1.000
DummySex      .826     .336     6.043     1    .014    2.284     1.182     4.413

This is the summary table for checking the assumption for each independent variable. Monoclonal spike size is not included since it was not significant in the Cox Regression table.

Covariates        P-Value
pcdx                0.000
Age                 0.960
Creatinine          0.042
Albumin Level       0.053
Hemoglobin          0.665
Sex                 0.057

The table above shows which independent variables satisfy the assumption of proportional hazards. Variables whose p-value is less than 0.05 are significant, violate the assumption, and are excluded from the model; variables whose p-value is greater than 0.05 are not significant and are included. Thus age, albumin level, hemoglobin, and sex satisfy the assumption: these variables have a constant hazard ratio over time and will be included in our model.

The researcher then reran the Cox Regression analysis with the independent variables that satisfy the assumption. The variables pcdx and creatinine were excluded because they violate the assumption.
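The selection rule described above amounts to a simple filter on the T_COV_ p-values. The sketch below restates it in Python; the dictionary keys are informal labels chosen for this illustration, not PASW variable names.

```python
ALPHA = 0.05

# P-values of the T_COV_ term for each covariate, copied from the summary table.
ph_check = {
    "pcdx": 0.000,
    "age": 0.960,
    "creatinine": 0.042,
    "albumin": 0.053,
    "hemoglobin": 0.665,
    "sex": 0.057,
}

# A non-significant time interaction means the assumption is not rejected.
retained = [name for name, p in ph_check.items() if p >= ALPHA]
excluded = [name for name, p in ph_check.items() if p < ALPHA]
```

Albumin (0.053) and sex (0.057) pass only narrowly, so a different alpha would change which variables enter the final model.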

Final Model for Cox Regression

Variables in the Equation

                                                             95.0% CI for Exp(B)
                 B       SE      Wald    df    Sig.   Exp(B)     Lower     Upper
dummyage     1.240     .197    39.811     1    .000    3.455     2.351     5.079
dummyalb     -.529     .180     8.681     1    .003     .589      .414      .838
dummyhgb     -.918     .209    19.237     1    .000     .399      .265      .602
DummySex     -.364     .174     4.353     1    .037     .695      .494      .978

Interpretation

1. Age

The coefficient 1.240 tells us that there is a difference between the hazard risks of the groups; a value of 0 would mean that there is no difference. Because the coefficient is positive, group 1, the patients aged 61 to 90 years, has a higher hazard risk than the patients aged 34 to 60 years: as age increases, the hazard risk increases and the survival rate of the patient decreases. The Exp(B) of 3.455 indicates that the hazard of dying for patients aged 61 to 90 years is 3.455 times that of patients aged 34 to 60 years. Thus, we can conclude that the survival rate of patients aged 34 to 60 is higher than that of patients aged 61 to 90.

2. Albumin level

The coefficient -.529 tells us that there is a difference between the hazard risks of patients with a low albumin level and a high albumin level at MGUS diagnosis. Since the value is negative, patients with the higher albumin level of 3.1 to 5.1 have a lower hazard risk than patients with an albumin level of only 1.8 to 3.0. The Exp(B) of .589 implies that the hazard of dying for patients whose albumin level is 3.1 to 5.1 is .589 times that of patients with an albumin level of 1.8 to 3.0 at MGUS diagnosis, or about 41% lower. We can conclude that patients with an albumin level of 3.1 to 5.1 have a higher survival rate than patients whose albumin level is 1.8 to 3.0.

In addition, an Exp(B) equal to 1 indicates that there is no difference between the hazard risks of the groups. If it is less than 1, the hazard risk of those coded 1 in the dummy variable is lower than that of those coded 0; if it is greater than 1, it is higher.

3. Hemoglobin

The coefficient -.918 tells us that there is a difference between the hazard risks of patients with low hemoglobin and high hemoglobin at MGUS diagnosis. Since -.918 is negative, patients with the higher hemoglobin of 12.1 to 16.5 have a lower hazard risk than patients with hemoglobin of 6.8 to 12.0. The Exp(B) of .399 indicates that the hazard of dying for patients whose hemoglobin is 12.1 to 16.5 is .399 times that of patients with lower hemoglobin at MGUS diagnosis, or about 60% lower. Thus, we can conclude that patients with hemoglobin of 12.1 to 16.5 have a higher survival rate than patients whose hemoglobin is 6.8 to 12.0.

4. Sex

The coefficient -.364 tells us that there is a difference between the hazard risks of male and female patients. Since the coefficient is negative, female patients (coded 1 in the Sex dummy variable) have a lower hazard risk than male patients. The Exp(B) of .695 indicates that the hazard of dying for female patients is .695 times that of male patients, or about 30% lower. Thus, we can conclude that female patients have a somewhat higher survival rate than male patients.

COX REGRESSION MODEL
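As a sketch, the Cox model implied by the coefficients in the final table has the form h(t) = h0(t) * exp(1.240*dummyage - 0.529*dummyalb - 0.918*dummyhgb - 0.364*DummySex), where h0(t) is the unspecified baseline hazard. The sign of the sex coefficient is taken as negative here, an assumption based on its Exp(B) of .695, which equals exp(-0.364). The Python below evaluates the exponential part for a given patient profile; the function and dictionary names are hypothetical.

```python
import math

# Coefficients from the final Cox table. The sex coefficient is assumed to be
# negative because its Exp(B) of .695 equals exp(-0.364).
COEFFICIENTS = {"age": 1.240, "alb": -0.529, "hgb": -0.918, "sex": -0.364}


def relative_hazard(profile):
    """Hazard relative to the reference patient (all dummy variables equal to 0)."""
    linear_predictor = sum(
        COEFFICIENTS[name] * value for name, value in profile.items()
    )
    return math.exp(linear_predictor)


# An older patient who is otherwise in the reference categories.
older_only = relative_hazard({"age": 1, "alb": 0, "hgb": 0, "sex": 0})

# An older female patient with albumin and hemoglobin in the higher categories.
older_woman_high_labs = relative_hazard({"age": 1, "alb": 1, "hgb": 1, "sex": 1})
```

The baseline hazard cancels when two patients are compared, so the model gives relative rather than absolute risks.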

Logistic Regression

Unweighted Cases                               N    Percent
Selected Cases      Included in Analysis     187       93.5
                    Missing Cases             13        6.5
                    Total                    200      100.0
Unselected Cases                               0         .0
Total                                        200      100.0

The table above tells us the number of cases that were analyzed. Of the 200 total cases, 187 (93.5%) were included in the analysis and 13 (6.5%) were excluded as missing cases. The Percent column gives each count as a proportion of the 200 total cases.

This part describes the null model or the model with no predictors and just the intercept.

Classification Table (a, b)

                                          Predicted
                                          Censoring Status      Percentage
Observed                                  Censored     Died     Correct
Step 0   Censoring Status   Censored          0          45          .0
                            Died              0         142       100.0
         Overall Percentage                                        75.9

b. The cut value is .500

Variables in the Equation

                       B      S.E.     Wald    df    Sig.    Exp(B)
Step 0   Constant    1.149    .171   45.126     1    .000     3.156

The B indicates the coefficient for the constant or the intercept in the null model.
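As a quick check on the null model, the intercept can be recovered from the observed counts alone: with no predictors, Exp(B) is the observed odds of death and B is its natural logarithm. The sketch below uses the counts from the Step 0 classification table (142 died, 45 censored); the variable names are illustrative and are not part of the SPSS output.

```python
import math

# Observed outcome counts from the Step 0 classification table
died = 142
censored = 45

# In an intercept-only logistic model, Exp(B) equals the observed odds of the event
odds = died / censored
b0 = math.log(odds)

# The null model predicts the more frequent outcome (Died) for every case
baseline_accuracy = 100 * died / (died + censored)

print(f"Exp(B) = {odds:.3f}")                           # reported as 3.156
print(f"B = {b0:.3f}")                                  # reported as 1.149
print(f"Overall percentage = {baseline_accuracy:.1f}")  # reported as 75.9
```

The 75.9% figure is the baseline that any model with predictors has to beat.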

Variables not in the Equation

                                       Score    df    Sig.
Step 0   Variables   dummyPCDX(1)      7.059     1    .008
                     dummyage         29.289     1    .000
                     dummycreat        1.028     1    .311
                     dummyalb          3.792     1    .051
                     dummyhgb          6.824     1    .009
                     dummymspike        .017     1    .898
                     dumysex(1)         .191     1    .662
         Overall Statistics           53.177     7    .000

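The table above is named "Variables not in the Equation" because none of the independent variables selected for the model has been entered yet; the null model contains only the computed intercept. The score test for each variable indicates whether adding it would significantly improve the model.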

This part, the full model, is generally the most important part of the output. The main tables to look at are the Classification Table and the Variables in the Equation table.

Classification Table (a)

                                          Predicted
                                          Censoring Status      Percentage
Observed                                  Censored     Died     Correct
Step 1   Censoring Status   Censored         26          19        57.8
                            Died             12         130        91.5
         Overall Percentage                                        83.4

The table above shows how well the full model classifies the cases. A perfect model would have nonzero counts only on the diagonal, indicating that every case is correctly classified. The most important figure is the overall percentage in the lower right corner: the full model, with all independent variables and the constant, classifies 83.4% of the cases correctly, compared with 75.9% for the null model, which implies a good model.
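The percentages in a classification table follow directly from its four counts. The sketch below recomputes them for the Step 1 table; the function name and the dictionary layout are illustrative choices, not SPSS conventions.

```python
def classification_summary(table):
    """Return per-row and overall percent correct for a classification table.

    `table` maps each observed class to its predicted counts, listed in the
    same class order, so the diagonal holds the correctly classified cases.
    """
    row_pct = {}
    correct = 0
    total = 0
    for i, (observed, counts) in enumerate(table.items()):
        row_pct[observed] = 100 * counts[i] / sum(counts)
        correct += counts[i]
        total += sum(counts)
    return row_pct, 100 * correct / total


# Counts from the Step 1 table: [predicted Censored, predicted Died]
step1 = {"Censored": [26, 19], "Died": [12, 130]}
row_pct, overall = classification_summary(step1)
print({k: round(v, 1) for k, v in row_pct.items()}, round(overall, 1))
```

Note that the model is much better at identifying deaths than censored cases, which the single overall percentage hides.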

Variables in the Equation

                                                                         95% C.I. for EXP(B)
                              B      S.E.     Wald    df    Sig.    Exp(B)    Lower     Upper
Step 1 (a)   dummyPCDX(1)  -2.427    .638   14.461     1    .000      .088     .025      .309
             dummyage       2.512    .454   30.645     1    .000    12.331    5.067    30.010
             dummycreat      .446    .947     .222     1    .637     1.563     .244     9.989
             dummyalb       -.556    .494    1.266     1    .261      .573     .218     1.511
             dummyhgb      -2.152    .724    8.834     1    .003      .116     .028      .481
             dummymspike     .077    .446     .030     1    .862     1.080     .451     2.587
             dumysex(1)      .228    .434     .277     1    .599     1.256     .537     2.938
             Constant       3.941    .948   17.276     1    .000    51.469

a. Variable(s) entered on step 1: dummyPCDX, dummyage, dummycreat, dummyalb, dummyhgb, dummymspike, dumysex.

The table above provides the information needed to assess the relationship between the dependent variable and the independent variables.

B shows the coefficients used to form the logistic regression equation that predicts the dependent variable. They are in log-odds units: each coefficient shows how the log-odds of a "success" change for a one-unit change in the independent variable. Since the event here is death, a positive coefficient means the odds of dying increase with the independent variable and a negative coefficient means they decrease. The sign therefore indicates the direction of the relationship between the dependent and independent variables.

Exp(B) is the odds ratio. An odds ratio of 1 implies that the independent variable does not affect the odds of the outcome, a value less than 1 implies lower odds of dying, and a value greater than 1 implies higher odds of dying.

S.E. displays the standard errors of the coefficients. It is used to create confidence intervals for the parameters.

Wald shows the chi-square value used to test the null hypothesis that the coefficient is equal to 0. Sig. gives the p-value of that test. A coefficient whose p-value is less than alpha = 0.05 is statistically significant and is therefore used in the logistic regression equation or model.
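The Wald, Sig., Exp(B) and confidence interval columns can all be derived from B and S.E. The sketch below does this for the dummyage row of the full model (B = 2.512, S.E. = .454); the function name is illustrative, and the 1.96 multiplier assumes the usual normal approximation for a 95% interval.

```python
import math


def wald_summary(b, se, z_crit=1.96):
    """Wald chi-square, p-value (1 df), odds ratio and its 95% CI."""
    z = b / se
    wald = z ** 2
    # For 1 degree of freedom, P(chi2 > z^2) equals the two-sided normal tail
    p_value = math.erfc(abs(z) / math.sqrt(2))
    odds_ratio = math.exp(b)
    lower = math.exp(b - z_crit * se)
    upper = math.exp(b + z_crit * se)
    return wald, p_value, odds_ratio, lower, upper


# dummyage row of the full model
wald, p, odds_ratio, lower, upper = wald_summary(2.512, 0.454)
print(f"Wald={wald:.2f}  Sig.={p:.4f}  Exp(B)={odds_ratio:.2f}  "
      f"95% CI=({lower:.2f}, {upper:.2f})")
```

The results agree with the reported row (30.645, .000, 12.331, 5.067 to 30.010) up to the rounding of B and S.E. to three decimals. A confidence interval for Exp(B) that excludes 1 corresponds to a coefficient that is significant at the 0.05 level.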

Here, the coefficients of pcdx, age and hemoglobin are statistically significant since their p-values are less than 0.05. Hence, they are the only variables that will be included in the model; the others are excluded.

The researcher then re-ran the binary logistic regression with only the significant independent variables to form the model.

Classification Table (a)

                                          Predicted
                                          Censoring Status      Percentage
Observed                                  Censored     Died     Correct
Step 1   Censoring Status   Censored         29          20        59.2
                            Died             13         138        91.4
         Overall Percentage                                        83.5

We can see that this new model correctly classifies 83.5% of the cases, indicating that the model is good.

Variables in the Equation

                                                                         95% C.I. for EXP(B)
                              B      S.E.     Wald    df    Sig.    Exp(B)    Lower     Upper
Step 1 (a)   dummyPCDX(1)  -2.206    .577   14.617     1    .000      .110     .036      .341
             dummyage       2.623    .438   35.860     1    .000    13.778    5.839    32.512
             dummyhgb      -2.386    .690   11.944     1    .001      .092     .024      .356
             Constant       3.691    .833   19.636     1    .000    40.093

Interpretation

1. Patients who progress to a sub-type of plasma cell malignancy:

Because pcdx is a dummy variable, a one-unit increase means moving from patients who do not progress to patients who progress to a sub-type of plasma cell malignancy. For this change we expect a decrease of 2.206 in the log-odds of dying, holding all other variables constant.

In terms of Exp(B), the odds of dying for patients who progress to a sub-type of plasma cell malignancy are 0.110 times the odds for those who do not.

2. Age:

Age is also a dummy variable (61 and above versus below 61). Moving from the younger to the older group, we expect an increase of 2.623 in the log-odds of dying, holding all other variables constant.

In terms of Exp(B), the odds of dying for patients aged 61 and above are 13.778 times the odds for patients younger than 61.

3. Hemoglobin:

Hemoglobin is a dummy variable as well (12.1 to 16.5 versus 6.8 to 12.0). Moving from the lower to the higher group, we expect a decrease of 2.386 in the log-odds of dying, holding all other variables constant.

In terms of Exp(B), the odds of dying for patients whose hemoglobin is 12.1 to 16.5 are 0.092 times the odds for patients whose hemoglobin is 6.8 to 12.0.

$$\log\left(\frac{p}{1-p}\right) = 3.691 - 2.206\,\text{pcdx} + 2.623\,\text{age} - 2.386\,\text{hemoglobin}$$
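To see how the equation is used, the sketch below converts the log-odds into a predicted probability of death for a given patient profile. It takes the constant from the reduced model's Variables in the Equation table (B = 3.691, Exp(B) = 40.093) and assumes the 0/1 coding described in the interpretation (1 = progressed, 1 = age 61 and above, 1 = hemoglobin 12.1 to 16.5). The function name and the example profile are hypothetical.

```python
import math

# Coefficients from the reduced model's "Variables in the Equation" table
COEF = {"constant": 3.691, "pcdx": -2.206, "age": 2.623, "hemoglobin": -2.386}


def predicted_probability(pcdx, age, hemoglobin):
    """Predicted probability of death for 0/1 dummy-coded inputs."""
    logit = (COEF["constant"] + COEF["pcdx"] * pcdx
             + COEF["age"] * age + COEF["hemoglobin"] * hemoglobin)
    # Inverse of the logit transform
    return 1 / (1 + math.exp(-logit))


# Hypothetical profile: no progression, age 61 and above, hemoglobin 6.8 to 12.0
p = predicted_probability(pcdx=0, age=1, hemoglobin=0)
print(f"Predicted probability of death: {p:.3f}")
# The classification table uses a cut value of .500
print("Classified as:", "Died" if p >= 0.5 else "Censored")
```

Because all three predictors are dummy variables, the model can only produce eight distinct predicted probabilities, one per combination of groups.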

Conclusion

In the Kaplan-Meier estimates, there was no significant difference between the survival curves of patients who progress to a subtype of plasma cell malignancy and those who do not. Thus, we can conclude that the survival probability does not differ significantly whether or not a patient progresses to a subtype of plasma cell malignancy.

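The formed regression models:

Cox Model

$$h(t) = h_0(t)\exp\left(1.240\,\text{age}_t - 0.529\,\text{albumin level}_t - 0.918\,\text{hemoglobin}_t + 0.364\,\text{Sex}_t\right)$$

Logistic Model

$$\log\left(\frac{p}{1-p}\right) = 3.691 - 2.206\,\text{pcdx} + 2.623\,\text{age} - 2.386\,\text{hemoglobin}$$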
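The Cox coefficients translate into hazard ratios by exponentiation. The sketch below uses the coefficients discussed in the interpretation (albumin -0.529, hemoglobin -0.918, sex 0.364) and assumes a positive sign for the age coefficient of 1.240, consistent with age being reported as the variable with the highest hazard. The helper `relative_hazard` is a hypothetical name for illustration.

```python
import math

# Cox coefficients for the 0/1 dummy-coded covariates
cox_coef = {"age": 1.240, "albumin": -0.529, "hemoglobin": -0.918, "sex": 0.364}

# Exp(B): hazard of the group coded 1 relative to the group coded 0
hazard_ratios = {name: math.exp(b) for name, b in cox_coef.items()}

for name, hr in hazard_ratios.items():
    direction = "higher" if hr > 1 else "lower"
    print(f"{name:<11} B={cox_coef[name]:+.3f}  Exp(B)={hr:.3f}  "
          f"({direction} hazard for group coded 1)")


def relative_hazard(**covariates):
    """Hazard relative to the baseline h0(t) for a 0/1 covariate profile."""
    return math.exp(sum(cox_coef[k] * v for k, v in covariates.items()))
```

For example, `relative_hazard(age=1, hemoglobin=1)` multiplies the two individual hazard ratios, because the covariates act multiplicatively on the baseline hazard.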

As stated earlier, logistic regression and Cox regression differ in what they estimate: logistic regression estimates the odds ratio, while Cox regression estimates the hazard ratio. The odds are the probability that the event occurs divided by the probability that it does not, so the odds ratio here compares the odds of death across the levels of an independent variable without regard to when the death occurs. The hazard ratio instead compares the rate at which the event occurs over time. If the dependent variable is not a time to event but rather an event that can be counted, logistic or other models are more fitting to use (Garson, G. D., 2013). Since the data here are time-to-event data, the formed Cox regression model is the more appropriate one.

Moreover, the independent variable associated with the highest hazard among all variables is age. Thus, we can conclude that older patients have a considerably higher risk of dying.

Recommendation

Young patients who are diagnosed with monoclonal gammopathy of undetermined significance are highly recommended to undergo treatment, since the survival analysis shows a higher survival probability for young patients than for old patients. The formed regression models can be used to determine the hazard.

