
Case Study about Monoclonal Gammopathy of Undetermined Significance (MGUS): an
Application of Survival Analysis using Kaplan Meier, Cox Regression and Logistic
Regression

Introduction
Plasma cells are responsible for producing immunoglobulins, an important part of the
immune defense. There are estimated to be about 106 different immunoglobulins circulating
in the blood at any given time. When a patient has a plasma cell malignancy, the
distribution of immunoglobulins is dominated by a single isotype, the product of the malignant
clone. It can be seen as a spike on serum protein electrophoresis. Monoclonal gammopathy of
undetermined significance (MGUS) is the presence of such a spike; it is a premalignant
plasma cell disorder. To study this disease, the researcher used non-parametric tests.
A non-parametric test is a hypothesis test in which the distribution of the data does not
need to be described by parameters. Such tests are also called distribution-free tests because
they have only a few assumptions to satisfy. One of the most popular non-parametric approaches
widely used in clinical studies is survival analysis.
Survival analysis is a statistical technique used to analyze time-to-event data, that is, to
model the length of time until the event of interest occurs. The response variable is often
referred to as the event time, failure time, or survival time. Models created by survival analysis
are called survival models.
Survival models are widely used in a variety of health-related fields and are also used for
research in economics, the physical and biological sciences, the social sciences
(sociological, psychological, political, and anthropological data), and engineering,
where the approach is typically called reliability analysis (Thomas Ryan, 2015).
PASW, or Predictive Analytics Software (formerly SPSS), is applied statistical software
that can be used to conduct a survival analysis, and one of the non-parametric
methods commonly used for this purpose is the Kaplan Meier method.

Materials and Methods


The researcher obtained secondary data containing the natural history of 241 subjects
with monoclonal gammopathy of undetermined significance (MGUS).
The data were collected by Dr. Robert Kyle at the Mayo Clinic in
Rochester, Minn.
The data frame has 241 observations on the following 12 variables.
id:      subject id
age:     age in years
sex:     male or female
dxyr:    year of diagnosis
pcdx:    for subjects who progress to a plasma cell malignancy,
         the subtype of malignancy: multiple myeloma (MM) is the
         most common, followed by amyloidosis (AM), macroglobulinemia (MA),
         and other lymphoproliferative disorders (LP)
pctime:  days from MGUS until diagnosis of a plasma cell malignancy
futime:  days from diagnosis to last follow-up
death:   1 = follow-up is until death
alb:     albumin level at MGUS diagnosis
creat:   creatinine at MGUS diagnosis
hgb:     hemoglobin at MGUS diagnosis
mspike:  size of the monoclonal protein spike at diagnosis


Reference: http://vincentarelbundock.github.io/Rdatasets/datasets.html

In addition, the censoring status is the variable death: patients coded as 1 in PASW
died during follow-up, and all other patients are treated as censored. This is not a problem since the study recorded each patient's follow-up until he or she died. Moreover, the researcher created dummy variables for all independent
variables for use in the Cox Regression and Logistic Regression analyses.

Dummy Variables Coding:

Age
0 = 34 to 60 years old
1 = 61 to 90 years old
Sex
0 = Male
1 = Female
Patients who progress to a subtype plasma malignancy
0 = Those who do not
1 = Those who progress
Creatinine at MGUS Diagnosis
0 = 0.6 to 1.4
1 = 1.5 to 6.4
Albumin Level
0 = 1.8 to 3.0
1 = 3.1 to 5.1
Hemoglobin
0 = 6.8 to 12.0
1 = 12.1 to 16.5
Monoclonal Spike Size
0 = 0.8 to 1.8
1 = 1.9 to 2.9
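
The coding scheme above can be sketched in Python as follows. This is only an illustration, not the PASW recoding the researcher used: the function name `code_dummies`, the use of `None` for patients who did not progress, and the reading of each listed range as a simple threshold are all assumptions.

```python
def code_dummies(age, sex, pcdx, creat, alb, hgb, mspike):
    """Return 0/1 indicators following the coding scheme listed above.

    Assumption: each listed range is read as a threshold on its lower
    bound for category 1, and pcdx is None when no malignancy developed.
    """
    return {
        "dummyage": 1 if age >= 61 else 0,        # 61 to 90 years old
        "DummySex": 1 if sex == "female" else 0,  # 1 = Female
        "dummyPCDX": 0 if pcdx is None else 1,    # 1 = progressed
        "dummycreat": 1 if creat >= 1.5 else 0,   # 1.5 to 6.4
        "dummyalb": 1 if alb >= 3.1 else 0,       # 3.1 to 5.1
        "dummyhgb": 1 if hgb >= 12.1 else 0,      # 12.1 to 16.5
        "dummymspike": 1 if mspike >= 1.9 else 0, # 1.9 to 2.9
    }


# Hypothetical patient, invented for illustration only.
example = code_dummies(age=72, sex="female", pcdx="MM",
                       creat=1.2, alb=3.4, hgb=11.5, mspike=2.0)
```

Dichotomizing continuous laboratory values this way keeps the later hazard ratios easy to read, at the cost of discarding information within each range.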

Definitions and Related Literature


Monoclonal gammopathy of undetermined significance is defined by the presence of a
serum M protein of 3.0 g/dL or less; the absence of lytic bone lesions,
hypercalcemia, and anemia; and, if determined, a proportion of plasma cells in the bone
marrow of 10% or less (Robert Kyle et al., 2004). This plasma cell disorder is present
in 3 to 4 percent of people who are more than 50 years old (Rishi
K. Wadhera et al., 2011). There are four subtypes of plasma cell malignancy in the data. The most
common is multiple myeloma, followed by amyloidosis, macroglobulinemia, and
other lymphoproliferative disorders. The amount of abnormal protein produced helps to
differentiate these subtypes of plasma cell malignancy.
Multiple myeloma is a plasma cell malignancy that arises in the bone marrow and involves
the overproduction of monoclonal protein. It is one of the reasons why people develop anemia and
hypercalcemia. The study by Rishi K. Wadhera et al. showed that MGUS occurs in seven
percent of the patients who have multiple myeloma.
Amyloidosis occurs when amyloid, an abnormal protein, is produced in the bone
marrow or in the organs. This rare disease eventually affects the kidneys, liver, heart, and
even the nervous system. Patients with severe amyloidosis commonly develop
organ failure.
Macroglobulinemia, better known as Waldenstrom's macroglobulinemia, occurs when
cancer cells produce massive amounts of an abnormal protein called macroglobulin. These
cells commonly grow in the bone marrow, and the most common symptoms of this disorder are
heavy sweating, particularly at night, itchiness, unexplained fever, and weight loss.
Lymphoproliferative disorders occur when lymphocytes, the primary cells of lymphoid tissue,
are produced in excessive quantities. Such a disorder may occur in patients who have low red blood cell counts
(anemia) and is usually inherited if the patient's parents have this kind of disorder. According
to the study by Robert Kyle et al. (July 2004), which also examined these Mayo Clinic
data, sex, age, presence of organomegaly, light chain in the
urine, albumin level, and IgG subclass could not distinguish patients with
MGUS from patients with a lymphoplasma cell disorder.
Albumin is a protein produced by the liver. Its level is commonly tested in patients who
have liver disease. A low albumin level may leave the body with a very small amount of
nutrients.
Creatinine is a waste product that is filtered by the kidneys and excreted in urine.
It is a product of creatine, which supplies energy to the muscles. For adults with a kidney
disorder, dialysis is recommended when creatinine reaches 10.0 mg/dL, while for babies
dialysis should be done if the creatinine level reaches 2.0 mg/dL.
Hemoglobin is the main component of red blood cells and is responsible for carrying
oxygen from the lungs to the other tissues of the body. A low hemoglobin level is a sign of
anemia.

The monoclonal spike is the main sign that a patient has monoclonal
gammopathy of undetermined significance. Serum protein electrophoresis is used to
separate the proteins based on their physical properties, and the resulting fractions are
used to interpret which kind of plasma cell malignancy the patient has.

Limitations
From the data gathered, pctime was excluded because it has numerous missing values.
Moreover, creatinine has considerably more missing values than albumin,
hemoglobin (hgb), and mspike. The researcher therefore used Select Cases in PASW so that
cases with missing creatinine were excluded from the analysis. Cases that were not
selected by PASW do not enter the analysis, so of the 241 patients in the study, only 200
were analyzed.
Objectives
In Kaplan Meier: To test whether there is a significant difference between the survival curves of
patients who do and do not proceed to a subtype of plasma cell malignancy.
In Cox Regression and Logistic Regression: To form a regression model that can be used to
predict the hazard risk of the outcome.
Statistical Analysis
Kaplan Meier
The Kaplan Meier method, or Kaplan Meier estimate, performs survival analysis by
describing the distribution of time-to-event data and producing a survival plot. In
many clinical trials, participants leave the study before the event is observed. The reasons
include the following: they stop going to the hospital for follow-up, they are uncooperative, they move to
another hospital, they do not experience the event before the study ends,
death occurs, or the study has a fixed period of time. In these cases, the data are still used in the
analysis and are treated as censored data. Despite this complication,
Kaplan Meier is a good choice because it estimates survival in
the presence of many censored cases and computes the survival over time regardless of the
problems encountered. In addition, the Log Rank test is a well-known non-parametric test used for
comparing two or more survival curves or functions. This test helps the researcher determine whether
the survival probabilities are the same across the groups being compared, whether patients joined
the study early or late.
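
To make the product-limit idea concrete, here is a minimal sketch in plain Python. It is not the PASW procedure; the function names `kaplan_meier` and `median_survival` and the six-patient toy data are assumptions made purely for illustration.

```python
def kaplan_meier(times, events):
    """Product-limit estimate.

    times  : follow-up time of each subject
    events : 1 if the event (death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # censored cases leave the risk set without a step
    return curve


def median_survival(curve):
    """Earliest time at which the survival estimate is 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # the curve never reaches 50%


# Toy data (hypothetical): days of follow-up and event indicators.
toy_times = [5, 8, 8, 12, 15, 20]
toy_events = [1, 1, 0, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
toy_median = median_survival(toy_curve)
```

Censored subjects contribute to the number at risk up to their last follow-up and then drop out without lowering the curve, which is how the estimate uses incomplete observations.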
Advantages:
Estimates survival even when there are censored cases.
Estimates the expected event, failure, or survival time.
Creates survival plots.
Compares the distribution for each level of the independent variable and analyzes
those levels separately by the stratification variable.

Assumptions:
1. Censored cases are independent.
2. Censored and uncensored cases should have the same survival prospects.
3. The probability of the event of interest should depend only on time, without the
influence of covariates.
4. Cases that enter the study at different times (i.e. patients who were treated at different
times) should behave similarly.
5. The event studied (e.g. death) occurs at a clearly specified time.
Hypothesis:
Ho: There is no significant difference between the survival curves.
Ha: There is a significant difference between the survival curves.

Note: This hypothesis will be tested by the Log-rank Test.

Decision Rule:
Reject the null hypothesis if the P-value of the Log-rank Test is less than alpha.
Cox Regression
The log-rank test is helpful for comparing the survival curves of two or more groups, but
Cox regression, also known as proportional hazards regression, allows the researcher to
determine the effect of the independent variables on the hazard risk, and hence on the survival, of a patient.
Cox Regression is designed for analyzing time until an event or time between events (Garson,
G. D., 2013). It produces a regression model called the Cox model, which allows us to
estimate the hazard ratio, or relative risk of death, associated with the independent variables. The final
model from the analysis will yield an equation for the hazard as a function of several explanatory
variables (Stephen J Walters, May 2009).
Assumption:
1. The proportional hazards assumption must be met, meaning that the hazard ratio associated with each
independent variable is constant over time.

Logistic Regression
Logistic Regression is used to assess the effect of multiple independent variables on
a dependent variable that is usually dichotomous (meaning there are only two
outcomes). Logistic Regression has two forms, namely binary logistic regression and
multinomial logistic regression. If the dependent variable is dichotomous, binary logistic regression is used;
if the dependent variable has more than two categories, multinomial
logistic regression is used. Here, we use binary logistic regression since we have a
dichotomous dependent variable, the censoring status.
The difference between logistic regression and Cox Regression is that logistic regression estimates
odds ratios while Cox regression estimates hazard ratios.
Assumptions of Binary Logistic:
1. The dependent variable must be discrete and dichotomous.
2. The category of the dependent variable that represents the desired event must be coded as 1.
3. The model should be fitted properly and should not include many insignificant variables.

RESULTS
Kaplan Meier
Case Processing Summary
Patients who proceed to                                      Censored
subtype plasma malignancy    Total N    N of Events       N      Percent
(did not proceed)                151            107      44        29.1%
Amyloidosis                        7              7       0          .0%
Lymphoproliferative                4              2       2        50.0%
Macroglobulinemia                  5              4       1        20.0%
Multiple Myeloma                  33             31       2         6.1%
Overall                          200            151      49        24.5%

The table above summarizes the analysis dataset in terms of event and censored cases. It
presents the number of patients who experienced the event (151), the number of patients
who were censored (49), and the total number of patients (200). The N of
Events is the number of events that occurred; here our event is death, so an event indicates that the patient
was followed until death. The N of Censored counts those who withdrew
from the study for some reason, and we assume that these patients are still alive. The
Total N column tells us that there are 7 patients who have amyloidosis, 4 with a lymphoproliferative disorder,
5 with macroglobulinemia, 33 with multiple myeloma, and 151 who did not proceed to a
subtype of plasma cell malignancy.

The survival table above is a descriptive table that details the time until the event of
interest occurred. The table is sectioned by group, with one section for each subtype of plasma cell
malignancy and one for those who did not progress. Only part of the
table is shown since the full table is very large.
Time shows the time at which the event or censoring occurred. Again, our event here is
death. Status shows what happened at that time, either censoring or the event. The
Cumulative Proportion Surviving at the Time is the proportion of cases surviving from the
start of the table up to that time. N of Cumulative Events shows the number of cases that have
experienced the terminal event from the start up to that time. N of Remaining
Cases shows the number of cases for which neither the event nor censoring has yet occurred.

Means and Medians for Survival Time

Mean (a)
Patients who proceed to                              95% Confidence Interval
a subtype plasma malignancy   Estimate   Std. Error  Lower Bound  Upper Bound
(did not proceed)             5796.252      381.504     5048.504     6544.000
Amyloidosis                   4526.714      751.555     3053.666     5999.762
Lymphoproliferative           7399.250      799.452     5832.324     8966.176
Macroglobulinemia             4999.000     1097.063     2848.756     7149.244
Multiple Myeloma              5338.747      439.709     4476.917     6200.578
Overall                       5458.036      283.891     4901.608     6014.463

Median
Patients who proceed to                              95% Confidence Interval
a subtype plasma malignancy   Estimate   Std. Error  Lower Bound  Upper Bound
(did not proceed)             5088.000      512.841     4082.832     6093.168
Amyloidosis                   3511.000      168.901     3179.955     3842.045
Lymphoproliferative           8052.000         .000            .            .
Macroglobulinemia             5234.000     2101.064     1115.915     9352.085
Multiple Myeloma              4996.000      239.438     4526.701     5465.299
Overall                       5068.000      230.517     4616.187     5519.813
a. Estimation is limited to the largest survival time if it is censored.

The table above shows the mean and median survival time for each group. Here, the mean
survival time is the estimated area under the survival curve over the interval 0 to tmax
(Klein & Moeschberger, 2003). The median survival time is the earliest time at which
the probability of survival drops to 50% (0.5) or below.
Looking at the 95% confidence intervals, we can see that they overlap.
When confidence intervals overlap substantially, it is
doubtful that there is a difference in the "average" survival time. Since in our table there are only
small gaps between the estimates and the interval of the lymphoproliferative group overlaps with that of the patients who
do not proceed to plasma cell malignancy, we can conclude that there is only a slight or even no
difference between the average survival times of the groups.
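
The overlap argument can be checked numerically. The sketch below rebuilds each 95% interval for the mean as estimate plus or minus 1.96 standard errors, using the estimates and standard errors reported in the Means and Medians output. The normal-approximation formula, the helper names, and the group labels are assumptions for illustration; overlap is only an informal screen, and the formal comparison is the Log Rank test.

```python
from itertools import combinations


def normal_ci(estimate, std_error, z=1.96):
    """Approximate 95% interval: estimate +/- z * standard error."""
    return (estimate - z * std_error, estimate + z * std_error)


def overlaps(a, b):
    """True when two (lower, upper) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# (mean survival in days, standard error) as reported in the output.
mean_survival = {
    "no progression": (5796.252, 381.504),
    "amyloidosis": (4526.714, 751.555),
    "lymphoproliferative": (7399.250, 799.452),
    "macroglobulinemia": (4999.000, 1097.063),
    "multiple myeloma": (5338.747, 439.709),
}

intervals = {g: normal_ci(m, se) for g, (m, se) in mean_survival.items()}

# Check every pair of groups for overlapping intervals.
all_overlap = all(
    overlaps(intervals[a], intervals[b]) for a, b in combinations(intervals, 2)
)
```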
Now, we will proceed to the most important figure, the survival plot or the survival
curves.

[Figure: Kaplan Meier survival curves by subtype of plasma cell malignancy]

The figure above compares the survival distributions of patients who
proceed to a subtype of plasma cell malignancy and those who do not. It is the most important output
since it reveals the differences between the survival distributions. The horizontal axis shows
time, which in this figure is measured in days. The vertical axis shows
the survival probability: 1.0 indicates a 100% survival probability, and values of 0.4 and below
indicate a small probability of survival.
The survival curves look like staircases. Every step down in a curve
means that a patient died, since death is the event that causes the survival curve to move
down (Steve Dun, 2002). The patients who suffer from a
lymphoproliferative disorder have the highest survival probability from their start time up to almost 6,000
days. It is worth noting that the curve of the lymphoproliferative group starts to fall late
compared to the other groups (the amyloidosis group was the last to start). Still,
this group has the longest interval with a survival probability of 100%. After 8,000
days, however, the survival curve of the group drops quickly to zero. This indicates that no one
(except those who withdrew from the study) was cured. Thus, we can conclude that those who
suffer from a lymphoproliferative disorder in the study, except for those who were censored, have no
probability of survival after 8,000 days (21.9 years).
Those who do not proceed to a subtype of plasma cell malignancy have the longest
survival time. From almost 9,000 days onward, the group has a constant
survival probability of 23%. Thus, we can conclude that the patients remaining in this group
at that point were still alive at their last follow-up.
The group of patients with multiple myeloma ranks second in terms of
long survival. Although some of these patients reached
almost 10,000 days, they eventually died. Hence, we can conclude that a
patient with multiple myeloma has some probability of survival up to about 10,000
days, or 27.4 years.
Patients with macroglobulinemia have an intermediate survival
compared to the other groups, since the end point of their curve lies in the middle.
When a patient reaches almost 6,500 days (17.8 years), the probability of survival
falls to approximately 20%. Moreover, the survival curve of this group does
not reach a survival probability of 0, which means that the patient with the longest follow-up
was censored rather than observed to die.
Patients with amyloidosis were the latest to start
compared to the other groups and appear to have a constant risk. Constant risk indicates that
the probability of dying does not change over time. The curve has a constant half-life, and the
patients get no relief, since the risk is always the same (Dun, 2002). We can
conclude that all patients who had amyloidosis died before 8,000 days (21.9 years).
In addition, the median survival is the time at which the survival
probability is 50%. In the figure above, the marked point on each survival curve indicates
the median survival, and the broken lines indicate the corresponding number of days. For example,
the median survival of patients with multiple myeloma was about 5,000 days (13.7
years).
To compare the survival distribution, we will look to the table of overall comparisons.
Overall Comparisons
                                  Chi-Square    df     Sig.
Log Rank (Mantel-Cox)                  3.709     4     .447
Breslow (Generalized Wilcoxon)         2.980     4     .561
Tarone-Ware                            2.904     4     .574
Test of equality of survival distributions for the different levels of
Patients who proceed in plasma malignancy.

The table above shows the tests of whether the survival curves differ.
These tests are similar to each other but differ in how they weight
each time point. The Log Rank test, or Mantel-Cox test, is a test of equality of
survival functions that gives equal weight to all time points, while the Breslow test gives more weight
to earlier failures because it weights each time point by the number of cases at risk.
The Tarone-Ware test falls in between, since it weights each time point by the square root of the number of
cases at risk at that time point.
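
The three weighting schemes can be illustrated with a small sketch. It is restricted to two groups for brevity, so its statistic has 1 degree of freedom, whereas the comparison in this study involves five groups (4 degrees of freedom). The function name `weighted_logrank`, the `WEIGHTS` table, and the toy data are assumptions for illustration and are not PASW's implementation.

```python
import math

# Weight applied at each event time, as a function of the number at risk n.
WEIGHTS = {
    "log_rank": lambda n: 1.0,              # equal weight at every time point
    "breslow": lambda n: float(n),          # number of cases at risk
    "tarone_ware": lambda n: math.sqrt(n),  # square root of cases at risk
}


def weighted_logrank(times, events, groups, weight="log_rank"):
    """Two-group weighted log-rank test; groups must be coded 0/1.

    Returns (chi-square statistic with 1 df, p-value).
    """
    w_fn = WEIGHTS[weight]
    num, var = 0.0, 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)                  # at risk, both groups
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e == 1 and g == 1)
        w = w_fn(n)
        num += w * (d1 - d * n1 / n)                            # observed minus expected
        if n > 1:
            var += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi_square = num * num / var
    p_value = math.erfc(math.sqrt(chi_square / 2.0))            # chi-square upper tail, 1 df
    return chi_square, p_value


# Toy data (hypothetical), six subjects per group.
toy_times = [6, 7, 10, 15, 19, 25, 3, 5, 8, 9, 12, 14]
toy_events = [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
toy_groups = [0] * 6 + [1] * 6
results = {name: weighted_logrank(toy_times, toy_events, toy_groups, name)
           for name in WEIGHTS}
```

Because the number at risk is largest early in follow-up, the Breslow weights emphasize early differences between curves, while the log-rank weights treat early and late differences alike.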
Decision:
Since the Log Rank test has a significance value of 0.447, which is greater than the alpha of 0.05, we
fail to reject the null hypothesis and conclude that there is no significant difference between the
survival curves.
Conclusion:
There is no significant difference between the survival curves of patients who progress to a
subtype of plasma cell malignancy and those who do not.
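
As a check on the decision, the Sig. values can be reproduced from the reported chi-square statistics and their 4 degrees of freedom. The sketch below uses the closed-form upper tail of the chi-square distribution, which holds only for even degrees of freedom; the helper name `chi2_sf_even_df` is an assumption for illustration.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!,
    which is valid only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Chi-square statistics as reported in the Overall Comparisons output.
statistics = {
    "Log Rank (Mantel-Cox)": 3.709,
    "Breslow (Generalized Wilcoxon)": 2.980,
    "Tarone-Ware": 2.904,
}
alpha = 0.05
p_values = {name: chi2_sf_even_df(x, 4) for name, x in statistics.items()}
reject_null = {name: p < alpha for name, p in p_values.items()}
```

All three p-values are well above 0.05, so none of the three tests rejects the null hypothesis.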

Cox Regression
Case Processing Summary
                                                                  N     Percent
Cases available in analysis   Event (a)                         142       71.0%
                              Censored                           45       22.5%
                              Total                             187       93.5%
Cases dropped                 Cases with missing values          13        6.5%
                              Cases with negative time            0         .0%
                              Censored cases before the
                              earliest event in a stratum         0         .0%
                              Total                              13        6.5%
Total                                                           200      100.0%
a. Dependent Variable: Days from diagnosis to last follow-up days

This table shows the frequency of the analyzed data. The events, or deaths, number
142 and the censored cases number 45, for a total of 187 cases available in the analysis. A further
13 cases were dropped because of missing values. Thus, of the 200 selected cases, 187 were analyzed.
Now, we will proceed to the most important table, the Variables in the Equation.

Variables in the Equation
                                                                  95.0% CI for Exp(B)
                  B       SE      Wald    df    Sig.    Exp(B)     Lower     Upper
dummyPCDX      -.524    .201     6.833     1    .009      .592      .399      .877
dummyage       1.289    .198    42.290     1    .000     3.629     2.461     5.351
dummycreat      .864    .305     8.051     1    .005     2.373     1.306     4.312
dummyalb       -.583    .181    10.353     1    .001      .558      .391      .796
dummyhgb       -.966    .212    20.701     1    .000      .381      .251      .577
dummymspike     .010    .187      .003     1    .957     1.010      .700     1.457
DummySex       -.448    .183     5.986     1    .014      .639      .446      .915


This table displays the difference in hazard risk between the groups. B, or
the coefficient, tells us whether the risk of those coded 1 in the dummy variable is higher (positive B)
or lower (negative B) than that of those coded 0. Exp(B) is the hazard ratio, which expresses the size of the effect of each
variable on the hazard risk. Sig. is the p-value that helps us determine whether the coefficient
B, and therefore Exp(B), is significant. The p-value must be less than alpha, 0.05.
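
The columns of this table are linked by simple formulas: Wald = (B / SE)^2, Exp(B) = e^B, and the 95% limits are e^(B - 1.96 SE) and e^(B + 1.96 SE). The sketch below applies them to the dummyage row (B = 1.289, SE = .198). The function name `hazard_ratio_summary` is an assumption for illustration, and small differences from the printed values come from the rounding of B and SE.

```python
import math


def hazard_ratio_summary(b, se, z=1.96):
    """Wald statistic, p-value, hazard ratio and its 95% limits from B and SE."""
    wald = (b / se) ** 2
    return {
        "wald": wald,
        "p_value": math.erfc(math.sqrt(wald / 2.0)),  # chi-square upper tail, 1 df
        "exp_b": math.exp(b),
        "lower": math.exp(b - z * se),
        "upper": math.exp(b + z * se),
    }


# dummyage row of the Variables in the Equation table.
age_row = hazard_ratio_summary(1.289, 0.198)
```

A confidence interval for Exp(B) that excludes 1 corresponds to a coefficient that is significant at the 0.05 level.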
The Sig., or p-value, of every independent variable shows that it is significant, that is,
it contributes to the model, except for the monoclonal spike. However, before we create the model
we must satisfy the assumption of Cox Regression that the hazards of the independent variables
are proportional. Independent variables that do not satisfy the assumption will not be
included in the model.
The proportional hazards assumption states that the hazard ratio of an independent variable is constant
over time. We can check this by:
1. Creating a log minus log plot for the independent variable.
2. Using Cox with a time-dependent covariate.
The researcher used Cox with a time-dependent covariate. For example, for age:

Variables in the Equation
                                                                95.0% CI for Exp(B)
                B       SE      Wald    df    Sig.    Exp(B)     Lower     Upper
T_COV_        .000    .000      .002     1    .960     1.000     1.000     1.000
dummyage    -1.125    .385     8.539     1    .003      .325      .153      .690


The table above helps us see whether the variable satisfies the assumption. We do this
by looking at the value of Sig., which is the p-value of the time covariate (T_COV_). The null
hypothesis for the proportional hazards assumption is that the hazards are proportional. If the p-value is significant, we reject the null hypothesis: the hazards are not
proportional and the assumption is violated.
Here, the p-value of T_COV_ is 0.960, which is not significant. Hence,
we fail to reject the null hypothesis that the hazards are proportional. Thus, the age variable
satisfies the proportional hazards assumption and will be included in the model.

The following tables check the assumption for the other independent variables.

For Patients who progress to a plasma cell malignancy
Variables in the Equation
                                                                95.0% CI for Exp(B)
                B       SE      Wald    df    Sig.    Exp(B)     Lower     Upper
T_COV_        .000    .000    12.637     1    .000     1.000     1.000     1.000
dummyPCDX    1.121    .421     7.095     1    .008     3.066     1.345     6.994

For Creatinine
Variables in the Equation
                                                                95.0% CI for Exp(B)
                B       SE      Wald    df    Sig.    Exp(B)     Lower     Upper
T_COV_        .000    .000     4.135     1    .042     1.000      .999     1.000
dummycreat  -1.768    .454    15.180     1    .000      .171      .070      .415

For Albumin Level
Variables in the Equation
                                                                95.0% CI for Exp(B)
                B       SE      Wald    df    Sig.    Exp(B)     Lower     Upper
T_COV_        .000    .000     3.729     1    .053     1.000     1.000     1.000
dummyalb     1.178    .349    11.390     1    .001     3.247     1.638     6.434

For Hemoglobin
Variables in the Equation
                                                                95.0% CI for Exp(B)
                B       SE      Wald    df    Sig.    Exp(B)     Lower     Upper
T_COV_        .000    .000      .188     1    .665     1.000     1.000     1.000
dummyhgb      .940    .357     6.938     1    .008     2.560     1.272     5.152

For Sex
Variables in the Equation
                                                                95.0% CI for Exp(B)
                B       SE      Wald    df    Sig.    Exp(B)     Lower     Upper
T_COV_        .000    .000     3.634     1    .057     1.000     1.000     1.000
DummySex      .826    .336     6.043     1    .014     2.284     1.182     4.413


This is the summary table for checking the assumption for each independent variable.
Monoclonal spike size is not included since it is not significant in the Cox Regression table.

Covariates        P-Value
pcdx                0.000
Age                 0.960
Creatinine          0.042
Albumin Level       0.053
Hemoglobin          0.665
Sex                 0.057

The table above shows which independent variables satisfy the assumption of
proportional hazards. Variables whose p-value is less than 0.05 (significant) violate the assumption and are
excluded from the model, while variables whose p-value is greater than
0.05 (not significant) are included in the model. Thus, the independent variables age, albumin level,
hemoglobin, and sex satisfy the assumption. This indicates that these variables
have a constant hazard ratio over time and that they will be included in our model.
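
The selection rule described above amounts to a simple filter on the p-values of the time covariate. A minimal sketch, with variable names chosen only for illustration:

```python
# p-values of T_COV_ for each covariate, as listed in the summary table.
ph_test_p_values = {
    "pcdx": 0.000,
    "age": 0.960,
    "creatinine": 0.042,
    "albumin": 0.053,
    "hemoglobin": 0.665,
    "sex": 0.057,
}
alpha = 0.05

# Not significant: proportional hazards is not rejected, so the variable is kept.
retained = [v for v, p in ph_test_p_values.items() if p > alpha]
# Significant: the assumption is violated, so the variable is dropped.
excluded = [v for v, p in ph_test_p_values.items() if p <= alpha]
```

Note that albumin (0.053) and sex (0.057) pass only narrowly, so their inclusion depends on the choice of alpha = 0.05.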

The researcher then reran the Cox Regression analysis with the independent variables that
satisfy the assumption. The excluded independent variables are pcdx and creatinine,
since they violate the assumption.
Final Model for Cox Regression
Variables in the Equation
                                                               95.0% CI for Exp(B)
               B       SE      Wald    df    Sig.    Exp(B)     Lower     Upper
dummyage     1.240    .197    39.811     1    .000     3.455     2.351     5.079
dummyalb     -.529    .180     8.681     1    .003      .589      .414      .838
dummyhgb     -.918    .209    19.237     1    .000      .399      .265      .602
DummySex     -.364    .174     4.353     1    .037      .695      .494      .978


Interpretation
1. Age
The coefficient 1.240 tells us that there is a difference between the hazard risks of the
groups; a value of 0 would mean that there is no difference. Because the coefficient is positive, it
indicates that group 1, the patients aged 61 to 90 years, has a higher
hazard risk than the patients aged 34 to 60 years. In other words, as age
increases, the hazard risk increases and the survival rate of the patient decreases. The
Exp(B) of 3.455 indicates that the hazard of dying for
patients aged 61 to 90 years is 3.455 times that of patients
aged 34 to 60 years. Thus, we can conclude that the survival rate of patients
aged 34 to 60 is higher than that of patients aged 61 to 90.
2. Albumin level
The coefficient -.529 tells us that there is a difference between the hazard risks of
those who have a low albumin level and those who have a high albumin level at MGUS diagnosis. Since
the value has a negative sign, it indicates that patients with the higher albumin
level of 3.1 to 5.1 have a lower hazard risk than patients with
an albumin level of only 1.8 to 3.0. The Exp(B) of .589 implies that the hazard
of dying for patients whose albumin level is 3.1 to 5.1 is 0.589
times that of patients with an albumin level of 1.8 to 3.0 at MGUS
diagnosis, that is, about 41% lower. We can conclude that patients with an albumin level of 3.1 to 5.1 have a
higher survival rate than patients whose albumin level is 1.8 to 3.0.
In addition, if the value of Exp(B) is equal to 1, there is no
difference between the hazard risks of the groups. If it is less than 1, the hazard risk of those
coded 1 in the dummy variable is lower than that of those coded 0.
3. Hemoglobin
The coefficient -.918 tells us that there is a difference between the hazard risks of
patients with low hemoglobin and patients with high hemoglobin at MGUS diagnosis. Since
-.918 is negative, it indicates that patients with the higher hemoglobin
of 12.1 to 16.5 have a lower hazard risk than patients with
hemoglobin of 6.8 to 12.0. The Exp(B) of .399 indicates that the hazard
of dying for patients whose hemoglobin is 12.1 to 16.5 is 0.399 times
that of patients with lower hemoglobin at MGUS diagnosis, that is, about 60% lower. Thus,
we can conclude that patients with hemoglobin of 12.1 to 16.5 have a higher
survival rate than patients whose hemoglobin is 6.8 to 12.0.
4. Sex
The coefficient -.364 tells us that there is a difference between the hazard risks of
male and female patients. Since it is negative, it indicates that female patients (females are coded
1 in Sex) have a lower hazard risk than male patients. The Exp(B) of
.695 (Sig. = .037) indicates that the hazard of dying for female patients is
0.695 times that of male patients, that is, about 30% lower. Thus, we can
conclude that female patients have a somewhat higher survival rate than male
patients.
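COX REGRESSION MODEL

$h(t) = h_0(t)\,\exp\left(1.240\,\text{age} - 0.529\,\text{albumin level} - 0.918\,\text{hemoglobin} - 0.364\,\text{sex}\right)$

where age, albumin level, hemoglobin, and sex are the 0/1 dummy variables of the final model.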
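
The model can be used to compare patient profiles, because the baseline hazard h0(t) cancels when two hazards are divided. The sketch below uses the coefficients of the final model and takes the sex coefficient as negative, which is what its reported Exp(B) of .695 implies. The names `COEFFICIENTS` and `relative_hazard` and the example profiles are assumptions for illustration.

```python
import math

# Coefficients (B) of the final Cox model; each covariate is a 0/1 dummy.
COEFFICIENTS = {
    "age": 1.240,         # 1 = 61 to 90 years old
    "albumin": -0.529,    # 1 = 3.1 to 5.1
    "hemoglobin": -0.918, # 1 = 12.1 to 16.5
    "sex": -0.364,        # 1 = female (sign taken from Exp(B) = .695)
}


def relative_hazard(profile):
    """Hazard relative to a patient whose dummies are all 0: exp(sum of B * x)."""
    linear_predictor = sum(COEFFICIENTS[k] * profile[k] for k in COEFFICIENTS)
    return math.exp(linear_predictor)


reference = relative_hazard({"age": 0, "albumin": 0, "hemoglobin": 0, "sex": 0})
older_male_low_labs = relative_hazard(
    {"age": 1, "albumin": 0, "hemoglobin": 0, "sex": 0})
older_female_high_labs = relative_hazard(
    {"age": 1, "albumin": 1, "hemoglobin": 1, "sex": 1})
```

Under these coefficients, an older patient with low albumin and hemoglobin has about 3.5 times the reference hazard, while high albumin, high hemoglobin, and female sex together more than offset the effect of age.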

Logistic Regression

Case Processing Summary
Unweighted Cases (a)                            N     Percent
Selected Cases      Included in Analysis      187        93.5
                    Missing Cases              13         6.5
                    Total                     200       100.0
Unselected Cases                                0          .0
Total                                         200       100.0
a. If weight is in effect, see classification table for the total number of cases.

The table above tells us the number of cases analyzed. There are 187 included
cases and 13 missing cases. The percent column gives the proportion of included and
missing cases out of the 200 selected cases.

Block 0: Beginning Block


This part describes the null model or the model with no predictors and just the intercept.

Classification Table (a,b)
                                                     Predicted
                                            Censoring Status       Percentage
         Observed                          Censored       Died        Correct
Step 0   Censoring Status    Censored             0         45             .0
                             Died                 0        142          100.0
         Overall Percentage                                              75.9
a. Constant is included in the model.
b. The cut value is .500

Variables in the Equation
                        B      S.E.      Wald    df    Sig.    Exp(B)
Step 0   Constant    1.149     .171    45.126     1    .000     3.156


The B indicates the coefficient for the constant or the intercept in the null model.
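
The null-model values can be reproduced from the counts alone. With no predictors, the intercept is the log of the observed odds of dying, Exp(B) is those odds, and the model classifies every case into the larger category. A minimal sketch using the counts reported in the Step 0 classification table (142 died, 45 censored); the variable names are illustrative.

```python
import math

died, censored = 142, 45

odds = died / censored      # observed odds of dying, reported as Exp(B)
intercept = math.log(odds)  # constant B of the null model

# Every case is predicted as the more frequent outcome (died).
null_accuracy = max(died, censored) / (died + censored)
```

This 75.9% is the baseline that the full model's classification accuracy should be compared against.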
Variables not in the Equation
                                         Score    df    Sig.
Step 0   Variables     dummyPCDX(1)      7.059     1    .008
                       dummyage         29.289     1    .000
                       dummycreat        1.028     1    .311
                       dummyalb          3.792     1    .051
                       dummyhgb          6.824     1    .009
                       dummymspike        .017     1    .898
                       dumysex(1)         .191     1    .662
         Overall Statistics             53.177     7    .000


The table above is named "Variables not in the Equation" because none of the independent
variables selected for the model has been entered yet; the null model contains only the computed intercept.

Block 1: Method = Enter


This part is generally the most important part of the output. The main tables to look at
are the Classification Table and the table of Variables in the Equation.

Classification Table (a)
                                                     Predicted
                                            Censoring Status       Percentage
         Observed                          Censored       Died        Correct
Step 1   Censoring Status    Censored            26         19           57.8
                             Died                12        130           91.5
         Overall Percentage                                              83.4
a. The cut value is .500


The table above shows how well the full model classifies the cases. A perfect model would
have nonzero counts only on the diagonal, indicating that all cases are correctly classified.
The most important value here is the overall percentage in the lower right corner, which
indicates that the full model with all independent variables and the constant classifies 83.4%
of the cases correctly, compared with 75.9% for the null model. This suggests a good model.
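
The percentages in the classification table follow directly from its four counts. As an
illustration, the sketch below recomputes them in Python. The dictionary layout and the
function name are assumptions made for this example; only the counts come from the table.

```python
def classification_summary(table):
    """Return per-row and overall percent correct for table[observed][predicted] counts."""
    total = sum(sum(row.values()) for row in table.values())
    correct = sum(table[label][label] for label in table)  # diagonal counts
    per_row = {
        label: 100 * table[label][label] / sum(table[label].values())
        for label in table
    }
    return per_row, 100 * correct / total

# Counts from the Step 1 classification table of the full model.
step1 = {
    "Censored": {"Censored": 26, "Died": 19},
    "Died": {"Censored": 12, "Died": 130},
}
per_row, overall = classification_summary(step1)
print({label: round(pct, 1) for label, pct in per_row.items()}, round(overall, 1))
```

The off-diagonal counts (19 and 12) are the misclassified cases, so the closer they are to
zero, the closer the overall percentage is to 100.
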

Variables in the Equation

                                                                          95% C.I. for EXP(B)
                           B         S.E.    Wald      df    Sig.    Exp(B)    Lower    Upper
Step 1(a)   dummyPCDX(1)   -2.427    .638    14.461    1     .000    .088      .025     .309
            dummyage       2.512     .454    30.645    1     .000    12.331    5.067    30.010
            dummycreat     .446      .947    .222      1     .637    1.563     .244     9.989
            dummyalb       -.556     .494    1.266     1     .261    .573      .218     1.511
            dummyhgb       -2.152    .724    8.834     1     .003    .116      .028     .481
            dummymspike    .077      .446    .030      1     .862    1.080     .451     2.587
            dumysex(1)     .228      .434    .277      1     .599    1.256     .537     2.938
            Constant       3.941     .948    17.276    1     .000    51.469

a. Variable(s) entered on step 1: dummyPCDX, dummyage, dummycreat, dummyalb, dummyhgb, dummymspike, dumysex.

The table above gives the information needed to assess the relationship between the
dependent variable and the independent variables.
B is the coefficient used in forming the logistic regression equation that predicts the
dependent variable. The coefficients are in log-odds units, meaning each one shows how the
log-odds of a "success" change for a one-unit change in the independent variable, holding the
other variables constant. Since the event here is death, a positive coefficient means that the
odds of dying increase as the independent variable increases, and a negative coefficient means
that they decrease. Exp(B) is the corresponding odds ratio. An odds ratio of 1 implies that the
variable does not affect the odds of dying, a value less than 1 implies lower odds of dying,
and a value greater than 1 implies higher odds of dying.
S.E. is the standard error of each coefficient. It is used in constructing the confidence
interval for the parameter.
Wald is the chi-square statistic used for testing the null hypothesis that the coefficient
is equal to 0. Sig. is the p-value of this test and tells us whether the null hypothesis can be
rejected. A coefficient whose p-value is less than alpha = 0.05 is statistically significant
and hence will be used in the logistic regression equation or model.
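
The Wald, Sig., Exp(B) and confidence interval columns can all be derived from B and S.E.
The following Python sketch shows one way to do so under the usual large-sample assumptions
(Wald statistic with 1 degree of freedom and a normal-based 95% interval). The function name
is hypothetical, and the inputs are the rounded values printed for dummyhgb in the full model.

```python
import math

def wald_summary(b, se, z=1.96):
    """Wald chi-square, p-value (1 df), odds ratio and 95% CI for one coefficient."""
    wald = (b / se) ** 2
    # Upper tail of a chi-square with 1 df: P(X > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(wald / 2))
    odds_ratio = math.exp(b)
    ci = (math.exp(b - z * se), math.exp(b + z * se))
    return wald, p_value, odds_ratio, ci

# dummyhgb row of the full model: B = -2.152, S.E. = .724
wald, p, odds_ratio, (lower, upper) = wald_summary(-2.152, 0.724)
print(f"Wald = {wald:.3f}, Sig. = {p:.3f}, Exp(B) = {odds_ratio:.3f}, "
      f"95% C.I. = ({lower:.3f}, {upper:.3f})")
```

Because the inputs are already rounded to three decimals, the results may differ from the
SPSS table in the last digit.
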
Here, the coefficients of pcdx, age and hemoglobin are statistically significant since their
p-values are less than 0.05. Hence, they are the only variables that will be included in the
model; the others are excluded.
The researcher then re-ran the binary logistic regression with only the significant
independent variables to form the model.

Classification Table(a)

                                           Predicted
                                           Censoring Status       Percentage
Observed                                   Censored    Died       Correct
Step 1   Censoring Status    Censored      29          20         59.2
                             Died          13          138        91.4
         Overall Percentage                                       83.5

a. The cut value is .500

We can see that this new model correctly classifies 83.5% of the cases, about the same as
the full model (83.4%) with fewer predictors, indicating that the model is good.

Variables in the Equation

                                                                          95% C.I. for EXP(B)
                           B         S.E.    Wald      df    Sig.    Exp(B)    Lower    Upper
Step 1(a)   dummyPCDX(1)   -2.206    .577    14.617    1     .000    .110      .036     .341
            dummyage       2.623     .438    35.860    1     .000    13.778    5.839    32.512
            dummyhgb       -2.386    .690    11.944    1     .001    .092      .024     .356
            Constant       3.691     .833    19.636    1     .000    40.093

a. Variable(s) entered on step 1: dummyPCDX, dummyage, dummyhgb.

Interpretation
1. Patients who progress to a sub-type of plasma cell malignancy:
For patients who progress to a sub-type of plasma cell malignancy, we expect a decrease of
2.206 in the log-odds of dying compared with those who do not, holding all other
variables constant.
In terms of Exp(B), the odds of dying for patients who progress to a sub-type of plasma
cell malignancy are 0.110 times the odds for those who do not.
2. Age:
For patients whose age is 61 and above, we expect an increase of 2.623 in the log-odds of
dying compared with patients whose age is lower than 61, holding all other
variables constant.
In terms of Exp(B), the odds of dying for patients whose age is 61 and above are 13.778
times the odds for patients whose age is lower than 61.
3. Hemoglobin:
For patients whose hemoglobin is 12.1 to 16.5, we expect a decrease of 2.386 in the
log-odds of dying compared with patients whose hemoglobin is 6.8 to 12, holding all other
variables constant.
In terms of Exp(B), the odds of dying for patients whose hemoglobin is 12.1 to 16.5 are
0.092 times the odds for patients whose hemoglobin is 6.8 to 12.

Logistic Regression Model or Equation


log(p/(1-p)) = 3.691 - 2.206*pcdx + 2.623*age - 2.386*hemoglobin
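
To show how the equation turns into a prediction, the sketch below converts the log-odds
into a probability of dying and applies the .500 cut value. It uses the coefficients from the
second Variables in the Equation table (constant 3.691). The 0/1 coding is an assumption based
on the Interpretation section: pcdx = 1 for patients who progressed, age = 1 for 61 and above,
hemoglobin = 1 for 12.1 to 16.5. The function names are hypothetical.

```python
import math

# Coefficients of the reduced model (Variables in the Equation, second run).
COEF = {"const": 3.691, "pcdx": -2.206, "age": 2.623, "hemoglobin": -2.386}

def prob_died(pcdx, age, hemoglobin):
    """Predicted probability of death for 0/1-coded predictors (coding is assumed)."""
    logit = (COEF["const"] + COEF["pcdx"] * pcdx
             + COEF["age"] * age + COEF["hemoglobin"] * hemoglobin)
    return 1 / (1 + math.exp(-logit))  # inverse of log(p/(1-p))

def classify(p, cut=0.5):
    """Apply the cut value used in the classification table."""
    return "Died" if p >= cut else "Censored"

# Patient below 61 with hemoglobin 12.1 to 16.5 who progressed, under the assumed coding.
p = prob_died(pcdx=1, age=0, hemoglobin=1)
print(round(p, 3), classify(p))
```

For this profile the log-odds are 3.691 - 2.206 - 2.386 = -0.901, so the predicted
probability is below the cut value and the case is classified as censored.
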

Conclusion
In the Kaplan-Meier estimates, there is no significant difference between the survival
curves of patients who progress to a subtype of plasma cell malignancy and those who do not.
Thus, we can conclude that the survival probability does not differ significantly whether or
not a patient progresses to a subtype of plasma cell malignancy.
The formed regression models:
Cox Model
h(t) = h0(t) exp(1.240*age - 0.529*albumin level - 0.918*hemoglobin + 0.364*sex)

Binary Logistic Model
log(p/(1-p)) = 3.691 - 2.206*pcdx + 2.623*age - 2.386*hemoglobin
As stated earlier, the difference between logistic regression and Cox regression is that
logistic regression estimates the odds ratio while Cox regression estimates the hazard ratio.
The odds are the probability that the event occurs divided by the probability that it does
not, so the odds ratio here compares the odds of dying between levels of an independent
variable, regardless of when the death occurs. The hazard ratio, in contrast, takes the time
until the event into account. If the dependent variable is not related to time to event but is
rather an event that can be counted, logistic or other models are more fitting to use
(Garson, G. D., 2013). Therefore, the formed Cox regression model is more appropriate to use
since the data are time-to-event data.
Moreover, the independent variable that gives the highest hazard ratio among all variables
is age. Thus, we can conclude that older patients have a higher risk of dying.
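
The hazard ratio implied by a Cox coefficient is its exponential. The short Python sketch
below illustrates this for the age and sex terms of the Cox model (coefficients 1.240 and
0.364). The function name is hypothetical and the example is not part of the original
analysis.

```python
import math

def hazard_ratio(beta, delta=1.0):
    """Hazard ratio for a change of `delta` units in one covariate: exp(beta * delta)."""
    return math.exp(beta * delta)

# Coefficients of the age and sex terms in the Cox model.
hr_age = hazard_ratio(1.240)
hr_sex = hazard_ratio(0.364)
print(f"age: {hr_age:.3f}, sex: {hr_sex:.3f}")
```

A one-unit increase in the age term therefore multiplies the hazard by about 3.46, holding
the other variables constant, while the sex term multiplies it by about 1.44.
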

Recommendation
Young patients who are diagnosed with monoclonal gammopathy of undetermined
significance are highly recommended to undergo treatment since, based on the survival
analysis, young patients have a higher survival probability than old patients. The
formed regression models can be used in determining the hazard risk.
