An Introduction to Dynamic
Treatment Regimes
Marie Davidian
Department of Statistics
North Carolina State University
http://www4.stat.ncsu.edu/davidian
Dynamic Treatment Regimes Webinar

Outline
• What is a dynamic treatment regime, and why study them?

• Clinical trials to study dynamic treatment regimes
• Thinking in terms of dynamic treatment regimes
• Constructing dynamic treatment regimes
• Discussion

Hot topic
Personalized Medicine
Source of graphic: http://www.personalizedmedicine.com/

A perspective on personalized medicine
Clinical practice: Clinicians make (a series of) treatment
decision(s) over the course of a patient’s disease or disorder
• Key decision points in the disease process
• Fixed schedule , milestone in the disease process, event
necessitating a decision
• Several treatment options at each decision point
• Accruing information on the patient


• “Personalize” treatment to the patient
That is: Treatment in practice involves sequential
decision-making based on accruing information
• Suggests thinking about and studying treatment from this
perspective...

Clinical decision-making
How are these decisions made?

• Clinical judgment
• Practice guidelines based on study results, expert opinion
• Synthesize all information on a patient up to the point of
the decision to determine the next treatment action

Can clinical decision-making be formalized and made
“evidence-based”?

Dynamic treatment regime
Dynamic treatment regime:

• A set of sequential decision rules, each corresponding to a
key decision point
• Each rule dictates the treatment to be given from among
the available options based on the accrued information on
the patient to that point
• Taken together, the rules define an algorithm for making
treatment decisions
• Dynamic because the treatment action can vary depending
on the accrued information
• Ideally, provides an “evidence-based” approach to
personalized treatment

Treatment regime
Terminology/Convention:
• Often, treatment regime is used to refer generally to any
approach to deciding on treatment
• And dynamic treatment regime is reserved for the case
where patient information is used
• We will use these terms interchangeably
In fact: Many common situations can be cast as involving
(dynamic) treatment regimes

ADHD therapy
Sequential (scheduled) decision points

• Decision 1: Low dose therapy – 2 options: medication or
behavior modification
• Subsequent monthly decisions:
  ◦ Responders – Continue initial therapy
  ◦ Non-responders – 2 options: add the other therapy or
    increase dose of current therapy
• Objective: Improved end-of-school-year performance
Example from Susan Murphy, University of Michigan

Cancer treatment
Two (milestone) decision points:
• Decision 1: Induction chemotherapy (options C1, C2)
• Decision 2:
  ◦ Maintenance treatment for patients who respond
    (options M1, M2)
  ◦ Salvage chemotherapy for those who don’t respond
    (options S1, S2)
• Objective: Maximize survival time

Possible treatment regimes
Possible rules at Decision 1:
• “Give C1” (non-dynamic)
• “If age < 50, progesterone receptor level < 10 fmol,
RAD51 mutation, then give C1, else, give C2”
• “If patient is a Libra, Scorpio, or Sagittarius, give C1,
else, give C2”

Possible rules at Decision 2:
• “If patient responds, give maintenance M1; if does not
respond, give salvage S1” (dynamic)
• “If patient responds, age < 60, CEA > 10 ng/mL,
progesterone receptor level < 8 fmol, give M1, else, give
M2; if does not respond, age > 65, P53 mutation,
CA 15-3 > 25 units/mL, then give S1, else, give S2”

Result: Rules, and thus regimes, can be simple or complex
(or not realistic)
• More complex rules involve more “personalization” and
more closely mimic clinical practice
• There is an infinitude of possible rules at each decision
point, and thus an infinitude of possible regimes
• Ultimate goal: Find the “best” or “optimal” regime
Regimes of interest and “optimal” depend on the question
• For definiteness, assume larger outcomes are preferred

Classes of treatment regimes
1. Classical treatment comparison:
• Focus on a single decision point
• Cancer example: Decision 1
• Two regimes of interest: “Give C1” vs. “Give C2”
• Class of regimes of interest is D = { “Give C1”, “Give C2” }
• Usual question: “If all patients in the population were to be
given C1, would mean outcome (mean survival time) be
different from (better than) that if all patients in the
population were to be given C2?”
• Optimal regime in D: The regime such that, if all patients in
the population were to receive treatment according to it,
mean outcome would be the largest among all regimes in
D (here, “Give C1” or “Give C2”)

2. Which is the “best” treatment sequence?
• Multiple decision points
• Cancer example: Eight dynamic regimes of interest:
1. Give C1 followed by (M1 if response, S1 if no response)
   (the other seven follow the same pattern: Cj followed by
   (Mk if response, Sℓ if no response), j, k, ℓ = 1, 2)
• Class D of interest contains these 8 regimes
• Question: Comparison of mean outcomes if all patients in
the population were to follow each regime
• Optimal regime in D: The regime such that, if all patients
were to receive treatment according to it, mean outcome
would be the largest among all regimes in D
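As an illustration only, the eight regimes in this class can be enumerated mechanically. The sketch below assumes the pattern "Cj followed by (Mk if response, Sℓ if no response)" with j, k, ℓ = 1, 2; the function name and dictionary keys are hypothetical and chosen just for this example.

```python
from itertools import product


def embedded_regimes():
    """List the regimes 'Cj, then Mk if response, Sl if no response' for j, k, l in {1, 2}."""
    regimes = []
    for j, k, l in product((1, 2), repeat=3):
        regimes.append(
            {
                "induction": f"C{j}",
                "if_response": f"M{k}",
                "if_no_response": f"S{l}",
            }
        )
    return regimes


regimes = embedded_regimes()
for number, regime in enumerate(regimes, start=1):
    print(
        f"{number}. Give {regime['induction']} followed by "
        f"({regime['if_response']} if response, "
        f"{regime['if_no_response']} if no response)"
    )
```

Two options at each of three choices (induction, maintenance, salvage) give 2 × 2 × 2 = 8 regimes, which is where the count in the slide comes from.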

3. “Best” dynamic regime in a “feasible class”?
• Single or multiple decision points
• X1 = (lots of) patient information available at Decision 1
• In resource-limited setting, interested in rules depending
on a subset of X1 routinely collected, e.g., of form
“If age < η1 and PR < η2 give C2; else give C1”
PR = progesterone receptor level
• Class D of interest consists of all regimes of this form
(so for all values of η1 and η2)
• Optimal regime in D: The regime defined by values $\eta_1^{opt}$,
$\eta_2^{opt}$ such that, if all patients in the population were to
receive treatment according to it, mean outcome would be
the largest among all regimes in D

4. “Optimal” overall dynamic treatment regime:
• Cancer example: Two decision points
• X1 = patient information available at Decision 1, X2 =
additional information collected between Decisions 1 and 2
• Accrued information at each decision
Decision 1: H1 = X1
Decision 2: H2 = {X1, A1, X2}
• Class D of interest: All possible sets of rules
{d1(H1), d2(H2)}
• Each rule takes as input the accrued information and
outputs a treatment from among the available options
• Optimal regime in D: $\{d_1^{opt}(H_1), d_2^{opt}(H_2)\}$ such that, if all
patients were to receive treatment according to it, mean
outcome would be the largest among all regimes in D

In all of Cases 1–4: A set of rules at each of K decision points,
K = 1 or 2, depending on accrued information
Decision 1: H1 = X1
Decision 2: H2 = {X1, A1, X2}
d = d1(H1) or d = {d1(H1), d2(H2)}
• Case 1: K = 1, rules of form (simple)
d1(H1) = Cj for all H1, j = 1, 2
• Case 2: K = 2, rules of form (simple)
d1(H1) = Cj for all H1, j = 1, 2
X2 contains response status
d2(H2) = Mk if response, Sℓ if no response, k, ℓ = 1, 2
• Case 3: K = 1, code {C1, C2} = {0, 1}, rules of form
d1(H1) = I(age < η1, PR < η2)
• Case 4: K = 2, general rules {d1(H1), d2(H2)}; e.g., with
two options coded as {0, 1} at each decision
$d_1(H_1) = I(\eta_1^T H_1 > 0)$, $d_2(H_2) = I(\eta_2^T H_2 > 0)$
Rules involve linear combinations of accrued information
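To make the "rule as a function of accrued information" idea concrete, here is a minimal sketch of the Case 3 and Case 4 rule forms. It assumes histories are numeric vectors with a leading 1 for an intercept (an assumption, not stated in the slides); the function names, the `observe_x2` callback, and all parameter values are hypothetical.

```python
def d1_case3(age, pr, eta1, eta2):
    """Case 3 rule I(age < eta1, PR < eta2): 1 codes C2, 0 codes C1."""
    return int(age < eta1 and pr < eta2)


def linear_rule(eta, h):
    """Case 4 rule I(eta^T h > 0) for a numeric history vector h."""
    return int(sum(e * x for e, x in zip(eta, h)) > 0)


def follow_regime(x1, observe_x2, eta1, eta2):
    """Apply {d1(H1), d2(H2)} in sequence; H2 = {X1, A1, X2}."""
    h1 = [1.0] + list(x1)             # H1 = X1, with an intercept term (assumption)
    a1 = linear_rule(eta1, h1)        # Decision 1
    x2 = observe_x2(a1)               # information accrued after Decision 1
    h2 = h1 + [float(a1)] + list(x2)  # H2 = {X1, A1, X2}
    a2 = linear_rule(eta2, h2)        # Decision 2
    return a1, a2


# Hypothetical patient: age 45, PR 6; X2 is a response indicator equal to 1.
first = d1_case3(age=45, pr=6, eta1=50, eta2=10)
pair = follow_regime(
    x1=[45.0, 6.0],
    observe_x2=lambda a1: [1.0],
    eta1=[50.0, -1.0, 0.0],            # 50 - age > 0
    eta2=[-0.5, 0.0, 0.0, 0.0, 1.0],   # treat with option 1 if response
)
print(first, pair)
```

The point of the design is that the regime is the pair of functions, not the treatments a particular patient ends up receiving; different values of η define different regimes in the class.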

Studying dynamic treatment regimes
How do we find an optimal treatment regime within a class
of interest?
• Required: Appropriate data
• Case 1. Classical, single decision treatment comparison:
Data from a standard clinical trial comparing C1 and C2
• Case 2. Optimal treatment sequence for two decision
points (simple dynamic treatment regimes)
• We will return to Cases 3 and 4 later

Clinical trials for studying treatment regimes
Recall: In our example, D consists of eight regimes
How do we compare the regimes in D and identify the “best”?
Can’t we base this on data from a series of previous trials?
• In one trial, C1 was compared against C2 in terms of
response rate
• In another trial, M1 and M2 were compared on the basis of
survival time in subjects who responded to their induction
chemotherapy
• In yet another, S1 and S2 were compared (survival) in
subjects for whom induction therapy did not induce
response
• Can’t we just “piece together” the results from these
separate trials to figure out the “best regime”?
• E.g., figure out the best “C” treatment for inducing
response and then the best “M” and “S” treatments for
prolonging survival?
• Wouldn’t the regime that uses these have to have the
“best” mean outcome?
One problem with this: Delayed effects
• E.g., C1 may yield a higher proportion of responders than
C2 but may also have other effects that render subsequent
maintenance treatments less effective in terms of mean
survival time
• Implication: Must study entire regimes in the same
patients
Data for doing this:
• Design a clinical trial expressly for this purpose (next)
• Use longitudinal observational data, where treatments
actually received at each decision point have been
recorded (with other information)

Clinical trials:
• An eight-arm trial – subjects randomized to the jth arm
follow the jth regime
• A Sequential, Multiple Assignment, Randomized Trial
(next slide...)
• How to analyze the data to compare regimes and find the
optimal regime? What else can be learned from such
trials?

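SMART: Sequential, Multiple Assignment, Randomized Trial
[Figure: SMART schematic for the cancer example, with randomization
at each branch point. Patients are randomized to C1 or C2; within
each arm, responders are randomized to M1 or M2 and nonresponders
to S1 or S2.]
Pioneered by Susan Murphy, Phil Lavori, and others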

Embedded regimes: The eight regimes in D are embedded in
the SMART

Examples of SMARTs: SMARTs have been carried out or are
ongoing, mainly in behavioral disorders; see
http://methodology.psu.edu/ra/smart/projects
• SMARTs have also been done in oncology (coming up. . . )

Remarks:
• There is really no conceptual difference between
randomizing up front or sequentially
• Advantages and disadvantages, e.g., consent, balance
• Important: Making efficient use of the data
Seminal reference: Murphy SA. (2005). An experimental
design for the development of adaptive treatment strategies.
Statistics in Medicine, 24, 1455–1481.

Remark 1: Individuals following the same regime can have
different realized treatment experiences , e.g.,
Give C1 followed by (M1 if response, S1 if no response)
• Subject 1 : Receives C1 , responds, receives M1
• Subject 2 : Receives C1 , does not respond, receives S1
• Both subjects’ experiences are consistent with following
this regime

Remark 2: Individuals following different regimes can have the
same realized treatment experience, e.g., experience
C1 ⇒ Response ⇒ M1
is consistent with having followed either of the regimes
• C1 followed by (M1 if response, S1 if no response)
• C1 followed by (M1 if response, S2 if no response)
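The distinction between a regime and a realized experience in Remarks 1 and 2 can be checked mechanically. The sketch below is illustrative only: it assumes the eight-regime cancer example, and the function name and tuple representation are hypothetical.

```python
from itertools import product


def consistent_regimes(induction, responded, second_stage):
    """Return every (Cj, Mk, Sl) regime that could have produced a realized experience."""
    matches = []
    for j, k, l in product((1, 2), repeat=3):
        c, m, s = f"C{j}", f"M{k}", f"S{l}"
        dictated = m if responded else s  # what the regime dictates at Decision 2
        if induction == c and second_stage == dictated:
            matches.append((c, m, s))
    return matches


# Experience C1 => Response => M1
responder = consistent_regimes("C1", True, "M1")
# Experience C1 => No response => S1
nonresponder = consistent_regimes("C1", False, "S1")
print(responder)
print(nonresponder)
```

Each realized experience matches two of the eight regimes, because the rule for the branch the patient did not take is never exercised.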

Remark 3: Do not confuse the regime with the possible
realized experiences that can result from following it
• “C1 followed by response followed by M1 ” and “C1 followed
by no response followed by S1 ” are not regimes but are
possible results of following the above regime
• The regime is the algorithm (set of rules)

Remark 4: Do not confuse dynamic treatment regimes
themselves or SMARTs with response-adaptive clinical trial
designs for classical treatment comparisons
• A dynamic treatment regime is an algorithm for treating a
single patient
• This has nothing to do with other patients in a study
• An adaptive trial is one in which the data are used to alter
the design (e.g., drop an arm, sample size)
• The design of a SMART does not change

Estimation of mean outcome (e.g., mean survival):

• Usual approach under up-front randomization : estimate
mean for regime j by sample average outcome based on
subjects randomized to regime j only

• However: Subjects will have realized experiences
consistent with more than one regime!
• This can be exploited to improve precision...

Estimating mean outcome for embedded regimes
Demonstration:
• A certain kind of SMART is common in oncology...
• ...but the way these trials are usually analyzed does not focus
on comparing the embedded dynamic treatment regimes
and finding the best treatment sequence
• We demonstrate the general principle of how to exploit
realized experiences consistent with more than one regime
to do this
Reference: Lunceford JK, Davidian M, Tsiatis AA. (2002).
Estimation of survival distributions of treatment policies in
two-stage randomization designs in clinical trials. Biometrics,
58, 48–57.

Cancer and Leukemia Group B (CALGB) Protocol 8923:

Double-blind, placebo-controlled trial of 338 elderly subjects
with acute myelogenous leukemia (AML) with randomizations
at two key decision points

• Decision 1: Subjects randomized to either standard
induction chemotherapy C1 or standard induction therapy
+ granulocyte-macrophage colony-stimulating factor
(GM-CSF) C2 (two options)
• Decision 2:
  ◦ If response, subjects randomized to M1, M2 =
    intensification/maintenance treatments I, II (two options)
  ◦ If no response, only one option: follow-up with physician
• All subjects followed for the outcome survival time

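Four possible regimes: The class D of interest comprises
1. C1 followed by (M1 if response, else follow-up) (C1M1)
2. C1 followed by (M2 if response, else follow-up) (C1M2)
3. C2 followed by (M1 if response, else follow-up) (C2M1)
4. C2 followed by (M2 if response, else follow-up) (C2M2)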

Schematic of CALGB 8923: Randomization at •s
[Figure: AML patients are randomized to Chemo + Placebo or
Chemo + GM-CSF. In each arm, responders are randomized to
Intensification I or Intensification II; nonresponders receive
follow-up.]

Standard analysis:
• Compare response rates to C1 and C2
• Compare survival between M1 and M2 among responders
• Compare survival between C1 and C2 regardless of
subsequent response
• Does not address the embedded regimes

Goal: Find the regime in D such that, if all patients in the
population were to receive treatment according to it, mean
survival would be the largest
• Estimate mean survival if all patients followed each of the
four embedded regimes CjMk, j = 1, 2, k = 1, 2
• Use data from all subjects whose realized experience is
consistent with having followed CjMk
• I.e., subjects with either
  Cj ⇒ response ⇒ Mk
  Cj ⇒ no response ⇒ follow-up with physician

Statistical framework: Causal inference perspective

• Characterize in terms of potential outcomes
Consider first: Classical single decision treatment comparison

Statistical framework
Case 1: Classical, single decision treatment comparison
• D = { “Give C1”, “Give C2” }
• Hypothesize potential outcomes under each regime in D
• $Y^{(1)}$ = outcome that would be achieved if a randomly
chosen patient from the population were to follow regime
“Give C1”; $Y^{(2)}$ defined analogously
• $E(Y^{(1)})$ = the mean outcome if all patients in the
population were to follow “Give C1”; $E(Y^{(2)})$ analogously
• Question: “If all patients in the population were to be
given C1, would mean outcome be different from (better
than) that if all patients were to be given C2?”
⇒ Compare $E(Y^{(1)})$ and $E(Y^{(2)})$

Clinical trial: Do not observe $Y^{(1)}$ and $Y^{(2)}$ on each subject
• With A = 1 (2) if the subject is randomized to “Give C1”
(“Give C2”), we do observe (Y, A), where
\[ Y = Y^{(1)} I(A = 1) + Y^{(2)} I(A = 2) \]
• By randomization, $Y^{(1)}, Y^{(2)} \perp\!\!\!\perp A$
⇒ $E(Y^{(1)}) = E(Y^{(1)} \mid A = 1) = E(Y \mid A = 1)$,
and similarly for $E(Y^{(2)})$
• Thus, from observed data $(Y_i, A_i)$, i = 1, ..., n (iid), can
estimate $E(Y^{(1)})$ by
\[ \frac{\sum_{i=1}^{n} Y_i I(A_i = 1)}{\sum_{i=1}^{n} I(A_i = 1)}, \]
the usual sample average, and $E(Y^{(2)})$ similarly

Case 2: Optimal treatment sequence for two decision points
• D = { CjMk, j, k = 1, 2 }
• $Y^{(jk)}$ = survival time that would be achieved if a randomly
chosen patient from the population were to follow CjMk
• Question: Compare mean survival if all patients followed
each of CjMk, j, k = 1, 2
⇒ Compare (estimate) $E(Y^{(jk)})$, j, k = 1, 2
• Or survival probabilities
$S_{jk}(t) = \mathrm{pr}(Y^{(jk)} > t) = E\{I(Y^{(jk)} > t)\}$, j, k = 1, 2
• Assume no censoring (can be generalized)

Clinical trial (e.g., SMART): Do not observe $Y^{(jk)}$, j, k = 1, 2
• Can we make a connection between potential outcomes
and observed data as we did in Case 1?
• Consider j = 1; j = 2 similar
Observed for each subject: (R, RZ, Y)
• Y = survival time
• R = 1 if subject responds to C1, R = 0 if not
• Z = k for responder randomized to Mk, k = 1, 2
(not defined if R = 0)
• Assume when R = 0, $Y^{(11)}$, $Y^{(12)}$ are the same; then
\[ Y = (1 - R) Y^{(11)} + R\, I(Z = 1) Y^{(11)} + R\, I(Z = 2) Y^{(12)} \]
• From observed data $(R_i, R_i Z_i, Y_i)$, i = 1, ..., n (iid),
estimate $E(Y^{(11)})$, $E(Y^{(12)})$, and similarly for j = 2

Consider j = 1: Responders to C1 are randomized to M1 with
probability π = 1/2
• Nonresponders to C1 ⇒ follow-up
• Half of responders get M1, half get M2
• Estimate mean survival for C1M1 by weighted average
• Nonresponders represent themselves ⇒ weight = 1
• Each responder who got M1 represents him/herself and
another similar subject who got randomized to M2 ⇒
weight = 2
• Estimator for C1M2: switch roles
• Note: Survival times from nonresponders are used to
estimate the means for both C1M1 and C1M2

Formally: For j = 1 (j = 2 similar), $(R_i, R_i Z_i, Y_i)$, i = 1, ..., n
• $Y_i$ = survival time for subject i
• $R_i = 1$ if i responds to C1, $R_i = 0$ if not
• $Z_i = k$ for responder randomized to Mk, k = 1, 2
• $\mathrm{pr}(Z_i = 1 \mid R_i = 1) = \pi$ (= 1/2 in previous)
Estimators for $E(Y^{(11)})$: With $Q_i = 1 - R_i + R_i I(Z_i = 1)\pi^{-1}$,
\[ n^{-1} \sum_{i=1}^{n} Q_i Y_i \quad \text{or} \quad \Big( \sum_{i=1}^{n} Q_i \Big)^{-1} \sum_{i=1}^{n} Q_i Y_i \]
• $Q_i = 0$ if i is inconsistent with C1M1 (consistent with C1M2)
• $Q_i = 1$ if $R_i = 0$
• $Q_i = \pi^{-1}$ if $R_i = 1$ and $Z_i = 1$
• Similarly for $E(Y^{(12)})$
• Can show: $E(QY) = E(Y^{(11)})$, $E(Q) = 1$
• And similarly for j, k = 1, 2
• ⇒ Consistent estimators for $E(Y^{(jk)})$ (Appendix)
• Estimators for $E(Y^{(jk)})$, k = 1, 2, are correlated
• Can derive statistics for comparison ⇒ identify optimal
regime in D
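The two weighted estimators above can be sketched in a few lines. This is an illustration under the slide's assumptions (no censoring, known randomization probability), using made-up toy data; the function name and the `(R, Z, Y)` tuple layout are hypothetical.

```python
def regime_mean_estimators(data, k, pi_k):
    """Weighted estimators of E(Y^(1k)) from subjects randomized to C1.

    data: list of (R, Z, Y); Z is None for nonresponders (R == 0).
    k: second-stage option defining the regime C1 Mk.
    pi_k: known probability that a responder is randomized to Mk.
    Returns (n^{-1} sum Q_i Y_i, (sum Q_i)^{-1} sum Q_i Y_i).
    """
    weights = []
    for r, z, _ in data:
        if r == 0:
            weights.append(1.0)         # nonresponder: consistent with the regime
        elif z == k:
            weights.append(1.0 / pi_k)  # responder who received Mk
        else:
            weights.append(0.0)         # responder who received the other option
    weighted_sum = sum(q * y for q, (_, _, y) in zip(weights, data))
    return weighted_sum / len(data), weighted_sum / sum(weights)


# Toy data (not from CALGB 8923): one nonresponder, three responders.
toy = [(0, None, 4.0), (1, 1, 10.0), (1, 1, 14.0), (1, 2, 20.0)]
simple, normalized = regime_mean_estimators(toy, k=1, pi_k=0.5)
print(simple, normalized)
```

In this toy sample the weights sum to 5 rather than n = 4, so the two estimators differ (13.0 versus 10.4); both target the same quantity, and the second normalizes by the realized sum of weights.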

Remarks:
• Subjects may die before having a chance to respond –
nonresponders at the time of death (R = 0)
• Survival time may be right-censored – can incorporate
inverse probability of censoring weighting
• Randomization at each decision is key ⇒ subjects are
prognostically similar
• Can be generalized to arbitrary number of decisions,
numbers of options at each

Designing SMARTs
Considerations:
• Class of regimes should involve key decision points where
it is feasible to randomize
• And with more than one treatment option and no
consensus on choice among options
• Simplicity – small numbers of decision points and options
• Embedded regimes should have simple decision rules;
e.g., depending only on a few variables (response status)
• Criteria and methods for sample size determination remain
an open problem
• Critical: Collect rich patient information at baseline and
between decision points to inform development of more
complex, optimal regimes (e.g., Cases 3 and 4)
• More shortly. . .


Thinking in terms of dynamic treatment regimes
Questions not addressed in a conventional clinical trial:

• If a treatment is effective, what should be the duration of
administration?
• How would the randomized treatments have compared if
no patients had discontinued their assigned treatments?

Such questions can be cast as questions about dynamic
treatment regimes
• Available data are almost always observational
• Databases from registries
• Databases from completed clinical trials

Example: Optimal treatment duration
• ESPRIT trial – Integrilin vs. placebo in PCI/stent patients
• Primary analysis: Integrilin superior
• Protocol: Infusion duration of 18–24 hours with
mandatory stopping for adverse events
• Duration of infusion left to physician discretion
• What should be the “recommended” treatment duration?
• Data are observational with respect to this question

More precisely: Treatment duration of t hours means infuse
for t hours or until an adverse event requiring stopping,
whichever comes first
• This is a dynamic treatment regime for each t because
realized duration depends on the adverse event status
Reference: Johnson BA, Tsiatis AA. (2004). Estimating mean response
as a function of treatment duration in an observational study, where
duration may be informatively censored. Biometrics, 60, 315–323.

Duration regime of t hours:
[Figure: flowchart. Start Integrilin infusion; if an AE occurs
before t hours, stop infusion immediately; if no AE occurs before
t hours, stop infusion at t hours.]
• D = { all regimes of the form “infuse for t hours or until an
adverse event requiring stopping, whichever comes first”
for 18 ≤ t ≤ 24 }
• Objective: Find $t^{opt} \in [18, 24]$ leading to largest mean
outcome (probability of no CVD event in 30 days)
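A duration regime is a rule, and the realized duration is what results from applying it to a patient. The following sketch illustrates that distinction; the function name and the convention that `ae_time=None` means no stopping adverse event occurred are assumptions made for this example.

```python
def realized_duration(t, ae_time=None):
    """Realized infusion length under the regime 'infuse for t hours or until
    an adverse event requiring stopping, whichever comes first'.

    t: target duration in hours, restricted to the class 18 <= t <= 24.
    ae_time: hour at which a stopping adverse event occurs, or None if none.
    """
    if not 18 <= t <= 24:
        raise ValueError("t must lie in [18, 24] for regimes in this class")
    if ae_time is not None and ae_time < t:
        return ae_time  # AE before t hours: stop immediately
    return t            # no AE before t hours: stop at t hours


stopped_early = realized_duration(20, ae_time=12.5)
ran_to_target = realized_duration(20)
ae_after_target = realized_duration(20, ae_time=22)
print(stopped_early, ran_to_target, ae_after_target)
```

A patient stopped at 12.5 hours for an adverse event has an experience consistent with every t in [18, 24], which parallels Remark 2 on SMARTs: one realized experience can be consistent with many regimes.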

Example: Treatment comparison in presence of treatment
discontinuation
• SYNERGY trial - enoxaparin (ENOX) vs. unfractionated
heparin (UFH) in ACS patients (open label )
• Primary (intent-to-treat) analysis : No difference
• Lots of treatment discontinuation (switching, stopping)
• Some mandatory due to adverse events , some at
clinician/patient discretion
• How do the treatments compare if there were no
discontinuation ?

Objective: Compare the two dynamic treatment regimes
“Take ENOX (UFH) until completion or discontinuation for
mandatory reasons”
Reference: Zhang M, Tsiatis AA, Davidian M, Pieper KS, Mahaffey KW.
(2011). Inference on treatment effects from a clinical trial in the
presence of premature treatment discontinuation: The SYNERGY trial.
Biostatistics, 12, 258–269.

Studying regimes based on observational data
Again: Data are observational with respect to these questions
• Decisions on duration , treatment discontinuation were
not randomized
• Made at clinician/patient discretion

Difficulties for studying regimes:
• Confounding – subjects receiving one treatment or another
may not be prognostically similar
• E.g., subjects who discontinued may be sicker, older, etc.
• Standard methods are available to adjust for confounding,
e.g., regression, propensity scores, etc., assuming no
unmeasured confounders
• However, the time-dependent nature of treatment causes
additional complications

Time-dependent confounding: Treatments actually received
over time depend on accruing information
• Temptation: “Adjust” for such time-dependent confounding
• E.g., a Cox model for outcome including time-dependent
intermediate variables and treatments
• However: Part of the effect of treatment on outcome may
be mediated through intermediate variables
• ⇒ Adjustment would incorrectly remove this effect and
hence misrepresent the true treatment effect

Resolution:
• Requires a generalization of no unmeasured confounders
• Unverifiable from the observed data

Sequential randomization assumption: At any point where a
treatment decision is made, the treatment received (among the
options available) depends only on the accrued information on
the patient and not additionally on his/her future prognosis
• At some level, this must be true
• In a SMART, this is automatically true by randomization
• With observational data, is tenable only if all accrued
information used to make decisions is available in the
database

Under sequential randomization: Inference on dynamic
treatment regimes
• Can use weighted methods similar to those discussed
earlier for Case 2 , extended to multiple decision points
• Critical difference : Rather than weighting based on known
randomization probabilities , weighting is based on the
propensities of receiving treatment at each decision as a
function of accrued information
• Modeling/estimation of propensities

Moral:
• Many complex questions can be posed in terms of a class
of dynamic treatment regimes
• Methods are available for inference on regimes in the class
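To illustrate weighting by estimated propensities rather than known randomization probabilities, here is a deliberately simplified sketch: a single decision point, one discrete covariate, and propensities estimated by within-stratum sample proportions. All of these are simplifying assumptions for illustration; with several decision points the weight would be the product of the inverse propensities over the decisions, and propensity models would typically be regression-based. Function names and data are hypothetical.

```python
from collections import defaultdict


def estimate_propensities(data):
    """Estimate pr(A = 1 | X = x) by the sample proportion treated within each stratum x."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n in stratum, n treated]
    for x, a, _ in data:
        counts[x][0] += 1
        counts[x][1] += a
    return {x: treated / n for x, (n, treated) in counts.items()}


def ipw_regime_mean(data, rule):
    """Normalized inverse-propensity-weighted mean outcome under a rule x -> {0, 1}."""
    propensity = estimate_propensities(data)
    numerator = denominator = 0.0
    for x, a, y in data:
        if a != rule(x):
            continue  # experience inconsistent with the regime: weight 0
        p = propensity[x] if a == 1 else 1.0 - propensity[x]
        numerator += y / p
        denominator += 1.0 / p
    return numerator / denominator


# Toy observational data (x, a, y): sicker patients are treated more often.
toy = [
    ("sick", 1, 2.0), ("sick", 1, 4.0), ("sick", 1, 3.0), ("sick", 0, 1.0),
    ("well", 1, 8.0), ("well", 0, 6.0), ("well", 0, 7.0), ("well", 0, 5.0),
]
treated_outcomes = [y for _, a, y in toy if a == 1]
naive = sum(treated_outcomes) / len(treated_outcomes)
weighted = ipw_regime_mean(toy, rule=lambda x: 1)
print(naive, weighted)
```

In the toy data the unweighted mean among the treated (4.25) is pulled down because treated subjects are mostly from the "sick" stratum; the weighted estimate (5.5) rebalances the strata. This holds only under the no-unmeasured-confounders assumption discussed above.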

Constructing dynamic treatment regimes
Cases 3 and 4: More complex regimes focused on
personalizing treatment to the patient
• Case 3 : D = specified class of feasible regimes
• Case 4 : D = all possible regimes
• Rules involve accrued information on the patient

Can we estimate an optimal regime within these classes?
• From data from a SMART in which detailed accruing
information was collected?
• From data from an observational database?

Characterizing an optimal regime
Demonstration: Characterize an optimal regime $d^{opt}$ in the
class D of all possible regimes d (Case 4)
• Single decision point
• Two treatment options coded as {0, 1}
• d ∈ D is a single rule $d_1(X_1)$ taking values 0 or 1
• Data from a conventional clinical trial (simplest SMART):
$(X_{1i}, A_{1i}, Y_i)$, i = 1, ..., n (iid), where
$A_1$ is treatment received taking values {0, 1}
• Assume large outcomes are better

Potential outcome for a regime: For any regime d ∈ D
• Y (0) and Y (1) are potential outcomes if a randomly chosen
patient were to receive treatments 0 and 1, respectively

• Potential outcome if a randomly chosen patient were to
follow regime d
Y (d) = Y (1) I{d(X1 ) = 1} + Y (0) I{d(X1 ) = 0}
= Y (1) d(X1 ) + Y (0) {1 − d(X1 )}
• E(Y (d) ) = mean outcome if all patients in the population
were to follow regime d

Optimal regime $d^{opt}$: $d^{opt}$ maximizes $E(Y^{(d)})$ among
all $d \in \mathcal{D}$
• Can we estimate $d^{opt}$ satisfying this from the trial data?
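
The definitions of $Y^{(d)}$ and $E(Y^{(d)})$ can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the data-generating model, the function names, and the three candidate rules are all hypothetical choices made for illustration. Both potential outcomes are visible here only because the data are simulated; in a real trial only one is observed per patient.

```python
import numpy as np

def simulate_potential_outcomes(n, rng):
    """Draw (X1, Y(0), Y(1)) from a hypothetical model in which
    the benefit of treatment 1 over treatment 0 equals X1."""
    x1 = rng.normal(size=n)
    y0 = 1.0 + 0.5 * x1 + rng.normal(size=n)
    y1 = y0 + x1
    return x1, y0, y1

def regime_outcome(d, x1, y0, y1):
    """Y(d) = Y(1) d(X1) + Y(0) {1 - d(X1)} for a rule d taking values 0/1."""
    rule = d(x1).astype(float)
    return y1 * rule + y0 * (1.0 - rule)

rng = np.random.default_rng(0)
x1, y0, y1 = simulate_potential_outcomes(200_000, rng)

# Three candidate regimes (illustrative only).
regimes = {
    "always 0": lambda x: np.zeros_like(x, dtype=int),
    "always 1": lambda x: np.ones_like(x, dtype=int),
    "treat if X1 > 0": lambda x: (x > 0).astype(int),
}

# Monte Carlo approximation of E(Y(d)) for each regime.
values = {name: regime_outcome(d, x1, y0, y1).mean() for name, d in regimes.items()}
for name, value in values.items():
    print(f"{name}: {value:.3f}")
```

Under this assumed model the rule that treats only when X1 > 0 has the largest mean outcome, because it gives treatment 1 exactly to the patients who benefit from it.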

Estimating an optimal regime
Observed outcome:
$Y = Y^{(1)} I(A_1 = 1) + Y^{(0)} I(A_1 = 0) = Y^{(1)} A_1 + Y^{(0)} (1 - A_1)$
$\{Y^{(0)}, Y^{(1)}\} \perp A_1 \mid X_1$
$\Rightarrow E(Y^{(1)} \mid X_1) = E(Y^{(1)} \mid X_1, A_1 = 1) = E(Y \mid X_1, A_1 = 1)$
and similarly for $Y^{(0)}$

Thus:
$E(Y^{(d)}) = E\{ E(Y^{(d)} \mid X_1) \}$
$= E[ E(Y^{(1)} \mid X_1) d(X_1) + E(Y^{(0)} \mid X_1) \{1 - d(X_1)\} ]$
$= E[ E(Y^{(1)} \mid X_1, A_1 = 1) d(X_1) + E(Y^{(0)} \mid X_1, A_1 = 0) \{1 - d(X_1)\} ]$
$= E[ E(Y \mid X_1, A_1 = 1) d(X_1) + E(Y \mid X_1, A_1 = 0) \{1 - d(X_1)\} ]$

Recall: We wish to maximize
$E(Y^{(d)}) = E[ E(Y \mid X_1, A_1 = 1) d(X_1) + E(Y \mid X_1, A_1 = 0) \{1 - d(X_1)\} ]$
• Clearly: $E(Y^{(d)})$ is maximized by
$d^{opt}(X_1) = I\{ E(Y \mid X_1, A_1 = 1) > E(Y \mid X_1, A_1 = 0) \}$
• $E(Y \mid X_1, A_1)$ is the regression of outcome on baseline
information and treatment received
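
The step marked "Clearly" can be spelled out by regrouping the terms inside the outer expectation. The rearrangement below is a worked restatement of the expression for $E(Y^{(d)})$ given above, using no additional assumptions.

```latex
\begin{align*}
E(Y^{(d)})
  &= E\big[\, E(Y \mid X_1, A_1 = 1)\, d(X_1)
        + E(Y \mid X_1, A_1 = 0)\, \{1 - d(X_1)\} \,\big] \\
  &= E\big[\, E(Y \mid X_1, A_1 = 0)
        + d(X_1)\, \{ E(Y \mid X_1, A_1 = 1) - E(Y \mid X_1, A_1 = 0) \} \,\big]
\end{align*}
```

The first term does not involve $d$. Because $d(X_1)$ takes only the values 0 and 1, the second term is made as large as possible at each value of $X_1$ by setting $d(X_1) = 1$ when the difference in braces is positive and $d(X_1) = 0$ otherwise. Where the difference is exactly zero, either choice gives the same mean outcome.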

$d^{opt}(X_1) = I\{ E(Y \mid X_1, A_1 = 1) > E(Y \mid X_1, A_1 = 0) \}$
Suggests: Posit a regression model for $E(Y \mid X_1, A_1)$
$Q(X_1, A_1; \beta)$
• Fit the model to trial data ⇒ $Q(X_1, A_1; \hat{\beta})$
• Estimated optimal regime
$\hat{d}^{opt}(X_1) = I\{ Q(X_1, 1; \hat{\beta}) > Q(X_1, 0; \hat{\beta}) \}$

• Issue: What if the model $Q(X_1, A_1; \beta)$ is misspecified?
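
The regression-based estimator above can be sketched in a few lines. This is an illustration under stated assumptions, not the method as implemented in the slides: the linear working model with a treatment-by-covariate interaction, the simulated trial data, and the names `design`, `fit_q`, and `d_opt_hat` are all hypothetical. Here the working model happens to match the simulated truth, so the misspecification issue raised above does not arise in this example.

```python
import numpy as np

def design(x1, a1):
    """Design matrix for a hypothetical working model
    Q(X1, A1; beta) = b0 + b1*X1 + A1*(b2 + b3*X1)."""
    return np.column_stack([np.ones_like(x1), x1, a1, a1 * x1])

def fit_q(x1, a1, y):
    """Ordinary least squares estimate of beta."""
    beta_hat, *_ = np.linalg.lstsq(design(x1, a1), y, rcond=None)
    return beta_hat

def d_opt_hat(x1, beta_hat):
    """Estimated rule I{ Q(X1, 1; beta_hat) > Q(X1, 0; beta_hat) }."""
    q1 = design(x1, np.ones_like(x1)) @ beta_hat
    q0 = design(x1, np.zeros_like(x1)) @ beta_hat
    return (q1 > q0).astype(int)

# Simulated trial data (assumption: A1 randomized with probability 1/2,
# true treatment effect -0.2 + X1, so treatment 1 is better when X1 > 0.2).
rng = np.random.default_rng(1)
n = 5000
x1 = rng.normal(size=n)
a1 = rng.integers(0, 2, size=n).astype(float)
y = 1.0 + 0.5 * x1 + a1 * (-0.2 + 1.0 * x1) + rng.normal(size=n)

beta_hat = fit_q(x1, a1, y)

# Recommended treatment for three new patients with X1 = -1, 0, 1.
rule = d_opt_hat(np.array([-1.0, 0.0, 1.0]), beta_hat)
print(beta_hat, rule)
```

With this working model the estimated rule reduces to treating when the fitted interaction contrast, the third coefficient plus the fourth times X1, is positive.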

Shameless promotion: Estimation of an optimal regime within a
broad class of regimes $\mathcal{D}$, with a focus on
personalized treatment as in Cases 3 and 4, merits its own
short course
• Robustness to misspecification of models?
• Alternative approaches?
• Extension to multiple decision points?
• Etc., etc.

Personalized Medicine and Dynamic Treatment Regimes
• Half-day short course at 2015 ENAR Spring Meeting
(Sunday, March 15, morning)

Forthcoming book: Kosorok, M. R. and Moodie, E. E. M.
(2015). Adaptive Treatment Strategies in Practice: Planning
Trials and Analyzing Data for Personalized Medicine. SIAM.

Discussion
• Dynamic treatment regimes formalize clinical
decision-making and provide a framework for personalized
treatment
• A broad range of problems can be cast in terms of dynamic
treatment regimes
• SMARTs are the “gold standard” data source for
estimation of dynamic treatment regimes
• Design considerations for SMARTs? Broader adoption?
Implications for how treatments are evaluated?
• Estimation of optimal treatment regimes is a wide open
area of research

Thought Leaders
2013 MacArthur Fellow Susan Murphy and Jamie Robins

Resources
Introductory material:
• http://methodology.psu.edu/
• http://www-personal.umich.edu/~dalmiral/
• http://www.huffingtonpost.com/american-statistical-association/being-smart-about-constru_b_4963862.html
• http://impact.unc.edu/Symposium2014Agenda
Literature: See the separate list of references

Appendix
Consistency of estimators for $E(Y^{(11)})$:
$Q_i = 1 - R_i + R_i I(Z_i = 1) \pi^{-1}$
$n^{-1} \sum_{i=1}^{n} Q_i Y_i$ or $\left( \sum_{i=1}^{n} Q_i \right)^{-1} \sum_{i=1}^{n} Q_i Y_i$
$Y = (1 - R) Y^{(11)} + R I(Z = 1) Y^{(11)} + R I(Z = 2) Y^{(12)}$
Want to show: $E(QY) = E(Y^{(11)})$
• Using $R(1 - R) = 0$, $I(Z = 1) I(Z = 2) = 0$, etc.
$E(QY) = E[ Y^{(11)} \{ (1 - R) + R I(Z = 1) \pi^{-1} \} ]$
$= E[ Y^{(11)} E\{ (1 - R) + R I(Z = 1) \pi^{-1} \mid R, Y^{(11)} \} ]$
• So equivalently want to show
$E\{ (1 - R) + R I(Z = 1) \pi^{-1} \mid R, Y^{(11)} \} = 1$

$E\{ (1 - R) + R I(Z = 1) \pi^{-1} \mid R, Y^{(11)} \}$
$= E\{ (1 - R) + R I(Z = 1) \pi^{-1} \mid R = 0, Y^{(11)} \} P(R = 0 \mid Y^{(11)})$
$+ E\{ (1 - R) + R I(Z = 1) \pi^{-1} \mid R = 1, Y^{(11)} \} P(R = 1 \mid Y^{(11)})$
$= P(R = 0 \mid Y^{(11)}) + E\{ I(Z = 1) \mid R = 1, Y^{(11)} \} \pi^{-1} P(R = 1 \mid Y^{(11)})$
$= P(R = 0 \mid Y^{(11)}) + P(R = 1 \mid Y^{(11)}) = 1$
Because: By randomization, assignment to M1 $\perp Y^{(11)}$
$E\{ I(Z = 1) \mid R = 1, Y^{(11)} \} = P(Z = 1 \mid R = 1, Y^{(11)}) = P(Z = 1 \mid R = 1) = \pi$
For k = 2: Same argument, $Q = 1 - R + R I(Z = 2)(1 - \pi)^{-1}$
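
The consistency argument can be checked numerically. The following is a minimal sketch under hypothetical assumptions: the distributions of the potential outcomes, the logistic dependence of R on $Y^{(11)}$, and the value π = 0.5 are all chosen for illustration and do not come from the slides. Only the weights Q, the two estimators, and the expression for the observed outcome Y follow the Appendix.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
pi = 0.5  # P(Z = 1 | R = 1): second-stage randomization probability (assumed value)

# Hypothetical potential outcomes under the embedded regimes (11) and (12).
y11 = rng.normal(loc=2.0, scale=1.0, size=n)
y12 = y11 + rng.normal(loc=-1.0, scale=1.0, size=n)

# R is allowed to depend on Y(11); Z is randomized independently of the outcomes.
r = rng.binomial(1, 1.0 / (1.0 + np.exp(-(y11 - 2.0))))
z = np.where(rng.random(n) < pi, 1, 2)

# Observed outcome: Y = (1 - R) Y(11) + R I(Z = 1) Y(11) + R I(Z = 2) Y(12).
y = (1 - r) * y11 + r * (z == 1) * y11 + r * (z == 2) * y12

# Weights Q = 1 - R + R I(Z = 1) / pi and the two weighted estimators.
q = 1 - r + r * (z == 1) / pi
est_simple = np.mean(q * y)
est_ratio = np.sum(q * y) / np.sum(q)

# Unweighted mean among patients whose data are consistent with regime (11).
naive = y[(r == 0) | (z == 1)].mean()

truth = y11.mean()  # Monte Carlo value of E(Y(11)), known only in simulation
print(est_simple, est_ratio, naive, truth)
```

In this simulated setting both weighted estimators land close to the mean of $Y^{(11)}$, while the unweighted mean does not, because patients with R = 1 are under-represented among those whose data are consistent with the regime.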
