ates, namely dynamic survival analysis, also gained popularity in the deep learning field (Lee et al., 2019; Jarrett et al., 2019; Damera Venkata and Bhattacharyya, 2022; Maystre and Russo, 2022). As opposed to models trained to maximize the EEP likelihood, DSA models estimate the event PMF at any horizon. Thus, in theory, such a model can also provide an estimate of the cumulative failure function as required in EEP tasks, while additionally providing a decomposition of where such risk lies within the considered horizon.

In this work, we study the usage of deep learning models trained with a DSA likelihood for EEP tasks to design more advanced alarm policies. Our contributions can be summarized as follows: (i) We formalize and propose how to train and use DSA models to match EEP models' timestep-level performance on three established benchmarks. (ii) To this end, we propose survTLS, a non-trivial extension of temporal label smoothing (TLS; Yèche et al., 2023) to DSA. (iii) At the event level, we propose a simple yet novel scheme leveraging the risk localization provided by DSA models to prioritize imminent alarms, resulting in further performance improvements over EEP models.

2. Related Work

Early event prediction from EHR data As previously mentioned, early warning systems (EWS) using deep learning models have recently gained traction in the literature. Indeed, over the years, multiple large publicly available EHR databases (Johnson et al., 2016b; Pollard et al., 2018; Faltys et al., 2021b; Thoral et al., 2021) and benchmarks including EEP tasks (Harutyunyan et al., 2019; Wang et al., 2020; Reyna et al., 2020; Yèche et al., 2021; van de Water et al., 2024) were released. Using these, existing works proposed new architecture designs (Horn et al., 2020; Tomašev et al., 2019), imputation methods (Futoma et al., 2017), and more recently objective functions (Yèche et al., 2023). However, in all these works, the backbone model is trained via MLE on the cumulative failure function at the horizon of prediction. Thus, to our knowledge, our work is the first to investigate DSA models for these EEP tasks.

Survival analysis in the era of deep learning With the emergence of deep learning, SA quickly moved away from proportional linear hazard models as originally proposed by Cox (1972). A stream of work uses neural networks to parameterize the hazard function (Yousefi et al., 2017; Katzman et al., 2018); Gensheimer and Narasimhan (2019) additionally remove the proportional hazard assumption. Simultaneously, other works focus on parameterizing the PMF (Kvamme et al., 2019; Lee et al., 2018; Ren et al., 2019) while adding regularization terms to their negative log-likelihood objective. Among these works, Lee et al. (2018) went in the opposite direction to ours by fitting the PMF with MLE on the cumulative function, similar to EEP. In DSA, based on the landmarking idea (Van Houwelingen, 2007; Parast et al., 2014), advances followed a similar trend, with works parameterizing the PMF (Damera Venkata and Bhattacharyya, 2022) and fitting an MLE on the cumulative failure function (Jarrett et al., 2019; Lee et al., 2019).

Survival analysis and event classification Prior work has investigated the use of static survival analysis for event classification problems such as early detection of fraud (Zheng et al., 2019). This work, however, remains restricted to non-dynamic applications and is thus not applicable to DSA or EEP. On the other hand, Shen et al. (2023) propose a model for EEP applications trained with a vanilla DSA likelihood for neurological prognostication. However, they provide no comparison to EEP MLE, nor do they propose alarm policies tailored to the localized risk estimation.

3. Methods

In this section, we describe our DSA approach to EEP tasks, starting from the training of the timestep estimator to the design of the alarm policy. An overview of the pipeline and how it compares to EEP can be found in Figure 1.

3.1. From Early Event Prediction to Dynamic Survival Analysis

Early event prediction In EEP, we consider a dataset of multivariate time series of covariates Xi and binary event labels ei,t representing the occurrence of an event at time t in trajectory i. Each sample i is a sequence {(xi,0, ei,0), ..., (xi,Ti, ei,Ti)} of length Ti. For each timepoint t along a time series, the covariates observed up to this point are denoted Xi,t = [xi,0, ..., xi,t], and the time of the next event is given by Te(t) = min{τ ≥ t : eτ = 1}. If no event happened, we define Te(t) = +∞ and call this
Dynamic Survival Analysis for Early Event Prediction
Figure 1: Overview of the pipeline for EEP (green) and our proposed DSA (purple) approach to EEP tasks. In the common EEP pipeline, a cumulative failure function estimator Fθ is trained by MLE on the corresponding likelihood L_EEP and serves as a risk score for a threshold-based alarm policy. In our proposed DSA approach, using a training set re-organized into episodes, we fit a hazard function estimator λθ by MLE on a partial survival likelihood L^h_DSA. We obtain a unique risk score after applying a prioritization function and an aggregation function to [Fθ(1), ..., Fθ(h)].
sample "right-censored", to align with DSA terminology. The EEP task consists of modeling the cumulative probability of this event occurring within a fixed prediction horizon h, defined as F(h|Xt) = P(Te ≤ t + h | Xt). Importantly, as EEP focuses on early warning, no prediction is carried out during events. The common approach in the EEP literature is to train models that directly parameterize the cumulative failure function Fθ by minimizing the following negative log-likelihood with labels yi,t = 1[Σ_{k=t}^{t+h} ek ≥ 1]:

L_EEP = − Σ_{i=1}^{N} Σ_{t=0}^{Ti} (1 − ei,t) [ yi,t log(Fθ(h|Xi,t)) + (1 − yi,t) log(1 − Fθ(h|Xi,t)) ]   (1)

Dynamic survival analysis Conversely, DSA aims to model the precise time Te(t) until an event of interest occurs given observations up to t. Thus, when considering a discrete setting, to model all horizons, right-censored samples are flagged by a censoring indicator ci. Then the DSA negative log-likelihood for a model parameterizing the PMF fθ is defined as:

L_DSA = − Σ_{i=1}^{N} Σ_{t=0}^{Ti} [ (1 − ci) log fθ(Ti + 1 − t | Xi,t) + ci log( 1 − Σ_{k=1}^{Ti+1−t} fθ(k | Xi,t) ) ]   (2)

It is known (Kalbfleisch and Prentice, 2011) that the survival likelihood can be re-written as a binary cross-entropy over the hazard function λ(k|Xt) = P(Te = t + k | Xt, Te > t + k − 1). Thus, it is common in DSA to parameterize the hazard λθ and to minimize the following survival negative log-likelihood using binary labels yi,t,k = 1[Ti − t = k ∧ ci = 0] and sample weights wi,t,k = 1[k ≤ Ti − t]:

L_DSA = − Σ_{i=1}^{N} Σ_{t=0}^{Ti} Σ_{k=1}^{Tmax} wi,t,k [ yi,t,k log(λθ(k|Xi,t)) + (1 − yi,t,k) log(1 − λθ(k|Xi,t)) ]   (3)
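To make the hazard parameterization concrete, the following NumPy sketch computes the inner term of Eq. 3 for a single timepoint of one trajectory, together with the standard identity linking the hazard to the cumulative failure function. The function names, the per-timepoint decomposition, and the numerical epsilon are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

def hazard_nll(haz, t_i, t, c_i, t_max):
    """Inner term of the survival NLL (Eq. 3) for one timepoint t of a
    trajectory of length t_i with censoring indicator c_i.
    haz[k-1] = lambda_theta(k | X_{i,t}) for k = 1..t_max."""
    k = np.arange(1, t_max + 1)
    # y_{i,t,k} = 1 iff the (uncensored) event is exactly k steps away.
    y = ((t_i - t == k) & (c_i == 0)).astype(float)
    # w_{i,t,k} = 1 iff horizon k is observed before the sequence ends.
    w = (k <= t_i - t).astype(float)
    eps = 1e-12  # numerical guard, our own choice
    bce = -(y * np.log(haz + eps) + (1 - y) * np.log(1 - haz + eps))
    return float(np.sum(w * bce))

def cumulative_failure(haz):
    """F_theta(h | X_t) = 1 - prod_{k<=h} (1 - lambda_theta(k | X_t))."""
    return 1.0 - np.cumprod(1.0 - haz)
```

Given hazard estimates, `cumulative_failure(haz)[:h]` yields the [Fθ(1), ..., Fθ(h)] estimates that the downstream alarm policy consumes.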
From the hazard, one recovers the PMF fθ(h|Xt) = Π_{k=1}^{h−1} (1 − λθ(k|Xt)) λθ(h|Xt) and the cumulative failure function Fθ(h|Xt) = 1 − Π_{k=1}^{h} (1 − λθ(k|Xt)). Thus, from a hazard estimate, one can also perform EEP tasks.

Episode splitting To allow using a survival analysis method in these cases, the DSA framework requires unique terminal events. To address this issue, we propose to instead only predict the occurrence of the closest event, if there is one. In EEP tasks, timesteps within events are ignored in the likelihood. Thus, for a patient stay i experiencing v events at times t_{e1}, ..., t_{ev}, respectively ending at times s_{e1}, ..., s_{ev}, we propose instead to consider distinct episodes [Xi,0, ..., Xi,t_{e1}−1], [Xi,0, ..., Xi,t_{e2}−1], ..., [Xi,0, ..., Xi,Ti], associated with their respective labels [yi,0, ..., yi,t_{e1}−1], [yi,s_{e1}, ..., yi,t_{e2}−1], ..., [yi,s_{ev}, ..., yi,Ti]. It is important to note that for any episode beyond the first one, indexed by k, we provide the history of measurements between 0 and s_{e_{k−1}} to preserve the signal from previous occurrences. This procedure ensures that, in each sample, if not censored, the event occurrence is unique and at the end of the sequence, as in DSA.

Figure 2: Illustration of survTLS for three non-censored samples. (Top) Ground-truth PMF; (Bottom) survTLS-smoothed PMF. The further the event, the higher the entropy of the mass function around the ground-truth time-to-event.

3.2. Bridging the gap on timestep-level performance

In practice, as reported by Yèche et al. (2023), we show that fitting hazard deep learning models to a DSA likelihood on EEP data is unstable and underperforms on timestep metrics (see Figure 5 and Table 1). However, we find we can overcome this issue with two simple modifications to the training. First, we find that the instability is due to extreme imbalance compared to the EEP likelihood and propose a specific logit bias initialization to handle it. Second, to focus on the horizon of prediction used at inference, we propose to match the EEP likelihood and truncate the DSA likelihood at the horizon of prediction h. Finally, we propose survTLS, our extension of TLS to survival analysis, which further improves performance over DSA MLE for EEP.

Bias initialization As the resolution is high in ICU data and horizons of prediction are relatively short, associated tasks tend to already be imbalanced. Unfortunately, as shown in Eq. 3, when fitting a hazard model, each positive timestep from the EEP task yields Ti − t − 1 negative hazard labels for a single positive one. Similarly, each negative timestep is associated with Ti − t negative labels. Hence, the prevalence from the EEP task is divided by a factor T̄ corresponding to the average sequence length. This becomes extreme in EHR data, where sequences have thousands of steps. We found this to prevent convergence in certain cases.

To overcome this issue, inspired by Karpathy (2019), we propose to initialize the bias of the logit layer b = [b1, ..., b_{Tmax}] such that the output probability 1/(1 + e^{−bk}) = ỹ(k), with ỹ(k) the average hazard label value for horizon k. Thus, we initialize the bias as follows:

b_k = log( ỹ(k) / (1 − ỹ(k)) ),  ∀k ≤ Tmax

Survival likelihood truncating As shown in Table 1, correct initialization of the biases already allows DSA models to be trainable on ICU data; however, they still lag behind their EEP counterparts. As motivated by Yèche et al. (2023), further events are generally harder to predict due to their lower signal. We also observe sequences to be longer in EEP datasets than in DSA ones. Indeed, PCB2 and AIDS, two commonly used datasets in DSA, have median lengths of 3 and 5 (Maystre and Russo, 2022), whereas MIMIC-III and HiRID have median sequence lengths of 50 and 275. Thus, when fitting a DSA likelihood, a model is trained to model event occurrences possibly much further than the fixed horizon of prediction h used in EEP. As empirically validated in Figure 5, we believe such events dominate the loss.
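The bias initialization described above can be sketched in a few lines. The clipping epsilon is our own safeguard for horizons with no positive labels and is not part of the paper's formula.

```python
import numpy as np

def init_logit_bias(avg_hazard_label, eps=1e-8):
    """b_k = log(y(k) / (1 - y(k))), so that sigmoid(b_k) equals the
    average hazard label y(k) at horizon k before any training.
    eps clips horizons whose average label is exactly 0 or 1 (ours)."""
    y = np.clip(np.asarray(avg_hazard_label, dtype=float), eps, 1 - eps)
    return np.log(y / (1 - y))
```

Applying the logistic sigmoid to these biases recovers the empirical per-horizon prevalence, so the untrained model starts calibrated instead of predicting roughly 0.5 everywhere. Truncating the survival likelihood then simply amounts to restricting Eq. 3 to horizons k ≤ h.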
Figure 4: Visualization of the q_exp exponential decay function used for prioritization of survival risk scores. The plot shows h_max = h = 144 and different γ for the convex version, as well as a concave version, which we did not find to work better. (Axes: priority weight vs. survival horizon in steps.)

Figure 5: Ablation on the maximum considered horizon in L_DSA for the ventilation task. By not training the DSA model further than h, the model's estimation of Fθ(h) improves, leading to a better timestep AuPRC. (Axes: time-step AuPRC vs. max horizon in L_DSA, in hours.)
Evaluation At a time-step level, to evaluate the goodness of the cumulative failure function estimate Fθ across models, we follow a common approach for highly imbalanced tasks and report the area under the precision-recall curve (AuPRC).

Event Recall Proposed by Tomašev et al. (2019), given binary timestep predictions a_t (for alarms), event recall R_event corresponds to the true positive rate over event predictions. An event E at time t_E is detected if any a_t is positive between t_E − h and t_E − 1. Hence the following definition:

R_event = ( Σ_E 1[ Σ_{t=t_E−h}^{t_E−1} a_t > 0 ] ) / #Events

Alarm Precision Proposed by Hyland et al. (2020), given binary timestep predictions a_t (for alarms), alarm precision P_alarm is the proportion of raised alarms located within h of an event. It is defined as follows:

P_alarm = ( Σ_E Σ_{t=t_E−h}^{t_E−1} a_t ) / ( Σ_t a_t )

Note that, considering a single event, there can be multiple true alarms for it as long as they fall within the prediction horizon.

Table 1: Time-Step AuPRC. "EEP" and "Survival" refer to training with the respective vanilla likelihoods. Time-step level metrics are useful to assess a machine learning predictor's generalization performance on the trained task but do not explicitly quantify performance on clinically meaningful entities such as alarms and events.

                    HiRID Circ.    HiRID Vent.    MIMIC Decomp.
    EEP             39.0 ± 0.4     34.3 ± 0.3     37.1 ± 0.6
    + TLS           40.5 ± 0.4     34.9 ± 0.4     37.2 ± 0.3
    Survival        37.4 ± 2.0     21.5 ± 7.5     N.A.
    + Bias Init.    39.2 ± 0.3     31.3 ± 0.8     36.2 ± 0.4
    + Limit Horizon 40.7 ± 0.2     34.5 ± 0.4     37.4 ± 0.6
    + survTLS       40.7 ± 0.2     34.9 ± 0.3     38.4 ± 0.2
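The two event-level metrics above can be sketched as follows: a minimal NumPy version for a single trajectory, with event times and the horizon h expressed in timesteps. The function and variable names are our own, not the benchmark implementation's.

```python
import numpy as np

def event_recall(alarms, event_times, h):
    """Fraction of events with at least one alarm in [t_E - h, t_E - 1]."""
    alarms = np.asarray(alarms, dtype=bool)
    detected = sum(alarms[max(0, t_e - h): t_e].any() for t_e in event_times)
    return detected / len(event_times)

def alarm_precision(alarms, event_times, h):
    """Proportion of raised alarms falling within h steps before an event."""
    alarms = np.asarray(alarms, dtype=bool)
    true_window = np.zeros_like(alarms)
    for t_e in event_times:
        # Every alarm inside an event's detection window counts as true.
        true_window[max(0, t_e - h): t_e] = True
    n_alarms = alarms.sum()
    return (alarms & true_window).sum() / n_alarms if n_alarms else 0.0
```

Because the detection windows of consecutive events may overlap, a single alarm can simultaneously count toward several events, consistent with the note above.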
In our work, the first event metric we report is the Event Recall @ Alarm Precision for high precision thresholds corresponding to clinically applicable regions.

Distance to Event Because event recall does not capture the earliness of the alarm, Hyland et al. (2020) also propose to report the distance of the first alarm to the event, D_fa. Formally, for an event E at time t_E, this is defined as:

D_fa = t_E − min{ t : a_t = 1, t_E − h ≤ t ≤ t_E − 1 }

Table 2: Area under the Alarm Precision / Event Recall Curve (Hyland et al., 2020). Different from time-step metrics (Table 1), this metric evaluates the output of the alarm policy used over the model's time-step risk estimates. "Survival" refers to training with bias initialization and the truncated survival likelihood L^h_DSA. We observe that an alarm policy prioritizing imminent events can notably improve event-level performance.
Table 3: Event Recall and Mean Distance (in hours) of the first alarm to the event start on MIMIC-III Decompensation at fixed alarm precisions of 60%, 70%, and 80%. In Table 2, we provide area-under-the-curve results to assess event-level performance across operating thresholds τ. Here, to mimic deployment scenarios, we use a high-precision constraint to fix the threshold on the risk score for the alarm policy. Our method provides noticeable improvements in event recall while maintaining a similar detection distance to the event.
Figure 6: Alarm Precision (Hyland et al., 2020) per Event Recall on HiRID Ventilation. We show that our proposed DSA-based approach, paired with a prioritizing alarm policy, maintains a higher alarm precision in higher event recall regions.

Figure 7: Mean distance of the first alarm to the event per Event Recall on HiRID Ventilation. It highlights that the improved event recall and alarm precision come at very small to no cost in terms of detection distance to the event.
7. Conclusion
In this work, we investigate the usage of DSA models for EEP tasks, motivated by the additional localization of the risk they provide. We show that, even though they are more challenging to train, DSA models can be competitive at a time-step level with careful initialization and partial survival likelihood fitting. Further, we show that our simple prioritization scheme for alarms allows DSA models to outperform their EEP counterparts even further. Our proposed prioritization transformation is a first step towards tailored alarm policies. Future work remains to further leverage risk localization to design more sophisticated alarm policies and to provide a richer output to clinicians.

In the past, many application-oriented works have chosen classical EEP modeling approaches for their studies due to their higher stability and ease of training. We hope our analysis paves the way for DSA, a more comprehensive modeling approach, to become the prominent choice.
Michael F Gensheimer and Balasubramanian Narasimhan. A scalable discrete-time survival model for neural networks. PeerJ, 7:e6257, 2019.

Hrayr Harutyunyan, Hrant Khachatrian, David C Kale, Greg Ver Steeg, and Aram Galstyan. Multitask learning and benchmarking with clinical time series data. Scientific Data, 6(1):1–18, 2019.

Max Horn, Michael Moor, Christian Bock, Bastian Rieck, and Karsten Borgwardt. Set functions for time series. In International Conference on Machine Learning, pages 4353–4363. PMLR, 2020.

Andrej Karpathy. A recipe for training neural networks. Personal blog, 2019. URL https://karpathy.github.io/2019/04/25/recipe/.

Jared L Katzman, Uri Shaham, Alexander Cloninger, Jonathan Bates, Tingting Jiang, and Yuval Kluger. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Medical Research Methodology, 18(1):1–12, 2018.

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2017.
Håvard Kvamme, Ørnulf Borgan, and Ida Scheel. Time-to-event prediction with neural networks and Cox regression. arXiv preprint arXiv:1907.00825, 2019.

Changhee Lee, William Zame, Jinsung Yoon, and Mihaela Van Der Schaar. DeepHit: A deep learning approach to survival analysis with competing risks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.

Changhee Lee, Jinsung Yoon, and Mihaela Van Der Schaar. Dynamic-DeepHit: A deep learning approach for dynamic survival analysis with competing risks based on longitudinal data. IEEE Transactions on Biomedical Engineering, 67(1):122–133, 2019.

Xinrui Lyu, Bowen Fan, Matthias Hüser, Philip Hartout, Thomas Gumbsch, Martin Faltys, Tobias M. Merz, Gunnar Rätsch, and Karsten Borgwardt. An empirical study on KDIGO-defined acute kidney injury prediction in the intensive care unit. medRxiv, 2024. doi: 10.1101/2024.02.01.24302063. URL https://www.medrxiv.org/content/early/2024/02/03/2024.02.01.24302063.

Lucas Maystre and Daniel Russo. Temporally-consistent survival analysis. Advances in Neural Information Processing Systems, 35:10671–10683, 2022.

Michael Moor, Nicolas Bennett, Drago Plečko, Max Horn, Bastian Rieck, Nicolai Meinshausen, Peter Bühlmann, and Karsten Borgwardt. Predicting sepsis using deep learning across international sites: a retrospective development and validation study. EClinicalMedicine, 62:102124, August 2023.

Layla Parast, Lu Tian, and Tianxi Cai. Landmark estimation of survival and treatment effect in a randomized clinical trial. Journal of the American Statistical Association, 109(505):384–394, January 2014.

Tom J Pollard, Alistair EW Johnson, Jesse D Raffa, Leo A Celi, Roger G Mark, and Omar Badawi. The eICU collaborative research database, a freely available multi-center database for critical care research. Scientific Data, 5(1):1–13, 2018.

Kan Ren, Jiarui Qin, Lei Zheng, Zhengyu Yang, Weinan Zhang, Lin Qiu, and Yong Yu. Deep recurrent survival analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4798–4805, 2019.

Matthew A Reyna, Christopher S Josef, Russell Jeter, Supreeth P Shashikumar, M Brandon Westover, Shamim Nemati, Gari D Clifford, and Ashish Sharma. Early prediction of sepsis from clinical data: the PhysioNet/Computing in Cardiology Challenge 2019. Critical Care Medicine, 48(2):210–217, 2020.

Xiaobin Shen, Jonathan Elmer, and George H. Chen. Neurological prognostication of post-cardiac-arrest coma patients using EEG data: A dynamic survival analysis framework with competing risks. In Kaivalya Deshpande, Madalina Fiterau, Shalmali Joshi, Zachary Lipton, Rajesh Ranganath, Iñigo Urteaga, and Serene Yeung, editors, Proceedings of the 8th Machine Learning for Healthcare Conference, volume 219 of Proceedings of Machine Learning Research, pages 667–690. PMLR, 11–12 Aug 2023. URL https://proceedings.mlr.press/v219/shen23a.html.

Reed T Sutton, David Pincock, Daniel C Baumgart, Daniel C Sadowski, Richard N Fedorak, and Karen I Kroeker. An overview of clinical decision support systems: benefits, risks, and strategies for success. NPJ Digital Medicine, 3(1):17, 2020.

Patrick J Thoral, Jan M Peppink, Ronald H Driessen, Eric JG Sijbrands, Erwin JO Kompanje, Lewis Kaplan, Heatherlee Bailey, Jozef Kesecioglu, Maurizio Cecconi, Matthew Churpek, et al. Sharing ICU patient data responsibly under the Society of Critical Care Medicine/European Society of Intensive Care Medicine joint data science collaboration: the Amsterdam University Medical Centers Database (AmsterdamUMCdb) example. Critical Care Medicine, 49(6):e563, 2021.

Nenad Tomašev, Xavier Glorot, Jack W Rae, Michal Zielinski, Harry Askham, Andre Saraiva, Anne Mottram, Clemens Meyer, Suman Ravuri, Ivan Protsyuk, et al. A clinically applicable approach to continuous prediction of future acute kidney injury. Nature, 572(7767):116–119, 2019.

Gerhard Tutz, Matthias Schmid, et al. Modeling discrete time-to-event data. Springer, 2016.

Robin van de Water, Hendrik Schmidt, Paul Elbers, Patrick Thoral, Bert Arnrich, and Patrick Rockenschaub. Yet another ICU benchmark: A flexible multi-center framework for clinical ML. In The
Table 4: Label and event prevalence statistics computed on the training set for all tasks.
Circulatory failure Online binary prediction of future circulatory failure events on the HiRID (Faltys et al., 2021b) dataset, as defined by Yèche et al. (2021), every 5 minutes. The benchmarked prediction horizon for the EEP models and the fixed-horizon survival model is set to 12 hours.
Ventilation Online binary prediction of future ventilator usage on the HiRID (Faltys et al., 2021b) dataset
every 5 minutes. The ventilation status is extracted from the data as defined by Yèche et al. (2021) and the
prediction horizon is 12 hours.
Decompensation Online binary prediction of patient mortality as defined by Harutyunyan et al. (2019).
A label is positive if the patient dies within the horizon. Benchmarked and evaluated on MIMIC-III (Johnson
et al., 2016b) at a 24-hour horizon.
A.2. Implementation
Training details. For all models, we set the batch size to 64 and the learning rate to 1e−4 using the Adam optimizer (Kingma and Ba, 2017). We early-stop each model's training based on its validation loss when no improvement is made for 10 epochs.
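The early-stopping rule described above can be sketched framework-free; the helper class below is a hypothetical illustration of the "no validation improvement for 10 epochs" criterion, not the released training code.

```python
class EarlyStopper:
    """Stop training when validation loss has not improved for
    `patience` consecutive epochs."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Call once per epoch; returns True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, one would break out of the epoch loop as soon as `step(val_loss)` returns True and restore the checkpoint with the best validation loss.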
Libraries. An exhaustive list of libraries and their version we used is in the environment.yml file from
the code release.
Infrastructure. We trained all models on a single NVIDIA RTX2080Ti with 8 Xeon E5-2630v4 cores and
64GB of memory. Individual seed training took between 3 to 10 hours for each run.
Timestep modeling hyperparameters We used the same architecture and shared hyperparameters for
both types of likelihood training. These were selected based on the validation performance of EEP likelihood
training. It is possible further improvement can be achieved by selecting different hyperparameters for our
DSA approach. However, we prefer to be conservative to ensure a fair comparison. Exact parameters are
reported in Table 6, Table 7, Table 8.
Temporal label smoothing hyperparameters Similarly, we selected hyperparameters for TLS and survTLS on validation-set timestep AuPRC. We found similar hyperparameters to the original paper for TLS, with h_max = 2h, h_min = 0, and γ_circ = 0.2, γ_vent = 0.1, γ_decomp = 0.05. For survTLS, we found lengthscale parameters l_circ = 10, l_vent = 50, and l_decomp = 8. We plot the labels and weights obtained from smoothing the ground-truth event PMF f, corresponding to the smooth hazard function λS and survival function SS, in Figure 8 and Figure 9.
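As a sketch of how such smoothed targets can be built: the paper's exact smoothing kernel is not reproduced here, so the code below assumes a Gaussian of the given lengthscale l, truncated to horizons 1..T_max and renormalized, and then derives the smooth hazard labels λS and survival weights SS illustrated in Figures 8 and 9. All names are ours.

```python
import numpy as np

def survtls_targets(t_event, t_max, lengthscale):
    """Soft survival targets from a smoothed event PMF (a sketch under
    a Gaussian-kernel assumption, not necessarily the paper's kernel)."""
    k = np.arange(1, t_max + 1)
    f_s = np.exp(-0.5 * ((k - t_event) / lengthscale) ** 2)
    f_s /= f_s.sum()                     # smoothed PMF f_S over horizons
    surv = 1.0 - np.cumsum(f_s)          # S_S(k) = P(T_e > k)
    surv_prev = np.concatenate(([1.0], surv[:-1]))
    # lambda_S(k) = f_S(k) / S_S(k-1); guard near-zero survival tails.
    lam = np.divide(f_s, surv_prev, out=np.ones_like(f_s),
                    where=surv_prev > 1e-12)
    return lam, surv
```

By construction, the smoothed hazard and survival stay mutually consistent, i.e. SS(k) = Π_{j≤k}(1 − λS(j)), so they can directly replace the hard labels and weights of Eq. 3.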
Alarm policy hyperparameters We find that a short one-step silencing performs best (5 minutes on HiRID and 1 hour on MIMIC-III) based on validation-set performance for the EEP model, and we keep it constant across all experiments, including for the survival models.

For the prioritized alarm policy, we tune h_max, γ, and the q_exp function type for each task on the validation set. Chosen values are shown in Table 5.
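To illustrate the prioritization-then-aggregation step of Figure 1: the sketch below collapses [Fθ(1), ..., Fθ(h)] into a single alarm score with a q_exp-style exponential decay. Both the exact decay parameterization and the max aggregation are our assumptions, not the paper's precise choices.

```python
import numpy as np

def prioritized_risk(cum_failure, gamma):
    """Collapse [F_theta(1), ..., F_theta(h)] into one alarm score,
    up-weighting imminent horizons with an exponential decay (a sketch;
    the decay family and max aggregation are illustrative assumptions)."""
    f = np.asarray(cum_failure, dtype=float)
    k = np.arange(1, len(f) + 1)
    weights = np.exp(-gamma * (k - 1))   # w(1) = 1, decaying with horizon
    return float(np.max(weights * f))
```

With γ = 0, all weights equal 1 and, since Fθ is non-decreasing in the horizon, the score reduces to the standard EEP risk Fθ(h); larger γ shifts the score toward imminent risk, which is exactly the behavior the prioritized policy exploits.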
Figure 8: Illustration of the smooth hazard function λS serving as labels in survTLS.
Figure 9: Illustration of the smooth survival function SS serving as weights in survTLS.
Table 5: Hyperparameter search range for prioritized alarm policies on top of DSA models. In bold are
parameters selected by grid search.
Table 6: Hyperparameter search range for circulatory failure. In bold are parameters selected by grid search.

Hyperparameter Values

Table 7: Hyperparameter search range for mechanical ventilation. In bold are parameters selected by grid search.

Hyperparameter Values

Table 8: Hyperparameter search range for decompensation. In bold are parameters selected by grid search.

Hyperparameter Values
Event Performance Curves In Figure 10 we show alarm precision and mean distance to event curves
plotted over levels of event recall (sensitivity of the alarm policy conditioned on a risk predictor).
[Figure 10 plots, per panel: Event PR Curve (Alarm Precision vs. Event Recall) and Mean Distance to Event (hours) vs. Event Recall, for EEP, TLS, Survival (ours), SurvTLS (ours), and Imminent prio. (ours).
(a) Decompensation on MIMIC-III — Event PR AuC: EEP 67.34, TLS 68.37, Survival (ours) 69.92, SurvTLS (ours) 70.89, Imminent prio. (ours) 71.28.
(b) Circulatory Failure on HiRID — Event PR AuC: EEP 66.03, TLS 76.37, Survival (ours) 75.17, SurvTLS (ours) 77.53, Imminent prio. (ours) 79.48.]
Figure 10: Event-based performance of the alarm policy under different models. We show the event-based alarm precision (Hyland et al., 2020) and also plot the mean distance of the first alarm to the event at each event detection sensitivity level.