
Early warning via transitions in latent stochastic dynamical systems

Lingyu Feng^{a,b,c}, Ting Gao^{a,b,c,∗}, Wang Xiao^{a,b,c} and Jinqiao Duan^{d,e,c,†}

^{a} School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan 430074, China
^{b} Center for Mathematical Science, Huazhong University of Science and Technology, Wuhan 430074, China
^{c} Steklov-Wuhan Institute for Mathematical Exploration, Huazhong University of Science and Technology, Wuhan 430074, China
^{d} Department of Mathematics and Department of Physics, Great Bay University, Dongguan, Guangdong 523000, China
^{e} Dongguan Key Laboratory for Data Science and Intelligent Medicine, Dongguan, Guangdong 523000, China

arXiv:2309.03842v1 [stat.ML] 7 Sep 2023

September 8, 2023

Abstract
Early warnings for dynamical transitions in complex systems or high-dimensional observation
data are essential in many real-world applications, such as gene mutation, brain diseases, natural
disasters, financial crises, and engineering reliability. To effectively extract early warning signals,
we develop a novel approach: the directed anisotropic diffusion map, which captures the latent
evolutionary dynamics on a low-dimensional manifold. Applying the methodology to authentic
electroencephalogram (EEG) data, we find the appropriate effective coordinates and derive early
warning signals capable of detecting the tipping point during the state transition. Our method
bridges the latent dynamics with the original dataset. Numerical experiments validate the accuracy
and effectiveness of the framework in terms of density and transition probability. We show that,
across various evaluation metrics, the second coordinate holds meaningful information for the
critical transition.

Keywords: Directed anisotropic diffusion map; Latent stochastic dynamical system; Transition
phenomena; Meta-stable states; Early warning indicator

1 Introduction
High-dimensional evolutionary processes exist in many real-world systems, such as time-varying
patterns of protein folding and unfolding, information flow over wireless communication, climate
change, portfolio evolution in financial markets, and artificial-intelligence-assisted disease diagnosis [1–4].
Among these, one particularly interesting problem is the study of complex brain activities over time,
which also poses challenges in terms of limited high-quality labelled data, complex graph structure,
and multi-scale neuron models at different levels. Problems arise in modelling brain dynamics in
high-dimensional space due to the complexity of the human brain. Most multi-scale data-driven
modelling approaches build surrogate models for brain activities. Electroencephalogram (EEG)
signals are collected by dozens or hundreds of electrodes placed on the scalp. Considering the
low-dimensional representation of observations in electrophysiological experiments, neuroscientists
have been focusing on investigating the latent dynamics of the brain via various dimension reduction
techniques. Moreover, dimension reduction techniques have been shown to significantly enhance
performance by removing multi-collinearity and avoiding the curse of dimensionality.
∗ Corresponding author Ting Gao: tgao0716@hust.edu.cn; tinggao0716@gmail.com
† Jinqiao Duan: duan@gbu.edu.cn; duanjq@gmail.com

One methodology for learning low-dimensional latent dynamics comes from the variational auto-encoder
(VAE) framework. For example, the ODE2VAE framework learns the embedding of high-dimensional
trajectories and infers arbitrarily complex continuous-time latent dynamics [5]. The VAE2SDE model
presents a method for learning latent stochastic differential equations from high-dimensional time
series data [6]. Considering brain data specifically, Bi et al. propose an innovative semi-supervised
VAE approach to explore the low-dimensional representations of event-related potentials and
demonstrate that these latent variables exhibit great potential in the development of brain-controlled
vehicles [7]. Li et al. employ a VAE to extract latent variables from multichannel EEG signals and
highlight that emotion recognition can be accomplished with exceptional performance based on the
learnt latent variables [8]. Guo et al. utilize a VAE to investigate the latent variables from multichannel
electroencephalogram signals associated with acupuncture stimulation [9]. Furthermore, the analysis
of low-dimensional dynamics derived from video-evoked cortical responses shows promising results
in predicting the corresponding response with remarkable accuracy via a VAE [10]. Generative-discriminative
representations are learned from a lower-dimensional latent space by deep learning in [11].
An inherent benefit of the VAE framework is its ability to reduce dimensionality while simultaneously
learning latent dynamics or even predicting future behaviors. Its disadvantages, however, include
high computational cost, inaccurate data reconstruction, and weak interpretability of the effective
coordinates of the reduced systems. Therefore, other manifold learning methods, which preserve the
intrinsic structure of the original data, have been extensively investigated to represent datasets in a
low-dimensional embedding space [12–15]. Specifically, the locally linear embedding algorithm computes
low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs [16]. Multiple studies
consistently demonstrate that nonlinear manifold learning algorithms outperform linear ones. Among
several effective nonlinear manifold learning methods, diffusion maps are widely used to analyze
dynamical systems [17] and stochastic dynamical systems [18] in low-dimensional latent spaces,
determine order parameters for chain dynamics [19], identify persons from electrocardiogram
recordings [20], and so on. Also, the dynamics of brain activity from task-dependent functional
magnetic resonance imaging data are modelled by a three-step framework including diffusion maps
in [21]. Besides, double diffusion maps on the latent space allow the approximation of reduced
dynamical models [22]. Grassmannian diffusion maps create the transition matrix of a random walk
on a graph connecting points on the Grassmann manifold [23]. Further, the directed anisotropic
diffusion map improves upon isotropic diffusion by incorporating local information: it considers the
original data structure and adapts the diffusion process accordingly. Inspired by this, and to efficiently
distinguish between epilepsy-related data and pre-ictal data in the latent space, we design a drift
function in the directed diffusion to construct an anisotropic diffusion map.
After dimension reduction through manifold learning, the latent dynamics can be learnt in either
deterministic or stochastic ways. For example, Talmon et al. study the inference of latent intrinsic
variables of dynamical systems from observed output signals [24]. The VAMP framework encompasses
the complete transformation from molecular coordinates to Markov states, consolidating the entire
data processing pipeline into a unified end-to-end framework [1]. Also, for partially observed data,
Ouala et al. introduce a framework based on the data-driven identification of an augmented state-space
model using a neural-network-based representation [25]. On the other hand, since stochastic
differential equations are commonly used to model nonlinear systems that involve uncertainty or
randomness [26], it is of great interest to learn latent stochastic differential equations from real
high-dimensional data. To gain deeper insights into the latent characteristics, researchers frequently
investigate stochastic dynamical systems, which depend on the geometry and distribution of the latent
space. For instance, Duncker et al. develop an approach to learn an interpretable semi-parametric
model of a latent continuous-time stochastic dynamical system [27]. In [28], the authors combine
diffusion maps and the Kramers–Moyal expansion to construct a reduced, data-driven, parameter-dependent
effective stochastic differential equation for electric-field-mediated colloidal crystallization, using data
obtained from Brownian dynamics simulations. Besides, in the general case where the noise is non-Gaussian
Lévy motion, there are various methods to identify the stochastic dynamical systems evolving in the
latent space over time [29–31], as well as the corresponding transition dynamics [32, 33].
In real-world applications such as complex disease diagnosis, early warning detection has long
attracted a great deal of attention. For instance, epilepsy is a neurological disease characterized by
recurrent and unpredictable seizures, which are sudden bursts of abnormal brain activity. According
to the World Health Organization, epilepsy affects around 50 million people worldwide, making it one
of the most prevalent neural diseases. However, the mechanisms governing seizure activity are not well
understood, and many patients suffer greatly. Early detection of and intervention in epileptic seizures
can help patients receive timely and appropriate treatment, which might prevent or minimize the negative
impacts of seizures on their health and quality of life. Liu et al. present a model-free computational
method to detect critical transitions with strong noise by dynamical network markers [34]. Huang
et al. propose an information-metric-based manifold learning algorithm to extract early warnings for
financial markets [35]. Bury et al. develop a deep learning algorithm that provides early warning
signals by exploiting information about dynamics near tipping points [36]. In this paper, we therefore
propose to learn early warning signals from the latent stochastic dynamical system by introducing the
transition probability between meta-stable states in the latent space.
In summary, the goal of the present study is to model and analyze the brain activity data from
epileptic patients to identify early warnings of epileptic seizures, with stochastic dynamical systems
tools. Our main contributions are
• Find the appropriate low-dimensional coordinates from time-evolutionary high-dimensional data;
• Explore effective early warning indicators for transitions from one meta-stable state to another;
• Learn the latent stochastic differential equation and dynamics on a low-dimensional manifold;
• Apply the framework to brain activity data to detect epileptic seizure signals at an early time, and
verify their consistency with the transition signals from the latent stochastic system.
The remainder of this paper is structured as follows. In Section 2, we first design a drift function
to obtain a better low-dimensional representation via the directed anisotropic diffusion map. Following
that, we extract the latent stochastic dynamical system from the high-dimensional dataset and establish
specific early warning indicators. In Section 3, we validate our framework by numerical experiments
on real EEG data. Finally, we summarize our results together with some conclusions in Section 4.

2 Methodology
In this section, we introduce the main procedures in our workflow of exploring early warning signals
from latent dynamical systems (Fig. 1).

Figure 1: A schematic diagram of our framework: EEG data preprocessing, dimension reduction to a
low-dimensional latent space via the directed anisotropic diffusion map, modelling latent stochastic
dynamical systems with neural networks, and extracting early warning signals.

2.1 Diffusion Maps


The core idea of the diffusion map is a time-dependent diffusion process, i.e., a random walk on the
dataset where each hop has an associated transition probability. As the diffusion process runs, we
can calculate, at each step, a distance over the underlying geometric structure. To evaluate the
connectivity strength between two data points, or the probability of jumping from x to y in one step
of the random walk, we need to introduce a diffusion kernel.

2.1.1 Isotropic Diffusion Map
Let X be the dataset. Define a kernel k on X satisfying:
(1) k is symmetric, i.e., k(x, y) = k(y, x);
(2) k is positivity preserving, i.e., k(x, y) ≥ 0.
This kernel captures how closely the points in X are related to each other. It characterizes the
connection between pairs of points within the symmetric graph, where the weight is determined by the
kernel function k. The transition probabilities defined by the weight function on the graph directly
reflect the local geometry. The technique for obtaining the latent diffusion space is known as the
normalized graph Laplacian construction [37]. Let 𝓜 be a compact smooth manifold and M = Φ(𝓜),
where Φ : 𝓜 → ℝ^d denotes an isometric embedding into d-dimensional Euclidean space. Assume that
the dataset X is the entire manifold and q(x) represents the density of the points on M.

2.1.2 Directed Anisotropic Diffusion Map


The isotropic diffusion generated by a rotation-invariant kernel treats all directions equally,
meaning that all variables in the dataset play the same role. However, some variables might be more
important than others given prior knowledge of the dataset. Anisotropic diffusion is sometimes more
useful, as it locally separates the variables and provides early warnings.
The function f defined on the dataset can be obtained from prior knowledge. To design a diffusion
process adapted to the dataset, consider the directed kernel

$$k_\varepsilon(x, y) = \exp\left( -\frac{\| x - y \|^2}{\varepsilon} - \frac{\| \langle \nabla f, x - y \rangle \|^2}{\varepsilon^2} \right), \qquad (1)$$
where ∥ · ∥ denotes the Euclidean norm. The directed kernel promotes the formation of connections
among points within the same class while creating greater separation between different classes.
We adopt directed diffusion to show that the construction of the diffusion kernel can be data-driven.
To balance the influence of the prior knowledge on the latent dynamics, we introduce an anisotropic
diffusion process with a parameter α, which specifies the amount of influence of the distribution of the
data points. The crucial point is that the graph Laplacian normalization is not applied to a graph
with isotropic weights, but rather to a renormalized graph [13]. In fact, when α = 0, the diffusion
reduces to that of the classical normalized graph Laplacian.
There are two steps to construct the directed anisotropic kernel: first, renormalize the directed
weight into an anisotropic kernel; second, compute the normalized graph Laplacian diffusion from
this new graph. Integrating the kernel kε over the second variable, the density qε is represented by

$$q_\varepsilon(x) = \int_X k_\varepsilon(x, y)\, q(y)\, dy,$$

and form a new kernel

$$k_{\varepsilon,\alpha}(x, y) = \frac{k_\varepsilon(x, y)}{q_\varepsilon^\alpha(x)\, q_\varepsilon^\alpha(y)}.$$

Apply the weighted graph Laplacian normalization to the new kernel by setting

$$d_{\varepsilon,\alpha}(x) = \int_X k_{\varepsilon,\alpha}(x, y)\, q(y)\, dy, \qquad (2)$$

and defining

$$p_{\varepsilon,\alpha}(x, y) = \frac{k_{\varepsilon,\alpha}(x, y)}{d_{\varepsilon,\alpha}(x)}. \qquad (3)$$
The directed operator Pε,α is defined by

$$P_{\varepsilon,\alpha} f(x) = \int_X p_{\varepsilon,\alpha}(x, y)\, f(y)\, q(y)\, dy. \qquad (4)$$

The normalization removes part of the influence of the density and constructs the latent geometry of
the dataset. Dealing with finite data points, we approximate the integrals by finite sums and describe
the latent reduced-order space via spectral analysis. For most applications, the relevant bounded part
of the spectrum consists of a finite number of points; here, the first d eigenvalues and corresponding
eigenvectors capture the main features. The directed anisotropic diffusion map is given by
Φ(x) = [λ1 ϕ1 (x), . . . , λd ϕd (x)]^T, where {λi}, i = 1, . . . , d, are eigenvalues with λ1 ≥ λ2 ≥ · · ·
and {ϕi} are the corresponding eigenvectors.
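To make the construction concrete, a minimal NumPy sketch of the directed anisotropic diffusion map is given below. This is our own illustrative implementation, not code from the paper; the function name, the dense eigendecomposition, and the convention of skipping the trivial constant eigenvector are assumptions.

```python
import numpy as np

def directed_anisotropic_diffusion_map(X, grad_f, eps=1.0, alpha=1.0, d=2):
    """Sketch of the directed anisotropic diffusion map (Eqs. (1)-(4)).

    X:      (n, D) array of data points.
    grad_f: (n, D) array, the drift/prior-knowledge term at each point.
    """
    diff = X[:, None, :] - X[None, :, :]                    # pairwise differences x - y
    sq_dist = (diff ** 2).sum(axis=-1)
    directed = np.einsum('ik,ijk->ij', grad_f, diff) ** 2   # <grad f, x - y>^2
    K = np.exp(-sq_dist / eps - directed / eps ** 2)        # directed kernel, Eq. (1)

    q = K.sum(axis=1)                                       # density estimate q_eps
    K_alpha = K / np.outer(q ** alpha, q ** alpha)          # renormalized kernel
    P = K_alpha / K_alpha.sum(axis=1, keepdims=True)        # row-stochastic matrix, Eq. (3)

    evals, evecs = np.linalg.eig(P)
    order = np.argsort(-evals.real)
    evals, evecs = evals.real[order], evecs.real[:, order]
    # skip the trivial pair (eigenvalue 1, constant eigenvector); keep the next d
    return evals[1:d + 1] * evecs[:, 1:d + 1]               # n x d embedding
```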
Several benefits are obtained by dimension reduction. It is well known that high dimensionality
often degrades dynamical performance, necessitating the use of lower-dimensional representations.
Additionally, in the latent space, the time required to train a model is significantly reduced.

2.2 Extracting the Latent Stochastic Dynamical System


In order to depict the broader spectrum of nonlinear dynamics in the latent space, we proceed to
construct reduced-order models. Here, stochastic differential equations are preferred to carry the
latent dynamics of the high-dimensional data. The drift term (vector field) characterizes the
deterministic aspect of the system, while the diffusion term (fluctuation) pertains to the stochastic
component and depends on a random process, such as Brownian motion or a Lévy process.
Let z(t) be a stochastic variable governed by the stochastic differential equation

$$dz(t) = \mu(z(t))\,dt + \sigma(z(t))\,dB_t, \qquad (5)$$

where µ : ℝ^d → ℝ^d is the drift term, σ : ℝ^d → ℝ^{d×d} is the diffusivity matrix, and Bt is Brownian
motion. Inspired by deep learning architectures [38], we construct two neural networks µθ and σθ to
estimate µ and σ, whose inputs and weights are represented by the basis functions and coefficients,
respectively.
In fact, we have access to a set of lower-dimensional points {z(ti)} ⊂ ℝ^d from the directed anisotropic
diffusion map, so the set of snapshots {z(ti+1), z(ti), δt}, δt = ti+1 − ti, is available. Over the small
time interval [ti, ti+1], the stochastic differential equation has the discrete form

$$z(t_{i+1}) = z(t_i) + \mu(z(t_i))\,\delta_t + \sigma(z(t_i))\,\delta B_t,$$

where δBt = B_{ti+1} − B_{ti} ∼ N(0, δt). Conditioned on (z(ti), δt), the point z(ti+1) obeys a multivariate
normal distribution

$$z(t_{i+1}) \sim \mathcal{N}\big(z(t_i) + \delta_t\, \mu(z(t_i)),\ \delta_t\, \sigma(z(t_i))^2\big).$$
Hence, the neural networks are trained by minimizing the loss function L(θ), which is defined as

$$L(\theta) = \frac{\big(z(t_{i+1}) - z(t_i) - \delta_t\, \mu_\theta(z(t_i))\big)^2}{\delta_t\, \sigma_\theta(z(t_i))^2} + \log\big|\delta_t\, \sigma_\theta(z(t_i))^2\big| + \log(2\pi). \qquad (6)$$
Moreover, depending on the tail property of the real dataset, the latent space modelling can be extended
to stochastic differential equations driven by non-Gaussian noise; the corresponding loss function is
given in [31].
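As an illustration of this training procedure, the following PyTorch sketch parameterizes a one-dimensional drift and diffusion by their polynomial coefficients and minimizes the Gaussian negative log-likelihood of Eq. (6) (dropping the constant log(2π) term). The architecture, learning rate, variance floor, and the synthetic snapshot pairs are our own assumptions for demonstration, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class PolyCoeff(nn.Module):
    """Linear layer over a polynomial basis [1, z, ..., z^(n_basis-1)];
    its weights play the role of the polynomial coefficients."""
    def __init__(self, n_basis):
        super().__init__()
        self.linear = nn.Linear(n_basis, 1, bias=False)
    def forward(self, z):
        basis = torch.stack([z ** k for k in range(self.linear.in_features)], dim=-1)
        return self.linear(basis).squeeze(-1)

def nll_loss(z_now, z_next, dt, mu_net, sigma_net):
    """Gaussian negative log-likelihood of Euler-Maruyama transitions, cf. Eq. (6)."""
    var = dt * sigma_net(z_now) ** 2 + 1e-4      # variance of z(t_{i+1}) | z(t_i); small floor
    resid = z_next - z_now - dt * mu_net(z_now)
    return (resid ** 2 / var + torch.log(var)).mean()

# toy snapshot pairs standing in for the latent trajectory {z(t_i)} (assumed data)
dt = 0.0625
z_now = torch.linspace(-1.0, 1.0, 1499)
z_next = z_now + dt * (0.1 - 0.9 * z_now ** 3) + dt ** 0.5 * 0.5 * z_now * torch.randn(1499)

mu_net, sigma_net = PolyCoeff(4), PolyCoeff(2)   # cubic drift, linear diffusion
opt = torch.optim.Adam(list(mu_net.parameters()) + list(sigma_net.parameters()), lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss = nll_loss(z_now, z_next, dt, mu_net, sigma_net)
    loss.backward()
    opt.step()
```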

2.3 Early Warning Indicator


When there exist latent variables in complex disease processes, a critical concern is the concept of
early warning signals. In order to provide early warnings before abrupt changes, we introduce the
transition probability between two states [39]. The phase space for the latent variable is separated
into two regions A and B, and the ratio TP_AB is defined as

$$\mathrm{TP}_{AB} = \frac{\langle I_A(z(0))\, I_B(z(t)) \rangle}{\langle I_A(z(0)) \rangle}, \qquad (7)$$

where I_A is the indicator function satisfying I_A(z) = 1 if z ∈ A and I_A(z) = 0 if z ∉ A, and ⟨·⟩
denotes the ensemble average. The ratio TP_AB represents the probability of finding the system in
region B after time t, given z(0) ∈ A. In the following, we issue an early warning when this probability
exceeds 0.5, indicating a high transition probability. Additionally, the mean and standard deviation
over a specified time window are also computed as indicators for early warnings.
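A minimal sketch of this estimator, under the assumption that A and B are separated by a scalar threshold (as in our experiments below, where the split point is −0.5), could read:

```python
import numpy as np

def transition_probability(paths, t, split=-0.5):
    """Estimate TP_AB of Eq. (7) from an ensemble of 1-d sample paths.

    paths: (n_samples, n_steps) array; A = (-inf, split], B = (split, inf).
    """
    in_A0 = paths[:, 0] <= split                    # I_A(z(0))
    in_Bt = paths[:, t] > split                     # I_B(z(t))
    return (in_A0 & in_Bt).mean() / max(in_A0.mean(), 1e-12)

# the warning point is the first time the estimated probability exceeds 0.5:
# warn_t = next(t for t in range(paths.shape[1]) if transition_probability(paths, t) > 0.5)
```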

3 Experiments
We now apply the method outlined in the previous sections to datasets collected from patients with
epileptic seizures. Our objective is to analyze the latent dynamics and illustrate our early warning
strategy.

3.1 Data Preprocessing
Electroencephalography (EEG) involves recording electroencephalographic signals via electrodes placed
on the scalp, which capture the electrical activity during epileptic seizures. Here, the EEG dataset
comprises both pre-ictal and ictal data, arranged as a channel-by-time matrix with a sampling rate of
256 points per second. For each electrode signal, we rescale the data to the interval [−0.5, 0.5] and
subsample by keeping one data point out of every sixteen to form the dataset. Fig. 2 shows a sample
recording in which the doctor labels Time = 500 as the separation time between the pre-ictal and ictal
periods.
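A short sketch of this preprocessing step follows; the per-channel min-max rescaling is our assumption, since the paper does not specify exactly how the rescaling is performed.

```python
import numpy as np

def preprocess(eeg, lo=-0.5, hi=0.5, step=16):
    """Rescale each channel of a (channels, time) EEG matrix to [lo, hi]
    and keep one sample out of every `step`."""
    x = eeg.astype(float)
    mins = x.min(axis=1, keepdims=True)
    maxs = x.max(axis=1, keepdims=True)
    x = (x - mins) / (maxs - mins + 1e-12) * (hi - lo) + lo  # per-channel min-max rescale
    return x[:, ::step]                                      # subsample every 16th point
```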

Figure 2: An example: the EEG data of a patient, with the pre-seizure and epileptic-seizure periods marked.

3.2 Dimension Reduction with Directed Anisotropic Diffusion Map


To construct the diffusion matrix, the data from all electrodes at each time point is viewed as a
high-dimensional node in the diffusive graph. In the original data, a few electrode signals exhibit
abnormal fluctuations prior to the onset of an epileptic seizure. These fluctuations are difficult
to detect with conventional manifold learning methods. Therefore, we propose to identify early warnings
in a lower-dimensional space via the directed anisotropic diffusion map.

3.2.1 Diffusion Matrix and Spectral Gap


The diffusion matrix is constructed from the diffusion kernel and encodes the transition probability
from one high-dimensional node to another in the diffusive graph. Since the dataset is divided into two
distinct classes, the pre-ictal state and the ictal state, at Time = 500, we define the following
directed drift with parameters a and b:

$$\nabla f(x_{t_i}) = \begin{cases} 0, & i \le 500, \\ \dfrac{a}{1 + \exp(-b\, x_{t_i})}, & i > 500, \end{cases} \qquad (8)$$
and apply the directed diffusion to the preprocessed dataset. When an external term is added to the
diffusion kernel (Eq. (1)), it can be thought of as an external force or prior knowledge. As different
doctors may assign varying labels over time, ∇f(x) fluctuates depending on each doctor's decision on
the onset time of seizures. Here the division point at Time = 500 is determined by the doctors'
designation of the beginning of the epileptic seizure.
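For illustration, the drift of Eq. (8) can be evaluated on the preprocessed time-by-channel data as in the sketch below; the values of a and b are placeholders, as the paper does not fix them here.

```python
import numpy as np

def directed_drift(X, a=1.0, b=1.0, onset=500):
    """Drift of Eq. (8): zero before the labelled onset, sigmoidal afterwards.

    X: (n_times, n_channels) array; rows 0..onset-1 correspond to i <= 500.
    """
    grad_f = np.zeros_like(X, dtype=float)
    grad_f[onset:] = a / (1.0 + np.exp(-b * X[onset:]))  # i > 500
    return grad_f
```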
As shown in the top row of Fig. 3, in (a) each point has a low transition probability to the others. In
(b) and (c), data points within [1, 500] have a high probability of transitioning to each other, with a
gradually decreasing transition probability to points in [501, 1500]. With the directed diffusion, the
transition probability of sample points starting from [1, 500] to [501, 1500] is reduced, especially to
the points in [1001, 1500], due to the shape of the function defined in Eq. (8). Correspondingly, the
transition probability of data points starting from [501, 1500] to [1, 500] also drops. In this way,
the strength of diffusion activity between the first third of the time points ([1, 500]) and the later
time points ([501, 1500]) in the original high-dimensional time series is damped, so the reduced
manifolds of the two classes are better separated.

Figure 3: Comparison of the diffusion map, the anisotropic diffusion map and the directed anisotropic
diffusion map. The top row presents the diffusion matrices, and the bottom row shows the logarithm of
the eigenvalues. All plots are under ε = 1 for three scenarios: the left column ((a)(d)) is constructed
with α = 0 and ∇f = 0, the middle column ((b)(e)) with α = 1 and ∇f = 0, and the right column ((c)(f))
with α = 1 and ∇f defined in Eq. (8).
The diffusion matrix has a discrete sequence of eigenvalues {λi}i≥1 and eigenvectors {ϕi}i≥1 such
that λ1 ≥ λ2 ≥ · · · . The eigenvalues determine the diffusive strength, and the spectral gap is an
indicator that helps us find the effective dimension of the latent space. In the bottom row of Fig. 3,
the logarithm of the eigenvalues for the three kinds of diffusion maps shows the spectral gap: the two
largest eigenvalues are separated from the others.

3.2.2 Low-dimensional Embedding over Time


Figure 4: Comparison of low-dimensional embedding patterns with (a) diffusion map (α = 0 and
∇f = 0), (b) anisotropic diffusion map (α = 1 and ∇f = 0) and (c) directed anisotropic diffusion map
(α = 1 and ∇f defined in Eq. (8)).

The directed anisotropic diffusion map discovers two latent coordinates, denoted Φ1 and Φ2, via the
gap between two adjacent eigenvalues, as shown in the bottom row of Fig. 3. This suggests that two
diffusion coordinates are enough to provide an economical representation of the original EEG dataset.
We check the interpretability of the data-driven observations by displaying the coordinates in Fig. 4.
The isotropic diffusion map projects data points to divergent locations, and the anisotropic diffusion
map creates a curve. In contrast, with directed diffusion, the anisotropic diffusion map partitions the
dataset into two states in the latent space, spanned by the two leading diffusion coordinates. Most
importantly, from (a) and (b) in Fig. 4, we can see that data points from [1, 500] and [501, 1000]
are mixed on the left half of the manifold. In (c), however, the drift term induced by the external
knowledge f(x) helps differentiate the two groups of points in [1, 500] and [501, 1000].

3.3 Latent Stochastic Dynamical Systems


An interesting aspect of our research involves the dynamic behaviors exhibited by latent dynamical
systems. In this section, we aim to elucidate the connection between the latent stochastic differential
equation and the original high-dimensional time-evolutionary data.
Following dimension reduction through the directed anisotropic diffusion map, we obtain the original
EEG dataset projected onto the first two diffusion coordinates Φ1 and Φ2, shown in Fig. 5 (a).
The projected data plotted over time displays the transition between two states: the pre-ictal state
and the ictal state. Therefore, we assume that the stochastic differential equation in the latent space
has two meta-stable states and rescale the data to the interval [−1, 1] for convenience. Given the data
in Fig. 5 (a), we would like to obtain the latent stochastic differential equation in the form of
Eq. (5). To get meaningful representations of the drift and diffusion coefficients, we use a polynomial
basis to approximate both µ(z) and σ(z) and employ neural networks to approximate the latent
equations. The inputs of the networks are the basis functions of the coordinates, and the time step is
0.0625. The latent space contains 1500 time points, so 1499 pairs {(z(ti+1), z(ti))} are available.
We randomly select 1200 pairs to train the networks and validate the identification accuracy on the
remaining 299 pairs.


Figure 5: The latent space. (a) Data for Φ1 and Φ2 . (b) The drift and the diffusion terms are
approximated by neural networks for Φ1 (top line) and Φ2 (bottom line).

To find the explicit form of the drift and diffusion coefficients in the unknown stochastic differential
equations, all the unknown drift and diffusion terms are parameterized with neural networks, with
a polynomial basis as the input vector. Then, using the log-likelihood method in Sec. 2.2, the learnt
stochastic differential equations in the coordinates Φ1 and Φ2 take the form

$$\begin{cases} dz_1(t) = \mu_1(z_1(t))\,dt + \sigma_1(z_1(t))\,dB_t, \\ dz_2(t) = \mu_2(z_2(t))\,dt + \sigma_2(z_2(t))\,dB_t, \end{cases} \qquad (9)$$

where µ1(z) = 0.1033 + 0.0676z − 0.5705z² − 0.9001z³, σ1(z) = 0.5109z, µ2(z) = 0.1565 + 0.2856z −
0.1386z² − 0.7195z³, and σ2(z) = −0.5650z. The approximations of the drift and diffusion terms
of the stochastic differential equation with regard to the first two coordinates Φ1 and Φ2 are presented
in Fig. 5 (b).
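Sample trajectories of the learnt equations can then be generated with a standard Euler–Maruyama scheme; a minimal sketch (our illustration, using the coefficients reported above) is:

```python
import numpy as np

def euler_maruyama(mu, sigma, z0, dt=0.001, n_steps=5000, seed=0):
    """Simulate parallel sample paths of a scalar SDE dz = mu dt + sigma dB."""
    rng = np.random.default_rng(seed)
    z = np.empty((n_steps + 1, len(z0)))
    z[0] = z0
    for i in range(n_steps):
        dB = rng.normal(0.0, np.sqrt(dt), size=len(z0))  # Brownian increments
        z[i + 1] = z[i] + mu(z[i]) * dt + sigma(z[i]) * dB
    return z

# learnt coefficients of the first coordinate, from Eq. (9)
mu1 = lambda z: 0.1033 + 0.0676 * z - 0.5705 * z ** 2 - 0.9001 * z ** 3
sigma1 = lambda z: 0.5109 * z
paths = euler_maruyama(mu1, sigma1, z0=np.random.uniform(-1, 1, 1000))
```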
Moreover, we choose the density as our validation metric on the testing data (299 pairs). In Fig. 6, we
compare the densities of the estimated z(ti+1) from the approximations of the latent stochastic
differential equations (Eq. (9)) with those of the ground truths from the corresponding data points in
Fig. 5 (a). Our approximation aligns well with the density of the true data. In each figure, the
presence of two peaks in the density plot indicates the existence of two meta-stable states.

Figure 6: The validation of the neural network approximation with regard to Φ1 and Φ2.

Further, using sample trajectories from the learnt latent stochastic dynamical systems, we plot their
density evolution over time. Fig. 7 shows the time-evolutionary density starting from 1000 uniformly
distributed sample paths. The shift of the density and the long-term stationary patterns suggest the
presence of hidden information worthy of further investigation. One promising research direction is to
employ latent stochastic dynamical systems to generate more samples, enhancing data augmentation
within the constrained context of medical resources.

Figure 7: Time-evolutionary density with time step dt = 0.001.

3.4 Early Warning Indicator


In this section, we present three time-varying indicators resulting from transitions in the latent
space.

Transition probability
For each Φi (i = 1, 2), using the time series trajectory comprising 1500 data points in Fig. 5 (a),
we now calculate the time-varying transition probability. The first 200 points are chosen as starting
points, giving 200 samples. The split point is set at −0.5, so the regions are A = (−∞, −0.5]
and B = (−0.5, ∞). In Fig. 8, we calculate the transition probability of Eq. (7) at each time
step t starting from the initial state. We assume a transition occurs when the probability exceeds
0.5, and define the point where the probability equals 0.5 as the warning point. As time goes on, we
see quite different patterns in the signals of the first and second coordinates. The warning points are
around Time = 1000 for Φ1 and around Time = 600 for Φ2, so the early warning point is set at
Time = 600. The second coordinate thus provides more early warning information than the first
coordinate.
Figure 8: The transition probability of raw data projected onto coordinates Φ1 and Φ2. The red dots
indicate where the transition probability equals 0.5.

Since the above signal is calculated from the specific trajectory projected from the original
high-dimensional data, it is also compelling to check whether our latent dynamical system preserves
the transition patterns statistically. This is important when given a limited data source or partial
observation, where one needs to detect transitions directly from the latent stochastic differential
equation. Leveraging neural networks with a polynomial basis, we can obtain a sufficient number of
sample paths generated by the learnt stochastic differential equations, consequently yielding
statistically robust outcomes. Therefore, we generate 1000 samples from the learnt stochastic
differential equation and obtain the corresponding transition probability over time (Fig. 9).
Consistent with the previous results, the second statistical coordinate Φ̃2 gives an earlier warning.

Figure 9: The transition probability of 1000 samples generated by the learnt stochastic differential
equations with time step dt = 0.0025. The statistical coordinates are denoted by Φ̃1 and Φ̃2. The red
dots indicate where the transition probability equals 0.5.

Moving average and standard deviation


Now we incorporate the standard deviation of the original data sd_ori over time (Fig. 2) with the
projected data Φ1, Φ2 (Fig. 5 (a)) to compare warning points.
Setting the width of each time window to 200, Fig. 10 shows the moving average over time.
One interesting observation is that the first coordinate of the reduced system aligns closely with the
standard deviation of the original high-dimensional data at each time step. The second coordinate
gives an earlier warning alert than the other two signals.
When checking the standard deviation of the moving windows with width 200 in Fig. 11, we see
different evaluations of the variance of the noisy data through different metrics. Specifically, it is
not surprising that around Time = 600 the second coordinate exhibits the highest standard deviation
while the first coordinate and the original data show the opposite behavior. This comes from the
impact of the drift term added in the directed diffusion kernel.
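The two window-based indicators can be computed with a simple sliding window, as in the sketch below (our illustration; the exact window alignment used in the figures may differ):

```python
import numpy as np

def moving_stats(x, width=200):
    """Moving average and standard deviation over sliding windows of `width`."""
    windows = np.lib.stride_tricks.sliding_window_view(x, width)
    return windows.mean(axis=-1), windows.std(axis=-1)

# e.g. ma2, sd2 = moving_stats(phi2) for the second latent coordinate
```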

Figure 10: The moving average of Φ1, Φ2 and sd_ori. Considering a 12.5% increase of the value compared
with the starting point as an abnormal signal, we obtain the red dots. The second coordinate gives an
earlier warning alert than the other two signals.

Figure 11: The moving standard deviation of Φ1, Φ2 and sd_ori. The red dots represent the highest
standard deviation values. The second coordinate gives an earlier warning alert than the other two
signals.

4 Conclusion

Predicting epileptic seizures from brain signals is a challenging task due to the high dimensionality
and complexity of the data. In this paper, we propose a machine learning approach with stochastic
dynamical systems for modelling the signals in a lower-dimensional latent space and identifying the
early warning signals for epilepsy.
Directed diffusion is employed to differentiate ictal data from pre-ictal data under the notion that
information can flow in a specific direction. This technique serves as an extension of the diffusion map
algorithm, specifically tailored to extract and represent the underlying structure and relationships in
high-dimensional data. A crucial factor in defining a reduced latent space is the presence of a spectral
gap among the eigenvalues. In this paper, we focus on two primary coordinates that effectively capture
the dynamic characteristics of the original high-dimensional data.
Early warning of epileptic seizures is of paramount importance for individuals with epilepsy. The
abrupt change is caught for early warning in the latent space, where the normal state and the ictal
state can be viewed as two meta-stable states. The ability to identify transitions between meta-stable
states plays a pivotal role in predicting and controlling brain behavior. Furthermore, we employ
stochastic differential equations to model the nonlinear dynamics in the latent space, simulating
quantities such as the long-term time-evolutionary density and the transition probability,
corresponding to the metrics of the real high-dimensional data. This framework of learning latent
stochastic systems and detecting abnormal dynamics extends to general scenarios with other complex
high-dimensional time-evolutionary data.

Acknowledgements
We would like to thank Professor Quanying Liu and her students Yinuo Zhang and Zhichao Liang for
providing the data and for helpful discussions. This work was supported by the National Key Research
and Development Program of China (No. 2021ZD0201300), the National Natural Science Foundation
of China (No. 12141107), the Fundamental Research Funds for the Central Universities (5003011053),
the Fundamental Research Funds for the Central Universities, HUST: 2022JYCXJJ058 and Dongguan
Key Laboratory for Data Science and Intelligent Medicine.

Data availability
The data sets that support the findings of this study are available upon request.

References
[1] A. Mardt, L. Pasquali, H. Wu, and F. Noé. VAMPnets for deep learning of molecular kinetics.
Nat. Commun., 9(1):5, 2018.
[2] Y. Zhang and W. Wang. Mathematical analysis for stochastic model of Alzheimer's disease.
Commun. Nonlinear Sci. Numer. Simulat., 89:105347, 2020.
[3] F. Yang, Y. Zheng, J. Duan, L. Fu, and S. Wiggins. The tipping times in an Arctic sea ice system
under influence of extreme events. Chaos, 30(6):063125, 2020.
[4] D. Faranda, F. M. E. Pons, E. Giachino, S. Vaienti, and B. Dubrulle. Early warnings indicators
of financial crises via auto regressive moving average models. Commun. Nonlinear Sci. Numer.
Simulat., 29(1-3):233–239, 2015.

[5] C. Yildiz, M. Heinonen, and H. Lahdesmaki. ODE2VAE: Deep generative second order ODEs with
Bayesian neural networks. NIPS, 32, 2019.
[6] A. Hasan, J. Pereira, S. Farsiu, and V. Tarokh. Identifying latent stochastic differential equations.
IEEE Trans. Signal Processing, 70:89–104, 2021.
[7] L. Bi, J. Zhang, and J. Lian. EEG-based adaptive driver-vehicle interface using variational
autoencoder and PI-TSVM. IEEE Trans. Neural Syst. Rehab. Eng., 27(10):2025–2033, 2019.
[8] X. Li, Z. Zhao, D. Song, et al. Latent factor decoding of multi-channel EEG for emotion
recognition through autoencoder-like neural networks. Front. Neurosci., 14:87, 2020.

[9] X. Guo and J. Wang. Low-dimensional dynamics of brain activity associated with manual
acupuncture in healthy subjects. Sensors, 21(22):7432, 2021.
[10] K. Han, H. Wen, J. Shi, et al. Variational autoencoder: An unsupervised model for encoding
and decoding fMRI activity in visual cortex. NeuroImage, 198:125–136, 2019.
[11] D. Bethge, P. Hallgarten, T. Grosse-Puppendahl, et al. EEG2Vec: Learning affective EEG
representations via variational autoencoders. In IEEE SMC 2022, pages 3150–3157. IEEE, 2022.
[12] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representa-
tion. Neural Comput., 15(6):1373–1396, 2003.
[13] R. R. Coifman and S. Lafon. Diffusion maps. Appl. Comput. Harmon. Anal., 21(1):5–30, 2006.
[14] A. Ansuini, A. Laio, J. H. Macke, and D. Zoccolan. Intrinsic dimension of data representations
in deep neural networks. NIPS, 32, 2019.
[15] L. Feng, T. Gao, M. Dai, and J. Duan. Learning effective dynamics from data-driven stochastic
systems. Chaos, 33(4), 2023.
[16] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding.
Science, 290(5500):2323–2326, 2000.
[17] B. Nadler, S. Lafon, R. R. Coifman, and I. G. Kevrekidis. Diffusion maps, spectral clustering and
reaction coordinates of dynamical systems. Appl. Comput. Harmon. Anal., 21(1):113–127, 2006.
[18] R. R. Coifman, I. G. Kevrekidis, S. Lafon, M. Maggioni, and B. Nadler. Diffusion maps, reduction
coordinates, and low dimensional representation of stochastic systems. Multiscale Model. Simul.,
7(2):842–864, 2008.
[19] A. L. Ferguson, A. Z. Panagiotopoulos, P. G. Debenedetti, and I. G. Kevrekidis. Systematic
determination of order parameters for chain dynamics using diffusion maps. PNAS, 107(31):13597–
13602, 2010.
[20] J. Sulam, Y. Romano, and R. Talmon. Dynamical system classification with diffusion embedding
for ECG-based person identification. Signal Process., 130:403–411, 2017.
[21] I. K. Gallos, D. Lehmberg, F. Dietrich, and C. Siettos. Data-driven modelling of brain activity
using neural networks, diffusion maps, and the Koopman operator. arXiv preprint arXiv:2304.11925,
2023.
[22] N. Evangelou, F. Dietrich, E. Chiavazzo, D. Lehmberg, et al. Double diffusion maps and their
latent harmonics for scientific computations in latent space. J. Comput. Phys., 485:112072, 2023.
[23] K. R. Dos Santos, D. G. Giovanis, and M. D. Shields. Grassmannian diffusion maps–based dimen-
sion reduction and classification for high-dimensional data. SIAM J. Sci. Comput., 44(2):B250–
B274, 2022.
[24] R. Talmon, S. Mallat, H. Zaveri, and R. R. Coifman. Manifold learning for latent variable inference
in dynamical systems. IEEE Trans. Signal Processing, 63(15):3843–3856, 2015.
[25] S. Ouala, D. Nguyen, L. Drumetz, B. Chapron, et al. Learning latent dynamics for partially
observed chaotic systems. Chaos, 30(10):103121, 2020.
[26] J. Duan. An Introduction to Stochastic Dynamics, volume 51. Cambridge, 2015.
[27] L. Duncker, G. Bohner, J. Boussard, and M. Sahani. Learning interpretable continuous-time
models of latent stochastic dynamical systems. In ICML, pages 1726–1734. PMLR, 2019.
[28] N. Evangelou, F. Dietrich, J. M. Bello-Rivas, A. J. Yeh, et al. Learning effective SDEs from
Brownian dynamics simulations of colloidal particles. Mol. Syst. Des. Eng., 8:887–901, 2023.
[29] Y. Li and J. Duan. Extracting governing laws from sample path data of non-Gaussian stochastic
dynamical systems. J. Stat. Phys., 186:1–21, 2021.

[30] T. Gao and J. Duan. Quantifying model uncertainty in dynamical systems driven by non-Gaussian
Lévy stable noise with observations on mean exit time or escape probability. Commun. Nonlinear
Sci. Numer. Simul., 39:1–6, 2016.
[31] C. Fang, Y. Lu, T. Gao, and J. Duan. An end-to-end deep learning approach for extracting
stochastic dynamical systems with α-stable Lévy noise. Chaos, 32(6), 2022.

[32] M. Dai, T. Gao, Y. Lu, Y. Zheng, and J. Duan. Detecting the maximum likelihood transition
path from data of stochastic dynamical systems. Chaos, 30(11):113124, 2020.
[33] T. Gao, J. Duan, and X. Kan. Dynamical inference for transitions in stochastic systems with
α-stable Lévy noise. J. Phys. A Math. Theor., 49, 2015.

[34] R. Liu, P. Chen, K. Aihara, and L. Chen. Identifying early-warning signals of critical transitions
with strong noise by dynamical network markers. Sci. Rep., 5(1):17501, 2015.
[35] Y. Huang, G. Kou, and Y. Peng. Nonlinear manifold learning for early warnings in financial
markets. Eur. J. Oper. Res., 258(2):692–702, 2017.
[36] T. M. Bury, R. Sujith, I. Pavithran, M. Scheffer, et al. Deep learning for early warning signals
of tipping points. PNAS, 118(39):e2106140118, 2021.
[37] F. Chung. Spectral graph theory, volume 92. Am. Math. Soc., 1997.
[38] F. Dietrich, A. Makeev, G. Kevrekidis, N. Evangelou, et al. Learning effective stochastic
differential equations from microscopic simulations: Linking stochastic numerics to deep learning.
Chaos, 33(2), 2023.
[39] P. G. Bolhuis, D. Chandler, C. Dellago, and P. L. Geissler. Transition path sampling: throwing
ropes over rough mountain passes, in the dark. Annu. Rev. Phys. Chem., 53:291–318, 2002.
