Abstract
This paper explores the question of how to identify a reduced order model from data.
There are three ways to relate data to a model: invariant foliations, invariant manifolds
and autoencoders. Invariant manifolds cannot be fitted to data unless a hardware in a
loop system is used. Autoencoders only identify the portion of the phase space where
the data is, which is not necessarily an invariant manifold. Therefore for off-line data
the only option is an invariant foliation. We note that Koopman eigenfunctions also
define invariant foliations, but they are limited by the assumption of linearity and the
resulting singularities. Finding an invariant foliation requires approximating high-dimensional
functions. We propose two solutions. If an accurate reduced order model is sought, a
sparse polynomial approximation is used, with polynomial coefficients that are sparse hi-
erarchical tensors. If an invariant manifold is sought, as a leaf of a foliation, the required
high-dimensional function can be approximated by a low-dimensional polynomial. The
two methods can be combined to find an accurate reduced order model and an invariant
manifold. We also analyse the reduced order model in case of a focus type equilibrium,
typical in mechanical systems. We note that the nonlinear coordinate system defined
by the invariant foliation and the invariant manifold distorts instantaneous frequencies
and damping ratios, which we correct. Through examples we illustrate the calculation
of invariant foliations and manifolds, and at the same time show that Koopman eigen-
functions and autoencoders fail to capture accurate reduced order models under the
same conditions.
1 Introduction
There is a great interest in the scientific community to identify explainable and/or parsimo-
nious mathematical models from data. In this paper we classify these methods and identify
one concept that is best suited to accurately calculate reduced order models (ROM) from
∗ Abbreviated title: Data-driven reduced order modelling
Figure 1: Two cases of data distribution. The blue dots (initial conditions) are mapped
into the red triangles by the nonlinear map F given by equation (35). a) data points are
distributed in the neighbourhood of the solid green curve, which is identified as an approx-
imate invariant manifold. b) Initial conditions are well-distributed in the neighbourhood of
the steady state and there is no obvious manifold structure. For completeness, the second
invariant manifold is denoted by the dashed line.
off-line data. A ROM must track some selected features of the data over time and predict
them into the future. We call this property of the ROM invariance. A ROM may also be
unique, which means that barring a (nonlinear) coordinate transformation, the mathemat-
ical expression of the ROM is independent of the sampled data as long as the sample size
is sufficiently large and the distribution of the data satisfies some minimum requirements.
However, uniqueness may not be as important as invariance in certain applications.
Not all methods that identify low-dimensional models produce ROMs. In some cases the
data lie on a low-dimensional (nonlinear) manifold embedded in a high-dimensional (typically
Euclidean) space (figure 1(a)). In this case the task is to parametrise the low-dimensional
manifold and fit a model to the data in the coordinates of the parametrisation. The choice of
parametrisation influences the form of the model. It is desirable to use a parametrisation that
yields a model with the fewest parameters. Approaches to reduce the number of
parameters include compressed sensing [21, 6, 11, 12, 17] and normal form methods [46, 58,
16]. The methods to obtain a parametrisation of the manifold include diffusion maps [18],
isomaps [52] and autoencoders [33, 17, 32, 16].
The main focus of this paper is genuine reduced order models, where we need to find
order in a cloud of data as illustrated in figure 1(b). The dynamics within the data defines
the reduced order model. We are not merely finding a parametrisation of a manifold, we are
choosing a particular one that is invariant. Alternatively, a family of manifolds that map onto
each other is sought, which is the case of an invariant foliation. The lesson learnt from
previous work [24, 20, 13] is that invariant manifolds are not defined by the dynamics on them
but by the dynamics in their neighbourhood. Therefore, any method that does not identify
the dynamics in the neighbourhood of a manifold cannot claim invariance. Autoencoders do
not identify the surrounding dynamics and therefore fail to guarantee invariance. Obviously,
if there is no surrounding dynamics to identify, because all the data is on the manifold, this
is not a problem.
1.1 Set-up
We assume a deterministic system and small additive noise of zero mean that appears in our
off-line collected data. The state of the system belongs to a real n-dimensional Euclidean
space X (a vector space with an inner product ⟨·, ·⟩_X : X × X → R). The state of the system
is uniformly sampled

y_k = F(x_k) + ξ_k,  k = 1, …, N,  (1)

where ξ_k ∈ X is a small noise sampled from a distribution with zero mean. Equation (1)
describes pieces of trajectories if for some k ∈ {1, …, N}, x_{k+1} = y_k. The state of the
system can also be defined on a manifold, in which case X is chosen such that the manifold
is embedded in X according to Whitney's embedding theorem [56]. It is also possible that
the state cannot be directly measured, in which case Takens's delay embedding technique [51]
can be used to reconstruct the state, which we will do subsequently in an optimal manner
[15]. As a minimum, we require that our system is observable [29].
For the purpose of this paper we also assume a fixed point at the origin, such that
F (0) = 0 and that the domain of F is a compact and connected subset G ⊂ X that includes
a neighbourhood of the origin.
1. The encoder-map pair (U, S) is a reduced order model (ROM) of F if for all initial
conditions x_0 ∈ G ⊂ X the trajectory x_{k+1} = F(x_k) and for initial condition z_0 =
U(x_0) the second trajectory z_{k+1} = S(z_k) are connected such that z_k = U(x_k) for
all k > 0.

2. The decoder-map pair (W, S) is a reduced order model of F if for all initial conditions
z_0 ∈ H = {z ∈ Z : W(z) ∈ G} the trajectory z_{k+1} = S(z_k) and for initial condition
x_0 = W(z_0) the second trajectory x_{k+1} = F(x_k) are connected such that x_k =
W(z_k) for all k > 0.
Remark 2. In essence, a ROM is a model whose trajectories are connected to the trajectories
of our system F . We call this essential property invariance. It is possible to define a weaker
ROM, where the connections z k = U (xk ) or xk = W (z k ) are only approximate, which is
not discussed here.
The following corollary can be thought of as an equivalent definition of a ROM.
Proof. Let us assume that (2) holds, choose an x_0 ∈ X and let z_0 = U(x_0). First we check
that z_k = U(x_k) holds if x_k = F^k(x_0), z_k = S^k(z_0) and (2) hold. Substituting x = F^{k−1}(x_0)
into (2) repeatedly yields

S^k(z_0) = U(F^k(x_0)),

that is, z_k = U(x_k). Conversely, if z_k = U(x_k) for all k > 0, then in particular

z_1 = S(z_0) = U(x_1),

that is, S(U(x_0)) = U(F(x_0)),

which is true for all x_0 ∈ G, hence (2) holds. The proof for the decoder is identical, except
that we swap F and S and replace U with W.
Figure 3: Autoencoder. Encoder U maps data to parameter space Z, then the low-order map
S takes the parameter forward in time and finally the decoder W brings the new parameter
back to the state space. This chain of maps denoted by solid purple arrows must be the
same as map F denoted by the dashed arrow. The two results can only match between two
manifolds M and N and not elsewhere in the state space. Manifold M is invariant if and
only if M ⊂ N , which is not guaranteed by this construction.
[Figure 2: commutative diagrams a)–d) of the four combinations of encoders and decoders, connecting the state space X and the parameter space Z through the maps F, S, U and W.]
Now the question arises whether we can use a combination of an encoder and a decoder. This
is the case of the autoencoder, or nonlinear principal component analysis [33]. Indeed, there
are four combinations of encoders and decoders, which are depicted in diagrams 2(a,b,c,d).
We name these four scenarios as follows.
As we walk through the dashed and solid arrows in diagram 2(a), we find the two sides of
the invariance equation (2). If we do the same for diagram 2(b), we find equation (3). This
implies that invariant foliations and invariant manifolds are ROMs. The same is not true
for autoencoders and reverse autoencoders. Reading off diagram 2(c), the autoencoder must
satisfy
W(S(U(x))) = F(x).  (4)
Equation (4) is depicted in figure 3. Since the decoder W maps onto a manifold M, equation
(4) can only hold if x is chosen from the preimage of M, that is, x ∈ N = F −1 (M). For
invariance, we need F (M) ⊂ M, which is the same as M ⊂ N if F is invertible. However,
the inclusion M ⊂ N is not guaranteed by equation (4). The only way to guarantee that
M ⊂ N , is by stipulating that the function composition W ◦ U is the identity map on the
data. A trivial case is when dim Z = dim X or, in general, when all data fall onto a dim Z
dimensional submanifold of X. Indeed, the standard way to find an autoencoder [33] is to
solve
arg min_{U,W} Σ_{k=1}^{N} ‖W(U(x_k)) − x_k‖².  (5)
Equation (6) immediately provides S, for any U , W , without identifying any structure in
the data.
Only invariant foliations (through equation (2)) and autoencoders (through equations (4)
and (5)) can be fitted to data, because the parameter z does not appear in the data. Indeed,
if we take into account our data (1), the foliation invariance equation (2) turns into the
optimisation problem

arg min_{S,U} Σ_{k=1}^{N} ‖x_k‖^{−2} ‖S(U(x_k)) − U(y_k)‖².  (7)
For the optimisation problem (7) to have a unique solution, we need to apply a constraint
to U . One possible constraint is explained in remark 7. In case of the autoencoder (8), the
constraint, in addition to the one restricting U , can be that U ◦ W is the identity as also
stipulated in [16].
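To make the objective concrete, here is a minimal numerical sketch of evaluating (7); the 2×2 map F(x) = Ax and its slow left eigenvector are illustrative assumptions, not taken from the paper. An exact linear foliation U(x) = u·x, S(z) = 0.9z attains (numerically) zero loss.

```python
import numpy as np

# Hypothetical linear map F(x) = A x; u is a left eigenvector,
# u A = 0.9 u, so U(x) = u.x with S(z) = 0.9 z is an exact
# linear invariant foliation: S(U(x)) = U(F(x)) for every x.
A = np.array([[0.9, 0.1],
              [0.0, 0.5]])
u = np.array([1.0, 0.25])

def foliation_loss(U, S, xs, ys):
    """Objective (7): sum_k ||x_k||^(-2) * ||S(U(x_k)) - U(y_k)||^2."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += np.linalg.norm(x) ** -2 * (S(U(x)) - U(y)) ** 2
    return total

rng = np.random.default_rng(0)
xs = rng.normal(size=(200, 2))
ys = xs @ A.T                    # noise-free samples y_k = F(x_k)

loss = foliation_loss(lambda x: u @ x, lambda z: 0.9 * z, xs, ys)
```

Any encoder that is not aligned with a left invariant subspace leaves a residual in (7), which is what the optimisation exploits.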
Figure 4: (a) Invariant foliation: a leaf Lz is being mapped onto leaf LS(z) . Since the origin
0 ∈ Z is a steady state of S, the leaf L0 is mapped into itself and therefore it is an invariant
manifold. (b) Invariant manifold: a single trajectory, represented by red dots, starting from
the invariant manifold M remains on the green invariant manifold.
Theorem 6. Assume that DF(0) is semisimple and that there exists an integer σ ≥ 2, such
that i_{E⋆} < σ ≤ r. Also assume that

∏_{k=1}^{n} μ_k^{m_k} ≠ μ_j,  j = 1, …, ν.  (10)
Proof. The proof is carried out in [49]. Note that the assumption of DF(0) being semisimple
was only used to simplify the proof in [49]; it is therefore likely not needed.
Remark 7. Theorem 6 only concerns the uniqueness of the foliation, but not the encoder U .
However, for any smooth and invertible map R : Z → Z, the encoder Ũ = R ◦ U represents
the same foliation and the nonlinear map S transforms into S̃ = R ◦ S ◦ R−1 . If we want
to solve the invariance equation (2), we need to constrain U . The simplest such constraint
is that
U(W_1 z) = z,  (11)

where W_1 : Z → X is a linear map with full rank such that E⋆ ∩ ker W_1⋆ = {0}. To
explain the meaning of equation (11), we note that the image of W_1 is a hyperplane H in
X. Equation (11) therefore means that each leaf L_z must intersect this hyperplane exactly
at parameter z. The condition E⋆ ∩ ker W_1⋆ = {0} then means that the leaf L_0 has a proper
intersection with hyperplane H and therefore is not tangent to H. This is similar to the
graph-style parametrisation of a manifold over a hyperplane.
Remark 8. Eigenvalues and eigenfunctions of the Koopman operator [36, 37] are invariant
foliations. Indeed, the Koopman operator is defined as (Ku) (x) = u (F (x)). If we assume
that U = (u1 , . . . , uν ) is a collection of functions uj : X → R, Z = Rν , then U spans an
invariant subspace of K if there exists a matrix S such that K (U ) = SU . Expanding this
equation yields U (F (x)) = SU (x), which is the same as the invariance equation (2), except
that S is linear. The existence of the linear map S requires further non-resonance conditions,
which are

∏_{k=1}^{ν} μ_k^{m_k} ≠ μ_j,  j = 1, …, ν,  (12)

for all m_k ≥ 0 such that Σ_{k=1}^{ν} m_k ≤ σ − 1. Equation (12) is referred to as the set of internal
non-resonance conditions, because these are intrinsic to the invariant subspace E⋆. In many
cases S represents the slowest dynamics, hence even if there are no internal resonances, (12)
will approximately hold for some set of m1 , . . . , mν and that causes numerical issues leading
to undesired inaccuracies. We will illustrate this in section 5.2.
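This connection can be illustrated with extended dynamic mode decomposition (EDMD), a standard data-driven way to approximate Koopman-invariant subspaces; the matrix A and the deliberately simple linear dictionary below are assumptions for the sketch, not taken from the paper.

```python
import numpy as np

# EDMD sketch: solve U(F(x_k)) ~ S U(x_k) in least squares, where
# U stacks the dictionary functions u_j. With linear dynamics and a
# linear dictionary, eig(S) recovers the eigenvalues of A.
A = np.array([[0.9, 0.1],
              [0.0, 0.5]])

def dictionary(x):
    return x                       # u_j(x) = x_j; polynomials in general

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
Y = X @ A.T                        # y_k = F(x_k)

UX = np.array([dictionary(x) for x in X])
UY = np.array([dictionary(y) for y in Y])
S = np.linalg.lstsq(UX, UY, rcond=None)[0].T   # UY ~ UX S^T

eigs = np.sort(np.linalg.eigvals(S).real)      # -> 0.5 and 0.9
```

With a nonlinear system and a richer dictionary the same least-squares problem applies, but the near-resonances mentioned above make some dictionary coefficients large and hard to identify.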
Now we discuss invariant manifolds. A decoder W defines a differentiable manifold

M = {W(z) : z ∈ H},

where H = {z ∈ Z : W(z) ∈ G}. Invariance equation (3) is equivalent to the geometric
condition that F (M) ⊂ M. This geometry is shown in figure 4(b), which illustrates that if
a trajectory is started on M, all subsequent points of the trajectory stay on M.
Invariant manifolds as a concept cannot be used to identify ROMs from off-line data. As
we will see below, invariant manifolds can still be identified as a leaf of an invariant foliation,
but not through the invariance equation (3). Indeed, it is not possible to guess the manifold
parameter z ∈ Z from data. Introducing an encoder U : X → Z to calculate z = U (x),
transforms the invariant manifold into an autoencoder, which does not guarantee invariance.
We now state the conditions of the existence and uniqueness of an invariant manifold.
Theorem 10. Assume that there exists an integer σ ≥ 2, such that ℵ_E < σ ≤ r. Also
assume that

∏_{k=1}^{ν} μ_k^{m_k} ≠ μ_j,  j = ν + 1, …, n,  (13)

for all m_k ≥ 0 such that Σ_{k=1}^{ν} m_k ≤ σ − 1. Then in a sufficiently small neighbourhood of the
origin there exists an invariant manifold M tangent to the invariant linear subspace E of the
C r map F . The manifold M is unique among the σ times differentiable manifolds and it is
also C r smooth.
U_1 W(z) = z,  (14)

where U_1 : X → Z is a linear map with full rank such that E ∩ ker U_1 = {0}. This is
similar to, but more general than, a graph-style parametrisation (akin to theorem 1.2 in [13]),
where the range of U_1⋆ must span the linear subspace E. Constraint (14) can break down for
large ‖z‖, when U_1 DW(z) does not have full rank. A globally suitable constraint is that
DW⋆(z) DW(z) = I.
For linear subspaces E and E ? with eigenvalues closest to the complex unit circle (rep-
resenting the slowest dynamics), iE = 1 and ℵE is maximal. Therefore the foliation corre-
sponding to the slowest dynamics requires the least smoothness, while the invariant manifold
requires the maximum smoothness for uniqueness. The ultimate factor that decides which
method to use is whether it can be fitted to data and produce a ROM at the same time; only
invariant foliations are capable of both. Table 1 summarises the main properties of the three
conceptually different model identification techniques.
Table 1: Properties of the three model identification techniques.

                            Invariant Foliation    Invariant Manifold            Autoencoder
Applicable to Math. Models  YES                    YES                           YES
Applicable to Data          YES                    NO                            YES
Obtains a ROM               YES                    YES                           NO
Uniqueness                  slowest most unique    slowest least unique          NO
References                  [49, 36]               [47, 20, 13, 14, 28, 50, 55]  [16, 17, 32]
nates in a transversal direction to the manifold and W 0 prescribes where the actual manifold
is along this transversal direction, while U provides the parametrisation of the manifold. All
other leaves of the approximate foliation F̂ are shifted copies of M along the U ⊥ direction as
Figure 5: Approximate invariant foliation. The linear map U_⊥ and the nonlinear map U
create a coordinate system, so that in this frame the invariant manifold M is given by
U_⊥ x = W_0(z), where z = U(x). We then allow manifold M to be shifted in the U_⊥ direction
such that U_⊥ x − W_0(z) = ẑ, which creates a foliation parametrised by ẑ ∈ Ẑ. If this
foliation satisfies the invariance equation in a neighbourhood of M with a linear map B as
per equation (16), then M is an invariant manifold. In other words, nearby blue dashed
leaves are mapped towards M by the linear map B the same way as the underlying dynamics
does.
hence DW(0) = Q^{−1} (I, 0)^T. The nonlinear part of (18) is

Q W̃(z) = ( −Ũ DW(0) z + W̃(z),  W_0(z) )^T,
As we will see in section 5, this approach provides better results than using an autoen-
coder. Here we have resolved the dynamics transversal to the invariant manifold M up to
linear order. It is essential to resolve this dynamics to find invariance, not just the location
of data points.
Remark 13. This method is not without problems. The first is that Û(x_k) might assume
large values over some data points. This either requires replacing B with a nonlinear map,
or we need to filter out data points that are not in a small neighbourhood of the invariant
manifold M. Due to B being high-dimensional, replacing it with a nonlinear map leads
to computational difficulties. Filtering data is easier. For example, we can assign weights
to each term in our optimisation problem (7) depending on how far a data point is from
the predicted manifold, which is the zero level surface of Û. This can be done using the
optimisation problem
arg min_{B, U_⊥, W_0} Σ_{k=1}^{N} ‖x_k‖^{−2} exp( −‖Û(x_k)‖² / (2κ²) ) ‖B Û(x_k) − Û(y_k)‖²,  (20)
where κ > 0 determines the size of the neighbourhood of the invariant manifold M that we
take into account.
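The Gaussian weighting in (20) can be sketched in a few lines; `U_hat_values` below stands for hypothetical values of Û already evaluated at the data points.

```python
import numpy as np

# Weights from (20): w_k = exp(-||U_hat(x_k)||^2 / (2 kappa^2)).
# Points on the predicted manifold (U_hat = 0) get weight 1;
# points far from it are suppressed, with kappa setting the scale.
def weights(U_hat_values, kappa):
    sq = np.sum(np.atleast_2d(U_hat_values) ** 2, axis=1)
    return np.exp(-sq / (2.0 * kappa ** 2))

w = weights(np.array([[0.0], [0.2], [2.0]]), kappa=0.2)
# w[0] = 1, w[1] = exp(-1/2), w[2] is negligibly small
```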
Remark 14. The approximation (16) can be made more accurate if we allow matrix U ⊥ to
vary with the parameter of the manifold z = U (x). In this case the encoder becomes
This does increase computational costs, but not nearly as much as if we were calculating a
globally accurate invariant foliation.
Remark 15. It is also possible to eliminate a-priori calculation of U . We can assume that
U is a linear map, such that U U^T = I, and treat it as an unknown in representation (15).
The assumption that U is linear makes sense if we limit ourselves to a small neighbourhood
of the invariant manifold M by setting κ < ∞ in (20), as we have already assumed a linear
dynamics among the leaves of the associated foliation F̂ given by linear map B. Once B,
U , U ⊥ and W 0 are found, the map S can also be fitted to the invariance equation (2). The
equation to fit S to data is
arg min_S Σ_{k=1}^{N} ‖x_k‖^{−2} exp( −‖Û(x_k)‖² / (2κ²) ) ‖U y_k − S(U x_k)‖²,
which is a straightforward linear least squares problem, if S is linear in its parameters. This
approach will be further explored elsewhere.
Without much further thought (as described in [50, 49]), the instantaneous frequency and damping
could be calculated as
respectively. The instantaneous amplitude is a norm A(r) = ‖W(r, ·)‖, for example
‖f‖ = √⟨f, f⟩,   ⟨f, f⟩ = (1/2π) ∫_0^{2π} ⟨f(θ), f(θ)⟩_X dθ,  (24)
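The norm (24) is straightforward to evaluate numerically; this sketch averages over an equispaced periodic grid in θ (spectrally accurate for smooth W) and checks the result on the trivial decoder W(r, θ) = (r cos θ, r sin θ), for which A(r) = r.

```python
import numpy as np

# Amplitude norm (24): ||f||^2 = (1/2pi) int_0^{2pi} <f(th), f(th)>_X dth,
# evaluated by averaging over an equispaced periodic grid in theta.
def amplitude(W, r, n=512):
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    vals = np.array([W(r, t) for t in theta])      # n x dim(X)
    return np.sqrt(np.mean(np.sum(vals ** 2, axis=1)))

# trivial decoder: <f, f> = r^2, hence A(r) = r
W_lin = lambda r, t: np.array([r * np.cos(t), r * np.sin(t)])
```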
Equation (25) means that the instantaneous amplitude of the trajectories on manifold M
is the same as parameter r, hence the map r 7→ R (r) determines the change in amplitude.
Equation (26) stipulates that the parametrisation in the angular variable is such that there
is no phase shift between the closed curves W(r_1, ·) and W(r_2, ·) for r_1 ≠ r_2. If there were
a phase shift γ, a trajectory that within a period moves from amplitude r_1 to r_2 would
misrepresent its instantaneous period of vibration by phase γ, hence the frequency given by
T(r) would be inaccurate. In fact, one can set a continuous phase shift γ(r) among the
closed curves W (r, ·), such that the frequency given by T (r) is arbitrary. The following
result provides accurate values for the instantaneous frequency and damping ratio.
Proposition 16. Assume a decoder W : [0, r1 ]×[0, 2π] → X and functions R, T : [0, r1 ] → R
such that they satisfy invariance equation (21).
1. A new parametrisation W̃ of the manifold generated by W that satisfies the constraints
(25) and (26) is given by
W̃(r, θ) = W(t, θ + δ(t)),   t = κ^{−1}(r²/2),

where

δ(r) = − ∫_0^r [ ∫_0^{2π} ⟨D_1 W(ρ, θ), D_2 W(ρ, θ)⟩_X dθ / ∫_0^{2π} ⟨D_2 W(ρ, θ), D_2 W(ρ, θ)⟩_X dθ ] dρ,  (27)

κ(r) = (1/2π) ∫_0^r ∫_0^{2π} ⟨D_1 W(ρ, θ), W(ρ, θ)⟩_X dθ dρ.  (28)
Remark 18. The same calculation applies to vector fields, ẋ = f (x), but the final result
is somewhat different. Assume a decoder W : [0, r1 ] × [0, 2π] → X and functions R, T :
[0, r1 ] → R such that they satisfy the invariance equation
r_{k+1} = e^{ζ_0 ω_0} r_k,   θ_{k+1} = θ_k + ω_0,  (33)

W(r, θ) = ( r cos θ − (1/4) r³ cos³ θ,  r sin θ + (1/2) r³ cos³ θ )^T.  (34)
In terms of the polar invariance equation (32) the linear map (33) translates to T(r) = ω_0
and R(r) = e^{ζ_0 ω_0} r. If we disregard the nonlinearity of W, the instantaneous frequency
and the instantaneous damping ratio of our hypothetical system would be constant, that is,
ω(r) = ω_0 and ζ(r) = ζ_0. Using proposition 16, we calculate the effect of W, and find that
δ(r) = −(2/√19) [ tan^{−1}( (15r² − 8)/(8√19) ) + tan^{−1}( 1/√19 ) ],

κ(r) = (1/512) r² (25r⁴ − 48r² + 256).
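As a sanity check, the closed-form κ(r) can be compared against a direct quadrature of the integral (28) with the decoder (34); the grid sizes below are arbitrary choices.

```python
import numpy as np

# Decoder (34) and its derivative with respect to r
def W(r, th):
    return np.array([r * np.cos(th) - 0.25 * r**3 * np.cos(th)**3,
                     r * np.sin(th) + 0.50 * r**3 * np.cos(th)**3])

def D1W(r, th):
    return np.array([np.cos(th) - 0.75 * r**2 * np.cos(th)**3,
                     np.sin(th) + 1.50 * r**2 * np.cos(th)**3])

# kappa(r) from (28) by midpoint quadrature in rho and theta
def kappa_numeric(r, nr=400, nth=400):
    rho = (np.arange(nr) + 0.5) * r / nr
    th = (np.arange(nth) + 0.5) * 2.0 * np.pi / nth
    acc = 0.0
    for p in rho:
        inner = np.array([D1W(p, t) @ W(p, t) for t in th])
        acc += inner.mean()            # (1/2pi) * integral over theta
    return acc * r / nr                # integral over rho

def kappa_closed(r):
    return r**2 * (25 * r**4 - 48 * r**2 + 256) / 512
```

The two agree to quadrature accuracy over the range of interest.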
Finally, we plot the expressions (29) and (30) in figure 6(a) and 6(b), respectively. It can be
seen that frequencies and damping ratios change with the vibration amplitude (red lines),
but they are constant without taking the decoder (34) into account (blue lines).
The geometry of the reparametrisation is illustrated in figure 7.
Figure 6: The instantaneous frequency (a) and instantaneous damping (b) of system (33) together
with the decoder (34). The blue lines represent the naive calculation just from system (33)
and the red lines represent the corrected values (65) and (66).
Figure 7: Geometry of the decoder (34). The blue grid represents the polar coordinate system
with maximum radius r = 1. The red curves are the images of the blue concentric circles
under the decoder W. Each red dot corresponds to a phase kπ/5, k = 1, …, 10, that is, the
intersections of the blue radial lines with the blue circles mapped by W. Therefore the red
dots deviating from the radial blue lines indicate a phase shift. The green curves correspond
to the images of the blue concentric circles under the re-parametrised decoder W̃ for the
same parameters as the red curves. The amplitudes of the green curves are now the same as
the amplitude of the corresponding blue curves. On average there is no phase shift among
the green curves, that is the displacement of the black dots from the blue radial lines is zero
on average.
4 ROM identification procedures
Here we describe our methodology of finding invariant foliations, manifolds and autoencoders.
These steps involve methods described so far and further methods from the appendices. We
start by finding an invariant foliation together with the invariant manifold that has the
same dynamics as the foliation.
F1. If the data is not from the full state space, especially when it is a scalar signal,
we use a state-space reconstruction technique as described in appendix B. At this
point we have data points xk , y k ∈ X, k = 1, . . . , N .
F2. To ensure that the solution of (7) converges to the desired foliation, we calculate
a linear estimate of the foliation. We only consider a small neighbourhood of the
fixed point, where the dynamics is nearly linear. Hence, we define the index set
Iρ = {k ∈ {1, . . . , N } : |xk | < ρ} with ρ sufficiently small, but large enough that
it encompasses enough data for linear parameter estimation. Then we create
a least-squares estimate [8] of the linearised system about the equilibrium by
calculating

K = Σ_{k∈I_ρ} |x_k|^{−2} x_k x_k^T,
L = Σ_{k∈I_ρ} |x_k|^{−2} y_k x_k^T,
A = L K^{−1},
where matrix A approximates the Jacobian DF (0). We calculate the left and
right invariant subspaces (E ? and E, respectively) of matrix A, by using real
Schur decomposition, that is A = QHQT , where Q is unitary and H is in an
upper Hessenberg form, which is zero below the first subdiagonal. The Schur
decomposition is calculated (or rearranged) such that the first ν column vectors,
q 1 , . . . , q ν of Q span the required right invariant subspace E and correspond to
eigenvalues μ_1, …, μ_ν. Therefore we define W_1 = [q_1, …, q_ν]. We repeat the
Schur decomposition for A^T = Q̂ Ĥ Q̂^T, where Q̂ = [q̂_1, …, q̂_n], with the same
order of eigenvalues in the diagonal blocks of H and Ĥ. This gives us the initial
guess DU(0) ≈ U_1 = [q̂_1, …, q̂_ν]^T, which spans the left invariant subspace E⋆.
The initial guess for the map S is such that DS(0) ≈ S_1 = U_1 A U_1^T. Finally,
we rearrange the decomposition A^T = Q̃ H̃ Q̃^T such that the selected eigenvalues
μ_1, …, μ_ν now appear last in the diagonal of H̃, and we set U_⊥1 = [q̃_1, …, q̃_{n−ν}]^T
and B_1 = U_⊥1 A U_⊥1^T. This provides the initial guesses U_⊥ ≈ U_⊥1 and B ≈ B_1
in optimisation (20).
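Step F2 can be sketched on synthetic, noise-free linear data; the 2×2 matrix is an assumption, and scipy's sorted real Schur decomposition stands in for the rearrangement described above (here ν = 1 and the "slow" eigenvalues are selected by |μ| > 0.7).

```python
import numpy as np
from scipy.linalg import schur

# Hypothetical linear data y_k = A_true x_k
A_true = np.array([[0.9, 0.1],
                   [0.0, 0.5]])
rng = np.random.default_rng(2)
xs = rng.normal(size=(300, 2))
ys = xs @ A_true.T

# weighted least-squares estimate A = L K^{-1} of the Jacobian
w = np.sum(xs ** 2, axis=1) ** -1                # |x_k|^{-2}
K = (w[:, None, None] * np.einsum('ki,kj->kij', xs, xs)).sum(axis=0)
L = (w[:, None, None] * np.einsum('ki,kj->kij', ys, xs)).sum(axis=0)
A = L @ np.linalg.inv(K)

# sorted real Schur form: eigenvalues with |mu| > 0.7 come first,
# so the leading columns of Q span the slow right subspace E
T, Q, nu = schur(A, output='real', sort=lambda re, im: re * re + im * im > 0.49)
W1 = Q[:, :nu]            # basis of E
S1 = W1.T @ A @ W1        # initial guess for DS(0)
```

The same construction applied to A^T gives the left subspace E⋆ and the initial encoder guess U_1.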
F3. We solve problem (7) with the initial guess DU(0) = U_1 and DS(0) = S_1 as
calculated in the previous step. We also prescribe the constraints that
DU(0) (DU(0))^T = I and that U(W_1 z) is linear in z. The computational
representation of the encoder U is described in appendix A.3.
F4. We perform a normal form transformation on map S and transform U into the
new coordinates. The normal form S n satisfies the invariance equation U n ◦ S =
S n ◦ U n (c.f. equation (2)), where U n : Z → Z is a nonlinear map. We then
replace U with U n ◦ U and S with S n as our reduced order model. This step
is optional and only required if we want to calculate instantaneous frequencies
and damping ratios. In a two dimensional coordinate system, where we have a
complex conjugate pair of eigenvalues μ_1 = μ̄_2, the real-valued normal form
leads to the polar form (21) with R(r) = r √(f_r²(r²) + f_i²(r²)) and
T(r) = tan^{−1}( f_i(r²) / f_r(r²) ). This normal form calculation is described in [49].
F5. To find the invariant manifold, we use the approximate foliation (15) and solve
the optimisation problem (20) to find the decoder W of invariant manifold M.
The initial guess in problem (20) is such that U_⊥ = U_⊥1 and B = B_1. We also
need to set the κ parameter, which is assumed to be κ = 0.2 throughout the paper.
We have found that results are only sensitive to the value of κ if extreme
values are chosen, such as 0 or ∞.
The procedure for the Koopman eigenfunction calculation is the same as steps F1-F6, except
that S is assumed to be linear. To identify an autoencoder, we use the same setup as in
[16]. The computational representation of the autoencoder is described in appendix A.2. We
carry out the following steps
5 Examples
We are considering three examples. The first is a caricature model to illustrate all the
techniques discussed in this paper and why certain techniques fail. The second example is a
series of synthetic data sets with higher dimensionality, to illustrate the methods in more
detail using a polynomial representation of the encoder with HT tensor coefficients (see
appendix A.3). This example also illustrates two different methods to reconstruct the state
space of the system from a scalar measurement. The final example is a physical experiment
of a jointed beam, where only a scalar signal is recorded and we need to reconstruct the state
space with our previously tested technique.
M_{4/5} = {V(0, z), z ∈ R},

L_{4/5, z} = {V(z, ẑ), ẑ ∈ R}.  (37)
model to.
We first attempt to fit an autoencoder to the data. We assume that the encoder, decoder
and the nonlinear map S are
U(x_1, x_2) = x_1,  (38)
W(z) = (z, h(z))^T,  (39)
S(z) = λ(z),  (40)
where λ, h are order-5 polynomials. Our expressions already contain the invariant subspace
E = span{(1, 0)^T}, which should make the fitting easier. Finally, we solve optimisation problem
(8). The result of the fitting can be seen in figure 8(a), depicted by the red curve. The
fitted curve depends more on the distribution of the data than on the actual position of the
invariant manifold, which is represented by the blue dashed line in figure 8(a). Various other
expressions for U and W that do not assume the direction of the invariant
subspace E were also tried, with similar results.
To calculate the invariant foliation in the horizontal direction, we assume that

U(x_1, x_2) = x_1 + u(x_1, x_2),   S(z) = λz,  (41)
where u is an order-5 polynomial which lacks the constant and linear terms. The exact
expression of U is not a polynomial, because it is the second coordinate of the inverse of
function V . The fitting is carried out by solving the optimisation problem (7). The result
can be seen in figure 8(b), where the red curves are contour plots of the identified encoder U
and the dashed blue lines are the leaves as defined by equation (36). Figure 8(c) is produced
in the same way as 8(b), except that the encoder is defined as U (x1 , x2 ) = x2 + h (x1 , x2 )
and the blue lines are the leaves given by (37).
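The fit itself can be reproduced in a few lines; since the map (35) is not restated here, the sketch below substitutes a stand-in map F whose exact foliation is known (U(x) = x_1 with λ = 0.8), and minimises (7) over λ and the quadratic coefficients of u.

```python
import numpy as np
from scipy.optimize import minimize

# Stand-in for map (35): its horizontal foliation is exactly
# U(x) = x1 with S(z) = 0.8 z, so the fit should recover
# lambda = 0.8 and (near) zero quadratic coefficients.
def F(x):
    return np.array([0.8 * x[0], 0.5 * x[1] + x[0] ** 2])

rng = np.random.default_rng(3)
xs = rng.uniform(-0.5, 0.5, size=(300, 2))
ys = np.array([F(x) for x in xs])

def U(p, x):   # ansatz (41) truncated at quadratic order
    return x[0] + p[1] * x[0] ** 2 + p[2] * x[0] * x[1] + p[3] * x[1] ** 2

def loss(p):   # objective (7) with S(z) = p[0] * z
    Ux = np.array([U(p, x) for x in xs])
    Uy = np.array([U(p, y) for y in ys])
    return np.sum((p[0] * Ux - Uy) ** 2 / np.sum(xs ** 2, axis=1))

p_opt = minimize(loss, x0=np.array([0.5, 0.0, 0.0, 0.0])).x
```

In the paper, the polynomial order is higher and the encoder is represented as described in appendix A.3, but the structure of the optimisation problem is the same.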
As we have discussed in section 2.2, an approximate encoder can also be constructed from
a decoder. In the expression of the encoder (15) we take
W 0 (z) = h (z)
and U ⊥ = (0, 1), where h is an order-9 polynomial without constant and linear terms. The
expressions for U and S were already found as (41), hence our approximate encoder becomes
We solve optimisation problem (20) with κ = 0.13. We do not reconstruct the decoder W ,
as it is straightforward to plot the level surfaces of Û directly. The result can be seen in
figure 8(d), where the green line is the approximate invariant manifold (the zero level surface
of Û ) and the red lines are other level surfaces of Û .
In conclusion, this simple example shows that only invariant foliations can be fitted to
data, even if only approximate. Autoencoders give spurious results.
Figure 8: Identifying invariant objects in equation (35). The data contains 500 trajectories
of length 30 with initial conditions picked from a uniform distribution; a) fitting an autoen-
coder (red continuous curve) does not reproduce the invariant manifold (blue dashed curve),
instead it follows the distribution of data; (b) invariant foliation in the horizontal direction;
c) invariant foliation in the vertical direction; d) an invariant manifold is calculated as the
leaf of a locally accurate invariant foliation. The green line is the invariant manifold. Each
diagram represents the box [−5/4, 5/4] × [−1/4, 1/4]; the axes labels are intentionally hidden.
ṙ_1 = −(1/500) r_1 + (1/100) r_1³ − (1/10) r_1⁵,   θ̇_1 = 1 + (1/4) r_1² − (3/10) r_1⁴,
ṙ_2 = −(e/500) r_2 − (1/10) r_2⁵,   θ̇_2 = e + (3/20) r_2² − (1/5) r_2⁴,
ṙ_3 = −(√30/500) r_3 + (1/100) r_3³ − (1/10) r_3⁵,   θ̇_3 = √30 + (9/50) r_3² − (19/100) r_3⁴,  (42)
ṙ_4 = −(π²/500) r_4 + (1/100) r_4³ − (1/10) r_4⁵,   θ̇_4 = π² + (4/25) r_4² − (17/100) r_4⁴,
ṙ_5 = −(13/500) r_5 + (1/100) r_5³,   θ̇_5 = 13 + (4/25) r_5² − (9/50) r_5⁴.
The first transformation brings the polar form of equation (42) into Cartesian coordinates
using the transformation y = g (x), which is defined by y2k−1 = rk cos θk and y2k = rk sin θk .
Finally, we couple all variables using the second nonlinear transformation y = h (z), which
reads
y_1 = z_1 + z_3 − (1/12) z_3 z_5,   y_2 = z_2 − z_3,
y_3 = z_3 + z_5 − (1/12) z_5 z_7,   y_4 = z_4 − z_5,
y_5 = z_5 + z_7 + (1/12) z_7 z_9,   y_6 = z_6 − z_7,  (43)
y_7 = z_7 + z_9 − (1/12) z_1 z_9,   y_8 = z_8 − z_9,
y_9 = z_9 + z_1 − (1/12) z_3 z_1,   y_10 = z_10 − z_1,
and where y = (y1 , . . . , y10 ) and z = (z1 , . . . , z10 ). The two transformations give us the
differential equation ż = f (z), where
The natural frequencies of our system at the origin are ω_1 = 1, ω_2 = e, ω_3 = √30, ω_4 = π², ω_5 = 13.
[Figure 9 panels: amplitude plotted against data density, frequency [rad/s] and damping ratio [−]; legend labels ST-1–ST-4, PCA-1–PCA-4 and VF.]
Figure 9: Instantaneous frequencies and damping ratios of differential equation (44) using
invariant foliations. The first column shows the data density with respect to the scalar
value ‖U(x_k)‖, the second column is the instantaneous frequency and the third column is
the instantaneous damping. The first row corresponds to state space data, the second row
shows PCA reconstructed data and the third row is DFT reconstructed data. The data
density is controlled by sampling initial conditions from different sized neighbourhoods of
the origin. A too small neighbourhood results in inaccurate high-amplitude predictions; a too
large neighbourhood invalidates assumptions about embedding dimensions. The amplitude
A(r) is calculated using formula (24), which is a root mean square calculation.
[Figure 10: panels (a) frequency [rad/s] and (b) damping ratio [-] against amplitude; legend ST-1–ST-4 and VF.]
Figure 10: ROM by Koopman eigenfunctions. The same quantities are calculated
as in figure 9, except that map S is assumed to be linear. There is some variation in the
frequencies and damping ratios with respect to the amplitude due to the corrections in section
3, but accurate values could not be recovered as the linear approximation does not account
for near internal resonance, as in formula (12).
More data with high amplitudes further increases inaccuracies, as can be seen for the curves
labelled DFT-2, DFT-3, DFT-4. The author has also tried non-optimal delay embedding,
with inferior results. None of the techniques had any problem with the less challenging
Shaw-Pierre example [47, 49] (data not shown).
When restricting map S to be linear, we are identifying Koopman eigenfunctions. Even
though the identified dynamics is linear, we should be able to reproduce the nonlinearities, as
illustrated in section 3. However, we also have near internal resonances as per equation (12),
which make certain terms of the encoder U large and therefore difficult to find by optimisation.
The result can be seen in figure 10. The identified frequencies and damping ratios
show little variation with amplitude and mostly capture the average of the reference values.
Knowing that autoencoders are only useful if all the dynamics is on the manifold, we have
synthetically created data consisting of trajectories with initial conditions on the invariant
manifold of the first natural frequency. We used 800 trajectories of 24 points each, with time-
step ∆t = 0.2, starting on the manifold. Fitting an autoencoder to this data yields a good
match in figure 11; the corresponding lines are labelled MAN. Then we tried dataset ST-1,
which matched the reference best when calculating an invariant foliation. However, our data
does not lie on a manifold and it is impossible to make W ◦ U close to the identity on our
data. The results in figure 11 show this: the frequency and damping curves are far from
the reference.
[Figure 11: panels (a) frequency [rad/s] and (b) damping ratio [-] against amplitude; legend MAN, ST-1, VF-1 and VF-2.]
Figure 11: Data analysis by autoencoder. An autoencoder can recover system dynamics if
all data is on the invariant manifold. For the solid line MAN, data was artificially forced to
be on an a-priori calculated invariant manifold. However, if the data is not on an invariant
manifold, such as for data set ST-1, the autoencoder calculation is meaningless. The dotted
line VF-1 represents the analytic calculation for the first natural frequency, the dash-dotted
VF-2 depicts the analytic calculation for the second natural frequency of vector field (44).
Figure 12: (a) Experimental setup. (b) Schematic of the jointed beam. The width of the
beam (not shown) is 25mm. All measurements are in millimetres.
[Figure 13: panels (a)–(c) with amplitude on the vertical axes; legend 0 Nm, 1.0 Nm, 2.1 Nm and 3.1 Nm.]
Figure 13: Instantaneous frequencies and damping ratios of the jointed beam setup in figure
12. Solid red lines correspond to the invariant foliation of the system with minimal tightening
torque, blue dashed lines with tightening torque of 2.1 Nm and green dotted lines with
tightening torque of 3.1 Nm. The markers of the same colour show results calculated in [53]
using curve fitting. (a) data density with respect to amplitude ‖U (xk )‖, (b) instantaneous
frequency, (c) instantaneous damping ratio.
3.1 Nm. The free vibration of the steel beam was recorded using an accelerometer placed at
the end of the beam. The vibration was initiated using an impact hammer at the position of
the accelerometer. Calibration data for the accelerometer is not available. For each torque
value a number of 20-second-long signals were recorded. The impacts were of different
magnitudes so that the amplitude dependency of the dynamics could be tracked. In [53], a
linear model was fitted to each signal and the first five vibration frequencies and damping
ratios were identified. These are represented by various markers in figure 13. In order to
make a connection between the peak impact force and the instantaneous amplitude we also
calculated the peak root mean square (RMS) amplitude for signals with the largest impact
force for each tightening torque and found that the average conversion factor between the
peak RMS amplitude and the peak impact force was 443; we divided the peak force by this
factor to plot the equivalent peak RMS amplitude in figure 13.
To calculate the invariant foliation we used a 12-dimensional PCA-reconstructed state
space, as described in appendix B.1. As opposed to our synthetic data, the signal could
be accurately reproduced in 12 dimensions and at the same time the data filled up a non-
negligible volume of the reconstructed state space. We chose κ = 0.2 in problem (20) when
finding the invariant manifold. The result can be seen in figure 13. Since we do not have
the ground truth for this system it is not possible to tell which method is more accurate.
However, we can argue that linear identification at high amplitudes should produce errors,
because of non-negligible nonlinearities, hence the invariant foliation should be more accurate
there.
We can also assess whether the measured signal is reproduced by the invariant foliation
(U , S) in figure 14. For this we apply the encoder U to our original signal xk , k = 1, . . . , N
and compare this signal to the one produced by the recursion z k+1 = S (z k ), where z 1 =
U (x1 ). In the case of our data series with 2.1 Nm tightening torque there was a small error
in the amplitude and the phase error accumulated only slowly over time. For the loosest possible
[Figure 14: panels (a)–(d) showing measurement, prediction and error curves for ‖z‖ and the coordinate z1 .]
Figure 14: Reconstruction error of the foliation. (a,b) Reconstructing the first signal of
the series of experimental data with 0 Nm tightening torque. (c,d) Reconstructing the first
signal of the series of experimental data with 2.1 Nm tightening torque. The compared values
are calculated through the encoder z k = U (xk ) and the reconstructed values are from the
iteration z k+1 = S (z k ). The amplitude error (in ‖z‖) is minimal; however, phase error
accumulates over time, hence direct comparison of a coordinate (z1 ) can look unfavourable.
tightening torque (0 Nm) there was a reasonably low amplitude error, but the phase error
accumulated to its maximum at about a second, from which point onwards there was no
further accumulation. Note that our fitting procedure does not accumulate phase errors; it
only minimises the prediction error for a single time step, the maximum of which over all
data points was in the range of 10⁻³ for all four tightening torques and occurred
at the largest amplitudes. Unfortunately, accumulating phase error is rarely shown in the
literature, where only short trajectories are compared, in contrast to the 28685 and 25149
time-steps that are displayed in figure 14.
6 Discussion
The main conclusion of this study is that only invariant foliations are suitable for ROM identi-
fication from off-line data. Using an invariant foliation avoids the need to use resonance decay
[22], or waiting for the signal to settle near the most attracting invariant manifold, thereby
throwing away valuable data. Invariant foliations can make use of unstructured data
with arbitrary initial conditions, such as impact hammer tests. Invariant foliations produce
genuine reduced order models and not only parametrise a-priori known invariant manifolds,
like other methods [16, 17, 58]. We have shown that the high-dimensional function required
to represent an encoder can be represented by polynomials with sparse tensor coefficients,
which significantly reduces the computational and memory costs. Sparse tensors are also
amenable to further analysis, such as singular value decomposition [25], which gives way to
mathematical interpretations of the encoder U . The low-dimensional map S is also amenable
to normal form transformations, which can be used to extract information such as instanta-
neous frequencies and damping ratios.
We have tested the related concept of Koopman eigenfunctions, which differs from an
invariant foliation in that map S is assumed to be linear. If there are no internal reso-
nances, Koopman eigenfunctions are theoretically equivalent to invariant foliations. However
in numerical settings unresolved near internal resonances become important and therefore
Koopman eigenfunctions become inadequate. We have also tried to fit autoencoders to our
data [16], but apart from the artificial case where the invariant manifold was pre-computed,
they performed even worse than Koopman eigenfunctions.
Fitting an invariant foliation to data is extremely robust when state space data is avail-
able. However, when the state space needed to be reproduced from a scalar signal, the results
were not as accurate as we had hoped. While Takens' theorem allows for any generic delay
coordinates, in practice a non-optimal choice can lead to poor results. We were expecting
that the embedding dimension, at least for low amplitude signals, would be the same as the
attractor dimension. This is however not true if the data also includes higher amplitude points.
Despite not being theoretically optimal, we have found that perfect reproducing filter banks
produce accurate results for low amplitude signals and at the same time provide a state-space
reconstruction with the same dimensionality as that of the attractor. For our experimental
data, optimal PCA-based state-space reconstruction was appropriate, as the underlying
system was a continuum and therefore the signal was not on a submanifold for any choice of
dimensionality of the reconstructed state-space. Future work should include exploring var-
ious state-space reconstruction techniques in combination with fitting an invariant foliation
to data.
We did not fully explore the idea of locally accurate invariant foliations in remark 15,
which can lead to computationally efficient methods. Further research can also be directed
towards cases where the data is on a high-dimensional submanifold of an even higher
dimensional vector space X. This is where an autoencoder and invariant foliation may be
combined.
A Implementation details
In this appendix we deal with how to represent invariant foliations, locally accurate invariant
foliations and autoencoders. We also describe our specific techniques to carry out optimisa-
tion on these representations.
Mc,d = {m ∈ Nν : c ≤ m1 + · · · + mν ≤ d} .
The ordering of Mc,d is such that m < n if there exists k ≤ ν and mj < nj for all j =
1, . . . , k. The cardinality of set Mc,d is denoted by # (ν, c, d). Therefore we can also write
that Mc,d = {m1 , m2 , . . . , m#(ν,c,d) }. Using the ordered notation of monomials, a polynomial
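As an illustration (not taken from any particular package), the monomial set Mc,d can be enumerated and its cardinality # (ν, c, d) cross-checked against the closed-form count C(d+ν, ν) − C(c−1+ν, ν):

```python
from itertools import product
from math import comb

def monomial_set(nu, c, d):
    """All multi-indices m in N^nu with c <= m1 + ... + m_nu <= d."""
    return [m for m in product(range(d + 1), repeat=nu) if c <= sum(m) <= d]

def cardinality(nu, c, d):
    # count of multi-indices with |m| <= d, minus those with |m| <= c-1
    return comb(d + nu, nu) - comb(c - 1 + nu, nu)
```

For example, with ν = 2, c = 2, d = 3 there are seven monomials: three of total degree two and four of total degree three.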
U ᵀ U = I ,
U ᵀ W = 0 ,   (45)
which turns the admissible set of matrices U , W into a matrix manifold. We call this
manifold the orthogonal autoencoder manifold and denote it by OAE n,ν,#(ν,2,d) . In paper [16],
the authors also consider the case when the decoder is decoupled from the encoder. In this
case W (z) does not include the orthogonal matrix U and therefore W (z) is represented by
matrix W ∈ Rn×#(ν,1,d) , which also includes linear terms. The constraint on matrices U , W
is therefore different
U ᵀ U = I ,
U ᵀ W = M ,   (46)
where matrix M represents the identity polynomial with respect to the monomials M1,d .
We call the resulting matrix manifold the generalised orthogonal autoencoder manifold and
denote it by GOAE M ,n,ν,#(ν,1,d) . Unfortunately, this generalised autoencoder leads to an ill-
defined optimisation problem. Indeed, if all the data is on an invariant manifold (the only
sensible case), then the directions encoded by U are not defined by the data, because any
transversal directions are equally suitable for projecting the data. In our numerical tests U
quickly degenerated and led to spurious results. Nevertheless, due to the similar formulation
we treat both cases at the same time, because OAE n,m,l is a particular case of GOAE M ,n,m,l
with M being equal to zero. The details of how projections and retractions are calculated
for GOAE M ,n,m,l are presented in section A.5.2.
free lunch [57]). In addition, the approximation can overfit the data, that is, the accuracy on
unseen data can be significantly worse than on the training data. In terms of the geometry
of neural networks, they do not form a differentiable manifold and therefore parameters may
tend to infinity without minimising the loss function [44].
To represent encoders we still use polynomials, except that the polynomial coefficients
are represented by sparse tensors in the hierarchical Tucker (HT) format. This tensor format
was introduced in [27] and demonstrated to solve many problems [26] that would normally
wrestle with the 'curse of dimensionality' [3]. In chemistry and physics a somewhat different
but related format, the matrix product state [43], is frequently used, and in other problems
a particular type of HT representation, the tensor train format [41], is used. However, the
most important property of HT tensors is that they form a smooth quotient manifold and
are therefore suitable for use in optimisation problems [54].
To define the format, we partly follow the notation of [54] in our definitions. First, we
need to define the dimension tree.
Definition 20. Given an order d, a dimension tree T is a non-trivial rooted binary tree
whose nodes can be labelled by subsets of {1, 2, . . . , d} such that
1. the root has the label tr = {1, 2, . . . , d}, and
2. every node t ∈ T which is not a leaf has two descendants, t1 and t2 , that form an
ordered partition of t, that is, t = t1 ∪ t2 with t1 ∩ t2 = ∅ and i < j for all i ∈ t1 , j ∈ t2 .
{Ut : t ∈ L} ∪ {Bt : t ∈ T \ L}
fully specifies a HT tensor. For each node, apart from the root node tr , one can apply a
transformation which does not change the resulting tensor. That is, for t ≠ tr define
Ũt = At Ut and B̃t = Bt ×2,3 (A⁻¹t1 ⊗ A⁻¹t2 ), where At ∈ Rkt ×kt are invertible matrices.
Without restricting generality, we can stipulate that the leaf nodes Ut and the matricisation of Bt with respect to the
first index are also orthogonal matrices, which means that transformations At must be uni-
tary to preserve orthogonality. This identity transformation defines an equivalence relation
between parametrisations of a HT tensor and therefore the HT format is a quotient manifold
as described in [7, chapter 7]. Using a quotient manifold would prevent us from optimising
for the matrices Ut , Bt individually, as we need to treat the whole HT tensor as a single
entity. In addition, we could not find any example in the literature, where a HT tensor was
treated as a quotient manifold. Instead, we simply assume a simpler structure, that is, Btr
is on a Euclidean manifold and the rest of the coefficients Bt and Ut are orthogonal matrices
and therefore elements of Stiefel manifolds, which is the running example in [7]. This way
we have the HT format as an element of a product manifold, where each coefficient matrix
remains independent of the others. It is also possible to calculate singular values of HT
tensors. Vanishing singular values indicate that the required quantity is well-approximated
and that the HT tensor can be simplified to include fewer terms. This further adds to our
ability to identify parsimonious models. In what follows, we use balanced dimension trees
and up to rank six matrices Ut , Bt for simplicity.
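A minimal sketch of a balanced dimension tree follows; the data layout is illustrative only, and in an actual HT implementation the coefficient matrices Ut and Bt would be attached to the leaf and interior nodes respectively.

```python
def dimension_tree(labels):
    """Balanced binary dimension tree over a list of mode labels.
    Each non-leaf node has two children whose labels form an ordered
    partition of the node's label set."""
    labels = list(labels)
    node = {"label": labels}
    if len(labels) > 1:
        mid = (len(labels) + 1) // 2          # split as evenly as possible
        node["children"] = (dimension_tree(labels[:mid]),
                            dimension_tree(labels[mid:]))
    return node
```

For d = 4 the root {1, 2, 3, 4} splits into {1, 2} and {3, 4}, which in turn split into singleton leaves, matching Definition 20.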
Using notation (47), we can write the encoders as

U (x) = U 1 x + ∑_{d=2}^{p} ∑_{i1 ,...,id =1}^{n} U_{tr}^{d} (·; i1 , . . . , id ) xi1 · · · xid ,   (48)
To identify locally accurate foliations we use the Riemannian trust region method and
for identifying autoencoders, we use the Riemannian BFGS quasi-Newton technique from
software package [4].
Projection like retractions are explained in detail in [2]. Using a retraction we can pull back
our result onto the manifold. The Hessian of F can be defined in terms of a second order
retraction R. If we assume that F̃ (p, X) = F (Rp (P p X)), then the Hessian is
D²F (p) = D₂²F̃ (p, 0) .
The expression of the Hessian can be simplified in many ways, which is described, e.g., in [7].
In what follows we detail two matrix manifolds that do not appear in the literature and
are specific to our problems. We also use the Stiefel manifold of orthogonal matrices that is
covered in many publications [7]. In particular, we use the so-called polar retraction of the
Stiefel manifold, which is equivalent to equation (49). For our specific matrix manifolds we
solve (49).
Consider the set of orthogonal matrices in Rn×m whose range is in the orthogonal complement of the range
of a fixed orthogonal matrix Π. We find that this set is also a manifold, which we call the
restricted Stiefel manifold and denote by RStΠ,n,m .
Dh (p) X = (A + Aᵀ , B) .

This calculation shows that Dh (p) has full rank and therefore RStΠ,n,m is an embedded
manifold.
Dh (p) V = (A + Aᵀ , B) = (0, 0) .   (50)
⟨X, Y ⟩ = ⟨pA + p⊥ C, pK + ΠL + p⊥ M ⟩ = trace (Aᵀ K + Cᵀ M ) = 0   (51)

holds. To satisfy equation (51), K must be symmetric, but L can be arbitrary and M = 0.
Projection to tangent space. Now, we let X be an arbitrary matrix in the tangent space
of the ambient space Tp E. Its projection to Tp RStΠ,n,m is given by
Y = X − pK − ΠL,
33
where the unknown K is symmetric and L is arbitrary. To determine the unknowns K, L,
we solve the equation Dh (p) Y = 0, which is

Dh (p) (X − pK − ΠL) = (pᵀ X + Xᵀ p − K − Kᵀ , Πᵀ X − L) = (0, 0) .   (52)

The solution is

K = (pᵀ X + Xᵀ p)/2 ,   L = Πᵀ X,

hence the projection becomes

P p X = (I − ΠΠᵀ) X − p (pᵀ X + Xᵀ p)/2 .
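For a single-column p (the case m = 1) the symmetric part reduces to the scalar pᵀX, and the projection can be sketched in a few lines; the assertions below check that the result Y indeed satisfies pᵀY = 0 and ΠᵀY = 0.

```python
def dot(a, b):
    return sum(x*y for x, y in zip(a, b))

def project_tangent(p, Pi, X):
    """Tangent projection on the restricted Stiefel manifold, column case:
    P_p X = (I - Pi Pi^T) X - p (p^T X)."""
    pX, PiX = dot(p, X), dot(Pi, X)
    return [x - Pi_i*PiX - p_i*pX for x, Pi_i, p_i in zip(X, Pi, p)]
```

With p and Π orthonormal and orthogonal to each other, the projection simply removes the components of X along p and Π.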
Projective retraction. Similar to the polar retraction of the Stiefel manifold, we use the
orthogonal projection of an element of the ambient space onto the manifold as the retraction.
The orthogonal projection is found from a constrained optimisation problem, whose Lagrangian
g (p, α, β) has derivatives with respect to the multipliers

D2 g (p, α, β) = pᵀ p − I ,
D3 g (p, α, β) = Πᵀ p,

which must vanish at the stationary point. We also define σ = I + α + αᵀ , so we get the
equations

pσ + Πβ = q,   pᵀ p = I,   Πᵀ p = 0,
which bring us σ = pᵀ q, β = Πᵀ q and p = (I − ΠΠᵀ) q σ⁻¹ . Finally, substituting into the
constraint pᵀ p = I we find

q ᵀ q − q ᵀ ΠΠᵀ q = σ²

and therefore

P Stn,m (q) = p = (I − ΠΠᵀ) q (q ᵀ q − q ᵀ ΠΠᵀ q)^(−1/2) .
Now we can assume that q = p + X, where (p, X) ∈ T RStn,m , and define Rp (X) =
P Stn,m (p + X). As we know that Πᵀ (p + X) = 0, we can write

Rp (X) = (p + X) ((p + X)ᵀ (p + X))^(−1/2) = (p + X) (I + Xᵀ X)^(−1/2) .

This is the same retraction as for the Stiefel manifold, hence if q = p + X = U ΣV ᵀ is a
singular value decomposition, then we have Rp (X) = U V ᵀ .
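For a single column the polar retraction reduces to normalising p + X, because pᵀX = 0 implies (p + X)ᵀ(p + X) = 1 + XᵀX; a sketch under this assumption:

```python
import math

def polar_retraction(p, X):
    """R_p(X) = (p + X)(I + X^T X)^(-1/2); for one column this is
    normalisation of p + X."""
    q = [pi + xi for pi, xi in zip(p, X)]
    nrm = math.sqrt(sum(qi*qi for qi in q))
    return [qi / nrm for qi in q]
```

The result is again a unit vector, i.e. a point on the manifold, and it agrees with the singular-value-decomposition form UVᵀ in this special case.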
Tangent space. The tangent space T(p1 ,p2 ) M is defined as the null space of Dh (p1 , p2 ),
that is

T(p1 ,p2 ) M = {(X 1 , X 2 ) : Dh (p1 , p2 ) (X 1 , X 2 ) = (0, 0)} .

Writing X 1 = p1 A1 + p⊥1 B 1 and X 2 = p1 A2 + p⊥1 B 2 , the derivative is

Dh (p1 , p2 ) (X 1 , X 2 ) = (Aᵀ1 + A1 , B ᵀ1 p⊥ᵀ1 p2 + A2 ) .   (54)

For (54) to equal zero, A1 must be antisymmetric, B 1 , B 2 can be arbitrary and A2 =
−B ᵀ1 p⊥ᵀ1 p2 .

Normal space. We use X 1 = p1 A1 + p⊥1 B 1 , X 2 = −p1 B ᵀ1 p⊥ᵀ1 p2 + p⊥1 B 2 to represent
an arbitrary element of the ambient space. For (Y 1 , Y 2 ) to be a normal vector in N(p1 ,p2 ) M
the equation

⟨X 1 , Y 1 ⟩ + ⟨X 2 , Y 2 ⟩ = 0   (55)

must hold for all B 1 , B 2 and antisymmetric A1 . Writing Y 1 = p1 K 1 + p⊥1 L1 and
Y 2 = p1 K 2 + p⊥1 L2 , we expand (55) into

⟨A1 , K 1 ⟩ + ⟨B 1 , L1 ⟩ − ⟨B ᵀ1 p⊥ᵀ1 p2 , K 2 ⟩ + ⟨B 2 , L2 ⟩ = 0.   (56)

To explore what parameters are allowed, we consider (56) term-by-term. If we set B 1 = 0,
B 2 = 0, then what remains is ⟨A1 , K 1 ⟩ = 0 for all antisymmetric A1 . This constraint holds
if and only if K 1 is symmetric. Now we set A1 = 0, B 1 = 0, which leads to ⟨B 2 , L2 ⟩ = 0 for
all B 2 , hence we must have L2 = 0. Finally we set A1 = 0, B 2 = 0, which gives us

⟨B 1 , L1 ⟩ − ⟨B ᵀ1 p⊥ᵀ1 p2 , K 2 ⟩ = ⟨B 1 , L1 ⟩ − ⟨p2 , p⊥1 B 1 K 2 ⟩ = 0,

or in index notation

B1ij L1ij − p2ji p⊥1jk B1kl K2li = 0.   (57)

Now we differentiate (57) with respect to B 1 , which gives

L1rs − p2ji p⊥1jr K2si = 0  =⇒  L1 = p⊥ᵀ1 p2 K ᵀ2 .
The solution of equation (59) is

K 1 = (X ᵀ1 p1 + pᵀ1 X 1 )/2 ,
K 2 = (X ᵀ1 p2 + pᵀ1 X 2 ) (I + pᵀ2 p⊥1 p⊥ᵀ1 p2 )⁻¹ .
Projective retraction. The retraction we calculate is the orthogonal projection onto the
manifold (49). This is a second order retraction according to [2]. In our case, the projection
is defined as

P M (q 1 , q 2 ) = arg min_{(p1 ,p2 )∈M} ½ ‖q 1 − p1 ‖² + ½ ‖q 2 − p2 ‖² ,

where we can assume that m < l and m < n. Using constrained optimisation we define the
auxiliary objective function

g (p1 , p2 , α, β) = ½ ‖q 1 − p1 ‖² + ⟨α, pᵀ1 p1 − I⟩ + ½ ‖q 2 − p2 ‖² + ⟨β, pᵀ1 p2 − M ⟩ ,

where α ∈ Rm×m , β ∈ Rm×l are Lagrange multipliers. We can also write the augmented cost
function in index notation, that is

g (p1 , p2 , α, β) = ½ q1ij q1ij − q1ij p1ij + ½ p1ij p1ij + αji p1kj p1ki − αii
 + ½ q2ij q2ij − q2ij p2ij + ½ p2ij p2ij + βji p1kj p2ki − βji Mji .
To find the stationary point of g, we take the derivatives

D1 g (p1 , p2 , α, β) = −q1rs + p1rs + αjs p1rj + αsi p1ri + βsi p2ri = −q 1 + p1 + p1 α + p1 αᵀ + p2 β ᵀ ,
D2 g (p1 , p2 , α, β) = −q2rs + p2rs + βjs p1rj = −q 2 + p2 + p1 β,

which must vanish. Let us define σ = I + α + αᵀ , hence the equations to solve become

p1 σ + p2 β ᵀ = q 1 ,
p2 + p1 β = q 2 ,
pᵀ1 p1 = I,
pᵀ1 p2 = M .
We can eliminate the unknown variables σ, β and p2 by solving the equations, that is

β = pᵀ1 q 2 − M ,
σ = pᵀ1 q 1 − M q ᵀ2 p1 + M M ᵀ ,
p2 = q 2 − p1 (pᵀ1 q 2 − M ) .
The equation for the remaining p1 then becomes

(p1 pᵀ1 − I) (q 1 − q 2 q ᵀ2 p1 + q 2 M ᵀ) = 0.   (60)
Equation (60) can be solved by a fixed-point iteration,

vec (p1^(k+1)) = (a^(k)ᵀ ⊗ I + I ⊗ q 2 q ᵀ2 )⁻¹ vec (q 1 + q 2 M ᵀ) ,
p1^(k+2) = p1^(k+1) (p1^(k+1)ᵀ p1^(k+1))^(−1/2) ,

where vec (·) denotes column-wise vectorisation, and which uses standard linear algebra
operations. Note that for a matrix b we have b (bᵀ b)^(−1/2) = uv ᵀ , where b = uΣv ᵀ is the
singular value decomposition. In practice, we set the retraction to

R(p̃1 ,p̃2 ) (X 1 , X 2 ) = (p1^(2t) , p2^(2t) ) ,

where q 1 = p̃1 + X 1 , q 2 = p̃2 + X 2 and t is some positive integer.
B State-space reconstruction
When the full state of the system cannot be measured, but it is still observable from the
output, we need to employ state-space reconstruction. We focus on the case of a real-valued
scalar signal zj ∈ R, j = 1, . . . , M , sampled with frequency fs and where j stands for time.
Takens' embedding theorem [51] states that if we know the box counting dimension n of our
attractor, the full state can almost always be observed if we create a new vector xp =
(zp , zp+τ1 , . . . , zp+τd−1 ) ∈ Rd , where d > 2n and τ1 , . . . , τd−1 are integer delays. The required
number of delays is a conservative estimate; in many cases n ≤ d < 2n + 1 is sufficient. The
general problem of how to select the number of delays d and the delays τ1 , . . . , τd−1 optimally
[42, 33] is a much researched subject.
Instead of selecting delays τ1 , . . . , τd−1 , we use linear combinations of all possible delay
embeddings, which is a better defined problem. We create vectors of length L,
up = (zps+1 , · · · , zps+L ) ∈ R^L ,   p = 0, 1, . . . , ⌊(M − L)/s⌋ ,

where s is the number of samples by which the window is shifted forward in time with
increasing p. Our reconstructed state variable xp ∈ Rd is then calculated by a linear
map T : R^L → R^d , in the form xp = T up , such that our scalar signal is returned by
zps+ℓ ≈ v · xp for some ℓ ∈ {1, . . . , L}. We assume that the signal has m dominant frequencies
fk , k = 1, . . . , m, with f1 being the lowest frequency. We define w = ⌊fs /f1 ⌋ as an approximate
period of the signal, where fs is the sampling frequency, and set L = 2w + 1. This makes
each up capture roughly two periods of oscillation of the lowest frequency f1 .
In what follows we use two approaches to construct the linear map T .
The first approach is based on PCA. Let V be the matrix whose columns are the vectors
up . Then calculate the singular value decomposition (or equivalently the eigenvalue
decomposition) of the symmetric matrix

V V ᵀ = HΣH ᵀ ,

where H is a unitary matrix and Σ is diagonal. Now let us take the columns of H
corresponding to the d largest elements of Σ; these columns then become the d rows of T . In
this case, we can only achieve an approximate reconstruction of the signal, by defining
vk = Tkℓ , where Tkℓ is the element of matrix T in the k-th row and ℓ-th column. The dot product
zps+ℓ ≈ v · xp approximately reproduces our signal. The quality of the approximation depends
on how small the discarded singular values in Σ are.
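A compact sketch of this construction (with power iteration standing in for the full singular value decomposition, and illustrative function names) builds the windows up and extracts the dominant row of T :

```python
import math

def windows(z, L, s):
    """Sliding windows u_p = (z_{ps+1}, ..., z_{ps+L})."""
    return [z[p*s : p*s + L] for p in range((len(z) - L)//s + 1)]

def dominant_direction(us, iters=100):
    """Leading eigenvector of V V^T (V has the windows as columns),
    computed by power iteration; this is the first row of T."""
    L = len(us[0])
    h = [1.0] * L
    for _ in range(iters):
        nh = [0.0] * L
        for u in us:                       # apply V V^T without forming it
            c = sum(ui*hi for ui, hi in zip(u, h))
            for i, ui in enumerate(u):
                nh[i] += c * ui
        nrm = math.sqrt(sum(x*x for x in nh))
        h = [x / nrm for x in nh]
    return h
```

For a constant signal all windows coincide, V Vᵀ has rank one, and the dominant direction is the normalised constant vector; for oscillatory signals several directions carry energy and more rows of T are needed.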
The second approach uses the discrete Fourier transform (DFT). Let K denote the matrix
of the DFT, and let cℓ be a delay filter such that

(2/(2w + 1)) cℓᵀ K up = zps+ℓ ,

that is, a filter that delays the signal by ℓ samples. We separate cℓ into components,

cℓ = v 1 + v 2 + · · · + v d ,   (62)

such that when summed over its components the input zps+ℓ is recovered. The question is how to
create an optimal decomposition (62). As we assumed that our signal has m frequencies, we
can further assume that we are dealing with m coupled nonlinear oscillators, and therefore
we can set d = 2m. We therefore divide the resolved frequencies of the DFT, which are
f̃k = k fs /(2w + 1), k = 0, . . . , w, into m bins, which are centred around the frequencies of the signal
f1 , . . . , fm . We set the boundaries of these bins as

b0 = 0,  b1 = (f1 + f2 )/2 ,  b2 = (f2 + f3 )/2 ,  · · · ,  bm−1 = (fm−1 + fm )/2 ,  bm = w fs /(2w + 1) ,

and for each bin labelled by k, we create an index set Ik containing the indices j of the
resolved frequencies that fall between bk−1 and bk , which ensures that all DFT frequencies are
taken into account without overlap, that is ∪k Ik = {0, 1, . . . , w} and Ip ∩ Iq = ∅ for p ≠ q.
This creates a decomposition (62) in the form of
v2k−1,2j = cℓ,2j if j ∈ Ik and 0 otherwise,   v2k,2j+1 = cℓ,2j+1 if j ∈ Ik and 0 otherwise,

with the exception of v 1 , for which we also set v1,1 = cℓ,1 to take into account the moving
average of the signal and to make sure that ∑k v k = cℓ . Here vk,l stands for the l-th element
of vector v k . Finally, we define the transformation matrix
H = (2/(2w + 1)) · [ ‖v 1 ‖⁻¹ v ᵀ1 ; . . . ; ‖v 2m ‖⁻¹ v ᵀ2m ] ,

where each row is normalised such that our separated signals have the same amplitude. The
newly created signal is then

xp = T up ,

where T = HK, and the original signal can be reproduced as

zps+ℓ = (‖v 1 ‖ · · · ‖v 2m ‖) xp .
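The binning step can be sketched as follows (the function name is illustrative): the resolved DFT frequencies f̃j = j fs /(2w + 1) are partitioned into the index sets Ik using the midpoint boundaries bk .

```python
def index_sets(freqs, fs, w):
    """Partition resolved DFT frequencies j*fs/(2w+1), j = 0..w, into one
    bin per signal frequency, using midpoints between the f_k as boundaries."""
    m = len(freqs)
    b = [0.0] + [(freqs[k] + freqs[k+1]) / 2 for k in range(m - 1)] \
        + [w * fs / (2*w + 1)]
    sets = [[] for _ in range(m)]
    for j in range(w + 1):
        f = j * fs / (2*w + 1)
        for k in range(m):
            if f <= b[k + 1]:          # first bin whose upper edge covers f
                sets[k].append(j)
                break
    return sets
```

By construction the sets are disjoint and their union covers {0, 1, . . . , w}, as required for the decomposition (62).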
C Proof of proposition 16
Proof of proposition 16. First we explore what happens if we introduce a new parametrisation
of the decoder W that replaces r by ρ (r) and θ by θ + γ (r), where ρ : R+ → R+ is an invertible
function with ρ (0) = 0 and γ : R+ → R with γ (0) = 0. This transformation creates a new
decoder W̃ of M in the form of

W̃ (r, θ) = W (ρ (r) , θ + γ (r)) .   (63)

Equation (63) is not the only re-parametrisation, but it is the only one that preserves the
structure of the polar invariance equation (21). After substituting the transformed decoder,
invariance equation (21) becomes

W̃ (R̃ (r) , θ + T̃ (r)) = F (W̃ (r, θ)) ,

where R̃ and T̃ denote the transformed reduced dynamics. In the new coordinates the
instantaneous frequency ω (r) and damping ratio ζ (r) become equations (65) and (66), where
we also divided by ε. We then need to find the value of γ (r + ε) that minimises J for
a fixed ε. A necessary condition for a local minimum of J is that the derivative J ′ = 0, that
is,
We can also remove the constant phase shifts and find the phase condition

⟨D1 W (ρ (r) , ·) ρ′ (r) + D2 W (ρ (r) , ·) γ ′ (r) , D2 W (ρ (r) , ·)⟩ = 0.   (67)

Note that phase conditions are commonly used in numerical continuation of periodic orbits
[5], for slightly different reasons.

The next quantity to fix is the amplitude constraint given by equation (25). In the new
parametrisation, equation (25) can be written as

‖W (ρ (r) , · + γ (r))‖² = r² .   (68)

Differentiating (68) with respect to r gives

⟨D1 W (ρ (r) , · + γ (r)) ρ′ (r) + D2 W (ρ (r) , · + γ (r)) γ ′ (r) , W (ρ (r) , · + γ (r))⟩ = r

and removing the unnecessary phase shifts yields

⟨D1 W (ρ (r) , ·) ρ′ (r) + D2 W (ρ (r) , ·) γ ′ (r) , W (ρ (r) , ·)⟩ = r.   (69)
We now have two differential equations (67) and (69) to solve for ρ and γ. Unfortunately,
equations (67) and (69) are ill-defined, hence we introduce new dependent variables η and δ
such that

ρ (η (r)) = r  =⇒  ρ′ (η (r)) = 1/η ′ (r) ,
γ (η (r)) = δ (r)  =⇒  γ ′ (η (r)) = δ ′ (r) /η ′ (r) ,

which brings us two new equations

⟨D1 W (r, ·) + D2 W (r, ·) δ ′ (r) , D2 W (r, ·)⟩ = 0,   (70)
⟨D1 W (r, ·) + D2 W (r, ·) δ ′ (r) , W (r, ·)⟩ = η (r) η ′ (r) .   (71)

To further regularise equations (70) and (71), we also define η = √(2κ). Differentiating the
new function we find that η ′ (r) = κ′ (r) /√(2κ (r)) = κ′ (r) /η (r), so that our equations (70)
and (71) can be rewritten in terms of κ and δ.
Finally, substituting our results into the expression of W̃ and equations (65), (66), the proof
is complete.

Proof of remark 18. In case of a vector field f : X → X and invariance equation

D1 W (r, θ) R (r) + D2 W (r, θ) T (r) = f (W (r, θ)) ,

the instantaneous frequency and damping ratio with respect to the coordinate system defined
by the decoder W are

ω (r) = T (r) ,   ζ (r) = −R (r) / (r ω (r)) ,

respectively. However, the coordinate system defined by W is nonlinear, which needs
correcting. As in the proof of proposition 16, we assume a coordinate transformation

W̃ (r, θ) = W (ρ (r) , θ + γ (r)) ,   (74)

and find that invariance equation (32) becomes

D1 W̃ (r, θ) R̃ (r) + D2 W̃ (r, θ) T̃ (r) = f (W̃ (r, θ)) .   (75)
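As a quick sanity check of these formulas, take the first oscillator of (42), for which R (r) = −r/500 + r³/100 − r⁵/10 and T (r) = 1 + r²/4 − 3r⁴/10; at vanishing amplitude the damping ratio then tends to (1/500)/ω1 = 0.002.

```python
def omega(r):
    """Instantaneous frequency omega(r) = T(r), first oscillator of (42)."""
    return 1 + r**2/4 - 3*r**4/10

def zeta(r):
    """Instantaneous damping ratio zeta(r) = -R(r) / (r * omega(r))."""
    R = -r/500 + r**3/100 - r**5/10
    return -R / (r * omega(r))
```

At finite amplitude the positive cubic term of R reduces the effective damping, so ζ (r) falls below its small-amplitude limit.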
Acknowledgement
I would like to thank Branislaw Titurus for supplying the data of the jointed beam. I would
also like to thank Sanuja Jayatilake, who helped me collect more data on the jointed beam,
which eventually was not needed in this study. Discussions with Alessandra Vizzaccaro,
David Barton and his research group provided great inspiration to complete this research.
A.V. has also commented on a draft of this manuscript.
References
[1] P. A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds.
Princeton University Press, 2009.
[2] P. A. Absil and J. Malick. Projection-like retractions on matrix manifolds. SIAM Journal
on Optimization, 22(1):135–158, 2012.
[5] W.-J. Beyn and V. Thümmler. Phase Conditions, Symmetries and PDE Continuation,
pages 301–330. Springer Netherlands, Dordrecht, 2007.
[6] S.A. Billings. Nonlinear System Identification: "NARMAX" Methods in the Time, Fre-
quency, and Spatio-Temporal Domains. Wiley, 2013.
[8] S. Boyd and L. Vandenberghe. Introduction to Applied Linear Algebra: Vectors, Matri-
ces, and Least Squares. Cambridge University Press, 2018.
[9] T. Breunung and G. Haller. Explicit backbone curves from spectral submanifolds of
forced-damped nonlinear mechanical systems. Proceedings of the Royal Society of London
A: Mathematical, Physical and Engineering Sciences, 474(2213), 2018.
[11] S. L. Brunton, J. H. Tu, I. Bright, and J. N. Kutz. Compressive sensing and low-rank
libraries for classification of bifurcation regimes in nonlinear dynamical systems. SIAM
Journal on Applied Dynamical Systems, 13(4):1716–1732, 2014.
[12] S.L. Brunton, J.L. Proctor, J.N. Kutz, and W. Bialek. Discovering governing equations
from data by sparse identification of nonlinear dynamical systems. Proceedings of the
National Academy of Sciences of the United States of America, 113(15):3932–3937, 2016.
[13] X. Cabré, E. Fontich, and R. de la Llave. The parameterization method for invariant
manifolds I: Manifolds associated to non-resonant subspaces. Indiana Univ. Math. J.,
52:283–328, 2003.
[14] X. Cabré, E. Fontich, and R. de la Llave. The parameterization method for invariant
manifolds III: Overview and applications. Journal of Differential Equations, 218(2):444–
515, 2005.
[15] M. Casdagli. Nonlinear prediction of chaotic time series. Physica D: Nonlinear Phenom-
ena, 35(3):335–356, 1989.
[16] M. Cenedese, J. Axås, B. Bäuerlein, K. Avila, and G. Haller. Data-driven modeling
and prediction of non-linearizable dynamics via spectral submanifolds. Nat Commun,
13(872), 2022.
[18] R. R. Coifman and S. Lafon. Diffusion maps. Applied and Computational Harmonic
Analysis, 21(1):5–30, 2006.
[19] A. R. Conn, N. I. M. Gould, and P. L. Toint. Trust Region Methods. MPS-SIAM Series
on Optimization. SIAM, 2000.
[22] D. A. Ehrhardt and M. S. Allen. Measurement of nonlinear normal modes using multi-
harmonic stepped force appropriation and free decay. Mechanical Systems and Signal
Processing, 76-77:612 – 633, 2016.
[23] D. Elbrächter, D. Perekrestenko, P. Grohs, and H. Bölcskei. Deep neural network ap-
proximation theory. IEEE Transactions on Information Theory, 67(5):2581–2623, 2021.
[24] N. Fenichel. Persistence and smoothness of invariant manifolds for flows. Indiana Univ.
Math. J., 21:193–226, 1972.
[26] L. Grasedyck, D. Kressner, and C. Tobler. A literature survey of low-rank tensor ap-
proximation techniques. GAMM-Mitteilungen, 36(1):53–78, 2013.
[27] W. Hackbusch and S. Kühn. A new scheme for the tensor representation. Journal of
Fourier Analysis and Applications, 15:706–722, 2009.
[28] G. Haller and S. Ponsioen. Nonlinear normal modes and spectral submanifolds: exis-
tence, uniqueness and use in model reduction. Nonlinear Dynamics, 86(3):1493–1534,
2016.
[29] R. Hermann and A. Krener. Nonlinear controllability and observability. IEEE Transac-
tions on Automatic Control, 22(5):728–740, 1977.
[31] M. Jin, W. Chen, M. R. W. Brake, and H. Song. Identification of instantaneous fre-
quency and damping from transient decay data. Journal of Vibration and Acoustics,
Transactions of the ASME, 142(5):051111, 2020.
[35] H. Blaine Lawson, Jr. Foliations. Bull. Amer. Math. Soc., 80:369–418, 1974.
[36] I. Mezić. Spectral properties of dynamical systems, model reduction and decompositions.
Nonlinear Dynamics, 41(1-3):309–325, 2005.
[37] I. Mezić. Koopman operator, geometry, and learning of dynamical systems. Notices of
the American Mathematical Society, 68(7):1087–1105, 2021.
[39] G. B. Orr and K. R. Müller. Neural Networks: Tricks of the Trade. Lecture Notes in
Computer Science. Springer Berlin Heidelberg, 2003.
[44] P. Petersen, M. Raslan, and F. Voigtlaender. Topological properties of the set of func-
tions generated by neural networks of fixed size. Foundations of Computational Mathe-
matics, 21(2):375–444, 2021.
[46] N. K. Read and W. H. Ray. Application of nonlinear dynamic analysis to the identifi-
cation and control of nonlinear systems - III. n-dimensional systems. Journal of Process
Control, 8(1):35–46, 1998.
[47] S. W. Shaw and C. Pierre. Normal modes of vibration for nonlinear continuous systems.
J. Sound Vibr., 169(3):319–347, 1994.
[48] G. Strang and T. Nguyen. Wavelets and Filter Banks. Wellesley-Cambridge Press, 1996.
[49] R. Szalai. Invariant spectral foliations with applications to model order reduction and
synthesis. Nonlinear Dynamics, 101(4):2645–2669, 2020.
[50] R. Szalai, D. Ehrhardt, and G. Haller. Nonlinear model identification and spec-
tral submanifolds for multi-degree-of-freedom mechanical vibrations. Proc. R. Soc. A,
473:20160759, 2017.
[51] F. Takens. Detecting strange attractors in turbulence. In D. Rand and Lai-Sang Young,
editors, Dynamical Systems and Turbulence, Warwick 1980, pages 366–381. Springer
Berlin Heidelberg, 1981.
[57] D. H. Wolpert and W. G. Macready. No free lunch theorems for optimization. IEEE
Transactions on Evolutionary Computation, 1(1):67–82, 1997.
Conflict of Interest
The author has no relevant financial or non-financial interests to disclose. No funds, grants,
or other support was received.
Data availability
The datasets generated and/or analysed during the current study are available in the
FMA repository, https://github.com/rs1909/FMA.