Natural Intelligence

A blog by Sam Greydanus

Lagrangian Neural Networks


Mar 10, 2020 • Sam Greydanus, Miles Cranmer, and Stephan Hoyer

Accurate models of the world are built on notions of its underlying symmetries. In physics, these
symmetries correspond to conservation laws, such as for energy and momentum. But neural network
models struggle to learn these symmetries. To address this shortcoming, last year I introduced a class
of models called Hamiltonian Neural Networks (HNNs) that can learn these invariant quantities directly
from (pixel) data. In this project, some friends and I are going to introduce a complementary class
models called Lagrangian Neural Networks (LNNs). These models are able to learn Lagrangian
functions straight from data. They’re interesting because, like HNNs, they can learn exact conservation
laws, but unlike HNNs they don’t require canonical coordinates.

Figure 1: A Lagrangian Neural Network learns the Lagrangian of a double pendulum. In this post, we
introduce Lagrangian Neural Networks (LNNs). Like Hamiltonian Neural Networks, they can learn
arbitrary conservation laws. In some cases they are better since they do not require canonical
coordinates.

READ THE PAPER · RUN IN BROWSER · GET THE CODE

“A scientific poem”
Joseph-Louis Lagrange must have known that life is short. He was born into a family of eleven children, only two of whom survived to adulthood. Then he spent his adult years in Paris, living through the Reign
of Terror and losing some of his closest friends to the guillotine. Sometimes I wonder if these hardships
made him more sensitive to the world’s ephemeral beauty, and more determined to make the most of
his short time here.

Indeed, his path into research was notable for its passion and suddenness. Until the age of 17,
Lagrange was a normal youth who planned to become a lawyer and showed no particular interest in
mathematics. But all of that changed when he read an inspiring memoir by Edmond Halley and decided
to embark on an obsessive course of self-study in mathematics. A mere two years later he published
the principle of least action.
“I will deduce the complete mechanics of solid and fluid bodies using the principle of least action.”
– Joseph-Louis Lagrange, age 20

A French stamp commemorating Lagrange.

Lagrange’s work was notable for its purity and beauty, especially in contrast to the chaotic and broken
times that he lived through. Expressing admiration for the principle of least action, William Rowan Hamilton
once called it “a scientific poem”. In the following sections, I’ll introduce you to this “scientific poem” and
then use it to derive Lagrangian Neural Networks.

The Principle of Least Action


The Action. Start with any physical system that has coordinates $x_t = (q, \dot q)$. For example, we might
describe a double pendulum using the angles of its arms and their respective angular velocities. Now,
one simple observation is that these coordinates must start in one state $x_0$ and end up in another, $x_1$. There are many paths that these coordinates might take as they pass from $x_0$ to $x_1$, and we can associate each of these paths with a scalar value $S$ called "the action." Lagrangian mechanics tells us that the action is related to kinetic and potential energy, $T$ and $V$, by a functional

$$S = \int_{t_0}^{t_1} T(q_t, \dot q_t) - V(q_t, \dot q_t)\, dt. \tag{1}$$

At first glance, $S$ seems like an arbitrary combination of energies. But it has one remarkable property. It turns out that of all possible paths between $x_0$ and $x_1$, there is only one path that gives a stationary value of $S$. Moreover, that path is the one that nature always takes.
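To make this concrete, here is a small numerical check (my own illustration, not from the original post): for a free particle, the straight-line path between two endpoints yields a smaller action than a wiggly path with the same endpoints.

```python
import numpy as np

# Discretize time on [0, 1]
t = np.linspace(0.0, 1.0, 1001)
dt = t[1] - t[0]

def action(q, m=1.0):
    """S = integral of T - V dt for a free particle (V = 0, so S = integral of T dt)."""
    q_dot = np.gradient(q, dt)          # finite-difference velocity along the path
    return np.sum(0.5 * m * q_dot**2) * dt

straight = t                             # the true path from q(0)=0 to q(1)=1
wiggly = t + 0.1 * np.sin(np.pi * t)     # perturbed path with the same endpoints

print(action(straight))  # ~0.5
print(action(wiggly))    # larger: the perturbation increases S
```

The straight path gives $S \approx \tfrac{1}{2}$, and any sinusoidal perturbation with fixed endpoints only adds to it.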

Figure 3: Possible paths from $q_0$ to $q_1$, plotted in configuration space. The action is stationary ($\delta S = 0$) for small perturbations ($\delta q$) to the path that the system actually takes (red).

The Euler-Lagrange equation. In order to “deduce the complete mechanics of solid and fluid bodies,”
all Lagrange had to do was constrain every path to be a stationary point of $S$. The modern principle of least action looks very similar: we let $L \equiv T - V$ (this is called the Lagrangian), and then write the constraint as

$$\frac{d}{dt}\frac{\partial L}{\partial \dot q_j} = \frac{\partial L}{\partial q_j}.$$

Physicists call this constraint equation the Euler-Lagrange equation. When you first encounter it, the principle of least action can seem abstract and impractical. But it can be quite easy to apply in practice. Consider, for example, a single particle with mass $m$, position $q$, and potential energy $V(q)$:

$$L = -V(q) + \tfrac{1}{2} m \dot q^2 \qquad \text{write down the Lagrangian} \tag{2}$$

$$-\frac{\partial V(q)}{\partial q} = m \ddot q \qquad \text{apply the Euler-Lagrange equation to } L \tag{3}$$

$$F = ma \qquad \text{this is Newton's second law} \tag{4}$$

Nature’s cost function. As a physicist who now does machine learning, I can’t help but think of S as
Nature’s cost function. After all, it is a scalar quantity for which Nature finds a stationary point, usually a
minimum, in order to generate the dynamics of the entire universe. The analogy gets even more
interesting at small spatial scales, where quantum wavefunctions can be interpreted as Nature’s way of
exploring multiple paths that are all very close to the path of stationary action.¹

How we usually solve Lagrangians


Ever since Lagrange introduced the notion of stationary action, physicists have followed a simple
formula:

1. Find analytic expressions for kinetic and potential energy


2. Write down the Lagrangian
3. Apply the Euler-Lagrange constraint
4. Solve the resulting system of differential equations
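The four steps above can be worked end-to-end for a simple system. Here is a sketch (my own illustration, not from the paper) for a mass on a spring, using sympy's built-in Euler-Lagrange helper:

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.symbols('t')
m, k = sp.symbols('m k', positive=True)
q = sp.Function('q')

# Step 1: kinetic and potential energy of a mass on a spring
T = sp.Rational(1, 2) * m * q(t).diff(t)**2
V = sp.Rational(1, 2) * k * q(t)**2

# Step 2: write down the Lagrangian
L = T - V

# Step 3: apply the Euler-Lagrange constraint
eom = euler_equations(L, q(t), t)[0]  # -k*q(t) - m*q''(t) = 0

# Step 4: solve the resulting differential equation
sol = sp.dsolve(eom, q(t))  # simple harmonic motion at frequency sqrt(k/m)
```

For the double pendulum, step 4 is where this recipe breaks down: the equations of motion have no closed-form solution and must be integrated numerically.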

But these analytic solutions are rather crude approximations of the real world. An alternative approach
is to assume that the Lagrangian is an arbitrarily complicated function – a black box that does not
permit analytical solutions. When this is the case, we must give up all hope of writing the Lagrangian
out by hand. However, there is still a chance that we can parameterize it with a neural network and
learn it straight from data. That is the main contribution of our recent paper.

How to learn Lagrangians


The process of learning a Lagrangian differs from the traditional approach, but it also involves four
basic steps:

1. Obtain data from a physical system


2. Parameterize the Lagrangian with a neural network ($L \equiv L_\theta$).
3. Apply the Euler-Lagrange constraint
4. Backpropagate through the constraint to train a parametric model that approximates the true
Lagrangian

The first two steps are fairly straightforward, and we’ll see that automatic differentiation makes the
fourth pretty painless. So let’s focus on step 3: applying the Euler-Lagrange constraint. Our angle of
attack will be to write down the constraint equation, treat L as a differentiable blackbox function, and
see whether we can still obtain dynamics:
$$\frac{d}{dt}\frac{\partial L}{\partial \dot q_j} = \frac{\partial L}{\partial q_j} \qquad \text{the Euler-Lagrange equation} \tag{5}$$

$$\frac{d}{dt}\nabla_{\dot q} L = \nabla_q L \qquad \text{switch to vector notation} \tag{6}$$

$$(\nabla_{\dot q}\nabla_{\dot q}^\top L)\,\ddot q + (\nabla_q\nabla_{\dot q}^\top L)\,\dot q = \nabla_q L \qquad \text{expand the time derivative} \tag{7}$$

$$\ddot q = (\nabla_{\dot q}\nabla_{\dot q}^\top L)^{-1}\left[\nabla_q L - (\nabla_q\nabla_{\dot q}^\top L)\,\dot q\right] \qquad \text{matrix inverse to solve for } \ddot q \tag{8}$$

For a given set of coordinates $x_t = (q_t, \dot q_t)$, we now have a method for calculating $\dot x_t = (\dot q_t, \ddot q_t)$ from a blackbox Lagrangian. We can integrate this quantity to obtain the dynamics of the system. And in the same manner as Hamiltonian Neural Networks, we can learn $L_\theta$ by differentiating the MSE loss between $\dot x_t^{L_\theta}$ and $\dot x_t^{\text{true}}$.
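In code, steps 2 through 4 might look something like the following sketch. This is a minimal illustration of the idea under my own assumptions: the tiny two-layer network and the helper names (`accel`, `loss`) are hypothetical stand-ins, not the paper's actual model or training loop.

```python
import jax
import jax.numpy as jnp

def lagrangian(params, q, q_t):
    # A tiny MLP standing in for L_theta (hypothetical architecture)
    w1, b1, w2 = params
    h = jnp.tanh(w1 @ jnp.concatenate([q, q_t]) + b1)
    return jnp.sum(w2 @ h)

def accel(params, q, q_t):
    # Solve the Euler-Lagrange constraint for q-double-dot (Equation 8)
    L = lambda q, q_t: lagrangian(params, q, q_t)
    return jnp.linalg.pinv(jax.hessian(L, 1)(q, q_t)) @ (
        jax.grad(L, 0)(q, q_t)
        - jax.jacfwd(jax.grad(L, 1), 0)(q, q_t) @ q_t
    )

def loss(params, q, q_t, q_tt_true):
    # MSE between predicted and observed accelerations
    return jnp.mean((accel(params, q, q_t) - q_tt_true) ** 2)

grad_loss = jax.grad(loss)  # backpropagates through the constraint
```

The key point is the last line: because every operation above is differentiable, `jax.grad` pushes gradients through the Hessian inverse and back into the network's parameters.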

Implementation. If you look closely at Equation 8, you may notice that it involves both the Hessian and
the gradient of a neural network during the forward pass of the LNN. This is not a trivial operation, but
modern automatic differentiation makes things surprisingly smooth. Written in JAX, Equation 8 is just a
few lines of code:

q_tt = (
    jax.numpy.linalg.pinv(jax.hessian(lagrangian, 1)(q, q_t)) @ (  # inverse Hessian wrt q_t
        jax.grad(lagrangian, 0)(q, q_t)                            # gradient wrt q
        - jax.jacfwd(jax.grad(lagrangian, 1), 0)(q, q_t) @ q_t     # mixed-partials term
    )
)
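As a quick sanity check (my own, not from the paper), feeding this snippet the Lagrangian of a unit-mass harmonic oscillator, $L = \tfrac{1}{2}\dot q^2 - \tfrac{1}{2}q^2$, should return $\ddot q = -q$:

```python
import jax
import jax.numpy as jnp

def lagrangian(q, q_t):
    # Unit-mass, unit-stiffness harmonic oscillator: T - V
    return 0.5 * q_t @ q_t - 0.5 * q @ q

q, q_t = jnp.array([1.0]), jnp.array([0.0])
q_tt = (
    jax.numpy.linalg.pinv(jax.hessian(lagrangian, 1)(q, q_t)) @ (
        jax.grad(lagrangian, 0)(q, q_t)
        - jax.jacfwd(jax.grad(lagrangian, 1), 0)(q, q_t) @ q_t
    )
)
print(q_tt)  # [-1.], i.e. q_tt = -q, as Hooke's law demands
```

Here the Hessian is just the scalar mass, the mixed-partials term vanishes, and the gradient term recovers the spring force.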

Learning real Lagrangians


In our paper, we conduct several experiments to validate this approach. In the first, we show that
Lagrangian Neural Networks can learn the dynamics of a double pendulum.

Double pendulum. The double pendulum is a dynamics problem that regular neural networks struggle
to fit because they have no prior for conserving the total energy of the system. It is also a problem
where HNNs struggle, since the canonical coordinates of the system are not trivial to compute (see
equations 1 and 2 of this derivation for example). But in contrast to these baseline methods, Figure 4
shows that LNNs are able to learn the Lagrangian of a double pendulum.

Figure 4: Learning the dynamics of a double pendulum. Unlike the baseline neural network, our model learns to
approximately conserve the total energy of the system. This is a consequence of the strong physical inductive bias of the
Euler-Lagrange constraint.

It’s also interesting to compare qualitative results. In the video below, we use a baseline neural network
and an LNN to predict the dynamics of a double pendulum, starting from the same initial state. You’ll
notice that both trajectories seem reasonable until the end of the video, when the baseline model shifts
to states that have much lower total energies.
Figure 5: Dynamics predictions of a baseline model (left) versus an LNN (right).

Relativistic particle. Another system we considered was a particle of mass $m = 1$ moving at relativistic velocity through a potential $g$, with $c = 1$. The Lagrangian of the system is $L = \big((1 - \dot q^2)^{-1/2} - 1\big) + gq$, and it is interesting because existing Hamiltonian and Lagrangian learning approaches fail on it. HNNs fail because the canonical momenta of the system are hard to compute. Deep Lagrangian Networks² fail because they make restrictive assumptions about the form of the Lagrangian.

Figure 6: Learning the dynamics of a relativistic particle. In the first plot (a), an HNN model fails to
model the system because the default coordinates are non-canonical. In the second plot (b), we
provide the HNN with proper canonical coordinates and it succeeds. In the third plot (c), we show
that an LNN can fit the data even in the absence of canonical coordinates.

Related Work
Learning invariant quantities. This approach is similar in spirit to Hamiltonian Neural Networks
(HNNs) and Hamiltonian Generative Networks³ (HGNs). In fact, this blog post was written as a complement to the original HNN post, and it has the same fundamental motivations. Unlike these
previous works, our aim here is to learn a Lagrangian rather than a Hamiltonian so as not to restrict the
inputs to being canonical coordinates. It’s worth noting that once we learn a Lagrangian, we can always
use it to obtain the value of a Hamiltonian using the Legendre transformation.
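That Legendre transformation is itself a one-liner with autodiff. The sketch below is my own illustration (not code from the paper): it computes $H = p \cdot \dot q - L$ with $p = \nabla_{\dot q} L$, and checks it on a harmonic oscillator, where $H$ should equal the total energy $T + V$.

```python
import jax
import jax.numpy as jnp

def hamiltonian(lagrangian, q, q_t):
    # Legendre transform: H = p . q_t - L, where p = grad of L wrt q_t
    p = jax.grad(lagrangian, 1)(q, q_t)
    return p @ q_t - lagrangian(q, q_t)

# Unit-mass harmonic oscillator, L = T - V
L = lambda q, q_t: 0.5 * q_t @ q_t - 0.5 * q @ q
q, q_t = jnp.array([1.0]), jnp.array([2.0])
print(hamiltonian(L, q, q_t))  # 2.5 = T + V = 0.5*2**2 + 0.5*1**2
```

For Lagrangians that are quadratic in velocity this recovers $T + V$ exactly; in general it yields whatever conserved quantity the learned Lagrangian implies.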

Deep Lagrangian Networks (DeLaN, ICLR'19). Another closely related work is Deep Lagrangian Networks², in which the authors show how to learn specific types of Lagrangian systems. They assume
that the kinetic energy is an inner product of the velocity, which works well for rigid body dynamics such
as those in robotics. However, there are many physical systems that do not have this specific form.
Some simple examples include a charged particle in a magnetic field or a fast-moving object with
relativistic corrections. We see LNNs as a complement to DeLaNs in that they cover the cases where
DeLaNs struggle but are less amenable to robotics applications.

Closing Thoughts
The principle of stationary action is a unifying force in physics. It represents a consistent “law of the
universe” which holds true in every system humans have ever studied: from the very small¹ to the very
large, from the very slow to the very fast. Lagrangian Neural Networks represent a different sort of
unification. They aim to strengthen the connection between real-world data and the underlying physical
constraints that it obeys. This gives LNNs their own sort of beauty, a beauty that Lagrange himself may
have admired.

Footnotes
1. Here $e^{-S/h}$ is actually the probability of a particular path occurring. Because $h$ is small, we usually only observe the minimum value of $S$ on large scales. See Feynman lecture 19 for more on this.
2. Lutter, M., Ritter, C., and Peters, J. Deep Lagrangian Networks: Using physics as model prior for deep learning. International Conference on Learning Representations, 2019.
3. Toth, P., Rezende, D. J., Jaegle, A., Racanière, S., Botev, A., and Higgins, I. Hamiltonian Generative Networks. International Conference on Learning Representations, 2020.

11 Comments 1 Login

G Join the discussion…

LOG IN WITH OR SIGN UP WITH DISQUS ?

Name

 3 Share Best Newest Oldest

avinash bhashkar · 3 years ago

This approach of learning the Lagrangian is super cool for system dynamics. I would like to learn the dynamics of a leg; since I am using PyBullet for simulation, how should I approach this? Can I incorporate your JAX-based method?

batuhankoyuncu · 3 years ago (edited)

I am not sure about this, but I think there may be a mistake in the Colab notebook. In the code snippet below, I believe the indices should be 0 and 2 for q and q_dot. The current version plots the positions of the two masses of the double pendulum. Would you agree @Sam Ritchie?

plt.figure(figsize=[10,3], dpi=120) ; plt.xlim(0, 100)
plt.subplot(1,3,1)
plt.title("Analytic vs Perturbed ($\epsilon$={})".format(noise_coeff_1))
plt.xlabel("Time") ; plt.ylabel("State")
plt.plot(t, x_analytical[:, 0], 'g-', label='$q$')
plt.plot(t, x_analytical[:, 1], 'c-', label='$\dot q$')
plt.plot(t, x_perturbed_1[:, 0], 'g--', label='pert. $q$')
plt.plot(t, x_perturbed_1[:, 1], 'c--', label='pert. $\dot q$')
plt.legend(fontsize=6)

batuhankoyuncu > batuhankoyuncu · 3 years ago (edited)

You can see it through the following code snippet: define total_energy and check it through the time steps. For both x_analytical and x_autograd, total energy is conserved; however, the indices should be chosen as I mentioned above.

def total_energy(q, q_dot, m1, m2, l1, l2, g):
    # same as lagrangian except:
    return T + V

Mgf5 · 3 years ago

Congratulations on this approach. I have a simple question about it: is the code you provide in the example prepared for learning a Lagrangian completely from data?

Jeffrey Lai · 4 years ago

How do you arrive at eq. 7 from eq. 6 (the "expand the time derivative" part)?

Sam Ritchie > Jeffrey Lai · 4 years ago

That's the chain rule; remember, q and qdot are both functions of time, and the Lagrangian is a function of their outputs. So if you write L(t, q(t), \dot{q}(t)), then you have to use the multivariable chain rule. It's a little tougher to track here because of the j subscript, so you also have to expand across all of the components of the position and velocity.

Try it with a single component (i.e., q(t) and qdot(t) return a single value each) and then build up to the bigger version.

batuhankoyuncu > Sam Ritchie · 3 years ago (edited)

Hmm, I guess we need to write L(q(t), \dot{q}(t)), since we do not have the time derivative in the expanded version. I feel like it implies that L does not depend on time explicitly.

Sam Ritchie > batuhankoyuncu · 3 years ago

@batuhankoyuncu, you're definitely right! That assumption is baked into this derivation, since that term is indeed dropped (I confirmed by looking at the arguments to jax.grad(lagrangian, 0) in the post's code snippet).

In principle you can definitely have a time-dependent Lagrangian if, say, some potential changes in strength over time (and energy leaks out of the system).

greydanus (Mod) > Sam Ritchie · 4 years ago

Yes, thanks Sam. That's the right intuition.

Layne Sadler · 4 years ago

What a beautiful idea. The music of the spheres.

greydanus (Mod) > Layne Sadler · 4 years ago

Thanks Layne.
