
Experimental Demonstration of Coupled Learning in Elastic Networks

Lauren E. Altman,¹,∗ Menachem Stern,¹ Andrea J. Liu,¹,² and Douglas J. Durian¹,²

¹Department of Physics & Astronomy, University of Pennsylvania, Philadelphia, PA 19104-6396, USA
²Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY 10010, USA
(Dated: November 2, 2023)
Coupled learning is a contrastive scheme for tuning the properties of individual elements within a
network in order to achieve desired functionality of the system. It takes advantage of physics both
to learn using local rules and to “compute” the output response to input data, thus enabling the
system to perform decentralized computation without the need for a processor or external memory.
We demonstrate a proof-of-concept mechanical network that can learn simple tasks such as
self-symmetrizing via iterative tuning of individual spring rest lengths. These mechanical networks
could feasibly be scaled and automated to solve increasingly complex tasks, hinting at a new class
of “smart” metamaterials.

arXiv:2311.00170v1 [cond-mat.soft] 31 Oct 2023

I. INTRODUCTION

Learning is a physical process by which a system evolves to exhibit a desired behavior. Artificial neural networks have been very successful in training towards a wide array of tasks including regression and classification via global optimization algorithms such as gradient descent [1]. These algorithms generally rely on a central processor to use global information about the state of the entire network and compute gradients to determine how the system should evolve in the next training step. By contrast, physical and biological learning systems generally do not have access to such global information as they evolve, and so must instead rely on a decentralized learning scheme to achieve desired behaviors [2]. In such systems, learning is better described as an emergent phenomenon in a complex material. Decentralized, local learning as seen in biological systems has advantages over the computational approach [3], including power efficiency [4], scalability of processing speed with network size [5], and robustness to damage [6].

Recent efforts have been made to develop algorithms inspired by computational machine learning for implementation in physical systems [7], adopting some of the features exhibited in biological systems and using physics to propagate information throughout the system. Such approaches involve tuning the individual properties of a collective material using global information, e.g. by backpropagation [8]. Directed aging [9] is a learning algorithm in which a network minimizes its elastic energy in response to stress. Contrastive learning compares a network's internal state in response to different sets of boundary conditions [10, 11] and takes advantage of local rules to perform updates [12]. Equilibrium propagation is a class of algorithms which combines the locality of contrastive learning with the physics-informed update rules of directed aging [13–15]. Coupled learning similarly trains a network by applying boundary conditions, “nudging” the network into the desired functionality via local learning rules [16].

Despite the growing interest in learning behaviors in physical materials, laboratory realizations are still rare. Pruning bonds in a network is one approach to achieving some desired functionality [12, 17, 18], but this process is generally not reversible. Directed aging has been effectively demonstrated in a laboratory setting [9, 12, 19], but this algorithm does not minimize the true loss function and is therefore limited in its use cases. It is also possible to computationally simulate a network's training and print it out in its trained state [20], but there are good reasons to build a network with tunable components: a printed network that has been designed for a single task is static in nature, and could not be re-trained for alternative behavior after printing. Additionally, the distributed learning algorithm is agnostic to manufacturing errors, and a physical network could use this algorithm to re-learn a task after taking damage. Equilibrium propagation has been shown to successfully train an Ising machine [21]. Coupled learning has been demonstrated in the lab for electrical flow networks [22–24] to learn various tasks, including classification.

In comparison with flow networks, a mechanical network poses unique and complex challenges for construction. The degrees of freedom in a mechanical network are vectorial in nature, rather than scalar like the voltages at nodes of a circuit, which increases the dimensional complexity of the system. A mechanical network is spatially embedded, so the architecture and connectivity of the network must be carefully chosen for proper coordination number and to avoid collision of neighboring edges and frustration. Construction of a unit cell with adjustable learning degrees of freedom is of primary concern as well. Lee et al. [25] have constructed and trained a network of elastic components with tunable stiffnesses using voice coils, although their approach still relies on a central processor to compute updates globally. Tunable elastic components built from a combination spring-pendulum system can perform unsupervised pattern-finding tasks [26], but functionality in unsupervised training is limited to a relatively small subset of potential problems of interest

∗ laltman2@sas.upenn.edu

[27]. Arinze et al. used autonomous softening of hinges to train thin elastic sheets to fold in desired ways in response to force patterns [28] based on a local supervised learning rule that softens elements only when they fold correctly [29].

In this paper, we tune each edge's rest length, rather than its stiffness. Using rest length as the learning degree of freedom requires special consideration, as each iterative learning step will alter the equilibrium state of the network. Additionally, unlike bond stiffnesses, which are directly analogous to conductances in flow networks, the rest length has no direct analogy and is unique to mechanical systems. In our approach, a turnbuckle is connected to a Hookean spring at its end, and by adjusting the length of the turnbuckle, the effective rest length of the combination spring-turnbuckle edge is altered. Since updates are not automated but rather made manually, it is imperative to choose architectures that minimize the number of tunable edges while still exhibiting a diversity of tunable behaviors. Here we provide two laboratory demonstrations of experimental mechanical systems that can train using coupled learning with adjustable rest-length components, and we compare with simulation. Our systems can optimize for a generic task without the use of a computer, placing them among the first supervised local learning demonstrations in physical mechanical systems. The two minimal architectures presented in this study have been shown to effectively span the range of possible states to achieve the desired behavior quickly, smoothly, and effectively. Further, our experimental demonstrations exhibit successful learning behaviors that are not accounted for by simulation, thus lending credence to the robustness of our algorithm. The success of these initial experiments shows promise for scalable learning networks in more complex systems.

II. COUPLED LEARNING FOR SPRING-TURNBUCKLE SYSTEMS

We construct a mechanical network such as the one depicted in Fig. 1(a). Each ith edge of this network is an extension spring with the same stiffness k and rest length l connected to a rigid turnbuckle with adjustable length L_i, as drawn schematically in the inset of Fig. 1(a). The total node-node separation of the edge, including the length of the rigid part, is s_i, such that the spring's extension past its rest length is s_i − (l + L_i). The mechanical energy associated with this edge is given by Hooke's law as

    u_i = (1/2) k [s_i − (l + L_i)]².  (1)

The total mechanical energy in the network is the sum of the elastic energies of all edges,

    U = Σ_j u_j,  (2)

and serves as the “physical cost function” which the system automatically minimizes subject to the imposed boundary conditions.

We wish to train this network to achieve some desired behavior in the positions of its nodes. This is what is referred to as a “motion task.” We select certain nodes to be “inputs” and “outputs” for our task, depicted in Fig. 1(a) as blue and red filled circles, respectively. When the input node is at position x_I, we want the output node's position x_O to be at some desired value, x_D. We train for this behavior by iteratively adjusting the turnbuckle lengths at each edge according to the coupled learning algorithm [16]. Here, the positions of each node in the system serve as the “physical degrees of freedom,” which equilibrate so that there is no net force on each node, while the adjustable rest lengths act as the “learning degrees of freedom” to be adjusted during training.

In the free state, Fig. 1(a), we apply the input boundary condition to the network by fixing the position of the input node, x_I, and allowing all other physical degrees of freedom to equilibrate. The output node's position x_O is measured, along with the extensions of each edge in the network, s_i^F. In the clamped state, Fig. 1(b), we once again apply the input boundary condition, but now we also enforce a fixed value for the position of the output node by “nudging” it towards the desired position x_D. The clamped output value x_C generally takes the form

    x_C = x_F + η(x_D − x_F),  (3)

where the “nudge factor” 0 < η ≤ 1 is a hyperparameter of the training scheme. In all experiments below, we take a full nudge of η = 1 so the output node is clamped at the desired position at each learning step. The lengths s_i^C of each edge in this state are also recorded.

In coupled learning, the system evolves by comparing the mechanical energy of the network in the free and clamped states. Learning is achieved when the energy in the clamped state U^C is equal to the energy in the free state U^F. In the absence of nonlinear mechanical effects like buckling, the difference between these energies, which we refer to as the “learning contrast function,” is always non-negative because the clamped state is more strongly constrained than the free state, U^C − U^F ≥ 0. Analogous to the machine-learning approach of minimizing a loss function like mean-squared error (MSE), the rest lengths evolve by descending along the gradient of the contrast function:

    dL_i/dt ∝ −(∂/∂L_i) Σ_j (u_j^C − u_j^F)  (4)
            = −(∂/∂L_i) (u_i^C − u_i^F)  (5)
            = k(s_i^C − s_i^F).  (6)

Since the learning contrast function was not squared, the partial derivative in Eq. (4) picks out only the j = i term, which is readily simplified to Eq. (6) using Eq. (1). Thus, we arrive at the following general discrete learning rule
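The per-edge energy of Eq. (1) and the local gradient of Eq. (6) can be sketched in a few lines of Python. This is our own illustration, not code from the experiment; the function names and parameter values are ours:

```python
# Illustrative sketch (not the experiment's code): the per-edge Hookean
# energy of Eq. (1) and the local coupled-learning update of Eqs. (6)-(7).

def edge_energy(s, L, k=1.0, l=1.0):
    """Energy of one spring-turnbuckle edge: u = (1/2) k [s - (l + L)]^2."""
    return 0.5 * k * (s - (l + L)) ** 2

def rest_length_update(s_clamped, s_free, alpha=0.5):
    """Discrete coupled-learning rule: dL = alpha (s^C - s^F).
    Purely local: uses only this edge's own free and clamped lengths."""
    return alpha * (s_clamped - s_free)

# An edge stretched further in the clamped state grows its rest length,
# which lowers the clamped energy toward the free energy.
dL = rest_length_update(s_clamped=2.3, s_free=2.0, alpha=0.5)
```

Note that the update needs no information about any other edge; the physics of equilibration carries the global boundary conditions to each edge through its own length.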

FIG. 1. Schematic detailing the coupled learning algorithm for an arbitrary elastic network. A mechanical network is constructed such that each ith edge consists of a spring with stiffness k connected to a turnbuckle with adjustable rest length L_i. The springs act as the edges of the network, and their connection points are nodes. Specific nodes are chosen as “inputs” and “outputs” for the network to learn a desired task. Some nodes (gray) are fixed to prevent translation and rotation of the entire network, and the remaining nodes move in response to imposed boundary conditions. (a) In the free state, the position of the input node x_I is enforced, and the position of the output node x_O is measured. Each ith edge has length (i.e. node-node separation) s_i^F. (b) In the clamped state, the input node's position is still fixed at x_I and the output node is “clamped” to position x_C. Each ith edge has length s_i^C. (c) By locally comparing the lengths of each edge, the update rule in Eq. (7) determines how L_i evolves.

for how to update each edge at each training step for any network of identical spring-turnbuckle edges:

    ΔL_i = α(s_i^C − s_i^F),  (7)

where α is a per-step learning rate that we shall set in experiment, and (s_i^C − s_i^F) is the difference in clamped and free lengths of the edge being updated. This simple rule is purely local: each edge is updated according only to its own behavior, irrespective of how other edges change upon clamping. As discussed below, we train with different learning rate parameters in the range 0.1 < α ≤ 1. Iterative updates should drive the global learning contrast function to zero in order to achieve the desired motion function.

While our Eq. (7) update rule is spatially local, it is not temporally local because the learning rule requires simultaneous information about the system in two states. The experimental implementation in an electrical resistor network [22] was able to circumvent this issue by building identical twin networks to run the free and clamped states simultaneously. By contrast, the mechanical network is embedded in space, posing difficulties for constructing twin 2-dimensional networks side-by-side, and an impossibility entirely for 3-dimensional implementations. Therefore, our approach must rely on temporal memory of the spring extensions between the free and clamped states.

III. MOTION DIVIDER

Analogous to the voltage divider in the electrical network case, our first demonstration is a “motion divider” network, or two tunable-rest-length springs connected in series. A photograph of the network is shown in Fig. 2(a). We construct the network so it is hanging vertically under gravity from a fixed point, thereby restricting the motion to be along the axis of gravity. We define the one-dimensional position y = 0 to be at the fixed hanging point, and y > 0 measures position downwards from this origin. The first spring is connected to the y = 0 fixed point via a turnbuckle of length L_1, terminating at position y_1. The second turnbuckle of length L_2 then connects to the second spring, which terminates at position y_2. We choose y_2 to be the input node of our system and y_1 to be the output node. Both springs have equal stiffnesses k and natural rest lengths l, so that the Eq. (7) learning rule applies. The forces acting on springs in series are equal, so k(y_1 − l − L_1) = k(y_2 − y_1 − l − L_2) holds and can be solved for the output node position as

    y_1 = (1/2) y_2 + a,  (8)

where a = (1/2)(L_1 − L_2). Choosing some desired numerical value for a defines a trainable task for our system. This is equivalent to a machine-learning linear regression problem with one variable coefficient [30]. Note that there is no unique solution for L_1 and L_2; only their difference is trained for. Like typical machine learning algorithms, our system is over-parameterized and thus has multiple solutions.

A. Apparatus

Each unit cell of our network consists of an extension spring coupled to a turnbuckle in series, as shown diagrammatically in Fig. 1 and photographically in Fig. 2(a). The spring has a stiffness of k = 13 N/cm and a rest length of l = 5.5 cm (Grainger 5108N536), while the turnbuckle has a range of lengths between L_min = 12 cm and L_max = 16 cm (eoocvt M4 Stainless Steel 304). Updates are made on the system via turns of the turnbuckle, where each half-turn results in a change in length of ΔL/turn = 0.079 cm. The total effective rest length of the unit cell is therefore l + L_i. These unit cells are connected together using keyrings, which serve functionally as the nodes of our network. The system is clamped at the nodes manually by inserting a small rod through the keyrings and fixing its position up or down using a clamp on a vertical pole. Measurements were made manually by the experimenter using a ruler.

FIG. 2. Training a mechanical motion divider to obtain desired behavior in the form of an additive constant a. (a) Photograph of the experimental apparatus. Two springs in series are hung vertically from a fixed point. The positions of the two nodes relative to the fixed point, y_1 and y_2, serve as the target and source for the training task, respectively. The learning degrees of freedom, L_1 and L_2, determine the relationship between the node positions. (b) Results of training the network four consecutive times. Different goal values for a (dashed gray line) and learning rates α were used for each consecutive training. (c) Evolution of the learning degrees of freedom over the course of training. L_1 (orange) evolves inversely to L_2 (green) as the network approaches the goal state. Rest lengths were not reset to their initial values after each successful training.

FIG. 3. Comparison of motion divider learning dynamics with theoretical prediction. The time-evolution of the physical behavior for each of the four trials from Fig. 2(b) is separately fit to an exponential form, and values for the goal state a and the learning rate α are obtained. The table (bottom) compares these fitted values with the true experimental values for a and α for each of the four trials.

Trial | True a [cm] | Fit a [cm] | True α | Fit α
(a)   |  2.00       |  2.25      |  0.16  |  0.13
(b)   | -1.00       | -1.11      |  0.32  |  0.37
(c)   |  0.70       |  0.73      |  0.48  |  0.56
(d)   | -1.20       | -1.23      |  0.40  |  0.49

B. Results

Fig. 2(b) shows the evolution of the node couplings over the course of training. The network is trained multiple times to achieve different values for the goal state, a. The turnbuckle lengths were not reset after each training; instead, the network was able to re-learn a new task without initialization. The evolution of the learning degrees of freedom, L_1 and L_2, is shown in Fig. 2(c). Each time we change the training task a, we also choose a different learning rate to demonstrate that the network may be trained in different time frames.

Training time required to reach the desired state is determined both by the learning rate and the distance in parameter space between the network's initial and final states. The physical and learning degrees of freedom are expected to evolve exponentially and asymptotically approach their desired values over the course of training. For a full derivation of the learning dynamics, see Supplemental Information, Sec. A. Fig. 3 presents each of the four trials of training separately. Each of these is fit to the derived time-evolution Eq. (A13), where νt has been replaced by αn, with n being the number of training steps. The fits are overlaid on the data in red. The table in Fig. 3 compares the true values for a and α with those obtained from the exponential fit, and finds good agreement, with an average error of 12%. By comparing fitted values for a with the true values, we may also obtain an estimate of the measurement error, where ϵ ∼ ⟨Δa⟩ = 0.1 cm.
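These exponential dynamics can be checked by simulating the motion-divider training loop directly from Eqs. (7) and (8). The sketch below is our own illustration (parameter values are hypothetical, not the experimental ones): with a full nudge, each update shrinks the output error by a factor (1 − α), so the error decays geometrically just as the fits above assume.

```python
# Our own illustrative simulation of motion-divider training (not the
# authors' code). Free state: two springs in series give y1 = y2/2 + a,
# with a = (L1 - L2)/2, Eq. (8). Clamped state: y1 is held at the desired
# value (full nudge, eta = 1). Each edge then updates by Eq. (7).

def train_motion_divider(L1, L2, y2, y1_desired, alpha=0.4, steps=20):
    for _ in range(steps):
        y1_free = 0.5 * y2 + 0.5 * (L1 - L2)   # free-state equilibrium, Eq. (8)
        y1_clamped = y1_desired                # full nudge, eta = 1
        # node-node separations: s1 = y1, s2 = y2 - y1
        L1 += alpha * (y1_clamped - y1_free)                      # alpha (s1^C - s1^F)
        L2 += alpha * ((y2 - y1_clamped) - (y2 - y1_free))        # alpha (s2^C - s2^F)
    return L1, L2, 0.5 * y2 + 0.5 * (L1 - L2)

# hypothetical numbers: start symmetric (a = 0), train toward a = 2 cm
L1, L2, y1 = train_motion_divider(L1=14.0, L2=14.0, y2=40.0, y1_desired=22.0)
# y1 approaches 22.0; the residual error scales as (1 - alpha)^n
```

Because the two updates are equal and opposite, only the difference a = (L_1 − L_2)/2 is trained, mirroring the non-uniqueness of the solution noted above.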

IV. SYMMETRY NETWORK

We further demonstrate our system's ability to learn more complex tasks by constructing a two-dimensional network within a frame. The network is shown schematically and photographically in Fig. 4(a-b). Four of the edges are fixed to a rigid 45 cm × 40 cm frame constructed from 80/20, with the fifth central edge connecting the two internal nodes of the system. Edges are typically stretched past their rest length value by about 6 cm. Edges are labeled by number and nodes are denoted as source (blue) and target (red).

The goal is to train this network to become left-right symmetric, which occurs when L_1 = L_2 and L_4 = L_5. This task may be encoded in the physical degrees of freedom (node positions) of the network with the condition:

    x_target = x_source = 0 for any y_source value,

where x = 0 is defined to be the midline between the fixed nodes on either side. We may then train this network for this one-input, two-output task using a single data point.

A. Experiment

The symmetry network was constructed using the same unit cell design as the motion divider in Section III. The same turnbuckles were used in construction of the edges, and keyrings served as the internal nodes of the system. The springs used here have a natural rest length of l = 5.5 cm and a stiffness of 27.8 N/cm (Grainger 1NAA2). The external frame of the network was constructed from 80/20 parts, and eyeholes screwed into the rail serve as the fixed nodes.

Training this network requires only one input-output pair. Both outputs, x_source and x_target, have desired values that are directly central in the frame. Since training can occur for any choice of the input value, y_source, we choose the equilibrium position of this node in its initial state for simplicity.

It is important to note that the physical degrees of freedom of a single node are mixed: the “source” node acts as an input in the y-direction and as an output in the x-direction. The coupled learning algorithm allows for this seemingly strange coupling, and the network is still trainable under this scheme. The mixing of degrees of freedom was achieved experimentally by aligning the nodes along 80/20 tracks, as can be seen in the photos in Fig. 4(a-b). In the free state, we fix the input value for the training task, y_source, by attaching the node to a slide-in nylon tool hanger which slots into a horizontal 80/20 track so its vertical position is fixed. The node may still move along the horizontal direction with a small coefficient of static friction µ ≈ 0.1 for nylon on dry aluminum. The target node is left unconstrained so both its degrees of freedom can freely equilibrate. In the clamped state, we fix both degrees of freedom of the bottom node, {y_source, x_source}, by placing stoppers along the horizontal track described previously. The horizontal position of the top node, x_target, is restricted by placing it on a vertical track using the same tool hanger as above, while its vertical position y_target is allowed to equilibrate freely.

Since this network requires measurements in two dimensions, we chose to automate the measurement process using a digital camera. As can be seen in the photographs in Fig. 4(a-b), we attach pink markers to the free and fixed nodes (6), as well as the points of connection between the turnbuckles and springs (5). The camera captures a three-channel color image of the entire network, where we choose a beneficial white balance (2500 K) and tint (M6.0) to maximize the contrast between the markers and the remainder of the image. We then split the color channels and subtract the green channel from the red, leaving us with a one-channel image with high intensity values at the pixels associated with markers and low intensity everywhere else. The image is then binarized using a threshold intensity value of 100, which is in between the well-separated high and low intensity values. The pixels identified after binarization are then clustered into 11 clusters using a k-means algorithm [31]. The centroid of each cluster is determined to be the (x, y) position of the associated marker. This allows us to track all node positions, spring extensions, and turnbuckle lengths at every stage of the experiment. For a given photograph with 11 markers such as the ones in Fig. 4, the full processing and tracking takes approximately 0.7 s on a standard MacBook Pro.

The results of training the physical network are shown in Fig. 4(c-d). The learning degrees of freedom evolve such that L_1 comes to meet L_2 and L_4 meets L_5 within measurement error of 0.28 cm in only 5 training steps. The time evolution of these rest lengths follows a similar trend as in the motion divider, where the largest adjustments happen in the early training steps and updates become asymptotically small as the network approaches the learned state. See Appendix A for more information about learning dynamics. The success of training may also be measured by the cost function, C = U^C − U^F. Fig. 4(d) shows that the cost decreases exponentially from its initial value by about three orders of magnitude over the course of training.

The middle edge L_3 does not exhibit exponential time-evolution behavior, but rather drifts upward over the course of training by a total of 1.5 cm, or 37.5% of the range of values. This linear evolution suggests some systematic experimental bias within the system. Since L_3 has no bearing on the desired state, the network can learn the task for any value of L_3, even if we had held its value fixed over all training steps.

B. Simulation

There are many different ways to train for symmetry in our two-dimensional network. Our goal for this task is
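The marker-tracking pipeline (channel subtraction, thresholding, k-means clustering of the bright pixels) can be sketched as follows. This is our reconstruction with a synthetic test image, not the authors' analysis code; the deterministic farthest-point initialization is our own choice:

```python
# Reconstruction of the marker-tracking pipeline (not the authors' code):
# subtract the green channel from the red, threshold, then cluster the
# bright pixels with plain k-means; each centroid is one marker position.
import numpy as np

def find_markers(rgb, n_markers, threshold=100, iters=50):
    """rgb: (H, W, 3) array. Returns (n_markers, 2) array of (row, col) centroids."""
    diff = rgb[:, :, 0].astype(int) - rgb[:, :, 1].astype(int)  # red minus green
    pts = np.argwhere(diff > threshold).astype(float)           # marker pixels
    # deterministic farthest-point initialization (our simplification)
    centers = [pts[0]]
    for _ in range(n_markers - 1):
        d = ((pts[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(pts[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):  # standard k-means iterations
        labels = np.argmin(((pts[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(n_markers):
            if np.any(labels == k):
                centers[k] = pts[labels == k].mean(axis=0)
    return centers

# synthetic test: two bright-red 4x4 blobs on a dark background
img = np.zeros((100, 100, 3), dtype=np.uint8)
img[20:24, 30:34, 0] = 255
img[70:74, 80:84, 0] = 255
centers = find_markers(img, n_markers=2)  # ~ [[21.5, 31.5], [71.5, 81.5]]
```

The centroid step is what makes the measurement sub-pixel: averaging all thresholded pixels of a blob locates the marker more precisely than any single pixel.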

FIG. 4. Training a two-dimensional network for symmetry. (a-b) Schematics and photographs of the free and clamped states of the network for a given initial configuration. Edges are labeled by number (a, top). Photographs of the experimental apparatus show the marker tracking system and clamping mechanism. (c-d) Experimental training results. Rest lengths (c) converge to the symmetric state within experimental precision. The cost C = U^C − U^F (d) decreases by three orders of magnitude. (e-f) Simulation training results. The rest lengths (e) evolve such that L_1 = L_2 and L_4 = L_5. Cost (f) decreases exponentially by five orders of magnitude.

a specified internal state of the network, but the coupled learning scheme requires that we specify a behavior in the nodes that is satisfied by this internal state, of which there are multiple options. We therefore use simulation support to explore the different iterations of our task and examine their behavioral dynamics. We may also use simulation to modify the aspect ratio of the frame, as the angles of the edges at the internal nodes contribute to the coupling strength of the learning signal. Having done this, we chose the training task and geometry that allowed for the most efficient evolution to the learned state for our experimental demonstration.

We simulated training on this network using the FIRE optimization algorithm to determine the network's node positions when boundary conditions are applied [32]. Here, each edge was allowed to vary between 0.5 and 1.0 length units, and all edge stiffnesses were fixed at the same value, 1. Initial edge lengths were selected at random with validation that the network was sufficiently detuned from its goal state.

The behavior that was used to train the network is a one-input, two-output task, but we could have alternatively defined a one-input, one-output task that is equally satisfied by the symmetric state of the network. Training on the one-input, one-output task was successful in simulation, but required very long training times that were unfeasible for an experiment with manual operation. In this case, L_4 and L_5 evolve quickly to their desired values due to their strong coupling to the target node. L_1 and L_2, however, are only connected directly to the source node, which does not move in position very much between the free and clamped states, and therefore evolve very slowly. For more information, see Supplemental Information, Sec. B. The two-output task allows for a strong learning signal for all of the edges in the network.

Training was performed using a full nudge η = 1 and Eq. (7) with a learning rate of α = 0.4. Successful training is possible with a higher learning rate, but we have artificially slowed the training to better examine the learning dynamics. Fig. 4(e) shows the evolution of the edges' rest lengths during the course of training. The outer edges evolve smoothly to meet their desired state, L_1 = L_2 and L_4 = L_5, in approximately 10 training steps.

The middle edge's rest length L_3 has no bearing on the desired goal state, yet it still evolves during training, particularly in the first few training steps. The desired state is not dependent on L_3, but the cost function is, as this is the edge which directly connects the two relevant nodes. While evolution of L_3 is not required for the network to learn, allowing it to vary during training according to the coupled learning update rule helps the network descend the cost landscape more quickly. In the experimental trial, Fig. 4(c), L_3 systematically drifts upward, rather than showing the asymptotic decrease seen in Fig. 4(e). We attribute this apparent discrepancy to frictional effects in the apparatus. While this likely inhibited the speed of training, the network was still able to converge on the desired configuration of rest lengths. This lends credence to the coupled learning algorithm's robustness even in non-ideal experimental conditions.

V. CONCLUSION

We have demonstrated that laboratory realizations of mechanical networks can be trained for desired behaviors using the contrastive coupled learning algorithm. Through two different architectures, we have shown that learning is achievable despite apparent differences in the learning dynamics between simulation and experiment. Although the functionality of these minimal networks is limited by their simplicity, larger networks could readily be made using the same spring-turnbuckle repeat unit, and the initial experiments are sufficient to demonstrate great promise for decentralized learning in physical materials.

Future implementations of mechanical coupled learning networks would ideally include automated components which could perform local sensing and actuation to self-adjust. This poses its own challenges, as these active components would generally require precise manufacturing and expensive electronics to function, limiting the potential for larger scalability. Further, a unit cell that learns by updating its stiffness rather than its rest length would potentially allow for more trainable functionalities, but it is not obvious how to design a spring with tunable stiffness. Each of these concerns is addressed by electronically-controlled tunable-stiffness springs [33]. These springs are constructed of predominantly cheap or 3D-printed parts, operate by feeding concentric coils into or out of an elastic ring to modify the stiffness, and use flex and force sensors to locally sense their elastic energies. The modularity and ease of construction of these tunable springs will allow for easier scalability to more complex architectures and tasks.

ACKNOWLEDGMENTS

We thank Samuel Dillavou, Ben Pisanty, Cynthia Sung, and Shivangi Misra for their helpful discussions. This work was supported by NSF grants MRSEC/DMR-1720530 and MRSEC/DMR-2309043.

[1] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature 521, 436 (2015).
[2] M. Stern and A. Murugan, Learning without neurons in physical systems, Annual Review of Condensed Matter Physics 14, 417 (2023).
[3] S. Denève, A. Alemi, and R. Bourdoukan, The Brain as an Efficient and Robust Adaptive Learner, Neuron 94, 969 (2017).
[4] B. Sengupta and M. B. Stemmler, Power Consumption During Neuronal Computation, Proceedings of the IEEE 102, 738 (2014).
[5] R. A. John, N. Tiwari, M. I. B. Patdillah, M. R. Kulkarni, N. Tiwari, J. Basu, S. K. Bose, Ankit, C. J. Yu, A. Nirmal, et al., Self healable neuromorphic memtransistor elements for decentralized sensory signal processing in robotics, Nature Communications 11, 4030 (2020).
[6] R. A. McGovern, A. N. V. Moosa, L. Jehi, R. Busch, L. Ferguson, A. Gupta, J. Gonzalez-Martinez, E. Wyllie, I. Najm, and W. E. Bingaman, Hemispherectomy in adults and adolescents: Seizure and functional outcomes in 47 patients, Epilepsia 60, 2416 (2019).
[7] L. G. Wright, T. Onodera, M. M. Stein, T. Wang, D. T. Schachter, Z. Hu, and P. L. McMahon, Deep physical neural networks trained with backpropagation, Nature 601, 549 (2022).
[8] R. Rojas, The backpropagation algorithm, Neural Networks: A Systematic Introduction, 149 (1996).
[9] D. Hexner, N. Pashine, A. J. Liu, and S. R. Nagel, Effect of directed aging on nonlinear elasticity and memory formation in a material, Physical Review Research 2, 043231 (2020).
[10] J. R. Movellan, Contrastive Hebbian Learning in the Continuous Hopfield Model, in Connectionist Models, edited by D. S. Touretzky, J. L. Elman, T. J. Sejnowski, and G. E. Hinton (Morgan Kaufmann, 1991) pp. 10–17.
[11] M. J. Falk, J. Wu, A. Matthews, V. Sachdeva, N. Pashine, M. L. Gardel, S. R. Nagel, and A. Murugan, Learning to learn by using nonequilibrium training protocols for adaptable materials, Proceedings of the National Academy of Sciences 120, e2219558120 (2023).
[12] N. Pashine, Local rules for fabricating allosteric networks, Physical Review Materials 5, 065607 (2021).
[13] J. Kendall, R. Pantone, K. Manickavasagam, Y. Bengio, and B. Scellier, Training End-to-End Analog Neural Networks with Equilibrium Propagation, arXiv:2006.01981 [cs] (2020).
[14] E. Martin, M. Ernoult, J. Laydevant, S. Li, D. Querlioz, T. Petrisor, and J. Grollier, EqSpike: Spike-driven equilibrium propagation for neuromorphic implementations, iScience 24, 102222 (2021).
[15] C. Li, D. Belkin, Y. Li, P. Yan, M. Hu, N. Ge, H. Jiang, E. Montgomery, P. Lin, Z. Wang, W. Song, J. P. Strachan, M. Barnell, Q. Wu, R. S. Williams, J. J. Yang, and Q. Xia, Efficient and self-adaptive in-situ learning in multilayer memristor neural networks, Nature Communications 9, 2385 (2018).
[16] M. Stern, D. Hexner, J. W. Rocks, and A. J. Liu, Supervised Learning in Physical Networks: From Machine Learning to Learning Machines, Physical Review X 11, 021045 (2021).
[17] D. R. Reid, N. Pashine, J. M. Wozniak, H. M. Jaeger, A. J. Liu, S. R. Nagel, and J. J. de Pablo, Auxetic metamaterials from disordered networks, Proceedings of the National Academy of Sciences 115, E1384 (2018).
[18] C. P. Goodrich, A. J. Liu, and S. R. Nagel, The Principle of Independent Bond-Level Response: Tuning by Pruning to Exploit Disorder for Global Behavior, Physical Review Letters 114, 225501 (2015).
[19] N. Pashine, D. Hexner, A. J. Liu, and S. R. Nagel, Directed aging, memory, and nature's greed, Science Advances 5, eaax4215 (2019).
[20] J. W. Rocks, N. Pashine, I. Bischofberger, C. P. Goodrich, A. J. Liu, and S. R. Nagel, Designing allostery-inspired response in mechanical networks, Proceedings of the National Academy of Sciences 114, 2520 (2017).
[21] J. Laydevant, D. Markovic, and J. Grollier, Training an Ising machine with equilibrium propagation, arXiv preprint arXiv:2305.18321 (2023).
[22] S. Dillavou, M. Stern, A. J. Liu, and D. J. Durian, Demonstration of decentralized physics-driven learning, Physical Review Applied 18, 014040 (2022).
[23] J. F. Wycoff, S. Dillavou, M. Stern, A. J. Liu, and D. J. Durian, Desynchronous learning in a physics-driven learning network, The Journal of Chemical Physics 156, 144903 (2022).
[24] M. Stern, S. Dillavou, M. Z. Miskin, D. J. Durian, and A. J. Liu, Physical learning beyond the quasistatic limit, Physical Review Research 4, L022037 (2022).
[25] R. H. Lee, E. A. Mulder, and J. B. Hopkins, Mechanical neural networks: Architected materials that learn behaviors, Science Robotics 7, eabq7278 (2022).
[26] V. P. Patil, I. Ho, and M. Prakash, Self-learning mechanical circuits, arXiv preprint arXiv:2304.08711 (2023).
[27] P. Dayan, M. Sahani, and G. Deback, Unsupervised learning, The MIT Encyclopedia of the Cognitive Sciences, 857 (1999).
[28] C. Arinze, M. Stern, S. R. Nagel, and A. Murugan, Learning to self-fold at a bifurcation, Physical Review E 107, 025001 (2023).
[29] M. Stern, C. Arinze, L. Perez, S. E. Palmer, and A. Murugan, Supervised learning through physical changes in a mechanical system, Proceedings of the National Academy of Sciences 117, 14843 (2020).
[30] T. M. Hope, Linear regression, in Machine Learning (Elsevier, 2020) pp. 67–81.
[31] S. Lloyd, Least squares quantization in PCM, IEEE Transactions on Information Theory 28, 129 (1982).
[32] P. Koskinen, E. Bitzek, F. Gähler, M. Moseler, and P. Gumbsch, Fire: Fast inertial relaxation engine for optimization on all scales, last accessed: July 30, 2006 (2019).
[33] S. Misra, M. Mitchell, R. Chen, and C. Sung, Design and control of a tunable-stiffness coiled-actuator, IEEE International Conference on Robotics and Automation (ICRA) (2023).

Appendix A: Motion Divider Learning Dynamics

For the motion divider, we can explicitly write out and solve for the learning dynamics. In this task, we choose an input value, y2, for the position of the end node. The system is stretched and y2 is held fixed for both the free and clamped states during the entire course of training. The position of the middle node, y1, serves as the output. It will change over the course of training, both between free and clamped states and between training steps as L1 and L2 evolve. In the free state, force balance gives the position of the middle node as

y1F = (1/2) y2 + (1/2)(L1 − L2). (A1)

Before training, this differs from the desired output value given by

y1D = (1/2) y2 + a, (A2)

for whatever value the user has chosen for a. During training, the goal is to adjust the turnbuckle lengths so that y1F → y1D, which happens when (L1 − L2)/2 → a. The update rule to achieve this is based on comparing the free state to the clamped state, where the output node is clamped by nudge factor 0 < η ≤ 1 toward the desired position

y1C = ηy1D + (1 − η)y1F. (A3)

The general discrete update rule, Eq. (7), may now be evaluated by computing the node-node separations of each edge in free and clamped states using the above expressions. This gives

∆L1 = αη [a − (L1 − L2)/2], (A4)
∆L2 = −∆L1. (A5)

Note that these updates are equal and opposite, and vanish when learning is achieved. Also note that the nudge
factor η and update factor α affect the rate of learning equivalently. Thus, in experiment, we take η = 1 and vary α without loss of generality. We emphasize that in the lab the updates are made using only Eq. (7), based on measurement of the node-node separations in the free and clamped states; physics "computes" y1F automatically. Here in the appendix, by contrast, we additionally invoke force balance in order to predict learning dynamics.
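The discrete dynamics above are simple enough to sketch in a few lines of simulation. Here is a minimal sketch of the free-state response, Eq. (A1), and the updates, Eqs. (A4)-(A5); all numerical values (a, α, y2, and the initial rest lengths) are illustrative choices, not the experimental parameters.

```python
# Minimal sketch of the motion-divider discrete learning updates.
# All numerical values are illustrative, not taken from the experiment.
a = 0.3            # user-chosen target offset (hypothetical value)
alpha = 0.5        # update factor (hypothetical value)
eta = 1.0          # full nudge, as in the experiment
y2 = 2.0           # held-fixed input position of the end node
L1, L2 = 1.2, 0.8  # initial turnbuckle rest lengths

for step in range(200):
    y1_free = 0.5 * y2 + 0.5 * (L1 - L2)       # free-state output, Eq. (A1)
    dL1 = alpha * eta * (a - (L1 - L2) / 2.0)  # Eq. (A4)
    L1 += dL1
    L2 -= dL1                                  # Eq. (A5): equal and opposite

print(abs((L1 - L2) / 2.0 - a))  # training error, decays toward zero
```

Because the updates are equal and opposite, L1 + L2 is conserved, and the error (L1 − L2)/2 − a shrinks by a factor of (1 − αη) per step, consistent with the exponential decay derived below.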
To obtain differential equations from the predicted discrete update rules, one might guess that the time derivatives of L1 and L2 should be proportional to the right-hand sides of Eqs. (A4)-(A5). This turns out to be true. To see this, recall that the elastic energy in the clamped state must be greater than that in the free state, and that the difference serves as the coupled learning contrast
function. For the motion divider, it can be explicitly computed using the above positions and simplifies to

C = UC − UF (A6)
  = kη² [a − (L1 − L2)/2]². (A7)
Not only is this intrinsically positive, it is proportional to the loss function L = (y1D − y1F)². Therefore, gradient descent on C is perfectly aligned with gradient descent on L. The resulting coupled learning dynamical equations, dLi/dt = −γ ∂C/∂Li, simplify to
 
dL1/dt = ν [a − (L1 − L2)/2], (A8)
dL2/dt = −dL1/dt, (A9)

where ν = γkη² is a rate constant with units of 1/time. These are in agreement with intuition from the discrete version above.

FIG. 5. Simulated training of the one-output symmetry training with varying hyperparameters. Learning degrees of freedom oscillate over the course of training, and even the cost function does not decrease monotonically. Training with η = 1 and α = 2 (a) takes approximately 2000 steps to reach the minimal cost value. Training with a high learning rate α = 2.5 (b) reduces the training time to only 200 but displays unstable learning dynamics.

The time-evolution equations, Eqs. (A8)-(A9), may be directly integrated to obtain
L1(t) = e^(−νt) [(L1^0 − L2^0)/2 − a] + a + (L1^0 + L2^0)/2, (A10)
L2(t) = e^(−νt) [a − (L1^0 − L2^0)/2] − a + (L1^0 + L2^0)/2, (A11)

where L1^0 and L2^0 are the initial rest lengths of the two edges, respectively. The network stops evolving when

|(L1 − L2)/2 − a| ≤ ϵ, (A12)

where ϵ is the desired training accuracy. Note that the analytic solutions imply that the training error

[L1(t) − L2(t)]/2 − a = e^(−νt) [(L1^0 − L2^0)/2 − a] (A13)

decreases exponentially in time. Thus, the required duration is set by the training rate ν, as well as the initial conditions and the desired accuracy.

Appendix B: One-Output Symmetry Training

As expressed in Sec. IV, there are multiple ways to define a task in the physical degrees of freedom that is satisfied by the desired state, L1 = L2 and L4 = L5. One such well-defined task is xtarget = 0 for any two ysource values.

Unlike the two-output training, which required only a single data point to train, this training scheme requires that we choose two values of ysource, which we alternate between during training steps. We train the network using this training scheme in simulation, using the equilibrium node position ysource^eq and the bottom edge of the frame as the two input data points. The output data points are both at x = 0, in the center of the frame. Choosing two input data points that are well separated increases the speed of learning because the nodes move a greater distance between the alternating training steps.

Two simulated trainings of this one-output task are shown in Fig. 5. Both trainings use the full nudge, η = 1, and the learning rate α is varied. Fig. 5(a) depicts the standard training regime, where α = 2. Training takes an excessively long 2000 steps to reach the minimum cost, with training times increasing with decreasing learning rate. L4 and L5 exhibit oscillatory dynamics over the course of training, while L1 and L2 incrementally move towards their goal values. Increasing the learning rate to α = 2.5 moves us into the "overtraining" regime, Fig. 5(b), in which each learning degree of freedom can evolve past its goal value in a single training step. Here, training time is reduced significantly to only 200 steps, but the cost function does not decrease monotonically over the course of training. The cost can increase by up to an order of magnitude in only a few training steps, but on average decreases by 10 orders of magnitude. The rest lengths also exhibit strong oscillatory behavior and can even switch their symmetry states between training steps. While this type of overtraining is in principle possible to perform experimentally, the unstable learning dynamics make for a non-ideal demonstration of coupled learning's capabilities.
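As a consistency check on the Appendix A derivation, the closed-form solution of Eqs. (A8)-(A9) can be compared against a direct forward-Euler integration of the learning dynamics. The rate constant ν, target a, and initial rest lengths below are illustrative values chosen for this sketch, not parameters from the experiment.

```python
# Check the analytic solution, Eqs. (A10)-(A11), against Euler integration
# of dL1/dt = nu*(a - (L1 - L2)/2), dL2/dt = -dL1/dt, Eqs. (A8)-(A9).
# nu, a, and the initial rest lengths are illustrative values.
import math

nu = 2.0
a = 0.4
L1_0, L2_0 = 1.0, 1.5

def analytic(t):
    s0 = (L1_0 - L2_0) / 2        # initial half-difference of rest lengths
    mean = (L1_0 + L2_0) / 2      # conserved mean rest length
    decay = math.exp(-nu * t) * (s0 - a)
    return (decay + a + mean, -decay - a + mean)  # (L1(t), L2(t))

L1, L2 = L1_0, L2_0
dt = 1e-4
for _ in range(int(3.0 / dt)):    # integrate to t = 3
    dL1 = nu * (a - (L1 - L2) / 2) * dt
    L1 += dL1
    L2 -= dL1

L1_exact, L2_exact = analytic(3.0)
print(abs(L1 - L1_exact), abs(L2 - L2_exact))  # both differences are tiny
```

The agreement reflects the fact that the dynamics are linear in (L1 − L2)/2, so the training error simply decays as e^(−νt) toward the target a while L1 + L2 stays fixed.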
