Applying Higher-Order Runge-Kutta Methods to Neural Networks
Derek Onken and Lars Ruthotto, Department of Mathematics and Computer Science, Emory University
OBJECTIVES

Broader goals: model the training of deep neural networks (DNNs) as an optimal control problem.

1. simplify the design of DNNs (≈ discretize a PDE)
2. analyze stability and generalization (≈ vanishing/exploding gradients)
3. develop a variational framework (⇝ multilevel and multiscale learning)
4. design reversible dynamics (⇝ memory-free learning)

Current focus:

1. research: model order reduction, efficient optimization, stable dynamics, time-integrators [1]
2. community: free MATLAB/Julia software
3. accessibility: building models in PyTorch

MOTIVATION

Since the community recognizes the effectiveness of ResNets and their skip connections (shown to be equivalent to forward Euler), wouldn't higher-order Runge-Kutta schemes assist in training?

MODEL

[Architecture diagram: an opening layer feeds a dynamic unit, which is looped back twice and applies a Runge-Kutta scheme to its layer; a connecting layer and a dense layer then produce the class prediction, e.g., "Dog".]
DNNS MEET OPTIMAL CONTROL

Goal: Find a function f : R^n × R^p → R^m and its parameters θ ∈ R^p such that f(y_k, θ) ≈ c_k for training data y_1, . . . , y_s ∈ R^n and labels c_1, . . . , c_s ∈ R^m.

Model y_k^N = f(y_k, θ) as the output of a residual neural network (ResNN) with N layers. Let y_k^0 = y_k and

    y_k^{i+1} = y_k^i + h g(y_k^i, θ_i),   for all i = 0, . . . , N − 1,

where g transforms the features, e.g., g(y, θ) = tanh(K(θ) y).

Note that the ResNN is a forward Euler discretization [2] of the initial value problem (t ∈ [0, T])

    ∂_t y_k(t, θ) = g(y_k(t, θ), θ(t)),   y_k(0, θ) = y_k.
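As a concrete illustration of this forward Euler view, the sketch below implements a residual block y^{i+1} = y^i + h g(y^i, θ_i) with g(y, θ) = tanh(K(θ) y) in PyTorch. The choice of a 3×3 convolution for K, the step size, and all names are illustrative assumptions, not the authors' Meganet implementation.

```python
# Minimal PyTorch sketch (not the authors' Meganet code): a ResNN block viewed as
# forward Euler steps y^{i+1} = y^i + h * g(y^i, theta_i) with g(y, theta) = tanh(K(theta) y).
import torch
import torch.nn as nn


class ForwardEulerResNN(nn.Module):
    def __init__(self, channels, num_layers, h=1.0):
        super().__init__()
        self.h = h  # step size of the forward Euler discretization
        # one convolution K(theta_i) per layer/time step (an assumed choice of K)
        self.layers = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
             for _ in range(num_layers)]
        )

    def forward(self, y):
        for K in self.layers:
            y = y + self.h * torch.tanh(K(y))  # y^{i+1} = y^i + h * g(y^i, theta_i)
        return y


# usage: propagate a batch of 8-channel feature maps through 4 Euler steps
if __name__ == "__main__":
    block = ForwardEulerResNN(channels=8, num_layers=4, h=0.5)
    y0 = torch.randn(2, 8, 32, 32)
    print(block(y0).shape)  # torch.Size([2, 8, 32, 32])
```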
Learning: Find θ and the classifier weights W by solving

    min_{θ, W} (1/s) Σ_{k=1}^{s} loss(y_k(T, θ) W, c_k) + regularizer(θ, W).

Learning ≈ mass transport, trajectory planning.

RUNGE-KUTTA SCHEMES

Goal: Improve training while maintaining few parameters and controlling conditioning.

Recall the fourth-order Runge-Kutta (RK4) scheme. Defining the length of the j-th time interval by h_j = t_{j+1} − t_j, the update scheme reads

    u_{j+1} = u_j + (h_j / 6) [ f(θ(t_j), z_1) + 2 f(θ(t_{j+1/2}), z_2) + 2 f(θ(t_{j+1/2}), z_3) + f(θ(t_{j+1}), z_4) ],

where f is the primary layer in the dynamic unit, viewed as a function of the controls θ(t_k) and the intermediate states z_i, which are computed as follows:

    z_1 = u_j,
    z_2 = u_j + (h_j / 2) f(θ(t_j), z_1),
    z_3 = u_j + (h_j / 2) f(θ(t_{j+1/2}), z_2),
    z_4 = u_j + h_j f(θ(t_{j+1/2}), z_3).
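The following PyTorch-style sketch spells out one RK4 step of the dynamic unit exactly as in the update above. The callable `layer` stands for the primary layer f, and the argument names for the controls at t_j, t_{j+1/2}, and t_{j+1} are hypothetical.

```python
# Illustrative sketch of one RK4 step of the dynamic unit, following the update above.
# `layer` is the primary layer f(theta, z); `theta_j`, `theta_half`, `theta_next` are the
# control weights at t_j, t_{j+1/2}, and t_{j+1} (names are placeholders).
import torch


def rk4_step(layer, u, theta_j, theta_half, theta_next, h):
    k1 = layer(theta_j, u)                    # f(theta(t_j), z_1),        z_1 = u_j
    k2 = layer(theta_half, u + 0.5 * h * k1)  # f(theta(t_{j+1/2}), z_2),  z_2 = u_j + (h/2) k1
    k3 = layer(theta_half, u + 0.5 * h * k2)  # f(theta(t_{j+1/2}), z_3),  z_3 = u_j + (h/2) k2
    k4 = layer(theta_next, u + h * k3)        # f(theta(t_{j+1}), z_4),    z_4 = u_j + h k3
    return u + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)


# toy usage with a linear "layer" f(theta, z) = tanh(z @ theta)
if __name__ == "__main__":
    f = lambda theta, z: torch.tanh(z @ theta)
    u = torch.randn(4, 16)
    thetas = [0.1 * torch.randn(16, 16) for _ in range(3)]
    print(rk4_step(f, u, *thetas, h=1.0).shape)  # torch.Size([4, 16])
```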
From this RK4 scheme for f, we build a dynamic unit as part of a simple model to compare different time-steppings when f is a layer of type:

    Double / ResNN:             σ2 ∘ N2 ∘ Kθ2 ∘ σ1 ∘ N1 ∘ Kθ1 (Y)
    Preactivated Double:        N2 ∘ Kθ2 ∘ σ2 ∘ N1 ∘ Kθ1 ∘ σ1 (Y)
    Double Sym / Parabolic [3]: −Kθ^⊤ ∘ σ ∘ N ∘ Kθ (Y)
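As an illustration, here is a minimal sketch of the Double Sym / Parabolic layer −Kθ^⊤ ∘ σ ∘ N ∘ Kθ (Y). The specific normalization (batch norm), activation (ReLU), and the use of a transposed convolution with a shared kernel to realize Kθ^⊤ are assumptions for this sketch, not necessarily the authors' implementation.

```python
# Hedged sketch of the "Double Sym / Parabolic" layer f(Y) = -K^T( sigma( N( K Y ) ) ) from [3].
# The normalization (batch norm) and activation (ReLU) choices are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DoubleSymLayer(nn.Module):
    def __init__(self, in_channels, hidden_channels):
        super().__init__()
        # K_theta: a 3x3 convolution; the same kernel is reused for the adjoint K_theta^T below
        self.weight = nn.Parameter(0.1 * torch.randn(hidden_channels, in_channels, 3, 3))
        self.norm = nn.BatchNorm2d(hidden_channels)  # N (assumed batch norm)
        self.act = nn.ReLU()                         # sigma (assumed ReLU)

    def forward(self, y):
        z = F.conv2d(y, self.weight, padding=1)      # K_theta Y
        z = self.act(self.norm(z))                   # sigma(N(.))
        # -K_theta^T applied via the transposed convolution with the shared kernel
        return -F.conv_transpose2d(z, self.weight, padding=1)


if __name__ == "__main__":
    layer = DoubleSymLayer(in_channels=8, hidden_channels=16)
    y = torch.randn(2, 8, 32, 32)
    print(layer(y).shape)  # torch.Size([2, 8, 32, 32])
```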
for activation functions σ, normalizations N, and convolution operators K defined by weights θ.

NUMERICAL RESULTS

We train a simple model consisting of a convolutional opening layer, three blocks containing the RK scheme (doubling the channels on each pass), and one fully connected layer. The dynamic unit is the only portion that we vary. Our learning strategy uses 120 epochs of SGD with momentum and an initial learning rate of 0.1, which is reduced by a factor of 10 after epochs 60, 80, and 100.
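A minimal sketch of this training schedule in PyTorch is shown below; the momentum value (0.9), the cross-entropy loss, and the toy model and data loader are placeholders not specified on the poster.

```python
# Sketch of the stated schedule: 120 epochs of SGD with momentum, initial learning
# rate 0.1, decayed by a factor of 10 after epochs 60, 80, and 100.
import torch
import torch.nn as nn


def train(model, loader, epochs=120):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 80, 100], gamma=0.1)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(epochs):
        for x, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()  # drop the learning rate after epochs 60, 80, 100


if __name__ == "__main__":
    from torch.utils.data import DataLoader, TensorDataset
    toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
    toy_data = TensorDataset(torch.randn(64, 3, 8, 8), torch.randint(0, 10, (64,)))
    train(toy_model, DataLoader(toy_data, batch_size=16), epochs=2)
```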
[Figure: validation accuracy vs. training time (s) on STL-10 with the Double Sym layer, comparing RK4 [2], RK4 [1], RK1 [1], RK1 [.5], RK1 [.25], and RK1 [.125].]

[Figure: validation accuracy vs. training time (s) on CIFAR-10 with the Double Sym layer, same comparison.]
NOISY STOCHASTIC SHIFTS

Goal: Analyze the network when the time-stepping is varied every epoch.

Fixing the control time steps t_θ = [0, 1, 2, 3, 4] and the state time steps t_Y = [0, 1, 2, 3, 4] or [0, 2, 4], we draw noise from a uniform distribution at every epoch. This varies the interpolation of the control weights used to obtain the state weights.
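The sketch below illustrates one possible reading of these noisy shifts: the state time points t_Y are perturbed by uniform noise each epoch, and the control weights stored at t_θ are linearly interpolated at the shifted times. The interpolation scheme and all function names are assumptions, not necessarily the authors' exact procedure.

```python
# Hedged illustration: perturb the state time points t_Y by uniform noise each epoch and
# linearly interpolate the control weights (stored at the fixed control times t_theta) at
# the shifted times. This is one reading of the poster, not a definitive implementation.
import torch


def shifted_controls(controls, t_theta, t_Y, noise_level):
    """controls: list of weight tensors at the control times t_theta (1D tensor)."""
    # one uniform shift per state time point, e.g. U[-0.3, 0.3]
    eps = (2 * torch.rand(len(t_Y)) - 1) * noise_level
    t_shifted = torch.tensor(t_Y, dtype=torch.float32) + eps
    interpolated = []
    for t in t_shifted:
        # clamp to the control interval and locate the surrounding control times
        t = t.clamp(float(t_theta[0]), float(t_theta[-1]))
        j = int(torch.searchsorted(t_theta, t).clamp(1, len(t_theta) - 1))
        w = (t - t_theta[j - 1]) / (t_theta[j] - t_theta[j - 1])
        interpolated.append((1 - w) * controls[j - 1] + w * controls[j])
    return interpolated


if __name__ == "__main__":
    t_theta = torch.tensor([0.0, 1.0, 2.0, 3.0, 4.0])
    controls = [torch.randn(16, 16) for _ in t_theta]
    thetas_epoch = shifted_controls(controls, t_theta, t_Y=[0.0, 2.0, 4.0], noise_level=0.3)
    print(len(thetas_epoch))  # 3
```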
Results: For a Double Sym layer in the dynamic unit:

[Figure: validation accuracy vs. epoch on CIFAR-10 with a noisy Double Sym layer, t_Y = [0, 1, 2, 3, 4], for noise levels: no noise, U[-.1,.1], U[-.2,.2], U[-.3,.3], U[-.4,.4], U[-.5,.5].]

[Figure: validation accuracy vs. epoch on CIFAR-10 with a noisy Double Sym layer, t_Y = [0, 2, 4], for the same noise levels.]
TEAM

• Eldad Haber (UBC, Vancouver)
• Eran Treister (Ben Gurion, Israel)
• Simion Novikov (Ben Gurion, Israel)

SOFTWARE

GitHub:
• Meganet.m: academic and teaching tool
• Meganet.jl: high-performance distributed computing
• PyTorch implementations in the works

FUTURE DIRECTIONS

• Loss Landscape Analysis
• Adversarial Vulnerability Analysis
• Adaptive Time-Stepping
• Adams-Bashforth Methods

FUNDING

Supported by the National Science Foundation awards DMS 1522599 and CAREER DMS 1751636 and by NVIDIA Corporation.

REFERENCES

[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS, 2018.
[2] E. Haber, L. Ruthotto. Stable Architectures for Deep Neural Networks. Inverse Problems, 2017.
[3] L. Ruthotto, E. Haber. Deep Neural Networks Motivated by Partial Differential Equations. arXiv, 2018.