Professional Documents
Culture Documents
Havard Berland
1 / 21
Abstract
2 / 21
What is automatic differentiation?
Automatic
differentiation
f(x) {...}; df(x) {...};
human
programmer
y = f (x) y 0 = f 0 (x)
symbolic differentiation
(human/computer)
3 / 21
Why automatic differentiation?
Scientific code often uses both functions and their derivatives, for
example Newtons method for solving (nonlinear) equations;
4 / 21
Divided differences
f (x + h) f (x) approximate
f 0 (x) = lim
h0 h
f (x + h)
exact
so why not use
f (x + h) f (x) f (x)
f 0 (x)
h
x x +h
for some appropriately small h?
5 / 21
Accuracy for divided differences on f (x) = x 3
error = f (x+h)f (x)
3x 2
h
ce
enr
ffe
108
di
d
e
er
x = 10
nt
ce
1012 x = 0.1
desired accuracy x = 0.001
1016 h
1016 1012 108 104 100
1 1 x
(x + xd) = x xd, = 2d (x 6= 0)
x + xd x x
7 / 21
Polynomials over dual numbers
Let
P(x) = p0 + p1 x + p2 x 2 + + pn x n
and extend x to a dual number x + xd.
Then,
8 / 21
Functions over dual numbers
9 / 21
Conclusion from dual numbers
f (x1 , x2 ) = x1 x2 + sin(x1 )
extends to
f (x1 + x1 d1 , x2 + x2 d2 ) =
(x1 + x1 d1 )(x2 + x2 d2 ) + sin(x1 + x1 d1 ) =
x1 x2 + (x2 + cos(x1 ))x1 d1 + x1 x2 d2
where di dj = 0.
10 / 21
Decomposition of functions, the chain rule
Two approaches,
I Source code transformation (C, Fortran 77)
I Operator overloading (C++, Fortran 90)
12 / 21
Source code transformation by example
function.c
double f ( double x1 , double x2 ) {
double w3 , w4 , w5 ;
w3 = x1 * x2 ;
w4 = sin ( x1 );
w5 = w3 + w4 ;
return w5 ;
}
function.c
13 / 21
Source code transformation by example
diff function.c
double* f ( double x1 , double x2 , double dx1, double dx2) {
double w3 , w4 , w5, dw3, dw4, dw5, df[2];
w3 = x1 * x2 ;
dw3 = dx1 * x2 + x1 * dx2;
w4 = sin ( x1 );
dw4 = cos(x1) * dx1;
w5 = w3 + w4 ;
dw5 = dw3 + dw4;
df[0] = w5;
df[1] = dw5;
return df;
}
AD tool C compiler
13 / 21
Operator overloading
function.c++
Number f ( Number x1 , Number x2 ) {
w3 = x1 * x2 ;
w4 = sin ( x1 );
w5 = w3 + w4 ;
return w5 ;
}
function.c++
DualNumbers.h
14 / 21
Source transformation vs. operator overloading
Operator overloading:
No changes in your original code
Flexible when you change your code or tool
Easy to code the AD tool
Only possible in selected languages
Current compilers lag behind, code runs slower
15 / 21
Forward mode AD
w5 = w3 + w4
Forward propagation
w5
of derivative values
w4 = cos(w1 )w1 w3 = w1 w2 + w1 w2
w4 w3
sin
w2 seeds, w1 , w2 {0, 1}
w1
w1
x1 x2
d
16 / 21
Reverse mode AD
f = w5 = 1 (seed)
Backward propagation
w5
of derivative values
w4 = w5 w
w4 = w5 1
5
w3 = w5 w
w3 = w5 1
5
w4 w3
sin
w3
w1a = w4 cos(w1 ) w2 = w3 w2
= w3 w1
w1b = w3 w2
x1 x2
x1 = w1a + w1b = cos(x1 ) + x2 x2 = w2 = x1
d
17 / 21
Jacobian computation
Given F : Rn 7 Rm and the Jacobian J = DF (x) Rmn .
f1 f1
x1 xn
J = DF (x) =
fm fm
x1 xn
F : Rn R
G : R Rm
I
recursing through the chain rule.
For n > 1 and m > 1 there is a golden
mean, but finding the optimal way is
?
probably an NP-hard problem.
19 / 21
Discussion
20 / 21
Applications of AD
Recommended literature:
I Andreas Griewank: Evaluating Derivatives. SIAM 2000.
21 / 21