
0:00 So hi, my name is Jarrett. My mentors are Miles Lubin and Theodore Papamarkou; Theo is unfortunately not able to be here at this conference. It's not showing yet, but I'm going to be working on automatic differentiation in Julia.

0:12 There are currently a couple of packages that use automatic differentiation. What it is is a method to take exact derivatives really quickly. There are other methods, of course; one common one is finite differencing, which uses the limit definition of a derivative, and that approach is really prone to approximation error and quite slow. Instead, we can leverage these cool things called dual numbers to take exact derivatives in a manner that's much faster.

0:51 I guess I don't have this up there yet... oh, there we go, that makes my job easier. So essentially a dual number is kind of like a complex number, except that instead of an imaginary component it has an infinitesimal component. What you get is that when you run this number through any generic function, the output ends up containing the evaluation of that function at the real component, plus the evaluation of the derivative of that function at the real component; that derivative evaluation is the epsilon component, which is the infinitesimal component. There's already a type in Julia for using dual numbers; it's in the DualNumbers package, and it has a single epsilon component.
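A quick sketch of what that looks like in code, using the DualNumbers package just mentioned (assuming its exported Dual constructor and the realpart/epsilon accessors):

    using DualNumbers

    f(x) = x^2 + 2x            # any generic Julia function

    d = f(Dual(3.0, 1.0))      # seed the infinitesimal component with 1
    realpart(d)                # 15.0 == f(3.0)
    epsilon(d)                 # 8.0  == f'(3.0) = 2*3 + 2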
1:51 Still not up there, but we can take gradients of functions faster if we use more than one epsilon component at a time; what you end up having is a number with multiple epsilon components. Oh, here we go, cool.

2:07 So this part, if you can see it, is essentially what I was saying before: the epsilon component on the right gives you the evaluation of the derivative of the function at your real input. We have this dual number type that currently exists, and we can use it to evaluate gradients by just evaluating the directional derivatives one at a time. This is kind of what that looks like; it takes a number of evaluations of f equal to the length of the input vector, so that's obviously kind of slow. We can add more epsilon components to take more directional derivatives in a single evaluation of f.
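As a rough sketch of that one-directional-derivative-at-a-time approach (the helper below is a hypothetical illustration, not an existing package function):

    using DualNumbers

    # Evaluate the gradient of f at x with a single epsilon component,
    # seeding one basis direction per pass, so f runs length(x) times.
    function gradient_one_at_a_time(f, x::Vector{Float64})
        grad = similar(x)
        for i in eachindex(x)
            duals = [Dual(x[j], j == i ? 1.0 : 0.0) for j in eachindex(x)]
            grad[i] = epsilon(f(duals))   # i-th partial derivative
        end
        return grad
    end

    g(x) = x[1]^2 + 3x[2]
    gradient_one_at_a_time(g, [2.0, 5.0])   # [4.0, 3.0]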
3:03 And so there are a couple of different ways we can implement a type that has multiple epsilon components, and there are two main competing implementations; this is just an example of doing this evaluation. The two competing implementations use tuples and vectors. With the latest tuple update, the "tuplocalypse", a tuple's members are stack-allocated, so that could potentially be great, because operations on the members could speed up the implementation. What we currently have for this relies a lot on generated functions, which is not as nice; it ends up making it really hard to do things like use the @simd macro to get more parallelized operations at the processor level. There's another implementation that uses vectors. Vectors are nice because you won't run into possible stack overflow errors if you have lots of components, and we can take advantage of SIMD instructions, the @fastmath macro, and other things.
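Roughly, the two storage choices look like this (hypothetical type names, a sketch rather than the project's actual code):

    # Tuple-backed: N epsilon components stored inline, stack-allocated.
    # Fast for small fixed N, but generic methods over NTuple tend to
    # lean on generated functions.
    struct TupleDual{N,T<:Real}
        value::T
        partials::NTuple{N,T}
    end

    # Vector-backed: heap-allocated, so no stack-size concerns for large N,
    # and loops over the partials can use @simd / @fastmath.
    struct VectorDual{T<:Real}
        value::T
        partials::Vector{T}
    end

    import Base: +
    +(a::VectorDual, b::VectorDual) =
        VectorDual(a.value + b.value, a.partials .+ b.partials)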
4:24 So what I've been working on are these implementations. I started a little bit early; I know the project is supposed to start June fifteenth, but I started at the beginning of June. Essentially, what I've done so far is take a couple of different test functions: the Rosenbrock function, a common optimization test function; the Ackley function, which normally takes in a vector and returns a vector, but which I modified to return a scalar by simply summing the components; and then a really simple function, just to have a really simple benchmark.
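For reference, here's roughly what those scalar test functions look like (my sketch of the standard forms; the talk doesn't show the exact benchmark code):

    # Generalized Rosenbrock, a classic optimization test function.
    rosenbrock(x) = sum(100 * (x[i+1] - x[i]^2)^2 + (1 - x[i])^2
                        for i in 1:length(x)-1)

    # Ackley in its usual scalar form with the standard constants.
    function ackley(x)
        n = length(x)
        return -20 * exp(-0.2 * sqrt(sum(abs2, x) / n)) -
               exp(sum(cos(2pi * xi) for xi in x) / n) + 20 + exp(1)
    end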
5:09 And so my results, if they can fit on the screen here... let's go to a more telling plot; this is probably the best one here. First of all, the green line is the simple dual number implementation that already exists for automatic differentiation, which can be found in the Optim package. Using tuples, which is this blue line here, we get a significant speedup; by the time we hit five epsilon components, that's about a four-times speedup on top of that already. But if we go to a larger number of components, which is something that's pretty desirable if you have a large input vector, because the more epsilon components you use, the fewer evaluations of f have to be made to evaluate a gradient, then as you go toward larger and larger input vectors, it seems like using the vector implementation of the dual numbers is preferable.
6:16 So, yes, essentially that's what I have. The work later in the summer will probably be to really nail down a solid implementation and kind of proliferate its use throughout the Julia ecosystem, wherever automatic differentiation is needed. That's me.
6:46 Okay, so that's a good question. Julia's way of being able to overload functions really efficiently based on the types of the arguments basically allows us to overload the elementary functions. If we go back to this definition right here: the systems we want to implement are systems where you can take a native Julia function that you've just coded up, that takes in a vector and returns whatever, pass that function to this automatic differentiation tool, and it will take the gradient of it. To do that, we have to be able to go through each of the elementary functions that people use to compose other functions and overload their operations on dual numbers. For example, if g here were sin, we would just want to overload it so that it returns sin(x) plus whatever y is here, times cos(x), times the epsilon component. Being able to define a set of definitions for all the elementary functions that is just as performant as a native implementation is what makes this a tenable idea, whereas in something like Python, if you try to overload the sine function, it's going to be a sad time. Maybe it's that time, yeah.
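Concretely, that sine rule looks something like this minimal sketch (a hand-rolled dual type for illustration, not any particular package's):

    struct MyDual{T<:Real} <: Real
        x::T   # real part
        y::T   # epsilon part
    end

    # sin(x + y*eps) = sin(x) + y*cos(x)*eps, exactly the rule above.
    Base.sin(d::MyDual) = MyDual(sin(d.x), d.y * cos(d.x))

    sin(MyDual(2.0, 1.0))   # MyDual(sin(2.0), cos(2.0))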
8:37 [Audience] I have a situation where I have functions that I can write in Julia that I need to differentiate; however, inside those functions there are calls to functions that might be implemented in C++. Is there a way for me to say, hey, I've got a function, and I'll tell you what the derivative of that function is, but the thing I want you to auto-differentiate is some other Julia function that calls that function?

9:00 So out of the box, I'm going to say we don't have that implemented, but what you could do is overload your own function with the dual number type that we provide, and if you define that, then it will basically pass through the calculation and work correctly. But yeah, you would basically have to define the behavior of whatever other function you're using on dual numbers yourself, which is not the best thing. There might be some way to do code introspection or something later to make that easier, but it will probably always be the case that if you're not writing native Julia code, it's going to be more of a hassle.
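In other words, something like this hedged sketch, where cxx_model stands in for the wrapped C++ call and cxx_model_deriv for its hand-written derivative (both hypothetical names):

    using DualNumbers

    # Pretend this is a ccall wrapper around C++ code that only
    # accepts plain floats, with an analytically known derivative.
    cxx_model(x::Float64) = 3x^2
    cxx_model_deriv(x::Float64) = 6x

    # Teach the AD machinery about the external call by defining
    # its behavior on dual numbers via the chain rule:
    cxx_model(d::Dual) = Dual(cxx_model(realpart(d)),
                              cxx_model_deriv(realpart(d)) * epsilon(d))

    # A native Julia function that calls it now differentiates correctly:
    g(x) = sin(cxx_model(x))
    epsilon(g(Dual(1.0, 1.0)))   # cos(3.0) * 6.0 == g'(1.0)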
10:00 Yeah, okay, that makes sense. If it all gets automatically differentiated, it might be worth it, just because automatic differentiation is also faster than a lot of the methods that give you approximate derivatives. But I guess it depends on how much of the implementation work you'd do yourself. Yeah, okay, there you go, cool.
10:26 Any other questions?

10:34 You need to evaluate a directional derivative; I mean, essentially the gradient here is just a series of directional derivatives that you're taking with the function. Generally, if you're using a single epsilon component, you need to take as many directional derivatives as the length of the input vector that you're giving to the function, right? And if you want to go further, you have to go through it again, go through the pain again. But yeah, so being able to cut down on the number of evaluations of f that have to occur to take those derivatives is a really important thing to do, because that's where a lot of the slowness comes from; that's the bulk of the time spent. All right.
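The arithmetic behind that (my gloss, not shown in the talk): with k epsilon components you seed k directions per call, so a length-n gradient costs ceil(n/k) evaluations of f instead of n.

    # cld is Julia's ceiling division
    n_evals(n, k) = cld(n, k)

    n_evals(100, 1)    # 100 evaluations with a single epsilon component
    n_evals(100, 10)   # 10 evaluations with ten epsilon components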
11:52 Oh, there is a package that exists called HyperDualNumbers.jl that basically has a dual number implementation specifically useful for taking second derivatives. I don't really know much about it. You could just evaluate the gradient and then evaluate the gradient again if you wanted to take the second derivative of f in this case, but it might be faster to rely on a different mathematical model to do so. I'm not sure yet; that's something to look at. Anything else?
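For the curious, the hyperdual idea can be sketched with a minimal hand-rolled type (hypothetical, not the HyperDualNumbers.jl API): a number a + b*e1 + c*e2 + d*e1*e2 where e1^2 = e2^2 = 0 but e1*e2 != 0, so a single evaluation carries the second derivative in the e1*e2 slot.

    struct HyperDual
        a::Float64   # real part
        b::Float64   # e1 coefficient
        c::Float64   # e2 coefficient
        d::Float64   # e1*e2 coefficient
    end

    # f(h) = f(a) + f'(a)*(b*e1 + c*e2) + (f'(a)*d + f''(a)*b*c)*e1*e2
    Base.sin(h::HyperDual) = HyperDual(sin(h.a),
                                       cos(h.a) * h.b,
                                       cos(h.a) * h.c,
                                       cos(h.a) * h.d - sin(h.a) * h.b * h.c)

    # Seeding b = c = 1, d = 0 yields f, f', and f'' in one pass:
    h = sin(HyperDual(2.0, 1.0, 1.0, 0.0))
    h.a   # sin(2.0)
    h.b   # cos(2.0)  == first derivative
    h.d   # -sin(2.0) == second derivative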
