…unfortunately not able to be here at this conference, but I'm going to be working on (it's not showing yet, but I'm going to be working on) automatic differentiation in Julia. There are currently a couple of packages that use automatic differentiation. What it is is a method to essentially take exact derivatives really quickly. There are other methods, of course; one common one is finite differencing, which uses the limit definition of a derivative, and that approach is really prone to approximation error and quite slow. We can leverage these cool things called dual numbers to take exact derivatives in a manner that's much faster.

I guess I don't have this up there yet, but… oh, there we go, that makes my job easier, I guess. So essentially a dual number is kind of like a complex number, except instead of having an imaginary component it has an infinitesimal component. What you get is that when you run this number through any generic function, the output ends up being the evaluation of that function at the real component plus the evaluation of the derivative of that function at the real component; that derivative evaluation sits in the epsilon component, which is the infinitesimal component. In other words, f(x + ε) = f(x) + f′(x)ε.

There's already a type that exists in Julia for using dual numbers; it's in the DualNumbers package, and it has a single epsilon component. Still not up there, but: we can basically take gradients of functions faster if we use more than one epsilon component at a time, where what you end up having is multiple epsilon components. Oh, here we go, cool. So this part, if you can see it, is essentially what I was saying before: the epsilon component on the right gives you the evaluation of the derivative of the function at your real input.

So we have this dual number type that currently exists, and we can use it to evaluate gradients by just evaluating the directional derivatives one at a time. This is kind of what that looks like: it takes a number of evaluations of f equal to the length of the input vector, so that's obviously kind of slow. We can add more epsilon components to essentially take more directional derivatives in a single pass, a single evaluation of f.

There are a couple of different ways we can implement a type that has multiple epsilon components (this is just an example of doing this evaluation), and the two main competing implementations use tuples and vectors. With the latest tuple update, the "tuplocalypse", tuple members are stack-allocated, so that could potentially be great, because operations using the members could speed up the implementation. The implementation we currently have for this relies a lot on generated functions, though, which is not as nice; it ends up making it really hard to do things like use the @simd macro to get parallelized operations at the processor level. So there's another implementation that uses vectors, and vectors are nice because you won't run into possible stack overflow errors if you have lots of components.
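To make the mechanics concrete, here is a minimal sketch of the single-epsilon idea in Julia. This is not the actual DualNumbers.jl implementation; the MyDual type and its field names are hypothetical, and a real package would handle promotion, many more operators, and arbitrary numeric types.

```julia
# Minimal sketch of a dual number: behaves like re + eps*ε, with ε^2 = 0.
struct MyDual <: Number
    re::Float64    # real component
    eps::Float64   # infinitesimal (epsilon) component
end

# The arithmetic rules carry the derivative along automatically.
Base.:+(a::MyDual, b::MyDual) = MyDual(a.re + b.re, a.eps + b.eps)
Base.:*(a::MyDual, b::MyDual) = MyDual(a.re * b.re, a.re * b.eps + a.eps * b.re)

f(x) = x * x + x           # any generic Julia function
d = f(MyDual(3.0, 1.0))    # seed ε = 1 to differentiate with respect to x
# d.re  == 12.0 == f(3.0)
# d.eps == 7.0  == f′(3.0), since f′(x) = 2x + 1
```

Differentiating a function of an n-element input vector this way takes n seeded evaluations of f, one per basis direction, which is exactly the cost the multi-epsilon implementations are meant to amortize.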
Vectors also let us take advantage of SIMD instructions, the @fastmath macro, and other things.

So what I've been working on are these implementations. I started a little bit early; I know the project was supposed to start June 15th, but I started at the beginning of June. Essentially, what I've done so far is take a couple of different test functions: the Rosenbrock function, a common optimization test function; the Ackley function, which normally takes in a vector and returns a vector, but which I modified to return a scalar by simply summing the components; and then a really simple test function, just to have a really simple benchmark.

So, my results, if they can fit on the screen here. Let's go to a more telling plot; this is probably the best one. First of all, the green line is the simple single-epsilon dual number implementation that already exists for automatic differentiation, which can be found in the Optim package. Using tuples, which is the blue line here, we get a significant speedup: by the time we hit five epsilon components, that's already about a 4x speedup on top of that. But a larger number of components is pretty desirable if you have a large input vector, because the more epsilon components you use, the fewer evaluations of f have to be made to evaluate a gradient. So as you go toward larger and larger input vectors, it seems like the vector implementation of dual numbers is preferable.

So essentially that's what I have. The work later in the summer will probably be to really nail down a solid implementation and proliferate its use throughout the Julia ecosystem, wherever automatic differentiation is needed.

[Audience question] Okay, so that's a good question. Julia's way of being able to overload functions in a manner that's really efficient, dispatching based on the types of the arguments, basically allows us to overload the elementary functions. If we go back to this definition right here: the systems we want to implement are systems where you can take a native Julia function that you've just coded up, one that takes in a vector and returns whatever, pass that function to this automatic differentiation tool, and it will take the gradient of it. To do that, we have to be able to go through each of the elementary functions that people use to compose other functions and overload their operations using dual numbers. For example, if g here were sin, we would just want to overload it so that it returns sin(x) plus whatever y is here times cos(x) times the epsilon component: sin(x + yε) = sin(x) + cos(x)·y·ε. So we need to define a set of definitions for all the elementary functions that is just as performant as a native implementation would be.
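Continuing the hypothetical MyDual sketch from earlier, the sin overload described in this answer might look like the following (a sketch, not the DualNumbers.jl source):

```julia
# sin(x + yε) = sin(x) + cos(x)·y·ε: the chain rule baked into one method.
Base.sin(d::MyDual) = MyDual(sin(d.re), cos(d.re) * d.eps)

g(x) = sin(x) * x       # composed entirely from overloaded elementary functions
g(MyDual(1.0, 1.0))     # eps component == cos(1.0)*1.0 + sin(1.0) == g′(1.0)
```

Because Julia compiles a specialized method for the dual type, this overload can run about as fast as the plain Float64 path, which is what makes the approach tenable.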
That's basically what allows this to be a tenable idea, whereas in something like Python, if you try to overload the sine function, it's going to be a sad time. Maybe it's question time? Yeah.

[Audience question] I have a situation where I have functions that I can write in Julia that I need to differentiate; however, inside those functions there are calls to functions that might be implemented in C++. Is there a way for me to say: hey, I've got a function, and I'm going to tell you what the derivative of that function is, but the thing I want you to auto-differentiate is some other function in Julia that calls that function?

Out of the box, I'm going to say we don't have that implemented. But what you could do is overload your own function with the dual number type that we provide, and if you define that, then it will basically pass through the calculation and work correctly. You would have to define the behavior of whatever other function you're using on dual numbers yourself, which is not the best thing. There might be some way to do code introspection or something later to make that easier, but it will probably always be the case that if you're not writing native Julia code, it's going to be more of a hassle.

[Inaudible question] Yeah, okay, that makes sense. If it all gets automatically differentiated, it might be worth it, just because automatic differentiation is also faster than a lot of the methods that give you approximate derivatives. But I guess it depends on how much implementation work you'd have to do yourself. Okay, there you go. Cool, any other questions?

[Inaudible question] You need to evaluate a directional derivative. Essentially, the gradient here is just a series of directional derivatives that you're taking with the function, so generally, if you're using a single epsilon component, you need to take as many directional derivatives as the length of the input vector that you're giving to the function. Right. And if you want to go further, you have to go through it again, go through the pain again. But yeah, being able to cut down on the number of evaluations of f that have to occur to take those derivatives is a really important thing to do, because that's where a lot of the slowness comes from; that's the bulk of the time spent. All right.

[Inaudible question] Oh, there is a package that exists called HyperDualNumbers.jl that has a dual number implementation specifically useful for taking second derivatives. I don't really know much about it. You could just evaluate the gradient and then evaluate the gradient again if you wanted to take the second derivative of f in this case, but it might be faster to rely on a different mathematical model to do so. I'm not sure yet; that's something to look at. Anything else?
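As a sketch of the workaround described above (defining the dual-number behavior of a foreign function by hand), here is what that might look like, reusing the hypothetical MyDual type and overloads from the earlier sketches. The blackbox function stands in for a C++ routine, stubbed in Julia so the example runs, and its derivative is supplied manually:

```julia
# Stand-ins: pretend `blackbox` is really a ccall into a C++ library.
blackbox(x::Float64) = x^3           # the foreign routine (stubbed here)
blackbox_deriv(x::Float64) = 3x^2    # its derivative, known analytically

# Teach the dual type about it: the chain rule applied by hand,
# so the epsilon component picks up blackbox′(x) times the incoming ε.
blackbox(d::MyDual) = MyDual(blackbox(d.re), blackbox_deriv(d.re) * d.eps)

h(x) = blackbox(x) + sin(x)   # native Julia code that calls the wrapped routine
h(MyDual(2.0, 1.0))           # eps component == 3 * 2.0^2 + cos(2.0) == h′(2.0)
```

Everything upstream and downstream of blackbox still differentiates automatically; only the foreign call needs a hand-written rule.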