…unfortunately not able to be here at this conference, but I'm going to be working on (it's not showing yet, but I'm going to be working on) automatic differentiation in Julia. There are currently a couple of packages that use automatic differentiation. What it is is a method to essentially take exact derivatives really quickly. There are other methods, of course; one common one is finite differencing, which uses the limit definition of a derivative, and that approach is really prone to approximation error and quite slow. We can leverage these cool things called dual numbers to take exact derivatives in a manner that's much faster.

I guess I don't have this up there yet, but… oh, there we go, that makes my job easier, I guess. So essentially a dual number is kind of like a complex number, except instead of having an imaginary component it has an infinitesimal component. What you get is that when you run this number through any generic function, the output ends up being the evaluation of that function at the real component plus the evaluation of the derivative of that function at the real component; that derivative evaluation sits in the epsilon component, which is the infinitesimal component. In other words, f(x + ε) = f(x) + f′(x)ε.

There's already a type that exists in Julia for using dual numbers; it's in the DualNumbers package, and it has a single epsilon component. Still not up there, but: we can basically take gradients of functions faster if we use more than one epsilon component at a time, where what you end up having is multiple epsilon components. Oh, here we go, cool. So this part, if you can see it, is essentially what I was saying before: the epsilon component on the right gives you the evaluation of the derivative of the function at your real input.

So we have this dual number type that currently exists, and we can use it to evaluate gradients by just evaluating the directional derivatives one at a time. This is kind of what that looks like: it takes a number of evaluations of f equal to the length of the input vector, so that's obviously kind of slow. We can add more epsilon components to essentially take more directional derivatives in a single pass, a single evaluation of f.

There are a couple of different ways we can implement a type that has multiple epsilon components (this is just an example of doing this evaluation), and the two main competing implementations use tuples and vectors. With the latest tuple update, the "tuplocalypse", tuple members are stack-allocated, so that could potentially be great, because operations using the members could speed up the implementation. The implementation we currently have for this relies a lot on generated functions, though, which is not as nice; it ends up making it really hard to do things like use the @simd macro to get parallelized operations at the processor level. So there's another implementation that uses vectors, and vectors are nice because you won't run into possible stack overflow errors if you have lots of components.
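To make the mechanics concrete, here is a minimal sketch of the single-epsilon idea in Julia. This is not the actual DualNumbers.jl implementation; the MyDual type and its field names are hypothetical, and a real package would handle promotion, many more operators, and arbitrary numeric types.

```julia
# Minimal sketch of a dual number: behaves like re + eps*ε, with ε^2 = 0.
struct MyDual <: Number
    re::Float64    # real component
    eps::Float64   # infinitesimal (epsilon) component
end

# The arithmetic rules carry the derivative along automatically.
Base.:+(a::MyDual, b::MyDual) = MyDual(a.re + b.re, a.eps + b.eps)
Base.:*(a::MyDual, b::MyDual) = MyDual(a.re * b.re, a.re * b.eps + a.eps * b.re)

f(x) = x * x + x           # any generic Julia function
d = f(MyDual(3.0, 1.0))    # seed ε = 1 to differentiate with respect to x
# d.re  == 12.0 == f(3.0)
# d.eps == 7.0  == f′(3.0), since f′(x) = 2x + 1
```

Differentiating a function of an n-element input vector this way takes n seeded evaluations of f, one per basis direction, which is exactly the cost the multi-epsilon implementations are meant to amortize.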
Vectors also let us take advantage of SIMD instructions, the @fastmath macro, and other things.

So what I've been working on are these implementations. I started a little bit early; I know the project was supposed to start June 15th, but I started at the beginning of June. Essentially, what I've done so far is take a couple of different test functions: the Rosenbrock function, a common optimization test function; the Ackley function, which normally takes in a vector and returns a vector, but which I modified to return a scalar by simply summing the components; and then a really simple test function, just to have a really simple benchmark.

So, my results, if they can fit on the screen here. Let's go to a more telling plot; this is probably the best one. First of all, the green line is the simple single-epsilon dual number implementation that already exists for automatic differentiation, which can be found in the Optim package. Using tuples, which is the blue line here, we get a significant speedup: by the time we hit five epsilon components, that's already about a 4x speedup on top of that. But a larger number of components is pretty desirable if you have a large input vector, because the more epsilon components you use, the fewer evaluations of f have to be made to evaluate a gradient. So as you go toward larger and larger input vectors, it seems like the vector implementation of dual numbers is preferable.

So essentially that's what I have. The work later in the summer will probably be to really nail down a solid implementation and proliferate its use throughout the Julia ecosystem, wherever automatic differentiation is needed.

[Audience question] Okay, so that's a good question. Julia's way of being able to overload functions in a manner that's really efficient, dispatching based on the types of the arguments, basically allows us to overload the elementary functions. If we go back to this definition right here: the systems we want to implement are systems where you can take a native Julia function that you've just coded up, one that takes in a vector and returns whatever, pass that function to this automatic differentiation tool, and it will take the gradient of it. To do that, we have to be able to go through each of the elementary functions that people use to compose other functions and overload their operations using dual numbers. For example, if g here were sin, we would just want to overload it so that it returns sin(x) plus whatever y is here times cos(x) times the epsilon component: sin(x + yε) = sin(x) + cos(x)·y·ε. So we need to define a set of definitions for all the elementary functions that is just as performant as a native implementation would be.
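Continuing the hypothetical MyDual sketch from earlier, the sin overload described in this answer might look like the following (a sketch, not the DualNumbers.jl source):

```julia
# sin(x + yε) = sin(x) + cos(x)·y·ε: the chain rule baked into one method.
Base.sin(d::MyDual) = MyDual(sin(d.re), cos(d.re) * d.eps)

g(x) = sin(x) * x       # composed entirely from overloaded elementary functions
g(MyDual(1.0, 1.0))     # eps component == cos(1.0)*1.0 + sin(1.0) == g′(1.0)
```

Because Julia compiles a specialized method for the dual type, this overload can run about as fast as the plain Float64 path, which is what makes the approach tenable.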
That's basically what allows this to be a tenable idea, whereas in something like Python, if you try to overload the sine function, it's going to be a sad time. Maybe it's question time? Yeah.

[Audience question] I have a situation where I have functions that I can write in Julia that I need to differentiate; however, inside those functions there are calls to functions that might be implemented in C++. Is there a way for me to say: hey, I've got a function, and I'm going to tell you what the derivative of that function is, but the thing I want you to auto-differentiate is some other function in Julia that calls that function?

Out of the box, I'm going to say we don't have that implemented. But what you could do is overload your own function with the dual number type that we provide, and if you define that, then it will basically pass through the calculation and work correctly. You would have to define the behavior of whatever other function you're using on dual numbers yourself, which is not the best thing. There might be some way to do code introspection or something later to make that easier, but it will probably always be the case that if you're not writing native Julia code, it's going to be more of a hassle.

[Inaudible question] Yeah, okay, that makes sense. If it all gets automatically differentiated, it might be worth it, just because automatic differentiation is also faster than a lot of the methods that give you approximate derivatives. But I guess it depends on how much implementation work you'd have to do yourself. Okay, there you go. Cool, any other questions?

[Inaudible question] You need to evaluate a directional derivative. Essentially, the gradient here is just a series of directional derivatives that you're taking with the function, so generally, if you're using a single epsilon component, you need to take as many directional derivatives as the length of the input vector that you're giving to the function. Right. And if you want to go further, you have to go through it again, go through the pain again. But yeah, being able to cut down on the number of evaluations of f that have to occur to take those derivatives is a really important thing to do, because that's where a lot of the slowness comes from; that's the bulk of the time spent. All right.

[Inaudible question] Oh, there is a package that exists called HyperDualNumbers.jl that has a dual number implementation specifically useful for taking second derivatives. I don't really know much about it. You could just evaluate the gradient and then evaluate the gradient again if you wanted to take the second derivative of f in this case, but it might be faster to rely on a different mathematical model to do so. I'm not sure yet; that's something to look at. Anything else?
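As a sketch of the workaround described above (defining the dual-number behavior of a foreign function by hand), here is what that might look like, reusing the hypothetical MyDual type and overloads from the earlier sketches. The blackbox function stands in for a C++ routine, stubbed in Julia so the example runs, and its derivative is supplied manually:

```julia
# Stand-ins: pretend `blackbox` is really a ccall into a C++ library.
blackbox(x::Float64) = x^3           # the foreign routine (stubbed here)
blackbox_deriv(x::Float64) = 3x^2    # its derivative, known analytically

# Teach the dual type about it: the chain rule applied by hand,
# so the epsilon component picks up blackbox′(x) times the incoming ε.
blackbox(d::MyDual) = MyDual(blackbox(d.re), blackbox_deriv(d.re) * d.eps)

h(x) = blackbox(x) + sin(x)   # native Julia code that calls the wrapped routine
h(MyDual(2.0, 1.0))           # eps component == 3 * 2.0^2 + cos(2.0) == h′(2.0)
```

Everything upstream and downstream of blackbox still differentiates automatically; only the foreign call needs a hand-written rule.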