We talked about execution engines and how applications make use of them. We saw that execution engines can work through YARN, while some applications may go directly to HDFS and the data. So there are execution engines that sit on top of YARN, and applications that sit on top of those execution engines; there are multiple ways to run applications on Hadoop. This brings up the question of resource management and how you schedule the available capacity on a cluster. So in this video we talk about resource management, different kinds of scheduling algorithms, and the types of parameters that you can control, and typically do control, in these environments.

So what's the motivation for these schedulers? As I mentioned on the previous slide, there are various execution options and engines and ways of accessing resources. If you just let jobs be scheduled by the default mechanism, you could run into scheduling issues: an important job that needs dedicated resources might end up waiting a long time. It might impact job completion, because you might run some jobs out of memory, for example. It can also affect performance because of the way resources are shared, so it's important to try to schedule things efficiently. And you want to be able to control how much of the resources is used by each component.

In terms of scheduling on Hadoop, the default option is First In, First Out (FIFO). You end up with a bunch of jobs in a queue that essentially flow through in order. So you could have a very small job waiting a long time because there's a longer job ahead of it, even though resources are available. There are other options for schedulers; these are plugins to the YARN framework. You have fair-share scheduling and capacity scheduling. In fair-share scheduling, you try to balance out the resource allocation across applications over time.
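For reference, the scheduler plugin is selected through the ResourceManager configuration; a minimal sketch, assuming a stock Apache Hadoop install (the fully qualified class names below are the ones shipped with Hadoop):

```xml
<!-- yarn-site.xml: choose the scheduler plugin for the ResourceManager -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <!-- CapacityScheduler is the default in recent Hadoop releases;
       swap in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
       to use fair-share scheduling instead -->
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
```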
With the capacity scheduler, you can have a guaranteed capacity for each application or group, and there are safeguards to prevent a single user or application from taking down the whole cluster by running it out of resources.

So let's look at some details of the capacity scheduler. Here, the primary mechanism of control is queues. Each queue is entitled to a fraction of the resource capacity, and you can set hard and soft limits on these. In this figure you can see four queues with varying fractions of the resource capacity. And within each queue, you can limit different users. You can have access control lists (ACLs) that restrict access to particular users, and you can also restrict users from looking at or changing other users' jobs. You can set resource limits as well.

Summarizing what we saw on the previous slide: we have queues, and sub-queues are also possible. You can set a capacity guarantee, and there is an option for elastic expansion if spare resources are available. You can use ACLs for security, as I mentioned. All of this can be changed at runtime, and you can also drain applications. What this means is that if you have a bunch of applications running and you want to make a change, you can say that no new applications will start, but the existing applications will run to completion; once they're finished, you can make the change. You can also do resource-based scheduling, with limits on memory and CPU.

The other option is the fair-share scheduler. This essentially balances out the resource allocation among the various applications over time. By default, fair share is based on memory, but you can add CPU as a resource as well. In the example here, going from left to right, you're seeing a timeline. When you start an application and there's nothing else running, that application can use the entire cluster.
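The queue mechanics described above can be sketched in the capacity scheduler's configuration file. This is an illustration only; the queue names (`prod`, `dev`) and the users and groups in the ACL are hypothetical, while the property names follow Hadoop's `capacity-scheduler.xml` convention:

```xml
<!-- capacity-scheduler.xml: two hypothetical queues under root -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>prod,dev</value>
</property>
<property>
  <!-- soft limit (capacity guarantee): prod is entitled to 70% -->
  <name>yarn.scheduler.capacity.root.prod.capacity</name>
  <value>70</value>
</property>
<property>
  <!-- hard limit: prod may elastically expand to at most 90% if dev is idle -->
  <name>yarn.scheduler.capacity.root.prod.maximum-capacity</name>
  <value>90</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.capacity</name>
  <value>30</value>
</property>
<property>
  <!-- ACL: comma-separated users, then a space, then groups -->
  <name>yarn.scheduler.capacity.root.prod.acl_submit_applications</name>
  <value>alice,bob etl</value>
</property>
<property>
  <!-- setting state to STOPPED drains the queue: running apps finish,
       but no new applications are accepted -->
  <name>yarn.scheduler.capacity.root.dev.state</name>
  <value>RUNNING</value>
</property>
```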
Say now you submit a second application. As tasks of application one complete, application two can pick up some of the resources, and the fair-share scheduler will try to balance things out. In the third step here, you'll see App1 is now at 75% and App2 at 25%. Now you submit another application, and the fair-share scheduler again tries to balance things out; the idea is that, in the long run, you even out how many resources each application gets.

So to summarize: the fair-share scheduler balances out resource allocation over time. You can organize applications into queues, and sub-queues are also possible. You can guarantee minimum shares, and you can set limits per user or application. You can also assign weighted priorities among the apps, so not every app gets the same share, essentially.

To summarize what we learned: the default setup on Hadoop is FIFO, that's first in, first out, but we have the fair-share and capacity schedulers. As we saw, you can restrict based on queues and sub-queues, you can set user- and app-based limits, and you can set resource limits; typically that's memory, but you can add in CPU. I also want to add that, in addition to this, if you have a commercial Hadoop distribution, you might have additional mechanisms that let you allocate resources among things beyond YARN. So you could have controls over how much YARN gets and how much something else, maybe another service instance, will get. So there are other options, essentially.
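The fair-share features summarized above (queues, minimum shares, weights, per-user limits) map onto the FairScheduler's allocation file. A sketch, with hypothetical queue names; the element names follow Hadoop's `fair-scheduler.xml` allocation-file format:

```xml
<!-- fair-scheduler.xml: hypothetical queues with minimum shares and weights -->
<allocations>
  <queue name="analytics">
    <!-- guaranteed minimum share: memory and vcores -->
    <minResources>10240 mb,10 vcores</minResources>
    <!-- weighted priority: roughly twice the share of a weight-1 queue -->
    <weight>2.0</weight>
    <maxRunningApps>20</maxRunningApps>
  </queue>
  <queue name="adhoc">
    <minResources>2048 mb,2 vcores</minResources>
    <weight>1.0</weight>
  </queue>
  <!-- per-user limit on concurrently running applications -->
  <userMaxAppsDefault>5</userMaxAppsDefault>
</allocations>
```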