Scheduling
Distributed Computing in General, Cloud Computing in Particular

Job Scheduling
• Given:
– n jobs submitted by users → in a queue
– Estimated execution time per job known
– m VMs available
• Scheduling Problem
– Assign jobs to VMs
– Reduce the overall execution time to process all jobs in the queue
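To make the problem concrete, here is a minimal sketch (my own, not from the slides) of a greedy list scheduler for this setting: each queued job, whose execution time is assumed known, goes to the VM that becomes free earliest, which keeps the overall completion time (makespan) low. The job runtimes and VM count in the example are illustrative.

```python
import heapq

def greedy_schedule(job_times, m):
    """Assign each queued job to the VM that frees up earliest.

    job_times: estimated execution time of each job, in queue order.
    m: number of available VMs.
    Returns the assignment (job index -> VM index) and the makespan.
    """
    # Min-heap of (time when the VM becomes free, VM index).
    vms = [(0.0, i) for i in range(m)]
    heapq.heapify(vms)
    assignment = {}
    makespan = 0.0
    for job, t in enumerate(job_times):
        free_at, vm = heapq.heappop(vms)   # earliest-available VM
        finish = free_at + t
        assignment[job] = vm
        makespan = max(makespan, finish)
        heapq.heappush(vms, (finish, vm))
    return assignment, makespan

# Example: 6 jobs with estimated runtimes, 2 VMs.
print(greedy_schedule([5, 3, 8, 2, 4, 1], m=2))
```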

Desirable features of a scheduling algorithm
• No a priori knowledge about processes
– Users do not want to specify the characteristics and requirements of their processes
• Dynamic in nature
– Decisions should be based on the changing load of the nodes, not on a fixed static policy
• Quick decision-making capability
– The algorithm must make quick decisions about the assignment of tasks to the nodes of the system
• Balanced system performance and scheduling overhead
– More information enables more intelligent decisions, but collecting it increases overhead
• Stability
– A system is unstable when all processes are migrating without accomplishing any useful work
– This occurs when nodes flip from the lightly-loaded to the heavily-loaded state and vice versa


Desirable features of a scheduling algorithm
• Scalability
– A scheduling algorithm should be capable of handling small as well as large networks
• Fault tolerance
– Should keep working after the crash of one or more nodes of the system
• Fairness of service
– Users initiating equivalent processes expect to receive the same quality of service

Task assignment approach
• Main assumptions
– Processes have been split into tasks
– Computation requirements of tasks and speeds of processors are known
– Costs of processing tasks on nodes are known
– Communication costs between every pair of tasks are known
– Resource requirements of tasks and available resources on each node are known
– Reassignment of tasks is not possible

Task assignment approach
• Basic idea: find an optimal assignment that achieves goals such as the following:
– Minimization of IPC costs
– Quick turnaround time of processes
– High degree of parallelism
– Efficient utilization of resources

Dispatching Algorithms
Strategies to select the target server:
• Static: the fastest solution, preventing the switch from becoming a bottleneck, but it does not consider the current state of the servers
• Dynamic: outperforms static algorithms by making informed decisions, but collecting and analyzing state information causes expensive overhead
Requirements: (1) low computational complexity, (2) full compatibility with existing web standards, (3) state information must be readily available without much overhead

Content blind approach
• Static policies:
Random
distributes incoming requests uniformly, with equal probability of reaching any server
Round Robin (RR)
uses a circular list and a pointer to the last selected server to make the decision
Static Weighted RR (for heterogeneous servers)
a variation of RR in which each server is assigned a weight Wi depending on its capacity
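As an illustration, here is a minimal sketch (not from the slides) of static weighted round-robin: the weight Wi is realized by giving each server Wi consecutive slots in the circular list. The server names and weights are made up for the example.

```python
from itertools import cycle

def weighted_rr(servers):
    """servers: list of (name, weight) pairs.
    Yields server names in a static weighted round-robin order."""
    # Expand each server into `weight` slots of the circular list.
    slots = [name for name, w in servers for _ in range(w)]
    return cycle(slots)

picker = weighted_rr([("s1", 3), ("s2", 1)])  # s1 has 3x s2's capacity
for _ in range(8):
    print(next(picker))   # s1 s1 s1 s2 s1 s1 s1 s2
```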

Content blind approach
• Dynamic policies:
Client state aware
statically partition the server nodes and assign groups of clients, identified through client information such as the source IP address
Server state aware
Least Loaded: the server with the lowest load. Issue: which server load index to use?
Least Connections: the server with the fewest active connections first
Fastest Response: the server with the fastest response time
Weighted Round Robin: a variation of static RR that associates each server with a dynamically evaluated weight proportional to the server load
• Client and server state aware
Client affinity
instead of assigning each new connection to a server only on the basis of the server state, regardless of any past assignment, consecutive connections from the same client can be assigned to the same server
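A minimal sketch (my own, not from the slides) of the least-connections policy: the dispatcher tracks the number of active connections per server and picks the minimum. The connection counts shown are illustrative.

```python
def least_connections(active):
    """active: dict mapping server name -> current number of
    active connections. Returns the server with the fewest."""
    return min(active, key=active.get)

active = {"s1": 12, "s2": 7, "s3": 9}
target = least_connections(active)
active[target] += 1          # dispatcher assigns the new connection
print(target)                # s2
```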

Considerations of content blind approaches
• The static approach is the fastest and easy to implement, but it may make poor assignment decisions
• The dynamic approach has the potential to make better decisions, but it needs to collect and analyze state information, which may cause high overhead

Content aware approach
[Figure: taxonomy of content-aware dispatching, split into server state aware and client state aware policies]

Content aware approach
• Server state aware
Cache Affinity
the file space is partitioned among the server nodes
Load Sharing
. SITA-E (Size Interval Task Assignment with Equal load): the switch determines the size of the requested file and selects the target server based on this information
. CAP (Client-Aware Policy): web requests are classified based on their impact on system resources, such as I/O bound or CPU bound
• Client state aware
Service Partitioning
employ specialized servers for certain types of requests
Client Affinity
use a session identifier to assign all web transactions from the same client to the same server

Content aware approach
• Client and server state aware
LARD (Locality-Aware Request Distribution)
direct all requests for the same web object to the same server node as long as its utilization is below a given threshold
Cache Manager
a cache manager that is aware of the cache content of all web servers

Cloud Scenario
• An example

Example - IaaS, Step 1
Research and commercial clouds made available with some cloud-like interface.

Example - IaaS, Step 2
User submits to a Condor job scheduler that has no resources attached to it.

Example - IaaS, Step 3
Cloud Scheduler detects that there are waiting jobs in the Condor queues and then makes requests to boot VMs that match the job requirements.

Example - IaaS, Step 4
The VMs boot, attach themselves to the Condor queues and begin draining jobs. Once no more jobs require the VMs, Cloud Scheduler shuts them down.

Example - IaaS
1. A user submits a job to a job scheduler
2. The job sits idle in the queue, because there are no resources yet
3. Cloud Scheduler examines the queue and determines that there are jobs without resources
4. Cloud Scheduler starts VMs on IaaS clusters
5. These VMs advertise themselves to the job scheduler
6. The job scheduler sees these VMs and starts running jobs on them
7. Once all of the jobs are done, Cloud Scheduler shuts down the VMs

Cloud Scheduler Goals
• Don't replicate existing functionality
• Be able to use existing IaaS and job scheduler software together, today
• Users should be able to use the familiar HTC tools
• Support VM creation on Nimbus, OpenNebula, Eucalyptus, and EC2, i.e. all IaaS resource types people are likely to encounter
• Scheduling adequate to be useful to our users
• Simple architecture

Queuing-based Scheduling
• FCFS
• SJF
• Backfilling
• Gang Scheduling, etc.

First Come First Serve (FCFS)
[Figure: processors-vs-time chart of an FCFS schedule; jobs start in arrival order, and queued jobs wait until enough processors are free]

Tennis Court Scheduling
[Figure: processors-vs-time chart showing later jobs slipped into idle processor slots left by earlier jobs]

EASY Backfilling
• Allow backfills when the projected start of the first job in the queue is not delayed
• No starvation: all jobs will eventually run
• Claim: "Jobs in the queue are never delayed from running by jobs submitted to the queue after them."
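To make the rule concrete, here is a simplified sketch (my own, not from the slides) of one EASY backfilling pass: compute the "shadow time" at which the first queued job can start, plus the processors it will leave over, and backfill any later job that either finishes before the shadow time or fits into those leftover processors. The job-tuple layout is illustrative.

```python
def easy_backfill_pass(running, queue, total_procs, now=0.0):
    """One EASY backfilling decision pass (simplified sketch).

    running: list of (end_time, procs) of currently running jobs.
    queue:   FIFO list of (job_id, est_runtime, procs).
    Returns the job_ids that may start now without delaying the
    projected start of the first queued job.
    Assumes the head job fits on the machine (procs <= total_procs).
    """
    free = total_procs - sum(p for _, p in running)
    if not queue:
        return []
    head_id, _, head_procs = queue[0]
    if head_procs <= free:
        return [head_id]                # the head job starts right away

    # Shadow time: earliest time enough processors free up for the head.
    avail = free
    for end_time, procs in sorted(running):
        avail += procs
        if avail >= head_procs:
            shadow = end_time
            extra = avail - head_procs  # processors the head won't need
            break

    started = []
    for job_id, runtime, procs in queue[1:]:
        if procs > free:
            continue                    # does not fit right now
        if now + runtime <= shadow:     # finishes before the head starts
            started.append(job_id)
            free -= procs
        elif procs <= extra:            # runs past shadow, but only uses
            started.append(job_id)      # processors left over for the head
            free -= procs
            extra -= procs
    return started

# 10 processors; a 6-proc job ends at t=5; the 8-proc head must wait.
print(easy_backfill_pass([(5, 6)], [("A", 9, 8), ("B", 2, 3), ("C", 9, 4)], 10))
# ['B']  -- B finishes before the head's projected start at t=5
```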

Conservative Backfilling
• Allow backfills when the projected starts of all preceding jobs in the queue are not delayed
• Worst-case start time guaranteed at submittal
• Claim: "guarantees that future arrivals do not delay previously queued jobs."

Backfilling Variants
• Dynamic Backfilling / Slack-based Backfilling
– Overrule a previous reservation if introducing a slight delay will improve utilization considerably
– Each job in the queue is associated with a slack, the maximum delay after its reservation
– Important jobs will have little slack
– Backfilling is allowed only if the backfilled job does not delay any other job by more than that job's slack
• Multiple-Queue Backfilling
– Each job is assigned to a queue according to its expected execution time
– Each queue is assigned to a disjoint partition of the parallel system on which only jobs from this queue can be executed
– Reduces the likelihood that short jobs get delayed in the queue behind long jobs

Scheduling Heuristics
• Min-Min
– Start with a list of unmapped tasks, U
– Determine the set of minimum completion times for U
– Choose the task with the minimum of the minimum completion times and assign it to the machine that provides that completion time
– The newly mapped task is removed from U and the process is repeated
– Theme: map as many tasks as possible to their first choice of machine
– Since short tasks are mapped first, the percentage of tasks allocated to their first choice is high
• Max-Min
– Start with a list of unmapped tasks, U
– Determine the set of minimum completion times for U
– Choose the task with the maximum of the minimum completion times and assign it to the machine that provides the minimum completion time
– The newly mapped task is removed from U and the process is repeated
– Avoids starvation of long tasks
– Long tasks execute concurrently with short tasks
– Better machine utilization
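A compact sketch of both heuristics (my own rendering of the steps above): `etc[t][m]` holds the estimated time to compute task t on machine m, and `ready[m]` the time at which machine m becomes free. The example matrix is illustrative.

```python
def min_min(etc, num_machines, use_max=False):
    """Min-Min (or Max-Min if use_max=True) task mapping.

    etc: dict task -> list of execution times on each machine.
    Returns a dict task -> machine and the resulting makespan.
    """
    ready = [0.0] * num_machines        # when each machine frees up
    unmapped = set(etc)
    mapping = {}
    while unmapped:
        # Minimum completion time (and best machine) for each task.
        best = {
            t: min((ready[m] + etc[t][m], m) for m in range(num_machines))
            for t in unmapped
        }
        pick = max if use_max else min  # Max-Min vs. Min-Min choice
        task = pick(unmapped, key=lambda t: best[t][0])
        ct, machine = best[task]
        mapping[task] = machine
        ready[machine] = ct
        unmapped.remove(task)
    return mapping, max(ready)

etc = {"t1": [4, 6], "t2": [3, 5], "t3": [9, 2]}
print(min_min(etc, 2))                  # Min-Min
print(min_min(etc, 2, use_max=True))    # Max-Min
```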

Scheduling Heuristics
• Genetic Algorithm (GA)
• General steps of a GA:

GA
• Operates on chromosomes. A chromosome represents a mapping of tasks to machines: a vector of size t
• Initial population: n chromosomes, randomly generated
• Evaluation: the initial population is evaluated based on a fitness value (makespan)
• Selection:
– Roulette wheel: probabilistically generate a new population, biased toward better mappings, from the previous population
– Elitism: guarantee that the best (fittest) solution is carried forward

GA - Roulette wheel scheme

Chromosome                1      2      3      4
Score                     4      10     14     2
Selection probability     0.13   0.33   0.47   0.07

Select a random number, r, between 0 and 1. Progressively add the probabilities until the running sum is greater than r; the chromosome at which this happens is selected.

GA
• Crossover
– Choose pairs of chromosomes
– For every pair:
• Choose a random point
• Exchange machine assignments from that point to the end of the chromosome
• Mutation. For every chromosome:
– Randomly select a task
– Randomly reassign it to a new machine
• Evaluation
• Stopping criterion:
– Either m iterations, or
– No change in the elite chromosome
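A small sketch of roulette-wheel selection matching the table above; the scores 4, 10, 14, 2 come from the example, everything else is illustrative.

```python
import random

def roulette_select(scores):
    """Pick an index with probability proportional to its score."""
    total = sum(scores)
    r = random.random()          # r in [0, 1)
    cumulative = 0.0
    for i, s in enumerate(scores):
        cumulative += s / total  # running sum of probabilities
        if cumulative > r:
            return i
    return len(scores) - 1       # guard against float rounding

scores = [4, 10, 14, 2]          # chromosome scores from the table
picks = [roulette_select(scores) for _ in range(10000)]
print([picks.count(i) / 10000 for i in range(4)])  # near 0.13 0.33 0.47 0.07
```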

Real-Time Based Scheduling
• Deadlines … EDF (Earliest Deadline First)
• Earliest Finish Time (EFT)
• SLA based
• Users: a wide range of constraints and deadlines

DAG Scheduling
• Map all tasks in a DAG to computing resources
– Computation time
– Data dependencies and transfer costs
• Often done statically
– Assumes deterministic behavior of applications and machines
– Batch operation
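A minimal EDF sketch (not from the slides): among the jobs that have arrived, always run the one with the earliest deadline. The job tuples are illustrative, and this non-preemptive version is a simplification.

```python
import heapq

def edf_order(jobs):
    """Non-preemptive EDF. jobs: list of
    (arrival, runtime, deadline, job_id). Returns execution order."""
    jobs = sorted(jobs)                  # by arrival time
    ready, order, t, i = [], [], 0.0, 0
    while i < len(jobs) or ready:
        while i < len(jobs) and jobs[i][0] <= t:
            a, rt, dl, jid = jobs[i]
            heapq.heappush(ready, (dl, rt, jid))   # keyed by deadline
            i += 1
        if not ready:                    # idle until the next arrival
            t = jobs[i][0]
            continue
        dl, rt, jid = heapq.heappop(ready)
        order.append(jid)
        t += rt
    return order

print(edf_order([(0, 5, 20, "A"), (1, 2, 6, "B"), (2, 3, 9, "C")]))
# ['A', 'B', 'C']
```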

Workflow Scheduling

Best Effort: HEFT (Heterogeneous Earliest Finish Time)
[Figures over several slides: a worked HEFT example on a task DAG, covering data and compute estimates, critical path estimates, and the resulting schedule]
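Since the worked example lives in the figures, here is a compact sketch of HEFT (a simplification: average costs for ranking, no insertion-based gap filling). `w[t][p]` is the compute cost of task t on processor p and `c[(u, v)]` the data-transfer cost of edge (u, v); the tiny DAG at the bottom is illustrative.

```python
def heft(tasks, succ, w, c, num_procs):
    """Simplified HEFT.

    tasks: task ids; succ: task -> list of successor tasks;
    w: task -> list of compute costs per processor;
    c: (u, v) -> communication cost if u and v run on
       different processors.
    Returns task -> (proc, start, finish).
    """
    # Upward rank: average compute cost plus the most expensive
    # path (communication + rank) through any successor.
    rank = {}
    def upward(t):
        if t not in rank:
            avg = sum(w[t]) / len(w[t])
            rank[t] = avg + max(
                (c[(t, s)] + upward(s) for s in succ.get(t, [])),
                default=0.0)
        return rank[t]
    for t in tasks:
        upward(t)

    preds = {t: [u for u in tasks if t in succ.get(u, [])] for t in tasks}
    ready = [0.0] * num_procs
    sched = {}
    for t in sorted(tasks, key=lambda t: -rank[t]):  # decreasing rank
        best = None
        for p in range(num_procs):
            # Data from predecessors on other processors must arrive first.
            est = max([ready[p]] + [
                sched[u][2] + (0.0 if sched[u][0] == p else c[(u, t)])
                for u in preds[t]])
            eft = est + w[t][p]
            if best is None or eft < best[2]:
                best = (p, est, eft)
        sched[t] = best
        ready[best[0]] = best[2]
    return sched

# Tiny example: t1 -> {t2, t3}, two processors.
tasks = ["t1", "t2", "t3"]
succ = {"t1": ["t2", "t3"]}
w = {"t1": [2, 3], "t2": [4, 2], "t3": [3, 3]}
c = {("t1", "t2"): 1.0, ("t1", "t3"): 2.0}
print(heft(tasks, succ, w, c, 2))
```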

Load-balancing approach
Taxonomy of load-balancing algorithms:
• Static
– Deterministic
– Probabilistic
• Dynamic
– Centralized
– Distributed
• Cooperative
• Noncooperative

Types of load-balancing algorithms
• Static versus Dynamic
– Static algorithms use only information about the average behavior of the system
– Static algorithms ignore the current state or load of the nodes in the system
– Dynamic algorithms collect state information and react to changes in the system state
– Static algorithms are much simpler
– Dynamic algorithms are able to give significantly better performance

Load-balancing approach
Types of static load-balancing algorithms
• Deterministic versus Probabilistic
– Deterministic algorithms use information about the properties of the nodes and the characteristics of the processes to be scheduled
– Probabilistic algorithms use information about static attributes of the system (e.g. number of nodes, processing capability, topology) to formulate simple process placement rules
– The deterministic approach is difficult to optimize
– The probabilistic approach has poor performance

Types of dynamic load-balancing algorithms
• Centralized versus Distributed
– The centralized approach collects information at a server node, which makes the assignment decisions
– The distributed approach contains entities that make decisions on a predefined set of nodes
– Centralized algorithms can make efficient decisions but have lower fault tolerance
– Distributed algorithms avoid the bottleneck of collecting state information and react faster

Load-balancing approach
Types of distributed load-balancing algorithms
• Cooperative versus Noncooperative
– In noncooperative algorithms, entities act autonomously and make scheduling decisions independently of other entities
– In cooperative algorithms, distributed entities cooperate with each other
– Cooperative algorithms are more complex and involve larger overhead
– The stability of cooperative algorithms is better

Issues in designing load-balancing algorithms
• Load estimation policy
– determines how to estimate the workload of a node
• Process transfer policy
– determines whether to execute a process locally or remotely
• State information exchange policy
– determines how to exchange load information among nodes
• Location policy
– determines to which node a transferable process should be sent
• Priority assignment policy
– determines the priority of execution of local and remote processes
• Migration limiting policy
– determines the total number of times a process can migrate

1. Load estimation policy
for load-balancing algorithms
• To balance the workload on all the nodes of the system, it is necessary to decide how to measure the workload of a particular node
• Some measurable parameters (with time- and node-dependent factors) can be the following:
– Total number of processes on the node
– Resource demands of these processes
– Instruction mixes of these processes
– Architecture and speed of the node's processor
• Several load-balancing algorithms use the total number of processes for the sake of efficiency
• In some cases the true load could vary widely depending on the remaining service time, which can be estimated in several ways:
– Memoryless method: assumes that all processes have the same expected remaining service time, independent of the time used so far
– Past repeats: assumes that the remaining service time is equal to the time used so far
– Distribution method: if the distribution of service times is known, the process's remaining service time is the expected remaining time conditioned on the time already used

1. Load estimation policy
for load-balancing algorithms
• None of the previous methods works well in modern systems, because of periodically running processes and daemons
• An acceptable load estimation policy in these systems is to measure the CPU utilization of the nodes
• CPU utilization is defined as the number of CPU cycles actually executed per unit of real time
• It can be measured by setting up a timer that periodically checks the CPU state (idle/busy)

2. Process transfer policy
for load-balancing algorithms
• Most algorithms use a threshold policy to decide whether a node is lightly loaded or heavily loaded
• The threshold value is a limiting value of a node's workload, which can be determined by
– Static policy: a predefined threshold value for each node, depending on its processing capability
– Dynamic policy: the threshold value is calculated from the average workload and a predefined constant
• Below the threshold value a node accepts processes to execute; above the threshold value a node tries to transfer processes to a lightly loaded node

2. Process transfer policy
for load-balancing algorithms
• A single-threshold policy may lead to an unstable algorithm, because an underloaded node can turn out to be overloaded right after a process migration
• To reduce instability, a double-threshold policy has been proposed, also known as the high-low policy

[Figure: the single-threshold policy splits a node's load into overloaded/underloaded regions around one threshold; the double-threshold policy uses a high mark and a low mark, adding a normal region in between]

• Double-threshold policy
– When a node is in the overloaded region, new local processes are sent to run remotely, and requests to accept remote processes are rejected
– When a node is in the normal region, new local processes run locally, and requests to accept remote processes are rejected
– When a node is in the underloaded region, new local processes run locally, and requests to accept remote processes are accepted
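A small sketch of the high-low policy's decision rule (my own, not from the slides); the load index and mark values are illustrative.

```python
def double_threshold_action(load, low_mark, high_mark):
    """Return (run_new_local_processes_here, accept_remote) for a
    node under the high-low (double-threshold) policy."""
    if load > high_mark:        # overloaded region
        return False, False     # ship new local work out, reject remote
    if load >= low_mark:        # normal region
        return True, False      # keep local work, reject remote
    return True, True           # underloaded: keep local, accept remote

print(double_threshold_action(load=3, low_mark=2, high_mark=5))  # (True, False)
```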

3. Location policy
for load-balancing algorithms
• Threshold method
– Select a random node and check whether it is able to receive the process, then transfer the process. If the node rejects the transfer, select another node at random. This continues until the probe limit is reached.
• Shortest method
– L distinct nodes are chosen at random and each is polled to determine its load. The process is transferred to the node with the minimum load value, unless that node's workload prohibits accepting the process.
– A simple improvement is to discontinue probing whenever a node with zero load is encountered.
• Bidding method
– Nodes contain managers (to send processes) and contractors (to receive processes)
– Managers broadcast a request for bids; contractors respond with bids (prices based on the capacity of the contractor node) and the manager selects the best offer
– The winning contractor is notified and asked whether it accepts the process for execution or not
– Full autonomy for the nodes regarding scheduling
– Big communication overhead
– Difficult to decide on a good pricing policy
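A sketch of the shortest method (illustrative; `poll_load` stands in for whatever state-exchange mechanism the system uses and is hypothetical).

```python
import random

def shortest_method(nodes, poll_load, L, capacity):
    """Pick a target among L randomly chosen nodes.

    poll_load: function node -> current load of that node.
    capacity:  a node whose load is at or above this value
               refuses the process.
    Returns the chosen node, or None if all candidates refuse.
    """
    candidates = random.sample(nodes, L)
    best, best_load = None, None
    for node in candidates:
        load = poll_load(node)
        if load == 0:            # improvement: stop at an idle node
            return node
        if load < capacity and (best is None or load < best_load):
            best, best_load = node, load
    return best

loads = {"n1": 4, "n2": 1, "n3": 7, "n4": 2}
print(shortest_method(list(loads), loads.get, L=3, capacity=5))
```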

3. Location policy
for load-balancing algorithms
• Pairing
– Contrary to the former methods, the pairing policy reduces the variance of load only between pairs of nodes
– Each node asks some randomly chosen node to form a pair with it
– If it receives a rejection, it randomly selects another node and tries to pair again
– Two nodes that differ greatly in load are temporarily paired with each other, and migration starts
– The pair is broken as soon as the migration is over
– A node only tries to find a partner if it has at least two processes

4. State information exchange policy
for load-balancing algorithms
• Dynamic policies require frequent exchange of state information, but these extra messages have two opposing effects:
– Increasing the number of messages enables more accurate scheduling decisions
– Increasing the number of messages raises the queuing time of messages
• State information policies can be the following:
– Periodic broadcast
– Broadcast when state changes
– On-demand exchange
– Exchange by polling

4. State information exchange policy
for load-balancing algorithms
• Periodic broadcast
– Each node broadcasts its state information after every T units of time have elapsed
– Problem: heavy traffic, fruitless messages, and poor scalability, since the information exchange is too large for networks with many nodes
• Broadcast when state changes
– Avoids fruitless messages by broadcasting the state only when a process arrives or departs
– A further improvement is to broadcast only when the state switches to another region (double-threshold policy)
• On-demand exchange
– A node broadcasts a State-Information-Request message when its state switches from the normal region to either the underloaded or the overloaded region
– On receiving this message, other nodes reply with their own state information to the requesting node
– A further improvement is that only those nodes reply which are useful to the requesting node
• Exchange by polling
– To avoid the poor scalability of broadcast messages, a partner node is searched for by polling the other nodes one by one, until a poll limit is reached

5. Priority assignment policy
for load-balancing algorithms
• Selfish
– Local processes are given higher priority than remote processes
• Altruistic
– Remote processes are given higher priority than local processes
• Intermediate
– When the number of local processes is greater than or equal to the number of remote processes, local processes are given higher priority than remote processes; otherwise, remote processes are given higher priority than local processes

6. Migration limiting policy
for load-balancing algorithms
• This policy determines the total number of times a process can migrate
– Uncontrolled
• A remote process arriving at a node is treated just like a process originating at the node, so a process may be migrated any number of times
– Controlled
• Avoids the instability of the uncontrolled policy
• Uses a migration-count parameter to fix a limit on the number of times a process can migrate
• Irrevocable migration policy: the migration count is fixed at 1
• For long-running processes the migration count must be greater than 1, to adapt to dynamically changing states

Load-sharing approach
• Drawbacks of the load-balancing approach
– Attempting to equalize the workload on all the nodes is not an appropriate objective, since a large overhead is generated by gathering exact state information
– Load balancing in this sense is not achievable, since the number of processes in a node is always fluctuating and a temporal unbalance among the nodes exists at every moment
• Basic ideas of the load-sharing approach
– It is necessary and sufficient to prevent nodes from being idle while some other nodes have more than two processes
– Load sharing is much simpler than load balancing, since it only attempts to ensure that no node is idle while a heavily loaded node exists
– The priority assignment policy and the migration limiting policy are the same as for load-balancing algorithms

Load estimation policies
for load-sharing algorithms
• Since load-sharing algorithms simply attempt to avoid idle nodes, it is sufficient to know whether a node is busy or idle
• Thus these algorithms normally employ the simplest load estimation policy: counting the total number of processes
• In modern systems, where several processes may exist permanently on an otherwise idle node, algorithms measure CPU utilization to estimate the load of a node

Process transfer policies
for load-sharing algorithms
• Algorithms normally use an all-or-nothing strategy
• This strategy fixes the threshold value of all the nodes to 1
• A node becomes a receiver node when it has no process, and becomes a sender node when it has more than one process
• To avoid leaving the processing power of nodes with zero processes unused, some load-sharing algorithms use a threshold value of 2 instead of 1
• When CPU utilization is used as the load estimation policy, the double-threshold policy should be used as the process transfer policy

Location policies
for load-sharing algorithms
• The location policy decides whether the sender node or the receiver node of the process takes the initiative to search for a suitable node in the system; this policy can be one of the following:
– Sender-initiated location policy
• The sender node decides where to send the process
• Heavily loaded nodes search for lightly loaded nodes
– Receiver-initiated location policy
• The receiver node decides from where to get the process
• Lightly loaded nodes search for heavily loaded nodes

Location policies
for load-sharing algorithms
• Sender-initiated location policy
– When a node becomes overloaded, it either broadcasts or randomly probes the other nodes one by one to find a node that is able to receive remote processes
– When broadcasting, a suitable node is known as soon as a reply arrives
• Receiver-initiated location policy
– When a node becomes underloaded, it either broadcasts or randomly probes the other nodes one by one to indicate its willingness to receive remote processes
– The receiver-initiated policy requires a preemptive process migration facility, since scheduling decisions are usually made at process departure epochs
• Experiences with location policies
– Both policies give substantial performance advantages over the situation in which no load sharing is attempted
– The sender-initiated policy is preferable at light to moderate system loads
– The receiver-initiated policy is preferable at high system loads
– The sender-initiated policy provides better performance when the process transfer cost is significantly higher with receiver-initiated transfers than with sender-initiated ones, due to the preemptive transfer of processes

State information exchange policies
for load-sharing algorithms
• In load-sharing algorithms it is not necessary for the nodes to periodically exchange state information; a node needs to know the state of other nodes only when it is either underloaded or overloaded
• Broadcast when state changes
– In the sender-initiated/receiver-initiated location policy, a node broadcasts a State Information Request when it becomes overloaded/underloaded
– This is called the broadcast-when-idle policy when the receiver-initiated policy is used with a fixed threshold value of 1
• Poll when state changes
– In large networks a polling mechanism is used
– The polling mechanism randomly asks different nodes for state information until it finds an appropriate one or the probe limit is reached
– This is called the poll-when-idle policy when the receiver-initiated policy is used with a fixed threshold value of 1
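To close, a sketch of the poll-when-idle variant (my own, not from the slides): an idle node polls randomly chosen nodes, up to a probe limit, asking for work. `ask_for_process` stands in for the actual message exchange and is hypothetical.

```python
import random

def poll_when_idle(self_node, nodes, ask_for_process, probe_limit=5):
    """Receiver-initiated polling with a fixed threshold value of 1.

    ask_for_process: function node -> a process transferred from
    that node, or None if it has none to spare.
    Returns the acquired process, or None if the probe limit is hit.
    """
    others = [n for n in nodes if n != self_node]
    for node in random.sample(others, min(probe_limit, len(others))):
        process = ask_for_process(node)   # poll one node at a time
        if process is not None:
            return process                # found a sender; stop polling
    return None                           # stay idle until the next trigger
```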