Cloud Scheduling: Backfills
Scheduling: Distributed Computing in General, Cloud Computing in Particular

Job Scheduling
• Given:
  – n jobs submitted by users → placed in a queue
  – estimated execution time per job is known
  – m VMs available
• Scheduling problem:
  – assign jobs to VMs
  – reduce the overall execution time to process all jobs in the queue
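The slides do not fix a particular algorithm for this problem, so the following is only a minimal illustrative sketch: a greedy longest-processing-time-first assignment that always places the next job on the currently least-loaded VM, using the estimated execution times from the queue.

```python
import heapq

def greedy_schedule(job_times, m):
    """Assign jobs (given as estimated execution times) to m VMs by always
    placing the next-longest job on the least-loaded VM.
    Returns the per-VM job lists and the resulting makespan."""
    vms = [(0.0, i) for i in range(m)]          # min-heap of (current load, VM index)
    heapq.heapify(vms)
    assignment = [[] for _ in range(m)]
    # Handling the longest jobs first tends to balance the load better
    for job_id, t in sorted(enumerate(job_times), key=lambda x: -x[1]):
        load, i = heapq.heappop(vms)
        assignment[i].append(job_id)
        heapq.heappush(vms, (load + t, i))
    makespan = max(load for load, _ in vms)
    return assignment, makespan

# Hypothetical example: 6 queued jobs, 2 VMs
jobs = [5, 3, 8, 2, 7, 4]
print(greedy_schedule(jobs, 2))
```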
• Research and commercial clouds are made available with some cloud-like interface.
• The user submits to a Condor job scheduler that has no resources attached to it.
[Figure: processors-versus-time charts showing a job queue (Job 2, Job 3, Job 4, Job 7)]
• Multiple-Queue Backfilling
  – Each job is assigned to a queue according to its expected execution time (see the sketch below).
  – Each queue is assigned to a disjoint partition of the parallel system, on which only jobs from that queue can be executed.
  – This reduces the likelihood that short jobs get delayed in the queue behind long jobs.
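A minimal sketch of the queue-assignment step, assuming hypothetical "short"/"medium"/"long" cut-offs; the real class boundaries and the sizes of the disjoint processor partitions are system-specific.

```python
from collections import defaultdict

# Hypothetical cut-offs (seconds) separating the job classes
QUEUE_LIMITS = [("short", 60), ("medium", 3600), ("long", float("inf"))]

def assign_to_queue(expected_runtime):
    """Route a job to the queue matching its expected execution time."""
    for name, limit in QUEUE_LIMITS:
        if expected_runtime <= limit:
            return name

def partition_jobs(jobs):
    """jobs: list of (job_id, expected_runtime). Returns queue name -> job list.
    Each queue is served by its own disjoint partition of processors, so short
    jobs never wait behind long jobs from another queue."""
    queues = defaultdict(list)
    for job_id, runtime in jobs:
        queues[assign_to_queue(runtime)].append(job_id)
    return dict(queues)

print(partition_jobs([("a", 30), ("b", 5000), ("c", 600)]))
```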
Scheduling Heuristics
• Genetic Algorithm
• General steps of GA

GA
• Operates with chromosomes. A chromosome represents a mapping of tasks to machines: a vector of size t.
• Initial population – n chromosomes randomly generated
• Evaluation – the initial population is evaluated based on a fitness value (makespan)
• Selection –
  – Roulette wheel – probabilistically generate a new population, with better mappings, from the previous population
  – Elitism – guarantee that the best (fittest) solution is carried forward
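A minimal sketch of these GA steps, assuming t tasks with known execution times on m machines, a chromosome encoded as a length-t vector of machine indices, makespan as the fitness to minimize, roulette-wheel selection, and elitism. The crossover and mutation operators here are simple placeholders, not necessarily the ones intended in the lecture.

```python
import random

def makespan(chromosome, task_times, m):
    """Chromosome: vector of size t mapping task i -> machine chromosome[i]."""
    loads = [0.0] * m
    for task, machine in enumerate(chromosome):
        loads[machine] += task_times[task]
    return max(loads)

def roulette_select(population, fitness):
    # Lower makespan is better, so weight each mapping by the inverse of its fitness
    weights = [1.0 / f for f in fitness]
    return random.choices(population, weights=weights, k=1)[0]

def ga_schedule(task_times, m, pop_size=50, generations=200, mut_rate=0.1):
    t = len(task_times)
    # Initial population: pop_size chromosomes generated randomly
    population = [[random.randrange(m) for _ in range(t)] for _ in range(pop_size)]
    for _ in range(generations):
        fitness = [makespan(c, task_times, m) for c in population]
        best = population[fitness.index(min(fitness))]
        new_pop = [best[:]]                      # elitism: carry the fittest forward
        while len(new_pop) < pop_size:
            p1 = roulette_select(population, fitness)
            p2 = roulette_select(population, fitness)
            cut = random.randrange(1, t)         # one-point crossover
            child = p1[:cut] + p2[cut:]
            if random.random() < mut_rate:       # mutation: reassign one task
                child[random.randrange(t)] = random.randrange(m)
            new_pop.append(child)
        population = new_pop
    fitness = [makespan(c, task_times, m) for c in population]
    return min(zip(fitness, population))

# Hypothetical example: 7 tasks mapped onto 3 machines
print(ga_schedule([4, 2, 7, 1, 6, 3, 5], m=3))
```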
• To balance the workload on all the nodes of the system, it is necessary to decide how to measure the workload of a particular node.
• Some measurable parameters (with time- and node-dependent factors) can be the following:
  – total number of processes on the node
  – resource demands of these processes
  – instruction mixes of these processes
  – architecture and speed of the node's processor
• Several load-balancing algorithms use the total number of processes to achieve high efficiency.

• In some cases the true load can vary widely depending on the remaining service time, which can be estimated in several ways (sketched below):
  – Memoryless method – assumes that all processes have the same expected remaining service time, independent of the time used so far.
  – Past repeats – assumes that the remaining service time is equal to the time used so far.
  – Distribution method – states that if the distribution of service times is known, the process's remaining service time is the expected remaining time conditioned on the time already used.
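A minimal sketch of the three estimators, assuming each process exposes only the CPU time it has used so far and, for the distribution method, a hypothetical empirical sample of known service times.

```python
def memoryless(used_so_far, mean_service_time):
    """All processes are assumed to have the same expected remaining
    service time, regardless of how long they have already run."""
    return mean_service_time

def past_repeats(used_so_far):
    """The remaining service time is assumed equal to the time used so far."""
    return used_so_far

def distribution_method(used_so_far, service_time_samples):
    """With a known service-time distribution, the remaining time is the
    expected remaining time conditioned on the time already used."""
    longer = [s for s in service_time_samples if s > used_so_far]
    if not longer:
        return 0.0
    return sum(s - used_so_far for s in longer) / len(longer)

# Hypothetical empirical service-time samples (seconds)
samples = [1, 2, 2, 5, 10, 30, 60]
print(memoryless(4, mean_service_time=sum(samples) / len(samples)))
print(past_repeats(4))
print(distribution_method(4, samples))
```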
• None of the previous methods can be used in modern systems because of periodically running processes and daemons.
• An acceptable method for use as the load estimation policy in these systems is to measure the CPU utilization of the nodes.
• CPU utilization is defined as the number of CPU cycles actually executed per unit of real time.
• It can be measured by setting up a timer that periodically checks the CPU state (idle/busy).

• Most algorithms use a threshold policy to decide whether a node is lightly loaded or heavily loaded.
• The threshold value is a limiting value of the node's workload, which can be determined by:
  – Static policy: a predefined threshold value for each node, depending on its processing capability.
  – Dynamic policy: the threshold value is calculated from the average workload and a predefined constant.
• Below the threshold value a node accepts processes to execute; above the threshold value a node tries to transfer processes to a lightly loaded node. A sketch of both ideas follows.
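A minimal sketch of timer-based CPU utilization estimation and the two threshold variants. The is_cpu_busy probe, the sampling interval, and the constants are hypothetical placeholders, not part of the lecture material.

```python
import random
import time

def is_cpu_busy():
    """Hypothetical probe of the CPU state (idle/busy); a real system would
    read this from the OS, here it is randomized for illustration."""
    return random.random() < 0.6

def sample_cpu_utilization(samples=100, interval=0.01):
    """Periodically check the CPU state and report the busy fraction, an
    estimate of cycles actually executed per unit of real time."""
    busy = 0
    for _ in range(samples):
        busy += is_cpu_busy()
        time.sleep(interval)
    return busy / samples

def static_threshold():
    return 0.8            # predefined per-node value (assumed here)

def dynamic_threshold(node_loads, c=0.1):
    """Threshold derived from the average workload plus a predefined constant."""
    return sum(node_loads) / len(node_loads) + c

load = sample_cpu_utilization(samples=20)
threshold = dynamic_threshold([0.3, 0.9, 0.5])
print("accept new processes" if load < threshold else "try to transfer processes")
```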
• Algorithms normally use an all-or-nothing strategy.
• This strategy fixes the threshold value of all nodes to 1 process.
• A node becomes a receiver node when it has no processes, and becomes a sender node when it has more than one process.
• To avoid wasting the processing power of nodes that have zero processes, load-sharing algorithms use a threshold value of 2 instead of 1.
• When CPU utilization is used as the load estimation policy, a double-threshold policy should be used as the process transfer policy.

• The location policy decides whether the sender node or the receiver node of a process takes the initiative to search for a suitable node in the system. It can be one of the following (sketched below):
  – Sender-initiated location policy
    • The sender node decides where to send the process.
    • Heavily loaded nodes search for lightly loaded nodes.
  – Receiver-initiated location policy
    • The receiver node decides from where to get the process.
    • Lightly loaded nodes search for heavily loaded nodes.
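A minimal sketch combining these ideas, assuming hypothetical low/high marks on CPU utilization for the double-threshold transfer policy and a simple dictionary of node loads for the two location policies.

```python
LOW, HIGH = 0.3, 0.8   # assumed double-threshold marks on CPU utilization

def node_state(utilization):
    """Classify a node under the double-threshold transfer policy."""
    if utilization < LOW:
        return "receiver"      # lightly loaded: willing to accept processes
    if utilization > HIGH:
        return "sender"        # heavily loaded: tries to transfer processes
    return "normal"            # keeps its own processes

def sender_initiated(nodes):
    """Heavily loaded nodes search for lightly loaded nodes."""
    receivers = [n for n, u in nodes.items() if node_state(u) == "receiver"]
    return {n: receivers for n, u in nodes.items() if node_state(u) == "sender"}

def receiver_initiated(nodes):
    """Lightly loaded nodes search for heavily loaded nodes."""
    senders = [n for n, u in nodes.items() if node_state(u) == "sender"]
    return {n: senders for n, u in nodes.items() if node_state(u) == "receiver"}

nodes = {"A": 0.95, "B": 0.1, "C": 0.5}
print(sender_initiated(nodes))    # A searches for B
print(receiver_initiated(nodes))  # B searches for A
```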