
The Forgotten Factor: Facts on Performance Evaluation
and its Dependence on Workloads

Dror Feitelson
Hebrew University
Performance Evaluation

• In system design
– Selection of algorithms
– Setting parameter values
• In procurement decisions
– Value for money
– Meet usage goals
• For capacity planning
The Good Old Days…
• The skies were blue
• The simulation results were conclusive
• Our scheme was better than theirs
Feitelson & Jette, JSSPP 1997
But in their papers,
their scheme was better than ours!

How could they be so wrong?
Performance evaluation depends on:
• The system’s design
(What we teach in algorithms and data structures)
• Its implementation
(What we teach in programming courses)
• The workload to which it is subjected
• The metric used in the evaluation
• Interactions between these factors
Outline for Today

• Three examples of how workloads affect performance evaluation
• Workload modeling
• Research agenda
In the context of parallel job scheduling
Example #1

Gang Scheduling and Job Size Distribution
Gang What?!?
Time slicing parallel jobs with coordinated context switching

Ousterhout matrix

Optimization: Alternative scheduling

Ousterhout, ICDCS 1982
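To make the matrix concrete, here is a toy Python sketch (my own illustration, not Ousterhout's formulation): rows are time slots, columns are processors, and all threads of a job sit in one row, so a coordinated context switch at each slot boundary runs them together. Alternative scheduling corresponds to also running a job in another slot whenever its processors happen to be idle there.

```python
# Toy Ousterhout matrix: rows are time slots, columns are processors.
# Cell (slot, proc) names the job whose thread runs on that processor
# during that slot; all threads of a job share one row, so they are
# context-switched in and out together.
N_PROCS, N_SLOTS = 8, 3
matrix = [[None] * N_PROCS for _ in range(N_SLOTS)]

def place(job, size):
    """Put `job` (needing `size` processors) into the first slot with
    enough contiguous free columns; return True on success."""
    for row in matrix:
        for start in range(N_PROCS - size + 1):
            if all(row[p] is None for p in range(start, start + size)):
                for p in range(start, start + size):
                    row[p] = job
                return True
    return False

for job, size in [("A", 4), ("B", 4), ("C", 8), ("D", 2)]:
    place(job, size)

for slot, row in enumerate(matrix):
    print(f"slot {slot}: {row}")
```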
Packing Jobs
Use a buddy system for allocating processors

Feitelson & Rudolph, Computer 1990


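A minimal sketch of such an allocator, assuming a simple free-list-per-size representation (the interface is mine, not the paper's): requests are rounded up to a power of two and large blocks are split recursively, so processors are always handed out in predefined, aligned groups.

```python
# Minimal buddy-allocator sketch (illustrative, not the paper's code).
# free_blocks maps block size -> list of free block offsets.
def buddy_alloc(free_blocks, request):
    size = 1
    while size < request:
        size *= 2                    # round up: internal fragmentation
    s = size
    while not free_blocks.get(s):    # find a free block large enough
        s *= 2
        if s > max(free_blocks, default=0):
            return None
    start = free_blocks[s].pop()
    while s > size:                  # split, leaving the buddies free
        s //= 2
        free_blocks.setdefault(s, []).append(start + s)
    return (start, size)

free = {8: [0]}                      # one free block of 8 processors
print(buddy_alloc(free, 3))          # -> (0, 4): request rounded up to 4
print(buddy_alloc(free, 1))          # -> (4, 1): carved from a buddy block
```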
The Question:
• The buddy system leads to internal
fragmentation
• But it also improves the chances of
alternative scheduling, because processors
are allocated in predefined groups

Which effect dominates the other?


The Answer (part 1):

Feitelson & Rudolph, JPDC 1996


The Answer (part 2):
• Many small jobs
• Many sequential jobs
• Many power-of-two jobs
• Practically no jobs use the full machine

Conclusion: the buddy system should work well


Verification

Feitelson, JSSPP 1996


Example #2

Parallel Job Scheduling and Job Scaling
Variable Partitioning

• Each job gets a dedicated partition for the duration of its execution
• Resembles 2D bin packing
• Packing large jobs first should lead to better performance
• But what about correlation of size and runtime?
“Scan” Algorithm

• Keep jobs in separate queues according to size (sizes are powers of 2)
• Serve the queues Round Robin, scheduling all jobs from each queue (they pack perfectly)
• Assuming constant work model, large jobs only block the machine for a short time
Krueger et al., IEEE TPDS 1994
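A sketch of the queueing structure in Python (the class and method names are mine, not Krueger et al.'s): one queue per power-of-two size class, scanned round robin, with each class drained in full since equal-size jobs pack the machine perfectly.

```python
from collections import deque
from math import log2

# Sketch of the Scan queueing structure; assumes n_procs is a power of 2.
class ScanScheduler:
    def __init__(self, n_procs):
        self.n = n_procs
        self.queues = {1 << k: deque()
                       for k in range(int(log2(n_procs)) + 1)}
        self.current = 1

    def submit(self, size):
        rounded = 1 << (size - 1).bit_length()  # round up to 2^k
        self.queues[rounded].append(size)

    def next_batch(self):
        """Drain and return the next non-empty size class, scanning
        the classes in round-robin order."""
        for _ in self.queues:
            size, q = self.current, self.queues[self.current]
            self.current = size * 2 if size * 2 <= self.n else 1
            if q:
                batch = list(q)
                q.clear()
                return size, batch
        return None

sched = ScanScheduler(8)
for s in (1, 3, 3, 8, 2):
    sched.submit(s)
print(sched.next_batch())  # (1, [1]) -- then (2, [2]), (4, [3, 3]), ...
```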
Scaling Models
• Constant work
– Parallelism for speedup: Amdahl’s Law
– Large first ⇒ SJF
• Constant time
– Size and runtime are uncorrelated
• Memory bound
– Large first ⇒ LJF
– Full-size jobs lead to blockout

Worley, SIAM JSSC 1990
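The three models can be contrasted with a small numeric sketch (the formulas and constants below are illustrative simplifications of the models, not taken from Worley's paper):

```python
# Illustrative runtimes when a job grows from n0 to n processors.
def constant_work(t0, n0, n, serial=0.05):
    # Amdahl's Law: fixed total work, so larger jobs finish sooner.
    return t0 * (serial + (1 - serial) * n0 / n)

def constant_time(t0, n0, n):
    # Problem grows with n so runtime stays the same:
    # size and runtime are uncorrelated.
    return t0

def memory_bound(t0, n0, n, alpha=1.5):
    # Per-node memory is kept full; total work grows superlinearly
    # (~ n**alpha), so larger jobs run *longer*.
    return t0 * (n / n0) ** (alpha - 1)

for f in (constant_work, constant_time, memory_bound):
    print(f.__name__, round(f(100.0, 8, 64), 1))  # 16.9 / 100.0 / 282.8
```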


The Data

Data: SDSC Paragon, 1995/6
Conclusion
• Parallelism used for better results, not for
faster results
• Constant work model is unrealistic
• Memory bound model is reasonable
• Scan algorithm will probably not perform
well in practice
Example #3

Backfilling and User Runtime Estimation
Backfilling
• Variable partitioning can suffer from
external fragmentation
• Backfilling optimization: move jobs
forward to fill in holes in the schedule
• Requires knowledge of expected job
runtimes
Variants

• EASY backfilling
– Make reservation for first queued job
• Conservative backfilling
– Make reservation for all queued jobs
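As a rough Python sketch of one EASY scheduling pass (simplified; the data layout and names are mine): the blocked head job gets a reservation at the "shadow time", and later jobs backfill only if they would not delay it. Under conservative backfilling, the same reservation computation would be repeated for every queued job, not just the head.

```python
from dataclasses import dataclass

@dataclass
class Job:
    size: int        # processors requested
    estimate: float  # user's runtime estimate

def easy_pass(free, queue, running, now):
    """One EASY backfilling pass (illustrative sketch).
    `running` holds (expected_end, size) pairs; `queue` is FCFS."""
    started = []
    while queue and queue[0].size <= free:        # plain FCFS starts
        job = queue.pop(0)
        free -= job.size
        running.append((now + job.estimate, job.size))
        started.append(job)
    if not queue:
        return started
    head = queue[0]                               # blocked head job:
    avail, shadow, extra = free, now, 0           # compute its reservation
    for end, size in sorted(running):
        avail += size
        if avail >= head.size:
            shadow = end                          # "shadow time"
            extra = avail - head.size             # spare nodes at shadow
            break
    for job in list(queue[1:]):                   # try to backfill the rest
        fits_now = job.size <= free
        harmless = now + job.estimate <= shadow or job.size <= extra
        if fits_now and harmless:
            queue.remove(job)
            free -= job.size
            if job.size <= extra:
                extra -= job.size
            running.append((now + job.estimate, job.size))
            started.append(job)
    return started

q = [Job(64, 3600.0), Job(32, 600.0), Job(8, 120.0)]
print(easy_pass(free=40, queue=q, running=[(7200.0, 24)], now=0.0))
```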
User Runtime Estimates

• Lower estimates improve the chance of backfilling, and hence response time
• Estimates that are too low risk having the job killed
• So estimates should be accurate, right?
They Aren’t

Mu’alem & Feitelson, IEEE TPDS 2001


Surprising Consequences
• Inaccurate estimates actually lead to
improved performance
• Performance evaluation results may depend
on the accuracy of runtime estimates
– Example: EASY vs. conservative
– Using different workloads
– And different metrics
EASY vs. Conservative
Using CTC SP2 workload
EASY vs. Conservative
Using Jann workload model
EASY vs. Conservative
Using Feitelson workload model
Conflicting Results Explained
• Jann uses accurate runtime estimates
• This leads to a tighter schedule
• EASY is not affected too much
• Conservative manages less backfilling of long jobs, because it respects more reservations
⇒ Conservative is bad for the long jobs, but good for the short ones, whose reservations are respected

[Figure: example schedules under Conservative and EASY]
Conflicting Results Explained
• Response time sensitive to long jobs, which
favor EASY
• Slowdown sensitive to short jobs, which
favor conservative
• All this does not happen at CTC, because
estimates are so loose that backfill can
occur even under conservative
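The metric sensitivity is easy to see with a one-liner (the numbers are made up): the same wait inflates the slowdown of a short job far more than that of a long one.

```python
def slowdown(wait, run):
    # slowdown = response time / runtime
    return (wait + run) / run

print(slowdown(600, 60))     # 1-minute job waits 10 min -> slowdown 11.0
print(slowdown(600, 36000))  # 10-hour job waits 10 min -> slowdown ~1.02
```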
Verification
Run CTC workload with accurate estimates
But What About My Model?

It simply does not have such small long jobs
Workload Modeling
No Data
• Innovative, unprecedented systems
– Wireless
– Hand-held
• Use an educated guess
– Self similarity
– Heavy tails
– Zipf distribution
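For instance, a first-cut workload might simply draw from heavy-tailed and Zipf distributions; a sketch with NumPy (all parameters here are guesses, which is exactly the point):

```python
import numpy as np

rng = np.random.default_rng(0)
# Heavy-tailed "sizes": Pareto with shape 1.5 (illustrative guess)
sizes = (rng.pareto(1.5, size=10_000) + 1) * 1024
# Zipf "popularity": rank-frequency exponent 2.0 (illustrative guess)
ranks = rng.zipf(2.0, size=10_000)

print(sizes.mean(), np.median(sizes))  # mean >> median: heavy tail
print(np.bincount(ranks)[1:6])         # most mass on the top ranks
```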
Serendipitous Data

• Data may be collected for various reasons
– Accounting logs
– Audit logs
– Debugging logs
– Just-so logs
• Can lead to a wealth of information
NASA Ames iPSC/860 log
42050 jobs from Oct-Dec 1993

user      job     nodes  runtime  date      time
user4     cmd8       32       70  11/10/93  10:13:17
user4     cmd8       32       70  11/10/93  10:19:30
user42    nqs450     32     3300  11/10/93  10:22:07
user41    cmd342      4       54  11/10/93  10:22:37
sysadmin  pwd         1        6  11/10/93  10:22:42
user4     cmd8       32       60  11/10/93  10:25:42
sysadmin  pwd         1        3  11/10/93  10:30:43
user41    cmd342      4      126  11/10/93  10:31:32

Feitelson & Nitzberg, JSSPP 1995
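Such a log is trivial to mine; a sketch parsing the excerpt above (the field layout is assumed from the excerpt, not from a format specification):

```python
from collections import Counter

def parse_log(lines):
    # Fields: user, job, nodes, runtime (seconds), date, time
    jobs = []
    for line in lines:
        user, job, nodes, runtime, date, time = line.split()
        jobs.append({"user": user, "job": job,
                     "nodes": int(nodes), "runtime": int(runtime)})
    return jobs

sample = [
    "user4 cmd8 32 70 11/10/93 10:13:17",
    "user42 nqs450 32 3300 11/10/93 10:22:07",
    "sysadmin pwd 1 6 11/10/93 10:22:42",
]
jobs = parse_log(sample)
print(Counter(j["nodes"] for j in jobs))             # job-size distribution
print(sum(j["nodes"] * j["runtime"] for j in jobs))  # total node-seconds
```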
Distribution of Job Sizes
Distribution of Resource Use
Degree of Multiprogramming
System Utilization
Job Arrivals
Arriving Job Sizes
Distribution of Interarrival Times
Distribution of Runtimes
Job Scaling
User Activity
Repeated Execution
Application Moldability
Distribution of Run Lengths
Predictability in Repeated Runs
Research Agenda
The Needs
• New systems tend to be more complex
• Differences tend to be finer
• Evaluations require more detailed data
• Getting more data requires more work
• Important areas:
– Internal structure of applications
– User behavior
Generic Application Model
• Iterations of
– Compute
• Granularity
• Memory working set / locality
– I/O
• Interprocess locality
– Communicate
• Pattern, volume
• Option of phases with different patterns of iterations
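One way to write the model down is as a parameterized structure; a sketch in Python (all field names are mine, chosen to mirror the bullets above):

```python
from dataclasses import dataclass, field
from typing import List

# Sketch of the generic application model: a job is a sequence of
# phases, each iterating a compute / I/O / communicate cycle with
# its own parameters.
@dataclass
class Iteration:
    compute_granularity_us: float  # work between communications
    working_set_mb: float          # memory working set / locality
    io_bytes: int                  # I/O volume (interprocess locality)
    comm_pattern: str              # e.g. "nearest-neighbor"
    comm_volume_bytes: int

@dataclass
class Phase:
    iterations: int
    body: Iteration

@dataclass
class Application:
    phases: List[Phase] = field(default_factory=list)
```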
Consequences
• Model the interaction of the application with the system
– Support for communication pattern
– Availability of memory
⇒ Application attributes depend on the system
⇒ Effect of multi-resource schedulers
Missing Data
• There has been some work on the
characterization of specific applications
• There has been no work on the distribution
of application types in a complete workload
– Distribution of granularities
– Distribution of working set sizes
– Distribution of communication patterns
Effect of Users

• Workload is generated by users
• Human users do not behave like a random sampling process
– Feedback based on system performance
– Repetitive working patterns
Feedback
• User population is finite
• Users back off when performance is inadequate
⇒ Negative feedback
⇒ Better system stability
• Need to explicitly model this behavior
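A minimal sketch of such feedback, assuming (my assumption, for illustration) a think time that stretches with the response time the user just experienced:

```python
import random

def next_submit_time(finish_time, response_time, base_think=600.0):
    """When does the user submit again? (illustrative model)
    Bad response times stretch the think time, so load backs off:
    negative feedback that stabilizes the system."""
    think = base_think * max(1.0, response_time / base_think)
    return finish_time + random.expovariate(1.0 / think)
```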


Locality of Sampling
• Users display different levels of activity at
different times
• At any given time, only a small subset of
users is active
• These users repeatedly do the same thing
• Workload observed by system is not a
random sample from long-term distribution
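A sketch of the difference (the model form is my assumption): instead of drawing every job from the long-term distribution, activate a few users per day and let each repeat their characteristic job.

```python
import random

def day_of_jobs(users, active_fraction=0.05, max_repeats=20):
    """Generate one day's workload with locality of sampling."""
    n_active = max(1, int(len(users) * active_fraction))
    active = random.sample(users, n_active)
    day = []
    for u in active:
        # active users repeatedly do the same thing
        day += [u["typical_job"]] * random.randint(1, max_repeats)
    return day

users = [{"typical_job": f"app{i}"} for i in range(100)]
print(day_of_jobs(users)[:10])
```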
Final Words…

We like to think that we design systems based on solid foundations…

But beware: the foundations might be unfounded assumptions!
Computer Systems are Complex

We should have more “science” in computer science:
• Run experiments under different conditions
• Make measurements and observations
• Make predictions and verify them
Acknowledgements
• Students: Ahuva Mu’alem, David Talby,
Uri Lublin
• Larry Rudolph / MIT
• Data in the Parallel Workloads Archive
– Joefon Jann / IBM
– CTC SP2 log
– SDSC Paragon log
– SDSC SP2 log
– NASA iPSC/860 log
