
The Forgotten Factor: Facts on Performance Evaluation
and its Dependence on Workloads

Dror Feitelson
Hebrew University
Performance Evaluation

• In system design
– Selection of algorithms
– Setting parameter values
• In procurement decisions
– Value for money
– Meet usage goals
• For capacity planning
The Good Old Days…
• The skies were blue
• The simulation results were conclusive
• Our scheme was better than theirs
Feitelson & Jette, JSSPP 1997
But in their papers,
their scheme was better than ours!

How could they be so wrong?
Performance evaluation depends on:
• The system’s design
(What we teach in algorithms and data structures)
• Its implementation
(What we teach in programming courses)
• The workload to which it is subjected
• The metric used in the evaluation
• Interactions between these factors
Outline for Today

• Three examples of how workloads affect performance evaluation
• Workload modeling
• Research agenda
In the context of parallel job scheduling
Example #1

Gang Scheduling and Job Size Distribution
Gang What?!?
Time slicing parallel jobs with coordinated context switching

Ousterhout matrix

Optimization: Alternative scheduling

Ousterhout, ICDCS 1982
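To make the matrix concrete, here is a toy Python sketch (my own illustration, not Ousterhout's formulation): rows are time slots, columns are processors, and all threads of a job sit in one row, so a coordinated context switch at each slot boundary runs them together. Alternative scheduling corresponds to also running a job in another slot whenever its processors happen to be idle there.

```python
# Toy Ousterhout matrix: rows are time slots, columns are processors.
# Cell (slot, proc) names the job whose thread runs on that processor
# during that slot; all threads of a job share one row, so they are
# context-switched in and out together.
N_PROCS, N_SLOTS = 8, 3
matrix = [[None] * N_PROCS for _ in range(N_SLOTS)]

def place(job, size):
    """Put `job` (needing `size` processors) into the first slot with
    enough contiguous free columns; return True on success."""
    for row in matrix:
        for start in range(N_PROCS - size + 1):
            if all(row[p] is None for p in range(start, start + size)):
                for p in range(start, start + size):
                    row[p] = job
                return True
    return False

for job, size in [("A", 4), ("B", 4), ("C", 8), ("D", 2)]:
    place(job, size)

for slot, row in enumerate(matrix):
    print(f"slot {slot}: {row}")
```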
Packing Jobs
Use a buddy system for allocating processors

Feitelson & Rudolph, Computer 1990


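A minimal sketch of such an allocator, assuming a simple free-list-per-size representation (the interface is mine, not the paper's): requests are rounded up to a power of two and large blocks are split recursively, so processors are always handed out in predefined, aligned groups.

```python
# Minimal buddy-allocator sketch (illustrative, not the paper's code).
# free_blocks maps block size -> list of free block offsets.
def buddy_alloc(free_blocks, request):
    size = 1
    while size < request:
        size *= 2                    # round up: internal fragmentation
    s = size
    while not free_blocks.get(s):    # find a free block large enough
        s *= 2
        if s > max(free_blocks, default=0):
            return None
    start = free_blocks[s].pop()
    while s > size:                  # split, leaving the buddies free
        s //= 2
        free_blocks.setdefault(s, []).append(start + s)
    return (start, size)

free = {8: [0]}                      # one free block of 8 processors
print(buddy_alloc(free, 3))          # -> (0, 4): request rounded up to 4
print(buddy_alloc(free, 1))          # -> (4, 1): carved from a buddy block
```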
The Question:
• The buddy system leads to internal
fragmentation
• But it also improves the chances of
alternative scheduling, because processors
are allocated in predefined groups

Which effect dominates the other?


The Answer (part 1):

Feitelson & Rudolph, JPDC 1996


The Answer (part 2):
• Many small jobs
• Many sequential jobs
• Many power-of-two jobs
• Practically no jobs use the full machine

Conclusion: the buddy system should work well


Verification

Feitelson, JSSPP 1996


Example #2

Parallel Job Scheduling and Job Scaling
Variable Partitioning

• Each job gets a dedicated partition for the duration of its execution
• Resembles 2D bin packing
• Packing large jobs first should lead to better performance
• But what about correlation of size and runtime?
“Scan” Algorithm

• Keep jobs in separate queues according to size (sizes are powers of 2)
• Serve the queues Round Robin, scheduling all jobs from each queue (they pack perfectly)
• Assuming constant work model, large jobs only block the machine for a short time
Krueger et al., IEEE TPDS 1994
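A sketch of the queueing structure in Python (the class and method names are mine, not Krueger et al.'s): one queue per power-of-two size class, scanned round robin, with each class drained in full since equal-size jobs pack the machine perfectly.

```python
from collections import deque
from math import log2

# Sketch of the Scan queueing structure; assumes n_procs is a power of 2.
class ScanScheduler:
    def __init__(self, n_procs):
        self.n = n_procs
        self.queues = {1 << k: deque()
                       for k in range(int(log2(n_procs)) + 1)}
        self.current = 1

    def submit(self, size):
        rounded = 1 << (size - 1).bit_length()  # round up to 2^k
        self.queues[rounded].append(size)

    def next_batch(self):
        """Drain and return the next non-empty size class, scanning
        the classes in round-robin order."""
        for _ in self.queues:
            size, q = self.current, self.queues[self.current]
            self.current = size * 2 if size * 2 <= self.n else 1
            if q:
                batch = list(q)
                q.clear()
                return size, batch
        return None

sched = ScanScheduler(8)
for s in (1, 3, 3, 8, 2):
    sched.submit(s)
print(sched.next_batch())  # (1, [1]) -- then (2, [2]), (4, [3, 3]), ...
```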
Scaling Models
• Constant work
– Parallelism for speedup: Amdahl’s Law
– Large first ⇒ SJF
• Constant time
– Size and runtime are uncorrelated
• Memory bound
– Large first ⇒ LJF
– Full-size jobs lead to blockout

Worley, SIAM JSSC 1990
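The three models can be contrasted with a small numeric sketch (the formulas and constants below are illustrative simplifications of the models, not taken from Worley's paper):

```python
# Illustrative runtimes when a job grows from n0 to n processors.
def constant_work(t0, n0, n, serial=0.05):
    # Amdahl's Law: fixed total work, so larger jobs finish sooner.
    return t0 * (serial + (1 - serial) * n0 / n)

def constant_time(t0, n0, n):
    # Problem grows with n so runtime stays the same:
    # size and runtime are uncorrelated.
    return t0

def memory_bound(t0, n0, n, alpha=1.5):
    # Per-node memory is kept full; total work grows superlinearly
    # (~ n**alpha), so larger jobs run *longer*.
    return t0 * (n / n0) ** (alpha - 1)

for f in (constant_work, constant_time, memory_bound):
    print(f.__name__, round(f(100.0, 8, 64), 1))  # 16.9 / 100.0 / 282.8
```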


The Data

Data: SDSC Paragon, 1995/6
Conclusion
• Parallelism used for better results, not for
faster results
• Constant work model is unrealistic
• Memory bound model is reasonable
• Scan algorithm will probably not perform
well in practice
Example #3

Backfilling and User Runtime Estimation
Backfilling
• Variable partitioning can suffer from
external fragmentation
• Backfilling optimization: move jobs
forward to fill in holes in the schedule
• Requires knowledge of expected job
runtimes
Variants

• EASY backfilling
– Make reservation for first queued job
• Conservative backfilling
– Make reservation for all queued jobs
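As a rough Python sketch of one EASY scheduling pass (simplified; the data layout and names are mine): the blocked head job gets a reservation at the "shadow time", and later jobs backfill only if they would not delay it. Under conservative backfilling, the same reservation computation would be repeated for every queued job, not just the head.

```python
from dataclasses import dataclass

@dataclass
class Job:
    size: int        # processors requested
    estimate: float  # user's runtime estimate

def easy_pass(free, queue, running, now):
    """One EASY backfilling pass (illustrative sketch).
    `running` holds (expected_end, size) pairs; `queue` is FCFS."""
    started = []
    while queue and queue[0].size <= free:        # plain FCFS starts
        job = queue.pop(0)
        free -= job.size
        running.append((now + job.estimate, job.size))
        started.append(job)
    if not queue:
        return started
    head = queue[0]                               # blocked head job:
    avail, shadow, extra = free, now, 0           # compute its reservation
    for end, size in sorted(running):
        avail += size
        if avail >= head.size:
            shadow = end                          # "shadow time"
            extra = avail - head.size             # spare nodes at shadow
            break
    for job in list(queue[1:]):                   # try to backfill the rest
        fits_now = job.size <= free
        harmless = now + job.estimate <= shadow or job.size <= extra
        if fits_now and harmless:
            queue.remove(job)
            free -= job.size
            if job.size <= extra:
                extra -= job.size
            running.append((now + job.estimate, job.size))
            started.append(job)
    return started

q = [Job(64, 3600.0), Job(32, 600.0), Job(8, 120.0)]
print(easy_pass(free=40, queue=q, running=[(7200.0, 24)], now=0.0))
```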
User Runtime Estimates

• Lower estimates improve the chance of backfilling, and hence response time
• Estimates that are too low risk having the job killed
• So estimates should be accurate, right?
They Aren’t

Mu’alem & Feitelson, IEEE TPDS 2001


Surprising Consequences
• Inaccurate estimates actually lead to
improved performance
• Performance evaluation results may depend
on the accuracy of runtime estimates
– Example: EASY vs. conservative
– Using different workloads
– And different metrics
EASY vs. Conservative
Using CTC SP2 workload
EASY vs. Conservative
Using Jann workload model
EASY vs. Conservative
Using Feitelson workload model
Conflicting Results Explained
• Jann uses accurate runtime estimates
• This leads to a tighter schedule
• EASY is not affected too much
• Conservative manages less backfilling of long jobs, because it respects more reservations
⇒ Conservative is bad for the long jobs, but good for the short ones, whose reservations are respected

[Figure: example schedules under Conservative and EASY]
Conflicting Results Explained
• Response time sensitive to long jobs, which
favor EASY
• Slowdown sensitive to short jobs, which
favor conservative
• All this does not happen at CTC, because
estimates are so loose that backfill can
occur even under conservative
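The metric sensitivity is easy to see with a one-liner (the numbers are made up): the same wait inflates the slowdown of a short job far more than that of a long one.

```python
def slowdown(wait, run):
    # slowdown = response time / runtime
    return (wait + run) / run

print(slowdown(600, 60))     # 1-minute job waits 10 min -> slowdown 11.0
print(slowdown(600, 36000))  # 10-hour job waits 10 min -> slowdown ~1.02
```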
Verification
Run CTC workload with accurate estimates
But What About My Model?

It simply does not have such small long jobs
Workload Modeling
No Data
• Innovative, unprecedented systems
– Wireless
– Hand-held
• Use an educated guess
– Self similarity
– Heavy tails
– Zipf distribution
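For instance, a first-cut workload might simply draw from heavy-tailed and Zipf distributions; a sketch with NumPy (all parameters here are guesses, which is exactly the point):

```python
import numpy as np

rng = np.random.default_rng(0)
# Heavy-tailed "sizes": Pareto with shape 1.5 (illustrative guess)
sizes = (rng.pareto(1.5, size=10_000) + 1) * 1024
# Zipf "popularity": rank-frequency exponent 2.0 (illustrative guess)
ranks = rng.zipf(2.0, size=10_000)

print(sizes.mean(), np.median(sizes))  # mean >> median: heavy tail
print(np.bincount(ranks)[1:6])         # most mass on the top ranks
```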
Serendipitous Data

• Data may be collected for various reasons
– Accounting logs
– Audit logs
– Debugging logs
– Just-so logs
• Can lead to a wealth of information
NASA Ames iPSC/860 log
42050 jobs from Oct-Dec 1993

user      job     nodes  runtime  date      time
user4     cmd8       32       70  11/10/93  10:13:17
user4     cmd8       32       70  11/10/93  10:19:30
user42    nqs450     32     3300  11/10/93  10:22:07
user41    cmd342      4       54  11/10/93  10:22:37
sysadmin  pwd         1        6  11/10/93  10:22:42
user4     cmd8       32       60  11/10/93  10:25:42
sysadmin  pwd         1        3  11/10/93  10:30:43
user41    cmd342      4      126  11/10/93  10:31:32

Feitelson & Nitzberg, JSSPP 1995
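Such a log is trivial to mine; a sketch parsing the excerpt above (the field layout is assumed from the excerpt, not from a format specification):

```python
from collections import Counter

def parse_log(lines):
    # Fields: user, job, nodes, runtime (seconds), date, time
    jobs = []
    for line in lines:
        user, job, nodes, runtime, date, time = line.split()
        jobs.append({"user": user, "job": job,
                     "nodes": int(nodes), "runtime": int(runtime)})
    return jobs

sample = [
    "user4 cmd8 32 70 11/10/93 10:13:17",
    "user42 nqs450 32 3300 11/10/93 10:22:07",
    "sysadmin pwd 1 6 11/10/93 10:22:42",
]
jobs = parse_log(sample)
print(Counter(j["nodes"] for j in jobs))             # job-size distribution
print(sum(j["nodes"] * j["runtime"] for j in jobs))  # total node-seconds
```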
Distribution of Job Sizes
Distribution of Resource Use
Degree of Multiprogramming
System Utilization
Job Arrivals
Arriving Job Sizes
Distribution of Interarrival Times
Distribution of Runtimes
Job Scaling
User Activity
Repeated Execution
Application Moldability
Distribution of Run Lengths
Predictability in Repeated Runs
Research Agenda
The Needs
• New systems tend to be more complex
• Differences tend to be finer
• Evaluations require more detailed data
• Getting more data requires more work
• Important areas:
– Internal structure of applications
– User behavior
Generic Application Model
• Iterations of
– Compute
• Granularity
• Memory working set / locality
– I/O
• Interprocess locality
– Communicate
• Pattern, volume
• Option of phases with different patterns of iterations
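One way to write the model down is as a parameterized structure; a sketch in Python (all field names are mine, chosen to mirror the bullets above):

```python
from dataclasses import dataclass, field
from typing import List

# Sketch of the generic application model: a job is a sequence of
# phases, each iterating a compute / I/O / communicate cycle with
# its own parameters.
@dataclass
class Iteration:
    compute_granularity_us: float  # work between communications
    working_set_mb: float          # memory working set / locality
    io_bytes: int                  # I/O volume (interprocess locality)
    comm_pattern: str              # e.g. "nearest-neighbor"
    comm_volume_bytes: int

@dataclass
class Phase:
    iterations: int
    body: Iteration

@dataclass
class Application:
    phases: List[Phase] = field(default_factory=list)
```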
Consequences
• Model the interaction of the application with the system
– Support for communication pattern
– Availability of memory
⇒ Application attributes depend on the system
⇒ Effect of multi-resource schedulers
Missing Data
• There has been some work on the
characterization of specific applications
• There has been no work on the distribution
of application types in a complete workload
– Distribution of granularities
– Distribution of working set sizes
– Distribution of communication patterns
Effect of Users

• Workload is generated by users
• Human users do not behave like a random sampling process
– Feedback based on system performance
– Repetitive working patterns
Feedback
• User population is finite
• Users back off when performance is inadequate
⇒ Negative feedback
⇒ Better system stability
• Need to explicitly model this behavior
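A minimal sketch of such feedback, assuming (my assumption, for illustration) a think time that stretches with the response time the user just experienced:

```python
import random

def next_submit_time(finish_time, response_time, base_think=600.0):
    """When does the user submit again? (illustrative model)
    Bad response times stretch the think time, so load backs off:
    negative feedback that stabilizes the system."""
    think = base_think * max(1.0, response_time / base_think)
    return finish_time + random.expovariate(1.0 / think)
```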


Locality of Sampling
• Users display different levels of activity at
different times
• At any given time, only a small subset of
users is active
• These users repeatedly do the same thing
• Workload observed by system is not a
random sample from long-term distribution
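A sketch of the difference (the model form is my assumption): instead of drawing every job from the long-term distribution, activate a few users per day and let each repeat their characteristic job.

```python
import random

def day_of_jobs(users, active_fraction=0.05, max_repeats=20):
    """Generate one day's workload with locality of sampling."""
    n_active = max(1, int(len(users) * active_fraction))
    active = random.sample(users, n_active)
    day = []
    for u in active:
        # active users repeatedly do the same thing
        day += [u["typical_job"]] * random.randint(1, max_repeats)
    return day

users = [{"typical_job": f"app{i}"} for i in range(100)]
print(day_of_jobs(users)[:10])
```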
Final Words…

We like to think that we design systems based on solid foundations…

But beware: the foundations might be unfounded assumptions!
Computer Systems are Complex

We should have more “science” in computer science:
• Run experiments under different conditions
• Make measurements and observations
• Make predictions and verify them
Acknowledgements
• Students: Ahuva Mu’alem, David Talby,
Uri Lublin
• Larry Rudolph / MIT
• Data in the Parallel Workloads Archive
– Joefon Jann / IBM
– CTC SP2 log
– SDSC Paragon log
– SDSC SP2 log
– NASA iPSC/860 log
