
Real-World Performance Monitoring: Can You Believe the CPU Numbers?

Andrew Holdsworth
V.P. Real-World Performance
Database Development, Server Technologies
September, 2016

November 11, 2016 Copyright 2016, Oracle and/or its affiliates. All rights reserved.
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

Real-World Performance SANGAM 2016
Where to Find Us
Session Time: Friday 10:30 AM
Real-World Performance Monitoring: Can You Believe the CPU Numbers?
Room #1

Session Time: Friday 5:20 PM
Layered Software Architectures and Performance in the Real World
Room #1

This Year's Observations
An almost obsessive belief that platform tuning/selection will solve all performance problems
Almost as much focus on other aspects that deliver incremental gains
Very little work on algorithmic and architecture changes that yield order-of-magnitude gains

Where to Get More Real-World Performance
See the online video library:
http://www.oracle.com/goto/oll/rwp

Program Agenda

1 Key Metrics
2 What happens when we load a system
3 Today's CPUs are not our Dads' CPUs
4 How to look at CPU Utilization
5 The Benchmark Dilemma
6 Some Real World Advice and Implications

Key Metrics
We have traditionally used CPU utilization percentage to determine utilization and headroom on any system.
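As a point of reference, the traditional metric is busy time divided by total time over a sampling interval, with headroom being whatever remains. The sketch below assumes a hypothetical pair of cumulative counters (busy ticks and idle ticks) sampled at the start and end of an interval, in the style of the per-CPU time counters that O/S tools read; the struct and function names are illustrative, not a real API.

```c
#include <stdint.h>

/* Hypothetical cumulative CPU-time counters, sampled twice. */
struct cpu_sample {
    uint64_t busy_ticks;  /* user + system time, cumulative */
    uint64_t idle_ticks;  /* idle time, cumulative */
};

/* Utilization (%) over the interval between two samples. */
double cpu_util_pct(struct cpu_sample before, struct cpu_sample after) {
    uint64_t busy = after.busy_ticks - before.busy_ticks;
    uint64_t idle = after.idle_ticks - before.idle_ticks;
    uint64_t total = busy + idle;
    if (total == 0)
        return 0.0;               /* no time elapsed: report 0% */
    return 100.0 * (double)busy / (double)total;
}

/* The traditional notion of headroom: whatever is not yet used. */
double headroom_pct(double util_pct) {
    return 100.0 - util_pct;
}
```

For example, 300 busy ticks out of 1000 gives 30% utilization and 70% apparent headroom. The rest of the deck explains why that headroom figure can mislead on modern multi-threaded cores.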

Key Metrics
Racing drivers use the tachometer, or RPM gauge, to see what percentage of the engine's capacity they are using.
It is easy to see where the expression "red lining" comes from.

Key Metrics
Target boat speed is essential for any racing sailor to make sure the boat and crew are all working together properly.

Key Metrics
Endurance athletes such as cyclists and triathletes track many key metrics, such as power, heart rate and speed.

What Happens When We Load a System

Figure: Output plotted against increasing workload, divided into a Linear Safe Zone, a Queuing Starting Zone, a Time-slicing Zone, a Plateau or Contending Zone, and a Degraded, Unstable Zone.
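The shape of this curve can be approximated with a textbook single-server queue (M/M/1). This model is an assumption added for illustration and is not part of the original slides: it assumes random arrivals and service times. Mean response time is R = S / (1 - U) for service time S and utilization U, so response time stays close to S in the linear zone and grows without bound as U approaches 1.

```c
/* Textbook M/M/1 mean response time, R = S / (1 - U) (assumed model).
   service_time: mean time to serve one request (any unit).
   utilization:  fraction of capacity in use, expected 0 <= U < 1.
   Returns -1.0 for invalid input or U >= 1, where the queue is unstable. */
double mm1_response_time(double service_time, double utilization) {
    if (utilization < 0.0 || utilization >= 1.0)
        return -1.0;
    return service_time / (1.0 - utilization);
}
```

With S = 10 ms, R is 20 ms at 50% utilization but about 100 ms at 90%, which is one way to see why the safe zone ends well before 100%.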
Today's CPUs are not our Dads' CPUs
Not a Single Uniform Processor
Basic geometry: sockets, cores, threads (thanks, Intel!)
Sockets are the physical processor packages on the hardware board
Cores are the processors within each socket
Threads, often called virtual CPUs, represent an on-chip cached execution environment
Cores can retire instructions simultaneously
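The geometry multiplies out quickly. This sketch uses illustrative helper names and a hypothetical 2-socket machine with 16 cores per socket and 8 threads per core to show how many virtual CPUs the O/S sees compared with how many physical cores actually exist.

```c
/* Illustrative helper: every hardware thread is presented to the O/S
   as one virtual CPU. */
unsigned logical_cpus(unsigned sockets, unsigned cores_per_socket,
                      unsigned threads_per_core) {
    return sockets * cores_per_socket * threads_per_core;
}

/* Illustrative helper: the physical cores that actually execute work. */
unsigned physical_cores(unsigned sockets, unsigned cores_per_socket) {
    return sockets * cores_per_socket;
}
```

Such a machine presents 256 virtual CPUs to the O/S but has only 32 cores.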

Today's CPUs are not our Dads' CPUs
Not a Single Uniform Processor
Non-uniform speeds and cache access speeds
Power and heat effects
Turbo modes
Single thread/process optimization ("SPECint specials")
L1, L2, L3 caches; near or far DRAM access (nanoseconds to microseconds)
And there is more!
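One way to see why the cache hierarchy matters is the standard average memory access time (AMAT) calculation, applied level by level. This is a textbook model rather than anything from the slides. The latencies and miss rates are parameters you would supply, and the values in the example are hypothetical round numbers, not measurements.

```c
/* Textbook average memory access time through L1, L2, L3 and DRAM.
   *_lat:  access latency of each level (any consistent unit, e.g. ns).
   *_miss: fraction of accesses reaching that level that miss (0..1).
   All inputs are assumptions supplied by the caller. */
double amat(double l1_lat, double l1_miss,
            double l2_lat, double l2_miss,
            double l3_lat, double l3_miss,
            double dram_lat) {
    double from_l3 = l3_lat + l3_miss * dram_lat;  /* cost once we reach L3 */
    double from_l2 = l2_lat + l2_miss * from_l3;   /* cost once we reach L2 */
    return l1_lat + l1_miss * from_l2;             /* every access starts at L1 */
}
```

With hypothetical latencies of 1, 10, 40 and 100 units and miss rates of 10%, 50% and 50%, the average comes to 6.5 units against an L1 hit cost of 1, so a modest number of misses dominates the cost.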

Today's CPUs are not our Dads' CPUs
Not a Single Uniform Processor
Hardware threads
Multiple on-chip cached copies of instruction pipelines, waiting and ready to run when another thread stalls on DRAM access
Simultaneous retirement of instructions: the ability to retire multiple instructions at once on a single core
Often confused with threads, but something quite different
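An idealized way to see what hardware threads buy: if each thread is independently stalled (for example on DRAM) a fraction s of the time, the chance that at least one of n threads on a core is ready to run is 1 - s^n. The independence assumption is a simplification for illustration only; real threads share pipelines and caches, so actual gains are smaller and workload-dependent.

```c
/* Idealized model (assumption): probability that at least one of
   n_threads hardware threads is ready to run, when each thread is
   stalled independently with probability stall_frac.
   Computes 1 - stall_frac^n_threads with a loop to avoid needing libm. */
double core_ready_fraction(double stall_frac, unsigned n_threads) {
    double all_stalled = 1.0;
    for (unsigned i = 0; i < n_threads; i++)
        all_stalled *= stall_frac;
    return 1.0 - all_stalled;
}
```

Under this model, a thread that is stalled half the time keeps its core supplied with work 50% of the time on its own, 75% of the time with two threads, and over 99% of the time with eight.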

How to Look at CPU Utilization
AWR Report
The Per Second statistics on the first page of an AWR report present two of the most useful insights into the state of the system:
- DB Time(s): number of processes executing DB calls
- DB CPU(s): number of processes attempting to run on the CPU
- If DB CPU(s) Per Second exceeds the number of cores on the system, is there a fair chance we have exceeded the linear zone?
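The check in the last bullet is simple arithmetic on the AWR per-second figure. In this sketch, db_cpu_per_sec is the DB CPU(s) Per Second value read from the report and cores is the physical core count, not the virtual CPU count. The function names and the threshold parameter are illustrative; the deck's conclusions suggest that 1-1.5 CPU sec/sec per core is where oversubscription becomes likely.

```c
/* Illustrative helper: DB CPU seconds per second, normalized to
   physical cores. */
double db_cpu_per_core(double db_cpu_per_sec, unsigned cores) {
    if (cores == 0)
        return 0.0;
    return db_cpu_per_sec / (double)cores;
}

/* Nonzero when the load per core exceeds the chosen threshold, i.e.
   there is a fair chance the linear zone has been left behind.
   The threshold is a caller-supplied assumption (e.g. 1.0 to 1.5). */
int likely_past_linear_zone(double db_cpu_per_sec, unsigned cores,
                            double per_core_threshold) {
    return db_cpu_per_core(db_cpu_per_sec, cores) > per_core_threshold;
}
```

For example, 24 DB CPU seconds per second on a 16-core system is 1.5 per core, at the top of that range.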

How to Look at CPU Utilization
Operating System Level
Today's operating systems see all HW threads as virtual CPUs and express utilization as a percentage of all threads.
This creates a problem when the number of threads is much greater than the number of cores, and it leads to under-reporting of actual HW CPU utilization.
This is not as simple as being wrong by a multiplier equal to the number of threads per core, because modern CPUs are able to retire multiple instructions on a single core.
Tools such as vmstat, sar and top fall victim to this and cannot reliably report true HW utilization.
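A small worked example of the under-reporting: the O/S divides busy virtual CPUs by all virtual CPUs. Assuming a hypothetical machine with 16 cores and 8 threads per core (128 virtual CPUs), one busy thread on every core shows up as only 12.5%, even though every core is doing work. The helper name is illustrative.

```c
/* Illustrative helper: utilization as an O/S tool reports it, i.e.
   busy virtual CPUs over all virtual CPUs, where every hardware
   thread counts as one CPU. */
double os_reported_pct(unsigned busy_threads, unsigned cores,
                       unsigned threads_per_core) {
    unsigned vcpus = cores * threads_per_core;
    if (vcpus == 0)
        return 0.0;
    return 100.0 * (double)busy_threads / (double)vcpus;
}
```

Multiplying 12.5% by the 8 threads per core would claim the cores are 100% used. Because of simultaneous retirement of instructions, the true hardware utilization likely lies between the two figures and depends on the workload.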

How to Look at CPU Utilization
CPU Counters (note: highly CPU-specific)
Usually executed as root
Not easy to interpret
Easy to come to the wrong conclusion
Tools
Intel Performance Counter Monitor (PCM)
SPARC: pgstat -v (run as root)

How to Look at CPU Utilization
CPU Counters (pgstat, SPARC CPU-specific)

PG  RELATIONSHIP            HW  UTIL  CAP      SW  USR   SYS   IDLE   CPUS
39  Floating_Point_Unit   0.1%  2.5M  4.1B  10.1%  9.0%  1.1%  89.9%  96-103
38  Integer_Pipeline     36.1%  3.0B  8.3B  10.1%  9.0%  1.1%  89.9%  96-103
44  Floating_Point_Unit   0.1%  2.6M  4.1B  10.3%  9.1%  1.2%  89.7%  104-111
43  Integer_Pipeline     36.1%  3.0B  8.3B  10.3%  9.1%  1.2%  89.7%  104-111
82  Floating_Point_Unit   0.1%  2.6M  4.1B  11.0%  9.9%  1.1%  89.0%  208-215
81  Integer_Pipeline     38.1%  3.2B  8.3B  11.0%  9.9%  1.1%  89.0%  208-215
85  Floating_Point_Unit   0.1%  2.6M  4.1B  10.7%  9.6%  1.1%  89.3%  216-223
84  Integer_Pipeline     37.2%  3.1B  8.3B  10.7%  9.6%  1.1%  89.3%  216-223

The HW-to-SW utilization ratio is about 4:1 when there are actually 8 threads, indicating simultaneous retirement of instructions.
Note that the Integer Pipeline dominates.
SW and HW utilization values differ, i.e. the O/S view vs. the HW view.
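The HW-to-SW ratio can be checked directly against the Integer_Pipeline rows of the pgstat output. This sketch divides the HW column by the SW column for each processor group, using values copied from the table; the array layout and function name are illustrative choices.

```c
#include <stddef.h>

/* HW and SW utilization (%) of the Integer_Pipeline rows (PG 38, 43, 81, 84). */
static const double hw_util[] = { 36.1, 36.1, 38.1, 37.2 };
static const double sw_util[] = { 10.1, 10.3, 11.0, 10.7 };

/* Mean HW:SW utilization ratio across n processor groups. */
double mean_hw_sw_ratio(const double *hw, const double *sw, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += hw[i] / sw[i];
    return n ? sum / (double)n : 0.0;
}
```

Each group comes out at roughly 3.5:1, in the region of the approximate 4:1 noted above and well below the 8 threads in each group.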

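How to Look at CPU Utilization
How to Make a Core 100% Busy
/* Spin forever on a loop intended to stay entirely in registers:
   no data-memory traffic and no cache misses, just integer adds. */
int main(void) {
    register unsigned x = 0, y = 1, i = 0;

    for (;; x++) {                  /* outer loop never ends */
        for (i = 0; i < 1; i++) {   /* i advances twice, so the body runs once */
            y = y + x;
            i++;
        }
    }
    return 0;                       /* never reached */
}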
How to Look at CPU Utilization
And Now for the Real World
Cache misses and stalls
Theoretical utilization vs. actual utilization
These utilization numbers have no correlation with the up-stack measurements
For Oracle Database in the real world, it is impossible to get the HW running at 100% utilization, even if the O/S says 100%
Sensible numbers would be closer to 50-60%
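In contrast to the register-only loop, real code spends much of its time waiting on memory. The sketch below is a generic illustration, not Oracle code. It builds a single random cycle through an array using Sattolo's algorithm, seeded by a simple xorshift generator, and then follows that cycle, so every load depends on the previous one. With an array much larger than the last-level cache, such a loop is likely to be dominated by cache misses while the O/S still reports the CPU as 100% busy.

```c
#include <stddef.h>
#include <stdint.h>

/* Small deterministic PRNG (xorshift64); state must be nonzero. */
static uint64_t xorshift64(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Sattolo's algorithm: fill next[] so that following next[] from any
   index visits all n slots in one random cycle before returning. */
void build_cycle(size_t *next, size_t n, uint64_t seed) {
    uint64_t state = seed ? seed : 88172645463325252ULL;
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    if (n < 2)
        return;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&state) % i);  /* j in [0, i-1] */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Pointer chase: each load address comes from the previous load, so the
   core cannot run ahead; on large arrays it mostly waits on memory. */
size_t chase(const size_t *next, size_t start, size_t steps) {
    size_t p = start;
    while (steps--)
        p = next[p];
    return p;
}
```

Timing chase() over an array of a few hundred megabytes versus a few kilobytes (for example with clock_gettime) should expose the gap between "busy" and "productive" on the same core.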

How to Look at CPU Utilization
And Now for the Real Oracle World
The Oracle Binary
Extremely large, and bigger than CPU cache sizes
Great effort is spent at compilation and link-edit time to optimize both execution and cache footprint by use of feedback and statistical techniques

5. The Benchmark Dilemma
How do we test these things?
Tests are artificial and simple
Simple is not the real world
Think about the ultimate C program for testing HW utilization: it does not look like code in the real world
Simple testing in this case rarely represents reality, and this may lead to disappointment in production with more variable and complex workloads.

6. Some Real World Advice and Implications
Do threads actually work?
What happens when the HW reaches its theoretical maximum for the workload?
Implications for capacity planning and headroom calculations

Some Semi-Real World Testing
Threads/HW Utilization/Planning
Figure: two panels, "2 Threads per Core" and "8 Threads per Core", each plotting CPU Thread, CPU No Thread, THRU Thread and THRU No Thread on a 0-120 scale.
2 Threads per Core
When the O/S reports 40% utilization with threading turned on, the O/S utilization for the unthreaded case is close to 70% to achieve the same application workload.
In this case we think we are in the safe linear zone when we are in fact in the zone where queuing starts.
Figure: CPU Thread, CPU No Thread, THRU Thread and THRU No Thread on a 0-120 scale.
8 Threads per Core
When the O/S reports 15% utilization with threading turned on, the O/S utilization for the unthreaded case is close to 70% to achieve the same application workload.
In this case we think we are in the safe linear zone when we are in fact in the zone where queuing starts.
Note this is more dramatic than the 2-threads-per-core case by the margin of error.
Also note the positive impact of threading at high utilization.
Figure: CPU Thread, CPU No Thread, THRU Thread and THRU No Thread on a 0-120 scale.

Conclusions on How to Look at CPU Utilization
Simple Guidelines
Understand the impact of HW threads and simultaneous retirement of instructions on top-down database and O/S statistics
Database level
1-1.5 CPU sec/sec per core may indicate oversubscription
O/S level
Be aware that the O/S assumes threads are actual processors, which they are not
However, simultaneous retirement of instructions means it is not reasonable to divide by the thread count per core
Better numbers might be 1.4-1.6 for Intel or 4 for SPARC
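Applied to an O/S utilization figure, the guideline amounts to multiplying by a platform factor rather than by the raw thread count. In this sketch the factor is a caller-supplied assumption: the suggested 1.4-1.6 for Intel or 4 for SPARC, or a ratio measured on your own workload. The function name and the cap at 100% are illustrative choices.

```c
/* Rough, illustrative estimate of hardware (core) utilization from the
   O/S-reported figure. factor is an assumption: about 1.4-1.6 on Intel
   or about 4 on SPARC per the guideline, or a ratio measured on your
   own workload. The result is capped at 100%. */
double estimated_hw_util_pct(double os_util_pct, double factor) {
    double est = os_util_pct * factor;
    return est > 100.0 ? 100.0 : est;
}
```

For comparison, the equivalent-workload figures from the semi-real-world tests work out to 70/40 = 1.75 with 2 threads per core and 70/15 ≈ 4.7 with 8 threads per core, so a 15% O/S reading on the 8-thread system already corresponds to a load where queuing starts.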

Conclusions on How to Look at CPU Utilization
What's Next?
Virtualization
Impossible to gauge without understanding the fraction of the machine you are actually getting
Education
Today's decision makers need help to avoid false assumptions
Lobby for improvements in vmstat, sar and top

References and Acknowledgements
References
https://software.intel.com/en-us/articles/intel-performance-counter-monitor

Acknowledgements
Bjørn Engsig, Real World Performance
Andy Bowers, SPARC Systems Performance

