
LINUX PERFORMANCE TROUBLESHOOTING: USE METHOD
Chad Dorton
06/27/2018

1 © 2016 Proofpoint, Inc.


Introduction
What is the USE Method?



USE Method for Performance Troubleshooting

For every resource, check utilization, saturation, and errors.



USE Method in Troubleshooting Context
▪ First, state the problem you are trying to solve

▪ Then, for every system resource check:


▪ Utilization
▪ Saturation
▪ Errors

▪ Third, follow-up analysis (examples)


▪ Drill-down analysis
▪ Latency analysis
▪ Event tracing
▪ Etc…



Problem Statement
▪ Prerequisite for the USE Method:
▪ What makes you think there is a problem?

▪ Has this system ever performed well?

▪ Any recent system changes? (Software? Hardware? Load? Etc...)

▪ Can the performance degradation be expressed in terms of latency or run time?

▪ Does the problem affect other people or applications (or is it just you)?

▪ What is the environment? What software and hardware are used? (Versions?
Configuration? Etc...)



USE Method: Utilization
▪ Two ways to look at utilization

▪ Time based: time the resource was busy (units or %)
- Many Linux performance tools (top, iostat) report this type
- CPU: time spent working vs. time spent idle
- Current IOPS/max or current throughput/max are variations on this type

▪ Capacity based: portion of the resource that was used


- % main memory consumed
- % disk space consumed

▪ While busy the device may still be able to accept work (until saturation)

▪ Storage devices use both methods of measuring utilization
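As a rough illustration of the time-based view, utilization over an interval is busy time divided by total time. The sketch below assumes two samples of the aggregate `cpu` line from /proc/stat; the function names and the sample numbers are invented for illustration.

```python
def parse_cpu_line(line):
    """Return (busy, total) jiffies from an aggregate 'cpu' line of /proc/stat."""
    fields = [int(x) for x in line.split()[1:]]
    # Field order: user nice system idle iowait irq softirq steal ...
    idle = fields[3] + fields[4]      # treat idle + iowait as "not busy"
    total = sum(fields[:8])           # guest time is already counted in user/nice
    return total - idle, total


def utilization_pct(before, after):
    """Time-based utilization (%) between two /proc/stat samples."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    return 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0


# Hypothetical samples taken one interval apart (values are made up).
sample_a = "cpu 1000 0 500 8000 500 0 0 0 0 0"
sample_b = "cpu 1300 0 600 8500 600 0 0 0 0 0"
print(f"{utilization_pct(sample_a, sample_b):.1f}% busy")
```

Capacity-based utilization needs no interval: it is simply the amount used divided by the amount available at one point in time.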



USE Method: Saturation
▪ Saturation begins at 100% utilization
▪ When load is added to a system past the saturation point, the extra load just increases latency

[Figure: Load vs Throughput, 8 core VM; threads (log scale, 1 to 100) vs ops/sec (0 to 2000)]
[Figure: Latency vs Throughput, 8 core VM; latency (ms, 0 to 40) vs ops/sec (0 to 2000); points labeled 1, 2, 4, 8, 16, 32, 64]



The USE Method: Errors
▪ Check the logs:
▪ Application logs
▪ Service logs
▪ dmesg | less is your friend:
root@us3-mdac16-2:~# dmesg | less

[3195690.522857] CPU: 6 PID: 37 Comm: migration/6 Tainted: G L 4.4.0-116-generic #140-Ubuntu
[3195690.522860] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[3195690.522863] task: ffff8806216e8e00 ti: ffff8806216f0000 task.ti: ffff8806216f0000
[3195690.522865] RIP: 0010:[<ffffffff810ac14d>] [<ffffffff810ac14d>] finish_task_switch+0x7d/0x230
[3195690.792220] RSP: 0000:ffff8806216f3df8 EFLAGS: 00000246
[3195690.792221] RAX: ffff880621736200 RBX: ffff8806216e8e00 RCX: 0000000000000000
[3195690.792222] RDX: 0000000000000000 RSI: ffff8806216e8e00 RDI: ffff8806257972c0
[3195690.792223] RBP: ffff8806216f3e20 R08: ffff8806216f0000 R09: 0000000000000000
[3195690.792224] R10: ffff88063ffe6000 R11: 0000000000000001 R12: ffff8806257972c0
[3195690.792225] R13: ffff880621736200 R14: ffff8805caa55c00 R15: 0000000000000000
[3195690.792227] FS: 0000000000000000(0000) GS:ffff880625780000(0000) knlGS:0000000000000000
[3195690.792228] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[3195690.792229] CR2: 00007f6f85a41000 CR3: 0000000001e0a000 CR4: 0000000000160670

▪ Other tools we’ll discuss



USE Method: The Order of Steps



USE Method: Hardware Resource Examples
▪ CPUs: sockets, cores, hardware threads, virtual CPUs, etc
▪ Main memory: DRAM
▪ Network Interfaces: Ethernet ports
▪ Storage devices: disks, disk arrays
▪ Controllers: storage, network
▪ Interconnects: CPU, memory, I/O



USE Method: Server Functional Diagrams
▪ A server’s Functional Block Diagram is often a great source of hardware resources to analyze

▪ Analyze every component in the data path

▪ These diagrams are becoming harder and harder to come by

▪ Often, the information can be reconstructed from User Manuals



USE Method: Server Functional Diagram



USE Method: Software Resource Examples
▪ Mutex locks: time lock was held, threads queued waiting for locks
▪ Thread pools: time threads were working, requests waiting to be serviced by the pool
▪ Process/thread capacity: # of threads currently doing work, # threads
waiting to do work, # of allocation failures
▪ File descriptor capacity: # of descriptors allocated, # of requests waiting on
descriptor allocation, # of allocation failures
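File descriptor capacity is one software resource that Linux exposes directly. A minimal sketch, assuming the three-field layout of /proc/sys/fs/file-nr (allocated handles, allocated-but-unused handles, system-wide maximum); the function name and sample text are hypothetical.

```python
def fd_utilization(file_nr_text):
    """Return (in_use, maximum, percent) from the text of /proc/sys/fs/file-nr."""
    allocated, unused, maximum = (int(x) for x in file_nr_text.split())
    in_use = allocated - unused
    return in_use, maximum, 100.0 * in_use / maximum


# Made-up sample; on a live system read open("/proc/sys/fs/file-nr").read() instead.
sample = "2048\t0\t8192\n"
in_use, maximum, pct = fd_utilization(sample)
print(f"{in_use} of {maximum} file handles in use ({pct:.1f}%)")
```

This covers only the utilization side; waiting requests and allocation failures usually have to be found in application logs or error counts.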



USE Method: Know Your Software Stack

[Diagram: software stack. Recoverable component labels: Interface UI, Cache (memcache), DB-RO (logs/config), DB-RW (logs/config), MX, vip, supermx, filterque, dispatch, bb, Engine (mdac), Storage (quarantine), pgsql (maint), farmd, Metrics (pulse), Admin (rsyslog), DNS (mdlocal)]



USE Method: Easy to Obtain Metrics
Resource | Type | Metric
CPU | Utilization | CPU utilization (either per CPU or system-wide average)
CPU | Saturation | Dispatcher-queue length (aka run-queue length)
Memory | Utilization | Available free memory (system-wide)
Memory | Saturation | Anonymous paging, thread swapping, page scanning, out-of-memory events
Network Interface | Utilization | Receive throughput/max bandwidth, send throughput/max bandwidth
Storage Device I/O | Utilization | Device busy percentage
Storage Device I/O | Saturation | Wait-queue length
Storage Device I/O | Errors | Device errors



USE Method: Harder to Obtain Metrics
Resource | Type | Metric
CPU | Errors | Correctable CPU cache (ECC) events, faulted CPUs
Memory | Errors | Failed malloc()s, ECC errors
Network | Saturation | Saturation-related network interface errors, e.g. “overruns”
Storage Controller | Utilization | Maximum IOPS, maximum bandwidth
CPU Interconnect | Utilization | Per-port throughput/maximum bandwidth (CPU performance counters)
Memory Interconnect | Saturation | Memory stall cycles, high cycles per instruction (CPU performance counters)
I/O Interconnect | Utilization | Bus throughput/maximum bandwidth (performance counters, platform dependent)



Linux Observability Tool Types
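[Diagram: tools arranged by scope (system-wide vs per-process) and type (counters vs tracing)]
▪ Counters, system-wide: vmstat, mpstat, iostat, sar
▪ Counters, per-process: ps, top, pmap
▪ Tracing, system-wide: tcpdump, perf
▪ Tracing, per-process: strace, gdb
▪ Tracing, block devices: blktrace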



Counters vs Tracing

Type | Source
Per-process counters | /proc
System-wide counters | /proc, /sys
Device driver and debug info | /sys
Per-process tracing | ptrace, uprobes
Network tracing | libpcap
System-wide tracing | tracepoints, kprobes, ftrace

▪ Counters are “free”
▪ Already enabled and being collected
▪ Tracing
▪ Built in, but needs to be enabled
▪ Tracing adds overhead

Linux Observability Tools



CPU Analysis
How to Determine Utilization and Saturation



USE Method: CPU Analysis
▪ For each CPU, check for:
▪ Utilization: time CPU was busy (not running idle thread)
- Often available as % busy
- Check per CPU and per core
- Be aware of potential quotas in cloud environments
- Check hypervisors as well
▪ Saturation: running threads are waiting for CPU time
- Load averages may include wait on other resources
- Can be obfuscated in a virtual or cloud environment
▪ Errors: CPU errors, including uncorrectable errors



CPU Analysis Tool: uptime
▪ What to look for:
▪ System uptime duration – (not exactly relevant to USE, but sometimes helpful)
▪ Load average (related to utilization and saturation, but beware!)

▪ Important switches/parameters
▪ None

▪ Example:

root@m0131372:~# uptime
00:53:59 up 2:34, 4 users, load average: 15.90, 10.27, 4.51



Linux Load Averages
▪ If you come from a UNIX/Solaris background:
▪ load average indicates demand for CPU
▪ Three values are 1-, 5-, and 15-minute moving averages (exponentially damped)
▪ Rule of thumb: load average > CPU count = BAD
▪ Rule of thumb: load average < CPU count = GOOD

▪ An experiment:

[Figure: Load Average Over Time; load average (0 to 1.2) vs time (0 to 660); series: 1 min, 5 min, 15 min]

▪ Conclusion:

▪ 1-, 5-, and 15-minute averages are:
▪ Not 1-, 5-, and 15-minute
▪ Not averages



Linux Load Averages
▪ In the early days of Linux, load averages worked essentially the same as UNIX and others
▪ Originally intended as a measure of CPU demand
▪ In 1993, tasks in the TASK_UNINTERRUPTIBLE and TASK_SWAPPING states were added to the Linux load average calculation
▪ Why?
▪ “[t]he following patch seems to make the load average much more consistent WRT the
subjective speed of the system.”
▪ It includes processes waiting for disk IO.
▪ Result:
▪ The Linux load average isn’t really a measure of CPU demand
▪ It’s really a measure of general system demand
▪ Better tools to assess CPU utilization and saturation will be discussed
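To see why the "1-minute" value is neither a 1-minute window nor a plain average, the damping can be simulated. This is a simplified floating-point sketch, assuming the load is recomputed every 5 seconds with a decay factor of exp(-5 / period); the kernel itself uses fixed-point arithmetic, and the function name is hypothetical.

```python
import math


def simulate_load(runnable, seconds, period_min, start=0.0, tick=5):
    """Exponentially damped load average after `seconds` of constant demand."""
    decay = math.exp(-tick / (period_min * 60.0))
    load = start
    for _ in range(seconds // tick):
        load = load * decay + runnable * (1.0 - decay)
    return load


# One busy thread on an otherwise idle system, starting from a load of 0.
for minutes in (1, 5, 15):
    after_60s = simulate_load(1, 60, minutes)
    print(f"{minutes:>2}-minute load average after 60 s: {after_60s:.2f}")
```

Under these assumptions the 1-minute figure reaches only about 63% of the true demand after a full minute, and it keeps rising well after that.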



CPU Analysis Tool: vmstat
▪ What to look for:
▪ CPU Utilization (us + sy + st columns)
▪ CPU Saturation (r column)
▪ Important switches/parameters
▪ time interval
▪ -w (wide view, trust me)

root@m0131372:~# vmstat -w 1
procs -----------------------memory---------------------- ---swap-- -----io---- -system-- --------cpu--------
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 0 32567528 14864 195140 0 0 5 3 55 37 7 0 93 0 0
1 0 0 32567512 14864 195140 0 0 0 0 552 353 13 0 88 0 0
1 0 0 32567512 14864 195140 0 0 0 0 561 367 13 0 88 0 0
1 0 0 32567512 14864 195140 0 0 0 0 557 371 13 0 88 0 0
1 0 0 32567512 14864 195140 0 0 0 0 588 398 13 0 88 0 0

▪ r: run-queue length (threads waiting + running)
▪ us: user-time
▪ sy: system-time (kernel)
▪ id: idle
▪ wa: wait I/O
▪ st: stolen (time spent servicing other VMs)
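The two checks can be applied to a single vmstat row mechanically. A minimal sketch, assuming the 17-column layout shown in the output above and treating r greater than the CPU count as the saturation signal; the function name is hypothetical.

```python
def cpu_use_from_vmstat(row, cpu_count):
    """Return (utilization %, saturated?) from one data row of `vmstat -w`."""
    cols = row.split()
    r = int(cols[0])                                   # threads running + waiting
    us, sy, st = int(cols[12]), int(cols[13]), int(cols[16])
    return us + sy + st, r > cpu_count


# A data row copied from the example output (8 CPU system).
row = "1 0 0 32567512 14864 195140 0 0 0 0 552 353 13 0 88 0 0"
utilization, saturated = cpu_use_from_vmstat(row, cpu_count=8)
print(f"utilization={utilization}% saturated={saturated}")
```

Note that the first row vmstat prints is an average since boot, so it is usually skipped when judging current behavior.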



CPU Analysis Tool: mpstat
▪ What to look for:
▪ CPU Utilization (all columns but %idle, %iowait)
▪ Important switches/parameters
▪ time interval
▪ -P ALL (per CPU stats)
root@m0131372:~# mpstat -P ALL 1
Linux 4.4.0-83-generic (m0131372.ppops.net) 06/05/2018 _x86_64_ (8 CPU)

02:57:37 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
02:57:38 AM all 12.52 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 87.48
02:57:38 AM 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
02:57:38 AM 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
02:57:38 AM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
02:57:38 AM 3 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
02:57:38 AM 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
02:57:38 AM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
02:57:38 AM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
02:57:38 AM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00

▪ %nice: user-time for nice’d procs
▪ %irq: hardware interrupt CPU usage
▪ %soft: software interrupt CPU usage
▪ %gnice: CPU time spent running nice’d guests



CPU Analysis Tool: sar
▪ What to look for:
▪ CPU Utilization (sar -P ALL duplicates mpstat)
▪ CPU Saturation (runq-sz with sar -q)
▪ Important switches/parameters
▪ time interval (real time data)
▪ -P ALL (per CPU util stats)
▪ -q CPU queue and load averages

root@m0131372:~# sar -q 1
Linux 4.4.0-83-generic (m0131372.ppops.net) 06/05/2018 _x86_64_ (8 CPU)

03:20:14 AM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15 blocked
03:20:15 AM 16 179 14.96 6.95 3.04 0
03:20:16 AM 16 179 14.96 6.95 3.04 0
03:20:17 AM 16 179 14.96 6.95 3.04 0
03:20:18 AM 16 179 15.05 7.10 3.11 0
03:20:19 AM 16 179 15.05 7.10 3.11 0

▪ runq-sz: waiting tasks + running tasks
▪ plist-sz: number of tasks in the task list
▪ blocked: number of procs waiting on IO



CPU Analysis Tool: top
▪ What to look for:
▪ CPU Utilization (per system, per processor, per process)
▪ Important options
▪ 1: show individual CPU utilization
▪ I: toggle Irix mode on/off
▪ H: show per thread utilization
top - 23:32:05 up 21:38, 4 users, load average: 15.70, 8.80, 3.76
Tasks: 146 total, 1 running, 145 sleeping, 0 stopped, 0 zombie
%Cpu(s):100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 32929988 total, 32260660 free, 170604 used, 498724 buff/cache
KiB Swap: 7813116 total, 7813116 free, 0 used. 32325788 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
49761 root 20 0 109676 7160 5952 S 800.0 0.0 32:30.58 sysbench
1 root 20 0 37512 5628 4012 S 0.0 0.0 0:01.52 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H

Helpful for diagnosing utilization/saturation on an individual core/CPU or from an individual process



Memory Analysis
How to Determine Utilization and Saturation



Linux Virtual Memory
▪ Each process has its own large, linear, private logical address space
▪ A process’s address space is mapped to main memory and the physical swap device

[Diagram: process address space (virtual memory) mapped to main memory (physical) and, through anonymous paging, to the swap device]



Paging in Linux
▪ Movement of pages in and out of memory
▪ Allows partially loaded programs to execute
▪ Allows programs larger than main memory to execute
▪ Efficient movement of data between software and storage

▪ Filesystem Paging
▪ Reading and writing of memory-mapped files (mmap())
▪ Code execution
▪ Reading from/writing to Filesystem Page Cache
▪ The good kind of paging!

▪ Anonymous Paging
▪ Writing out (and reading back in) private process pages
▪ Pages live in anonymous swap space (swapon())
▪ The bad kind of paging!
▪ Sometimes referred to as “swapping.”



Swappiness

▪ vm.swappiness:
▪ Parameter between 0 and 100
▪ Higher value
- Favors freeing memory by paging applications (anonymous paging)
▪ Lower value
- Favors freeing memory by reclaiming page cache
▪ My view:
▪ Set to 0 (1 on older kernels)
▪ Is it really ever ok for your production app to page anonymously?



Memory Analysis Tool: free
▪ What to look for:
▪ Memory Utilization (used and free columns)
▪ Memory Saturation – (only if free is low and buff/cache is low)

▪ Important switches/parameters
▪ Specify units (-m megabytes, -g gigabytes)

▪ Example:
root@m0131372:~# free -m
total used free shared buff/cache available
Mem: 32158 163 31503 49 491 31571
Swap: 7629 0 7629
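For a single capacity-based number, the `Mem:` row can be reduced to a percentage. A minimal sketch, assuming the six-column layout shown above and, as a judgment call, counting everything outside the `available` column as used; the function name is hypothetical.

```python
def mem_utilization(free_output):
    """Return memory utilization (%) from `free -m` output, based on 'available'."""
    for line in free_output.splitlines():
        if line.startswith("Mem:"):
            total, _used, _free, _shared, _cache, available = (
                int(x) for x in line.split()[1:7]
            )
            return 100.0 * (total - available) / total
    raise ValueError("no Mem: row found")


# The example output from above.
free_sample = """\
              total        used        free      shared  buff/cache   available
Mem:          32158         163       31503          49         491       31571
Swap:          7629           0        7629
"""
print(f"{mem_utilization(free_sample):.1f}% of memory in use")
```

Using `available` rather than `free` avoids counting reclaimable page cache as consumed memory.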



Memory Analysis Tool: vmstat
▪ What to look for:
▪ Memory Utilization (free + buff + cache columns)
▪ Memory Saturation (si, so, and swpd columns)
▪ Important switches/parameters
▪ time interval
▪ -w (wide view, trust me)

root@m0131372:~# vmstat -w 1
procs -----------------------memory---------------------- ---swap-- -----io---- -system-- --------cpu--------
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 0 32567528 14864 195140 0 0 5 3 55 37 7 0 93 0 0
1 0 0 32567512 14864 195140 0 0 0 0 552 353 13 0 88 0 0
1 0 0 32567512 14864 195140 0 0 0 0 561 367 13 0 88 0 0
1 0 0 32567512 14864 195140 0 0 0 0 557 371 13 0 88 0 0
1 0 0 32567512 14864 195140 0 0 0 0 588 398 13 0 88 0 0

▪ free: free available memory
▪ buff: memory in the buffer cache
▪ cache: memory in the page cache
▪ swpd: amount of swapped-out memory
▪ si: memory swapped in (anonymous paging)
▪ so: memory swapped out (anonymous paging)



Memory Analysis Tool: sar
▪ What to look for:
▪ Memory Utilization (%memused)
▪ Important switches/parameters
▪ time interval (real time data)
▪ -r memory usage statistics
root@m0131372:~# sar -r 1
Linux 4.4.0-83-generic (m0131372.ppops.net) 06/06/2018 _x86_64_ (8 CPU)

12:30:57 AM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
12:30:58 AM 32257328 672660 2.04 56292 358788 314860 0.77 310584 172492 8
12:30:59 AM 32257328 672660 2.04 56292 358788 314860 0.77 310584 172492 8
12:31:00 AM 32257328 672660 2.04 56292 358788 314860 0.77 310584 172492 8
12:31:01 AM 32257328 672660 2.04 56292 358788 314860 0.77 310584 172492 8
12:31:02 AM 32257328 672660 2.04 56292 358788 314860 0.77 310584 172492 8

▪ kbmemfree: free memory
▪ kbmemused: used memory (excluding kernel)
▪ %memused: % memory used
▪ kbbuffers: buffer cache size
▪ kbcached: page cache size
▪ kbcommit: estimate of memory needed to serve the current workload
▪ %commit: % memory committed
▪ kbactive: active list memory size
▪ kbinact: inactive memory list size
▪ kbdirty: kB waiting to be written back to disk



Memory Analysis Tool: sar
▪ What to look for:
▪ Memory Saturation (pgscank + pgscand)
▪ Important switches/parameters
▪ time interval (real time data)
▪ -B report paging statistics
root@m0131372:~# sar -B 1
Linux 4.4.0-83-generic (m0131372.ppops.net) 06/06/2018 _x86_64_ (8 CPU)

12:44:13 AM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
12:44:14 AM 0.00 0.00 18.00 0.00 38.00 0.00 0.00 0.00 0.00
12:44:15 AM 0.00 0.00 0.00 0.00 29.00 0.00 0.00 0.00 0.00
12:44:16 AM 0.00 0.00 0.00 0.00 29.00 0.00 0.00 0.00 0.00
12:44:17 AM 0.00 0.00 0.00 0.00 45.00 0.00 0.00 0.00 0.00
12:44:18 AM 0.00 0.00 0.00 0.00 29.00 0.00 0.00 0.00 0.00

▪ pgpgin/s: page-ins
▪ pgpgout/s: page-outs
▪ fault/s: both major and minor faults
▪ majflt/s: major faults
▪ pgfree/s: pages added to free list
▪ pgscank/s: pages scanned by page-out daemon
▪ pgscand/s: direct page scans
▪ pgsteal/s: pages reclaimed from cache
▪ %vmeff: page reclaim efficiency (page steal/page scan)
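The %vmeff column is derived from the other columns, which a short calculation makes concrete. A minimal sketch, assuming "page scan" means the sum of pgscank/s and pgscand/s; the function name and the non-zero sample values are made up.

```python
def vmeff(pgscank, pgscand, pgsteal):
    """Page reclaim efficiency (%): pages stolen per page scanned."""
    scanned = pgscank + pgscand
    return 100.0 * pgsteal / scanned if scanned else 0.0


# Idle system, as in the example output: nothing scanned, nothing stolen.
print(vmeff(0.0, 0.0, 0.0))
# Hypothetical system under memory pressure.
print(vmeff(900.0, 100.0, 250.0))
```

Any sustained non-zero scan rate is the saturation signal; a low efficiency on top of that suggests the kernel is struggling to find pages it can reclaim.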



Memory Analysis Tool: sar
▪ What to look for:
▪ Memory Saturation (pswpin/s, pswpout/s)
▪ Important switches/parameters
▪ time interval (real time data)
▪ -W report swapping statistics
root@m0131372:~# sar -W 1
Linux 4.4.0-83-generic (m0131372.ppops.net) 06/06/2018 _x86_64_ (8 CPU)

12:57:43 AM pswpin/s pswpout/s
12:57:44 AM 0.00 0.00
12:57:45 AM 0.00 0.00
12:57:46 AM 0.00 0.00
12:57:47 AM 0.00 0.00
12:57:48 AM 0.00 0.00

▪ pswpin/s: page-ins (Linux “swap-ins”)
▪ pswpout/s: page-outs (Linux “swap-outs”)



Memory Analysis Tool: slabtop
▪ What to look for:
▪ Memory Utilization (kernel slab cache information)
▪ Important options
▪ -sc sort by cache size

Active / Total Objects (% used) : 2367805 / 2901991 (81.6%)
Active / Total Slabs (% used) : 80865 / 80865 (100.0%)
Active / Total Caches (% used) : 71 / 110 (64.5%)
Active / Total Size (% used) : 549891.99K / 611021.77K (90.0%)
Minimum / Average / Maximum Object : 0.01K / 0.21K / 8.00K

OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
599172 599172 100% 0.57K 21399 28 342384K radix_tree_node
2003118 1503603 75% 0.10K 51362 39 205448K buffer_head
10110 5559 54% 1.05K 337 30 10784K ext4_inode_cache
35238 23029 65% 0.19K 1678 21 6712K dentry
10528 10528 100% 0.55K 376 28 6016K inode_cache

▪ OBJS: number of objects
▪ ACTIVE: number of active objects
▪ USE: cache utilization
▪ OBJ SIZE: object size
▪ SLABS: number of slabs
▪ OBJ/SLAB: objects per slab
▪ CACHE SIZE: cache size
▪ NAME: name



Memory Analysis Tool: top
▪ What to look for:
▪ Memory Utilization (per system, per processor, per process)
▪ VIRT, RES, %MEM
▪ Important options
▪ H: show per thread utilization
top - 23:32:05 up 21:38, 4 users, load average: 15.70, 8.80, 3.76
Tasks: 146 total, 1 running, 145 sleeping, 0 stopped, 0 zombie
%Cpu(s):100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 32929988 total, 32260660 free, 170604 used, 498724 buff/cache
KiB Swap: 7813116 total, 7813116 free, 0 used. 32325788 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
49761 root 20 0 109676 7160 5952 S 800.0 0.0 32:30.58 sysbench
1 root 20 0 37512 5628 4012 S 0.0 0.0 0:01.52 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H

▪ VIRT: virtual memory size
▪ RES: resident set size
▪ %MEM: RSS as a percentage of system total



Memory Analysis: Other Things to Check
▪ dmesg | grep -i killed
▪ Out-of-Memory (OOM) killer activity shows memory saturation

▪ Check dmesg for physical failure

▪ Check the console for dmesg errors



Filesystem Analysis
How to Determine Utilization and Saturation



USE Methodology and File Systems
▪ Filesystems are not a physical resource
▪ Filesystems can and do have a significant performance impact
▪ With respect to the USE Methodology
▪ Cache Utilization
▪ Cache Saturation
▪ Filesystem Errors
▪ Latency



Filesystem Analysis Tool: vmstat
▪ What to look for:
▪ Buffer Cache Utilization (buff + cache columns)
▪ Important switches/parameters
▪ time interval
▪ -w (wide view, trust me)

root@m0131372:~# vmstat -w 1
procs -----------------------memory---------------------- ---swap-- -----io---- -system-- --------cpu--------
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 0 32567528 14864 195140 0 0 5 3 55 37 7 0 93 0 0
1 0 0 32567512 14864 195140 0 0 0 0 552 353 13 0 88 0 0
1 0 0 32567512 14864 195140 0 0 0 0 561 367 13 0 88 0 0
1 0 0 32567512 14864 195140 0 0 0 0 557 371 13 0 88 0 0
1 0 0 32567512 14864 195140 0 0 0 0 588 398 13 0 88 0 0

▪ free: free available memory
▪ buff: memory in the buffer cache
▪ cache: memory in the page cache
▪ swpd: amount of swapped-out memory
▪ si: memory swapped in (anonymous paging)
▪ so: memory swapped out (anonymous paging)



Filesystem Analysis Tool: sar
▪ What to look for:
▪ Utilization (file-nr, inode-nr)
▪ Important switches/parameters
▪ time interval (real time data)
▪ -v inode usage statistics
root@m0131372:~# sar -v 1
Linux 4.4.0-83-generic (m0131372.ppops.net) 06/06/2018 _x86_64_ (8 CPU)

08:55:51 PM dentunusd file-nr inode-nr pty-nr
08:55:52 PM 4227 1120 23789 2
08:55:53 PM 4227 1120 23789 2
08:55:54 PM 4227 1120 23789 2
08:55:55 PM 4227 1120 23789 2
08:55:56 PM 4227 1120 23789 2

▪ dentunusd: # of unused entries in directory cache
▪ file-nr: # of file handles in use by the system
▪ inode-nr: # of inode handlers in use by the system
▪ pty-nr: # of pseudo terminals in use by the system



Filesystem Analysis Tool: df
▪ What to look for:
▪ Utilization (IUsed, IUse%)
▪ Important switches/parameters
▪ -i inode usage statistics

root@m0131372:~# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
udev 4114281 386 4113895 1% /dev
tmpfs 4116262 2566 4113696 1% /run
/dev/sda3 6053888 101781 5952107 2% /
tmpfs 4116262 1 4116261 1% /dev/shm
tmpfs 4116262 3 4116259 1% /run/lock
tmpfs 4116262 16 4116246 1% /sys/fs/cgroup
/dev/sda1 60960 307 60653 1% /boot
tmpfs 4116262 4 4116258 1% /run/user/14480

▪ Inodes: # of inodes on filesystem
▪ IUsed: # of inodes in use by filesystem
▪ IFree: # of free inodes on filesystem
▪ IUse%: % of inodes used on filesystem
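Scanning the IUse% column by eye gets tedious with many mounts. A minimal sketch that flags filesystems at or above a chosen inode utilization, assuming the six-column `df -i` layout shown above; the function name and threshold are hypothetical.

```python
def inode_hogs(df_output, threshold_pct):
    """Return [(mount point, IUse%)] for filesystems at or above the threshold."""
    hogs = []
    for line in df_output.splitlines()[1:]:          # skip the header row
        cols = line.split()
        if len(cols) < 6 or not cols[4].endswith("%"):
            continue                                  # e.g. "-" when inodes are not reported
        use = int(cols[4].rstrip("%"))
        if use >= threshold_pct:
            hogs.append((cols[5], use))
    return hogs


# A few rows copied from the example output.
df_sample = """\
Filesystem Inodes IUsed IFree IUse% Mounted on
udev 4114281 386 4113895 1% /dev
/dev/sda3 6053888 101781 5952107 2% /
/dev/sda1 60960 307 60653 1% /boot
"""
print(inode_hogs(df_sample, threshold_pct=2))
```

A filesystem can run out of inodes long before it runs out of blocks, so this check complements plain `df` rather than replacing it.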



Filesystem Analysis Tool: slabtop
▪ What to look for:
▪ Utilization (kernel slab cache information)
▪ Important options
▪ -o display and exit
root@m0131372:~# slabtop -o
Active / Total Objects (% used) : 233156 / 234362 (99.5%)
Active / Total Slabs (% used) : 4325 / 4325 (100.0%)
Active / Total Caches (% used) : 69 / 108 (63.9%)
Active / Total Size (% used) : 59301.05K / 60013.86K (98.8%)
Minimum / Average / Maximum Object : 0.01K / 0.26K / 8.00K

OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
48824 48824 100% 0.12K 718 68 5744K kernfs_node_cache
26502 26502 100% 0.19K 631 42 5048K dentry
18582 18582 100% 0.55K 326 57 10432K inode_cache
11466 11466 100% 0.10K 294 39 1176K buffer_head
10864 10864 100% 0.07K 194 56 776K Acpi-Operand

▪ OBJS: number of objects
▪ ACTIVE: number of active objects
▪ USE: cache utilization
▪ OBJ SIZE: object size
▪ SLABS: number of slabs
▪ OBJ/SLAB: objects per slab
▪ CACHE SIZE: cache size
▪ NAME: name



Filesystem Analysis Tool: strace
▪ What to look for:
▪ Latency Analysis (read, write syscall timings)
▪ Important switches/parameters
▪ -tt time in microseconds
▪ -T time spent in syscalls
▪ -p PID
root@m0131372:~# strace -ttT -p 796
19:58:55.741953 write(151, "Jun 6 19:58:55 us4-mdac16-16 am"..., 347) = 347 <0.000050>
19:58:55.742088 write(11, "Jun 6 19:58:55 us4-mdac16-16 am"..., 347) = 347 <0.000094>
19:58:55.742319 write(16, "Jun 6 19:58:55 us4-supermx16-4 "..., 280) = 280 <0.000095>
19:58:55.742489 write(11, "Jun 6 19:58:55 us4-supermx16-3 "..., 120) = 120 <0.000053>
19:58:55.742615 write(8, "Jun 6 19:58:55 us4-dispatch16-2"..., 797) = 797 <0.000048>
19:58:55.742738 write(11, "Jun 6 19:58:55 us4-dispatch16-1"..., 455) = 455 <0.000186>
19:58:55.743002 write(56, "Jun 6 19:58:55 us4-mdac16-21 pp"..., 171) = 171 <0.000052>
19:58:55.743130 write(17, "Jun 6 19:58:55 us4-mdac16-21 pp"..., 171) = 171 <0.000048>
19:58:55.743250 write(12, "Jun 6 19:58:55 us4-mdac16-21 pp"..., 171) = 171 <0.000047>
▪ Left side: timestamp with microseconds
▪ Return value: bytes read/written
▪ Right side: execution time in seconds
▪ Caution: strace can introduce significant overhead!
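With thousands of lines of output, the slow calls are easier to find programmatically. A minimal sketch that pulls the syscall name and the trailing `<seconds>` field out of `strace -ttT` lines; the regular expression, function name, and threshold are assumptions, and only completed single-line calls are handled.

```python
import re

# timestamp, syscall name, then the <elapsed seconds> suffix added by -T
SYSCALL_TIME = re.compile(r"^(\S+)\s+(\w+)\(.*<(\d+\.\d+)>$")


def slow_syscalls(strace_lines, threshold_s):
    """Return (timestamp, syscall, seconds) for calls at or above the threshold."""
    slow = []
    for line in strace_lines:
        m = SYSCALL_TIME.match(line.strip())
        if m and float(m.group(3)) >= threshold_s:
            slow.append((m.group(1), m.group(2), float(m.group(3))))
    return slow


# Two lines copied from the example output.
lines = [
    '19:58:55.742615 write(8, "Jun 6 19:58:55 us4-dispatch16-2"..., 797) = 797 <0.000048>',
    '19:58:55.742738 write(11, "Jun 6 19:58:55 us4-dispatch16-1"..., 455) = 455 <0.000186>',
]
print(slow_syscalls(lines, threshold_s=0.0001))
```

Because strace itself slows the traced process, the absolute times are inflated; the relative outliers are what matter.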



Disk IO Analysis
How to Determine Utilization and Saturation



USE Method: IO Analysis
▪ Disk Devices
▪ Utilization: the time the device was busy
▪ Saturation: IO queue time
▪ Errors: Device errors
▪ Disk Controllers
▪ Utilization: current vs maximum throughput
▪ Utilization: current vs maximum operation rate
▪ Saturation: IO waiting on controller
▪ Errors: controller errors



Latency and Saturation
▪ Storage systems are complex
▪ Often reflect multiple inflection points
▪ Extra load adds latency before saturation
▪ Might mean something is broken/misconfigured

[Figure: Latency vs Throughput, 4K writes; latency (ms, 0 to 90) vs IOPS (0 to 3500); points labeled 1, 2, 4, 8, 16, 32, 64, 128, 256; annotations: “Saturation Point” and “What’s happening here?”]



Little’s Law
▪ Little’s Law comes from queuing theory and is highly relevant to systems performance

▪ L = average number of requests in a system
▪ λ = average arrival rate
▪ W = average time a request spends in the system (wait time + service time)

L = λW
Avg. # of requests in system = Arrival rate * Avg. time in system
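Any one of the three quantities follows from the other two. A small sketch of the arithmetic, with hypothetical function names and made-up numbers:

```python
def requests_in_system(arrival_rate, time_in_system):
    """L = lambda * W."""
    return arrival_rate * time_in_system


def time_in_system(requests, arrival_rate):
    """W = L / lambda."""
    return requests / arrival_rate


# Made-up example: 200 requests/sec, each spending 50 ms in the system.
in_flight = requests_in_system(200.0, 0.050)
print(f"about {in_flight:.0f} requests in the system on average")

# Rearranged: 10 requests in flight at 200 requests/sec.
print(f"about {time_in_system(10.0, 200.0) * 1000:.0f} ms per request")
```

The relationship holds for long-run averages in a stable system, so it is best applied to measurements taken over a steady interval.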



Little’s Law
(Assuming Arrival Rate = Departure Rate)

[Diagram: queuing system; arrivals enter a queue (wait time), pass through the service center (service time), and depart]

Queue size = Arrival Rate * Wait Time
Requests in Process = Arrival Rate * Service Time
(Requests Queued + In Process) = Arrival Rate * (Wait Time + Service Time)



Little’s Law: Disk IO

[Diagram: disk system; arrivals enter a queue (wait time), are serviced by the disk device (service time), and depart]

Avg Queue Depth = IOPS * Avg Latency

(Queue Depth = IOs queued + in process)
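The relationship can be checked against the iostat sample that appears later in this deck. A minimal sketch, assuming IOPS is r/s + w/s and average latency is the await column in milliseconds; the function name is hypothetical and the figures are copied from that sample.

```python
def avg_queue_depth(iops, await_ms):
    """Little's Law for a disk: queue depth = IOPS * average latency (seconds)."""
    return iops * (await_ms / 1000.0)


# device: (r/s + w/s, await in ms, avgqu-sz reported by iostat)
iostat_sample = {
    "sdb": (0.00 + 48.78, 1.05, 0.05),
    "dm-0": (0.00 + 90.72, 0.67, 0.06),
    "dm-1": (0.00 + 129.11, 0.69, 0.09),
}

for device, (iops, await_ms, reported) in iostat_sample.items():
    estimate = avg_queue_depth(iops, await_ms)
    print(f"{device}: estimated {estimate:.2f}, reported {reported:.2f}")
```

The estimates agree with the reported avgqu-sz values to two decimal places, which is a useful sanity check when a tool reports only two of the three quantities.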



Little’s Law: Network Traffic

[Diagram: network system; arrivals enter a queue (wait time), go on the wire (service time), and depart]

Packets in flight = Packets/sec * Avg Latency
Bytes in flight = Bytes/sec * Avg Latency

(Large variance in packet size can skew the second one)



Little’s Law: Email Message Flow

[Diagram: Mail Transfer Agent; arrivals enter a queue (wait time), are in transmission (service time), and depart]

Messages in System = Messages/sec * Avg Latency



Little’s Law: Ramifications
▪ Assuming constant requests in flight: latency and throughput move in opposite directions
▪ Assuming constant latency: requests in flight and throughput rise and fall together
▪ Assuming constant throughput: requests in flight and latency rise and fall together



Disk Analysis Tool: iostat
▪ What to look for:
▪ Disk Utilization (r/s, w/s, rkB/s, wkB/s)
▪ Disk Saturation (avgqu-sz)
▪ Important switches/parameters
▪ time interval (real time data)
▪ -x extended statistics
▪ -d disk stats only (no CPU)

root@us1-filterqueue16-4:~# iostat -xd 1
Linux 4.4.0-116-generic (us1-filterqueue16-4) 06/06/2018 _x86_64_ (4 CPU)

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 1.99 0.01 1.62 0.18 19.66 24.39 0.00 0.36 2.39 0.34 0.21 0.03
sdb 0.00 173.01 0.00 48.78 0.02 3603.72 147.76 0.05 1.05 1.19 1.05 0.49 2.39
dm-0 0.00 0.00 0.00 90.72 0.01 1704.58 37.58 0.06 0.67 1.36 0.67 0.13 1.13
dm-1 0.00 0.00 0.00 129.11 0.01 1899.14 29.42 0.09 0.69 2.22 0.69 0.10 1.31

▪ rrqm/s: read requests merged/queued per second
▪ wrqm/s: write requests merged/queued per second
▪ r/s: read requests completed per second
▪ w/s: write requests completed per second
▪ rkB/s: KB read per second
▪ wkB/s: KB written per second
▪ avgrq-sz: average request size (in 512-byte sectors)
▪ avgqu-sz: average IO queue depth
▪ await: average request time (includes wait, ms)
▪ r_await: average read request time (ms)
▪ w_await: average write request time (ms)
▪ svctm: here be dragons*
▪ %util: here also be dragons*

iostat Caveats
▪ svctm
▪ man page: “The average service time (svctm field) value is meaningless”
▪ Why? Kernel measures stats at block/request level (not the device)
▪ Service time is inferred from %util and total IOPS

▪ %util
▪ Percentage of time that I/O requests were issued to the device
▪ This doesn’t work for:
- RAID Arrays
- SSDs
- Why? Parallelism
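The inference behind svctm can be reproduced from the other columns, which shows why it carries no independent information. A minimal sketch, assuming svctm is the busy time per second (%util) divided by IOPS (r/s + w/s); the function name is hypothetical and the figures come from the iostat sample.

```python
def inferred_svctm_ms(util_pct, iops):
    """Busy milliseconds per second divided by IOs per second."""
    busy_ms_per_sec = util_pct / 100.0 * 1000.0
    return busy_ms_per_sec / iops if iops else 0.0


# (device, %util, r/s + w/s, svctm reported by iostat)
rows = [
    ("sdb", 2.39, 48.78, 0.49),
    ("dm-1", 1.31, 129.11, 0.10),
]
for device, util_pct, iops, reported in rows:
    print(f"{device}: inferred {inferred_svctm_ms(util_pct, iops):.2f} ms, "
          f"reported {reported:.2f} ms")
```

On a device that services many requests in parallel, busy time stops growing once the device is never idle, so both this derived service time and %util understate how much headroom remains.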



Disk Analysis Tool: sar
▪ What to look for:
▪ Disk Utilization (tps, rd_sec/s, wr_sec/s)
▪ Disk Saturation (avgqu-sz)
▪ Important switches/parameters
▪ time interval (real time data)
▪ -d report disk statistics
root@us1-filterqueue16-4:~# sar -d 1
Linux 4.4.0-116-generic (us1-filterqueue16-4) 06/06/2018 _x86_64_ (4 CPU)

11:17:06 PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
11:17:07 PM dev8-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
11:17:07 PM dev8-16 53.00 0.00 8880.00 167.55 0.03 0.60 0.38 2.00
11:17:07 PM dev252-0 117.00 0.00 7512.00 64.21 0.09 0.79 0.10 1.20
11:17:07 PM dev252-1 116.00 0.00 1368.00 11.79 0.05 0.45 0.07 0.80

▪ DEV: Device, by major/minor number
▪ tps: IOPS (IO Operations per Second)
▪ rd_sec/s: 512-byte sectors read per second
▪ wr_sec/s: 512-byte sectors written per second
▪ avgrq-sz: average request size (in 512-byte sectors)
▪ avgqu-sz: average IO queue depth
▪ await: average request time (includes wait)
▪ svctm: here be dragons*
▪ %util: here also be dragons*



Disk Analysis Tool: iotop
▪ What to look for:
▪ Utilization (per process)
▪ DISK READ, DISK WRITE, IO
▪ Important options
▪ Left/Right arrow: change sort column
▪ r: change sort order
▪ o: show procs with IO only

Total DISK READ : 0.00 B/s | Total DISK WRITE : 2.71 M/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 3.41 M/s
TID PRIO USER DISK READ DISK WRITE> SWAPIN IO COMMAND
17966 be/4 postfix 0.00 B/s 1693.11 K/s 0.00 % 0.34 % cleanup -z -t unix -u -c
19865 be/4 postfix 0.00 B/s 479.33 K/s 0.00 % 0.32 % cleanup -z -t unix -u -c
20054 be/4 postfix 0.00 B/s 405.88 K/s 0.00 % 0.24 % cleanup -z -t unix -u -c
20050 be/4 postfix 0.00 B/s 81.18 K/s 0.00 % 0.10 % cleanup -z -t unix -u -c
20053 be/4 postfix 0.00 B/s 46.39 K/s 0.00 % 0.07 % cleanup -z -t unix -u -c

▪ TID: thread ID
▪ PRIO: IO priority
▪ USER: process user
▪ DISK READ: disk read throughput
▪ DISK WRITE: disk write throughput
▪ SWAPIN: % time spent swapping
▪ IO: % time spent on IO
▪ COMMAND: command



Disk Analysis Tool: blktrace
▪ What to look for:
▪ Details of individual IO
▪ Caution, can introduce overhead
▪ Important switches/parameters
▪ blktrace -d device
▪ blktrace -o output file
▪ blkparse -i input file
root@m0131372:~# blktrace -d /dev/sda -o - | blkparse -i -
8,0 7 1 0.000000000 15204 A W 71030728 + 8 <- (8,3) 54915016
8,0 7 2 0.000001446 15204 Q W 71030728 + 8 [kworker/u256:2]
8,0 7 3 0.000006088 15204 G W 71030728 + 8 [kworker/u256:2]
8,0 7 4 0.000006756 15204 P N [kworker/u256:2]
8,0 7 5 0.000063346 15204 A W 30117584 + 8 <- (8,3) 14001872
8,0 7 6 0.000063569 15204 Q W 30117584 + 8 [kworker/u256:2]
8,0 7 7 0.000065031 15204 G W 30117584 + 8 [kworker/u256:2]
8,0 7 8 0.000067831 15204 A W 30117576 + 8 <- (8,3) 14001864

1. Device major, minor
2. CPU #
3. Seq #
4. Relative time (seconds)
5. PID
6. Action
7. RWBS
8. LBA + size in sectors, w/ process name (action dependent)
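The eight fields above map directly onto whitespace-separated columns, so a trace can be split into structured records for further analysis. The sketch below is one possible parser for the default blkparse line format shown in the sample. `BlkEvent` and `parse_blkparse_line` are illustrative names, and the summary lines blkparse prints at the end of a trace are not handled.

```python
from typing import NamedTuple


class BlkEvent(NamedTuple):
    device: str    # major,minor
    cpu: int
    seq: int
    time_s: float  # seconds since the trace started
    pid: int
    action: str    # Q, G, I, D, C, ...
    rwbs: str      # R, W, N, ...
    detail: str    # action-dependent remainder of the line


def parse_blkparse_line(line: str) -> BlkEvent:
    """Split one default-format blkparse event line into its fields."""
    parts = line.split(None, 7)
    if len(parts) < 7:
        raise ValueError(f"not a blkparse event line: {line!r}")
    detail = parts[7].strip() if len(parts) > 7 else ""
    return BlkEvent(parts[0], int(parts[1]), int(parts[2]), float(parts[3]),
                    int(parts[4]), parts[5], parts[6], detail)


# Two lines taken from the sample output above
for raw in (
    "8,0 7 2 0.000001446 15204 Q W 71030728 + 8 [kworker/u256:2]",
    "8,0 7 4 0.000006756 15204 P N [kworker/u256:2]",
):
    event = parse_blkparse_line(raw)
    print(event.action, event.rwbs, event.detail)
```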



Disk Analysis Tool: blktrace
Actions
▪ C - IO completion
▪ D - IO issued to driver
▪ I - IO inserted into request queue
▪ Q - intent to queue IO
▪ B - IO bounced
▪ M - back merge
▪ F - front merge
▪ G - get request
▪ S - sleep
▪ P - plug
▪ U - unplug
▪ T - unplug due to timer
▪ X - split
▪ A - remap

RWBS
▪ R - Read
▪ W - Write
▪ B - Barrier
▪ S - Synchronous
▪ D - Discard



Typical IO Event Flow

[Diagram: typical IO event flow]
Main path: A (Remap) -> Q (Intent to Queue) -> G (Get Request Struct) -> I (Insert Into Queue) -> D (Submitted to Driver) -> C (IO Complete)
Side paths: P (Plug Queue), T/U (Unplug Queue), F/M (Merged)
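Once the event flow is known, per-IO latency is simply the time difference between two actions for the same request. The sketch below pairs Q and C events by starting sector to estimate queue-to-completion time. The event tuples are hypothetical values made up for illustration (the earlier sample trace contains no D or C events), and the function name is an assumption. Merged or split requests are not handled; the btt tool shipped with blktrace does that accounting properly.

```python
def queue_to_complete_latency(events):
    """Pair Q and C events by starting sector.

    events: iterable of (time_s, action, sector) tuples in trace order.
    Returns {sector: seconds between Q and C}.
    """
    queued = {}
    latency = {}
    for time_s, action, sector in events:
        if action == "Q":
            # Remember the first time this sector was queued
            queued.setdefault(sector, time_s)
        elif action == "C" and sector in queued:
            latency[sector] = time_s - queued.pop(sector)
    return latency


# Hypothetical events for a single write, following the A/Q/G/I/D/C flow
events = [
    (0.000001, "Q", 71030728),
    (0.000006, "G", 71030728),
    (0.000010, "I", 71030728),
    (0.000050, "D", 71030728),
    (0.000450, "C", 71030728),
]

for sector, seconds in queue_to_complete_latency(events).items():
    print(f"sector {sector}: {seconds * 1e6:.0f} us from Q to C")
```

Swapping "Q" for "D" in the function would measure driver-to-completion time instead, which separates device service time from time spent in the request queue.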



blktrace Example: Query Analysis
MySQL Query (TPC-H Query 1)



Network Interface Analysis
How to Determine Utilization and Saturation



USE Method: Network Interface Analysis
▪ For each network interface, check for (in both directions):
▪ Utilization: time the interface was busy sending/receiving frames
▪ Utilization: Current throughput vs max throughput
▪ Saturation:
- extra queuing
- blocking
- buffering
▪ Errors:
- Bad checksum (receive)
- Frame too short/too long (receive)
- Collisions (receive)/Late collisions (send)



Network Interface Analysis Tool: ip
▪ What to look for:
- Utilization (TX/RX bytes)
- Saturation (dropped, overrun)
- Errors (errors, collsns)
▪ Important switches/parameters
- -s statistics
- link selects link layer objects
root@m0131372:~# ip -s link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
RX: bytes packets errors dropped overrun mcast
403540 2632 0 0 0 0
TX: bytes packets errors dropped carrier collsns
403540 2632 0 0 0 0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 00:50:56:81:cf:9d brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped overrun mcast
900011431 12033688 0 1707 0 35
TX: bytes packets errors dropped carrier collsns
64177178 247179 0 0 0 0
▪ bytes: bytes transmitted/received
▪ packets: packets transmitted/received
▪ errors: transmit/receive errors (e.g., CRC errors)
▪ dropped: dropped packets
▪ overrun: interface buffer overruns
▪ mcast: multicast packets
▪ carrier: carrier errors (link disconnects)
▪ collsns: collisions (switch issues?)
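The ip -s link counters are cumulative, so they are most useful as ratios or as differences between two samples. As a rough sketch, the Python below parses the eth0 block from the sample output and reports dropped packets as a fraction of packets received. `parse_ip_stats` is an illustrative helper, not part of iproute2, and it assumes the header-line-then-value-line layout shown above.

```python
SAMPLE = """\
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:50:56:81:cf:9d brd ff:ff:ff:ff:ff:ff
    RX: bytes packets errors dropped overrun mcast
    900011431 12033688 0 1707 0 35
    TX: bytes packets errors dropped carrier collsns
    64177178 247179 0 0 0 0
"""


def parse_ip_stats(text):
    """Parse `ip -s link` output into {iface: {"RX": {...}, "TX": {...}}}."""
    stats = {}
    iface = None
    lines = [ln.strip() for ln in text.splitlines()]
    for i, line in enumerate(lines):
        head = line.split(":", 2)
        if len(head) >= 3 and head[0].isdigit():
            # Interface header, e.g. "2: eth0: <...>"
            iface = head[1].strip()
            stats[iface] = {}
        elif line.startswith(("RX:", "TX:")) and iface and i + 1 < len(lines):
            # Column names on this line, counter values on the next one
            direction, names = line[:2], line[3:].split()
            values = [int(v) for v in lines[i + 1].split()]
            stats[iface][direction] = dict(zip(names, values))
    return stats


stats = parse_ip_stats(SAMPLE)
rx = stats["eth0"]["RX"]
drop_pct = 100.0 * rx["dropped"] / rx["packets"]
print(f"eth0 RX dropped: {rx['dropped']} of {rx['packets']} packets ({drop_pct:.4f}%)")
```

To turn the counters into rates, take two snapshots a known interval apart and divide the difference by the interval.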



Network Interface Analysis Tool: sar
▪ What to look for:
- Network interface utilization (rxkB/s, txkB/s, %ifutil)
▪ Important switches/parameters
- time interval (real time data)
- -n report network statistics
- DEV report device stats
root@m0131372:~# sar -n DEV 1
Linux 4.4.0-83-generic (m0131372.ppops.net) 06/08/2018 _x86_64_ (8 CPU)

10:08:14 PM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
10:08:15 PM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
10:08:15 PM eth0 78.00 0.00 4.55 0.00 0.00 0.00 0.00 0.00

10:08:15 PM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
10:08:16 PM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
10:08:16 PM eth0 56.00 1.00 3.27 0.39 0.00 0.00 0.00 0.00

▪ kB/s: kilobytes transmitted/received per second
▪ %ifutil: greater of rxkB/s or txkB/s as a percentage of interface speed
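The %ifutil definition can be reproduced by hand when sar cannot determine the link speed and reports 0.00. The sketch below follows the definition above for full-duplex links and uses the sum of both directions for half-duplex links. The 1000 Mb/s speed is an assumption (the sample output does not state it), as is treating one kB as 1024 bytes.

```python
def ifutil_percent(rx_kb_s, tx_kb_s, speed_mbit, full_duplex=True):
    """Approximate interface utilization as a percentage of link speed.

    rx_kb_s / tx_kb_s: throughput in kB/s (assumed here to be 1024-byte units)
    speed_mbit: link speed in megabits per second (e.g. 1000 for 1 GbE)
    """
    if speed_mbit <= 0:
        return 0.0  # unknown link speed, nothing meaningful to report
    kb_s = max(rx_kb_s, tx_kb_s) if full_duplex else rx_kb_s + tx_kb_s
    bits_per_sec = kb_s * 1024 * 8
    return 100.0 * bits_per_sec / (speed_mbit * 1_000_000)


# eth0 from the first sample interval above, on an assumed 1 Gb/s link
print(f"{ifutil_percent(4.55, 0.00, 1000):.4f} %")
```

At a few kB/s on a gigabit link the result rounds to 0.00, which matches the %ifutil column in the sample.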



Network Interface Analysis Tool: sar
▪ What to look for:
- Network interface saturation (drops, fifos)
▪ Important switches/parameters
- time interval (real time data)
- -n report network statistics
- EDEV report device errors
root@m0131372:~# sar -n EDEV 1
Linux 4.4.0-83-generic (m0131372.ppops.net) 06/08/2018 _x86_64_ (8 CPU)

09:52:40 PM IFACE rxerr/s txerr/s coll/s rxdrop/s txdrop/s txcarr/s rxfram/s rxfifo/s txfifo/s
09:52:41 PM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
09:52:41 PM eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

09:52:41 PM IFACE rxerr/s txerr/s coll/s rxdrop/s txdrop/s txcarr/s rxfram/s rxfifo/s txfifo/s
09:52:42 PM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
09:52:42 PM eth0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

▪ drops: number of packets dropped per second because of a lack of space in Linux buffers
▪ fifos: number of FIFO overrun errors that happened per second



Network Interface Analysis Tool: ping
▪ What to look for:
▪ Network connectivity
▪ Baseline network latency
root@m0131372:~# ping 10.7.64.42
PING 10.7.64.42 (10.7.64.42) 56(84) bytes of data.
64 bytes from 10.7.64.42: icmp_seq=1 ttl=57 time=42.6 ms
64 bytes from 10.7.64.42: icmp_seq=2 ttl=57 time=42.4 ms
64 bytes from 10.7.64.42: icmp_seq=3 ttl=57 time=42.4 ms
64 bytes from 10.7.64.42: icmp_seq=4 ttl=57 time=42.4 ms
64 bytes from 10.7.64.42: icmp_seq=5 ttl=57 time=42.6 ms

▪ Uses the ICMP ECHO_REQUEST to elicit an ICMP ECHO_RESPONSE from the specified IP address
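A latency baseline is easier to compare later if it is reduced to a few numbers. The sketch below extracts the round-trip times from the ping output above and summarizes them as min/avg/max. `rtt_summary` is an illustrative helper, and the regular expression assumes the `time=... ms` format shown in the sample.

```python
import re

RTT_RE = re.compile(r"time=(\d+(?:\.\d+)?) ms")

SAMPLE = """\
64 bytes from 10.7.64.42: icmp_seq=1 ttl=57 time=42.6 ms
64 bytes from 10.7.64.42: icmp_seq=2 ttl=57 time=42.4 ms
64 bytes from 10.7.64.42: icmp_seq=3 ttl=57 time=42.4 ms
64 bytes from 10.7.64.42: icmp_seq=4 ttl=57 time=42.4 ms
64 bytes from 10.7.64.42: icmp_seq=5 ttl=57 time=42.6 ms
"""


def rtt_summary(ping_output):
    """Return (min, avg, max) round-trip time in ms, or None if no replies."""
    rtts = [float(value) for value in RTT_RE.findall(ping_output)]
    if not rtts:
        return None
    return min(rtts), sum(rtts) / len(rtts), max(rtts)


low, avg, high = rtt_summary(SAMPLE)
print(f"min/avg/max = {low:.1f}/{avg:.2f}/{high:.1f} ms")
```

ping prints a similar summary itself when it exits; parsing the per-reply lines is mainly useful when the output has been captured to a log.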



Network Interface Analysis Tool: traceroute
▪ What to look for:
- Network connectivity
- Network latency
- Network route information
▪ Interesting switches/parameters
- -T use TCP
- -I use ICMP
- -p port (base port for UDP)
root@m0131372:~# traceroute 10.7.64.42
traceroute to 10.7.64.42 (10.7.64.42), 30 hops max, 60 byte packets
1 gateway.vlan55.com (10.110.55.253) 0.501 ms 0.416 ms 0.465 ms
2 10.87.3.10 (10.87.3.10) 17.647 ms 17.632 ms 17.554 ms
3 10.87.3.1 (10.87.3.1) 0.689 ms 0.673 ms 0.655 ms
4 10.87.0.16 (10.87.0.16) 41.882 ms 41.876 ms 41.867 ms
5 10.87.4.2 (10.87.4.2) 78.615 ms 78.600 ms 78.649 ms
6 10.87.4.9 (10.87.4.9) 42.258 ms 42.178 ms *
7 10.7.6.66 (10.7.6.66) 51.645 ms 51.679 ms 51.617 ms
8 m0126666.ppops.net (10.7.64.42) 42.606 ms 42.753 ms 42.725 ms

▪ Tracks the route to an IP address
▪ Uses the IP TTL field to elicit an ICMP TIME_EXCEEDED from each hop
▪ Most Linux implementations default to UDP probes sent to high-numbered ports
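Because each hop answers separately, per-hop times do not have to increase monotonically. In the sample, hop 5 reports about 78 ms while the destination answers in about 42 ms, which usually means that router is slow to generate ICMP replies, not that the path is slow. The sketch below pulls the best time per hop out of a few lines of the sample so such outliers are easy to spot; `hop_best_rtts` is an illustrative name.

```python
import re

HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")
MS_RE = re.compile(r"(\d+(?:\.\d+)?) ms")

SAMPLE = """\
traceroute to 10.7.64.42 (10.7.64.42), 30 hops max, 60 byte packets
 4 10.87.0.16 (10.87.0.16) 41.882 ms 41.876 ms 41.867 ms
 5 10.87.4.2 (10.87.4.2) 78.615 ms 78.600 ms 78.649 ms
 6 10.87.4.9 (10.87.4.9) 42.258 ms 42.178 ms *
 8 m0126666.ppops.net (10.7.64.42) 42.606 ms 42.753 ms 42.725 ms
"""


def hop_best_rtts(output):
    """Return {hop number: lowest RTT in ms, or None if every probe timed out}."""
    hops = {}
    for line in output.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # banner line or anything that does not start with a hop number
        rtts = [float(value) for value in MS_RE.findall(match.group(2))]
        hops[int(match.group(1))] = min(rtts) if rtts else None
    return hops


hops = hop_best_rtts(SAMPLE)
final_hop = max(hops)
for hop, rtt in sorted(hops.items()):
    note = "  <- slower than destination" if rtt and rtt > 1.5 * hops[final_hop] else ""
    print(f"hop {hop}: {rtt} ms{note}")
```

The 1.5x threshold is arbitrary and only serves to flag hops worth a second look.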



Network Interface Analysis Tool: tcpdump
▪ What to look for:
- Network packet capture
- Useful for detailed analysis
▪ Important switches/parameters
- -i interface
- -w file to capture to
- -n don’t convert addresses to names
- -r file to read packets from

root@m0131372:~# tcpdump -i eth0 -w capture.pcap
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C418 packets captured
439 packets received by filter
0 packets dropped by kernel
root@m0131372:~# tcpdump -nr capture.pcap
reading from file capture.pcap, link-type EN10MB (Ethernet)
23:07:50.452414 IP 10.110.49.157.22 > 10.100.9.222.60927: Flags [P.], seq 3122911001:3122911125, ack 1724054225, win 79, length 124
23:07:50.510353 ARP, Request who-has 10.110.50.117 tell 10.110.51.65, length 46
23:07:50.511465 ARP, Request who-has 10.110.48.27 tell 10.110.51.91, length 46
23:07:50.518487 ARP, Request who-has 10.110.48.128 tell 10.110.48.114, length 46
23:07:50.518631 ARP, Request who-has 10.110.50.142 tell 10.110.50.62, length 46
23:07:50.531575 ARP, Request who-has 10.110.49.35 tell 10.110.55.240, length 46
23:07:50.534405 IP 10.100.9.222.60927 > 10.110.49.157.22: Flags [.], ack 124, win 696, length 0
23:07:50.535136 ARP, Request who-has 10.110.50.6 tell 10.110.51.69, length 46



Summary
Utilization, Saturation, Errors



USE Method
▪ For every resource, check:
▪ Utilization: time resource is busy, degree resource is used
▪ Saturation: Degree extra work is queued, degree latency is high/growing fast
▪ Errors: any errors

▪ Allows you to assess overall system health quickly

▪ Identifies areas for further analysis



Linux Observability Tools



Sources and Further Reading
▪ Gregg, Brendan. (2014) Systems Performance: Enterprise and the Cloud. Pearson Education.
Kindle Edition.
▪ Jain, Raj. (1991) The Art of Computer Systems Performance Analysis: Techniques for
Experimental Design, Measurement, Simulation, and Modeling. John Wiley & Sons
▪ http://www.brendangregg.com/usemethod.html
▪ https://www.slideshare.net/brendangregg/velocity-2015-linux-perf-tools
▪ http://brendangregg.com/blog/2017-08-08/linux-load-averages.html
▪ http://techblog.netflix.com/2015/11/linux-performance-analysis-in-60s.html
▪ https://brooker.co.za/blog/2014/07/04/iostat-pct.html
▪ http://yoshinorimatsunobu.blogspot.com/2009/07/iostat-rs-ws-svctm-util-on-linux.html
▪ http://www.fis.unipr.it/doc/blktrace-1.0.1/blktrace.pdf
▪ Man pages for: uptime, vmstat, mpstat, sar, top, free, slabtop, strace, iostat, iotop, blktrace, ip,
ping, traceroute, tcpdump



Thank You
Questions?


