Professional Documents
Culture Documents
Enterprise Server
Performance Management
SA-400
Appendices
• Interface Card Characteristics
• SyMON Installation
• Accounting
Course Objectives
• Describe the purpose and goals of tuning
Course Prerequisites
• Basic Sun system administration as demonstrated in
Solaris 2.X System Administration Essentials (SA-135), and
Solaris 2.X System Administration (SA-285)
Introductions
1. Who are you?
2. What do you do?
3. What is your background and experience as it relates
to this course?
4. What do you hope to accomplish this week?
Module 1
Introduction
Module Overview
• Course map
• Relevance
• Objectives
Course Map
Objectives
• Explain what performance tuning really means
What is Tuning?
• Improving resource usage
• Improving user perception of the system
• Making more efficient use of system resources
• Increasing system capacity
• Adapting the system to your workload
• Lowering costs
Find a bottleneck
Remove it
Is
N performance Y
satisfactory Stop
?
Configuration changes
Where to Tune?
• The closer to the application you tune, the more
leverage you have
• System administrators have little control at the
application or DBMS levels
• For the OS and hardware, more general things are
usually done that benefit all applications
• You can tune the system specifically for one
application if you accept the trade-offs
• You may need to understand the applications’
characteristics to properly tune them
Performance Measurement
Terminology
• Bandwidth
• Throughput
• Latency
• Utilization
Saturation Point
Fast
Response
Degradation
Utilization
Memory
CPU I/O
SunOS Components
User level User applications and libraries
Process management
File M m S
system mmap() e a c
m n h
o a e I
r g d P
y e u C
Buffer cache m l
segmap e i
page structures n n
t g
Character Block
Device drivers
Machine-dependent code
Hardware
Module 2
Module Overview
• Course map
• Relevance
• Objectives
Course Map
Objectives
• Describe the lifetime of a process in SunOS
• Explain multithreading and its benefits
• Describe locking and its performance implications
• Describe the functions of the clock thread
• Describe the process and thread tuning data
• Describe the related tuning parameters
Process Lifetime
Parent Child
Creation fork()
exec() Execution
exit() Termination
Status wait()
collection
print prompt
y
fget() terminal
EOF ? done
input line
parse line n
Is y
(pid = fork()) error
<0
wait() n Is y exec()
pid == 0
on pid of child input command
parent child
Multithreading
• Thread - a logical sequence of instructions in a program
Application Threads
proc
Threads
LWP
Process Execution
87
62
CPU
45
15 CPU
Thread Execution
Process System
73 87
44 62
LWP CPU
12 45
3 LWP 15 CPU
Hardware Layer
Performance Issues
Multithreading an application allows it to:
• Be broken into separate tasks that can be scheduled and
executed independently
• Take advantage of multiprocessors with less overhead
than multiple processes
• Share memory without going through the overhead and
complexity of IPC mechanisms
• Use a cleaner programming model for certain types of
applications
• Extend the program more easily
Locking
• Locks are used to synchronize threads by serialization
• They protect critical data from simultaneous write
access
• Locks must be used when threads share writable data
• Solaris provides several different types of locks
Locking Problems
• Lock contention
• Granularity
• Inappropriate lock type
• Deadlock
• ’Lost’ locks
• Race Conditions
• Incomplete implementation
Interrupt Levels
15 Nonmaskable interrupt
Real-time CPU clock
Serial communications
Higher Priority
10 clock
Video vertical interrupt
graphics, ALM
Tape, Ethernet
Disks
Lower Priority
Deferred interrupts
Module 3
Tuning Tools
Module Overview
• Course map
• Relevance
• Objectives
Course Map
Objectives
• Describe how performance data is collected in the
system
The kstat
• Created by a system component
• One for every thing the component wants to measure
• Specific format for each component
• Read by the ioctl system call
• A copy is made into the requesting process
• A whole chain of kstats may be read
• The component updates the kstat as needed
• Usually as the recorded event occurs
sar
• sar does general data collection – a bit of everything
• Has 15 different collection types
• Covers:
• File system, system calls
• Physical memory usage, paging, swap, kernel
memory
• Block device activity, TTY activity
• Process table status
• SV IPC activity, CPU usage
11:16:05 0 0 0
0 0 100 0 0 100 0 0
0 0 0 100
32/3530 0 1594/15320 0 213/213 0 0/0
#
vmstat
• vmstat reports primarily processor-related statistics
• Has 4 operands
• Covers:
• Process, interrupt, CPU cache and CPU activity
• Virtual memory and swapping activity
• Disk activity
• Information similar to sar
iostat
• iostat reports I/O device performance data
• Includes NFS mount points and raw devices
• Has 12 operands
• Covers:
• CPU utilization, TTY activity
• Disk activity, device errors
• Hints:
• Use -M to see Mbytes per second instead of Kbytes
• Use -n to get cXtXdX device names
mpstat
• mpstat reports CPU-related activity
• Reports CPU-specific statistics
• No operands
• Reports:
• Page faults, interrupts, cross-calls
• Locking, context switches, CPU utilization
• Usually not needed
netstat
• netstat reports network-related activity
• Has 14 operands
• Covers:
• Socket state, interface state, DHCP information
• arp and routing tables
• STREAMS statistics
• Gives you excellent visibility on the use of your network
• Can be complicated if you have many network
interfaces
UDP
udpInDatagrams = 274 udpInErrors = 0
udpOutDatagrams = 66
nfsstat
• nfsstat reports nfs-related activity
• Has 6 operands
• Covers:
• Client-side information
• NFS-mounted file system information
• NFS and RPC information
• Can help isolate the exact source of an NFS problem
• Can be useful with DBMS network problems as well
• Many use RPCs to transfer requests and data
Server rpc:
Connection oriented:
calls badcalls nullrecv badlen xdrcall dupchecks
0 0 0 0 0 0
dupreqs
0
Connectionless:
calls badcalls nullrecv badlen xdrcall dupchecks
0 0 0 0 0 0
dupreqs
0
...
SyMON
• Provided on the SMCC Supplements CD-ROM
• Provides remote monitoring of:
• Performance data
• System failures
• System configuration
• Syslog information
• Uses SNMP to transfer data
• Rules can signal exception conditions
Other Utilities
• /usr/proc/bin
• proctool
• System accounting
/usr/proc/bin
• This directory was added by SunOS 5.5
• It contains 10 commands that format data from /proc
• Commands include:
• ptree – Process’ ancesters and children
• pseg – Process’ virtual memory segments
• pfile – Open file information for a process
• The output can be cryptic
proctool
• proctool can be used to continuously display the
following information on a per-process basis:
• Resource usage
• I/O usage
• Memory map
• Process credentials
• Traceback
• Scheduling parameters
• proctool is not part of Solaris and must be
downloaded separately
• sysdef
• adb
• ndd
/etc/system
• This file can be used to modify kernel-tunable variables.
System Accounting
• Accounting is not enabled by default
• Accounting allows you to track what is executed ad
how much resource it uses
• It can be very useful in determining what is using your
system
• You must run the reports to reduce the data
• It does add system overhead
• If you have a varying environment it is the easiest way
to decide where to spend your time
• Correlate the accounting data with your tuning reports
Module 4
Module Overview
• Course map
• Relevance
• Objectives
Course Map
Objectives
• Describe the purpose of a cache
What is a Cache?
• Keeps accessed data near its user
• Must provide high-speed access
• Holds a small subset of the available data
• Caches are used by hardware and software
• Many different caches around the system
• Critical for performance
• Many different ways to manage a cache
Hardware Caches
• There are hardware caches all over the system
• CPU level 1 and level 2
• PDC/TLB
• Main store
• I/O components
• Each system has different caches with different
characteristics
CPU Caches
• Usually two levels of cache in a CPU
Cache Operation
The cache’s user requests data
Cache Replacement
• Data to be replaced may have to be copied back
Cache Operation
Cache Hit
Request
Hit
s Move out
Request is
M
Move in
• Fetch rate
• Locality of data references
• Cache structure
Cache Characteristics
• There are several pairs of mutually exclusive cache
characteristics
Virtual Physical
Write Through Write Back
Direct Mapped Set Associative
Harvard Unified
Snooping Cache
Stack
Hole
Physical
address
CPU Cache MMU
Data
Virtual
address
Physical
Text memory
Stack
Hole
Physical
address
CPU MMU Cache
Data
Virtual
address
Text Physical
memory
Cache Thrashing
• Picking the wrong memory stride essentially turns off
the cache
Cache Thrashing
Physical
pages
Virtual address 1 (array_in[])
MMU
page table
Cache
Module 0 Module 1
Bus
I/O
Interfaces
• Cache thrashing
• Locality of reference
• Memory strides
• Algorithms
• Compiler Optimization
• Main memory management
Module 5
Memory Tuning
Module Overview
• Course map
• Relevance
• Objectives
Course Map
Objectives
• Describe the structure of virtual memory
Virtual Memory
• What the programs see
• Moves between main memory and disk
• Created and deleted as necessary
• Managed by the OS
Kernel
Kernel base
Stack
Shared libraries
Hole
Heap
Text
0
Types of Segments
DVMA segkmem
Device segdev
Frame
buffer
device
Data segvn
Text segvn
Hole
Physical
address
CPU MMU
Data Virtual
address
Text Physical
memory
Is (PDC hit)
it in cache? Done
n (Cache miss)
Wait for MMU to do table
lookup to find the pte
for the page being accessed.
Is y
the valid bit set in Put in cache. A
the pte?
n (Page fault)
I/O n
Get free page and do I/O. Error
successful?
Scan rate
deficit
slowscan
Free Memory
An Alternate Approach
The Clock Algorithm
pages
1
0
backhand 0
0
0
1 reference
bits
handspreadpages 0
0
0
fronthand 0
1
maxpgio
• Modifed pages need to be written back
Swapping
• Swapping is a last resort – ’desperation swapping’
• If the page daemon consistently cannot keep up with the
demand for memory, memory use must be cut
• The number of swaps is not important; the fact that
there are swaps is
• A process will be swapped when its last LWP is
swapped out
• Its memory will not be freed until then
• Do not try to tune swaps, try to eliminate them
AND
AND
Swapping Priorities
• The higher the caclulated priority the more likely the
thread will be swapped
• The priority calculation is:
epri = swapin_time - rss/maxpgio/2 - pri
• Where:
• swapin_time – How long since the thread was last
swapped
• rss – Amount of memory used by the thread’s process
• pri – The thread’s dispatching priority
Swap In
• Once the pressure is off the memory subsystem,
swapped out threads will be brought back in
• A swapping priority is calculated again
• The lowest priority swapped-out thread will be brought
back in
• The process continues until all threads are back
• Memory pressure may cause new swaps
Swap Space
• Virtual memory includes most of main memory
• Reservable virtual memory is disk swap space plus
physical memory, minus kernel pages, minus a
3.5-Mbyte safety factor.
• Systems with a large amount of physical memory
may not require much disk space for swap
• Remember to allocate enough to hold a panic dump
• Swap space is not allocated until the page is written out
• If disk space is not available, the page stays in
memory
tmpfs
• /tmp is a virtual memory ramdisk
• It uses virtual memory like any other user
Performance Data
• Reporting is performed by vmstat and sar
• They don’t report the same data
• Many of the numbers should be ignored
• The scan rate is the most useful value
• Everything else can be misleading
Shared Libraries
Shared libraries have a longer startup time when a routine is
used; however:
Paging Review
Kernel pages are
Kernel fixed in memory.
Stack /dev/rdsk/c0t0d0s1
Anonymous memory is
Heap Swap space paged out to swap space
Module 6
System Buses
Module Overview
• Course map
• Relevance
• Objectives
Course Map
Objectives
• Define a bus
• Explain general bus operation
• Describe the processor buses: UPA, Gigaplane,
Gigaplane XB
• Describe the data buses: SBus, PCI
• Explain how to calculate bus loads
• Describe how to recognize and configure to avoid bus
problems
• Dynamic reconfiguration considerations
What is a Bus?
• A mechanism to transfer data, just like a network
• It usually connects more than two components
• It runs at a fixed speed
• It is usually bi-directional and broadcasts messages
• It is like a highway
• Speed limit is a maximum, not always a required
speed
• It has lower speed ’on and off ramps’
• Higher speeds mean more traffic capacity
System Buses
• Used to connect sections of the system to each other
• CPUs, peripheral controllers and memory
• Many different kinds, depending on technology and
needs
• Smaller systems usually use simpler buses
• Cost issues are important
• Large systems need throughput
• Performance outweighs cost
• Sun has used many different kinds over the years
System Buses
MBus and XDbus
• MBus – for small desktop multiprocessors (sun4m)
• Separate I/O bus from CPU and memory
• 135 Mbytes/second at 50 MHz
• Looks like a current Pentium system design
• XDBus – SS1000, SC2000, CS6400 (sun4d)
• 1 on 1000, 2 on 2000, 4 on 6400
• 250-300 Mbytes/second at 40 MHz
• Easy to overload with a full configuration
System Buses
UPA and Gigaplane Bus
• UPA – used by the Ultra systems
• Crossbar connects CPUs, memory and I/O
• Very efficient and very fast
• Gigaplane – Used in the Ultra servers
• Connects multiple UPA buses (on system boards)
• Very fast – ~2.75 GBytes/second (X500 servers)
• Configurable while you’re running
• This is what dynamic reconfiguration (DR) does
1
3
rd 1
oard 1
boa
mb
10
14
tem
te
d
r d
System board 12
ar
Sys
oa
Sys
bo
b
em
st e m
Sy st
Sy
d9 5
oar b
oa rd 1
tem
Sys te mb
Sys
System board 8
System board 0
Sys
tem Sys
boa tem
rd 7 boa
rd 1
Sy
st Sy
em st
em
bo
bo
Sys
Sys
ar
d ar
6 d
2
tem
tem
b
boa
oar
rd 5
d3
System board 4
Peripheral Buses
• Connect peripherals to the system buses
• Slower than the system buses, usually by a lot
• Circuit or packet switched
• Can easily be the (hidden) limiting factor in I/O
performance
• Sun provides two different ones: PCI and SBus
• Can mix PCI and SBus in Enterprise Servers
• All new systems are PCI
prtdiag
• Provided for Sun’s sun4d and sun4u systems
Bus Freq
Brd Type MHz Slot Name Model
--- ---- ---- ---- -------------------------------- ----------------------
0 SBus 25 0 qec/qe (network) SUNW,595-3198
0 SBus 25 0 SUNW,soc/SUNW,pln 501-2069
0 SBus 25 1 QLGC,isp/sd (block) QLGC,ISP1000
1 SBus 25 0 qec/qe (network) SUNW,595-3198
1 SBus 25 0 SUNW,soc 501-2069
1 SBus 25 1 QLGC,isp/sd (block) QLGC,ISP1000
Bus Limits
• A bus is like a freeway
• More traffic = more congestion
• Too much traffic, things back up
• Things get delayed, you get problems
• Going faster takes more skill and money
• Bus saturation = problems
• You don’t get any specific messages
• How do these problems occur?
If I have checks...
• I must have money!
• If I have slots, I must have bandwidth!
• Both may be wrong
• Symptoms (of the second)
Gigaplane Bus
UPA
200 Mbytes/second each
SBus A SBus B
12 20 ? ? ? 50 50
10 SBus
Memory
5 Gbytes in 20
256 Mbyte banks
50 Mbytes/second
Dynamic Reconfiguration
Considerations
• Problems occur when you add, not when you remove
• Make sure you have bandwidth when you add
components
• Be careful replacing slower cards or components
• Consider reorganizing to spread the load
• Be careful of controller and component renumbering
• Plan ahead
Tuning Reports
• Not very much available in this area
• Vigilance is the best prevention
• Check prtdiag when you add new cards
• Remember that empty slots do not mean available
bandwidth
• Strange behavior in tuning reports may be related to bus
problems
• It usually occurs when the system is most heavily
loaded
Module 7
I/O Tuning
Module Overview
• Course map
• Relevance
• Objectives
Course Map
Objectives
• Describe how the I/O buses operate
• Describe the configuration and operation of the SCSI
bus
• Describe the configuration and operation of a Fibre
Channel bus
• Describe how disk and tape operations proceed
• Describe how a storage array works
• Describe the RAID levels and their uses
• Describe the available I/O tuning data
SCSI Speeds
Name Maximum Speed
Asynchronous 4 MBytes/sec
Synchronous 5 MBytes/sec
Fast 10 MBytes/sec
Fast-20 (Ultra) 20 Mbytes/sec
SCSI Widths
Name Width Cable
Narrow 1 byte 50 conductor cable
Wide 2 bytes in parallel 68 conductor cable
Quad 4 bytes in parallel 2 cables, 104 conductors
SCSI Lengths
Name Limit Symbol
Single-ended 20’, 6m
Host Adapter
target 7 SBE/S
Fibre Channel
• General purpose networking mechanism
• Looks like a packet-switched bus
• Supports up to 128 targets
• Supports more than SCSI – ATM, video, and so on
• Three types
• FC-25 – SPARCstorage Arrays
• FC-AL – A5000
• FC Fabric – Full network
• All devices have a unique WWN, like an IP address
...
Tagged Queueing
• A special ID field is added to the device command
• Allows the drive to handle multiple simultaeou
requests
• Requests can be executed out of order
• Takes advantage of ordered requests to limit arm
motion
• Must be negotiated with the device
• Not all devices support it
• Can provide noticable performance improvements
Mode Pages
• Mode pages are provided for all SCSI devices
• Identified by number
• They describe device properties and configuration
• Several standard, required, pages
• Additional device-specific pages
• Can also have vendor-specific pages
• Can be viewed using format -e
• Header files are in /usr/include/sys/scsi
Bus
Controllers
Drives
Module 8
UFS Tuning
Module Overview
• Course map
• Relevance
• Objectives
Course Map
Objectives
• Describe the vnode interface layer to the file system
vnode layer
Process Anonymous
address CD-Rom Disk memory
space Diskette
Data
Data
B-U Superblock
Cylinder group
block Cylinder group
inodes 1
Data
The Directory
inode number
entry length
name length
name
inode number
entry length
name length
name
name
The inode
inode type and permissions
Number of hard links
Owner and group id
Size in bytes
Time file last accessed
Time file last modified
Time inode last modified
12 direct block pointers
3 indirect block pointers
Number of sectors
Shadow inode location
Data Data
Block Data
addresses Data
Single indirect
2048
Addresses Data
Double indirect Addresses
2048
Triple indirect 2048
Data Data
2048
Addresses Addresses Data
Addresses
2048 2048
2048
Data
2048 2048
Hard Links
directory entry
inode number
directory entry
inode number
name
name
inode
ic_nlink = 2
Symbolic Links
Regular symbolic link Fast symbolic link
/usr directory entry
inode number inode number
entry length
12
name length 4
name t m
p \0
Allocation Performance
’Compacted’ UFS
Allocation
performance
Normal UFS
Faster
0 100
Partition percentage full
Quadratic Rehash
• Tries to avoid clumps of full cylinder groups
• Searches twice as far each time another cg is found full
• Stopping this quickly is important for performance
• Free space is reserved to try to stop quickly
• This is the "10%" reserved seen in df
• With 2.6 this amount depends on partition size
• Formula is (64 Mbytes / file system size) * 100
• Runs from 10% (<= 640 Mbytes) to 1% (>= 6.4
GBytes)
Fragments
• A full 8Kbyte block is a waste for small files
• Fragments allocate 1/8 block increments to a file
• Fragments can be allocated by first-fit or best-fit
• Fragment rules:
1. Only the last block of the file can be fragmented
2. Files larger than 96 Kbytes cannot use fragments
3. The fragments must be contiguous within their block
4. Fragments may be moved to remain contiguous
Application I/O
• Application I/O usually goes through the segmap cache
• The segmap cache is a kernel memory segment
• The segmap cache is 4 MBytes in size
• It is a fully-associative, write-back cache
• The cache contents are different for each process
Application I/O
segmap
cache bread
bwrite
lread lwrite
Kernel
User
read() write()
• One system call and (at least) one copy per request
fsflush
• Remember that the segmap cache is a write-back cache
• Data could stay there unwritten indefinitely, which is
not good in case of a system crash
• fsflush writes modified data back to disk on a regular
basis for UFS and NFS
• There are two tuning parameters:
• tune_t_fsflushr – How often to wake up
• autoup – How long a cycle should take
• autoup is the longest that unmodified data will remain
unwritten
Stack
paddr Mapping
len
off
Data File
(or file-like object)
Text
Using madvise
Used to control more precisely an application’s use of main
storage. Can be used for most virtual memory areas.
• MADV_NORMAL – normal (read ahead only) access
• MADV_RANDOM – random access; no read ahead
• MADV_SEQUENTIAL – sequential access (read ahead
and free behind)
• MADV_WILLNEED – Virtual memory the application
intends to use soon. The data will be read in by read
ahead.
• MADV_DONTNEED – Unneeded virtual memory. Any
associated physical memory will be freed.
• Formula is
Module 9
Network Tuning
Module Overview
• Course map
• Relevance
• Objectives
Course Map
Objectives
• Describe the different types of network hardware
Networks
• Networks are generally packet switched buses
• There are several places to tune:
• The application
• The OS
• The network itself
• You will tune differently depending on your workload
•
Network Bandwidth
Network Connection Bandwidth (bits/s)
28800 baud modem 28.8K
ISDN 1 BRI 64K
ISDN 2 BRI 128K
T1 1.5M
Ethernet 10M
T3 45M
FastEthernet 100M
Full Duplex Fast Enet 100+100M
FDDI 100M
ATM (OC3) 155+155M
ATM (OC12) 622+622M
IP Trunking
• Like network RAID
• Data is striped over multiple network interfaces
• Allows multiple interfaces to look like one
• Works only with qfe cards
• 100base-T quad fast Ethernet
• Can combine 2 or all 4 interfaces
• Usually goes to an Ethernet switch
• Special software required
TCP Connections
TCP Connections are expensive
• TCP is optimized for reliable data on long lived
connections
• Making a connection uses a lot more CPU resource than
moving data
• Connection setup protocol involves several round trip
delays
• Pending connections can cause “listen queue” issues
• Each new connection goes through a “slow start” ramp
up as reliability is determined
TCP Connections
• TCP windows can limit high latency high speed links
• Lost or delayed data causes timeouts and
retransmissions; overhead
• Each open connection consumes memory
• HTTP persistent connections carry several operations
on one connection
Using ndd
• Network driver parameters can be changed in
/etc/system
• They can also be reported on and changed by ndd
• Change lasts until reboot
•
NFS
• Tune the client and server first
• Eliminate non-NFS-specific performance issues
• DNLC and inode cache
• Server I/O and memory performance issues
• Network performance issues
• actimeout too low generates lots of traffic
• cachefs
• NFS V.3 added some protocol efficiencies
Isolating Problems
• Four sources of problems
• Client
• Server
• Network
• NFS itself
• Ensure the individual pieces are tuned, then look at NFS
specifically
• Use NFS V.3 if possible to cut overhead
Tuning Reports
• netstat
•
• nfsstat
•
Module 10
CPU Scheduling
Module Overview
• Course map
• Relevance
• Objectives
Course Map
Objectives
• Describe the Solaris scheduling classes
Scheduling Classes
• There are four scheduling classes provided by SunOS
• Timesharing - TS
• Interactive - IA
• System - SYS
• Real-time - RT
Dispatch Priorities
Dispatch Queues Dispatch Queues
Real-time priorities
109 Interrupt thread priorities
100 100
99 99
Kernel priorities Kernel priorities
60 60
59 59
Timesharing/Interactive Timesharing/Interactive
priorities priorities
0 0
Scheduling States
Running
Blocked Scheduled
Sleeping Ready
Awakened
• Options:
• -c - class to use
dispadmin Example
# dispadmin -c TS -g
#Time Sharing Dispatcher Configuration
RES=1000
Real-Time Scheduling
Real-time systems have rigid response time or dispatch
latency requirements
• Failure to meet the time constraint can cause problems
General purpose systems have problems with real-time:
• Page faults – Unknown time to resolve
• All pages are locked in memory
• Including shared libraries and shared memory
• Non-preemptable kernel – Potential CPU delays
• Preemptable kernel
Real-Time Issues
• Real-time processes can hurt the system if they:
• Use lots of memory – All pages are locked in memory
• Are compute-intensive - Other tasks may be locked
out and system daemons may suffer
• fork other processes – Children inherit real-time
• Get stuck in a loop - May be hard to stop
• Make sure that you really need real-time before using it
• Consider using a real-time thread instead of a whole
process
• Set hires_tick to 1 to set clock interrupts to 1000 Hz
Enterprise Server Performance Management Module 10, slide 17 of 29
Copyright 1998 Sun Microsystems, Inc. All Rights Reserved. SunService August 1998
Sun Educational Services
Thread Dispatching
Switch request
Finished with
time slice?
no yes
Set it up
Pick a CPU Run it
mpstat Data
• xcal - number of interprocessor cross calls
(cpu_sysinfo.xcalls)
• intr - number of interrupts received on each CPU
(cpu_sysinfo.intr)
• migr - number of threads which migrated from another
CPU (cpu_sysinfo.cpumigrate)
• smtx - number of failed adaptive mutex_enter() calls
(cpu_sysinfo.mutex_adenters)
• srw - number of times readers/writer locks were not
acquired on the first try (cpu_sysinfo.rw_wrfails,
cpu_sysinfo.rw_rdfails)
Processor Sets
• Allows exclusive use of groups of processors by certain
processes
• Also known as CPU fencing
• Very different from pbind(1M)
• Managed by the psrset(1M) command
• Controlled by root
• System-defined processor sets may be used by any user
• DR will release bindings if necessary
Module 11
Module Overview
• Course map
• Relevance
• Objectives
Course Map
Objectives
• List four major performance bottlenecks
Compiler Optimization
• Can provide performance improvements of several
hundred percent
• Usually not done by the programmer
• Makes programs a bit harder to debug
• Even better optimization is available for specific CPUs
• Basic compile flags to use are -xO3 -xdepend
• Compiles will run longer
• Use optimized libraries if available
Performance Bottlenecks
Memory I/O CPU Lock Contention
Bottleneck Bottleneck Bottleneck Bottleneck
Memory Bottlenecks
Memory
Bottleneck
Detection
Symptom
I/O Bottlenecks
I/O
Bottleneck
Detection
Symptoms
Uneven workload
Many threads blocked waiting on I/O
High disk utilization rate
Active disk with no free space
CPU Bottlenecks
CPU
Bottleneck
Detection
Symptom