
RHEL Kernel Performance Optimization, Characterization and Tuning

Larry Woodman
John Shakshober

Agenda

Section 1: System Overview
Section 2: Analyzing System Performance
Section 3: Tuning Red Hat Enterprise Linux
Section 4: Performance Analysis and Tuning Examples
References

Section 1: System Overview

Processors
NUMA
Memory Management
File System & Disk IO

Processors Supported/Tested

RHEL4 Limitations
  x86: 16
  x86_64: 8, 512 (LargeSMP)
  ia64: 8, 64 (SGI)

RHEL5 Limitations
  x86: 32
  x86_64: 256
  ia64: 1024

Processor types

Uni-Processor
Symmetric Multi-Processor
Multi-Core
Symmetric Multi-Thread

NUMA Support

RHEL3 NUMA Support
  Basic multi-node support
  Local memory allocation

RHEL4 NUMA Support
  NUMA aware memory allocation policy
  NUMA aware memory reclamation
  Multi-core support

RHEL5 NUMA Support
  NUMA aware scheduling
  CPUsets
  NUMA aware slab allocator
  NUMA aware hugepages

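AMD64 System NUMA Memory Layout

[Figure: four sockets (S1 to S4), each with two cores (C0, C1) and its own local memory; a process runs on S1 C0.
 Interleaved (Non-NUMA): consecutive memory ranges alternate across sockets (S1 S2 S3 S4 S1 S2 S3 S4 ...).
 Non-Interleaved (NUMA): memory is laid out contiguously per socket (S1, S2, S3, S4).]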

Memory Management

Physical Memory (RAM) Management
Virtual Address Space Maps
Kernel Wired Memory
Reclaimable User Memory
Page Reclaim Dynamics

Physical Memory Supported/Tested

RHEL3 Limitations
  x86: 64GB
  x86_64: 64GB
  ia64: 128GB

RHEL4 Limitations
  x86: 64GB
  x86_64: 128GB
  ia64: 1TB

RHEL5 Limitations
  x86: 64GB
  x86_64: 256GB
  ia64: 2TB

Physical Memory (RAM) Management

Physical Memory Layout
NUMA versus Non-NUMA (UMA)
  NUMA Nodes
    Zones
      mem_map array
      Page lists
        Free list
        Active
        Inactive

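Memory Zones

[Figure: zone layout by physical address.
 32-bit: DMA Zone from 0 to 16MB; Normal Zone from 16MB to 896MB or 3968MB; Highmem Zone above that, up to 64GB (PAE).
 64-bit: DMA Zone from 0 to 16MB; Normal Zone from 16MB to end of RAM.]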

Memory Zone Utilization

DMA
  24-bit I/O

Normal
  Kernel Static
  Kernel Dynamic
    slabcache
    bounce buffers
    driver allocations
  User Overflow

Highmem (x86)
  User
    Anonymous
    Pagecache
    Pagetables

Per Zone Resources

RAM
mem_map
Page lists: free, active and inactive
Page allocation and reclamation
Page reclamation watermarks

mem_map

Kernel maintains a page struct for each 4KB (16KB on IA64 and 64KB for PPC64/RHEL5) page of RAM
mem_map is the global array of page structs
Page struct size:
  RHEL3 32-bit = 60 bytes
  RHEL3 64-bit = 112 bytes
  RHEL4/RHEL5 32-bit = 32 bytes
  RHEL4/RHEL5 64-bit = 56 bytes
16GB x86 running RHEL3: ~250MB mem_map array!!!
RHEL4 & 5 mem_map is only about 50% of the RHEL3 mem_map.
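
The mem_map figures above follow from simple arithmetic: one page struct per page of RAM. The sketch below reproduces the 16GB x86/RHEL3 estimate; the function name and the GiB-based rounding are illustrative assumptions, not anything taken from the kernel source.

```python
# Sketch: estimate the size of the mem_map array.
# Page struct sizes are the ones quoted on the slide; everything else
# (function name, GiB/MiB units) is an assumption made for illustration.

GIB = 1024 ** 3
MIB = 1024 ** 2


def mem_map_bytes(ram_bytes, page_bytes, page_struct_bytes):
    """One page struct per page of RAM."""
    return (ram_bytes // page_bytes) * page_struct_bytes


if __name__ == "__main__":
    rhel3_x86 = mem_map_bytes(16 * GIB, 4096, 60)   # RHEL3, 32-bit
    rhel45_x86 = mem_map_bytes(16 * GIB, 4096, 32)  # RHEL4/RHEL5, 32-bit
    print("RHEL3   mem_map: %d MiB" % (rhel3_x86 // MIB))
    print("RHEL4/5 mem_map: %d MiB" % (rhel45_x86 // MIB))
```

With 4KB pages, 16GB of RAM is 4,194,304 pages, so the RHEL3 array is about 252 million bytes (the "~250MB" on the slide), and the RHEL4/RHEL5 array is roughly half of that.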

Per zone page lists

Active List: most recently referenced
  Anonymous: stack, heap, bss
  Pagecache: filesystem data/metadata
Inactive List: least recently referenced
  Dirty: modified
  Laundry: writeback in progress
  Clean: ready to free
Free
  Coalesced buddy allocator

Per zone Free list/buddy allocator lists

Kernel maintains per-zone free list
Buddy allocator coalesces free pages into larger physically contiguous pieces

DMA
1*4kB 4*8kB 6*16kB 4*32kB 3*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 2*4096kB = 11588kB)

Normal
217*4kB 207*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 3468kB)

HighMem
847*4kB 409*8kB 17*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 7924kB)

Memory allocation failures
  Freelist exhaustion.
  Freelist fragmentation.
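
The buddy lists above can be checked mechanically: each "count*size" term contributes count times size to the zone's free total, and the largest size with a non-zero count bounds the biggest contiguous allocation the zone can satisfy. The following is a minimal sketch with hypothetical helper names; it only parses the text format shown on the slide.

```python
import re

# Sketch: parse a buddy-allocator free-list line such as
# "217*4kB 207*8kB ... = 3468kB" (helper names are illustrative).

def parse_buddy_line(line):
    """Return {block_size_kb: free_block_count}."""
    return {int(kb): int(count)
            for count, kb in re.findall(r"(\d+)\*(\d+)kB", line)}


def total_free_kb(blocks):
    """Total free memory represented by the list, in kB."""
    return sum(kb * count for kb, count in blocks.items())


def largest_free_block_kb(blocks):
    """Largest contiguous block currently available, in kB (0 if none)."""
    return max((kb for kb, count in blocks.items() if count > 0), default=0)


NORMAL = ("217*4kB 207*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB "
          "1*512kB 0*1024kB 0*2048kB 0*4096kB = 3468kB")

if __name__ == "__main__":
    blocks = parse_buddy_line(NORMAL)
    print(total_free_kb(blocks), largest_free_block_kb(blocks))
```

In the Normal zone sample, free memory is not exhausted (3468kB), yet nothing larger than a single 512kB block is available, which is the fragmentation case: a request for a 1024kB contiguous block would fail there.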

Per NUMA-Node Resources

Memory zones (DMA & Normal zones)
CPUs
IO/DMA capacity
Page reclamation daemon (kswapd#)

NUMA Nodes and Zones

[Figure: 64-bit layout. Node 0 holds the DMA Zone from 0 to 16MB (or 4GB) followed by a Normal Zone; Node 1 holds a Normal Zone extending to end of RAM.]

Virtual Address Space Maps

32-bit
  3G/1G address space
  4G/4G address space (RHEL3/4)
64-bit
  X86_64
  IA64

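Linux 32-bit Address Spaces (SMP)

[Figure: 3G/1G kernel (SMP). Virtual address space from 0GB to 4GB, split at 3GB; RAM divided into DMA, Normal and HighMem zones.]

Linux 32-bit Address Space (Hugemem)

[Figure: 4G/4G kernel (Hugemem). Separate virtual address spaces for User(s) and Kernel; RAM divided into DMA, Normal (up to 3968MB) and HighMem zones.]

Linux 64-bit Address Space

[Figure: x86_64 virtual address space with User from 0 to 128TB (2^47) and Kernel above it, mapped onto RAM; IA64 virtual address space starting at 0, mapped onto RAM.]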

Memory Pressure

[Figure: 32-bit: DMA, Normal and Highmem zones, with Kernel Allocations and User Allocations shown separately.
 64-bit: DMA and Normal zones, with Kernel and User Allocations combined.]

Kernel Memory Pressure

Static: Boot-time (DMA and Normal zones)
  Kernel text, data, BSS
  Bootmem allocator, tables and hashes (mem_map)
Dynamic
  Slabcache (Normal zone)
    Kernel data structs
    Inode cache, dentry cache and buffer header dynamics
  Pagetables (Highmem/Normal zone)
    32-bit versus 64-bit
  HugeTLBfs (Highmem/Normal zone)

User Memory Pressure
Anonymous/pagecache split

[Figure: user memory split between pagecache (filled by Pagecache Allocations) and anonymous memory (filled by Page Faults).]

PageCache/Anonymous memory split

Pagecache memory is global and grows when filesystem data is accessed until memory is exhausted.
Pagecache is freed:
  Underlying files are deleted.
  Unmount of the filesystem.
  Kswapd reclaims pagecache pages when memory is exhausted.
Anonymous memory is private and grows on user demand
  Allocation followed by pagefault.
  Swapin.
Anonymous memory is freed:
  Process unmaps anonymous region or exits.
  Kswapd reclaims anonymous pages (swapout) when memory is exhausted

PageCache/Anonymous memory split (Cont)

Balance between pagecache and anonymous memory.
  Dynamic.
  Controlled via:
    /proc/sys/vm/pagecache.
    /proc/sys/vm/swappiness on RHEL4/RHEL5.

32-bit Memory Reclamation

[Figure: Kernel Allocations and User Allocations across the DMA, Normal and Highmem zones.]

Kernel Reclamation (kswapd)
  slabcache reaping
  inode cache pruning
  bufferhead freeing
  dentry cache pruning
User Reclamation (kswapd, bdflush/pdflush)
  page aging
  pagecache shrinking
  swapping

64-bit Memory Reclamation

[Figure: a single pool of RAM serving Kernel and User Allocations, with combined Kernel and User Reclamation.]

Anonymous/pagecache reclaiming

pagecache (grows through Pagecache Allocations); freed by:
  kswapd (bdflush/pdflush, kupdated) page reclaim
  deletion of a file
  unmount filesystem
anonymous (grows through Page Faults); freed by:
  kswapd page reclaim (swapout)
  unmap
  exit

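Per Node/Zone Paging Dynamics

[Figure: page state transitions. User allocations take pages from FREE onto ACTIVE; page aging moves ACTIVE pages to INACTIVE (Dirty -> Clean); referenced INACTIVE pages are reactivated to ACTIVE; swapout and bdflush (RHEL3) / pdflush (RHEL4/5) clean dirty pages; reclaiming moves clean INACTIVE pages to FREE; user deletions return pages directly to FREE.]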

Memory reclaim Watermarks

Free List
  All of RAM
    Do nothing
  Pages High: kswapd sleeps above High
    kswapd reclaims memory
  Pages Low: kswapd wakes up at Low
    kswapd reclaims memory
  Pages Min: all memory allocators reclaim at Min
    user processes/kswapd reclaim memory
  0
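
The watermark behaviour above is a small state machine with hysteresis: kswapd starts at Low and keeps going until free pages rise above High. The sketch below is a simplified model under stated assumptions (the function name, the return labels and the exact boundary comparisons are illustrative, not the kernel's implementation).

```python
# Sketch: who reclaims memory for a given free-page count.
# Boundary handling (<= at Min/Low, > at High) is an assumption.

def reclaim_state(free_pages, pages_min, pages_low, pages_high, kswapd_running):
    """Return (who_reclaims, kswapd_running_afterwards)."""
    if free_pages <= pages_min:
        # At or below Min every allocator reclaims, alongside kswapd.
        return "allocators+kswapd", True
    if free_pages <= pages_low:
        # At or below Low kswapd is woken up.
        return "kswapd", True
    if free_pages > pages_high:
        # Above High kswapd goes back to sleep.
        return "nobody", False
    # Between Low and High: kswapd keeps reclaiming only if already awake.
    return ("kswapd" if kswapd_running else "nobody"), kswapd_running


if __name__ == "__main__":
    # Normal zone values as reported by a RHEL3 Alt-SysRq-M dump:
    # freepages 1941, min 510, low 2235, high 3225.
    print(reclaim_state(1941, 510, 2235, 3225, kswapd_running=False))
```

The same free-page count between Low and High gives different answers depending on whether kswapd was already running, which is why a zone can sit below High for a while without any reclaim activity.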

Buffered file system write

[Figure: a write() performs a memory copy from the User buffer into a Pagecache page (dirty) in Kernel memory.]

Percent of pagecache RAM dirty:
  100% dirty
    pdflushd and write()'ng processes write dirty buffers
  40% dirty: processes start synchronous writes
    pdflushd writes dirty buffers in background
  10% dirty: wake up pdflushd
    do_nothing
  0% dirty
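
The dirty-page thresholds above amount to a three-way classification. The sketch below encodes the 10% and 40% levels from the slide; on 2.6 kernels these levels correspond to the vm.dirty_background_ratio and vm.dirty_ratio sysctls, but the function name and return labels here are assumptions for illustration.

```python
# Sketch: classify writeback behaviour from the percentage of dirty pagecache.
# Thresholds default to the 10% / 40% values shown on the slide.

def writeback_action(dirty_pct, background_pct=10, sync_pct=40):
    """Return which writeback mode applies at a given dirty percentage."""
    if dirty_pct >= sync_pct:
        # Writers are throttled: they write dirty buffers themselves.
        return "synchronous"
    if dirty_pct >= background_pct:
        # pdflush is woken and writes dirty buffers in the background.
        return "background"
    return "none"


if __name__ == "__main__":
    for pct in (5, 10, 25, 40, 80):
        print(pct, writeback_action(pct))
```

Lowering the background threshold starts writeback earlier and smooths IO; lowering the synchronous threshold throttles heavy writers sooner.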

Buffered file system read

[Figure: a read performs a memory copy from the Pagecache page in Kernel memory into the User Buffer (dirty).]

Section 2: Analyzing System Performance

Performance Monitoring Tools
  What to run under certain loads
Analyzing System Performance
  What to look for

Performance Monitoring Tools

Standard Unix OS tools
  Monitoring: cpu, memory, process, disk
  oprofile
Kernel Tools
  /proc, info (cpu, mem, slab), dmesg, Alt-SysRq
  Profiling: nmi_watchdog=1, profile=2
Tracing
  strace, ltrace
  dprobe, kprobe
3rd party profiling/capacity monitoring
  Perfmon, Caliper, vtune
  SARcheck, KDE, BEA Patrol, HP Openview

Red Hat Top Tools

CPU Tools                 Memory Tools              Process Tools       Disk Tools
1  top                    1  top                    1  top              1  iostat -x
2  vmstat                 2  vmstat -s              2  ps -o pmem       2  vmstat -D
3  ps aux                 3  ps aur                 3  gprof            3  sar -DEV #
4  mpstat -P all          4  ipcs                   4  strace, ltrace   4  nfsstat
5  sar -u                 5  sar -r -B -W           5  sar              5  NEED MORE!
6  iostat                 6  free
7  oprofile               7  oprofile
8  gnome-system-monitor   8  gnome-system-monitor
9  KDE-monitor            9  KDE-monitor
10 /proc                  10 /proc

top: press h (help), 1 (show cpus), m (memory), t (threads), > (column sort)

top - 09:01:04 up 8 days, 15:22, 2 users, load average: 1.71, 0.39, 0.12
Tasks: 114 total, 1 running, 113 sleeping, 0 stopped, 0 zombie
Cpu0 : 5.3% us, 2.3% sy, 0.0% ni, 0.0% id, 92.0% wa, 0.0% hi, 0.3% si
Cpu1 : 0.3% us, 0.3% sy, 0.0% ni, 89.7% id, 9.7% wa, 0.0% hi, 0.0% si
Mem:  2053860k total, 2036840k used, 17020k free, 99556k buffers
Swap: 2031608k total, 160k used, 2031448k free, 417720k cached

  PID USER    PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
27830 oracle  16   0 1315m 1.2g 1.2g D  1.3 60.9 0:00.09 oracle
27802 oracle  16   0 1315m 1.2g 1.2g D  1.0 61.0 0:00.10 oracle
27811 oracle  16   0 1315m 1.2g 1.2g D  1.0 60.8 0:00.08 oracle
27827 oracle  16   0 1315m 1.2g 1.2g D  1.0 61.0 0:00.11 oracle
27805 oracle  17   0 1315m 1.2g 1.2g D  0.7 61.0 0:00.10 oracle
27828 oracle  15   0 27584 6648 4620 S  0.3  0.3 0:00.17 tpcc.exe
    1 root    16   0  4744  580  480 S  0.0  0.0 0:00.50 init
    2 root    RT   0     0    0    0 S  0.0  0.0 0:00.11 migration/0
    3 root    34  19     0    0    0 S  0.0  0.0 0:00.00 ksoftirqd/0

vmstat (paging vs swapping)
vmstat 10
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
200548352420052423457600546315251303096
020169784020052439314400057850482108539941221463
300784420052457841090059330589463243144307321842

vmstat 10
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
200548352420052423457600546315251303096
02016623402005242345760057850482108539941221463
3023567873842005242345761875423745193589463243144307321842

vmstat: IOzone (8GB file with 6GB RAM)
#! deplete memory until pdflush turns on
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
200448352420052423457600546315251303096
020169784020052429314400057850482108539941221463
3001537884200524384109200193589463243144307321842
02052812020052462281720047888810177133921322246
01046140200524671373600179110719144718251303535
22050972200524670574400232119698131619710253144
....
#! now transition from write to reads
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
14051040200524670554400213351912658390265618
1103506420052467127240040118911136720210354223
01068264234372664702000767445420484032072073
01034468234372667801600773913416202834091872
01047320234372669035600810507717832916072073
10038756234372669834400761364420273705191972
01031472234372670653200767253316012807081973

iostat -x of same IOzone EXT3 file system

Iostat metrics
rates per sec                        sizes and response time
r|w rqm/s: requests merged/s         averq-sz: average request sz
r|w sec/s: 512 byte sectors/s        avequ-sz: average queue sz
r|w KB/s:  Kilobyte/s                await: average wait time ms
r|w /s:    operations/s              svcm: ave service time ms

Linux 2.4.21-27.0.2.ELsmp (node1) 05/09/2005

avg-cpu: %user %nice %sys %iowait %idle
          0.40  0.00 2.63    0.91 96.06

Device:   rrqm/s wrqm/s    r/s  w/s    rsec/s wsec/s    rkB/s wkB/s avgrq-sz avgqu-sz await svctm  %util
sdi     16164.60   0.00 523.40 0.00 133504.00   0.00 66752.00  0.00   255.07     1.00  1.91  1.88  98.40
sdi     17110.10   0.00 553.90 0.00 141312.00   0.00 70656.00  0.00   255.12     0.99  1.80  1.78  98.40
sdi     16153.50   0.00 522.50 0.00 133408.00   0.00 66704.00  0.00   255.33     0.98  1.88  1.86  97.00
sdi     17561.90   0.00 568.10 0.00 145040.00   0.00 72520.00  0.00   255.31     1.01  1.78  1.76 100.00
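
Several iostat columns are derived from the others, which makes the output easy to sanity-check. The sketch below recomputes the average request size and the kB/s rate from the sector and operation rates; the helper names are illustrative, and the 512-byte sector size is the one stated in the metric legend.

```python
# Sketch: derive iostat -x columns from the raw per-second rates.

SECTOR_BYTES = 512  # iostat reports sectors of 512 bytes


def avg_request_sectors(rsec_s, wsec_s, r_s, w_s):
    """Average request size in sectors (the avgrq-sz column)."""
    ops = r_s + w_s
    return (rsec_s + wsec_s) / ops if ops else 0.0


def sectors_to_kb(sectors_per_s):
    """Convert a sectors/s rate to kB/s."""
    return sectors_per_s * SECTOR_BYTES / 1024


if __name__ == "__main__":
    # First sdi sample: r/s = 523.40, rsec/s = 133504.00, no writes.
    print(round(avg_request_sectors(133504.0, 0.0, 523.4, 0.0), 2))
    print(sectors_to_kb(133504.0))
```

An average request of about 255 sectors is roughly 128kB per read, so the device in this sample is servicing large, well-merged sequential requests.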

SAR

[root@localhost redhat]# sar -u 3 3
Linux 2.4.21-20.EL (localhost.localdomain) 05/16/2005

10:32:28 PM  CPU  %user  %nice  %system   %idle
10:32:31 PM  all   0.00   0.00     0.00  100.00
10:32:34 PM  all   1.33   0.00     0.33   98.33
10:32:37 PM  all   1.34   0.00     0.00   98.66
Average:     all   0.89   0.00     0.11   99.00

[root] sar -n DEV
Linux 2.4.21-20.EL (localhost.localdomain) 03/16/2005

01:10:01 PM  IFACE  rxpck/s  txpck/s  rxbyt/s  txbyt/s  rxcmp/s  txcmp/s  rxmcst/s
01:20:00 PM  lo        3.49     3.49   306.16   306.16     0.00     0.00      0.00
01:20:00 PM  eth0      3.89     3.53  2395.34   484.70     0.00     0.00      0.00
01:20:00 PM  eth1      0.00     0.00     0.00     0.00     0.00     0.00      0.00

free/numastat: memory allocation

[root@localhost redhat]# free -l
             total     used     free  shared  buffers   cached
Mem:        511368   342336   169032       0    29712   167408
Low:        511368   342336   169032       0        0        0
High:            0        0        0       0        0        0
-/+ buffers/cache:   145216   366152
Swap:      1043240        0  1043240

numastat (on 2-cpu x86_64 based system)
                   node1     node0
numa_hit         9803332  10905630
numa_miss        2049018   1609361
numa_foreign     1609361   2049018
interleave_hit     58689     54749
local_node       9770927  10880901
other_node       2081423   1634090
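
One quick way to read the numastat counters is as a per-node hit ratio: the fraction of allocations that were satisfied on the node the allocator wanted. The sketch below is a minimal illustration using the sample values above; the function name and the choice of ratio are assumptions, not a numastat feature.

```python
# Sketch: fraction of page allocations satisfied on the intended node.

def numa_hit_ratio(numa_hit, numa_miss):
    """numa_hit / (numa_hit + numa_miss), or 0.0 when there is no traffic."""
    total = numa_hit + numa_miss
    return numa_hit / total if total else 0.0


# Counters from the two-node x86_64 sample shown above.
SAMPLE = {
    "node1": {"numa_hit": 9803332, "numa_miss": 2049018},
    "node0": {"numa_hit": 10905630, "numa_miss": 1609361},
}

if __name__ == "__main__":
    for node, counters in SAMPLE.items():
        ratio = numa_hit_ratio(counters["numa_hit"], counters["numa_miss"])
        print("%s: %.1f%% of allocations hit the intended node" % (node, 100 * ratio))
```

A ratio well below 100% suggests that a node is running short of local memory and allocations are spilling onto the other node.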

ps

[root@localhost root]# ps aux
[root@localhost root]# ps aux | more
USER  PID %CPU %MEM  VSZ  RSS TTY STAT START TIME COMMAND
root    1  0.1  0.1 1528  516 ?   S    23:18 0:04 init
root    2  0.0  0.0    0    0 ?   SW   23:18 0:00 [keventd]
root    3  0.0  0.0    0    0 ?   SW   23:18 0:00 [kapmd]
root    4  0.0  0.0    0    0 ?   SWN  23:18 0:00 [ksoftirqd/0]
root    7  0.0  0.0    0    0 ?   SW   23:18 0:00 [bdflush]
root    5  0.0  0.0    0    0 ?   SW   23:18 0:00 [kswapd]
root    6  0.0  0.0    0    0 ?   SW   23:18 0:00 [kscand]

pstree

init-+-/usr/bin/sealer
     |-acpid
     |-atd
     |-auditd-+-python
     |        `-{auditd}
     |-automount---6*[{automount}]
     |-avahi-daemon---avahi-daemon
     |-bonobo-activati---{bonobo-activati}
     |-bt-applet
     |-clock-applet
     |-crond
     |-cupsd---cups-polld
     |-3*[dbus-daemon---{dbus-daemon}]
     |-2*[dbus-launch]
     |-dhclient

mpstat

[root@localhost redhat]# mpstat 3 3
Linux 2.4.21-20.EL (localhost.localdomain) 05/16/2005

10:40:34 PM  CPU  %user  %nice  %system  %idle  intr/s
10:40:37 PM  all   3.00   0.00     0.00  97.00  193.67
10:40:40 PM  all   1.33   0.00     0.00  98.67  208.00
10:40:43 PM  all   1.67   0.00     0.00  98.33  196.00
Average:     all   2.00   0.00     0.00  98.00  199.22

The /proc filesystem

/proc
  meminfo
  slabinfo
  cpuinfo
  <pid>/maps
  vmstat (RHEL4 & RHEL5)
  zoneinfo (RHEL5)
  sysrq-trigger

/proc/meminfo (RHEL3, 4, 5)

RHEL3> cat /proc/meminfo
MemTotal:       509876 kB
MemFree:         17988 kB
MemShared:           0 kB
Buffers:          4728 kB
Cached:         157444 kB
SwapCached:      46576 kB
Active:         222784 kB
ActiveAnon:     118844 kB
ActiveCache:    103940 kB
Inact_dirty:     41088 kB
Inact_laundry:    7640 kB
Inact_clean:      6904 kB
Inact_target:    55680 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       509876 kB
LowFree:         17988 kB
SwapTotal:     1044184 kB
SwapFree:       945908 kB
CommitLimit:   1299120 kB
Committed_AS:   404920 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB

RHEL4> cat /proc/meminfo
MemTotal:     32749568 kB
MemFree:      31313344 kB
Buffers:         29992 kB
Cached:        1250584 kB
SwapCached:          0 kB
Active:         235284 kB
Inactive:      1124168 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:     32749568 kB
LowFree:      31313344 kB
SwapTotal:     4095992 kB
SwapFree:      4095992 kB
Dirty:               0 kB
Writeback:           0 kB
Mapped:        1124080 kB
Slab:            38460 kB
CommitLimit:  20470776 kB
Committed_AS:  1158556 kB
PageTables:       5096 kB
VmallocTotal: 536870911 kB
VmallocUsed:      2984 kB
VmallocChunk: 536867627 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB

RHEL5> cat /proc/meminfo
MemTotal:      1025220 kB
MemFree:         11048 kB
Buffers:        141944 kB
Cached:         342664 kB
SwapCached:          4 kB
Active:         715304 kB
Inactive:       164780 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      1025220 kB
LowFree:         11048 kB
SwapTotal:     2031608 kB
SwapFree:      2031472 kB
Dirty:              84 kB
Writeback:           0 kB
AnonPages:      395572 kB
Mapped:          82860 kB
Slab:            92296 kB
PageTables:      23884 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   2544216 kB
Committed_AS:   804656 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    263472 kB
VmallocChunk: 34359474711 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB
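
Because /proc/meminfo is a simple "Name: value kB" listing, it is easy to post-process. The sketch below parses the standard whitespace-separated layout and computes memory used by neither free pages nor buffers/cache; the helper names and the choice of derived figure are illustrative assumptions, and the sample values are taken from the RHEL5 listing.

```python
# Sketch: parse /proc/meminfo-style text into a dict of integers (kB or counts).

def parse_meminfo(text):
    """Map each field name to its first numeric value."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[name.strip()] = int(fields[0])
    return info


def used_excluding_cache_kb(info):
    """Memory in use that is not free, buffers or pagecache."""
    return info["MemTotal"] - info["MemFree"] - info["Buffers"] - info["Cached"]


# A few fields from the RHEL5 sample, in the usual kernel layout.
SAMPLE = """\
MemTotal:      1025220 kB
MemFree:         11048 kB
Buffers:        141944 kB
Cached:         342664 kB
HugePages_Total:     0
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print(used_excluding_cache_kb(info), "kB used excluding buffers/cache")
```

On a live system the same function can be fed the contents of /proc/meminfo directly. A very low MemFree, as in the RHEL5 sample, is not a problem by itself when Buffers and Cached are large, since that memory is reclaimable.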

/proc/slabinfo

slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
nfsd4_delegations      0    0  656  6 1 : tunables  54 27 8 : slabdata   0   0 0
nfsd4_stateids         0    0  128 30 1 : tunables 120 60 8 : slabdata   0   0 0
nfsd4_files            0    0   72 53 1 : tunables 120 60 8 : slabdata   0   0 0
nfsd4_stateowners      0    0  424  9 1 : tunables  54 27 8 : slabdata   0   0 0
nfs_direct_cache       0    0  128 30 1 : tunables 120 60 8 : slabdata   0   0 0
nfs_write_data        36   36  832  9 2 : tunables  54 27 8 : slabdata   4   4 0
nfs_read_data         32   35  768  5 1 : tunables  54 27 8 : slabdata   7   7 0
nfs_inode_cache     1383 1389 1040  3 1 : tunables  24 12 8 : slabdata 463 463 0
nfs_page               0    0  128 30 1 : tunables 120 60 8 : slabdata   0   0 0
fscache_cookie_jar     3   53   72 53 1 : tunables 120 60 8 : slabdata   1   1 0
ip_conntrack_expect    0    0  136 28 1 : tunables 120 60 8 : slabdata   0   0 0
ip_conntrack          75  130  304 13 1 : tunables  54 27 8 : slabdata  10  10 0
bridge_fdb_cache       0    0   64 59 1 : tunables 120 60 8 : slabdata   0   0 0
rpc_buffers            8    8 2048  2 1 : tunables  24 12 8 : slabdata   4   4 0
rpc_tasks             30   30  384 10 1 : tunables  54 27 8 : slabdata   3   3 0

/proc/cpuinfo

[lwoodman]$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU 3060 @ 2.40GHz
stepping        : 6
cpu MHz         : 2394.070
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 4791.41
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

32-bit /proc/<pid>/maps

[root@dhcp8336 proc]# cat 5808/maps
0022e000-0023b000 r-xp 00000000 03:03 4137068 /lib/tls/libpthread-0.60.so
0023b000-0023c000 rw-p 0000c000 03:03 4137068 /lib/tls/libpthread-0.60.so
0023c000-0023e000 rw-p 00000000 00:00 0
0037f000-00391000 r-xp 00000000 03:03 523285 /lib/libnsl-2.3.2.so
00391000-00392000 rw-p 00011000 03:03 523285 /lib/libnsl-2.3.2.so
00392000-00394000 rw-p 00000000 00:00 0
00c45000-00c5a000 r-xp 00000000 03:03 523268 /lib/ld-2.3.2.so
00c5a000-00c5b000 rw-p 00015000 03:03 523268 /lib/ld-2.3.2.so
00e5c000-00f8e000 r-xp 00000000 03:03 4137064 /lib/tls/libc-2.3.2.so
00f8e000-00f91000 rw-p 00131000 03:03 4137064 /lib/tls/libc-2.3.2.so
00f91000-00f94000 rw-p 00000000 00:00 0
08048000-0804f000 r-xp 00000000 03:03 1046791 /sbin/ypbind
0804f000-08050000 rw-p 00007000 03:03 1046791 /sbin/ypbind
09794000-097b5000 rw-p 00000000 00:00 0
b5fdd000-b5fde000 ---p 00000000 00:00 0
b5fde000-b69de000 rw-p 00001000 00:00 0
b69de000-b69df000 ---p 00000000 00:00 0
b69df000-b73df000 rw-p 00001000 00:00 0
b73df000-b75df000 r--p 00000000 03:03 3270410 /usr/lib/locale/locale-archive
b75df000-b75e1000 rw-p 00000000 00:00 0
bfff6000-c0000000 rw-p ffff8000 00:00 0
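
Each maps line gives an address range, permissions, file offset, device, inode and (optionally) a path, so region sizes can be computed directly. The sketch below assumes the standard /proc/<pid>/maps layout ("start-end perms offset dev inode [path]"); the function name and returned keys are illustrative.

```python
# Sketch: parse one /proc/<pid>/maps line and compute the region size.

def parse_maps_line(line):
    """Return start/end addresses, size in kB, permissions and path."""
    fields = line.split(None, 5)
    start_text, end_text = fields[0].split("-")
    start, end = int(start_text, 16), int(end_text, 16)
    return {
        "start": start,
        "end": end,
        "size_kb": (end - start) // 1024,
        "perms": fields[1],
        "path": fields[5].strip() if len(fields) > 5 else "",
    }


if __name__ == "__main__":
    text = parse_maps_line(
        "0022e000-0023b000 r-xp 00000000 03:03 4137068 /lib/tls/libpthread-0.60.so")
    anon = parse_maps_line("b5fde000-b69de000 rw-p 00001000 00:00 0")
    print(text["path"], text["size_kb"], "kB")
    print("anonymous region:", anon["size_kb"], "kB")
```

In the 32-bit listing, the two anonymous rw-p regions of 10240kB, each preceded by a one-page region with no permissions, have the typical shape of thread stacks with guard pages.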

64-bit /proc/<pid>/maps

# cat /proc/2345/maps
00400000-0100b000 r-xp 00000000 fd:00 1933328 /usr/sybase/ASE-12_5/bin/dataserver.esd3
0110b000-01433000 rw-p 00c0b000 fd:00 1933328 /usr/sybase/ASE-12_5/bin/dataserver.esd3
01433000-014eb000 rwxp 01433000 00:00 0
40000000-40001000 ---p 40000000 00:00 0
40001000-40a01000 rwxp 40001000 00:00 0
2a95f73000-2a96073000 ---p 0012b000 fd:00 819273 /lib64/tls/libc-2.3.4.so
2a96073000-2a96075000 r--p 0012b000 fd:00 819273 /lib64/tls/libc-2.3.4.so
2a96075000-2a96078000 rw-p 0012d000 fd:00 819273 /lib64/tls/libc-2.3.4.so
2a96078000-2a9607e000 rw-p 2a96078000 00:00 0
2a9607e000-2a98c3e000 rw-s 00000000 00:06 360450 /SYSV0100401e (deleted)
2a98c3e000-2a98c47000 rw-p 2a98c3e000 00:00 0
2a98c47000-2a98c51000 r-xp 00000000 fd:00 819227 /lib64/libnss_files-2.3.4.so
2a98c51000-2a98d51000 ---p 0000a000 fd:00 819227 /lib64/libnss_files-2.3.4.so
2a98d51000-2a98d53000 rw-p 0000a000 fd:00 819227 /lib64/libnss_files-2.3.4.so
2a98d53000-2a98d57000 r-xp 00000000 fd:00 819225 /lib64/libnss_dns-2.3.4.so
2a98d57000-2a98e56000 ---p 00004000 fd:00 819225 /lib64/libnss_dns-2.3.4.so
2a98e56000-2a98e58000 rw-p 00003000 fd:00 819225 /lib64/libnss_dns-2.3.4.so
2a98e58000-2a98e69000 r-xp 00000000 fd:00 819237 /lib64/libresolv-2.3.4.so
2a98e69000-2a98f69000 ---p 00011000 fd:00 819237 /lib64/libresolv-2.3.4.so
2a98f69000-2a98f6b000 rw-p 00011000 fd:00 819237 /lib64/libresolv-2.3.4.so
2a98f6b000-2a98f6d000 rw-p 2a98f6b000 00:00 0
35c7e00000-35c7e08000 r-xp 00000000 fd:00 819469 /lib64/libpam.so.0.77
35c7e08000-35c7f08000 ---p 00008000 fd:00 819469 /lib64/libpam.so.0.77
35c7f08000-35c7f09000 rw-p 00008000 fd:00 819469 /lib64/libpam.so.0.77
35c8000000-35c8011000 r-xp 00000000 fd:00 819468 /lib64/libaudit.so.0.0.0
35c8011000-35c8110000 ---p 00011000 fd:00 819468 /lib64/libaudit.so.0.0.0
35c8110000-35c8118000 rw-p 00010000 fd:00 819468 /lib64/libaudit.so.0.0.0
35c9000000-35c900b000 r-xp 00000000 fd:00 819457 /lib64/libgcc_s-3.4.4-20050721.so.1
35c900b000-35c910a000 ---p 0000b000 fd:00 819457 /lib64/libgcc_s-3.4.4-20050721.so.1
35c910a000-35c910b000 rw-p 0000a000 fd:00 819457 /lib64/libgcc_s-3.4.4-20050721.so.1
7fbfff1000-7fc0000000 rwxp 7fbfff1000 00:00 0
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0

/proc/vmstat (RHEL4/RHEL5)

cat /proc/vmstat
nr_anon_pages 98893
nr_mapped 20715
nr_file_pages 120855
nr_slab 23060
nr_page_table_pages 5971
nr_dirty 21
nr_writeback 0
nr_unstable 0
nr_bounce 0
numa_hit 996729666
numa_miss 0
numa_foreign 0
numa_interleave 87657
numa_local 996729666
numa_other 0
pgpgin 2577307
pgpgout 106131928
pswpin 0
pswpout 34
pgalloc_dma 198908
pgalloc_dma32 997707549
pgalloc_normal 0
pgalloc_high 0
pgfree 997909734
pgactivate 1313196
pgdeactivate 470908
pgfault 2971972147
pgmajfault 8047
pgrefill_dma 18338
pgrefill_dma32 1353451
pgrefill_normal 0
pgrefill_high 0
pgsteal_dma 0
pgsteal_dma32 0
pgsteal_normal 0
pgsteal_high 0
pgscan_kswapd_dma 7235
pgscan_kswapd_dma32 417984
pgscan_kswapd_normal 0
pgscan_kswapd_high 0
pgscan_direct_dma 12
pgscan_direct_dma32 1984
pgscan_direct_normal 0
pgscan_direct_high 0
pginodesteal 166
slabs_scanned 1072512
kswapd_steal 410973
kswapd_inodesteal 61305
pageoutrun 7752
allocstall 29
pgrotated 73

Alt-SysRq-M (RHEL3)

SysRq: Show Memory
Mem-info:
Zone:DMA freepages: 2929 min: 0 low: 0 high: 0
Zone:Normal freepages: 1941 min: 510 low: 2235 high: 3225
Zone:HighMem freepages: 0 min: 0 low: 0 high: 0
Free pages: 4870 (0 HighMem)
(Active: 72404/13523, inactive_laundry: 2429, inactive_clean: 1730, free: 4870)
aa:0 ac:0 id:0 il:0 ic:0 fr:2929
aa:46140 ac:26264 id:13523 il:2429 ic:1730 fr:1941
aa:0 ac:0 id:0 il:0 ic:0 fr:0
1*4kB 4*8kB 2*16kB 2*32kB 1*64kB 2*128kB 2*256kB 1*512kB 0*1024kB 1*2048kB 2*4096kB = 11716kB)
1255*4kB 89*8kB 5*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 7764kB)
Swap cache: add 958119, delete 918749, find 4611302/5276354, race 0+1
27234 pages of slabcache
244 pages of kernel stacks
1303 lowmem pagetables, 0 highmem pagetables
0 bounce buffer pages, 0 are on the emergency list
Free swap: 598960kB
130933 pages of RAM
0 pages of HIGHMEM
3497 reserved pages
34028 pages shared
39370 pages swap cached
Alt-SysRq-M (RHEL3/NUMA)

SysRq: Show Memory
Mem-info:
Zone:DMA freepages: 0 min: 0 low: 0 high: 0
Zone:Normal freepages: 369423 min: 1022 low: 6909 high: 9980
Zone:HighMem freepages: 0 min: 0 low: 0 high: 0
Zone:DMA freepages: 2557 min: 0 low: 0 high: 0
Zone:Normal freepages: 494164 min: 1278 low: 9149 high: 13212
Zone:HighMem freepages: 0 min: 0 low: 0 high: 0
Free pages: 866144 (0 HighMem)
(Active: 9690/714, inactive_laundry: 764, inactive_clean: 35, free: 866144)
aa:0 ac:0 id:0 il:0 ic:0 fr:0
aa:746 ac:2811 id:188 il:220 ic:0 fr:369423
aa:0 ac:0 id:0 il:0 ic:0 fr:0
aa:0 ac:0 id:0 il:0 ic:0 fr:2557
aa:1719 ac:4414 id:526 il:544 ic:35 fr:494164
aa:0 ac:0 id:0 il:0 ic:0 fr:0
2497*4kB 1575*8kB 902*16kB 515*32kB 305*64kB 166*128kB 96*256kB 56*512kB 39*1024kB 30*2048kB 300*4096kB = 1477692kB)
Swap cache: add 288168, delete 285993, find 726/2075, race 0+0
4059 pages of slabcache
146 pages of kernel stacks
388 lowmem pagetables, 638 highmem pagetables
Free swap: 1947848kB
917496 pages of RAM
869386 free pages
30921 reserved pages
21927 pages shared
2175 pages swap cached
Buffer memory: 9752kB
Cache memory: 34192kB
CLEAN: 696 buffers, 2772 kbyte, 51 used (last=696), 0 locked, 0 dirty 0 delay
DIRTY: 4 buffers, 16 kbyte, 4 used (last=4), 0 locked, 3 dirty 0 delay

Alt-SysRq-M (RHEL4 & 5)

SysRq: Show Memory
Mem-info:
Free pages: 20128kB (0kB HighMem)
Active:72109 inactive:27657 dirty:1 writeback:0 unstable:0 free:5032 slab:19306 mapped:41755 pagetables:945
DMA free:12640kB min:20kB low:40kB high:60kB active:0kB inactive:0kB present:16384kB pages_scanned:847 all_unreclaimable? yes
protections[]: 0 0 0
Normal free:7488kB min:688kB low:1376kB high:2064kB active:288436kB inactive:110628kB present:507348kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 4*4kB 4*8kB 3*16kB 4*32kB 4*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12640kB
Normal: 1052*4kB 240*8kB 39*16kB 3*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 7488kB
HighMem: empty
Swap cache: add 52, delete 52, find 3/5, race 0+0
Free swap: 1044056kB
130933 pages of RAM
0 pages of HIGHMEM
2499 reserved pages
71122 pages shared
0 pages swap cached

Alt-SysRq-M (RHEL4 & 5/NUMA)

Free pages: 16724kB (0kB HighMem)
Active:236461 inactive:254776 dirty:11 writeback:0 unstable:0 free:4181 slab:13679 mapped:34073 pagetables:853
Node 1 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Node 1 Normal free:2784kB min:1016kB low:2032kB high:3048kB active:477596kB inactive:508444kB present:1048548kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Node 1 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Node 0 DMA free:11956kB min:12kB low:24kB high:36kB active:0kB inactive:0kB present:16384kB pages_scanned:1050 all_unreclaimable? yes
protections[]: 0 0 0
Node 0 Normal free:1984kB min:1000kB low:2000kB high:3000kB active:468248kB inactive:510660kB present:1032188kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Node 1 DMA: empty
Node 1 Normal: 0*4kB 0*8kB 30*16kB 10*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2784kB
Node 1 HighMem: empty
Node 0 DMA: 5*4kB 4*8kB 4*16kB 2*32kB 2*64kB 3*128kB 2*256kB 1*512kB 0*1024kB 1*2048kB 2*4096kB = 11956kB
Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1984kB
Node 0 HighMem: empty
Swap cache: add 44, delete 44, find 0/0, race 0+0
Free swap: 2031432kB
524280 pages of RAM
10951 reserved pages
363446 pages shared
0 pages swap cached

Alt-SysRq-T

bash R current 0 1609 1606
(NOTLB)
Call Trace: [<c02a1897>] snprintf [kernel] 0x27 (0xdb3c5e90)
[<c01294b3>] call_console_drivers [kernel] 0x63 (0xdb3c5eb4)
[<c01297e3>] printk [kernel] 0x153 (0xdb3c5eec)
[<c01297e3>] printk [kernel] 0x153 (0xdb3c5f00)
[<c010c289>] show_trace [kernel] 0xd9 (0xdb3c5f0c)
[<c010c289>] show_trace [kernel] 0xd9 (0xdb3c5f14)
[<c0125992>] show_state [kernel] 0x62 (0xdb3c5f24)
[<c01cfb1a>] __handle_sysrq_nolock [kernel] 0x7a (0xdb3c5f38)
[<c01cfa7d>] handle_sysrq [kernel] 0x5d (0xdb3c5f58)
[<c0198f43>] write_sysrq_trigger [kernel] 0x53 (0xdb3c5f7c)
[<c01645b7>] sys_write [kernel] 0x97 (0xdb3c5f94)
* logged in /var/log/messages

Alt-SysRq-W and P

SysRq: Show CPUs
CPU0:
ffffffff8047ef48 0000000000000000 ffffffff80437f10 ffffffff8019378b
0000000000000000 0000000000000000 0000000000000000 ffffffff801937ba
ffffffff8019378b ffffffff80022b27 ffffffff800551bf 0000000000090000
Call Trace:
[<ffffffff80069572>] show_trace+0x34/0x47
[<ffffffff80069675>] _show_stack+0xd9/0xe8
[<ffffffff801937ba>] showacpu+0x2f/0x3b
[<ffffffff80022b27>] smp_call_function_interrupt+0x57/0x75
[<ffffffff8005bf16>] call_function_interrupt+0x66/0x6c
[<ffffffff8002fcc2>] unix_poll+0x0/0x96
[<ffffffff800551f5>] mwait_idle+0x36/0x4a
[<ffffffff80047205>] cpu_idle+0x95/0xb8
[<ffffffff8044181f>] start_kernel+0x225/0x22a
[<ffffffff8044125b>] _sinittext+0x25b/0x262

oprofile: built into RHEL4 & 5 (smp)

opcontrol: on/off data
  --start: start collection
  --stop: stop collection
  --dump: output to disk
  --event=:name:count

  Example:
  # opcontrol --start
  # /bin/time test1 &
  # sleep 60
  # opcontrol --stop
  # opcontrol --dump

opreport: analyze profile
  -r: reverse order sort
  -t [percentage]: threshold to view
  -f /path/filename
  -d: details

opannotate
  -s /path/source
  -a /path/assembly

oprofile: opcontrol and opreport cpu_cycles

# CPU: Core 2, speed 2666.72 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
CPU_CLK_UNHALT...|
  samples|      %|
------------------
397435971 84.6702 vmlinux
 19703064  4.1976 zeus.web
 16914317  3.6034 e1000
 12208514  2.6009 ld-2.5.so
 11711746  2.4951 libc-2.5.so
  5164664  1.1003 sim.cgi
  2333427  0.4971 oprofiled
  1295161  0.2759 oprofile
  1099731  0.2343 zeus.cgi
   968623  0.2064 ext3
   270163  0.0576 jbd

Profiling Tools: SystemTap

Red Hat, Intel, IBM & Hitachi collaboration
Linux answer to Solaris Dtrace
Dynamic instrumentation
Tool to take a deep look into a running system:
  Assists in identifying causes of performance problems
  Simplifies building instrumentation
Current snapshots available from: http://sources.redhat.com/systemtap
  Source for presentations/papers
Kernel space tracing today, user space tracing under development
Technology preview status until 5.1

[Figure: SystemTap processing pipeline. probe script -> parse -> elaborate (with probe-set library) -> translate to C, compile* -> probe kernel object -> load module, start probe -> extract output, unload -> probe output.]
* Solaris Dtrace is interpretive

Profiling Tools: SystemTap

Technology: Kprobes:
  In current 2.6 kernels
    Upstream 2.6.12, backported to RHEL4 kernel
  Kernel instrumentation without recompile/reboot
  Uses software int and trap handler for instrumentation
Debug information:
  Provides map between executable and source code
  Generated as part of RPM builds
  Available at: ftp://ftp.redhat.com
Safety: Instrumentation scripting language:
  No dynamic memory allocation or assembly/C code
  Types and type conversions limited
  Restrict access through pointers
Script compiler checks:
  Infinite loops and recursion
  Invalid variable access

New Tuning Tools w/ RH MRG

MRG tuning: use the TUNA tool to dynamically control
  Device IRQ properties
  CPU affinity / parent and threads
  Scheduling policy

New Tuning Tools w/ RH MRG

MRG tuning: use the TUNA tool to dynamically control
  Process affinity / parent and threads
  Scheduling policy

Section 3: Tuning

How to tune Linux
  Capacity tuning
    Fix problems by adding resources
  Performance Tuning
Methodology
  1) Document config
  2) Baseline results
  3) While results non-optimal
     a) Monitor/Instrument system/workload
     b) Apply tuning 1 change at a time
     c) Analyze results, exit or loop
  4) Document final config

Tuning: how to set kernel parameters

/proc
[root@foobar fs]# cat /proc/sys/kernel/sysrq (see 0)
[root@foobar fs]# echo 1 > /proc/sys/kernel/sysrq
[root@foobar fs]# cat /proc/sys/kernel/sysrq (see 1)

Sysctl command
[root@foobar fs]# sysctl kernel.sysrq
kernel.sysrq = 0
[root@foobar fs]# sysctl -w kernel.sysrq=1
kernel.sysrq = 1
[root@foobar fs]# sysctl kernel.sysrq
kernel.sysrq = 1

Edit the /etc/sysctl.conf file
# Kernel sysctl configuration file for Red Hat Linux
# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 1
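
The /proc and sysctl methods above address the same setting: a dotted sysctl name maps onto a path under /proc/sys. The sketch below illustrates that mapping with hypothetical helper names; it uses the simple dot-to-slash rule and does not handle unusual names that themselves contain dots, and writing to the real /proc/sys requires root.

```python
from pathlib import Path

# Sketch: translate a sysctl name to its /proc/sys path and read/write it.
# The "root" parameter exists only so the helpers can be tried on a scratch
# directory instead of the live /proc/sys tree.


def sysctl_path(name, root="/proc/sys"):
    """kernel.sysrq -> /proc/sys/kernel/sysrq"""
    return str(Path(root, *name.split(".")))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a stripped string."""
    return Path(sysctl_path(name, root)).read_text().strip()


def write_sysctl(name, value, root="/proc/sys"):
    """Equivalent in spirit to: echo VALUE > /proc/sys/..."""
    Path(sysctl_path(name, root)).write_text("%s\n" % value)


if __name__ == "__main__":
    print(sysctl_path("kernel.sysrq"))
    print(sysctl_path("vm.swappiness"))
```

Values written this way, like those set with echo or sysctl -w, last only until reboot; /etc/sysctl.conf is what makes a setting persistent.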

Capacity Tuning

Memory
  /proc/sys/vm/overcommit_memory
  /proc/sys/vm/overcommit_ratio
  /proc/sys/vm/max_map_count
  /proc/sys/vm/nr_hugepages
Kernel
  /proc/sys/kernel/msgmax
  /proc/sys/kernel/msgmnb
  /proc/sys/kernel/msgmni
  /proc/sys/kernel/shmall
  /proc/sys/kernel/shmmax
  /proc/sys/kernel/shmmni
  /proc/sys/kernel/threads-max
Filesystems
  /proc/sys/fs/aio-max-nr
  /proc/sys/fs/file-max

OOM kills

OOM kills: swap space exhaustion (RHEL3)

Mem-info:
Zone:DMA freepages: 975 min: 1039 low: 1071 high: 1103
Zone:Normal freepages: 126 min: 255 low: 1950 high: 2925
Zone:HighMem freepages: 0 min: 0 low: 0 high: 0
Free pages: 1101 (0 HighMem)
(Active: 118821/401, inactive_laundry: 0, inactive_clean: 0, free: 1101)
aa:1938 ac:18 id:44 il:0 ic:0 fr:974
aa:115717 ac:1148 id:357 il:0 ic:0 fr:126
aa:0 ac:0 id:0 il:0 ic:0 fr:0
6*4kB 0*8kB 0*16kB 1*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3896kB)
0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 504kB)
Swap cache: add 620870, delete 620870, find 762437/910181, race 0+200
2454 pages of slabcache
484 pages of kernel stacks
2008 lowmem pagetables, 0 highmem pagetables
Free swap: 0kB
129008 pages of RAM
0 pages of HIGHMEM
3045 reserved pages
4009 pages shared
0 pages swap cached

OOM kills: lowmem consumption (RHEL3/x86)

Mem-info:
Zone:DMA freepages: 2029 min: 0 low: 0 high: 0
Zone:Normal freepages: 1249 min: 1279 low: 4544 high: 6304
Zone:HighMem freepages: 746 min: 255 low: 29184 high: 43776
Free pages: 4024 (746 HighMem)
(Active: 703448/665000, inactive_laundry: 99878, inactive_clean: 99730, free: 4024)
aa:0 ac:0 id:0 il:0 ic:0 fr:2029
aa:128 ac:3346 id:113 il:240 ic:0 fr:1249
aa:545577 ac:154397 id:664813 il:99713 ic:99730 fr:746
1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 1*4096kB = 8116kB)
543*4kB 35*8kB 77*16kB 1*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 4996kB)
490*4kB 2*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2984kB)
Swap cache: add 4327, delete 4173, find 190/1057, race 0+0
178558 pages of slabcache
1078 pages of kernel stacks
0 lowmem pagetables, 233961 highmem pagetables
Free swap: 8189016kB
2097152 pages of RAM
1801952 pages of HIGHMEM
103982 reserved pages
115582774 pages shared
154 pages swap cached
Out of Memory: Killed process 27100 (oracle).

OOM kills: lowmem consumption (RHEL4 & 5/x86)

Free pages: 9003696kB (8990400kB HighMem)
Active:323264 inactive:346882 dirty:327575 writeback:3686 unstable:0 free:2250924 slab:177094 mapped:15855 pagetables:987
DMA free:12640kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:149 all_unreclaimable? yes
protections[]: 0 0 0
Normal free:656kB min:928kB low:1856kB high:2784kB active:6976kB inactive:9976kB present:901120kB pages_scanned:28281 all_unreclaimable? yes
protections[]: 0 0 0
HighMem free:8990400kB min:512kB low:1024kB high:1536kB active:1286080kB inactive:1377552kB present:12451840kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 4*4kB 4*8kB 3*16kB 4*32kB 4*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12640kB
Normal: 0*4kB 2*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 656kB
HighMem: 15994*4kB 17663*8kB 11584*16kB 8561*32kB 8193*64kB 1543*128kB 69*256kB 2101*512kB 1328*1024kB 765*2048kB 875*4096kB = 8990400kB
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap: 8385912kB
3342336 pages of RAM
2916288 pages of HIGHMEM
224303 reserved pages
666061 pages shared
0 pages swap cached
Out of Memory: Killed process 22248 (httpd).
oom-killer: gfp_mask=0xd0

OOM kills: IO system stall (RHEL4 & 5/x86)

Free pages: 15096kB (1664kB HighMem)
Active:34146 inactive:1995536 dirty:255 writeback:314829 unstable:0 free:3774 slab:39266 mapped:31803 pagetables:820
DMA free:12552kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:2023 all_unreclaimable? yes
protections[]: 0 0 0
Normal free:880kB min:928kB low:1856kB high:2784kB active:744kB inactive:660296kB present:901120kB pages_scanned:726099 all_unreclaimable? yes
protections[]: 0 0 0
HighMem free:1664kB min:512kB low:1024kB high:1536kB active:135840kB inactive:7321848kB present:7995388kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 2*4kB 4*8kB 2*16kB 4*32kB 3*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12552kB
Normal: 0*4kB 18*8kB 14*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 880kB
HighMem: 6*4kB 9*8kB 66*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1664kB
Swap cache: add 856, delete 599, find 341/403, race 0+0
0 bounce buffer pages
Free swap: 4193264kB
2228223 pages of RAM
1867481 pages of HIGHMEM
150341 reserved pages
343042 pages shared
257 pages swap cached
kernel: Out of Memory: Killed process 3450 (hpsmhd).

Eliminating OOM kills

RHEL3

/proc/sys/vm/oom-kill: number of processes that can be in an OOM kill state at any one time (default 1).

RHEL4

/proc/sys/vm/oom-kill: OOM kill enable/disable flag (default 1).

RHEL5

/proc/<pid>/oom_adj: per-process OOM adjustment (-17 to +15)

Set to -17 to disable that process from being OOM killed

Decrease to decrease OOM kill likelihood.

Increase to increase OOM kill likelihood.

/proc/<pid>/oom_score: current OOM kill priority.
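
As an illustration of the RHEL5 interface, the sketch below validates an adjustment against the -17 to +15 range before writing it. The helper name set_oom_adj and its optional third argument (an alternative root directory, used so the function can be tried without touching a live /proc) are assumptions made for this example.

```shell
# Write an OOM adjustment for a process after checking the range.
# set_oom_adj is a hypothetical helper; the optional third argument
# replaces /proc so the function can be tried on a scratch directory.
set_oom_adj() {
    oom_pid=$1; oom_value=$2; oom_root=${3:-/proc}
    case $oom_value in
        ''|-|*[!0-9-]*|?*-*)
            echo "not an integer: $oom_value" >&2
            return 2 ;;
    esac
    if [ "$oom_value" -lt -17 ] || [ "$oom_value" -gt 15 ]; then
        echo "oom_adj must be between -17 and 15" >&2
        return 1
    fi
    printf '%s\n' "$oom_value" > "$oom_root/$oom_pid/oom_adj"
}

# Usage on a live system (as root), exempting PID 1234 from OOM kills:
#   set_oom_adj 1234 -17
#   cat /proc/1234/oom_score
```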

General Performance Tuning Considerations

Over Committing RAM

Swap device location

Storage device and limits

Kernel selection

Performance Tuning (RHEL3)

/proc/sys/vm/bdflush

/proc/sys/vm/pagecache

/proc/sys/vm/numa_memory_allocator

RHEL3 /proc/sys/vm/bdflush
int nfract; /* Percentage of buffer cache dirty to activate bdflush */
int ndirty; /* Maximum number of dirty blocks to write out per wake-cycle */
int dummy2; /* old "nrefill" */
int dummy3; /* unused */
int interval; /* jiffies delay between kupdate flushes */
int age_buffer; /* Time for normal buffer to age before we flush it */
int nfract_sync; /* Percentage of buffer cache dirty to activate bdflush synchronously */
int nfract_stop_bdflush; /* Percentage of buffer cache dirty to stop bdflush */
int dummy5; /* unused */

Example:
Settings for a server with ample IO config (RHEL3 defaults are geared for workstations)
sysctl -w vm.bdflush="50 5000 0 0 200 5000 3000 60 20 0"

RHEL3 /proc/sys/vm/pagecache

pagecache.minpercent

Lower limit for pagecache page reclaiming.

Kswapd will stop reclaiming pagecache pages below this percent of RAM.

pagecache.borrowpercent

Kswapd attempts to keep the pagecache at this percent of RAM

pagecache.maxpercent

Upper limit for pagecache page reclaiming.

RHEL2.1: hard limit, pagecache will not grow above this percent of RAM.

RHEL3: kswapd only reclaims pagecache pages above this percent of RAM.

Increasing maxpercent will increase swapping

Example: echo "1 10 50" > /proc/sys/vm/pagecache

RHEL3 /proc/sys/vm/numa_memory_allocator

> numa=on (default)

Zone:Normal freepages: 10539 min: 1279 low: 17406 high: 25597
Zone:Normal freepages: 10178 min: 1279 low: 17406 high: 25597
Zone:Normal freepages: 10445 min: 1279 low: 17406 high: 25597
Zone:Normal freepages: 856165 min: 1279 low: 17342 high: 25501
Swap cache: add 2633120, delete 2553093, find 1375365/1891330, race 0+0

> numa=off

Zone:Normal freepages: 861136 min: 1279 low: 30950 high: 63065
Swap cache: add 0, delete 0, find 0/0, race 0+0

> numa=on and /proc/sys/vm/numa_memory_allocator set to 1

Zone:Normal freepages: 17406 min: 1279 low: 17406 high: 25597
Zone:Normal freepages: 17406 min: 1279 low: 17406 high: 25597
Zone:Normal freepages: 17406 min: 1279 low: 17406 high: 25597
Zone:Normal freepages: 85739 min: 1279 low: 17342 high: 25501
Swap cache: add 0, delete 0, find 0/0, race 0+0

Performance Tuning (RHEL4 and RHEL5)

/proc/sys/vm/swappiness

/proc/sys/vm/min_free_kbytes

/proc/sys/vm/dirty_ratio

/proc/sys/vm/dirty_background_ratio

/proc/sys/vm/pagecache

RHEL4 /proc/sys/vm/swappiness

Controls how aggressively the system reclaims mapped memory:

Anonymous memory: swapping

Mapped file pages: writing if dirty and freeing

System V shared memory: swapping

Decreasing: more aggressive reclaiming of unmapped pagecache memory

Increasing: more aggressive swapping of mapped memory

Sybase server with /proc/sys/vm/swappiness set to 60 (default)

procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy id wa
51643644267883544323417888801204044749613022084625342516

Sybase server with /proc/sys/vm/swappiness set to 10

procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy id wa
8302422867243228069600238886377612862002024381326

RHEL4 & 5 /proc/sys/vm/min_free_kbytes

Directly controls the page reclaim watermarks in KB

# echo 1024 > /proc/sys/vm/min_free_kbytes

Node 0 DMA free:4420kB min:8kB low:8kB high:12kB
Node 0 DMA32 free:14456kB min:1012kB low:1264kB high:1516kB

# echo 2048 > /proc/sys/vm/min_free_kbytes

Node 0 DMA free:4420kB min:20kB low:24kB high:28kB
Node 0 DMA32 free:14456kB min:2024kB low:2528kB high:3036kB
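
The low and high values in both listings above are consistent with a simple rule applied in whole pages: low = min + min/4 and high = min + min/2. The sketch below reproduces the four watermark lines under that rule. It assumes 4 kB pages, and the function name zone_watermarks_kb is made up for illustration; other kernel versions may derive the watermarks differently.

```shell
# Derive a zone's low and high watermarks from its min watermark (in kB).
# Assumptions: 4 kB pages, and the rule low = min + min/4,
# high = min + min/2 evaluated with integer page counts.
# zone_watermarks_kb is a hypothetical helper name.
zone_watermarks_kb() {
    min_pages=$(( $1 / 4 ))
    low_pages=$(( min_pages + min_pages / 4 ))
    high_pages=$(( min_pages + min_pages / 2 ))
    echo "$(( min_pages * 4 )) $(( low_pages * 4 )) $(( high_pages * 4 ))"
}

zone_watermarks_kb 1012   # DMA32 after "echo 1024": prints 1012 1264 1516
zone_watermarks_kb 2024   # DMA32 after "echo 2048": prints 2024 2528 3036
```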

Memory reclaim watermarks: min_free_kbytes
Free list

All of RAM

Do nothing

Pages High: kswapd sleeps above High
kswapd reclaims memory
Pages Low: kswapd wakes up at Low
kswapd reclaims memory

Pages Min: all memory allocators reclaim at Min
user processes/kswapd reclaim memory
0

RHEL4 & 5 /proc/sys/vm/dirty_ratio

Absolute limit to percentage of dirty pagecache memory

Default is 40%

Lower means less dirty pagecache and smaller IO streams

Higher means more dirty pagecache and larger IO streams

RHEL4 & 5 /proc/sys/vm/dirty_background_ratio

Controls when dirty pagecache memory starts getting written.

Default is 10%

Lower

pdflush starts earlier

less dirty pagecache and smaller IO streams

Higher

pdflush starts later

more dirty pagecache and larger IO streams

dirty_ratio and dirty_background_ratio
pagecache
100% of pagecache RAM dirty

pdflushd and write()'ng processes write dirty buffers

dirty_ratio (40% of RAM dirty): processes start synchronous writes
pdflushd writes dirty buffers in background
dirty_background_ratio (10% of RAM dirty): wake up pdflushd
do_nothing
0% of pagecache RAM dirty
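
The three regions of the diagram above can be written as a small decision function. This is only a sketch of the thresholds as the slide describes them, not kernel code: the name writeback_state and the labels it prints are invented for the example, and the defaults of 10 and 40 are the default ratios quoted above.

```shell
# Classify writeback behavior for a given percentage of dirty memory.
# Arguments: dirty percent, dirty_background_ratio (default 10),
# dirty_ratio (default 40). Name and labels are illustrative only.
writeback_state() {
    dirty_pct=$1; background_ratio=${2:-10}; sync_ratio=${3:-40}
    if [ "$dirty_pct" -ge "$sync_ratio" ]; then
        echo "synchronous"   # writing processes do synchronous writes
    elif [ "$dirty_pct" -ge "$background_ratio" ]; then
        echo "background"    # pdflush writes dirty buffers in background
    else
        echo "idle"          # below dirty_background_ratio: do nothing
    fi
}

writeback_state 5      # prints idle
writeback_state 25     # prints background
writeback_state 45     # prints synchronous
writeback_state 25 5 20   # lowered ratios: prints synchronous
```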

RHEL4 & 5 /proc/sys/vm/pagecache

Controls when pagecache memory is deactivated.

Default is 100%

Lower

Prevents swapping out anonymous memory

Higher

Favors pagecache pages

Disabled at 100%

Pagecache Tuning
Filesystem/pagecache Allocation
Accessed (pagecache under limit)

ACTIVE

INACTIVE
Aging

(new -> old)

Accessed (pagecache over limit)

reclaim

FREE

(Hint) flushing the pagecache

[tmp]# echo 1 > /proc/sys/vm/drop_caches

procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy id wa
00224571841078083350196000561136212008317
0022457184107808335019600001039198001000
0022457184107808335019600001021188001000
0022457184107808335019600001035204001000
0022457248107808335019600001008164001000
302242128160176143863600001030197015850
002243610656204344080028361027177032672
0022436106562043440800001026180001000
002243610720212344000080101018300991

(Hint) flushing the slabcache

[tmp]# echo 2 > /proc/sys/vm/drop_caches

[tmp]# cat /proc/meminfo
MemTotal: 3907444 kB
MemFree: 3604576 kB
Slab: 115420 kB
Hugepagesize: 2048 kB

[tmp]# cat /proc/meminfo
MemTotal: 3907444 kB
MemFree: 3604576 kB
Slab: 115420 kB
Hugepagesize: 2048 kB

RHEL3 kernel selection

x86

Standard kernel (no PAE, 3G/1G)

UP systems with <= 4GB RAM

SMP kernel (PAE, 3G/1G)

SMP systems with < ~12GB RAM

Highmem/Lowmem ratio <= 10:1

PAE costs ~5% in performance

Hugemem kernel (PAE, 4G/4G)

SMP systems > ~12GB RAM

4G/4G costs ~5%

X86_64

Standard kernel for UP systems

SMP kernel for SMP systems

RHEL4 kernel selection

x86

Standard kernel (no PAE, 3G/1G)

UP systems with <= 4GB RAM

SMP kernel (PAE, 3G/1G)

SMP systems with < ~16GB RAM

Highmem/Lowmem ratio <= 16:1

Hugemem kernel (PAE, 4G/4G)

SMP systems > ~16GB RAM

X86_64

Standard kernel for UP systems

SMP kernel for systems with up to 8 CPUs

LargeSMP kernel for systems up to 512 CPUs

RHEL5 kernel selection

x86

Standard kernel (no PAE, 3G/1G)

UP and SMP systems with <= 4GB RAM

PAE kernel (PAE, 3G/1G)

UP and SMP systems with > 4GB RAM

X86_64

Standard kernel for all systems

IA64

Standard kernel for all systems

Problem: 16GB x86 running SMP kernel
Zone:DMA freepages: 2207 min: 0 low: 0 high: 0
Zone:Normal freepages: 484 min: 1279 low: 4544 high: 6304
Zone:HighMem freepages: 266 min: 255 low: 61952 high: 92928
Free pages: 2957 (266 HighMem)
(Active: 245828/1297300, inactive_laundry: 194673, inactive_clean: 194668, free: 2957)
aa:0 ac:0 id:0 il:0 ic:0 fr:2207
aa:630 ac:1009 id:189 il:233 ic:0 fr:484
aa:195237 ac:48952 id:1297057 il:194493 ic:194668 fr:266
1*4kB 1*8kB 1*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 2*4096kB = 8828kB)
48*4kB 8*8kB 97*16kB 4*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1936kB)
12*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1064kB)
Swap cache: add 3838024, delete 3808901, find 107105/1540587, race 0+2
138138 pages of slabcache
1100 pages of kernel stacks
0 lowmem pagetables, 37046 highmem pagetables
Free swap: 3986092kB

4194304 pages of RAM
3833824 pages of HIGHMEM

Tuning File Systems and Disk IO

Kernel Optimizations

CPU Scheduling: multi-threaded, multi-core

NUMA: optimized w/ NUMActl

Kernel disk I/O: I/O schedulers, Direct I/O, Async I/O

File systems: EXT3, NFS, GFS, OCFS

Database characteristics

Huge Pages: hugetlbfs, db's, java, etc.

RHEL5 Performance Features

Linux at 16 cpus: quad-core and beyond

Recognizes differences between logical and physical processors

I.e. multi-core, hyperthreaded & chips/sockets

Optimizes process scheduling to take advantage of shared on-chip cache and NUMA memory nodes

Implements multilevel run queues for sockets and cores (as opposed to one run queue per processor or per system)

Strong CPU affinity avoids task bouncing

Requires system BIOS to report CPU topology correctly

[Figure: Scheduler Compute Queues. Processes are queued per hardware thread (Thread 0, Thread 1) within each core (Core 0, Core 1) of each socket (Socket 0, Socket 1, Socket 2).]

Asynchronous I/O to File Systems

Allows application to continue processing while I/O is in progress

Eliminates Synchronous I/O stall

Critical for I/O intensive server applications

Red Hat Enterprise Linux since 2002

Support for RAW devices only

With Red Hat Enterprise Linux 4, significant improvement:

Support for Ext3, NFS, GFS file system access

Supports Direct I/O (e.g. Database applications)

Makes benchmark results more appropriate for real-world comparisons

[Figure: Synchronous I/O: App I/O Request, Device Driver I/O Request Issue, I/O, I/O Request Completion; the application stalls for completion. Asynchronous I/O: the same steps, with no stall for completion.]

Red Hat Confidential

Asynchronous I/O Characteristics

[Figure: two panels, "R4 U4 FC AIO Read" and "R4 U4 FC AIO Write Perf". Throughput in MB/sec versus number of aios (16, 32, 64) for the 4k, 8k, 16k, 32k and 64k series.]

Performance Tuning: DISK, RHEL3
[root@dhcp8336 sysctl]# /sbin/elvtune /dev/hda

/dev/hda elevator ID 0
read_latency: 2048
write_latency: 8192
max_bomb_segments: 6

[root@dhcp8336 sysctl]# /sbin/elvtune -r 1024 -w 2048 /dev/hda

/dev/hda elevator ID 0
read_latency: 1024
write_latency: 2048
max_bomb_segments: 6

Disk IO tuning RHEL4/5

RHEL4/5: 4 tunable I/O schedulers

CFQ: elevator=cfq. Completely Fair Queuing; default, balanced, fair for multiple luns, adaptors, smp servers

NOOP: elevator=noop. No operation in kernel; simple, low cpu overhead, leaves optimization to ramdisk, raid controller, etc.

Deadline: elevator=deadline. Optimize for run-time-like behavior, low latency per IO, balance issues with large IO luns/controllers (NOTE: current best for FC5)

Anticipatory: elevator=as. Inserts delays to help the stack aggregate IO, best on systems w/ limited physical IO (SATA)

RHEL4: set at boot time on command line

RHEL5: change on the fly
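
On kernels that expose the per-device scheduler file in sysfs, changing the elevator on the fly is a single write to /sys/block/<dev>/queue/scheduler, and reading the file back shows the active scheduler in square brackets. The sketch below wraps both steps. The function names and the optional file argument (used to try the code on a scratch file) are assumptions for illustration; note that the sysfs name for elevator=as is "anticipatory".

```shell
# Print the active scheduler from the contents of
# /sys/block/<dev>/queue/scheduler, e.g. "noop anticipatory deadline [cfq]".
# active_scheduler and set_scheduler are hypothetical helper names.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Select a scheduler for one block device. The optional third argument
# overrides the sysfs path so the function can be tried on a scratch file.
set_scheduler() {
    sched_dev=$1; sched_name=$2
    sched_file=${3:-/sys/block/$sched_dev/queue/scheduler}
    case $sched_name in
        cfq|noop|deadline|anticipatory)
            printf '%s\n' "$sched_name" > "$sched_file" ;;
        *)
            echo "unknown scheduler: $sched_name" >&2
            return 1 ;;
    esac
}

active_scheduler "noop anticipatory deadline [cfq]"   # prints cfq

# Usage on a live system (as root), with sda as an example device:
#   set_scheduler sda deadline
#   active_scheduler "$(cat /sys/block/sda/queue/scheduler)"
```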

File Systems

Separate swap and busy partitions etc.

EXT2/EXT3: separate talk
http://www.redhat.com/support/wpapers/redhat/ext3/*.html

Tune2fs or mount options

data=ordered: only metadata journaled

data=journal: both metadata and data journaled

data=writeback: use with care!

Set up default block size at mkfs: -b XX

RHEL4/5 EXT3 improves performance

Scalability up to 5M files/system

Sequential write by using block reservations

Increases file system size up to 8TB

GFS: global file system, cluster file system

Optimizing File System Performance

Use OLTP and DSS workloads

Results with various database tuning options

RAW vs EXT3/GFS/NFS w/ O_DIRECT (i.e. direct IO in iozone)

ASYNC IO options

RHEL3: DIO+AIO not optimal (pagecache still active)

RHEL4

EXT3 supports AIO+DIO out of the box

GFS: U2 full support of AIO+DIO / Oracle cert

NFS: U3 full support of both DIO+AIO

HUGEMEM kernels on x86
HugeTLBs: use larger page sizes (ipcs)

Section 4: Examples

General guidelines

Effect of NUMA and NUMACTL

Effect of CPU speed and how to control it

Benchmarking

McCalpin: know max memory BW

IOzone: run your own

Database Tuning

JVM Tuning

[Figure: McCalpin Streams Copy Bandwidth (1, 2, 4, 8). Rate (MB/s) versus No. of Streams for NonNuma and Numa, with % Difference on a secondary axis.]

RHEL4 & 5 NUMAstat and NUMActl

NUMAstat to display system NUMA characteristics on a numa system
[root@perf5 ~]# numastat
                 node3    node2    node1    node0
numa_hit         72684    82215   157244   325444
numa_miss            0        0        0        0
numa_foreign         0        0        0        0
interleave_hit    2668     2431     2763     2699
local_node       67306    77456   152115   324733
other_node        5378     4759     5129      711

NUMActl to control process and memory

numactl [--interleave nodes] [--preferred node] [--membind nodes]
[--cpubind nodes] [--localalloc] command {arguments ...}
TIP

App < memory of a single NUMA zone

numactl: use cpubind with cpus within the same socket

App > memory of a single NUMA zone

numactl --interleave XY and --cpubind XY

RHEL4 & 5 NUMAstat and NUMActl
EXAMPLES
numactl --interleave=all bigdatabase arguments
Run big database with its memory interleaved on all CPUs.
numactl --cpubind=0 --membind=0,1 process
Run process on node 0 with memory allocated on node 0 and 1.
numactl --preferred=1 numactl --show
Set preferred node 1 and show the resulting state.
numactl --interleave=all --shmkeyfile /tmp/shmkey
Interleave all of the sysv shared memory region specified by /tmp/shmkey over all nodes.
numactl --offset=1G --length=1G --membind=1 --file /dev/shm/A --touch
Bind the second gigabyte in the tmpfs file /dev/shm/A to node 1.
numactl --localalloc /dev/shm/file
Reset the policy for the shared memory file "file" to the default localalloc policy.

Linux NUMA Evolution

[Figure: RHEL3, 4 and 5 Linpack Multistream, AMD64, 8 cpu dual-core (1/2 cpus loaded). Performance in Kflops for RHEL3 U8, RHEL4 U5 and RHEL5 GOLD, Default Scheduler versus Taskset Affinity, with a secondary axis running from 0 to 45.]

Limitations:

Numa spill to different numa boundaries

Process migrations: no way back

Lack of page replication: text, read-mostly

RHEL5.2 CPUspeed and performance:

Enabled = governor set to ondemand

Looks at cpu usage to regulate power

Within 3-5% of performance for cpu loads

IO loads can keep cpu stepped down 15-30%

Supported in RHEL5.2 virtualization

To turn off (else may leave cpus in reduced step):

If it's not using performance, then:

# echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

Then check to see if it stuck:

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

Check /proc/cpuinfo to make sure you're seeing the expected CPU freq.

Proceed to normal service disable:

service cpuspeed stop

chkconfig cpuspeed off
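
The commands above touch cpu0 only. On a multi-CPU system the same write is usually wanted for every CPU that exposes a cpufreq directory; a minimal sketch of that loop follows. The function name set_governor_all and its optional second argument (an alternative root directory, so the loop can be tried outside /sys) are assumptions for this example.

```shell
# Set the cpufreq governor on every CPU that has a scaling_governor file
# and report what each file contains afterwards.
# set_governor_all is a hypothetical helper; the optional second argument
# replaces /sys/devices/system/cpu for dry runs on a scratch directory.
set_governor_all() {
    governor=$1; cpu_root=${2:-/sys/devices/system/cpu}
    for gov_file in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$gov_file" ] || continue      # no cpufreq support here
        printf '%s\n' "$governor" > "$gov_file"
        printf '%s: %s\n' "$gov_file" "$(cat "$gov_file")"
    done
}

# Usage on a live system (as root):
#   set_governor_all performance
#   grep MHz /proc/cpuinfo
```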

Effects of CPU speed on peak performance:

[Figure: RHEL5.2 effect of CPU speed on I/O workloads, Intel 4-cpu, 16GB memory, FC disk. Performance relative to peak (cpuspeed disabled) for Iozone Perf, Oracle OLTP and IBM DB2; series "50% vs Peak" and "99% vs Peak".]
Effects of CPU speed with RHEL5.2 Virtualization

[Figure: Oracle runs with CPUFreq, Xen kernel. Relative performance of two 80U runs (Run 1, Run 2) for RHEL51 Dom0 with CPUfreq on, RHEL51 Dom0 with CPUfreq off, RHEL51 PV with CPUfreq on and RHEL51 PV with CPUfreq off.]

Using IOzone w/ o_direct to mimic a database

Problem:

Filesystems use memory for file cache

Databases use memory for database cache

Users want a filesystem for management outside database access (copy, backup etc.)

You DON'T want BOTH to cache.

Solution:

Filesystems that support Direct IO

Open files with o_direct option

Databases which support Direct IO (ORACLE)

NO DOUBLE CACHING!

EXT3, GFS, NFS Iozone w/ DirectIO

[Figure: RHEL5 Direct_IO IOzone EXT3, GFS, NFS (Geom 1M-4GB, 1k-1m). Performance in MB/sec for EXT_DIO, GFS1_DIO and NFS_DIO across ALL I/O's, Initial Write, ReWrite, Read, ReRead, Random Read, Random Write, Backward Read, RecReWrite and Stride Read.]

HugeTLBFS

The Translation Lookaside Buffer (TLB) is a small CPU cache of recently used virtual to physical address mappings

TLB misses are extremely expensive on today's very fast, pipelined CPUs

Large memory applications can incur high TLB miss rates

HugeTLBs permit memory to be managed in very large segments

E.g. Itanium:

Standard page: 16KB

Default huge page: 256MB

16000:1 difference

File system mapping interface

Ideal for databases

E.g. TLB can fully map a 32GB Oracle SGA

[Figure: the TLB (128 data, 128 instruction entries) maps the virtual address space onto physical memory.]
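
The claim that the TLB can fully map a 32GB SGA follows from simple arithmetic on the Itanium page sizes quoted above: with 256MB huge pages the region needs 128 mappings, which is the number of data TLB entries shown in the diagram. The helper name pages_needed below is made up for this illustration.

```shell
# Number of pages (and therefore TLB mappings) needed to cover a region.
# Both arguments are in kB; the result is rounded up to whole pages.
# pages_needed is a hypothetical helper name.
pages_needed() {
    region_kb=$1; page_kb=$2
    echo $(( (region_kb + page_kb - 1) / page_kb ))
}

sga_kb=$(( 32 * 1024 * 1024 ))               # 32 GB SGA, in kB
pages_needed "$sga_kb" 16                    # 16 KB pages: 2097152
pages_needed "$sga_kb" $(( 256 * 1024 ))     # 256 MB huge pages: 128
```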

Using HugeTLBfs w/ Databases

[Figure: RHEL4+5 effect of HugeTLBfs, Oracle 10G OLTP performance, Intel 4-cpu, 8GB memory, FC SAN. Transactions/min (k) for RHEL4 U5 and RHEL5 GA with Base (4k) pages versus HugeTLBfs (2MB) pages, with %Diff on a secondary axis (0% to 16%).]

JVM Tuning

Eliminate swapping

Promote pagecache reclaiming

Lower swappiness to 10% (or lower if necessary).
Lower dirty_background_ratio to 10%
Lower dirty_ratio if necessary

Promote inode cache reclaiming

Raise vfs_cache_pressure (values above the default of 100 favor reclaiming the dentry and inode caches)

Tuning Network Apps: Messages/sec

Disable cpuspeed, selinux, auditd, irqbalance

Manual binding of IRQs w/ multiple nics

echo values > /proc/irq/XXX or use TUNA

Intel ixgb IRQs: send/recv to cpu socket w/ shared cache

Use taskset -c to start applications on

1 cpu per socket: good for BW intensive apps

Shield cpus for critical apps

Move all existing processes off of the core(s) to cpu0

Pairs of cpus on the same socket: shared 2nd level cache

Keep user apps on cpus separate from Network apps
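
The values written for manual IRQ binding are hexadecimal CPU bitmasks, with bit N standing for CPU N. The sketch below builds such a mask from a list of CPU numbers. The function name cpu_mask is invented for the example, and the IRQ number and program name in the usage comment are placeholders; the per-IRQ file assumed here is smp_affinity.

```shell
# Build the hexadecimal CPU bitmask for a list of CPU numbers:
# bit N set means CPU N is allowed. cpu_mask is a hypothetical helper name.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

cpu_mask 0        # prints 1
cpu_mask 2 3      # prints c
cpu_mask 0 2      # prints 5

# Usage on a live system (as root); 90 is a placeholder IRQ number and
# ./network_app a placeholder program:
#   cpu_mask 2 3 > /proc/irq/90/smp_affinity
#   taskset -c 4 ./network_app
```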

RT Tuning Network Apps: Messages/sec
10 Gbit Nics, Stoakley 2.67 to Bensley 3.0 GHz
Tuning enet gains +25% in Ave Latency,
RT kernel reduced peak latency but smoother: how much?

[Figure: Red Hat MRG Performance, AMQP Mess/s, Intel 8-cpu/16GB, 10Gb enet. Messages/sec (32 byte size) over 54 samples (million messages/sample) for rhel52_base, rhel52_tuned and rhel-realtime_tune.]

RT Performance of Network Apps: Messages/sec

[Figure: RH AMQP latency on Intel 8-cpu / 10Gbit enet, RHEL5.2 and RHEL-RT. Milliseconds/message (Ave, StdDev, Max) for the rt and r52 runs at several message sizes.]

Numa Network Apps: Messages/sec

[Figure: Wombat messages/sec on RHEL5.2, effects with NUMA on/off. Messages/sec and average latency (ms) versus message rate (5000 to 50000), each plotted with NUMA on and NUMA off.]

General Performance Tuning Guidelines

Use hugepages whenever possible.

Minimize swapping.

Maximize pagecache reclaiming

Place swap partition(s) on quiet device(s).

Direct IO if possible.

Beware of turning NUMA off.

Benchmark Tuning

Use Hugepages.

Don't overcommit memory

If memory must be overcommitted

Eliminate all swapping.

Maximize pagecache reclaiming

Place swap partition(s) on separate device(s).

Use Direct IO

Don't turn NUMA off.

Linux Performance Tuning References

Alikins, "System Tuning Info for Linux Servers", http://people.redhat.com/alikins/system_tuning.html

Axboe, J., "Deadline IO Scheduler Tunables", SuSE, EDF R&D, 2003.

Braswell, B., Ciliendo, E., "Tuning Red Hat Enterprise Linux on IBM eServer xSeries Servers", http://www.ibm.com/redbooks

Corbet, J., "The Continuing Development of IO Scheduling", http://lwn.net/Articles/21274.

Ezolt, P., Optimizing Linux Performance, www.hp.com/hpbooks, Mar 2005.

Heger, D., Pratt, S., "Workload Dependent Performance Evaluation of the Linux 2.6 IO Schedulers", Linux Symposium, Ottawa, Canada, July 2004.

Red Hat Enterprise Linux "Performance Tuning Guide", http://people.redhat.com/dshaks/rhel3_perf_tuning.pdf

Network, NFS Performance covered in separate talks, http://nfs.sourceforge.net/nfshowto/performance.html

Questions?
