IBM Advanced Technical Support - Americas

AIX Performance: Configuration & Tuning for Oracle

Vijay Adik vadik@us.ibm.com ATS - Oracle Solutions Team

April 17, 2009

© 2009 IBM Corporation

Legal information
The information in this presentation is provided by IBM on an "AS IS" basis without any warranty, guarantee or assurance of any kind. IBM also does not provide any warranty, guarantee or assurance that the information in this paper is free from errors or omissions. Information is believed to be accurate as of the date of publication. You should check with the appropriate vendor to obtain current product information. Any proposed use of claims in this presentation outside of the United States must be reviewed by local IBM country counsel prior to such use. IBM, RS6000, System p, AIX, AIX 5L, GPFS, and Enterprise Storage Server (ESS) are trademarks or registered trademarks of the International Business Machines Corporation. Oracle, Oracle9i and Oracle10g are trademarks or registered trademarks of Oracle Corporation. All other products or company names are used for identification purposes only, and may be trademarks of their respective owners.

Agenda
AIX Configuration Best Practices for Oracle
– Memory
– CPU
– I/O
– Network
– Miscellaneous

AIX Configuration Best Practices for Oracle
The suggestions presented here are considered to be basic configuration “starting points” for general Oracle workloads. Your workloads may vary. Ongoing performance monitoring and tuning is recommended to ensure that the configuration is optimal for the particular workload characteristics.

Performance Overview – Tuning Methodology
Iterative Tuning Process
– Stress the system (i.e., tune at peak workload)
– Monitor the sub-systems: CPU, Memory, Network, I/O
– Identify the predominant bottleneck
– Tune the bottleneck
– Repeat

Understand the external view of system performance
– The external view of system performance is the observable event that is causing someone to say the system is performing poorly. Typically, this is (1) end-user response time, (2) application (or task) response time, or (3) throughput.
– System metrics should not be used to judge improvement.
Performance only improves when the predominant bottleneck is fixed
– Fixing a secondary bottleneck will not improve performance and typically results in further overloading an already overloaded predominant bottleneck.
Monitor performance after a change – tuning is an iterative process
– Monitoring is required after making a change for two reasons: (1) fixing the predominant bottleneck typically uncovers another bottleneck, and (2) not all changes yield positive results.
– If possible, you should have a “repeatable” test so that each change can be accurately evaluated.

• End-user response time is the elapsed time between when a user submits a request and receives a response.
• Application response time is the elapsed time required for one or more jobs to complete. Historically, these jobs have been called batch jobs.
• Throughput is the amount of work that can be accomplished per unit time. This metric is typically expressed in terms of transactions per minute.

Performance Monitoring and Tuning Tools

[Table: status commands, monitor commands, trace-level commands and tuning tools for the CPU, Memory, I/O Subsystem, Network, and Processes & Threads subsystems]
Commands listed: vmstat, iostat, sar, mpstat, lparstat, topas, ps, pstat, time/timex, emstat/alstat, lsps, ipcs, lsattr/lsdev, lspv/lsvg/lslv, lvmstat, netstat, entstat, atmstat, tokstat, fddistat, nfsstat, ifconfig, svmon, netpmon, filemon, fileplace, tcpdump, truss, kdb, dbx, fuser, tprof, curt, splat, trace, trcrpt, iptrace, ipreport, gprof, prof, pprof, schedo, vmo, ioo, lvmo, no, nfso, chdev, chlv, migratepv, reorgvg, chps/mkps, rmss, fdpr, bindprocessor, bindintcpu, nice/renice, setpri

Agenda

AIX Configuration Best Practices for Oracle
– Memory
– CPU
– I/O
– Network
– Miscellaneous

AIX Memory Management Overview

The role of the Virtual Memory Manager (VMM) is to provide the capability for programs to address more memory locations than are actually available in physical memory.
On AIX this is accomplished using segments that are partitioned into fixed sizes called “pages”.
– A segment is 256M
– The default page size is 4K
– POWER4+ and POWER5 can define large pages, which are 16M
The 32-bit or 64-bit address translates into a 52-bit or 80-bit virtual address
– 32-bit system: the upper 4 bits select a segment register that contains a 24-bit segment id; the remaining 28 bits are the offset
  • 24-bit segment id + 28-bit offset = 52-bit VA
– 64-bit system: the upper 36 bits select a segment register that contains a 52-bit segment id; the remaining 28 bits are the offset
  • 52-bit segment id + 28-bit offset = 80-bit VA
The VMM maintains a list of free frames that can be used to retrieve pages that need to be brought into memory.
– The VMM replenishes the free list by removing some of the current pages from real memory (i.e., stealing memory).
– The process of moving data between memory and disk is called “paging”.
The VMM uses a Page Replacement Algorithm (implemented in the lrud kernel threads) to select pages that will be removed from memory.

Virtual Memory Space – 64 Bits

64-bit Address: 36 bits select the Segment Register; 28 bits are the offset within the Segment
Each Segment Register contains a 52-bit Segment ID
52-bit Segment ID + 28-bit offset = 80-bit Virtual Address
Virtual Memory: approximately 1 trillion terabytes, or 1 yottabyte
– 256 Mbyte Segments (diagram examples: Segment ID 0 Kernel Segment, Page Space Disk Map, Kernel Heap)
– Each Segment is divided into 4096-byte chunks called pages
– Each Segment can have a maximum of 65536 pages
– The 28-bit offset is used to access a specific location in the segment: 2^28 = 256M
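To make the segment arithmetic above concrete, here is a small Python sketch that splits a 64-bit effective address into the fields the slide describes. It assumes the default 4 KB page size and treats the segment-register index as simply the upper 36 bits. The function names are illustrative only and are not an AIX API.

```python
SEGMENT_OFFSET_BITS = 28   # 2**28 bytes = 256 MB per segment
PAGE_SHIFT = 12            # assumption: default 4 KB page size
PAGES_PER_SEGMENT = (1 << SEGMENT_OFFSET_BITS) >> PAGE_SHIFT  # 65536 pages


def decompose_ea(ea: int) -> dict:
    """Split a 64-bit effective address into segment/page fields."""
    if not 0 <= ea < (1 << 64):
        raise ValueError("effective address must fit in 64 bits")
    offset = ea & ((1 << SEGMENT_OFFSET_BITS) - 1)
    return {
        "segment_register": ea >> SEGMENT_OFFSET_BITS,  # upper 36 bits
        "offset": offset,                                # lower 28 bits
        "page_in_segment": offset >> PAGE_SHIFT,         # 0 .. 65535
        "byte_in_page": offset & ((1 << PAGE_SHIFT) - 1),
    }


def virtual_address(segment_id: int, offset: int) -> int:
    """Concatenate a 52-bit segment ID and a 28-bit offset into an 80-bit VA."""
    if not 0 <= segment_id < (1 << 52):
        raise ValueError("segment ID must fit in 52 bits")
    if not 0 <= offset < (1 << SEGMENT_OFFSET_BITS):
        raise ValueError("offset must fit in 28 bits")
    return (segment_id << SEGMENT_OFFSET_BITS) | offset
```

For example, `decompose_ea(0x123456789)` places the address behind segment register 0x12, at page 0x3456 of that segment and byte 0x789 of the page.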

Memory Tuning Overview

Memory: vmo -p -o <parameter name>=<new value>
– The -p flag also updates /etc/tunables/nextboot
Virtual Memory (General): minfree, maxfree, lru_file_repage, lru_poll_interval
JFS: maxperm, strict_maxperm
Enhanced JFS (JFS2): maxclient, strict_maxclient
Large Pages (Pinned Memory): v_pinshm, lgpg_regions, lgpg_size

NAME               CUR   DEF   BOOT  MIN  MAX    UNIT          TYPE
--------------------------------------------------------------------------------
lru_file_repage    1     1     1     0    1      boolean       D
lru_poll_interval  0     0     0     0    60000  milliseconds  D
maxclient%         80    80    80    1    100    % memory      D
maxfree            1088  1088  1088  8    200K   4KB pages     D
maxperm%           80    80    80    1    100    % memory      D
minfree            960   960   960   8    200K   4KB pages     D
strict_maxclient   1     1     1     0    1      boolean       D
strict_maxperm     0     0     0     0    1      boolean       D
minperm%           20    20    20    1    100    % memory      D

Virtual Memory Manager (VMM) Tuning

The AIX “vmo” command provides for the display and/or update of several parameters which influence the way AIX manages physical memory
– The “-a” option displays current parameter settings: vmo -a
– The “-o” option is used to change parameter values: vmo -o minfree=1440
– The “-p” option is used to make changes persist across a reboot: vmo -p -o minfree=1440
On AIX 5.3, a number of the default “vmo” settings are not optimized for database workloads and should be modified for Oracle environments

Kernel Parameter Tuning – AIX 6.1

AIX 6.1 is configured by default to be ‘correct’ for most workloads.
Many tunables are classified as ‘Restricted’:
– Only change them if AIX Support says so
– Restricted parameters will not be displayed unless the ‘-F’ option is used with commands like vmo, no, ioo, etc.
When migrating from AIX 5.3 to 6.1, parameter override settings in AIX 5.3 will be transferred to the AIX 6.1 environment

General Memory Tuning

Two primary categories of memory pages: Computational and File System
AIX will always try to utilize all of the physical memory available (subject to vmo parameter settings)
– What is not required to support current computational page demand will tend to be used for filesystem cache
– Raw devices and filesystems mounted (or individual files opened) in DIO/CIO mode do not use filesystem cache
[Chart: Memory Use 9/20/2007, 08:02 to 10:14; Process% and FScache% of memory (0–100%)]

VMM Tuning

Definitions:
– LRUD = VMM page stealing process (LRU Daemon), 1 per memory pool
– numperm, numclient = used fs buffer cache pages, seen in ‘vmstat -v’
– minperm = target minimum number of pages for fs buffer cache
– maxperm, maxclient = target maximum number of pages for fs buffer cache
Parameters:
– MINPERM% = target min % real memory for fs buffer cache
– MAXPERM%, MAXCLIENT% = target max % real memory for fs buffer cache
– MINFREE = target minimum number of free memory pages
– MAXFREE = target maximum number of free memory pages
When does LRUD start?
– When total free pages (in a given memory pool) < MINFREE
– When (maxclient pages - numclient) < MINFREE
When does LRUD stop?
– When total free pages > MAXFREE
– When (maxclient pages - numclient) > MAXFREE
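The start and stop rules above form a hysteresis band between MINFREE and MAXFREE. The sketch below models that band for one memory pool. It assumes that each of the two triggers (free pages, and maxclient headroom) is tracked independently and that LRUD is active while either trigger is active. This is an illustration of the listed rules, not the actual kernel logic, and the function names are hypothetical.

```python
def lrud_trigger(running: bool, headroom: int, minfree: int, maxfree: int) -> bool:
    """Return whether page stealing is active for one trigger.

    headroom is either the free-page count of a memory pool or
    (maxclient pages - numclient). Stealing starts when headroom drops
    below minfree and, once started, continues until headroom exceeds maxfree.
    """
    if not running:
        return headroom < minfree
    return headroom <= maxfree


def pool_state(free_running: bool, client_running: bool, free_pages: int,
               maxclient_pages: int, numclient: int,
               minfree: int, maxfree: int) -> tuple:
    """Evaluate both LRUD triggers for a single memory pool (4 KB pages)."""
    free_running = lrud_trigger(free_running, free_pages, minfree, maxfree)
    client_running = lrud_trigger(
        client_running, maxclient_pages - numclient, minfree, maxfree)
    return free_running, client_running
```

With the AIX 5.3 defaults (minfree=960, maxfree=1088), a pool with plenty of free pages can still start stealing if numclient comes within 960 pages of the maxclient limit.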

VMM Tuning starting points (AIX 5.3 and 6.1)

MINPERM%=3 (default for 6.1)
MAXPERM%, MAXCLIENT%=90* (default for 6.1)
STRICT_MAXPERM=0* (default for 5.3 and 6.1)
STRICT_MAXCLIENT=1 (default for 5.3 and 6.1)
LRU_FILE_REPAGE=0 (AIX 5.2 ML4+ or later; default for 6.1)
– tells LRUD to page out file pages (filesystem buffer cache) rather than computational pages when numperm > minperm
LRU_POLL_INTERVAL=10 (default for 6.1)
– indicates the time period (in milliseconds) after which LRUD pauses so that interrupts can be serviced. The default value of “0” means no preemption.

* In AIX 5.2 environments with large physical memory, set MAXPERM%, MAXCLIENT% = (2400 / Phys Memory (GB)) and STRICT_MAXPERM=1
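The AIX 5.2 footnote is a simple formula, so a short Python sketch of it is included here. The formula comes from the slide. Two things are assumptions: truncating the percentage to an integer, and the exact vmo command string. The formula only makes sense for the large-memory case it targets, since below roughly 27 GB it yields more than the 90% used elsewhere.

```python
def aix52_maxperm_pct(phys_mem_gb: float) -> float:
    """MAXPERM% / MAXCLIENT% starting point from the AIX 5.2 footnote: 2400 / GB."""
    if phys_mem_gb <= 0:
        raise ValueError("physical memory must be positive")
    return 2400 / phys_mem_gb


def aix52_vmo_command(phys_mem_gb: float) -> str:
    """Build the corresponding vmo command (percentage truncated to an integer)."""
    pct = int(aix52_maxperm_pct(phys_mem_gb))
    return (f"vmo -p -o maxperm%={pct} -o maxclient%={pct} "
            f"-o strict_maxperm=1")
```

For a 64 GB AIX 5.2 server this gives 37.5%, i.e. `vmo -p -o maxperm%=37 -o maxclient%=37 -o strict_maxperm=1`.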

Virtual Memory Management (VMM) Thresholds

[Chart: Physical Memory (0–100%) over time, showing comp%, numperm% and Free% against the minfree, maxfree, minperm% and maxperm% thresholds]
– Start stealing pages when free memory is below minfree
– Stop stealing pages when free memory is above maxfree
– When numperm% > maxperm%, steal only file system pages
– When minperm% < numperm% < maxperm%, steal file system or computational pages, depending on repage rate
– When numperm% < minperm%, steal both file system and computational pages
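The threshold rules above can be written as a small decision function. This Python sketch follows the three numperm% bands listed on the slide. It also adds an lru_file_repage switch, based on the earlier description of LRU_FILE_REPAGE=0 favoring file pages when numperm > minperm. This is a simplified model: the real repage-rate logic inside LRUD is not represented, and boundary equalities fall into the middle band by assumption.

```python
def steal_candidates(numperm_pct: float, minperm_pct: float,
                     maxperm_pct: float, lru_file_repage: int = 1) -> str:
    """Which page types LRUD may steal, per the VMM threshold rules."""
    if numperm_pct > maxperm_pct:
        return "file"                   # above maxperm%: file pages only
    if numperm_pct < minperm_pct:
        return "file+computational"     # below minperm%: both kinds
    if lru_file_repage == 0:
        return "file"                   # lru_file_repage=0 favors file pages
    return "file-or-computational"      # decided by the repage rate
```

With the AIX 5.3 defaults (minperm%=20, maxperm%=80), a numperm% of 50 falls in the middle band, which is where lru_file_repage=0 makes a difference.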

VMM Page Stealing Thresholds

Minfree/maxfree values are per memory pool in AIX 5.3 and 6.1
– Total system minfree = minfree * # of memory pools
– Total system maxfree = maxfree * # of memory pools
Most workloads do not require minfree/maxfree changes in AIX 5.3 or 6.1
minfree
– AIX 5.3/6.1: minfree = max(960, 120 x # logical CPUs / # mem pools)
– AIX 5.2: minfree = 120 x # logical CPUs
– Consider increasing if the vmstat “fre” column frequently approaches zero or if “vmstat -s” shows significant “free frame waits”
maxfree
– AIX 5.3/6.1: maxfree = max(1088, minfree + (MAX(maxpgahead, j2_maxPageReadAhead) * # logical CPUs) / # mem pools)
– AIX 5.2: maxfree = minfree + (MAX(maxpgahead, j2_maxPageReadAhead) * # logical CPUs)
Example: for a 6-way LPAR with SMT enabled, maxpgahead=8 and j2_maxPageReadAhead=8:
– minfree = 360 = 120 x 6 x 2 / 4 (4 memory pools)
– maxfree = 1536 = 1440 + (max(8,8) x 6 x 2)
vmo -o minfree=1440 -o maxfree=1536 -p
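The minfree/maxfree formulas above translate directly into code. This Python sketch implements both the AIX 5.3/6.1 per-pool form and the AIX 5.2 system-wide form. It assumes integer (floor) division wherever the slide divides by the number of memory pools. Function names are illustrative.

```python
def minfree_maxfree_aix53(lcpus: int, mem_pools: int,
                          maxpgahead: int, j2_max_read_ahead: int) -> tuple:
    """Per-pool minfree/maxfree for AIX 5.3/6.1."""
    minfree = max(960, 120 * lcpus // mem_pools)
    readahead = max(maxpgahead, j2_max_read_ahead)
    maxfree = max(1088, minfree + readahead * lcpus // mem_pools)
    return minfree, maxfree


def minfree_maxfree_aix52(lcpus: int, maxpgahead: int,
                          j2_max_read_ahead: int) -> tuple:
    """System-wide minfree/maxfree for AIX 5.2."""
    minfree = 120 * lcpus
    maxfree = minfree + max(maxpgahead, j2_max_read_ahead) * lcpus
    return minfree, maxfree
```

For the slide's 6-way SMT LPAR (12 logical CPUs, read-ahead 8), the AIX 5.2 form gives (1440, 1536). The AIX 5.3 form with 4 pools is held at the 960/1088 floors, and these per-pool values are multiplied by the pool count to get the system totals.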

Page Steal Method

Historically, AIX maintained a single LRU list which contains both computational and filesystem pages.
– In environments with lots of computational pages that you want to keep in memory, LRUD may have to spend a lot of time scanning the LRU list to find an eligible filesystem page to steal
AIX 6.1 introduced the ability to maintain separate LRU lists for computational vs. filesystem pages.
– Also backported to AIX 5.3
New page_steal_method parameter
– Enabled (1) by default in 6.1, disabled (0) by default in 5.3
– Requires a reboot to change
– Recommended for Oracle DB environments (both AIX 5.3 and 6.1)

Understanding Memory Pools

Memory cards are associated with every Multi Chip Module (MCM), Dual Core Module (DCM) or Quad Core Module (QCM) in the server
– The Hypervisor assigns physical CPUs to a dedicated CPU LPAR (or shared processor pool) from one or more MCMs, DCMs or QCMs
For a given LPAR, there will normally be at least 1 memory pool for each MCM, DCM or QCM that has contributed processors to that LPAR or shared processor pool
– By default, memory for a process is allocated from memory associated with the processor that caused the page fault
Memory pool configuration is influenced by the VMO parameter “memory_affinity”
– memory_affinity=1 means configure memory pools based on physical hardware configuration (DEFAULT)
– memory_affinity=0 means configure roughly uniform memory pools from any physical location
The number of memory pools can be seen with ‘vmstat -v | grep pools’
Their size can only be seen using KDB
LRUD operates per memory pool
[Diagram: p590 / p595 MCM Architecture]

Memory Affinity…

Not generally a benefit unless processes are bound to a particular processor
It can exacerbate any page replacement algorithm issues (e.g., system paging or excessive LRUD scanning activity) if memory pool sizes are unbalanced
If there are paging or LRUD related issues, try basic vmo parameter or Oracle SGA/PGA tuning first
If issues remain, use ‘kdb’ to check if memory pool sizes are unbalanced:

KDB(1)> memp *
                 VMP  MEMP  NB_PAGES  FRAMESETS  NUMFRB
memp_frs+010000  00   000   00B1F9F4  000 001    00B073DE
memp_frs+010780  00   003   00001BBC  006 007    00000000
memp_frs+010280  01   001   00221C80  002 003    0021C3CB
memp_frs+010500  02   002   00221C80  004 005    0021CDDE
(NB_PAGES = pages in pool; NUMFRB = free pages)

If the pool sizes are not balanced, consider disabling Memory Affinity:
# vmo -r -o memory_affinity=0 (requires a reboot)
– IY73792 required for 5300-01 and 5300-02
– Code changes in 5.3 TL5/TL6 solved most memory affinity issues
– memory_affinity is also a “Restricted” tunable in AIX 6.1
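Reading hexadecimal page counts by eye is error-prone, so here is a Python sketch that parses the `memp *` listing shown above and reports how unbalanced the pools are. It assumes the seven-column layout of this particular listing and that NB_PAGES/NUMFRB count 4 KB frames. Both are assumptions about the kdb output format, so check them against your AIX level.

```python
PAGE_BYTES = 4096  # assumption: NB_PAGES / NUMFRB are counted in 4 KB frames

SAMPLE = """\
KDB(1)> memp *
                 VMP MEMP NB_PAGES FRAMESETS NUMFRB
memp_frs+010000  00  000  00B1F9F4 000 001   00B073DE
memp_frs+010780  00  003  00001BBC 006 007   00000000
memp_frs+010280  01  001  00221C80 002 003   0021C3CB
memp_frs+010500  02  002  00221C80 004 005   0021CDDE
"""


def parse_memp(text: str) -> list:
    """Extract one record per memory pool from 'memp *' output."""
    pools = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7 or not fields[0].startswith("memp_frs"):
            continue  # skip the prompt line and the column header
        pools.append({
            "vmp": int(fields[1], 16),
            "memp": int(fields[2], 16),
            "nb_pages": int(fields[3], 16),    # pages in pool
            "free_pages": int(fields[6], 16),  # NUMFRB
        })
    return pools


def pool_gib(pool: dict) -> float:
    """Pool size in GiB, under the 4 KB frame assumption."""
    return pool["nb_pages"] * PAGE_BYTES / 2**30


def size_ratio(pools: list) -> float:
    """Largest pool size divided by smallest; large values mean unbalanced pools."""
    sizes = [p["nb_pages"] for p in pools]
    if not sizes or min(sizes) == 0:
        raise ValueError("need at least one non-empty pool")
    return max(sizes) / min(sizes)
```

On the sample above, the first pool holds about 44.5 GiB while pool 003 holds only 7100 pages and has no free frames. This is the kind of imbalance the slide warns about.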

AIX 5.3/6.1 Multiple Page Size Support

AIX 5.3 5300-04 introduces two new page sizes:
– 64K
– 16M (large pages) (available since POWER4+)
– 16G (huge pages)
  • Requires p5+ hardware
  • Requires p5 System Release 240, Service Level 202 microcode
  • 16MB support requires Version 5 Release 2 of the Hardware Management Console (HMC) machine code
User/Application must request the preferred page size
– The 64K page size is very promising, since 64K pages do not need to be configured/reserved in advance or pinned
  • export LDR_CNTRL=DATAPSIZE=64K@TEXTPSIZE=64K@STACKPSIZE=64K for the oracle* processes to use the 64K page size for stack, data & text
  • Will require Oracle to explicitly request the page size (10.2.0.4 & up)
If the preferred size is not available, the largest available smaller size will be used
– Current Oracle versions will end up using 64KB pages even if the SGA is not pinned
Refer: http://www-03.ibm.com/systems/resources/systems_p_os_aix_whitepapers_multiple_page.pdf

Large Page Support (optional)

Pinning shared memory
AIX Parameters
• vmo -p -o v_pinshm=1
• Leave maxpin% at the default of 80% unless the SGA exceeds 77% of real memory
  – vmo -p -o maxpin%=[(SGA size * 100) / total mem] + 3
Oracle Parameters
• LOCK_SGA = TRUE
Enabling Large Page Support
• vmo -r -o lgpg_size=16777216 -o lgpg_regions=(SGA size / 16 MB)
Allowing Oracle to use Large Pages
• chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE oracle
Using Monitoring Tools
• svmon -G
• svmon -P
Oracle Metalink note# 372157.1
Note: It is recommended not to pin the SGA, as long as you have configured the VMM, SGA & PGA properly.
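Pinning the whole SGA requires maxpin% to exceed the SGA's share of real memory. The default of 80% therefore only suffices while the SGA stays at or below 77% (leaving 3 points of headroom). This Python sketch computes a maxpin% starting value on that basis. Rounding the SGA percentage up to a whole number is an assumption.

```python
import math


def maxpin_pct(total_mem: float, sga_size: float, default: int = 80) -> int:
    """maxpin% starting value: keep the default unless the SGA exceeds default - 3 %."""
    if total_mem <= 0 or sga_size < 0:
        raise ValueError("sizes must be positive")
    sga_pct = sga_size * 100 / total_mem
    if sga_pct <= default - 3:          # SGA <= 77% of real memory
        return default
    result = math.ceil(sga_pct) + 3     # SGA share plus 3 points of headroom
    if result > 100:
        raise ValueError("SGA too large to pin with 3% headroom")
    return result
```

Any consistent unit works for the two sizes. A 100 GB SGA on a 128 GB LPAR, for example, gives `vmo -p -o maxpin%=82`.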

Determining SGA size

SGA Memory Summary for DB: test01 Instance: test01 Snaps: 1046 -1047

SGA regions                    Size in Bytes
------------------------------ ----------------
Database Buffers                 16.928.210.944
Fixed Size                            2.371.584
Redo Buffers                            768.448
Variable Size                     1.241.513.984
                               ----------------
sum                              18.172.864.960

lgpg_regions = 18.172.864.960 / 16.777.216 = 1084 (rounded up)
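The lgpg_regions calculation is a ceiling division by the 16 MB large-page size. This Python sketch reproduces it and also builds the vmo command from the previous slide. It uses the SGA total of 18,172,864,960 bytes from the summary's sum row. The helper names are illustrative.

```python
LGPG_SIZE = 16 * 1024 * 1024  # 16 MB large page, i.e. lgpg_size=16777216


def lgpg_regions(sga_bytes: int) -> int:
    """Number of 16 MB large pages needed to hold the SGA (rounded up)."""
    if sga_bytes <= 0:
        raise ValueError("SGA size must be positive")
    return -(-sga_bytes // LGPG_SIZE)  # ceiling division on integers


def vmo_large_page_command(sga_bytes: int) -> str:
    """vmo command reserving enough large pages for the SGA."""
    return (f"vmo -r -o lgpg_size={LGPG_SIZE} "
            f"-o lgpg_regions={lgpg_regions(sga_bytes)}")
```

Integer ceiling division avoids floating-point rounding surprises for multi-gigabyte SGAs. For the test01 instance it yields 1084 regions.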

AIX Paging Space

Allocate Paging Space:
– Configure the Server/LPAR with enough physical memory to satisfy memory requirements
– With AIX demand paging, paging space does not have to be large
– Paging space provides a safety net to prevent system crashes when memory is overcommitted
– Generally, keep it on internal drives or high-performing SAN storage
Monitor paging activity:
– vmstat -s
– sar -r
– nmon
Resolve paging issues:
– Reduce file system cache size (MAXPERM, MAXCLIENT)
– Reduce Oracle SGA or PGA (9i or later) size
– Add physical memory
Do not overcommit real memory!

Tuning and Improving System Performance

Adjust the VMM Tuning Parameters
– Key parameters listed on word document
Implement VMM related Mount Options
– DIO / CIO
– Release-behind on read and/or write
Reduce Application Memory Requirements
Memory Model
– %Computational < 70%: Small Memory Model. Goal is to make paging as efficient as possible
  • Add multiple page spaces on different spindles
  • Make all page spaces the same size to ensure round-robin scheduling
  • PS = 1.5 x computational requirements
  • Turn off DEFPS
  • Memory Load Control
– %Computational > 70%: Large Memory Model. Goal is to adjust tuning parameters to prevent paging
  • Multiple Memory pools
  • Page Space smaller than Memory
  • Must tune VMM key parameters
Add additional Memory

Agenda

AIX Configuration Best Practices for Oracle
– Memory
– CPU
– I/O
– Network
– Miscellaneous

CPU Considerations

Oracle Parameters based on the # of CPUs
– DB_WRITER_PROCESSES
– Degree of Parallelism
  • user level
  • table level
  • query level
  • PARALLEL_MAX_SERVERS or PARALLEL_AUTOMATIC_TUNING (CPU_COUNT * PARALLEL_THREADS_PER_CPU)
– CPU_COUNT
– FAST_START_PARALLEL_ROLLBACK – should be using UNDO instead
– CBO – execution plan may be affected; check explain plan

lparstat command

# lparstat -i
Node Name                        : erpcc8
Partition Name                   :
Partition Number                 :
Type                             : Dedicated
Mode                             : Capped
Entitled Capacity                : 4.00
Partition Group-ID               :
Shared Pool ID                   :
Online Virtual CPUs              : 4
Maximum Virtual CPUs             : 4
Minimum Virtual CPUs             : 1
Online Memory                    : 8192 MB
Maximum Memory                   : 9216 MB
Minimum Memory                   : 128 MB
Variable Capacity Weight         :
Minimum Capacity                 : 1.00
Maximum Capacity                 : 4.00
Capacity Increment               : 1.00
Maximum Physical CPUs in system  : 4
Active Physical CPUs in system   : 4
Active CPUs in Pool              :
Unallocated Capacity             :
Physical CPU Percentage          : 100.00%
Unallocated Weight               :

CPU Considerations

Use SMT with AIX 5.3/Power5 (or later) environments
Micro-partitioning Guidelines
– Virtual CPUs <= physical processors in shared pool
– CAPPED: Virtual CPUs should be the nearest integer >= capping limit
– UNCAPPED: Virtual CPUs should be set to the max peak demand requirement
– Entitlement >= Virtual CPUs / 3
DLPAR considerations
– Oracle 9i
  • Oracle CPU_COUNT does not recognize change in # CPUs
  • AIX scheduler can still use the added CPUs
– Oracle 10g
  • Oracle CPU_COUNT recognizes change in # CPUs
  • Max CPU_COUNT limited to 3x CPU_COUNT at instance startup
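The micro-partitioning guidelines above can be turned into a quick sanity check when planning an LPAR. This Python sketch derives a virtual CPU count and rejects configurations that break the listed rules. Treating the entitlement as the capping limit of a capped partition, and raising an error rather than returning a warning, are design choices of this sketch. Names are illustrative.

```python
import math
from typing import Optional


def virtual_cpus(capped: bool, entitlement: float,
                 peak_demand: Optional[float] = None,
                 shared_pool_cpus: Optional[int] = None) -> int:
    """Suggest a virtual CPU count following the micro-partitioning guidelines."""
    if capped:
        vcpus = math.ceil(entitlement)       # nearest integer >= capping limit
    else:
        if peak_demand is None:
            raise ValueError("uncapped partitions need a peak demand estimate")
        vcpus = math.ceil(peak_demand)       # sized for max peak demand
    if shared_pool_cpus is not None and vcpus > shared_pool_cpus:
        raise ValueError("virtual CPUs exceed physical processors in the pool")
    if entitlement < vcpus / 3:
        raise ValueError("entitlement should be >= virtual CPUs / 3")
    return vcpus
```

For example, an uncapped LPAR expected to peak at 6 processors in an 8-way pool gets 6 virtual CPUs and needs an entitlement of at least 2.0.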

CPU: Compatibility Matrix

[Table: SMT, DLPAR and Micro-Partitioning support across AIX 5.2, AIX 5.3 and AIX 6.1, and Oracle 9i, Oracle 10g and Oracle 11g]
Note: Oracle RAC 10.2.0.3 on VIOS 1.3.1.1 & AIX 5.3 TL07 and higher are certified

Agenda

AIX Configuration Best Practices for Oracle
– Memory
– CPU
– I/O
– Network
– Miscellaneous

The AIX IO stack

Application – the application memory area caches data to avoid IO
Logical file system – JFS, JFS2, NFS, Other
– NFS caches file attributes; NFS has a cached filesystem for NFS clients
– JFS and JFS2 cache use extra system RAM: JFS uses persistent pages for cache, JFS2 uses client pages for cache
VMM
LVM (LVM device drivers)
Multi-path IO driver (optional)
Disk Device Drivers
Adapter Device Drivers – adapter device drivers use DMA for IO
Disk subsystem (optional) – disk subsystems have read and write cache
Disk – disks have memory to store commands/data
Raw LVs bypass the file system and VMM; raw disks also bypass the LVM
Queues exist for both adapters and disks
IOs can be coalesced (good) or split up (bad) as they go through the IO stack
[Legend: write cache, read cache, or memory area used for IO]

AIX Filesystems Mount options

Journaled File System (JFS)
– Better for lots of small file creates & deletes
– Buffer caching (default) provides Sequential Read-Ahead, cached writes, etc.
– Direct I/O (DIO) mount/open option: no caching on reads
Enhanced JFS (JFS2)
– Better for large files/filesystems
– Buffer caching (default) provides Sequential Read-Ahead, cached writes, etc.
– Direct I/O (DIO) mount/open option: no caching on reads
– Concurrent I/O (CIO) mount/open option: DIO, with write serialization disabled
  • Use for Oracle .dbf, control files and online redo logs only!!!
GPFS
– Clustered filesystem – the IBM filesystem for RAC
– Non-cached, non-blocking I/Os (similar to JFS2 CIO) for all Oracle files
GPFS and JFS2 with CIO offer performance similar to Raw Devices

AIX Filesystems Mount options (Cont’d)

Direct IO (DIO) – introduced in AIX 4.3
• Data is transferred directly from the disk to the application buffer, bypassing the file buffer cache and hence avoiding double caching (filesystem cache + Oracle SGA).
• Emulates a raw-device implementation.
To mount a filesystem in DIO:
$ mount -o dio /data
Concurrent IO (CIO) – introduced with JFS2 in AIX 5.2 ML1
• Implicit use of DIO.
• No inode locking: multiple threads can perform reads and writes on the same file at the same time.
• Performance achieved using CIO is comparable to raw devices.
To mount a filesystem in CIO:
$ mount -o cio /data
[Chart: Bench throughput over run duration – higher tps indicates better performance]

CIO/DIO implementation advice

Oracle bin and shared lib.
– Standard mount options: mount -o rw (cached by AIX)
– Optimized mount options: mount -o rw (cached by AIX)
Oracle Datafiles
– Standard mount options: mount -o rw (cached by Oracle and cached by AIX)
– Optimized mount options: mount -o cio *(1) (cached by Oracle)
Oracle Redolog
– Standard mount options: mount -o rw (cached by Oracle and cached by AIX)
– Optimized mount options: mount -o cio (jfs2 + agblksize=512) (cached by Oracle)
Oracle Archivelog
– Standard mount options: mount -o rw (cached by AIX)
– Optimized mount options: mount -o rbrw (use JFS2 write-behind … but are not kept in AIX cache)
Oracle Control files
– Standard mount options: mount -o rw (cached by AIX)
– Optimized mount options: mount -o rw (cached by AIX)
*(1): to avoid demoted IO: jfs2 agblksize = Oracle DB block size / n

2009 . This may lead to significant I/O performance problems on often accessed frequently changing files such as the contents of the /var/spool directory.IBM Advanced Technical Support .Americas CIO Demotion and Filesystem Block Size Data Base Files (DBF) If db_block_size = 2048 If db_block_size >= 4096 set agblksize=2048 set agblksize=4096 Online redolog files & control files Set agblksize=512 and use CIO or DIO Mount Filesystems with “noatime” option AIX/Linux records information about when files were created and last modified as well as last accessed. 44 © 2009 IBM Corporation April 17.

I/O Tuning (ioo)

READ-AHEAD (only applicable to JFS/JFS2 with caching enabled)
MINPGAHEAD (JFS) or j2_minPageReadAhead (JFS2)
– Default: 2
– Starting value: MAX(2, DB_BLOCK_SIZE / 4096)
MAXPGAHEAD (JFS) or j2_maxPageReadAhead (JFS2)
– Default: 8 (JFS), 128 (JFS2)
– Set equal to (or a multiple of) the size of the largest Oracle I/O request, expressed in 4 KB pages
  • DB_BLOCK_SIZE * DB_FILE_MULTIBLOCK_READ_COUNT / 4096
Number of buffer structures per filesystem:
NUMFSBUFS (JFS)
– Default: 196, Starting Value: 568
j2_nBufferPerPagerDevice (JFS2; replaced by j2_dynamicBufferPreallocation)
– Default: 512, Starting Value: 2048
Monitor with “vmstat -v”
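Because the read-ahead tunables are expressed in 4 KB pages while the Oracle parameters are in bytes and blocks, the conversion is easy to get wrong. This Python sketch computes the starting values from DB_BLOCK_SIZE and DB_FILE_MULTIBLOCK_READ_COUNT and formats an ioo command. The command format and helper names are illustrative assumptions.

```python
def readahead_starting_values(db_block_size: int, mbrc: int) -> tuple:
    """Starting values for j2_minPageReadAhead / j2_maxPageReadAhead (4 KB pages)."""
    min_ra = max(2, db_block_size // 4096)
    max_ra = db_block_size * mbrc // 4096   # largest Oracle read, in pages
    return min_ra, max_ra


def ioo_command(db_block_size: int, mbrc: int) -> str:
    """ioo command applying the starting values persistently."""
    min_ra, max_ra = readahead_starting_values(db_block_size, mbrc)
    return (f"ioo -p -o j2_minPageReadAhead={min_ra} "
            f"-o j2_maxPageReadAhead={max_ra}")
```

With an 8 KB block and a multiblock read count of 16, the largest read is 32 pages. Since the JFS2 default of 128 is already a multiple of 32, keeping the default may also satisfy the "multiple of" rule.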

Data Layout for Optimal I/O Performance

Stripe and mirror everything (SAME) approach:
– Goal is to balance I/O activity across all disks, loops, adapters, etc.
– Avoid/eliminate I/O hotspots
– Manual file-by-file data placement is time consuming, resource intensive and iterative
Use RAID-5 or RAID-10 to create striped LUNs (hdisks)
Create AIX Volume Group(s) (VG) with LUNs from multiple arrays, striping on the front end as well for maximum distribution
– Physical Partition Spreading (mklv -e x) –or–
– Large Grained LVM striping (>= 1MB stripe size)
http://www-1.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP100319

Data Layout cont’d…

Stripe using Logical Volume (LV) or Physical Partition (PP) striping
LV Striping
– Oracle recommends a stripe width that is a multiple of
  • db_block_size * db_file_multiblock_read_count
  • Usually around 1 MB
– Valid LV strip sizes:
  • AIX 5.2: 4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k, 1 MB
  • AIX 5.3: AIX 5.2 strip sizes + 2M, 4M, 16 MB, 32M, 64M, 128M
– Use AIX Logical Volume 0 offset (9i Release 2 or later)
  • Use Scalable Volume Groups (VGs), or use “mklv -T O” with Big VGs
  • Requires AIX APAR IY36656 and Oracle patch (bug 2620053)
PP Striping
– Use minimum Physical Partition (PP) size (mkvg -t, -s parms)
– Spread AIX Logical Volume (LV) PPs across multiple hdisks in VG (mklv -e x)
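To connect Oracle's read size to the valid LV strip sizes listed above, this Python sketch picks the smallest valid strip size that is a whole multiple of db_block_size * db_file_multiblock_read_count. Reading the "stripe width" rule as a constraint on the per-disk strip size is an interpretation made for this sketch. The size lists are copied from the slide as written.

```python
KB = 1024
MB = 1024 * KB

AIX52_STRIP_SIZES = [4 * KB, 8 * KB, 16 * KB, 32 * KB, 64 * KB,
                     128 * KB, 256 * KB, 512 * KB, 1 * MB]
AIX53_STRIP_SIZES = AIX52_STRIP_SIZES + [2 * MB, 4 * MB, 16 * MB,
                                         32 * MB, 64 * MB, 128 * MB]


def smallest_strip_size(db_block_size: int, mbrc: int,
                        aix_level: str = "5.3") -> int:
    """Smallest valid LV strip size that is a multiple of the largest Oracle read."""
    io_size = db_block_size * mbrc
    sizes = AIX53_STRIP_SIZES if aix_level == "5.3" else AIX52_STRIP_SIZES
    for size in sorted(sizes):
        if size % io_size == 0:
            return size
    raise ValueError(f"no valid strip size is a multiple of {io_size} bytes")
```

An 8 KB block with a multiblock read count of 128 gives the "around 1 MB" case from the slide. A 2 MB read size only has a valid strip on AIX 5.3.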

Tuning and Improving System Performance

Adjust the key IOO tuning parameters
Adjust device-specific tuning parameters
Other I/O tuning options
– DIO / CIO
– Release-behind on read and/or write
– IO Pacing
– Write Behind
Improve the data layout
Add additional hardware resources

Other I/O Stack Tuning Options (PBUFS)

When vmstat -v shows increasing “pending disk I/Os blocked with no pbuf” values:
lvmo:
– max_vg_pbuf_count = maximum number of pbufs that may be used for the VG
– pv_pbuf_count = the # of pbufs that are added when a PV is added to the VG
ioo:
– pv_min_pbuf = the minimum # of pbufs per PV used by LVM
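Since the trigger here is an increasing counter, one option is to compare two `vmstat -v` snapshots taken some time apart. This Python sketch parses the integer "count label" lines and reports how much one counter grew. It assumes the usual right-aligned "number, space, description" layout of `vmstat -v`. Lines with decimal values, such as the percentages, are skipped.

```python
import re

_COUNTER = re.compile(r"^\s*(\d+)\s+([A-Za-z].*?)\s*$")
PBUF = "pending disk I/Os blocked with no pbuf"


def parse_vmstat_v(text: str) -> dict:
    """Map each integer counter label in 'vmstat -v' output to its value."""
    counters = {}
    for line in text.splitlines():
        match = _COUNTER.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def counter_delta(before: str, after: str, label: str = PBUF) -> int:
    """Increase of one counter between two 'vmstat -v' snapshots."""
    return (parse_vmstat_v(after).get(label, 0)
            - parse_vmstat_v(before).get(label, 0))
```

A positive delta for the pbuf counter during the busy period is the signal to look at the lvmo/ioo parameters above. The same helper works for the fsbuf counters mentioned on the ioo slide.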

Other I/O Stack Tuning Options (Device Level)

lsattr/chdev:
– num_cmd_elems = maximum number of outstanding I/Os for an adapter. Recommended/supported maximum is storage subsystem dependent.
– queue_depth = the maximum # of outstanding I/Os for an hdisk. Maximum supported value is storage subsystem dependent.
– max_xfer_size = the maximum allowable I/O transfer size (default is 0x100000, or 1 MB). Increasing the value (to at least 0x200000) will also increase the DMA size from 16 MB to 256 MB.
– dyntrk = When set to yes (recommended), allows for immediate re-routing of I/O requests to an alternative path when a device ID (N_PORT_ID) change has been detected.
– fc_err_recov = When set to “fast_fail” (recommended), if the driver receives an RSCN notification from the switch, the driver will check to see if the device is still on the fabric and will flush back outstanding I/Os if the device is no longer found.

Asynchronous I/O for filesystem environments

AIX parameters
– minservers = minimum # of AIO server processes (system wide)
  • AIX 5.3 default = 1 (system wide), 6.1 default = 3 (per CPU)
– maxservers = maximum # of AIO server processes
  • AIX 5.3 default = 10 (per CPU), 6.1 default = 30 (per CPU)
– maxreqs = maximum # of concurrent AIO requests
  • AIX 5.3 default = 4096, 6.1 default = 65536
– “enable” at system restart (always enabled for 6.1)
– Typical 5.3 settings: minservers=100, maxservers=200, maxreqs=65536
– Raw Devices or ASM environments use fastpath AIO
  > above parameters do not apply
  > run lsattr -El aio0 and look for the value "fastpath", which should be enabled
– CIO uses fastpath AIO in AIX 6.1
– For CIO fastpath AIO in 5.3 TL5+, set fsfastpath=1
  > not persistent across reboot
Oracle parameters
– disk_asynch_io = TRUE
– filesystemio_options = {ASYNCH | SETALL}
– db_writer_processes (leave at default)

IO : Asynchronous IO (AIO)

• Allows multiple requests to be sent without having to wait until the disk subsystem has completed the physical IO.
• Use of asynchronous IO is strongly advised whatever the type of file system and mount option implemented (JFS, JFS2, CIO, DIO).
[Diagram: 1 Application → 2 aio Q → 3 aioservers → 4 Disk]
Posix vs Legacy
Since AIX 5L V5.3, two types of AIO are available: Legacy and Posix. For the moment, the Oracle code is using the Legacy AIO servers.

IO : Asynchronous IO (AIO) fastpath

With fast_path, IOs are queued directly from the application into the LVM layer without any “aioservers kproc” operation.
– Better performance compared to non-fast_path
– No need to tune the min and max aioservers
– No aioserver processes, so “ps -k | grep aio | wc -l” is not relevant; use “iostat -A” instead
[Diagram: 1 Application → 2 AIX Kernel → 3 Disk]
• Raw Devices / ASM:
  – check the AIO configuration with: lsattr -El aio0
  – enable asynchronous IO fast_path:
    AIX 5L: chdev -a fastpath=enable -l aio0 (default since AIX 5.3)
    AIX 6.1: ioo -p -o aio_fastpath=1 (default setting)
• FS with CIO/DIO and AIX 5.3 TL5+: activate fsfast_path (comparable to fast_path but for FS + CIO/DIO)
    AIX 5L: add the following line to /etc/inittab: aioo:2:once:aioo -o fsfast_path=1
    AIX 6.1: ioo -p -o aio_fsfastpath=1 (default setting)

Asynchronous I/O for filesystem environments…

Monitor Oracle usage:
• Watch the alert log and *.trc files in the BDUMP directory for the warning message: “Warning: lio_listio returned EAGAIN”
• If warning messages are found, increase maxreqs and/or maxservers
Monitor from AIX:
• “pstat -a | grep aios”
• Use the “-A” option for NMON
• iostat -Aq (new in AIX 5.3)
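Scanning a BDUMP directory by hand is tedious, so here is a small Python sketch that counts occurrences of the lio_listio EAGAIN warning in the alert log and trace files. The file-name patterns `alert*.log` and `*.trc` are assumptions about a typical BDUMP layout, and matching only on the "lio_listio returned EAGAIN" substring is a deliberate simplification.

```python
import re
from pathlib import Path

EAGAIN = re.compile(r"lio_listio returned EAGAIN")


def count_eagain(lines) -> int:
    """Count lines carrying the lio_listio EAGAIN warning."""
    return sum(1 for line in lines if EAGAIN.search(line))


def scan_bdump(bdump_dir: str) -> dict:
    """Per-file warning counts for the alert log and *.trc files in a directory."""
    root = Path(bdump_dir)
    results = {}
    for path in sorted(list(root.glob("alert*.log")) + list(root.glob("*.trc"))):
        with path.open(errors="replace") as handle:
            hits = count_eagain(handle)
        if hits:
            results[path.name] = hits
    return results
```

A non-empty result from `scan_bdump(...)` corresponds to the slide's cue to raise maxreqs and/or maxservers.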

GPFS I/O Related Tunables

Refer to Metalink note 302806.1
Async I/O:
– Oracle parameter filesystemio_options is ignored
– Set Oracle parameter disk_asynch_io=TRUE
– Prefetchthreads = exactly what the name says
  • Usually set prefetchthreads=64 (the default)
– Worker1threads = GPFS asynch I/O
  • Set worker1threads=550-prefetchthreads
– Set aio maxservers=(worker1threads/#cpus) + 10
Other settings:
– GPFS block size is configurable; most will use 512KB-1MB
– Pagepool: GPFS fs buffer cache, not used for RAC but may be for binaries. Default=64M
  • mmchconfig pagepool=100M
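The GPFS async I/O settings above chain together: prefetchthreads determines worker1threads, which in turn sizes the AIX aio maxservers. This Python sketch performs that arithmetic. Rounding the per-CPU division down is an assumption, since the note does not say how to round.

```python
def gpfs_aio_settings(ncpus: int, prefetchthreads: int = 64) -> dict:
    """worker1threads and AIX aio maxservers per the GPFS guidance."""
    if ncpus <= 0:
        raise ValueError("ncpus must be positive")
    worker1threads = 550 - prefetchthreads
    maxservers = worker1threads // ncpus + 10   # assumption: round down
    return {"prefetchthreads": prefetchthreads,
            "worker1threads": worker1threads,
            "maxservers": maxservers}
```

With the default prefetchthreads=64 on an 8-CPU node, this gives worker1threads=486 and maxservers=70.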

I/O Pacing
I/O Pacing parameters can be used to prevent large I/O streams from monopolizing CPUs, for example:
– System backups (mksysb)
– DB backups (RMAN, Netbackup)
– Software patch updates
When Oracle ClusterWare is used, use the AIX 6.1 defaults:
– chdev -l sys0 -a maxpout=8193 -a minpout=4096 (AIX 6.1 defaults)
– nfso -o nfs_iopace_pages=1024 (AIX 6.1 defaults)
– On the Oracle clusterware, set: crsctl set css diagwait 13 -force
• This delays the OPROCD reboot time during node eviction/reboot from 0.5secs to 10secs, just enough to write the log/trace files for future diagnosis. Metalink note# 559365.1
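You can check whether a system already has the AIX 6.1 pacing values with a small awk filter. Feed it the output of "lsattr -El sys0 -a maxpout -a minpout", which has the attribute in column 1 and the value in column 2. check_io_pacing is a hypothetical helper. It is a sketch, not an IBM tool, and it only prints the suggested chdev command.

```sh
#!/bin/sh
# Sketch: read `lsattr -El sys0` style lines on stdin (attribute in column 1,
# value in column 2) and compare maxpout/minpout with the AIX 6.1 defaults
# quoted on the slide.  check_io_pacing is a hypothetical helper.
check_io_pacing() {
    awk '
        $1 == "maxpout" { max = $2 }
        $1 == "minpout" { min = $2 }
        END {
            if (max == "" || min == "") {
                print "maxpout/minpout not found"
                exit 2
            }
            if (max == 8193 && min == 4096) {
                print "OK: maxpout=" max " minpout=" min
                exit 0
            }
            print "CHANGE: maxpout=" max " minpout=" min
            print "  suggested: chdev -l sys0 -a maxpout=8193 -a minpout=4096"
            exit 1
        }'
}

# On AIX:  lsattr -El sys0 -a maxpout -a minpout | check_io_pacing
```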

ASM configurations
AIX parameters
– Async I/O needs to be enabled, but default values may be used
ASM instance parameters
– ASM_POWER_LIMIT=1
   Makes ASM rebalancing a low-priority operation. May be changed dynamically. It is common to set this value to 0, then increase to a higher value during maintenance windows
– PROCESSES=25+15n, where n=# of instances using ASM
DB instance parameters
– disk_asynch_io=TRUE
– filesystemio_options=ASYNCH
– Increase Processes by 16
– Increase Large_Pool by 600k
– Increase Shared_Pool by [(1M per 100GB of usable space) + 2M]
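The ASM PROCESSES formula and the Shared_Pool increment are simple arithmetic. The sketch below assumes a partial 100GB counts as a full one, which the slide does not specify. The function names are hypothetical.

```sh
#!/bin/sh
# Sketch of the ASM sizing rules from the slide; function names are
# hypothetical and values are only printed.
#   ASM instance PROCESSES  = 25 + 15n  (n = # of instances using ASM)
#   DB Shared_Pool increase = (1M per 100GB of usable space) + 2M
# Assumption: a partial 100GB unit is rounded up (the slide does not say).
asm_processes() {
    echo $(( 25 + 15 * $1 ))
}

shared_pool_increase_mb() {
    echo $(( ($1 + 99) / 100 + 2 ))
}

echo "ASM PROCESSES=$(asm_processes 2)"                 # 2 databases on ASM
echo "Shared_Pool +$(shared_pool_increase_mb 1000)M"    # 1000GB usable
```

For example, two databases sharing ASM give PROCESSES=55, and 1000GB of usable space adds 12M to Shared_Pool.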


Network Options (no) Parameters
– Set sb_max >= 1 MB (1048576)
– Set tcp_sendspace >= 262144
– Set tcp_recvspace >= 262144
– Set rfc1323=1
If use_isno=1, check whether these settings have been overridden at the network interface level:
$ no -a | grep isno
use_isno=1
$ lsattr -E -l en0 -H
attribute      value  description
rfc1323        N/A
tcp_nodelay    N/A
tcp_sendspace  N/A
tcp_recvspace  N/A
tcp_mssdflt    N/A
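This minimal sketch checks the global values against the minimums above, assuming the "name = value" layout that "no -a" prints. It does not look at interface-level overrides. With use_isno=1, those still have to be checked with lsattr on each interface. check_no_minimums is a hypothetical name.

```sh
#!/bin/sh
# Sketch: compare `no -a` output ("name = value" lines) read on stdin with
# the minimums from the slide.  Prints one line per checked tunable and
# exits non-zero if any needs changing.  check_no_minimums is hypothetical.
check_no_minimums() {
    awk '
        BEGIN {
            min["sb_max"] = 1048576
            min["tcp_sendspace"] = 262144
            min["tcp_recvspace"] = 262144
            min["rfc1323"] = 1
            bad = 0
        }
        ($1 in min) && $2 == "=" {
            status = ($3 + 0 >= min[$1]) ? "OK" : "CHANGE"
            if (status == "CHANGE") bad = 1
            print $1, $3, status
        }
        END { exit bad }'
}

# On AIX:  no -a | check_no_minimums
```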

Additional Network (no) Parameters for RAC:
– Set udp_sendspace = db_block_size * db_file_multiblock_read_count (not less than 65536)
– Set udp_recvspace = 10 * udp_sendspace
   Must be < sb_max
   Increase if buffer overflows occur
– ipqmaxlen=512 for GPFS environments
– Use Jumbo Frames if supported at the switch layer
Examples:
no -a | grep udp_sendspace
no -p -o udp_sendspace=65536
netstat -s | grep "socket buffer overflows"
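Applying the RAC UDP rules to sample values shows why sb_max may need to go above its 1 MB minimum. This sketch uses an 8KB db_block_size as an example. The helper names are hypothetical, and the values are only printed, not applied.

```sh
#!/bin/sh
# Sketch of the RAC UDP buffer rules from the slide:
#   udp_sendspace = db_block_size * db_file_multiblock_read_count (>= 65536)
#   udp_recvspace = 10 * udp_sendspace, and must be < sb_max
# Helper names are hypothetical; values are only printed, not applied.
udp_sendspace() {
    size=$(( $1 * $2 ))
    if [ "$size" -lt 65536 ]; then
        size=65536                    # slide: not less than 65536
    fi
    echo "$size"
}

udp_recvspace() {
    echo $(( 10 * $1 ))
}

# Usage: check_udp db_block_size db_file_multiblock_read_count sb_max
check_udp() {
    send=$(udp_sendspace "$1" "$2")
    recv=$(udp_recvspace "$send")
    echo "udp_sendspace=$send udp_recvspace=$recv"
    if [ "$recv" -ge "$3" ]; then
        echo "udp_recvspace is not below sb_max ($3): raise sb_max first"
        return 1
    fi
}

check_udp 8192 16 1048576 || echo "(adjust sb_max before applying these values)"
```

With db_file_multiblock_read_count=16, udp_recvspace comes out at 1310720. That is not below an sb_max of 1048576, so sb_max would have to be raised first.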


Miscellaneous parameters
User Limits (smit chuser; stored in /etc/security/limits)
– Soft FILE size = -1 (Unlimited)
– Soft CPU time = -1 (Unlimited)
– Soft DATA segment = -1 (Unlimited)
– Soft STACK size = -1 (Unlimited)
Maximum number of PROCESSES allowed per user (smit chgsys)
– maxuproc >= 2048
Environment variables:
– AIXTHREAD_SCOPE=S
– LDR_CNTRL=DATAPSIZE=64K@TEXTPSIZE=64K@STACKPSIZE=64K oracle*
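A quick way to see which of the four user limits are still finite is to ask the shell itself. This sketch uses the ulimit builtin with the soft (-S) flag. The labels follow the /etc/security/limits attribute names, and report_soft_limits is a hypothetical helper. Run it as the oracle user in a fresh login session, since limits changed with chuser apply to new sessions.

```sh
#!/bin/sh
# Sketch: report the current soft limits for the four resources the slide
# wants unlimited.  Labels follow the /etc/security/limits attribute names.
# ulimit -t/-d/-s are not POSIX, but ksh, bash and dash all support them.
report_soft_limits() {
    for pair in f:fsize t:cpu d:data s:stack; do
        opt=${pair%%:*}
        name=${pair#*:}
        value=$(ulimit -S -"$opt")
        if [ "$value" = unlimited ]; then
            status=OK
        else
            status=CHANGE     # set to -1 with chuser, e.g. chuser fsize=-1 oracle
        fi
        echo "$name $value $status"
    done
}

report_soft_limits
```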

Q&A

Trademarks
The following are trademarks of the International Business Machines Corporation in the United States and/or other countries. For a complete list of IBM Trademarks, see www.ibm.com/legal/copytrade.shtml:
AS/400, DBE, e-business logo, ESCO, eServer, FICON, IBM, IBM Logo, iSeries, MVS, OS/390, pSeries, RS/6000, S/30, VM/ESA, VSE/ESA, Websphere, xSeries, z/OS, zSeries, z/VM
The following are trademarks or registered trademarks of other companies:
Lotus, Notes, and Domino are trademarks or registered trademarks of Lotus Development Corporation
Java and all Java-related trademarks and logos are trademarks of Sun Microsystems, Inc., in the United States and other countries
LINUX is a registered trademark of Linus Torvalds
UNIX is a registered trademark of The Open Group in the United States and other countries.
Microsoft, Windows and Windows NT are registered trademarks of Microsoft Corporation.
SET and Secure Electronic Transaction are trademarks owned by SET Secure Electronic Transaction LLC.
Intel is a registered trademark of Intel Corporation
* All other products may be trademarks or registered trademarks of their respective companies.
NOTES:
Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.
This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.
All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.
References in this document to IBM products or services do not imply that IBM intends to make them available in every country.
Any proposed use of claims in this presentation outside of the United States must be reviewed by local IBM country counsel prior to such use.
The information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.