IBM i Disk HW Performance

Satid Singkorapoom
ASEAN IBM i ATS Specialist
April 2014

© 2012 IBM Corporation


IBM Power Systems

Disclaimer

What is described as “good performance guideline” or “rule of thumb” in this presentation is the author’s personal guideline, based on his 20-year first-hand experience as a Technical Product Specialist for IBM i and its predecessors. The guideline should be applicable to the majority of IBM i customers.


Agenda

• IBM i Disk Subsystem Performance
• Main Memory Page Faulting and Performance
• Miscellaneous


IBM i Disk Subsystem Performance


IBM i HW Performance

Which HW subsystem/component of a computer is the slowest? The HDD (hard disk drive).


Disk Subsystem Performance Characteristic


Good performance guideline for disk unit % Busy in IBM i:
• 40% for old SCSI and Ultra SCSI disk HW
• 50% for current SAS disk HW

[Graph from a past AS/400 Performance Capabilities Reference publication. Annotation: “% Busy for Good Performance Guideline occurs here”]

Page 7

Disk Subsystem Performance Characteristic Reference

Published when a new OS release becomes available, with subsequent updates every year

Don’t leave home without it ☺


Disk Subsystem Performance Characteristic Reference


Sample disk subsystem performance characteristics

More disk units = better disk IO throughput and response time

[Graphs: CCIN 572F = FC 5904/6/8 Disk Controller Adapter; CCIN 57B5 = FC 5913 Disk Controller Adapter]


Disk Subsystem Performance Characteristic Reference


Sample disk subsystem
performance characteristics


IBM i Dual SAS Disk Controller


IBM i Dual SAS RAID5/6 Disk Controller Performance


IBM i - Setting RAID-5/6 Optimization Steps


IBM i - Dual SAS Disk Optimization Consideration


IBM i - Display Disk Path Status


http://publib.boulder.ibm.com/infocenter/dsichelp/ds8000ic/index.jsp?topic=%2Fcom.ibm.storage.ssic.help.doc%2Ff2c_service_system.html

System Service Tools (SST)


1. On the Start Service Tools (STRSST) Sign On panel, type your service tools user ID and
password.
2. Select option 3 Work with Disk Units.
3. Select option 1 Display disk configuration.
4. Select option 9 Display disk path status.

Dedicated Service Tools


1. Select option 3 Use Dedicated Service Tools (DST).
2. Log in to DST with your service tools user ID.
3. Select option 1 Work with disk units.
4. Select option 1 Work with disk configuration.
5. Select option 1 Display disk configuration.
6. Select option 9 Display disk path status.

IBM Navigator for i Procedure (port 2001)


1. Expand Configuration and Service.
2. Click on Disk Units.
3. Right-click the desired disk unit and choose Properties.
4. Click on Connections on the left side of the window.


IBM i Disk IO Workload


Two main sources of disk IO workload in IBM i:
• Data access disk IO: DB2 database and IFS file access
• Main memory page faulting disk IO

Additional sources when there is a substantial SQL-based or IBM i Query workload:
• Temporary disk allocation by the DB2 for i SQL/Query engine
• Inefficient table scan disk IO, as opposed to index access IO


Disk Space for IBM i Temporary Objects Allocation

• Temporary disk allocation by the DB2 for i query engine and other OS functions contributes to disk IO workload
  • Currently allocated
  • Maximum since last IPL

Page 17

IBM i Data Striping Disk Storage Allocation

Characteristics of IBM i data striping disk storage allocation (RAID-0):
• Round-robin allocation across all disk units in the same ASP
• Equal “% Used” disk space (seen in the WRKDSKSTS screen) is allocated on every disk unit in the same ASP, regardless of disk unit size
  • Larger disk units receive more write operations so that they fill at the same percentage as the smaller disk units in the same ASP
  • Disk write operations take more time than disk reads
  • Regardless of its size, each disk unit can handle a similar maximum IOPS rate before its response time degrades (see page 7 graph)
• Each disk write is done in a number of fixed-size chunks of bytes
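
The effect of equal-percentage allocation on mixed-size units can be illustrated with simple arithmetic. The sketch below is a simplified model, not a description of how IBM i storage management is implemented: it assumes that keeping “% Used” equal means each unit receives new data in proportion to its capacity. The function name and the sample capacities are hypothetical.

```python
def write_share_by_unit(capacities_gb):
    """Return each unit's fraction of newly written data.

    Simplified model (an assumption): to keep '% Used' equal on every
    unit in the ASP, new data lands on each unit in proportion to that
    unit's capacity.
    """
    total = sum(capacities_gb)
    if total <= 0:
        raise ValueError("at least one unit with positive capacity is required")
    return [cap / total for cap in capacities_gb]


# Hypothetical ASP: eight 70 GB units plus two 283 GB units.
units = [70] * 8 + [283] * 2
shares = write_share_by_unit(units)

for cap, share in zip(units, shares):
    print(f"{cap:>4} GB unit receives {share:.1%} of new writes")
```

In this model each 283 GB unit takes roughly four times the write share of a 70 GB unit, even though both kinds of unit can sustain a similar maximum IOPS rate.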


IBM i Data Striping Disk Storage Allocation

[Figure: two graphs, each annotated “IOPS rate”]


IBM i PDI Tool: Display Disk IOPS Rate in Reads vs. Writes

This graph shows disk IOPS per system/LPAR.

An “Advanced” graph is also available, showing more read/write details.


IBM i Disk Subsystem Good Performance Guideline

As long as % Busy of every disk unit is lower than 50%, general disk response time should be fine.
• Exception: SQL that accesses very large tables without useful indexes can take a long time to run even when general disk % Busy is lower than 50%. This is an SQL database access performance issue, not a disk performance issue.

A sign of an impending overall disk performance issue: more than 1/3 of all the disk units in an ASP are consistently busier than 50% for an extended period of time.
• Performance degradation is likely in such a situation.
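
The 1/3 guideline can be expressed as a small check over collected % Busy samples. This is a minimal sketch under stated assumptions: the guideline does not define “consistently” or “extended period”, so the code assumes a unit counts as consistently busy when at least 80% of its samples exceed the threshold. The function name, the `consistency` parameter, and the unit names are hypothetical.

```python
def asp_shows_warning_sign(busy_by_unit, threshold=50.0, consistency=0.8):
    """Return True if more than 1/3 of the units are consistently busy.

    busy_by_unit maps a disk unit name to its list of % Busy samples
    collected over the period of interest.
    Assumption: "consistently" means at least `consistency` (80% by
    default) of a unit's samples exceed `threshold`.
    """
    if not busy_by_unit:
        return False
    hot_units = 0
    for samples in busy_by_unit.values():
        if not samples:
            continue  # no data for this unit, so it is not counted as hot
        over = sum(1 for pct in samples if pct > threshold)
        if over / len(samples) >= consistency:
            hot_units += 1
    # Integer comparison avoids floating-point edge cases at exactly 1/3.
    return hot_units * 3 > len(busy_by_unit)


# Hypothetical samples for a six-unit ASP.
samples = {
    "DD001": [62, 71, 58, 66],
    "DD002": [55, 60, 64, 59],
    "DD003": [57, 52, 61, 70],
    "DD004": [31, 28, 35, 40],
    "DD005": [22, 30, 27, 25],
    "DD006": [45, 48, 39, 41],
}
print(asp_shows_warning_sign(samples))
```

For old SCSI and Ultra SCSI disk HW, the same check could be run with `threshold=40.0`.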

Page 21

IBM i Disk Subsystem Good Performance Guideline

Disk % Busy is high for all 10 units, but this is generally OK because it did not last long (see graph)


When Mixed Disk Size is Used in the Same ASP

IBM i disk configuration rule of thumb for good performance:
• Use the same disk unit/LUN size in the same ASP, or
• Make large disk units/LUNs the “majority population” in the ASP (2/3 or more of the total disk units in the ASP). Some smaller units/LUNs can coexist, but they should not exceed 1/3 of the total disk unit population.

A disk performance “hot spot” problem occurs when:
• Too few large disk units/LUNs exist among a majority of smaller ones in the same ASP
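
The rule of thumb above can be checked mechanically for a planned ASP. The following is a minimal sketch: it assumes “large-size” means the largest unit size present in the ASP, which is an interpretation, since the rule does not say how to treat three or more different sizes. The function name and sample sizes are hypothetical.

```python
from collections import Counter


def follows_mixed_size_rule(unit_sizes_gb):
    """Return True if the ASP follows the mixed-size rule of thumb.

    The rule is satisfied when all units have the same size, or when the
    largest size present accounts for at least 2/3 of all units.
    Assumption: "large-size" is read as "the largest size in the ASP".
    """
    if not unit_sizes_gb:
        return True  # nothing configured, nothing to violate
    counts = Counter(unit_sizes_gb)
    if len(counts) == 1:
        return True
    largest = max(counts)
    # Integer form of: counts[largest] / total >= 2/3
    return counts[largest] * 3 >= len(unit_sizes_gb) * 2


# Hypothetical configurations.
print(follows_mixed_size_rule([283] * 8 + [70] * 2))  # large units are the majority
print(follows_mixed_size_rule([70] * 8 + [283] * 2))  # too few large units
```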


When Mixed Disk Size is Used in the Same ASP


Sample 1

[Screenshot annotations: 433C, 4327]


When Mixed Disk Size is Used in the Same ASP


Sample 2

[Graph annotation: IOPS rate]


Separate ASP for Journal Receivers?


• ASP 1 for OS and temporary disk allocation
• ASP X for database and journal objects
• ASP Y for journal receivers

+ Contains a disk failure within an ASP: restore only the failed ASP
- Limits peak IOPS capacity in each ASP and makes disk space management harder

In the distant past, we separated ASPs because of:
• Low maximum IOPS capacity per disk unit (less than 150 IOPS)
• Little or no write cache in the disk controller card (256 MB or less)
• No Journal Cache function to reduce disk writes to journal receivers
• Less efficient disk allocation for journal receivers in old OS/400 releases


Separate ASP for Journal Receivers? – Less Need Now


ASP 1 for ALL

+ Better peak IOPS capacity for consistent performance, and easy disk space management
- Pay attention to disk failure protection and use of hot spares

Now there is much less need to separate ASPs because of:
• Higher maximum IOPS capacity per disk unit (150 IOPS or more)
• Large write cache in the disk controller card/pair (1.5 to 12 GB now)
• Journal Cache function available to reduce disk writes to journal receivers
• Intelligent disk allocation for journal receivers as of IBM i 5.4

Separate ASP for Journal Receivers? – Less Need


ASP 1 for ALL

General guideline for handling disk failure in a single ASP:
• Use RAID-5/6 protection with a hot spare disk unit; disk mirroring is even better
• One hot spare unit per 12 functioning units at most
• No more than 12 units per RAID-5 set
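
As a worked example of the two “12 units” guidelines, the sketch below computes the minimum number of hot spares and RAID-5 sets for a given number of functioning units. It is plain arithmetic on the guideline only; real controller and enclosure constraints are not modeled, and the function name and parameters are hypothetical.

```python
import math


def protection_plan(functioning_units, max_units_per_spare=12, max_units_per_raid5_set=12):
    """Return (hot_spares, raid5_sets) implied by the guideline.

    One hot spare covers at most `max_units_per_spare` functioning units,
    and a RAID-5 set holds at most `max_units_per_raid5_set` units, so
    both counts are rounded up.
    """
    if functioning_units < 1:
        raise ValueError("functioning_units must be at least 1")
    hot_spares = math.ceil(functioning_units / max_units_per_spare)
    raid5_sets = math.ceil(functioning_units / max_units_per_raid5_set)
    return hot_spares, raid5_sets


# Hypothetical ASP sizes.
for n in (10, 12, 13, 24, 30):
    spares, sets = protection_plan(n)
    print(f"{n} units: at least {spares} hot spare(s), at least {sets} RAID-5 set(s)")
```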


IBM i Main Memory and System Performance


IBM i Main Memory and System Performance

Popular question on IBM i main memory: When do I need to add more main memory?
• Application requirement: Java, C, and C++ applications (as opposed to RPG, CL, COBOL) need heap memory that can be initially large
• Disk subsystem performance is degraded at periods of very high main memory page fault rate in user pool(s)
  • RAM page faulting tolerance in IBM i depends on disk HW performance at peak faulting rate


IBM i Main Memory - Machine Pool Performance

Good performance guideline for IBM i machine pool page fault rate: 10 faults per second per core

Allocate 2-3 times the “Reserved Size” of the machine pool that you see in the WRKSYSSTS screen (preferably at peak workload periods) and fix it at this value with the WRKSHRPOOL command (see next 2 pages).
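
The two machine pool guidelines reduce to simple arithmetic, sketched below. Two assumptions are made: the “10 faults per second per core” figure is treated as an upper bound, and the reserved size is taken in megabytes. Function names and sample values are hypothetical.

```python
def machine_pool_target_mb(reserved_size_mb, low_factor=2, high_factor=3):
    """Return the (low, high) machine pool size range in MB.

    Applies the guideline of 2-3 times the machine pool "Reserved Size".
    Assumption: the reserved size is supplied in megabytes.
    """
    return reserved_size_mb * low_factor, reserved_size_mb * high_factor


def machine_pool_fault_rate_ok(faults_per_second, cores, per_core_guideline=10):
    """Return True if the fault rate is within the per-core guideline.

    Assumption: the guideline figure is read as an upper bound.
    """
    return faults_per_second <= per_core_guideline * cores


# Hypothetical values: reserved size of 1500 MB on a 3-core partition.
low, high = machine_pool_target_mb(1500)
print(f"Set the machine pool between {low} MB and {high} MB")
print(machine_pool_fault_rate_ok(faults_per_second=25, cores=3))
```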


IBM i Main Memory - Machine Pool Performance


IBM i Main Memory - Machine Pool Performance


WRKSHRPOOL + Enter + F11


IBM i Main Memory - Machine Pool Performance

Good page fault rate


IBM i Main Memory User Pool Page Fault Rate


A high main memory page fault rate causes a high number of disk IO operations

[Screenshot annotation: “+ = Total”]

Page 35

IBM i User Pool Memory Faulting

Disk IO caused by main memory page faulting in all IBM i memory pools always occurs in the System ASP.

Temporary disk allocation by IBM i and DB2 for i (“Unprotect” on page 17) always occurs in the System ASP.


IBM i Main Memory Fault Rate and System ASP Performance

As long as “% Busy” of all disk units in the System ASP (ASP No. 1) is lower than 50%, overall system performance should be fine or acceptable.

A sign of an impending overall system performance issue: more than 1/3 of all the disk units in the System ASP are consistently busier than 50% for an extended period of time. This is the same as page 21 but applies only to disk units in the System ASP.


Memory Faulting and System ASP Performance

Metaphor: How fast should I drive my car to get the best fuel economy?

Answer: Wrong question!

Fuel economy depends on engine RPM, not car speed. The transmission gear sits between these two factors: the proper gear position brings down engine RPM at a given speed → less fuel consumption.

For IBM i, the User Pool page faulting rate can be high as long as % Busy of all disk units in the System ASP is still substantially lower than 50%. A powerful disk subsystem absorbs a higher page faulting rate before its performance degrades.


IBM i Main Memory Page Fault Rate and Performance


Formal statement:
For good page faulting performance, the time it takes to do a page fault disk IO should be lower than 10-15% of the average disk response time.

Rule of thumb:
During a period of high User Pool page fault rate, if % Busy of all the disk units in the System ASP is lower than 50%, there is no need to add more main memory (unless the application requires more for its own reasons).
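
The rule of thumb can be written as a one-line check. This is a minimal sketch: it assumes the % Busy values were sampled for System ASP units during the period of peak User Pool faulting, and it covers only the disk side of the rule, not the application's own memory requirements. The function name is hypothetical.

```python
def memory_upgrade_not_needed(system_asp_busy_pct, threshold=50.0):
    """Return True if every System ASP unit is below the % Busy threshold.

    system_asp_busy_pct holds one % Busy value per disk unit in the
    System ASP, taken during a period of high User Pool page faulting.
    """
    if not system_asp_busy_pct:
        raise ValueError("at least one disk unit sample is required")
    return all(pct < threshold for pct in system_asp_busy_pct)


# Hypothetical readings during peak faulting.
print(memory_upgrade_not_needed([30.0, 45.5, 20.1, 38.7]))
print(memory_upgrade_not_needed([30.0, 55.0, 20.1, 38.7]))
```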

Samples in the following 3 pages.



IBM i Main Memory Page Fault Rate and Performance

Bad disk % Busy


IBM i Main Memory Page Fault Rate and Performance

Contribute more to high disk % Busy


IBM i Main Memory Page Fault Rate and Performance

Contribute less to high disk % Busy


IBM i Main Memory Page Fault Rate and Performance

The previous 3 sample graphs (of the same time period) indicate that high disk % Busy is more likely caused by heavy batch processing (Batch Logical Database IO) than by the User Pool fault rate.

Adding more main memory may not help improve performance in this case because the page faulting rate is not very high.

Rule of thumb: pay attention to a total memory faulting rate higher than 1,000 faults per second.

Look at WRKSYSSTS (page 35) or the PDI graph (next page).
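
The 1,000 faults per second rule can be applied to figures read from the WRKSYSSTS screen. The sketch below assumes that the total faulting rate is the sum of the DB fault and non-DB fault rates over all memory pools; the function names and the sample figures are hypothetical.

```python
def total_fault_rate(pools):
    """Return the total page fault rate in faults per second.

    pools is a list of (db_faults_per_sec, non_db_faults_per_sec) pairs,
    one pair per memory pool.
    Assumption: the total is DB plus non-DB faults summed over all pools.
    """
    return sum(db + non_db for db, non_db in pools)


def fault_rate_needs_attention(pools, limit=1000.0):
    """Return True if the total fault rate exceeds the rule-of-thumb limit."""
    return total_fault_rate(pools) > limit


# Hypothetical readings for three pools.
pools = [(0.0, 5.2), (120.5, 310.0), (40.0, 95.3)]
print(f"Total: {total_fault_rate(pools):.1f} faults/sec")
print(fault_rate_needs_attention(pools))
```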


IBM i Main Memory Page Fault Rate and Performance


Is good disk subsystem HW performance sufficient for good overall application performance?

It depends on the customer’s performance expectations, and there are almost always additional non-HW performance factors to explore. For example:
• If there is a substantial SQL-based workload that accesses large tables, identifying and creating useful indexes for those tables helps improve overall performance further. Use the DB2 for i system-wide Index Advisor tool.
• Inefficient program code and creation methods may exist and are therefore open to improvement. Changing from OPM program coding to ILE can deliver a performance improvement.


Is good disk subsystem HW performance sufficient for good overall application performance? (continued)

• In a complex application environment, excessive object seizes/locks may cause run-time performance issues. Query IBM i performance data to produce a report on them.
• For data-change-intensive long-running jobs that access journaled tables, Journal Cache can most likely deliver improved performance.
• For read-intensive long-running jobs that access a lot of data, the use of SSDs may deliver improved performance.


Quiz: What can be the possible cause of this?

• Many disk units have only recently begun to exhibit % Busy higher than 50%; it was previously consistently lower than 40%
• Total main memory page fault rate has been only moderate for months up to now
• User workload has remained steady over the past month up to now (500 concurrent users)

Recently, IBM i audit journaling was activated with ALL audit categories! → High disk write workload!


Be careful in selecting security audit categories

• Each selected audit category increases the audit journal receiver write workload on the disk subsystem
• The more active jobs, the greater the disk write workload


IBM i Main Memory Page Fault Rate and Performance

If you see disk % Busy (in the System ASP) of more than 50% during the same period in which you also notice a high page fault rate in User Pools, then it is highly likely that the high page fault rate contributes to the high disk % Busy.

Two choices when high page faulting causes high System ASP disk % Busy:
• Add more main memory to reduce the peak page fault rate
• Add disk controllers and disk units to absorb more disk IOPS
  • In some cases, changing from a low-performance disk controller to a high-performance one can also be a solution


IBM i Performance Data Investigator (PDI) Tool

Delivered with IBM i 6.1 and later releases

No installation or setup required


A Better Way to Look at IBM Performance Data – IBM i PDI


A Better Way to Look at IBM Performance Data – IBM i PDI

Same system, same date/time


A Better Way to Look at Disk Response Time – IBM i PDI

Same system, same date/time


A Better Way to Look at Page Fault Time – IBM i PDI

Same system, same date/time

This system has 3 cores and 19 GB RAM. Adding more RAM may reduce Disk Page Faults Time.


Thank You
