
OPENVMS I/O AND STORAGE

Tips and Best Practices for good performance

Rafiq Ahamed K OpenVMS Engineering

© 2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice

Agenda

OpenVMS I/O Facts
– I/O Evolution on Integrity servers
– What to expect from hardware
– What do you know about multipathing
– Notes on IOPERFORM and I/O
– NUMA

Storage Tips and Tricks
– EVA Best Practices
– OpenVMS Connectivity
– A sneak peek at EVA PA (Performance Advisor)

Q&A


“Operating System performance is largely dependent on the underlying hardware”

...so know your hardware capabilities


Integrity server I/O
Interconnects:  PCI (66MHz, 0.5GB/sec) → PCI-X (133MHz, 1GB/sec) → PCI-X (266MHz, 2GB/sec) → PCIe Gen 1 (2.5Gb/sec/lane, 250MB/sec) → PCIe Gen 2 (5Gb/sec/lane, 500MB/sec)
Core I/O:       Ultra SCSI 160 → Ultra SCSI 320 → 3G SAS (LSI Logic) → 6G SAS (p410i)
SAN I/O:        1G FC → 2G FC → 4G FC → 8G FC
Disk Size:      300GB → 600GB → 1.2TB → 7.2TB
# I/O Devices:  3 → 6 → 8 → 12/16

Architecture has evolved drastically for I/O devices within Integrity; performance and scalability double with each new hardware release


Examples of the latest speeds and feeds of I/O on Integrity platforms

Leadership in I/O and Storage on i2 architecture
• Parallel SCSI on rx7640 has a shared Ultra160 SCSI bus; SAS provides a point-to-point connection to each HDD at 6G speeds: high performance, reliable and scalable
• Each BL890c/rx2800 supports eight SFF SAS HDDs, up to 7.2TB capacity
• Four p410 RAID controllers (one per blade) on BL890c i2; one p410 RAID controller on rx2800
• Configured as RAID 0/1 or in HBA mode
• [Future] Stripe data within and across multiple p410 RAID controllers (OpenVMS Shadowing)
  – Striping within a controller provides high performance; striping across controllers provides no-SPOF storage

Core I/O on i2 servers
Data shows the impact of p410i caching and striping
[Charts: rx2800 i2 Core SAS Caching, IOPS vs. load with and without cache; rx2800 i2 SAS Logical Disk (Striping), IOPS vs. load for 1, 2 and 4 disks with cache]
• Use the p410i cache battery kit for faster response
• Stripe across multiple disks to maximize utilization and throughput

Customer Concerns
• How is I/O performance on Integrity servers?
• How does it compare against my existing high-end Alpha servers?
• After migrating to the new platform, what should I expect?
• Why is i2 server I/O a market differentiator?

Software capabilities: multipathing

Multipathing 1(4)
• Multipathing (MP) is a technique to manage multiple paths to a storage device through failover and failback mechanisms
• It helps the user load balance across multiple paths to storage
• By default, multipathing is enabled on OpenVMS
• OpenVMS MP supports ALUA (Asymmetric Logical Unit Access) [> V8.3]
• OpenVMS MP supports FAILOVER and FAILBACK
• OpenVMS MP load balances the storage devices across all the available paths
  – It spreads the devices evenly across all the paths during boot time
• At any point in time only a single path can be active to a device
• Users are recommended to use static load balancing techniques

MP Connections – Good and Bad
[Diagrams: single-controller configurations showing an HSV reached from IA64 and Alpha hosts through one, two, and four switched paths]

Multipathing 2(4)
• Device discovery initiates path discovery and forms an MP set for each device
  – MC SYSMAN IO AUTO
  – The first path discovered is considered the "primary" path
  – The active path is called the "current" path
  – SDA> SHOW DEVICE 'DGAxx' shows the MP set
• Automatic path selection algorithms are optimized to support Active/Active arrays
  – Active optimized (AO) paths are always picked for I/O; if there is no alternative, an active non-optimized (ANO) path is picked [how to fix this is discussed under EVA best practices]
  – With the latest firmware on the storage, it is very rare that you end up connected to an ANO path
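A minimal DCL sketch of inspecting a device's MP set and forcing a path switch; the device name $1$DGA101 and the PGA0 path string are hypothetical placeholders:

$ SHOW DEVICE/FULL $1$DGA101:                                  ! lists every discovered path and marks the current one
$ ANALYZE/SYSTEM                                               ! same data from SDA, including primary/current flags
SDA> SHOW DEVICE DGA101
SDA> EXIT
$ SET DEVICE/SWITCH/PATH=PGA0.5000-1FE1-0015-8528 $1$DGA101:   ! manual switch to a specific path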

Multipathing 3(4)
VMS switches its path to a LUN when:
• An I/O error triggers mount verification (MV)
  – The device is not reachable on the current path and another path works
• MOUNT of a device whose current path is offline
• Manual path switch via SET DEVICE/SWITCH/PATH=
• Some local path becomes available when the current path is MSCP
  – The path switch from MSCP to local is triggered by the poller [not if switched manually]
• Note: Any MV might trigger a path switch
  – MV due to loss of cluster quorum
  – MV due to SCSI error flush

Multipathing 4(4)
• MPDEV_POLLER is lightweight and will poll all paths for availability
  – SET DEVICE 'device'/POLL/NOPOLL
• MV is not bad; it only indicates that OpenVMS validated your device
  – Shadow devices can initiate and complete an MV; each shadow member operates independently on a path switch
• But an MV followed by a path switch is an indication of failover/failback
  – The operator logs will indicate the details
  – SHOW DEV 'device'/FULL will show details of the path switch [time etc.]
  – SDA> SHOW DEV 'device' logs a lot of diagnostic information in the MPDEV structure
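A minimal sketch of the diagnostics above; the device name $1$DGA101 is a placeholder and the log location assumes the default SYS$MANAGER:OPERATOR.LOG:

$ SET DEVICE $1$DGA101: /POLL                        ! poll all paths for availability (/NOPOLL to stop)
$ SHOW DEVICE/FULL $1$DGA101:                        ! shows which path is now current
$ SEARCH SYS$MANAGER:OPERATOR.LOG "DGA101"           ! mount-verification and path-switch messages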

SCSI Error Poller
• We have seen customers reporting heavy Unit Attention (UA) traffic in the SAN, resulting in cluster hangs, slow disk operations, high mount verifications etc.!
• These UAs are initiated by changes in the SAN such as firmware upgrades, bus resets etc.
• SCSI_ERROR_POLL is the poller responsible for clearing the latched errors (like SCSI UA) on all the fibre and SCSI devices, which can otherwise cause confusion in the SAN
• By default the poller is enabled
  SYSGEN> SHOW SCSI_ERROR_POLL
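A quick DCL check that the poller is still on, using the SYSGEN parameter named on the slide:

$ MCR SYSGEN
SYSGEN> SHOW SCSI_ERROR_POLL      ! displays the current setting of the poller parameter
SYSGEN> EXIT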

Customer/Field Concerns: OpenVMS Multipathing
• After upgrading my SAN components I see a large number of mount verifications; does that indicate a problem?
• Does multipathing do load balancing? Are there policies?
• I see too many mount verification messages in the operator log; will it impact the volume performance (especially latency)?
• How do I know whether my paths are well balanced or not?
• How do I know whether my current path is active optimized or not?
• Does multipathing support Active/Active arrays, ALUA, third-party storage, SAS devices, SSD devices?

Did you know QIO is one of the most heavily used interfaces in OpenVMS? We want to put it on a diet. What should we do?
1. Optimize QIO
2. Replace QIO
3. Provide an alternative

IOPERFORM/FastIO
• Fast I/O is a performance-enhanced alternative to performing QIOs
• It substantially reduces the setup time for an I/O request
• Fast I/O uses buffer objects (locked memory, doubly mapped) to eliminate the I/O overhead of manipulating I/O buffers
• Performed using the buffer objects and the following system services:
  – sys$io_setup, sys$io_perform, sys$io_cleanup (jacket)
  – sys$create_bufobj / sys$create_bufobj_64
  – $ dir sys$examples:io_perform.c
• System management considerations:
  – The SYSGEN parameter MAXBOBMEM limits memory usage (defaults to 100)
  – The VMS$BUFFER_OBJECT_USER identifier is required for process buffer objects
  – Creating buffer objects once and reusing them for the lifetime of the application is faster
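The system-management prerequisites above in DCL form; a sketch in which the username HYPOTHETICAL_USER is a placeholder:

$ RUN SYS$SYSTEM:AUTHORIZE
UAF> GRANT/IDENTIFIER VMS$BUFFER_OBJECT_USER HYPOTHETICAL_USER   ! allow the account to create buffer objects
UAF> EXIT
$ MCR SYSGEN
SYSGEN> SHOW MAXBOBMEM                                           ! cap on buffer-object memory
SYSGEN> EXIT
$ DIRECTORY SYS$EXAMPLES:IO_PERFORM.C                            ! supplied C example of the sys$io_setup/sys$io_perform sequence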

Impact of IOPERFORM/FASTIO
• Resource usage is reduced by 20-70% depending on load and system
  – Small-size random workloads double the throughput as load increases
  – Larger-size sequential workloads perform about the same
[Charts: I/O data rate (MB/sec) vs. threads for 128K reads and writes, QIO vs. Fast I/O; throughput (IOPS) vs. threads for 8K reads, QIO vs. Fast I/O]

NUMA/RAD Impact: What you should know
[Diagram: BL890c i2 (architecturally four blades conjoined), showing process P1 and device DGA100]

NUMA/RAD Impact
• In a RAD-based system, each RAD is made up of CPUs, memory and I/O devices
• Accessing I/O devices from a remote domain rather than the local one means remote memory accesses and remote interrupt latency
• Roughly 10-15% overhead compared to optimized (local) performance
[Chart: Impact of RAD on an I/O device, I/O rate vs. RAD number]

RAD Guidelines for I/O
• Keep I/O devices close to the process that is heavily accessing them
• Make use of FASTPATHING efficiently
  – Make sure to FASTPATH the fibre devices close to the process that is initiating the I/O
  – The overhead involved in handling remote I/O can impact the throughput [chart]
• FASTPATH algorithms assign the CPU on a round-robin basis
• Statically load balance the devices across multiple RADs
• Make use of SET PROC/AFFINITY to bind processes with high I/O
• Use SET DEVICE 'device'/PREFERRED_CPUS
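A sketch of the two commands above working together; the device name, the CPU IDs and the exact qualifier spellings (/PREFERRED_CPUS, /AFFINITY/SET) should be confirmed against your version's DCL help:

$ SET DEVICE $1$DGA101: /PREFERRED_CPUS=(2,3)    ! steer the device's fast path CPU toward this RAD
$ SHOW DEVICE/FULL $1$DGA101:                    ! verify the preferred CPU assignment
$ SET PROCESS/AFFINITY/SET=(2,3)/PERMANENT       ! bind the I/O-heavy process to the same CPUs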

STORAGE BEST PRACTICES

EVA Differences: Speeds and Feeds
• EVA4400 (HSV300): 4GB cache per controller pair; 4 FC host ports (20 with embedded switches) at 4Gb/s; 4 device ports at 4Gb/s FC; 96 3.5" drives; 1024 vdisks max; 780 MB/s read, 590 MB/s write, 26,000 random-read IOPS
• EVA6400 (HSV400): 8GB; 8 FC host ports at 4Gb/s; 8 device ports at 4Gb/s FC; 216 3.5" drives; 2048 vdisks; 1,250 MB/s read, 510 MB/s write, 54,000 random-read IOPS
• EVA8400 (HSV450): 14/22GB; 8 FC host ports at 4Gb/s; 12 device ports at 4Gb/s FC; 324 3.5" drives; 2048 vdisks; 1,700 MB/s read, 600 MB/s write, 45,000 random-read IOPS
• P6300 (HSV340): 4GB; 4 FC host ports plus 0 GbE, 8 1GbE or 4 10GbE ports, at 8Gb/s FC, 1Gb/s iSCSI or 10Gb/s iSCSI/FCoE; 8 device ports at 6Gb/s; 120 3.5" or 250 2.5" drives; 1024 vdisks; 1,545 MB/s read, 515 MB/s write, 54,000 random-read IOPS
• P6500 (HSV360): 8GB; 8 FC host ports plus 0 GbE, 8 1GbE or 4 10GbE ports, at 8Gb/s FC, 1Gb/s iSCSI or 10Gb/s iSCSI/FCoE; 16 device ports at 6Gb/s; 240 3.5" or 500 2.5" drives; 2048 vdisks; 1,700 MB/s read, 780 MB/s write, 55,000 random-read IOPS

General I/O issues reported
• After upgrading the OS or applying a patch, I/O response has become slower
  – We see a 5-6 millisecond delay in completion of I/O compared to yesterday
• After moving to a new blade in the same SAN environment, we see additional CPU ticks for copy, delete and rename
• After upgrading, our CRTL FSYNC is running slow
• Our database is suddenly responding slowly
• Some nodes in the cluster see high I/O latency after midnight
• Customer wants to know if this storage is enough for the next 5 years
• Customer is migrating from an older version of EVA to a newer version; can you advise?

“Most storage performance issues reported are due to misconfiguration of SAN components”

Best Practices 1(6)
• Number of disks influences performance: Yes
  – Fill the EVA with as many disk drives as possible
  – Tests have shown linear growth in throughput (small random)
• Number of disk groups influences performance: No
  – One disk group gives the best performance over the widest range of workloads
  – In mixed-load environments, however, it would be OK to have separate random vs. sequential application disk groups
• Vraid level influences performance: Yes
  – Vraid1 provides the best random-write workload performance; Vraid5 is better for some sequential-write workloads
  – Vraid0 gives the best performance but no protection; use it for non-critical storage needs

Best Practices 2(6)
• Fibre channel disk speed (10K vs. 15K rpm): Yes
  – 15K rpm disks provide the highest performance
  – Large-block sequential I/O: speed doesn't matter, but capacity does
  – Small-block random I/O: 30-40% gains in request rates are seen
  – Best price-performance: for the equivalent cost of using 15K rpm disks, consider using more 10K rpm disks
• Combine disks with different performance characteristics in the same disk group
  – Do not create separate disk groups to enhance performance

Best Practices 4(6)
• Mixing disk capacities: Yes and No
  – Yes: the EVA stripes LUN capacity across all the disks in a disk group
  – No: use disks with equal capacity in a disk group; the larger disks will carry more LUN capacity, leading to imbalanced density and no control over the demand to the larger disks
• Read cache management influences performance: Yes, always ENABLE
• LUN count: good to have a few LUNs per controller
• Host requests and queue depths: depends; monitor the OpenVMS queue depth
• Transfer size: Yes, impacts SEQUENTIAL workloads
  – Tune the write transfer size to be a multiple of 8K and no greater than 128K
  – The OpenVMS max transfer size is 128K for disks and 64K for tapes! (DEVICE_MAX_IO_SIZE)
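A sketch of watching queue depth and transfer size from the host; the device name is a placeholder and the DEVICE_MAX_IO_SIZE lexical item code is an assumption to verify on your OpenVMS version:

$ MONITOR DISK/ITEM=QUEUE_LENGTH                                   ! per-disk I/O queue lengths
$ WRITE SYS$OUTPUT F$GETDVI("$1$DGA101","DEVICE_MAX_IO_SIZE")      ! hypothetical item code; confirm before relying on it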

Yes – SSDs • are about 12 times better OpenVMS performed 10x times better than FC [next slide details] – Workloads – Spread – SSDs – Monitor like transaction processing.3(6) • SSD performance . and databases such as Oracle are ideal SSDs evenly across all available back end loops and HDDs may be mixed in the same drive enclosure your application and EVA. data mining.. Yes.Yes. accordingly can assign SSD or HDD to individual Controllers. where the response time is un-compromised – Customers 33 9/19/2011 .Best Practices. or enable write through mode for SSD’s can help [Experiment!!] use SSD drives to keep the critical path data.

OpenVMS V8.4 Performance Results: SSD Drive Through EVA
• Mixed load, 8-disk SSD and FC disk groups on an EVA4400
• Smaller I/Os (4K/8K) showed a sustained 9-10x increase in IOPS and MB/sec with increasing load for SSD-carved LUNs compared to FC
• With 10 times faster response time, the SSD-carved LUN was able to deliver 10 times more performance and bandwidth for smaller-size I/Os
[Charts: 4K mixed QIO, FC vs. SSD; IOPS vs. threads and response time (msec) vs. threads, roughly 10x faster for SSD]

Best Practices 5(6)
• Controller balancing: Yes, Yes, Yes
  – Maximize the utilization of both controllers
  – Present LUNs simultaneously through both controllers; ownership is with only one controller (Active/Active)
  – Manually load balance LUN ownership across both controllers (use EVAPerf), either through Command View EVA or with the OpenVMS SET DEVICE/SWITCH/PATH='PATH_NAME' 'DEV_NAME' command
  – Preferred path: during the initial boot of the EVA the "preferred path" parameter is read and determines the managing controller [see the figure for options]
  – Verify that LUN ownership is reassigned after a failed controller has been repaired
  – Balance the workload as evenly as possible across all the host ports
[Diagram: DGA99 answers Inquiry on the ports of both HSV controllers but does I/O only on the owning controller's ports; Command View EVA preferred path settings]
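A sketch of rebalancing from the host side, as suggested above; the device and path names are placeholders standing in for LUNs currently favoring one controller:

$ SET DEVICE/SWITCH/PATH=PGA0.5000-1FE1-0015-8528 $1$DGA101:   ! move this LUN's current path to a controller A port
$ SET DEVICE/SWITCH/PATH=PGB0.5000-1FE1-0015-852C $1$DGA102:   ! move this LUN's current path to a controller B port
$ SHOW DEVICE/FULL $1$DGA101:                                  ! confirm the new current paths
$ SHOW DEVICE/FULL $1$DGA102: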

Customer Scenario: Controller Load Imbalance and Unequal Port Load Distribution

Best Practices 6(6)
• Ensure there are no hardware issues
  – Especially battery failure (a cache battery failure causes a change to write-through mode, so write performance becomes an issue), device loop failures, drives reporting timeouts, etc.
• Deploy the array only in supported configurations
• Stay current on EVA firmware!!
• BC and CA have different best practices and are beyond the scope of this discussion

Some Points to Remember
• Large latencies may be quite natural in some contexts, such as an array processing large I/O requests
• Array processor utilization tends to be high under intense, small-block, transaction-oriented workloads but low under intense large-block workloads

OpenVMS I/O Data: P6500, 36G RAID 5 volume, 4G FC infrastructure
• Higher bandwidth can be obtained with larger blocks; larger blocks can drain the interconnects faster due to large data transfers (128K I/Os pushing 4G FC line speeds!)
• Higher throughputs can be obtained with smaller blocks; smaller blocks usually need a lot of processing power (8K workloads pushing close to EVA max throughputs!)
[Charts: sequential read scaling from 138 to 412 MB/sec and random read scaling from 6,562 to 46,202 IOPS, each plotted against response time (msec)]

STORAGE PERFORMANCE ANALYSIS TOOLS & REFERENCES

Storage Performance Tools
• OpenVMS host utilities
  – T4 and TLViz for collecting and visualizing disk I/O statistics
  – SDA > FC [for fibre devices], PKR/PKC [for SAS devices]
  – SYS$ETC: FIBRE_SCAN.EXE, SCSI_INFO.EXE, SCSI_MODE.EXE, FCP.EXE, VEVAMON (older EVAs) and many more
• EVAPerf: command-line EVA performance data collector
• EVA Performance Advisor [year-end release]
• XP Performance Advisor
• Storage Essentials Performance Pack
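A hedged sketch of invoking a couple of the host utilities listed above; exact image locations and arguments vary by OpenVMS version, so treat these invocations as assumptions to adapt:

$ MCR SYS$ETC:FIBRE_SCAN              ! enumerate Fibre Channel adapters, paths and devices
$ MCR SYS$ETC:SCSI_INFO               ! report SCSI/SAS device information
$ ANALYZE/SYSTEM
SDA> FC                               ! fibre-device SDA extension named on the slide
SDA> EXIT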

Salient aspects of HP P6000 Performance Advisor
• Slated to release soon; you can participate in the early adopter program
• Integrated with Command View 10.0 in a single pane of glass
• User-centric design
• Features:
  – Dashboard monitoring
  – Threshold compliance & notification
  – Key metric charts
  – Reports
• Quick setup
• Events database

Sizing EVA: HP StorageWorks Sizing Tool

References – EVA Performance
• HP StorageWorks Enterprise Virtual Array: A tactical approach to performance problem diagnosis
• HP Document Library

Questions/Comments
• Business Manager (Rohini Madhavan): rohini.madhavan@hp.com
• Office of Customer Programs: OpenVMS.Programs@hp.com

THANK YOU .

EVA Models: Reference
• EVA3000 / EVA5000 (HSV100 / HSV110): firmware 1.XXX, 2.XXX or 3.XXX (latest 3110)
• EVA3000 / EVA5000 (HSV101 / HSV111): firmware 4.XXX (latest 4100)
• EVA4000 / EVA6000 (HSV200 or HSV200-A), EVA4100 / EVA6100 (HSV200-B), EVA8000 (HSV210 or HSV210-A), EVA8100 (HSV210-B): firmware 5.XXX or 6.XXX (latest 6220)
• EVA4400 (HSV300): firmware 09XXXXXX or 10000000
• EVA6400 (HSV400), EVA8400 (HSV450): firmware 095XXXXX or 10000000
• P6300 (HSV340), P6500 (HSV360): firmware 10000090