Session Title:
Designing a PowerHA SystemMirror for AIX Disaster Recovery Solution
Session ID:
HA18 (AIX)
Workload-Optimizing Systems
Agenda
- Available Offerings
- Campus Disaster Recovery vs. Extended Distance
- What you get with Enterprise Edition
- Expected Fallover Behaviors
- Summary
Value

[Chart: business continuity tiers vs. recovery time, based on SHARE definitions]
Tier 4 - Batch/online database shadowing & journaling, point-in-time disk copy (FlashCopy), TSM-DRM
Tier 3 - Electronic vaulting, TSM**, tape
Recovery time scale: 15 min, 1-4 hr., 4-8 hr., 8-12 hr., 12-16 hr., 24 hr., days
*PTAM = Pickup Truck Access Method with tape **TSM = Tivoli Storage Manager ***GDPS = Geographically Dispersed Parallel Sysplex
Packaging Changes:
- Standard Edition: Local Availability
- Enterprise Edition: Local & Disaster Recovery
(Version 7.1 will not be released until 2011)
Licensing Changes:
- Small, Medium, Large server class

Product Lifecycle:*
- HACMP 5.4.1: released Nov 6, 2007
- PowerHA 5.5.0: released Nov 14, 2008
- PowerHA SystemMirror 6.1.0: released Oct 20, 2009
- PowerHA SystemMirror 7.1.0: released Sept 10, 2010
* These dates are subject to change per Announcement Flash
Standard Edition
Enterprise Edition
Highlights:
- New Editions to optimize software value capture
- Standard Edition targeted at datacenter HA
- Enterprise Edition targeted at multi-site HA/DR
- Tiered pricing structure: Small/Medium/Large
Campus Style DR
- Cross Site LVM Mirroring: AIX LVM mirrors
- SVC Split I/O VDisk Mirroring: SVC VDisk functionality
- Metro Mirror or SRDF*: disk-based replication
- IP-based replication
[Diagram: data center with Storage Enclosure 1 and Storage Enclosure 2, illustrating LVM mirroring, disk replication, and VDisk mirroring]
Network Connectivity
Considerations:
- Subnetting & potential latency
- Can you merge fabrics and present LUNs from either location across the campus?
Options:
- LVM mirroring across storage subsystems: both copies are accessible
- Storage-level replication: only the active copy is available
- VDisk Mirroring (SAN Volume Controller): a single logical copy, mirrored on the backend
(Distance limitations ~10km or 6 miles)
Choices:
- Cross Site LVM Mirroring
- VDisk Mirroring (Split I/O Group)
- Metro Mirroring

[Diagram: campus configuration - LAN and SAN at each site linked by DWDM, with FC switches, SVC nodes, and LVM mirrors spanning both locations]
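The cross-site LVM choice relies only on standard AIX commands. A minimal sketch, assuming a volume group datavg already exists, with hdisk2 in one enclosure and hdisk3 in the other (all names here are examples):

```shell
# Add the second enclosure's disk to the volume group
extendvg datavg hdisk3

# Mirror the logical volume across the two disks; superstrict
# allocation (-s s) keeps each copy on a separate disk/enclosure
mklvcopy -s s datalv 2 hdisk3

# Synchronize the stale partitions of the new copy
syncvg -l datalv
```

These commands only run on AIX; verify the flags against your AIX level before use.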
[Diagram: a logical volume with two copies - LV Copy 1 (primary) and LV Copy 2 (secondary) - each placed on hdisks in a different storage enclosure]
New in AIX 6.1 - Mirror Pools:
- Intended for asynchronous GLVM
- Addresses issues with extending logical volumes and spanning copies
New DR Redbook, Exploiting PowerHA SystemMirror Enterprise Edition: scenario for Cross Site LVM with Mirror Pools
Benefits:
- Prevents spanning copies
- Requirement for asynchronous GLVM
Other potential uses:
- Cross Site LVM configurations
- Synchronous GLVM
* This is the reason there is no asynchronous GLVM on AIX 5.3, and why it was not retrofitted
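A hedged sketch of how mirror pools are set up with base AIX 6.1 commands (the VG, disk, LV, and pool names are examples; verify the flags against your AIX level):

```shell
# Enforce superstrict mirror pool placement on the (scalable) VG
chvg -M s datavg

# Assign each physical volume to a named mirror pool, one per site
chpv -p siteA hdisk2
chpv -p siteB hdisk3

# Create a two-copy LV with one copy pinned to each pool,
# which prevents a copy from spanning sites
mklv -c 2 -p copy1=siteA -p copy2=siteB -t jfs2 -y datalv datavg 100

# List the mirror pool assignments
lsmp -A datavg
```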
* CSPOC does not currently allow you to create a logical volume via its menus.
* Workaround: create the logical volume using smit mklv, then continue creating the filesystem via CSPOC.
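The workaround above can be sketched as follows (LV and VG names are examples, and the exact C-SPOC menu path may vary by release):

```shell
# Create the logical volume outside C-SPOC on the node that owns the VG
smitty mklv            # or directly: mklv -y applv -t jfs2 datavg 10

# Then continue through C-SPOC so the filesystem definition
# is propagated to all cluster nodes
smitty cl_admin        # C-SPOC menus -> File Systems ->
                       # add a filesystem on the previously created LV
```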
Infrastructure Considerations
[Diagram: Site A and Site B, each with LAN and SAN, linked by DWDM; Node A and Node B share SITEAMETROVG, with 50GB LUNs at each site]
Important:
Identify & Eliminate Single Points of Failure!
Infrastructure Considerations
[Diagram: the same two sites with additional interconnects - XD_rs232 and XD_IP networks over the WAN (net_ether_0), plus a disk heartbeat path: ECM VG diskhb_vg1, seen as hdisk2 on Node A and hdisk3 on Node B with matching PVID 000fe4111f25a1d1, on 1GB LUNs; SITEAMETROVG with 50GB LUNs at each site]
Important:
Identify Single Points of Failure & design the solution around them
Resource Group A:
- Startup: Online on Home Node Only
- Fallover: Fallover to Next Node in List
- Fallback: Never Fallback
- Site Policy: Prefer Primary Site
- Nodes: NodeA, NodeB
- Service IPs: service_IP1, service_IP2
- Volume Groups: datavg
- Application Server: AppA

[Diagram: XD_rs232_net_0 and XD_IP_net_0 inter-site networks; en2 base addresses 10.10.10.100/10.10.10.120 carrying service_IP1 on one node and 13.10.10.100/13.10.10.120 carrying service_IP2 on the other; disk_hb_net_0 and disk_hb_net_1 over 1GB LUNs]
[Diagram: 30GB LUNs in Bldg A mirrored to 30GB LUNs in Bldg B]
[Diagram: four nodes (Node A1, Node A2, Node B1, Node B2) with an SVC at each site and PPRC links between the SANs - HA first, then DR]
IP Communication
Considerations:
- DS8700 Global Mirror, EMC SRDF & Hitachi TrueCopy require PowerHA 6.1+
- The Enterprise Edition adds cluster panels to define and store the relationships for the replicated volumes
- A CLI is enabled for each replication offering to communicate directly with the storage enclosures and perform a role reversal in the event of a fallover
[Diagram: replication relationship from source LUNs to target LUNs]
[Slide: SAN Volume Controller interoperability]
Supported hosts (up to 1024): IBM z/VSE, Novell NetWare, VMware vSphere 4, Sun Solaris, Linux, SGI, IBM BladeCenter
Point-in-time copy: full volume, copy-on-write; 256 targets; new incremental, cascaded, reverse, space-efficient; FlashCopy Mgr; Entry Edition software
Supported storage: IBM DS, IBM XIV, IBM N series; Hitachi Lightning, Thunder, TagmaStore, AMS 2100/2300/2500, WMS, USP; EMC CLARiiON CX4-960, Symmetrix; Sun StorageTek; NetApp FAS; NEC iStorage; Fujitsu Eternus; Pillar Axiom; Bull StoreWay (model numbers in the original chart: 3000, 8000, models 2000 & 1200, 4000, models 600 & 400)
For the most current, and more detailed, information please visit ibm.com/storage/svc and click on Interoperability.
- Note: Enterprise verification takes longer
- Don't install the filesets if you are not using them
- The filesets are in addition to the base replication solution requirements
GLVM code is available on the AIX base media:
- AIX 5.3: synchronous replication
- AIX 6.1: synchronous & asynchronous replication
PowerHA SystemMirror Enterprise Edition provides SMIT panels to define and manage all configuration information and automates the management of the replication in the event of a fallover
Find more details in the new DR Redbook, SG24-7841-00.
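The GLVM building blocks can be sketched with the base AIX utilities (device names are examples; see the Redbook for complete procedures):

```shell
# On the remote site: define an RPV server for each PV to be mirrored
smitty rpvserver

# On the local site: define the matching RPV clients,
# which then appear as ordinary hdisks
smitty rpvclient

# Mirror with standard LVM, treating the RPV client as a local disk
extendvg datavg hdisk4        # hdisk4 here is an RPV client device
mklvcopy datalv 2 hdisk4

# Monitor remote physical volume statistics (AIX 6.1)
rpvstat
```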
Vegas Conference: Implementing PowerHA SystemMirror Enterprise Edition for Asynchronous GLVM
Double session lab Wednesday Bill Miller
* Communication Path to Takeover Node
* Application Server Name
* Application Server Start Script
* Application Server Stop Script
HACMP can keep an IP address highly available. Consider specifying Service IP labels and Persistent IP labels for your nodes:
- Service IP Label
- Persistent IP for Local Node
- Persistent IP for Takeover Node
- Single XD_data network, IP-alias enabled, including all inter-connected network interfaces
- Persistent IP address for each node (optional for single-interface networks)
- One resource group:
  - Inter-site Management Policy: Prefer Primary Site
  - Includes all the GMVGs created by the wizard
  - Application Server
  - One or more Service IPs
The Enterprise Edition adds Inter-Site Management policies beyond the resource group node list:
- Prefer Primary Site
- Online on Either Site
- Online on Both Sites
RG Dependencies:
- Online on Same Site groups RGs into a set
- rg_move then moves the set, not an individual resource group
- The software prevents removal of an RG without removing the dependency first
Serial Networks:
* PowerHA SystemMirror 7.1 has self-tuning FDR with IP multicasting
* There is no Enterprise Edition available for the 7.1 (2010) release
* The example shows the SVC menu, but the same option is there for all replication offerings
[Diagram: DLPAR/CUoD fallover scenario managed by HMC Cluster 1]
- System A (primary site): Oracle DB; System B: Oracle standby DB; System C: standby Oracle DB
- Application server resources: min 1 CPU, desired 2, max 2; LPAR profile: min 1, desired 1, max 2
- Sequence: read requirements, activate LPARs, start PowerHA; on fallover or rg_move, release resources, with CPUs added and removed (+1/-1) via DLPAR
- GLVM resources & statistics
- EMC SRDF relationships
- Hitachi TrueCopy relationships
Knowing these will help you identify & manage the configuration. Various usage examples are in the new Enterprise Edition Redbook.
The Enterprise Edition adds tests that can be included in custom test plans
Results:
- The standby site will acquire the resources and redirect the replication relationship
- The primary site loses write access to its disks and commands hang
- This might result in a system crash
* Note: environments in the same network segment could experience duplicate IP ERROR messages

Intermittent failure (even worse):
- The links come back up and then log GS_DOM_MER_ERR (halt of the standby site)
- The entire cluster is now down, since access to the LUNs is unavailable on the primary site
* Note that there is only one network passing heartbeats between the sites
* The replication type was not specified, but this was probably an SVC Metro Mirror configuration, based on the names of the states
* The arrows should really point in the other direction for the replication after the failure
Avoiding a partitioned cluster:
- More XD_IP networks
- Serial over Ethernet
- diskhb networks over the SAN
Future considerations:
- Quorum server
Consider bringing down all nodes on one site (to avoid a cluster-initiated halt).
A hard reset might be the best approach, as a graceful stop might hang while attempting to release individual resources (e.g. unmount or varyoff with no access to the volumes).
- What type of storage is currently being used?
- Is the same storage type at both locations?
- Is there a requirement to use the CLI to manage the relationships?
- SLA requirements: is HA still required after a site fallover?
- What is the true requirement for automated fallover?
- Recovery Time Objective (RTO)
- Recovery Point Objective (RPO)
Extended Distance Offerings: Introduction to PowerHA SystemMirror for AIX Enterprise Edition
HA20 (AIX), Thursday & Friday, Shawn Bodily
Recovery plan should be well documented & reside at both locations Leverage Cluster functions to ensure success
- CSPOC user functions guarantee that users are propagated to all cluster nodes
- The user password cluster management functions ensure that changes are also updated on all cluster nodes
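A sketch of the relevant C-SPOC entry points (fastpath and command names as documented for HACMP/PowerHA; verify against your release):

```shell
# Top-level C-SPOC menu
smitty cl_admin

# Cluster-wide user and group management
smitty cl_usergroup

# Change a user's password across all cluster nodes
clpasswd someuser        # 'someuser' is an example name
```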
Automated Fallover:
- Manual fallover option (based on the state of the disks)
- An Enterprise cluster will automatically trigger a fallover
- To disable this, alter the startup scripts at the DR location

Ease of Management:
- One-time configuration
- The location of the RG determines the direction of replication
Questions?
Additional Resources
New - Disaster Recovery Redbook
SG24-7841 - Exploiting PowerHA SystemMirror Enterprise Edition for AIX
http://www.redbooks.ibm.com/abstracts/sg247841.html?Open
Online Documentation
http://www-03.ibm.com/systems/p/library/hacmp_docs.html