Daniela Milanova

Senior Sales Consultant

Oracle Disaster Recovery Solution

What is Data Guard?
• Management, monitoring and automation software infrastructure that protects data against failures, errors, and corruptions of the database
• Automates the process of maintaining a copy of an Oracle production database (standby database)

Data Guard Architecture
[Diagram: clients connect to the Primary Database at the Primary Site; data changes flow to the Standby Database at the Standby Site]
Service types:
• Log transport services
• Log apply services
• Role-management services

Data Guard Requirements: Software
• The same release of Oracle Database Enterprise Edition must be installed for all databases
• If ASM/OMF is used, all databases should use the same combination

Data Guard Requirements: Hardware and OS
• The hardware can be different for the primary and standby databases
• The operating system and platform architecture for the primary and standby databases must be the same
• The operating system version for the primary and standby databases can be different
• If all databases are on the same system, the OS must allow mounting more than one database with the same name

Data Guard At the Highest Level
• Data Guard comprises two parts
• REDO APPLY
  – Maintains a physical, block for block copy of the Production (also called Primary) database
  – Can be open in Read Only mode for short time reporting
• SQL APPLY
  – Maintains a logical, transaction for transaction copy of the Production database
  – Can be open in Read Write for reporting purposes and cloning activities

Redo Apply Architecture
[Diagram labels: Primary Database; Asynchronous/Synchronous Redo Shipping; Network; Physical Standby Database; MRP; Redo Apply; Backup]
• Maintains a ‘Physical’ block for block copy of the Primary Database

SQL Apply Architecture
[Diagram labels: Primary Database; Asynchronous/Synchronous Redo Shipping; Network; Logical Standby Database; SQL Apply; Transform Redo to SQL; Continuously Open for Reports; Additional Indexes and Materialized Views]
• Maintains a ‘Logical’ transactional copy of the Primary Database

Data Protection & Disaster Recovery Solution with Reporting Capability
[Diagram: clients use the Primary Database at the Primary Site; Data Guard sends data changes to a Physical Standby Database and to a Logical Standby Database at standby sites; reporting clients use the Logical Standby Database]

Data Guard Data Protection Modes
• Maximum protection
  – No data loss
  – If a remote write fails, the primary database is shut down
• Maximum availability
  – No data loss
  – If a remote write fails, the primary database continues to operate as in maximum performance mode
• Maximum performance
  – Highest possible level of data protection without affecting the performance of the primary database
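The difference between the three modes is easiest to see by what the primary does when a remote redo write fails. The following is a minimal sketch that only encodes the statements on this slide; the enum and function names are hypothetical and are not part of any Oracle interface.

```python
from enum import Enum


class ProtectionMode(Enum):
    """The three Data Guard protection modes named on the slide."""
    MAXIMUM_PROTECTION = "maximum protection"
    MAXIMUM_AVAILABILITY = "maximum availability"
    MAXIMUM_PERFORMANCE = "maximum performance"


def primary_reaction_to_remote_write_failure(mode: ProtectionMode) -> str:
    """Return the primary's behavior when redo cannot be written to the standby.

    Illustrative model of the slide text only, not an Oracle API.
    """
    if mode is ProtectionMode.MAXIMUM_PROTECTION:
        return "primary shuts down"
    if mode is ProtectionMode.MAXIMUM_AVAILABILITY:
        return "primary continues as in maximum performance"
    return "primary continues unaffected"


def slide_claims_no_data_loss(mode: ProtectionMode) -> bool:
    """True for the modes the slide labels 'No data loss'."""
    return mode in (
        ProtectionMode.MAXIMUM_PROTECTION,
        ProtectionMode.MAXIMUM_AVAILABILITY,
    )


if __name__ == "__main__":
    for mode in ProtectionMode:
        print(mode.value, "->", primary_reaction_to_remote_write_failure(mode))
```

The sketch makes the trade-off explicit: only maximum protection sacrifices primary availability to keep the no-data-loss promise when the standby is unreachable.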

Data Guard Role Transition
• Oracle Data Guard supports two role-transition operations
• Switchover
  – Planned role reversal
  – Used for OS or hardware maintenance
  – No data loss
• Failover
  – Unplanned role reversal
  – Use in emergency
  – Zero or minimal data loss depending on choice of data protection mode

Existing Site Recovery Tradeoffs
[Diagram labels: Primary Database; Redo Shipment; Delayed Apply; Standby Database; Reporting on delayed data]
• Log apply may be delayed to protect from user errors but:
  – Switchover/Failover gets delayed
  – Reports run on old data
  – After failing over to standby, production DB must be rebuilt

Enhanced DR with Flashback Database
[Diagram labels: Primary Database with Flashback Log; Redo Shipment; Real Time Apply (No Delay!); Standby Database with Flashback Log; Real Time Reporting; Primary: No reinstantiation after failover!]
• Flashback DB removes the need to delay application of logs
• Flashback DB removes the need to reinstantiate primary after failover
• Real-time apply enables real-time reporting on standby

Rolling Database Upgrades
• In Oracle Database, SQL Apply provides the starting point for performing rolling upgrades of the Oracle RDBMS software and database with minimal interruption of service.
• By utilizing a Logical standby database, customers can upgrade one database while running on the original production database and then run in a mixed version environment before returning to the original, but upgraded, configuration!

SQL Apply – Rolling Database Upgrades
[Diagram: four steps for databases A and B, with clients, redo shipping and a logs queue]
1. Initial SQL Apply Config (A and B at Version X)
2. Upgrade node B to X+1 (A at X, B at X+1)
3. Run in mixed mode to test (A at X, B at X+1)
4. Switchover to B, upgrade A (A and B at X+1)
Upgrade types: Patch Set Upgrades; Major Release Upgrades; Cluster Software & Hardware Upgrades

Benefits of Oracle Disaster Recovery Solution
• Disaster recovery and high availability
• Complete data protection
• Efficient utilization of system resources
• Flexibility in data protection to balance availability against performance requirements
• Automatic gap detection and resolution
• Centralized and simple management
• Integrated with Oracle database

Ease of Use
• New and Improved Data Guard Manager!
• Monitoring SQL Apply
  – Unsupported Storage Attributes
  – Applied Logs and Apply Progress
• Managing the Logical Standby
  – Bypassing the Guard
  – Skipping Table Redo
  – Skipping Failed (and subsequently fixed) Transactions

New Data Guard Feature: Fast-Start Failover
Automatic and fast
• Physical and Logical standby each complete failover in less than 20 seconds
• Old primary is reinstated automatically once connectivity is re-established between Observer and primary database

Data Guard Best Practices: Switchover for Planned Maintenance
For fastest switchover (< 1 minute)
• Prior to switchover
  – A physical standby transitioning from read-only back to Redo Apply should be restarted
  – Disconnect all sessions and stop job processing
  – Shutdown abort for all secondary RAC instances on both primary and standby databases
  – Enable real-time apply on the standby database and ensure the standby is synchronized with the primary database
• For switchovers using SQL or command line interface, open the new primary directly from the mount state
• Or, simulate a Fast-Start Failover: complete transactions and shutdown abort all primary instances

Data Guard Best Practices: Faster Redo Transport
• Set SDU=32K
• Tune network parameters that affect network buffer sizes and queue lengths
• Ensure sufficient network bandwidth for peak database redo generation rate + other activities
http://www.oracle.com/technology/deploy/availability/pdf/MAA_DG_NetBestPrac.pdf

Data Guard Best Practices: Tune Network Parameters
• Send and receive buffer size = 3 x bandwidth delay product (BDP)
  BDP = the product of the estimated minimum bandwidth and the round trip time between the primary and standby server
  BDP = 1,000 Mbps * 25 ms (.025 secs) = 1,000,000,000 * .025 = 25,000,000 bits / 8 = 3,125,000 bytes
• Tune network device queues to eliminate packet losses and waits. Set device queues to a minimum of 10,000 (default 100)
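The buffer sizing rule on this slide can be checked with a few lines of arithmetic. This is a minimal sketch using the slide's own figures (1,000 Mbps, 25 ms round trip, multiplier of 3); the function names are hypothetical, and 1 Mbps is assumed to mean 1,000,000 bits per second.

```python
def bdp_bytes(bandwidth_mbps: float, rtt_ms: float) -> int:
    """Bandwidth delay product in bytes.

    Assumes 1 Mbps = 1,000,000 bits per second and rtt_ms is the
    round trip time between primary and standby in milliseconds.
    """
    bits_in_flight = bandwidth_mbps * 1_000_000 * rtt_ms / 1000
    return round(bits_in_flight / 8)


def socket_buffer_bytes(bandwidth_mbps: float, rtt_ms: float, multiplier: int = 3) -> int:
    """Send/receive buffer size as a multiple of the BDP (the slide uses 3)."""
    return multiplier * bdp_bytes(bandwidth_mbps, rtt_ms)


if __name__ == "__main__":
    print(bdp_bytes(1000, 25))            # 3125000
    print(socket_buffer_bytes(1000, 25))  # 9375000
```

For the slide's example link, the BDP is 3,125,000 bytes, so the suggested send and receive buffer size is 9,375,000 bytes.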

Impact of Network Tuning
Test Results - Oracle Database 10g Release 1 & 2

Data Guard or Remote Mirroring
• Remote Mirroring (host-based and storage-based) is another way to protect enterprise data
• However:
  – What about Data Reliability?
  – What about Data Recoverability?
  – What about Data Availability?
  – What about Cost?
• A well-designed Business Continuity Plan must consider these critical issues in addition to simple data protection

Data Guard is the Preferred Solution
1. Better Network Efficiency
  – Transmits only redo data
  – Remote mirroring solutions: datafiles, archivelog files, redolog files must be mirrored
2. Better suited for WAN-s
  – Fibre/ESCON-based mirroring solutions have an intrinsic distance limitation
  – Protocol converters needed, which adds to the cost, complexity and latency
  – Data Guard based on standard TCP/IP
  – Data Guard doesn’t have to deal with protocol converters, extra cost and latency issues
3. Better Data Protection
  – Data Guard enables zero data loss
  – Preserves write-order consistency
  – Avoids logical and physical corruptions
  – Both SQL Apply and Redo Apply validate redo data before applying

Data Guard is the Preferred Solution
4. Higher Flexibility
  – Data Guard based on commodity hardware
  – Does not force lock-in with storage vendors
  – Remote mirroring solutions typically need identically configured storage from the same vendor
5. Better Functionality
  – Data Guard is a comprehensive DR solution: Redo Apply/SQL Apply, flexible protection modes, push-button switchover/failover, graceful handling of network connectivity problems
6. Higher ROI
  – Provides more value for DR investment: standby database can be opened read-only or read-write, allows backups to be offloaded to the standby database, allows reporting/queries using the standby database
  – Integrated natively with other HA features (RAC, RMAN, etc.)
  – No extra cost

Data Guard and Remote Mirroring Summary
• For protecting Oracle data, Oracle Data Guard’s integrated disaster recovery solution involving standby databases is preferred to remote disk mirroring:
  – For technical reasons
  – For business reasons
• Remote mirroring may be used to protect non-Oracle database data that are changing frequently:
  – File system data
  – Data in databases that are not Oracle

Competitive Strengths vs. SharePlex
• SharePlex
  – Redo log-based replication tool from Quest Software
  – Heavy front-end processing to extract transaction information from the primary redo logs
  – Somewhat similar to Data Guard SQL Apply
• It doesn’t make sense for customers to use SharePlex:

Criterion | Data Guard | SharePlex
Cost | Free | Expensive
Feature support | Native feature of the database | Based on unpublished and unsupported interface (1)
DR | Comprehensive and integrated DR solution | At best a replication solution
Zero Data Loss | Supported | No support because of architecture limitations
Primary system overhead | Minimal | Much more
Integration with HA features | Integrated with RAC, RMAN, Flashback, … | Limited integration

(1) See MetaLink Note 97080.1

10g New Features and Best Practices

Data Guard Release 10.2 Redo Transport Improvements
• Increased network write sizes to 10 MB to better utilize network capacity for both ARCH and LNS
  – LNS can potentially write 10 MB or less
• Full decoupling of LGWR and LNS processes
  – No more waits during log switches
  – No more waits when LNS buffer is full
• Intra-file parallelism support for ARCH
  – Up to 29 parallel remote archive processes

[Charts: 1GB/100Mbps/25msRTT]


Data Guard Best Practices: Gap Resolution and Data Loss
• For fastest gap resolution
  – Leverage intra-file archive parallelism (MAX_CONNECTIONS attr)
  – Follow tips for tuning redo transport to improve network utilization
• To minimize data loss
  – For a low latency, high bandwidth network, use SYNC transport
  – For high latency or low bandwidth networks, use ASYNC to minimize primary database performance impact
  – Follow tips for tuning redo transport
  – Example: Less than 7 seconds of data loss exposure for high redo rates of 2-12 MB/sec with <=25 ms latency in our tests
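To relate the "seconds of exposure" figure above to a volume of redo, the exposure window can be multiplied by the redo rate. This is a rough sketch, assuming redo is generated at a constant rate during the exposure window; the function name is hypothetical and the result is an upper-bound estimate, not a measured value.

```python
def redo_at_risk_mb(redo_rate_mb_per_sec: float, exposure_seconds: float) -> float:
    """Upper bound on unshipped redo (MB), assuming a constant redo rate."""
    return redo_rate_mb_per_sec * exposure_seconds


if __name__ == "__main__":
    # The slide quotes under 7 seconds of exposure at 2-12 MB/sec.
    for rate in (2, 12):
        print(rate, "MB/sec ->", redo_at_risk_mb(rate, 7), "MB at most")
```

Under that assumption, the tested range corresponds to at most 14 MB of redo at 2 MB/sec and at most 84 MB at 12 MB/sec.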

Data Guard Best Practices: Reduce Overhead on Primary
Performance Gains with 10g Release 2 ASYNC Transport
• For redo rates less than 2 MB/sec, there is less than 5% impact on the primary database across different latencies
• For very high redo rates of 20 MB/sec, less than 10% impact on primary database even with latencies of 50 and 100 ms
• Primary database performance impact was 2-3 times less with the new ASYNC transport compared to previous releases
Best Practice
• Allocate additional I/O bandwidth for Online Redo Log Files

Data Guard Best Practices: Using Standby for Backups
Offload Backups to Physical Standby Database
• Eliminate backup overhead on primary database
• RMAN allows for backup operations while Redo Apply is in progress
Best Practices
• For simplicity, use identical directory structures on the primary and standby databases
• Use RMAN Recovery Catalog so that backups taken on one database server can be restored on another
• Use a catalog server physically separate from primary and standby sites
• Reference MAA RMAN/Data Guard best practices paper
http://www.oracle.com/technology/deploy/availability/pdf/RMAN_DataGuard_10g_wp.pdf

Data Guard or Remote Mirroring?
[Chart: load of 200 txns/sec and redo rate of 1.1 MB/sec]
• Data Guard SYNC transport has less overhead on the primary database

Data Guard Advantage Because …
• Data Guard only transmits redo. A remote mirroring solution must transmit all database writes
• A remote mirroring solution needs to transmit the following writes: LGWR (log writer), DBWR (database writer), ARCH (archiver), RVWR (flashback log writer), and foreground direct writes
• Both DBWR and LGWR are affected by network latency in a remote mirroring solution. In contrast, only LGWR is impacted by network latency in a Data Guard solution
• Higher wait times for DBWR can be very detrimental to performance, causing contention for free buffers and an increase in buffer busy waits

Some customer references …

First American Real Estate Solutions
• Nation's largest source of Real Estate data
• 100 million properties
• Online services for 50,000 clients
  – Lenders, Appraisers, Corporations, Utilities, Government, Information Resellers, Agents & Title Companies
• Thousands of concurrent online users at peak
• www.firstamres.com

HA/DR Requirements
• High Availability: 24x7, 365 days/year
• Limited instances of planned downtime once/quarter
• Recovery Point Objective (RPO) - maximum data loss
  – Oracle9i: 10MB for computer failure, 200MB for site failure
• Recovery Time Objective (RTO) for Oracle Database
  – Oracle9i: 10 minutes for computer failure, 1 hour for site failure
• Oracle Database 10g goals
  – RPO: zero data loss for computer failure, 10MB for site failure
  – RTO: zero downtime for computer failure, 10 minutes for site failure

Oracle 9i HA/DR Architecture (First American)
[Diagram labels: Primary Production Site with Primary Database, Local Standby #1 and Local Standby #2; Remote Disaster Recovery Site with Remote Standby #3, 1500 miles away; Data Guard LGWR Asynchronous Redo Shipping; Data Guard Delayed Apply (30 minutes); Data Guard Archive Log Shipping (ARCH)]

Looking Ahead to Oracle Database 10g
• Real Application Clusters
  – Transparent failover on node failure, zero data loss
• Flashback Technologies
  – Flashback Database & Flashback Table
  – Protect/repair for logical corruptions
• Enhanced LGWR ASYNC redo transport
  – Improve RPO for remote DR site
• Real Time Apply
  – Improve RTO

Oracle Database 10g Architecture - Plan (First American)
[Diagram labels: Primary Production Site with Primary Database on Real Application Cluster; Data Guard LGWR Asynchronous redo shipping over 1500 miles; Remote Disaster Recovery Site with Standby Database]

Oracle Database 10g Benefits
• Higher Availability – transparent node failover
  – RAC for HA, Data Guard for DR
• Better remote data protection
  – ASYNC enhancements = less compromise on WAN
• Better protection against logical corruption
  – Fewer databases, surgically repair vs full point in time
• Less downtime
  – Faster failover, quicker repair of logical corruptions

Oracle Corporation Global Single Instance (GSI)
• A key enabler in Oracle saving $1 billion annually
• Consolidation: 1 is the magic number
  – Versus 75 separate implementations of Oracle Apps
  – Versus 100’s of Oracle databases world wide
• Oracle E-Business Suite
• 7,000 concurrent users
• 5.5TB Oracle database
• www.oracle.com

Oracle Global Single Instance HA/DR Requirements
• HA requirement
  – Continuous operation regardless of component failure
• DR requirement
  – Protect against site failure, physical & logical corruption
  – RPO – 5 minutes of transactions
  – RTO – database failover in less than 1 hour
• High workload – OLTP system
  – 8.2MB/sec redo generation at peak, 2.5MB/sec sustained
• WAN, dual OC12
  – 1,000 miles of separation, 25-35ms RTT network latency
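A quick calculation shows how much of the WAN the peak redo stream would occupy. This is a back-of-the-envelope sketch: the OC-12 nominal line rate of 622.08 Mbit/s is a standard SONET figure that does not appear on the slide, 1 MB is assumed to be 1,000,000 bytes, and protocol overhead and other traffic are ignored. The function and constant names are hypothetical.

```python
OC12_LINE_RATE_MBPS = 622.08  # assumption: nominal SONET OC-12 line rate


def redo_rate_mbps(redo_mb_per_sec: float) -> float:
    """Convert a redo rate in MB/sec to Mbit/sec (1 MB = 1,000,000 bytes assumed)."""
    return redo_mb_per_sec * 8


def link_utilization(redo_mb_per_sec: float, links: int = 2) -> float:
    """Fraction of the combined nominal link capacity used by redo alone."""
    return redo_rate_mbps(redo_mb_per_sec) / (links * OC12_LINE_RATE_MBPS)


if __name__ == "__main__":
    peak = link_utilization(8.2)       # peak redo rate from the slide
    sustained = link_utilization(2.5)  # sustained redo rate from the slide
    print(f"peak: {peak:.1%}, sustained: {sustained:.1%}")
```

Under these assumptions, the 8.2 MB/sec peak is about 65.6 Mbit/s, roughly 5% of the nominal capacity of two OC-12 links, which suggests round trip latency rather than raw bandwidth is the main constraint on this link.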

Oracle Global Single Instance HA/DR Architecture
[Diagram labels: GSI Production Site with (4) SUN F12Ks, 36 CPU’s each, Primary Database; Data Guard LGWR Asynchronous redo shipping over 1,000 miles; Disaster Recovery Site with (4) SUN F12Ks, DR domain 8 CPU’s each, Development & Test domain 28 CPU’s each, Standby Database (4 hour delayed apply)]

Utilization of Standby Resources
• Four node Standby Cluster
  – 2 domains: DR, Development & Test
  – DR domain has sufficient capacity to maintain standby database and execute failover
• At Failover time:
  – Failover is executed, standby assumes primary role
  – Development & Test is stopped
  – CPU’s are re-allocated to the new production domain
  – Nodes are upgraded in a rolling fashion with no application downtime

Delayed Apply – Downtime Avoided
• Human error caused logical corruption on primary
  – 160,000 row table updated by mistake
• Standby database configured with 4 hour delayed apply
• Instead of 10 hours of downtime, just 30 minutes
  – Cancel recovery on standby and open read only
  – Stop the affected application on primary
  – Export data from standby
  – Recreate table on primary, import data to primary db after disabling triggers
  – Restart application on primary
  – Restart recovery on standby

Oracle Database 10g Feature Adoption (Oracle Global Single Instance)
• Flashback Technologies
  – Flashback Table
  – Flashback Database
• Data Guard 10g
  – Real Time Apply
  – Asynchronous Redo Transport enhancements
  – Redo Apply performance enhancements
• Benefits
  – Faster failover, better data protection


Ohio Savings Bank
• Founded in 1899 • In Top 20 of all US Mortgage Lenders • Provide mortgage services to independent brokers nationwide via Web • $13 billion in assets • Reputation for Innovation
• 2002 Web Site of the Year (Mortgage Technology Magazine)

• www.ohiosavings.com

HA/DR Requirements
• 24 x 7, 365 days/year
• Recovery Point Objective: zero data loss
• Recovery Time Objective: 30 minutes
• Planned maintenance windows Sunday mornings

Ohio Savings Bank Oracle9i Architecture
[Diagram labels: Online Mortgage Services; Primary Production: 2-node RAC Cluster, HP N-Class PA-RISC, EMC Symmetrix SAN attached, HP-UX v11.0, Primary Database; Remote DR Site, 15 miles away: HP N-Class PA-RISC, EMC Symmetrix SAN attached, HP-UX v11.0; Data Guard Archive Log Shipping (ARCH); 3rd party storage based synchronous disk mirroring for online logs]

Ohio Savings Bank Oracle Database 10g Architecture
[Diagram labels: Customer Call Center; Primary Production: 3-node RAC Cluster, HP DL-380, 2 Xeon CPUs/node, EMC Symmetrix & Clariion SAN attached, Red Hat Linux, Primary Database; Remote DR Site, 15 miles away: 3-node RAC Cluster, HP DL-380, 2 Xeon CPUs/node, EMC Symmetrix & Clariion SAN attached, Red Hat Linux, Standby Database; Data Guard Maximum Availability, synchronous redo shipping, Zero Data Loss]

Oracle Database 10g Features Deployed (Ohio Savings Bank)
• Automatic Storage Management
  – Reduces time spent managing storage
• RMAN Flash Recovery Area
  – Fully automates disk-based backup & recovery
• Oracle Data Guard
  – Zero Data Loss
  – Replaces 3rd party remote mirroring
  – Standby DB also used for daily exports

Ohio Savings Bank: Automatic Storage Management
• Automatically spreads database files across all available storage
• Automatic rebalancing of used disk space when disks are added or removed
• Increases I/O distribution beyond disk array striping
• Reduces DBA workload

Future Plans (Ohio Savings Bank)
GRID – from concept to reality
• Add nodes to the existing RAC 10g cluster
• Manage cluster via a single system view
• Add mortgage database, and potentially the OSB Data Warehouse, to same RAC 10g cluster
• Define application workloads as services
• Establish rules to dynamically allocate processing resources to services
• Maximize the utilization of resources while meeting changing business needs

Oracle Disaster Recovery Solution includes the following Oracle products:
• Oracle Database Enterprise Edition on both sites

Oracle Maximum Availability Architecture

Oracle Maximum Availability Architecture
[Diagram labels: Clients and Application Servers at each site; WAN Traffic Manager; Primary Site, RAC based, with Instance1 and Instance2 linked by heartbeat (hb); Data Guard over a Dedicated Network; Secondary Site with Instance1 and Instance2 linked by heartbeat (hb)]

Resources
• Maximum Availability Architecture white papers: http://otn.oracle.com/deploy/availability/htdocs/maa.html
• New SQL Apply Best Practices Paper now available!
• HA Portal on OTN: http://otn.oracle.com/deploy/availability
• Data Guard home page on OTN: http://otn.oracle.com/deploy/availability/htdocs/odg_overview.html