
Oracle Database Service High Availability with Data Guard?
Robert Bialek
Senior Principal Consultant

@RobertPBialek doag2017
Who Am I

Senior Principal Consultant and Trainer at Trivadis GmbH in Munich.


– Master of Science in Computer Engineering.
– At Trivadis since 2004.
– Trivadis Partner since 2012.
Focus:
– Data and Service High Availability, Disaster Recovery.
– Architecture Design, Optimization, Automation.
– New Technologies (Trivadis Technology Center).
– Open Source.
– Technical Project Leadership.
– Trainer: O-GRINF, O-RAC, O-DG.
2 23.11.2017 Trivadis DOAG17: Oracle Database Service High Availability with Data Guard?
Our company.

Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on and technologies in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields:

OPERATION

Trivadis Services takes over the interacting operation of your IT systems.

With over 600 specialists and IT experts in your region.
– 14 Trivadis branches and more than 600 employees.
– 200 Service Level Agreements.
– Over 4,000 training participants.
– Research and development budget: CHF 5.0 million.
– Financially self-supporting and sustainably profitable.
– Experience from more than 1,900 projects per year at over 800 customers.
Branches: Copenhagen, Hamburg, Düsseldorf, Frankfurt, Stuttgart, Freiburg, Munich, Vienna, Basel, Brugg, Zurich, Bern, Geneva, Lausanne.

Technology on its own won't help you.
You need to know how to use it properly.

Database Service High Availability – Goal

Increase database service uptime, by:


– eliminating any single point of failure to avoid unplanned outages.
– minimizing the effect of an unplanned outage on the end user (automatic failover).
– reducing downtimes during planned outages.

Consider the whole SW/HW stack. Find the best cost/risk ratio.
[Figure: HW/SW stack (storage, database server(s), application server(s), clients) and a chart of effort/complexity and downtime costs versus availability, marking the best cost/risk ratio.]

Database Service High Availability – Options
Cluster – primarily used option for service high availability:
– Real Application Clusters.
– RAC One Node.
– Cold Failover Cluster.

Data Replication – used mostly for data, rather than service high availability:
– Data Guard (Fast-Start Failover/Global Data Services).
– GoldenGate (Global Data Services).
– Other replication technologies.

Agenda

1. Introduction
2. Configuration
3. Special Cases
4. Conclusions

Introduction

Database Service HA with Data Guard? – Introduction
Yes, it can also be used for service high availability:
– Planned downtimes – manual switchover.
– Unplanned downtimes – fast-start failover or manual failover.
[Figure: Data Guard FSFO configuration with a primary and a standby database.]
Why might we consider Data Guard for service high availability:
– Less complex than a cluster installation.
– Infrastructure requirements not that high (even local storage is sufficient).
– Not subject to additional license fees (EE license assumed).
– Additionally, many other advantages: data high availability, snapshot standby,
potentially rolling upgrade capability, ...

But, with some restrictions we need to consider...

Database Service HA with Data Guard? – Big Picture
[Figure: database clients connect to the RW service on the primary; transparency is required for the clients on failover/switchover. A master observer (required) and backup observers (optional, 12.2) ping the databases. Next to the primary there is the target failover standby and, optionally (12.2), further candidate target failover standbys.]

Database Service HA with Data Guard? – Monitoring
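[Figure: observer (DGMGRL) monitoring loop with threads W000, B001, P001 and S001. For the primary and for the target standby: sleep for about 3 sec., connect, enter PING state, ping. On a timeout the connect is re-tried; if the ObserverReconnect property is set and the reconnect interval has expired, the observer logs off and connects again. When a failover condition is detected, the failover is started.]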

Configuration

Data Guard Protection Modes with FSFO – Prerequisites
MaxAvailability (10.2+). FSFO: guaranteed zero data loss.
▪ LogXptMode=SYNC or FASTSYNC (12.1+)
▪ FastStartFailoverTarget(*)
▪ Flashback Database

MaxPerformance (11.1+). FSFO: data loss possible.
▪ LogXptMode=ASYNC
▪ FastStartFailoverTarget(*)
▪ FastStartFailoverLagLimit
▪ Flashback Database

MaxProtection (12.2). FSFO: guaranteed zero data loss.
▪ LogXptMode=SYNC
▪ FastStartFailoverTarget(*)
▪ Flashback Database
▪ Recommended: at least 2 STDBY DBs (protection mode downgrade!)

[Slide callout: mostly used protection mode.]

All protection modes:
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold = <xy>;
DGMGRL> ENABLE FAST_START FAILOVER;
<xy>: value in seconds.
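
The prerequisites per protection mode can be turned into a small pre-flight check. The following is only a sketch: the function name, its arguments and the idea of passing the settings in by hand are assumptions for illustration, and nothing here queries the broker.

```python
# Prerequisites per protection mode, transcribed from the slide.
# All names below are hypothetical; the values are supplied by the caller.
PREREQUISITES = {
    "MaxAvailability": {"log_xpt_modes": {"SYNC", "FASTSYNC"}, "needs_lag_limit": False},
    "MaxPerformance": {"log_xpt_modes": {"ASYNC"}, "needs_lag_limit": True},
    "MaxProtection": {"log_xpt_modes": {"SYNC"}, "needs_lag_limit": False},
}


def missing_prerequisites(mode, log_xpt_mode, target_set, flashback_on, lag_limit=None):
    """Return the names of the prerequisites that are not fulfilled."""
    rules = PREREQUISITES[mode]
    missing = []
    if log_xpt_mode.upper() not in rules["log_xpt_modes"]:
        missing.append("LogXptMode")
    if not target_set:
        missing.append("FastStartFailoverTarget")
    # Assumption of this sketch: a lag limit must be passed in explicitly.
    if rules["needs_lag_limit"] and lag_limit is None:
        missing.append("FastStartFailoverLagLimit")
    if not flashback_on:
        missing.append("Flashback Database")
    return missing
```

An empty list means that, as far as this simplified model goes, fast-start failover could be enabled for the chosen protection mode.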

Fast-Start Failover – Observer (1)
Monitoring component, initiates a failover procedure.
In 12.2, up to 3 observers (in background) can be started:
– One master and up to two backup (standby) observers.
DGMGRL> START OBSERVER OBS1.TRIVADIS.COM IN BACKGROUND
FILE IS '$ADMIN_SID/fsfo_$ORACLE_SID.dat'
LOGFILE IS '$ADMIN_SID/fsfo_$ORACLE_SID.log'
CONNECT IDENTIFIER IS <Alias1>.TRIVADIS.COM;
(Oracle wallet required.)
In older releases:
– Only one running observer (HA needs to be addressed).
nohup dgmgrl -logfile $ADMIN_SID/fsfo_$ORACLE_SID.log <<EOD &
CONNECT $CONNECT_DATA
START OBSERVER FILE='$ADMIN_SID/fsfo_$ORACLE_SID.dat';
EOD
Fast-Start Failover – Observer (2)
Fast-start failover is initiated by the master observer to the target standby database if one of the following conditions is detected:
– The observer and the target standby database cannot reach the primary database (default: ObserverOverride='FALSE').
– A user-configurable condition is met.
– The DBMS_DG.INITIATE_FS_FAILOVER function has been executed.

Additionally, other preconditions enforced by the protection mode need to be fulfilled:
– MaxProtection/MaxAvailability: target failover standby is in SYNC.
– MaxPerformance: FastStartFailoverLagLimit not reached for the target failover standby.
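
The protection-mode preconditions can be written down as a small decision function. This is a simplified model rather than broker code: the function and parameter names are hypothetical, and treating a lag exactly equal to the limit as still acceptable is an assumption.

```python
def fsfo_precondition_met(protection_mode, standby_in_sync=False,
                          lag_sec=None, lag_limit_sec=None):
    """Model of the protection-mode precondition for a fast-start failover."""
    if protection_mode in ("MaxProtection", "MaxAvailability"):
        # Zero data loss modes: the target failover standby must be in SYNC.
        return standby_in_sync
    if protection_mode == "MaxPerformance":
        # Data loss is tolerated, but only up to FastStartFailoverLagLimit.
        if lag_sec is None or lag_limit_sec is None:
            return False
        return lag_sec <= lag_limit_sec  # boundary handling is an assumption
    raise ValueError(f"unknown protection mode: {protection_mode}")
```

For example, a MaxPerformance configuration with a lag limit of 30 seconds would, in this model, refuse the failover while the target standby is 45 seconds behind.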
Data Guard: Role-Based Services

For a Data Guard system, we need a role-based service that is running only if the database has a specific role:
– Read-write service on a primary database.
– Optionally, a service on standby databases for reporting.
– Optionally, a service on snapshot standby databases.

To accomplish this task:
– Use Oracle Grid Infrastructure role-based services.
– Create your own AFTER STARTUP ON DATABASE trigger.

[Figure: database clients connecting to an R/W service and to an R/O [SNAP] service.]

Data Guard: Example Role-Based Services

Example role-based services with Grid Infrastructure:

srvctl add service -db DB_SITE1 -service SRV_RW.trivadis.com -role PRIMARY
srvctl add service -db DB_SITE1 -service SRV_RO.trivadis.com -role PHYSICAL_STANDBY
srvctl add service -db DB_SITE1 -service SRV_SP.trivadis.com -role SNAPSHOT_STANDBY

Services are started only if database and service role match.

SvcAgent::start 680 query_db_role
SvcAgent::start 710 not starting service srv_rw Role mismatch - Service role:PRIMARY, current DB role:PHYSICAL_STANDBY

Depending on the client HA features used (TAF, FAN/FCF, AC), additional service properties need to be specified.
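
The role-matching rule is easy to model. The sketch below only mirrors the rule "start a service if its role equals the current database role"; it is not how the Grid Infrastructure agent is implemented, and the dictionary and function names are made up for the example.

```python
# Service-to-role assignment taken from the srvctl example.
SERVICE_ROLES = {
    "SRV_RW.trivadis.com": "PRIMARY",
    "SRV_RO.trivadis.com": "PHYSICAL_STANDBY",
    "SRV_SP.trivadis.com": "SNAPSHOT_STANDBY",
}


def services_to_start(current_db_role, service_roles=SERVICE_ROLES):
    """Return the services whose configured role matches the database role."""
    return sorted(name for name, role in service_roles.items()
                  if role == current_db_role)
```

After a role change (switchover or failover) the same rule is evaluated with the new role, so the read-write service follows the primary role to the other site.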

Client-Side Configuration – Main Problems To Address
CASE 1: New network session (connect)
1 IP not reachable (server/network/… issue)
2 Connect attempts
3 Wait for connect timeout
4 Client failover

CASE 2: Already established network session
1 Connected
2 IP not reachable (server/network/… issue)
3 Re-connect attempts
4 Wait for re-connect timeout
5 Client failover

[The figure marks the attempt and timeout steps of both cases as problems.]

New Oracle Net Session – Connect Timeout (1)
sqlnet.ora parameters (OCI, ODP.net)
– Applies to each IP that a host name resolves to!
– All Oracle client versions supported.
TCP.CONNECT_TIMEOUT=3 #default 60 sec.
SQLNET.OUTBOUND_CONNECT_TIMEOUT=5 #no default

[Figure: Oracle Net connect from the client to the listeners (LSNR), including the TCP three-way handshake.]

For clients >=11.2:
OLTP.trivadis.com =
  (DESCRIPTION =
    (FAILOVER=ON) (LOAD_BALANCE=OFF)
    (CONNECT_TIMEOUT=5)(RETRY_COUNT=3)(RETRY_DELAY=1)(TRANSPORT_CONNECT_TIMEOUT=3)
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP )(HOST = italy )(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP )(HOST = sweden )(PORT = 1521)))
    (CONNECT_DATA = (SERVICE_NAME = OLTP_RW.trivadis.com)))
[Slide callout on this descriptor: introduced in 12.1.0.2.]
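
To get a feeling for how long a client may be blocked with such a descriptor, the parameters can be combined into a rough upper bound. This is a back-of-the-envelope sketch under several assumptions: each host resolves to exactly one IP, every attempt runs into the full CONNECT_TIMEOUT, the address list is traversed RETRY_COUNT + 1 times, and RETRY_DELAY is spent once between two traversals. The function name is hypothetical.

```python
def worst_case_connect_seconds(addresses, connect_timeout, retry_count, retry_delay):
    """Rough upper bound for a connect attempt that never succeeds."""
    passes = retry_count + 1                  # first traversal plus the retries
    waiting = passes * addresses * connect_timeout
    delays = retry_count * retry_delay        # one delay between two traversals
    return waiting + delays


# Values from the OLTP.trivadis.com alias: two addresses, CONNECT_TIMEOUT=5,
# RETRY_COUNT=3, RETRY_DELAY=1.
estimate = worst_case_connect_seconds(2, 5, 3, 1)   # 4 * 2 * 5 + 3 * 1 = 43
```

Under these assumptions a session could wait roughly 43 seconds before the error is returned, which is worth comparing with the expected duration of a failover.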

New Oracle Net Session – Connect Timeout (2)

JDBC Thin driver
– TRANSPORT_CONNECT_TIMEOUT is available beginning with version 12.2.
– To use RETRY_COUNT with 12.1.0.2, a patch is required (BUG 19154304).
pds.setURL("jdbc:oracle:thin:@(DESCRIPTION =(FAILOVER=ON)(LOAD_BALANCE=OFF)" +
"(CONNECT_TIMEOUT=3)(RETRY_COUNT=10)(RETRY_DELAY=1)" +
"(ADDRESS_LIST = " +
"(ADDRESS = (PROTOCOL = TCP )(HOST = blue.trivadis.com )(PORT = 1521)) " +
"(ADDRESS = (PROTOCOL = TCP )(HOST = brown.trivadis.com )(PORT = 1521))) " +
"(CONNECT_DATA = (SERVICE_NAME = sales_rw.trivadis.com)))");

JDBC Thin clients can alternatively use the following driver property (value in ms):
– Overrides CONNECT_TIMEOUT from the address description parameters.
Properties prop = new Properties();
prop.put(oracle.net.ns.SQLnetDef.TCP_CONNTIMEOUT_STR, ""+3000);
ods.setConnectionProperties(prop);

Established Oracle Net Session – Re-Connect Timeout
Break the established network connection without waiting for long TCP timeouts (>15 min.).
– In most cases no VIPs in use!

[Figure: established Oracle Net session to a listener (LSNR); after the server becomes unreachable, the client waits for a timeout and then fails over.]

Using the following parameters is not a good idea:
SQLNET.RECV_TIMEOUT=30 #no default value, OCI driver
SQLNET.SEND_TIMEOUT=30 #no default value, OCI driver

prop.put("oracle.jdbc.ReadTimeout", "5000"); //5000 ms, JDBC Thin driver

Better solution:
– If possible, use Fast Application Notification/Fast Connection Failover.
– Tuning the OS kernel parameter tcp_retries2 might also be an alternative.
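
The ">15 min." figure can be reproduced with a short calculation. The sketch assumes Linux behavior: the retransmission timeout starts at 200 ms, doubles after every unanswered retransmission and is capped at 120 seconds. These constants and the function name are assumptions of the example; the real timeout depends on the measured round-trip time.

```python
def hypothetical_tcp_timeout(tcp_retries2, rto_min=0.2, rto_max=120.0):
    """Seconds until an unanswered established connection is given up."""
    total = 0.0
    rto = rto_min
    # One wait per retransmission plus the wait after the last one.
    for _ in range(tcp_retries2 + 1):
        total += min(rto, rto_max)
        rto *= 2  # exponential backoff
    return total


default_timeout = hypothetical_tcp_timeout(15)  # about 924.6 s, i.e. ~15.4 min
tuned_timeout = hypothetical_tcp_timeout(5)     # about 12.6 s
```

With the default of 15 retries the model yields about 924.6 seconds, while a lower value such as 5 shortens the wait to about 12.6 seconds. Keep in mind that the setting is system-wide and affects every TCP connection on that host.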

Client HA Features – Overview

Transparent Application Failover:
– Can be used with Data Guard the same way (advantages/disadvantages) as with a cluster.
Fast Application Notification/Fast Connection Failover:
– Oracle Grid Infrastructure is required to register with ONS.
– Compared to RAC, "only" rapid notification about up/down events, no workload balancing.
Application Continuity can be used with Data Guard the same way as with a cluster:
– But it requires the RAC, RAC One Node or ADG (GG) option.

More about this topic:
– DOAG 2016 presentation: "Oracle Client Failover - Under the Hood"
Special Cases

Special Cases with FSFO: Candidate Targets Standby

Starting with 12.2, multiple candidate fast-start failover target databases can be specified, but switchover or FSFO works only to the current target standby database.
DGMGRL> EDIT DATABASE db_site1 SET PROPERTY FastStartFailoverTarget = 'db_site2,db_site3';
The current target depends on many conditions.
Threshold: 60 seconds
Target: db_site2
Candidate Targets: db_site2,db_site3
Observers: (*) obs1.trivadis.com
obs2.trivadis.com

[Figure: FSFO and switchover from db_site1 are possible to db_site2 only; db_site3 is a candidate target.]

DGMGRL> SWITCHOVER TO db_site3;
Error: ORA-16655: specified standby database not the current fast-start failover target standby.
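
The rule behind this error can be expressed as a small check, which might be useful in a wrapper script before a switchover is attempted. This is a sketch of the rule as described on the slide, not of the broker's logic; the function name and the returned texts are invented for the example.

```python
def classify_switchover(requested, current_target, target_property):
    """Classify a switchover request while fast-start failover is enabled."""
    # FastStartFailoverTarget is a comma-separated list, e.g. 'db_site2,db_site3'.
    candidates = [name.strip() for name in target_property.split(",") if name.strip()]
    if requested == current_target:
        return "permitted"
    if requested in candidates:
        return "rejected: candidate, but not the current target"
    return "rejected: not a fast-start failover candidate"
```

For the configuration shown above, a request for db_site2 would be classified as permitted, while db_site3 would be rejected as a candidate that is not the current target.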

Special Cases: Switchover – DelayMins >0
Version 11.2.0.X: recovery re-started with NODELAY option.
Version 12.1.0.X: recovery waits until DelayMins reached!
– OPEN_MODE
• Primary – CLOSED BY SWITCHOVER
• Standby – MOUNTED
– Application RW service outage within the DelayMins time frame! Application connect attempts fail with ORA-16456: switchover to standby in progress or completed
Version 12.2.0.1: switchover is not possible.
Error: ORA-16672: switchover not permitted to standby database with non-zero DelayMins
Failed.

Special Cases with FSFO: Master Observer Failure
After about 31 sec. the master observer is changed, i.e. an available backup observer is promoted to the master role.
– Note: to perform a master change, the primary database needs to be available!

Logged on the primary database:
Data Guard Broker initiated a master observer switch since the current master observer cannot reach the primary database

For maintenance, a master change can be performed manually.

DGMGRL> SET MASTEROBSERVER TO obs2.trivadis.com;
Sent the proposed master observer to the data guard broker configuration.
Please run SHOW OBSERVER to see if master observer switch actually happens.

Special Cases with FSFO: Failover Target Failure

After about 10 sec. the target failover standby is changed, i.e. a candidate target failover standby is promoted to the current target role.
Permission granted to the primary database for target switch.
The primary database returned to SYNC/NOT LAGGING state with the standby database db_site3.

Note: to perform the target failover standby change, the primary database and the master observer need to be available!
– If the master observer fails at the same time:
LGWR: FSFO SetState("UNSYNC", 0x2) operation requires an ack
Primary database will shutdown within 30 seconds if permission is not granted from Observer or FSFO target standby to proceed

Special Cases with FSFO: Master Observer/Primary
If the primary and the master observer fail:
– No failover is initiated to a candidate standby.
– From a backup observer log file:
Ready to failover check on standby returned RFS_NON_MSTOB.
Command READY_TO_FSFO to thread S024 returned status=0
Fast-Start Failover is not possible because this observer is not the master.

If the master observer is started at a later time, it waits until the FastStartFailoverThreshold timeout is reached again and fails over to the current target standby.

Special Cases with FSFO: Private Redo Network
If the public network on the primary server fails:
– Broker configuration property: ObserverOverride=FALSE.
– No failover (HB over private network still works!).
Fast-Start Failover is not possible because primary last contacted the standby within FastStartFailoverThreshold seconds

[Figure: the master observer reaches db_site1, db_site2 and db_site3 over the public network; the heartbeat (HB) between the databases runs over a private network.]

In this network configuration consider using:
DGMGRL> EDIT CONFIGURATION SET PROPERTY ObserverOverride='TRUE';
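
The effect of ObserverOverride can be summarized in a few lines. The function below is a simplified model of the decision only: it ignores FastStartFailoverThreshold timing and the protection-mode preconditions, and its name and parameters are hypothetical.

```python
def failover_condition_met(observer_sees_primary, standby_sees_primary,
                           observer_override=False):
    """Model of the 'primary unreachable' failover condition."""
    if observer_sees_primary:
        return False            # the observer still reaches the primary
    if observer_override:
        return True             # the observer's view alone is sufficient
    # Default (ObserverOverride=FALSE): the standby must have lost the primary too.
    return not standby_sees_primary
```

In the private redo network case the standby still receives the heartbeat, so the default setting blocks the failover and only ObserverOverride='TRUE' lets the observer proceed.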

Master Observer, Primary/Standby DB: Location? (1)

For DR HA service protection, do not place the primary and the master observer in the same data center!

[Figure: Data Center 1 hosts the master observer and the primary; Data Center 2 hosts the backup observer and the target standby. Result: no automatic failover, no RW application service available!]


Master Observer, Primary/Standby DB: Location? (2)

For DR HA service protection, do not place the primary and the master observer in the same data center!

[Figure: Data Center 1 hosts the backup observer, the primary and a candidate target standby; Data Center 2 hosts the master observer and the target standby. Result: automatic failover, RW application service available!]

Master observer placement, monitoring and correction:
DGMGRL> SET MASTEROBSERVER TO …

Potential placement problem: the candidate target standby. To relocate: disable & enable FSFO.
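
A placement check of this kind could be part of a custom monitoring script. The sketch below assumes that the script already knows which component runs in which data center; the mapping, the component names and the second rule (target standby next to the primary) are illustrative assumptions, and nothing is read from the broker.

```python
def placement_warnings(primary, master_observer, current_target, data_center_of):
    """Return warnings for placements that weaken service protection."""
    warnings = []
    if data_center_of[primary] == data_center_of[master_observer]:
        warnings.append("master observer is in the same data center as the primary")
    # Assumption: a target standby next to the primary does not help either
    # when that data center is lost.
    if data_center_of[primary] == data_center_of[current_target]:
        warnings.append("target standby is in the same data center as the primary")
    return warnings


# Hypothetical inventory: component name -> data center.
inventory = {
    "db_site1": "dc1", "db_site2": "dc2", "db_site3": "dc1",
    "obs1.trivadis.com": "dc1", "obs2.trivadis.com": "dc2",
}
```

With this inventory, db_site1 as primary and obs1.trivadis.com as master observer would raise the first warning, and moving the master role to obs2.trivadis.com clears it.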
Conclusions

Conclusions (1)

Can Data Guard be a good solution for database service high availability?
– Yes, with a fast-start failover configuration.
– However, it is not a replacement for a cluster but rather an alternative.
– Careful business requirements analysis is necessary.
Advantages:
– It offers good service high availability, in addition to excellent data high availability and some other features.
– Fairly simple solution (setup and operation).
– Not subject to additional license fees (EE license assumed).
– Infrastructure requirements are not as high as for a cluster.
– Most client HA features can be used the same way as with a cluster.
Conclusions (2)

Disadvantages:
– Component placement is critical and requires customized monitoring scripts.
– Some technical restrictions like network latencies (SYNC), flashback database or
force logging might limit Data Guard in this area.
– Re-connect timeouts without FAN/FCF (no VIPs).

Trivadis @ DOAG 2017
#opencompany
Booth: 3rd Floor – next to the escalator

We share our know-how!
Just come by: live presentations and documents archive, T-shirts, contest and much more.
We look forward to your visit.
