
TRN4078

Best Practices for Application High Availability



10/25/2018 10:00 Moscone West - Room 3009

Make your applications run without interruption. Leverage Oracle Database 18c
High Availability and Application Continuity features for handling planned and
unplanned outages. This session demystifies the Oracle technologies that can be
used with language APIs to build resilient applications. Learn what FAN, DG, TAF,
TG, AC, RLB, Oracle RAC, and Oracle Connection Manager’s Traffic Director Mode
are and where to use them. Examples are applicable to C, Python, Node.js,
ODP.NET, Java, PHP, and other scripting languages.

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | 1


Application High Availability
Best Practices and New Features

Nancy Ikeda, Consulting Technical Staff


Kevin Neel, Consulting Technical Staff
Data Access Development, Oracle Database
October 25, 2018

Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, timing, and pricing of any
features or functionality described for Oracle’s products may change and remains at the
sole discretion of Oracle Corporation.

Application High Availability

•  Goal: Provide infrastructure to make applications highly available


•  What is High Availability
– who has to care about it?
– what are the choices?
•  Database options, FAN, TAF, etc.

•  Planned vs. unplanned outages



Program Agenda

1 Motivation
2 Connecting to Oracle
3 Detecting HA Events
4 Re-directing Database Work for Planned Maintenance
5 Recovering from an Unplanned Outage


Application High Availability
What if my application does not address high availability?

• Applications hang on TCP/IP timeouts
• Connection requests hang when services are down
• Not connecting when services resume
• Receiving errors during planned maintenance
• Attempting work at slow, hung, or dead nodes

Performance issues not reported in your favorite tools.



Goal of the High Availability (HA) Infrastructure
Provide capabilities for application developers to . . .

•  Mask the effects of an outage from the end user


– Planned maintenance, e.g. patching
– Unplanned outage, e.g. node failure
•  Move work to different instance/database with no errors reported to end
users
•  Enable HA capabilities with configuration and deployment settings
•  Reduce/Eliminate need for custom code

Applications Need a Highly Available Database
Oracle Database is the Gold Standard for HA

•  Node Replication
– RAC – Real Application Clusters
– DG – Data Guard
– ADG – Active Data Guard
– PDB – Pluggable Database clones
•  A node is always available to get new connections




Oracle Net Configuration for Application Connections
Gracefully handle temporary service unavailability; use dynamic database services

alias = (DESCRIPTION_LIST =
  (DESCRIPTION =
    (RETRY_COUNT=10)(RETRY_DELAY=5)
    (ADDRESS_LIST = (ADDRESS = . . .)(ADDRESS = . . .))
    (CONNECT_DATA = (SERVICE_NAME = hr_svc))))

•  RETRY_COUNT and RETRY_DELAY: retry while the service is unavailable
•  ALWAYS use a service that is NOT the default database service
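Oracle Net performs the retries itself once RETRY_COUNT and RETRY_DELAY are set in the connect descriptor, so no application code is needed. Purely to illustrate what those two settings mean, here is a minimal Python model of the behavior. The function name `connect_with_retry` and the `connect` callable are hypothetical, and the sketch assumes one initial attempt followed by up to RETRY_COUNT retries with RETRY_DELAY seconds between them.

```python
import time


def connect_with_retry(connect, retry_count=10, retry_delay=5, sleep=time.sleep):
    """Call connect() until it succeeds or the retry budget is used up.

    Returns (connection, attempts_made). Assumption: one initial attempt
    plus up to retry_count retries, pausing retry_delay seconds between
    attempts. This models the descriptor settings; it is not Oracle Net code.
    """
    last_error = None
    for attempt in range(1, retry_count + 2):
        try:
            return connect(), attempt
        except ConnectionError as exc:
            last_error = exc
            # Pause only if another attempt is still allowed.
            if attempt <= retry_count:
                sleep(retry_delay)
    raise last_error
```

Under these assumptions, the slide's values (10 retries, 5 seconds apart) let a client ride out roughly 50 seconds of service unavailability before an error reaches the application.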



How Should Your Application Connect to Oracle?
Use an Oracle-supplied pool

OCI Session Pool Example (also applies to open source drivers)

Main Thread:
  OCISessionPoolCreate(. . .);
Worker Thread:
  OCISessionGet(. . .);
  [database request]
  OCISessionRelease(. . .);

ODP.NET Pooling Example

OracleConnection con = new OracleConnection();
con.ConnectionString = "User Id=hr; Password=hr; Data Source=hrdb; Pooling=true";
con.Open();
[database request]
con.Dispose();


Connect via CMAN-TDM (Traffic Director Mode)


•  Supports 11gR2 (11.2.0.4) and newer client and database
•  Optional internal OCI Session Pool (PRCP) for multiplexing and HA support
•  Shields application from planned and unplanned outages

[Diagram: C and C++ applications and drivers (11gR2 and up) connect through CMAN-TDM to Oracle Database 18c, cloud and on-premise]



CMAN-TDM Configuration for the end client

•  If the client uses inst1_tdm to connect, then the following entry is needed in the client's
tnsnames.ora:
inst1_tdm=(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=tdmhost01)(PORT=3000))
(CONNECT_DATA=(SERVICE_NAME=hrsvc)))
–  tdmhost01 is the machine where CMAN-TDM is hosted (not the DB) and 3000 is the CMAN-TDM listening port
–  hrsvc is the dynamic database service created by srvctl or DBMS_SERVICE

•  Example: sqlplus scott/tiger@inst1_tdm

Fast Application Notification (FAN) Proven since 10g
Server initiated messages sent to interested clients

•  Down – sent out of band to invoke failover


•  Planned Down – inform application of imminent planned maintenance
•  Up – Re-allocate sessions when services resume
•  Load % - Advice to balance sessions across a RAC cluster or GDS
•  Affinity - Advice when to keep conversation locality
•  In-band – Planned down message sent via client to DB connection (new)



OCI FAN
Database configuration

Configure your dynamic database service for HA Notifications:
- RAC database:
srvctl modify service -db hr -service hrsvc -notification TRUE

- Non-RAC database:
dbms_service.modify_service('hrsvc', aq_ha_notifications => TRUE);



FAN Client Configuration
Examples

•  OCI configuration: Enable events in oraaccess.xml


–  Eliminates need to use OCI_EVENTS mode in OCIEnvCreate()
<default_parameters>
<events>true</events>
</default_parameters>

•  ODP.NET configuration in the connect string


"user id=hr; password=hr; data source=hrdb; pooling=true; HA events=true"

•  PHP configuration: Enable events in the php.ini file


oci8.events = On

Error-Free Planned Maintenance
Migrate connections away from target instance

•  Drain work from target instance


–  Supports well behaved applications using Oracle pools
ODP.NET, OCI Session Pool, PHP, . . ., WebLogic Active GridLink, UCP
– Connections migrate away from the target instance when request completes
•  Transactional Disconnect with Transaction Guard (TG) and Transparent
Application Failover (TAF)
–  Applications with one COMMIT per request
– Connection terminated at transaction boundary; TAF replaces it with new connection



Draining or Transactional Disconnect?
Understand your application

•  Applications using an Oracle pool should use draining


•  Applications using a custom pool can use draining with a code modification
•  For applications not using a pool, consider transactional disconnect + TAF
•  TAF implicitly restores the following state after a failover:
–  NLS Settings
–  DBMS_APPLICATION_INFO
–  CURRENT_SCHEMA, TIME_ZONE, EDITION,
–  SQL_TRANSLATION_PROFILE, ROW ARCHIVAL VISIBILITY, ERROR_ON_OVERLAP_TIME
–  CONTAINER and SERVICE for PDBs



Graceful Planned Maintenance with Draining -- How it works
Applications use: Oracle pools – OCI Session Pool, open source, ODP.NET, Tuxedo

DBA runs:
srvctl relocate|stop service -db … -service … [-drain_timeout …] (no -force)

FAN Planned: sessions drain (pools drain sessions as work completes)
Immediately:
- New work is redirected by listeners
- Idle sessions are released
Active sessions are released when returned to pools

DBA completes:
Wait to allow sessions to drain, e.g. 10-30 minutes
After drain timeout (optional):
exec dbms_service.disconnect_session('svcname', …, DBMS_SERVICE.POST_TRANSACTION);
Shutdown:
shutdown immediate
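The draining sequence above can be pictured with a small simulation. This is only a sketch of the idea, not how any Oracle pool is implemented: the `Conn` and `DrainingPool` classes and the `on_planned_down` method are hypothetical names, and a real Oracle pool reacts to the FAN planned event without any application code.

```python
from dataclasses import dataclass


@dataclass
class Conn:
    instance: str          # database instance this connection is attached to
    in_use: bool = False
    closed: bool = False


class DrainingPool:
    """Toy pool that drains one instance when told maintenance is planned."""

    def __init__(self, conns):
        self.conns = list(conns)
        self.draining = set()

    def on_planned_down(self, instance):
        # Idle sessions on the target instance are released immediately.
        self.draining.add(instance)
        for conn in self.conns:
            if conn.instance == instance and not conn.in_use:
                conn.closed = True
        self._purge()

    def acquire(self):
        # New work is only handed connections on instances not being drained.
        for conn in self.conns:
            if not conn.in_use and conn.instance not in self.draining:
                conn.in_use = True
                return conn
        raise RuntimeError("no connection available")

    def release(self, conn):
        # Active sessions are released when they are returned to the pool.
        conn.in_use = False
        if conn.instance in self.draining:
            conn.closed = True
            self._purge()

    def _purge(self):
        self.conns = [c for c in self.conns if not c.closed]
```

The point of the sketch is that no request is interrupted: a connection that is busy when the planned event arrives finishes its work and is closed only on return to the pool.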



Draining for OCI Applications With Custom Pools
What if my application does not use an Oracle pool?

•  Call OCIAttrGet() with attribute OCI_ATTR_SERVER_STATUS to determine connection health
– OCI_SERVER_NORMAL
– OCI_SERVER_NOT_CONNECTED

•  At check-out from the pool, do not dispense a failed connection

•  Optional: monitor pool to terminate failed idle connections
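As a language-neutral illustration of these three bullets, the following Python sketch shows a custom pool that validates connections at check-out and can reap failed idle connections. Everything here is hypothetical: `CustomPool`, `factory`, and `is_healthy` are illustrative names, and `is_healthy` stands in for the OCIAttrGet() check of OCI_ATTR_SERVER_STATUS that an OCI application would make.

```python
from collections import deque


class CustomPool:
    """Toy custom pool that never dispenses a failed connection."""

    def __init__(self, factory, is_healthy, size):
        self._factory = factory          # creates a new connection
        self._is_healthy = is_healthy    # stand-in for the server status check
        self._idle = deque(factory() for _ in range(size))

    def checkout(self):
        # Skip (and drop) idle connections that fail the health check.
        while self._idle:
            conn = self._idle.popleft()
            if self._is_healthy(conn):
                return conn
        # Nothing usable was idle, so open a fresh connection.
        return self._factory()

    def checkin(self, conn):
        # Only healthy connections go back into the idle set.
        if self._is_healthy(conn):
            self._idle.append(conn)

    def reap_idle(self):
        # Optional monitor step: drop failed idle connections, return the count.
        before = len(self._idle)
        self._idle = deque(c for c in self._idle if self._is_healthy(c))
        return before - len(self._idle)
```

A background thread calling `reap_idle()` periodically would correspond to the optional monitoring bullet; the check-out test alone is enough to keep failed connections away from application requests.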



Planned Maintenance with Transactional Disconnect -- How it works

Applications use: Applications with one COMMIT per request

DBA runs:
srvctl relocate|stop service -db … -service … [-drain_timeout …] -stopoption TRANSACTIONAL -force

FAN Down: sessions fail over (sessions fail over as transactions complete)
Immediately:
- New work is redirected by listeners
At transaction boundary:
- TAF fails over active sessions and restores state

DBA completes:
Wait to allow sessions to drain, e.g. 10-30 minutes
Shutdown:
shutdown immediate



Planned Maintenance with Transactional Disconnect
Database service configuration for TAF, Transaction Guard and FAILOVER_RESTORE
- RAC database:
srvctl modify service -db hr -service hrsvc -failovertype SELECT
-failovermethod BASIC -failoverretry 20 -failoverdelay 5 -failover_restore
LEVEL1 -commit_outcome true

- Non-RAC database:
declare
  params dbms_service.svc_parameter_array;
begin
  params('failover_type') := 'SELECT';
  params('failover_method') := 'BASIC';
  params('failover_delay') := 5;
  params('failover_retries') := 20;
  params('commit_outcome') := 'true';
  params('failover_restore') := 'LEVEL1';
  dbms_service.modify_service('hrsvc', params);
end;

HA with Oracle CMAN in Traffic Director Mode (CMAN-TDM)

•  CMAN-TDM responds to server-initiated planned events and re-directs work
–  Out-of-band events are sent by Oracle Notification Services (ONS) for service relocation
–  In-band notifications are sent for PDB relocation
•  In pooling mode, connections are drained from the source instance when the request
completes
•  In dedicated mode, CMAN-TDM leverages TAF for transactional disconnect

[Diagram: C and C++ applications connect through CMAN-TDM to Oracle Database; on an ONS or in-band event, the connection is re-directed to a new instance or PDB]



Planned Maintenance Summary

•  Start with a highly available Oracle database


•  Enable FAN notifications


•  Use Oracle pools whenever possible

•  Consider TAF with Transactional Disconnect if pooling is not possible

•  Consider CMAN-TDM to completely hide planned maintenance

Unplanned Outages and Application Continuity in OCI from 12.2



FAN and TAF Unplanned Support
•  Alternate database nodes take over
•  Fast Application Notification can stop hangs
•  Transparent Application Failover can handle some unplanned outages
– Can fail over during SELECT queries
– No transaction underway
– Simple session state needs



Application Continuity

In-flight work continues

•  Application automatically replays queries and transactions on recoverable errors
•  Masks most hardware, software, network, storage errors and outages
•  New in 12.2 for OCI and ODP.NET (“unmanaged”); supported since 12.1 for JDBC
•  RAC, RAC One, & Active Data Guard
•  Improves end user experience



Phases in Application Continuity – Under the hood
Normal Operation
• Pool marks database requests
• Client captures original calls, their inputs, and validation data
• Server decides which can and cannot be replayed
• At end of request, client purges queue

Replay
• Client checks replay is enabled
• Creates a new connection
• Client verifies timeliness
• Server checks replay database is valid
• Server checks if committed, rolls back if not

Replay (continued)
• Client replays captured calls
• Server and client ensure results returned to application match original
• On success, client returns control to the application
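The capture-and-replay idea behind these phases can be sketched in a few lines of Python. This is a heavily simplified model under stated assumptions, not the Oracle client implementation: the class and exception names (`ReplayingRequest`, `RecoverableError`, `ReplayMismatch`) are hypothetical, validation is reduced to comparing return values, and the Transaction Guard commit-outcome check and timeliness check are left out.

```python
class RecoverableError(Exception):
    """Stands in for an error that signals a lost but recoverable session."""


class ReplayMismatch(Exception):
    """Raised when a replayed call returns a different result than before."""


class ReplayingRequest:
    """Toy model of one database request with call capture and replay."""

    def __init__(self, connect):
        self._connect = connect
        self._conn = connect()
        self._queue = []              # captured (call name, args, result)
        self._replay_enabled = True

    def call(self, name, *args):
        try:
            result = getattr(self._conn, name)(*args)
        except RecoverableError:
            if not self._replay_enabled:
                raise
            self._replay()            # rebuild state on a new connection
            result = getattr(self._conn, name)(*args)
        if self._replay_enabled:
            self._queue.append((name, args, result))
        return result

    def disable_replay(self):
        # For requests with side effects that must not be repeated.
        self._replay_enabled = False
        self._queue.clear()

    def end(self):
        # End of request: the capture queue is purged.
        self._queue.clear()

    def _replay(self):
        self._conn = self._connect()
        for name, args, expected in self._queue:
            if getattr(self._conn, name)(*args) != expected:
                raise ReplayMismatch(name)
```

If the replayed calls return what the application already saw, the failed call is retried on the new connection and the application never observes the outage; any mismatch surfaces as an error instead of silently diverging.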
Database Requests
Oracle pooled connections are critical

OCI Session Pool Example

Main Thread:
  OCISessionPoolCreate(…);

Worker Thread:
  OCISessionGet(…);        <- Request begins
  OCI calls …              <- Request body, often ends with COMMIT
  OCISessionRelease(…);    <- Request ends



Database Requests
ODP.NET pooled connections
OracleConnection con = new OracleConnection();
con.ConnectionString =
"User Id=hr; Password=hr; Data Source=hrdb; Pooling=true";
con.Open();
OracleTransaction txn = con.BeginTransaction();
OracleCommand cmd = con.CreateCommand();
. . .
cmd.ExecuteNonQuery();
txn.Commit();
. . .
con.Close();


Excluded Configurations
When replay is not enabled

Application
• Connects to default database service or default PDB service (must create own service)
• XA
• Admin console (Alter system/database)

Replay Database
•  Databases able to diverge
–  Logical Standby
–  Golden Gate



Restrictions
When does replay not occur

Normal Runtime
Replay is disabled per request after
• successful commit
• a disabling user call (supported: basic SQL and LOB use)
• OCIRequestDisableReplay API
• Some ALTER SESSION operations
• ADG with read/write DB links

Replay
•  Error is not recoverable
•  Reconnection failure
–  replay initiation timeout
–  max failover retries
•  Last call committed in embedded (DDL, PL/SQL) or autocommit mode
•  Validation detects different results



Steps to use Application Continuity
Check | What to do
Installation | Have Oracle client 12.2+, and Oracle Database 12.2+ with Transaction Guard
Service Configuration | Use srvctl, gdsctl, or DBMS_SERVICE to enable AC on the service
Request Boundaries | Use Oracle pools
Side Effects | Use disable API or attribute if a request has a call that should not be replayed
Callbacks | Register a TAF callback for applications that change state outside requests
Mutable Functions | Grant keeping mutable values, e.g. sequence.nextval



Oracle Database 12.2 Configuration
Set Service Attributes

FAILOVER_TYPE = TRANSACTION for Application Continuity

Review the service attributes:


COMMIT_OUTCOME = TRUE for Transaction Guard
REPLAY_INITIATION_TIMEOUT = 300 after which replay is canceled
FAILOVER_RETRIES = 30 for the number of connection retries per replay
FAILOVER_DELAY = 3 for delay in seconds between connection retries
FAILOVER_RESTORE = LEVEL1 to restore session state to new session
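
As a rough sanity check on how these values relate, the Python sketch below multiplies the retry count by the retry delay and compares the result with the replay initiation timeout. The function name `reconnect_budget` is hypothetical, and the model is deliberately crude: it counts only the delays between connection retries, ignores the time each connection attempt takes, and treats the timeout as a simple upper bound on that waiting time.

```python
def reconnect_budget(failover_retries, failover_delay, replay_initiation_timeout):
    """Compare worst-case retry waiting time with the replay timeout.

    Simplifying assumption: only the delays between retries are counted,
    so the real elapsed time will be somewhat longer.
    """
    worst_case_wait = failover_retries * failover_delay
    return {
        "worst_case_wait_s": worst_case_wait,
        "within_timeout": worst_case_wait <= replay_initiation_timeout,
    }


# Values from the slide: 30 retries, 3 seconds apart, 300-second timeout.
budget = reconnect_budget(30, 3, 300)
```

With the slide's values, 30 retries at 3 seconds apart add up to 90 seconds of waiting, which fits inside the 300-second replay initiation timeout under this simplified model.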



Control Repetition of Side Effects
Make a conscious decision to replay side effects
e.g. Autonomous Transactions
UTL_HTTP UTL_URL UTL_MAIL
UTL_SMTP UTL_TCP
UTL_FILE UTL_FILE_TRANSFER
DBMS_JAVA callouts EXTPROC

To disable replay:
For current request in OCI:
OCIRequestDisableReplay()
For a connection in ODP.NET:
New attribute in connect string:
"User Id=scott; Password=tiger; Data Source=oracle; Application Continuity=disabled"



Grant Mutables
Keep original function results at replay

ALTER SEQUENCE.. [sequence object] [KEEP|NOKEEP];


CREATE SEQUENCE.. [sequence object] [KEEP|NOKEEP];
For other database users accessing these items :
GRANT [KEEP DATE TIME | KEEP SYSGUID].. [to  USER]
REVOKE [KEEP DATE TIME | KEEP SYSGUID][from USER]
GRANT KEEP SEQUENCE on [sequence object] [to  USER] ;
REVOKE KEEP SEQUENCE on [sequence object] [from USER]



Transparent Application Continuity (TAC)
Automates Some of the AC Tasks

Recognizes states that need to be restored (FAILOVER_RESTORE=AUTO)
Recognizes side effects that shouldn’t be replayed

To configure: FAILOVER_TYPE=AUTO



Killing Sessions - Extended
DBA Command | Replays
srvctl stop service -db orcl -instance orcl2 -force | YES
srvctl stop service -db orcl -node rws3 -force | YES
srvctl stop service -db orcl -instance orcl2 -noreplay -force | NO
srvctl stop service -db orcl -node rws3 -noreplay -force | NO
alter system kill session … immediate | YES
alter system kill session … noreplay | NO
dbms_service.disconnect_session([service], dbms_service.noreplay) | NO



Supported High Availability (HA) Deployments in Oracle 12.2
Columns: Fast Application Notification (FAN), Runtime Load Balancing (RLB), Transparent Application Failover (TAF), Transaction Guard (TG), Application Continuity (AC)

Deployment | FAN | RLB | TAF | TG | AC
Real Application Clusters (RAC, RAC One) | Yes | Yes (RAC) | Yes | Yes | Yes
Data Guard (DG) physical standby | With clusterware | No | Yes | Yes | No
Active Data Guard (ADG) | With clusterware | No | Yes | Yes | Yes
RAC+DG (physical standby) | Yes | Yes (within RAC) | Yes | Yes | Yes
Global Data Services (GDS) | Yes | Yes | Yes | Yes | Yes
Golden Gate | No | No | Yes (no DML retry) | No | No

Note: AC is available in JDBC with 12.1



Resources

Maximum Availability Architecture
http://www.oracle.com/goto/maa
