1/20/2012
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features
or functionality described for Oracle's products remains at the sole discretion of Oracle.
Agenda
The Oracle RAC Architecture
VIPs and Networks
Listeners and SCAN
and Services
Client Connectivity
Node Membership
The Interconnect
Installation and Upgrade
[Figure: cluster nodes connected by an interconnect with switch and, through a SAN switch, to shared storage]
[Figure: per-node stack: ASM instance, Oracle Grid Infrastructure (HA framework, node membership), OS]
My Oracle Support (MOS):
Note 1053147.1 - 11gR2 Clusterware and Grid Home - What You Need to Know
Note 1050908.1 - How to Troubleshoot Grid Infrastructure Startup Issues
Listeners
Listeners and dependencies
Listeners
The default LISTENER
[GRID]> srvctl config listener
Name: LISTENER
Network: 1, Owner: oracle (callout: Grid Software Owner)
Home: <CRS home>
End points: TCP:1521
[Figure: rac1 and rac2 each run ora.LISTENER.lsnr on top of ora.net1.network]
Listeners
The default LISTENER FAQ
Can you define another owner? - YES
Listeners
Can I add another listener on another port? - YES
[GRID]> srvctl config listener
Name: LISTENER
Network: 1, Owner: oracle
Home: <CRS home>
End points: TCP:1521
Name: LISTENER2011
Network: 1, Owner: oracle
Home: <CRS home>
End points: TCP:2011
[Figure: rac1 and rac2 each run two listener resources on top of ora.net1.network]
Listeners
Can I add another listener on another network? - YES
[GRID]> srvctl add listener -l ListenerK2 -p 1544 -k 2
[GRID]> srvctl config listener
Name: LISTENER
Network: 1, Owner: oracle
Home: <CRS home>
End points: TCP:1545
Name: LISTENERK2
Network: 2, Owner: oracle
Home: <CRS home>
End points: TCP:1544
[Figure: rac1 and rac2 each run ora.LISTENER.lsnr on ora.net1.network and ora.LISTENERK2.lsnr on ora.net2.network]
Listeners
Remember: it's just another listener (1)
[GRID]> srvctl config listener
Name: JUSTALISTENER
Network: 1, Owner: root
Home: <CRS home>
End points: TCP:1522
...
[GRID]> vi /u01/app/11.2.0/grid/network/admin/listener.ora
JUSTALISTENER=(DESCRIPTION=(ADDRESS_LIST=
(ADDRESS=(PROTOCOL=IPC)(KEY=JUSTALISTENER)))) # line added by Agent
[GRID]> vi /u01/app/11.2.0/grid/network/admin/endpoints_listener.ora
JUSTALISTENER_RAC1=(DESCRIPTION=(ADDRESS_LIST=
(ADDRESS=(PROTOCOL=TCP)(HOST=rac1-vip)(PORT=1522))
(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.0.51)(PORT=1522)(IP=FIRST))))
[Figure: rac1 and rac2 with LISTENER_SCAN1 / LISTENER_SCAN2 on ora.SCAN1.VIP / ora.SCAN2.VIP, the node VIPs ora.rac1.vip and ora.rac2.vip, ora.LISTENER.lsnr and ora.net1.network, on top of Oracle Grid Infrastructure and the OS]
Listeners
Remember: it's just another listener (2)
For demonstration purposes only:
[GRID]> vi /u01/app/11.2.0/grid/network/admin/listener.ora
JUSTALISTENER=(DESCRIPTION=(ADDRESS_LIST=
(ADDRESS=(PROTOCOL=IPC)(RATE_LIMIT=10)
(KEY=JUSTALISTENER)))) # line added by Agent
...
[GRID]> srvctl stop listener -l JustAListener
[GRID]> srvctl start listener -l JustAListener
[GRID]> vi /u01/app/11.2.0/grid/network/admin/listener.ora
...
JUSTALISTENER=(DESCRIPTION=(ADDRESS_LIST=
(ADDRESS=(PROTOCOL=IPC)(RATE_LIMIT=10)
(KEY=JUSTALISTENER)))) # line added by Agent
SCAN
The basics and overview
In Oracle RAC 11g Release 2, SCAN listeners were introduced
There is one SCAN setup per cluster
SCAN
The SCAN bundle (1)
[Figure: rac1, rac2 and rac3 each run one SCAN bundle (LISTENER_SCAN1..3 on ora.SCAN1.VIP..ora.SCAN3.VIP) plus ora.LISTENER.lsnr on ora.net1.network, on top of Oracle Grid Infrastructure and the OS]
3 SCANs are the default for HA and LB, regardless of the number of nodes
You can define fewer or more, if really needed.
SCAN listener resources run with an active dispersion dependency
If the cluster has more nodes than SCAN listeners are defined, no node should run more than one SCAN bundle at a time
If the cluster has fewer nodes than SCAN listeners are defined, there will be nodes running more than one SCAN bundle at a time.
The SCAN VIP moves with the listener, if possible.
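The placement rule above can be illustrated with a short Python sketch. It does not reflect how Oracle Clusterware implements the dispersion dependency; the helper `place_scan_bundles` and the round-robin choice are assumptions made purely for illustration.

```python
from collections import Counter


def place_scan_bundles(nodes, scan_count=3):
    """Spread SCAN bundles (listener + VIP) over the nodes as evenly as possible.

    Hypothetical helper: round-robin stands in for the dispersion dependency.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    placement = {}
    for i in range(scan_count):
        placement[f"LISTENER_SCAN{i + 1}"] = nodes[i % len(nodes)]
    return placement


# More nodes than SCAN listeners: no node runs more than one bundle.
four_nodes = place_scan_bundles(["rac1", "rac2", "rac3", "rac4"])
# Fewer nodes than SCAN listeners: some node runs more than one bundle.
two_nodes = place_scan_bundles(["rac1", "rac2"])

print(four_nodes)
print(Counter(two_nodes.values()))
```

With four nodes one node stays without a SCAN bundle; with two nodes one of them hosts two bundles.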
SCAN
The SCAN bundle (2)
[GRID]> srvctl config scan
SCAN name: cluster1, Network: 1/192.168.0.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /cluster1.us.oracle.com/192.168.0.41
[GRID]> srvctl modify scan
Modifies the SCAN name.
Usage: srvctl modify scan
-n <scan_name> Domain name qualified SCAN name
-h Print usage
SCAN
The SCAN bundle (3)
[GRID]> srvctl add scan -h
Adds a SCAN VIP to the Oracle Clusterware.
Usage: srvctl add scan -n <scan_name>
-n <scan_name> Domain name qualified SCAN name
-k <net_num> network number (default number is 1)
-S <subnet>/<netmask>/[if1[|if2...]] NET address spec for network
-h Print usage
Note: SCAN can only operate on one network / in one subnet.
SCAN
The SCAN FAQ (1)
From MOS Note 220970.1 - RAC: Frequently Asked Questions
How to use SCAN and node listeners with different ports?
With Oracle RAC 11g Release 2 using SCAN is the default.
As with other listeners, there is no direct communication between the node listeners and the SCAN listeners.
Listeners are only aware of the instances and services served, since the instances (PMON) register themselves and the services they host with the listeners.
The instances use the LOCAL_LISTENER and REMOTE_LISTENER parameters to know with which listeners to register.
Listeners used for a client connection to Oracle RAC should be managed by Oracle Clusterware and should be listening on an Oracle managed VIP.
SCAN
The SCAN FAQ (2)
Can you define another port? - YES
See MOS Note 220970.1 - RAC: Frequently Asked Questions:
How to use SCAN and node listeners with different ports?
Use srvctl modify scan_listener -p <newPort>
Can you define another owner? - NO
Can you define another home? - NO
Can you have more than one node listener with SCAN? - YES
Can the SCAN and the node listener ports differ? - YES
The use of the TNSNAMES connector string is the default:
local_listener = '(DESCRIPTION= (ADDRESS_LIST=
(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.0.61)(PORT=2011))
))'
[Figure: DB instances on two nodes with LISTENER_SCAN1 on ora.SCAN1.VIP, ora.LISTENER.lsnr on ora.net1.network and ora.LISTENERK2.lsnr on ora.net2.network]
If listeners in different subnets are used, use LISTENER_NETWORKS:
http://download.oracle.com/docs/cd/E11882_01/server.112/e17110/initparams115.htm#REFRN10313
Note: Listeners specified by the LISTENER_NETWORKS parameter should not be used in the LOCAL_LISTENER and REMOTE_LISTENER parameters. Otherwise, cross registration will happen and connections will be redirected across networks.
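The note above amounts to a simple overlap check, sketched here in Python. The dictionary layout and the listener aliases (`LISTENERK2_RAC1`, `LISTENER_RAC1`, and so on) are hypothetical stand-ins for the real parameter values, not Oracle syntax.

```python
def cross_registered(listener_networks, local_listener, remote_listener):
    """Return aliases listed in LISTENER_NETWORKS that also appear in
    LOCAL_LISTENER or REMOTE_LISTENER (hypothetical in-memory representation)."""
    in_networks = set()
    for entry in listener_networks.values():
        in_networks.update(entry.get("local", []))
        in_networks.update(entry.get("remote", []))
    return sorted(in_networks & (set(local_listener) | set(remote_listener)))


# Hypothetical aliases for a second network served by LISTENERK2.
networks = {
    "net2": {"local": ["LISTENERK2_RAC1"], "remote": ["LISTENERK2_REMOTE"]},
}

clean = cross_registered(networks, ["LISTENER_RAC1"], ["MySCAN:1521"])
clash = cross_registered(networks, ["LISTENERK2_RAC1"], ["MySCAN:1521"])
print(clean)
print(clash)
```

An empty result means no listener is registered through both mechanisms; any alias returned would be a candidate for cross registration.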
Currently there is no support for service failover:
Between Server Pools
Between networks
START_DEPENDENCIES=hard(ora.orcl.db,type:ora.cluster_vip_net1.type) weak(type:ora.listener.type) pullup(type:ora.cluster_vip_net1.type) pullup:always(ora.orcl.db) dispersion(type:ora.service.type)
STOP_DEPENDENCIES=hard(intermediate:ora.orcl.db,intermediate:type:ora.cluster_vip_net1.type)
...
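To read such a dependency attribute more easily, it can be split into its kinds and targets. The following Python sketch is only a reading aid; `parse_dependencies` is a hypothetical helper and not part of any Oracle tool.

```python
import re


def parse_dependencies(value):
    """Split a dependency attribute value into (kind, [targets]) pairs."""
    pairs = []
    # A kind such as "hard" or "pullup:always" is followed by "(target,target,...)".
    for kind, targets in re.findall(r"([a-z:]+)\(([^)]*)\)", value):
        pairs.append((kind, [t.strip() for t in targets.split(",") if t.strip()]))
    return pairs


start = (
    "hard(ora.orcl.db,type:ora.cluster_vip_net1.type) "
    "weak(type:ora.listener.type) "
    "pullup(type:ora.cluster_vip_net1.type) "
    "pullup:always(ora.orcl.db) "
    "dispersion(type:ora.service.type)"
)
deps = parse_dependencies(start)
for kind, targets in deps:
    print(kind, targets)
```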
Copyright 2011, Oracle and/or its affiliates. All rights reserved.
Client Connectivity
Client Connectivity
Direct or indirect connect
Connect Time Load Balancing (CTLB)
Connect Time Connection Failover (CTCF)
[Figure: clients connect to the BATCH, Production and Email services through SCAN, either directly or via a connection pool]
Client Connectivity
Connect Time Connection Failover
jdbc:oracle:thin:@MySCAN:1521/Email
PMRAC =
(DESCRIPTION =
(FAILOVER=ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = MySCAN)(PORT = 1521))
(CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = Email)))
Client Connectivity
Runtime Connection Failover (RTCF)
PMRAC =
(DESCRIPTION =
(FAILOVER=ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = MySCAN)(PORT = 1521))
(CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = Email)
...))
Client Connectivity
Runtime Connection Failover (RTCF)
PMRAC =
(DESCRIPTION =
(FAILOVER=ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = MySCAN)(PORT = 1521))
(CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = Email)
(FAILOVER_MODE= (TYPE=select)(METHOD=basic)(RETRIES=180)(DELAY=5))))
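As a rough worked example, RETRIES=180 with DELAY=5 lets the client keep retrying for about 180 x 5 = 900 seconds (15 minutes), not counting the time each attempt itself takes. The Python sketch below rebuilds the descriptor shown above and computes that window; both function names are hypothetical helpers, not an Oracle API.

```python
def taf_retry_window(retries, delay_seconds):
    """Approximate time (seconds) spent waiting between failover attempts."""
    return retries * delay_seconds


def build_descriptor(host, port, service, retries, delay):
    """Assemble a connect descriptor with a FAILOVER_MODE clause as on the slide."""
    return (
        "(DESCRIPTION=(FAILOVER=ON)"
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))"
        f"(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME={service})"
        f"(FAILOVER_MODE=(TYPE=select)(METHOD=basic)(RETRIES={retries})(DELAY={delay}))))"
    )


descriptor = build_descriptor("MySCAN", 1521, "Email", retries=180, delay=5)
window = taf_retry_window(180, 5)
print(descriptor)
print(window, "seconds")
```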
Client Connectivity
More information
If problems occur, see:
Note 975457.1 How to Troubleshoot Connectivity Issues with 11gR2 SCAN Name
For more advanced configurations, see:
Note 1306927.1 Using the TNS_ADMIN variable and changing the default port
number of all Listeners in an 11.2 RAC for an 11.2, 11.1, and 10.2 Database
Client Connectivity
Two ways to protect the client
1. Transparent Application Failover (TAF)
Tries to make the client unaware of a failure
Provides means of CTCF and RTCF
Allows for pure selects (reads) to continue
Write transactions need to be re-issued
The Application needs to be TAF aware
2. Fast Application Notification (FAN)
FAN wants to inform clients ASAP
Client can react to failure ASAP
Expects clients to re-connect on failure (FCF)
Sends messages about changes in the cluster
Client Connectivity
Use a FAN aware connection pool (1)
If a connection pool is used:
The clients (users) get a physical connection to the connection pool
The connection pool creates a physical connection to the database
It is a direct client to the database
Internally the pool maintains logical connections
[Figure: clients use a connection pool that connects through MySCAN to the BATCH, Production and Email services]
Client Connectivity
Use a FAN aware connection pool (2)
The connection pool:
Invalidates connections to one instance
Re-establishes new logical connections
May create new physical connections
Prevents new clients from being misrouted
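The pool behaviour listed above can be sketched as a toy model in Python. `FanAwarePool` and `on_down_event` are hypothetical names; a real FAN aware pool subscribes to actual FAN events, which this sketch only simulates with a method call.

```python
class FanAwarePool:
    """Toy pool: each logical connection is tagged with the instance it uses."""

    def __init__(self, instances, size):
        self.instances = list(instances)
        self.connections = [
            self.instances[i % len(self.instances)] for i in range(size)
        ]

    def on_down_event(self, instance):
        """React to a (simulated) 'instance down' notification."""
        # Stop routing new clients to the failed instance.
        self.instances.remove(instance)
        invalidated = [c for c in self.connections if c == instance]
        kept = [c for c in self.connections if c != instance]
        if not self.instances:
            self.connections = []
            return len(invalidated)
        # Re-establish the invalidated connections on surviving instances.
        replaced = [
            self.instances[i % len(self.instances)]
            for i in range(len(invalidated))
        ]
        self.connections = kept + replaced
        return len(invalidated)


pool = FanAwarePool(["Instance1", "Instance2", "Instance3"], size=6)
dropped = pool.on_down_event("Instance2")
print(dropped, pool.connections)
```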
Client Connectivity
The Load Balancing (LB) cases
Connect Time Load Balancing (CTLB)
Runtime Connection Load Balancing (RTLB)
Client Connectivity
Connect Time Load Balancing (CTLB) on the client side
PMRAC =
(DESCRIPTION =
(FAILOVER=ON)(LOAD_BALANCE=ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = MySCAN)(PORT = 1521))
(CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = Email)))
Client Connectivity
Connect Time Load Balancing (CTLB) on the server side
Traditionally, PMON dynamically registers the services to the specified listeners with:
Service names for each running instance of the database and instance names for the DB
The listener is updated with the load information for every instance and node as follows:
1-Minute OS Node Load Average, every 30 seconds
Number of Connections to Each Instance
Number of Connections to Each Dispatcher
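How a listener might use this load information can be sketched as follows. The ranking rule (lowest node load first, then fewest connections) is an illustrative assumption and not the listener's documented algorithm; `pick_instance` and the sample numbers are hypothetical.

```python
def pick_instance(load_info):
    """Pick the instance with the lowest node load, then the fewest connections.

    Illustrative ordering only; not the listener's actual algorithm.
    """
    return min(
        load_info,
        key=lambda name: (load_info[name]["node_load"], load_info[name]["connections"]),
    )


# Hypothetical load figures as reported to the listener.
load_info = {
    "Instance1": {"node_load": 1.8, "connections": 40},
    "Instance2": {"node_load": 3.5, "connections": 75},
    "Instance3": {"node_load": 0.2, "connections": 12},
}
chosen = pick_instance(load_info)
print(chosen)
```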
Client Connectivity
Use FAN for the Load Balancing cases
Connect Time Load Balancing (CTLB)
Connect Time Connection Failover (CTCF)
[Figure: Instance1, Instance2 and Instance3 report their load ("I'm busy", "I'm very busy", "I'm idle") and receive different shares of the connections (10% and 60% are shown)]
Client Connectivity
Use FAN for the Load Balancing cases
Connect Time Load Balancing (CTLB)
Runtime Connection Load Balancing (RTLB)
Also via AQ (Advanced Queuing) based notifications
Background is always the Load Balancing Advisory
[Figure: the instances of the RAC database report their load ("I'm busy", ...) and connections arriving via MySCAN are split accordingly (30% and 10% are shown)]
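A pool that follows such an advisory could split its connection requests proportionally, as in this Python sketch. The percentages are the ones shown on the slides; their assignment to particular instances, the helper `distribute`, and the rounding rule are assumptions for illustration.

```python
def distribute(requests, advisory_percent):
    """Split a number of connection requests according to advisory percentages."""
    shares = {name: requests * pct // 100 for name, pct in advisory_percent.items()}
    leftover = requests - sum(shares.values())
    # Hand any rounding leftover to the instances with the highest percentage.
    ranked = sorted(advisory_percent, key=advisory_percent.get, reverse=True)
    for name in ranked[:leftover]:
        shares[name] += 1
    return shares


# Assumed mapping of the percentages to instances.
advisory = {"Instance1": 30, "Instance2": 10, "Instance3": 60}
shares = distribute(50, advisory)
print(shares)
```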
For more information, see:
Node Membership
[Figure: two nodes, each running CSSD, ping each other over the network and ping the voting disk over the SAN]
See http://www.oracle.com/goto/rac: Using standard NFS to support a third voting file for extended cluster configurations (PDF)
Fencing Basics
Why are nodes evicted?
Evicting (fencing) nodes is a preventive measure (it's a good thing)!
Nodes are evicted to prevent consequences of a split brain:
Shared data must not be written by independently operating nodes
The easiest way to prevent this is to forcibly remove a node from the cluster
Fencing Basics
How are nodes evicted? - STONITH
Once it is determined that a node needs to be evicted,
A kill request is sent to the respective node(s)
Using all (remaining) communication channels
A node (CSSD) is requested to kill itself ("STONITH like")
STONITH proper foresees that a remote node kills the node to be evicted
Fencing Basics
EXAMPLE: Network heartbeat failure
The network heartbeat between nodes has failed
It is determined which nodes can still talk to each other
A kill request is sent to the node(s) to be evicted
Using all (remaining) communication channels
A node is requested to kill itself; executor: typically CSSD
[Figure: nodes 1 and 2, each running CSSD, and the voting disk(s)]
Fencing Basics
What happens, if CSSD is stuck?
A node is requested to kill itself
BUT CSSD is stuck or sick (does not execute), e.g.:
CSSD failed for some reason
CSSD is not scheduled within a certain margin
OCSSDMONITOR (was: oprocd) will take over and execute
See also: MOS note 1050693.1 - Troubleshooting 11.2 Clusterware Node Evictions (Reboots)
Fencing Basics
How can nodes be evicted?
Oracle Clusterware 11.2.0.1 and later support IPMI (optional)
Intelligent Platform Management Interface (IPMI) drivers required
IPMI allows remote shutdown of nodes using additional hardware
A Baseboard Management Controller (BMC) per cluster node is required
Fencing Basics
EXAMPLE: IPMI based eviction on heartbeat failure
The network heartbeat between the nodes has failed
It is determined which nodes can still talk to each other
IPMI is used to remotely shut down the node to be evicted
Fencing Basics
Which node gets evicted?
Voting Disks and heartbeat communication are used to determine the node
In a 2 node cluster, the node with the lowest node number should survive
In an n-node cluster, the biggest sub-cluster should survive (votes based)
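The two survival rules above can be combined into one small Python sketch. It counts nodes instead of votes and applies the lowest-node-number rule as a tie-breaker for any cluster size; both simplifications are assumptions, and `surviving_subcluster` is a hypothetical helper.

```python
def surviving_subcluster(subclusters):
    """Return the sub-cluster that should survive a split.

    Biggest sub-cluster wins; on a tie, the one holding the lowest node number.
    """
    return max(subclusters, key=lambda members: (len(members), -min(members)))


# 2-node cluster, interconnect lost: node 1 survives.
two_node = surviving_subcluster([{1}, {2}])
# 5-node cluster split 2 / 3: the bigger sub-cluster survives.
five_node = surviving_subcluster([{1, 4}, {2, 3, 5}])
print(two_node)
print(five_node)
```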
Fencing Basics
Cluster members can escalate a kill request
Cluster members (e.g. Oracle RAC instances) can request
Oracle Clusterware to kill a specific member of the cluster
[Figure: instance 1 asks Oracle Clusterware to kill instance 2]
Fencing Basics
Cluster members can escalate a kill request
Oracle Clusterware will then attempt to kill the requested member
If the requested member kill is unsuccessful, a node eviction escalation can be issued,
which leads to the eviction of the node, on which the particular member currently resides
[Figure: two nodes running App X and App Y with RAC DB instances 1 and 2 on Oracle Clusterware]
Then IO issuing processes are killed; it is made sure that no IO process remains
For a RAC DB mainly the log writer and the database writer are of concern
Once all IO issuing processes are killed, remaining processes are stopped
If the check for a successful kill of the IO processes fails: reboot
Once all remaining processes are stopped, the stack stops itself with a restart flag
OHASD will finally attempt to restart the stack after the graceful shutdown
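The steps above follow a fixed order with one fallback, which the following Python sketch makes explicit. The callbacks are hypothetical placeholders for what Oracle Clusterware actually does; only the ordering and the reboot fallback are taken from the text.

```python
def fence_without_reboot(kill_io, io_remaining, stop_rest, restart_stack):
    """Run the fencing steps in order and report the outcome."""
    kill_io()                # kill IO issuing processes (e.g. log writer, database writer)
    if io_remaining():       # check that no IO process remains
        return "reboot"      # check failed: fall back to a reboot
    stop_rest()              # stop the remaining processes
    restart_stack()          # stack stops with a restart flag; OHASD restarts it
    return "stack restarted"


steps = []
outcome = fence_without_reboot(
    kill_io=lambda: steps.append("kill IO processes"),
    io_remaining=lambda: False,
    stop_rest=lambda: steps.append("stop remaining processes"),
    restart_stack=lambda: steps.append("OHASD restarts stack"),
)
stuck = fence_without_reboot(lambda: None, lambda: True, lambda: None, lambda: None)
print(outcome, steps)
print(stuck)
```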
The Interconnect
The Interconnect
Heartbeat and memory channel between instances
[Figure: clients reach Node 1 to Node N over the public LAN; the nodes are connected by the interconnect with switch and attached to a SAN switch]
The Interconnect
Redundant Interconnect Usage (1)
Redundant Interconnect Usage can be used as a bonding alternative
It works for private networks only; the node VIPs use a different approach
It enables HA and Load Balancing for up to 4 NICs per server (on Linux / Unix)
It can be used by Oracle Database 11.2.0.2 and Oracle Clusterware 11.2.0.2
It uses so-called HAIPs that are assigned to the private networks on the server
The HAIPs will be used by the database and ASM instances and processes
[Figure: Node 1 with HAIP1 and HAIP2, Node 2 with HAIP3 and HAIP4]
The Interconnect
Redundant Interconnect Usage (2)
A multiple listening endpoint approach is used
The HAIPs are taken from the link-local (Linux / Unix) IP range (169.254.0.0)
To find the communication partners, multicasting on the interconnect is required
With 11.2.0.3, broadcast is a fallback alternative (BUG 10411721)
Multicasting is still required on the public LAN, for MDNS for example.
Details in My Oracle Support (MOS) Note with Doc ID 1212703.1:
11.2.0.2 Grid Infrastructure Install or Upgrade may fail due to Multicasting
The Interconnect
Redundant Interconnect Usage and the HAIPs
If a network interface fails, the assigned HAIP is failed over to a remaining one.
Redundant Interconnect Usage allows having networks in different subnets
You can either have one subnet for all networks or a different one for each
You can also use VLANs with the interconnect. For more information see:
Note 1210883.1 - 11gR2 Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip
Note 220970.1 - RAC: Frequently Asked Questions - How to use VLANs in Oracle RAC? AND
Are there any issues for the interconnect when sharing the same switch as the public network by using VLAN to separate the network?
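The HAIP failover described on this slide can be pictured with a small Python sketch. The interface names (`eth1`, `eth2`) and the "fewest HAIPs first" target choice are assumptions for illustration; the actual relocation is handled by Oracle Clusterware.

```python
def fail_over_haips(assignment, failed_nic):
    """Move every HAIP on the failed interface to a remaining interface."""
    remaining = {
        nic: list(haips) for nic, haips in assignment.items() if nic != failed_nic
    }
    if not remaining:
        raise RuntimeError("no private interface left")
    for haip in assignment.get(failed_nic, []):
        # Assumed policy: pick the remaining interface carrying the fewest HAIPs.
        target = min(remaining, key=lambda nic: (len(remaining[nic]), nic))
        remaining[target].append(haip)
    return remaining


# Hypothetical private interfaces on Node 1.
node1 = {"eth1": ["HAIP1"], "eth2": ["HAIP2"]}
after_failure = fail_over_haips(node1, "eth1")
print(after_failure)
```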
Installation
With 11.2.0.x, Patch Sets are also more than bug fixes
Upgrade + New Features
Upgrade + Full Installation
And (always) Out-of-place
Patch uses new Oracle Home
Installation
Installation is meant to be simpler with 11g Release 2
Oracle Universal Installer (OUI)
is the main installation instrument.
Installation tips and tricks in:
MOS note 810394.1 (generic)
See:
RAC Platform Specific Starter Kits and Best Practices
Step by Step for various versions available.
See also:
RAC Platform Generic Load Testing and System Test Plan Outline
Things to consider:
MOS note 1312225.1 - Things to Consider
Before Upgrading to 11.2.0.2 Grid Infrastructure
MOS note 1320966.1 - Things to Consider
Before Upgrade to 11.2.0.2 Database Performance
MOS note 1363369.1
Things to Consider Before Upgrading to 11.2.0.3
Grid Infrastructure/ASM