Table of Contents
Introduction
    1. About This Guide
    2. Audience
    3. Software Versions
    4. Related Documentation
    5. Document Conventions
1. Sample Cluster
    1. Oracle RAC Cluster on CS4/GFS
        1.1. Sample 4-node Oracle RAC cluster
        1.2. Storage
        1.3. Fencing Topology
        1.4. Remote Lock Management using GULM
        1.5. Network fabrics
            1.5.1. Hostnames and Networks
            1.5.2. Hostnames and Physical Interfaces
2. Installation and Configuration of RHEL4
    1. Using RHEL4 Update 3
        1.1. Customizing the RHEL4 Installation
            1.1.1. Boot Disk Provisioning
            1.1.2. Network Interfaces
            1.1.3. Firewall and Security
            1.1.4. Selecting from the Custom subset
        1.2. Post Install Configuration Activities
            1.2.1. INIT[run level] options
            1.2.2. Configuring Cluster Clock synchronization
            1.2.3. Configuring HP iLO (Integrated Lights Out)
            1.2.4. Shared LUNs Requirement
3. Installation and Configuration of Cluster Suite4
    1. Installing ClusterSuite4 (CS4) components
        1.1. Installing CS4 RPMs
        1.2. Configuring CS4 Using the GUI Tool
            1.2.1. Verify X11 connectivity
        1.3. Configuring the 1st lock server
            1.3.1. Hostnames, Networks and Interfaces
            1.3.2. Configuring with the GUI tool
            1.3.3. After the GUI configuration
            1.3.4. Testing first GULM lock server
            1.3.5. Configuring the remaining GULM lock servers
            1.3.6. After the GUI configuration: for other lock servers
            1.3.7. Adding the Four RAC nodes and their Fence Devices
            1.3.8. Post GUI configuration for other lock servers
            1.3.9. Operational Considerations
4. Installing Clustered Logical Volume Manager (CLVM)
    1. Installing CLVM components
    2. Configuring CLVMD
    3. Start up CLVMD
    4. Repeat Installation and configuration for all nodes
5. Creating the Physical and Logical Volumes
    1. Physical Storage Allocation
    2. Initialize and Configure Volumes
        2.1. Verify X11 connectivity
        2.2. Initialize the Shared Home volume group
        2.3. Create the 1st redo volume group and logical volume
        2.4. Create the remaining redo groups and volumes
        2.5. Create the main datafiles logical volume
6. GFS
    1. Installing GFS components
    2. Create the GFS volumes
        2.1. Verify the logical volumes
        2.2. Create the filesystems
        2.3. /etc/fstab entries
7. Oracle 10gR2 Clusterware
Oracle Real Application Clusters GFS
Introduction
1. About This Guide
This manual provides a step-by-step installation of Oracle's 10gR2 Real Application Clusters (RAC) database on GFS 6.1. A sample cluster is provided as a working example that incorporates some best practices to provide entry-level performance and stability.
2. Audience
Installing RAC on GFS typically requires the collaboration of database administrators (DBAs), storage administrators, and
system administrators in order to get the best setup. Installing RAC on GFS6/CS4 is an advanced activity for all three
groups.
3. Software Versions
Throughout this manual, CS refers to Red Hat Cluster Suite 4 (CS4).
4. Related Documentation
This manual is intended to be a complete “cookbook” and therefore attempts to eliminate the need to read other Oracle or
RHEL installation manuals. All steps to successfully install and instantiate an Oracle 10gRAC cluster on GFS6/CS4 are
contained in this manual.
Oracle is a very complex product and many permutations on installation are possible. The description of the sample cluster provides a rationale for many of the best practices that are being deployed. Oracle RAC can be installed for HA, for scalability, or to realize the cost savings of using a commodity RHEL enterprise computing platform. This sample four-node cluster will provide some degree of high availability (HA) and scalability, but as always, the degree to which these are realized is highly dependent on the mid-tier application architecture.
Referring to other manuals should not be necessary because all the information you need is in this document. However, if
you would like to learn how to customize your installation, there are Notes and Tips throughout the document to provide
some detail into why certain decisions were made for this sample cluster. For further optional reading:
• Oracle® Database Oracle Clusterware and Oracle Real Application Clusters Installation Guide for 10g Release 2
(10.2) for Linux (Part Number B14203-05)
• Oracle® Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide
10g Release 2 (10.2) (Part Number B14197-03)
• Oracle® Administer
5. Document Conventions
Certain words in this manual are represented in different fonts, styles, and weights. This highlighting indicates that the
word is part of a specific category. The categories include the following:
Courier font
Courier font represents commands, file names, paths, and prompts.
If you have to run a command as root, the root prompt (#) precedes the command:
# gconftool-2
bold font
Bold font represents application programs and text found on a graphical interface.
Additionally, the manual uses different strategies to draw your attention to pieces of information. In order of how critical
the information is to you, these items are marked as follows:
Note
A note is typically information that you need to understand the behavior of the system.
Tip
A tip is typically an alternative way of performing a task.
Important
Important information is necessary, but possibly unexpected, such as a configuration change that will not persist
after a reboot.
Caution
A caution indicates an act that would violate your support agreement, such as recompiling the kernel.
Warning
A warning indicates potential data loss, as may happen when tuning hardware for maximum performance.
Chapter 1. Sample Cluster
1. Oracle RAC Cluster on CS4/GFS
1.1. Sample 4-node Oracle RAC cluster
The sample cluster in this chapter represents a simple, yet effective RAC cluster deployment. Where necessary, a rationale
will be provided so that when your business requirements cause you to deviate from the sample cluster requirements, there
will be information to help you with these customizations.
Oracle 10gR2 RAC can provide both high availability and scalability using modern commodity servers running RHEL4. Oracle 10gR2 comes in both 32-bit and 64-bit versions. A typical modern four-node RAC cluster consists of high-quality commodity servers that provide superior price/performance, reliability, and modularity, making Oracle commodity computing a viable alternative to large Enterprise-class UNIX mainframes.
This sample cluster, which consists of four identical nodes, is the most common deployment topology. It is called a Symmetrical RAC cluster because all server nodes are identically configured.
Note
Asymmetrical cluster topologies also make sense where there is a need to isolate application writes to one node. Spreading writes over several nodes in RAC can limit scalability. It can complicate node failover, but this highlights how important application integration with the RAC is. Asymmetrical RAC clusters slightly favor performance over availability.
1.2. Storage
This cluster uses a commodity storage platform (HP StorageWorks MSA1500), which is a conventional 2Gb/s FCP SAN. GFS is a blocks-based clustered filesystem and therefore can run over any FCP or iSCSI SAN. The storage array must be accessible from all nodes. Each node needs an FCP HBA. Storage vendors are very particular about which HBAs and which supported drivers are required for use with their equipment. Typically, the vendor will sell an attach kit that contains the FCP HBA and the relevant RHEL4 drivers.
A minimum of one FCP switch is required, although many topologies are configured with two switches, which would then
require each node to have a dual-ported HBA.
Note
Like FCP, iSCSI is a blocks-based protocol that implements the T10 SCSI command set. It does this over a TCP/IP transport instead of a Fibre Channel transport. The terms SAN and NAS are no longer relevant in the modern storage world, but have historically been euphemisms for SCSI over FCP (SAN) and NFS over TCP/IP (NAS). What matters is whether it is a SCSI blocks-based protocol or whether it uses the NFS filesystem protocol when communicating with the storage array. iSCSI is often mistakenly referred to as NAS, when it really has much more in common with FCP, since the protocol is what matters, not the transport.
In order for Oracle to perform well, it requires spindles, not bandwidth. Customers often configure their storage array based on how much space the database might need or, if they do consider performance, on how much bandwidth. Neither is an appropriate metric for sizing a storage array to run an Oracle database.
Database performance almost always depends on the number of IOPs (I/O operations per second) that a storage fabric can deliver, and this is inevitably a function of the number of spindles underneath a database table or index. The best strategy is to use the SAME (Stripe and Mirror Everything) methodology. This allows any GFS volume to have access to the maximum IOP rate the array supports without knowing the performance requirements of the application in advance.
When doing a SQL query that does an index range scan, thousands of IOPs may be needed. What determines the IOP rate of a given physical disk is how fast it spins (RPM), the location of the data, and the interface type. The approximate sizing guide below is organized by interface type and RPM:
Interface/RPM IOPs
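As a rough illustration of the relationship above, the per-spindle IOP ceiling can be sketched as the inverse of average seek time plus rotational latency (half a revolution). The 5 ms average seek time below is an assumed, illustrative figure, not a measured value:

```shell
# Back-of-envelope IOPs per spindle: 1 / (seek + rotational latency).
# Rotational latency is half a revolution: 60000 / RPM / 2 milliseconds.
# The 5 ms average seek time is an assumption for illustration only.
rpm=10000
seek_ms=5
awk -v rpm="$rpm" -v seek="$seek_ms" 'BEGIN {
    rot_ms = 60000 / rpm / 2              # 3 ms at 10K RPM
    printf "~%d IOPs per spindle\n", 1000 / (seek + rot_ms)
}'
```

At 15K RPM the rotational latency drops to 2 ms, which is why 15K drives deliver noticeably more random IOPs than 10K drives of any capacity.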
A 144GB 10K FCP drive can sometimes out-perform a 72GB 10K drive because most of the data might be located in fewer tracks, causing the larger disk to seek less. However, the rotational latency is identical, and the track cache often does not help because the database typically reads thousands of random blocks per second. SATA-I drives are particularly bad because they do not support tagged queuing. Tagged queuing, an interface optimization found in SCSI, permits the disk to process more I/O transactions, but it increases the cost. A 7200-rpm Ultra-Wide SCSI disk often out-performs the equivalent SATA-I due to tagged queuing. SATA-I drives are very high capacity and cheap, but are very poor at random IOPs. SATA-II disks support tagged queuing.
A modern Oracle database should have at least two shelves (20-30 payload spindles) in order to ensure a reasonable level of performance. In this cluster, the RAID10 volumes are implemented in the storage array, which is now common practice. The extent allocation policy does influence performance, and this will be defined when the volume group is created with CLVM. CLVM will be presented with several physical LUNs that all have the same performance characteristics.
When adding performance capacity to the storage array, it is important that the array re-balance the existing allocated
LUNs over this larger set of spindles so the database objects on those existing GFS volumes benefit from increased IOP
rates.
Note
Payload spindles are the physical disks that contain only data, not parity. RAID 0+1 configurations allow the mirror to be utilized as payload, which doubles the IOP rate of a mirror. Some arrays that support this feature on conventional RAID1 mirrors also perform this optimization.
The Oracle Clusterware files (Voting and Registry) are not currently supported on GFS. For this cluster, two 256MB shared raw partitions located on LUN0 will be used. LUN0 is usually not susceptible to device scanning instability unless new SCSI adaptors are added to the node, so do not connect new SCSI controllers to a RAC cluster node once Clusterware is installed. Since all candidate CLVMD LUNs have the same performance characteristics, their size and number are determined by their usage requirements:
Each node in the cluster gets a dedicated volume for its specific Redo and Undo. One single GFS volume will contain all datafiles and indexes. This normally will not cause a bottleneck on RHEL unless there is a requirement for more than 15,000 IOPs, but this is an operational trade-off of performance and simplicity in a modest cluster. The number of spindles in the array should continue to be the bottleneck.
Caution
Choosing a RAID policy can be contentious. With databases, spindle count matters more than size, so using a simple RAID scheme such as RAID 1+0 (or RAID 10: mirrored, then striped) is often the best policy. It is faster on random I/O, though not as space-efficient as RAID4 and RAID5. The system will typically have far more space than needed because the array was correctly configured by spindle count, not bandwidth or capacity.
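The trade-off in the caution above can be made concrete with simple arithmetic. Assuming a hypothetical 14-disk shelf (the disk count is illustrative) with single-parity RAID 5 and two-way RAID 10, RAID 10 halves the usable capacity but keeps every spindle available to service random reads:

```shell
# Illustrative only: usable data disks vs. spindles available for random
# reads, for an assumed 14-disk shelf.
disks=14
awk -v n="$disks" 'BEGIN {
    printf "RAID 10: %d disks of capacity, %d spindles for reads\n", n/2, n
    printf "RAID 5 : %d disks of capacity, %d spindles for reads\n", n-1, n-1
}'
```

The RAID 5 shelf holds nearly twice as much, but it is the spindle behavior under random I/O, not the capacity, that governs database performance.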
In CS4, the lock servers are responsible for maintaining quorum and determining if a member node is in such a state that it needs to be fenced in order to protect the integrity of the cluster. The master lock server will issue commands directly to the power management interface to power cycle the server node. This is a very reliable fencing mechanism. CS4 supports a variety of hardware interfaces that can effect power cycling on nodes that need to be fenced.
Note
If more redundancy is required, then adding another switch requires adding two more GbE ports to each server in order to implement bonded interfaces. Simply adding a second private switch dedicated to RAC does not help: if the other switch fails, the CS4 heartbeat would fail and take RAC down with it.
Note
Private unmanaged switches are sufficient as these are standalone, isolated network fabrics. Network Operations
staff may still prefer that the switch is managed, but it should remain physically isolated from production VLANs.
RAC1-vip
RAC2-vip
RAC3-vip
RAC4-vip
Chapter 2. Installation and Configuration
of RHEL4
1. Using RHEL4 Update 3
Each node that will become an Oracle RAC cluster node requires RHEL4 Update 3 or higher. Previous versions of RHEL4
are not recommended. The lock servers must also be installed with the same version of RHEL4, but can either be 32-bit or
64-bit. In our sample cluster, the RAC nodes and the external lock servers are all 64-bit Opteron servers running 64-bit
versions of both RHEL and Oracle.
/boot 128MB
swap 4096MB # 10gR2 Installer expects at least 3932MB
/ 4096MB # A RAC/CS4-ready RHEL4 is about 2.5GB
/home 1024MB # Most of the ORACLE files are on GFS
Most customers will deploy with much larger drives, but this example helps explain what is being allocated. Oracle files are mostly installed on a GFS volume. The /home requirements are so minimal that they can be safely folded into the root volume and still not exceed a 4GB partition. The size of the RHEL install, including the need to recompile the kernel, will rarely exceed 4GB. Once installed, the only space growth would come from system logs or crash dumps.
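If the installation is automated, the layout above could be captured in a kickstart file. This is a sketch only; the part directives below assume a single boot drive and would need device names added for a multi-disk system:

```
part /boot --fstype ext3 --size=128
part swap  --size=4096
part /     --fstype ext3 --size=4096
part /home --fstype ext3 --size=1024
```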
eth0
192.168.1.100 (SQL*Net App Tier)
eth1
192.168.2.100 (Oracle RAC GCS/CS4-GULM)
eth2
192.168.3.100 (Optional to isolate RAC from CS4-GULM)
The first two interfaces are required; the optional third network interface could be deployed to further separate CS4 lock traffic from Oracle RAC traffic. In addition, NIC bonding (which is supported by Oracle RAC) is recommended for all interfaces if further hardening is required. For the sake of simplicity in this example, this cluster does not deploy bonded Ethernet interfaces.
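As an example, the private interconnect interface on the first node might be defined as follows. This is a sketch of /etc/sysconfig/network-scripts/ifcfg-eth1; the address comes from the table above, while the netmask is an assumption:

```
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.2.100
NETMASK=255.255.255.0
ONBOOT=yes
```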
DNS should be configured and enabled so that ntpd can locate the default time servers by name. If ntpd will be configured to use raw IP addresses, then DNS is not required. This sample cluster will configure DNS during the install and ntpd during post-install.
1.1.4. Selecting from the Custom subset
The minimum subset required to configure and install both CS4 and Oracle 10gRAC is:
• X Windows
• Development Tools
• X Software Development
• Admin Tools
• System Tools
Note
* Sub-systems that are only available on x64 installs
The compatibility subsets appear as options only during 64-bit install sessions. More subsets can of course be selected, but
it is recommended that an Oracle 10gRAC node not be provisioned to do anything other than run Oracle 10gRAC. The
same recommendation also applies to GULM lock servers.
RAC clusters need their clocks to be within a few minutes of each other, but not completely synchronous. Using ntpd should provide accuracy within a second or better, which is more than adequate. If the clocks are wildly out after install, ntpd will only slowly slew them back into synchronization, which will not happen quickly enough to be effective; use ntpdate as a one-time operation to set the clock immediately. In order to use ntpdate, ntpd must not be running.
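One way to bring the clocks into line immediately after install, as root on each node. The time server name below is an example; substitute your site's NTP servers:

```
# service ntpd stop
# ntpdate 0.pool.ntp.org
# service ntpd start
# chkconfig ntpd on
```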
1.2.3. Configuring HP iLO (Integrated Lights Out)
The server's BIOS Advanced section is usually where iLO is configured. The version of iLO that appears on most DL1xx series boxes (i100) is not supported, as it does not support sshd.
Chapter 3. Installation and Configuration
of Cluster Suite4
1. Installing ClusterSuite4 (CS4) components
1.1. Installing CS4 RPMs
ClusterSuite4 consists of the Cluster Configuration System (CCS), lock manager and other support utilities required by a
GULM implementation. This cluster has four RAC nodes and three GULM lock server nodes. Unless stated otherwise in
the Notes section, the following software must be loaded on all seven nodes. This is the list for a 64-bit install:
RPM Notes
magma-1.0.4-0.x86_64.rpm n/a
magma-plugins-1.0.6-0.x86_64.rpm n/a
You can run system-config-cluster over X11 or from the system console on lock1. Normally, it is best to initially configure from run-level 2 (init 2), so that services will not automatically start at reboot before configuration testing is complete. Once a functioning cluster is verified, the system can be switched to either run-level 3 or 5. This configuration example will run the GUI tool remotely using X11.
Run xclock to make sure that the X11 clock program appears on the adminws desktop.
Tip
Running X through a firewall often requires you to use the -X flag on the ssh command and possibly edit the .ssh/config file so that ForwardX11 yes is included. Remember to disable this feature once you are preparing to run the 10gCRS installer, as it will need to execute ssh commands between nodes (such as ssh hostname date) that must return only the date string (in this case) and nothing else.
192.168.1.154 lock1
192.168.2.154 lock1-gulm
192.168.2.54 lock1-ilo
When using HP iLO power management, there is a fence device for every node in the cluster. This is different from having
one single device that fences all nodes (for example, a Brocade switch). When the node named lock1-gulm is created,
the corresponding fence device will be lock1-ILO. It is mandatory that the iLO network be accessible from the lock1
server. They share the same interface in this example, but this is not mandatory. Typically, there is only one iLO interface
port, so using the same network interface and switch as the CS4 heartbeat does not incur any further failure risk. Full
hardening of iLO is limited by the interface processor’s single port, but the fencing fabric could use bonded NICs on all
servers and a hardened production VLAN.
Although lock server hostnames and RAC node hostnames have different naming conventions, these services share the same physical interface in this cluster. The hostname conventions differ to emphasize that it is possible to further separate RAC and GULM networks if required for performance (not reliability). RAC-vip hostnames could also be defined on a separate physical network so that redundant pathways to the application tiers can also be configured. Hardening is an availability business requirement, and this sample cluster emphasizes a cost-effective balance between availability and complexity.
1. Run:
3. Click the Grand Unified Lock Manager (GuLM) radio button, and then click OK.
4. Highlight Cluster Nodes in the Cluster pane, which will present the Add a Cluster Node button in the Properties
pane. Clicking Add a Cluster Node presents the Node Properties window:
6. Select Fence Devices in the Cluster pane and then click Add a Fence Device:
7. The username and password default for iLO systems is set at the factory to admin/admin. The hostname for the
iLO interface is lock1-ilo. The name of the fence device is lock1-ILO. Click OK.
8. The fence device needs to be linked to the node in order to activate fencing.
11. After you click OK and close this window, the main window now shows that lock1-gulm is associated with the fence
level. The Properties pane should convey this by the message, "one fence in one fence level."
13. Now save this configuration and then test that this single GULM lock server can start up. Save the file into /etc/cluster/cluster.conf using the File->Save menu option in the Cluster Configuration tool.
The /etc/cluster/cluster.conf file now contains one master GULM lock server, and it is possible to exit the GUI. Once the first lock server is running, re-starting the GUI tool will permit the Cluster Management tab to be selected, and the server should appear in this display.
#
# Node-private
#
GULM_OPTS="--name lock1-gulm --cluster alpha_cluster --use_ccs"
Make sure that the --name parameter is the same name as the cluster node that was chosen in the GUI tool. The default for this file is to use the server hostname, but this causes GULM to run over the public network interface. GULM traffic in this cluster will run over a private network that corresponds to the hostname lock1-gulm. If the cluster name was changed in the Cluster Properties pane in the GUI, then the --cluster parameter must be changed to match that value. If these values do not match exactly, then GULM will not start up successfully.
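A quick way to double-check the values before starting the daemon is to parse them back out of the GULM_OPTS line and compare them by eye against cluster.conf. This sketch uses sed on the literal string from this node's file:

```shell
# Parse the node name and cluster name back out of GULM_OPTS; sed
# extracts the value following each option flag.
GULM_OPTS="--name lock1-gulm --cluster alpha_cluster --use_ccs"
node=$(echo "$GULM_OPTS" | sed -n 's/.*--name \([^ ]*\).*/\1/p')
clus=$(echo "$GULM_OPTS" | sed -n 's/.*--cluster \([^ ]*\).*/\1/p')
echo "node=$node cluster=$clus"
```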
Note
Quorate (or Inquorate) is the term used in the system log to indicate the presence (or absence) of a GULM lock
server quorum. Without quorum, no access to the storage is permitted.
Tip
Set the default run level in /etc/inittab to 2, so that when you transition to run-level 3 or 5, you can do it from a system that is running and accessible (use tail -f /var/log/messages for debugging).
1. Open two terminal windows on lock1: one for typing commands and one for running tail -f /var/log/messages.
Note
Making the /var/log/messages file visible to the user oracle or orainstall will make the procedure
easier. As these users need to read /var/log/messages much more frequently in a RAC environment,
providing group read permission is recommended, at least during the install.
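One possible way to grant that access, assuming the Oracle software owner's group is named oinstall (a common convention, not something this guide mandates):

```
# chgrp oinstall /var/log/messages
# chmod g+r /var/log/messages
```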
2. Because lock1 is running in run-level 2, ccsd and lock_gulmd will not start accidentally (for instance, if you had to reboot the server after installing the RPMs). Start up the ccsd process:
Because rgmanager was installed on this node, the clustat utility can be used to verify the status of this lock manager:
$ sudo clustat
1.3.5. Configuring the remaining GULM lock servers
#
# Node-private
#
GULM_OPTS="--name lock2-gulm --cluster alpha_cluster --use_ccs"
#
# Node-private
#
GULM_OPTS="--name lock3-gulm --cluster alpha_cluster --use_ccs"
Verify the status of the cluster using clustat:
lock1 $ clustat
Note
The Send to Cluster button in the upper right hand corner of the Cluster Configuration tab will send the current
configuration only to nodes that have cluster status shown above. Because each node is brought up one at a time
during initial test and setup, this feature is not initially useful. Once the cluster is completely up and running and
all the nodes are in the cluster, this option is an effective way to distribute changes to /etc/cluster/cluster.conf.
1.3.7. Adding the Four RAC nodes and their Fence Devices
The steps for adding the four RAC nodes are identical to those for adding lock servers. The only difference is the hostname convention. GULM and RAC heartbeat share the same physical interface, but the hostname convention indicates that these networks could be physically separated. The -priv suffix is a RAC hostname convention. During the Add a Cluster Node step, use this hostname convention and do not check the GuLM Lockserver box.
Fence devices follow the same naming convention; the only difference is that the rac1-ILO fence device is associated with rac1-priv.
The completed configuration should show all seven cluster nodes and fence devices.
Managed Resources
This section does not need to be configured, as it is the responsibility of the Oracle Clusterware to manage the
RAC database resources. This section is reserved for CS4 hot-standby non-RAC configurations.
#
# Node-private
#
GULM_OPTS="--name rac1-priv --cluster alpha_cluster --use_ccs"
#
# Node-private
#
GULM_OPTS="--name rac2-priv --cluster alpha_cluster --use_ccs"
#
# Node-private
#
GULM_OPTS="--name rac3-priv --cluster alpha_cluster --use_ccs"
#
# Node-private
#
GULM_OPTS="--name rac4-priv --cluster alpha_cluster --use_ccs"
rac1 $ clustat
Shutting down the last two nodes that hold quorum will cause the cluster to become inquorate, and all activity on the cluster is blocked, including the ability to proceed with a normal shutdown. These steps will ensure a clean shutdown and only apply to the last two lock servers that are holding quorum. All other nodes can shut down normally.
Although clvmd has not been configured yet in this guide, it is a lock server client, so this protocol assumes it is running.
1. Remove existing lock server clients from each of the remaining nodes holding quorum:
2. Stop the GULM lock manager on lock2. This will cause the cluster to become inquorate at this point.
3. Shut down the remaining GULM lock manager using the gulm_tool utility; the output in /var/log/messages should show that the core shut down cleanly.
Once this is complete, both servers can be shut down cleanly, as ccsd will be able to terminate normally.
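The three-step protocol above can be sketched as a dry run that only prints the commands. The service names (clvmd, lock_gulmd), the gulm_tool shutdown subcommand, and the assumption that lock1 is the last lock server standing are all assumptions to verify against your CS4 documentation before executing anything:

```shell
# Dry run of the quorum-safe shutdown protocol; prints each command instead
# of executing it. Change run() to execute for real on the cluster.
run() { echo "$@"; }

# 1. Remove lock server clients (clvmd) from the remaining quorate nodes
for node in lock1 lock2; do
  run ssh $node service clvmd stop
done

# 2. Stop the GULM lock manager on lock2 (the cluster goes inquorate here)
run ssh lock2 service lock_gulmd stop

# 3. Shut down the last lock manager core with gulm_tool, then confirm a
#    clean core shutdown in /var/log/messages before powering off
run gulm_tool shutdown lock1
```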
Chapter 4. Installing Clustered Logical
Volume Manager (CLVM)
1. Installing CLVM components
CLVM requires only one RPM, lvm2-cluster-2.02.01-1.2.RHEL4.x86_64.rpm, and this must be installed
on all seven nodes. The lock servers must be able to see the cluster logical volumes, but do not need to mount the GFS
volumes.
Warning
ALL GULM lock servers must have all of the shared storage mounted and must have CLVM installed, configured, and running.
2. Configuring CLVMD
The cluster volume manager daemon clvmd needs to be configured and started on every node in the cluster. The configuration file for clvmd is /etc/lvm/lvm.conf.
The default configuration of this file should already be set up for clustered operation. The parameters locking_type and locking_library should be verified. The library_dir parameter will be "/usr/lib" for 32-bit installations:
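For reference, the relevant lvm.conf stanza might look like the following; the exact locking_type value and library name are assumptions to verify against the lvm.conf shipped with the lvm2-cluster package:

```
# /etc/lvm/lvm.conf (cluster-relevant excerpt; values are assumptions)
locking_type = 2                             # external (cluster) locking
locking_library = "liblvm2clusterlock.so"    # supplied by lvm2-cluster
library_dir = "/usr/lib64"                   # "/usr/lib" on 32-bit installs
```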
3. Start up CLVMD
Run:
There are no physical volumes (PVs) or volume groups (VGs) defined at this time; otherwise, the console output would list all activated volumes it could find.
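The elided startup command is presumably the clvmd init script; a minimal administrative sketch, with service names assumed from the lvm2-cluster package:

```
service clvmd start     # start the cluster LVM daemon now
chkconfig clvmd on      # and have it start on every subsequent boot
```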
Chapter 5. Creating the Physical and
Logical Volumes
Logical volumes, once created and visible to members of the cluster, will appear to RHEL as block device entries. Logical volumes can optionally appear as raw devices using the rawdevices service in RHEL. Both block and raw logical volumes will be used to install RAC. Physical and logical volumes can be initialized and configured using the command line or the GUI-based system-config-lvm.
1. Physical Storage Allocation
The storage array was configured to present a series of physical LUNs that will be initialized and configured by CLVM.
Warning
The current version of the GUI tool will not prevent the initialization and addition of local physical volumes to cluster volumes. In this sample cluster, all media handled by LVM2 must be on shared storage for Oracle RAC to function. Do not configure any node-local storage from any of the seven nodes until it is known that this is resolved in subsequent releases of the GUI tool.
For example, /dev/hda is visible to CLVM and it is a local physical device (the boot disk), but it remains uninitialized by CLVM. Only initialize the shared storage intended for use with Oracle RAC.
1. Run:
2. Run xclock to make sure that the X11 clock program appears on the adminws desktop.
Tip
Running X through a firewall often requires you to set the X11-forwarding flag (-X) on the ssh command and possibly adjust the ~/.ssh/config file so that ForwardX11 yes is included. Remember to disable this feature once you are preparing to run the 10g CRS installer, as it will need to execute ssh commands between nodes (such as ssh date) that return only the date string (in this case) and nothing else.
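A possible ~/.ssh/config entry on the admin workstation; the Host patterns are assumptions based on the sample cluster's hostnames:

```
# ~/.ssh/config -- forward X11 to hosts in the sample cluster
Host rac* lock*
    ForwardX11 yes
```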
3. Run:
1. Identify and highlight the 6GB raw LUN that will be listed under the Uninitialized Entries and then click Initialize
Entry. The Properties pane can be used to verify the size and other characteristics.
2. Create a new volume group on this physical volume by selecting Partition 1 and then clicking Create new Volume Group. Although volume groups may consist of multiple physical LUNs, these LUNs were created on a storage array that implements both striping and mirroring. The physical extent size of 128k is a reasonable average extent size for most Oracle databases.
Note
Oracle has its own extent strategy, and the policies used on tablespaces bear more directly on performance than the LVM extent size. A good general practice for tablespaces (using the create tablespace command) that hold tables is a 1M extent; indexes can match the LVM extent size of 128K.
The volume group common is a 6GB volume that will contain the Oracle shared home installation of both Clusterware and the RDBMS.
3. Highlight the Logical View for the volume group common and then click Create New Logical Volume.
4. There are two ways to consume free space: click the Use remaining button or slide the allocation bar all the way to
the right.
Warning
Do NOT define any filesystems during logical volume creation. The GUI assumes that the number of nodes connected to the cluster is the number of lock journals that will be required. This cluster is being incrementally installed, so this value would be wrong. GFS volumes will be created using the mkfs.gfs command.
...and the newly created logical volume will have a block device file name that will be used by the mkfs.gfs call and in /etc/fstab.
6. Run:
lock1 # ls -l /dev/common/
2.3. Create the 1st redo volume group and logical volume
There are four 4GB physical LUNs that need to be initialized as physical volumes, then volume groups, and then logical volumes. Each one corresponds to the redo/undo GFS volume for each RAC node. The steps used to create this logical volume will show only the first redo log; the steps will need to be repeated for the remaining three 4GB physical volumes.
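The same work can be scripted on the command line. The sketch below is a dry run that only prints the commands; the device names /dev/sdb through /dev/sde and the lvcreate -l 100%FREE syntax are assumptions to check against your LVM2 version before executing:

```shell
# Print the CLI equivalent of the GUI steps for all four redo volumes.
# Remove the 'echo' (or pipe the output to sh) only after verifying the
# device names against the shared storage.
redo_cmds() {
  i=1
  for d in sdb sdc sdd sde; do
    echo "pvcreate /dev/$d"
    echo "vgcreate -s 128k redo$i /dev/$d"
    echo "lvcreate -l 100%FREE -n log$i redo$i"
    i=$((i+1))
  done
}
redo_cmds
```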
1. Initialize the physical LUN /dev/sdb. The physical LUN /dev/sda is reserved for the Oracle Clusterware files
and must not be initialized by CLVM.
2. Create the volume group redo1. Click on Create New Volume Group. Verify that the extent size is 128 KB.
3. Create the logical volume log1 by clicking on Logical View of redo1 and then clicking Create New Logical
Volume.
Figure 5.9. Logical Volume Management window: Creating a new logical volume
Use the entire volume when creating log1. The default unit is Extents; you can use this or set it to Gigabytes. Since this logical volume will occupy the entire volume group, leaving the unit as Extents and then clicking Use remaining is the easiest approach.
4. Verify in the Properties pane that the pathname to this file is /dev/redo1/log1 and that all 4GB was used to create this logical volume. Click OK.
In this view, the completed logical volume layout highlights the CRS, shared home, and datafiles logical volumes.
Chapter 6. GFS
1. Installing GFS components
Most installs (including this sample cluster) will have more than one CPU, so the SMP-compatible kernel modules will be required. This system is a 64-bit SMP, so the actual package names will reflect the type of RHEL kernel that is running (RHEL4 Update 3). Install GFS on all four RAC nodes.
Note
GFS does not need to be installed on the GULM lock server nodes.
This process installs only the GFS module required by the specific kernel that is being used (64-bit, SMP). All
variants of the GFS kernel module may be installed, but it is not required—only the version that matches the
RHEL kernel is required.
-j 4: One journal for each RAC node. GULM lock servers do not run GFS.
-J 32MB: Oracle maintains the integrity of its filesystem with its own journals or redo logs. All database files are opened O_DIRECT (bypassing the RHEL buffer cache and the need to use GFS journals). Additionally, redo logs are opened O_SYNC.
Run:
Run:
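The two elided commands are presumably a mkfs.gfs and a mount. A dry-run sketch follows, in which the logical volume path, the cluster name in the lock table, and the mount point are all assumptions drawn from the sample cluster:

```shell
# Print the GFS creation and mount commands for the shared-home volume.
# The lock table format is <clustername>:<fsname>; the -j and -J values
# match the options discussed above. Remove 'echo' to execute for real.
gfs_cmds() {
  lv=$1; fsname=$2
  echo "mkfs.gfs -p lock_gulm -t alpha_cluster:$fsname -j 4 -J 32 $lv"
  echo "mount -t gfs $lv /mnt/$fsname"
}
gfs_cmds /dev/common/ohome ohome
```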
The _netdev option is also useful, as it ensures the filesystems are unmounted before cluster services shut down. Copy this section of the /etc/fstab file to the other nodes in the system. These volumes were mounted in /mnt, and the corresponding mount directories needed to be created on every node.
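The /etc/fstab section referred to above might look like the following; the logical volume names and mount points are assumptions based on the volumes created earlier:

```
# /etc/fstab -- GFS volumes (LV names and mount points are assumptions)
/dev/common/ohome  /mnt/ohome  gfs  _netdev  0 0
/dev/redo1/log1    /mnt/redo1  gfs  _netdev  0 0
/dev/redo2/log2    /mnt/redo2  gfs  _netdev  0 0
```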
Chapter 7. Oracle 10gR2 Clusterware
1. Installing Oracle 10gR2 Clusterware (formerly
10gR1 CRS)
Although this material is documented in Oracle install manuals, in Metalink notes, and elsewhere, it is consolidated here so that this manual can be used as the main reference for a successful installation. A good supplementary Oracle article on RAC installations can be found here:
http://www.oracle.com/technology/pub/articles/smiley_rac10g_install.html
The LUN /dev/sda should be large enough to create two 256MB partitions. Using the fdisk command, create two primary partitions:
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content will not be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): p
Disk /dev/sda: 536 MB, 536870912 bytes
17 heads, 61 sectors/track, 1011 cylinders
Units = cylinders of 1037 * 512 = 530944 bytes
Device Boot Start End Blocks Id System
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1011, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-1011, default 1011): +256M
Command (m for help): p
If the other nodes were already up and running while you created these partitions, those nodes must re-read the partition table from disk (blockdev --rereadpt /dev/sda).
Make sure the rawdevices service is enabled on all four RAC nodes for the run level that will be used. This example enables it for both run levels. Run:
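The elided commands are presumably chkconfig invocations; the raw bindings themselves live in /etc/sysconfig/rawdevices. A sketch follows, where the mapping of raw1 and raw2 to the two /dev/sda partitions is an assumption:

```
# Enable the service for run levels 3 and 5, then define the bindings
chkconfig --level 35 rawdevices on

# /etc/sysconfig/rawdevices (assumed: raw1 = OCR, raw2 = voting disk)
/dev/raw/raw1  /dev/sda1
/dev/raw/raw2  /dev/sda2
```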
These files must always be owned by the user that installs the software (oracle). A 10-second delay is needed to ensure that the rawdevices service has a chance to configure the /dev/raw directory. Add these lines to the /etc/rc.local file. This file is symbolically linked to /etc/rc?.d/S99local.
echo "Sleep a bit first and then set the permissions on raw"
sleep 10
chown oracle:dba /dev/raw/raw?
Note
After you install Clusterware, if you see a set of three /tmp/crsctl.<pid> trace files, then Clusterware did not start and there will be an error message in these files, usually complaining about permissions. Make sure the /dev/raw/raw? files are owned by the oracle owner (in this example, oracle:dba).
#
# Oracle specific settings
# x86 Huge Pages are 2MB
#
#vm.hugetlb_pool = 3000
#
kernel.shmmax = 4047483648
kernel.shmmni = 4096
kernel.shmall = 1051168
kernel.sem = 250 32000 100 128
net.ipv4.ip_local_port_range = 1024 65000
fs.file-max = 65536
#
# This is for Oracle RAC core GCS services
#
net.core.rmem_default = 1048576
net.core.rmem_max = 1048576
net.core.wmem_default = 1048576
net.core.wmem_max = 1048576
The parameter that most often needs to be modified to support larger SGAs is the shared memory setting, kernel.shmmax. Typically 75% of the memory in a node should be allocated to the SGA. This assumes a modest number of Oracle foreground processes, which can consume physical memory when allocating the PGA (Oracle Process Global Area). The PGA is typically used for sorting. On a 4GB system, a 3GB SGA is recommended. The amount of memory consumed by the SGA and the PGA is very workload-dependent.
Note
The maximum size of the SGA on a 64-bit version of RHEL4 is currently slightly less than 128GB. The maximum size of the SGA on a 32-bit version of RHEL4 varies a bit. The standard size is 1.7GB. If the oracle binary is lower mapped, then this maximum can be increased to 2.5GB on -SMP kernels and 3.7GB on -HUGEMEM kernels. Lower mapping is an Oracle-approved linking technique that changes the address where the SGA attaches in the user address space. When it is lowered, there is more space available for attaching a larger shared memory segment. See Metalink Doc 260152.1.
Another strategy for extending the SGA to 8GB and higher in a 32-bit environment is through the use of the /dev/shm filesystem, although this is not recommended. If you need this much SGA, then using the 64-bit versions of Oracle and RHEL4 is a better strategy.
The net.core.* parameters establish the UDP buffers that will be used by the Oracle Global Cache Services (GCS) for
heartbeats and inter-node communication (including the movement of Oracle buffers). For large SGAs (more than 16GB),
the use of HugeTLBs is recommended.
Tip
TLBs, or Translation Lookaside Buffers, are the working end of a Page Table Entry (PTE). The hardware speaks in physical addresses, whereas processes running in user mode speak only PVAs (Process Virtual Addresses), including the SGA. These addresses have to be translated, and modern CPUs provide some TLB register space so that during memory loads the translation does not cause extra memory references.
By default, the page size on x86 hardware is 4K. When configuring a large SGA (16GB or more), mapping just the SGA into the user's process space requires roughly 4,000,000 4K PTEs (or TLB slots). HugeTLBs are a mechanism in RHEL that permits the use of 2MB hardware pages. This mechanism reduces the number of PTEs required to map the SGA. The performance improvements increase with the size of the SGA, but can be between 10% and 30%.
During RHEL installation, 4GB of swap was set up and the Oracle Installer will check for this minimum.
1.1.4. Create a clean ssh connection environment
You have to ensure that whenever Clusterware talks to other nodes in the cluster, the ssh commands proceed unimpeded and without extraneous session dialog. To ensure that all connection pathways are set up, run:
oracle@rac2's password:
OR
The authenticity of host 'rac2 (192.168.1.151)' can't be established.
RSA key fingerprint is 48:e5:e0:84:63:62:03:84:c7:57:05:6b:58:7d:12:07.
Are you sure you want to continue connecting (yes/no)?
Create a ~/.ssh/authorized_keys file, distribute it to all four nodes, and then execute ssh hostname date to every host in the RAC cluster, in all combinations over both the primary and heartbeat interfaces. If you miss any one of them, the Oracle Clusterware installer will fail at the node verification step.
On rac1, log in as the oracle user and make sure $HOME/.ssh is empty. Do not supply a passphrase for the keygen command; just press Return. Run:
Repeat this step on all four RAC nodes (it is not required on the GULM lock servers), collect all the ~/.ssh/id_dsa.pub files into one ~/.ssh/authorized_keys file, and distribute it to the other three nodes:
Run all combinations from all nodes for both PUBLIC and PRIVATE networks (including the node where you are currently executing):
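The full matrix can be generated mechanically. This sketch only prints the 32 checks (4 source nodes by 8 destination names); the hostnames are taken from the sample cluster:

```shell
# Print every "ssh <host> date" check that must succeed without prompting,
# from every node to every public and -priv hostname (self-checks included).
gen_checks() {
  for src in rac1 rac2 rac3 rac4; do
    for dst in rac1 rac2 rac3 rac4 rac1-priv rac2-priv rac3-priv rac4-priv; do
      echo "on $src: ssh $dst date"
    done
  done
}
gen_checks
```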
Run xclock to make sure that the X11 clock program appears on the adminws desktop.
Although you can have ORACLE_BASE and ORACLE_HOME pre-set in the oracle user's profile prior to running the installer, it is not mandatory. In our case, they are set to point to the shared Oracle home location, which is a 6GB GFS volume. The installer will detect these values if they are set:
export ORACLE_BASE=/mnt/ohome/oracle/1010
export ORACLE_HOME=/mnt/ohome/oracle/1010/product/db
/home/oracle/inst/clusterware/runInstaller
********************************************************************************
Please run the script rootpre.sh as root on all machines/nodes. The script can be found at
Answer 'y' if root has run 'rootpre.sh' so you can proceed with Oracle Clusterware installa
Answer 'n' to abort installation and then ask root to run 'rootpre.sh'.
********************************************************************************
Has 'rootpre.sh' been run by root? [y/n] (n)
y
Starting Oracle Universal Installer...
Verify that $ORACLE_BASE/oraInventory is located on the shared GFS volume (/mnt/ohome). If you want an inventory on each node for CRS or the RDBMS, you would need to type in a node-local directory (/opt/oracle/1010/oraInventory), but you have to ensure the directory is created and owned by the oracle user before you click Next.
This screen's default path will need to be changed, as it wants to put the CRSHOME in ORACLE_HOME. This install is a single, shared CRS install, so the path is on the shared GFS volume. The name was simplified to just crs. Click Next.
The prerequisite checks now run; since we did our preparation work in /etc/sysctl.conf, we expect no errors or warnings.
Click Next.
Click Next.
Next, the other three nodes need to be added to the cluster configuration. All of these hosts must be defined in /etc/hosts on all nodes.
Click OK.
Click Next.
This is the step that fails if any part of the ssh hostname date set up was not performed correctly.
If /etc/hosts, ~/.ssh/authorized_keys, and ~/.ssh/known_hosts are all properly set up, then the installer should proceed to the next screen. Fully qualified hostnames can sometimes cause confusion, so the public network hostnames entered into the Clusterware installer must match the string that is returned from hostname. Otherwise, go back and verify the entire matrix of ssh hostname date calls to make sure all these paths are clean. Often the self-referential ones are missed, such as ssh rac1 date from rac1 itself.
Figure 7.8. Oracle Universal Installer: Specify Network Interface Usage window
Edit the eth0 fabric and change the interface type to Public and click Next.
Click OK.
Click Next.
Assign the quorum voting and registry files. The option external redundancy is chosen because the files reside on a storage array that implements redundancy.
Figure 7.11. Oracle Universal Installer: Specify Voting Disk Location window
The quorum vote disk will be located on /dev/raw/raw2. Once again, external redundancy is chosen. Click Next.
The installer starts to install, link, and copy. This process typically takes less than 10 minutes, depending on the performance of the CPU and the filesystem.
This screen prompts for two sets of scripts to be run on all four nodes. Run the orainstRoot.sh script first on each node, in order.
Password:
Changing permissions of /mnt/ohome/oracle/1010/oraInventory to 770.
Changing groupname of /mnt/ohome/oracle/1010/oraInventory to dba.
The execution of the script is complete
If successful, the completion of the script on the fourth node should indicate that CSS is running on all nodes.
Return to the main installer screen and click OK. Most of the verification and installation checks should pass.
If not, or if this pop-up occurs, then it is likely that the CRS application registration failed to start up. This is usually due to the tool not being found in the path, but it can be fixed by running the vipca utility from rac1 once you quit the installer. Click OK on the pop-up and Next on the Configuration Assistants screen.
The crs_stat command will display any registered CRS resources. There are currently none, so the vipca utility will need to be executed next.
rac1 $ crs_stat -t
Click Next on this window and the next one. Then the hostnames mapping window appears:
Figure 7.18. VIP Configuration Assistant: Virtual IPs for Cluster Nodes window
Fill in the first IP Alias name and press Tab. The tool should fill in the rest.
Figure 7.19. VIP Configuration Assistant: Virtual IPs for Cluster Nodes window
rac1 $ crs_stat -t
Chapter 8. Installing Oracle 10gR2
Enterprise Edition Database
Installing the database requires you to run the Oracle Installer once more for the database-specific components. These components include configuring and registering the Oracle SQL*Net Tnslsnr process. This process needs to run on each RAC node and should be registered with Oracle Clusterware. The final step is actually creating a database from /dev/sda of the cluster.
1. RHEL Preparation
Oracle 10gR2 binaries are now shipped with a dependency on the RHEL async I/O library. This library needs to be installed prior to running the Installer, or the linking phase of the install process will fail. In this instance, it is not necessary to verify whether the library is installed, as it was not included in any packages. However, if you want to check prior to the install and it is installed:
libaio-0.3.105-2
This installer is also X-based, so if you choose to run it remotely, a correct setup can be verified using xclock. Setting these environment variables is optional prior to running the installer, but they become mandatory once the product is installed, so these two should be put into the appropriate shell profile for the user (in this case, oracle):
export ORACLE_BASE=/mnt/ohome/oracle/1010
export ORACLE_HOME=/mnt/ohome/oracle/1010/product/db
The installer is located at the top of the install tree. The database installer files are located in the same directory as the Oracle Clusterware install directories, which is /home/oracle/inst. To start the installer, execute this shell script:
/home/oracle/inst/database/runInstaller
After an initial installer splash screen, this screen appears. Click Next.
Choose a simple Name (as in the example above) and verify the Path. The installer should have extracted the path from the
environment variable:
export ORACLE_HOME=/mnt/ohome/oracle/1010/product/db
Click Next.
Figure 8.4. Oracle Universal Installer: Specify Hardware Cluster Installation Mode window
Because this is a shared home install, leave only rac1 checked in the Specify Hardware Cluster Installation Mode window.
The Product-Specific Prerequisite Checks may fail due to some of the installer's best-practice minimums not being met. It sometimes makes sense to at least review the Warnings to see if they are a legitimate concern. Often they are not, as in this case.
The Product-Specific Prerequisite Checks window may fail due to some of the installer best-practice minimums not being met. It does make sense to at least review the Warnings to see if they are a legitimate concern. In this case, there were zero requirements to be verified; this appears between the two panes in the window. Click Next.
This is the Summary screen. Only one node appears in the Cluster Nodes section because this is a shared home install. A lot of files need to be copied, processed, and linked, and it is during this process that you find out if you had not installed the libaio-0.3.103-3.x86_64.rpm. Click Install.
Once complete, a new window will open up on top of the Install screen asking for a script to be executed. Run:
$ sudo ./root.sh
Once this script has completed, click OK, which returns processing to the Install screen and eventually to the final End of
Installation screen. Click Exit.
/mnt/ohome/oracle/1010/product/db/bin/netca
Click Next.
Figure 8.12. Oracle Net Configuration Assistant: RAC Active Nodes window
Verify that these are the correct node names (they should be) and also that they are all selected before clicking Next.
Click Next.
Figure 8.14. Oracle Net Configuration Assistant: Listener Configuration, Listener window
Click Next.
Figure 8.15. Oracle Net Configuration Assistant: Listener Configuration, Listener Name
window
Click Next.
Figure 8.16. Oracle Net Configuration Assistant: Listener Configuration, Select Protocols
window
Click Next.
Figure 8.17. Oracle Net Configuration Assistant: Listener Configuration, TCP/IP Protocol
window
Do not use the standard port number of 1521. Use another port number that is supplied to you by Network Operations. Access control lists in modern switches block most port numbers, so the value assigned must be on the switch's access control list (ACL), or clients will not be able to connect to the database. Even if this is not the case, it is still best not to choose the default. Choose your birth year, perhaps. It is rare, but some database applications still assume 1521 is the listener port and will not be able to connect; check the application's network configuration documentation to determine how to set it up with the correct port number. Click Next.
Figure 8.18. Oracle Net Configuration Assistant: Listener Configuration Done window
Click Next and then Finish at the next window, which will exit the netca.
By checking the output of the Clusterware command $CRS_HOME/crs_stat -t, you can verify that the listeners are now registered with cluster services and will automatically restart when Clusterware restarts (which is usually when the machine is rebooted). Run:
crs_stat -t
After the database is created, it can also be registered with Clusterware. Once the database registration is complete, the four instances can also be registered with Clusterware, so that the database instance on a given node will also start up upon reboot. This Clusterware registration step must be performed after the database has been successfully created.
Chapter 9. Creating a Database
1. Database File Layout
A four-node Oracle RAC database is actually only one database, but it has four instances, one running on each node. An Oracle instance consists of shared memory (the Shared Global Area, or SGA) and Oracle background processes. Oracle functions like an operating system that has a transactional filesystem with a buffer cache and journaling (redo logs). Each instance shares access to the database files but maintains an instance-specific set of redo logs. Although these logs are instance-private, they must also be shared and visible so that any other node can perform RAC instance recovery.
Oracle file I/O falls into two usage categories: database files and transaction files. Database files (and the sparse TEMP
files) contain database blocks, which hold the user’s data. The I/O profile of these files is either small random reads and
writes or large sequential reads and writes, depending on the SQL application.
Transaction files (redo and undo) are typically small, very low-latency sequential writes and reads. The transaction files
are instance-specific and have an I/O profile distinct from the datafiles.
All volumes in our sample cluster are evenly striped across all spindles. This is called the SAME (Stripe and Mirror Everything) strategy, which avoids most I/O tuning problems. The problem you are going to have is the problem everyone faces: not enough IOPs or spindles. If you expand the array by adding more spindles, it must be capable of adding the performance capacity of these new spindles to existing GFS volumes. Most modern storage arrays do this, but it is usually considered an advanced activity for the storage administrator.
There is also an undo tablespace for each instance. For example, the undo tablespace holds the entire encoded contents of a table whose zip code column is being updated; all the values as they existed prior to the update statement are stored in the undo tablespace. If the transaction issues a commit, the undo contents are discarded. If the transaction issues a rollback, the undo contents are retrieved and the table is put back to its original pre-update state. The undo tablespaces are initially 64MB, but are allowed to expand up to 2GB. In our example, a single UPDATE statement covering hundreds of millions of zip code rows could be supported with 2GB of available undo. It is unlikely that the redo logs would ever be expanded, but undo requirements may exceed 2GB. If that is the case, then the redo/undo volumes could be created at 8GB instead of 4GB.
export ORACLE_BASE=/mnt/ohome/oracle/1010
export ORACLE_HOME=/mnt/ohome/oracle/1010/product/db
export ORACLE_BASE_SID=rhel
export ORACLE_INSTANCE=1
export ORACLE_SID=$ORACLE_BASE_SID$ORACLE_INSTANCE
export ADMHOME=/mnt/ohome/oracle/admin
export PATH=$ORACLE_HOME/bin:$CRS_HOME/bin:$PATH
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$LD_LIBRARY_PATH
$ADMHOME is a variable defined in addition to the standard list above; it refers to the location of all of the critical administration files for the RAC cluster. This admin directory must be on the shared home GFS volume. Listed below are some aliases you can't live without that help you navigate both $ADMHOME and $ORACLE_HOME, including a simple alias named dba that starts sqlplus and connects as sysdba with a minimum of typing:
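The alias list itself is missing from this copy; a plausible reconstruction of a shell-profile fragment follows (the dba alias name comes from the text, but the exact definitions are assumptions):

```
# Navigation and sqlplus aliases for the oracle user's shell profile
alias oh='cd $ORACLE_HOME'
alias adm='cd $ADMHOME'
alias dba='sqlplus "/ as sysdba"'
```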
Note
Do not recycle an init.ora from a previously released version of Oracle. Oracle releases change enough that old init.ora settings become irrelevant or even counter-productive.
Each instance can have its own private init.ora, but it is not required and creates more problems than it
solves. A single init.ora can be customized to contain instance-specific parameters.
Some DBAs require the use of the SPFILE feature in Oracle, which stores a copy of the init.ora inside the database. This is often a production policy preference, but this example uses a text-mode file, despite the risk that somebody could delete or corrupt it. Both the init.ora and the controlfile should be regularly archived or backed up with something as simple as a cron job.
The actual init.ora must be kept in a common location in the shared home admin directory for parameter files ($ADMHOME/$ORACLE_SID/pfile). All instances must have access to it and to other admin-related files. The Oracle SQL startup command assumes that the instance's init.ora file is located in $ORACLE_HOME/dbs, so four symbolic links must be created. This example shows how the environment variables factor into locating key files:
rac1 $ ls -l init*.ora
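The four links can be scripted. This dry run only prints the ln commands; the pfile path pattern ($ADMHOME/<SID>/pfile) and file names are assumptions based on the sample cluster's SIDs:

```shell
# Print the symlink commands that point each instance's init<SID>.ora in
# $ORACLE_HOME/dbs at the shared pfile copy. Remove 'echo' to execute.
ADMHOME=/mnt/ohome/oracle/admin
ORACLE_HOME=/mnt/ohome/oracle/1010/product/db
link_cmds() {
  for i in 1 2 3 4; do
    echo "ln -s $ADMHOME/rhel$i/pfile/initrhel$i.ora $ORACLE_HOME/dbs/initrhel$i.ora"
  done
}
link_cmds
```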
control_files='/mnt/ohome/oracle/admin/rhel/ctl/control01.ctl', '/mnt/oradata/oracle/ctl/co
*.db_name = 'rhel'
*.db_block_size = 8192
#
# SGA Sizing
*.sga_target = 3300M
#
# File I/O
filesystemio_options = setall
#
# Network and Listeners
rhel1.local_listener = listener_rac1
rhel2.local_listener = listener_rac2
rhel3.local_listener = listener_rac3
rhel4.local_listener = listener_rac4
#
# Undo
*.undo_management = 'AUTO'
rhel1.undo_tablespace = 'UNDOTBS1'
rhel2.undo_tablespace = 'UNDOTBS2'
rhel3.undo_tablespace = 'UNDOTBS3'
rhel4.undo_tablespace = 'UNDOTBS4'
#
# Foreground and Background Dump Destinations
rhel1.background_dump_dest ='/mnt/ohome/oracle/admin/rhel/bdump/rhel1'
rhel2.background_dump_dest ='/mnt/ohome/oracle/admin/rhel/bdump/rhel2'
rhel3.background_dump_dest ='/mnt/ohome/oracle/admin/rhel/bdump/rhel3'
rhel4.background_dump_dest ='/mnt/ohome/oracle/admin/rhel/bdump/rhel4'
*.core_dump_dest ='/mnt/ohome/oracle/admin/rhel/cdump'
*.user_dump_dest ='/mnt/ohome/oracle/admin/rhel/udump'
#
# RAC Identification
*.cluster_database_instances= 4
*.cluster_database = FALSE # FALSE ONLY for database create phase
rhel1.thread = 1
rhel2.thread = 2
rhel3.thread = 3
rhel4.thread = 4
rhel1.instance_name = rhel1
rhel2.instance_name = rhel2
rhel3.instance_name = rhel3
rhel4.instance_name = rhel4
rhel1.instance_number = 1
rhel2.instance_number = 2
rhel3.instance_number = 3
rhel4.instance_number = 4
In a few cases, some parameters are “optionally mandatory”: you do not have to set one, but if you do not, your
system will not work effectively. Parameters that size the SGA buffers and pools are examples.
A single point of failure in an Oracle RAC database can be the loss of all controlfiles.
If the controlfiles are in the same directory as the datafiles, then there can be a slight performance impact on create or
extend operations. This is easily avoided by putting the controlfiles in a sub-directory underneath the datafiles. All
controlfiles need to be on shared media for cluster recovery. Create a cron job to email the contents to a safe place.
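A plain file copy of a live controlfile is not guaranteed to be consistent; a safer sketch uses standard Oracle SQL to write a re-creation script, which lands as a trace file under user_dump_dest and can then be archived or mailed:

```shell
# Dump CREATE CONTROLFILE statements to a trace file under
# user_dump_dest; run from cron or by hand, then archive the trace.
sqlplus -s "/ as sysdba" << EOF
alter database backup controlfile to trace;
exit
EOF
```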
The *. at the beginning of a parameter indicates that this value applies to all instances that use this init.ora. An
instance-specific prefix overrides it; for example, if node rac4 could accommodate a buffer cache of 5GB, then an entry for its instance might look like:
rhel4.sga_target = 5072M
*.filesystemio_options = directIO
*.filesystemio_options = asynch
This parameter selects direct I/O, asynchronous I/O, both (setall), or neither (none). DirectIO bypasses the GFS
filesystem buffer cache, which prevents data from being cached twice, once by Oracle and once by the filesystem, and
provides near-raw performance. AsyncIO increases I/O performance further still, but its benefit is usually only seen at
very high I/O rates. setall, which enables both, is recommended for RHEL 4.3 and higher.
The tnsnames.ora entry for the rhel service lists the listener addresses of all four nodes:
rhel =
(DESCRIPTION=
(ADDRESS_LIST=
(LOAD_BALANCE=OFF)
(ADDRESS = (PROTOCOL = tcp)(HOST = rac1)(PORT = 1921))
(ADDRESS = (PROTOCOL = tcp)(HOST = rac2)(PORT = 1921))
(ADDRESS = (PROTOCOL = tcp)(HOST = rac3)(PORT = 1921))
(ADDRESS = (PROTOCOL = tcp)(HOST = rac4)(PORT = 1921))
)
(CONNECT_DATA=(SERVICE_NAME=rhel))
)
If you forget this step, then you will see error messages when attempting to start an instance.
If you start up the instance on rac1 with the nomount option, it will create an SGA and start up the background
processes. This verifies that it is possible to bring up an instance. Here is a simple script, which can easily be modified
to bring the database down instead (shutdown immediate). It assumes the default location for the init.ora
($ORACLE_HOME/dbs):
#!/bin/sh
sqlplus /nolog << EOF
connect / as sysdba
startup nomount
exit
EOF
rac1 $ nomnt
Because no database has been created yet, there is no valid controlfile and this is a normal error at this early stage.
However, you have verified that an Oracle instance on rac1 can start up. The next step is to actually create the database
using the following script:
spool db_create
STARTUP NOMOUNT
CREATE DATABASE rhel CONTROLFILE REUSE
LOGFILE
GROUP 1 ('/mnt/log1/oracle/logs/redo11.log') SIZE 512M reuse,
GROUP 2 ('/mnt/log1/oracle/logs/redo12.log') SIZE 512M reuse,
GROUP 3 ('/mnt/log1/oracle/logs/redo13.log') SIZE 512M reuse
CHARACTER SET UTF8
NATIONAL CHARACTER SET UTF8
NOARCHIVELOG
MAXINSTANCES 4
MAXLOGFILES 128
MAXLOGMEMBERS 3
MAXLOGHISTORY 10240
MAXDATAFILES 256
DATAFILE '/mnt/oradata/oracle/sys.dbf'
SIZE 256M REUSE EXTENT MANAGEMENT LOCAL
SYSAUX
DATAFILE '/mnt/oradata/oracle/sysaux.dbf'
SIZE 256M REUSE AUTOEXTEND ON NEXT 10M MAXSIZE UNLIMITED
UNDO TABLESPACE undotbs1
DATAFILE '/mnt/log1/oracle/undo1.dbf'
SIZE 64M REUSE AUTOEXTEND ON NEXT 64M MAXSIZE 2048M
DEFAULT TEMPORARY TABLESPACE temp
TEMPFILE '/mnt/oradata/oracle/temp.dbf'
SIZE 256M REUSE AUTOEXTEND ON NEXT 1024M MAXSIZE UNLIMITED;
rem Make sure that the basic create works and then either re-run
rem the whole thing or paste the rest of it into a sqlplus session
rem
exit;
CREATE UNDO TABLESPACE undotbs2
DATAFILE '/mnt/log2/oracle/undo2.dbf'
SIZE 64M REUSE AUTOEXTEND ON NEXT 64M MAXSIZE 2048M;
CREATE UNDO TABLESPACE undotbs3
DATAFILE '/mnt/log3/oracle/undo3.dbf'
SIZE 64M REUSE AUTOEXTEND ON NEXT 64M MAXSIZE 2048M;
CREATE UNDO TABLESPACE undotbs4
DATAFILE '/mnt/log4/oracle/undo4.dbf'
SIZE 64M REUSE AUTOEXTEND ON NEXT 64M MAXSIZE 2048M;
ALTER DATABASE ADD LOGFILE THREAD 2
GROUP 4 ( '/mnt/log2/oracle/logs/redo21.log' ) SIZE 512M reuse,
GROUP 5 ( '/mnt/log2/oracle/logs/redo22.log' ) SIZE 512M reuse,
GROUP 6 ( '/mnt/log2/oracle/logs/redo23.log' ) SIZE 512M reuse;
ALTER DATABASE ENABLE PUBLIC THREAD 2;
ALTER DATABASE ADD LOGFILE THREAD 3
GROUP 7 ( '/mnt/log3/oracle/logs/redo31.log' ) SIZE 512M reuse,
GROUP 8 ( '/mnt/log3/oracle/logs/redo32.log' ) SIZE 512M reuse,
GROUP 9 ( '/mnt/log3/oracle/logs/redo33.log' ) SIZE 512M reuse;
ALTER DATABASE ENABLE PUBLIC THREAD 3;
ALTER DATABASE ADD LOGFILE THREAD 4
GROUP 10 ( '/mnt/log4/oracle/logs/redo41.log' ) SIZE 512M reuse,
GROUP 11 ( '/mnt/log4/oracle/logs/redo42.log' ) SIZE 512M reuse,
GROUP 12 ( '/mnt/log4/oracle/logs/redo43.log' ) SIZE 512M reuse;
ALTER DATABASE ENABLE PUBLIC THREAD 4;
Before running this script, remember to create all sub-directories for the logs and the controlfiles, as Oracle will not
create them. This simple wrapper script creates the database by calling the create.sql script:
#!/bin/bash
# @? in sqlplus translates to $ORACLE_HOME
time sqlplus /nolog << EOF > bld.lst
connect / as sysdba
set echo on
shutdown abort
@create
@?/rdbms/admin/catalog
EOF
The catalog.sql script is a master script that creates the views of the Oracle data dictionary. If these steps succeed,
then you have a single-node RAC database running in exclusive mode. Shut down the database, change
*.cluster_database = TRUE, and then start up rac1 again. (Remember, this is still all on rac1.) If the listener is
running on this node, then the network status of this node can be verified using the command:
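The exact command is not preserved in this listing; a plausible check, assuming the listener alias listener_rac1 from this node's listener.ora, is:

```shell
# Query the local listener's status and registered services; the alias
# listener_rac1 is an assumption based on this example's listener.ora
lsnrctl status listener_rac1
```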
2.4.13. Registering the database and the instances with Oracle Clusterware
Oracle Clusterware can register both the database and each instance, so that it automatically starts an instance when it
detects that the instance is not running. The srvctl utility is used to register the instances for auto-start:
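A sketch of the registration, using standard 10g srvctl syntax with the database, instance, and node names from this example cluster:

```shell
# Register the database, then each instance with its home node.
# The ORACLE_HOME path matches this example's shared install.
srvctl add database -d rhel -o /mnt/ohome/oracle/1010/product/db
srvctl add instance -d rhel -i rhel1 -n rac1
srvctl add instance -d rhel -i rhel2 -n rac2
srvctl add instance -d rhel -i rhel3 -n rac3
srvctl add instance -d rhel -i rhel4 -n rac4
```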
It also permits the use of srvctl to manually start and stop the instances from one node:
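For example, with the standard srvctl syntax and this example's names:

```shell
# Start or stop a single instance, or the whole database, from any node
srvctl start instance -d rhel -i rhel3
srvctl stop instance -d rhel -i rhel3
srvctl start database -d rhel
```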
This command may be run from any node and it can retrieve the status of any node:
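For example:

```shell
# Status of one instance, or of every instance of the database
srvctl status instance -d rhel -i rhel2
srvctl status database -d rhel
```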
To get a consolidated cluster-wide status, the $CRS_HOME/bin/crs_stat -t utility can be used. This example
shows all services and instances registered and online. The instances are listed at the end; their ONLINE status
indicates that each database instance is registered with Oracle Clusterware and is running.
rac1 $ crs_stat -t
------------------------------------------------------------
ora....C1.lsnr application ONLINE ONLINE rac1
ora.rac1.gsd application ONLINE ONLINE rac1
ora.rac1.ons application ONLINE ONLINE rac1
ora.rac1.vip application ONLINE ONLINE rac1
ora....C2.lsnr application ONLINE ONLINE rac2
ora.rac2.gsd application ONLINE ONLINE rac2
ora.rac2.ons application ONLINE ONLINE rac2
ora.rac2.vip application ONLINE ONLINE rac2
ora....C3.lsnr application ONLINE ONLINE rac3
ora.rac3.gsd application ONLINE ONLINE rac3
ora.rac3.ons application ONLINE ONLINE rac3
ora.rac3.vip application ONLINE ONLINE rac3
ora....C4.lsnr application ONLINE ONLINE rac4
ora.rac4.gsd application ONLINE ONLINE rac4
ora.rac4.ons application ONLINE ONLINE rac4
ora.rac4.vip application ONLINE ONLINE rac4
ora.rhel.db application ONLINE ONLINE rac2
ora....l1.inst application ONLINE ONLINE rac1
ora....l2.inst application ONLINE ONLINE rac2
ora....l3.inst application ONLINE ONLINE rac3
ora....l4.inst application ONLINE ONLINE rac4
A corresponding network status inquiry from rac1 shows the presence of services for all four nodes:
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=rac1-vip)(PORT=1921)(IP=FIRST)))
STATUS of the LISTENER
------------------------
Alias LISTENER_RAC1
Version TNSLSNR for Linux: Version 10.2.0.1.0 - Production
Start Date 14-APR-2006 17:08:13
Uptime 0 days 9 hr. 19 min. 49 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /mnt/ohome/oracle/1010/product/db/network/admin/listener.ora
Listener Log File /mnt/ohome/oracle/1010/product/db/network/log/listener_rac1.log
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.20)(PORT=1921)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.150)(PORT=1921)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
Service "rhel" has 4 instance(s).
Instance "rhel1", status READY, has 3 handler(s) for this service...
Instance "rhel2", status READY, has 2 handler(s) for this service...
Instance "rhel3", status READY, has 2 handler(s) for this service...
Instance "rhel4", status READY, has 2 handler(s) for this service...
Service "rhel_XPT" has 4 instance(s).
Instance "rhel1", status READY, has 3 handler(s) for this service...
Instance "rhel2", status READY, has 2 handler(s) for this service...
Instance "rhel3", status READY, has 2 handler(s) for this service...
Instance "rhel4", status READY, has 2 handler(s) for this service...
The command completed successfully