What Is a Cluster?
A cluster consists of a group of independent but interconnected computers whose combined
resources can be applied to a processing task. A common cluster feature is that it should appear
to an application as though it were a single server. Most cluster architectures use a dedicated
network (cluster interconnect) for communication and coordination between cluster nodes.
A common cluster architecture for data-intensive transactions and computations is built around
shared disk storage. Shared-nothing clusters use an alternative architecture where storage is not
shared and data must be either replicated or segmented across the cluster. Shared-nothing
clusters are commonly used for workloads that can be easily and predictably divided into small
units that can be spread across the cluster in parallel. Shared disk clusters can perform these
tasks but also offer increased flexibility for varying workloads. Load balancing clusters allow a
single application to balance its workload across the cluster. Alternatively, in a failover cluster,
some nodes can be designated as the primary host for an application, whereas others act as the
primary host for different applications. In a failover cluster, the failure of a node requires that the
applications it supports be moved to a surviving node. Load balancing clusters can provide
failover capabilities but they can also run a single application across multiple nodes providing
greater flexibility for different workload requirements. Oracle supports a shared disk cluster
architecture providing load balancing and failover capabilities. In an Oracle cluster, all nodes
must share the same processor architecture and run the same operating system.
What Is Clusterware?
Clusterware is a term used to describe software that provides interfaces and services that enable
and support a cluster.
Different cluster architectures require clusterware that delivers different services. For example,
in a simple failover cluster the clusterware may monitor the availability of applications and
perform a failover operation if a cluster node becomes unavailable. In a load balancing cluster,
different services are required to support workload concurrency and coordination.
Typically, clusterware includes capabilities that:
• Allow the cluster to be managed as a single entity (not including OS requirements), if
desired
• Protect the integrity of the cluster so that data is protected and the cluster continues to
function even if communication with a cluster node is severed
• Maintain a registry of resources so that their location is known across the cluster and so that dependencies between resources are maintained
• Deal with changes to the cluster such as node additions, removals, or failures
• Provide a common view of resources such as network addresses and files in a file system
Oracle Clusterware
Oracle Clusterware is a key part of Oracle Grid Infrastructure, which also includes Automatic
Storage Management (ASM) and the ASM Cluster File System (ACFS).
In Release 11.2, Oracle Clusterware can use ASM for all the shared files required by the cluster.
Oracle Clusterware is also an enabler for the ASM Cluster File System, a generalized cluster file
system that can be used for most file-based data such as documents, spreadsheets, and reports.
The combination of Oracle Clusterware, ASM, and ACFS provides administrators with a unified
cluster solution that is not only the foundation for the Oracle Real Application Clusters (RAC)
database, but can also be applied to all kinds of other applications.
Note: Oracle Grid Infrastructure is the collective term for Oracle Clusterware, ASM, and ACFS. These three components are tightly integrated and are installed and managed together.
[Figure: Cluster nodes, each with two network interfaces (NIC1 and NIC2); one interface on each node attaches to the private network that serves as the cluster interconnect.]
GPnP Domain
The GPnP domain is a collection of nodes served by the GPnP service. In most cases, the domain is limited to the scope of the multicast domain. The nodes in a multicast
domain implicitly know which GPnP domain they are in by the public key of their provisioning
authority. A GPnP domain is defined by the nodes in a multicast domain that recognize a
particular provisioning authority, as determined by the certificate in their software images. A
GPnP node refers to a computer participating in a GPnP domain. It must have the following:
IP Connectivity: The node must have at least one routable interface with connectivity outside of
the GPnP domain. If the node has varying connectivity (multiple interfaces, subnets, and so on),
the required binding (public, private, storage) needs to be identified in the GPnP profile. The
physical connectivity is controlled by some outside entity and is not modified by GPnP.
Unique identifier: Each node has a unique identifier, established through OSD. The ID must be unique within the GPnP domain and is ideally unique across the largest plausible domain.
Personality: Characteristics of the node personality include:
• Cluster name
• Network classifications (public/private)
• Storage to be used for ASM and CSS
• Digital signatures
• The software image for the node, including the applications that may be run
Software image
• A software image is a read-only collection of software to be run on nodes of the same type.
• At a minimum, the image must contain:
– An operating system
– The GPnP software
– The security certificate of the provisioning authority
– Other software required to configure the node when it starts up
GPnP Components
A software image is a read-only collection of software to be run on nodes of the same type. At a
minimum, it must contain an operating system, the GPnP software, the security certificate of the
provisioning authority, and other software required to configure the node when it starts up. In the
fullest form, the image may contain all application software to be run on the node, including the
Oracle Database, iAS, and customer applications. Under GPnP, an image need not be specific to
a particular node and may be deployed to as many nodes as needed.
An image is created by a provisioning authority through means not defined by GPnP. It may
involve doing an installation on an example machine, then “scraping” the storage to remove
node-dependent data.
An image is distributed to a node by the provisioning authority in ways outside the scope of
GPnP—it may be a system administrator carrying CDs around, a network boot, or any other
mechanism.
GPnP Profile
A GPnP profile is a small XML file that is used to establish the correct global personality for
otherwise identical images. A profile may be obtained at boot time from a remote location, or it
may be burned permanently in an image. It contains generic GPnP configuration information
and may also contain application-specific configuration data. The generic GPnP configuration
data is used to configure the network environment and the storage configuration during bootup.
Application parts of the profile may be used to establish application-specific personalities. The
profile is security sensitive. It might identify the storage to be used as the root partition of a
machine. A profile must be digitally signed by the provisioning authority, and must be validated
on each node before it is used. Therefore, the image must contain the public certificate of the
authority to be used for this validation. Profiles are intended to be difficult to change; they may be burned into read-only images or may need to be replicated to machines that are down when a change is made.
Profile attributes defining node personality include:
• Cluster name
• Network classifications (public/private)
• Storage to be used for ASM and CSS
• Digital signature information
Changes made to the cluster, and therefore to the profile, are replicated by the gpnpd daemon during installation, during system boot, or when the profile is updated. Such updates can be triggered by making changes with configuration tools such as oifcfg, crsctl, asmca, and so on.
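The following is a simplified sketch of what a GPnP profile can look like. The element and attribute names are abridged (namespace declarations and most signature details are omitted), and the cluster name, network interfaces, and discovery strings shown are illustrative values only, not taken from any particular installation. In a running 11.2 cluster, the actual profile can be displayed with the gpnptool get command.

<?xml version="1.0" encoding="UTF-8"?>
<gpnp:GPnP-Profile Version="1.0" ClusterName="cluster01">
  <!-- Network classifications: which subnet is public and which forms
       the private cluster interconnect -->
  <gpnp:Network-Profile>
    <gpnp:HostNetwork id="gen" HostName="*">
      <gpnp:Network id="net1" Adapter="eth0" IP="192.0.2.0"    Use="public"/>
      <gpnp:Network id="net2" Adapter="eth1" IP="192.168.10.0" Use="cluster_interconnect"/>
    </gpnp:HostNetwork>
  </gpnp:Network-Profile>
  <!-- Storage to be used for CSS voting files and for ASM -->
  <orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
  <orcl:ASM-Profile id="asm" DiscoveryString="/dev/sd*" SPFile="+DATA/cluster01/spfileASM.ora"/>
  <!-- Digital signature of the provisioning authority; each node validates
       this signature against the public certificate in its software image
       before using the profile -->
  <ds:Signature>...</ds:Signature>
</gpnp:GPnP-Profile>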
Grid Naming Service
[Figure: Grid Naming Service and SCAN architecture. Corporate DNS delegates the cluster subdomain to GNS; node VIP and SCAN VIP addresses are obtained through DHCP; the gpnpd daemons handle GPnP profile replication and discovery, and orarootagent manages the network resources. A database client resolves the SCAN name, connects to a SCAN listener (remote_listener), and is handed off to the local listener (local_listener) on the least-loaded node that offers the requested service, with the database files on shared storage.]
Levels of Scalability
Successful implementation of cluster databases requires optimal scalability on four levels:
• Hardware scalability: Interconnectivity is the key to hardware scalability, which greatly
depends on high bandwidth and low latency.
• Operating system scalability: Methods of synchronization in the operating system can
determine the scalability of the system. In some cases, potential scalability of the hardware
is lost because of the operating system’s inability to handle multiple resource requests
simultaneously.
• Database management system scalability: A key factor in parallel architectures is whether the parallelism is achieved internally or by external processes. The answer to this question affects the synchronization mechanism.
• Application scalability: Applications must be specifically designed to be scalable. A bottleneck occurs in systems in which every session is updating the same data most of the time. Note that this is not RAC specific and is true on single-instance systems too.
It is important to remember that if any of the areas above are not scalable (no matter how
scalable the other areas are), then parallel cluster processing may not be successful. A typical
cause for the lack of scalability is one common shared resource that must be accessed often. This
causes the otherwise parallel operations to serialize on this bottleneck. A high latency in the
synchronization increases the cost of synchronization, thereby counteracting the benefits of
parallelization. This is a general limitation and not a RAC-specific limitation.
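As a rough illustration (Amdahl's law, a general result rather than anything RAC-specific): if a fraction s of each unit of work must serialize on one shared resource, the best possible speedup on n nodes is

Speedup(n) = 1 / (s + (1 - s) / n)

so even with only 10% of the work serialized (s = 0.1), the speedup can never exceed 10 no matter how many nodes are added, and synchronization latency pushes the achievable speedup lower still.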
Scaleup and Speedup
[Figure: Original system and scaled configuration. Several servers, each with two host bus adapters (HBA1 and HBA2), attach to shared storage through a switch, giving 8 × 200 MB/s = 1600 MB/s of aggregate bandwidth.]
Throughput Performance

Component         Theoretical throughput   Maximum throughput
HBA               1/2 Gbit/s               100/200 MB/s
16-port switch    8 × 2 Gbit/s             1600 MB/s
Fibre Channel     2 Gbit/s                 200 MB/s
[Figure: Lost updates. Without coordination, two nodes each read the same block (value 1008), update it independently, and write it back, so one of the updates is lost.]
[Figure: Global Resource Directory (GRD). Each instance, Instance1 through Instancen, masters a portion of the global resources in its cache; the GES and GCS layers, implemented by the LMON, LMD0, LMSx, LCK0, and DIAG background processes, coordinate those resources across the interconnect.]
[Figures: Cache Fusion block transfer between Instance1 on Node1 and Instance2 on Node2. The current image of a block (value 1009, later updated to 1010) is shipped directly between the instance caches over the interconnect, so only one disk I/O is needed to write the block back to shared storage.]
Dynamic Reconfiguration
When one instance departs the cluster, the GRD portion of that instance needs to be redistributed
to the surviving nodes. Similarly, when a new instance enters the cluster, the GRD portions of
the existing instances must be redistributed to create the GRD portion of the new instance.
Instead of remastering all resources across all nodes, RAC uses an algorithm called lazy
remastering to remaster only a minimal number of resources during a reconfiguration. This is
illustrated on the slide. For each instance, a subset of the GRD being mastered is shown along
with the names of the instances to which the resources are currently granted. When the second
instance fails, its resources are remastered on the surviving instances. As the resources are
remastered, they are cleared of any reference to the failed instance.
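Resource mastering can also be observed from SQL. As a minimal sketch (assuming the standard Oracle 11g dynamic performance views; the available columns vary by release), the following queries show which instance currently masters each cached object and summarize remastering activity:

-- Current and previous master instance for each cached object,
-- together with how many times the object has been remastered
SELECT * FROM V$GCSPFMASTER_INFO;

-- Cumulative statistics for dynamic remastering operations
SELECT * FROM V$DYNAMIC_REMASTER_STATS;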
[Figure: Lazy remastering after an instance failure. Resources that were mastered by Instance2 on Node2 are remastered by the surviving Instance1 on Node1, and references to the failed instance are cleared.]
No messages are sent to the remote node when reading into the cache.