Oracle Databases on EMC Symmetrix Storage Systems
Version 1.3
Yaron Dar
Copyright © 2008, 2009, 2010, 2011 EMC Corporation. All rights reserved.
EMC believes the information in this publication is accurate as of its publication date. The information is
subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO
REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS
PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR
FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable
software license.
For the most up-to-date regulatory document for your product line, go to the Technical Documentation and
Advisories section on EMC Powerlink.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.
All other trademarks used herein are the property of their respective owners.
H2603.3
Figures
Title Page
1 Oracle Systems Architecture......................................................................... 27
2 Physical data elements in an Oracle configuration ................................... 30
3 Relationship between data blocks, extents, and segments....................... 32
4 Oracle two-node RAC configuration........................................................... 37
5 Symmetrix VMAX logical diagram ............................................................. 47
6 Basic synchronous SRDF configuration ...................................................... 54
7 SRDF consistency group ............................................................................... 57
8 SRDF establish and restore control operations .......................................... 63
9 SRDF failover and failback control operations .......................................... 65
10 Geographically distributed four-node EMC SRDF/CE clusters............. 67
11 EMC Symmetrix configured with standard volumes and BCVs ............ 69
12 ECA consistent split across multiple database-associated hosts............. 73
13 ECA consistent split on a local Symmetrix system ................................... 74
14 Creating a copy session using the symclone command ........................... 77
15 TimeFinder/Snap copy of a standard device to a VDEV......................... 80
16 SRM commands.............................................................................................. 82
17 EMC Storage Viewer...................................................................................... 87
18 PowerPath/VE vStorage API for multipathing plug-in........................... 91
19 Output of rpowermt display command on a Symmetrix VMAX device ............... 94
20 Device ownership in vCenter Server........................................................... 95
21 Virtual Provisioning components .............................................................. 101
22 Virtual LUN eligibility tables ..................................................................... 103
23 Copying a cold (shutdown) Oracle database with TimeFinder/Mirror ............... 112
24 Copying a cold Oracle database with TimeFinder/Clone ..................... 114
25 Copying a cold Oracle database with TimeFinder/Snap....................... 116
26 Copying a running Oracle database with TimeFinder/Mirror............. 119
27 Copying a running Oracle database with TimeFinder/Clone .............. 121
28 Copying a running Oracle database with TimeFinder/Snap................ 123
Tables
Title Page
1 Oracle background processes ........................................................................ 28
2 SYMCLI base commands ............................................................................... 49
3 TimeFinder device type summary................................................................ 79
4 Data object SRM commands .......................................................................... 83
5 Data object mapping commands .................................................................. 83
6 File system SRM commands to examine file system mapping ................ 84
7 File system SRM command to examine logical volume mapping ........... 85
8 SRM statistics command ................................................................................ 85
9 Comparison of database cloning technologies ......................................... 154
10 Database cloning requirements and solutions .......................................... 154
11 Background processes for managing a Data Guard environment......... 280
12 Initialization parameters .............................................................................. 331
13 Background processes for managing a Data Guard environment......... 353
14 FAST VP Oracle test environment .............................................................. 390
15 Initial tier allocation for test cases with shared ASM disk group .......... 391
16 FINDB initial tier allocation......................................................................... 393
17 Initial AWR report for FINDB ..................................................................... 393
18 Oracle database tier allocations-initial and FAST VP enabled ............... 395
19 FAST VP enabled database response time from the AWR report ......... 395
20 FINDB and HRDB initial storage tier allocation....................................... 397
21 Initial AWR report for FINDB ..................................................................... 397
22 FAST VP enabled database transaction rate changes .............................. 399
23 Initial tier allocation for a test case with independent ASM disk groups ............... 399
24 Initial AWR report for CRMDB and SUPCHDB....................................... 401
25 FAST VP enabled AWR report for CRMDB and SUPCHDB .................... 402
26 Storage tier allocation changes during the FAST VP-enabled run ........ 403
27 Test configuration ......................................................................................... 408
28 Storage and ASM configuration for each test database........................... 409
29 Database storage placement (initial) and workload profile.................... 409
30 Initial Oracle AWR report inspection (db file sequential read).............. 410
31 Initial FAST performance analysis results................................................. 416
32 Results after FAST migration of DB3 to Flash .......................................... 417
33 ASM disk groups and Symmetrix device and composite groups ........ 444
34 Test hardware ................................................................................................ 464
Conventions used in this document
EMC uses the following conventions for special notices.
Note: A note presents information that is important, but not hazard-related.
IMPORTANT
An important notice contains information essential to operation of
the software or hardware.
Typographical conventions
EMC uses the following type style conventions in this document:
Normal Used in running (nonprocedural) text for:
• Names of interface elements (such as names of windows,
dialog boxes, buttons, fields, and menus)
• Names of resources, attributes, pools, Boolean expressions,
buttons, DQL statements, keywords, clauses, environment
variables, functions, utilities
• URLs, pathnames, filenames, directory names, computer
names, filenames, links, groups, service keys, file systems,
notifications
Bold Used in running (nonprocedural) text for:
• Names of commands, daemons, options, programs, processes,
services, applications, utilities, kernels, notifications, system
calls, man pages
Used in procedures for:
• Names of interface elements (such as names of windows,
dialog boxes, buttons, fields, and menus)
• What user specifically selects, clicks, presses, or types
Italic Used in all text (including procedures) for:
• Full titles of publications referenced in text
• Emphasis (for example a new term)
• Variables
Introduction
The Oracle RDBMS on open systems first became available in 1979
and has steadily grown to become the market share leader in
enterprise database solutions. With a wide variety of features and
functionality, Oracle provides a stable platform for handling
concurrent, read-consistent access to a customer's application data.
Oracle Database 10g and 11g, the latest releases of the Oracle RDBMS,
have introduced a variety of new and enhanced features over
previous versions of the database. Among these are:
◆ Increased self-management through features such as Automatic
Undo Management, Oracle managed files, and mean time to
recovery enhancements.
◆ Improved toolsets and utilities such as Recovery Manager
(RMAN), Oracle Data Guard, and Oracle Enterprise Manager
(OEM).
◆ Introduction of Automatic Storage Management (ASM).
◆ Enhancements to Oracle Real Application Clusters.
◆ Introduction of Database Resource Manager.
◆ Enhancements to Oracle Flashback capabilities.
◆ Introduction of Oracle VM server virtualization.
Oracle's architectural robustness, scalability, and availability
functions have positioned it as a cornerstone in many customers'
enterprise system infrastructures. A large number of EMC®
customers use Oracle in open-systems environments to support large,
mission-critical business applications.
Oracle overview
The Oracle RDBMS can be configured in multiple ways. The
requirement for 24x7 operations, replication and disaster recovery,
and the capacity of the host(s) that will contain the Oracle instance(s)
will, in part, determine how the Oracle environment must be
architected.
[Figure 1: Oracle systems architecture. The System Global Area (SGA) contains the Shared Pool, DB Block Buffers, Redo Log Buffers, and Data Dictionary; around it run the background processes PMON, SMON, CKPT, DBWn, LGWR, and ARCn, plus the Snnn server processes with their PGA. The active redo logs and archive logs are shown on disk.]
The System Global Area (SGA) contains the basic memory structures
that an Oracle database instance requires to function. The SGA
contains memory structures such as the Buffer Cache (shared area for
users to read or write Oracle data blocks), Redo Log Buffer (circular
buffer for the Oracle logs), Shared Pool (including user SQL and
PL/SQL code, data dictionary, and more), Large Pool, and others.
Oracle overview 27
Oracle on Open Systems
Process Description

DBWn (Database Writer): Writes data from the buffer cache to the datafiles on disk. Up to 20 database writer processes can be started per Oracle instance. The number of writers can be controlled manually with the DB_WRITER_PROCESSES init.ora parameter; if it is not specified, Oracle determines the number of writers automatically.

LGWR (Log Writer): Manages the redo log buffer, writing data from the buffer to the redo logs on disk. Log writer writes to the logs whenever one of these four scenarios occurs:
• A user commits a transaction
• Every three seconds
• When the redo buffer is one-third full
• When DB writer needs to write dirty blocks whose redo is still in the redo buffer

ARCn (Database Archiver): Copies the redo logs to one or more log directories when a log switch occurs. The ARCn process is only started if the database is in ARCHIVELOG mode and automatic archiving is enabled. Up to 10 archive processes can be started per Oracle instance, controlled by the init.ora parameter LOG_ARCHIVE_MAX_PROCESSES.

CKPT (Checkpoint): When the Oracle system performs a checkpoint, DBWn needs to destage data to disk. The CKPT process updates the data file headers accordingly.

PMON (Process Monitor): Cleans up after a user process fails. The process frees up resources including database locks and the buffer cache blocks of the failed process.

Snnn (Server processes): Connect user processes to the database instance. Server processes can be either dedicated or shared, depending on user requirements and the amount of host memory available.
[Figure 2: Physical data elements in an Oracle configuration: the SYSTEM tablespace, control files (CNTL 1, CNTL 2), redo logs (REDO1, REDO2), archive logs (ARCH 14 through ARCH 16), and the Oracle binaries.]
Oracle redo logs contain data and undo changes. All changes to the database are written to the redo logs, unless logging is explicitly disabled for database objects that allow it, such as user tables. Two or more redo logs are configured, and normally the logs are multiplexed to prevent data loss in the event that database recovery is required.
Archive logs are offloaded copies of the redo logs and are normally
required for recovering an Oracle database. Archive logs can be
multiplexed, both locally and remotely.
Oracle binaries are the executables and libraries used to initiate the
Oracle instance. Along with the binaries, Oracle uses many other
files to manage and monitor the database. These files include the
initialization parameter file (init<sid>.ora), server parameter file
(SPFILE), alert log, and trace files.
Figure 3 shows the relationship between the data blocks, extents, and
segments.
[Figure 3: Relationship between data blocks, extents, and segments: a segment of 1,920 KB composed of two 960 KB extents.]
Storage management
Standard Oracle backup/restore, disaster recovery, and cloning
methods can be difficult to manage and time-consuming. EMC
Symmetrix® provides many alternatives or solutions that make these
operations easy to manage, fast, and very scalable. In addition, EMC
developed many best practices that increase Oracle performance and
high availability when using Symmetrix storage arrays.
[Figure 4: Oracle two-node RAC configuration: each node runs its own instance (SGA and binaries) against the SYSTEM, DATA, and INDEX tablespaces on shared storage.]
Install base
With more than 55,000 mutual customers, EMC and Oracle are
recognized as the leaders in automated networked storage and
enterprise software, respectively. The EMC Symmetrix VMAX and
DMX offer the highest levels of performance, scalability and
availability along with industry-leading software for successfully
managing and maintaining complex Oracle database environments.
In addition, EMC IT has one of the largest deployments of Oracle
Applications in the world, with over 35,000 named users and over
3,500 concurrent users at peak periods. Oracle IT also uses both
CLARiiON® and Symmetrix storage extensively.
Joint engineering
Engineers for EMC and Oracle continue to work together to develop
integrated solutions, document best practices, and ensure
interoperability for customers deploying Oracle databases in EMC
Symmetrix VMAX and DMX storage environments. Key EMC
technologies such as TimeFinder and SRDF have been certified
through Oracle's Storage Certification Program (OSCP). Although
Oracle has phased out the OSCP as these technologies matured,
engineering efforts continue between the two companies to ensure
successful integration of each company's products. With each major
technology change or new product line, EMC briefs Oracle Engineering
on the changes and the two companies jointly review best practices.
EMC publishes many of these technology and deployment best
practices as joint-logo papers; the presence of the Oracle logo reflects
the strong communication and relationship between the companies.
Introduction
EMC provides many hardware and software products that support
Oracle environments on Symmetrix systems. This chapter provides a
technical overview of the EMC products referenced in this document.
The following products, which are highlighted and discussed, were
used and/or tested with VMware Infrastructure deployed on EMC
Symmetrix.
EMC offers an extensive product line of high-end storage solutions
targeted to meet the requirements of mission-critical databases and
applications. The Symmetrix product line includes the DMX Direct
Matrix Architecture™ series and the VMAX Virtual Matrix™ series.
EMC Symmetrix is a fully redundant, high-availability storage
processor, providing nondisruptive component replacements and
code upgrades. The Symmetrix system features high levels of
performance, data integrity, reliability, and availability.
EMC Enginuity™ Operating Environment — Enginuity enables
interoperation between the latest Symmetrix platforms and previous
generations of Symmetrix systems and enables them to connect to a
large number of server types, operating systems and storage software
products, and a broad selection of network connectivity elements and
other devices, ranging from HBAs and drivers to switches and tape
systems.
EMC Solutions Enabler — Solutions Enabler is a package that
contains the SYMAPI runtime libraries and the SYMCLI command
line interface. SYMAPI provides the interface to the EMC Enginuity
operating environment. SYMCLI is a set of commands that can be
invoked from the command line or within scripts. These commands
can be used to monitor device configuration and status, and to
perform control operations on devices and data objects within a
storage complex.
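As an illustration of basic SYMCLI usage (the Symmetrix ID shown is hypothetical), the host's view of the storage can be discovered and listed as follows:

symcfg discover
symdev list -sid 1234

The symcfg discover command builds the SYMAPI configuration database on the host, and symdev list displays the Symmetrix devices visible to it.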
EMC Symmetrix Remote Data Facility (SRDF) — SRDF is a
business continuity software solution that replicates and maintains a
mirror image of data at the storage block level in a remote Symmetrix
system. The SRDF component extends the basic SYMCLI command
set of Solutions Enabler to include commands that specifically
manage SRDF.
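For example, assuming a device group named MyDevGrp has been defined (as in the examples later in this document), the state of its SRDF pairs can be examined with:

symrdf -g MyDevGrp query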
Introduction 43
EMC Foundation Products
[Figure 6: Basic synchronous SRDF configuration: a server writes to the source Symmetrix system, which mirrors the data to the target Symmetrix system over ESCON, Fibre Channel, or GigE links at distances of less than 200 km.]
SRDF benefits
SRDF offers the following features and benefits:
◆ High data availability
◆ High performance
◆ Flexible configurations
◆ Host and application software transparency
◆ Automatic recovery from a component or link failure
◆ Significantly reduced recovery time after a disaster
◆ Increased integrity of recovery procedures
◆ Reduced backup and recovery costs
◆ Reduced disaster recovery complexity, planning, testing, etc.
◆ Support for business continuity across and between multiple
databases on multiple servers and Symmetrix systems
[Figure 7: SRDF consistency group: RDF-ECA protects the R1/R2 device pairs holding DBMS data (X), application data (Y), and logs (Z) across multiple hosts and Symmetrix systems; the consistency group is defined through the host component and the Symmetrix Control Facility.]
SRDF terminology
This section describes various terms related to SRDF operations.
Update operation
The update operation allows users to resynchronize the R1s after a
failover while continuing to run application and database services
on the R2s. This function helps reduce the amount of time that a
failback to the R1 side takes. The update operation is a subset of
the failover/failback functionality. Practical uses of the R1
update operation usually involve situations in which the R1
becomes almost synchronized with the R2 data before a failback,
while the R2 side is still online to its host. The -until option, when
used with update, specifies the target number of invalid tracks that
are allowed to be out of sync before resynchronization to the R1
completes.
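For example, the following command (the invalid-track threshold of 1,000 is arbitrary) updates the R1 side of all SRDF pairs in the device group MyDevGrp until fewer than 1,000 invalid tracks remain out of sync:

symrdf -g MyDevGrp update -until 1000 -noprompt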
Concurrent SRDF
Concurrent SRDF means having two target R2 devices configured as
concurrent mirrors of one source R1 device. Using a Concurrent
SRDF pair allows the creation of two copies of the same data at two
remote locations. When the two R2 devices are split from their
source R1 device, each target site copy of the application can be
accessed independently.
R1/R2 swap
Swapping R1/R2 devices of an SRDF pair causes the source R1
device to become a target R2 device and vice versa. Swapping SRDF
devices allows the R2 site to take over operations while retaining a
remote mirror on the original source site. Swapping is especially
useful after failing over an application from the R1 site to the R2 site.
SRDF swapping is available with Enginuity version 5567 or later.
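For example, provided the SRDF pairs are in a state that permits swapping (such as suspended or failed over), the personalities of all pairs in the device group MyDevGrp could be swapped with a command of the form:

symrdf -g MyDevGrp swap -noprompt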
Data Mobility
Data mobility is an SRDF configuration that restricts SRDF devices to
operating only in adaptive copy mode. This is a lower-cost licensing
option that is typically used for data migrations. It allows data to be
transferred in adaptive copy mode from source to target, and is not
designed as a solution for DR requirements unless used in
combination with TimeFinder.
Dynamic SRDF
Dynamic SRDF allows the creation of SRDF pairs from non-SRDF
devices while the Symmetrix system is in operation. Historically,
source and target SRDF device pairing has been static and changes
required assistance from EMC personnel. This feature provides
greater flexibility in deciding where to copy protected data.
Dynamic RA groups can be created in a SRDF switched fabric
environment. An RA group represents a logical connection between
two Symmetrix systems. Historically, RA groups were limited to
those static RA groups defined at configuration time. However, RA
groups can now be created, modified, and deleted while the
Symmetrix system is in operation. This provides greater flexibility in
forming SRDF-pair-associated links.
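As a sketch, a dynamic SRDF pair could be created from a device-pairing file; the file name, Symmetrix ID, and RA group number below are hypothetical:

symrdf createpair -file pairs.txt -sid 1234 -rdfg 2 -type R1 -establish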
◆ Failover switches data processing from the source (R1) side to the
target (R2) side. The source side volumes (R1), if still available,
are write-disabled.
◆ Failback switches data processing from the target (R2) side to the
source (R1) side. The target side volumes (R2), if still available,
are write-disabled.
[Figure: SRDF establish and failover control operations, showing the direction of data flow between the R1 and R2 devices in each state.]
Scheduled maintenance or storage system problems can disrupt
access to production data at the source site. In this case, a failover
operation can be initiated from either host to make the R2 device
read/write-enabled to its host. Before issuing the failover, all
application services on the R1 volumes must be stopped. This is
because the failover operation makes the R1 volumes read-only.
The following command initiates a failover on all SRDF pairs in the
device group named MyDevGrp:
symrdf –g MyDevGrp failover –noprompt
Failback
To resume normal operations on the R1 side, a failback (R1 device
takeover) operation is initiated. This means read/write operations on
the R2 device must be stopped, and read/write operations on the R1
device must be started. When the failback command is initiated,
the R2 becomes read-only to its host, while the R1 becomes
read/write-enabled to its host. The following command performs a
failback operation on all SRDF pairs in the device group named
MyDevGrp:
symrdf –g MyDevGrp failback -noprompt
The SRDF pair must already be in one of the following states for the
failback operation to succeed:
◆ Failed over
◆ Suspended and write-disabled at the source
◆ Suspended and not ready at the source
◆ R1 Updated
◆ R1 UpdInProg
The failback operation:
◆ Write-enables the R1 devices.
◆ Performs a track table merge to discard changes on the R1s.
◆ Transfers the changes on the R2s.
◆ Resumes traffic on the SRDF links.
◆ Write-disables the R2 volumes.
[Figure 10: Geographically distributed four-node EMC SRDF/CE clusters: primary and secondary site nodes connected over the enterprise LAN/WAN, with R1 devices mirrored to R2 devices through SRDF.]
EMC TimeFinder
The SYMCLI TimeFinder component extends the basic SYMCLI
command set to include TimeFinder or business continuity
commands that allow control operations on device pairs within a
local replication environment. This section specifically describes the
functionality of:
◆ TimeFinder/Mirror — General monitor and control operations
for business continuance volumes (BCV)
◆ TimeFinder/CG — Consistency groups
◆ TimeFinder/Clone — Clone copy sessions
◆ TimeFinder/Snap — Snap copy sessions
Commands such as symmir and symbcv perform a wide spectrum
of monitor and control operations on standard/BCV device pairs
within a TimeFinder/Mirror environment. The TimeFinder/Clone
command, symclone, creates a point-in-time copy of a source device
on nonstandard device pairs (such as standard/standard,
BCV/BCV). The TimeFinder/Snap command, symsnap, creates
virtual device copy sessions between a source device and multiple
virtual target devices. These virtual devices only store pointers to
changed data blocks from the source device, rather than a full copy of
the data. Each product requires a specific license for monitoring and
control operations.
Configuring and controlling remote BCV pairs requires EMC SRDF
business continuity software discussed previously. The combination
of TimeFinder with SRDF provides for multiple local and remote
copies of production data.
Figure 11 illustrates application usage for a TimeFinder/Mirror
configuration in a Symmetrix system.
Note: When BCVs are established, they are inaccessible to any host.
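For example, a full synchronization of all standard/BCV pairs in the device group MyDevGrp can be initiated with:

symmir -g MyDevGrp establish -full -noprompt

The pairs can then be monitored with symmir -g MyDevGrp query until they reach the Synchronized state.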
Regular split
A regular split is the type of split that has existed for
TimeFinder/Mirror since its inception. With a regular split (before
Enginuity version 5568), I/O activity from the production hosts to a
standard volume was not accepted until it was split from its BCV
pair. Therefore, applications attempting to access the standard or the
BCV would experience a short wait during a regular split. Once the
split was complete, no further overhead was incurred.
Beginning with Enginuity version 5568, any split operation is
executed as an instant split. A regular split is still valid for earlier
Enginuity versions and for existing applications that perform regular
split operations; however, with Enginuity version 5568 those
applications actually perform an instant split.
With Enginuity versions 5x66 and 5x67, an instant split is performed
by specifying the –instant option on the command line. Since version
5568, this option is no longer required to trigger an instant split,
because instant split mode has become the default behavior. It is still
beneficial to supply the –instant flag with later Enginuity versions,
however; without it, SYMCLI waits for the background split to
complete before returning.
Instant split
An instant split shortens the wait period during a split by
dividing the process into a foreground split and a background
split. During an instant split, the system executes the foreground
split almost instantaneously and returns a successful status to the
host. This instantaneous execution allows minimal I/O disruptions to
the production volumes. Furthermore, the BCVs are accessible to the
hosts as soon as the foreground process is complete. The background
split continues to split the BCV pair until it is complete. When the
-instant option is included or defaulted, SYMCLI returns
immediately after the foreground split, allowing other operations
while the BCV pair is splitting in the background.
The following operation performs an instant split on all BCV pairs
in MyDevGrp, and allows SYMCLI to return to the server process
while the background split is in progress:
symmir -g MyDevGrp split –instant –noprompt
[Figure 12: ECA consistent split across multiple database-associated hosts: a controlling host running SYMAPI issues an ECA consistent split against the standard/BCV pairs in device group prodgrp used by the database servers.]
[Figure 13: ECA consistent split on a local Symmetrix system.]
TimeFinder/Clone operations
Symmetrix TimeFinder/Clone operations using SYMCLI can create
up to 16 copies from a source device onto target devices. Unlike
TimeFinder/Mirror, TimeFinder/Clone does not require the
traditional standard-to-BCV device pairing. Instead,
TimeFinder/Clone allows any combination of source and target
devices. For example, a BCV can be used as the source device, while
another BCV can be used as the target device. Any combination of
source and target devices can be used. Additionally,
TimeFinder/Clone does not use the traditional mirror positions the
way that TimeFinder/Mirror does. Because of this,
TimeFinder/Clone is a useful option when more than three copies of
a source device are desired.
Normally, one of the three copies is used to protect the data against
hardware failure.
The source and target devices must be the same emulation type (FBA
or CKD). The target device must be equal in size to the source device.
Clone copies of striped or concatenated metavolumes can also be
created providing the source and target metavolumes are identical in
configuration. Once activated, the target device can be instantly
accessed by a target’s host, even before the data is fully copied to the
target device.
TimeFinder/Clone copies are appropriate in situations where
multiple copies of production data are needed for testing, backups, or
report generation. Clone copies can also be used to reduce disk
contention and improve data access speed by assigning users to
copies of data rather than accessing the one production copy. A single
source device may maintain as many as 16 relationships that can be a
combination of BCVs, clones and snaps.
[Figure 14: Creating a copy session using the symclone command: a server running SYMCLI (1) creates and (2) activates a clone session from source device DEV 001 to target device DEV 005, which the target host then accesses.]
The activation of a clone enables the copying of the data. The data
may start copying immediately if the –copy keyword is used. If the
–copy keyword is not used, tracks are only copied when they are
accessed from the target volume or when they are changed on the
source volume.
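For example, a clone session that begins copying data in the background as soon as it is activated could be created with:

symclone -g MyDevGrp create -copy -noprompt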
Activation of the clone session established in the previous create
command can be accomplished using the following command.
symclone –g MyDevGrp activate -noprompt
Solutions Enabler 7.1 and Enginuity 5874 SR1 introduce the ability to
clone from thick to thin devices using TimeFinder/Clone. Thick-to-thin
TimeFinder/Clone allows application data to be moved from
standard Symmetrix volumes to virtually provisioned storage within
the same array. For some workloads, virtually provisioned volumes
offer advantages in allocation utilization, ease of use, and
performance through automatic wide striping. Thick-to-thin
TimeFinder/Clone provides an easy way to move such workloads
onto virtually provisioned devices.
TimeFinder/Snap operations
Symmetrix arrays provide another technique to create copies of
application data. The functionality, called TimeFinder/Snap, allows
users to make pointer-based, space-saving copies of data
simultaneously on multiple target devices from a single source
device. The data is available for access instantly. TimeFinder/Snap
allows data to be copied from a single source device to as many as 128
target devices. A source device can be either a Symmetrix standard
device or a BCV device controlled by TimeFinder/Mirror, with the
exception being a BCV working in clone emulation mode. The target
device is a Symmetrix virtual device (VDEV) that consumes
negligible physical storage through the use of pointers to track
changed data.
The VDEV is a host-addressable Symmetrix device with special
attributes created when the Symmetrix system is configured.
However, unlike a BCV which contains a full volume of data, a VDEV
is a logical-image device that offers a space-saving way to create
instant, point-in-time copies of volumes. Any update to a source
device after its activation with a virtual device causes the pre-update
image of the changed tracks to be copied to a save device. The virtual
device’s indirect pointer is then updated to point to the original track
data on the save device, preserving a point-in-time image of the
volume. TimeFinder/Snap uses this copy-on-first-write technique to
conserve disk space, since only changes to tracks on the source cause
any incremental storage to be consumed.
The symsnap create and symsnap activate commands are
used to create a source/target snap pair.
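For example, assuming a device group MyDevGrp containing the standard devices and their associated virtual devices:

symsnap -g MyDevGrp create -noprompt
symsnap -g MyDevGrp activate -noprompt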
Device Description

Virtual device: A logical-image device that saves disk space through the use of pointers to track data that is immediately accessible after activation. Snapping data to a virtual device uses a copy-on-first-write technique.

Save device: A device that is not host-accessible but is accessed only through the virtual devices that point to it. Save devices provide a pool of physical space to store the snap copy data to which the virtual devices point.

BCV: A full-volume mirror that has valid data after fully synchronizing with its source device. It is accessible only when split from the source device that it is mirroring.
[Figure: TimeFinder/Snap operation. The controlling host issues I/O to the source device; the VDEV holds pointers to the original data, and pre-update images of changed tracks are copied to the save device accessed by the target host (ICO-IMG-000491)]
Note: The acronym for EMC Storage Resource Management (SRM) can be
easily confused with the acronym for VMware Site Recovery Manager. To
avoid any confusion, this document always refers to VMware Site Recovery
Manager as VMware SRM.
[Figure: SRM database mapping and split sequence: (1) SYMCLI mapping command, (2) SYMCLI invokes database APIs to identify devices, (3) database objects mapped between database metadata and the SYMCLI database, (4) TimeFinder split of the Data and Log BCVs under PowerPath or ECA (ICO-IMG-000011)]
EMC Solutions Enabler with a valid license for TimeFinder and SRM
is installed on the host. In addition, the host must have PowerPath
or use ECA, and must run a supported DBMS. As discussed in
“TimeFinder split operations” on page 70, when splitting a BCV, the
system must perform housekeeping tasks that may require a few
seconds on a busy Symmetrix system. These tasks involve a series of
steps (shown in Figure 16 on page 82) that result in the separation of
the BCV from its paired standard device:
1. Using the SRM base mapping commands, first query the
Symmetrix system to display the logical-to-physical mapping
information about any physical device, logical volume, file,
directory, and/or file system.
2. Using the database mapping command, query the Symmetrix to
display physical and logical database information.
3. Next, use the database mapping command to translate:
• The devices of a specified database into a device group or a
consistency group, or
• The devices of a specified table space into a device group or a
consistency group.
4. Split the BCV from the standard device.
show: Shows information about a database object (tablespace, table,
file, or schema of a database; file, segment, or table of a specified
tablespace or schema).
tbs2dg: Translates the devices of a specified tablespace into a device
group. Only database data files are translated.
Table 7 lists the SYMCLI commands that can be used to examine the
logical volume mapping.
A typical view of the Storage Viewer for vSphere Client can be seen in
Figure 17.
EMC PowerPath
EMC PowerPath is host-based software that works with networked
storage systems to intelligently manage I/O paths. PowerPath
manages multiple paths to a storage array. Supporting multiple paths
enables recovery from path failure because PowerPath automatically
detects path failures and redirects I/O to other available paths.
PowerPath also uses sophisticated algorithms to provide dynamic
load balancing for several kinds of path management policies that the
user can set. With the help of PowerPath, systems administrators are
able to ensure that applications on the host have highly available
access to storage and perform optimally at all times.
A key feature of path management in PowerPath is dynamic,
multipath load balancing. Without PowerPath, an administrator must
statically load balance paths to logical devices to improve
performance. For example, based on current usage, the administrator
might configure three heavily used logical devices on one path, seven
moderately used logical devices on a second path, and 20 lightly used
logical devices on a third path. As I/O patterns change, these
statically configured paths may become unbalanced, causing
performance to suffer. The administrator must then reconfigure the
paths, and continue to reconfigure them as I/O traffic between the
host and the storage system shifts in response to usage changes.
Designed to use all paths concurrently, PowerPath distributes I/O
requests to a logical device across all available paths, rather than
requiring a single path to bear the entire I/O burden. PowerPath can
distribute the I/O for all logical devices over all paths shared by
those logical devices, so that all paths are equally burdened.
PowerPath load balances I/O on a host-by-host basis, and maintains
statistics on all I/O for all paths. For each I/O request, PowerPath
intelligently chooses the least-burdened available path, depending on
the load-balancing and failover policy in effect. In addition to
improving I/O performance, dynamic load balancing reduces
management time and downtime because administrators no longer
need to manage paths across logical devices. With PowerPath,
configurations of paths and policies for an individual device can be
changed dynamically, taking effect immediately, without any
disruption to the applications.
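The difference between static path assignment and per-request selection of the least-burdened path can be illustrated with a small sketch. The names and the completion pattern here are hypothetical; this is not PowerPath's actual algorithm:

```python
# Illustrative sketch of dynamic multipath load balancing: each I/O
# request is routed to whichever path currently has the fewest
# outstanding I/Os, so no single path bears the entire burden.

def least_burdened(paths):
    # Choose the path with the fewest outstanding I/Os right now.
    return min(paths, key=lambda p: p["pending"])

paths = [{"name": f"path{i}", "pending": 0, "served": 0} for i in range(3)]

# Nine requests arrive; two of every three complete immediately, the
# rest stay outstanding, steering new I/O away from the busy paths.
for i in range(9):
    p = least_burdened(paths)
    p["pending"] += 1
    p["served"] += 1
    if i % 3 != 0:           # this I/O completes right away
        p["pending"] -= 1

print([p["served"] for p in paths])   # [3, 3, 3]: work spreads evenly
```

With a static assignment, the three lingering I/Os could all pile onto one path; selecting per request keeps the queues balanced as traffic shifts.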
PowerPath/VE
EMC PowerPath/VE delivers PowerPath Multipathing features to
optimize VMware vSphere virtual environments. With
PowerPath/VE, you can standardize path management across
heterogeneous physical and virtual environments. PowerPath/VE
enables you to automate optimal server, storage, and path utilization
in a dynamic virtual environment. With hyper-consolidation, a
virtual environment may have hundreds or even thousands of
independent virtual machines running, including virtual machines
with varying levels of I/O intensity. I/O-intensive applications can
disrupt I/O from other applications. Before PowerPath/VE became
available, load balancing on an ESX host had to be configured
manually to correct for this. Manual load-balancing operations that
ensure all virtual machines receive their required response times are
time-consuming and logistically difficult to achieve.
PowerPath/VE works with VMware ESX and ESXi as a multipathing
plug-in (MPP) that provides enhanced path management capabilities
to ESX and ESXi hosts. PowerPath/VE is supported with vSphere
(ESX4) only. Previous versions of ESX do not have the PSA, which is
required by PowerPath/VE.
PowerPath/VE installs as a kernel module on the vSphere host. It
plugs in to the vSphere I/O stack framework to bring the advanced
multipathing capabilities of PowerPath (dynamic load balancing and
automatic failover) to the VMware vSphere platform (Figure 18 on
page 91).
PowerPath/VE features
PowerPath/VE provides the following features:
◆ Dynamic load balancing - PowerPath is designed to use all paths
at all times. PowerPath distributes I/O requests to a logical
device across all available paths, rather than requiring a single
path to bear the entire I/O burden.
◆ Auto-restore of paths - Periodic auto-restore reassigns logical
devices when restoring paths from a failed state. Once restored,
the paths automatically rebalance the I/O across all active
channels.
◆ Device prioritization - Setting a high priority for a single or
several devices improves their I/O performance at the expense of
the remaining devices, while otherwise maintaining the best
possible load balancing across all paths. This is especially useful
when there are multiple virtual machines on a host with varying
application performance and availability requirements.
◆ Automated performance optimization - PowerPath/VE
automatically identifies the type of storage array and sets the
highest performing optimization mode by default. For
Symmetrix, the mode is SymmOpt (Symmetrix Optimized).
◆ Dynamic path failover and path recovery - If a path fails,
PowerPath/VE redistributes I/O traffic from that path to
functioning paths. PowerPath/VE stops sending I/O to the failed
path and checks for an active alternate path. If an active path is
available, PowerPath/VE redirects I/O along that path.
PowerPath/VE can compensate for multiple faults in the I/O
channel (for example, HBAs, fiber-optic cables, Fibre Channel
switch, storage array port).
PowerPath/VE management
PowerPath/VE uses a command set, called rpowermt, to monitor,
manage, and configure PowerPath/VE for vSphere. The syntax,
arguments, and options are very similar to the traditional powermt
commands used on all the other PowerPath Multipathing supported
operating system platforms. There is one significant difference in
that rpowermt is a remote management tool.
Not all vSphere installations have a service console interface. In
order to manage an ESXi host, customers have the option to use
vCenter Server or vCLI (also referred to as VMware Remote Tools) on
a remote server. PowerPath/VE for vSphere uses the rpowermt
command line utility for both ESX and ESXi. PowerPath/VE for
vSphere cannot be managed on the ESX host itself. There is neither a
local nor remote GUI for PowerPath on ESX.
Administrators must designate a Guest OS or a physical machine to
manage one or multiple ESX hosts. rpowermt is supported on
Windows 2003 (32-bit) and Red Hat 5 Update 2 (64-bit).
When the vSphere host server is connected to the Symmetrix system,
the PowerPath/VE kernel module running on the vSphere host will
associate all paths to each device presented from the array and
associate a pseudo device name (as discussed earlier). An example of
this is shown in Figure 15 on page 80, which shows the output of
rpowermt display host=x.x.x.x dev=emcpower0. Note in the output
that the device has four paths and displays the optimization mode
(SymmOpt = Symmetrix optimization).
Thin device
A thin device is a host-accessible device that has no storage directly
associated with it. Thin devices have pre-configured sizes and appear
to the host to have exactly that capacity. Storage is allocated in
chunks when a block is written to for the first time. Reads of chunks
that have not yet been allocated return zeros to the host.
Data device
Data devices are specially configured devices within the Symmetrix
that serve as containers for the written-to blocks of thin devices. Any
number of data devices may make up a data device pool. Blocks are
allocated to the thin devices from the pool on a round-robin basis.
The allocation unit is 768 KB.
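A toy model of these two device types illustrates on-first-write allocation, round-robin placement across a pool, and zero-filled reads of unallocated chunks. The classes and dictionaries here are invented simplifications, not array firmware:

```python
# Conceptual sketch of Virtual Provisioning: thin devices allocate
# 768 KB chunks from a shared pool of data devices on first write,
# round-robin across the pool; unallocated reads return zeros.

CHUNK = 768 * 1024  # allocation unit in bytes

class ThinPool:
    def __init__(self, n_data_devices):
        self.devices = [[] for _ in range(n_data_devices)]  # chunks per data device
        self.next_dev = 0

    def allocate(self, owner, chunk_no):
        # Round-robin placement across the data devices in the pool.
        dev = self.next_dev
        self.devices[dev].append((owner, chunk_no))
        self.next_dev = (self.next_dev + 1) % len(self.devices)
        return dev

class ThinDevice:
    def __init__(self, name, pool):
        self.name, self.pool = name, pool
        self.chunks = {}  # chunk_no -> [data_device, backing bytes]

    def write(self, offset, data):
        chunk_no = offset // CHUNK
        if chunk_no not in self.chunks:      # first write triggers allocation
            dev = self.pool.allocate(self.name, chunk_no)
            self.chunks[chunk_no] = [dev, bytearray(CHUNK)]
        start = offset % CHUNK
        self.chunks[chunk_no][1][start:start + len(data)] = data

    def read(self, offset, length):
        chunk_no = offset // CHUNK
        if chunk_no not in self.chunks:      # never written: zeros, no storage
            return bytes(length)
        start = offset % CHUNK
        return bytes(self.chunks[chunk_no][1][start:start + length])

pool = ThinPool(2)
tdev = ThinDevice("tdev0", pool)             # appears fully sized to the host
tdev.write(0, b"oracle")
print(tdev.read(0, 6))                       # b'oracle'
print(tdev.read(CHUNK * 5, 4))               # unallocated chunk reads as zeros
print(len(tdev.chunks))                      # only 1 chunk actually consumed
```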
Figure 21 on page 101 depicts the components of a Virtual
Provisioning configuration:
[Figure 21: Virtual Provisioning components. Thin devices draw their storage from pools (Pool A, Pool B) of data devices (ICO-IMG-000493)]
[Table: supported Virtual LUN migration combinations across drive types (Flash, Fibre Channel, SATA) and protection types (RAID 1, RAID 6, unprotected) (ICO-IMG-000754)]
The Virtual LUN feature allows a device to transition from one
protection type to another while servers and their associated
applications and Symmetrix software are accessing the device.
The Virtual LUN feature offers customers the ability to effectively
utilize SATA storage - a much cheaper, yet reliable, form of high
capacity storage. It also facilitates fluid movement of data across the
various storage tiers present within the subsystem - the realization of
true "tiered storage in the box." Thus, Symmetrix VMAX becomes the
first enterprise storage subsystem to offer a comprehensive "tiered
storage in the box," ILM capability that complements the customer's
tiering initiatives. Customers can now achieve varied
cost/performance profiles by moving lower priority application data
to less expensive storage, or conversely, moving higher priority or
critical application data to higher performing storage as their needs
dictate.
Specific use cases for customer applications enable the moving of
data volumes transparently from tier to tier based on changing
performance (moving to faster or slower disks) or availability
requirements (changing RAID protection on the array). This
migration can be performed transparently without interrupting those
applications or host systems utilizing the array volumes and with
only a minimal impact to performance during the migration.
The following sample commands show how to move two LUNs of a
host environment from RAID 6 drives on Fibre Channel 15k rpm
drives to Enterprise Flash drives. The symmigrate command,
introduced in EMC Solutions Enabler 7.0, is used to perform the
migration. The source Symmetrix hypervolume numbers are
200 and 201, and the target Symmetrix hypervolumes on the
Enterprise Flash drives are A00 and A01.
1. A file (migrate.ctl) is created that contains the two LUNs to be
migrated. The file has the following content:
200 A00
201 A01
The two host accessible LUNs are migrated without having to impact
application or server availability.
Overview
There are many choices when cloning databases with EMC
array-based replication software. Each software product has differing
characteristics that affect the final deployment. A thorough
understanding of the options available leads to an optimal replication
choice.
An Oracle database can be in one of three data states when it is being
copied:
◆ Shutdown
◆ Processing normally
◆ Conditioned using hot-backup mode
Depending on the data state of the database at the time it is copied,
the database copy may be restartable or recoverable. This section
begins with a discussion of recoverable and restartable database
clones. It then describes various approaches to data replication using
EMC software products and how the replication techniques is used in
combination with the different database data states to facilitate the
database cloning process. Following that, database clone usage
considerations are discussed along with descriptions of the
procedures used to deploy database clones across various
operating-systems platforms.
Overview 109
Creating Oracle Database Clones
3. When the database is deactivated, split the BCV mirrors using the
following command:
symmir -g device_group split -noprompt
[Figure: cold TimeFinder/Snap of a database. The controlling host issues I/O to the standard device; the VDEV keeps pointers to the original data, and changed tracks are copied on first write to the save device presented to the target host (ICO-IMG-000506)]
2. Once the create operation has completed, shut down the database
to make a cold TimeFinder/Snap of the DBMS. Execute the
following Oracle commands:
sqlplus "/ as sysdba"
SQL> shutdown immediate;
[Figure: TimeFinder/Snap copy-on-first-write with standard device, VDEV, and save device (ICO-IMG-000508)]
Alternatively, with Oracle10g, the entire database can be put into hot
backup mode with:
sqlplus "/ as sysdba"
SQL> alter system archive log current;
SQL> alter database begin backup;
When these commands are issued, data blocks for the tablespaces are
flushed to disk and the datafile headers are updated with the last
SCN. Further updates of the SCN to the datafile headers are not
performed. When these files are copied, the nonupdated SCN in the
datafile headers signifies to the database that recovery is required.
The log file switch command is used to ensure that the marker
indicating that the tablespaces have been taken out of hot backup
mode is found in an archive log.
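The effect of freezing the datafile header SCN can be modeled with a small sketch. The Database class and its single header SCN are invented simplifications of Oracle's per-datafile checkpoint structures:

```python
# Toy model of why a hot-backup copy is marked as needing recovery:
# datafile header checkpoint SCNs freeze at BEGIN BACKUP, so a copy
# taken during backup mode lags the control file's current SCN.

class Database:
    def __init__(self):
        self.current_scn = 100
        self.datafile_header_scn = 100
        self.backup_mode = False

    def begin_backup(self):
        # Flush and stamp the headers once, then stop advancing them.
        self.datafile_header_scn = self.current_scn
        self.backup_mode = True

    def commit_work(self):
        self.current_scn += 1
        if not self.backup_mode:
            self.datafile_header_scn = self.current_scn  # checkpoints advance headers

    def needs_recovery(self):
        # A restored copy compares header SCNs against the control file.
        return self.datafile_header_scn < self.current_scn

db = Database()
db.begin_backup()
for _ in range(3):
    db.commit_work()          # updates continue during the backup window
print(db.datafile_header_scn, db.current_scn)   # 100 103
print(db.needs_recovery())                      # True: apply redo to catch up
```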
5. After tablespaces are taken out of hot backup mode and a log
switch is performed, split the Log BCV devices from their source
volumes:
symmir -g log_group split -noprompt
5. After the tablespaces are taken out of hot backup mode and a log
switch is performed, activate the log clone devices:
symclone -g log_group activate -noprompt
[Figure: hot backup TimeFinder/Snap sequence with standard device, VDEV, and save device (ICO-IMG-000510)]
5. After the database is taken out of hot backup mode and a log
switch is performed, activate the Log snap devices:
symsnap -g log_group activate -noprompt
[Figure: hot backup sequence across the Data, Log, and Archive standard devices and their BCVs (ICO-IMG-000511)]
Host considerations
One of the primary considerations when starting a copy of an Oracle
database is whether to present it back to the same host or mount the
database on another host. While it is significantly simpler to restart a
database on a secondary host, it is still possible to restart a copy of the
database on the same host with only a few extra steps. The extra
steps required to mount a database copy back to the same host
(mounting the copied volumes, changing the mount points, and
relocating the datafiles) are described next.
MAXDATAFILES 30
MAXINSTANCES 2
MAXLOGHISTORY 224
LOGFILE
GROUP 1 (
'/oracle/oradata/test/oraredo1a.dbf',
'/oracle/oradata/test/oraredo2a.dbf'
) SIZE 10M,
GROUP 2 (
'/oracle/oradata/test/oraredo1b.dbf',
'/oracle/oradata/test/oraredo2b.dbf'
) SIZE 10M,
GROUP 3 (
'/oracle/oradata/test/oraredo1c.dbf',
'/oracle/oradata/test/oraredo2c.dbf'
) SIZE 10M
-- STANDBY LOGFILE
DATAFILE
'/oracle/oradata/test/orasys.dbf',
'/oracle/oradata/test/oraundo.dbf',
'/oracle/oradata/test/orausers.dbf'
CHARACTER SET US7ASCII
;
# Recovery is required if any of the datafiles are restored
# backups, or if the last shutdown was not normal or
# immediate.
RECOVER DATABASE
# Database can now be opened normally.
ALTER DATABASE OPEN;
# Commands to add tempfiles to temporary tablespaces.
# Online tempfiles have complete space information.
# Other tempfiles may require adjustment.
ALTER TABLESPACE TEMP_TS ADD TEMPFILE
'/oracle/oradata/test/oratest.dbf'
SIZE 524288000 REUSE AUTOEXTEND OFF;
# End of tempfile additions.
#
# Set #2. RESETLOGS case
#
# The following commands will create a new control file and
# use it to open the database. The contents of online logs
# will be lost and all backups will be invalidated. Use this
# only if online logs are damaged.
STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "TEST" RESETLOGS
NOARCHIVELOG
-- SET STANDBY TO MAXIMIZE PERFORMANCE
MAXLOGFILES 16
MAXLOGMEMBERS 2
MAXDATAFILES 30
MAXINSTANCES 2
MAXLOGHISTORY 224
LOGFILE
GROUP 1 (
'/oracle/oradata/test/oraredo1a.dbf',
'/oracle/oradata/test/oraredo2a.dbf'
) SIZE 10M,
GROUP 2 (
'/oracle/oradata/test/oraredo1b.dbf',
'/oracle/oradata/test/oraredo2b.dbf'
) SIZE 10M,
GROUP 3 (
'/oracle/oradata/test/oraredo1c.dbf',
'/oracle/oradata/test/oraredo2c.dbf'
) SIZE 10M
-- STANDBY LOGFILE
DATAFILE
'/oracle/oradata/test/orasys.dbf',
'/oracle/oradata/test/oraundo.dbf',
'/oracle/oradata/test/orausers.dbf'
CHARACTER SET US7ASCII
;
# Recovery is required if any of the datafiles are restored
# backups, or if the last shutdown was not normal or
# immediate.
RECOVER DATABASE USING BACKUP CONTROLFILE
# Database can now be opened zeroing the online logs.
ALTER DATABASE OPEN RESETLOGS;
# Commands to add tempfiles to temporary tablespaces.
# Online tempfiles have complete space information.
# Other tempfiles may require adjustment.
ALTER TABLESPACE TEMP_TS ADD TEMPFILE
'/oracle/oradata/test/oratest.dbf'
SIZE 524288000 REUSE AUTOEXTEND OFF;
# End of tempfile additions.
#
After deciding whether to open the database with RESETLOGS and
editing the file appropriately, the datafile locations can be changed.
When the script is run, the instance searches the new locations for
the Oracle datafiles.
sqlplus "/ as sysdba"
SQL> @create_control
This creates the new control file and opens the database, relocating
the datafiles to the newly specified locations.
2. After the appropriate devices are available to the host, make the
operating system aware of the devices. In addition, import the
volume or disk groups and mount any file systems. This is
operating-system dependent and is discussed in Appendix C,
“Related Host Operation.”
3. Since the database was shut down when the copy was made, no
special processing is required to restart the database. Start the
database as follows:
sqlplus "/ as sysdba"
SQL> startup;
2. After the appropriate devices are available to the host, make the
operating system aware of the devices. In addition, import the
volume or disk groups and mount any file systems. This is
operating-system dependent and is discussed in Appendix C,
“Related Host Operation.”
3. Since the database was shut down when the copy was made, no
special processing is required to restart the database. The
following is used to start the database:
sqlplus "/ as sysdba"
SQL> startup mount;
SQL> recover database;
TABLESPACE_NAME
-------------------------------
SYSTEM
SYSAUX
TEMP1
UNDO1
USERS1
OWNER
---------------
DEV1
USER1
These owners (schemas) need to be verified on the
target side:
SELECT username
FROM dba_users;
USERNAME
---------------
SYS
SYSTEM
In this case, the DEV1 user exists but the USER1 user does not.
The USER1 user must be created with the command:
CREATE USER user1
IDENTIFIED BY user1;
NAME            VALUE
--------------- --------
db_block_size   8192
SELECT *
FROM transport_set_violations;
VIOLATIONS
-------------------------------------------------------
CONSTRAINT FK_SALES_ORDER_DEPT between table DEV1.SALES
in tablespace DATA1 and table DEV2.ORDER_DEPT in
tablespace DATA2
TABLESPACE_NAME FILE_NAME
--------------- --------------------------------
DATA1 d:\oracle\oradata\db1\data1.dbf
INDEX1 d:\oracle\oradata\db1\index1.dbf
In this case, both required datafiles are on the d:\ drive. This
volume will be identified and replicated using TimeFinder. Note
that careful database layout planning is critical when TimeFinder
is used for replication. First, create a device group for the
standard device used by the d:\ drive and a BCV that will be
used for the new e:\ drive. Appendix B, “Sample SYMCLI Group
Creation Commands,”provides examples of creating device
groups.
4. After creating the device group, establish the BCV to the standard
device:
symmir -g device_group establish -full -noprompt
symmir -g device_group verify -i 30
5. After the BCV is fully synchronized with the standard device, the
devices can be split, since the tablespaces on the device are in
read-only mode.
symmir -g device_group split -noprompt
file = d:\oracle\exp\meta1.dmp
tablespaces = (data1,index1)
tts_owners = (dev1,dev2)
Alternatively, with Data Pump in Oracle10g:
IMPDP system/manager
DUMPFILE = meta1.dmp
DIRECTORY = d:\oracle\exp\
TRANSPORT_DATAFILES =
e:\oracle\oradata\db1\data1.dbf,
e:\oracle\oradata\db1\index1.dbf
Overview
Cross-platform transportable tablespaces enable data from an Oracle
database running on one operating system to be cloned and
presented to another database running on a different platform.
Differences in Oracle datafiles across operating systems are a
function of the byte ordering, or "endianness," of the files. The
endian format of the datafiles is classified as either "big endian" or
"little endian" (in "big endian," the first byte is the most
significant, while in "little endian" the first byte is the least
significant). If two operating systems both use "big endian" byte
ordering, the files can be transferred between operating systems and
used successfully in an Oracle database (through a feature such as
transportable tablespaces). For source and target operating systems
with different byte ordering, a process to convert the datafiles from
one "endianness" to another is required.
Oracle uses an RMAN option to convert a data file from "big endian"
to "little endian" and vice versa. First, the "endianness" of the source
and target operating systems must be identified. If different, then the
datafiles are read and converted by RMAN. Upon completion, the
"endianness" of the datafiles is converted to the format needed in the
new environment. The process of converting the cloned datafiles
occurs either on the source database host before copying to the new
environment or once it is received on the target host. Other than this
conversion process, the steps for cross-platform transportable
tablespaces are the same as for normal transportable tablespaces.
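The byte-ordering difference itself is easy to demonstrate. The following standalone sketch (not RMAN's implementation) shows the same 32-bit value stored in both formats and the byte swap that converts between them:

```python
# Illustration of the byte-ordering difference that RMAN's CONVERT
# step resolves when moving datafiles between platforms.

import struct

value = 0x11223344
big = struct.pack(">I", value)      # big endian: most significant byte first
little = struct.pack("<I", value)   # little endian: least significant byte first

print(big.hex())      # 11223344
print(little.hex())   # 44332211

# Converting a field between platforms amounts to reversing its byte
# order (done for real by RMAN across the whole datafile, not by hand):
converted = big[::-1]
assert converted == little
assert struct.unpack("<I", converted)[0] == value
```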
[Table 10 excerpt: number of simultaneous copies supported by each replication option]
The following are examples of some of the choices you might make
for database cloning based on the information in Table 10.
More than two simultaneous copies need to be made, and the copies
will live for up to a month: TimeFinder/Clone.
Multiple copies are being made, some with production mount, and
the copies are reused in a cycle, expiring the oldest first:
Replication Manager.
Introduction
As a part of normal day-to-day operations, the DBA creates backup
procedures that run one or more times a day to protect the database
against errors. Errors can originate from many sources (such as
software, hardware, user, and so on) and it is the responsibility of the
DBA to provide error recovery strategies that can recover the
database to a point of consistency and also minimize the loss of
transactional data. Ideally, this backup process should be simple,
efficient, and fast.
Today, the DBA is challenged to design a backup (and recovery)
strategy to meet the ever-increasing demands for availability that can
also manage extremely large databases efficiently while minimizing
the burden on servers, backup systems, and operations staff.
This section describes how the DBA can leverage EMC technology in
a backup strategy to:
◆ Reduce production impact of performing backups.
◆ Create consistent point-in-time backup images.
◆ Create restartable or recoverable database backup images.
◆ Enhance Oracle's RMAN backup utility.
Before covering these capabilities, it is necessary to review some
terminology and also to look at best practices for Oracle database
layouts that can facilitate and enhance the backup and restore
process.
[Figure: Oracle database layout on a Symmetrix array, separating data, index, undo, and archive log volumes (ICO-IMG-000512)]
Making a hot copy of the database is now the standard, but this
method has its own challenges. How can a consistent copy of the
database and supporting files be made when they are changing
throughout the duration of the backup? What exactly is the content of
the tape backup at completion? The reality is that the tape data is a
"fuzzy image" of the disk data, and considerable expertise is required
to restore the database back to a database point of consistency.
Online backups are made when the database is running in log
archival mode. While there are performance considerations for
running in archive log mode, the overhead associated with it is
generally small compared with the enhanced capabilities and
increased data protection afforded by running in it. Except in cases
such as large data warehouses where backups are unnecessary, or in
other relatively obscure cases, archive log mode is generally
considered a best practice for all Oracle database environments.
Whether the database copy is restartable or recoverable depends on
the state of the database at the time the copy was made. Chapter 5,
“Restoring and Recovering Oracle Databases,” covers the restore of
the database.
The following sections describe how to make a copy of the database
using three different EMC technologies with the database in the three
different states described in the prior paragraph.
The primary method of creating copies of an Oracle database is
through the use of the EMC local replication product TimeFinder.
TimeFinder is also used by Replication Manager to make database
copies. Replication Manager facilitates the automation and
management of database copies.
The TimeFinder family consists of two base products and several
component options. TimeFinder/Mirror, TimeFinder/Clone and
TimeFinder/Snap were discussed in general terms in Chapter 2,
“EMC Foundation Products.” In this chapter, they are used in a
database backup context.
3. When the database is shut down, split the BCV mirrors using the
following command:
symmir -g device_group split -noprompt
[Figure: cold TimeFinder/Snap of a database with standard device, VDEV, and save device (ICO-IMG-000506)]
2. Once the create operation has completed, shut down the database
in order to make a cold TimeFinder/Snap of the DBMS. Execute
the following Oracle commands:
sqlplus "/ as sysdba"
SQL> shutdown immediate;
[Figure: TimeFinder/Snap copy-on-first-write with standard device, VDEV, and save device (ICO-IMG-000508)]
Alternatively, with Oracle10g, the entire database can be put into hot
backup mode with:
sqlplus "/ as sysdba"
SQL> alter system archive log current;
SQL> alter database begin backup;
When these commands are issued, data blocks for the tablespaces are
flushed to disk and the datafile headers are updated with the last
checkpoint SCN. Further updates of the checkpoint SCN to the data
file headers are not performed while in this mode. When these files
are copied, the nonupdated SCN in the datafile headers signifies to
the database that recovery is required.
The log file switch command is used to ensure that the marker
indicating that the tablespaces are taken out of hot backup mode is
found in an archive log. Switching the log automatically ensures that
this record is found in a written archive log.
5. After the tablespaces are taken out of hot backup mode and a log
switch is performed, split the Log BCV devices from their source
volumes:
symmir -g log_group split -noprompt
5. After the tablespaces are taken out of hot backup mode and a log
switch is performed, activate the Log clone devices:
symclone -g log_group activate -noprompt
[Figure: hot backup TimeFinder/Snap sequence with standard device, VDEV, and save device (ICO-IMG-000510)]
5. After the database is taken out of hot backup mode and a log
switch is performed, activate the Log snap devices:
symsnap -g log_group activate -noprompt
Note: Regardless of the tool used to create the backup copy and regardless of
the state of the database at the time the copy was created, the backup process
is the same, except as noted in the next section.
[Figure: hot backup sequence across the Data, Log, and Archive standard devices and their BCVs (ICO-IMG-000511)]
Introduction
Recovery of a production database is an event that all DBAs hope is
never required. Nevertheless, DBAs must be prepared for unforeseen
events such as media failures or user errors requiring database
recovery operations. The keys to a successful database recovery
include the following:
◆ Identifying database recovery time objectives
◆ Planning the appropriate recovery strategy based upon the
backup type (full, incremental)
◆ Documenting the recovery procedures
◆ Validating the recovery process
Oracle recovery depends on the backup methodology used. With the
appropriate backup procedures in place, an Oracle database is
recovered to any point in time between the end of the backup and the
point of failure using a combination of backed up data files and
Oracle recovery structures including the control files, the archive
logs, and the redo logs. Recovery typically involves copying the
previously backed up files into their appropriate locations and, if
necessary, performing recovery operations to ensure that the
database is recovered to the appropriate point in time and is
consistent.
The following sections examine both traditional (user-managed) and
RMAN Oracle database recoveries. This chapter assumes that EMC
technology is used in the backup process as described in Chapter 4,
“Backing Up Oracle Environments.” Thus, this chapter directly
matches the sections of that chapter.
Crash recovery
A critical component of all ACID-compliant (Atomicity Consistency
Isolation Durability) databases is the ability to perform crash
recovery to a consistent database state after a failure. Power failures
on the host are a primary cause of databases going down
inadvertently and requiring crash recovery. Other situations where
crash recovery procedures are needed include databases shut down
with the "abort" option and database images created using a
consistent split mechanism.
Crash recovery is an example of using the database restart process,
where the implicit application of database logs during normal
initialization occurs. Crash recovery is a database-driven recovery
mechanism; it is not initiated by a DBA. Whenever the database is
started, Oracle verifies that the database is in a consistent state. It
does this by reading information out of the control file and verifying
the database was previously shut down cleanly. It also determines the
latest checkpoint system change number (SCN) in the control file and
verifies that each datafile is current by comparing the SCN in each
data file header. In the event that a crash occurred and recovery is
required, the database automatically determines which log
information needs to be applied. The latest redo log is read and
change information from them is applied to the database files, rolling
forward any transactions that were committed but not applied to the
database files. Then, any transaction information written to the
datafiles, but not committed, are rolled back using data in the undo
logs.
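The startup check described above can be sketched as a simple model (illustrative only; the function name and inputs are hypothetical and do not reflect Oracle internals):

```python
# Illustrative model of the startup consistency check: the control file
# records the latest checkpoint SCN and whether the last shutdown was
# clean; each datafile header records the SCN it was checkpointed at.

def needs_crash_recovery(clean_shutdown, checkpoint_scn, datafile_scns):
    """Return True if the instance must perform crash recovery on startup."""
    if not clean_shutdown:
        return True
    # A datafile whose header SCN does not match the checkpoint SCN is stale.
    return any(scn != checkpoint_scn for scn in datafile_scns)

# A cleanly shut down database with current datafiles needs no recovery.
print(needs_crash_recovery(True, 5000, [5000, 5000]))   # False
# After an abnormal termination, recovery is always required.
print(needs_crash_recovery(False, 5000, [5000, 4990]))  # True
```

The real mechanism compares considerably more state, but the decision reduces to the same comparison of recorded SCNs against the checkpoint.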
Media recovery
Media recovery is another type of Oracle recovery mechanism.
Unlike crash recovery, however, media recovery is always
user-invoked, and either user-managed or RMAN recovery may be
used. Media recovery rolls forward changes made to datafiles that
were restored from disk or tape because of their loss or corruption.
Unlike crash recovery, which uses only the online redo log files,
media recovery uses both the online redo logs and the archived log
files during the recovery process.
The granularity of a media recovery depends on the requirements of
the DBA. It can be performed for an entire database, for a single
tablespace, or even for a single datafile. The process involves
restoring a copy of a valid backed up image of the required data
structure (database, tablespace, datafile) and using Oracle standard
recovery methods to roll forward the database to the point in time of
the failure by applying change information found in the archived and
online redo log files. Oracle uses SCNs to determine the last changes
applied to the data files involved. It then uses information in the
control files that specifies which SCNs are contained in each of the
archive logs to determine where to start the recovery process.
Changes are then applied to appropriate datafiles to roll them
forward to the point of the last transaction in the logs.
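The archive-log selection step can be sketched as follows (an illustrative model, assuming each log's SCN range is known from the control file; the function name is hypothetical):

```python
# Sketch of choosing which archive logs to apply during media recovery.
# The control file records the SCN range covered by each archived log;
# any log whose range extends past the restored datafile's SCN must be
# applied, in sequence order, to roll the datafile forward.

def logs_to_apply(datafile_scn, archive_logs):
    """archive_logs: (first_scn, next_scn) pairs in sequence order.
    Return the logs containing changes after the restored datafile's SCN."""
    return [(lo, hi) for (lo, hi) in archive_logs if hi > datafile_scn]

# A datafile restored at SCN 1200 needs the logs covering 1000-1500 and
# 1500-2000; the log ending at SCN 1000 holds no changes to apply.
print(logs_to_apply(1200, [(500, 1000), (1000, 1500), (1500, 2000)]))
```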
Media recovery is the predominant Oracle recovery mechanism.
Media recovery is also used as a part of replicating Oracle databases
for business continuity or disaster recovery purposes. Further details
of the media recovery process are in the following sections.
Complete recovery
Complete recovery is the primary method of recovering an Oracle
database. It is the process of recovering a database to the latest point
in time (just before the database failure) without the loss of
committed transactions. The complete recovery process involves
restoring a part or all of the database data files from a backup image
on tape or disk, and then reading and applying all transactions
subsequent to the completion of the database backup from the
archived and online log files. After restarting the database, crash
recovery is performed to make the database transactionally
consistent for continued user transactional processing.
The processes needed to perform complete recovery of the database
are detailed in the following sections.
Incomplete recovery
Oracle sometimes refers to incomplete database recovery as a
point-in-time recovery. Incomplete recovery is similar to complete
recovery in the process used to bring the database back to a
transactionally consistent state. However, instead of rolling the
database forward to the last available transaction, roll-forward
procedures are halted at a user-defined prior point. This is typically
done to recover a database prior to a point of user error such as the
deletion of a table, undesired deletion or modification of customer
data, or rollback of an unfinished batch update. In addition,
incomplete recovery is also performed when recovery is required, but
there are missing or corrupted archive logs. Incomplete recovery
always incurs some data loss.
Typically, incomplete recovery operations are performed on the entire
database since Oracle needs all database files to be consistent with
one another. However, an option called Tablespace Point-in-Time
Recovery (usually abbreviated TSPITR), which allows a single
tablespace to be only partially recovered, is also available. This
recovery method, in Oracle10g, uses the transportable tablespace
feature described in Section 3.8. The Oracle documentation Oracle
Database Backup and Recovery Advanced Users Guide provides
additional information on TSPITR.
This first case is depicted in Figure 44, where both the volumes
containing the datafiles and the database recovery structures (archive
logs, redo logs, and control files) are restored.
Prior to any disk-based restore using EMC technology, the database
must be shut down, and file systems unmounted. The operating
system should have nothing in its memory that reflects the content of
the database file structures.
In the example that follows, the data_group device group holds all
Symmetrix volumes containing Oracle tablespaces. The log_group
group has volumes containing the Oracle recovery structures (the
archive logs, redo logs, and control files). The following steps
describe the process needed to restore the database image from the
BCVs:
1. Verify the state of the BCVs. All volumes in the Symmetrix device
group should be in a split state. The following commands identify
the state of the BCVs for each of the device groups:
symmir -g data_group query
symmir -g log_group query
3. After the primary database has shut down, unmount the file
system (if used) to ensure that nothing remains in cache. This
action is operating-system dependent.
4. Once the primary database has shut down successfully and the
file system is unmounted, initiate the BCV restore process. In this
example, both the data_group and log_group device groups are
restored, indicating a point-in-time recovery. If an incomplete or
complete recovery is required, only the data_group device group
would be restored. Execute the following TimeFinder/Mirror
SYMCLI commands:
symmir -g data_group restore -nop
symmir -g log_group restore -nop
5. Once the BCV restore process has been initiated, the production
database copy is ready for recovery operations. It is possible to
start the recovery process even though the data is still being
restored from the BCV to the production devices. Any tracks
needed, but not restored, will be pulled directly from the BCV
device. It is recommended, however, that the restore operation
completes and the BCVs are split from the standard devices
before the source database is started and recovery (if required) is
initiated.
6. After the restore process completes, split the BCVs from the
standard devices with the following commands:
symmir -g data_group split -nop
symmir -g log_group split -nop
symmir -g data_group query
symmir -g log_group query
In the example that follows, the data_group device group holds all
Symmetrix volumes containing Oracle tablespaces. The log_group
group has volumes containing the Oracle recovery structures (the
archive logs, redo logs, and control files). Follow these steps to restore
the database image from the BCV clone devices:
1. Verify the state of the clone devices. Volumes in the Symmetrix
device group should be in an active state, although the
relationship between the source and target volumes may have
terminated. The following commands identify the state of the
clones for each of the device groups (the -multi flag is used to
show all relationships available):
symclone -g data_group query -multi
symclone -g log_group query -multi
3. After the primary database has shut down, unmount the file
system (if used) to ensure that nothing remains in server cache.
This action is operating-system dependent.
4. Initiate the clone restore process. In this example, both the
data_group and log_group device groups are restored, indicating a
point-in-time recovery. If an incomplete or complete recovery is
required, only the data_group device group would be restored.
Execute the following TimeFinder/Clone SYMCLI commands:
symclone -g data_group restore -nop
symclone -g log_group restore -nop
symclone -g data_group query -multi
symclone -g log_group query -multi
(Figures ICO-IMG-000515 and ICO-IMG-000516: TimeFinder/Snap
restore, in which data copied to the save area is restored to the
standard devices.)
3. After the primary database shuts down, unmount the file system
(if used) to ensure that nothing remains in cache. This action is
operating-system dependent.
4. Once the file systems are unmounted, initiate the snap restore
process. In this example, both the data_group and log_group device
groups are restored, indicating a point-in-time recovery. If an
incomplete or complete recovery is required, only the data_group
device group would be restored. Execute the following
TimeFinder/Snap SYMCLI commands:
symsnap -g data_group restore -nop
symsnap -g log_group restore -nop
6. When the snap restore process is initiated, both the snap device
and the source are set to a Not Ready status (that is, they are
offline to host activity). Once the restore operation commences,
the source device is set to a Ready state. Upon completion of the
restore process, terminate the restore operations as follows:
symsnap -g data_group terminate -restored -noprompt
symsnap -g log_group terminate -restored -noprompt
symsnap -g data_group query
symsnap -g log_group query
Note: Terminating the restore session does not terminate the underlying
snap session.
(Figure ICO-IMG-000517: recovering the production database from the
BCVs; the data, log, and archive standard devices are shown with the
data BCV.)
Note: It is also possible to simply restart the database as shown in the next
section.
or
SQL> recover database until time timestamp;
Oracle Flashback
Oracle Flashback is a technology that helps DBAs recover from user
errors to the database. Initial Flashback functionality was provided in
Oracle9i but was greatly enhanced in Oracle10g. Flashback retains
undo data in the form of flashback logs. Flashback logs containing
undo information are periodically written by the database in order
for the various types of Flashback to work.
Each type of Flashback relies on undo data being written to the flash
recovery area. The flash recovery area is a file system Oracle uses to
retain the flashback logs, archive logs, backups, and other
recovery-related files.
Some of the ways Flashback helps DBAs recover from user errors are:
◆ Flashback Query
◆ Flashback Version Query
◆ Flashback Transaction Query
◆ Flashback Table
◆ Flashback Drop
◆ Flashback Database
Each of these recovery methods is described in the following sections.
Flashback configuration
Flashback is enabled in a database by creating a flash recovery area
for the Flashback logs to be retained, and by enabling Flashback
logging. Flashback allows the database to be flashed back to any
point in time. However, the Flashback logs represent discrete
database points in time, and as such, ARCHIVELOG mode must also
be enabled for the database. Archive log information is used in
conjunction with the flashback logs to re-create any given database
point-in-time state desired.
The default flash recovery area is defined by the Oracle
initialization parameter DB_RECOVERY_FILE_DEST. It is important
to set this parameter to the location of a directory that can hold the
flashback logs. The required size of this file system depends on how
far back a user may want to flash the database back and on the rate
of change in the database.
Flashback Query
Flashback Query returns query results as they appeared at a previous
point in time. For example, if a user erroneously deleted a selection
of rows from a table, Flashback Query allows that user to query the
table as it existed before the deletion.
The following is an example of the Flashback Query functionality:
SELECT first_name, last_name
FROM emp
AS OF TIMESTAMP
TO_TIMESTAMP('2005-11-25 11:00:00', 'YYYY-MM-DD HH:MI:SS');
Flashback Table
Flashback Table returns a table back into the state that it was at a
specified time. It is particularly useful in that this change can be made
while the database is up and running. The following is an example of
the Flashback Table functionality:
FLASHBACK TABLE emp
TO TIMESTAMP
TO_TIMESTAMP('2005-11-26 10:30:00', 'YYYY-MM-DD HH:MI:SS');
An SCN can also be used:
FLASHBACK TABLE emp
TO SCN 54395;
Flashback Drop
If a table is dropped inadvertently using a DROP TABLE
command, Flashback Drop can reverse the process, re-enabling access
to the dropped table. As long as space is available, the DROP TABLE
command does not delete data in the tablespace data files. Instead,
the table data is retained (in Oracle's "recycle bin") and the table is
renamed to an internally system-defined name. If the table is needed,
Oracle can bring back the table by renaming it with its old name.
The following shows an example of a table being dropped and then
brought back using the FLASHBACK TABLE command.
1. Determine the tables owned by the currently connected user:
SQL> SELECT * FROM tab;
Flashback Database
Flashback Database logically recovers the entire database to a
previous point in time. A database can be rolled back in time to the
point before a user error, such as a batch update or a set of
transactions, logically corrupted the database. The database can be
rolled back to a particular SCN, redo log sequence number, or
timestamp. The following is the syntax of the FLASHBACK
DATABASE command:
4. Open the database for use. To make the database consistent, open
the database as follows:
SQL> alter database open resetlogs;
After opening the database with the resetlogs option,
immediately perform a full database backup.
Introduction
A critical part of managing a database is planning for unexpected loss
of data. The loss can occur from a disaster such as a fire or flood or it
can come from hardware or software failures. It can even come
through human error or malicious intent. In each instance, the
database must be restored to some usable point before application
services can resume.
The effectiveness of any plan for restart or recovery involves
answering the following questions:
◆ How much downtime is acceptable to the business?
◆ How much data loss is acceptable to the business?
◆ How complex is the solution?
◆ Does the solution accommodate the data architecture?
◆ How much does the solution cost?
◆ What disasters does the solution protect against?
◆ Is there protection against logical corruption?
◆ Is there protection against physical corruption?
◆ Is the database restartable or recoverable?
◆ Can the solution be tested?
◆ If failover happens, will failback work?
All restart and recovery plans include a replication component. In its
simplest form, the replication process may be as easy as making a
tape copy of the database and application. In a more sophisticated
form, it could be real-time replication of all changed data to some
remote location. Remote replication of data has its own challenges
centered around:
◆ Distance
◆ Propagation delay (latency)
◆ Network infrastructure
◆ Data loss
This section provides an introduction to the spectrum of disaster
recovery and disaster restart solutions for Oracle databases on EMC
Symmetrix arrays.
Definitions
In the following sections, the terms dependent-write consistency,
database restart, database recovery, and roll-forward recovery are used. A
clear definition of these terms is required to understand the context of
this section.
Dependent-write consistency
A dependent-write I/O is one that cannot be issued until a related
predecessor I/O has completed. Dependent-write consistency is a
data state where data integrity is guaranteed by dependent-write
I/Os embedded in application logic. Database management systems
are good examples of the practice of dependent-write consistency.
Database management systems must protect against abnormal
termination in order to recover from it successfully. The most
common technique used is to guarantee that a dependent-write
cannot be issued until a predecessor write has completed. Typically
the dependent-write is a data or index write while the predecessor
write is a write to the log. Because the write to the log must be
completed prior to issuing the dependent-write, the application
thread is synchronous to the log write (that is, it waits for that write to
complete prior to continuing). The result of this strategy is a
dependent-write consistent database.
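The ordering rule can be sketched in a few lines (a minimal single-threaded model; names such as `commit` and `journal` are illustrative, not a real DBMS I/O layer):

```python
# Minimal sketch of dependent-write ordering: the data/index write is
# not issued until the predecessor log write has completed.

journal = []

def write_log(record):
    journal.append(("log", record))   # predecessor write completes first

def write_data(record):
    journal.append(("data", record))  # dependent write follows

def commit(record):
    write_log(record)   # synchronous: returns only when the log is durable
    write_data(record)  # dependent write is issued only afterwards

commit("txn-1")
# The log entry always precedes its dependent data write, so any
# point-in-time image of `journal` is dependent-write consistent.
print(journal)  # [('log', 'txn-1'), ('data', 'txn-1')]
```

Because the application thread waits on the log write, no image of the storage can ever contain a data write whose log record is missing, which is exactly the property a consistent split or remote replica relies on.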
Database restart
Database restart is the implicit application of database logs during
the database's normal initialization process to ensure a
transactionally consistent data state.
If a database is shut down normally, the process of getting to a point
of consistency during restart requires minimal work. If the database
abnormally terminates, then the restart process will take longer
depending on the number and size of in-flight transactions at the
time of termination. An image of the database created by using EMC
consistency technology while it is running, without conditioning the
database, will be in a dependent-write consistent data state, which is
similar to that created by a local power failure. This is also known as
a DBMS restartable image. The restart of this image transforms it to a
transactionally consistent data state.
Definitions 231
Understanding Oracle Disaster Restart & Disaster Recovery
Database recovery
Database recovery is the process of rebuilding a database from a
backup image, and then explicitly applying subsequent logs to roll
forward the data state to a designated point of consistency. Database
recovery is only possible with databases configured with archive
logging.
A recoverable Oracle database copy can be taken in one of three
ways:
◆ With the database shut down and copying the database
components using external tools
◆ With the database running using the Oracle backup utility
Recovery Manager (RMAN)
◆ With the database in hot backup mode and copying the database
using external tools
Roll-forward recovery
With some databases, it may be possible to take a DBMS restartable
image of the database, and apply subsequent archive logs, to roll
forward the database to a point in time after the image was created.
This means that the image created can be used in a backup strategy in
combination with archive logs. At the time of printing, a DBMS
restartable image of Oracle cannot use subsequent logs to roll
forward transactions. In most cases, during a disaster, the storage
array image at the remote site will be an Oracle DBMS restartable
image and cannot have archive logs applied to it.
Operational complexity
The operational complexity of a DR solution may be the most critical
factor in determining the success or failure of a DR activity. The
complexity of a DR solution can be considered as three separate
phases.
1. Initial configuration and setup of the implementation
2. Maintenance and management of the running solution
3. Execution of the DR plan in the event of a disaster
While initial configuration complexity and running complexity can
be a demand on human resources, the third phase, execution of the
plan, is where automation and simplicity must be the focus. When a
disaster is declared, key personnel may be unavailable in addition to
the loss of servers, storage, networks, buildings, and so on. If the
complexity of the DR solution is such that skilled personnel with an
Production impact
Some DR solutions delay the host activity while taking actions to
propagate the changed data to another location. This action only
affects write activity and, although the introduced delay may be only
on the order of a few milliseconds, it can impact response time in a
high-write environment. Synchronous solutions introduce delay into
write transactions at the source site; asynchronous solutions do not.
operational functions like power on and off. Ideally, this server could
have some usage such as running development or test databases and
applications. Some DR solutions require more target server activity
and some require none.
Bandwidth requirements
One of the largest costs for DR is in provisioning bandwidth for the
solution. Bandwidth costs are an operational expense; this makes
solutions that have reduced bandwidth requirements very attractive
to customers. It is important to recognize in advance the bandwidth
consumption of a given solution to be able to anticipate the running
costs. Incorrect provisioning of bandwidth for DR solutions can have
an adverse effect on production performance and can invalidate the
overall solution.
Federated consistency
Databases are rarely isolated islands of information with no
interaction or integration with other applications or databases. Most
commonly, databases are loosely and/or tightly coupled to other
databases using triggers, database links, and stored procedures. Some
databases provide information downstream for other databases using
information distribution middleware; other databases receive feeds
and inbound data from message queues and EDI transactions. The
result can be a complex interwoven architecture with multiple
interrelationships. This is referred to as a federated database
architecture.
With a federated database architecture, making a DR copy of a single
database without regard to other components invites consistency
issues and creates logical data integrity problems. All components in
a federated architecture need to be recovered or restarted to the same
dependent-write consistent point of time to avoid these problems.
It is possible then that point database solutions for DR, such as log
shipping, do not provide the required business point of consistency
in a federated database architecture. Federated consistency solutions
guarantee that all components, databases, applications, middleware,
flat files, and such are recovered or restarted to the same
dependent-write consistent point in time.
Cost
The cost of doing DR can be justified by comparing it to the cost of
not doing it. What does it cost the business when the database and
application systems are unavailable to users? For some companies,
this is easily measurable, and revenue loss can be calculated per hour
of downtime or per hour of data loss.
Whatever the business, the DR cost is going to be an extra expense
item and, in many cases, with little in return. The costs include, but
are not limited to:
◆ Hardware (storage, servers and maintenance)
◆ Software licenses and maintenance
◆ Facility leasing/purchase
◆ Utilities
◆ Network infrastructure
◆ Personnel
Tape-based solutions
This section discusses the following tape-based solutions:
◆ “Tape-based disaster recovery” on page 239
◆ “Tape-based disaster restart” on page 239
Propagation delay
Electronic operations execute at the speed of light. The speed of light
in a vacuum is 186,000 miles per second; through glass (in the case of
fiber-optic media) it is slower, approximately 115,000 miles per
second.
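These speeds translate into propagation delay as follows (a back-of-the-envelope sketch; a synchronous remote write incurs at least one round trip, before any protocol or equipment overhead):

```python
# Back-of-the-envelope propagation delay through optical fiber.
SPEED_IN_GLASS_MPS = 115_000  # miles per second, approximate

def round_trip_ms(distance_miles):
    """One round trip (out and back) in milliseconds, propagation only."""
    return 2 * distance_miles / SPEED_IN_GLASS_MPS * 1000

# Sites ~100 miles apart add roughly 1.7 ms of propagation delay to
# every synchronous write, before protocol and equipment overhead.
print(round(round_trip_ms(100), 2))  # 1.74
```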
Bandwidth requirements
All remote replication solutions have some bandwidth requirements
because the changes from the source site must be propagated to the
target site. The more changes there are, the greater the bandwidth
that is needed. It is the change rate and replication methodology that
determine the bandwidth requirement, not necessarily the size of the
database.
Data compression can help reduce the quantity of data transmitted
and therefore the size of the "pipe" required. Certain network devices,
like switches and routers, provide native compression, some by
software and some by hardware. GigE directors provide native
compression in a DMX to DMX SRDF pairing. The amount of
compression achieved depends on the type of data being
compressed. Typical character and numeric database data
compresses at about a 2-to-1 ratio. A good way to estimate how the
data will compress is to assess how much tape space is required to
store the database during a full-backup process. Tape drives perform
hardware compression on the data prior to writing it. For instance, if
a 300 GB database takes 200 GB of space on tape, the compression
ratio is 1.5 to 1.
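The estimate described above reduces to a quick calculation (a sketch only; the 30 MB/s change rate is an assumed figure for illustration, and real link sizing relies on analysis tools rather than this arithmetic):

```python
# Estimating compressibility from a full backup to tape: tape drives
# compress in hardware, so tape usage approximates the compressed size
# of the database.

def compression_ratio(db_size_gb, tape_used_gb):
    return db_size_gb / tape_used_gb

def required_link_rate(change_rate_mb_s, ratio):
    """Rough link sizing: raw change rate divided by the compression ratio."""
    return change_rate_mb_s / ratio

ratio = compression_ratio(300, 200)   # the 300 GB / 200 GB example
print(ratio)                          # 1.5
print(required_link_rate(30, ratio))  # assumed 30 MB/s change rate -> 20.0
```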
For most customers, a major consideration in the disaster recovery
design is cost. It is important to recognize that some components of
the end solution represent a capital expenditure and some an
operational expenditure. Bandwidth costs are operational expenses
and thus any reduction in this area, even at the cost of some capital
expense, is highly desirable.
Network infrastructure
The choice of channel extension equipment, network protocols,
switches, routers, and such, ultimately determines the operational
characteristics of the solution. EMC has a proprietary "BC Design
Tool" to assist customers in analysis of the source systems and to
determine the required network infrastructure to support a remote
replication solution.
Method of instantiation
In all remote replication solutions, a common requirement is for an
initial, consistent copy of the complete database to be replicated to
the remote site. The initial copy from source to target is called
instantiation of the database at the remote site. Following instantiation,
only the changes made at the source site are replicated. For large
databases, sending only the changes after the initial copy is the only
practical and cost-effective solution for remote database replication.
In some solutions, instantiation of the database at the remote site uses
a process similar to the one that replicates the changes. Some
solutions do not even provide for instantiation at the remote site (log
shipping for instance). In all cases it is critical to understand the pros
and cons of the complete solution.
Method of reinstantiation
Some methods of remote replication require periodic refreshing of the
remote system with a full copy of the database. This is called
reinstantiation. Technologies such as log shipping frequently require
this since not all activity on the production database may be
represented in the log. In these cases, the disaster recovery plan must
account for reinstantiation and also for the fact that there may be a
disaster during the refresh. The business objectives of RPO and RTO
must likewise be met under those circumstances.
Locality of reference
Locality of reference is a factor that needs to be measured to
understand if there will be a reduction of bandwidth consumption
when any form of asynchronous transmission is used. Locality of
reference is a measurement of how much write activity on the source
is skewed. For instance, a high locality of reference application may
make many updates to a few tables in the database, whereas a low
locality of reference application rarely updates the same rows in the
same tables during a given time period. While the activity on the
tables may have a low locality of reference, the write activity into an
index might be clustered when inserted rows have the same or
similar index column values. This renders a high locality of reference
on the index components.
In some asynchronous replication solutions, updates are "batched"
into periods of time and sent to the remote site to be applied. In a
given batch, only the last image of a given row/block is replicated to
the remote site. So, for highly skewed application writes, this results
in bandwidth savings. Generally, the greater the time period of
batched updates, the greater the savings on bandwidth.
Log-shipping technologies do not consider locality of reference. For
example, a row updated 100 times, is transmitted 100 times to the
remote site, whether the solution is synchronous or asynchronous.
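The difference can be sketched as follows (an illustrative model of write folding within one batch interval; the function name and block images are hypothetical):

```python
# Sketch of write folding in batched asynchronous replication: within
# one batch interval, only the last image of each block is sent, whereas
# log shipping transmits every individual update.

def folded_batch(writes):
    """writes: sequence of (block_id, image). Keep the last image per block."""
    last = {}
    for block, image in writes:
        last[block] = image  # a later write to the same block replaces it
    return last

writes = [(7, "v1"), (7, "v2"), (9, "a"), (7, "v3")]
print(len(writes))                # log shipping sends all 4 updates
print(len(folded_batch(writes)))  # a folded batch sends only 2 blocks
print(folded_batch(writes))       # {7: 'v3', 9: 'a'}
```

With highly skewed (high locality of reference) write activity, the folded batch is far smaller than the raw update stream, which is the source of the bandwidth savings described above.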
Failback operations
If there is the slightest chance that failover to the DR site may be
required, then there is a 100 percent chance that failback to the
primary site also will be required, unless the primary site is lost
permanently. The DR architecture should be designed to make
failback simple, efficient, and low risk. If failback is not planned for,
there may be no reasonable or acceptable way to move the processing
from the DR site, where the applications may be running on tier 2
servers and tier 2 networks, and so forth, back to the production site.
In a perfect world, the DR process should be tested once a quarter,
with database and application services fully failed over to the DR site.
The integrity of the application and database must be verified at the
remote site to ensure that all required data was copied successfully.
Ideally, production services are brought up at the DR site as the
ultimate test. This means production data is maintained on the DR
site, requiring a failback when the DR test is completed. While this is
not always
possible, it is the ultimate test of a DR solution. It not only validates
the DR process, but also trains the staff on managing the DR process
should a catastrophic failure occur. The downside for this approach is
that duplicate sets of servers and storage need to be present to make
an effective and meaningful test. This tends to be an expensive
proposition.
SQL> shutdown
SQL> exit
2. Move the log files using O/S commands from the old location to
the new location:
mv /oracle/oldlogs/log1a.rdo /oracle/newlogs/log1a.rdo
mv /oracle/oldlogs/log1b.rdo /oracle/newlogs/log1b.rdo
4. Instruct the source Symmetrix array to send all the tracks on the
source site to the target site using the current mode:
symrdf -g device_group establish -full -noprompt
Note: There is no requirement for a host at the remote site during the
synchronous replication. The target Symmetrix array itself manages the
in-bound writes and updates the appropriate volumes in the array.
At this point, the host can issue the necessary commands to access the
disks. For instance, on a UNIX host, import the volume group,
activate the logical volumes, fsck the file systems and mount them.
Once the data is available to the host, the database can restart. The
database will perform an implicit recovery when restarted.
Transactions that were committed, but not completed, are rolled
forward and completed using the information in the redo logs.
Transactions that have updates applied to the database, but were not
committed, are rolled back. The result is a transactionally consistent
database.
Rolling disaster
Protection against a rolling disaster is required when the data for a
database resides on more than one Symmetrix array or multiple RA
groups. Figure 53 on page 254 depicts a dependent-write I/O
sequence where a predecessor log write is happening prior to a page
flush from a database buffer pool. The log device and data device are
on different Symmetrix arrays with different replication paths.
Figure 53 demonstrates how rolling disasters can affect this
dependent-write sequence.
(Figure 53, ICO-IMG-000519: a rolling disaster in which data writes
run ahead of log writes because the log and data devices replicate
over different paths; X = application data, Y = DBMS data, Z = logs.)
(Figure ICO-IMG-000520: SRDF consistency group protection; when a
rolling disaster begins, the ConGroup definition (X, Y, Z), enforced
through Solutions Enabler SCF/SYMAPI and IOS/PowerPath on each
host, suspends all R1/R2 relationships together, leaving a DBMS
restartable copy at the target; X = application data, Y = DBMS data,
Z = logs.)
(Figure ICO-IMG-000521: synchronous SRDF replication of the Oracle
data files, redo logs, and archive logs from the source to the target
array.)
2. Add to the consistency group the R1 devices 121 and 12f from
Symmetrix with ID 111, and R1 devices 135 and 136 from
Symmetrix with ID 222:
symcg -cg device_group add dev 121 -sid 111
symcg -cg device_group add dev 12f -sid 111
symcg -cg device_group add dev 135 -sid 222
symcg -cg device_group add dev 136 -sid 222
Note: There is no requirement for a host at the remote site during the
synchronous replication. The target Symmetrix array manages the in-bound
writes and updates the appropriate disks in the array.
SRDF/A
SRDF/A, or asynchronous SRDF, is a method of replicating
production data changes from one Symmetrix array to another using
delta set technology. Delta sets are the collection of changed blocks
grouped together by a time interval configured at the source site. The
default time interval is 30 seconds. The delta sets are then transmitted
from the source site to the target site in the order created. SRDF/A
preserves dependent-write consistency of the database at all times at
the remote site.
The distance between the source and target Symmetrix arrays is
unlimited and there is no host impact. Writes are acknowledged
immediately when they hit the cache of the source Symmetrix array.
SRDF/A is only available on the DMX family of Symmetrix arrays.
Figure 56 shows the process.
[Figure 56: The SRDF/A delta-set process - cycle N is captured on
the R1 devices while cycle N-1 is transmitted across the link and
cycle N-2 is applied to the R2 devices. ICO-IMG-000522]
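The delta-set mechanism described above can be sketched as follows. This is a hypothetical illustration of the bookkeeping, not the Enginuity implementation: changed blocks are grouped into numbered sets by capture interval, later writes to the same block within a cycle supersede earlier ones, and the sets are applied strictly in creation order.

```python
# Hypothetical sketch of SRDF/A delta-set grouping. Changed blocks are
# collected into numbered delta sets by a capture interval (default
# 30 seconds) and applied at the target in creation order, which is
# what preserves dependent-write consistency.

INTERVAL = 30  # seconds per capture cycle (the SRDF/A default)

def group_into_delta_sets(writes):
    """writes: list of (timestamp_sec, block, data) tuples."""
    delta_sets = {}
    for ts, block, data in writes:
        cycle = int(ts // INTERVAL)  # which capture cycle this write falls in
        # a later write to the same block within a cycle supersedes the earlier one
        delta_sets.setdefault(cycle, {})[block] = data
    # transmit/apply the sets in the order they were created
    return [delta_sets[c] for c in sorted(delta_sets)]

writes = [(1, "A", "a1"), (12, "B", "b1"), (29, "A", "a2"), (35, "C", "c1")]
print(group_into_delta_sets(writes))
# [{'A': 'a2', 'B': 'b1'}, {'C': 'c1'}]
```

Note that folding multiple writes to the same block into one entry per cycle is also why SRDF/A can use less bandwidth than synchronous replication of every write.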
SRDF/A 261
Understanding Oracle Disaster Restart & Disaster Recovery
2. Add to the device group the R1 devices 121 and 12f from the
Symmetrix array with ID 111, and R1 devices 135 and 136 from
the Symmetrix array with ID 222:
symld -g device_group add dev 121 -sid 111
symld -g device_group add dev 12f -sid 111
symld -g device_group add dev 135 -sid 222
symld -g device_group add dev 136 -sid 222
4. Instruct the source Symmetrix array to send all the tracks at the
source site to the target site using the current mode:
symrdf -g device_group establish -full -noprompt
Note: There is no requirement for a host at the remote site during the
asynchronous replication. The target Symmetrix array manages the
in-bound writes and updates the appropriate disks in the array.
At this point, the host can issue the necessary commands to access the
disks. For instance, on a UNIX host, import the volume group,
activate the logical volumes, fsck the file systems, and mount them.
Once the data is available to the host, the database can be restarted.
The database will perform crash recovery when restarted.
Transactions committed, but not completed, are rolled forward and
completed using the information in the redo logs. Transactions with
updates applied to the database, but not committed, are rolled back.
The result is a transactionally consistent database.
[Figure: SRDF/AR single-hop process - Oracle data on STD devices is
copied to BCV/R1 devices, replicated by SRDF to R2 devices, and
preserved in gold-copy BCVs at the target site. ICO-IMG-000523]
SRDF/AR multihop
SRDF/AR multihop is an architecture that allows long-distance
replication with zero seconds of data loss through use of a bunker
Symmetrix array. Production data is replicated synchronously to the
bunker Symmetrix array, which is within 200 km of the production
Symmetrix array allowing synchronous replication, but also far
enough away that potential disasters at the primary site may not
affect it. Typically, the bunker Symmetrix array is placed in a
hardened computing facility.
BCVs in the bunker frame are periodically synchronized to the R2s
and then consistently split to provide a dependent-write-consistent
point-in-time image of the data. These bunker BCVs also
have an R1 personality, which means that SRDF in adaptive copy
mode can be used to replicate the data from the bunker array to the
target site. Since the BCVs are not changing, the replication can be
completed in a finite length of time. The replication time depends on
the size of the "pipe" between the bunker location and the DR
location, the distance between the two locations, the quantity of
changed data, and the locality of reference of the changed data. On
the remote Symmetrix array, another BCV copy of the data is made
using the R2s. This is because the next SRDF/AR iteration replaces
the R2 image, in a nonordered fashion, and if a disaster were to occur
while the R2s were synchronizing, there would not be a valid copy of
the data at the DR site. The BCV copy of the data in the remote
Symmetrix array is commonly called the "gold" copy of the data. The
whole process then repeats.
[Figure: SRDF/AR multihop - production R1 devices replicate
synchronously to bunker R2 devices; bunker BCV/R1 devices replicate
via adaptive-copy SRDF to the DR-site R2 devices, which are copied
to gold-copy BCVs.]
Log-shipping considerations
When considering a log shipping strategy it is important to
understand:
◆ What log shipping covers.
◆ What log shipping does not cover.
◆ Server requirements.
◆ How to instantiate and reinstantiate the database.
◆ How failback works.
◆ Federated consistency requirements.
◆ How much data will be lost in the event of a disaster.
◆ Manageability of the solution.
◆ Scalability of the solution.
Log-shipping limitations
Log shipping transfers only the changes happening to the database
that are written into the redo logs and then copied to an archive log.
Consequently, operations happening in the database not written to
the redo logs do not get shipped to the remote site. To ensure that all
transactions are written to the redo logs, run the following command:
alter database force logging;
Log shipping is a database-centric strategy: it does not address
changes that occur outside of the database. Such changes include,
but are not limited to, the following:
◆ Application files and binaries
◆ Database configuration files
◆ Database binaries
◆ OS changes
◆ Flat files
To sustain a working environment at the DR site, there are several
procedures required to keep these objects up to date.
Server requirements
Log shipping requires a server at the remote DR site to receive and
apply the logs to the standby database. It may be possible to offset
this cost by using the server for other functions when it is not being
used for DR. Database licensing fees for the standby database may
also apply.
[Figure: Log shipping - data files, redo logs, and other data reside
on the production host; archive logs are shipped to the standby site
and applied to the standby database, while the contents of the
active redo logs are not transferred. ICO-IMG-000525]
Note: Users need to update their catalog entries for the production
database to connect to the new location, or the IP address of the
standby server needs to be updated to match that of the failed
production server.
[Figure: Oracle Data Guard log transport - redo generated on the
primary is archived by ARCn (LOG_ARCHIVE_DEST_1) and shipped to the
standby host, where MRP or LSP applies it from the standby redo logs
or archive logs. ICO-IMG-000528]
The following processes participate in Data Guard log transport and
apply:
◆ LGWR (Log Writer) - Sends redo log information from the primary
host to the standby host via Oracle Net. LGWR can be configured to
send data to standby redo logs on the standby host for synchronous
operations.
◆ ARCn (Archiver) - Sends primary database archive logs to the
standby host. This process is used primarily in configurations that
do not use standby redo logs and are configured for asynchronous
operations.
◆ RFS (Remote File Server) - Receives log data from the primary
LGWR or ARCn processes and writes it on the standby site to either
the standby redo logs or the archive logs. This process is
configured on the standby host when Data Guard is implemented.
◆ FAL (Fetch Archive Log) - Manages the retrieval of corrupted or
missing archive logs from the primary to the standby host.
◆ MRP (Managed Recovery) - Used by a physical standby database to
apply logs retrieved from either the standby redo logs or local
copies of the archive logs.
◆ LSP (Logical Standby) - Used by a logical standby database to
apply logs retrieved from either the standby redo logs or local
copies of the archive logs.
◆ LNS (Network Server) - Enables asynchronous writes to the standby
site using the LGWR process and standby redo logs.
Overview
Running database solutions put DR resources to active use. Instead
of having the database and server sit idle waiting for a disaster to
occur, the idea of having the database running and serving a useful
purpose at the DR site is attractive. Also,
active databases at the target site minimize the recovery time
required to have an application available in the event of a failure of
the primary. The problem is that hardware, server, and database
replication-level solutions typically require exclusive access to the
database, not allowing users to access the target database. The
solutions presented in this section perform replication at the
application layer and therefore allow user access even when the
database is being updated by the replication process.
In addition to an Oracle Data Guard logical standby database, which
can function as a running database while log information is being
applied to it, Oracle has two other methods of synchronizing data
between disparate running databases. These running database
solutions are Oracle's Advanced Replication and Oracle Streams,
which are described at a high level in the following sections.
Advanced Replication
Advanced Replication is one method of replicating objects between
Oracle databases. Advanced Replication is similar to Oracle's
previous Snapshot technology, where changes to the underlying
tables were tracked internally within Oracle and used to provide a list
of necessary rows to be sent to a remote location when a refresh of the
remote object was requested. Instead of snapshots, Oracle now uses
materialized views to track and replicate changes. Materialized views
are a complete or partial copy of a target table from a single point in
time.
Oracle Streams
Streams is Oracle's distributed transaction solution for propagating
table, schema, or entire database changes to one or many other Oracle
databases. Streams uses the concept of change records from the
source database, which are used to asynchronously distribute
changes to one or more target databases. Both DML and DDL
changes can be propagated between the source and target databases.
Queues on the source and target databases are used to manage
change propagation between the databases.
Introduction
Monitoring and managing database performance should be a
continuous process in all Oracle environments. Establishing baselines
and collecting database performance statistics for comparison against
them are important for monitoring performance trends and maintaining
a smoothly running system. The following section discusses the
performance stack and how database performance should be
managed in general. Subsequent sections discuss Symmetrix DMX
layout and configuration issues to help ensure the database meets the
required performance levels.
[Figure: The performance stack - typical problem areas at each layer:
Application - poorly written application, inefficient code;
SQL statements - SQL logic errors, missing indexes;
DB engine - database resource contention;
Operating system - file system parameter settings, kernel tuning,
I/O distribution;
Storage system - storage allocation errors, volume contention.
ICO-IMG-000040]
Front-end connectivity
Optimizing front-end connectivity requires an understanding of the
number and size of I/Os, both reads and writes, which will be sent
between the hosts and the Symmetrix DMX array. There are
limitations to the amount of I/O that each front-end director port,
each front-end director processor, and each front-end director board
can handle. Additionally, SAN fan-out counts (that is, the number of
hosts that can be attached through a Fibre Channel switch to a single
front-end port) need to be carefully managed.
A key concern when optimizing front-end performance is
determining which of the following I/O characteristics is more
important in the customer's environment:
[Figure: Relative front-end performance by block size (512 bytes to
64 KB) - small blocks favor I/Os per second, large blocks favor MB
per second. ICO-IMG-000042]
Configuring the host to send larger I/O sizes for DSS applications
can increase the overall throughput (MB/s) from the front-end
directors on the DMX. Database block sizes are generally larger
(16 KB or even 32 KB) for DSS applications. Sizing the host I/O as a
power-of-two multiple of DB_BLOCK_SIZE and tuning
DB_FILE_MULTIBLOCK_READ_COUNT appropriately is important for
maximizing performance in a customer's Oracle environment.
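A quick illustration of this sizing rule, with assumed values (the Oracle parameter in question is DB_FILE_MULTIBLOCK_READ_COUNT):

```python
# Hypothetical sizing check: Oracle's largest multiblock read is about
# DB_BLOCK_SIZE * DB_FILE_MULTIBLOCK_READ_COUNT bytes; keeping the
# ratio a power of two aligns host I/O with database blocks and array
# track sizes.

def multiblock_io_bytes(db_block_size, multiblock_read_count):
    return db_block_size * multiblock_read_count

def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0

db_block_size = 16 * 1024   # 16 KB DSS-style block size (assumed)
mbrc = 8                    # power-of-two multiblock read count (assumed)

io = multiblock_io_bytes(db_block_size, mbrc)
print(io // 1024, "KB per multiblock read")  # 128 KB per multiblock read
assert is_power_of_two(io // db_block_size)
```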
Currently, each Fibre Channel port on the Symmetrix DMX is
theoretically capable of 200 MB/s of throughput. In practice, however,
the throughput available per port is significantly less and depends on
the I/O size and on the shared utilization of the port and processor
on the director. Increasing the size of the I/O from the host
perspective decreases the number of IOPS that can be performed, but
increases the overall throughput (MB/s) of the port. As such,
increasing the I/O block size on the host is beneficial for overall
performance in a DSS environment. Limiting total throughput to a
fraction of the theoretical maximum (100 to 120 MB/s is a good "rule
of thumb") will ensure that enough bandwidth is available for
connectivity between the Symmetrix DMX and the host.
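The rule of thumb above reduces to simple arithmetic: per-port throughput is IOPS times I/O size. A sketch with assumed workload numbers:

```python
# Rough front-end bandwidth check (assumed numbers): throughput per
# port is IOPS * I/O size, and the text suggests budgeting 100-120 MB/s
# of a 200 MB/s theoretical Fibre Channel port.

PORT_BUDGET_MB_S = 110  # conservative "rule of thumb" target (assumed)

def throughput_mb_s(iops, io_size_kb):
    return iops * io_size_kb / 1024.0

# Larger I/Os mean fewer IOPS but more MB/s through the same port:
print(throughput_mb_s(12800, 8))   # 8 KB OLTP-style I/O -> 100.0 MB/s
print(throughput_mb_s(800, 128))   # 128 KB DSS-style I/O -> 100.0 MB/s
assert throughput_mb_s(800, 128) <= PORT_BUDGET_MB_S
```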
Symmetrix cache
The Symmetrix cache plays a key role in improving I/O performance
in the storage subsystem. The cache improves performance by
allowing write acknowledgements to be returned to a host when data
is received in solid-state cache, rather than being fully destaged to the
physical disk drives. Additionally, reads benefit from cache when
sequential requests from the host allow follow-on reads to be
prestaged in cache. The following briefly describes how the
Symmetrix cache is used for writes and reads, and then discusses
performance considerations for it.
At some point, Enginuity will destage the write to physical disk. The
decision of when to destage is based on overall system load, physical
disk activity, read operations to the physical disk, and availability
of cache.
Cache is used to service the write operation to optimize the
performance of the host system. As write operations to cache are
significantly faster than physical writes to disk media, the write is
reported as complete to the host operating system much earlier.
Battery backup and priority destage functions within the Symmetrix
ensure that no data loss occurs in the event of system power failure.
If the write operation to a given disk is delayed due to higher priority
operations (read activity is one such operation), the write-pending
slot remains in cache for longer time periods. Cache slots are
allocated as needed to a volume for this purpose. Enginuity
calculates thresholds for allocations to limit the saturation of cache by
a single hypervolume. These limits are referred to as write-pending
limits.
Cache allocations are made on a per-hypervolume basis. As
write-pending thresholds are reached, additional allocations may
occur, as well as reprioritization of write activity. As a result, write
operations to the physical disks may increase in priority to ensure
that excessive cache allocations do not occur. This is discussed in
more detail in the next section.
Thus, the cache enables buffering of writes and allows for a steady
stream of write activity to service the destaging of write operations
from a host. In a "bursty" write environment, this serves to even out
the write activity. Should the write activity constantly exceed the low
write priority to the physical disk, Enginuity will raise the priority of
write operations to attempt to meet the write demand. Ultimately,
should write load from the host exceed the physical disk ability to
write, the volume maximum write-pending limit may be reached. In
this condition, new cache slots only will be allocated for writes to a
particular volume once a currently allocated slot is freed by destaging
it to disk. This condition, if reached, may severely impact write
operations to a single hypervolume.
Note: In the DMX-3, the cache slot size increases from 32 KB to 64 KB. Sectors
also increase from 4 KB to 8 KB.
From this, we see devices 434 and 435 have reached the device
write-pending limit of 14,157. Further analysis of the cause of the
excessive writes, and of methods to alleviate this performance
bottleneck on these devices, should be performed.
Alternatively, Performance Manager may be used to determine the
device write-pending limit, and whether device limits are being
reached. Figure 64 on page 305 is a Performance Manager view
displaying both the device write-pending limits and device
write-pending counts for a given device, in this example Symmetrix
device 055. For the Symmetrix in this example, the write-pending
slots per device was 9,776 and thus the max write-pending limit was
29,328 slots (3 * 9776). In general, a distinct flat line in such graphs
indicates that a limit is reached.
[Figure 64: Performance Manager view of Symmetrix device 055 - the
device write-pending count plotted against the maximum write-pending
threshold; a flat line at the threshold indicates the limit is being
reached. ICO-IMG-000043]
[Figure: Performance Manager view of write-pending counts for
devices 00E, 00F, and 011.]
[Figure: Metavolume versus hypervolume read and write I/Os per
second and transactions per second. ICO-IMG-000039]
Note that the number of cache boards also has a minor effect on
performance. When comparing Symmetrix DMX arrays with the
same amount of cache, increasing the number of boards (for example,
four cache boards with 16 GB each as opposed to two cache boards
with 32 GB each) has a small positive effect on performance in
DSS applications. This is due to the increased number of paths
between front-end directors and cache, and has the effect of
improving overall throughput. However, configuring additional
boards is only helpful in high-throughput environments such as DSS
applications. For OLTP workloads, where IOPS are more critical,
additional cache directors provide no added performance benefits.
This is because the number of IOPS per port or director is limited by
the processing power of CPUs on each board.
Back-end considerations
Back-end considerations are typically the most important part of
optimizing performance on the Symmetrix DMX. Advances in disk
technologies have not kept up with performance increases in other
parts of the storage array such as director and bandwidth (that is,
Direct Matrix versus Bus) performance. Disk-access speeds have
increased by a factor of three to seven in the last decade while other
components have easily increased one to three orders of magnitude.
As such, most performance bottlenecks in the Symmetrix DMX are
attributable to physical spindle limitations.
An important consideration for back-end performance is the number
of physical spindles available to handle the anticipated I/O load.
Each disk is capable of a limited number of operations. Algorithms in
the Symmetrix DMX Enginuity operating environment optimize
I/Os to the disks. Although this helps to reduce the number of reads
and writes to disk, access to disk, particularly for random reads, is
still a requirement. If an insufficient number of physical disks are
available to handle the anticipated I/O workload, performance will
suffer. It is critical to determine the number of spindles required for
an Oracle database implementation based on I/O performance
requirements, and not solely on the physical space considerations.
To reduce or eliminate back-end performance issues on the
Symmetrix DMX, carefully spread access to the disks across as many
back-end directors and physical spindles as possible. EMC has long
recommended for data placement of application data to "go wide
before going deep." This means that performance is improved by
spreading data across the back-end directors and disks, rather than
allocating individual applications to specific physical spindles.
Significant attention should be given to balancing the I/O on the
physical spindles. Understanding the I/O characteristics of each
datafile and separating high application I/O volumes on separate
physical disks will minimize contention and improve performance.
Implementing Symmetrix Optimizer may also help to reduce I/O
contention between hypervolumes on a physical spindle. Symmetrix
Optimizer identifies I/O contention on individual hypervolumes and
nondisruptively moves one of the hypers to a new location on
another disk. Symmetrix Optimizer is an invaluable tool in helping to
reduce contention on physical spindles should workload
requirements change in an environment.
Configuration recommendations
Key recommendations for configuring the Symmetrix DMX for
optimal performance include the following:
◆ Understand the I/O — It is critical to understand the
characteristics of the database I/O, including the number, type (read
or write), size, location (that is, data files, logs), and
sequentiality of the I/Os. Empirical data or estimates are needed to
assist in planning.
◆ Physical spindles — The number of disk drives in the DMX
should first be determined by calculating the number of I/Os
required, rather than solely based on the physical space needs.
The key is to ensure that the front-end needs of the applications
can be satisfied by the flow of data from the back end.
◆ Spread out the I/O — Both reads and writes should be spread
across the physical resources (front-end and back-end ports,
physical spindles, hypervolumes) of the DMX. This helps to
prevent bottlenecks such as hitting port or spindle I/O limits, or
reaching write-pending limits on a hypervolume.
◆ Bandwidth — A key consideration when configuring
connectivity between a host and the Symmetrix DMX is the
expected bandwidth required to support database activity. This
requires an understanding of the size and number of I/Os
between the host and the Symmetrix system. Connectivity
considerations for both the number of HBAs and Symmetrix
front-end ports is required.
RAID considerations
For years, Oracle has recommended that all database storage be
mirrored; their philosophy of stripe and mirror everywhere (SAME)
is well known in the Oracle technical community. While laying out
databases using SAME may provide optimal performance in most
circumstances, in some situations acceptable data performance (IOPS
or throughput) can be achieved by implementing more economical
RAID configurations such as RAID 5. Before discussing RAID
recommendations for Oracle, a definition of each RAID type available
in the Symmetrix DMX is required.
Types of RAID
The following RAID configurations are available on the Symmetrix
DMX:
◆ Unprotected - This configuration is not typically used in a
Symmetrix DMX environment for production volumes. BCVs
and occasionally R2 devices (used as target devices for SRDF) can
be configured as unprotected volumes.
◆ RAID 1 - These are mirrored devices and are the most common
RAID type in a Symmetrix DMX. Mirrored devices require writes
to both physical spindles. However, intelligent algorithms in the
Enginuity operating environment can use both copies of the data
to satisfy read requests not in the cache of the Symmetrix DMX.
RAID 1 offers optimal availability and performance, but at an
increased cost over other RAID protection options.
◆ RAID 5 - A relatively recent addition to Symmetrix data
protection (Enginuity 5670+), RAID 5 stripes parity information
across all volumes in the RAID group. RAID 5 offers good
performance and availability, at a decreased cost. Data is striped
using a stripe width of four tracks (128 KB on DMX-2 and 256 KB
on DMX-3). RAID 5 is configured either as RAID 5 3+1 (75%
usable) or RAID 5 7+1 (87.5% usable) configurations. Figure 67
shows the configuration for 3+1 RAID 5 while Figure 68 on
page 313 shows how a random write in a RAID 5 environment is
performed.
[Figure 67: RAID 5 3+1 configuration - data and parity striped in
four-track stripes across the four members. ICO-IMG-000083]
[Figure 68: A random write in a RAID 5 environment - new host data is
written to a cache data slot, XORed with the old data and old parity
to produce new parity in a cache parity slot, and the new data and
new parity are then destaged to disk. ICO-IMG-000045]
[Figure: Host writes placed into cache data slots before destage.
ICO-IMG-000527]
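The usable-capacity percentages quoted for each protection type reduce to data members divided by total members:

```python
# Usable-capacity arithmetic for the RAID levels above: usable
# fraction = data members / total members. RAID 1 mirrors give 50%,
# RAID 5 3+1 gives 75%, and RAID 5 7+1 gives 87.5%.

def usable_fraction(data_members, parity_or_mirror_members):
    total = data_members + parity_or_mirror_members
    return data_members / total

print(usable_fraction(1, 1))  # RAID 1:     0.5
print(usable_fraction(3, 1))  # RAID 5 3+1: 0.75
print(usable_fraction(7, 1))  # RAID 5 7+1: 0.875
```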
RAID recommendations
Oracle has long recommended RAID 1 over RAID 5 for database
implementations. This was largely attributed to RAID 5's historically
poor performance versus RAID 1 (due to software-implemented
RAID schemes) and also to high disk drive failure rates that
caused RAID 5 performance degradation after failures and during
rebuilds. However, disk drives and RAID 5 in general have seen
significant optimizations and improvements since Oracle initially
recommended avoiding RAID 5. In the Symmetrix DMX, Oracle
databases can be deployed on RAID 5 protected disks for all but the
most I/O-intensive applications. Databases used for
test, development, QA, or reporting are likely candidates for using
RAID 5 protected volumes.
Another potential candidate for deployment on RAID 5 storage is
DSS applications. In many DSS environments, read performance
greatly outweighs the need for rapid writes. This is because data
warehouses typically perform loads off-hours or infrequently (once a
week or month); read performance in the form of database user
queries is significantly more important. Since there is no RAID
penalty for RAID 5 read performance, only write performance, these
types of applications are generally good candidates for RAID 5
storage deployments. Conversely, production OLTP applications
typically require small random writes to the database, and as such,
are generally more suited to RAID 1 storage.
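The write penalty behind this recommendation can be sketched with the usual accounting (an assumed simple model that ignores cache coalescing and full-stripe writes): each RAID 1 host write costs two disk writes, while each RAID 5 random host write costs four disk I/Os (read old data, read old parity, write new data, write new parity).

```python
# Assumed back-end accounting for random host writes: RAID 1 mirrors
# each write to 2 disks; a RAID 5 read-modify-write costs 4 disk I/Os
# (read data, read parity, write data, write parity). Reads carry no
# RAID penalty in either case.

def backend_ios_per_write(raid):
    return {"raid1": 2, "raid5": 4}[raid]

def backend_write_load(host_write_iops, raid):
    """Disk I/Os per second generated on the back end by host writes."""
    return host_write_iops * backend_ios_per_write(raid)

print(backend_write_load(500, "raid1"))  # 1000 disk I/Os per second
print(backend_write_load(500, "raid5"))  # 2000 disk I/Os per second
```

This is why write-heavy OLTP workloads tend toward RAID 1 while read-dominated DSS workloads tolerate RAID 5 well.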
Symmetrix metavolumes
Individual Symmetrix hypervolumes of the same RAID type (RAID
1, RAID 5) may be combined together to form a virtualized device
called a Symmetrix metavolume. Metavolumes are created for a
number of reasons including:
◆ A desire to create devices that are greater than the largest
hypervolume available (in 5670 and 5671 Enginuity operating
environments, this is currently just under 31 GB per
hypervolume).
Host-based striping
Host-based striping is configured through the Logical Volume
Manager used on most open-systems hosts. For example, in an
HP-UX environment, striping is configured when logical volumes are
created in a volume group as shown below:
lvcreate -i 4 -I 64 -L 1024 -n stripevol activevg
In this case, the striped volume is called stripevol (using the -n
flag), is created on the volume group activevg, is of volume size
1 GB (-L 1024), uses a stripe size of 64 KB (-I 64), and is striped
across four physical volumes (-i 4). The specifics of striping data
at the host level
are operating-system-dependent.
Two important things to consider when creating host-based striping
are the number of disks to configure in a stripe set and an appropriate
stripe size. While no definitive answer can be given that optimizes
these settings for any given configuration, the following are general
guidelines to use when creating host-based stripes:
◆ Ensure that the stripe size used is a power of two multiple of the
track size configured on the Symmetrix DMX (that is, a multiple
of 32 KB on DMX-2 and 64 KB on DMX-3), the database, and host
I/Os. Alignment of database blocks, Symmetrix tracks, host I/O
size, and the stripe size can have considerable impact on database
performance. Typical stripe sizes are 64 KB to 256 KB, although
the stripe size can be as high as 512 KB or even 1 MB.
◆ Multiples of 4 physical devices for the stripe width are generally
recommended, although this may be increased to 8 or 16 as
required for LUN presentation or SAN configuration restrictions
as needed. Care should be taken with RAID 5 metavolumes to
ensure that members do not end up on the same physical
spindles (a phenomenon known as vertical striping), as this may
affect performance. In general, RAID 5 metavolumes are not
recommended.
◆ When configuring in an SRDF environment, smaller stripe sizes
(32 KB for example), particularly for the redo logs, are
recommended. This is to enhance performance in synchronous
SRDF environments due to the limit of having only one
outstanding I/O per hypervolume on the link.
◆ Data alignment (along block boundaries) can play a significant
role in performance, particularly in Windows environments.
Refer to operating-system-specific documentation to learn how to
align data blocks from the host along Symmetrix DMX track
boundaries.
◆ Ensure that volumes used in the same stripe set are located on
different physical spindles. Using volumes from the same
physicals reduces the performance benefits of using striping. An
exception to this rule is when RAID 5 devices are used in DSS
environments.
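The power-of-two alignment guideline above can be checked mechanically; a small sketch:

```python
# Alignment check for the guidelines above: a host stripe size should
# be a power-of-two multiple of the Symmetrix track size (32 KB on
# DMX-2, 64 KB on DMX-3).

def stripe_aligned(stripe_kb, track_kb):
    if stripe_kb % track_kb:
        return False
    ratio = stripe_kb // track_kb
    # a power of two has exactly one bit set
    return ratio > 0 and (ratio & (ratio - 1)) == 0

print(stripe_aligned(64, 32))   # True  (64 KB stripe on DMX-2)
print(stripe_aligned(256, 64))  # True  (256 KB stripe on DMX-3)
print(stripe_aligned(96, 32))   # False (3x the track size)
```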
Striping recommendations
Determining the appropriate striping method depends on many
factors. In general, striping is a tradeoff between manageability and
performance. With host-based striping, CPU cycles are used to
manage the stripes; Symmetrix metavolumes require no host cycles
to stripe the data. This small performance decrease in host-based
striping is offset by the fact that each device in a striped volume
group maintains an I/O queue, thereby increasing performance over
a Symmetrix metavolume, which only has a single I/O queue on the
host.
Recent tests show that striping at the host level provides somewhat
better performance than comparable Symmetrix-based striping, and
is generally recommended if performance is paramount. Host-based
striping is also recommended with environments using synchronous
SRDF, since stripe sizes in the host can be tuned to smaller increments
than are currently available with Symmetrix metavolumes, thereby
increasing performance.
Management considerations generally favor Symmetrix-based
metavolumes over host-based stripes. In many environments,
customers have achieved high-performance back-end layouts on the
Symmetrix system by allocating all of the storage as four-way striped
metavolumes. The advantage of this is any volume selected for host
data is always striped, with reduced chances for contention on any
given physical spindle. Storage subsequently added to a host volume
group is also striped, since it too is configured as a metavolume.
Management of added storage to an
existing volume group using host-based striping may be significantly
more difficult, requiring in some cases a full backup, reconfiguration
of the volume group, and restore of the data to successfully expand
the stripe.
An alternative in Oracle environments gaining popularity recently is
the combined use of both host-based and array-based striping.
Known as double striping or a plaid, this configuration utilizes
striped metavolumes in the Symmetrix array, which are then
presented to a volume group and striped at the host level. This has
many advantages in database environments where read access is
small and highly random in nature. Since I/O patterns are pseudo
random, access to data is spread across a large quantity of physical
spindles, thereby decreasing the probability of contention on any
given disk. Double striping, in some cases, can interfere with data
prefetching at the Symmetrix DMX level when large, sequential data
reads are predominant. This configuration may be inappropriate for
DSS workloads.
Another method of double striping the data is through the use of
Symmetrix metavolumes and RAID 5. A RAID 5 hypervolume stripes
data across either four or eight physical disks using a stripe size of
four tracks (128 KB for DMX-2 or 256 KB for DMX-3). Striped
metavolumes stripe data across two or more hypers using a stripe
size of two cylinders (960 KB in DMX-2 or 1920 KB in DMX-3). When
using striped metavolumes with RAID 5 devices, ensure that
members do not end up on the same physical spindles, as this will
adversely affect performance. In many cases however, double
striping using this method also may affect prefetching for long,
sequential reads. As such, using striped metavolumes is generally not
recommended in DSS environments. Instead, if metavolumes are
needed for LUN presentation reasons, concatenated metavolumes on
the same physical spindles are recommended.
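The stripe sizes quoted above are consistent with a 15-tracks-per-cylinder hypervolume geometry; that track count is an assumption inferred from the numbers in the text:

```python
# Checking the stripe-size figures quoted above, assuming 15 tracks
# per cylinder: a RAID 5 stripe is four tracks, and a striped-
# metavolume stripe is two cylinders.

TRACKS_PER_CYLINDER = 15  # assumed DMX hypervolume geometry

def raid5_stripe_kb(track_kb):
    return 4 * track_kb  # four-track RAID 5 stripe

def meta_stripe_kb(track_kb):
    return 2 * TRACKS_PER_CYLINDER * track_kb  # two-cylinder meta stripe

print(raid5_stripe_kb(32), raid5_stripe_kb(64))  # 128 256  (DMX-2, DMX-3)
print(meta_stripe_kb(32), meta_stripe_kb(64))    # 960 1920 (DMX-2, DMX-3)
```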
The decision of whether to use host-based, array-based, or double
striping in a storage environment has elicited considerable fervor on
all sides of the argument. While each configuration has positive and
negative factors, the important thing is to ensure that some form of
striping is used for the storage layout. The appropriate layer for disk
striping can have a significant impact on the overall performance and
manageability of the database system. Deciding which form of
striping to use depends on the specific nature and requirements of the
database environment in which it is configured.
With the advent of RAID 5 data protection in the Symmetrix DMX, an
additional option of triple striping data using RAID 5, host-based
striping, and metavolumes combined is now available. However,
triple striping increases data layout complexity, and in testing has
shown no performance benefits over other forms of striping. In fact, it
is shown to be detrimental to performance and as such, is not
recommended in any Symmetrix DMX configuration.
◆ Rotational Speed - Delay occurs because the platter must rotate
underneath the head to position the data to be accessed.
Rotational speeds for spindles in the Symmetrix DMX range from
7,200 to 15,000 rpm. The average rotational delay is the time it
takes for half of a revolution of the disk; for a 15,000 rpm drive,
this is about 2 milliseconds.
◆ Interface Speed - A measure of the transfer rate from the drive
into the Symmetrix cache. It is important to ensure that the
transfer rate between the drive and cache is greater than the rate
at which the drive delivers data. The delay this introduces is
typically very small, on the order of a fraction of a millisecond.
◆ Areal Density - A measure of the number of bits of data that fit on
a given surface area of the disk. The greater the density, the more
data per second can be read from the disk as it passes under the
disk head.
◆ Cache Capacity and Algorithms - Newer disk drives have
improved read and write algorithms, as well as cache, in order to
improve the transfer of data in and out of the drive and to make
parity calculations for RAID 5.
Delay caused by the movement of the disk head across the platter
surface is called seek time. The time associated with a data track
rotating to the required location under the disk head is referred to as
rotational delay. The cache capacity on the drive, disk algorithms,
interface speed, and areal density (or zoned bit recording) combine
to determine the disk transfer time. Therefore, the time taken to
complete an I/O (or disk latency) consists of three elements: seek
time, rotational delay, and transfer time.
Data transfer times are typically on the order of fractions of a
millisecond and as such, rotational delays and delays due to
repositioning the actuator heads are the primary sources of latency on
a physical spindle. Additionally, rotational speeds of disk drives have
increased from top speeds of 7,200 rpm up to 15,000 rpm, but still
average on the order of a few milliseconds. The seek time continues
to be the largest source of latency in disk assemblies when using the
entire disk.
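The latency arithmetic above can be sketched directly. Only the rotational delay follows from the rpm; the seek and transfer figures in the example are illustrative assumptions:

```python
def avg_rotational_delay_ms(rpm):
    # average rotational delay = time for half a revolution
    return 0.5 * 60_000.0 / rpm

def disk_latency_ms(seek_ms, rpm, transfer_ms):
    # total I/O latency = seek time + rotational delay + transfer time
    return seek_ms + avg_rotational_delay_ms(rpm) + transfer_ms

print(avg_rotational_delay_ms(15_000))    # 2.0 ms, as stated in the text
print(disk_latency_ms(3.5, 15_000, 0.4))  # assumed 3.5 ms seek, 0.4 ms transfer
```

As the numbers show, seek time and rotational delay dominate the fractional-millisecond transfer time.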
Transfer delays are lengthened in the inner parts of the drive; more
data can be read per second on the outer parts of the drive than by
data located on the inner regions. Therefore, performance is
significantly better on the outer parts of the disk; improvements of
more than 50 percent can be realized on the outer cylinders of a
physical spindle. This performance differential typically leads
customers to place high-I/O objects on the outer portions of the
drive.
While placing high I/O objects such as redo logs on the outer edges
of the spindles has merit, performance differences across the drives
inside the Symmetrix DMX are significantly smaller than the
stand-alone disk characteristics would attest. Enginuity operating
(Figure: factors in physical disk performance include actuator positioning, cache and algorithms, and interface speed.)
Hypervolume contention
Disk drives can receive only a limited number of read or write I/Os
before performance degradation occurs. While disk improvements
and cache, both on the physical drives and in disk arrays, have
improved disk read and write performance, the physical devices can
still become a critical bottleneck in Oracle database environments.
Eliminating contention on the physical spindles is a key factor in
ensuring maximum Oracle performance on Symmetrix DMX arrays.
Contention can occur on a physical spindle when I/O (read or write)
to one or more hypervolumes exceeds the I/O capacity of the disk.
While contention on a physical spindle is undesirable, this type of
contention can be rectified by migrating high-I/O data onto other
devices with lower utilization. BCVs can either share physical
spindles with the production volumes or be isolated on separate
physical disks. There are pros and cons to each of these solutions;
the optimal solution generally depends on the anticipated workload.
The primary benefit of spreading BCVs across all physical spindles is
performance. Spreading I/Os across more spindles reduces the risk
of bottlenecks on the physical disks. Workloads that use BCVs, such
as backups and reporting databases, may generate high I/O rates.
Spreading this workload across more physical spindles may
significantly improve performance in these environments.
The main drawbacks to spreading BCVs across all spindles in the
Symmetrix system are:
◆ Synchronization may cause spindle contention during
resynchronization.
◆ BCV workloads may negatively impact production database
performance.
When resynchronizing the BCVs, data is read from the production
hypers and copied into cache. From there it is destaged to the BCVs.
When the physical disks share production and BCVs, the
synchronization rates can be greatly reduced because of increased
seek times due to the conflict between reading from one part of the
disk and writing to another. The other drawback to sharing physical
disks is the increased workload on the spindles that may impact
performance on the production volumes. Sharing the spindles
increases the chance that contention may arise, decreasing database
performance.
Determining the appropriate location for BCVs (either sharing the
same physical spindles or isolated on their own disks) depends on
customer preference and workload. In general, BCVs should share
the same physical spindles. However, in cases where the BCV
synchronization and utilization may negatively impact applications
(for example, databases that run 24x7 with high I/O requirements), it
may be beneficial for the BCVs to be isolated on their own physical
disks.
DB_BLOCK_BUFFERS: Specifies the number of data "pages" available in
host memory for data pulled from disk. Typically, the more block
buffers available in memory, the better the potential performance of
the database.

DB_BLOCK_SIZE: Determines the size of the data pages Oracle stores in
memory and on disk. For DSS applications, larger block sizes such as
16 KB (or 32 KB where available) improve data throughput, while for
OLTP applications a 4 KB or 8 KB block size may be more appropriate.

DB_FILE_MULTIBLOCK_READ_COUNT: Specifies the maximum number of blocks
that can be read in a single sequential read I/O. For OLTP
environments, this parameter should be set to a low value (4 or 8,
for example). For DSS environments, where long sequential data scans
are normal, it should be increased to match the maximum host I/O size
(or more) to optimize throughput.

DB_WRITER_PROCESSES: Specifies the number of DBWR processes initially
started for the database. Increasing the number of DBWR processes can
improve writes to disk through multiplexing if multiple CPUs are
available in the host.

DBWR_IO_SLAVES: Configures multiple I/O server processes for the DBW0
process. This parameter is only used on single-CPU servers where only
a single DBWR process is enabled. Configuring I/O slaves can improve
write performance to disk by multiplexing the writes.

DISK_ASYNCH_IO: Controls whether I/O to Oracle structures such as
data files, log files, and control files is performed asynchronously.
If asynchronous I/O is available on the host platform, asynchronous
I/O to the datafiles has a positive effect on I/O performance.

LOG_BUFFER: Specifies the size of the redo log buffer. Increasing the
size of this buffer can decrease the frequency of required writes to
disk.

LOG_CHECKPOINT_INTERVAL: Specifies the number of redo log blocks that
can be written before a checkpoint must be performed. This affects
performance, since a checkpoint requires that data be written to disk
to ensure consistency. Frequent checkpoints reduce the amount of
recovery needed if a crash occurs but can also be detrimental to
Oracle performance.

SORT_AREA_SIZE: Specifies the maximum amount of memory that Oracle
will use to perform sort operations. Increasing this parameter
decreases the likelihood that a sort will be performed in a temporary
tablespace on disk. However, it also increases the memory
requirements on the host.
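As a hedged illustration, these parameters might appear together in an init.ora fragment as follows; the values are placeholders for a DSS-leaning workload, not recommendations, and should be tuned to the specific environment:

```
# Illustrative init.ora fragment (placeholder values only; tune to the workload)
db_block_buffers = 8192                # data "pages" cached in host memory
db_block_size = 16384                  # 16 KB blocks suit DSS-style throughput
db_file_multiblock_read_count = 32     # raised for long sequential scans
db_writer_processes = 4                # multiple DBWR processes on a multi-CPU host
# dbwr_io_slaves = 4                   # single-CPU alternative to multiple DBWRs
disk_asynch_io = TRUE                  # use async I/O where the platform supports it
log_buffer = 1048576                   # larger redo buffer, fewer forced writes
log_checkpoint_interval = 10000        # redo blocks between checkpoints
sort_area_size = 1048576               # in-memory sort space per session
```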
◆ How many data files for the database will be created? Which have
the highest I/O activity?
◆ What are the availability requirements for the database?
◆ Will a cluster be deployed? How many nodes? Single instance?
RAC?
◆ How many data paths are required from the host to the storage
array? Will multipathing software be used?
◆ How will the host connect to the storage array (direct attach,
SAN)?
◆ Which is more important: IOPS or throughput? How much of
each are anticipated?
◆ What kind of database is planned (DSS, OLTP, a combination of
the two)?
◆ What types of I/Os are anticipated from the database (long
sequential reads, small bursts of write activity, a mix of reads and
writes)?
◆ How will backups be handled? Will replication (host or storage
based) be used?
Answers to these questions determine the configuration and layout
of the proposed database. The key to the layout process is a complete
understanding of the characteristics and requirements of the database
to be implemented. Of particular importance when planning a database
layout are the I/O characteristics of the various database objects.
This information is collected and documented, and constitutes the key
deliverable for the next phase of the database layout on a Symmetrix
project.
In some cases, the databases to be deployed already exist in a
production environment. In these cases, it is easy to understand the
I/O characteristics of the various underlying database structures
(tablespaces, data files, tables, and so on). Various tools for gathering
performance statistics include Oracle StatsPack, EMC Performance
Manager, host-based utilities including sar and iostat, and third-party
analyzers (such as Quest Central). Performance statistics such as
reads and writes are determined for database objects. These statistics
are then used to determine the required number of physical spindles,
the number of I/O paths between the host and the storage array, and
the Symmetrix configuration.
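For instance, a back-of-the-envelope spindle count can be derived from the collected statistics. The per-disk IOPS figure and the RAID 5 write penalty below are illustrative assumptions, not Symmetrix specifications:

```python
import math

def spindles_needed(read_iops, write_iops, per_disk_iops=180, raid5_write_penalty=4):
    """Rough spindle sizing: RAID 5 turns each host write into ~4 back-end I/Os."""
    backend_iops = read_iops + write_iops * raid5_write_penalty
    return math.ceil(backend_iops / per_disk_iops)

# a measured workload of 3,600 host reads/s and 400 host writes/s
print(spindles_needed(3600, 400))  # 29
```

The same arithmetic extends naturally to sizing the number of front-end paths once the anticipated throughput per path is known.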
Implementation
The implementation phase takes the planned database layout from
the preceding step and implements it into the customer's
environment. The host is presented with the documented storage.
Volume groups and file systems are created as required and database
elements are initialized as planned. This phase of the process is
normally short and relatively straightforward to complete if the
prior steps were performed and documented well.
Data Protection
(Sample output of the symchksum list command, titled "DEVICES WITH CHECKSUM EXTENTS," showing for each device its device name, Symmetrix device number, number of extents, block size, device type, and the action-check flags.)
Note: When FF or power down occurs, extents are lost. Run the symchksum
enable command again.
Disabling checksum
The symchksum disable command understands the Oracle
database structure. The feature can be disabled for tablespaces,
control files, redo logs, or the entire database.
The symchksum disable command also is used on a device basis.
This capability is not normally used, but is provided in the event the
tablespace was dropped before EMC Double Checksum was disabled
for that object.
When the disable action is specified for a Symmetrix device, the
-force flag is required. Disabling extents in this way can cause a
mapped tablespace or database to be only partially protected,
therefore, use this option with caution. All the extents monitored for
checksum errors on the specified Symmetrix device will be disabled.
Why generic?
Generic SafeWrite is deemed generic because the checks performed to
ensure complete data are application independent. For instance,
Generic SafeWrite will not perform any Oracle- or Exchange-specific
checksums to verify data integrity. It is important to note that for
Oracle, EMC Double Checksum for Oracle provides a rich set of
checks which can be natively performed by the Symmetrix array. For
more information on EMC Double Checksum for Oracle, consult
“Implementing EMC Double Checksum for Oracle” on page 342.
Note: It is always a best practice to separate the location of database files and
log files for a given database onto unique devices. There are cases, however,
where the datafile and log file may share the same device. In this case, it is
still possible to have GSW enabled; however, there will be a performance
impact to the log writes that may impact application performance.
Performance considerations
Performance testing was done with Microsoft Exchange, Microsoft
SQL Server and Oracle on standard devices, and in the case of
Microsoft Exchange, also on SRDF/S and SRDF/A devices. For the
Microsoft SQL Server and Oracle performance tests, a TPCC
To enable Checksum on the extents of all the devices that define the
current database instance and then to phone home on error, enter:
symchksum enable -type Oracle -phone_home
To enable Checksum on the extents of all the devices that define the
tablespace and then to log on error, enter:
symchksum enable -type Oracle -tbs SYSTEM
Overview
The EMC Symmetrix VMAX series with Enginuity is the newest
addition to the Symmetrix product family. Built on the strategy of
simple, intelligent, modular storage, it incorporates a new scalable
Virtual Matrix interconnect that connects all shared resources across
all VMAX Engines, allowing the storage array to grow seamlessly
and cost-effectively from an entry-level configuration into the
world’s largest storage system. The Symmetrix VMAX provides
improved performance and scalability for demanding enterprise
storage environments while maintaining support for EMC’s broad
portfolio of platform software offerings.
EMC Symmetrix VMAX delivers enhanced capability and flexibility
for deploying Oracle databases throughout the entire range of
business applications, from mission-critical applications to test and
development. In order to support this wide range of performance
and reliability at minimum cost, Symmetrix VMAX arrays support
multiple drive technologies that include Enterprise Flash Drives
(EFDs), Fibre Channel (FC) drives, both 10k rpm and 15k rpm, and
7,200 rpm SATA drives. In addition, various RAID protection
mechanisms are allowed that affect the performance, availability, and
economic impact of a given Oracle system deployed on a Symmetrix
VMAX array.
As companies increase deployment of multiple drive and protection
types in their high-end storage arrays, storage and database
administrators are challenged to select the correct storage
configuration for each application. Often, a single storage tier is
selected for all data in a given database, effectively placing both
active and idle data portions on fast FC drives. This approach is
expensive and inefficient, because infrequently accessed data will
reside unnecessarily on high-performance drives.
Alternatively, making use of high-density low-cost SATA drives for
the less active data, FC drives for the medium active data, and EFDs
for the very active data enables efficient use of storage resources, and
reduces overall cost and the number of drives necessary. This, in turn,
also helps to reduce energy requirements and floor space, allowing
the business to grow more rapidly.
Database systems, due to the nature of the applications that they
service, tend to direct the most significant workloads to a relatively
small subset of the data stored within the database and the rest of the
database is less frequently accessed. The imbalance of I/O load
Storage Tiering—Virtual LUN and FAST
number of drives, and improve the total cost of ownership (TCO) and
ROI. FAST VP enables users to achieve these objectives while
simplifying storage management.
This chapter describes Symmetrix Virtual Provisioning, a tiered
storage architecture approach for Oracle databases, and the way in
which devices can be moved nondisruptively, using Virtual LUN, FAST
(for traditional thick devices), or FAST VP (for virtually
provisioned devices), in order to put the right data on the right
storage tier at the right time.
Introduction
Symmetrix Virtual Provisioning, the Symmetrix implementation of
what is commonly known in the industry as “thin provisioning,”
enables users to simplify storage management and increase capacity
utilization by sharing storage among multiple applications and only
allocating storage as needed from a shared “virtual pool” of physical
disks.
Symmetrix thin devices are logical devices that can be used in many
of the same ways that Symmetrix standard devices have traditionally
been used. Unlike traditional Symmetrix devices, thin devices do not
need to have physical storage preallocated at the time the device is
created and presented to a host (although in many cases customers
interested only in wide striping and ease of management choose to
fully preallocate the thin devices). A thin device is not usable until it
has been bound to a shared storage pool known as a thin pool.
Multiple thin devices may be bound to any given thin pool. The thin
pool is comprised of devices called data devices that provide the
actual physical storage to support the thin device allocations.
When a write is performed to a part of any thin device for which
physical storage has not yet been allocated, the Symmetrix allocates
physical storage from the thin pool for that portion of the thin device
only. The Symmetrix operating environment, Enginuity, satisfies the
requirement by providing a block of storage from the thin pool called
a thin device extent. This approach reduces the amount of storage
that is actually consumed.
The minimum amount of physical storage that can be reserved at a
time for the dedicated use of a thin device is referred to as a data
device extent. The data device extent is allocated from any one of the
data devices in the associated thin pool. Figure 73 shows an
example: thin devices associated with thin pool A and three thin
devices associated with thin pool B, with the data extents for the
thin devices distributed across the various data devices in each
pool.
(Figure 73: thin devices bound to thin pools A and B, with their data extents spread across each pool's data devices.)
The way thin extents are allocated across the data devices results in a
form of striping in the thin pool. The more data devices in the thin
pool (and the associated physical drives behind them), the wider
striping will be, creating an even I/O distribution across the thin
pool. Wide striping simplifies storage management by reducing the
time required for planning and execution of data layout.
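A minimal sketch of this effect follows; the round-robin placement is an illustrative simplification, not Enginuity's actual allocator:

```python
from collections import Counter

def allocate_extents(n_extents, data_devices):
    """Place thin device extents round-robin across the pool's data devices."""
    return Counter(data_devices[i % len(data_devices)] for i in range(n_extents))

# 120 thin device extents over 8 data devices land evenly: 15 extents per device
counts = allocate_extents(120, [f"data_dev_{i}" for i in range(8)])
print(set(counts.values()))  # {15}
```

Adding more data devices to the pool widens the stripe the same way: the extents simply spread across more devices (and the physical drives behind them).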
(metadata) to each initialized block. This will cause the thin pool to
allocate the amount of space that is being initialized by the database.
As database files are added, more space will be allocated in the pool.
Due to Oracle file initialization, and in order to get the most benefit
from a Virtual Provisioning infrastructure, a strategy for sizing files,
pools, and devices should be developed in accordance with
application and storage management needs. Some strategy options
are explained next.
Oversubscription
An oversubscription strategy is based on using thin devices with a
total capacity greater than the physical storage in the pools that they
are bound to. This can increase capacity utilization by sharing storage
among applications, thereby reducing the amount of allocated but
unused space. The thin devices each seem to be a full-size device to
the application, while in fact the thin pool cannot accommodate the
total LUNs’ capacity. Since Oracle database files initialize their space
even though they are still empty, it is recommended that instead of
creating very large data files that remain largely empty for most of
their lifetime, smaller data files should be considered to
accommodate near-term data growth. As they fill up over time, their
size can be increased, or more data files added, in conjunction with
the capacity increase of the thin pool. The Oracle auto-extend feature
can be used for simplicity of management, or DBAs may prefer to use
manual file size management or addition.
An oversubscription strategy is recommended for database
environments when database growth is controlled, and thin pools can
be actively monitored and their size increased when necessary in a
timely manner.
Undersubscription
An undersubscription strategy is based on using thin devices with a
total capacity smaller than the physical storage in the pools that they
are bound to. This approach does not necessarily improve storage
capacity utilization but still makes use of wide striping, thin pool
sharing, and other benefits of Virtual Provisioning. In this case the
data files can be sized to make immediate use of the full thin device
size, or alternatively, auto-extend or manual file management can be
used.
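The distinction between the two strategies can be expressed as a simple ratio; this is a sketch, and real monitoring would also track actual pool allocation over time:

```python
def subscription_ratio(thin_device_gb, pool_gb):
    """Ratio > 1 means oversubscribed: presented capacity exceeds pool capacity."""
    return sum(thin_device_gb) / pool_gb

print(subscription_ratio([500, 500, 500], 1000))  # 1.5 -> oversubscribed
print(subscription_ratio([200, 200], 1000))       # 0.4 -> undersubscribed
```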
them into one pool, the pool has eight RAID 5 devices of four drives
each. If one of the drives in this pool fails, you are not losing one
drive from a pool of 32 drives; rather, you are losing one drive from
one of the eight RAID-protected data devices and that RAID group
can continue to service read and write requests, in degraded mode,
without data loss. Also, as with any RAID group, with a failed drive
Enginuity will immediately invoke a hot sparing operation to restore
the RAID group to its normal state. While this RAID group is
rebuilding, any of the other RAID groups in the thin pool can have a
drive failure and there is still no loss of data. In this example, with
eight RAID groups in the pool there can be one failed drive in each
RAID group in the pool without data loss. In this manner data stored
in the thin pool is no more vulnerable to data loss than any other data
stored on similarly configured RAID devices. Therefore, a protection
of RAID 1 or RAID 5 for thin pools is acceptable for most applications
and RAID 6 is only required in situations where additional parity
protection is warranted.
The number of thin pools is affected by a few factors. The first is the
choice of drive type and RAID protection. Each thin pool is a group of
data devices sharing the same drive type and RAID protection. For
example, a thin pool that consists of multiple RAID 5 protected data
devices based on 15k rpm FC disks can host the Oracle data files,
providing a good balance of capacity and performance. However, very
often the redo logs that take relatively small capacity are best
protected using RAID 1 and therefore another thin pool containing
RAID 1 protected data devices can be used. In order to ensure
sufficient spindles behind the redo logs the same set of physical
drives that is used for the RAID 5 pool can also be used for the RAID
1 thin pool. Such sharing at the physical drive level, but separation at
the thin pool level, allows efficient use of drive capacity without
compromising on the RAID protection choice. Oracle Fast Recovery
Area (FRA), for example, can be placed in a RAID 6 protected SATA
drive’s thin pool.
Therefore the choice of the appropriate drive technology and RAID
protection is the first factor in determining the number of thin pools.
The other factor has to do with the business owners. When
applications share thin pools they are bound to the same set of data
devices and spindles, and they share the same overall thin pool
capacity and performance. If business owners require their own
control over thin pool management they will likely need a separate
set of thin pools based on their needs. In general, however, for ease of
requirements of the +GRID ASM disk group are tiny, very small
devices can be provisioned (High redundancy implies three
copies/mirrors and therefore a minimum of three devices is
required).
◆ +DATA, +LOG: While separating data and log files to two
different ASM disk groups is optional, EMC recommends it in the
following cases:
• When TimeFinder is used to create a clone (or snap) that is a
valid backup image of the database. The TimeFinder/Clone
image can serve as a source for RMAN backup to tape, and/or
can be opened for reporting (read-only), and so on. However
the importance of such a clone image is that it is a valid full
backup image of the database. If the database requires media
recovery, restoring the TimeFinder/Clone back to production
takes only seconds, regardless of the database size. This is a
huge saving in RTO: within a few seconds, archive logs can start
being applied as part of the media recovery roll forward. When such
a clone does not exist, the initial backup
set has to be first restored from tape/VTL prior to applying
any archive log, which can add a significant amount of time to
recovery operations. Therefore, when TimeFinder is used to
create a backup image of the database, in order for the restore
to not overwrite the online logs, they should be placed in
separate devices and a separate ASM disk group.
• Another reason for separation of data from log files is
performance and availability. Redo log writes are synchronous
and must complete in the least amount of time. By placing them
on separate storage devices, the commit writes do not have to
share the LUN I/O queue with large asynchronous buffer cache
checkpoint I/Os. Having the logs on their own devices also makes
it possible to use one RAID protection for the data files (such as
RAID 5) and another for the logs (such as RAID 1).
◆ +TEMP: When storage replication technology is used for disaster
recovery, like SRDF/S, it is possible to save bandwidth by not
replicating temp files. Since temp files are not part of a recovery
operation and quick to add, having them on separate devices
allows bandwidth saving, but adds to the operations of bringing
up the database after failover. While it is not required to separate
temp files, it is an option and the DBA may choose to do it
anyway for performance isolation reasons if that is their best
practice.
◆ +FRA: Fast Recovery Area typically hosts the archive logs and
sometimes flashback logs and backup sets. Since the I/O
operations to FRA are typically sequential writes, it is usually
sufficient to have it located on a lower tier such as SATA drives. It
is also an Oracle recommendation to have FRA as a separate disk
group from the rest of the database to avoid keeping the database
files and archive logs or backup sets (that protect them) together.
(Figure: the ‘+Sales’ ASM disk group with 20 x 50 GB ASM members.)
Figure 75 Migration of ASM members from FC to EFDs using Enhanced Virtual LUN
technology
The target devices for the migration can be chosen from configured
space or new devices can be automatically configured by migrating
to unconfigured space.
Steps:
1. Migrating device 790 from a RAID 1 (FC) to RAID 5 (3+1) on EFD
configured as 1FD7.
Steps:
1. Migrating device 790 from a RAID 1 (FC) to RAID 5 (EFD) pool.
2. Configuration lock is taken.
3. The RAID 5 mirror is created from unconfigured space and added
as the secondary mirror.
4. Configuration lock is released.
5. The secondary mirror is synchronized from the primary mirror.
6. Once synchronization is done, the configuration lock is taken
again.
7. Primary and secondary roles are switched and the original
primary mirror is detached from the source and moved to the
target device 1FD7.
8. The original primary mirror on RAID 1 (FC) is deleted.
9. Configuration lock is released.
FAST VP Elements
FAST VP has three main elements—storage tiers, storage groups, and
FAST policies—as shown in Figure 78.
FAST VP architecture
There are two components of FAST VP: Symmetrix microcode and
the FAST controller.
The file system will traditionally host multiple data files, each
containing database objects in which some will tend to be more active
than others as discussed earlier, creating I/O access skewing at a
sub-LUN level.
Since such events are usually short term and touch each dataset
only once, it is unlikely (and not desirable) for FAST VP to migrate
data at the same time; it is best to simply let the storage handle
the workload appropriately. If the event is expected to last a
longer period of time (such as hours or days), then FAST VP, being a
reactive mechanism, will actively optimize the storage allocation as
it does natively.
Test environment
This section describes the hardware, software, and database
configuration used for Oracle databases and FAST VP test cases as
seen in Table 14.
(Table 14 highlights: Enginuity 5875; EFD tier of 8 x 400 GB drives.)
Table 15 Initial tier allocation for test cases with shared ASM disk group
Databases: FINDB & HRDB. ASM disk group: +DATA. Thin devices: 12 x 100 GB. Storage group: DATA_SG. RAID protection: RAID 5. Thin pools (tier, initial allocation): FC_Pool (FC, 100%), EFD_Pool (EFD, 0%), SATA_Pool (SATA, 0%).
One server was used for this test. Each of the Oracle databases was
identical in size (about 600 GB) and designed for an
industry-standard OLTP workload. However, during this test one
database had high activity whereas the other database remained idle
to provide a simple example of the behavior of FAST VP.
Initial allocation for FINDB (600 GB) and HRDB (600 GB) on +DATA/DATA_SG: EFD 0% (0), FC 100% (1.2 TB), SATA 0% (0).
Figure 83 Storage tier allocation changes during the FAST VP test for FINDB
Table 19 FAST VP enabled database response time from the AWR report
Test Case 2: Oracle databases sharing the ASM disk group and FAST policy
Oracle ASM makes it easy to provision and share devices across
multiple databases. The databases, running different workloads, can
share the ASM disk group for ease of manageability and
provisioning. Multiple databases can share the Symmetrix thin pools
for ease of provisioning, wide striping, and manageability at the
storage level as well. This section describes the test case in which a
FAST VP policy is applied to the storage group associated with the
shared ASM disk group. At the end of the run we can see improved
transaction rates and response times of both databases, and very
efficient usage of the available tiers.
Initial allocation for FINDB (600 GB) and HRDB (600 GB) on +DATA (1.2 TB): EFD 0% (0), FC 100% (1.2 TB), SATA 0% (0).
Figure 85 Storage tier changes during FAST VP enabled run on two databases
Test Case 3: Oracle databases on separate ASM disk groups and FAST policies
Not all the databases have the same I/O profile or SLA requirements
and may also warrant different data protection policies. By deploying
the databases with different profiles on separate ASM disk groups,
administrators can achieve the desired I/O performance and ease of
manageability. On the storage side these ASM disk groups will be on
separate storage groups to allow for definition of FAST VP policies
appropriate for the desired performance. This section describes a use
case with two Oracle databases with different I/O profiles on
separate ASM disk groups and independent FAST policies.
The hardware configuration of this test was the same as the previous
two use cases (as shown in Table 14). This test configuration had two
Oracle databases—CRMDB (CRM) and SUPCHDB (Supply
Chain)—on separate ASM disk groups, storage groups, and FAST VP
policies, as shown in Table 23.
Table 23 Initial tier allocation for a test case with independent ASM disk groups
The Symmetrix VMAX array had a mix of storage tiers: EFD, FC, and
SATA. One server was used for this test. Each of the Oracle databases
was identical in size (about 600 GB) and designed for an
industry-standard OLTP workload.
The Oracle databases CRMDB and SUPCHDB used independent
ASM disk groups based on thin devices that were initially bound to
FC_Pool (FC tier).
The CRMDB database in this configuration was part of a customer
relationship management system that was critical to the business. To
achieve higher performance, the FAST VP policy “GoldPolicy” was
defined to make use of all three available storage tiers, and the
storage group OraDevices_C1 was associated with the policy.
The SUPCHDB database was important to the business and had
adequate performance characteristics; the business would benefit if
that performance level could be maintained at lower cost. To meet this
goal, the FAST VP policy “SilverPolicy” was defined to make use of
only the FC and SATA tiers, and the storage group OraDevices_S1
was associated with the policy.
(AWR table columns for CRMDB and SUPCHDB: Event, Waits, Time (s), Avg wait (s), %DB time, Wait class)
For SUPCHDB, our goal was to lower the cost while maintaining or
improving the performance. The FAST VP Silver policy was defined
to allocate the extents across FC and SATA drives to achieve this goal.
The Silver policy allows a maximum of 50 percent allocation on the
FC tier and up to 100 percent allocation on the SATA tier.
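To make the cap arithmetic concrete, here is a minimal shell sketch that checks a hypothetical allocation against the Silver policy caps; the allocation percentages are illustrative assumptions, not measured values from the test.

```shell
# Hypothetical per-tier allocation (percent of storage group capacity)
# after a FAST VP rebalance; illustrative values only.
fc_alloc=45
sata_alloc=55
# SilverPolicy caps from the text: max 50% on FC, up to 100% on SATA.
fc_max=50
sata_max=100
compliant=yes
[ "$fc_alloc" -le "$fc_max" ] || compliant=no
[ "$sata_alloc" -le "$sata_max" ] || compliant=no
echo "$compliant"
```

A storage group whose FC allocation crept above 50 percent would be out of compliance, and the FAST VP compliance algorithm would move extents down to the SATA tier.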
Running the database workload after enabling the FAST VP policy
The database workload was repeated after enabling the FAST VP
policy. FAST VP collected statistics, analyzed them, and performed
the extent movements following the performance and compliance
algorithms. The AWR reports for both databases were generated to
review the I/O response times as shown in Table 25.
(Table 25: AWR table columns for CRMDB and SUPCHDB: Event, Waits, Time (s), Avg wait (s), %DB time, Wait class; chart ICO-IMG-000923)
Table 26 Storage tier allocation changes during the FAST VP-enabled run
Introduction
Businesses use multiple databases in environments that serve DSS
and OLTP application workloads. Even though multiple levels of
cache exist in the database I/O stack, including host cache, database
server cache, and Symmetrix cache, disk response time is at times
critical to application performance. Selecting the correct storage class
for the various database objects is a challenge, and a storage selection
that works in one situation may not be optimal in others. Jobs
executed at periodic intervals or on an ad hoc basis, such as
quarter-end batch jobs, demand a high degree of performance and
availability and make disk selection and data placement even more
challenging. As the size and number of databases grow, analyzing
the performance of the various databases, identifying bottlenecks,
and selecting the right storage tier for each of them turns into a
daunting task.
Introduced in the Enginuity 5874 Q4 2009 service release, EMC
Symmetrix VMAX Fully Automated Storage Tiering (FAST) is
Symmetrix software that utilizes intelligent algorithms to
continuously analyze device I/O activity and generate plans for
moving and swapping devices for the purposes of allocating or
re-allocating application data across different performance storage
tiers within a Symmetrix array. FAST proactively monitors workloads
at the Symmetrix device (LUN) level in order to identify “busy”
devices that would benefit from being moved to higher-performing
drives such as EFD. FAST will also identify less “busy” devices that
could be relocated to higher-capacity, more cost-effective storage such
as SATA drives without altering performance.
Time windows can be defined to specify when FAST should collect
performance statistics (upon which the analysis to determine the
appropriate storage tier for a device is based), and when FAST should
perform the configuration changes necessary to move devices
between storage tiers. Movement is based on user-defined storage
tiers and FAST Policies.
FAST configuration
FAST configuration involves three components:
◆ Storage Groups
A Storage Group is a logical grouping of Symmetrix devices.
Storage Groups are shared between FAST and Auto-provisioning
Groups; however, a Symmetrix device may only belong to one
Storage Group that is under FAST control. A Symmetrix VMAX
storage array supports up to 8,192 Storage Groups associated with
FAST Policies.
◆ Storage Tiers
Storage tiers are a combination of a drive technology (for
example, EFD, FC 15k rpm, or SATA) and a RAID protection type
(for example, RAID 1, RAID 5 (3+1), RAID 5 (7+1), or RAID 6 (6+2)).
There are two types of storage tiers: static and dynamic. A static
type contains explicitly specified Symmetrix device groups, while
a dynamic type will automatically contain all Symmetrix disk
ASM disk group of each database that was moved between the
storage tiers. The +REDO and +TEMP disk groups remained on 15k
rpm drives, and FRA on SATA drives.
The first database, DB1, started on FC 15k rpm drives but was
designed to simulate a low I/O activity database that has very few
users, low importance to the business, and is a candidate to move to a
lower storage tier, or “down-tier.” The DB1 database could be one
that was once active but is now being replaced by a new application.
The second database, DB2, was designed to simulate a medium active
database that was initially deployed on SATA drives, but its activity
level and importance to the business are increasing and it is a
candidate to be moved to a higher storage tier, or “up-tier.” The last
database, DB3, started on FC 15k rpm drives and was designed to
simulate the high I/O activity level of a mission-critical application
with many users and is a candidate to up-tier from FC 15k rpm to
EFD.
The test configuration details are provided in Table 27.
Each of the three databases was using the ASM disk group
configuration as shown in Table 28.
(Table 28 columns: ASM disk groups, Number of LUNs, Size (GB), Total (GB), RAID)
Table 29 shows the initial storage drive types and count behind each
of the +DATA ASM disk groups at the beginning of the tests. It also
shows the OLTP workload and potential business goals for each
database.
(Table 29 columns: Database, Number of physical drives, Drive type, Workload, Business goal)
Figure 77 on page 379 shows the logical FAST profile we used for
database 3, or DB3. In this case, while we have three drive types in
the Symmetrix VMAX—EFD, FC 15k rpm, and SATA drives—we do
not want DB3 to reside on SATA, so we could simply not include a
SATA tier in the policy. However, including it and setting the
allowable percentage to 0 percent has the same effect.
[Figure 77, ICO-IMG-000782: logical FAST profile matching storage classes to service level objectives. Tiers: Type 1, 400 GB EFD, RAID 5 (3+1); Type 2, 300 GB 15K FC, RAID 5 (3+1); Type 3, 1 TB SATA, RAID 5 (3+1); associated storage groups DB3_SG (policy DB3_FP, up to 100% on EFD and FC, 0% on SATA) and DB2_SG]
Table 30 Initial Oracle AWR report inspection (db file sequential read)
Based on these results we can see that DB1 is mainly busy waiting for
random read I/O (“db file sequential read” Oracle event refers to
host random I/O). A wait time of 5 ms is very good; however, this
[Figure ICO-IMG-000783: DB2 on SATA]
FAST control can contain the same devices. In Figure 90 we can see
how the devices of ASM disk group +DATA, of database 3 (DB3), are
placed into a Storage Group that can later be assigned a FAST Policy.
As shown in Figure 90, FAST configuration parameters are specified.
The user approval mode is chosen.
ICO-IMG-000784
Figure 92 shows provisioning the target storage tier for the FAST
policies.
ICO-IMG-000786
When creating FAST policies, the Storage Groups prepared earlier for
FAST control are assigned the storage tiers they can be allocated on,
and the capacity percentage the Storage Group is allowed on each of
them.
The last screen in the wizard is a summary and approval of the
changes. Additional modifications to FAST configuration and
settings can be done using Solutions Enabler or SMC directly, without
accessing the wizard again. Solutions Enabler uses the “symfast”
command line syntax, and SMC uses the FAST tab.
The following example shows how FAST can be used to migrate data
for DB3 to the appropriate storage tier. The DB3 Storage Group
properties box has three tabs—General, Devices, and FAST
Compliance. The Devices tab shows the 10 Symmetrix devices that
belong to the +DATA ASM disk group, contain the DB3 data files,
and comprise the DB3_SG Storage Group. The FAST
Compliance tab shows what tiers of storage this Storage Group may
reside in. In this case we have defined the FC storage tier as the place
where the drives are now and the EFD storage tier is where FAST
may choose to move this ASM disk group. Note that there is no
option for a SATA storage tier for the DB3 Storage Group. This will
prohibit FAST from ever recommending a down-tier of DB3 to SATA.
ICO-IMG-000787
The final step of the process is to associate the Storage Group with the
FAST tiers and define a policy to manage FAST behavior. In our case
we have one Storage Group (DB3_SG), two FAST tiers (EFD and FC),
and one FAST Policy (Figure 94 on page 416). The FAST Policy allows
for up to 100 percent of the Storage Group to reside on the Flash
storage tier and allows for 100 percent of DB3 to reside on FC. Since
there is no SATA storage tier defined for DB3, a third storage tier
option does not exist. By allowing up to 100 percent of the DB3
Storage Group to reside on EFD, we expected that if FAST was going
to move any DB3 LUNs to EFD it would move them all, because they
all have the same I/O profile and there is ample capacity available
on that storage tier to accommodate the full capacity of those ASM
disk group devices.
(Figures ICO-IMG-000791, ICO-IMG-000790)
(Table columns: Number of physical drives, Drive type, Avg. txn/min, % Change)
[Figure ICO-IMG-000789: DB2 on SATA, DB3 on EFD]
Conclusion
Symmetrix Virtual Provisioning offers great value to Oracle
environments with improved performance and ease of management
due to wide striping and higher capacity utilization. Oracle ASM and
Symmetrix Virtual Provisioning complement each other very well.
With a broad range of data protection mechanisms and tighter
integration between Symmetrix and Oracle now available even for
thin devices, adoption of Virtual Provisioning for Oracle
environments is very desirable.
With the Enginuity 5874 Q4 2009 service release enhancements made
to Virtual LUN migration and the introduction of FAST technology,
data center administrators are now able to dynamically manage data
placement in a Symmetrix array to maximize performance and
minimize costs. Introduced with Symmetrix Enginuity 5875 in Q1
2011, FAST VP improves storage utilization in Oracle environments
and optimizes database performance by making effective use of
multiple storage tiers at a lower overall cost of ownership when
using Symmetrix Thin Provisioning. In a multi-tiered Oracle
storage configuration, moving highly accessed volumes from FC
drives to EFDs can help administrators maintain or improve
performance and free up FC drives for other uses. Moving active
drives from SATA to FC drives improves performance and allows for
increased application activity. Moving lightly accessed volumes from
FC to SATA improves utilization and drives down cost. This volume-
or sub-LUN-level movement can be done nondisruptively on a
Symmetrix VMAX using Virtual LUN, FAST, and FAST VP
capabilities.
[Figure: SAN connectivity from RAC1_HBAs and RAC2_HBAs through the SAN to storage ports 07E:1 and 10E:1]
can be created out of that clone for test, development, and reporting
instances. When SRDF/A is used any remote TimeFinder operation
should use the consistent split feature to coordinate the replica with
SRDF/A cycle switching. The use cases in this appendix illustrate
some of the basic Oracle business continuity operations that
TimeFinder and SRDF can perform together.
SRDF/Synchronous mode
SRDF/S is used to create a no-data-loss solution for committed
transactions. It provides the ability to replicate multiple databases'
and applications' data remotely while guaranteeing that the data on
the source and target devices is exactly the same. SRDF/S can protect
single or multiple source Symmetrix storage arrays with synchronous
replication.
With SRDF/S synchronous replication, shown in Figure 99, each I/O
from the local host to the source R1 devices is first written to the local
Symmetrix cache (1) and then sent over the SRDF links to the
remote Symmetrix unit (2). Once the remote Symmetrix unit
acknowledges that it received the I/O in its cache successfully (3), the
I/O is acknowledged to the local host (4). Synchronous mode
guarantees that the remote image is an exact duplicate of the source
R1 device's data.
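Because the host acknowledgment (4) waits for the full sequence, the R1 write response time is the local cache write plus one SRDF link round trip. A back-of-the-envelope shell sketch, with latency figures that are assumptions for illustration only:

```shell
# Illustrative latencies in microseconds (assumed, not measured).
local_cache_write=200   # step 1: I/O lands in local Symmetrix cache
link_round_trip=1000    # steps 2-3: send to remote cache, receive ack
# Step 4: only after the remote ack is the I/O acknowledged to the host.
host_response=$((local_cache_write + link_round_trip))
echo "$host_response"
```

This is why SRDF/S is distance-limited: the link round trip grows with distance and is paid on every write.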
Note: SRDF Adaptive Copy replication is not supported for database restart
or database recovery solutions with Oracle databases. Using SRDF Adaptive
Copy replication by itself for disaster protection of Oracle databases will lead
to a corrupt and unusable remote database.
SRDF topologies
SRDF can be configured in many topologies other than a single SRDF
source and target, satisfying different needs for high availability and
disaster restart. It can use a single target or two concurrent targets; it
can provide a combination of synchronous and asynchronous
replication; and it can provide a three-site solution that allows no
data loss over very long distances, and more. Some of the basic
topologies that can be used with SRDF are shown in the following
section.
Concurrent SRDF
SRDF allows simultaneous replication of single R1 source devices to
up to two target devices using multiple SRDF links. All SRDF links
can operate in either Synchronous or Asynchronous mode or one or
more links can utilize Adaptive Copy mode for efficient utilization of
available bandwidth on that link. This topology allows simultaneous
data protection over short and long distances as shown in Figure 102.
Cascaded SRDF
SRDF allows cascaded configurations in which data is propagated
from one Symmetrix to the next. This configuration requires
Synchronous mode for the first SRDF leg and Asynchronous or
Adaptive Copy modes for the next. As shown in Figure 103, this
topology provides remote replications over greater distances with
varying degree of bandwidth utilization and none to limited data loss
(depends on the choice of SRDF modes and disaster type).
SRDF/Star
SRDF/Star is a two- or three-site protection topology where data is
replicated from source Site A to two other Symmetrix systems
simultaneously (Site B and Site C). The data remains protected even
in case one target site (B or C) goes down. If site A (the primary site)
goes down, the customer can choose where to come up (site B or C)
based on SRDF/Star information. If the storage data in the other
surviving site is more current then changes will be incrementally sent
to the surviving site that will come up. For protection and
compliance, remote replications can start immediately to the new DR
site. For example, as shown in Figure 105, if database operations
resume in Site C, data will be sent first from Site B to create a no data
loss solution, and then Site B will become the new DR target.
SRDF/Star is highly flexible and can change modes and topology to
achieve the best protection for each disaster scenario. For a full
description of the product, refer to the SRDF product guide.
(Table columns: ASM diskgroups, Database devices, Recovery Device Groups (DG), Restart Device Groups (DG), SRDF Consistency Group (CG))
(together with control files), log files, and archive logs each had
their own DG, allowing the replica of each to take place at slightly
different times as shown in the recovery use cases. For example, if
a valid datafile's backup replica should be restored to production,
and the production logs are intact, by separating the datafiles and
logs to their own DG and ASM diskgroups, such a restore won't
compromise the logs and full database recovery would be
possible. For a restart solution, a single DG was used that
includes all data (control) and log files, allowing them to be split
consistently creating a restartable and consistent replica.
◆ Note that TimeFinder operations can span Symmetrix arrays.
When that is the case instead of a device group (DG) a composite
group (CG) should be used, following the exact same best
practices as shown for the DG in this paper.
◆ It is recommended to issue TimeFinder and SRDF commands
from a management (or the target) host rather than the database
production host. The reason is that in rare cases, when consistent
split is used under heavy write activity, Symmetrix management
commands may be queued behind database writes, interfering
with completion of the replication, and the replica will be deemed
invalid.
◆ It is recommended to use Symmetrix Generic Name Services
(GNS) and allow them to be replicated to the SRDF targets. GNS
manages all the DG and CG definitions in the array and can
replicate them to the SRDF target so the management host issuing
TimeFinder and SRDF commands will be able to operate on the
same CG and DG as the source (without having to re-create
them).
◆ For the sake of simplicity the use cases assume that GNS is used
and replicated remotely. When remote TimeFinder or SRDF
operations are used, they are issued on the target host. It is also
possible to issue remote TimeFinder and SRDF commands from
the local management host using the -rdf flag; however it requires
the SRDF links to be functional.
◆ Note that remote TimeFinder replica creation from an SRDF/A
target should always use the -consistent flag to coordinate
SRDF/A cycle switching with the TimeFinder operation and
simply put, guarantee that the replica is consistent.
High-level steps
1. Place the database in hot backup mode.
2. Activate the DATA_DG clone (with -consistent since ASM is
used).
3. End hot backup mode.
4. Archive the current log.
5. Copy two backup control files to the FRA ASM diskgroup.
6. Activate the ARCHIVE_DG clone (with -consistent since ASM is
used).
7. Optionally mount the clone devices on a backup host and
perform RMAN backup.
Detailed steps
On the production host
1. Place the production database in hot backup mode.
# export ORACLE_SID=RACDB1
# sqlplus "/ as sysdba"
SQL> alter database begin backup;
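The detailed commands for high-level steps 2 through 4 can be sketched as follows, mirroring the symclone and SQL*Plus syntax used elsewhere in this appendix; the group name DATA_DG is taken from the high-level list above, so treat the exact commands as a sketch:

```shell
# 2. Activate the DATA_DG clone; -consistent is required since ASM is used.
symclone -dg DATA_DG -tgt -consistent activate
# 3. End hot backup mode (on the production host):
#      SQL> alter database end backup;
# 4. Archive the current log:
#      SQL> alter system archive log current;
```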
5. Create two backup control files and place them in the FRA
diskgroup for convenience (RMAN syntax is shown, although
SQL can be used as well). One will be used to mount the database
for RMAN backup; the other will be saved with the backup set.
RMAN>run {
allocate channel ctl_file type disk;
copy current controlfile to
'+FRA/control_file/control_start';
copy current controlfile to
'+FRA/control_file/control_bak';
release channel ctl_file;
}
# export ORACLE_SID=CLONE_DB
# sqlplus "/ as sysdba"
SQL> startup mount
9. Back up the database with RMAN from the backup host. The
control file copy that was not used to mount the instance
(control_bak) should be part of the backup set. The control_start file
should not be backed up because the SCN will be updated when
the database is mounted for backup.
RMAN>run {allocate channel t1 type disk;
backup format 'ctl%d%s%p%t'
controlfilecopy '+FRA/control_file/control_bak';
backup full format 'db%d%s%p%t' database;
backup format 'al%d%s%p%t' archivelog all;
release channel t1;
}
Note: The format specifier %d is for the database name, %t for a 4-byte
timestamp, %s for the backup set number, and %p for the backup piece number.
High-level steps
1. Shut down production database and ASM instances.
2. Restore the DATA_DG clone (split afterwards).
3. Start ASM.
4. Mount the database.
5. Perform database recovery and open the database.
Detailed steps
On the production host
1. Shut down any production database and ASM instances (if still
running).
# export ORACLE_SID=RACDB1
# sqlplus "/ as sysdba"
SQL> shutdown abort
# export ORACLE_SID=+ASM1
# sqlplus "/ as sysdba"
SQL> shutdown abort
3. Start the ASM instance (follow the same activities as in Use Case
1, step 7).
4. Mount the database (follow the same activities as in Use Case 1,
step 8).
5. Recover and open the production database. Use resetlogs if
incomplete recovery was performed.
# export ORACLE_SID=RACDB1
High-level steps
1. Activate the DB_DG clone (with -consistent to create restartable
replica).
2. Start the ASM instance.
3. Start the database instance.
4. Optionally, refresh the clone replica from production at a later
time.
Detailed steps
On the target host
1. Activate the TimeFinder/Clone DB_DG replica. The clone replica
includes all data, control, and log files. Use -consistent to make
sure the replica maintains dependent write consistency and
therefore a valid restartable replica from which Oracle can simply
perform crash recovery.
# symclone -dg DB_DG -tgt -consistent activate
Note: Follow the same target host prerequisites as in Use Case 1 prior
to step 7.
# export ORACLE_SID=+ASM
# sqlplus "/ as sysdba"
SQL> startup
At this point the clone database is opened and available for user
connections.
4. Optionally, it is easy and fast to refresh the TimeFinder replica
from production as TimeFinder/Clone operations are incremental
as long as the clone session is not terminated. Once the clone
session is reactivated, the target devices are available
immediately for use, even if background copy is still taking place.
1. Shut down the clone database instance since it needs to be
refreshed:
SQL> shutdown abort
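The remaining refresh commands can be sketched as follows; the recreate/activate sequence is a sketch based on standard TimeFinder/Clone incremental refresh behavior and should be verified against the TimeFinder product guide:

```shell
# 2. Refresh the clone incrementally (the clone session was not terminated).
symclone -dg DB_DG -tgt recreate
# 3. Reactivate with -consistent; the targets are usable immediately,
#    even while the background copy is still running.
symclone -dg DB_DG -tgt -consistent activate
```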
High-level steps
1. Perform initial synchronization of SRDF in Adaptive Copy mode.
2. Once the SRDF target is close enough to the source, change the
replication mode to SRDF/S or SRDF/A.
3. Enable SRDF consistency.
Detailed steps
1. Perform initial synchronization of SRDF in Adaptive Copy mode.
Repeat this step or use the skew parameter until the SRDF target
is close enough to the source.
# symrdf -cg ALL_CG set mode acp_wp [skew <number>]
# symrdf -cg ALL_CG establish
2. Once the SRDF target is close enough to the source change the
replication mode to SRDF/S or SRDF/A.
1. For SRDF/S, set protection mode to sync:
# symrdf -cg ALL_CG set mode sync
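The SRDF/A alternative for step 2 and the consistency step 3 can be sketched as follows; the async and enable actions mirror standard symrdf syntax but should be treated as a sketch and verified against the SRDF CLI guide:

```shell
# 2. For SRDF/A, set the replication mode to async instead:
symrdf -cg ALL_CG set mode async
# 3. Enable SRDF consistency protection for the group:
symrdf -cg ALL_CG enable
```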
High-level steps
1. Activate the remote DB_DG clone (use -consistent to create
restartable replica).
2. Start the remote ASM instance.
3. Start the remote database instance.
4. Optionally, refresh the remote clone replica from production
(SRDF targets) at a later time.
Detailed steps
On the target host
1. Activate the TimeFinder/Clone DB_DG remote replica. The clone
replica includes all data, control, and log files. Use -consistent to
make sure the replica maintains dependent write consistency and
therefore a valid restartable replica from which Oracle can simply
perform crash recovery.
# symclone -dg DB_DG -tgt -consistent activate
Note: Follow the same target host prerequisites as in Use Case 1 prior
to step 7.
2. Start the ASM instance. Follow the same activities as in Use Case
3 step 2.
3. Start the database instance. Follow the same activities as in Use
Case 3 step 3.
At this point the clone database is opened and available for user
connections.
4. Optionally, to refresh the database clone follow the same activities
as in Use Case 3 step 4.
Note: For SRDF/A, the SRDF checkpoint command will return control to the
user only after the source device content has reached the SRDF target devices
(SRDF simply waits two delta sets). This is useful, for example, when
production is placed in hot backup mode before the remote clone is taken.
High-level steps
1. Place the database in hot backup mode.
2. If using SRDF/A, perform SRDF checkpoint (no action required
for SRDF/S).
3. Activate a remote DATA_DG clone (with -consistent if SRDF/A
and/or ASM are used).
4. End hot backup mode.
5. Archive the current log.
6. Copy two backup control files to the FRA ASM diskgroup.
7. If using SRDF/A then perform SRDF checkpoint (no action
required for SRDF/S).
8. Activate the remote ARCHIVE_DG clone (with -consistent if
SRDF/A and/or ASM is used).
9. Optionally mount the remote clone devices on the backup host
and perform RMAN backup.
Detailed steps
On the production host
1. Place production in hot backup mode. Follow the same activities
as in Use Case 1 step 1.
2. If SRDF/A is used then an SRDF checkpoint command will make
sure the SRDF target has the datafiles in backup mode as well.
# symrdf -cg ALL_CG checkpoint
High-level steps
1. Shut down production database and ASM instances.
2. Restore the remote DATA_DG clone (split afterwards). Restore
SRDF in parallel.
3. Start ASM.
4. Mount the database.
5. Perform database recovery (possibly while the TimeFinder and
SRDF restore are still taking place) and open the database.
Detailed steps
On the production host
1. Shut down any production database and ASM instances (if still
running). Follow the same activities as in Use Case 2 step 1.
2. Restore the remote TimeFinder/Clone replica to the SRDF target
devices, then restore SRDF. If SRDF is still replicating from source
to target, stop the replication first. Then start the TimeFinder
restore and, once it has started, start the SRDF restore in parallel.
In some cases the distance is long, the bandwidth is limited, and
many changes have to be restored. In these cases it might make
more sense to change SRDF mode to Adaptive Copy first until the
differences are small before placing it again in SRDF/S or
SRDF/A mode.
# symrdf -cg ALL_CG split
# symclone -dg DATA_DG -tgt restore [-force]
# symrdf -cg ALL_CG restore
High-level steps
1. Shut down production database and ASM instances.
2. Restore the most recent DATA_DG clone (split afterwards).
3. Start ASM.
4. Mount the database.
5. Perform database full or incomplete recovery (possibly while the
TimeFinder background restore is still taking place).
Detailed steps
1. Shut down any production database and ASM instances (if still
running). Follow the same activities as mentioned in Use Case 2
step 1.
2. Restore the most recent DATA_DG TimeFinder replica. Follow
the same activities as mentioned in Use Case 2 step 2.
3. Start the ASM instance (follow the same activities as in Use Case 1
step 7).
4. Mount the database (follow the same activities as in Use case 1
step 8).
5. Perform database recovery based on one of the following options.
Note: It might be necessary to point to the location of the online redo logs
or archive logs if the recovery process didn't locate them automatically
(common in RAC implementations with multiple online or archive log
locations). The goal is to fully apply any necessary archive logs as well as the
online logs.
set serveroutput on
declare
scn number(12) := 0;
scnmax number(12) := 0;
begin
for f in (select * from v$datafile) loop
scn := dbms_backup_restore.scandatafile(f.file#);
dbms_output.put_line('File ' || f.file# || ' absolute fuzzy scn = ' || scn);
if scn > scnmax then scnmax := scn; end if;
end loop;
dbms_output.put_line('Maximum absolute fuzzy scn = ' || scnmax);
end;
/
Conclusion
Symmetrix VMAX is a new offering in the Symmetrix product line
with enhanced scalability, performance, availability, and security
features, allowing Oracle databases and applications to be deployed
rapidly and with ease.
With the introduction of Enterprise Flash Drives, and together with
Fibre Channel and SATA drives, Symmetrix provides a consolidation
platform covering performance, capacity, and cost requirements of
small and large databases. The correct use of storage tiers, together
with the ability to move data seamlessly between tiers, allows
customers to place their most active data on the fastest tiers, and their
less active data on high-density, low-cost media like SATA drives.
Features such as Autoprovisioning allow ease of storage provisioning
to Oracle databases, clusters, and physical or virtual server farms.
TimeFinder and SRDF technologies simplify high availability and
disaster protection of Oracle databases and applications, and provide
the required level of scalability from the smallest to the largest
databases. SRDF and TimeFinder are easy to deploy and very well
integrated with Oracle products like Automatic Storage Management
(ASM), RMAN, Grid Control, and more. The ability to offload
backups from production, rapidly restore backup images, or create
restartable database clones enhances the Oracle user experience and
data availability.
Oracle and EMC have been investing in an engineering partnership
to innovate and integrate both technologies since 1995. The
integrated solutions increase database availability, enhance disaster
recovery strategy, reduce backup impact on production, minimize
cost, and improve storage utilization across a single database instance
or RAC environments.
Test setup
Figure 106 on page 464 depicts the test setup containing Oracle RAC
on the production site and associated TimeFinder/Clone and SRDF
devices for local and remote replication.
(Test hosts: Local “Production” Host, RAC Node 1: Dell, Red Hat Enterprise Linux 5.0, Oracle 11g release 1 (11.1.0.6.0); Local “Production” Host, RAC Node 2: Dell, Red Hat Enterprise Linux 5.0, Oracle 11g release 1 (11.1.0.6.0); Remote “Target” Host: Dell, Red Hat Enterprise Linux 5.0, Oracle 11g release 1 (11.1.0.6.0))
This example shows how to build and populate a device group and a
composite group for TimeFinder/Clone usage:
Device group:
1. Create the device group device_group:
symdg create device_group -type regular
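Step 2, adding the standard (source) devices to the device group, can be sketched as follows; the device numbers here are placeholders introduced for illustration, not values from the original example:

```shell
# 2. Add the standard devices that hold the database to the group.
#    Replace 007 and 008 with the Symmetrix device numbers of the
#    source LUNs.
symld -g device_group add dev 007
symld -g device_group add dev 008
```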
3. Add the target clone devices to the group. The targets for the
clones can be standard devices or BCV devices. In this example,
BCV devices are used. The number of BCV devices should be the
same as the number of standard devices, and the same size or
larger than the paired standard device. The device serial numbers
of the BCVs used in the example are 00C, 00D, 063, 064, and 065.
symbcv -g device_group associate dev 00C
symbcv -g device_group associate dev 00D
symbcv -g device_group associate dev 063
symbcv -g device_group associate dev 064
symbcv -g device_group associate dev 065
Composite group:
1. Create the composite group device_group:
symcg create device_group -type regular
3. Add the target for the clones to the device group. In this example,
BCV devices are added to the composite group to simplify the
later symclone commands. The number of BCV devices should be
the same as the number of standard devices and the same size.
The device serial numbers of the BCVs used in the example are
00C, 00D, 063, 064, and 065.
symbcv -cg device_group associate dev 00C -sid 123
symbcv -cg device_group associate dev 00D -sid 123
symbcv -cg device_group associate dev 063 -sid 456
symbcv -cg device_group associate dev 064 -sid 456
symbcv -cg device_group associate dev 065 -sid 456
Overview
Previous sections demonstrated methods of creating a database
copy using storage-based replication techniques. While in some
cases, customers create one or more storage-based database
copies of the database as "gold" copies (copies that are left in a
pristine state on the array), in most cases they want to present
copied devices to a host for backups, reporting, and other
business continuity processes. Mounting storage-replicated
copies of the database requires additional array-based,
SAN-based (if applicable), and host-based steps, including LUN
presentation and masking, host device recognition, and
importing of the logical groupings of devices so that the
operating system and logical volume manager recognize the data
on the devices. Copies of the database can be presented to a new
host or presented back to the same host that sees the source
database. The following sections describe the host-specific
considerations for these processes.
Whether using SRDF, TimeFinder, or Replication Manager to
create a copy of the database, there are six essential requirements
for presenting the replicated devices and making the copies
available to a host. They include:
◆ Verifying that the devices are presented to the appropriate
front-end directors in the BIN file.
◆ Verifying zoning and LUN presentation through the SAN are
configured (if needed).
◆ Editing configuration information to allow the devices to be seen
on the host.
◆ Scanning for the devices on the SCSI paths.
◆ Creating special files (UNIX) or assigning drive letters
(Windows).
◆ Making the devices ready for use.
The following sections briefly discuss these steps at a high level.
SAN considerations
Hosts can be attached to a Symmetrix DMX either by direct
connectivity (FC-AL, iSCSI, ESCON, or FICON), or through a
SAN using Fibre Channel (FC-SW). When using direct-connect,
all LUNs presented to a front-end port are presented to the host.
In the case of a SAN, additional steps must be considered. These
include zoning, which is a means of enabling security on the
switch, and LUN masking, which restricts hosts to seeing only the
devices they are meant to see. There are also HBA-specific SAN
settings that must be configured on the hosts.
SAN zoning is a means of restricting FC devices (for example,
HBAs and Symmetrix front-end FC director ports) from accessing
all other devices on the fabric. It prevents FC devices from
accessing unauthorized or unwanted LUNs. In essence, it
establishes relationships between HBAs and FC ports using
World Wide Names (WWNs). WWNs are unique hardware
identifiers for FC devices. In most configurations, a one-to-one
relationship (the zone) is established between an HBA and FC
port, restricting other HBAs (or FC ports) from accessing the
LUNs presented down the port. This simplifies configuration of
shared SAN access and provides protection against other hosts
gaining shared access to the LUNs.
In addition to zoning, LUN masking, which on the Symmetrix
array is called Volume Logix™, can also be used to restrict hosts to
see only specified devices down a shared FC director port. SANs
are designed to increase connectivity to storage arrays such as the
Symmetrix. Without Volume Logix, all LUNs presented down a
FC port would be available to all hosts that are zoned to the
front-end port, potentially compromising both data integrity and
security.
The combination of zoning and Volume Logix, when configured
correctly for a customer's environment, ensures that each host
only sees the LUNs designated for it. They ensure data integrity
and security, and also simplify the management of the SAN
environment. There are many tools available to configure zoning
and LUN masking.
AIX considerations
When presenting copies of devices in an AIX environment to a
host different from the one the production copy is running on, the
first step is to scan the SCSI bus, which allows AIX to recognize
the new devices. The following demonstrates the steps needed for
the host to discover and verify the disks, bring the new devices
under PowerPath control if necessary, import the volume groups,
and mount the file systems (if applicable).
1. Before presenting the new devices, it is useful to run the
following commands and save the information to compare to
after the devices are presented:
lsdev -Cc disk
lspv
syminq
4. The next step is for the target host to recognize the new devices.
The following command scans the SCSI buses and examines all
adapters and devices presented to the target system:
cfgmgr -v
Once the devices are discovered by AIX, the next step is to import
the volume groups. The key is to keep track of the PVIDs on the
source system. The PVID is the physical volume identifier that
uniquely identifies a volume across multiple AIX systems. When
the volume is first included in a volume group, the PVID is
assigned based on the host serial number and the timestamp. In
this way, no two volumes should ever get the same PVID.
However, array-based replication technologies copy everything
on the disk, including the PVID.
7. On the production host, use the lspv command to list the physical
volumes. Locate the PVID of any disk in the volume group being
replicated. On the secondary host, run lspv as well. Locate the
hdisk that corresponds to the PVID noted in the first step.
Suppose the disk has the designation hdisk33. The volume group
can now be imported using the command:
importvg -y vol_grp hdisk33
10. The first time this procedure is performed, create mount points
for the file systems if raw volumes are not used. The mount
points should be made the same as the mount points for the
production file systems.
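The before/after comparison in these steps can be scripted. Below is a minimal sketch, assuming the lspv output is saved to files before and after the rescan; the sample listings, file names, and hdisk numbers are illustrative, not taken from the original procedure:

```shell
# Hypothetical sketch: detect the hdisks that appear only after cfgmgr -v,
# by comparing saved lspv listings. The sample data stands in for real
# lspv output (hdisk name, PVID, volume group, state).
printf 'hdisk0 00c8f6a1b2c3d4e5 rootvg active\n' >  /tmp/lspv.before
printf 'hdisk1 00c8f6a100112233 appvg active\n'  >> /tmp/lspv.before

# ... devices are presented and "cfgmgr -v" is run here ...

cp /tmp/lspv.before /tmp/lspv.after
printf 'hdisk33 00c8f6a100112233 None\n' >> /tmp/lspv.after

# Lines present only in the "after" listing are the newly discovered disks.
NEW_DISKS=$(comm -13 /tmp/lspv.before /tmp/lspv.after | awk '{print $1}')
echo "$NEW_DISKS"

# The PVID (second column) of a new hdisk matches the PVID of the source
# disk, so the copied volume group could then be imported, for example:
#   importvg -y vol_grp hdisk33
```

Note that comm requires sorted input; real lspv output may need to be passed through sort first.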
HP-UX considerations
When presenting clone devices in an HP-UX environment to a
host different from the one the production copy is running on,
initial planning and documentation of the source host
environment is first required. The following demonstrates the
steps needed for the target host to discover and verify the disks,
bring the new devices under PowerPath control if necessary,
import the volume groups and mount the file systems (if
applicable).
1. Before presenting the new devices, it is useful to run the
following commands on the target host and save the information
to compare to output taken after the devices are presented:
vgdisplay -v | grep "Name"   (List all volume groups)
syminq                       (Find the Symmetrix volume for each c#t#d#)
3. Create map files for each volume group to replicate. The Volume
Group Reserve Area (VGRA) on disk contains descriptor
information about all physical and logical volumes that make up
a volume group. This information is used when a volume group
is imported to another host. However, logical volume names are
not stored on disk. When a volume group is imported, the host
assigns a default logical volume name. To ensure that the logical
volume names are imported correctly, a map file generated on the
source is created for each volume group and used on the target
host when the group is imported.
vgexport -v -p -m /tmp/vol_grp.map vol_grp
7. Create device special files for the volumes presented to the host:
insf -e
12. Import the volume groups onto the target host. Volume group
information from the source host is stored in the Volume Group
Reserve Area (VGRA) on each volume presented to the target
host. Volume groups are imported by specifying a volume group
name that is not already in use on the target.
vgimport -v -m vg_map_file vol_grp /dev/rdsk/c#t#d#
[/dev/rdsk/c#t#d#]
14. Once the volume groups are activated, mount on the target any
file systems from the source host. These file systems may require
a file system check using fsck as well. Add an entry to /etc/fstab
for each file system.
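Generating the vgimport command lines for each volume group can also be scripted. The sketch below builds them from a simple mapping file of volume groups and their target-host c#t#d# devices; the file format, paths, and names (vg_map.txt, vol_grp) are assumptions for illustration only:

```shell
# Hypothetical mapping file: one volume group per line, followed by its
# target-host c#t#d# device names (derived from syminq/symmir output).
printf 'vol_grp c10t0d1 c10t0d2\n' > /tmp/vg_map.txt

# Build a vgimport command per volume group, referencing the map file
# created earlier on the source host with "vgexport -v -p -m".
CMDS=$(awk '{
    printf "vgimport -v -m /tmp/%s.map %s", $1, $1
    for (i = 2; i <= NF; i++) printf " /dev/rdsk/%s", $i
    printf "\n"
}' /tmp/vg_map.txt)
echo "$CMDS"
```

Each generated line corresponds to the manual vgimport shown in step 12.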
Linux considerations
Enterprise releases of Linux from Red Hat and SuSE provide a
logical volume manager for grouping and managing storage.
However, it is not common to use the logical volume manager on
Linux. The technique used to present and use a copy of an Oracle
database on a different host depends on whether or not the
logical volume manager is used on the production host. To access
the copy of the database on a secondary host, follow these steps:
1. Create a mapping of the devices that contain the database to file
systems. This mapping information is used on the secondary
host. The mapping can be performed by using the information in
the /etc/fstab file and/or the output from the df command.
In addition, if the production host does not use logical volume
manager, the output from syminq and
symmir/symclone/symsnap command is required to associate
the operating-system device names (/dev/sd<x>) with
Symmetrix device numbers on the secondary host.
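The step-1 mapping can be captured with a short script. The sketch below parses df -P style output into a device-to-mount-point file; the sample output, mount points, and file names are illustrative assumptions:

```shell
# Hypothetical sketch: record which /dev/sd<x> device backs each database
# file system, from "df -P" style output captured on the production host.
cat > /tmp/df.out <<'EOF'
Filesystem     1024-blocks     Used Available Capacity Mounted on
/dev/sda1         10475520  5120000   5355520      49% /
/dev/sdc1         52403200 41922560  10480640      80% /oradata
/dev/sdd1         10475520  2095104   8380416      20% /oralogs
EOF

# Keep only the database devices and their mount points; this mapping is
# later combined with syminq output on the secondary host.
awk 'NR > 1 && ($NF == "/oradata" || $NF == "/oralogs") {print $1, $NF}' \
    /tmp/df.out > /tmp/db_map.txt
cat /tmp/db_map.txt
```
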
2. Unlike other UNIX operating systems, Linux does not provide a
single utility to rescan the SCSI bus; several methods can be used
to make the host discover changes to the storage environment.
Once the new devices are visible, the pvscan command displays
all devices that are initialized but do not yet belong to a volume
group; it should display all members of the volume groups that
constitute the copy of the database. The vgimport command
imports the new devices and creates the appropriate LVM
structures needed to access the data, and the imported volume
group is then activated with:
vgchange -a y volume_group_name
If LVM is not used, this step can be skipped.
6. Once the volume groups, if any, are activated, mount on the target
any file systems from the source host. If logical volume manager
is not being used, execute syminq on the secondary host. The
output documents the relationship between the operating system
device names (/dev/sd<x>) and the Symmetrix device numbers
associated with the copy of the database. The output from step 1
can be then used to determine the devices and the file systems
that need to be mounted on the secondary host.
These file systems may require a file system check (using fsck)
before they can be mounted. If one does not already exist, add an
entry to /etc/fstab for each file system.
Solaris considerations
When presenting replicated devices in a Solaris environment to a
different host from the one production is running on, the first step
is to scan the SCSI bus which allows the secondary Solaris system
to recognize the new devices. The following steps cause the host
to discover and verify the disks, bring the new devices under
PowerPath control if necessary, import the disk groups, start the
logical volumes, and mount the file systems (if applicable). The
following commands assume that VERITAS Volume Manager
(VxVM) is used for logical volume management.
1. Before presenting the new devices, run the following commands
and save the information for comparison after the devices are
presented:
vxdisk list
vxprint -ht
syminq
Commands that document which physical devices the Oracle disk
group uses, and that show the relationships between the disks,
should be run on the host prior to making any device changes.
This is a precaution only, to document the environment should it
require a manual restore later.
vxdg list     (List all the disk groups)
vxdisk list   (List all the disks and associated groups)
syminq        (Find Symmetrix volume numbers for each Oracle disk)
4. The next step is for the target host to recognize the new devices.
The following command scans the SCSI buses, examines all
adapters and devices presented to the target system, and builds
the information into the /dev directory for all LUNs found:
drvconfig; devlinks; disks
6. After the operating system can see the new devices, VERITAS
must discover them. To make VERITAS discover the new devices,
enter:
vxdctl enable
8. Once VERITAS has found the devices, import the disk groups.
The disk group name is stored in the private area of the disk. To
import the disk group, enter:
vxdg -C import diskgroup
Use the -C flag to override the host ownership flag on the disk.
The ownership flag on the disk indicates the disk group is online
to another host. When this ownership bit is not set, the vxdctl
enable command actually performs the import when it finds the
new disks.
9. Run the following command to verify that the disk group
imported correctly:
vxdg list
11. For every logical volume in the volume group, fsck must be run
to fix any incomplete file system transactions:
fsck -F vxfs /dev/vx/dsk/diskgroup/lvolname
12. Mount the file systems. If the UID and GIDs are not the same
between the two hosts, run the chown command to change the
ownerships of the logical volumes to the DBA user and group
that administers the server:
chown dbaadmin:dbagroup /dev/vx/dsk/diskgroup/lvolname
chown dbaadmin:dbagroup
/dev/vx/rdsk/diskgroup/lvolname
13. The first time this procedure is performed, create mount points
for the file systems, if raw volumes are not used. The mount
points should be made the same as the mount points for the
production file systems.
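The Solaris steps above rely on reading vxdisk output to identify the disks in each group. A minimal sketch of that parse is shown below; the sample "vxdisk list" output and the disk group name (oradg) are illustrative, not from the original:

```shell
# Hypothetical sketch: list the disks belonging to one VxVM disk group by
# parsing "vxdisk list" style output. Sample data stands in for real output.
cat > /tmp/vxdisk.out <<'EOF'
DEVICE       TYPE            DISK         GROUP        STATUS
c1t0d0s2     auto:sliced     disk01       oradg        online
c1t1d0s2     auto:sliced     disk02       oradg        online
c1t2d0s2     auto:sliced     -            -            online invalid
EOF

# Select only the devices whose GROUP column matches the target group.
DG=oradg
DISKS=$(awk -v dg="$DG" 'NR > 1 && $4 == dg {print $1}' /tmp/vxdisk.out)
echo "$DISKS"
```

This kind of listing, saved before and after the import, documents which devices each disk group occupies.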
Windows considerations
To facilitate the management of volumes, especially those of a
transient nature such as BCVs, EMC provides the Symmetrix
Integration Utility (SIU). SIU provides the necessary functions to scan
for, register, mount, and unmount BCV devices.
The SIU unmount command removes the volume from its drive letter
and discards the Windows cache associated with the volume. If any
running application maintains an open handle to the volume, SIU will
fail and report an error. The administrator should ensure that no
applications are using any data from the required volume; proceeding
with an unmount while processes have open handles is not recommended.
The SIU can identify those processes that maintain open handles to
the specified drive, using the following command:
symntctl openhandle -drive W:
AIX considerations
When presenting database copies back to the same host in an AIX
environment, one must deal with the fact that the OS now sees the
source disk and an identical copy of the source disk. This is because
the replication process copies not only the data part of the disk, but
also the system part, which is known as the Volume Group
Descriptor Area (VGDA). The VGDA contains the physical volume
identifier (PVID) of the disk, which must be unique on a given AIX
system.
The issue with duplicate PVIDs prevents a successful import of the
copied volume group and has the potential to corrupt the source
volume group. Fortunately, AIX provides a way to circumvent this
limitation. AIX 4.3.3 SP8 and later provides the recreatevg command
to rebuild the volume group from a supplied set of hdisks or
powerdisks. Use syminq to determine the hdisks or powerdisks that
belong to the volume group copy. Then, issue either of the two
commands:
recreatevg -y replicavg_name -l lvrename.cfg hdisk##
hdisk## hdisk## ...
recreatevg -y replicavg_name -l lvrename.cfg hdiskpower##
hdiskpower## hdiskpower## ...
where the ## represents the disk numbers of the disks in the volume
group. The recreatevg command gives each volume in the set of
volumes a new PVID, and also imports and activates the volume
group.
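Assembling the recreatevg command line from the syminq-derived disk list can be scripted. The sketch below is illustrative only; the hdisk numbers, volume group name, and rename file follow the example above but are assumptions:

```shell
# Hypothetical sketch: build the recreatevg command from the list of BCV
# hdisks identified with syminq. The disk names here are placeholders.
BCV_DISKS="hdisk33 hdisk34 hdisk35"

CMD="recreatevg -y replicavg_name -l lvrename.cfg"
for D in $BCV_DISKS; do
    CMD="$CMD $D"           # append each copied hdisk to the command
done
echo "$CMD"
# The assembled command would then be executed on the AIX host, where
# recreatevg assigns new PVIDs and imports and activates the volume group.
```
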
HP-UX considerations
Presenting database copies in an HP-UX environment to the same
host as the production copy is nearly identical to the process used for
presenting the copy to a different host. The primary differences are
the need to use a different name for the volume groups and the need
to change the volume group IDs on the disks.
1. Before presenting the new devices, it is useful to run the
following commands on the target host and save the information
for comparison with output taken after the devices are presented:
vgdisplay -v | grep "Name"   (List all volume groups)
syminq                       (Find the Symmetrix volume for each c#t#d#)
7. Create device special files for the volumes presented to the host:
insf -e
10. Once the devices are found by HP-UX, identify them with their
associated volume groups from the source host so that they can
be imported successfully. When using the vgimport command,
specify all of the devices for the volume group to be imported.
Since the target and LUN designations for the target devices are
different from the source volumes, the exact devices must be
identified using the syminq and symmir output. Source volume
group devices can be associated with Symmetrix source devices
through syminq output. Then the Symmetrix device pairings from
the source to target hosts are found from the symmir device
group output. Finally, the Symmetrix target volume to target host
device pairings are made through the syminq output from the
target host.
11. Change the volume group identifiers (VGIDs) on each set of
devices making up each volume group. For each volume group,
change the VGID on each device using the following:
vgchgid /dev/rdsk/c#t#d# [/dev/rdsk/c#t#d#] . . .
12. After changing the VGIDs for the devices in each volume group,
create the volume group structures needed to successfully import
the volume groups onto the new host. A directory and group file
for each volume group must be created before the volume group
is imported. Ensure each volume group has a unique minor
number and is given a new name.
ls -l /dev/*/group   (Identify used minor numbers)
mkdir /dev/newvol_grp
mknod /dev/newvol_grp/group c 64 0xminor#0000   (minor# must be unique)
13. Import the volume groups onto the target host. Volume group
information from the source host is stored in the VGRA on each
volume presented to the target host. Volume groups are imported
by specifying a volume group name that is not already in use on
the target.
vgimport -v -m vg_map_file vol_grp /dev/rdsk/c#t#d#
[/dev/rdsk/c#t#d#]
15. Once the volume groups are activated, mount on the target any
file systems from the source host. These file systems may require
a file system check using fsck as well. An entry should be made
to /etc/fstab for each file system.
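Choosing a unique minor number for each new group file (step 12) can be automated. Below is a minimal sketch that parses "ls -l /dev/*/group" style output; the sample listing is illustrative, and for simplicity the sketch assumes fewer than ten volume groups (decimal minor digits only):

```shell
# Hypothetical sketch: pick the next unused volume group minor number
# from "ls -l /dev/*/group" style output. Sample listing is illustrative.
cat > /tmp/groups.out <<'EOF'
crw-r----- 1 root sys 64 0x000000 Jan  1 00:00 /dev/vg00/group
crw-r----- 1 root sys 64 0x010000 Jan  1 00:00 /dev/vg01/group
crw-r----- 1 root sys 64 0x030000 Jan  1 00:00 /dev/vg03/group
EOF

# The minor number is the first pair of digits after "0x"; take the
# highest one in use and add 1 (assumes decimal digits, i.e. < 10 groups).
LAST=$(awk '{sub(/^0x/, "", $6); print substr($6, 1, 2)}' /tmp/groups.out | \
       sort -n | tail -1)
NEXT=$(printf '%02d' $(( ${LAST#0} + 1 )))
echo "$NEXT"

# The new group file would then be created with, for example:
#   mkdir /dev/newvol_grp
#   mknod /dev/newvol_grp/group c 64 0x${NEXT}0000
```
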
Linux considerations
Presenting database copies back to the same Linux host is possible
only if the production volumes are not under the control of the logical
volume manager. The Linux logical volume manager does not
have a utility, such as vgchgid on HP-UX, to modify the UUID
(universally unique identifier) written in the private area of the
disk.
For an Oracle database not under LVM management, the procedure to
import and access a copy of the production data on the same host is
similar to the process for presenting the copy to a different host. The
following steps are required:
1. Execute syminq and symmir/symclone/symsnap to determine
the relationship between the Linux device name (/dev/sd<x>),
the Symmetrix device numbers that contain the production data,
and the Symmetrix device numbers that hold the copy of the
production data. In addition, note the mount points for the
production devices as listed in /etc/fstab and the output from the
command df.
2. Initiate the scan of SCSI bus by running the following command
as root:
echo "scsi scan-new-devices" > /proc/scsi/scsi
Solaris considerations
Presenting database copies to a Solaris host using VERITAS volume
manager where the host can see the individual volumes from the
source volume group is not supported other than with Replication
Manager. Replication Manager provides "production host" mount
capability for VERITAS.
The problem is that the VERITAS Private Area on both the source and
target volumes is identical. A vxdctl enable finds both volumes and
gets confused as to which are the source and target.
To get around this problem, the copied volume needs to be processed
with a vxdisk init command. This re-creates the private area. Then, a
vxmake using a map file from the source volume created with a
vxprint -hvmpsQq -g dggroup can be used to rebuild the volume
group structure after all the c#t#d# numbers are changed from the
source disks to the target disks. This process is risky and difficult to
script and maintain and is not recommended by EMC.
Windows considerations
The only difference for Windows when bringing back copies of
volumes to the same Windows server is that duplicate volumes or
volumes that appear to be duplicates are not supported in a cluster
configuration.
#!/bin/ksh
############################################################
# Define Variables
############################################################
ORACLE_SID=oratest
export ORACLE_SID
ORACLE_HOME=/oracle/oracle10g
export ORACLE_HOME
SCR_DIR=/opt/emc/scripts
CLI_DIR=/usr/symcli/bin
DATA_DG=data_dg
LOG_DG=log_dg
#############################################################
############################################################
# Establish the BCVs for each device group
############################################################
${SCR_DIR}/establish.ksh
RETURN=$?
if [ $RETURN != 0 ]; then
exit 1
fi
############################################################
# Get the tablespace names using sqlplus
############################################################
su - oracle -c ${SCR_DIR}/get_tablespaces_sub.ksh
RETURN=$?
if [ $RETURN != 0 ]; then
exit 2
fi
############################################################
# Put the tablespaces into hot backup mode
############################################################
su - oracle -c ${SCR_DIR}/begin_hot_backup_sub.ksh
############################################################
# Split the DATA_DG device group
############################################################
${SCR_DIR}/split_data.ksh
RETURN=$?
if [ $RETURN != 0 ]; then
exit 3
fi
############################################################
# Take the tablespaces out of hot backup mode
############################################################
su - oracle -c ${SCR_DIR}/end_hot_backup_sub.ksh
############################################################
# Split the LOG_DG device group
############################################################
${SCR_DIR}/split_log.ksh
RETURN=$?
if [ $RETURN != 0 ]; then
exit 4
fi
#!/bin/ksh
############################################################
# establish.ksh
# This script initiates a BCV establish for the $DATA_DG
# and $LOG_DG device groups on the Production Host.
############################################################
############################################################
# Define Variables
############################################################
CLI_BIN=/usr/symcli/bin
DATA_DG=data_dg
LOG_DG=log_dg
############################################################
# Establish the DATA_DG and LOG_DG device groups
############################################################
${CLI_BIN}/symmir -g ${DATA_DG} establish -noprompt
${CLI_BIN}/symmir -g ${LOG_DG} establish -noprompt
############################################################
# Cycle ${CLI_BIN}/symmir query for status
############################################################
RETURN=0
while [ $RETURN = 0 ]; do
${CLI_BIN}/symmir -g ${LOG_DG} query | grep SyncInProg \
> /dev/null
RETURN=$?
REMAINING=`${CLI_BIN}/symmir -g ${LOG_DG} query | grep MB | \
awk '{print $3}'`
echo "$REMAINING MBs remain to be established."
echo
sleep 10
done
RETURN=0
while [ $RETURN = 0 ]; do
${CLI_BIN}/symmir -g ${DATA_DG} query | grep SyncInProg \
> /dev/null
RETURN=$?
REMAINING=`${CLI_BIN}/symmir -g ${DATA_DG} query | grep MB | \
awk '{print $3}'`
echo "$REMAINING MBs remain to be established."
echo
sleep 10
done
exit 0
=================================================================
#!/bin/ksh
############################################################
# get_tablespaces_sub.ksh
# This script queries the Oracle database and returns with
# a list of tablespaces which is then used to identify
# which tablespaces need to be placed into hotbackup mode.
############################################################
############################################################
# Define Variables
############################################################
SCR_DIR=/opt/emc/scripts
############################################################
# Get the tablespace name using sqlplus
############################################################
############################################################
# Remove extraneous text from spool file
############################################################
> ${SCR_DIR}/tablespaces.txt
############################################################
# Verify the creation of the tablespace file
############################################################
if [ ! -s ${SCR_DIR}/tablespaces.txt ]; then
exit 1
fi
exit 0
=================================================================
#!/bin/ksh
############################################################
# begin_hot_backup_sub.ksh
# This script places the oracle database into hot backup
# mode.
############################################################
############################################################
# Define Variables
############################################################
SCR_DIR=/opt/emc/scripts
############################################################
# Do a log switch
############################################################
############################################################
# Put all tablespaces into hot backup mode
############################################################
TABLESPACE_LIST=`cat ${SCR_DIR}/tablespaces.txt`
exit 0
=================================================================
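The sqlplus step inside begin_hot_backup_sub.ksh is not shown above. The sketch below is a hedged illustration of how the tablespace list could be turned into BEGIN BACKUP statements; the paths and generated SQL are assumptions based on the surrounding scripts, not the original code:

```shell
# Hypothetical sketch: generate the ALTER TABLESPACE ... BEGIN BACKUP
# statements that begin_hot_backup_sub.ksh would feed to sqlplus.
printf 'SYSTEM\nUSERS\nDATA01\n' > /tmp/tablespaces.txt  # stand-in for
                                                         # ${SCR_DIR}/tablespaces.txt

SQL=$(awk '{printf "ALTER TABLESPACE %s BEGIN BACKUP;\n", $1}' \
      /tmp/tablespaces.txt)
echo "$SQL"

# The generated statements would then be piped to sqlplus, for example:
#   echo "$SQL" | sqlplus -s "/ as sysdba"
# end_hot_backup_sub.ksh would generate the matching END BACKUP statements.
```
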
#!/bin/ksh
############################################################
# split_data.ksh
# This script initiates a Split for the $DATA_DG Device
# group on the Production Host.
############################################################
############################################################
# Define Variables
############################################################
CLI_BIN=/usr/symcli/bin
DATA_DG=data_dg
############################################################
# Split the DATA_DG device group
############################################################
${CLI_BIN}/symmir -g ${DATA_DG} split -noprompt
############################################################
# Cycle ${CLI_BIN}/symmir query for status
############################################################
RETURN=0
while [ $RETURN = 0 ]; do
${CLI_BIN}/symmir -g ${DATA_DG} query | grep SplitInProg \
> /dev/null
RETURN=$?
REMAINING=`${CLI_BIN}/symmir -g ${DATA_DG} query | grep MB | \
awk '{print $3}'`
echo "$REMAINING MBs remain to be split."
echo
sleep 5
done
exit 0
=================================================================
#!/bin/ksh
############################################################
# end_hot_backup_sub.ksh
# This script ends hot backup mode for the Oracle
# database. The script is initiated by the main hot
# backup script.
############################################################
############################################################
# Define Variables
############################################################
SCR_DIR=/opt/emc/scripts
###########################################################
# Take all tablespaces out of hotbackup mode
############################################################
TABLESPACE_LIST=`cat ${SCR_DIR}/tablespaces.txt`
############################################################
# Do a log switch
############################################################
exit 0
=================================================================
#!/bin/ksh
############################################################
# split_log.ksh
# This script initiates a Split for the $LOG_DG Device
# group on the Production Host.
############################################################
############################################################
# Define Variables
############################################################
CLI_BIN=/usr/symcli/bin
LOG_DG=log_dg
############################################################
# Split the LOG_DG device group
############################################################
${CLI_BIN}/symmir -g ${LOG_DG} split -noprompt
############################################################
# Cycle ${CLI_BIN}/symmir query for status
############################################################
RETURN=0
while [ $RETURN = 0 ]; do
${CLI_BIN}/symmir -g ${LOG_DG} query | grep SplitInProg \
> /dev/null
RETURN=$?
REMAINING=`${CLI_BIN}/symmir -g ${LOG_DG} query | grep MB | \
awk '{print $3}'`
echo "$REMAINING MBs remain to be split."
echo
sleep 5
done
exit 0
=================================================================
Solutions Enabler Command Line Interface (CLI) for FAST VP Operations and
Monitoring
Overview
This appendix describes the Solutions Enabler command line
interface (CLI) commands that can be used to configure and monitor
FAST VP operations. All such operations can also be executed using
the Symmetrix Management Console (SMC) GUI. Although there are
command line counterparts for the majority of the SMC-based
operations, the focus here is to show only some basic tasks for which
operators may want to use the CLI.
Enabling FAST
Operation: Enable or disable FAST operations.
Command:
symfast -sid <Symm ID> enable/disable
}
Pool Bound Thin Devices(20):   <== Number of bound thin devices (TDEV) in the thin pool
{
-----------------------------------------------------------------------
            Pool              Pool                Total
Sym        Total    Subs   Allocated           Written
Dev       Tracks     (%)      Tracks    (%)     Tracks    (%)   Status
-----------------------------------------------------------------------
0162     1650000       5     1010940     61    1291842     78   Bound
...
The output shows that Symmetrix thin device 0162 has thin device
extents spread across data devices in FC_Pool, EFD_Pool, and SATA_Pool.
Legend:
Flags: (E)mulation : A = AS400, F = FBA, 8 = CKD3380, 9 = CKD3390
       (M)ultipool : X = multi-pool allocations, . = single pool allocation
--------------------------------------------------------------------
                                 I     Logical Capacities (GB)
           Target                n   --------------------------------
Tier Name  Tech  Protection      c    Enabled     Free       Used
---------- ----  ------------    -   --------  --------  --------
Legend:
Inc Type : S = Static, D = Dynamic