
INFINIBOX ORACLE DATABASE

INTEGRATION AND BEST PRACTICES

LAST UPDATED: 02/28/2021



Table of Contents
1 Abstract
2 Introduction
3 High Availability and RAS
3.1 Backup and recovery
3.2 Database Cloning using Snapshots
3.3 Database Replication vs. Storage Replication
3.4 Oracle ASM, crash consistency and recovery
4 Ease of use
4.1 InfiniBox GUI and CLI are very easy to use
4.2 InfiniBox Architecture Promotes Simplified Data Layout
5 Performance
5.1 Oracle I/O Profile and Infinidat storage synergy
5.2 Host-based Configuration Guidelines
5.3 Red Hat Enterprise Linux (RHEL) I/O schedulers
5.4 RHEL File system types
5.5 Windows NTFS
5.6 AIX jfs2 and LVM
5.7 Oracle ASM
5.8 Oracle Databases on NFS
6 Data Reduction


1 Abstract
Relational databases are the backbone of organizations large and small. These databases are an integral part of any organization and represent an asset that requires the highest availability, reliability, performance and flexibility for every aspect of the organization they support. Typically storing mission-critical transactional data about customers, patients, suppliers and orders, databases are also used to analyze the performance of the organization through data warehouses, data marts and data lakes, as well as unstructured data analysis as the organization moves into capturing and analyzing what is now popularly known as "Big Data", or data outside of the "Systems of Record" data stores.

InfiniBox represents a new age of data storage, departing from the traditional dual-controller, RAID-set storage mentality and providing a solution for the most demanding application and database environments. By avoiding these legacy storage architectures and combining a best-of-breed storage architecture with unmatched ease of use and fast start-to-finish storage deployment tools, InfiniBox gives hosted applications higher, more predictable performance as well as much simpler, easier-to-manage host-side configurations.

The net result is a much lower TCO for applications migrated to InfiniBox, and a platform for unparalleled database and application consolidation where up to 2PB of data can be stored in a single floor tile, a density for mission-critical databases that no other storage vendor can match.

This paper details the features InfiniBox provides, how specific database activities are streamlined, how the InfiniBox architecture encourages simpler database designs, and how InfiniBox reduces the time and complexity of managing these critical database resources.

The most up-to-date version of this document can be found on the Infinidat Support site:
https://support.infinidat.com/hc/en-us/articles/360002184438


2 Introduction
Large databases (tens of TB to hundreds of TB) pose a unique challenge to enterprise storage arrays: they present an I/O profile that is unpredictable and often overwhelms the storage frame, resulting in high latencies that increase the run time of database workloads. Some database activities are very latency sensitive, and in many cases latency will affect the end-user population that the application supports.

InfiniBox provides benefits that are requirements for enterprise database deployment:
• Consistent, high performance: InfiniBox is designed with massive parallelization, huge compute power, and large L1 and L2 caches, while its data distribution architecture ensures even access across all 480 NL-SAS drives at all times, providing consistent, predictable performance, an absolute requirement for all enterprise databases. The InfiniBox snapshot architecture provides the ability to execute thousands of snapshots without affecting performance, from which Oracle databases can derive benefit. Our customers use snapshots to augment their Oracle database backup and recovery architecture.
• High availability and reliability: The InfiniBox architecture provides a robust, highly available storage environment with 99.99999% uptime, one of the highest rated uptimes for any storage platform. That equates to about 3 seconds of downtime a year. Drive rebuild times are the best in the storage industry. Oracle customers using Infinidat systems report no loss of data, even upon multiple disk failures. InfiniBox offers end-to-end business continuity features, including synchronous and asynchronous remote mirroring, and supports multi-node, multi-initiated Oracle Real Application Clusters (RAC) for highly available clustered databases. Using snapshots, recovery of a database can be reduced to the amount of time it takes to map the volumes to hosts: minutes instead of the hours a more traditional RMAN recovery process would take.
• Exceptional ease in storage management: The InfiniBox architecture, along with the elegant simplicity of its web-based GUI, allows easy, fast deployment and management of storage for database environments. The amount of time saved in performing traditional storage administration tasks is huge. Also, because of the InfiniBox open architecture and aggressive support for RESTful APIs, platforms such as OpenStack and emerging container-based application environments such as Docker allow storage administration tasks to be performed at the application level, without the need to use the excellent InfiniBox GUI. Storage deployment and management can also be performed directly from VMware's vCenter console through support for all of the major VMware APIs such as VAAI, VASA and VADP.
• Lower total cost of ownership: Massive parallelization, extreme availability, the highest data density in the industry, consistent performance and ease of use all add up to unmatched TCO. This is important for environments that need to consolidate mission-critical databases into smaller and smaller physical footprints, while customers experience an explosion of data sources such as mobile, machine-generated data, and huge amounts of analytic data. There is no other storage platform on the market that provides all of these benefits, particularly for mission-critical enterprise database environments.
This paper walks through the major requirements and characteristics of Oracle database environments and provides guidance on best practices, as well as any observed behavior that is unique to running Oracle Databases on InfiniBox.


3 High Availability and RAS


InfiniBox is designed to provide the highest availability while reducing the physical footprint required to store data, by utilizing a unique and patented data distribution and parity-based protection mechanism that distributes data from each volume across EVERY drive in the InfiniBox frame. That is 480 NL-SAS drives supporting each and every volume (F6240).

The parity-based storage architecture ensures that the highest amount of usable capacity is available.

Drive rebuild times can directly affect availability. InfiniBox has a maximum rebuild time of 15 minutes for a 6TB drive on a system that is completely full; for systems with less space used, rebuild times are lower. The reason is that InfiniBox is not built from RAID sets or a limited number of spindles grouped together. InfiniRAID is a new way to store data, with a unique and patented way to distribute large amounts of data across every spindle in the frame. This significantly reduces drive rebuild times, in part because when data needs to be rebuilt, all drives in the system support the effort. Also, because of the unique and patented way that InfiniBox stores data along with parity, most of the data is rebuilt without having to move data from one place to another.

The parity-based storage architecture, plus low drive rebuild times and fully redundant hardware (many components have triple redundancy), gives InfiniBox the ability to provide 99.99999% uptime per year. This equates to roughly 3 seconds of downtime per year. To put this in perspective, 3 seconds a year is much shorter than the average SCSI timeout sequence, which means that even if there were downtime, it would not cause the host to lose its connection to the data, or even recognize that there was a short disruption of data availability.

With such high availability, database consolidation and a reduction in mirrored copies of data become possible. More customers are considering consolidating databases onto less hardware to reduce costs.

3.1 Backup and recovery


Most Oracle customers use a traditional backup and recovery architecture, primarily using Oracle RMAN (Recovery Manager) to back up and recover database data. RMAN is the preferred method to provide a complete, comprehensive backup image of your database, while providing full or piecemeal restore of data to allow for more focused recovery.

When an RMAN backup is called for, InfiniBox is a very good candidate as a target for backing up a database directly to disk: it is fast, and it is cost effective due to the massive density of storage on a single frame. Many Oracle database shops use VTL-based RMAN backup strategies as the first line of recovery defense.

The following is an example RMAN configuration:


RMAN> show all;


RMAN configuration parameters for database with db_unique_name ORAINF are:

CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default


CONFIGURE BACKUP OPTIMIZATION OFF; # default
CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
CONFIGURE CONTROLFILE AUTOBACKUP OFF; # default
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; # default
CONFIGURE DEVICE TYPE DISK BACKUP TYPE TO BACKUPSET PARALLELISM 8;
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE CHANNEL 2 DEVICE TYPE DISK FORMAT '/backup/orainf/back2/%U';
CONFIGURE CHANNEL 3 DEVICE TYPE DISK FORMAT '/backup/orainf/back3/%U';
CONFIGURE CHANNEL 4 DEVICE TYPE DISK FORMAT '/backup/orainf/back4/%U';
CONFIGURE CHANNEL 5 DEVICE TYPE DISK FORMAT '/backup/orainf/back1/%U';
CONFIGURE CHANNEL 6 DEVICE TYPE DISK FORMAT '/backup/orainf/back2/%U';
CONFIGURE CHANNEL 7 DEVICE TYPE DISK FORMAT '/backup/orainf/back3/%U';
CONFIGURE CHANNEL 8 DEVICE TYPE DISK FORMAT '/backup/orainf/back4/%U';
CONFIGURE MAXSETSIZE TO UNLIMITED;
CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default
CONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE 'DEFAULT' OPTIMIZE FOR LOAD TRUE ; # default
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default
CONFIGURE SNAPSHOT CONTROLFILE NAME TO
'/u01/app/oracle/product/11.2.0/dbhome_1/dbs/snapcf_orainf.f'; # default

In this configuration example, a total of 8 channels are used and parallelism is set to 8. This ensures that I/O multi-threading and CPU capacity are maximized to drive the backup hard. You do not have to adopt this technique; the point of driving the backup hard is to complete it in as short a time as possible. If this is not a requirement, reduce the number of channels configured, and the level of parallelism, to the desired level. One reason to do so is to avoid consuming all of the server's CPU resources and I/O bandwidth while the backup runs, because other activities are also being performed on the same server.

To back up the database using this level of parallelism, perform the following at the RMAN command prompt:

backup section size 100G database;

Note the use of section size 100G in the backup command. Without it, every tablespace backup is stored in a single backup piece; alternatively, the maxpiecesize channel option breaks a large datafile into maxpiecesize chunks. In this case, since the schema had only a single large tablespace within the database (made up of a single large ASM data file), the backup still used only one channel. With maxpiecesize set, a single channel was observed to write about 350MB/sec to the /backup file system, while reading from the ASM datafiles at the same 350MB/sec. This backup started at 10:54 and ended at 20:50, a total of almost 10 hours. Below is a screenshot of the backup process from the InfiniBox performance screen:

After changing the backup command as shown above (using the section size parameter), the figure below shows a backup of the same 4.7TB database using all 8 channels and maxing out at a combined 3GB/sec (reading from the ASM datafiles at 1.5GB/sec and writing to the /backup file system at 1.5GB/sec, line speed for this 4 x 8Gb HBA system). The backup took 35 minutes, a significant improvement over the default backup.


So, again, the choice is yours: run the backup at full speed, or run it at a reduced speed.

3.1.1 RMAN backup to NAS


Here is a 7.6TB Oracle database backed up to a single 10TB NFS mount point. A single NFS mount point called /backup was created with 4 sub-directories. The RMAN configuration was modified to set parallelism to 8, and 8 separate channels were created, 2 per sub-directory, to ensure maximum parallelization. The default filesperset was used to ensure that all channels stay busy performing work. The Infinimetrics view of the NAS side shows the writing side of the RMAN backup process to the NAS mount: we are running at 1GB/s, line speed for this server, and note the 1.5ms write latency. The backup of this database took roughly 2 hours. NAS provides an efficient backup target for RMAN; since this is an NFS mount, the file system can also be mounted on another server running IBM's TSM or Commvault backup software to copy the RMAN backup sets to tape.

Here is the SAN side, which was the reading side of the backup process.


3.1.2 RMAN Backup and Compression


RMAN supports in-line compression of the data as it is written to the backup target. There are several levels of compression supported by RMAN. The RMAN configuration and backup command are modified to enable compression as follows (see the sketch after this list):
• Set the compression algorithm with CONFIGURE COMPRESSION ALGORITHM '<level>'. This configuration line, executed before the backup command, sets the level of compression desired.
• The levels are:
• BASIC – this is the default
• Advanced Compression Option levels: LOW, MEDIUM, HIGH
• Use a backup section size of 20GB
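For example, a minimal sketch using MEDIUM compression (which requires the Advanced Compression Option license):

RMAN> CONFIGURE COMPRESSION ALGORITHM 'MEDIUM';
RMAN> backup as compressed backupset section size 20G database;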
Here is an example of a backup performed against a 7.6TB database using each of the compression options, with their respective impacts on backup set size, backup time and host server utilization. This test was run on an Oracle 12c database running on RHEL 7.1.

Test                Size [MB]   Compression   Time      Bandwidth   InfiniBox R/W   Server   Server   Server     Server
                                ratio (n:1)   [hh:mm]   [MB/s]      latency [ms]    [%usr]   [%sys]   [%iowait]  [%idle]
Uncompressed        7,713,299   1.00          1:20      3,600       26/52           17.00    39.00    2.20       46.00
BASIC               2,787,901   2.77          16:00     220         10/14           49.00    1.00     0.30       49.00
LOW compression     4,228,252   1.82          1:35      2,000       17/43           57.00    15.00    1.70       29.00
MEDIUM compression  3,511,857   2.20          8:54      360         10/12           61.00    8.00     0.00       31.00
HIGH compression    2,922,224   2.64          15:50     212         9/26            49.00    0.20     0.90       49.00

3.1.3 RMAN Backup in a RAC environment


The "Oracle Real Application Clusters Administration and Deployment Guide" for 11g or 12c, Chapter 7: Managing Backup and Recovery, goes into more detail about configuring RMAN to work in a RAC environment. Specifically, on page 7-4, examples are provided for configuring and launching multiple channels against specific nodes in the cluster. You can also use Oracle's server-side load balancing to dynamically launch channels against any and all surviving nodes in the cluster, maximizing the parallelization of your backup by using all nodes in the cluster.

Alternatively, you can manually direct RMAN to launch specific channels against specific nodes/DB instances in the cluster to perform the backup. Just be aware that with this approach, if any of those nodes is or becomes unavailable, the backup will no longer be load balanced and could take longer to execute.
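For example, a minimal sketch that pins two channels to each of two instances through dedicated net service names; the ORAINF1/ORAINF2 aliases and backup paths are assumptions, not part of any particular environment:

RMAN> CONFIGURE DEVICE TYPE DISK PARALLELISM 4;
RMAN> CONFIGURE CHANNEL 1 DEVICE TYPE DISK CONNECT '@ORAINF1' FORMAT '/backup/back1/%U';
RMAN> CONFIGURE CHANNEL 2 DEVICE TYPE DISK CONNECT '@ORAINF1' FORMAT '/backup/back2/%U';
RMAN> CONFIGURE CHANNEL 3 DEVICE TYPE DISK CONNECT '@ORAINF2' FORMAT '/backup/back3/%U';
RMAN> CONFIGURE CHANNEL 4 DEVICE TYPE DISK CONNECT '@ORAINF2' FORMAT '/backup/back4/%U';
RMAN> backup section size 100G database;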

In either case, Infinidat makes an excellent backup target, whether as a block storage file system or a network file system as described above.

3.1.4 InfiniBox Snapshots as a backup / recovery option


InfiniBox provides an elegant, high-speed storage-based snapshot system (InfiniSnap) that allows you to take thousands of snapshots with no performance impact. Storage-based snapshots are yet another way to back up and recover database data at the storage level, performing the task within microseconds rather than the hours a traditional RMAN backup takes. RMAN must read all the data from the database and write it to a virtual tape library or an actual tape library, whereas InfiniSnap snapshots can be used to restore the database immediately.

InfiniBox provides consistency groups, which group storage volumes together so that a single command line or mouse click executes a snapshot across all volumes in the group simultaneously. There is no time gap between the snapshot images of volumes within a consistency group, no matter how many volumes it contains. This ensures that the snapshot images within the consistency group carry essentially the same timestamp, which means the snapshot contains data from a specific point in time and is therefore a crash-consistent image of the database.

This crash-consistent snapshot image of the database (or any one of the tens of thousands of snapshots that can be taken), based on the efficient InfiniBox redirect-on-write snapshot technology, can then be used as a direct recovery option for the database. The recovery is simple and fast: within the GUI you just click "Restore from this group" to take the snapshot image and overwrite the original volume contents. This action can also be performed from the Infinishell command line.

Finally, InfiniBox consistency groups can be used as part of the InfiniBox volume replication system to copy the data to a secondary disaster recovery site, shipped as a point-in-time snapshot of the data with a Recovery Point Objective as low as 4 seconds, the lowest in the storage industry.

For consistent images of the database (as opposed to a crash-consistent image), Oracle provides a mechanism for using storage-based snapshot technologies by allowing the administrator to place the database in "backup mode". The command to do this is:

alter database begin backup;

or

alter tablespace begin backup;

You then execute an InfiniBox snapshot of all volumes supporting this database. Then, issue:

alter database end backup;

or

alter tablespace end backup;

The amount of time spent in backup mode is very short: only as long as it takes to go to the InfiniBox GUI and execute the snapshot, or to use the CLI commands to execute the snapshot directly from the server.
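A minimal shell sketch of this sequence is shown below. The snapshot step itself is left as a placeholder, since it can be performed from the GUI, InfiniShell or the REST API; the script assumes it runs on the database host with OS authentication:

#!/bin/bash
# Put the database into backup mode
sqlplus -s / as sysdba <<EOF
alter system archive log current;
alter database begin backup;
EOF

# Take the InfiniBox snapshot of the consistency group here
# (GUI, InfiniShell, or a REST API call against the InfiniBox management interface)

# Take the database out of backup mode and force an archive log switch
sqlplus -s / as sysdba <<EOF
alter database end backup;
alter system archive log current;
EOF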

This backup mode allows Oracle to continue normal operation even while users are writing data to the database. Please visit support.oracle.com for more details; specifically, the Oracle Backup and Recovery Reference for your version of Oracle documents this series of commands.

Essentially, backup mode places the database in a special state in which the contents of the data across the database can be seen as consistent and recoverable. When the snapshot is taken while in backup mode, the data, the control files, the data file headers and the redo logs are all in a consistent state. If you need to use this snapshot to recover the database, you simply use the snapshot images as the primary volumes and mount the database. Upon mounting, Oracle reviews the timestamps and contents of the control files, the data file headers and the redo logs, and determines what point in time the data represents. The database then opens in a mode that allows for further recovery: the database administrator can recover the database and apply archive logs (copies of historical redo logs) until an exact point in time is reached.


Obviously, the number of archive logs applied depends heavily on how frequently storage snapshots are taken. If you take a snapshot every 15 minutes, and your archive logs are written less frequently, you may not need to apply archive logs at all to reach the closest desired RPO (Recovery Point Objective).
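A minimal SQL*Plus sketch of such a roll-forward is shown below, assuming the snapshot volumes (taken while in backup mode) have been mapped back to the host and the required archive logs are available; the timestamp is illustrative only:

-- Mount the database from the snapshot volumes and roll forward with archive logs
startup mount
recover database using backup controlfile until time '2021-02-28:12:00:00'
alter database open resetlogs;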

3.1.5 Best practice: Use both InfiniBox Snapshots AND RMAN backups
A good strategy is to utilize both InfiniBox snapshots and RMAN backups for a well-rounded backup and recovery solution that covers a wide range of recovery scenarios; the flexibility of having both allows for a more customized restoration approach. We have seen customers use this mixture, with snapshots serving as the daily backup and near-line recovery option for full-system restore, and a weekend RMAN full backup allowing both piecemeal restoration and off-site backup media, which InfiniBox snapshots alone cannot achieve.

3.2 Database Cloning using Snapshots


To clone a database, one option is to use RMAN and what is called a "redirected restore", which allows you to point the RMAN restore process at a different set of LUNs. Several steps are needed to ensure that proper device naming is used to restore the database to the right locations, but once in place, a redirected restore is a very viable option. The time it takes to restore depends largely on how long it takes to back up the database. With multiple channels, and if the backup was broken up with the section size parameter, the restore could take 30 minutes to 1 hour as in the example above; if it was backed up using default parameters, the restore could take hours.

An easier way to clone a database is to use InfiniSnap to take a snapshot of the database LUNs and mount it to another server for read/write use. InfiniBox Consistency Groups simplify database cloning by allowing a single-action snapshot of a group of volumes simultaneously. These snapshot groups can then be mounted on another server as described above.

This type of snapshot usage saves a significant amount of time, because there is no need to wait for a restore process; the snapshot is an entirely storage-frame-based activity capable of standing up these LUNs on another server. The Oracle database can be either file-system-based or ASM-based when mounting the snapshot to another server. Mounting the snapshot back to the same server is not recommended, particularly for an ASM-based database, because the ASM disk headers contain specific information about the data being stored and the physical nature of the existing LUNs. This prevents the ASM instance from mounting the disk group as a different disk group than the one already mounted. There are techniques that involve scrubbing the header of each device, but it is best not to try this at home.

Take that snapshot and map the LUNs to another server running ASM (with Linux and oracleasmlib you must also add the device labels to allow ASM to see the devices with the proper permissions). Once Oracle ASM completes the scan for new devices, it reads their headers, recognizes that each device belonged to an ASM disk group in the past, and marks it as a member rather than a candidate.

Since ASM now knows that the mapped devices belong to a disk group, you can immediately mount the disk group built from the snapshots. Because the snapshots are writable, you can then (after copying over and creating the proper $ORACLE_BASE directory structure for the database, and adding it to /etc/oratab) start up the database and create a listener entry for it.
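As a rough illustration of the flow described above, here is a minimal sketch for a RHEL host using oracleasmlib. The disk group name (DATA), database name (orainf) and Oracle home path are assumptions; adjust them to your environment:

# As root: rescan so oracleasmlib sees the newly mapped snapshot LUNs
/etc/init.d/oracleasm scandisks
/etc/init.d/oracleasm listdisks

# As the Grid Infrastructure owner (ASM instance environment set):
# mount the disk group that ASM discovered from the snapshot headers
sqlplus / as sysasm <<EOF
alter diskgroup DATA mount;
EOF

# As the database owner: register the clone and start it
# (assumes the $ORACLE_BASE directory structure was created as described above)
echo "orainf:/u01/app/oracle/product/11.2.0/dbhome_1:N" >> /etc/oratab
export ORACLE_SID=orainf
sqlplus / as sysdba <<EOF
startup
EOF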

If your organization has reservations about working with snapshots in a non-production environment, you can use more traditional methods of copying the data, such as an RMAN redirected restore or a traditional RMAN restore to the new structure.

To clone a database on a remote InfiniBox storage system, simply take a snapshot of the replicated volumes on the secondary side of the replicated InfiniBox volume pair and mount that snapshot group on a remote server.

More about storage replication in the next section.

3.3 Database Replication vs. Storage Replication


Oracle provides a few tools to replicate database data to a secondary site. All of them allow the database to be available within minutes on the secondary site, and each uses some form of transactional replication in which database transactions are shipped from the primary site to the DR site.

Oracle Dataguard provides database replication and allows the secondary site to be either cold or read-only (Active Dataguard), where queries can be run against the secondary site. It does require that both databases run the same version of Oracle. There are two versions: Oracle Dataguard, whose license is included with Oracle Enterprise Edition, and Oracle Active Dataguard, an optional product with additional cost that provides more features. The most prominent feature of Active Dataguard is the ability to have both the target and source databases open for users. The target database is in read-only mode, but queries can be run against it to support data-warehouse-style application access. Oracle Active Dataguard allows up to 1:30 fan-out replication, from 1 primary to up to 30 targets.


Oracle 12c Active Dataguard introduces a new concept called Real-Time Cascade, which allows Oracle Dataguard to replicate from the primary to the secondary site, and then from the secondary to a third site. This daisy-chain replication can be a very powerful add-on capability for environments with more unique three-site requirements.

Oracle Goldengate is a more comprehensive tool that allows full two-way active-active replication and live read/write access to the target database as well as the source database; it also supports different versions of Oracle and different operating systems on either side of the DR pair. It is the most comprehensive database replication product Oracle has, and it is an optional product with an additional license cost.


3.3.1 Storage Replication

InfiniBox includes the tools to replicate data at the storage level from one site to a second InfiniBox through either IP-based asynchronous or synchronous replication. Each individual Oracle datafile (whether a file or a volume) can be replicated to the DR site, so that in the event of a disaster at the primary site, the data is already at the secondary site. The data can then be presented to a set of servers on the DR side and the database brought up fairly quickly.

InfiniBox replication is included in the price of the storage. The unique advantage that InfiniBox provides is very fast and frequent sync intervals, achieving low or even zero RPO, the shortest RPO interval in the storage industry. This means the data update delta between sites is smaller than with any other storage vendor, which reduces the chance of corruption and increases the chance of recovering to a point very near real time when a primary site failure occurs.

Recovery from storage-based long-distance replication is similar to recovering the database from a local snapshot taken without Oracle backup mode: a crash-consistent image is what will be available on the DR side. If a disaster occurs and the database must be started on the DR side, the database will go into crash recovery mode upon startup, rolling back any transactions not fully committed, reconciling the control file with the datafile headers and synchronizing all files to a specific database generation ID. Because this is a crash-consistent image of the database, there is no opportunity to roll archive logs forward to a point in time. For some customers this is acceptable, and storage replication therefore satisfies both their RPO and RTO requirements.

InfiniBox Consistency Groups (SnapGroups) are critical when using remote data replication for database disaster recovery. Instead of replicating individual volumes, you can create a consistency group and replicate all of its objects to the DR site at one time.


Because InfiniBox asynchronous replication is based on InfiniSnap technology, the entire contents of the consistency group are replicated on a consistent timestamp for each object in the group, assuring a consistent recovery. This ensures that the DR target data is just as consistent as a local snapshot of the source: the database objects within the consistency group have exactly the same timestamp and provide the highest level of consistency for a database.

Here is a screenshot of the process of setting up consistency group replication for a set of Oracle volumes. With a single mouse click, all members of the consistency group are included in the replication setup. The storage admin simply points the replication at the specific remote InfiniBox system, selects the remote pool, sets the replication interval and RPO, and clicks "Create". The whole process of starting the initialization took under a minute. This example does not show all storage elements required for replication; other items that should be included in the consistency group are the redo logs, archive logs and executables.

3.3.2 Hybrid Replication


Another option is to combine Active Dataguard AND InfiniBox asynchronous replication for yet another three-site replication architecture: Dataguard from the primary to the secondary site, then storage replication from the secondary to a tertiary site.


3.4 Oracle ASM, crash consistency and recovery


When using Oracle ASM, a header is placed on each individual device belonging to a disk group. Within the header is a flag that allows Oracle to determine whether it is a candidate or a member disk; if it is a member disk, the disk group it belongs to is also stored. Since InfiniBox replication is based on snapshots, to recover and start the database at the target replication site you must have a server up and running with Oracle Grid Infrastructure already installed and an ASM instance running. When you map the replicated volumes to the server (and, for Linux with oracleasmlib, label the devices), start asmca; Oracle will scan the device headers, see that they are member disks, and notice that they are already associated with a disk group. You can then simply mount the disk group.

You do need to make sure that the replicated volumes are now the master devices of the replicated pair, and when the original source InfiniBox comes back online, restart replication in the reverse direction. Once satisfied that all volumes are back in sync, you have the data in both sites in a consistent state and can reverse the disaster recovery process to point the primary production applications back to their original, pre-disaster state.


4 Ease of use

4.1 InfiniBox GUI and CLI are very easy to use


There are a couple of dimensions to ease of use. The more visible one is the incredibly easy-to-use InfiniBox GUI and command-line interface. Storage administrators and database administrators will find that InfiniBox provides tools (the GUI, the CLI and the Host Power Tools) that simplify the creation and management of storage for databases.

InfiniBox provides a management system that can isolate storage pools and volumes to specific users, providing multi-tenancy so that application users such as Oracle DBAs can manage their own storage pools, volumes and snapshots. This is important for shops that are moving to Oracle Automatic Storage Management (ASM) to store data and away from OS-based file system storage; with ASM, the storage management function moves mainly to the DBA support organization. With the strong user management functions of InfiniBox, Oracle DBAs can manage their own objects within one or more storage pools. All the storage administrator has to do is initially set up the pool and add the Oracle users to the InfiniBox management system to manage that pool.

4.2 InfiniBox Architecture Promotes Simplified Data Layout


The second dimension of ease of use comes primarily from the storage architecture. Because each volume is broken up and its data is spread across all 480 spindles in the frame, there is no need to be concerned about RAID groups, hot-spot management, or volume size relative to the number of spindles in each RAID group. There is no need to create a large number of small volumes to spread the I/O load across more spindles. As a result, the best data layout is the simplest: use a small number of large LUNs for data. Choose a LUN size that best fits the growth needs of the database rather than the performance limitations of the underlying storage. Most customers choose a LUN size of 500GB to 2TB, so that when a new LUN is required, adding it to the database does not waste too much space between allocations. Here is a typical configuration for a 3TB database using Oracle ASM.

In this example, there are 3 separate LUNs for redo logs, each in its own ASM disk group, and 6 x 1TB LUNs for Oracle tablespaces in a single Data disk group. Note that we do not configure redo group mirroring in this example. You may or may not choose to create a mirrored copy of the redo groups; it is up to you. Because InfiniBox provides such a highly available storage platform for these LUNs, unless your specific Oracle administrative needs dictate otherwise, there is no need to mirror the log groups. Some customers still mirror log groups for administrative purposes, and that is fine. Just note that mirroring log groups requires writes to both log groups before the write can be flagged as successful and complete, rather than a single write.

There are considerations to explore if you do not use Oracle ASM (which stores the data files on raw devices). In some cases a bottleneck can be introduced on the server at the OS file system level: some server environments limit how much data can be pushed through a single file system. For Linux ext3 file systems, the limit is roughly 300MB/s. With Oracle ASM, the database directly manages data on the raw devices as presented by the storage; there is no intermediate layer between storage and database, so you get maximum performance from database to storage. And because you have chosen InfiniBox to store the data, there is no need to mirror the data disk groups for added protection. So, when you set up the disk group, choose "External" redundancy rather than "Normal" or "High", which set up a 2-way or 3-way mirror of data across the specified set of LUNs.

One other consideration is that when using Oracle ASM, as in this example, you can use ASM to move data from one storage frame to another by simply adding LUNs from the other frame to the ASM Data disk group and running an ASM rebalance. Once the rebalance completes, the data that was originally on the 6 original LUNs is spread across all LUNs in the group. You can then flag the original 6 LUNs for removal and re-run a rebalance so that ASM moves the data to whatever LUNs remain in the disk group. All of this can be done online, without downtime.
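To make the migration flow concrete, here is a minimal SQL sketch; the disk paths, ASM disk names and rebalance power are illustrative assumptions, not prescriptive values:

-- Add LUNs from the new frame to the existing disk group and rebalance online
alter diskgroup DATA add disk
  '/dev/oracleasm/disks/NEWFRAME01',
  '/dev/oracleasm/disks/NEWFRAME02'
  rebalance power 8;

-- After the rebalance completes, flag the original LUNs for removal;
-- ASM rebalances again onto whatever disks remain in the group
alter diskgroup DATA drop disk DATA_0000, DATA_0001 rebalance power 8;

-- Monitor rebalance progress
select operation, state, est_minutes from v$asm_operation;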


5 Performance

5.1 Oracle I/O Profile and Infinidat storage synergy


Oracle uses a multi-process model to manage the database. Each process is responsible for specific tasks, and tasks can be performed simultaneously and asynchronously: I/O to and from database components is performed in parallel, using a "fire and forget" approach to reads and writes. Oracle can ask for many large blocks of I/O to be read from the database structures on storage, and the requesting process does not have to wait for all requests to complete; it can move on to the next task. An internal table of I/O requests is managed by the process asking for the data, and when all of the I/O slots within the table have been answered or acknowledged, the entire I/O request is complete. Typically, asynchronous I/O is used for multi-block read and write operations, such as large table scans and large index scans for reads, and big-block sequential writes from operations like deleting or truncating a table, or a typical data warehouse ETL process in which massive amounts of data are imported into the database.

Oracle uses several methods to retrieve large blocks of data. If Oracle deems that the data has significant reuse value (several processes are going after the same blocks), it asks storage for a large chunk of data and places it in the Oracle buffer cache within the System Global Area (SGA). That way, the next process asking for those blocks finds them in the Oracle buffer cache rather than resorting to a storage request.

If the data is deemed a one-time-only request, from a table with a very low number of block accesses or very infrequent block access (Oracle reads each block header to determine date/time stamps and access information), it performs what is called a direct read, which is sequential in nature and sends the data directly to the requesting process without caching those blocks. A lot of this type of activity indicates that the applications going after the data are looking at most or all of the database footprint and not reusing that data for other processes. A fair amount of this activity is sequential in nature, although big data moves like these can look more random.

This type of access, big-block sequential reads, is well suited to Infinidat. We have an extremely large main DRAM cache and a massive SSD cache to support these activities, backed by a sophisticated, analytics-driven pre-fetch mechanism that stays ahead of Oracle's large block read requests. Most of our Oracle shops enjoy 90+% cache hit rates from one cache or the other. If the database is mostly read intensive, the SSD cache will eventually contain the vast majority of the blocks read, and in some cases the contents of the entire database, as each InfiniBox frame supports up to 368TB of SSD cache. This results in an overall read latency for tablespaces of under 5ms for even the busiest Infinidat frames; typically 1-2ms reads are seen.

Another large portion of read activity is small-block random reads. Typically these are quick index reads, where a query asks for and gets only a single block of data (depending on the block size chosen, typically 8KB-16KB), randomly. Indexes are built for b-tree walking speed, not for storage optimization, so these small-block I/Os are random in nature. This is where our SSD cache shines, particularly for hot indexes, where specific pieces of data are read over and over again in quick succession.


For writes, no storage platform supports databases better than Infinidat. Unlike all-flash arrays, which experience a write-cliff effect due to the large amount of housekeeping required, we can run line-speed writes all day long, faster than any hybrid array and most all-flash arrays, thanks to our patented log-write technology. Refer to the RMAN section of this document for Infinimetrics graphs showing how we performed during an Oracle RMAN backup.

Here is an example of what an Oracle database running a Swingbench OLTP workload looks like. This is an Infinimetrics view showing the performance of the system supporting this Swingbench run. Note the read and write latencies; the SAN throughput graph shows spikes, which are log switch / archive log writes.

Here is a picture of the Swingbench console running the 100-user OLTP test.


5.2 Host-based Configuration Guidelines


There are several items to take into consideration when configuring host operating systems to support Oracle Databases, particularly when connected to InfiniBox.

Some performance guidelines are universal across all operating systems; here are a few of them. The Infinidat Host Power Tools adjust these by default, but it is worth understanding what they do and what they should be set to.

5.2.1 Queue depths


Queue depth controls how many I/O commands and blocks of data can be queued at the host bus adapter (HBA) when I/O is issued by an application, so that the application is free to send more I/O as soon as it can. This allows a high degree of parallel work and the possibility of massive asynchronous I/O. Oracle, along with SQL Server and DB2, performs both synchronous and asynchronous I/O depending on the situation and on where the I/O is generated. Asynchronous I/O is when an application like Oracle batches up a group of blocks of data and sends the entire group as a single I/O request to storage. For Oracle the process sending the request is most likely DBWR, the database writer: it scans the Oracle buffer cache for dirty blocks, consolidates a list of their addresses and sends those blocks to be written. DBWR issues a single I/O request of many blocks to storage, signals the database that they are written, and then clears the list in the background as the acknowledgement for each written block is received from storage. This allows the database to immediately recycle the original buffers back to the free list for more buffered reads. DBWR can be very aggressive with the write list, with as many as several hundred blocks gathered and written in a fire-and-forget fashion. This high block count requires a queuing mechanism between the server and the storage, and that is where the HBA queue depth comes in. When the queues start to fill and reach whatever maximum queue depth is set for the server (or, for AIX, for each LUN / hdisk, which have separate queues), a stop request is issued to DBWR to stop sending data until the queue drains. This is not a desirable condition, as it delays how fast DBWR can evacuate dirty blocks. When more free blocks are needed, DBWR becomes the choke point slowing every other user process requesting buffer cache space. If Oracle senses that a slowdown in freeing blocks is causing all transaction activity to slow, it reverts to fallback methods like direct path reads, where read requests are sent to storage and the responses returned directly to the user process, bypassing the Oracle buffer cache. There is no reuse of this data in this mode, which is not good: Oracle relies on high data reuse to improve performance. Typically, a well-tuned database services 100 times more logical (buffered) reads through the Oracle buffer cache than physical reads. This reduction of physical reads improves end-user performance and keeps the storage system from having to perform them.

Queue depths are changed when you install the Infinidat Host Power Tools. The setting for most
operating systems is 128.
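If you want to verify or adjust the value manually, a quick sketch is shown below; the device name is illustrative, and persistent settings are normally handled by Host Power Tools, udev rules or HBA driver options:

# Check the current queue depth for a SCSI device
cat /sys/block/sdc/device/queue_depth

# Temporarily raise it to 128 for this boot
echo 128 > /sys/block/sdc/device/queue_depth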

5.3 Red Hat Enterprise Linux (RHEL) I/O schedulers


The I/O scheduler plays an important part in supporting a specific I/O profile for the host OS. There are three I/O schedulers available for RHEL 4, 5, 6 and 7.

cfq (Completely Fair Queueing) is the default scheduler you get when you install any version of RHEL 4 or later. Its purpose is to ensure that the I/O profile of the supported application does not overwhelm the underlying storage. cfq is typically intended for desktop/workstation use, even on RHEL: it kicks in and paces I/O to and from storage, on the assumption that a single dedicated hard drive supports the workstation.

There are 3 possible choices for setting I/O scheduler for RHEL.
• noop
• deadline
• cfq
The Infinidat Host Power Tools set the scheduler to noop, so there is no need to make any changes manually.
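To check or change the scheduler by hand, for example (the device name is illustrative; the value shown in brackets is the active scheduler):

# Show the active scheduler for a device
cat /sys/block/sdc/queue/scheduler

# Switch to noop for the current boot (Host Power Tools normally does this for you)
echo noop > /sys/block/sdc/queue/scheduler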

5.4 RHEL File system types


Two default file system types are available for RHEL 5 and higher:
• ext3
• ext4
Typically, ext3 is used on RHEL5 and ext4 on RHEL6 and above.

A word about fragmentation: both ext3 and ext4 fragment easily, as do many journal-based file systems. There are some tools available to measure the level of fragmentation, but not many available to fix it. The only known technique to remove fragmentation is to bring the application down, create a second, brand-new file system, copy the data from the old file system to the new one, unmount the old file system, re-mount the new file system at the old mount point, and then bring the application back up.

This is a time-consuming and painful process, but a necessary one, as both ext3 and ext4 can become heavily fragmented over time.

The other option is to use file system types that simply fragment less. ReiserFS and XFS are two modern journaled file systems available for RHEL that reduce the probability of fragmentation to around 20%.

5.4.1 Ext4 and RHEL6 write I/O pacing


RHEL6 and ext4 introduce another layer of I/O pacing called write barriers. Again, the default design assumption is to prevent an application from overwhelming the underlying storage infrastructure. With InfiniBox, we do not worry about writes: all writes go to cache, there is a very large cache available, and a very elegant destage mechanism uses a multi-modal log-writing architecture to dump modified blocks out of cache very quickly. When configuring RHEL6 or higher with ext4, or any file system type, mount the file systems with the nobarrier option. This turns off the I/O pacing of the OS and allows maximum write throughput straight from application to storage.

You can verify that write barriers are turned off by using the mount command.
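For example (the device and mount point are illustrative):

# Mount an ext4 file system with write barriers disabled
mount -t ext4 -o defaults,nobarrier /dev/mapper/mpathb /u02

# Or persist it in /etc/fstab:
# /dev/mapper/mpathb  /u02  ext4  defaults,nobarrier  0 0

# Verify the active mount options
mount | grep /u02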

5.5 Windows NTFS


When deploying Oracle on Windows with NTFS file systems (not using ASM), NTFS stores blocks of data in groups called clusters. The default cluster size is 4KB, which works well on InfiniBox. The Allocation Unit (AU) size, also called the extent or cluster size, determines how NTFS groups data blocks, and when Windows submits an I/O to the storage subsystem it normally uses this cluster size to access and pre-fetch data. A 64KB allocation unit matches the block size that InfiniBox uses to store and access data, so using a 64KB AU size works best.
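A quick illustration using the Windows format command to set a 64KB allocation unit; the drive letter is an assumption, and formatting erases the volume:

rem Format the data volume with NTFS and a 64KB allocation unit
format E: /FS:NTFS /A:64K /Q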

NTFS does fragment, as do all journal-based file systems, so defragment whenever possible. Oracle likes to update data in place, and this causes high fragmentation on NTFS. Fortunately, the Windows defrag tool works well here, and no downtime is required, unlike file systems such as ext3, ext4 and AIX jfs2.

5.6 AIX jfs2 and LVM


AIX uses a journaled file system called jfs2. jfs2 on top of the AIX Logical Volume Manager (LVM) forms a powerful storage environment for Oracle databases.


AIX supports 2 types of data striping mechanisms: one is very fine grained and one is coarse grained.

PP spanning is a coarse-grained data layout and access technique that lays Oracle data out in chunks of up to 256MB, one Physical Partition (PP) at a time. PP spanning stores each 256MB PP of data on each of the hdisks within a given volume group in turn, and data is then accessed using this coarse-grained "striping" mechanism. This is the preferred method, as it allows InfiniBox to detect the data access pattern and pre-fetch accordingly.
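Here is a minimal AIX sketch of the PP-spanning approach, assuming a volume group named oradatavg and a 400-partition logical volume; the names and sizes are illustrative. The -e x flag requests maximum inter-disk allocation (PP spanning) rather than LVM striping:

# Create a jfs2 logical volume spread across all hdisks in the volume group
mklv -y oradatalv -t jfs2 -e x oradatavg 400

# Create and mount the jfs2 file system on that logical volume
crfs -v jfs2 -d oradatalv -m /oradata -A yes
mount /oradata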

LVM striping is performed at the logical volume level. You can create a logical volume that spans multiple hdisks / LUNs so that many disks support a single logical volume; that logical volume is then formatted with jfs2 and mounted as a single mount point. The default stripe width at the LVM level is 128KB. This is not the recommended data layout for InfiniBox, as the individual stripes are seen as 128KB random I/O rather than the much larger block-sequential I/O profile that a table scan or index range scan would produce.

5.7 Oracle ASM


Oracle Automatic Storage Management was introduced in Oracle 10g and is becoming more widely adopted by Oracle shops. There are several advantages and disadvantages to using ASM. ASM runs on every operating system that the Oracle database runs on, and it is managed in exactly the same way no matter which OS is used.

ASM provides a layer between the database and storage and acts as the Oracle database "file system". It deals directly with raw devices, so it eliminates the issues and bottlenecks that operating system file systems introduce.

The performance advantages are therefore very obvious. ASM also provides its own striping mechanism to ensure that all LUNs / devices in each disk group are evenly used.

ASM also provides other very nice features, like the ability to "move" tablespaces off one set of LUNs onto another without bringing the database down. This is very advantageous for Oracle DBAs, providing a layer of protection during storage migrations.

There are some disadvantages to using ASM. Setting up Oracle ASM is not trivial: you are essentially setting up a single-node cluster by installing the Oracle Grid Infrastructure software underneath the database software.

Once ASM is set up, it is very easy to manage using the built-in tools such as ASMCMD, the X-based asmca tool, or Oracle Enterprise Manager (OEM).

Database backup using RMAN is the same for ASM as it is for file system-based Oracle databases.

Storage-based snapshot backup and recovery, particularly when using cloning techniques to stand up non-production environments from an InfiniBox snapshot, is slightly different with ASM. If you take a snapshot and map it back to the same server from which it was taken, the Oracle ASM instance running on that server will get confused, because headers are written to each LUN within each disk group in ASM: when you clone that data and present it back to the same server, the ASM instance notices that the headers on those devices are already in use and will not allow you to map and mount the disk group on the same server. If you have a different server running ASM, mapping the snapshot to it is fast and easy. Basically, you map the clones to the new host, configure them as Oracle ASM devices (for Linux, with oracleasmlib), and start asmca; it already sees that the devices make up a disk group, and you can immediately mount the disk group.

5.7.1 ASM and AU size


ASM is initially set up with an extent size, or Allocation Unit (AU) size, used for data layout. The default is 1MB, which tells ASM to place an extent marker for every 1MB of data stored in the disk group. This has no direct bearing on stripe width, but it does impact performance: when Oracle is scanning through the data and is told to read a large block of data, encountering an extent marker signals the end of that extent, and Oracle must issue a second I/O to continue the read until the request is satisfied. Obviously, the larger the extent size, the less often an extent marker is hit, and therefore the less physical I/O is performed for the same operation. The suggestion for the data disk group is therefore to use an AU larger than 1MB; 8MB or 16MB works very well on InfiniBox.
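As a minimal sketch (the disk paths and the compatibility attribute are assumptions), the AU size is set when the disk group is created and cannot be changed afterwards:

-- Data disk group with external redundancy and an 8MB allocation unit
CREATE DISKGROUP DATA EXTERNAL REDUNDANCY
  DISK '/dev/oracleasm/disks/DATA01',
       '/dev/oracleasm/disks/DATA02'
  ATTRIBUTE 'au_size' = '8M',
            'compatible.asm' = '11.2';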

5.7.2 Oracle Real Application Clusters (RAC)


Another advantage of ASM is its ability to work directly with Oracle Real Application Clusters (RAC), which requires a multi-initiated storage layer, such as a shared file system, that can be read and written by multiple physical hosts simultaneously. Without ASM, you need a "cluster-aware" file system such as OCFS (Oracle Clustered File System), Veritas Storage Foundation, or IBM's GPFS. Cluster awareness is the ability to allow multiple nodes in a cluster to read and write data stored in one location, with the synchronization of those activities to ensure write order and preserve data integrity.

RAC increases the availability of the database by allowing it to be supported by more than one server: if any of the host nodes in the cluster fails, the database remains up, supported by the remaining nodes in the cluster. This is an active-active system, widely used by shops requiring maximum uptime. One side effect of RAC is that the amount of I/O generated by a 3-node RAC environment is roughly 3 times more than if the database ran on a single node. The increased I/O works well on InfiniBox, and it is something to consider when customers plan a RAC deployment.

5.7.3 Oracle ASM and RHEL using oracleasm


There is one extra step involved in supporting SAN LUNs on RHEL for use with Oracle ASM: the
oracleasmlib package must be installed on the RHEL server. It is required to essentially create hard links
to the real /dev/mapper and /dev/dm-* devices created by infinihost and the RHEL multipath software,
and these hard links allow Oracle ASM to see and own the devices as user oracle, group oinstall. You
download oracleasmlib directly from oracle.com. Be mindful that there are different versions depending
on the Linux kernel version you are using.


[root@io-colo-06 parameters]# uname -a

Linux io-colo-06 2.6.18-398.el5 #1 SMP Tue Aug 12 06:26:17 EDT 2014 x86_64 x86_64
x86_64 GNU/Linux

In this case, the uname -a command reveals we are running kernel 2.6.18-398.el5.

You then run the oracleasm tool to configure and initialize the service, then run oracleasm createdisk
<logical name> <physical device> for each LUN mapped by infinihost.
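
A minimal sketch of those steps, with an illustrative disk label and multipath device (substitute your
own values):

# One-time configuration of the oracleasm service (owner oracle, group oinstall)
[root@io-colo-06 ~]# oracleasm configure -i
[root@io-colo-06 ~]# oracleasm init

# Label each multipath LUN presented by infinihost as an ASM disk
[root@io-colo-06 ~]# oracleasm createdisk ORA11DATA006 /dev/mapper/mpath28

# Verify the labeled disks
[root@io-colo-06 ~]# oracleasm scandisks
[root@io-colo-06 ~]# oracleasm listdisks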

The oracleasm querydisk command allows you to view the physical devices behind an ASM disk label, in
this case ORA11DATA006.

[root@io-colo-06 parameters]# oracleasm querydisk -p ORA11DATA006


Disk "ORA11DATA006" is a valid ASM disk
/dev/mapper/mpath28: LABEL="ORA11DATA006" TYPE="oracleasm"
/dev/sdy: LABEL="ORA11DATA006" TYPE="oracleasm"
/dev/sdce: LABEL="ORA11DATA006" TYPE="oracleasm"
/dev/sddh: LABEL="ORA11DATA006" TYPE="oracleasm"
/dev/sdek: LABEL="ORA11DATA006" TYPE="oracleasm"
/dev/sdfn: LABEL="ORA11DATA006" TYPE="oracleasm"
/dev/sdgq: LABEL="ORA11DATA006" TYPE="oracleasm"
/dev/sdht: LABEL="ORA11DATA006" TYPE="oracleasm"
/dev/sdiw: LABEL="ORA11DATA006" TYPE="oracleasm"
/dev/sdjz: LABEL="ORA11DATA006" TYPE="oracleasm"
/dev/sdlc: LABEL="ORA11DATA006" TYPE="oracleasm"
/dev/sdmf: LABEL="ORA11DATA006" TYPE="oracleasm"

In this example, the logical device ORA11DATA006 is made up of a hard link to /dev/mapper/mpath28,
which points to /dev/dm-27. There are 12 subordinate devices that support /dev/mapper/mpath28,
which are the 12 individual paths created by the multipath software, defined by the physical
connections between the server and the InfiniBox. You can check this by:

[root@io-colo-06 oracle]# ls -l /dev/mapper/mpath28


brw-rw---- 1 root disk 253, 27 May 28 03:30 /dev/mapper/mpath28
[root@io-colo-06 oracle]# ls -l /dev/dm-27
brw-rw---- 1 root root 253, 27 May 28 03:30 /dev/dm-27
[root@io-colo-06 oracle]#

Note that the major and minor numbers of both devices are the same.
The /etc/sysconfig/oracleasm file is the main configuration file for oracleasm:


[root@io-colo-06 orainf]# cd /etc/sysconfig


[root@io-colo-06 sysconfig]# cat oracleasm
#
# This is a configuration file for automatic loading of the Oracle
# Automatic Storage Management library kernel driver. It is generated
# By running /etc/init.d/oracleasm configure. Please use that method
# to modify this file
#
# ORACLEASM_ENABLED: 'true' means to load the driver on boot.
ORACLEASM_ENABLED=true
# ORACLEASM_UID: Default UID owning the /dev/oracleasm mount point.
ORACLEASM_UID=oracle
# ORACLEASM_GID: Default GID owning the /dev/oracleasm mount point.
ORACLEASM_GID=oinstall
# ORACLEASM_SCANBOOT: 'true' means fix disk perms on boot
ORACLEASM_SCANBOOT=true
# ORACLEASM_USE_LOGICAL_BLOCK_SIZE: 'true' means use the logical block
# size reported by the underlying disk instead of the physical. The
# default is 'false'
ORACLEASM_USE_LOGICAL_BLOCK_SIZE=false
# ORACLEASM_SCANORDER: Matching patterns to order multipath disk scan
ORACLEASM_SCANORDER="mpath dm"
# ORACLEASM_SCANEXCLUDE: Matching patterns to exclude disks from scan
ORACLEASM_SCANEXCLUDE="sd"

The ORACLEASM_SCANORDER and ORACLEASM_SCANEXCLUDE entries are required, and they force
oracleasm to build its hard links to the correct devices. Without these two lines, oracleasm will pick the
first path it finds for each physical device and use it as the hard link, which means all I/O traffic passes
through a single path rather than across all of the paths that have been configured between the host
and the InfiniBox.

5.8 Oracle Databases on NFS


Running Oracle databases on NFS storage is an emerging option for customers seeking to simplify their
Oracle database storage environments. The simplicity comes in several forms:
• Simplified data layout. One or two NFS mounts to store all Oracle data is much simpler than an ASM
environment using 10-20 LUNs to support a database.
• Removal of Fibre Channel complexity from the server. No more need for FC zoning, multipathing, or complex
switch gear, and removal of extra FC HBA hardware from the server.
• Good-enough performance: 2-4ms reads, 1-2ms writes. Not as fast as raw devices on ASM, but in most
cases this level of performance is sufficient for most workloads.
• Tighter consolidation of databases on servers and on storage.
This is a relatively new concept, one that Oracle has been promoting to its customers with the ZFS
Storage Appliance. The adoption rate for Oracle on NFS has been slow, primarily due to performance
and availability issues on other NAS platforms. InfiniBox removes these obstacles for Oracle on NFS by
providing the same seven nines of availability and excellent performance, using the same architecture
that supports block storage.


The process of implementing Oracle on NFS is simple: create a NAS file system rather than block
storage on InfiniBox, export the file system to the hosts, mount it on the host (using the Oracle-
suggested NFS mount options), and enable the Oracle dNFS client on the host running the database
software so that Oracle can perform the same direct, asynchronous, non-buffered I/O that it uses
against Fibre Channel devices. Standard NFSv3 file shares are supported, and the dNFS client is
available for just about every host operating system.
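
As an illustration, a Linux mount for an Oracle datafile file system generally looks like the following.
The export and mount point are hypothetical, and the options shown are the commonly recommended
Oracle settings for Linux datafiles; always confirm them against the Oracle documentation for your
platform and release:

# mount -t nfs -o rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,vers=3,timeo=600 \
    ibox1082-nfs2:/ora11fsdata1 /u01/app/oracle/nfs/data1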
Oracle performed its own tests comparing the standard host-based NFS client to the dNFS client, and
the performance difference was fairly significant.
Since InfiniBox 2.2 natively supports NFSv3, it makes perfect sense to use it as a ZFS replacement, since
all of InfiniBox's characteristics go far beyond those of the ZFS appliance. ZFS uses a legacy dual-
controller architecture with RAID-set data layout. To provide the same dual-drive-failure protection
provided by default on InfiniBox, you need to create many triple-mirror RAID sets spanning the 1.5PB
raw capacity of the ZS4-4 Enterprise, reducing the usable capacity to less than 500TB.
Here are the results of a Swingbench OLTP test between a block storage ASM database using 8 x 1TB
LUNs and a single NFS mount point-based Oracle database. Both databases were the same size, about
4.7TB.

The Swingbench transaction rates were fairly close, 194k TPM for ASM, 177k TPM for NFS.


Swingbench transaction response times were also very similar, 31ms for ASM, 33ms for NFS.

The AWR report also shows the I/O latency difference between ASM and NFS; in this case the metric is
the I/O response time to the SOE tablespace. This is where the difference in raw I/O speed between ASM
and NFS shows: the latency is roughly 3x higher on NFS. However, based on the actual transaction
profile, this resulted in little overall difference to application throughput in terms of transactions
executed and overall Swingbench transaction latency. This is where "good enough" performance really is
good enough.


There are a couple of white papers published by Oracle on how to mount the file systems.

https://docs.oracle.com/cd/E28223_01/html/E27586/configexorssc.html

http://www.oracle.com/technetwork/server-storage/sun-unified-storage/documentation/oracle11gr2-zfssa-bestprac-2255303.pdf

This is what was used for the test:

ibox1082-nfs2:/ora11fsdata1 on /u01/app/oracle/nfs/data1 type nfs
(rw,bg,hard,nolock,rsize=32768,wsize=32768,addr=172.19.0.78)

ibox1082-nfs4:/ora11fslog1 on /u01/app/oracle/nfs/log1 type nfs
(rw,bg,hard,nolock,rsize=32768,wsize=512,addr=172.19.0.80)

ibox1082-nfs5:/ora11fslog2 on /u01/app/oracle/nfs/log2 type nfs
(rw,bg,hard,nolock,rsize=32768,wsize=512,addr=172.19.0.81)

ibox1082-nfs1:/ora11fslog3 on /u01/app/oracle/nfs/log3 type nfs
(rw,bg,hard,nolock,rsize=32768,wsize=512,addr=172.19.0.77)

To enable the dNFS client, you must shut down the database and run:

cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk dnfs_on

This make target replaces the standard ODM library (libodm11.so) with the dNFS ODM library
(libnfsodm11.so):


rm -f /u01/app/oracle/product/11.2.0/dbhome_1/lib/libodm11.so
cp /u01/app/oracle/product/11.2.0/dbhome_1/lib/libnfsodm11.so \
   /u01/app/oracle/product/11.2.0/dbhome_1/lib/libodm11.so
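
Optionally, an oranfstab file ($ORACLE_HOME/dbs/oranfstab or /etc/oranfstab) can tell dNFS which
server, path, and export to use for each mount point; without one, dNFS falls back to the entries it finds
in /etc/mtab. The entries below simply mirror two of the test mounts above and are illustrative only:

server: ibox1082-nfs2
path: 172.19.0.78
export: /ora11fsdata1 mount: /u01/app/oracle/nfs/data1

server: ibox1082-nfs4
path: 172.19.0.80
export: /ora11fslog1 mount: /u01/app/oracle/nfs/log1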

Start up the database, then review the database alert log and note the mentions of Direct NFS as part
of the startup process:

/u01/app/oracle/diag/rdbms/<database name>/<database name>/trace/alert_<database name>.log
Oracle instance running with ODM: Oracle Direct NFS ODM Library Version 2.0.

ORACLE_BASE from environment = /u01/app/oracle
Thu Jun 02 19:39:52 2016
ALTER DATABASE MOUNT
Direct NFS: channel id [0] path [ibox1082-nfs4] to filer [ibox1082-nfs4] via local []
is UP
Direct NFS: channel id [1] path [ibox1082-nfs4] to filer [ibox1082-nfs4] via local []
is UP
Direct NFS: channel id [2] path [ibox1082-nfs5] to filer [ibox1082-nfs5] via local []
is UP
Direct NFS: channel id [3] path [ibox1082-nfs5] to filer [ibox1082-nfs5] via local []
is UP
Direct NFS: channel id [4] path [ibox1082-nfs2] to filer [ibox1082-nfs2] via local []
is UP
Direct NFS: channel id [5] path [ibox1082-nfs2] to filer [ibox1082-nfs2] via local []
is UP
Successful mount of redo thread 1, with mount id 2809042376
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE MOUNT
Sun Jun 05 02:19:38 2016
alter database archivelog
Completed: alter database archivelog
alter database open
Sun Jun 05 02:19:44 2016
Direct NFS: attempting to mount /ora11fsdata2 on filer ibox1082-nfs2 defined in mtab
Direct NFS: channel config is:
channel id [0] local [] path [ibox1082-nfs2]
routing is disabled by oranfstab option
Direct NFS: mount complete dir /ora11fsdata2 on ibox1082-nfs2 mntport 20048 nfsport
2049
Direct NFS: channel id [6] path [ibox1082-nfs2] to filer [ibox1082-nfs2] via local []
is UP
Direct NFS: channel id [7] path [ibox1082-nfs2] to filer [ibox1082-nfs2] via local []
is UP
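
Once the database is open, the standard dNFS dynamic performance views can be queried to confirm
that Direct NFS is actually in use; a minimal sketch:

SQL> SELECT svrname, dirname FROM v$dnfs_servers;
SQL> SELECT COUNT(*) FROM v$dnfs_files;
SQL> SELECT * FROM v$dnfs_channels;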


6 Data Reduction
Starting with version 3.0 of InfiniBox, data reduction is included with the software. You can now create
volumes on InfiniBox as compressible, SSD-supported, thin-provisioned volumes for Oracle
consumption. Oracle does not know or care that the underlying data is compressed. With traditional
host-based compression capabilities, such as Oracle's Hybrid Columnar Compression or compression of
RMAN backups using the RMAN compression engine, the host CPU is used to compress the data inline
as it is written to storage. This has the unfortunate effect of increasing the core count required on the
server, which in turn increases Oracle software licensing costs, since those are charged per server core.

InfiniBox uses a unique compression engine that compresses the data as it is destaged out of memory,
rather than attempting to compress it before it reaches DRAM or SSD. The end result is that write
performance is not affected when writing to a compressed volume. Writes on InfiniBox, as you may
already know, are gathered, staged, and executed at a regular interval, typically once every 5 minutes,
which ensures that the data is written intelligently and efficiently. The data is not compressed while in
cache; not to worry, there is a ton of cache.

The performance penalty is paid when a read miss occurs. A read miss on InfiniBox results in a spindle
read, which on a compressed volume involves reading the blocks, decompressing the data, and then
placing it in DRAM and SSD.

Since Infinidat has such a high cache hit rate, typically in the mid-90% range, the end result is very little
difference in performance.

Here is a test running Swingbench against a non-compressed set of volumes on an F6240, compared to
a set of compressed volumes on a smaller F1130. The systems are vastly different in spindle count (480
for the F6240 vs. 60 for the F1130) and SSD cache size (86TB for the F6240 vs. 23TB for the F1130). The
I/O workload does not translate to a high workload for either storage system, but the larger SSD cache
and spindle count should favor the F6240.

To use compressed volumes, you must create them on Infinibox as compressed, and they must be thin
volumes.

The Swingbench database used in the testing uses 8 x 1TB volumes for the data disk group, 3 x 5GB
volumes for 3 individual redo log groups, and 1 x 2TB volume for the archive destination disk group; all
volumes are ASM disk group based.

The RMAN backup test volumes were 4 x 2TB volumes mounted as Linux ext3 file systems. RMAN was
configured with 4 channels, one per file system, and to break the backup pieces into chunks no larger
than 10GB. This ensured maximum parallelization (set to 4) for the backup.
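
A minimal RMAN sketch of that configuration; the backup mount points are hypothetical and only
illustrate the four-channel, 10GB-piece layout described above:

RMAN> CONFIGURE DEVICE TYPE DISK PARALLELISM 4;
RMAN> RUN {
        ALLOCATE CHANNEL c1 DEVICE TYPE DISK FORMAT '/backup1/%U' MAXPIECESIZE 10G;
        ALLOCATE CHANNEL c2 DEVICE TYPE DISK FORMAT '/backup2/%U' MAXPIECESIZE 10G;
        ALLOCATE CHANNEL c3 DEVICE TYPE DISK FORMAT '/backup3/%U' MAXPIECESIZE 10G;
        ALLOCATE CHANNEL c4 DEVICE TYPE DISK FORMAT '/backup4/%U' MAXPIECESIZE 10G;
        BACKUP DATABASE PLUS ARCHIVELOG;
      }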

The data disk group compressed the best, at 2.5:1. All other objects compressed at 1.7:1.

Here is a screen shot of the Swingbench console overview screen, showing transactions per minute,
average transaction response time, and transactions per second for the uncompressed run.


Here is a screen shot of a Swingbench run against a set of compressed volumes.


The difference is negligible: roughly the same number of transactions per minute between the two
runs, and the average transaction latency is roughly the same as well. These are actual Swingbench
transactions.

The actual data compression efficiency (based on the Infinidat GUI) breaks down as follows:

Object                  Compression ratio
Data / indexes          2.5:1
Redo / archive logs     1.7:1
RMAN backup pieces      1.7:1

Each of the panels identifies a volume type on InfiniBox with the data reduction status showing in the
lower right corner of each box.

The AWR reports show very little difference between the uncompressed (top) and compressed (bottom)
top 5 wait events. The majority of the work done by this Swingbench test was small-block index reads,
which to InfiniBox are small-block random reads.


Uncompressed:

Event                        Waits        Time (s)  Avg wait (ms)  % DB time  Wait Class
db file sequential read      127,502,017  268,446   2              78.23      User I/O
DB CPU                                    50,043                   14.58
log file sync                12,781,586   15,939    1              4.64       Commit
library cache: mutex X       801,753      2,811     4              0.82       Concurrency
latch: cache buffers chains  550,664      1,026     2              0.30       Concurrency

Compressed:

Event                        Waits        Time (s)  Avg wait (ms)  % DB time  Wait Class
db file sequential read      125,059,497  266,664   2              78.89      User I/O
DB CPU                                    46,697                   13.90
log file sync                11,782,395   15,468    1              4.58       Commit
library cache: mutex X       692,545      3,432     5              1.02       Concurrency
latch: cache buffers chains  596,214      1,208     2              0.36       Concurrency

Here are the tablespace data file stats, the top being uncompressed and the bottom compressed. Note
the similar number of reads executed, and the similar recorded read latency, which barely registers. For
writes, the compressed run shows slightly higher latency.

