SNIA-SA 110
Essentials of
Storage Networking
Chapter 5
Data Management
Version 1.1
© COPYRIGHTED 2004
Coverage
Section 1
What is Data Management?
• In the Storage Domain: (defined by SNIA)
• Data management is the management, control, and
operation of data services
• Data Services are the processes and practices of data
handling, retention, protection, movement, distribution, and
accessibility
• Examples: backup and recovery, archive, replication,
DR, HSM
Data Management Services
• In the context of Information Lifecycle Management
Section 2
Information Lifecycle
Management - Vision
• ILM VISION:
ILM - Definition
• Information Lifecycle Management is comprised of the policies,
processes, practices, and tools used to align the business value
of information with the most appropriate and cost effective IT
infrastructure from the time information is conceived through its
final disposition.
• Information is aligned with business processes through
management of policies and service levels associated with
applications, metadata, information, and data.
Information Lifecycle
Management - Principles
• ILM PRINCIPLES:
High-Level ILM Vision –
An ILM Framework for the Datacenter
• The ILM framework provides
management and control of the
IT Infrastructure abstracted in
terms that are relevant to
Business Requirements in the
management of Information
used by a Business Process
ILM and Data Management
• Relationship:
• ILM is a management practice that sets Service Level
Objectives and policies for information and uses data
services to enact those policies
• To achieve the interoperability goal of ILM, we have to first
define and develop interoperability for the elements of data
management
• SNIA SMI-S
Section 3
Section 3: Data Management and
Data Protection
2.1. Backup and Recovery
2.2. Snapshot
2.3. Replication Techniques, and Mirroring Concepts
2.4. High-Availability
Section 3: Data Management and
Data Protection
2.1. Backup and Recovery
Backup Defined
SNIA defines:
“Backup is a collection of data stored on (usually removable)
non-volatile storage media for purposes of recovery in case the
original copy of data is lost or becomes inaccessible. Also called
backup copy. To be useful for recovery, a backup must be made
by copying the source data image when it is in a consistent
state.”
Enterprise Backup
Architecture
• Backup client
• Any computer with data to back up
• Backup servers
• Copy data to backup media and maintain the historical
information
• Backup storage units
• Tapes, magnetic disks, optical disks
Backup Components
[Figure: backup server with its catalog, backup client, storage, and tape, connected over the LAN and the SAN]
Backup Components
• Hardware
• Host for Backup Server
• Software
• Backup software
• E.g., Veritas NetBackup
• E.g., CA BrightStor ARCserve Backup
• E.g., Legato NetWorker
Backup Techniques
• Full Backup
• Incremental Backup
• Differential Backup
Full Backup
• What
• Is a copy of all the selected data, whether or not it changed
since the preceding backup
Incremental Backup
• What
• Is a copy of only the files that changed since the preceding
backup
Incremental Backup
• Impacts
• Backup
• Copies only a small percentage of the data if the data does not change much
• Faster backup time
• Restore
• Restoration is tedious: the base full backup must
be restored first, followed by each subsequent incremental
backup (see the example)
• Longer restoration time
Restoration of Incremental
Backup Example:
• Disaster – Friday
• Restore Monday full backup first
• Restore all incrementals through Thursday
• Longer restoration
[Figure: Monday = A, Tuesday = B, Wednesday = C, Thursday = D; restoration = A, B, C, D]
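The restore order in this example can be sketched in a few lines of Python. This is only an illustration: the `Backup` class, the `take_backup` and `restore` helpers, and the file names are hypothetical, files are modelled as an in-memory dictionary, and deleted files are not tracked.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    label: str   # A, B, C, D as in the example
    kind: str    # "full" or "incremental"
    files: dict  # path -> contents captured by this backup


def take_backup(label, kind, current, previous_state):
    """Capture everything for a full backup, or only the files that differ
    from the state seen by the preceding backup for an incremental one."""
    if kind == "full":
        captured = dict(current)
    else:
        captured = {p: c for p, c in current.items() if previous_state.get(p) != c}
    return Backup(label, kind, captured)


def restore(chain):
    """Replay the full backup first, then every incremental in order."""
    if not chain or chain[0].kind != "full":
        raise ValueError("restore chain must start with a full backup")
    state = {}
    for backup in chain:
        state.update(backup.files)
    return state


# Hypothetical file contents for Monday through Thursday.
monday = {"a.txt": "v1", "b.txt": "v1"}
tuesday = {"a.txt": "v2", "b.txt": "v1"}
wednesday = {"a.txt": "v2", "b.txt": "v1", "c.txt": "v1"}
thursday = {"a.txt": "v2", "b.txt": "v2", "c.txt": "v1"}

A = take_backup("A", "full", monday, {})
B = take_backup("B", "incremental", tuesday, monday)
C = take_backup("C", "incremental", wednesday, tuesday)
D = take_backup("D", "incremental", thursday, wednesday)

# Disaster on Friday: all four backups are needed, in order.
restored = restore([A, B, C, D])
```

Each incremental is small, but losing any one of B, C, or D would leave the restored state incomplete, which is why restoration takes longer.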
Incremental Backup
• Types
• Block Level
[Figure: Table 1 and Table 2, their snapshot copy, the backup manager, and the changed blocks]
Differential Backup
• Types
• Cumulative
Example
• Disaster – Friday
• Restore Monday
• Restore Thursday
[Figure: Monday = A, Tuesday = B, Wednesday = C, Thursday = D, Friday = disaster; restoration = A, D]
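A cumulative differential backup can be sketched the same way. The helper names and file contents below are hypothetical, files are modelled as an in-memory dictionary, and deleted files are not tracked. Each differential captures everything that changed since the last full backup, so only the full backup and the latest differential are needed.

```python
def differential(current, full_state):
    """Files that differ from the last full backup (cumulative)."""
    return {p: c for p, c in current.items() if full_state.get(p) != c}


def restore_differential(full_files, latest_differential):
    """Restore the full backup, then apply only the most recent differential."""
    state = dict(full_files)
    state.update(latest_differential)
    return state


# Hypothetical file contents on Monday (full backup) and Thursday.
monday_full = {"a.txt": "v1", "b.txt": "v1"}
thursday_state = {"a.txt": "v2", "b.txt": "v2", "c.txt": "v1"}

thursday_diff = differential(thursday_state, monday_full)

# Disaster on Friday: restore Monday, then Thursday.
restored_state = restore_differential(monday_full, thursday_diff)
```

The trade-off against incremental backup is that each differential grows as the week goes on, while the restore needs only two steps.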
Remote Backup
• What is remote backup?
Remote Backup
• Functions of Remote Backup
• It works like regular data backup software
• Sends backups over the internet, regular phone lines, or other
network connections to a backup server
• Backs up data at any time
• Constantly re-evaluates the computer system and adds files
to be backed up as needed
• Data is encrypted for security
• Automatically stores this valuable data at more than one site
Remote Backup
• How does Remote Backup work?
• Install the Remote Backup client
• When the scheduled time arrives, Remote Backup will “wake
up”
• It determines which files need to be backed up
• It determines what kind of backup to perform
• It then compresses those files into archives
• These archives are then encrypted
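The select, compress, and encrypt steps above might look like the following sketch. Everything here is an assumption made for illustration: files are held in memory, changed files are detected by comparing SHA-256 digests, and `toy_cipher` is a placeholder that is NOT real encryption (the Python standard library does not ship a cipher; a real client would plug in a vetted encryption library).

```python
import hashlib
import io
import tarfile


def select_changed(files, known_digests):
    """Return the files whose SHA-256 digest differs from the last recorded one."""
    return {
        name: data
        for name, data in files.items()
        if known_digests.get(name) != hashlib.sha256(data).hexdigest()
    }


def compress(files):
    """Pack the selected files into one gzip-compressed tar archive in memory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name in sorted(files):
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            archive.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()


def toy_cipher(data, key=0x5A):
    """Placeholder only: XOR with a fixed byte is NOT real encryption."""
    return bytes(b ^ key for b in data)


def run_backup(files, known_digests, encrypt):
    """Select changed files, compress them, then encrypt the archive."""
    changed = select_changed(files, known_digests)
    return encrypt(compress(changed)), sorted(changed)


# Hypothetical client state: notes.txt is unchanged since the last backup.
files = {"report.doc": b"new text", "notes.txt": b"unchanged"}
known = {"notes.txt": hashlib.sha256(b"unchanged").hexdigest()}
payload, names = run_backup(files, known, toy_cipher)
```

The resulting `payload` is what the client would send over the network to the backup server.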
Remote Backup
[Figure: Diagram of Remote Backup. Two sites, each with a SAN and a gateway, linked over the WAN; components shown include servers, FC disk, and tapes]
Section 3: Data Management and
Data Protection
2.2. Snapshot
Snapshots
Snapshot:
A snapshot is a point-in-time view of the data that is created by
saving the original data to a repository whenever data in the base
volume is overwritten. The technique that allows a snapshot to be
created instantaneously is copy-on-write.
The snapshot process creates an empty repository that holds
the original values of data that later changes in the base volume after the time of
snapshot creation.
Snapshots
[Figure: the base volume changes from A B to A B’; the repository holds the original block B, so the snapshot image still reads A B]
Snapshots
(cont’d of snapshot)
The primary purpose of a snapshot is to facilitate
non-disruptive backups. The snapshot image becomes the source
of the backup or restore information. A common reason for
restoring information is user error. It is easier and faster to reinstate
selected files using snapshots. Snapshot images also provide a
convenient source for testing and training environments and for
data mining. Traditional methods of finding data and
duplicating it may prove costly and time-consuming.
Snapshots
There are 2 snapshot techniques:
1. Copy-on-write
2. Split-mirror
1. Copy-On-Write
Whenever a copy of the data is requested, the disk
subsystem sets up a second pointer (the snapshot index)
and presents it as a new copy. The snapshot
index is empty by default.
Snapshots
How does it work?
A snapshot is a logical copy of the data. It is created by
saving the original data to a snapshot index, which is
updated whenever data in the base volume is updated. The
snapshot process creates an empty snapshot index, which then
holds the original values of data that later changes in the base volume after
the time of snapshot creation.
Further details:
The snapshot is seen by combining the base volume
data with the snapshot index containing the original data at the
moment the snapshot was taken. Copy-on-write technology
enables the instantaneous nature of the snapshot while
requiring only a fraction of the base volume disk space.
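The copy-on-write behaviour described here can be illustrated with a small in-memory model. This is a sketch, not any vendor's implementation: the `Volume` and `Snapshot` classes and the block values are hypothetical, and a real disk subsystem works on fixed-size blocks rather than Python strings.

```python
class Volume:
    """Base volume whose writes preserve original blocks for active snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        """Create a snapshot instantly: no data is copied at this point."""
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        """Copy the original block to each snapshot index, then overwrite it."""
        for snap in self.snapshots:
            snap.preserve(index, self.blocks[index])
        self.blocks[index] = data


class Snapshot:
    def __init__(self, base):
        self.base = base
        self.index = {}  # empty at creation; filled only when blocks change

    def preserve(self, block, original):
        # Keep only the value the block had when the snapshot was taken.
        self.index.setdefault(block, original)

    def read(self, block):
        """Combine the snapshot index with the unchanged base volume blocks."""
        return self.index.get(block, self.base.blocks[block])


volume = Volume(["A", "B"])
snap = volume.snapshot()
volume.write(1, "B'")  # the base volume now holds A, B'

snapshot_view = [snap.read(0), snap.read(1)]  # the snapshot still sees A, B
```

Only the overwritten block is stored in the snapshot index, which is why the space used grows with the number of writes made while the snapshot is active.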
Snapshots
(cont’d)
Copy-on-write provides efficiency by requiring only a fraction of the
base volume disk space. The average disk space requirement for a
snapshot copy is about 10%-20% of the base volume space. The actual
space depends on how long the snapshot is active and how many
writes are made to the base volume. Copy-on-write
technology is efficient except in a heavy-write environment or
when the copy must remain active for a long period of time.
Copy-on-write technology is effective as a backup source image.
Because the required disk space is less than that of a full volume copy, periodic
snapshots can be made throughout the day as copy points to
reference in the event of restoration.
Snapshots
Snapshots: pros, cons, and products by type
• Server-based software
• Pros: Tightly integrated with the backup application
• Cons: Operating system dependency; snapshot index updates must communicate with the server, which decreases performance
• Products: Legato NetWorker; Veritas NetBackup FlashBackup; Computer Associates BrightStor; IBM Tivoli
• Storage-based
• Pros: Improves the efficiency of managing snapshots; supports multiple operating systems with partitioned storage
• Cons: Typically no write capability, so mostly used for backup; application integration; specific approach unique to each storage vendor
• Products: Compaq snap and clone; IBM FlashCopy for FAStT; LSI Logic Snapshot; StorageTek D1xx series and V960 SVA; CLARiiON SnapView (all typically allow writes)
• Switch or device based
• Pros: Heterogeneous storage system support
• Cons: Immature products and limitations from various device support; application integration
• Products: DataCore: SANsymphony, snapshot option; FalconStor: IPStor snapshot copy option; StoreAge: MultiView
Snapshots
2. Split-Mirror
Split-mirror technology is used to maintain 2 or more up-to-date
full copies of the data. Every write request to the original data is
automatically duplicated to the other mirrors or copies of that data.
The mirrors may be contained in the same subsystem or placed in
different subsystems, although these typically must be of the same
subsystem model.
The primary purpose of split-mirror technology is
disaster recovery. To protect against the failure of a whole system,
mirrors must be written between 2 subsystems placed at an
appropriate distance so that a disaster does not affect both systems at the
same time. Often 2 subsystems are mirrored within the same
data center, which guards against hardware failure. The greater the
distance, the greater the performance delay. Asynchronous modes
of data transfer are available to accommodate wide-area distances.
Snapshots
(cont’d)
Split-mirror technology provides real-time redundancy: while it is
active, the mirror is not a frozen image or snapshot. The mirror can be
temporarily suspended, which is called a SPLIT-MIRROR, to create a
snapshot or point-in-time copy. The disk subsystem is told to
temporarily stop making updates to the mirrored copy so that the data is
frozen at the point of suspension. The split mirror can then be
used for the backup process.
With the split capability, mirrors create an instant copy, or snapshot, of the data.
A full data copy is available; usually a third mirror copy is
established for the purpose of splitting.
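A minimal sketch of the split operation, using a hypothetical `MirroredVolume` class with in-memory block lists. Resynchronising the split copy afterwards is left out.

```python
class MirroredVolume:
    """Primary volume with full mirror copies that receive every write."""

    def __init__(self, blocks, mirrors=2):
        self.primary = list(blocks)
        self.mirrors = [list(blocks) for _ in range(mirrors)]

    def write(self, index, data):
        """Duplicate each write to every mirror that is still attached."""
        self.primary[index] = data
        for mirror in self.mirrors:
            mirror[index] = data

    def split(self):
        """Detach one mirror; it stops receiving updates and becomes a
        frozen point-in-time copy that can be used as the backup source."""
        if not self.mirrors:
            raise RuntimeError("no mirror left to split")
        return self.mirrors.pop()


volume = MirroredVolume(["A", "B"], mirrors=2)
volume.write(0, "A1")
frozen = volume.split()  # point-in-time copy: A1, B
volume.write(1, "B1")    # later writes reach only the remaining mirror
```

With two mirrors attached, one can be split off for backup while the other keeps providing real-time redundancy, which matches the three-copy arrangement the slides describe.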
Snapshots
This (the splitting process) requires 3 entire copies of the data volume to
provide protection and to allow continuous processing for backup and
other development needs. There is a primary and a secondary real-time
copy, and a tertiary point-in-time copy of the data. Data can be updated
for development or training purposes.
Products that utilize split-mirror:
EMC TimeFinder
Hitachi InstantSplit
HP SureStore Business Copy
Sun StorEdge Instant Image
Xistech
Section 3: Data Management and
Data Protection
2.3. Replication Techniques and Mirroring Concepts
Replication Techniques
• Synchronous Replication
• Asynchronous Replication
Synchronous Replication
• The application server must wait for steps 1 through 6 to complete, which usually takes about 1
millisecond
[Figure: Synchronous replication. Application server and two mirroring agents, with the write path numbered 1 to 6]
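The slide numbers six steps without describing them. The sketch below assumes a common ordering (local write, forward to the remote site, remote write, acknowledgements back) purely for illustration; the function name and step wording are hypothetical.

```python
def synchronous_write(local_disk, remote_disk, key, value):
    """Apply a write at both sites before acknowledging the application.

    The step order is an assumption; the source only numbers the steps.
    """
    steps = ["1: application sends the write to the local mirroring agent"]
    local_disk[key] = value
    steps.append("2: local agent writes to the local disk")
    steps.append("3: local agent forwards the write to the remote agent")
    remote_disk[key] = value
    steps.append("4: remote agent writes to the remote disk")
    steps.append("5: remote agent acknowledges to the local agent")
    steps.append("6: local agent acknowledges to the application")
    return steps


local, remote = {}, {}
waited_for = synchronous_write(local, remote, "block-7", "payload")
```

Because the acknowledgement is the last step, both copies are identical whenever the application sees a completed write, at the cost of waiting for the remote round trip.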
Asynchronous Replication
Asynchronous Replication:
This technique separates the replication process
from the local write so that the application server does not
suffer the performance penalty. Although this is fast, the
remote copy is not guaranteed to be a perfect copy of the source.
Asynchronous Replication
• The application server waits only for steps 1 through 3, which is faster than
synchronous replication.
[Figure: Asynchronous replication. Application server and two mirroring agents, with the write path numbered 1 to 6]
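A sketch of the asynchronous case, assuming the local agent queues each write and ships it to the remote site later. The `AsyncReplicator` class and its method names are hypothetical.

```python
from collections import deque


class AsyncReplicator:
    """Acknowledge after the local write; replicate to the remote site later."""

    def __init__(self):
        self.local = {}
        self.remote = {}
        self.pending = deque()  # writes not yet applied at the remote site

    def write(self, key, value):
        """The application waits only for the local write and its acknowledgement."""
        self.local[key] = value
        self.pending.append((key, value))
        return "ack"

    def drain(self):
        """Ship queued writes to the remote site, in order."""
        while self.pending:
            key, value = self.pending.popleft()
            self.remote[key] = value


replicator = AsyncReplicator()
ack = replicator.write("block-7", "payload")
remote_before_drain = dict(replicator.remote)  # remote copy lags the source
replicator.drain()
remote_after_drain = dict(replicator.remote)
```

Anything still in `pending` when the primary site fails is lost, which is why the remote copy is not guaranteed to match the source.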
Replication Comparison
Mirroring Concepts
1. Disk Mirroring
Disk mirroring is a technique in which data is written to 2 duplicate disks
simultaneously. This way, if one of the disk drives fails, the
system can instantly switch to the other disk without any loss of data
or service. Disk mirroring is commonly used in online database
systems, where it is critical that the data be accessible at all times.
[Figure: disk mirroring; labels shown are Input/Output, Programmer, and Controller]
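The switch to the surviving disk can be illustrated with a toy model. The `MirroredPair` class is hypothetical and keeps blocks in dictionaries rather than on real drives.

```python
class MirroredPair:
    """Two duplicate disks: every write goes to both, reads survive one failure."""

    def __init__(self):
        self.disks = [{}, {}]
        self.failed = [False, False]

    def write(self, block, data):
        """Write the block to every disk that is still working."""
        for disk, failed in zip(self.disks, self.failed):
            if not failed:
                disk[block] = data

    def read(self, block):
        """Read from the first working disk, switching over if one has failed."""
        for disk, failed in zip(self.disks, self.failed):
            if not failed:
                return disk[block]
        raise IOError("both disks have failed")


pair = MirroredPair()
pair.write(0, "customer record")
pair.failed[0] = True          # simulate the first drive failing
survivor_value = pair.read(0)  # served from the second disk
```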
Section 3: Data Management and
Data Protection
2.4. High-Availability
High Availability
• What Is High Availability?
• High availability means minimal downtime for applications to
ensure business continuity. It must be resilient to
unexpected hardware and software failure and deal with
problems such as disaster recovery.
The Advantages and
Disadvantages
• The Advantages of high availability:
• Reliability (alternate path)
• Performance - multiple levels of redundancy reduce
downtime
• Integrity - No single points of failure
HA Applications
• Redundant and Failover Paths
• Clustering
High Availability – Failover Path
• Zero Downtime
• Failover features
[Figure: HDS 9910]
High Availability - Clustering
Clustering: a collection of the same
kind of objects
Benefits:
• Functions as a single system in case
of failure
• High availability
• Reliability
• More servers can share the SAN
• Promotes remote disaster recovery
capabilities
Types of cluster topologies:
1. 1+1 Cluster: often used with 2
nodes
2. N+1 Cluster: there is one
standby server for all the
servers in the SAN
environment
3. N+N Cluster: there is a
standby for each server in the
environment.
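The difference between the three topologies comes down to how many standby servers are needed for a given number of active servers. A small sketch, with a hypothetical `standby_nodes` helper:

```python
def standby_nodes(topology, active):
    """Number of standby servers each topology needs for `active` servers."""
    if active < 1:
        raise ValueError("need at least one active server")
    if topology == "1+1":
        if active != 1:
            raise ValueError("a 1+1 cluster has exactly one active server")
        return 1
    if topology == "N+1":
        return 1       # one standby shared by all active servers
    if topology == "N+N":
        return active  # a dedicated standby for every active server
    raise ValueError(f"unknown topology: {topology}")


# Total servers needed to protect 4 active servers under each N-based topology.
totals = {t: 4 + standby_nodes(t, 4) for t in ("N+1", "N+N")}
```

N+1 is cheaper in hardware but can absorb only one failure at a time, while N+N doubles the server count.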
High Availability – Clustering
1 + 1 Clustering
High Availability - Clustering
N + N Clustering
High Availability – Clustering
N + 1 Clustering
Section 4
Section 4: Data Management for
Disaster Recovery and Business
Continuity
• What is Disaster Recovery?
The Goal
• The goal is to continue business operations after a loss of use
of all or part of a data center
Business Continuity Components
• Data Center Recovery Alternatives
• Backup recovery facilities
• Geographic diversity
• Backup and storage strategies
• Data file backup
• Software Backup
• Offsite storage
• Site relocation
• Post Disaster Communication
Terms in Disaster Recovery
• Recovery Point Objective (RPO)
• Point in time to which application data must be recovered to
resume business transactions
• Traditionally, RPO is in hours but newer technology shortens RPO to
Remote/Local
Backup and Recovery
Recovery Point Objective
[Figure: timeline in which the RPO spans from the backup to the disaster]
Remote/Local
Backup and Recovery
Recovery Time Objective
[Figure: timeline in which the RTO spans from the disaster to the recovery]
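The two timelines can be turned into a quick check of whether an incident stayed within its objectives. The timestamps and the 24-hour and 4-hour targets below are made-up values for illustration, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup, disaster):
    """Time between the last usable backup and the disaster (compare with RPO)."""
    return disaster - last_backup


def downtime(disaster, service_restored):
    """Time between the disaster and the resumption of service (compare with RTO)."""
    return service_restored - disaster


# Hypothetical incident.
last_backup = datetime(2004, 3, 5, 2, 0)
disaster = datetime(2004, 3, 5, 14, 30)
service_restored = datetime(2004, 3, 5, 20, 30)

rpo = timedelta(hours=24)  # assumed objective
rto = timedelta(hours=4)   # assumed objective

loss = data_loss_window(last_backup, disaster)
outage = downtime(disaster, service_restored)
meets_rpo = loss <= rpo
meets_rto = outage <= rto
```

In this made-up incident the nightly backup keeps the data loss within the RPO, but the six-hour restore misses the RTO.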
Tier Levels of Disaster Recovery
Solutions
• Tier 0 - No off-site data
• Tier 0 Disaster Recovery solutions have no Disaster
Recovery Plan.
• Tier 1 - Data backup with no Hot-Site
• Businesses that use Tier 1 Disaster Recovery solutions back
up their data at an off-site facility (PTAM - Pickup Truck
Access Method).
• Depending on how often backups are made, they are
prepared to accept several hours to days of data loss, but
their backups are secure off-site. However, this Tier lacks
the systems on which to restore data.
• Tier 2 - Data Backup with a Hot-site
• Make regular backups on tape.
• This is combined with an off-site facility and infrastructure
(known as a hot-site) in which to restore systems from those
tapes in the event of a disaster.
• This Tier of solution will still result in the need to recreate
several hours to days' worth of data, but its recovery time
is more predictable.
• Tier 4 - Point-in-time Copies
• Used by businesses who require both greater data currency
and faster recovery than users of lower Tiers.
• Incorporate more disk based solutions.
• Several hours of data loss is still possible, but it is easier to
make such point-in-time copies with greater frequency than
data can be replicated in the lower tiers.
• Tier 5 - Transaction Integrity
• Tier 5 solutions are used by businesses with a requirement
for consistency of data between production and recovery
data centers (software two site, two phase commit).
• There is little to no data loss in such solutions, however the
presence of this functionality is entirely dependent on the
application in use.
• Tier 6 - Zero or little data loss
• Maintain the highest levels of data currency.
• Used by businesses with little or no tolerance for data loss
and who need to restore data to applications rapidly.
• These solutions have no dependence on the applications to
provide data consistency.
• Tier 7 - Highly automated, business integrated solution
• Include all the major components being used for a Tier 6
solution with the additional integration of automation.
• This allows a Tier 7 solution to ensure consistency of data
above that which is granted by Tier 6 solutions.
• Additionally, recovery of the applications is automated,
allowing for restoration of systems and applications much
faster and more reliably than would be possible through
manual Disaster Recovery procedures.
Disaster Recovery Solutions
• Important SAN solutions for DR are:
Remote Data Mirroring
• What is Data Mirroring?
• Method of replicating data: having more than one copy of data.
• Benefits
• Reduces downtime after a disaster
• Multiple copies are available.
[Figure: Data Mirroring. Servers at the primary data center mirror data over the SAN to the remote DR center]
Remote Data Replication
Method of periodically replicating specific data or file systems.
[Figure: Data Replication over the WAN]
Electronic Tape Vaulting
Enables remote tape backup through the WAN
[Figure: Tape Vaulting. A backup server, router, and tape at each site, linked over the WAN]
Class Discussion
Team A
• Describe data management concepts (Backup & Recovery,
Information Lifecycle)
Team B
• Given a scenario, identify the advantages and disadvantages of
replication, snapshot and split mirror disk backup techniques
Team C
• Identify how the emergence of SMI-S policy-based management can
be an advantage to data and storage management
References
• SNIA – Data Management Forum: Vision & Directions
• SNIA DMF ILM-Initiative
• The IBM TotalStorage Solutions Handbook