
Course: 7043T - IT Services

Period: February 2018

Storage Management
Backup & Recovery Management
Session 6
D5664 – Dr. Eng. Antoni Wibowo
Storage Management

• Enterprise storage is the field of information technology focused on
the storage, protection, and retrieval of data in large-scale
environments.
• Disk mirroring is the simultaneous writing of data to two or more disks
in real time to provide 100% redundancy of the storage media.
• Replication is the duplication of stored data over an extended distance.
• Backup refers to copying data files to separate media to protect the
data and to facilitate file recovery.
• Archiving is the practice of keeping infrequently used data off-line in an
organized structure for ease of location and retrieval.
• Disaster recovery is a comprehensive plan, together with a redundant computer
infrastructure, used to protect data from localized disasters.
Storage management processes

• Capacity
• Performance
• Reliability
• Recoverability

Definition of Storage Management:

Storage management is a process used to optimize
the use of storage devices and to protect the
integrity of data on any media on which it resides.
Storage Management
Capacity

• Providing sufficient data storage to authorized
users at a reasonable cost
• Alternative storage
• Utilization of large-capacity storage devices
• Monitoring disk space usage
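The last capacity task, monitoring disk space usage, is easy to automate. The following is a minimal Python sketch, assuming a simple percentage threshold; the 85% value and the list of watched mount points are hypothetical and not taken from any particular storage management tool.

```python
import shutil

# Hypothetical alert threshold: flag a volume when more than 85% of it is used.
USAGE_THRESHOLD = 0.85


def check_volume(path: str, threshold: float = USAGE_THRESHOLD) -> tuple[float, bool]:
    """Return (fraction_used, over_threshold) for the file system holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (bytes)
    fraction_used = usage.used / usage.total
    return fraction_used, fraction_used > threshold


if __name__ == "__main__":
    for mount in ["/"]:  # hypothetical list of volumes to watch
        used, alert = check_volume(mount)
        print(f"{mount}: {used:.1%} used{' (ALERT)' if alert else ''}")
```

In practice the result would feed event or capacity management (for example, by raising a ticket) rather than just being printed.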
Storage Management
Performance

• Performance considerations at processor side
– Size and type of processor main memory
– Number and size of buffers
– Size of swap space
– Number and type of channels
– Device controller configuration
– Logical volume groups
– Amount of disk array cache memory
– Storage area network (SAN)
– Network attached storage (NAS)
Storage Management
Reliability

• Disk configuration
– Redundant array of inexpensive disks (RAID)
• Storage management recoverability
– Backup
– Recovery
SAN
• A SAN fabric is a network of Fibre Channel devices
interconnected by Fibre Channel Switched (FC-SW)
technology.
• Fabrics are typically subdivided by Fibre Channel zoning,
which prevents individual servers from accessing storage
they are not allowed to use.
• Each fabric has a Simple Name Server (SNS) that manages port
logins and plays a role in the zoning process.
• Switches within a fabric can be connected with one another
using one or more ISLs (Inter-Switch Links) to expand the
fabric. Because many ports share each ISL, ISLs are usually
oversubscribed.
Zoning
• Fibre Channel zoning segments servers and storage
into isolated groups, so that a server cannot access
unauthorized storage and its own storage cannot be
accessed by other servers.
There are two main methods of zoning, hard and soft,
and two sets of attributes, name and port.
• Soft zoning restricts access to storage through access permissions stored in
the switch’s Simple Name Server. However, soft zoning is not hardware-enforced:
if a host already knows the address of a device, it can still access it.
• Hard zoning restricts actual communication across a fabric via hardware
routing within the switch. Unauthorized access to storage is not permitted.
• Port zoning restricts ports from talking to unauthorized ports by specifying
exactly which port numbers are allowed within the zone.
• Name zoning restricts access by World Wide Name, which is managed by the
Simple Name Server.
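The zoning rule itself is a simple membership test: two ports may communicate only if some zone contains both. A minimal Python sketch of name (WWN) zoning follows; the zone names and WWNs are hypothetical, and the sketch does not model how soft or hard zoning is actually enforced inside a switch.

```python
# Hypothetical World Wide Names for two hosts and two storage ports.
HOST_DB = "10:00:00:00:c9:aa:aa:01"
HOST_WEB = "10:00:00:00:c9:aa:aa:02"
ARRAY_DB = "50:06:01:60:bb:bb:00:01"
ARRAY_WEB = "50:06:01:60:bb:bb:00:02"

# Hypothetical zone database: zone name -> set of member WWNs.
ZONES: dict[str, set[str]] = {
    "zone_db": {HOST_DB, ARRAY_DB},
    "zone_web": {HOST_WEB, ARRAY_WEB},
}


def can_communicate(wwn_a: str, wwn_b: str,
                    zones: dict[str, set[str]] = ZONES) -> bool:
    """Name zoning: allowed only if at least one zone contains both WWNs."""
    return any(wwn_a in members and wwn_b in members for members in zones.values())


if __name__ == "__main__":
    print("db host -> db array:", can_communicate(HOST_DB, ARRAY_DB))
    print("db host -> web array:", can_communicate(HOST_DB, ARRAY_WEB))
```

Port zoning would use the same check with switch port numbers in place of WWNs.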
Networking
[Figure: high-availability SAN fabric. Two hosts connect through three switches to two storage arrays.]
• This is a high-availability fabric topology, with no single point of
failure between host and storage.
• An ISL (Inter-Switch Link) allows switch-to-switch traffic. It allows
a fabric to expand, increasing the overall count of (lower-performance)
FC ports.
NAS
• Network-attached storage (NAS) systems are storage
devices that can be accessed over a standard Ethernet
network.
• A NAS device provides logical file system storage to
clients on a local area network.
• NAS was developed to address problems with the
complexity and cost associated with SAN-based storage
devices.
• NAS appliance pricing is typically significantly lower than
equivalent SAN storage because NAS uses commodity Ethernet
networking.
RAID

• RAID, which stands for redundant array of inexpensive disks (later
known as redundant array of independent disks), is a system that combines
multiple hard drives so that they appear as a single logical disk to a server.
• Most RAID levels combine multiple hard drives with a mirroring or parity
(error-correction) mechanism to protect data from individual disk failures.
• RAID defines multiple levels, each with different protection and performance
characteristics (e.g., 100% protection, maximum performance,
high-speed image transfers, good overall read/write performance).
• Six levels of RAID were originally specified. Today there are well over a
dozen different combinations of the original six levels, modifications of
the original levels, and vendor-proprietary RAID definitions.
• One or more user-definable RAID levels are at the heart of most modern
storage subsystems.
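The parity mechanism used by RAID 3 and RAID 5 is a byte-wise XOR: the parity block is the XOR of the data blocks in a stripe, so any single missing block equals the XOR of all the surviving blocks. A minimal Python sketch, assuming tiny equal-sized blocks with hypothetical contents:

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks in a stripe must have the same size")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes]) -> bytes:
    """Recover one lost block as the XOR of every surviving block (data + parity)."""
    return xor_blocks(surviving)


data = [b"DISK", b"FAIL", b"SAFE"]  # hypothetical data blocks on three disks
parity = xor_blocks(data)          # dedicated disk in RAID 3, rotated in RAID 5

# Simulate losing disk 1, then rebuild it from the other disks plus parity.
recovered = rebuild_missing([data[0], data[2], parity])
print(recovered)  # b'FAIL'
```

Because XOR can only solve for one unknown, single parity survives exactly one disk failure per stripe.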
RAID Levels
The most commonly implemented RAID levels
include:
• RAID 0: Striped, without data protection (distinct from JBOD, Just a Bunch of Disks, which concatenates drives without striping)
• RAID 1: Mirrored (100% redundancy)
• RAID 3: Striped data (dedicated parity disk)
• RAID 5: Striped (parity evenly distributed across disks)
• RAID 6: Striped data with two parity blocks per stripe, distributed across disks (survives two disk failures)

Common nested RAID levels:
• RAID 01: A mirrored set of striped disks
• RAID 10: A striped set of mirrored disks
• RAID 30: A stripe across dedicated-parity (RAID 3) sets
• RAID 100: A stripe of a stripe of mirrors
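A practical consequence of each level's redundancy scheme is how much raw capacity remains usable. The Python sketch below assumes equal-sized disks and ignores formatting, hot spares, and vendor overhead; the minimum disk counts are the usual ones for each level.

```python
# Usual minimum number of disks per level (RAID 10/01 also need an even count).
MIN_DISKS = {"0": 2, "1": 2, "3": 3, "5": 3, "6": 4, "10": 4, "01": 4}


def usable_capacity(level: str, disks: int, disk_size_tb: float) -> float:
    """Approximate usable capacity of `disks` equal-sized drives at a RAID level."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == "0":            # pure striping: every disk holds data
        return disks * disk_size_tb
    if level == "1":            # every disk holds a full copy
        return disk_size_tb
    if level in ("3", "5"):     # one disk's worth of capacity holds parity
        return (disks - 1) * disk_size_tb
    if level == "6":            # two disks' worth of capacity holds parity
        return (disks - 2) * disk_size_tb
    # RAID 10 / 01: half of the disks are mirror copies.
    if disks % 2:
        raise ValueError(f"RAID {level} needs an even number of disks")
    return disks // 2 * disk_size_tb


if __name__ == "__main__":
    for level in ("0", "1", "5", "6", "10"):
        n = 2 if level == "1" else 6
        print(f"RAID {level:>2}, {n} x 4 TB: {usable_capacity(level, n, 4.0):.0f} TB usable")
```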
RAID configuration metrics include:
• Failure rate
• Mean time to data loss (MTTDL)
• Mean time to recovery (MTTR)
• Unrecoverable bit error rate (UBE)
• Atomic write failure
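MTTDL and MTTR are linked. For a single-parity (RAID 5) group of N disks, data is lost only if a second disk fails while the first is being rebuilt. First failures occur at a rate of about N / MTTF, and the chance that one of the other N - 1 disks fails within the repair window is about (N - 1) × MTTR / MTTF, giving the classic approximation MTTDL ≈ MTTF² / (N × (N - 1) × MTTR). The Python sketch below assumes independent, exponentially distributed disk failures and MTTR much smaller than MTTF; the input values are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def mttdl_raid5(mttf_hours: float, disks: int, mttr_hours: float) -> float:
    """Classic MTTDL approximation for one single-parity RAID group.

    Assumes independent, exponentially distributed disk failures and
    MTTR much smaller than MTTF. Ignores unrecoverable bit errors.
    """
    if disks < 2:
        raise ValueError("a parity group needs at least two disks")
    return mttf_hours ** 2 / (disks * (disks - 1) * mttr_hours)


if __name__ == "__main__":
    # Hypothetical inputs: 1,000,000-hour disk MTTF, 5 disks, 24-hour rebuild.
    hours = mttdl_raid5(1_000_000, 5, 24)
    print(f"MTTDL ~ {hours:.3g} hours (~{hours / HOURS_PER_YEAR:,.0f} years)")
```

Halving MTTR doubles MTTDL, which is why fast rebuilds matter. Because the formula ignores read errors during rebuild (the UBE metric), real-world MTTDL is usually lower.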


DAS and SAN

• Direct Attached Storage (DAS) refers to digital
storage directly attached to a server or
workstation.

• Storage area network (SAN) is a network
designed to connect computer storage, such as
independent disk subsystems and tape
libraries, to servers.
NAS - SAN
• NAS uses file-level access protocols such as SMB/CIFS or
NFS. A remote server presents its storage to other
systems and allows it to be “mounted” or “mapped” into the
target server’s existing file system, giving the appearance
of additional local storage. File read and
write requests are “redirected” to the remote server’s
storage, transparently to the target system.

• Fibre Channel SAN storage uses the SCSI protocol for
communication between servers and devices. Storage
transfers are done at the “block” level and rely on a low-
level, highly efficient protocol for minimum overhead.
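The difference between file-level (NAS) and block-level (SAN) access is what the client asks for: a named file, or a numbered fixed-size block. The Python sketch below contrasts the two, assuming a 512-byte logical block and a Unix system (`os.pread` is not available on Windows); a temporary file stands in for a LUN or raw block device.

```python
import os
import tempfile

BLOCK_SIZE = 512  # assumed logical block size in bytes


def read_file(path: str) -> bytes:
    """File-level access (NAS style): request a file by name; the file server
    decides where its bytes live."""
    with open(path, "rb") as f:
        return f.read()


def read_block(fd: int, lba: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Block-level access (SAN style): request block number `lba`; the host's
    own file system, not the storage device, knows which file it belongs to."""
    return os.pread(fd, block_size, lba * block_size)


if __name__ == "__main__":
    # A temporary file stands in for a LUN; on a real host this would be a raw device.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE)
        path = tmp.name
    fd = os.open(path, os.O_RDONLY)
    try:
        print("block 1 starts with", read_block(fd, 1)[:4])
        print("whole file is", len(read_file(path)), "bytes")
    finally:
        os.close(fd)
        os.remove(path)
```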
NAS - SAN
• SANs are normally built on a specialized network infrastructure
specifically designed to handle storage communications.
• While SAN technology is usually considered to be a Fibre Channel
fabric network using the SCSI command set, it can also be
built as a network using TCP/IP over Ethernet.
• One protocol designed to create a SAN over Ethernet is iSCSI which
uses the same SCSI command set over TCP/IP.
• FCP, FC-IP, iFCP, and SAS are common protocols used in a SAN.
• SAN connections include one or more servers (hosts) and one or
more disk arrays, tape libraries, or other storage devices.
Tape
• Tape library (sometimes called a tape silo or tape jukebox) is a
large storage device which contains one or more tape drives, a
number of slots to hold tape cartridges, a barcode reader to
identify tape cartridges and an automated method / robot for
loading tapes.
• An Autoloader is a smaller data storage device consisting of at
least one tape drive (the drive), a method of loading tapes into the
drive (the robot), and a storage area for tapes (the magazine).
• Other types of autoloaders may operate with optical disks and
CD-ROMs.
• A tape drive is a peripheral device that reads and writes data
stored on a magnetic tape reel or cartridge. It may be operated in
streaming or start/stop mode, with or without data compression
turned on.
Magnetic - Optical
• Magnetic tape is a non-volatile storage medium consisting of a magnetic
oxide coating on a thin plastic strip.
• Magneto-optical disk and optical tape storage use many of the same
concepts as magnetic storage, but are not as common as magnetic tape.
• Optical recording media are used primarily for Write-Once-Read-Many
(WORM) capabilities, or where media deterioration over time is a concern.

Tapes and drives come in various formats. These formats include:
– Digital Data Storage (DDS)
– Digital Linear Tape (DLT)
– Linear Tape-Open (LTO)
– Advanced Intelligent Tape (AIT)
– Quarter Inch Cartridge (QIC)
Primary-Secondary-
Nearline-Offline
• Primary storage is internal memory that is accessible to
the central processing unit without the use of the
computer’s input/output channels.
• Secondary storage is memory that is not directly
attached to the central processing unit of a computer,
requiring the use of computer's input/output channels.
Secondary storage is used to retain data that is not in
active use.
• Near-line storage is a storage medium that can be
recalled without manual intervention, but usually at the
cost of a significant delay (e.g., direct data
retrieval from a tape library or optical jukebox).
• Off-line storage is a computer storage medium which
must be inserted into a storage drive by a human
operator before a computer can access the information
stored on the medium.
Possible Infrastructure Service Interconnections
with Storage Management:

• Configuration Management
• Event Management
• Availability Management
• Performance and Capacity Management
• Operations Management
• Network Management
• Security Management
• Inventory
• Business Process Management
Service Interconnections with Storage
Management
Possible Relationship Service Interconnections with Storage
Management include:

• Reporting Management
• SLA Management
• Knowledge Management
• Asset Management
• Notification and Escalation Management
• Problem Management
• Change Management
Storage Management Tools (a sampling)

– Symantec (Veritas) NetBackup
– IBM’s Tivoli Storage Manager
– IBM’s Metro Mirror (PPRC)
– Computer Associates BrightStor ARCserve
– Hewlett-Packard’s Data Protector
– EMC (Legato) NetWorker
– EMC’s Data Manager
– Network Appliance’s NearStore
Backup & Recovery Management
Backup and Recovery
Management
• Backup, in computer engineering, refers to copying
data to separate media to facilitate the recovery of
lost or damaged files and to protect the organization
from a major disaster.

• Data recovery is the process of salvaging data from
damaged, failed, wrecked, or inaccessible storage
media when it cannot be accessed normally.

• Data backup is done on a defined schedule that
ensures the recovery process meets the
requirements of the business.
Backup & Recovery Management
Session

BACKUP MANAGEMENT
Backup strategies

Backup strategies include:

• Snapshot Backup
• Full Backup
• Differential Backup
• Incremental Backup
• Continuous Data Protection (CDP)
• Disk Mirroring
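Full, differential, and incremental backups differ only in which files they select. The Python sketch below assumes that changes are detected by comparing file modification times with the time of earlier backups; real products may instead use archive bits or change journals, and the file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    path: str
    modified: float  # last-modified time, e.g. seconds since the epoch


def select_files(files: list[FileRecord], kind: str,
                 last_full: float, last_backup: float) -> list[str]:
    """Return the paths a backup of the given kind would copy.

    last_full   -- time of the most recent full backup
    last_backup -- time of the most recent backup of any kind
    """
    if kind == "full":
        # Everything, regardless of when it changed.
        return [f.path for f in files]
    if kind == "differential":
        # Everything changed since the last FULL backup (grows each day).
        return [f.path for f in files if f.modified > last_full]
    if kind == "incremental":
        # Only what changed since the last backup of ANY kind.
        return [f.path for f in files if f.modified > last_backup]
    raise ValueError(f"unknown backup kind: {kind}")


if __name__ == "__main__":
    files = [FileRecord("a.doc", 5), FileRecord("b.xls", 15), FileRecord("c.db", 25)]
    for kind in ("full", "differential", "incremental"):
        print(kind, select_files(files, kind, last_full=10, last_backup=20))
```

The trade-off shows up at restore time: a differential scheme needs the last full backup plus the latest differential, while an incremental scheme needs the last full backup plus every incremental taken since.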
Backup issue
considerations
Backup issue considerations:

• Backup time to completion
• Multiple media backup
• Backup software
• Hardware considerations
• Application/Database Status
• Backup Window
• Backup Resources
• Data Validation
Six primary metrics of data
backup
There are six primary metrics relating to data backup:
– Recovery Point Objective (RPO)
– Backup Window
– Restore Time
– Retention Time
– Backup Validation
– Open File Backup
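Retention time determines when a backup may be discarded. A minimal Python sketch of a retention check follows, assuming a single age-based policy and an extra safety rule (hypothetical, not from the source) that the newest backup is never expired.

```python
from datetime import datetime, timedelta


def apply_retention(backups: list[datetime], now: datetime,
                    retention: timedelta) -> tuple[list[datetime], list[datetime]]:
    """Split backup timestamps into (keep, expire).

    Backups older than `retention` expire, except the newest one, which is
    always kept so that at least one restore point survives.
    """
    if not backups:
        return [], []
    newest = max(backups)
    keep = sorted(b for b in backups if now - b <= retention or b == newest)
    expire = sorted(b for b in backups if now - b > retention and b != newest)
    return keep, expire


if __name__ == "__main__":
    now = datetime(2018, 2, 28)
    backups = [datetime(2018, 2, d) for d in (1, 10, 20, 27)]
    keep, expire = apply_retention(backups, now, timedelta(days=14))
    print("keep:", [b.date().isoformat() for b in keep])
    print("expire:", [b.date().isoformat() for b in expire])
```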


Backup and Recovery
Management

Different roles of data backups

Backup procedures

Recovery strategy

Validation and Verification


Types of backups

There are primarily six different types of backups for
online and offline methodologies:

• Full backup
• Incremental backup
• Differential backup
• Mirroring
• Snapshots
• CDP
Backup Window
Space Requirements
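Both the backup window and the space requirement can be estimated with simple arithmetic. The Python sketch below assumes a sustained throughput, decimal units (1 GB = 1000 MB), one full backup plus a number of daily incrementals per cycle, and a constant daily change rate; every input is hypothetical.

```python
def backup_duration_hours(data_gb: float, throughput_mb_s: float) -> float:
    """Hours needed to copy data_gb at a sustained throughput."""
    return data_gb * 1000 / throughput_mb_s / 3600


def fits_window(data_gb: float, throughput_mb_s: float, window_hours: float) -> bool:
    """True if the backup completes within the available backup window."""
    return backup_duration_hours(data_gb, throughput_mb_s) <= window_hours


def weekly_space_gb(full_gb: float, daily_change_rate: float,
                    incrementals: int = 6, compression_ratio: float = 1.0) -> float:
    """Media space for one full backup plus N daily incrementals."""
    raw = full_gb + incrementals * daily_change_rate * full_gb
    return raw / compression_ratio


if __name__ == "__main__":
    # Hypothetical site: 3.6 TB of data, 100 MB/s to tape, 8-hour nightly window.
    print(f"full backup takes {backup_duration_hours(3600, 100):.1f} h;",
          "fits" if fits_window(3600, 100, 8) else "does not fit", "an 8 h window")
    print(f"one weekly cycle needs ~{weekly_space_gb(3600, 0.05, compression_ratio=2.0):.0f} GB")
```

When the full backup does not fit the window, the usual levers are faster media, more parallel streams, or moving to incremental or snapshot-based backups.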
Types of Damage

There are two types of Damage:

• Physical Damage

• Logical Damage
Techniques used by
repair programs
Techniques used by these repair programs include:

• Consistency checking

• File System Structure Analysis

• Troubleshooting

• Component repair or replacement


Data compression

• Data compression, also called source
coding, is the process of encoding
information using fewer bits (or
other information-bearing units) than
an unencoded representation would use,
through specific encoding schemes.
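Run-length encoding is one of the simplest such encoding schemes: a run of identical bytes is stored once together with its length. The Python sketch below is a lossless illustration only; backup software and tape drives generally use more capable algorithms (for example, LZ-family compressors such as the one in Python's `zlib` module).

```python
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Encode bytes as (byte_value, run_length) pairs."""
    runs: list[tuple[int, int]] = []
    for byte in data:
        if runs and runs[-1][0] == byte:
            runs[-1] = (byte, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((byte, 1))              # start a new run
    return runs


def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    """Expand (byte_value, run_length) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for value, count in runs)


if __name__ == "__main__":
    sample = b"\x00" * 20 + b"DATA" + b"\x00" * 20  # padding-heavy data compresses well
    runs = rle_encode(sample)
    print(len(sample), "bytes ->", len(runs), "runs")
    assert rle_decode(runs) == sample  # lossless round trip
```

Data with long runs (padding, blank regions) compresses well; data with few repeats can grow, since each byte becomes a (value, count) pair.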
Storage Devices

Storage Devices include:

• Hard disk drive
– SCSI Disk
– ATA Disk
• Magnetic tape
• Magneto-optical and optical tape storage
• Optical disc
• WORM
Backup & Recovery Management
Session

RECOVERY MANAGEMENT
Disasters
Disasters are the result of an unforeseen natural event (a physical
event, e.g., a fire, tornado, hurricane, flood, or earthquake) or the
consequences of human error.

The current data protection market is characterized by several
factors:

• The loss incurred by having data unavailable
• Recovery time frame
• Business continuity strategy (partial or full restoration)
• Level of data protection required by the business
Common enterprise risks
Common enterprise risks include:

– Fire
– Natural Causes (wind, earthquake, ice storm, etc.)
– Power or Communications Outages
– War and Regional Conflicts
– Terrorist Attacks
– Civil Disruptions
– System and/or Equipment Outages
– Human Error
– Computer Viruses
– Governmental or Legal Intervention
– Loss of key personnel
Prevention Against
Disasters
Measures for preventing disasters include:
• Offsite Backups
• Surge Protectors
• Uninterruptible Power Supply (UPS)
• Emergency generators
• Fire prevention systems
• Anti-virus software
• Redundant computing facilities
Important factors in a disaster
recovery plan
A good disaster recovery plan takes the
following important factors into account:
• Customers
• Facilities
• Knowledge Workers
• Business Information
• Security of data
• Classification of data for staged recovery
Disaster Recovery
Disaster Recovery Process:

– Buy new equipment (hardware), repair damaged equipment, or remove
viruses, etc.
– Call the software provider and reinstall software
– Retrieve backup discs from offsite storage
– Reinstall all data from the backup source
– Re-enter data from the previous week or since the latest copy

Disaster Recovery Technology:

– Virtual tape library
– Synchronous replication software and technology
– Virtual PBX/hosted phone service
Key Backup and Recovery
Terms

Key Backup and Recovery Terms:

• Recovery Point Objective (RPO)

• Recovery Time Objective (RTO)
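RPO is the maximum amount of data loss, measured in time, that the business can tolerate; RTO is the maximum time it can tolerate before service is restored. After an incident, both can be checked directly. The Python sketch below assumes the data lost equals everything written between the last good backup and the failure; the timestamps and objectives are hypothetical.

```python
from datetime import datetime, timedelta


def evaluate_incident(last_good_backup: datetime, failure_time: datetime,
                      service_restored: datetime, rpo: timedelta,
                      rto: timedelta) -> dict:
    """Compare actual data loss and downtime with the RPO and RTO."""
    data_loss = failure_time - last_good_backup   # work done since the last backup is lost
    downtime = service_restored - failure_time    # how long the service was unavailable
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }


if __name__ == "__main__":
    # Hypothetical incident: nightly backup at 02:00, failure at 14:00, restored at 18:00.
    result = evaluate_incident(
        last_good_backup=datetime(2018, 2, 1, 2),
        failure_time=datetime(2018, 2, 1, 14),
        service_restored=datetime(2018, 2, 1, 18),
        rpo=timedelta(hours=24),
        rto=timedelta(hours=4),
    )
    print(result)
```

Tightening the RPO means backing up or replicating more often; tightening the RTO means faster restore paths, such as disk-based copies or standby systems.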


BCP

• Business Continuity Planning (BCP) is a
methodology used to create a plan describing
how an organization will resume, partially or
completely, critical functions that were
interrupted, within a predetermined time
following a disaster or disruption.
Impact analysis

BCP manual consists of:

1. Impact analysis

Recovery requirements consist of the following information:

• Resolution time frame
• Business requirements
• Technical requirements
Threat analysis
BCP manual consists of:

2. Threat analysis - Common threats include the following:

• Disease
• Earthquake
• Fire
• Flood
• Cyber attack
• Hurricane
• Utility outage
• Terrorism
Definition of impact
scenarios
BCP manual consists of:

3. Definition of impact scenarios


Note: One paper from Deloitte and Touche lists the phases as:

a) Emergency response (protection of life and safety)
b) Disaster assessment (identify the scope and criticality of the
disaster)
c) Short-term recovery (restore critical services)
d) Long-term recovery (restore all services and capabilities)
e) Return to pre-disaster operations (fall-back process to
return services to the primary data center) (RC)
Recovery requirement
documentation
Recovery requirement documentation may include information on:
• Desks (interim floor plans and user equipment needs)
• Manual workaround solutions
• Recovery personnel (short term, long term, specialists,
etc.)
• Application and data (All, or categorized by value to the
business)
• Maximum outage allowed for the applications
• Peripheral requirements
• Unique business environment requirements

Disaster Recovery Plan (DRP) and BCP are sometimes used synonymously.


Disaster Recovery Plan Purpose and Objectives
Hardware redundancy

The following hardware redundancy is recommended:

– Failover or clustered processors
– Redundant array of inexpensive disks (RAID) devices
– Dual access paths
– Dual I/O controllers
– Dual power supplies
– Uninterruptible power supply (UPS)
Nature of failures

Types of failure include:

– Human factor
– Hardware failure
– Transaction failure
– Disaster
Outages

Outages are classified into two categories:

• Planned outages

• Unplanned outages
Availability
• Availability is a measure of the time that a server or process is functioning
normally, as well as a measure of the time the recovery process requires after a
component failure.

• High availability roughly means that a system and its data are available almost all the
time: 24 hours a day, 7 days a week, 365 days a year.
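Availability is usually quantified with the steady-state formula A = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to recovery. The Python sketch below computes it and converts availability targets into expected downtime per year, assuming a 365-day year and counting only unplanned outages; the MTBF and MTTR inputs are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the component is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected downtime per year for a given availability."""
    return (1 - avail) * MINUTES_PER_YEAR


if __name__ == "__main__":
    print(f"MTBF 999 h, MTTR 1 h -> {availability(999, 1):.3%} available")
    for target in (0.99, 0.999, 0.9999, 0.99999):
        print(f"{target:.3%} available -> {downtime_minutes_per_year(target):,.1f} min/year down")
```

The formula shows that availability can be raised either by failing less often (higher MTBF) or by recovering faster (lower MTTR), which is what the redundancy levels listed next aim at.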
High Availability
Levels of availability

The five levels of availability are:

Level 1: Basic systems, no redundancy
Level 2: RAID x, disk redundancy
Level 3: Failover, component redundancy
Level 4: Replication, data redundancy
Level 5: Disaster recovery, site redundancy
Tiers of disaster recovery
The tiers of disaster recovery are listed below (the original scheme defined seven
tiers, Tier 0 through Tier 6; Tier 7 was added later):

– Tier 0 - No off-site data
– Tier 1 - Data backup with no hot site
– Tier 2 - Data backup with a hot site
– Tier 3 - Electronic vaulting
– Tier 4 - Point-in-time copies
– Tier 5 - Transaction integrity
– Tier 6 - Zero or little data loss
– Tier 7 - Highly automated, business-integrated
solution
Backup & Recovery Management
session
Business Continuity Planning
Business continuity facts

Facts about business continuity include:

– Traditional 72-hour recovery periods for business-critical processes are no
longer good enough.
– Recovery time and recovery point objectives of 4 to 24 hours are now
generally used.
– There is a larger goal of ensuring the resumption and recovery of end-to-end
enterprise business processes.
– Active/passive configurations between two sites provide 30-60 minute recovery.
– 24x7 continuous availability is being designed into most critical applications.
– Geographic diversity is imperative.
Cost versus Loss
Business Continuity
solution design
A suitable business continuity solution design asks the
following questions:
• How much data can the organization afford to lose?
• What is the organization’s recovery point objective
(RPO)?
• How long can the organization afford to have the
system offline?
• What is the organization’s recovery time objective (RTO)?
Business Continuity
solution design

Factors contributing to higher costs include:

• A more complex IT operating environment, resulting from exponential growth of
storage capacity and diversification of operating systems (strong growth of
Windows NT® in the last two decades and the emergence of Linux® in the
late 1990s).
• The new era of e-business requires solutions operating 24 hours a day, 7
days a week, 365 days a year (24x7x365), with more global
information exchange between dispersed sites.
• Digital data continues to expand dramatically, and more of it is critical to the
business.
• Increased complexity from new data protection and regulatory requirements.
Business Continuity factors
For a Business Continuity solution design, the following factors need to be
considered:
– Categorize requirements by value to the business.
– Identify critical applications and data.
– Determine the cost of downtime.
– Develop a solution with both need and cost in mind.
– Estimate implementation time.
– Provide for periodic testing.
– For disaster recovery, ensure compliance with the data center disaster recovery strategy
within the overall corporate business continuity objectives.
– Which of the disaster recovery tiers should the organization follow?
– For high availability, which of the five levels of availability should the organization achieve?
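Determining the cost of downtime and developing a solution "with need and cost in mind" amounts to a cost-versus-loss comparison: each option's annual cost plus its expected downtime multiplied by the cost of downtime per hour. A minimal Python sketch, assuming that model; the option names, prices, and downtime figures are hypothetical, and data-loss, reputational, and regulatory costs are ignored.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    annual_cost: float              # cost of running the solution per year
    expected_downtime_hours: float  # expected outage hours per year


def total_annual_cost(option: Option, downtime_cost_per_hour: float) -> float:
    """Solution cost plus expected loss from downtime."""
    return option.annual_cost + option.expected_downtime_hours * downtime_cost_per_hour


def cheapest(options: list[Option], downtime_cost_per_hour: float) -> Option:
    """Option with the lowest combined cost and expected loss."""
    return min(options, key=lambda o: total_annual_cost(o, downtime_cost_per_hour))


if __name__ == "__main__":
    options = [  # hypothetical figures
        Option("tape backup, no hot site", 20_000, 72),
        Option("backup with hot site", 120_000, 8),
        Option("synchronous replication", 400_000, 0.5),
    ]
    for rate in (1_000, 5_000, 100_000):
        print(f"downtime at ${rate:,}/h -> {cheapest(options, rate).name}")
```

The best choice shifts toward more expensive, higher-tier solutions as the cost of downtime rises.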
Business continuity
planning
When planning for business continuity, you should consider the
following:

– Information-based business model.
– Transaction-based (versus batch-based) processing.
– Distributed work environment.
– People-based business.
Infrastructure Service Interconnections with
Backup and Recovery Management
Possible Infrastructure Service Interconnections with Backup and
Recovery Management include:

• Configuration Management
• Event Management
• Operations Management
• Availability Management
• Performance and Capacity Management
• Network Management & Security Management
• Inventory
• Business Process Management
• Resource Management (w/ Utility Computing)
Relationship Service Interconnections with
Backup and Recovery Management

Possible Relationship Service Interconnections with Backup and
Recovery Management include:

• Reporting Management
• SLA Management
• Knowledge Management
• Asset Management
• Notification and Escalation Management
• Problem Management
