
Data Center Disaster Recovery

Data Center Evolution

[Timeline figure, 1960–2010: mainframes with terminals (1960s); client/server computing and thin clients as networking evolves (TCP/IP, HTTP); the networked data center phase with Internet computing and content networking; and the data center consolidation/virtualization phase (1. consolidation, 2. integration, 3. virtualization, 4. high availability), leading toward data center continuous availability.]

Today’s Data Center
Integration of Many Systems and Services

[Figure: a primary data center integrating a front-end network (WAN/Internet and MAN/Internet access, firewall, IDS, content switch, cache), N-tier applications (web servers, application servers, database servers, mainframe, IP communications) with application/server optimization and security, and storage networks (FC switches, VSANs, NAS, RAID, tape SAN). A resilient IP metro network (DWDM/SONET/Ethernet) connects it to a secondary DR data center. Recurring themes: scalable infrastructure, application and server optimization, data center security, distributed data centers, and data center operations.]
What Is a Distributed Data Center?

[Figure: a primary data center running App A and App B and a secondary data center running App A and App C, each with FC-attached storage, connected by data replication between the sites.]
Distributed Data Centers
• Required by disaster recovery and business continuance
• Avoids a single, concentrated data repository
• High availability of applications and data access
• Load balancing together with performance scalability
• Better response times and optimal content routing: proximity to clients

Front-End IP Access Layer
“Content Routing”: Site Selection

[Figure: clients are directed either to the primary data center (App A, App B) or to the secondary data center (App A, App C) by a site-selection mechanism at the front-end IP access layer.]

Application and Database Layer
“Content Switching”: Load Balancing
“Server Clustering”: High Availability

[Figure: within each data center, content switching load-balances requests across the application tier and server clustering provides high availability for the database tier; the primary (App A, App B) and secondary (App A, App C) data centers each have FC-attached storage.]
Back-End SAN Extension
“Storage” and “Optical”: Data Replication and Transport

[Figure: storage data is replicated between the FC SANs of the primary data center (App A, App B) and the secondary data center (App A, App C) over an extended optical transport network.]

Disaster Recovery
• Recovery of data and resumption of service
• Ensuring the business can recover and continue after a failure or disaster
• Ability of a business to adapt, change, and continue when confronted with various outside impacts
• Mitigating the impact of a disaster
Disaster Recovery
What It Means for Business


Disaster Recovery Planning

• Business Impact Analysis (BIA)
  Determines the impact of various disasters on specific business functions and company assets
• Risk Analysis
  Identifies important functions and assets that are critical to the company’s operations
• Disaster Recovery Plan (DRP)
  Restores operability of the target systems, applications, or computing facility at the secondary data center after the disaster
Disaster Recovery Objectives
• Recovery Point Objective (RPO)
  The point in time (prior to the outage) to which systems and data must be restored; the tolerable loss of data in the event of a disaster or failure
  Driven by the impact of data loss and the cost associated with that loss
• Recovery Time Objective (RTO)
  The period of time after an outage within which systems and data must be restored to the predetermined RPO; the maximum tolerable outage time
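The two objectives can be sanity-checked against a candidate protection strategy. The sketch below (Python, with made-up numbers; the function and its thresholds are illustrative assumptions, not part of the original material) treats the replication or backup interval as the worst-case data loss and the failover/restore duration as the outage time:

    # Minimal sketch: does a protection strategy meet the RPO/RTO targets?
    # Worst-case data loss is bounded by the replication/backup interval;
    # recovery time is the time needed to bring the secondary site online.
    def meets_objectives(replication_interval_min, recovery_duration_min,
                         rpo_min, rto_min):
        rpo_ok = replication_interval_min <= rpo_min   # tolerable data loss
        rto_ok = recovery_duration_min <= rto_min      # tolerable outage time
        return rpo_ok, rto_ok

    # Nightly tape backup (24 h interval, ~8 h restore) against a 15-minute
    # RPO / 1-hour RTO fails both; 5-minute asynchronous replication with a
    # 30-minute failover passes.
    print(meets_objectives(24 * 60, 8 * 60, rpo_min=15, rto_min=60))  # (False, False)
    print(meets_objectives(5, 30, rpo_min=15, rto_min=60))            # (True, True)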
Recovery Point/Time vs. Cost

[Figure: timeline from time t0 (the point to which critical data is recovered), through time t1 (disaster strikes), to time t2 (systems recovered and operational); t1 back to t0 is the recovery point, t1 to t2 the recovery time.]

Recovery point technologies (days of potential data loss down to seconds): tape backup, periodic replication, asynchronous replication, synchronous replication.
Recovery time technologies (seconds of outage up to weeks): extended cluster, manual migration, tape restore.
Smaller RPO/RTO means higher cost (replication, hot standby); larger RPO/RTO means lower cost (tape backup/restore, cold standby).
Failure Scenarios
Disaster Could Mean Many Types of Failure
Network failure
Device failure
Storage failure
Site failure

Network Failures
• ISP failure
  Dual ISP connections
  Multiple ISPs
• Connection failure within the network
  EtherChannel®
  Multiple route paths
Device Failures
• Routers, switches, firewalls
  HSRP
  VRRP
• Hosts
  HA cluster
  LB server farm
  NIC teaming
Storage Failures
• Disk arrays
  RAID
  Disk controllers
• Storage replication
  Site-to-site mirroring optimization

Site Failures
• Partial site failure
  Application maintenance
  Application migration
  Scheduled DR exercise
• Complete site failure
  Disaster

Warm Standby
• A data center that is equipped with hardware and communications interfaces capable of providing backup operating support
• The latest backups from the production data center must be delivered
• Network access needs to be activated
• Applications need to be started manually
Disaster Recovery—Active/Standby

[Figure: the primary data center (App A, App B) replicates over an IP/optical network to the FC SAN of the secondary data center (App A, App C), which acts as a warm standby.]
Hot Standby
• A data center that is environmentally ready and has sufficient hardware and software to provide data processing service with little downtime
• A hot backup site offers disaster recovery with little or no human intervention
• Application data is replicated from the primary site
• A hot backup site provides better RTO/RPO than a warm standby but costs more to implement
• Enables business continuance
Disaster Recovery—Active/Standby

[Figure: the primary data center (App A, App B) replicates over an IP/optical network to the FC SAN of the secondary data center (App A, App C).]
Active/Active DR Design
Multiple Tiers of Application

[Figure: both data centers, each reachable through Service Provider A and Service Provider B, actively serve the presentation tier, application tier, and storage tier.]
Active/Active Data Centers

[Figure: two data centers, each connected to the Internet through Service Provider A and Service Provider B and to the internal network, run active/active web hosting and active/active application processing; database processing is either active/standby, or active/active for different applications.]
Site Selection Mechanisms
• Site selection mechanisms depend on the
technology or mix of technologies adopted
for request routing:
1. HTTP redirect
2. DNS-based
3. L3 Routing with Route Health Injection (RHI)
• Health of servers and/or applications needs
to be taken into account
• Optionally, other metrics (like load) can be
measured and utilized for a better selection
HTTP Redirection—Traffic Flow

1. The client requests http://www.cisco.com/:
   GET / HTTP/1.1
   Host: www.cisco.com
2. The site selector answers with a redirect to the chosen data center:
   HTTP/1.1 302 Moved
   Location: http://www2.cisco.com/
3. The client requests http://www2.cisco.com/:
   GET / HTTP/1.1
   Host: www2.cisco.com
   and the selected data center answers HTTP/1.1 200 OK.

Keepalives between the site selector and the data centers (http://www1.cisco.com/, http://www2.cisco.com/) track which sites are available.
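A minimal sketch of the redirect step is shown below in Python, using only the standard library. The hostnames, port, and TCP-connect health check are illustrative assumptions; a real deployment would use a content switch or GSLB device rather than this toy server.

    # Toy HTTP-redirect site selector: answer every GET with a 302 that
    # points the client at the first data center whose web tier accepts a
    # TCP connection (a crude stand-in for the keepalives above).
    from http.server import BaseHTTPRequestHandler, HTTPServer
    import socket

    SITES = ["www1.cisco.com", "www2.cisco.com"]   # candidate data centers

    def site_is_healthy(host, port=80, timeout=2):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    class Redirector(BaseHTTPRequestHandler):
        def do_GET(self):
            target = next((s for s in SITES if site_is_healthy(s)), SITES[0])
            self.send_response(302)
            self.send_header("Location", f"http://{target}{self.path}")
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8080), Redirector).serve_forever()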
DNS-Based Site Selection—Traffic Flow

1. The client asks its DNS proxy to resolve www.cisco.com (for http://www.cisco.com/).
2–7. The DNS proxy walks the hierarchy: the root name servers, the authoritative name server for .com, and the authoritative name server for cisco.com.
8–9. The query reaches the authoritative name server for www.cisco.com (the site selector), which answers with the address of the chosen data center.
10. The DNS proxy returns the answer to the client, which then connects to that data center.

The site selector keeps keepalives running toward Data Center 1 and Data Center 2 (UDP:53 and TCP:80 probes in the figure) so that only healthy sites are handed out.
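The decision such an authoritative site selector makes can be reduced to a small piece of logic. The sketch below (Python; the VIPs, health flags, and RTT-based proximity metric are invented for illustration) picks which address to place in the DNS answer:

    # Pick the VIP of a healthy data center, preferring the "closest" one.
    DATA_CENTERS = {
        "dc1": {"vip": "192.0.2.10",    "healthy": True, "rtt_ms": 12},
        "dc2": {"vip": "198.51.100.10", "healthy": True, "rtt_ms": 45},
    }

    def select_vip(centers):
        healthy = [c for c in centers.values() if c["healthy"]]
        if not healthy:
            raise RuntimeError("no healthy data center available")
        return min(healthy, key=lambda c: c["rtt_ms"])["vip"]

    # The selector would return this address in the A record, typically with
    # a short TTL so clients re-resolve quickly after a failover.
    print(select_vip(DATA_CENTERS))   # -> 192.0.2.10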
Route Health Injection
Implementation

[Figure: the same VIP x.y.w.z is advertised from two locations through routers 10–13. Location B advertises it at low cost and is the preferred location; Location A advertises it at a very high cost and is the backup location, so Clients A and B reach Location A only if Location B withdraws its route.]
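With RHI, the load balancer injects a host route for the VIP into the routing table only while the servers behind it pass health checks, and that route is then redistributed into the routing protocol at the configured cost. The sketch below is a deliberately simplified stand-in (Python on Linux; the VIP, server list, polling interval, and use of the "ip route" command are assumptions, and a real implementation lives in the load balancer or router itself):

    # Install the /32 for the VIP while at least one real server answers,
    # withdraw it otherwise; the routing protocol advertises or drops it.
    import socket, subprocess, time

    VIP = "203.0.113.10"                         # advertised service address
    SERVERS = [("10.1.1.11", 80), ("10.1.1.12", 80)]

    def servers_healthy():
        for host, port in SERVERS:
            try:
                with socket.create_connection((host, port), timeout=2):
                    return True
            except OSError:
                continue
        return False

    def set_route(present):
        action = "replace" if present else "delete"
        subprocess.run(["ip", "route", action, f"{VIP}/32", "dev", "lo"],
                       check=False)

    while True:
        set_route(servers_healthy())             # inject or withdraw the host route
        time.sleep(5)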
Site Selection Summary

Cluster Overview
• Load-balancing cluster: multiple copies of the same application run against the same data set, usually read-only (e.g., web servers)
• High-availability cluster: multiple copies of an application that require access to a common data repository, usually read and write (e.g., application and database servers)
• Clustering provides benefits for availability, reliability, scalability, and manageability
High Availability Cluster Design
• Public network: carries client/application requests
• Private network: interconnection between the nodes
• Storage: shared storage array, NAS, or SAN

[Figure: each cluster node runs the application on top of the cluster software, a cluster enabler, and the OS; the nodes share the public network, the private interconnect, and the shared storage.]
HA Cluster Application View
• Active/standby
  The standby node takes over when the active node fails
  Two-node or multi-node
• Active/active
  Database requests are load balanced across all nodes
  A lock mechanism ensures data integrity
• Shared everything
  Each node mounts all storage resources
  Provides a single layout reference system for all nodes
• Shared nothing
  Each node mounts only its “semi-private” storage
  Data stored on a peer system’s storage is accessed via peer-to-peer communication
Geo-Clusters
Considerations

Geo-cluster: a cluster that spans multiple data centers.

[Figure: Node1 in the local data center and Node2 in the remote data center, connected over the WAN, with disk replication between the two storage arrays.]

Challenges:
• Synchronous or asynchronous replication (synchronous costs roughly 2 x RTT per write)
• Split brain
• L2 heartbeats
HA Cluster Challenges:
Split-Brain

• Split-brain: active nodes (Node1 and Node2) concurrently accessing the same disk, which leads to data corruption
• Resolution: use a quorum, a tie-breaker for gaining access to the disk
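A minimal sketch of the quorum idea, in Python: a partition may own the shared disk only when it can see a strict majority of the configured votes. The vote counts and the two-node-plus-witness setup are illustrative assumptions; real cluster software implements quorum and fencing far more carefully.

    # Quorum tie-breaker: only the side holding a majority keeps the disk.
    TOTAL_VOTES = 3   # e.g., two nodes plus a quorum disk or witness

    def may_own_disk(votes_visible):
        return votes_visible > TOTAL_VOTES // 2

    # Network partition: Node1 still sees the quorum disk (2 of 3 votes)
    # and keeps the service; isolated Node2 (1 of 3) must stop all I/O,
    # so the two sides can never write to the disk concurrently.
    print(may_own_disk(votes_visible=2))   # True
    print(may_own_disk(votes_visible=1))   # False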

Layer 2 Heartbeats

• Extended L2 network: L2 adjacency is required for the nodes’ heartbeats, but extending a VLAN across sites is hazardous
• Resolution: L3 capability for cluster heartbeats, or EoMPLS to carry the L2 heartbeats across the DR sites

[Figure: Node1 in the local data center and Node2 in the remote data center connected over the WAN by public and private Layer 2 networks, with synchronous or asynchronous disk replication between the sites.]

Storage Disk Zoning

• Storage zoning: the standby node must take over the storage disk array when the active node fails, across the extended SAN
• Resolution: the cluster software communicates with the cluster enabler, which instructs the disk array to perform a failover when a failure is detected

[Figure: active Node1 and standby Node2 attached over an extended SAN to two disk arrays (sym1320 and sym1291), with read/write (RW) and write-disabled (WD) device states that swap on failover.]
Storage for Applications

• Presentation tier
  Unrelated small data files, commonly stored on internal disks
  Manual distribution
• Application processing tier
  Transitional, unrelated data
  Small files residing on file systems
  May use RAID to spread data over multiple disks
• Storage tier
  Large, permanent data files or raw data
  Large batch updates, most likely real time
  Log and data on separate volumes
Replication: Modes of Operation

• Synchronous
  All data is written to the local and remote arrays before the I/O is complete and acknowledged to the host
  Speed of light = 3 x 10^8 m/s in a vacuum ≈ 3.3 µs/km one way
  Speed through fiber ≈ 2/3 c ≈ 5 µs/km one way
  2 RTTs per write I/O ≈ 20 µs per km of site separation
• Asynchronous
  The write is acknowledged and the I/O completes after the write to the local array; changes (writes) are replicated to the remote array asynchronously
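As a worked example of the figures above (Python; the distances chosen and the simplifying assumption that only fiber propagation delay matters are illustrative):

    # Added latency per synchronous write I/O from fiber propagation alone,
    # using ~5 us/km one way and the 2-RTTs-per-write figure quoted above.
    FIBER_DELAY_US_PER_KM = 5

    def sync_write_penalty_ms(distance_km, rtts_per_write=2):
        one_way_us = distance_km * FIBER_DELAY_US_PER_KM
        return rtts_per_write * 2 * one_way_us / 1000.0   # RTT = 2 x one way

    for km in (10, 100, 400):
        print(f"{km:>4} km -> ~{sync_write_penalty_ms(km):.1f} ms per write I/O")
    # ~0.2 ms at 10 km, ~2 ms at 100 km, ~8 ms at 400 km: one reason
    # synchronous replication is normally limited to metro distances.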
Synchronous vs. Asynchronous Trade-Off
Enterprises Must Evaluate the Trade-Offs

Synchronous:
  Impact to application performance
  Distance limited (are both sites within the same threat radius?)
  No data loss
Asynchronous:
  No application performance impact
  Unlimited distance (second site can be outside the threat radius)
  Exposure to possible data loss

• The maximum tolerable distance is ascertained by assessing each application
• Weigh against the cost of data loss
Data Replication with DB
Example

• Control files: identify the other files making up the database (DB name, creation date, backups performed, redo log time period, datafile state) and record the content and state of the db
• Datafiles: hold the table spaces, indexes, and data dictionary; updated only periodically
• Redo log files: record db changes resulting from transactions; used to play back changes that may not have been written to the datafiles when the failure occurred; typically archived as they fill, to local and DR-site destinations
Data Replication with DB
Example (Cont.)

[Figure: timeline showing a hot backup of the datafiles and control files taken at time t0; archived and online redo logs then accumulate until a failure or disaster occurs at time t1 (media failure such as a disk, human error such as datafile deletion, or database corruption).]

The database is restored to its state at the time of failure (time t1) by:
1. Restoring the control files and datafiles from the last hot backup (time t0)
2. Sequentially replaying the changes from the subsequent redo logs (archived and online), i.e., the changes made between time t0 and t1
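A toy illustration of that restore-and-replay sequence (Python; the tables, redo records, and their format are invented purely for illustration and bear no relation to a real database's on-disk structures):

    # Start from the hot backup taken at t0, then apply redo records in
    # order up to the failure at t1.
    backup_t0 = {"accounts": {"alice": 100, "bob": 50}}     # hot backup at t0

    redo_log = [                                            # changes after t0
        ("t0+1", "accounts", "alice", 80),
        ("t0+2", "accounts", "bob", 70),
        ("t0+3", "accounts", "carol", 25),
    ]

    def recover(backup, redo_records):
        db = {table: dict(rows) for table, rows in backup.items()}   # restore
        for _ts, table, key, value in redo_records:                  # replay
            db.setdefault(table, {})[key] = value
        return db

    print(recover(backup_t0, redo_log))
    # -> state as of t1: {'accounts': {'alice': 80, 'bob': 70, 'carol': 25}}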
Data Replication with DB
Example (Cont.)

[Figure: over a SAN extension transport, the primary site’s cyclic redo logs are synchronously replicated to the secondary site (a copy of every committed transaction, for zero loss); archive logs are replicated/copied across, and point-in-time copies of the database (taken at time t0, when the DB is quiescent) plus earlier DB backups are copied to the secondary site as well.]

A mixture of sync and async replication technologies is commonly used:
• Usually only the redo logs are synchronously replicated to the remote site
• Archive logs are created from the redo log and copied when the redo log switches
• Point-in-Time (PiT) copies of the datafiles and control files are copied periodically (e.g., nightly)
Data Center Interconnection
Options

[Figure: two mirrored data centers, each with Internet access, stateful firewalls, content caching, server load balancing, intrusion detection, high-density multilayer LAN switches, front-end and back-end application servers, high-density multilayer SAN directors, and enterprise-class storage arrays. The sites are interconnected over SONET/SDH, DWDM/CWDM, and IP/Metro Ethernet.]

Data Center Transport Options

Options by increasing distance (data center, campus, metro, regional, national):
• Dark fiber: synchronous; limited by optics (power budget)
• CWDM: synchronous (2 Gbps); limited by optics (power budget)
• DWDM: synchronous (2 Gbps per lambda); limited by BB_Credits
• SONET/SDH: synchronous (1 Gbps+ subrate), asynchronous at longer distances
• MDS 9000 FCIP: synchronous (metro Ethernet), asynchronous (1 Gbps+)
Cisco Data Center Vision

[Figure: the Intelligent Information Network spans the enterprise data network (LAN/WAN/MAN), server fabric network (HPC cluster, grid), and storage network (SAN), tying compute, network, and storage resources to the applications through three layers:]
• Consolidation: centralization and standardization to lower costs and improve efficiency and uptime
• Virtualization: on-demand, service-oriented management of resources, independent of the underlying physical infrastructure, to increase utilization, efficiency, and flexibility
• Automation: dynamic provisioning and autonomic Information Lifecycle Management (ILM), driven by business policies, to enable business agility
Today’s Data Centers
Require an Architectural Approach to…

• Protect with business resilience
  Tighten security
  Improve business continuance
• Optimize with consolidation
  Improve operational efficiency and resource utilization
  Lower complexity and cost of ownership
• Grow towards a services-oriented infrastructure
  Align virtualized resources with business demands
  Automate infrastructure to respond dynamically
The Big Picture—The Cisco Data Center
The Emerging Data Center Architecture

[Figure: the emerging data center architecture ties together mainframe connectivity, enterprise tape and disk storage, and enterprise SAN switching. The MDS 9000 family provides embedded intelligent storage services: virtual fabrics (VSANs), storage virtualization, data replication services, and fabric routing services. The Catalyst 6500 family provides embedded intelligent network services (server load balancing, SSL termination, VPN termination, firewall services, intrusion detection) plus multiprotocol gateway services. The TopSpin family provides embedded intelligent virtualization services across the server fabric switching layer: server virtualization, virtual I/O, grid/utility computing, low-latency RDMA services, and clustering. Enterprise NAS, UNIX/Windows storage and servers, blade servers, virtual private servers, and a WIN/UNIX grid attach across multiple fabrics (Fabric #1, #2, #3), with IP/Metro Ethernet reaching the enterprise-class storage arrays.]
