
Data Center Disaster Recovery

Data Center Evolution

[Timeline figure, 1960–2010: mainframes with terminals (1960s); client/server computing and thin clients as networking evolves (TCP/IP, HTTP); the networked data center phase with Internet computing and content networking; and the data center consolidation/virtualization phase (1. consolidation, 2. integration, 3. virtualization, 4. high availability), leading toward data center continuous availability.]

Today’s Data Center
Integration of Many Systems and Services

[Figure: a primary data center integrating a front-end network (WAN/Internet and MAN/Internet access, firewall, IDS, content switch, cache), N-tier applications (web servers, application servers, database servers, mainframe, IP communications) with application/server optimization and security, and storage networks (FC switches, VSANs, NAS, RAID, tape SAN). A resilient IP metro network (DWDM/SONET/Ethernet) connects it to a secondary DR data center. Recurring themes: scalable infrastructure, application and server optimization, data center security, distributed data centers, and data center operations.]
What Is a Distributed Data Center?

[Figure: a primary data center running App A and App B and a secondary data center running App A and App C, each with FC-attached storage, connected by data replication between the sites.]
Distributed Data Centers
• Required by disaster recovery and business continuance
• Avoids a single, concentrated data repository
• High availability of applications and data access
• Load balancing together with performance scalability
• Better response times and optimal content routing: proximity to clients

Front-End IP Access Layer
“Content Routing”: Site Selection

[Figure: clients are directed either to the primary data center (App A, App B) or to the secondary data center (App A, App C) by a site-selection mechanism at the front-end IP access layer.]

Application and Database Layer
“Content Switching”: Load Balancing
“Server Clustering”: High Availability

[Figure: within each data center, content switching load-balances requests across the application tier and server clustering provides high availability for the database tier; the primary (App A, App B) and secondary (App A, App C) data centers each have FC-attached storage.]
Back-End SAN Extension
“Storage” and “Optical”: Data Replication and Transport

[Figure: storage data is replicated between the FC SANs of the primary data center (App A, App B) and the secondary data center (App A, App C) over an extended optical transport network.]

Disaster Recovery
• Recovery of data and resumption of service
• Ensuring the business can recover and continue after a failure or disaster
• Ability of a business to adapt, change, and continue when confronted with various outside impacts
• Mitigating the impact of a disaster
Disaster Recovery
What It Means for Business


Disaster Recovery Planning

• Business Impact Analysis (BIA)
  Determines the impact of various disasters on specific business functions and company assets
• Risk Analysis
  Identifies important functions and assets that are critical to the company’s operations
• Disaster Recovery Plan (DRP)
  Restores operability of the target systems, applications, or computing facility at the secondary data center after the disaster
Disaster Recovery Objectives
• Recovery Point Objective (RPO)
  The point in time (prior to the outage) to which systems and data must be restored; the tolerable loss of data in the event of a disaster or failure
  Driven by the impact of data loss and the cost associated with that loss
• Recovery Time Objective (RTO)
  The period of time after an outage within which systems and data must be restored to the predetermined RPO; the maximum tolerable outage time
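The two objectives can be sanity-checked against a candidate protection strategy. The sketch below (Python, with made-up numbers; the function and its thresholds are illustrative assumptions, not part of the original material) treats the replication or backup interval as the worst-case data loss and the failover/restore duration as the outage time:

    # Minimal sketch: does a protection strategy meet the RPO/RTO targets?
    # Worst-case data loss is bounded by the replication/backup interval;
    # recovery time is the time needed to bring the secondary site online.
    def meets_objectives(replication_interval_min, recovery_duration_min,
                         rpo_min, rto_min):
        rpo_ok = replication_interval_min <= rpo_min   # tolerable data loss
        rto_ok = recovery_duration_min <= rto_min      # tolerable outage time
        return rpo_ok, rto_ok

    # Nightly tape backup (24 h interval, ~8 h restore) against a 15-minute
    # RPO / 1-hour RTO fails both; 5-minute asynchronous replication with a
    # 30-minute failover passes.
    print(meets_objectives(24 * 60, 8 * 60, rpo_min=15, rto_min=60))  # (False, False)
    print(meets_objectives(5, 30, rpo_min=15, rto_min=60))            # (True, True)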
Recovery Point/Time vs. Cost

[Figure: timeline from time t0 (the point to which critical data is recovered), through time t1 (disaster strikes), to time t2 (systems recovered and operational); t1 back to t0 is the recovery point, t1 to t2 the recovery time.]

Recovery point technologies (days of potential data loss down to seconds): tape backup, periodic replication, asynchronous replication, synchronous replication.
Recovery time technologies (seconds of outage up to weeks): extended cluster, manual migration, tape restore.
Smaller RPO/RTO means higher cost (replication, hot standby); larger RPO/RTO means lower cost (tape backup/restore, cold standby).
Failure Scenarios
Disaster Could Mean Many Types of Failure
Network failure
Device failure
Storage failure
Site failure

Network Failures
• ISP failure
  Dual ISP connections
  Multiple ISPs
• Connection failure within the network
  EtherChannel®
  Multiple route paths
Device Failures
• Routers, switches, firewalls
  HSRP
  VRRP
• Hosts
  HA cluster
  LB server farm
  NIC teaming
Storage Failures
• Disk arrays
  RAID
  Disk controllers
• Storage replication
  Site-to-site mirroring optimization

Site Failures
• Partial site failure
  Application maintenance
  Application migration
  Scheduled DR exercise
• Complete site failure
  Disaster

Warm Standby
• A data center that is equipped with hardware and communications interfaces capable of providing backup operating support
• The latest backups from the production data center must be delivered
• Network access needs to be activated
• Applications need to be started manually
Disaster Recovery—Active/Standby

[Figure: the primary data center (App A, App B) replicates over an IP/optical network to the FC SAN of the secondary data center (App A, App C), which acts as a warm standby.]
Hot Standby
• A data center that is environmentally ready and has sufficient hardware and software to provide data processing service with little downtime
• A hot backup site offers disaster recovery with little or no human intervention
• Application data is replicated from the primary site
• A hot backup site provides better RTO/RPO than a warm standby but costs more to implement
• Enables business continuance
Disaster Recovery—Active/Standby

[Figure: the primary data center (App A, App B) replicates over an IP/optical network to the FC SAN of the secondary data center (App A, App C).]
Active/Active DR Design
Multiple Tiers of Application

[Figure: both data centers, each reachable through Service Provider A and Service Provider B, actively serve the presentation tier, application tier, and storage tier.]
Active/Active Data Centers

[Figure: two data centers, each connected to the Internet through Service Provider A and Service Provider B and to the internal network, run active/active web hosting and active/active application processing; database processing is either active/standby, or active/active for different applications.]
Site Selection Mechanisms
• Site selection mechanisms depend on the
technology or mix of technologies adopted
for request routing:
1. HTTP redirect
2. DNS-based
3. L3 Routing with Route Health Injection (RHI)
• Health of servers and/or applications needs
to be taken into account
• Optionally, other metrics (like load) can be
measured and utilized for a better selection
HTTP Redirection—Traffic Flow

1. The client requests http://www.cisco.com/:
   GET / HTTP/1.1
   Host: www.cisco.com
2. The site selector answers with a redirect to the chosen data center:
   HTTP/1.1 302 Moved
   Location: http://www2.cisco.com/
3. The client requests http://www2.cisco.com/:
   GET / HTTP/1.1
   Host: www2.cisco.com
   and the selected data center answers HTTP/1.1 200 OK.

Keepalives between the site selector and the data centers (http://www1.cisco.com/, http://www2.cisco.com/) track which sites are available.
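A minimal sketch of the redirect step is shown below in Python, using only the standard library. The hostnames, port, and TCP-connect health check are illustrative assumptions; a real deployment would use a content switch or GSLB device rather than this toy server.

    # Toy HTTP-redirect site selector: answer every GET with a 302 that
    # points the client at the first data center whose web tier accepts a
    # TCP connection (a crude stand-in for the keepalives above).
    from http.server import BaseHTTPRequestHandler, HTTPServer
    import socket

    SITES = ["www1.cisco.com", "www2.cisco.com"]   # candidate data centers

    def site_is_healthy(host, port=80, timeout=2):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    class Redirector(BaseHTTPRequestHandler):
        def do_GET(self):
            target = next((s for s in SITES if site_is_healthy(s)), SITES[0])
            self.send_response(302)
            self.send_header("Location", f"http://{target}{self.path}")
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8080), Redirector).serve_forever()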
DNS-Based Site Selection—Traffic Flow

1. The client asks its DNS proxy to resolve www.cisco.com (for http://www.cisco.com/).
2–7. The DNS proxy walks the hierarchy: the root name servers, the authoritative name server for .com, and the authoritative name server for cisco.com.
8–9. The query reaches the authoritative name server for www.cisco.com (the site selector), which answers with the address of the chosen data center.
10. The DNS proxy returns the answer to the client, which then connects to that data center.

The site selector keeps keepalives running toward Data Center 1 and Data Center 2 (UDP:53 and TCP:80 probes in the figure) so that only healthy sites are handed out.
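The decision such an authoritative site selector makes can be reduced to a small piece of logic. The sketch below (Python; the VIPs, health flags, and RTT-based proximity metric are invented for illustration) picks which address to place in the DNS answer:

    # Pick the VIP of a healthy data center, preferring the "closest" one.
    DATA_CENTERS = {
        "dc1": {"vip": "192.0.2.10",    "healthy": True, "rtt_ms": 12},
        "dc2": {"vip": "198.51.100.10", "healthy": True, "rtt_ms": 45},
    }

    def select_vip(centers):
        healthy = [c for c in centers.values() if c["healthy"]]
        if not healthy:
            raise RuntimeError("no healthy data center available")
        return min(healthy, key=lambda c: c["rtt_ms"])["vip"]

    # The selector would return this address in the A record, typically with
    # a short TTL so clients re-resolve quickly after a failover.
    print(select_vip(DATA_CENTERS))   # -> 192.0.2.10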
Route Health Injection
Implementation

[Figure: the same VIP x.y.w.z is advertised from two locations through routers 10–13. Location B advertises it at low cost and is the preferred location; Location A advertises it at a very high cost and is the backup location, so Clients A and B reach Location A only if Location B withdraws its route.]
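With RHI, the load balancer injects a host route for the VIP into the routing table only while the servers behind it pass health checks, and that route is then redistributed into the routing protocol at the configured cost. The sketch below is a deliberately simplified stand-in (Python on Linux; the VIP, server list, polling interval, and use of the "ip route" command are assumptions, and a real implementation lives in the load balancer or router itself):

    # Install the /32 for the VIP while at least one real server answers,
    # withdraw it otherwise; the routing protocol advertises or drops it.
    import socket, subprocess, time

    VIP = "203.0.113.10"                         # advertised service address
    SERVERS = [("10.1.1.11", 80), ("10.1.1.12", 80)]

    def servers_healthy():
        for host, port in SERVERS:
            try:
                with socket.create_connection((host, port), timeout=2):
                    return True
            except OSError:
                continue
        return False

    def set_route(present):
        action = "replace" if present else "delete"
        subprocess.run(["ip", "route", action, f"{VIP}/32", "dev", "lo"],
                       check=False)

    while True:
        set_route(servers_healthy())             # inject or withdraw the host route
        time.sleep(5)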
Site Selection Summary

Cluster Overview
• Load-balancing cluster: multiple copies of the same application run against the same data set, usually read-only (e.g., web servers)
• High-availability cluster: multiple copies of an application that require access to a common data repository, usually read and write (e.g., application and database servers)
• Clustering provides benefits for availability, reliability, scalability, and manageability
High Availability Cluster Design
• Public network: carries client/application requests
• Private network: interconnection between the nodes
• Storage: shared storage array, NAS, or SAN

[Figure: each cluster node runs the application on top of the cluster software, a cluster enabler, and the OS; the nodes share the public network, the private interconnect, and the shared storage.]
HA Cluster Application View
• Active/standby
  The standby node takes over when the active node fails
  Two-node or multi-node
• Active/active
  Database requests are load balanced across all nodes
  A lock mechanism ensures data integrity
• Shared everything
  Each node mounts all storage resources
  Provides a single layout reference system for all nodes
• Shared nothing
  Each node mounts only its “semi-private” storage
  Data stored on a peer system’s storage is accessed via peer-to-peer communication
Geo-Clusters
Considerations

Geo-cluster: a cluster that spans multiple data centers.

[Figure: Node1 in the local data center and Node2 in the remote data center, connected over the WAN, with disk replication between the two storage arrays.]

Challenges:
• Synchronous or asynchronous replication (synchronous costs roughly 2 x RTT per write)
• Split brain
• L2 heartbeats
HA Cluster Challenges:
Split-Brain

• Split-brain: active nodes (Node1 and Node2) concurrently accessing the same disk, which leads to data corruption
• Resolution: use a quorum, a tie-breaker for gaining access to the disk
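A minimal sketch of the quorum idea, in Python: a partition may own the shared disk only when it can see a strict majority of the configured votes. The vote counts and the two-node-plus-witness setup are illustrative assumptions; real cluster software implements quorum and fencing far more carefully.

    # Quorum tie-breaker: only the side holding a majority keeps the disk.
    TOTAL_VOTES = 3   # e.g., two nodes plus a quorum disk or witness

    def may_own_disk(votes_visible):
        return votes_visible > TOTAL_VOTES // 2

    # Network partition: Node1 still sees the quorum disk (2 of 3 votes)
    # and keeps the service; isolated Node2 (1 of 3) must stop all I/O,
    # so the two sides can never write to the disk concurrently.
    print(may_own_disk(votes_visible=2))   # True
    print(may_own_disk(votes_visible=1))   # False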

Layer 2 Heartbeats

• Extended L2 network: L2 adjacency is required for the nodes’ heartbeats, but extending a VLAN across sites is hazardous
• Resolution: L3 capability for cluster heartbeats, or EoMPLS to carry the L2 heartbeats across the DR sites

[Figure: Node1 in the local data center and Node2 in the remote data center connected over the WAN by public and private Layer 2 networks, with synchronous or asynchronous disk replication between the sites.]

Storage Disk Zoning

• Storage zoning: the standby node must take over the storage disk array when the active node fails, across the extended SAN
• Resolution: the cluster software communicates with the cluster enabler, which instructs the disk array to perform a failover when a failure is detected

[Figure: active Node1 and standby Node2 attached over an extended SAN to two disk arrays (sym1320 and sym1291), with read/write (RW) and write-disabled (WD) device states that swap on failover.]
Storage for Applications

• Presentation tier
  Unrelated small data files, commonly stored on internal disks
  Manual distribution
• Application processing tier
  Transitional, unrelated data
  Small files residing on file systems
  May use RAID to spread data over multiple disks
• Storage tier
  Large, permanent data files or raw data
  Large batch updates, most likely real time
  Log and data on separate volumes
Replication: Modes of Operation

• Synchronous
  All data is written to the local and remote arrays before the I/O is complete and acknowledged to the host
  Speed of light = 3 x 10^8 m/s in a vacuum ≈ 3.3 µs/km one way
  Speed through fiber ≈ 2/3 c ≈ 5 µs/km one way
  2 RTTs per write I/O ≈ 20 µs per km of site separation
• Asynchronous
  The write is acknowledged and the I/O completes after the write to the local array; changes (writes) are replicated to the remote array asynchronously
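As a worked example of the figures above (Python; the distances chosen and the simplifying assumption that only fiber propagation delay matters are illustrative):

    # Added latency per synchronous write I/O from fiber propagation alone,
    # using ~5 us/km one way and the 2-RTTs-per-write figure quoted above.
    FIBER_DELAY_US_PER_KM = 5

    def sync_write_penalty_ms(distance_km, rtts_per_write=2):
        one_way_us = distance_km * FIBER_DELAY_US_PER_KM
        return rtts_per_write * 2 * one_way_us / 1000.0   # RTT = 2 x one way

    for km in (10, 100, 400):
        print(f"{km:>4} km -> ~{sync_write_penalty_ms(km):.1f} ms per write I/O")
    # ~0.2 ms at 10 km, ~2 ms at 100 km, ~8 ms at 400 km: one reason
    # synchronous replication is normally limited to metro distances.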
Synchronous vs. Asynchronous Trade-Off
Enterprises Must Evaluate the Trade-Offs

Synchronous:
  Impact to application performance
  Distance limited (are both sites within the same threat radius?)
  No data loss
Asynchronous:
  No application performance impact
  Unlimited distance (second site can be outside the threat radius)
  Exposure to possible data loss

• The maximum tolerable distance is ascertained by assessing each application
• Weigh against the cost of data loss
Data Replication with DB
Example

• Control files: identify the other files making up the database (DB name, creation date, backups performed, redo log time period, datafile state) and record the content and state of the db
• Datafiles: hold the table spaces, indexes, and data dictionary; updated only periodically
• Redo log files: record db changes resulting from transactions; used to play back changes that may not have been written to the datafiles when the failure occurred; typically archived as they fill, to local and DR-site destinations
Data Replication with DB
Example (Cont.)

[Figure: timeline showing a hot backup of the datafiles and control files taken at time t0; archived and online redo logs then accumulate until a failure or disaster occurs at time t1 (media failure such as a disk, human error such as datafile deletion, or database corruption).]

The database is restored to its state at the time of failure (time t1) by:
1. Restoring the control files and datafiles from the last hot backup (time t0)
2. Sequentially replaying the changes from the subsequent redo logs (archived and online), i.e., the changes made between time t0 and t1
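A toy illustration of that restore-and-replay sequence (Python; the tables, redo records, and their format are invented purely for illustration and bear no relation to a real database's on-disk structures):

    # Start from the hot backup taken at t0, then apply redo records in
    # order up to the failure at t1.
    backup_t0 = {"accounts": {"alice": 100, "bob": 50}}     # hot backup at t0

    redo_log = [                                            # changes after t0
        ("t0+1", "accounts", "alice", 80),
        ("t0+2", "accounts", "bob", 70),
        ("t0+3", "accounts", "carol", 25),
    ]

    def recover(backup, redo_records):
        db = {table: dict(rows) for table, rows in backup.items()}   # restore
        for _ts, table, key, value in redo_records:                  # replay
            db.setdefault(table, {})[key] = value
        return db

    print(recover(backup_t0, redo_log))
    # -> state as of t1: {'accounts': {'alice': 80, 'bob': 70, 'carol': 25}}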
Data Replication with DB
Example (Cont.)

[Figure: over a SAN extension transport, the primary site’s cyclic redo logs are synchronously replicated to the secondary site (a copy of every committed transaction, for zero loss); archive logs are replicated/copied across, and point-in-time copies of the database (taken at time t0, when the DB is quiescent) plus earlier DB backups are copied to the secondary site as well.]

A mixture of sync and async replication technologies is commonly used:
• Usually only the redo logs are synchronously replicated to the remote site
• Archive logs are created from the redo log and copied when the redo log switches
• Point-in-Time (PiT) copies of the datafiles and control files are copied periodically (e.g., nightly)
Data Center Interconnection
Options

[Figure: two mirrored data centers, each with Internet access, stateful firewalls, content caching, server load balancing, intrusion detection, high-density multilayer LAN switches, front-end and back-end application servers, high-density multilayer SAN directors, and enterprise-class storage arrays. The sites are interconnected over SONET/SDH, DWDM/CWDM, and IP/Metro Ethernet.]

Data Center Transport Options

Options by increasing distance (data center, campus, metro, regional, national):
• Dark fiber: synchronous; limited by optics (power budget)
• CWDM: synchronous (2 Gbps); limited by optics (power budget)
• DWDM: synchronous (2 Gbps per lambda); limited by BB_Credits
• SONET/SDH: synchronous (1 Gbps+ subrate), asynchronous at longer distances
• MDS 9000 FCIP: synchronous (metro Ethernet), asynchronous (1 Gbps+)
Cisco Data Center Vision

[Figure: the Intelligent Information Network spans the enterprise data network (LAN/WAN/MAN), server fabric network (HPC cluster, grid), and storage network (SAN), tying compute, network, and storage resources to the applications through three layers:]
• Consolidation: centralization and standardization to lower costs and improve efficiency and uptime
• Virtualization: on-demand, service-oriented management of resources, independent of the underlying physical infrastructure, to increase utilization, efficiency, and flexibility
• Automation: dynamic provisioning and autonomic Information Lifecycle Management (ILM), driven by business policies, to enable business agility
Today’s Data Centers
Require an Architectural Approach to…

• Protect with business resilience
  Tighten security
  Improve business continuance
• Optimize with consolidation
  Improve operational efficiency and resource utilization
  Lower complexity and cost of ownership
• Grow towards a services-oriented infrastructure
  Align virtualized resources with business demands
  Automate infrastructure to respond dynamically
The Big Picture—The Cisco Data Center
The Emerging Data Center Architecture

[Figure: the emerging data center architecture ties together mainframe connectivity, enterprise tape and disk storage, and enterprise SAN switching. The MDS 9000 family provides embedded intelligent storage services: virtual fabrics (VSANs), storage virtualization, data replication services, and fabric routing services. The Catalyst 6500 family provides embedded intelligent network services (server load balancing, SSL termination, VPN termination, firewall services, intrusion detection) plus multiprotocol gateway services. The TopSpin family provides embedded intelligent virtualization services across the server fabric switching layer: server virtualization, virtual I/O, grid/utility computing, low-latency RDMA services, and clustering. Enterprise NAS, UNIX/Windows storage and servers, blade servers, virtual private servers, and a WIN/UNIX grid attach across multiple fabrics (Fabric #1, #2, #3), with IP/Metro Ethernet reaching the enterprise-class storage arrays.]
