
JARINGAN ENTERPRISE

DATA CENTER DESIGN & DISASTER RECOVERY CENTER (DRC)
Week 5
Agenda
• Introduction to Data Center Design
• Reference Architecture Data Center Design
• Introduction to Disaster Recovery Center
Introduction to Data Center Design
Source:
http://people.uwplatt.edu/~yangq/csse411/csse411-materials/s05/CS411_06_Datacenter%20Design%20presentation.ppt
A Data Center
Highly secure, fault-resistant facilities housing equipment that connects to telecommunications networks. The facilities accommodate servers, switches, routers, and modem racks.
Data centers support corporate databases and web sites, and provide locations for CLECs, ISPs, ASPs, web hosting companies, DSL providers, and other IT services.
Data Center Evolution

2023 CTI3D3-JARINGAN ENTERPRISE / PRODI S1


Rack Location Units
• A standard for measuring equipment space in a data center is the RLU, or Rack Location Unit
• Rack-mounted equipment heights are expressed in units: 1U, 2U, 3U, 4U, and so on



Elements of a Data Center
• The Site
• Command Center
• Cable Management
• Network Infrastructure
• Terminal Servers
• Environmental Controls
• Power
Elements of a Data Center (Cisco Product)
Criteria
• The Budget $
-What is the available budget?
-Can the scope of the project be achieved with the current budget?
-What are the actual funds needed to create the data center?
-How will funds be distributed & can they be redistributed?
Criteria Continued
• Physical Constraints:
-Available Space & Weight of Equipment
-Power Requirements
-Cooling
-Bandwidth
Criteria Continued
• System Availability Profiles
-Categorization
-Device Redundancies
-Power Redundancies
-Cooling Redundancies
-Network Redundancies
Structural Aspects
• When dealing with a raised floor, ceiling height matters.
• Basement data center locations near water are not a good idea.
• There must be a pathway for equipment to be moved in and out of the data center.
• Make sure the floor where the data center is to be located is rated for the estimated load.
Power
• Adequate power
• Surge suppression
• Proper grounding of equipment
• Cable Layout
Power Redundancy
• Forms of Power Redundancy
1. Battery-feed UPS
2. Power Generators
Networking
• Each cabinet needs to be supplied with appropriate connection media: Cat6 copper, multi-mode fiber, and/or single-mode fiber
• Proper cable management should be implemented
• Overflow
Security
• Physical Access
• Levels of Access
• Monitoring
Past to Future
Technology changes continuously, making versatility a primary focus in a data center.
At one point in time a single computer occupied the space of an entire data center. That same space can be occupied by thousands of servers today.
Expandability
• Create more RLUs than current needs dictate
• Power Distribution Units (PDU)
• Heating, Ventilation, and Air Conditioning (HVAC)
• Physical Space
Example RLU Definitions
An RLU definition specifies:
1. Bandwidth: inbound and outbound bandwidth specifications; media type requirement
2. Power: electrical requirements (3.412 BTUs per hour = 1 watt)
3. Weight: weight of the equipment
4. Physical Space: width and depth of the rack chassis; cooling dimensions
5. Functional Capacity: needed to determine the quantity of servers for a particular RLU definition
Example RLU Definitions

Specification        RLU-A (storage)       RLU-B (storage)       RLU-C (processing)
Weight               780 lbs               970 lbs               1000 lbs
Power                Two 30 Amp 208V       Two 30 Amp 208V       Four 30 Amp 208V
                     L6-30R outlets        L6-30R outlets        L6-30R outlets
                     3812 Watts x RM       4111 Watts x RM       8488 Watts x RM
Cooling              13040 BTUs per hr     14060 BTUs per hr     29030 BTUs per hr
Physical Space       24 x 48 in            24 x 48 in            24 x 53 in
Bandwidth            8 multi-mode fiber    12 multi-mode fiber   4 Cat6 copper +
                                                                 12 multi-mode fiber
Functional Capacity  5.2 TB                4.7 TB                24 CPU, 96 GB RAM
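The power and cooling rows of the RLU table are linked by the watt-to-BTU conversion (1 watt of load requires roughly 3.412 BTU/hr of cooling). A minimal sketch of that conversion, using the wattages from the table:

```python
# Cooling load estimate: dissipating 1 watt requires roughly 3.412 BTU/hr
# of cooling capacity.
BTU_PER_WATT_HR = 3.412

def cooling_btu_per_hr(watts: float) -> float:
    """Cooling required, in BTU per hour, for a given electrical load."""
    return watts * BTU_PER_WATT_HR

# Wattages from the example RLU definitions above
rlus = {"RLU-A": 3812, "RLU-B": 4111, "RLU-C": 8488}
for name, watts in rlus.items():
    print(f"{name}: {watts} W -> {cooling_btu_per_hr(watts):,.0f} BTU/hr")
```

The computed values (about 13,000, 14,000, and 29,000 BTU/hr) closely match the cooling row of the table.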
Data Center Site Infrastructure Tier Standard

Sources: Uptime Institute


Distributed Data Center
Why Distributed Data Centers?
▪ Provide disaster recovery and business continuance
▪ Avoid a single, concentrated data repository
▪ High availability of applications and data access
▪ Load balancing together with performance scalability
▪ Better response and optimal content routing: proximity to
clients
Reference Architecture Data Center Design
Oracle Enterprise Transformation Solution Series
Sub Agenda
• Introduction
• Reference Architecture Conceptual View
• Architectural Principles
• Reference Architecture Logical View
• Patterns and Best Practice



Introduction
• Purpose: to effectively communicate a solution to a variety of different
stakeholders. Reference architectures accelerate the implementation of a
solution by avoiding the lengthy process of creating an architecture from
scratch for each project (or worse still, embarking on a project without an
architecture)
• A complete reference architecture identifies architectural concepts,
capabilities, principles and guidelines, and presents them through various
architectural views, technical drill-downs, and product mappings
• Primary objectives:
• Lower operational cost
• Reduced risk
• Best performance and flexibility
Introduction - Scope
Reference Architecture Conceptual View
• The conceptual architecture is important for describing the needs of
the consumers of a system and identifying architectural
requirements necessary to meet those needs
• The architecture at this level must be presented in non-technical
language that can be understood by the full range of stakeholders
across the business and IT.
Reference Architecture – Reduced Cost
• Vertical integration: integration of known, standardized, modular
components
• Pre-configured systems: cost optimization achieved through
specialization and standardization
• Consolidation: built-in virtualization to substantially improve
computing resource utilization
• Efficient utilization: improved utilization of storage, servers, and networks is an important consideration in reducing cost
Reference Architecture – Reduced Risk
• Comprehensive security: a consistent security strategy is fully integrated at
every layer of the architecture.
• Integrated monitoring with unified measurement: a unified monitoring
solution increases visibility and enables effective management of the data
center.
• Highest reliability and availability: the highest levels of reliability and
availability can be achieved by eliminating single points of failure and
establishing redundancy where appropriate.
• Pre-configured systems: known configurations of components (including
system software and applications), tested by a single vendor, eliminate
uncertainties in configuration management, making systems (and the data
center overall) more predictable
Reference Architecture – Performance & Flexibility
• Automation: well-managed automation enables the rapid configuration of systems and service levels.
• Shared services: by making the IT infrastructure dynamically reconfigurable, data center resources can be shared in a way that not only leads to optimized utilization, but also supports rapid reuse.
• Pre-configured systems: systems with known performance and operational characteristics support rapid implementations.
• Flexible components: a system that can seamlessly move from IOPS-intensive workloads, to supporting large numbers of VMs, to data streaming provides intrinsic flexibility.
• Adherence to Open Standards: conformity to Open Standards supports both flexibility and longevity.
Reference Architecture – Example
Architectural Principles
• In essence architecture principles translate business needs into
statements of IT mandates that the solution must meet
Architectural Principles
Reference Architecture Logical View
Patterns and best practices
Introduction to Disaster Recovery Center
Source:
http://people.uwplatt.edu/~yangq/csse411/csse411-materials/s05/CS411_06_Datacenter%20Design%20presentation.ppt



Sub Agenda
• Data Center Disaster Recovery
• Objectives
• Failure Scenarios
• Design Options
• Components of Disaster
• Recovery Site Selection - Front End GSLB
• Server High Availability - Clustering
• Data Replication and Synchronization - SAN Extension
• Sample Design
Objectives (Disaster Recovery)
▪ Recovery of data and resumption of service: ensuring business can recover and continue after a failure or disaster
▪ Ability of a business to adapt, change, and continue when confronted with various outside impacts
▪ Mitigating the impact of a disaster

Objectives (Disaster Recovery)
▪ Business Resilience: continued operation of business during a failure
▪ Business Continuance: restoration of business after a failure
▪ Disaster Recovery: protecting data through offsite data replication and backup
Zero down time is the ultimate goal.
Objectives (Disaster Recovery Planning)
• Business Impact Analysis (BIA)
• Determines the impacts of various disasters to specific business functions and
company assets

• Risk Analysis
• Identifies important functions and assets that are critical to company’s
operations

• Disaster Recovery Plan (DRP)
• Restores operability of the target systems, applications, or computing facility at the secondary Data Center after the disaster
Objectives (Disaster Recovery Objectives)
▪ Recovery Point Objective (RPO)
• The point in time (prior to the outage) to which systems and data must be restored
• Tolerable loss of data in the event of a disaster or failure
• The impact of data loss and the cost associated with the loss
▪ Recovery Time Objective (RTO)
• The period of time after an outage within which the systems and data must be restored to the predetermined RPO
• The maximum tolerable outage time
▪ Recovery Access Objective (RAO)
• Time required to reconnect users to the recovered application, regardless of where it is recovered
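As a sketch of how RPO and RTO constrain a recovery, the hypothetical helper below checks a recovery timeline against both targets (the function name and all timestamps are illustrative, not part of the original material):

```python
from datetime import datetime, timedelta

def meets_objectives(last_backup: datetime, outage_start: datetime,
                     service_restored: datetime,
                     rpo: timedelta, rto: timedelta) -> dict:
    """Check a recovery timeline against RPO/RTO targets.

    RPO: data written after `last_backup` is lost, so the gap between
    the last backup and the outage must not exceed the RPO.
    RTO: the service must be restored within RTO of the outage start.
    """
    data_loss_window = outage_start - last_backup
    downtime = service_restored - outage_start
    return {
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
        "data_loss_window": data_loss_window,
        "downtime": downtime,
    }

# Illustrative scenario: hourly backups, 4-hour RTO target
result = meets_objectives(
    last_backup=datetime(2023, 5, 1, 9, 0),
    outage_start=datetime(2023, 5, 1, 9, 45),
    service_restored=datetime(2023, 5, 1, 13, 0),
    rpo=timedelta(hours=1), rto=timedelta(hours=4),
)
print(result["rpo_met"], result["rto_met"])  # both targets met here
```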
Objectives (Recovery Point/Time vs. Cost)
Timeline: critical data is recovered at time t0, the disaster strikes at time t1, and systems are recovered and operational at time t2. The recovery point is the interval from t0 to t1; the recovery time is the interval from t1 to t2.
▪ Moving the recovery point from days back to seconds before the disaster means moving from tape backup to periodic, asynchronous, and finally synchronous replication – at increasing cost
▪ Moving the recovery time from seconds out to weeks after the disaster means moving from extended clusters to manual migration and tape restore – at decreasing cost
▪ Smaller RPO/RTO: higher $$$ (replication, hot standby)
▪ Larger RPO/RTO: lower $$$ (tape backup/restore, cold standby)
Failure Scenarios
• A disaster could mean many types of failure:
▪ Network Failure
▪ Device Failure
▪ Storage Failure
▪ Site Failure
Failure Scenarios (Network Failures)
▪ ISP failure
✓ Dual ISP connections
✓ Multiple ISPs
▪ Connection failure within the network
✓ EtherChannel
✓ Multiple route paths
Failure Scenarios (Device Failures)
▪ Routers, switches, firewalls
✓ Hot Standby Router Protocol (HSRP)
✓ Virtual Router Redundancy Protocol (VRRP)
▪ Hosts
✓ HA cluster
Failure Scenarios (Storage Failures)
▪ Disk arrays
✓ RAID
▪ Disk controllers
Failure Scenarios (Site Failures)
▪ Partial site failure
✓ Application maintenance
✓ Application migration
✓ Application scheduled DR exercise
▪ Complete site failure
✓ Disaster
Design Options (Cold Standby)
▪ One or more data centers with appropriately configured space, equipped with pre-qualified environmental, electrical, and communication conditioning
▪ Hardware and software installation, network access, and data restoration all need manual intervention
▪ Least expensive to implement and maintain
▪ Substantial delay from standby to full operation
Design Options (Disaster Recovery – Active/Standby)
(Diagram: App A and App B with FC-attached storage at the primary data center; the secondary data center is a cold standby)
Design Options (Warm Standby)
▪ A data center that is partially equipped with hardware and communications interfaces capable of providing backup operating support
▪ The latest backups from the production data center must be delivered
▪ Network access needs to be activated
▪ Provides better RTO and RPO than cold standby
Design Options (Disaster Recovery – Active/Standby)
(Diagram: App A and App B with FC-attached storage at the primary data center, connected over an IP/optical network to a warm-standby secondary data center)
Design Options (Hot Standby)
▪ A data center that is environmentally ready and has sufficient hardware and software to provide data processing service with little or no downtime
▪ A hot backup offers disaster recovery with little or no human intervention
▪ Application data is replicated from the primary site
▪ A hot backup site provides very good RTO and RPO
Design Options (Disaster Recovery – Active/Standby)
(Diagram: applications with FC-attached storage at the primary data center, connected over an IP/optical network to a hot-standby secondary data center running replicated applications)
Design Options (Multiple Tiers of Application)
(Diagram: dual ISP connections feeding the three tiers)
▪ Presentation Tier
▪ Application Tier
▪ Storage Tier
Design Options (Active/Active Data Centers)
(Diagram: two data centers, each with its own internal network and ISP connection)
▪ Active/Active web hosting
▪ Active/Active application processing
▪ Active/Standby or Active/Active database processing
Site Selection
▪ Site selection mechanisms depend on the technology or mix of
technologies adopted for request routing:
1. HTTP Redirect
2. DNS Based
3. L3 Routing with Route Health Injection (RHI)
▪ Health of servers and/or applications needs to be taken into
account
▪ Optionally, other metrics (like load) can be measured and utilized for a better selection
Site Selection (HTTP Redirection)
▪ Leverages the HTTP redirect function: HTTP return code 302
▪ Proper site selection made after the initial DNS request has
been resolved, via redirection
▪ Mainly as a method of providing site persistence while
providing local server farm failure recovery
▪ Can be used with the “Location Cookie” feature of the CSS to
provide redirection after wrong site selection
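A minimal sketch of an HTTP-redirect site selector using only Python's standard library; the hostnames, the `make_selector` helper, and the always-pick-the-first-site policy are illustrative placeholders, not the CSS behavior described above:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical per-site hosts, as in the www1/www2 example
SITES = ["http://www1.example.com", "http://www2.example.com"]

class RedirectSelector(BaseHTTPRequestHandler):
    """Answers every GET with a 302 pointing at the chosen site."""
    def do_GET(self):
        target = SITES[0]  # placeholder policy: always the first site
        self.send_response(302)  # HTTP 302 Found: client retries at Location
        self.send_header("Location", target + self.path)
        self.end_headers()

    def log_message(self, *args):  # silence request logging in the sketch
        pass

def make_selector(port: int = 0) -> HTTPServer:
    """Bind the redirect selector; port 0 picks a free port."""
    return HTTPServer(("127.0.0.1", port), RedirectSelector)
```

Running `make_selector(8080).serve_forever()` would serve redirects; a client requesting `/index.html` receives a 302 with a `Location` header pointing at the selected site.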
Site Selection (HTTP Redirection)

http://www.cisco.com/

http://www1.cisco.com/

http://www2.cisco.com/
Site Selection (HTTP Redirection)
Advantages:
▪ Can be implemented without any other GSLB devices or mechanisms
▪ Inherent persistence to the selected location
▪ Can be used in conjunction with other methods to provide more sophisticated site selection
Limitations:
▪ It is protocol specific – relies on HTTP
▪ Requires redirection to fully qualified additional names – additional DNS records
▪ Users may bookmark a specific location – losing automatic failover
▪ HTTPS redirect requires the full SSL handshake to be completed first
Site Selection (DNS-Based Site Selection)
▪ The client D-proxy (local name server) performs iterative
queries
▪ The device which acts as “site selector” is the authoritative name
server for the domain(s) distributed in multiple locations
▪ The “site selector” sends keepalives to servers or server load
balancer in the local and remote locations
▪ The “site selector” selects a site for the name resolution, according
to the pre-defined answers and site load balance method
▪ The user traffic is sent to the selected location
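The selector's decision can be sketched as a health-filtered round-robin over per-site A-record answers; the site names, addresses, and probe below are hypothetical stand-ins for the keepalive mechanism described above:

```python
import itertools

SITES = {  # hypothetical data centers and their VIPs
    "dc1": "192.0.2.10",
    "dc2": "198.51.100.10",
}

def healthy_sites(probe):
    """Keep only sites whose keepalive probe succeeds."""
    return [name for name in SITES if probe(name)]

def make_resolver(probe):
    """Return a resolver that round-robins A-record answers over healthy sites."""
    rr = itertools.cycle(sorted(SITES))
    def resolve(qname):
        alive = set(healthy_sites(probe))
        for _ in range(len(SITES)):
            site = next(rr)
            if site in alive:
                return SITES[site]  # A-record answer handed to the D-proxy
        return None  # no healthy site remains
    return resolve
```

With all sites healthy the answers alternate between the two VIPs; when a site's probe fails, its address is simply never handed out, which is the disaster-recovery behavior DNS-based selection relies on.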
Site Selection (DNS-Based Site Selection)
(Diagram: the client sends its DNS query (UDP port 53) to its D-proxy, which iteratively queries the root name server, the .com authoritative name server, and the cisco.com authoritative name server, finally reaching the authoritative name server for www.cisco.com – the site selector – which answers with the address of the chosen data center (Data Center 1 or Data Center 2); the client then connects to http://www.cisco.com/ over TCP port 80)
Site Selection (DNS-Based Site Selection)
Advantages:
▪ Protocol independent: works with any application that uses name resolution
▪ Minimal configuration changes in the current IP and DNS infrastructure (DNS authoritative server)
▪ Implementation can be different for specific host names
▪ A-records can be changed on the fly
▪ Can take load or data center size into account
▪ Can provide proximity
Limitations:
▪ Visibility limited to the D-proxy (not the client)
▪ Cannot guarantee 100% session persistency
▪ DNS caching in the D-proxy
▪ DNS caching in the client application
▪ Order of multiple A-record answers can be altered by D-proxies
Site Selection (Route Health Injection)
▪ Server and application health monitoring provided by Local
Server Load Balancers
▪ An SLB can advertise or withdraw a VIP address to upstream routing devices depending on the availability of the local server farm
▪ The same VIP addresses can be advertised from multiple data centers – IP Anycast
▪ Relies on L3 routing protocols for route propagation and content request routing
▪ Disaster recovery provided by network convergence
Site Selection (Route Health Injection)
(Diagram: the same VIP x.y.w.z is advertised from Location A at a very high routing cost, making it the backup location, and from Location B at a low cost, making it the preferred location; Clients A and B reach the VIP through routers 10–13, which select the location by routing cost)
Site Selection (Route Health Injection)
Advantages:
▪ Supports legacy applications and does not rely on a DNS infrastructure
▪ Very good re-convergence time, especially in intranets where L3 protocols can be fine-tuned appropriately
▪ Protocol-independent: works with any application
▪ Robust protocols and proven features
Limitations:
▪ Relies on host routes (/32), which cannot be propagated all over the Internet
▪ Requires tight integration between the application-aware devices and the L3 routers
▪ Inability to intelligently load balance among the data centers
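The failover behavior above can be sketched as ordinary lowest-cost route selection over the advertised VIP routes; the location names and cost figures are illustrative:

```python
# Sketch: RHI failover as plain lowest-cost route selection. Each data
# center advertises the same VIP host route with a different cost; a site
# withdraws its route when the local server farm fails.
def best_location(advertisements):
    """Pick the advertising location with the lowest routing cost.

    `advertisements` maps location -> cost; withdrawn sites are absent.
    """
    if not advertisements:
        return None  # VIP unreachable: no site is advertising it
    return min(advertisements, key=advertisements.get)

routes = {"Location A": 1000, "Location B": 10}  # A = backup, B = preferred
assert best_location(routes) == "Location B"
del routes["Location B"]  # B's server farm fails; its SLB withdraws the route
assert best_location(routes) == "Location A"
```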
Server High Availability (Cluster)
▪ A cluster is two or more servers configured to appear as one
▪ Two types of clustering: Load Balancing (LB) and High Availability (HA)
▪ Clustering provides benefits for availability, reliability, scalability, and manageability
▪ LB clustering: multiple copies of the same application against the same data set, usually read only
▪ HA clustering: multiple copies of a long-running application that requires access to a common data repository, usually read and write
(Diagram: tiers of web servers, application servers, and database servers)
Server High Availability (Cluster)
▪ Public network (typically Ethernet) for client/application requests
▪ Servers with the same hardware, OS, and application software
▪ Private network (typically Ethernet) for interconnection between nodes; could be a direct connection, or optionally go through the public network
▪ Storage disk (typically Fibre Channel): shared storage array, NAS, or SAN
Server High Availability (Typical HA Cluster)
▪ Application software that is clustered to provide high availability. Examples: Microsoft Exchange, SQL Server, Oracle Database, file and print services
▪ Operating system that runs on the server hardware. Examples: Microsoft Windows 2000 or 2003, Linux (and other flavors of UNIX), VMS, or IBM z/OS (for mainframes)
▪ Cluster software that provides the HA clustering service for the application. Examples: Microsoft MSCS, EMC AutoStart (Legato), Veritas Cluster Server, HP TruCluster and OpenVMS
▪ Optionally, a Cluster Enabler: software that synchronizes the cluster software with the storage disk array software



Server High Availability (Basic HA Cluster Design)
▪ Active/Standby:
• The active node takes client requests and writes to the data
• The standby node takes over when it detects a failure on the active node
• Two-node or multi-node
▪ Active/Active:
• Database requests are load balanced to both nodes
• A lock mechanism ensures data integrity
• Most scalable design
(Diagram: node1 and node2 sharing storage)
Server High Availability
(File System Approaches for HA Clusters)
▪ Shared Everything
– Equal access to all storage
– Each node mounts all storage resources
– Provides a single layout reference system for all nodes
– Changes updated in the layout reference
▪ Shared Nothing
– Traditional file system with peer-to-peer communication
– Each node mounts only its "semi-private" storage
– Data stored on a peer system's storage is accessed via peer-to-peer communication
– A failed node's storage needs to be mounted by its peer



Server High Availability (Geo-clusters)
Geo-cluster: a cluster that spans multiple data centers
(Diagram: node1 in the local data center and node2 in the remote data center, connected over a WAN; disk replication between the sites is synchronous or asynchronous, costing 2 x RTT per replicated write)


Server High Availability
(Considerations for HA Clusters)
▪ Split Brain: cluster partitioning when nodes cannot communicate with each other but are equally capable of forming a cluster and mounting disks
▪ Extended L2 required in most implementations for:
– Public Network, since client only knows about the Virtual IP address
– Private Network, used for Heart-beats
▪ Storage:
– Directly Attached Disk (DAS) cannot be used
– Shared Disk needs to be visible to both Nodes
– Needs to interface with cluster software for disk failover, zoning, LUN
masking when there is a node failure
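Split brain is commonly prevented with a majority-quorum rule (a general technique, not spelled out in the slide): a partition may form the cluster only if it can reach a strict majority of the voting nodes.

```python
# Sketch: the quorum rule used to prevent split brain. Node names are
# illustrative.
def has_quorum(reachable_nodes, total_voters):
    """True if this partition may safely form the cluster."""
    return len(set(reachable_nodes)) > total_voters // 2

# A 5-node cluster splits 3/2: only the 3-node side keeps running
assert has_quorum({"n1", "n2", "n3"}, total_voters=5)
assert not has_quorum({"n4", "n5"}, total_voters=5)
# An even split of a 4-node cluster: neither side has quorum, so no two
# independent clusters can mount the same disks
assert not has_quorum({"n1", "n2"}, total_voters=4)
```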
Data Replication and Synchronization
(Terminology)
▪ Storage subsystem
• Just a bunch of disks (JBOD)
• Redundant array of independent disks (RAID)

▪ Storage I/O devices


• Host Bus Adapter (HBA)
• Small Computer System Interface (SCSI)

▪ Storage protocols
• SCSI
• iSCSI
• FC (FCIP)



Data Replication and Synchronization
(Terminology)
▪ Direct Attached Storage (DAS)
• Storage is "local" behind the server; no storage sharing possible
• Costly to scale; complex to manage
▪ Network Attached Storage (NAS)
• Storage is accessed at a file level over an IP network; storage can be shared between servers
▪ Storage Area Networks (SAN)
• Storage is accessed at a block level; separation of the storage from the server
• High performance interconnect providing high I/O throughput
Data Replication and Synchronization
(Storage for Application)
▪ Presentation Tier
• Unrelated small data files, commonly stored on internal disks; manual distribution
▪ Application Processing Tier
• Transitional, unrelated data
• Small files residing on file systems
• May use RAID to spread data over multiple disks
▪ Storage Tier
• Large, permanent data files or raw data; large batch updates, most likely real time; log and data on separate volumes



Data Replication and Synchronization
(Backup and Replication)
▪ Offsite tape vaulting
• Backup tapes stored at an offsite location
▪ Electronic vaulting
• Transmission of backup data to an offsite location
▪ Remote disk replication
• Continuous copying of data to an offsite location; transparent to the host
▪ Other methods of replication
• Host-based mirroring; network-based replication
Data Replication and Synchronization
(Replication: Modes of Operation)
▪ Synchronous
• All data is written to the cache of both the local and remote arrays before the I/O is complete and acknowledged to the host
▪ Asynchronous
• The write is acknowledged after the write to the local array cache; changes (writes) are replicated to the remote array asynchronously
▪ Semi-synchronous
• The write is acknowledged with a single subsequent WRITE command pending from the remote array
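The difference between the modes shows up in the latency the host sees per write; a toy model with illustrative constants (not measurements):

```python
# Sketch: per-write latency acknowledged to the host, by replication mode.
LOCAL_WRITE_MS = 0.5   # write into the local array cache
REMOTE_RTT_MS = 8.0    # round trip to the remote array

def host_latency_ms(mode: str) -> float:
    """Latency before the host's write is acknowledged."""
    if mode == "synchronous":
        # Host waits until both local and remote caches hold the data
        return LOCAL_WRITE_MS + REMOTE_RTT_MS
    if mode == "asynchronous":
        # Host waits only for the local cache; the remote copy happens later
        return LOCAL_WRITE_MS
    raise ValueError(mode)

print(host_latency_ms("synchronous"))   # 8.5
print(host_latency_ms("asynchronous"))  # 0.5
```

The asynchronous mode hides the inter-site round trip from the application, at the cost of a data-loss window equal to the un-replicated writes.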
Data Replication and Synchronization
(Replication: Modes of Operation)

Synchronous:
▪ Impact to application performance
▪ Distance limited (are both sites within the same threat radius?)
▪ No data loss

Asynchronous:
▪ No application performance impact
▪ Unlimited distance (second site outside the threat radius)
▪ Exposure to possible data loss

Enterprises must evaluate the trade-offs:
• Maximum tolerable distance, ascertained by assessing each application
• Cost of data loss


Data Replication and Synchronization
(Data Replication with DB Example)
▪ Control files identify the other files making up the database and record the content and state of the db (DB name, creation date, backups performed, redo log time period, datafile state)
▪ Datafiles (tablespaces, indexes, data dictionary) are only updated periodically
▪ Redo logs record db changes resulting from transactions; used to play back changes that may not have been written to the datafiles when the failure occurred
▪ Redo logs are typically archived as they fill, to local and DR site destinations


Data Replication and Synchronization
(Data Center Interconnection Options)



Disaster Impact Radius
▪ Disasters are characterized by their impact radius: local (1–2 km), metro (< 50 km), regional (< 400 km), or global
▪ Examples: fire, flood, earthquake, attack
▪ Is the backup site within the threat radius?
(Diagram: a secondary DR site placed outside the regional radius of the primary data center)


Summary - Design
▪ Data centers 1 and 2 are in the primary location, close enough to provide DC HA for active/active access
▪ Data center 3 (DR) is located beyond the tolerable disaster radius, away from primary DCs 1 and 2
▪ Web/app server farms are load balanced geographically
▪ DB servers are within a geo-HA cluster running in an L3 design
▪ Synchronous data replication between the data centers within the primary location
▪ Asynchronous data replication is done between the primary and secondary storage systems

Thank You
