
ACME INFRASTRUCTURE

Rev 1.0 - 05012012 Adzmely Mansor

Sunday, May 6, 2012

DC Infra Overview

FC SAN Storage
Blades / ESXi
ESXi Networking
Virtual Machines

h1 h2 h3 h4 h5 h6

UP-Link DS 192.168.3.x

VLAN DS - 10G 192.168.x

Native

Web - 200 192.168.200.x

DB - 201 192.168.201.x

MM - 202 192.168.202.x

Active - Passive VM::SG:FW
Physical SG::FW & IPS
Staging & Management Servers

VM SG-FW1

VM SG-FW2

DC VM Clusters

SSL & Cache/Varnish Cluster (ap)


n1 n2

RHEL HA Manager (luci)

Management Server (VCenter)

Staging Server
Shared Storage 30G

IDP Cluster (a)


n1 n2

SP Cluster (a)
n1 n2

e-Learning Web Cluster (a)


n1 n2 n3 n4 n5

ID Cluster (a)
n1 n2

ID DB Cluster (ap)
n1 n2

Management Node
Shared Storage 50G (GFS2)
Shared Storage 50G (GFS2)
Shared Storage 800G (GFS2)
Shared Storage 50G (GFS2)
Shared Storage 50G

Legend
Shared Disk/Storage/LUN Replication Slave

Master DB & MemCached Cluster (ap)


n1 n2

(a) Active - Active Cluster
(ap) Active - Passive Cluster
Shared Storage 50G

DR Infra Overview

FATA SAN Storage
Blades / ESXi
ESXi Networking
Virtual Machines

h1 h2 h3 h4 h5 h6

Standard Switch 192.168.168.x

Native

Physical CISCO-ASA::FW

DR VM Clusters

SSL & Cache/Varnish Cluster (ap)


n1 n2

RHEL HA Manager (luci)

Management Server VCenter

Shared Storage 30G

IDP Cluster (a)


n1 n2

SP Cluster (a)
n1 n2

Webs
n1 n2 n3

NFS Cluster (ap)


n1 n2

NFS mount
Shared Storage 50G (GFS2)
Shared Storage 50G (GFS2)
Shared Storage 820G

Legend
Shared Disk/Storage/LUN

DB & MemCached Cluster (ap)


n1 n2

(a) Active - Active Cluster
(ap) Active - Passive Cluster
Shared Storage 50G

DC Access Flow

[Flow diagram. Labels: HTTP, HTTPS, SSL Reverse Proxy, Varnish, cache? (YES/NO), lookup, Cache Storage (YES/NO), Return PIPE, Round Robin Load Balance Backend Servers]
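
As a rough illustration of the access flow, the sketch below models one possible reading of the diagram: non-cacheable requests are piped straight to a backend, cacheable requests go through a cache lookup, and backends are picked round robin. The class and method names, the dictionary used as cache storage, and the backend names web1 to web3 are all hypothetical; the actual Varnish configuration is not shown in these slides.

```python
from itertools import cycle

# Hypothetical backend names; the slides only say "Backend Servers".
BACKENDS = ["web1", "web2", "web3"]


class AccessFlow:
    """Toy model of the diagram: cache lookup, pipe, round-robin backends."""

    def __init__(self, backends):
        self._next_backend = cycle(backends)  # round-robin iterator
        self._cache = {}                      # stands in for "Cache Storage"

    def _fetch(self, url):
        # Pretend to fetch from the next backend in the rotation.
        backend = next(self._next_backend)
        return f"{url} served by {backend}"

    def handle(self, url, cacheable):
        if not cacheable:
            # "cache? NO": pass the request through to a backend (pipe).
            return "pipe", self._fetch(url)
        if url in self._cache:
            # "lookup" found the object in cache storage.
            return "hit", self._cache[url]
        # Not cached yet: fetch from a backend and store the result.
        body = self._fetch(url)
        self._cache[url] = body
        return "miss", body
```

With `flow = AccessFlow(BACKENDS)`, the first `flow.handle("/a", True)` is a miss served by web1, a second identical call is a hit from the cache, and `flow.handle("/b", False)` is piped to web2.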

Replication Flow for Web Files

Production to DC
- rsync daily - triggered by node web5
- ~/home & ~/documents only
- total sync average size around 600M
- time taken
  - ~/home : 1 hour 20 minutes
  - ~/documents : 2 hours 30 minutes

Node web5 to Staging


- rsync twice daily: 5 am & 5 pm
- ~/documents, ~/home, ~/gitfolder
- time taken ~ 1 hour 45 minutes
- average size transferred ~ < 500M
- longer time because of number of files and folders

Staging to DR
- replicate to NFS server in DR
- real time, using DT block level replication
- ~/home, ~/documents, ~/gitfolder
- replication transmit bandwidth currently limited/shaped to 2Mbps (DT level)
- completed around 08:30am ~ 3 hours 30 minutes
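
To put the 2Mbps shaping in perspective, the following sketch computes how long a given amount of changed data takes to transmit at that rate, and how much data fits in a given window. It assumes 2Mbps means 2,000,000 bits per second and ignores protocol overhead and compression, neither of which the slides specify; the function names are illustrative only.

```python
def transfer_seconds(size_bytes, rate_bits_per_second=2_000_000):
    """Seconds needed to send size_bytes over a link shaped to the given rate."""
    return size_bytes * 8 / rate_bits_per_second


def max_bytes(duration_seconds, rate_bits_per_second=2_000_000):
    """Upper bound on the bytes that fit through the link in duration_seconds."""
    return duration_seconds * rate_bits_per_second / 8


# 500 MB (decimal) of changes at 2 Mbps, in minutes:
print(transfer_seconds(500 * 10**6) / 60)
# Ceiling for a 3 hour 30 minute window at 2 Mbps, in GB (decimal):
print(max_bytes(3.5 * 3600) / 10**9)
```

Under these assumptions, 500 MB needs about 33 minutes of transmit time, and a 3 hour 30 minute window can carry at most about 3.15 GB.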


Replication Flow for DB

DB: DC to DR
- replicate from DC DB to DB server in DR
- real time, using DT block level replication
- ~/mysql (mysql data only)
- replication transmit bandwidth currently limited/shaped to 2Mbps (DT level)

Replication & Cluster

Double Take Software


- Not cluster aware
- rolling from one node to another: regained replication transmission via virtual IP route
- failure after several attempts: possibility of ended/paused transmission

Double Take Software


- Ended Transmissions/Errors
  - configurable notification via email
  - possibility of writing DT Script to monitor status and re-start in case of failure
  - not everyday/month rolling to another node (active - passive switching)
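
The monitor-and-restart script idea on this slide could look roughly like the following. This is a sketch only: `check_status` and `restart` are hypothetical callables that would wrap whatever status query and restart command Double-Take provides, which the slides do not name, and the retry count and delay are arbitrary.

```python
import time


def monitor_once(check_status, restart, max_attempts=3, delay_seconds=0):
    """Check replication; if it is not running, try to restart it.

    check_status() -> True when replication is transmitting (hypothetical).
    restart()      -> attempts to resume replication (hypothetical).
    Returns the number of restart attempts made, or -1 if all attempts failed.
    """
    if check_status():
        return 0
    for attempt in range(1, max_attempts + 1):
        restart()
        time.sleep(delay_seconds)
        if check_status():
            return attempt
    return -1  # caller should raise an alert, e.g. by email
```

A scheduler could call `monitor_once` periodically and send a notification whenever it returns -1.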


Migration

Migration
Requirements
- nameserver delegation and configuration
  - Saudi NIC & ns servers in DC & DR
- Latest DB
  - dump latest, import to DC
- Latest Source/Codes
  - need to sync with production

Migration Simulation (DB only)

- 2 hours 24 minutes - execution time
  - human time + 1 hour 30 minutes
- download latest dump from production
- import to DC MasterDB
- Sync with Slave DB
- DB and Application Internal Test
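
Reading the first item as execution time plus human time (an interpretation; the slide does not state a total), the overall migration window can be added up as follows.

```python
from datetime import timedelta

execution_time = timedelta(hours=2, minutes=24)  # from the slide
human_time = timedelta(hours=1, minutes=30)      # from the slide

# Assumption: the two figures are additive rather than overlapping.
total = execution_time + human_time
print(total)  # 3:54:00
```

On that reading, the DB-only simulation needs a window of just under four hours.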

Switching to DR

Switching to DR
- Stop all replications
- Start MySQL in DR
- Start IDPs in DR
- Update DR/DC NS
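
The order of these steps matters, so a switchover script would normally run them one after another and stop at the first failure. The sketch below shows that pattern; the step actions are placeholders (hypothetical) and do not call any real replication, MySQL, IDP, or DNS tooling.

```python
def run_steps(steps):
    """Run (name, action) pairs in order; stop at the first failure.

    Each action is a callable returning True on success.
    Returns the list of step names that completed.
    """
    completed = []
    for name, action in steps:
        if not action():
            print(f"step failed: {name}; stopping")
            break
        completed.append(name)
    return completed


# Placeholder actions: replace each lambda with the real operation.
DR_SWITCH_STEPS = [
    ("stop all replications", lambda: True),
    ("start MySQL in DR", lambda: True),
    ("start IDPs in DR", lambda: True),
    ("update DR/DC NS", lambda: True),
]

if __name__ == "__main__":
    print(run_steps(DR_SWITCH_STEPS))
```

Stopping on the first failure keeps later steps, such as the NS update, from running while an earlier dependency like MySQL is not yet up.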

Switching to DR
When DC recovered (net, storage/servers):
- start replication/mirror with differences, to sync changes in between DC-DR
- without delete sync

DC-DR Switching Simulation

DR Simulation :: issues
- IDP needs to be started manually
  - tries to connect to DB during startup
  - DB down, only up during switching
- Session error because of time differences between idp, sp, and web servers
  - ntpdate failed, blocked by FW?
- very slow access compared to DC?
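
One way to avoid starting the IDP by hand is to delay its startup until the database accepts connections. The sketch below polls a TCP port until it opens; the host name in the usage note and the idea of wiring this into the IDP startup are assumptions, not something the slides describe.

```python
import socket
import time


def wait_for_port(host, port, timeout_seconds=60.0, interval_seconds=1.0):
    """Return True once a TCP connection to host:port succeeds, else False.

    Keeps retrying until timeout_seconds has elapsed.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_seconds):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_seconds)
```

For example, a startup wrapper might call `wait_for_port("db.dr.example", 3306)` (hypothetical host, MySQL's default port) and only launch the IDP when it returns True.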

DR - DC Restoration

DC Restoration
- At least DB must be up-to-date
- Files sync can be delayed
  - missing/error in links/downloads related to files
  - gradually recovered by DT restorations
  - full mirror sync

DC Restoration: DB
- Start DT restoration, from DR to DC
  - start DB restoration, from DR to DC
  - until mirror and replication status both are 100% or idle
- Switch NS record domain pointer back to DC
  - need downtime window for switching
  - no more changes in DR - DB
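
The condition "until mirror and replication status both are 100% or idle" can be expressed as a simple polling loop before the NS switch. In the sketch below, `get_status` is a hypothetical callable returning the two states as strings; how these are read from Double-Take is not covered in the slides.

```python
import time

DONE_STATES = {"100%", "idle"}


def restoration_complete(mirror_status, replication_status):
    """True when both statuses are '100%' or 'idle' (case-insensitive)."""
    return (mirror_status.strip().lower() in DONE_STATES
            and replication_status.strip().lower() in DONE_STATES)


def wait_until_complete(get_status, poll_seconds=30, max_polls=None):
    """Poll get_status() -> (mirror, replication) until both are done.

    Returns True when complete, False if max_polls is exhausted first.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        mirror, replication = get_status()
        if restoration_complete(mirror, replication):
            return True
        polls += 1
        time.sleep(poll_seconds)
    return False
```

Only after `wait_until_complete` returns True, and with changes to the DR DB frozen, would the NS record be pointed back to DC.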

Pending Issues

Outstanding Issues
- Code / patches synchronization with production
- google applications not working
- VMWare ESXi Networking
  - 2nd vmnic for VLAN DS

Thank You
