
Huawei OceanStor Active-Active DR Solution (HyperMetro)

Bill.LiangzhiQiang@huawei.com
Storage MM Director

Contents

1 Introduction

2 Huawei Active-Active Solution

3 Comparison

4 Key technology

2 Huawei Confidential
Importance of Business Continuity to IT Systems

Figure: Loss per hour of downtime by industry (unit: 10,000 USD)
Media: 9 | Healthcare: 63 | Retail: 110 | Manufacturing: 160 | Telecom: 200 | Energy: 280 | Finance: 648
Common causes of downtime: fire, device fault, power outage, virus/hacker attack.

Source: Network Computing, the Meta Group and Contingency Planning Research

International Standards for DR Construction

RPO (Recovery Point Objective): the amount of data lost because of an outage.
RTO (Recovery Time Objective): the length of the outage (downtime).

International standard (Share78, UK) | RPO | RTO | BC&DR solutions
Tier 7 - Zero data loss and automated service recovery | 0 | < 15 minutes | Active-Active/Active-Passive/Geo-redundant 3DC/Private cloud solution
Tier 6 - Zero data loss | 0 | < 2 hours | Active-Passive/CDM/Backup/Private cloud solution
Tier 5 - Two-site two-phase commit | 2 to 12 hours | < 24 hours | Active-Passive/CDM/Backup solution
Tier 4 - Active secondary site | 2 to 24 hours | < 24 hours | Backup solution
Tier 3 - Electronic vaulting | 12 to 24 hours | 24 hours | Backup solution
Tier 2 - PTAM + hot site | 24 hours to days | 24 hours to days | Backup solution
Tier 1 - Pickup Truck Access Method (PTAM) | Days | Days | Backup solution
Tier 0 - No off-site data | Days to ? | Days to ? | Backup solution
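To make the table usable as a selection aid, the Python sketch below lists which of Tiers 3 to 7 meet a given RPO/RTO requirement. It is an illustration only, under stated assumptions: the upper bounds are read from the table, "< X" is treated as "at most X", Tiers 0 to 2 are left out because their bounds are open-ended ("days"), and the names TIER_BOUNDS and qualifying_tiers are hypothetical.

```python
from typing import List

# Worst-case (upper-bound) RPO and RTO in hours for Tiers 3-7, read from the table.
# Assumption: "< X" is treated as "at most X". Tiers 0-2 are omitted (open-ended bounds).
TIER_BOUNDS = {
    7: (0.0, 0.25),   # RPO 0, RTO < 15 minutes
    6: (0.0, 2.0),    # RPO 0, RTO < 2 hours
    5: (12.0, 24.0),  # RPO 2 to 12 hours, RTO < 24 hours
    4: (24.0, 24.0),  # RPO 2 to 24 hours, RTO < 24 hours
    3: (24.0, 24.0),  # RPO 12 to 24 hours, RTO 24 hours
}


def qualifying_tiers(max_rpo_hours: float, max_rto_hours: float) -> List[int]:
    """Return the tiers whose worst-case RPO and RTO both fit the requirement."""
    return sorted(
        tier
        for tier, (rpo, rto) in TIER_BOUNDS.items()
        if rpo <= max_rpo_hours and rto <= max_rto_hours
    )


# A business that tolerates no data loss and at most 2 hours of downtime:
print(qualifying_tiers(0, 2))
```

The lowest tier in the returned list is the least demanding option that still meets the requirement; an empty list means none of Tiers 3 to 7 is guaranteed to satisfy it.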

Active-Active Solution Ensures 24/7 Business Continuity
Figure: DR levels 1 to 7. Local backup solutions protect a single DC; Active-Passive DCs (Site 1 and Site 2) provide a higher DR level; Active-Active DCs (Site 1 and Site 2), each running FusionSphere, provide the highest level.

Definition of Active-Active Storage
Definition
An active-active storage solution consists of two storage systems that provide consistent data copies in real time. Both copies are accessible to a host at the same time, and the failure of either copy does not affect services. The two storage systems can be deployed in two data centers to form an active-active data center solution, together with active-active design of the upper-layer applications (and of the network layer).

Six necessary elements

1. Independent storage systems: Active-active is implemented between two storage systems with independent hardware and software.
2. Active-active access: Hosts can read from and write to the same active-active LUN/volume concurrently on both storage systems; service loads are balanced and resources are fully utilized.
3. Independent third-site quorum: An independent quorum mechanism at a third site is supported. If the quorum server fails, the active-active systems automatically switch to static (preferred-site) priority mode.
4. High availability: Business continuity is ensured even when three of the four controllers fail concurrently.
5. FC network: Active-active replication links can run over FC, ensuring the performance of active-active services.
6. Real-time data synchronization: Data is synchronized between the two data centers in real time, and services are switched over automatically in a disaster, ensuring RPO = 0 and RTO ≈ 0.

2 Huawei Active-Active Solution
End-to-End Physical Architecture of Active-Active DR Solution

≤100 km
Network layer DC outlet
Raw optical fiber
Core layer
Active-Active network layer
Aggregation
High-reliability, optimized layer
Layer-2 interconnect Access
Optimum access path layer
Application
layer Active-Active application layer
Fusion Fusion
Sphere Sphere
Oracle RAC, VMware, and
FusionSphere cross-DC high
Storage layer availability, load balancing,
and migration scheduling

Active-Active storage layer

Active-active access,
zero data loss

DC A DC B
Networking Requirement:
• RTT<1ms
9 Huawei Confidential • Bandwidth >2Gbps
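A candidate inter-DC link can be sanity-checked against these requirements by estimating fiber propagation delay. The Python sketch below assumes roughly 5 µs per km one way in optical fiber (light travels at about 200,000 km/s in glass) plus an optional allowance for equipment delay; the constants and function names are illustrative assumptions, not Huawei sizing rules.

```python
FIBER_US_PER_KM = 5       # assumption: approx. one-way delay of light in optical fiber
RTT_LIMIT_US = 1000       # "RTT < 1 ms"
MIN_BANDWIDTH_GBPS = 2    # "Bandwidth > 2 Gbps"


def fiber_rtt_us(distance_km: float) -> float:
    """Round-trip propagation delay over a fiber path of the given length."""
    return 2 * FIBER_US_PER_KM * distance_km


def max_fiber_km(equipment_rtt_us: float = 0) -> float:
    """Fiber length at which propagation plus equipment delay reaches the RTT limit."""
    return (RTT_LIMIT_US - equipment_rtt_us) / (2 * FIBER_US_PER_KM)


def link_meets_requirements(distance_km: float, bandwidth_gbps: float,
                            equipment_rtt_us: float = 0) -> bool:
    """Check both the RTT and the bandwidth requirement (hypothetical helper)."""
    rtt = fiber_rtt_us(distance_km) + equipment_rtt_us
    return rtt < RTT_LIMIT_US and bandwidth_gbps > MIN_BANDWIDTH_GBPS


print(max_fiber_km())                    # 100.0
print(max_fiber_km(200))                 # 80.0
print(link_meets_requirements(60, 10))   # True
```

At about 5 µs/km, propagation alone consumes the whole 1 ms RTT budget at 100 km of fiber, so the ≤100 km distance and the RTT < 1 ms requirement are closely related; any switch or DWDM delay shortens the usable distance further.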
Active-Active DR Solution: All-Flash
High-performance active-active: ~300K IOPS at 1 ms latency (Dorado6000 V3, dual-controller)
• Active-active replication
• Load balancing
• Gateway-free
• Improved reliability
• Simplified management
• Reduced cost
• RPO = 0, RTO ≈ 0
• Active-active switchover in 3 seconds
• Easy upgrade to the Dorado V3 geo-redundant solution
Figure: Production center 1 and production center 2 in active-active mode, with a DR center for the geo-redundant extension.
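The IOPS and latency figures together imply how much concurrency the hosts must drive. By Little's law (average I/Os in flight = throughput × response time), ~300K IOPS at 1 ms means roughly 300 outstanding I/Os. The helper below is our illustration of that arithmetic, not a Huawei benchmarking method.

```python
def outstanding_ios(iops: float, latency_ms: float) -> float:
    """Little's law: average number of I/Os in flight = arrival rate x time in system."""
    return iops * latency_ms / 1000  # convert milliseconds to seconds


# ~300K IOPS at 1 ms (figures from the slide)
in_flight = outstanding_ios(300_000, 1)
print(f"{in_flight:.0f} I/Os in flight on average")
```

If the hosts keep fewer I/Os outstanding than this, the array cannot reach the quoted IOPS at that latency, whatever its internal capacity.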

Active-Active DR Solution: Unified Storage
Figure: Host application clusters in DC A and DC B access production storage (SAN and NAS) at each site over IP/FC. The two storage systems are interconnected by IP/FC links carrying real-time data mirroring, dual-write, heartbeat, and configuration traffic, and both connect over IP to a quorum server/VM.
Working principle
Two storage systems are deployed in DC A and DC B and provide read and write services in active-active mode. Write I/Os are mirrored in real time between the two storage systems to ensure data consistency. Both SAN and NAS are supported. No data is lost if either storage system fails.
Highlights
• Suggested distance: < 100 km
• Active-active, RPO = 0, RTO ≈ 0
• No gateway devices: simplifies networks, saves costs, and eliminates gateway-caused latency
• Dual arbitration modes improve reliability
• Replication between different models of Huawei storage is supported, protecting existing investment
• Smooth upgrade from single-site to active-active and from active-active to geo-redundant, without service interruption
• IP or FC links for intra-city interconnection, and IP networks for arbitration links
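The working principle (every write mirrored to both sites, reads served by either, no data loss if one side fails) can be illustrated with a toy in-memory model. This Python sketch assumes a write is acknowledged only after all online copies have it, and that blocks missed by an offline copy are tracked and copied back on recovery; the slides do not describe the resynchronization mechanism, and StorageArray and ActiveActiveLun are hypothetical classes, not HyperMetro APIs.

```python
from typing import Dict, List, Set


class StorageArray:
    """One site's storage system holding a copy of the LUN (toy model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[int, bytes] = {}
        self.online = True


class ActiveActiveLun:
    """Hypothetical active-active LUN mirrored across two storage systems."""

    def __init__(self, a: StorageArray, b: StorageArray) -> None:
        self.copies: List[StorageArray] = [a, b]
        # LBAs written while a copy was offline (assumed resync bookkeeping)
        self.missed: Dict[str, Set[int]] = {a.name: set(), b.name: set()}

    def _online(self) -> List[StorageArray]:
        online = [c for c in self.copies if c.online]
        if not online:
            raise IOError("both copies are offline")
        return online

    def write(self, lba: int, data: bytes) -> None:
        self._online()  # fail fast if no copy can accept the write
        for copy in self.copies:
            if copy.online:
                copy.blocks[lba] = data  # dual-write: every online copy is updated
            else:
                self.missed[copy.name].add(lba)
        # Returning here stands for the acknowledgement to the host.

    def read(self, lba: int, local: StorageArray) -> bytes:
        # Serve from the host's local site if possible, otherwise from the survivor.
        source = local if local.online else self._online()[0]
        return source.blocks[lba]

    def resync(self, target: StorageArray) -> None:
        """Copy back the blocks the target missed, then bring it online."""
        peer = next(c for c in self.copies if c is not target)
        for lba in sorted(self.missed[target.name]):
            target.blocks[lba] = peer.blocks[lba]
        self.missed[target.name].clear()
        target.online = True


site_a, site_b = StorageArray("A"), StorageArray("B")
lun = ActiveActiveLun(site_a, site_b)
lun.write(0, b"x")
site_b.online = False          # DC B fails
lun.write(1, b"y")             # service continues on DC A
print(lun.read(1, site_b))     # served by DC A
lun.resync(site_b)             # DC B recovers and catches up
```

Because the write returns only after every online copy is updated, the surviving copy always holds all acknowledged data, which is what RPO = 0 means in this model.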

High Availability Design of HyperMetro

Gateway-free: No extra gateway device is needed, which reduces fault points and networking complexity and provides higher reliability.
Cross-site bad block repair: When a bad block cannot be repaired within the array, it is repaired automatically by reading the data from the remote storage system; service access is not affected.
Dual arbitration mode: Quorum server mode and static priority mode are provided, with automatic switchover between the two. If the quorum device fails, static priority mode ensures service continuity.
Figure: A host reads through a cross-site active-active cluster (storage A and storage B, an AA LUN backed by A LUN and B LUN), with bad block repair steps 1 to 6; under quorum server/VM arbitration, the preferred site survives first.

High Performance Design of HyperMetro

Reduce gateway latency: The gateway-free design avoids the gateway bottleneck and shortens the I/O path, cutting latency by 1 to 1.5 ms.
FastWrite: The write command and data transfer are combined into one transmission, halving the number of cross-site write I/O interactions.
Optimistic lock: More than 99% of host I/Os have no concurrent write conflicts, so locks are taken locally (optimistic locking), reducing inter-array interaction.

High Flexibility Design of HyperMetro

Smooth evolution: Smooth upgrade from single-site to active-active and from active-active to geo-redundant, without service interruption.
Flexible network: Either IP or FC can be used for the replication link; only one network type is needed, with no complicated network design.
Easy O&M: Automatic self-healing after fault recovery reduces manual operations. Online version upgrades and online capacity expansion of active-active LUNs are supported. Creation and operation are configured from a single site.

“Never-down” Data Solution Deployed by Yahoo Japan
Customer: Yahoo Japan, Japan's most popular portal website; its search and portal services ranked No. 1 in Japan.
Solution: HyperMetro active-active data center solution, with production center 1 and production center 2 (OceanStor V5) 180 km apart, plus a disaster recovery center.
• 200,000 documents (700 GB) processed and synchronized per day
• Service maintenance time between primary and secondary sites: 12 hours / 30 minutes
• 99.9999% availability, 7x24
• Storage switching time between primary and secondary sites: 5 seconds
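The availability figure translates directly into a yearly downtime budget. The Python sketch below assumes a 365-day year and uses exact fractions to avoid rounding; the function name is hypothetical.

```python
from fractions import Fraction

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def max_downtime_seconds(availability_percent: str) -> Fraction:
    """Yearly downtime allowed by an availability figure given as a decimal string, e.g. "99.9999"."""
    unavailable = 1 - Fraction(availability_percent) / 100
    return unavailable * SECONDS_PER_YEAR


budget = max_downtime_seconds("99.9999")
print(f"99.9999% allows about {float(budget):.1f} s of downtime per year")
print(f"5-second storage switchovers that fit in that budget: {budget // 5}")
```

At six nines the budget is about 31.5 seconds per year, so only a handful of 5-second switchovers fit; this is arithmetic on the slide's figures, not a measured result.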
Huawei All-Flash Storage Chosen by AENA Ensures 24/7 Operations of Madrid-Barajas Airport
AENA: one of the world's largest airport operators.
Systems served: core operation systems, non-operation systems, and others, including flight management, baggage processing, stand allocation, asset management, geographic information, CRM, SQL, ground services, comprehensive query, service statistics, decision-making support, DCS, and development & testing.
Huawei's all-flash active-active storage solution:
• Gateway-free active-active design (HyperMetro)
• Active-active, mirroring, and load balancing between OceanStor Dorado arrays in the T2 DC and the T4 DC
• Cross-site takeover without service interruption
• RPO = 0, RTO ≈ 0
Results: 24/7 stable operation of airport services and a 3X efficiency improvement in airport operations, supporting slot/cabinet allocation, flight information query, and airport business decisions.

3 Comparison
Active-Active Solutions in the Industry
Figure: architectures of active-active solutions in the industry (controller pairs at two sites, with or without gateways, for SAN and NAS). Key points of each:
• Non-gateway / device isolation / loose coupling
• Non-gateway / data-level mirroring / tight coupling
• Gateway / data-level mirroring / tight coupling
• Non-gateway SAN + NAS gateway / device isolation / loose coupling
• Huawei HyperMetro: non-gateway / device isolation / loose coupling / SAN & NAS unified active-active

4 Key technology
FastWrite: Higher Dual-Write Performance
Figure: host, Huawei storage at site A, and Huawei storage at site B connected by a 100 km FC/IP link; message flows for the common solution (write command, transfer ready, data transfer, status good: RTT-1 and RTT-2) and for FastWrite (write command with data, status good: RTT-1 only).
• Common solution: A write I/O involves two interactions between the two storage systems: write command and data transfer. Over 100 km, one transmission takes two RTTs.
• FastWrite: The protocol is optimized to combine the write command and data transfer into one transmission, halving the number of cross-site write I/O interactions. Over 100 km, one transmission takes only one RTT.
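The saving can be estimated from propagation delay alone. The Python sketch below models each cross-site request/response exchange as one RTT and assumes about 5 µs/km one way in fiber; processing time inside the arrays is ignored, and the message names are illustrative labels, not Huawei protocol fields.

```python
from typing import List, Tuple

FIBER_US_PER_KM = 5  # assumption: approx. one-way propagation delay in optical fiber

# Cross-site request/response exchanges per mirrored write (illustrative labels).
COMMON: List[Tuple[str, str]] = [
    ("WRITE_COMMAND", "TRANSFER_READY"),
    ("DATA_TRANSFER", "STATUS_GOOD"),
]
FASTWRITE: List[Tuple[str, str]] = [
    ("WRITE_COMMAND + DATA_TRANSFER", "STATUS_GOOD"),
]


def rtt_us(distance_km: int) -> int:
    """Propagation-only round-trip time over fiber."""
    return 2 * FIBER_US_PER_KM * distance_km


def cross_site_latency_us(exchanges: List[Tuple[str, str]], distance_km: int) -> int:
    """Each exchange waits for the remote reply, so it costs one RTT."""
    return len(exchanges) * rtt_us(distance_km)


for name, flow in (("Common", COMMON), ("FastWrite", FASTWRITE)):
    print(f"{name}: {len(flow)} RTT(s), ~{cross_site_latency_us(flow, 100)} us at 100 km")
```

Under these assumptions, a 100 km link adds about 2 ms to each mirrored write in the common flow and about 1 ms with FastWrite.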

Optimistic Lock Optimization (Write Process)
Figure: a host cluster writes to a HyperMetro LUN in a cross-site active-active cluster; storage A (preferred site) and storage B (non-preferred site) each hold a member LUN.
• Write process with distributed lock: Latency = t1 + t2 + t3, where t2 is the time to apply the distributed lock across the two sites.
• Write process with optimistic lock: Each storage system applies a local lock, so the cross-site locking step is skipped and Latency = t1 + t3.
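The two latency formulas can be compared with a small expected-latency model. The slides state that more than 99% of host I/Os have no concurrent write conflict but do not say what a conflicting write costs, so this Python sketch assumes a conflict adds one cross-site step of duration t2; the timings are made-up placeholders, not measurements.

```python
def distributed_lock_latency(t1: float, t2: float, t3: float) -> float:
    """Every write pays for the cross-site distributed lock (t2)."""
    return t1 + t2 + t3


def optimistic_lock_latency(t1: float, t2: float, t3: float, conflict_rate: float) -> float:
    """Expected latency when writes lock locally and only conflicting writes
    pay an extra cross-site step costing t2 (assumed fallback cost)."""
    if not 0.0 <= conflict_rate <= 1.0:
        raise ValueError("conflict_rate must be between 0 and 1")
    return t1 + t3 + conflict_rate * t2


# Hypothetical timings in microseconds
t1, t2, t3 = 200, 1000, 300
print(distributed_lock_latency(t1, t2, t3))        # 1500
print(optimistic_lock_latency(t1, t2, t3, 0.01))   # 510.0
```

With a conflict rate around 1%, the expected latency stays close to t1 + t3, which is why locking locally pays off when conflicts are rare.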
HyperMetro Arbitration Design

Figure: storage A (preferred site) and storage B form a storage resource pool; a primary and a secondary quorum device at a third site connect to both over IP.
• Deploy the quorum device at a third site, in a different fault domain from the two active-active DCs. Two quorum servers are supported to avoid a single point of failure.
Note: The two quorum servers work in active/standby mode; only one quorum server is in effect at a time.
• Deploy the quorum device at the preferred site.
• Use static priority mode when no quorum device is deployed.
Note: If the preferred site fails, services are interrupted.
• Quorum device: a physical or virtual server; two quorum servers can be deployed.
• Quorum link: the quorum IP addresses must be reachable.
• Arbitration mode: both quorum server mode and static priority mode are offered.
• Arbitration granularity: arbitration is performed per LUN pair or consistency group.
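The decision rules above can be summarized as a small function. This Python sketch covers one LUN pair (or consistency group) after the two sites lose contact; it assumes that in quorum server mode the preferred site is granted arbitration first, and that a faulty quorum device triggers the fallback to static priority mode, as the slides describe. The Site, Mode, and arbitrate names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    QUORUM_SERVER = "quorum server"
    STATIC_PRIORITY = "static priority"


@dataclass
class Site:
    name: str
    alive: bool
    reaches_quorum: bool


def arbitrate(preferred: Site, other: Site, mode: Mode, quorum_up: bool) -> Optional[str]:
    """Decide which site keeps serving a LUN pair once the two sites lose contact.

    Returns the surviving site's name, or None if services stop.
    """
    if mode is Mode.QUORUM_SERVER and not quorum_up:
        mode = Mode.STATIC_PRIORITY  # quorum device faulty: fall back to static priority
    if mode is Mode.STATIC_PRIORITY:
        return preferred.name if preferred.alive else None
    # Quorum server mode: assume the preferred site is granted first.
    for site in (preferred, other):
        if site.alive and site.reaches_quorum:
            return site.name
    return None


a_down = Site("A", alive=False, reaches_quorum=False)
b_up = Site("B", alive=True, reaches_quorum=True)
print(arbitrate(a_down, b_up, Mode.QUORUM_SERVER, quorum_up=True))    # B
print(arbitrate(a_down, b_up, Mode.STATIC_PRIORITY, quorum_up=True))  # None
```

With storage A preferred and down, quorum server mode lets storage B take over, while static priority mode returns None, matching the note that services are interrupted if the preferred site fails.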

Cross-Site Bad Block Repair

Working principle
Figure: a host issues read I/Os to active-active LUNs on storage A and storage B; each is backed by a HyperMetro member LUN, and storage A's member LUN contains a bad block.
① The production host reads data from storage A.
② Storage A detects a bad block through verification.
③ Storage A fails to repair the bad block through reconstruction. (If the bad block is repaired, the following steps are not executed.)
④ Storage A checks the status of storage B and requests the data from storage B.
⑤ The data is read successfully and returned to the production host.
⑥ The data from storage B is used to repair the bad block.
Cross-site bad block repair is a Huawei-patented technology and runs automatically.
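The six steps can be sketched as a read path. In this Python toy model, a per-block CRC32 stands in for the array's verification, `reconstruct` stands in for local repair (assumed to be something like RAID reconstruction) and may return None, and the remote copy is written back before the data is returned, whereas the slide returns data to the host (step ⑤) before repairing (step ⑥). All names are hypothetical.

```python
import zlib
from typing import Callable, Dict, Optional


class MemberLun:
    """Toy member LUN: block contents plus a CRC32 per block (hypothetical layout)."""

    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self.data = dict(blocks)
        self.crc = {lba: zlib.crc32(b) for lba, b in blocks.items()}
        self.healthy = True

    def verify(self, lba: int) -> bool:
        return zlib.crc32(self.data[lba]) == self.crc[lba]

    def write(self, lba: int, block: bytes) -> None:
        self.data[lba] = block
        self.crc[lba] = zlib.crc32(block)


def read_with_repair(
    local: MemberLun,
    peer: MemberLun,
    lba: int,
    reconstruct: Callable[[int], Optional[bytes]],
) -> bytes:
    block = local.data[lba]                       # (1) host reads from storage A
    if local.verify(lba):
        return block
    rebuilt = reconstruct(lba)                    # (2) bad block found; (3) try local repair
    if rebuilt is not None and zlib.crc32(rebuilt) == local.crc[lba]:
        local.write(lba, rebuilt)
        return rebuilt
    if not (peer.healthy and peer.verify(lba)):   # (4) check storage B's state
        raise IOError(f"block {lba} unreadable on both sites")
    good = peer.data[lba]                         # (5) read the copy from storage B
    local.write(lba, good)                        # (6) repair storage A's block
    return good                                   # data returned to the host


storage_a = MemberLun({7: b"payload"})
storage_b = MemberLun({7: b"payload"})
storage_a.data[7] = b"corrupt"  # simulate silent corruption on storage A
print(read_with_repair(storage_a, storage_b, 7, lambda lba: None))
```

The host receives correct data even though the local copy was bad, and the local block passes verification afterwards.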

Summary

POC statistics in Russia: 95% of POCs successful (chart segments: 65%, 15%, 15%, 5%). Seeing is believing.
Key features: Gateway-free; Flexible (IP or FC links); SAN+NAS; Simple; 3DC; Active-Active DR.
Platforms: Dorado V3/Unified Storage; Dorado V6 (2020 H1).

Thank you.
Bring digital to every person, home, and organization for a fully connected, intelligent world.

Copyright © 2018 Huawei Technologies Co., Ltd. All Rights Reserved.

The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purposes only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.
