Sangfor aCloud

Reliability Technical
White Paper

Sangfor Technologies Inc.


Copyright Notice
This document is copyrighted by Sangfor Technologies Inc. Sangfor reserves the right of final interpretation and the right to amend this document and this statement.

Unless otherwise stated, the copyright and other related rights to all content in this document, including its text, format, illustrations, photographs, methods and processes, belong to Sangfor. Without Sangfor's written consent, no person may copy, extract, back up, modify, distribute or translate any part of this document, in whole or in part, for commercial purposes.

Disclaimer
This document is for informational purposes only and is subject to change

without notice.

Sangfor Technologies Inc. has made every effort to ensure that its contents are

accurate and reliable at the time of writing this document, but Sangfor is not liable

for any loss or damage caused by omissions, inaccuracies or errors in this document.

Contact us
Service hotline: +60 12711 7129 (7511)

Hong Kong: (+852) 3427 9160

United Kingdom: (+44) 8455 332 371

Singapore: (+65) 9189 3267

Malaysia: (+60) 3 2201 0192

Thailand: (+66) 2 254 5884

Indonesia: (+62) 21 5695 0789

You can also visit the official website of Sangfor Technologies:

www.sangfor.com for the latest technology and product information.


Table of Contents
1. HYPER-CONVERGED PLATFORM ARCHITECTURE

2. ACLOUD PLATFORM MANAGEMENT RELIABILITY
2.1. DISTRIBUTED ARCHITECTURE
2.2. LINK REDUNDANCY
2.3. SYSTEM SELF-PROTECTION
2.4. RESOURCE RESERVATION
2.5. MONITOR CENTER
2.6. WATCHDOG
2.7. BLACK BOX
2.8. SYSTEM FILES BACKUP

3. ASV COMPUTE LAYER RELIABILITY DESIGN
3.1. VM RESTART
3.2. VM HA (HIGH AVAILABILITY)
3.3. VM SNAPSHOT
3.4. LIVE MIGRATION
3.5. HOST MAINTENANCE MODE
3.6. DRS (DYNAMIC RESOURCE SCHEDULER)
3.7. DRX (DYNAMIC RESOURCE EXTENSION)
3.8. VM PRIORITY
3.9. RECYCLING BIN
3.10. VM ANTI-AFFINITY

4. ASAN STORAGE LAYER RELIABILITY DESIGN
4.1. ASAN DISTRIBUTED STORAGE ARCHITECTURE
4.2. DATA REPLICA BASED PROTECTION
4.3. ARBITRATION BASED PROTECTION
4.4. SPARE DISK
4.5. IO QOS PROTECTION
4.6. DISK STATE DETECTION
4.7. DISK MAINTENANCE MODE
4.8. SILENT ERROR DETECTION
4.9. FAST DATA REBUILDING
4.10. FAULT DOMAIN ISOLATION
4.11. DELAYED DATA DELETION
4.12. DATA SELF-BALANCING

5. ANET NETWORK LAYER RELIABILITY DESIGN
5.1. ANET NETWORK LAYER RELIABILITY ARCHITECTURE
5.1.1. Management Plane High Reliability
5.1.2. Control Plane High Reliability
5.1.3. Data Forwarding Plane High Reliability
5.2. DVSW (DISTRIBUTED VIRTUAL SWITCH)
5.3. VROUTER
5.4. DISTRIBUTED FIREWALL AFW
5.5. RELIABILITY
5.6. CONNECTIVITY DETECTION
5.7. VXLAN NETWORK RELIABILITY
5.8. NETWORK PORT SELF-RECOVERY

6. HARDWARE LAYER RELIABILITY DESIGN
6.1. HARDWARE HEALTH CHECK
6.2. CPU RELIABILITY
6.3. MEMORY RELIABILITY
6.4. DISK RELIABILITY
6.5. NETWORK CARD RELIABILITY
6.6. RAID CARD RELIABILITY
6.7. POWER SUPPLY RELIABILITY
6.8. ALARM SERVICE

7. SOLUTION LAYER RELIABILITY DESIGN
7.1. VM FAST BACKUP
7.2. CDP (CONTINUOUS DATA PROTECTION)
7.3. DR (DISASTER RECOVERY)
7.4. SC (STRETCHED CLUSTER)


1. Hyper-converged Platform
Architecture

The Sangfor aCloud HCI platform is built on the idea of the "software-defined data center", with virtualization technology at its core. It uses compute virtualization (aSV), storage virtualization (aSAN), network virtualization (aNET) and other components to form a unified resource pool, reducing the amount of data center hardware, saving investment costs and shortening the time it takes to bring applications online. It provides a graphical interface and self-service operation and maintenance capabilities that reduce O&M complexity and free up productivity, and the product quality is continuously refined to deliver a minimal, stable, reliable and high-performance hyper-converged solution.

Sangfor aCloud is a software-centric platform, and its architecture is the most fundamental guarantee of the product's reliability, covering reliability at the platform management, compute, storage, network, hardware and solution levels.

2. aCloud Platform Management


Reliability

2.1. Distributed Architecture

Sangfor aCloud adopts a fully distributed architecture to ensure platform reliability.

1) The hyper-converged cluster adopts a decentralized design. Each node is an independent, peer working node, so there is no single-node point of failure. A master (control) node acts as the management access point for the cluster, and the platform elects the master automatically through an election algorithm. If the host acting as the master node fails, the platform automatically elects a new master node to keep the cluster stable and accessible. During the master switchover, the normal operation of VMs is not affected.

2) The hyper-converged cluster configuration information is stored as multiple copies distributed across the cluster nodes in the cluster file system. If any single node fails, the cluster configuration data is not lost.

aCloud overall architecture diagram

➢ Controller: provides management and control services for the entire cluster, such as user management and authentication, resource alarms, backup management, etc. A Controller exists on each node, but only one master Controller is active at any time; the Controllers on the other nodes are in Standby state.

➢ Worker: primarily responsible for specific work such as computation, configuration and data transmission/exchange; a Worker is active on each node.
2.2. Link Redundancy

The aCloud HCI solution has four network planes, each deployed independently: the management network, the business network, the data communication network (VXLAN) and the storage network.

Management network: the administrator accesses the management network to manage the hyper-converged cluster. The management network implements link redundancy through dual-switch aggregation, so the failure of a single switch or a single link does not affect the stability of the hyper-converged management platform.

Business network: used for normal service access and publishing. The business network can implement link redundancy through dual-switch aggregation, network ports can be statically bonded for the service egress, and multiple service egresses can be configured in the virtual network for virtual machines to choose from, ensuring high reliability of the business network.

Data communication network (VXLAN): carries east-west traffic between virtual machines and enables communication between services. A private network is set up to ensure data security, and link redundancy is achieved through aggregation on the physical switches. A distributed virtual switch on Sangfor aCloud has a virtual switch instance on every host in the cluster; when one host goes offline, the traffic that passes through the virtual switch instance on that host is redirected and taken over by other hosts through virtual routing and virtual machine HA on those hosts.

Storage network: carries the storage IO traffic between hosts. A private network is set up to protect data security. No static bonding or link aggregation is required on the switches, because the aCloud platform implements link aggregation in software: aSAN private-network link aggregation load-balances traffic on a per-TCP-connection basis, so different TCP connections between two hosts may use different physical links.
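As a minimal illustration of per-connection load balancing (an illustrative sketch, not Sangfor's actual implementation; the link names and hash choice are assumptions), a TCP connection's 4-tuple can be hashed onto one of the available physical links so that a single connection always uses one link while different connections spread across all links:

```python
import hashlib

def pick_link(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
              links: list) -> str:
    """Map a TCP connection (4-tuple) to one physical link.

    Connections with the same 4-tuple always use the same link, while
    different connections are spread across all healthy links.
    """
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return links[digest % len(links)]

# Example: two storage-network links between a pair of hosts.
links = ["eth2", "eth3"]
print(pick_link("10.0.0.1", 50512, "10.0.0.2", 3260, links))
```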


The four network planes are fault isolated, and failure of any one network plane

will not affect the other network planes.

2.3. System Self-Protection

Because the hyper-converged platform itself consumes a certain amount of computing resources, the platform provides a system resource self-protection mechanism to keep itself stable and performant while carrying services: during system startup, it forcibly reserves the minimum CPU and RAM resources required for the platform to run, so that virtual machines cannot divert too many system resources and cause the aCloud system to malfunction. aCloud adaptively adjusts the amount of reserved system resources based on the functional components enabled on the platform.

2.4. Resource Reservation

To guarantee that sufficient resources are available for HA execution and service recovery in the event of a host failure, aCloud provides a resource reservation mechanism: a certain amount of resources is reserved on each physical host. These resources are not allocated under normal circumstances and may only be allocated when a host fails and the HA mechanism kicks in.

The resource reservation mechanism prevents the HA mechanism of the entire aCloud platform from becoming ineffective after resources are over-committed. For the HA mechanism, see "3.2 VM HA (High Availability)".

2.5. Monitor Center

The hyper-converged platform provides a monitoring and alarm center. It delivers comprehensive monitoring and alarm services for the services running on the platform, and key indicators can be customized for intelligent monitoring and rapid alerting, enabling business personnel to identify application bottlenecks faster and keep a global view of the platform's state.

➢ Monitors key information such as virtual machine CPU, memory, IO and internal process status, and generates historical trend reports;

➢ Provides multiple alarm channels such as syslog, SNMP trap, email and SMS, so users can receive key alarm information in time.

2.6. Watchdog

A system process may crash or deadlock because of an unknown error and stop providing services. The process watchdog mechanism provided by the hyper-converged platform can recover the process in time.

A separate daemon with the highest priority runs in the background of aCloud and is responsible for monitoring all aCloud system processes. Once a system process crashes or deadlocks, the watchdog forcibly intervenes to restart the process and resume business operations, and records the status information of the process at that moment into the black box for post-analysis.
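A minimal sketch of such a watchdog loop is shown below. The service names and restart commands are hypothetical placeholders, not the actual aCloud daemon or its process list:

```python
import subprocess
import time

# Hypothetical list of guarded services and the commands that restart them.
WATCHED = {
    "cluster-controller": ["systemctl", "restart", "cluster-controller"],
    "storage-worker": ["systemctl", "restart", "storage-worker"],
}

def is_alive(name: str) -> bool:
    """Return True if a process with this exact name is currently running."""
    return subprocess.run(["pgrep", "-x", name],
                          capture_output=True).returncode == 0

def watchdog_loop(interval: int = 5) -> None:
    while True:
        for name, restart_cmd in WATCHED.items():
            if not is_alive(name):
                # A real watchdog would first record "dying information"
                # (black box) before forcing a restart.
                print(f"[watchdog] {name} is down, restarting")
                subprocess.run(restart_cmd)
        time.sleep(interval)
```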

2.7. Black Box

In the event of a system crash, process deadlock or abnormal reset, the hyper-converged platform first restores the service to ensure business continuity, and provides black box technology that backs up the "dying information" to a local directory for subsequent fault analysis and handling.

The black box is mainly used to collect and store the kernel logs and diagnostic-tool output produced before the operating system on a management or compute node exits abnormally. After an operating system crash, maintenance personnel can export and analyze the data saved by the black box.

2.8. System Files Backup

The aCloud platform provides one-click backup of system files (platform configuration data). When a system-level failure results in the loss of the system configuration files, users can quickly restore the system configuration from a backup file.


3. aSV Compute Layer Reliability
Design

3.1. VM Restart

When the application layer of the VM guest OS is no longer being scheduled (blue screen or black screen), aCloud provides an abnormal-restart mechanism that detects the anomaly and forcibly resets the VM to restore services in a timely manner and ensure business continuity.

The aCloud platform continuously checks application-level availability through the Sangfor vmtool optimization tool installed in the virtual machine. Every few seconds, vmtool sends a heartbeat to the host where the virtual machine is running, and the host determines whether the application layer of the guest OS is still being scheduled based on the heartbeat, disk IO and network traffic of the VM. If the application layer has not been scheduled for several minutes, the virtual machine is considered to have a black screen or blue screen, and the platform performs the recovery operation: it shuts down the VM and restarts it.
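A simplified sketch of this kind of liveness decision follows. The field names, window and thresholds are illustrative assumptions, not aCloud's internal logic:

```python
import time
from dataclasses import dataclass

@dataclass
class VmSample:
    last_heartbeat: float   # timestamp of the last vmtool heartbeat
    disk_io_bytes: int      # disk IO observed since the previous sample
    net_bytes: int          # network traffic observed since the previous sample

def guest_unresponsive(sample: VmSample, now: float,
                       silence_threshold: float = 300.0) -> bool:
    """Treat the guest as blue/black screened only when the heartbeat has
    been silent for several minutes AND there is no disk or network activity."""
    heartbeat_silent = (now - sample.last_heartbeat) > silence_threshold
    no_activity = sample.disk_io_bytes == 0 and sample.net_bytes == 0
    return heartbeat_silent and no_activity

# Example: heartbeat missing for 6 minutes with no IO -> restart candidate.
sample = VmSample(last_heartbeat=time.time() - 360, disk_io_bytes=0, net_bytes=0)
print(guest_unresponsive(sample, time.time()))  # True
```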

There are many possible causes of virtual machine abnormalities: a blue screen in the guest system may be caused by hard disk failures, driver errors, CPU overclocking, BIOS settings, pirated software or viruses, and the guest operating system may also hang with a black screen. In these cases the hyper-converged platform provides automatic restart, helping administrators automate operation and maintenance.

3.2. VM HA(High Availability)

When the external environment is faulty (for example, the host network cable is disconnected, or the storage cannot be accessed), the hyper-converged platform provides a mature HA mechanism: the services of the faulty host are automatically restarted on a healthy host with sufficient resources, so that the service continues uninterrupted or with only a very short interruption.

In an aCloud cluster, cluster heartbeat detection is performed every 5 seconds, by polling, on the nodes where HA-enabled VMs are running, to detect whether a virtual machine's state is abnormal. When the abnormal duration reaches the fault detection sensitivity set by the user (the shortest setting is 10 s), the HA virtual machine is switched to another host to keep the service system highly available, greatly shortening the service interruption caused by host or link failures.

Note: the HA mechanism requires resources (mainly memory) to be reserved in the cluster so that the abnormal virtual machines can be pulled up; this is the resource reservation mechanism described in section 2.4. If the reserved resources are insufficient, HA will fail to pull up the VMs.
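A minimal sketch of the HA decision loop described above is given here. The polling period and sensitivity follow the text, but the vm/host objects and their methods are hypothetical placeholders and the host-selection rule is a simplification:

```python
import time

def run_ha_monitor(vms, hosts, sensitivity_s: float = 10.0,
                   poll_interval_s: float = 5.0):
    """Poll HA-enabled VMs; restart a VM elsewhere once it has been
    abnormal for longer than the user-configured sensitivity."""
    abnormal_since = {}  # vm id -> timestamp when the anomaly was first seen
    while True:
        now = time.time()
        for vm in vms:
            if vm.is_abnormal():
                abnormal_since.setdefault(vm.id, now)
                if now - abnormal_since[vm.id] >= sensitivity_s:
                    # Pick a healthy host whose reserved resources can hold the VM.
                    candidates = [h for h in hosts
                                  if h.is_healthy() and h.free_memory() >= vm.memory]
                    if candidates:
                        target = max(candidates, key=lambda h: h.free_memory())
                        target.restart_vm(vm)
                    abnormal_since.pop(vm.id, None)
            else:
                abnormal_since.pop(vm.id, None)
        time.sleep(poll_interval_s)
```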

3.3. VM Snapshot

When a virtual machine suffers a logical (non-hardware) failure that causes a service abnormality, such as a failed change to the virtual machine (patching, new software installation, etc.), the hyper-converged platform provides virtual machine snapshot technology, which can quickly roll the VM back to the healthy state it was in at the snapshot time.

A virtual machine snapshot saves the state of a virtual machine at a certain point in time, so that the virtual machine can later be restored to that state.

3.4. Live Migration

When the administrator needs to perform hardware maintenance or replace a host, the hyper-converged platform provides a virtual machine live migration mechanism to migrate virtual machines to other hosts without affecting service operation, ensuring that services remain available.

When a VM is live-migrated, the state of the source and destination is synchronized, including memory, vCPU, disk and peripheral register state. After the synchronization is complete, the source VM is suspended, the computing resources it occupied on the source host are released, and the destination VM is started.

During the migration, the resources of the destination physical host are checked; if they are insufficient, the migration fails. The platform also checks that the target virtual network is consistent with the source (if not, an alarm is generated and the user decides whether to continue), so that the migration can proceed safely.

aCloud live migration supports the following three scenarios:

1) Intra-cluster live migration: because storage is distributed and shared within the cluster, only the running location of the virtual machine changes and the storage location does not, so only the running data (memory, vCPU, disk and peripheral register state) needs to be synchronized;

2) Cross-storage live migration within the cluster: when the storage location also needs to change, the migration service first migrates the virtual machine's virtual disk image files and then synchronizes the running data;

3) Cross-cluster live migration: both the virtual disk image files and the running data are synchronized.

Note: aCloud supports clusters built from heterogeneous servers. By default, new aCloud virtual machines use a common vCPU type, so the virtual machine does not depend on the physical CPU model (instruction set) and can be live-migrated across physical hosts with different generations of CPUs.

3.5. Host Maintenance Mode

When the administrator needs to perform hardware maintenance or replace a host, the hyper-converged platform provides a host maintenance mode that automatically live-migrates virtual machines. The system first migrates the services running on the host entering maintenance mode to other hosts, ensuring that services are not affected during the replacement process; maintenance mode thus enables self-service operation and maintenance. A host in single-host maintenance mode is frozen and does not read or write data.

Without the host maintenance function, the administrator would need to migrate the virtual machines manually, and a single point of data failure could occur. In host maintenance mode, a storage replica check is performed to ensure that every data copy on the host also has a copy on another host, so powering off the host does not affect services.


3.6. DRS(Dynamic Resource Scheduler )

The aCloud platform provides a dynamic resource scheduling mechanism that monitors resource pool usage across the cluster. When the service pressure on virtual machines is so high that the performance of a physical host is insufficient to carry normal service operation, the DRS function dynamically evaluates the resource status of the whole cluster and migrates virtual machines from overloaded servers to servers with sufficient resources, keeping the services in the cluster healthy and balancing host load across the cluster.

The baseline for considering a host's resources overloaded is user-defined, including CPU overload, memory overload and overload duration; this prevents workloads from being switched back and forth by DRS. The user can choose between manual and automatic resource scheduling.
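A rough sketch of the overload check and migration choice is shown below. The thresholds, duration and the host/VM objects are illustrative assumptions, not the actual DRS policy:

```python
def host_overloaded(host, cpu_threshold=0.85, mem_threshold=0.85,
                    min_duration_s=300) -> bool:
    """A host counts as overloaded only if CPU or memory usage has stayed
    above its threshold for the configured duration, avoiding flapping."""
    return ((host.cpu_usage > cpu_threshold or host.mem_usage > mem_threshold)
            and host.overload_seconds >= min_duration_s)

def plan_migration(hosts):
    """Return (vm, source, target) suggestions for overloaded hosts."""
    plans = []
    idle_hosts = sorted((h for h in hosts if not host_overloaded(h)),
                        key=lambda h: h.cpu_usage)
    for src in (h for h in hosts if host_overloaded(h)):
        if not idle_hosts:
            break
        # Move the busiest VM on the overloaded host to the least-loaded host.
        vm = max(src.vms, key=lambda v: v.cpu_usage)
        plans.append((vm, src, idle_hosts[0]))
    return plans
```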

3.7. DRX(Dynamic Resource eXtension)

When the service pressure on a virtual machine increases, the computing resources allocated when the user created the VM may no longer be enough for it to run stably. The hyper-converged platform provides a dynamic resource extension function that monitors the memory and CPU usage of the virtual machine in real time. When the resources allocated to the virtual machine are about to reach their bottleneck and the physical host it runs on still has sufficient resources, CPU and memory are hot-added to the service virtual machine, automatically or manually, to keep the service running normally. When the physical host itself is overloaded, no hot-add is performed, to avoid squeezing the resources of other virtual machines; in that case dynamic resource scheduling is performed according to the load of the cluster.

The resource usage bottleneck of a service virtual machine is defined by the user, including CPU usage, memory usage and the duration for which usage stays at the bottleneck, ensuring that resources are allocated to the applications that need them.
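A minimal sketch of the hot-add decision is given below; the thresholds, step sizes and object attributes are assumptions for illustration only:

```python
def plan_hot_add(vm, host, usage_threshold=0.9, duration_s=120,
                 cpu_step=1, mem_step_mb=1024):
    """Suggest a CPU/memory hot-add for a VM that has stayed at its
    bottleneck for a sustained period, but only if the host has headroom."""
    actions = []
    if host.overloaded():
        return actions  # never squeeze other VMs on an overloaded host
    if vm.cpu_usage > usage_threshold and vm.cpu_busy_seconds >= duration_s:
        actions.append(("hot_add_vcpu", cpu_step))
    if vm.mem_usage > usage_threshold and vm.mem_busy_seconds >= duration_s:
        actions.append(("hot_add_memory_mb", mem_step_mb))
    return actions
```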

3.8. VM Priority

When the available resources of the cluster are limited (tight system resources, host downtime, virtual machine HA, etc.), the operation of important services must be guaranteed first. The hyper-converged platform provides virtual machine priority tags so that important virtual machines are supplied with resources first, giving critical business a higher level of resource protection.

3.9. Recycling Bin

When an administrator manually deletes resources such as virtual machines and later needs to retrieve them, the hyper-converged platform provides a resource recycling mechanism: the administrator can go to the recycle bin and retrieve virtual machines and virtual network devices that have not yet been completely deleted. This gives the user a buffer against accidental deletion and a chance to reverse the operation, ensuring the reversibility and correctness of user operations as much as possible.

A virtual device deleted by the user is kept in the recycle bin for a period of time. During this time the disk space occupied by the deleted device is not released and its data is not removed, so the device can still be retrieved; a deleted device that stays in the recycle bin for more than 30 days is automatically deleted and its disk space is released.

3.10. VM Anti-affinity

When multiple virtual machines form an active/standby or load-balancing relationship, such as the RAC node virtual machines of an Oracle RAC database, placing all of them on one host is like putting all the eggs in one basket: the service is compromised when that node fails. The aCloud hyper-converged platform provides a virtual machine anti-affinity mechanism to ensure that mutually exclusive virtual machines never run on the same host. When one host goes down, the virtual machines on the other hosts in the cluster continue to run, ensuring business continuity. During DRS dynamic resource scheduling and HA restarts, mutually exclusive virtual machines still follow the anti-affinity principle and are prohibited from running on the same host.

4. aSAN Storage Layer


Reliability Design

4.1. aSAN Distributed Storage Architecture

The aSAN storage layer is a self-developed distributed storage system. It uses virtualization technology to pool the local hard disks of the general-purpose x86 servers in the cluster into storage volumes, unifying the integration, management and scheduling of server storage resources, and exposes NFS/iSCSI to the upper layer, allowing virtual machines to freely allocate storage space from the resource pool according to their storage requirements.
4.2. Data Replica Based Protection

When hardware fails (a damaged hard disk, a faulty storage switch or storage network card, etc.), data on the failed host may be lost or become inaccessible, affecting service operation. The hyper-converged platform provides a multi-copy data protection mechanism that keeps multiple copies of service data in the storage pool, distributed across different disks on different physical hosts. If a host fails, the user data still has a functioning copy on other hosts, so data is not lost and services keep running normally.

Note: the multi-copy mechanism only addresses hardware-level faults, not logic-level faults. For example, if an upper-layer application is encrypted by ransomware, the data at the bottom layer is encrypted regardless of how many copies are used.


4.3. Arbitration Based Protection

When multiple copies are written inconsistently because of network or other issues, and each copy considers itself to hold valid data, the service cannot tell which copy is correct: data split-brain occurs and normal operation is affected. The hyper-converged platform provides a multi-copy arbitration protection mechanism: each piece of service data has multiple data copies plus an arbitration copy. The arbitration copy is used to determine which data copy is correct, and the service is told to use that copy, ensuring safe and stable operation.

The arbitration copy is a special copy that holds only a small amount of parity data and occupies little storage space. Like the data copies, it must follow the host mutual-exclusion principle, so at least three storage disks on different hosts are required to form a copy set that includes an arbitration copy. The core principle of the arbitration mechanism is that the minority follows the majority: if the host where the virtual machine runs can access fewer than half of the total number of copies (data copies plus the arbitration copy), the virtual machine is prohibited from running on that host; otherwise, the virtual machine may run on that host.
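A small sketch of this majority check (the copy counts are illustrative and this is not aSAN's actual code):

```python
def can_run_on_host(accessible_copies: int, total_copies: int) -> bool:
    """Majority rule: the host must be able to reach more than half of all
    copies (data copies + arbitration copy) to be allowed to run the VM."""
    return accessible_copies > total_copies / 2

# Two data copies + one arbitration copy = three copies in total.
print(can_run_on_host(accessible_copies=2, total_copies=3))  # True
print(can_run_on_host(accessible_copies=1, total_copies=3))  # False -> prohibited
```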

4.4. Spare Disk

When an HDD in the cluster is damaged and IO reads/writes fail, services are affected. The hyper-converged platform provides hot spare disk protection: a system hot spare disk automatically takes over the work of the damaged HDD without any manual intervention by the user.

In scenarios where the host cluster is large and contains many hard disks, disk faults occur from time to time. The aCloud platform frees users from worrying about data loss caused by a damaged hard disk that is not replaced in time.
4.5. IO QOS Protection

To provide higher cluster IO capability and allocate IO optimally among user services, the hyper-converged platform provides an IO QoS protection mechanism. By configuring virtual machine priority, users can guarantee the IO supply of important services, including IO queue priority and preferential use of resources such as SSD tiered cache space.

The service priority policy is: important virtual machine service IO > normal virtual machine service IO > other IO (backup, data reconstruction, etc.). The platform also automatically checks the IO throughput load and physical space occupied on each physical disk, and applies different scheduling strategies to maximize IO.

4.6. Disk State Detection

When a hard disk approaches the end of its life or develops too many bad sectors, it is in a sub-health state: although it can still be recognized and used for reads and writes, its reads and writes may fail and data may even be lost. The platform provides a sub-health detection mechanism for hard disks to detect such disks in advance and avoid the impact of disk failure on services.

Hard disk sub-health detection calls the smartctl and iostat commands to obtain disk status information and compares it with abnormality thresholds to determine whether the disk shows sub-health symptoms (such as a slow disk, IO stutter, or an exhausted PCIe SSD lifespan). It also filters the kernel logs of IO calls and the RAID card error logs to obtain the disk's error information.

The basic principle is as follows:

A sub-health disk triggers a "slow disk" alarm on the aCloud platform, helping users discover it and replace it with a healthy disk in time so that all disks in the cluster stay healthy. A sub-health disk is restricted from receiving new shards: its existing shards are silently handled and can no longer accept new data, and the data on the sub-health disk is rebuilt onto healthy disks.

4.7. Disk Maintenance Mode

After a hard disk enters the sub-health state and an alarm is generated, operation and maintenance personnel need to replace the disk. If a data synchronization task still needs to read data from the disk being replaced, pulling and inserting the disk could cause a double fault and affect services. In this case, the hard disk maintenance/isolation function can be used: before the system isolates the disk, the data is fully checked to ensure that everything on the disk has a healthy copy on another disk. After isolation, the disk no longer accepts reads or writes, ensuring that services are not affected while the disk is isolated.


4.8. Silent Error Detection

A hard disk can develop errors without any warning during use: so-called silent errors. The user only discovers that the data is wrong or damaged when it is needed, which can eventually cause irreparable damage, because a silent error gives no warning and may have occurred long before the symptom appears, leading to very serious problems. NetApp observed more than 1.5 million hard disk drives over 41 months and found more than 400,000 silent data corruptions, of which more than 30,000 were not detected by the hardware RAID controllers.

To prevent wrong data from being returned to users because of silent errors, the hyper-converged platform provides aSAN end-to-end data verification. It adopts an industry-leading checksum algorithm through a Checksum engine, a Verify engine and a Checksum management module, combined with key optimizations for checksum storage performance: a checksum is generated as a "fingerprint" of the data as soon as the user data enters the system and is stored with it; afterwards, the checksum is used to verify the data and protect the user from silent failures.

The schematic diagram is as follows:

End-to-end verification has two key points: the checksum generation algorithm and the storage performance optimization during checksum generation. aSAN has industry-leading technical solutions for both.

Key Technology 1: industry-leading checksum algorithm

A checksum algorithm has two main evaluation criteria: one is the speed at which the checksum is generated; the other is the collision rate and uniformity. The collision rate is the probability that two different pieces of data generate the same checksum.

Sangfor's aSAN end-to-end verification scheme uses the xxHash64 algorithm, which is faster and has a lower collision rate than the CRC-32 and Adler-32 algorithms commonly used in the industry.
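As a minimal illustration of the fingerprint idea (using the third-party Python xxhash package as a stand-in; the block structure below is an assumption, not aSAN's on-disk format):

```python
import xxhash  # third-party package: pip install xxhash

def write_block(data: bytes) -> dict:
    """Store the data together with its 64-bit xxHash fingerprint."""
    return {"data": data, "checksum": xxhash.xxh64(data).intdigest()}

def read_block(block: dict) -> bytes:
    """Recompute the checksum on read; a mismatch means a silent error."""
    if xxhash.xxh64(block["data"]).intdigest() != block["checksum"]:
        raise IOError("silent data corruption detected; recover from a healthy replica")
    return block["data"]

blk = write_block(b"user data entering the system")
assert read_block(blk) == b"user data entering the system"
```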

Key Technology 2: storage performance optimization at checksum generation

The checksum is generated in memory and can be transferred and stored along with the data. When data is written to non-volatile storage such as disks and SSDs, the checksum also needs to be stored, which introduces additional write overhead and affects system performance.

Sangfor aCloud hyper-convergence is based on an architecture without a metadata center. In the aSAN end-to-end verification scheme, checksum storage is optimized through asynchronous write-back, bypassing of the critical I/O path, and I/O contention isolation to address these performance issues. In addition, correctness and consistency are ensured through self-checking, collision detection and periodic verification.

4.9. Fast Data Rebuilding

When multiple copies of data are written inconsistently, or after hardware is replaced following a host or hard disk failure, the hyper-converged platform provides a fast data rebuilding mechanism. It periodically checks the working status of the hard disks and the health of the copies, and uses the healthy data as the source for rebuilding replicas, ensuring the security of the cluster data.

Data rebuilding is triggered when a data disk or cache disk is pulled out and goes offline, when continuous service IO failures on a data disk cause it to be judged faulty, or when service IO failures on a cache disk cause it to be judged faulty.

The rebuilding process uses the following techniques to speed up reconstruction:

1) Global participation with multiple concurrent rebuilds: the rebuild I/O is highly concurrent, reading from multiple source disks and writing to multiple destination disks to rebuild data quickly;

2) Intelligent rebuilding: rebuilding occupies part of the storage network bandwidth and disk performance, so the rebuild program senses the I/O of the upper-layer services and intelligently adjusts the I/O consumed by rebuilding, reconstructing data quickly while keeping the business running normally;

3) Tiered rebuilding: the priority of data rebuilding follows the priority of the virtual machine. When the space available in the storage volume for rebuilding is scarce, tiered rebuilding gives priority to the user's important data.
4.10. Fault Domain Isolation

The hyper-converged platform provides storage fault domain isolation. Storage is partitioned into different disk volumes, and users can divide aSAN into disk volumes according to their requirements; each disk volume is an independent fault domain. The replica mechanism and the rebuild mechanism of aSAN stay within a fault domain and never rebuild into another fault domain, and a fault in one fault domain does not spread to other fault domains, effectively containing fault propagation. For example, a rack failure only affects the disk volumes running on that rack.

4.11. Delayed Data Deletion

The "3.9 Recycling Bin" section explained that once a virtual device is completely removed, its disk space is freed and the device can no longer be retrieved. To further protect the reversibility of user operations, the aSAN virtual storage layer provides a delayed data deletion mechanism to retrieve virtual device data that aSAN has not yet physically deleted.

When the upper-layer service sends a delete instruction to the aSAN data storage layer (for example, a command to completely delete a virtual machine image), aSAN checks the remaining disk space. If the remaining space is sufficient, aSAN does not immediately clear and reclaim the space; instead, the data is placed in a "to-be-deleted queue" and a successful deletion result is returned to the upper layer. The data is then retained for a further period (10 days by default), after which it is actually deleted.

If the remaining space of aSAN falls below 70% and there is data in the background waiting to be deleted, aSAN reclaims the data in the to-be-deleted queue starting from the oldest entries, without waiting for the timeout.
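A toy sketch of such a to-be-deleted queue follows. The retention period and space threshold come from the text, but the storage object and its methods are hypothetical:

```python
import time
from collections import deque

RETENTION_S = 10 * 24 * 3600       # keep deleted data for 10 days by default
FREE_SPACE_THRESHOLD = 0.70        # reclaim early when free space gets low

class DelayedDeleter:
    def __init__(self, storage):
        self.storage = storage                 # hypothetical storage backend
        self.pending = deque()                 # (enqueue_time, object_id)

    def delete(self, object_id: str) -> None:
        """Acknowledge the delete to the caller but only queue the reclaim."""
        self.pending.append((time.time(), object_id))

    def reclaim(self) -> None:
        now = time.time()
        low_space = self.storage.free_ratio() < FREE_SPACE_THRESHOLD
        while self.pending:
            enqueued, object_id = self.pending[0]
            expired = now - enqueued >= RETENTION_S
            if expired or low_space:
                # Oldest entries are reclaimed first under space pressure.
                self.pending.popleft()
                self.storage.purge(object_id)
                low_space = self.storage.free_ratio() < FREE_SPACE_THRESHOLD
            else:
                break
```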
4.12. Data Self-Balancing

aSAN uses data balancing to ensure that, in all cases, data is distributed as evenly as possible across the hard disks in a storage volume, avoiding extreme data hotspots and making use of the space and performance of newly added hard disks as soon as possible, so that the hard disks of every host are used.

1. Balancing trigger conditions:

1) Planned balancing

Planned data balancing is initiated within a planned time frame (for example, 12 am to 7 am). When the capacity utilization of the hard disks within a storage volume differs greatly, balancing is triggered on the disks with high usage, migrating part of their data to disks with low capacity usage.

Within the time frame planned by the user, aSAN's data balancing module scans all the hard disks in the storage volume. If the difference between the highest and lowest disk capacity usage in the volume exceeds a threshold (30% by default), balancing is triggered and continues until the difference between the usage of any two disks in the volume no longer exceeds a threshold (20% by default).

For example, after the user expands the storage volume, balancing is triggered during the user-defined balancing window to migrate data onto the newly added disks.

2) Automatic balancing

Automatic balancing is initiated by the system without user intervention, to avoid one disk in the storage volume filling up while other disks still have free space.

When the space usage of any disk in the storage volume exceeds the risk threshold (90% by default), automatic balancing is triggered and continues until the difference between the highest and lowest disk capacity usage in the volume is less than a threshold (3% by default). A sketch of these trigger checks follows below.
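The following compact sketch reproduces the trigger logic described above; disk usage values are fractions and the default thresholds are taken from the text (this is an illustration, not aSAN's implementation):

```python
def planned_balance_needed(disk_usages, start_diff=0.30):
    """Planned balancing starts when the max-min usage difference exceeds 30%."""
    return max(disk_usages) - min(disk_usages) > start_diff

def planned_balance_done(disk_usages, stop_diff=0.20):
    """...and stops once no two disks differ by more than 20%."""
    return max(disk_usages) - min(disk_usages) <= stop_diff

def auto_balance_needed(disk_usages, risk_threshold=0.90):
    """Automatic balancing starts as soon as any disk exceeds 90% usage."""
    return max(disk_usages) > risk_threshold

def auto_balance_done(disk_usages, stop_diff=0.03):
    """...and stops when the max-min difference drops below 3%."""
    return max(disk_usages) - min(disk_usages) < stop_diff

usages = [0.92, 0.55, 0.40]   # a volume with one nearly full disk
print(planned_balance_needed(usages), auto_balance_needed(usages))  # True True
```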
2. Balance implementation

When a trigger condition is met, the system calculates, in units of data slices on the source disk, the destination disk where the data will be stored. The destination disk must satisfy the following principles:

1) Host mutual exclusion: the two copies of a slice must not end up on the same host after migration;

2) Optimal performance: disks that still satisfy the optimal data distribution strategy after the slice migration are preferred;

3) Capacity optimization: destination disks with low capacity usage are given priority.

During balancing, newly added or modified data for the slice is written to both the source and the target, i.e. one extra copy is written. Before balancing ends, the balancing program verifies the data on the source and the target to ensure consistency before and after balancing. After balancing completes, the source slices are moved to a temporary directory for a period of time and then deleted.

5. aNET Network Layer


Reliability Design

5.1. aNET Network Layer Reliability Architecture

The aNET network layer uses a disaggregated architecture with a management plane, a control plane and a data forwarding plane that communicate through standardized, decoupled interfaces. If an abnormality occurs in a sub-module, it affects only that module and does not spread into an overall failure of the aNET network platform; together with the high-reliability design of each plane, this realizes a highly reliable aNET architecture.

Communication between the planes works as follows: the management plane receives user configuration through the "Management Service" module, converts it into network configuration and delivers it to the "central controller" service module in the control plane; the control plane analyzes the configuration issued by the management plane, breaks it down and distributes it to the compute nodes and network nodes, where the data forwarding plane executes it. When a status change or an operation command is issued by the management plane, the management agent delivers the configuration directly to the data forwarding plane, which executes it without going through the control plane.

5.1.1. Management Plane High Reliability

The management plane adopts a centralized control scheme. The management plane master node is elected through the cluster module, and the cluster file system stores data on each network node in a distributed manner. If the control node fails, aNET automatically elects a new master control node, which obtains the cluster network configuration data through the cluster file system, ensuring high reliability of the management plane.
5.1.2. Control Plane High Reliability

The control plane adopts the same centralized control scheme as the management plane: the cluster module elects the master, and the master node starts the central controller. Through the active reporting mechanism of the compute and network node modules, the central controller recovers the current control state and keeps track of the real-time state of each compute and network node, ensuring high reliability of the control plane.

5.1.3. Data Forwarding Plane High Reliability

The data forwarding plane runs in the application layer. Unlike other cloud platforms whose forwarding runs in the kernel, an abnormality in the forwarding plane does not crash the kernel, and the forwarding plane can be quickly recovered by restarting the service, greatly reducing the impact on the reliability of the platform itself. The data forwarding plane also supports active/standby switchover within a single host: the standby process holds all the configuration of the data forwarding plane, and when the main process exits abnormally the standby process immediately becomes the master and takes over all network forwarding services without interrupting the service, guaranteeing high reliability of the data forwarding plane on a single host.

5.2. DVSW(Distributed Virtual Switch)

The hyper-converged virtual switch adopts a distributed design: a virtual switch instance exists on every host in the cluster. When one host goes offline, the traffic that passed through the virtual switch instance on that host is taken over by other hosts through virtual routing and virtual machine HA. To the upper layer, the virtual machine appears to stay connected to the same virtual switch; after a virtual machine is migrated or restarted by HA, it is still attached to the same virtual switch and its access relationships are unaffected, ensuring high reliability of the data forwarding plane across the hosts in the cluster.

5.3. vRouter

The virtual router in the aNET network layer is a centralized router: traffic that must be forwarded at Layer 3 passes through it. If the node where the router runs fails, or the service network port connected to the router loses connectivity, communication between the devices connected to the router is affected.

The hyper-converged aNET network layer provides a router HA function to ensure the reliability of Layer 3 forwarding. The network controller monitors the running status of the hosts and of the service network ports in real time; when a host is faulty or a service network port cannot communicate, the central controller calculates which virtual routers are affected and automatically switches them to other working hosts, so that traffic passing through the routers keeps being forwarded normally.


5.4. Distributed Firewall aFW

When a virtual machine is abnormal or faulty, the HA mechanism reboots the VM on another host in the cluster to resume service. Based on the HA startup information, the virtual network management module quickly re-establishes the distributed firewall ACL policies associated with the VM on the host where it now runs, ensuring that the VM is protected by the distributed firewall at all times.

5.5. Reliability

An NFV device is integrated into the aCloud platform in the form of a virtual machine and therefore benefits from the high-availability protection of virtual machines; the system also provides a dual-machine high-availability solution for NFV devices to further ensure reliability.

At the same time, the aNET network layer monitors the running status of NFV devices in real time across multiple dimensions (watchdog, disk IO, network traffic and BFD detection). If an NFV device fails to work properly, the virtual router bypasses the associated policy route to ensure that the service is not affected by the NFV equipment failure.

Note: the NFV devices in this section refer specifically to the application delivery controller vAD and the application firewall vAF.


5.6. Connectivity Detection

When the virtual network is configured incorrectly or a network link is faulty, the virtual network behaves abnormally. The operation and maintenance module of the virtual network provides a network connectivity detection function: the source virtual machine and the destination IP address to be probed are set through the interface, the probe request is sent to the controller, and the controller coordinates the control agents on multiple nodes to perform the connectivity detection and report the results. The logical and physical network path of the probe is clearly presented on the UI, helping the user quickly locate connectivity faults in the virtual network.

5.7. VXLAN Network Reliability

aNET performs routine connectivity detection on the VXLAN network: the VXLAN port IPs of the hosts ping each other, and if a port cannot be pinged for more than 5 s, an alarm is generated for the VXLAN network failure and the connectivity status of the VXLAN network is presented, helping the user quickly locate the VXLAN link failure. VXLAN jumbo frame detection is also supported for users who have VXLAN high-performance mode enabled.

Note: network connectivity detection (overlay network) and VXLAN network reliability detection (underlay network) together provide fault location and protection for the aNET virtual network.


5.8. Network Port Self-Recovery

The aNET data forwarding plane regularly checks the packet transmission status of each network interface. When it detects that a network port has been unable to transmit packets for 30 consecutive seconds, it resets the port, ensuring that the port can return to normal use and that user traffic recovers quickly.
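A simplified sketch of this self-recovery check follows; the interface statistics source (Linux sysfs) and the caller-supplied reset action are illustrative assumptions:

```python
import time

STALL_LIMIT_S = 30  # reset a port that has transmitted nothing for 30 seconds

def read_tx_packets(port: str) -> int:
    """Read the cumulative transmitted-packet counter for a port (Linux sysfs)."""
    with open(f"/sys/class/net/{port}/statistics/tx_packets") as f:
        return int(f.read())

def monitor_port(port: str, reset_port) -> None:
    """Reset the port if its TX counter stops moving for STALL_LIMIT_S seconds."""
    last_count, last_change = read_tx_packets(port), time.time()
    while True:
        time.sleep(5)
        count = read_tx_packets(port)
        if count != last_count:
            last_count, last_change = count, time.time()
        elif time.time() - last_change >= STALL_LIMIT_S:
            reset_port(port)                 # caller-supplied reset action
            last_change = time.time()
```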

6. Hardware Layer
Reliability Design

6.1. Hardware Health Check

Hyper-converged products offer two delivery approaches: integrated hardware-and-software delivery and aCloud pure software delivery (on third-party hardware). In both cases the hyper-converged platform provides hardware-level reliability detection and protection to prevent hardware failures from causing serious problems.

Hardware reliability monitoring includes health monitoring of the CPU, memory, network cards, hard disks and RAID cards, making it easier to detect problems in time, and provides recommended solution guidance for each detected anomaly. The results are presented in a unified manner, and the user can eliminate risks by acting on the alarm information and prompts.

In addition, the Sangfor aCloud appliance integrates a BMC diagnostic module, which can diagnose failures of key components such as the CPU, memory, hard disks, network cards, fans, temperature sensors and power supplies.


6.2. CPU Reliability

The hyper-converged platform periodically checks the CPU temperature and frequency. If an abnormality is found, an alarm is raised and a solution is provided, so that the risk of CPU failure is addressed in advance and CPU reliability is ensured.

CPU temperature monitoring: the platform checks the temperature of each physical CPU core every minute. When the temperature remains abnormal for the set duration (10 minutes), the platform raises an alarm.

CPU frequency monitoring: the HCI background checks the CPU frequency every hour and raises an alarm when the CPU frequency drops.

6.3. Memory Reliability

The hyper-converged platform provides ECC memory monitoring and memory throughput detection to ensure memory reliability.

ECC monitoring: memory is monitored in real time using ECC (Error Checking and Correcting) technology, covering UC errors (uncorrectable ECC errors, which cause the device to go down or restart) and CE errors (correctable ECC errors, which do not affect continued use of the memory as long as the error count does not keep increasing). Causes include software issues, memory module failures, motherboard signal-integrity problems, disturbances (environmental noise, high temperature, high-frequency interference from PWM chips) and inadequate cooling of the machine.

As manufacturers' memory process nodes shrink and memory frequencies keep increasing, the charge that a memory cell can store becomes smaller and leakage events become more likely, so memory ECC errors have become more and more common in recent years. aCloud statistically monitors uncorrectable UC-class ECC errors and provides alarms and solutions to avoid accidents.

Memory throughput monitoring: the hyper-converged platform detects memory throughput and raises an alert when the measured throughput is significantly lower than the nominal value.

6.4. Disk Reliability

Hard disk hot swap and RAID : Sangfor hyper-converged appliance supports hard

disk (SAS/SATA) hot swap, supports hard disk RAID 0 , 1 , 10 and multiple other

RAID modes, guarantees high availability of hard disk; It also supports additional

hot spare disk under the RAID configuration to further ensure the high

redundancy of the data disks; supports reconstructing and balancing the data

after the hard disk failure and plugging.

Hard disk comprehensive monitoring, fault avoidance, high reliability of the hard

disk

1) Hard disk status monitoring: The hyper-converged platform monitors the status of each hard disk in real time and alerts immediately when a disk goes offline;

2) IO error monitoring: The hyper-converged platform periodically analyzes I/O errors reported in the dmesg output and alerts immediately when an error is found;

3) SSD life monitoring: The hyper-converged platform regularly uses the smartctl command to check the remaining life of SSDs; when the remaining life drops below 10% of the drive's rated life, an alarm is raised immediately;

4) HDD bad sector monitoring: On the user's instruction, aCloud uses smartctl to scan all physical hard disks and raises an alarm immediately when bad sectors are found. If the number of bad sectors is below 10, a disk replacement suggestion is given; if it exceeds 10, the disk is labelled sub-healthy and degraded, and its data is gradually migrated out;

5) IO latency monitoring: On the user's instruction, the hyper-converged platform calls the fio command to measure the latency of 4 KB random reads at a queue depth of 32. When latency exceeds 10 ms an alarm is triggered immediately; when it exceeds 50 ms an emergency alert is triggered, the disk is marked sub-healthy and downgraded, and its data is gradually migrated off the disk;

6) IOPS/throughput monitoring: On the user's instruction, the hyper-converged platform calls the fio command to run a 4 KB random-read test at a queue depth of 32 against the raw disk. When the measured IOPS falls to a dangerous value the platform raises an alarm; for example, below 60 IOPS for a 7,200 rpm disk, below 100 IOPS for a 10,000 rpm disk, or below 140 IOPS for a 15,000 rpm disk.
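
The fio-based checks in items 5 and 6 can be approximated with a small wrapper such as the sketch below; a similar subprocess wrapper around smartctl could cover the SSD-life and bad-sector checks. The command options are standard fio options, but the JSON field names can vary slightly between fio versions and the thresholds are the illustrative values quoted above, so treat this as a sketch rather than the aCloud implementation:

    import json
    import subprocess

    # Thresholds taken from the description above (illustrative).
    LATENCY_WARN_MS = 10
    LATENCY_CRIT_MS = 50
    IOPS_FLOOR_BY_RPM = {7200: 60, 10000: 100, 15000: 140}

    def fio_randread_4k(device: str, runtime_s: int = 30) -> dict:
        """Run a 4 KiB random-read fio test at queue depth 32 on a raw device."""
        cmd = [
            "fio", "--name=healthcheck", f"--filename={device}",
            "--rw=randread", "--bs=4k", "--iodepth=32", "--direct=1",
            "--ioengine=libaio", "--time_based", f"--runtime={runtime_s}",
            "--output-format=json",
        ]
        out = subprocess.run(cmd, capture_output=True, text=True, check=True)
        read_stats = json.loads(out.stdout)["jobs"][0]["read"]
        # Field names may differ slightly between fio versions.
        return {"iops": read_stats["iops"],
                "lat_ms": read_stats["clat_ns"]["mean"] / 1e6}

    def evaluate_disk(device: str, rpm: int) -> list[str]:
        """Compare measured latency/IOPS against the illustrative thresholds."""
        alerts = []
        result = fio_randread_4k(device)
        if result["lat_ms"] > LATENCY_CRIT_MS:
            alerts.append(f"{device}: {result['lat_ms']:.1f} ms latency (critical, mark sub-healthy)")
        elif result["lat_ms"] > LATENCY_WARN_MS:
            alerts.append(f"{device}: {result['lat_ms']:.1f} ms latency (warning)")
        floor = IOPS_FLOOR_BY_RPM.get(rpm)
        if floor and result["iops"] < floor:
            alerts.append(f"{device}: {result['iops']:.0f} IOPS below the {floor} floor for {rpm} rpm")
        return alerts

    if __name__ == "__main__":
        for message in evaluate_disk("/dev/sdb", rpm=7200):
            print(message)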

6.5. Network Card Reliability

Network port connection mode detection: To provide a correct network environment for the business, the hyper-converged platform checks the working mode of each network port with the ethtool command, ensuring that the port's actual mode matches the negotiated working mode.

Network port deployment detection: To keep services running normally, the hyper-converged platform performs deployment detection on all network ports, verifying that every port configured for a specific purpose is actually usable, which prevents low-level faults such as down ports and unplugged network cables. If a port is not deployed correctly, an alarm is raised.

Network port packet loss detection: To ensure the stability of the service network, the hyper-converged platform reads the NIC statistics and counts packet loss per NIC. When the packet loss rate reaches a dangerous value an alarm is generated; for example, an alert is raised if the packet loss rate of a port exceeds 0.1% within 10 minutes.

Network port rate detection: To meet the performance requirements of running services, the hyper-converged platform checks the link speed of each port and alarms when it falls to a dangerous value; for example, if the port speed is below 1 Gbit/s, an alarm is raised.

Full-duplex mode detection: To provide the network efficiency required by the services, the hyper-converged platform checks the duplex mode of each port to confirm it operates in the more efficient full-duplex mode; if half-duplex mode is detected, an alert is generated (a sketch of these port checks follows).
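
On Linux, the link state, speed, duplex and drop counters that these checks rely on are also exposed under /sys/class/net (the same data ethtool reports). The sketch below is an illustrative approximation, not the platform's code; the 0.1% / 10-minute threshold comes from the text, while the interface name and exact sysfs usage are assumptions:

    import time
    from pathlib import Path

    SYS_NET = Path("/sys/class/net")
    LOSS_THRESHOLD = 0.001      # 0.1 % packet loss
    MIN_SPEED_MBPS = 1000       # anything below gigabit triggers an alarm

    def read_attr(iface: str, name: str) -> str:
        return (SYS_NET / iface / name).read_text().strip()

    def read_stat(iface: str, name: str) -> int:
        return int((SYS_NET / iface / "statistics" / name).read_text())

    def check_link(iface: str) -> list[str]:
        """Verify link state, speed and duplex mode of one network port."""
        alerts = []
        if read_attr(iface, "operstate") != "up":
            alerts.append(f"{iface}: link is down or the cable is unplugged")
            return alerts
        if int(read_attr(iface, "speed")) < MIN_SPEED_MBPS:
            alerts.append(f"{iface}: link speed is below 1 Gbit/s")
        if read_attr(iface, "duplex") != "full":
            alerts.append(f"{iface}: running in half-duplex mode")
        return alerts

    def packet_loss_rate(iface: str, window_s: int = 600) -> float:
        """Sample receive/drop counters over a window and return the drop ratio."""
        before = (read_stat(iface, "rx_packets"), read_stat(iface, "rx_dropped"))
        time.sleep(window_s)
        after = (read_stat(iface, "rx_packets"), read_stat(iface, "rx_dropped"))
        packets = max(after[0] - before[0], 1)
        return (after[1] - before[1]) / packets

    if __name__ == "__main__":
        for alert in check_link("eth0"):
            print(alert)
        if packet_loss_rate("eth0") > LOSS_THRESHOLD:
            print("eth0: packet loss rate above 0.1 % in 10 minutes")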

6.6. RAID Card Reliability

RAID card abnormal status check: The HCI platform analyzes the health status of the RAID card by reading RAID status information through system commands. If the RAID card reports an error or anomaly, an alarm is raised prompting the user to check or replace the RAID card.

JBOD (non-RAID) mode check: To preserve the hot-swap capability of the hard disks, the hyper-converged platform checks whether the controller is in JBOD mode; if a non-JBOD mode is detected, an alert is raised.

6.7. Power Supply Reliability

The hyper-converged appliance is equipped with two power supply units in a 1+1 redundant, hot-swappable configuration. If one power supply fails, the system continues to operate without affecting the service, and the faulty unit can be replaced online.


6.8. Alarm Service

The Sangfor aCloud platform provides a comprehensive alerting service covering abnormal conditions of clusters, hosts, storage, networking and virtual machines. When a problem is found, the alarm is displayed on the management page and grouped by severity level, and users are notified by e-mail and text message so that alarms are received in a timely manner. Administrators can configure the alarm policy that best fits their business requirements, such as alarms on high host memory usage or high CPU usage, to guarantee the accuracy of platform detection; log auditing capabilities further protect operational reliability.

7. Solution Layer Reliability Design


7.1. VM Fast Backup

The multi-copy mechanism of the aCloud platform handles hardware-level single points of failure, ensuring that data remains redundant when a hardware component fails. However, if a multi-point failure occurs (all copies of the data are damaged) or a logical error occurs (for example a ransomware infection or accidental deletion of a business database), the multi-copy mechanism cannot solve the problem.

To handle such failures, the aCloud platform provides a fast backup function built on an initial full backup, subsequent incremental backups and bitmap-based dirty data marking. This is Sangfor proprietary technology that greatly improves backup efficiency and reduces the impact of the backup process on the production environment. The workflow is as follows:
1) First, perform a full backup (if a full backup already exists, perform an incremental backup directly);

2) After the full backup, the service continues to write new data (for example blocks G and H), and each change is marked in a bitmap. The new data is written directly at its original position in the qcow2 file, and only the modified locations are included in the next incremental backup; after each backup completes, the bitmap is reset to 0 so the next backup starts from a clean state (see the sketch after this list);

3) When incremental backup files are deleted, their data is merged backwards so that every retained backup remains completely restorable, freeing space and saving backup storage resources.
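
The sketch below illustrates the general idea of bitmap-based incremental backup at block level. It is a simplified, hypothetical model for explanation only, not the qcow2-level implementation used by aCloud:

    BLOCK_SIZE = 4096  # bytes per tracked block (illustrative)

    class FastBackupDisk:
        """A toy block device that tracks dirty blocks in a bitmap."""

        def __init__(self, num_blocks: int):
            self.blocks = [b"\x00" * BLOCK_SIZE] * num_blocks
            self.dirty = [False] * num_blocks   # the dirty-block bitmap

        def write(self, index: int, data: bytes) -> None:
            # Writes go straight to the original location (no copy-on-write);
            # the bitmap only remembers which blocks have changed.
            self.blocks[index] = data.ljust(BLOCK_SIZE, b"\x00")
            self.dirty[index] = True

        def full_backup(self) -> dict[int, bytes]:
            """Copy every block, then clear the bitmap."""
            snapshot = dict(enumerate(self.blocks))
            self.dirty = [False] * len(self.blocks)
            return snapshot

        def incremental_backup(self) -> dict[int, bytes]:
            """Copy only the blocks marked dirty since the last backup."""
            delta = {i: self.blocks[i] for i, d in enumerate(self.dirty) if d}
            self.dirty = [False] * len(self.blocks)
            return delta

    def merge_backwards(older: dict[int, bytes], newer: dict[int, bytes]) -> dict[int, bytes]:
        """When an intermediate backup is deleted, fold it into the next one
        so that every retained backup stays fully restorable."""
        merged = dict(older)
        merged.update(newer)   # blocks from the newer backup win
        return merged

After a full backup, writing only blocks G and H would leave exactly two bits set, so the next incremental backup contains just those two blocks.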


When multiple disk images of a virtual machine, or disk images of multiple related virtual machines, must stay consistent, fast backup also provides a multi-disk data consistency guarantee. For example, in a database scenario (SQL Server, Oracle), the data disk and the log disk must be backed up at a consistent point in time; otherwise, when the backup is restored, the restored Oracle system may still be unusable because of the inconsistency. aCloud fast backup ensures that the multiple disks of a database are restored in a consistent state.

Compared with the snapshot-based CBT backup solutions used by other platforms in the industry, aCloud fast backup technology brings a fundamental improvement in performance and efficiency: new data is written directly at its original location, so no copy-on-write takes place and the mapping between the qcow2 file and the data locations does not become fragmented, which means the performance of the qcow2 image is not affected; in addition, the incremental backup approach reduces the amount of data transferred in each backup, increasing backup speed.

7.2. CDP(Continuous Data Protection)

Virtual machine continuous data protection (CDP) is another aCloud proprietary technology that adds a finer-grained layer of data protection on top of image backup: while fast backup provides hourly-granularity protection, CDP provides 1-second or 5-second granularity, recording every data change so that data can be restored with near-zero loss for ultimate protection.

Sangfor has deeply optimized its CDP technology. Compared with traditional CDP software that runs as an agent embedded in the guest OS, Sangfor integrates the CDP module at the qcow2 file layer, providing a CDP data protection solution that is lower cost, easier to deploy and better suited to virtual machine workloads.


The CDP backup data consists of RP (recovery point) log files and BP (backup point) files. A bypass structure combined with IO offload and a shared cache area is used to asynchronously copy IO from the main service to the CDP log repository, and RP points are generated periodically, so the CDP backup process does not affect normal service. Fault isolation is also implemented, so a fault in the CDP module does not affect normal service either. BP points are generated periodically according to the configured backup frequency, and both BP and RP points carry a timestamp so that a recovery point can be located in the event of a failure.

Traditional CDP software inserts a "probe program" on the IO path; if the probe itself fails, or the storage the CDP depends on fails, the original production environment may be affected. The CDP technology provided by aCloud HCI captures the IO image in bypass mode, so a fault in the CDP module cannot bring down the original production system.

CDP also performs a consistency check across the stored data of multiple disks to ensure that the data at each recovery point is correct and valid.

1) The CDP storage in this example has three virtual disks. Each IO write forms an RP point marked with an ID, and RP points carrying the same ID on all three disks are considered to belong to the same consistency group (a minimal sketch of this rule follows the list);

2) The RP points marked with ID 3 all exist, so RP3 is a valid consistent RP and can be shown on the page for VM restoration;

3) The RP with ID 6 is missing on vdisk2, so RP6 is not a valid consistent RP; it cannot be shown on the page or used to restore the virtual machine.
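
A minimal sketch of this cross-disk consistency rule: an RP ID is exposed for restore only if every virtual disk of the VM has recorded an RP with that ID (the function and data names are illustrative assumptions):

    def consistent_recovery_points(rp_ids_per_disk: dict[str, set[int]]) -> set[int]:
        """Return the RP IDs present on every virtual disk of the VM.

        Only these IDs represent consistent recovery points that can be
        shown on the page and used to restore the virtual machine."""
        disks = list(rp_ids_per_disk.values())
        if not disks:
            return set()
        return set.intersection(*disks)

    # Matching the description above: RP 3 exists on all three disks,
    # but RP 6 is missing on vdisk2, so RP 6 is not a valid restore point.
    rps = {
        "vdisk1": {1, 2, 3, 4, 5, 6},
        "vdisk2": {1, 2, 3, 4, 5},
        "vdisk3": {1, 2, 3, 4, 5, 6},
    }
    print(sorted(consistent_recovery_points(rps)))  # [1, 2, 3, 4, 5]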

7.3. DR(Disaster Recovery)

Sangfor aCloud provides a complete off-site disaster recovery (DR) solution to help users cope with server-room-level failures. The solution does not depend on third-party software, which reduces its complexity and makes it simpler and more stable. The active/standby DR solution is mainly used for disaster recovery within the same city or across different locations: the production center and the disaster recovery center operate in active/standby mode, and when a disaster such as a fire strikes the production center, the disaster recovery center can quickly restore services, maximizing the continuity of the business system.

The Sangfor aCloud off-site disaster recovery solution implements asynchronous data replication of virtual machines across two clusters through the integrated DR module aDR, which handles data backup and transfer. The DR gateway aDR calls the CDP backup API to perform local backups of the protected VMs and transmits the data between the data center (DC) and the disaster recovery center (DRC) to achieve asynchronous replication. The DR gateway supports encryption, compression, dynamic flow control, consistency checking and resumable (breakpoint) transmission to ensure data security, reliability and integrity. The hyper-converged cloud management platform manages the production cluster and the disaster recovery cluster in a unified way and provides DR policy management, DR planning, large-screen DR monitoring and DR testing, achieving second-level RPO.
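
The core of such an asynchronous replication channel can be sketched as follows: each backup delta is split into chunks that are compressed, checksummed and tagged with an offset so the transfer can resume after an interruption. This is an illustrative model only; the aDR gateway's actual protocol, API and encryption are not shown:

    import hashlib
    import zlib
    from dataclasses import dataclass

    @dataclass
    class ReplicationChunk:
        offset: int        # byte offset in the backup stream (enables resume)
        payload: bytes     # compressed data
        checksum: str      # integrity check of the original data

    def package_chunks(backup_stream: bytes, chunk_size: int = 1 << 20):
        """Split a backup delta into compressed, checksummed chunks."""
        for offset in range(0, len(backup_stream), chunk_size):
            raw = backup_stream[offset:offset + chunk_size]
            yield ReplicationChunk(
                offset=offset,
                payload=zlib.compress(raw),
                checksum=hashlib.sha256(raw).hexdigest(),
            )

    def apply_chunk(replica: bytearray, chunk: ReplicationChunk) -> bool:
        """At the DR site: decompress, verify and apply one chunk.
        Returns False if the consistency check fails so the chunk is re-sent."""
        raw = zlib.decompress(chunk.payload)
        if hashlib.sha256(raw).hexdigest() != chunk.checksum:
            return False
        end = chunk.offset + len(raw)
        if len(replica) < end:
            replica.extend(b"\x00" * (end - len(replica)))
        replica[chunk.offset:end] = raw
        return True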


7.4. SC(Stretched Cluster)

The Sangfor aCloud stretched cluster storage active-active solution achieves an RPO of zero and second-level RTO in the event of a data center failure. When one site fails, applications running on the stretched cluster seamlessly access the data copy at the other site, achieving inter-site business high availability; VMs can be live migrated or failed over by HA between the two sites.

As described in section "4.2 Data Replica Based Protection", business data is written to the storage volume as multiple copies. When the hyper-converged platform is deployed as a stretched cluster, the copies of the business data are written synchronously to the two sites: a write IO is considered complete only after both data centers have confirmed it, and only then can the next IO be written, which guarantees consistency between the copies. During normal operation the local copy is accessed preferentially; when the local copy becomes inaccessible, the system switches to the copy in the remote data center. Therefore, when one data center fails, the virtual machine can be brought up in the other data center by HA and continue running on the second data copy, maximizing the continuity of the business system.
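
The write path amounts to a synchronous dual-write in which the acknowledgement is returned only after both sites confirm, as in the simplified sketch below (the site objects and method names are hypothetical):

    from concurrent.futures import ThreadPoolExecutor

    class StretchedVolume:
        """Toy model of a volume whose replicas live in two data centers."""

        def __init__(self, local_site, remote_site):
            self.local_site = local_site
            self.remote_site = remote_site
            self._pool = ThreadPoolExecutor(max_workers=2)

        def write(self, offset: int, data: bytes) -> None:
            """Synchronous replication: the write completes only after BOTH
            sites acknowledge, so the two copies never diverge."""
            futures = [
                self._pool.submit(self.local_site.write, offset, data),
                self._pool.submit(self.remote_site.write, offset, data),
            ]
            for f in futures:
                f.result()   # block until each site has acknowledged

        def read(self, offset: int, length: int) -> bytes:
            """Prefer the local copy; fall back to the remote copy on failure."""
            try:
                return self.local_site.read(offset, length)
            except OSError:
                return self.remote_site.read(offset, length)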


When a user runs an Oracle RAC database cluster or another distributed clustered service, failover takes place automatically between the sites, implementing active-active services. The virtual machines carrying such active-active services must run in different fault domains. Sangfor aCloud supports specifying a virtual machine's running location: assuming a customer runs an active-active business on VM A and VM B, the VMs can be configured at creation time so that VM A runs only at the main site and VM B runs only at the secondary site, ensuring their running locations are mutually exclusive.

For example, in the Oracle RAC scenario, the two RAC nodes are pinned to different server rooms and are mutually exclusive; when one server room fails, the other node keeps running (see the placement sketch below).
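
Such mutually exclusive placement is essentially a per-VM location constraint, conceptually similar to the sketch below (the rule structure and names are illustrative, not aCloud's configuration format):

    from dataclasses import dataclass

    @dataclass
    class PlacementRule:
        vm_name: str
        allowed_site: str   # the only site this VM may run on

    RULES = [
        PlacementRule("oracle-rac-node-a", allowed_site="site-A"),  # main site
        PlacementRule("oracle-rac-node-b", allowed_site="site-B"),  # secondary site
    ]

    def can_place(vm_name: str, target_site: str) -> bool:
        """Return True if the scheduler may start this VM on target_site."""
        for rule in RULES:
            if rule.vm_name == vm_name:
                return rule.allowed_site == target_site
        return True  # no rule: the VM may run anywhere

    # With these rules the two RAC nodes can never land in the same server
    # room, so a failure of one room always leaves the other node running.
    assert can_place("oracle-rac-node-a", "site-A")
    assert not can_place("oracle-rac-node-a", "site-B")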

The stretched cluster performs its data consistency check through the arbitration copy; for details, please refer to section "4.3 Data Arbitration Protection".
Block A1, Nanshan iPark,

No.1001 Xueyuan Road, Nanshan District, Shenzhen,

Guangdong Province, P. R. China (518055)

Service hotline: +60 12711 7129 (7511)

Email: sales@sangfor.com
