Sangfor aCloud

Reliability Technical
White Paper

Sangfor Technologies Inc.


Copyright Notice
This document is copyrighted by Sangfor Technologies Inc. Sangfor reserves the right of final interpretation and the right to amend this document and this statement.

Unless otherwise stated, the copyright and other related rights to all content in this document, including its text, format, illustrations, photographs, methods and processes, belong to Sangfor. Without Sangfor's written consent, no person may copy, extract, back up, modify, distribute or translate any part of this document, in whole or in part, for commercial purposes.

Disclaimer
This document is for informational purposes only and is subject to change

without notice.

Sangfor Technologies Inc. has made every effort to ensure that its contents are

accurate and reliable at the time of writing this document, but Sangfor is not liable

for any loss or damage caused by omissions, inaccuracies or errors in this document.

Contact us
Service hotline: +60 12711 7129 (7511)

Hong Kong: (+852) 3427 9160

United Kingdom: (+44) 8455 332 371

Singapore: (+65) 9189 3267

Malaysia: (+60) 3 2201 0192

Thailand: (+66) 2 254 5884

Indonesia: (+62) 21 5695 0789

You can also visit the official website of Sangfor Technologies:

www.sangfor.com for the latest technology and product information.


Table of Contents
1. HYPER-CONVERGED PLATFORM ARCHITECTURE

2. ACLOUD PLATFORM MANAGEMENT RELIABILITY
2.1. DISTRIBUTED ARCHITECTURE
2.2. LINK REDUNDANCY
2.3. SYSTEM SELF-PROTECTION
2.4. RESOURCE RESERVATION
2.5. MONITOR CENTER
2.6. WATCHDOG
2.7. BLACK BOX
2.8. SYSTEM FILES BACKUP

3. ASV COMPUTE LAYER RELIABILITY DESIGN
3.1. VM RESTART
3.2. VM HA (HIGH AVAILABILITY)
3.3. VM SNAPSHOT
3.4. LIVE MIGRATION
3.5. HOST MAINTENANCE MODE
3.6. DRS (DYNAMIC RESOURCE SCHEDULER)
3.7. DRX (DYNAMIC RESOURCE EXTENSION)
3.8. VM PRIORITY
3.9. RECYCLING BIN
3.10. VM ANTI-AFFINITY

4. ASAN STORAGE LAYER RELIABILITY DESIGN
4.1. ASAN DISTRIBUTED STORAGE ARCHITECTURE
4.2. DATA REPLICA BASED PROTECTION
4.3. ARBITRATION BASED PROTECTION
4.4. SPARE DISK
4.5. IO QOS PROTECTION
4.6. DISK STATE DETECTION
4.7. DISK MAINTENANCE MODE
4.8. SILENT ERROR DETECTION
4.9. FAST DATA REBUILDING
4.10. FAULT DOMAIN ISOLATION
4.11. DELAYED DATA DELETION
4.12. DATA SELF-BALANCING

5. ANET NETWORK LAYER RELIABILITY DESIGN
5.1. ANET NETWORK LAYER RELIABILITY ARCHITECTURE
5.1.1. Management Plane High Reliability
5.1.2. Control Plane High Reliability
5.1.3. Data Forwarding Plane High Reliability
5.2. DVSW (DISTRIBUTED VIRTUAL SWITCH)
5.3. VROUTER
5.4. DISTRIBUTED FIREWALL AFW
5.5. RELIABILITY
5.6. CONNECTIVITY DETECTION
5.7. VXLAN NETWORK RELIABILITY
5.8. NETWORK PORT SELF-RECOVERY

6. HARDWARE LAYER RELIABILITY DESIGN
6.1. HARDWARE HEALTH CHECK
6.2. CPU RELIABILITY
6.3. MEMORY RELIABILITY
6.4. DISK RELIABILITY
6.5. NETWORK CARD RELIABILITY
6.6. RAID CARD RELIABILITY
6.7. POWER SUPPLY RELIABILITY
6.8. ALARM SERVICE

7. SOLUTION LAYER RELIABILITY DESIGN
7.1. VM FAST BACKUP
7.2. CDP (CONTINUOUS DATA PROTECTION)
7.3. DR (DISASTER RECOVERY)
7.4. SC (STRETCHED CLUSTER)


1. Hyper-converged Platform
Architecture

The Sangfor aCloud HCI platform is built on the idea of the "software-defined data center", with virtualization technology at its core. It uses compute virtualization (aSV), storage virtualization (aSAN), network virtualization (aNET) and other components to form a unified resource pool, reducing the amount of data center hardware, saving investment costs and shortening the time it takes to bring applications online. It provides a graphical interface and self-service operation and maintenance capabilities that reduce O&M complexity and free up productivity, and the product quality is continuously refined to deliver a minimal, stable, reliable and high-performance hyper-converged solution.

Sangfor aCloud is a software-centric platform, and its architecture is the most fundamental guarantee of the product's reliability, covering reliability at the platform management, compute, storage, network, hardware and solution levels.

2. aCloud Platform Management


Reliability

2.1. Distributed Architecture

Sangfor aCloud adopts a fully distributed architecture to ensure platform reliability.

1) The hyper-converged cluster adopts a decentralized design. Each node is an independent, peer working node, so there is no single-node point of failure. A master (control) node acts as the management access point for the cluster, and the platform elects the master automatically through an election algorithm. If the host acting as the master node fails, the platform automatically elects a new master node to keep the cluster stable and accessible. During the master switchover, the normal operation of VMs is not affected.

2) The hyper-converged cluster configuration information is stored as multiple copies distributed across the cluster nodes in the cluster file system. If any single node fails, the cluster configuration data is not lost.

aCloud overall architecture diagram

➢ Controller: provides management and control services for the entire cluster, such as user management and authentication, resource alarms, backup management, etc. A Controller exists on each node, but only one master Controller is active at any time; the Controllers on the other nodes are in Standby state.

➢ Worker: primarily responsible for specific work such as computation, configuration and data transmission/exchange; a Worker is active on each node.
2.2. Link Redundancy

The aCloud HCI solution has four network planes, each deployed independently: the management network, the business network, the data communication network (VXLAN) and the storage network.

Management network: the administrator accesses the management network to manage the hyper-converged cluster. The management network implements link redundancy through dual-switch aggregation, so the failure of a single switch or a single link does not affect the stability of the hyper-converged management platform.

Business network: used for normal service access and publishing. The business network can implement link redundancy through dual-switch aggregation, network ports can be statically bonded for the service egress, and multiple service egresses can be configured in the virtual network for virtual machines to choose from, ensuring high reliability of the business network.

Data communication network (VXLAN): carries east-west traffic between virtual machines and enables communication between services. A private network is set up to ensure data security, and link redundancy is achieved through aggregation on the physical switches. A distributed virtual switch on Sangfor aCloud has a virtual switch instance on every host in the cluster; when one host goes offline, the traffic that passes through the virtual switch instance on that host is redirected and taken over by other hosts through virtual routing and virtual machine HA on those hosts.

Storage network: carries the storage IO traffic between hosts. A private network is set up to protect data security. No static bonding or link aggregation is required on the switches, because the aCloud platform implements link aggregation in software: aSAN private-network link aggregation load-balances traffic on a per-TCP-connection basis, so different TCP connections between two hosts may use different physical links.
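As a minimal illustration of per-connection load balancing (an illustrative sketch, not Sangfor's actual implementation; the link names and hash choice are assumptions), a TCP connection's 4-tuple can be hashed onto one of the available physical links so that a single connection always uses one link while different connections spread across all links:

```python
import hashlib

def pick_link(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
              links: list) -> str:
    """Map a TCP connection (4-tuple) to one physical link.

    Connections with the same 4-tuple always use the same link, while
    different connections are spread across all healthy links.
    """
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return links[digest % len(links)]

# Example: two storage-network links between a pair of hosts.
links = ["eth2", "eth3"]
print(pick_link("10.0.0.1", 50512, "10.0.0.2", 3260, links))
```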


The four network planes are fault isolated, and failure of any one network plane

will not affect the other network planes.

2.3. System Self-Protection

Because the hyper-converged platform itself consumes a certain amount of computing resources, the platform provides a system resource self-protection mechanism to keep itself stable and performant while carrying services: during system startup, it forcibly reserves the minimum CPU and RAM resources required for the platform to run, so that virtual machines cannot divert too many system resources and cause the aCloud system to malfunction. aCloud adaptively adjusts the amount of reserved system resources based on the functional components enabled on the platform.

2.4. Resource Reservation

To guarantee that sufficient resources are available for HA execution and service recovery in the event of a host failure, aCloud provides a resource reservation mechanism: a certain amount of resources is reserved on each physical host. These resources are not allocated under normal circumstances and may only be allocated when a host fails and the HA mechanism kicks in.

The resource reservation mechanism prevents the HA mechanism of the entire aCloud platform from becoming ineffective after resources are over-committed. For the HA mechanism, see "3.2 VM HA (High Availability)".

2.5. Monitor Center

The hyper-converged platform provides a monitoring and alarm center. It delivers comprehensive monitoring and alarm services for the services running on the platform, and key indicators can be customized for intelligent monitoring and rapid alerting, enabling business personnel to identify application bottlenecks faster and keep a global view of the platform's state.

➢ Monitors key information such as virtual machine CPU, memory, IO and internal process status, and generates historical trend reports;

➢ Provides multiple alarm channels such as syslog, SNMP trap, email and SMS, so users can receive key alarm information in time.

2.6. Watchdog

A system process may crash or deadlock because of an unknown error and stop providing services. The process watchdog mechanism provided by the hyper-converged platform can recover the process in time.

A separate daemon with the highest priority runs in the background of aCloud and is responsible for monitoring all aCloud system processes. Once a system process crashes or deadlocks, the watchdog forcibly intervenes to restart the process and resume business operations, and records the status information of the process at that moment into the black box for post-analysis.
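A minimal sketch of such a watchdog loop is shown below. The service names and restart commands are hypothetical placeholders, not the actual aCloud daemon or its process list:

```python
import subprocess
import time

# Hypothetical list of guarded services and the commands that restart them.
WATCHED = {
    "cluster-controller": ["systemctl", "restart", "cluster-controller"],
    "storage-worker": ["systemctl", "restart", "storage-worker"],
}

def is_alive(name: str) -> bool:
    """Return True if a process with this exact name is currently running."""
    return subprocess.run(["pgrep", "-x", name],
                          capture_output=True).returncode == 0

def watchdog_loop(interval: int = 5) -> None:
    while True:
        for name, restart_cmd in WATCHED.items():
            if not is_alive(name):
                # A real watchdog would first record "dying information"
                # (black box) before forcing a restart.
                print(f"[watchdog] {name} is down, restarting")
                subprocess.run(restart_cmd)
        time.sleep(interval)
```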

2.7. Black Box

In the event of a system crash, process deadlock or abnormal reset, the hyper-converged platform first restores the service to ensure business continuity, and provides black box technology that backs up the "dying information" to a local directory for subsequent fault analysis and handling.

The black box is mainly used to collect and store the kernel logs and diagnostic-tool output produced before the operating system on a management or compute node exits abnormally. After an operating system crash, maintenance personnel can export and analyze the data saved by the black box.

2.8. System Files Backup

The aCloud platform provides one-click backup of system files (platform configuration data). When a system-level failure results in the loss of the system configuration files, users can quickly restore the system configuration from a backup file.


3. aSV Compute Layer Reliability
Design

3.1. VM Restart

When the application layer of the VM guest OS is no longer being scheduled (blue screen or black screen), aCloud provides an abnormal-restart mechanism that detects the anomaly and forcibly resets the VM to restore services in a timely manner and ensure business continuity.

The aCloud platform continuously checks application-level availability through the Sangfor vmtool optimization tool installed in the virtual machine. Every few seconds, vmtool sends a heartbeat to the host where the virtual machine is running, and the host determines whether the application layer of the guest OS is still being scheduled based on the heartbeat, disk IO and network traffic of the VM. If the application layer has not been scheduled for several minutes, the virtual machine is considered to have a black screen or blue screen, and the platform performs the recovery operation: it shuts down the VM and restarts it.
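A simplified sketch of this kind of liveness decision follows. The field names, window and thresholds are illustrative assumptions, not aCloud's internal logic:

```python
import time
from dataclasses import dataclass

@dataclass
class VmSample:
    last_heartbeat: float   # timestamp of the last vmtool heartbeat
    disk_io_bytes: int      # disk IO observed since the previous sample
    net_bytes: int          # network traffic observed since the previous sample

def guest_unresponsive(sample: VmSample, now: float,
                       silence_threshold: float = 300.0) -> bool:
    """Treat the guest as blue/black screened only when the heartbeat has
    been silent for several minutes AND there is no disk or network activity."""
    heartbeat_silent = (now - sample.last_heartbeat) > silence_threshold
    no_activity = sample.disk_io_bytes == 0 and sample.net_bytes == 0
    return heartbeat_silent and no_activity

# Example: heartbeat missing for 6 minutes with no IO -> restart candidate.
sample = VmSample(last_heartbeat=time.time() - 360, disk_io_bytes=0, net_bytes=0)
print(guest_unresponsive(sample, time.time()))  # True
```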

There are many possible causes of virtual machine abnormalities: a blue screen in the guest system may be caused by hard disk failures, driver errors, CPU overclocking, BIOS settings, pirated software or viruses, and the guest operating system may also hang with a black screen. In these cases the hyper-converged platform provides automatic restart, helping administrators automate operation and maintenance.

3.2. VM HA(High Availability)

When the external environment is faulty (for example, the host network cable is disconnected, or the storage cannot be accessed), the hyper-converged platform provides a mature HA mechanism: the services of the faulty host are automatically restarted on a healthy host with sufficient resources, so that the service continues uninterrupted or with only a very short interruption.

In an aCloud cluster, cluster heartbeat detection is performed every 5 seconds, by polling, on the nodes where HA-enabled VMs are running, to detect whether a virtual machine's state is abnormal. When the abnormal duration reaches the fault detection sensitivity set by the user (the shortest setting is 10 s), the HA virtual machine is switched to another host to keep the service system highly available, greatly shortening the service interruption caused by host or link failures.

Note: the HA mechanism requires resources (mainly memory) to be reserved in the cluster so that the abnormal virtual machines can be pulled up; this is the resource reservation mechanism described in section 2.4. If the reserved resources are insufficient, HA will fail to pull up the VMs.
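A minimal sketch of the HA decision loop described above is given here. The polling period and sensitivity follow the text, but the vm/host objects and their methods are hypothetical placeholders and the host-selection rule is a simplification:

```python
import time

def run_ha_monitor(vms, hosts, sensitivity_s: float = 10.0,
                   poll_interval_s: float = 5.0):
    """Poll HA-enabled VMs; restart a VM elsewhere once it has been
    abnormal for longer than the user-configured sensitivity."""
    abnormal_since = {}  # vm id -> timestamp when the anomaly was first seen
    while True:
        now = time.time()
        for vm in vms:
            if vm.is_abnormal():
                abnormal_since.setdefault(vm.id, now)
                if now - abnormal_since[vm.id] >= sensitivity_s:
                    # Pick a healthy host whose reserved resources can hold the VM.
                    candidates = [h for h in hosts
                                  if h.is_healthy() and h.free_memory() >= vm.memory]
                    if candidates:
                        target = max(candidates, key=lambda h: h.free_memory())
                        target.restart_vm(vm)
                    abnormal_since.pop(vm.id, None)
            else:
                abnormal_since.pop(vm.id, None)
        time.sleep(poll_interval_s)
```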

3.3. VM Snapshot

When a virtual machine suffers a logical (non-hardware) failure that causes a service abnormality, such as a failed change to the virtual machine (patching, new software installation, etc.), the hyper-converged platform provides virtual machine snapshot technology, which can quickly roll the VM back to the healthy state it was in at the snapshot time.

A virtual machine snapshot saves the state of a virtual machine at a certain point in time, so that the virtual machine can later be restored to that state.

3.4. Live Migration

When the administrator needs to perform hardware maintenance or replace a host, the hyper-converged platform provides a virtual machine live migration mechanism to migrate virtual machines to other hosts without affecting service operation, ensuring that services remain available.

When a VM is live-migrated, the state of the source and destination is synchronized, including memory, vCPU, disk and peripheral register state. After the synchronization is complete, the source VM is suspended, the computing resources it occupied on the source host are released, and the destination VM is started.

During the migration, the resources of the destination physical host are checked; if they are insufficient, the migration fails. The platform also checks that the target virtual network is consistent with the source (if not, an alarm is generated and the user decides whether to continue), so that the migration can proceed safely.

aCloud live migration supports the following three scenarios:

1) Intra-cluster live migration: because storage is distributed and shared within the cluster, only the running location of the virtual machine changes and the storage location does not, so only the running data (memory, vCPU, disk and peripheral register state) needs to be synchronized;

2) Cross-storage live migration within the cluster: when the storage location also needs to change, the migration service first migrates the virtual machine's virtual disk image files and then synchronizes the running data;

3) Cross-cluster live migration: both the virtual disk image files and the running data are synchronized.

Note: aCloud supports clusters built from heterogeneous servers. By default, new aCloud virtual machines use a common vCPU type, so the virtual machine does not depend on the physical CPU model (instruction set) and can be live-migrated across physical hosts with different generations of CPUs.

3.5. Host Maintenance Mode

When the administrator needs to perform hardware maintenance or replace a host, the hyper-converged platform provides a host maintenance mode that automatically live-migrates virtual machines. The system first migrates the services running on the host entering maintenance mode to other hosts, ensuring that services are not affected during the replacement process; maintenance mode thus enables self-service operation and maintenance. A host in single-host maintenance mode is frozen and does not read or write data.

Without the host maintenance function, the administrator would need to migrate the virtual machines manually, and a single point of data failure could occur. In host maintenance mode, a storage replica check is performed to ensure that every data copy on the host also has a copy on another host, so powering off the host does not affect services.


3.6. DRS(Dynamic Resource Scheduler )

The aCloud platform provides a dynamic resource scheduling mechanism that monitors resource pool usage across the cluster. When the service pressure on virtual machines is so high that the performance of a physical host is insufficient to carry normal service operation, the DRS function dynamically evaluates the resource status of the whole cluster and migrates virtual machines from overloaded servers to servers with sufficient resources, keeping the services in the cluster healthy and balancing host load across the cluster.

The baseline for considering a host's resources overloaded is user-defined, including CPU overload, memory overload and overload duration; this prevents workloads from being switched back and forth by DRS. The user can choose between manual and automatic resource scheduling.
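A rough sketch of the overload check and migration choice is shown below. The thresholds, duration and the host/VM objects are illustrative assumptions, not the actual DRS policy:

```python
def host_overloaded(host, cpu_threshold=0.85, mem_threshold=0.85,
                    min_duration_s=300) -> bool:
    """A host counts as overloaded only if CPU or memory usage has stayed
    above its threshold for the configured duration, avoiding flapping."""
    return ((host.cpu_usage > cpu_threshold or host.mem_usage > mem_threshold)
            and host.overload_seconds >= min_duration_s)

def plan_migration(hosts):
    """Return (vm, source, target) suggestions for overloaded hosts."""
    plans = []
    idle_hosts = sorted((h for h in hosts if not host_overloaded(h)),
                        key=lambda h: h.cpu_usage)
    for src in (h for h in hosts if host_overloaded(h)):
        if not idle_hosts:
            break
        # Move the busiest VM on the overloaded host to the least-loaded host.
        vm = max(src.vms, key=lambda v: v.cpu_usage)
        plans.append((vm, src, idle_hosts[0]))
    return plans
```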

3.7. DRX(Dynamic Resource eXtension)

When the service pressure on a virtual machine increases, the computing resources allocated when the user created the VM may no longer be enough for it to run stably. The hyper-converged platform provides a dynamic resource extension function that monitors the memory and CPU usage of the virtual machine in real time. When the resources allocated to the virtual machine are about to reach their bottleneck and the physical host it runs on still has sufficient resources, CPU and memory are hot-added to the service virtual machine, automatically or manually, to keep the service running normally. When the physical host itself is overloaded, no hot-add is performed, to avoid squeezing the resources of other virtual machines; in that case dynamic resource scheduling is performed according to the load of the cluster.

The resource usage bottleneck of a service virtual machine is defined by the user, including CPU usage, memory usage and the duration for which usage stays at the bottleneck, ensuring that resources are allocated to the applications that need them.
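A minimal sketch of the hot-add decision is given below; the thresholds, step sizes and object attributes are assumptions for illustration only:

```python
def plan_hot_add(vm, host, usage_threshold=0.9, duration_s=120,
                 cpu_step=1, mem_step_mb=1024):
    """Suggest a CPU/memory hot-add for a VM that has stayed at its
    bottleneck for a sustained period, but only if the host has headroom."""
    actions = []
    if host.overloaded():
        return actions  # never squeeze other VMs on an overloaded host
    if vm.cpu_usage > usage_threshold and vm.cpu_busy_seconds >= duration_s:
        actions.append(("hot_add_vcpu", cpu_step))
    if vm.mem_usage > usage_threshold and vm.mem_busy_seconds >= duration_s:
        actions.append(("hot_add_memory_mb", mem_step_mb))
    return actions
```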

3.8. VM Priority

When the available resources of the cluster are limited (tight system resources, host downtime, virtual machine HA, etc.), the operation of important services must be guaranteed first. The hyper-converged platform provides virtual machine priority tags so that important virtual machines are supplied with resources first, giving critical business a higher level of resource protection.

3.9. Recycling Bin

When an administrator manually deletes resources such as virtual machines and later needs to retrieve them, the hyper-converged platform provides a resource recycling mechanism: the administrator can go to the recycle bin and retrieve virtual machines and virtual network devices that have not yet been completely deleted. This gives the user a buffer against accidental deletion and a chance to reverse the operation, ensuring the reversibility and correctness of user operations as much as possible.

A virtual device deleted by the user is kept in the recycle bin for a period of time. During this time the disk space occupied by the deleted device is not released and its data is not removed, so the device can still be retrieved; a deleted device that stays in the recycle bin for more than 30 days is automatically deleted and its disk space is released.

3.10. VM Anti-affinity

When multiple virtual machines form an active/standby or load-balancing relationship, such as the RAC node virtual machines of an Oracle RAC database, placing all of them on one host is like putting all the eggs in one basket: the service is compromised when that node fails. The aCloud hyper-converged platform provides a virtual machine anti-affinity mechanism to ensure that mutually exclusive virtual machines never run on the same host. When one host goes down, the virtual machines on the other hosts in the cluster continue to run, ensuring business continuity. During DRS dynamic resource scheduling and HA restarts, mutually exclusive virtual machines still follow the anti-affinity principle and are prohibited from running on the same host.

4. aSAN Storage Layer


Reliability Design

4.1. aSAN Distributed Storage Architecture

The aSAN storage layer is a self-developed distributed storage system. It uses virtualization technology to pool the local hard disks of the general-purpose x86 servers in the cluster into storage volumes, unifying the integration, management and scheduling of server storage resources, and exposes NFS/iSCSI to the upper layer, allowing virtual machines to freely allocate storage space from the resource pool according to their storage requirements.
4.2. Data Replica Based Protection

When hardware fails (a damaged hard disk, a faulty storage switch or storage network card, etc.), data on the failed host may be lost or become inaccessible, affecting service operation. The hyper-converged platform provides a multi-copy data protection mechanism that keeps multiple copies of service data in the storage pool, distributed across different disks on different physical hosts. If a host fails, the user data still has a functioning copy on other hosts, so data is not lost and services keep running normally.

Note: the multi-copy mechanism only addresses hardware-level faults, not logic-level faults. For example, if an upper-layer application is encrypted by ransomware, the data at the bottom layer is encrypted regardless of how many copies are used.


4.3. Arbitration Based Protection

When multiple copies are written inconsistently because of network or other issues, and each copy considers itself to hold valid data, the service cannot tell which copy is correct: data split-brain occurs and normal operation is affected. The hyper-converged platform provides a multi-copy arbitration protection mechanism: each piece of service data has multiple data copies plus an arbitration copy. The arbitration copy is used to determine which data copy is correct, and the service is told to use that copy, ensuring safe and stable operation.

The arbitration copy is a special copy that holds only a small amount of parity data and occupies little storage space. Like the data copies, it must follow the host mutual-exclusion principle, so at least three storage disks on different hosts are required to form a copy set that includes an arbitration copy. The core principle of the arbitration mechanism is that the minority follows the majority: if the host where the virtual machine runs can access fewer than half of the total number of copies (data copies plus the arbitration copy), the virtual machine is prohibited from running on that host; otherwise, the virtual machine may run on that host.
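A small sketch of this majority check (the copy counts are illustrative and this is not aSAN's actual code):

```python
def can_run_on_host(accessible_copies: int, total_copies: int) -> bool:
    """Majority rule: the host must be able to reach more than half of all
    copies (data copies + arbitration copy) to be allowed to run the VM."""
    return accessible_copies > total_copies / 2

# Two data copies + one arbitration copy = three copies in total.
print(can_run_on_host(accessible_copies=2, total_copies=3))  # True
print(can_run_on_host(accessible_copies=1, total_copies=3))  # False -> prohibited
```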

4.4. Spare Disk

When an HDD in the cluster is damaged and IO reads/writes fail, services are affected. The hyper-converged platform provides hot spare disk protection: a system hot spare disk automatically takes over the work of the damaged HDD without any manual intervention by the user.

In scenarios where the host cluster is large and contains many hard disks, disk faults occur from time to time. The aCloud platform frees users from worrying about data loss caused by a damaged hard disk that is not replaced in time.
4.5. IO QOS Protection

To provide higher cluster IO capability and allocate IO optimally among user services, the hyper-converged platform provides an IO QoS protection mechanism. By configuring virtual machine priority, users can guarantee the IO supply of important services, including IO queue priority and preferential use of resources such as SSD tiered cache space.

The service priority policy is: important virtual machine service IO > normal virtual machine service IO > other IO (backup, data reconstruction, etc.). The platform also automatically checks the IO throughput load and physical space occupied on each physical disk, and applies different scheduling strategies to maximize IO.

4.6. Disk State Detection

When a hard disk approaches the end of its life or develops too many bad sectors, it is in a sub-health state: although it can still be recognized and used for reads and writes, its reads and writes may fail and data may even be lost. The platform provides a sub-health detection mechanism for hard disks to detect such disks in advance and avoid the impact of disk failure on services.

Hard disk sub-health detection calls the smartctl and iostat commands to obtain disk status information and compares it with abnormality thresholds to determine whether the disk shows sub-health symptoms (such as a slow disk, IO stutter, or an exhausted PCIe SSD lifespan). It also filters the kernel logs of IO calls and the RAID card error logs to obtain the disk's error information.

The basic principle is as follows:

A sub-health disk triggers a "slow disk" alarm on the aCloud platform, helping users discover it and replace it with a healthy disk in time so that all disks in the cluster stay healthy. A sub-health disk is restricted from receiving new shards: its existing shards are silently handled and can no longer accept new data, and the data on the sub-health disk is rebuilt onto healthy disks.

4.7. Disk Maintenance Mode

After a hard disk enters the sub-health state and an alarm is generated, operation and maintenance personnel need to replace the disk. If a data synchronization task still needs to read data from the disk being replaced, pulling and inserting the disk could cause a double fault and affect services. In this case, the hard disk maintenance/isolation function can be used: before the system isolates the disk, the data is fully checked to ensure that everything on the disk has a healthy copy on another disk. After isolation, the disk no longer accepts reads or writes, ensuring that services are not affected while the disk is isolated.


4.8. Silent Error Detection

A hard disk can develop errors without any warning during use: so-called silent errors. The user only discovers that the data is wrong or damaged when it is needed, which can eventually cause irreparable damage, because a silent error gives no warning and may have occurred long before the symptom appears, leading to very serious problems. NetApp observed more than 1.5 million hard disk drives over 41 months and found more than 400,000 silent data corruptions, of which more than 30,000 were not detected by the hardware RAID controllers.

To prevent wrong data from being returned to users because of silent errors, the hyper-converged platform provides aSAN end-to-end data verification. It adopts an industry-leading checksum algorithm through a Checksum engine, a Verify engine and a Checksum management module, combined with key optimizations for checksum storage performance: a checksum is generated as a "fingerprint" of the data as soon as the user data enters the system and is stored with it; afterwards, the checksum is used to verify the data and protect the user from silent failures.

The schematic diagram is as follows:

End-to-end verification has two key points: the checksum generation algorithm and the storage performance optimization during checksum generation. aSAN has industry-leading technical solutions for both.

Key Technology 1: industry-leading checksum algorithm

A checksum algorithm has two main evaluation criteria: one is the speed at which the checksum is generated; the other is the collision rate and uniformity. The collision rate is the probability that two different pieces of data generate the same checksum.

Sangfor's aSAN end-to-end verification scheme uses the xxHash64 algorithm, which is faster and has a lower collision rate than the CRC-32 and Adler-32 algorithms commonly used in the industry.
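As a minimal illustration of the fingerprint idea (using the third-party Python xxhash package as a stand-in; the block structure below is an assumption, not aSAN's on-disk format):

```python
import xxhash  # third-party package: pip install xxhash

def write_block(data: bytes) -> dict:
    """Store the data together with its 64-bit xxHash fingerprint."""
    return {"data": data, "checksum": xxhash.xxh64(data).intdigest()}

def read_block(block: dict) -> bytes:
    """Recompute the checksum on read; a mismatch means a silent error."""
    if xxhash.xxh64(block["data"]).intdigest() != block["checksum"]:
        raise IOError("silent data corruption detected; recover from a healthy replica")
    return block["data"]

blk = write_block(b"user data entering the system")
assert read_block(blk) == b"user data entering the system"
```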

Key Technology 2: storage performance optimization at checksum generation

The checksum is generated in memory and can be transferred and stored along with the data. When data is written to non-volatile storage such as disks and SSDs, the checksum also needs to be stored, which introduces additional write overhead and affects system performance.

Sangfor aCloud hyper-convergence is based on an architecture without a metadata center. In the aSAN end-to-end verification scheme, checksum storage is optimized through asynchronous write-back, bypassing of the critical I/O path, and I/O contention isolation to address these performance issues. In addition, correctness and consistency are ensured through self-checking, collision detection and periodic verification.

4.9. Fast Data Rebuilding

When multiple copies of data are written inconsistently, or after hardware is replaced following a host or hard disk failure, the hyper-converged platform provides a fast data rebuilding mechanism. It periodically checks the working status of the hard disks and the health of the copies, and uses the healthy data as the source for rebuilding replicas, ensuring the security of the cluster data.

Data rebuilding is triggered when a data disk or cache disk is pulled out and goes offline, when continuous service IO failures on a data disk cause it to be judged faulty, or when service IO failures on a cache disk cause it to be judged faulty.

The rebuilding process uses the following techniques to speed up reconstruction:

1) Global participation with multiple concurrent rebuilds: the rebuild I/O is highly concurrent, reading from multiple source disks and writing to multiple destination disks to rebuild data quickly;

2) Intelligent rebuilding: rebuilding occupies part of the storage network bandwidth and disk performance, so the rebuild program senses the I/O of the upper-layer services and intelligently adjusts the I/O consumed by rebuilding, reconstructing data quickly while keeping the business running normally;

3) Tiered rebuilding: the priority of data rebuilding follows the priority of the virtual machine. When the space available in the storage volume for rebuilding is scarce, tiered rebuilding gives priority to the user's important data.
4.10. Fault Domain Isolation

The hyper-converged platform provides storage fault domain isolation. Storage is partitioned into different disk volumes, and users can divide aSAN into disk volumes according to their requirements; each disk volume is an independent fault domain. The replica mechanism and the rebuild mechanism of aSAN stay within a fault domain and never rebuild into another fault domain, and a fault in one fault domain does not spread to other fault domains, effectively containing fault propagation. For example, a rack failure only affects the disk volumes running on that rack.

4.11. Delayed Data Deletion

The "3.9 Recycling Bin" section explained that once a virtual device is completely removed, its disk space is freed and the device can no longer be retrieved. To further protect the reversibility of user operations, the aSAN virtual storage layer provides a delayed data deletion mechanism to retrieve virtual device data that aSAN has not yet physically deleted.

When the upper-layer service sends a delete instruction to the aSAN data storage layer (for example, a command to completely delete a virtual machine image), aSAN checks the remaining disk space. If the remaining space is sufficient, aSAN does not immediately clear and reclaim the space; instead, the data is placed in a "to-be-deleted queue" and a successful deletion result is returned to the upper layer. The data is then retained for a further period (10 days by default), after which it is actually deleted.

If the remaining space of aSAN falls below 70% and there is data in the background waiting to be deleted, aSAN reclaims the data in the to-be-deleted queue starting from the oldest entries, without waiting for the timeout.
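A toy sketch of such a to-be-deleted queue follows. The retention period and space threshold come from the text, but the storage object and its methods are hypothetical:

```python
import time
from collections import deque

RETENTION_S = 10 * 24 * 3600       # keep deleted data for 10 days by default
FREE_SPACE_THRESHOLD = 0.70        # reclaim early when free space gets low

class DelayedDeleter:
    def __init__(self, storage):
        self.storage = storage                 # hypothetical storage backend
        self.pending = deque()                 # (enqueue_time, object_id)

    def delete(self, object_id: str) -> None:
        """Acknowledge the delete to the caller but only queue the reclaim."""
        self.pending.append((time.time(), object_id))

    def reclaim(self) -> None:
        now = time.time()
        low_space = self.storage.free_ratio() < FREE_SPACE_THRESHOLD
        while self.pending:
            enqueued, object_id = self.pending[0]
            expired = now - enqueued >= RETENTION_S
            if expired or low_space:
                # Oldest entries are reclaimed first under space pressure.
                self.pending.popleft()
                self.storage.purge(object_id)
                low_space = self.storage.free_ratio() < FREE_SPACE_THRESHOLD
            else:
                break
```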
4.12. Data Self-Balancing

aSAN uses data balancing to ensure that, in all cases, data is distributed as evenly as possible across the hard disks in a storage volume, avoiding extreme data hotspots and making use of the space and performance of newly added hard disks as soon as possible, so that the hard disks of every host are used.

1. Balancing trigger conditions:

1) Planned balancing

Planned data balancing is initiated within a planned time frame (for example, 12 am to 7 am). When the capacity utilization of the hard disks within a storage volume differs greatly, balancing is triggered on the disks with high usage, migrating part of their data to disks with low capacity usage.

Within the time frame planned by the user, aSAN's data balancing module scans all the hard disks in the storage volume. If the difference between the highest and lowest disk capacity usage in the volume exceeds a threshold (30% by default), balancing is triggered and continues until the difference between the usage of any two disks in the volume no longer exceeds a threshold (20% by default).

For example, after the user expands the storage volume, balancing is triggered during the user-defined balancing window to migrate data onto the newly added disks.

2) Automatic balancing

Automatic balancing is initiated by the system without user intervention, to avoid one disk in the storage volume filling up while other disks still have free space.

When the space usage of any disk in the storage volume exceeds the risk threshold (90% by default), automatic balancing is triggered and continues until the difference between the highest and lowest disk capacity usage in the volume is less than a threshold (3% by default). A sketch of these trigger checks follows below.
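The following compact sketch reproduces the trigger logic described above; disk usage values are fractions and the default thresholds are taken from the text (this is an illustration, not aSAN's implementation):

```python
def planned_balance_needed(disk_usages, start_diff=0.30):
    """Planned balancing starts when the max-min usage difference exceeds 30%."""
    return max(disk_usages) - min(disk_usages) > start_diff

def planned_balance_done(disk_usages, stop_diff=0.20):
    """...and stops once no two disks differ by more than 20%."""
    return max(disk_usages) - min(disk_usages) <= stop_diff

def auto_balance_needed(disk_usages, risk_threshold=0.90):
    """Automatic balancing starts as soon as any disk exceeds 90% usage."""
    return max(disk_usages) > risk_threshold

def auto_balance_done(disk_usages, stop_diff=0.03):
    """...and stops when the max-min difference drops below 3%."""
    return max(disk_usages) - min(disk_usages) < stop_diff

usages = [0.92, 0.55, 0.40]   # a volume with one nearly full disk
print(planned_balance_needed(usages), auto_balance_needed(usages))  # True True
```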
2. Balance implementation

When a trigger condition is met, the system calculates, in units of data slices on the source disk, the destination disk where the data will be stored. The destination disk must satisfy the following principles:

1) Host mutual exclusion: the two copies of a slice must not end up on the same host after migration;

2) Optimal performance: disks that still satisfy the optimal data distribution strategy after the slice migration are preferred;

3) Capacity optimization: destination disks with low capacity usage are given priority.

During balancing, newly added or modified data for the slice is written to both the source and the target, i.e. one extra copy is written. Before balancing ends, the balancing program verifies the data on the source and the target to ensure consistency before and after balancing. After balancing completes, the source slices are moved to a temporary directory for a period of time and then deleted.

5. aNET Network Layer


Reliability Design

5.1. aNET Network Layer Reliability Architecture

The aNET network layer uses a disaggregated architecture with a management plane, a control plane and a data forwarding plane that communicate through standardized, decoupled interfaces. If an abnormality occurs in a sub-module, it affects only that module and does not spread into an overall failure of the aNET network platform; together with the high-reliability design of each plane, this realizes a highly reliable aNET architecture.

Communication between the planes works as follows: the management plane receives user configuration through the "Management Service" module, converts it into network configuration and delivers it to the "central controller" service module in the control plane; the control plane analyzes the configuration issued by the management plane, breaks it down and distributes it to the compute nodes and network nodes, where the data forwarding plane executes it. When a status change or an operation command is issued by the management plane, the management agent delivers the configuration directly to the data forwarding plane, which executes it without going through the control plane.

5.1.1. Management Plane High Reliability

The management plane adopts a centralized control scheme. The management plane master node is elected through the cluster module, and the cluster file system stores data on each network node in a distributed manner. If the control node fails, aNET automatically elects a new master control node, which obtains the cluster network configuration data through the cluster file system, ensuring high reliability of the management plane.
5.1.2. Control Plane High Reliability

The control plane adopts the same centralized control scheme as the management plane: the cluster module elects the master, and the master node starts the central controller. Through the active reporting mechanism of the compute and network node modules, the central controller recovers the current control state and keeps track of the real-time state of each compute and network node, ensuring high reliability of the control plane.

5.1.3. Data Forwarding Plane High Reliability

The data forwarding plane runs in the application layer. Unlike other cloud platforms whose forwarding runs in the kernel, an abnormality in the forwarding plane does not crash the kernel, and the forwarding plane can be quickly recovered by restarting the service, greatly reducing the impact on the reliability of the platform itself. The data forwarding plane also supports active/standby switchover within a single host: the standby process holds all the configuration of the data forwarding plane, and when the main process exits abnormally the standby process immediately becomes the master and takes over all network forwarding services without interrupting the service, guaranteeing high reliability of the data forwarding plane on a single host.

5.2. DVSW(Distributed Virtual Switch)

The hyper-converged virtual switch adopts a distributed design: a virtual switch instance exists on every host in the cluster. When one host goes offline, the traffic that passed through the virtual switch instance on that host is taken over by other hosts through virtual routing and virtual machine HA. To the upper layer, the virtual machine appears to stay connected to the same virtual switch; after a virtual machine is migrated or restarted by HA, it is still attached to the same virtual switch and its access relationships are unaffected, ensuring high reliability of the data forwarding plane across the hosts in the cluster.

5.3. vRouter

The virtual router in the aNET network layer is a centralized router: traffic that must be forwarded at Layer 3 passes through it. If the node where the router runs fails, or the service network port connected to the router loses connectivity, communication between the devices connected to the router is affected.

The hyper-converged aNET network layer provides a router HA function to ensure the reliability of Layer 3 forwarding. The network controller monitors the running status of the hosts and of the service network ports in real time; when a host is faulty or a service network port cannot communicate, the central controller calculates which virtual routers are affected and automatically switches them to other working hosts, so that traffic passing through the routers keeps being forwarded normally.


5.4. Distributed Firewall aFW

When a virtual machine is abnormal or faulty, the HA mechanism reboots the VM on another host in the cluster to resume service. Based on the HA startup information, the virtual network management module quickly re-establishes the distributed firewall ACL policies associated with the VM on the host where it now runs, ensuring that the VM is protected by the distributed firewall at all times.

5.5. Reliability

An NFV device is integrated into the aCloud platform in the form of a virtual machine and therefore benefits from the high-availability protection of virtual machines; the system also provides a dual-machine high-availability solution for NFV devices to further ensure reliability.

At the same time, the aNET network layer monitors the running status of NFV devices in real time across multiple dimensions (watchdog, disk IO, network traffic and BFD detection). If an NFV device fails to work properly, the virtual router bypasses the associated policy route to ensure that the service is not affected by the NFV equipment failure.

Note: the NFV devices in this section refer specifically to the application delivery controller vAD and the application firewall vAF.


5.6. Connectivity Detection

When the virtual network is configured incorrectly or a network link is faulty, the virtual network behaves abnormally. The operation and maintenance module of the virtual network provides a network connectivity detection function: the source virtual machine and the destination IP address to be probed are set through the interface, the probe request is sent to the controller, and the controller coordinates the control agents on multiple nodes to perform the connectivity detection and report the results. The logical and physical network path of the probe is clearly presented on the UI, helping the user quickly locate connectivity faults in the virtual network.

5.7. VXLAN Network Reliability

aNET performs routine connectivity detection on the VXLAN network: the VXLAN port IPs of the hosts ping each other, and if a port cannot be pinged for more than 5 s, an alarm is generated for the VXLAN network failure and the connectivity status of the VXLAN network is presented, helping the user quickly locate the VXLAN link failure. VXLAN jumbo frame detection is also supported for users who have VXLAN high-performance mode enabled.

Note: network connectivity detection (overlay network) and VXLAN network reliability detection (underlay network) together provide fault location and protection for the aNET virtual network.


5.8. Network Port Self-Recovery

The aNET data forwarding plane regularly checks the packet transmission status of each network interface. When it detects that a network port has been unable to transmit packets for 30 consecutive seconds, it resets the port, ensuring that the port can return to normal use and that user traffic recovers quickly.
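A simplified sketch of this self-recovery check follows; the interface statistics source (Linux sysfs) and the caller-supplied reset action are illustrative assumptions:

```python
import time

STALL_LIMIT_S = 30  # reset a port that has transmitted nothing for 30 seconds

def read_tx_packets(port: str) -> int:
    """Read the cumulative transmitted-packet counter for a port (Linux sysfs)."""
    with open(f"/sys/class/net/{port}/statistics/tx_packets") as f:
        return int(f.read())

def monitor_port(port: str, reset_port) -> None:
    """Reset the port if its TX counter stops moving for STALL_LIMIT_S seconds."""
    last_count, last_change = read_tx_packets(port), time.time()
    while True:
        time.sleep(5)
        count = read_tx_packets(port)
        if count != last_count:
            last_count, last_change = count, time.time()
        elif time.time() - last_change >= STALL_LIMIT_S:
            reset_port(port)                 # caller-supplied reset action
            last_change = time.time()
```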

6. Hardware Layer
Reliability Design

6.1. Hardware Health Check

Hyper-converged products offer two delivery approaches: integrated hardware-and-software delivery and aCloud pure software delivery (on third-party hardware). In both cases the hyper-converged platform provides hardware-level reliability detection and protection to prevent hardware failures from causing serious problems.

Hardware reliability monitoring includes health monitoring of the CPU, memory, network cards, hard disks and RAID cards, making it easier to detect problems in time, and provides recommended solution guidance for each detected anomaly. The results are presented in a unified manner, and the user can eliminate risks by acting on the alarm information and prompts.

In addition, the Sangfor aCloud appliance integrates a BMC diagnostic module, which can diagnose failures of key components such as the CPU, memory, hard disks, network cards, fans, temperature sensors and power supplies.


6.2. CPU Reliability

The hyper-converged platform periodically checks the CPU temperature and frequency. If an abnormality is found, an alarm is raised and a solution is provided, so that the risk of CPU failure is addressed in advance and CPU reliability is ensured.

CPU temperature monitoring: the platform checks the temperature of each physical CPU core every minute. When the temperature remains abnormal for the set duration (10 minutes), the platform raises an alarm.

CPU frequency monitoring: the HCI background checks the CPU frequency every hour and raises an alarm when the CPU frequency drops.

6.3. Memory Reliability

The hyper-converged platform provides ECC memory monitoring and memory throughput detection to ensure memory reliability.

ECC monitoring: memory is monitored in real time using ECC (Error Checking and Correcting) technology, covering UC errors (uncorrectable ECC errors, which cause the device to go down or restart) and CE errors (correctable ECC errors, which do not affect continued use of the memory as long as the error count does not keep increasing). Causes include software issues, memory module failures, motherboard signal-integrity problems, disturbances (environmental noise, high temperature, high-frequency interference from PWM chips) and inadequate cooling of the machine.

As manufacturers' memory process nodes shrink and memory frequencies keep increasing, the charge that a memory cell can store becomes smaller and leakage events become more likely, so memory ECC errors have become more and more common in recent years. aCloud statistically monitors uncorrectable UC-class ECC errors and provides alarms and solutions to avoid accidents.

Memory throughput monitoring: the hyper-converged platform detects memory throughput and raises an alert when the measured throughput is significantly lower than the nominal value.

6.4. Disk Reliability

Hard disk hot swap and RAID : Sangfor hyper-converged appliance supports hard

disk (SAS/SATA) hot swap, supports hard disk RAID 0 , 1 , 10 and multiple other

RAID modes, guarantees high availability of hard disk; It also supports additional

hot spare disk under the RAID configuration to further ensure the high

redundancy of the data disks; supports reconstructing and balancing the data

after the hard disk failure and plugging.

Hard disk comprehensive monitoring, fault avoidance, high reliability of the hard

disk

1) Hard disk status monitoring: The hyper-converged platform monitors the status of each hard disk in real time and alerts immediately when a disk goes offline;

2) IO error monitoring: The hyper-converged platform periodically analyzes I/O errors reported in the dmesg output and alerts immediately when an error is found;

3) SSD life monitoring: The hyper-converged platform regularly uses the smartctl command to check the remaining life of SSDs; when the remaining life drops below 10% of the drive's rated life, an alarm is raised immediately;

4) HDD bad sector monitoring: On the user's instruction, aCloud uses smartctl to scan all physical hard disks and raises an alarm immediately when bad sectors are found. If the number of bad sectors is below 10, a disk replacement suggestion is given; if it exceeds 10, the disk is labelled sub-healthy and degraded, and its data is gradually migrated out;

5) IO latency monitoring: On the user's instruction, the hyper-converged platform calls the fio command to measure the latency of 4 KB random reads at a queue depth of 32. When latency exceeds 10 ms an alarm is triggered immediately; when it exceeds 50 ms an emergency alert is triggered, the disk is marked sub-healthy and downgraded, and its data is gradually migrated off the disk;

6) IOPS/throughput monitoring: On the user's instruction, the hyper-converged platform calls the fio command to run a 4 KB random-read test at a queue depth of 32 against the raw disk. When the measured IOPS falls to a dangerous value the platform raises an alarm; for example, below 60 IOPS for a 7,200 rpm disk, below 100 IOPS for a 10,000 rpm disk, or below 140 IOPS for a 15,000 rpm disk.
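
The fio-based checks in items 5 and 6 can be approximated with a small wrapper such as the sketch below; a similar subprocess wrapper around smartctl could cover the SSD-life and bad-sector checks. The command options are standard fio options, but the JSON field names can vary slightly between fio versions and the thresholds are the illustrative values quoted above, so treat this as a sketch rather than the aCloud implementation:

    import json
    import subprocess

    # Thresholds taken from the description above (illustrative).
    LATENCY_WARN_MS = 10
    LATENCY_CRIT_MS = 50
    IOPS_FLOOR_BY_RPM = {7200: 60, 10000: 100, 15000: 140}

    def fio_randread_4k(device: str, runtime_s: int = 30) -> dict:
        """Run a 4 KiB random-read fio test at queue depth 32 on a raw device."""
        cmd = [
            "fio", "--name=healthcheck", f"--filename={device}",
            "--rw=randread", "--bs=4k", "--iodepth=32", "--direct=1",
            "--ioengine=libaio", "--time_based", f"--runtime={runtime_s}",
            "--output-format=json",
        ]
        out = subprocess.run(cmd, capture_output=True, text=True, check=True)
        read_stats = json.loads(out.stdout)["jobs"][0]["read"]
        # Field names may differ slightly between fio versions.
        return {"iops": read_stats["iops"],
                "lat_ms": read_stats["clat_ns"]["mean"] / 1e6}

    def evaluate_disk(device: str, rpm: int) -> list[str]:
        """Compare measured latency/IOPS against the illustrative thresholds."""
        alerts = []
        result = fio_randread_4k(device)
        if result["lat_ms"] > LATENCY_CRIT_MS:
            alerts.append(f"{device}: {result['lat_ms']:.1f} ms latency (critical, mark sub-healthy)")
        elif result["lat_ms"] > LATENCY_WARN_MS:
            alerts.append(f"{device}: {result['lat_ms']:.1f} ms latency (warning)")
        floor = IOPS_FLOOR_BY_RPM.get(rpm)
        if floor and result["iops"] < floor:
            alerts.append(f"{device}: {result['iops']:.0f} IOPS below the {floor} floor for {rpm} rpm")
        return alerts

    if __name__ == "__main__":
        for message in evaluate_disk("/dev/sdb", rpm=7200):
            print(message)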

6.5. Network Card Reliability

Network port connection mode detection: To provide a correct network environment for the business, the hyper-converged platform checks the working mode of each network port with the ethtool command, ensuring that the port's actual mode matches the negotiated working mode.

Network port deployment detection: To keep services running normally, the hyper-converged platform performs deployment detection on all network ports, verifying that every port configured for a specific purpose is actually usable, which prevents low-level faults such as down ports and unplugged network cables. If a port is not deployed correctly, an alarm is raised.

Network port packet loss detection: To ensure the stability of the service network, the hyper-converged platform reads the NIC statistics and counts packet loss per NIC. When the packet loss rate reaches a dangerous value an alarm is generated; for example, an alert is raised if the packet loss rate of a port exceeds 0.1% within 10 minutes.

Network port rate detection: To meet the performance requirements of running services, the hyper-converged platform checks the link speed of each port and alarms when it falls to a dangerous value; for example, if the port speed is below 1 Gbit/s, an alarm is raised.

Full-duplex mode detection: To provide the network efficiency required by the services, the hyper-converged platform checks the duplex mode of each port to confirm it operates in the more efficient full-duplex mode; if half-duplex mode is detected, an alert is generated (a sketch of these port checks follows).
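
On Linux, the link state, speed, duplex and drop counters that these checks rely on are also exposed under /sys/class/net (the same data ethtool reports). The sketch below is an illustrative approximation, not the platform's code; the 0.1% / 10-minute threshold comes from the text, while the interface name and exact sysfs usage are assumptions:

    import time
    from pathlib import Path

    SYS_NET = Path("/sys/class/net")
    LOSS_THRESHOLD = 0.001      # 0.1 % packet loss
    MIN_SPEED_MBPS = 1000       # anything below gigabit triggers an alarm

    def read_attr(iface: str, name: str) -> str:
        return (SYS_NET / iface / name).read_text().strip()

    def read_stat(iface: str, name: str) -> int:
        return int((SYS_NET / iface / "statistics" / name).read_text())

    def check_link(iface: str) -> list[str]:
        """Verify link state, speed and duplex mode of one network port."""
        alerts = []
        if read_attr(iface, "operstate") != "up":
            alerts.append(f"{iface}: link is down or the cable is unplugged")
            return alerts
        if int(read_attr(iface, "speed")) < MIN_SPEED_MBPS:
            alerts.append(f"{iface}: link speed is below 1 Gbit/s")
        if read_attr(iface, "duplex") != "full":
            alerts.append(f"{iface}: running in half-duplex mode")
        return alerts

    def packet_loss_rate(iface: str, window_s: int = 600) -> float:
        """Sample receive/drop counters over a window and return the drop ratio."""
        before = (read_stat(iface, "rx_packets"), read_stat(iface, "rx_dropped"))
        time.sleep(window_s)
        after = (read_stat(iface, "rx_packets"), read_stat(iface, "rx_dropped"))
        packets = max(after[0] - before[0], 1)
        return (after[1] - before[1]) / packets

    if __name__ == "__main__":
        for alert in check_link("eth0"):
            print(alert)
        if packet_loss_rate("eth0") > LOSS_THRESHOLD:
            print("eth0: packet loss rate above 0.1 % in 10 minutes")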

6.6. RAID Card Reliability

RAID card abnormal status check: The HCI platform analyzes the health status of the RAID card by reading RAID status information through system commands. If the RAID card reports an error or anomaly, an alarm is raised prompting the user to check or replace the RAID card.

JBOD (non-RAID) mode check: To preserve the hot-swap capability of the hard disks, the hyper-converged platform checks whether the controller is in JBOD mode; if a non-JBOD mode is detected, an alert is raised.

6.7. Power Supply Reliability

The hyper-converged appliance is equipped with two power supply units in a 1+1 redundant, hot-swappable configuration. If one power supply fails, the system continues to operate without affecting the service, and the faulty unit can be replaced online.


6.8. Alarm Service

The Sangfor aCloud platform provides a comprehensive alerting service covering abnormal conditions of clusters, hosts, storage, networking and virtual machines. When a problem is found, the alarm is displayed on the management page and grouped by severity level, and users are notified by e-mail and text message so that alarms are received in a timely manner. Administrators can configure the alarm policy that best fits their business requirements, such as alarms on high host memory usage or high CPU usage, to guarantee the accuracy of platform detection; log auditing capabilities further protect operational reliability.

7. Solution Layer Reliability Design


7.1. VM Fast Backup

The multi-copy mechanism of the aCloud platform handles hardware-level single points of failure, ensuring that data remains redundant when a hardware component fails. However, if a multi-point failure occurs (all copies of the data are damaged) or a logical error occurs (for example a ransomware infection or accidental deletion of a business database), the multi-copy mechanism cannot solve the problem.

To handle such failures, the aCloud platform provides a fast backup function built on an initial full backup, subsequent incremental backups and bitmap-based dirty data marking. This is Sangfor proprietary technology that greatly improves backup efficiency and reduces the impact of the backup process on the production environment. The workflow is as follows:
1) First, perform a full backup (if a full backup already exists, perform an incremental backup directly);

2) After the full backup, the service continues to write new data (for example blocks G and H), and each change is marked in a bitmap. The new data is written directly at its original position in the qcow2 file, and only the modified locations are included in the next incremental backup; after each backup completes, the bitmap is reset to 0 so the next backup starts from a clean state (see the sketch after this list);

3) When incremental backup files are deleted, their data is merged backwards so that every retained backup remains completely restorable, freeing space and saving backup storage resources.
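
The sketch below illustrates the general idea of bitmap-based incremental backup at block level. It is a simplified, hypothetical model for explanation only, not the qcow2-level implementation used by aCloud:

    BLOCK_SIZE = 4096  # bytes per tracked block (illustrative)

    class FastBackupDisk:
        """A toy block device that tracks dirty blocks in a bitmap."""

        def __init__(self, num_blocks: int):
            self.blocks = [b"\x00" * BLOCK_SIZE] * num_blocks
            self.dirty = [False] * num_blocks   # the dirty-block bitmap

        def write(self, index: int, data: bytes) -> None:
            # Writes go straight to the original location (no copy-on-write);
            # the bitmap only remembers which blocks have changed.
            self.blocks[index] = data.ljust(BLOCK_SIZE, b"\x00")
            self.dirty[index] = True

        def full_backup(self) -> dict[int, bytes]:
            """Copy every block, then clear the bitmap."""
            snapshot = dict(enumerate(self.blocks))
            self.dirty = [False] * len(self.blocks)
            return snapshot

        def incremental_backup(self) -> dict[int, bytes]:
            """Copy only the blocks marked dirty since the last backup."""
            delta = {i: self.blocks[i] for i, d in enumerate(self.dirty) if d}
            self.dirty = [False] * len(self.blocks)
            return delta

    def merge_backwards(older: dict[int, bytes], newer: dict[int, bytes]) -> dict[int, bytes]:
        """When an intermediate backup is deleted, fold it into the next one
        so that every retained backup stays fully restorable."""
        merged = dict(older)
        merged.update(newer)   # blocks from the newer backup win
        return merged

After a full backup, writing only blocks G and H would leave exactly two bits set, so the next incremental backup contains just those two blocks.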


When multiple disk images of a virtual machine, or disk images of multiple related virtual machines, must stay consistent, fast backup also provides a multi-disk data consistency guarantee. For example, in a database scenario (SQL Server, Oracle), the data disk and the log disk must be backed up at a consistent point in time; otherwise, when the backup is restored, the restored Oracle system may still be unusable because of the inconsistency. aCloud fast backup ensures that the multiple disks of a database are restored in a consistent state.

Compared with the snapshot-based CBT backup solutions used by other platforms in the industry, aCloud fast backup technology brings a fundamental improvement in performance and efficiency: new data is written directly at its original location, so no copy-on-write takes place and the mapping between the qcow2 file and the data locations does not become fragmented, which means the performance of the qcow2 image is not affected; in addition, the incremental backup approach reduces the amount of data transferred in each backup, increasing backup speed.

7.2. CDP(Continuous Data Protection)

Virtual machine continuous data protection (CDP) is another aCloud proprietary technology that adds a finer-grained layer of data protection on top of image backup: while fast backup provides hourly-granularity protection, CDP provides 1-second or 5-second granularity, recording every data change so that data can be restored with near-zero loss for ultimate protection.

Sangfor has deeply optimized its CDP technology. Compared with traditional CDP software that runs as an agent embedded in the guest OS, Sangfor integrates the CDP module at the qcow2 file layer, providing a CDP data protection solution that is lower cost, easier to deploy and better suited to virtual machine workloads.


The CDP backup data consists of RP (recovery point) log files and BP (backup point) files. A bypass structure combined with IO offload and a shared cache area is used to asynchronously copy IO from the main service to the CDP log repository, and RP points are generated periodically, so the CDP backup process does not affect normal service. Fault isolation is also implemented, so a fault in the CDP module does not affect normal service either. BP points are generated periodically according to the configured backup frequency, and both BP and RP points carry a timestamp so that a recovery point can be located in the event of a failure.

Traditional CDP software inserts a "probe program" on the IO path; if the probe itself fails, or the storage the CDP depends on fails, the original production environment may be affected. The CDP technology provided by aCloud HCI captures the IO image in bypass mode, so a fault in the CDP module cannot bring down the original production system.

CDP also performs a consistency check across the stored data of multiple disks to ensure that the data at each recovery point is correct and valid.

1) The CDP storage in this example has three virtual disks. Each IO write forms an RP point marked with an ID, and RP points carrying the same ID on all three disks are considered to belong to the same consistency group (a minimal sketch of this rule follows the list);

2) The RP points marked with ID 3 all exist, so RP3 is a valid consistent RP and can be shown on the page for VM restoration;

3) The RP with ID 6 is missing on vdisk2, so RP6 is not a valid consistent RP; it cannot be shown on the page or used to restore the virtual machine.
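
A minimal sketch of this cross-disk consistency rule: an RP ID is exposed for restore only if every virtual disk of the VM has recorded an RP with that ID (the function and data names are illustrative assumptions):

    def consistent_recovery_points(rp_ids_per_disk: dict[str, set[int]]) -> set[int]:
        """Return the RP IDs present on every virtual disk of the VM.

        Only these IDs represent consistent recovery points that can be
        shown on the page and used to restore the virtual machine."""
        disks = list(rp_ids_per_disk.values())
        if not disks:
            return set()
        return set.intersection(*disks)

    # Matching the description above: RP 3 exists on all three disks,
    # but RP 6 is missing on vdisk2, so RP 6 is not a valid restore point.
    rps = {
        "vdisk1": {1, 2, 3, 4, 5, 6},
        "vdisk2": {1, 2, 3, 4, 5},
        "vdisk3": {1, 2, 3, 4, 5, 6},
    }
    print(sorted(consistent_recovery_points(rps)))  # [1, 2, 3, 4, 5]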

7.3. DR(Disaster Recovery)

Sangfor aCloud provides a complete off-site disaster recovery (DR) solution to help users cope with server-room-level failures. The solution does not depend on third-party software, which reduces its complexity and makes it simpler and more stable. The active/standby DR solution is mainly used for disaster recovery within the same city or across different locations: the production center and the disaster recovery center operate in active/standby mode, and when a disaster such as a fire strikes the production center, the disaster recovery center can quickly restore services, maximizing the continuity of the business system.

The Sangfor aCloud off-site disaster recovery solution implements asynchronous data replication of virtual machines across two clusters through the integrated DR module aDR, which handles data backup and transfer. The DR gateway aDR calls the CDP backup API to perform local backups of the protected VMs and transmits the data between the data center (DC) and the disaster recovery center (DRC) to achieve asynchronous replication. The DR gateway supports encryption, compression, dynamic flow control, consistency checking and resumable (breakpoint) transmission to ensure data security, reliability and integrity. The hyper-converged cloud management platform manages the production cluster and the disaster recovery cluster in a unified way and provides DR policy management, DR planning, large-screen DR monitoring and DR testing, achieving second-level RPO.
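
The core of such an asynchronous replication channel can be sketched as follows: each backup delta is split into chunks that are compressed, checksummed and tagged with an offset so the transfer can resume after an interruption. This is an illustrative model only; the aDR gateway's actual protocol, API and encryption are not shown:

    import hashlib
    import zlib
    from dataclasses import dataclass

    @dataclass
    class ReplicationChunk:
        offset: int        # byte offset in the backup stream (enables resume)
        payload: bytes     # compressed data
        checksum: str      # integrity check of the original data

    def package_chunks(backup_stream: bytes, chunk_size: int = 1 << 20):
        """Split a backup delta into compressed, checksummed chunks."""
        for offset in range(0, len(backup_stream), chunk_size):
            raw = backup_stream[offset:offset + chunk_size]
            yield ReplicationChunk(
                offset=offset,
                payload=zlib.compress(raw),
                checksum=hashlib.sha256(raw).hexdigest(),
            )

    def apply_chunk(replica: bytearray, chunk: ReplicationChunk) -> bool:
        """At the DR site: decompress, verify and apply one chunk.
        Returns False if the consistency check fails so the chunk is re-sent."""
        raw = zlib.decompress(chunk.payload)
        if hashlib.sha256(raw).hexdigest() != chunk.checksum:
            return False
        end = chunk.offset + len(raw)
        if len(replica) < end:
            replica.extend(b"\x00" * (end - len(replica)))
        replica[chunk.offset:end] = raw
        return True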


7.4. SC(Stretched Cluster)

The Sangfor aCloud stretched cluster storage active-active solution achieves an RPO of zero and second-level RTO in the event of a data center failure. When one site fails, applications running on the stretched cluster seamlessly access the data copy at the other site, achieving inter-site business high availability; VMs can be live migrated or failed over by HA between the two sites.

As described in section "4.2 Data Replica Based Protection", business data is written to the storage volume as multiple copies. When the hyper-converged platform is deployed as a stretched cluster, the copies of the business data are written synchronously to the two sites: a write IO is considered complete only after both data centers have confirmed it, and only then can the next IO be written, which guarantees consistency between the copies. During normal operation the local copy is accessed preferentially; when the local copy becomes inaccessible, the system switches to the copy in the remote data center. Therefore, when one data center fails, the virtual machine can be brought up in the other data center by HA and continue running on the second data copy, maximizing the continuity of the business system.
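
The write path amounts to a synchronous dual-write in which the acknowledgement is returned only after both sites confirm, as in the simplified sketch below (the site objects and method names are hypothetical):

    from concurrent.futures import ThreadPoolExecutor

    class StretchedVolume:
        """Toy model of a volume whose replicas live in two data centers."""

        def __init__(self, local_site, remote_site):
            self.local_site = local_site
            self.remote_site = remote_site
            self._pool = ThreadPoolExecutor(max_workers=2)

        def write(self, offset: int, data: bytes) -> None:
            """Synchronous replication: the write completes only after BOTH
            sites acknowledge, so the two copies never diverge."""
            futures = [
                self._pool.submit(self.local_site.write, offset, data),
                self._pool.submit(self.remote_site.write, offset, data),
            ]
            for f in futures:
                f.result()   # block until each site has acknowledged

        def read(self, offset: int, length: int) -> bytes:
            """Prefer the local copy; fall back to the remote copy on failure."""
            try:
                return self.local_site.read(offset, length)
            except OSError:
                return self.remote_site.read(offset, length)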


When a user runs an Oracle RAC database cluster or another distributed clustered service, failover takes place automatically between the sites, implementing active-active services. The virtual machines carrying such active-active services must run in different fault domains. Sangfor aCloud supports specifying a virtual machine's running location: assuming a customer runs an active-active business on VM A and VM B, the VMs can be configured at creation time so that VM A runs only at the main site and VM B runs only at the secondary site, ensuring their running locations are mutually exclusive.

For example, in the Oracle RAC scenario, the two RAC nodes are pinned to different server rooms and are mutually exclusive; when one server room fails, the other node keeps running (see the placement sketch below).
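
Such mutually exclusive placement is essentially a per-VM location constraint, conceptually similar to the sketch below (the rule structure and names are illustrative, not aCloud's configuration format):

    from dataclasses import dataclass

    @dataclass
    class PlacementRule:
        vm_name: str
        allowed_site: str   # the only site this VM may run on

    RULES = [
        PlacementRule("oracle-rac-node-a", allowed_site="site-A"),  # main site
        PlacementRule("oracle-rac-node-b", allowed_site="site-B"),  # secondary site
    ]

    def can_place(vm_name: str, target_site: str) -> bool:
        """Return True if the scheduler may start this VM on target_site."""
        for rule in RULES:
            if rule.vm_name == vm_name:
                return rule.allowed_site == target_site
        return True  # no rule: the VM may run anywhere

    # With these rules the two RAC nodes can never land in the same server
    # room, so a failure of one room always leaves the other node running.
    assert can_place("oracle-rac-node-a", "site-A")
    assert not can_place("oracle-rac-node-a", "site-B")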

The stretched cluster performs its data consistency check through the arbitration copy; for details, please refer to section "4.3 Data Arbitration Protection".
Block A1, Nanshan iPark,

No.1001 Xueyuan Road, Nanshan District, Shenzhen,

Guangdong Province, P. R. China (518055)

Service hotline: +60 12711 7129 (7511)

Email: sales@sangfor.com
