
10/01/2021 Security Level:

Introduction to CloudEdge
V100R18C10 FAC_V2.0

www.huawei.com

Author: Fu Hang/Wang Xiaodong/Ju Zheng/Zheng Junhua


Version: V2.0 (2018-02-03)
HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
Contents

1. FAC Working Principles

2. Start with the FAC



What Does the FAC Do?

The fault analysis center (FAC) provides fault locating, fault diagnosis,
and fault rectification advice for VNFs.

Difficulties in O&M on the live network

The cloud-based architecture makes O&M considerably harder. NFV integration faces the following fault
demarcation problems:
- The NFV architecture demands strong problem locating capability. When a problem occurs, its symptom is
hard to describe, which directly hinders problem solving.
- A cross-layer problem in a multi-level architecture has to be coordinated across layers, which raises
communication cost and lowers efficiency.
- For lack of end-to-end fault demarcation tools, problems are still located with the traditional log
analysis method, which is costly and has little effect.

FAC capability
The FAC provides an automatic fault detection mechanism. It implements fault management for service VNFs and
part of the NFVI, including information collection, fault overview, layered demarcation, KPI monitoring, VM reset
cause demarcation, health check, and CPU overload fault demarcation, which helps analyze faults on the live
network.



FAC Architecture
FAC architecture and functions of each component

| Component | Function |
|---|---|
| U2000 (CCE) | The EMS is the device O&M center of the vendor. It provides the O&M service of VNFs in its management scope, including analysis of the association between VNFs and NFVI resources, and monitoring and assurance capabilities. The U2000 interconnects with the eSight of the NFVI (subscription mechanism) to obtain the overall NFVI monitoring capability, O&M of VNFs and networks, and single node service network configuration by VNF. |
| eSight | The eSight is an integrated O&M management solution for enterprise data centers, campus/branch networks, unified communications, video conferences, and video surveillance. |
| VNFM | The VNFM manages the VNF life cycle (creating and monitoring VNFs, for example). The VNFM can obtain NFVI monitoring data only about the VNFs in its management scope. |
| NFVI | The NFVI provides the VNF running environment, including the hardware and software required by VNFs. Hardware includes computing, network, and storage resources. Software includes hypervisors and storage managers. The NFVI virtualizes physical resources for VNFs. |

[Figure: FAC architecture. The U2000 (CCE) is the network-level fault management center. The FAC is the fault management center within a VNF and sits with the CloudEdge VNFs (CloudUGW, CloudUSN), alongside the VNFM, the NFVI, and the eSight.]



FAC Overview

Relationship between the FAC and peripheral components

[Figure: The FAC and its peripheral components, including the VNF (CloudUSN), connected through interfaces numbered 1 to 7.]

| No. | Interface Name |
|---|---|
| 1 | FAC_VNFC O&M network port |
| 2 | Southbound network port or combined southbound and northbound network port of the MANO (CSM) |
| 3 | O&M network port of the CloudUSN |
| 4 | NFVI communication port |
| 5 | Maintenance network port |
| 6 | Service network port |
| 7 | Maintenance network port |



FAC Overview

Overview of FAC fault locating, fault demarcation, and fault rectification solutions

[Figure: FAC workflow, starting when the network leaves the normal state.]
1. Service network loss
2. Quick fault diagnosis
3. Layered demarcation
4. Fault information collection
5. Health check: communication sub-health diagnosis, VM reset demarcation, and CPU overload demarcation
6. Offline analysis
7. VNF rectification advice



FAC Overview

FAC function overview

[Figure: FAC functions grouped by category. The legend marks versions 18.1 (available) and 19.0.]
- Self-healing: network-level fault self-healing (CCE), VNF fault self-healing
- Fault demarcation: fault tree diagnosis (expert experience), cross-layer demarcation of communication faults, cross-layer demarcation of high CPU usage, cross-layer demarcation of VM reset
- Monitoring: KPI monitoring (expert experience)
- O&M efficiency: information collection

Major functions are described in the following slides.



Principles of FAC Fault Detection and Diagnosis
- Principle of fault detection: Fault symptoms are identified by fault symptom rules, based on KPIs and
alarms. If the KPI changes or alarms meet a rule definition, the match is successful. The root cause is
displayed in the fault symptom tree and table.
- Principle of root cause diagnosis: All possible causes for each symptom are listed based on expert
experience. Each possible cause corresponds to a root cause rule. The data sources for root cause rules include
performance data, alarm data, operation logs, configuration data, CHRs, licenses, and MML commands.
During diagnosis, the system automatically matches the data sources against the root cause rules. If a
match is successful, the root cause is displayed on the UI.
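
The two-stage matching described above can be sketched as follows. This is a minimal illustration only: the FAC rule format is not given in this document, so the class names, KPI key, threshold, and example causes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical rule types; the real FAC rule file format is not described here.
@dataclass
class SymptomRule:
    symptom: str
    matches: Callable[[Dict[str, float], List[str]], bool]  # (kpis, alarms) -> bool

@dataclass
class RootCauseRule:
    cause: str
    symptom: str
    matches: Callable[[Dict[str, object]], bool]  # collected data sources -> bool

def detect_symptoms(rules, kpis, alarms):
    """Stage 1: return the symptoms whose rule matches the KPIs and alarms."""
    return [r.symptom for r in rules if r.matches(kpis, alarms)]

def diagnose(symptom, rules, data):
    """Stage 2: return each candidate cause of `symptom` whose rule matches the data."""
    return [r.cause for r in rules if r.symptom == symptom and r.matches(data)]

# Example rules. The 95% threshold and both causes are made up for illustration.
symptom_rules = [
    SymptomRule("Attach success rate decrease",
                lambda kpis, alarms: kpis.get("attach_success_rate", 100.0) < 95.0),
]
root_cause_rules = [
    RootCauseRule("License expired", "Attach success rate decrease",
                  lambda d: d.get("license_valid") is False),
    RootCauseRule("Signaling link down", "Attach success rate decrease",
                  lambda d: "link down" in d.get("alarms", [])),
]
```

Calling `detect_symptoms(symptom_rules, {"attach_success_rate": 90.0}, [])` yields the symptom, and `diagnose` then checks only the root cause rules registered for that symptom.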
[Figure: Fault diagnosis model. A fault tree (root node: attach success rate decrease; child nodes such as 2G attach success rate decrease, 3G attach success rate decrease, and CSFB fault) is evaluated against diagnosis rules 1 to N, which are built from expert experience 1 to N, to produce diagnosis results and solutions. Common processing and adaptation run on the VNF platform, which supplies CHR, KPI, alarm, and log data.]



Principle of FAC Communication Sub-health Diagnosis
If a large number of alarms related to communication links are generated within a short period
of time, maintenance personnel cannot quickly identify the fault causes. The comprehensive
communication sub-health diagnosis provided by the FAC helps to solve the problem. The
FAC provides fault diagnosis of internal VNF communication links to deduce link faults of the
NFVI.
[Figure: Comprehensive analysis of BASE and FABRIC plane link alarms across the VMs, hosts, and EOR switches of a site to detect faults. Fault types: PAE fault, vNIC fault, host NIC fault, and EOR fault.]



FAC Communication Sub-health Diagnosis
Communication sub-health diagnosis supports the scenarios listed in the table. The first four rows cover sub-health detection and the last four cover plane faults.

| Fault Scope | Solution Description |
|---|---|
| Single VM | The NOS detects sub-health logs and triggers packet capture in a short time. The FAC analyzes the sub-health logs and captured packets and concludes whether the NOS or the NIC driver is faulty. |
| Single node | 1. In a single VM scenario, the FAC analysis capability is available. 2. The FAC can determine that multiple VMs on a host are in a sub-health state and conclude that the host is faulty. |
| Single chassis | 1. Demarcation based on the single VM and the single node is available. 2. The FAC can determine that multiple hosts in a chassis are in a sub-health state and conclude that the chassis or the switch module in the chassis is faulty. |
| Entire DC (TOR/EOR) | 1. Demarcation based on the single VM and the single node is available. 2. The FAC can determine the sub-health status between multiple hosts/VMs in different chassis and conclude that an inter-chassis fault (EOR fault) has occurred. |
| Single VM | The NOS detects a plane fault, records logs, and triggers packet capture. The FAC analyzes the TIPC disconnection, Fabric logs, and captured packets and concludes whether the NOS or the NIC driver is faulty. |
| Single node | 1. In a single VM scenario, the FAC analysis capability is available. 2. The FAC can determine that multiple VMs on a host are in a sub-health state and conclude that the host is faulty. |
| Single chassis | 1. Demarcation based on the single VM and the single node is available. 2. The FAC can determine that multiple hosts in a chassis are in a sub-health state and conclude that the chassis or the switch module in the chassis is faulty. |
| Entire DC (TOR/EOR) | 1. Demarcation based on the single VM and the single node is available. 2. The FAC can determine the sub-health status between multiple hosts/VMs in different chassis and conclude that an inter-chassis fault (EOR fault) has occurred. |
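
The escalation from single VM to entire DC in the table can be sketched as a small classifier. This is an illustration under assumptions, not the FAC algorithm: the topology mapping, the function name, and the "more than one" thresholds are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical topology: VM name -> (host, chassis).
Topology = Dict[str, Tuple[str, str]]

def demarcate_scope(sub_health_vms: Set[str], topology: Topology) -> str:
    """Map a set of sub-healthy VMs to the widest fault scope they suggest."""
    if not sub_health_vms:
        return "no fault"
    hosts = {topology[vm][0] for vm in sub_health_vms}
    chassis = {topology[vm][1] for vm in sub_health_vms}
    if len(chassis) > 1:
        # Sub-health spans chassis: inter-chassis (EOR) fault.
        return "entire DC (inter-chassis, EOR fault)"
    if len(hosts) > 1:
        # Several hosts in one chassis: chassis or its switch module.
        return "single chassis (chassis or switch module fault)"
    if len(sub_health_vms) > 1:
        # Several VMs on one host: the host itself.
        return "single node (host fault)"
    return "single VM (NOS or NIC driver fault)"
```

The checks run from the widest scope to the narrowest, so a fault that spans chassis is never reported as a host fault.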



VM Reset Demarcation
When a VM is abnormal, the VNF reports an alarm to the U2000 and WebLMT. Maintenance
personnel need to find the root cause of the alarm. There are more than 270 VM reset causes,
so frontline maintenance personnel take a long time to confirm the reset cause, which slows
problem solving. The FAC can quickly provide VM reset demarcation results,
improving maintenance efficiency and reducing maintenance costs.

VM reset demarcation supports the scenarios listed in the table.

| Fault Scope | Solution Description |
|---|---|
| Single VM | The FAC performs automatic analysis according to the NOS reset code and provides demarcation results. |
| Single host | The FAC provides the analysis result that multiple VM faults occur on a single host, based on the fault cause code. |

[Figure: FAC VM reset demarcation, fed by MML query and alarm collection from the VNF, whose VMs run on hosts.]
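
The single VM versus single host distinction can be sketched as follows. The record layout and the grouping rule (several VMs on one host resetting with the same cause code) are assumptions made for illustration; actual NOS reset codes are not listed in this document, so the codes used in examples are placeholders.

```python
from collections import defaultdict

def demarcate_resets(records):
    """records: iterable of (vm, host, reset_code) tuples (hypothetical layout).

    A host is reported as "single host" when more than one of its VMs reset
    with the same code; otherwise each reset is treated as a "single VM" fault.
    """
    vms_by_host_code = defaultdict(set)
    for vm, host, code in records:
        vms_by_host_code[(host, code)].add(vm)

    result = {}
    for (host, _code), vms in vms_by_host_code.items():
        if len(vms) > 1:
            result[host] = "single host"
        else:
            # Do not downgrade a host already flagged by another code group.
            result.setdefault(host, "single VM")
    return result
```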



CPU Overload Demarcation
When the CPU usage of a VM in a VNF fluctuates greatly, the CPU overload alarm is generated. As a
result, the values of performance counters may increase sharply or reach the peak. Frontline or
maintenance personnel need to explain these symptoms and check several components one
at a time, which takes a long time and slows problem locating. The FAC diagnoses
CPU overload in a VNF and quickly locates the first responsible component,
improving problem locating efficiency.

[Figure: FAC CPU overload demarcation, fed by log collection and alarm collection from the VNF, whose VMs run on hosts. A chart shows component CPU usage (percentage) for CloudUSN components, including the NOS and others.]

CPU overload demarcation supports the scenario listed in the table.

| Fault Scope | Solution Description |
|---|---|
| Single VM | Locate the first responsible component for a sharp increase of CPU usage according to the black box records. |
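
One way to read "first responsible component" is the component whose CPU usage jumps first. The sketch below follows that reading; the sample layout, the function name, and the 30-point jump threshold are assumptions, since the black box record format is not described in this document.

```python
def first_responsible_component(samples, jump=30.0):
    """Return the component with the earliest sharp CPU increase, or None.

    samples: iterable of (timestamp, component, cpu_percent) tuples
    (hypothetical layout). A "sharp increase" is a rise of at least `jump`
    percentage points between two consecutive samples of one component.
    """
    last_seen = {}
    for _ts, component, cpu in sorted(samples):
        previous = last_seen.get(component)
        if previous is not None and cpu - previous >= jump:
            return component
        last_seen[component] = cpu
    return None
```

Sorting by timestamp first means the result does not depend on the order in which the records were collected.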



2. Start with the FAC


General Description of FAC Usage
FAC functions and application scenarios
This feature is used to detect and diagnose internal VNF faults.

| Menu | Submenu | Scenario |
|---|---|---|
| Fault overview | Fault details | Used to check the overall fault status of VNFs. |
| Fault overview | KPI browsing | Used to monitor KPI changes in VNFs. |
| Layered demarcation | - | Used to identify the root fault cause in VNFs based on the deduction from the fault tree. |
| Health check | FAC communication sub-health diagnosis | If a large number of alarms related to communication links are generated within a short period of time, maintenance personnel cannot quickly identify the fault causes. The comprehensive communication sub-health diagnosis provided by the FAC helps to solve the problem. The FAC provides fault diagnosis of internal VNF communication links to deduce link faults of the NFVI. |
| Health check | VM reset demarcation | When a VM is abnormal and an alarm is generated, maintenance personnel cannot identify problems in a short period of time, affecting the problem solving efficiency. The FAC can quickly provide VM reset demarcation results, improving maintenance efficiency and reducing maintenance costs. Allows users to view VM reset details, reset cause percentages, VM and host deployment relationship, and VM reset and alarm scatter charts. |
| Health check | CPU overload demarcation | When the CPU usage of a VM fluctuates greatly and a CPU overload alarm is generated, maintenance personnel have to check several components one at a time. The FAC quickly locates the first responsible component for the CPU overload. |
| Information collection | - | The FAC collects information about the specific VNF and NFVI where faults occur, allowing offline fault analysis and diagnosis by engineers. |
| System | - | The FAC supports database overflow dump, fault analysis rule import, log level setting, user management, and log management. |

This feature demarcates the top NFV faults, and its coverage will continue to be extended in the future.

For details about the FAC functions, see the following slides.
Introduction to the FAC Web Page

The following picture shows the FAC web page. From left to right, the sub-pages are as follows:
Fault Overview, Layered Demarcation, Health Condition, Information Collection, and System.

[Figure: FAC web page with the Fault Overview, Layered Demarcation, Health Condition, Information Collection, and System sub-pages.]



Fault Overview
- Set the start time and the end time, and query the faults that occurred during this period.
- View the fault details.
- Analyze a fault further; the Layered Demarcation page is displayed.

Step 1: Query fault information.
Step 2: See fault details.
Step 3: Click Diagnose to go to the Layered Demarcation page and perform automatic fault analysis.

[Figure: Fault Overview page, showing version information, FAC internal alarms, the log out and page guide controls, the fault category, the fault sub-category, and fault information details.]



KPI Browsing

Allows users to view real-time and historical KPI information.

Step 1: Enable or disable the real-time refresh button.
Step 2: View historical KPI information.

[Figure: KPI line chart.]



Fault Diagnosis

The following figure shows the fault diagnosis scenario list and fault diagnosis results.

Step 1: Select a scenario.

Step 2: Start fault diagnosis in real time.

Step 3: Query and export fault diagnosis results.

Fault diagnosis results and rectification advice



Information Collection
- Information collection
One-click information collection: Automatically collects all information that the systems support.
Customized information collection: Collects information according to user selection.
Historical data query: The latest information collection result can be obtained.

- The following table lists the types of collected information.

| Source | Collected information |
|---|---|
| VNFC | Alarms, performance measurement, configuration, operation logs, system logs, CHRs, MML commands, DBGLOG |
| VNFM | Alarms, system logs |
| NFVI | FusionSphere OpenStack alarms, FusionSphere OpenStack performance measurement, FusionSphere OpenStack configuration, FusionSphere OpenStack host logs |

- Page display

Step 1: Collect data in the selected period.
Step 2: Export collected data.
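
The difference between one-click and customized collection can be pictured as building a collection plan from the information types in the table above. This is a sketch only: the dictionary, the function name, and the validation behavior are hypothetical and do not describe the FAC interface.

```python
# Information types per source, taken from the table of collected information.
COLLECTIBLE = {
    "VNFC": ["alarms", "performance measurement", "configuration", "operation logs",
             "system logs", "CHRs", "MML commands", "DBGLOG"],
    "VNFM": ["alarms", "system logs"],
    "NFVI": ["FusionSphere OpenStack alarms",
             "FusionSphere OpenStack performance measurement",
             "FusionSphere OpenStack configuration",
             "FusionSphere OpenStack host logs"],
}

def build_plan(selection=None):
    """Return {source: [types]} to collect.

    selection=None models one-click collection (everything supported);
    a dict of {source: [types]} models customized collection.
    """
    if selection is None:
        return {source: list(types) for source, types in COLLECTIBLE.items()}
    plan = {}
    for source, wanted in selection.items():
        unknown = [t for t in wanted if t not in COLLECTIBLE.get(source, [])]
        if unknown:
            raise ValueError(f"{source} does not provide: {unknown}")
        plan[source] = list(wanted)
    return plan
```

For example, `build_plan({"VNFC": ["CHRs", "alarms"]})` models a customized collection limited to two VNFC information types.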



VM Reset Cause Demarcation
Allows users to view VM reset details, reset cause percentages, VM and host deployment relationship,
and VM reset and alarm scatter charts.
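
The reset cause percentages shown on this page amount to a frequency count over the resets in the selected period. A minimal sketch, with hypothetical cause names in the usage example:

```python
from collections import Counter

def reset_cause_percentages(causes):
    """Return {cause: percentage of all resets}, rounded to one decimal place."""
    counts = Counter(causes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cause: round(100.0 * n / total, 1) for cause, n in counts.items()}
```

For instance, `reset_cause_percentages(["watchdog", "watchdog", "manual", "oom"])` attributes half of the resets to the first cause and a quarter to each of the others.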

Step 1: Check VM reset information during a specified period of time.
Step 2: Export VM reset information during the specified period of time.

[Figure: VM Reset Cause Demarcation page, showing the VM and host deployment relationship, the percentage of each reset cause in the specified period of time, the scatter chart of VM resets and alarms, and the reset information.]
Health Condition
Allows users to view communication sub-health check results.

Step 1: Check sub-health check information during a specified period of time.
Step 2: View sub-health check details.
Step 3: Export sub-health check information during the specified period of time.

[Figure: Health Condition page, showing node connection relations and physical deployment relations.]



CPU Overload Demarcation
Allows users to view the CPU overload check results.

Step 1: Check CPU overload information during a specified period of time.
Step 2: Export CPU overload information during the specified period of time.

[Figure: CPU Overload Demarcation page, showing the CPU usage of the first responsible component and the CPU usage of each component.]



Rule Management
The FAC supports rule file updates.

Step 1: Upload the rule file.
Step 2: Import the rule file.



Application Fault Scenarios
The total number of application fault scenarios is given per VNF, on the first row of each VNF.

| VNF | Fault Type | Fault Name | Total Number of Application Fault Scenarios |
|---|---|---|---|
| CloudUSN | Attach success rate decrease | 2G attach success rate decrease; 3G attach success rate decrease; 4G attach success rate decrease; 2G and 3G attach success rate decrease; 2G, 3G, and 4G attach success rate decrease | 200+ |
| CloudUSN | Activation success rate decrease | 2G activation success rate decrease; 3G activation success rate decrease; 2G and 3G activation success rate decrease | |
| CloudUSN | Traffic decrease | 2G traffic decrease; 3G traffic decrease; 2G and 3G traffic decrease | |
| CloudUSN | VoLTE fault | VoLTE fault | |
| CloudUSN | CSFB fault | CSFB fault | |
| CloudUGW | UE access exception | 2G and 3G activation success rate decrease; 4G activation success rate decrease; 2G and 3G UE quantity decrease; 4G UE quantity decrease | 250+ |
| CloudUGW | Abnormal system traffic | 2G and 3G forwarding success rate decrease; 4G forwarding success rate decrease; 2G and 3G forwarding traffic decrease; 4G forwarding traffic decrease | |
| CloudUGW | Offline charging exception | Abnormal CDR caching; CG fault or capability insufficiency | |
| CloudUGW | Abnormal VoLTE | Sharp decrease in VoLTE UE quantity | |



Thank you
www.huawei.com

Copyright © 2018 Huawei Technologies Co., Ltd. All Rights Reserved.


The information in this document may contain predictive statements including, without limitation,
statements regarding the future financial and operating results, future product portfolio, new technology,
etc. There are a number of factors that could cause actual results and developments to differ materially
from those expressed or implied in the predictive statements. Therefore, such information is provided for
reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the
information at any time without notice.
