data but not obtaining massive data. In other words, if big data is compared to an
industry, the key to making profits from this industry is to improve the data
processing capability to increase data value.
Technically, big data is closely related to cloud computing. Big data must be
processed by a distributed computing system instead of a standalone computer.
Massive data is mined by a distributed system using distributed processing,
distributed database, cloud storage, and virtualization technologies of cloud
computing.
With the advent of the cloud era, big data attracts increasing attention. According
to the analyst team of Zhucloud, big data is generally used to describe large
amounts of unstructured and semi-structured data created by an enterprise.
Downloading such data into relational databases for analysis takes too much time
and money. Big data analysis is usually associated with cloud computing, because
real-time big data analysis requires a framework (such as MapReduce) to assign
tasks to dozens, hundreds, or even thousands of computers.
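As a minimal, single-machine sketch of the map/reduce idea mentioned above (a real MapReduce framework would distribute the map and reduce tasks across many computers; the documents and word counts here are hypothetical), the following Python snippet counts words:

    from collections import Counter
    from functools import reduce

    documents = ["big data needs distributed processing",
                 "cloud computing enables big data analysis"]

    # Map phase: each document is turned into partial word counts
    # (in a real framework, each call would run on a different machine).
    partial_counts = [Counter(doc.split()) for doc in documents]

    # Reduce phase: the partial results are merged into one final count.
    total = reduce(lambda a, b: a + b, partial_counts, Counter())
    print(total.most_common(3))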
An accurate positioning is the basis for a technology to play its role. Properly
positioning AI is the basis for us to understand and apply this technology. Huawei
agrees that AI, as a new general-purpose technology (GPT), is a combination of
technologies, just like the wheel and iron in ancient times, railways and electric
power in the 19th century, and automobiles, computers, and the Internet in the
20th century.
Through practice, Huawei has discovered that AI can not only replace human work,
but also automatically reduce production costs. This is the biggest difference
between AI and informatization, and is also the most valuable advantage of AI.
Through nearly half a century of development, the computing industry has
continuously changed society and reshaped other industries. The computing
industry itself is constantly evolving as well.
IBM's minicomputers are equipped with 4 to 8 POWER CPUs and can scale up to
16 or even 32 sockets. In recent years, IBM has launched projects such as
OpenPOWER and PowerLinux, marking its transition from a closed architecture
to an open architecture.
Dedicated computing platform: high costs and few applications, applicable to only
a few enterprises and a few apps
There are increasingly diversified compute demands on the cloud, edge, and
device sides in terms of performance, energy consumption, latency, and product
tolerance to extreme environments.
The server market is shifting its focus from virtualization and cloud computing to AI
computing, edge computing, and HPC. Traditional server vendors are facing a range
of growing compute demands, which pose challenges to computing architectures as
well as deployment, management, and O&M. Major challenge: how to break
through the compute bottleneck of traditional servers while reducing O&M costs?
In the atomic age, how can the physical limits of Moore's Law be broken? An 18-fold
annual increase in data volume against only a 10-fold increase in computing power
indicates a great deal of demand for heterogeneous computing. No single computing
architecture can process all types of data in all scenarios. Heterogeneous computing,
which combines multiple computing architectures such as CPUs, DSPs, GPUs, AI
chips, and FPGAs working in collaboration, is the optimal solution to meet the
requirements of service diversity (smartphones, smart home, IoT, and smart driving)
and data diversity (digits, text, pictures, videos, images, structured data, and
unstructured data).
Nvidia's CEO, Jen-Hsun Huang, believes the traditional version of Moore's Law is
dead.
According to our analysis, the popularization of AI in the industry faces the
following four challenges:
Professional talents and skills: The learning curve of the AI technology is long,
the entry barrier is high, and talent is scarce. Customers generally lack the AI
technical personnel needed to implement AI applications.
In the past 10 years, the physical world has gradually become digitalized.
Computing is becoming fully intelligent and moving towards the edge.
According to third-party statistics, the total annual data volume will reach 165
zettabytes by 2025, of which 50% will be generated at the edge, and the demand
for compute power at the edge will surge.
In addition, according to the GIV 2025 report, digitalization and intelligence will be
deepened in 2025. The data usage will reach 80%, the cloud adoption rate will
reach 85%, and the AI application rate will reach 86%.
Atlas 500 AI Edge Station for edge intelligence: Most mainstream edge products in
the industry focus more on networking capabilities and less on computing
capabilities. However, in video analytics and industrial interconnection scenarios,
customers need to process the data collected at the edge in real time to support
fast decision-making. The Atlas 500 can collaborate with the cloud, receive and
update algorithms pushed from the cloud, and quickly deploy new services.
The Atlas 500 AI Edge Station offers powerful performance. It is capable of real-
time data processing at the edge. A single device can provide 16 TOPS of INT8
processing capability with an ultra-low power consumption of less than 1 kWh per
day. The Atlas 500 integrates Wi-Fi and LTE wireless interfaces to support flexible
network access and data transmission.
Built for harsh edge deployment environments, whether in freezing Siberia or the
scorching Sahara, the Atlas 500 can operate stably from –40°C to +70°C.
Based on the five self-developed chipset families and two intelligent engines (for
intelligent management and acceleration), Huawei continuously offers customer-
oriented innovations, breaks through Moore's Law, improves management
efficiency, and deploys intelligent computing solutions across scenarios.
Digital ICs can contain anywhere from one to billions of logic gates, flip-flops,
multiplexers, and other circuits in a few square millimeters.
ICs can also combine analog and digital circuits on a single chip to create functions
such as analog-to-digital converters and digital-to-analog converters. Such mixed-
signal circuits offer smaller size and lower cost, but must carefully account for
signal interference.
The architectural designs of the CPU are Reduced instruction set computing (RISC)
and Complex instruction set computing (CISC). The x86 CPUs from AMD and Intel are
examples of CISC processors. Mainstream RISC CPUs are produced by ARM and IBM,
where ARM's are ARM architecture based and IBM adopts the PowerPC architecture.
Instruction system: RISC designers focus on frequently used instructions and try to
make them simple and efficient. Less common functions are usually implemented by
combining instructions, so RISC machines are less efficient when implementing
special functions; however, this can be compensated for by pipelining and
superscalar technology. The instruction system of the CISC computer is rich, with
instructions dedicated to different functions, so the efficiency of handling special
tasks is high.
Memory operation: RISC has restrictions on memory operations, simplifying
control. However, CISC has more memory operation instructions, and the
operations are direct.
Program: A RISC assembly language program generally needs a larger memory
space and is complex and difficult to design when implementing special functions.
CISC assembly language programming is relatively simple, and programs for
scientific computing and complex operations are relatively easy to design and
deliver higher efficiency.
CPU IC: The RISC CPU has a smaller number of unit circuits and therefore has a
smaller footprint and lower power consumption. The CISC CPU has ample circuit
units, and therefore has powerful functions, a larger footprint, and higher power
consumption.
Design period: RISC microprocessors are simple in structure, compact in layout,
short in design cycle, and easy to adopt the latest technology. The CISC
microprocessor has a complex structure and requires a long design period.
Usage: RISC microprocessors are simple in structure, regular in instructions, easy
to understand and use. The CISC microprocessor has a complex structure and
powerful functions, and is easy to implement special functions.
The mainstream computing architectures of data centers include x86, ARM, and
Power.
Under the tick-tock model, every microarchitecture change (tock) was followed by
a die shrink of the process technology (tick).
Tick–tock was a production model adopted in 2007 by Intel. (Intel released its Core
2 CPU in 2006.) Under this model, every "tick" represented a shrinking of the
process technology of the previous microarchitecture and every "tock" designated
a new microarchitecture.
Under the tick-tock scheme, Intel alternated between a "tick" and a "tock" roughly
every 12–18 months.
With each tick, Intel advances its manufacturing process technology in line with
Moore's Law. Each new process introduces higher transistor density and generally a
host of other advantages such as higher performance and lower power
consumption. During a tick, Intel retrofits its previous microarchitecture to the new
process, which inherently yields better performance and energy savings. At this
phase, only lightweight features and improvements are introduced.
With each tock, Intel uses the latest manufacturing process technology from their
"tick" to manufacture a newly designed microarchitecture. The new
microarchitecture is designed with the new process in mind and typically
introduces Intel's newest big features and functionalities. New instructions are
often added during this cycle stage.
Headquartered in Cambridge, the company was founded in November 1990 as
Advanced RISC Machines Ltd and structured as a joint venture between Acorn
Computers, Apple Computer (now Apple Inc.) and VLSI Technology.
ARM is also a CPU technology. Unlike Intel and AMD CPUs that adopt the CISC,
ARM's CPU uses RISC.
ARM does not manufacture chips. ARM's revenue comes entirely from IP licensing.
Supports 16-bit, 32-bit, and 64-bit instruction sets, and is compatible with
various application scenarios, from IoT and devices to the cloud.
Uses a large number of registers, where most data operations are performed
in registers, enabling faster command execution.
Disadvantages:
The differences of the GPU chips mainly lie in the GPU peripherals and industrial
quality.
Field-Programmable Gate Array (FPGA)
Example: The FPGA implements fast computing and nanosecond-level latency for
specific algorithms, such as fast Fourier transform and Smith-Waterman algorithm,
improving computing performance.
Major FPGA manufacturers include Xilinx, Altera, Lattice, and Microsemi. Xilinx and
Altera control 88% of the market. Almost all major FPGA manufacturers come from
the US.
FPGA is widely used in the aerospace, military, and telecom fields. In the telecom
field, in the era of dedicated telecom equipment, the FPGA was used to parse and
convert application network protocols thanks to its programming flexibility and
high performance. In the Network Function Virtualization (NFV)
phase, the FPGA improves the NE data plane performance by 3 to 5 times based
on the general-purpose server and Hypervisor and can be managed and
orchestrated by the OpenStack framework. In the cloud era, the FPGA has been
used as a basic IaaS resource to provide development and acceleration services in
the public cloud. AWS, Huawei, and BAT provide similar general-purpose services.
FPGA is more suitable for non-regular, multi-concurrency, compute-intensive, and
protocol parsing scenarios, such as video, gene sequencing, and network
acceleration.
High energy efficiency ratio: lower power consumption compared with CPU
and GPU
High compute power: 4096 FP16 MAC operations per clock cycle
ABD
F
Based on the new-generation intelligent servers, the computing system
integrates management, acceleration, heterogeneous computing, and AI chips into
the architecture of full-stack, all-scenario intelligent solutions covering the cloud,
edge, and device, implementing all-round intelligent acceleration for computing
services.
Before the emergence of the World Wide Web, servers came in the form of
minicomputers, midrange computers, and mainframe computers, which ran in
host/terminal mode, mostly on the Unix OS. Operators had to log in to the host
through a terminal.
In a LAN, machines operated through terminals are not exactly modern servers,
but the structure is similar to remote operation of modern servers.
Servers, as high-performance computers, can provide various services for
clients.
Under the control of the network OS, a server shares the hard disks, tapes,
printers, and expensive dedicated communication devices connected to the
server with customer sites on the network. A server also provides services such
as centralized computing, information release, and data management for
network users.
The von Neumann architecture, also known as the von Neumann model or
Princeton architecture, is a computer architecture that combines program
instruction memory and data memory. The term describes a design for a computing
device that implements a universal Turing machine and serves as a reference model
for sequential architectures.
A stored-program computer has the following features in terms of the
system structure:
The mainboard has high scalability and a range of slots. Therefore, this
kind of server is widely used and can meet common server application
requirements.
Blade server
Blade servers provide high availability and high density with several of
them installed in one chassis of standard height.
Rack server
RAM
Temporarily stores computing data in the CPU and data exchanged with
external memories such as hard disks.
When a computer is running, the CPU transfers the data that needs to be
computed to the RAM for computation, and then sends out the
computation result. The operating stability of the RAM also determines
the stability of the computer.
RAID
Higher performance
Fault tolerance
BIOS is a program stored on a ROM chip. The BIOS consists of the computer's basic
I/O programs, power-on self-test programs, and system auto-boot program. The
BIOS can read and write the detailed system settings stored in the CMOS. Its main
function is to provide the lowest-level, most direct hardware configuration and
control for a computer.
SSD accelerator card: NAND flash is the most important unit of an SSD. SSD
storage devices vary in their properties according to the number of bits stored
in each cell, with single bit cells (SLC) being generally the most reliable,
durable, fast, and expensive type, compared with 2 and 3 bit cells (MLC and
TLC), and finally quad bit cells (QLC).
System software includes the operating systems (such as BSD, DOS, Linux,
macOS, iOS, OS/2, QNX, Unix, and Windows) and basic tools.
Mainstream OSs:
According to the application fields: desktop OS, server OS, host OS, and
embedded OS
Mail:
ERP:
Web Server:
Typical HTTP servers are the Apache HTTP server of the Apache Software
Foundation, Internet Information Server (IIS) of Microsoft, and Google Web
Server (GWS).
H3C server products are classified into industrial standard servers and mission
critical servers. Industrial standard servers are further divided into H3C industrial
standard servers and HPE industrial standard servers.
H3C industrial standard servers include self-owned rack, tower, and blade
servers. The G3 server (corresponding to Huawei V5 server) is developed by
H3C, and the new brand is UniServer. The G2 servers (corresponding to Huawei
V3 servers) are all HPE OEM products, except the R4900 G2, which is proprietary.
The HPE industrial standard server is a full series, including rack, blade, tower,
and high-density servers.
The mission critical servers are all manufactured by HPE, including HPE Unix
servers and x86 mission critical servers based on the new-generation
minicomputer architecture. The servers are classified into dynamic servers, x86
servers for mission critical services, and fault-tolerant servers. The dynamic
servers and fault-tolerant servers use Intel Itanium processors.
Product & solution portfolio: intelligent servers (rack servers, high-density
servers, blade servers, and mission-critical servers), AI application computing
platform Atlas, and ARM servers.
1. C
3. Typical HTTP servers are the Apache HTTP server of the Apache Software
Foundation, Internet Information Server (IIS) of Microsoft, and Google Web
Server (GWS).
A server is a mainstream computing product developed in the 1990s. It can provide
network users with centralized computing, information release, and data
management services. In addition, a server can share the hard disks, tape drives,
printers, modems, and dedicated communication devices connected to it with
network users.
As an important node on the network, a server stores and processes 80% of the
data and information on the network. Therefore, it is called the soul of the network.
Functions of network terminal devices must be implemented through servers. In
other words, these devices are organized and led by servers.
B/S is short for Browser/Server. You need to install only one browser, such as
Netscape Navigator or Internet Explorer, on a client, and install the Oracle, Sybase,
Informix, or SQL Server database on the server. In this structure, the user interface
is implemented completely using the WWW browser. Some transaction logic is
implemented at the browser, but the main transaction logic is implemented at the
server. The browser exchanges data with the database through Web Server.
Service scale
Entry-level server: Entry-level servers are connected to limited terminals and
have poor stability, scalability, fault tolerance, and redundancy performance.
Therefore, entry-level servers apply only to small-sized enterprises that do
not have large-scale database data exchange, have small network traffic, and
do not need servers to run continuously for long periods. These servers do not
have many server features.
Workgroup server: Workgroup servers can connect to users of a workgroup
(about 50 clients). The servers have small network scales and low
performance. The servers meet small- and medium-scale network users'
requirements on data processing, file sharing, Internet access, and simple
database applications.
Department-level server: Department-level servers not only have all features
of workgroup servers, but also offer comprehensive management functions
for monitoring and managing circuits. The servers can monitor parameters
such as the temperature, voltage, fan, and chassis, so that management
personnel can learn server working status in a timely manner based on
standard server management software. In addition, most department-level
servers have excellent system scalability, so that the online system upgrade
can be supported when users' service volumes increase. This maximizes the
return on investment (ROI).
Enterprise-level server: Enterprise-level servers are high-end servers. An
enterprise-level server uses the symmetric CPU structure of at least four CPUs,
and provides independent dual-PCI channels, memory expansion board
design, and high memory bandwidth. It supports hot-swappable PSUs
and large-capacity hard disks, and provides super strong data processing
capability and high cluster performance.
Servers have been widely used in various fields, such as the telecom carrier,
Internet service provider (ISP)/Internet content provider (ICP), government, finance,
education, enterprise, and e-commerce. Servers can provide users with the file,
database, email, web, and File Transfer Protocol (FTP) services.
PCs have low computing capabilities (single processor) and their storage
capacity is small and hard to expand. A PC can be used by only one person at a
time in keyboard, mouse, and monitor mode. PCs work separately and run for
only several hours at a time. They do not have redundant components or
monitoring.
Similar to the PC structure, a server consists of the mainboard, CPU, hard disk,
memory, and system bus. Servers are customized based on specific
applications. With the development of the information technology and
network, users have higher requirements on the data processing capabilities
and security of information systems. A server differs from a PC in the
processing capabilities, stability, reliability, security, scalability, and
manageability.
Server hardware includes the CPU, mainboard, dual in-line memory module
(DIMM), hard disk, PCIe card, chassis, PSU, and fan.
The figure uses RH2288 V2, RH2288H V2, or RH1288 V2 as an example.
This slide focuses on the logical architecture of the server, the relationship
between modules, and how modules interact with each other.
The CPU connects to the DIMM using double data rate (DDR) signal cables. The
number of DIMMs and DIMM rate supported by the CPU vary depending on the
CPU specifications.
For example, each CPU of the RH2288H V2 server supports 12 DIMMs at a rate
of 1333 MHz or 1066 MHz.
The CPU is the core processing unit of a server, and a server, as an important
device on the network, needs to process a large number of access requests.
Servers must therefore have high throughput and robust stability and support
long-term running. The CPU is the brain of a computer and is the primary
indicator for measuring server performance.
The processing of the CPU can be divided into the following four stages: fetch,
decode, execute, and writeback. The CPU retrieves instructions from the memory
or cache, stores the instructions in an instruction register, and decodes and
executes the instructions.
CPU dominant frequency
The dominant frequency, also known as the clock speed, is measured in MHz or
GHz and indicates the frequency at which a CPU computes and processes data.
In most cases, the higher the dominant frequency, the faster the CPU processes
data.
The CPU dominant frequency is calculated using the following formula: CPU
dominant frequency = External frequency x Multiplication factor. The
dominant frequency has a certain relationship with the actual operation
speed, but it is not a simple linear relationship. The CPU computing speed
depends on the performance indicators of the CPU pipeline and bus.
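As a small sketch of the formula above (the external frequency and multiplication factor used here are hypothetical figures, not the specifications of any particular CPU):

    # CPU dominant frequency = external frequency x multiplication factor
    external_frequency_mhz = 100   # hypothetical external (base) frequency
    multiplication_factor = 30     # hypothetical multiplier
    dominant_frequency_ghz = external_frequency_mhz * multiplication_factor / 1000
    print(f"Dominant frequency: {dominant_frequency_ghz} GHz")   # 3.0 GHz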
CPU external frequency
The external frequency is the reference frequency of a CPU, measured in MHz.
The CPU external frequency determines the running speed of the mainboard.
Overclocking is the process of making a computer or component operate
faster than the clock frequency specified by the manufacturer. However,
overclocking is not allowed for the server CPU. The CPU computing speed
depends on the operating speed of the mainboard, and the CPU and
mainboard operate synchronously.
Bus frequency
The bus frequency is also known as the front side bus (FSB) frequency and
directly affects the speed of exchanging data between a CPU and a DIMM.
The data bandwidth is calculated using the following formula: Data
bandwidth = (Bus frequency x Data bit)/8. The maximum bandwidth of data
transmission depends on all transmitted data bits and transmission frequency.
For example, the Nocona CPU supports 64-bit, and the FSB frequency is 800
MHz. Therefore, the maximum bandwidth of data transmission is 6.4 GB/s.
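The Nocona example above can be checked with a short sketch of the bandwidth formula (all figures are taken from the text):

    # Data bandwidth = (bus frequency x data bits) / 8
    bus_frequency_mhz = 800   # FSB frequency in the example
    data_bits = 64            # bus width in the example
    bandwidth_mb_per_s = bus_frequency_mhz * data_bits / 8
    print(f"{bandwidth_mb_per_s / 1000} GB/s")   # 6.4 GB/s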
L1 cache
The L1 cache includes data cache and instruction cache. The capacity and
structure of the built-in L1 cache greatly affect the CPU performance. The
cache is composed of static RAM, which has a complex structure. If the CPU
chip area is not large, the capacity of the L1 cache cannot be too large.
Generally, the L1 cache capacity of a server CPU is 32 KB to 256 KB.
L2 cache
There are internal and external L2 caches. The speed of the internal L2 cache
is the same as that of the dominant frequency, but the speed of the external
L2 cache is only half that of the dominant frequency. The L2 cache capacity
affects the CPU performance. The larger the cache capacity is, the better the
CPU performance is. In the past, the maximum L2 cache capacity for home
computers was 512 KB, and that for laptops could reach 2 MB. The L2 cache
capacity for servers and workstations can reach 8 MB or higher.
L3 cache
The L3 cache can further reduce memory latency and improve CPU
performance when computing a large amount of data. Increasing L3 cache
can significantly improve server performance. A configuration with a larger L3
cache makes it more efficient to utilize physical memory resources, allowing
the disk I/O subsystem to process more data requests. A processor with
larger L3 cache can provide more efficient file system cache and a shorter
message and processor queue length.
Instruction sets
CISC instruction set: also called complex instruction set. In the CISC
microprocessor, instructions of a program and operations in each instruction
are executed in sequence.
RISC instruction set: This instruction set is developed based on the CISC
instruction system. Compared with the CPU that adopts the CISC, the CPU
that adopts the RISC simplifies the instruction system and improves the
parallel processing capability by using the superscalar and superpipelining
structure. The RISC instruction set is the development trend of high-
performance CPUs.
EPIC instruction set: This instruction set serves as an important step for Intel
processors to move to the RISC system. Intel uses the advanced and powerful
EPIC instruction set to develop the 64-bit OS-based IA-64 microprocessor
architecture.
Multi-core CPU
A multi-core CPU is a CPU with two or more independent computing engines
(cores). The development of multi-core technology stems from engineers'
recognition that simply increasing the speed of a single-core chip generates too
much heat without delivering corresponding performance improvements, as
previous processor products showed. A multi-core processor is a single chip (a
single silicon die) that can be inserted directly into a single processor socket, yet
the operating system treats each execution core as a discrete logical processor
with all its associated resources. By dividing tasks between the execution cores,
a multi-core processor performs more tasks in a given clock cycle.
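A rough sketch of dividing work between execution cores, using Python's standard process pool (the workload and the two-way split are hypothetical; it simply shows two cores each handling half of a CPU-bound task):

    from concurrent.futures import ProcessPoolExecutor

    def partial_sum(block):
        # CPU-bound work that one core can execute independently
        return sum(x * x for x in block)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        blocks = [data[:500_000], data[500_000:]]   # one block per core
        with ProcessPoolExecutor(max_workers=2) as pool:
            results = list(pool.map(partial_sum, blocks))
        print(sum(results))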
Storage, an important computer component, is used to store programs and
data. A computer can work normally only when storage is available. Storage is
classified into main memory and external storage by purpose. The main memory,
referred to as the internal storage (that is, memory), is the storage space that the
CPU can address directly and is made of semiconductor components. The memory
features a fast access rate.
As a main computer component, the memory stands in contrast to the external
storage. Programs, such as Windows OSs, typing software, and game software, are
generally installed on external storage such as hard disks, but they cannot run from
there directly. To use a program's functions, the program must be loaded into the
memory and executed there; in fact, we enter text or play a game in the memory.
Bookshelves and bookcases for storing books are like the external storage, while
the desk is like the memory. Permanent or large amounts of data are generally
stored in the external storage, while temporary or small amounts of data and
programs are stored in the memory. Memory performance affects the computer's
operating speed.
Common DIMM manufacturers
Three major dynamic random access memory (DRAM) vendors: Samsung, SK
Hynix, and Micron
Module vendors: Kingston and Ramaxel purchase DRAM chips to
manufacture DIMMs.
Dual-channel: The dual-channel architecture includes two independent and
complementary intelligent memory controllers, and the two memory controllers
can operate simultaneously without waiting time between each other, which
doubles the memory bandwidth.
Memory interleaving: This technology divides the main memory into two or more
sections, and the CPU can quickly address these sections without waiting. It is used
to organize the memory modules on the server mainboard to improve memory
transmission performance.
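A toy sketch of the interleaving idea (the mapping is an assumption for illustration: consecutive 64-byte cache lines alternate between two channels; real memory controllers use more elaborate schemes):

    LINE_SIZE = 64      # bytes per cache line (assumed)
    NUM_CHANNELS = 2    # dual-channel configuration

    def channel_of(address):
        # Consecutive cache lines land on alternating channels, so
        # sequential accesses keep both channels busy at the same time.
        return (address // LINE_SIZE) % NUM_CHANNELS

    for addr in (0, 64, 128, 192):
        print(hex(addr), "-> channel", channel_of(addr))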
Registered memory: Registered ECC SDRAM memory has two or three dedicated
integrated circuit chips, called register ICs. These chips improve current drive
capabilities and enable IA servers to support large-capacity memory.
Online spare memory:
When a multi-bit error occurs on the primary or extended memory, or a
physical memory fault occurs, the server continues to run.
The spare memory takes over the work of the faulty memory.
The spare memory area must be larger than or equal to the memory capacity
of any other area.
Memory mirroring:
Mirroring provides data protection for the system in the case of multi-bit
errors or physical memory faults to ensure normal system running.
Data is written to the memory areas of both mirrors at the same time but is
read from only one of them.
UDIMM: The address and control signals of the controller are directly sent to the
DIMM.
The server often uses UDIMM with a temperature sensor and the error
checking and correcting (ECC) function.
RDIMM: The address and control signals of the controller are sent to the DRAM
chip through the Register. The clock signals of the controller are sent to the DRAM
chip through the PLL.
LRDIMM supports more than eight ranks for a channel. This feature can
improve the system memory capacity.
What is a rank?
Answer: The bit width of the interface between the CPU and the DIMM is 64 bits.
Each memory chip is 4-bit or 8-bit wide. Therefore, multiple memory chips must be
combined to form a data collection that is 64-bit wide to interconnect with the
CPU. A rank is a 64-bit wide data area on a DIMM.
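A small sketch of the arithmetic behind a rank, using the 64-bit interface and the 4-bit/8-bit chip widths stated above (the ECC comment reflects common server DIMMs and is an assumption here):

    DATA_WIDTH = 64   # bit width of the CPU-DIMM interface, as stated above

    for chip_width in (4, 8):
        chips_per_rank = DATA_WIDTH // chip_width
        print(f"x{chip_width} chips: {chips_per_rank} chips form one rank")
    # x4 chips: 16 per rank, x8 chips: 8 per rank; ECC DIMMs carry extra
    # chips for the additional 8 check bits.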
Check details of the configuration on the webpage of Huawei Server Product
Memory Configuration Assistant:
http://support.huawei.com/onlinetoolweb/smca/?language=en
If there is only one DIMM, it must be inserted into slot 0 (the slot farthest
from the CPU) of the specified channel.
If you want to insert single-rank, dual-rank, and four-rank DIMMs in the form
of 2DPC, start from the farthest slot with the highest-rank DIMM.
Frequency (1333 -> 1600 -> 1866 -> 2133 -> 2400)
Solid-state drive (SSD): A hard disk made up of a solid-state electronic storage
chip array. An SSD consists of a control unit and a storage unit (flash or DRAM
chip). An SSD is identical to a common hard drive in interface specification and
definition, function, usage, and product shape and size. SSDs are widely used in
the fields of military, industrial control, video surveillance, network surveillance,
power generation, medical care, and aviation, as well as on vehicle-mounted
devices, network terminals, and navigation devices.
Strengths: fast in reading and writing, shockproof and anti-drop, low power
consumption, noise-free, wide range of operating temperatures, portable
Weaknesses: small capacity, limited lifecycle, expensive
Hybrid hard drive (HHD): A hybrid hard drive combines an HDD with an SSD,
using small-capacity flash memory chips to store frequently accessed files. The
magnetic disks remain the main storage medium, while the flash memory chips
serve as a buffer that holds frequently accessed files to reduce seek time and
improve efficiency.
Strengths: faster storage and retrieval of application data (such as word
processing files), faster system startup, lower power consumption, lower heat
generation, longer lifecycle, prolonged battery life for laptops and tablets,
lower working noise
Weaknesses: longer seek time of the hard disks, more frequent spin changes
of the hard disks, no data recovery possible if the flash memory module fails,
higher system hardware costs
A hard disk drive (HDD) is made up of one or more magnetic platters (made of
aluminum or glass), a magnetic head, a rotating shaft, a control motor, a magnetic
head controller, a data converter, an interface, and a cache.
In the early stage, hard disk ports included IDE and SCSI ports. With the
development of hard disk technologies, such ports are no longer used.
The mainstream hard disk interfaces include SATA, SAS, FC (not used by servers),
and PCIe. (Huawei servers mainly use SAS and SATA.)
SATA 1.0: 1.5 Gbit/s; SATA 2.0: 3 Gbit/s; SATA 3.0: 6 Gbit/s;
Rotational speed: The rotational speed is the number of rotations made by hard
disk platters per minute. The unit is revolutions per minute (rpm). In most cases,
the rotational speed of a hard disk reaches up to 5400 rpm or 7200 rpm. The hard
disk that uses the SCSI interface reaches up to 10,000-15,000 rpm.
Data transfer rate: The data transfer rate of a hard disk is the speed at which the
hard disk reads and writes data. It is measured in MB/s. The data transfer rate of a
hard disk consists of the internal data transfer rate and the external data transfer
rate.
The power consumption of SSDs is slightly lower than that of HDDs (except
PCIe SSDs). A 3.5-inch HDD consumes more power than a 2.5-inch HDD of
the same type, and an HDD with a larger capacity consumes more power. An
HDD with a higher rotational speed consumes more power. An SSD with a
larger capacity consumes more power.
The life of HDDs is limited only by the number of load/unload cycles. The life of
SSDs is determined by the NAND storage media. P/E is the erase life per cell. TLC
has the lowest P/E count and is not used in enterprise-level applications in most
cases. Mainstream SSDs use MLC and eMLC. Select an appropriate SSD
based on the read/write service requirements.
The mean time between failures (MTBF) of a 10K/15K SAS HDD is almost the
same as the MTBF of an SSD. An NL HDD has a relatively lower MTBF. MTBF
indicates the failure rate, and the failure rate of NL HDDs is higher. The cloud
disk MTBF is about 0.8 million hours, and its reliability is lower than that of NL
HDDs.
UBER indicates the uncorrectable bit error rate. The UBER of SSDs is two
orders of magnitude better than that of HDDs, so SSDs have stronger fault
recovery capability than HDDs.
HDDs have worse vibration resistance and impact resistance than SSDs.
In harsh environments, SSDs have an advantage over HDDs.
Redundant Array of Independent Disks (RAID) technology organizes multiple
independent disks into one logical disk, thereby improving disk read/write
performance and data security. Large-capacity disks are expensive. The basic idea
of RAID is to combine multiple small-capacity and inexpensive disks to obtain the
same capacity, performance, and reliability as expensive large-capacity disks at a
low cost. As disk costs and prices continued to decline, RAID could be built from
most disks, and being inexpensive was no longer the focus. Therefore, the RAID
Advisory Board (RAB) decided to replace "inexpensive" with "independent", and
RAID became the Redundant Array of Independent Disks. This was only a change
in the name, not a change in substance.
A related concept is the RAID level. The usable capacity depends on the RAID level
chosen, and capacity utilization varies with the RAID level.
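A minimal sketch of how usable capacity depends on the RAID level, using the standard capacity rules for the levels discussed in this section (disk count and size are hypothetical):

    def usable_capacity_tb(level, n_disks, disk_size_tb):
        # Standard capacity rules for common RAID levels
        if level == 0:
            return n_disks * disk_size_tb          # striping, no redundancy
        if level == 1:
            return disk_size_tb                    # full mirror
        if level == 5:
            return (n_disks - 1) * disk_size_tb    # one disk's worth of parity
        if level == 6:
            return (n_disks - 2) * disk_size_tb    # two disks' worth of parity
        if level == 10:
            return n_disks * disk_size_tb / 2      # striped mirrors
        raise ValueError("unsupported RAID level")

    for level in (0, 1, 5, 6, 10):
        print(f"RAID {level}: {usable_capacity_tb(level, 4, 4)} TB usable from 4 x 4 TB disks")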
Stripe: A stripe is formed by strips in the same positions (or with the same
numbers) on multiple disk drives in a disk array.
Application scenarios:
RAID 0: suitable for scenarios that require high read/write speed but low
security, such as graphics workstations
RAID 1: suitable for scenarios that require random data writes and high
security, such as servers, databases, and storage devices
RAID 1E: suitable for scenarios that require data transmission and high
security, such as video editing, large-scale database storage
RAID 5/6: suitable for scenarios that require random data transmission and
high security, such as financial and database storage
RAID 10: suitable for scenarios that have high requirements on random
read/write and security, such as banking and financial scenarios
Data parity: Redundant data is used to detect and rectify data errors. The
redundant data is usually calculated through Hamming check or XOR operations.
Data parity can greatly improve the reliability, performance, and error tolerance of
the disk arrays. However, the system needs to read data from multiple locations,
calculate, and compare data during the parity process, which affects system
performance. Each RAID level uses one or more of such technologies to achieve
different data reliability, availability, and I/O performance. You need to
comprehensively evaluate reliability, performance, and costs, and then select a
proper RAID level (or new level or type) or RAID mode based on system
requirements.
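As a small sketch of the XOR calculation mentioned above: the parity strip is the XOR of the data strips, so any single lost strip can be rebuilt from the remaining strips and the parity (the strip contents here are arbitrary example bytes):

    from functools import reduce

    # Three data strips and their XOR parity
    strips = [b"\x11\x22\x33", b"\x44\x55\x66", b"\x0a\x0b\x0c"]
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*strips))

    # Simulate losing strip 1 and rebuilding it from the survivors plus parity
    survivors = [strips[0], strips[2], parity]
    rebuilt = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*survivors))
    assert rebuilt == strips[1]
    print("strip 1 rebuilt:", rebuilt.hex())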
The two solutions achieve data protection. For details on the principles, see the
next slide.
Battery:
When the system is powered off unexpectedly, data in the DDR memory is
retained.
The backup battery unit (BBU) supplies power to the DDR to ensure that the
self-refresh function of the DDR is normal.
The data is stored for a limited period, which is usually 48 hours to 72 hours.
During working, the battery needs to be discharged periodically, which
affects performance for about 4 hours to 9 hours.
The battery life is greatly affected by the environment.
SuperCAP:
After the system is powered off unexpectedly, data is transferred from the
DDR to the NAND flash in the flash card.
The supercapacitor supplies power to the controller, DDR, and flash card to
ensure that data can be transferred to the flash card.
After the data transfer is complete, the supercapacitor does not need to be
charged. Data can be stored permanently.
The supercapacitor can be charged or discharged in a short period, which has
little impact on system performance.
During operation, the capacitance decreases continuously, but it remains
sufficient for normal working over the entire life cycle.
Direct connection to hard disks:
Expander:
The Expander mode is often used for rack servers that focus on storage.
Software RAID does not provide the following functions:
Hot swap
PCI: used in the scenarios that require a high data transmission rate, such as
digital graphics, image and voice processing, and high-speed real-time data
collection and processing. The PCI bus can solve the problem about the low
data transmission rate of the original standard bus.
Application type: Based on the computer type, network cards can be classified into
workstation network cards and server network cards.
High rate: Servers are used to process big data computing and have
demanding requirements on the network card rate, such as 10 Gbit/s or
25 Gbit/s. Some high-performance servers require 100 Gbit/s.
Low CPU usage: If the CPU has to respond to network cards frequently, the
speed of processing other tasks decreases. Server network cards have built-
in control chips that can take over some tasks from the CPU, helping reduce
CPU overhead.
LOM is integrated in the PCH chip of the server mainboard and is not replaceable.
It does not occupy the PCIe slot of the server.
To improve compatibility of network cards, PCIe defines standard card sizes (form
factors). Vendors develop network cards based on these sizes so that the cards
can be installed in standard PCIe slots.
To make full use of the space, Huawei develops the non-standard flexible I/O card
dedicated to Huawei servers.
A mezzanine card is designed for blades only. It does not provide external physical
ports. All signals are transmitted through the blade backplane. Mezzanine cards of
different vendors cannot be interchanged.
A network card provides two types of physical ports: electrical port and optical
port.
Electrical port: It is the network port seen on a common PC. It is one type of
RJ45 port and connects with common network cables.
Optical port: It connects to an optical module. The port for housing the
optical module is called an optical cage.
Optical modules can be classified into SFP+, SFP28, and QSFP+ based on the
encapsulation mode. SFP+ and SFP28 have the same structure and are compatible
with each other. SFP28 supports a high rate of 25G, whereas SFP+ supports only
10G. The appearance of QSFP+ differs greatly from that of SFP+. QSFP+ supports
a rate higher than 40G.
The direct attach cable (DAC) is a direct copper cable. Its module head is
integrated with the cable, and no optical module needs to be configured. The
cable has a large attenuation. Generally, the cable length is 1 m, 3 m, or 5 m.
However, the cable is cheap and is the best solution for short-distance
transmission.
An active optical cable (AOC) functions as two optical modules + optical fibers and
is also an integrated cable. This type of cable features high data transmission
reliability but is expensive.
Currently, the mainstream chips used by Huawei network cards come from Intel,
Broadcom, Cavium, and Mellanox.
ATX standard: for entry-level servers or workstations. The output power ranges
from 125 W to 350 W. Generally, a 20-pin dual-row rectangular socket is used to
supply power to the mainboard. The power supply specification of the Pentium 4
processor platform is ATX12V, which adds a 4-pin 12 V power output to better
meet the power supply requirements of the Pentium 4 processor.
The Server System Infrastructure (SSI) standard is a power supply standard for IA
servers. It is formulated to standardize the power supply technology of servers,
reduce development costs, and extend the service life of the servers. The SSI
standard specifies the power supply specifications, backplane specifications,
chassis specifications, and heat dissipation system specifications of servers.
Power supply redundancy modes:
1+1: In this mode, each module provides 50% of the output power. When
one module is removed, the other provides 100% of the output power.
2+1: In this mode, three modules are required. Each module provides 1/3 of
the output power. When one module is removed, each of the other two
modules provides 50% of the output power.
Note: When the system power is large, three modules can be used to implement
2+1 redundancy, and two modules can be used to implement non-redundancy.
However, the system power must be less than the sum of the power of the two
modules minus the current equalization error. When the system power is less than
the power of one module, two modules can be used to implement 1+1. When the
system power is greater than the power of one module, 1+1 may cause overload.
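A rough sketch of the note above, checking which redundancy plan a given system power allows (the module rating and the current-equalization margin are hypothetical figures):

    MODULE_POWER_W = 800        # rated output of one PSU module (hypothetical)
    EQUALIZATION_ERROR_W = 50   # assumed current-equalization margin

    def psu_plan(system_power_w):
        if system_power_w < MODULE_POWER_W:
            return "2 modules, 1+1 redundancy"
        if system_power_w < 2 * MODULE_POWER_W - EQUALIZATION_ERROR_W:
            return "3 modules, 2+1 redundancy (or 2 modules, no redundancy)"
        return "more or larger modules required"

    for load_w in (600, 1200, 2000):
        print(load_w, "W ->", psu_plan(load_w))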
When the OS needs hardware to do some work, the BIOS controls the device
hardware to complete the work.
The BIOS functions above are not listed in any particular order.
CMOS and RTC are two key concepts related to BIOS.
The default RTC time is the local time set at the factory. When the time is modified
during OS installation or usage, the RTC time is synchronized automatically so that
the time remains continuous after a system power-off.
The RTC uses a physical crystal oscillator, which has some deviation. In scenarios
that require high time precision, the OS needs to synchronize time with an NTP
clock source periodically. For more about NTP, see https://en.wikipedia.org/wiki/NTP.
UEFI
UEFI is developed based on EFI 1.10. In 2005, Intel handed EFI over to the open
UEFI Forum for management. The major contributors to UEFI are Intel, Microsoft,
and AMI. UEFI uses modularity, dynamic linking, and C-language parameter-stack
passing to build the system, getting rid of the traditional, complex 16-bit assembly
code of BIOS.
A great advantage of UEFI is that it is as easy to use as a Windows interface.
With UEFI, a mouse can be used in addition to the keyboard, and the modules
for adjusting functions are similar to those of a Windows program. UEFI can be
considered a small-sized Windows-like system.
Functions
Larger disk capacity: The GPT partition format in the UEFI standard supports
hard disks with a size of over 100 TB and 100 primary partitions, which is
especially useful for Windows 7 users.
Higher performance: UEFI can run on any 64-bit processor, has a large
addressing capability, and delivers excellent performance. To put it simply, you
can attach more hardware and boot into Windows faster.
64-bit system: Starting from Vista SP1 and Windows Server 2003, all 64-bit
systems can be started through UEFI, whereas Windows XP and 32-bit
systems are started only through the compatible module of UEFI.
BMC is a small-sized OS independent of the server system. It is used for remote
management, monitoring, installation, and restart of servers. BMC is integrated on
the mainboard or is inserted into the mainboard through PCIe. BMC is presented
as a standard RJ45 network port with an independent IP address. For common
maintenance, use a browser and enter the IP address and port to log in to the
management interface. Server clusters use BMC commands to perform large-
scale unattended operations.
Highlights:
KVM: The KVM module receives video data from x86 systems over the video
graphics array (VGA) port. Then it compresses the video data and sends the
compressed data to a remote KVM client over the network. Besides, the KVM
module receives keyboard and mouse data from the remote KVM client.
Then it transmits the data to x86 systems by using a simulated USB keyboard
and mouse device.
Black box: The black box receives running track information from x86 systems
over the PCIe port and provides an interface for exporting the recorded
information.
LPC: iBMC communicates with x86 systems through LPC ports and supports
standard IPMI ports.
History
In 1998, Intel, Dell, HP, and NEC jointly proposed the IPMI specifications, which
enable remote monitoring of parameters such as temperature and voltage over
the network.
In 2001, the IPMI was upgraded from 1.0 to 1.5, and the PCI Management Bus
feature was added.
In 2004, Intel released IPMI 2.0 specifications, which are compatible with IPMI 1.0
and 1.5 specifications. The Console Redirection feature is added, and the server
can be managed remotely through the port, modem, and LAN. In addition, the
security, VLAN, and blade server support are enhanced.
The core of IPMI is a dedicated chip/controller BMC (server processor or
baseboard management controller), which does not depend on the processor,
BIOS, or operating system of the server. It is an independent agent-free
management subsystem that runs independently in the system. BMC can work as
long as the BMC and IPMI firmware are available. BMC is usually an independent
board installed on the server mainboard. Currently, the server mainboard supports
the IPMI. IPMI has a good autonomous feature, which overcomes the limitations of
the previous management mode based on the operating system. For example,
even when the OS does not respond or is not loaded, the server can still be
powered on or off, and information can still be extracted.
IPMI's Serial Over LAN (SOL) feature changes the transmission direction of the
local serial port during the IPMI session, thereby providing remote access to the
emergency management service, Windows dedicated management console, or
Linux serial console. This provides a standard way to remotely view the boots, OS
loader, or emergency management console to diagnose and fix server-related
problems. This is a vendor-independent way to diagnose and repair faults.
Users do not need to worry about the security of command transmission. The IPMI
enhanced authentication (based on the SHA-1 and HMAC) and encryption
(Advanced Encryption Standard and Arcfour) functions help implement secure
remote operations. The support for VLANs facilitates the configuration and
management of private networks and can be configured based on channels.
The converged infrastructure solution consists of servers, data storage devices,
network devices, IT infrastructure management, automation, and service process
software.
Three types of operating systems are server operating systems, desktop operating
systems, and embedded operating systems.
Server OSs include Windows, Linux, UNIX, and more. Each operating system has
different versions. We only need to know some common server OSs and versions.
What is the benefit of the server OS compared with the single-user OS?
The commonly used versions include 3.11, 3.12, 4.10, V4.11, and V5.0. The
mainstream version is NetWare 5, which supports all important desktop OSs (DOS,
Windows, OS/2, Unix, and Macintosh) and the IBM SAA environment. It provides a
high-performance integrated platform for enterprises and institutions that need
complex network computing using products from multiple vendors. NetWare is a
multi-task and multi-user network operating system. Its later versions provide the
system fault tolerance (SFT) capability. The open protocol technology (OPT) is used.
The combination of various protocols enables different types of workstations to
communicate with the public server. This technology meets the requirements of
users for communication between different types of networks, and implements
seamless communication between different networks. That is, various network
protocols are closely connected, which facilitates the communication with various
minicomputers and mainframe computers. NetWare does not require a dedicated
server. Any type of PC can be used. NetWare servers have better support for
diskless stations and games, and are often used in teaching networks and game
halls.
OpenStack is a community, a project, and a piece of open-source software that
provides an operating platform and tool set for deploying clouds.
It is an open-source cloud computing management platform project, and its major
components are combined to complete specific tasks. OpenStack supports almost
all types of cloud environments. The project objective is to provide a cloud
computing management platform which is easy, scalable, and standard.
OpenStack manages data center resources and simplifies resource allocation. It
manages the following resources:
Compute resources: OpenStack controls large pools of compute, storage, and
networking resources across the data center and manages them via the
OpenStack API. This gives administrators control and allows users to provision
resources through the web interface (see the sketch after this list).
Storage resources: Due to performance and price requirements, many
organizations cannot meet the requirements of traditional enterprise-class
storage technologies. Therefore, OpenStack can provide configurable object
storage or block storage functions based on user requirements.
Network resources: Nowadays, data centers involve a large number of
devices, including servers, network devices, storage devices, and security
devices. They will be divided into more virtual devices or virtual networks. As
a result, the number of IP addresses, route configurations, and security rules
will increase explosively. Traditional network management technologies do
not have high scalability and automation capabilities. Therefore, OpenStack
provides network and IP address management in plug-in, scalable, and API-
driven mode.
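A minimal sketch of inspecting these three resource types through the OpenStack API, assuming the openstacksdk Python library and a cloud named "mycloud" configured in clouds.yaml (both names are placeholders):

    import openstack

    # Connect using credentials defined in clouds.yaml (cloud name is a placeholder)
    conn = openstack.connect(cloud="mycloud")

    # Compute resources: list existing servers
    for server in conn.compute.servers():
        print(server.name, server.status)

    # Storage resources: list block-storage volumes
    for volume in conn.block_storage.volumes():
        print(volume.name, volume.size, "GB")

    # Network resources: list networks
    for network in conn.network.networks():
        print(network.name)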
Scalability and elasticity are the main design objectives;
Eventual consistency is accepted and applied wherever possible;
Exchange Server is a well-designed mail server product that provides all the
necessary email services. In addition to the conventional SMTP/POP protocol
services, it also supports the IMAP4, LDAP, and NNTP protocols. Exchange Server
comes in two editions: the standard edition includes Active Server, a network news
service, and a series of interfaces connecting to other mail systems; the enterprise
edition provides, in addition to the functions of the standard edition, an email
gateway for communicating with IBM OfficeVision, X.400, VM, and SNADS.
Exchange Server supports web-based email access.
Answers:
1. ABCD
2. T
A cluster is a type of parallel or distributed processing system. It consists of a
collection of interconnected stand-alone computers working together as a single,
integrated computing resource. These computers work together, run a series of
common applications, and present a single system image to users and
applications. Externally, the cluster is a single system and provides
unified services. Internally, computers in the cluster are physically connected by
using cables, and are logically connected by using cluster software. These
connections offer the computers load balancing and failover capabilities, which are
not possible on a single computer.
Advantages of a cluster:
Improved performance: Some computing-intensive applications require
powerful computing capabilities. In this case, a cluster is suggested.
Reduced cost: A computer cluster can deliver better performance at a lower
cost than a general computer.
Improved scalability: Conventionally, users had to replace their servers with
expensive, newer ones to upgrade the system capacity. With the cluster
technology, you only need to add new servers to the cluster.
Enhanced reliability: The cluster technology enables the system to continue
operating properly in the event of a failure, minimizing the system downtime
and improving the system reliability.
High scalability: Server clustering is highly scalable. As the demand and load
increase, more servers can be added to the cluster. In this configuration, multiple
servers execute the same application and database operations.
High availability (HA): HA refers to a system's ability to prevent system faults or
automatically recover from faults without any human intervention. By transferring
the applications on a faulty server to a backup server, the cluster system can
increase the uptime to 99.9% (a quick downtime calculation follows this list), thus
greatly minimizing the system downtime.
High manageability: The system administrator can remotely manage one or even a
group of clusters in the same way as managing a single-node system.
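To make the 99.9% figure above concrete, a quick sketch of the allowed downtime per year at several availability levels:

    HOURS_PER_YEAR = 365 * 24

    for availability in (0.99, 0.999, 0.9999):
        downtime_hours = HOURS_PER_YEAR * (1 - availability)
        print(f"{availability:.2%} uptime -> about {downtime_hours:.1f} hours of downtime per year")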
Clusters are classified into the following types:
The FPGA is generally used to build digital circuits. The logic and I/O blocks in the
FPGA can be reconfigured as required. It also offers static reprogramming and
online dynamic system reconfiguration, so that hardware functions can be
modified by programming, just like software. It is no exaggeration to say that the
FPGA can be used to implement any function of digital devices, ranging from high-
performance CPUs to 74-series circuits. The FPGA is like a piece of white paper or a
pile of building blocks, allowing engineers to design digital systems freely using
traditional schematic input methods or hardware description languages.
Major SSD providers:
Intel: Since 2009, Intel has occupied a large enterprise market share with its
SATA SSDs. However, the PCIe SSD launched in 2012 did not fare as well as
expected. Intel then became a market dominator with its NVMe SSDs. Its sales
revenue in 2015 was $1.44 billion.
Samsung: It holds a smaller share of the enterprise SSD market than Intel.
Data center SATA SSDs are its bread-and-butter products. Samsung provides
SAS SSDs as an OEM for EMC. In 2014, it stepped into the PCIe SSD market.
WD: It targets the high-end storage market. As a subsidiary of WD, HGST
uses Intel chips and sells WD SSDs (to customers including EMC, Dell, and
HP) in the SAS market.
SanDisk: In recent years, SanDisk has made great efforts in the enterprise
market, building on its large PC OEM sales volume. Its customers include
Dell, HP, and NetApp. After acquiring Fusion-io, the company failed to
integrate the acquisition and was finally taken over by WD.
Toshiba: In 2015, Toshiba started its line of PCIe SSDs, which deliver high
performance.
Let's look at the intelligent SSD controller chips. We started the R&D of SSD
controller chips 13 years ago and have developed four generations of chips and
seven generations of SSD products. The latest generation of the intelligent SSD
controller chip features 16 nm process, PCIe NVMe and SAS convergence, PCIe 3.0
& SAS 3.0, PCIe hot plug, intelligent acceleration, multi-stream, atomic write, QoS,
and super wear leveling algorithm, prolonging the service life by 20%.
The intelligent converged network interface card chip has been under development
since 2004 and is now in its third generation. The third-generation intelligent converged network
chip features 16 nm process, Ethernet and FC convergence, 25GE to 100GE Ethernet,
16G to 32G FC networks, 48 built-in programmable data forwarding cores, OVS and
RoCE v1/v2 protocol offload, 15 Mpps OVS forwarding performance, and SR-IOV.
Proprietary chip as the core and multi-protocol acceleration: Huawei iNIC adopts
the new-generation ASIC network controller (Hi1822) that supports 2 x 100G or 4 x
25G ETH ports and PCIe 3.0 x16 interfaces. It supports industry-leading 15 Mpps
OVS offload, setting the industry benchmark for Elastic Cloud Server (ECS) network
performance.
Computing acceleration with 15% of CPU resources offloaded: The Huawei iNIC
supports C programming and uses a proprietary programmable network engine to
accelerate cloud network and storage services and optimize infrastructure
utilization.
High reliability and in-service upgrade: The Huawei iNIC is available as a
half-height, half-length standard card, facilitating server deployment and O&M.
It also uses a low-power design: the iNIC can be deployed within a 15 W power
budget, which has no impact on the deployment of existing servers. As a result,
the iNIC can be deployed quickly to accelerate services and shorten the TTM of
customers' networks.
Device and service faults occur frequently as ICT systems grow in scale and
service capacity and devices become increasingly scattered. This places higher
demands on ICT O&M, requiring more spending on manpower, time, and funding.
According to Forrester, traditional O&M accounts for 70% of enterprise IT
spending.
Server management software: The management software is layered, and the two
underlying layers are the most important for server management. Standalone
management provides basic server management capabilities; without this layer,
servers cannot be managed. Competitors' BMCs are produced by OEMs and cannot
offer the same flexibility as Huawei's.
The upper layer is the centralized management software, that is, eSight. This
layer delivers value to customers and improves the efficiency of O&M personnel.
This level-1 centralized management layer needs the management support and
some in-band management features of the BMC layer.
Before introducing the BMC, you need to know what platform management is.
Platform management refers to the monitoring and control of the system
hardware. For example, the temperature, voltage, fan, and power supply of the
system are monitored so that adjustment can be made accordingly to ensure that
the system is in a healthy state. The platform management module monitors and
controls the system hardware, records hardware information and logs, and
prompts users to locate faults. The preceding functions can be integrated into a
controller, that is, the BMC.
The BMC is an independent subsystem that consists of its own CPU, a mini
operating system, and management software. Servers require a BMC, whereas
common PCs do not have one.
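As a concrete illustration of querying a BMC for platform management data, the sketch below reads basic system health over the DMTF Redfish REST API. Using Redfish is our assumption (the text does not name a management interface), and the BMC address and credentials are placeholders:

    # Minimal sketch: reading basic health from a BMC via the Redfish REST API.
    # Exact resource paths can vary slightly between BMC implementations.
    import requests

    BMC = "https://192.0.2.10"      # placeholder BMC address
    AUTH = ("admin", "password")    # placeholder credentials

    systems = requests.get(f"{BMC}/redfish/v1/Systems", auth=AUTH, verify=False).json()
    for member in systems.get("Members", []):
        system = requests.get(f"{BMC}{member['@odata.id']}", auth=AUTH, verify=False).json()
        # PowerState and Status/Health are standard ComputerSystem properties.
        print(system.get("PowerState"), system.get("Status", {}).get("Health"))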
eSight is a new-generation ICT management system developed by Huawei for
enterprises. It can manage networks of different equipment suppliers in different
geographic areas. eSight manages IT devices, network devices, and terminals in a
unified manner. eSight supports integration with mainstream third-party
management systems. Built on our 20 years of experience, eSight is an ICT
lifecycle management system covering ICT installation, routine maintenance,
optimization, and upgrade.
Ansible advantages:
Ansible architecture:
MPP is an older architecture: parallel but tightly coupled, essentially a single
"big iron" system that typically runs only one OS. The cluster architecture, by
contrast, is loosely coupled and consists of a large number of small, independent
nodes. Currently, the best-known MPP system is IBM Blue Gene.
The network topology of an HPC cluster consists of the computing layer, network
layer, and storage layer. In addition to common compute nodes, GPU nodes and
fat nodes can also be deployed at the computing layer according to application
requirements. Moreover, management nodes and login nodes responsible for
scheduling cluster loads and jobs are also deployed at the computing layer. The
network layer consists of the computing network and management network for
high-speed cluster interconnection. The storage layer also consists of storage
management and login nodes. These nodes are connected to the shared storage
system through the high-speed storage network to provide high-bandwidth
storage I/O services.
Traditional HPC technologies and architectures have matured over years of
development. However, with the rise and rapid adoption of cloud computing, deep
learning, and big data technologies, HPC cloudification, HPDA, and deep learning
acceleration on GPU-based heterogeneous HPC are emerging and converging.
More and more customers are providing supercomputing services to users in the
form of cloud computing. Meanwhile, many small and medium-sized enterprises
cannot afford expensive HPC servers because of limited budgets. HPC cloud
services reduce CAPEX by charging users based on their requirements and actual
usage. This type of elastic service is very important.
The continuous improvement of computing capabilities and the emergence
of a large amount of available data drive the emergence and development of
deep learning. Therefore, the HPC technology can be used to develop a new-
generation of deep learning systems to boost the development of deep
learning.
HPDA is applied in the following fields: 1. Consumer behavior analysis and
search ranking analysis in Internet applications. 2. Medical care, logistics
analysis, and financial fraud detection in traditional industries. 3. Product
design and quality analysis in industrial applications.
In addition, with the advent of Sunway TaihuLight, the computing scale of HPC
clusters has reached the 100 PFLOPS magnitude, and EFLOPS computing has become
the next target of the HPC industry. This requires major breakthroughs in fields
such as computing architectures, network technologies, storage protocols,
compilation environments, and power consumption control.
An HPC system generally uses multiple processors on a standalone computer or
multiple computers in a cluster as computing resources. Multiple computers in a
cluster are operated as a single resource. HPC systems range from large-scale
clusters based on standard computers to those based on dedicated hardware.
In automobile, aviation, and chip manufacturing fields, HPC is used for CAE
simulation, which analyzes product mesh models with a large number of polygons.
The compute nodes need to communicate with each other frequently during the
computing process. To prevent processors from being idle and improve
computing efficiency, a low-latency and high-bandwidth network is required for
data transmission between a large number of compute nodes.
Each industry has its own HPC application software and different
requirements on cluster performance and configuration. These application
characteristics must be understood to provide optimal environment for
application deployment.
GPU compute nodes: use GPGPU cards for GPU computing acceleration
Three-plane networking:
Terms:
Up to 16 DDR4 DIMMs
100GE LOM
Reliability and operability design: The quick connectors remain undamaged after
200 removals and insertions in a test.
Smaller granularity
Intel Xeon Phi acceleration coprocessor chip
Theoretical system memory bandwidth:
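The formula itself is not stated in the text; a common rule of thumb (with example numbers that are not from the source) is: peak theoretical bandwidth ≈ number of memory channels × transfer rate × bytes per transfer. For example, 8 DDR4 channels at 2933 MT/s with 8-byte transfers give roughly 8 × 2933 × 8 ≈ 188 GB/s.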
The copy operation is the simplest. It reads a value from a memory unit and then
writes the value to another memory unit.
The scale operation reads a value from a memory unit, multiplies the value by a
factor, and then writes the result to another memory unit.
The add operation reads two values from memory, adds them, and writes the result
to another memory unit. The triad operation combines the copy, scale, and add
operations: it reads two values (a and b) from memory, multiplies one value by a
factor, adds the other value to the product (a + factor x b), and writes the
result to another memory unit.
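These four operations are the kernels of the well-known STREAM memory bandwidth benchmark (the benchmark itself is not named in the text). Below is a minimal NumPy sketch of the four kernels, for illustration only; a real bandwidth measurement implements and times these loops in C or Fortran:

    import numpy as np

    n = 10_000_000
    factor = 3.0
    a = np.random.rand(n)
    b = np.random.rand(n)

    copy_result  = a.copy()          # copy : read a value, write it elsewhere
    scale_result = factor * a        # scale: read, multiply by a factor, write
    add_result   = a + b             # add  : read two values, sum, write
    triad_result = a + factor * b    # triad: copy/scale/add combined (a + factor * b)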
The TCP/IP protocol stack has tens of microseconds of packet RX and TX latency
and causes high CPU usage, which has become the system bottleneck.
To resolve this problem, the Remote Direct Memory Access (RDMA) protocol was
developed to replace the traditional TCP/IP protocol stack. Compared with the
TCP/IP protocol stack, the RDMA protocol allows applications to directly read and
write NICs, greatly reducing the protocol processing time and CPU usage.
However, the RDMA protocol is sensitive to packet loss: a loss rate of just 1‰
(0.1%) can reduce RDMA throughput by 30%, so the protocol places stringent
requirements on the network's packet loss rate.
Traditional TCP/IP:
Data packets need to pass through the OS and other software layers, which
consumes a large amount of resources and memory bus bandwidth.
Large buffers
Packet loss
RDMA:
Zero copy: Data is directly copied from the network port to the application
memory.
The ARM+RoCE network solution eliminates the need for many external NICs and
switches, allowing more investment in computing resources.
Some application scenarios involve reading and writing a large amount of data
and require a large storage capacity and throughput. These application scenarios
include: seismic data processing and reservoir simulation in the oil industry,
meteorological and seismic prediction, satellite remote sensing and mapping,
astronomical image processing, and gene sequence comparison. The HPC cluster
shared storage system well addresses the requirements of these application
scenarios.
Lustre is an open-source, distributed, and parallel file system. It has the following
advantages: 1. Provides a single namespace. 2. Allows capacity and performance
expansion by adding nodes. 3. Supports online expansion. 4. Supports concurrent
read/write operations of multiple clients. 5. Uses distributed locks to ensure data
consistency.
The Lustre parallel file system allows multiple nodes in a cluster to read and write
the same file at the same time. This mechanism greatly improves the I/O
performance of file systems that support parallel I/O applications. It stripes data
across multiple storage arrays and integrates all storage servers and arrays. In
this way, Lustre builds a huge, scalable back-end storage pool out of low-cost
hardware.
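To picture the striping just described, the sketch below shows a round-robin mapping of fixed-size file stripes onto object storage targets (OSTs). The stripe size and stripe count are hypothetical values chosen for illustration; this shows the concept, not Lustre's implementation:

    STRIPE_SIZE = 1 << 20    # 1 MiB per stripe (assumed)
    STRIPE_COUNT = 4         # file striped across 4 OSTs (assumed)

    def ost_for_offset(file_offset: int) -> int:
        # Return the index of the OST holding the byte at this file offset.
        stripe_index = file_offset // STRIPE_SIZE
        return stripe_index % STRIPE_COUNT

    # The first 1 MiB lands on OST 0, the next on OST 1, and so on, which is why
    # several clients can read and write different parts of one file in parallel.
    print(ost_for_offset(0), ost_for_offset(5 * (1 << 20) + 7))   # -> 0 1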
GPFS allows all nodes in a cluster to access data in the same file and provides a
unified file storage space. Different from other cluster file systems, the GPFS file
system supports concurrent and high-speed file access for applications on multiple
nodes to achieve outstanding performance, especially when a large amount of
data is operated in sequence. Although typical GPFS applications are designed for
multiple nodes, the performance in single-node scenarios is also improved. GPFS is
ideal for application environments where centralized data access exceeds the
processing capability of the distributed file server.
OpenHPC is a comprehensive HPC software stack and a reference collection of
open-source HPC software components. Version 1.3.3 has passed comprehensive
testing on Huawei ARM servers.
Theoretically, if the source code is available, all HPC applications can be ported to
the ARM platform.
The Huawei HPC solution supports Linux OSs such as RHEL, CentOS, and SLES.
Underlying software for system status and performance monitoring is deployed on
these OSs to achieve efficient cluster resource management and job scheduling. In
addition, the Huawei HPC solution provides multiple parallel libraries, compilers,
mathematical libraries, and development tools to build an efficient parallel running
environment.
The compilers process and link source code to generate executable files. For
different hardware platforms, the compilers apply different compilation
parameters to optimize the source program for better execution efficiency; for
example, GCC-style compilers commonly use optimization flags such as -O3 and
architecture-specific flags such as -march=native (a generic example, not
specific to this solution).
HPC application software uses many common math algorithms. After long periods of
refinement and optimization, these algorithms have been standardized into math
libraries, and different open-source organizations and vendors provide their own
implementations. For example, the BLAS and LAPACK linear algebra interfaces are
implemented by libraries such as OpenBLAS and Intel MKL (a generic example, not
specific to this solution).
The HPC cluster parallel libraries are implementations of the MPI standard: jobs
are processed by multiple processes in parallel, and inter-process
synchronization is achieved through message passing.
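As a concrete example of message passing, the minimal sketch below uses mpi4py, one Python binding of the MPI standard. The choice of mpi4py is ours for illustration; HPC applications more commonly call MPI from C, C++, or Fortran:

    # Each process computes a partial sum; an MPI reduction combines the results.
    # Run with, for example: mpirun -np 4 python partial_sum.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()      # this process's ID
    size = comm.Get_size()      # total number of processes

    n = 1_000_000
    local_sum = sum(range(rank, n, size))                 # disjoint slice per rank

    total = comm.reduce(local_sum, op=MPI.SUM, root=0)    # message passing happens here
    if rank == 0:
        print("total =", total)   # same result regardless of the process count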
The parallel libraries, compilers, and math libraries are located above the OS and
under application software. They are collectively referred to as the parallel
environment. Parallel running of HPC applications depends on proper parallel
libraries, compilers, and math libraries. Optimizing parallel libraries, compilers, and
math libraries can improve application performance in the same hardware
environment.
Key: F, T, F, F, T
Exciting breakthroughs are happening every month, every week, even every day; a
single technology page in a newspaper is usually not enough to cover them.
AI technologies are advancing faster than Moore's Law, with a new generation of
technologies arriving every year.
Massive data generated on edge devices needs to be processed locally and in real
time, driving new demands such as privacy, security, high bandwidth, and low
costs.
Though we may not realize it, the era of edge computing has arrived.
In the intelligent edge era, local devices will have strong computing power to
tackle future challenges.
Edge-cloud integration will open up scattered data silos to enable data flow and
Internet of Everything.
The number of global street lamps will reach 350 million in 2025. These street
lamps are connected to billions of cameras and various environmental sensors.
Each street lamp generates several gigabytes of data every day, which needs to be
analyzed, processed, and stored.
Edge devices are transformed so that they can carry AI. In addition, AI capabilities
on the cloud are also transformed to lightweight systems so that they can adapt to
the software environment, hardware environment, and usage scenarios of edge
devices.
Edge devices have the following characteristics:
Small memory capacity and weak computing power
Requires miniaturized models.
Requires applications that can be quickly started and loaded.
Does not support multithreading.
Cloud computing:
Cloud computing is a computing mode that uses the Internet to share
resources such as computing devices, storage devices, and applications
anytime, anywhere, and on demand.
Fog computing:
According to Cisco's definition, fog computing is a distributed computing
infrastructure oriented to the Internet of Things (IoT). It extends computing
power and data analysis applications to the network edge, enabling
customers to analyze and manage data locally and thereby obtain real-time
results through the connections.
MCC:
Mobile cloud computing (MCC) integrates cloud computing, mobile
computing, and wireless application communication technologies to improve
service quality for mobile users and provide new service opportunities for
network operators and cloud service providers.
MEC:
Mobile edge computing (MEC) is considered a key factor in the evolution of the
cellular base station model. It combines edge servers with cellular base
stations and can operate either connected to or disconnected from remote cloud
data centers.
Cloud computing is centralized and far away from terminal devices such as
cameras and sensors. Deploying computing power on the cloud will cause
problems such as high network latency, network congestion, and service quality
deterioration, which cannot satisfy the requirements of real-time applications.
However, terminal devices usually have limited computing power compared with
the cloud. Edge computing well addresses this problem by extending computing
power from the cloud to edge nodes near terminal devices.
With the intelligent edge in place, edge nodes are managed so that cloud
applications can be extended to the edge. Data on the edge and data on the cloud
work together to support remote management, data processing, analysis,
decision-making, and intelligence. Meanwhile, unified O&M capabilities such as
device/application monitoring and log collection are provided on the cloud,
building a cloud-edge-synergy edge computing solution for enterprises.
Key technologies:
The framework and software stack of intelligent edge computing consist of the
following parts: 1. Hardware acceleration at the bottom layer. 2. Localized,
miniaturized, and lightweight intelligence at the middle layer. 3. Cloud-edge
synergy such as capability delegation, desensitized data upload, and device
management at the upper layer.
Wind, sun, rain, dust, high and low temperature, maintenance difficulties, and low
power consumption
Facial recognition
Edge side:
The recommended edge hardware is Atlas 300 or Atlas 500 (with GPUs).
IEF pushes edge facial recognition, customer flow monitoring, and heat map
applications for deployment.
Low latency: Images uploaded by cameras are processed quickly and locally.
Model training on the cloud: Models are automatically trained and the
Ascend chips are supported.
Optical character recognition (OCR) for finance and logistics
Terminal camera:
Edge side:
Intelligent edge: The cloud centrally pushes edge slicing applications and
manages the entire lifecycle.
Finance and logistics OCR
Industry, benchmark customer, and scenario
Terminal-side HD cameras:
Infrared photography (4 MB to 5 MB)
Production line cell image obtaining
Edge side:
The recommended edge hardware is Atlas 300 or Atlas 500 (with GPUs).
IEF pushes the visual quality inspection model to the edge for deployment.
IEF manages the application lifecycle (with the algorithm iteratively
optimized).
IEF manages containers and edge hardware.
Strengths and benefits of the cloud-edge synergy solution:
Low latency: The model is run locally and the latency of single-model
processing is less than 2s.
Quality inspection: The quality inspection accuracy rate is 100%. The small
image processing latency is 100 ms and will be improved to 60 ms in the
future.
Edge cloud synergy: Edge applications and devices are scheduled and
managed centrally.
Model training on the cloud: Models are automatically trained and the
Ascend chips are supported.
This chip has the strongest inference capability in the industry. A single chip
supports AI inference and analysis for 16 channels of HD videos.
The accelerator card integrates four intelligent chips and can process 64 channels
of videos independently.
The two products consume very little power and are ideal for edge devices and
edge cloud data centers.
This is an edge server used for intelligent analysis and inference.
It has the following features: high density, energy saving, ultra-large storage, and
ultra-high computing power. One such server is equal to multiple common servers.
It is ideal for edge nodes where the conditions of the installation environment are
limited.
This server supports intelligent analysis for 256 channels of videos of people,
vehicles, and other objects.
Size of an STB
Deep learning: abstracts the human brain from an information-processing
perspective to build simple models that are connected into networks in different ways.
Machine learning: allows machines (computers) to learn knowledge.
Deep learning is a newer branch of machine learning, proposed by Hinton and
other scholars in 2006. It is derived from multi-layer neural networks, and its
essence is to combine feature representation with learning. Deep learning trades
interpretability for learning effectiveness.
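To make "multi-layer neural network" concrete, here is a minimal two-layer forward pass in NumPy. This is a generic illustration of the structure only, with random placeholder weights and no training, and is not a Huawei or MindSpore implementation:

    import numpy as np

    rng = np.random.default_rng(0)
    x  = rng.normal(size=(1, 8))       # one input sample with 8 features
    W1 = rng.normal(size=(8, 16))      # layer 1: 8 features -> 16 hidden units
    W2 = rng.normal(size=(16, 4))      # layer 2: 16 hidden units -> 4 outputs

    hidden = np.maximum(0, x @ W1)     # ReLU: the learned feature representation
    logits = hidden @ W2               # output layer
    probs = np.exp(logits) / np.exp(logits).sum()   # softmax over the 4 outputs
    print(probs)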
The Atlas 800 is preconfigured with the AI system and can be used immediately
out of the box. Customers can focus more on business scenarios without worrying
about the complexity of infrastructure. For example, in a bank, a large number of
credit card applications are processed every day. Generally, a bank specialist can
handle only 50 applications each day. With Atlas 800, a bank specialist can handle
more than 1,200 applications a day.
The PC running MindSpore Studio is connected to the Atlas 200 DK through a USB
port or network port. The Atlas 200 DK consists of the Hi3559C multimedia
processing chip and the Atlas 200 AI accelerator module.
Content processing:
80% of data processed by Internet data centers is unstructured. This issue will
be particularly prominent after 5G is popularized.
Precision marketing:
New retail:
The number of stores connected to Alibaba Ling Shou Tong has exceeded 1
million.
The number of Suning Xiaodian stores reached about 4,000 by the end of
2018.
Vehicle-mounted AI:
Intelligent customer service is applied in the 10086 voice navigation system and
taobao.com as an intelligent e-commerce channel. Intelligent assistants also help
agents quickly understand customers' demands.
Pain points:
Manual handling accuracy and efficiency decrease as working hours accumulate.
The labor cost is high, especially during peak periods of the logistics industry.
The Atlas 200 and Atlas 500 are used to intelligently reconstruct cameras and
access control systems, enabling the surveillance system to provide functions such
as VIP identification, blacklist identification, and conflict warning. The security and
user experience of financial institutions are improved.
This is Huawei's full-stack all-scenario AI portfolio.
MindSpore: unified training and inference framework for the device, edge,
and cloud (independent or collaborative)
China has over 1.6 million kilometers of high-voltage transmission lines and
over 4 million transmission towers and poles. The stability of power grids is
vital to the development of the country and people's livelihoods.
For reliable power supply, transmission lines need regular inspections. The
traditional method is risky and consumes a lot of labor and resources.
Solution:
The Atlas 200 enables real-time surveillance, analysis, and risk warning at the
front end, improving timeliness, reducing manual workload, and increasing
accuracy.
The Atlas 200 has a low-power design. The entire camera consumes 8 W,
runs on solar power, and is maintenance-free throughout its lifecycle.
Business challenges:
The customer is about to deploy metro line 17 and wants to build a facial
recognition system. The system will serve purposes such as risk warning
against large passenger traffic, specific personnel identification and
monitoring, passenger behavior identification and tracking, and accurate
investigation for criminal cases.
The system deployment environment has many restrictions and requires the
devices to be space-saving and support future capacity expansion.
Simplifies preventive maintenance and improves its efficiency.
Locates faults in real time and dispatches resources to handle problems and
control losses.
Business challenges: