
Huawei Servers

Troubleshooting

Issue 20
Date 2020-09-25

HUAWEI TECHNOLOGIES CO., LTD.


Copyright © Huawei Technologies Co., Ltd. 2020. All rights reserved.
No part of this document may be reproduced or transmitted in any form or by any means without prior
written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

The Huawei logo and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.

Notice
The purchased products, services and features are stipulated by the contract made between Huawei and
the customer. All or part of the products, services and features described in this document may not be
within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided "AS IS" without warranties, guarantees
or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.


Address: Huawei Industrial Base
Bantian, Longgang
Shenzhen 518129
People's Republic of China

Website: https://e.huawei.com

Issue 20 (2020-09-25) Copyright © Huawei Technologies Co., Ltd. i



About This Document

Overview
This document describes how to collect logs, diagnose faults, upgrade software,
perform preventive maintenance and common operations, and collect the
information required to troubleshoot Huawei E9000, E6000, X6000, X8000,
X6800, rack, heterogeneous, Atlas 800 AI inference (model 3010), and Atlas 800 AI
training (model 9010) servers.
It guides you through the server troubleshooting process.

Intended Audience
This document is intended for:
● Technical support engineers
● Maintenance engineers

Symbol Conventions
The symbols that may be found in this document are defined as follows.

Symbol | Description
DANGER | Indicates a hazard with a high level of risk which, if not avoided, will result in death or serious injury.
WARNING | Indicates a hazard with a medium level of risk which, if not avoided, could result in death or serious injury.
CAUTION | Indicates a hazard with a low level of risk which, if not avoided, could result in minor or moderate injury.
NOTICE | Indicates a potentially hazardous situation which, if not avoided, could result in equipment damage, data loss, performance deterioration, or unanticipated results. NOTICE is used to address practices not related to personal injury.
NOTE | Supplements the important information in the main text. NOTE is used to address information not related to personal injury, equipment damage, and environment deterioration.

Change History
Issue | Date | Description
20 | 2020-09-25 | This issue is the twentieth official release. Updated 5.6 Handling Faults Based on Symptoms.
19 | 2020-07-16 | This issue is the nineteenth official release. Added information about the Atlas 800 AI training server (model 9010).
18 | 2020-05-12 | This issue is the eighteenth official release. Changed FusionServer G5500 to FusionServer Pro G5500.
17 | 2020-04-29 | This issue is the seventeenth official release. Deleted contents related to the ServiceCD.
16 | 2019-09-30 | This issue is the sixteenth official release. Added information about the Atlas 800 AI inference server (model 3010).
15 | 2019-09-19 | This issue is the fifteenth official release. Added contents related to the MM920/MM921.
14 | 2019-07-05 | This issue is the fourteenth official release. Added information about SmartKit.
13 | 2019-01-08 | This issue is the thirteenth official release.
12 | 2018-07-13 | This issue is the twelfth official release.
11 | 2018-05-18 | This issue is the eleventh official release. Added a description of the FusionServer G2500 heterogeneous server.
10 | 2018-03-12 | This issue is the tenth official release. Added information about the FusionServer Pro G5500 server.
09 | 2017-12-14 | This issue is the ninth official release. Added a description of the CX916 switch module of the E9000 server.
08 | 2017-08-08 | This issue is the eighth official release. Modified 4.4.1.1 Connecting a PC to the Ethernet Switching Plane.
07 | 2017-07-20 | This issue is the seventh official release. Modified 4.4.2.1 Collection Method.
06 | 2017-04-18 | This issue is the sixth official release. Added a statement in 5.5 Checking Indicators to Locate Faults that faulty E9000 compute nodes cannot be reseated.
05 | 2016-10-27 | This issue is the fifth official release. Added the quick recovery method for E9000 switch modules in 5.6 Handling Faults Based on Symptoms.
04 | 2016-07-11 | This issue is the fourth official release. ● Modified 4.4.2.7 Using the Switch Module CLI to Collect FC Switching Plane Information (MX210/MX220). ● Added the quick recovery method in 5.6 Handling Faults Based on Symptoms.
03 | 2016-05-10 | This issue is the third official release. ● Deleted the "Using the Web Tools of a Switch Module to Collect Information About an FC Switching Plane (NX120/NX220/MX210/MX220)" section. ● Modified 4.4.2.4 Using the V8 Switch Module CLI to Collect Ethernet Switching Plane Information. ● Added 9 Other Resources.
02 | 2015-10-27 | This issue is the second official release. ● Added 5.5 Checking Indicators to Locate Faults. ● Added a description of how to collect FreeBSD and Solaris host information in 4.2 Collecting OS Logs.
01 | 2015-10-09 | This issue is the first official release.


Contents

About This Document
1 Safety Instructions
2 Troubleshooting Process
3 Preparing for Troubleshooting
4 Collecting Information
4.1 Collecting Basic Information
4.2 Collecting OS Logs
4.3 Collecting Hardware Logs
4.4 Collecting Switch Module Logs (for E9000+MM910)
4.4.1 Preparing for Log Collection
4.4.1.1 Connecting a PC to the Ethernet Switching Plane
4.4.1.2 Querying the Software Version of the Ethernet Switching Plane
4.4.2 Collecting Switch Module Logs
4.4.2.1 Collection Method
4.4.2.2 Using SmartKit to Collect Switch Module Logs
4.4.2.3 Using the V5 Switch Module CLI to Collect Ethernet Switching Plane Information
4.4.2.4 Using the V8 Switch Module CLI to Collect Ethernet Switching Plane Information
4.4.2.5 Using the Web Tools Page of a Switch Module to Collect FC Switching Plane Information (MX510)
4.4.2.6 Using the Switch Module CLI to Collect FC Switching Plane Information (MX510)
4.4.2.7 Using the Switch Module CLI to Collect FC Switching Plane Information (MX210/MX220)
4.5 Collecting Switch Module Logs (for E9000+MM920/MM921)
4.6 Collecting Qlogic HBA Logs
4.7 Collecting Other Logs
5 Diagnosing and Rectifying Faults
5.1 Fault Diagnosis Rules
5.2 Using Tools to Diagnose Faults
5.3 Handling Alarms
5.4 Using Error Codes to Locate Faults
5.5 Checking Indicators to Locate Faults
5.6 Handling Faults Based on Symptoms
5.6.1 Power Failures
5.6.2 KVM Login Faults
5.6.3 POST Faults
5.6.4 Memory Errors
5.6.5 Drive I/O Faults
5.6.6 Ethernet Controller Faults
5.6.7 FC Controller Faults
5.6.8 Switch Module Faults
5.6.9 OS Faults
6 Software and Firmware Upgrade
7 Preventive Maintenance
7.1 Inspecting the Equipment Room Environment and Cable Layout
7.1.1 Precautions
7.1.2 Inspecting the Equipment Room Environment
7.1.3 Inspecting Cable Layout
7.2 Inspecting Servers
7.2.1 Precautions
7.2.2 Inspecting Indicators
7.2.3 Using SmartKit to Perform Health Inspection
7.2.4 Checking the System Status Through iBMC
7.3 Huawei Server Inspection Report
8 Common Operations
8.1 Obtaining a Product SN
8.2 Using iMana 200 to Collect Information in Batches
8.3 Using iBMC to Collect Information in Batches
8.4 Using the MM910 WebUI to Collect Information in Batches (for Versions Earlier Than U54 2.20)
8.5 Using the MM910 WebUI to Collect Information in Batches (for U54 2.20 or Later)
8.6 Using the FusionDirector WebUI to Collect Information in Batches
8.7 Using the MM510 CLI to Collect Information (FusionServer Pro G5500)
8.8 Logging In to the iMana 200 WebUI
8.9 Logging In to the iBMC WebUI
8.10 Logging In to the Web Tools of the MX510
8.11 Logging In to the MM910 WebUI
8.12 Logging In to the FusionDirector WebUI
8.13 Logging In to the MM510 CLI
8.14 Logging In to the RMC CLI
8.15 Logging In to a Server Over a Network Port by Using PuTTY
8.16 Logging In to a Server Over a Serial Port by Using PuTTY
8.17 Logging In to a Compute Node, Passthrough Module, or Switch Module by Using the SOL Function of the MM910
8.18 Logging In to a Compute Node, Passthrough Module, or Switch Module by Using the SOL Function of the MM920/MM921
8.19 Using WinSCP to Transfer Files
8.20 Configuring an FTP Server
8.21 Using SFTP to Transfer Files
9 Other Resources
9.1 Obtaining Technical Support
9.2 Product Information Resources
9.3 Product Configuration Resources
9.4 Maintenance Tools



1 Safety Instructions

General Instructions
● Comply with all local laws and regulations when installing the hardware.
These Safety Instructions are only a supplement.
● Observe the instructions that accompany all "DANGER", "WARNING",
"CAUTION", and "NOTE" symbols in this document. Follow them in
conjunction with these Safety Instructions.
● Observe all safety instructions provided on the device labels when installing
hardware. Follow them in conjunction with these Safety Instructions.
● Operations involving high voltages or moving equipment must be performed
by authorized, qualified personnel.
● Take protective measures against radio interference before operating the
device in residential areas.

Personal Safety
● Only personnel certified or authorized by Huawei are allowed to install
equipment or its components.
● Discontinue any dangerous operations and take protective measures. Report
anything that could cause personal injury or equipment damage to a project
supervisor.
● Do not move devices or install cabinets and power cables in hazardous
weather conditions.
● The average weight carried by each person must not exceed the maximum
acceptable weight of lift (MAWL) allowed by local safety regulations. Before
moving a device, check the maximum device weight and arrange for enough
personnel.
● Wear clean protective gloves, ESD clothing, a protective hat, and protective
shoes, as shown in Figure 1-1.


Figure 1-1 Protective clothing

● Before touching devices, put on antistatic clothing and ESD gloves, and
remove conductive objects such as watches and jewelry, as shown in
Figure 1-2.

Figure 1-2 Conductive objects to be removed

Figure 1-3 shows how to wear an ESD wrist strap.


1. Secure the wrist strap around your wrist.
2. Fasten the strap buckle and ensure that the ESD wrist strap is snug against
the skin.
3. Insert the attached ground terminal into the jack on the grounded rack or
chassis.


Figure 1-3 Wearing a wrist strap

● Exercise caution when using tools that could cause personal injury.
● Use a stacker when lifting hardware above shoulder height.
● Avoid any contact with high-voltage cables.
● Ensure that the device is properly grounded before powering it on.
● Do not use a ladder when working alone.
● Do not look into optical ports without eye protection.

Equipment Safety
● Use dedicated power cables to ensure equipment and personal safety.
● Use power cables only for dedicated devices.
● When moving a device, hold the handles or bottom of the device. Do not hold
the handle of an installed module, such as a power module, fan module,
drive, or mainboard.
● Connect the power cables to separate power distribution units (PDUs) for
active/standby operation.

Transportation Precautions
● The logistics company engaged to transport the equipment must be reliable
and comply with international standards for transporting electronics. Ensure
that the equipment being transported is always kept upright. Take necessary
precautions to prevent collisions, corrosion, package damage, damp
conditions and pollution.
● Transport the equipment in its original packaging.
● If original packages are not used, package heavy, bulky items (such as chassis
and compute nodes) and fragile components (such as PCIe cards and optical
modules) separately.
NOTE
Use Computing Product Compatibility Checker to query the components supported by
the compute node or server.
● Power off all equipment before transportation. Do not transport hazardous
materials.


Weight Limits Per Person

CAUTION

To reduce the risk of personal injury, comply with local regulations with regard to
the maximum weight one person is permitted to carry.

Table 1-1 lists the maximum weight that one person is permitted to carry, as
specified by each standards organization.

Table 1-1 Maximum handling weight

Organization | Weight (kg/lb)
European Committee for Standardization (CEN) | 25/55.13
International Organization for Standardization (ISO) | 25/55.13
National Institute for Occupational Safety and Health (NIOSH) | 23/50.72
Health and Safety Executive (HSE) | 25/55.13
General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China (AQSIQ) | Male: 15/33.08; Female: 10/22.05
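
The personnel check described under Personal Safety (check the device weight and arrange enough people before moving the device) can be sketched as a small calculation. The following Python example is illustrative only. The function name `people_required` and the 60 kg device weight are assumptions, not Huawei specifications. The limits are the kilogram values from Table 1-1, and local regulations always take precedence over this estimate.

```python
import math

# Maximum handling weight per person in kg, copied from Table 1-1.
MAX_HANDLING_WEIGHT_KG = {
    "CEN": 25,
    "ISO": 25,
    "NIOSH": 23,
    "HSE": 25,
    "AQSIQ (male)": 15,
    "AQSIQ (female)": 10,
}


def people_required(device_weight_kg: float, standard: str) -> int:
    """Return the minimum head count that keeps the average load per
    person at or below the limit of the selected standard."""
    if device_weight_kg <= 0:
        raise ValueError("device weight must be positive")
    limit_kg = MAX_HANDLING_WEIGHT_KG[standard]
    return math.ceil(device_weight_kg / limit_kg)


# Hypothetical 60 kg device; the weight is an assumption for illustration.
for name in ("ISO", "NIOSH", "AQSIQ (female)"):
    print(name, people_required(60, name))
```

The result is a lower bound on head count. Awkward shapes, long carrying distances, or lifting above shoulder height may call for more people or a stacker.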


2 Troubleshooting Process

Troubleshooting is the process of using appropriate methods to find the cause of a
fault and rectify it. The guiding principle is to narrow down the scope of possible
causes, which reduces troubleshooting complexity and helps you identify the root
cause and rectify the fault.
Figure 2-1 shows the recommended troubleshooting process.

Figure 2-1 Troubleshooting flowchart

Table 2-1 Troubleshooting steps

Step | Description
3 Preparing for Troubleshooting | Prepare the manuals and tools required for fault diagnosis and rectification.
4 Collecting Information | Collect comprehensive information for fault diagnosis.
5 Diagnosing and Rectifying Faults | Locate the fault and take troubleshooting measures.
9.1 Obtaining Technical Support | If you cannot locate or rectify a fault after consulting the documentation, contact Huawei technical support.
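
As a rough illustration, the steps in Table 2-1 can be modeled as an ordered sequence with an escalation branch. The sketch below assumes that the process ends once the fault is rectified and that technical support is contacted only when diagnosis does not resolve the fault. Figure 2-1 remains the authoritative flow. The names `Step` and `next_step` are hypothetical.

```python
from enum import Enum
from typing import Optional


class Step(Enum):
    """Steps from Table 2-1, labeled with their section titles."""
    PREPARE = "3 Preparing for Troubleshooting"
    COLLECT = "4 Collecting Information"
    DIAGNOSE = "5 Diagnosing and Rectifying Faults"
    GET_SUPPORT = "9.1 Obtaining Technical Support"


def next_step(current: Step, fault_rectified: bool = False) -> Optional[Step]:
    """Return the step that follows `current`, or None when the process ends.

    Assumption: escalation happens only after diagnosis fails to
    rectify the fault.
    """
    if current is Step.PREPARE:
        return Step.COLLECT
    if current is Step.COLLECT:
        return Step.DIAGNOSE
    if current is Step.DIAGNOSE:
        return None if fault_rectified else Step.GET_SUPPORT
    return None  # GET_SUPPORT is the last step in this sketch


# Walk the process for a fault that diagnosis does not resolve.
step: Optional[Step] = Step.PREPARE
while step is not None:
    print(step.value)
    step = next_step(step, fault_rectified=False)
```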


3 Preparing for Troubleshooting

Scenarios
This section describes how to prepare for troubleshooting.

Basic Knowledge and Skills


Familiarize yourself with the following basic knowledge and skills before troubleshooting:
● Server product knowledge
● Danger signs and levels
● Server hardware architecture
● Indicators on the front and rear panels
● Systems that run on servers
● Device operating conditions
● Common hardware operations such as power-on and power-off
● Common software operations such as upgrade
● Device maintenance process

Essential Materials
Table 3-1 lists the materials that you must read before performing routine
maintenance on Huawei servers.


Table 3-1 Essential materials for routine maintenance

Document Type | Description | How to Obtain
User Guide | Describes the server structure, specifications, and installation method. Each Huawei server has a user guide or maintenance and service guide. | 1. Log in to Support > Intelligent Servers or Support > Ascend Computing. 2. Choose a server model to access the product page. 3. On the Documentation tab page, choose Operation & Maintenance. 4. View the required user guide or maintenance and service guide.
Alarm Handling | Describes common alarms reported to the server iMana 200/iBMC or management module, and alarm handling suggestions. Each Huawei server has an alarm reference. | 1. Log in to Support > Management Software > Server Management Software > iBMC > Troubleshooting > Alarm Handling. 2. View the corresponding alarm handling manual.
Equipment Room Management Regulations | Describes the regulations for equipment room management and routine maintenance. | Comply with the customer's equipment room management regulations during onsite maintenance.

Software Tools
Table 3-2 lists the software tools required for routine maintenance of Huawei
servers.

Table 3-2 Tools for routine maintenance

Tool | Server and Version | Description
FusionServer Tools Toolkit | Huawei-developed V2 and V3 servers. For details, see the FusionServer Tools 2.0 Toolkit User Guide. | Diagnoses and configures servers for fault locating. Download link: FusionServer Tools
FusionServer Tools 2.0 SmartKit | For details, see the FusionServer Tools 2.0 SmartKit User Guide. | Used for new site deployment and delivery, troubleshooting, and firmware upgrade. Download link: FusionServer Tools
Smart Provisioning | See the Smart Provisioning User Guide. | Used to install OSs without a physical DVD-ROM drive, configure RAID, upgrade firmware, and perform troubleshooting. Download link: Smart Provisioning
PuTTY | All Huawei servers of all versions | Third-party tool used for remote access. You can obtain the tool from the Internet.
WinSCP | All Huawei servers of all versions | Third-party tool used for file transfer for iMana 200/iBMC or the management module. You can obtain the tool from the Internet.
WFTPD | All Huawei servers of all versions | Third-party tool used for file transfer for the Ethernet switching plane of a switch module. You can obtain the tool from the Internet.
CoreFTPServer/mini-sftp-server | All Huawei servers of all versions | Third-party tools used for file transfer for the FC switching plane of a switch module. You can obtain the tools from the Internet.

Hardware Tools
Table 3-3 lists the hardware tools required for routine maintenance of Huawei
servers.

Table 3-3 Hardware tools required for routine maintenance

Tool | Description
Floating nut hook | Used to guide floating nuts to the holes in the mounting bars of a rack.
Screwdriver | Used to tighten and loosen screws. A screwdriver can be a flat-head, Phillips, or hex screwdriver.
Diagonal pliers | Used to trim insulation tubes and cable ties.
Multimeter | Used to measure the resistance and voltage and to check connectivity.
ESD wrist strap | Used to prevent ESD damage when you touch or operate devices or components.
Electrostatic discharge (ESD) gloves | Used to prevent ESD damage to a board or precision instrument when you insert, remove, or hold it.
Cable tie | Used to bind cables.
Ladder | Used to perform operations at heights.
PC | Used to access the management network port or a service network port over the network to capture data. (You need to prepare a network cable.)
Serial cable | Used to connect to the serial port on the server. The serial port is usually a DB9 or RJ45 port.
Thermometer and hygrometer | Used to measure the equipment room temperature and relative humidity.
Oscilloscope | Used to measure the voltage and time sequence.


4 Collecting Information

About This Chapter


If a fault occurs on a server, collect logs for fault diagnosis. Collect the logs
immediately after the fault occurs so that the original data is preserved.
4.1 Collecting Basic Information
4.2 Collecting OS Logs
4.3 Collecting Hardware Logs
4.4 Collecting Switch Module Logs (for E9000+MM910)
4.5 Collecting Switch Module Logs (for E9000+MM920/MM921)
4.6 Collecting Qlogic HBA Logs
4.7 Collecting Other Logs

4.1 Collecting Basic Information


The customer needs to collect the basic information listed in Table 4-1 before
submitting a service request.

Table 4-1 Server fault records

Trouble Ticket No. | Example: 123456
Fault Report Time | Example: 2015-10-18 20:30:00
Customer Name | Full name of your organization
Address | Example: 20 Baker Street, New York
Customer Contact/ASP Name | Example: John Smith
Contact Info | Phone number and email address
Device Model | Example: RH2285 V2
SN/ESN | Example: 2102310XXXXX (For details about how to obtain the value, see 8.1 Obtaining a Product SN.)
Hardware Configuration | If the device configuration (CPUs, DIMMs, RAID controller cards, or NICs) is modified, you need to provide the modified configuration. If the configuration is not modified, enter None.
OS and Service Software Version | Example: SLES 11 SP1 64-bit or Oracle 10.2. (Consider the fault symptom to determine whether to collect the OS and service software versions.)
Fault Occurrence Time | Example: 2015-10-18 20:30:00
Fault Symptom | Example: The server frequently restarts during OS installation or the server stops responding upon power-on.
Action Before Fault Occurrence | Example: BIOS settings configuration, memory capacity expansion, network settings modification.
Action and Result After Fault Occurrence (Optional) | Example: After the power cable is disconnected and then reconnected, the fault persists. After the DVD-ROM is replaced, the fault persists. ...

4.2 Collecting OS Logs


Collect OS logs after an OS fault occurs.

NOTICE

● Obtain the customer's written authorization before collecting information.


● Logs collected by SmartKit may contain sensitive customer information. If
sensitive customer information is involved, obtain the customer's written
authorization before performing any maintenance operation.

Table 4-2 describes the methods for collecting logs of different OSs.


Table 4-2 Methods for collecting OS logs


OS Collection Method

Windows   Use SmartKit to collect Windows and Linux (RHEL, SLES, CentOS,
Linux     Ubuntu) logs. For details, see the FusionServer Tools 2.0 SmartKit
          User Guide.

VMware ● If the purple screen of death (PSOD) does not occur, perform the
following steps:
1. Log in to the ESX server console as the root user.
2. Run the vm-support command to collect all VMware logs.
3. After logs are collected, check that a log file in the
esxsupport-YYYY-MM-DD@HH-MM-SS.tgz format is generated in
the /var/tmp directory.
● If the PSOD occurs and the customer retains the site environment,
perform the following steps:
1. Capture a screenshot of the PSOD or take a photo to save the
displayed information.
2. Press Alt+F12 to switch to forcible memory information output
mode, and press Alt+PageUp/Alt+PageDown to page through the
output while capturing screenshots or photos. Ensure that the last
several screens displayed after the PSOD occurs are captured.
3. Hot-restart the system, and run the vm-support command to
collect all VMware logs.
4. After logs are collected, check that a log file in the
esxsupport-YYYY-MM-DD@HH-MM-SS.tgz format is generated in
the /var/tmp directory.
● If the PSOD occurs and the customer hot-restarts the system, run
vm-support to collect all of the VMware logs and check that a log
file in the esxsupport-YYYY-MM-DD@HH-MM-SS.tgz format is
generated in the /var/tmp directory.

FreeBSD Log in to the OS CLI over SSH and copy all files in /var/log/.
Copy the messages file and all files prefixed with messages (for
example, messages.0) in /var/log/ before copying other files.

Solaris Log in to the OS CLI over SSH and copy all files in the /var/log/
directory and /var/adm/ directory.
Copy the syslog file and all files prefixed with syslog (for example,
syslog.0) in /var/log/, and copy the messages file and files prefixed
with messages (for example, messages.0) in /var/adm/ before
copying other files.
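
The copy order described for FreeBSD and Solaris can be sketched as a simple sort. The following Python sketch is illustrative only: the function name copy_order and the sample file names are assumptions and are not part of any Huawei tool. It places a named log file and its rotated copies (for example, messages and messages.0) ahead of the other files in a directory listing.

```python
def copy_order(filenames, priority_prefixes=("messages",)):
    """Return filenames with the priority logs first.

    The relative order of the files inside each group is preserved.
    """
    def is_priority(name):
        # Matches the log itself and any file prefixed with its name,
        # for example messages and messages.0.
        return any(name.startswith(prefix) for prefix in priority_prefixes)

    first = [name for name in filenames if is_priority(name)]
    rest = [name for name in filenames if not is_priority(name)]
    return first + rest


# Hypothetical listing of /var/log/ on a FreeBSD host.
freebsd_listing = ["auth.log", "messages.0", "cron", "messages"]
print(copy_order(freebsd_listing))

# Hypothetical listing of /var/log/ on a Solaris host (syslog files first).
solaris_listing = ["authlog", "syslog", "syslog.0"]
print(copy_order(solaris_listing, priority_prefixes=("syslog",)))
```

For Solaris, the same function can be called a second time on the /var/adm/ listing with the default messages prefix.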

4.3 Collecting Hardware Logs


Collect hardware logs after a hardware fault occurs.


NOTICE

● Obtain the customer's written authorization before collecting information.


● Logs collected by SmartKit may contain sensitive customer information. If
sensitive customer information is involved, obtain the customer's written
authorization before performing any maintenance operation.

You can use one of the following methods to collect hardware logs:
● Use SmartKit to collect server hardware information in batches. For details
about the supported servers and operations, see section "Using SmartKit >
Collecting Server Logs" in the FusionServer Tools 2.0 SmartKit User Guide.
● Use iBMC to collect hardware logs of a single server. For details, see 8.3 Using
iBMC to Collect Information in Batches.
● Use iMana 200/iBMC to collect hardware logs. For details, see 8.2 Using
iMana 200 to Collect Information in Batches or 8.3 Using iBMC to Collect
Information in Batches.
● Use SmartKit to collect hardware logs and Windows/Linux logs. For details,
see the FusionServer Tools 2.0 SmartKit User Guide.

4.4 Collecting Switch Module Logs (for E9000+MM910)

4.4.1 Preparing for Log Collection

4.4.1.1 Connecting a PC to the Ethernet Switching Plane


Connect a PC to the Ethernet switching plane before logging in to the switching
plane.

Procedure
Step 1 Connect the Ethernet port of the PC to the management network ports of the
active and standby MM910 modules over the LAN. Figure 4-1 shows the network
connection.

NOTICE

● The MGMT port on the MM910 panel is the management network port.
● If the active MM910 MGMT port has been connected to the network by using a
network cable and the client needs to be directly connected to the MM910, do
not directly disconnect the network cable from the active MM910 MGMT port.
Otherwise, an active/standby MM910 switchover will be triggered, which may
cause network interruption. You are advised to connect the client to the active
MM910 STACK port in the chassis by using a network cable. If the active
MM910 STACK port has been connected to the MGMT port in another chassis,
use an idle active MM910 STACK port in another chassis.


Figure 4-1 Network connections

NOTE

The MM910 management port can be provided in either of two modes.

● An Ethernet port on the switch module in slot 2X or 3X can be used as the MM910
management port. However, if a CX910/CX911/CX912/CX913/CX915 is in slot 2X or 3X,
only a GE port can be used; if a CX920 is in slot 2X or 3X, only a 10GE port can be
used; if a CX916/CX916L/CX930 is in slot 2X or 3X, only a 25GE port can be used. If a
CX110/CX111/CX310/CX311/CX312/CX320/CX710 is in slot 2X or 3X, any port can be
used. The CX910/CX911/CX912/CX913 are not recommended for providing the
management network port of the management module.
● The MGMT port on the MM910 panel can be used as the MM910 management port.
● For MM910 (U54) 2.25 or earlier, the port on the switch module in slot 2X or 3X is
used as the management network port by default. In this case, do not connect the
MGMT port on the MM910 and the port on the switch module to the same network
at the same time. Otherwise, a network storm occurs and the network is interrupted.
For MM910 (U54) 2.26 or later, the MGMT port is used as the management network
port by default. For details about how to query the version, see the MM910
Management Module V100R001 User Guide.
● You can run the outportmode command to change the mode in which the MM910
management port is provided. For details, see "Querying and Setting the Network Port
Out Mode (outportmode)" in the MM910 Management Module V100R001
Command Reference.

Step 2 Use an SSH tool and the MM910 floating IP address to connect to the MM910 CLI.
For details about how to use PuTTY for SSH login, see 8.15 Logging In to a
Server Over a Network Port by Using PuTTY.

NOTE

Step 3 to Step 5 configure the IP address and route for the management network port of
the Ethernet switching plane. If the IP address and routing information of the management
network port have been configured, skip Step 3 to Step 5.


Step 3 (Optional) Run the following command to query the IP address of the
management network port of the Ethernet switching plane:

smmget -l swiN:fruM -d swipcontrol

The parameters are described as follows:

● N: indicates the slot number of the switch module. The value range is 1 to 4,
mapping to logical slot numbers 1E, 2X, 3X, and 4E from left to right on the
panel respectively.
● M: indicates the ID of the switching plane. The value for the Ethernet
switching plane is 2.

Check whether the IP address is 0.0.0.0.

● If yes, go to Step 4.
● If no, go to Step 5.
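
The query and the branch in Step 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the helper names build_query_command and next_step are hypothetical, and the IP address is assumed to have been read from the smmget output by the operator, because the output format is not shown in this document.

```python
# Slot numbers and their logical names, as given in the parameter description.
SLOT_NAMES = {1: "1E", 2: "2X", 3: "3X", 4: "4E"}

# The switching plane ID for the Ethernet switching plane, per this document.
ETHERNET_PLANE_ID = 2


def build_query_command(slot):
    """Build the smmget command line for the given switch module slot (1 to 4)."""
    if slot not in SLOT_NAMES:
        raise ValueError("slot must be 1, 2, 3, or 4")
    return f"smmget -l swi{slot}:fru{ETHERNET_PLANE_ID} -d swipcontrol"


def next_step(ip_address):
    """Return the step to perform next, based on the queried IP address."""
    # 0.0.0.0 means that no IP address has been set yet.
    return "Step 4" if ip_address.strip() == "0.0.0.0" else "Step 5"


print(build_query_command(2))     # command for the module in logical slot 2X
print(next_step("0.0.0.0"))
print(next_step("192.168.9.61"))
```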

Step 4 (Optional) Run the following command to set an IP address for the management
network port of the Ethernet switching plane:

ipmcset -l <bladeN|swiN> -d ipaddr -v <ipaddr> <mask> [gateway]

The parameters are described as follows:

● N: indicates the slot number of the switch module. The value range is 1 to 4,
mapping to logical slot numbers 1E, 2X, 3X, and 4E from left to right on the
panel respectively.
● M: indicates the ID of the switching plane. The value for the Ethernet
switching plane is 2.
● ipaddr: indicates the IP address of the management network port.
● mask: indicates the subnet mask of the management network port.
● gateway: (optional) indicates the gateway address of the management network port.

Step 5 (Optional) Configure the gateway for the switching plane by running the
following command so that the switching plane can communicate with the PC:
NOTE

For stacked switching planes, configure the gateway only for the master switching plane.

smmset -l swiN:fruM -d route -v targetvalue maskvalue gatewayvalue

The parameters are described as follows:

● N: indicates the slot number of the switch module. The value range is 1 to 4,
mapping to logical slot numbers 1E, 2X, 3X, and 4E from left to right on the
panel respectively.
● M: indicates the ID of the switching plane. The value for the Ethernet
switching plane is 2.
● targetvalue: indicates the target network segment IP address of the switching
plane.
● maskvalue: indicates the subnet mask of the switching plane.
● gatewayvalue: indicates the gateway IP address of the switching plane.


Example:
Administrator@SMM:/#ipmcset -l swi2 -d ipaddr -v 172.200.2.153 255.255.0.0 172.200.0.1
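
Before typing the ipmcset command, the address values can be sanity-checked offline. The following Python sketch uses only the standard ipaddress module. The helper name build_ipmcset is hypothetical, and the check that the gateway lies in the same subnet as the port address is an assumption about a typical network plan, not a documented MM910 requirement.

```python
import ipaddress


def build_ipmcset(slot, ip, mask, gateway=None):
    """Validate the values and return the ipmcset command line as a string."""
    if slot not in (1, 2, 3, 4):
        raise ValueError("slot must be 1, 2, 3, or 4")

    # IPv4Interface raises ValueError for a malformed address or mask.
    interface = ipaddress.IPv4Interface(f"{ip}/{mask}")

    parts = ["ipmcset", "-l", f"swi{slot}", "-d", "ipaddr", "-v", ip, mask]
    if gateway is not None:
        # Assumption: the gateway is expected to be inside the port's subnet.
        if ipaddress.IPv4Address(gateway) not in interface.network:
            raise ValueError("gateway is outside the subnet of the port address")
        parts.append(gateway)
    return " ".join(parts)


# Values taken from the example in this section.
print(build_ipmcset(2, "172.200.2.153", "255.255.0.0", "172.200.0.1"))
```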

----End

4.4.1.2 Querying the Software Version of the Ethernet Switching Plane


Query the software version of the switching plane before upgrading a switch
module.

Prerequisites
● The switch modules have been powered on.
● For logging in to the Ethernet switching plane over SSH, the default username
is root and the default password is Huawei12#$.
● By default, the MM910 username is root and the password is Huawei12#$.
● You are familiar with the parameters required for this operation.

Table 4-3 Parameter description

Parameter                              Example Value

IP address and subnet mask of the      ● IP address of the management
management network port on the           network port: 192.168.9.61
Ethernet switching plane               ● Subnet mask: 255.255.255.0

Floating IP address, subnet mask,      ● IP address: 10.85.4.77
and gateway of the MM910               ● Subnet mask: 255.255.255.0
                                       ● Gateway: 10.85.4.1

Procedure
Step 1 Connect the PC to the Ethernet switching plane.
For details, see 4.4.1.1 Connecting a PC to the Ethernet Switching Plane.
Step 2 Log in to the CLI of the Ethernet switching plane by using the SOL function of the
MM910.
For details about SOL login, see 8.17 Logging In to a Compute Node,
Passthrough Module, or Switch Module by Using the SOL Function of the
MM910.
Step 3 Run the following command to query the version of the Ethernet switching plane:
display version
● Information similar to the following is displayed:
BoardName : CX910
CPLD Version : 003
PCB Version : VER.A
Bootrom Version : 008
Creation Time : Sep 17 2012, 09:53:25
Backup Bootrom Version : 008


Creation Time : Sep 17 2012, 09:53:25


Switch Version : 1.1.0.200.3
Creation Time : Oct 17 2012, 17:10:28
Backup Switch Version : 1.1.0.200.3
FC BoardName : UNKNOWN
FC PCB Version : UNKNOWN

If the command output contains Switch Version, the software version is V5.


● Information similar to the following is displayed:
Huawei Versatile Routing Platform Software
VRP (R) software, Version 8.60 (OSCA V100R002C01)
Copyright (C) 2012-2013 Huawei Technologies Co., Ltd.
HUAWEI OSCA uptime is 0 day, 0 hour, 20 minutes

CX910_10GE(Master) 3 : uptime is 0 day, 0 hour, 20 minutes


StartupTime 2013/12/16 01:54:58
Memory Size : 2048 M bytes
Flash Size : 1024 M bytes
CX910_10GE version information
1. PCB Version : CX910_10GE VER C
2. MAB Version : 1
3. Board Type : CX910_10GE
4. CPLD1 Version : 013
5. BIOS Version : 038
6. Software Version : 1.2.1.0.39

If the command output contains Software Version, the software version is V8.

----End
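
The rule used in Step 3 to tell the two platforms apart can be expressed as a short check on the captured display version output. The following Python sketch is illustrative; the function name classify_platform is hypothetical, and the sample strings are shortened copies of the outputs shown above.

```python
def classify_platform(display_version_output):
    """Return "V5", "V8", or None based on the display version output."""
    # V5 output contains a "Switch Version" field.
    if "Switch Version" in display_version_output:
        return "V5"
    # V8 output contains a "Software Version" field.
    if "Software Version" in display_version_output:
        return "V8"
    # Neither marker found: the platform cannot be determined from this text.
    return None


v5_sample = """BoardName : CX910
Switch Version : 1.1.0.200.3
Backup Switch Version : 1.1.0.200.3"""

v8_sample = """VRP (R) software, Version 8.60 (OSCA V100R002C01)
5. BIOS Version : 038
6. Software Version : 1.2.1.0.39"""

print(classify_platform(v5_sample))
print(classify_platform(v8_sample))
```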

4.4.2 Collecting Switch Module Logs

4.4.2.1 Collection Method


Table 4-4 lists the methods for collecting switch module logs.

Table 4-4 Methods for collecting switch module logs


Switching plane type: Fabric, Base, and FC switching planes
  Switch module type: -
  Prerequisites: 4.4.2.2 Using SmartKit to Collect Switch Module Logs
  Log collection method: Using SmartKit
    NOTE
    For details, see "Collecting Server Logs" in the FusionServer Tools 2.0
    SmartKit User Guide.
  Reference link: 4.4.2.2 Using SmartKit to Collect Switch Module Logs

Switching plane type: Ethernet switching plane
  Switch module type: V5 switch modules and V8 switch modules
  Prerequisites: 4.4.1.1 Connecting a PC to the Ethernet Switching Plane
  Log collection method:
    ● Using the CLI
    ● Using the WebUI
    NOTE
    The prerequisites for using the one-click full collection function of the
    MM910 WebUI to collect switching plane logs are as follows:
    ● The MM910 software version is 6.00 or later.
    ● The switching plane software version is 5.30 or later.
  Reference link (V5 switch modules):
    ● Using the CLI: 4.4.2.3 Using the V5 Switch Module CLI to Collect
      Ethernet Switching Plane Information
    ● WebUI: 8.5 Using the MM910 WebUI to Collect Information in Batches
      (for U54 2.20 or Later)
  Reference link (V8 switch modules):
    ● Using the CLI: 4.4.2.4 Using the V8 Switch Module CLI to Collect
      Ethernet Switching Plane Information
    ● WebUI: 8.5 Using the MM910 WebUI to Collect Information in Batches
      (for U54 2.20 or Later)

Switching plane type: FC switching plane
  Switch module type: CX311, CX911, and CX915 switch modules
    Prerequisites: 4.4.1.1 Connecting a PC to the Ethernet Switching Plane
    Log collection method: Using the CLI
    Reference link: 4.4.2.6 Using the Switch Module CLI to Collect FC
    Switching Plane Information (MX510)

    Prerequisites: 8.10 Logging In to the Web Tools of the MX510
    Log collection method: Using the GUI
    Reference link: 4.4.2.5 Using the Web Tools Page of a Switch Module to
    Collect FC Switching Plane Information (MX510)

  Switch module type: CX210, CX220, CX912, and CX916 switch modules
    Prerequisites: 4.4.1.1 Connecting a PC to the Ethernet Switching Plane
    Log collection method: Using the CLI
    Reference link: 4.4.2.7 Using the Switch Module CLI to Collect FC
    Switching Plane Information (MX210/MX220)

4.4.2.2 Using SmartKit to Collect Switch Module Logs


For details about how to use SmartKit to collect logs for the E9000 switch module,
see "Collecting Server Logs" in the FusionServer Tools 2.0 SmartKit User Guide.

4.4.2.3 Using the V5 Switch Module CLI to Collect Ethernet Switching Plane Information

Operation Scenario
Use the E9000 server switch module CLI of the V5 platform to collect Ethernet
switching plane information, including:
● Logs
● Debugging information
● Trap information
For details about how to query the Ethernet switching plane version, see 4.4.1.2
Querying the Software Version of the Ethernet Switching Plane.

Prerequisites
Conditions
● WFTPD 4.2.4.610 or later has been installed on the PC.
● You have logged in to the Ethernet switching plane CLI. For details, see 8.15
Logging In to a Server Over a Network Port by Using PuTTY or 8.17
Logging In to a Compute Node, Passthrough Module, or Switch Module
by Using the SOL Function of the MM910.


Data
Table 4-5 describes the required parameters.

Table 4-5 Data list


Parameter          Example Value

IP address         CX911: 192.168.1.100
                   CX912: 10.77.77.77

Subnet mask        255.255.255.0

Default gateway    0.0.0.0

The default username of the switching plane is root, and the default password is
Huawei12#$.

NOTE

You can query and set IP addresses of all modules. For details, see 8.11 Logging In to the
MM910 WebUI.
● For the MM910 versions earlier than (U54) 2.20, choose System Management >
Network Management > xx > IP addresses.
● For the MM910 (U54) 2.20 or later, choose Chassis Settings > Network Settings > xx.

Software Tools
wftpd32.exe: used to transfer files between different platforms, for example, from
a PC to a switch module. This tool is third-party software. You need to obtain it
yourself.

Procedure
Step 1 Configure the FTP server.
For details about the configuration operations, see 8.20 Configuring an FTP
Server.
Step 2 Configure the IP address of the management network port.
1. After logging in to the switch module by using a serial port or the SOL
function, run the following commands on the switching plane CLI to query
and set the IP address of the management network port so that the switch
module can properly communicate with the FTP server:
NOTE

Skip this step if you log in to the switch module by using a network port.
<Fabric>system-view
[Fabric]interface MEth 0/0/1
[Fabric-MEth0/0/1]ip address 192.168.100.123 24
[Fabric-MEth0/0/1]display this
#
interface MEth0/0/1


ip address 192.168.100.123 255.255.255.0


#
return

[Fabric-MEth0/0/1]quit
[Fabric]quit
2. If the configured IP address and the FTP server address are not on the same
network segment, run the following command on the HMM CLI to configure
a gateway for the switching plane:
smmset -l swiN:fruM -d route -v targetvalue maskvalue gatewayvalue
The parameters are described as follows:
– N: indicates the slot number of the switch module. The value range is 1 to
4, mapping to logical slot numbers 1E, 2X, 3X, and 4E from left to right
on the panel respectively.
– M: indicates the ID of the switching plane. The value for the Ethernet
switching plane is 2.
– targetvalue: indicates the target network segment IP address of the
switching plane.
– maskvalue: indicates the subnet mask of the switching plane.
– gatewayvalue: indicates the gateway IP address of the switching plane.
For example, if the gateway IP address is 192.168.112.1, run the following
command:
smmset -l swi3:fru2 -d route -v 0.0.0.0 0.0.0.0 192.168.112.1
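
Whether the gateway command in this step is needed depends on whether the switch module and the FTP server share a network segment. The following Python sketch shows one way to check this with the standard ipaddress module and to build the default-route form of the smmset command used in the example above. The helper names needs_gateway and build_default_route are hypothetical.

```python
import ipaddress


def needs_gateway(switch_ip, mask_or_prefix, ftp_server_ip):
    """Return True if the FTP server is outside the switch module's subnet."""
    network = ipaddress.IPv4Interface(f"{switch_ip}/{mask_or_prefix}").network
    return ipaddress.IPv4Address(ftp_server_ip) not in network


def build_default_route(slot, gateway, plane_id=2):
    """Build the smmset command that sets a default route via the gateway.

    plane_id defaults to 2, the Ethernet switching plane ID in this document.
    """
    ipaddress.IPv4Address(gateway)  # raises ValueError if malformed
    return (
        f"smmset -l swi{slot}:fru{plane_id} -d route "
        f"-v 0.0.0.0 0.0.0.0 {gateway}"
    )


# Same /24 segment: no gateway is required.
print(needs_gateway("192.168.100.123", 24, "192.168.100.122"))
# Different segment: configure a gateway on the HMM CLI.
print(needs_gateway("192.168.100.123", "255.255.255.0", "200.1.1.126"))
print(build_default_route(3, "192.168.112.1"))
```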
Step 3 Obtain the log information.
1. Run the following command to collect logs.
<Fabric>display diagnostic-information diag-info.txt
Now saving the diagnostic information to the device

Info: The diagnostic information was saved to the device successfully.

<Fabric>save logfile
Save log file successfully.

2. View the log file system.


<Fabric>dir
Directory of flash:/
Idx Attr Size(Byte) Date Time(LMT) FileName
0 -rw- 1,075 Apr 01 2000 23:55:17 private-data.txt
1 -rw- 1,260 Apr 02 2000 00:00:13 hostkey
2 -rw- 540 Apr 02 2000 00:00:17 serverkey
3 -rw- 148,848 Sep 08 2015 11:22:40 diag-info.txt
16,129 KB total (15,976 KB free)

<Fabric>dir flashvx:/logfile/
Directory of flashvx:/logfile/
Idx Attr Size(Byte) Date Time(LMT) FileName
0 -rw- 2,939,200 Apr 01 2000 23:55:02 log.dblg
1 -rw- 95,988 Jan 07 2014 19:16:00 2014-01-07.19-13-54.log.zip
2 -rw- 172,081 Jan 07 2014 21:35:14 2014-01-07.21-31-56.log.zip
3 -rw- 2,716,484 Jan 23 2014 01:35:24 log.log
4 -rw- 4,589,648 Jan 17 2014 12:30:48 2000-04-01.23-55-08.dblg

3. Enter the IP address, username, and password to log in to the FTP server. In
the following example, the FTP server address is 200.1.1.126 and the
username is root.


<Fabric>ftp 200.1.1.126
Trying 200.1.1.126 ...
Press CTRL+K to abort
Connected to 200.1.1.126.
220 WFTPD 2.0 service (by Texas Imperial Software) ready for new user
User(200.1.1.126 none):root
331 Give me your password, please
Enter password:
230 Logged in successfull
[ftp]

NOTE

The IP address of the FTP server is configured by the user and is on the same network
segment as the management IP address of the switch module.
4. Set the FTP transfer mode to binary.
[ftp]binary
5. Obtain the log file.
[ftp]put flash:/diag-info.txt
200 PORT command okay
150 "F:\diag-info.txt" file ready to receive in IMAGE / Binary mode
226 Transfer finished successfully.
FTP: 148848 byte(s) sent in 0.280 second(s) 531.60Kbyte(s)/sec.

[ftp]lcd flashVX:/logfile
The current local directory is flashVX:/logfile.

[ftp]mput *
Error: The file name . is invalid.
Error: The file name .. is invalid.
200 PORT command okay
150 "F:\log.dblg" file ready to receive in IMAGE / Binary mode
226 Transfer finished successfully.
FTP: 1513938 byte(s) sent in 1.160 second(s) 1305.11Kbyte(s)/sec.
200 PORT command okay
150 "F:\log.log" file ready to receive in IMAGE / Binary mode
226 Transfer finished successfully.
FTP: 2689148 byte(s) sent in 1.940 second(s) 1386.15Kbyte(s)/sec.

[ftp]quit

6. View the log file in the FTP directory on the PC.

----End
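
After the transfer, the files received on the PC can be compared with the sizes reported by the dir command. The following Python sketch parses a captured dir listing into a name-to-size mapping and reports differences. The function names and the regular expression are assumptions based only on the listing format shown above. Files that are still being written (for example, log.log) may legitimately differ in size between the listing and the transfer.

```python
import re

# One file entry, for example:
# 3 -rw- 148,848 Sep 08 2015 11:22:40 diag-info.txt
DIR_ENTRY = re.compile(
    r"^\s*\d+\s+\S{4}\s+([\d,]+)\s+\w{3}\s+\d{2}\s+\d{4}"
    r"\s+\d{2}:\d{2}:\d{2}\s+(\S+)\s*$"
)


def parse_dir_listing(text):
    """Return {file name: size in bytes} for the file entries in a dir listing."""
    sizes = {}
    for line in text.splitlines():
        match = DIR_ENTRY.match(line)
        if match:
            sizes[match.group(2)] = int(match.group(1).replace(",", ""))
    return sizes


def size_mismatches(expected, received):
    """Return the sorted names that are missing or differ in size on the PC."""
    return sorted(
        name for name, size in expected.items() if received.get(name) != size
    )


listing = """Directory of flash:/
Idx Attr Size(Byte) Date Time(LMT) FileName
0 -rw- 1,075 Apr 01 2000 23:55:17 private-data.txt
3 -rw- 148,848 Sep 08 2015 11:22:40 diag-info.txt
16,129 KB total (15,976 KB free)"""

expected = parse_dir_listing(listing)
print(expected)
# Hypothetical sizes found in the FTP directory on the PC.
print(size_mismatches(expected, {"diag-info.txt": 148848}))
```

The received mapping could be built on the PC with os.path.getsize for each file in the FTP directory.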

4.4.2.4 Using the V8 Switch Module CLI to Collect Ethernet Switching Plane Information

Operation Scenario
Use the CLI of an E9000 switch module to collect the following information about
the V8 platform:

● Logs
● Debugging information
● Trap information

For details about how to query the Ethernet switching plane version, see 4.4.1.2
Querying the Software Version of the Ethernet Switching Plane.


Prerequisites
Conditions
● WFTPD 4.2.4.610 or later has been installed on the PC.
● You have logged in to the Ethernet switching plane CLI. For details, see 8.15
Logging In to a Server Over a Network Port by Using PuTTY or 8.17
Logging In to a Compute Node, Passthrough Module, or Switch Module
by Using the SOL Function of the MM910.
Data
Table 4-6 describes the required parameters.

Table 4-6 Data list

Parameter          Example Value

IP address         CX911, CX311, and CX915: 192.168.1.100
                   CX912, CX210, and CX220: 10.77.77.77

Subnet mask        255.255.255.0

Default gateway    0.0.0.0

The default username of the switching plane is root, and the default password is
Huawei12#$.

NOTE

You can query and set IP addresses of all modules. For details, see 8.11 Logging In to the
MM910 WebUI.
● For the MM910 versions earlier than (U54) 2.20, choose System Management >
Network Management > xx > IP addresses.
● For the MM910 (U54) 2.20 or later, choose Chassis Settings > Network Settings > xx.

Software Tools
wftpd32.exe: used to transfer files between different platforms, for example, from
a PC to a switch module. This tool is third-party software. You need to obtain it
yourself.

Procedure
Step 1 Configure the FTP server.
For details, see 8.20 Configuring an FTP Server.
Step 2 After logging in through the serial port or SOL function, run the following
commands on the Ethernet switching plane CLI to check whether the
management network port IP address has been configured.
NOTE

Skip this step if you log in to the switch module by using a network port.


<HUAWEI>system-view
[~HUAWEI]interface MEth 0/0/0
[~HUAWEI-MEth0/0/0]display this
● If the command output is as follows with no IP address displayed, go to Step 3.
#
interface MEth0/0/0
#
return
● If the command output contains an IP address and gateway address, go to
Step 4.
#
interface MEth0/0/0
ip address 192.168.100.123 255.255.255.0
#
return
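
The branch in Step 2 depends only on whether the display this output contains an ip address line. The following Python sketch shows that check on captured output. The function names management_ip and next_step are hypothetical, and the pattern assumes the "ip address <address> <mask>" line format shown in the sample output above.

```python
import re

# Matches a configured address line, for example:
#  ip address 192.168.100.123 255.255.255.0
IP_LINE = re.compile(r"^\s*ip address\s+(\S+)\s+(\S+)\s*$", re.MULTILINE)


def management_ip(display_this_output):
    """Return (address, mask) if an IP address is configured, else None."""
    match = IP_LINE.search(display_this_output)
    return (match.group(1), match.group(2)) if match else None


def next_step(display_this_output):
    """Return the step to perform next in this procedure."""
    if management_ip(display_this_output) is None:
        return "Step 3"  # no IP address yet: configure one
    return "Step 4"      # IP address present: collect the logs


unconfigured = """#
interface MEth0/0/0
#
return"""

configured = """#
interface MEth0/0/0
 ip address 192.168.100.123 255.255.255.0
#
return"""

print(next_step(unconfigured))
print(next_step(configured), management_ip(configured))
```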

Step 3 (Optional) After logging in to the switch module by using a serial port or the SOL
function, run the following commands on the Ethernet switching plane CLI to
query and set the IP address of the management network port so that the switch
module can properly communicate with the FTP server:
NOTE

Skip this step if you log in to the switch module by using a network port.

<HUAWEI>system-view
[~HUAWEI]interface MEth 0/0/0
[~HUAWEI-MEth0/0/0]ip address 192.168.100.123 24
[~HUAWEI-MEth0/0/0]commit
[~HUAWEI-MEth0/0/0]display this
#
interface MEth0/0/0
ip address 192.168.100.123 255.255.255.0
#
return

[~HUAWEI-MEth0/0/0]quit
[~HUAWEI]quit
Step 4 Obtain the log information.
1. Collect diagnostic information, save the log file, and view the log file system.
<HUAWEI>system-view
Enter system view, return user view with return command.
[~HUAWEI]diagnose
Warning: Enter diagnose view, return user view by pressing Ctrl+Z.
Info: The diagnose view is used to debug system hardware and software. Misuse of some commands
in this view will affect system performance. Therefore, use these commands with the guidance of
Huawei engineers.
[~HUAWEI-diagnose]collect diagnostic information
Info: Succeeded in collecting diagnostic information in slot 3.
[~HUAWEI-diagnose]display diagnostic-information diag-info.txt


Now saving the diagnostic information to the device..........................................................................................................................................................
Info: The diagnostic information was saved to the device successfully.
[~HUAWEI-diagnose]return
<HUAWEI>save logfile
Info: Save logfile successfully.
<HUAWEI>dir
Directory of flash:/

Idx Attr Size(Byte) Date Time FileName


0 drwx - Apr 07 2014 22:32:50 $_checkpoint
1 dr-x - Feb 21 2014 15:03:54 $_security_info
2 -rw- 117,788,305 Jan 01 1970 00:03:53 xxx.cc
3 -rw- 117,784,209 Feb 21 2014 14:47:03 xxx.cc
4 -rw- 76,227,537 Feb 21 2014 14:41:45 xxx.cc
5 drwx - Jan 01 1970 00:00:19 POST
6 -rw- 10,568 Feb 21 2014 18:20:01 cfg_from_smm
7 -rw- 6,575 Mar 22 2014 04:14:27 cfg_local
8 -rw- 19,435 Mar 22 2014 04:14:24 device.sys
9 -rw- 1,130,184 Apr 08 2014 16:22:11 diag-info.txt
10 drwx - Apr 08 2014 16:18:55 logfile
11 -rw- 1,838 Mar 22 2014 04:14:24 vrpcfg.zip

1,048,576 KB total (367,972 KB free)


<HUAWEI>dir logfile/
Directory of flash:/logfile/

Idx Attr Size(Byte) Date Time FileName


0 -rw- 7,971,326 Apr 08 2014 16:35:00 diag.log
1 -rw- 444,920 Feb 21 2014 18:23:11 diaglog_3_20140221182310.log.zip
2 -rw- 1,756,870 Apr 08 2014 16:18:55 diagnostic_information.zip
3 -rw- 4,269,737 Apr 08 2014 16:45:08 log.log
4 -rw- 354,428 Dec 22 2013 11:32:34 log_3_20131222113233.log.zip
5 -rw- 353,715 Jan 16 2014 08:50:19 log_3_20140116085018.log.zip

1,048,576 KB total (367,972 KB free)


2. Query stack information.
Record the queried slot numbers and roles of the stacked switch modules.
<HUAWEI>display stack
--------------------------------------------------------------------------------
MemberID Role MAC Priority Device Type Bay/Chassis

--------------------------------------------------------------------------------
2 Standby dcd2-fcf8-5600 100 CX910 2X/300

3 Master dcd2-fcf8-55c0 100 CX910 3X/300

--------------------------------------------------------------------------------
Role specifies the switch module role. The value can be Master, Standby, or
Slave, indicating the primary switch module, standby switch module, and
slave switch module respectively. Bay in Bay/Chassis indicates the switch
module slot number.
3. Obtain the log file.
<HUAWEI>ftp 192.168.100.122
Trying 192.168.100.122 ...
Press CTRL+K to abort
Connected to 192.168.100.122.
220 WFTPD 2.0 service (by Texas Imperial Software) ready for new user
User(192.168.100.122:(none)):huawei
331 Give me your password, please


Enter password:
230 Logged in successfully

[ftp]binary
200 Type is Image (Binary)

# On the FTP server, create a log receiving directory for the master switch
module in the stack. In this example, the number 3 in swi3 indicates the stack
ID (same as the slot number) of the master switch module. (If the switch
modules are not stacked, create a log receiving directory for the current
switch module. The number 3 in swi3 indicates the slot number of the current
switch module.)
[ftp]mkdir swi3
[ftp]cd swi3
[ftp]put flash:/diag-info.txt
200 Port command successful.
150 Opening data connection for diag-info.txt.
/ 100% [***********]
226 File received ok

FTP: 1756870 byte(s) send in 0.308 second(s) 5570.431Kbyte(s)/sec.

[ftp]mput flash:/logfile/*
200 Port command successful.
150 Opening data connection for diag.log.
/ 100% [***********]
226 File received ok

FTP: 7971326 byte(s) send in 0.798 second(s) 9755.010Kbyte(s)/sec.


200 Port command successful.
150 Opening data connection for diaglog_3_20140221182310.log.zip.
/ 100% [***********]
226 File received ok

FTP: 444920 byte(s) send in 0.113 second(s) 3845.061Kbyte(s)/sec.


200 Port command successful.
150 Opening data connection for diagnostic_information.zip.
/ 100% [***********]
226 File received ok

FTP: 1756870 byte(s) send in 0.308 second(s) 5570.431Kbyte(s)/sec.


200 Port command successful.
150 Opening data connection for log.log.
/ 100% [***********]
226 File received ok

FTP: 4272491 byte(s) send in 3.492 second(s) 1194.832Kbyte(s)/sec.


200 Port command successful.
150 Opening data connection for log_3_20131222113233.log.zip.
/ 100% [***********]
226 File received ok

FTP: 354428 byte(s) send in 0.238 second(s) 1454.289Kbyte(s)/sec.


200 Port command successful.
150 Opening data connection for log_3_20140116085018.log.zip.
/ 100% [***********]
226 File received ok

FTP: 353715 byte(s) send in 0.265 second(s) 1303.486Kbyte(s)/sec.

[ftp]cd ..
# On the FTP server, create a log receiving directory for the standby or slave
switch module in the stack. In this example, the number 2 in swi2 indicates
the stack ID (same as the slot number) of the standby switch module. (If the
switch modules are not stacked, log in to each switch module and repeat the
preceding log collection procedure.)
[ftp]mkdir swi2
[ftp]cd swi2
[ftp]mput 2#flash:/logfile/*
[ftp]cd ..
[ftp]quit
221 Windows FTP Server (WFTPD, by Texas Imperial Software) says goodbye
<HUAWEI>

NOTE

– When you use the mput command in the FTP CLI, 2#flash:/ indicates the flash
root directory of the switch module with the stack ID 2. You can obtain the stack
ID and role information by using the display stack command.
– The flash root directory of the master switch module in a stack is flash:/.
– If multiple switch modules are displayed after running the display stack
command, obtain the log file of each switch module in the logfile directory.
4. View the log file in the FTP directory on the PC.

----End
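
The per-member FTP commands in this procedure follow a pattern that can be derived from the display stack output. The following Python sketch parses a captured display stack table and lists the FTP commands for each stack member, using flash:/ for the master and N#flash:/ for the other members, as described in the NOTE above. The function names and the regular expression are assumptions based only on the sample output in this section. The sketch only builds command strings and does not connect to any device.

```python
import re

# One member row, for example:
# 2 Standby dcd2-fcf8-5600 100 CX910 2X/300
MEMBER_ROW = re.compile(
    r"^\s*(\d+)\s+(Master|Standby|Slave)\s+(\S+)\s+(\d+)\s+(\S+)\s+(\S+)/(\S+)\s*$"
)


def parse_stack(text):
    """Return a list of stack members parsed from display stack output."""
    members = []
    for line in text.splitlines():
        match = MEMBER_ROW.match(line)
        if match:
            members.append({
                "member_id": int(match.group(1)),
                "role": match.group(2),
                "bay": match.group(6),
            })
    return members


def ftp_commands(members):
    """Return the FTP CLI commands for each member, master first."""
    commands = []
    for member in sorted(members, key=lambda m: m["role"] != "Master"):
        stack_id = member["member_id"]
        commands += [f"mkdir swi{stack_id}", f"cd swi{stack_id}"]
        if member["role"] == "Master":
            # The master's flash root directory is flash:/.
            commands += ["put flash:/diag-info.txt", "mput flash:/logfile/*"]
        else:
            # Other members are addressed by stack ID, for example 2#flash:/.
            commands.append(f"mput {stack_id}#flash:/logfile/*")
        commands.append("cd ..")
    return commands


stack_output = """MemberID Role MAC Priority Device Type Bay/Chassis
2 Standby dcd2-fcf8-5600 100 CX910 2X/300
3 Master dcd2-fcf8-55c0 100 CX910 3X/300"""

for command in ftp_commands(parse_stack(stack_output)):
    print(command)
```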

4.4.2.5 Using the Web Tools Page of a Switch Module to Collect FC Switching Plane Information (MX510)

Operation Scenario
Use the Web Tools page of a switch module (MX510) to collect information about the
FC switching plane.
This section applies to the CX311, CX911, and CX915.

Prerequisites
Conditions
● The connection between the management IP address of the FC switch module
and the server IP address is normal.
● You have logged in to the Ethernet switching plane Web Tools page. For
details, see 8.10 Logging In to the Web Tools of the MX510.
Data

Table 4-7 Data list


Parameter Example Value

IP address 192.168.1.100

Subnet mask 255.255.255.0

Default gateway 0.0.0.0


For exporting the dump_support log file, the username is images, and the default
password is Huawei12#$.

Procedure
Step 1 On Web Tools, choose Switch > Download Support File, as shown in Figure 4-2.

Figure 4-2 Web Tools home page

Step 2 Select the directory for storing the log file, and click Start.
The log file download starts. If "Support file saved" is displayed in the Status area,
the log file has been exported successfully. See Figure 4-3.

Figure 4-3 Download Support File dialog box


----End

4.4.2.6 Using the Switch Module CLI to Collect FC Switching Plane Information (MX510)

Operation Scenario
Use the CLI of a switch module (MX510) to collect FC switching plane information.
This section applies to the CX311, CX911, and CX915.

Prerequisites
Conditions
● The PC has been connected to the management network port of the server by
using a network cable.
● The mini-sftp-server.exe software has been obtained.
NOTE

If the MX510 firmware version is earlier than 9.8.2.6.0, you can use the FTP tool WFTPD to
collect information. For details, see 8.20 Configuring an FTP Server.
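The firmware check in the preceding NOTE can be sketched in Python as follows. The sketch assumes that version strings such as the ActiveSWVersion value reported by show version are compared field by field as integers. The function names are hypothetical.

```python
def parse_version(text):
    """Convert 'V9.8.0.10.0' or '9.8.2.6.0' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().lstrip("Vv").split("."))


def ftp_collection_allowed(active_sw_version, threshold="9.8.2.6.0"):
    """Return True if the firmware is earlier than the threshold in the NOTE."""
    return parse_version(active_sw_version) < parse_version(threshold)


if __name__ == "__main__":
    print(ftp_collection_allowed("V9.8.0.10.0"))
```

Comparing integer tuples avoids the pitfall of plain string comparison, which would rank the field "10" before "2".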

Data

Table 4-8 Data list


Parameter Example Value

IP address 192.168.1.100

Subnet mask 255.255.255.0

Default gateway 0.0.0.0

The default username of the switching plane is admin, and the default password
is Huawei12#$.
Software Tools
mini-sftp-server.exe: used to transfer files between different platforms, for
example, from a switch module to a PC. This tool is third-party software that
you must obtain yourself.

Procedure
Step 1 Configure an SFTP server.
For details, see 8.21 Using SFTP to Transfer Files.
Step 2 Log in to the MX510.
For details about how to access the FC switching plane CLI, see 8.15 Logging In
to a Server Over a Network Port by Using PuTTY or 8.17 Logging In to a
Compute Node, Passthrough Module, or Switch Module by Using the SOL
Function of the MM910.

Step 3 Query and set the management IP address.


1. Run the following command on the CLI of the FC switching plane to query the
management IP address:
FCoE_GW: admin> show version
*****************************************************
* *
* Command Line Interface SHell (CLISH) *
* *
*****************************************************

SystemDescription Huawei FCoE-FC Gateway module


HostName <undefined>
Eth1IPv4NetworkAddr 192.168.96.96
Eth1IPv6NetworkAddr fe80::2c0:ddff:fe24:21fe
MAC1Address 00:c0:dd:24:21:fe
WorldWideName 10:00:00:c0:dd:24:21:fd
SerialNumber 2198080446DQCB46F882
SymbolicName FCoE_GW
ActiveSWVersion V9.8.0.10.0
ActiveTimestamp Fri May 17 21:19:51 2013
POSTStatus Passed
SwitchMode Full Fabric

Eth1IPv4NetworkAddr indicates the management IP address.


2. Set the management IP address so that the switch module can properly
communicate with the configured SFTP server.
FCoE_GW: admin> admin start
FCoE_GW (admin): admin> set setup system ipv4
Set a static IPv4 address as prompted.
EthIPv4NetworkDiscovery (1=Static, 2=Bootp, 3=Dhcp, 4=Rarp) :1
EthIPv4NetworkAddress (dot-notated IP Address) : 192.168.101.123
EthIPv4NetworkMask (dot-notated IP Address) : 255.255.255.0
EthIPv4GatewayAddress (dot-notated IPv4 Address) : 192.168.101.254
Do you want to save and activate this system setup? (y/n): [n] y

3. Query the static IPv4 address of the FCoE gateway.


FCoE_GW (admin): admin> show setup system ipv4
The following information is displayed:
System Information
------------------
Eth1IPv4NetworkDiscovery Static
Eth1IPv4NetworkAddress 192.168.101.123
Eth1IPv4NetworkMask 255.255.255.0
Eth1IPv4GatewayAddress 192.168.101.254
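Before starting log collection, it can help to confirm on the PC that the configured addresses allow the switch module to reach the SFTP server. The following Python sketch uses the standard ipaddress module and the example values from this procedure (switch module 192.168.101.123/255.255.255.0, gateway 192.168.101.254, PC 192.168.100.214). The function names are hypothetical, and the check covers addressing only, not actual connectivity.

```python
import ipaddress


def switch_subnet(switch_ip, netmask):
    """Return the IPv4 network that the switch module address belongs to."""
    return ipaddress.ip_network(f"{switch_ip}/{netmask}", strict=False)


def needs_gateway(switch_ip, netmask, server_ip):
    """True if the SFTP server is outside the switch module's own subnet."""
    return ipaddress.ip_address(server_ip) not in switch_subnet(switch_ip, netmask)


def gateway_is_on_link(switch_ip, netmask, gateway_ip):
    """True if the gateway address lies inside the switch module's subnet."""
    return ipaddress.ip_address(gateway_ip) in switch_subnet(switch_ip, netmask)


if __name__ == "__main__":
    ip, mask = "192.168.101.123", "255.255.255.0"
    print("gateway required:", needs_gateway(ip, mask, "192.168.100.214"))
    print("gateway on link:", gateway_is_on_link(ip, mask, "192.168.101.254"))
```

With the example values, the PC is in a different /24 subnet, so traffic to the SFTP server depends on the configured gateway.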

Step 4 Obtain the log information.


1. Run the following command to collect logs and save the logs to the local PC:
FCoE_GW: admin> create support
2. Set the log collection parameters as prompted and start log collection. See
Figure 4-4.
The key information is described as follows:
– In this example, 192.168.100.214 is the PC IP address.
– The username and password for the SFTP server are both vxworks.


– If you press Enter when the CLI prompts you to specify the directory for
storing the dump file, the dump file is automatically downloaded to the
default directory on the SFTP server.

Figure 4-4 Collecting information

----End

4.4.2.7 Using the Switch Module CLI to Collect FC Switching Plane Information (MX210/MX220)

Operation Scenario
Use the CLI of a switch module (MX210/MX220) to collect FC switching plane
information.

This section applies to the CX210, CX220, CX912, and CX916. The FC switching
planes of the CX210 and CX912 are the MX210, and those of the CX220 and
CX916 are the MX220.

Prerequisites
Conditions

● The PC has been connected to the management network port of the server by
using a network cable.
● The mini-sftp-server.exe software has been obtained.

Data

Table 4-9 Data list

Parameter Example Value

IP address 10.77.77.77

Subnet mask 255.255.255.0

Default gateway 0.0.0.0


The default username of the switching plane is admin, and the default password
is Huawei12#$.

Software Tools

mini-sftp-server.exe: used to transfer files between different platforms, for
example, from a switch module to a PC. This tool is third-party software that
you must obtain yourself.

Procedure
Step 1 Configure an SFTP server.

For details, see 8.21 Using SFTP to Transfer Files.

Step 2 Log in to the MX210 or MX220.

For details about how to access the FC switching plane CLI, see 8.15 Logging In
to a Server Over a Network Port by Using PuTTY or 8.17 Logging In to a
Compute Node, Passthrough Module, or Switch Module by Using the SOL
Function of the MM910.

Step 3 Run the ipaddrset command to set the management IP address and then run the
ipaddrshow command to check whether the IP address is correct.
● IPv4
FC_SW:admin> ipaddrset
Ethernet IP Address [10.77.77.77]:10.32.53.47
Ethernet Subnetmask [255.255.255.0]:255.255.240.0
Fibre Channel IP Addresss [none]:
Fibre Channel Subnetmask [none]:
Gateway IP Address [0.0.0.0]:10.32.48.1
DHCP [Off]:
IP address is being changed...Done.

FC_SW:admin> ipaddrshow
Ethernet IP Address: 10.32.53.47
Ethernet Subnetmask: 255.255.240.0
Fibre Channel IP Addresss: none
Fibre Channel Subnetmask: none
Gateway IP Address 10.32.48.1
DHCP: Off

● IPv6
FC_SW:admin> ipaddrset -ipv6 --add fd00:60:69bc:82:205:33ff:fed7:f6fe/64
IP address is being changed...Done.

FC_SW:admin> ipaddrshow
SWITCH
Ethernet IP Address: 10.20.24.55
Ethernet Subnetmask: 255.255.240.0
Gateway IP Address: 10.20.16.1
DHCP: Off
IPv6 Autoconfiguration Enabled: No
Local IPv6 Addresses:
static fd00:60:69bc:82:205:33ff:fed7:f6fe/64 preferred
IPv6 Gateways: fe80:21b:3dff:fe0b:7800 fe80:21b:edff:fe0b:2400


NOTE

The current environment uses IPv4 addresses. You do not need to set the IPv6 address.
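The IPv4 values reported by ipaddrshow can be cross-checked offline, for example to confirm that the gateway lies in the subnet defined by the address and mask. The following Python sketch parses a saved copy of the IPv4 output shown above. It is not a Fabric OS tool. The parser is a hypothetical helper that handles only the IPv4 form of the output and accepts field lines with or without a colon.

```python
import ipaddress

SAMPLE = """\
Ethernet IP Address: 10.32.53.47
Ethernet Subnetmask: 255.255.240.0
Gateway IP Address 10.32.48.1
DHCP: Off
"""


def parse_ipaddrshow(text):
    """Split each 'name: value' (or 'name value') line into a dictionary entry."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ":" in line:
            key, value = line.split(":", 1)
        else:
            key, _, value = line.rpartition(" ")
        fields[key.strip()] = value.strip()
    return fields


def gateway_in_subnet(fields):
    """Check that the gateway belongs to the network of the Ethernet address."""
    network = ipaddress.ip_network(
        f"{fields['Ethernet IP Address']}/{fields['Ethernet Subnetmask']}",
        strict=False,
    )
    return ipaddress.ip_address(fields["Gateway IP Address"]) in network


if __name__ == "__main__":
    print(gateway_in_subnet(parse_ipaddrshow(SAMPLE)))
```

For the example values, 10.32.53.47 with mask 255.255.240.0 belongs to network 10.32.48.0/20, which contains the gateway 10.32.48.1.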

Step 4 Run the supportsave command on the CLI to collect logs.


1. Run the following command to save logs to the SFTP server.
FC_SW:admin> supportsave
This command collects RASLOG, TRACE, supportShow, core file, FFDC data
and then transfer them to a FTP/SCP/SFTP server or a USB device.
This operation can take several minutes.
NOTE: supportSave will transfer existing trace dump file first, then
automatically generate and transfer latest one. There will be two trace dump
files transferred after this command.
OK to proceed? (yes, y, no, n): [no] y

2. Set the log collection parameters as prompted and start log collection.
– Host IP or Host Name: specifies the IP address or host name of the target
device that stores the logs (the SFTP server IP address).
– User Name: specifies the username for logging in to the target device
(the SFTP server username).
– Password: specifies the password for logging in to the target device (the
SFTP server password).
– Protocol: specifies the transfer protocol. Set this parameter to sftp.
– Remote Directory: specifies the directory for storing log files on the SFTP
server. Create the /support directory in the home directory of the SFTP
server, and set Remote Directory to /support.
(Optional) When "Do you want to continue with CRA (Y/N)" is displayed,
enter n to start collecting logs.

Figure 4-5 Collecting logs

If "SupportSave completed" is displayed, logs are successfully collected.

----End
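The /support directory required by Remote Directory can be prepared on the PC before supportsave is run, and its contents can be listed afterward. The following Python sketch assumes that the SFTP server home directory maps to a local folder on the PC. The folder path is a placeholder, and the function names are hypothetical.

```python
from pathlib import Path


def prepare_support_dir(sftp_home):
    """Create <sftp_home>/support if it does not exist and return its path."""
    support = Path(sftp_home) / "support"
    support.mkdir(parents=True, exist_ok=True)
    return support


def list_collected_files(support_dir):
    """Return the names of the files currently stored in the support directory."""
    return sorted(p.name for p in Path(support_dir).iterdir() if p.is_file())


if __name__ == "__main__":
    # Placeholder path: replace with the home directory of your SFTP server.
    target = prepare_support_dir("sftp_home")
    print(target, list_collected_files(target))
```

An empty list after "SupportSave completed" is displayed suggests that the files were stored in a different directory on the SFTP server.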


4.5 Collecting Switch Module Logs (for E9000+MM910/MM921)
Using SmartKit
For details about how to use SmartKit to collect logs for the E9000 switch module,
see "Collecting Server Logs" in the FusionServer Tools 2.0 SmartKit User Guide.

Using FusionDirector
● FusionDirector has been installed on the MM920 and can be used to collect
chassis information.
● After FusionDirector manages the chassis of the MM921, you can use
FusionDirector to collect information.

Step 1 Log in to the FusionDirector WebUI.


For details, see 8.12 Logging In to the FusionDirector WebUI.
Step 2 Choose Compute > Hardware > Add Device > Add Online to add the chassis of
the MM920/MM921.
For details, see the FusionDirector User Guide.
Step 3 Choose Compute > Hardware > Chassis.
The list of chassis managed by FusionDirector is displayed.
Step 4 Click the chassis name.
The Overview tab page is displayed, as shown in Figure 4-6.

Figure 4-6 Overview tab page of the chassis

Step 5 Click Export Log in the Control Panel area.


The list of installed devices is displayed.


Step 6 Select the switch modules whose logs you want to export and click OK.
After the task is complete, decompress the downloaded package to obtain switch
module logs.

----End

4.6 Collecting QLogic HBA Logs


Collect HBA logs when an NIC is faulty.
Collecting QLogic HBA logs has no adverse impact on services. The following
describes how to collect QLogic HBA logs on mainstream OSs. You can download
the scripts from the official QLogic support website.

Table 4-10 Collecting QLogic HBA logs on mainstream OSs


OS | Information to Be Collected
Windows | Collect system information to help technical support personnel diagnose and rectify faults.
Linux | Collect information to help diagnose Fibre Channel (FC) and iSCSI HBA faults.
Solaris | Collect data.
VMware | Collect VMware system logs, including QLogic HBA logs.

4.7 Collecting Other Logs


Use the following methods to collect other host logs:
● Collect Emulex HBA logs when an NIC is faulty. Use the official tool
OneCapture to collect Emulex HBA logs. This tool may affect services.
● For details about how to collect screen recording information, see "Video
Play" in the Huawei Server iMana 200 User Guide or iBMC User Guide.


5 Diagnosing and Rectifying Faults

5.1 Fault Diagnosis Rules


5.2 Using Tools to Diagnose Faults
5.3 Handling Alarms
5.4 Using Error Codes to Locate Faults
5.5 Checking Indicators to Locate Faults
5.6 Handling Faults Based on Symptoms

5.1 Fault Diagnosis Rules

NOTICE

● Obtain the customer's written authorization before performing any operation.


● Before performing any operation, ensure that service data will not be lost or
has been backed up.

Observe the following fault diagnosis rules:


● Check the external components and then the internal components.
During troubleshooting, check external devices for faults (such as a power
failure and peer device failure) first.
● Check the network and then network elements (NEs).
According to the network topology, check whether the network environment
is normal and then check whether the NEs are normal. Determine which NE is
faulty if possible.
● Check the high-speed signal alarms and then the low-speed signal alarms.
Alarm signal streams show that high-speed signal alarms often cause low-
speed signal alarms. Therefore, clear high-speed signal alarms first.
● Analyze alarms of high severity and then those of low severity.
Analyze critical or major alarms first, and then analyze minor alarms.
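The severity rule can be applied to an exported alarm list with a simple sort. The following Python sketch assumes that each alarm is a dictionary with a severity field whose value is critical, major, or minor. The field names and sample alarms are hypothetical, and ranking critical ahead of major is an assumption, because the rule groups the two together.

```python
# Lower rank means the alarm is analyzed earlier.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}


def triage_alarms(alarms):
    """Return alarms ordered by severity, keeping the original order within a level."""
    return sorted(alarms, key=lambda alarm: SEVERITY_RANK[alarm["severity"].lower()])


if __name__ == "__main__":
    sample = [
        {"id": "A1", "severity": "Minor"},
        {"id": "A2", "severity": "Critical"},
        {"id": "A3", "severity": "Major"},
    ]
    for alarm in triage_alarms(sample):
        print(alarm["id"], alarm["severity"])
```

Because sorted() is stable, alarms of the same severity keep their original order, for example the order in which they were generated.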


5.2 Using Tools to Diagnose Faults

NOTICE

FusionServer Tools Toolkit, Smart Provisioning, and FusionServer Tools 2.0
SmartKit can be used only after services on the server are stopped. Notify the
customer to back up data before using the tools.

Table 5-1 Diagnosis tools


Scenario | Tool | Function | Document Link
Single-node scenario | FusionServer Tools Toolkit | Hardware information collection; quick diagnosis; tests for CPUs, drives, and DIMMs; reference tools and scripts for common configuration and deployment; creation of a bootable USB flash drive for easy O&M; automatic configuration diagnosis for channel partners | For details about the supported server models and the methods of using the tool, see the FusionServer Tools 2.0 Toolkit User Guide.
Single-node scenario | Smart Provisioning | OS installation; RAID configuration; firmware upgrade; configuration import and export; hardware diagnosis; log collection | For details about the supported server models and the methods of using the tool, see the Smart Provisioning User Guide.
Batch scenario | FusionServer Tools 2.0 SmartKit | Hardware configuration; health check; configuration check; server log collection; server burn-in test; device repair; firmware upgrade | For details about the supported server models and the methods of using the tool, see the FusionServer Tools 2.0 SmartKit User Guide.

5.3 Handling Alarms


This section describes how to use the server management system to handle
alarms. Search for alarm codes in the alarm handling manual to find the handling
methods. See Table 5-2 to obtain the server alarm handling manual.


Table 5-2 Methods for handling alarms


Server Type | Reference
E9000 | See the FusionServer Pro E9000 Server V100R001 HMM Alarm Handling. To check switch module alarms, run the following commands on the Ethernet switching plane: display trapbuffer, display alarm active, and display alarm history. NOTE: For details about how to log in to the Ethernet switching plane of a switch module, see 8.15 Logging In to a Server Over a Network Port by Using PuTTY, 8.16 Logging In to a Server Over a Serial Port by Using PuTTY, or 8.17 Logging In to a Compute Node, Passthrough Module, or Switch Module by Using the SOL Function of the MM910.
E6000 | See the E6000 Server V100R002 Alarm Reference.
V2 rack servers | See the Huawei Rack Server Alarm Handling (iMana 200).
V2/V3/V5 rack servers | See the FusionServer Pro Rack Server iBMC Alarm Handling.
X6000 | See the FusionServer Pro X6000 Server iBMC (Earlier than V250) Alarm Handling or X6000 Server Alarm Handling (iMana 200).
X8000 | See the X8000 Server V100R001 Alarm Reference.
X6800 | See the FusionServer Pro X6800 Server iBMC (Earlier than V250) Alarm Handling.
G2500 | See the FusionServer Pro G2500 Server 1.0.0 iBMC Alarm Handling.
FusionServer Pro G5500 | See the FusionServer Pro G5500 Server iBMC Alarm Handling.
Atlas 800 AI inference server (model 3010) | See the Atlas 800 AI Inference Server iBMC Alarm Handling (Model 3010).
Atlas 800 AI training server (model 9010) | See the Atlas 800 AI Training Server iBMC (V3.01.00.00 or Later) Alarm Handling (Model 9010).

5.4 Using Error Codes to Locate Faults


The following servers support the fault diagnosis LED: RH1288 V3, RH2288 V3,
RH2288H V3, RH5885 V3, 5288 V3, 1288H V5, 2288 V5, 2288H V5, 2488 V5,
2488H V5, 5885H V5, 1288X V5, 2288X V5, 2288C V5, Atlas 800 AI inference
server (model 3010), and Atlas 800 AI training server (model 9010). Table 5-3
describes the status and meanings of the fault diagnosis LED. For details about the
position of the fault diagnosis LED on each server, see the server user guide.
Figure 5-1 shows the position of the fault diagnosis LED on an RH1288 V3 server.
For details about how to rectify the fault, see the corresponding alarm handling
manual.

Table 5-3 Error codes

Module Name | Displayed Information | Meaning | Diagnosis Procedure
Fault diagnosis LED | --- | The server is operating properly. | N/A
Fault diagnosis LED | Error code | A component fault has occurred. | For details about error codes, see "Error Code Handling" in the alarm handling guide of the corresponding server. For details about alarm handling guides, see Table 5-2.

Figure 5-1 Position of the fault diagnosis LED

5.5 Checking Indicators to Locate Faults


For details about the positions of indicators, see the sections about the front and
rear panels in the user guide of the specific server. To obtain the user guide of
each server, perform the following steps:

1. Visit Support > Intelligent Servers or Support > Ascend Computing.


2. Choose a server model to access the product page.


3. On the Documentation tab page, choose Operation & Maintenance > User
Guide.
4. View the required user guide.

Process
Figure 5-2 shows the process for checking the indicators.

Figure 5-2 Process for checking the indicators

Indicators Available on All Servers


Step 1 Observe the status indicators of the servers.

Table 5-4 Status indicators

Indicator | Status | Meaning | Diagnosis
Health status indicator | Steady green | The server is operating properly. | N/A
Health status indicator | Blinking red | A fault alarm is generated. | 1. Log in to the iMana 200 or iBMC WebUI to view the alarm information. For details, see "Basic Operations" in the Huawei Server iMana 200 User Guide, or "iBMC WebUI > Alarm & SEL" in the corresponding iBMC User Guide. 2. (Optional) View the error code on the fault diagnosis LED on the front panel. For details, see 5.4 Using Error Codes to Locate Faults.
Health status indicator | Off | The device is powered off or is faulty. | Same procedure as for blinking red.
Power button/indicator | Steady green | The server is powered on. | N/A
Power button/indicator | Blinking yellow | The iMana 200 or iBMC management system is being started. In this case, you cannot power on or off the server by pressing the power button. | N/A
Power button/indicator | Steady yellow | The server is ready to power on. | Press the power button to power on the server. If the server cannot be powered on, log in to the iMana 200 or iBMC WebUI and view the alarm to rectify the fault.
Power button/indicator | Off | The server is not connected to a power source. | 1. If the iMana 200 or iBMC WebUI is accessible, log in and check for any alarms. 2. For the E9000 server, if the iMana 200 or iBMC WebUI is inaccessible, check whether the PSU indicator and management module indicator on the rear of the chassis are steady green. If yes, the chassis power supply is normal. If no, check the external power supply. 3. For the E9000 server, if the power supply is normal and the PSUs are normal, contact Huawei technical support. Do not reseat the compute nodes or power on or off the chassis.
UID button/indicator | Steady blue | The server is being located. | See the NOTE after this table.
UID button/indicator | Blinking blue | The server is distinguished from multiple servers that have also been located. | See the NOTE after this table.
UID button/indicator | Off | The server is not being located or is not powered on. | See the NOTE after this table.

NOTE
● The UID button/indicator helps identify and locate a server. You can turn on or
off the UID indicator by manually pressing the UID button, running a command
on the iBMC CLI, or remotely controlling the UID indicator on the iMana 200 or
iBMC WebUI.
● You can hold down the UID button for 4 to 6 seconds to reset the iMana 200 or
iBMC.

Step 2 View iMana 200 or iBMC system event logs (SELs) to locate faults.
Step 3 Check the status indicators of the components.
● Table 5-5, Table 5-6, Table 5-7, Table 5-8, Table 5-9, and Table 5-10
describe the meanings of the SAS/SATA drive status indicator, NVMe drive
status indicator, M.2 FRU indicator, PSU status indicator, network port
indicator, and FlexIO card status indicator, and the corresponding handling
procedures.
● Table 5-11 describes the meanings of the indicators for each module of the
RH5885 V2, RH5885 V3, and RH5885H V3, and the corresponding handling
procedures.
● Table 5-12 describes the meanings of the indicators for each module of the
RH8100 and X6800, and the corresponding handling procedures.
● Table 5-13 describes the meanings of the aggregation network port
indicators on the X6000, X6800, and X6800 V5, and the corresponding
handling procedures.
● Table 5-14, Table 5-15, and Table 5-16 describe the meanings of the MM910
management module indicator, E9000 fan module indicator, and E9000 switch
module indicator, and the corresponding handling procedures.
● Table 5-17 and Table 5-18 describe the meanings of the fan module
indicator and network port indicator on the Atlas 800 AI training server (model
9010), and the corresponding handling procedures.


Table 5-5 SAS/SATA drive status indicators


Drive Active Indicator | Drive Fault Indicator | Meaning | Diagnosis
Steady green | Off | The drive is operating properly. | N/A
Blinking green | Off | Data is being read from or written to the drive. | N/A
Steady green or blinking green | Blinking yellow | The server is locating the drive or rebuilding RAID. | N/A
Steady green, blinking green, or off | Steady yellow | The drive is faulty. | Log in to the iMana 200 or iBMC and use FusionServer Tools Toolkit or Smart Provisioning to check the drive faults.
Off | Off | The drive is faulty or not detected. | Check whether the drive is connected, or log in to the iMana 200 or iBMC and use FusionServer Tools Toolkit or Smart Provisioning to check the drive faults.
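The indicator combinations in Table 5-5 can be expressed as a lookup, which is useful when indicator states are recorded during an inspection. The following Python sketch is an illustration only. The function name is hypothetical, and the pairing of active and fault states assumes that cells left empty in the table repeat the value of the row above.

```python
# (drive active indicator, drive fault indicator) -> meaning, per Table 5-5.
DRIVE_STATES = {
    ("steady green", "off"): "The drive is operating properly.",
    ("blinking green", "off"): "Data is being read from or written to the drive.",
    ("steady green", "blinking yellow"): "The server is locating the drive or rebuilding RAID.",
    ("blinking green", "blinking yellow"): "The server is locating the drive or rebuilding RAID.",
    ("off", "off"): "The drive is faulty or not detected.",
}


def interpret_drive_indicators(active, fault):
    """Return the Table 5-5 meaning for a pair of indicator states."""
    active, fault = active.strip().lower(), fault.strip().lower()
    # A steady yellow fault indicator means a faulty drive for any active state.
    if fault == "steady yellow":
        return "The drive is faulty."
    return DRIVE_STATES.get((active, fault), "Combination not listed in Table 5-5.")


if __name__ == "__main__":
    print(interpret_drive_indicators("Blinking green", "Off"))
```

Combinations that the table does not list return a neutral message rather than a guessed meaning.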

Table 5-6 NVMe drive indicators


Drive Active Indicator | Drive Fault Indicator | Meaning | Diagnosis
Steady green | Off | The NVMe drive is detected and working properly. | N/A
Blinking green at 2 Hz | Off | Data is being read from or written to the NVMe drive. | N/A
Off | Off | The NVMe drive is not detected. | N/A
Steady green or off | Steady yellow | The NVMe drive is faulty. | Reseat the NVMe drive. If the problem persists, replace the NVMe drive.
Off | Blinking yellow at 2 Hz | The NVMe drive is being located by the OS or hot-swapped. | N/A
Off | Blinking yellow at 0.5 Hz | The hot removal process is complete, and the NVMe drive is removable. | Remove the NVMe drive. NOTE: If the fault indicator is blinking yellow at 0.5 Hz after the NVMe drive is inserted, reseat the NVMe drive.

Table 5-7 M.2 FRU indicators


Indicator | Status | Meaning | Diagnosis Procedure
M.2 FRU fault indicator | Off | The M.2 FRU is running properly. | N/A
M.2 FRU fault indicator | Blinking yellow | The M.2 FRU is being located or the RAID is being reconstructed. | N/A
M.2 FRU fault indicator | Steady yellow | The M.2 FRU cannot be detected or is faulty. | 1. Check whether the M.2 FRU is in good contact. 2. If the contact is normal but the fault persists, replace the M.2 FRU.
M.2 FRU activity indicator | Off | The M.2 FRU is not in position or is faulty. | 1. Check whether the M.2 FRU is in good contact. 2. If the contact is normal but the fault persists, replace the M.2 FRU.
M.2 FRU activity indicator | Blinking green | The M.2 FRU is in the read/write or synchronization state. | N/A
M.2 FRU activity indicator | Steady green | The M.2 FRU is inactive. | N/A

Table 5-8 PSU status indicator


Indicator | Status | Meaning | Diagnosis
PSU operating status indicator (460 W/750 W/800 W/1200 W) | Steady green | The power supply is normal. | N/A
PSU operating status indicator (460 W/750 W/800 W/1200 W) | Off | The PSU has no power, or the system is on standby or abnormal. | Check whether the power cable is connected properly to the PSU and whether the PSU is normal.
PSU operating status indicator (2000 W/2500 W/3000 W) | Steady green | The PSU is operating properly. | N/A
PSU operating status indicator (2000 W/2500 W/3000 W) | Blinking green (once every 2 seconds) | The PSU is in hibernation or is not connected properly. | If the fault occurs in an E9000 server, check whether hibernation settings are enabled. If hibernation settings are disabled or the fault occurs in another type of server, check whether the PSU is connected properly.
PSU operating status indicator (2000 W/2500 W/3000 W) | Steady red | The PSU is not functioning correctly. | 1. Check whether the PSU is functioning correctly. 2. If the PSU is functioning correctly, check whether the external power supply to the PSU is functioning correctly.
PSU operating status indicator (2000 W/2500 W/3000 W) | Off | The PSU has no power input or is faulty. | Check whether the power cable is connected properly.
PSU operating status indicator (500 W/900 W/1500 W) | Steady green | The PSU is operating properly. | N/A
PSU operating status indicator (500 W/900 W/1500 W) | Blinking green (once every second) | The power supply is normal; the input is overvoltage or undervoltage. | NOTE: Do not reseat a PSU. Check whether the external power supply to the PSU is functioning correctly.
PSU operating status indicator (500 W/900 W/1500 W) | Blinking green (four times every second) | The PSU is being upgraded online. | N/A
PSU operating status indicator (500 W/900 W/1500 W) | Steady orange | The input is normal, but no power output is supplied due to overheat protection, overcurrent protection, short circuit protection, output overvoltage protection, or some component failures. | Reseat the PSU and check whether the fault is rectified. If the fault persists, replace the PSU.
PSU operating status indicator (500 W/900 W/1500 W) | Off | The PSU has no power input or is faulty. | 1. Check whether the PSU is faulty. 2. Check whether the external power supply is normal.

Table 5-9 Network port indicators


Indicator | Status | Meaning | Diagnosis Procedure
Connection status indicator/Data transmission status indicator for a 10GE optical port | Steady green | The network port is connected properly. | N/A
Connection status indicator/Data transmission status indicator for a 10GE optical port | Blinking green | Data is being transmitted. | N/A
Connection status indicator/Data transmission status indicator for a 10GE optical port | Off | The network port is not connected. | 1. Connect the port to another switch, optical fiber, and optical module to check whether the original switch and optical fiber are normal and whether the type and speed of the original optical module are correct. 2. Check whether the ports on the switch and NIC are up. 3. Check whether the NIC is operating properly.
Data transmission rate indicator for a 10GE optical port | Steady yellow | The data transmission rate is lower than 10 Gbit/s. | 1. Connect the port to another switch, optical fiber, and optical module to check whether the original switch and optical fiber are normal and whether the type and speed of the original optical module are correct. 2. Check whether the ports on the switch and NIC are up. 3. Check whether the NIC is operating properly.
Data transmission rate indicator for a 10GE optical port | Steady green | The data transmission rate is 10 Gbit/s. | N/A
Data transmission rate indicator for a 10GE optical port | Off | The network port is not connected. | 1. Connect the port to another switch, optical fiber, and optical module to check whether the original switch and optical fiber are normal and whether the type and speed of the original optical module are correct. 2. Check whether the ports on the switch and NIC are up. 3. Check whether the NIC is operating properly.
Connection status indicator/Data transmission status indicator for a 10GE electrical port | Steady green | The network port is connected properly. | N/A
Connection status indicator/Data transmission status indicator for a 10GE electrical port | Blinking green | Data is being transmitted. | N/A
Connection status indicator/Data transmission status indicator for a 10GE electrical port | Off | The network port is not connected. | 1. Connect the port to another switch, optical fiber, and optical module to check whether the original switch and optical fiber are normal and whether the type and speed of the original optical module are correct. 2. Check whether the ports on the switch and NIC are up. 3. Check whether the NIC is operating properly.
Data transmission rate indicator for a 10GE electrical port | Steady yellow | The link rate is 1 Gbit/s. | 1. Connect the port to another switch, optical fiber, and optical module to check whether the original switch and optical fiber are normal and whether the type and speed of the original optical module are correct. 2. Check whether the ports on the switch and NIC are up. 3. Check whether the NIC is operating properly.
Data transmission rate indicator for a 10GE electrical port | Steady green | The link rate is 10 Gbit/s. | N/A
Data transmission rate indicator for a 10GE electrical port | Off | The network port is not connected. | 1. Connect the port to another switch, optical fiber, and optical module to check whether the original switch and optical fiber are normal and whether the type and speed of the original optical module are correct. 2. Check whether the ports on the switch and NIC are up. 3. Check whether the NIC is operating properly.
Data transmission status indicator for the management network port | Blinking yellow | Data is being transmitted. | N/A
Data transmission status indicator for the management network port | Off | No data is being transmitted. | 1. Connect the port to another switch and network cable to check whether the original switch and network cable are normal. 2. Check whether the ports on the switch and NIC are up. 3. Check whether the NIC is operating properly.
Connection status indicator for the management network port | Steady green | The network port is connected properly. | N/A
Connection status indicator for the management network port | Off | The network port is not connected. | 1. Connect the port to another switch and network cable to check whether the original switch and network cable are normal. 2. Check whether the switch is up.
Data transmission status indicator for a GE electrical port | Blinking yellow | Data is being transmitted. | N/A
Data transmission status indicator for a GE electrical port | Off | No data is being transmitted. | 1. Connect the port to another switch and network cable to check whether the original switch and network cable are normal. 2. Check whether the ports on the switch and NIC are up. 3. Check whether the NIC is operating properly.
Connection status indicator for a GE electrical port | Steady green | The network port is connected properly. | N/A
Connection status indicator for a GE electrical port | Off | The network port is not connected. | 1. Connect the port to another switch and network cable to check whether the original switch and network cable are normal. 2. Check whether the ports on the switch and NIC are up. 3. Check whether the NIC is operating properly.
LOM port connection status indicator | Steady green | The network port is connected properly. | N/A
LOM port connection status indicator | Off | The port is not in use or is faulty. | 1. Connect the port to another switch and network cable to check whether the original switch and network cable are normal. 2. Check whether the ports on the switch and NIC are up. 3. Check whether the NIC is operating properly.

Table 5-10 FlexIO card indicators


NIC Type    Chip Model    Port    Status    Network Status    Operation

SM210 5719 Active Blinking Data is N/A


FlexIO card yellow being
(4 x GE transmitted
electrical on the
ports) network.

Off No data is 1. Connect the port


being to another switch
transmitted and network cable
on the to check whether
network. the original switch
and network cable
are normal.
2. Check whether the
NIC is operating
properly.

Link Steady The network N/A


green connection
is normal.


Off The network 1. Connect the port


connection to another switch
is and network cable
unavailable. to check whether
the original switch
and network cable
are normal.
2. Check whether the
NIC is operating
properly.

● SM211 i350 Active Blinking Data is N/A


FlexIO yellow being
card (2 x transmitted
GE on the
electrical network.
ports)
Off No data is 1. Connect the port
● SM212 being to another switch
FlexIO transmitted and network cable
card (4 x on the to check whether
GE network. the original switch
electrical and network cable
ports) are normal.
2. Check whether the
NIC is operating
properly.

Link Steady The network N/A


green connection
is normal.

Off The network 1. Connect the port


connection to another switch
is and network cable
unavailable. to check whether
the original switch
and network cable
are normal.
2. Check whether the
NIC is operating
properly.

SM231 82599 Active Steady No data is N/A


FlexIO card yellow being
(2 x 10GE transmitted
optical on the
ports) network.


Blinking Data is
yellow being
transmitted
on the
network.

Link Steady The network


green connection
is normal.
Blinking
green

Off The network 1. Connect the port


connection to another switch
is and network cable
unavailable. to check whether
the original switch
and network cable
are normal.
2. Check whether the
NIC is operating
properly.

SM233 X540 Link Steady High speed N/A


FlexIO card Speed green (10 Gbit/s)
(2 x 10GE
electrical Steady Low speed 1. Connect the port
ports) yellow (1 Gbit/s) to another switch
and network cable
to check whether
the original switch
and network cable
are normal.
2. Check whether the
NIC is operating
properly.

Off The network 1. Connect the port


connection to another switch
is and network cable
unavailable. to check whether
the original switch
and network cable
are normal.
2. Check whether the
NIC is operating
properly.


Link/ Steady No data is N/A


Active green being
transmitted
on the
network.

Blinking Data is
green being
transmitted
on the
network.

Off The network 1. Connect the port


connection to another switch
is and network cable
unavailable. to check whether
the original switch
and network cable
are normal.
2. Check whether the
NIC is operating
properly.

● SM251 CX3 Active Steady The network N/A


FlexIO green connection
card (1 x is normal.
56G IB
optical Blinking The network 1. Connect the port
port) green connection to another switch
is abnormal. and network cable
● SM252
to check whether
FlexIO Off The network the original switch
card (2 x connection and network cable
56G IB is are normal.
optical unavailable.
ports) 2. Check whether the
NIC is operating
properly.

Link Steady No data is N/A


yellow being
transmitted
on the
network.

Blinking Data is
yellow being
transmitted
on the
network.


Off The network 1. Connect the port


connection to another switch
is and network cable
unavailable. to check whether
the original switch
and network cable
are normal.
2. Check whether the
NIC is operating
properly.

SM380 CX4 Link Steady High speed N/A


FlexIO card Speed green (25 Gbit/s)
(2 x 25GE
optical Steady Low speed 1. Connect the port
ports) yellow (10 Gbit/s) to another switch
and network cable
to check whether
the original switch
and network cable
are normal.
2. Check whether the
NIC is operating
properly.

Off The network 1. Connect the port


connection to another switch
is and network cable
unavailable. to check whether
the original switch
and network cable
are normal.
2. Check whether the
NIC is operating
properly.

Link/ Steady No data is N/A


Active green being
transmitted
on the
network.

Blinking Data is
green being
transmitted
on the
network.


Off The network 1. Connect the port


connection to another switch
is and network cable
unavailable. to check whether
the original switch
and network cable
are normal.
2. Check whether the
NIC is operating
properly.
NOTE
For details, visit the official websites of the PCIe card vendors.

----End

Indicators Available Only on the RH5885 V2, RH5885 V3, and RH5885H V3

Table 5-11 Indicators for each module

Indicator Status Meaning Diagnosis

Power indicator Steady green The memory riser N/A


on a memory riser is on.

Off The memory riser


is off.

Memory riser Steady red A DIMM on the Locate the faulty


fault indicator memory riser is DIMM according
faulty. to the DIMM fault
locating indicator,
and replace the
faulty DIMM with
a spare one.

Off All DIMMs on the N/A


memory riser are
normal.

DIMM fault Steady red The DIMM is After you remove


locating indicator faulty. the memory riser
and hold down
the DIMM fault
locating button,
the indicator of
the faulty DIMM
turns on.


Off The DIMM is N/A


operating
properly.

Memory riser Steady green Memory mirroring N/A


mirroring has been
indicator configured on the
(available only on memory riser.
the RH5885H V3)
Off Memory mirroring
has not been
configured on the
memory riser.

Status indicator Steady yellow The PCIe card is If this indicator is


on a hot- abnormal, or the steady yellow but
swappable PCIe server is in the the server is not
card power-on self-test in the POST
(POST) phase. phase, check and
replace the PCIe
card.

Off The PCIe card is N/A


operating
properly.

Power indicator Steady green The power supply N/A


on a hot- to the PCIe card is
swappable PCIe normal.
card
Blinking green The PCIe card is
powering on or
off.

Off The PCIe card is


off.

Diagnostic panel Steady green A fault alarm is For details, see


on the RH5885 V2 generated for the "Components on
server server the Front Panel"
component. and "Indicators
and Buttons" in
the RH5885 V2
Server (8S)
V100R001C02
User Guide.
Off The server N/A
component is
operating
properly.


Fault diagnosis Steady red A fault alarm is For details, see


panel on the generated for the "Indicators and
RH5885 V3 server server Buttons" in the
component. RH5885 V3
Server V100R003
User Guide.
Off The server N/A
component is
operating
properly.

Indicators Available Only on the RH8100 and X6800

Table 5-12 Indicators for each module


Indicator Status Meaning Diagnosis

RH8100 V3 fan Steady green The fan module Check whether


module indicator hardware or the fan module
backplane is hardware or
faulty or the fan backplane is
module software faulty and
is performing an whether the fan
online upgrade. module software
(An online is performing an
upgrade takes online upgrade.
about 3 minutes.)

Blinking green The fan module is N/A


(once every 2 properly
seconds) communicating
with the iBMC.


Blinking green The Log in to the


(four times every communication iBMC WebUI and
second) between the fan check whether the
module and the iBMC is running
iBMC is abnormal. properly.
● If the iBMC
software is
abnormal,
upgrade the
iBMC software
or replace the
high-
performance
fusion console
(HFC). For
details, see 6
Software and
Firmware
Upgrade.
● If the iBMC is
normal, reseat
the fan
module. If the
alarm persists,
replace the fan
module.

Steady red The fan module Reseat the fan


hardware or module. If the
backplane is alarm persists,
faulty. replace the fan
module.

Blinking red The fan module Reseat the fan


has an alarm, or module. If the
the fan module alarm persists,
hardware or replace the fan
backplane is module.
damaged.

Off The fan module is N/A


not powered on.

Fan module Steady green The fan module is N/A


operating status operating
indicator on the properly.
X6800
Steady red The fan module is Replace the faulty
faulty. fan module.


Off The fan module Check whether


has no power the fan module is
supply. installed properly.

Memory riser Steady green The memory riser N/A


button/status is operating
indicator properly.

Blinking green The memory riser


is in the
intermediate state
of hot swap.

Blinking red (once The memory riser View the iBMC


every second) is not operating event alarm logs
properly. to check whether
the memory riser
is faulty.

Blinking red (five The memory riser Check whether


times every is not installed the memory riser
second) properly. is installed
properly.
Off The memory riser
is off.

Memory riser Steady yellow The hot insertion Check whether


ATTN indicator or removal services can be
operation has migrated or
failed. stopped. After
services are
stopped, power
off and then
power on the
server.
● If the indicator
is off, attempt
to hot-swap
the memory
riser again. If
hot swap fails
again, replace
the memory
riser and
DIMMs on it.
● If the indicator
is steady
yellow, replace
the memory
riser and
DIMMs on it.


Blinking yellow The memory riser N/A


is waiting to
cancel the hot
swap operation.
To cancel the
operation, press
the memory riser
button again
within 5 seconds.

Off The hot insertion


or removal
operation is
normal.

Memory riser Steady green The memory riser N/A


backup indicator is idle.

Off The memory riser


is not idle.

Memory riser Steady green Memory mirroring N/A


mirroring has been
indicator configured on the
memory riser.

Off Memory mirroring


has not been
configured on the
memory riser.

Compute module Steady green The compute N/A


status indicator module is
operating
properly.

Blinking red (once The compute View the iBMC


every second) module is faulty. event alarm logs
to check whether
the compute
module is faulty.

Blinking red (five The compute Check whether


times every module is not the compute
second) installed properly. module is
installed properly.
Off The compute
module is not
powered on.


Indicators Available Only on the X6000, X6800, and X6800 V5

Table 5-13 Aggregated network port indicators


Indicator Status Meaning Diagnosis Procedure

Aggregated Steady The network port is N/A


network port link green connected properly.
indicator
Off The network port is 1. Connect the port
not connected. to another switch
and network
cable to check
whether the
original switch
and network
cable are normal.
2. Check whether
the ports on the
switch and NIC
are up.
3. Check whether
the NIC is
operating
properly.

Aggregated Blinking Data is being N/A


network port orange transmitted over the
active status network port.
indicator
Off The network port is 1. Connect the port
idle. to another switch
and network
cable to check
whether the
original switch
and network
cable are normal.
2. Check whether
the ports on the
switch and NIC
are up.
3. Check whether
the NIC is
operating
properly.


Indicators Available Only on the E9000

Table 5-14 MM910 management module indicators

Indicator Status Meaning Diagnosis

Power indicator Steady green The MM910 has N/A


(PWR) on the been powered on.
MM910
Blinking green The MM910 is
being powered
on.

Off The MM910 is not Check whether


powered on. the MM910 is
installed properly.

Health status Steady green All components N/A


indicator (HLY) on inside the chassis
the MM910 are operating
properly.

Blinking red (once A major alarm is Check whether


every second) generated for a the MM910 is
component in the installed properly
chassis. The and log in to the
indicators on both HMM WebUI to
the active and view alarms.
standby MM910s
are red.

Blinking red (four A critical alarm is


times every generated for a
second) component in the
chassis, and the
indicators on both
the active and
standby MM910s
are red.

Blinking red (five The MM910 is not


times every installed properly.
second)

Off The MM910 is not N/A


powered on or is
being powered
on.

Active/standby Steady green The MM910 is N/A


status indicator active.
(ACT) on the
MM910 Off The MM910 is in
standby mode.


Table 5-15 E9000 fan module indicators


Indicator Status Meaning Diagnosis
Procedure

Fan module Blinking green The fan module is N/A


operating status (once every 2 operating
indicator on an seconds) properly.
E9000
Blinking green The Check whether
(four times every communication the fan module is
second) between the fan faulty by inserting
module and the it into a working
MM910 is slot. Check
abnormal, and whether the slot
the fan module is faulty by
has no alarm. inserting a
working fan
module into that
slot.

Blinking red (once The fan module 1. Log in to the


every 2 seconds) has reported an HMM WebUI
alarm. and check fan
alarms.
2. Check whether
the power
connector of
the fan module
is connected
properly. If it is
connected
properly,
replace the fan
module.

Off The fan module Check whether


has no power the fan module is
supply. installed properly
and whether its
control circuit is
functioning
correctly.
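
The blink patterns in Table 5-15 can be treated as a small lookup table. The following is a minimal sketch only: the `FanIndicator` type, the `decode_fan_indicator` function, and the encoding of blink rate as blinks per second are illustrative assumptions and are not part of any Huawei tool.

```python
from typing import NamedTuple, Optional


class FanIndicator(NamedTuple):
    color: Optional[str]       # "green", "red", or None when the indicator is off
    blinks_per_second: float   # 0.5 = once every 2 seconds, 4 = four times every second


# (color, blinks per second) -> (meaning, diagnosis), transcribed from Table 5-15
E9000_FAN_STATES = {
    ("green", 0.5): ("operating properly", "N/A"),
    ("green", 4.0): ("communication with the MM910 is abnormal; no fan alarm",
                     "swap-test the fan module and the slot"),
    ("red", 0.5): ("fan module has reported an alarm",
                   "check fan alarms on the HMM WebUI, then the power connector"),
    (None, 0.0): ("no power supply",
                  "check installation and the control circuit"),
}


def decode_fan_indicator(indicator: FanIndicator):
    """Return (meaning, diagnosis) for an observed E9000 fan module indicator."""
    rate = float(indicator.blinks_per_second) if indicator.color else 0.0
    return E9000_FAN_STATES.get(
        (indicator.color, rate),
        ("unknown state", "not listed in Table 5-15"),
    )


if __name__ == "__main__":
    print(decode_fan_indicator(FanIndicator("green", 4)))
```

Any combination that the table does not list falls through to "unknown state" instead of being guessed.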


Table 5-16 E9000 switch module indicators


Indicator Status Meaning Diagnosis
Procedure

Stack status Steady green A switch module N/A


indicator (STAT) that can be
stacked is active
in stacking mode
or is not stacked,
and is operating
properly.

A switch module
that cannot be
stacked is
operating
properly.

Blinking green A switch module


that can be
stacked is standby
or slave in
stacking mode
and is operating
properly.

A switch module
that cannot be
stacked is being
powered on.

Off The switch


module is not
powered on.

Health indicator Steady green The switch N/A


(HLY) module is
operating
properly.

Blinking red The switch Log in to the


module has a HMM WebUI to
fault alarm or is view event
not installed alarms, and check
properly. whether the
switch module is
installed and
operating
properly.

Off The switch N/A


module is not
powered on.


GE electrical port Steady green The network is N/A


indicator connected
properly.

Blinking green Data is being


transmitted.

Off No data is being 1. Connect the


transmitted or the port to another
network is switch, optical
disconnected. fiber, and
optical module
to check
whether the
original switch
and optical
fiber are
normal and
whether the
type and speed
of the original
optical module
are correct.
2. Check the NIC
status in the
OS.
3. Check whether
the ports on
the switch and
NIC are up.

● Connection Steady green The port is N/A


status indicator connected
of a 10GE properly.
optical port
● 25GE optical
port
connectivity
status indicator


Off The port is not 1. Connect the


connected port to another
properly. switch, optical
fiber, and
optical module
to check
whether the
original switch
and optical
fiber are
normal and
whether the
type and speed
of the original
optical module
are correct.
2. Check the NIC
status in the
OS.
3. Check whether
the ports on
the switch and
NIC are up.

● Data Blinking orange Data is being N/A


transmission transmitted or
status indicator received over the
of a 10GE port.
optical port
Off No data is being
● Data transmitted over
transmission the port.
status indicator
of a 25GE
optical port

40GE optical port Steady green The network is N/A


indicator connected
properly.

Blinking green Data is being


transmitted.


Off No data is being 1. Connect the


transmitted or the port to another
network is switch, optical
disconnected. fiber, and
optical module
to check
whether the
original switch
and optical
fiber are
normal and
whether the
type and speed
of the original
optical module
are correct.
2. Check the NIC
status in the
OS.
3. Check whether
the ports on
the switch and
NIC are up.

● Connection Steady orange Signals are not Check whether


status indicator synchronized the network cable
of the 8G FC between the port is connected
optical port on the switch properly and
● Data module and the whether the
transmission port on the peer optical module
status indicator device. and NIC are
for the 16G FC normal.
Blinking orange The port is
optical port (once every 2 disabled.
seconds)

Blinking orange The port is not


(twice every functioning
second) correctly.

Off If the connection


status indicator is
off, no optical
module is
installed or the
optical module is
not receiving
optical signals
properly.


● Connection Steady green The port is N/A


status indicator functioning
of an 8G FC correctly and its
optical port link is connected.
● Connection Blinking green The port is If the peer device
status indicator (once every 2 functioning is a switch, check
for the 16G FC seconds) correctly but whether the
optical port isolated. No link is working modes of
set up. the switches
match. For details,
see the
FusionServer Pro
E9000 Server
V100R001 Site
Deployment
Guide. If the peer
device is a storage
device, check the
port.

Blinking green The port is in N/A


(twice every internal loopback
second) (diagnostic mode).

Blinking green The link is


(four times every connected and
second) data is being
transmitted.

Off If the diagnosis Check whether


status indicator is the optical
off, no optical module is
module is installed and
installed or the operating properly
optical module is and whether the
not receiving optical cable is
optical signals faulty.
properly.

Data transmission Blinking orange An View the iMana


status indicator of (twice every overtemperature 200 or iBMC
an 8G FC optical second) alarm is event alarm logs
port on the CX911 generated if the to check whether
connection status an
indicator is overtemperature
blinking green. alarm is
generated.


Blinking orange Data is being N/A


(more than twice transmitted or
every second) received over the
port.

Off No data is being


transmitted over
the port.

Connection status Steady green The link is N/A


indicator of an 8G connected
FC optical port on properly.
the CX911
Blinking green The CX911 is
(once every being registered,
second) or the port is in
the diagnostic
state.

Blinking green The link is not Check the port,


(twice every connected optical module,
second) properly or the and optical cable.
port is not
functioning
correctly. An
overtemperature
alarm is
generated if the
data transmission
status indicator is
blinking orange
twice every
second.

Off No optical Check the optical


module is module and
installed or the optical cable.
optical module is
receiving optical
signals
abnormally.

● InfiniBand (IB) Steady green The port is N/A


optical port connected
status indicator properly.
● OPA port Blinking green Data is being
status indicator transmitted or
received over the
port.


Off The port is not


connected.
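
Several diagnosis procedures in the tables above ask you to check the NIC status in the OS and to check whether the ports on the NIC are up. On a Linux host, one possible way to do this is to read the `operstate` file that the kernel exposes for each interface under `/sys/class/net`. The sketch below assumes a Linux OS with sysfs mounted. The function names are illustrative and do not come from this guide.

```python
from pathlib import Path


def read_link_states(sysfs_root="/sys/class/net"):
    """Return {interface name: operstate} for every interface under sysfs_root."""
    states = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return states  # not a Linux host, or sysfs is not mounted
    for iface in sorted(root.iterdir()):
        operstate = iface / "operstate"
        if operstate.is_file():
            states[iface.name] = operstate.read_text().strip()
    return states


def ports_not_up(states):
    """List interfaces whose operstate is anything other than "up"."""
    return [name for name, state in states.items() if state != "up"]


if __name__ == "__main__":
    for name, state in read_link_states().items():
        print(f"{name}: {state}")
```

The loopback interface usually reports `unknown` and not `up`, so filter it out before treating the result of `ports_not_up` as a fault list.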

Indicators Available Only on the Atlas 800 AI Training Server (Model 9010)

Table 5-17 Fan module indicators


Indicator Status Meaning Diagnosis
Procedure

Fan module Off The fan module Check whether


indicator has no power the fan module is
supply. installed properly
and whether its
control circuit is
functioning
correctly.

Steady green The fan module is N/A


operating
properly.

Blinking red The fan module 1. Log in to the


has reported an iBMC WebUI
alarm. and check fan
alarms.
2. Check whether
the power
connector of
the fan module
is properly
connected, or
replace the fan
module.

Table 5-18 Network port indicators


Indicator Status Meaning Diagnosis Procedure

2 x 100GE optical Steady The network port is N/A


port connection green connected properly.
status indicators
Blinking Data is being
green transmitted.


Off The network port is 1. Connect the port


not connected. to another switch,
optical fiber, and
optical module to
check whether the
original switch
and optical fiber
are normal and
whether the type
and speed of the
original optical
module are
correct.
2. Check whether
the ports on the
switch and NIC
are up.
3. Check whether
the NIC is
operating
properly.

2 x 100GE optical Steady The data N/A


port rate green transmission rate is
indicators 100 Gbit/s.

Off The network port is 1. Connect the port


not connected. to another switch,
optical fiber, and
optical module to
check whether the
original switch
and optical fiber
are normal and
whether the type
and speed of the
original optical
module are
correct.
2. Check whether
the ports on the
switch and NIC
are up.
3. Check whether
the NIC is
operating
properly.


5.6 Handling Faults Based on Symptoms


Table 5-19 lists the minimum configuration of servers.

Table 5-19 Minimum configuration of servers


Model                         Minimum Configuration                   Remarks

RH1288 V3, RH2288 V3,         One CPU in the CPU1 socket              None
RH2288H V3, 5288 V3           One DIMM in the DIMM000(A) slot

RH8100 V3 (8P)                One CPU in the CPU1 socket              Dual system mode (one PSU in any slot)
                              One memory board in slot 1
                              One DIMM in the DIMM000 slot
                              One HFC board in the HFC2 slot

RH8100 V3 (dual-system        One CPU in the CPU1 socket              Dual system primary 4P (one PSU in any slot)
primary 4P)                   One memory board in slot 1
                              One DIMM in the DIMM000 slot
                              One HFC board in the HFC2 slot

RH8100 V3 (dual-system        One CPU in the CPU5 socket              Dual system secondary 4P (one PSU in any slot)
secondary 4P)                 One memory board in slot 9
                              One DIMM in the DIMM000 slot
                              One HFC board in the HFC1 slot

RH5885 V3                     Two CPUs in the CPU1 and CPU2 sockets   One PSU in any slot
                              One DIMM in the DIMM000 slot


RH5885H V3                    Two CPUs in the CPU1 and CPU2 sockets   One PSU in any slot
                              One DIMM in the DIMM A1 slot of the
                              first memory board

1288H V5, 1288X V5, 2288 V5,  One CPU in the CPU1 socket              One PSU in any slot
2288C V5, 2288H V5, 2288X V5, One DIMM in the DIMM000(A) slot
5288 V5, 5288X V5

2298 V5                       Two CPUs in the CPU1 and CPU2 sockets   One PSU in any slot
                              One DIMM in the DIMM000(A) slot

2488 V5, 2488H V5, 5885H V5   Two CPUs in the CPU1 and CPU2 sockets   None
                              One DIMM in the DIMM000(A) slot

XH321 V5, XH321L V5           One CPU in the CPU1 socket              None
                              One DIMM in the DIMM000(A) slot

XH628 V5                      One CPU in the CPU1 socket              The RAID controller card is mounted to
                              One DIMM in the DIMM000(A) slot         CPU 2. If the OS drive is connected to
                                                                      the RAID controller card, the OS cannot
                                                                      be accessed.

CH121 V5, CH242 V5,           One CPU in the CPU1 socket              None
CH121L V5, and CH221 V5       One DIMM in the DIMM000 slot

CH225 V5                      Two CPUs in the CPU1 and CPU2 sockets   None
                              One DIMM in the DIMM000(1A1) slot

Atlas 800 AI training server  Two CPUs in the CPU1 and CPU2 sockets   None
(model 9010)                  Two DIMMs in the DIMM000(A) and
                              DIMM100(A) slots
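
When you strip a server down to isolate a fault, you can compare the installed parts against Table 5-19 programmatically. The sketch below is illustrative only. It transcribes a few rows of the table by hand, and the `MIN_CONFIG` dictionary and the `missing_for_minimum` function are hypothetical names, not part of any Huawei tool.

```python
# Subset of Table 5-19, transcribed by hand. Extend with other models as needed.
MIN_CONFIG = {
    "RH2288 V3": {"cpus": {"CPU1"}, "dimms": {"DIMM000(A)"}},
    "RH5885 V3": {"cpus": {"CPU1", "CPU2"}, "dimms": {"DIMM000"}},
    "2288H V5": {"cpus": {"CPU1"}, "dimms": {"DIMM000(A)"}},
    "CH225 V5": {"cpus": {"CPU1", "CPU2"}, "dimms": {"DIMM000(1A1)"}},
}


def missing_for_minimum(model, installed_cpus, installed_dimms):
    """Return the CPU sockets and DIMM slots that must still be populated."""
    required = MIN_CONFIG[model]  # raises KeyError for a model not transcribed above
    return {
        "cpus": sorted(required["cpus"] - set(installed_cpus)),
        "dimms": sorted(required["dimms"] - set(installed_dimms)),
    }


if __name__ == "__main__":
    print(missing_for_minimum("RH5885 V3", ["CPU1"], ["DIMM000"]))
```

The check covers CPUs and DIMMs only. PSU, memory board, and HFC board requirements from the Remarks and Minimum Configuration columns are not modeled.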

5.6.1 Power Failures


The terms that describe server power status are defined as follows:
● Power connected: The server is connected to a power source and the power
indicator is on.
● Standby: The server is connected to a power source and the power indicator is
steady yellow.
● Power-on: The server is on and the power indicator is steady green.
● POST: The server is in the power-on self-test process.
Diagnose and rectify power failures depending on the symptoms.
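
The power status terms above map directly onto the state of the power indicator. The following is a minimal sketch of that mapping. The indicator strings and the `classify_power_status` function are illustrative assumptions, and the handling of an indicator that is off is an assumption because the list above does not define that state.

```python
def classify_power_status(indicator, in_post=False):
    """Map a power indicator reading to the status terms defined above.

    indicator: "off", "steady yellow", "steady green", or another lit state
               (hypothetical labels chosen for this sketch).
    in_post:   True while the server is running the power-on self-test.
    """
    if in_post:
        return "POST"
    if indicator == "steady green":
        return "Power-on"
    if indicator == "steady yellow":
        return "Standby"
    if indicator == "off":
        # Assumption: an unlit indicator means no power source is connected.
        return "No power"
    # Any other lit state only tells us that a power source is connected.
    return "Power connected"


if __name__ == "__main__":
    for reading in ("off", "steady yellow", "steady green"):
        print(reading, "->", classify_power_status(reading))
```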

NOTE

● If a fault can be located using logs or tools, see "Handling Procedure". If a fault needs
to be rectified quickly onsite, see "Quick Recovery Method".
● For more fault symptoms and solutions, see the Computing Case Library. The
Computing Case Library is available only to Huawei engineers and partners.


Fault Symptom    Handling Procedure    Quick Recovery Method

Fault symptom: A PSU is faulty (the PSU has no power output and the health indicator is blinking red).

Handling procedure:
1. Check the PSU indicator and record any alarms on the iMana 200 or iBMC WebUI. For
   details, see 5.5 Checking Indicators to Locate Faults.
   NOTE
   ● For E9000 servers, record alarms on the MM910 WebUI.
2. Check whether an "AC lost" alarm is generated.
   ● If yes, check that the power cable is connected properly and that the PDU is
     supplying power properly.
   ● If no, go to 3.
3. Replace the PSU with a spare PSU and check whether the fault is rectified.
   ● If yes, no further action is required.
   ● If no, go to 4.
4. Replace the PSU backplane or replace the mainboard if no PSU backplane is configured.
   Check whether the fault is rectified.
   ● If yes, no further action is required.
   ● If no, contact Huawei technical support.

Quick recovery method:
1. Check whether the current configuration has sufficient power supplies.
   ● If yes, services are not affected.
   ● If no, contact Huawei technical support.
2. Replace the faulty PSU with a spare PSU. Do not install the faulty PSU into a server
   again.


Fault symptom: The rack server/Atlas 800 AI inference server (model 3010)/Atlas 800 AI
training server (model 9010) is not powered on (all indicators are off).

Handling procedure:
1. Check whether the external power supply to the rack server is normal.
   ● If yes, go to 2.
   ● If no, resolve this issue.
2. Replace the PSU with a normal one and check whether the fault is rectified.
   ● If yes, no further action is required.
   ● If no, go to 3.
3. Replace the mainboard and PSU backplane and check whether the fault is rectified.
   ● If yes, no further action is required.
   ● If no, contact Huawei technical support.

Quick recovery method:
Follow the handling procedure to replace any faulty modules.


Fault symptom: The chassis where a blade server or a high-density server is located has no
power.

Handling procedure:
1. Check whether the external power supply to the chassis is normal or whether a power
   overload has occurred.
2. Remove all compute nodes, switch modules, management modules and fan modules, label
   them with the slot numbers, and check whether their power connectors are normal.
3. Remove all PSUs, install the PSUs back one at a time in ascending order by slot number
   (ensure that only one PSU is installed at the same time), and check whether the chassis
   can be connected to the power source. If the chassis cannot be connected to the power
   source no matter which PSU is installed, replace the chassis.
4. If the chassis cannot be connected to the power source after a PSU is installed,
   replace the PSU.
5. After verifying that the chassis and PSUs can be connected to the power source, install
   only one PSU. Then install the switch modules, compute nodes, fan modules and
   management modules one at a time in ascending order by slot number, and check whether
   the module can be connected to the power source.
6. After the fault is rectified, install the switch modules, compute nodes, fan modules
   and management modules back into their original slots.

Quick recovery method:
Follow the handling procedure to replace any faulty modules.


Fault symptom: The chassis of a blade server or high-density server has power but a compute
node or server node does not.

Handling procedure:
1. Remove the compute node or server node, and check whether its power connector is
   damaged.
   ● If yes, replace the compute node or server node mainboard or replace the chassis.
   ● If no, go to 2.
2. Do not install the faulty compute node or server node into a server again. Install a
   spare component when available.

Quick recovery method:
1. Remove the faulty compute node or server node. Check whether other compute nodes or
   server nodes work properly. (Do not install the node into a server again.)
   ● If yes, services are not affected.
   ● If no, contact Huawei technical support.
2. Follow the handling procedure to replace any faulty modules.

5.6.2 KVM Login Faults


1. Diagnose the fault based on the fault symptoms listed in the following table.
NOTE

● If a fault can be located using logs or tools, see "Handling Procedure". If a fault
needs to be rectified quickly onsite, see "Quick Recovery Method".
● For more fault symptoms and solutions, see the Computing Case Library. The
Computing Case Library is available only to Huawei engineers and partners.
2. If the KVM connection is abnormal, you are advised to use Independent
Remote Console for login.


Fault Symptom: The KVM is inaccessible.

Handling Procedure:
1. Use a third-party tool, such as PuTTY, to run the telnet IP address:2198
   command to check whether the KVM port is normal. The default port number
   is 2198. Log in to the iMana 200 or iBMC WebUI, choose Configuration >
   Services, and check the KVM parameter to obtain the actual port number.
   If Telnet access is unavailable, use a PC to directly connect to iMana
   200 or iBMC for troubleshooting.
2. Clear the browser and Java caches and close all browsers. Then log in to
   iMana 200 or iBMC again.
3. Adjust the Java security level to medium or lower, or add the KVM
   address to the Java exception sites.
4. Check the OS and browser versions on the client. Firefox 23.0 is
   recommended. For details about the operating environment requirements,
   see the iMana 200 or iBMC help document.

Quick Recovery Method:
1. Follow the handling procedure to replace any faulty modules.
2. Restart iMana 200/iBMC and replace the local PC.
3. Connect the management network port to the local PC directly instead of
   through a switching network.

Fault Symptom: The KVM displays an error message.

Handling Procedure:
● If the number of login users exceeds the maximum value, use the iBMC
  WebUI or CLI to check whether other users are using the KVM. If other
  users are using the KVM, restart iMana 200 or iBMC to force the users to
  log out.
● If the KVM displays a message indicating an unauthorized user, clear the
  browser and Java caches, and close all browsers. Then log in to iMana
  200/iBMC again.
● If the input signal is out of range, check whether the OS resolution
  exceeds the maximum value 1280 x 1024.


Fault Symptom: Login to the KVM is successful, but the KVM functions
abnormally.

Handling Procedure:
● If the keyboard and mouse cannot be used but services are operating
  properly, reset the USB and check whether the problem is solved.
  – If yes, no further action is required.
  – If no, restart the service system, clear the CMOS, and upgrade iMana
    200/iBMC and BIOS.
● If an ISO file fails to be mounted to the virtual DVD-ROM drive, attempt
  to log in to the virtual DVD-ROM drive port over Telnet to check whether
  the port is normal, attempt to mount the ISO file by using FusionServer
  Tools Toolkit or Smart Provisioning to check whether the ISO file is
  correct, and upgrade the iBMC, HMM, iMana 200, and BIOS versions.
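
The port check in step 1 for the "KVM is inaccessible" symptom can also be
scripted when no Telnet client is at hand. The following is a minimal Python
sketch, not a Huawei tool: it only tests whether a TCP connection to the KVM
port can be opened. The function name and the 3-second timeout are
assumptions chosen for illustration. 2198 is the default KVM port stated
above, so substitute the actual port shown under Configuration > Services.

```python
import socket


def kvm_port_reachable(host: str, port: int = 2198, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Call kvm_port_reachable() with the iMana 200 or iBMC management address. A
False result from a PC behind a switching network and a True result from a
directly connected PC would point to the network path rather than the BMC.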

5.6.3 POST Faults


Diagnose and rectify power-on self-test (POST) faults depending on the
symptoms.
NOTE

● If a fault can be located using logs or tools, see "Handling Procedure". If a fault needs
to be rectified quickly onsite, see "Quick Recovery Method".
● For more fault symptoms and solutions, see the Computing Case Library. The
Computing Case Library is available only to Huawei engineers and partners.


Fault Symptom: The server fails to enter the standby mode after it powers
on. (The power indicator is blinking yellow for over 5 minutes.)

Handling Procedure:
1. View serial port logs to determine whether the iMana 200 or iBMC has
   been repeatedly reset.
   If the iMana 200 or iBMC has been repeatedly reset, the logs repeatedly
   record the following information:

       ### JFFS2 load complete: 1107083 bytes loaded to 0x8b000000
       ## Booting kernel from Legacy Image at 8a000000 ...
       Image Name: linux-2.6.34
       Image Type: ARM Linux Kernel Image (uncompressed)
       Data Size: 1511292 Bytes = 1.4 MiB
       Load Address: 86008000
       Entry Point: 86008000
       Verifying Checksum ... OK
       ## Loading init Ramdisk from Legacy Image at 8b000000 ...
       Image Name: Ramdisk Image
       Image Type: ARM Linux RAMDisk Image (uncompressed)
       Data Size: 1107019 Bytes = 1.1 MiB
       Load Address: 00000000
       Entry Point: 00000000
       Verifying Checksum ... OK
       Loading Kernel Image ... OK
       OK
       Starting kernel ...

   NOTE
   ● The CH140 and CH140 V3 compute nodes of the E9000 do not provide any
     serial ports. Directly ping the IP address of the iMana 200 or iBMC.
     If the ping tests occasionally or always fail, use the quick recovery
     method. If the problem persists, contact Huawei technical support.
   ● During the iMana 200 or iBMC startup process, the serial port on a
     server is used by the iMana 200 or iBMC by default. After the startup
     is complete, the serial port is switched to the system serial port.
2. Contact Huawei technical support.

Quick Recovery Method:
For a rack server/Atlas 800 AI inference server (model 3010)/Atlas 800 AI
training server (model 9010), perform the following operations:
1. Power off the server, remove and reinstall the power cables, power on
   the server, and check whether the iMana 200 or iBMC is functioning
   correctly.
   ● If yes, upgrade the iMana 200 or iBMC by using software of its current
     version or a later version.
   ● If no, check the iMana 200 or iBMC version. If the version is 1.91 or
     later, go to 2; otherwise, go to 3.
2. Keep the power cables removed and add a jumper cap to the Clear_BMC_PW
   pin on the mainboard to attempt to restore the default settings of the
   iMana 200 or iBMC. Then reconnect the power cables.
3. Replace the mainboard or BMC board.

For an E9000 server, perform the following operations:
1. Remove and reinstall the compute node and check whether the iMana 200 or
   iBMC is functioning correctly.
   ● If yes, upgrade the iMana 200 or iBMC by using software of its current
     version or a later version.
   ● If no, check the iMana 200 or iBMC version. If the version is 1.91 or
     later, go to 2; otherwise, go to 3.
2. Keep the compute node removed and add a jumper cap to the Clear_BMC_PW
   pin on the mainboard to attempt to restore the default settings of the
   iMana 200 or iBMC. Then reinstall the compute node.
3. Replace the mainboard or BMC board.

Fault Symptom: A server in standby mode cannot power on. (The power
indicator is steady yellow.)

Handling Procedure:
1. Collect iMana 200 or iBMC logs, and query the complex programmable logic
   device (CPLD) register to determine whether the power supply link to the
   mainboard has failed.
2. If the server cannot be powered on by pressing the power button, check
   whether the hardware of the component where the power button is located
   is faulty.
3. Check whether the mainboard and DIMMs are installed properly.

Quick Recovery Method:
1. Remove the external PCIe devices such as NICs and FC HBAs. Then check
   whether the fault is rectified.
   ● If yes, no further action is required.
   ● If no, go to 2.
2. Retain only the minimum server configuration (a single CPU, a single
   mainboard, and a single DIMM). Then check whether the fault is
   rectified.
   ● If yes, no further action is required.
   ● If no, go to 3.
3. Check whether the CPUs, mainboard, and memory modules are faulty, and
   replace the faulty components.


Fault Symptom: A server powers off immediately when powered on.

Handling Procedure:
1. Check whether the external power supply is normal, including the PDUs,
   PSUs, and power cables.
2. Check whether the total power of the PSUs configured for the server is
   less than the power required for running the server.
3. Collect iMana 200 or iBMC logs, and query the CPLD register to determine
   whether the power supply link to the mainboard has failed.
   NOTE
   For an E9000 server, you are advised to use the MM910 for one-click log
   collection.
4. Check the power supply unit (PSU) backplane and the mainboard.

Quick Recovery Method:
1. Check all external power supplies, including the PDUs, PSUs, and power
   cables. Replace any faulty components and check whether the fault is
   rectified.
   ● If yes, no further action is required.
   ● If no, go to 2.
2. Replace the mainboard or PSU backplane.

Fault Symptom: The message "no signal" is displayed immediately after the
server powers on.

Handling Procedure:
1. Collect iMana 200 or iBMC logs, and query the CPLD register to determine
   whether the power supply link to the mainboard has failed.
   NOTE
   For an E9000 server, you are advised to use the MM910 for one-click log
   collection.
2. Set the printing level for debugging the BIOS with the iMana 200 or iBMC
   CLI, restart the server, and save system serial port logs. When the
   fault recurs, collect iMana 200 or iBMC logs and download the .bin file
   of the BIOS.
   NOTE
   You can run the ipmcset -t maintenance -d biosprint -v 1 command to
   print all BIOS logs. For details, see "Querying and Setting BIOS Print
   Enablement Status (biosprint)" in the iBMC User Guide of the required
   version.

Quick Recovery Method:
1. Run the ipmcset -d clearcmos command to clear the CMOS. Then check
   whether the fault is rectified.
   ● If yes, no further action is required.
   ● If no, go to 2.
   NOTICE
   Running the ipmcset -d clearcmos command will restore the BIOS defaults.
   Exercise caution when running this command.
2. Upgrade the iMana 200 or iBMC, and the BIOS. Then check whether the
   fault is rectified.
   ● If yes, no further action is required.
   ● If no, go to 3.
3. Remove the external devices, including the PCIe cards and HBAs. Then
   check whether the fault is rectified.
   ● If yes, no further action is required.
   ● If no, go to 4.
4. Retain only the minimum server configuration (a single CPU, a single
   mainboard, and a single DIMM). Then check whether the fault is
   rectified.
   ● If yes, no further action is required.
   ● If no, go to 5.
5. Check whether the CPUs, mainboard, and memory modules are faulty, and
   replace the faulty components.

Fault Symptom: The server repeatedly powers on and then powers off.

Handling Procedure:
1. Enable the video recording function on the iMana 200 or iBMC WebUI.
2. Set the printing level for debugging the BIOS with the iMana 200 or iBMC
   CLI, restart the server, and save system serial port logs. When the
   fault recurs, collect iMana 200 or iBMC logs and download the .bin file
   of the BIOS.
   NOTE
   You can run the ipmcset -t maintenance -d biosprint -v 1 command to
   print all BIOS logs. For details, see "Querying and Setting BIOS Print
   Enablement Status (biosprint)" in the iBMC User Guide of the required
   version.
3. Restore the default BIOS settings, and check whether the server operates
   properly.
   ● If yes, modify the BIOS parameters on the OS side based on actual
     requirements.
   ● If no, collect iMana 200 or iBMC logs and download the .bin file of
     the BIOS. For details, see the iBMC User Guide of the corresponding
     version.
   NOTE
   For an E9000 server, you are advised to use the MM910 for one-click log
   collection.


Fault Symptom: The POST stops responding at a screen.

Handling Procedure:
1. Capture the current screen.
2. Collect iMana 200 or iBMC logs, and query the CPLD register to determine
   whether the power supply link to the mainboard has failed.
3. Set the printing level for debugging the BIOS with the iMana 200 or iBMC
   CLI.
   NOTE
   You can run the ipmcset -t maintenance -d biosprint -v 1 command to
   print all BIOS logs. For details, see "Querying and Setting BIOS Print
   Enablement Status (biosprint)" in the iBMC User Guide of the required
   version.
4. Enable the video recording function on the iMana 200 or iBMC WebUI,
   restart the server, and save system serial port logs. When the fault
   recurs, collect iMana 200 or iBMC logs and download the .bin file of the
   BIOS.
5. Check the external USB devices, CPUs, drives, DIMMs, and PCIe devices.


Fault Symptom: RAID self-check is suspended.

Handling Procedure:
1. Capture the current iMana 200/iBMC KVM or local KVM screen.
2. Collect iMana 200 or iBMC logs.

Quick Recovery Method:
1. If a RAID controller card firmware error exists, replace the RAID
   controller card, supercapacitor, or BBU. Then check whether the fault is
   rectified.
   ● If yes, no further action is required.
   ● If no, go to 2.
2. Check whether the drives, drive backplane, and SAS cables are faulty.
   ● If yes, replace faulty components.
   ● If no, go to 3.
3. If the RAID array is offline, import it again. Then check whether the
   fault is rectified.
   ● If yes, no further action is required.
   ● If no, go to 4.
4. If the BBU or supercapacitor runs out of power, follow the instructions
   shown in the displayed messages to keep the server running. After the
   server runs for 30 minutes, check the BBU or supercapacitor status. If
   the BBU or supercapacitor is abnormal, replace it.

Fault Symptom: NIC Preboot Execution Environment (PXE) has failed.

Handling Procedure:
1. Check whether the NIC supports PXE.
2. Check the BIOS PXE configuration. Ensure that the NIC PXE function and
   NIC UMC function are enabled. To enable the NIC PXE function, press
   Ctrl+S.
3. Check the NIC.
4. Check the PXE network environment on the service side.

Quick Recovery Method:
Follow the handling procedure.
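
For the symptom "The server fails to enter the standby mode after it powers
on" in the table above, the handling procedure relies on spotting a boot
banner that repeats in the serial port log. A minimal Python sketch of that
check follows. It assumes the log has been saved as text and that each iMana
200 or iBMC boot prints the "Starting kernel ..." line shown in the sample
log. The function names and the threshold of two boots are illustrative
assumptions, not Huawei-defined values.

```python
def count_bmc_boots(serial_log: str, marker: str = "Starting kernel ...") -> int:
    """Count how many lines of a saved serial log contain the boot marker."""
    return sum(1 for line in serial_log.splitlines() if marker in line)


def looks_like_reset_loop(serial_log: str, min_boots: int = 2) -> bool:
    """Treat min_boots or more boot banners in one capture as repeated resets.

    min_boots is an assumed threshold; a single banner is a normal startup.
    """
    return count_bmc_boots(serial_log) >= min_boots


# Abridged stand-in for one boot cycle as it appears in a captured log.
one_boot = "Verifying Checksum ... OK\nLoading Kernel Image ... OK\nStarting kernel ...\n"
print(looks_like_reset_loop(one_boot))
print(looks_like_reset_loop(one_boot * 3))
```

If the capture covers only a short time window, a low count does not rule out
a reset loop, so capture for at least the 5 minutes named in the symptom.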


5.6.4 Memory Errors


Diagnose and rectify memory errors depending on the symptoms.

NOTE

● If a fault can be located using logs or tools, see "Handling Procedure". If a fault needs
to be rectified quickly onsite, see "Quick Recovery Method".
● For more fault symptoms and solutions, see the Computing Case Library. The
Computing Case Library is available only to Huawei engineers and partners.


Fault Symptom: The memory capacity detected by the system is less than the
configured memory capacity.

Handling Procedure:
1. Check whether the memory is included in Computing Product Compatibility
   Checker.
   ● If yes, go to 2.
   ● If no, replace the memory with a component listed in Computing Product
     Compatibility Checker.
2. Check whether memory mirroring has been enabled in the BIOS.
   ● If yes, the memory capacity is reduced by 50% due to the memory
     mirroring function. You can disable the function in the BIOS. If the
     problem persists, go to 3.
   ● If no, go to 3.
3. Check whether the DIMM installation positions meet configuration rules.
   ● If yes, go to 4.
   ● If no, reinstall the DIMMs in correct slots according to the
     configuration rules.
4. Check whether a "DIMMxxx configuration error" alarm is generated by
   iBMC.
   ● If yes, replace the faulty DIMM. For details, see 5.3 Handling Alarms.
   ● If no, go to 5.
5. Check whether any DIMM slots are abnormal. If a DIMM slot is abnormal,
   replace the mainboard.

Quick Recovery Method:
1. If the iBMC generates the "DIMMxxx Configuration Error" alarm, replace
   the related DIMM.
2. If the DIMM status displayed in iBMC or the OS is abnormal (unidentified
   or faulty), replace the faulty DIMMs.
3. If memory mirroring or memory rank sparing is configured in the BIOS,
   the total available memory capacity is less than the configured physical
   memory capacity.
4. If the DIMMs do not meet the DIMM configuration rules, reinstall the
   DIMMs by referring to Computing Product Memory Configuration Assistant.
5. If DIMM installation slots are faulty, replace the mainboard.


Fault Symptom: An uncorrectable DIMM error is generated.

Handling Procedure:
1. Install the faulty DIMM in different channels and use FusionServer Tools
   Toolkit or Smart Provisioning to test the DIMM.
   ● If the error is caused by the DIMM, replace the DIMM.
   ● If the error is caused by the DIMM slot, check the DIMM connector. If
     the connector is damaged, replace the mainboard or memory board.
2. Remove the CPU connected to the faulty DIMM channel, and check whether
   the CPU socket pins are damaged.
   ● If yes, replace the mainboard.
   ● If no, go to 3.
3. Replace the CPU connected to the faulty DIMM channel.

Quick Recovery Method:
1. Swap the positions of the DIMM you suspect to be faulty and a DIMM that
   is functioning correctly. Then determine whether the fault is caused by
   the DIMM or the DIMM slot.
   ● If the fault is caused by the DIMM you suspect to be faulty, replace
     the DIMM.
   ● If the fault is caused by the DIMM slot, swap the CPU with a normal
     one. If the problem is caused by the CPU, replace the CPU. Otherwise,
     replace the mainboard or memory board.
2. If the preceding steps do not reproduce the fault, use FusionServer
   Tools Toolkit or Smart Provisioning to perform memory pressure tests. If
   the fault is reproduced, perform 1. Otherwise, contact Huawei technical
   support.
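
The 50% reduction caused by memory mirroring, described for the symptom "The
memory capacity detected by the system is less than the configured memory
capacity", can be checked with simple arithmetic before any DIMM is
replaced. This Python sketch is illustrative only: the function names and
the 2% tolerance for firmware-reserved memory are assumptions, and memory
rank sparing is not modeled because the text does not state how much
capacity it reserves.

```python
def expected_usable_gib(installed_gib: float, mirroring_enabled: bool) -> float:
    """Mirroring keeps a full copy of the data, so half the capacity is usable."""
    return installed_gib / 2 if mirroring_enabled else installed_gib


def capacity_explained(installed_gib: float, detected_gib: float,
                       mirroring_enabled: bool, tolerance: float = 0.02) -> bool:
    """Return True if the detected capacity matches the expected usable capacity.

    tolerance is an assumed allowance for memory reserved by firmware.
    """
    expected = expected_usable_gib(installed_gib, mirroring_enabled)
    return abs(expected - detected_gib) <= tolerance * expected


# 256 GiB installed, 128 GiB detected: consistent only if mirroring is on.
print(capacity_explained(256, 128, mirroring_enabled=True))
print(capacity_explained(256, 128, mirroring_enabled=False))
```

When the second call describes the server (mirroring disabled, half the
capacity missing), continue with the DIMM position and alarm checks in the
handling procedure.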

5.6.5 Drive I/O Faults


Diagnose and rectify drive I/O faults depending on the symptoms.

NOTE

● If a fault can be located using logs or tools, see "Handling Procedure". If a fault needs
to be rectified quickly onsite, see "Quick Recovery Method".
● For more fault symptoms and solutions, see the Computing Case Library. The
Computing Case Library is available only to Huawei engineers and partners.


Fault Symptom: A "Disk Fault" alarm is reported to iMana 200 or iBMC.

Handling Procedure:
1. If the drive is in a RAID array and the RAID array is not functioning
   correctly, troubleshoot the RAID array.
2. If the server has stopped, use Smart Provisioning to inspect the server
   hardware. If the server is operating, replace the drive.
3. If the fault persists, insert the new drive into the slot that you
   suspect to be faulty to check whether that slot is faulty.
   NOTE
   For RAID controller cards that support out-of-band management, if a hard
   drive is in the Unconfigured Good (Foreign) state, an iBMC alarm will be
   generated but the fault indicator will be off.

Quick Recovery Method:
1. If the faulty drive is not in a RAID array (except drives in passthrough
   mode), the drive cannot be used and needs to be replaced. It is
   recommended that you configure RAID for all drives and then deploy the
   redundant services.
2. Back up the data of redundant RAID arrays to avoid data loss.
3. Follow the handling procedure to replace any faulty modules.

Fault Symptom: A RAID controller card fails to identify one or more drives.

Handling Procedure:
1. Power off the server, swap the drive that cannot be identified with a
   normal drive, and power on the server to check whether the drive is
   faulty.
   ● If the fault is caused by the drive, replace the drive.
   ● If the fault is caused by the drive slot, check whether SAS cables are
     connected properly to all SAS ports on the drive backplane. For
     details, see the server user guide.
   ● If the fault persists, go to 2.
2. Replace the RAID controller card first, the SAS cables second, and the
   drive backplane third.

Quick Recovery Method:
1. If the redundant RAID array fails or no RAID array is configured, the
   related drive partitions are unavailable.
2. Move the unidentified drives or all drives in the RAID array to a
   standby server. Ensure that you retain their order during this process
   and attempt to back up data.
3. Follow the handling procedure to replace any faulty modules.


Fault Symptom: A RAID controller card cannot identify any drives.

Handling Procedure:
1. Check whether the active indicators on the drives are on. If they are
   off, ensure that both the power cable and drive are installed properly.
2. If the fault persists, check that the SAS cables and signal cables are
   connected properly. For details, see "Internal Cabling" in the user
   guide.
3. If the fault persists, replace the RAID controller card first, the SAS
   cables second, and the drive backplane third.

Quick Recovery Method:
Follow the handling procedure to replace any faulty modules without
changing the drive installation positions.

Note: If a fault occurs on the RH2288A V2 server, check whether the cable
connecting the mainboard to the power adapter board is connected properly.
Figure 5-3 shows the cable connection.

Figure 5-3 Cable connection

5.6.6 Ethernet Controller Faults


Diagnose and rectify Ethernet controller faults depending on the symptoms.

NOTE

● If a fault can be located using logs or tools, see "Handling Procedure". If a fault needs
to be rectified quickly onsite, see "Quick Recovery Method".
● For more fault symptoms and solutions, see the Computing Case Library. The
Computing Case Library is available only to Huawei engineers and partners.


Fault Symptom: A network port is invisible.

Diagnosis Procedure:
1. Ensure that the NIC type, NIC driver, OS, BIOS version, and iMana 200 or
   iBMC version on the server or compute node are compatible.
   ● If you use a system that is not listed in Computing Product
     Compatibility Checker, contact the OS compatibility team.
     NOTE
     You are advised to use the system listed in Computing Product
     Compatibility Checker.
   ● If the NIC firmware and driver versions do not match, upgrade them to
     the matching versions.
2. To check whether the PCI device of the NIC is visible, run the lspci |
   grep -i eth* command in Linux (or equivalent in other operating systems)
   and observe the response.
   ● If yes, go to 4.
   ● If no, go to 3.
3. If the PCI device is invisible, perform the following steps:
   a. Check the logical topology of the NIC. If the NIC PCI bus does not
      have a CPU, screw-in PCI cards connected to the bus are invisible.
   b. Power the iMana 200 or iBMC off and then on. Check whether the fault
      persists.
   c. Insert the NIC you suspect to be faulty into another slot, and a
      normal NIC into the slot you suspect to be faulty. Then check which
      of these causes the fault.
4. If the PCI device is visible but its network port is invisible, the
   driver cannot be loaded. To rectify the fault, perform the following
   steps:
   a. Run the ifconfig ethN up command in Linux (or equivalent in other
      operating systems) to check that the information in the network port
      configuration file is consistent with the actual physical network
      ports and that the network ports are up.
   b. If an error is reported when you install the driver in compilation
      mode, run the gcc -v and c++ -v commands on the OS CLI. If the
      command output displays the corresponding version information, the
      GCC and C/C++ software is installed properly. Otherwise, install the
      GCC and C/C++ software first.
   c. Check the optical module type. If an Intel NIC and a non-Intel
      optical module are configured, the driver cannot be loaded and the
      network port is invisible.
   d. Reinstall the driver. Check that no errors are reported during the
      driver installation and check whether system logs record any failures
      when loading the driver.
5. Collect OS logs. For details, see 4.2 Collecting OS Logs.

Quick Recovery Method:
1. If a visible NIC port becomes invisible when the server is running, and
   services can be interrupted, power the server off and on. If the fault
   persists, go to 2.
2. Insert the NIC into another PCIe slot and check whether the fault is
   rectified.
   ● If the NIC is causing the fault, replace the NIC.
   ● If the PCIe slot is causing the fault, replace the mainboard.
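
Step 2 of the diagnosis procedure for "A network port is invisible" comes
down to searching the lspci output for Ethernet devices. A minimal Python
sketch of that filter follows. It works on captured text, so it can be run
away from the server. The function name is hypothetical, and the sample
lines only imitate the usual lspci layout (bus address, device class, device
name) with made-up devices.

```python
def ethernet_devices(lspci_output: str) -> list:
    """Return the lspci lines that mention an Ethernet device (case-insensitive)."""
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if "ethernet" in line.lower()
    ]


# Made-up capture for illustration; real output lists the actual devices.
sample_lspci = """\
00:1f.2 SATA controller: Example Vendor SATA Controller
02:00.0 Ethernet controller: Example Vendor 10-Gigabit Network Connection
02:00.1 Ethernet controller: Example Vendor 10-Gigabit Network Connection
"""

for device in ethernet_devices(sample_lspci):
    print(device)
```

An empty result corresponds to the "If no, go to 3" branch. A non-empty
result with no matching network port in the OS corresponds to step 4.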


Fault Symptom: A communication error occurs on a network port.

Diagnosis Procedure:
1. Check whether the network cable is connected properly to the network
   port.
2. Use the Computing Product Compatibility Checker to check whether the NIC
   type is compatible with the server board. Use Computing Product Firmware
   and Driver Mapping Checker to check whether the NIC firmware and driver
   versions match the OS. If they do not match, upgrade the NIC firmware
   and driver first.
3. To bring the network ports up, run the ifconfig ethN up command in Linux
   (the command may vary in different OSs). To check the link status of the
   required network ports, run the ethtool ethN command. To check whether
   IP addresses are set for the ports, run the ifconfig ethN command.
4. Run the ethtool -p ethN command in Linux (the command may vary in other
   OSs) to check whether the information in the network port configuration
   file of the rack server/Atlas 800 AI inference server (model 3010)/Atlas
   800 AI training server (model 9010) is consistent with the actual
   physical network ports, and check whether the network port status
   indicators are on and whether the network ports on the switch are up.
   NOTE
   The ethtool -p ethN command applies only to plug-in PCIe cards.
5. Check the network port configuration of the switch module by referring
   to E9000 Blade Server Mezzanine Module-Switch Module Interface Mapping
   Tool. The network ports on both sides must be up.
6. Check the settings of IP addresses, gateway addresses, VLANs, bondings,
   and uplink switch network ports.
7. Collect OS logs. For details, see 4.2 Collecting OS Logs.

Quick Recovery Method:
1. Use the ping command to check whether the server or other servers on the
   network have network faults.
   ● If the fault occurs on more than one server, check whether the
     external switching network is normal.
   ● If the fault occurs only on one server, go to 2.
2. Check the indicator to see the NIC port status. If the indicator is off,
   swap the optical module, optical cable, and uplink switch port of the
   faulty NIC port with those of a normal NIC port to determine whether any
   of these components is faulty. Then replace the faulty component.
3. If the NIC is causing the fault, restart the server when interruption
   will not affect services, and check whether the communication is normal.
   If the fault persists, power the server off and on. If the fault still
   persists, replace the NIC.
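
The ethtool ethN command named in the diagnosis procedure for "A
communication error occurs on a network port" prints the link settings of
the port, including whether a link is detected. The Python sketch below
parses saved output and reports the "Speed" and "Link detected" fields. The
function names are assumptions, the sample text is abridged, and field names
may differ between ethtool versions and drivers.

```python
def parse_ethtool_settings(output: str) -> dict:
    """Collect 'key: value' lines from ethtool output into a dictionary."""
    settings = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        # Skip headings such as "Settings for eth0:" that carry no value.
        if sep and value.strip():
            settings[key.strip()] = value.strip()
    return settings


def link_is_up(output: str) -> bool:
    """Return True only if the output reports 'Link detected: yes'."""
    return parse_ethtool_settings(output).get("Link detected", "").lower() == "yes"


# Abridged sample for illustration only.
sample_ethtool = """\
Settings for eth0:
        Speed: 10000Mb/s
        Duplex: Full
        Link detected: yes
"""

print(parse_ethtool_settings(sample_ethtool)["Speed"], link_is_up(sample_ethtool))
```

A port that reports no link usually points to the cable, optical module, or
switch port checks in the quick recovery method rather than to IP settings.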


Fault Symptom: A packet error or packet loss occurs on a network port.

Diagnosis Procedure:
1. Use the Computing Product Compatibility Checker to check whether the NIC
   type is compatible with the server board. Use Computing Product Firmware
   and Driver Mapping Checker to check whether the NIC firmware and driver
   versions match the OS. If they do not match, upgrade the NIC firmware
   and driver first.
2. Check whether the number of packet losses and errors on the network port
   is increasing. If there is no continuous increase, ignore this error.
3. Insert the NIC that you suspect to be faulty into another slot, and
   insert a normal NIC into the slot that you suspect to be faulty. Then
   check which of these is causing the fault.
4. Connect the suspect network cable to a normal server, connect a normal
   network cable to the suspect server, and check whether the fault is
   caused by the suspect network cable.
5. Switch the service traffic from the network port that you suspect to be
   faulty to a different network port. Then check whether the fault is
   caused by the network port.
6. To check parameters regarding the packet error or loss, run the ethtool
   -S ethN command in Linux (or similar in other operating systems).
7. Collect OS logs. For details, see 4.2 Collecting OS Logs.

Quick Recovery Method:
1. Check whether the packet loss occurs only on a single server. Run the
   ethtool -S ethN command to check the packet loss type and run the top
   command to check the system resource usage (software interrupts, CPU
   usage, and memory usage) and NIC traffic.
2. When you have the customer's permission to interrupt services, connect a
   PC to the port and check for packet loss. Connect the PC to other
   working ports, and check optical modules, optical cables, and uplink
   switches. Then replace or adjust components based on the actual
   situation.
3. If the NIC is causing the fault, restart the server when interruption
   will not affect services, and check whether the communication is normal.
   If the fault persists, power the server off and on. If the fault still
   persists, replace the NIC.
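
Step 2 of the diagnosis procedure for "A packet error or packet loss occurs
on a network port" asks whether the error and loss counters keep increasing.
One way to check is to capture the ethtool -S ethN output twice and compare
the counters. The Python sketch below does this on saved text. The function
names and keyword list are assumptions, and the counter names in the sample
(rx_dropped, rx_crc_errors) are examples only, because the statistics that
ethtool -S reports depend on the NIC driver.

```python
def parse_nic_stats(output: str) -> dict:
    """Parse 'name: integer' lines from ethtool -S output."""
    stats = {}
    for line in output.splitlines():
        name, sep, value = line.rpartition(":")
        value = value.strip()
        # The "NIC statistics:" heading has no number and is skipped.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def growing_counters(before: str, after: str,
                     keywords=("err", "drop", "discard", "miss")) -> dict:
    """Return error-like counters that increased between two captures."""
    old, new = parse_nic_stats(before), parse_nic_stats(after)
    return {
        name: new[name] - old.get(name, 0)
        for name in new
        if any(k in name.lower() for k in keywords)
        and new[name] > old.get(name, 0)
    }


# Two made-up captures taken some minutes apart.
first = "NIC statistics:\n     rx_packets: 1000\n     rx_dropped: 5\n     rx_crc_errors: 0\n"
second = "NIC statistics:\n     rx_packets: 2000\n     rx_dropped: 9\n     rx_crc_errors: 0\n"
print(growing_counters(first, second))
```

An empty dictionary across several captures matches the "no continuous
increase" case, in which the procedure says the error can be ignored.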


Fault Symptom: The performance of a network port does not meet
requirements.

Diagnosis Procedure:
1. Use the Computing Product Compatibility Checker to check whether the NIC
   type is compatible with the server board. Use Computing Product Firmware
   and Driver Mapping Checker to check whether the NIC firmware and driver
   versions match the OS. If they do not match, upgrade the NIC firmware
   and driver first.
2. Check whether the physical network port meets performance requirements.
3. Check whether the binding between the network port interrupt and CPU
   queue has been modified.
4. To check whether the TSO and GSO settings of the network port have been
   modified, run the ethtool -k ethN command in Linux (or equivalent in
   other operating systems).
5. To check whether the network port buffer information has been modified,
   run the ethtool -g ethN command in Linux (or equivalent in other
   operating systems).
6. Collect OS logs. For details, see 4.2 Collecting OS Logs.
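
Step 4 of the diagnosis procedure for "The performance of a network port
does not meet requirements" checks whether the TSO and GSO settings have
been modified. The Python sketch below reads saved ethtool -k ethN output
and reports the two settings so that they can be compared with a known-good
port. The function name is hypothetical. The feature names
tcp-segmentation-offload and generic-segmentation-offload are the labels
that ethtool normally prints for TSO and GSO, but verify them against the
output on the target OS.

```python
def offload_settings(output: str,
                     features=("tcp-segmentation-offload",
                               "generic-segmentation-offload")) -> dict:
    """Return {feature: True if 'on', else False} for the requested features."""
    result = {}
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        if sep and name in features:
            # Values look like "on", "off", or "off [fixed]".
            words = value.split()
            result[name] = bool(words) and words[0] == "on"
    return result


# Abridged, made-up sample for illustration.
sample_features = """\
Features for eth0:
rx-checksumming: on
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
generic-segmentation-offload: off
"""

print(offload_settings(sample_features))
```

Comparing the result from the slow port with the result from a port that
performs well shows quickly whether either setting has been changed.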

5.6.7 FC Controller Faults


Common FC Controller Faults and Handling Procedures
Diagnose and rectify FC controller faults according to the symptoms.

NOTE

For more fault symptoms and solutions, see the Computing Case Library. The Computing
Case Library is available only to Huawei engineers and partners.


Fault Symptom Handling Procedure

The storage device 1. Connect to the switch and run the brocade:
fails to identify the switchshow command to query port connection
host World Wide status.
Port Name 2. If the switch fails to obtain the host WWPN, the host
(WWPN). bus adapter (HBA) cannot register with the switch. In
this case, do as follows:
a. Check that the HBA and the processor connected
to the PCIe bus are installed properly.
b. (Optional) Check the mapping between the HBAs
and switch modules for E9000 and E6000 servers.
c. Check the FC link between the HBA and the switch
by checking the optical module power, optical
fiber, and optical module compatibility. If E9000
servers are used, check the HBA work mode.
d. Ensure that the lpfc driver and firmware matching
the E9000 are installed.
e. If multiple switches are connected, check whether
the switch connection mode (AG or TR) is correct.
f. Collect the OS message logs and check lpfc driver
information for faults.
g. Collect log information of the switches.
3. If the HBA is successfully registered with the switch,
the switch obtains the host WWPN, but the storage
cannot identify host WWPNs, rectify the fault as
follows:
a. Check the FC links (optical cables and modules)
between the switch and the storage device.
b. Check whether the HBA and the storage ports are
in the same zone.
c. Check whether the zone configurations are the
same for switches from the same vendor.
d. Collect the OS message logs and check lpfc driver
information for faults.
e. Collect the log information of switches.

The storage device has identified the HBA WWPN, but LUNs cannot be mapped
to the host.
1. Check whether the lpfc driver and firmware matching the E9000 have been
installed.
2. Collect the OS message logs and check lpfc driver information for faults.
3. Collect log information of the switches.
4. If no faults are identified, faults may exist on the
storage device or OS SCSI application layer. Contact
the OS or storage device vendor.

Some multipath links of LUNs are down.
1. Ensure that the installed lpfc driver and firmware match the E9000.
2. Check for error codes on FC links between the HBA and the storage device.
3. Collect the OS message log and check lpfc and
multipath driver information for faults.
4. Collect log information of the switches.
5. Contact the OS multipath driver vendor or storage
device vendor.

Poor data read/write performance of LUNs
1. Check whether the installed lpfc driver and firmware match the E9000.
2. Check for error codes on FC links between the HBA and the storage device.
3. Run the iostat command on the host to query the I/O delay and concurrent
I/O operations.
4. Collect the OS message log and check the lpfc driver information and the
I/O queue depth configured for the HBA driver.
5. Perform drive performance tests (read and write 100
GB and 100 MB files).
6. Contact storage analysis engineers.
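
The iostat check for poor LUN performance can be partly automated. The following Python sketch parses a device report in the style of iostat -x and lists devices whose average wait time exceeds a limit. Column names differ between sysstat versions, so the sketch looks columns up by header name; the sample text, the 20 ms limit, and the function names are assumptions for illustration only.

```python
def parse_iostat(text):
    """Parse device rows from `iostat -x` style text into dictionaries."""
    rows, header = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            header = None  # a blank line ends the current block
            continue
        if fields[0].startswith("Device"):
            header = [name.rstrip(":") for name in fields]
            continue
        if header and len(fields) == len(header):
            row = {"Device": fields[0]}
            for name, value in zip(header[1:], fields[1:]):
                row[name] = float(value)
            rows.append(row)
    return rows


def slow_devices(rows, await_limit_ms=20.0):
    """Return (device, worst wait in ms) for rows above the assumed limit."""
    slow = []
    for row in rows:
        worst = max(row.get("r_await", 0.0),
                    row.get("w_await", 0.0),
                    row.get("await", 0.0))
        if worst > await_limit_ms:
            slow.append((row["Device"], worst))
    return slow


# Illustrative sample; a real report contains more columns.
SAMPLE = """\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.00    0.00    0.50    3.20    0.00   95.30

Device   r/s   w/s  r_await  w_await  aqu-sz  %util
sda     12.0  30.0     0.80    35.20    1.10  42.00
sdb      5.0   2.0     0.40     0.90    0.01   1.50
"""

if __name__ == "__main__":
    print(slow_devices(parse_iostat(SAMPLE)))
```

In this sample layout, the wait columns correspond to the I/O delay and the aqu-sz column to the average number of queued (concurrent) requests.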

Quick Recovery from FC Controller Faults


Table 5-20 describes the common quick recovery methods and handling
procedures of FC controller faults.

Table 5-20 Quick recovery methods and handling procedures of FC controller faults

Fault Symptom Quick Recovery Method

All HBA links are disconnected.
1. Check the link redundancy status.
● If the links are redundant, reset the switch module
ports connected to the faulty HBAs, and go to 2.
● If the links are not redundant, go to 3.
2. Check whether the ports connected to the faulty
HBAs are functioning correctly.
● If yes, check whether the fault is rectified.
● If no, migrate all services, and safely power off the
server. Next, remove and reinstall the compute
node, and power on the server. If the fault persists,
apply for spare HBAs to replace the faulty ones.
3. Before contacting Huawei technical support, it is
recommended that you migrate services and collect
switch module logs, OS logs, LLD networking
information, and device time differences.

Storage services are affected but HBA links are normal.
1. Migrate all services, and safely power off the server. Next, remove and
reinstall the compute node, and power on the server. Then, check whether
the fault is rectified.
● If yes, no further action is required.
● If no, contact the storage vendor for quick fault
recovery.
2. Before contacting Huawei technical support, it is
recommended that you migrate services and collect
switch module logs, OS logs, LLD networking
information, and device time differences.

Storage LUN performance issues
1. Check for FC link error codes on the FC switch module. If error codes
exist, run the porterrshow command and determine the cause of the fault
based on the port mapping relationships.
● If any links between the switch modules and the
external switches are faulty, remove and reconnect
the optical cables and modules. If a link is still
faulty and spare components are available, replace
any related optical cables and modules and try
again.
● If a link between an HBA and switch module is
faulty, move the compute node to a working slot
to check whether the fault is caused by the HBA,
switch module, or backplane. Replace any faulty
modules as required.
2. Clear the error code count history, observe the error
codes for 10 minutes, test the performance, and
contact the storage vendor for quick fault recovery.
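
The "clear the count history, then observe for 10 minutes" step amounts to comparing two snapshots of per-port error counters. The following Python sketch shows that comparison. The port names, counter names, and values are hypothetical; they stand in for figures you would copy from two runs of the error counter query on the switch module.

```python
def growing_counters(before, after):
    """Return {port: {counter: increase}} for counters that grew."""
    result = {}
    for port, counters in after.items():
        base = before.get(port, {})
        grown = {
            name: value - base.get(name, 0)
            for name, value in counters.items()
            if value > base.get(name, 0)
        }
        if grown:
            result[port] = grown
    return result


# Hypothetical snapshots taken 10 minutes apart.
SNAPSHOT_START = {"port0": {"crc_err": 0, "enc_out": 0},
                  "port1": {"crc_err": 0, "enc_out": 0}}
SNAPSHOT_END = {"port0": {"crc_err": 0, "enc_out": 0},
                "port1": {"crc_err": 14, "enc_out": 0}}

if __name__ == "__main__":
    print(growing_counters(SNAPSHOT_START, SNAPSHOT_END))
```

A port that still accumulates errors after the cables and modules have been reseated or replaced is the one to trace through the port mapping relationships.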

5.6.8 Switch Module Faults


Switch Module Quick Recovery Method
Rectify switch module faults depending on the symptoms.

NOTE

For more fault symptoms and solutions, see the Computing Case Library. The Computing
Case Library is available only to Huawei engineers and partners.

Fault Symptom Quick Recovery Method

A switch module fails to be started. After logging in to the switch
module over SOL, the SOL screen displays the following: Can not get
config file from smm. Begin reboot ....
1. Switch between active and standby MM910s and check whether the switch
module can start normally.
● If yes, no further action is required.
● If no, go to 2.
2. Restart the baseboard management controller (BMC) of the switch module
and check whether the switch module can be started properly.
● If yes, no further action is required.
● If no, go to 3.
3. Upgrade the switch module software to the latest version. For details,
see "Upgrading Pass-Through and Switch Modules" in the FusionServer Pro
Blade Server Upgrade Guide.

A switch module fails to start. After logging in to the switch module over
SOL, the SOL screen displays the following: Ensure that the optical fibers
or cables are inserted on the same ports on the panel after the board
replacement. During system startup, do not power off or remove the board.
To continue the startup, press Y:.
1. If services are running, connect the network cable or the optical cable
to the switch module and press Y to continue.
2. If no services are running, press Y to continue.

After logging in to a switch module over SOL, the SOL screen shows Critical
Error! and only the meth port can be displayed by running display interface.
Upgrade the switch module software to a specified version or the latest
version depending on the displayed message.

A network storm occurs (the Multicast and Broadcast counters of a port
encounter a fault).
Perform one of the following operations:
● Run the following commands to disable the port with abnormal traffic:
[~HUAWEI]interface 10ge 1/17/1
[~HUAWEI-10ge 1/17/1]shutdown
● Disconnect the optical cable or network cable from the port that has
abnormal traffic.

A port is Up but no traffic passes through the port.
1. On the interface view, run the following commands to check whether the
fault is rectified:
[~HUAWEI]interface 10ge 1/17/1
[~HUAWEI-10ge 1/17/1]restart
● If yes, no further action is required.
● If no, go to 2.
2. Run the reboot command to restart the switch module.

Incorrect packets are generated (running the display interface command
shows that the value of Total Error in the Input area is not zero and keeps
increasing).
Run the display interface command and check CRC and Symbols.
1. If the values of CRC and Symbols are not zero, perform the following
operations:
● Ensure that the optical cables are connected properly to the faulty
switch module and the device it is directly connected to.
● Check whether any optical cables are damaged.
● Check whether the optical modules of the faulty switch module and the
device it is directly connected to are working properly.
● If there is a transmission device between the switch module and its
connected device, check the transmission device gateway for alarms.
2. If the values of CRC and Symbols are zero, run the reboot command to
restart the switch module.

5.6.9 OS Faults
OS Installation Faults
Diagnose and rectify faults related to OS installation depending on the symptoms.

NOTE

For more fault symptoms and solutions, see the Computing Case Library. The Computing
Case Library is available only to Huawei engineers and partners.

Possible Cause Diagnosis Procedure

Incompatible OS
Use Computing Product Compatibility Checker to check compatible OSs on the
server.

Incorrect installation method
Use Computing Product Compatibility Checker to check compatible OSs on the
server and view the OS installation description. For details about the OS
installation description, see the Huawei Server OS Installation Guide.

Installation process issue
1. Check whether the OS installation procedure is correct by referring to
the Huawei Server OS Installation Guide.
2. Check whether the OS installation requires a physical DVD drive or other
media.
3. Check whether the OS installation requires a special installation DVD,
for example, one integrated with drivers.
4. Check whether the OS installation DVD is an original from the
manufacturer or whether it has been modified by a third party.
5. Disconnect any external storage devices.
6. Ensure that the default BIOS settings are used.
7. Ask the OS vendor for installation support.

Drive identification issue
1. Ensure that the target drive is identified by the RAID controller, and
use Computing Product Compatibility Checker to check whether the target
drive is compatible with the server. Then check the BIOS to see whether the
target storage devices, including SATADOMs, microSD cards, and built-in USB
flash drives, are identified.
2. Check the RAID controller card model and determine whether to configure
RAID (LSI SAS1078, LSI SAS2108, LSI SAS2208, LSI SAS3008, LSI SAS2308, LSI
SAS3108, Avago SAS 3408, Avago SAS 3416iMR, Avago SAS 3416IT, Avago SAS
3508, Software RAID).
NOTE
For the V5 server/Atlas 800 AI inference server (model 3010)/Atlas 800 AI
training server (model 9010), the OS can be installed on the drives of the
PCIe RAID controller card.
3. Check the RAID array properties to ensure that the boot drive and the
target drive are the same or in the same RAID array.
4. Set the BIOS mode to UEFI if the drive capacity is over 2 TB.
NOTE
V1 and V3 servers do not support UEFI mode.
5. Check whether the drive is a 4K drive.
6. Check whether the loaded RAID controller card driver is correct.
7. Format the drive or reconfigure the RAID array.
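
The 2 TB threshold in step 4 matches the limit of the MBR partition table commonly used with legacy BIOS boot: it stores sector addresses in 32 bits, so with 512-byte logical sectors it can address at most 2 TiB. The following Python sketch works through that arithmetic. It assumes 512-byte logical sectors and an MBR layout in legacy mode; the function names are illustrative.

```python
SECTOR_SIZE_BYTES = 512        # assumed logical sector size
MBR_MAX_SECTORS = 2 ** 32      # MBR stores sector addresses in 32 bits


def mbr_limit_bytes():
    """Largest capacity an MBR partition table can address."""
    return MBR_MAX_SECTORS * SECTOR_SIZE_BYTES


def needs_uefi(drive_bytes):
    """True if the drive is larger than the MBR-addressable capacity."""
    return drive_bytes > mbr_limit_bytes()


if __name__ == "__main__":
    print(mbr_limit_bytes())          # bytes addressable through MBR
    print(needs_uefi(4 * 10 ** 12))   # a 4 TB drive
    print(needs_uefi(1 * 10 ** 12))   # a 1 TB drive
```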


OS Faults
If you have confirmed that faults are not caused by other factors, diagnose them
as follows:

Fault Symptom Diagnosis Method Conclusion

Fault symptom: The server is suspended or restarted.

Diagnosis method: Disable C state, P state, T state, and ASPM in the BIOS
and ensure that the server functions correctly.
Conclusion: The OS version does not support CPUs of the current platform.

Diagnosis method: Check whether the Kdump information contains crashed
process names or board vendor names. For example, FC_XX indicates an FC
device breakdown.
Conclusion: The built-in OS drivers are incompatible.

Diagnosis method: Check whether it is a PCIe card compatibility issue.
● There is a power supply issue. (A cat err alarm is generated on iMana 200
or iBMC.)
● The PCIe protocol is not supported.
● There is a driver issue.
Conclusion: The PCIe card is incompatible.

Diagnosis method: Check whether the breakdown screenshot contains CPUidle.
NOTE
The G2500 server does not currently support this method.
Conclusion: The OS kernel is incompatible with the hardware platform.

Diagnosis method: Use the iMana 200 or iBMC to locate the fault. For
example, determine whether the alarm was reported for the DIMM, drive, or
mainboard component.
Conclusion: Circuit hardware is faulty.

Diagnosis method: Check whether the system logs contain read-only file
system records, and use FusionServer Tools Toolkit or Smart Provisioning to
rate the drive. Decide whether to replace the drive based on the result.
Conclusion: A drive fault occurred.

Diagnosis method: Check whether an imana cat err alarm is displayed on
iMana 200. Use the fdm log of iMana 200 to locate the fault.
Conclusion: Hardware is faulty.

Diagnosis method: Check whether there is a Machine Check Exception issue.
Locate such a fault by checking /var/log/mce.log and error codes of serial
port Kdump information.
Conclusion:
● The hardware is faulty.
● The software or hardware interface setting is incorrect.

Diagnosis method: Collect the following information:
● For new servers, confirm the proportion of abnormal servers and check
whether normal and abnormal servers have the same configurations.
● For existing servers, confirm the number of servers that are not
functioning correctly, and check whether the issues occur under specific
circumstances.
● Check iMana 200 or iBMC for hardware alarms.
After collecting the preceding information, use FusionServer Tools Toolkit
or Smart Provisioning to check whether the issue occurs on a single server
or multiple servers.
Conclusion: Locate the fault based on the report.

Diagnosis method: Check whether a breakdown occurs on the server under
specific circumstances after software upgrades have been performed for
customer service software, database, middleware, kernel, BIOS, management
modules, iMana 200 or iBMC, or storage devices.
Conclusion:
● The new software version has bugs.
● Original interfaces are disabled for security purposes, causing issues.

Diagnosis method: Check whether the Kdump information of the kernel
breakdown screenshot periodically displays update_cpu_power, divide_error,
or timer_xx.
NOTE
The G2500 server does not currently support this method.
Conclusion: The OS has bugs or defects.

Diagnosis method: Check whether the Kdump information of the breakdown
screenshot non-periodically displays gethostbyname.
NOTE
The G2500 server does not currently support this method.
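
Several of the checks above are keyword searches in system logs or Kdump text. The following Python sketch scans log text for some of the keywords named in the table and attaches the matching conclusion as a hint. The sample log lines, the hint wording, and the function name are illustrative assumptions; a match only suggests where to look and does not confirm the fault.

```python
import re

# Keywords taken from the diagnosis methods above, paired with a short hint.
HINTS = [
    (re.compile(r"cpuidle", re.IGNORECASE),
     "OS kernel may be incompatible with the hardware platform"),
    (re.compile(r"update_cpu_power|divide_error|timer_\w+"),
     "possible OS bug or defect"),
    (re.compile(r"read-only file system", re.IGNORECASE),
     "possible drive fault; rate the drive"),
]


def scan_log(text):
    """Return (line number, hint) for every line matching a keyword."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for pattern, hint in HINTS:
            if pattern.search(line):
                findings.append((number, hint))
    return findings


# Made-up log lines for illustration.
SAMPLE_LOG = """\
kernel: EXT4-fs error (device sda2): Read-only file system
kernel: divide_error: 0000 [#1] SMP
kernel: usb 1-1: new high-speed USB device number 2
"""

if __name__ == "__main__":
    for number, hint in scan_log(SAMPLE_LOG):
        print(number, hint)
```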


6 Software and Firmware Upgrade

Table 6-1 lists the software and firmware to be upgraded and reference
documents of servers.

Table 6-1 Upgradeable software and firmware and reference documents


Server Series: Upgradable Software and Firmware

E9000:
● MM910: software, complex programmable logic device (CPLD), fan module
firmware, and online help
● Chassis intelligent display: software
● Compute node: iMana 200/iBMC, BIOS, and CPLD
● Switch module: iBMC, CPLD, daughter card CPLD, and switching plane
● Mezzanine card: firmware

Rack server: iMana 200/iBMC, BIOS, LCD, CPLD, and card firmware

X6800: iBMC, BIOS, HMM, CPLD, and card firmware

X6000: iMana 200/iBMC, BIOS, CPLD, and card firmware

X8000: iMana 200, BIOS, CPLD, and card firmware

FusionServer Pro G5500: iBMC, BIOS, HMM, CPLD, and FPGA firmware

Atlas 800 AI inference server (model 3010)/Atlas 800 AI training server
(model 9010): iBMC, BIOS, LCD, CPLD, and card firmware

Reference (applies to all server series)
● For details, see the upgrade guide.
To obtain the upgrade guide, perform the following steps:
1. Log in to Support > Intelligent Servers or Support > Ascend Computing.
2. Choose a server model to access the product page.
3. On the Documentation tab page, choose Installation & Upgrade > Upgrade
Guide.
4. View the required upgrade guide.
● To obtain the upgrade package, perform the following steps:
1. Log in to Support > Intelligent Servers or Support > Ascend Computing.
2. Choose a server model to access the product page.
3. Click the Software Download tab.
4. Select the latest patch version.
5. Download the required upgrade package.


7 Preventive Maintenance

About This Chapter


Preventive maintenance quickly detects, diagnoses, and rectifies server faults.
Obtain authorization from the customer before performing preventive
maintenance on Huawei servers.

NOTICE

Take protective measures to prevent ESD damage and any other damage to
servers during preventive maintenance.

7.1 Inspecting the Equipment Room Environment and Cable Layout


7.2 Inspecting Servers
7.3 Huawei Server Inspection Report

7.1 Inspecting the Equipment Room Environment and Cable Layout

7.1.1 Precautions
Familiarize yourself with the security icons listed in Table 7-1 before preventive
maintenance to reduce the chance of injury to yourself or damage to the
equipment. These security icons will be on some server components.


Table 7-1 Security icons

Icon Description

Indicates that removing the cover of this component can result


in an electric shock. To prevent an electric shock, do not remove
the cover of the component.
Warning: All components with this icon have electric shock risks
and there are no serviceable parts inside these components.

Indicates a hazard. Operation of the component may cause an


electric shock. There are no serviceable parts inside the
component, and therefore do not remove the cover of the
component.
Warning: To prevent an electric shock, do not remove the cover
of the component.

Indicates that this component operates at a high temperature


and touching it can result in burns.
To prevent burns, do not touch the component until it cools
down.

Indicates a hazard. Misoperations can damage the device or


cause personal injury.

Indicates that this device can cause personal injury or can fail to
operate properly if it is not externally grounded. Each end of a
ground cable should be connected to a different device, and the
devices must be connected to ground points.

Indicates that this device can cause personal injury or can fail to
operate properly if it is not internally grounded. Each end of a
ground cable should be connected to different device
components, and the device must be connected to a ground
point.

Indicates an ESD-sensitive area, in which devices can easily be


damaged. To prevent damage, do not touch devices with bare
hands when operating in this area, and take ESD measures, such
as wearing an ESD wrist strap or ESD gloves.

7.1.2 Inspecting the Equipment Room Environment


The environmental factors to inspect in an equipment room include the
temperature, relative humidity, altitude, and power supply conditions.

For details, see 7.3 Huawei Server Inspection Report.

7.1.3 Inspecting Cable Layout


Visually inspect the cable layout. Obtain the customer's permission before
removing or inserting cables.


To prevent any damage to the cables, take the following precautions before
inspecting the cable layout:
● Check that power cables meet the following requirements:
– The connector surface of each three-wire power ground cable is in a good
condition.
– All power cable types are correct.
– The insulation layer of each power cable is in a good condition.
● Keep cables slack and away from heat sources.
● Do not use excessive force to install or remove a cable.
● Install or remove a cable by holding its connectors.
● Do not twist or tear cables.
● Lay out and connect cables properly, and ensure that they are not in contact
with any components that are removable or replaceable.
For details, see 7.3 Huawei Server Inspection Report.

7.2 Inspecting Servers

7.2.1 Precautions
● Obtain the customer's consent before inspecting servers. Without customers'
written authorization, do not modify server configurations, power on or off
servers, remove or insert components, or change cables.
● Before inspecting servers, obtain the iMana 200 or iBMC IP address, the MM910
IP address, and the password of the root or Administrator user for each server
to be inspected. After inspecting servers, advise the customer to change the
password of the root or Administrator user as soon as possible.

7.2.2 Inspecting Indicators


The front and rear panels of Huawei servers provide indicators and buttons,
including the UID button/indicator, health status indicator, network port status
indicators, fan module indicators, and power button/indicator. You can observe the
indicators on a server to determine the server status. For details about the
indicator status and handling measures, see 5.5 Checking Indicators to Locate
Faults. For details about the check items, see 7.3 Huawei Server Inspection
Report.

Indicators on the Front Panel


Check the following indicators on the server front panel:
● Health status indicator
● UID button/indicator
● Power button/indicator
● Drive indicators
● Network port status indicator (on the front NICs or on the MM610 or
MM620)


Indicators on the Rear Panel


Check the following indicators on the server rear panel:

● Power indicator
● UID indicator
● Network port status indicator
● Fan module indicators
● E9000 switch module indicators
● E9000 management module indicators

7.2.3 Using SmartKit to Perform Health Inspection


Use SmartKit to inspect server health status. SmartKit provides the following
functions:

● Supports inspection for rack servers, high-density servers, blade servers,
KunLun servers, and heterogeneous servers, and allows users to export
inspection reports.
● Supports inspection for mainstream OSs including SLES, RHEL, CentOS,
VMware, Ubuntu, and Windows, and allows users to export inspection reports.
● Supports batch log collection for BMC and blade server management
modules, and supports SLES, RHEL, and CentOS mainstream versions.
● Supports batch upgrade for BMC, BIOS, CPLD, and Smart Provisioning
firmware of rack servers, high-density servers, blade servers, KunLun servers,
and heterogeneous servers.
● Supports firmware bundle upgrade by using the E9000 active management
module.
● Supports batch configuration for PSUs, BIOSs, BMCs, and RAID controller
cards of rack servers, high-density servers, blade servers, KunLun servers, and
heterogeneous servers.
● Supports batch configuration for E9000 management modules.
NOTE

Inspection and log collection do not modify data, collect service data, or affect services, and
will delete the collection scripts and files when finished.

For details about the supported server models and inspection operations, see the
FusionServer Tools 2.0 SmartKit User Guide.

7.2.4 Checking the System Status Through iBMC

Prerequisites
You can log in to the iBMC WebUI.

Procedure 1 (For iBMC V549 and Earlier)


Step 1 Log in to the iBMC WebUI. For details, see 8.9 Logging In to the iBMC WebUI.


Step 2 View system alarms and events.


1. On the menu bar of the iBMC WebUI, choose Alarm & SEL.
2. In the navigation tree, choose Current Alarms to view current alarms.
3. In the navigation tree, choose System Events to view system events.

Step 3 View the status of hardware, including drives, DIMMs, and sensors.
1. On the menu bar of the iBMC WebUI, choose Information.
2. In the navigation tree, choose System Info. On the right panel, click the
Storage tab and view hardware status information.
3. In the navigation tree, choose Real-Time Monitoring to view the CPU usage,
memory usage, and air intake vent temperature.
NOTE

– The RH5885 V3, RH5885H V3, and RH8100 V3 do not support display of the CPU
usage and memory usage.
– After iBMA 2.0 is installed and started on the server OS, the CPU usage is obtained
from the iBMA 2.0 and the CPU usage data is the same as the data collected on
the OS.
– If iBMA 2.0 is not installed on the server OS or iBMA 2.0 has not completely
started, the CPU usage data is obtained from the Intel Management Engine (ME).
The CPU usage is the average compute usage per second of all CPU cores
calculated by the CPU internal module.
– If iBMA 2.0 is not installed on the server OS, obtain the latest iBMA user guide and
software package, and install iBMA 2.0 by referring to the user guide.
4. In the navigation tree, choose Sensor Info to view the status of sensors.

----End

Procedure 2 (For iBMC V561 and Later or iBMC V3.01.00.00 and Later)
Step 1 Log in to the iBMC WebUI. For details, see 8.9 Logging In to the iBMC WebUI.

Step 2 View system alarms and events.


1. In the navigation tree, choose Maintenance > Alarm & SEL.
2. Click the Current Alarms tab to view the current alarms.
3. Click the System Events tab to view the system events.

Step 3 View the status of hardware, including drives, DIMMs, and sensors.
1. In the navigation tree, choose System > System Info. Click Memory to view
the detailed memory information.
2. In the navigation tree, choose System > System Info. Click Sensors Info to
view the sensor status.
3. In the navigation tree, choose System > Storage Management to view the
status of hardware such as system drive.
4. In the navigation tree, choose System > Performance Monitoring to view the
CPU usage, memory usage, and drive usage.

----End


7.3 Huawei Server Inspection Report


Inspection Information

Customer Information
Customer Name:
Equipment Room Address:          Equipment Room Name:
Equipment Room Director:         Phone Number:

Inspecting Party Information
Time of Inspection:
Inspected By:                    Phone Number:
Huawei Contact:                  Phone Number:

Service Hotline
Enterprise China Region: 4008229999
Enterprise global technical assistance center (TAC): Global Service Hotline
China region carrier TAC: 400830218 (customer service)/800830218/02986360000
Huawei engineers and partners: 8008303118/02981770177
Global carrier TAC: 02981770999

Inspecting the Equipment Room Environment


Equipment Room Environment Inspection Results

No. 1
Item: Operating temperature
Criteria: 10°C to 35°C (50°F to 95°F)
Result: □ Normal □ Abnormal
Brief description:

No. 2
Item: Storage temperature
Criteria: –40°C to +65°C (–40°F to +149°F)
Result: □ Normal □ Abnormal
Brief description:

No. 3
Item: Temperature change rate
Criteria: 20°C/h (36°F/h)
Result: □ Normal □ Abnormal
Brief description:

No. 4
Item: Operating humidity
Criteria: 8% to 90% RH (non-condensing)
Result: □ Normal □ Abnormal
Brief description:

No. 5
Item: Storage humidity
Criteria: 5% to 95% RH (non-condensing)
Result: □ Normal □ Abnormal
Brief description:

No. 6
Item: Operating altitude
Criteria: ≤ 3050 m
Result: □ Normal □ Abnormal
Brief description:

No. 7
Item: Power supply
Criteria:
● AC input: 100 V to 240 V AC at 50 or 60 Hz
● DC input:
– –57.6 V to –38.4 V DC (voltage range), –48 V DC (nominal voltage)
– 192 V to 288 V DC (voltage range), 240 V DC (nominal voltage)
– 260 V to 400 V DC (voltage range), 380 V DC (nominal voltage)
Result: □ Normal □ Abnormal
Brief description:
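
The range criteria in this table can be checked mechanically once the readings are recorded. The following Python sketch compares measured values with the operating ranges listed above and returns Normal or Abnormal for each item. The dictionary keys, the function name, and the sample readings are illustrative assumptions; only the numeric ranges come from the table.

```python
# Operating ranges from the inspection table: (minimum, maximum).
# None means the table gives no bound on that side.
CRITERIA = {
    "operating_temperature_c": (10.0, 35.0),
    "operating_humidity_rh": (8.0, 90.0),
    "operating_altitude_m": (None, 3050.0),
    "ac_input_v": (100.0, 240.0),
}


def inspect(measurements):
    """Return {item: "Normal" or "Abnormal"} for each measured item."""
    results = {}
    for item, value in measurements.items():
        low, high = CRITERIA[item]
        within = ((low is None or value >= low)
                  and (high is None or value <= high))
        results[item] = "Normal" if within else "Abnormal"
    return results


# Made-up readings for illustration.
READINGS = {
    "operating_temperature_c": 38.0,
    "operating_humidity_rh": 45.0,
    "operating_altitude_m": 1200.0,
    "ac_input_v": 220.0,
}

if __name__ == "__main__":
    for item, result in inspect(READINGS).items():
        print(item, result)
```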

Inspecting Cable Layout


Cable Layout Inspection Results

No. 1
Item: General cable layout
Criteria: Route the service cables and power cables along the two sides of
the cabinet respectively.
Result: □ Normal □ Abnormal
Brief description:

No. 2
Item: Power cable layout
Criteria:
● Power cables are not tangled and are arranged in an orderly fashion.
● Power cables are arranged in the same way as those in any existing
cabinets.
● No power cables are coiled.
Result: □ Normal □ Abnormal
Brief description:

No. 3
Item: Service cable layout
Criteria:
● Service cables are not tangled and are arranged in an orderly fashion.
● Service cables are arranged in the same way as those in any existing
cabinets.
Result: □ Normal □ Abnormal
Brief description:

No. 4
Item: Optical cable layout
Criteria: Optical cables are not coiled too tightly, bent at acute angles,
or stretched.
Result: □ Normal □ Abnormal
Brief description:

No. 5
Item: Ground cable connection
Criteria: Ground cables are connected properly.
Result: □ Normal □ Abnormal
Brief description:

No. 6
Item: Cable labels
Criteria: Cable labels are properly attached. The information on the labels
is legible, correct, and easy to understand.
Result: □ Normal □ Abnormal
Brief description:

No. 7
Item: Power cable connection
Criteria: Power cables are connected to power sockets properly.
Result: □ Normal □ Abnormal
Brief description:

No. 8
Item: Signal cable connection
Criteria: Signal cables and data cables are connected to devices such as
servers and switches properly.
Result: □ Normal □ Abnormal
Brief description:

Inspecting Servers
View the inspection report generated by SmartKit to check server health status. An
item has passed the inspection if the value of Result for the item is OK in the
report.

Server Inspection Results

No. 1
Item: iMana 200/iBMC information
Criteria: Server health status logs contain no alarm information.
Result: □ Normal □ Abnormal
Brief description:

No. 2
Item: Management module information
Criteria: Server health status logs contain no alarm information.
Result: □ Normal □ Abnormal
Brief description:


Inspection Conclusions and Suggestions


Huawei's preventive maintenance engineers will perform a comprehensive
inspection of your Huawei servers to quickly detect any potential problems. These
engineers will then submit a detailed inspection report, and suggestions, to help
improve your service availability.
If you receive inspection results, please provide your comments and suggestions in
the following Customer's Inspection Comments and Suggestions table:

Inspection Conclusions and Suggestions

Inspected By:          Phone Number:          Date:

Customer's Inspection Comments and Suggestions

Inspected By:          Phone Number:          Date:


8 Common Operations

8.1 Obtaining a Product SN


8.2 Using iMana 200 to Collect Information in Batches
8.3 Using iBMC to Collect Information in Batches
8.4 Using the MM910 WebUI to Collect Information in Batches (for Versions
Earlier Than U54 2.20)
8.5 Using the MM910 WebUI to Collect Information in Batches (for U54 2.20 or
Later)
8.6 Using the FusionDirector WebUI to Collect Information in Batches
8.7 Using the MM510 CLI to Collect Information (FusionServer Pro G5500)
8.8 Logging In to the iMana 200 WebUI
8.9 Logging In to the iBMC WebUI
8.10 Logging In to the Web Tools of the MX510
8.11 Logging In to the MM910 WebUI
8.12 Logging In to the FusionDirector WebUI
8.13 Logging In to the MM510 CLI
8.14 Logging In to the RMC CLI
8.15 Logging In to a Server Over a Network Port by Using PuTTY
8.16 Logging In to a Server Over a Serial Port by Using PuTTY
8.17 Logging In to a Compute Node, Passthrough Module, or Switch Module by
Using the SOL Function of the MM910
8.18 Logging In to a Compute Node, Passthrough Module, or Switch Module by
Using the SOL Function of the MM920/MM921
8.19 Using WinSCP to Transfer Files
8.20 Configuring an FTP Server
8.21 Using SFTP to Transfer Files


8.1 Obtaining a Product SN


Overview
A serial number (SN) or equipment serial number (ESN) uniquely identifies a
server and is required when you apply to Huawei for technical support.

NOTE

Check the first two digits of the product SN before reading the following information.
● If the first two digits of the product SN are 02 or 03, see Figure 8-1.
NOTE

A product SN starts with SN or ESN. The following is an example.

Figure 8-1 SN example

No.  Description
1    SN ID (two characters).
2    Material identification code (four characters).
3    Vendor code (two characters). The value 10 indicates Huawei and other
     values indicate outsourcing vendors.
4    Year and month (two characters).
     ● The first character indicates the year. The digits 1 to 9 indicate
       2001 to 2009, the letters A to H indicate 2010 to 2017, the letters
       J to N indicate 2018 to 2022, and the letters P to Y indicate 2023
       to 2032.
       NOTE
       The years from 2010 are represented by upper-case letters excluding
       I, O, and Z because the three letters are similar to the digits 1,
       0, and 2.
     ● The second character indicates the month. Digits 1 to 9 indicate
       January to September, and letters A to C indicate October to
       December.
5    Sequence number (six characters).
6    RoHS compliance (one character). Y indicates environmentally friendly
     processing.
7    Internal model, that is, product name.

● If the first two digits are 21, see Figure 8-2.


NOTE

A product SN starts with SN or ESN. The following is an example.

Figure 8-2 ESN example

No.  Description
1    SN ID (two characters), which is 21.
2    Material identification code (eight digits), that is, processing code.
3    Vendor code (two characters). The value 10 indicates Huawei and other
     values indicate outsourcing vendors.
4    Year and month (two characters).
     ● The first character indicates the year. The digits 1 to 9 indicate
       2001 to 2009, the letters A to H indicate 2010 to 2017, the letters
       J to N indicate 2018 to 2022, and the letters P to Y indicate 2023
       to 2032.
       NOTE
       The years from 2010 are represented by upper-case letters excluding
       I, O, and Z because the three letters are similar to the digits 1,
       0, and 2.
     ● The second character indicates the month. Digits 1 to 9 indicate
       January to September, and letters A to C indicate October to
       December.
5    Sequence number (six characters).
6    RoHS compliance (one character). Y indicates environmentally friendly
     processing.
7    Internal model, that is, product name.
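
The year-and-month field described in the SN tables can be decoded mechanically. The following Python sketch is illustrative only: the function name decode_year_month is hypothetical, and the code applies nothing beyond the mapping stated in the tables (digits 1 to 9 for 2001 to 2009, letters A to Y without I and O for 2010 to 2032, and digits 1 to 9 plus letters A to C for the months).

```python
# Year letters in order, skipping I and O (Z is also unused): A=2010 ... Y=2032.
YEAR_LETTERS = "ABCDEFGHJKLMNPQRSTUVWXY"
MONTH_LETTERS = "ABC"  # A=October, B=November, C=December


def decode_year_month(code):
    """Decode the two-character year-and-month field of a product SN.

    Hypothetical helper; it only applies the mapping given in the SN tables.
    Returns a (year, month) tuple.
    """
    if len(code) != 2:
        raise ValueError("the year-and-month field must be two characters")
    year_char, month_char = code[0].upper(), code[1].upper()

    if year_char in "123456789":
        year = 2000 + int(year_char)
    elif year_char in YEAR_LETTERS:
        year = 2010 + YEAR_LETTERS.index(year_char)
    else:
        raise ValueError(f"invalid year character: {year_char!r}")

    if month_char in "123456789":
        month = int(month_char)
    elif month_char in MONTH_LETTERS:
        month = 10 + MONTH_LETTERS.index(month_char)
    else:
        raise ValueError(f"invalid month character: {month_char!r}")
    return year, month


print(decode_year_month("H8"))  # (2017, 8)
```

The characters I, O, and Z are rejected because the tables exclude them from the year encoding.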


Obtaining a Product SN
Use one of the following methods to obtain a product SN:
● Use SmartKit.
Use the server inspection function of SmartKit to obtain ESNs in batches. For
details about the product SN, see "Asset Inspection Information" > "Board SN"
in the inspection report.
● View the product label.
A product label is attached to each Huawei server. You can view the product
label to obtain its ESN. The product label position varies with the Huawei
server model. For details, see the user guide of a specific server.
– Figure 8-3 shows the product SN of a rack server.

Figure 8-3 Product SN of a rack server

– Figure 8-4 shows the product SN of an Atlas 800 AI inference server
(model 3010).

Figure 8-4 Product SN of an Atlas 800 AI inference server (model 3010)

– Figure 8-5 shows the product SN of an Atlas 800 AI training server
(model 9010).


Figure 8-5 Product SN of an Atlas 800 AI training server (model 9010)

– Figure 8-6 shows the product SN of an X6800. In Figure 8-6, (1) is the
product label of the server, and (2) is the product label of a server node.

Figure 8-6 Product SN of an X6800

– Figure 8-7 shows the product SN of an E9000. In Figure 8-7, (1) is the
product label of the server, and (2) is the product label of a compute
node.


Figure 8-7 Product SN of an E9000

The product labels of switch modules and MM910s are on their ejector
levers.
● Use the iMana 200 WebUI.
NOTE

iMana 200 applies to the following products:


● Rack servers: RH1288 V2, RH2265 V2, RH2268 V2, RH2285 V2, RH2285H V2,
RH2288 V2, RH2288E V2, RH2288H V2, RH2485 V2, RH2488 V2, RH5885 V2,
RH5885 V3, and RH5885H V3
● X6000 server node: XH310 V2, XH311 V2, XH320 V2, XH321 V2, and XH621 V2
● X8000 server node: DH310 V2, DH320 V2, DH321 V2, DH620 V2, DH621 V2,
DH626 V2, and DH628 V2
● E9000 compute node: CH121, CH140, CH220, CH221, CH222, CH240, CH242, and
CH242 V3

a. Log in to the iMana 200 WebUI. For details, see 8.8 Logging In to the
iMana 200 WebUI.
b. On the Overview page, view the product SN of the server, as shown in
Figure 8-8.


Figure 8-8 Product SN

● Use the iBMC WebUI.


NOTE

The iBMC applies to the following products:


● Rack server: RH1288A V2, RH2288A V2, 5288 V3, RH1288 V3, RH2288 V3,
RH2288H V3, RH5885 V3, RH5885H V3, RH8100 V3, 1288H V5, 1288X V5,
2288 C V5, 2288H V5, 2288X V5, 2298 V5, 2488 V5, 2488H V5, 5288 V5, and 5885H V5
● Atlas 800 AI inference server (model 3010)
● Atlas 800 AI training server (model 9010)
● X6000 server node: XH310 V3, XH321 V3, XH321 V5, XH321L V5
● X6800 server node: XH620 V3, XH622 V3, XH628 V3, XH628 V5
● E9000 compute node: CH121 V3, CH121L V3, CH140 V3, CH140L V3, CH220 V3,
CH222 V3, CH225 V3, CH226 V3, CH121 V5, CH121L V5, CH221 V5, CH225 V5,
CH242 V5
● Kunlun server: 9008 V5
– For iBMC V549 and earlier
i. Log in to the iBMC WebUI. For details, see 8.9 Logging In to the
iBMC WebUI.
ii. Choose Information > Information Summary/Overview/Summary.
(The menu varies depending on software versions.) View the product
SN of the server, as shown in Figure 8-9.

Figure 8-9 Product SN


– For iBMC V561 and later or iBMC V3.01.00.00 and later


i. Log in to the iBMC WebUI. For details, see 8.9 Logging In to the
iBMC WebUI.
ii. In the navigation tree, choose System > System Info. Click Product
Info to view the product serial number, as shown in Figure 8-10.

Figure 8-10 Product SN

● Use the MM910 WebUI.


NOTE

This method applies only to E9000 servers whose MM910 version is (U54) 2.20 or
later.

a. Log in to the MM910 WebUI. For details, see 8.11 Logging In to the
MM910 WebUI.
b. Choose Chassis Information > Manufacturing Information and view
the product SN of the server, as shown in Figure 8-11.

Figure 8-11 Product SN

c. Choose Chassis Information > Compute Node Slot Number >
Manufacturing Information and view the SN of the compute node, as
shown in Figure 8-12.


Figure 8-12 Product SN

● Use the FusionDirector WebUI.


NOTE

● This method applies only to E9000 servers whose management module is the
MM920/MM921.
● Before the operations, add the MM920/MM921 to FusionDirector.

a. Log in to the FusionDirector WebUI. For details, see 8.12 Logging In to
the FusionDirector WebUI.
b. Choose Menu > Compute > Hardware > Chassis.
c. In the displayed chassis list, click a chassis name to access the chassis
details page.
d. Click the Overview tab to view the chassis SN, as shown in Figure 8-13.

Figure 8-13 Product SN

e. Click the Device tab and click Server, Management Module, and Switch
Module respectively to view the SNs of the compute node, management
module, and switch module, as shown in Figure 8-14.


Figure 8-14 Product SN

8.2 Using iMana 200 to Collect Information in Batches


This method applies only to servers and blades. To collect logs of switch modules
in batches, use the MM910 WebUI.

Procedure
Step 1 Use PuTTY to log in to the server. For details, see 8.15 Logging In to a Server
Over a Network Port by Using PuTTY or 8.17 Logging In to a Compute Node,
Passthrough Module, or Switch Module by Using the SOL Function of the
MM910.

Step 2 On the iMana 200 CLI, run the imtool command (for versions earlier than 7.01) or
the ipmcset -t maintenance -d imtool command (for 7.01 and later versions).
Information similar to the following is displayed:
root@BMC:/#ipmcset -t maintenance -d imtool
tar: removing leading '/' from member names
Tar result information success.
iMana:/->

If the following information is displayed, log collection is successful.


tar: removing leading '/' from member names
Tar result information success.

Step 3 Use a cross-platform file transfer tool to connect to the iMana 200 IP address.

In this document, WinSCP is used as the cross-platform file transfer tool. For
details, see 8.19 Using WinSCP to Transfer Files.

Step 4 Download the tar.gz package in the /tmp directory on iMana 200 to a directory on
the local PC. See Figure 8-15.

Figure 8-15 WinSCP window


----End
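
The version-dependent command in Step 2 and the success check can be sketched in Python as follows. This is a minimal illustration, not part of any Huawei tool: the function names are hypothetical, and the code assumes that the iMana 200 version is written as "major.minor" with numeric parts (for example, 6.98 or 7.01).

```python
IMTOOL_OLD = "imtool"                            # versions earlier than 7.01
IMTOOL_NEW = "ipmcset -t maintenance -d imtool"  # 7.01 and later versions
SUCCESS_LINE = "Tar result information success."


def collection_command(imana_version):
    """Return the collection command for an iMana 200 version string.

    Assumption: the version has the form "major.minor" with numeric parts.
    """
    major, minor = (int(part) for part in imana_version.split(".")[:2])
    return IMTOOL_OLD if (major, minor) < (7, 1) else IMTOOL_NEW


def collection_succeeded(cli_output):
    """Return True if the CLI output contains the documented success line."""
    return any(line.strip() == SUCCESS_LINE for line in cli_output.splitlines())


print(collection_command("6.98"))  # imtool
print(collection_command("7.01"))  # ipmcset -t maintenance -d imtool
```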

8.3 Using iBMC to Collect Information in Batches


Scenarios

Table 8-1 One-click information collection by the iBMC for each server
Server Series                    One-Click Information           Description
                                 Collection
E9000                            Compute node information        For details about one-click E9000
                                                                 information collection, see
                                                                 "Information Collection" in MM910
                                                                 Management Module V100R001 User
                                                                 Guide.
E6000                            N/A
Rack server/Atlas 800 AI         Server information
inference server (model 3010)/
Atlas 800 AI training server
(model 9010)
X6000                            Compute node information
X8000
X6800
FusionServer Pro G5500           Server information, MM510
                                 management module information,
                                 and heterogeneous node
                                 information

Procedure 1 (For iBMC V549 and Earlier)


Step 1 Log in to the iBMC WebUI. For details, see 8.9 Logging In to the iBMC WebUI.
Step 2 Choose Information > Overview > Shortcuts > One-Click Info Collection, as
shown in Figure 8-16.


Figure 8-16 One-click information collection

Step 3 Click One-Click Info Collection.


When information collection is complete, a file named dump_info.tar.gz is
generated.
Step 4 Click the file name and download the file to the local PC as prompted.

----End
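
After downloading dump_info.tar.gz, you may want to confirm that the archive is readable before sending it to technical support. The following Python sketch uses only the standard tarfile module. The function name is hypothetical, and because this document does not describe the contents of the archive, the sketch simply lists whatever members it finds.

```python
import tarfile


def list_archive_members(path):
    """Return the member names of a .tar.gz archive.

    Raises tarfile.ReadError if the file is not a readable gzip tar archive,
    which usually indicates an incomplete or corrupted download.
    """
    with tarfile.open(path, mode="r:gz") as archive:
        return archive.getnames()


# Example usage (the path is an assumption; use your own download location):
# for name in list_archive_members("dump_info.tar.gz"):
#     print(name)
```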

Procedure 2 (For iBMC V561 and Later or iBMC V3.01.00.00 and Later)
Step 1 Log in to the iBMC WebUI. For details, see 8.9 Logging In to the iBMC WebUI.
Step 2 Choose Home. The Home page is displayed, as shown in Figure 8-17 or Figure
8-18.


Figure 8-17 iBMC home page (iBMC V561 and later)

Figure 8-18 iBMC home page (iBMC V3.01.00.00 and later)

Step 3 Click One-Click Info Collection in the Shortcuts area to download the collected
maintenance information.
----End


8.4 Using the MM910 WebUI to Collect Information in
Batches (for Versions Earlier Than U54 2.20)
Operation Scenario
For versions earlier than (U54) 2.20, use the MM910 WebUI to collect logs in
batches.

Procedure
Step 1 Log in to the MM910 WebUI. For details, see 8.11 Logging In to the MM910
WebUI.

Step 2 Choose System Management on the menu bar, choose SEL Information in the
navigation tree, and click the SMM tab and then the One touch collect tab.

The log collection page is displayed.

Step 3 On the log collection page, choose Collect All > Start.

Log collection takes about 20 minutes. When log collection is complete, a log file
named one_touch_info_all.tar.gz is displayed in the File Name area.

Step 4 Click the log file name and download it to the local PC as prompted.
NOTE

For MM910 earlier than (U54) 2.20, you need to collect logs of both the active and standby
HMMs.

----End

8.5 Using the MM910 WebUI to Collect Information in
Batches (for U54 2.20 or Later)
Operation Scenario
For (U54) 2.20 or later, use the MM910 WebUI to collect logs in batches.

Procedure
Step 1 Log in to the MM910 WebUI. For details, see 8.11 Logging In to the MM910
WebUI.

Step 2 Choose System Management > Information Collection, and set log collection
parameters.
● Select MM for Collected from.
● Select One-click full collection for Collected content.

Step 3 Click Collect.


Log collection takes about 20 minutes. When log collection is complete, a log file
named one_touch_info_all.tar.gz is displayed in the File Name area.
Step 4 In the dialog box displayed, download the log file to the local PC as prompted. (In
some browsers, the log file is automatically saved in the default directory.)

----End
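
Which MM910 procedure applies depends only on whether the (U54) version is earlier than 2.20. The following Python sketch shows that decision. The function names are hypothetical, and the code assumes that the version is given as "major.minor" with numeric parts.

```python
def mm910_collection_section(version):
    """Map an MM910 (U54) version such as "2.20" to the matching section.

    Assumption: the version has the form "major.minor" with numeric parts.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return "8.5" if (major, minor) >= (2, 20) else "8.4"


def needs_both_hmms(version):
    """Versions earlier than (U54) 2.20 require logs from the active and standby HMMs."""
    return mm910_collection_section(version) == "8.4"


print(mm910_collection_section("2.19"))  # 8.4
print(mm910_collection_section("2.20"))  # 8.5
```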

8.6 Using the FusionDirector WebUI to Collect
Information in Batches
Operation Scenario
If the management module is MM920 or MM921, you can use the FusionDirector
WebUI to collect logs.

Prerequisites
The MM920 or MM921 has been added to FusionDirector.

Procedure
Step 1 Log in to the FusionDirector WebUI. For details, see 8.12 Logging In to the
FusionDirector WebUI.
Step 2 Choose Menu > Alarms and Logs > Log. The Log page is displayed.
Step 3 Click Collect Log. In the displayed dialog box, click OK.
The Task area is displayed on the right of the page, showing the progress and
status of the log collecting task.
When the task is complete, a message indicating success is displayed.
Step 4 Click Export Log to export the log information to a local directory.

----End

8.7 Using the MM510 CLI to Collect Information
(FusionServer Pro G5500)
The MM510 is the management module of the FusionServer Pro G5500.

NOTE

Use the MM510 CLI to collect information about the MM510 and heterogeneous nodes in
batches. To collect information about the server, MM510, and heterogeneous nodes in
batches, use the iBMC. For details, see 8.3 Using iBMC to Collect Information in Batches.

Prerequisites
You have logged in to the CLI of the MM510. For details, see 8.13 Logging In to
the MM510 CLI.


Example
# One-click information collection
iBMC:/->ipmcget -d diaginfo
Download diagnose info to /tmp/ successfully.

8.8 Logging In to the iMana 200 WebUI


Operation Scenario
This section describes how to log in to the iMana 200 WebUI by using a browser
on the local PC. This section uses a PC running Windows and Internet Explorer 8.0
as an example.

Prerequisites
Conditions

If the remote control function is required, ensure that the OS, browser, and Java
Runtime Environment (JRE) of the required versions have been installed on the
local PC. Table 8-2 shows the system configuration requirements of the local PC.

Ensure that the local PC meets the following networking conditions:

● The local PC is properly connected to the iMana 200 management network
port on the server by using a network cable.
● The IP addresses of the local PC and the iMana 200 management network
port are on the same network segment.
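
Whether two addresses are on the same network segment can be checked with the Python standard library. The following sketch is illustrative: the function name is hypothetical, and the local PC address and the subnet mask are assumed values, not defaults defined by this document.

```python
import ipaddress


def same_segment(pc_ip, mgmt_ip, netmask):
    """Return True if both IPv4 addresses fall in the same network.

    The network is derived from the PC address and the subnet mask.
    """
    pc_network = ipaddress.ip_network(f"{pc_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(mgmt_ip) in pc_network


# 192.168.2.100 is the example management address used in this section;
# the PC addresses and the mask below are assumptions.
print(same_segment("192.168.2.10", "192.168.2.100", "255.255.255.0"))  # True
print(same_segment("192.168.3.10", "192.168.2.100", "255.255.255.0"))  # False
```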

Table 8-2 Local PC configuration requirements

OS                               Software                      Version
● Windows 7 32-bit               Browser  Internet Explorer    Internet Explorer 8.0/10.0
● Windows 8 32-bit               Browser  Mozilla Firefox      Mozilla Firefox 9.0/23.0
● Windows Server 2008 32-bit     Browser  Google Chrome        Chrome 13.0/31.0
                                 JRE                           1.6.0 U25/1.7.0 U40 (32-bit)

● Windows 7 64-bit               Browser  Internet Explorer    Internet Explorer 8.0/10.0
● Windows 8 64-bit               Browser  Mozilla Firefox      Mozilla Firefox 9.0/23.0
● Windows Server 2008 R2 64-bit  Browser  Google Chrome        Chrome 13.0/31.0
● Windows Server 2012 64-bit     JRE                           1.6.0 U25/1.7.0 U40 (64-bit)

● RHEL 4.3 64-bit                Browser  Mozilla Firefox      Mozilla Firefox 9.0/23.0
● RHEL 6.0 64-bit                JRE                           JRE 1.6.0 U25/1.7.0 U40

Mac OS X v10.7                   Browser  Safari               Safari 5.1
                                 Browser  Mozilla Firefox      Mozilla Firefox 9.0/23.0
                                 JRE                           JRE 1.6.0 U25/1.7.0 U40

NOTE

If the JRE does not meet requirements, download and install a proper Java version by
referring to Table 8-2.

Data
Table 8-3 lists the data required before you log in to the iMana 200 WebUI.

Table 8-3 Required data

Type                     Parameter   Description                                    Example
User login information   User name   Username for logging in to the iMana 200       root
                                     WebUI.
                         Password    User password for logging in to the iMana      Huawei12#$
                                     200 WebUI.
                                     NOTE
                                     The default iMana 200 user is root. The root
                                     user belongs to the administrator group. The
                                     default password is Huawei12#$.

Procedure
Step 1 Connect the local PC to the iMana 200 management network port on the server
by using a crossover cable or twisted pair cable.
Figure 8-19 shows the network diagram.


Figure 8-19 Network diagram

Step 2 Open Internet Explorer on the local PC.

Step 3 In the address box, enter the iMana 200 address in the format of https://IP
address of the iMana 200 management network port on the server (for example,
https://192.168.2.100).

Step 4 Press Enter.

The iMana 200 login page is displayed, as shown in Figure 8-20.

NOTE

● If the message "There is a problem with this website's security certificate" is displayed,
click Continue to this website (not recommended).
● If the Security Alert dialog box indicating a certificate error is displayed, click Yes.

Figure 8-20 Logging in to the iMana 200 WebUI


Step 5 On the iMana 200 login page, enter the username and password.
NOTE

The user account will be locked after five consecutive login failures caused by incorrect
passwords. If your user account is locked, log in again 5 minutes later.

Step 6 Select This iMana from the Log on to drop-down list.


NOTE

You can click Reset to clear the information entered on the User Login page.

Step 7 Click Log In.

The Overview page is displayed. The login username is displayed in the upper
right corner of the page.

----End

8.9 Logging In to the iBMC WebUI


Scenarios
Log in to the iBMC WebUI by using a browser on the local PC.

Prerequisites
Conditions

The local PC that uses the remote control function must have the Java runtime
environment (JRE) and the browser of the required version. For details, see the
corresponding iBMC User Guide.

Ensure that the local PC meets the following networking conditions:

● The local PC is connected to the iBMC management network port on the
server by using a network cable.
● The IP addresses of the local PC and the iBMC management network port are
on the same network segment.

Data

Table 8-4 lists the required data before you log in to the iBMC WebUI.

Table 8-4 Required data

Type                     Parameter   Description                                    Example
User login information   User name   Username for logging in to the iBMC WebUI.     root
                         Password    Password for logging in to the iBMC WebUI.     Huawei12#$
                                     NOTE
                                     The default username for logging in to the
                                     iBMC WebUI of V2 & V3 servers is root, and
                                     the default password is Huawei12#$.
                                     The default user name for logging in to the
                                     iBMC WebUI of V5 servers/Atlas 800 AI
                                     inference servers (model 3010)/Atlas 800 AI
                                     training servers (model 9010) is
                                     Administrator, and the default password is
                                     Admin@9000.

Procedure 1 (For iBMC V549 and Earlier)


This section uses a PC running Windows 7 and Internet Explorer 8.0 as an
example.

Step 1 Connect the local PC to the iBMC management network port on the server by
using a crossover cable or twisted pair cable.

Figure 8-21 shows the network diagram.

Figure 8-21 Network diagram

Step 2 Open Internet Explorer on the local PC.

Step 3 In the address box, enter the IP address of the server iBMC management network
port (for example, https://192.168.2.100) and press Enter.

The iBMC login page is displayed, as shown in Figure 8-22.


NOTE

● If the message "There is a problem with this website's security certificate" is displayed,
click Continue to this website (not recommended).
● If the Security Alert dialog box indicating a certificate error is displayed, click Yes.


Figure 8-22 Logging in to iBMC

Step 4 On the login page, enter the username and password for logging in to the iBMC
WebUI.
NOTE

The user account will be locked after five consecutive login failures with wrong passwords.
If your user account is locked, log in again 5 minutes later.

Step 5 Select This iBMC from the Domain drop-down list.


Step 6 Click Log In.
The Overview page is displayed, showing the username in the upper right corner.

----End

Procedure 2 (For iBMC V561 and Later or iBMC V3.01.00.00 and Later)
This section uses a PC running Windows 7 and Internet Explorer 11 as an example.

Step 1 Open Internet Explorer, enter the iBMC management network port address
https://ipaddress/ in the address box, and press Enter.
NOTE

Enter an IPv6 address in brackets or an IPv4 address directly. For example:


● IPv4 address: 192.168.100.1
● [IPv6 address]: [fc00::64]

The security alert window is displayed, as shown in Figure 8-23.
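
The address rule in the NOTE for Step 1 (an IPv6 address in brackets, an IPv4 address as is) can be sketched with the Python standard library. The function name ibmc_url is hypothetical. The sketch only builds the URL string and does not contact the iBMC.

```python
import ipaddress


def ibmc_url(address):
    """Build the iBMC WebUI URL for an IPv4 or IPv6 management address.

    IPv6 addresses are wrapped in square brackets; IPv4 addresses are not.
    Raises ValueError if the input is not a valid IP address.
    """
    ip = ipaddress.ip_address(address)
    host = f"[{ip}]" if ip.version == 6 else str(ip)
    return f"https://{host}/"


print(ibmc_url("192.168.100.1"))  # https://192.168.100.1/
print(ibmc_url("fc00::64"))       # https://[fc00::64]/
```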


Figure 8-23 Security warning

NOTE

If a website security alert is displayed, you can ignore this message or perform either of the
following operations to suppress this alert:
● Import a trust certificate and a root certificate to the iBMC. For details, see "Importing
the iBMC Trust and Root Certificates" in the corresponding iBMC User Guide.
● If no trust certificate is available and network security can be ensured, add the iBMC to
the Exception Site List on Java Control Panel or reduce the Java security level. This
operation, however, poses security risks. Exercise caution when performing this
operation.

Step 2 Click Continue to this website (not recommended).


The login page is displayed, as shown in Figure 8-24.


Figure 8-24 iBMC login page

Step 3 On the login page, enter the username and password for logging in to the iBMC
WebUI.
Step 4 Select Local iBMC from the Domain drop-down list.
Step 5 Click Log In.
The Home page is displayed.

----End


8.10 Logging In to the Web Tools of the MX510


Operation Scenario
Log in to the Web Tools of the FC switching plane MX510 by using a browser on
the local PC to configure and manage this plane.
This section applies to the CX311, CX911, and CX915.

Data
The following data is required:
● IP address of the server to be connected
● User name for logging in to the server to be connected. The default username
is admin.
● User password for logging in to the server to be connected. The default user
password is Huawei12#$.

Tool
Java plug-in: This tool is third-party software. You need to prepare it yourself.
JRE 1.8 or later is required.

Procedure
Step 1 Connect a client (for example, a local PC) to the management network port of the
management module by using a network cable.
Step 2 In the displayed security alert dialog box, click Allow to allow web access.
Step 3 In the displayed security alert dialog box, select Do not block this program.
Step 4 In the address box of the PC browser, enter https://IP address of the FC switching
plane and press Enter.
The login dialog box is displayed, as shown in Figure 8-25.


Figure 8-25 Login dialog box

Step 5 Enter the username and password, and click Add Fabric.

Step 6 In the dialog box displayed, click Yes.

The Web Tools home page is displayed, as shown in Figure 8-26.

Figure 8-26 Web Tools home page

----End

8.11 Logging In to the MM910 WebUI


Scenarios
Log in to the MM910 WebUI by using a browser on the local PC to configure and
manage the chassis, MM910s, compute nodes, storage nodes, switch modules,
passthrough modules, power supply units (PSUs), and fan modules.


Impact on the System


This operation has no adverse impact on the system.

NOTE

● The user account will be locked if an incorrect password is entered five consecutive
times. The user account is automatically unlocked after 5 minutes and cannot be
forcibly unlocked. If you attempt to enter a password again within the 5 minutes, the lock
duration is reset to 5 minutes regardless of whether the entered password is correct.
● The WebUI of the standby MM910 (displayed as "This is the standby MM.") does not
display component installation status. After logging in to the WebUI of the standby
MM910, you can view the status of the active MM910 and perform the following
operations for the standby MM910: Set the DHCP parameters and a static IP address,
set and query the thresholds and hysteresis of threshold sensors, collect system
operating information, and upgrade the management software. To perform other
operations, log in to the WebUI of the active MM910.
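
The account lockout rule in the first NOTE item can be modeled as a small state machine, which makes the effect of retrying during the lock period easier to see. The following Python sketch is a model for illustration only, not the MM910 implementation. The class name is hypothetical, and clearing the failure counter once the lock expires is an assumption that this document does not state.

```python
class LockoutModel:
    """Illustrative model of the MM910 WebUI account lockout rule."""

    MAX_FAILURES = 5       # consecutive incorrect passwords before locking
    LOCK_SECONDS = 5 * 60  # lock duration in seconds

    def __init__(self):
        self.failures = 0
        self.locked_until = None

    def attempt(self, now, password_ok):
        """Simulate a login attempt at time `now` (seconds); return True on success."""
        if self.locked_until is not None and now < self.locked_until:
            # Any attempt during the lock period restarts the 5-minute timer,
            # whether or not the entered password is correct.
            self.locked_until = now + self.LOCK_SECONDS
            return False
        if self.locked_until is not None:
            # The lock has expired. Assumption: the failure counter is cleared.
            self.locked_until = None
            self.failures = 0
        if password_ok:
            self.failures = 0
            return True
        self.failures += 1
        if self.failures >= self.MAX_FAILURES:
            self.locked_until = now + self.LOCK_SECONDS
        return False
```

In this model, five failures at seconds 0 to 4 lock the account until second 304. A correct password entered at second 100 is rejected and moves the unlock time to second 400, so the practical advice is to wait the full 5 minutes before retrying.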

Data
You have obtained the following data:
● Username for logging in to the server to be connected. The default username
is root.
● User password for logging in to the server to be connected. The default user
password is Huawei12#$.

Procedure
Step 1 Connect the Ethernet port on the local PC to the MGMT ports on the active and
standby MM910s over the local area network (LAN).

NOTICE

If the active MM910 MGMT port has been connected to the network by using a
network cable and the client needs to be directly connected to the MM910, do not
directly disconnect the network cable from the active MM910 MGMT port that has
been connected to the network. Otherwise, an active/standby MM910 switchover
will be triggered, which may cause network interruption. You are advised to
connect the client to the active MM910 STACK port in the chassis by using a
network cable. If the active MM910 STACK port in the chassis has been connected
to the MGMT port in another chassis, use an idle active MM910 STACK port in
another chassis.

Figure 8-27 shows the network connections.


Figure 8-27 Network connections

NOTE

The MM910 management port can be provided in two modes.


● An Ethernet port on the switch module in slot 2X or 3X can be used as the MM910
management port. However, if a CX910/CX911/CX912/CX913/CX915 is in slot 2X or 3X,
only a GE port can be used; if a CX920 is in slot 2X or 3X, only a 10GE port can be
used; if a CX916/CX916L/CX930 is in slot 2X or 3X, only a 25GE port can be used. If a
CX110/CX111/CX310/CX311/CX312/CX320/CX710 is in slot 2X or 3X, any port can be
used. The CX910/CX911/CX912/CX913 are not recommended for providing the
management network port of the management module.
● The MGMT port on the MM910 panel can be used as the MM910 management port.
● For MM910 (U54) 2.25 or earlier, the port on the switch module in slot 2X or 3X is
used as the management network port by default. In this case, do not connect the
MGMT port on the MM910 and the port on the switch module to the same network
at the same time. Otherwise, a network storm occurs and the network is interrupted.
For MM910 (U54) 2.26 or later, the MGMT port is used as the management network
port by default. For details about how to query the version, see the MM910
Management Module V100R001 User Guide.
● You can run the outportmode command to change the mode in which the MM910
management port is provided. For details, see "Querying and Setting the Network Port
Out Mode (outportmode)" in the MM910 Management Module V100R001
Command Reference.
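
The port restrictions and version defaults in the NOTE above can be captured in a small lookup. The following Python sketch is illustrative only: the function and variable names are hypothetical, the data is taken from the NOTE, and the version is assumed to be written as "major.minor" with numeric parts.

```python
# Port types usable as the MM910 management port, by switch module model
# in slot 2X or 3X (data taken from the NOTE above).
ALLOWED_PORTS = {
    "GE": {"CX910", "CX911", "CX912", "CX913", "CX915"},
    "10GE": {"CX920"},
    "25GE": {"CX916", "CX916L", "CX930"},
    "any": {"CX110", "CX111", "CX310", "CX311", "CX312", "CX320", "CX710"},
}
# Models that are not recommended for providing the management network port.
NOT_RECOMMENDED = {"CX910", "CX911", "CX912", "CX913"}


def usable_port_type(model):
    """Return the port type that the given switch module model can provide."""
    model = model.upper()
    for port_type, models in ALLOWED_PORTS.items():
        if model in models:
            return port_type
    raise KeyError(f"model not covered by the NOTE: {model}")


def default_mgmt_port(mm910_version):
    """Return the default management port for an MM910 (U54) version.

    Assumption: the version has the form "major.minor" with numeric parts.
    """
    major, minor = (int(part) for part in mm910_version.split(".")[:2])
    if (major, minor) >= (2, 26):
        return "MGMT port"
    return "switch module port (slot 2X or 3X)"


print(usable_port_type("CX916L"))  # 25GE
print(default_mgmt_port("2.25"))   # switch module port (slot 2X or 3X)
```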

Step 2 Set the IP address and subnet mask or route information for the local PC so that
the local PC can communicate with the MM910 properly.
Step 3 On the menu bar of Internet Explorer, choose Tools > Internet Options.
The Internet Options dialog box is displayed.

NOTE

This section uses a PC running Windows 7 and Internet Explorer 8.0 as an example.


Step 4 Click the Connections tab and click LAN Settings.


The LAN Settings dialog box is displayed.
Step 5 In the Proxy server area, deselect the Use a proxy server for your LAN check box.
Step 6 Click Yes.
The LAN Settings dialog box closes.
Step 7 Click Yes.
The Internet Options dialog box closes.
Step 8 Open Internet Explorer, enter https://MM910 floating IP address in the address
box, and press Enter.
For example, enter https://10.85.4.77 in the address box.
"There is a problem with this website's security certificate" is displayed.
Step 9 Click Continue to this website (not recommended).
The page for logging in to the HMM WebUI is displayed.
Step 10 Set the parameters. See Figure 8-28 and Figure 8-29.
● Language: Select English.
● User name: Enter the username for login. The default username is root.
● Password: Enter the user password for login. The default password is
Huawei12#$.
● Login To: Select This Machine/computer in most cases. Select LDAP if the
system manages domain users by using an active directory (AD) server.

Figure 8-28 Logging in to the HMM WebUI (MM910 (U54) 2.20 or later)

Figure 8-29 Logging in to the HMM WebUI (MM910 earlier than (U54) 2.20)


Step 11 Click Log In.


The HMM WebUI is displayed, as shown in Figure 8-30 or Figure 8-31.

Figure 8-30 HMM WebUI (MM910 (U54) 2.20 or later)

Figure 8-31 HMM WebUI (MM910 earlier than (U54) 2.20)

----End


8.12 Logging In to the FusionDirector WebUI


Operation Scenario
Use Google Chrome to log in to the FusionDirector WebUI. On the FusionDirector
WebUI, you can manage chassis components and cluster devices.

Prerequisites
Conditions

● Google Chrome 55 or later is required for logging in to FusionDirector.


● You have obtained the IP address, username, and password of FusionDirector.
The default username of the FusionDirector WebUI is Administrator, and the
password is Admin@9000.
● If you log in as an LDAP domain user, ensure that the LDAP server
communicates with FusionDirector properly, the LDAP function has been
enabled on FusionDirector, and the LDAP server and user group information
has been configured.
● If you use the DNS domain name to log in, ensure that the DNS server
communicates with FusionDirector properly and the domain name and DNS
server are configured on FusionDirector.

Precautions

● FusionDirector supports a maximum of 100 concurrent users.


● The default timeout interval of FusionDirector is 30 minutes. If you do not
perform any operation on the WebUI within 30 minutes, the account is
automatically logged out. You need to enter the username and password to
log in again.
● If the number of login failures caused by incorrect user names and passwords
reaches the value specified in the system security policy, the account is
automatically locked. When the lockout duration reaches the value specified
in the security policy, the user is automatically unlocked.
● To ensure system security, change the default password upon the first login
and change the password periodically.
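
The 30-minute timeout rule above can be illustrated with a minimal sketch. The function name and the timestamp handling are assumptions made for illustration only; they do not describe how FusionDirector tracks sessions internally.

```python
from datetime import datetime, timedelta

# Default idle timeout stated in this section (30 minutes).
IDLE_TIMEOUT = timedelta(minutes=30)


def session_expired(last_activity: datetime, now: datetime,
                    timeout: timedelta = IDLE_TIMEOUT) -> bool:
    """Return True if no operation was performed within the timeout.

    Hypothetical helper: it models the rule only and is not
    FusionDirector code.
    """
    return now - last_activity >= timeout


login_time = datetime(2020, 9, 25, 9, 0)
print(session_expired(login_time, datetime(2020, 9, 25, 9, 29)))  # False
print(session_expired(login_time, datetime(2020, 9, 25, 9, 30)))  # True
```

If the function returns True, the user would need to enter the username and password again.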

Procedure
Step 1 Connect the Ethernet port of the PC to a management network port of the active
or standby MM920/MM921 over the LAN.

The 10GE optical port and MGMT port on the MM920/MM921 panel are
management network ports. This section uses the MGMT port as an example.

Figure 8-32 shows the network connections.


Figure 8-32 Network connections

Step 2 Set an IP address and a subnet mask or add route information for the PC so that
the PC can communicate with FusionDirector.
Step 3 Open the browser, enter https://ipaddr in the address box, and press Enter.
NOTE

● ipaddr indicates the address used to access the FusionDirector WebUI. It can be in either
of the following formats:
– IPv4 address in dotted-decimal format XXX.XXX.XXX.XXX.
– Fully qualified domain name (FQDN) of FusionDirector.
● The browser may display a message indicating that the website has a security certificate
error. Ignore this error and continue the login if the IP address is correct.
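
The two accepted formats of ipaddr can be checked before the address is entered in the browser. The following is a minimal sketch: the function name webui_url and the loose FQDN pattern are assumptions for illustration, and the sample values are placeholders.

```python
import ipaddress
import re

# Loose FQDN label pattern (assumption): letters, digits, and hyphens,
# 1 to 63 characters, not starting or ending with a hyphen.
_FQDN_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def webui_url(ipaddr: str) -> str:
    """Build https://ipaddr from an IPv4 address or an FQDN."""
    try:
        # Raises a ValueError subclass if ipaddr is not dotted-decimal IPv4.
        ipaddress.IPv4Address(ipaddr)
        return f"https://{ipaddr}"
    except ValueError:
        pass
    labels = ipaddr.rstrip(".").split(".")
    looks_like_fqdn = (
        len(labels) >= 2
        and all(_FQDN_LABEL.match(label) for label in labels)
        and not labels[-1].isdigit()  # rejects malformed IPv4 such as 999.1.1.1
    )
    if looks_like_fqdn:
        return f"https://{ipaddr}"
    raise ValueError(f"not an IPv4 address or FQDN: {ipaddr!r}")


print(webui_url("192.168.2.10"))
print(webui_url("fd.example.com"))
```

The check is purely syntactic. It does not confirm that the address belongs to FusionDirector or that the DNS server can resolve the name.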

Step 4 Enter the login information.


Table 8-5 describes the information required on the login page.


Figure 8-33 Login page

Table 8-5 Login parameters


Parameter      Description

User name      FusionDirector supports the following user names:
               ● Local users: The username is a string of 6 to 32 characters.
               ● LDAP users: The username can contain a maximum of 255
                 characters.

Password       Specifies the password of the user. For security purposes, change
               the password periodically.

Domain Name    ● If you log in as a local user, select Local.
               ● If you log in as an LDAP user, select LDAP.
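
The username length limits in Table 8-5 can be expressed as a short check. This is a sketch only: the function name is hypothetical, the minimum length of 1 for LDAP users is an assumption (the table states only the maximum), and character rules are not covered.

```python
def username_length_ok(username: str, domain: str) -> bool:
    """Check the username length limits from Table 8-5.

    domain is "Local" or "LDAP", matching the Domain Name options.
    """
    if domain == "Local":
        return 6 <= len(username) <= 32
    if domain == "LDAP":
        # Assumption: at least one character; the table gives only the maximum.
        return 1 <= len(username) <= 255
    raise ValueError(f"unknown domain: {domain!r}")


print(username_length_ok("Administrator", "Local"))
print(username_length_ok("root", "Local"))
```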

Step 5 Click Log In.


The FusionDirector Dashboard is displayed, as shown in Figure 8-34.

NOTE

● If the username or password is incorrect, you need to enter a verification code in the
second login attempt. If the verification code is not clear, refresh the
verification code.
● If you enter incorrect passwords three consecutive times, the account will be locked
for 5 minutes. If the account is locked, try again later or contact the administrator.


Figure 8-34 Dashboard page

----End
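
The lockout rule in the note under Step 5 (three consecutive incorrect passwords, 5-minute lock) can be modeled as follows. The class LoginGuard is a hypothetical illustration of the rule, not FusionDirector code. Ignoring attempts made during the lockout is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

MAX_FAILURES = 3                      # consecutive incorrect passwords
LOCK_DURATION = timedelta(minutes=5)  # lockout period stated in the note


class LoginGuard:
    """Hypothetical model of the account lockout rule."""

    def __init__(self) -> None:
        self.failures = 0
        self.locked_until: Optional[datetime] = None

    def is_locked(self, now: datetime) -> bool:
        return self.locked_until is not None and now < self.locked_until

    def record_attempt(self, success: bool, now: datetime) -> None:
        if self.is_locked(now):
            return  # assumption: attempts during the lockout are ignored
        if success:
            self.failures = 0  # a successful login resets the counter
            return
        self.failures += 1
        if self.failures >= MAX_FAILURES:
            self.locked_until = now + LOCK_DURATION
            self.failures = 0


guard = LoginGuard()
start = datetime(2020, 9, 25, 9, 0)
for _ in range(3):
    guard.record_attempt(False, start)
print(guard.is_locked(start + timedelta(minutes=4)))  # True
print(guard.is_locked(start + timedelta(minutes=5)))  # False
```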

8.13 Logging In to the MM510 CLI


The MM510 is the management module of the FusionServer Pro G5500.

Prerequisites
When logging in to the HMM CLI, ensure that:
● If you log in to the CLI over SSH, a maximum of five concurrent users are
supported.

● To log in to the CLI over the network port, you must connect the network port
on the configuration terminal to the network port on the server by using a
network cable, and ensure that the IP addresses of the two network ports are
on the same network segment.
● To log in to the CLI over the serial port, you must connect the serial ports of
the terminal and the server by using a serial cable.

Login Method
● Login over SSH
● Login over the local serial port
NOTE

● The HMM provides one default user Administrator, and the default password is on
the product nameplate.
● The system locks a user account if the user enters incorrect passwords five
consecutive times. The user is automatically unlocked 5 minutes later, or an
administrator can unlock the user on the CLI.
● For security purposes, change the initial password after the first login and change
your password periodically.

Logging In over SSH


The Secure Shell (SSH) protocol provides secure remote login and other secure
network services over an insecure network.
The method for logging in to the CMC CLI over SSH varies according to the client
operating system:
● If the client uses Linux:
a. Connect the client to the management network port on the server.
b. Run the ssh ipaddress command on the terminal tool (for example, shell)
to log in to the CLI. (In the command, ipaddress indicates the IP address
of the management network port.)
NOTE

At the initial startup of the HMM, wait for about 3 minutes before you log in to the CLI.
● If the client uses Windows:
a. Download and install the SSH client communication tool.
b. Connect the client to the management network port on the server.
c. Enter the IP address, username, and password of the management
network port on the client communication tool.
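
On a Linux client, the ssh ipaddress step can also be scripted. The sketch below assumes the OpenSSH client is installed and uses the standard user@host form to pass the username. The function names and the sample IP address are placeholders for illustration.

```python
import subprocess
from typing import List


def build_ssh_command(ipaddress: str, username: str) -> List[str]:
    """Return the argument list for `ssh username@ipaddress`."""
    return ["ssh", f"{username}@{ipaddress}"]


def login(ipaddress: str, username: str) -> int:
    """Start an interactive SSH session and return its exit status."""
    return subprocess.call(build_ssh_command(ipaddress, username))


# Sample values only: replace them with the management network port IP
# address and a valid username.
print(build_ssh_command("192.168.2.10", "Administrator"))
```

Calling login() opens the session in the current terminal, and the password is then entered at the SSH prompt.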

Logging In over a Serial Port


1. Connect the serial cable.
2. Log in to the CLI by using the HyperTerminal and set the following
parameters:
– Bits per second: 115200
– Data bits: 8

– Parity: None
– Stop bits: 1
– Flow control: None
Figure 8-35 shows the parameters to be specified.

Figure 8-35 HyperTerminal properties

3. Enter the username and password after the connection is established.

8.14 Logging In to the RMC CLI


Operation Scenario
Log in to the rack management controller (RMC) CLI.
Two login methods are available:
● SSH
SSH provides secure remote login and other secure network services over an
insecure network.
To log in to the RMC CLI over SSH, connect a PC to the RMC management
network port by using a network cable.
● Login over the local serial port

Prerequisites
The RMC is operating properly.


Data
● IP address of the RMC management network port. The default IP address is
192.168.2.100.
● RMC user names and passwords
The RMC provides four default users:
– User root (default password: Huawei12#$)
– User admin (default password: Huawei12#$)
– User operator (default password: Huawei12#$)
– User taobao (default password: Huawei12#$)

Tool
A terminal tool (for example, PuTTY) has been installed on the PC. This tool is
third-party software. You need to prepare it by yourself. PuTTY 0.60 or later is
required for login over a serial port.

Document
For details about the RMC, see the X8000 Server RMC Command Reference.

Logging In to the RMC CLI over a Serial Port


Step 1 Connect the PC to the RMC serial port by using a serial cable.
Step 2 On the PC, double-click PuTTY.exe.
The PuTTY Configuration window is displayed.
Step 3 Set Connection type to Serial, as shown in Figure 8-36.


Figure 8-36 PuTTY Configuration (Serial)

Step 4 Set the login parameters.

The following are examples of the parameters:

● Serial Line to connect to: COM1


● Speed (baud): 38400
● Data bits: 8
● Stop bits: 1
● Parity: None
● Flow control: None

Step 5 Click Open.

The PuTTY window is displayed, prompting "login as:" for you to enter a user
name.

Step 6 Enter a user name and password.

After login, the RMC command prompt root@RMC:/ is displayed.

----End
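
The serial settings in Step 4 can also be passed to PuTTY on the command line with its -serial and -sercfg options. The following sketch builds the argument list. The function name is hypothetical, and putty.exe is assumed to be on the search path.

```python
from typing import List

# PuTTY -sercfg codes: lowercase letter for parity, uppercase for flow control.
PARITY_CODES = {"None": "n", "Odd": "o", "Even": "e", "Mark": "m", "Space": "s"}
FLOW_CODES = {"None": "N", "XON/XOFF": "X", "RTS/CTS": "R", "DSR/DTR": "D"}


def putty_serial_args(line: str, baud: int, data_bits: int = 8,
                      stop_bits: str = "1", parity: str = "None",
                      flow: str = "None") -> List[str]:
    """Build PuTTY arguments equivalent to the serial settings in the GUI."""
    cfg = ",".join([str(baud), str(data_bits), PARITY_CODES[parity],
                    stop_bits, FLOW_CODES[flow]])
    return ["putty.exe", "-serial", line, "-sercfg", cfg]


# Example values from Step 4: COM1 at 38400 baud, 8 data bits, 1 stop bit,
# no parity, no flow control.
print(putty_serial_args("COM1", 38400))
# ['putty.exe', '-serial', 'COM1', '-sercfg', '38400,8,n,1,N']
```

The resulting list can be passed to subprocess.Popen on the PC to open the same session that Steps 2 to 5 open through the GUI.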


Logging In to the RMC over the Management Network Port


Step 1 Connect the PC to the RMC management network port by using a network cable.

Step 2 On the PC, double-click PuTTY.exe.

The PuTTY Configuration window is displayed.

Step 3 Set Connection type to SSH, as shown in Figure 8-37.

Figure 8-37 PuTTY Configuration (SSH)

Step 4 In the Host Name (or IP address) text box, enter the IP address of the RMC
management network port.

Step 5 Click Open.

The PuTTY window is displayed, prompting "login as:" for you to enter a user
name.

Step 6 Enter a user name and password.

After login, the RMC command prompt root@RMC:/ is displayed.

----End


8.15 Logging In to a Server Over a Network Port by Using PuTTY

Scenarios
Use PuTTY to remotely log in to the server over a local area network (LAN) and to
configure and maintain the server.

NOTE

The server in this section can be a management module, compute node, or switching plane.

Prerequisites
Conditions
The PC and the server or MM910/MM920/MM921 management network port
have been connected by using a network cable.
Data
You have obtained the following data:
● IP address of the server to be connected
● User name and password for logging in to the server to be connected
Software Tools
PuTTY.exe: This tool is third-party software. You need to prepare it by yourself.

Procedure
Step 1 Set an IP address and a subnet mask or add route information for the PC so that
the PC can properly communicate with the server.
You can run the ping Server IP address command on the PC CLI to check the
communication between the PC and the server.
Step 2 Double-click PuTTY.exe.
The PuTTY Configuration window is displayed, as shown in Figure 8-38.


Figure 8-38 PuTTY Configuration

Step 3 Set the login parameters.


Set parameters as follows:
● Host Name (or IP address): Enter the IP address of the server to be logged in
to, for example, 192.168.2.10.
● Port: Retain the default value 22.
● Connection type: Retain the default value SSH.
● Close window on exit: Retain the default value Only on clean exit.
NOTE

Configure Host Name and Saved Sessions, and click Save. You can double-click the saved
record under Saved Sessions to log in to the server the next time.

Step 4 (Optional) After logging in to the Ethernet plane by using PuTTY, if you fail to
delete characters on the CLI by using the Backspace key, choose Terminal >
Keyboard, and select Control-H under The Backspace key, as shown in Figure
8-39.


Figure 8-39 PuTTY Configuration

Step 5 Click Open.


The PuTTY window is displayed, prompting "login as:" for you to enter a user
name.

NOTE

● If this is your first login to the server, the PuTTY Security Alert dialog box is displayed.
Click Yes to proceed.
● If an incorrect user name or password is entered, you must set up a new PuTTY session.

Step 6 Enter a user name and password.


If the login is successful, the server host name is displayed on the left of the
prompt.

----End
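
The ping check in Step 1 confirms only that the server responds on the network. As a complementary check, the sketch below tests whether the SSH port (22 by default, as set in Step 3) accepts TCP connections. The function name and the 3-second timeout are assumptions for illustration.

```python
import socket


def ssh_port_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Sample address from Step 3: replace it with the IP address of the server.
print(ssh_port_reachable("192.168.2.10"))
```

A False result suggests checking the IP address, subnet mask, or route information set in Step 1 before PuTTY is opened.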


8.16 Logging In to a Server Over a Serial Port by Using PuTTY

NOTE

By default, the server serial port is the OS serial port. For details about how to redirect the
server serial port, see "Querying and Redirecting the Serial Port (serialdir)" in the iBMC
User Guide.

Scenarios
Use PuTTY to log in to the server over a serial port in either of the following
scenarios:

● The server is configured for the first time at a new site.


● A remote connection to the server cannot be established.
NOTE

The server in this section can be a management module, compute node, or switching plane.

Prerequisites
Conditions

● A PC is connected to the server by using a serial cable.


● PuTTY 0.60 or later has been installed.

Data

You have obtained the user name and password for logging in to the server to be
connected.

Software Tools

PuTTY.exe: This tool is third-party software. You need to prepare it by yourself.


PuTTY 0.60 or later is required for login over a serial port.

Procedure
Step 1 Double-click PuTTY.exe.

The PuTTY Configuration window is displayed.

Step 2 In the navigation tree on the left, choose Connection > Serial.

Step 3 Set the login parameters.

The following are examples:

● Serial line to connect to: COMN


● Speed (baud): 115200
● Data bits: 8

● Stop bits: 1
● Parity: None
● Flow control: None
In COMN, N indicates the serial port number, and the value is an integer.
Step 4 In the navigation tree, choose Session.
Step 5 Set Connection type to Serial, as shown in Figure 8-40.

Figure 8-40 PuTTY Configuration

Step 6 Click Open.


The PuTTY window is displayed.
Step 7 Enter a user name and password.
If the login is successful, the server host name is displayed on the left of the
prompt.

----End
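
In Step 3, the serial line is written as COMN, where N is an integer. The sketch below shows one way to validate such a name before it is entered in PuTTY. The function name is hypothetical, and the pattern assumes uppercase COM followed by a positive integer without leading zeros.

```python
import re

# Assumption: uppercase "COM" followed by a positive integer, for example COM3.
_COM_PORT = re.compile(r"^COM([1-9][0-9]*)$")


def com_port_number(name: str) -> int:
    """Return N from a serial line name of the form COMN."""
    match = _COM_PORT.match(name)
    if match is None:
        raise ValueError(f"expected COM followed by an integer, got {name!r}")
    return int(match.group(1))


print(com_port_number("COM3"))  # 3
```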


8.17 Logging In to a Compute Node, Passthrough Module, or Switch Module by Using the SOL Function of the MM910

Operation Scenario
You can use the Serial over LAN (SOL) function of the management module to
access a compute node, passthrough module, or switch module in a chassis for
remote maintenance of the E9000.

Prerequisites
Conditions

● You have logged in to the MM910 CLI by using the floating IP address of the
MM910.
● There is no jumper cap over the pins on the mainboard of the compute node,
passthrough module, or switch module.

Data

You have obtained the following data:

● User name and password for logging in to the management module. The
default user name of the MM910 is root, and the default password is
Huawei12#$.
● User name and password for logging in to the compute node to be
connected. The default user name is root, and the password is Huawei12#$.
● Password for logging in to the passthrough module or switch module to be
connected. The default password is Huawei12#$.

Procedure
Step 1 Use an SSH tool and the floating IP address of the MM910 to log in to the
MM910 CLI.

In this document, PuTTY is used as the SSH tool. For details, see 8.15 Logging In
to a Server Over a Network Port by Using PuTTY.

Step 2 Log in to the SOL screen.

telnet 0 1101
*=====================================================================*
* Welcome to SMM SOL Server *
* Please log in with SMM account and password. *
*=====================================================================*
user name:


NOTICE

If you need to disconnect the service terminal or server power after logging in to
the SOL screen, exit the SOL screen first. Otherwise, re-logging in to the SOL
screen will fail.

Step 3 Enter the user name and password.

The screen for selecting a slot number is displayed.


Log in Success!

*===========================================================================================================
please input the SOL Blade1~Blade16(1 ~ 16), Blade1A~Blade16A(17 ~ 32), Swi1~Swi4(33 ~ 36) and COM#(n)
press Ctrl+R to return
*===========================================================================================================

Blade1~Blade16(1 ~ 16)
Blade1A~Blade16A(17 ~ 32)
Swi1~Swi4(33 ~ 36)
Please input your choice:

The numbers in the preceding information are described as follows:

● 1 to 32 indicate the compute nodes in slots 1 to 32, respectively.


● 33 to 36 indicate the switch modules in slots 1E, 2X, 3X, and 4E, respectively.

Step 4 Enter the slot number of the compute node, passthrough module, or switch
module, and press Enter.
● If you enter a compute node slot number, the following serial port
information is displayed:
1 systemcom
2 RAIDcom
3 BMCcom
4 Exboardcom

Or
1 SYS COM
2 BMC COM

Or
1 systemcom
2 BMCcom

● If you enter a switch module slot number, the following serial port
information is displayed:
1 BMCcom
2 fabriccom
3 basecom
4 FCcom

Or
1 BMCcom
2 fabriccom

Or
1 BMCcom
2 fabriccom
3 basecom


● If you enter a passthrough module slot number, the following serial port
information is displayed:
1 BMCcom

Step 5 Enter the value representing the serial port to be connected, and press Enter.

The serial port screen is displayed. On this screen, you can perform operations
such as configuration and query.

NOTE

You can press Ctrl+R once to return to the slot number selection screen shown in Step 3, or
press Ctrl+R twice to exit the SOL screen.

----End
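
The numbering used at the slot selection prompt in Step 3 can be summarized as a lookup. The sketch below maps a choice number to the name shown in the prompt. The function name is hypothetical, and the mapping assumes that the numbers within each range follow the order listed in the prompt.

```python
# Switch module slots for choices 33 to 36, as listed in Step 3.
SWITCH_SLOTS = ["1E", "2X", "3X", "4E"]


def describe_sol_choice(choice: int) -> str:
    """Translate a number entered at the SOL prompt into a target name."""
    if 1 <= choice <= 16:
        return f"Blade{choice}"
    if 17 <= choice <= 32:
        return f"Blade{choice - 16}A"
    if 33 <= choice <= 36:
        index = choice - 33
        return f"Swi{index + 1} (slot {SWITCH_SLOTS[index]})"
    raise ValueError(f"choice must be between 1 and 36, got {choice}")


print(describe_sol_choice(5))   # Blade5
print(describe_sol_choice(17))  # Blade1A
print(describe_sol_choice(34))  # Swi2 (slot 2X)
```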

8.18 Logging In to a Compute Node, Passthrough Module, or Switch Module by Using the SOL Function of the MM920/MM921

Scenarios
You can use the SOL function of the management module to access a compute
node, passthrough module, or switch module in a chassis for remote maintenance
of the E9000.

Prerequisites
Conditions

● You have logged in to the MM920/MM921 CLI by using the floating IP
address of the MM920/MM921.
● There is no jumper cap over the pins on the mainboard of the compute node,
passthrough module, or switch module.

Data

You have obtained the following data:

● Username and password for logging in to the management module. The
default username and password of the MM920/MM921 are Administrator
and Admin@9000 respectively.
● Username and password for logging in to the compute node to be connected.
The default username and password are Administrator and Admin@9000
respectively.
● Password for logging in to the passthrough module or switch module to be
connected. The default password is Huawei12#$.

Procedure
Step 1 Use an SSH tool and the floating IP address of the MM920/MM921 to log in to
the CLI.

In this document, PuTTY is used as the SSH tool. For details, see 8.15 Logging In
to a Server Over a Network Port by Using PuTTY.
Step 2 Run the ipmcget -l bladeN -t SOL -d cominfo or ipmcget -l swiN -t SOL -d
cominfo command to query the SOL port information of the compute node,
passthrough module, or switch module.
Step 3 Run the ipmcset -l bladeN -t sol -d activate -v com_value or ipmcset -l swiN -t
sol -d activate -v com_value command to enter the serial port input interface.
Step 4 Enter the username and password as prompted.

----End
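
The commands in Steps 2 and 3 differ only in the target (bladeN or swiN) and in the com_value argument. The sketch below fills in these placeholders. The function names are hypothetical, and com_value is assumed to be one of the values returned by the cominfo query in Step 2.

```python
def sol_target(kind: str, number: int) -> str:
    """Return bladeN or swiN for the -l option."""
    if kind not in ("blade", "swi"):
        raise ValueError("kind must be 'blade' or 'swi'")
    return f"{kind}{number}"


def cominfo_command(kind: str, number: int) -> str:
    """Command from Step 2: query the SOL port information."""
    return f"ipmcget -l {sol_target(kind, number)} -t SOL -d cominfo"


def activate_command(kind: str, number: int, com_value: int) -> str:
    """Command from Step 3: enter the serial port input interface."""
    target = sol_target(kind, number)
    return f"ipmcset -l {target} -t sol -d activate -v {com_value}"


print(cominfo_command("blade", 3))
print(activate_command("swi", 2, 1))
```

The generated strings are intended to be run on the MM920/MM921 CLI after the login in Step 1.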

8.19 Using WinSCP to Transfer Files


Scenarios
Use WinSCP to transfer files from a PC to a server.

Prerequisites
Conditions
The Secure File Transfer Protocol (SFTP) service has been enabled on the
destination device.
Data
You have obtained the following data:
● IP address of the server to be connected
● User name and password for logging in to the server to be
connected
Software Tools
WinSCP.exe: This tool is third-party software. You need to prepare it by yourself.

Procedure
Step 1 Open the WinSCP folder, and double-click WinSCP.exe.
The WinSCP Login dialog box is displayed, as shown in Figure 8-41.

NOTE

To change the UI language, click Languages.


Figure 8-41 WinSCP Login

Step 2 Set the login parameters.

The parameters are described as follows:

● Host name: Enter the IP address of the server to be connected. For example,
192.168.2.10.
● Port number: The default value is 22.
● User name: Enter the username. For example, admin123.
● Password: Enter the password. For example, admin123.
● Private key file: This parameter is left blank by default. Retain the default
value.
● Protocol: Retain the default option SFTP in the File protocol drop-down list,
and select Allow SCP fallback.

Step 3 Click Login.

The WinSCP file transfer window is displayed.

NOTE

● If a private key file is not selected at the first login, the warning message "Continue
connecting and add host key to cache" is displayed. Click Yes. The WinSCP file transfer
window is displayed.
● On Windows 7, C:\Users\Administrator\Documents on the local PC is opened in the
left pane, and /root on the server is opened in the right pane by default.

Step 4 In the left and right panes, create, delete, or copy folders in specific directories as
required.


Figure 8-42 WinSCP window

----End

8.20 Configuring an FTP Server


Scenarios
Configure an FTP server to transfer files from a PC to a switching plane.

Prerequisites
● A PC is connected to the server by using a serial cable.
● WFTPD has been installed.

Software Tools
wftpd32.exe: used to transfer files between different platforms, for example, from
a PC to a switching plane of a switch module. This tool is third-party software. You
need to prepare it by yourself.

Procedure
Step 1 Double-click wftpd32.exe.

The No log file open - WFTPD window is displayed.

Step 2 Choose Logging > Log Options.

The Logging Options dialog box is displayed.

Step 3 Select all check boxes except Winsock Calls, and click OK.


Step 4 Choose Security > Users/rights.

The Users/Rights Security Dialog dialog box is displayed.

Step 5 Click New User. In the displayed dialog box, enter a new username (for example,
vxworks) and click OK.

The Change Password dialog box is displayed.

Step 6 Enter a new password (for example, vxworks) in the New Password and Verify
Password text boxes, and click OK.

Step 7 Copy the upgrade file to a directory (for example, D:\FTP) on the PC.
NOTE

The directory can contain only English characters.

Step 8 Select vxworks from the User Name combo box, and enter the upgrade file
directory (for example, D:\FTP) in the Home Directory text box. See Figure 8-43.

Figure 8-43 Users/Rights Security Dialog dialog box

Step 9 Click Done.

The FTP server is configured.

----End
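
The note under Step 7 requires a directory name that contains only English characters. The sketch below checks a candidate path before it is entered in the Home Directory text box. It interprets "English characters" as ASCII characters, which is an assumption, and the function name is hypothetical.

```python
def is_english_only_path(path: str) -> bool:
    """Return True if every character in the path is ASCII."""
    # Assumption: "English characters" is interpreted as the ASCII range.
    return path.isascii()


print(is_english_only_path("D:\\FTP"))         # True
print(is_english_only_path("D:\\升级文件"))     # False
```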

8.21 Using SFTP to Transfer Files


Scenarios
Transfer files on the local PC using SFTP.

Prerequisites
The SFTP service has been enabled on the destination device.

Software Tools
mini-sftp-server.exe (free software)


Procedure
Step 1 Double-click mini-sftp-server.exe.
The Core FTP mini-sftp-server dialog box is displayed, as shown in Figure 8-44.

Figure 8-44 Core FTP mini-sftp-server

Step 2 Set the parameters as prompted.

The parameters are described as follows:
● User: specifies the username for logging in to the SFTP server.
● Password: specifies the password for logging in to the SFTP server.
● Port: specifies the port number, which is 22.
● Root path: specifies the home directory of the SFTP server.
Step 3 Click Options and enter the IP address of the SFTP server. For
example, 192.168.2.10.
Step 4 Click Start.
The file transfer page is displayed.

----End



Huawei Servers
Troubleshooting 9 Other Resources

9 Other Resources

9.1 Obtaining Technical Support
9.2 Product Information Resources
9.3 Product Configuration Resources
9.4 Maintenance Tools

9.1 Obtaining Technical Support


Technical Support Website
Obtain technical documents at Huawei Technical Support.

Self-Service Platform and Community


Learn more about servers and communicate with experts:

● Visit Computing Product Information Service Platform for server product
documentation.
● Visit Huawei Enterprise iKnow for Q&A about products.
● Visit Huawei Enterprise Support Community (Servers) for learning and
discussion.

News
For notices about product life cycles, warnings, and updates, visit Support >
Bulletins > Product Bulletins.

Cases
For details about existing cases, see the Computing Case Library.

NOTE

The Computing Case Library is available only to Huawei engineers and partners.


Huawei Technical Support


If a fault persists after you take the troubleshooting measures specified in the
documents, contact technical support at your local Huawei office. If your local
Huawei office is not available, contact Huawei technical support as follows:
● Contact Huawei customer service center.
– Enterprise customers in China can contact Huawei in the following ways:

▪ Hotline: 400-822-9999

▪ Email: support_e@huawei.com
– Enterprise customers outside China can obtain the customer service
information from: Global Service Hotline.
– Carrier customers in China can contact Huawei in the following ways:

▪ Hotline: 400-830-2118

▪ Email: support@huawei.com
– Carrier customers outside China can obtain the customer service
information from: Global TAC Information.
● Contact the technical support personnel of the local Huawei office.

9.2 Product Information Resources


Table 9-1 describes the product information resources.

Table 9-1 Product information resources


Information Resource: Server product documentation
Description: Describes the server structure, specifications, and installation
method. Each Huawei server has a user guide or maintenance and service guide.
How to Obtain:
1. Visit Support > Intelligent Servers or Support > Ascend Computing.
2. Choose a server model to access the product page.
3. On the Documentation tab page, choose Operation & Maintenance.
4. View the required user guide or maintenance and service guide.

Information Resource: Computing Product Compatibility Checker
Description: Used to query OSs, components, and external devices that are
compatible with servers.
How to Obtain: Computing Product Compatibility Checker

Information Resource: Maintenance Information Inquiry System
Description: Used to query the service information about devices.
How to Obtain: Customer Support Service

Information Resource: Computing Product Power Calculator
Description: Used to calculate server power consumption with different
configurations.
How to Obtain: Computing Product Power Calculator

Information Resource: 3D display of computing products
Description: Used to view the 3D structure of the server hardware.
How to Obtain: Computing Interactive Product Display

9.3 Product Configuration Resources


Table 9-2 describes the product configuration resources.

Table 9-2 Product configuration resources


Tool Name: Removal and installation videos
Description: Describe how to remove and install hardware.
How to Obtain: Intelligent Computing Product Hardware Installation Multimedia

Tool Name: Computing Product Memory Configuration Assistant
Description: Online application that shows the DIMM installation sequence in a
graphical manner after the product name, CPU quantity, and DIMM quantity are
specified.
How to Obtain: Computing Product Memory Configuration Assistant

9.4 Maintenance Tools


Table 9-3 lists the software tools required for routine maintenance of Huawei
servers.


Table 9-3 Software tools for routine maintenance


Name: FusionServer Tools Toolkit
Server and Version: See the FusionServer Tools 2.0 Toolkit User Guide.
Description: Only Huawei FusionServer V2 & V3 servers are supported. Diagnoses
and configures servers.
Download link: FusionServer Tools

Name: FusionServer Tools 2.0 SmartKit
Server and Version: See the FusionServer Tools 2.0 SmartKit User Guide.
Description: Used for new site deployment and delivery, troubleshooting, and
firmware upgrade.
Download link: FusionServer Tools

Name: Smart Provisioning
Server and Version: See the Smart Provisioning User Guide.
Description: Only Huawei FusionServer V5 servers are supported. Smart
Provisioning is used to install OSs without a physical DVD-ROM drive, configure
RAID, upgrade firmware, and perform troubleshooting.
Download link: Smart Provisioning
