
DELL POWEREDGE

SERVER CONCEPTS:
SECTION 04 SERVER
MAINTENANCE

COURSE GUIDE
Dell PowerEdge Server Concepts: Section 04 Server Maintenance

© Copyright 2022 Dell Inc. Page i


Table of Contents

PowerEdge Server Concepts: Server Maintenance

Server Maintenance Objectives

Server Maintenance Introduction

Introduction to Server Maintenance

Server Components

Server Components Overview
Server Memory
Memory Comparison
Types of System Components
Types of Expansion Cards
PERC Overview and PERC 10.6
PERC 11.1
Open Compute Project (OCP)

Server Management

Server Management Methods Overview
In-Band and Out-Of-Band Management
iDRAC Service Module (iSM) and Virtual Console

Power Distribution

Power Supply Unit (PSU)
Power Supply Configuration
Uninterruptible Power Supplies (UPS)

Server Environment and Maintenance

Server Chassis Features - iDRAC
iDRAC Feature Controls
iDRAC Server Maintenance Functions
Performing a Shutdown
Server Cooling



Importance of HVAC (Heating, Ventilation, and Air Conditioning)
Air Flow Challenges in the Data Center Environment
Maintenance Tasks
Modular Server Characteristics
Modular Server Management

Server Hardware Troubleshooting

Server Hardware Troubleshooting Overview
Liquid Crystal Display (LCD) Error Messages Overview
Configuring the LCD Panel
Viewing System Front Panel Light-emitting Diode (LED) Status Remotely
Quick Sync 2
Hot Swap and Cold Swap
iDRAC Maintenance Section Overview
Event Logs
Playing a Boot Capture Video
POST Code/Intrusion/Last Crash Screen
iDRAC Troubleshooting - SupportAssist Collections
Troubleshooting iDRAC
Reset iDRAC to Default Settings

Server Configuration and Change Management

Configuration and Change Management Overview
Server Documentation
Procedures and Standards
Patch Management
Windows Server Update Services

Resources

Supporting Resources: Server Maintenance
Certification Journey Map



Appendix

Glossary



PowerEdge Server Concepts: Server Maintenance


Server Maintenance Objectives

At the end of this course, the learner will be able to:

• Define server maintenance.
• List and describe server components.
• List and describe the different server management applications.
• Define server power distribution.
• Define server environments and maintenance.
• List and describe server troubleshooting tasks.
• Manage server configuration and apply change management.



Server Maintenance Introduction


Introduction to Server Maintenance

[Figure: Generic two-rack data center system with server maintenance in place.
Central label: Monitor and Maintain Server Health. Labeled components: Network
Out-of-Band Switches, Management Servers, Compute Servers, Database Servers,
Data Protection Appliance, Network In-Band Switches, LOT Backup (2 unit),
Network Fibre Channel Switches, All Flash SAN Storage Servers.]

What is server maintenance?

Server maintenance is the optimization of server components through monitoring
and repair of the hardware. In server maintenance, IT administrators monitor
servers and react to underperforming metrics. Server optimization is accomplished
through the use of server-specific management applications. Server management
applications are either In-Band or Out-of-Band tools.

Server hardware maintenance tasks:

• Hot swap (identified with an orange tab) and cold swap (identified with a blue
tab) server components that fail.
• Swap out or isolate servers that fail.
• Maintain clean and cool server environments.
• Perform server firmware and driver updates.


Server software maintenance tasks:

• Use iDRAC, OMSA, OME, and SupportAssist applications to manage servers.
  All of these applications are covered in detail in the Server Management
  section.
• Monitor server health.
• Perform software (OS) upgrades.
• Examine logs to troubleshoot the server.

Server maintenance and monitoring is covered in more detail in this topic.

Why is server maintenance important?

Server maintenance is important for maintaining server reliability and
performance. For example, dust accumulates on the server over time. Dust
restricts airflow, which raises the server operating temperature and slows the
server down. Left unchecked, dust accumulation carries a risk of a server crash.
Also, system monitoring utilities that are not properly installed and configured
can fail to identify potential failure risks.

For reasons such as preventing dust accumulation and keeping monitoring software
functioning well, server maintenance is performed regularly to ensure long-term
server health and functionality. Server maintenance helps to save money on
repairs by preventing damage to, and complete replacement of, a server system.

Benefits of server maintenance are:

• Longer life span of the server components.
• Improved server reliability.



Server Components


Server Components Overview

[Figure: PowerEdge R740 rack server (top view). Components found in a PowerEdge
R740 rack server: expansion risers, expansion cards, processors, and memory.]

All servers (tower, rack, and modular) use the same basic component
configuration as a desktop system. However, a server serves a different purpose
than a desktop: it is designed to support multiple users, which is why a server
has more CPU and memory resources than a desktop. A server is able to run 24x7
and can be managed remotely through iDRAC.

The server components include a system board, enhanced central processing unit
(CPU), expansion cards, enhanced memory, system fans, and hard drives.

The components that are displayed in the image are covered in detail in this topic.


Server Memory

What is Server Memory?

Servers run programs on behalf of any system unit on a network, while client
systems are responsible only for their own operations. The memory for each is
different. Servers use larger memory capacity and bandwidth to cope with multiple
CPU processing loads and to support client workstations simultaneously. Memory
risers provide a dynamic architecture that supports a longer equipment life
cycle. The dynamic architecture provides staged pathways for upgrading, and
ultimately simplifies server maintenance.

[Figure: PowerEdge server handling a client workload. Clients send Data In and
receive Data Out. The server has 12x NVMe drives mapped to CPU1, 12x NVMe drives
mapped to CPU2, and DDR4 NVDIMMs (not to scale).]

Error Correcting Code

Server systems run on an Error-Correcting Code (ECC) memory type while client
systems run on non-ECC memory. The ECC memory system tests and corrects
any errors in memory without interrupting the other server operations. ECC also
makes corrections without the processor or user being aware.


Servers Error-Correcting Code (ECC) Memory

[Figure: ECC data path in DDR4 server memory. Data In supplies the data (M bits)
and the generated code (K bits) to Memory. On a fetch, Compare checks the code
bits and leads to Data Out, the Corrector, or the Error Signal.]

a) Both the data (M bits) and the code (K bits) generated from the Data In
   traffic are stored.
b) During a fetch, new K code bits are generated from the M data bits and
   compared with the fetched K code bits.
c) If Compare detects no errors, the path to Data Out is taken.
d) Errors detected by Compare are fixed by the Corrector.
e) Errors that Compare detects but the Corrector cannot fix raise the Error
   Signal.

Note: 1 byte = 8 bits. ECC uses a Hamming-code-style function that
allows the correction of a single-bit error in a word. Every 64-bit
value is stored together with an 8-bit check code that is computed
from it. ECC can detect 2-bit errors but cannot fix them. ECC DIMMs
provide the 8 extra bits by adding extra chips to the memory
module (DIMM).
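
The behavior described in the note can be sketched in a few lines of Python. This is a minimal illustration of an extended Hamming (single-error-correct, double-error-detect) code, not the circuit that any particular memory controller implements. The function names `encode` and `decode` and the bit layout (parity bits at power-of-two positions, an overall parity bit at index 0) are assumptions chosen for the example.

```python
def encode(data_bits):
    """Encode a list of 0/1 data bits into an extended Hamming codeword."""
    m = len(data_bits)
    k = 0
    while (1 << k) < m + k + 1:      # smallest k with 2**k >= m + k + 1
        k += 1
    n = m + k
    code = [0] * (n + 1)             # index 0 holds the overall parity bit
    bits = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):          # not a power of two: data position
            code[pos] = next(bits)
    for i in range(k):               # fill each parity bit (even parity)
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:
                parity ^= code[pos]
        code[p] = parity
    code[0] = sum(code[1:]) % 2      # overall parity enables 2-bit detection
    return code


def decode(code):
    """Return (status, data_bits); status is ok, corrected, or uncorrectable."""
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos          # XOR of the positions that hold a 1
    overall = sum(code) % 2
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= n:
        fixed[syndrome] ^= 1         # single-bit error: syndrome is its position
        status = "corrected"
    else:
        status = "uncorrectable"     # for example, a 2-bit error
    data = [fixed[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return status, data


word = [1, 0, 1, 1, 0, 0, 1, 0] * 8  # one 64-bit value
stored = encode(word)
print(len(stored))                   # total bits stored for 64 data bits

one_error = list(stored)
one_error[17] ^= 1
two_errors = list(stored)
two_errors[5] ^= 1
two_errors[40] ^= 1
for sample in (stored, one_error, two_errors):
    print(decode(sample)[0])
```

With 64 data bits the sketch produces a 72-bit codeword, which matches the 64 + 8 arrangement in the note: 7 Hamming parity bits plus 1 overall parity bit.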

Memory Technology

Random access memory (RAM) stores data that the CPU can quickly access, read,
and write. Double Data Rate (DDR) is a form of Dynamic RAM (DRAM) that is widely
used in server memory technology. DDR is known for its low power requirements
and high-speed data transfer rate.

Memory Voltage

In server systems, voltage reduction is an important prerequisite for limiting
the power consumption and heat generation that come with increased bandwidth.
DDR4 memory operates at 1.2 volts (V), as opposed to DDR3 at 1.5 V and DDR3L at
1.35 V to 1.5 V.
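
A rough way to see why the lower voltage matters: to a first approximation, the dynamic power of CMOS logic scales with the square of the supply voltage when capacitance and frequency are held equal. The sketch below applies that rule of thumb to the voltages listed above. It is an assumption-laden estimate, not a measurement of real DIMM power, and the function name is hypothetical.

```python
def relative_dynamic_power(new_voltage, old_voltage):
    """Ratio of dynamic power at new_voltage to old_voltage (P ~ V**2),
    assuming capacitance and switching frequency stay the same."""
    return (new_voltage / old_voltage) ** 2


print(f"DDR4 (1.2 V) vs DDR3 (1.5 V):   {relative_dynamic_power(1.2, 1.5):.2f}")
print(f"DDR4 (1.2 V) vs DDR3L (1.35 V): {relative_dynamic_power(1.2, 1.35):.2f}")
```

Under this simplified model, moving from 1.5 V to 1.2 V cuts dynamic power to roughly two thirds of its previous value before any other design changes are considered.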

Memory Types

There are different types of DIMMs used in PowerEdge servers.


Memory Type: Description

UDIMM: Unbuffered, low-density, low-latency dual in-line memory modules (DIMMs)
that do not include a register or a buffer chip. UDIMMs are purchased by
customers who need the lowest possible latency. UDIMMs are no longer lower cost
than RDIMMs, and may cost more than RDIMMs due to availability.

RDIMM: Registered DIMMs provide better signal integrity, population of more
DIMM channels, and better performance for heavier workloads. RDIMMs have a
slight increase in latency and generally use slightly more power than a UDIMM
due to the onboard register.

LRDIMM: Load Reduced DIMMs use a buffer to reduce memory loading to a single
load on all DDR signals, which allows for greater density. The memory buffer
chips let LRDIMMs work around the loading restrictions that apply to other DIMM
types. When a server is exclusively configured with LRDIMMs, the memory
controllers in the processors automatically shift to serial mode.

NVDIMM: A Non-Volatile Dual In-line Memory Module (NVDIMM) is a type of
random-access memory that retains its contents even when electrical power is
removed, for example after an unexpected power loss, a system crash, or a normal
shutdown. NVDIMMs improve application performance, data security, and system
crash recovery time.

Go to www.dell.com/support to search for and view the Supported Memory
Configuration Guide for PowerEdge Servers knowledge base article (KBA).


Memory Comparison

The table highlights the differences in memory features across the three
generations of Dell PowerEdge servers.

Server Generation                 13G                14G                15G

Number of Supported DIMMs         12 DIMMs per CPU   12 DIMMs per CPU   16 DIMMs per CPU

RAM Size                          1 x 4 GB           1 x 8 GB           1 x 8 GB

DIMM Type                         RDIMM              RDIMM              RDIMM
                                  LRDIMM             LRDIMM             LRDIMM
                                                     NVDIMM-N           NVDIMM-BP

Supported Channels per Processor  4 Channels         6 Channels         8 Channels

Intel Persistent Memory           N/A                Apache Pass        Barlow Pass
                                                     3DXPoint (1)       3DXPoint (1)

Supported Transfer Speed          2400 MT/s          2933 MT/s          3200 MT/s
                                  2133 MT/s          2666 MT/s          2933 MT/s
                                  1866 MT/s          2400 MT/s          2666 MT/s
                                  1600 MT/s          2133 MT/s
                                                     1866 MT/s

(1) The 3DXPoint is a type of nonvolatile memory.


Note: NVDIMM-N is not supported on Dell PowerEdge 15G servers.
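
The channel counts and top transfer speeds in the comparison translate into a theoretical peak memory bandwidth per processor. The sketch below assumes the standard 64-bit (8-byte) DDR4 data path per channel and ignores ECC bits, refresh, and real-world efficiency, so the results are upper bounds rather than expected throughput. The function name and the dictionary are illustrative only.

```python
# One DDR4 channel carries 64 data bits, so each transfer moves 8 bytes.
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gbs(channels, mega_transfers_per_s):
    """Theoretical peak bandwidth per processor in GB/s (decimal units)."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


# (channels per processor, fastest supported speed in MT/s) from the table.
GENERATIONS = {"13G": (4, 2400), "14G": (6, 2933), "15G": (8, 3200)}

for name, (channels, speed) in GENERATIONS.items():
    print(f"{name}: {peak_bandwidth_gbs(channels, speed):.1f} GB/s")
```

The jump between generations comes from both factors at once: more channels per processor and a faster transfer rate on each channel.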


Types of System Components

System Board

[Figure: PowerEdge R740 rack server system board, showing the expansion slot,
storage controller, processor sockets, memory sockets, and SSD/HDD.]

A system board is the main circuit board of a server system that connects and
governs the interactions between system components. It is similar to a
"motherboard" in a desktop or laptop unit, with different features and functionalities.

The major components on the server system board include central processing unit
(CPU) sockets, memory sockets, a storage controller, supporting circuitry known
as the chipset, a hard disk, and an expansion slot for connecting other hardware.

Hard Disk: The hard disks are connected to the server backplane that is
connected to the storage controller. The storage controller is connected to or is
embedded in the system board.


Chipset: According to the Wikipedia definition, the chipset is a set of
electronic components in an integrated circuit, known as a "Data Flow Management
System", that manages the data flow between the processor, memory, and
peripherals. Read more about chipsets in the attribution link provided.

SATA: Serial Advanced Technology Attachment (Serial ATA) is a bus interface
used for connecting a hard disk and transferring its data to other parts of the
system.

Central Processing Unit (CPU)

[Figure: Central Processing Unit (CPU) on a PowerEdge R740 rack server, showing
two 2nd Generation Intel® Xeon® Scalable processor family CPUs.]

The CPU is the processor in charge of processing critical information and
instructions in the server system. The processor determines how quickly the
system runs programs and loads pages.


Bandwidth, clock speed, and the number of processor cores all contribute to
processor performance. A server system processor is designed to run for long
periods at a sustained 100% load.

Unique characteristics of a server CPU are:

• Server CPUs use different types of sockets.
• Server CPUs can have more cores than client system CPUs.
• Server CPUs have the memory controller and PCI hubs built in.

Expansion Card

[Figure: PowerEdge R740 with an expansion card inserted into the server
expansion riser. Labels: Expansion Card, Expansion Card Riser 1, Expansion Card
Riser 3, Insert card.]


An expansion card is a printed circuit board (PCB) that enhances the functionality
of a server. Depending on the server generation, different types of PCI technology
are supported for expansion cards.

Expansion cards are installed in the expansion slots or expansion riser slots of a
server. A connector is used to create an electronic interface between the server
system board and the expansion card. Some examples of expansion cards are
video graphics cards, network cards, and storage controller cards.

Commonly used expansion cards are:

• Redundant Array of Independent Disks (RAID) and Host Bus Adapters (HBA)
Modules
• General Purpose Computing on Graphics Processing Units (GPGPU)
• Network Interface Card (NIC) and Converged Network Adapters (CNA)
• Host Channel Adapters (HCA)


Types of Expansion Cards

RAID and HBA Modules

A host bus adapter (HBA or eHBA) is an expansion card that plugs into a slot
(such as PCIe) on a server system board. The HBA connects the host to the
storage or network devices and delivers fast, reliable non-RAID Input/Output (I/O).

[Figure: H740P PERC card, showing the heatsink, battery, SAS cable connector,
and PCIe connector.]

A RAID controller card is similar to an HBA and adds redundancy, improved
performance, and reduced latency. RAID controllers are typically more expensive
than HBAs and handle fewer devices. An HBA is a passthrough device, while a RAID
controller has more functions and features.


GPGPU

[Figure: Example of a GPGPU, the NVIDIA Tesla, showing the cooling fan and PCIe
connector.]

General-purpose computing on graphics processing units (GPGPU) relies on a
processor architecture that differs from traditional CPUs. CPUs devote many
resources (primarily chip area) to making a single stream of instructions run at
high speed. These resources include caching to hide memory latency and complex
instruction-stream processing such as pipelining, out-of-order execution, and
speculative execution. GPUs, unlike the typical CPU, excel at workloads that
require floating point capabilities from the processor.

GPUs use fast context switching to hide memory latency. When a memory fetch is
issued while processing one subset of data elements, that subset is set aside.
Another subset that is not waiting on a memory reference replaces the subset
that is waiting.

GPUs use the chip area for hundreds of individual processing elements that
simultaneously run a single instruction stream on multiple data elements.

NIC and CNA

In large enterprise companies, main servers have (at least) two adapters: a
Fibre Channel Host Bus Adapter (FC HBA) and an Ethernet Network Interface Card
(Ethernet NIC). The adapters connect to the storage network (Fibre Channel)
and system network (Ethernet). Converged Network Adapters (CNA) combine the
functionality of both adapters into one.

The diagram shows both the traditional setup with FC HBA and NIC as well as the
CNA and Fibre Channel over Ethernet (FCoE) setup. In the first diagram, the
server requires two separate adapters to connect to the Ethernet-based system
network and the FC-based storage network.

The setup in the second diagram requires one adapter (CNA), which carries both
Ethernet traffic and FCoE traffic on a single cable. This cable connects to one of
the Ethernet ports in the converged switch that has both Ethernet and Fibre
Channel ports. This converged switch converts the FCoE traffic into Fibre Channel
traffic to be sent to the FC SAN over the Fibre Channel network. Computer network
traffic is directly sent to the LAN over the Ethernet network.

A storage area network (SAN) is a dedicated network that provides access to
consolidated, block-level data storage.

[Figure: Traditional setup with FC HBA/NIC and a CNA/Fibre Channel over Ethernet
(FCoE) setup. Traditional setup: the server NIC connects to the Ethernet switch
ports (Ethernet), and the FC HBA connects to the Fibre Channel switch ports,
which lead to the Fibre Channel (FC) Storage Area Network. New setup: the server
CNA carries Ethernet + FCoE to the Ethernet/FCoE switch ports of a converged
switch, whose Fibre Channel switch ports lead to the Fibre Channel (FC) Storage
Area Network.]

Host Channel Adapters

Host Channel Adapters (HCA) are deployed on PCI cards and connect a server to an
InfiniBand fabric. InfiniBand is an industry-standard, channel-based, switched
fabric interconnect architecture for server and storage connectivity. An HCA
differs from an HBA in the type of storage or switch connection that it provides.


Dell 79DJ3 Mellanox ConnectX-3 56Gbps Single Port QSFP Host Channel Adapter

The key features of HCAs are:

• HCAs have a switched fabric topology: several devices communicate at once.
• HCAs have a bi-directional serial bus.


PERC Overview and PERC 10.6

The PowerEdge RAID Controller (PERC) is a series of RAID disk storage
controllers. PERC supports Serial-Attached SCSI (SAS) hard drives, Serial
Advanced Technology Attachment (SATA) hard drives, and Solid-State Drives (SSDs).

1: The PowerEdge RAID Controller (PERC) 10 series consists of the H345, H740P,
H745, H745P MX, and H840 cards.

The PERC 10 family of storage controller cards has the following characteristics:

• PERC 10 complies with serial-attached SCSI (SAS) 3.0, providing up to 12
  Gb/sec throughput.
• PERC 10 supports Dell-qualified serial-attached SCSI (SAS) hard drives, SATA
  hard drives, and solid-state drives (SSDs).
• PERC 10 offers RAID control capabilities including support for RAID levels 0, 1,
  5, 6, 10, 50, and 60.
• PERC 10 provides reliability, high performance, and fault-tolerant disk
  subsystem management.
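
The RAID levels listed above trade capacity for redundancy in different ways. The sketch below estimates nominal usable capacity for each level using the textbook rules (mirroring halves capacity, RAID 5 spends one drive per group on parity, RAID 6 spends two). The function name, the `spans` parameter, and the minimum drive counts are assumptions for illustration; actual capacity on a PERC also depends on controller metadata and drive sizes.

```python
# Commonly cited minimum drive counts per RAID level (illustrative).
MIN_DRIVES = {0: 1, 1: 2, 5: 3, 6: 4, 10: 4, 50: 6, 60: 8}


def usable_capacity_tb(level, drives, drive_tb, spans=2):
    """Nominal usable capacity in TB for identical drives.

    spans is only used for RAID 50 and RAID 60, where each span is a
    RAID 5 or RAID 6 group that is striped together with the others.
    """
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == 0:
        data_drives = drives                 # striping only, no redundancy
    elif level in (1, 10):
        data_drives = drives // 2            # every drive has a mirror
    elif level == 5:
        data_drives = drives - 1             # one drive's worth of parity
    elif level == 6:
        data_drives = drives - 2             # two drives' worth of parity
    elif level == 50:
        data_drives = drives - spans         # one parity drive per span
    else:                                    # level == 60
        data_drives = drives - 2 * spans     # two parity drives per span
    return data_drives * drive_tb


for level in (0, 5, 6, 10, 50, 60):
    print(f"RAID {level}: {usable_capacity_tb(level, 8, 2.0)} TB from 8 x 2 TB")
```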

2: Some of the features of PERC 10.6 are as follows:


• The Auto Configure RAID 0 feature creates a single-drive RAID 0 on each hard
  drive that is in the ready state.
• A non-RAID disk is a single disk to the host, and not a RAID volume. The only
  supported cache policy for non-RAID disks is Write-Through.
• Physical disk power management is a power-saving feature of PERC 10 series
  cards. The feature allows disks to be spun down based on disk configuration
  and I/O activity.
• FastPath is a feature that improves application performance by delivering high
  I/O operations per second (IOPS) for solid-state drives (SSDs). The Dell
  PERC 10 series supports FastPath.

Tip: For more information about PERC 10.6 follow this link:
https://www.dell.com/support/manuals/en-us/poweredge-rc-h840/perc10_ug_pub/overview?guid=guid-ecf11753-0ae0-4122-b875-d909905059ae


PERC 11.1

The PERC 11 controller introduces many new features that boost performance,
such as support for the PCIe Gen4 host interface and the upgraded DDR4 8 GB
2666 MT/s cache memory. However, the greatest addition to this generation of
technology is the inclusion of NVMe hardware RAID support. NVMe hardware RAID
support is available on the H755N front, H755 MX, and H755 adapter form factors.

1: The PERC 11 series consists of many different adapters: the PERC H755
adapter, PERC H755 front SAS, PERC H755N front NVMe, PERC H750 adapter SAS,
PERC H755 MX adapter, PERC H355 adapter SAS, PERC H355 front SAS, and PERC H350
adapter SAS cards.

The characteristics of PERC 11 adapters are:

• PERC 11 provides reliability, high performance, and fault-tolerant disk
  subsystem management.
• PERC 11 offers RAID control capabilities including support for RAID levels 0, 1,
  5, 6, 10, 50, and 60. Note: H350 and H355 do not support all these RAID levels.
• PERC 11 complies with Serial Attached SCSI (SAS) 3.0, providing up to 12
  Gb/sec throughput.


• PERC 11 supports Dell-qualified Serial Attached SCSI (SAS) hard drives, SATA
  hard drives, Solid State Drives (SSDs), and PCIe SSDs (NVMe).
• PERC 11 supported drive speeds for NVMe drives are 8 GT/s and 16 GT/s at
  maximum x2 lane width.
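
The NVMe link speeds above are given in gigatransfers per second (GT/s), which is not the same as usable bytes per second. As a rough guide, 8 GT/s and 16 GT/s correspond to PCIe Gen3 and Gen4 signaling, both of which use 128b/130b line encoding. The sketch below converts a link speed and lane count into an approximate raw bandwidth under that assumption; protocol overhead is ignored, so real throughput is lower. The names are illustrative.

```python
# PCIe Gen3 and Gen4 send 130 bits on the wire for every 128 bits of payload.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gbs(gigatransfers_per_s, lanes):
    """Approximate one-direction link bandwidth in GB/s, before protocol overhead."""
    bits_per_second_per_lane = gigatransfers_per_s * ENCODING_EFFICIENCY
    return bits_per_second_per_lane / 8 * lanes


for speed in (8, 16):
    print(f"{speed} GT/s x2: {link_bandwidth_gbs(speed, 2):.2f} GB/s")
```

Doubling the transfer rate from 8 GT/s to 16 GT/s doubles the available bandwidth at the same x2 lane width.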

2: Some of the features of PERC 11.1 are as follows:

• A non-RAID disk is a single disk that is connected to the host that is not part of
a RAID volume. The only supported cache policy for non-RAID disks is Write-
Through.
• Opal Security Management: Opal self-encrypting drives (SEDs) require security
  key management support. The security key is set in the Opal drives and is used
  as an authentication key to lock and unlock them. IT administrators use the
  application software or the Integrated Dell Remote Access Controller (iDRAC)
  to generate the security key.
• Hardware RoT (Root-of-Trust) builds a chain of trust by authenticating all the
  firmware components before they execute. Hardware RoT permits only
  authenticated firmware to run and be flashed.
• Disk roaming occurs when a hard drive is moved from one cable connection or
  backplane slot to another on the same controller.

Tip: For more information about PERC 11.1 follow this link:
https://www.dell.com/support/manuals/en-us/perc-h755/perc11_ug/dell-technologies-poweredge-raid-controller-11?guid=guid-d64f78f6-d10c-4228-ae3f-f8e455ec9d04


Open Compute Project (OCP)

Illustration of OCP card in a PowerEdge server.

The Open Compute Project (OCP) cards are network cards that connect to the PCI
bus. They are physically smaller than the Industry Standard Architecture (ISA)
expansion cards and often connect to a dedicated connector on the system board.

The OCP card was introduced with the PowerEdge 15G servers.

The benefits of the OCP card are:

• OCP is a removable networking card.
• OCP provides flexibility for customers to choose the interconnect (10 GbE,
  25 GbE, 50 GbE).
• OCP does not consume a regular PCIe slot.
• OCP replaces the Network Daughter Card (NDC) from previous generation
  servers.
• OCP is physically smaller than the ISA expansion card and connects to a
  dedicated connector on the system board.


Important: The OCP and the NDC cards are not hot-swappable
components.



Server Management


Server Management Methods Overview

Server management provides maintenance and monitoring of the server hardware,
software, security, and backups. Server management enables users to view the
status of, and manage, all servers from a single workstation. It provides
template-based configurations that can be created, deployed, and replicated,
and that are optimized for workload performance.

iDRAC

iDRAC UI Dashboard.

The Integrated Dell Remote Access Controller (iDRAC) improves the overall
availability of Dell servers. The iDRAC enables users to deploy, update, monitor,
and maintain servers from any location.

The iDRAC interface is used to:

• Monitor and control power usage.
• View sensor information such as temperature, voltage, and intrusion.
• Monitor server health status.
• Monitor CPU state, processor throttling, and predictive failure.
• View memory information.
• View and export the system inventory and system logs.
• Configure server management and BIOS.
• Simplify the management of multiple iDRACs through iDRAC Group Manager.
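
The sensor data that the iDRAC interface displays can also be processed by a script. As a minimal sketch, the Python below summarizes temperature readings from a payload shaped like the DMTF Redfish Thermal resource. It assumes that the iDRAC in use exposes such a resource; the sample payload, the sensor names, and the helper name summarize_temperatures are illustrative assumptions, not output captured from a real server.

```python
# Sketch only: summarize temperature sensors from a Redfish-style Thermal payload.
# SAMPLE_THERMAL is hypothetical sample data, not real iDRAC output.

SAMPLE_THERMAL = {
    "Temperatures": [
        {"Name": "System Board Inlet Temp", "ReadingCelsius": 22,
         "UpperThresholdCritical": 47, "Status": {"Health": "OK"}},
        {"Name": "CPU1 Temp", "ReadingCelsius": 91,
         "UpperThresholdCritical": 90, "Status": {"Health": "Critical"}},
    ]
}


def summarize_temperatures(thermal):
    """Return (name, reading, over_critical) for each temperature sensor."""
    summary = []
    for sensor in thermal.get("Temperatures", []):
        reading = sensor.get("ReadingCelsius")
        limit = sensor.get("UpperThresholdCritical")
        # Flag a sensor only when both values are present and the reading
        # meets or exceeds the critical threshold.
        over = reading is not None and limit is not None and reading >= limit
        summary.append((sensor.get("Name", "unknown"), reading, over))
    return summary


if __name__ == "__main__":
    for name, reading, over in summarize_temperatures(SAMPLE_THERMAL):
        flag = "CRITICAL" if over else "ok"
        print(f"{name}: {reading} C [{flag}]")
```

In practice the payload would come from the management controller rather than a hard-coded dictionary; the parsing step stays the same.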

Dell OpenManage Server Administrator (OMSA)

OMSA-integrated UI home page.

Dell OpenManage Server Administrator (OMSA) is an In-Band, one-to-one software
application that manages and monitors the health of a single server.

OMSA offers two solutions for one-to-one systems management:


• Integrated, web browser-based user interface (UI).
• Command-line interface (CLI) through the operating system.


SupportAssist Enterprise

Diagram labels: Customer Site; Hardware issue detected; OpenManage Enterprise,
Microsoft System Center Operations Manager (SCOM), or OpenManage Enterprise -
Tech Release; SupportAssist Enterprise; email notification; System State Data;
Dell Technologies; Proactive Technical Support Response Enabled.

Dell SupportAssist Enterprise at work monitoring and reacting to a PowerEdge MX7000 Modular
System hardware issue.

SupportAssist offers remote monitoring, automated collection of system state
information, automatic case creation, and proactive contact from Dell technical
support when needed.


Key Features (subject to warranty entitlement):

• Automated: When issues arise, alerts are issued, possibly before the user is
aware that something is wrong. A support case is opened automatically.
• Proactive: Monitoring happens 24 x 7 x 365. Dell technical support contacts
the customer to start the resolution. Monthly reports provide recommendations
to optimize health and performance.
• Predictive: Using failure analysis, SupportAssist can predict issues and
notify the customer and Dell Technologies before they occur. Support cases are
created on behalf of the customer when issues are predicted.

Note: After July 2022, SupportAssist Enterprise 2.x capabilities such
as device management, case creation, and alert monitoring will be
discontinued. To continue to manage and monitor devices,
administrators must upgrade to Secure Connect Gateway (SCG).
Read more about SCG on the www.dell.com/support site by
accessing the knowledge base article Upgrading SupportAssist
Enterprise 2.0.80 to Secure Connect Gateway – Application Edition.


OpenManage Enterprise (OME)

Diagram labels: Storage Device; Network Device; PowerEdge Server; PowerEdge
MX7000 Modular Platform; OpenManage Enterprise; OME GUI.

Plug-ins and Integrations/Connections

Plug-ins (examples)                        Integrations/Connections
- OME Power Manager                        - OME Integration with ServiceNow, VMware, and Microsoft
- OME Services (previously SupportAssist)  - OpenManage Ansible Modules (Connect)
- OME Update Manager                       - OpenManage Micro Focus Operations Bridge Manager (Connect)

Diagram of the infrastructure hardware that OME manages and monitors.

OpenManage Enterprise (OME) is the one-to-many management console used to
discover and manage up to 8,000 devices regardless of the form factor. OME is
also used to update and patch systems. OME provides a simple interface for
system administrators to maximize the uptime and health of Dell systems.

OME helps to:

• Monitor health status and events for Dell PowerEdge racks, towers, modular
servers, or PowerVault MD and ME storage systems, or third-party
infrastructure.
• Provide hardware-level control and management for the PowerEdge server,
blade system, and internal storage arrays.
• Link and launch element management interfaces, such as iDRAC, Chassis
Management Controller (CMC), OME-Modular (OME-M), SC, and EQL group
manager.


OMSA is designed so that system administrators can manage PowerEdge server
systems both locally and remotely through In-Band management. OMSA provides
information and configuration settings for system hardware, firmware, logs, and
storage.


OMSA is In-Band (communicates through the operating system), and iDRAC is
Out-of-Band (communicates outside of the operating system). Also, iDRAC
provides information and functionality regardless of the operating system state.
Most administrators use OMSA when iDRAC is not available in the system.


In-Band and Out-Of-Band Management

Both in-band and out-of-band methods require a network protocol that is configured
on the managed device.

In-Band Management

In-Band: Uses an OS-dependent software agent.

Diagram labels: Management Station; Management data traffic; Managed Devices
(Server).

In-Band example.

In-Band management is operating system dependent and cannot manage a device
when the device is powered off or otherwise inaccessible. OMSA is an example of
In-Band management.

In-Band management uses Simple Network Management Protocol (SNMP) for
basic communication and messaging.

Users can do the following tasks using In-Band management:

• Firmware and systems-based driver updates
• Alerts
• Messaging
• Inventory

Out-Of-Band Management

Out-Of-Band: Uses an OS-independent hardware controller.

Diagram labels: Management Station; Management data traffic; Managed Devices
(Server with iDRAC).

Out-Of-Band example.

Out-Of-Band management is agentless, meaning there are no dependencies on
the operating system of the managed device. The Integrated Dell Remote Access
Controller (iDRAC) is an Out-Of-Band management tool.

Out-Of-Band management benefits:


• A dedicated management device with the ability to configure and monitor
hardware.
• Ideal option for environments that need a network connection to a managed
device (either for security or redundancy) that is separate from the data source.
• Out-Of-Band supports encryption and gives access only to administrators.
Limited access is beneficial as bandwidth, security, and pushing updates can be
data-intensive and can lead to slow performance.
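
The practical difference between the two methods can be sketched as a small reachability check. The Python below is a toy model, not a Dell tool: the ServerState fields and the can_manage function are hypothetical names chosen for illustration. It encodes the rule described in this topic, that In-Band management needs a running operating system while Out-Of-Band management needs only a reachable hardware controller.

```python
# Sketch only: a toy model of which management path can reach a server.
from dataclasses import dataclass


@dataclass
class ServerState:
    powered_on: bool            # host is powered on
    os_responsive: bool         # operating system and its agent are running
    controller_reachable: bool  # management controller answers on the network


def can_manage(state, method):
    """Return True if the method ("in-band" or "out-of-band") can work."""
    if method == "in-band":
        # In-Band relies on an OS-resident agent such as OMSA.
        return state.powered_on and state.os_responsive
    if method == "out-of-band":
        # Out-Of-Band relies only on the hardware controller such as iDRAC.
        return state.controller_reachable
    raise ValueError(f"unknown method: {method}")
```

For a server whose operating system has stopped responding, this model reports that only the Out-Of-Band path remains usable, which matches the reason administrators rely on the iDRAC for recovery.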


iDRAC Service Module (iSM) and Virtual Console

iDRAC Service Module (iSM) is a lightweight, optional software agent that users
can install on PowerEdge servers. iSM is an OS-resident process that expands
iDRAC management into supported host operating systems.

iSM Pre-Installation

Setup showing that iSM is not installed.

Install the iSM using the iDRAC option, or by downloading the file from the support
site and installing it in the server operating system. Before installing the iSM, the
iDRAC reports an error in the iSM setup section.

Installation Verification

After the iSM is installed, the iDRAC reports that iSM is installed and running.


Blue installer box is disabled.

iSM setup showing the iSM running.

iDRAC Virtual Console

The iDRAC Dashboard highlighting the virtual console feature.

The iDRAC Virtual Console enables users to access the local console remotely in
either graphic or text mode. Using the virtual console, you can control an
iDRAC-enabled server. Use the keyboard, video, and mouse on the local management
station to control the corresponding devices on a remotely managed system. Users
can run up to six simultaneous virtual console sessions.

Users can use the virtual console with virtual media to perform remote software
installations.

An Enterprise or Data Center license is required to access the Virtual Console in
iDRAC.

iSM adds the following services:


• Operating system information
• Lifecycle controller log replication into operating system
• Automatic system recovery
• Windows Management Instrumentation (WMI) provided with storage data
• SupportAssist collection


Power Distribution


Power Supply Unit (PSU)

A power supply unit (PSU) is an electronic device that distributes power to the
server components. A PSU also converts the source input power [6] into the form
that the server and its individual components require.

Functionality of a PSU

The input power supply [7] takes AC power from the power socket and converts it
to DC power. The PSU then distributes the power throughout the server. There are
also DC PSUs that do not require conversion.

Redundant PSUs

PowerEdge R740 - Rear View. Figure labels: Power, Hot Swap, Fan, Pull Handle.

PowerEdge R740 with two 1100 W PSUs.

6 Input power supplies rate the wattage of power they produce for the system.
7 Some input power supplies can accept DC input as well.


PSU as Hot Swappable Component

• The PSU can be a hot-swappable component that provides power redundancy
support on PowerEdge servers.
• Hot-swappable PSUs have an orange tab indicating that the component can be
removed without shutting down the server.
• To retain redundancy and optimum operations, pull only one PSU at a time.
• The number of supported redundant PSUs varies between chassis enclosures and
server models.
• The minimum number of PSUs that the server's configuration requires should
always be present in the server.

Important: A few Dell PowerEdge servers are sold with PSUs that are not
hot-swap capable.

Caution: While performing the replacement procedure, take note of the
power supply label numbers and look for indicators such as fan movement
or the PSU light-emitting diodes (LEDs).


Power Supply Configuration

The Power Configuration panel for configuring redundant power supplies is found
in the iDRAC9 (and newer versions) interface. The type of power supply
configuration, or redundancy mode, depends on the server chassis and the
number of PSUs. When the primary PSU fails, a redundant PSU provides the
necessary power to minimize the risk of a complete server shutdown.

Grid Redundancy

In grid redundancy mode, the hot spare [8] feature is disabled. Power factor
correction (PFC) is disabled by default to reduce power consumption when the
system is on standby. However, if a single PSU fails, the available power drops.
The grid redundancy configuration is also known as a 1 + 1 configuration.

Grid redundancy policy enables a modular enclosure system to operate in a mode
that tolerates power failures [9].

8 When the hot spare feature is enabled, one of the redundant PSUs is switched to
the sleep state. The active PSU supports 100 percent of the system load, thus
operating at higher efficiency. The PSU in the sleep state monitors the output
voltage of the active PSU. If the output voltage of the active PSU drops, the PSU in
the sleep state returns to an active output state.
9 These failures may originate in the input power grid, the cabling, or a PSU itself.


A disabled hot spare allows power output to be distributed equally across both
power supplies.

iDRAC Power Configuration page in the Configuration section of the user interface.

Non-Redundant

The non-redundant configuration is also known as a 2 + 0 configuration. For
example, two 570 W power supplies combine to provide 1140 W (2 x 570 W). The
hot spare feature reduces power consumption when the system is in standby.
However, if a single PSU fails, the available power drops to 570 W.

In the non-redundant configuration, the hot spare feature is enabled and
Power Factor Correction (PFC) is disabled by default.
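
The 2 x 570 W arithmetic can be written out as a short calculation. The Python below is a minimal sketch with hypothetical function names. It only multiplies the PSU rating by the number of working PSUs and compares the result with a load, so it ignores derating, efficiency, and any firmware power-capping behavior.

```python
# Sketch only: available PSU capacity before and after a failure.

def available_watts(psu_watts, installed, failed=0):
    """Total capacity of the PSUs that are still working."""
    if failed > installed:
        raise ValueError("failed PSUs cannot exceed installed PSUs")
    return psu_watts * (installed - failed)


def survives_single_failure(load_watts, psu_watts, installed):
    """True if the load still fits after one PSU fails."""
    return load_watts <= available_watts(psu_watts, installed, failed=1)


if __name__ == "__main__":
    print(available_watts(570, 2))               # 1140, the 2 + 0 example
    print(available_watts(570, 2, failed=1))     # 570 after one PSU fails
    print(survives_single_failure(800, 570, 2))  # False: 800 W exceeds 570 W
```

A load of 800 W runs on the combined 1140 W but does not survive the loss of one supply, which is why a 2 + 0 configuration is described as non-redundant.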

iDRAC Power Configuration page in the Configuration section of the user interface.

When a system is configured for Grid Redundancy, the PSUs are divided into grids:
PSUs in slots 1, 2, and 3 are in the first grid, while PSUs in slots 4, 5, and 6
are in the second grid. System management manages power so that if either grid
fails, the system continues to operate without any degradation. Grid
Redundancy also tolerates failures of individual PSUs.
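
The slot-to-grid split can be sketched in a few lines. In the Python below, the slot ranges come from the paragraph above, while the function names, the PSU wattage, and the load figures in the usage note are hypothetical. The check is deliberately simple: a grid failure is tolerated when the PSUs on the surviving grid can carry the whole load.

```python
# Sketch only: map PSU slots to grids and test tolerance of a grid failure.
# Slots 1-3 form grid 1 and slots 4-6 form grid 2, as described in the text.

def grid_for_slot(slot):
    if 1 <= slot <= 3:
        return 1
    if 4 <= slot <= 6:
        return 2
    raise ValueError(f"slot {slot} is outside the range 1-6")


def survives_grid_failure(populated_slots, psu_watts, load_watts, failed_grid):
    """True if the PSUs on the surviving grid can carry the whole load."""
    surviving = [s for s in populated_slots if grid_for_slot(s) != failed_grid]
    return len(surviving) * psu_watts >= load_watts
```

For example, with six hypothetical 3000 W PSUs and an 8000 W load, survives_grid_failure([1, 2, 3, 4, 5, 6], 3000, 8000, failed_grid=1) returns True because the three PSUs in grid 2 still supply 9000 W.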


Uninterruptible Power Supplies (UPS)

Diagram labels: Server Facility; AC Power Supply; UPS (On/Off; Normal; Battery
Storage); Battery Power Supply; Servers; Switches.

Example of how a UPS works to provide short-term power.

An unexpected power failure can cause issues such as data loss or internal
hardware problems. Recovering the data requires time, energy, and money, and
even then the data may be impossible to recover.

Uninterruptible Power Supply (UPS) systems provide the short-term power [10]
that PowerEdge servers need. During a power failure, the UPS allows the system
to shut down gracefully.
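
Whether a UPS leaves enough time for a graceful shutdown can be estimated with simple arithmetic. The Python below is a rough sketch that uses a common approximation, runtime = battery energy x efficiency / load. The formula, the default efficiency of 0.9, and the function names are assumptions for illustration, not Dell or UPS vendor specifications; real runtime depends on battery age, temperature, and the vendor's runtime charts.

```python
# Sketch only: rough UPS runtime estimate versus required shutdown time.
# Approximation (not a vendor specification):
#   runtime_minutes = battery_wh * efficiency / load_watts * 60

def estimated_runtime_minutes(battery_wh, load_watts, efficiency=0.9):
    if load_watts <= 0:
        raise ValueError("load_watts must be positive")
    return battery_wh * efficiency / load_watts * 60


def allows_graceful_shutdown(battery_wh, load_watts, shutdown_minutes,
                             efficiency=0.9):
    """True if the estimated runtime covers the time needed to shut down."""
    runtime = estimated_runtime_minutes(battery_wh, load_watts, efficiency)
    return runtime >= shutdown_minutes
```

Treat the result as a planning estimate and confirm it against the runtime data published for the specific UPS model.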

Dell offers UPS devices for small to medium-sized organizations, such as the
SmartUPS 1500 SMARTCONNECT 120V RM device.

10 Servers require a constant, uninterrupted power supply to function correctly.
If the main power source fails, a temporary power source must keep the servers
running to avoid damage to the equipment and prevent losses.


Server Environment and Maintenance


Server Chassis Features - iDRAC

iDRAC9 dedicated port

The iDRAC with Lifecycle Controller is embedded on several server models but not all.

The Integrated Dell Remote Access Controller (iDRAC) improves the overall
availability of PowerEdge servers. The iDRAC enables users to deploy, update,
monitor, and maintain servers from any location.

The iDRAC functions regardless of the presence of an operating system or
hypervisor. Because the iDRAC is an out-of-band (OOB) utility, it functions from
a pre-OS or bare-metal state. Each iDRAC shipped from the factory is ready to use.

The iDRAC9 uses a Nuvoton dual-core ARM A-9 processor @ 800 MHz, with a
512 KB L2 cache, and 8 GB NAND memory.

The iDRAC with Lifecycle Controller is embedded within the Dell PowerEdge
servers.

The iDRAC Service Module (iSM) monitors information from the operating system.

Important: For additional information, go to the Dell Technologies
support website www.dell.com/support to search for server manuals.


iDRAC Feature Controls

iDRAC9 Enterprise Dashboard


1: The System tab provides a high-level overview of system information, iDRAC
details, and at-a-glance status of the systems.

2: The Storage tab provides details on the storage components, including
information about controllers, hard drives, virtual disks, and enclosures.

3: The Configuration tab provides power management, virtual console, licenses,
systems, BIOS, and the server configuration profile.

4: The Maintenance tab includes the Lifecycle log, job queue, system update,
system event log, troubleshooting, diagnostics, and SupportAssist.

5: The iDRAC Settings tab includes information about the iDRAC itself,
Connectivity, Services, and Users.


• The System tab provides system information, iDRAC details, and at-a-glance
status of the systems. More details about the system are accessed from the
tabs inside this section.
• The Storage tab provides details on the storage components. Summary
information and information about controllers, hard drives, virtual disks, and
enclosures are accessed from here.
• The Configuration tab is where settings are configured for items such as power
management, virtual console, licenses, systems, storage configuration, BIOS,
and the server configuration profile.
• The Maintenance tab includes the Lifecycle log, job queue, system update,
system event log, troubleshooting, diagnostics, and SupportAssist.
• The iDRAC tab displays the details of the iDRAC settings. Configuration of the
network settings, IPv4 settings, and the iDRAC service module options for
connectivity, services, and users are also available.


iDRAC Server Maintenance Functions

The iDRAC system maintenance capabilities include:

• Monitor and control power usage.
• View sensor information such as temperature, voltage, and intrusion.
• Monitor server health including CPU state, processor throttling, and predictive
failure.
• View memory information.
• Monitor and manage storage.
• View and export system inventory and system logs.
• Configure the server management and BIOS.
• Use iDRAC Group Manager for simplified iDRAC management.

iDRAC9 System Page for monitoring the PowerEdge server components.


Performing a Shutdown


Properly shutting down the server consists of allowing all current operations to
complete, disconnecting current connections, stopping services, and powering off.
An immediate server shutdown can cause the loss of unsaved data.

Windows Server Shutdown Event Tracker


Performing a Remote Shut Down

When the user cannot physically access the server, the iDRAC provides the
ability to shut it down remotely. To perform a graceful shutdown using the iDRAC
interface: Connect to the iDRAC > Dashboard tab > click Graceful Shutdown.

The iDRAC9 dashboard provides the option to do a graceful shutdown.

In certain situations, it may be necessary to perform a hard reboot of the
server. For example, if the operating system stops responding.

Perform a cold boot of the system using the Graceful Shutdown drop-down menu.
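
The same power actions can in principle be scripted instead of clicked. The Python below is a hedged sketch that only builds, and does not send, a request in the style of the DMTF Redfish ComputerSystem.Reset action. The system ID System.Embedded.1, the listed reset types, and the function name build_reset_request are assumptions to verify against the Redfish documentation for the iDRAC firmware in use; this guide itself describes only the dashboard procedure.

```python
# Sketch only: build (but do not send) a Redfish-style power-action request.
# The system ID and the reset types below are assumptions to verify against
# the Redfish documentation for your iDRAC firmware.
import json

RESET_TYPES = {"GracefulShutdown", "ForceRestart", "ForceOff", "On"}


def build_reset_request(idrac_host, reset_type, system_id="System.Embedded.1"):
    """Return (url, json_body) for a ComputerSystem.Reset action."""
    if reset_type not in RESET_TYPES:
        raise ValueError(f"unsupported reset type: {reset_type}")
    url = (f"https://{idrac_host}/redfish/v1/Systems/{system_id}"
           "/Actions/ComputerSystem.Reset")
    return url, json.dumps({"ResetType": reset_type})
```

Keeping the request construction separate from the network call makes it easy to review exactly what would be sent before any server is powered off.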


Perform a Linux Shutdown

Administrators have several options to reboot or shut down a Linux system.

• Use a Linux graphical user interface (GUI) such as GNOME Desktop or Ubuntu
MATE to select the power-off option from the menu.
• Use the $ sudo shutdown, $ sudo reboot, or $ sudo poweroff command to power
down or restart the Linux system.
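
When shutdowns are scripted, it helps to assemble the command in one place and review it before running it. The Python below is a minimal sketch with hypothetical helper names. It builds the argument list for the standard shutdown command (-P to power off, -r to reboot, -H to halt) and runs it only when explicitly asked to.

```python
# Sketch only: assemble a Linux shutdown command without running it by default.
import subprocess


def shutdown_command(action="poweroff", delay_minutes=0, message=None):
    """Return the argv list for `sudo shutdown` with the chosen action."""
    flags = {"poweroff": "-P", "reboot": "-r", "halt": "-H"}
    if action not in flags:
        raise ValueError(f"unknown action: {action}")
    when = "now" if delay_minutes == 0 else f"+{delay_minutes}"
    argv = ["sudo", "shutdown", flags[action], when]
    if message:
        argv.append(message)  # broadcast to logged-in users
    return argv


def run(argv, execute=False):
    """Print the command; run it only when execute=True."""
    print(" ".join(argv))
    if execute:
        subprocess.run(argv, check=True)
```

For example, shutdown_command("reboot", 5, "Rebooting for maintenance") describes a reboot in five minutes with a warning to logged-in users.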


Perform an ESXi Shutdown

Administrators can shut down ESXi through CLI commands, from the Direct
Console User Interface (DCUI), or from the vSphere client and web client.

• Learn more about the shutdown methods from the VMware Customer Connect
online support knowledge base article.


Warning: Although these methods will gracefully shut down or
reboot an ESXi host (node), they will not safely stop running virtual
machines on the host. Administrators should always ensure that
virtual machines are migrated off the host and that the host is in
maintenance mode before attempting to reboot or shut down a host
using any method.
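
The two conditions in this warning can be expressed as a pre-shutdown checklist. The Python below is a sketch over a hypothetical HostStatus record. It does not call any VMware API, so the fields must be filled in from whatever inventory tooling is in use.

```python
# Sketch only: a pre-shutdown checklist for a hypervisor host.
# HostStatus is a hypothetical record; populate it from your own tooling.
from dataclasses import dataclass, field


@dataclass
class HostStatus:
    name: str
    in_maintenance_mode: bool
    running_vms: list = field(default_factory=list)


def shutdown_blockers(host):
    """Return the reasons the host is not yet safe to reboot or shut down."""
    blockers = []
    if host.running_vms:
        blockers.append(f"{len(host.running_vms)} virtual machine(s) still running")
    if not host.in_maintenance_mode:
        blockers.append("host is not in maintenance mode")
    return blockers
```

An empty list means that both conditions from the warning are satisfied; any entry is a reason to stop and migrate virtual machines or enter maintenance mode first.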


Server Cooling

Most server design revolves around cooling the main components: power supplies,
processors (CPU and GPU), and memory. For this reason, the chassis has
memory shrouds, processor heatsinks, and power supply fans. GPUs provide their
own cooling fans. The airflow of the server chassis is like that of a client system.
The only difference is that the server chassis emits a greater heat load [11].

Air Shroud

Cooling Fans

PowerEdge R750 Air Shroud and Cooling Fans

11 Extra heat is due to the higher wattage used.


Heatsinks

PowerEdge R750 Heatsinks


Hot air expelled via exhaust ports

Back of the server is hot to the touch

Front of server should be cool to the touch

Cold air is pulled in via the front vents

Diagram of server airflow


Important: Server cooling through liquid cooling (LC) is not covered
in this topic. However, starting with 15G, some PowerEdge
servers provide LC options. To learn more about LC, go to
www.dell.com/support to search for liquid cooling information.


Importance of HVAC: Heating, Ventilation, and Air Conditioning

Data centers grow as the computing demands in an organization grow. Additional
servers increase the amount of heat that must be managed efficiently [12].

If servers are unprotected from heat, then the servers slow down or work differently
than expected. The ideal temperature for the data center depends on the quantity
of servers and amount of heat emitted. Operating within the ideal temperature
range is critical for performance.
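
The link between added servers and added heat can be made concrete with standard unit conversions. The Python below is a rough sketch. The conversion constants are standard (1 W is about 3.412 BTU/h, and one ton of refrigeration is 12,000 BTU/h), while the function names and the assumption that all electrical power ends up as heat are simplifications for illustration.

```python
# Sketch only: convert server power draw into a cooling load.
# Constants are standard unit conversions; any wattages passed in are examples.

BTU_PER_HOUR_PER_WATT = 3.412   # 1 W is about 3.412 BTU/h
BTU_PER_HOUR_PER_TON = 12_000   # 1 ton of refrigeration = 12,000 BTU/h


def heat_load_btu_per_hour(server_watts):
    """Total heat output, assuming all electrical power ends up as heat."""
    return sum(server_watts) * BTU_PER_HOUR_PER_WATT


def cooling_tons(server_watts):
    """Cooling capacity, in tons, needed to remove that heat."""
    return heat_load_btu_per_hour(server_watts) / BTU_PER_HOUR_PER_TON
```

Passing a list with one wattage per server gives a first estimate of how much cooling capacity a rack adds to the room.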

Hot aisle containment (HAC) guides the hot air (red arrows) into a Computer Room
Air Handler (CRAH), which cools the air and recirculates it (blue arrows).

12 Many data centers start out as a few racks in a server room, adding more
equipment over time. Overall, data center HVAC management can become difficult.


Air Flow Challenges in the Data Center Environment

One method of improving air flow is to use the Hot aisle/Cold aisle layout. Cold air
is routed to the intake in the rack front. Hot air exhaust exits the rack rear and is
routed to cooling equipment. A computer room administrator can choose either a
Computer Room Air Conditioner (CRAC) or Computer Room Air Handler (CRAH) to
route the air.


1: Hot air exits the rack and is sent to the CRAC or CRAH.

2: Cold air is pumped from the CRAC or CRAH to the rack intake.

3: Hot air exits the rack and is sent to the CRAC or CRAH.


Maintenance Tasks

Server maintenance and monitoring help prevent server failure. For an IT
administrator, prevention helps mitigate server disasters. Servers use fans to
constantly circulate a high volume of air. Cooling is essential to maintain an
optimum operating temperature and ensure that all the equipment operates at peak
efficiency.


1: Create a disaster recovery (DR) plan. Do both sites host workloads? How often
does the data replicate? Create a planned site-to-site failover.

2: Examine the power design. Can the server survive a spontaneous failure of
power supplies, Uninterruptible Power Supply (UPS), or building circuits?


3: Verify access of all tools. If the server is not onsite, check availability. Ensure the
iDRAC and operating system tools are available.

4: Review logs for any issues, including but not limited to the iDRAC, Lifecycle
Controller, system event log (SEL), PERC TTY, and operating system event logs.
Enable and configure alerting, including operating system event log forwarding,
SMTP, syslog, and other native utilities such as the iDRAC.

5: Verify the backup plan. Does the workload require a full, incremental, or
differential backup? How often are backups completed? Can they successfully be
restored? What are the Recovery Point Objective (RPO) and Recovery Time
Objective (RTO)?
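
The backup questions in item 5 reduce to two comparisons. The Python below is a minimal sketch with hypothetical function names. It treats the backup interval as the worst-case data loss and compares a measured restore time with the RTO, which is a simplification of a real backup review.

```python
# Sketch only: compare a backup schedule with recovery objectives.

def meets_rpo(backup_interval_hours, rpo_hours):
    """Worst-case data loss equals the time since the last backup."""
    return backup_interval_hours <= rpo_hours


def meets_rto(measured_restore_hours, rto_hours):
    """The restore must finish within the Recovery Time Objective."""
    return measured_restore_hours <= rto_hours


def review(backup_interval_hours, measured_restore_hours, rpo_hours, rto_hours):
    """Summarize whether both objectives are met."""
    return {
        "rpo_met": meets_rpo(backup_interval_hours, rpo_hours),
        "rto_met": meets_rto(measured_restore_hours, rto_hours),
    }
```

For example, a nightly backup with a 4-hour RPO fails the RPO check even if restores complete well within the RTO, which signals that backups must run more often.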


Modular Server Characteristics

Dell PowerEdge Modular systems are all-in-one chassis platforms that provide
compute servers, network I/O modules, and storage devices. A modular system
relies on a unique environmental capacity to function as the all-encompassing
platform. IT administrators must be aware of the modular system's unique
specifications for rack mounting, cooling, and heating.

Modular systems are also maintained differently than rack servers because of the
varied components. Some modular systems like the PowerEdge MX7000 function
as a multi-chassis platform cabled together to provide a comprehensive solution.

PowerEdge FX Series

The PowerEdge FX2 is a 2U hybrid rack-based computing platform that can be
configured to hold quarter-width, half-width, and full-width servers and a
half-width storage block.

The FX2 enclosure is available with a network switch and eight PCI Express
(PCIe) expansion slots, or as a cost-optimized, entry-level option.


FX2               FX2s

Entry option      Flex I/O option
No PCIe slots     Eight PCIe 3.0 low-profile expansion slots

PowerEdge VRTX

Dual SD cards for redundant hypervisors. Hot-plug and swappable HDDs, plus
many RAID options tailored to specific needs, including optional PERC for RAID
controller failover inside the chassis. Optional hot-plug and swappable power
supply units and fans. Versatile shared storage

VRTX systems management is integrated with major third-party management tools,
protecting installed investments and allowing users to work with familiar tools.
These improvements allow users to avoid the additional time and cost of training
that is related to new management solutions.

• Designed to address the needs of remote and branch offices of large


organizations, and small to medium businesses with limited IT resources.

• No compromise on performance
• Versatile shared storage
• Integrated networking and flexible I/O
• Seamless management integration


PowerEdge M1000e

The PowerEdge M1000e is designed for power and thermal efficiency, performance,
flexibility, and system-wide availability. The chassis integrates the latest in
management, I/O, and power and cooling technologies.

The M1000e uses ultraefficient power supplies with large variable-speed fans to
cool the entire chassis while using less power.


PowerEdge MX7000

The PowerEdge MX7000 chassis hosts disaggregated blocks of server and storage
to create on-demand resources. Shared power, cooling, networking, I/O, and
in-chassis management provide outstanding efficiencies.

The PowerEdge MX creates shared pools of disaggregated compute and storage
resources. Users can create workloads using pooled resources. When the
workload is no longer needed, resources are returned to the pool. On-demand
capacity can be managed at a data center level instead of a per-server level.


Modular Servers Comparison

                    PowerEdge            PowerEdge            PowerEdge             PowerEdge
                    FX Chassis           VRTX                 M1000e                MX7000

Form factor:        2U modular           Tower or 5U          10U modular           7U modular
                    enclosure            modular enclosure    enclosure             enclosure

Server options:     4 half-width         4 half-height        8 full height         8 standard size
                    8 quarter-width      2 full height        16 half-height        4 double wide
                                                              32 quarter-height

Server models:      FC430, FC830,        M520, M620, M820,    M600, M506, M805,     MX740c, MX750c,
                    FC640, FM120X4       M630, M830, M640     M905, M610, M610x,    MX840c
                                                              M710, M710HD, M910,
                                                              M915, M420, M520,
                                                              M620, M820

Storage options:    FD332 up to 3        Integrated           PS-M4110 up to 2      MX5016s up to 7

PCIe slots:         8 PCIe low-profile   3 full height        not applicable        not applicable
                    slots                5 low-profile

I/O module (IOM)    3 full width         1 full height        6 full height         4 full width
quantity:                                                                           2 half-width

I/O module (IOM)    FN2210S, FN410S,     R1-2401, R1-2210,    M8024-k, MXL, M6348,  MX9116n, MX7116n,
module:             FN410T, Pass-Through Pass-Through         M6220, M6505, M5242,  MX5108n, MXG640x,
                                                              M4001F, M4000T,       MX5000s, Pass-Through
                                                              Pass-Through


Modular Server Management

MX7000 Management Module

MX9002m modules


MX7000 modular system rear view.

The Management Module (MM) essentially controls the overall chassis power,
cooling, and physical user interfaces such as the front panel.

MX7000 supports two MX9002m modules for redundancy. At least one MX9002m
is required to power on the system.


LCD, MM, or CMC

Feature                                     M1000e           MX7000
                                            (LCD or CMC)     (LCD or OME-M (MM))

A deployment setup wizard that allows       LCD              LCD
users to configure the CMC module’s
network settings during initial system
setup.

Menus to configure the iDRAC with           LCD or CMC       OME-M
Lifecycle Controller in each blade.

Status information screens for each         CMC              OME-M
blade.

Status information screens for the          CMC              OME-M
modules installed in the back of the
enclosure, including the I/O modules,
fans, CMC, iKVM, and power supplies.

A network summary screen listing the IP     CMC              OME-M
addresses of all components in the
system.

Real-time power consumption statistics,     CMC              OME-M
including high and low values, and
average power consumption.

Ambient temperature values.                 CMC              OME-M

AC power information.                       CMC              OME-M

Critical failure alerts and warnings.       CMC              OME-M


Note: The MM in the MX7000 is similar to the Chassis Management
Controller (CMC) that is used in the M1000e, VRTX, and FX2.


Server Hardware Troubleshooting

Server Hardware Troubleshooting Overview

IT administrators investigate and resolve server issues as soon as they are
identified to avoid server downtime or data loss.

PowerEdge server hardware troubleshooting steps help users review, diagnose, and
identify operational or technical faults in the server in a logical and systematic
way. After isolating a damaged server component, users review the PowerEdge
server replacement procedure to complete the troubleshooting task.

Server hardware troubleshooting for a PowerEdge R640. Callouts: iDRAC
(Maintenance/Troubleshooting page and iDRAC logs); cold swap (removable expansion
riser 1B); hot swap (removable cooling fan).

Liquid Crystal Display (LCD) Error Messages Overview

LCD backlight on a 14th Generation PowerEdge server.

Starting with 14G servers, the Liquid Crystal Display (LCD) panel is an optional
feature on some PowerEdge servers. The LCD displays system information, status,
and error messages to indicate whether the system is functioning correctly or
requires attention. The LCD panel can also be used to configure or view the iDRAC
IP address of the system.

The status and conditions of the LCD panel are:

• The LCD front panel displays user-configurable system information, depending
on the system condition.
• The LCD backlight displays blue during normal operating conditions and turns
amber when there is an error condition.
• The LCD backlight turns off in standby mode. To turn on the LCD backlight,
press any of the LCD front panel buttons.
• If an error is detected while the system is connected to a power source, the LCD
turns amber. Error detection happens regardless of whether the system is
turned on or off.

Configuring the LCD Panel

Simulation Activity: Follow the guided walk-through to practice the
tasks that are required to configure the LCD panel on the PowerEdge
MX7000 chassis.

The web version of this content contains an interactive activity.

To review the tasks and steps involved in completing the MX7000 Left Ear LCD
Panel simulation job aid, download the job aid document from the on-demand
resources section. Or click the Configuring the Left Control Display (LCD) Panel
Job Aids link to review the task and steps online.

Viewing System Front Panel Light-emitting Diode (LED) Status Remotely

Some PowerEdge servers are not delivered with a full LCD panel. These servers
instead have a set of Light-emitting Diode (LED) indicators. The front panel page
in the iDRAC web interface helps administrators view the system ID LED status as
well as the LCD panel information. To get started, administrators select System >
Overview > Front Panel.

The Live Front Panel Feed section displays the current front panel status.

Color Indicators Description of color indicators

Solid blue Indicates that the system is turned on, the system is healthy,
and system ID mode is not active. Press the system health
and system ID button to switch to system ID mode.

Blinking blue Indicates that the system ID mode is active. Press the system
health and system ID button to switch to system health mode.

Solid amber Indicates that the system is in fail-safe mode.

Blinking amber Indicates that the system is experiencing a fault. Check the
system event log or the LCD panel, if available on the bezel,
for specific error messages.

The iDRAC user interface front panel feature is used to remotely view the LED
status on the server front panel.
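
The color indicator table above can be read as a simple lookup. The following Python sketch is only an illustration of that mapping; the names `LED_STATES` and `describe_led` are hypothetical and are not part of iDRAC or any Dell tool.

```python
# Hypothetical lookup built from the color indicator table in this guide.
LED_STATES = {
    ("blue", "solid"): "System is on and healthy; system ID mode is not active.",
    ("blue", "blinking"): "System ID mode is active.",
    ("amber", "solid"): "System is in fail-safe mode.",
    ("amber", "blinking"): "System fault; check the system event log or the LCD panel.",
}


def describe_led(color: str, pattern: str) -> str:
    """Return the meaning of a front panel LED state, or a fallback message."""
    key = (color.strip().lower(), pattern.strip().lower())
    return LED_STATES.get(key, "Unknown indicator state; see the service manual.")


print(describe_led("Amber", "Blinking"))
```

A lookup keyed on both color and pattern keeps the four documented states explicit and makes an unexpected combination fall through to a safe default.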

The live iDRAC front panel section displaying an error message.

Tip: When the system is operating (indicated by the blue health icon
on the LED front panel), both Hide Error and Un-Hide Error are
grayed-out. Only rack and tower servers can hide and unhide errors.

Quick Sync 2

OpenManage Mobile (OMM) and left control panel on a PowerEdge R640 server with the
Quick Sync 2 indicator. Callouts on the left control panel: status LED indicators;
system health and system ID indicator; iDRAC Quick Sync 2 wireless indicator
(optional).

Another optional maintenance feature for 14G and above PowerEdge servers is
Quick Sync 2. Using Quick Sync 2 with OpenManage Mobile (OMM),
administrators can configure, monitor, and troubleshoot 14G and above
PowerEdge servers and the MX7000 Modular system chassis.

To monitor servers, Quick Sync 2 provides:

• System inventory, including CPU and memory details
• Health status
• iDRAC System Event Log (SEL) and Lifecycle Controller Log (LCC)
• Network settings
• Firmware details
• Diagnostics information including SupportAssist reports, console, last crash
screens, boot, and crash videos.

With Quick Sync 2, an administrator can configure the following on the PowerEdge
server:

• iDRAC NIC network settings
• iDRAC root credentials
• First boot device
• System location such as: Datacenter, room, aisle, rack, and slot
• Common BIOS settings such as: System profile, virtualization, logical
processor, boot mode, secure boot, serial comm, serial port, USB ports, and
asset tag.

Important: To learn more about configuring the iDRAC Quick Sync 2,
go to the Integrated Dell Remote Access Controller 9 (iDRAC9) User
Guide on dell.com/support.

Hot Swap and Cold Swap

The terms hot swap and cold swap indicate whether a system component is replaced
while the system is running or while it is shut down.

Hot Swap

What is hot swap?

Hot swap components are identified with an orange tab. Hot swap is the
replacement of a hard drive, system fan, power supply, or other system device while
the server remains in operation. When a hot-swappable device fails, the other
server devices continue to function while the defective device is replaced.

Hot swappable fan in a PowerEdge R750.

An example of how to remove a hot-swappable cooling fan:

1. Press the orange release tab and lift the cooling fan to disconnect the fan from
the connector on the system board.

Note: Do not tilt or rotate the cooling fan while removing it from the
system.

Cold Swap

What is cold swap?

Cold swap components are identified with a blue tab. Cold swap is the process of
installing, connecting, or uninstalling a server device while the server is turned off.

Cold swappable fan in a PowerEdge XR12.

An example of how to remove a cold-swappable cooling fan:

1. Disconnect the cooling fan cable from the system board connector or the
power interposer board (PIB).
2. Holding the blue touch point, lift the cooling fan out of the fan cage.

Note: Do not tilt or rotate the cooling fan while removing it from the system.
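
The color convention described in this topic (orange for hot swap, blue for cold swap) can be captured in a small helper. This is a minimal sketch under that assumption only; `swap_type` and `must_power_off` are hypothetical names, and the service manual remains the authority for any specific component.

```python
def swap_type(touch_point_color: str) -> str:
    """Classify a component by the tab color convention described in this guide."""
    color = touch_point_color.strip().lower()
    if color == "orange":
        return "hot swap"   # replaceable while the server keeps running
    if color == "blue":
        return "cold swap"  # server must be turned off first
    raise ValueError(f"No swap convention documented for color: {touch_point_color!r}")


def must_power_off(touch_point_color: str) -> bool:
    """Return True when the component requires the server to be turned off."""
    return swap_type(touch_point_color) == "cold swap"


print(swap_type("Orange"), must_power_off("blue"))
```

Raising an error for an unrecognized color, rather than guessing, mirrors the practice of checking the service manual before touching a component.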

Adding and Removing Components

Because server designs are unique, always review the service manual before adding
or removing components.

An example of a system board being removed in a PowerEdge R650.

The best practices to follow while adding or removing components are:

• To prevent injury, lift the system with the help of others.
• Do not operate the system without the cover for more than five minutes.
Operating the system without the system cover can damage it.
• Use an anti-static mat and an anti-static strap while working on the internal
components of the system.
• To ensure proper operation and cooling, populate all bays and system fan
locations in the system with a component or a blank.

iDRAC Maintenance Section Overview

The iDRAC9 Maintenance section is where administrators go to troubleshoot
PowerEdge servers. The Maintenance section is also used for other server
management tasks.

iDRAC Maintenance section tab menu.

Review the chart for iDRAC9 Maintenance section capabilities that are based on
server troubleshooting and maintenance scenarios:

If an administrator wants to....                   Go to the iDRAC section....

Export Lifecycle log entries.                      Maintenance > Lifecycle Logs

Access a record of alerts in the iDRAC user        Maintenance > System Event Log
interface.

Access a recording of the last three PowerEdge     Maintenance > Troubleshooting
server boot cycles.

View scheduled server firmware update jobs.        Maintenance > Job Queue

Run a command on a PowerEdge server from the       Maintenance > Diagnostics
iDRAC user interface.

Gather platform information that enables support   Maintenance > SupportAssist
services to resolve platform problems.

Reset the iDRAC default settings.                  Maintenance > Diagnostics

View the most recent crash screen that displays    Maintenance > Troubleshooting
events leading to the system crash.

Perform manual updates, automatic updates, or      Maintenance > System Update
roll back on the PowerEdge server.

Event Logs

Server event logs are used to identify the cause of a problem that continues
despite basic troubleshooting of the server system. The Integrated Dell Remote
Access Controller (iDRAC) displays the server event logs. The event logs provide a
short explanation of system events that occurred. The event log descriptions are
beneficial for troubleshooting.

The iDRAC helps the user to view:

• Lifecycle Controller (LCC) logs: LCC logs provide the history of changes that
are related to components installed on the managed system.
• System Event Logs (SEL): The SEL records system events as they occur on a
managed system. The SEL offers a filtered version of the LC log containing system
events, so each SEL entry is also available in the LCC log. To get started with
the SEL through iDRAC9, go to Maintenance > System Event Log.

View and export the Lifecycle Controller log entries from the Maintenance > Lifecycle Log page in
iDRAC.
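
The relationship described above, where the SEL is a filtered view of the Lifecycle Controller log, can be sketched as follows. The record layout, the `category` labels, and the sample messages are all hypothetical and do not reflect the actual iDRAC log format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LogEntry:
    timestamp: str
    category: str  # hypothetical label: "system" for system events, anything else otherwise
    message: str


def system_event_view(lc_log: List[LogEntry]) -> List[LogEntry]:
    """Return only the system events, preserving the original order."""
    return [entry for entry in lc_log if entry.category == "system"]


# Made-up sample entries for illustration only.
lc_log = [
    LogEntry("2022-01-03T09:00", "config", "Component configuration changed"),
    LogEntry("2022-01-03T09:05", "system", "Fan failure detected"),
    LogEntry("2022-01-03T09:10", "update", "Firmware update job completed"),
]

for entry in system_event_view(lc_log):
    print(entry.timestamp, entry.message)
```

Because the filtered view is derived from the full log, every entry it shows can also be found in the full log, which matches the statement above.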

Playing a Boot Capture Video

Simulation Activity: The boot capture option helps a user view the
video recording of the last three boot cycles. A boot cycle video logs
the sequence of events for a boot cycle. The video log is an effective
tool in troubleshooting system errors.
Navigate through the guided walk-through to learn how to play a boot
capture video from iDRAC.

The web version of this content contains an interactive activity.

To review the tasks and steps involved in completing the Play a Boot Capture
Video File simulation job aid, download the job aid document from the on-demand
resources section. Or click the Playing a Boot Capture Video Job Aid link to review
the task and steps online.

Boot capture files listed under the Troubleshooting tab in the iDRAC.

To configure the boot capture video settings, select one of the following options
and click Apply.

• Disable - Boot capture is disabled.
• Capture until buffer full - The boot sequence is captured until the buffer is
full.
• Capture until end of POST - The boot sequence is captured until the end of POST.

The boot capture timestamp is the time at which the boot capture sequence
completes. The capture completes either when the boot capture file size
reaches 2 MB or when the host system is rebooted.

The list displays the active boot capture file. While the update is in progress, click
Refresh to view the latest timestamp for the boot capture file. You can play the files
directly from iDRAC or save them to a location on your system.
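
The capture settings and completion conditions above can be modeled as a small decision function. This is a sketch of the described behavior, not iDRAC code: the names are hypothetical, and treating 2 MB as 2 x 1024 x 1024 bytes is an assumption.

```python
from enum import Enum

# Assumption: the 2 MB limit mentioned above is interpreted as 2 * 1024 * 1024 bytes.
BUFFER_LIMIT_BYTES = 2 * 1024 * 1024


class CaptureMode(Enum):
    DISABLE = "Disable"
    UNTIL_BUFFER_FULL = "Capture until buffer full"
    UNTIL_END_OF_POST = "Capture until end of POST"


def capture_should_stop(mode: CaptureMode, bytes_captured: int,
                        post_finished: bool, host_rebooted: bool) -> bool:
    """Decide whether a boot capture is complete under the rules described above."""
    if mode is CaptureMode.DISABLE:
        return True  # nothing is captured when the feature is disabled
    if host_rebooted or bytes_captured >= BUFFER_LIMIT_BYTES:
        return True  # a reboot or a full buffer ends any capture
    if mode is CaptureMode.UNTIL_END_OF_POST and post_finished:
        return True
    return False


print(capture_should_stop(CaptureMode.UNTIL_END_OF_POST, 512_000, True, False))
```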

POST Code/ Intrusion/ Last Crash Screen

POST Code, Intrusion, and Last Crash Screen are troubleshooting tools that the
iDRAC provides. Each tool provides a report when a system event occurs.

Go to iDRAC Dashboard > Maintenance > Troubleshooting on the iDRAC to
use these tools.

iDRAC POST Code screen.

POST Codes are the progress indicators from the system BIOS indicating various
stages of the boot sequence. The POST Code option helps view the last system
POST code (in hexadecimal) before booting the operating system of the managed
system. The POST code helps to detect pre-video errors, report fatal errors, and
analyze the system failures during BIOS POST, particularly the No POST No Video
situations. The fatal error codes are used to report all the fatal POST errors.

Intrusion provides the status of the chassis intrusion probes. The Intrusion option is
related to the chassis intrusion switch and indicates whether the server cover is
removed or not seated correctly. An improperly seated server cover can lead to the
system overheating and, as a result, to potential shutdown issues.

The Last Crash Screen option provides information about the events leading to the
system crash. This information is saved in the iDRAC memory and is remotely
accessible. The Last Crash Screen feature is available with iDRAC Express and
Enterprise licenses.

The last crash screen capture is available only with the Windows operating system,
and the user must have installed OpenManage Server Administrator (OMSA). The
last crash screen capture does not work with Linux or ESXi operating systems. If the
Windows operating system fails, the last crash screen feature displays the blue
screen.
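
The prerequisites stated above for the Last Crash Screen feature can be collected into one check. This sketch encodes only what this guide states (Express or Enterprise license, Windows host, OMSA installed); the function name and string values are hypothetical, and other license levels are not considered here.

```python
def last_crash_screen_supported(license_level: str, host_os: str,
                                omsa_installed: bool) -> bool:
    """Check the Last Crash Screen prerequisites as stated in this guide."""
    licensed = license_level.strip().lower() in {"express", "enterprise"}
    is_windows = host_os.strip().lower().startswith("windows")
    return licensed and is_windows and omsa_installed


print(last_crash_screen_supported("Enterprise", "Windows Server 2019", True))
```

Keeping the three conditions separate makes it easy to report which prerequisite is missing when the feature is unavailable.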

iDRAC Diagnostics Console Command screen.

The Diagnostics Console Command is used for running commands to troubleshoot
various network issues.

Some of the commands are listed below:

• arp - Displays the contents of the Address Resolution Protocol (ARP) table.
ARP entries may not be added or deleted.
• ifconfig - Displays the contents of the network interface table.
• netstat - Displays the contents of the routing table. If the optional interface
number is provided in the text field to the right of the netstat option, then netstat
displays additional information. The information that is displayed is regarding
the traffic across the interface, buffer usage, and other network interface
information.
• ping <IP Address> - Verifies that the destination IPv4 address is reachable
from iDRAC with the current routing-table contents. An Internet Control
Message Protocol (ICMP) echo packet is sent to the destination IP address
based on the current routing table contents.
• gettracelog - Displays the iDRAC trace log. It may take a few seconds to
return the trace log. The command gettracelog -i returns the number of
records in the trace log. The -A option returns the trace log without the record
numbers.
• ping6 <IPv6 Address> - Verifies that the destination IPv6 address is
reachable from iDRAC with the current routing-table contents.
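
Because the list above has separate commands for IPv4 (ping) and IPv6 (ping6), an administrator preparing console input needs to pick the right one for a given address. The following sketch uses the Python standard library `ipaddress` module to make that choice; `ping_command_for` is a hypothetical helper that only builds the command text and does not contact iDRAC.

```python
import ipaddress


def ping_command_for(address: str) -> str:
    """Build the diagnostics console command text for an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(address)  # raises ValueError for invalid input
    tool = "ping6" if ip.version == 6 else "ping"
    return f"{tool} {ip}"


print(ping_command_for("192.0.2.10"))
print(ping_command_for("2001:db8::1"))
```

Validating the address before building the command avoids sending a malformed address to the console. The addresses shown are documentation-range examples.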

iDRAC diagnostics screen showing Serial Data Logs and BIOS Live Scanning.

Serial Data Logs

This feature enables you to retrieve the system serial data for operating system
debugging.

NOTE: Serial Data Logs is a licensed feature and is available only with iDRAC
Datacenter license.

BIOS Live Scanning

This feature enables you to scan the system BIOS once POST is completed. This
task can be run once or can be set up on a schedule.

NOTE: BIOS Live Scanning is a licensed feature and is available only with the iDRAC
Datacenter license. This feature is available only on select iDRAC9 x5 systems.

iDRAC Troubleshooting - SupportAssist Collections

The iDRAC SupportAssist section is used to create SupportAssist collections of the
server. SupportAssist then exports the collection to a location on the management
station (local) or to a shared network location. The collection is generated in the
standard ZIP format and can be sent to Dell technical support for troubleshooting or
inventory collection.

iDRAC Maintenance SupportAssist page.

To reach the SupportAssist page in iDRAC9, go to iDRAC Dashboard >
Maintenance > SupportAssist. From the SupportAssist page, an administrator
selects the Start a Collection tab option. The administrator then designates where
the collection is saved, and then clicks Collect to complete the SupportAssist
Collection task.

There are two ways to generate a SupportAssist collection:

1. Automatic: Use the iDRAC Service Module (iSM) that automatically invokes
the operating system collector tool.
2. Manual: Run the operating system collector tool on the server operating system
to export the operating system and application data.
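
Since a SupportAssist collection is a standard ZIP file, it can be inspected with ordinary tools before it is sent to support. The sketch below uses the Python standard library `zipfile` module; the file names inside the demo archive are placeholders and do not reflect the real layout of a collection.

```python
import io
import zipfile
from typing import List


def list_collection(zip_bytes: bytes) -> List[str]:
    """Verify a ZIP archive and return the sorted names of the files it contains."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        bad_member = archive.testzip()  # returns the first corrupt member, or None
        if bad_member is not None:
            raise ValueError(f"Corrupt member in collection: {bad_member}")
        return sorted(archive.namelist())


def build_demo_collection() -> bytes:
    """Create a tiny in-memory ZIP with placeholder content for illustration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("hardware/inventory.txt", "placeholder")
        archive.writestr("logs/lifecycle.txt", "placeholder")
    return buffer.getvalue()


print(list_collection(build_demo_collection()))
```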

Tip: Before generating the SupportAssist collection, ensure that the
Lifecycle Controller is enabled. An event is recorded in the Lifecycle
Controller log each time that the data is collected.

Troubleshooting iDRAC

Resetting iDRAC settings using the web user interface. Callouts: Out-of-Band (OOB)
management switch; server rear iDRAC port (1GB server management); iDRAC
unresponsive due to unknown conditions; iDRAC web UI.

The iDRAC is responsible for system profile settings and out-of-band management.
System conditions can cause the iDRAC to become unresponsive. When the
iDRAC becomes unresponsive, resetting the iDRAC back to factory defaults may
help to resolve the issue.

The web interface or the iDRAC BIOS enables users to reset the iDRAC to its
default settings.

There are three options available to reset iDRAC to default settings:

• Reset iDRAC configuration to defaults – Preserves the iDRAC network settings
and user accounts.
• Reset iDRAC configuration to default all – Resets to factory settings and resets
the default username and password to root and calvin.
• Reset iDRAC configuration to default all – Resets to factory settings and resets
the default username and password to the shipping value (see footnote 14).

iDRAC Reset and Reset iDRAC to Default Settings are listed under the Diagnostics option.

The Reset iDRAC option restarts the iDRAC without losing any settings.

The Reset iDRAC to Default Settings option gives you the following choices:

• Preserve user and network settings.
• Discard all settings and reset users to shipping value.
• Discard all settings and reset username and password.

NOTE: You can perform a hard or soft iDRAC restart without turning off the server.

- Hard restart: On the server, press and hold the ID LED button for 15 seconds.

- Soft restart: Use the iDRAC web interface or RACADM.

14 The unique password that is in the luggage tag.
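
The three reset choices differ in what they keep and which credentials apply afterward. The sketch below models that as data so the consequences can be compared before a reset. The class and field names are hypothetical, and pairing the third choice with root and calvin is an assumption drawn from the descriptions above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ResetOption:
    label: str
    keeps_users_and_network: bool
    credentials_after_reset: str


# Labels follow the choices listed above; the pairing with credentials is an assumption.
RESET_OPTIONS = [
    ResetOption("Preserve user and network settings", True, "unchanged"),
    ResetOption("Discard all settings and reset users to shipping value", False,
                "shipping value (unique password on the luggage tag)"),
    ResetOption("Discard all settings and reset username and password", False,
                "root / calvin"),
]


def options_keeping_remote_access() -> List[str]:
    """Return the choices that preserve the iDRAC network settings and users."""
    return [opt.label for opt in RESET_OPTIONS if opt.keeps_users_and_network]


print(options_keeping_remote_access())
```

Listing the outcome of each choice up front helps an administrator avoid discarding network settings on an iDRAC that can only be reached remotely.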

Reset iDRAC to Default Settings

Simulation Activity: Administrators can reset iDRAC to default
settings through the GUI or by launching the Virtual Console and
using the system setup.
Navigate through the guided walk-through to learn how to reset the
iDRAC back to the default settings.

The web version of this content contains an interactive activity.

To review the tasks and steps involved in completing the Reset iDRAC to Default
Settings simulation job aid, download the job aid document from the on-demand
resources section. Or click the Reset iDRAC to Default Settings Job Aid link to
review the task and steps online.

Server Configuration and Change Management

Configuration and Change Management Overview

Configuration and change management for a PowerEdge R640 server. The diagram
groups the following items around Configuration Management and Change
Management:

Management and Planning           Benefits

Performance measurements          Reduced risks
Resource management               Cost reduction
Hazards and incidents analysis    Improved experience
Change approval process           Strict control
Access and backup storage         Greater agility
Impact analysis                   Efficient change
Version control                   Quicker restoration
Roles and responsibilities        Better releases

What is Configuration Management?

Configuration management is implemented to ensure that the IT enterprise
systems operate with consistency.

Configuration management deals with server specifications and the features and
specifications of related products. Configuration management helps maximize
server performance at all utilization levels and workload types.

Configuration management applies to:

• Server configuration
• Storage system configuration
• Databases
• Networking
• Applications (Services)
• Software configuration

What is Change Management?

Change management is an implementation plan that describes the process for
changing the configuration of a server or related product.

Why is configuration and change management important?

Change and configuration management helps maintain a server’s performance by
applying common standards. The outcome is a maximized return on the investment in
server hardware.

Server Documentation

A well-organized server configuration document is critical in reducing the recovery
time from server downtime.

Best practices for server documentation:

• Network map creation: Every device in the infrastructure should be named and
labeled. Documenting equipment names provides an in-depth guide to hardware
location and connection points.
• Inventory: Maintain a list of all hardware in the network. Include a software
inventory that provides the operating system details, virtual machine details
(including any operating systems), drivers, applications, and all associated
licensing information.
• Log interactions: Keep a thorough account of any action taken involving the
server or hardware, even if it does not seem to affect the connections or
performance.
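
The documentation practices above (a named device, its location and connections, a software inventory, and a log of actions) map naturally onto one record per server. The following Python sketch shows one possible shape for such a record; every field name and sample value is hypothetical, and a real environment would more likely use a dedicated inventory tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServerRecord:
    # Network map details
    name: str
    location: str
    connections: List[str]
    # Software inventory details
    operating_system: str
    virtual_machines: List[str] = field(default_factory=list)
    drivers: List[str] = field(default_factory=list)
    applications: List[str] = field(default_factory=list)
    licenses: List[str] = field(default_factory=list)
    # Log of every action taken on the server
    change_log: List[str] = field(default_factory=list)

    def log_action(self, date: str, action: str) -> None:
        """Record an action, even one that seems not to affect performance."""
        self.change_log.append(f"{date}: {action}")


record = ServerRecord(
    name="srv-01",                    # placeholder device name
    location="rack 4, slot 12",       # placeholder location
    connections=["switch-a port 7"],  # placeholder connection point
    operating_system="example OS",
)
record.log_action("2022-01-03", "Replaced cooling fan")
print(record.change_log)
```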

Procedures and Standards

Impacts from adjusting server settings: power consumption, booting time, server
redundancy, and performance.

Adjusting server settings can increase the productivity of servers. However, some
high-performance settings can increase power consumption or put a system at risk.
Every server is different: the make, model, and architecture all affect
the BIOS settings that are available.

There is no perfect formula to standardize all settings across all servers. Best
practices seek to achieve standardization according to server roles, administration
policies, and procedures.
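
Standardizing by server role, as described above, usually means comparing each server against a baseline for its role rather than against one global template. The sketch below shows that comparison; the role names, setting names, and values are placeholders chosen for illustration and are not recommended settings.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-role baselines; names and values are placeholders only.
ROLE_BASELINES: Dict[str, Dict[str, str]] = {
    "virtualization-host": {"system_profile": "performance", "virtualization": "enabled"},
    "file-server": {"system_profile": "balanced", "virtualization": "disabled"},
}


def find_drift(role: str, actual: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return settings that differ from the role baseline as (expected, actual) pairs."""
    baseline = ROLE_BASELINES[role]
    return {
        name: (expected, actual.get(name))
        for name, expected in baseline.items()
        if actual.get(name) != expected
    }


current = {"system_profile": "balanced", "virtualization": "enabled"}
print(find_drift("file-server", current))
```

Only settings named in the baseline are checked, so a server can carry extra model-specific settings without being flagged, which fits the point that available settings vary by make, model, and architecture.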

Patch Management

Best practices for common patch management:

• Some endpoints, such as smartphones or laptops used by remote workers, are in
physically inaccessible locations. The ability to schedule patch and update
deployments means that no matter what time zone the endpoint is in, fixes
can be applied at a time that does not disrupt productivity.
• Apply critical patch fixes as soon as possible. Server administrators must also
consider how often a piece of software is used, and how business-critical it is,
before deciding how urgently to apply a patch.
• Choose the best time to schedule the fix. Ensure updates are installed outside
of working hours to minimize disruption to business workflows.

Patch management is the process of ensuring the most recent updates are applied
to all software components. Patch management covers applications and services
such as server operating systems and databases, as well as tools like
Internet Explorer and Adobe Flash.

The use of patch management tools is crucial to maintain the productivity and
integrity of work.

Patch management also provides an overview of the network health and the
urgency of a needed fix.
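
The prioritization and scheduling practices in this topic can be sketched in a few lines of Python. This is an illustration, not a patch management tool: the `Patch` fields, the ordering rule, and the 08:00 to 18:00 working hours are assumptions, and weekends and time zones are ignored for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import List


@dataclass(frozen=True)
class Patch:
    name: str
    critical: bool           # critical fixes go first
    business_critical: bool  # then software the business depends on
    weekly_uses: int         # then the most frequently used software


def order_patches(patches: List[Patch]) -> List[Patch]:
    """Sort patches so the most urgent ones come first."""
    return sorted(
        patches,
        key=lambda p: (not p.critical, not p.business_critical, -p.weekly_uses),
    )


def next_install_time(now: datetime,
                      work_start: time = time(8, 0),
                      work_end: time = time(18, 0)) -> datetime:
    """Return now if outside working hours, otherwise the end of the working day."""
    if work_start <= now.time() < work_end:
        return datetime.combine(now.date(), work_end)
    return now


patches = [
    Patch("viewer", False, False, 50),
    Patch("database", True, True, 5),
    Patch("operating-system", True, False, 100),
]
print([p.name for p in order_patches(patches)])
print(next_install_time(datetime(2022, 1, 3, 10, 0)))
```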

Windows Server Update Services

From an enterprise perspective, updating Windows for vulnerabilities on each
server and client system can be daunting. Windows Server Update Services
(WSUS) enables IT administrators to deploy up-to-date Microsoft product patches
efficiently.

Use WSUS servers to distribute patches and updates to clients and servers.

WSUS operational overview. The diagram shows updates from Microsoft Update
reaching the WSUS server over the internet through a firewall. The WSUS server
then distributes updates to all clients with Microsoft Windows OS and all servers
with Microsoft Windows Server OS.

WSUS provides administrators with:

• Centralized management that allows postponing, pausing, and deployment from
a WSUS server to the client.
• Reduced external network bandwidth. WSUS can update clients and servers that
do not have access to the internet.
• Options for approving, not approving, declining, and removing patches.

Tip: To learn about updating Dell PowerEdge servers, go to the
www.dell.com/support site and search for the Dell EMC System
Update (DSU) knowledge base article.

Resources

Supporting Resources: Server Maintenance

The following resources support the concepts and features that are discussed in
this training. Click the provided links for more information.
• Supported Memory Configuration Guide for PowerEdge Servers
− Learn how to find supported memory configuration for your PowerEdge
server.

Tip: Visit the Resource Library for Dell Technologies products at
www.dell.com for more server product information.

Certification Journey Map

Implementation Engineer, PowerEdge
A. Dell EMC PowerEdge Installation, Administration, and Troubleshooting
(C, VC, ODC)

Implementation Engineer, PowerEdge MX Modular
A. Dell EMC PowerEdge MX Modular Platform Installation, Implementation, and
Administration (C, VC, ODC)

Implementation Engineer, PowerEdge VRTX, FX, M Series
A. Dell EMC PowerEdge M1000e Installation, Administration and Troubleshooting
(ODC)
B. Dell EMC PowerEdge VRTX Installation, Administration and Troubleshooting
(ODC)
C. Dell EMC PowerEdge FX2 Installation, Administration and Troubleshooting
(ODC)
D. Dell EMC OpenManage Enterprise Features, Implementation, and
Administration (VC, ODC)

Systems Administrator, OpenManage Enterprise
A. Dell EMC OpenManage Enterprise Features, Implementation, and
Administration (VC, ODC)

PowerEdge
Dell EMC PowerEdge Server Concepts (ODC)

(C) - Classroom
(VC) - Virtual Classroom
(ODC) - On Demand Course

• PowerEdge Server certification track starts with the PowerEdge Server
concepts curriculum (Associate Certification). The curriculum is a prerequisite
for other PowerEdge Specialist certifications.
• The specialist level certification, PowerEdge Implementation Engineer provides
an overview of the PowerEdge portfolio and the technologies involved. It
explores System Management tools and Server security. The course also
covers Server Maintenance and troubleshooting in detail.
• OpenManage Enterprise Systems Administrator certification helps the learners
to deploy, configure, and manage OpenManage Enterprise.
• PowerEdge VRTX, FX, M Series Implementation Engineer certification validates
the learner’s capability to install, manage, and troubleshoot M1000e, VRTX and
FX modular servers.
• Finally, with the PowerEdge MX Modular Implementation Engineer certification
the learners will be able to install, configure, manage, and troubleshoot the Dell
EMC PowerEdge MX7000 platform including MX Networking.

For more information, visit: http://dell.com/certification

Appendix

DELL EMC SmartUPS 1500 SMARTCONNECT 120V RM


Dell EMC Smart-UPS provides availability and manageability to your network
allowing you to focus on business growth instead of business downtime.

Dell EMC Smart UPS

Glossary
15G
Generation 15 modifier to distinguish different features available for generation 15
servers.

2S
Two socket form factor. Used to identify the family of servers. PowerEdge servers
can have 1S, 2S, or 4S. See the PowerEdge rack server portfolio page for details.

AI
Artificial Intelligence (AI) is the designing and building of intelligent agents that
receive percepts from the environment and act to affect that environment.

BOSS
Dell Technologies Boot Optimized Storage Solution (BOSS). RAID solution card that is
designed for booting a server's operating system.

bus interface
A bus interface is a communication system that transfers data between
components inside a system, or among systems.

CMC
Chassis Management Controller (CMC) is a hardware and software solution for
managing multiple Dell blade chassis.

configuration
A configuration is the set of specifications that make the systems in the IT
enterprise environment work.

connector
A connector is a jack, plug, or card edge that helps connect the device to a port.

CPU
A Central Processing Unit (CPU) is also known as a processor, and acts as the
brain of the server.

DIMM
Dual In-line Memory Module. DIMMs are available in varying capacities. All
DIMMs in a cache must have the same capacity.

DL
Deep Learning (DL) is a form of Machine Learning which uses Artificial Neural
Networks.

EQL group manager
SAN and NAS management tool integrated with the Dell EqualLogic PS Series
firmware.

HCI
Hyperconverged infrastructure (HCI) combines compute, virtualization, storage,
and networking in a single cluster.

Hot-swap
Hot-swap means the removal and replacement of an electronic device or module
without powering down or shutting down the system.

HPC
High performance computing (HPC) is the ability to process data and perform
complex calculations at high speeds.

HW RAID
Hardware RAID. A form of RAID in which the motherboard or a separate RAID card
handles the processing.

iDRAC
The Integrated Dell Remote Access Controller (iDRAC) is designed for secure local
and remote server management and helps IT administrators deploy, update, and
monitor PowerEdge servers.

IDSDM
Internal Dual SD Module (IDSDM). Redundant SD-card module for embedded hypervisors. PowerEdge servers can
boot to the hypervisor out-of-the-box. The embedded hypervisor is mirrored across
dual SD cards using an integrated hardware controller.

IEEE 802.3
The Institute of Electrical and Electronics Engineers (IEEE) 802.3 is a collection
of IEEE standards for Ethernet. The standards are set by the working group that
defines the physical layer and the Media Access Control (MAC) sublayer of the
data link layer.

InfiniBand
A computer networking communications standard used in high-performance
computing.

Intel Ice Lake
Codename for the 3rd generation Xeon Scalable server processors.

IoT
The Internet of Things (IoT) describes the network of physical objects that are
embedded with sensors, software, and other technologies for the purpose of
connecting and exchanging data with other devices and systems over the Internet.
(Wikipedia)

iSM
The iDRAC Service Module (iSM) is optional software that complements the Integrated
Dell Remote Access Controller (iDRAC). The iSM provides additional information
to the iDRAC interfaces: RACADM CLI, Redfish, Web Services Management (WSMan),
and the user interface (UI). The iSM integrates with the iDRAC SupportAssist collection.

Lifecycle Controller
The Lifecycle Controller provides deployment and simplified serviceability for local
deployment, Remote Service (WSMan), and Redfish interfaces.

live front panel
The iDRAC user interface is used to check the status of the LED front panel.

LRDIMM
Load-Reduced DIMM. Has higher densities than RDIMMs. Uses a memory buffer
chip to reduce the load on the server memory bus.

Media Access Control (MAC) address
The server is connected to the network through a switch. The switch uses the
Media Access Control (MAC) address to identify the server's network interface.
A MAC address is a unique identifier for an Ethernet adapter or network
interface card (NIC) on a network.

ML
Machine Learning (ML) is an application of AI where systems use data to learn how
to respond, rather than being explicitly programmed.

MT/s
Mega-Transfers per Second (MT/s). Measurement of bus and channel speed in
millions of data transfers per second.
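
As a rough illustration of how an MT/s rating relates to bandwidth, the sketch
below multiplies the transfer rate by the number of bytes moved per transfer.
It assumes a 64-bit (8-byte) data path per memory channel and ignores ECC bits
and real-world overheads, so the result is a theoretical peak only. The
function name and the example speeds are hypothetical and are not taken from
this course.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: int, bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth of one channel in GB/s (10^9 bytes per second).

    Assumption: every transfer moves bus_width_bits of data (64 bits by default).
    """
    bytes_per_transfer = bus_width_bits // 8
    transfers_per_s = mega_transfers_per_s * 1_000_000
    return transfers_per_s * bytes_per_transfer / 1_000_000_000


# Example: a channel rated at 3200 MT/s with a 64-bit data path.
print(peak_bandwidth_gb_s(3200))  # 25.6
```

Under the same assumptions, a channel rated at 2933 MT/s would peak at about
23.5 GB/s.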

Multicasting
Multicasting involves sending the same message to many endpoints such as in a
video conferencing facility.

NVDIMM
Non-Volatile DIMM

NVMe
Non-Volatile Memory Express (NVMe). Communications interface for PCIe-based
SSDs. Used to increase efficiency and performance.

OCP
Open Compute Project (OCP) is an organization that shares designs of data center
products and best practices among companies. OCP designs and projects include
server designs, data storage, rack designs, and open networking switches. Read
more information about the organization by going to www.opencompute.org.

OME
OpenManage Enterprise (OME) is the one-to-many management console used to
discover and manage up to 8,000 devices regardless of the form factor.

OMSA
Dell OpenManage Server Administrator (OMSA) is an in-band, one-to-one software
application that manages and monitors the health of a single server.

Optane Persistent Memory
Intel memory where non-volatile data is placed onto a DIMM and installed on the
memory bus.

PCH
Platform controller hub (PCH) controls certain data paths and support functions
used in conjunction with Intel CPUs.

PCIe
Peripheral component interconnect express (PCIe) is an interface standard for
connecting high-speed components.

PERC
PowerEdge RAID Controller (PERC). Family of controllers that enhance
performance, increase reliability, add fault tolerance, and simplify management.

proactive contact
A support case is automatically opened, diagnostic information is proactively sent
to Dell Technologies, and technical support begins troubleshooting. Proactive
contact supports Windows, Linux, VMware, and Hyper-V environments.

RAID
Redundant Array of Independent Disks (RAID). RAID controllers combine multiple
physical hard drives in a server into one or more virtual drives to improve
data efficiency and protection.

RDIMM
Registered DIMM. Dual in-line memory module (DIMM) with improved reliability.

SAS
SAS (serial-attached SCSI) is a type of SCSI that uses serial signals to transfer
data, instructions, and information. SAS drives are dual ported.

SATA
SATA (Serial Advanced Technology Attachment) uses serial signals to transfer
data, instructions, and information. SATA drives have only a single port.

SDS
Storage data services such as APEX Data Storage Services. APEX is an as-a-
Service portfolio of scalable and elastic storage resources. The storage as-a-
Service model simplifies the storage process.

SNAP I/O
Balances I/O performance. CPUs share one adapter, which prevents data from
traversing the inter-processor link when accessing remote memory.

SP
A service provider (SP) is a company that provides its subscribers access to the
internet.

STP cable
Shielded Twisted Pair (STP) Ethernet cable that is commonly used for high-speed
networks. The cable is protected by a metallic shield, and an additional metal
foil wraps each set of twisted wire pairs.

SupportAssist
SupportAssist Enterprise is for users that require monitoring of up to 15,000 server,
storage, and networking devices.

UDIMM
Unregistered or unbuffered DIMM. UDIMMs do not have an onboard register as
seen with an RDIMM. UDIMMs are typically used in desktops and laptops.

UEFI boot
Unified Extensible Firmware Interface (UEFI). UEFI secure boot prevents systems
from booting from unsigned or unauthorized preboot device firmware, applications,
and operating system boot loaders. Without secure boot enabled, systems are
vulnerable to malware corrupting the startup process. UEFI is a firmware interface
that connects the firmware to the operating system. UEFI initializes the hardware
components and starts the operating system.

UTP cable
Unshielded Twisted Pair (UTP) Ethernet cable that is commonly used between a
system and wall. It is also used for desktop communication applications.
