HARDWARE
MAINTENANCE
PARTICIPANT GUIDE
The graphic shows the PowerScale Solutions Expert certification track. You can
leverage the Dell Technologies Proven Professional program to realize your full
potential. The program combines technology-focused and role-based training and
exams that cover concepts and principles as well as the full range of Dell
Technologies' hardware, software, and solutions, helping you accelerate your
career and your organization’s capabilities.
Prerequisite Skills
To understand the content and successfully complete this course, a student must
have a suitable knowledge base or skill set. The student must have an
understanding of:
• Current PowerScale hardware portfolio and the OneFS operating system
• PowerScale Concepts
Course Objectives
Module Objectives
The graphic shows a few basic reminders that are common to all hardware
maintenance procedures.
If you encounter any difficulties while performing this task, immediately contact Dell
EMC Technical Support.
1: Customer Replaceable Units (CRUs) can be removed without shutting down the
node. This means you can perform CRU replacements and CRU hardware upgrades
while the node is powered up, as long as you follow the correct procedure. Field
Replaceable Units (FRUs), on the other hand, require the node to be powered off.
If you must power off a node, always shut it down properly as described in the
replacement guide.
2: On Generation 6 (Gen 6) nodes, before disconnecting any cables, ensure that
the compute module's Do Not Remove LED is off. When the LED is lit (white), the
node’s journal is still active. The Do Not Remove LED is on the right side of the
compute module and looks like a hand with a slash through it. Do not disconnect
any cables until this LED is off.
3: On Generation 5 nodes (Gen 5), never power off a node by pressing the power
button or removing both power cables unless you are expressly directed to do so
by Dell EMC Technical Support.
4: Use the SolVe Desktop tool to get the most recent, full instructions for the
procedure. These instructions are frequently updated based on feedback from the
field, so be sure to consult them before every engagement, even if you have
performed the requested service before.
SolVe Desktop has been revised and updated to SolVe Online. It is a knowledge
management-led standard procedure tool for Dell EMC field personnel, service
partners, and customers.
6: Save the packaging from the replacement part. Use this packaging to return the
failed part to Dell EMC. A return label is included with the replacement part.
7: If the customer and/or Dell EMC Technical Support requests Failure Analysis
(FA) on the replaced part, attach a completed FA label to the return box and
complete an FA request ticket in the WWFA system. Provide the FA ticket number
to your Support contact and/or add it to the SR in a comment.
8: After all work is complete, partner personnel should submit the Partner
Notification Form (PNF) to allow Dell EMC Technical Support to update the install
database. Dell EMC personnel should update the install database directly by going
to the Dell EMC Business Services website. In the Post Sales area, click Install
Base Group, then complete and submit the form.
Electrostatic Discharge
Anti-static packaging: Leave components in anti-static packaging until it is time
to install them.
Cold serviceable: blue handles
Hot serviceable: terracotta handles
Preparing a Node
Module Objectives
We’ll start off by covering how to safely power down a node. Remember, on a Gen
5 node, never power down a node by pressing the power button unless explicitly
instructed to do so by Dell EMC Technical Support. To power down a node, first
connect to the cluster. This can be done using SSH or the serial port. If using a
terminal emulator utility with a serial port connection, here are the settings to use:
In this exercise, we will shut down node 3. After connecting to the cluster, the
next step is to get the IP address of the node to shut down. Here we are connected
and logged in to the cluster on node 1.
Run isi status -q to get the node’s IP address.
Log on to the node that you want to shut down and type the command to shut the
node down: shutdown -p now.
To check that the node is powered down, connect to another node in the cluster
and run isi status -q again.
The shutdown node has a status of D--- (Down) in the command output.
Note:
• Gen 6 nodes do not support serial flow control; set the flow control setting
to 'None' when connecting to Gen 6 hardware.
• The isi network interfaces list -v command can also be used to get the node's
IP address.
• The isi config command can also be used to shut down a node.
• It is recommended to run the isi_flush command before performing any shutdown
to flush the cache.
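The shutdown steps in this exercise follow a fixed order. The Python sketch below only records that order for the node 3 example; the helper name `shutdown_plan` and its (where, command) output are hypothetical conveniences for illustration, not a OneFS tool, and the code does not connect to or change anything on a cluster.

```python
from typing import List, Tuple


def shutdown_plan(target_lnn: int, verify_from_lnn: int) -> List[Tuple[str, str]]:
    """Return the ordered (where to run, command) steps for shutting down one node.

    Hypothetical helper: it only lists the commands described in this section
    and does not execute anything.
    """
    if target_lnn == verify_from_lnn:
        # The result must be checked from a node that stays up.
        raise ValueError("verify from a node other than the one being shut down")
    return [
        (f"node {verify_from_lnn}", "isi status -q"),  # look up the target node's IP address
        (f"node {target_lnn}", "isi_flush"),           # recommended: flush the cache first
        (f"node {target_lnn}", "shutdown -p now"),     # power the node down
        (f"node {verify_from_lnn}", "isi status -q"),  # target should now show D--- (Down)
    ]


for where, command in shutdown_plan(target_lnn=3, verify_from_lnn=1):
    print(f"{where}: {command}")
```

Keeping the verification step on a different node is the point of the check: a node that has powered off cannot report its own status.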
Module Objectives
FRU
You can watch the videos of replacement procedures in the next few slides.
Gen 6
Movie:
Link:
https://edutube.emc.com/vlearning/launch/cPBJf1qfSHOzdb2Q|@$@|pA1wg==/videodetails=false,comments=false,launch=yes
Node: Gen 6
Part 1
When facing the back of the chassis, the compute modules are labeled one to four
from right to left, as shown. Because compute modules are installed in pairs called
“node pairs”, the minimum cluster size has increased from three to four nodes, and
additional nodes must be added in node pairs. As the graphic shows, each node pair
occupies either the left half or the right half of the chassis.
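The slot numbering and pairing rule can be checked with a short Python sketch. It assumes only what this section states: slots 1 to 4 are numbered right to left when facing the back of the chassis, a node pair is one half of the chassis, a cluster needs at least four nodes, and nodes are added in pairs. The function names are hypothetical.

```python
def node_pair(slot: int) -> str:
    """Return the chassis half that a Gen 6 compute-module slot belongs to.

    Slots are labeled 1-4 from right to left when facing the back of the
    chassis, so slots 1-2 form the right-hand node pair and 3-4 the left-hand pair.
    """
    if slot not in (1, 2, 3, 4):
        raise ValueError(f"slot must be between 1 and 4, got {slot}")
    return "right" if slot <= 2 else "left"


def valid_gen6_node_count(count: int) -> bool:
    """At least four nodes, grown in node pairs, so the count is always even."""
    return count >= 4 and count % 2 == 0


print(node_pair(1), node_pair(4))                             # right left
print([n for n in range(3, 9) if valid_gen6_node_count(n)])  # [4, 6, 8]
```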
Part 2
Link:
https://edutube.emc.com/Player.aspx?vno=S3kFK4UC82Qmnc2PUec1iA==&autoplay=true
• Transfer the internal components from the failed compute module to the
replacement unit, except for the battery
Remove the replacement node from the shipping package and inspect it for any
sign of damage. Notify Dell EMC Isilon Technical Support if the node appears
damaged in any way. Do not install a damaged node. Do not discard the shipping
container and packaging. You'll use them to return the failed node to Dell EMC.
Power down the node by following the instructions in the replacement guide.
Label the network cables connected to the back of the node to ensure that you can
reconnect them correctly. Before you disconnect any cables from a node, make sure
the Do Not Remove LED is not lit. When the Do Not Remove LED is off, disconnect all
cables from the back of the node. If there are transceivers connected to your
network cables, remove them from the node. You might see LEDs lit inside the node,
even after you have removed the power cord. That is because the node next to it is
supplying redundant power. On the back of the chassis, loosen the
orange thumbscrew that secures the node release lever. To eject the node from the
node bay, pull the orange handle away from the node. Slowly pull the node out of
the bay. Support the node at front and back with both hands as you remove it from
the bay. Place the node on an ESD protected work surface next to the replacement
node.
Position the node with the fans facing you. The blue release handle should be
under the fans. Place the heel of your hand on the gray connectors above the fans
and grab the blue release handle with the fingertips of your other hand. Make sure
you are not pressing down on the top of the node with the heel of your hand, as that
will keep the node lid from popping up when you pull the release handle. Pull on
the blue release handle to lift the lid up from the node. You will feel an initial pop as
the blue release handle pulls away from the node. Pull on the release handle until
you feel a second pop to raise the lid up off the node. Lift the lid straight up off the
node and place it next to the body of the node. Repeat for the replacement node.
Inside the left side of the node body, just behind the fans, locate the blue touch
point label. Place the thumb of your left hand on the blue touch point and press the
side of the node away from the fans. The metal tab that holds the fans in place will
flex away from the fans so you can remove them. Slide the fans straight up out of
the node with your right hand. Repeat for the replacement node.
Locate the two blue tabs for removing the HBA riser. There's a sliding tab at the
back of the riser, and a fixed tab at the front. Complete the following three steps at
the same time. To free the back end of the riser, push the sliding tab in the
direction of the arrow on the tab. To free the front end, pull the riser away from the
locking pin on the side of the chassis with the fixed tab. Lift up on the tabs to
unseat the riser and pull it straight up out of the node.
Remove both the internal and external NICs from the HBA riser. When you are
looking down the length of the HBA riser, with the battery pack close to you, the
internal NIC is on your left, closest to the bottom of the riser. Remove the retaining
screw that secures the internal NIC to the chassis and set it aside. Pull the NIC
straight up out of its slot. Make a note that this is the internal NIC. Repeat for the
external NIC.
Disconnect the battery pack and remove it from the HBA riser. Press in on the
locking tab and disconnect the battery cable. Push in on the retaining tabs on the
bottom edge of the riser and lift up to free one side of the battery pack. Roll the
battery pack away from the riser to free the other side of the pack and remove it.
Unsnap and open the black retaining tab at the end of the M.2 vault card. Lift the
free end of the card at an angle and pull the card away from the connector.
After you remove the M.2 vault card, re-install the battery pack in the HBA riser
from the failed node. The replacement node already contains a battery. Hook the
two battery pack feet closest to the battery cable into the slots on the riser. Roll the
battery pack down until it is flat against the M.2 vault card, and then push in on the
retaining tabs until they click into the slots.
Slide the HBA riser into the node and secure it in place. Align the metal tab next to
the sliding tab on the riser with the slot on the node chassis. Slide the riser
downward until you seat the riser in the chassis. When you push the riser down to
seat it, you will see the sliding tab click forward and back as it secures the riser in
place. Make sure that the locking pin next to the fixed tab at the front of the riser
aligns with the locking slot in the chassis. The locking pin might sit away from the
side of the chassis. You can pinch the side of the chassis and the riser together to
make sure that the locking pin aligns with the slot on the chassis. When you install
the fans, the side of the fan module will hold the locking pin in place.
Remove the HBA riser from the replacement node using the same technique as
before. Remove the battery pack using the same technique as before.
Unsnap and open the black retaining tab at the end of the M.2 vault card. Insert the
connecting end of the M.2 vault card at an angle into the connector on the new
HBA riser card. Lower the other end of the M.2 vault card until it lies flat against
the HBA riser. Snap the retaining tab closed over the end of the card. Re-install the
battery using the same technique as before.
Locate the slot where you will install the internal NIC. Align the bottom of the card
with the appropriate slot and push the NIC into the slot. Secure the card to the
chassis using the retaining screw. Repeat for the external NIC. If you're installing a
10Gb NIC, the card is shorter than the internal NIC. You must install it in the middle
slot, right next to the internal NIC. If you're installing a 40Gb NIC, the card looks
just like the internal NIC. You must install it in the far-right slot, closest to the blue
HBA riser release tab.
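The NIC placement rules above can be summarized as a lookup. This Python sketch assumes the installation step uses the same viewpoint as the removal step (looking down the length of the HBA riser with the battery pack closest to you, internal NIC on the left); the NIC type keys and slot names are illustrative labels, not part names.

```python
def nic_slot(nic: str) -> str:
    """Return the HBA riser slot for a NIC in a Gen 6 compute module.

    Positions assume you are looking down the length of the riser with the
    battery pack closest to you; the slot names are illustrative labels.
    """
    slots = {
        "internal": "left",            # closest to the bottom of the riser
        "external-10gb": "middle",     # shorter card, right next to the internal NIC
        "external-40gb": "far-right",  # looks like the internal NIC; next to the blue release tab
    }
    if nic not in slots:
        raise ValueError(f"unknown NIC type: {nic!r}")
    return slots[nic]


print(nic_slot("external-40gb"))  # far-right
```

The table form makes the easy mistake visible: a 40Gb external NIC resembles the internal NIC, so identify cards by the note you made during removal rather than by appearance.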
Transfer all DIMMs from the failed node to the replacement node. In the
replacement node, press down on the DIMM retaining tabs. Do the same in the
failed node for the first DIMM. Pull the DIMM straight up to remove it from the slot.
Make note of the slot from which the DIMM is removed. Transfer it into the
corresponding slot in the lid of the replacement node. Align the DIMM with the slot
and press down until the retaining tabs snap into place. Push on the retaining tabs
to make sure they are closed. Repeat for all remaining DIMMs.
With the label on top of the fans facing you, insert the rails on either side of the fans
into the slots on the sides of the node. Press down on the fans until you feel them
click into place. Repeat for the replacement node.
Make sure the blue release handle below the fans is pulled out completely. Place
the node lid onto the body of the node. You can use the cutouts on the side of the
lid to align the lid with the node body. Make sure that the lid is not in contact with
the HBA riser or any other internal components, otherwise you might damage
something when you secure the lid. Apply gentle pressure to the top of the lid with
one hand as you push in the blue release handle with the other hand. You'll feel the
lid pull down onto the node as you push in the release handle. If you do not feel the
lid pull down onto the node, pull the release handle back out and make sure that
the lid is properly aligned with the node body. Brace one hand against the back of
the node and push the blue release handle all the way in to secure the lid to the
node body. Repeat for the replacement node.
Keep the lever in the open position until the node is pushed all the way into the
bay. Support the node with both hands and slide it into the node bay. Push the
release lever in against the node back panel. You can feel the lever pull the node
into place in the bay. If you do not feel the lever pull the node into the bay, pull the
lever back into the open position, make sure that the node is pushed all the way
into the node bay, then push the lever in against the node again. Tighten the
thumbscrew on the release lever to secure the lever in place. Locate the labels on
the network cables and connect them to the correct ports on the back of the node.
Locate the power cable and connect it to the back of the node. Drop the metal bale
down over the power cord to secure the connector in place. The node will
automatically power up when you connect the power cable.
• Gather logs
If you encounter any difficulties while performing this task, contact Dell EMC
Technical Support.
Gen 6.5
Movie:
Link:
https://edutube.emc.com/vlearning/launch/mxpAovnlxV1ZXK1wL4Mr1g==/videodetails=false,comments=false,launch=yes
Module Objectives
Node Compatibility
1: First, Gen 6 nodes can be in a cluster with Gen 5 and Gen 4 nodes. However,
Gen 5 and earlier nodes are not compatible with a Gen 6 cluster that uses an
Ethernet back-end. This means you cannot add Gen 5 nodes to a Gen 6 cluster that
uses an Ethernet back-end, but Gen 6 and Gen 5 nodes can share the same
InfiniBand back-end.
2: Node compatibility enables you to transition to the new hardware gradually,
without a forklift upgrade, by allowing you to add one node at a time to an
existing node pool. This is more cost-effective than adding the minimum number of
nodes needed to start a new node pool with all-new hardware. Once a customer has
grown the new node count to a sufficient quantity, node compatibility can be
disabled on an individual node pool.
3: Enabling SSD compatibility allows customers to replace older, smaller SSDs with
new, larger SSDs to gain more L3 cache space. This lets customers better utilize
storage resources. Every node in the pool must be the same model or of the same
series or family. If the OneFS version is earlier than OneFS 8.0, every node in
the pool must also have the same number of SSDs.
SSD: Gen 6
Movie:
Link:
https://edutube.emc.com/Player.aspx?vno=Ti76c637o7LUEAJj9GMayw==&autoplay=true
Script: If more than one cache SSD is installed, review the cluster event
associated with the failed SSD to determine which SSD to replace. Refer to the
replacement guide for more information. Press up on the orange tab to free the
bottom of the protective cover from the node, then swing the cover up and remove
it. On the face of the SSD, press up on the orange release button to release the
SSD handle. Rotate the SSD handle downward until it is perpendicular to the
compute module. Pull the SSD from the node.
Install the new cache SSD into the back of the node. If both SSD bays are empty,
install the SSD into the bay on the right. Make sure the SSD handle is completely
open and insert the SSD into the empty drive bay. Rotate the SSD handle upward
to seat the SSD and lock it in place. Place the upper tab of the SSD cover into the
slot above the SSDs. Swing the bottom of the SSD cover down and press it up into
the back of the node until it clears the catches and rests securely in place.
If you encounter any difficulties while performing this task, contact Dell EMC
Technical Support.
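The Node Compatibility rules listed earlier in this module reduce to two checks: whether an older node can join a Gen 6 cluster given its back-end, and whether a pool qualifies for SSD compatibility. The Python sketch below is a simplified reading of those rules, not the logic OneFS itself uses; the integer generation numbers, back-end strings, `PoolNode` fields, and placeholder model and family names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


def can_join_gen6_cluster(node_generation: int, backend: str) -> bool:
    """Gen 6 nodes work with either back-end; Gen 5 and earlier need InfiniBand."""
    if backend not in ("ethernet", "infiniband"):
        raise ValueError(f"unknown back-end: {backend!r}")
    return node_generation >= 6 or backend == "infiniband"


@dataclass
class PoolNode:
    model: str      # node model (placeholder names in the example below)
    family: str     # series or family the model belongs to
    ssd_count: int  # number of SSDs installed in the node


def ssd_compatible(nodes: List[PoolNode], onefs_version: Tuple[int, int]) -> bool:
    """Apply the SSD compatibility rules for a node pool as stated in this module."""
    same_model = len({n.model for n in nodes}) == 1
    same_family = len({n.family for n in nodes}) == 1
    if not (same_model or same_family):
        return False
    if onefs_version < (8, 0):
        # Before OneFS 8.0, every node must also have the same number of SSDs.
        return len({n.ssd_count for n in nodes}) == 1
    return True


pool = [PoolNode("model-A", "family-1", 1), PoolNode("model-B", "family-1", 2)]
print(ssd_compatible(pool, (8, 0)), ssd_compatible(pool, (7, 2)))  # True False
```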
Regardless of the node type, each compute module slot pairs with five drive sled
bays. Depending on the length of the chassis and type of drive, this means that
each node can have up to thirty drives, or as few as fifteen with every sled in place.
Every node needs a consistent set of sled types, and drive types in each sled,
meaning you cannot mix-and-match different drives within a sled or different sleds
in node slots. There are three types of drive sleds. For 3.5" drives, there are long
and short sleds, and for 2.5" drives there is a short sled that contains up to six
drives. The 3.5" drives come with a paddle card that connects the drive into the
sled, while the 2.5" drives connect directly into the sled. A 3.5" drive physically
fits into a sled without a paddle card, but it cannot connect to the sled without
one.
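The thirty-drive and fifteen-drive figures follow from five sled bays per node multiplied by the drives in each sled. In this Python sketch, the six-drive 2.5" sled comes from the text, the three-drive 3.5" short sled is inferred from 15 / 5, and the four-drive 3.5" long sled is inferred from its four LED viewing locations; the sled-type keys are hypothetical labels.

```python
from typing import List

SLEDS_PER_NODE = 5

# Drives per sled: the 2.5" value is stated in the text; the 3.5" values are
# inferred (15 drives / 5 sleds = 3, and four LED locations on the long sled).
DRIVES_PER_SLED = {
    "3.5-short": 3,
    "3.5-long": 4,
    "2.5-short": 6,
}


def drives_per_node(sled_types: List[str]) -> int:
    """Drive capacity of one node, given the sled type in each of its five bays."""
    if len(sled_types) != SLEDS_PER_NODE:
        raise ValueError("each compute module slot pairs with exactly five sled bays")
    if len(set(sled_types)) != 1:
        raise ValueError("a node must use one consistent sled type")
    return SLEDS_PER_NODE * DRIVES_PER_SLED[sled_types[0]]


for sled in DRIVES_PER_SLED:
    print(sled, drives_per_node([sled] * SLEDS_PER_NODE))
```

Rejecting mixed sled lists mirrors the rule that a node cannot mix different sleds in its slots.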
Drives: Gen 6
Part 1
Internal to the 2.5" sled, there are individual fault lights for each drive. The yellow
LED associated with each drive is visible through holes in the top cover. A
supercapacitor can keep one light lit for around 10 minutes while the sled is out of
the chassis, but if more than one light is lit (indicating multiple drive failures) the lit
time is correspondingly reduced.
In the 3.5" drive sleds, the yellow drive fault LEDs are on the paddle cards, and
they are visible through the cover of the drive sled so that you can see which drive,
if any, needs replacement. The graphic shows the 3.5" short drive sled; the 3.5"
long sled has four LED viewing locations.
Part 2
Movie:
The web version of this content contains a movie.
Link:
https://edutube.emc.com/Player.aspx?vno=hfBe54RKpBcgPpEvb2/dXQ==&autoplay=true
Part 1
The graphic shows the drive sled lights and what they indicate. All twenty sleds
can be serviced individually. On running nodes, do not remove more than one sled
per node at a time. The typical procedure is to go to a chassis where a fault has
been detected, inspect the sleds to see which one shows a fault light, press the
service request button, and wait until the LED stops blinking and goes dark. Then
remove the sled, replace the drive, and reinsert the sled. The node automatically
detects and configures the replacement drive.
The service request button informs the node that the sled will be removed, and the
node prepares for this by moving key boot information from drives on that sled,
suspending the drives in the sled from the cluster file system, and then spinning
them down. This is to maximize survivability in the event of further failures, and to
prevent cluster file system issues that are caused by multiple drives becoming
temporarily unavailable.
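The removal procedure above amounts to three preconditions that must all hold before a sled comes out of a running node. This Python sketch checks them; the function name, the `led_state` strings, and the per-node counter are hypothetical stand-ins for what a technician observes, not values reported by OneFS.

```python
def ready_to_pull_sled(service_requested: bool, led_state: str, sleds_out_on_node: int) -> bool:
    """Check the preconditions for removing a drive sled from a running node.

    led_state is a simplified, hypothetical reading of the sled LED after the
    service request button is pressed: "blinking" while the node prepares the
    sled, "off" once it has gone dark.
    """
    if not service_requested:
        return False  # press the service request button first
    if led_state != "off":
        return False  # wait until the LED stops blinking and goes dark
    return sleds_out_on_node == 0  # never more than one sled out per running node


print(ready_to_pull_sled(True, "off", 0))
```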
Graphic labels: Power/Activity, Sled Fault
Part 2
Movie:
The web version of this content contains a movie.
Link:
https://edutube.emc.com/Player.aspx?vno=JvXxvcHiwtqgODunE/TzlA==&autoplay=true
Note: If the suspend button is pressed and the drives are detected, the node
attempts to rediscover the sled and rejoin its drives after 1 hour. If the suspend
button is pressed and the drives are not detected, or the sled is still removed,
the node automatically smartfails the drives after 15 minutes.
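The two timers in this note can be expressed as a small decision function. This Python sketch models only those timers; the function name and the returned strings are hypothetical, and treating each limit as "at or after" the stated time is a simplifying assumption.

```python
def action_after_suspend(drives_detected: bool, minutes_since_press: float) -> str:
    """Return what the node does after the suspend (service request) button is pressed.

    Models only the two timers in the note: after 60 minutes the node attempts
    to rediscover the sled and rejoin detected drives; after 15 minutes it
    smartfails drives that are not detected.
    """
    if drives_detected:
        return "rejoin" if minutes_since_press >= 60 else "waiting"
    return "smartfail" if minutes_since_press >= 15 else "waiting"


print(action_after_suspend(False, 20))  # smartfail
```

The asymmetry is the practical takeaway: a sled left out of the chassis has a much shorter window (15 minutes) than one that was suspended but never removed.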
Movie:
Link:
https://edutube.emc.com/Player.aspx?vno=LW/ZI6kmMaqKY54dpE/lOw==&autoplay=true
Script: Gather logs by following the instructions in the replacement guide. Press
both latches of the front bezel simultaneously to release it. Align the front bezel with
the front of the chassis, then push until you feel the bezel snap into place.
If you encounter any difficulties while performing this task, contact Dell EMC
Technical Support.
Movie:
Link:
https://edutube.emc.com/Player.aspx?vno=oveIZf3k48xr4biiu/hZXg==&autoplay=true
Script: Lift the metal bale to free the power cord. Disconnect the power cord from
the power supply. You may see LEDs lit inside the compute module, even after
you have removed the power cord. That is because the node next to it is supplying
redundant power. Press the orange retaining tab upward and pull the black handle
to slide the power supply out of the node.
Slide the new power supply unit into the open bay in the back of the node until you
feel the unit click into place. Connect the power cord to the power supply. Rotate
the metal bale down over the power cord to hold the cord in place.
Follow the instructions in the guide to complete the replacement procedure. If you
encounter any difficulties while performing this task, contact Dell EMC Technical
Support.
Gen 6.5
Movie:
Link:
https://edutube.emc.com/vlearning/launch/aYXb4cgLbhio9HDLNhC0Xg==/videodetails=false,comments=false,launch=yes
• To update the drive firmware on nodes without bootflash drives, download and
install the latest drive firmware package.
• Power cycling drives during a firmware update might return unexpected results.
As a best practice, do not restart or power off nodes while drive firmware is
being updated in a cluster.
• To update the drive firmware for your entire cluster, run the following command:
# isi devices drive list (--node-lnn node-lnn-number)
• You must wait for the current update operation to complete before starting
another.
• To confirm that a node has finished updating, run the following command:
# isi devices -d <node-number>
A drive that is still updating displays a status of FWUPDATE.
• OneFS updates drives sequentially.
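The completion check and the one-operation-at-a-time rule can be sketched in Python. The input is a hypothetical snapshot of per-drive statuses keyed by node and bay, as you might transcribe them from the command output; the code does not parse real `isi devices` output, the "HEALTHY" value is illustrative, and only the FWUPDATE status comes from this section.

```python
from typing import Mapping


def node_update_finished(drive_status: Mapping[str, str]) -> bool:
    """True once no drive on the node still reports the FWUPDATE status."""
    return all(status != "FWUPDATE" for status in drive_status.values())


def can_start_next_update(cluster: Mapping[int, Mapping[str, str]]) -> bool:
    """Start another update only after the current one has finished on every node."""
    return all(node_update_finished(drives) for drives in cluster.values())


# Hypothetical status snapshot, keyed by node number and then by drive bay.
snapshot = {
    1: {"Bay 1": "HEALTHY", "Bay 2": "HEALTHY"},
    2: {"Bay 1": "HEALTHY", "Bay 2": "FWUPDATE"},
}
print(node_update_finished(snapshot[1]), can_start_next_update(snapshot))  # True False
```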
Module Objectives
The following steps show how to generate the procedure through SolVe Online.
Step 1
Step 2
Note that the procedure is still generated even though you do not enter any
information.
Step 3
• Click GENERATE.
Step 4
• You can find the generated procedure in My Content tab in SolVe Online.
Step 5
• You will also receive an email with a link to the procedure.
Course Summary
Now that you have completed this course, you should be able to:
→ Explain hardware maintenance procedures.
→ Prepare a node.
→ Replace Field Replaceable Units (FRUs).
→ Replace Customer Replaceable Units (CRUs).