Replacement Procedures:
IBM PureData System for Analytics
N3001
Revised: December 14, 2018
Preface
This guide includes a series of procedures you must follow to replace components in an
IBM® PureData™ System for Analytics N3001.
Topics | See …
An overview of the system components and locations. | “Overview of the IBM PureData System for Analytics N3001” on page 1-1
Steps required to replace an S-Blade. | “Replacing S-Blade Components” on page 2-1
Steps required to replace an Environmental Services Module. | “Replacing an Environmental Services Module” on page 3-1
Steps required to replace a Disk Drive. | “Replacing a Disk Drive” on page 4-1
Steps required to replace a Disk Enclosure. | “Replacing a Disk Enclosure” on page 5-1
Steps required to replace an AMM. | “Replacing a Management Module” on page 6-1
Steps required to replace a 10Gb Switch. | “Replacing a 10Gb Switch” on page 7-1
Steps required to replace Host Server System board in an N3001-001 system. | “Replacing a Host System Board (N3001-001)” on page 8-1
Steps required to replace Host Server System board in an N3001-002 or larger system. | “Replacing a Host System Board (N3001-002 or larger)” on page 9-1
Steps required to replace a Host Server RAID Flash. | “Replacing a Host Server RAID Flash” on page 10-1
Steps required to replace a Host Server NIC. | “Replacing a Host Server Network Interface Card” on page 11-1
Steps required to replace a Host Server SAS HBA. | “Replacing a Host Server SAS HBA (N3001-001)” on page 12-1
Steps required to replace a Host Server Disk Drive. | “Replacing a Host Server Disk Drive” on page 13-1
Steps required to replace G8052 Management Switch. | “Replacing a G8052 Management Switch” on page 14-1
Steps required to replace G8264 Fabric Switch. | “Replacing a G8264 Fabric Switch” on page 15-1
Steps required to replace a KVM. | “Replacing a Keyboard/Video/Mouse (KVM)” on page 16-1
Comments on the Documentation
We welcome any questions, comments, or suggestions that you have for the IBM Netezza
documentation. Please send us an e-mail message at netezza-doc@wwpdl.vnet.ibm.com
and include the following information:
The name and version of the manual that you are using
Any comments that you have about the manual
Your name, address, and phone number
We appreciate your comments on the documentation.
Safety
Before installing this product, read the Safety Information.
Safety Statements
These statements provide the caution and danger information used in this documentation.
Important:
Each caution and danger statement in this documentation is labeled with a number. This
number is used to cross reference an English-language caution or danger statement with
translated versions of the caution or danger statement in the Safety Information document.
For example, if a caution statement is labeled "Statement 1," translations for that caution
statement are in the Safety Information document under "Statement 1."
Be sure to read all caution and danger statements in this documentation before you per-
form the procedures. Read any additional safety information that comes with your system
or optional device before you install the device.
Statement 1
DANGER
Connect to properly wired outlets any equipment that will be attached to this
product.
When possible, use one hand only to connect or disconnect signal cables.
Never turn on any equipment when there is evidence of fire, water, or structural
damage.
Disconnect the attached power cords, telecommunications systems, networks, and
modems before you open the device covers, unless instructed otherwise in the
installation and configuration procedures.
Connect and disconnect cables as described in the following table when installing,
moving, or opening covers on this product or attached devices.
To Connect:
1. Turn everything OFF.
2. First, attach all cables to devices.
3. Attach signal cables to connectors.
4. Attach power cords to outlets.
5. Turn device ON.
To Disconnect:
1. Turn everything OFF.
2. First, remove power cords from outlet.
3. Remove signal cables from connectors.
4. Remove all cables from devices.
Statement 2
CAUTION:
When replacing a lithium battery, use only the approved IBM® Part Number or an equiva-
lent type battery recommended by the manufacturer. If your system has a module
containing a lithium battery, replace it only with the same module type made by the same
manufacturer. The battery contains lithium and can explode if not properly used, handled,
or disposed of.
Do not:
Throw or immerse into water
Heat to more than 100°C (212°F)
Repair or disassemble
Dispose of the battery as required by local ordinances or regulations.
Statement 3
CAUTION:
When laser products (such as CD-ROMs, DVD drives, fiber optic devices, or transmitters)
are installed, note the following:
Do not remove the covers. Removing the covers of the laser product could result in
exposure to hazardous laser radiation. There are no serviceable parts inside the device.
Use of controls or adjustments or performance of procedures other than those specified
herein might result in hazardous radiation exposure.
DANGER
Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the
following.
Laser radiation when open. Do not stare into the beam, do not view directly with opti-
cal instruments, and avoid direct exposure to the beam.
Statement 4
Statement 5
CAUTION:
The power control button on the device and the power switch on the power supply do not
turn off the electrical current supplied to the device. The device also might have more than
one power cord. To remove all electrical current from the device, ensure that all power
cords are disconnected from the power source.
Statement 6
CAUTION:
If you install a strain-relief bracket option over the end of the power cord that is connected
to the device, you must connect the other end of the power cord to an easily accessible
power source.
Statement 7
CAUTION: If the device has doors, be sure to remove or secure the doors before moving or
lifting the device to avoid personal injury. The doors will not support the weight of the
device.
Statement 8
CAUTION:
Never remove the cover on a power supply or any part that has the following label attached.
Hazardous voltage, current, and energy levels are present inside any component that has
this label attached. There are no serviceable parts inside these components. If you suspect
a problem with one of these parts, contact a service technician.
Statement 13
DANGER
Overloading a branch circuit is potentially a fire hazard and a shock hazard under
certain conditions. To avoid these hazards, ensure that your system electrical
requirements do not exceed branch circuit protection requirements. Refer to the
information that is provided with your device for electrical specifications.
Statement 15
CAUTION:
Make sure that the rack is secured properly to avoid tipping when the server unit is
extended.
Statement 21
CAUTION:
Hazardous energy is present when the blade is connected to the power source. Always
replace the blade cover before installing the blade.
Statement 26
CAUTION:
Do not place any object on top of rack-mounted devices.
Statement 28
CAUTION:
The battery is a lithium ion battery. To avoid possible explosion, do not burn the battery.
Exchange it only with the approved part. Recycle or discard the battery as instructed by
local regulations.
Statement 34
CAUTION:
To reduce the risk of electric shock or energy hazard:
This equipment must be installed by trained service personnel in a restricted-access
location, as defined by the NEC and IEC 60950-1, Second Edition, The Standard for
Safety of Information Technology Equipment.
See the specifications in the product documentation for the required circuit-breaker
rating for branch circuit overcurrent protection.
Use copper wire conductors only. See the specifications in the product documentation
for the required wire size.
See the specifications in the product documentation for the required torque values for
the wiring-terminal nuts.
Statement 37
DANGER
CHAPTER 1
Overview of the IBM PureData System for Analytics N3001
What’s in this chapter
Prerequisites
Replaceable Components
FRU Numbers
Electrostatic Discharge Precautions
Contact IBM Netezza Support
This guide provides replacement procedures for the components in the IBM PureData
System for Analytics N3001 that require steps in addition to those provided in the major
component IBM Problem Determination and Service Guide. For components not included
in this guide, see the IBM Problem Determination and Service Guide for the major compo-
nent requiring replacement.
This chapter provides an overview of the N3001 system.
To identify the machine type, a label is located in the upper right at the front and rear of
the rack, viewable with the doors open.
Prerequisites
Before you begin the replacement process, make certain that you have a replacement com-
ponent that conforms to the hardware models supported for the N3001 system. IBM Field
Replaceable Unit (FRU) numbers are located on each component.
Also, some procedures require logging into the system as root user. Other procedures
require the nz user and the use of the nzrev command to make sure that the N3001 system
is running a minimum level of Netezza software.
This procedure requires you to be familiar with commands such as nzhw and nzds, which
are documented in the Netezza System Administrator's Guide.
To service the N3001-001 using the procedures in this guide, a KVM or other keyboard/
monitor/mouse must be attached to the system. A KVM is not part of an N3001-001.
IBM Netezza appliances are now configured to automatically open support tickets using the
Call Home service. To avoid creation of extra tickets during service procedures which
involve system outages or state changes, please be sure to disable Call Home (if it is
enabled) prior to performing maintenance, and re-enable Call Home at the end of the pro-
cedure. Details can be found within each procedure.
Replaceable Components
The IBM PureData System for Analytics N3001 models include partial, single, and multi-
rack systems.
Components accessed from the front of the system are:
Disks
Disk Enclosure
Host Servers
S-Blades
Keyboard
Video Monitor
Components accessed from the rear of the system are:
ESM (EXP2524)
Management Module
Gb Switch
Power modules
PDUs
Rack Illustrations
The following illustrations show the component configuration of N3001 models.
[Rack illustrations omitted. The figures show front and rear views of each configuration, each
with a Machine Type label:
- Host 1 and Host 2 with Power Modules
- SPA 1 with Host 1, Host 2, KVM, Enclosures 1 and 2, and Chassis 1 (2 S-Blades)
- SPA 1 with Host 1, Host 2, KVM, Enclosures 1 through 6, and Chassis 1 (4 S-Blades)
- SPA 1 with Host 1, Host 2, KVM, Enclosures 1 through 12, and Chassis 1 (7 S-Blades)
- SPA 2 through 8 with Enclosures 1 through 12 and Chassis 2 through 8 (7 S-Blades)
Rear-view labels include Power Modules, KVM Switch, Management Switch, Fabric Switches,
Lower Left PDU and Lower Right PDU, ESM a (mm1) and ESM b (mm2) for each enclosure (for
example, spa1 encl 1 mm1 and spa1 encl 1 mm2), Management Module 2 (for example,
Chassis 1 / spa1.mm2), and the Gb Switch (for example, Chassis 1 / Slot 9).]
FRU Numbers
We recommend that you verify the FRU number of the replacement part against the
original part before beginning the replacement process.
The method for verifying the FRU number varies, depending on the component, the date of
manufacture of the Netezza system, and other factors. This section covers some of the
methods for identifying FRU numbers.
This guide is updated frequently, and every attempt is made to list accurate FRU numbers.
However, we recommend that, when possible, you verify the FRU number by using one of
the methods included in this section.
Hardware rev: 6
Part no.: 41Y4864
FRU no.: 25R5780
FRU serial no.: YK109092D00Y
CLEI: Not Available
AMM slots: 2
Blade slots: 14
I/O Module slots: 10
Power Module slots: 4
Blower slots: 2
Media Tray slots: 1
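When the replacement part's label has to be compared against command output like the sample above, the check can be scripted. The following is a minimal sketch, not part of the Netezza tooling: the function names parse_vpd and fru_matches are hypothetical, and the sketch assumes only that the output is a series of "Field: value" lines as shown.

```python
def parse_vpd(text):
    """Parse 'Field: value' lines into a dict keyed by the lowercased field name."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that carry no 'Field: value' pair
            fields[key.strip().lower()] = value.strip()
    return fields


def fru_matches(text, replacement_fru):
    """Return True when the reported 'FRU no.' equals the replacement part's FRU."""
    reported = parse_vpd(text).get("fru no.")
    return reported is not None and reported.upper() == replacement_fru.strip().upper()


# Sample text copied from the output shown in this section.
sample = """Hardware rev: 6
Part no.: 41Y4864
FRU no.: 25R5780
FRU serial no.: YK109092D00Y
CLEI: Not Available
"""

if __name__ == "__main__":
    print(parse_vpd(sample)["fru no."])
    print(fru_matches(sample, "25R5780"))
```

The comparison ignores case and surrounding whitespace, because FRU numbers read from a label are easy to mistype in lowercase.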
H-Chassis 31R3308
S-Blade, Blade Server (HS23) 00YL046 00YL045
S-Blade, DIMM 46W0710
S-Blade, Processor 00Y2786
10Gb LOM Interposer 81Y9388
S-Blade, Database Accelerator Card 00X6711
S-Blade, Sidecar Expansion Chassis 68Y7498
AMM 47C2480
10Gb Switch 90Y9392
Midplane use command option use command option
Power Supply use command option use command option
Media Tray (not including optical drive) use command option use command option
The S-Blade of the IBM PureData System for Analytics N3001 is composed of the assem-
bly of an IBM HS23 blade server and an expansion unit with two Netezza Database
Accelerator Cards (DACs).
Note: The N3001-001 system does not use S-Blades.
Table 2-1: Components Covered by this Procedure
FRU Number
Note 1: When replacing the blade planar, you must have on hand two sets of media and the
accompanying README for instructions on booting the blade and updating the embedded
Emulex firmware. The media and documentation are available from Fix Central:
- 4.2.0.5-IM-Netezza-HOSTFW-HS23, and
- FDT Support Tools 2.0.0.5.
Use the downloaded 4.2.0.5-IM-Netezza-HOSTFW-HS23 ISO to create a bootable DVD.
Note 2: When replacing the Expansion Unit, do not re-use the PCI riser from the original
Expansion Unit. Always use the PCI riser from the replacement Expansion Unit.
The replacement procedure included in this chapter must be followed when replacing any
component of the S-Blade, whether that component is part of the blade server (planar,
10Gb LOM Interposer, DIMM, processor, or CMOS battery) or part of the Sidecar
Expansion.
This procedure is written with the intent of keeping the Netezza system online. If the failed
component is one of the following blade server components, the component must be replaced
with the system online:
- Blade server planar
- Blade server CMOS battery
- DAC (see Table 2-1 for replacement requirements)
When re-seating an S-Blade the blade must be in a failover state or the NPS system must
be stopped.
For reference, the mechanical steps for removal and replacement of components located in
the blade server and sidecar (expansion unit) are documented in IBM BladeCenter HS23
Problem Determination and Service Guide.
DO NOT use the standard USB thumb drive to update firmware on blade components. Use
only the media described in this procedure.
The estimated time to perform this procedure is from 90 to 240 minutes, depending on
which components require replacement, ease of access to the system, and familiarity with
NPS and the Netezza system.
Before you begin the S-Blade replacement process, make certain that you have contacted
Netezza Customer Service and that a service event plan is in place.
This procedure must be run from the N3001 active host.
[Figure: S-Blade assembly, showing the Blade Server (HS23), Blade Server Processors, and
Sidecar Expansion Unit.]
Note: The customer is responsible for enabling and configuring the Restricted Environment
as part of NPS 7.2.0.5 or later.
If Call Home is enabled on this system, the System Manager will report this activity and
create PMRs. The customer must be made aware of this so that the PMRs can be closed
when filed.
NZ support actions
1 Replace Storage Disk
2 Replace SPU
3 Quit
>
3. Select option 2 from the menu:
> 2
Example output:
Logging to file /nz/var/log/replacespu/replacespu20141222151546.log
>
4. Type the S-Blade location information or HWID identified in step 9 on page 2-19.
For example:
> 1/13
Where 1 is the SPA number and 13 is the S-Blade slot number.
Or:
> 1674
Where 1674 is the HWID of the failed S-Blade.
Example output:
Turning on location LED for SPU 1 13
Location LED turned on for SPU 1 13
Retrieving blade product name for spa 1 slot 13
Retrieving blade serial number for spa 1 slot 13
-
Please replace the blade(s) in
SPA SLOT
1 13
The replacespu script locates and lights the S-Blade’s blue location LED, prompts you
to replace the hardware (providing the SPA and SLOT numbers), and instructs you to
remove the SAS cabling and to press RETURN when ready to proceed.
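Step 4 accepts either a SPA/slot pair such as 1/13 or a hardware ID such as 1674. The sketch below shows one way a helper could tell the two forms apart before the value is typed at the prompt. It is illustrative only: parse_blade_id is a hypothetical name, and the validation that replacespu itself performs is not documented here.

```python
import re


def parse_blade_id(text):
    """Classify operator input as a SPA/slot location or a hardware ID.

    Returns ("location", spa, slot) for input such as "1/13",
    or ("hwid", number) for a bare number such as "1674".
    """
    text = text.strip()
    match = re.fullmatch(r"(\d+)/(\d+)", text)
    if match:
        return ("location", int(match.group(1)), int(match.group(2)))
    if re.fullmatch(r"\d+", text):
        return ("hwid", int(text))
    raise ValueError(f"not a SPA/slot pair or HWID: {text!r}")


if __name__ == "__main__":
    print(parse_blade_id("1/13"))
    print(parse_blade_id("1674"))
```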
5. Making note of cable locations, so that they can be replaced in the same locations,
unplug the SAS cables from the S-Blade sidecar.
6. Remove the S-Blade assembly from the chassis.
Note: It is NOT necessary to power down the blade.
Only re-insert the S-Blade into the chassis as a complete Blade Server/Expansion Sidecar
unit. DO NOT insert just the blade server into the chassis.
[Figure: S-Blade, showing the Blade Server and Ejector Handles.]
7. Separate the sidecar with the Database Accelerator Cards and associated hardware
from the blade processor card.
Note: Make note of the MTM and serial number of the original blade. These numbers
are located on a label on the side of the blade chassis.
8. As necessary, move components such as 10Gb LOM Interposer, DIMMs, and CPUs to
the replacement card.
9. If replacing a Database Accelerator Card:
a. Remove the cover from the sidecar assembly.
b. Disconnect the ribbon cable from the expansion card.
c. On the underside of the tray, press the blue release latch and slide the expansion
card/riser assembly out of the tray.
[Figure: expansion card/riser assembly, showing DAC1 and DAC2.]
d. Release the adapter-retention latch and remove the failed DAC from the Riser
Assembly.
Note that DAC1 is in the top PCI slot and DAC2 is in the bottom PCI slot of the
sidecar. Looking at the serial number on the DAC, verify that you are replacing the
failed DAC listed in the output shown in step 9.
There is a specified sequence to follow when inserting the DAC cards into the Riser
Assembly. Be aware of the following:
- DAC1 should always be inserted first into the Riser Assembly, followed by DAC2.
- If replacing DAC1, also remove DAC2 from the Riser Assembly, replace DAC1 with the
replacement DAC, then reinstall DAC2.
- If replacing only DAC2, there is no need to remove DAC1 from the Riser Assembly.
- Properly align the DAC connector panel with the sidecar front panel and with the
alignment pin at the top of the connector panel (see Figure 2-16).
- Fully seat the DAC into the PCI connector, firmly pressing on the top of the connector
panel.
- Ensure that the adapter-retention latch is fully closed and locked.
[Figure: DAC insertion order (DAC1 inserted first, DAC2 second), showing the Alignment Pin
and Connector Panel.]
h. Using care to ensure that the side tabs of the cover do not contact the DACs,
replace the cover for the sidecar assembly.
Ensure when replacing the cover that the latching tab does not push on the DAC card,
causing misalignment of the DAC cards in the assembly.
10. Assemble the sidecar with the Database Accelerator Cards and associated hardware
onto the blade processor card.
Statement 21
CAUTION:
Hazardous energy is present when the blade is connected to the power source. Always
replace the blade cover before installing the blade.
11. Install the S-Blade assembly into the same slot in the chassis from which it was
removed.
Note: DO NOT plug the SAS cables into the sidecar until instructed to do so.
12. Temporarily connect the USB cable from the KVM USB/Video/Ethernet Adapter to the
AMM managing the chassis containing the replaced S-Blade. For multi-rack systems,
you may need a long CAT-5 cable for the USB Adapter to reach the AMM.
HS23 blade server planars coming out of FRU stock are likely to have a down-rev version of
Emulex firmware that prevents the blade from booting.
You must have on hand media for booting the blade and updating the embedded Emulex
firmware. This media is available from Fix Central: 4.2.0.5-IM-Netezza-HOSTFW-HS23.
Use the downloaded ISO to create a bootable DVD.
Included with the media on Fix Central is a README file intended for the IBM SSR
taking the media to the customer site. It is important for the SSR to read and under-
stand the content of the README to successfully complete the replacement process.
A copy of the README follows:
This bootable DVD must be used as part of the HS23 S-Blade planar or CMOS
Battery replacement in IBM PureData Systems for Analytics N3001.
NOTE: Ensure SAS cables are not attached to the SPU before booting this
DVD
*** The HS23 Problem Determination and Service Guide is used only for ***
*** the mechanical steps of removing the components. DO NOT use the ***
*** standard USB thumb drive during the "replacespu" blade replacement ***
*** procedure. ***
The bootable DVD is loaded after the replacement blade is inserted into
the H-Chassis.
WARNING: DO NOT use this DVD on any S-Blades other than HS23 (MT 7875).
Note: This DVD is not required for processor or DIMM replacement on the
HS23.
It is required only for the planar or CMOS battery replacement.
************************************************************************
* *
* During the update process, the blade *
* reboots UP TO 7 times! *
* These 7 reboots can take up to 30 minutes! *
* The process continues after the DVD ejects. *
* *
* Wait 3-4 minutes after the DVD routine screen displays *
* "Ready" *
* before continuing the replacespu procedure *
* by pressing Enter *
* *
************************************************************************
Issues:
1) The HS23 FRU stock for the planar (7875AC1) may have a level of code
for the Emulex NIC that causes hangs during Media boot (DVD) or PXE
boot operations on systems with heavy network traffic. That code level
is 4.6.281.26.
NOTE: If booting the DVD hangs on the replacement planar, you may need
to disable the network ports on the gig switches for the DVD to boot
during the Emulex FW update.
If it is necessary to disable the network Ports on the gig switches
for the S-Blade to boot, details are in Chapter 2 of the
N3001 Replacement Procedures.
2) The Emulex firmware 10.2.261.36 and/or 10.2.377.29 may occasionally
"HANG" during reboot of an S-Blade when heavy workload is on fabric.
Emulex Firmware 10.2.377.59 corrects this "HANG" condition so the
S-Blade proceeds in booting.
However, it has been found that the 10.2.377.59 code may cause bus
contention between the management bus and the fabric bus.
The latest code, 10.2.377.65, corrects all the above issues.
3) The ASU setting for 64-Bit resource allocation is incorrect (disabled)
while the blade settings are in default state. This prevents the NPS
64-bit OS from loading on the blade. This setting must be changed to
Enable for replacespu to continue. The DVD corrects this setting.
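The README lists several Emulex firmware levels and names 10.2.377.65 as the level that corrects the known issues. Comparing such levels by eye is error prone, because a plain string comparison would rank "4.6.281.26" above "10.2.377.65". The following is a minimal sketch of a numeric comparison; version_key and needs_update are hypothetical helper names, not part of the DVD routine, and the sketch assumes every level is four dot-separated integers as in the README.

```python
LATEST_EMULEX = "10.2.377.65"  # level the README describes as correcting the listed issues


def version_key(version):
    """Turn a dotted firmware level such as '10.2.377.59' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def needs_update(installed, target=LATEST_EMULEX):
    """Return True when the installed level is older than the target level."""
    return version_key(installed) < version_key(target)


if __name__ == "__main__":
    for level in ("4.6.281.26", "10.2.261.36", "10.2.377.59", "10.2.377.65"):
        print(level, needs_update(level))
```

Tuples of integers compare field by field, which matches how the dotted levels are ordered.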
Note: During the booting of the DVD, various error messages may display. They can be
ignored unless there is a kernel panic, in which case reseat the S-Blade and reboot. If
the kernel panic occurs again, you may need to replace the S-Blade.
If the blade does not boot to the DVD, see section “Hang during DVD Boot” on page 2-39.
directory
Setting host name to tc-10-0-9-183
-- Calling nzprehook
Entering nzprehook.sh
Type y to change the MTM to 7875AC1, or type n if the MTM is already set to
7875AC1.
If you typed y, you are prompted for the 7 digit type number.
Enter new 7 digit type number ie: 7875AC1
After updating the serial number, the script reboots the IMM (taking about four
minutes), and then reboots the blade.
Note: The VPD is updated, and at this point the blade reboots and restarts at the
beginning of the prompts, asking for MTM and serial numbers. You answer n to
those prompts this time.
When the blade reboot completes, the Customized Media menu displays again.
Loading Customized Media . . .
-- Calling nzprehook
Entering nzprehook.sh
Type: 2
Type n at the MTM prompt (assuming the MTM is set correctly).
Type n at the serial number prompt (assuming the serial number is set correctly).
The script checks the Emulex firmware, and updates if needed:
**************************************************************
Emulex Adapter updater is Enabled
Checking Revision to determine if update is required....
**************************************************************
Emulex NIC updater for Machine Type 7875
The script assumes the 7875 is configured with
an Emulex adapter in device “eth0”
#################################################################
Note: Only ONE USB flash drive can be plugged into the media tray.
Note: If a USB flash drive is not plugged into the media tray, you are prompted to
insert one and retry (type R to retry or any other key to exit).
USB Thumb Drive not found. Insert thumb drive in USB port.
-- Type R to retry, any other key to exit:
When completed, the blade reboots (the blade reboots up to 7 times, taking up
to 30 minutes).
i. For CMOS battery replacements:
The script now updates the ASU settings.
-- Running ASU IMM configuration unattended mode
At this point, the routine continues and asks if you want to save log files to a
USB Flash Drive. (At least 2.5 MB space required on the USB flash drive.)
#################################################################
#################################################################
Note: Only ONE USB flash drive can be plugged into the media tray.
Note: If a USB flash drive is not plugged into the media tray, you are prompted to
insert one and retry (type R to retry or any other key to exit).
USB Thumb Drive not found. Insert thumb drive in USB port.
-- Type R to retry, any other key to exit:
When completed, the blade reboots (the blade reboots up to 7 times, taking up
to 30 minutes).
j. When the console displays Ready:
Remove the USB flash drive (if used)
Remove the DVD from the media tray
Disconnect the KVM cable that was temporarily connected to the AMM
Wait four additional minutes before proceeding
The SAS cables must NOT be connected at this point.
14. Resume the replacespu procedure to complete the blade service activity.
From this point on, replacespu may take up to 90 minutes to complete the updates of
the S-Blade firmware and the FPGA code.
Press Enter (or RETURN) for the replacespu script to continue.
Example output:
Is the SAS Cable still disconnected from the replacement blade? (y/
n) [y]
15. Ensure that the SAS cable is removed from the failed S-Blade and type y.
When the script completes, the script displays:
Logfile: /nz/var/log/replacespu/replacespu20141222151546.log.gz
NZ support actions
1 Replace Storage Disk
2 Replace SPU
3 Quit
>
16. Select option 3 from the menu:
> 3
Reattach the SAS cables to their original positions.
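The replacespu output above names a log file such as replacespu20141222151546.log.gz. When several runs exist, it can help to sort them by the time embedded in the name. The sketch below assumes that the 14 digits are a YYYYMMDDhhmmss timestamp, which fits the example but is not stated in this guide; log_timestamp is a hypothetical helper name.

```python
import re
from datetime import datetime


def log_timestamp(path):
    """Extract the run time from a replacespu log path.

    Assumes the 14 digits in the file name are YYYYMMDDhhmmss.
    """
    match = re.search(r"replacespu(\d{14})\.log(?:\.gz)?$", path)
    if not match:
        raise ValueError(f"not a replacespu log name: {path!r}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


if __name__ == "__main__":
    print(log_timestamp("/nz/var/log/replacespu/replacespu20141222151546.log.gz"))
```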
The output from replacespu indicates it is necessary to rebalance the blades. However, a
current issue with Emulex firmware requires that an update be made using a tool in FDT
Support Tools 2.0.0.5. After the firmware is updated, the NPS system must be stopped and
restarted (using nzstop and nzstart). No rebalance is needed in this procedure. See step 21
on page 2-35 through step 23 on page 2-35. The nzstop/nzstart step must be scheduled
with the customer for a time when the system can be taken offline.
17. The replacement is complete after the Emulex firmware is updated as instructed in
FDT Support Tools 2.0.0.5 and the NPS system is stopped/started.
If you are using this online replacement process to replace a “failing” S-Blade or S-Blade
component such as a DIMM, a processor, or a BladeCenter Expansion Blade, the command in
step 9 does not return any values.
If the S-Blade is not already in a failed state, you must manually fail over the S-Blade to
replace components.
To fail over an active but problematic S-Blade that has components you want to replace,
follow these steps.
This procedure assumes that you know the ID of the problematic S-Blade.
If the S-Blade is already failed, skip to step 1.
a. Fail over the problem S-Blade:
[nz@nzhost1 ~]$ nzhw failover -id 1607
Are you sure you want to proceed (y|n)? [n] y
When you fail over the S-Blade, the system performs a state transition to reconfigure the
topology of the system; that is, the system redirects the data slices owned by the now failed
S-Blade to the other active S-Blades in the same chassis. The transition process can take
approximately 15 minutes to complete. Wait for the system to transition back into the
Online state before you continue. (Use the nzstate command to confirm the Online state.)
b. Proceed to either “Restricted Environment (Online) - Replace a Blade Server,
Battery, or DAC” on page 2-3 or “Replace a Blade Server, Battery, or DAC” on
page 2-20.
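The wait described in step a (checking nzstate until the system returns to the Online state) can be sketched as a polling loop. This is a sketch under stated assumptions, not a supported tool: parse_state and wait_for_online are hypothetical names, the state line is assumed to read like "System state is 'Online'.", and the 30-minute timeout and 30-second poll interval are arbitrary choices, set above the approximately 15 minutes mentioned above.

```python
import re
import time


def parse_state(output):
    """Extract the state name from text assumed to look like "System state is 'Online'."."""
    match = re.search(r"System state is '([^']+)'", output)
    return match.group(1) if match else None


def wait_for_online(read_state, timeout_s=1800, poll_s=30,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll read_state() until it reports Online; return False on timeout.

    read_state is any callable that returns the current nzstate output as text.
    """
    deadline = clock() + timeout_s
    while True:
        if parse_state(read_state()) == "Online":
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


if __name__ == "__main__":
    # Simulated outputs stand in for real nzstate calls in this demonstration.
    outputs = iter(["System state is 'Discovering'.", "System state is 'Online'."])
    print(wait_for_online(lambda: next(outputs), poll_s=0))
```

On the active host, read_state could wrap a call to nzstate, for example through Python's subprocess module. Passing the reader in as a callable keeps the loop testable without a live system.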
Note: If the replacespu script exits after timing out (up to 60 minutes), remove the
replacement S-Blade and re-insert the board, then run the replacespu script again. If
the command times out again, contact IBM Netezza Support for assistance.
3. Making note of cable locations, so that they can be replaced in the same locations,
unplug the SAS cables from the S-Blade sidecar.
4. Remove the S-Blade assembly from the chassis.
Note: It is NOT necessary to power down the blade.
Only re-insert the S-Blade into the chassis as a complete Blade Server/Expansion Sidecar
unit. DO NOT insert just the blade server into the chassis.
[Figure: S-Blade, showing the Blade Server and Ejector Handles.]
5. Separate the sidecar with the Database Accelerator Cards and associated hardware
from the blade processor card.
Note: Make note of the MTM and serial number of the original blade. These numbers
are located on a label on the side of the blade chassis.
6. As necessary, move components such as 10Gb LOM Interposer, DIMMs, and CPUs to
the replacement card.
7. If replacing a Database Accelerator Card:
c. On the underside of the tray, press the blue release latch and slide the expansion
card/riser assembly out of the tray.
[Figure: expansion card/riser assembly, showing DAC1 and DAC2.]
d. Release the adapter-retention latch and remove the failed DAC from the Riser
Assembly.
Note that DAC1 is in the top PCI slot and DAC2 is in the bottom PCI slot of the
sidecar. Looking at the serial number on the DAC, verify that you are replacing the
failed DAC listed in the output shown in step 9.
There is a specified sequence to follow when inserting the DAC cards into the Riser
Assembly. For that reason, be aware of the following:
- DAC1 should always be inserted first into the Riser Assembly, followed by DAC2.
- If replacing DAC1, also remove DAC2 from the Riser Assembly, replace DAC1
  with the replacement DAC, then reinstall DAC2.
- If replacing only DAC2, there is no need to remove DAC1 from the Riser Assembly.

- DAC1 should always be inserted first into the Riser Assembly, followed by DAC2.
- Properly align the DAC connector panel with the sidecar front panel and with
  the alignment pin at the top of the connector panel (see Figure 2-16).
- Fully seat the DAC into the PCI connector, firmly pressing on the top of the
  connector panel.
- Ensure that the adapter-retention latch is fully closed and locked.
[Figure 2-16: DAC alignment pin and connector panel, with insertion order labels (Insert 1st, Insert 2nd)]
h. Using care to ensure that the side tabs of the cover do not contact the DACs,
replace the cover for the sidecar assembly.
Ensure when replacing the cover that the latching tab does not push on the DAC card,
causing misalignment of the DAC cards in the assembly.
8. Assemble the sidecar with the Database Accelerator Cards and associated hardware
onto the blade processor card.
Statement 21
CAUTION:
Hazardous energy is present when the blade is connected to the power source. Always
replace the blade cover before installing the blade.
9. Install the S-Blade assembly into the same slot in the chassis from which it was
removed.
Note: DO NOT plug the SAS cables into the sidecar until instructed to do so.
10. Temporarily connect the USB cable from the KVM USB/Video/Ethernet Adapter to the
AMM managing the chassis containing the replaced S-Blade. For multi-rack systems,
you may need a long CAT-5 cable for the USB Adapter to reach the AMM.
HS23 blade server planars coming out of FRU stock are likely to have a down-rev version of
Emulex firmware that prevents the blade from booting.
You must have on hand media for booting the blade and updating the embedded Emulex
firmware. This media is available from Fix Central: 4.2.0.5-IM-Netezza-HOSTFW-HS23.
Use the downloaded ISO to create a bootable DVD.
Included with the media on Fix Central is a README file intended for the IBM SSR
taking the media to the customer site. It is important for the SSR to read and
understand the content of the README to successfully complete the replacement process.
A copy of the README follows:
This bootable DVD must be used as part of the HS23 S-Blade planar or CMOS
Battery replacement in IBM PureData Systems for Analytics N3001.
NOTE: Ensure SAS cables are not attached to the SPU before booting this
DVD
*** The HS23 Problem Determination and Service Guide is used only for ***
*** the mechanical steps of removing the components. DO NOT use the ***
*** standard USB thumb drive during the "replacespu" blade replacement ***
*** procedure. ***
The bootable DVD is loaded after the replacement blade is inserted into
the H-Chassis.
WARNING: DO NOT use this DVD on any S-Blades other than HS23 (MT 7875).
Note: This DVD is not required for processor or DIMM replacement on the
HS23.
It is required only for the planar or CMOS battery replacement.
************************************************************************
* *
* During the update process, the blade *
* reboots UP TO 7 times! *
* These 7 reboots can take up to 30 minutes! *
* The process continues after the DVD ejects. *
* *
* Wait 3-4 minutes after the DVD routine screen displays *
* "Ready" *
* before continuing the replacespu procedure *
* by pressing Enter *
* *
************************************************************************
Issues:
1) The HS23 FRU stock for the planar (7875AC1) may have a level of code
for the Emulex NIC that causes hangs during Media boot (DVD) or PXE
boot operations on systems with heavy network traffic. That code level
is 4.6.281.26.
NOTE: If booting the DVD hangs on the replacement planar, you may need
to disable the network ports on the gig switches for the DVD to boot
during the Emulex FW update.
If it is necessary to disable the network Ports on the gig switches
for the S-Blade to boot, details are in Chapter 2 of the
N3001 Replacement Procedures.
2) The Emulex firmware 10.2.261.36 and/or 10.2.377.29 may occasionally
"HANG" during reboot of an S-Blade when heavy workload is on fabric.
Emulex Firmware 10.2.377.59 corrects this "HANG" condition so the
S-Blade proceeds in booting.
However, it has been found that the 10.2.377.59 code may cause bus
contention between the management bus and the fabric bus.
The latest code, 10.2.377.65, corrects all the above issues.
3) The ASU setting for 64-Bit resource allocation is incorrect (disabled)
while the blade settings are in default state. This prevents the NPS
64-bit OS from loading on the blade. This setting must be changed to
Enable for replacespu to continue. The DVD corrects this setting.
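The firmware levels named in the README can be summarized as a small lookup. The following Python sketch is illustrative only: the helper names (`version_key`, `assess_emulex`) are hypothetical and are not part of the DVD or of any IBM tool, and the table contains only the levels and symptoms the README itself lists.

```python
# Emulex firmware levels and the issues the README attributes to them.
KNOWN_ISSUES = {
    "4.6.281.26": "may hang during media (DVD) or PXE boot under heavy network traffic",
    "10.2.261.36": "may hang during S-Blade reboot under heavy fabric workload",
    "10.2.377.29": "may hang during S-Blade reboot under heavy fabric workload",
    "10.2.377.59": "may cause contention between the management bus and the fabric bus",
}
# Level that the README describes as correcting all of the above issues.
LATEST = "10.2.377.65"


def version_key(version):
    """Turn a dotted version string into a tuple of ints so levels compare numerically."""
    return tuple(int(part) for part in version.split("."))


def assess_emulex(version):
    """Return a short assessment of an Emulex firmware level (hypothetical helper)."""
    if version in KNOWN_ISSUES:
        return "update needed: " + KNOWN_ISSUES[version]
    if version_key(version) >= version_key(LATEST):
        return "ok: at or above " + LATEST
    return "unknown level: not listed in the README"
```

A level that is neither listed nor at least `10.2.377.65` is reported as unknown rather than guessed at, because the README does not describe it.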
Note: While the DVD boots, various error messages may display. They can be ignored
unless there is a kernel panic, in which case reseat the S-Blade and reboot. If the
kernel panic recurs, you may need to replace the S-Blade.
If the blade does not boot to the DVD, see section “Hang during DVD Boot” on page 2-39.
-- Calling nzprehook
Entering nzprehook.sh
Type y to change the MTM to 7875AC1, or type n if the MTM is already set to
7875AC1.
If you typed y, you are prompted for the 7 digit type number.
Enter new 7 digit type number ie: 7875AC1
After updating the serial number, the script reboots the IMM (taking about four
minutes), and then reboots the blade.
Note: The VPD is updated, and at this point the blade reboots and restarts at the
beginning of the prompts, asking for MTM and serial numbers. You answer n to
those prompts this time.
When the blade reboot completes, the Customized Media menu displays again.
Loading Customized Media . . .
-- Calling nzprehook
Entering nzprehook.sh
Type: 2
Type n at the MTM prompt (assuming the MTM is set correctly).
Type n at the serial number prompt (assuming the serial number is set correctly).
The script checks the Emulex firmware, and updates if needed:
**************************************************************
Emulex Adapter updater is Enabled
Checking Revision to determine if update is required....
**************************************************************
Emulex NIC updater for Machine Type 7875
The script assumes the 7875 is configured with
an Emulex adapter in device “eth0”
The routine continues and asks if you want to save log files to a USB Flash
Drive. (At least 2.5 MB space required on the USB flash drive.)
#################################################################
#################################################################
Note: Only ONE USB flash drive can be plugged into the media tray.
Note: If a USB flash drive is not plugged into the media tray, you are prompted to
insert one and retry (type R to retry or any other key to exit).
USB Thumb Drive not found. Insert thumb drive in USB port.
-- Type R to retry, any other key to exit:
When completed, the blade reboots (the blade reboots up to 7 times, taking up
to 30 minutes).
h. For CMOS battery replacements:
The script now updates the ASU settings.
-- Running ASU IMM configuration unattended mode
At this point, the routine continues and asks if you want to save log files to a
USB Flash Drive. (At least 2.5 MB space required on the USB flash drive.)
#################################################################
Log copy to USB Flash Drive is enabled
-- Only 1 flash drive can be inserted
-- The contents of /tmp and /var/log will be tarred into a
single file and written to the flash drive. Make sure it
has enough room. A minimum of 2.5 MB is required.
#################################################################
Note: Only ONE USB flash drive can be plugged into the media tray.
Note: If a USB flash drive is not plugged into the media tray, you are prompted to
insert one and retry (type R to retry or any other key to exit).
USB Thumb Drive not found. Insert thumb drive in USB port.
-- Type R to retry, any other key to exit:
When completed, the blade reboots (the blade reboots up to 7 times, taking up
to 30 minutes).
i. When the console displays Ready:
Remove the USB flash drive (if used)
Remove the DVD from the media tray
Disconnect the KVM cable that was temporarily connected to the AMM.
Wait four additional minutes before proceeding
j. Skip to step 14 on page 2-34
The SAS cables must NOT be connected at this point.
12. When the console displays Ready:, open a new session and check the state of the S-
Blade:
a. Ping the replacement S-Blade to ensure it is reachable:
[root@nzhost1 ~]# ping spuxxyy
Where xx is the SPA number and yy is the slot number for the replacement blade.
For example:
[root@nzhost1 ~]# ping spu0103
Type Ctrl-C to exit the ping session.
Example output:
PING spu0103.npsdomain (10.0.14.28) 56(84) bytes of data.
64 bytes from spu0103.npsdomain (10.0.14.28): icmp_seq=1 ttl=64 time=0.077 ms
64 bytes from spu0103.npsdomain (10.0.14.28): icmp_seq=2 ttl=64 time=0.070 ms
64 bytes from spu0103.npsdomain (10.0.14.28): icmp_seq=3 ttl=64 time=0.068 ms
64 bytes from spu0103.npsdomain (10.0.14.28): icmp_seq=4 ttl=64 time=0.069 ms
--- spu0103.npsdomain ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.068/0.071/0.077/0.003 ms
Note: DO NOT continue with this procedure until you are successful with the ping
command. Contact IBM Netezza support if you have issues at this point.
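The host name convention and the success criterion for the ping check can be sketched in Python. This is a minimal illustration, assuming the SPA and slot numbers are each zero-padded to two digits (as in the spu0103 example) and that the ping summary line has the Linux format shown in the example output. The function names are hypothetical.

```python
import re


def spu_hostname(spa, slot):
    """Build the SPU host name spuXXYY from the SPA number and slot number."""
    return "spu{:02d}{:02d}".format(spa, slot)


def packet_loss_percent(ping_output):
    """Extract the packet loss percentage from ping summary text, or None if absent."""
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    if match is None:
        return None
    return float(match.group(1))


def spu_reachable(ping_output):
    """Treat the blade as reachable only when the summary reports 0% packet loss."""
    loss = packet_loss_percent(ping_output)
    return loss is not None and loss == 0.0
```

For SPA 1, slot 3, `spu_hostname(1, 3)` gives the name used in the example command. Output with no summary line is treated as not reachable, which matches the instruction not to continue until the ping succeeds.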
14. Resume the replacespu procedure to complete the blade service activity.
From this point on, replacespu may take up to 90 minutes to complete the updates of
the S-Blade firmware and the FPGA code.
Press Enter (or RETURN) for the replacespu script to continue.
Example output:
Is the SAS Cable still disconnected from the replacement blade? (y/n) [y]
15. Ensure that the SAS cable is removed from the failed S-Blade and type y.
When the script completes, the script displays:
Logfile: /nz/var/log/replacespu/replacespu20141222151546.log.gz
The output from replacespu indicates it is necessary to rebalance the blades. However, a
current issue with Emulex firmware requires that an update be made using a tool in FDT
Support Tools 2.0.0.5. After the firmware is updated, the NPS system must be stopped and
restarted (using nzstop and nzstart). No rebalance is needed in this procedure. See step 21
through step 23. The nzstop/nzstart step must be scheduled with the customer for a time
when the system can be taken offline.
16. When the replacespu script completes, plug the SAS cables into their original
connectors on the S-Blade DACs. Ensure that the cables are routed correctly and are secured
with the Velcro straps to the cable guide at the bottom of the rack door opening.
17. Return to the nz account session.
18. Perform the following steps to activate the S-Blade.
19. Verify that the S-Blade is now available as Spare.
[nz@nzhost1 ~]$ nzhw show -id nnnn
Where nnnn is the new hwid of the replacement SPU.
20. Run the diagnostic script that tests the blade settings:
[nz@nzhost1 ~]$ nzpush -s x/y diag run
Where x is the SPA number, and y is blade slot number.
spu0103: Start time: 2013_10_15_17_09_34
spu0103: 001.0 FPGA PCIe CFG Test------------------------->PASSED
spu0103: 001.1 PLX PCIe CFG Test-------------------------->PASSED
spu0103: 001.2 LSI SAS PCIe CFG Test---------------------->PASSED
spu0103: 002.0 FPGA 0 Memory Calibration Test------------->PASSED
spu0103: 002.1 FPGA 1 Memory Calibration Test------------->PASSED
spu0103: 003.0 FPGA 0 Bist POST Test---------------------->PASSED
spu0103: 003.1 FPGA 1 Bist POST Test---------------------->PASSED
spu0103: Done Executing loop 1
If any tests show as failed, troubleshoot the failed component and update its firmware,
then rerun this step.
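Scanning the diagnostic output for anything other than PASSED can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, the line layout is taken from the sample output above, and the sketch assumes that a failing test reports some word other than PASSED after the arrow (the exact failure text is not shown in this guide).

```python
import re

# Matches result lines such as:
#   spu0103: 001.0 FPGA PCIe CFG Test------------------------->PASSED
RESULT_LINE = re.compile(
    r"^(?P<spu>\S+):\s+(?P<test_id>\d+\.\d+)\s+(?P<name>.+?)-+>(?P<result>\w+)\s*$"
)


def failed_tests(diag_output):
    """Return (spu, test id, test name) for every result line that is not PASSED."""
    failures = []
    for line in diag_output.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match and match.group("result") != "PASSED":
            failures.append(
                (match.group("spu"), match.group("test_id"), match.group("name").strip())
            )
    return failures
```

An empty list means every test line reported PASSED. Lines such as "Start time" and "Done Executing loop 1" do not match the pattern and are ignored.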
21. From FDT Support Tools 2.0.0.5, load the tool for updating Emulex firmware
(described in the README for FDT Support Tools).
22. Update the Emulex firmware as described in the FDT Support Tools 2.0.0.5 README.
23. Stop, and then restart NPS:
[nz@nzhost1 ~]$ nzstop
When the system has stopped:
[nz@nzhost1 ~]$ nzstart
Verification
Check the system configuration.
1. Run the commands:
[root@nzhost1 ~]# cd /opt/nz/fdt
[root@nzhost1 ~]# ./sys_rev_check
If issues are noted in the sys_rev_check output (such as requiring a firmware update),
resolve the issues as described in the FDT User’s Guide, in the section “Resolve
sys_rev_check Issues,” and then rerun sys_rev_check to verify that issues are resolved.
Note: After updating the Emulex firmware using FDT Support Tools 2.0.0.5, the Ethernet
firmware is listed as [ABOVE]. This is expected and acceptable.
Note: This procedure requires that the Software Support Tools be loaded on the system.
11. If replacing the blade server 10Gb LOM Interposer, DIMM, blade server processor, or
expansion unit, follow the procedure for that component as documented in IBM
BladeCenter HS23 Problem Determination and Service Guide.
Note: When replacing the Expansion Unit, do not re-use the PCI riser from the original
Expansion Unit. Always use the PCI riser from the replacement Expansion Unit.
Ensure when replacing the expansion unit cover that the latching tab does not push on the
DAC card, causing misalignment of the DAC cards in the assembly.
12. Assemble the sidecar with its components onto the blade processor card.
Statement 21
CAUTION:
Hazardous energy is present when the blade is connected to the power source. Always
replace the blade cover before installing the blade.
13. Install the S-Blade assembly into the same slot in the chassis from which it was
removed.
14. Reconnect the SAS cabling.
15. Power on the S-Blade.
16. Wait for the S-Blade to become active. Check the blade state by using the command:
[root@nzhost1 ~]# ssh mm0xx info -T blade[y]
Where xx is the SPA number (01-08) and y is the slot number of the S-Blade server (1,
3, 5, 7, 9, 11, 13).
17. As user nz, restart the NPS system:
c. If the firmware requires updating, or if other failures are noted in the sys_rev_check
output, resolve the issues as described in the FDT User’s Guide, in the section
“Resolve sys_rev_check Issues,” and then rerun sys_rev_check to verify that issues
are resolved.
19. Check for any issues with the commands:
[nz@nzhost1 ~]$ nzhw -issues
[nz@nzhost1 ~]$ nzds -issues
[nz@nzhost1 ~]$ nzspupart -issues
If issues are found, consult with IBM Netezza Support.
20. If Call Home was previously disabled, enable it.
[nz@nzhost1 ~]$ nzcallhome -on
If you encounter issues after replacing components without running replacespu, run the
complete replacespu procedure to resolve the issues.
Before you begin the ESM replacement process, make certain that you have a replacement
ESM that conforms to the hardware models supported for the N3001 system.
Observe Electrostatic Discharge (ESD) precautions when handling electronic components.
ESD precautions are included in “Electrostatic Discharge Precautions” on page 1-17.
This procedure requires the user to have root access if firmware updates are required.
3. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status
c. Make note of the output to compare with results later in the procedure.
8. Identify the failed ESM that requires replacement:
[nz@nzhost1 ~]$ nzhw -type mm -issues
Description HW ID Location Role State
----------- ----- ------------------ ------ ------
MM 1074 spa1.diskEncl1.mm1 Active Failed
The Location entry provides details on the ESM location. See Figure 1-3 on page 1-4
through Figure 1-11 on page 1-12 for the physical location of N3001 components.
9. Check the ESM serial numbers:
[nz@nzhost1 ~]$ nzhw -type mm -detail
Description HW ID Location           Role   State  Serial number
----------- ----- ------------------ ------ ------ -------------
MM          1071  spa1.diskEncl1.mm1 Active Ok     YM10Z12W036
...
Examine the serial numbers of each ESM and compare them to the serial number on
the label of the replacement ESM. Duplicate serial numbers are not allowed. If an
existing ESM serial number matches that of the replacement ESM, a different
replacement ESM must be used.
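The duplicate check in this step can be sketched in Python. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the input is the text of the `nzhw -type mm -detail` output, and each ESM row is assumed to sit on a single line with the serial number as its last field.

```python
def esm_serials(nzhw_detail_output):
    """Collect the serial numbers from the rows whose Description is MM."""
    serials = []
    for line in nzhw_detail_output.splitlines():
        fields = line.split()
        # Expected fields: Description, HW ID, Location, Role, State, Serial number
        if len(fields) >= 6 and fields[0] == "MM":
            serials.append(fields[-1])
    return serials


def replacement_is_usable(replacement_serial, nzhw_detail_output):
    """A replacement ESM is usable only if its serial number is not already present."""
    return replacement_serial not in esm_serials(nzhw_detail_output)
```

If `replacement_is_usable` returns False, the serial number on the replacement ESM label duplicates one already in the system, and a different replacement ESM is needed.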
10. Locate the failed ESM. The amber Fault LED is lit on the failed ESM.
[Figure: ESM showing the Fault LED and the input connectors IN1 and IN2]
11. Remove all cables from the ESM, ensuring that the cables are properly labeled so that
they can be reconnected in their original locations.
12. Remove the shipping bracket (if installed, colored orange) at the rear of the disk
enclosure:
a. Using a 7mm socket, loosen (do not remove) the nut that secures the shipping
bracket to each side of the rack.
b. With the nuts loosened, pull the bracket away from the disk enclosure, detaching
the bracket from the enclosure.
c. To completely remove the bracket from the rack, cables on one side of the rack,
behind the enclosure, need to be loosened by cutting the zip tie (or loosening the
Velcro tie) that secures them to the side of the rack.
d. After cutting the zip tie (or loosening the Velcro tie) lift that end of the bracket off
the rail and then rotate the bracket out of the rack.
e. Reattach the cables to their original position using a new zip tie (or the Velcro tie).
f. Tighten the nuts that secured the bracket to each side of the rack.
g. Repeat for all disk enclosure brackets.
h. The bracket(s) should be stored on-site in the event the system needs to be moved
(requiring them to be re-installed).
13. Remove the failing ESM from the enclosure.
Note: It may be necessary to temporarily move cabling out of the way to make room for
the ESM removal and replacement.
[Figure: ESM release levers]
16. Wait three minutes for the ESM to update from the backplane.
17. On systems with NPS versions earlier than 7.0 P14, 7.0.2 P11, 7.0.4 P13, or 7.1, check
the ESM serial numbers:
[nz@nzhost1 ~]$ nzhw -detail | grep None
Verify that there are no duplicate serial numbers. If a duplicate serial number exists,
the ESM must be replaced.
18. Check for multi-path issues:
[nz@nzhost ~]$ nzpush -a mpath -issues
If issues are found, resolve the issues by typing:
[nz@nzhost ~]$ nzpush -a mpath dm -r
c. Review the information on the screen to make note of the current firmware versions,
comparing the present results to the results from step 7. If issues are noted in the
sys_rev_check output, in particular, the ESM firmware, resolve the issues, such as
updating firmware, as described in the FDT User’s Guide, in the section “Resolve
sys_rev_check Issues.”
If a firmware update is required, the disk enclosure must be power-cycled for the firmware
to be loaded. This power cycle requires a customer outage and must be planned for.
Each IBM PureData System for Analytics N3001-002 and larger rack contains:
Two Disk Enclosures - N3001-002
Six Disk Enclosures - N3001-005
Twelve Disk Enclosures - N3001-010 and larger
Each Disk Enclosure houses 24 disk drives.
Note: The N3001-001 system does not use Disk Enclosures. Each N3001-001 system
includes two Host Servers, each with 24 disk drives. Sixteen disks are the data disks (slots
8-23) connected to the SAS HBA, and the other eight disks are the system disks connected to
the on-board RAID controller and configured in a RAID array (slots 0-7). This chapter
describes the replacement procedure for data disks (slots 8-23).
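The drive counts implied by these figures, and the slot split on an N3001-001 host, can be worked out in a few lines of Python. This is a sketch for orientation only. The names are hypothetical, and the "N3001-010" key stands for "N3001-010 and larger", counted per rack.

```python
DRIVES_PER_ENCLOSURE = 24

# Disk enclosures per rack, as listed above.
ENCLOSURES_PER_RACK = {"N3001-002": 2, "N3001-005": 6, "N3001-010": 12}


def enclosure_drives_per_rack(model):
    """Number of enclosure disk drives in one rack of the given model."""
    return ENCLOSURES_PER_RACK[model] * DRIVES_PER_ENCLOSURE


def host_slot_role(slot):
    """Classify an N3001-001 host server disk slot as a system disk or a data disk."""
    if 0 <= slot <= 7:
        return "system"  # on-board RAID controller, RAID array
    if 8 <= slot <= 23:
        return "data"  # SAS HBA; covered by this chapter
    raise ValueError("host server slots are numbered 0-23")
```

For example, an N3001-005 rack holds 6 x 24 = 144 enclosure drives, and slot 8 is the first data disk slot on an N3001-001 host.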
Before you begin the disk replacement process, make certain that you have a replacement
disk that conforms to the hardware models supported for the N3001 system. The N3001
system uses Self-Encrypting Drives (SEDs). Typically, you will use a new replacement disk.
Also before beginning the replacement procedure, verify that there is a problem with the
disk drive. Consult the Problem Determination and Service Guide for the disk enclosure or
host server for more information on disk replacement.
The N3001-002 and larger systems currently support one SED model: 600GB Model
ST600MM0026E - FRU number 00AK388, firmware rev. E56D, E56F
The N3001-001 system currently supports one SED model: 600GB Model
ST600MM0026E - FRU number 90Y8909, firmware rev. E56D
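The two supported configurations can be captured as a small lookup table. The Python sketch below simply restates the values listed above. The dictionary layout and function names are hypothetical, and any model string other than "N3001-001" is assumed to belong to the "N3001-002 and larger" family.

```python
SUPPORTED_SEDS = {
    "N3001-001": {
        "model": "ST600MM0026E",
        "fru": "90Y8909",
        "firmware": {"E56D"},
    },
    "N3001-002 and larger": {
        "model": "ST600MM0026E",
        "fru": "00AK388",
        "firmware": {"E56D", "E56F"},
    },
}


def sed_entry(system_model):
    """Look up the supported SED entry for a system model."""
    if system_model == "N3001-001":
        return SUPPORTED_SEDS["N3001-001"]
    return SUPPORTED_SEDS["N3001-002 and larger"]


def firmware_listed(system_model, firmware_rev):
    """True if the firmware revision is one of those listed for the system model."""
    return firmware_rev in sed_entry(system_model)["firmware"]
```

A revision that is not listed is not necessarily unsupported (for example, it may be a later revision), so a False result is a prompt to check rather than a verdict.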
Note: Self-Encrypting Drives have some important differences to be aware of:
- When Security is Enabled (drives are in auto-lock mode) and
  SecureEraseOnFailover = True (default setting), a secure erase is performed
  automatically during the failover of a drive.
- When Security is Enabled (drives are in auto-lock mode), the replacement SED
  performs a secure erase when transitioning roles from Inactive to Spare (during
  activation).
This procedure requires the user to have root and nz access. An option is also available that
requires the user to have access as nzibmsupport13 and nz.
When replacing multiple drives, replace one disk at a time, as instructed in the procedure,
before replacing the next drive.
The estimated time to perform this procedure is from 20 to 25 minutes, depending on ease
of access to the system and familiarity with NPS and the Netezza system.
If Call Home is enabled on this system, the System Manager will report this activity and
create PMRs. The customer must be made aware of this so that the PMRs can be closed
when filed.
Is the above disk the one you want to remove? (DO NOT REMOVE YET)
[y/n]:
5. Type y.
Example output:
################################
Beginning Remove phase:
The physical location of the disk needs to be located.
In order to locate the disk, the next 3 steps will have the LED
light turned on (1), off (2), and then back on (3).
This will help you to locate the disk. Mark the located disk in a
non-harmful way (with a sticker, etc).
After all 3 steps are completed, you will be asked if you want to
retry those steps.
When you are ready to start the steps to locate the failed disk,
press Enter to continue
8. Verify that the disk locater LED is ON and then type y and press Enter.
Example output:
Do you want to redo the above steps again? [y/n]:
9. Type n and press Enter.
Example output:
CONFIRMING AGAIN TO MAKE SURE YOU HAVE SELECTED THE CORRECT DISK!
You have selected the following disk to be removed:
Is the above disk the one you want to remove? (DO NOT REMOVE YET)
[y/n]:
10. Type y and press Enter.
Example output:
Please remove the following disk:
----------------------------------------------------------------
- Disk with hardware ID: 1166, spa1.diskEncl6.disk2 (rack 1,SPA 1,
enclosure 6, slot 2)
Once you have removed the above disk, press Enter to continue
11. Remove the failed disk identified in step 10.
When removing a drive, pull it only half way out of the slot and wait 90 seconds for the disk
to spin down before fully removing the disk drive.
################################
Beginning Insert phase:
################################
Beginning Firmware Update phase:
################################
Beginning Activation phase:
Activating the disk . . . Disk is now activated and ready to use
################################
Report
Report of disk that has been removed:
Note: PFE information from the failed disk is contained in the log.
Disk with hardware ID: 1059 (upper host, 23rd host disk (slot 22))
##################################################################
Report:
Report of disk that has been removed:
##################################################################
Disks are ready to be activated.
During host disk activation the system is paused and SPUs reconfigured and restarted.
When the restart is complete the system goes back to the online state. During this
process several intermediate system states can also be observed (Discovering,
Initializing, Resuming).
The replacement disk now has the role of Spare. If more than one disk has been
replaced, they all become activated.
16. The replacement is complete.
To replace a “failed” or “failing” disk drive on the system, follow these steps:
1. Read the safety information that begins on page v.
2. Log into the system as user nz.
3. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status
The Role of a failed disk shows Failed or Incompatible. The State column typically
shows a state of Ok. The state could also be Unsupported, Unreachable, None, or
Degraded. The Security column indicates if the Self-Encrypting Drive (SED) is in auto-
lock mode (Enabled) or in default mode (not locked = Disabled).
Note: A Failed drive still installed in the system should always be shown as Disabled.
a. Verify the location information for the disk ID that you want to replace:
Description HW ID Location             Role   State Security
----------- ----- -------------------- ------ ----- --------
Disk        1166  spa1.diskEncl6.disk2 Active Ok    Enabled
b. Fail over the disk:
[nz@nzhost ~]$ nzhw failover -id 1166
Are you sure you want to proceed (y|n)? [n] y
Note: The system manager will not allow you to fail over a disk manually if it holds the
last remaining copy of a data slice (that is, if the disk is unmirrored).
Is the above disk the one you want to remove? (DO NOT REMOVE YET)
[y/n]:
10. Type y.
Example output:
################################
Beginning Remove phase:
The physical location of the disk needs to be located.
In order to locate the disk, the next 3 steps will have the LED
light turned on (1), off (2), and then back on (3).
This will help you to locate the disk. Mark the located disk in a
non-harmful way (with a sticker, etc).
After all 3 steps are completed, you will be asked if you want to
retry those steps.
When you are ready to start the steps to locate the failed disk,
press Enter to continue
Example output:
STEP 3: Reconfirm the LED light is turned on (Don't forget to mark
the disk)
Turning on LED light for disk 1166 ... (Please allow up to 2 minutes
for the LED light to turn on)
Is the LED light for disk 1166 turned on? [y/n]:
13. Verify that the disk locater LED is ON and then type y and press Enter.
Example output:
Do you want to redo the above steps again? [y/n]:
14. Type n and press Enter.
Example output:
CONFIRMING AGAIN TO MAKE SURE YOU HAVE SELECTED THE CORRECT DISK!
You have selected the following disk to be removed:
Is the above disk the one you want to remove? (DO NOT REMOVE YET)
[y/n]:
15. Type y and press Enter.
Example output:
Please remove the following disk:
----------------------------------------------------------------
- Disk with hardware ID: 1166, spa1.diskEncl6.disk2 (rack 1,SPA 1,
enclosure 6, slot 2)
Once you have removed the above disk, press Enter to continue
16. Remove the failed disk identified in step 13.
When removing a drive, pull it only half way out of the slot and wait 90 seconds for the disk
to spin down before fully removing the disk drive.
################################
Beginning Insert phase:
Waiting for disk to become visible to NPS. This may take a few minutes
...
Disk (Hardware ID 1638, Serial Number , F/W Rev E56D) is visible at
rack 1, SPA 1, enclosure 6, slot 2
################################
Beginning Firmware Update phase:
################################
Beginning Activation phase:
Activating the disk . . . Disk is now activated and ready to use
################################
Report
Note: PFE information from the failed disk is contained in the log.
Disk with hardware ID: 1059 (upper host, 23rd host disk (slot 22))
##################################################################
Report:
##################################################################
Disks are ready to be activated.
Replacement Procedure
Note: If you require assistance performing this procedure, call IBM Netezza Support.
The Role of a failed disk shows Failed. The State column typically shows a state of Ok.
The state could also be Inactive, Unsupported, Unreachable, None, or Degraded. The
Security column indicates if the Self-Encrypting Drive (SED) is in auto-lock mode
(Enabled) or in default mode (not locked = Disabled).
Note: A Failed drive should always be shown as Disabled.
7. If you are replacing a "failing" disk (an active disk that does not yet have a role of
failed), the nzhw -issues command will not return any information about that disk. To
fail over an active but problematic disk that you want to replace, use the following
steps. This procedure assumes that you know the ID of the problematic disk (1166 in
this example). If the target disk is already in the Failed role, skip to step 8.
Never remove a disk drive from a storage array while the disk is in the Active, Assigned,
Assigning, Spare, Sparing, or Failing role. Always make sure that you fail over the Active,
Assigned, or Spare disk before you attempt to remove it. If the disk is in the Assigning,
Sparing, or Failing role, wait for it to transition to the Assigned, Spare, or Failed role
before failing it over. If you remove an Active, Assigned, Assigning, Spare, Sparing, or
Failing disk drive, you could cause the system to restart and/or transition to the down state.
a. Verify the location information for the disk ID that you want to replace:
Description HW ID Location             Role   State Security
----------- ----- -------------------- ------ ----- --------
Disk        1166  spa1.diskEncl1.disk2 Active Ok    Enabled
b. Fail over the disk:
[nz@nzhost ~]$ nzhw failover -id 1166
Are you sure you want to proceed (y|n)? [n] y
Note: The system manager will not allow you to fail over a disk manually if it holds the
last remaining copy of a data slice (that is, if the disk is un-mirrored).
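The role rules in this step amount to a small decision table, sketched here in Python. The function name and the returned strings are hypothetical. Roles that this step does not mention are reported as not covered rather than guessed at.

```python
# Roles that must be failed over before the disk is removed.
FAIL_OVER_FIRST = {"Active", "Assigned", "Spare"}
# Transitional roles: wait for the transition to finish before failing over.
WAIT_FOR_TRANSITION = {"Assigning", "Sparing", "Failing"}


def removal_action(role):
    """Return the action this step prescribes for a disk in the given role."""
    if role == "Failed":
        return "proceed with removal"
    if role in FAIL_OVER_FIRST:
        return "fail over first"
    if role in WAIT_FOR_TRANSITION:
        return "wait for transition"
    return "not covered by this step"
```

For the example disk 1166, which is in the Active role, the result is "fail over first", which corresponds to the `nzhw failover -id 1166` command shown above.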
Failure to replace a hard disk drive in its correct bay might result in loss of data. If you are
replacing a hard disk drive that is part of a configured array and logical drive, be sure to
install the replacement hard disk drive in the correct bay.
Never swap a drive when its associated green activity LED is flashing. Swap a drive only
when its associated amber LED is blinking.
10. Mark the failed disk drive in a non-harmful way to ensure that the correct disk is
replaced in the next step.
11. Replace the failed disk identified in step 9:
a. Unlock the drive carrier by pushing up slightly on the release latch, then pulling
down and out on the drive handle (see Figure 4-3).
b. Pull the disk drive half way out of the slot.
c. Wait at least 90 seconds for the disk to spin down and for the system to fully clean
up traces of the failed disk.
Failing to wait for disk spin down can crash the disk heads and destroy data on the disk.
Always wait at least 90 seconds between pulling the failed disk out and inserting the
replacement disk to allow the system to fully clean up traces of the failed disk.
To activate the disk, using the HW ID identified in step 14, type the command:
[nz@nzhost ~]$ nzhw activate -id <HW ID>
When prompted, type y for confirmation.
For example, if the HWID of the disk (identified in step 14) is 1637:
[nz@nzhost ~]$ nzhw activate -id 1637
On N3001-001 systems, when prompted, type y for confirmation.
Note: On N3001-001 during host disk activation the system is paused and SPUs
reconfigured and restarted. When the restart is complete the system goes back to the
online state. During this process several intermediate system states can also be
observed (Discovering, Initializing, Resuming).
Compare the results to those obtained in step 6. If a regeneration was in progress, the
results should show that it has advanced. Otherwise, call IBM Netezza Support.
23. Check for data slice issues, in the event that a data slice may be degraded:
[nz@nzhost ~]$ nzds -issues
If a data slice is degraded, you must initiate a manual disk regeneration to the spare
disk. For more information, see the Netezza System Administrator's Guide, in Chapter
5, in the section “Regenerate a Disk Slice.”
24. If Call Home was previously disabled, enable it.
[nz@nzhost1 ~]$ nzcallhome -on
IMPORTANT! Do not start the disk regeneration of a new, spare disk until you have verified
that its firmware meets the minimum required firmware revision. Proceed to the next
section, "Checking the Firmware Revision of the Replacement Disk."
To check the firmware revision of a replacement disk drive, follow these steps:
1. Log in to the Netezza system as the nz user.
2. Display the firmware details of the replacement disk using its hardware ID:
a. For N3001-002 and larger systems, for HWID 1637:
[nz@nzhost ~]$ nzhw show -id 1637 -detail
Description HW ID Location Role State Serial number Version
----------- ----- ------------------- ----- ----- ------------------- -------
Disk 1637 spa1.diskEncl6.disk2 Spare Ok S0M28B230000M4357PEP E56D
Detail Model
-------------------------------
558.91 GiB; ST600MM0026;
In the sample, the disk firmware revision (E56D) appears in the Version column.
b. For N3001-001 systems:
[nz@nzhost ~]$ nzhw show -id xxxx -detail
Where xxxx is the HWID from step 20 on page 4-20.
The output from this command includes a serial number. Use that serial number in
the command:
[nz@nzhost ~]$ nzhw show -detail | grep serial_number
Where serial_number is the serial number from the output of the previous
command.
The output from this command shows the firmware revision on the replacement disk
drive.
If the disk firmware revision matches the correct revision, the firmware check pro-
cess is complete and you can skip the remaining steps of this procedure.
If the disk firmware revision does not match the correct revision, you must update
the firmware by continuing with this procedure.
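The check in step 2a can be scripted. The following is a minimal sketch, assuming the disk row is whitespace-separated exactly as in the sample output (Description, HW ID, Location, Role, State, Serial number, Version). The function names disk_fw_rev and fw_matches are hypothetical and are not part of NPS, and the required revision is a value that you supply.

```shell
#!/bin/bash
# Hypothetical helper: print the Version field for a given HW ID.
# Reads `nzhw show -id <HW ID> -detail` output on stdin and assumes the
# whitespace-separated column layout shown in the sample above.
disk_fw_rev() {
    local hwid="$1"
    awk -v id="$hwid" '$1 == "Disk" && $2 == id { print $7; exit }'
}

# Hypothetical helper: succeed only if the revision read from the disk is
# non-empty and equals the revision that you were told is correct.
fw_matches() {
    local actual="$1" required="$2"
    [ -n "$actual" ] && [ "$actual" = "$required" ]
}

# Usage sketch (E56D is the revision from the sample, not a universal value):
#   rev=$(nzhw show -id 1637 -detail | disk_fw_rev 1637)
#   if fw_matches "$rev" E56D; then echo "firmware OK"; else echo "update required"; fi
```

If the column layout on your system differs from the sample, the field number in disk_fw_rev must be adjusted.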
Note: The N3001 system performs disk firmware checking. If the replacement disk
does not match the minimum allowed firmware revision, an nzevent warning message
is generated and a warning logged in sysmgr.log.
For example:
NZEVENT
NPS system Q100-23E-D - Disk 1053 Needs attention. System
initiated.
location:Logical Name:'spa1.diskEncl1.disk6' Logical Location:'1st
Rack, 1st SPA, 1st DiskEnclosure, Disk 6'
error string:disk firmware revision is below supported level
devSerial:6XR1PF800000M226AT22
event source:System initiated
SYSMGR
2013-12-02 10:21:58.275990 EDT Warning: Disk [disk hwid=1053
sn="6XR1PF800000M226AT22" SPA=1 Parent=1008 Position=12
ParentEnclPosition=2] is at firmware revision 'B556', which is
below supported level of 'B55C'
Note: If the disk firmware revision is a later version than the version listed as the cor-
rect revision, it is not necessarily a problem and no action may be needed. Check with
IBM support if you need assistance.
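To list every disk that the system manager has flagged, the sysmgr.log warning shown above can be filtered. This is a sketch only: it assumes that each warning occupies a single line in the log (the example above is wrapped for printing) and that the wording matches the example. The function name fw_warnings is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read sysmgr.log text on stdin and print one line per
# firmware warning in the form "<hwid> <current revision> <supported revision>".
# Assumes the single-line message format shown in the SYSMGR example above.
fw_warnings() {
    sed -n -E "s/.*hwid=([0-9]+) .*is at firmware revision '([^']+)', which is below supported level of '([^']+)'.*/\1 \2 \3/p"
}

# Usage sketch (supply the path to sysmgr.log on your system):
#   fw_warnings < /path/to/sysmgr.log
```

For the example message, the sketch prints the HW ID 1053 followed by the revisions B556 and B55C.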
A disk must be idle before you update its firmware. If a disk has a role of Spare, as shown
in the output of step 2, the disk is idle and you can proceed.
Note: The firmware updater does not allow the user to update disk firmware if its state
is active.
For N3001-001:
The firmware_updater command has the following format:
[root@nzhost fdt]# ./firmware_updater Host --update-storage --ignore-cluster-state --ignore-nps-state --alias 'haX' --slot YY --skip-bmc-login-test --skip-prompt
Where YY is the slot number for the disk as reported by nzhw (slot 1 through 24).
For example, for SPA 1, Enclosure 6, Disk 2:
[root@nzhost fdt]# ./firmware_updater Host --update-storage --ignore-cluster-state --ignore-nps-state --alias 'haX' --slot 12 --skip-bmc-login-test --skip-prompt
5. Verify that the replacement disk has the correct revision of the firmware (shown as
E56D in this example), and the overall drive health is OK (no predictive failures, low
Grown defect list value: less than 6 for new drives).
As user nz, type the commands:
[nz@nzhost fdt]$ cd /opt/nz/fdt
[nz@nzhost fdt]$ ./storage_diags smart --spa 1 --encl 6 --disk 2
Example output:
Now creating the lock file [DONE]
------------------------------------------------------------------
***** S T O R A G E D I A G S *****
FDT 4.2.0.0 - /opt/nz/fdt/log/storage_diags_20140905-172736.log
-----------------------------Smart--------------------------------
Checking SPU availability [DONE]
------------------------------------------------------------------
SC_IO: bad ioctl driver_status=8
DSK: Make : ST600MM0026
DSK: Model : IBM-ESXS
DSK: F/W Rev. : E56D
DSK: S/N : S0M1NQFA0000B4259BW0
DSK: Size : 600 GB, (1172123567 sectors)
DSK: Transport Protocol: SAS
DSK: Disk Location : encl3Slot20
---------------Write---------------
Errors corrected with possible delays = 0
Total re-writes re-reads = 0
Total corrected errors = 0
Total times correction algorithm processed = 0
Total bytes processed = 417254719768
Total uncorrected errors = 0
---------------Read---------------
Errors corrected without possible delays = 293538838
Errors corrected with possible delays = 0
Total re-writes re-reads = 0
Total corrected errors = 293538838
Total times correction algorithm processed = 0
Total bytes processed = 85849841480
Total uncorrected errors = 0
---------------Verify---------------
Errors corrected without possible delays = 2864805
Errors corrected with possible delays = 0
Total re-writes re-reads = 0
Total corrected errors = 2864805
Total times correction algorithm processed = 0
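The firmware revision in the storage_diags output can be extracted with a short filter. This is a minimal sketch, assuming the "DSK: F/W Rev." and "Total uncorrected errors" lines appear as in the example output. The function names are hypothetical. The uncorrected-error total is only a convenience summary and does not replace the health criteria in step 5 (predictive failures and the Grown defect list value), which are not covered by this sketch.

```shell
#!/bin/bash
# Hypothetical helper: print the firmware revision from storage_diags smart
# output supplied on stdin (first "DSK: F/W Rev." line only).
smart_fw_rev() {
    sed -n -E 's/^DSK: F\/W Rev\.[[:space:]]*:[[:space:]]*([^[:space:]]+).*/\1/p' | head -n 1
}

# Hypothetical helper: add up every "Total uncorrected errors = N" counter
# (Write, Read, Verify sections) and print the sum.
smart_uncorrected_total() {
    awk -F'=' '/Total uncorrected errors/ { v = $2; gsub(/[[:space:]]/, "", v); total += v }
               END { print total + 0 }'
}

# Usage sketch: capture the output once, then inspect it.
#   ./storage_diags smart --spa 1 --encl 6 --disk 2 > /tmp/smart.out
#   smart_fw_rev < /tmp/smart.out
#   smart_uncorrected_total < /tmp/smart.out
```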
Each Disk Enclosure contains two ESMs and two power modules.
Before you begin the Disk Enclosure replacement process, make certain that you have a
replacement component that conforms to the hardware models supported for the N3001
system.
Observe Electrostatic Discharge (ESD) precautions when handling electronic components.
ESD precautions are described in “Electrostatic Discharge Precautions” on page 1-17.
The estimated time to perform this procedure is from 60 to 180 minutes, depending on
ease of access to the system and familiarity with NPS and the Netezza system.
The FRU number for a disk enclosure midplane assembly is 81Y9834.
Complete details on Disk Enclosure removal and replacement are provided in the IBM
System Storage EXP2500 Installation, User’s, and Maintenance Guide.
To replace a Disk Enclosure on the N3001 system, follow these steps:
Note: This procedure requires that NPS is running on Host 1 (ha1).
c. Make note of the output to compare with results later in the procedure.
5. Check the state of the Netezza system:
[nz@nzhost ~]$ nzstate
System state is 'Online'.
6. If the system state is online, stop the system using the command:
[nz@nzhost ~]$ nzstop
10. Locate the enclosure being replaced. Figure 1-2 on page 1-3 through Figure 1-11 on
page 1-12 show locations of system components.
The disks must be removed (from the enclosure being replaced) and replaced in the exact
same locations, so ensure that their locations are noted.
12. From the enclosure being replaced, remove all cables from the power supplies and
ESMs, ensuring that the cables are properly labeled so that they can be replaced into
their original locations.
Statement 5
CAUTION:
The power control button on the device and the power switch on the power supply do not
turn off the electrical current supplied to the device. The device also might have more than
one power cord. To remove all electrical current from the device, ensure that all power
cords are disconnected from the power source.
13. Remove the shipping brackets (if installed, colored orange) at the rear of the disk
enclosures:
a. Using a 7mm socket, loosen (do not remove) the nut that secures the shipping
bracket to each side of the rack.
b. With the nuts loosened, pull the bracket away from the disk enclosure, detaching
the bracket from the enclosure.
c. To completely remove the bracket from the rack, cables on one side of the rack,
behind the enclosure, need to be loosened by cutting the zip tie (or loosening the
Velcro tie) that secures them to the side of the rack.
d. After cutting the zip tie (or loosening the Velcro tie) lift that end of the bracket off
the rail and then rotate the bracket out of the rack.
e. Reattach the cables to their original position using a new zip tie (or the Velcro tie).
f. Tighten the nuts that secured the bracket to each side of the rack.
g. Repeat for all disk enclosure brackets.
h. The brackets should be stored on-site in the event the system needs to be moved
(requiring them to be re-installed).
14. Remove all power supplies, ESMs, and disks.
The disks and ESMs must be removed from the failed enclosure and installed into the
replacement enclosure.
The power supplies and ESMs must be removed and replaced in the exact same locations,
so ensure that their location is noted.
15. Install the ESMs into the replacement disk enclosure, ensuring they are installed into
the same slots from which they were removed.
16. Install the power supplies into the replacement disk enclosure, ensuring they are
installed into the same slots from which they were removed.
17. Note the serial number of the replacement enclosure and install it into the rack. The
serial number label is located on the top of the enclosure, near the front.
18. Reconnect all cables to the power supplies and ESMs.
19. Install all disks into the enclosure, ensuring they are installed into the same slots from
which they were removed.
20. Power on all SPAs:
[root@nzhost1 ~]# /nzlocal/scripts/rpc/spapwr.sh -on all
21. Start the bootp server:
a. Open another process window. (If using a KVM, press Alt-F2 to start a new process
window. If using a network connection and terminal session, open a new session.)
b. Issue the following command to make sure that the bootp server is started:
[root@nzhost1 ~]# /nz/kit/sbin/bootpsrv
Note: Leave this process running until you are instructed otherwise.
If any components have errors, troubleshoot that component and resolve the error
before proceeding.
27. Log out as user root and return to the user nz session.
[root@nzhost ~]# exit
c. If issues are noted in the sys_rev_check output, resolve the issues as described in
the FDT User’s Guide, in the section “Resolve sys_rev_check Issues.”
29. Review the information on the screen to make note of the current firmware versions,
comparing the present results to the results from step 4.
30. Exit bootp server:
Return to the process window where you started the bootp server and type Ctrl-C to
stop the bootp server. You may close this process window by typing exit at the prompt.
The estimated time to perform this procedure is from 30 to 120 minutes, depending on
ease of access to the system and familiarity with NPS and the Netezza system.
The FRU number for the AMM is 47C2480, and can be verified as described in “H-Chassis
Component FRU Numbers” on page 1-13.
Note: For the N3001, use a replacement AMM of the same FRU number only. Do not mix
AMMs with different FRU numbers in the same system.
6. Validate the current state of the system prior to replacement. Run sys_rev_check:
a. Change directory to:
c. Make note of the output to compare with results later in the procedure.
7. Check the state of the Netezza system:
[nz@nzhost1 ~]$ nzstate
System state is 'Online'.
8. If the system state is stopped, start the system using the command:
[nz@nzhost1 ~]$ nzstart
9. Wait for the system to come online using the command:
[nz@nzhost1 ~]$ nzstate
System state is 'Online'.
10. Identify the failed AMM that requires replacement:
[nz@nzhost1 ~]$ nzhw -issues
Description HW ID Location Role State
----------- ----- --------------------- -------- -----
MM 1004 spa1.mm1 Failed Ok
The State column typically has a state of Ok as shown in this output. The state could
also be Unreachable or None.
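When several components are listed, the HW ID of the failed AMM can be picked out of the nzhw -issues output with a filter. This is a sketch under the assumption that rows are whitespace-separated as in the sample (Description, HW ID, Location, Role, State). The function name failed_mm_ids is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read `nzhw -issues` output on stdin and print the
# HW ID of every management module (Description "MM") whose Role is Failed.
failed_mm_ids() {
    awk '$1 == "MM" && $4 == "Failed" { print $2 }'
}

# Usage sketch: feed each reported ID to the locate command in the next step.
#   for id in $(nzhw -issues | failed_mm_ids); do
#       nzhw locate -id "$id"
#   done
```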
11. Obtain the physical location of the failed AMM:
[nz@nzhost1 ~]$ nzhw locate -id 1004
Note: The location of the AMM appears in the command output. The person who is run-
ning the command should communicate the location of the AMM to the person who is
onsite with the Netezza system.
12. Using the location information, physically locate the failing AMM. See Figure 1-9 on
page 1-10 through Figure 1-11 on page 1-12.
13. Disconnect the Ethernet cable from the AMM. If the failing AMM is in Chassis 1 and is
MM1, also disconnect the video and USB cables.
[Figure: AMM showing the release handle, video connector, Ethernet connector, and USB connectors]
14. Remove the failing AMM (by pulling out and down on the release handle).
The primary AMM displays two LEDs in the ON state and the secondary AMM displays
one LED.
17. As user root from the active host, log into the active AMM:
[root@nzhost1 ~]# ssh USERID@mm00x
where x is the AMM SPA number (1-4).
When prompted for the password, type PASSW0RD (the 0 is a zero, not the letter O).
18. Check the primary AMM:
system> info -T system:mm[1]
Name: mm001
UUID: 003A 5B9A CFD0 11DE B7F4 0021 5E43 B79E
Manufacturer: IBM (FOXC)
Manufacturer ID: 20301
Product ID: 65
Mach type/model: Advanced Management Module
Mach serial number: Not Available
Manuf date: 4609
Hardware rev: 18
Part no.: 49Y6295
FRU no.: 60Y0621
FRU serial no.: YK12909BC2KF
CLEI: Not Available
AMM firmware
Build ID: BPET086
File name: CNETCMUS.PKT
Rel date: 03/22/2011
Rev: 8
Product Name: IBM Advance Management Module
19. Monitor the replacement AMM and ensure the update completes. The Status line
shows No update in progress when the update is complete:
system> info -T system:mm[2]
Name: Standby MM
UUID: F421 F8D7 F019 11DE 930C 0021 5E43 E960
Manufacturer: IBM (FOXC)
Manufacturer ID: 20301
Product ID: 65
Mach type/model: Advanced Management Module
Mach serial number: Not Available
Manuf date: 5209
Hardware rev: 18
Part no.: 49Y6295
FRU no.: 60Y0621
FRU serial no.: YK12909CM2D4
CLEI: Not Available
AMM firmware
Build ID: BPET086
File name: CNETCMUS.PKT
Rel date: 03/22/2011
Rev: 8
Status: No update in progress
Product Name: IBM Advance Management Module
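The Status line can also be tested from the host instead of being read by eye. The sketch below only inspects text supplied on standard input; the function name amm_update_done is hypothetical. Passing the info command on the ssh command line, as shown in the usage comment, is an assumption modeled on the way this guide runs the health command through ssh; confirm that it works on your AMM before relying on it.

```shell
#!/bin/bash
# Hypothetical helper: succeed if AMM `info` output on stdin contains the
# line "Status: No update in progress" (leading or extra spaces tolerated).
amm_update_done() {
    grep -Eq '^[[:space:]]*Status:[[:space:]]*No update in progress[[:space:]]*$'
}

# Usage sketch (assumption: the AMM accepts the command on the ssh line;
# replace x with the SPA number):
#   if ssh USERID@mm00x 'info -T system:mm[2]' | amm_update_done; then
#       echo "update complete"
#   fi
```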
20. View the Event Log:
Example output:
------------------------------------------------------------------
IBM Netezza – Netezza Platform Software
(C) Copyright IBM Corp. 2002, 2012 All rights reserved.
------------------------------------------------------------------
WARNING: Inconsistencies ignored due to command line options:
Target and origin directories are identical (/nz/kit.7.0.22389).
Target and origin releases are the same.
Logfile: /nz/var/log/upgrade.20120402.7.0.run
Logfile: /nz/var/log/upgrade.20120402.7.0.run.gz
d. Repeat step a. If still prompted for a password, contact IBM Netezza Support.
23. As user nz, verify system is still Online.
[nz@nzhost1 ~]$ nzstate
System state is 'Online'.
24. Check for any issues with the commands:
[nz@nzhost1 ~]$ nzhw -issues
[nz@nzhost1 ~]$ nzds -issues
[nz@nzhost1 ~]$ nzspupart -issues
c. If issues are noted in the sys_rev_check output, resolve the issues as described in
the FDT User’s Guide, in the section “Resolve sys_rev_check Issues.”
27. Review the information on the screen to make note of the current firmware versions,
comparing the present results to the results from step 6.
Note: If the replacement AMM is being reused from a former system, the VPD information
may not be refreshed after installation. If it is noticed that the AMM does not have the cor-
rect name in VPD:
1. Ensure the replacement AMM is the primary.
If the replacement AMM is not the primary, failover the other AMM:
a. As user root, log into the AMM:
[root@nzhost1 ~]# ssh mm0xx
Where xx is the SPA number of the chassis with the replaced AMM.
b. Failover the AMM:
system> reset -force -T mm[y]
Where for y, 1 is the top AMM in the chassis, and 2 is the bottom AMM in the
chassis.
For example, if the top AMM was replaced, and the bottom AMM is now primary:
system> reset -force -T mm[2]
This command makes the top AMM the primary and also closes the current login session to
the AMM.
Before continuing after resetting an AMM, you must wait 15 for the new primary AMM
to be active.
2. As user root, log into the AMM:
[root@nzhost1 ~]# ssh mm0xx
Where xx is the SPA number of the chassis with the replaced AMM.
3. Check the VPD of the AMM:
system> config -T mm[y]
Where for y, 1 is the top AMM in the chassis, and 2 is the bottom AMM in the chassis.
The name of the AMM must be in the form of mm0xx, where xx is the SPA number of
the chassis with the replaced AMM.
4. If the name of the AMM needs to be corrected, type the command:
system> config -name mm0xx -T mm[y]
Where xx is the SPA number of the chassis with the replaced AMM and where for y, 1 is
the top AMM in the chassis, and 2 is the bottom AMM in the chassis.
5. Exit from the AMM:
system> exit
The estimated time to perform this procedure is from 60 to 120 minutes, depending on
ease of access to the system and familiarity with NPS and the Netezza system.
The FRU number for the 10Gb switch is 90Y9392, and can be verified as described in “H-
Chassis Component FRU Numbers” on page 1-13.
To replace a switch on the N3001 system, follow these steps:
Note: This procedure requires that NPS is running on Host 1 (ha1).
3. Log in to the active host of the system (assumed here to be nzhost1) as user root.
4. Change to user nz:
[root@nzhost1 ~]# su - nz
5. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status
6. Run sys_rev_check to validate the current state of the system prior to replacement:
a. Change directory to:
[nz@nzhost1 ~]$ cd /opt/nz/fdt
c. Make note of the output to compare with results later in the procedure.
7. Identify the failing switch by using one of these techniques:
Issue the following command to the AMM from the Host 1:
[nz@nzhost1 ~]$ ssh USERID@mm00x health -l all -f
Where x is the SPA number of the failing switch.
When prompted for the password, type PASSW0RD (use the number zero, not the
letter O).
The command output lists component health status and active alerts. Note the ID of
the failing switch. See Figure 1-3 on page 1-4 through Figure 1-11 on page 1-12.
Issue the following command:
[nz@nzhost1 ~]$ nzhw -type ethsw
Check the Role and State columns in the command output for the failing switch.
8. If NPS resource group is running (from step 2):
a. Run the following command to stop the Netezza server:
[nz@nzhost1 ~]$ nzstop
Statement 3
CAUTION:
When laser products (such as CD-ROMs, DVD drives, fiber optic devices, or transmitters)
are installed, note the following:
Do not remove the covers. Removing the covers of the laser product could result in
exposure to hazardous laser radiation. There are no serviceable parts inside the device.
Use of controls or adjustments or performance of procedures other than those specified
herein might result in hazardous radiation exposure.
DANGER
Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the
following.
Laser radiation when open. Do not stare into the beam, do not view directly with opti-
cal instruments, and avoid direct exposure to the beam.
11. Install the replacement switch module. Replace the cables into their original positions.
Note: It may be necessary to move cabling back into place if it was moved out of the
way to make room for the switch removal and replacement.
b. Type the following commands to start the clustering processes on the active host:
[root@nzhost1 ~]# service heartbeat start
[root@nzhost1 ~]# ssh ha2 'service heartbeat start'
20. From the active host (ha1), type the following and press Enter:
[root@nzhost1 ~]# crm_mon -i5
Result: When the cluster manager comes up and is ready, status appears as follows.
Make sure that nzinit has started before you proceed. (This could take a few min-
utes.)
23. The system may require up to 10 minutes to come online. Repeat the following
command until it reports that the system state is Online:
[nz@nzhost1 ~]$ nzstate
System state is 'Online'
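Rather than retyping nzstate, the wait can be wrapped in a polling loop. This is a minimal sketch: the function name wait_for_online is hypothetical, the 600-second default reflects the 10 minutes mentioned in this step, and the 15-second polling interval is an arbitrary choice. It relies only on the "System state is 'Online'" text shown in the example output.

```shell
#!/bin/bash
# Hypothetical helper: poll a state command until it reports Online.
#   $1 = timeout in seconds   (default 600)
#   $2 = polling interval     (default 15, arbitrary)
#   $3 = command to run       (default nzstate)
# Returns 0 when Online is seen, 1 when the timeout expires first.
wait_for_online() {
    local timeout="${1:-600}" interval="${2:-15}" state_cmd="${3:-nzstate}"
    local waited=0
    while :; do
        if "$state_cmd" 2>/dev/null | grep -q "System state is 'Online'"; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}

# Usage sketch, as user nz:
#   wait_for_online 600 15 || echo "System is not Online after 10 minutes"
```

If the loop times out, continue with the troubleshooting guidance in this procedure instead of waiting indefinitely.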
24. Check for any issues with the commands:
[nz@nzhost1 ~]$ nzhw -issues
[nz@nzhost1 ~]$ nzds -issues
[nz@nzhost1 ~]$ nzspupart -issues
d. If issues are noted in the sys_rev_check output, resolve the issues as described in
the FDT User’s Guide, in the section “Resolve sys_rev_check Issues.”
27. Review the information on the screen to make note of the current firmware versions,
comparing the present results to the results from step 6.
Because of an issue with the early version of RAID Flash cards (FRU number 46C9793),
whenever you replace the planar of a host server that has an early version of the RAID
Flash, you must also replace the RAID Flash (using new FRU number 44W3393) at the
same time as the planar. The following system serial numbers identify the systems that
include the early RAID Flash cards: NZ33000 to NZ33028, NZ33100 to NZ33109, and
7837001 to 7837038.
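The serial number ranges above can be checked mechanically. The following sketch encodes only the three ranges listed in this section; the function name has_early_raid_flash is hypothetical, and the serial number formats (NZ followed by five digits, or seven digits) are inferred from the listed ranges.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a system serial number falls in one of the
# ranges listed above for systems with early RAID Flash cards.
has_early_raid_flash() {
    local serial="$1" n
    case "$serial" in
        NZ[0-9][0-9][0-9][0-9][0-9])
            n=$((10#${serial#NZ}))          # force base 10 for leading zeros
            { [ "$n" -ge 33000 ] && [ "$n" -le 33028 ]; } ||
            { [ "$n" -ge 33100 ] && [ "$n" -le 33109 ]; }
            ;;
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9])
            n=$((10#$serial))
            [ "$n" -ge 7837001 ] && [ "$n" -le 7837038 ]
            ;;
        *)
            return 1
            ;;
    esac
}

# Usage sketch:
#   if has_early_raid_flash NZ33012; then
#       echo "Replace the RAID Flash together with the planar"
#   fi
```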
If you are replacing networking components in the host in addition to the system board,
you must replace just one component at a time, completing each procedure before
continuing to another component. Otherwise, it is difficult to determine which MAC
address is assigned to which port.
The estimated time to perform this procedure is from 60 to 180 minutes, depending on
ease of access to the system and familiarity with NPS and the Netezza system.
Note: The Host Server firmware must be updated as part of this procedure. You must have
bootable media available for the firmware update. FDT Support Tools 2.0.0.1 provides tools
and instructions for creating bootable USB drives and includes the latest critical host firm-
ware updates.
There is one host type for the IBM PureData System for Analytics N3001-001.
For the x3650-M4-HD, there are ten ports (eth0 through eth9).
The N3001-001 host system board uses Feature on Demand (FoD) keys for RAID configuration
and remote access. To restore the FoD keys, you must use a laptop computer to
retrieve the keys from the IBM Features on Demand website
(https://fod2.lenovo.com/lkms/angular/app/pages/index.htm)
and then store the keys on a USB flash drive.
1. Log into the FoD website: https://fod2.lenovo.com/lkms/angular/app/pages/index.htm.
You need to have or create an IBM id for access.
2. Click on Retrieve history.
3. In the Search type dropdown, select Search history via UID.
4. In the Search value field, you must specify the server UID, which is a concatenation of
the machine type and system serial number (for example, 5460KQ5N05V).
5. Click Continue.
6. Select all active keys and press Download to save the keys on a USB flash drive to be
used later.
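The server UID used in step 4 is a plain concatenation, which can be sketched as follows. The function name fod_uid is hypothetical. Splitting the example UID into machine type 5460 and serial number KQ5N05V is an assumption based on the four-digit machine type; use the values printed on your server.

```shell
#!/bin/bash
# Hypothetical helper: build the FoD search UID by concatenating the machine
# type and the system serial number, with no separator.
fod_uid() {
    local machine_type="$1" serial="$2"
    printf '%s%s\n' "$machine_type" "$serial"
}

# Usage sketch (values taken from the example in step 4):
#   fod_uid 5460 KQ5N05V
```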
To replace a Host system board on the N3001-001 system, follow these steps:
1. Read the safety information that begins on page v.
2. Log into the active host of the system as user root.
3. Record IMM information to be restored after the replacement. Type:
[root@nzhost1 ~]# cd /opt/nz-hwsupport/install_tools
[root@nzhost1 ~]# ./nz-rmgt.pl
Choose option 3: View existing information
--- Reloading info from remote management...
--- Network Enabled = Enabled
--- DHCP Client = Disabled
--- Hostname = IMM2-40f2e92d2e76
--- IP Address = 10.0.46.178
--- Subnet = 255.255.255.0
--- Gateway = 10.0.46.254
Make note of the information listed in the output.
Choose option 4: Exit
4. Save VPD to /nzscratch on other host server. On the host requiring the system board
replacement:
a. Change directory:
[root@nzhost1 ~]# cd /nz/export/tools/asu
c. Copy the VPD to the other host (assuming ha2 is the other host):
[root@nzhost1 ~]# scp /nzscratch/savedVPD.txt root@ha2:/nzscratch/savedVPD.txt
d. The VPD must be available for console access later in the procedure (step 17,
substep i), or printed if necessary.
5. Change to user nz:
[root@nzhost1 ~]# su - nz
6. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status
7. Check to see if the host drives have encryption Auto-Lock mode enabled. Type:
[nz@nzhost1 ~]$ nzhw show -type hostDisk
The Security column lists Enabled or Disabled. If Enabled, the drives are locked.
This information is required when you reach step 12. If Disabled, skip step 12, step
19, and step 26.
8. Check the state of the Netezza system:
[nz@nzhost1 ~]$ nzstate
System state is 'Online'.
9. If the system state is online, stop the system using the command:
[nz@nzhost1 ~]$ nzstop
12. If host disks are encryption Auto-Lock mode Disabled (from step 7), skip to step 13. If
host disks are encryption Auto-Lock mode Enabled (from step 7), extract the host
key(s):
a. Log in to the active host as user root.
b. Determine which host keys are stored:
[root@nzhost1 ~]# /nz/kit/bin/adm/nzkey list
Example output:
hostkey1
hostkey2
spuaek
hostkey1old
hostkey2old
Note: If only one key per host was generated, the hostkey1old and hostkey2old entries are
not listed.
1: Maintenance Management
2: Heartbeat Management
3: Exit
Select one:
Type 1.
Example output:
HA1:
drbd status = RUNNING
heartbeat status = NOT RUNNING
HA2:
drbd status = RUNNING
heartbeat status = NOT RUNNING
Select a host:
1: Move HA1 in/out of maintenance
2: Move HA2 in/out of maintenance
3: Return systems to cluster mode
4: Previous Menu
:
Type 1.
Example output:
You have selected ha1
Stopping nps resource . . .
Stopping Heartbeat on both ha2...
Stopping Heartbeat on both ha1...
Putting ha1 into maintenance mode...
Done
HA1:
drbd status = RUNNING
heartbeat status = NOT RUNNING
HA2:
drbd status = RUNNING
heartbeat status = NOT RUNNING
Resource status: Stopped
Current NPS state is Stopped
Splitbrain is not detected.ha1 appears to be maintenance mode
Select a host:
1: Move HA1 in/out of maintenance
2: Move HA2 in/out of maintenance
3: Return systems to cluster mode
4: Previous Menu
:
Select option 4:
What do you want to do?
1: Maintenance Management
2: Heartbeat Management
3: Exit
Select one:
Select option 3.
14. On the replacement system board, there are stickers with MAC addresses for all
network card ports and USB ports. You must replace the MAC addresses configured in
NPS for the failed system board with the MAC addresses assigned to the replacement
system board.
Using an editor, for example vi, change the following files on the host that requires the
system board replacement:
a. Type the command:
[nz@nzhost1 ~]$ vi /etc/udev/rules.d/70-persistent-net.rules
Replace the MAC address (value for field ATTR{address}=="xx:yy:zz:cc:dd:ee") for
eth0 with the value for MAC 1 from the replacement system board
Replace the MAC address (value for field ATTR{address}=="xx:yy:zz:cc:dd:ee") for
eth6 with the value for MAC 2 from the replacement system board
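As an alternative to editing the rules file by hand, the substitution can be previewed with sed before anything is changed. This is a sketch only: the function name set_rule_mac is hypothetical, it assumes each rule is a single line that contains both ATTR{address}=="..." and NAME="ethN", and it writes lowercase MAC addresses on the assumption that the existing rules use lowercase. It prints the modified rules to standard output and does not modify the file.

```shell
#!/bin/bash
# Hypothetical helper: print a copy of a udev rules file in which the MAC
# address of one interface is replaced.
#   $1 = interface name (for example eth0)
#   $2 = new MAC address from the replacement system board sticker
#   $3 = path to the rules file
# Returns 2 without printing anything if the MAC address is malformed.
set_rule_mac() {
    local ifname="$1" mac rules="$3"
    local re='^([0-9a-f]{2}:){5}[0-9a-f]{2}$'
    mac=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    [[ "$mac" =~ $re ]] || return 2
    sed -E "/NAME=\"$ifname\"/ s/ATTR\{address\}==\"[^\"]*\"/ATTR{address}==\"$mac\"/" "$rules"
}

# Usage sketch: review the result before copying it over the original.
#   f=/etc/udev/rules.d/70-persistent-net.rules
#   set_rule_mac eth0 "<MAC 1 from sticker>" "$f" > /tmp/net.rules.new
#   diff "$f" /tmp/net.rules.new
```

Keep a backup copy of the original rules file so that the change can be reverted.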
16. Replace the system board following the IBM replacement procedures in the IBM Prob-
lem Determination and Service Guide for the server.
17. The host server firmware must be updated (including critical updates from FDT Sup-
port Tools 2.0.0.1).
a. Insert the host-specific USB stick into the front USB port of the host that is being
configured. (Refer to the FDT Support Tools DVD for instructions on creating the
firmware update USB stick.)
b. Press F12 when the splash screen appears.
c. Select USB: Storage – USB Port#
d. Ignore the prompt to enter debug mode. Select f to select all entries when the selec-
tion appears on the screen.
e. Select a to accept the menu.
f. Answer Y to update ASU settings.
g. Type y when prompted to save logs.
The firmware and ASU settings are updated.
h. Interrupt the countdown before the host shuts down, which opens a command line
in debug mode.
i. Set the VPD values (UUID, serial number, and MTM data) to those saved from the
replaced system board:
cd asu
./asu64 set SYSTEM_PROD_DATA.SysInfoUUID <saved uuid_value>
./asu64 set SYSTEM_PROD_DATA.SysInfoSerialNum <saved s/n>
./asu64 set SYSTEM_PROD_DATA.SysEncloseAssetTag <3561-AAR, NZ12345>
The MTM for a restricted system is 3561-AAR; the MTM for a non-restricted system is
3561-AAJ. The serial number varies (NZ12345 is an example).
./asu64 save savedVPD.txt --group SYSTEM_PROD_DATA
cat savedVPD.txt <check data correctness>
reboot <host reboots twice>
18. As the host boots the second time, press F1 at the splash screen to enter the Unified
Extensible Firmware Interface (UEFI) menu, and restore IMM information recorded
prior to the replacement.
a. Select System Settings -> Integrated Management Module.
b. Then select Network Configuration -> Network Interface Port and set as Shared.
c. Select DHCP Control and set as Static IP.
d. Select IP Address and type the IMM IP address recorded in step 3 on page 8-2.
e. Select Subnet Mask and type the address recorded in step 3 on page 8-2.
f. Select Default Gateway and type the address recorded in step 3 on page 8-2.
g. Ensure that VLAN support is set to Disabled.
h. Select Save Network Settings.
i. Press Esc repeatedly to return to the System Configuration and Boot Management
screen.
19. If the host drives are Auto-Lock mode Disabled (as identified in step 7 on page 8-3),
skip to step 20. If the host drives are encryption Auto-Lock mode Enabled, as identified
in step 7 on page 8-3:
a. Select System Settings -> Storage -> LSI MegaRAID.... -> Controller Management
-> Manage Foreign Configuration -> Preview Foreign Configuration -> Enter Security
Key for Locked Drives -> Security Key
b. On the next screen, in the Security Key field, type the key obtained in step 12
substep d, then type the key again in the Confirm field.
The key must be typed correctly without errors, and you must record your entry (written or
photo) in the event that you need to change the key value later.
c. Select System Settings -> Storage -> LSI MegaRAID.... -> Controller Management
-> Manage Foreign Configuration -> Preview Foreign Configuration -> Import
Foreign Configuration
d. Confirm and select yes and press Enter.
e. At the message The operation has been performed successfully, select OK and press
Enter.
20. Restore the FoD keys:
Note: FoD keys for both RAID6 and IMM2 update need to be installed.
a. Connect the SSR’s laptop computer to the switch that provides network connectivity
to IMM/LOM shared port (management network)
b. Connect the USB flash drive with the FoD keys to the laptop.
c. From the laptop’s browser, navigate to the IMM IP address recorded in step 3 on
page 8-2.
d. Click Connecting to this WEB site not recommended. Ignore the warning.
e. Type the user name USERID and password PASSW0RD (use a zero instead of the
letter O).
f. Navigate to IMM Management -> Activation Key Management.
g. Click Add.
h. Select File and browse for the FoD keys from the USB flash drive attached to the
laptop. Click Close.
i. The keys are now listed in Activation Key Management with a status of Activation
key is valid. This confirms that the keys are applied to the system. Find the correct
serial number of the intended Host and click OK.
j. Log out and remove the laptop from the Host system.
k. Click Close and Log Out when finished.
21. From the UEFI, press Esc to exit and then type Y to save the changes. The host boots.
22. Restore the LOM addresses:
a. Type the following commands and note the outputs relating to the LOM addresses
for HA1 and HA2:
[root@nzhost1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0 | grep IPADDR
[root@nzhost1 ~]# ssh ha2 cat /etc/sysconfig/network-scripts/ifcfg-eth0 | grep IPADDR
b. Type the following command and note the IMM1 and IMM2 entries:
[root@nzhost1 ~]# cat /etc/hosts |grep imm
Example output:
–- 10.0.46.178 imm1
–- 10.0.46.180 imm2
c. Type the following command:
[root@nzhost1 ~]# /nzlocal/scripts/ipminetcfg
Example output:
--- Configure IMM network (this will restart network service) [y/n]?
Type y and press Enter.
--- Enter IP address for IMM on HA1 :
Type the imm1 address noted in substep b and press Enter.
--- Re-enter IP address for IMM on HA1 []:
Retype the imm1 address and press Enter.
--- Enter LOM address on HA1 :
Type the HA1 LOM address noted in substep a and press Enter.
--- Re-enter LOM address on HA1 [] :
Retype the HA1 LOM address and press Enter.
--- Enter IP address for IMM on HA2 :
Type the imm2 address noted in substep b and press Enter.
--- Re-enter IP address for IMM on HA2 [] :
Retype the imm2 address and press Enter.
--- Enter LOM address on HA2 :
Type the HA2 LOM address noted in substep a and press Enter.
--- Re-enter LOM address on HA2 []:
Retype the HA2 LOM address and press Enter.
--- Enter IMM network gateway :
Type the gateway address recorded in step 3 and press Enter.
--- Re-enter IMM network gateway []:
Retype the gateway address and press Enter.
--- Enter IMM network mask :
Type the subnet mask recorded in step 3 and press Enter.
--- Re-enter IMM network mask []:
Retype the subnet mask and press Enter.
--- Configuring IMM network, it may take some time
IMM network configuration completed
--- Update IMM usernames and passwords (this will change user 2
parameters on IMMs) [y/n]?
Type y and press Enter.
--- Enter username for IMMs [enter for default]:
Press Enter to set the default IMM user name, which is 'USERID'.
--- Enter user password for IMMs [enter for default]:
Press Enter to set the default password, which is 'PASSW0RD' (with a zero, not the letter O).
Note: The IMM passwords are reverted to the defaults on both hosts, and the stonith
configuration is updated to reflect these changes. Update the passwords at the
earliest opportunity to keep the system secure. These passwords can be updated
only by using the /nzlocal/scripts/ipminetcfg script.
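The addresses requested by ipminetcfg come from the ifcfg-eth0 files and from /etc/hosts, as noted in substeps a and b. The following is a minimal sketch for collecting them in one pass. The helper names get_ipaddr and get_imm_addr are hypothetical and are not part of the Netezza tooling; the sketch assumes IPADDR=value lines in the ifcfg file and "address name" lines in the hosts file.

```shell
# Hypothetical helpers, not part of the Netezza tooling.

# Print the IPADDR value from the ifcfg-style file given as $1.
get_ipaddr() {
    sed -n 's/^IPADDR=//p' "$1" | tr -d '"'
}

# Print the address mapped to host name $2 (for example imm1)
# in the hosts-style file given as $1. Comment lines are skipped.
get_imm_addr() {
    awk -v name="$2" '$1 !~ /^#/ {
        for (i = 2; i <= NF; i++) if ($i == name) print $1
    }' "$1"
}

# Example usage on HA1, with the paths used in this procedure:
#   get_ipaddr /etc/sysconfig/network-scripts/ifcfg-eth0
#   get_imm_addr /etc/hosts imm1
#   get_imm_addr /etc/hosts imm2
```

Writing the values down before starting ipminetcfg still applies, because the script restarts the network service.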
1: Maintenance Management
2: Heartbeat Management
3: Exit
Select one :
Type 1.
Example output:
HA1:
drbd status = RUNNING
heartbeat status = NOT RUNNING
HA2:
drbd status = RUNNING
heartbeat status = NOT RUNNING
Select a host:
1: Move HA1 in/out of maintenance
2: Move HA2 in/out of maintenance
3: Return systems to cluster mode
4: Previous Menu
:
Type 4 to go back to the previous menu.
Type 3 to exit.
26. If the host drives have encryption Auto-Lock mode disabled (as identified in step 7 on
page 8-3), skip to step 27. If the host drives have encryption Auto-Lock mode enabled (as
identified in step 7 on page 8-3):
a. As user nz, stop the system using the command:
[nz@nzhost1 ~]$ nzstop
b. The virtual drive may need to be secured. To check whether it is secured, as user root,
issue the following storcli command:
[root@nzhost1 ~]# /opt/MegaRAID/storcli/storcli64 /c0/vall show all
Example output:
Controller = 0
Status = Success
Description = None
/c0/v0 :
======
-----------------------------------------------------------
DG/VD TYPE State Access Consist Cache sCC Size Name
-----------------------------------------------------------
0/0 RAID10 Optl RW Yes NRAWBC - 1.089 TB
-----------------------------------------------------------
.
.
.
Span Depth = 2
Number of Drives Per Span = 2
Write Cache(initial setting) = WriteBack
Disk Cache Policy = Disabled
Encryption = FDE
Data Protection = Disabled
Active Operations = None
Exposed to OS = Yes
Creation Date = 04-09-2014
Creation Time = 03:48:53 PM
Emulation type = None
d. For all systems with Auto-Lock mode Enabled, issue the command:
[root@nzhost1 ~]# /nz/kit/bin/adm/nzkey resume
Note: If this command fails, contact IBM Netezza Support.
27. Type the following command on the host requiring the system board replacement:
[root@nzhost1 ~]# chkconfig heartbeat on
28. If Call Home was previously disabled, as user nz, enable it.
[nz@nzhost1 ~]$ nzcallhome -on
29. As user root, run sysrevcheck to verify that the system is configured correctly.
Change directory to:
[root@nzhost ~]# cd /opt/nz/fdt
Run the command:
[root@nzhost ~]# ./sys_rev_check
If issues are noted in the output, resolve the issues as described in the FDT User’s
Guide, in the section “Resolve sys_rev_check Issues,” and then rerun sysrevcheck to
verify that issues are resolved.
30. Type the following and press Enter:
[root@nzhost1 ~]# crm_mon -i5
Result: When the cluster manager comes up and is ready, status appears as follows.
Make sure that nzinit has started before you proceed. (This could take a few minutes.)
Node: nps61074 (e890696b-ab7b-42c0-9e91-4c1cdacbe3f9): online
Node: nps61068 (72043b2e-9217-4666-be6f-79923aef2958): online
33. The system may require up to 10 minutes to come online. Verify that the system state
is online using the following command until it returns the "Online" status:
[nz@nzhost1 ~]$ nzstate
System state is 'Online'
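Repeating nzstate by hand until the system reports Online can be sketched as a small polling loop. This is an illustrative sketch only: the function name wait_for_online, the STATE_CMD variable, and the 15-second polling interval are assumptions. The 600-second default reflects the 10 minutes stated in the step above.

```shell
# Poll a state command until its output contains "Online" or a timeout expires.
# STATE_CMD defaults to nzstate; override it to test the loop without a system.
wait_for_online() {
    timeout_s=${1:-600}    # up to 10 minutes, per the procedure
    interval_s=${2:-15}    # polling interval (an assumed value)
    elapsed=0
    while [ "$elapsed" -le "$timeout_s" ]; do
        if ${STATE_CMD:-nzstate} 2>/dev/null | grep -q "Online"; then
            return 0       # system reported Online
        fi
        sleep "$interval_s"
        elapsed=$((elapsed + interval_s))
    done
    return 1               # timed out without seeing Online
}

# Example usage as user nz:
#   wait_for_online && echo "System is online" || echo "Still not online"
```

A nonzero return only means that the state was not seen within the timeout; check the system before retrying.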
If you are replacing networking components in the host in addition to the system board, you
must replace just one component at a time, completing each procedure before continuing
to the next component. Otherwise, it is difficult to determine which MAC address is
assigned to which port.
The estimated time to perform this procedure is from 60 to 180 minutes, depending on
ease of access to the system and familiarity with NPS and the Netezza system.
Note: The Host Server firmware must be updated as part of this procedure. You must have
bootable media available for the firmware update. FDT Support Tools 2.0.0.1 provides tools
and instructions for creating bootable USB drives and includes the latest critical host
firmware updates.
There are two host types for the IBM PureData System for Analytics N3001:
x3650M4 (which has four Ethernet ports that must be configured: eth6, eth7, eth0,
and eth1).
x3750M4 (which has no Ethernet ports to configure).
The x3650M4 (N3001-002 and -005) host system board uses a Feature on Demand (FoD)
key for remote access. To restore the FoD keys, you must use a laptop computer to retrieve
the keys from the IBM Features on Demand website
(https://fod2.lenovo.com/lkms/angular/app/pages/index.htm).
1. Log into the FoD website: https://fod2.lenovo.com/lkms/angular/app/pages/index.htm.
You need to have or create an IBM id for access.
2. Click on Retrieve history.
3. In the Search type dropdown, select Search history via UID.
4. In the Search value field, you must specify the server UID, which is a concatenation of
the machine type and system serial number (for example, 8722KQ5N05V).
5. Click Continue.
6. Select all active keys and press Download to save the key(s) in a location to be used
later.
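The server UID in step 4 is a plain concatenation. As a small sketch, the hypothetical helper fod_uid below joins a machine type and a serial number. Splitting the example value 8722KQ5N05V into machine type 8722 and serial number KQ5N05V is an assumption made here for illustration.

```shell
# Build the FoD server UID by concatenating machine type ($1) and serial number ($2).
# fod_uid is a hypothetical helper name used only for this illustration.
fod_uid() {
    printf '%s%s\n' "$1" "$2"
}

# Example usage, assuming machine type 8722 and serial number KQ5N05V:
#   fod_uid 8722 KQ5N05V
```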
To replace a Host system board on the N3001 system, follow these steps:
1. Read the safety information that begins on page v.
2. Log into the active host of the system as user root.
3. On the host requiring the system board replacement, record IMM information to be
restored after the replacement. Type:
[root@nzhost1 ~]# cd /opt/nz-hwsupport/install_tools
[root@nzhost1 ~]# ./nz-rmgt.pl
Choose option 3: View existing information
--- Reloading info from remote management...
--- Network Enabled = Enabled
--- DHCP Client = Disabled
--- Hostname = IMM2-40f2e92d2e76
--- IP Address = 10.0.46.178
--- Subnet = 255.255.255.0
--- Gateway = 10.0.46.254
Make note of the information listed in the output.
Choose option 4: Exit
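The values shown by option 3 must be re-entered after the board is replaced, so it can help to keep them in a file on the other host. The sketch below assumes the output has been pasted into a text file and that each line has the form "--- Name = Value", as in the example above. The function name parse_imm_info is hypothetical.

```shell
# Convert "--- Name = Value" lines read from stdin into "Name=Value" lines.
# Lines without "=" (such as the "Reloading info" banner) are dropped.
parse_imm_info() {
    sed -n 's/^--- *\([^=]*[^= ]\) *= *\(.*\)$/\1=\2/p'
}

# Example usage, assuming the menu output was pasted into imm_raw.txt:
#   parse_imm_info < imm_raw.txt > /nzscratch/imm_info.txt
```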
4. Save VPD to /nzscratch on other host server. On the host requiring the system board
replacement:
a. Change directory:
[root@nzhost1 ~]# cd /nz/export/tools/asu
c. Copy the VPD to the other host (assuming ha2 is the other host):
[root@nzhost1 ~]# scp /nzscratch/savedVPD.txt root@ha2:/nzscratch/
savedVPD.txt
6. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status
7. Check to see if the host drives have encryption Auto-Lock mode enabled. Type:
[nz@nzhost1 ~]$ nzhw show -type hostDisk
The Security column lists Enabled or Disabled. If Enabled, the drives are locked.
This information is required when you reach step 12.
8. Check the state of the Netezza system:
[nz@nzhost1 ~]$ nzstate
System state is 'Online'.
9. If the system state is online, stop the system using the command:
[nz@nzhost1 ~]$ nzstop
10. Wait for the system to stop using the command:
[nz@nzhost1 ~]$ nzstate
System state is 'Stopped'.
11. When the system is stopped, exit from the nz session to return to user root:
[nz@nzhost1 ~]$ exit
12. If the host disks have encryption Auto-Lock mode disabled (from step 7), skip to step 13.
If the host disks have encryption Auto-Lock mode enabled (from step 7), extract the host
key(s):
a. Log in to the active host as user root.
b. Determine which host keys are stored:
[root@nzhost1 ~]# /nz/kit/bin/adm/nzkey list
Example output:
hostkey1
hostkey2
hostkey1old
hostkey2old
Note: If only one key per host was generated, the hostkey<n>old files are not listed.
14. Type the following command on the host requiring the system board replacement:
[root@nzhost ~]# chkconfig heartbeat off
[root@nzhost ~]# shutdown -h now
Note: If the host is in a state where the above step is not possible, power off the host
by holding in the power button.
15. Replace the system board following the IBM replacement procedures in the IBM
Problem Determination and Service Guide for the server.
16. Boot the host and login as root.
17. Put the system in non-heartbeat mode:
[root@nzhost1 ~]# /nzlocal/scripts/nz.non-heartbeat.sh
18. Set the VPD values (UUID, serial number, and MTM data) to those saved from the
original system board in step 4:
[root@nzhost1 ~]# cat /nzscratch/savedVPD.txt
[root@nzhost1 ~]# cd /nz/export/tools/asu
[root@nzhost1 ~]# ./asu set SYSTEM_PROD_DATA.SysInfoUUID <uuid>
[root@nzhost1 ~]# ./asu set SYSTEM_PROD_DATA.SysInfoSerialNum <s/n>
[root@nzhost1 ~]# ./asu set SYSTEM_PROD_DATA.SysEncloseAssetTag <MT,serial num>
For example:
[root@nzhost1 ~]# ./asu set SYSTEM_PROD_DATA.SysInfoUUID 43130A38E4C511E3BB3640F2E9301638
[root@nzhost1 ~]# ./asu set SYSTEM_PROD_DATA.SysInfoSerialNum 06BN989
[root@nzhost1 ~]# ./asu set SYSTEM_PROD_DATA.SysEncloseAssetTag 3567-EEP,NZ35086
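To reduce typing errors, the three asu set commands can be generated from the saved file and reviewed before they are run. This is a sketch under a stated assumption: it expects /nzscratch/savedVPD.txt to contain lines of the form SYSTEM_PROD_DATA.<setting>=<value>. Check the actual file contents first. The function name gen_asu_cmds is hypothetical, and the function only prints commands; it does not run them.

```shell
# Print the asu set commands for the three VPD settings used in this step.
# $1 is the saved VPD file; SETTING=value lines are assumed.
gen_asu_cmds() {
    for key in SysInfoUUID SysInfoSerialNum SysEncloseAssetTag; do
        # Take the first matching value for this setting, if any.
        val=$(sed -n "s/^SYSTEM_PROD_DATA\.$key=//p" "$1" | head -n 1)
        if [ -n "$val" ]; then
            printf './asu set SYSTEM_PROD_DATA.%s %s\n' "$key" "$val"
        else
            printf '# %s not found in %s\n' "$key" "$1" >&2
        fi
    done
}

# Example usage:
#   gen_asu_cmds /nzscratch/savedVPD.txt
```

Compare each printed command with the values in savedVPD.txt before running it from /nz/export/tools/asu.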
20. Reboot the host server and update the firmware using the critical host firmware
updates from FDT Support Tools 2.0.0.1 (see the README for FDT Support Tools
2.0.0.1 for instructions to create bootable media):
a. Reboot the host:
[root@nzhost1 ~]# reboot
b. Insert the host-specific USB stick into the front USB port of the host that is being
configured. (Refer to the FDT Support Tools media for instructions on creating the
firmware update USB stick.)
c. Press F12 when the splash screen appears.
d. Select USB: Storage – USB Port#
e. Ignore the prompt to enter debug mode. Select f to select all entries when the
selection appears on the screen.
f. Select a to accept the menu.
g. Answer Y to update ASU settings.
h. Type y when prompted to save logs.
The firmware and ASU settings are updated, and the host reboots twice.
21. If the host drives are encryption Auto-Lock mode Disabled (as identified in step 7 on
page 9-2) skip to step 22. If the host drives are encryption Auto-Lock mode Enabled,
as identified in step 7 on page 9-2:
a. As the host boots the second time, press F1 at the splash screen to enter the
Unified Extensible Firmware Interface (UEFI) menu.
b. Select System Settings -> Storage -> LSI MegaRAID.... -> Controller Management ->
Advanced -> Enable Drive Security
c. Local Key Management (LKM) must be selected, then select OK.
d. On the next screen, in the Security Key field, type the key obtained in step 12 substep d,
then type the key again in the Confirm field.
The key must be typed correctly without errors, and you must record your entry (written or
photo) in the event that you need to change the key value later.
e. Deselect Pause for Password at Boot by pressing space bar. Verify that it is not
selected.
f. After recording your Security Key entry with written documentation or photo, select
I Recorded the Security Settings for Future Reference (as described in substep d)
using the space bar.
g. Select Enable Drive Security by pressing Enter.
h. Confirm and select yes and press Enter.
i. At the message The operation has been performed successfully, select OK and press
Enter.
j. Press Esc to exit the UEFI and then type Y to save the changes. The host boots.
k. If the system does not boot and instead reboots continuously, the key entered in
substep d is incorrect. While the host is booting, press F1 at the splash screen, go to
System Settings, and restart from substep b. (Select Change security key -> Change security
key settings, keeping Local Key Management. In the Existing Key field, type
the key saved in substep d; then, in the New Key and Confirm fields, type the correct
key. Again, record the key as typed. Continue at substep e.)
If you believe that you entered the correct key and the host still does not boot, contact IBM
Netezza Support for further help.
l. Once the system boots, the virtual drive may need to be secured. To check whether it is
secured, issue the following storcli command:
[root@nzhost1 ~]# /opt/MegaRAID/storcli/storcli64 /c0/vall show all
Example output:
Controller = 0
Status = Success
Description = None
/c0/v0 :
======
-----------------------------------------------------------
DG/VD TYPE State Access Consist Cache sCC Size Name
-----------------------------------------------------------
0/0 RAID10 Optl RW Yes NRAWBC - 1.089 TB
-----------------------------------------------------------
.
.
.
Span Depth = 2
Number of Drives Per Span = 2
Write Cache(initial setting) = WriteBack
Disk Cache Policy = Disabled
Encryption = FDE
Data Protection = Disabled
Active Operations = None
Exposed to OS = Yes
Creation Date = 04-09-2014
Creation Time = 03:48:53 PM
Emulation type = None
Locate the listing for usb0 and make note of the HWaddr address listed for use later
in the procedure.
Locate the entries for __tempxxx.
There are four entries.
The following table shows how the __tempxxx values are applied to ethx values.
Note: If there are no __tempxxx values listed, look for new Ethernet ports, such as
eth14, eth15, eth16, and eth17.
Table 9-1: Determining ethx values
x3650-M4
Use the MAC address associated with the lowest numbered __tempxxx value as the
MAC address for eth6, the next higher value as the MAC address for eth7, and
continue with the next higher values.
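The ordering rule in Table 9-1 can be sketched as a sort followed by a fixed name assignment. The sketch below rests on two assumptions: the __tempxxx suffix is a decimal number, and the third and fourth entries map to eth0 and eth1, following the port order given for the x3650M4 earlier in this chapter (eth6, eth7, eth0, eth1). The function name map_temp_ports is hypothetical.

```shell
# Read "<__tempNNN> <MAC>" lines on stdin and print "<ethX> <MAC>" lines.
# The lowest numbered __temp entry becomes eth6, the next eth7,
# then eth0 and eth1 (the last two are an assumed continuation).
map_temp_ports() {
    # Reduce each name to its numeric suffix, then sort numerically.
    sed 's/^[^0-9]*\([0-9][0-9]*\) /\1 /' | sort -n | awk '
        BEGIN { split("eth6 eth7 eth0 eth1", names, " ") }
        NR <= 4 { print names[NR], $2 }'
}

# Example usage, with name and HWaddr pairs copied from the ifconfig -a output:
#   printf '%s\n' "__temp200 <MAC2>" "__temp100 <MAC1>" | map_temp_ports
```

Treat the printed mapping as a cross-check against Table 9-1 rather than a replacement for it.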
b. Type the following commands on the host with the system board replacement:
[root@nzhost1 ~]# service network stop
[root@nzhost1 ~]# cd /etc/sysconfig/network-scripts
[root@nzhost1 ~]# vi ifcfg-ethx
Edit the value for HWADDR, using the values for MAC address from Table 9-1, then
save and close. For example:
# Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet
DEVICE=eth0
BOOTPROTO=dhcp
DHCPCLASS=
HWADDR=5C:F3:FC:7A:97:98
ONBOOT=no
HOTPLUG=no
DHCP_HOSTNAME=netezza
Repeat this step for each ethx value associated with the system board.
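As an alternative to editing each file in vi, the HWADDR line can be rewritten with sed. The following is a minimal sketch, not part of the Netezza tooling: the function name set_hwaddr and the default backup directory /tmp are assumptions. It checks the MAC address format first and keeps a copy of the original file.

```shell
# Replace the HWADDR line in an ifcfg file.
#   $1 = ifcfg file, $2 = new MAC address, $3 = backup directory (default /tmp)
set_hwaddr() {
    file=$1
    mac=$2
    bakdir=${3:-/tmp}
    # Reject anything that is not six colon-separated hex pairs.
    printf '%s\n' "$mac" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$' || return 1
    bak="$bakdir/$(basename "$file").bak"
    cp -p "$file" "$bak" || return 1
    # Rewrite the file in place from the backup copy.
    sed "s/^HWADDR=.*/HWADDR=$mac/" "$bak" > "$file"
}

# Example usage:
#   set_hwaddr /etc/sysconfig/network-scripts/ifcfg-eth0 5C:F3:FC:7A:97:98
```

The backup is written outside the network-scripts directory so that it is not mistaken for an interface configuration file.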
c. Type the following command to edit the ifcfg-usb0 file, and edit the HWADDR value
with the usb0 MAC address recorded in step 22:
[root@nzhost1 ~]# vi ifcfg-usb0
e. Type the following commands on the host with the system board replacement:
[root@nzhost1 ~]# ifconfig -a | less
Locate the entries for the ports just configured. Confirm that the values are changed
according to the edits made in step b through step c.
f. Also from the output in the previous step, locate the IP addresses for the ports, and
ping those addresses:
[root@nzhost1 ~]# ping www.xxx.yyy.zzz (confirm port is live)
[root@nzhost1 ~]# ping aaa.bbb.ccc.ddd (confirm port is live)
g. Delete the file /etc/udev/rules.d/70-persistent-net.rules.
23. Restore IMM information recorded prior to the replacement. Type:
[root@nzhost1 ~]# cd /opt/nz-hwsupport/install_tools
[root@nzhost1 ~]# ./nz-rmgt.pl
Choose option 1.
Enter the information recorded earlier.
Choose option 4: Exit
24. After the host has rebooted, type the following command on the active host:
[root@nzhost1 ~]# /nzlocal/scripts/drbd_config.sh --config-only
25. Type the following commands on both hosts:
[root@nzhost1 ~]# service drbd start
[root@nzhost1 ~]# chkconfig heartbeat on
[root@nzhost1 ~]# service heartbeat start
b. As user root:
[root@nzhost1 ~]# /nz/kit/bin/adm/nzkey resume
Note: If this command fails, contact IBM Netezza Support.
31. As user root, run sysrevcheck to verify that the system is configured correctly.
Change directory to:
[root@nzhost ~]# cd /opt/nz/fdt
Run the command:
[root@nzhost ~]# ./sys_rev_check host
If issues are noted in the output, resolve the issues as described in the FDT User’s
Guide, in the section “Resolve sys_rev_check Issues,” and then rerun sysrevcheck to
verify that issues are resolved.
The estimated time to perform this procedure is from 60 to 90 minutes, depending on ease
of access to the system and familiarity with NPS and the Netezza system.
Media for FDT Support Tools 2.0.0.1 (or later) is required to complete this replacement.
It is recommended, but not required, that both hosts be updated with the latest RAID
firmware.
4. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status
6. If the system state is online, stop the system using the command:
[nz@nzhost1 ~]$ nzstop
c. Copy the FDT Support Tools 2.0.0.1 package to the /nzscratch/FDT_ST directory
[root@nzhost1 ~]# cp nz-FDTSupport-2.0.0.1.tar.gz /nzscratch/FDT_ST
d. Untar the package:
[root@nzhost1 ~]# cd /nzscratch/FDT_ST
[root@nzhost1 ~]# tar xzvf nz-FDTSupport-2.0.0.1.tar.gz
e. Untar the script utilities package:
[root@nzhost1 ~]# tar xvf script-utils-tar.gz
Complete.
20. If Call Home was previously disabled, as user nz, enable it.
[nz@nzhost1 ~]$ nzcallhome -on
21. As user root, run sysrevcheck to verify that the system is configured correctly.
Change directory to:
[root@nzhost ~]# cd /opt/nz/fdt
Run the command:
[root@nzhost ~]# ./sys_rev_check
If issues are noted in the output, resolve the issues as described in the FDT User’s
Guide, in the section “Resolve sys_rev_check Issues,” and then rerun sysrevcheck to
verify that issues are resolved.
If you are replacing networking components in the host in addition to the NIC, you must
replace just one component at a time, completing each procedure before continuing to
the next component. Otherwise, it is difficult to determine which MAC address is assigned
to which port.
The estimated time to perform this procedure is from 60 to 90 minutes, depending on ease
of access to the system and familiarity with NPS and the Netezza system.
To replace a Host NIC on the N3001 system, follow these steps:
Note: This procedure requires that NPS is running on Host 1 (ha1).
4. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status
6. If the system state is online, stop the system using the command:
[nz@nzhost1 ~]$ nzstop
7. Wait for the system to stop using the command:
[nz@nzhost1 ~]$ nzstate
System state is 'Stopped'.
1: Maintenance Management
2: Heartbeat Management
3: Exit
Select one :
Type 1.
Example output:
HA1:
drbd status = RUNNING
heartbeat status = RUNNING
HA2:
drbd status = RUNNING
heartbeat status = RUNNING
Select a host:
1: Move HA1 in/out of maintenance
2: Move HA2 in/out of maintenance
3: Return systems to cluster mode
4: Previous Menu
:
Type 1.
Example output:
You have selected ha1
Stopping nps resource...
Stopping Heartbeat on both ha2...
Stopping Heartbeat on both ha1...
Using an editor, for example vi, change the following files on the host that requires the
NIC replacement:
a. Type the command:
[nz@nzhost1 ~]$ vi /etc/udev/rules.d/70-persistent-net.rules
Replace MAC address (value for field ATTR{address}=="xx:yy:zz:cc:dd:ee") for ethX
with MAC 1 from the new NIC.
Replace MAC address (value for field ATTR{address}=="xx:yy:zz:cc:dd:ee") for ethY
with MAC 2 from the new NIC (2nd port has MAC address incremented by 1).
If the replacement NIC is a 4-port card, repeat the previous edits for the next two
ports.
For N3001-001 systems, remove the UUID from the file.
Save and exit the file.
b. Type the command:
[nz@nzhost1 ~]$ vi /etc/sysconfig/network-scripts/ifcfg-ethx
Replace MAC address (value for attribute HWADDR=) for ethX with MAC from the
new NIC.
For N3001-001 systems, remove the UUID from the file.
Save and exit the file.
c. Type the command:
[nz@nzhost1 ~]$ vi /etc/sysconfig/network-scripts/ifcfg-ethy
Replace MAC address (value for attribute HWADDR=) for ethY with MAC2 from the
new NIC (2nd port has MAC address incremented by 1).
Save and exit the file.
d. If the replacement NIC is a 4-port card, repeat step b and step c for the next two
ports.
e. Run the following command to stop the operating system:
[root@nzhost1 ~]# shutdown -h now
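The edits above rely on the second port of the new NIC having a MAC address one higher than the first. A minimal sketch of that arithmetic follows; the function name next_mac is hypothetical. Always confirm the result against the label on the replacement card.

```shell
# Print the MAC address that follows $1 (the address incremented by 1).
next_mac() {
    # Reject anything that is not six colon-separated hex pairs.
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$' || return 1
    hex=$(printf '%s' "$1" | tr -d ':')
    # Add 1 and keep the result within 48 bits.
    n=$(( (0x$hex + 1) & 0xFFFFFFFFFFFF ))
    # Format as 12 uppercase hex digits, then reinsert the colons.
    printf '%012X' "$n" | sed 's/../&:/g; s/:$//'
    printf '\n'
}

# Example usage:
#   next_mac 5C:F3:FC:7A:97:98
```

For a 4-port card, calling the function again on each result gives the following addresses, assuming the card numbers its ports consecutively.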
11. Replace the NIC following the IBM replacement procedures in the IBM Problem
Determination and Service Guide for the server.
12. Reboot the host after completing the replacement.
13. The host server firmware must be updated as described in the FDT 4.2 User’s Guide
(for N3001-002 and larger systems) or the FDT 4.2.1 User’s Guide (for N3001-001).
14. From HA1, put the system into non-heartbeat mode:
[root@nzhost1 ~]# /nzlocal/scripts/nz.non-heartbeat.sh
1: Maintenance Management
2: Heartbeat Management
3: Exit
Select one :
Type 1.
Example output:
HA1:
drbd status = RUNNING
heartbeat status = NOT RUNNING
HA2:
drbd status = RUNNING
heartbeat status = NOT RUNNING
Select a host:
1: Move HA1 in/out of maintenance
2: Move HA2 in/out of maintenance
3: Return systems to cluster mode
4: Previous Menu
:
Type 4 to go back to the previous menu.
Type 3 to exit.
17. For N3001-002 and larger only. (This step is optional but recommended. If performed,
it is recommended to repeat the step so as to make HA1 the active host.)
Relocate the active system software to ensure that both hosts successfully failover.
From ha1, type the following command:
[root@nzhost1 ~]# /nzlocal/scripts/heartbeat_admin.sh --migrate
Example output:
Migrating the NPS resource group from <current active host> to
<current standby host>.....
and then, after a few minutes:
Complete.
18. If Call Home was previously disabled, as user nz, enable it.
[nz@nzhost1 ~]$ nzcallhome -on
19. As user root, run sysrevcheck to verify that the system is configured correctly.
Change directory to:
[root@nzhost ~]# cd /opt/nz/fdt
Run the command:
[root@nzhost ~]# ./sys_rev_check
If issues are noted in the output, resolve the issues as described in the FDT User’s
Guide, in the section “Resolve sys_rev_check Issues,” and then rerun sysrevcheck to
verify that issues are resolved.
The estimated time to perform this procedure is from 60 to 90 minutes, depending on ease
of access to the system and familiarity with NPS and the Netezza system.
The FRU number for the SAS HBA is 47C8676.
To replace a SAS HBA on the N3001-001 system, follow these steps:
Note: This procedure requires that NPS is running on Host 1 (ha1).
4. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status
6. If the system state is online, stop the system using the command:
[nz@nzhost1 ~]$ nzstop
10. Type the following command on the host requiring the SAS HBA replacement:
[root@nzhost1 ~]# chkconfig heartbeat off
[root@nzhost1 ~]# shutdown -h now
11. Replace the SAS HBA following the IBM replacement procedures in the IBM Problem
Determination and Service Guide for the server.
12. Reboot the host after completing the replacement.
13. The host server firmware must be updated as described in the FDT 4.2.1 User’s Guide.
14. Type the following commands on both hosts:
[root@nzhost1 ~]# service drbd start
[root@nzhost1 ~]# chkconfig heartbeat on
[root@nzhost1 ~]# service heartbeat start
15. Type the following command on the active host:
[root@nzhost1 ~]# crm_mon -i3
Wait until both hosts go online and nps resource group comes up.
Seek help if there are any errors, or if entries in nps resource group are not all started
after 5 minutes.
16. As user root, type the following commands:
[root@nzhost ~]# cd /opt/nz/fdt
[root@nzhost ~]# ./system_diags datapathcheck
Note: Do not interrupt this operation (such as with Ctrl-C).
18. As user root, run sysrevcheck to verify that the system is configured correctly.
Change directory to:
[root@nzhost ~]# cd /opt/nz/fdt
Run the command:
[root@nzhost ~]# ./sys_rev_check
If issues are noted in the output, resolve the issues as described in the FDT User’s
Guide, in the section “Resolve sys_rev_check Issues,” and then rerun sysrevcheck to
verify that issues are resolved.
Before you begin the disk replacement process, make certain that you have a replacement
disk that conforms to the hardware models supported for the N3001 system. The N3001
system uses Self-Encrypting Drives (SEDs). Typically, you will use a new replacement disk.
Also before beginning the replacement procedure, verify that there is a problem with the
disk drive. Consult the Problem Determination and Service Guide for the disk enclosure or
host server for more information on disk replacement.
Note: Prior to FDT 4.3, only IBM branded drives labeled with the correct FRU number are
supported. If a Lenovo branded drive is used as a replacement, it is not possible to update
the firmware on the drive or the host server using standard firmware update procedures.
Firmware on host components may be individually updated as needed until an approved
drive is procured or FDT 4.3 is installed. As of FDT 4.3, Lenovo branded drives are allowed
in the system (sys_rev_check shows the drives as [INFO]), but firmware in the drives cannot
be updated using FDT.
A failed disk must be replaced while the system is online. The system cannot be offline
when replacing a failed disk.
The estimated time to perform this procedure is up to one hour (for the disk to fully mirror).
The N3001 system currently supports two SED models:
600GB Model ST600MM0026E - FRU number 90Y8909
600GB Model HUC101860CSS20E - FRU number 90Y8909
e. Verify that the status is listed as UpToDate/UpToDate. If not, contact IBM Netezza
Support.
Example output:
6. Identify the failed disk that requires replacement. Look at the Role column:
[nz@nzhost ~]$ nzhw -issues
Example output:
Description   HW ID Location                   Role   State   Security
------------- ----- -------------------------- ------ ------- --------
HostDisk      1020  rack1.host1.hostDisk2      Failed Down    Disabled
SASController 1026  rack1.host1.SASController0 Active Warning N/A
Note: When a host disk fails, a SASController warning may accompany the disk failure.
This warning clears after the disk is replaced.
8. Remove the problem disk (identified by the amber LED) from the host server.
When removing a drive, pull it only half way out of the slot and wait 30 seconds for the disk
to spin down before fully removing the disk drive.
[Figure: host disk drive, showing the latch and the tray handle. Pull the drive out
halfway, wait 30 seconds for the disk to spin down, and only then remove the drive
completely.]
9. Mark the failed disk drive in a non-harmful way to ensure that the correct disk will be
replaced in a later step.
10. Delete the HW ID of the failed disk:
[root@nzhost1 ~]# nzhw delete -id 1020
Are you sure you want to proceed (y|n)? [n] y
11. Install the replacement disk into the same slot from which the failed disk was removed.
Failure to replace a hard disk drive in its correct bay might result in loss of data. If you are
replacing a hard disk drive that is part of a configured array and logical drive, be sure to
install the replacement hard disk drive in the correct bay. See the hardware and software
documentation that applies to the host server to determine whether there are restrictions
regarding hard disk drive configurations.
Never swap a drive when its associated green activity LED is flashing. Swap a drive only
when its associated amber LED is blinking.
If the disk status does not change to Assigned/Ok (or Failed/Warning), use the
mega_check.pl tool. As user root:
[root@nzhost ~]# /opt/nz-hwsupport/hts/mega_check.pl -r
Choose option 5 Manually start a Copyback process and provide the following answers to the
prompts:
For N3001-002 and larger systems:
There is no foreign configuration on controller 0
Exit Code: 0x00
Slot number of drive to copy data from: 4 (original configuration hot spare)
Slot number of drive to copy data to: n
Where n is the chassis slot number of the replaced drive (0 through 3).
For N3001-001:
There is no foreign configuration on controller 0
Exit Code: 0x00
Slot number of drive to copy data from: 7 (original configuration hot spare)
Slot number of drive to copy data to: n
Where n is the chassis slot number of the replaced drive (0 through 6).
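The slot answers for the Copyback prompts depend only on the system model. The sketch below restates that rule so the values can be checked before they are typed. The function name copyback_slots is hypothetical, and the sketch does not call mega_check.pl.

```shell
# Print the Copyback source and target slots for a model and a replaced-drive slot.
#   $1 = model (for example N3001-001 or N3001-002), $2 = slot of the replaced drive
copyback_slots() {
    model=$1
    target=$2
    case "$model" in
        N3001-001) spare=7; max=6 ;;   # hot spare in slot 7, drives in 0 through 6
        N3001-*)   spare=4; max=3 ;;   # hot spare in slot 4, drives in 0 through 3
        *) echo "unknown model: $model" >&2; return 1 ;;
    esac
    case "$target" in
        ''|*[!0-9]*) echo "slot must be a number" >&2; return 1 ;;
    esac
    if [ "$target" -gt "$max" ]; then
        echo "slot $target is out of range (0 through $max)" >&2
        return 1
    fi
    printf 'from=%s to=%s\n' "$spare" "$target"
}

# Example usage:
#   copyback_slots N3001-002 2
```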
14. The disk regeneration process can take from 30 to 60 minutes.
When the regeneration process is complete, the replaced disk is listed as Active/Ok,
and an updated Spare disk should be present on the host. As user nz, type:
[nz@nzhost ~]$ nzhw | grep -i hostdisk
HostDisk 1019 rack1.host1.hostDisk1 Active Ok Disabled
HostDisk 1021 rack1.host1.hostDisk3 Active Ok Disabled
HostDisk 1022 rack1.host1.hostDisk4 Active Ok Disabled
HostDisk 1023 rack1.host1.hostDisk5 Spare  Ok Disabled
HostDisk 1032 rack1.host2.hostDisk1 Spare  Ok Disabled
HostDisk 1033 rack1.host2.hostDisk2 Active Ok Disabled
HostDisk 1034 rack1.host2.hostDisk3 Active Ok Disabled
HostDisk 1035 rack1.host2.hostDisk4 Active Ok Disabled
HostDisk 1036 rack1.host2.hostDisk5 Active Ok Disabled
HostDisk 1658 rack1.host1.hostDisk2 Active Ok Disabled
If a host disk firmware update is necessary, follow the instructions in the FDT User’s
Guide for firmware updates.
17. Verify that host disk issues are resolved:
[nz@nzhost ~]$ nzhw -issues
No entries found
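The host disk listing from nzhw can also be summarized with awk as a final check. This sketch assumes whitespace-separated columns in the order Description, HW ID, Location, Role, State, Security, as in the example output for this procedure. The function name check_host_disks is hypothetical.

```shell
# Summarize HostDisk rows read from stdin (for example from: nzhw | grep -i hostdisk).
# Returns nonzero if any host disk is not in state Ok, or if no rows were found.
check_host_disks() {
    awk 'tolower($1) == "hostdisk" {
             total++
             if ($5 != "Ok") { bad++; print "not Ok: " $3 " (" $4 "/" $5 ")" }
             if ($4 == "Spare") spares++
         }
         END {
             printf "disks=%d spares=%d problems=%d\n", total, spares, bad
             rc = (bad > 0 || total == 0) ? 1 : 0
             exit rc
         }'
}

# Example usage as user nz:
#   nzhw | grep -i hostdisk | check_host_disks
```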
Before you begin the G8052 Management Switch replacement process, make certain that
you have a replacement Management Switch that conforms to the hardware models
supported for the IBM PureData System for Analytics N3001. The odd numbered racks in
an N3001 system (rack 1 in an N3001-020 or racks 1 and 3 in an N3001-040) include a
G8052 Management Switch.
Note: You'll need a BNT serial cable (DB 9-F to mini-USB) for this installation procedure
(IBM FRU number 43X0510).
Note: The power and fan modules of the G8052 Management Switch are replaceable. Part
numbers are listed in “Overview of the IBM PureData System for Analytics N3001,” and
replacement procedures are included in Rack Switch G8052 Installation Guide.
The estimated time to perform this procedure is from 60 to 180 minutes, depending on
ease of access to the system and familiarity with NPS and the Netezza system.
The replacement part number for the G8052 switch is IBM P/N 49Y7922.
Replacement Procedure
The replacement G8052 management switch must be set up with a valid IP address and
then configured for use in an N3001. This includes internal settings as well as firmware
loading.
To replace a G8052 Management Switch:
c. Check to see if Call Home is enabled, and if so, temporarily disable it.
Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status
Statement 5
CAUTION:
The power control button on the device and the power switch on the power supply do not
turn off the electrical current supplied to the device. The device also might have more than
one power cord. To remove all electrical current from the device, ensure that all power
cords are disconnected from the power source.
7. Connect the serial cable between the active host and the management switch.
The cable must be connected from the serial port at the rear of the active host to the
mini-USB connector on the front of the G8052 management switch. Refer to
Figure 14-1 for the location of the mini-USB connector on the G8052 switch.
Mini-USB Port
Figure 14-1: Front of G8052 Management Switch
d. The command may require up to 10 minutes to complete; to verify when the system
state is online, use the following command until it returns the "Online" status:
[nz@nzhost1 ~]$ nzstate
System state is 'Online'
26. If Call Home was previously disabled, enable it.
[nz@nzhost1 ~]$ nzcallhome -on
b. If issues are noted in the sys_rev_check output, resolve the issues as described in
the FDT User’s Guide, in the section “Resolve sys_rev_check Issues.”
Troubleshooting
If the switch seems to hang during the configuration procedure, it may be because a
previous session did not close properly. To correct this:
1. Type the command:
[root@nzhost1 ~]# minicom gig
A minicom session opens, connected to the G8052.
2. Type admin and press Enter.
You are now logged into the G8052.
3. Type Ctrl-A.
4. Type x.
5. Select yes from the prompt and press Enter.
The active host prompt returns.
6. Restart the switch configuration at step 8 on page 14-3, or step 22, depending on
where the hang up was.
Before you begin the G8264 Fabric Switch replacement process, make certain that you
have a replacement Fabric Switch that conforms to the hardware models supported for the
IBM PureData System for Analytics N3001. Rack 2 in an N3001 system includes two
G8264 Fabric Switches.
Note: You'll need a BNT serial cable (DB 9-F to mini-USB) for this installation procedure
(IBM FRU number 46D0180).
Note: The power and fan modules of the G8264 Fabric Switch are replaceable. Part
numbers are listed in “Overview of the IBM PureData System for Analytics N3001,” and
replacement procedures are included in IBM BNT Rack Switch G8264F Installation Guide.
The estimated time to perform this procedure is from 60 to 180 minutes, depending on
ease of access to the system and familiarity with NPS and the Netezza system.
The replacement part number for the G8264 switch is 49Y7923.
Replacement Procedure
The replacement G8264 fabric switch must be set up with a valid IP address and then con-
figured for use in an N3001. This includes internal settings as well as firmware loading.
To replace a G8264 Fabric Switch:
c. Check to see if Call Home is enabled, and if so, temporarily disable it.
Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status
Statement 3
CAUTION:
When laser products (such as CD-ROMs, DVD drives, fiber optic devices, or transmitters)
are installed, note the following:
Do not remove the covers. Removing the covers of the laser product could result in
exposure to hazardous laser radiation. There are no serviceable parts inside the device.
Use of controls or adjustments or performance of procedures other than those specified
herein might result in hazardous radiation exposure.
DANGER
Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the
following.
Laser radiation when open. Do not stare into the beam, do not view directly with opti-
cal instruments, and avoid direct exposure to the beam.
Statement 5
CAUTION:
The power control button on the device and the power switch on the power supply do not
turn off the electrical current supplied to the device. The device also might have more than
one power cord. To remove all electrical current from the device, ensure that all power
cords are disconnected from the power source.
7. Connect the serial cable between the active host and the fabric switch.
The cable must be connected from the serial port at the rear of the active host to the
mini-USB connector on the front of the replacement G8264 fabric switch. Refer to
Figure 15-1 for the location of the mini-USB connector on the G8264 switch.
Figure 15-1 callout: Mini-USB port
20. When the G8264 prompt returns, type Ctrl-A, then type x.
21. Select yes from the prompt and press Enter.
The ha1 prompt returns.
22. When the configuration script completes, load the latest released firmware for the
G8264 switch:
[root@nzhost1 ~]# cd /opt/nz/fdt
[root@nzhost1 fdt]# ./firmware_updater RackFabSwitch --alias netswfab01[a,b]
d. The command may take up to 10 minutes to complete. To confirm that the system
is online, repeat the following command until it reports the 'Online' state:
[nz@nzhost1 ~]$ nzstate
System state is 'Online'
c. If issues are noted in the sys_rev_check output, resolve the issues as described in
the FDT User’s Guide, in the section “Resolve sys_rev_check Issues.”
Troubleshooting
If the switch appears to hang during the configuration procedure, a previous
session might not have closed properly. To correct this:
1. Type the command:
[root@nzhost1 ~]# minicom gig
A minicom session opens, connected to the G8264.
2. Type admin and press Enter.
You are now logged into the G8264.
3. Type Ctrl-A.
4. Type x.
5. Select yes from the prompt and press Enter.
The active host prompt returns.
6. Restart the switch configuration at step 8 on page 15-4, or step 23 on page 15-5,
depending on where the hang occurred.
Detailed instructions for replacing KVM components are provided in IBM 1U 17-inch Flat
Panel Console Kit Installation and Maintenance Guide.
The estimated time to perform this procedure is from 60 to 180 minutes, depending on
ease of access to the system.
The replacement FRU numbers for the KVM components are:
Monitor/Tray - 47C2521
Keyboard - 00X6927
Switch - 69Y6015
USB/Video/Ethernet Adapter - 39M2909
Ethernet Cable - 90Y3732
Terminators - 39M2912
Power Cord Y-Adapter - 39M5450
To replace a KVM:
1. Read the safety information that begins on page v.
2. Disconnect the KVM power cable from outlet 5 of the lower left RPC.
Statement 5
CAUTION:
The power control button on the device and the power switch on the power supply do not
turn off the electrical current supplied to the device. The device also might have more than
one power cord. To remove all electrical current from the device, ensure that all power
cords are disconnected from the power source.
Figure callouts (KVM connections): To Y-Adapter, From Monitor, To Host1 Adapter,
To AMM Adapter, Do Not Use
7. Plug the power cable (Y-adapter) into outlet 5 of the lower left RPC.
Before you begin the PDU replacement process, make certain that you have a replacement
PDU that conforms to the hardware models supported for the IBM PureData System for
Analytics N3001. Each N3001 rack has four PDUs.
Observe Electrostatic Discharge (ESD) precautions when handling electronic components.
ESD precautions are described in “Electrostatic Discharge Precautions” on page 1-17.
Items required:
Serial Cable, included with 00AK104, connection type DB-9F to RJ11
The estimated time to perform this procedure is up to 90 minutes, depending on ease of
access to the system and familiarity with NPS and the Netezza system.
The FRU numbers for the PDUs are:
Upper and Lower (also called RPCs): 00AK104
c. Check to see if Call Home is enabled, and if so, temporarily disable it.
Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status
5. Record all power connections to the PDU that is being replaced by labeling each
power cord with the number of the PDU port it is plugged into. Also note the
physical location of the PDU (for example, Rack 1, upper, right); you need this
information to configure the replacement PDU.
6. Unplug all power cords and network cables from the PDU, unplugging the input power
connection last.
Statement 1
DANGER
Electrical current from power, telephone, and communication cables is hazardous.
To avoid a shock hazard:
Do not connect or disconnect any cables or perform installation, maintenance, or
reconfiguration of this product during an electrical storm.
Connect all power cords to a properly wired and grounded electrical outlet.
Connect to properly wired outlets any equipment that will be attached to this
product.
When possible, use one hand only to connect or disconnect signal cables.
Never turn on any equipment when there is evidence of fire, water, or structural
damage.
Disconnect the attached power cords, telecommunications systems, networks, and
modems before you open the device covers, unless instructed otherwise in the
installation and configuration procedures.
Connect and disconnect cables as described in the following table when installing,
moving, or opening covers on this product or attached devices.
To Connect:                                To Disconnect:
1. Turn everything OFF.                    1. Turn everything OFF.
2. First, attach all cables to devices.    2. First, remove power cords from outlet.
3. Attach signal cables to connectors.     3. Remove signal cables from connectors.
4. Attach power cords to outlets.          4. Remove all cables from devices.
5. Turn device ON.
Statement 5
CAUTION:
The power control button on the device and the power switch on the power supply do not
turn off the electrical current supplied to the device. The device also might have more than
one power cord. To remove all electrical current from the device, ensure that all power
cords are disconnected from the power source.
7. Replace the PDU and connect the input power cable to its source.
8. Attach all power cords to the appropriate ports and reconnect the network cable to the
network port.
9. Configure the replacement PDU:
Note: You need the DB-9F to RJ11 serial cable (FRU number 69Y2042). The cable
connects from the serial port on the active host to the serial port on the PDU.
Example output:
[root@nzhost1 rpc]# ./rpcconfigure -s 1ur
------------------------------------------------------------------
Host Platform Configuration Version 5.4
2014-09-20.20418.rel-hpfConfig-5.cm.20418
e. As user root, run sys_rev_check to verify that the PDU is at the correct firmware level:
[root@nzhost1 ~]# ./sys_rev_check rpc
Before you begin the media tray replacement process, make certain that you have a
replacement media tray that conforms to the hardware models supported for the N3001
system. Typically, you will use a new replacement media tray.
Observe Electrostatic Discharge (ESD) precautions when handling electronic components.
ESD precautions are described in “Electrostatic Discharge Precautions” on page 1-17.
The estimated time to perform this procedure is from 20 to 45 minutes, depending on ease
of access to the system and familiarity with NPS and the Netezza system.
The FRU number for the media tray can be obtained as described in “H-Chassis Compo-
nent FRU Numbers” on page 1-13.
To replace a media tray, use the procedures in BladeCenter H Type 8852, 7989, and
1886, Problem Determination and Service Guide.
This procedure requires that the system components use the Netezza default IP
addresses (configIP must not have been applied).
The estimated time to perform this procedure is from 60 to 180 minutes, depending on
ease of access to the system and familiarity with NPS and the Netezza system.
The FRU number for the H Chassis midplane can be obtained as described in “H-Chassis
Component FRU Numbers” on page 1-13.
The replacement part number for the full H Chassis is IBM P/N 31R3308.
To replace an H Chassis midplane or chassis:
1. Read the safety information that begins on page v.
2. Ensure that all cables connected to the modules in the chassis are clearly marked.
3. Identify the active host in the cluster, which is the host where the NPS resource group
is running:
[root@nzhost1 ~]# crm_resource -r nps -W
Example output from a running system:
crm_resource[5377]: 2009/06/07_10:13:12 info: Invoked: crm_resource -r nps -W
resource nps is running on: nzhost1
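If the active host name is needed in a script, it can be pulled out of the output shown above. This is a minimal sketch that assumes the final output line has exactly the form "resource nps is running on: <host>" as in the example; the function name active_host is hypothetical and is not part of the cluster tooling.

```shell
# Sketch only: active_host is a hypothetical helper.
# Reads crm_resource output on stdin and prints the host named in the
# "resource nps is running on:" line. Exits non-zero if no such line exists.
active_host() {
    awk '
        /^resource nps is running on:/ { print $NF; found = 1; exit }
        END { exit !found }
    '
}

# Example (as root on either host):
#   crm_resource -r nps -W 2>&1 | active_host
```

A non-zero exit status means the expected line was not found, for example because the nps resource group is not running; in that case, check the cluster state manually rather than relying on the script.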
Statement 5
CAUTION:
The power control button on the device and the power switch on the power supply do not
turn off the electrical current supplied to the device. The device also might have more than
one power cord. To remove all electrical current from the device, ensure that all power
cords are disconnected from the power source.
Statement 3
CAUTION:
When laser products (such as CD-ROMs, DVD drives, fiber optic devices, or transmitters)
are installed, note the following:
Do not remove the covers. Removing the covers of the laser product could result in
exposure to hazardous laser radiation. There are no serviceable parts inside the device.
Use of controls or adjustments or performance of procedures other than those specified
herein might result in hazardous radiation exposure.
DANGER
Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the
following.
Laser radiation when open. Do not stare into the beam, do not view directly with opti-
cal instruments, and avoid direct exposure to the beam.
10. On nzhost1, examine the IP address of the AMM in bay 1 using the command:
[root@nzhost1 ~]# nslookup mm00x
Where x is the number of the chassis.
Example output:
Server: 127.0.0.1
Address: 127.0.0.1#5
Name: mm001
Address: 10.0.129.0
Alternatively, the IP address of the AMM is listed in the file /etc/hosts.
11. Copy the appropriate AMM configuration file to /tmp/amm.cfg. AMM configuration files
are found in the directory /nzlocal/scripts/spa/bc, named spaxx.amm.cfg, where xx is
the chassis number (for example, 01).
12. Configure the network alias on nzhost1:
[root@nzhost1 ~]# ifconfig bond0:0 192.168.70.130 netmask
255.255.255.0
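The naming patterns in steps 10 and 11 (mm00x for the AMM host name and spaxx.amm.cfg for the configuration file) can be expanded from a chassis number as sketched below. The helper names are hypothetical, and the zero-padding for chassis numbers of 10 or more is an assumption, since the source shows only single-digit examples.

```shell
# Sketch only: these helper names are hypothetical, not Netezza commands.
# Function bodies run in subshells so the variable n stays private.

# Print the AMM host name for a chassis number (mm00x pattern).
amm_hostname() (
    n=${1#"${1%%[!0]*}"}    # strip leading zeros so "08" is read as decimal 8
    printf 'mm%03d\n' "${n:-0}"
)

# Print the AMM configuration file path (spaxx.amm.cfg pattern).
amm_config_path() (
    n=${1#"${1%%[!0]*}"}
    printf '/nzlocal/scripts/spa/bc/spa%02d.amm.cfg\n' "${n:-0}"
)

# Example for chassis 1 (as root on nzhost1):
#   nslookup "$(amm_hostname 1)"
#   cp "$(amm_config_path 1)" /tmp/amm.cfg
```

Confirm that the printed file exists in /nzlocal/scripts/spa/bc before copying it, because the path is derived from the naming pattern rather than read from the system.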
Figure callout: Reset Switch
30. If the system was in cluster mode in step 2, put the system in cluster mode:
a. Run the following script:
[root@nzhost1 fdt]# /nzlocal/scripts/nz.heartbeat.sh
d. The command may take up to 10 minutes to complete. To confirm that the system
is online, repeat the following command until it reports the 'Online' state:
[nz@nzhost1 ~]$ nzstate
System state is 'Online'
c. If issues are noted in the sys_rev_check output, resolve the issues as described in
the FDT User’s Guide, in the section “Resolve sys_rev_check Issues.”
Some power supplies are classified as High Efficiency. A failed High Efficiency power
supply must be replaced with a High Efficiency power supply, and all power supplies in
the chassis must be High Efficiency. To check the classification of host power supplies,
log in to the IMM and check the VPD.
The estimated time to perform this procedure is from 10 to 30 minutes, depending on ease
of access to the system and familiarity with NPS and the Netezza system.
An amber Fault LED lights on the failed power supply.
The power supply FRU numbers are:
Disk Enclosure - 45W8841
x3650-M4 Host Server - 94Y8114
x3650-M4-HD Host Server - 94Y8118
x3750-M4 Host Server - 69Y5954
G8052 Management Switch - 00D6271
G8264 Fabric Switch - 00D6271
The FRU number for the H Chassis power supply can be obtained as described in “H-
Chassis Component FRU Numbers” on page 1-13.
To replace a power supply, use the procedures in:
System Storage EXP2500, Installation, User’s, and Maintenance Guide
BladeCenter H Type 8852, 7989, and 1886, Problem Determination and Service
Guide
Rack Switch G8052 Installation Guide
Statement 8
CAUTION:
Never remove the cover on a power supply or any part that has the following label attached.
Hazardous voltage, current, and energy levels are present inside any component that has
this label attached. There are no serviceable parts inside these components. If you suspect
a problem with one of these parts, contact a service technician.
The following sections describe the procedure for shutting down and bringing up an IBM
PureData System for Analytics N3001.
This procedure requires the user to have root access.
2. Identify the active host in the cluster, which is the host where the NPS resource group
is running:
[root@nzhost1 ~]# crm_resource -r nps -W
crm_resource[5377]: 2009/06/07_10:13:12 info: Invoked: crm_resource -r nps -W
resource nps is running on: nzhost1
3. Log in to the active host (nzhost1 in this example) as user nz.
4. Check to see if Call Home is enabled, and if so, disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status
============
Node: nzhost1 (key): online
Node: nzhost2 (key): online
Resource Group: nps
drbd_exphome_device (heartbeat:drbddisk): Started nzhost1
drbd_nz_device (heartbeat:drbddisk): Started nzhost1
exphome_filesystem (heartbeat::ocf:Filesystem): Started nzhost1
nz_filesystem (heartbeat::ocf:Filesystem): Started nzhost1
fabric_ip (heartbeat::ocf:IPaddr): Started nzhost1
wall_ip (heartbeat::ocf:IPaddr): Started nzhost1
nz_dnsmasq (lsb:nz_dnsmasq): Started nzhost1
nzinit (lsb:nzinit): Started nzhost1
fencing_route_to_ha1 (stonith:apcmaster): Started nzhost2
fencing_route_to_ha2 (stonith:apcmaster): Started nzhost1
6. Press Ctrl-C to exit the crm_mon command and return to the command prompt.
7. Log into the nz account.
[root@nzhost1 ~]# su - nz
8. If Call Home was disabled before shutting down the system, enable it.
[nz@nzhost1 ~]$ nzcallhome -on
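The crm_mon display shown in the steps above can also be checked mechanically once it has been saved as text. The sketch below assumes the line shapes in that sample ("Node: <host> (key): online" and "<resource> (<type>): Started <host>"); the function name check_crm_status is hypothetical, and the check does not replace reading the display yourself.

```shell
# Sketch only: check_crm_status is a hypothetical helper.
# Reads saved crm_mon text on stdin, reports any node that is not online
# and any resource that is not Started, and exits non-zero on a problem.
check_crm_status() {
    awk '
        BEGIN { bad = 0 }
        # Node lines look like: Node: nzhost1 (key): online
        /^[ \t]*Node: / {
            if ($NF != "online") { print "node not online: " $0; bad = 1 }
            next
        }
        # Resource lines look like: nzinit (lsb:nzinit): Started nzhost1
        /\): / {
            seen++
            if ($(NF - 1) != "Started") { print "resource not started: " $0; bad = 1 }
        }
        END {
            if (!seen) { print "no resource lines found"; bad = 1 }
            exit bad
        }
    '
}
```

For example, with the display saved to a file of your choosing, run check_crm_status with that file as standard input. Silence with a zero exit status means every listed node is online and every listed resource is started.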
This section describes some important notices, trademarks, and compliance information.
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other
countries. Consult your local IBM representative for information on the products and ser-
vices currently available in your area. Any reference to an IBM product, program, or service
is not intended to state or imply that only that IBM product, program, or service may be
used. Any functionally equivalent product, program, or service that does not infringe any
IBM intellectual property right may be used instead. However, it is the user's responsibility
to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in
this document. The furnishing of this document does not grant you any license to these
patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785 U.S.A.
For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellec-
tual Property Department in your country or send inquiries, in writing, to:
IBM World Trade Asia Corporation
Licensing
2-31 Roppongi 3-chome, Minato-ku
Tokyo 106-0032, Japan
The following paragraph does not apply to the United Kingdom or any other country where
such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES
CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY
KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate
programming techniques on various operating platforms. You may copy, modify, and distrib-
ute these sample programs in any form without payment to IBM, for the purposes of
developing, using, marketing or distributing application programs conforming to the appli-
cation programming interface for the operating platform for which the sample programs are
written. These examples have not been thoroughly tested under all conditions. IBM, there-
fore, cannot guarantee or imply reliability, serviceability, or function of these programs.
Each copy or any portion of these sample programs or any derivative work, must include a
copyright notice as follows:
© (your company name) (year). Portions of this code are derived from IBM Corp. Sample
Programs.
© Copyright IBM Corp. _enter the year or years_.
If you are viewing this information softcopy, the photographs and color illustrations may not
appear.
Trademarks
IBM, the IBM logo, ibm.com and Netezza are trademarks or registered trademarks of Inter-
national Business Machines Corporation in the United States, other countries, or both. If
these and other IBM trademarked terms are marked on their first occurrence in this infor-
mation with a trademark symbol (® or ™), these symbols indicate U.S. registered or
common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries. A current
list of IBM trademarks is available on the Web at “Copyright and trademark information” at
ibm.com/legal/copytrade.shtml.
Adobe is a registered trademark of Adobe Systems Incorporated in the United States,
and/or other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or
both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corpo-
ration in the United States, other countries, or both.
NEC is a registered trademark of NEC Corporation.
UNIX is a registered trademark of The Open Group in the United States and other
countries.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United
States, other countries, or both.
Red Hat is a trademark or registered trademark of Red Hat, Inc. in the United States and/or
other countries.
D-CC, D-C++, Diab+, FastJ, pSOS+, SingleStep, Tornado, VxWorks, Wind River, and the
Wind River logo are trademarks, registered trademarks, or service marks of Wind River Sys-
tems, Inc. Tornado patent pending.
APC and the APC logo are trademarks or registered trademarks of American Power Conver-
sion Corporation.
Other company, product or service names may be trademarks or service marks of others.
Germany: Compliance with the Act on the Electromagnetic Compatibility of Equipment
This product complies with the “Gesetz über die elektromagnetische Verträglichkeit von
Geräten (EMVG)” (Act on the Electromagnetic Compatibility of Equipment). This act
implements EU Directive 2014/30/EU in the Federal Republic of Germany.
Certificate of conformity under the German Act on the Electromagnetic Compatibility of Equipment
(EMVG) (or the EMC EC Directive 2014/30/EU) for Class A equipment
In accordance with the German EMVG, this device is authorized to carry the EC conformity
mark (CE).
The manufacturer is responsible for compliance with the EMC regulations:
International Business Machines Corp.
New Orchard Road
Armonk, New York 10504
Tel: 914-499-1900
The manufacturer's responsible contact in the EU is:
IBM Deutschland
Technical Relations Europe, Abteilung M456
IBM-Allee 1, 71139 Ehningen, Germany
Telephone: +49 800 225 5426
Email: HalloIBM@de.ibm.com
General information:
The device meets the protection requirements of EN 55024 and EN 55022 / EN 55032
Class A.
This is a Class A product based on the standard of the Voluntary Control Council for Inter-
ference (VCCI). If this equipment is used in a domestic environment, radio interference
may occur, in which case the user may be required to take corrective actions.
This is electromagnetic wave compatibility equipment for business (Type A). Sellers and
users need to pay attention to it. This is for any areas other than home.
The IBM PureData System for Analytics appliance requires a readily accessible power cutoff.
This can be a Unit Emergency Power Off (UEPO) switch or a circuit breaker, or you can
completely remove power from the equipment by disconnecting the appliance coupler (line
cord) from all rack PDUs.
CAUTION: Disconnecting power from the appliance without first stopping the NPS soft-
ware and high availability processes could result in data loss and increased service time to
restart the appliance after the power cutoff. For all non-emergency situations, follow the
documented power-down procedures in the IBM Netezza System Administrator's Guide to
ensure that the software and databases are stopped correctly, in order, to avoid data loss or
file corruption.
Homologation Statement
This product may not be certified in your country for connection by any means whatsoever
to interfaces of public telecommunications networks.
Further certification may be required by law prior to making any such connection. Contact
an IBM representative or reseller for any questions.