
Machine Types 3561 and 3567

Replacement Procedures:
IBM PureData System for Analytics
N3001
Revised: December 14, 2018

00X6949 Rev. 1.40


Note: Before using this information and the product that it supports, read the information in “Notices and Trademarks” on
page B-1.

Changes in this guide compared to the previous version:


Rev. 1.40: Updated guide with changed PDU part number (00AK104)
Rev. 1.30: Updated Host System Server Board replacement procedure (page 9-8).
Rev. 1.29: Updated link to FoD website in system board procedures.
Rev. 1.28: Updated name of configuration script in N3001-001 system board procedure.
Rev. 1.27:
1. Updated FRU number for quad-port NIC.
2. Corrected command used to update disk firmware on N3001-001.
Rev. 1.26:
1. Added Host Disk Drive model.
2. Changed references for FDT Support Tools 2.0.0.4 to FDT Support Tools 2.0.0.5.
Rev. 1.25:
1. Changed PDU procedure requiring the system to be taken offline.
2. Corrected FRU number for RAID supercap.
3. Added info about onboard RAID controller.
Rev. 1.24: Added note that when replacing blade Expansion Unit, do not re-use PCI Riser.
Rev. 1.23:
1. Added FRU numbers for host memory cards.
2. Modified S-Blade replacement to cover use of Host Firmware 4.2.0.5 (HS23) DVD and FDT Support Tools 2.0.0.4.
Rev. 1.22: Added Warning in ESM replacement procedure concerning need for enclosure power cycle if firmware is updated.
Rev. 1.21: Corrected sequence of steps in ESM replacement (page 3-4).
Rev. 1.20: Fixed formatting issues throughout guide.
Rev. 1.19: Clarified disk drive FRU numbers in FRU list and Chapter 4.
Rev. 1.18:
1. Combined H-Chassis and H-Chassis Midplane chapters.
2. Added the step to set AMM to factory default in H-Chassis midplane replacement.
Rev. 1.17: Modified S-Blade replacement to cover use of Host Firmware 4.2.0.4 (HS23) DVD.
Rev. 1.16:
1. Removed reference to FDT 4.3 in Flash RAID chapter.
2. Added recommended step to migrate hosts in host component replacement chapters.
Rev. 1.15:
1. Added instructions on how to set AMM name in VPD.
2. Updated disk drive firmware version.
3. Updated S-Blade replacement procedure to include a step to ping the replacement blade before proceeding with the
replacespu script.
Rev. 1.14: Restructured procedures for S-Blade and Disk replacement using Restricted Environment.
Rev. 1.13:
1. Changes in S-Blade replacement procedure for Host Firmware 4.2.0.3 (HS23) DVD.
2. Added information in the S-Blade replacement procedure for the 10Gb Interposer card.
3. Added note to host disk replacement procedure regarding use of Lenovo branded drives.
4. Added a chapter on replacing the RAID flash on x3650-M4-HD and x3750-M4v2 hosts.
5. Added warning related to replacing RAID Flash cards when replacing host planars in N3001-001 systems.
Rev. 1.12: More clarification on using Host Firmware 4.2.0.2 (HS23) DVD.

© Copyright IBM Corporation 2014, 2018.


US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM
Corp.
Contents

1 Overview of the IBM PureData System for Analytics N3001
    Prerequisites
    Replaceable Components
        Rack Illustrations
    FRU Numbers
        H-Chassis Component FRU Numbers
        FRU Numbers Listed in Focal Point
        FRU Number List
    Electrostatic Discharge Precautions
    Contact IBM Netezza Support

2 Replacing S-Blade Components
    Restricted Environment (Online) - Replace a Blade Server, Battery, or DAC
    Command Line Interface (Online) Replacement Procedure
        Replace a Blade Server, Battery, or DAC
        Verification
    Offline Replacement Procedure
    Troubleshooting the Replacement Process
        Hang during DVD Boot
        Running the replacespu Command after the S-Blade is Deleted

3 Replacing an Environmental Services Module

4 Replacing a Disk Drive
    Use the Restricted Environment
    Use the Command Line Interface
    Manual Disk Replacement
        Replacement Procedure
        Checking the Firmware Revision of the Replacement Disk

5 Replacing a Disk Enclosure

6 Replacing a Management Module

7 Replacing a 10Gb Switch

8 Replacing a Host System Board (N3001-001)

9 Replacing a Host System Board (N3001-002 or larger)

10 Replacing a Host Server RAID Flash

11 Replacing a Host Server Network Interface Card

12 Replacing a Host Server SAS HBA (N3001-001)

13 Replacing a Host Server Disk Drive

14 Replacing a G8052 Management Switch
    Replacement Procedure
    Troubleshooting

15 Replacing a G8264 Fabric Switch
    Replacement Procedure
    Troubleshooting

16 Replacing a Keyboard/Video/Mouse (KVM)

17 Replacing a Power Distribution Unit (PDU)
    Upper and Lower PDUs

18 Replacing a Media Tray in an H Chassis

19 Replacing an H Chassis or Midplane

20 Replacing a Power Supply

Appendix A: Reference Materials
    Shutting Down an N3001 System
    Bringing Up an N3001 System

Appendix B: Notices and Trademarks
    Notices
    Trademarks
    Electronic Emission Notices
    Regulatory and Compliance

Preface
This guide includes a series of procedures you must follow to replace components in an
IBM® PureData™ System for Analytics N3001.

About This Guide


Replacement Procedures: IBM PureData System for Analytics N3001 is written for IBM
Netezza personnel authorized to repair an IBM PureData System for Analytics N3001. The
procedures in this document assume that the system has been previously installed and is
operational. From this point on, we refer to the system as the N3001.

Topics                                                                                     See …

An overview of the system components and locations.                                        “Overview of the IBM PureData System for Analytics N3001” on page 1-1
Steps required to replace an S-Blade.                                                      “Replacing S-Blade Components” on page 2-1
Steps required to replace an Environmental Services Module.                                “Replacing an Environmental Services Module” on page 3-1
Steps required to replace a Disk Drive.                                                    “Replacing a Disk Drive” on page 4-1
Steps required to replace a Disk Enclosure.                                                “Replacing a Disk Enclosure” on page 5-1
Steps required to replace an AMM.                                                          “Replacing a Management Module” on page 6-1
Steps required to replace a 10Gb Switch.                                                   “Replacing a 10Gb Switch” on page 7-1
Steps required to replace Host Server System board in an N3001-001 system.                 “Replacing a Host System Board (N3001-001)” on page 8-1
Steps required to replace Host Server System board in an N3001-002 or larger system.       “Replacing a Host System Board (N3001-002 or larger)” on page 9-1
Steps required to replace a Host Server RAID Flash.                                        “Replacing a Host Server RAID Flash” on page 10-1
Steps required to replace a Host Server NIC.                                               “Replacing a Host Server Network Interface Card” on page 11-1
Steps required to replace a Host Server SAS HBA.                                           “Replacing a Host Server SAS HBA (N3001-001)” on page 12-1
Steps required to replace a Host Server Disk Drive.                                        “Replacing a Host Server Disk Drive” on page 13-1
Steps required to replace G8052 Management Switch.                                         “Replacing a G8052 Management Switch” on page 14-1
Steps required to replace G8264 Fabric Switch.                                             “Replacing a G8264 Fabric Switch” on page 15-1
Steps required to replace a KVM.                                                           “Replacing a Keyboard/Video/Mouse (KVM)” on page 16-1
Steps required to replace a PDU.                                                           “Replacing a Power Distribution Unit (PDU)” on page 17-1
Steps required to replace a Media Tray.                                                    “Replacing a Media Tray in an H Chassis” on page 18-1
Steps required to replace an H Chassis Midplane.                                           “Replacing an H Chassis or Midplane” on page 19-1
Steps required to replace power supplies.                                                  “Replacing a Power Supply” on page 20-1
A listing of reference materials, such as shutting down and bringing up an N3001 system.   “Reference Materials” on page A-1

The Purpose of This Guide


Replacement Procedures: IBM PureData System for Analytics N3001 provides the
procedures you must perform to replace components in an N3001 system.

Symbols and Conventions


This guide uses the following typographical conventions:
• Italics for emphasis on terms and user-defined values such as user input
• Bold for command line input; for example, nzstop

If You Need Help


If you are having trouble using the Netezza appliance, you should:
1. Retry the action, carefully following the instructions given for that task in the
documentation.
2. Go to the IBM Support Portal at: http://www.ibm.com/support. Log in using your IBM
ID and password. You can search the Support Portal for solutions. To submit a support
request, click the Service Requests & PMRs tab.
3. If you have an active service contract maintenance agreement with IBM, you can contact
customer support teams via telephone. For individual countries, visit the Technical
Support section of the IBM Directory of worldwide contacts
(http://www14.software.ibm.com/webapp/set2/sas/f/handbook/contacts.html#phone).

Comments on the Documentation
We welcome any questions, comments, or suggestions that you have for the IBM Netezza
documentation. Please send us an e-mail message at netezza-doc@wwpdl.vnet.ibm.com
and include the following information:
• The name and version of the manual that you are using
• Any comments that you have about the manual
• Your name, address, and phone number
We appreciate your comments on the documentation.

Safety
Before installing this product, read the Safety Information.

Antes de instalar este produto, leia as Informações de Segurança.

Læs sikkerhedsforskrifterne, før du installerer dette produkt.


Lees voordat u dit product installeert eerst de veiligheidsvoorschriften.
Ennen kuin asennat tämän tuotteen, lue turvaohjeet kohdasta Safety Information.

Avant d'installer ce produit, lisez les consignes de sécurité.

Vor der Installation dieses Produkts die Sicherheitshinweise lesen.

Prima di installare questo prodotto, leggere le Informazioni sulla Sicurezza.

Les sikkerhetsinformasjonen (Safety Information) før du installerer dette produktet.

Antes de instalar este produto, leia as Informações sobre Segurança.

Antes de instalar este producto, lea la información de seguridad.


Läs säkerhetsinformationen innan du installerar den här produkten.

Safety Statements
These statements provide the caution and danger information used in this documentation.
Important:
Each caution and danger statement in this documentation is labeled with a number. This
number is used to cross reference an English-language caution or danger statement with
translated versions of the caution or danger statement in the Safety Information document.
For example, if a caution statement is labeled "Statement 1," translations for that caution
statement are in the Safety Information document under "Statement 1."
Be sure to read all caution and danger statements in this documentation before you
perform the procedures. Read any additional safety information that comes with your system
or optional device before you install the device.


Statement 1

DANGER

Electrical current from power, telephone, and communication cables is hazardous.


To avoid a shock hazard:
• Do not connect or disconnect any cables or perform installation, maintenance, or
  reconfiguration of this product during an electrical storm.
• Connect all power cords to a properly wired and grounded electrical outlet.
• Connect to properly wired outlets any equipment that will be attached to this
  product.
• When possible, use one hand only to connect or disconnect signal cables.
• Never turn on any equipment when there is evidence of fire, water, or structural
  damage.
• Disconnect the attached power cords, telecommunications systems, networks, and
  modems before you open the device covers, unless instructed otherwise in the
  installation and configuration procedures.
• Connect and disconnect cables as described in the following table when installing,
  moving, or opening covers on this product or attached devices.

To Connect:                                  To Disconnect:
1. Turn everything OFF.                      1. Turn everything OFF.
2. First, attach all cables to devices.      2. First, remove power cords from outlet.
3. Attach signal cables to connectors.       3. Remove signal cables from connectors.
4. Attach power cords to outlets.            4. Remove all cables from devices.
5. Turn device ON.

Statement 2

CAUTION:
When replacing a lithium battery, use only the approved IBM® Part Number or an
equivalent type battery recommended by the manufacturer. If your system has a module
containing a lithium battery, replace it only with the same module type made by the same
manufacturer. The battery contains lithium and can explode if not properly used, handled,
or disposed of.
Do not:
• Throw or immerse into water
• Heat to more than 100°C (212°F)
• Repair or disassemble
Dispose of the battery as required by local ordinances or regulations.

Statement 3

CAUTION:
When laser products (such as CD-ROMs, DVD drives, fiber optic devices, or transmitters)
are installed, note the following:
• Do not remove the covers. Removing the covers of the laser product could result in
  exposure to hazardous laser radiation. There are no serviceable parts inside the device.
• Use of controls or adjustments or performance of procedures other than those specified
  herein might result in hazardous radiation exposure.

DANGER

Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the
following.
Laser radiation when open. Do not stare into the beam, do not view directly with
optical instruments, and avoid direct exposure to the beam.

Class 1 Laser Product


Laser Klasse 1
Laser Klass 1
Luokan 1 Laserlaite
Appareil À Laser de Classe 1


Statement 4

18 kg (39.7 lb)   32 kg (70.5 lb)   55 kg (121.1 lb)


CAUTION:
Use safe practices when lifting.

Statement 5

CAUTION:
The power control button on the device and the power switch on the power supply do not
turn off the electrical current supplied to the device. The device also might have more than
one power cord. To remove all electrical current from the device, ensure that all power
cords are disconnected from the power source.

Statement 6

CAUTION:
If you install a strain-relief bracket option over the end of the power cord that is connected
to the device, you must connect the other end of the power cord to an easily accessible
power source.

Statement 7

CAUTION: If the device has doors, be sure to remove or secure the doors before moving or
lifting the device to avoid personal injury. The doors will not support the weight of the
device.

Statement 8

CAUTION:
Never remove the cover on a power supply or any part that has the following label attached.

Hazardous voltage, current, and energy levels are present inside any component that has
this label attached. There are no serviceable parts inside these components. If you suspect
a problem with one of these parts, contact a service technician.

Statement 13

DANGER

Overloading a branch circuit is potentially a fire hazard and a shock hazard under
certain conditions. To avoid these hazards, ensure that your system electrical
requirements do not exceed branch circuit protection requirements. Refer to the information
that is provided with your device for electrical specifications.


Statement 15

CAUTION:
Make sure that the rack is secured properly to avoid tipping when the server unit is
extended.

Statement 21

CAUTION:
Hazardous energy is present when the blade is connected to the power source. Always
replace the blade cover before installing the blade.

Statement 26

CAUTION:
Do not place any object on top of rack-mounted devices.

Statement 28

CAUTION:
The battery is a lithium ion battery. To avoid possible explosion, do not burn the battery.
Exchange it only with the approved part. Recycle or discard the battery as instructed by
local regulations.

Statement 34

CAUTION:
To reduce the risk of electric shock or energy hazard:
• This equipment must be installed by trained service personnel in a restricted-access
  location, as defined by the NEC and IEC 60950-1, Second Edition, The Standard for
  Safety of Information Technology Equipment.
• See the specifications in the product documentation for the required circuit-breaker
  rating for branch circuit overcurrent protection.
• Use copper wire conductors only. See the specifications in the product documentation
  for the required wire size.
• See the specifications in the product documentation for the required torque values for
  the wiring-terminal nuts.

Statement 37

DANGER

When you populate a rack cabinet, adhere to the following guidelines:
• Always lower the leveling pads on the rack cabinet.
• Always install the stabilizer brackets on the rack cabinet.
• Always install the heaviest devices in the bottom of the rack cabinet.
• Do not extend multiple devices from the rack cabinet simultaneously, unless the
  rack-mounting instructions direct you to do so. Multiple devices extended into the
  service position can cause your rack cabinet to tip.
• If you are not using the IBM 9308 rack cabinet, securely anchor the rack cabinet
  to ensure its stability.

CHAPTER 1
Overview of the IBM PureData System for Analytics N3001
What’s in this chapter
• Prerequisites
• Replaceable Components
• FRU Numbers
• Electrostatic Discharge Precautions
• Contact IBM Netezza Support

This guide provides replacement procedures for the components in the IBM PureData
System for Analytics N3001 that require steps in addition to those provided in the IBM
Problem Determination and Service Guide for the major component. For components not
included in this guide, see the IBM Problem Determination and Service Guide for the major
component requiring replacement.
This chapter provides an overview of the N3001 system.
To identify the machine type, check the label at the upper right of the front and rear of
the rack. The label is visible with the doors open.

Prerequisites
Before you begin the replacement process, make certain that you have a replacement
component that conforms to the hardware models supported for the N3001 system. IBM Field
Replaceable Unit (FRU) numbers are located on each component.
Some procedures require logging in to the system as the root user. Other procedures
require the nz user and the nzrev command to make sure that the N3001 system is running
a minimum level of Netezza software.
The procedures in this guide require you to be familiar with commands such as nzhw and
nzds, which are documented in the Netezza System Administrator's Guide.
To service the N3001-001 using the procedures in this guide, a KVM or other keyboard/
monitor/mouse must be attached to the system. A KVM is not part of an N3001-001.

IBM Netezza appliances are now configured to automatically open support tickets using the
Call Home service. To avoid creating extra tickets during service procedures that involve
system outages or state changes, disable Call Home (if it is enabled) before performing
maintenance, and re-enable it at the end of the procedure. Details can be found within
each procedure.


Replaceable Components
The IBM PureData System for Analytics N3001 models include partial, single, and multi-rack
systems.
Components accessed from the front of the system are:
• Disks
• Disk Enclosure
• Host Servers
• S-Blades
• Keyboard
• Video Monitor
• Power modules in the H-Chassis


Each S-Blade occupies two slots in the BladeCenter H chassis. The S-Blade is the
assembly of an IBM blade server (odd-numbered slot) and the Sidecar Expansion Chassis with
Netezza Database Accelerator Cards (next higher even-numbered slot).
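
The slot pairing described above can be sketched in a few lines of Python. This is only an illustration, not part of any Netezza tool: the function name sblade_slots is hypothetical, and the 14-slot limit is an assumption taken from the "Blade slots: 14" value in the example chassis output later in this chapter.

```python
from typing import Tuple

BLADE_SLOTS = 14  # assumption: blade slot count of a BladeCenter H chassis


def sblade_slots(server_slot: int) -> Tuple[int, int]:
    """Return (blade server slot, sidecar slot) for one S-Blade.

    The blade server sits in an odd-numbered slot and the Sidecar
    Expansion Chassis in the next higher even-numbered slot.
    """
    if server_slot % 2 == 0 or not 1 <= server_slot < BLADE_SLOTS:
        raise ValueError(
            f"blade server slot must be odd and in 1..{BLADE_SLOTS - 1}"
        )
    return server_slot, server_slot + 1


if __name__ == "__main__":
    # List every possible S-Blade position in one chassis.
    for slot in range(1, BLADE_SLOTS, 2):
        print(sblade_slots(slot))
```

Under these assumptions a chassis holds at most seven S-Blades, which matches the largest chassis population shown in the rack illustrations.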
Components accessed from the rear of the system include:
• Network Interface Cards (NICs)
• SAS HBA Cards
• ESM (EXP2524)
• Management Module
• Gb Switch
• Management or Fabric Switch
• Power modules
• PDUs

Rack Illustrations
The following illustrations show the component configuration of N3001 models.

[Illustration: front and rear views. Labeled components: power modules, Host 1, Host 2.]

Figure 1-1: N3001-001

[Illustration. Labeled components: machine type label, SPA 1, Host 1, Host 2, KVM,
Enclosures 1 and 2, Chassis 1 (2 S-Blades), power modules.]

Figure 1-2: Front of N3001-002 Rack

[Illustration. Labeled components: machine type label, power modules, KVM switch,
management switch, ESM a (spa1 encl N mm1) and ESM b (spa1 encl N mm2) for enclosures 1
and 2, lower left and lower right PDUs, Gb switches in Chassis 1 slots 7 and 9,
Management Module 1 (spa1.mm1), and Management Module 2 (spa1.mm2).]

Figure 1-3: Rear of N3001-002 Rack

[Illustration. Labeled components: machine type label, SPA 1, Host 1, Host 2, KVM,
Enclosures 1 through 6, Chassis 1 (4 S-Blades), power modules.]

Figure 1-4: Front of N3001-005 Rack

[Illustration. Labeled components: machine type label, power modules, KVM switch,
management switch, ESM a (spa1 encl N mm1) and ESM b (spa1 encl N mm2) for enclosures 1
through 6, lower left and lower right PDUs, Gb switches in Chassis 1 slots 7 and 9,
Management Module 1 (spa1.mm1), and Management Module 2 (spa1.mm2).]

Figure 1-5: Rear of N3001-005 Rack

[Illustration. Labeled components: machine type label, SPA 1, Enclosures 1 through 6,
Host 1, Host 2, KVM, Enclosures 7 through 12, Chassis 1 (7 S-Blades), power modules.]

Figure 1-6: Front of N3001-0x0 First Rack

[Illustration. Labeled components: machine type label, ESM a (spa1 encl N mm1) and ESM b
(spa1 encl N mm2) for enclosures 1 through 12, power modules, upper left and upper right
PDUs, KVM switch, management switch, lower left and lower right PDUs, Gb switches in
Chassis 1 slots 7 and 9, Management Module 1 (spa1.mm1), and Management Module 2
(spa1.mm2).]

Figure 1-7: Rear of N3001-0x0 First Rack

[Illustration. Labeled components: machine type label, SPA 2 through 8, Enclosures 1
through 12, Chassis 2 through 8 (7 S-Blades), power modules.]

Figure 1-8: Front of N3001-0x0 Second through Eighth Racks

[Illustration. Labeled components: machine type label, ESM a (spa2 encl N mm1) and ESM b
(spa2 encl N mm2) for enclosures 1 through 12, power modules, upper left and upper right
PDUs, fabric switches, lower left and lower right PDUs, Gb switches in Chassis 2 slots 7
and 9, Management Module 1 (spa2.mm1), and Management Module 2 (spa2.mm2).]

Figure 1-9: Rear of N3001-0x0 Second Rack

[Illustration. Labeled components: machine type label, ESM a (spa[3|5|7] encl N mm1) and
ESM b (spa[3|5|7] encl N mm2) for enclosures 1 through 12, power modules, upper left and
upper right PDUs, management switch, lower left and lower right PDUs, Gb switches in
Chassis [3|5|7] slots 7 and 9, Management Module 1 (spa[3|5|7].mm1), and Management
Module 2 (spa[3|5|7].mm2).]

Figure 1-10: Rear of N3001-0x0 Third/Fifth/Seventh Rack

[Illustration. Labeled components: machine type label, ESM a (spa[4|6|8] encl N mm1) and
ESM b (spa[4|6|8] encl N mm2) for enclosures 1 through 12, power modules, upper left and
upper right PDUs, lower left and lower right PDUs, Gb switches in Chassis [4|6|8] slots 7
and 9, Management Module 1 (spa[4|6|8].mm1), and Management Module 2 (spa[4|6|8].mm2).]

Figure 1-11: Rear of N3001-0x0 Fourth/Sixth/Eighth Rack

FRU Numbers
We recommend that you verify the FRU number of the replacement part against the
original part before beginning the replacement process.
The method for verifying the FRU number varies, depending on the component, the date of
manufacture of the Netezza system, and other factors. This section covers some of the
methods for identifying FRU numbers.
This guide is updated frequently, and every attempt is made to list accurate FRU numbers.
However, we recommend that, when possible, you verify the FRU number by using one of the
methods included in this section.

H-Chassis Component FRU Numbers


The FRU numbers of most components in the H-Chassis can be obtained by typing a
command when logged into the host. These components include:
• AMM
• Backplane
• Blade server
• HBAs
• 10Gb Switch
• Media Tray
• Power supplies
• Cooling Modules

To obtain an H-Chassis component FRU number:


1. Log into the host as root.
2. Type the command:
[root@nzhost1 ~]# ssh mm0xx info -T <component>
Where xx is the SPA number (01-08) and <component> is:
• blade[1,3,5,7,9,11,13] for the S-Blade in that slot
• blade[x]:exp[1] for the HBA card on blade x
• mm[1,2] for AMM 1 or 2
• mt[1] for the Media Tray
• switch[7,9] for Gb switch a ([7]) or b ([9])
• system for the midplane
• power[1-4] for power supplies 1 through 4
• blower[1,2] for cooling modules 1 or 2
Example output for the chassis midplane:
system> info -T system
UUID: F113 FFD4 02D5 11DE 9966 0014 5EE2 3DFA
Manufacturer: IBM (FOXC)
Manufacturer ID: 20301
Product ID: 82
Mach type/model: BladeCenter-H/88524YU
Mach serial number: 99C8330
Manuf date: 0709
Hardware rev: 6
Part no.: 41Y4864
FRU no.: 25R5780
FRU serial no.: YK109092D00Y
CLEI: Not Available
AMM slots: 2
Blade slots: 14
I/O Module slots: 10
Power Module slots: 4
Blower slots: 2
Media Tray slots: 1
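
When FRU numbers for several components need to be collected, the command and its output can be handled with a small script. The following Python sketch builds the command line from the step above and picks the "FRU no." field out of captured output. The function names, the validation pattern, and the assumption that blade[x]:exp[1] applies only to the odd-numbered blade server slots are illustrative and are not part of the AMM or Netezza software. The sample text is a shortened copy of the example output above.

```python
import re
from typing import Dict

# Component targets listed in step 2 (assumption: HBA targets use odd slots).
VALID_COMPONENT = re.compile(
    r"^(blade\[(1|3|5|7|9|11|13)\](:exp\[1\])?|mm\[[12]\]|mt\[1\]|"
    r"switch\[[79]\]|system|power\[[1-4]\]|blower\[[12]\])$"
)

SAMPLE_OUTPUT = """system> info -T system
Part no.: 41Y4864
FRU no.: 25R5780
FRU serial no.: YK109092D00Y
"""


def amm_info_command(spa: int, component: str) -> str:
    """Build the 'ssh mm0xx info -T <component>' command line."""
    if not 1 <= spa <= 8:
        raise ValueError("SPA number must be 1 through 8")
    if not VALID_COMPONENT.match(component):
        raise ValueError(f"unrecognized component target: {component}")
    return f"ssh mm0{spa:02d} info -T {component}"


def parse_info_output(text: str) -> Dict[str, str]:
    """Split 'Key: value' lines of the output into a dictionary."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    print(amm_info_command(1, "system"))
    print(parse_info_output(SAMPLE_OUTPUT)["FRU no."])
```

The sketch only formats and parses text. Running the command itself still requires being logged into the host as root, as described in step 1.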

FRU Numbers Listed in Focal Point


A tool, called Focal Point, is available through the IBM W3 intranet site for part number
look-ups. To use this tool, you must have a valid IBM intranet username and password.
There are also system requirements as shown on the website when you log in.
To access part number lists using Focal Point, you must provide:
• The Machine Type of the system (3561 for the IBM PureData System for Analytics
  N3001-001, 3567 for the IBM PureData System for Analytics N3001-002 and larger)
• A Model Number is recommended, but not required:
  • ALL
• Select the country location
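
The Machine Type rule in the list above can be expressed as a small lookup. This Python sketch is illustrative only: the function name machine_type and the assumption that model names are written as N3001-nnn with a numeric suffix are not taken from Focal Point.

```python
def machine_type(model: str) -> str:
    """Return the Machine Type for a model name such as 'N3001-002'."""
    prefix, sep, suffix = model.strip().upper().partition("-")
    if prefix != "N3001" or not sep or not suffix.isdigit() or int(suffix) < 1:
        raise ValueError(f"not an N3001 model: {model!r}")
    # 3561 applies to the N3001-001; 3567 applies to N3001-002 and larger.
    return "3561" if int(suffix) == 1 else "3567"


if __name__ == "__main__":
    for name in ("N3001-001", "N3001-002", "N3001-010"):
        print(name, machine_type(name))
```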
To locate part numbers for the IBM PureData System for Analytics N3001:
1. Navigate to https://extbasicfocal3.podc.sl.edst.ibm.com
2. Log in using your IBM intranet username and password.
The first time you log in, a page opens describing the system requirements, and a
separate window opens with the Focal Point search tool.
3. Select Extended Search.
4. Type in the Machine Type and Model Number, select your country, then select Search.
5. From the results, choose Parts Information.
A list of applicable parts for the Machine Type (and Model Number, if provided)
displays.


FRU Number List


The following table lists major replaceable components and their FRU numbers.
Table 1-1: Replacement Part Numbers

Component IBM FRU Number Lenovo FRU Number

H-Chassis 31R3308
S-Blade, Blade Server (HS23) 00YL046 00YL045
S-Blade, DIMM 46W0710
S-Blade, Processor 00Y2786
10Gb LOM Interposer 81Y9388
S-Blade, Database Accelerator Card 00X6711
S-Blade, Sidecar Expansion Chassis 68Y7498
AMM 47C2480
10Gb Switch 90Y9392
Midplane use command option use command option
Power Supply use command option use command option
Media Tray (not including optical drive) use command option use command option

SAS Cable - S-Blade to Lower Enclosures 00AN059
SAS Cable - S-Blade to Upper Enclosures 00J1896
SAS Cable - Enclosure to Enclosure 39R6530
10Gbe Cable - 1m 90Y9426
10Gbe Cable - 3m 59Y1942
10Gbe SFP+ Transceiver 46C9297
Fiber Optic Cable (multirack - Chassis Sw to Fabric Sw) 00X6791
Disk Enclosure (EXP2524)
Midplane Assembly 81Y9834
Disk Drive (SED) 00AK388
ESM 00W1241
Power Supply 45W8841
Host Servers
Planar (system board) - 8752 x3750-M4 47C9552
Planar (system board) - 7915 x3650-M4 00AM209
Planar (system board) - 5460 x3650-M4-HD 00AL055
Processor (system board) - 8752 x3750-M4 00D1974
Processor (system board) - 7915 x3650-M4 00Y2781
Processor (system board) - 5460 x3650-M4-HD 00Y2786
Thermal Grease 41Y9292
CPU Insertion Tool 94Y9955
DIMM (system board) - 8752 x3750-M4 00D5034
DIMM (system board) - 7915 x3650-M4 00D5034/00D5046
DIMM (system board) - 5460 x3650-M4-HD 00D5046
Power Supply - 8752 x3750-M4 69Y5954
Power Supply - 7915 x3650-M4 94Y8114
Power Supply - 5460 x3650-M4-HD 94Y8118
Disk Drive (5460 and 8752) 90Y8909
RAID Controller (8752 and 5460) N/A (built into system board)
RAID Flash (8752 and 5460) 44W3393


RAID Flash Backup SuperCap (8752, 5460, 7915) 00JY023


RAID Controller (7915) N/A (built into system board)
RAID Flash (7915) 46C9027
Quad-Port Ethernet Adapter 47C8210
10Gbe Dual Port Adapter (Intel) 49Y7962
10Gbe Dual Port Adapter (Embedded - 7915 and 49Y7982
SAS HBA 49Y8676
8Gb FC Dual-Port HBA (Emulex) 42D0500
KVM (IBM/Avocent)
Monitor/Tray 47C2521
Keyboard 00X6927
Switch 69Y6015
USB/Video/Ethernet Adapter 39M2909
Ethernet Cable 90Y3732
Terminators 39M2912
Power Cord Y-Adapter 39M5450
G8052 Management Switch (no P/Ss or Fans) 49Y7922
Power Supply 00D6271
Fan 00FE291
G8264 Fabric Switch (no P/Ss or Fans) 49Y7923
Power Supply 00D6271
Fan 00FE291
Upper and Lower PDUs 00AK104


Electrostatic Discharge Precautions


Handling static-sensitive devices.
Static electricity can damage the system and other electronic devices. To avoid damage,
keep static-sensitive devices in their static-protective packages until you are ready to install
them.

To reduce the possibility of electrostatic discharge (ESD), observe the following
precautions:
 Use a static strap.
 Limit your movement. Movement can cause static electricity to build up around you.
 Handle the device carefully, holding it by its edges or its frame.
 Do not touch solder joints, pins, or exposed printed circuitry.
 Do not leave the device where others can handle and damage it.
 While the device is still in its static-protective package, touch it to an unpainted metal
part of the chassis or any unpainted surface on any other grounded rack component for
at least two seconds. This drains static electricity from the package and from your
body.
 Remove the device from its package and install it immediately without setting down
the device. If it is necessary to set down the device, put it back into its static-protective
package.
 Take additional care when handling devices during cold weather. Heating reduces
indoor humidity and increases static electricity.

Contact IBM Netezza Support


As a best practice, contact IBM Netezza Support before you replace any hardware on the
IBM PureData System for Analytics N3001. IBM Netezza Support can assist you with the
procedure. You should know the MTM numbers for your system, as shown on the MTM label
(see Figure 1-2). You can contact IBM Netezza Support at the IBM Support Portal at:
http://www.ibm.com/support. Log in using your IBM ID and password. If you are not a registered
user, you need to register and request an STC (Site Technical Contact) to approve access to
the online tool.
If you cannot access the web site, you can contact IBM Netezza Support at the following
telephone numbers (you need your ICN (IBM Customer Number)):
 USA and Canada Toll-Free: +1.800.IBM.SERV (+1.800.426.7378)
 World-wide telephone listing for Support: http://www.ibm.com/planetwide
Dial the Software Support number and ask for Netezza Appliance Support.



CHAPTER 2
Replacing S-Blade Components
What’s in this chapter
 Restricted Environment (Online) - Replace a Blade Server, Battery, or DAC
 Command Line Interface (Online) Replacement Procedure
 Offline Replacement Procedure
 Troubleshooting the Replacement Process

The S-Blade of the IBM PureData System for Analytics N3001 is an assembly of an
IBM HS23 blade server and an expansion unit with two Netezza Database
Accelerator Cards (DACs).
Note: The N3001-001 system does not use S-Blades.
Table 2-1: Components Covered by this Procedure

Component                             IBM FRU    Lenovo FRU   Procedure           Notes
Blade Server Planar                   00AE749    00MM273      Online              See Note 1
Blade Server DIMM                     46W0710                 Online or Offline
Blade Server Processor                00Y2786                 Online or Offline
Blade Server Battery                  33F8354                 Online
10Gb LOM Interposer                   81Y9388                 Online or Offline
DAC                                   00X6711                 Online
Expansion Unit - Sidecar (w/o DACs)   68Y7498                 Online or Offline   See Note 2

Note 1: When replacing the blade planar, you must have on hand two sets of media and the
accompanying README for instructions on booting the blade and updating the embedded
Emulex firmware. The media and documentation are available from Fix Central:
- 4.2.0.5-IM-Netezza-HOSTFW-HS23, and
- FDT Support Tools 2.0.0.5.
Use the downloaded 4.2.0.5-IM-Netezza-HOSTFW-HS23 ISO to create a bootable DVD.


Note 2: When replacing the Expansion Unit, do not re-use the PCI riser from the original
Expansion Unit. Always use the PCI riser from the replacement Expansion Unit.
The replacement procedure included in this chapter must be followed when replacing any
component of the S-Blade, whether that component is part of the blade server (planar,
10Gb LOM Interposer, DIMM, processor, or CMOS battery) or part of the Sidecar
Expansion.
This procedure is written with the intent of keeping the Netezza system online. If the failed
component is one of the following, it must be replaced with the system online:
 Blade server planar
 Blade server CMOS battery
 DAC (see Table 2-1 for replacement requirements)

If the failed component is one of the following, it may be replaced with the system offline,
provided that the customer agrees to take the system offline for the duration of the procedure:
 DIMM
 Blade server processor
 10Gb LOM Interposer
 Expansion Unit (Sidecar)

When re-seating an S-Blade, the blade must be in a failover state or the NPS system must
be stopped.

For reference, the mechanical steps for removal and replacement of components located in
the blade server and sidecar (expansion unit) are documented in IBM BladeCenter HS23
Problem Determination and Service Guide.
DO NOT use the standard USB thumb drive to update firmware on blade components. Use
only the media described in this procedure.

This procedure requires the user to have root access.

The estimated time to perform this procedure is 90 to 240 minutes, depending on
which components require replacement, ease of access to the system, and familiarity with
NPS and the Netezza system.
Before you begin the S-Blade replacement process, make certain that you have contacted
Netezza Customer Service and that a service event plan is in place.
This procedure must be run from the N3001 active host.


Figure 2-1: S-Blade Assembly (callouts: Blade Server (HS23), Blade Server Planar, Blade
Server Memory, Blade Server Processors, DAC Cards, Sidecar Expansion Unit)

Restricted Environment (Online) - Replace a Blade Server, Battery, or DAC


Use this procedure to replace the failed S-Blade component only if the customer has
installed and enabled the Restricted Environment as part of NPS 7.2.0.5 or later.
In addition to the following procedure, an update tool in FDT Support Tools 2.0.0.5 is
needed to update the Emulex firmware on the S-Blade. This firmware update cannot be
performed within the Restricted Environment, and must be accomplished at the end of this
procedure for the S-Blade to be functional. See Tech Note 1992012 for more details.

Note: The customer is responsible for enabling and configuring the Restricted Environment
as part of NPS 7.2.0.5 or later.

If Call Home is enabled on this system, the System Manager will report this activity and
create PMRs. The customer must be made aware of this so that the PMRs can be closed
when filed.

1. Read the safety information that begins on page v.


2. Log into the system and enter the Restricted Environment:
login as: nzibmsupport13
When prompted, type the password (default is nzibmsupport13).
Example output:


NZ support actions
1 Replace Storage Disk
2 Replace SPU
3 Quit
>
3. Select option 2 from the menu:
> 2
Example output:
Logging to file /nz/var/log/replacespu/replacespu20141222151546.log

Please enter either the hwid or spa/slot of the SPU to be replaced


Provide either spa/slot or hwid format
spa/slot example for spu0105
1/5
hwid example for spu with hwid 1063
1063
You can also provide multiple spus with space or comma seperated
input
1/5 1/7 3/5

>
4. Type the S-Blade location information or HWID identified in step 9 on page 2-19.
For example:
> 1/13
Where 1 is the SPA number and 13 is the S-Blade slot number.
Or:
> 1674
Where 1674 is the HWID of the failed S-Blade.
Example output:
Turning on location LED for SPU 1 13
Location LED turned on for SPU 1 13
Retrieving blade product name for spa 1 slot 13
Retrieving blade serial number for spa 1 slot 13
-
Please replace the blade(s) in
SPA SLOT
1 13

- Using the illuminated LED, locate the failed blade(s) being


replaced.
- Remove the failed blade.
- Perform required component replacement as required (NIC, DAC,
etc)
- Insert the new replacement blade.
- Do not connect the SAS Cable to the new blade until instructed.
- Upon insertion, blade LED will 'fast blink' as the blade
initializes. This may take several minutes..
- Wait for LED to enter 'slow blink' mode.
- Verify Mach Type / Model and FRU serial # (ssh mm0xx 'info -T
blade[x]')

Please remember to remove SAS cables before replacing the blade.


Press RETURN when ready (or ^C to quit) ...
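The input formats that the replacespu prompt describes (spa/slot pairs, HWIDs, or several of either separated by spaces or commas) can be illustrated with a short parser. This is only a sketch of the format as shown in the example output, useful for checking an entry before typing it; it is not the parsing code of the replacespu script, and the function name is hypothetical.

```python
import re


def parse_spu_ids(entry: str) -> list:
    """Parse an entry such as '1/5 1/7 3/5' or '1063' into tagged tuples."""
    tokens = [t for t in re.split(r"[,\s]+", entry.strip()) if t]
    parsed = []
    for token in tokens:
        if re.fullmatch(r"\d+/\d+", token):
            spa, slot = token.split("/")
            parsed.append(("spa/slot", int(spa), int(slot)))
        elif re.fullmatch(r"\d+", token):
            parsed.append(("hwid", int(token)))
        else:
            raise ValueError(f"not a spa/slot pair or HWID: {token!r}")
    return parsed


print(parse_spu_ids("1/5 1/7 3/5"))
print(parse_spu_ids("1674"))
```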


The replacespu script locates and lights the S-Blade’s blue location LED, prompts you to
replace the hardware (providing the SPA and slot numbers), instructs you to remove the SAS
cabling, and asks you to press RETURN when ready to proceed.

Do NOT press RETURN at this point.

Continue with the following steps.

Return to the replacespu session at step 14.

Observe Electrostatic Discharge (ESD) precautions when handling electronic components.
ESD precautions are described in “Electrostatic Discharge Precautions” on page 1-17.

5. Making note of cable locations, so that they can be replaced in the same locations,
unplug the SAS cables from the S-Blade sidecar.
6. Remove the S-Blade assembly from the chassis.
Note: It is NOT necessary to power down the blade.

Only re-insert the S-Blade into the chassis as a complete Blade Server/Expansion Sidecar
unit. DO NOT insert just the blade server into the chassis.


Figure 2-2: S-Blade Removal (callouts: Blade Server, Ejector Handles, Sidecar with DACs)

7. Separate the sidecar with the Database Accelerator Cards and associated hardware
from the blade processor card.
Note: Make note of the MTM and serial number of the original blade. These numbers
are located on a label on the side of the blade chassis.

8. As necessary, move components such as 10Gb LOM Interposer, DIMMs, and CPUs to
the replacement card.
9. If replacing a Database Accelerator Card:
a. Remove the cover from the sidecar assembly.
b. Disconnect the ribbon cable from the expansion card:


Figure 2-3: Disconnecting the Ribbon Cable

c. On the underside of the tray, press the blue release latch and slide the expansion
card/riser assembly out of the tray:

Figure 2-4: Sliding Expansion Card/Riser Assembly out of Tray (callouts: DAC1, DAC2, Top
Edge of Sidecar when inserted into chassis)


d. Release the adapter-retention latch and remove the failed DAC from the Riser
Assembly.
Note that DAC1 is in the top PCI slot and DAC2 is in the bottom PCI slot of the
sidecar. Looking at the serial number on the DAC, verify that you are replacing the
failed DAC listed in the output shown in step 9.
DAC cards must be inserted into the Riser Assembly in a specific sequence:
 Always insert DAC1 into the Riser Assembly first, followed by DAC2.
 If replacing DAC1, also remove DAC2 from the Riser Assembly, install the
replacement DAC1, then reinstall DAC2.
 If replacing only DAC2, there is no need to remove DAC1 from the Riser
Assembly.
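The sequencing rule can be written down as a small lookup, which may help when preparing a service checklist. The sketch below only restates the rule given here; the function name and step wording are illustrative.

```python
def dac_swap_steps(failed_dac: int) -> list:
    """Return the handling steps for replacing DAC1 or DAC2 (illustrative wording)."""
    if failed_dac == 1:
        # DAC1 must go in first, so DAC2 has to come out as well.
        return [
            "remove DAC1 and DAC2 from the Riser Assembly",
            "insert the replacement DAC1",
            "reinstall DAC2",
        ]
    if failed_dac == 2:
        # DAC1 stays in place; only DAC2 is exchanged.
        return [
            "remove DAC2 from the Riser Assembly",
            "insert the replacement DAC2",
        ]
    raise ValueError("failed_dac must be 1 or 2")


for step in dac_swap_steps(1):
    print(step)
```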

Figure 2-5: N3001 Sidecar with Two DACs (callouts: DAC1 Serial Number, DAC2)

e. Install the replacement DAC into the Riser Assembly.


Note: DAC cards must be inserted into the Riser Assembly in a specific sequence.
When installing a DAC:
 Always insert DAC1 into the Riser Assembly first, followed by DAC2.
 Properly align the DAC connector panel with the sidecar front panel and with
the alignment pin at the top of the connector panel (see Figure 2-6).
 Fully seat the DAC into the PCI connector by pressing firmly on the top of the
connector panel.
 Ensure that the adapter-retention latch is fully closed and locked.

Figure 2-6: Alignment Pin / Connector Panel (callouts: Alignment Pin, Connector Panel)

Figure 2-7: Installing DAC and Closing Latch (callouts: Insert 1st, Insert 2nd)

f. Slide the expansion card/riser assembly into the tray.


Figure 2-8: Sliding Expansion Card/Riser Assembly into Tray

g. Connect the ribbon cable to the expansion card.

Figure 2-9: Connecting the Ribbon Cable

h. Using care to ensure that the side tabs of the cover do not contact the DACs,
replace the cover for the sidecar assembly.
Ensure when replacing the cover that the latching tab does not push on the DAC card,
causing misalignment of the DAC cards in the assembly.


Figure 2-10: Top Cover Replacement (callouts: Cover Latching Tab, DAC)

10. Assemble the sidecar with the Database Accelerator Cards and associated hardware
onto the blade processor card.

Statement 21

CAUTION:
Hazardous energy is present when the blade is connected to the power source. Always
replace the blade cover before installing the blade.
11. Install the S-Blade assembly into the same slot in the chassis from which it was
removed.
Note: DO NOT plug the SAS cables into the sidecar until instructed to do so.

12. Temporarily connect the USB cable from the KVM USB/Video/Ethernet Adapter to the
AMM managing the chassis containing the replaced S-Blade. For multi-rack systems,
you may need a long CAT-5 cable for the USB Adapter to reach the AMM.

HS23 blade server planars coming out of FRU stock are likely to have a down-rev version of
Emulex firmware that prevents the blade from booting.
You must have on hand media for booting the blade and updating the embedded Emulex
firmware. This media is available from Fix Central: 4.2.0.5-IM-Netezza-HOSTFW-HS23.
Use the downloaded ISO to create a bootable DVD.


Included with the media on Fix Central is a README file intended for the IBM SSR
taking the media to the customer site. It is important for the SSR to read and under-
stand the content of the README to successfully complete the replacement process.
A copy of the README follows:

This bootable DVD must be used as part of the HS23 S-Blade planar or CMOS
Battery replacement in IBM PureData Systems for Analytics N3001.
NOTE: Ensure SAS cables are not attached to the SPU before booting this
DVD
*** The HS23 Problem Determination and Service Guide is used only for ***
*** the mechanical steps of removing the components. DO NOT use the ***
*** standard USB thumb drive during the "replacespu" blade replacement ***
*** procedure. ***
The bootable DVD is loaded after the replacement blade is inserted into
the H-Chassis.
WARNING: DO NOT use this DVD on any S-Blades other than HS23 (MT 7875).
Note: This DVD is not required for processor or DIMM replacement on the
HS23.
It is required only for the planar or CMOS battery replacement.
************************************************************************
* *
* During the update process, the blade *
* reboots UP TO 7 times! *
* These 7 reboots can take up to 30 minutes! *
* The process continues after the DVD ejects. *
* *
* Wait 3-4 minutes after the DVD routine screen displays *
* “Ready" *
* before continuing the replacespu procedure *
* by pressing Enter *
* *
************************************************************************
Issues:
1) The HS23 FRU stock for the planar (7875AC1) may have a level of code
for the Emulex NIC that causes hangs during Media boot (DVD) or PXE
boot operations on systems with heavy network traffic. That code level
is 4.6.281.26.
NOTE: If booting the DVD hangs on the replacement planar, you may need
to disable the network ports on the gig switches for the DVD to boot
during the Emulex FW update.
If it is necessary to disable the network Ports on the gig switches
for the S-Blade to boot, details are in Chapter 2 of the
N3001 Replacement Procedures.
2) The Emulex firmware 10.2.261.36 and/or 10.2.377.29 may occasionally
"HANG" during reboot of an S-Blade when heavy workload is on fabric.
Emulex Firmware 10.2.377.59 corrects this "HANG" condition so the
S-Blade proceeds in booting.
However, it has been found that the 10.2.377.59 code may cause bus
contention between the management bus and the fabric bus.
The latest code, 10.2.377.65, corrects all the above issues.
3) The ASU setting for 64-Bit resource allocation is incorrect (disabled)
while the blade settings are in default state. This prevents the NPS
64-bit OS from loading on the blade. This setting must be changed to
Enable for replacespu to continue. The DVD corrects this setting.
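The firmware levels named in the README can be compared numerically to see whether a given level is older than 10.2.377.65. The sketch below assumes only that Emulex levels are four dot-separated integers, as they appear in the README; it is an illustration, not the check performed by the DVD script, and the function names are hypothetical.

```python
TARGET_LEVEL = "10.2.377.65"  # latest level named in the README above


def version_tuple(level: str) -> tuple:
    """Convert a dotted level such as '10.2.377.59' into a tuple of integers."""
    return tuple(int(part) for part in level.split("."))


def is_older_than_target(installed: str, target: str = TARGET_LEVEL) -> bool:
    """Return True if the installed level sorts before the target level."""
    return version_tuple(installed) < version_tuple(target)


# Levels mentioned in the README issues list.
for level in ("4.6.281.26", "10.2.261.36", "10.2.377.59", "10.2.377.65"):
    print(level, is_older_than_target(level))
```

Comparing integer tuples rather than strings avoids mis-ordering levels such as 4.6.281.26 and 10.2.261.36, which would sort incorrectly as plain text.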


13. Use the bootable 4.2.0.5-IM-Netezza-HOSTFW-HS23 DVD to configure the replacement
blade component(s):
a. Insert the bootable 4.2.0.5-IM-Netezza-HOSTFW-HS23 DVD into the H-Chassis
media tray.
b. Wait for the Power LED on the S-Blade to transition from the fast-blink pattern to the
slow-blink pattern (see Figure 2-11).
c. On the replacement S-Blade, press the KVM and Media Tray select buttons to assign
control to the replacement S-Blade (see Figure 2-11).
d. Press the Power button so that the S-Blade boots.
e. Proceed to the KVM to monitor the boot progress on the screen.
f. At the IBM System x splash screen press F12 to Select the Boot Device, and choose
the Media Tray CD/DVD ROM to boot using the 4.2.0.5-IM-Netezza-HOSTFW-HS23
DVD.

Figure 2-11: S-Blade Power-on LED and Control Buttons

Note: Various error messages may display while the DVD boots. They can be ignored
unless there is a kernel panic, in which case reseat the S-Blade and reboot. If the
kernel panic occurs again, you may need to replace the S-Blade.

If the blade does not boot to the DVD, see section “Hang during DVD Boot” on page 2-39.

Loading Customized Media . . .


Waiting for devices to come up . . .ipmi message handler version 39.2
ipmi_si: unknown parameter ‘IBF_RETRY_TIMEOUT’
ipmi device interface
Bus 001 Device 004: ID 04b3:4003 IBM Corp.
mv: cannot stat ‘/etc/init.d/ibmasm.ibmusbasm’: No such file or
directory
mv: cannot stat ‘/sbin/ibmspup.ibmusbasm’: No such file or directory
mv: cannot stat ‘/sbin/ibmspdown.ibmusbasm’: No such file or directory
mv: cannot stat ‘/lib64/libsysSp.so.32.ibmusbasm’: No such file or

directory
Setting host name to tc-10-0-9-183

g. When completely booted, the Customized Media menu displays.


Loading Customized Media . . .

-- Calling nzprehook
Entering nzprehook.sh

1) CMOS Battery Replacement (Updates SMBOS settings only)


2) 7875 Planar Replacement
(updates Emulex fw, Blade/Host fw, and SMBIOS settings; as
required)
--Enter Choice [1-2]

 If you replaced the CMOS battery, type 1. (Proceed to step i)


 If you replaced the blade planar, type 2. (Proceed to step h)
h. For planar replacements:
 The script prompts if you want to change the MTM (the MTM and serial num-
bers were recorded in step 7 on page 2-6):
This is your existing MTM: 7875HQ1. Do you want to change it? [y/n]

Type y to change the MTM to 7875AC1, or type n if the MTM is already set to
7875AC1.
 If you typed y, you are prompted for the 7 digit type number.
Enter new 7 digit type number ie: 7875AC1

Type the MTM: 7875AC1


Changing VPD information now... Please wait
IBM Advanced Settings Utility version 9.61.85A
Licensed Materials - Property of IBM
(C) Copyright IBM Corp. 2007-2014 All Rights Reserved
Successfully discovered the IMM via SLP.
Discovered IMM at IP address 169.254.95.118
Connected to IMM at IP address 169.254.95.118
SYSTEM_PROD_DATA.SysInfoProdName=7875AC1
Waiting for command completion status.
Command completed successfully.

 The script prompts if you want to change the serial number:


This is your existing Serial Number: 1234565. Do you want to change
it? [y/n]

Type y to change the serial number.


 You are prompted for the 7 digit serial number of the original planar.
Enter new 7 digit serial number ie: 12345657

The serial number is located on a label on the original blade chassis.


Type the serial number. For example: 06LPXX1
Changing VPD information now... Please wait
IBM Advanced Settings Utility version 9.61.85A
Licensed Materials - Property of IBM
(C) Copyright IBM Corp. 2007-2014 All Rights Reserved
Successfully discovered the IMM via SLP.
Discovered IMM at IP address 169.254.95.118
Connected to IMM at IP address 169.254.95.118
SYSTEM_PROD_DATA.SysInfoSerialNum=06kpxx1
Waiting for command completion status.
Command completed successfully.

 After updating the serial number, the script reboots the IMM (taking about four
minutes), and then reboots the blade.
Note: The VPD is updated, and at this point the blade reboots and restarts at the
beginning of the prompts, asking for MTM and serial numbers. You answer n to
those prompts this time.

When the blade reboot completes, the Customized Media menu displays again.
Loading Customized Media . . .

-- Calling nzprehook
Entering nzprehook.sh

1) CMOS Battery Replacement (Updates SMBOS settings only)


2) 7875 Planar Replacement
(updates Emulex fw and SMBIOS settings; as required)
--Enter Choice [1-2]

Type: 2
Type n at the MTM prompt (assuming the MTM is set correctly).
Type n at the serial number prompt (assuming the serial number is set correctly).
 The script checks the Emulex firmware, and updates if needed:
**************************************************************
Emulex Adapter updater is Enabled
Checking Revision to determine if update is required....
**************************************************************
Emulex NIC updater for Machine Type 7875
The script assumes the 7875 is configured with
an Emulex adapter in device “eth0”

Forcing Emulex firmware update


ethtool -f eth0 oc11-10.2.377.65.ufi 0

Pay close attention to the screen output so that you notice any error that occurs during
the update. If an error does occur, contact IBM Netezza Support to determine corrective
action.

 The script configures the ASU settings.


-- Running ASU IMM configuration unattended mode

-- Executing ASU configuration script /toolscenter/asu-config.sh


7875 62 /tmp/bomc.log -- /toolscenter/asu/asu64 restore /
toolscenter/7875-62-asu.config 2>&1 | tee -a /tmp/bomc.log

 When the update completes, the following message displays:


Waiting for command completion status>
Command completed successfully.
-- Machine type 7875 finished applying updates
 The routine continues and asks if you want to save log files to a USB Flash
Drive. (At least 2.5 MB space required on the USB flash drive.)
#################################################################

Log copy to USB Flash Drive is enabled

-- Only 1 flash drive can be inserted


-- The contents of /tmp and /var/log will be tarred into a
single file and written to the flash drive. Make sure it
has enough room. A minimum of 2.5 MB is required.

#################################################################

Note: Only ONE USB flash drive can be plugged into the media tray.

-- Do you want to save the logs to a USB Flash Drive? (y/n)

Type Y to write the log files to a USB flash drive.

*** Writing logs to USB Flash Drive if one is present ***

Note: If a USB flash drive is not plugged into the media tray, you are prompted to
insert one and retry (type R to retry or any other key to exit).

USB Thumb Drive not found. Insert thumb drive in USB port.
-- Type R to retry, any other key to exit:

 When completed, the blade reboots (the blade reboots up to 7 times, taking up
to 30 minutes).
i. For CMOS battery replacements:
 The script now updates the ASU settings.
-- Running ASU IMM configuration unattended mode

-- Executing ASU configuration script /toolscenter/asu-config.sh


7875 62 /tmp/bomc.log -- /toolscenter/asu/asu64 restore /
toolscenter/7875-62-asu.config 2>&1 | tee -a /tmp/bomc.log

 At this point, the routine continues and asks if you want to save log files to a
USB Flash Drive. (At least 2.5 MB space required on the USB flash drive.)
#################################################################

Log copy to USB Flash Drive is enabled

-- Only 1 flash drive can be inserted


-- The contents of /tmp and /var/log will be tarred into a
single file and written to the flash drive. Make sure it
has enough room. A minimum of 2.5 MB is required.

#################################################################

Note: Only ONE USB flash drive can be plugged into the media tray.

-- Do you want to save the logs to a USB Flash Drive? (y/n)

Type N to skip writing the log files to a USB flash drive.


Type Y to write the log files to a USB flash drive.

*** Writing logs to USB Flash Drive if one is present ***

Note: If a USB flash drive is not plugged into the media tray, you are prompted to
insert one and retry (type R to retry or any other key to exit).

USB Thumb Drive not found. Insert thumb drive in USB port.
-- Type R to retry, any other key to exit:

 When completed, the blade reboots (the blade reboots up to 7 times, taking up
to 30 minutes).
j. When the console displays Ready:
 Remove the USB flash drive (if used)
 Remove the DVD from the media tray
 Disconnect the KVM cable that was temporarily connected to the AMM
 Wait four additional minutes before proceeding
The SAS cables must NOT be connected at this point.

14. Resume the replacespu procedure to complete the blade service activity.
From this point on, replacespu may take up to 90 minutes to complete the updates of
the S-Blade firmware and the FPGA code.
Press Enter (or RETURN) for the replacespu script to continue.
Example output:
Is the SAS Cable still disconnected from the replacement blade? (y/
n) [y]
15. Ensure that the SAS cable is still disconnected from the replacement S-Blade and type y.
When the script completes, the script displays:

Completed run of replacespu utility

New SPU(s) in the following spa/slots are now available


SPA SLOT HWID
1 13 1677

Use 'nzhw rebalance' to make these SPUs 'Active'


Note this WILL pause your system for ~5 minutes.
For additional information see log file : /nz/var/log/replacespu
replacespu20141222151546.log.gz
Please remember to reattach the SAS cables before rebalancing the
system.

Logfile: /nz/var/log/replacespu/replacespu20141222151546.log.gz

NZ support actions
1 Replace Storage Disk
2 Replace SPU
3 Quit
>
16. Select option 3 from the menu:
> 3
Reattach the SAS cables to their original positions.
The output from replacespu indicates that it is necessary to rebalance the blades. However, a
current issue with Emulex firmware requires that an update be made using a tool in FDT
Support Tools 2.0.0.5. After the firmware is updated, the NPS system must be stopped and
restarted (using nzstop and nzstart). No rebalance is needed in this procedure. See step 21
on page 2-35 through step 23 on page 2-35. The nzstop/nzstart step must be scheduled
with the customer for a time when the system can be taken offline.

17. The replacement is complete after the Emulex firmware is updated as instructed in
FDT Support Tools 2.0.0.5 and the NPS system is stopped/started.


Command Line Interface (Online) Replacement Procedure


As part of the following procedure, an update tool in FDT Support Tools 2.0.0.5 is needed
to update the Emulex firmware on the S-Blade. This firmware update must be
accomplished at the end of this procedure for the S-Blade to be functional. See Tech Note
1992012 for more details.

To replace S-Blade components on the N3001 system, follow these steps:


1. Read the safety information that begins on page v.
2. Log into the active host as user nz.
3. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status

b. If enabled, disable it:


[nz@nzhost1 ~]$ nzcallhome -off

4. Check the state of the Netezza system:


[nz@nzhost1 ~]$ nzstate
System state is 'Online'.
5. If the system state is not online, start the system using the command:
[nz@nzhost1 ~]$ nzstart

6. Confirm that the system is online by using the command:

[nz@nzhost1 ~]$ nzstate
System state is 'Online'.
7. Check for any data slice issues with the command:
[nz@nzhost1 ~]$ nzds -issues
Make note of any issues for comparison later in the procedure.
8. Check for any partitioning issues with the command:
[nz@nzhost1 ~]$ nzspupart -issues
Make note of any issues for comparison later in the procedure.
9. Identify the failed S-Blade component that requires replacement, noting the HWID:
[nz@nzhost1 ~]$ nzhw -detail -issues
Description HW ID Location Role State Serial number Version Detail
----------- ----- --------------- ------ ----- ------------- ------- ---------
SPU 1607 spa1.spu3 Failed
-or-
DAC 1586 spa1.spu7.dac1 Active Ok xxxxxx173534 1
FPGAs;
Note the failed component, its location, and its serial number.


If you are using this online replacement process to replace a “failing” S-Blade or S-Blade
component such as:
• DIMM
• Processor
• BladeCenter Expansion Blade
the command in step 9 does not return any values.
If the S-Blade is not already in a failed state, you must manually fail over the S-Blade
before you can replace its components.
To fail over an active but problematic S-Blade that has components you want to replace,
follow these steps.
This procedure assumes that you know the hardware ID of the problematic S-Blade.
If the S-Blade is already failed, skip to step b.
a. Fail over the problem S-Blade:
[nz@nzhost1 ~]$ nzhw failover -id 1607
Are you sure you want to proceed (y|n)? [n] y
When you fail over the S-Blade, the system performs a state transition to reconfigure the
topology of the system; that is, the system redirects the data slices owned by the now failed
S-Blade to the other active S-Blades in the same chassis. The transition process can take
approximately 15 minutes to complete. Wait for the system to transition back into the
Online state before you continue. (Use the nzstate command to confirm the Online state.)
b. Proceed to either “Restricted Environment (Online) - Replace a Blade Server,
Battery, or DAC” on page 2-3 or “Replace a Blade Server, Battery, or DAC” on
page 2-20.
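
The location reported by nzhw in step 9 (for example, spa1.spu3) carries the same SPA and SPU numbers that replacespu expects in its -s x/y argument. The following is a minimal sketch of that mapping, assuming locations always follow the spaN.spuM or spaN.spuM.dacK pattern shown in the example output. The function name loc_to_spa_slot is hypothetical and is not part of the NPS tools.

```shell
#!/bin/sh
# Convert an nzhw location such as "spa1.spu3" or "spa1.spu7.dac1"
# into the x/y form used by "replacespu -s".
# (Hypothetical helper for illustration; not shipped with NPS.)
loc_to_spa_slot() {
    loc=$1
    case $loc in
        spa[0-9]*.spu[0-9]*) ;;
        *) echo "unrecognized location: $loc" >&2; return 1 ;;
    esac
    spa=${loc#spa}        # drop the leading "spa"
    spa=${spa%%.*}        # keep the digits before the first dot
    spu=${loc#*.spu}      # drop everything through ".spu"
    spu=${spu%%.*}        # keep the digits before any ".dacK" suffix
    printf '%s/%s\n' "$spa" "$spu"
}

# Example: loc_to_spa_slot spa1.spu3
```

The result can be checked against the hardware ID form of the command (replacespu -id) before either is run.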

Replace a Blade Server, Battery, or DAC


Use this procedure to replace the failed S-Blade component in a standard Netezza
environment.
1. Open a new session and log in as root.
2. Run the replacespu script (which is required to replace a failed Blade server planar,
Blade server battery, or DAC):
[root@nzhost ~]# /nz/kit/bin/adm/tools/replacespu -s x/y
Where x is the SPA number and y is the SPU number of the component identified in
step 9. For example:
[root@nzhost ~]# /nz/kit/bin/adm/tools/replacespu -s 1/3
Alternatively, specify the hardware ID:
[root@nzhost ~]# /nz/kit/bin/adm/tools/replacespu -id 1607
The replacespu script locates and lights the S-Blade’s blue location LED and
prompts you to replace the hardware, providing the SPA and SLOT numbers. It may also
instruct you to remove SAS cabling and to press RETURN when ready to proceed.

Do NOT press RETURN at this point.

Continue with the following steps, and return to the replacespu session at step 14.


Note: If the replacespu script exits after timing out (up to 60 minutes), remove the
replacement S-Blade and re-insert the board, then run the replacespu script again. If
the command times out again, contact IBM Netezza Support for assistance.

Observe Electrostatic Discharge (ESD) precautions when handling electronic components.
ESD precautions are described in “Electrostatic Discharge Precautions” on page 1-17.

3. Note the cable locations so that the cables can be reconnected in the same positions,
then unplug the SAS cables from the S-Blade sidecar.
4. Remove the S-Blade assembly from the chassis.
Note: It is NOT necessary to power down the blade.

Only re-insert the S-Blade into the chassis as a complete Blade Server/Expansion Sidecar
unit. DO NOT insert just the blade server into the chassis.

Figure 2-12: S-Blade Removal (callouts: Blade Server, Ejector Handles, Sidecar with DACs)

5. Separate the sidecar with the Database Accelerator Cards and associated hardware
from the blade processor card.
Note: Make note of the MTM and serial number of the original blade. These numbers
are located on a label on the side of the blade chassis.

6. As necessary, move components such as 10Gb LOM Interposer, DIMMs, and CPUs to
the replacement card.
7. If replacing a Database Accelerator Card:


a. Remove the cover from the sidecar assembly.


b. Disconnect the ribbon cable from the expansion card:

Figure 2-13: Disconnecting the Ribbon Cable

c. On the underside of the tray, press the blue release latch and slide the expansion
card/riser assembly out of the tray:

Figure 2-14: Sliding Expansion Card/Riser Assembly out of Tray (callouts: DAC1, DAC2,
Top Edge of Sidecar when inserted into chassis)

d. Release the adapter-retention latch and remove the failed DAC from the Riser
Assembly.
Note that DAC1 is in the top PCI slot and DAC2 is in the bottom PCI slot of the
sidecar. Looking at the serial number on the DAC, verify that you are replacing the
failed DAC listed in the output shown in step 9.
The DACs must be inserted into the Riser Assembly in a specific sequence:
• Always insert DAC1 into the Riser Assembly first, followed by DAC2.
• If replacing DAC1, also remove DAC2 from the Riser Assembly, install the
replacement DAC1, and then reinstall DAC2.
• If replacing only DAC2, there is no need to remove DAC1 from the Riser
Assembly.

Figure 2-15: N3001 Sidecar with Two DACs (callouts: DAC1 Serial Number, DAC2)

e. Install the replacement DAC into the Riser Assembly.


Note: When installing the DAC, observe the following:

• Always insert DAC1 into the Riser Assembly first, followed by DAC2.
• Properly align the DAC connector panel with the sidecar front panel and with
the alignment pin at the top of the connector panel (see Figure 2-16).
• Fully seat the DAC into the PCI connector by pressing firmly on the top of the
connector panel.
• Ensure that the adapter-retention latch is fully closed and locked.

Figure 2-16: Alignment Pin / Connector Panel

Figure 2-17: Installing DAC and Closing Latch (callouts: Insert 1st, Insert 2nd)

f. Slide the expansion card/riser assembly into the tray.

Figure 2-18: Sliding Expansion Card/Riser Assembly into Tray


g. Connect the ribbon cable to the expansion card.

Figure 2-19: Connecting the Ribbon Cable

h. Using care to ensure that the side tabs of the cover do not contact the DACs,
replace the cover for the sidecar assembly.
When replacing the cover, ensure that the latching tab does not push on a DAC, which
would misalign the DACs in the assembly.

Figure 2-20: Top Cover Replacement (callouts: Cover Latching Tab, DAC)

8. Assemble the sidecar with the Database Accelerator Cards and associated hardware
onto the blade processor card.


Statement 21

CAUTION:
Hazardous energy is present when the blade is connected to the power source. Always
replace the blade cover before installing the blade.
9. Install the S-Blade assembly into the same slot in the chassis from which it was
removed.
Note: DO NOT plug the SAS cables into the sidecar until instructed to do so.

10. Temporarily connect the USB cable from the KVM USB/Video/Ethernet Adapter to the
AMM managing the chassis containing the replaced S-Blade. For multi-rack systems,
you may need a long CAT-5 cable for the USB Adapter to reach the AMM.
HS23 blade server planars coming out of FRU stock are likely to have a down-rev version of
Emulex firmware that prevents the blade from booting.
You must have on hand media for booting the blade and updating the embedded Emulex
firmware. This media is available from Fix Central: 4.2.0.5-IM-Netezza-HOSTFW-HS23.
Use the downloaded ISO to create a bootable DVD.

Included with the media on Fix Central is a README file intended for the IBM SSR
taking the media to the customer site. It is important for the SSR to read and understand
the content of the README to successfully complete the replacement process.
A copy of the README follows:

This bootable DVD must be used as part of the HS23 S-Blade planar or CMOS
Battery replacement in IBM PureData Systems for Analytics N3001.
NOTE: Ensure SAS cables are not attached to the SPU before booting this
DVD
*** The HS23 Problem Determination and Service Guide is used only for ***
*** the mechanical steps of removing the components. DO NOT use the ***
*** standard USB thumb drive during the "replacespu" blade replacement ***
*** procedure. ***
The bootable DVD is loaded after the replacement blade is inserted into
the H-Chassis.
WARNING: DO NOT use this DVD on any S-Blades other than HS23 (MT 7875).
Note: This DVD is not required for processor or DIMM replacement on the
HS23.
It is required only for the planar or CMOS battery replacement.
************************************************************************
* *
* During the update process, the blade *
* reboots UP TO 7 times! *
* These 7 reboots can take up to 30 minutes! *
* The process continues after the DVD ejects. *
* *
* Wait 3-4 minutes after the DVD routine screen displays *
* “Ready" *
* before continuing the replacespu procedure *
* by pressing Enter *
* *
************************************************************************
Issues:
1) The HS23 FRU stock for the planar (7875AC1) may have a level of code
for the Emulex NIC that causes hangs during Media boot (DVD) or PXE
boot operations on systems with heavy network traffic. That code level
is 4.6.281.26.
NOTE: If booting the DVD hangs on the replacement planar, you may need
to disable the network ports on the gig switches for the DVD to boot
during the Emulex FW update.
If it is necessary to disable the network Ports on the gig switches
for the S-Blade to boot, details are in Chapter 2 of the
N3001 Replacement Procedures.
2) The Emulex firmware 10.2.261.36 and/or 10.2.377.29 may occasionally
"HANG" during reboot of an S-Blade when heavy workload is on fabric.
Emulex Firmware 10.2.377.59 corrects this "HANG" condition so the
S-Blade proceeds in booting.
However, it has been found that the 10.2.377.59 code may cause bus
contention between the management bus and the fabric bus.
The latest code, 10.2.377.65, corrects all the above issues.
3) The ASU setting for 64-Bit resource allocation is incorrect (disabled)
while the blade settings are in default state. This prevents the NPS
64-bit OS from loading on the blade. This setting must be changed to
Enable for replacespu to continue. The DVD corrects this setting.

11. Use the bootable 4.2.0.5-IM-Netezza-HOSTFW-HS23 DVD to configure the replacement
blade component(s):
a. Insert the bootable 4.2.0.5-IM-Netezza-HOSTFW-HS23 DVD into the H-Chassis
media tray.
b. Wait for the Power LED on the S-Blade to transition from the fast-blink pattern to
the slow-blink pattern (see Figure 2-11).
c. On the replacement S-Blade, press the KVM and Media Tray select buttons to give
control to the replacement S-Blade (see Figure 2-11).
d. Press the Power button so that the S-Blade boots.
e. At the IBM System x splash screen press F12 to Select the Boot Device, and choose
the Media Tray CD/DVD ROM to boot using the 4.2.0.5-IM-Netezza-HOSTFW-HS23
DVD.

Figure 2-21: S-Blade Power-on LED and Control Buttons

Note: Various error messages may display while the DVD boots. They can be ignored
unless there is a kernel panic, in which case reseat the S-Blade and reboot. If the
kernel panic occurs again, you may need to replace the S-Blade.

If the blade does not boot to the DVD, see section “Hang during DVD Boot” on page 2-39.

Loading Customized Media . . .


Waiting for devices to come up . . .ipmi message handler version 39.2
ipmi_si: unknown parameter ‘IBF_RETRY_TIMEOUT’
ipmi device interface
Bus 001 Device 004: ID 04b3:4003 IBM Corp.
mv: cannot stat ‘/etc/init.d/ibmasm.ibmusbasm’: No such file or
directory
mv: cannot stat ‘/sbin/ibmspup.ibmusbasm’: No such file or directory
mv: cannot stat ‘/sbin/ibmspdown.ibmusbasm’: No such file or directory
mv: cannot stat ‘/lib64/libsysSp.so.32.ibmusbasm’: No such file or
directory
Setting host name to tc-10-0-9-183


f. When completely booted, the Customized Media menu displays.


Loading Customized Media . . .

-- Calling nzprehook
Entering nzprehook.sh

1) CMOS Battery Replacement (Updates SMBOS settings only)


2) 7875 Planar Replacement
(updates Emulex fw, Blade/Host fw, and SMBIOS settings; as
required)
--Enter Choice [1-2]

• If you replaced the CMOS battery, type 1. (Proceed to step h.)
• If you replaced the server planar, type 2. (Proceed to step g.)
g. For planar replacements (the MTM and serial numbers were recorded in step 5 on
page 2-21):
• The script asks whether you want to change the MTM:
This is your existing MTM: 7875HQ1. Do you want to change it? [y/n]

Type y to change the MTM to 7875AC1, or type n if the MTM is already set to
7875AC1.
• If you typed y, you are prompted for the 7-digit type number.
Enter new 7 digit type number ie: 7875AC1

Type the MTM: 7875AC1


Changing VPD information now... Please wait
IBM Advanced Settings Utility version 9.61.85A
Licensed Materials - Property of IBM
(C) Copyright IBM Corp. 2007-2014 All Rights Reserved
Successfully discovered the IMM via SLP.
Discovered IMM at IP address 169.254.95.118
Connected to IMM at IP address 169.254.95.118
SYSTEM_PROD_DATA.SysInfoProdName=7875AC1
Waiting for command completion status.
Command completed successfully.

• The script asks whether you want to change the serial number:


This is your existing Serial Number: 1234565. Do you want to change
it? [y/n]

Type y to change the serial number.


• You are prompted for the 7-digit serial number of the original planar.
Enter new 7 digit serial number ie: 12345657

The serial number is located on a label on the original blade chassis.


Type the serial number. For example: 06LPXX1


Changing VPD information now... Please wait


IBM Advanced Settings Utility version 9.61.85A
Licensed Materials - Property of IBM
(C) Copyright IBM Corp. 2007-2014 All Rights Reserved
Successfully discovered the IMM via SLP.
Discovered IMM at IP address 169.254.95.118
Connected to IMM at IP address 169.254.95.118
SYSTEM_PROD_DATA.SysInfoSerialNum=06kpxx1
Waiting for command completion status.
Command completed successfully.

• After updating the serial number, the script reboots the IMM (taking about four
minutes), and then reboots the blade.
Note: The VPD is now updated. After the reboot, the script restarts at the
beginning of the prompts and asks for the MTM and serial number again. Answer n to
those prompts this time.

When the blade reboot completes, the Customized Media menu displays again.
Loading Customized Media . . .

-- Calling nzprehook
Entering nzprehook.sh

1) CMOS Battery Replacement (Updates SMBOS settings only)


2) 7875 Planar Replacement
(updates Emulex fw, Blade/Host fw, and SMBIOS settings; as
required)
--Enter Choice [1-2]

Type: 2
Type n at the MTM prompt (assuming the MTM is set correctly).
Type n at the serial number prompt (assuming the serial number is set correctly).
• The script checks the Emulex firmware and updates it if needed:
**************************************************************
Emulex Adapter updater is Enabled
Checking Revision to determine if update is required....
**************************************************************
Emulex NIC updater for Machine Type 7875
The script assumes the 7875 is configured with
an Emulex adapter in device “eth0”

Forcing Emulex firmware update


ethtool -f eth0 oc11-10.2.377.65.ufi 0

Pay attention to the screen output so that you notice if an error occurs during
the update. If an error does occur, contact IBM Netezza Support to determine
corrective action.

• The script configures the ASU settings.

-- Running ASU IMM configuration unattended mode

-- Executing ASU configuration script /toolscenter/asu-config.sh


7875 62 /tmp/bomc.log -- /toolscenter/asu/asu64 restore /toolscenter/7875-62-asu.config 2>&1 | tee -a /tmp/bomc.log

• The routine continues and asks if you want to save log files to a USB flash
drive. (At least 2.5 MB of free space is required on the USB flash drive.)
#################################################################

Log copy to USB Flash Drive is enabled

-- Only 1 flash drive can be inserted


-- The contents of /tmp and /var/log will be tarred into a
single file and written to the flash drive. Make sure it
has enough room. A minimum of 2.5 MB is required.

#################################################################

Note: Only ONE USB flash drive can be plugged into the media tray.

-- Do you want to save the logs to a USB Flash Drive? (y/n)

Type N to skip writing the log files to a USB flash drive.

Type Y to write the log files to a USB flash drive.

*** Writing logs to USB Flash Drive if one is present ***

Note: If a USB flash drive is not plugged into the media tray, you are prompted to
insert one and retry (type R to retry or any other key to exit).

USB Thumb Drive not found. Insert thumb drive in USB port.
-- Type R to retry, any other key to exit:

• When completed, the blade reboots (the blade reboots up to 7 times, taking up
to 30 minutes).
h. For CMOS battery replacements:
• The script now updates the ASU settings.
-- Running ASU IMM configuration unattended mode

-- Executing ASU configuration script /toolscenter/asu-config.sh


7875 62 /tmp/bomc.log -- /toolscenter/asu/asu64 restore /toolscenter/7875-62-asu.config 2>&1 | tee -a /tmp/bomc.log

• At this point, the routine continues and asks if you want to save log files to a
USB flash drive. (At least 2.5 MB of free space is required on the USB flash drive.)
#################################################################
Log copy to USB Flash Drive is enabled
-- Only 1 flash drive can be inserted
-- The contents of /tmp and /var/log will be tarred into a
single file and written to the flash drive. Make sure it
has enough room. A minimum of 2.5 MB is required.
#################################################################


Note: Only ONE USB flash drive can be plugged into the media tray.

-- Do you want to save the logs to a USB Flash Drive? (y/n)

Type N to skip writing the log files to a USB flash drive.


Type Y to write the log files to a USB flash drive.

*** Writing logs to USB Flash Drive if one is present ***

Note: If a USB flash drive is not plugged into the media tray, you are prompted to
insert one and retry (type R to retry or any other key to exit).

USB Thumb Drive not found. Insert thumb drive in USB port.
-- Type R to retry, any other key to exit:

• When completed, the blade reboots (the blade reboots up to 7 times, taking up
to 30 minutes).
i. When the console displays Ready:
• Remove the USB flash drive (if used).
• Remove the DVD from the media tray.
• Disconnect the KVM cable that was temporarily connected to the AMM.
• Wait four additional minutes before proceeding.
j. Skip to step 14 on page 2-34.
The SAS cables must NOT be connected at this point.

12. When the console displays Ready, open a new session and check the state of the
S-Blade:
a. Ping the replacement S-Blade to ensure it is reachable:
[root@nzhost1 ~]# ping spuxxyy
Where xx is the SPA number and yy is the slot number for the replacement blade.
For example:
[root@nzhost1 ~]# ping spu0103
Type Ctrl-C to exit the ping session.
Example output:
PING spu0103.npsdomain (10.0.14.28) 56(84) bytes of data.
64 bytes from spu0103.npsdomain (10.0.14.28): icmp_seq=1 ttl=64 time=0.077 ms
64 bytes from spu0103.npsdomain (10.0.14.28): icmp_seq=2 ttl=64 time=0.070 ms
64 bytes from spu0103.npsdomain (10.0.14.28): icmp_seq=3 ttl=64 time=0.068 ms
64 bytes from spu0103.npsdomain (10.0.14.28): icmp_seq=4 ttl=64 time=0.069 ms
--- spu0103.npsdomain ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.068/0.071/0.077/0.003 ms

Note: DO NOT continue with this procedure until the ping command succeeds.
Contact IBM Netezza Support if you have issues at this point.

b. As user nz, to check the VPD, type the command:


[nz@nzhost1 ~]$ ssh mm0xx 'info -T blade[yy]'
Where xx is the SPA number and yy is the blade number for the replacement blade.
If no firmware information is available, wait another 5 minutes and re-issue the
command.
Verify that the Mach type/model and FRU serial number listed are accurate by reading
them back to the IBM SSR.
c. To check the BSMP, type the command:
[nz@nzhost1 ~]$ ssh mm0xx baydata -b y
Where xx is the SPA number and y is the blade number for the replacement blade.
The expected result: BSMP or supported
Unsupported indicates a failure. Reseat the S-Blade, reset the AMM, or fail over the
AMM (in that order).
d. To check the SOL, type the command:
[nz@nzhost1 ~]$ ssh mm0xx sol -T blade[yy]
Where xx is the SPA number and yy is the blade number for the replacement blade.
The expected result: ready
Not ready or not supported indicates a failure. Reseat the S-Blade, reset the AMM,
or fail over the AMM (in that order).
13. Next, perform these actions:
• Remove the USB flash drive (if used).
• Remove the DVD from the media tray.
• Disconnect the KVM cable that was temporarily connected to the AMM.
The SAS cables must NOT be connected at this point.

14. Resume the replacespu procedure to complete the blade service activity.
From this point on, replacespu may take up to 90 minutes to complete the updates of
the S-Blade firmware and the FPGA code.
Press Enter (or RETURN) for the replacespu script to continue.
Example output:
Is the SAS Cable still disconnected from the replacement blade? (y/n) [y]
15. Ensure that the SAS cable is still disconnected from the replacement S-Blade and type y.
When the script completes, it displays:

Completed run of replacespu utility

New SPU(s) in the following spa/slots are now available


SPA SLOT HWID
1 13 1677

Use 'nzhw rebalance' to make these SPUs 'Active'


Note this WILL pause your system for ~5 minutes.
For additional information see log file : /nz/var/log/replacespu
replacespu20141222151546.log.gz
Please remember to reattach the SAS cables before rebalancing the
system.

Logfile: /nz/var/log/replacespu/replacespu20141222151546.log.gz

The output from replacespu indicates that the blades must be rebalanced. However, a
current issue with the Emulex firmware requires that the firmware be updated using a tool
in FDT Support Tools 2.0.0.5. After the firmware is updated, the NPS system must be stopped
and restarted (using nzstop and nzstart). No rebalance is needed in this procedure. See
step 21 through step 23. Schedule the nzstop/nzstart step with the customer for a time
when the system can be taken offline.

16. When the replacespu script completes, plug the SAS cables into their original
connectors on the S-Blade DACs. Ensure that the cables are routed correctly and are secured
with the Velcro straps to the cable guide at the bottom of the rack door opening.
17. Return to the nz account session.
18. Perform the following steps to activate the S-Blade.
19. Verify that the S-Blade is now available as Spare.
[nz@nzhost1 ~]$ nzhw show -id nnnn
Where nnnn is the new hwid of the replacement SPU.
20. Run the diagnostic script that tests the blade settings:
[nz@nzhost1 ~]$ nzpush -s x/y diag run
Where x is the SPA number, and y is blade slot number.
spu0103: Start time: 2013_10_15_17_09_34
spu0103: 001.0 FPGA PCIe CFG Test------------------------->PASSED
spu0103: 001.1 PLX PCIe CFG Test-------------------------->PASSED
spu0103: 001.2 LSI SAS PCIe CFG Test---------------------->PASSED
spu0103: 002.0 FPGA 0 Memory Calibration Test------------->PASSED
spu0103: 002.1 FPGA 1 Memory Calibration Test------------->PASSED
spu0103: 003.0 FPGA 0 Bist POST Test---------------------->PASSED
spu0103: 003.1 FPGA 1 Bist POST Test---------------------->PASSED
spu0103: Done Executing loop 1

If any tests show as failed, troubleshoot the failed component and update its firmware,
then rerun this step.
21. From FDT Support Tools 2.0.0.5, load the tool for updating Emulex firmware
(described in the README for FDT Support Tools).
22. Update the Emulex firmware as described in the FDT Support Tools 2.0.0.5 README.
23. Stop, and then restart NPS:
[nz@nzhost1 ~]$ nzstop
When the system has stopped:
[nz@nzhost1 ~]$ nzstart

24. Continue to “Verification” on page 2-36.
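
Several commands in this procedure build names from the SPA and slot numbers: the S-Blade host name spuxxyy used with ping in step 12, and the AMM name mm0xx used with ssh. The following is a small sketch of that zero-padding, assuming two-digit padding as in the examples spu0103 and mm0xx. The function names are illustrative only and are not part of the NPS tools.

```shell
#!/bin/sh
# Build the S-Blade host name (spuxxyy) from SPA and slot numbers.
# Pass plain decimal numbers (1, 13), not zero-padded strings,
# because printf treats a leading 0 as octal.
# (Hypothetical helpers for illustration.)
spu_hostname() {
    printf 'spu%02d%02d\n' "$1" "$2"
}

# Build the AMM name (mm0xx) for a SPA number.
amm_hostname() {
    printf 'mm0%02d\n' "$1"
}

# Example for the S-Blade in SPA 1, slot 3:
#   ping "$(spu_hostname 1 3)"
```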


Verification
Check the system configuration.
1. Run the commands:
[root@nzhost1 ~]# cd /opt/nz/fdt
[root@nzhost1 ~]# ./sys_rev_check
If issues are noted in the sys_rev_check output (such as requiring a firmware update),
resolve the issues as described in the FDT User’s Guide, in the section “Resolve
sys_rev_check Issues,” and then rerun sys_rev_check to verify that issues are resolved.
Note: After updating the Emulex firmware using FDT Support Tools 2.0.0.5, the Ethernet
firmware is listed as [ABOVE]. This is expected and acceptable.

2. Check for any issues with the commands:


[nz@nzhost1 ~]$ nzhw -issues
[nz@nzhost1 ~]$ nzds -issues
[nz@nzhost1 ~]$ nzspupart -issues
If issues are found, consult with IBM Netezza Support.
3. Collect logs for the replacement S-Blade component(s) following the verification:
[root@nzhost1 ~]# cd /opt/nz/service_tools
[root@nzhost1 ~]# ./getservicedata.pl mm0xx nn

Where xx is the SPA number and nn is the S-Blade slot number.


For example, the command:
[root@nzhost1 ~]# ./getservicedata.pl mm001 13
Gathers the logs for the S-Blade in slot 13 of SPA1.
When the script completes, gather the resulting logs according to the output path and
attach to the PMR along with any other logs gathered as part of the service activity.
4. If Call Home was previously disabled, enable it.
[nz@nzhost1 ~]$ nzcallhome -on
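
Steps 7 and 8 of the online procedure ask you to note existing issues for later comparison, and step 2 of this verification reruns the same checks. One way to make that comparison less error-prone is to save the command output before and after the replacement and compare the two snapshots. The following is a sketch only, assuming it is run as user nz on the active host; the function names and directory layout are hypothetical.

```shell
#!/bin/sh
# save_issues DIR: store the output of the issue checks under DIR.
# (Hypothetical helper; the nz* commands are the ones used in this chapter.)
save_issues() {
    dir=$1
    mkdir -p "$dir" || return 1
    nzhw -issues      > "$dir/nzhw.txt" 2>&1
    nzds -issues      > "$dir/nzds.txt" 2>&1
    nzspupart -issues > "$dir/nzspupart.txt" 2>&1
    return 0
}

# compare_issues BEFORE AFTER: show what changed between two snapshots.
# Returns non-zero if the snapshots differ.
compare_issues() {
    diff -r "$1" "$2"
}

# Before the replacement:  save_issues "$HOME/spu_before"
# After the replacement:   save_issues "$HOME/spu_after"
#                          compare_issues "$HOME/spu_before" "$HOME/spu_after"
```

Any difference still needs to be interpreted by a person; the sketch only highlights what changed.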


Offline Replacement Procedure


If the failed component is one of the following:
• Blade Server DIMM
• Blade Server Processor
• Blade Server 10Gb LOM Interposer
• BladeCenter Expansion Unit
the component may be replaced with the system offline, provided that the customer agrees
to take the system offline for the duration of the procedure.
Note: When replacing the Expansion Unit, do not re-use the PCI riser from the original
Expansion Unit. Always use the PCI riser from the replacement Expansion Unit.

Note: This procedure requires that the Software Support Tools be loaded on the system.

1. Read the safety information that begins on page v.


2. Log into the active host as user nz.
3. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status

b. If enabled, disable it:


[nz@nzhost1 ~]$ nzcallhome -off

4. Check to see if the blade is in a failed state:


[nz@nzhost1 ~]$ nzhw -detail -issues
Description HW ID Location Role State Serial number Version Detail
----------- ----- --------------- ------ ----- ------------- ------- ---------
SPU 1607 spa1.spu3 Failed

If the blade is failed, activate it so that a rebalance is not required later:


[nz@nzhost1 ~]$ nzhw activate -id nnnn

5. Check the state of the Netezza system:


[nz@nzhost1 ~]$ nzstate
System state is 'Stopped'.
6. If the system state is not stopped, stop the system using the command:
[nz@nzhost1 ~]$ nzstop

7. Confirm that the system has stopped by using the command:

[nz@nzhost1 ~]$ nzstate
System state is 'Stopped'.
8. Disconnect the SAS cabling from the S-Blade being removed.
9. Remove the S-Blade assembly from the chassis.
10. Separate the sidecar with the Database Accelerator Cards and associated hardware
from the blade processor card.


11. If replacing the blade server 10Gb LOM Interposer, DIMM, blade server processor, or
expansion unit, follow the procedure for that component as documented in IBM
BladeCenter HS23 Problem Determination and Service Guide.
Note: When replacing the Expansion Unit, do not re-use the PCI riser from the original
Expansion Unit. Always use the PCI riser from the replacement Expansion Unit.

When replacing the expansion unit cover, ensure that the latching tab does not push on
a DAC, which would misalign the DACs in the assembly.

Figure 2-22: Top Cover Replacement (callouts: Cover Latching Tab, DAC)

12. Assemble the sidecar with its components onto the blade processor card.

Statement 21

CAUTION:
Hazardous energy is present when the blade is connected to the power source. Always
replace the blade cover before installing the blade.
13. Install the S-Blade assembly into the same slot in the chassis from which it was
removed.
14. Reconnect the SAS cabling.
15. Power on the S-Blade.
16. Wait for the S-Blade to become active. Check the blade state by using the command:
[root@nzhost1 ~]# ssh mm0xx info -T blade[y]
Where xx is the SPA number (01-08) and y is the slot number of the S-Blade server (1,
3, 5, 7, 9, 11, 13).
17. As user nz, restart the NPS system:
[nz@nzhost ~]$ nzstart

18. Run sys_rev_check to verify that the system is configured correctly.


a. Change directory to:
[nz@nzhost1 ~]$ cd /opt/nz/fdt

b. Run the command:


[nz@nzhost1 ~]$ ./sys_rev_check Blade

c. If the firmware requires updating, or if other failures are noted in the sys_rev_check
output, resolve the issues as described in the FDT User’s Guide, in the section
“Resolve sys_rev_check Issues,” and then rerun sys_rev_check to verify that issues
are resolved.
19. Check for any issues with the commands:
[nz@nzhost1 ~]$ nzhw -issues
[nz@nzhost1 ~]$ nzds -issues
[nz@nzhost1 ~]$ nzspupart -issues
If issues are found, consult with IBM Netezza Support.
20. If Call Home was previously disabled, enable it.
[nz@nzhost1 ~]$ nzcallhome -on

If you encounter issues after replacing components without running replacespu, run the
complete replacespu procedure to resolve the issues.
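
Steps 6 and 7 of this procedure, and the restart in step 17, amount to polling nzstate until it reports the expected state. The following is a sketch of such a polling loop, assuming nzstate prints a line of the form System state is 'Stopped'. as shown in the examples. The function name, retry count, and delay are arbitrary illustrative choices.

```shell
#!/bin/sh
# wait_for_state STATE [TRIES] [DELAY_SECONDS]
# Poll nzstate until it reports STATE. Returns 0 on success and
# 1 if STATE was not seen after TRIES attempts.
# (Hypothetical helper; defaults are illustrative.)
wait_for_state() {
    want=$1
    tries=${2:-90}
    delay=${3:-10}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if nzstate 2>/dev/null | grep -q "System state is '$want'"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example, as user nz:
#   nzstop  && wait_for_state Stopped
#   nzstart && wait_for_state Online
```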

Troubleshooting the Replacement Process


This section describes some common error conditions for the S-Blade replacement process.

Hang during DVD Boot


A replacement planar with Emulex firmware 4.6.281.26 may hang during the boot of the
HS23 DVD on systems with heavy network traffic.
In this case, you may need to disable the network ports on the gig switches for the DVD to
boot.
Note: Keep the network ports on the gig switches disabled for the planar being
replaced until the DVD has finished updating the Emulex firmware.

To temporarily avoid the heavy network traffic:


1. As user root on host:
[root@nzhost1 ~]# telnet gigsw0xa
Where x is the rack number where the replacement planar is located.
For example:
[root@nzhost1 ~]# telnet gigsw03a
If replacing planar SPU0301 (rack 3, SPU 1)
2. At the login prompt, type admin.
3. From the gig switch prompt, type:


>>Main# /cfg/port 1:x,2:x /dis


Where x is the slot number where the replacement planar is located.
For example:
>>Main# /cfg/port 1:1,2:1 /dis
(That is, port 1 on gigsw03a and port 1 on gigsw03b, for the blade in slot 1.)
4. From the gig switch prompt, type:
>>Main# apply
5. Wait for the apply to complete.
Boot the HS23 DVD and update the firmware/SMBIOS settings on the replacement planar.
Once the update is complete and the S-Blade is rebooting, it is IMPORTANT to re-enable the
gig switch ports so that the SPU can PXE boot properly.
Note: This can be done while the blade is in the process of rebooting.
To re-enable the ports:
1. As user root on host:
[root@nzhost1 ~]# telnet gigsw0xa
Where x is the rack number where the replacement planar is located.
For example:
[root@nzhost1 ~]# telnet gigsw03a
If replacing planar SPU0301 (rack 3, SPU 1)
2. At the login prompt, type admin.
3. From the gig switch prompt, type:
>>Main# /cfg/port 1:x,2:x /ena
Where x is the slot number where the replacement planar is located.
For example:
>>Main# /cfg/port 1:1,2:1 /ena
(That is, port 1 on gigsw03a and port 1 on gigsw03b, for the blade in slot 1.)
4. From the gig switch prompt, type:
>>Main# apply
5. Wait for the apply to complete.
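
The switch hostname and the port commands used in this section follow a fixed pattern, so they can be generated rather than typed by hand. The following Python sketch only builds the strings shown above. The function names are illustrative and are not part of any Netezza or switch tooling, and the sketch assumes a single-digit rack number, as implied by the gigsw0xa pattern.

```python
def gigsw_hostname(rack: int) -> str:
    """Return the gig switch hostname for a rack, following the gigsw0xa pattern."""
    if not 1 <= rack <= 9:
        # Assumption: the gigsw0xa pattern only covers single-digit rack numbers.
        raise ValueError("rack must be between 1 and 9")
    return f"gigsw0{rack}a"


def port_command(slot: int, enable: bool) -> str:
    """Return the switch command that disables or enables both ports for a blade slot."""
    action = "ena" if enable else "dis"
    return f"/cfg/port 1:{slot},2:{slot} /{action}"


if __name__ == "__main__":
    # Replacing planar SPU0301: rack 3, slot 1.
    print("telnet " + gigsw_hostname(3))
    print(port_command(1, enable=False))  # before booting the HS23 DVD
    print(port_command(1, enable=True))   # once the S-Blade is rebooting
```

Each generated port command must still be followed by apply at the switch prompt, as described in the steps above.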

Running the replacespu Command after the S-Blade is Deleted


If the failed S-Blade has been deleted from the system inventory/topology (such as from a
prior replacespu operation that failed in a later step), or if the S-Blade was already failed
and deleted from the chassis, it no longer has a hardware ID that you can use in the
replacespu command. In this case, you can use an alternate form of the replacespu
command by specifying the SPA and slot number of the location where you will be installing the
replacement S-Blade.
For example, if the replacement S-Blade will reside in SPA 1 slot 3, use a command similar
to the following to start the replacement process:
[root@nzhost1 ~]# /nz/kit/bin/adm/tools/replacespu -spa 1 -slot 3
The command prompts you to install the replacement S-Blade in the specified slot, and then
proceeds with the replacement steps.



CHAPTER 3
Replacing an Environmental Services Module
Each IBM PureData System for Analytics N3001 rack contains:
• Two Disk Enclosures - N3001-002
• Six Disk Enclosures - N3001-005
• Twelve Disk Enclosures - N3001-010 and larger
Each Disk Enclosure houses 24 disk drives.
Note: The N3001-001 system does not use Disk Enclosures.

Each Disk Enclosure contains two ESMs.
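
The per-rack figures above imply simple totals for disks and ESMs. The following Python sketch works through that arithmetic. The table and function names are illustrative assumptions, and the N3001-010 entry stands in for "N3001-010 and larger" on a per-rack basis.

```python
DISKS_PER_ENCLOSURE = 24
ESMS_PER_ENCLOSURE = 2

# Disk Enclosures per rack, as listed above. The N3001-001 uses none.
ENCLOSURES_PER_RACK = {
    "N3001-001": 0,
    "N3001-002": 2,
    "N3001-005": 6,
    "N3001-010": 12,  # also applies to each rack of larger models
}


def rack_totals(model: str) -> dict:
    """Return the enclosure, disk, and ESM counts for one rack of the given model."""
    enclosures = ENCLOSURES_PER_RACK[model]
    return {
        "enclosures": enclosures,
        "disks": enclosures * DISKS_PER_ENCLOSURE,
        "esms": enclosures * ESMS_PER_ENCLOSURE,
    }


if __name__ == "__main__":
    for model in ENCLOSURES_PER_RACK:
        print(model, rack_totals(model))
```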


Only one ESM is allowed to be replaced at a time. If both ESMs require replacement,
contact IBM Netezza Support.

Before you begin the ESM replacement process, make certain that you have a replacement
ESM that conforms to the hardware models supported for the N3001 system.
Observe Electrostatic Discharge (ESD) precautions when handling electronic components.
ESD precautions are included in “Electrostatic Discharge Precautions” on page 1-17.

This procedure requires the user to have root access if firmware updates are required.

If a firmware update is required as part of this procedure (determined in step 19 on
page 3-4), the disk enclosure must be power-cycled for the firmware to be loaded. This
power cycle requires a customer outage and must be planned for.

The estimated time to perform this procedure is up to 90 minutes, depending on ease of
access to the system, familiarity with NPS and the Netezza system, and the need for
firmware updates.
The FRU number for the ESM is 00W1241.
Complete details on generic ESM removal and replacement are in the IBM System Storage
EXP25xx Installation, User’s, and Maintenance Guide.
To replace an Environmental Services Module (ESM) component on the N3001 system,
follow these steps:
1. Read the safety information that begins on page v.
2. Log into the active host as user nz.


3. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status

b. If enabled, disable it:


[nz@nzhost1 ~]$ nzcallhome -off

4. Validate the current state of the system prior to replacement.


[nz@nzhost1 ~]$ nzstate
System state is 'Online'.
5. If the system state is not online, start the system using the command:
[nz@nzhost1 ~]$ nzstart

6. Wait for the system to go online using the command:


[nz@nzhost ~]$ nzstate
System state is 'Online'.
7. Run sys_rev_check:
a. Change directory to:
[nz@nzhost ~]$ cd /opt/nz/fdt

b. Run the command:


[nz@nzhost ~]$ ./sys_rev_check

c. Make note of the output to compare with results later in the procedure.
8. Identify the failed ESM that requires replacement:
[nz@nzhost1 ~]$ nzhw -type mm -issues
Description HW ID Location Role State
----------- ----- ------------------ ------ ------
MM 1074 spa1.diskEncl1.mm1 Active Failed
The Location entry provides details on the ESM location. See Figure 1-3 on page 1-4
through Figure 1-11 on page 1-12 for the physical location of N3001 components.
9. Check the ESM serial numbers:
[nz@nzhost1 ~]$ nzhw -type mm -detail
Description HW ID Location           Role   State Serial number
----------- ----- ------------------ ------ ----- -------------
MM          1071  spa1.diskEncl1.mm1 Active Ok    YM10Z12W036
...
Examine the serial numbers of each ESM and compare them to the serial number on
the label of the replacement ESM. Duplicate serial numbers are not allowed. If an
existing ESM serial number matches that of the replacement ESM, a different replace-
ment ESM must be used.
10. Locate the failed ESM. The amber Fault LED is lit on the failed ESM.


Figure 3-1: ESM LEDs and Connectors (callouts: Fault LED; Input Connectors IN1 and IN2; Output Connector (Out); Ethernet Connector)

11. Remove all cables from the ESM, ensuring that the cables are properly labeled so that
they can be replaced into their original locations.
12. Remove the shipping bracket (if installed, colored orange) at the rear of the disk
enclosure:
a. Using a 7mm socket, loosen (do not remove) the nut that secures the shipping
bracket to each side of the rack.
b. With the nuts loosened, pull the bracket away from the disk enclosure, detaching
the bracket from the enclosure.
c. To completely remove the bracket from the rack, cables on one side of the rack,
behind the enclosure, need to be loosened by cutting the zip tie (or loosening the
Velcro tie) that secures them to the side of the rack.
d. After cutting the zip tie (or loosening the Velcro tie) lift that end of the bracket off
the rail and then rotate the bracket out of the rack.
e. Reattach the cables to their original position using a new zip tie (or the Velcro tie).
f. Tighten the nuts that secured the bracket to each side of the rack.
g. Repeat for all disk enclosure brackets.
h. The bracket(s) should be stored on-site in the event the system needs to be moved
(requiring them to be re-installed).
13. Remove the failing ESM from the enclosure.
Note: It may be necessary to temporarily move cabling out of the way to make room for
the ESM removal and replacement.


Figure 3-2: ESM Removal (callout: Release Levers)

14. Install the replacement ESM into the disk enclosure.


15. Reconnect all cables, referring to the printed labels on each cable showing which port
the cable connects to (see Figure 3-1).
Note: It may be necessary to move cabling back into place if it was moved out of the
way to make room for the ESM removal and replacement.

16. Wait three minutes for the ESM to update from the backplane.
17. On systems with NPS versions earlier than 7.0 P14, 7.0.2 P11, 7.0.4 P13, or 7.1, check
the ESM serial numbers:
[nz@nzhost1 ~]$ nzhw -detail | grep None
Verify that there are no duplicate serial numbers. If a duplicate serial number exists,
the ESM must be replaced.
18. Check for multi-path issues:
[nz@nzhost ~]$ nzpush -a mpath -issues
If issues are found, resolve the issues by typing:
[nz@nzhost ~]$ nzpush -a mpath dm -r

19. Run sys_rev_check to verify that the system is configured correctly.


a. Change directory to:
[nz@nzhost ~]$ cd /opt/nz/fdt

b. Run the command:


[nz@nzhost ~]$ ./sys_rev_check

c. Review the information on the screen to make note of the current firmware versions,
comparing the present results to the results from step 7. If issues are noted in the
sys_rev_check output, in particular, the ESM firmware, resolve the issues, such as
updating firmware, as described in the FDT User’s Guide, in the section “Resolve
sys_rev_check Issues.”
If a firmware update is required, the disk enclosure must be power-cycled for the firmware
to be loaded. This power cycle requires a customer outage and must be planned for.


20. Check for any issues with the commands:


[nz@nzhost ~]$ nzhw -issues
[nz@nzhost ~]$ nzds -issues
[nz@nzhost ~]$ nzspupart -issues

21. If Call Home was previously disabled, enable it.


[nz@nzhost ~]$ nzcallhome -on
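
Steps 9 and 17 of this procedure rely on comparing ESM serial numbers by eye. The following Python sketch shows one way to make that comparison less error-prone once the serial numbers have been copied from the nzhw output and the replacement label. It is illustrative only and is not part of the Netezza software. The function names are assumptions, as is the choice to ignore case and surrounding whitespace when comparing.

```python
from collections import Counter
from typing import Iterable, List


def _normalize(serial: str) -> str:
    """Strip whitespace and upper-case a serial number so comparisons are consistent."""
    return serial.strip().upper()


def conflicts_with_installed(installed: Iterable[str], replacement: str) -> bool:
    """Return True if the replacement ESM serial matches any installed ESM serial."""
    return _normalize(replacement) in {_normalize(s) for s in installed}


def duplicate_serials(serials: Iterable[str]) -> List[str]:
    """Return the sorted serial numbers that appear more than once."""
    counts = Counter(_normalize(s) for s in serials)
    return sorted(s for s, n in counts.items() if n > 1)


if __name__ == "__main__":
    # YM10Z12W036 comes from the example output above; the other value is made up.
    installed = ["YM10Z12W036", "YM10Z12W999"]
    print(conflicts_with_installed(installed, "YM10Z12W036"))
    print(duplicate_serials(installed + ["ym10z12w036"]))
```

A True result from the first function corresponds to the case in step 9 where a different replacement ESM must be used.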



CHAPTER 4
Replacing a Disk Drive
What’s in this chapter
• Use the Restricted Environment
• Use the Command Line Interface
• Manual Disk Replacement

Each IBM PureData System for Analytics N3001-002 and larger rack contains:
• Two Disk Enclosures - N3001-002
• Six Disk Enclosures - N3001-005
• Twelve Disk Enclosures - N3001-010 and larger
Each Disk Enclosure houses 24 disk drives.
Note: The N3001-001 system does not use Disk Enclosures. Each N3001-001 system
includes two Host Servers, each with 24 disk drives. Sixteen disks are the data disks (slots
8-23), connected to the SAS HBA. The other eight disks are the system disks (slots 0-7),
connected to the on-board RAID controller and configured in a RAID array. This chapter
describes the replacement procedure for data disks (slots 8-23).
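
Because this chapter covers only the data disks on an N3001-001 host, it can help to check which group a slot belongs to before starting. The following Python sketch encodes the slot ranges from the note above. The names are illustrative and are not part of the Netezza software.

```python
SYSTEM_SLOTS = range(0, 8)   # slots 0-7: system disks on the on-board RAID controller
DATA_SLOTS = range(8, 24)    # slots 8-23: data disks on the SAS HBA


def host_disk_kind(slot: int) -> str:
    """Classify an N3001-001 host disk slot as 'system' or 'data'."""
    if slot in SYSTEM_SLOTS:
        return "system"
    if slot in DATA_SLOTS:
        return "data"
    raise ValueError(f"slot {slot} is outside the 0-23 range of a host server")


if __name__ == "__main__":
    for slot in (0, 7, 8, 22, 23):
        print(slot, host_disk_kind(slot))
```

A slot classified as "system" falls outside the scope of this chapter.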

Before you begin the disk replacement process, make certain that you have a replacement
disk that conforms to the hardware models supported for the N3001 system. The N3001
system uses Self-Encrypting Drives (SEDs). Typically, you will use a new replacement disk.
Also before beginning the replacement procedure, verify that there is a problem with the
disk drive. Consult the Problem Determination and Service Guide for the disk enclosure or
host server for more information on disk replacement.
The N3001-002 and larger systems currently support one SED model: 600GB Model
ST600MM0026E - FRU number 00AK388, firmware rev. E56D, E56F
The N3001-001 system currently supports one SED model: 600GB Model
ST600MM0026E - FRU number 90Y8909, firmware rev. E56D
Note: Self-Encrypting Drives have some important differences to be aware of:
• When Security is Enabled (drives are in auto-lock mode) and
SecureEraseOnFailover = True (default setting), a secure erase is performed
automatically during the failover of a drive.
• When Security is Enabled (drives are in auto-lock mode), the replacement SED
performs a secure erase when transitioning roles from Inactive to Spare (during
activation).


Observe Electrostatic Discharge (ESD) precautions when handling electronic components.
ESD precautions are included in “Electrostatic Discharge Precautions” on page 1-17.

This procedure requires the user to have root and nz access. An option is also available that
requires the user to have access as nzibmsupport13 and nz.

- A failed disk must be replaced while the system is online.


- The system cannot be offline when replacing a failed disk.

When replacing multiple drives, replace one disk at a time, as instructed in the procedure,
before replacing the next drive.

The estimated time to perform this procedure is from 20 to 25 minutes, depending on ease
of access to the system and familiarity with NPS and the Netezza system.

Use the Restricted Environment


If the customer has installed and enabled the Restricted Environment, use this procedure
to replace the failed disk. The replacedisk script gathers SMART data and places it in the
replacedisk log.
Note: The customer is responsible for enabling and configuring the restricted environment.

If Call Home is enabled on this system, the System Manager will report this activity and
create PMRs. The customer must be made aware of this so that the PMRs can be closed
when filed.

1. Read the safety information that begins on page v.


2. Log into the system and enter the Restricted Environment:
login as: nzibmsupport13
When prompted, type the password (default is nzibmsupport13).
Example output:
NZ support actions
1 Replace Storage Disk
2 Replace SPU
3 Quit
>
3. Select option 1 from the menu:
> 1
Example output:
**************************************************************
* Welcome to replacedisk. *
* Please use “Ctrl + c” to exit at any point in this process.*
**************************************************************


List of failed disk(s) in the current system:


----------------------------------------------------------------
- Disk with hardware ID: 1166, spa1.diskEncl6.disk2 (rack 1, SPA 1,
enclosure 6, slot 2)

Please input a hardware ID from the list above or enter <ctrl-c> to
exit script
>
4. Type the ID of only one disk drive to replace:
For example:
> 1166
Example output:
You have selected the following disk to be removed:

- Disk with hardware ID: 1166, spa1.diskEncl6.disk2 (rack 1,SPA 1,
enclosure 6, slot 2)

Is the above disk the one you want to remove? (DO NOT REMOVE YET)
[y/n]:
5. Type y.
Example output:
################################
Beginning Remove phase:
The physical location of the disk needs to be located.

In order to locate the disk, the next 3 steps will have the LED
light turned on (1), off (2), and then back on (3).
This will help you to locate the disk. Mark the located disk in a
non-harmful way (with a sticker, etc).

After all 3 steps are completed, you will be asked if you want to
retry those steps.

When you are ready to start the steps to locate the failed disk,
press Enter to continue

Locating Disk 1166 (spa1.diskEncl6.disk2) at rack 1, SPA 1,
enclosure 6, slot 2...

STEP 1: Confirm the LED light is turned on
Turning on LED light for disk 1166 ... (Please allow up to 2 minutes
for the LED light to turn on)
Is the LED light for disk 1166 turned on? [y/n]:
6. Verify that the disk locater LED is ON and then type y and press Enter.
Example output:
STEP 2: Confirm the LED light is turned off
Turning off LED light for disk 1166 ... (Please allow up to 2
minutes for the LED light to turn off)
Is the LED light for disk 1166 turned off? [y/n]:
7. Verify that the disk locater LED is OFF and then type y and press Enter.
Example output:
STEP 3: Reconfirm the LED light is turned on (Don’t forget to mark
the disk)
Turning on LED light for disk 1166 ... (Please allow up to 2 minutes
for the LED light to turn on)
Is the LED light for disk 1166 turned on? [y/n]:


8. Verify that the disk locater LED is ON and then type y and press Enter.
Example output:
Do you want to redo the above steps again? [y/n]:
9. Type n and press Enter.
Example output:
CONFIRMING AGAIN TO MAKE SURE YOU HAVE SELECTED THE CORRECT DISK!
You have selected the following disk to be removed:

- Disk with hardware ID: 1166, spa1.diskEncl6.disk2 (rack 1,SPA 1,
enclosure 6, slot 2)

Is the above disk the one you want to remove? (DO NOT REMOVE YET)
[y/n]:
10. Type y and press Enter.
Example output:
Please remove the following disk:
----------------------------------------------------------------
- Disk with hardware ID: 1166, spa1.diskEncl6.disk2 (rack 1,SPA 1,
enclosure 6, slot 2)

Please follow these instructions as you remove the disk:


- Unlock the drive carrier
- Pull the disk half way out of the slot
- Wait 90 seconds for the disk to spin down
- Fully remove the failed disk from the slot

Once you have removed the above disk, press Enter to continue
11. Remove the failed disk identified in step 10.
When removing a drive, pull it only half way out of the slot and wait 90 seconds for the disk
to spin down before fully removing the disk drive.

Figure 4-1: Disk Removal (callout: Pull the drive out half-way and wait 90 seconds for the disk to spin down. Only then remove the drive completely.)


12. Press Enter.


All systems, example output:
Verifying that the disk has been physically removed. This may take
a few minutes ...
N3001-001 systems may output the following:
******************************************************************
WARNING: DISK AT upper host, host disk in slot 2 IS NOT IN MISSING
STATE YET!
******************************************************************
Verifying that the disk is in missing state. This may take a few
minutes ...
All systems, example output:
The disk has been removed.
The following hardware ID of the removed disk will be deleted:
----------------------------------------------------------------
- Disk with hardware ID: 1166, spa1.diskEncl6.disk2 (rack 1,SPA 1,
enclosure 6, slot 2)

Deleting hardware ID is a permanant action. Continue? (y =
continue, n = exit) [y/n]:
13. Type y.
Example output:
Deleting the hardware ID of the specified disk ... done

################################
Beginning Insert phase:

Please insert the replacement disk, press Enter to continue


14. Insert the replacement disk, wait 30 seconds for the disk to spin up, and press Enter.
Example output from an N3001-002 or larger system (for a disk NOT requiring
a firmware update; N3001-001 system output is similar):
Waiting for disk to become visible to NPS. This may take a few minutes
...
Disk (Hardware ID 1637, Serial Number S0M1X3H20000C137GVAC, F/W Rev
E56D) is visible at rack 1, SPA 1, enclosure 6, slot 2

################################
Beginning Firmware Update phase:

Checking the disk's firmware for any updates.


Firmware updates not performed.

################################
Beginning Activation phase:
Activating the disk . . . Disk is now activated and ready to use
################################

Report
Report of disk that has been removed:

Hardware ID  Location              Serial                F/W Rev  Replace State
1166         spa1.diskEncl6.disk2  S0M1X3H20000B428GVAC  E56D     Removed

Report of disks that have been inserted:


Hardware ID  Location              Serial                F/W Rev  Replace State
1637         spa1.diskEncl6.disk2  S0M28B230000M4357PEP  E56D     Activated

Log file location: /nz/kit.7.2.0.0.40130.1/../var/log/replacedisk.log.2014-08-21

Note: PFE information from the failed disk is contained in the log.

Example output from an N3001-001 (for a disk requiring a firmware update;
N3001-002 and larger system output is similar):
Waiting for the disk to become visible to NPS. This may take a few
minutes ...
Disk (Hardware ID 1059, Serial Number 6XR2S04A0000B243G6QH) is visible
at upper host, 23rd host disk (slot 22)
##################################################################
Beginning Firmware Update phase:

The following disks may need the firmware to be updated:

Disk with hardware ID: 1059 (upper host, 23rd host disk (slot 22))

Updating the firmware for this disk is a permanent action. Continue?
[y/n]:
Type y
Performing firmware updates. This may take a few minutes ...
Updating firmware of disk 1059 ... done

##################################################################
Report:
Report of disk that has been removed:

Hardware ID  Location               Serial                F/W Rev  Replace State  FDT State
1022         spa1.diskEncl1.disk23  S0M1YFRS0000B430AECF  E56D     Removed        N/A

Report of disk that has been inserted:

Hardware ID  Location                Serial                F/W Rev  Replace State     FDT State
1059         rack1.host1.hostDisk23  6XR2S04A0000B243G6QH  N/A      Firmware updated  Firmware is up to date

##################################################################
Disks are ready to be activated.

The N3001-001 system also shows the following output:
On this platform a disk that has been replaced need to be activated
on the host before SPU can use it. The activation operation pauses
the system for a few minutes and therefore it is not performed
automatically by this script.
It is recommended to start activation during maintenance window of
when pausing the system for a few minutes is acceptable.

When multiple host disks are inactive it is enough to only activate
the first one - the activation process will activate all of them in
one run.

The activation can be requested using the following command:
nzhw activate -id XXXX

Where XXXX is hwid of the new host disk.


Log file location: /nz/kit.7.2.0.1.40624/../var/log/replacedisk.log.2014-09-11::13:36:18
Note: PFE information from the failed disk is contained in the log.

15. Exit from the Restricted Environment:


NZ support actions
1 Replace Storage Disk
2 Replace SPU
3 Quit
> 3
The N3001-001 system requires replacement disks to be manually activated after
replacement (as noted in the script output in step 14).
To activate the disk(s) on an N3001-001, at a time when a system pause is
acceptable, type:
[nz@nzhost ~]$ nzhw activate -id <HW ID of the replacement disk>
When prompted, type y for confirmation.
For example, if the HWID of the replacement disk (identified in step 14) is 1059:
[nz@nzhost ~]$ nzhw activate -id 1059

During host disk activation, the system is paused and the SPUs are reconfigured and
restarted. When the restart is complete, the system returns to the online state. During
this process, several intermediate system states can also be observed (Discovering,
Initializing, Resuming).
The replacement disk now has the role of Spare. If more than one disk has been
replaced, they all become activated.
16. The replacement is complete.
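
The disk locations reported by replacedisk (for example, spa1.diskEncl6.disk2) and the activation command follow fixed patterns. The following Python sketch is illustrative only and is not part of the Netezza software. It splits such a location into its SPA, enclosure, and disk numbers, and builds the nzhw activate command line shown above. The function names are assumptions made for this example, and host disk locations such as rack1.host1.hostDisk23 are deliberately not handled.

```python
import re

# Matches enclosure disk locations of the form spa<x>.diskEncl<y>.disk<z>.
_LOCATION = re.compile(r"^spa(\d+)\.diskEncl(\d+)\.disk(\d+)$")


def parse_disk_location(location: str) -> dict:
    """Split a location such as spa1.diskEncl6.disk2 into its numeric parts."""
    match = _LOCATION.match(location.strip())
    if match is None:
        raise ValueError(f"not an enclosure disk location: {location!r}")
    spa, enclosure, disk = (int(group) for group in match.groups())
    return {"spa": spa, "enclosure": enclosure, "disk": disk}


def activate_command(hw_id: int) -> str:
    """Build the manual activation command used on the N3001-001."""
    return f"nzhw activate -id {hw_id}"


if __name__ == "__main__":
    print(parse_disk_location("spa1.diskEncl6.disk2"))
    print(activate_command(1059))
```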

Use the Command Line Interface


Use this procedure to replace the failed disk in a standard Netezza environment. The
replacedisk script gathers SMART data and places it in the replacedisk log.
This procedure is used to replace one disk at a time.
If you require assistance performing this procedure, call IBM Netezza Support.

To replace a “failed” or “failing” disk drive on the system, follow these steps:
1. Read the safety information that begins on page v.
2. Log into the system as user nz.
3. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status

b. If enabled, disable it:


[nz@nzhost1 ~]$ nzcallhome -off


4. Check the state of the Netezza system:


[nz@nzhost ~]$ nzstate
System state is 'Online'.
If the system state is not online, use the nzstart command to start the system. If you
cannot start the system for any reason, contact Netezza Support for assistance.
5. Identify the failed disk that requires replacement. Look at the Role column:
[nz@nzhost ~]$ nzhw -issues
Example output:
Description HW ID Location             Role   State Security
----------- ----- -------------------- ------ ----- --------
Disk        1166  spa1.diskEncl6.disk2 Failed Ok    Disabled

The Role of a failed disk shows Failed or Incompatible. The State column typically
shows a state of Ok. The state could also be Unsupported, Unreachable, None, or
Degraded. The Security column indicates if the Self-Encrypting Drive (SED) is in
auto-lock mode (Enabled) or in default mode (not locked = Disabled).
Note: A Failed drive still installed in the system should always be shown as Disabled.

6. Check to see if a disk regeneration (copy and restore) is in process:


[nz@nzhost ~]$ nzspupart -regenstatus

Note the status for use later in the procedure.


7. If you are replacing a “failing” disk (an active disk that has not yet failed), the
nzhw -issues command will not return any information about that disk. To fail over an active
but problematic disk that you want to replace, use the following steps. This procedure
assumes that you know the ID of the problematic disk (1166, in this example). If the
target disk is already in the Failed role, skip to step 8.
Never remove a disk drive from a storage array while the disk is in the Active or Spare state.
Always make sure that you fail over the active disk before you attempt to remove it. If you
remove an Active or Spare disk drive, you could cause the system to restart and/or
transition to the down state.

a. Verify the location information for the disk ID that you want to replace:
Description HW ID Location             Role   State Security
----------- ----- -------------------- ------ ----- --------
Disk        1166  spa1.diskEncl6.disk2 Active Ok    Enabled
b. Fail over the disk:
[nz@nzhost ~]$ nzhw failover -id 1166
Are you sure you want to proceed (y|n)? [n] y
Note: The system manager will not allow you to fail over a disk manually if it holds the
last remaining copy of a data slice (that is, if the disk is unmirrored).

c. Verify that the disk is now in a failed role:


Description HW ID Location             Role   State Security
----------- ----- -------------------- ------ ----- --------
Disk        1166  spa1.diskEncl6.disk2 Failed Ok    Disabled
d. Proceed to step 8.
8. Change to the directory where the replacedisk script is located:


[nz@nzhost ~]$ cd /nz/kit/share/tools/storage

9. Run the replacedisk script:


[nz@nzhost ~]$ ./replacedisk -location spax.diskEncly.diskz
Where x is the SPA number, y is the enclosure number, and z is the disk drive number.
For example, for SPA 1, Enclosure 6, Disk 2:
[nz@nzhost ~]$ ./replacedisk -location spa1.diskEncl6.disk2
Example output:
**************************************************************
* Welcome to replacedisk. *
* Please use “Ctrl + C” to exit at any point in this process.*
**************************************************************

You have selected the following disk to be removed:


----------------------------------------------------------------
- Disk with hardware ID: 1166, spa1.diskEncl6.disk2 (rack 1, SPA 1,
enclosure 6, slot 2)

Is the above disk the one you want to remove? (DO NOT REMOVE YET)
[y/n]:
10. Type y.
Example output:
################################
Beginning Remove phase:
The physical location of the disk needs to be located.

In order to locate the disk, the next 3 steps will have the LED
light turned on (1), off (2), and then back on (3).
This will help you to locate the disk. Mark the located disk in a
non-harmful way (with a sticker, etc).

After all 3 steps are completed, you will be asked if you want to
retry those steps.

When you are ready to start the steps to locate the failed disk,
press Enter to continue

Locating Disk 1166 (spa1.diskEncl6.disk2) at rack 1, SPA 1,
enclosure 6, slot 2...

STEP 1: Confirm the LED light is turned on
Turning on LED light for disk 1166 ... (Please allow up to 2 minutes
for the LED light to turn on)
Is the LED light for disk 1166 turned on? [y/n]:
11. Verify that the disk locater LED is ON and then type y and press Enter.
Example output:
STEP 2: Confirm the LED light is turned off
Turning off LED light for disk 1166 ... (Please allow up to 2
minutes for the LED light to turn off)
Is the LED light for disk 1166 turned off? [y/n]:
12. Verify that the disk locater LED is OFF and then type y and press Enter.


Example output:
STEP 3: Reconfirm the LED light is turned on (Don't forget to mark
the disk)
Turning on LED light for disk 1166 ... (Please allow up to 2 minutes
for the LED light to turn on)
Is the LED light for disk 1166 turned on? [y/n]:
13. Verify that the disk locater LED is ON and then type y and press Enter.
Example output:
Do you want to redo the above steps again? [y/n]:
14. Type n and press Enter.
Example output:
CONFIRMING AGAIN TO MAKE SURE YOU HAVE SELECTED THE CORRECT DISK!
You have selected the following disk to be removed:

- Disk with hardware ID: 1166, spa1.diskEncl6.disk2 (rack 1,SPA 1,
enclosure 6, slot 2)

Is the above disk the one you want to remove? (DO NOT REMOVE YET)
[y/n]:
15. Type y and press Enter.
Example output:
Please remove the following disk:
----------------------------------------------------------------
- Disk with hardware ID: 1166, spa1.diskEncl6.disk2 (rack 1,SPA 1,
enclosure 6, slot 2)

Please follow these instructions as you remove the disk:


- Unlock the drive carrier
- Pull the disk half way out of the slot
- Wait 90 seconds for the disk to spin down
- Fully remove the failed disk from the slot

Once you have removed the above disk, press Enter to continue
16. Remove the failed disk identified in step 15.
When removing a drive, pull it only half way out of the slot and wait 90 seconds for the disk
to spin down before fully removing the disk drive.


Figure 4-2: Disk Removal (callout: Pull the drive out half-way and wait 90 seconds for the disk to spin down. Only then remove the drive completely.)

17. Press Enter.


All systems, example output:
Verifying that the disk has been physically removed. This may take a
few minutes ...
N3001-001 systems may output the following:
******************************************************************
WARNING: DISK AT upper host, host disk in slot 2 IS NOT IN MISSING
STATE YET!
******************************************************************
Verifying that the disk is in missing state. This may take a few
minutes ...
All systems, example output:
The disk has been removed.
The following hardware ID of the removed disk will be deleted:
----------------------------------------------------------------
- Disk with hardware ID: 1166, spa1.diskEncl6.disk2 (rack 1,SPA 1,
enclosure 6, slot 2)

Deleting hardware ID is a permanant action. Continue? (y = continue, n
= exit) [y/n]:
18. Type y.
Example output:
Deleting the hardware ID of the specified disk ... done

################################
Beginning Insert phase:

Please insert the replacement disk, press Enter to continue


19. Insert the replacement disk, wait 30 seconds for the disk to spin up, and press Enter.
Example output (for a disk NOT requiring a firmware update):


Waiting for disk to become visible to NPS. This may take a few minutes
...
Disk (Hardware ID 1638, Serial Number , F/W Rev E56D) is visible at
rack 1, SPA 1, enclosure 6, slot 2

################################
Beginning Firmware Update phase:

Checking the disk's firmware for any updates.


Firmware updates not performed.

################################
Beginning Activation phase:
Activating the disk . . . Disk is now activated and ready to use
################################

Report

Report of disk that has been removed:

Hardware ID  Location              Serial                F/W Rev  Replace State
1166         spa1.diskEncl6.disk2  S0M1X3H20000B428GVAC  E56D     Removed

Report of disks that have been inserted:

Hardware ID  Location              Serial                F/W Rev  Replace State
1637         spa1.diskEncl6.disk2  S0M28B230000M4357PEP  E56D     Activated

Log file location: /nz/kit.7.2.0.0.40130.1/../var/log/replacedisk.log.2014-08-21

Note: PFE information from the failed disk is contained in the log.

Example output (for a disk requiring a firmware update):


Waiting for the disk to become visible to NPS. This may take a few
minutes ...
Disk (Hardware ID 1059, Serial Number 6XR2S04A0000B243G6QH) is visible
at upper host, 23rd host disk (slot 22)
##################################################################
Beginning Firmware Update phase:

The following disks may need the firmware to be updated:

Disk with hardware ID: 1059 (upper host, 23rd host disk (slot 22))

Updating the firmware for this disk is a permanent action. Continue?
[y/n]:
Type y
Performing firmware updates. This may take a few minutes ...
Updating firmware of disk 1059 ... done

##################################################################
Report:

Report of disk that has been removed:

Hardware ID  Location               Serial                F/W Rev  Replace State  FDT State

1022         spa1.diskEncl1.disk23  S0M1YFRS0000B430AECF  E56D     Removed        N/A

Report of disk that has been inserted:

Hardware ID  Location                Serial                F/W Rev  Replace State     FDT State

1059         rack1.host1.hostDisk23  6XR2S04A0000B243G6QH  N/A      Firmware updated  Firmware is up to date

##################################################################
Disks are ready to be activated.

The N3001-001 system also shows the following output:


On this platform a disk that has been replaced need to be activated
on the host before SPU can use it. The activation operation pauses
the system for a few minutes and therefore it is not performed
automatically by this script.
It is recommended to start activation during maintenance window of
when pausing the system for a few minutes is acceptable.

When multiple host disks are inactive it is enough to only activate
the first one - the activation process will activate all of them in
one run.

The activation can be requested using the following command:


nzhw activate -id XXXX

Where XXXX is hwid of the new host disk.


Log file location: /nz/kit.7.2.0.1.40624/../var/log/
replacedisk.log.2014-09-11::13:36:18
Note: PFE information from the failed disk is contained in the log.

The N3001-001 system requires replacement disks to be manually activated after
replacement (as noted in the previous script output).
To activate the disk(s), at a time when a system pause is acceptable, type:
[nz@nzhost ~]$ nzhw activate -id <HW ID of the replacement disk>
When prompted, type y for confirmation.
For example, if the HWID of the replacement disk (identified in step 14) is 1059:
[nz@nzhost ~]$ nzhw activate -id 1059
During host disk activation the system is paused and the SPUs are reconfigured and
restarted. When the restart is complete the system goes back to the online state. During
this process several intermediate system states can also be observed (Discovering,
Initializing, Resuming).
The replacement disk now has the role of Spare. If more than one disk has been
replaced, they all become activated.
20. If Call Home was previously disabled, enable it.
[nz@nzhost1 ~]$ nzcallhome -on

Manual Disk Replacement

Replacement Procedure

Note: If you require assistance performing this procedure, call IBM Netezza Support.

1. Read the safety information that begins on page v.


2. Log into the system as user nz.
3. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status

b. If enabled, disable it:


[nz@nzhost1 ~]$ nzcallhome -off

4. Check the state of the Netezza system:


[nz@nzhost ~]$ nzstate
System state is 'Online'.
If the system state is not online, use the nzstart command to start the system. If you
cannot start the system for any reason, contact Netezza Support for assistance.
5. Identify the failed disk that requires replacement. Look at the Role column:
[nz@nzhost ~]$ nzhw -issues
Description HW ID Location Role State Security
----------- ----- --------------------- ------------- --------
Disk 1166 spa1.diskEncl6.disk2 Failed Ok Disabled

The Role of a failed disk shows Failed. The State column typically shows a state of Ok.
The state could also be Inactive, Unsupported, Unreachable, None, or Degraded. The
Security column indicates if the Self-Encrypting Drive (SED) is in auto-lock mode
(Enabled) or in default mode (not locked = Disabled).
Note: A Failed drive should always be shown as Disabled.
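
The check in step 5 can be sketched as a small shell filter. This is an illustration only: the function name failed_disks is hypothetical (it is not an NPS command), and the column positions are assumed from the sample output shown above.

```shell
# Hypothetical helper (not part of NPS): list failed disks from
# "nzhw -issues" output supplied on standard input.
# Assumed column order, taken from the sample output in step 5:
#   Description, HW ID, Location, Role, State, Security
failed_disks() {
    # Print HW ID, Location and Security for every disk whose Role is Failed.
    awk '$1 == "Disk" && $4 == "Failed" { print $2, $3, $6 }'
}

# Possible usage on the host, as user nz:
#   nzhw -issues | failed_disks
```

Each output line gives the HW ID and Location to record for the later steps. A Security value other than Disabled for a failed drive would be unexpected according to the note above.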

6. Check to see if a disk regeneration (regen) is in progress:


[nz@nzhost ~]$ nzspupart -regenstatus

Note the status for use later in the procedure.


When replacing multiple drives, replace one disk at a time, performing step 7 through step
20, before replacing the next drive.

7. If you are replacing a "failing" disk (an active disk that does not yet have a role of
Failed), the nzhw -issues command will not return any information about that disk. To
fail over an active but problematic disk that you want to replace, use the following
steps. This procedure assumes that you know the ID of the problematic disk (1166 in
this example). If the target disk is already in the Failed role, skip to step 8.

Never remove a disk drive from a storage array while the disk is in the Active, Assigned,
Assigning, Spare, Sparing, or Failing role. Always fail over an Active, Assigned, or Spare
disk before you attempt to remove it. If the disk is in the Assigning, Sparing, or Failing
role, wait for it to transition to the Assigned, Spare, or Failed role before failing it
over. If you remove an Active, Assigned, Assigning, Spare, Sparing, or Failing disk drive,
you could cause the system to restart and/or transition to the down state.

a. Verify the location information for the disk ID that you want to replace:
Description HW ID Location Role State Security
----------- ----- --------------------- ------------- --------
Disk 1166 spa1.diskEncl1.disk2 Active Ok Enabled
b. Fail over the disk:
[nz@nzhost ~]$ nzhw failover -id 1166
Are you sure you want to proceed (y|n)? [n] y
Note: The system manager will not allow you to fail over a disk manually if it holds the
last remaining copy of a data slice (that is, if the disk is un-mirrored).

c. Verify that the disk is now in a failed role:


Description HW ID Location Role State Security
----------- ----- --------------------- ------------- --------
Disk 1166 spa1.diskEncl1.disk2 Failed Ok Disabled
d. Make note of the HWID and Location for use later in the procedure.
e. Proceed to step 8.
8. Extract the SMART log from the problem drive using the following nzpush command.
(Attach these logs in the trouble ticket that you submit for the disk replacement.)
Note: If the disk is in the Failed/None, Failed/Unreachable, or Failed/Missing states,
you cannot extract the SMART information for that disk. Proceed to step 9.

For N3001-002 and larger systems:


nzpush -s spa/spu disk smart --encl num --slot num >/tmp/filename.txt

For N3001-001 systems:


/opt/nz/fdt/storage_diags Smart --spa 1 --encl num --disk num
In the command, spa is the number of the SPA where the disk resides; spu is the slot
number of any online SPU in the same SPA; encl num is the number of the disk enclosure
within the rack; and slot num is the number of the slot within the enclosure where
the disk resides. You can obtain the SPA ID, enclosure ID, and disk slot ID from the
disk location string as shown in step 7, substep a.
For example, for the problem disk ID 1166, the command would be:
For N3001-002 and larger systems:
[nz@nzhost ~]$ nzpush -s 1/1 disk smart --encl 1 --slot 2 >/tmp/spa1_encl1_slot2_smart.txt
For N3001-001 systems:
/opt/nz/fdt/storage_diags Smart --spa 1 --encl 1 --disk 2
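
The mapping from a disk location string to the nzpush arguments in step 8 can be sketched as follows. The function names are hypothetical and the sketch only prints the command line for N3001-002 and larger systems; it does not run it. The output file name follows the pattern used in the example above.

```shell
# Hypothetical helpers (not part of NPS).

# Split a location such as spa1.diskEncl1.disk2 into "SPA enclosure slot".
# Prints nothing if the string does not have that form.
parse_disk_location() {
    printf '%s\n' "$1" |
        sed -n 's/^spa\([0-9][0-9]*\)\.diskEncl\([0-9][0-9]*\)\.disk\([0-9][0-9]*\)$/\1 \2 \3/p'
}

# Print the SMART extraction command for a disk location.
#   $1 = disk location string, $2 = slot number of any online SPU in the SPA
smart_command() {
    parsed=$(parse_disk_location "$1")
    if [ -z "$parsed" ]; then
        echo "unrecognized disk location: $1" >&2
        return 1
    fi
    spa=${parsed%% *}       # first field
    rest=${parsed#* }       # remaining two fields
    encl=${rest%% *}
    slot=${rest#* }
    printf 'nzpush -s %s/%s disk smart --encl %s --slot %s >/tmp/spa%s_encl%s_slot%s_smart.txt\n' \
        "$spa" "$2" "$encl" "$slot" "$spa" "$encl" "$slot"
}

# Example: smart_command spa1.diskEncl1.disk2 1
```

Printing the command rather than executing it lets the operator confirm the SPA, enclosure, and slot values against the nzhw output before running anything on the system.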

9. Physically locate the failed drive.
Turn on the failed drive’s locator LED:

[nz@nzhost ~]$ nzhw locate -id 1166
Turned locator LED 'ON' for Disk: Logical Name:'spa1.diskEncl6.disk2'
Physical Location:'1st Rack, 6th DiskEnclosure, Disk in Slot 2'.
Note: The location of the disk appears in the command output. The person who is running
the command should communicate the location of the disk to the person who is
on-site with the Netezza system.

Failure to replace a hard disk drive in its correct bay might result in loss of data. If you are
replacing a hard disk drive that is part of a configured array and logical drive, be sure to
install the replacement hard disk drive in the correct bay.

Never swap a drive when its associated green activity LED is flashing. Swap a drive only
when its associated amber LED is blinking.

10. Mark the failed disk drive in a non-harmful way to ensure that the correct disk is
replaced in the next step.
11. Replace the failed disk identified in step 9:
a. Unlock the drive carrier by pushing up slightly on the release latch, then pulling
down and out on the drive handle (see Figure 4-3).
b. Pull the disk drive halfway out of the slot.
c. Wait at least 90 seconds for the disk to spin down and for the system to fully clean
up traces of the failed disk.
Failing to wait for disk spin down can crash the disk heads and destroy data on the disk.

d. Fully remove the failed disk from the slot.

Figure 4-3: Disk Removal (callout: Pull the drive out halfway and wait 90 seconds for the
disk to spin down. Only then remove the drive completely.)

Always wait at least 90 seconds between pulling the failed disk out and inserting the
replacement disk to allow the system to fully clean up traces of the failed disk.

e. DO NOT INSERT THE REPLACEMENT DISK AT THIS TIME.


12. Delete the disk ID for the failed drive:
[nz@nzhost ~]$ nzhw delete -id 1166
Are you sure you want to proceed (y|n)? [n] y
Note: For N3001-002 and larger systems, if the command returns the following error:
Error: cannot delete disk [disk hwid=1166 sn="S0M1YFRZ0000B430A8Y8"
SPA=1 Parent=1054 Position=15 ParentEnclPosition=4] - Disk is still
present in the system, remove it from the spu inventory first.
Wait a few minutes and retry the command.
13. Insert the replacement disk in the same slot, taking care to fully insert and lock the
drive carrier in place.
14. The system manager should automatically detect the replacement disk within five
minutes.
a. To check the role of the replacement disk, type the command:
• For systems N3001-002 and larger:
[nz@nzhost ~]$ nzhw | grep spa<x>.diskEncl<y>.disk<z>
Where <x> is the SPA number, <y> is the Disk Enclosure number, and <z> is the
disk slot number identified in step 7.
For example:
[nz@nzhost ~]$ nzhw | grep spa1.diskEncl6.disk2
Example output (N3001-002 and larger):
Description HW ID Location Role State Security
----------- ----- ---------------------- -------- ------- --------
Disk 1637 spa1.diskEncl6.disk2 Inactive Ok Disabled
• For N3001-001 systems:
[nz@nzhost ~]$ nzhw | grep rack1.host<x>.hostDisk<y+1>
Where <x> is the host number and <y+1> is the Disk slot number identified in
step 7 incremented by 1.
For example, if the disk replaced was identified in step 7 as
spa1.diskEncl1.disk22, type:
[nz@nzhost ~]$ nzhw | grep rack1.host1.hostDisk23
Example output (N3001-001):
Description HW ID Location Role State Security
----------- ----- ---------------------- -------- ------- --------
HostDisk 1058 rack1.host1.hostDisk23 Inactive Ok N/A
• For systems N3001-002 and larger, the replacement disk must be listed in
the same location (SPA, enclosure, slot) as the failed disk.
• For systems N3001-002 and larger, note that the disk HW ID (listed in the
second column) of the replacement disk is a new HW ID. Make note of the HW
ID for use in step 15.
• The disk role is listed in the fourth column, and may be listed as:
- Inactive (acceptable: proceed to step 15)
- Spare (acceptable: proceed to step 15)

- Mismatched (acceptable: proceed to step 15)
- Failed (unacceptable: return to step 9 and use a new replacement disk)
b. For systems N3001-002 and larger, if the disk drive is not listed in the nzhw output
after waiting three minutes, type the command:
[nz@nzhost ~]$ nzhw probe -spa x
Where x is the number of the SPA where the disk was replaced (in this example 1).
Wait at least two minutes for disk recognition, then repeat the check from the beginning
of step 14. If this second check is still unsuccessful, return to step 11 and
either re-seat the same disk or use a new replacement disk.
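
Two details of step 14 lend themselves to a short sketch: the N3001-001 location name (slot number plus 1) and the decision that follows from the reported role. The function names below are hypothetical and are not part of NPS. The rack1 prefix is taken from the example in this step.

```shell
# Hypothetical helpers (not part of NPS).

# Build the nzhw location of a host disk on an N3001-001 system.
#   $1 = host number, $2 = slot number <y> taken from spa1.diskEncl1.disk<y>
# The host disk number is the slot number incremented by 1.
host_disk_location() {
    case $2 in
        ''|*[!0-9]*) echo "slot must be a number: $2" >&2; return 1 ;;
    esac
    printf 'rack1.host%s.hostDisk%s\n' "$1" "$(($2 + 1))"
}

# Map the role reported for the replacement disk to the next action
# described in step 14.
next_action_for_role() {
    case $1 in
        Inactive|Spare|Mismatched) echo "proceed to step 15" ;;
        Failed) echo "return to step 9 and use a new replacement disk" ;;
        *) echo "unexpected role: $1" >&2; return 1 ;;
    esac
}

# Example: nzhw | grep "$(host_disk_location 1 22)"
```

For the disk identified as spa1.diskEncl1.disk22, host_disk_location 1 22 yields the string used in the grep example above.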
15. For systems N3001-002 and larger, verify that the system recognizes all the paths to
the replacement disk.
Run the command:
[nz@nzhost ~]$ nzpush -a mpath -issues | grep -w <HW ID of replacement disk>
For example:
[nz@nzhost ~]$ nzpush -a mpath -issues | grep -w 1637
If no issues are listed, proceed to step 16.
If issues are listed, return to step 7 to fail over this replacement disk and then re-seat
this disk or use a new replacement disk (repeat all steps up to and including this step).
16. For systems N3001-002 and larger, verify that all paths to the disk are usable.
a. Identify the HW ID of the designated SPU for the SPA that includes the replacement
disk:
[nz@nzhost ~]$ nzhw -spa x -type spu -detail | grep Designated
Where x is the number of the SPA where the disk was replaced.
For example:
[nz@nzhost ~]$ nzhw -spa 1 -type spu -detail | grep Designated
Example output:
SPU 2757 spa1.spu9 Active Online Y010BG225056 10.0 32
CPU Cores; 125.90GB Memory; Ip Addr: 10.0.14.58; Designated Spu
The HW ID of the designated SPU is in the second column (in this example 2757).
b. Test each path to the disk:
[nz@nzhost ~]$ nzpush -id <x> mpath checkpaths --encl <y> --slot <z>
| grep Fail
Where <x> is the HW ID of the designated SPU, <y> is the Disk Enclosure number,
and <z> is the disk slot number identified in step 7.
For example:
[nz@nzhost ~]$ nzpush -id 2757 mpath checkpaths --encl 6 --slot 2
| grep Fail
If the paths are usable, this command produces no output.
Otherwise, if a path fails (as shown in the following example), return to step 7 to
fail over this replacement disk and then re-seat this disk or use a new replacement
disk (repeat all steps up to and including this step).
Example output for failed path:

spu0109: Error: unable to open /var/opt/nz/diskpath/encl1Slot01hba0port1 : No such device or address
spu0109: encl1Slot01:
spu0109: encl1Slot01hba0port0 : OK
spu0109: encl1Slot01hba0port1 : Fail
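
As an alternative to grep Fail, the following sketch prints only the names of the failed paths. It assumes the checkpaths output has the layout shown in the example above; the function name failed_paths is hypothetical.

```shell
# Hypothetical helper (not part of NPS): read "mpath checkpaths" output on
# standard input and print the name of every path reported as Fail.
# Assumed line layout, from the example output:
#   spu0109: encl1Slot01hba0port1 : Fail
failed_paths() {
    awk '$NF == "Fail" { print $2 }'
}

# Possible usage, with the values from the example in step 16:
#   nzpush -id 2757 mpath checkpaths --encl 6 --slot 2 | failed_paths
```

No output means that no path was reported as failed, which matches the "no output" success condition described in step 16.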
17. Check the firmware level as described in “Checking the Firmware Revision of the
Replacement Disk,” update the firmware if necessary, then continue to step 20.
18. For N3001-001 systems, if you are replacing multiple drives, at this point return to
step 7 to begin the replacement of the next drive.
19. For N3001-002 and larger systems with versions of NPS prior to v7.2.0.2, and with
systems that are security disabled, you must configure the replacement disk bands.
To check if the system is security enabled, type the command:
[nz@nzhost ~]$ /nzlocal/scripts/hpfinfo
Example output:
HPF_RELEASE="5.4"
NPS_MODEL="Q100_M"
NPS_FAMILY="QSeries"
NPS_PLATFORM="xs"
NPS_MODEL_NAME="IBM PureData System for Analytics N3001-010"
NPS_NUM_RACKS="1"
NPS_NUM_SPAS="1"
NPS_SPUS_PER_SPA="7"
NPS_NUM_SPUS="7"
NPS_DENC_PER_SPA="12"
NPS_NUM_DENC="12"
NPS_HA_HOST="ha1"
NPS_HA_MASTER="ha1"
NPS_HA_ALLHOSTS="ha1 ha2"
NPS_HA_SHARED="/nz /export/home"
NPS_FABRIC_IFACE="bond2"
NPS_MGMT_IFACE="bond2"
NPS_SPA_IFACE="bond0"
SYS_AEK_ENABLED="NO"

If SYS_AEK_ENABLED="NO" is listed, it indicates that security is disabled.
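
Because the hpfinfo output is long, the relevant setting can be isolated with a small filter. This is a sketch only: the function name aek_setting is hypothetical, and the only value documented in this guide is "NO".

```shell
# Hypothetical helper (not part of NPS): print the value of SYS_AEK_ENABLED
# from hpfinfo output supplied on standard input.
aek_setting() {
    sed -n 's/^SYS_AEK_ENABLED="\(.*\)"$/\1/p'
}

# Possible usage:
#   /nzlocal/scripts/hpfinfo | aek_setting
# A result of NO indicates that security is disabled.
```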


For systems that do not have security enabled:
a. Type the command:
[nz@nzhost ~]$ nzpush -s <spa>/<dspu of the spa> nzsed --encl
<enclosure number> --slot <disk slot number> --initialize
Where <spa> is the SPA number and <dspu of the spa> is the slot number of the
designated spu (identified in step 16, substep a). Enclosure number and disk slot
number of the replacement disk have already been identified.
For example:
[nz@nzhost ~]$ nzpush -s 1/9 nzsed --encl 6 --slot 2 --initialize
b. Type the following command to confirm the band settings:
[nz@nzhost ~]$ nzpush -s <spa>/<dspu of the spa> nzsed --encl
<enclosure number> --slot <disk slot number> --getBandInfo
--useMSID
Example output:

spu0101: name: Global_Range
spu0101: rangeStart: 0
spu0101: rangeLength: 0
spu0101: readLockEnabled: 0
spu0101: writeLockEnabled: 0
spu0101: readLocked: 0
spu0101: writeLocked: 0
spu0101: lockOnReset: [ POWER_CYCLE ]
spu0101: activeKey: [0000]: 00 00 08 06 00 00 00 01
spu0101: BAND MASTER 1:
spu0101: name: Band1
spu0101: rangeStart: 0
spu0101: rangeLength: 1172123567
spu0101: readLockEnabled: 0
spu0101: writeLockEnabled: 0
spu0101: readLocked: 0
spu0101: writeLocked: 0
spu0101: lockOnReset: [ POWER_CYCLE ]
spu0101: activeKey: [0000]: 00 00 08 06 00 00 00 02
spu0101: Done.

The rangeLength value in BAND MASTER 1 must be greater than 0.


If the rangeLength is equal to 0, then contact IBM Netezza Support, or swap out
the disk with a new replacement disk.
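
The rangeLength check can be sketched as a filter over the getBandInfo output. The function names are hypothetical, and the parsing assumes the line layout of the example output above, in which the Global_Range section precedes the BAND MASTER 1 section.

```shell
# Hypothetical helpers (not part of NPS).

# Print the rangeLength value that follows the "BAND MASTER 1:" line in
# getBandInfo output supplied on standard input. The Global_Range
# rangeLength, which appears earlier, is ignored.
band1_range_length() {
    awk '/BAND MASTER 1:/          { in_band = 1; next }
         in_band && /rangeLength:/ { print $NF; exit }'
}

# Succeed only if the Band1 rangeLength is present and greater than 0.
band1_configured() {
    len=$(band1_range_length)
    [ -n "$len" ] && [ "$len" -gt 0 ]
}

# Possible usage: pipe the output of the --getBandInfo command shown in
# substep b into the check:
#   ... --getBandInfo --useMSID | band1_configured || echo "rangeLength is 0"
```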
20. If the role of the replacement disk, as established in step 14, is Inactive or
Mismatched, you must activate the disk.
Note: On N3001-001 systems, this command pauses the system for a few minutes.
Therefore this step should be performed during a maintenance window or when pausing
the system for a few minutes is acceptable to the customer.

To activate the disk, using the HW ID identified in step 14, type the command:
[nz@nzhost ~]$ nzhw activate -id <HW ID>
When prompted, type y for confirmation.
For example, if the HWID of the disk (identified in step 14) is 1637:
[nz@nzhost ~]$ nzhw activate -id 1637
On N3001-001 systems, when prompted, type y for confirmation.
Note: On N3001-001 during host disk activation the system is paused and SPUs
reconfigured and restarted. When the restart is complete the system goes back to the
online state. During this process several intermediate system states can also be
observed (Discovering, Initializing, Resuming).

The replacement disk now has the role of Spare.


To verify the role of the replacement disk (for example):
[nz@nzhost ~]$ nzhw -id 1637
Description HW ID Location Role State Security
----------- ----- --------------------- ------------- --------
Disk 1637 spa1.diskEncl6.disk2 Spare Ok Enabled
21. For N3001-002 and larger systems, if you are replacing multiple drives, at
this point return to step 7 to begin the replacement of the next drive.
22. Check to see if a disk regeneration is in progress:

[nz@nzhost ~]$ nzspupart -regenstatus

Compare the results to those obtained in step 6. If a regen was in progress in step 6,
the status must now show that it has progressed. If it has not, call IBM Netezza Support.
23. Check for data slice issues, in the event that a data slice may be degraded:
[nz@nzhost ~]$ nzds -issues

If a data slice is degraded, you must initiate a manual disk regeneration to the spare
disk. For more information, see the Netezza System Administrator's Guide, in Chapter
5, in the section “Regenerate a Disk Slice.”
24. If Call Home was previously disabled, enable it.
[nz@nzhost1 ~]$ nzcallhome -on
IMPORTANT! Do not start the disk regeneration of a new, spare disk until you have verified
that its firmware meets the minimum required firmware revision. Proceed to the next
section, "Checking the Firmware Revision of the Replacement Disk."

Checking the Firmware Revision of the Replacement Disk


Replacement disks must have an approved firmware revision to ensure normal operation of
the Netezza system. Currently, the approved disk firmware revisions are:
• 600GB Model ST600MM0026E - E56D, E56F
Note: If the replacement disk has a firmware version higher than an approved version,
contact Netezza Support.

To check the firmware revision of a replacement disk drive, follow these steps:
1. Log in to the Netezza system as the nz user.
2. Display the firmware details of the replacement disk using its hardware ID:
a. For N3001-002 and larger systems, for HWID 1637:
[nz@nzhost ~]$ nzhw show -id 1637 -detail
Description HW ID Location Role State Serial number Version
----------- ----- ------------------- ----- ----- ------------------- -------
Disk 1637 spa1.diskEncl6.disk2 Spare Ok S0M28B230000M4357PEP E56D
Detail Model
-------------------------------
558.91 GiB; ST600MM0026;

In the sample, the disk firmware revision is in the Version column (E56D in this
example).
b. For N3001-001 systems:
[nz@nzhost ~]$ nzhw show -id xxxx -detail
Where xxxx is the HWID from step 20 on page 4-20.
The output from this command includes a serial number. Use that serial number in
the command:
[nz@nzhost ~]$ nzhw show -detail | grep serial_number
Where serial_number is the serial number from the output of the previous
command.

The output from this command shows the firmware revision on the replacement disk
drive.
• If the disk firmware revision matches the correct revision, the firmware check process
is complete and you can skip the remaining steps of this procedure.
• If the disk firmware revision does not match the correct revision, you must update
the firmware by continuing with this procedure.
Note: The N3001 system performs disk firmware checking. If the replacement disk
does not match the minimum allowed firmware revision, an nzevent warning message
is generated and a warning logged in sysmgr.log.

For example:
NZEVENT
NPS system Q100-23E-D - Disk 1053 Needs attention. System
initiated.
location:Logical Name:'spa1.diskEncl1.disk6' Logical Location:'1st
Rack, 1st SPA, 1st DiskEnclosure, Disk 6'
error string:disk firmware revision is below supported level
devSerial:6XR1PF800000M226AT22
event source:System initiated

SYSMGR
2013-12-02 10:21:58.275990 EDT Warning: Disk [disk hwid=1053
sn="6XR1PF800000M226AT22" SPA=1 Parent=1008 Position=12
ParentEnclPosition=2] is at firmware revision 'B556', which is
below supported level of 'B55C'
Note: If the disk firmware revision is a later version than the version listed as the correct
revision, it is not necessarily a problem and no action may be needed. Check with
IBM support if you need assistance.

3. As the root user, change directory:


[root@nzhost ~]# cd /opt/nz/fdt

A disk must be idle before you update its firmware. If a disk has a role of Spare, as shown
in the output of step 2, the disk is idle and you can proceed.

Note: The firmware updater does not allow the user to update disk firmware if its state
is active.

4. Update the drive firmware:


For N3001-002 and larger:
The firmware_updater command has the following format:
[root@nzhost fdt]# ./firmware_updater StorageMedia --spa a --encl b
--disk c
Where a is the SPA number, b is the enclosure number, and c is the disk drive number.
For example, for SPA 1, Enclosure 6, Disk 2:
[root@nzhost fdt]# ./firmware_updater StorageMedia --spa 1 --encl 6
--disk 2

For N3001-001:
The firmware_updater command has the following format:
[root@nzhost fdt]# ./firmware_updater Host --update-storage
--ignore-cluster-state --ignore-nps-state --alias 'haX' --slot YY
--skip-bmc-login-test --skip-prompt
Where YY is the slot number for the disk as reported by nzhw (slot 1 through 24).
For example, for the disk in slot 12:
[root@nzhost fdt]# ./firmware_updater Host --update-storage
--ignore-cluster-state --ignore-nps-state --alias 'haX' --slot 12
--skip-bmc-login-test --skip-prompt

5. Verify that the replacement disk has the correct revision of the firmware (shown as
E56D in this example), and the overall drive health is OK (no predictive failures, low
Grown defect list value: less than 6 for new drives).
As user nz, type the commands:
[nz@nzhost fdt]$ cd /opt/nz/fdt
[nz@nzhost fdt]$ ./storage_diags smart --spa 1 --encl 6 --disk 2
Example output:
Now creating the lock file [DONE]
------------------------------------------------------------------
***** S T O R A G E D I A G S *****
FDT 4.2.0.0 - /opt/nz/fdt/log/storage_diags_20140905-172736.log
-----------------------------Smart--------------------------------
Checking SPU availability [DONE]
------------------------------------------------------------------
SC_IO: bad ioctl driver_status=8
DSK: Make : ST600MM0026
DSK: Model : IBM-ESXS
DSK: F/W Rev. : E56D
DSK: S/N : S0M1NQFA0000B4259BW0
DSK: Size : 600 GB, (1172123567 sectors)
DSK: Transport Protocol: SAS
DSK: Disk Location : encl3Slot20
---------------Write---------------
Errors corrected with possible delays = 0
Total re-writes re-reads = 0
Total corrected errors = 0
Total times correction algorithm processed = 0
Total bytes processed = 417254719768
Total uncorrected errors = 0
---------------Read---------------
Errors corrected without possible delays = 293538838
Errors corrected with possible delays = 0
Total re-writes re-reads = 0
Total corrected errors = 293538838
Total times correction algorithm processed = 0
Total bytes processed = 85849841480
Total uncorrected errors = 0
---------------Verify---------------
Errors corrected without possible delays = 2864805
Errors corrected with possible delays = 0
Total re-writes re-reads = 0
Total corrected errors = 2864805
Total times correction algorithm processed = 0
Total bytes processed = 852784400
Total uncorrected errors = 0
---------Non Medium-------------
Non-medium error count = 28
---------Temperature-------------
Current temperature = 22
Reference temperature = 65
---------Self Test-------------
Self-test Extended Duration = 62 minutes
Self-test Short Duration < 3 minutes
---------Background Scans ------------
Background Scanning Status = Background scanning is
enabled and the device is waiting for Background Medium Interval
timer experation
Number of background scans performed = 56
Background medium scan progress = 0%
---------Power On Hours-------------
Power On Hours = 3689
---------Grown list -------------
Grown defect list = 0
---------Predictive Failure -------------
Predictive failure = None
---------Date of Manufacturing---------
Calendar Year = 2013
Calendar Week = 52
------------------------------------------------------------------
Now removing the lock file [DONE]
-----------------------------SUMMARY------------------------------
Final Status [PASS]
Main log file - /opt/nz/fdt/log/storage_diags_20140905-172736.log
------------------------------------------------------------------
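
The three criteria in step 5 (firmware revision, Grown defect list value, and predictive failures) can be checked against saved storage_diags output with a sketch like the following. The function name check_drive_health is hypothetical, and the field labels are assumed to match the example output above exactly.

```shell
# Hypothetical helper (not part of the FDT tools): read storage_diags smart
# output on standard input and check the criteria from step 5.
#   $1 = expected firmware revision, for example E56D
check_drive_health() {
    awk -v want="$1" '
        /F\/W Rev\./         { fw = $NF }      # DSK: F/W Rev. : E56D
        /Grown defect list/  { grown = $NF }   # Grown defect list = 0
        /Predictive failure/ { pf = $NF }      # Predictive failure = None
        END {
            ok = 1
            if (fw != want) { print "firmware is " fw ", expected " want; ok = 0 }
            if (grown == "" || grown + 0 >= 6) { print "grown defect list is " grown; ok = 0 }
            if (pf != "None") { print "predictive failure: " pf; ok = 0 }
            exit (ok ? 0 : 1)
        }'
}

# Possible usage, as user nz in /opt/nz/fdt:
#   ./storage_diags smart --spa 1 --encl 6 --disk 2 | check_drive_health E56D
```

The function prints one line per failed criterion and returns a non-zero status if any criterion fails. The limit of 6 for the Grown defect list applies to new drives, as stated in step 5.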

6. Return to step 20 on page 4-20 to activate the disk as Spare.

CHAPTER 5
Replacing a Disk Enclosure
Each IBM PureData System for Analytics N3001 rack contains:
• Two Disk Enclosures - N3001-002
• Six Disk Enclosures - N3001-005
• Twelve Disk Enclosures - N3001-010 and larger
Each Disk Enclosure houses 24 disk drives.
Note: The N3001-001 system does not use Disk Enclosures.

Each Disk Enclosure contains two ESMs and two power modules.
Before you begin the Disk Enclosure replacement process, make certain that you have a
replacement component that conforms to the hardware models supported for the N3001
system.
Observe Electrostatic Discharge (ESD) precautions when handling electronic components.
ESD precautions are included in “Electrostatic Discharge Precautions” on page 1-17.

This procedure requires the system to be taken offline.

This procedure requires the user to have root access.

The estimated time to perform this procedure is from 60 to 180 minutes, depending on
ease of access to the system and familiarity with NPS and the Netezza system.
The FRU number for a disk enclosure midplane assembly is 81Y9834.
Complete details on Disk Enclosure removal and replacement are provided in the IBM System
Storage EXP2500 Installation, User’s, and Maintenance Guide.
To replace a Disk Enclosure on the N3001 system, follow these steps:
Note: This procedure requires that NPS is running on Host 1 (ha1).

1. Read the safety information that begins on page v.


2. Log into the active host as user nz.
3. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status

b. If enabled, disable it:


[nz@nzhost1 ~]$ nzcallhome -off
4. Validate the current state of the system prior to replacement. Run sys_rev_check:
a. Change directory to:
[nz@nzhost ~]$ cd /opt/nz/fdt

b. Run the command:


[nz@nzhost ~]$ ./sys_rev_check

c. Make note of the output to compare with results later in the procedure.
5. Check the state of the Netezza system:
[nz@nzhost ~]$ nzstate
System state is 'Online'.
6. If the system state is online, stop the system using the command:
[nz@nzhost ~]$ nzstop

7. Wait for the system to stop. Check the state using the command:


[nz@nzhost ~]$ nzstate
System state is 'Stopped'.
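
Instead of re-running nzstate by hand, the wait in step 7 could be sketched as a polling loop. The function name wait_for_state and the NZSTATE_CMD variable are hypothetical, and the default of 60 attempts at 5-second intervals is an arbitrary assumption, not a documented timeout.

```shell
# Hypothetical helper (not part of NPS): poll the system state until it
# matches the wanted value or the attempts run out.
#   $1 = wanted state (for example Stopped)
#   $2 = number of attempts (assumed default 60)
#   $3 = seconds between attempts (assumed default 5)
# NZSTATE_CMD may name a different command that prints the state line;
# it defaults to nzstate.
wait_for_state() {
    want=$1
    tries=${2:-60}
    delay=${3:-5}
    while [ "$tries" -gt 0 ]; do
        # nzstate prints a line such as: System state is 'Stopped'.
        if ${NZSTATE_CMD:-nzstate} | grep -q "System state is '$want'"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Possible usage after nzstop:
#   wait_for_state Stopped || echo "system did not reach the Stopped state"
```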
8. Become user root:
[nz@nzhost ~]$ su

9. Power off all SPAs:


[root@nzhost1 ~]# /nzlocal/scripts/rpc/spapwr.sh -off all

10. Locate the enclosure being replaced. Figure 1-2 on page 1-3 through Figure 1-11 on
page 1-12 show locations of system components.

Figure 5-1: Disk Enclosure LEDs

11. Power off the enclosure:

Figure 5-2: Power Supply LEDs and Connector (callouts: LEDs, Power Connector)

The disks must be removed (from the enclosure being replaced) and replaced in the exact
same locations, so ensure that their locations are noted.

12. From the enclosure being replaced, remove all cables from the power supplies and
ESMs, ensuring that the cables are properly labeled so that they can be replaced into
their original locations.

Statement 5

CAUTION:
The power control button on the device and the power switch on the power supply do not
turn off the electrical current supplied to the device. The device also might have more than
one power cord. To remove all electrical current from the device, ensure that all power
cords are disconnected from the power source.

13. Remove the shipping brackets (if installed, colored orange) at the rear of the disk
enclosures:
a. Using a 7mm socket, loosen (do not remove) the nut that secures the shipping
bracket to each side of the rack.
b. With the nuts loosened, pull the bracket away from the disk enclosure, detaching
the bracket from the enclosure.
c. To completely remove the bracket from the rack, cables on one side of the rack,
behind the enclosure, need to be loosened by cutting the zip tie (or loosening the
Velcro tie) that secures them to the side of the rack.

d. After cutting the zip tie (or loosening the Velcro tie) lift that end of the bracket off
the rail and then rotate the bracket out of the rack.
e. Reattach the cables to their original position using a new zip tie (or the Velcro tie).
f. Tighten the nuts that secured the bracket to each side of the rack.
g. Repeat for all disk enclosure brackets.
h. The brackets should be stored on-site in the event the system needs to be moved
(requiring them to be re-installed).
14. Remove all power supplies, ESMs, and disks.
The disks and ESMs must be removed from the failed enclosure and installed into the
replacement enclosure.
The power supplies and ESMs must be removed and replaced in the exact same locations,
so ensure that their location is noted.

15. Install the ESMs into the replacement disk enclosure, ensuring they are installed into
the same slots from which they were removed.
16. Install the power supplies into the replacement disk enclosure, ensuring they are
installed into the same slots from which they were removed.
17. Note the serial number of the replacement enclosure and install it into the rack. The
serial number label is located on the top of the enclosure, near the front.
18. Reconnect all cables to the power supplies and ESMs.
19. Install all disks into the enclosure, ensuring they are installed into the same slots from
which they were removed.
20. Power on all SPAs:
[root@nzhost1 ~]# /nzlocal/scripts/rpc/spapwr.sh -on all
21. Start the bootp server:
a. Open another process window. (If using a KVM, press Alt-F2 to start a new process
window. If using a network connection and terminal session, open a new session.)
b. Issue the following command to make sure that the bootp server is started:
[root@nzhost1 ~]# /nz/kit/sbin/bootpsrv
Note: Leave this process running until you are instructed otherwise.

22. Wait 10 minutes to ensure that all SPUs are booted.


23. Run the following command to set the enclosure IDs:
[root@nzhost1 ~]# /nz/kit/bin/adm/encl_setIds
Check the latest entry in the logfile that can be found at /tmp/iocheck.log.
Note: All the power supplies of all disk enclosures must be operating correctly for this
command to successfully complete.

If any components have errors, troubleshoot that component and resolve the error
before proceeding.


24. Type the following command to initialize the Ethernet interface:


[root@nzhost ~]# /nzlocal/scripts/spade/spade_init.sh
25. Check SAS cable connectivity:
a. Change directory to:
[root@nzhost ~]# cd /opt/nz/fdt

b. Run the command:


[root@nzhost ~]# ./system_diags DataPathCheck --rack x --encl y --skip-power-cycle
Where x is the disk enclosure rack number and y is the enclosure number.
If any issues are noted, make the corrections and re-run the command.
26. Power cycle the SPAs:
[root@nzhost1 ~]# /nzlocal/scripts/rpc/spapwr.sh -off all
[root@nzhost1 ~]# /nzlocal/scripts/rpc/spapwr.sh -on all

27. Log out as user root and return to the user nz session.
[root@nzhost ~]# exit

28. Run sys_rev_check to verify that the system is configured correctly.


a. Change directory to:
[nz@nzhost ~]$ cd /opt/nz/fdt

b. Run the command:


[nz@nzhost ~]$ ./sys_rev_check

c. If issues are noted in the sys_rev_check output, resolve the issues as described in
the FDT User’s Guide, in the section “Resolve sys_rev_check Issues.”
29. Review the information on the screen to make note of the current firmware versions,
comparing the present results to the results from step 4.
30. Exit bootp server:
Return to the process window where you started the bootp server and type Ctrl-C to
stop the bootp server. You may close this process window by typing exit at the prompt.

31. Start the NPS system:


[nz@nzhost ~]$ nzstart

32. Verify that the system eventually comes online:


[nz@nzhost ~]$ nzstate
System state is 'Online'.
33. Check for any issues with the commands:
[nz@nzhost ~]$ nzhw -issues
[nz@nzhost ~]$ nzds -issues
[nz@nzhost1 ~]$ nzspupart -issues

34. If Call Home was previously disabled, enable it.


[nz@nzhost1 ~]$ nzcallhome -on
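
The final verification steps (waiting for the Online state, then running the three -issues commands) can be sketched as a small POSIX shell helper. This is an illustration only, not part of the documented procedure: the function names, the retry count, and the delay are assumptions, and the helper relies only on the nzstate output format shown in the steps above.

```shell
#!/bin/sh
# Sketch only. wait_for_online and run_issue_checks are hypothetical helper
# names; the retry count and delay are illustrative assumptions.

wait_for_online() {
    tries=${1:-60}   # assumed maximum number of attempts
    delay=${2:-10}   # assumed seconds to wait between attempts
    i=0
    while [ "$i" -lt "$tries" ]; do
        # nzstate prints: System state is 'Online'.
        if nzstate 2>/dev/null | grep -q "System state is 'Online'"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

run_issue_checks() {
    # Run the three checks named in the procedure and label each output.
    for cmd in nzhw nzds nzspupart; do
        echo "== $cmd -issues =="
        "$cmd" -issues
    done
}

# Usage (as user nz):
#   wait_for_online && run_issue_checks
```

The helper only reports what the commands print; any issue that is listed still has to be reviewed and resolved by the person performing the replacement.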



CHAPTER 6
Replacing a Management Module
Before you begin the Management Module replacement process, make certain that you
have a replacement that conforms to the hardware models supported for the IBM PureData
System for Analytics N3001. Each N3001 rack has two Advanced Management Modules
(AMMs) (two per BladeCenter H chassis).
Note: The N3001-001 system does not use Management Modules.

Observe Electrostatic Discharge (ESD) precautions when handling electronic components.


ESD precautions are included in “Electrostatic Discharge Precautions” on page 1-17.

This procedure requires the user to have root access.

The estimated time to perform this procedure is from 30 to 120 minutes, depending on
ease of access to the system and familiarity with NPS and the Netezza system.
The FRU number for the AMM is 47C2480, and can be verified as described in “H-Chassis
Component FRU Numbers” on page 1-13.
Note: For the N3001, use a replacement AMM of the same FRU number only. Do not mix
AMMs with different FRU numbers in the same system.

To replace an AMM on the N3001 system, follow these steps:


1. Read the safety information that begins on page v.
2. Inspect all power connections at the PDUs to ensure that all connections are tight and
secure.
3. Inspect all power supplies in the system to ensure that the Power On LEDs are lit
(green) and that all power supplies are functional.
4. Log into the active host as user nz.
5. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status

b. If enabled, disable it:


[nz@nzhost1 ~]$ nzcallhome -off

6. Validate the current state of the system prior to replacement. Run sys_rev_check:
a. Change directory to:


[nz@nzhost1 ~]$ cd /opt/nz/fdt

b. Run the command:


[nz@nzhost1 ~]$ ./sys_rev_check

c. Make note of the output to compare with results later in the procedure.
7. Check the state of the Netezza system:
[nz@nzhost1 ~]$ nzstate
System state is 'Online'.
8. If the system state is stopped, start the system using the command:
[nz@nzhost1 ~]$ nzstart
9. Wait for the system to come online using the command:
[nz@nzhost1 ~]$ nzstate
System state is 'Online'.
10. Identify the failed AMM that requires replacement:
[nz@nzhost1 ~]$ nzhw -issues
Description HW ID Location Role State
----------- ----- --------------------- -------- -----
MM 1004 spa1.mm1 Failed Ok
The State column typically has a state of Ok as shown in this output. The state could
also be Unreachable or None.
11. Obtain the physical location of the failed AMM:
[nz@nzhost1 ~]$ nzhw locate -id 1004
Note: The location of the AMM appears in the command output. The person who is
running the command should communicate the location of the AMM to the person who is
onsite with the Netezza system.

12. Using the location information, physically locate the failing AMM. See Figure 1-9 on
page 1-10 through Figure 1-11 on page 1-12.
13. Disconnect the Ethernet cable from the AMM. If the failing AMM is in Chassis 1 and is
MM1, also disconnect the video and USB cables.


Figure 6-1: AMM Features (callouts: Release Handle, Video Connector, Ethernet Connector, USB Connectors)

14. Remove the failing AMM (by pulling out and down on the release handle).

Figure 6-2: AMM Removal

15. Install the replacement AMM.


The replaced AMM is now AMMalt.
16. Reconnect the Ethernet cable and KVM cables (if applicable) to the AMM.
Do not interrupt the system until the AMM boot cycle has completed. Restoring the
configuration from the redundant AMM takes 8 to 20 minutes. Do not remove the AMMs from
their slots during the configuration synchronization process.


The primary AMM displays two LEDs in the ON state and the secondary AMM displays
one LED.
17. As user root from the active host, log into the active AMM:
[root@nzhost1 ~]# ssh USERID@mm00x
where x is the AMM SPA number (1-4).
When prompted for the password, type PASSW0RD (the 0 is a zero, not the letter O).
18. Check the primary AMM:
system> info -T system:mm[1]
Name: mm001
UUID: 003A 5B9A CFD0 11DE B7F4 0021 5E43 B79E
Manufacturer: IBM (FOXC)
Manufacturer ID: 20301
Product ID: 65
Mach type/model: Advanced Management Module
Mach serial number: Not Available
Manuf date: 4609
Hardware rev: 18
Part no.: 49Y6295
FRU no.: 60Y0621
FRU serial no.: YK12909BC2KF
CLEI: Not Available
AMM firmware
Build ID: BPET086
File name: CNETCMUS.PKT
Rel date: 03/22/2011
Rev: 8
Product Name: IBM Advance Management Module
19. Monitor the replacement AMM and ensure the update completes. The Status line
shows No update in progress when the update is complete:
system> info -T system:mm[2]
Name: Standby MM
UUID: F421 F8D7 F019 11DE 930C 0021 5E43 E960
Manufacturer: IBM (FOXC)
Manufacturer ID: 20301
Product ID: 65
Mach type/model: Advanced Management Module
Mach serial number: Not Available
Manuf date: 5209
Hardware rev: 18
Part no.: 49Y6295
FRU no.: 60Y0621
FRU serial no.: YK12909CM2D4
CLEI: Not Available
AMM firmware
Build ID: BPET086
File name: CNETCMUS.PKT
Rel date: 03/22/2011
Rev: 8
Status: No update in progress
Product Name: IBM Advance Management Module
20. View the Event Log:


system> displaylog -T system:mm[1]


Verify that the AMM installation is logged, and that there are no errors.
21. Make sure the initialization completes (shown in line 1 of the Event Log).
22. On systems where both AMMs in a chassis have been replaced (not necessarily at the
same time), it is necessary to check the AMM login configuration.
a. As root, type the command:
[root@nzhost1 ~]# ssh mm0xx
Where xx is the SPA number of the chassis with the replaced AMMs.
b. If you are not prompted for a password, skip to step 23. If you are prompted for a
password, press Ctrl-C.
c. Type the command (exactly as shown):
[root@nzhost1 ~]# /nz/kit/sbin/nzupgrade run CheckHost \; CheckHA \; SetMmUser
Note: There are spaces before the backslashes.

Example output:
------------------------------------------------------------------
IBM Netezza – Netezza Platform Software
(C) Copyright IBM Corp. 2002, 2012 All rights reserved.
------------------------------------------------------------------
WARNING: Inconsistencies ignored due to command line options:
Target and origin directories are identical (/nz/kit.7.0.22389).
Target and origin releases are the same.

Logfile: /nz/var/log/upgrade.20120402.7.0.run

Logfile: /nz/var/log/upgrade.20120402.7.0.run.gz
d. Repeat step a. If still prompted for a password, contact IBM Netezza Support.
23. As user nz, verify that the system is still Online:
[nz@nzhost1 ~]$ nzstate
System state is 'Online'.
24. Check for any issues with the commands:
[nz@nzhost ~]$ nzhw -issues
[nz@nzhost ~]$ nzds -issues
[nz@nzhost1 ~]$ nzspupart -issues

25. If Call Home was previously disabled, enable it.


[nz@nzhost1 ~]$ nzcallhome -on

26. Run sys_rev_check to verify that the system is configured correctly.


a. Change directory to:
[nz@nzhost1 ~]$ cd /opt/nz/fdt

b. Run the command:


[nz@nzhost1 ~]$ ./sys_rev_check


c. If issues are noted in the sys_rev_check output, resolve the issues as described in
the FDT User’s Guide, in the section “Resolve sys_rev_check Issues.”
27. Review the information on the screen to make note of the current firmware versions,
comparing the present results to the results from step 6.
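
The before-and-after comparison of sys_rev_check output (step 6 and step 27) can be sketched as follows. This is a hedged illustration: the helper names and the report file paths are hypothetical, and it assumes that the sys_rev_check output can simply be redirected to a file. The output may contain run-specific lines, so any reported difference still needs to be reviewed by a person.

```shell
#!/bin/sh
# Sketch only. capture_sys_rev_check and compare_reports are hypothetical
# helper names; the report file names used in the usage note are assumptions.

capture_sys_rev_check() {
    # $1 = file that receives the report
    ( cd /opt/nz/fdt && ./sys_rev_check ) > "$1" 2>&1
}

compare_reports() {
    # $1 = report captured before the replacement
    # $2 = report captured after the replacement
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "missing report file" >&2
        return 2
    fi
    if diff "$1" "$2" > /dev/null; then
        echo "no differences"
    else
        echo "differences found:"
        diff "$1" "$2"
        return 1
    fi
}

# Usage (as user nz):
#   capture_sys_rev_check /tmp/sys_rev_before.txt   # at step 6
#   capture_sys_rev_check /tmp/sys_rev_after.txt    # at step 26
#   compare_reports /tmp/sys_rev_before.txt /tmp/sys_rev_after.txt
```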

Note: If the replacement AMM is being reused from a former system, the VPD information
may not be refreshed after installation. If the AMM does not have the correct name in
the VPD, follow these steps:
1. Ensure the replacement AMM is the primary.
If the replacement AMM is not the primary, fail over the other AMM:
a. As user root, log into the AMM:
[root@nzhost1 ~]# ssh mm0xx
Where xx is the SPA number of the chassis with the replaced AMM.
b. Fail over the AMM:
system> reset -force -T mm[y]
Where for y, 1 is the top AMM in the chassis, and 2 is the bottom AMM in the
chassis.
For example, if the top AMM was replaced and the bottom AMM is now primary, type:
system> reset -force -T mm[2]
This causes the top AMM to become primary and also closes the current login session to
the AMM.
Before continuing after resetting an AMM, you must wait 15 for the new primary AMM
to be active.
2. As user root, log into the AMM:
[root@nzhost1 ~]# ssh mm0xx
Where xx is the SPA number of the chassis with the replaced AMM.
3. Check the VPD of the AMM:
system> config -T mm[y]
Where for y, 1 is the top AMM in the chassis, and 2 is the bottom AMM in the chassis.
The name of the AMM must be in the form of mm0xx, where xx is the SPA number of
the chassis with the replaced AMM.
4. If the name of the AMM needs to be corrected, type the command:
system> config -name mm0xx -T mm[y]
Where xx is the SPA number of the chassis with the replaced AMM and where for y, 1 is
the top AMM in the chassis, and 2 is the bottom AMM in the chassis.
5. Exit from the AMM:
system> exit
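
The AMM naming rule used in the note above (mm0xx, where xx is the SPA number) can be sketched as a pair of small shell helpers. The function names are hypothetical and the sketch only builds and checks the name string. It does not contact the AMM.

```shell
#!/bin/sh
# Sketch only. amm_name and is_valid_amm_name are hypothetical helper names.

amm_name() {
    # $1 = SPA number of the chassis, written without leading zeros
    case "$1" in
        ''|0*|*[!0-9]*)
            echo "SPA number must be a positive integer" >&2
            return 1
            ;;
    esac
    # mm0 followed by the two-digit SPA number, for example mm001 for SPA 1
    printf 'mm0%02d\n' "$1"
}

is_valid_amm_name() {
    # Succeeds only when $1 has the form mm0xx with two digits for xx.
    case "$1" in
        mm0[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   amm_name 1                       # prints mm001
#   is_valid_amm_name "Standby MM"   # fails, so the name needs correcting
```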



CHAPTER 7
Replacing a 10Gb Switch
Before you begin the switch replacement process, make certain that you have a
replacement switch that conforms to the hardware models supported for the IBM PureData
System for Analytics N3001. Each N3001 rack has two 10Gb switches (two per BladeCenter H
chassis).
Note: The N3001-001 system does not use 10Gb switches.

Observe Electrostatic Discharge (ESD) precautions when handling electronic components.


ESD precautions are included in “Electrostatic Discharge Precautions” on page 1-17.

This procedure requires the system to be taken offline.

This procedure requires the user to have root access.

The estimated time to perform this procedure is from 60 to 120 minutes, depending on
ease of access to the system and familiarity with NPS and the Netezza system.
The FRU number for the 10Gb switch is 90Y9392, and can be verified as described in
“H-Chassis Component FRU Numbers” on page 1-13.
To replace a switch on the N3001 system, follow these steps:
Note: This procedure requires that NPS is running on Host 1 (ha1).

1. Read the safety information that begins on page v.


2. Identify the active host in the cluster, which is the host where the NPS resource group
is running:
[root@nzhost1 ~]# crm_resource -r nps -W
Example output from a running system:
crm_resource[5377]: 2009/06/07_10:13:12 info: Invoked: crm_resource -r nps -W
resource nps is running on: nzhost1
Note: If the system is already in maintenance mode, an error message is output. To
identify the active (primary) host, type the command:

[root@nzhost1 ~]# service drbd status


The following line in the output shows the active/passive hosts:
1:r0 Connected Primary/Secondary UpToDate/UpToDate C /nz ext3
Primary is always HA1, and secondary is HA2. In this example, HA1 (primary) is the
active host. (If the output showed as Secondary/Primary, HA2 would be the active
host.)


3. Log in to the active host of the system (assumed here to be nzhost1) as user root.
4. Change to user nz:
[root@nzhost1 ~]# su - nz

5. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status

b. If enabled, disable it:


[nz@nzhost1 ~]$ nzcallhome -off

6. Run sys_rev_check to validate the current state of the system prior to replacement:
a. Change directory to:
[nz@nzhost1 ~]$ cd /opt/nz/fdt

b. Run the command:


[nz@nzhost1 ~]$ ./sys_rev_check

c. Make note of the output to compare with results later in the procedure.
7. Identify the failing switch by using one of these techniques:
• Issue the following command to the AMM from Host 1:
[nz@nzhost1 ~]$ ssh USERID@mm00x health -l all -f
Where x is the SPA number of the failing switch.
When prompted for the password, type PASSW0RD (use the number zero, not the
letter O).
The command output lists component health status and active alerts. Note the ID of
the failing switch. See Figure 1-3 on page 1-4 through Figure 1-11 on page 1-12.
• Issue the following command:
[nz@nzhost1 ~]$ nzhw -type ethsw
Check the Role and Status in the output of the command for the failing switch.
8. If NPS resource group is running (from step 2):
a. Run the following command to stop the Netezza server:
[nz@nzhost1 ~]$ nzstop

b. Exit from the nz session and return to user root:


[nz@nzhost1 ~]$ exit
c. While logged in to the active host as root, type the following commands to stop the
clustering processes:
[root@nzhost1 ~]# ssh ha2 service heartbeat stop
[root@nzhost1 ~]# service heartbeat stop

d. Run the following script:


[root@nzhost1 ~]# /nzlocal/scripts/nz.non-heartbeat.sh
Ensure the script completes without errors.
9. Remove all cables from the switch, ensuring that the cables are properly labeled so
that they can be replaced into their original locations.


Statement 3

CAUTION:
When laser products (such as CD-ROMs, DVD drives, fiber optic devices, or transmitters)
are installed, note the following:
• Do not remove the covers. Removing the covers of the laser product could result in
exposure to hazardous laser radiation. There are no serviceable parts inside the device.
• Use of controls or adjustments or performance of procedures other than those specified
herein might result in hazardous radiation exposure.

DANGER

Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the
following.
Laser radiation when open. Do not stare into the beam, do not view directly with optical
instruments, and avoid direct exposure to the beam.

10. Remove the failed switch module from the chassis.


Note: It may be necessary to temporarily move cabling out of the way to make room for
the switch removal and replacement.

Figure 7-1: Gb Switch Removal


11. Install the replacement switch module. Replace the cables into their original positions.
Note: It may be necessary to move cabling back into place if it was moved out of the
way to make room for the switch removal and replacement.

12. Change directory to /nzlocal/scripts/spa:


[root@nzhost1 ~]# cd /nzlocal/scripts/spa

13. Type the command:


[root@nzhost1 ~]# ./spaconfigure.sh -n -s spa_number -components "switch[7] switch[9]"
For example, if replacing either switch in SPA 4:
[root@nzhost1 ~]# ./spaconfigure.sh -n -s 4 -components "switch[7] switch[9]"
When prompted, log into the AMM:
Please enter User ID
USERID
Please enter password
PASSW0RD (with number zero, not the letter O)
14. If spaconfigure fails on the first attempt, run spaconfigure again without the -n option,
and spaconfigure will pick up where it left off.
15. If spaconfigure continues to fail, find the reason by examining the spaconfigure log at:
/var/log/nz/spaconfigure/spaconfigure.SPAxx.log. Correct the issue, then re-run
spaconfigure.
16. Ping the Gb switch:
[root@nzhost1 ~]# ping gigsw[xx][a,b]
For example:
[root@nzhost1 ~]# ping gigsw01b

17. Run the following command to check connections to the switch:


[root@nzhost1 ~]# /nz/kit/bin/adm/tools/nznetw -a
18. Run the following commands to test system connections from the switch:
[root@nzhost1 ~]# cd /opt/nz/fdt
[root@nzhost1 ~]# ./system_diags concheck --uut gigsw

19. Put the system back into cluster mode:


a. Run the following script:
[root@nzhost1 ~]# /nzlocal/scripts/nz.heartbeat.sh

b. Type the following commands to start the clustering processes on the active host:
[root@nzhost1 ~]# service heartbeat start
[root@nzhost1 ~]# ssh ha2 'service heartbeat start'

20. From the active host (ha1), type the following and press Enter:
[root@nzhost1 ~]# crm_mon -i5
Result: When the cluster manager comes up and is ready, status appears as follows.
Make sure that nzinit has started before you proceed. (This could take a few minutes.)


Node: nps61074 (e890696b-ab7b-42c0-9e91-4c1cdacbe3f9): online
Node: nps61068 (72043b2e-9217-4666-be6f-79923aef2958): online

Resource Group: nps
drbd_exphome_device(heartbeat:drbddisk): Started nps61074
drbd_nz_device(heartbeat:drbddisk): Started nps61074
exphome_filesystem(heartbeat::ocf:Filesystem): Started nps61074
nz_filesystem (heartbeat::ocf:Filesystem): Started nps61074
fabric_ip (heartbeat::ocf:IPaddr): Started nps61074
wall_ip (heartbeat::ocf:IPaddr): Started nps61074
nzinit (lsb:nzinit): Started nps61074
fencing_route_to_ha1(stonith:apcmaster): Started nps61074
fencing_route_to_ha2(stonith:apcmaster): Started nps61068
21. From host 1 (ha1), press Ctrl+C to break out of crm_mon.
22. Change to user nz:
[root@nzhost1 ~]# su - nz

23. The system may require up to 10 minutes to come online. Repeat the following
command until it reports the Online state:
[nz@nzhost1 ~]$ nzstate
System state is 'Online'
24. Check for any issues with the commands:
[nz@nzhost1 ~]$ nzhw -issues
[nz@nzhost1 ~]$ nzds -issues
[nz@nzhost1 ~]$ nzspupart -issues

25. If Call Home was previously disabled, enable it.


[nz@nzhost1 ~]$ nzcallhome -on

26. Run sys_rev_check to verify that the system is configured correctly.


a. If the latest version of sys_rev_check is not already loaded onto the system, load the
latest as instructed in the FDT User’s Guide.
b. Change directory to:
[nz@nzhost1 ~]$ cd /opt/nz/fdt
c. Run the command:
[nz@nzhost1 ~]$ ./sys_rev_check

d. If issues are noted in the sys_rev_check output, resolve the issues as described in
the FDT User’s Guide, in the section “Resolve sys_rev_check Issues.”
27. Review the information on the screen to make note of the current firmware versions,
comparing the present results to the results from step 6.
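
The switch host name pattern from step 16 (gigsw[xx][a,b], for example gigsw01b) can be sketched as a shell helper that builds both names for an SPA and pings each one. The function names are hypothetical, and the choice of three ping packets is an assumption rather than a documented value.

```shell
#!/bin/sh
# Sketch only. gigsw_name and ping_spa_switches are hypothetical helper names.

gigsw_name() {
    # $1 = SPA number without leading zeros, $2 = switch suffix (a or b)
    case "$1" in
        ''|0*|*[!0-9]*) return 1 ;;
    esac
    case "$2" in
        a|b) ;;
        *) return 1 ;;
    esac
    printf 'gigsw%02d%s\n' "$1" "$2"
}

ping_spa_switches() {
    # $1 = SPA number; pings both switches of that SPA and reports the result
    for side in a b; do
        host=$(gigsw_name "$1" "$side") || return 1
        if ping -c 3 "$host" > /dev/null 2>&1; then
            echo "$host reachable"
        else
            echo "$host NOT reachable"
        fi
    done
}

# Usage (as user root on the active host):
#   ping_spa_switches 1
```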



CHAPTER 8
Replacing a Host System Board (N3001-001)
Before you begin the Host Server System Board replacement process, make certain that
you have a replacement system board that conforms to the hardware models supported for
the IBM PureData System for Analytics N3001-001. Typically, you will use a new
replacement host system board.
Observe Electrostatic Discharge (ESD) precautions when handling electronic components.
ESD precautions are included in “Electrostatic Discharge Precautions” on page 1-17.

Because of an issue with the early version of RAID Flash cards (FRU number 46C9793),
whenever you replace the planar of a host server that has an early version of the RAID
Flash, the RAID Flash must also be replaced (using new FRU number 44W3393) at the
same time as the planar. The following system serial numbers are the systems that include
the early RAID Flash cards: NZ33000 to NZ33028, NZ33100 to NZ33109, 7837001 to
7837038.

If replacing networking components in the host in addition to the system board, you must
replace just one component at a time, completing each procedure before continuing
to another component. Otherwise, it is difficult to determine which MAC address is
assigned to which port.
The estimated time to perform this procedure is from 60 to 180 minutes, depending on
ease of access to the system and familiarity with NPS and the Netezza system.
Note: The Host Server firmware must be updated as part of this procedure. You must have
bootable media available for the firmware update. FDT Support Tools 2.0.0.1 provides tools
and instructions for creating bootable USB drives and includes the latest critical host
firmware updates.

There is one host type for the IBM PureData System for Analytics N3001-001.
For the x3650-M4-HD, there are ten ports (eth0 through eth9).


The N3001-001 host system board uses Feature on Demand (FoD) keys for RAID configuration
and remote access. To restore the FoD keys, you must use a laptop computer to
retrieve the keys from the IBM Features on Demand website
(https://fod2.lenovo.com/lkms/angular/app/pages/index.htm) and then store the keys on a USB flash drive.
1. Log into the FoD website: https://fod2.lenovo.com/lkms/angular/app/pages/index.htm.
You need to have or create an IBM id for access.
2. Click on Retrieve history.
3. In the Search type dropdown, select Search history via UID.
4. In the Search value field, you must specify the server UID, which is a concatenation of
the machine type and system serial number (for example, 5460KQ5N05V).
5. Click Continue.
6. Select all active keys and press Download to save the keys on a USB flash drive to be
used later.
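
The server UID entered in step 4 is described as the machine type followed directly by the system serial number. A minimal shell sketch of that concatenation is shown here. The function name is hypothetical, and splitting the example value 5460KQ5N05V into machine type 5460 and serial number KQ5N05V is an assumption made for illustration.

```shell
#!/bin/sh
# Sketch only. fod_uid is a hypothetical helper name.

fod_uid() {
    # $1 = machine type, $2 = system serial number
    [ -n "$1" ] && [ -n "$2" ] || return 1
    # The UID is the two values joined with no separator.
    printf '%s%s\n' "$1" "$2"
}

# Usage:
#   fod_uid 5460 KQ5N05V
```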

To replace a Host system board on the N3001-001 system, follow these steps:
1. Read the safety information that begins on page v.
2. Log into the active host of the system as user root.
3. Record IMM information to be restored after the replacement. Type:
[root@nzhost1 ~]# cd /opt/nz-hwsupport/install_tools
[root@nzhost1 ~]# ./nz-rmgt.pl
Choose option 3: View existing information
--- Reloading info from remote management...
--- Network Enabled = Enabled
--- DHCP Client = Disabled
--- Hostname = IMM2-40f2e92d2e76
--- IP Address = 10.0.46.178
--- Subnet = 255.255.255.0
--- Gateway = 10.0.46.254
Make note of the information listed in the output.
Choose option 4: Exit
4. Save VPD to /nzscratch on other host server. On the host requiring the system board
replacement:
a. Change directory:
[root@nzhost1 ~]# cd /nz/export/tools/asu

b. Save the VPD:


[root@nzhost1 ~]# ./asu save /nzscratch/savedVPD.txt --group SYSTEM_PROD_DATA

c. Copy the VPD to the other host (assuming ha2 is the other host):
[root@nzhost1 ~]# scp /nzscratch/savedVPD.txt root@ha2:/nzscratch/savedVPD.txt


d. The VPD must be available for console access later in the procedure (step 17,
substep i), or printed if necessary.
5. Change to user nz:
[root@nzhost1 ~]# su - nz

6. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status

b. If enabled, disable it:


[nz@nzhost1 ~]$ nzcallhome -off
Note: Ensure that nzcallhome returns status as disabled. If there are errors in the
callHome.txt configuration file, errors are listed in the output, and callHome is disabled.

7. Check to see if the host drives have encryption Auto-Lock mode enabled. Type:
[nz@nzhost1 ~]$ nzhw show -type hostDisk
The Security column lists Enabled or Disabled. If Enabled, the drives are locked.
This information is required when you reach step 12. If Disabled, skip step 12, step
19, and step 26.
8. Check the state of the Netezza system:
[nz@nzhost1 ~]$ nzstate
System state is 'Online'.

9. If the system state is online, stop the system using the command:
[nz@nzhost1 ~]$ nzstop

10. Wait for the system to stop, using the command:


[nz@nzhost1 ~]$ nzstate
System state is 'Stopped'.

11. Exit from the nz session to return to user root:


[nz@nzhost1 ~]$ exit

12. If host disks are encryption Auto-Lock mode Disabled (from step 7), skip to step 13. If
host disks are encryption Auto-Lock mode Enabled (from step 7), extract the host
key(s):
a. Log in to the active host as user root.
b. Determine which host keys are stored:
[root@nzhost1 ~]# /nz/kit/bin/adm/nzkey list
Example output:
hostkey1
hostkey2
spuaek
hostkey1old
hostkey2old
Note: If only one key per host was generated, the hostkeynold entries (where n is 1 or 2) are not listed.

c. Use the following commands to complete any key activity:


[root@nzhost1 ~]# /nz/kit/bin/adm/nzkey resume -override


Note: If this command fails, contact IBM Netezza Support.

[root@nzhost1 ~]# /nz/kit/bin/adm/nzkey check -hostkey -override


Example output:
AEK feature is enabled
Host AEK status:
Verification of host ha1 AEK key: SUCCESS
Verification of host ha2 AEK key: SUCCESS
d. Use the following commands to create text files with the extracted keys for the
hosts:
[root@nzhost1 ~]# /nz/kit/bin/adm/nzkey extract -label hostkey[1|2] -file /nzscratch/hostkey[1|2].txt
Where you choose 1 if HA1 is the host requiring the replacement, and 2 if HA2 is
the host requiring the replacement.
Example output:
Key written to file
The following commands are used only if hostkeynold files are in the output of the
command in substep b:
[root@nzhost1 ~]# /nz/kit/bin/adm/nzkey extract -label hostkey[1|2]old -file /nzscratch/hostkey[1|2]old.txt
Example output:
Key written to file
[root@nzhost1 ~]# diff /nzscratch/hostkey[1|2].txt /nzscratch/hostkey[1|2]old.txt
The diff command compares the keys in hostkey and hostkeynold. If the values
are the same (no output), it is a valid key. If the values are different, contact IBM
Netezza Support.
e. Save the contents of the hostkey text files for use in step 19 on page 8-8.
13. Type the following command to put the system in Maintenance mode:
[root@nzhost1 ~]# /opt/nz-hwsupport/install_tools/nz-hbmgr.pl
Example output:
What do you want to do?

1: Maintenance Management
2: Heartbeat Management
3: Exit
Select one:
Type 1.
Example output:
HA1:
drbd status = RUNNING
heartbeat status = NOT RUNNING
HA2:
drbd status = RUNNING
heartbeat status = NOT RUNNING


Resource status: Stopped


Current NPS state is Stopped
Splitbrain is not detected.

Select a host:
1: Move HA1 in/out of maintenance
2: Move HA2 in/out of maintenance
3: Return systems to cluster mode
4: Previous Menu
:
Type 1.
Example output:
You have selected ha1
Stopping nps resource . . .
Stopping Heartbeat on both ha2...
Stopping Heartbeat on both ha1...
Putting ha1 into maintenance mode...
Done
HA1:
drbd status = RUNNING
heartbeat status = NOT RUNNING
HA2:
drbd status = RUNNING
heartbeat status = NOT RUNNING
Resource status: Stopped
Current NPS state is Stopped
Splitbrain is not detected.
ha1 appears to be maintenance mode
Select a host:
1: Move HA1 in/out of maintenance
2: Move HA2 in/out of maintenance
3: Return systems to cluster mode
4: Previous Menu
:
Select option 4:
What do you want to do?

1: Maintenance Management
2: Heartbeat Management
3: Exit
Select one:
Select option 3.
14. On the replacement system board, there are stickers with MAC addresses for all network
card ports and USB ports. You must replace the MAC addresses configured in NPS for the
failed system board with the MAC addresses assigned to the replacement system board.
Using an editor, for example vi, change the following files on the host that requires the
system board replacement:
a. Type the command:
[root@nzhost1 ~]# vi /etc/udev/rules.d/70-persistent-net.rules
Replace the MAC address (value for field ATTR{address}=="xx:yy:zz:cc:dd:ee") for
eth0 with the value for MAC 1 from the replacement system board.
Replace the MAC address (value for field ATTR{address}=="xx:yy:zz:cc:dd:ee") for
eth6 with the value for MAC 2 from the replacement system board.


Replace the MAC address (value for field ATTR{address}=="xx:yy:zz:cc:dd:ee") for
eth1 with the value for MAC 3 from the replacement system board.
Replace the MAC address (value for field ATTR{address}=="xx:yy:zz:cc:dd:ee") for
eth7 with the value for MAC 4 from the replacement system board.
Save and exit the file.
b. Type the command:
[root@nzhost1 ~]# vi /etc/sysconfig/network-scripts/ifcfg-eth0
Replace the MAC address (value for attribute HWADDR=) for eth0 with the value for
MAC1 from the replacement system board.
Remove the UUID from the file.
Save and exit the file.
c. Type the command:
[root@nzhost1 ~]# vi /etc/sysconfig/network-scripts/ifcfg-eth6
Replace the MAC address (value for attribute HWADDR=) for eth6 with the value for
MAC2 from the replacement system board.
Remove the UUID from the file.
Save and exit the file.
d. Type the command:
[root@nzhost1 ~]# vi /etc/sysconfig/network-scripts/ifcfg-eth1
Replace the MAC address (value for attribute HWADDR=) for eth1 with the value for
MAC3 from the replacement system board.
Remove the UUID from the file.
Save and exit the file.
e. Type the command:
[root@nzhost1 ~]# vi /etc/sysconfig/network-scripts/ifcfg-eth7
Replace the MAC address (value for attribute HWADDR=) for eth7 with the value for
MAC4 from the replacement system board.
Remove the UUID from the file.
Save and exit the file.
f. Type the command:
[root@nzhost1 ~]# vi /etc/sysconfig/network-scripts/ifcfg-usb0
Replace the MAC address (value for attribute HWADDR=) for usb0 with the value
for LAN2/USB MAC from the replacement system board.
Save and exit the file.
15. Type the following command on the host requiring the system board replacement:
[root@nzhost1 ~]# chkconfig heartbeat off
[root@nzhost1 ~]# shutdown -h now
Note: If the host is in a state where the above step is not possible, power off the host
by holding in the power button.

16. Replace the system board following the IBM replacement procedures in the IBM
Problem Determination and Service Guide for the server.

17. The host server firmware must be updated (including critical updates from FDT
Support Tools 2.0.0.1).
a. Insert the host-specific USB stick into the front USB port of the host that is being
configured. (Refer to the FDT Support Tools DVD for instructions on creating the
firmware update USB stick.)
b. Press F12 when the splash screen appears.
c. Select USB: Storage – USB Port#
d. Ignore the prompt to enter debug mode. Select f to select all entries when the
selection appears on the screen.
e. Select a to accept the menu.
f. Answer Y to update ASU settings.
g. Type y when prompted to save logs.
The firmware and ASU settings are updated.
h. Interrupt the countdown before the host shuts down, which opens a command line
in debug mode.
i. Set the VPD to the values saved from the replaced system board (UUID, serial
number, and MTM data):
cd asu
./asu64 set SYSTEM_PROD_DATA.SysInfoUUID <saved uuid_value>
./asu64 set SYSTEM_PROD_DATA.SysInfoSerialNum <saved s/n>
./asu64 set SYSTEM_PROD_DATA.SysEncloseAssetTag <3561-AAR, NZ12345>
The MTM for a restricted system is 3561-AAR; the MTM for a non-restricted system is
3561-AAJ. The serial number varies (NZ12345 is an example).
./asu64 save savedVPD.txt --group SYSTEM_PROD_DATA
cat savedVPD.txt <check data correctness>
reboot <host reboots twice>
18. As the host boots the second time, press F1 at the splash screen to enter the Unified
Extensible Firmware Interface (UEFI) menu, and restore IMM information recorded
prior to the replacement.
a. Select System Settings -> Integrated Management Module.
b. Then select Network Configuration -> Network Interface Port and set as Shared.
c. Select DHCP Control and set as Static IP.
d. Select IP Address and type the IMM IP address recorded in step 3 on page 8-2.
e. Select Subnet Mask and type the address recorded in step 3 on page 8-2.
f. Select Default Gateway and type the address recorded in step 3 on page 8-2.
g. Ensure that VLAN support is set to Disabled.
h. Select Save Network Settings.
i. Press Esc repeatedly to return to the System Configuration and Boot Management
screen.

19. If the host drives are encryption Auto-Lock mode Disabled (as identified in step 7 on
page 8-3), skip to step 20. If the host drives are encryption Auto-Lock mode Enabled, as
identified in step 7 on page 8-3:
a. Select System Settings -> Storage -> LSI MegaRAID.... -> Controller Management
-> Manage Foreign Configuration -> Preview Foreign Configuration -> Enter Security
Key for Locked Drives -> Security Key
b. In the next screen, in the Security Key field, type the key obtained in step 12 substep d,
then type the key again in the Confirm field.
The key must be typed correctly without errors, and you must record your entry (written or
photo) in the event that you need to change the key value later.

c. Select System Settings -> Storage -> LSI MegaRAID.... -> Controller Management
-> Manage Foreign Configuration -> Preview Foreign Configuration -> Import
Foreign Configuration
d. Confirm and select yes and press Enter.
e. At the message The operation has been performed successfully, select OK and press
Enter.
20. Restore the FoD keys:
Note: FoD keys for both RAID6 and IMM2 update need to be installed.

a. Connect the SSR’s laptop computer to the switch that provides network connectivity
to IMM/LOM shared port (management network)
b. Connect the USB flash drive with the FoD keys to the laptop.
c. From the laptop’s browser, navigate to the IMM IP address recorded in step 3 on
page 8-2.
d. Click Connecting to this WEB site not recommended. Ignore the warning.
e. Type the user name USERID and password PASSW0RD (use a zero instead of the
letter O).
f. Navigate to IMM Management -> Activation Key Management.
g. Click Add.
h. Select File and browse for the FoD keys from the USB flash drive attached to the
laptop. Click Close.
i. The keys are now listed in Activation Key Management with a status of Activation
key is valid. This confirms that the keys are applied to the system. Find the correct
serial number of the intended Host and click OK.
j. Log out and remove the laptop from the Host system.
k. Click Close and Log Out when finished.
21. From the UEFI, press Esc to exit and then type Y to save the changes. The host boots.
22. Restore the LOM addresses:

a. Type the following commands and note the outputs relating to the LOM addresses
for HA1 and HA2:
[root@nzhost1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0 | grep IPADDR
[root@nzhost1 ~]# ssh ha2 cat /etc/sysconfig/network-scripts/ifcfg-eth0 | grep IPADDR

b. Type the following command and note the IMM1 and IMM2 entries:
[root@nzhost1 ~]# cat /etc/hosts |grep imm
Example output:
--- 10.0.46.178 imm1
--- 10.0.46.180 imm2
c. Type the following command:
[root@nzhost1 ~]# /nzlocal/scripts/ipminetcfg
Example output:
--- Configure IMM network (this will restart network service) [y/n]?
Type y and press Enter.
--- Enter IP address for IMM on HA1 :
Type the IMM IP address for HA1 and press Enter.
--- Re-enter IP address for IMM on HA1 []:
Retype the IMM IP address for HA1 and press Enter.
--- Enter LOM address on HA1 :
Type the LOM address for HA1 and press Enter.
--- Re-enter LOM address on HA1 [] :
Retype the LOM address for HA1 and press Enter.
--- Enter IP address for IMM on HA2 :
Type the IMM IP address for HA2 and press Enter.
--- Re-enter IP address for IMM on HA2 [] :
Retype the IMM IP address for HA2 and press Enter.
--- Enter LOM address on HA2 :
Type the LOM address for HA2 and press Enter.
--- Re-enter LOM address on HA2 []:
Retype the LOM address for HA2 and press Enter.
--- Enter IMM network gateway :
Type the IMM network gateway address and press Enter.
--- Re-enter IMM network gateway []:
Retype the IMM network gateway address and press Enter.
--- Enter IMM network mask :
Type the IMM network mask and press Enter.
--- Re-enter IMM network mask []:
Retype the IMM network mask and press Enter.
--- Configuring IMM network, it may take some time
IMM network configuration completed
--- Update IMM usernames and passwords (this will change user 2
parameters on IMMs) [y/n]?
Type y and press Enter.
--- Enter username for IMMs [enter for default]:
Press Enter to set the default IMM user name, which is 'USERID'.
--- Enter user password for IMMs [enter for default]:
Press Enter to set the default password, which is 'PASSW0RD' (with a zero, '0').
Note: The IMM passwords are reverted to the default values on both hosts, and the stonith
configuration is updated to reflect these changes. Update the passwords at the earliest
opportunity to keep the system secure. These passwords can be updated only by using the
/nzlocal/scripts/ipminetcfg script.

23. From HA1, put the system into non-heartbeat mode:


[root@nzhost1 ~]# /nzlocal/scripts/nz.non-heartbeat.sh

24. Verify that all cables are connected correctly:


[root@nzhost1 ~]# /opt/nz/fdt/system_diags concheck
Example output:
------------------------------------------------------------------
***** S Y S T E M D I A G S *****
FDT 4.2.1.0 - /opt/nz/fdt/log/system_diags_20140925-074438.log
--------------------------- CONCHECK -----------------------------
Now checking the state of the system [PASS]
----------------------------- Rack 1 ------------------------------
------------ Rack 1 Host Bonded Network Connection Test -----------
Host ha1 iscsi-Link bond1 eth3 > Host ha2 eth3 [PASS]
Host ha1 iscsi-Link bond1 eth5 > Host ha2 eth5 [PASS]
Host ha1 iscsi-Ping bond1 eth3 > Host ha2 eth3 [PASS]
Host ha1 iscsi-Ping bond1 eth5 > Host ha2 eth5 [PASS]
Host ha1 iscsi-PingF bond1 eth3 eth5 Host ha2 bond1 eth3 eth5[PASS]
Host ha1 ha-Link bond2 eth2 > Host ha2 eth2 [PASS]
Host ha1 ha-Link bond2 eth4 > Host ha2 eth4 [PASS]
Host ha1 ha-Ping bond2 eth2 > Host ha2 eth2 [PASS]
Host ha1 ha-Ping bond2 eth4 > Host ha2 eth4 [PASS]
Host ha1 ha-PingF bond2 eth2 eth4 > Host ha2 bond2 eth2 eth4[PASS]
Host ha1 fabric-Link bond1:1 > Host ha2 bond1:1 [PASS]
Host ha1 drbd-Link bond1:2 > Host ha2 bond1:2 [PASS]
Host ha1 br0 bond2 [PASS]
Host ha2 iscsi-Link bond1 eth3 > Host ha1 eth3 [PASS]
Host ha2 iscsi-Link bond1 eth5 > Host ha1 eth5 [PASS]
Host ha2 iscsi-Ping bond1 eth3 > Host ha1 eth3 [PASS]
Host ha2 iscsi-Ping bond1 eth5 > Host ha1 eth5 [PASS]
Host ha2 iscsi-PingF bond1 eth3 eth5 Host ha1 bond1 eth3 eth5[PASS]
Host ha2 ha-Link bond2 eth2 > Host ha1 eth2 [PASS]
Host ha2 ha-Link bond2 eth4 > Host ha1 eth4 [PASS]
Host ha2 ha-Ping bond2 eth2 > Host ha1 eth2 [PASS]
Host ha2 ha-Ping bond2 eth4 > Host ha1 eth4 [PASS]
Host ha2 ha-PingF bond2 eth2 eth4 > Host ha1 bond2 eth2 eth4[PASS]
Host ha2 fabric-Link bond1:1 > Host ha1 bond1:1 [PASS]
Host ha2 drbd-Link bond1:2 > Host ha1 bond1:2 [PASS]
Host ha2 br0 bond2 [PASS]
Now removing the lock file [DONE]
-----------------------------SUMMARY------------------------------
Final Status [PASS]
Main log file - /opt/nz/fdt/log/system_diags_20140925-074438.log
------------------------------------------------------------------
25. Run the following command:
[root@nzhost1 ~]# /opt/nz-hwsupport/install_tools/nz-hbmgr.pl
Example output:
What do you want to do?

1: Maintenance Management
2: Heartbeat Management
3: Exit
Select one :
Type 1.
Example output:
HA1:
drbd status = RUNNING
heartbeat status = NOT RUNNING
HA2:
drbd status = RUNNING
heartbeat status = NOT RUNNING

Resource status: Stopped


Current NPS state is Stopped
Splitbrain is not detected.
ha1 appears to be in maintenance mode
Select a host:
1: Move HA1 in/out of maintenance
2: Move HA2 in/out of maintenance
3: Return systems to cluster mode
4: Previous Menu
:
Type 3.
Example output:
Removing ha1 from maintenance mode.
Starting Heartbeat on ha1...
Starting Heartbeat on ha2...
Set resource to start when heartbeat finishes coming up?
y/n [n]: y
Waiting for heartbeat to come online...
Starting nps resource...
HA1:
drbd status = RUNNING
heartbeat status = RUNNING
HA2:
drbd status = RUNNING
heartbeat status = RUNNING

Resource status: Started


Current active node is ha1
Current NPS state is Stopped
Splitbrain is not detected.

Select a host:
1: Move HA1 in/out of maintenance
2: Move HA2 in/out of maintenance
3: Return systems to cluster mode
4: Previous Menu
:
Type 4 to go back to the previous menu.
Type 3 to exit.

26. If the host drives are Auto-Lock mode Disabled (as identified in step 7 on page 8-3),
skip to step 27. If the host drives are encryption Auto-Lock mode Enabled, as identified
in step 7 on page 8-3:
a. As user nz, stop the system using the command:
[nz@nzhost1 ~]$ nzstop

b. The virtual drive may need to be secured. To check whether it is secured, as user root,
issue the following storcli command:
[root@nzhost1 ~]# /opt/MegaRAID/storcli/storcli64 /c0/vall show all
Example output:
Controller = 0
Status = Success
Description = None
/c0/v0 :
======
-----------------------------------------------------------
DG/VD TYPE State Access Consist Cache sCC Size Name
-----------------------------------------------------------
0/0 RAID10 Optl RW Yes NRAWBC - 1.089 TB
-----------------------------------------------------------
.
.
.
Span Depth = 2
Number of Drives Per Span = 2
Write Cache(initial setting) = WriteBack
Disk Cache Policy = Disabled
Encryption = FDE
Data Protection = Disabled
Active Operations = None
Exposed to OS = Yes
Creation Date = 04-09-2014
Creation Time = 03:48:53 PM
Emulation type = None

c. If the Encryption field is blank (or None), secure it with the following command:
[root@nzhost1 ~]# /opt/MegaRAID/storcli/storcli64 /c0/dall set security=on

d. For all systems with Auto-Lock mode Enabled, issue the commands:
[root@nzhost1 ~]# /nz/kit/bin/adm/nzkey resume
Note: If this command fails, contact IBM Netezza Support.

[root@nzhost1 ~]# /nz/kit/bin/adm/nzkey check -hostkey


Example output:
AEK feature is enabled
Host AEK status:
Verification of host ha1 AEK key: SUCCESS
Verification of host ha2 AEK key: SUCCESS
e. As user nz, start the system:
[nz@nzhost1 ~]$ nzstart

27. Type the following command on the host requiring the system board replacement:
[root@nzhost1 ~]# chkconfig heartbeat on
28. If Call Home was previously disabled, as user nz, enable it.
[nz@nzhost1 ~]$ nzcallhome -on

29. As user root, run sys_rev_check to verify that the system is configured correctly.
Change directory to:
[root@nzhost ~]# cd /opt/nz/fdt
Run the command:
[root@nzhost ~]# ./sys_rev_check
If issues are noted in the output, resolve the issues as described in the FDT User’s
Guide, in the section “Resolve sys_rev_check Issues,” and then rerun sys_rev_check to
verify that issues are resolved.
30. Type the following and press Enter:
[root@nzhost1 ~]# crm_mon -i5
Result: When the cluster manager comes up and is ready, status appears as follows.
Make sure that nzinit has started before you proceed. (This could take a few minutes.)
Node: nps61074 (e890696b-ab7b-42c0-9e91-4c1cdacbe3f9): online
Node: nps61068 (72043b2e-9217-4666-be6f-79923aef2958): online

Resource Group: nps


drbd_exphome_device(heartbeat:drbddisk): Started nps61074
drbd_nz_device(heartbeat:drbddisk): Started nps61074
exphome_filesystem(heartbeat::ocf:Filesystem): Started nps61074
nz_filesystem (heartbeat::ocf:Filesystem): Started nps61074
fabric_ip (heartbeat::ocf:IPaddr): Started nps61074
wall_ip (heartbeat::ocf:IPaddr): Started nps61074
nzinit (lsb:nzinit): Started nps61074
fencing_route_to_ha1(stonith:apcmaster): Started nps61074
fencing_route_to_ha2(stonith:apcmaster): Started nps61068
31. Press Ctrl+C to break out of crm_mon.
32. Change to user nz:
[root@nzhost1 ~]# su - nz

33. The system may require up to 10 minutes to come online. Repeat the following
command until it reports the Online state:
[nz@nzhost1 ~]$ nzstate
System state is 'Online'
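
The repeated check in step 33 can be sketched as a polling loop. The function name wait_for_online and its arguments are illustrative assumptions and are not part of the Netezza tooling; the default of 20 tries at 30-second intervals is chosen only to match the 10 minutes mentioned above.

```shell
# Hypothetical helper, not part of the Netezza tooling.
# Usage: wait_for_online [STATE_CMD] [TRIES] [DELAY_SECONDS]
# Runs STATE_CMD (default: nzstate) until its output reports the Online
# state. The defaults of 20 tries, 30 seconds apart, cover about 10 minutes.
wait_for_online() {
    _cmd="${1:-nzstate}"
    _tries="${2:-20}"
    _delay="${3:-30}"
    _i=0
    while [ "$_i" -lt "$_tries" ]; do
        # Match the state line shown in the example output of nzstate.
        if $_cmd 2>/dev/null | grep -q "System state is 'Online'"; then
            return 0
        fi
        _i=$((_i + 1))
        sleep "$_delay"
    done
    echo "system did not reach the Online state" >&2
    return 1
}
```

Run as user nz, wait_for_online with no arguments would call nzstate directly. If the function returns a nonzero status, check the system state manually before proceeding.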

CHAPTER 9
Replacing a Host System Board (N3001-002 or larger)
Before you begin the Host Server System Board replacement process, make certain that
you have a replacement system board that conforms to the hardware models supported for
the IBM PureData System for Analytics N3001. Typically, you will use a new replacement
host system board.
Observe Electrostatic Discharge (ESD) precautions when handling electronic components.
ESD precautions are included in “Electrostatic Discharge Precautions” on page 1-17.

If you are replacing networking components in the host in addition to the system board,
you must replace just one component at a time, completing each procedure before continuing
to another component. Otherwise, it is difficult to determine which MAC address is
assigned to which port.

The estimated time to perform this procedure is from 60 to 180 minutes, depending on
ease of access to the system and familiarity with NPS and the Netezza system.
Note: The Host Server firmware must be updated as part of this procedure. You must have
bootable media available for the firmware update. FDT Support Tools 2.0.0.1 provides tools
and instructions for creating bootable USB drives and includes the latest critical host
firmware updates.

There are two host types for the IBM PureData System for Analytics N3001:
• x3650M4 (which has four Ethernet ports that must be configured: eth6, eth7, eth0,
and eth1).
• x3750M4 (which has no Ethernet ports to configure).

The x3650M4 (N3001-002 and -005) host system board uses a Feature on Demand (FoD)
key for remote access. To restore the FoD keys, you must use a laptop computer to retrieve
the keys from the IBM Features on Demand website
(https://fod2.lenovo.com/lkms/angular/app/pages/index.htm).
1. Log into the FoD website: https://fod2.lenovo.com/lkms/angular/app/pages/index.htm.
You need to have or create an IBM id for access.
2. Click on Retrieve history.
3. In the Search type dropdown, select Search history via UID.
4. In the Search value field, you must specify the server UID, which is a concatenation of
the machine type and system serial number (for example, 8722KQ5N05V).
5. Click Continue.
6. Select all active keys and press Download to save the key(s) in a location to be used
later.
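
The search value in step 4 is a plain concatenation, which the following shell sketch illustrates. The function name fod_uid is hypothetical, and splitting the example value into machine type 8722 and serial number KQ5N05V is an assumption based on the four-character machine type convention. The whitespace stripping and upper-casing are also assumptions added for convenience.

```shell
# Hypothetical helper: build the FoD search value (server UID) by
# concatenating the machine type and the system serial number.
# Usage: fod_uid MACHINE_TYPE SERIAL_NUMBER
fod_uid() {
    # Join the two values, drop stray blanks, and upper-case the result.
    printf '%s%s\n' "$1" "$2" | tr -d ' \t' | tr '[:lower:]' '[:upper:]'
}
```

With the example above, fod_uid 8722 KQ5N05V yields the value to type in the Search value field.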

To replace a Host system board on the N3001 system, follow these steps:
1. Read the safety information that begins on page v.
2. Log into the active host of the system as user root.
3. On the host requiring the system board replacement, record IMM information to be
restored after the replacement. Type:
[root@nzhost1 ~]# cd /opt/nz-hwsupport/install_tools
[root@nzhost1 ~]# ./nz-rmgt.pl
Choose option 3: View existing information
--- Reloading info from remote management...
--- Network Enabled = Enabled
--- DHCP Client = Disabled
--- Hostname = IMM2-40f2e92d2e76
--- IP Address = 10.0.46.178
--- Subnet = 255.255.255.0
--- Gateway = 10.0.46.254
Make note of the information listed in the output.
Choose option 4: Exit
4. Save VPD to /nzscratch on other host server. On the host requiring the system board
replacement:
a. Change directory:
[root@nzhost1 ~]# cd /nz/export/tools/asu

b. Save the VPD:
[root@nzhost1 ~]# ./asu save /nzscratch/savedVPD.txt --group SYSTEM_PROD_DATA

c. Copy the VPD to the other host (assuming ha2 is the other host):
[root@nzhost1 ~]# scp /nzscratch/savedVPD.txt root@ha2:/nzscratch/savedVPD.txt

5. Change to user nz:


[root@nzhost1 ~]# su - nz

6. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status

b. If enabled, disable it:


[nz@nzhost1 ~]$ nzcallhome -off

7. Check to see if the host drives have encryption Auto-Lock mode enabled. Type:
[nz@nzhost1 ~]$ nzhw show -type hostDisk
The Security column lists Enabled or Disabled. If Enabled, the drives are locked.
This information is required when you reach step 12.
8. Check the state of the Netezza system:
[nz@nzhost1 ~]$ nzstate
System state is 'Online'.

9. If the system state is online, stop the system using the command:
[nz@nzhost1 ~]$ nzstop
10. Wait for the system to stop. Check the state using the command:
[nz@nzhost1 ~]$ nzstate
System state is 'Stopped'.

11. When the system is stopped, exit from the nz session to return to user root:
[nz@nzhost1 ~]$ exit

12. If host disks are encryption Auto-Lock mode Disabled (from step 7), skip to step 13. If
host disks are encryption Auto-Lock mode Enabled (from step 7), extract the host
key(s):
a. Log in to the active host as user root.
b. Determine which host keys are stored:
[root@nzhost1 ~]# /nz/kit/bin/adm/nzkey list
Example output:
hostkey1
hostkey2
hostkey1old
hostkey2old
Note: If only one key per host was generated, the hostkeynold entries are not listed.

c. Use the following commands to complete any key activity:


[root@nzhost1 ~]# /nz/kit/bin/adm/nzkey resume -override
Note: If this command fails, contact IBM Netezza Support.

[root@nzhost1 ~]# /nz/kit/bin/adm/nzkey check -hostkey -override


Example output:
AEK feature is enabled
Host AEK status:
Verification of host ha1 AEK key: SUCCESS
Verification of host ha2 AEK key: SUCCESS
d. Use the following commands to create text files with the extracted keys for the
hosts:
[root@nzhost1 ~]# /nz/kit/bin/adm/nzkey extract -label
hostkey[1|2] -file /nzscratch/hostkey[1|2].txt
Where you choose 1 if HA1 is the host requiring the replacement, and 2 if HA2 is
the host requiring the replacement.
Example output:
Key written to file
The following commands are used only if hostkeynold files are in the output of the
command in substep b:
[root@nzhost1 ~]# /nz/kit/bin/adm/nzkey extract -label
hostkey[1|2]old -file /nzscratch/hostkey[1|2]old.txt
Example output:
Key written to file

[root@nzhost1 ~]# diff /nzscratch/hostkey[1|2].txt /nzscratch/hostkey[1|2]old.txt
The diff command compares the keys in hostkeyn and hostkeynold. If the values
are the same (no output), it is a valid key. If the values are different, contact IBM
Netezza Support.
e. Save the contents of the hostkey text files for use in step 21 on page 9-5.
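The comparison in substep d relies on diff printing nothing when the keys match. The following shell sketch makes the result explicit by using the diff exit status and never prints the key contents. The function name compare_host_keys and its messages are illustrative assumptions and are not part of the Netezza tooling.

```shell
# Hypothetical helper, not part of the Netezza tooling.
# Usage: compare_host_keys HOST_NUMBER [DIRECTORY]
# Compares hostkeyN.txt with hostkeyNold.txt without printing key contents.
# Returns 0 if they match or no old key file exists, 1 if they differ,
# and 2 on a usage error or a missing current key file.
compare_host_keys() {
    _n="$1"
    _dir="${2:-/nzscratch}"
    case "$_n" in
        1|2) ;;
        *) echo "host number must be 1 or 2" >&2; return 2 ;;
    esac
    _new="$_dir/hostkey$_n.txt"
    _old="$_dir/hostkey${_n}old.txt"
    if [ ! -f "$_new" ]; then
        echo "missing $_new" >&2
        return 2
    fi
    if [ ! -f "$_old" ]; then
        echo "no old key file for host $_n; nothing to compare"
        return 0
    fi
    # diff exits with status 0 only when the two files are identical.
    if diff "$_new" "$_old" >/dev/null; then
        echo "keys match"
    else
        echo "keys differ: contact IBM Netezza Support"
        return 1
    fi
}
```

For example, compare_host_keys 1 checks the files extracted for HA1 in /nzscratch.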
13. Type the following commands to stop the clustering processes (assuming the failing
host is running NPS and HA1 is the primary host):
[root@nzhost1 ~]# ssh ha2 'service heartbeat stop'
[root@nzhost1 ~]# service drbd stop
[root@nzhost1 ~]# ssh ha2 'service drbd stop'
[root@nzhost1 ~]# service heartbeat stop
Note: If the host is in a state where the above step is not possible, skip to the next
step.

14. Type the following commands on the host requiring the system board replacement:
[root@nzhost ~]# chkconfig heartbeat off
[root@nzhost ~]# shutdown -h now
Note: If the host is in a state where the above step is not possible, power off the host
by holding in the power button.

15. Replace the system board following the IBM replacement procedures in the IBM
Problem Determination and Service Guide for the server.
16. Boot the host and log in as root.
17. Put the system in non-heartbeat mode:
[root@nzhost1 ~]# /nzlocal/scripts/nz.non-heartbeat.sh

18. Set the VPD to the values saved from the original system board in step 4 (UUID,
serial number, and MTM data):
[root@nzhost1 ~]# cat /nzscratch/savedVPD.txt
[root@nzhost1 ~]# cd /nz/export/tools/asu
[root@nzhost1 ~]# ./asu set SYSTEM_PROD_DATA.SysInfoUUID <uuid>
[root@nzhost1 ~]# ./asu set SYSTEM_PROD_DATA.SysInfoSerialNum <s/n>
[root@nzhost1 ~]# ./asu set SYSTEM_PROD_DATA.SysEncloseAssetTag <MT,serial num>
For example:
[root@nzhost1 ~]# ./asu set SYSTEM_PROD_DATA.SysInfoUUID 43130A38E4C511E3BB3640F2E9301638
[root@nzhost1 ~]# ./asu set SYSTEM_PROD_DATA.SysInfoSerialNum 06BN989
[root@nzhost1 ~]# ./asu set SYSTEM_PROD_DATA.SysEncloseAssetTag 3567-EEP,NZ35086

19. Verify that the VPD is correct:
[root@nzhost1 ~]# ./asu save /nzscratch/newVPD.txt --group SYSTEM_PROD_DATA
[root@nzhost1 ~]# cat /nzscratch/newVPD.txt
Verify that the output matches the data from step 18.
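The visual comparison in step 19 can be backed up with a small script, sketched below. The function name vpd_fields_match is hypothetical, and the sketch assumes that the files written by asu save contain one setting per line in name=value form; confirm this against a real saved file before relying on it.

```shell
# Hypothetical helper, not part of the Netezza tooling.
# ASSUMPTION: the files written by "asu save" hold one setting per line in
# name=value form. Check a real file before relying on this.
# Usage: vpd_fields_match SAVED_FILE NEW_FILE
vpd_fields_match() {
    _pat='^SYSTEM_PROD_DATA\.(SysInfoUUID|SysInfoSerialNum|SysEncloseAssetTag)='
    _a=$(grep -E "$_pat" "$1" | sort)
    _b=$(grep -E "$_pat" "$2" | sort)
    # Succeed only if the fields were found and are identical in both files.
    [ -n "$_a" ] && [ "$_a" = "$_b" ]
}
```

For example, vpd_fields_match /nzscratch/savedVPD.txt /nzscratch/newVPD.txt returns a zero status only when the UUID, serial number, and asset tag lines agree.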

20. Reboot the host server and update the firmware using the critical host firmware
updates from FDT Support Tools 2.0.0.1 (see the README for FDT Support Tools
2.0.0.1 for instructions to create bootable media):
a. Reboot the host:
[root@nzhost1 ~]# reboot

b. Insert the host-specific USB stick into the front USB port of the host that is being
configured. (Refer to the FDT Support Tools media for instructions on creating the
firmware update USB stick.)
c. Press F12 when the splash screen appears.
d. Select USB: Storage – USB Port#
e. Ignore the prompt to enter debug mode. Select f to select all entries when the
selection appears on the screen.
f. Select a to accept the menu.
g. Answer Y to update ASU settings.
h. Type y when prompted to save logs.
The firmware and ASU settings are updated, and the host reboots twice.
21. If the host drives are encryption Auto-Lock mode Disabled (as identified in step 7 on
page 9-2) skip to step 22. If the host drives are encryption Auto-Lock mode Enabled,
as identified in step 7 on page 9-2:
a. As the host boots the second time, press F1 at the splash screen to enter the
Unified Extensible Firmware Interface (UEFI) menu.
b. Select System Settings -> Storage -> LSI MegaRAID.... -> Controller Management
-> Advanced -> Enable Drive Security
c. Local Key Management (LKM) must be selected, then select OK.
d. In the next screen, in the Security Key field, type the key obtained in step 12 substep d,
then type the key again in the Confirm field.
The key must be typed correctly without errors, and you must record your entry (written or
photo) in the event that you need to change the key value later.

e. Deselect Pause for Password at Boot by pressing space bar. Verify that it is not
selected.
f. After recording your Security Key entry with written documentation or photo, select
I Recorded the Security Settings for Future Reference (as described in substep d)
using the space bar.
g. Select Enable Drive Security by pressing Enter.
h. Confirm and select yes and press Enter.
i. At the message The operation has been performed successfully, select OK and press
Enter.

j. Press Esc to exit the UEFI and then type Y to save the changes. The host boots.
k. If the system does not boot and reboots continuously, the key entered in
substep d is incorrect. While booting, press F1 at the splash screen, go to System
Settings and restart from substep b. (Select Change security key -> Change security
key settings, keeping Local Key Management. Then in the Existing Key field, type
the key saved in substep d, then in the New Key and Confirm fields type the correct
key and confirm. Again, record the key as typed. Continue at substep e.)
If you think you entered the correct key and are still not able to boot, contact IBM
Netezza Support for further help.
l. Once the system boots, the virtual drive may need to be secured. To check whether it is
secured, issue the following storcli command:
[root@nzhost1 ~]# /opt/MegaRAID/storcli/storcli64 /c0/vall show all
Example output:
Controller = 0
Status = Success
Description = None
/c0/v0 :
======
-----------------------------------------------------------
DG/VD TYPE State Access Consist Cache sCC Size Name
-----------------------------------------------------------
0/0 RAID10 Optl RW Yes NRAWBC - 1.089 TB
-----------------------------------------------------------
.
.
.
Span Depth = 2
Number of Drives Per Span = 2
Write Cache(initial setting) = WriteBack
Disk Cache Policy = Disabled
Encryption = FDE
Data Protection = Disabled
Active Operations = None
Exposed to OS = Yes
Creation Date = 04-09-2014
Creation Time = 03:48:53 PM
Emulation type = None

m. If the Encryption field is blank (or None), secure it with the following command:
[root@nzhost1 ~]# /opt/MegaRAID/storcli/storcli64 /c0/dall set security=on
Note: For the x3650-M4 only, to configure the new system board for use in the Netezza
system, it is necessary to determine the MAC addresses of the replacement board and
substitute those addresses for the values stored in configuration files that applied to the
removed system board. The required MAC addresses are for Ethernet ports (ethx) and a
USB port (usb0).
22. For the x3650-M4 only, when the host has booted:
a. Type the following command on the host with the system board replacement:
[root@nzhost1 ~]# ifconfig -a | less

Locate the listing for usb0 and make note of the HWaddr address listed for use later
in the procedure.
Locate the entries for __tempxxx.
There are four entries.
The following table shows how the __tempxxx values are applied to ethx values.
Note: If there are no __tempxxx values listed, look for new Ethernet ports, such as
eth14, eth15, eth16, and eth17.
Table 9-1: Determining ethx values

__tempxxx value                                       x3650-M4
Lowest number __tempxxx (or lowest number new port)   eth6
Next higher __tempxxx                                 eth7
Next higher __tempxxx                                 eth0
Highest __tempxxx                                     eth1

Use the MAC address associated with the lowest numbered __tempxxx value as the
MAC address for eth6, the next higher value as the MAC address for eth7, and
continue with the next higher values.
b. Type the following commands on the host with the system board replacement:
[root@nzhost1 ~]# service network stop
[root@nzhost1 ~]# cd /etc/sysconfig/network-scripts
[root@nzhost1 ~]# vi ifcfg-ethx
Edit the value for HWADDR, using the values for MAC address from Table 9-1, then
save and close. For example:
# Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet
DEVICE=eth0
BOOTPROTO=dhcp
DHCPCLASS=
HWADDR=5C:F3:FC:7A:97:98
ONBOOT=no
HOTPLUG=no
DHCP_HOSTNAME=netezza

Repeat this step for each ethx value associated with the system board.
c. Type the following command to edit the ifcfg-usb0 file, and edit the HWADDR value
with the usb0 MAC address recorded in substep a:
[root@nzhost1 ~]# vi ifcfg-usb0

d. Type the following command:


[root@nzhost1 ~]# service network restart

e. Type the following command on the host with the system board replacement:
[root@nzhost1 ~]# ifconfig -a | less
Locate the entries for the ports just configured. Confirm that the values are changed
according to the edits made in substep b and substep c.
f. Also from the output in the previous step, locate the IP addresses for the ports, and
ping those addresses:
[root@nzhost1 ~]# ping www.xxx.yyy.zzz (confirm port is live)
[root@nzhost1 ~]# ping aaa.bbb.ccc.ddd (confirm port is live)
g. Delete the file /etc/udev/rules.d/70-persistent-net.rules.
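The assignment in Table 9-1 (lowest numbered __tempxxx entry to eth6, then eth7, eth0, and eth1) can be sketched as a small filter. The function name map_temp_ports and the input format (one interface name and its MAC address per line) are illustrative assumptions and are not part of the Netezza tooling.

```shell
# Hypothetical helper, not part of the Netezza tooling.
# Reads "interface-name MAC" pairs on standard input, orders them by the
# number at the end of the interface name, and assigns the four lowest to
# eth6, eth7, eth0, and eth1 in that order, as in Table 9-1.
map_temp_ports() {
    # Prefix each line with the trailing number of the interface name.
    awk '{ n = $1; sub(/^.*[^0-9]/, "", n); if (n != "") print n, $1, $2 }' |
        sort -n |
        awk 'BEGIN { split("eth6 eth7 eth0 eth1", target, " ") }
             NR <= 4 { print target[NR], $2, $3 }'
}
```

Each output line gives the target port, the temporary name, and the MAC address to enter as HWADDR in the matching ifcfg-ethx file. Assuming the ifconfig output lists each interface on a line that ends with HWaddr and the MAC address, the input could be produced with ifconfig -a | awk '/HWaddr/ { print $1, $NF }' | grep '^__temp' and piped into map_temp_ports.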
23. Restore IMM information recorded prior to the replacement. Type:
[root@nzhost1 ~]# cd /opt/nz-hwsupport/install_tools
[root@nzhost1 ~]# ./nz-rmgt.pl
Choose option 1.
Enter the information recorded earlier.
Choose option 4: Exit
24. After the host has rebooted, type the following command on the active host:
[root@nzhost1 ~]# /nzlocal/scripts/drbd_config.sh --config-only
25. Type the following commands on both hosts:
[root@nzhost1 ~]# service drbd start
[root@nzhost1 ~]# chkconfig heartbeat on
[root@nzhost1 ~]# service heartbeat start

26. Type the following command on the active host:


[root@nzhost1 ~]# crm_mon -i3
Wait until both hosts go online and the nps resource group comes up.
Seek help if there are any errors, or if the entries in the nps resource group are not all
started after 5 minutes.
27. For x3650M4 hosts (only), reapply the FoD key to the server:
[root@nzhost1 ~]# cd /nz/export/tools/asu
[root@nzhost1 ~]# ./asu fodcfg installkey -f key_location
Where key_location is the location where the key was saved when it was downloaded.
28. If the host drives are encryption Auto-Lock mode Enabled, type the following
commands:
a. As user nz, stop the system using the command:
[nz@nzhost1 ~]$ nzstop

b. As user root:
[root@nzhost1 ~]# /nz/kit/bin/adm/nzkey resume
Note: If this command fails, contact IBM Netezza Support.

[root@nzhost1 ~]# /nz/kit/bin/adm/nzkey check -hostkey


Example output:
AEK feature is enabled
Host AEK status:
Verification of host ha1 AEK key: SUCCESS
Verification of host ha2 AEK key: SUCCESS

c. As user nz, start the system:


[nz@nzhost1 ~]$ nzstart
29. (This step is optional but recommended. If you perform it, repeat the step
afterward so that HA1 is again the active host.)
Relocate the active system software to ensure that both hosts fail over successfully.
From ha1, type the following command:
[root@nzhost1 ~]# /nzlocal/scripts/heartbeat_admin.sh --migrate
Example output:
Migrating the NPS resource group from <current active host> to
<current standby host>.....
and then, after a few minutes:
Complete.
30. If Call Home was previously disabled, as user nz, enable it.
[nz@nzhost1 ~]$ nzcallhome -on

31. As user root, run sys_rev_check to verify that the system is configured correctly.
Change directory to:
[root@nzhost ~]# cd /opt/nz/fdt
Run the command:
[root@nzhost ~]# ./sys_rev_check host
If issues are noted in the output, resolve the issues as described in the FDT User’s
Guide, in the section “Resolve sys_rev_check Issues,” and then rerun sys_rev_check to
verify that issues are resolved.

CHAPTER 10
Replacing a Host Server RAID Flash
This chapter applies to the N3001-001 system and to N3001-010 and larger systems. This
chapter does not apply to N3001-002 and N3001-005 systems.
Before you begin the Host Server RAID Flash replacement process, make certain that you
have a replacement RAID Flash that conforms to the hardware models supported for the
IBM PureData System for Analytics N3001. Typically, you will use a new replacement RAID
Flash.
Observe Electrostatic Discharge (ESD) precautions when handling electronic components.
ESD precautions are included in “Electrostatic Discharge Precautions” on page 1-17.

The estimated time to perform this procedure is from 60 to 90 minutes, depending on ease
of access to the system and familiarity with NPS and the Netezza system.
Media for FDT Support Tools 2.0.0.1 (or later) is required to complete this replacement.

The current FRU number for the RAID Flash is 44W3393.


Note: In some IBM publications, such as Problem Determination and Service or Installation
and Service guides, the name used for the RAID Flash may be:
• RAID cache card
• ServeRAID memory module
• ServeRAID adapter memory module
• TMM
During this procedure you need to reference other documentation (such as Installation and
Service guides) for the server type being serviced. These guides can be found at:
https://www-947.ibm.com/support/entry/myportal/support.
FRU number 44W3393 replaces the earlier version, FRU number 46C9793. When replacing
the earlier version with the current version, a RAID firmware update is required, as
described in this procedure. A system may have the current version of the RAID Flash in
one host and the earlier version of the RAID Flash in the other host.

It is recommended, but not required, that both hosts be updated with the latest RAID
firmware.


To replace a RAID Flash, follow these steps:


1. Read the safety information that begins on page v.
2. Log into Host 1 (ha1) of the system as user root.
If needed, locate the active host using the command:
[root@nzhost1 ~]# crm_resource -r nps -W

3. Change to user nz:


[root@nzhost1 ~]# su - nz

4. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status

b. If enabled, disable it:


[nz@nzhost1 ~]$ nzcallhome -off

5. Check the state of the Netezza system:


[nz@nzhost1 ~]$ nzstate
System state is 'Online'.

6. If the system state is online, stop the system using the command:
[nz@nzhost1 ~]$ nzstop

7. Wait for the system to stop. Check the state using the command:


[nz@nzhost1 ~]$ nzstate
System state is 'Stopped'.

8. Exit from the nz session to return to user root:


[nz@nzhost1 ~]$ exit

9. Type the following commands to stop the clustering processes:


[root@nzhost1 ~]# ssh ha2 'service heartbeat stop'
[root@nzhost1 ~]# service heartbeat stop
[root@nzhost1 ~]# ssh ha2 'service drbd stop'
[root@nzhost1 ~]# service drbd stop
[root@nzhost1 ~]# chkconfig heartbeat off
[root@nzhost1 ~]# ssh ha2 'chkconfig heartbeat off'
[root@nzhost1 ~]# chkconfig drbd off
[root@nzhost1 ~]# ssh ha2 'chkconfig drbd off'
10. Type the following command on the host requiring the RAID Flash replacement:
[root@nzhost1 ~]# shutdown -h now

11. Disconnect the power cord from the server.


12. Ensure that the RAID Flash power LED is not lit. It may take up to 10 minutes for the
power module to fully discharge and the LED to be OFF.
13. Replace the RAID Flash following the IBM replacement procedures in the IBM Installation
and Service Guide for the server.


14. Reconnect the power cord to the server.


15. Reboot the host after completing the replacement.
16. Update the RAID firmware using FDT Support Tools 2.0.0.1 (or later):
a. Insert and mount the media for FDT Support Tools 2.0.0.1.
b. Create the FDT_ST directory in the /nzscratch directory:
[root@nzhost1 ~]# mkdir /nzscratch/FDT_ST

c. Copy the FDT Support Tools 2.0.0.1 package to the /nzscratch/FDT_ST directory
[root@nzhost1 ~]# cp nz-FDTSupport-2.0.0.1.tar.gz /nzscratch/FDT_ST
d. Untar the package:
[root@nzhost1 ~]# cd /nzscratch/FDT_ST
[root@nzhost1 ~]# tar xzvf nz-FDTSupport-2.0.0.1.tar.gz
e. Untar the script utilities package:
[root@nzhost1 ~]# tar xvf script-utils-tar.gz

f. Run the update utility:


[root@nzhost1 ~]# ./ibm_fw_sraidmr_5200-24.7.0-0052_linux_32-64.bin -s

g. Unmount and remove the media.


h. Reboot the host:
[root@nzhost1 ~]# reboot

i. (Optional) Perform step a through step h on the other host.


17. Type the following commands on both hosts:
[root@nzhost1 ~]# service drbd start
[root@nzhost1 ~]# chkconfig heartbeat on
[root@nzhost1 ~]# service heartbeat start
18. Type the following command on the active host:
[root@nzhost1 ~]# crm_mon -i3
Wait until both hosts go online and the nps resource group comes up.
Seek help if there are any errors, or if the entries in the nps resource group are not all
started after 5 minutes.
19. For N3001-010 and larger only. (This step is optional but recommended. If you perform
it, repeat the step afterward so that HA1 is again the active host.)
Relocate the active system software to verify that both hosts fail over successfully.
From ha1, type the following command:
[root@nzhost1 ~]# /nzlocal/scripts/heartbeat_admin.sh --migrate
Example output:
Migrating the NPS resource group from <current active host> to
<current standby host>.....
and then, after a few minutes:

Complete.
20. If Call Home was previously disabled, as user nz, enable it.
[nz@nzhost1 ~]$ nzcallhome -on

21. As user root, run sysrevcheck to verify that the system is configured correctly.
Change directory to:
[root@nzhost ~]# cd /opt/nz/fdt
Run the command:
[root@nzhost ~]# ./sys_rev_check
If issues are noted in the output, resolve the issues as described in the FDT User’s
Guide, in the section “Resolve sys_rev_check Issues,” and then rerun sysrevcheck to
verify that issues are resolved.
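
Steps 6 and 7 of this procedure stop the system and then wait for nzstate to report the Stopped state. The wait can be sketched as a small polling loop. This is an illustration only: wait_for_nz_state is a hypothetical helper name, the retry count and delay are arbitrary assumptions, and the match relies on the "System state is '...'" line format shown in the example output for steps 5 and 7.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not an IBM-supplied tool).
# Usage: wait_for_nz_state COMMAND STATE [TRIES] [DELAY_SECONDS]
# Runs COMMAND repeatedly until its output contains
# "System state is 'STATE'", or until TRIES attempts have been made.
wait_for_nz_state() {
    _cmd=$1
    _want=$2
    _tries=${3:-60}     # assumed default: 60 attempts
    _delay=${4:-10}     # assumed default: 10 seconds between attempts
    _i=0
    while [ "$_i" -lt "$_tries" ]; do
        if $_cmd 2>/dev/null | grep -q "System state is '$_want'"; then
            return 0
        fi
        _i=$((_i + 1))
        sleep "$_delay"
    done
    return 1
}

# Example use, as user nz after running nzstop:
#   wait_for_nz_state nzstate Stopped || echo "System did not reach the Stopped state"
```

Passing the state command as an argument keeps the loop usable for other states (for example, waiting for Online after nzstart).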



C H A P T E R 11
Replacing a Host Server Network Interface Card
Before you begin the Host Server Network Interface Card (NIC) replacement process, make
certain that you have a replacement Host NIC that conforms to the hardware models supported
for the IBM PureData System for Analytics N3001. Typically, you will use a new
replacement Host NIC.
Observe Electrostatic Discharge (ESD) precautions when handling electronic components.
ESD precautions are included in “Electrostatic Discharge Precautions” on page 1-17.

If replacing networking components in the host in addition to the NIC, you must replace
just one component at a time, completing each procedure before continuing to
another component. Otherwise, it is difficult to determine which MAC address is assigned
to which port.

The estimated time to perform this procedure is from 60 to 90 minutes, depending on ease
of access to the system and familiarity with NPS and the Netezza system.
To replace a Host NIC on the N3001 system, follow these steps:
Note: This procedure requires that NPS is running on Host 1 (ha1).

1. Read the safety information that begins on page v.


2. Log into Host 1 (ha1) of the system as user root.
If needed, locate the active host using the command:
[root@nzhost1 ~]# crm_resource -r nps -W

3. Change to user nz:


[root@nzhost1 ~]# su - nz

4. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status

b. If enabled, disable it:


[nz@nzhost1 ~]$ nzcallhome -off
Note: Ensure that nzcallhome returns status as disabled. If there are errors in the
callHome.txt configuration file, errors are listed in the output, and callHome is disabled.

5. Check the state of the Netezza system:


[nz@nzhost1 ~]$ nzstate
System state is 'Online'.


6. If the system state is online, stop the system using the command:
[nz@nzhost1 ~]$ nzstop
7. Wait for the system to stop. Check the state using the command:
[nz@nzhost1 ~]$ nzstate
System state is 'Stopped'.

8. Exit from the nz session to return to user root:


[nz@nzhost1 ~]$ exit

9. Type the following commands to put the system in Maintenance mode:


[root@nzhost1 ~]# /opt/nz-hwsupport/install_tools/nz-hbmgr.pl
Example output:
What do you want to do?

1: Maintenance Management
2: Heartbeat Management
3: Exit
Select one :
Type 1.
Example output:
HA1:
drbd status = RUNNING
heartbeat status = RUNNING
HA2:
drbd status = RUNNING
heartbeat status = RUNNING

Resource status: Started


Current active node is ha1
Current NPS state is Online
Splitbrain is not detected.

Select a host:
1: Move HA1 in/out of maintenance
2: Move HA2 in/out of maintenance
3: Return systems to cluster mode
4: Previous Menu
:
Type 1.
Example output:
You have selected ha1
Stopping nps resource...
Stopping Heartbeat on both ha2...
Stopping Heartbeat on both ha1...

Putting ha1 into maintenance mode...


Done
HA1:
drbd status = RUNNING
heartbeat status = NOT RUNNING
HA2:
drbd status = RUNNING

heartbeat status = NOT RUNNING

Resource status: Stopped


Current NPS state is Stopped
Splitbrain is not detected.

ha1 appears to be in maintenance mode

Select a host:
1: Move HA1 in/out of maintenance
2: Move HA2 in/out of maintenance
3: Return systems to cluster mode
4: Previous Menu
Type 4 to go back to the previous menu.
Type 3 to exit.
10. On the replacement NIC, there is a sticker with the MAC addresses for the network
ports. You must change the MAC addresses configured in NPS for the failed NIC and
replace them with MAC addresses assigned to the replacement NIC.

Figure 11-1: N3001-001 NIC Ports

Figure 11-2: N3001-002/005 NIC Ports


Figure 11-3: N3001-010 and larger NIC Ports

Using an editor, for example vi, change the following files on the host that requires the
NIC replacement:
a. Type the command:
[nz@nzhost1 ~]$ vi /etc/udev/rules.d/70-persistent-net.rules
Replace MAC address (value for field ATTR{address}=="xx:yy:zz:cc:dd:ee") for ethX
with MAC 1 from the new NIC.
Replace MAC address (value for field ATTR{address}=="xx:yy:zz:cc:dd:ee") for ethY
with MAC 2 from the new NIC (2nd port has MAC address incremented by 1).
If the replacement NIC is a 4-port card, repeat the previous edits for the next two
ports.
For N3001-001 systems, remove the UUID from the file.
Save and exit the file.
b. Type the command:
[nz@nzhost1 ~]$ vi /etc/sysconfig/network-scripts/ifcfg-ethX
Replace MAC address (value for attribute HWADDR=) for ethX with MAC 1 from the
new NIC.
For N3001-001 systems, remove the UUID from the file.
Save and exit the file.
c. Type the command:
[nz@nzhost1 ~]$ vi /etc/sysconfig/network-scripts/ifcfg-ethY
Replace MAC address (value for attribute HWADDR=) for ethY with MAC 2 from the
new NIC (2nd port has MAC address incremented by 1).
Save and exit the file.


d. If the replacement NIC is a 4-port card, repeat step b and step c for the next two
ports.
e. Run the following command to stop the operating system:
[root@nzhost1 ~]# shutdown -h now

11. Replace the NIC following the IBM replacement procedures in the IBM Problem
Determination and Service Guide for the server.
12. Reboot the host after completing the replacement.
13. The host server firmware must be updated as described in the FDT 4.2 User’s Guide
(for N3001-002 and larger systems) or the FDT 4.2.1 User’s Guide (for N3001-001).
14. From HA1, put the system into non-heartbeat mode:
[root@nzhost1 ~]# /nzlocal/scripts/nz.non-heartbeat.sh

15. Verify that all cables are connected correctly:


[root@nzhost1 ~]# /opt/nz/fdt/system_diags concheck
Example output:
Now creating the lock file [DONE]
------------------------------------------------------------------
***** S Y S T E M D I A G S *****
FDT 4.2.1.0 - /opt/nz/fdt/log/system_diags_20140925-074438.log
--------------------------- CONCHECK -----------------------------
Now checking the state of the system [PASS]
----------------------------- Rack 1 ------------------------------
------------ Rack 1 Host Bonded Network Connection Test -----------
Host ha1 iscsi-Link bond1 eth3 > Host ha2 eth3 [PASS]
Host ha1 iscsi-Link bond1 eth5 > Host ha2 eth5 [PASS]
Host ha1 iscsi-Ping bond1 eth3 > Host ha2 eth3 [PASS]
Host ha1 iscsi-Ping bond1 eth5 > Host ha2 eth5 [PASS]
Host ha1 iscsi-PingF bond1 eth3 eth5 Host ha2 bond1 eth3 eth5[PASS]
Host ha1 ha-Link bond2 eth2 > Host ha2 eth2 [PASS]
Host ha1 ha-Link bond2 eth4 > Host ha2 eth4 [PASS]
Host ha1 ha-Ping bond2 eth2 > Host ha2 eth2 [PASS]
Host ha1 ha-Ping bond2 eth4 > Host ha2 eth4 [PASS]
Host ha1 ha-PingF bond2 eth2 eth4 > Host ha2 bond2 eth2 eth4[PASS]
Host ha1 fabric-Link bond1:1 > Host ha2 bond1:1 [PASS]
Host ha1 drbd-Link bond1:2 > Host ha2 bond1:2 [PASS]
Host ha1 br0 bond2 [PASS]
Host ha2 iscsi-Link bond1 eth3 > Host ha1 eth3 [PASS]
Host ha2 iscsi-Link bond1 eth5 > Host ha1 eth5 [PASS]
Host ha2 iscsi-Ping bond1 eth3 > Host ha1 eth3 [PASS]
Host ha2 iscsi-Ping bond1 eth5 > Host ha1 eth5 [PASS]
Host ha2 iscsi-PingF bond1 eth3 eth5 Host ha1 bond1 eth3 eth5[PASS]
Host ha2 ha-Link bond2 eth2 > Host ha1 eth2 [PASS]
Host ha2 ha-Link bond2 eth4 > Host ha1 eth4 [PASS]
Host ha2 ha-Ping bond2 eth2 > Host ha1 eth2 [PASS]
Host ha2 ha-Ping bond2 eth4 > Host ha1 eth4 [PASS]
Host ha2 ha-PingF bond2 eth2 eth4 > Host ha1 bond2 eth2 eth4[PASS]
Host ha2 fabric-Link bond1:1 > Host ha1 bond1:1 [PASS]
Host ha2 drbd-Link bond1:2 > Host ha1 bond1:2 [PASS]
Host ha2 br0 bond2 [PASS]
Now removing the lock file [DONE]
-----------------------------SUMMARY------------------------------

Final Status [PASS]


Main log file - /opt/nz/fdt/log/system_diags_20140925-074438.log
------------------------------------------------------------------

16. Run the following command:


[root@nzhost1 ~]# /opt/nz-hwsupport/install_tools/nz-hbmgr.pl
Example output:
What do you want to do?

1: Maintenance Management
2: Heartbeat Management
3: Exit
Select one :
Type 1.
Example output:
HA1:
drbd status = RUNNING
heartbeat status = NOT RUNNING
HA2:
drbd status = RUNNING
heartbeat status = NOT RUNNING

Resource status: Stopped


Current NPS state is Stopped
Splitbrain is not detected.

ha1 appears to be in maintenance mode


Select a host:
1: Move HA1 in/out of maintenance
2: Move HA2 in/out of maintenance
3: Return systems to cluster mode
4: Previous Menu
:
Type 3.
Example output:
Removing ha1 from maintenance mode.
Starting Heartbeat on ha1...
Starting Heartbeat on ha2...
Set resource to start when heartbeat finishes coming up?
y/n [n]: y
Waiting for heartbeat to come online...
Starting nps resource...
HA1:
drbd status = RUNNING
heartbeat status = RUNNING
HA2:
drbd status = RUNNING
heartbeat status = RUNNING

Resource status: Started


Current active node is ha1
Current NPS state is Stopped
Splitbrain is not detected.

Select a host:
1: Move HA1 in/out of maintenance
2: Move HA2 in/out of maintenance
3: Return systems to cluster mode
4: Previous Menu
:
Type 4 to go back to the previous menu.
Type 3 to exit.
17. For N3001-002 and larger only. (This step is optional but recommended. If you perform
it, repeat the step afterward so that HA1 is again the active host.)
Relocate the active system software to verify that both hosts fail over successfully.
From ha1, type the following command:
[root@nzhost1 ~]# /nzlocal/scripts/heartbeat_admin.sh --migrate
Example output:
Migrating the NPS resource group from <current active host> to
<current standby host>.....
and then, after a few minutes:
Complete.
18. If Call Home was previously disabled, as user nz, enable it.
[nz@nzhost1 ~]$ nzcallhome -on

19. As user root, run sysrevcheck to verify that the system is configured correctly.
Change directory to:
[root@nzhost ~]# cd /opt/nz/fdt
Run the command:
[root@nzhost ~]# ./sys_rev_check
If issues are noted in the output, resolve the issues as described in the FDT User’s
Guide, in the section “Resolve sys_rev_check Issues,” and then rerun sysrevcheck to
verify that issues are resolved.
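
Step 10 of this procedure notes that the second port of the replacement NIC has the MAC address of the first port incremented by 1. The increment can be sketched as follows. This is an illustration only: next_mac is a hypothetical helper name, and the addresses in the comments are made-up sample values. Always confirm the result against the MAC addresses printed on the sticker of the replacement NIC before editing the configuration files.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not an IBM-supplied tool).
# Usage: next_mac XX:XX:XX:XX:XX:XX
# Prints the given MAC address incremented by 1, in lowercase,
# carrying into the higher octets when the last octet is ff.
next_mac() {
    # Strip the colons to get a 12-digit hexadecimal number.
    _hex=$(printf '%s' "$1" | tr -d ':')
    # Add 1 and keep the result within 48 bits.
    _next=$(printf '%012x' $(( (0x$_hex + 1) & 0xFFFFFFFFFFFF )))
    # Re-insert a colon after every two digits and drop the trailing one.
    printf '%s\n' "$_next" | sed 's/../&:/g; s/:$//'
}

# Example use (sample address only):
#   next_mac 00:1a:2b:3c:4d:5e     # prints 00:1a:2b:3c:4d:5f
```

For a 4-port card, the helper can be applied repeatedly, provided that the sticker confirms that the port addresses are consecutive.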



C H A P T E R 12
Replacing a Host Server SAS HBA (N3001-001)
Before you begin the Host Server SAS HBA replacement process, make certain that you
have a replacement SAS HBA that conforms to the hardware models supported for the IBM
PureData System for Analytics N3001-001. Typically, you will use a new replacement SAS
HBA.
Observe Electrostatic Discharge (ESD) precautions when handling electronic components.
ESD precautions are included in “Electrostatic Discharge Precautions” on page 1-17.

The estimated time to perform this procedure is from 60 to 90 minutes, depending on ease
of access to the system and familiarity with NPS and the Netezza system.
The FRU number for the SAS HBA is 47C8676.
To replace a SAS HBA on the N3001-001 system, follow these steps:
Note: This procedure requires that NPS is running on Host 1 (ha1).

1. Read the safety information that begins on page v.


2. Log into Host 1 (ha1) of the system as user root.
If needed, locate the active host using the command:
[root@nzhost1 ~]# crm_resource -r nps -W

3. Change to user nz:


[root@nzhost1 ~]# su - nz

4. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status

b. If enabled, disable it:


[nz@nzhost1 ~]$ nzcallhome -off

5. Check the state of the Netezza system:


[nz@nzhost1 ~]$ nzstate
System state is 'Online'.

6. If the system state is online, stop the system using the command:
[nz@nzhost1 ~]$ nzstop

7. Wait for the system to stop. Check the state using the command:


[nz@nzhost1 ~]$ nzstate
System state is 'Stopped'.


8. Exit from the nz session to return to user root:


[nz@nzhost1 ~]$ exit
9. Type the following commands to stop the clustering processes:
[root@nzhost1 ~]# ssh ha2 'service heartbeat stop'
[root@nzhost1 ~]# service heartbeat stop
[root@nzhost1 ~]# ssh ha2 'service drbd stop'
[root@nzhost1 ~]# service drbd stop

10. Type the following commands on the host requiring the SAS HBA replacement:
[root@nzhost1 ~]# chkconfig heartbeat off
[root@nzhost1 ~]# shutdown -h now
11. Replace the SAS HBA following the IBM replacement procedures in the IBM Problem
Determination and Service Guide for the server.
12. Reboot the host after completing the replacement.
13. The host server firmware must be updated as described in the FDT 4.2.1 User’s Guide.
14. Type the following commands on both hosts:
[root@nzhost1 ~]# service drbd start
[root@nzhost1 ~]# chkconfig heartbeat on
[root@nzhost1 ~]# service heartbeat start
15. Type the following command on the active host:
[root@nzhost1 ~]# crm_mon -i3
Wait until both hosts go online and the nps resource group comes up.
Seek help if there are any errors, or if the entries in the nps resource group are not all
started after 5 minutes.
16. As user root, type the following commands:
[root@nzhost ~]# cd /opt/nz/fdt
[root@nzhost ~]# ./system_diags datapathcheck
Note: Do not interrupt this operation (such as with Ctrl-C).

Ensure that the final status is [PASS].


17. If Call Home was previously disabled, as user nz, enable it.
[nz@nzhost1 ~]$ nzcallhome -on

18. As user root, run sysrevcheck to verify that the system is configured correctly.
Change directory to:
[root@nzhost ~]# cd /opt/nz/fdt
Run the command:
[root@nzhost ~]# ./sys_rev_check
If issues are noted in the output, resolve the issues as described in the FDT User’s
Guide, in the section “Resolve sys_rev_check Issues,” and then rerun sysrevcheck to
verify that issues are resolved.
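
Step 16 of this procedure requires that the final status of system_diags is [PASS]. When the result is reviewed from a saved log file, the check can be sketched as follows. This is an illustration only: diags_passed is a hypothetical helper name, and the sketch assumes that the datapathcheck log ends with the same "Final Status [PASS]" summary line that system_diags prints for other checks.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not an IBM-supplied tool).
# Usage: diags_passed LOGFILE
# Succeeds only if LOGFILE contains a "Final Status" line marked [PASS].
# A log with no "Final Status" line is treated as a failure.
diags_passed() {
    grep 'Final Status' "$1" | grep -q '\[PASS\]'
}

# Example use, as user root (the log file name is a placeholder):
#   diags_passed /opt/nz/fdt/log/system_diags_yyyymmdd-xxxxxx.log || echo "Final status is not PASS"
```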



C H A P T E R 13
Replacing a Host Server Disk Drive
The IBM PureData System for Analytics N3001 includes two Host Servers. In systems
other than the N3001-001, each host has five disk drives configured in a RAID array. In
N3001-001 systems each host has eight disk drives configured in a RAID array.
Note: In the N3001-001 system, eight of the disks are system disks connected to the
onboard RAID controller and configured in a RAID array (slots 0-7), and the other sixteen
are data disks (slots 8-23) connected to the SAS HBA. This chapter describes the replacement
procedure for the system disks (slots 0-7).

Before you begin the disk replacement process, make certain that you have a replacement
disk that conforms to the hardware models supported for the N3001 system. The N3001
system uses Self-Encrypting Drives (SEDs). Typically, you will use a new replacement disk.
Also before beginning the replacement procedure, verify that there is a problem with the
disk drive. Consult the Problem Determination and Service Guide for the disk enclosure or
host server for more information on disk replacement.
Note: Prior to FDT 4.3, only IBM branded drives labeled with the correct FRU number are
supported. If a Lenovo branded drive is used as a replacement, it is not possible to update
the firmware on the drive or the host server using standard firmware update procedures.
Firmware on host components may be individually updated as needed until an approved
drive is procured or FDT 4.3 is installed. As of FDT 4.3, Lenovo branded drives are allowed
in the system (sys_rev_check shows the drives as [INFO]), but firmware in the drives cannot
be updated using FDT.

Only one host disk drive can be replaced at a time.

Observe Electrostatic Discharge (ESD) precautions when handling electronic components.
ESD precautions are included in “Electrostatic Discharge Precautions” on page 1-17.

A failed disk must be replaced while the system is online. The system cannot be offline
when replacing a failed disk.

The estimated time to perform this procedure is up to one hour (for the disk to fully mirror).
The N3001 system currently supports two SED models:
• 600GB Model ST600MM0026E - FRU number 90Y8909
• 600GB Model HUC101860CSS20E - FRU number 90Y8909


Note: Self-Encrypting Drives have some important differences to be aware of:


• When Security is Enabled (drives are in auto-lock mode) and
SecureEraseOnFailover = True (default setting), a secure erase is performed automatically
during the failover of a drive.
• When Security is Enabled (drives are in auto-lock mode), the replacement SED
performs a secure erase when transitioning roles from Inactive to Spare (during
activation).
To replace a disk drive on an N3001 Host server, follow these steps:
1. Read the safety information that begins on page v.
2. Log into the system as user nz.
3. Check to see if Call Home is enabled, and if so, temporarily disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status

b. If enabled, disable it:


[nz@nzhost1 ~]$ nzcallhome -off

4. Check the state of the Netezza system:


[nz@nzhost ~]$ nzstate
System state is 'Online'.
If the system state is not online, use the nzstart command to start the system. If you
cannot start the system for any reason, contact Netezza Support for assistance.
5. Check the DRBD status:
a. Change to user root:
[nz@nzhost ~]$ su

b. Check the DRBD status on the active host:


[root@nzhost ~]# service drbd status
c. Verify that the status is listed as UpToDate/UpToDate. If not, contact IBM Netezza
Support.
Example output:
drbd driver loaded OK; device status:
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by root@nps22094,
2010-11-18 14:52:01
m:res cs st ds p mounted fstype
0:r1 Connected Primary/Secondary UpToDate/UpToDate C /export/home ext3
1:r0 Connected Primary/Secondary UpToDate/UpToDate C /nz ext3

d. Check the DRBD status on the passive host:


[root@nzhost ~]# ssh ha2 service drbd status

e. Verify that the status is listed as UpToDate/UpToDate. If not, contact IBM Netezza
Support.
Example output:

drbd driver loaded OK; device status:


version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by root@nps22094,
2010-11-18 14:52:01
m:res cs st ds p mounted fstype
0:r1 Connected Secondary/Primary UpToDate/UpToDate C
1:r0 Connected Secondary/Primary UpToDate/UpToDate C

f. Exit as user root:


[root@nzhost ~]# exit

6. Identify the failed disk that requires replacement. Look at the Role column:
[nz@nzhost ~]$ nzhw -issues
Example output:
Description   HW ID Location                   Role   State   Security
------------- ----- -------------------------- ------ ------- --------
HostDisk      1020  rack1.host1.hostDisk2      Failed Down    Disabled
SASController 1026  rack1.host1.SASController0 Active Warning N/A

Note: When a host disk fails, a SASController warning may accompany the disk failure.
This warning clears after the disk is replaced.

7. Locate the failed disk:


[root@nzhost1 ~]# nzhw locate -id 1020
Logical Name:'rack1.host1.hostDisk2' Physical Location:'upper host,
2nd host disk'
Never remove a disk drive while the disk is in the Active or Spare state. If you remove an
Active or Spare disk drive, you could cause the system to restart and/or transition to the
down state.

8. Remove the problem disk (identified by the amber LED) from the host server.
When removing a drive, pull it only half way out of the slot and wait 30 seconds for the disk
to spin down before fully removing the disk drive.

Figure 13-1: Disk Removal (callouts: latch, hard disk drive, tray handle)


9. Mark the failed disk drive in a non-harmful way to ensure that the correct disk will be
replaced in a later step.
10. Delete the HW ID of the failed disk:
[root@nzhost1 ~]# nzhw delete -id 1020
Are you sure you want to proceed (y|n)? [n] y

11. Install the replacement disk into the same slot from which the failed disk was removed.
Failure to replace a hard disk drive in its correct bay might result in loss of data. If you are
replacing a hard disk drive that is part of a configured array and logical drive, be sure to
install the replacement hard disk drive in the correct bay. See the hardware and software
documentation that applies to the host server to determine whether there are restrictions
regarding hard disk drive configurations.

Never swap a drive when its associated green activity LED is flashing. Swap a drive only
when its associated amber LED is blinking.

12. Check the branding of the replacement disk drive.


Note: Prior to FDT 4.3, only IBM branded drives labeled with the correct FRU number
are supported. If a Lenovo branded drive is used as a replacement, it is not possible to
update the firmware on the drive or the host server using standard firmware update
procedures. Firmware on host components may be individually updated as needed until an
approved drive is procured or FDT 4.3 is installed. As of FDT 4.3, Lenovo branded
drives are allowed in the system (sys_rev_check shows the drives as [INFO]), but
firmware in the drives cannot be updated using FDT.

[root@nzhost1 ~]# /opt/nz/fdt/common/util/MegaCLI -pdInfo -PhysDrv[252:slot_number] -a0 | grep "Inquiry" | cut -c 15-22
Where slot_number is the host disk slot number (0 through 4).
For example:
[root@nzhost1 ~]# /opt/nz/fdt/common/util/MegaCLI -pdInfo -PhysDrv[252:3] -a0 | grep "Inquiry" | cut -c 15-22
The command returns IBM-ESXS for an IBM branded disk, and LENOVO-X for a Lenovo
branded disk. If it is a Lenovo branded disk, see the note at the beginning of this step.
13. The system manager automatically detects the replacement disk.
Note: Depending on the NPS version, the new disk may not immediately change to Active
status. It may remain in the Failed/Warning or Assigning/Ok state while RAID regeneration
is taking place. This can take up to 30 minutes to complete.

[nz@nzhost ~]$ nzhw | grep -i hostdisk


HostDisk 1019 rack1.host1.hostDisk1 Active Ok Disabled
HostDisk 1021 rack1.host1.hostDisk3 Active Ok Disabled
HostDisk 1022 rack1.host1.hostDisk4 Active Ok Disabled
HostDisk 1023 rack1.host1.hostDisk5 Active Ok Disabled
HostDisk 1032 rack1.host2.hostDisk1 Spare Ok Disabled
HostDisk 1033 rack1.host2.hostDisk2 Active Ok Disabled
HostDisk 1034 rack1.host2.hostDisk3 Active Ok Disabled
HostDisk 1035 rack1.host2.hostDisk4 Active Ok Disabled
HostDisk 1036 rack1.host2.hostDisk5 Active Ok Disabled
HostDisk 1658 rack1.host1.hostDisk2 Assigning Ok Disabled


If the disk status does not change to Assigned/Ok (or Failed/Warning), use the
mega_check.pl tool. As user root:
[root@nzhost ~]# /opt/nz-hwsupport/hts/mega_check.pl -r
Choose option 5, Manually start a Copyback process, and provide the following answers to the
prompts:
For N3001-002 and larger systems:
There is no foreign configuration on controller 0
Exit Code: 0x00
Slot number of drive to copy data from: 4 (original configuration hot spare)
Slot number of drive to copy data to: n
Where n is the chassis slot number of the replaced drive (0 through 3).
For N3001-001:
There is no foreign configuration on controller 0
Exit Code: 0x00
Slot number of drive to copy data from: 7 (original configuration hot spare)
Slot number of drive to copy data to: n
Where n is the chassis slot number of the replaced drive (0 through 6).
14. The disk regeneration process can take from 30 to 60 minutes.
When the regeneration process is complete, the replaced disk is listed as Active Ok
and an updated Spare disk should be present on the host. As user nz, type:
[nz@nzhost ~]$ nzhw | grep -i hostdisk
HostDisk 1019 rack1.host1.hostDisk1 Active Ok Disabled
HostDisk 1021 rack1.host1.hostDisk3 Active Ok Disabled
HostDisk 1022 rack1.host1.hostDisk4 Active Ok Disabled
HostDisk 1023 rack1.host1.hostDisk5 Spare Ok Disabled
HostDisk 1032 rack1.host2.hostDisk1 Spare Ok Disabled
HostDisk 1033 rack1.host2.hostDisk2 Active Ok Disabled
HostDisk 1034 rack1.host2.hostDisk3 Active Ok Disabled
HostDisk 1035 rack1.host2.hostDisk4 Active Ok Disabled
HostDisk 1036 rack1.host2.hostDisk5 Active Ok Disabled
HostDisk 1658 rack1.host1.hostDisk2 Active Ok Disabled

15. Check the SysMgr log.


Example (from N3001):
2014-08-25 13:09:07.241603 EDT Info: NZ-01526: the role of hostDisk
[hostDisk hwid=1658 sn="NC610PDWVHA1ECCXSA610" Parent=1005
Position=2] changed from 'failed' to 'active'
2014-08-25 13:09:07.250227 EDT Info: NZ-01526: the role of hostDisk
[hostDisk hwid=1025 sn="SB1AD0009T25SB1ASB1ASB1A" Parent=1005
Position=7] changed from 'active' to 'spare'
2014-08-25 13:09:07.262897 EDT Info: NZ-01524: the state of
hostDisk [hostDisk hwid=1658 sn="NC610PDWVHA1ECCXSA610" Parent=1005
Position=2] changed from 'unknown' to 'ok'
16. Check the firmware level of the host disks as user root:
[root@nzhost1 ~]# cd /opt/nz/fdt
[root@nzhost1 ~]# ./sys_rev_check Host
View the log for sys_rev_check in /opt/nz/fdt/log/sys_rev_check_yyyymmdd-xxxxxx.log.


If a host disk firmware update is necessary, follow the instructions in the FDT User’s
Guide for firmware updates.
17. Verify that host disk issues are resolved:
[nz@nzhost ~]$ nzhw -issues
No entries found

18. If Call Home was previously disabled, enable it.


[nz@nzhost1 ~]$ nzcallhome -on
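
Step 5 of this procedure requires that every DRBD resource on both hosts is listed as UpToDate/UpToDate before the disk is replaced. The check can be sketched as follows. This is an illustration only: drbd_all_uptodate is a hypothetical helper name, and the sketch assumes the "service drbd status" output format shown in the example output for step 5, where each resource line begins with a number followed by a colon (for example, 0:r1).

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not an IBM-supplied tool).
# Reads "service drbd status" output on stdin and succeeds only if at least
# one resource line (such as "0:r1 ...") is present and every resource line
# contains UpToDate/UpToDate.
drbd_all_uptodate() {
    awk '
        /^[ \t]*[0-9]+:/ {
            seen = 1
            if ($0 !~ /UpToDate\/UpToDate/) bad = 1
        }
        END {
            if (seen && !bad) exit 0
            exit 1
        }
    '
}

# Example use, as user root on the active host:
#   service drbd status | drbd_all_uptodate || echo "Contact IBM Netezza Support"
#   ssh ha2 service drbd status | drbd_all_uptodate || echo "Contact IBM Netezza Support"
```

Header lines such as the version line and the column headings do not begin with a resource number, so the helper ignores them.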



C H A P T E R 14
Replacing a G8052 Management Switch
What’s in this chapter
• Replacement Procedure
• Troubleshooting

Before you begin the G8052 Management Switch replacement process, make certain that
you have a replacement Management Switch that conforms to the hardware models
supported for the IBM PureData System for Analytics N3001. The odd numbered racks in
an N3001 system (rack 1 in an N3001-020 or racks 1 and 3 in an N3001-040) include a
G8052 Management Switch.
Note: You'll need a BNT serial cable (DB 9-F to mini-USB) for this installation procedure
(IBM FRU number 43X0510).

Observe Electrostatic Discharge (ESD) precautions when handling electronic components.


ESD precautions are included in “Electrostatic Discharge Precautions” on page 1-17.

This procedure requires the system to be taken offline.

This procedure requires the user to have root access.

Note: The power and fan modules of the G8052 Management Switch are replaceable. Part
numbers are listed in “Overview of the IBM PureData System for Analytics N3001,” and
replacement procedures are included in Rack Switch G8052 Installation Guide.

The estimated time to perform this procedure is from 60 to 180 minutes, depending on
ease of access to the system and familiarity with NPS and the Netezza system.
The replacement part number for the G8052 switch is IBM P/N 49Y7922.

Replacement Procedure
The replacement G8052 management switch must be set up with a valid IP address and
then configured for use in an N3001. This includes internal settings as well as firmware
loading.
To replace a G8052 Management Switch:

1. Read the safety information that begins on page v.


2. Identify the active host in the cluster, which is the host where the NPS resource group
is running:
[root@nzhost1 ~]# crm_resource -r nps -W
Example output from a running system:
crm_resource[5377]: 2009/06/07_10:13:12 info: Invoked: crm_resource
-r nps -W
resource nps is running on: nzhost1
Note: If the system is already in maintenance mode, an error message is output. To
identify the active (primary) host, type the command:

[root@nzhost1 ~]# service drbd status


The following line in the output shows the active/passive hosts:
1:r0 Connected Primary/Secondary UpToDate/UpToDate C /nz ext3
Primary is always HA1, and secondary is HA2. In this example, HA1 (primary) is the
active host. (If the output showed as Secondary/Primary, HA2 would be the active
host.)
3. If the NPS resource group is running:
a. Log in to the active host (nzhost1 in this example) as root.
b. Change to user nz:
[root@nzhost1 ~]# su - nz

c. Check to see if Call Home is enabled, and if so, temporarily disable it.
• Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status

• If enabled, disable it:
[nz@nzhost1 ~]$ nzcallhome -off

d. Run the following command to stop the Netezza server:


[nz@nzhost1 ~]$ nzstop

e. Exit from the nz session and return to user root:


[nz@nzhost1 ~]$ exit

f. Type the following commands to stop the clustering processes:


[root@nzhost1 ~]# ssh ha2 service heartbeat stop
[root@nzhost1 ~]# service heartbeat stop

g. Run the following script:


[root@nzhost1 ~]# /nzlocal/scripts/nz.non-heartbeat.sh
Ensure the script completes without errors.
4. Obtain the IP address of the management switch by examining the file /etc/hosts:
[root@nzhost1 ~]# cat /etc/hosts | grep netswmgt0[1,2]
Use netswmgt01 for the switch in rack 1, and netswmgt02 for the switch in rack 3.
Example output:
10.0.128.65 netswmgt01 # MGT Switch 1

5. Make note of the IP address for use in a later step.


6. Remove the management switch from the rack and install the replacement.
Note: Ensure that all cables are labeled, and that they are reconnected in the same
order and to the same ports from which they were removed.

Statement 5

CAUTION:
The power control button on the device and the power switch on the power supply do not
turn off the electrical current supplied to the device. The device also might have more than
one power cord. To remove all electrical current from the device, ensure that all power
cords are disconnected from the power source.

7. Connect the serial cable between the active host and the management switch.
The cable must be connected from the serial port at the rear of the active host to the
mini-USB connector on the front of the G8052 management switch. Refer to
Figure 14-1 for the location of the mini-USB connector on the G8052 switch.

Figure 14-1: Front of G8052 Management Switch (callout: Mini-USB Port)

8. Type the command:


[root@nzhost1 ~]# minicom gig
A minicom session opens, connected to the G8052.
9. Type admin and press Enter.
You are now logged into the G8052.
10. At the prompt, type enable and press Enter.
11. At the prompt, type config term and press Enter.

12. At the prompt, type interface ip 1 and press Enter.


13. At the prompt, type ip address xxx.xxx.xxx.xxx and press Enter.
Use the IP address obtained in step 4.
14. At the prompt, type ip netmask 255.255.252.0 and press Enter.
15. At the prompt, type vlan 1 and press Enter.
16. At the prompt, type enable and press Enter.
17. At the prompt, type exit and press Enter.
18. At the next prompt, again type exit and press Enter.
19. When the G8052 prompt returns, type Ctrl-A, then type x.
20. Select yes from the prompt and press Enter.
The active host prompt returns.
21. Load the latest released firmware for the G8052 switch:
[root@nzhost1 ~]# cd /opt/nz/fdt
[root@nzhost1 fdt]# ./firmware_updater RackMgtSwitch --alias netswmgt0[1,2]

22. Configure the switch:


[root@nzhost1 fdt]# /nzlocal/scripts/netswmgt/netswmgtConfig.sh -f no -s [1,2]

23. When prompted for the password, type admin.


Note: If the switch seems to get hung up at this point, see the section “Troubleshooting”
on page 14-5.

24. Verify that all connections to the switch are correct:


[root@nzhost1 fdt]# ./system_diags concheck
If errors are reported, make the wiring corrections and rerun concheck.
25. Put the system back into cluster mode:
a. Run the following script:
[root@nzhost1 ~]# /nzlocal/scripts/nz.heartbeat.sh

b. Type the following commands to start the clustering processes:


[root@nzhost1 ~]# service heartbeat start
[root@nzhost1 ~]# ssh ha2 'service heartbeat start'

c. Wait five minutes and then type the command:


[root@nzhost1 ~]# su - nz

d. The system may require up to 10 minutes to come online. To verify that the system
state is online, repeat the following command until it returns the "Online" status:
[nz@nzhost1 ~]$ nzstate
System state is 'Online'
26. If Call Home was previously disabled, enable it.
[nz@nzhost1 ~]$ nzcallhome -on

27. Run sys_rev_check to verify that the system is configured correctly.


a. Run the commands:
[nz@nzhost1 ~]$ cd /opt/nz/fdt
[nz@nzhost1 ~]$ ./sys_rev_check

b. If issues are noted in the sys_rev_check output, resolve the issues as described in
the FDT User’s Guide, in the section “Resolve sys_rev_check Issues.”
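
The role check in step 2 can be scripted. The following is a minimal sketch, assuming the command is run on HA1 and that the resource line has the format shown in step 2. The function name active_ha_from_drbd is hypothetical and is not part of the Netezza tooling.

```shell
#!/bin/bash
# Hypothetical helper (not a Netezza command): map one resource line from
# `service drbd status`, as run on HA1, to the active HA node.
active_ha_from_drbd() {
    local line="$1"
    case "$line" in
        *" Primary/Secondary "*) echo "HA1" ;;
        *" Secondary/Primary "*) echo "HA2" ;;
        *) echo "unknown"; return 1 ;;     # unexpected format; check manually
    esac
}

# Example, using the line shown in step 2 (prints HA1):
active_ha_from_drbd "1:r0 Connected Primary/Secondary UpToDate/UpToDate C /nz ext3"
```

Any other role pair returns a non-zero status, so the output should be inspected manually in that case.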

Troubleshooting
If the switch appears to hang during the configuration procedure, a previous session
might not have closed properly. To correct this:
1. Type the command:
[root@nzhost1 ~]# minicom gig
A minicom session opens, connected to the G8052.
2. Type admin and press Enter.
You are now logged into the G8052.
3. Type Ctrl-A.
4. Type x.
5. Select yes from the prompt and press Enter.
The active host prompt returns.
6. Restart the switch configuration at step 8 on page 14-3, or step 22, depending on
where the hang occurred.
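
Step 4 of the replacement procedure reads the switch address from /etc/hosts. As a minimal sketch, assuming the standard hosts-file layout (address first, then one or more names, with an optional trailing comment), the following shell function returns the address for an exact alias match. The function name lookup_host_ip is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the IP address that a hosts file assigns to an
# exact alias, ignoring comment lines and trailing "# ..." comments.
lookup_host_ip() {
    local alias="$1" hosts_file="${2:-/etc/hosts}"
    awk -v name="$alias" '
        /^[[:space:]]*#/ { next }              # skip full-line comments
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break           # rest of the line is a comment
                if ($i == name) { print $1; exit }
            }
        }' "$hosts_file"
}

# Usage (alias name as used elsewhere in this chapter):
#   lookup_host_ip netswmgt01
```

Unlike a plain grep, the exact match avoids returning lines for similarly named aliases.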


C H A P T E R 15
Replacing a G8264 Fabric Switch
What’s in this chapter
• Replacement Procedure
• Troubleshooting

Before you begin the G8264 Fabric Switch replacement process, make certain that you
have a replacement Fabric Switch that conforms to the hardware models supported for the
IBM PureData System for Analytics N3001. Rack 2 in an N3001 system includes two
G8264 Fabric Switches.
Note: You'll need a BNT serial cable (DB 9-F to mini-USB) for this installation procedure
(IBM FRU number 46D0180).

Note: The N3001-001 system does not use a Fabric Switch.

Observe Electrostatic Discharge (ESD) precautions when handling electronic components.


ESD precautions are included in “Electrostatic Discharge Precautions” on page 1-17.

This procedure requires the system to be taken offline.

This procedure requires the user to have root access.

Note: The power and fan modules of the G8264 Fabric Switch are replaceable. Part
numbers are listed in “Overview of the IBM PureData System for Analytics N3001,” and
replacement procedures are included in IBM BNT Rack Switch G8264F Installation Guide.

The estimated time to perform this procedure is from 60 to 180 minutes, depending on
ease of access to the system and familiarity with NPS and the Netezza system.
The replacement part number for the G8264 switch is 49Y7923.

Replacement Procedure
The replacement G8264 fabric switch must be set up with a valid IP address and then
configured for use in an N3001. This includes internal settings as well as firmware loading.
To replace a G8264 Fabric Switch:

1. Read the safety information that begins on page v.


2. Identify the active host in the cluster, which is the host where the NPS resource group
is running:
[root@nzhost1 ~]# crm_resource -r nps -W
Example output from a running system:
crm_resource[5377]: 2009/06/07_10:13:12 info: Invoked: crm_resource
-r nps -W
resource nps is running on: nzhost1
Note: If the system is already in maintenance mode, an error message is output. To
identify the active (primary) host, type the command:

[root@nzhost1 ~]# service drbd status


The following line in the output shows the active/passive hosts:
1:r0 Connected Primary/Secondary UpToDate/UpToDate C /nz ext3
Primary is always HA1, and secondary is HA2. In this example, HA1 (primary) is the
active host. (If the output showed as Secondary/Primary, HA2 would be the active
host.)
3. If the NPS resource group is running:
a. Log in to the active host (nzhost1 in this example) as root.
b. Change to user nz:
[root@nzhost1 ~]# su - nz

c. Check to see if Call Home is enabled, and if so, temporarily disable it.
• Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status

• If enabled, disable it:
[nz@nzhost1 ~]$ nzcallhome -off

d. Run the following command to stop the Netezza server:


[nz@nzhost1 ~]$ nzstop

e. Exit from the nz session and return to user root:


[nz@nzhost1 ~]$ exit

f. Type the following commands to stop the clustering processes:


[root@nzhost1 ~]# ssh ha2 service heartbeat stop
[root@nzhost1 ~]# service heartbeat stop
g. Run the following script:
[root@nzhost1 ~]# /nzlocal/scripts/nz.non-heartbeat.sh
Ensure the script completes without errors.
4. Obtain the IP address of the fabric switch by examining the file /etc/hosts:
[root@nzhost1 ~]# cat /etc/hosts | grep netswfab01[a,b]
Use netswfab01a for the upper switch in rack 2, and netswfab01b for the lower switch
in rack 2.
Example output:

10.0.128.81 netswfab01a # Fabric Switch 1


5. Make note of the IP address for use in a later step.
6. Remove the fabric switch from the rack and install the replacement.
Note: Ensure that all cables are labeled, and that they are reconnected in the same
order and to the same ports from which they were removed.

Statement 3

CAUTION:
When laser products (such as CD-ROMs, DVD drives, fiber optic devices, or transmitters)
are installed, note the following:
• Do not remove the covers. Removing the covers of the laser product could result in
exposure to hazardous laser radiation. There are no serviceable parts inside the device.
• Use of controls or adjustments or performance of procedures other than those specified
herein might result in hazardous radiation exposure.

DANGER

Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the
following.
Laser radiation when open. Do not stare into the beam, do not view directly with
optical instruments, and avoid direct exposure to the beam.

Statement 5

CAUTION:
The power control button on the device and the power switch on the power supply do not
turn off the electrical current supplied to the device. The device also might have more than
one power cord. To remove all electrical current from the device, ensure that all power
cords are disconnected from the power source.

7. Connect the serial cable between the active host and the fabric switch.
The cable must be connected from the serial port at the rear of the active host to the
mini-USB connector on the front of the replacement G8264 fabric switch. Refer to
Figure 15-1 for the location of the mini-USB connector on the G8264 switch.

Figure 15-1: Front of G8264 Fabric Switch (callout: Mini-USB port)

8. Type the command:


[root@nzhost1 ~]# minicom gig
A minicom session opens, connected to the G8264.
9. You are prompted for a password. Type admin and press Enter.
You are now logged into the G8264.
10. At the prompt, type enable and press Enter.
11. At the prompt, type config term and press Enter.
12. At the prompt, type interface ip 128 and press Enter.
13. At the prompt, type ip address xxx.xxx.xxx.xxx and press Enter.
Note: Use the IP address obtained in step 4.

14. At the prompt, type ip netmask 255.255.252.0 and press Enter.


15. At the prompt, type copy running-config startup-config and press Enter.
16. When prompted, type Y.
17. At the prompt, type enable and press Enter.
18. At the prompt, type exit and press Enter.
19. At the next prompt, again type exit and press Enter.

20. When the G8264 prompt returns, type Ctrl-A, then type x.
21. Select yes from the prompt and press Enter.
The ha1 prompt returns.
22. When the configuration script completes, load the latest released firmware for the
G8264 switch:
[root@nzhost1 ~]# cd /opt/nz/fdt
[root@nzhost1 fdt]# ./firmware_updater RackFabSwitch --alias netswfab01[a,b]

23. Configure the switch:


[root@nzhost1 ~]# /nzlocal/scripts/netswfab/netswfabConfig.sh -f no -s [1a,1b]

24. When prompted for the password, type admin.


Note: If the switch seems to get hung up at this point, see the section “Troubleshooting”
on page 15-6.

25. Verify that all connections to the switch are correct:


[root@nzhost1 ~]# cd /opt/nz/fdt
[root@nzhost1 ~]# ./system_diags concheck
If errors are reported, make the wiring corrections and rerun concheck.
26. Put the system back into cluster mode:
a. Run the following script:
[root@nzhost1 ~]# /nzlocal/scripts/nz.heartbeat.sh

b. Type the following commands to start the clustering processes:


[root@nzhost1 ~]# service heartbeat start
[root@nzhost1 ~]# ssh ha2 'service heartbeat start'
c. Wait five minutes and then type the command:
[root@nzhost1 ~]# su - nz

d. The system may require up to 10 minutes to come online. To verify that the system
state is online, repeat the following command until it returns the "Online" status:
[nz@nzhost1 ~]$ nzstate
System state is 'Online'

27. If Call Home was previously disabled, enable it.


[nz@nzhost1 ~]$ nzcallhome -on

28. Run sys_rev_check to verify that the system is configured correctly.


a. Change directory to:
[nz@nzhost1 ~]$ cd /opt/nz/fdt

b. Run the command:


[nz@nzhost1 ~]$ ./sys_rev_check

c. If issues are noted in the sys_rev_check output, resolve the issues as described in
the FDT User’s Guide, in the section “Resolve sys_rev_check Issues.”
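
The wait in step 26d amounts to a polling loop. The following is a minimal sketch, assuming that nzstate prints the "System state is 'Online'" line shown in that step. The function name wait_for_online, its retry count, and its interval are hypothetical choices, not values from the Netezza documentation.

```shell
#!/bin/bash
# Hypothetical helper: poll a state command until it reports Online.
#   $1 = maximum number of attempts (default 60)
#   $2 = seconds to sleep between attempts (default 10)
#   $3 = command to run (default nzstate; overridable for dry runs)
wait_for_online() {
    local max_tries="${1:-60}" interval="${2:-10}" state_cmd="${3:-nzstate}"
    local i
    for ((i = 1; i <= max_tries; i++)); do
        if "$state_cmd" 2>/dev/null | grep -q "System state is 'Online'"; then
            return 0
        fi
        sleep "$interval"
    done
    return 1    # never reached Online within the allotted attempts
}
```

With the defaults, wait_for_online checks every 10 seconds for up to 10 minutes, which matches the upper bound given in the step. Run it as user nz.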

Troubleshooting
If the switch appears to hang during the configuration procedure, a previous session
might not have closed properly. To correct this:
1. Type the command:
[root@nzhost1 ~]# minicom gig
A minicom session opens, connected to the G8264.
2. Type admin and press Enter.
You are now logged into the G8264.
3. Type Ctrl-A.
4. Type x.
5. Select yes from the prompt and press Enter.
The active host prompt returns.
6. Restart the switch configuration at step 8 on page 15-4, or step 23 on page 15-5,
depending on where the hang occurred.


C H A P T E R 16
Replacing a Keyboard/Video/Mouse (KVM)
Before you begin the KVM replacement process, make certain that you have a replacement
KVM that conforms to the hardware models supported for the IBM PureData System for
Analytics N3001.
Observe Electrostatic Discharge (ESD) precautions when handling electronic components.
ESD precautions are included in “Electrostatic Discharge Precautions” on page 1-17.

Detailed instructions for replacing KVM components are provided in IBM 1U 17-inch Flat
Panel Console Kit Installation and Maintenance Guide.
The estimated time to perform this procedure is from 60 to 180 minutes, depending on
ease of access to the system.
The replacement FRU numbers for the KVM components are:
• Monitor/Tray - 47C2521
• Keyboard - 00X6927
• Switch - 69Y6015
• USB/Video/Ethernet Adapter - 39M2909
• Ethernet Cable - 90Y3732
• Terminators - 39M2912
• Power Cord Y-Adapter - 39M5450

To replace a KVM:
1. Read the safety information that begins on page v.
2. Disconnect the KVM power cable from outlet 5 of the lower left RPC.

Statement 5

CAUTION:
The power control button on the device and the power switch on the power supply do not
turn off the electrical current supplied to the device. The device also might have more than
one power cord. To remove all electrical current from the device, ensure that all power
cords are disconnected from the power source.

3. Disconnect the input/output cables from the component being replaced.


Note: Ensure that all cables are labeled so that they can be replaced in the same ports
from which they are removed.

4. Remove the KVM, or the KVM component, from the rack.


5. Install the replacement KVM component(s) into the rack.
Note: If replacing the monitor and/or keyboard, the power cable must be routed
through the extension arm to the side of the rack and connected to the Y-adapter. The
monitor cable and keyboard cable must also be routed through the extension arm and
then through the slide to the rear of the rack to the side of the KVM switch. This routing
requires the removal of the switch. The power cord from the switch is routed back
through the slide at the side of the switch and to the Y-adapter.

6. Replace the input/output cables for the replaced component.

Figure 16-1: KVM Switch Connections (callouts: To Y-Adapter, From Monitor, From
Keyboard, To Host1 Adapter, To Host2 Adapter, To AMM Adapter, Terminators)

Figure 16-2: USB/Video/Ethernet Adapter Connections (callouts: From KVM Switch, To
Host/AMM Video Port, To Host/AMM USB Port, Do Not Use)

7. Plug the power cable (Y-adapter) into outlet 5 of the lower left RPC.


C H A P T E R 17
Replacing a Power Distribution Unit (PDU)
What’s in this chapter
• Upper and Lower PDUs

Before you begin the PDU replacement process, make certain that you have a replacement
PDU that conforms to the hardware models supported for the IBM PureData System for
Analytics N3001. Each N3001 rack has four PDUs.
Observe Electrostatic Discharge (ESD) precautions when handling electronic components.
ESD precautions are included in “Electrostatic Discharge Precautions” on page 1-17.

This procedure requires the system to be taken offline.

This procedure requires the user to have root access.

Items required:
• Serial Cable, included with 00AK104, connection type DB-9F to RJ11
The estimated time to perform this procedure is up to 90 minutes, depending on ease of
access to the system and familiarity with NPS and the Netezza system.
The FRU numbers for the PDUs are:
• Upper and Lower (also called RPCs): 00AK104

Upper and Lower PDUs


To replace an upper or lower PDU on the N3001 system, follow these steps:
1. Read the safety information that begins on page v.
2. Identify the active host in the cluster, which is the host where the NPS resource group
is running:
[root@nzhost1 ~]# crm_resource -r nps -W
Example output from a running system:

crm_resource[5377]: 2009/06/07_10:13:12 info: Invoked: crm_resource
-r nps -W
resource nps is running on: nzhost1
Note: If the system is already in maintenance mode, an error message is output. To
identify the active (primary) host, type the command:

[root@nzhost1 ~]# service drbd status


The following line in the output shows the active/passive hosts:
1:r0 Connected Primary/Secondary UpToDate/UpToDate C /nz ext3
Primary is always HA1, and secondary is HA2. In this example, HA1 (primary) is the
active host. (If the output showed as Secondary/Primary, HA2 would be the active
host.)
3. If the NPS resource group is running:
a. Log in to the active host (nzhost1 in this example) as root.
b. Change to user nz:
[root@nzhost1 ~]# su - nz

c. Check to see if Call Home is enabled, and if so, temporarily disable it.
• Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status

• If enabled, disable it:
[nz@nzhost1 ~]$ nzcallhome -off

d. Run the following command to stop the Netezza server:


[nz@nzhost1 ~]$ nzstop

e. Exit from the nz session and return to user root:


[nz@nzhost1 ~]$ exit

f. Type the following commands to stop the clustering processes:


[root@nzhost1 ~]# ssh ha2 service heartbeat stop
[root@nzhost1 ~]# service heartbeat stop
g. Run the following script:
[root@nzhost1 ~]# /nzlocal/scripts/nz.non-heartbeat.sh
Ensure the script completes without errors.
4. On the active host, run sys_rev_check to verify that the system is configured correctly.
a. Change directory to:
[root@nzhost1 ~]# cd /opt/nz/fdt
b. Run the command:
[root@nzhost1 ~]# ./sys_rev_check
Make note of any issues that are identified.

5. Take note of all the power connections to the PDU that is being replaced by labeling
each power cord with the port number where each is plugged into the PDU. Also note
the physical location of the PDU (Rack 1, upper, right for example) as you will need
this information for configuring the replacement PDU.
6. Unplug all power cords and network cables from the PDU, unplugging the input power
connection last.

Statement 1

DANGER
Electrical current from power, telephone, and communication cables is hazardous.
To avoid a shock hazard:
• Do not connect or disconnect any cables or perform installation, maintenance, or
reconfiguration of this product during an electrical storm.
• Connect all power cords to a properly wired and grounded electrical outlet.
• Connect to properly wired outlets any equipment that will be attached to this
product.
• When possible, use one hand only to connect or disconnect signal cables.
• Never turn on any equipment when there is evidence of fire, water, or structural
damage.
• Disconnect the attached power cords, telecommunications systems, networks, and
modems before you open the device covers, unless instructed otherwise in the
installation and configuration procedures.
• Connect and disconnect cables as described in the following table when installing,
moving, or opening covers on this product or attached devices.

To Connect:
1. Turn everything OFF.
2. First, attach all cables to devices.
3. Attach signal cables to connectors.
4. Attach power cords to outlets.
5. Turn device ON.

To Disconnect:
1. Turn everything OFF.
2. First, remove power cords from outlet.
3. Remove signal cables from connectors.
4. Remove all cables from devices.

Statement 5

CAUTION:
The power control button on the device and the power switch on the power supply do not
turn off the electrical current supplied to the device. The device also might have more than
one power cord. To remove all electrical current from the device, ensure that all power
cords are disconnected from the power source.

7. Replace the PDU and connect the input power cable to its source.
8. Attach all power cords to the appropriate ports and reconnect the network cable to the
network port.
9. Configure the replacement PDU:
Note: You need the serial cable (FRU number 69Y2042, connection type DB-9F to
RJ11). The cable connects from the serial port on the active host to the serial port
on the PDU.

Type the commands:


[root@nzhost1 ~]# cd /nzlocal/scripts/rpc
[root@nzhost1 ~]# ./rpcconfigure -s n[ul][lr]
Where n=rack number, followed by u (upper) or l (lower), then l (left) or r (right).
For example:
[root@nzhost1 ~]# ./rpcconfigure -s 1ur
Configures the upper right PDU in rack 1.
Follow the instructions on the screen.
Result: The system begins the configuration. Based on the model you are configuring,
respond accordingly to the system prompts. When you are asked for a password, type
admin and press Enter.
Note: Within the script that follows, you are asked to wait one minute for configuration
to complete. In some cases, the script times out, but you may need to wait up to ten
minutes for the system to display the time-out signal. Once you receive the time-out
signal, you need to restart the script.

Example output:
[root@nzhost1 rpc]# ./rpcconfigure -s 1ur
------------------------------------------------------------------
Host Platform Configuration Version 5.4
2014-09-20.20418.rel-hpfConfig-5.cm.20418

Script to re-configure one RPC.

Performing maintenance on an RPC.


The following prompts are used to enter the specifics
Please use gray APC serial cable.
Plug the 9-pin female plug into the back of the host green socket.
Then, plug the RJ-11 head into the narrow socket marked "serial port"
on the top of rpc rack 1 upper right (as viewed from the rear).
Hit <Enter> when you are ready to configure the RPC.

This will take about 1 minute(s) to configure. Please wait....


Waiting for 10.0.128.32 to reboot.
rpc rack 1 upper right is reachable.
Sleep 30 seconds ...
Waiting for 10.0.128.32 to reboot.
Finished configure on rpc rack 1 upper right.
Please enter management network switch password
Please confirm management network switch password
Get rpc connection info from netswmgt01 . Please wait 20~70 seconds
......
Checking extra or missing connections. Please wait......
Finished RPC configure.
10. Verify that all connections to the PDU are correct and that it is configured correctly:
a. From the active master, type the commands:
[root@nzhost1 ~]# cd /nzlocal/scripts/rpc
[root@nzhost1 ~]# ./rpcconfigure -c -n
When prompted for a password, type admin and press Enter.
b. Change directory to:
[root@nzhost1 ~]# cd /opt/nz/fdt

c. Run the command:


[root@nzhost1 ~]# ./system_diags concheck
Verify that power is being removed from the appropriate PDU ports. If errors are
reported, make the wiring corrections and rerun concheck.
d. Run the command:
[root@nzhost1 ~]# ./system_diags rpccheck

e. As user root, run sys_rev_check to verify that the PDU is at the correct firmware level:
[root@nzhost1 ~]# ./sys_rev_check rpc

11. Put the system back into cluster mode:


a. Run the following script:
[root@nzhost1 ~]# /nzlocal/scripts/nz.heartbeat.sh
b. Type the following commands to start the clustering processes:
[root@nzhost1 ~]# service heartbeat start
[root@nzhost1 ~]# ssh ha2 'service heartbeat start'

c. Wait five minutes and then type the command:


[root@nzhost1 ~]# su - nz
d. The system may require up to 10 minutes to come online. To verify that the system
state is online, repeat the following command until it returns the "Online" status:
[nz@nzhost1 ~]$ nzstate
System state is 'Online'
12. If Call Home was previously disabled, enable it.
[nz@nzhost1 ~]$ nzcallhome -on
13. Exit from the nz session to return to user root:
[nz@nzhost1 ~]$ exit
14. As user root, run sys_rev_check to verify that the system is configured correctly.
Run the commands:
[root@nzhost1 ~]# cd /opt/nz/fdt
[root@nzhost1 ~]# ./sys_rev_check
If failures are noted in the output, resolve the failures as described in the FDT User’s
Guide, in the section “Resolving sysrevcheck Failures,” and then rerun sys_rev_check to
verify that issues are resolved.
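
The position code passed to rpcconfigure -s in step 9 follows the pattern n[ul][lr]. As a minimal sketch, the following shell function validates such a code and spells out the location it refers to before the script is run. The function name describe_pdu_position is hypothetical and is not part of the rpcconfigure tooling.

```shell
#!/bin/bash
# Hypothetical helper: validate a PDU position code of the form n[ul][lr]
# (rack number, upper/lower, left/right) and print it in words.
describe_pdu_position() {
    local code="$1" level side
    if [[ ! "$code" =~ ^([0-9]+)([ul])([lr])$ ]]; then
        echo "invalid position code: $code" >&2
        return 1
    fi
    if [[ "${BASH_REMATCH[2]}" == u ]]; then level=upper; else level=lower; fi
    if [[ "${BASH_REMATCH[3]}" == l ]]; then side=left; else side=right; fi
    echo "rack ${BASH_REMATCH[1]} $level $side"
}

# Example, using the code from step 9 (prints: rack 1 upper right):
describe_pdu_position 1ur
```

Comparing the printed location with the physical location noted in step 5 helps catch a transposed code such as 1lu, which the function rejects.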


C H A P T E R 18
Replacing a Media Tray in an H Chassis
The media tray in the IBM PureData System for Analytics N3001 H Chassis is replaceable.
Note: The N3001-001 system does not use an H chassis.

Before you begin the media tray replacement process, make certain that you have a
replacement media tray that conforms to the hardware models supported for the N3001
system. Typically, you will use a new replacement media tray.
Observe Electrostatic Discharge (ESD) precautions when handling electronic components.
ESD precautions are included in “Electrostatic Discharge Precautions” on page 1-17.

The estimated time to perform this procedure is from 20 to 45 minutes, depending on ease
of access to the system and familiarity with NPS and the Netezza system.
The FRU number for the media tray can be obtained as described in “H-Chassis
Component FRU Numbers” on page 1-13.
To replace a media tray, use the procedures in BladeCenter H Type 8852, 7989, and
1886, Problem Determination and Service Guide.


C H A P T E R 19
Replacing an H Chassis or Midplane
Before you begin the H Chassis or midplane replacement process, make certain that you
have a replacement chassis or midplane that conforms to the hardware models supported for
the IBM PureData System for Analytics N3001 system. Each rack contains one H chassis.
Note: The N3001-001 system does not use an H chassis.

Observe Electrostatic Discharge (ESD) precautions when handling electronic components.


ESD precautions are included in “Electrostatic Discharge Precautions” on page 1-17.

This procedure requires the system to be taken offline.

This procedure requires the user to have root access.

This procedure requires that the IP address of the system components use the Netezza
default IP addresses (configIP must not have been applied).

The estimated time to perform this procedure is from 60 to 180 minutes, depending on
ease of access to the system and familiarity with NPS and the Netezza system.
The FRU number for the H Chassis midplane can be obtained as described in “H-Chassis
Component FRU Numbers” on page 1-13.
The replacement part number for the full H Chassis is IBM P/N 31R3308.
To replace an H Chassis midplane or chassis:
1. Read the safety information that begins on page v.
2. Ensure that all cables connected to the modules in the chassis are clearly marked.
3. Identify the active host in the cluster, which is the host where the NPS resource group
is running:
[root@nzhost1 ~]# crm_resource -r nps -W
Example output from a running system:
crm_resource[5377]: 2009/06/07_10:13:12 info: Invoked: crm_resource
-r nps -W
resource nps is running on: nzhost1

Note: If the system is already in maintenance mode, an error message is output. To
identify the active (primary) host, type the command:

[root@nzhost1 ~]# service drbd status


The following line in the output shows the active/passive hosts:
1:r0 Connected Primary/Secondary UpToDate/UpToDate C /nz ext3
Primary is always HA1, and secondary is HA2. In this example, HA1 (primary) is the
active host. (If the output showed as Secondary/Primary, HA2 would be the active
host.)
4. If the NPS resource group is running:
a. Log in to the active host (nzhost1 in this example) as root.
b. Change to user nz:
[root@nzhost1 ~]# su - nz
c. Check to see if Call Home is enabled, and if so, temporarily disable it.
• Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status

• If enabled, disable it:
[nz@nzhost1 ~]$ nzcallhome -off

d. Run the following command to stop the Netezza server:


[nz@nzhost1 ~]$ nzstop

e. Exit from the nz session and return to user root:


[nz@nzhost1 ~]$ exit

f. Type the following commands to stop the clustering processes:


[root@nzhost1 ~]# ssh ha2 service heartbeat stop
[root@nzhost1 ~]# service heartbeat stop
g. Run the following script:
[root@nzhost1 ~]# /nzlocal/scripts/nz.non-heartbeat.sh
Ensure the script completes without errors.
5. Disconnect the power connections to the chassis.

Statement 5

CAUTION:
The power control button on the device and the power switch on the power supply do not
turn off the electrical current supplied to the device. The device also might have more than
one power cord. To remove all electrical current from the device, ensure that all power
cords are disconnected from the power source.


6. Remove all components from the chassis.

Statement 3

CAUTION:
When laser products (such as CD-ROMs, DVD drives, fiber optic devices, or transmitters)
are installed, note the following:
• Do not remove the covers. Removing the covers of the laser product could result in
exposure to hazardous laser radiation. There are no serviceable parts inside the device.
• Use of controls or adjustments or performance of procedures other than those specified
herein might result in hazardous radiation exposure.

DANGER

Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the
following.
Laser radiation when open. Do not stare into the beam, do not view directly with optical
instruments, and avoid direct exposure to the beam.

7. Replace the chassis or midplane as described in the BladeCenter H Problem Determination
and Service Guide.
8. Replace all components except the AMM in bay 2.
9. Connect all cables to their respective connectors (including the cables to the AMM in
bay 2).
Do not apply power to the chassis at this time.

10. On nzhost1, examine the IP address of the AMM in bay 1 using the command:
[root@nzhost1 ~]# nslookup mm00x
Where x is the number of the chassis.


Example output:
Server: 127.0.0.1
Address: 127.0.0.1#5

Name: mm001
Address: 10.0.129.0
Alternatively, the IP address of the AMM is listed in the file /etc/hosts.
11. Copy the appropriate AMM configuration file to /tmp/amm.cfg. AMM configuration files
are found in the directory /nzlocal/scripts/spa/bc, named spaxx.amm.cfg, where xx is
the chassis number (for example, 01).
12. Configure the network alias on nzhost1:
[root@nzhost1 ~]# ifconfig bond0:0 192.168.70.130 netmask 255.255.255.0

13. Reconnect the power connections to the chassis.


14. Use a paper-clip to press the AMM reset switch for 10 seconds. This sets the AMM to
factory default values.
Note: Resetting the AMM to factory default sets the IP address of the Ethernet port to
192.168.70.125.

Figure 19-1: AMM Reset Switch

15. Wait until the AMM responds to pings at IP address 192.168.70.125:


[root@nzhost1 ~]# ping 192.168.70.125
This may take up to five minutes.
16. Connect via telnet to 192.168.70.125 so that you can set its IP address:
[root@nzhost1 ~]# telnet 192.168.70.125
(username: USERID, password: PASSW0RD [with a zero, not the letter O])
Note: The AMM may become unresponsive in the telnet session after a short time in the
default configuration. For that reason, perform the command in the next step as quickly
as possible.
17. Type:
telnet> ifconfig -eth0 -i <IP address of AMM 1> -s 255.255.252.0 -T mm[1]
The IP address of AMM 1 was identified in step 10.
For example:
telnet> ifconfig -eth0 -i 10.0.129.0 -s 255.255.252.0 -T mm[1]

18. Reset the AMM:


telnet> reset -T mm[1]

19. Exit the telnet session.


20. Wait until the AMM responds to pings at IP address <IP address of AMM 1>:
[root@nzhost1 ~]# ping <IP address of AMM 1>
This may take up to five minutes.
21. Disable the network alias on nzhost1:
[root@nzhost1 ~]# ifconfig bond0:0 down

22. Connect via telnet to <IP address of AMM 1>:


[root@nzhost1 ~]# telnet mm00x
Where x is the chassis number (username: USERID, password: PASSW0RD [with a zero,
not the letter O]).
23. Copy the configuration file:
telnet> read -config file -l /tmp/amm.cfg -i 10.0.128.1 -T mm[1]

24. Reset the AMM:


telnet> reset -T mm[1]

25. Insert the AMM in bay 2.


26. Exit the telnet session.
27. Run the configuration script:
[root@nzhost1 ~]# cd /nzlocal/scripts/spa
[root@nzhost1 spa]# ./spaconfigure.sh -s x
Where x is the chassis number.
28. If spaconfigure fails, find the reason by examining the spaconfigure log at:
/var/log/nz/spaconfigure/spaconfigure.SPAxx.log. Correct the issue, then re-run
spaconfigure.
29. When the configuration script completes, verify that all connections to the chassis are
correct:
[root@nzhost1 ~]# cd /opt/nz/fdt
[root@nzhost1 fdt]# ./system_diags concheck
[root@nzhost1 fdt]# ./system_diags DataPathCheck
If errors are reported, make the wiring corrections and rerun the commands.


30. If the system was in cluster mode before it was stopped in step 4, put the system back in cluster mode:
a. Run the following script:
[root@nzhost1 fdt]# /nzlocal/scripts/nz.heartbeat.sh

b. Type the following commands to start the clustering processes:


[root@nzhost1 fdt]# service heartbeat start
[root@nzhost1 fdt]# ssh ha2 'service heartbeat start'
c. Wait five minutes and then type the command:
[root@nzhost1 ~]# su - nz

d. The system may require up to 10 minutes to come online. To verify that the system
state is online, repeat the following command until it returns the "Online" status:
[nz@nzhost1 ~]$ nzstate
System state is 'Online'

31. If Call Home was previously disabled, enable it.


[nz@nzhost1 ~]$ nzcallhome -on

32. Run sys_rev_check to verify that the system is configured correctly.


a. Change directory to:
[nz@nzhost1 ~]$ cd /opt/nz/fdt

b. Run the command:


[nz@nzhost1 fdt]$ ./sys_rev_check

c. If issues are noted in the sys_rev_check output, resolve the issues as described in
the FDT User’s Guide, in the section “Resolve sys_rev_check Issues.”
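
Several steps in the procedure above wait for a condition: steps 15 and 20 wait until the
AMM answers pings, and step 30 waits until nzstate reports the Online state. The following
is a minimal sketch of a retry helper for such waits, not a tested service tool. The names
wait_for, state_is_online, and nz_online are hypothetical and are not part of the NPS
software. Only the ping address, the nzstate command, and the Online message are taken
from this chapter. The timeout counts only the sleep intervals, so it is approximate.

```shell
# wait_for TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it succeeds or until roughly
# TIMEOUT_SECONDS of sleep time have accumulated. INTERVAL_SECONDS must be a
# positive integer. Returns 0 on success and 1 on timeout.
wait_for() {
    wf_timeout=$1
    wf_interval=$2
    shift 2
    wf_elapsed=0
    until "$@"; do
        if [ "$wf_elapsed" -ge "$wf_timeout" ]; then
            return 1
        fi
        sleep "$wf_interval"
        wf_elapsed=$((wf_elapsed + wf_interval))
    done
    return 0
}

# Reads nzstate output on stdin and succeeds if it reports the Online state.
state_is_online() {
    grep -q "System state is 'Online'"
}

# Runs nzstate (as user nz) and checks its output.
nz_online() {
    nzstate 2>/dev/null | state_is_online
}

# Example usage, commented out so that sourcing this file runs nothing:
#   wait_for 300 5 ping -c 1 -W 2 192.168.70.125 >/dev/null   # steps 15 and 20
#   wait_for 600 15 nz_online                                  # step 30
```

The 300-second and 600-second limits mirror the "up to five minutes" and "up to 10 minutes"
guidance in the steps. If a wait times out, continue with manual checks rather than
proceeding automatically.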



CHAPTER 20
Replacing a Power Supply
Power supplies in the IBM PureData System for Analytics N3001 rack are replaceable in
the Host Server, Disk Enclosures, the H Chassis, and the Management and Fabric switches.
Before you begin the power supply replacement process, make certain that you have a
replacement power supply that conforms to the hardware models supported for the N3001
system. Typically, you will use a new replacement power supply.
Observe Electrostatic Discharge (ESD) precautions when handling electronic components.
ESD precautions are included in “Electrostatic Discharge Precautions” on page 1-17.

Some power supplies are classified as High Efficiency. A failed High Efficiency power
supply must be replaced with a High Efficiency power supply, and all power supplies in
the chassis must be High Efficiency. To check the classification of host power supplies, you
can log in to the IMM and check the VPD.

The estimated time to perform this procedure is from 10 to 30 minutes, depending on ease
of access to the system and familiarity with NPS and the Netezza system.
An amber Fault LED lights on the failed power supply.
The power supply FRU numbers are:
• Disk Enclosure - 45W8841
• x3650-M4 Host Server - 94Y8114
• x3650-M4-HD Host Server - 94Y8118
• x3750-M4 Host Server - 69Y5954
• G8052 Management Switch - 00D6271
• G8264 Fabric Switch - 00D6271
• The FRU number for the H Chassis power supply can be obtained as described in
“H-Chassis Component FRU Numbers” on page 1-13.
To replace a power supply, use the procedures in:
• System Storage EXP2500, Installation, User’s, and Maintenance Guide
• BladeCenter H Type 8852, 7989, and 1886, Problem Determination and Service
Guide
• Rack Switch G8052 Installation Guide
• IBM BNT Rack Switch G8264F Installation Guide
• Problem Determination and Service Guide for the appropriate host model

Statement 8

CAUTION:
Never remove the cover on a power supply or any part that has the following label attached.

Hazardous voltage, current, and energy levels are present inside any component that has
this label attached. There are no serviceable parts inside these components. If you suspect
a problem with one of these parts, contact a service technician.



APPENDIX A
Reference Materials
What’s in this appendix
• Shutting Down an N3001 System
• Bringing Up an N3001 System

The following sections describe the procedure for shutting down and bringing up an IBM
PureData System for Analytics N3001.
This procedure requires the user to have root access.

Shutting Down an N3001 System


Perform the following procedure to shut down an N3001 system:
1. Log in to the host server (ha1) as root.
Note: Do not use the command su to become root.

2. Identify the active host in the cluster, which is the host where the NPS resource group
is running:
[root@nzhost1 ~]# crm_resource -r nps -W
crm_resource[5377]: 2009/06/07_10:13:12 info: Invoked: crm_resource -r nps -W
resource nps is running on: nzhost1
3. Log in to the active host (nzhost1 in this example) as user nz.
4. Check to see if Call Home is enabled, and if so, disable it.
a. Check if Call Home is enabled:
[nz@nzhost1 ~]$ nzcallhome -status

b. If enabled, disable it:


[nz@nzhost1 ~]$ nzcallhome -off
5. Run the following command to stop the Netezza server:
[nz@nzhost1 ~]$ nzstop

6. As root, type the following commands to stop the clustering processes:


[root@nzhost1 ~]# ssh ha2 'service heartbeat stop'


[root@nzhost1 ~]# service heartbeat stop

7. Type the following commands to stop the DRBD processes:


[root@nzhost1 ~]# ssh ha2 'service drbd stop'
[root@nzhost1 ~]# service drbd stop
8. Log in to ha2 as root and shut down the Linux operating system using the following
command:
[root@nzhost2 ~]# shutdown -h now
The system displays a series of messages as it stops processes and other system activity.
When it finishes, it displays the message “power down”, which indicates that it is
now safe to turn off the power to the server.
9. Press the power button on Host 2 (located in the front of the cabinet) to power down
that NPS host.
10. On ha1, shut down the Linux operating system using the following command:
[root@nzhost1 ~]# shutdown -h now
The system displays a series of messages as it stops processes and other system activity.
When it finishes, it displays the message “power down” which indicates that it is
now safe to turn off the power to the server.
11. Press the power button on Host 1 (located in the front of the cabinet) to power down
that NPS host.
12. Switch the breakers to OFF on both the left and right PDUs. (Repeat this step for each
rack of the system.)
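
Step 2 reads the active host from the final line of the crm_resource output. As a minimal
sketch, that line can be extracted with sed so that a script can confirm which host is
active before the NPS software and the clustering processes are stopped. The function name
active_host is hypothetical. Merging the error stream with 2>&1 in the usage comment is an
assumption about where crm_resource writes its messages.

```shell
# active_host: reads the output of "crm_resource -r nps -W" on stdin and
# prints the host name that follows "resource nps is running on:".
# Prints nothing and returns 1 if that line is absent, for example when the
# command reports an error instead.
active_host() {
    ah_host=$(sed -n 's/^resource nps is running on:[[:space:]]*//p' | head -n 1)
    [ -n "$ah_host" ] && printf '%s\n' "$ah_host"
}

# Example usage, commented out so that sourcing this file runs nothing:
#   if host=$(crm_resource -r nps -W 2>&1 | active_host); then
#       echo "Active host: $host"
#   else
#       echo "Could not determine the active host" >&2
#   fi
```

If the function returns 1, do not assume that either host is active. Check the state of
the cluster manually before continuing with the shutdown.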

Bringing Up an N3001 System


Perform the following to bring up an N3001 system.
1. Make sure that the two main power cables are connected to the data center drops;
there are two power cables for each rack of the system.
2. Switch the breakers to ON on both the left and right PDUs. (Repeat these steps for
each rack of the system.)
3. Press the power button on both host servers and wait for the servers to start. This process
can take a few minutes.
4. Log in to the host server (ha1) as root.
5. Run the crm_mon command to obtain the cluster status:
[root@nzhost1 ~]# crm_mon -i5
The output of the command refreshes at the specified interval of 5 seconds (-i5).
Review the output and wait until all resource groups have a Started status, which
usually takes about 2 to 3 minutes. Then proceed to the next step. Sample output
follows:
============
Last updated: Tue Jun 2 11:46:43 2009
Current DC: nzhost1 (key)
2 Nodes configured.
3 Resources configured.

============
Node: nzhost1 (key): online
Node: nzhost2 (key): online
Resource Group: nps
drbd_exphome_device (heartbeat:drbddisk): Started nzhost1
drbd_nz_device (heartbeat:drbddisk): Started nzhost1
exphome_filesystem (heartbeat::ocf:Filesystem): Started nzhost1
nz_filesystem (heartbeat::ocf:Filesystem): Started nzhost1
fabric_ip (heartbeat::ocf:IPaddr): Started nzhost1
wall_ip (heartbeat::ocf:IPaddr): Started nzhost1
nz_dnsmasq (lsb:nz_dnsmasq): Started nzhost1
nzinit (lsb:nzinit): Started nzhost1
fencing_route_to_ha1 (stonith:apcmaster): Started nzhost2
fencing_route_to_ha2 (stonith:apcmaster): Started nzhost1
6. Press Ctrl-C to exit the crm_mon command and return to the command prompt.
7. Log in to the nz account:
[root@nzhost1 ~]# su - nz

8. If Call Home was disabled before shutting down the system, enable it.
[nz@nzhost1 ~]$ nzcallhome -on

9. Verify that the system is online using the following command:


[nz@nzhost1 ~]$ nzstate
System state is 'Online'.
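
Step 5 asks you to watch the crm_mon output until every resource shows a Started status.
That check can be sketched as a filter over the output, as shown below. The function name
all_started is hypothetical. The filter assumes that resource lines look like those in the
sample output, that is, a resource name followed by a parenthesized type that contains a
colon, such as (lsb:nzinit). Node status lines are skipped.

```shell
# all_started: reads crm_mon output on stdin. Succeeds only if at least one
# resource line is present and every resource line reports "Started".
all_started() {
    awk '
        /^ *Node:/ { next }              # skip node status lines
        /\([^)]*:[^)]*\):/ {             # resource line such as "(lsb:nzinit):"
            total++
            if ($0 ~ /\): *Started/) started++
        }
        END { exit !(total > 0 && started == total) }
    '
}

# Example usage, commented out so that sourcing this file runs nothing.
# The one-shot option -1 is an assumption about the installed crm_mon version:
#   crm_mon -1 | all_started && echo "All resources are started"
```

A failed check only means that at least one resource line did not show Started at that
moment. Review the full crm_mon output before deciding how to proceed.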



APPENDIX B
Notices and Trademarks
What’s in this appendix
• Notices
• Trademarks
• Electronic Emission Notices
• Regulatory and Compliance

This section describes some important notices, trademarks, and compliance information.

Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other
countries. Consult your local IBM representative for information on the products and
services currently available in your area. Any reference to an IBM product, program, or service
is not intended to state or imply that only that IBM product, program, or service may be
used. Any functionally equivalent product, program, or service that does not infringe any
IBM intellectual property right may be used instead. However, it is the user's responsibility
to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in
this document. The furnishing of this document does not grant you any license to these
patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785 U.S.A.
For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual
Property Department in your country or send inquiries, in writing, to:
IBM World Trade Asia Corporation
Licensing
2-31 Roppongi 3-chome, Minato-ku
Tokyo 106-0032, Japan
The following paragraph does not apply to the United Kingdom or any other country where
such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES
CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY
KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE

IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR
A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties
in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are
periodically made to the information herein; these changes will be incorporated in new editions
of the publication. IBM may make improvements and/or changes in the product(s)
and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only
and do not in any manner serve as an endorsement of those Web sites. The materials at
those Web sites are not part of the materials for this IBM product and use of those Web
sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate
without incurring any obligation to you.
Licensees of this program who wish to have information about it for the purpose of
enabling: (i) the exchange of information between independently created programs and
other programs (including this one) and (ii) the mutual use of the information which has
been exchanged, should contact:
IBM Corporation
Software Interoperability Coordinator, Department 49XA
3605 Highway 52 N
Rochester, MN 55901
U.S.A.
Such information may be available, subject to appropriate terms and conditions, including
in some cases, payment of a fee.
The licensed program described in this document and all licensed material available for it
are provided by IBM under terms of the IBM Customer Agreement, IBM International Program
License Agreement or any equivalent agreement between us.
Any performance data contained herein was determined in a controlled environment.
Therefore, the results obtained in other operating environments may vary significantly.
Some measurements may have been made on development-level systems and there is no
guarantee that these measurements will be the same on generally available systems. Furthermore,
some measurements may have been estimated through extrapolation. Actual
results may vary. Users of this document should verify the applicable data for their specific
environment.
Information concerning non-IBM products was obtained from the suppliers of those products,
their published announcements or other publicly available sources. IBM has not
tested those products and cannot confirm the accuracy of performance, compatibility or
any other claims related to non-IBM products. Questions on the capabilities of non-IBM
products should be addressed to the suppliers of those products.
All statements regarding IBM's future direction or intent are subject to change or withdrawal
without notice, and represent goals and objectives only.
All IBM prices shown are IBM's suggested retail prices, are current and are subject to
change without notice. Dealer prices may vary.
This information contains examples of data and reports used in daily business operations.
To illustrate them as completely as possible, the examples include the names of individuals,
companies, brands, and products. All of these names are fictitious and any similarity to
the names and addresses used by an actual business enterprise is entirely coincidental.


COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate
programming techniques on various operating platforms. You may copy, modify, and distribute
these sample programs in any form without payment to IBM, for the purposes of
developing, using, marketing or distributing application programs conforming to the application
programming interface for the operating platform for which the sample programs are
written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs.
Each copy or any portion of these sample programs or any derivative work, must include a
copyright notice as follows:
© (your company name) (year). Portions of this code are derived from IBM Corp. Sample
Programs.
© Copyright IBM Corp. _enter the year or years_.
If you are viewing this information softcopy, the photographs and color illustrations may not
appear.

Trademarks
IBM, the IBM logo, ibm.com and Netezza are trademarks or registered trademarks of International
Business Machines Corporation in the United States, other countries, or both. If
these and other IBM trademarked terms are marked on their first occurrence in this information
with a trademark symbol (® or ™), these symbols indicate U.S. registered or
common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries. A current
list of IBM trademarks is available on the Web at “Copyright and trademark information” at
ibm.com/legal/copytrade.shtml.
Adobe is a registered trademark of Adobe Systems Incorporated in the United States,
and/or other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or
both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation
in the United States, other countries, or both.
NEC is a registered trademark of NEC Corporation.
UNIX is a registered trademark of The Open Group in the United States and other
countries.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United
States, other countries, or both.
Red Hat is a trademark or registered trademark of Red Hat, Inc. in the United States and/or
other countries.
D-CC, D-C++, Diab+, FastJ, pSOS+, SingleStep, Tornado, VxWorks, Wind River, and the
Wind River logo are trademarks, registered trademarks, or service marks of Wind River Systems,
Inc. Tornado patent pending.
APC and the APC logo are trademarks or registered trademarks of American Power Conversion
Corporation.
Other company, product or service names may be trademarks or service marks of others.


Electronic Emission Notices


When you attach a monitor to the equipment, you must use the designated monitor cable
and any interference suppression devices that are supplied with the monitor.

Federal Communications Commission (FCC) Statement


Note: This equipment has been tested and found to comply with the limits for a Class A
digital device, pursuant to Part 15 of the FCC Rules. These limits are designed to provide
reasonable protection against harmful interference when the equipment is operated in a
commercial environment. This equipment generates, uses, and can radiate radio frequency
energy and, if not installed and used in accordance with the instruction manual, may cause
harmful interference to radio communications. Operation of this equipment in a residential
area is likely to cause harmful interference, in which case the user will be required to correct
the interference at his own expense.
Properly shielded and grounded cables and connectors must be used in order to meet FCC
emission limits. IBM is not responsible for any radio or television interference caused by
using other than recommended cables and connectors or by unauthorized changes or modifications
to this equipment. Unauthorized changes or modifications could void the user's
authority to operate the equipment.
This device complies with Part 15 of the FCC Rules. Operation is subject to the following
two conditions: (1) this device may not cause harmful interference, and (2) this device
must accept any interference received, including interference that might cause undesired
operation.

Industry Canada Class A Emission Compliance Statement


This Class A digital apparatus complies with Canadian ICES-003.

Avis de conformité à la réglementation d'Industrie Canada


Cet appareil numérique de la classe A est conforme à la norme NMB-003 du Canada.

Australia and New Zealand Class A Statement


Attention: This is a Class A product. In a domestic environment this product may cause
radio interference in which case the user may be required to take adequate measures.

European Union EMC Directive Conformance Statement


This product is in conformity with the protection requirements of EU Council Directive
2004/108/EC on the approximation of the laws of the Member States relating to electromagnetic
compatibility. IBM cannot accept responsibility for any failure to satisfy the
protection requirements resulting from a nonrecommended modification of the product,
including the fitting of non-IBM option cards.
Attention: This is an EN 55022 Class A product. In a domestic environment this product
may cause radio interference in which case the user may be required to take adequate
measures.
Responsible manufacturer:
International Business Machines Corp.
New Orchard Road
Armonk, New York 10504
914-499-1900


European Community contact:


IBM Technical Relations Europe, Department M456
IBM-Allee 1, 71139 Ehningen, Germany
Telephone: +49 800 225 5426
Email: HalloIBM@de.ibm.com

Germany Class A Statement


Deutschsprachiger EU Hinweis: Hinweis für Geräte der Klasse A EU-Richtlinie zur Elektromagnetischen
Verträglichkeit
Dieses Produkt entspricht den Schutzanforderungen der EU-Richtlinie 2014/30/EU zur
Angleichung der Rechtsvorschriften über die elektromagnetische Verträglichkeit in den
EU-Mitgliedsstaaten und hält die Grenzwerte der EN 55022 / EN 55032 Klasse A ein.
Um dieses sicherzustellen, sind die Geräte wie in den Handbüchern beschrieben zu installieren
und zu betreiben. Des Weiteren dürfen auch nur von der IBM empfohlene Kabel
angeschlossen werden. IBM übernimmt keine Verantwortung für die Einhaltung der
Schutzanforderungen, wenn das Produkt ohne Zustimmung der IBM verändert bzw. wenn
Erweiterungskomponenten von Fremdherstellern ohne Empfehlung der IBM gesteckt/eingebaut
werden.
EN 55022 / EN 55032 Klasse A Geräte müssen mit folgendem Warnhinweis versehen
werden:
“Warnung: Dieses ist eine Einrichtung der Klasse A. Diese Einrichtung kann im Wohnbereich
Funk-Störungen verursachen; in diesem Fall kann vom Betreiber verlangt werden,
angemessene Maßnahmen zu ergreifen und dafür aufzukommen.”

Deutschland: Einhaltung des Gesetzes über die elektromagnetische Verträglichkeit von Geräten
Dieses Produkt entspricht dem “Gesetz über die elektromagnetische Verträglichkeit von
Geräten (EMVG)”. Dies ist die Umsetzung der EU-Richtlinie 2014/30/EU in der Bundesrepublik
Deutschland.

Zulassungsbescheinigung laut dem Deutschen Gesetz über die elektromagnetische Verträglichkeit von Geräten
(EMVG) (bzw. der EMC EG Richtlinie 2014/30/EU) für Geräte der Klasse A
Dieses Gerät ist berechtigt, in Übereinstimmung mit dem Deutschen EMVG das EG-Konformitätszeichen
- CE - zu führen.
Verantwortlich für die Einhaltung der EMV-Vorschriften ist der Hersteller:
International Business Machines Corp.
New Orchard Road
Armonk, New York 10504
Tel: 914-499-1900
Der verantwortliche Ansprechpartner des Herstellers in der EU ist:
IBM Deutschland
Technical Relations Europe, Abteilung M456
IBM-Allee 1, 71139 Ehningen, Germany
Telephone: +49 800 225 5426
Email: HalloIBM@de.ibm.com
Generelle Informationen:
Das Gerät erfüllt die Schutzanforderungen nach EN 55024 und EN 55022 / EN 55032
Klasse A.


Japan VCCI Class A Statement

This is a Class A product based on the standard of the Voluntary Control Council for Interference
(VCCI). If this equipment is used in a domestic environment, radio interference
may occur, in which case the user may be required to take corrective actions.

Japan Electronics and Information Technology Industries Association (JEITA) Statement

Japan JIS C 61000-3-2 Compliance:

Japan Electronics and Information Technology Industries Association (JEITA) Confirmed
Harmonics Guidelines (products less than or equal to 20 A per phase):

Japan Electronics and Information Technology Industries Association (JEITA) Confirmed
Harmonics Guidelines (products greater than 20 A per phase):

Japan Electronics and Information Technology Industries Association (JEITA) Confirmed
Harmonics Guidelines (products greater than 20 A per phase, three-phase):


Korea Communications Commission (KCC) Statement

This is electromagnetic wave compatibility equipment for business (Type A). Sellers and
users need to pay attention to it. This is for any areas other than home.

Russia Electromagnetic Interference (EMI) Class A Statement

People's Republic of China Class A Electronic Emission Statement

Taiwan Class A Compliance Statement

Regulatory and Compliance


Regulatory Notices
Install the NPS system in a restricted-access location. Ensure that only those trained to
operate or service the equipment have physical access to it. Install each AC power outlet
near the NPS rack that plugs into it, and keep it freely accessible.
Provide approved circuit breakers on all power sources.
Product may be powered by redundant power sources. Disconnect ALL power sources
before servicing.
High leakage current. Earth connection essential before connecting supply. Courant de
fuite élevé. Raccordement à la terre indispensable avant le raccordement au réseau.


The IBM PureData System for Analytics appliance requires a readily accessible power cutoff.
This can be a Unit Emergency Power Off (UEPO) switch or a circuit breaker, or power can be
completely removed from the equipment by disconnecting the Appliance Coupler (line
cord) from all rack PDUs.
CAUTION: Disconnecting power from the appliance without first stopping the NPS software
and high availability processes could result in data loss and increased service time to
restart the appliance after the power cutoff. For all non-emergency situations, follow the
documented power-down procedures in the IBM Netezza System Administrator's Guide to
ensure that the software and databases are stopped correctly, in order, to avoid data loss or
file corruption.
Homologation Statement
This product may not be certified in your country for connection by any means whatsoever
to interfaces of public telecommunications networks.
Further certification may be required by law prior to making any such connection. Contact
an IBM representative or reseller for any questions.
