
Unauthorized reproduction or distribution prohibited. Copyright © 2017, Oracle and/or its affiliates.

PTR/INT Oracle Exadata Database Machine Install and Maintenance
Student Guide - Volume II

D80881GC10
Edition 1.0
April 2013
D81624
Copyright © 2013, Oracle and/or its affiliates. All rights reserved.

Authors
Richard Eppstein
David Winter

Technical Contributors and Reviewers
Leslie Keller
Oliver Sharwood

Editors
Rashmi Rajagopal
Richard Wallis

Graphic Designer
Rajiv Chandrabhanu

Publishers
Pavithran Adka
Nita Brozowski
Jayanthy Keshavamurthy

Disclaimer
This document contains proprietary information and is protected by copyright and other intellectual property laws. You may copy and print this document solely for your own use in an Oracle training course. The document may not be modified or altered in any way. Except where your use constitutes "fair use" under copyright law, you may not use, share, download, upload, copy, print, display, perform, reproduce, publish, license, post, transmit, or distribute this document in whole or in part without the express authorization of Oracle.

The information contained in this document is subject to change without notice. If you find any problems in the document, please report them in writing to: Oracle University, 500 Oracle Parkway, Redwood Shores, California 94065 USA. This document is not warranted to be error-free.

Restricted Rights Notice
If this documentation is delivered to the United States Government or anyone using the documentation on behalf of the United States Government, the following notice is applicable:
U.S. GOVERNMENT RIGHTS
The U.S. Government’s rights to use, modify, reproduce, release, perform, display, or disclose these training materials are restricted by the terms of the applicable Oracle license agreement and/or the applicable U.S. Government contract.

Trademark Notice
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Contents
1 Functions and Features


Course Goals 1-2
Course Map 1-3
Lessons 1-4
Topics Not Covered 1-5
Additional Resources for Installation and Maintenance 1-6
How Prepared Are You? 1-7
Introductions 1-8
How to Use Course Materials 1-9
Conventions 1-10
Additional Conventions 1-12
Functions and Features 1-13
Objectives 1-14
Functions and Features: Relevance 1-15
Additional Resources for Functions and Features 1-16
Functions and Features: Overview 1-17
Database Machine Attributes 1-18
Additional Exadata X2-8 Machine Attributes 1-21
Oracle Exadata Database Machine Comparison 1-22
Function Shipping 1-24
The Performance Challenge 1-25


A New and Improved Architecture 1-26
Smart Scan Offload Processing 1-27
Demonstration: Smart Scan Example 1-28
Smart Scan Processing 1-29
I/O Elimination with Storage Index 1-30
I/O Resource Management 1-31
Massively Parallel Storage Grid 1-33
Exadata X2-8/X3-8 Enterprise Architecture 1-34
Exadata Scalable Architecture 1-35
Exadata X2-8/X3-8 Scalable Architecture 1-36
Seamless Upgrades and Expansions 1-37
Exadata X3 | Database In-Memory Machine 1-38
NEW Exadata X3-2 Eighth Rack 1-39
Exadata Storage Expansion Rack 1-40

Exadata Storage Expansion Full Rack 1-41
Exadata Storage Expansion Half Rack 1-42
Exadata Storage Expansion Quarter Rack 1-43
Exadata Storage Expansion Rack 1-44
Exadata Storage Expansion Rack Upgrades 1-45
Exadata X3-8 Database Machine: Core Hardware Components 1-46


Exadata X3-2 Database Machine: Core Hardware Components 1-47
Oracle Exadata Storage Servers 1-48
Oracle Exadata X3-2 and X3-8 Storage Servers 1-49
Oracle Exadata X2-2 and X2-8 Storage Servers 1-50
Oracle Exadata V2 Storage Servers 1-51
Hardware Refresh of Exadata V2 1-52
Oracle Exadata Storage Servers 1-53
Database Server OS: Solaris or Linux 1-54
Deployment Considerations 1-56
Exercise: Answering Questions 1-57
Task 1: Answer the Following Questions 1-58
Exercise Summary 1-59
Exercise Solutions 1-60
Summary 1-61

2 Components and Architecture
Objectives 2-2
Components and Architecture: Relevance 2-3
Additional Resources for Components and Architecture 2-4
Recent Hardware Changes 2-5
Exadata Hardware Comparison Summary X3 2-6
Exadata Hardware Comparison Summary X2 2-7
Exadata X3-2 Database Machine Components 2-8
Exadata Database Machine X3-2 Definition 2-9
Exadata X2-2 Database Machine Components 2-10
Exadata Database Machine X2-2 Definition 2-11
Exadata X3-8 Database Machine Components 2-12
Exadata Database Machine X3-8 Definition 2-13
Exadata X2-8 Database Machine Components 2-14
Exadata Database Machine X2-8 Definition 2-15
Exadata Database Machine X2-8 in Assembly 2-16
Gigabit Ethernet Switch 2-17
Gigabit Ethernet Switch Cooling 2-18
Sun Datacenter 36-Port Managed QDR InfiniBand Switches 2-19
Two InfiniBand Leaf Switches 2-20

One InfiniBand Spine Switch 2-21
36-Port Managed QDR InfiniBand Switch Differences 2-22
Exadata X3-2 Database Servers 2-23
Exadata X2-2 Database Servers Based on Sun Fire X4170 M2 2-24
Exadata Memory Expansion Kit 2-27
Exadata X2-2 Database Server Disks 2-28


Dual Booting the Compute Server 2-29
Failure Scenarios: Single Disk 2-33
Failure Scenarios: Double Disk 2-34
Customer Chooses Solaris 2-35
New Exadata Database Server Comparison 2-36
Exadata V2 Legacy Database Servers 2-37
Exadata X2-8 Database Server (X4800 M2) 2-38
Westmere-EX CPU Chip Diagram 2-39
Exadata X2-8 Database Server (Front View) 2-40
Exadata X2-8 Database Server (Rear View) 2-41
Exadata X2-8 Database Server Disks 2-42
Database Server Mapping I/O to CPU Modules 2-43
QPI Port Mapping 8-socket (4 CMOD) 2-44
CPU Module: CMOD Layout 2-45
Memory Population 2-47
Population Order 2-48
NEM: Overview 2-49
NEM Components 2-50
Service Processor and Universal Connector Port 2-51
Chassis Cooling Zones 2-52
Exadata X2-8 Database Server Power Information 2-54
Exadata Storage Server Comparison 2-55
New Exadata X3-2L Storage Servers 2-57
Exadata X2-2 Storage Servers Based on Sun Fire X4270 M2 2-59
Exadata V2 Legacy Storage Servers 2-61
Thermal Sensing and Fan Control 2-62
New Sun Fire Flash Accelerator F40 PCIe Card Aura 2 2-63
LSI Nytro WarpDrive2 / Aura2 2-64
WarpDrive Firmware 2-65
Sun Fire Flash Accelerator F20 PCIe Card 2-66
F20 PCIe Card Versions 1.0 and 1.1 2-68
IB Dual Port 4x QDR PCIe Low-Profile HCA M2 2-70
Sun IB Dual Port 4x QDR PCIe ExpressModule HCA M2 2-71
IB HCA Ports and LEDs 2-72
Keyboard, Video, and Monitor Hardware (Exadata V2 and X2-2 only) 2-73

Sun Rack II 1242 2-74
Exadata Database Machine PDUs 2-75
New CMAs 2-76
Exadata and Storage Expansion Racks Low Voltage Single Phase PDU 2-77
Exadata and Storage Expansion Racks Low Voltage Three Phase PDU 2-78
Exadata and Storage Expansion Racks High Voltage Single Phase PDU 2-79
Exadata and Storage Expansion Racks High Voltage Three Phase PDU 2-80
X2-8 Low Voltage Single Phase PDU 2-81
X2-8 Low Voltage Three Phase PDU 2-82
X2-8 High Voltage Single Phase PDU 2-83
X2-8 High Voltage Three Phase PDU 2-84
Exadata Database Machine Architecture 2-85
Disk Abstraction 2-86
Cell Disk 2-87
Grid Disk 2-88
Oracle Exadata Database Machine Architecture 2-90
Automatic Storage Management 2-92
ASM Scale-Out Data Distribution 2-93
ASM Data Redistribution 2-94
Protection from Hardware Failure 2-95
Protection from Brownout 2-96
Exercises and Discussion 2-97
Task 1: Quiz 2-98
Exercise Summary 2-99
Exercise Solutions 2-100
Summary 2-101
Additional Resources 2-102
X2 Exadata Database Machine Environmental 2-103
X2 Storage Expansion Rack Environmental 2-104

3 Configuration and Installation


Objectives 3-2
Configuration and Installation: Relevance 3-3
Additional Resources for Configuration and Installation 3-4
Exadata Network Connectivity 3-5
Unpacking and Staging the Oracle Exadata Database Machine 3-7
Pre-installation Procedures 3-8
Manageability Tools 3-9
ASR Installation and Workflow 3-10
Facility Requirements 3-11
Network Requirements 3-12

Network Overview 3-13
Information Requirements 3-14
InfiniBand Network Addresses 3-16
Ethernet Network Addresses 3-17
X2-2 KVM Connections 3-18
Gigabit Ethernet Cabling 3-19


ILOM Cabling 3-20
InfiniBand Cabling 3-21
Component and Interface Abbreviations 3-23
Single-Phase Power Component Connections 3-24
Three-Phase Power Component Connections 3-26
X2/3-8 ILOM Cabling 3-28
X2/3-8 Gigabit Ethernet Cabling 3-29
X2/3-8 InfiniBand Cabling 3-30
X2/3-8 Power Cabling Matrix 3-32
X2/3-2 Full Rack: Server to Leaf Switch 3-33
X2/3-2 Half Rack: Server to Leaf Switch 3-34
X2/3-2 Quarter Rack: Server to Leaf Switch 3-35
X2/3-8 Full Rack: Server to Leaf Switch 3-36
Storage Expansion Full Rack: Server to Leaf Switch 3-37
Storage Expansion Half Rack: Server to Leaf Switch 3-38
Storage Expansion Quarter Rack: Server to Leaf Switch 3-39
Scaling Out to Multiple Full Racks 3-40
Two Rack Case: Fat Tree Topology 3-41
Multiple Rack Case: Up to 8 Racks 3-42
Scaling Out to 9 to 36 Racks 3-43
Multiple Rack Case: 9 to 36 Racks 3-44
Interconnecting Quarter Racks 3-45
Case 1: Two Quarter Racks 3-46
Case 2: Quarter Rack with One Half or Full Rack 3-47
Case 3: Quarter Rack with Two or More Racks 3-48
InfiniBand Network: External Connectivity 3-49
System Access 3-50
Default Passwords and Usernames 3-51
Unpacking and Staging the Oracle Exadata Database Machine 3-52
Installation 3-56
Post-installation 3-58
Check the Software and Patches at Deployment Time 3-62
Database Platform 3-64
Configure Rack Master Serial Number 3-65
Configuring the Avocent KVM MergePoint Unity Switch (<=X2-2 only) 3-66

KVM Configuration: Example 3-67
Configure Network Settings 3-68
Configure the InfiniBand Switches 3-70
InfiniBand Platform/Version 3-73
Sun Datacenter 36-Port Managed QDR InfiniBand Switch Settings 3-74
Example of Configuring NTP on the IB Switch 3-75


Example of Configuring NTP on the IB Switch (If Connected to the Network) 3-76
General Environment Tests 3-77
Example of Setting the Switch Retries to 5 3-78
Configure the Cisco 4948 Catalyst Switch 3-79
OneCommand 3-88
Configure the Oracle Database Machine Servers: FirstBoot 3-89
Exercise: Configure the Oracle Database Machine 3-95
Oracle Exadata Lab Diagram A 3-96
Oracle Exadata Lab Diagram B 3-97
Oracle Exadata Lab Diagram C 3-98
Task 1: Verify Cable Connections on Oracle Exadata Database Machine 3-100
Task 2: Configure the Oracle Database Machine 3-101
Task 3: Verify System Status 3-102
Exercise Summary 3-103
Exercise Solutions 3-104
Summary 3-105

4 Administration
Objectives 4-2
Relevance of Administration 4-3
Additional Resources for Administration 4-4
Administrative Interfaces 4-5
Oracle Exadata Database Machine and Enterprise Manager Grid Control 4-6
Automatic Service Request: Overview 4-7
Oracle Configuration Manager (OCM): Overview 4-8
CellCLI 4-9
Using the dcli Utility 4-28
dcli Options 4-29
dcli Examples 4-31
Administration and Configuration Commands 4-35
Checking Disk Performance with Calibrate 4-36
Monitoring the System 4-37
Monitoring the System: OS 4-38
Monitoring the System: CellCLI 4-39
Monitoring the System: ILOM Web UI 4-40

Exercise: Verify System Components 4-43
Task 4-1: System Monitoring 4-44
Task 4-2: Navigating System Components 4-45
Task 4-3: Storage Cell Setup 4-47
Exercise Summary 4-57
Summary 4-58

5 Maintenance and FRU Replacement


Objectives 5-2
Relevance of Maintenance and FRU Replacement 5-3
Additional Resources for Maintenance and FRU Replacement 5-4
Support Maintenance 5-5
Support Case Handling 5-6
Exadata Relationship-Based Support 5-7
Oracle Platinum Support for Engineered Systems 5-8
Platinum Support 5-9
Platinum Versus Premier Comparison 5-10
Platinum Service Platform 5-11
Oracle Advanced Support Gateway X3-2 5-12
Understanding Repair Categories 5-13
Product-Specific FRUs and CRUs 5-14
X2-8 Product-Specific FRUs and CRUs 5-16
Service Procedures X3-2L Storage Cell 5-17
Component Replacement Procedures 5-19
Storage Cell Disk Replacement 5-20
IB-HCA Card Installation 5-26
InfiniBand Switch Maintenance 5-31
Backing Up Settings on the IB Switch 5-32
Replacing an InfiniBand Switch 5-33
Restoring Settings on the IB Switch 5-35
Verifying the InfiniBand Network Operation 5-36
New LSI RAID Battery Maintenance Procedure 5-37
LSI HBA Batteries 5-38
Write-Back Versus Write-Through Mode 5-39
LSI HBA Batteries 5-40
Battery Monitoring via Learn Cycles 5-42
New Cable Management Arm 5-44
Exadata X2-8 Database Server 5-45
X2-8 DB Server Service Processor Cabling 5-46
X2-8 DB Server Subassembly Module Removal and Replacement 5-48
CPU Module Orientation 5-50

How to Remove a CPU Module 5-51
CPU Module Components 5-53
Prepare the Server for Operation 5-54
PCIe EM Designations and Population Rules 5-56
IB HCA Ports and LEDs 5-57
How to Remove a PCIe EM 5-58


PCIe Express Module: Overview 5-59
How to Install a PCIe EM or PCIe EM Filler 5-60
Network Express Module Designations and Assignments 5-61
How to Remove an NEM or an NEM Filler 5-62
Network Express Module: Overview 5-63
How to Install an NEM or an NEM Filler 5-64
Loading Software After Component Removal or Replacement 5-65
Oracle Exadata Database Machine IB Switch Replacement 5-66
Exercise: Perform Removal and Replacement and Verify Status on Components 5-67
Task 3: Removing and Replacing X2-8 DB Server Components 5-78
Exercise Summary 5-79
Exercise Solutions 5-80
Summary 5-81

6 Troubleshooting
Objectives 6-2
Relevance of Troubleshooting 6-3
Additional Resources for Troubleshooting 6-4
Machine Troubleshooting 6-5
Collecting Information and Determining a Problem Statement 6-6
Machine Troubleshooting 6-7
Useful Status Commands 6-9
System Component Query Commands 6-11
Hardware Diagnostics: Server 6-26
Hardware Diagnostics: InfiniBand 6-40
Hardware Diagnostics: InfiniBand ILOM/Fabric Monitor 6-44
InfiniBand Utilities Descriptions 6-48
InfiniBand Switch Platform Commands 6-67
InfiniBand Utilities Descriptions 6-72
Known Support Issues and Workarounds 6-87
Exercise Overview: System Troubleshooting 6-95
Summary 6-100

7 Advanced Tasks
Objectives 7-2
Relevance of Advanced Tasks 7-3
Additional Resources for Advanced Tasks 7-4
Oracle Database Machine Advanced Tasks 7-5
Exadata Hardening Script 7-9


Replacing a Physical Disk Due to Disk Failure 7-15
Replacing a Boot Disk Due to Disk Failure 7-18
Replacing a Physical Disk Due to Disk Problems 7-24
Removing a Physical Disk Due to Bad Performance 7-27
Repurposing a Physical Disk 7-29
Moving All Drives from One Cell to Another Cell 7-32
Removing and Replacing the Same Physical Disk 7-34
Replacing an F20 Flash Disk Due to Failure 7-35
Replacing an F20 Flash Disk Due to Problems 7-37
Removing an F20 Flash Disk Due to Bad Performance 7-39
Shutting Down a Cell 7-41
Re-Creating a Damaged Cell Boot and Rescuing USB 7-44
Changing the InfiniBand Network Information 7-45
Understanding the InfiniBand Network Master Subnet Manager 7-55
Changing IP Addresses on an Exadata Storage Server 7-56
Nonemergency Power Cycle Procedure 7-60
Emergency Power-Off Considerations 7-66
Installing and Configuring Auto Service Request: Solaris Server 7-68
Installing and Configuring Auto Service Request: Enterprise Linux Server 7-70
Registering ASR Manager 7-72
Configuring ASR Trap Destinations 7-75
Activating ASR Destinations 7-84
Validating Auto Service Request 7-89
ASR Support Process 7-90
Checking MOS Hardware Serial Numbers 7-92
New LSI RAID Battery Maintenance Procedure 7-93
New OneCommand Utility Oracle Exadata Deployment Assistant 7-94
Summary 7-96

A Utility Scripts and Tools


Utility Scripts and Tools A-2

Maintenance and FRU Replacement

Objectives

After completing this lesson, you should be able to:


• Describe maintenance for the Oracle Exadata Database Machine
• Perform removal and replacement of components
• Load software after component removal and replacement


PTR/INT Oracle Exadata Database Machine Install and Maintenance 5 - 2


Relevance of Maintenance and FRU Replacement

Discussion: The following questions are relevant to understanding the Oracle Exadata Database Machine:


• What general preventative maintenance is needed on the
Oracle Exadata Database Machine?
• Where can you obtain information about component and
system failure?
• How can you revert to a factory configuration?

PTR/INT Oracle Exadata Database Machine Install and Maintenance 5 - 3


Additional Resources for Maintenance and FRU Replacement

The following references provide additional information about the topics that are described in this lesson:
• Oracle Exadata Database Machine Owner’s Guide
– http://wd0338.oracle.com/archive/cd_ns/E13877_01/welcome.html
• Creating a Platinum SR
– http://deskmanual.oraclecorp.com:7777/html/INEXA040.htm
• Working a Platinum SR
– http://deskmanual.oraclecorp.com:7777/html/PREXA022.htm

The following MOS article provides additional information about the topics described in this lesson:
• Configuring SSH on Cisco Catalyst 4948 Ethernet Switch [ID 1415044.1]

See the following references in http://exadata.us.oracle.com
“Technical Differences between X3 and X2” PowerPoint and video files.

PTR/INT Oracle Exadata Database Machine Install and Maintenance 5 - 4


Support Maintenance

• When the system is not working properly, you must troubleshoot to find out what has failed. It could be a hardware issue or a software issue.
• If a component fails, you need to check the associated LEDs for that component and, if necessary, the log files. An amber LED means that the component needs your attention and an error message should be in the log files.

Other LEDs on system components give additional status readings. Check the user’s guides of the various components for detailed information.
Each component of the Oracle Database Machine has a set of LEDs that indicate the location
and operational status. LEDs have set color schemes and an icon associated with them.
When the Oracle Database Machine is first powered on, all the component LEDs are
subjected to an LED test. After the components boot, the LED status should show the state of
that component.

PTR/INT Oracle Exadata Database Machine Install and Maintenance 5 - 5


Support Case Handling

1. The customer files a support ticket.
– The support ticket ends up in the Exadata Engineering Support Team (EEST), no matter how the customer files the ticket or which system the customer uses.
2. The engineers in the EEST group analyze the problem and do some diagnosis.
– If they determine it is a hardware problem, they gather the necessary information and engage the TSC (HW support engineer).
3. The Technical Support Consultant (TSC) analyzes the problem.
– If it turns out to be a bad piece of hardware that needs to be replaced, a field engineer is dispatched to do the replacement.
4. The Field Engineer (FE) performs the replacement and updates the TSC.
5. The TSC contacts the EEST engineer to give the update.
6. The EEST engineer continues with the next steps.
– In this case, the engineer explains the rest of the steps to the customer.

Important: This process should appear seamless to the customer. The FE should stay to ensure that the system is operational, not to perform the required software commands but to help the customer resolve any software concerns or another hardware concern with Oracle assistance.

PTR/INT Oracle Exadata Database Machine Install and Maintenance 5 - 6


Exadata Relationship-Based Support

Single Point of Entry and Ownership for Customer
– EST triages and works issues.
– Engages hardware and software experts as required
Active Collaboration: All Parties Informed and Engaged
– EST coordinates with ACS, CSM and CIMs.
Escalation Management
– CIMs and GCA help with escalated accounts.

PTR/INT Oracle Exadata Database Machine Install and Maintenance 5 - 7


Oracle Platinum Support for Engineered Systems

In addition to Oracle Premier Support for Software or Premier Support for Systems deliverables*, Platinum Support includes:
• Remote quarterly patch services
• Remote monitoring
• Dedicated remote support response team
• Dedicated escalation manager and hotline
• Enhanced response times

* All customer requirements must be met to receive Oracle Premier Support for Engineered Systems.

PTR/INT Oracle Exadata Database Machine Install and Maintenance 5 - 8


Platinum Support

Detailed features

Features and Platinum commitments:
• Remote Fault Monitoring
– 15-minute fault notification
– 15-minute service request generation for validated faults
– Total 30 minutes maximum from fault to service request generation
• Severity 1 Remote Response
– 30-minute remote response from service request generation, 24/7
• Severity 2 Remote Response
– 2-hour remote response from service request generation, 24/7
• Severity 1 Onsite Hardware Response (< 25 miles)
– 2 hours, 24/7 from completion of remote diagnosis
• Severity 2 Onsite Hardware Response (< 25 miles)
– 4 hours, 8/7 from completion of remote diagnosis
• Senior Support Engineers
– Remote response team
• Escalation Hotline and Escalation Managers
– Escalation hotline and escalation managers – dedicated to Exadata
• Patching
– Oracle will remotely patch systems 4 times per year

PTR/INT Oracle Exadata Database Machine Install and Maintenance 5 - 9
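
The Platinum timing commitments listed on the preceding page add up in sequence: fault notification, then service request (SR) generation, then remote response measured from SR generation. The following is a minimal sketch of that arithmetic in Python. The function name `platinum_deadlines` and the sample fault time are hypothetical and exist only for illustration; the durations are the ones stated in the slide. Onsite response is left out because it is measured from completion of remote diagnosis, which has no fixed duration.

```python
from datetime import datetime, timedelta

# Durations taken from the Platinum Support slide.
FAULT_NOTIFICATION = timedelta(minutes=15)
SR_GENERATION = timedelta(minutes=15)
REMOTE_RESPONSE = {1: timedelta(minutes=30), 2: timedelta(hours=2)}  # keyed by severity


def platinum_deadlines(fault_time, severity):
    """Return (latest SR creation time, latest remote response time).

    Hypothetical helper: assumes both 15-minute windows are fully used,
    which is the worst case the slide allows (30 minutes total).
    """
    sr_deadline = fault_time + FAULT_NOTIFICATION + SR_GENERATION
    response_deadline = sr_deadline + REMOTE_RESPONSE[severity]
    return sr_deadline, response_deadline


fault = datetime(2013, 4, 1, 9, 0)  # arbitrary example timestamp
sr_by, respond_by = platinum_deadlines(fault, severity=1)
print("SR generated by:", sr_by.strftime("%H:%M"))
print("Remote response by:", respond_by.strftime("%H:%M"))
```

For a Severity 1 fault, the worst case under these assumptions is one hour from fault to first remote response.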


Platinum Versus Premier Comparison

• Remote Fault Monitoring
– Platinum Systems: 15-minute fault notification; 15-minute service request generation for validated faults; total 30 minutes maximum from fault to service request generation (ACS)
– Premier Systems: Advanced Monitoring and Resolution is an optional service available from ACS
• Severity 1 Phone Response
– Platinum Systems: 30 minutes, 24x7 (TSC must meet for ASR-initiated SRs)
– Premier Systems: 1 hour, 24x7
• Severity 2 Phone Response
– Platinum Systems: 2 hours, 24x7 (TSC must meet for ASR-initiated SRs)
– Premier Systems: No response commitment
• Severity 1 Hardware Onsite Response (< 25 miles)
– Platinum Systems: 2 hours, 24x7 from issue diagnosis (Field)
– Premier Systems: 2 hours, 24x7, no specific start time
• Severity 2 Hardware Onsite Response (< 25 miles)
– Platinum Systems: 4 hours, 8x7 from issue diagnosis (Field)
– Premier Systems: 4 hours, 8x5, no specific start time
• Support Engineers
– Platinum Systems: Dedicated senior level engineers (EEST)
– Premier Systems: Premier support queue
• Escalation Hotline and Escalation Managers
– Platinum Systems: Dedicated hotline and dedicated escalation managers (CIM)
– Premier Systems: Premier support escalation
• Patching
– Platinum Systems: Oracle will remotely patch systems quarterly (ACS)
– Premier Systems: Customer responsibility
• Fault Monitoring
– Platinum Systems: Oracle will remotely monitor for system faults and alert customers (ACS)
– Premier Systems: Customer responsibility
• Diagnostic Data Collection
– Platinum Systems: Oracle will remotely collect diagnostic information related to faults (ACS)
– Premier Systems: Customer responsibility
• Support Portal
– Platinum Systems: My Oracle Support with advanced monitoring portal
– Premier Systems: My Oracle Support

PTR/INT Oracle Exadata Database Machine Install and Maintenance 5 - 10


Platinum Service Platform

Monitoring Gateway
• Integrates alerts from Oracle Enterprise Manager & Auto Service Request:
– Database (11g), Oracle RAC, ASM, OS, Storage, IB, HW
• Performs event enrichment
– Filtering, de-duplication, sets severity levels, embeds knowledge article IDs
• Remote access to:
– Validate Severity 1 & 2 Fault Events
– Apply Quarterly Patches

Oracle Advanced Support Gateway Server X3-2
• It is released as a fixed configuration X3-2 server for Platinum Support.
• Platinum-approved products are Exadata (except V2), Exalogic, and SuperCluster.
• See PPM #140469 for more details.

PTR/INT Oracle Exadata Database Machine Install and Maintenance 5 - 11


Oracle Advanced Support Gateway X3-2

Monitoring Gateway
• Integrates alerts from Oracle Enterprise Manager & Auto Service Request:
– Database (11g), Oracle RAC, ASM, OS, Storage, IB, HW
• Performs event enrichment
– Filtering, de-duplication, sets severity levels, embeds knowledge article IDs
• Remote access to:
– Validate Severity 1 & 2 Fault Events
– Apply Quarterly Patches

Sun Server X3-2: model family ID: 50933960
• Eight 2.5-inch drive slots disk cage


• Power cord: North America and Asia, 2.5 meters, 5-15P plug, C13 connector, 15 A (for
factory installation)
• Sun Storage 6 Gb SAS PCIe HBA, Internal: 8 port (for factory installation)
• 1 Intel(R) Xeon(R) E5-2609 4-core 2.4 GHz processor (for factory installation)
• Heatsink (for factory installation)
• One 8 GB DDR3-1600 DIMM (for factory installation)
• One 600 GB 10000 rpm 2.5-inch SAS-2 HDD with bracket (for factory installation)
Sun Server X3-2: 1 RU base chassis with motherboard, 2 PSUs, slide rail kit, and cable
management arm, PCIe filler panel (for factory installation)

PTR/INT Oracle Exadata Database Machine Install and Maintenance 5 - 12


Understanding Repair Categories

Category, description, and scenario:
• Hot Swap (HS)
– Description: The repair part is hot swappable and might be replaced without shutting down the host system. Commands might be needed before and after replacement to protect data.
– Scenario: Disks, fans, power supplies
• Infrastructure Repair (IR)
– Description: This refers to the repair of the connectivity component within the Database Machine rack. No downtime of the system is required.
– Scenario: External cables, InfiniBand switch, Ethernet switch, KVM/KMM
• System Down (SD)
– Description: Repair of the part requires the system to be shut down.
– Scenario: System boards, PCIe cards, memory
• Rack Down (RD)
– Description: Repair requires the Oracle Exadata Database Machine rack to be shut down.
– Scenario: Power distribution units

PTR/INT Oracle Exadata Database Machine Install and Maintenance 5 - 13
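
The four repair categories on the preceding page can be read as a lookup from component type to the scope of shutdown a repair needs. The following is a minimal sketch of that lookup in Python. The names `RepairCategory`, `SHUTDOWN_SCOPE`, `SCENARIOS`, and `category_for` are hypothetical and exist only for illustration; the category names and example components come from the slide.

```python
from enum import Enum


class RepairCategory(Enum):
    """Repair categories as named on the slide."""
    HS = "Hot Swap"
    IR = "Infrastructure Repair"
    SD = "System Down"
    RD = "Rack Down"


# What has to be shut down before the repair, per the slide descriptions.
SHUTDOWN_SCOPE = {
    RepairCategory.HS: "none",    # commands may still be needed to protect data
    RepairCategory.IR: "none",
    RepairCategory.SD: "system",
    RepairCategory.RD: "rack",
}

# Example components from the Scenario column.
SCENARIOS = {
    RepairCategory.HS: ["disk", "fan", "power supply"],
    RepairCategory.IR: ["external cable", "InfiniBand switch", "Ethernet switch", "KVM/KMM"],
    RepairCategory.SD: ["system board", "PCIe card", "memory"],
    RepairCategory.RD: ["power distribution unit"],
}


def category_for(component):
    """Return the repair category whose scenario list contains the component."""
    for category, components in SCENARIOS.items():
        if component in components:
            return category
    raise KeyError(f"no repair category recorded for {component!r}")


for part in ("disk", "PCIe card", "power distribution unit"):
    cat = category_for(part)
    print(f"{part}: {cat.value} ({cat.name}), shut down: {SHUTDOWN_SCOPE[cat]}")
```

The scenario lists are examples only. The repair category of a specific part is given in the FRU tables, and it can differ from the generic scenario for that kind of component.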


Product-Specific FRUs and CRUs

FRU Description    Manufacturing Part Number    Repair Category

2.53-GHz Quad-Core Intel Xeon E5540, 8MB, 80W, RoHS:Y    F371-4300    SD
PCIe flash accelerator F20 SAS HBA, (nonpopulated) RoHS:Y    F511-1500    SD
5.5 V 11 F capacitive backup power module, RoHS:Y    F371-4650    SD
600 GB 15 K RPM SAS disk assembly, RoHS:Y    F542-0166    HS
2 TB 7.2 K RPM 3.5-inch SATA disk assembly RoHS:Y    F542-0167    HS
Oracle Exadata Storage Server cable kit with the following:    F560-2937    SD
• 10-pin backplane power cable, RoHS:Y
• 20-pin backplane I2C cable, RoHS:Y
• 6-pin fan power cable, RoHS:Y
• 16-pin fan I2C cable, RoHS:Y
4x QSFP copper QDR InfiniBand cable, 5 m RoHS:Y    F530-4415    HS
Sun Datacenter InfiniBand Switch 36, RoHS:YL    F541-3495    IR
Fan module for Sun Datacenter InfiniBand Switch 36, RoHS:Y    F350-1312    HS
CR2032 3 V battery, RoHS:Y    F371-2210    IR
760-watt power supply, RoHS:Y    F300-2143    HS

Two types of replacement units are available: field replaceable units (FRUs) and customer replaceable units (CRUs). FRUs are installed by trained Oracle field technicians. CRUs are installed by the customer.
For a complete list of FRUs and CRUs, see the product documentation.
• Server’s Service Manual: http://docs.sun.com/source/820-5830-11/index.html
• InfiniBand Switch Service Manual: http://docs.sun.com/source/835-0784-04/toc.html
• Cisco Switch:
http://www.cisco.com/en/US/docs/switches/lan/catalyst4900/4948E/installation/guide/4948E_ins.html
• KVM and KVM Components:
http://pcs.mktg.avocent.com/@@content/manual/590883501c.pdf



Product-Specific FRUs and CRUs

Manufacturing Part Number  Repair Category  FRU Description
F350-1519                  IR               Serial cable kit, RoHS:Y with the following:
                                            • USB to DB 9-pin M serial cable
                                            • DB 9-pin F to DB 9-pin F null modem cable
F371-4784                  IR               Cisco Catalyst 4948 switch, RoHS:Y
F371-4785                  HS               Power supply for Cisco Catalyst 4948 switch, RoHS:Y
F310-0307                  HS               Cooling fan for Cisco Catalyst 4948 switch, RoHS:Y
F371-4778                  IR               Assembly, KMM, with Japanese keyboard and mouse module
F371-4779                  IR               Avocent MPU4032DAC-001 32-port KVM switch, RoHS:Y
F371-4780                  IR               Avocent KMM drawer with United States keyboard, RoHS:Y
F371-4781                  IR               Avocent (DSRIQ-USB) DB 15 M to RJ45/USB KVM adapter, RoHS:Y


X2-8 Product-Specific FRUs and CRUs

Manufacturing Part Number  Repair Category  FRU Description
                           SD               X4800 (8x X7560 CPU, 128x 8GB DDR3, SAS-2 RAID HBA,
                                            8x 300GB 10K RPM 2.5-inch SAS-2 HDD, 4x CX2 QDR IB-HCA,
                                            4x 10GbE Niantic FEM, 2 x PassThru for 4x GigE and
                                            4x 10GbE NEM, 2x 760W PSU)
150-1204                   SD               3V, Lithium Battery, RoHS:Y (for the SP)
150-3993                   HS               CMOD Battery
300-2159                   HS               PSU,AC,A239,F,12V,165A,2KW
371-4579                   HS               ASSY,FAN MODULE,C4/G5
371-4742                   IR               UNIVERSAL RACK MOUNT KIT
371-4860                   SD               CPU,INTEL X7560,2.26G,130W,8CR
375-3605                   HS               Qmirage EM (CX2) Dual port CX2 4xQDR PCI-E EXP_MODULE
375-3648                   HS               10 GigE Fabric Express Module (Niantic FEM)
530-3100                   NA               ADAP, 9P, DSUB, 8POS,RJ45
530-3936                   NA               USB/VGA/RJ45 Dongle Cable Adapter, RoHS:Y


Service Procedures X3-2L Storage Cell

• CRUs
– Air Baffle
– Battery
– DIMMs
– DVD Drive
– Storage Drive
– Fan Module
– PCIe Card
– Power Supply Unit
– SAS Expander Mod


Service Procedures X3-2L Storage Cell

• FRUs
– Processor and heat-sink
– Disk Drive Backplane
– Front LED/USB Indicator Modules
– Motherboard Assembly
– SAS and SATA cables

See the following resource on http://ilearning.oracle.com for more details on field replaceable
units in the Sun Server X3-2 and Sun Server X3-2L.
“Oracle Exadata Database Machine X3 Services Training Update”


Component Replacement Procedures

The steps in the following slides summarize the maintenance
procedures that administrators implement to add and replace
components on a running Oracle Exadata Database Machine.

Caution: These procedures for removing and replacing
components are for training purposes only. You should consult
the latest Oracle Exadata Database Machine Owner’s Guide for
current procedures and more details. Additionally, these
procedures should be performed only in collaboration
with an Oracle ACS engineer.

Caution: Because of the complexity of the Oracle Exadata Database Machine, only trained
field personnel should execute, debug, or follow replacement procedures.


Storage Cell Disk Replacement

Removing and Replacing a Disk Drive from a Storage Cell

Identify the storage cell with the failed hard drive. Be sure to
note the number of the drive that has failed. Perform the
following steps to map an affected LUN to a physical disk:
1. From CellCLI, list the current LUNs and identify the
affected LUN:
CellCLI> list lun
0_0 0_0 normal
<snip>
0_5 0_5 normal
0_6 0_6 critical
0_7 0_7 normal
0_8 0_8 normal
0_9 0_9 normal
0_10 0_10 normal
0_11 0_11 normal
Note: When performing a disk replacement, ensure that only new,
unused disks are used.

When a drive completely fails (spins down, red-lights), an Exadata alert is generated with
specific instructions for replacement. If you have configured alert notifications, you will be
alerted of this via email. You can also see this alert with the CellCLI list alerthistory
command. The steps that follow a disk replacement are fully automated.
Some things to keep in mind with respect to disk replacement:
If the failed disk had more than one grid disk on it, you might encounter bug 9237258, which
causes the celldisk to get created as interleaved. If this occurs, perform the following steps:
1. Ensure that the disk is dropped from the ASM disk group.
2. Drop the celldisk.
3. Re-create the celldisk.
4. Re-create the grid disks with proper order and size.


List the affected LUN in detail to obtain the physical disk name:
CellCLI> list lun 0_6 detail
name: 0_6
cellDisk: CD_06_trnbcel05
deviceName: /dev/sdg
diskType: HardDisk
errorCount: 0
id: 0_6
isSystemLun: FALSE
lunAutoCreate: FALSE
lunSize: 558.4059999994934G
lunUID: 0_6
physicalDrives: 20:6
raidLevel: 0
status: critical
List the affected physical disk in detail:
CellCLI> list physicaldisk 20:6 detail
name: 20:6
deviceId: 14
enclosureDeviceId: 20
errMediaCount: 0
errOtherCount: 0
errorCount: 0
foreignState: false
id: 0932E01C22
luns: 0_6
makeModel: "SEAGATE ST360057SSUN600G"
physicalFirmware: 0605
physicalInsertTime: 2009-10-10T09:18:30-06:00
physicalInterface: sas
physicalSerial: 0932E01C22
physicalSize: 558.9109999993816G
slotNumber: 6
status: critical
Physical Disk nomenclature: 20:6
• 20 stands for HBA Enclosure ID
• 6 stands for slot
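
The LUN-to-disk mapping above can also be done on captured CellCLI output. The following Python sketch is an illustration only: the function names are hypothetical and not part of any Oracle tool, and it assumes the output has the same layout as the listings shown in this lesson.

```python
import re

def find_unhealthy_luns(list_lun_output):
    """Return LUN names whose status column is anything other than 'normal'."""
    unhealthy = []
    for line in list_lun_output.splitlines():
        fields = line.split()
        # Data rows look like "0_6 0_6 critical"; prompts and <snip> lines are skipped.
        if len(fields) == 3 and re.fullmatch(r"\d+_\d+", fields[0]):
            if fields[2] != "normal":
                unhealthy.append(fields[0])
    return unhealthy

def parse_detail(detail_output):
    """Turn the 'key: value' lines of a 'detail' listing into a dictionary."""
    attrs = {}
    for line in detail_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            attrs[key.strip()] = value.strip()
    return attrs

def split_physical_drive(name):
    """Split a physical disk name such as '20:6' into (enclosure_id, slot)."""
    enclosure, slot = name.split(":")
    return int(enclosure), int(slot)
```

For the example in this lesson, find_unhealthy_luns reports LUN 0_6, parse_detail yields physicalDrives 20:6 for that LUN, and split_physical_drive resolves that name to enclosure 20, slot 6.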



Storage Cell Disk Replacement

2. Press the release button to the right of the failed drive.
3. When the drive tray opens, remove the drive from the
storage cell by pulling gently on the drive tray handle.
4. Insert the new drive tray in the empty drive slot in the
storage cell by sliding in the drive tray and pushing in the
drive tray handle until it locks.
5. Verify that the new drive is detected.

In rare cases, the automatic firmware update may not work and the LUN will not be
automatically rebuilt.
This can be confirmed by checking the ms-odl.trc file.
If the disk was a system disk (a disk with a copy of the operating system on it), consider the
following:
The md sync status can be obtained with mdadm.
An example command is: mdadm --detail --scan /dev/md7 /dev/md5
In rare cases, the grub boot loader may not get properly installed on the newly replaced
system disk. If this occurs, it can be manually installed by doing the following:
Note: This may have to be done when booted off the cell boot USB depending on the severity
of the problem.


Create a device mapping file that shows the relationship between the hard disks and the
devices. An example is:
(hd0) /dev/cciss/c0d0
(hd1) /dev/cciss/c0d1
Install the grub boot loader interactively via the following command: (This example installs it
on hd0.)
grub --device-map=<device map file> << EOF
root (hd0,0)
setup (hd0)
EOF
Tip: Double-check the BIOS boot order and ensure that the cell boot USB is first. This should
not have been changed by the disk failure, but it is good to double-check it because ensuring
that the cell boot USB is first also ensures that you can boot off the surviving disk when a
system disk fails.

Repair a Corrupt File System Superblock

Follow the instructions below to repair a corrupt file system superblock. This example is
from a corrupt superblock on a database server. This is only an example. Do not perform
these steps on the lab equipment.
Use the diagnostics ISO to get the system booted. Now try to see if the superblock can be
fixed.
1. If fixed successfully, edit /etc/fstab and reboot.
2. Run the command: e2fsck -v -p -f /dev/cciss/c0d0p1.
3. If successful, remove the ISO and reboot.
If the superblock is not fixed or the system still fails to boot, then you may still get the system
up by using backup superblocks. You may need to try different copies of backup superblocks
if some of the backups are also corrupt.
1. The list can be obtained by "dumpe2fs /dev/cciss/c0d0p5 | grep super" from the ISO.
2. Run the command: mount -t ext3 -o sb=8193 /dev/cciss/c0d0p1
/mnt/cell.
3. Now edit /mnt/cell/etc/fstab to make the system use a different superblock.
4. The new mount line will be /dev/cciss/c0d0p1 / ext3 sb=8193,defaults 1
1.
5. Detach the ISO.
6. Reboot.
In general, identify the bad disk if there is one and replace it as soon as you can. Most of the
time this is not a media or bad disk issue, but a sudden power loss or reboot where the data
on the superblock gets polluted or damaged.
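
When several backup superblocks have to be tried, it can help to extract the candidates from saved dumpe2fs output and convert each one to the value that the mount sb= option expects. The sketch below is a minimal illustration with hypothetical function names. It assumes the usual dumpe2fs line format ("Backup superblock at N, ...") and relies on the behavior documented in mount(8) that sb= is counted in 1 KB units, whereas dumpe2fs reports file system blocks.

```python
import re

def backup_superblocks(dumpe2fs_output):
    """Collect the block numbers reported on 'Backup superblock at N' lines."""
    blocks = []
    for line in dumpe2fs_output.splitlines():
        match = re.search(r"Backup superblock at (\d+)", line)
        if match:
            blocks.append(int(match.group(1)))
    return blocks

def mount_sb_value(fs_block, block_size_bytes):
    """Convert a file system block number to the 1 KB units used by 'mount -o sb='."""
    return fs_block * (block_size_bytes // 1024)
```

With a 1 KB block size, block 8193 maps to sb=8193, which matches the example above. On a file system with 4 KB blocks, block 32768 would map to sb=131072.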



F20 Flash Card/DOM Replacement
Use CellCLI to determine the PCI slot and DOM for flash card replacement.
The CellCLI command list physicaldisk detail will give you the PCI slot which is
needed if replacing a flash card. The back of the server details the slots and the card itself
has the DOMs labeled.
Here is a sample process to follow if a write performance issue is suspected on a flash card:
Go to the cell that you suspect has the flash drive write performance problem.
Drop everything on the flash using:
cellcli -e drop flashcache all
If you have griddisks on flash, you need to drop them as well:
cellcli -e drop griddisk all flashdisk
cellcli -e drop celldisk all flashdisk
Then run the following command:
lsscsi | grep MARV | awk '{print $NF}' | awk '{printf "time dd if=/dev/zero of=%s bs=1048576 count=20\n", $1}' | sh -x
This writes 20 1M blocks to the flash devices. Based on the elapsed time for each dd
command, you can tell which flash card is bad. If you are physically near the machine, you
should also be able to see the amber light on the flash card.
Assume that /dev/sdy is bad. How should you identify the actual PCI slot that contains the
bad flash card?
cellcli -e list lun detail
<snip>
name: 5_3
cellDisk: FD_15_sgsas1
deviceName: /dev/sdy <= device you are interested in
diskType: FlashDisk
id: 5_3
isSystemLun: FALSE
lunAutoCreate: FALSE
lunSize: 22.8880615234375G
overProvisioning: 100.0
physicalDrives: [11:0:3:0] <= physical disk
status: normal
<snip>
Now, look at the physical drive it is on using:
cellcli -e list physicaldisk detail
<snip>
name: [11:0:3:0] <= physical disk from above
diskType: FlashDisk
id: 508002000092d00FMOD3
luns: 5_3
makeModel: "MARVELL SD88SA02"
physicalFirmware: D20R
physicalInsertTime: 2009-11-20T00:13:19-08:00
physicalInterface: sas
physicalSerial: 508002000092d00FMOD3
physicalSize: 22.8880615234375G
slotNumber: "PCI Slot: 5; FDOM: 3" <= PCI slot number and FDOM number
status: normal
<snip>
The PCI slot number is marked on the chassis and the FDOM number is marked on the Aura
card.
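
The lookup from a device name to a PCI slot and FDOM can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the records are assumed to be dictionaries already built from the "list lun detail" and "list physicaldisk detail" output, with the annotation text removed.

```python
import re

def parse_flash_slot(slot_number):
    """Extract (pci_slot, fdom) from a value such as '"PCI Slot: 5; FDOM: 3"'."""
    match = re.search(r"PCI Slot:\s*(\d+);\s*FDOM:\s*(\d+)", slot_number)
    if match is None:
        raise ValueError("unrecognized slotNumber value: %r" % slot_number)
    return int(match.group(1)), int(match.group(2))

def locate_flash_device(device_name, luns, physical_disks):
    """Follow deviceName -> physicalDrives -> slotNumber for one flash device."""
    for lun in luns:
        if lun.get("deviceName") != device_name:
            continue
        for disk in physical_disks:
            if disk.get("name") == lun.get("physicalDrives"):
                return parse_flash_slot(disk["slotNumber"])
    return None  # device not found in the supplied records
```

For the example above, /dev/sdy resolves through physical disk [11:0:3:0] to PCI slot 5, FDOM 3.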


IB-HCA Card Installation

To install the Adapter, refer to your system installation or
service manual for detailed instructions on the following steps:
1. Power off your server by using the standard shutdown
procedures that are described in your system service
manual.
2. Remove the cover from the system to access the card
slots and connectors.
3. Select PCIe x8 slot 2 for x4170 or slot 3 for x427X and
remove the blank front panel for that slot.
Or, if you are replacing an existing card in
that slot, remove the card.


IB-HCA Card Installation

4. Install the IB-HCA card into the slot, pushing the card’s
edge connector into the connector on the chassis. Ensure
that the front plate on the IB-HCA card mounts flush with
the chassis panel opening.
5. If applicable, install the screw in the front plate to secure
the IB-HCA card into the chassis.
6. Attach the 4x end of each InfiniBand I/O cable to an
IB-HCA port connector. Ensure that the connectors are
properly engaged.
Caution: Avoid putting unnecessary stress on
the connection. Do not bend or twist the cable
near the connectors and avoid cable bends
of more than 90 degrees.


IB-HCA Card Installation

7. Replace the cover on the unit.
8. If it is not already connected, connect the 12x end of the
InfiniBand I/O cables to the appropriate ports on the switch
or switches. The IB-HCA ports can be connected to
different ports on the same switch or to a port on different
switches.
9. Turn on power to the system and allow the server to
reboot.
This step completes the hardware
installation. Proceed to the next slide.


IB-HCA Card Installation

Verifying the Installation

The InfiniBand switch should automatically recognize the
IB-HCA card when it is connected to the fabric if the IB Subnet
Manager is running on the switch or on a host within the subnet.
1. Ensure that the cables are connected to the adapter and
switches.
2. Run /opt/oracle.SupportTools/CheckHWnFWProfile.sh
to check the InfiniBand firmware. If you receive an error,
upgrade the firmware. This command only checks the firmware;
reflashing occurs automatically on boot.
3. Verify that the IB Subnet Manager is
running on the IB switch or on a host
within the subnet.


IB-HCA Card Installation

4. Check that the green LED is illuminated for each
connected port. If the green LED is not on, check the cable
connections at the adapter and at the switch.
5. Check that the amber LED is illuminated for each
connected port.
6. Verify that the IB-HCA ports are functional and the driver is
attached: # dmesg | grep mlx4
The output shows system diagnostic messages that have
the string mlx4 in the message (the name
of the Linux driver). Included in the output
is a message that indicates whether the
port is up or down.


InfiniBand Switch Maintenance

To service the IB switch, refer to your system installation or
service manual for detailed instructions.

Replaceable components of an InfiniBand switch:
1. Battery
2. Fan
3. Power Supply
To remove and replace an entire InfiniBand switch:
1. Identify the failed switch.
2. Disconnect the power and network cables from the failed switch.
3. Remove the failed switch from the rack.
4. Install the replacement switch in the rack.
5. Connect the power to the replacement switch.
6. Check the software on the replacement switch and confirm that it is the correct version.
7. Configure the replacement switch; restore from backup if possible.
8. Connect the network cables to the replacement switch.
9. Verify the status of the replacement switch.



Backing Up Settings on the IB Switch

The procedure in the notes below describes how to back up and restore a
switch with 1.1.3-2 or higher firmware. The backup needs to be
done only once after the switch is initially configured with the
right settings.

To back up and restore a switch with 1.1.3-2 or higher firmware:
1. Navigate to the switch ILOM URL or IP address in a browser (for example,
http://10.7.4.227).
2. Log in as the ilom-admin user. The password is welcome1.
Note: The default password is ilom-admin.
3. Click the Maintenance tab.
4. Click the Backup/Restore tab.
5. Select the Backup operation and Browser method.
6. Enter a pass phrase. This is used to encrypt sensitive information, such as user
passwords, in the backup.
7. Click Run and save the resulting XML file in a secure location.
8. Log in to the Sun Datacenter InfiniBand Switch 36 as the root user.
9. Use the scp command to copy the /etc/opensm/opensm.conf and
/conf/partitions.current files and save them with the backup XML file. This is
necessary because the backup does not save the Subnet Manager configuration or the
IB partition configuration, which these two files hold respectively.
10. Save the output from the version command.



Replacing an InfiniBand Switch

The procedure in the notes below describes how to replace a
failed InfiniBand Switch.

Note: This procedure can be performed on a live system.
However, physical limitations restrict the replacement of the
InfiniBand switch at rack unit 1 because the power cables run
right behind this switch.

1. Disconnect the cables from the switch. All InfiniBand cables should have labels at both
ends, indicating their locations. If there are any cables that do not have labels, label
them.
Note: The following procedure can be performed on a live system. However, physical
limitations restrict the replacement of the InfiniBand switch at rack unit 1 because the
power cables run right behind this switch. To replace the spine switch, all servers need
to be powered off. Plan for the down time accordingly.
2. Remove the switch from the rack.
3. Install the new switch in the rack.
4. Restore the switch settings by using the backup.
5. Disable the Subnet Manager by using the disablesm command.
6. Connect the cables to the new switch. Make sure to connect each cable to the correct
port.



7. Run the following command on any of the hosts with the appropriate argument
(depending on the size of the installation).
verify-topology -t halfrack
8. Run the following command on any host to verify that there are no errors on any of the
links in the fabric:
ibdiagnet -c 1000 -r
9. Enable the Subnet Manager by using the enablesm command.
Note: If the replaced switch was the spine switch, manually fail the Master Subnet
Manager back to the switch by disabling the Subnet Managers on the other switches
until this spine switch becomes the master, and then re-enable the Subnet Manager on
all the other switches.



Restoring Settings on the IB Switch

To restore the settings on a switch with 1.1.3-2 (or later)
firmware:
1. Run the version command and ensure that the switch is
at the right firmware level. If not, upgrade the switch to the
correct firmware level.
2. Navigate to the switch ILOM URL or IP in a browser, as in
this example: http://10.7.4.227.
3. Log in as the ilom-admin user. The password is welcome1.
Note: The default password is ilom-admin.
4. Click the Maintenance tab.
5. Click the Backup/Restore tab.
6. Select the Restore operation and Browser method.
7. Click Browse and then select the XML file that contains the switch configuration backup.
8. Enter the pass phrase that was used during the backup.
9. Click Run to restore the configuration.
10. Log in to the InfiniBand switch as the root user.
11. Use the scp command to copy the /etc/opensm/opensm.conf and /conf files to
the switch. These were the files that were created during the backup.
12. Restart openSM from the switch CLI by using the following commands:
- disablesm
- enablesm



Verifying the InfiniBand Network Operation

You should verify that the InfiniBand network is operating
properly in the following situations:
• If hardware maintenance has taken place with any
component in the InfiniBand network, including replacing
an InfiniBand HCA on:
– A server
– An InfiniBand switch
– An InfiniBand cable
• If operation of the InfiniBand network is suspected to be
substandard
Note: The procedure in the notes below describes how to verify
network operation. It can be used any time the InfiniBand
network is performing below expectations.

1. Run the ibdiagnet command to verify InfiniBand network quality:
# ibdiagnet -c 1000


All errors reported by this command should be investigated. This command generates a
small amount of network traffic and may be run while normal workload is running.
2. Run the ibqueryerrors.pl command to report on switch port error counters and port
configuration information:
# ibqueryerrors.pl -rR -s LinkDowned,RcvSwRelayErrors,XmtDiscards
Errors such as LinkDowned, RcvSwRelayErrors, and XmtDiscards are ignored when
using the preceding command.
3. If there is no load running on any portion of the InfiniBand network (for example, if there
are no databases running), run the infinicheck command to perform full InfiniBand
network configuration, connectivity, and performance evaluation.
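
The order of the three checks, and the condition on the third, can be captured in a small helper. The Python sketch below is illustrative only. The function name and the fabric_idle flag are assumptions, and the command lines are the ones quoted in the steps above. Whether the fabric is idle must be decided by the administrator.

```python
def verification_commands(fabric_idle,
                          suppressed=("LinkDowned", "RcvSwRelayErrors", "XmtDiscards")):
    """Return the verification commands, in order, as argument lists."""
    commands = [
        # Step 1: light-weight check that may run alongside normal workload.
        ["ibdiagnet", "-c", "1000"],
        # Step 2: port error counters, with the listed counters suppressed.
        ["ibqueryerrors.pl", "-rR", "-s", ",".join(suppressed)],
    ]
    if fabric_idle:
        # Step 3: full evaluation, only when no load is running on the fabric.
        commands.append(["infinicheck"])
    return commands
```

Each argument list could be passed to subprocess.run on a database server. Keeping infinicheck behind an explicit flag reflects the instruction to run it only when no databases are running.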



New LSI RAID Battery Maintenance Procedure

Activity: See videos on how to retrofit, remove, and replace
the Sun LSI BBU08 Hot-Swap Battery:
• Available in the eKit
• One for X3-2
• One for X3-2L


LSI HBA Batteries

Battery Backup Unit (BBU)

An LSI SAS2 6 Gbps host bus adapter (HBA) is used in all Exadata Database Machine
servers based on Sun-Oracle hardware to control and interface with the disk drives. It contains
512 MB of low-voltage DDR2 memory that it uses to cache data writes in order to improve the
performance of disk write operations. It also contains a Battery Backup Unit (BBU), which is
designed to supply regulated battery power to the cache memory during a main system power
outage, long enough for the main system power to be brought back online. For
Exadata, the specified hold-up time is 48 hours, which requires a usable charge capacity of
674 mAh for low-voltage DDR2 memory.
The BBU is a single-cell Li-ion battery pack. As with all rechargeable Li-ion batteries, charge is
supplied via a chemical reaction, and the battery pack's ability to hold charge degrades over
time. The BBU (also referred to as iBBU or Intelligent BBU) contains a small integrated circuit
board with a “smart” gas gauge, accessible through an I2C bus, which permits the RAID On a
Chip controller to monitor the actual battery capacity to ensure that caching is not permitted if
the capacity falls below the minimum necessary threshold. The BBU board also contains the
charge circuitry. It is designed to be removable and replaceable as a Field Replaceable Unit
(FRU) with a single mating connector that interfaces the BBU board to the HBA, and three
screws mounted under the HBA that physically retain it to the HBA.
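
As a rough check on the hold-up figures, the average current that the cache memory may draw follows from dividing the usable capacity by the hold-up time. The Python sketch below assumes a constant draw and ignores regulator losses and battery aging. The function name is hypothetical.

```python
def average_holdup_current_ma(capacity_mah, holdup_hours):
    """Average current (mA) that a given capacity can sustain for the hold-up time."""
    return capacity_mah / holdup_hours

# 674 mAh spread over 48 hours allows roughly 14 mA on average.
print(round(average_holdup_current_ma(674, 48), 1))
```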



Write-Back Versus Write-Through Mode

When the BBU is present and operating normally, the virtual disks are placed in Write-Back (WB)
mode, which uses the cache memory to store writes that are written back to the disk physically in
a block to optimize data placement and speed on disk. After the write is in the cache memory, it
is acknowledged back to the OS as I/O completed, which improves OS-level performance. If power is
lost, the BBU ensures that the cache is maintained and written back to disk.

When the BBU is not present or cannot currently be used, the HBA reverts the virtual disks to
Write-Through (WT) mode. This mode bypasses the cache memory, writes all I/Os through to the disk
physically, and waits for the disk to acknowledge that the write is physically complete. This
results in slower performance but guarantees that all writes are completed to nonvolatile disk
storage, so there is no risk of data loss if power is lost.
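The mode selection described above can be summarized in a few lines of Python. This is only an illustrative model, not the HBA firmware logic: `BbuState` and `cache_mode` are hypothetical names, and the `usable` flag stands in for whatever health and charge checks the controller actually performs.

```python
from dataclasses import dataclass


@dataclass
class BbuState:
    """Hypothetical summary of the BBU as the HBA sees it."""
    present: bool  # a BBU is attached to the HBA
    usable: bool   # assumed flag: healthy and charged enough to protect the cache


def cache_mode(bbu: BbuState) -> str:
    """Return "WB" only when the BBU can protect cached writes, otherwise "WT"."""
    return "WB" if bbu.present and bbu.usable else "WT"


print(cache_mode(BbuState(present=True, usable=True)))
print(cache_mode(BbuState(present=True, usable=False)))
```

The point of the sketch is that Write-Back is the exception that has to be earned: any doubt about the battery falls back to Write-Through.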



LSI HBA Batteries

Two types of BBUs are shipped with Exadata Database Machine based on Sun-Oracle hardware:
• BBU07: V2s based on x4170 and x4275 servers shipped with model BBU07 (iBBU in firmware),
part number 371-4746, and HBA model 375-3644, which contains the B2 ASIC version of the LSI
controller. The expected service life is two years, but various factors may reduce that.
• BBU08: X2-2s and X2-8s based on x4170m2, x4800, and x4270m2 servers shipped with model
iBBU08 (iBBU08 in firmware), part number 371-4982, and HBA model 375-3701 (x4170m2/x4270m2),
which contains the B4 ASIC version, or 375-3647 (x4800), which contains the B2 (-02) or B4 (-03)
ASIC version. The expected service life is three years, but various factors may reduce that.

The difference between the types is primarily that the BBU08 has a more refined charge
control capability and is more tolerant of temperature variations. The BBU08 also has a
significantly reduced learn cycle time.


LSI HBA Batteries

Mixes that are supported as FRU replacements:

Liberator ASIC    BBU-07    BBU-08
B2                Yes       Yes*
B4                N/A       Yes

The revision of the ASIC can be determined from Linux as follows:
# lspci | grep RAID
13:00.0 RAID bus controller: LSI Logic / Symbios Logic LSI MegaSAS 9260 (rev 03)
#
Rev 03 is a B2 ASIC.
# lspci | grep RAID
13:00.0 RAID bus controller: LSI Logic / Symbios Logic LSI MegaSAS 9260 (rev 05)
#
Rev 05 is a B4 ASIC.
Exadata systems running Solaris are supported only on models that shipped with the B4 ASIC.

It is critical to identify which model card is in use for replacements in order to determine
compatibility.
* BBU08 use on the older B2 HBA requires a firmware update to FW Package Build 12.9.0-0037 or
later. This is shipped as part of Exadata image 11.2.2.1.1 or later. Older installed systems
may require an update prior to replacement. The connector and screw mounts are identical, so
the two BBU types are physically compatible.
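The revision check and the support matrix above can be combined into a small helper. The sketch below is an illustration only: the function names are hypothetical, the revision mapping covers just the two revisions listed in this guide, and the B2 plus BBU08 combination still depends on the firmware level noted in the footnote.

```python
import re
from typing import Optional

# Revision-to-ASIC mapping and support matrix as listed in this guide.
ASIC_BY_REV = {"03": "B2", "05": "B4"}
SUPPORTED_MIXES = {("B2", "BBU07"), ("B2", "BBU08"), ("B4", "BBU08")}

SAMPLE_LINE = (
    "13:00.0 RAID bus controller: LSI Logic / Symbios Logic LSI "
    "MegaSAS 9260 (rev 05)"
)


def asic_from_lspci(line: str) -> Optional[str]:
    """Extract the ASIC version from one `lspci | grep RAID` output line."""
    match = re.search(r"\(rev ([0-9a-fA-F]{2})\)", line)
    if match is None:
        return None
    # Unknown revisions return None rather than a guess.
    return ASIC_BY_REV.get(match.group(1).lower())


def bbu_supported(asic: str, bbu: str) -> bool:
    """True when the table lists this ASIC/BBU mix as a supported replacement.

    B2 with BBU08 additionally requires the firmware level in the footnote.
    """
    return (asic, bbu) in SUPPORTED_MIXES


print(asic_from_lspci(SAMPLE_LINE))
```

Returning `None` for an unrecognized revision keeps the helper from implying compatibility for hardware this guide does not describe.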



Battery Monitoring via Learn Cycles

Learn cycles are performed periodically to fully discharge the battery and recharge it. When
discharge is complete, the BBU determines the new capacity of charge the battery can hold.
Failure to perform learn cycles at the recommended intervals may reduce the usable life of the
battery by reducing the full charge capacity more rapidly, leading to premature end of service
life. The capacity is reported by the Full Charge Capacity field in the MegaCli BBU output and is
updated after a learn cycle. Refer to the next section for an example.
When a learn cycle is initiated, the charging circuit automatically places any virtual drives
that are in WB mode into WT mode for the duration of the cycle, which temporarily reduces write
performance. When the learn cycle completes, the virtual drives are automatically transitioned
back to WB mode if the battery is still capable of holding the required charge amount. Learn
cycle time varies with the BBU type. For BBU07, the complete learn cycle process and the time
the cache spends in WT mode are expected to be 6 to 8 hours. For BBU08, they are expected to be
2 to 3 hours.

Note: When a new BBU is installed into a system, it will have a depleted charge state. Any
virtual drives attached will be forced into WT cache mode while a full learn cycle is performed.
Usually, a sufficient charge to maintain the cache is reached after this cycle is complete. This
may take 24 hours or longer. Status will show the following:
# /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 | grep "Charging"
Charging Status: Charging
Learn cycles on Exadata are configured by default as follows:
On Storage Cells with image 11.2.1.2.x, the learn cycle occurs monthly, based on when the
system was first powered on.
On Storage Cells with image 11.2.1.3.1 or later, the learn cycle is manually scheduled quarterly
to start at 2 AM on January 17, April 17, July 17, and October 17. The time is chosen to minimize
impact on daytime operations.
Database nodes are set for automatic scheduling, which occurs every 30 days from first power
on. This may lead to variability in the time of day based on when the node was powered on.
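As a planning aid, the default quarterly schedule for Storage Cells with image 11.2.1.3.1 or later can be computed ahead of time. The following is a minimal sketch under stated assumptions: `next_default_learn_cycle` is a hypothetical helper, it works in the cell's local time with naive datetimes, and it ignores any one-time override of the learn cycle time.

```python
from datetime import datetime

# Default schedule from this guide: 2 AM on the 17th of these months.
QUARTER_MONTHS = (1, 4, 7, 10)


def next_default_learn_cycle(now: datetime) -> datetime:
    """Return the first default learn cycle start strictly after `now`."""
    for year in (now.year, now.year + 1):
        for month in QUARTER_MONTHS:
            candidate = datetime(year, month, 17, 2, 0)
            if candidate > now:
                return candidate
    # January 17 of the following year is always later than `now`.
    raise AssertionError("unreachable")


print(next_default_learn_cycle(datetime(2011, 1, 22, 9, 30)))
```

Knowing the next start time makes it easier to anticipate the window in which write performance is temporarily reduced.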



To change the start time on Storage Cells for when the learn cycle occurs, use a command
similar to the following. The time reverts to the default learn cycle time after the cycle
completes:
CellCLI> ALTER CELL bbuLearnCycleTime="2011-01-22T02:00:00-08:00"
To see the time for the next learn cycle, use the following command:
CellCLI> LIST CELL ATTRIBUTES bbuLearnCycleTime
Storage cells will put the following in the Cell alerthistory, which may generate a service call:
4 2011-04-17T05:01:06-04:00 info "BBU on disk contoller at adapter 0
is going into a learn cycle. All Logical Volumes on harddisks will
go into WriteThrough caching mode. Write Throughput will be lower."
5_1 2011-04-17T09:46:07-04:00 critical "All Logical drives are in
WriteThrough caching mode. Either battery is in a learn cycle or it
needs to be replaced. Please contact Oracle Support"
5_2 2011-04-17T12:09:28-04:00 clear "Battery is back to a good
state"
If the last message indicating that it is back in a good state does not occur, then this requires
investigation as to why the battery is not good after the learn cycle.
Database nodes currently do not log learn cycles except in the HBA events log.
Additional learn cycles may start occurring more frequently than 30 days if the full charge
capacity gets close to the replacement thresholds and the remaining capacity goes low, which
will initiate a new learn cycle to relearn the full charge capacity. This has been seen to occur
as frequently as daily on a failing BBU.
To check if a battery is in learn-cycle mode, do the following:
# /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 | grep Learn
Learn Cycle Requested : No
Learn Cycle Active : No
Learn Cycle Status : OK
Learn Cycle Timeout : No
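When this check is scripted, the four fields can be parsed into a dictionary. The sketch below is only an illustration: the function names are hypothetical, the sample text is the output shown above, and the value "Yes" for an active cycle is an assumption, because this guide shows only the "No" case.

```python
from typing import Dict

SAMPLE_OUTPUT = """\
Learn Cycle Requested : No
Learn Cycle Active : No
Learn Cycle Status : OK
Learn Cycle Timeout : No
"""


def parse_learn_fields(output: str) -> Dict[str, str]:
    """Collect the "Learn Cycle ..." fields from the grep output shown above."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().startswith("Learn Cycle"):
            fields[key.strip()] = value.strip()
    return fields


def learn_cycle_active(output: str) -> bool:
    """True when the Learn Cycle Active field reads "Yes" (assumed value)."""
    return parse_learn_fields(output).get("Learn Cycle Active", "").lower() == "yes"


print(parse_learn_fields(SAMPLE_OUTPUT))
print(learn_cycle_active(SAMPLE_OUTPUT))
```

Splitting on the first colon and stripping whitespace keeps the parser tolerant of the column padding that the tool adds around the separator.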



New Cable Management Arm

Activity: See video on the new Cable Management Arm replacement procedures.
• Available in the eKit


Exadata X2-8 Database Server

System connection and Field Replaceable Units




X2-8 DB Server Service Processor Cabling

The Service Processor module (SP) provides connections used for system administration. These
include serial and Ethernet cables for ILOM, and serial, video, and USB cables for the host
console.
Connectors are provided on the SP itself and on the multiport cable, which connects to the SP.

Figure Legend
1. Connect an Ethernet cable between the NET MGT port and the network to which future
connections to the SP will be made. NET MGT port 0 is the suggested default.
2. Connect a serial cable between the SER MGT port and a terminal device or a PC. You
might need an adapter. The server comes with a DB9-to-RJ45 serial port adapter.
The SER MGT port provides a direct serial connection to the SP. You can use this to
discover the SP’s IP address, and configure it if necessary. DHCP is the default, but you
can configure it to use a static IP address as well. After you know the SP’s IP address,
you can use a web browser or an SSH connection to communicate with the SP over the
NET MGT port. Alternatively, you can continue to use the serial port to communicate
with the SP command-line interface (CLI).
Refer to the Oracle Integrated Lights Out Manager (ILOM) 3.0 documentation for details.
Connect the multiport cable to the KVM connector. This cable provides connectors for
the serial console, the video console, and USB.
3. Connect power cable to power source.
4. EM slots – PCIe EMs provide different connectors depending on what type is installed.
5. NEM slots – NEMs provide 1 GbE and 10 GbE connectors.


X2-8 DB Server Service Processor Cabling
[Figure: multiport cable connectors. 1: video console, 2: serial console, 3: USB (two connectors)]

Figure Legend
1. Net management ports 0 and 1
2. Serial management
3. Fault LED
4. Power/OK LED
5. Temperature LED
6. Multiport cable connector
7. Locate button/LED


X2-8 DB Server Subassembly Module
Removal and Replacement
The Subassembly module (SAM) resides inside the chassis and contains the midplane on the
internal front-facing side of the SAM and the server components on the back of the SAM.

How to Remove the Subassembly Module
1. Prepare the server for service.
2. Disconnect the AC power cables from the rear of the server.
The AC power connectors are retained by a wire latch. Lift the retaining latch and pull
the connector out of its socket.
3. Disengage the power supplies. See “How to Remove a Power Supply.”
Partially remove the power supplies. This action ensures that the power supplies are
disconnected from the midplane connectors.
4. Label and remove the CMODs and CPU filler modules. CMODs and CPU filler modules
must be returned to their original slots.
5. Disconnect the three hard drive backplane cables from the server midplane.
Note: Do not disconnect the cables from the hard drive backplane.
WARNING! Do not attempt this procedure on the lab equipment. Refer to video captures for
details on the Subassembly replacement procedure.


X2-8 DB Server Subassembly Module
Removal and Replacement
WARNING! Do not attempt this procedure on the lab equipment. Refer to video captures for
details on the Subassembly replacement procedure.


CPU Module Orientation
CPU Module (CMOD) Designations
On the front of the server chassis and within the ILOM interfaces (web and command line),
the CMODs are designated as BL 0–BL 3.


How to Remove a CPU Module

• To unlock the CMOD, squeeze together the green tabs between the lever handles.
• This action produces a click sound and releases the handles.
Caution: For proper airflow and cooling, all CMOD slots must contain either a CMOD or a
filler module. Do not operate the server with unoccupied CMOD slots.
Caution: The CMOD is not a hot-swap component.

The CPU modules (CMODs) are the processing engines for the Sun Fire X4800 server. Each
CMOD contains two processors (CPUs), memory, and I/O capabilities for PCIe and
Gigabit Ethernet.
The CMOD bay of the Exadata X2-8 Database Server contains four CMODs. CMOD 0 (BL 0)
is the master CMOD.
If necessary, quiesce the operating system.
Note: Removal of some hot-swap components might cause disruption to network or storage
access. Take the necessary precautions to prepare the OS for the inaccessibility of network
communications or storage access.
If necessary, place the server in standby power mode or power off the server.


How to Remove a CPU Module

Caution: Potential physical harm or component damage. Because of the length and weight of
the CMOD, more than one person should perform the removal of the CMOD at this point.
Caution: Potential overheat condition. Unoccupied module slots disrupt air flow and
temperature control within the server. Replace the module with a filler module or another CMOD.

To disengage the CMOD, simultaneously rotate both levers outward away from the center of
the module. Do not attempt to remove the CMOD now.
Rotating the levers outward causes the pawls on the end of the levers to engage the sidewall
of the chassis and pull the CMOD out of its internal connector.
Use the handles to pull the CMOD partially out of its slot.
Pull the CMOD out so that approximately six inches extends from the front of the chassis.
Rotate the levers inward until they are closed and locked.
To remove the CMOD, have an assistant support the CMOD as you grab it with your hands
and slowly pull it out of the slot.
Install a CMOD filler in the slot.


CPU Module Components
Note the location of DIMMs, REM, FEM, system battery, heat sinks, and CPU sockets.


Prepare the Server for Operation

To latch and lock the CMOD, simultaneously rotate and push both levers inward toward the
center of the module until the locks on the handles click into place.
This action pushes the module into the chassis and engages the connector on the back of the
module with the connector on the interior midplane. When the handles are locked, you cannot
lift the levers without first releasing the locks on the handles.
Caution: Pinch point. Keep your fingers clear of the back of the lever, the lever hinges, and
the edges of the module.

Locate the module slot that you need to populate.
Open the CMOD levers to the fully open position by squeezing together the green locking
tabs on the lever handles and rotating both handles outward, away from the center of the
module.
The levers do not extend beyond 90 degrees.
Orient the CMOD so that the cover faces upward.
Carefully slide the module into the chassis until it stops.
Do not force the module into the chassis in an attempt to engage the connectors on the
chassis midplane.
Ensure that the pawl on the end of each lever is aligned with the rectangular slot in the
chassis sidewall.
To latch and lock the CMOD, simultaneously rotate and push both levers inward toward the
center of the module until the locks on the handles click into place.


If necessary, remove any service-related cabling and devices.
Remove tools from the interior and exterior of the server, the chassis, and the rack.
Account for all tools used to service the server.
Ensure that components are properly seated and cabled and that all cables are properly
routed and secured.
Ensure that server front and back air vents are not obstructed or clogged.
Use a vacuum to remove dust and debris from the server vents and chassis.
How to Verify CPLD Versions
All CMODs must have identical CPLD levels. After installing a CMOD, you must verify the
CPLD levels for all CMODs in the chassis.
Before You Begin
All CMODs in the chassis must be installed, and the chassis must be in standby power mode.
The green LED on all CMODs must be steady ON.
Log on to the ILOM.
Enter the following command for each node in the chassis:
show /SYS/BLn/CPLD
In this command, n is the node number.
Verify that all nodes return the same value.
If all nodes do not return the same value, contact Oracle service.
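The comparison step can be expressed as a small check once the value returned for each node has been recorded. This is a sketch under stated assumptions: the helper names are hypothetical, the four nodes BL 0 through BL 3 follow the CMOD designations in this guide, and the version strings used in the usage line are made-up placeholders, because the format of the value that ILOM returns is not shown here.

```python
from typing import Dict, List

CMOD_NODES = range(4)  # BL 0 through BL 3, per the CMOD designations in this guide


def cpld_show_commands() -> List[str]:
    """Build the ILOM command for each node, following show /SYS/BLn/CPLD."""
    return [f"show /SYS/BL{n}/CPLD" for n in CMOD_NODES]


def cpld_levels_match(levels: Dict[int, str]) -> bool:
    """True when at least one value was recorded and all recorded values agree."""
    return len(levels) > 0 and len(set(levels.values())) == 1


for command in cpld_show_commands():
    print(command)

# Placeholder values keyed by node number, for illustration only.
print(cpld_levels_match({0: "placeholder-a", 1: "placeholder-a"}))
```

A `False` result corresponds to the case where the procedure says to contact Oracle service.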


PCIe EM Designations and Population Rules
The PCIe EM slots are designated starting from the bottom as EM 0.0–EM 3.1.
Note: For proper airflow and cooling, slots not containing a PCIe EM must be populated with
a filler panel.

PCIe EM slots are paired and allocated to a single CMOD. The slots-to-CMOD pairing is as
follows:
• Slots EM 0.0 and 0.1 are paired to CMOD 0 (BL 0).
• Slots EM 1.0 and 1.1 are paired to CMOD 1 (BL 1).
• Slots EM 2.0 and 2.1 are paired to CMOD 2 (BL 2).
• Slots EM 3.0 and 3.1 are paired to CMOD 3 (BL 3).
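Because the first digit of the slot designation is the CMOD number, the pairing above reduces to a simple lookup. The sketch below is illustrative only: `cmod_for_em_slot` is a hypothetical helper name, and it accepts designations written either as "EM 2.1" or as "2.1".

```python
import re


def cmod_for_em_slot(slot: str) -> int:
    """Return the CMOD (BL) number paired with a PCIe EM slot such as "EM 2.1"."""
    match = re.fullmatch(r"(?:EM\s*)?([0-3])\.([01])", slot.strip())
    if match is None:
        raise ValueError(f"not a valid PCIe EM slot designation: {slot!r}")
    # The first digit names the CMOD; the second digit is the slot within the pair.
    return int(match.group(1))


print(cmod_for_em_slot("EM 2.1"))
```

Rejecting anything outside EM 0.0 through EM 3.1 catches transcription errors before a card is traced to the wrong CMOD.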



IB HCA Ports and LEDs

1. InfiniBand Port 1
2. Green LED for Port 1 (Physical Link)
3. Amber LED for Port 1 (Data Activity Link)
4. InfiniBand Port 2
5. Green LED for Port 2 (Physical Link)
6. Amber LED for Port 2 (Data Activity Link)
7. Power On indicator
8. Hot Swap button
9. Service Required


How to Remove a PCIe EM

• To unlock the PCIe EM, pull out on the underside of the release handle and rotate the
handle to the left to its fully open position (1).
• To remove the PCIe EM, use the handle to pull the PCIe EM from its slot (2).

Access the back of the server.
Disconnect cables from the PCIe EM.
To unlock the PCIe EM, pull out on the underside of the release handle and rotate the handle
to the left to its fully open position (1).
To remove the PCIe EM, use the handle to pull the PCIe EM from its slot (2).


PCIe Express Module: Overview

Note: PCIe Express modules are designated as customer replaceable units (CRU).

The Sun Fire X4800 server has eight PCIe EM slots. The PCIe EMs have a lever mechanism
that is used for removal and installation. The lever is held in place by a release latch.


How to Install a PCIe EM or PCIe EM Filler

Note: PCIe Express modules are designated as customer replaceable units (CRU).
• Slide the PCIe EM into the slot until it stops (1).
• Ensure that the pawl on the end of the handle is aligned with the slot sidewall.
• Rotate the handle downward until it is flush with the PCIe EM (2).

Note: PCIe Express modules are designated as customer replaceable units (CRU).
Access the back of the server.
Ensure that the PCIe EM handles are in their fully open position.
To unlock and extend the handle, pull up on the underside of the release handle and lift the
handle to its fully open position.
Position the PCIe EM at the slot with the handle at the bottom.
Slide the PCIe EM into the slot until it stops.
Ensure that the pawl on the end of the handle is aligned with the slot sidewall.
Rotate the handle downward until it is flush with the PCIe EM.
This action draws the PCIe EM into the slot, engaging the PCIe EM with its internal connector.
Attach the necessary cables.
Prepare the server for operation.


Network Express Module
Designations and Assignments
The numerical designations correspond to the CPU module (CMOD) position within the chassis.
Note: The first digit of the designation refers to the CMOD. The second digit refers to the port.

The Network Express modules (NEMs) are designated as NEM 0 and NEM 1. NEM 0 is on
the left, and NEM 1 is on the right:

Designation    CPU Module
3.0 and 3.1    CMOD 3 (BL 3)
2.0 and 2.1    CMOD 2 (BL 2)
1.0 and 1.1    CMOD 1 (BL 1)
0.0 and 0.1    CMOD 0 (BL 0)


How to Remove an NEM or an NEM Filler

Note: The NEM is a hot-swappable component.
1. To release the NEM, squeeze and extend the release handles outward in opposite directions.
2. Pull the lower handle downward and lift the upper handle until both are in their fully open
position.
3. To remove the NEM, pull it out of its slot using the handles.

1. Access the back of the server.
2. Label and disconnect any cables attached to the Network Express module (NEM).
3. To release the NEM, squeeze and extend the release handles outward in opposite
directions.
4. Pull the lower handle downward and lift the upper handle until both are in their fully open
position.
5. To remove the NEM, pull it out of its slot using the handles.


Network Express Module: Overview

Note: Network Express modules are designated as customer-replaceable units (CRU).

The Network Express modules (NEMs) provide server network connectivity options. In
addition to the four 10-Gigabit Ethernet ports and the four 10/100/1000Base-T ports, the
NEMs have an indicator panel.
Note: Network Express modules are designated as customer replaceable units (CRU).


How to Install an NEM or an NEM Filler

Note: The NEM is a hot-swappable component.
1. To install the NEM, use the handles to slide the NEM into its slot until it stops (1).
2. Ensure that the pawl on the lever is aligned with the slot in the back of the server.
3. Rotate both handles toward the center of the NEM (2).

To Install an NEM or an NEM Filler:
1. Access the back of the server.
2. Ensure that the NEM release handles are in their fully open position.
3. Extend the release handles outward in opposite directions to their fully open position.
Pull the lower handle downward and lift the upper handle.
4. To install the NEM, use the handles to slide the NEM into its slot until it stops (1).
5. Ensure that the pawl on the lever is aligned with the slot in the back of the server.
6. Rotate both handles toward the center of the NEM (2).
7. This action draws the NEM into the slot, engaging the NEM with its internal connector.
8. If necessary, install the 10 GE transceiver.
9. Attach the necessary cables to the NEM.
10. Prepare the server for operation.


Loading Software After Component
Removal or Replacement
All components must have software loaded on them after you remove and replace them:
• IB Switch
• GigE Switch
• Storage Cell
• DB Server
• KVM Switch


Oracle Exadata Database Machine IB
Switch Replacement
To remove and replace a switch, perform the following steps:
1. Identify the failed switch.
2. Disconnect the power and network cables from the failed switch.
3. Remove the failed switch from the rack.
4. Install the replacement switch into the rack.
5. Connect the power to the replacement switch.
6. Check the software on the replacement switch.
7. Configure the replacement switch.
8. Connect network cables to the replacement switch.
9. Verify the status of the replacement switch.


Exercise: Perform Removal and Replacement and
Verify Status on Components
In this exercise, perform the following tasks:
• Loading and configuring the InfiniBand switch firmware
• Removing and replacing various system components
Preparation
• Ensure that you have your Student Guide, system customization information, and hardware.




Task 1: Loading the InfiniBand Switch Firmware
Access the ILOM Shell from the CLI by Using the USB Management Port
1. If you have not already done so, connect a USB-to-serial adapter to the USB port of the
switch.
2. Connect a serial terminal, terminal server, or workstation with a TIP connection to the
USB-to-serial adapter. Configure the terminal or terminal emulator with these settings:

115200 baud
8 bits
No parity
1 Stop bit
No handshaking

3. Press the Enter key on the serial device several times to synchronize the connection.

You might see text similar to the following:
...
CentOS release 5.2 (Final)
Kernel 2.6.27.13-nm2 on an i686

nm2name login:
where nm2name is the host name of the management controller.

4. Enter ilom-admin as the login name, and then enter the ilom-admin user's password.
nm2name login: ilom-admin
Password: password
->

Note: As shipped, the ilom-admin user password is welcome1. If this does not work,
try ilom-admin for the password.

5. Upgrade the firmware:

-> load -source URI/pkgname


where:
- URI is the uniform resource identifier for the host where the switch firmware
package is located.
- pkgname is the name of the firmware package in the transfer directory.

For example, using the HTTP protocol:
-> load -source http://10.7.3.3/GW/sundcs_36p_repository_1.3.3_2.pkg
Downloading firmware image. This will take approximately 5
minutes.
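The load command shown above is simply the URI of the transfer directory joined to the package name. The following is a minimal sketch of that assembly; the helper name build_load_command is hypothetical and is not part of ILOM. It only builds the command string and does not contact the switch.

```python
from urllib.parse import urlsplit


def build_load_command(uri: str, pkgname: str) -> str:
    """Build the ILOM 'load -source' command line for a firmware package."""
    parts = urlsplit(uri)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"URI needs a scheme and a host: {uri!r}")
    # Use exactly one slash between the transfer directory and the package name.
    return f"load -source {uri.rstrip('/')}/{pkgname.lstrip('/')}"


# Values taken from the example in this task.
print(build_load_command("http://10.7.3.3/GW/", "sundcs_36p_repository_1.3.3_2.pkg"))
```

Rejecting a URI without a scheme or host catches the common mistake of passing only the directory name.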

The firmware is downloaded. The upgrade begins. A warning is displayed and you
are asked to commit to the upgrade.

Note: Firmware upgrade will upgrade the SUN DCS 36p firmware. ILOM will enter
a special mode to load new firmware. No other tasks should be performed in ILOM
until the firmware upgrade is complete.

Are you sure you want to load the specified file (y/n)? y
6. Answer y to the prompt to commit to the upgrade.

The upgrade begins.

Setting up environment for firmware upgrade. This will take
approximately 2 minutes.

Starting SUN DCS 36p FW update

==========================
Performing operation: I4 A
==========================
I4 fw upgrade from 7.2.0(INI:2) to 7.2.300(INI:2):
Upgrade started...
Upgrade completed.
INFO: I4 fw upgrade from 7.2.0(INI:2) to 7.2.300(INI:2) succeeded


===========================
Summary of Firmware update
===========================
I4 status : FW UPDATE - SUCCESS
I4 update succeeded on : A
I4 already up-to-date on : none
I4 update failed on : none

=========================================
Performing operation: SUN DCS 36p firmware update
=========================================
SUN DCS 36p: SUN DCS 36p is already at the given version.

Firmware update is complete.

Note: In this example, the firmware of the management controller was not updated
because it was already at the version specified in the switch firmware package.

7. Reboot the switch to enable the new firmware.

Note: The ILOM stack requires at least one minute to become operational after a
restart.

Task 2: Loading the Cisco Switch Firmware
The default IOS firmware in Cisco 4948 as delivered inside Oracle’s Engineered System may
not have SSH server capability. This document provides instructions on how to apply the new
firmware and configure the SSH server.

Customers who need SSH server capability on their Cisco 4948 switch can obtain the updated
Cisco IOS version described in this procedure by opening a service request with Oracle
Support and requesting the update. Mention this note number [1415044.1] in the
service request to streamline the process.
Assumptions and Prerequisites
• The Cisco 4948 switch included in the Oracle Engineered System environment has been
configured to communicate over the management network.
• Telnet access and the enable password are available.
• A system with a telnet client is available and is able to connect to the Cisco 4948 switch.
• At least 20 MB of free flash storage space is available on the Cisco 4948 switch bootflash. The
command to confirm available space is described below.
• A TFTP server is available on the network and can be reached by the Cisco 4948 switch.
How to verify free space available on Cisco 4948 flash
Log in to the Cisco 4948 via telnet with superuser privileges. After logging in, issue the “show file
systems” command to display the available space.
cisco4948-ip#show file systems
File Systems:

     Size(b)    Free(b)    Type    Flags  Prefixes
*    60817408   45204152   flash   rw     bootflash:
            -          -   opaque  rw     system:
            -          -   opaque  rw     tmpsys:
            -          -   opaque  ro     crashinfo:
       524280     523664   flash   rw     cat4000_flash:

The above sample output shows approximately 45MB free space in bootflash. As only 20MB
is required, this switch passes the prerequisite check for space available.
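The space check described above can also be scripted against captured command output. The following is a minimal sketch, assuming the output has been saved as text and that 20 MB means 20,000,000 bytes; the function names are hypothetical.

```python
# Captured "show file systems" output (values from the sample above).
SAMPLE_OUTPUT = """\
File Systems:

     Size(b)    Free(b)    Type    Flags  Prefixes
*    60817408   45204152   flash   rw     bootflash:
            -          -   opaque  rw     system:
       524280     523664   flash   rw     cat4000_flash:
"""

REQUIRED_FREE_BYTES = 20 * 1000 * 1000  # assumption: 20 MB counted in decimal bytes


def free_bytes(output: str, prefix: str = "bootflash:") -> int:
    """Return the Free(b) value for the file system with the given prefix."""
    for line in output.splitlines():
        # The leading "*" marks the default file system; drop it before splitting.
        fields = line.replace("*", " ").split()
        if len(fields) == 5 and fields[4] == prefix and fields[1].isdigit():
            return int(fields[1])
    raise ValueError(f"no usable entry found for {prefix}")


def has_room_for_firmware(output: str) -> bool:
    """True when bootflash has at least the required free space."""
    return free_bytes(output) >= REQUIRED_FREE_BYTES


print(free_bytes(SAMPLE_OUTPUT), has_room_for_firmware(SAMPLE_OUTPUT))
```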

You can also display the contents of bootflash by using the “dir” command, as shown below.
In this example, the output shows the default IOS firmware file stored in bootflash.
cisco4948-ip#dir bootflash:
Directory of bootflash:/

1 -rwx 15613000 Nov 4 2010 05:42:31 -04:00 cat4500-ipbase-mz.122-53.SG2.bin

60817408 bytes total (45204152 bytes free)

Prepare the TFTP server


Create a new directory under the root directory of the TFTP file server. This document
uses /tftpboot/cisco4948 as the remote path on the file server (named tftp-server in the
examples). Download the new Cisco IOS firmware into this directory on the tftp-server host so
that the Cisco switch can download it via TFTP in later steps. The directory might look like this:

[root@tftp-server cisco4948]# ls -l
total 30964
-rw-r--r-- 1 root root 16170184 Jan 13 11:25 cat4500-ipbasek9-mz.122-53.SG1.bin

Update and Preserve Current Configuration
By default, the current configuration may not be set up to boot from a specific firmware file. As
a best practice, update the current configuration to include the boot firmware
file name. In the previous section, you identified the default IOS firmware file
stored in bootflash. The following steps update the current configuration to specify the
firmware boot file.

cisco4948-ip#configure terminal
cisco4948-ip(config)#no boot system
cisco4948-ip(config)#boot system bootflash:cat4500-ipbase-mz.122-53.SG2.bin
cisco4948-ip(config)# (type <control-z> here to end)

Next, save the current configuration, write to nvram and also save it in boot flash with a
unique name.

cisco4948-ip#copy running-config startup-config all
cisco4948-ip#copy running-config bootflash:cisco4948-ip-confg-before-ssh

Now, take a backup of this configuration on remote TFTP file server.

cisco4948-ip#copy bootflash:cisco4948-ip-confg-before-ssh tftp:


After entering the command above, the switch will prompt for the tftp server name and file
name to use when saving to the remote tftp server. Those outputs are not shown here.
Transfer the new Cisco IOS SSH-capable firmware to switch’s boot flash
Copy the new firmware file into Cisco 4948 flash filesystem and verify its integrity in boot
flash. In this example, your tftp server is named “tftp-server” and you have staged the updated
IOS firmware on the tftp server at cisco4948/cat4500-ipbasek9-mz.122-53.SG1.bin.

cisco4948-ip#copy tftp: bootflash:
Address or name of remote host [tftp-server]? 10.7.7.3
Source filename []? cisco4948/cat4500-ipbasek9-mz.122-53.SG1.bin
Destination filename [cat4500-ipbasek9-mz.122-53.SG1.bin]?
Accessing tftp://10.7.7.3/cisco4948/cat4500-ipbasek9-mz.122-53.SG1.bin...
Loading cisco4948/cat4500-ipbasek9-mz.122-53.SG1.bin from 10.133.42.139 (via
Vlan211):
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[OK - 16170184 bytes]

16170184 bytes copied in 3019.672 secs (5355 bytes/sec)
cisco4948-ip#

cisco4948-ip#dir bootflash:
Directory of bootflash:/

1 -rwx 15613000 Nov 4 2010 05:42:31 -04:00 cat4500-ipbase-mz.122-53.SG2.bin
2 -rwx 16170184 Jan 13 2012 15:37:15 -05:00 cat4500-ipbasek9-mz.122-53.SG1.bin

60817408 bytes total (29033968 bytes free)

Verify the transferred firmware file for integrity


Run the verify command to verify and validate the download was successful and complete.

cisco4948-ip#verify bootflash:cat4500-ipbasek9-mz.122-53.SG1.bin
File system hash verification successful.

If no errors are returned from the verify command, you should see the message in the
example output above indicating that the verification was successful.
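As an additional sanity check, the numbers in the transcripts above can be cross-checked arithmetically: the drop in free bootflash space should equal the size of the transferred file, and the reported transfer rate should match size divided by time. The following sketch uses only values that appear in the sample output; the variable names are illustrative.

```python
# Values copied from the sample transcripts in this task.
FREE_BEFORE = 45204152      # bytes free before the copy (dir bootflash:)
FREE_AFTER = 29033968       # bytes free after the copy
IMAGE_SIZE = 16170184       # bytes reported by the TFTP copy
TRANSFER_SECS = 3019.672    # seconds reported by the TFTP copy


def space_consumed(free_before: int, free_after: int) -> int:
    """Bytes of bootflash consumed between two directory listings."""
    return free_before - free_after


def transfer_rate(size_bytes: int, seconds: float) -> int:
    """Average transfer rate in bytes per second, rounded to the nearest integer."""
    return round(size_bytes / seconds)


print(space_consumed(FREE_BEFORE, FREE_AFTER) == IMAGE_SIZE)
print(transfer_rate(IMAGE_SIZE, TRANSFER_SECS), "bytes/sec")
print(round(TRANSFER_SECS / 60, 1), "minutes")
```

In this sample the copy took roughly 50 minutes, so plan the transfer time accordingly.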

Prepare Cisco 4948 to boot with new IOS firmware


The following steps update the configuration with a config-register value of 0x2102 and the new
IOS firmware boot file that you just downloaded. The value 0x2102 instructs the boot process to
ignore any breaks, sets the baud rate to 9600, and boots into ROM if the main boot process
fails for some reason.
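To see how 0x2102 maps to the three behaviors just described, the value can be decoded bit by bit. The following is a sketch only; the bit positions are an assumption based on the commonly documented Cisco configuration register layout and should be checked against the Cisco documentation for the 4948 before relying on them.

```python
# Assumed bit layout (commonly documented for Cisco configuration registers).
BOOT_FIELD_MASK = 0x000F   # bits 0-3: boot field
BREAK_DISABLED = 0x0100    # bit 8: ignore the console Break key
ROM_FALLBACK = 0x2000      # bit 13: fall back to ROM software if the boot fails
BAUD_BITS = 0x1820         # bits 5, 11, 12: console speed; all clear means 9600


def decode_config_register(value: int) -> dict:
    """Decode the fields of a configuration register that this task relies on."""
    return {
        "boot_field": value & BOOT_FIELD_MASK,
        "break_disabled": bool(value & BREAK_DISABLED),
        "rom_fallback": bool(value & ROM_FALLBACK),
        "baud_9600": (value & BAUD_BITS) == 0,
    }


print(decode_config_register(0x2102))
```

Under this layout, 0x2102 is the sum of 0x2000, 0x0100, and 0x0002, which corresponds to the ROM fallback, the ignored break, and a boot field of 2, with the console speed bits left clear.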

cisco4948-ip#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
cisco4948-ip(config)#config-register 0x2102
cisco4948-ip(config)#no boot system
cisco4948-ip(config)#boot system bootflash:cat4500-ipbasek9-mz.122-53.SG1.bin
cisco4948-ip(config)#
cisco4948-ip(config)# (type <control-z> here to end)
cisco4948-ip#show run | include boot
boot-start-marker
boot system bootflash:cat4500-ipbasek9-mz.122-53.SG1.bin
boot-end-marker
cisco4948-ip#

Save the configuration into nvram

cisco4948-ip# copy running-config startup-config all
cisco4948-ip#write memory
Building configuration...
Compressed configuration from 6725 bytes to 2261 bytes[OK]

Boot the Cisco 4948 switch with new firmware


In this step, you boot the switch under the new IOS firmware. When the “reload” command is
issued, the switch will reboot and there will be an outage on the management network for all
connected devices (including all storage cells, database servers, ILOMs, and InfiniBand
switches) for a minute or two while the switch reboots. A management network outage should
not cause an application outage as the databases should all remain available and functioning
normally.

cisco4948-ip# reload
You will be asked to confirm if you want to continue and reboot the Cisco switch.


Configure SSH access
With the switch successfully reloaded, reconnect using telnet and configure SSH as shown in
the procedure below. The username command in the example below is required; it configures
a user with the username “admin” and the password “welcome1”. Any username and password
can be used (choosing a stronger password than “welcome1” is recommended). After the telnet
login, use the “enable” command to get superuser privileges again and proceed with the
following configuration.

cisco4948-ip#conf terminal
Enter configuration commands, one per line. End with CNTL/Z.
cisco4948-ip(config)#crypto key generate rsa
% You already have RSA keys defined named cisco4948-ip.us.oracle.com.
% Do you really want to replace them? [yes/no]: yes
Choose the size of the key modulus in the range of 360 to 2048 for your
General Purpose Keys. Choosing a key modulus greater than 512 may take a few
minutes.

How many bits in the modulus [512]: 768
% Generating 768 bit RSA keys, keys will be non-exportable...[OK]

cisco4948-ip(config)#
cisco4948-ip(config)#username admin password 0 welcome1
cisco4948-ip(config)#line vty 0 4
cisco4948-ip(config-line)#transport input all
cisco4948-ip(config-line)# exit
cisco4948-ip(config)#aaa new-model
cisco4948-ip(config)#
cisco4948-ip(config)#ip ssh time-out 60
cisco4948-ip(config)#ip ssh authentication-retries 3
cisco4948-ip(config)#ip ssh version 2
cisco4948-ip(config)# (type <control-z> here to end)

Verify that the SSH configuration is working and configured properly using the “show ip ssh”
command:
cisco4948-ip#show ip ssh
SSH Enabled - version 2.0
Authentication timeout: 60 secs; Authentication retries: 3
cisco4948-ip#
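If the “show ip ssh” output is captured as text, the three values configured above can be checked automatically. The following is a minimal sketch that assumes the output has exactly the two-line shape shown in the sample; the function name is hypothetical.

```python
import re

# Captured "show ip ssh" output (copied from the sample above).
SAMPLE_OUTPUT = """\
SSH Enabled - version 2.0
Authentication timeout: 60 secs; Authentication retries: 3
"""


def parse_show_ip_ssh(output: str) -> dict:
    """Extract the SSH state, version, timeout, and retry count from the output."""
    enabled = re.search(r"SSH Enabled - version (\S+)", output)
    timeout = re.search(r"Authentication timeout: (\d+) secs", output)
    retries = re.search(r"Authentication retries: (\d+)", output)
    return {
        "enabled": enabled is not None,
        "version": enabled.group(1) if enabled else None,
        "timeout_secs": int(timeout.group(1)) if timeout else None,
        "retries": int(retries.group(1)) if retries else None,
    }


print(parse_show_ip_ssh(SAMPLE_OUTPUT))
```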


This switch should now be available for SSH logins using username admin, password
welcome1 via SSH v2 (which is typically the default for most SSH clients).

Disable telnet access (optional)


After configuring SSH access and verifying it, some sites may want to disable telnet access to
the switch (leaving only SSH access available). This is optional as the switch can allow
access via SSH and telnet simultaneously. To disable telnet access, connect to the switch
using SSH (since telnet will be disabled as part of this procedure) and enter these commands:

cisco4948-ip# conf terminal
Enter configuration commands, one per line. End with CNTL/Z.
cisco4948-ip(config)#
cisco4948-ip(config)#line vty 0 4
cisco4948-ip(config-line)#transport input ssh
cisco4948-ip(config-line)# exit
cisco4948-ip(config)# (type <control-z> here to end)

After this change is in place, telnet on the switch is disabled, and this may be verified. SSH
connectivity should be the only allowed connection method.

Save configuration changes
Finally, with all configuration changes complete, save the current configuration, write to nvram
and also save the configuration in bootflash with a unique name for easy reference.

cisco4948-ip#copy running-config startup-config all
cisco4948-ip#copy running-config bootflash:cisco4948-ip-confg-with-ssh
cisco4948-ip#write memory
Building configuration...
Compressed configuration from 6725 bytes to 2261 bytes[OK]

The configuration is complete. The bootflash on the 4948 is large enough to hold both the
original IOS version and the updated SSH-capable IOS version, so no cleanup is required.


Task 3: Removing and Replacing
X2-8 DB Server Components
Log in to the ILOM and/or CLI, check system status, and locate
and identify components. Use the cabling and Student Guide to
replace any unplugged cables and/or fix any faults. Remove
components with tasks found inline in the Student Guide. After
you verify the component removal, replace it and monitor the
node. Unplug ALL InfiniBand cables when you have finished.
1. Log in to the ILOM/CLI.
2. Verify the status of the component.
3. Locate and identify the component.
4. Remove the component and verify the removal.
5. Replace the component.
6. Verify the component status.

WARNING! Do not attempt the Subassembly replacement procedure on the lab equipment.
Refer to video captures for details on the Subassembly replacement procedure.

Exercise Summary

Discussion: Take a few minutes to discuss the experiences,
issues, and discoveries you had during the lab exercise.
• Experiences
• Interpretations
• Conclusions
• Applications

Exercise Solutions

In this section, review the answers presented in the Tasks
solutions.
• Task 1: Loading Switch Software and Configuring the
Switch
– Review the solutions inline.
• Task 2: Loading the Cisco Switch Firmware
– Review the solutions inline.
• Task 3: Removing and Replacing X2-8 DB Server
Components
– Review the solutions inline.

Summary

In this lesson, you should have learned how to:
• Describe maintenance for the Oracle Exadata Database
Machine
• Perform removal and replacement of components
• Load software after component removal and replacement

Troubleshooting

Objectives

After completing this lesson, you should be able to:
• Collect information and determine a problem statement for
the Oracle Exadata Database Machine
• Verify the status and configuration on the Oracle Exadata
Database Machine
• List the available troubleshooting utilities and commands
• Access a failed system and determine the problem
• Understand and interpret available system logs
• List the known support issues and perform an available
workaround or resolution

Relevance of Troubleshooting

Discussion: The following questions are relevant to
understanding the Oracle Exadata Database Machine:
• What information should be collected before
troubleshooting the Oracle Exadata Database Machine?
• Where can you obtain component or system failure
information?
• How can you determine versions of the various
components?

Additional Resources for Troubleshooting

The following references provide additional information about
the topics described in this lesson:
• My Oracle Support: https://support.oracle.com
– Database Machine X2-2 Diagnosability and Troubleshooting
Best Practices (MOS NOTE 1274324.1)
– Master Note for Oracle Database Machine and Exadata
Storage Server (ID 1187674.1)
• Troubleshooting on Exadata:
http://www.oracle.com/webfolder/technetwork/Exadata/MAA-BestP/Trouble/012811_92402/index.htm

Machine Troubleshooting

The first step in troubleshooting is knowing where to look for
clues that might lead to a diagnosis. This section describes the
pieces of information you must collect from the system. These
steps are intended to be read-only actions and should not have
an impact on the running system.
Important: If the problem needs to be escalated to backline
support for analysis, you must gather all the information
required to solve the problem before escalation.

Collecting Information and Determining
a Problem Statement
To begin troubleshooting an Oracle Exadata Database
Machine problem or issue, perform the following steps:
1. Collect a problem statement from the user or customer.
a. The time that the problem occurred helps isolate the issue
in the logs.
b. What was happening when the problem occurred (for
example, configuration change, metadata update, or
hardware maintenance)
2. Use the information collected in the problem statement to
determine the root of the problem.
a. Hardware-related problems
b. Process or daemon problems
c. Administration or configuration problems

The following are additional questions to ask as part of collecting information in a problem
statement:
1. What symptoms were observed?
2. What is the frequency of the observed problem?
3. Does it happen all the time or only occasionally?
4. What happened just before the symptom was observed?
5. What changed recently on the system?
6. Has the system been moved or powered off recently?
7. How long has the problem been occurring?
8. What is the intended configuration of the system?
9. What email alerts have been generated? Email alerts are sent for interesting system
events. These events are also logged to the external log and the internal log. Customers
might mention an alert they received that caused them to suspect a problem in the
system. Include this fact in the set of information collected from the system, though the
details of the information should already be in the logs.


Machine Troubleshooting
Function | Effect of Failure | How It Is Detected
NIC | Communication with switch or backup path is lost. | Communication ceases on the primary path, or failover causes loss of node, BIST on NIC.
IPMI | Status control system cannot be sensed. | IPMI requests to the affected node do not succeed.
CPU | System cannot operate. | System halts, and the CPU BIST or system will not start.
Memory | Correctable ECC is ignored. Uncorrectable ECC error halts on interrupt. | Memory controller interrupts OS.
Disk Controller | Drive cannot be read or written to. | Performance degrades because of SATA retries. This is a stand-alone diagnostic.

Machine Status and Verification
The following methods can be used to verify the status of the Oracle Exadata Database
Machine.
Machine Visual Inspection
Many issues can be resolved simply through a visual inspection. Check all external network
cabling, power, LEDs, and devices by using the following checklist:
• Check the InfiniBand network, switches, cables, and HCAs.
• All active Ethernet connections should be blinking green, indicating a successfully
negotiated gigabit connection.
• Check the power connections on switches, nodes, and KVM.
• Check the network connections on switches and nodes.
• Conduct a visual verification of all LEDs.


Machine Troubleshooting
Function | Effect of Failure | How It Is Detected
SATA/SAS drive | Drive ceases operation. | Controller monitors drive behavior. This is a stand-alone diagnostic.
BIOS | System will not boot. | Powerup checksum failure or will not power up.
Power Supply | There are various failures (see voltage section). | IPMI sensors, BIOS queries IPMI sensors (or handheld voltmeter).
Fan | Cooling lessens or fails, and temperature rises. | IPMI measures fan speeds, BIOS queries IPMI sensors (or handheld voltmeter).
Voltage | There is loss of resources including drives, fans, and CPU. Resource halts. | IPMI measures voltage, BIOS queries IPMI, resource might not work.

Double-check BIOS power savings mode if a sudden drop in performance occurs. If the BIOS
has changed from Power Savings mode, it should be reset.
Certain abnormal situations, such as excessive process termination by a large number of
shutdown aborts, can expose cellsrv to memory leaks. In a worst case situation, with a
large amount of memory leaked, the cell can hang, causing other problems. Cellsrv
memory can be monitored in a variety of ways, such as /proc file system, the ps command,
or the pmap command.
If a leak is detected, a service request should be filed and cellsrv restarted with the
cellcli command, alter cell restart services cellsrv. If the leak is so bad
that it completely hangs the system, the cell can be reset by contacting the BMC with
ipmitool, for example, ipmitool -H <ILOM address> -U root -P <password>
chassis power cycle.
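One way to watch cellsrv memory through the /proc file system is to sample the VmRSS line of /proc/<pid>/status over time and look for steady growth. The following is a minimal sketch; the sample values and the growth threshold are hypothetical and chosen only for illustration, and the sketch parses text rather than reading a live process.

```python
def parse_vmrss_kb(status_text: str) -> int:
    """Return the resident set size in kB from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")


def steadily_growing(samples_kb: list, min_growth_kb: int) -> bool:
    """True when samples never decrease and grow by at least min_growth_kb overall."""
    if len(samples_kb) < 2:
        return False
    never_decreases = all(b >= a for a, b in zip(samples_kb, samples_kb[1:]))
    return never_decreases and (samples_kb[-1] - samples_kb[0]) >= min_growth_kb


# Hypothetical status excerpt and hypothetical samples taken at fixed intervals.
EXCERPT = "Name:\tcellsrv\nVmRSS:\t 2048000 kB\n"
print(parse_vmrss_kb(EXCERPT))
print(steadily_growing([2048000, 2300000, 2650000], min_growth_kb=500000))
```

Steady growth alone does not prove a leak, so treat a positive result as a reason to investigate further and file a service request if appropriate.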


Useful Status Commands

The following commands are useful in an Oracle Exadata
Database Machine/Exadata Storage Server environment.

Native commands
• vmstat: Overall system utilization; also useful to
aggregate I/O (bi/bo columns) on cells
• iostat: I/O stats on cells; -x for extended statistics

Note that, in rare cases, the APIC timer might be disabled, causing iostat to report reduced
throughput, even though the cell is actually generating throughput at expected rates. To
confirm this problem, compare the vmstat bi and bo (read and write throughput aggregates)
to iostat, because vmstat is not affected by this problem.
Also, dmesg output can be investigated by looking for data after the “Using local APIC timer
interrupts” line. Compare the detected MHz with that from a cell where iostat is working
properly and you will find that the former MHz is about half. A reboot will usually fix this
problem. To work around it permanently, reboots can be performed without the APIC timer.
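The vmstat and iostat comparison described above can be reduced to a simple ratio check. The following sketch assumes that both aggregates have already been collected over the same interval and converted to the same unit (kB/s here); the function name, the sample figures, and the 25 percent tolerance are hypothetical.

```python
def iostat_underreports(vmstat_kb_s: float, iostat_kb_s: float,
                        tolerance: float = 0.25) -> bool:
    """True when iostat throughput is well below the vmstat aggregate."""
    if vmstat_kb_s <= 0:
        return False  # nothing to compare against
    ratio = iostat_kb_s / vmstat_kb_s
    return ratio < 1 - tolerance


# Hypothetical readings: iostat shows about half of what vmstat reports.
print(iostat_underreports(vmstat_kb_s=200000, iostat_kb_s=100000))
# Hypothetical readings from a healthy cell: the two tools roughly agree.
print(iostat_underreports(vmstat_kb_s=200000, iostat_kb_s=195000))
```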

• /sbin/fdisk -l: List the partition tables for the specified devices, and then exit. If no
devices are given, the devices that are mentioned in /proc/partitions (if that
exists) are used.
• /usr/bin/top: Provide dynamic overall system performance statistics.
• /usr/sbin/ibnetdiscover: Discover and display the IB network topology.
• /usr/sbin/iblinkinfo.pl: Check link speeds.
• /usr/bin/ib-bond --status: Show the interface bonding configuration.
• /sbin/ifconfig <device_name>: Display the status of the interface. Values for
device name are bond0, ib0, and ib1.
• /usr/sbin/ibstat: Query the basic status of the IB device, including version
information.
• ibstatus: Query the basic status of the IB device.
• /usr/bin/rds-info -I <ibhostname>: Display the IB connections that the IB
transport is using to provide RDS connections.
• /usr/bin/rds-info -n: Display all RDS connections. RDS connections are
maintained between nodes by transports.
• /usr/bin/rds-info -k: Display all the RDS sockets in the system.
• /usr/bin/rds-info -c: Display global counters. Each counter increments as its
event occurs.
• /usr/bin/rds-ping: Test whether a remote node is reachable over RDS.
• /sbin/ip n s: Display the status of the network connections.

System Component Query Commands

IPMI Status Commands Through BMC


• The storage cells and database nodes have a BMC
interface that can report information about the system.
• You can query for remote information by using the
-H <host> -U <user> options. If you want to access
the information on the system you are on, you can omit
those options.
• See the man page for more information.
• Local access option example; prints information for the
node you are on:
$ ipmitool chassis status

Determine the Status of System Components
Remote access option example; prints information for the node you list for -H:
• $ ipmitool -H trnacel05-ilom -U root chassis status
The chassis status command tells you the power state.
• $ ipmitool -H trnacel05-ilom -U root chassis status
System Power : on
Power Overload : false
Power Interlock : inactive
Main Power Fault : false
Power Control Fault : false
Power Restore Policy : always-off
Chassis Intrusion : inactive
Front-Panel Lockout : inactive
Drive Fault : false
Cooling/Fan Fault : false
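When chassis status output is collected from many nodes, it can be convenient to turn it into key and value pairs and list only the fault indicators that are set. The following is a minimal sketch that assumes the output has the "name : value" shape shown above; the function names are hypothetical.

```python
# Captured "chassis status" output (copied from the sample above).
SAMPLE_OUTPUT = """\
System Power : on
Power Overload : false
Power Interlock : inactive
Main Power Fault : false
Power Control Fault : false
Power Restore Policy : always-off
Chassis Intrusion : inactive
Front-Panel Lockout : inactive
Drive Fault : false
Cooling/Fan Fault : false
"""


def parse_chassis_status(output: str) -> dict:
    """Split each 'name : value' line into a dictionary entry."""
    status = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def active_faults(status: dict) -> list:
    """Names of fault or overload indicators that report true."""
    return [name for name, value in status.items()
            if ("Fault" in name or "Overload" in name) and value == "true"]


status = parse_chassis_status(SAMPLE_OUTPUT)
print(status["System Power"], active_faults(status))
```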


System Component Query Commands

lan print tells you the network configuration of the ILOM.


$ ipmitool -H trnacel05-ilom -U root lan print 1
Password:
Set in Progress : Set Complete
Auth Type Support : MD5 PASSWORD
Auth Type Enable : Callback : MD5 PASSWORD
                 : User : MD5 PASSWORD
                 : Operator : MD5 PASSWORD
                 : Admin : MD5 PASSWORD
                 : OEM : MD5 PASSWORD
IP Address Source : Static Address
IP Address : 10.7.7.215
Subnet Mask : 255.255.248.0
MAC Address : 00:e0:81:34:3b:1d
...


System Component Query Commands

The sdr command provides sensor information.


$ ipmitool -H trnacel05-ilom -U root sdr
Password:
DDR 2.6V | 2.63 Volts | ok
CPU core Voltage | 1.42 Volts | ok
VCC 3.3V | 3.32 Volts | ok
VCC 5V | 5.04 Volts | ok
VCC 12V | 11.91 Volts | ok
Battery Volt | 2.98 Volts | ok
CPU TEMP | 36 degrees C | ok
SYS TEMP | 26 degrees C | ok
System FAN4 | 7650 RPM | ok
System FAN3 | 15120 RPM | ok
System FAN1 | 15210 RPM | ok
System FAN2 | 15030 RPM | ok
System FAN5 | 7650 RPM | ok

Determine the Status of System Components
The sensor command provides more detailed sensor information:
$ ipmitool -H trnacel05-ilom -U root sensor
The fru command lists information about components in the system:
$ ipmitool -H trnacel05-ilom -U root fru
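
The sdr listing is pipe-delimited, so sensors that do not report ok can be filtered mechanically. The following is a minimal sketch, assuming the three-column "name | reading | status" layout of the sdr example; sdr_not_ok is a hypothetical helper name.

```shell
# Sketch: print every sensor whose status column is not "ok".
# Reads `ipmitool sdr` output on standard input.
# sdr_not_ok is a hypothetical helper, not an ipmitool feature.
sdr_not_ok() {
    awk -F'|' 'NF >= 3 {
        name = $1; status = $3
        # Trim the padding around the sensor name and status.
        gsub(/^[ \t]+|[ \t]+$/, "", name)
        gsub(/^[ \t]+|[ \t]+$/, "", status)
        if (status != "ok") print name ": " status
    }'
}

# Usage on the local node:
#   ipmitool sdr | sdr_not_ok
```

Lines without three columns, such as the Password: prompt, are ignored.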



System Component Query Commands

The bmc info command lists the firmware version.


[root ~]# ipmitool bmc info
Device ID : 32
Device Revision : 1
Firmware Revision : 3.0
IPMI Version : 2.0
Manufacturer ID : 42
Manufacturer Name : Sun Microsystems
Product ID : 18177 (0x4701)
Device Available : yes
Provides Device SDRs : no
Additional Device Support :
    Sensor Device
    SDR Repository Device
…


System Component Query Commands

The bmc reset cold command sends a hard (cold) reset to the management controller (BMC).

[root ~]# ipmitool bmc reset cold
Sent cold reset command to MC


System Component Query Commands

Obtain the serial numbers and versions associated with an Oracle Exadata Database Machine.

BIOS Version:
# ipmitool -H 10.7.7.224 -U root -P welcome1 sunoem cli "show /SYS/MB/BIOS fru_version"
Connected. Use ^D to exit.
-> show /SYS/MB/BIOS fru_version
/SYS/MB/BIOS
Properties:
fru_version = 07030004


System Component Query Commands

ILOM Version:
# ipmitool -H 192.168.10.223 -U root -P welcome1 sunoem cli "version"
Connected. Use ^D to exit.
-> version
SP firmware 3.0.6.10.b
SP firmware build number: 52264
SP firmware date: Fri Jan 29 21:14:38 CST 2010
SP filesystem version: 0.1.22


System Component Query Commands

Product Serial Number:


# ipmitool -H 192.168.10.223 -U root -P welcome1 sunoem cli "show /SYS product_serial_number"
Connected. Use ^D to exit.
-> show /SYS product_serial_number

/SYS
Properties:
product_serial_number = 0937XF5003

Determine the Version of System Components
If you have a machine that is up and running, you can obtain its serial number, universally
unique identifier (UUID), and other relevant information by using the dmidecode command.
The command returns a long list of information, including the serial number, UUID, and
product name, which are required when submitting a Service Request.
This information applies to the database nodes and storage cells.


System Component Query Commands

If the server is booted and you can log in to the system, the system product name and
serial number can be obtained from the operating system command line by running the
following commands:

[root ~]# dmidecode -s system-product-name
SUN FIRE X4170 SERVER
[root ~]# dmidecode -s system-serial-number
0937XF5003

Determine the Version of System Components
Another option is to write the output of the dmidecode command to a file and upload the
file to the service request.
# dmidecode > `hostname`_dmidecode.txt



System Component Query Commands
The F40 flashcard details can be obtained from the operating
system command line by running the following commands:
[root@scam04cel14 sbin]# ./ddoemcli
**********************************************************
LSI Corporation WarpDrive Management Utility
Version 01.250.41.04 (2012.06.04)
Copyright (c) 2011 LSI Corporation. All Rights Reserved.
**********************************************************
ID  WarpDrive       Package Version  PCI Address
--  ---------       ---------------  -----------
1   ELP-4x100-4d-n  06.05.10.00      00:20:00:00
2   ELP-4x100-4d-n  06.05.10.00      00:30:00:00
3   ELP-4x100-4d-n  06.05.10.00      00:90:00:00
4   ELP-4x100-4d-n  06.05.10.00      00:b0:00:00

Select the WarpDrive [1-4 or 0:Quit]: 1

1. List WarpDrive Information
2. Display WarpDrive Health
3. Update Flash Package
4. Locate WarpDrive
5. Format WarpDrive
6. Display/Set Power Value
7. Flash WarpDrive Firmware
8. Flash WarpDrive BIOS
9. Update SSD Firmware
10. Dump Raw Data
11. Show Vital Product Data
12. Reset Adapter
13. Reset Target
14. Set SSD State
15. Extract/Erase/Query Panic Logs
16. Extract SMART Logs



Hardware Diagnostic Best Practices: F40 ddoemcli command

[root@scam04cel14 sbin]# ./ddoemcli -c 1 -health
****************************************************************************
LSI Corporation WarpDrive Management Utility
Version 01.250.41.04 (2012.06.04)
Copyright (c) 2011 LSI Corporation. All Rights Reserved.
****************************************************************************
Raid Volume status = unconfigured.
--------------------------------
WarpDrive ELP-4x100-4d-n Health
--------------------------------
Backup Rail Monitor : GOOD
0.29

Note: Customers will have only “ddcli” as their command. “ddoemcli” is an Oracle internal
command.


[root@scam04cel14 sbin]# ./ddoemcli -c 1 -showvpd

****************************************************************************
LSI Corporation WarpDrive Management Utility
Version 01.250.41.04 (2012.06.04)
Copyright (c) 2011 LSI Corporation. All Rights Reserved.
****************************************************************************
-----------------------------------------------------------
VPD Information
-----------------------------------------------------------
Product Name : Sun Flash Accelerator F40 PCIe 2.0 Low Profile Adapter
PN : 7026993
EC : L3-25487-02B
SN : 464168P+1223000600
VA : Flash HBA
VB : 0000
V1 : LSI Corporation
V2 : 1000
V3 : 007E
V4 : 108E
V5 : 0581
V6 : 17.6W
V7 : 5.8W
V8 : 0.1W
MN : 10080
RV : 0x8e
V1 : SP22232067
V3 : 01
V4 : A3
V6 : V6
V7 : P
-----------------------------------------------------------

LSI WarpDrive Management Utility: Execution completed successfully.


System Component Query Commands

IB Switch Version:
[root ~]# version
SUN DCS 36p version: 1.3.3-2
Build time: Apr 4 2011 11:15:19
SP board info:
Manufacturing Date: 2009.02.19
Serial Number: "NCD2T0060"
Hardware Revision: 0x0100
Firmware Revision: 0x0102
BIOS version: NOW1R112
BIOS date: 04/24/2009

Determine the Version of System Components
Note: The NM2 InfiniBand switch software should be, at minimum, version 1.0.1-1.


System Component Query Commands

Verify Service LED status:


-> show /SYS/SERVICE

/SYS/SERVICE
Targets:

Properties:
type = Indicator
ipmi_name = SERVICE
value = Off

Hardware Diagnostic Best Practices
Check all the systems to see if any of the Service LEDs are lit. The following command can
be run on the ILOMs of the individual systems, or the dcli-based command can be run from
the DB01 system. The results should all show the LEDs to be off.
# dcli -l root -g half "ipmitool sunoem cli 'show /SYS/SERVICE' | grep value"
192.168.1.1: value = Off
192.168.1.2: value = Off
192.168.1.3: value = Off
192.168.1.4: value = Off
192.168.1.5: value = Off
192.168.1.6: value = Off
192.168.1.7: value = Off
192.168.1.8: value = Off
192.168.1.9: value = Off
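
With many nodes, it can be easier to print only the hosts whose Service LED is not off. The following is a minimal sketch, assuming the "host: value = state" lines shown above; service_led_not_off is a hypothetical helper name.

```shell
# Sketch: print the hosts whose Service LED value is anything other than Off.
# Reads the dcli output shown above on standard input.
# service_led_not_off is a hypothetical helper, not a dcli or ILOM feature.
service_led_not_off() {
    awk '/value =/ && $NF != "Off" {
        host = $1
        sub(/:$/, "", host)   # drop the colon that dcli appends to the host
        print host
    }'
}

# Usage from the first database node:
#   dcli -l root -g half "ipmitool sunoem cli 'show /SYS/SERVICE' | grep value" | service_led_not_off
```

No output means that every reporting host showed the LED as Off.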



System Component Query Commands

Check for faults in the fault management shell.


-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y

faultmgmtsp> fmadm faulty
No faults found

Enter exit to leave the shell.


Hardware Diagnostics: Server

Verify that the Master Serial Number is set and correct.


# dcli -l root -g full 'ipmitool sunoem cli "show /SP | grep system_identifier" ' | grep "="

192.168.1.1: system_identifier = Exadata Database Machine X2-2 0938AK20E6
192.168.1.2: system_identifier = Exadata Database Machine X2-2 0938AK20E6
192.168.1.3: system_identifier = Exadata Database Machine X2-2 0938AK20E6
<snip>

Note: Valid identifier strings should be one of the following:
• Sun Oracle Database Machine 0938AK20E6


• Exadata Database Machine X2-2 1033AK213A
• Exadata Database Machine X2-8 1033AK21C0
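
Because every node in a rack should report the same Master Serial Number, one way to spot a mismatch is to reduce the dcli output to its distinct identifier strings. The following is a minimal sketch, assuming one "host: system_identifier = ..." line per node; distinct_system_identifiers is a hypothetical helper name.

```shell
# Sketch: reduce the dcli output to the distinct system_identifier values.
# Reads the dcli output on standard input, one line per node.
# distinct_system_identifiers is a hypothetical helper name.
distinct_system_identifiers() {
    sed -n 's/^.*system_identifier *= *//p' |
        sed 's/[[:space:]]*$//' |
        sort -u
}

# Usage: more than one line of output means that the nodes disagree.
#   dcli -l root -g full '...' | grep "=" | distinct_system_identifiers
```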



Hardware Diagnostics: Server

Verify that ASR is available.


Compute Nodes:
# dcli -g dbs_group -l root "/opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -get_snmp_subscribers -type asr"

Storage Nodes:
# dcli -g cell_group -l celladmin "cellcli -e list cell attributes snmpsubscriber"

On ILOM, you cannot confirm directly that ASR is enabled. You can confirm only that it has
not been enabled. To verify that the SNMP services are being monitored (not necessarily by
ASR), check whether any of the SNMP rule sets has been configured with a destination address.
-> show /SP/alertmgmt/rules/[1-15]


Hardware Diagnostics: Server

Check the status (age) of the flash card Energy Storage Module (ESM): Exadata X2-2 only

-> show /SYS/MB/RISER2/PCIE2/F20CARD/UPTIME

/SYS/MB/RISER2/PCIE2/F20CARD/UPTIME
Targets:

Properties:
type = Power Unit
ipmi_name = PCIE2/F20/UP
class = Threshold Sensor
value = 768.000 Hours
upper_nonrecov_threshold = 17500.000 Hours
upper_critical_threshold = 17200.000 Hours
upper_noncritical_threshold = 16800.000 Hours
<snip>

The four locations of the Flash20 cards on an Exadata storage cell are:
/SYS/MB/RISER1/PCIE1/
/SYS/MB/RISER1/PCIE4/
/SYS/MB/RISER2/PCIE2/
/SYS/MB/RISER2/PCIE5/
Check the system for messages indicating that an ESM has failed or is near the end of its
useful life. The messages should be similar to one of the following:
/SYS/MB/RISER1/PCI4/F20CARD ESM is approaching its lifespan. Please
schedule a replacement as soon as possible.
Or
/SYS/MB/RISER2/PCI5/F20CARD ESM has exceeded its lifespan. Please
schedule a replacement as soon as possible.
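
The UPTIME properties can also be used to estimate how many hours remain before the first warning threshold. The following is a minimal sketch, assuming the "name = value Hours" property layout of the UPTIME example; esm_hours_remaining is a hypothetical helper name.

```shell
# Sketch: hours left before the ESM reaches upper_noncritical_threshold.
# Reads the ILOM "show .../F20CARD/UPTIME" properties on standard input.
# esm_hours_remaining is a hypothetical helper, not an ILOM feature.
esm_hours_remaining() {
    awk -F'=' '
        $1 ~ /^[ \t]*value[ \t]*$/          { used  = $2 + 0 }
        $1 ~ /upper_noncritical_threshold/  { limit = $2 + 0 }
        END { if (limit > 0) printf "%d\n", limit - used }'
}

# Usage: save or pipe the property listing for one card into the function.
#   esm_hours_remaining < uptime_properties.txt
```

The file name in the usage line is illustrative only. With the values from the UPTIME example (768 hours used, 16800-hour threshold), the result is 16032.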



Hardware Diagnostics: Server

Check the status of the RAID card for all hosts.


# dcli -g half -l root '/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 | grep Temperature:'

192.168.1.1: Temperature: 42 C
192.168.1.2: Temperature: 47 C
192.168.1.3: Temperature: 33 C
192.168.1.4: Temperature: 40 C
192.168.1.5: Temperature: 33 C
<snip>
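
To pick out the warmest battery backup units from a long listing, the temperature column can be compared against a limit. The following is a minimal sketch, assuming the "host: Temperature: N C" lines of the BBU example; bbu_over_temp is a hypothetical helper name, and the 45 C limit in the usage line is an arbitrary illustration, not an Oracle-specified threshold.

```shell
# Sketch: print hosts whose BBU temperature exceeds a caller-supplied limit.
# $1 = limit in degrees C; dcli output is read on standard input.
# bbu_over_temp is a hypothetical helper, not a MegaCli or dcli feature.
bbu_over_temp() {
    limit=$1
    awk -v max="$limit" '$2 == "Temperature:" && $3 + 0 > max + 0 {
        host = $1
        sub(/:$/, "", host)   # drop the colon that dcli appends to the host
        print host, $3
    }'
}

# Usage (the limit value is illustrative only):
#   dcli -g half -l root '/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 | grep Temperature:' | bbu_over_temp 45
```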


Hardware Diagnostics: Server

Check the system event log for problems.


# ipmitool sel list
794 | 11/11/2008 | 18:40:20 | Temperature #0x27 | Upper Noncritical going high
7a8 | 11/11/2008 | 18:40:21 | Temperature #0x27 | Upper Noncritical going high

Hardware Diagnostic Best Practices
Check for memory (ECC) errors with the command:
# ipmitool sel list | grep ECC | cut -f1 -d : | sort -u
This command enables you to identify any memory (ECC) errors. If such errors exist,
reseating the DIMM might fix the problem. In some cases, the entire motherboard may have
to be replaced.
Symptoms of memory (ECC) errors include the following:
• The amount of system time for certain operations is unpredictably large, causing
network transfers to be slow, which can also result in application timeouts.
• The IPMI Linux driver times out on operations with ILOM.
• A message such as IPMI message handler: BMC returned incorrect
response, expected netfn b cmd 22, got netfn 1 cmd 0 appears in the
/var/log/messages file.



Hardware Diagnostics: Server

Obtain proper diagnostics for hardware issues:


# /opt/oracle.SupportTools/sundiag.sh

Note: This command collects diagnostics for the entire system.
When complete, the report files are bzip2 compressed in /tmp.
For a complete list of the commands run by the script, see Appendix A.

Check the ambient temperature if drive throughput degrades.
Drive throughput can degrade significantly without proper cooling. The following command
sequence can be used to check temperatures on all cells:
while true; do date; dcli -g cell_group -l root "ipmitool sensor |
grep 'Inlet Amb Temp'"; sleep 60; done



Hardware Diagnostics: Server

Use the sosreport package for problem diagnosis:


# /opt/oracle.cellos/vldrun -script sosreport
or
# /opt/oracle.cellos/sreport.sh

The sosreport package automates the process of collecting the relevant trace files when an
error occurs. This package should be used to ensure that Oracle Support obtains all the
necessary information for root cause analysis.
The following is an example of packaging an incident:
# /opt/oracle.cellos/sreport.sh

sosreport (version 1.7)
This utility will collect some detailed information about the
hardware and setup of your Enterprise Linux system. The information
is collected and an archive is packaged under /tmp, which you can
send to a support representative.
This information will be used for diagnostic purposes only and it
will be considered confidential information.
This process may take a while to complete.
No changes will be made to your system.
Press ENTER to continue, or CTRL-C to quit.
Please enter your first initial and last name [trnacel07]: dwinter
Please enter the case number that you are generating this report
for: 10132345
Progress [###################100%##################][05:51/05:51]



Creating compressed archive
Your sosreport has been generated and saved in:
/tmp/sosreport-dwinter.10132345-817953-683b39.tar.bz2
The md5sum is: 5a249a63e062f723cfc8b23fc6683b39
Please send this file to your support representative.
At this point, the compressed archive is in /tmp and ready to send to Oracle Support.
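
Because sosreport prints an md5sum for the archive, the file can be checked against that value before or after it is transferred. The following is a minimal sketch, assuming the standard md5sum utility is available; md5_matches is a hypothetical helper name.

```shell
# Sketch: succeed only if a file's MD5 digest equals the expected value.
# $1 = path to the archive, $2 = digest reported by sosreport.
# md5_matches is a hypothetical helper name.
md5_matches() {
    actual=$(md5sum "$1" | awk '{ print $1 }')
    [ "$actual" = "$2" ]
}

# Usage with the archive name and digest from the example run:
#   md5_matches /tmp/sosreport-dwinter.10132345-817953-683b39.tar.bz2 \
#       5a249a63e062f723cfc8b23fc6683b39 && echo "archive intact"
```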

Understand that certain error messages do not indicate an application problem. The initial
production release includes various benign error messages that might seem critical but, in
fact, do not cause a problem.
Error messages in this category include:
ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11
or
RDS/IB: send completion on 192.168.50.24 had status 12, disconnecting and reconnecting
These messages can occur in /var/log/messages during various failure scenarios. They
should not affect application performance, but all failures should still be investigated.
HW address shared conflict between ib0 and bond0 warning message:
(A) Warning: the permanent HWaddr of ib0 - 80:00:00:48:FE:80 - is still in use by bond0.
Set the HWaddr of ib0 to a different address to avoid conflicts.
This message appears on ifdown bond0 when ib0 happens to be the active interface. This is
not a problem.
(B) During bootup, messages regarding a "missing /sys/class/net/bond0/bonding/slaves" file
come up. These messages come up because the ib0/ib1 interfaces are configured by openibd
before the bond0 interface is configured. These messages are okay as well and do not pose a
problem.
A “Line 109 cannot find the file” message from ifup-eth appears on up/down of IB interfaces
and on shutdown/boot.
First boot warnings on compute node: These are harmless and appear only on first boot
because of the way first boot loads. It loads only certain modules in the lower run level.



Hardware Diagnostic Best Practices: Diagnosing Network Cabling Issues
Examining ifconfig output and looking for a missing RUNNING attribute can reveal loose or
improperly connected cables.
For example, a working connected cable has a line such as the following:
“UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1”
But a loose or unconnected cable can have a line such as the following:

“UP BROADCAST MULTICAST MTU:1500 Metric:1”


This strategy is viable for both Ethernet and InfiniBand interfaces.
An alternative way to check for the same thing is to look at
/sys/class/net/<interface>/carrier.
1 means a good connection; 0 means a problem with the connection.
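
The carrier check can be repeated across every interface in one pass. The following is a minimal sketch, assuming the /sys/class/net/<interface>/carrier layout described above; no_carrier_interfaces is a hypothetical helper name, and its optional argument exists only so that the function can be pointed at a different directory.

```shell
# Sketch: list interfaces whose carrier file does not read 1.
# $1 = directory to scan (defaults to /sys/class/net).
# no_carrier_interfaces is a hypothetical helper name.
no_carrier_interfaces() {
    net_root=${1:-/sys/class/net}
    for dir in "$net_root"/*; do
        [ -e "$dir/carrier" ] || continue
        # Reading carrier can fail when an interface is administratively
        # down; the empty result is then reported like a missing link.
        state=$(cat "$dir/carrier" 2>/dev/null)
        [ "$state" = "1" ] || echo "${dir##*/}"
    done
}

# Usage on a database node or storage cell:
#   no_carrier_interfaces
```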



Hardware Diagnostics: Server

Oracle OSWatcher: System data gathering and reporting utilities. The gathered data is
stored in archive subdirectories.

# /opt/oracle.oswatcher/osw/archive

The following data is gathered:
Every 10 seconds
rds-info -c

Every 15 seconds
iostat -x 1 3
vmstat 1 2
top
ps -elf
mpstat 1 2
cat /proc/meminfo
cat /proc/slabinfo



Every 2 minutes
lsof +c0 -w +L -b -R -X
lsof +c0 -w +L -b -R -i
lsof +c0 -w +L -b -R -U
lsof +c0 -w +L1 -b -R

Every 5 minutes
ibstatus
ib-bond --status
ibv_devinfo
ifconfig -a

Every hour
ipmitool sel list

Data is retained for 120 hours in bzip2-compressed files in the appropriate archive
subdirectory.
The osw directory has a readme file and utilities to report and graph some types of
collected data.

Linux Kernel Crash Core Files
The cells and database nodes of the Database Machine are configured to generate Linux
kernel crash core files in the /var/crash directory, when a Linux crash utility is located. The
crash utility can be used to analyze the crash files. The crash files are automatically removed
by the OSWatcher utility so that the files do not occupy more than 10 percent of the free disk
space on the file system. Older crash files are removed first.


Automatic Diagnostic Repository
Trace files from the storage servers or database nodes should be collected by using
Automatic Diagnostic Repository (ADR).
As of Oracle Database Release 11g R2, the alert log, all trace and dump files, and other
diagnostic data are stored in the ADR.
Note: These locations and directories are created and configured during software installation
and might not be available until the installation is complete.

Important Locations for Trace Files in the Storage Servers
The $LOG_HOME variable defines the location of important trace files for processes such as
cellsrv, ms, and rs.
$ env |grep LOG
LOG_HOME=/opt/oracle/cell/cellsrv/deploy/log
Using adrci as celladmin:
$ adrci
ADRCI: Release 11.1.0.7.0 - Production on Fri Sep 19 16:07:20 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.
ADR base = "/opt/oracle/cell/log"
adrci> show home
ADR Homes:
diag/asm/cell/sc01.us.oracle.com
adrci> show tracefile
diag/asm/cell/sc01.us.oracle.com/trace/svtrc_18335_6.trc
diag/asm/cell/sc01.us.oracle.com/trace/svtrc_8030_0.trc
diag/asm/cell/sc01.us.oracle.com/trace/rstrc_21240_1.trc
adrci>
The trace files are located at $ADR_BASE/diag/asm/cell/<hostname>.
This location also holds other important files, for example, alert.log.


Hardware Diagnostic Best Practices: Diagnose Underperforming (“Sick”) Cells and/or
Disks
Some performance problems can be caused by a cell or disk that is “sick.” A “sick” cell or disk
can be caused by conditions such as excessive temperature, excessive bad block relocations,
rereads due to bad sectors, and so on. Use the following guidelines to diagnose a
performance problem that is caused by a “sick” cell or disk:

The CellCLI calibrate command can be used to determine if disks are operating according to
expectations.
Note: Calibrate should be run when the application is not accessing the cell so that it does not
itself cause an application performance problem. Preferably, the cellsrv process should be
completely shut down when running calibrate. The force option is required if cellsrv is up.
Calibrate will report any disk that is suboptimal. If a disk is reported as suboptimal, perform
the following.
Look for evidence of unrecoverable I/O errors in the following locations:
• The CellCLI command list physicaldisk detail shows the hardReadErrors and
hardWriteErrors attributes, which increase with each occurrence.
Note: It is better to monitor these values while the workload is running than to draw a
conclusion from the initial value seen.
• The cell-side alert log reports unrecoverable I/O errors with the "io_getevents err"
string in them.
• The database and ASM alert logs report messages with "IO Failed" when
encountering these unrecoverable errors.
- If the I/O is a write, the disk will be offlined.
- If the I/O is a read, just the message will be written.
• In both cases, the secondary extent is leveraged to acquire the necessary data. Disks
reporting these errors excessively should be dropped from the disk group with new disks
taking their place.


It is also useful to compare I/O performance metrics (such as queue time and service time)
across cells and ensure that they are within specifications and consistent across all cells. If
one cell and/or one disk is underperforming and causing a performance impact, consider
dropping that cell/disk from the disk group temporarily, root-causing and fixing the issue, and
then adding it back to the disk group.
• Disk controller configuration can be verified with the following command:
- MegaCli64 -adpallinfo -a0
• Disk controller firmware logs can be checked with the following command:
- MegaCli64 -fwtermlog -dsply -a0
In some cases, where a disk is performing poorly, simply reseating the drive can resolve the
issue. Ensure that the disk is taken offline (can be done by inactivating grid disk) before
performing this operation.
Write performance can be affected if the drives have been set back to the writethrough mode
from the default writeback. This can occur if there is a problem with the controller battery or if
it is in a special mode, such as a learn cycle. The writethrough/writeback status of the disks
can be checked with the following command:

MegaCli64 -LDInfo -Lall -aALL | grep 'Current Cache Policy'
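
The cache policy check can be reduced to a single number. The following is a minimal sketch; it assumes that MegaCli prints lines of the form "Current Cache Policy: WriteBack, ..." or "Current Cache Policy: WriteThrough, ...", which is not shown in this guide, and writethrough_count is a hypothetical helper name.

```shell
# Sketch: count logical drives currently running in WriteThrough mode.
# Reads `MegaCli64 -LDInfo -Lall -aALL` output on standard input.
# The "Current Cache Policy: WriteThrough" line format is an assumption.
# writethrough_count is a hypothetical helper name.
writethrough_count() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c 'Current Cache Policy: *WriteThrough' || true
}

# Usage:
#   MegaCli64 -LDInfo -Lall -aALL | writethrough_count
```

A result of 0 means that no logical drive reported the writethrough mode.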


Hardware Diagnostics: InfiniBand

InfiniBand Diagnostics on Compute Node


Unauthorized reproduction or distribution prohibitedฺ Copyright© 2017, Oracle and/or its affiliatesฺ

• Copy the GetIBData.sh script (see below) to any


compute node. Execute the script and provide the
IBLogs*.gz log file.
• Execute the following commands to collect the information
and provide the /tmp/IBLogs*.gz file:
# cd /tmp
r a ble
e
# sh /tmp/GetIBData.sh nsf tra
n -
ano
a s
h ideฺ
)
ฺ c om t Gu
i s -ea uden
@ c S t
sry e thi s
a
lm© 2013,uOracle
s and/or its affiliates. All rights reserved.
i e ฺ e
Copyright
t o
y ( el nse
a sr lice
lM
#!/bin/sh
# script GetIBData.sh
# To collect data related to the IB net
#
rm -f /tmp/IBLogs_`hostname`.log
echo "####### Collecting data...." >> /tmp/IBLogs_`hostname`.log
echo `date` >> /tmp/IBLogs_`hostname`.log
echo "####### Checkboot..." >> /tmp/IBLogs_`hostname`.log
checkboot >> /tmp/IBLogs_`hostname`.log
echo "####### ibswitches..." >> /tmp/IBLogs_`hostname`.log
ibswitches >> /tmp/IBLogs_`hostname`.log
echo "####### ibhosts..." >> /tmp/IBLogs_`hostname`.log
ibhosts >> /tmp/IBLogs_`hostname`.log
echo "####### ibnetdiscover..." >> /tmp/IBLogs_`hostname`.log
ibnetdiscover >> /tmp/IBLogs_`hostname`.log
echo "####### listlinkup..." >> /tmp/IBLogs_`hostname`.log
listlinkup >> /tmp/IBLogs_`hostname`.log
echo "####### ibcheckstate..." >> /tmp/IBLogs_`hostname`.log
ibcheckstate -v >> /tmp/IBLogs_`hostname`.log
echo "####### ibdiagnet..." >> /tmp/IBLogs_`hostname`.log
ibdiagnet >> /tmp/IBLogs_`hostname`.log
echo "####### sminfo..." >> /tmp/IBLogs_`hostname`.log
sminfo >> /tmp/IBLogs_`hostname`.log
echo "####### ibdiagnet -c 100 -P all=1..." >> /tmp/IBLogs_`hostname`.log
ibdiagnet -c 100 -P all=1 >> /tmp/IBLogs_`hostname`.log

# tar -czvf IBLogs_`hostname`.gz /tmp/IBLogs*.log /tmp/ibdiagnet.* /var/log/boot.log /var/log/secure /var/log/messages /var/log/opensm*


Hardware Diagnostics: InfiniBand

InfiniBand Switch Configuration and Diagnostics


Two users have been defined for NM2:
• ilom-admin (password changeme)
This is the default user account for NM2. ilom-admin
will be able to monitor the chassis and read health-related
information from the chassis.
• root (password welcome1)
The root user will be able to perform standard “system
management” on NM2. In addition, many production and
service commands will be available only to the
superuser.


Hardware Diagnostics: InfiniBand
[Figure: InfiniBand layered architecture (OpenFabrics stack). The diagram shows, from top to
bottom: the application level, the user-space APIs (user-level MAD API, UDAPL, OpenFabrics
user-level verbs/API, SDP library), the kernel-space upper-layer protocols (EoIB, IPoIB, SDP,
SRP, iSER, RDS, NFS-RDMA RPC, cluster file systems), the mid-layer (SA client, MAD, SMA,
connection managers, Connection Manager Abstraction [CMA], OpenFabrics kernel-level
verbs/API), the hardware-specific provider drivers, and the InfiniBand HCA and iWARP R-NIC
hardware.]

Key:
SA      Subnet Administrator
MAD     Management Datagram
SMA     Subnet Manager Agent
PMA     Performance Manager Agent
IPoIB   IP over InfiniBand
SDP     Sockets Direct Protocol
SRP     SCSI RDMA Protocol (Initiator)
iSER    iSCSI RDMA Protocol (Initiator)
RDS     Reliable Datagram Service
UDAPL   User Direct Access Programming Lib
HCA     Host Channel Adapter
R-NIC   RDMA NIC

InfiniBand Layered Architecture: Troubleshoot based on layers.


Hardware Diagnostics: InfiniBand
ILOM/Fabric Monitor
If the IB switch ILOM Web UI is not accessible, first check network settings. If the switch
network settings look okay, check the following setting through the spsh:
-> cd /SP/services/http
/SP/services/http

-> ls
Properties:
port = 80
secureredirect = enabled
servicestate = disabled

-> set servicestate=enabled


Set 'servicestate' to 'enabled'


Hardware Diagnostics: InfiniBand
ILOM/Fabric Monitor
The System Info tab displays status information regarding the switch hardware, including
basic information about the management controller, firmware version and build date, FRU IDs
for the chassis and power supplies, and status of the power supplies and fans.

The Sensor Info tab displays the latest hardware sensor readings for the switch’s power
supplies and fans, including the current voltage and temperature values.
The IB Performance tab displays the status and available bandwidth of the switch ports by
using a table format. By clicking a column heading, the information in the table is sorted
according to that column heading, either in ascending or descending order.
The IB Port Map tab displays information about peer devices attached to the switch by using a
table format.
The Subnet Manager tab displays the current SM settings for this switch, including whether or
not it is the master SM, along with the priority of the SM.



Hardware Diagnostics: InfiniBand
ILOM/Fabric Monitor
The rear panel diagram displays four gateway receptacles, labeled 0A, 1A, 0B, and 1B. When
a connector is physically present in a gateway receptacle, the receptacle changes from a
black rectangle to a gray rectangle with four indicators. Each indicator represents one of the
four possible ports available at the connection. The rectangles left of the gateway connection
are the BX indicators, which display the status of the internal switch hardware connections.
Moving the cursor over a BX indicator opens a small window that provides information about
the BridgeX port. If the indicator is red, the window displays a reason for the respective state.
Clicking a gray gateway connector opens a window that displays connector FRU and port
information for that connection. At the top of the window is the connector name. There are two
parts of the window: the cable FRU ID information on the left, and a smaller status pane for
the ports on the right. Clicking a tab displays that port’s information, including the physical
state, logical state, protocol, and so on.



Hardware Diagnostics: InfiniBand
ILOM/Fabric Monitor
The rear panel diagram displays the presence of connectors and their status by using a
graphic. The diagram displays the management controller’s IP address, the connector
receptacles, and their respective connector names. When a cable is attached to a receptacle,
a connection is made. That connection is displayed in the diagram as a gray rectangle, with
three or four smaller indicators. Moving the cursor over an indicator, clicking an indicator, or
clicking a connection opens a window that provides additional information about that indicator
or connection.
In the rear panel diagram, there are 32 InfiniBand receptacles displayed, labeled 0A to 15A
and 0B to 15B. When a connector is physically present in an InfiniBand receptacle, the
receptacle changes from a black rectangle to a gray rectangle with three indicators. Moving
the cursor over an indicator that is orange or red opens a small window that provides the
reason for the respective state. A center indicator is orange when the link is at a speed slower
than QDR, such as SDR or DDR. A right indicator is red when there are significant errors
(symbol, recovery, and so on) on the link.
Clicking a gray InfiniBand connector opens a window that displays connector FRU, port state,
error, and statistical information for that connection.



InfiniBand Utilities Descriptions

ibstatus

Description:
ibstatus is a script that displays basic information obtained
from the local IB driver. Output includes LID, SMLID, port
state, link width active, and port physical state.
Syntax:
ibstatus [-h] [devname[:port]]...
Examples
# ibstatus
InfiniBand device 'is4_0' port 0 status:
default gid: fe80:0000:0000:0000:0021:283a:87cb:a0a0
base lid: 0x1
sm lid: 0x12
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
See also:
ibstat
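Because ibstatus prints one "state" and one "phys state" line per port, its output is easy to screen for ports that are not healthy. The following is a minimal sketch under that assumption; the function name ibstatus_unhealthy is hypothetical and the helper is not part of the InfiniBand utilities.

```shell
#!/bin/sh
# ibstatus_unhealthy (hypothetical helper): reads ibstatus output on stdin and
# prints one line for each port whose state is not ACTIVE or whose physical
# state is not LinkUp. Prints nothing when every port is healthy.
ibstatus_unhealthy() {
    awk '
        $1 == "InfiniBand" && $2 == "device" { port = $3 " port " $5 }
        $1 == "state:"                       { if ($NF != "ACTIVE") print port ": state " $NF }
        $1 == "phys" && $2 == "state:"       { if ($NF != "LinkUp") print port ": phys state " $NF }
    '
}

# Usage (not executed here):
#   ibstatus | ibstatus_unhealthy
```

For the example output shown above, the helper prints nothing, because the port is ACTIVE and LinkUp.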



InfiniBand Utilities Descriptions

ibroute

Description:
ibroute uses SMPs to display the forwarding tables (unicast
[LinearForwardingTable or LFT] or multicast
[MulticastForwardingTable or MFT]) for the specified switch LID and
the optional LID (mlid) range.
The default range is all valid entries in the range 1...FDBTop.
Syntax:
ibroute [options] <switch_addr> [<startlid> [<endlid>]]
Nonstandard flags:
-a Show all lids in range, even invalid entries.
-n Do not try to resolve destinations.
-M Show multicast forwarding tables. In this case, the range
parameters are specifying mlid range.
Examples
# ibroute 1
Unicast lids [0x0-0x19] of switch Lid 1 guid 0x0021283a87cba0a0 (Sun DCS 36 QDR LC switch burxsw-ib2.east.sun.com):
Lid Out Destination
Port Info
0x0001 000 : (Switch portguid 0x0021283a87cba0a0: 'Sun DCS 36 QDR LC switch burxsw-ib2.east.sun.com')
0x0002 003 : (Channel Adapter portguid 0x00212800013e70bb: 'burxcel04 C 192.168.20.114 HCA-1')
0x0003 013 : (Channel Adapter portguid 0x00212800013e6e00: 'burxdb03 S 192.168.20.123 HCA-1')
0x0004 008 : (Channel Adapter portguid 0x00212800013e6c13: 'burxcel07 C 192.168.20.117 HCA-1')
0x0005 014 : (Channel Adapter portguid 0x00212800013e6c14: 'burxcel07 C 192.168.20.117 HCA-1')
0x0006 012 : (Channel Adapter portguid 0x00212800013e6c23: 'burxdb04 S 192.168.20.124 HCA-1')
0x0007 015 : (Channel Adapter portguid 0x00212800013e6c24: 'burxdb04 S 192.168.20.124 HCA-1')
0x000a 004 : (Channel Adapter portguid 0x00212800013e6c7f: 'burxcel03 C 192.168.20.113 HCA-1')
0x000b 005 : (Channel Adapter portguid 0x00212800013e6e7f: 'burxcel06 C 192.168.20.116 HCA-1')
0x000c 017 : (Channel Adapter portguid 0x00212800013e6c80: 'burxcel03 C 192.168.20.113 HCA-1')
0x000d 018 : (Channel Adapter portguid 0x00212800013e6e80: 'burxcel06 C 192.168.20.116 HCA-1')
0x000e 007 : (Channel Adapter portguid 0x00212800013e6d83: 'burxdb01 S 192.168.20.121 HCA-1')
…<output omitted>
0x0017 001 : (Channel Adapter portguid 0x00212800013e6ccf: 'burxcel02 C 192.168.20.112 HCA-1')
0x0018 016 : (Channel Adapter portguid 0x00212800013e6cd0: 'burxcel02 C 192.168.20.112 HCA-1')
0x0019 009 : (Channel Adapter portguid 0x00212800013e6dff: 'burxdb03 S 192.168.20.123 HCA-1')
22 valid lids dumped

See also:
ibtracert
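Each entry in the unicast forwarding table starts with the destination LID, followed by the switch port used to reach it. The following is a minimal sketch for looking up the out port for one LID; the function name lft_out_port is hypothetical, and the sketch assumes the "<lid> <port> : (...)" entry layout shown in the ibroute example.

```shell
#!/bin/sh
# lft_out_port (hypothetical helper): reads ibroute unicast output on stdin and
# prints the out port for the LID given as $1, written as it appears in the
# listing (for example 0x0003). Prints nothing if the LID is not listed.
lft_out_port() {
    awk -v lid="$1" '$1 == lid && $3 == ":" { print $2 + 0 }'
}

# Usage (not executed here):
#   ibroute 1 | lft_out_port 0x0003
```

Comparing the port returned here with the cabling seen by listlinkup on the switch can help confirm which physical connector carries the traffic for a given node.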



InfiniBand Utilities Descriptions

ibtracert

Description:
ibtracert uses SMPs to trace the path from a source
GID/LID to a destination GID/LID. Each hop along the path is
displayed until the destination is reached or a hop does not
respond. By using the -m option, multicast path tracing can
be performed between source and destination nodes.
Syntax:
ibtracert [options] <src-addr> <dest-addr>
Nonstandard flags:
-n Simple format; don't show additional information.
-m <mlid> Show the multicast trace of the specified mlid.

Examples
# ibtracert 1 2
From switch {0x0021283a87cba0a0} portnum 0 lid 1-1 "Sun DCS 36 QDR
LC switch burxsw-ib2.east.sun.com"
[3] -> ca port {0x00212800013e70bb}[1] lid 2-2 "burxcel04 C
192.168.20.114 HCA-1"
To ca {0x00212800013e70ba} portnum 1 lid 2-2 "burxcel04 C
192.168.20.114 HCA-1"



InfiniBand Utilities Descriptions

ibping

Description:
ibping uses vendor mads to validate connectivity between
IB nodes. On exit, (IP) ping-like output is shown. ibping is
run as client/server. The default is to run as client. Note also
that a default ping server is implemented within the kernel.
Syntax:
ibping [options] <dest lid|guid>
Nonstandard flags:
-c <count> Stop after count packets.
-f Flood destination: send packets back to back without delay.
-o <oui> Use specified OUI number to multiplex vendor mads.
-S Start in server mode (do not return).



InfiniBand Utilities Descriptions

ibnetdiscover

Description:
ibnetdiscover performs IB subnet discovery and outputs a
human-readable topology file. GUIDs, node types, and port
numbers are displayed, as well as port LIDs and
NodeDescriptions. All nodes (and links) are displayed (full
topology).
Syntax:
ibnetdiscover [options] [<topology-filename>]
Optionally, you can use this utility to list the currently connected nodes. The output is
printed to the standard output unless a topology file is specified.
Nonstandard flags:
-l List of connected nodes
-H List of connected HCAs
-S List of connected switches
-g Grouping



InfiniBand Utilities Descriptions

ibhosts

Description:
ibhosts either walks the IB subnet topology or uses an
already saved topology file and extracts the Channel
Adapter (CA) nodes.
Syntax:
ibhosts [-h] [<topology-file>]
Dependencies:
ibnetdiscover, ibnetdiscover format


InfiniBand Utilities Descriptions

ibswitches

Description:
ibswitches either walks the IB subnet topology or uses an
already saved topology file and extracts the IB switches.
Syntax:
ibswitches [-h] [<topology-file>]
Dependencies:
ibnetdiscover, ibnetdiscover format
Examples
# ibswitches
Switch : 0x0021283a87cba0a0 ports 36 "Sun DCS 36 QDR LC switch sw-ib2.east.sun.com" enhanced port 0 lid 1 lmc 0
Switch : 0x0021283a87b8a0a0 ports 36 "Sun DCS 36 QDR LC switch sw-ib3.east.sun.com" enhanced port 0 lid 18 lmc 0


InfiniBand Utilities Descriptions

ibchecknet

Description:
ibchecknet uses a full topology file that was created by
ibnetdiscover, scans the network to validate the
connectivity, and reports errors (from port counters).
Syntax:
ibchecknet [-h] [<topology-file>]
Dependencies:
ibnetdiscover, ibnetdiscover format, ibchecknode, ibcheckport, and ibcheckerrs
Examples
# ibchecknet
#warn: counter SymbolErrors = 65533 (threshold 10) lid 1 port 255
Error check on lid 1 (Sun DCS 36 QDR LC switch burxsw-ib2.east.sun.com) port all: FAILED
#warn: counter SymbolErrors = 65535 (threshold 10) lid 18 port 255
#warn: counter RcvSwRelayErrors = 152 (threshold 100) lid 18 port 255
Error check on lid 18 (Sun DCS 36 QDR LC switch burxsw-ib3.east.sun.com) port all: FAILED

# Checking Ca: nodeguid 0x00212800013e6c22

#warn: counter XmtDiscards = 2661 (threshold 100) lid 7 port 2
Error check on lid 7 (burxdb04 S 192.168.20.124 HCA-1) port 2: FAILED
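On a full rack, the ibchecknet report can be long, so it can help to reduce the "#warn: counter" lines to one compact record per counter. The following is a minimal sketch; the function name summarize_warnings is hypothetical, and it assumes that each warning is printed on a single line in the form shown in the example.

```shell
#!/bin/sh
# summarize_warnings (hypothetical helper): reads ibchecknet or ibcheckerrs
# output on stdin and prints "lid=<lid> port=<port> <counter>=<value>
# threshold=<threshold>" for every "#warn: counter" line.
summarize_warnings() {
    awk '$1 == "#warn:" && $2 == "counter" {
        thr = $7
        sub(/\)$/, "", thr)          # drop the closing parenthesis after the threshold
        printf "lid=%s port=%s %s=%s threshold=%s\n", $9, $11, $3, $5, thr
    }'
}

# Usage (not executed here):
#   ibchecknet | summarize_warnings | sort
```

Sorting the result groups the counters by LID, which makes it easier to see whether errors cluster on one switch or one HCA port.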



InfiniBand Utilities Descriptions

ibcheckerrs

Description:
ibcheckerrs checks the specified port (or node) and reports
errors that surpassed their predefined threshold. The port
address is LID unless the -G option is used to specify a GUID
address.
Syntax:
ibcheckerrs [-h] [-G] [-t <threshold_file>] [-s(how_thresholds)] <lid|guid> [<port>]
The predefined thresholds can be dumped by using the -s option, and a user-defined
threshold_file (using the same format as the dump) can be specified by using the -t <file>
option.
Examples
# ibcheckerrs 1
#warn: counter SymbolErrors = 65533 (threshold 10) lid 1 port 255
Error check on lid 1 (Sun DCS 36 QDR LC switch burxsw-ib2.east.sun.com) port all: FAILED
Dependencies
perfquery, perfquery output format, ibaddr



InfiniBand Utilities Descriptions

ibportstate

Description:
ibportstate allows the port state and port physical state of
an IB port to be queried or a switch port to be disabled or
enabled.
Syntax:
ibportstate [-d(ebug) -e(rr_show) -v(erbose) -D(irect) -G(uid) -s smlid -V(ersion)
-C ca_name -P ca_port -t timeout_ms] <dest dr_path|lid|guid> <portnum> [<op>]
supported ops: enable, disable, query
Examples
# ibportstate 3 1 disable # by lid
# ibportstate -G 0x2C9000100D051 1 enable # by guid
# ibportstate -D 0 1 # by direct route



InfiniBand Utilities Descriptions

ibcheckstate

Description:
ibcheckstate uses a full topology file that was created by
ibnetdiscover, scans the network to validate the port state and
port physical state, and reports any ports that have a port
state other than Active or a port physical state other than
LinkUp.
Syntax:
ibcheckstate [-h] [<topology-file>]
Dependencies
ibnetdiscover, ibnetdiscover format, ibchecknode, ibcheckportstate
Examples
# ibcheckstate

## Summary: 12 nodes checked, 0 bad nodes found


## 54 ports checked, 0 ports with bad state found
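The summary lines make ibcheckstate convenient to use in a scripted check. The following is a minimal sketch that extracts the number of ports reported with a bad state; the function name bad_state_count is hypothetical, and it assumes the summary wording shown in the example.

```shell
#!/bin/sh
# bad_state_count (hypothetical helper): reads ibcheckstate output on stdin and
# prints the number of ports reported with a bad state, taken from the
# "## <n> ports checked, <m> ports with bad state found" summary line.
bad_state_count() {
    awk '/ports with bad state found/ { print $5 + 0 }'
}

# Usage (not executed here):
#   bad=$(ibcheckstate | bad_state_count)
#   [ "$bad" = "0" ] || echo "ports with bad state: $bad"
```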



InfiniBand Utilities Descriptions

ibcheckportstate

Description:
ibcheckportstate checks the connectivity and the
specified port for proper port state (Active) and port physical
state (LinkUp). The port address is LID unless the -G option is
used to specify a GUID address.
Syntax:
ibcheckportstate [-h] [-G] <lid|guid> <port_number>
Examples
# ibcheckportstate 2 3
ibwarn: [28833] mad_rpc: _do_madrpc failed; dport (Lid 2)
smpquery: iberror: failed: operation portinfo: port info query failed
Port check lid 2 port 3: FAILED


InfiniBand Utilities Descriptions

ibcheckerrors

Description:
ibcheckerrors uses a full topology file that was created by
ibnetdiscover, scans the network to validate the
connectivity, and reports errors (from port counters).
Syntax:
ibcheckerrors [-h] [<topology-file>]
Dependencies:
ibnetdiscover, ibnetdiscover format, ibchecknode, ibcheckport, and ibcheckerrs


InfiniBand Utilities Descriptions

ibdiscover.pl

Description:
ibdiscover.pl uses a topology file that was created by
ibnetdiscover, a discover.map file that was created by the
network administrator (which indicates the nodes to be
expected), and an ibdiscover.topo file (which is the
expected connectivity). It produces a new connectivity file
(discover.topo.new) and outputs the changes to
stdout. The network administrator can choose to replace the
“old” topo file with the new one or incorporate certain changes
from the new file into the old file.
The syntax of the ibdiscover.map file is:
<nodeGUID>|port|"Text for node"|<NodeDescription from ibnetdiscover format>
Examples
8f10400410015|8|"ISR 6000"|# SW-NM2 port 0 lid 5
8f10403960558|2|"HCA 1"|# MT23108 InfiniHost Mellanox Technologies
The syntax of the old and new topo files (ibdiscover.topo and
ibdiscover.topo.new) is:
<LocalPort>|<LocalNodeGUID>|<RemotePort>|<RemoteNodeGUID>
10|5442ba00003080|1|8f10400410015
These topo files are produced by the ibdiscover.pl tool.
Syntax:
ibnetdiscover | ibdiscover.pl



InfiniBand Utilities Descriptions

ibnodes

Description:
ibnodes either walks the IB subnet topology or uses an
already saved topology file and extracts the IB nodes (CAs
and switches).
Syntax:
ibnodes [<topology-file>]
Dependencies:
ibnetdiscover and ibnetdiscover format
Examples
# ibnodes
Ca : 0x00212800013e6c22 ports 2 "db04 S 192.168.20.124 HCA-1"
Ca : 0x00212800013e6dfe ports 2 "db03 S 192.168.20.123 HCA-1"
Ca : 0x00212800013e6c12 ports 2 "cel07 C 192.168.20.117 HCA-1"
...
Ca : 0x00212800013e6cc6 ports 2 "cel01 C 192.168.20.111 HCA-1"
Ca : 0x00212800013e6cce ports 2 "cel02 C 192.168.20.112 HCA-1"
Switch : 0x0021283a87cba0a0 ports 36 "Sun DCS 36 QDR LC switch burxsw-ib2.east.sun.com" enhanced port 0 lid 1 lmc 0
Switch : 0x0021283a87b8a0a0 ports 36 "Sun DCS 36 QDR LC switch burxsw-ib3.east.sun.com" enhanced port 0 lid 18 lmc 0
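A quick way to confirm that every expected node is visible on the fabric is to count the channel adapters and switches in the ibnodes listing. The following is a minimal sketch; the function name count_node_types is hypothetical, and it assumes that each node line starts with "Ca" or "Switch" as in the example.

```shell
#!/bin/sh
# count_node_types (hypothetical helper): reads ibnodes output on stdin and
# prints the number of channel adapters and switches found.
count_node_types() {
    awk '
        $1 == "Ca"     { ca++ }
        $1 == "Switch" { sw++ }
        END            { printf "Ca=%d Switch=%d\n", ca, sw }
    '
}

# Usage (not executed here):
#   ibnodes | count_node_types
```

The counts can then be compared with the number of database servers, storage servers, and switches expected for the rack configuration.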



InfiniBand Utilities Descriptions

ibclearerrors

Description:
ibclearerrors clears the PMA error counters in port
counters by either walking the IB subnet topology or by using
an already saved topology file.
Syntax:
ibclearerrors [-h] [<topology-file>]
Dependencies:
ibnetdiscover, ibnetdiscover format, and perfquery


InfiniBand Utilities Descriptions

ibclearcounters

Description:
ibclearcounters clears the PMA port counters by either
walking the IB subnet topology or by using an already-saved
topology file.
Syntax:
ibclearcounters [-h] [<topology-file>]
Dependencies:
ibnetdiscover, ibnetdiscover format, and perfquery


InfiniBand Utilities Descriptions

ibsysstat

Description:
ibsysstat uses vendor mads to validate connectivity
between IB nodes and obtain other information about the IB
node. ibsysstat is run as client/server. The default is to run
as client.
Syntax:
ibsysstat [options] <dest lid|guid> [<op>]
Nonstandard flags:
Current supported operations:
ping - Verify connectivity to server (default).
host - Obtain host information from server.
cpu - Obtain cpu information from server.
-o <oui> Use specified OUI number to multiplex vendor mads.
-S Start in server mode (do not return).



InfiniBand Switch Platform Commands

The following platform CLI commands are available on this version:
# version
SUN DCS 36p version: 1.3.3-2
Build time: Apr 4 2011 11:15:19
SP board info:
Manufacturing Date: 2009.06.22
Serial Number: "NCD3R0168"
Hardware Revision: 0x0006
Firmware Revision: 0x0102
BIOS version: NOW1R112
BIOS date: 04/24/2009

env_test
Show environment:
The env_test command can be used to read values for all the HW
sensors.
The output will indicate if there are any faulty HW:
NM2 Environment test started:
Starting PSU test:
PSU 0 not present
PSU 1 present status: OK
PSU test returned OK
Starting Voltage test:
Voltage ECB OK
Measured 3.3V Main = 3.27 V
Measured 3.3V Standby = 3.37 V
Measured 5V = 5.03 V
Measured VBAT = 3.28 V
Measured 2.5V = 2.51 V
Measured 1.8V = 1.80 V
Measured I4 1.2V = 1.18 V
Voltage test returned OK
Starting Temperature test:
Back temperature 22.00
Front temperature 22.25
ComEx temperature 25.12
I4 temperature 32, maxtemperature 33
Temperature test returned OK
Starting FAN test:
Fan 0 not present
WARNING Fan 1 not present but still running at rpm 15503
Fan 2 running at rpm 15697
Fan 3 running at rpm 15697
Fan 4 not present
FAN test returned 1 faults
Starting Connector test:
All Connectors OK
Connector test returned OK
Starting I4 test:
All I4s OK
I4 test returned OK
NM2 Environment test FAILED
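When env_test reports FAILED, the individual "test returned" lines show which subsystem is responsible. The following is a minimal sketch that lists only the failing tests; the function name failed_env_tests is hypothetical, and it assumes the "<name> test returned <result>" wording shown in the output above.

```shell
#!/bin/sh
# failed_env_tests (hypothetical helper): reads env_test output on stdin and
# prints the name of every test whose "test returned" line is not OK.
failed_env_tests() {
    awk '$2 == "test" && $3 == "returned" && $4 != "OK" { print $1 }'
}

# Usage on the switch, assuming the output was saved to a file (not executed here):
#   env_test > /tmp/env_test.out
#   failed_env_tests < /tmp/env_test.out
```

For the sample output above, the helper prints FAN, which matches the warning about Fan 1.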



InfiniBand Switch Platform Commands

showunhealthy
• The showunhealthy command runs the same tests as env_test but shows only the
unhealthy HW sensors:
– WARNING Fan 1 not present but still running at rpm 15697
– FAILURE - 1 sensors NOT OK

showtemps
The showtemps command displays the chassis temperatures:
Back temperature 32.62
Front temperature 34.50
Com-Express temperature 36.12
I4 temperature 54 maxtemperature 68



checkboot
The checkboot command shows the boot status of an I4/all I4s.
# checkboot A
I4 OK
Arguments:
None = Status for all I4s
Valid arguments for 36p: A

checkpower
The checkpower command will display the status of the PSUs.
PSU 0 not present
PSU 1 present status: OK

getfanspeed
The getfanspeed command will display the speed of the FANs.
Fan 0 not present
Fan 1 rpm 15314
Fan 2 rpm 15314
Fan 3 rpm 15130
Fan 4 not present


InfiniBand Switch Platform Commands

listlinkup
• The listlinkup command displays the status of the
  connectors and IB links:
  – Connector 0A Present <-> I4 Port 20 is up
  – Connector 1A Not present
  – Connector 2A Not present
  – ...<output truncated>

# listlinkup
• Connector 0A Not present…
• Connector 8A Present <-> I4 Port 31 is up
• Connector 9A Present <-> I4 Port 14 is up
• Connector 10A Present <-> I4 Port 16 is up
• Connector 11A Present <-> I4 Port 18 is up
• Connector 12A Not present
• …
• Connector 12B Present <-> I4 Port 12 is up
• Connector 13B Present <-> I4 Port 10 is down
• Connector 14B Present <-> I4 Port 08 is up
• Connector 15B Present <-> I4 Port 06 is up
• Connector 16B Present <-> I4 Port 04 is up
• Connector 17B Present <-> I4 Port 02 is up
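
When many connectors are listed, it can help to filter the listlinkup output mechanically. The following is a minimal sketch, assuming the line format shown above ("Connector <name> Present <-> I4 Port <n> is up|down" and "Connector <name> Not present"). The function names and the sample text are illustrative and not part of the switch software.

```python
import re

# Line formats assumed from the listlinkup excerpt above.
LINK_RE = re.compile(
    r"Connector\s+(\d+[AB])\s+Present\s+<->\s+I4 Port\s+(\d+)\s+is\s+(up|down)"
)
ABSENT_RE = re.compile(r"Connector\s+(\d+[AB])\s+Not present")


def parse_listlinkup(text):
    """Return (links, absent); links maps connector -> (I4 port, state)."""
    links, absent = {}, []
    for line in text.splitlines():
        m = LINK_RE.search(line)
        if m:
            links[m.group(1)] = (int(m.group(2)), m.group(3))
            continue
        m = ABSENT_RE.search(line)
        if m:
            absent.append(m.group(1))
    return links, absent


def down_connectors(links):
    """Connectors with a cable present but a link that is down."""
    return sorted(c for c, (_, state) in links.items() if state == "down")


# Sample lines copied from the excerpt above.
SAMPLE = """\
Connector 8A Present <-> I4 Port 31 is up
Connector 12A Not present
Connector 12B Present <-> I4 Port 12 is up
Connector 13B Present <-> I4 Port 10 is down
"""

links, absent = parse_listlinkup(SAMPLE)
print("present but down:", down_connectors(links))
print("not present:", absent)
```

A connector that is present but down (13B in the excerpt) is usually the first place to look, because a cable is plugged in but the link has not trained.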



InfiniBand Utilities Descriptions

• In addition to the previous nm2user commands, the
  following commands are available to the root user.
• Test Chassis LED:
  – chassis_led -h
  – usage: ./chassis_led <on/off> [green] [yellow] [white]
  – Default: set all
  – chassis_led off yellow white
  – chassis_led called without arguments will
    display current values.

# chassis_led
LED values:
Green on
Yellow off
White off



Set fan speed:
setfan
Usage: setfan duty-cycle (In parts of 256)
Valid range for duty-cycle 80-255
Example:
setfan 128
(Set fan duty-cycle to 128 of 255, around 1/2 speed 15000 RPM)
Note: The speed of the fans will normally be automatically controlled, so changing the value
here will normally have no effect.

Voltage check:
checkvoltages
Voltage ECB OK
Measured 3.3V Main = 3.27 V
Measured 3.3V Standby = 3.37 V
Measured 12V = 12.00 V
Measured 5V = 5.03 V
Measured VBAT = 3.28 V
Measured 2.5V = 2.51 V
Measured 1.8V = 1.80 V
Measured I4 1.2V = 1.18 V
All voltages OK

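To compare the measured rails with their nominal values, the checkvoltages output can be reduced to a percentage deviation per rail. This is a minimal sketch, assuming the "Measured <rail> = <value> V" format shown above and taking the nominal value from the rail label itself (for example 3.3 from "3.3V Main"). No pass/fail tolerance is applied because the source does not state one; the function name is illustrative.

```python
import re

# "Measured <label> = <value> V", as in the checkvoltages excerpt above.
LINE_RE = re.compile(r"Measured\s+(.+?)\s*=\s*(\d+(?:\.\d+)?)\s*V\s*$")
# Nominal voltage embedded in the label, e.g. "3.3V Main" or "I4 1.2V".
NOMINAL_RE = re.compile(r"(\d+(?:\.\d+)?)V")


def voltage_deviations(text):
    """Return (label, measured, deviation in percent or None) per rail."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line.strip())
        if not m:
            continue
        label, measured = m.group(1), float(m.group(2))
        nominal = NOMINAL_RE.search(label)
        if nominal is None:
            # Rails such as VBAT carry no nominal value in their label.
            rows.append((label, measured, None))
            continue
        target = float(nominal.group(1))
        rows.append((label, measured, round((measured - target) / target * 100, 2)))
    return rows


SAMPLE = """\
Voltage ECB OK
Measured 3.3V Main = 3.27 V
Measured 12V = 12.00 V
Measured VBAT = 3.28 V
Measured I4 1.2V = 1.18 V
All voltages OK
"""

for label, measured, deviation in voltage_deviations(SAMPLE):
    print(label, measured, deviation)
```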


InfiniBand Utilities Descriptions

nm2port
• nm2port can be used to map between ibports and
  connectors on NM2-36p
• Usage: /usr/local/sbin/nm2port [-guid guid
  | -type NM2-72p|NM2-36p -i4 i4name] (-ibport ibport
  | -connector connectorname | -printconnectors | -printinternal)

-guid should be in the form 0x12ab and must be a valid NM2-36p
-type should be NM2-72p or NM2-36p (Default NM2-72p)
-i4 should be A, B, C, D, E or F
-connector should be 0A-17A or 0B-17B
-ibport should be 1-36
-printconnectors will display connectormapping for all connectors
-printinternal will display all internal links



Examples
# nm2port -type nm2-36p -connector 5A
NM2-36p connector 5A maps to I4 A ibport 30
# nm2port -guid 0x0003ba7aa1a3b0a0 -ibport 25
NM2-36p I4 A ibport 25 maps to connector 1B P3
# nm2port -i4 a -ibport 22
NM2-36p I4 A ibport 22 maps to connector 0A P1
connector
# connector 5A present
Cable connector 5A present
Arguments:
connector:
36p:
0A-17A
0B-17B
Examples
# connector 12A portstate
Cable connector 12A present <-> I4 Port 11 is up
# connector 12A read 0xc
Read connector 12A reg 0 value 0x0c
See QSFP/CXP specification for description of registers and values.

setlinkspeed
Usage:
setlinkspeed i4num ibport linkspeed
i4num and ibport as for getportstate; linkspeed as defined in the IB
specification

enablecablelog / disablecablelog
Enable/disable logging of cable events in syslog (default: enabled)
enablelinklog / disablelinklog
Enable/disable logging of link events in syslog (default: disabled)



InfiniBand Utilities Descriptions

• getportstatus
• Usage:
  • getportstatus i4num ibport
• Arguments:
  • i4num (Values as for checkboot)
  • ibport: IB port number (1-36)

Examples
# getportstatus A 1
Portstate 4
Portphystate 5
LinkWidthActive 2
LinkSpeedActice 4
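
The numeric values in this output are PortInfo codes from the InfiniBand specification. The following minimal sketch translates them into names. The lookup tables follow the specification's encodings; the function name is illustrative, and field names are matched by prefix so that the labels are accepted exactly as printed in the example above.

```python
# Code-to-name tables for the InfiniBand PortInfo fields.
PORT_STATE = {1: "Down", 2: "Initialize", 3: "Armed", 4: "Active"}
PORT_PHYS_STATE = {
    1: "Sleep",
    2: "Polling",
    3: "Disabled",
    4: "PortConfigurationTraining",
    5: "LinkUp",
    6: "LinkErrorRecovery",
    7: "Phy Test",
}
LINK_WIDTH = {1: "1X", 2: "4X", 4: "8X", 8: "12X"}
LINK_SPEED = {1: "2.5 Gbps (SDR)", 2: "5.0 Gbps (DDR)", 4: "10.0 Gbps (QDR)"}

# Field-name prefixes (lower case) mapped to the table that decodes them.
TABLES = [
    ("portstate", PORT_STATE),
    ("portphystate", PORT_PHYS_STATE),
    ("linkwidth", LINK_WIDTH),
    ("linkspeed", LINK_SPEED),
]


def decode_portstatus(text):
    """Translate 'Field <code>' lines into {field: readable name}."""
    decoded = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        field, code = parts[0], int(parts[1])
        for prefix, table in TABLES:
            if field.lower().startswith(prefix):
                decoded[field] = table.get(code, "unknown (%d)" % code)
                break
    return decoded


SAMPLE = """\
Portstate 4
Portphystate 5
LinkWidthActive 2
LinkSpeedActice 4
"""

for field, meaning in decode_portstatus(SAMPLE).items():
    print(field, "=", meaning)
```

For the example above this reads as an active port with the physical link up, running 4X wide at QDR speed.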

i4reset
Reset I4-A
Note: After resetting the I4 that is connected to the
ComExpress, the I4 PCI-Express driver will no longer work.
disableswitchport i4num ibport
(Arguments as getportstate)
enableswitchport i4num ibport



InfiniBand Utilities Descriptions

enablesm
• Start the subnet manager (SM) and make sure
  it is started at boot.

disablesm
• Disable the subnet manager (SM) and make
  sure it is not started at boot.

setsmpriority
Usage:
setsmpriority priority
Set priority of the SM
priority should be in the range 0-15
setsubnetprefix
Usage:
setsubnetprefix subnetprefix
Set subnet prefix for the SM
subnetprefix should be a valid lower case hex number starting with
0x
setcontrolledhandover
Usage:
setcontrolledhandover value
Enable/disable controlled handover for the SM
value should be TRUE or
FALSE
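
The argument rules for these three commands can be checked before the commands are run. The following is a minimal sketch of such checks, assuming the arguments arrive as strings as they would on the command line. The helper names are hypothetical, and treating TRUE/FALSE as case-sensitive is an assumption based on the spelling used above.

```python
import re

SUBNET_PREFIX_RE = re.compile(r"0x[0-9a-f]+")


def valid_sm_priority(value):
    """setsmpriority: priority should be in the range 0-15."""
    return re.fullmatch(r"[0-9]{1,2}", value) is not None and int(value) <= 15


def valid_subnet_prefix(value):
    """setsubnetprefix: lower case hex number starting with 0x."""
    return SUBNET_PREFIX_RE.fullmatch(value) is not None


def valid_controlled_handover(value):
    """setcontrolledhandover: value should be TRUE or FALSE."""
    return value in ("TRUE", "FALSE")


for priority in ("5", "15", "16"):
    print("priority", priority, valid_sm_priority(priority))
for prefix in ("0xfe80000000000000", "0xFE80000000000000", "fe80"):
    print("prefix", prefix, valid_subnet_prefix(prefix))
```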



InfiniBand Utilities Descriptions

setloghost
# setloghost
usage: setloghost <IP address or host name>
To turn off remote logging, use localhost as
parameter.

PSU testing
# a237test 1 read 0
Read psu 1 reg 0 value 0x8
For description of registers and values see specification of A237
PSU.
[root@o4nm2-36p-1 ~]# a237test 1 fruid
Read fruid psu 1
80 00 00 00 00 00 49 00 00 00 00 00 00 00 80 00
00 00 32 00 00 00 1f 71 11 91 00 00 00 00 00 00
<output truncated>

Error recovery
The CLI command managementreset can be used to recover from CPLD
FATAL (NM2-36p) and I4 therm shutdown. The CLI command will perform
the necessary actions to restore the system, and prompt the user to
reboot as necessary.



InfiniBand Troubleshooting
Switch stuck in “pre-boot”
Flowchart: Stuck in “pre-boot”
1. Is the pre-boot version correct? (version)
   no: Upgrade pre-boot FW.
   yes: Run “boot”.
2. Does the switch boot?
   yes: Problem solved!
   no: Run “check_app_partition”.
If nothing worked:
• Power cycle Switch
• If problem still persists, SP filesystem is bad.
Finally:
• File service ticket

Run “boot”
....
init> boot
Previous application starts failed (3 times). Please run
check_app_partition. Will not start application image.
init>
Run “check_app_partition”
....
init> check_app_partition
Doing filesystem check ...
fsck (busybox 1.14.3, 2009-09-16 10:16:05 CEST)
e2fsck 1.39 (29-May-2006)
/: clean, 11982/104448 files, 268701/417656 blocks
Everything looks OK.
init> boot



InfiniBand Troubleshooting
Firmware upgrade failed

Flowchart (decision points):
• Perform FW upgrade. If it passes: Problem solved! If it fails, check the following.
• Is it due to incorrect FW URL? If yes, fix the URL.
• Is it due to a power outage?
If nothing worked:
• Switch is corrupted
• File service ticket.

FW URL (Switch):
Example:
ftp://root:******@10.20.30.40/xyz.abc.com/Releases/1.3.2/sundcs_gw_repository_1.3.2_1.pkg

Protocols supported & tested:
1) FTP
2) HTTP
(Make sure appropriate transport service is enabled on the file server)


InfiniBand Troubleshooting
How to trace link errors in IB Fabric
IB link error reported (infinicheck / ibdiagnet)
Gather GUID and port information

“ibdiagnet” output:
....
-I---------------------------------------------------
-I- PM Counters Info
-I---------------------------------------------------
-E- lid=0x0006 guid=0x00212856cd22c0a0 dev=48438 Port=24
Performance Monitor counter : Value
symbol_error_counter : 0x5
....
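
The GUID and port needed for the next steps can be pulled out of the PM Counters section mechanically. The following is a minimal sketch, assuming the "-E- lid=... guid=... dev=... Port=..." header and "counter : value" lines shown in the excerpt above. The function name and the returned dictionary layout are illustrative.

```python
import re

# Header and counter line formats assumed from the ibdiagnet excerpt above.
HEADER_RE = re.compile(
    r"-E-\s+lid=(0x[0-9a-fA-F]+)\s+guid=(0x[0-9a-fA-F]+)\s+dev=(\d+)\s+Port=(\d+)"
)
COUNTER_RE = re.compile(r"^\s*(\w+)\s*:\s*(0x[0-9a-fA-F]+)\s*$")


def parse_pm_errors(text):
    """Return one dict per reported port with its non-zero counters."""
    reports = []
    current = None
    for line in text.splitlines():
        m = HEADER_RE.search(line)
        if m:
            current = {
                "lid": int(m.group(1), 16),
                "guid": m.group(2),
                "port": int(m.group(4)),
                "counters": {},
            }
            reports.append(current)
            continue
        m = COUNTER_RE.match(line)
        if m and current is not None:
            current["counters"][m.group(1)] = int(m.group(2), 16)
    return reports


SAMPLE = """\
-I- PM Counters Info
-E- lid=0x0006 guid=0x00212856cd22c0a0 dev=48438 Port=24
Performance Monitor counter : Value
symbol_error_counter : 0x5
"""

for report in parse_pm_errors(SAMPLE):
    print(report["guid"], "port", report["port"], report["counters"])
```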

“ibcheckerrors” output:
[root@nm2gw-42 ~]# ibcheckerrors
## Summary: 11 nodes checked, 0 bad nodes found
## 36 ports checked, 0 ports have errors beyond threshold



InfiniBand Troubleshooting
How to trace link errors in IB Fabric
IB link error reported (infinicheck / ibdiagnet)
Error seen on Switch port or HCA port? (ibnetdiscover)

“ibnetdiscover” output:
....
vendid=0x2c9
devid=0xbd36
sysimgguid=0x212856cd22c0a3
switchguid=0x212856cd22c0a0(212856cd22c0a0)
Switch 36 "S-00212856cd22c0a0" # "SUN IB QDR GW switch nm2gw-41" enhanced port 0 lid 6 lmc 0
[30] "H-0002c90300089102"[1](2c90300089103) # " HCA-1" lid 51 4xQDR
[24] "H-0003ba000100e370"[1](3ba000100e371) # "nsn156-63 HCA-1" lid 2 4xQDR
[22] "H-00212800013ece9e"[1](212800013ece9f) # "nsn156-61 HCA-1" lid 19 4xQDR
[21] "H-0002c90300089102"[2](2c90300089104) # " HCA-1" lid 54 4xQDR
....
(Slide callouts: Switch port GUID, HCA port GUID)



InfiniBand Troubleshooting
How to trace link errors in IB Fabric

IB link error reported (infinicheck / ibdiagnet)
Error seen on Switch port or HCA port? (ibnetdiscover)
Switch: Translate IB port number to switch connector name (dcsport/listlinkup)
HCA: Trace IB port number to HCA connector

“dcsport” output:
[root@nm2gw-41 ~]# dcsport -port 24
DCS-GW Switch port 24 maps to connector 2A
(Slide callout: Port 24 maps to Connector 2A)

“listlinkup” output:
[root@nm2gw-41 ~]# listlinkup
Connector 0A Present <-> Switch Port 20 up (Enabled)
Connector 1A Present <-> Switch Port 22 up (Enabled)
Connector 2A Present <-> Switch Port 24 up (Enabled)
Connector 3A Present <-> Switch Port 26 down (Enabled)
...



InfiniBand Troubleshooting
How to trace link errors in IB Fabric

IB link error reported (infinicheck / ibdiagnet)
Error seen on Switch port or HCA port? (ibnetdiscover)
Switch: Translate IB port number to switch connector name (dcsport/listlinkup)
HCA: Trace IB port number to HCA connector

“ibnetdiscover” output:
....
vendid=0x2c9
devid=0xbd36
sysimgguid=0x212856cd22c0a3
switchguid=0x212856cd22c0a0(212856cd22c0a0)
Switch 36 "S-00212856cd22c0a0" # "SUN IB QDR GW switch nm2gw-41" enhanced port 0 lid 6 lmc 0
[30] "H-0002c90300089102"[1](2c90300089103) # " HCA-1" lid 51 4xQDR
[24] "H-0003ba000100e370"[1](3ba000100e371) # "nsn156-63 HCA-1" lid 2 4xQDR
[22] "H-00212800013ece9e"[1](212800013ece9f) # "nsn156-61 HCA-1" lid 19 4xQDR
[21] "H-0002c90300089102"[2](2c90300089104) # " HCA-1" lid 54 4xQDR
....
(Slide callouts: HCA port GUID, HCA port number, HCA port LID)


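The HCA port GUID, port number, and LID behind a given switch port can be read from the ibnetdiscover switch block. The following is a minimal sketch, assuming each peer line has the form [switch port] "H-<node guid>"[<peer port>](<port guid>) # "<name>" lid <n> <speed>, as in the excerpt above. Lines without a port GUID in parentheses are not matched by this sketch, and the function name is illustrative.

```python
import re

# Peer line format assumed from the ibnetdiscover excerpt above.
PEER_RE = re.compile(
    r'^\[(\d+)\]\s+"[HS]-([0-9a-f]+)"\[(\d+)\]\(([0-9a-f]+)\)'
    r'\s+#\s+"([^"]*)"\s+lid\s+(\d+)'
)


def switch_port_peers(text):
    """Map switch port number -> details of the device on the other end."""
    peers = {}
    for raw in text.splitlines():
        m = PEER_RE.match(raw.strip())
        if m:
            peers[int(m.group(1))] = {
                "node_guid": m.group(2),
                "peer_port": int(m.group(3)),
                "port_guid": m.group(4),
                "name": m.group(5).strip(),
                "lid": int(m.group(6)),
            }
    return peers


SAMPLE = '''\
Switch 36 "S-00212856cd22c0a0" # "SUN IB QDR GW switch nm2gw-41" enhanced port 0 lid 6 lmc 0
[30] "H-0002c90300089102"[1](2c90300089103) # " HCA-1" lid 51 4xQDR
[24] "H-0003ba000100e370"[1](3ba000100e371) # "nsn156-63 HCA-1" lid 2 4xQDR
[22] "H-00212800013ece9e"[1](212800013ece9f) # "nsn156-61 HCA-1" lid 19 4xQDR
[21] "H-0002c90300089102"[2](2c90300089104) # " HCA-1" lid 54 4xQDR
'''

peer = switch_port_peers(SAMPLE)[24]
print(peer["name"], "port", peer["peer_port"], "lid", peer["lid"])
```

With the excerpt above, switch port 24 (the port reported by ibdiagnet) leads to port 1 of the HCA on nsn156-63.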

InfiniBand Troubleshooting
How to trace link errors in IB Fabric

IB link error reported (infinicheck / ibdiagnet)
Error seen on Switch port or HCA port? (ibnetdiscover)
Switch: Translate IB port number to switch connector name (dcsport/listlinkup)
HCA: Trace IB port number to HCA connector
Disable & Enable Switch/HCA port (dis[en]ableswitchport)

Switch:
“disableswitchport” & “enableswitchport” output:
[root@nm2gw-41 ~]# disableswitchport 2A
Disable connector 2A Switch port 24
Adminstate:......................Disabled
LinkState:.......................Down
PhysLinkState:...................Disabled
[root@nm2gw-41 ~]# enableswitchport 2A
Enable connector 2A Switch port 24
Adminstate:......................Enabled
LinkState:.......................Down
PhysLinkState:...................PortConfigurationTraining

HCA
“ibportstate enable/disable” output:
[root@nsn156-61 ~]# ibportstate 20 1 disable
Initial CA PortInfo:
# Port info: Lid 19 port 2
LinkState:.......................Down
PhysLinkState:...................Disabled
Lid:.............................20
.....
[root@nsn156-61 ~]# ibportstate 20 1 enable



InfiniBand Troubleshooting
How to trace link errors in IB Fabric

IB link error reported (infinicheck / ibdiagnet)
Error seen on Switch port or HCA port? (ibnetdiscover)
Switch: Translate IB port number to switch connector name (dcsport/listlinkup)
HCA: Trace IB port number to HCA connector
Disable & Enable Switch/HCA port (dis[en]ableswitchport)
Still see link errors?
  no: Problem solved!
  yes: Connector firmly plugged in? Link LED on?
    no: Replug cable & confirm a “click”
Change cables. Does the problem follow Switch or cable?
  cable: Replace cable
  switch: Bad Switch Port
  HCA: Bad HCA Port
If nothing worked:
• Restart SM
• Power cycle IB device
Finally:
• File service ticket



Known Support Issues and Workarounds

• Issue: ILOM IP addresses and default passwords are wrong
  out of the factory.
  Factory process inadvertently resets the ILOM defaults to the
  normal x4x7x defaults of IP via DHCP and root password of
  changeme instead of the predefined IPs and welcome1.
• Workaround: The IPs can be retained as is, because they will
  be overwritten by ACS performing the applyconfig process.
  This can be done after applyconfig, using the customer’s IPs
  to check that they are all set correctly.
  For the ILOM default password, the customer most likely will change it,
  but it can be set from DB01 by using the following:
  # cd /opt/oracle.SupportTools/firstconf
  # dcli -l root -g full "ipmitool sunoem cli 'set
  /SP/users/root password=welcome1' welcome1"
• Resolution: Factory process has been corrected. Only systems
  shipped between June 2, 2010 and July 7, 2010 are affected.


Known Support Issues and Workarounds

• Issue: Disk drive fails “calibrate” tests.
  Calibrate does a read-only operation to mimic the
  database operation and benchmark throughput of all disks.
  Exadata requires all disks to be performing equally for load
  balancing purposes. Disk drives that appear to be running
  at ~2/3rds their normal operation speed, just below
  300 IOPS, or at 1/2 their speed at ~75 MB/s are marked
  as failed. The root cause is that the Seagate firmware
  thresholds for temperature are higher on SAS than SATA.
• Workaround:
  1. Make sure that calibrate is run stand-alone, when
     cellsrv is not running, to avoid any interference
     from other I/Os that are occurring in parallel.
  2. Power-cycle the cell once and try to calibrate again.
     Because the issue is reported during installation,
     customers could be encountering the issue due to the
     box being powered off for a long time during shipping.
     If this is the case, they should not encounter it again in
     production when the disks are running at normal
     operating temperature.
  3. Try reseating the disks if the disks are still showing
     low I/O operations per second (IOPS) (calibrate failing
     on this disk).
  4. Continue using the disks if the following
     measurements are met: > 250 IOPS and > 110
     MBPS. If the preceding measurements are not met,
     repeat the power-cycle and reseating, and run
     calibrate after a period of operating to see if they now
     run clean. If a disk persists in having low performance
     measurement, raise an SR and escalate to the GL-X64-NS
     queue with the measurements data for further
     advice.
• Resolution: Fixed in new Seagate disk F/W, which is part
  of OS image 11.2.1.3.1

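The acceptance rule for calibrate results (more than 250 IOPS and more than 110 MBPS per disk) can be applied to a set of measurements as follows. This is a minimal sketch: the disk names and measurement values are hypothetical, and the code does not parse real calibrate output, whose format is not shown in this guide.

```python
# Thresholds from the workaround: continue using a disk only if both are exceeded.
MIN_IOPS = 250
MIN_MBPS = 110


def disk_passes(iops, mbps):
    """True if a disk meets both calibrate thresholds."""
    return iops > MIN_IOPS and mbps > MIN_MBPS


def disks_to_recheck(measurements):
    """Names of disks that need another power-cycle/reseat/calibrate round."""
    return sorted(
        name
        for name, (iops, mbps) in measurements.items()
        if not disk_passes(iops, mbps)
    )


# Hypothetical per-disk results as (IOPS, MBPS).
measurements = {
    "disk_0": (310, 128),
    "disk_1": (290, 75),   # about half the expected throughput
    "disk_2": (240, 125),  # too few IOPS
}

print(disks_to_recheck(measurements))
```

Disks that still fail after repeating the steps are the ones whose measurements go into the SR.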

Known Support Issues and Workarounds

• Issue: When logged in directly, ILOM reports
  “Warning: The system appears to be in manufacturing
  test mode. Contact Service immediately.” Because of
  the way in which the 3.0.6.10.b ILOM image was built, on a
  new installation, the image always has the sunservice
  account enabled, which reports the message during login.
• Workaround: Log in as sunservice with the changeme
  password and run:
  # /usr/local/bin/sunserviceacct disable
• Resolution: Pending


Known Support Issues and Workarounds

• Issue: The CMA extender brackets (350-1514) for the
  1U Rails used on x4170 DB nodes are not available as
  FRUs.
  This was a conscious engineering and service team
  decision because the extender bracket is engineered out
  of the product with a longer rail kit and a new CMA. The
  bracket can break easily if the 1U systems are slid too fast.
• Workaround: Be extra careful when pulling the x4170s in
  and out of the rack and when cabling the network ports, to
  avoid breaking the CMA.
• Resolution: Pending ECO release of the new longer rail
  [F] 371-4919, which will be the FRU replacement for the
  combination of 371-2741 1U rail + 350-1514 CMA
  Extension Bracket


Known Support Issues and Workarounds

Verify motherboard replacement.
Refer to the manual for the full replacement procedure.
After replacing a motherboard on an Exadata cell (x4275), it can
sometimes show up as an X4170. This can be confirmed by logging in
to the ILOM and issuing the show /SYS
command and checking the model. If this occurs, try the following:
1. Power off the machine.
2. Open the lid and check the little gray ribbon cable that goes from
   the motherboard to the PDB (near-left of the motherboard when
   viewed from the front of the system).
3. Check whether both ends of the cable are seated correctly. If
   not, remove the cable from both the motherboard and PDB and
   reseat it. (While doing so, check to see if any pins are damaged.)

Tip: If the motherboard has an older BIOS, a post-replacement manual power up may be
required.


Exercise Overview: System Troubleshooting

This exercise covers the following topics:
• Performing basic system troubleshooting
• Removing an InfiniBand connection and remotely
  identifying the cable and port
Preparation
You need the Student Guide and the Oracle Exadata Database
Machine Lab System diagram to perform this practice.


Task 6-1: Basic System Troubleshooting
1. Ensure that you know the expected IPs for the nodes, gateway, and subnet mask. Most of
this procedure involves logging in to various components and ensuring that everything
looks correct.
2. Perform a visual inspection of the nodes and switches to ensure that all nodes are on,
have active link lights, and that a single switch appears to be active.
3. View the configuration on both switches. Verify that the values match the expected
configuration.
4. View the configuration on the nodes. Verify that the values match the expected
configuration.
5. If you cannot access data, verify that the authorized client property is set as desired.
6. Verify that each node has the proper IP, gateway, and netmask configured.
7. Verify that the switch has the admin IP configured on one of its interfaces. Verify that the
netmask and gateway are correct.
8. Verify that the InfiniBand switches are functioning properly.


Task 6-2: Identify Missing IB Cable or Connection
A cable has been disconnected from a host or switch. Identify it. The following excerpt is from
a troubleshooting session to identify a missing IB cable connection. Use this as a guide to
identify the disconnected cable.
[root@trnadb01 ibdiagtools]#
/opt/oracle.SupportTools/ibdiagtools/verify-topology
[ DB Machine InfiniBand Cabling Topology Verification Tool ]
Is every external switch connected to every internal
switch.......[SUCCESS]
Are any external switches connected to each
other...................[SUCCESS]
Are any hosts connected to spine
switch..................................[SUCCESS]
Check if all hosts have 2 CAs to different
switches....................[ERROR]
Node trnacel07 is connected to unknown switch 0x
--------fattree End Point Cabling verifation failed-----
Leaf switch check: cardinality and even
distribution..................[ERROR]
Internal QDR Switch 0x21283a89eba0a0 has fewer than 7 cells
It has only 6 links belonging to storage cells
[SUCCESS]
Check if each rack has an valid internal
ring.............................[SUCCESS]

You know that the issue is with trnacel07. Looking at a snippet of the ibnetdiscover output,
you see:
vendid=0x2c9
devid=0x673c
sysimgguid=0x212800013e6995
caguid=0x212800013e6992
Ca 2 "H-00212800013e6992" # "trnacel07 C 10.7.6.117 HCA-1"
[2](212800013e6994) "S-0021283a8371a0a0"[8] # lid 9 lmc 0 "Sun DCS
36 QDR LC switch localhost" lid 23 4xQDR



vendid=0x2c9
devid=0x673c
sysimgguid=0x212800013e698d
caguid=0x212800013e698a
Ca 2 "H-00212800013e698a" # "trnadb04 S 10.7.6.124 HCA-1"
[2](212800013e698c) "S-0021283a8371a0a0"[12] # lid 22 lmc 0 "Sun DCS
36 QDR LC switch localhost" lid 23 4xQDR
[1](212800013e698b) "S-0021283a89eba0a0"[12] # lid 18 lmc 0 "Sun DCS
36 QDR LC switch localhost" lid 1 4xQDR
You see that only port 2 is listed for trnacel07, in contrast to both ports listed for trnadb04 in
the preceding code, so the host end of the cable is port 1 on trnacel07. From the HCA
diagram, you see that port 1 is on the right when viewed from the rear of the system.
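
The comparison made here (a two-port CA that lists only one connected port) can be automated across a full ibnetdiscover dump. The following is a minimal sketch, assuming the "Ca <ports> "H-<guid>" # "<host> ..."" header and "[port](<port guid>) ..." line formats from the excerpt above, with the wrapped lines joined. The function names are illustrative.

```python
import re

# Formats assumed from the ibnetdiscover CA entries above.
CA_RE = re.compile(r'^Ca\s+(\d+)\s+"H-[0-9a-f]+"\s+#\s+"(\S+)')
PORT_RE = re.compile(r"^\[(\d+)\]\(")


def connected_ca_ports(text):
    """Map host name -> (number of CA ports, sorted ports that have a link)."""
    hosts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = CA_RE.match(line)
        if m:
            current = m.group(2)
            hosts[current] = (int(m.group(1)), [])
            continue
        m = PORT_RE.match(line)
        if m and current is not None:
            hosts[current][1].append(int(m.group(1)))
        elif not line or line.startswith("Switch"):
            # A blank line or a switch header ends the current CA entry.
            current = None
    return {host: (total, sorted(ports)) for host, (total, ports) in hosts.items()}


def missing_ports(hosts):
    """Hosts with at least one CA port that has no link listed."""
    result = {}
    for host, (total, ports) in hosts.items():
        gone = [p for p in range(1, total + 1) if p not in ports]
        if gone:
            result[host] = gone
    return result


SAMPLE = '''\
Ca 2 "H-00212800013e6992" # "trnacel07 C 10.7.6.117 HCA-1"
[2](212800013e6994) "S-0021283a8371a0a0"[8] # lid 9 lmc 0 "Sun DCS 36 QDR LC switch localhost" lid 23 4xQDR

Ca 2 "H-00212800013e698a" # "trnadb04 S 10.7.6.124 HCA-1"
[2](212800013e698c) "S-0021283a8371a0a0"[12] # lid 22 lmc 0 "Sun DCS 36 QDR LC switch localhost" lid 23 4xQDR
[1](212800013e698b) "S-0021283a89eba0a0"[12] # lid 18 lmc 0 "Sun DCS 36 QDR LC switch localhost" lid 1 4xQDR
'''

print(missing_ports(connected_ca_ports(SAMPLE)))
```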
You see that trnadb04 is connected to two switches, lid 23 and 1. You can also see that
trnacel07 is connected only to lid 23. From the line for trnacel07, you see that this is Switch
port 8, and from the mapping table, you determine that this is labeled port 14B. But can you
identify which switch?
You can log in to both of them and check whether you can see “Connector 14B Not present”
in the listlinkup output. If not, the problem is likely on the host side.
[root@trnacel07 ~]# ibstatus
InfiniBand device 'mlx4_0' port 1 status:
default gid: fe80:0000:0000:0000:0021:2800:013e:6993
state: 1: DOWN
phys state: 2: Polling
rate: 10 Gb/sec (4X)
InfiniBand device 'mlx4_0' port 2 status:
default gid: fe80:0000:0000:0000:0021:2800:013e:6994
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
This shows that port 1 on trnacel07 is down. Indeed, for this test case, the cable was removed
from port 1 on trnacel07.
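
The same check can be scripted when several hosts have to be inspected. The following is a minimal sketch that lists every port whose state is not ACTIVE, assuming the ibstatus output format shown above. The function names are illustrative.

```python
import re

# Formats assumed from the ibstatus excerpt above.
DEVICE_RE = re.compile(r"InfiniBand device '([^']+)' port (\d+) status:")
FIELD_RE = re.compile(r"^\s*(state|phys state|rate):\s*(.+?)\s*$")


def parse_ibstatus(text):
    """Return one dict per port with its device, port number, and fields."""
    ports = []
    for line in text.splitlines():
        m = DEVICE_RE.search(line)
        if m:
            ports.append({"device": m.group(1), "port": int(m.group(2))})
            continue
        m = FIELD_RE.match(line)
        if m and ports:
            ports[-1][m.group(1)] = m.group(2)
    return ports


def inactive_ports(ports):
    """(device, port, phys state) for every port that is not ACTIVE."""
    return [
        (p["device"], p["port"], p.get("phys state"))
        for p in ports
        if not p.get("state", "").endswith("ACTIVE")
    ]


SAMPLE = """\
InfiniBand device 'mlx4_0' port 1 status:
default gid: fe80:0000:0000:0000:0021:2800:013e:6993
state: 1: DOWN
phys state: 2: Polling
rate: 10 Gb/sec (4X)
InfiniBand device 'mlx4_0' port 2 status:
default gid: fe80:0000:0000:0000:0021:2800:013e:6994
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
"""

print(inactive_ports(parse_ibstatus(SAMPLE)))
```

A port that is DOWN with a physical state of Polling is the pattern seen above for the port whose cable was removed.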



The next test is to remove a switch-switch link.
[root@trnadb01]# /opt/oracle.SupportTools/ibdiagtools/verify-topology
[ DB Machine InfiniBand Cabling Topology Verification Tool ]
Bad link:Switch 0x21283a8371a0a0 Port 11B - Sun Port 11B
Reason : 2.5 Gbps Speed found. Could be 10 Gbps
Possible cause : Cable isn't fully seated in
Bad link:Switch 0x21283a89eba0a0 Port 11A - Sun Port 11A
Reason : 2.5 Gbps Speed found. Could be 10 Gbps
Possible cause : Cable isn't fully seated in
Is every external switch connected to every internal
switch..........[SUCCESS]
Are any external switches connected to each
other......................[SUCCESS]
Are any hosts connected to spine
switch.....................................[SUCCESS]
Check if all hosts have 2 CAs to different
switches.......................[SUCCESS]
Leaf switch check: cardinality and even
distribution.....................[SUCCESS]
Check if each rack has an valid internal
ring................................[SUCCESS]
You see an issue with two switch-to-switch links. 11B is connected to 11A of the other switch.
Again, running listlinkup on the two switches returns:
Connector 11B Present <-> I4 Port 17 is down - on one switch
Connector 11A Not present - on the other switch
So it appears that 11A needs to be reseated. Indeed, for this test case, the IB connector 11A
on the IB switch had been pulled.
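When a verify-topology report is saved to a file, the bad-link entries can be pulled out with a short script. The following is a minimal Python sketch, not an Oracle-supplied tool. The function name bad_links and the SAMPLE text are illustrative assumptions, and the pattern is based only on the "Bad link" line format shown in this test case.

```python
import re

# Lines copied from the verify-topology run in this test case.
SAMPLE = """\
Bad link:Switch 0x21283a8371a0a0 Port 11B - Sun Port 11B
Reason : 2.5 Gbps Speed found. Could be 10 Gbps
Possible cause : Cable isn't fully seated in
Bad link:Switch 0x21283a89eba0a0 Port 11A - Sun Port 11A
Reason : 2.5 Gbps Speed found. Could be 10 Gbps
Possible cause : Cable isn't fully seated in
Is every external switch connected to every internal switch..........[SUCCESS]
"""

# Assumed line format: "Bad link:Switch <guid> Port <connector> - ..."
BAD_LINK = re.compile(r"^Bad link:Switch (0x[0-9a-fA-F]+) Port (\S+)")


def bad_links(report):
    """Return (switch_guid, connector) pairs for every 'Bad link' line."""
    found = []
    for line in report.splitlines():
        match = BAD_LINK.match(line.strip())
        if match:
            found.append((match.group(1), match.group(2)))
    return found


if __name__ == "__main__":
    for guid, connector in bad_links(SAMPLE):
        print(f"Check cable seating: switch {guid}, connector {connector}")
```

The connectors reported this way are the ones to inspect with listlinkup on each switch.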



Summary

In this lesson, you should have learned how to:


• Collect information and determine a problem statement for
the Oracle Exadata Database Machine
• Verify the status and configuration on the Oracle Exadata
Database Machine
• List the available troubleshooting utilities and commands
• Access a failed system and determine the problem
• Understand and interpret available system logs
• List the known support issues and perform an available
workaround or resolution


Advanced Tasks

Objectives

After completing this lesson, you should be able to describe
and perform advanced tasks on the Oracle Exadata Database
Machine.


Relevance of Advanced Tasks

Discussion: The following questions are relevant to
understanding the Oracle Exadata Database Machine:


• What commands are important when you are performing
maintenance?
• How can you get back to factory configuration?



Additional Resources for Advanced Tasks

The following references provide additional information about
the topics described in this lesson:
• My Oracle Support: https://support.oracle.com
– Master Note for Oracle Database Machine and Exadata
Storage Server (ID 1187674.1)
– Exadata Patching Overview and Patch Testing Guidelines
(ID 888828.1)
• Automated Service Request (ASR) Training:
– http://ilearning.oracle.com/ilearn/en/learner/jsp/offering_details_home.jsp?classid=786184138


Oracle Database Machine Advanced Tasks

Caution: These procedures should be performed only in
collaboration with experienced Oracle support engineers. The
full procedure is listed here for reference and training purposes
only and should not be performed without assistance from fully
trained engineers.


Selecting Desired Operating System for Database Server
Oracle Exadata Database Machine is shipped with the Linux operating system and Solaris
operating system for the Oracle Database servers. Linux is the default operating system. The
following procedure describes how to select Solaris as the operating system:
1. Log in as the root user.
2. Change to the /opt/SupportTools directory.
3. Run the following command:
defaultOSchoose.pl
The script will prompt as follows:
Default OS is : LINUX_BOOT_0
Please choose new default OS:
[0] LINUX_BOOT_0
[1] SOLARIS_BOOT_1
[2] SOLARIS_BOOT_2
Please type the number you would like to make a new default OS:
Enter 1 to select SOLARIS_BOOT_1 for the operating system, and press Enter.

Configuring Database Servers for Solaris Operating System Updates
Solaris operating system updates are delivered to Oracle Exadata Database Machine from
the Solaris online repository. To receive the updates, a key and certificate must be created for
the database servers. Refer to My Oracle Support note 1021281.1 for information about
creating the key and certificate, and the online repository.

Reclaiming Disk Space After Selecting the Operating System
After selecting an operating system, reclaim the disk space occupied by the other operating
system. The disk space is shipped in the following configurations:
• The four-disk database servers in Oracle Exadata Database Machine X2-2 are shipped
with Linux using the first two disks in hardware RAID-1 with RAID-1 implemented as the
disk controller configuration, and the second two disks used by Solaris in a ZFS RAID-1
configuration.
• For the eight-disk database servers in Oracle Exadata Database Machine X2-8 Full
Rack, Linux resides on the first four disks configured as three disks in RAID-5 at the disk
controller level with the fourth disk as the global hot spare. Solaris uses the other four
disks with two disks making one ZFS RAID-1 for root pool, and two disks making one
ZFS RAID-1 for the data pool.



Reclaiming Disks for the Linux Operating System
Reclaiming the disks on a four-disk database server with Linux as the operating system
converts the system to a three-disk RAID-5 with the fourth disk as the hot spare at the
disk controller level. Reclaiming the disks on an eight-disk database server with Linux
as the operating system creates a seven-disk RAID-5 with the eighth disk as the hot spare at
the disk controller level.
The following procedure describes how to reclaim the disk space for use by the Linux operating
system. This procedure should be performed on each database server.
1. Log in as the root user.
2. Change to the /opt/oracle.SupportTools directory.
3. Check the current disk configuration using the following command:
./reclaimdisks.sh -check
The command returns a detailed layout of the logical and physical disks. For Oracle Exadata
Database Machine X2-2, the last line of the output should be the following:
[INFO] Valid dual boot configuration found for Linux: RAID1 from 2 disks
For Oracle Exadata Database Machine X2-8 Full Rack, the last line of the output should be
the following:
[INFO] Valid dual boot configuration found for Linux: RAID5 from 3 disks and 1 global hot spare disk
4. Start the disk reclamation process using the following command:
./reclaimdisks.sh -free -reclaim
The command frees any Solaris-configured disks, and reclaims all free disks for Linux. The
process may take two hours for Oracle Exadata Database Machine X2-2, and five hours for
Oracle Exadata Database Machine X2-8 Full Rack. To check the progress of the reclamation
process, use the following command:
tail -f /var/log/cellos/reclaimdisks.bg.log
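The check in step 3 compares the last line of the command output with a known message, which is easy to script when the output has been captured to a file. The following is a minimal Python sketch under that assumption. The function name dual_boot_ok and the model keys "X2-2" and "X2-8" are hypothetical; the expected strings are the two messages quoted in step 3.

```python
# Expected final lines of the -check output, as quoted in step 3.
EXPECTED_LAST_LINE = {
    "X2-2": "[INFO] Valid dual boot configuration found for Linux: "
            "RAID1 from 2 disks",
    "X2-8": "[INFO] Valid dual boot configuration found for Linux: "
            "RAID5 from 3 disks and 1 global hot spare disk",
}


def dual_boot_ok(check_output, model):
    """Return True if the last non-blank line matches the expected message."""
    lines = [line.strip() for line in check_output.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == EXPECTED_LAST_LINE[model]


if __name__ == "__main__":
    # Hypothetical captured output; only the final line matters here.
    captured = "layout details...\n" + EXPECTED_LAST_LINE["X2-2"] + "\n"
    print("Safe to continue:", dual_boot_ok(captured, "X2-2"))
```

If the function returns False, do not start the reclamation in step 4 until the configuration has been reviewed.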



Reclaiming Disks for the Solaris Operating System
For four-disk database servers with Solaris as the operating system, reclaiming disks creates
a data pool with two reclaimed disks in ZFS RAID-1 configuration. For the eight-disk database
server with Solaris as the operating system, reclaiming disks adds two ZFS RAID-1 sets
created from the reclaimed disks to the existing data pool. The
/opt/oracle.SupportTools/disk_map.pl script on Solaris can assist in mapping the physical
enclosure:slot location of disks to the device names, ZFS pools and the mount points for the
pools.
The following procedure describes how to reclaim the disk space used by the Linux operating
system. This procedure should be performed on each database server.
1. Log in as the root user on the database server.
2. Change to the /opt/SupportTools directory.
3. Run the reclaimdisks.pl script as follows:
reclaimdisks.pl [--unattended]
The script runs in interactive mode. To run the script in unattended mode, use the
--unattended option.
Respond to the prompts, as needed. The script prompts for confirmation to remove the Linux
virtual disks, and to create Solaris virtual disks.


Exadata Hardening Script

The hardening script is run after the initial configuration. You
can also run this script whenever it is necessary to change
system passwords or repoint the Oracle Enterprise Manager
Grid Control agents.
To run the hardening script, perform the following steps:
1. Log in as the root user on the first Database Server on
the Database Machine.
2. Change to the
/opt/oracle.SupportTools/dbmcprov/scripts directory.
3. Run the hardening script as follows:
$ ./harden.sh


4. Respond with either “yes” or “no” to the prompt about the
expiration of the operating system user password. The
responses are not case-sensitive.
– Respond with “yes” if this is the first time that you are running
the script after configuring the Database Machine, and do the
following:
a. Respond with either “yes” or “no” to the prompt for changing
the root password.
Note: The first time that the root user logs in to the server, the
system will prompt to change the root password. If you are
not prompted, reboot the server and log in again.
b. Enter the existing password and the new password when
prompted by the system.
c. Repeat steps 4a and 4b for the celladmin password, the
cellmonitor password, and the Grid Infrastructure user
password.
d. Go to step 5.


– Respond with “no” if this is not the first time, and go to step 5.
5. Respond to the prompt about setting SSH for the root
user.
6. Respond with either “yes” or “no” to the prompt about
Oracle Management Server.
– Respond with “yes” if the server has gone down. This will
bring up Oracle Management Server. If Oracle Management
Server is not up, the hardening script will not complete.
– Respond with “no” if the server is up.


7. Respond with either “yes” or “no” to the prompt about
repointing the management agents.
– Respond with “yes” to repoint the agents, and do the
following:
a. Enter the Oracle Management Server host name when
prompted.
b. Enter the Oracle Management Server port number when
prompted.
c. Enter the Oracle Enterprise Manager Grid Control agent
registration password when prompted.
d. Go to step 8.
– Respond with “no” if you do not want to repoint the agents,
and go to step 8.


8. Respond with either “yes” or “no” to prompts about
hardening the Oracle DB and Oracle ASM system user
passwords.
– Respond with “yes” to change the passwords, and do the
following:
a. Enter the new password for the Oracle Database system user
when prompted.
b. Respond with “yes” or “no” when prompted to use the same
password for the Oracle Database system user and Oracle
Database SNMP users.
c. Enter the new password for the Oracle ASM system user.
d. Respond with “yes” or “no” when prompted to use the same
password for the Oracle ASM system user and Oracle ASM
SNMP users.
e. Go to step 9.
– Respond with “no” if you are not changing passwords. Go to
step 9.


9. Respond with either “yes” or “no” to prompts about
hardening the operating system user passwords. If the
passwords were changed at the beginning of this
procedure, this step will not occur.
– Respond with “yes” to change the passwords, and do the
following:
a. Enter the new password for the Oracle Enterprise Manager
Grid Control software user when prompted.
b. Enter the new password for the celladmin user.
c. Enter the new password for the cellmonitor user.
d. Enter the password for the Oracle Database root user.
e. Enter the new password for the Oracle Exadata Storage Server
Software root user.
– Respond with “no” if you are not changing passwords.


Replacing a Physical Disk Due to Disk Failure

To replace a disk due to disk failure, perform the following steps:


1. Determine the cell disks, grid disks, and LUNs on the physical disk
that you want to replace by using the alert log, as follows:
CellCLI> LIST ALERTHISTORY WHERE ALERTMESSAGE LIKE
"Logical drive lost.*" DETAIL
The action text for this alert lists the cell disk and grid disks that are
affected.
Review the alert log and select the disk by name, time, and so on.
2. Replace the physical disk on Exadata Storage Server.
Note: When you replace a physical disk, the disk must be
acknowledged by the RAID controller before you can use it. This
does not take long, but you should use the LIST PHYSICALDISK
command to ensure that the status is NORMAL.

An alert is generated when a disk fails. The alert includes specific instructions for replacing
the disk. If you have configured the system for alert notifications, the alert will be sent by email
to the designated address. Consider the following when replacing a failed disk:

• The disk could be dropped by ASM, and the rebalance operation may have been
successfully run. Check the ASM alert logs to confirm this.
• The disk could be dropped, and the rebalance operation may be currently running.
Check the GV$ASM_OPERATION view to determine if the rebalance operation is still
running.
• The disk could be dropped by ASM, and the rebalance operation may have failed.
Check the V$ASM_OPERATION.ERROR code to determine whether the rebalance
operation failed.
• Rebalance operations from multiple disk groups can be performed on different ASM
instances in the same cluster if the physical disk being replaced contains grid disks from
multiple disk groups. Multiple rebalance operations cannot be run in parallel on just one
ASM instance. The operations will be queued for the instance.
• If the repair timer has not expired, the disk could be offline.
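
The "Logical drive lost" alert from step 1 names the LUN, cell disk, and grid disks in a regular "Label: value." layout, so the fields can be extracted mechanically. The following is a minimal Python sketch, not part of CellCLI. The function name parse_lost_drive is hypothetical, the ALERT text is a sample alert of the kind shown later in this lesson, and the parsing assumes every field ends with a period followed by white space or the end of the message.

```python
import re

# Sample alert message text (taken from the boot disk example in this lesson).
ALERT = (
    "Logical drive lost. Lun: 0_1. Status: critical. "
    "Physical hard disk: 20:1. Slot Num: 1. Serial Number: E046LE. "
    "Celldisk: CD_01_sdm1cel02. "
    "Griddisks: RECO_CD_01_sdm1cel02, DATA_CD_01_sdm1cel02."
)


def parse_lost_drive(message):
    """Extract the LUN, physical disk, cell disk, and grid disks."""

    def field(label):
        # Value runs up to the first period followed by a space or the end.
        match = re.search(re.escape(label) + r":\s*(.+?)\.(?:\s|$)", message)
        return match.group(1) if match else None

    grid = field("Griddisks")
    return {
        "lun": field("Lun"),
        "physical_disk": field("Physical hard disk"),
        "celldisk": field("Celldisk"),
        "griddisks": [g.strip() for g in grid.split(",")] if grid else [],
    }


if __name__ == "__main__":
    for key, value in parse_lost_drive(ALERT).items():
        print(key, "=", value)
```

The LUN value is the name used with ALTER LUN, and the grid disk names identify the ASM disks to verify before the physical disk is replaced.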

3. If ASM has not dropped the disk, drop the ASM disks from the ASM
disk group to which the grid disks belong by using the SQL ALTER
DISKGROUP DROP DISK command on the ASM instance. You must
drop the ASM disks from the ASM disk group before dropping the
corresponding grid disks from the cell.
4. From the Exadata Storage Server, run the following command by
using the LUN name. This command creates the cell disk and grid
disk on the LUN: ALTER LUN lun_name REENABLE FORCE.
The preceding command implicitly rebuilds the associated cell disks
and grid disks so that you do not have to perform additional
procedures on the cell disks and grid disks.
5. Add the grid disks to the ASM disk group. The addition of the grid
disks to the group is automatically followed by a rebalance
operation that populates the new disk with its share of ASM extents.


Using MegaCli64 in Cases Where cellcli Does Not Have Support for the Desired
Operation
In general, cellcli should be used for all cell activities, but there are certain cases when a
lower-level command may be required. Following are a few examples of when the LSI
MegaCli64 utility may be needed.
If a drive is pulled out and plugged back in on the same cell, a manual import of the foreign
configuration may be required via command:
/opt/MegaRAID/MegaCli/MegaCli64 -cfgforeign -import -a0
If the controller setting needs to be verified, use the command:
/opt/MegaRAID/MegaCli/MegaCli64 -adpallinfo -a0
If the firmware logs need to be investigated, use the command:
/opt/MegaRAID/MegaCli/MegaCli64 -fwtermlog -dsply -a0
Note: cellcli list physicaldisk detail is also important for this type of problem to
check for disk errors if available.
If the controller battery status needs to be checked, use the following commands:
MegaCli64 -AdpBbuCmd -GetBbuStatus -a0
and/or
MegaCli64 -adpallinfo -a0
If a performance problem is suspected to be caused by the drives falling back to write through
mode, use the command:
MegaCli64 -LDInfo -Lall -aALL | grep 'Current Cache Policy'
If physical disk attributes such as predictive failure status are needed, use the command:
MegaCli64 -LdPdInfo -aALL
If the properties of the controller battery need to be verified, use the command:
MegaCli64 -AdpBbuCmd -GetBbuProperties -a0
If physical disk information is needed, such as drive firmware version, use the command:
MegaCli64 -PDList -a0 | grep Inquiry
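
The cache policy check above returns one line per logical drive, so a fallback to write-through mode can be spotted by scanning the first policy on each line. The following is a minimal Python sketch. The function name write_through_drives is hypothetical, and the SAMPLE lines are an assumed illustration of the "Current Cache Policy" output format rather than output captured from this system; verify the format against your own controller output.

```python
# Assumed example of the filtered output, one line per logical drive.
SAMPLE = """\
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
"""


def write_through_drives(grep_output):
    """Return zero-based positions of policy lines that report WriteThrough."""
    flagged = []
    position = 0
    for line in grep_output.splitlines():
        label, _, value = line.partition(":")
        if label.strip() != "Current Cache Policy":
            continue  # ignore anything that is not a policy line
        write_mode = value.split(",")[0].strip()
        if write_mode == "WriteThrough":
            flagged.append(position)
        position += 1
    return flagged


if __name__ == "__main__":
    print("Policy lines in write-through mode:", write_through_drives(SAMPLE))
```

A position in the result is the order of the line in the filtered output, not a controller drive number, so match it back to the full -LDInfo output before acting on it.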



Replacing a Boot Disk Due to Disk Failure

To replace a boot disk due to disk failure, see the notes below.
### Evidence, the LUN was fixed and cleared but never the missing BOOT CELL labels ###
CellCLI> list alerthistory
169 2011-01-25T03:19:58+00:00 warning "Physical
drive state changed on Adapter 0 deviceId 9, enclosureId 20, slotId
1 from online to failed."
170 2011-01-25T03:20:01+00:00 critical Logical
drive lost. Lun: 0_1. Status: critical. Physical hard disk: 20:1.
Slot Num: 1. Serial Number: E046LE.
Celldisk: CD_01_sdm1cel02.
Griddisks: RECO_CD_01_sdm1cel02, DATA_CD_01_sdm1cel02.
171_1 2011-01-25T10:52:51+00:00 critical
Cell configuration check discovered the following problems:
=CELLBOOT USB=
=disk partitions, MD devices, CELLBOOT usb


/dev/md1
0 8 10 0 active sync /dev/sda10
/dev/md2
0 8 9 0 active sync /dev/sda9
/dev/md4
0 8 1 0 active sync /dev/sda1
/dev/md5
0 8 5 0 active sync /dev/sda5
/dev/md6
<snip>

[ERROR] Cells must have 3 bootable devices. 2 with label BOOT and 1
with label CELLBOOT
/dev/sda BOOT
/dev/sdm CELLBOOT
/dev/sda7 CELLSW
/dev/sda5 CELLSYS
/dev/sda1 BOOT
/dev/sdm1 CELLBOOT
[ERROR] Exactly 1 CELLBOOT, 2 BOOT, 2 CELLSYS and 2 CELLSW labels
must exist. If uncorrected you risk data loss
[WARNING] Only one system disk has system disk layout. Either the second
system disk is unavailable or this is not an Exadata cell
[ERROR] The Cell has missing system disks or improperly configured
and partitioned disks

172 2011-01-26T12:34:22+00:00 critical "Physical
drive removed on Adapter 0 deviceId 9, enclosureId 20, slotId 1."
173 2011-01-26T12:34:22+00:00 info "Physical
drive state changed on Adapter 0 deviceId 9, enclosureId 20, slotId
1 from failed to unconfigured-bad."
174 2011-01-26T12:34:26+00:00 warning Logical
drive status changed. Lun: 0_1. Status: not present. Physical hard
disk: 20:1. Slot Num: 1. Serial Number: E046LE.
Celldisk: CD_01_sdm1cel02.
Griddisks: RECO_CD_01_sdm1cel02, DATA_CD_01_sdm1cel02.
175 2011-01-26T12:34:27+00:00 critical "A field
replaceable unit at /SYS/DBP/HDD1 has been removed from the system."
176 2011-01-26T12:35:31+00:00 clear "A field
replaceable unit at /SYS/DBP/HDD1 has been inserted into the
system."
177 2011-01-26T12:35:38+00:00 clear "Physical
drive inserted on Adapter 0 deviceId 21, enclosureId 20, slotId 1."
178 2011-01-26T12:36:31+00:00 info "Physical
drive state changed on Adapter 0 deviceId 21, enclosureId 20, slotId
1 from unconfigured-good to online."
179 2011-01-26T12:36:52+00:00 clear Logical
drive found. Lun: 0_1. Status: normal. Physical hard disk: 20:1.
Slot Num: 1. Serial Number: E1NQF3. It was empty. Will auto-create.
Celldisk: CD_01_sdm1cel02.
Griddisks: DATA_CD_01_sdm1cel02, RECO_CD_01_sdm1cel02.

## Checking CELL ##
CellCLI> alter cell validate configuration
/dev/md1
0 8 10 0 active sync /dev/sda10
/dev/md2
0 8 9 0 active sync /dev/sda9
/dev/md4
0 8 1 0 active sync /dev/sda1
1 65 209 1 active sync /dev/sdad1
<snip>
/dev/md8
0 8 8 0 active sync /dev/sda8
1 65 216 1 active sync /dev/sdad8


/dev/sda7 CELLSW
/dev/sda5 CELLSYS
/dev/sda1 BOOT <------------ only one, where is sdad1 from above /dev/md4 !?!?
/dev/sdad7 CELLSW
/dev/sdad5 CELLSYS
/dev/sdm1 CELLBOOT
[ERROR] Exactly 1 CELLBOOT, 2 BOOT, 2 CELLSYS and 2 CELLSW labels
must exist. If uncorrected you risk data loss
[ERROR] The Cell has missing system disks or improperly configured
and partitioned disks
CellCLI>

#####################################
## Checking IMAGEINFO for above devices ##
#####################################
[root@sdm1cel02 MegaCli]# /opt/oracle.cellos/imageinfo

Kernel version: 2.6.18-128.1.16.0.1.el5 #1 SMP Tue Jun 30 16:48:30
EDT 2009 x86_64
Cell version: OSS_11.2.1.2.6_LINUX.X64_100511
Cell rpm version: cell-11.2.1.2.6_LINUX.X64_100511-1

Active image version: 11.2.1.2.6
Active image activated: 2010-07-28 11:22:16 +0100
Active image status: success
Active system partition on device: /dev/md5
Active software partition on device: /dev/md7

In partition rollback to 11.2.1.2.3: Possible

Cell boot usb partition: /dev/sdm1
Cell boot usb version: 11.2.1.2.6
Inactive image version: undefined
Rollback to the inactive partitions: Impossible


######################################
## Check Devices Labels from above devices ##
######################################
[root@sdm1cel02 MegaCli]# e2label /dev/md5
CELLSYS
[root@sdm1cel02 MegaCli]# e2label /dev/md7
CELLSW
[root@sdm1cel02 MegaCli]# e2label /dev/sdm1
CELLBOOT
[root@sdm1cel02 MegaCli]# e2label /dev/sda5
CELLSYS
[root@sdm1cel02 MegaCli]# e2label /dev/sda7
CELLSW
[root@sdm1cel02 MegaCli]# e2label /dev/md4
BOOT
[root@sdm1cel02 MegaCli]# e2label /dev/sda1
BOOT

############################
## Check Disk SCSI and Labels ##
############################
[root@sdm1cel02 MegaCli]# lsscsi -d|grep LSI
[0:0:20:0] enclosu LSILOGIC SASX28 A.1 502E -
[0:2:0:0] disk LSI MR9261-8i 2.0. /dev/sda[8:0]
[0:2:1:0] disk LSI MR9261-8i 2.0. /dev/sdad[65:208]
<---------- clue!! this device is different from all other cells.
Added.
[0:2:2:0] disk LSI MR9261-8i 2.0. /dev/sdc[8:32]
<snip>
[0:2:10:0] disk LSI MR9261-8i 2.0. /dev/sdk[8:160]
[0:2:11:0] disk LSI MR9261-8i 2.0. /dev/sdl[8:176]



[root@sdm1cel02 MegaCli]# blkid
/dev/sda1: LABEL="BOOT" UUID="873922fa-afb8-4418-a917-c1bca7be2645"
TYPE="ext3"
/dev/sda2: UUID=3ef3b91-511e-4f89-a92e-df36823ef843" TYPE="ext2"
<snip>
/dev/md5: LABEL="CELLSYS" UUID="3bd269f4-c94c-4a5c-9892-efcaaa6299e6"
TYPE="ext3"
/dev/md4: LABEL="BOOT" UUID="873922fa-afb8-4418-a917-c1bca7be2645"
TYPE="ext3" <---- BOOT is here.
/dev/sdad2: UUID=dd7ffcc-0c31-415c-9a48-39321b8471ed" TYPE="ext2"
/dev/sdad5: LABEL="CELLSYS" UUID="3bd269f4-c94c-4a5c-9892-efcaaa6299e6"
TYPE="ext3"
/dev/sdad6: UUID=d45e1b1-d879-47e7-955b-af53375c7132"
SEC_TYPE="ext2" TYPE="ext3"
/dev/sdad7: LABEL="CELLSW" UUID="3e84d917-3123-47f1-bde4-22a20e6f21c6"
TYPE="ext3"
/dev/sdad8: UUID=ef2ba4d-96d9-4121-b5bc-3b03ac6e27ee"
SEC_TYPE="ext2" TYPE="ext3"

#### FIX ####
Resync MD device to discover labels if missing.
# mdadm --query /dev/md4
# blkid <-----Run again, shows the missing BOOT label added and
CELL VALIDATION reports successful.
/dev/sdad1: LABEL="BOOT" UUID="873922fa-afb8-4418-a917-c1bca7be2645"
TYPE="ext3"
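
The validation error in this case states the rule directly: exactly 1 CELLBOOT, 2 BOOT, 2 CELLSYS, and 2 CELLSW labels must exist. The following is a minimal Python sketch that counts labels in a "device label" listing and reports any label whose count differs from that rule. The function name label_problems and the BEFORE_FIX listing (copied from the partition labels reported before the fix) are illustrative; this is not how ALTER CELL VALIDATE CONFIGURATION is implemented.

```python
from collections import Counter

# Required label counts, as stated in the validation error message.
REQUIRED = {"CELLBOOT": 1, "BOOT": 2, "CELLSYS": 2, "CELLSW": 2}

# Partition labels reported before the fix (annotation text is ignored).
BEFORE_FIX = """\
/dev/sda7 CELLSW
/dev/sda5 CELLSYS
/dev/sda1 BOOT <------------ only one
/dev/sdad7 CELLSW
/dev/sdad5 CELLSYS
/dev/sdm1 CELLBOOT
"""


def label_problems(listing):
    """Return {label: (found, required)} for every label with a wrong count."""
    counts = Counter()
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].startswith("/dev/") and parts[1] in REQUIRED:
            counts[parts[1]] += 1
    return {
        label: (counts[label], needed)
        for label, needed in REQUIRED.items()
        if counts[label] != needed
    }


if __name__ == "__main__":
    print("Before fix:", label_problems(BEFORE_FIX))
    print("After fix: ", label_problems(BEFORE_FIX + "/dev/sdad1 BOOT\n"))
```

In this case the only discrepancy is the second BOOT label, which appears on /dev/sdad1 after the MD device is queried again.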



Replacing a Physical Disk Due to Disk Problems

You may need to replace a physical disk when the disk is in predictive
failure status or poor performance status. The predictive failure status
indicates that the physical disk will soon fail and should be replaced at
the earliest opportunity.
The Oracle ASM disks associated with the grid disks on the physical
drive are automatically dropped, and an Oracle ASM rebalance will
relocate the data from the predictively failed disk to other disks.

The poor performance status indicates that the physical disk demonstrates extremely poor
performance, and should be replaced at the earliest opportunity. The Oracle ASM disks
associated with the grid disks on the physical drive are automatically dropped with the FORCE
option if possible. If DROP FORCE cannot succeed due to offline partners, the grid disks are
automatically dropped normally, and an Oracle ASM rebalance will relocate the data from the
poor performance disk to other disks.
To replace a physical disk, perform the following steps:
1. Determine the failing disk by using the following commands:
CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status= \
"predictive failure" DETAIL
CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status= \
"poor performance" DETAIL



The following is an example of the output from the commands. The slot number shows the
location of the disk, and the status shows that the disk is expected to fail.
CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status= \
"predictive failure" DETAIL
name: 28:3
deviceId: 19
diskType: HardDisk
enclosureDeviceId: 28
errMediaCount: 0
errOtherCount: 0
foreignState: false
luns: 0_3
makeModel: "SEAGATE ST360057SSUN600G"
physicalFirmware: 0705
physicalInterface: sas
physicalSerial: E07L8E
physicalSize: 558.9109999993816G
slotNumber: 3
status: predictive failure

CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status="poor performance" DETAIL
name: 28:3
deviceId: 19
diskType: HardDisk
enclosureDeviceId: 28
errMediaCount: 0
errOtherCount: 0
foreignState: false
luns: 0_3
makeModel: "SEAGATE ST360057SSUN600G"
physicalFirmware: 0705
physicalInterface: sas
physicalSerial: E07L8E
physicalSize: 558.9109999993816G
slotNumber: 3
status: poor performance
2. Wait until the Oracle ASM disks associated with the grid disks on the physical disk have
been successfully dropped. To determine if the grid disks have been dropped, query the
V$ASM_DISK_STAT view on the Oracle ASM instance.
Caution: The disks in the first two slots are system disks, which store the operating
system and Oracle Exadata Storage Server Software. One system disk must be in
working condition to keep the cell up and running. Wait until ALTER CELL VALIDATE
CONFIGURATION shows no mdadm errors, which indicates that the system disk resync
has completed, before replacing the other system disk.



3. Replace the physical disk. The physical disk is hot pluggable, and can be replaced
when the power is on.
When you remove the disk, you will get an alert. The grid disks and cell disks that
existed on the previous disk in the slot will be re-created on the new physical disk. If
those grid disks were part of an Oracle ASM disk group, they will be added back to the
disk group and the data will be rebalanced based on disk group redundancy and the
asm_power_limit parameter.
Note: When you replace a physical disk, the disk must be acknowledged by the RAID
controller before you can use it. This does not take long, but you should use the
LIST PHYSICALDISK command to ensure that the status is NORMAL.
Oracle ASM rebalance occurs when dropping or adding a disk. To check the status of the
rebalance, do the following:
• The rebalance operation may have been successfully run. Check the Oracle ASM alert
logs to confirm this.
• The rebalance operation may be currently running. Check the GV$ASM_OPERATION
view to determine if the rebalance operation is still running.
• The rebalance operation may have failed. Check the V$ASM_OPERATION.ERROR view
to determine if the rebalance operation failed.
• Rebalance operations from multiple disk groups can be performed on different Oracle
ASM instances in the same cluster if the physical disk that is being replaced contains
ASM disks from multiple disk groups. One Oracle ASM instance can run one rebalance
operation at a time. If all Oracle ASM instances are busy, rebalance operations will be
queued.



Removing a Physical Disk Due
to Bad Performance
A single bad physical disk can degrade the performance of
other good disks. It is better to remove the bad disk from the
system than let it remain. To identify a bad physical disk, use
the CALIBRATE command and look for very low throughput
and input/output operations per second (IOPS) for each
physical disk.
Note: If a disk exhibits extremely poor performance, it is
marked as poor performance and its grid disks are
automatically dropped from the ASM disk group.

After the bad disk has been identified, perform the following procedure:
1. Find all the grid disks on the bad disk. Use the following command to direct Oracle ASM
to stop using the bad disk immediately:
ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name FORCE
It is possible that the DROP command with the FORCE option could fail due to offline
partners. You can restore the Oracle ASM data redundancy by correcting other cell or
disk failures and retry DROP...FORCE, or use the following command to direct Oracle
ASM to rebalance the data out of the bad disk:
ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name NOFORCE
2. Wait until the Oracle ASM disks associated with the grid disks on the bad disk have
been successfully dropped by querying the V$ASM_DISK_STAT view.
3. Remove the badly performing disk. When you remove the disk, you get an alert.



4. When a new disk is available, install the new disk in the system. The cell disks and grid
disks are automatically created on the new physical disk. If these grid disks were part of
an Oracle ASM disk group and DROP...FORCE was used in step 1, they will be added
back to the disk group and the data will be rebalanced based on disk group redundancy
and the asm_power_limit parameter. If DROP...NOFORCE was used in step 1, you
must manually add the grid disks back to the Oracle ASM disk group.
Note: When you replace a physical disk, the disk must be acknowledged by the RAID
controller before you can use it. This does not take long, but you should use the LIST
PHYSICALDISK command to ensure that the status is NORMAL.
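
The status check described in the note can be scripted once the command output has been captured. The following Python sketch is an illustration only, not an Oracle-supplied tool. It assumes that the short form of LIST PHYSICALDISK prints three whitespace-separated columns (name, serial number, status), in the form `20:0 D174LX normal`. The function names and the third sample row are hypothetical.

```python
def parse_physicaldisk_list(output):
    """Parse short-form LIST PHYSICALDISK output into (name, serial, status) tuples.

    Assumption: each disk is one line with three whitespace-separated columns,
    and the status (the last column) may itself contain spaces.
    """
    disks = []
    for line in output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip blank lines, prompts, and "..." elisions
        name, serial, status = parts
        disks.append((name, serial, status.strip()))
    return disks


def disks_not_normal(output):
    """Return the disks whose status is anything other than normal."""
    return [d for d in parse_physicaldisk_list(output) if d[2].lower() != "normal"]


# Sample text; the third row is a made-up disk added for illustration.
sample = """\
20:0 D174LX normal
20:1 D149R0 normal
20:2 D1AAAA predictive failure
"""

for name, serial, status in disks_not_normal(sample):
    print(f"{name} ({serial}): {status}")
```

An empty result means that every listed disk reports a normal status, which is the condition the note asks you to confirm before relying on the replacement disk.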



Repurposing a Physical Disk

You may want to delete all data on a disk and then use the disk
for another purpose.


Before doing so, ensure that you have copies of the data that is
on the disk.
To repurpose a disk:
1. Use the CellCLI LIST command to display the Exadata
cell objects. You must identify the grid disks and cell disks
on the physical drive.
Example:
CellCLI> LIST PHYSICALDISK
20:0 D174LX normal
20:1 D149R0 normal
...
2. Determine the cell disks and grid disks on the LUN by
using commands similar to the following:
CellCLI> LIST LUN ATTRIBUTES name, cellDisk WHERE
physicalDrives='20:0'
3. From Oracle ASM, drop the Oracle ASM disks on the
physical disk by using the following command:
ALTER DISKGROUP diskgroup_name DROP DISK
asm_disk_name
4. From Exadata cell, drop the cell disks and grid disks on the
physical disk by using the following command:
DROP CELLDISK celldisk_on_this_lun FORCE
5. Remove the disk to be repurposed and insert a new disk.
6. Wait for the new physical disk to be added as a LUN.
CellCLI> LIST LUN


7. Create new cell disks on the new physical disk.
8. Create new grid disks on the new cell disks.
9. Add the grid disks to the Oracle ASM disk group.



Moving All Drives from One Cell to Another Cell

You may need to move all the drives from one Exadata cell to
another Exadata cell. This may be necessary when there is a
chassis-level component failure, or when you are
troubleshooting a hardware problem.
To move the drives:
1. Refer to the slide titled “Shutting Down a Cell” to safely
inactivate all grid disks. Make sure that ASM
DISK_REPAIR_TIME is set long enough that
Oracle ASM does not drop the disks before the grid disks
can be activated in another Exadata cell.
2. Back up /etc/hosts, /etc/modprobe.conf, and
/etc/sysconfig/network. These files will be
re-created when /usr/local/bin/ipconf is run later in
this procedure.

3. Move the disks from the original Exadata cell to the new
Exadata cell.
Caution: Ensure that the first two disks, which are the
system disks, are in the same first two slots. Failure to do
so will cause the Exadata cell to function improperly.
4. Start the cell.
5. Restart the cell services by using the following command:
CellCLI> ALTER CELL RESTART SERVICES ALL
6. Activate the grid disks by using the following command:
CellCLI> ALTER GRIDDISK ALL ACTIVE
If the Oracle ASM disks on this cell have not been
dropped, they are changed to online automatically and
start getting used.



Removing and Replacing the Same Physical Disk

If you inadvertently removed the wrong physical disk, perform the
following steps:
1. Replace the physical disk immediately, and wait for the disk to
be recognized by the RAID controller.
2. Run the following command to obtain the LUN and accept the
cell disks and grid disks on it:
cellcli -e "ALTER LUN lun_name REENABLE FORCE"
3. From ASM, verify the status of the ASM disks on the physical
disk.
4. Run the following command from the ASM host:
SQL> ALTER DISKGROUP diskgroup_name ONLINE DISK
asm_disk_name



Replacing an F20 Flash Disk Due to Failure

Oracle Exadata X2-2 Storage Server is equipped with four F20
PCIe cards. Each card has four flash disks (FDOMs) for a total
of 16 flash disks. The four F20 PCIe cards are present on the
PCI slots numbered 1, 2, 4, and 5. The F20 PCIe cards are not
hot pluggable; therefore, the Exadata cell must be powered
down before replacing the flash disks or cards.

To identify a failed flash disk, use the following command:
CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk AND STATUS=critical DETAIL

name: [9:0:2:0]
diskType: FlashDisk
id: 508002000092e70FMOD2
luns: 1_2
makeModel: "MARVELL SD88SA02"
physicalFirmware: D20R
physicalInsertTime: 2009-10-27T13:11:16-07:00
physicalInterface: sas
physicalSerial: 508002000092e70FMOD2
physicalSize: 22.8880615234375G
slotNumber: "PCI Slot: 1; FDOM: 2"
status: critical



The slotNumber attribute shows the PCI slot and FDOM number. If an FDOM is detected to
have failed, an alert is generated, indicating that the flash disk, as well as the LUN on it, has
failed. The alert message includes the PCI slot number of the flash card, and the exact FDOM
number. These numbers uniquely identify the field replaceable unit (FRU). If you have
configured the system for alert notification, the alert will be sent by email to the designated
address.
A flash disk outage can cause reduction in performance and data redundancy. The failed disk
should be replaced with a new flash disk at the earliest opportunity. If the flash disk is used for
flash cache, the effective cache size for the cell is reduced. If the flash disk is used for grid
disks, the Oracle ASM disks associated with these grid disks are automatically dropped with
the FORCE option from the Oracle ASM disk group and Oracle ASM rebalance will ensue to
restore the data redundancy.
To replace a flash disk due to disk failure, perform the following steps:
1. Shut down the cell.
2. Replace the failed flash disk based on the PCI number and FDOM number.
3. Power up the cell. The cell services will be started automatically.
4. Bring all grid disks online by using the following command:
CellCLI> ALTER GRIDDISK ALL ACTIVE
5. Verify that all grid disks have been successfully put online by using the following
command:
CellCLI> LIST GRIDDISK ATTRIBUTES asmmodestatus
Wait until asmmodestatus shows ONLINE for all grid disks.
The new flash disk will be automatically used by the system. If the flash disk is used for flash
cache, the effective cache size will increase. If the flash disk is used for grid disks, the grid
disks will be re-created on the new flash disk. If those grid disks were part of an Oracle ASM
disk group, they will be added back to the disk group and the data will be rebalanced on them
based on the disk group redundancy and the asm_power_limit parameter.
Oracle ASM rebalance occurs when dropping or adding a disk. Consider the following:
• The rebalance operation may have been successfully run. Check the Oracle ASM alert
logs to confirm.
• The rebalance operation may be currently running. Check the GV$ASM_OPERATION
view to determine if the rebalance operation is still running.
• The rebalance operation may have failed. Check the V$ASM_OPERATION.ERROR view
to determine if the rebalance operation failed.
Rebalance operations from multiple disk groups can be performed on different Oracle ASM
instances in the same cluster if the physical disk being replaced contains ASM disks from
multiple disk groups. One Oracle ASM instance can run one rebalance operation at a time. If
all Oracle ASM instances are busy, the rebalance operations are queued.
See the Sun Flash Accelerator F20 PCIe Card User’s Guide for additional information about
the F20 PCIe cards at http://docs.sun.com/app/docs/prod/flash.pcie?l=en&a=view.
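
Because the slotNumber attribute identifies the field replaceable unit, it can be useful to extract the PCI slot and FDOM numbers from captured DETAIL output before opening the chassis. The following Python sketch is one possible way to do that. It assumes the `attribute: value` layout and the `"PCI Slot: n; FDOM: m"` format shown in the example output in this lesson; the function names are hypothetical and are not part of CellCLI.

```python
import re


def parse_detail(output):
    """Turn the 'attribute: value' lines of one DETAIL record into a dict."""
    record = {}
    for line in output.splitlines():
        # Split on the first colon only, so values that contain colons survive.
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue
        record[key.strip()] = value.strip().strip('"')
    return record


# Assumed slotNumber format, taken from the example output in the text.
SLOT_RE = re.compile(r"PCI Slot:\s*(\d+);\s*FDOM:\s*(\d+)")


def flash_fru_location(record):
    """Return (pci_slot, fdom) as integers, or None if the format is not recognized."""
    match = SLOT_RE.search(record.get("slotNumber", ""))
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = '''\
name: [9:0:2:0]
diskType: FlashDisk
luns: 1_2
slotNumber: "PCI Slot: 1; FDOM: 2"
status: critical
'''

record = parse_detail(sample)
print(flash_fru_location(record))  # prints (1, 2)
```

The sketch handles a single record. Output that lists several flash disks would first need to be split into one block per disk.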



Replacing an F20 Flash Disk Due to Problems

You may need to replace a flash disk because the disk is in
predictive failure status or poor performance status.

To identify a predictive failure flash disk, use the following command:
CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk AND STATUS= \
'predictive failure' DETAIL
name: [9:0:2:0]
diskType: FlashDisk
id: 508002000092e70FMOD2
luns: 1_2
makeModel: "MARVELL SD88SA02"
physicalFirmware: D20R
physicalInsertTime: 2009-10-27T13:11:16-07:00
physicalInterface: sas
physicalSerial: 508002000092e70FMOD2
physicalSize: 22.8880615234375G
slotNumber: "PCI Slot: 1; FDOM: 2"
status: predictive failure



To identify a poor performance flash disk, use the following command:
CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk AND STATUS= \
'poor performance' DETAIL
name: [9:0:2:0]
diskType: FlashDisk
id: 508002000092e70FMOD2
…<truncated>
slotNumber: "PCI Slot: 1; FDOM: 2"
status: poor performance
To replace a flash disk due to disk problems, perform the following steps:
1. Shut down the cell.
2. Replace the failed flash disk based on the PCI number and FDOM number.
3. Power up the cell. The cell services will be started automatically.
4. Bring all grid disks online by using the following command:
CellCLI> ALTER GRIDDISK ALL ACTIVE
5. Verify that all grid disks have been successfully put online by using the following
command:
CellCLI> LIST GRIDDISK ATTRIBUTES asmmodestatus
Wait until asmmodestatus shows ONLINE for all grid disks.
The new flash disk will be automatically used by the system. If the flash disk is used for flash
cache, the effective cache size will increase. If the flash disk is used for grid disks, the grid
disks will be re-created on the new flash disk. If those grid disks were part of an Oracle ASM
disk group, they will be added back to the disk group and the data will be rebalanced on them
based on the disk group redundancy and the asm_power_limit parameter.
Oracle ASM rebalance occurs when dropping or adding a disk. To check the status of the
rebalance, do the following:
• The rebalance operation may have been successfully run. Check the Oracle ASM alert
logs to confirm.
• The rebalance operation may be currently running. Check the GV$ASM_OPERATION
view to determine if the rebalance operation is still running.
• The rebalance operation may have failed. Check the V$ASM_OPERATION.ERROR view
to determine if the rebalance operation has failed.
Rebalance operations from multiple disk groups can be performed on different Oracle ASM
instances in the same cluster if the physical disk being replaced contains ASM disks from
multiple disk groups. One Oracle ASM instance can run one rebalance operation at a time. If
all Oracle ASM instances are busy, the rebalance operations are queued.



Removing an F20 Flash Disk
Due to Bad Performance
A single bad flash disk can degrade the performance of other
good flash disks. It is better to remove the bad flash disk from
the system than to let it remain. To identify a bad flash disk, use
the CALIBRATE command and look for very low throughput
and IOPS for each flash disk.
If a flash disk exhibits extremely poor performance, it is marked
as poor performance. The flash cache on that flash disk will be
automatically disabled, and the grid disks on that flash disk will
be automatically dropped from the Oracle ASM disk group.

After the bad flash disk is identified, perform the following steps:
1. If the flash disk is used for flash cache, disable the flash cache that is part of the flash
disk by using the following commands:
CellCLI> DROP FLASHCACHE
CellCLI> CREATE FLASHCACHE CELLDISK='fd1,fd2,fd3,fd4, ...'
Note: Do not include the bad flash disk when creating the new flash cache.
2. If the flash disk is used for grid disks, use the following command to direct Oracle ASM
to stop using the bad disk immediately:
ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name FORCE
It is possible that the DROP command with the FORCE option may fail due to offline
partners. Either restore the Oracle ASM data redundancy by correcting other cell or disk
failures and retry DROP...FORCE, or use the following command to direct Oracle ASM
to rebalance the data out of the bad disk:
ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name NOFORCE



3. Wait until the Oracle ASM disks associated with the grid disks on this bad flash disk
have been successfully dropped by checking the V$ASM_DISK_STAT view.
4. Shut down the cell.
5. Remove the bad flash disk and replace it with a new flash disk.
6. Power up the cell. The cell services will be started automatically.
7. Bring all grid disks online by using the following command:
CellCLI> ALTER GRIDDISK ALL ACTIVE


8. Verify that all grid disks have been successfully put online by using the following
command:
CellCLI> LIST GRIDDISK ATTRIBUTES asmmodestatus
Wait until asmmodestatus shows ONLINE for all grid disks.
9. Add the new flash disk to flash cache by using the following commands:
CellCLI> DROP FLASHCACHE
CellCLI> CREATE FLASHCACHE ALL
If the flash disk is used for grid disks, the grid disks will be re-created on the new flash disk. If
these grid disks were part of an Oracle ASM disk group and DROP...FORCE was used in
step 2, they will be added back to the disk group and the data will be rebalanced on them
based on the disk group redundancy and the asm_power_limit parameter.
If DROP...NOFORCE was used in step 2, you must manually add the grid disks back to the
Oracle ASM disk group.
Oracle ASM rebalance occurs when dropping or adding a disk. To check the status of the
rebalance, do the following:
• The rebalance operation may have been successfully run. Check the Oracle ASM alert
logs to confirm this.
• The rebalance operation may be currently running. Check the GV$ASM_OPERATION
view to determine if the rebalance operation is still running.
• The rebalance operation may have failed. Check the V$ASM_OPERATION.ERROR view
to determine if the rebalance operation has failed.
Rebalance operations from multiple disk groups can be performed on different Oracle ASM
instances in the same cluster if the physical disk being replaced contains ASM disks from
multiple disk groups. One Oracle ASM instance can run one rebalance operation at a time. If
all Oracle ASM instances are busy, the rebalance operations are queued.
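
Step 1 of this procedure re-creates the flash cache on every flash cell disk except the bad one. When a cell has many flash cell disks, typing the CELLDISK list by hand is error prone, so the list can be generated instead. The following Python sketch only builds the command text shown in step 1; it does not run CellCLI. The helper name is hypothetical, and the cell disk names fd1 through fd4 are the placeholders used in the step, not real object names.

```python
def create_flashcache_command(cell_disks, bad_disk):
    """Build a CREATE FLASHCACHE command that leaves out the bad flash cell disk."""
    good = [name for name in cell_disks if name != bad_disk]
    if len(good) == len(cell_disks):
        # Nothing was filtered out, so the bad disk name is probably mistyped.
        raise ValueError(f"{bad_disk!r} is not in the cell disk list")
    if not good:
        raise ValueError("no flash cell disks left for the flash cache")
    return "CREATE FLASHCACHE CELLDISK='" + ",".join(good) + "'"


# Placeholder names from the text; fd3 is assumed to be the bad flash disk.
flash_cell_disks = ["fd1", "fd2", "fd3", "fd4"]
print(create_flashcache_command(flash_cell_disks, "fd3"))
# prints: CREATE FLASHCACHE CELLDISK='fd1,fd2,fd4'
```

Raising an error when the bad disk is not in the list is a deliberate choice: silently producing a command that still includes the bad disk would defeat the purpose of the step.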



Shutting Down a Cell

• When performing maintenance on Exadata cells, it may be
necessary to power down or reboot the cell.


• If a Storage Server is to be shut down when one or more
databases are running, verify that taking the Storage
Server offline will not impact the Oracle ASM disk group
and database availability.
• The ability to take Oracle Exadata Storage Server offline
without affecting database availability depends on the level
of Oracle ASM redundancy used on the affected disk
groups, and the current status of disks in the other Oracle
Exadata Storage Servers that have mirror copies of data.

The following procedure describes how to power down an Exadata cell.
1. Run the following command to check if there are other offline disks:
CellCLI> LIST GRIDDISK ATTRIBUTES name WHERE asmdeactivationoutcome != 'Yes'
If any grid disks are returned, it is not safe to take the Storage Server offline because
Oracle ASM disk group redundancy will not be intact in this case. Taking the Storage
Server offline when one or more grid disks are in this state will cause Oracle ASM to
dismount the affected disk group, causing the databases to shut down abruptly.
2. Inactivate all the grid disks when you want to take the Oracle Exadata Storage Server
offline by using the following command:
CellCLI> ALTER GRIDDISK ALL INACTIVE
The preceding command will complete after all disks are inactive and offline.
Depending on the Storage Server activity, it may take several minutes for this command
to complete.



3. Verify that all grid disks are INACTIVE to allow safe Storage Server shutdown by
running the following command:
CellCLI> LIST GRIDDISK WHERE STATUS != 'inactive'
If all grid disks are INACTIVE, the Storage Server can be shut down without affecting
database availability.
4. Shut down the cell.
5. After performing the maintenance, start the cell. The cell services will start automatically.
6. Bring all grid disks online by using the following command:
CellCLI> ALTER GRIDDISK ALL ACTIVE
When the grid disks become active, Oracle ASM will automatically synchronize the grid
disks to bring them back into the disk group.
7. Verify that all grid disks have been successfully put online by using the following
command:
CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
Wait until asmmodestatus is ONLINE for all grid disks. The following is an example of
the output:
DATA_CD_00_dm01cel01 ONLINE
DATA_CD_01_dm01cel01 SYNCING
DATA_CD_02_dm01cel01 OFFLINE
…<truncated>
DATA_CD_11_dm01cel01 OFFLINE
Oracle ASM synchronization is complete only when all grid disks show
asmmodestatus=ONLINE. Before taking another Storage Server offline, Oracle ASM
synchronization must complete on the restarted Oracle Exadata Storage Server. If
synchronization is not complete, the check performed on another Storage Server will fail. The
following is an example of the output:
CellCLI> list griddisk attributes name where
asmdeactivationoutcome != 'Yes'
DATA_CD_00_dm01cel02 "Cannot de-activate due to other offline
disks in the diskgroup"
DATA_CD_01_dm01cel02 "Cannot de-activate due to other offline
disks in the diskgroup"
DATA_CD_02_dm01cel02 "Cannot de-activate due to other offline
disks in the diskgroup"
DATA_CD_03_dm01cel02 "Cannot de-activate due to other offline
disks in the diskgroup"
DATA_CD_04_dm01cel02 "Cannot de-activate due to other offline
disks in the diskgroup"
DATA_CD_05_dm01cel02 "Cannot de-activate due to other offline
disks in the diskgroup"



DATA_CD_06_dm01cel02 "Cannot de-activate due to other offline
disks in the diskgroup"
DATA_CD_07_dm01cel02 "Cannot de-activate due to other offline
disks in the diskgroup"
DATA_CD_08_dm01cel02 "Cannot de-activate due to other offline
disks in the diskgroup"
DATA_CD_09_dm01cel02 "Cannot de-activate due to other offline
disks in the diskgroup"


DATA_CD_10_dm01cel02 "Cannot de-activate due to other offline
disks in the diskgroup"
DATA_CD_11_dm01cel02 "Cannot de-activate due to other offline
disks in the diskgroup"
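
The wait in step 7 amounts to checking that no grid disk reports a status other than ONLINE. The following Python sketch shows that check applied to captured output. It assumes the two-column `name status` layout of the LIST GRIDDISK ATTRIBUTES name, asmmodestatus example above; the function name is hypothetical, and the sketch does not itself call CellCLI.

```python
def grid_disks_not_online(output):
    """Map grid disk name to asmmodestatus for every disk that is not ONLINE."""
    pending = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # ignore blank lines and anything that is not "name status"
        name, status = parts
        if status.upper() != "ONLINE":
            pending[name] = status
    return pending


# Rows taken from the example output in step 7.
sample = """\
DATA_CD_00_dm01cel01 ONLINE
DATA_CD_01_dm01cel01 SYNCING
DATA_CD_02_dm01cel01 OFFLINE
"""

pending = grid_disks_not_online(sample)
if pending:
    print("Synchronization not complete:", pending)
else:
    print("All grid disks are ONLINE")
```

Only an empty result indicates that it is reasonable to proceed to the asmdeactivationoutcome check on the next Storage Server.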



Re-Creating a Damaged Cell Boot
and Rescuing USB
If the cell boot and rescue USB are lost or damaged, you can
create a new cell boot and rescue USB by performing the
following steps:
1. Attach a new USB flash drive. This flash drive should have
a capacity of 4 GB. See the lesson titled “Maintenance and
FRU Replacement” for part number details.
2. Remove any other USB flash drives from the system.
3. Log in to the cell as the root user.
4. Run the following script:
/opt/oracle.cellos/make_cellboot_usb.sh



Changing the InfiniBand Network Information

• It may be necessary to change the InfiniBand network
information on an existing Database Machine. The change
may be needed to support a media server with multiple
InfiniBand cards, or to keep InfiniBand traffic on a distinct
InfiniBand network (for example, having production, test,
and QA environments in the same rack).
• All InfiniBand addresses must be in the same subnet, with
a minimum subnet mask of 255.255.240.0 (or /20). The
subnet mask chosen should be wide enough to
accommodate possible future expansion of the Database
Machine and InfiniBand network.

The procedure described in this section is based on the following assumptions:
• Channel bonding is used for the client access network, such that the NET1 and NET2 interfaces
are bonded to create BOND1. If channel bonding is not used, replace BOND1 with NET1 in
the procedure.
• The procedure uses the dcli utility and the root user. This significantly reduces the
overall time to complete the procedure by running the commands in parallel on the
Database Servers.
• The dcli utility requires SSH user-equivalence. If SSH user-equivalence is not
configured, some commands must be run explicitly on each Database Server.
• The database group file, dbs_group, must exist and be located in the /root directory.
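
Before starting the procedure, the planned InfiniBand addresses can be checked against the subnet rule stated on the slide. The following Python sketch uses the standard ipaddress module and takes addresses in CIDR format, as used later in this procedure. It makes one interpretive assumption that should be confirmed against the product documentation: "a minimum subnet mask of 255.255.240.0" is read as a prefix length of at least 20. The function name and the sample addresses are hypothetical.

```python
import ipaddress

# Assumption: "minimum subnet mask of 255.255.240.0 (or /20)" is read as
# "the prefix length must be 20 or longer".
MIN_PREFIX_LEN = 20


def check_infiniband_addresses(cidr_addresses):
    """Return a list of problems found in the planned addresses (empty if none)."""
    interfaces = [ipaddress.ip_interface(addr) for addr in cidr_addresses]
    problems = []

    # Rule 1: every address must fall in the same subnet.
    networks = sorted({iface.network for iface in interfaces}, key=str)
    if len(networks) > 1:
        problems.append(
            "more than one subnet: " + ", ".join(str(n) for n in networks)
        )

    # Rule 2: the mask must not be shorter than the assumed minimum.
    for iface in interfaces:
        if iface.network.prefixlen < MIN_PREFIX_LEN:
            problems.append(f"{iface} uses a mask shorter than /{MIN_PREFIX_LEN}")
    return problems


print(check_infiniband_addresses(["192.168.50.23/24", "192.168.50.24/24"]))
print(check_infiniband_addresses(["192.168.50.23/24", "192.168.51.24/24"]))
```

The first call reports no problems. The second reports that the two addresses fall in different /24 subnets, which is the situation the slide warns against.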
The following procedure describes how to change the InfiniBand network information:
1. Verify the assignment of the new InfiniBand network information for all servers.
Verification should include the InfiniBand IP addresses, netmask, broadcast, and
network IP information.



2. Shut down all cluster-managed services on each Database Server as the Oracle user
by using the following command:
$ srvctl stop home -o db_home -s state_filename -n node_name
In the preceding command, db_home is the full directory name for the Oracle Database
home directory, state_filename is the path name where you want the state file to be
written, and node_name is the name of the Database Server. The following is an
example of the command:
$ srvctl stop home -o /u01/app/oracle/product/11.2.0/dbhome_1 -s \
/tmp/dm02db01_dbhome -n dm02db01
In the preceding example, /u01/app/oracle/product/11.2.0/dbhome_1 is the
Oracle Database home directory, /tmp/dm02db01_dbhome is the state_filename,
and dm02db01 is the name of the Database Server.
3. Modify the cluster interconnect interface on the first Database Server to use the NET0
interface as follows:
Note: Only Oracle Clusterware Cluster Ready Services (Oracle Clusterware CRS),
Oracle Clusterware, and Oracle Automatic Storage Management (Oracle ASM) are up
on all the Database Servers.
a. Log in as the oracle user.
b. Set ORACLE_SID=+ASM1. The base for the ORACLE_HOME environment
variable must be set to the Grid Infrastructure home.
c. List the available cluster interfaces by using the following command:
$ oifcfg iflist
The following is an example of the output:
eth0 10.204.78.0
bond1 10.204.76.0
bond0 192.168.120.0
d. List the assigned interfaces by using the following command:
$ oifcfg getif
The following is an example of the output:
bond1 10.204.76.0 global public
bond0 192.168.120.0 global cluster_interconnect
e. Delete the current global cluster_interconnect interface by using the
following command:
$ oifcfg delif -global assigned_interface
In the preceding command, assigned_interface is the interface to be deleted.
The following is an example of the command:
$ oifcfg delif -global bond0
f. Assign NET0 as the global cluster_interconnect interface by using the
following command:
$ oifcfg setif -global c_interface/c_IP_address:cluster_interconnect
In the preceding command, c_interface is the interface to be used for
cluster_interconnect, and c_IP_address is the IP address for
cluster_interconnect (the subnet address, as in the following example).
The following is an example of the command:
$ oifcfg setif -global eth0/10.204.78.0:cluster_interconnect
Notes
- The reassignment of NET0 is only for the duration of this procedure.
- The IP addresses are specified in Classless Inter-Domain Routing (CIDR) format.
For example:
ipaddress1=192.168.50.23/24
g. Confirm the current interface assignments by using the following command:
$ oifcfg getif
The following is an example of the output:
eth0 10.204.78.0 global cluster_interconnect
bond1 10.204.76.0 global public
4. Shut down Oracle Clusterware and Oracle Clusterware CRS on each Database Server
as follows:
a. Log in as the root user.
b. Shut down Oracle Clusterware on each Database Server by using the following
command:
# /u01/app/11.2.0/grid/bin/crsctl stop clusterware
c. Shut down Oracle Clusterware CRS on each Database Server by using the
following command:
# /u01/app/11.2.0/grid/bin/crsctl stop crs -f
d. Disable automatic Oracle Clusterware CRS restart on each Database Server by
using the following command:
# /u01/app/11.2.0/grid/bin/crsctl disable crs
5. Change the InfiniBand IP addresses on each Oracle Exadata Storage Server as follows:
a. Log in as the root user.
b. Run the following commands:
# service celld stop
# service ocrvottargetd stop
# ipconf
For the last command, follow the prompts to change the BOND0 information.
c. Restart the Oracle Exadata Storage Server by using the following command:
# reboot
6. Verify the newly assigned InfiniBand address by using the following command:
# cellcli -e list cell detail | grep ipaddress1
The following is an example of the output:
ipaddress1: 192.168.3.9/22
Note: If there is an error, indicating that one or more cell services are not running,
restart the cell services by using the following command:
# cellcli -e alter cell restart services all
7. Change the InfiniBand IP addresses on each Database Server as follows:
a. Log in as the root user.
b. Change to the /etc/sysconfig/network-scripts directory.
c. Copy the ifcfg-bond0 file. The copied file name must not start with ifcfg.
The following is an example of the copy command:
# cp ifcfg-bond0 orig_ifcfg-bond0
d. Edit the ifcfg-bond0 file to update the IPADDR, NETMASK, NETWORK, and
BROADCAST fields. The following is an example of the original file and an
updated file:
Example of the original ifcfg-bond0 file:
#### DO NOT REMOVE THESE LINES ####
#### %GENERATED BY CELL% ####
DEVICE=bond0
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.120.253
NETMASK=255.255.254.0
NETWORK=192.168.120.0
BROADCAST=192.168.121.255
BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000"
IPV6INIT=no
MTU=65520
Example of the updated ifcfg-bond0 file:
#### DO NOT REMOVE THESE LINES ####
#### %GENERATED BY CELL% ####
DEVICE=bond0
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.3.253
NETMASK=255.255.252.0
NETWORK=192.168.0.0
BROADCAST=192.168.3.255
BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000"
IPV6INIT=no
MTU=65520
e. Restart the Database Server by using the following command:
# reboot
f. Verify the InfiniBand IP address information by using the following command:
# ifconfig -a
The following is an example of the BOND0 information. It shows the updated
InfiniBand network information:
inet addr:192.168.3.254 Bcast:192.168.3.255 Mask:255.255.252.0
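The NETWORK and BROADCAST values in the ifcfg-bond0 examples, and the /22 suffix used later in cellinit.ora, follow directly from the IP address and netmask. The following is a minimal sketch of that arithmetic in portable shell. The function names (ip_to_int, int_to_ip, ib_fields) are illustrative and are not part of any Oracle utility.

```shell
# Sketch only: derive NETWORK, BROADCAST, and the CIDR suffix from an IPv4
# address and netmask. All function names here are hypothetical helpers.

# Convert dotted-quad notation to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Convert a 32-bit integer back to dotted-quad notation.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print the derived fields for address $1 and netmask $2.
ib_fields() {
    ip=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    net=$(( ip & mask ))
    bcast=$(( net | (~mask & 0xFFFFFFFF) ))
    # Count the leading one bits of the netmask to get the prefix length.
    prefix=0
    bits=$mask
    while [ $(( bits & 0x80000000 )) -ne 0 ]; do
        prefix=$(( prefix + 1 ))
        bits=$(( (bits << 1) & 0xFFFFFFFF ))
    done
    echo "NETWORK=$(int_to_ip "$net")"
    echo "BROADCAST=$(int_to_ip "$bcast")"
    echo "CIDR=$1/$prefix"
}

# Values taken from the updated ifcfg-bond0 example.
ib_fields 192.168.3.253 255.255.252.0
```

Running the sketch against the planned addresses before editing the files is one way to cross-check the values verified in step 1.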
8. Update the cellinit.ora and cellip.ora files on each Database Server as
follows:
a. Log in as the root user.
b. Change to the /etc/oracle/cell/network-config directory.
c. Make a backup copy of the cellip.ora file. The following is an example of the
command:
# cp cellip.ora orig_cellip.ora
Note: If using SSH user-equivalence, the dcli utility can be used. The following
is an example of the dcli command:
# dcli -l root -g /root/dbs_group "cp cellip.ora orig_cellip.ora"
d. Make a backup copy of the cellinit.ora file. The following is an example of
the command:
# cp cellinit.ora orig_cellinit.ora
Note: If using SSH user-equivalence, the dcli utility can be used. The following
is an example of the dcli command:
# dcli -l root -g /root/dbs_group "cp cellinit.ora orig_cellinit.ora"
e. Change the InfiniBand IP addresses in the cellip.ora file. The following is an
example of the original file, and an updated file:
Example of the original file:
cell="192.168.121.9"
cell="192.168.121.10"
cell="192.168.121.11"
cell="192.168.121.12"
cell="192.168.121.13"
cell="192.168.121.14"
cell="192.168.121.15"
cell="192.168.121.16"
Example of the updated file:
cell="192.168.3.9"
cell="192.168.3.10"
cell="192.168.3.11"
cell="192.168.3.12"
cell="192.168.3.13"
cell="192.168.3.14"
cell="192.168.3.15"
cell="192.168.3.16"
Note: If using SSH user-equivalence, the dcli utility can be used to copy the
updated file from the first Database Server to the other Database Servers. The
following is an example of the dcli command:
# dcli -l root -g /root/dbs_group -f \
/etc/oracle/cell/network-config/cellip.ora
# dcli -l root -g /root/dbs_group "mv /root/cellip.ora \
/etc/oracle/cell/network-config/"
f. Change the InfiniBand IP addresses in the cellip.ora file. The following is an
example of the original file, and an updated file:
Example of the original file:
ipaddress="192.168.120.253/23"
Example of the updated file:
ipaddress="192.168.3.253/22"
Update the cellinit.ora file on each Database Server. The contents of the file
are specific to the Database Server. The dcli utility cannot be used for this step.
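The cellip.ora address change in step 8 can be sketched as a prefix substitution on a working copy of the file. This is a hedged illustration, not an Oracle-supplied procedure: the rewrite_cellip function and the OLD_PREFIX and NEW_PREFIX variables are hypothetical, and the prefixes are assumed from the example files above. It fits only when every cell keeps its last octet, as in that example.

```shell
# Sketch only: rewrite cell addresses in a copy of cellip.ora.
# OLD_PREFIX and NEW_PREFIX are assumptions taken from the example files;
# replace them with the values from the verified address plan.
OLD_PREFIX='192.168.121.'
NEW_PREFIX='192.168.3.'

rewrite_cellip() {
    # $1 = existing cellip.ora, $2 = file to write the updated entries to
    # Escape the dots so that sed matches them literally.
    old_re=$(printf '%s' "$OLD_PREFIX" | sed 's/\./\\./g')
    sed "s/^cell=\"$old_re/cell=\"$NEW_PREFIX/" "$1" > "$2"
}

# Demonstration on a scratch directory, not on the live file.
work=$(mktemp -d)
printf 'cell="192.168.121.9"\ncell="192.168.121.10"\n' > "$work/cellip.ora"
rewrite_cellip "$work/cellip.ora" "$work/cellip.ora.new"
cat "$work/cellip.ora.new"
rm -rf "$work"
```

Writing to a separate file keeps the original intact, so the result can be reviewed before it replaces cellip.ora.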

9. Update the /etc/hosts file on each Database Server and on each Oracle Exadata
Storage Server to use the new InfiniBand IP addresses as follows:
a. Log in as the root user.
b. Make a backup copy of the /etc/hosts file. The following is an example of the
command:
# cp /etc/hosts /etc/orig_hosts
c. Change the InfiniBand IP addresses for the Database Servers and Oracle Exadata
Storage Servers.
10. Update the cluster binaries to use the UDP protocol on each Database Server as
follows:
a. Log in as the root user.
b. Unlock the cluster binaries by using the following commands:
# cd /u01/app/11.2.0/grid/crs/install
# perl rootcrs.pl -unlock -crshome /u01/app/11.2.0/grid
c. Log in as the oracle user.
d. Set ORACLE_SID=+ASM1. The base for ORACLE_HOME must be set to the Grid
Infrastructure home.
e. Change to the /u01/app/11.2.0/grid/rdbms/lib directory.
f. Run the following command:
$ make -f ins_rdbms.mk ipc_g ioracle
11. Start Oracle Clusterware and Oracle Clusterware CRS on each Database Server as
follows:
a. Log in as the root user.
b. Start Oracle Clusterware CRS by using the following command:
# /u01/app/11.2.0/grid/bin/crsctl start crs
c. Start Oracle Clusterware by using the following command:
# /u01/app/11.2.0/grid/bin/crsctl start clusterware
d. Log in as the oracle user.
e. Set ORACLE_SID=+ASM1.
f. Ensure that the Oracle ASM instances have started on each Database Server by
using the following command:
$ srvctl status ASM
Note: If Oracle ASM is not running on any Database Server, start it as the oracle
user. ORACLE_HOME must be set to the Grid Infrastructure home. The following is
an example of the commands to start Oracle ASM on a Database Server:
$ sqlplus / as sysasm
SQL> startup
g. Check the Oracle ASM alert.log file in the
/u01/app/oracle/diag/asm/+asm/+ASM1/trace directory to verify that the
cluster binary is using the UDP protocol. An entry similar to the following should be
in the file:
cluster interconnect IPC version:Oracle UDP/IP (generic)
12. Shut down any running cluster-managed services as follows:
a. Log in as the oracle user on each Database Server.
b. Run the following command:
$ srvctl stop home -o db_home -s state_filename -n node_name
In the preceding command, db_home is the full directory name for the Oracle
Database home directory, state_filename is the path name where you want
the state file to be written, and node_name is the name of the Database Server.
See also Oracle Real Application Clusters Administration and Deployment Guide
for additional information about Server Control Utility (SRVCTL) commands.
13. Modify the cluster interconnect interface to use the BOND0 interface on the first
Database Server as follows:
Note: At this point, only Oracle Clusterware, Oracle Clusterware CRS, and Oracle
ASM instances are up.
a. Log in as the oracle user.
b. Set ORACLE_HOME to the Grid Infrastructure home.
c. Set ORACLE_SID=+ASM1.
d. List the available cluster interfaces by using the following command:
$ oifcfg iflist
The following is an example of the output:
eth0 10.204.78.0
bond1 10.204.76.0
bond0 192.168.3.0
e. List the currently assigned cluster interfaces by using the following command:
$ oifcfg getif
The following is an example of the output:
eth0 10.204.78.0 global cluster_interconnect
bond1 10.204.76.0 global public
f. Delete the current global cluster_interconnect interface by using the
following command:
$ oifcfg delif -global eth0
g. Assign BOND0 as the global cluster_interconnect interface by using the
following command:
$ oifcfg setif -global c_interface/c_IP_address:cluster_interconnect
In the preceding command, c_interface is the interface to be used for
cluster_interconnect, and c_IP_address is the IP address for
cluster_interconnect (the subnet address, as in the following example).
The following is an example of the command:
$ oifcfg setif -global bond0/192.168.3.0:cluster_interconnect
h. List the current interfaces by using the following command:
$ oifcfg getif
The following is an example of the output:
bond1 10.204.76.0 global public
bond0 192.168.3.0 global cluster_interconnect
14. Stop Oracle Clusterware and Oracle Clusterware CRS on each Database Server as
follows:
a. Log in as the root user.
b. Stop Oracle Clusterware by using the following command:
# /u01/app/11.2.0/grid/bin/crsctl stop clusterware
c. Stop Oracle Clusterware CRS by using the following command:
# /u01/app/11.2.0/grid/bin/crsctl stop crs -f
15. Update the cluster binaries to use the RDS protocol on each Database Server as
follows:
a. Log in as the oracle user.
b. Set ORACLE_SID=+ASM1. The base for ORACLE_HOME must be set to the Grid
Infrastructure home.
c. Change to the /u01/app/11.2.0/grid/rdbms/lib directory.
d. Run the following command:
$ make -f ins_rdbms.mk ipc_rds ioracle
16. Lock the Grid Infrastructure binaries on each Database Server as follows:
a. Log in as the root user.
b. Change to the /u01/app/11.2.0/grid/crs/install directory.
c. Run the following command:
# perl rootcrs.pl -patch -crshome /u01/app/11.2.0/grid
17. Verify that cluster interconnect is using the RDS protocol on each Database Server by
examining the Oracle ASM alert.log file. The log is in the
/u01/app/oracle/diag/asm/+asm/+ASM1/trace directory. An entry similar to the
following should be listed for the most-recent Oracle ASM restart:
CELL interconnect IPC version: Oracle RDS/IP (generic)
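The alert log checks in steps 11 and 17 can be sketched as a small filter that reports the protocol named in the most recent matching entry. The ipc_protocol function is a hypothetical helper, and the sample file below stands in for the real alert log, whose exact file name in the trace directory is not given here.

```shell
# Sketch only: print the IPC protocol from the last "interconnect IPC version"
# entry in an alert log. ipc_protocol is an illustrative helper name.
ipc_protocol() {
    # $1 = path to the Oracle ASM alert log
    grep 'interconnect IPC version' "$1" | tail -n 1 | sed 's/.*IPC version: *//'
}

# Demonstration on a sample file that imitates two successive ASM restarts.
sample=$(mktemp)
printf '%s\n' \
    'cluster interconnect IPC version:Oracle UDP/IP (generic)' \
    'CELL interconnect IPC version: Oracle RDS/IP (generic)' > "$sample"
ipc_protocol "$sample"
rm -f "$sample"
```

Only the last matching line is used because the log keeps entries from earlier restarts, including those made while the binaries were linked for UDP.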

18. Start all cluster-managed services by using the srvctl utility as follows:
a. Log in as the oracle user.
b. Start the database by using the following command:
$ srvctl start home -o /u01/app/oracle/product/11.2.0/dbhome_1 \
-s /tmp/dm02db01_dbhome -n dm02db01
c. Verify that the database instances are running by using the following command:
$ srvctl status database -d dbm
19. Enable Oracle Clusterware CRS automatic restart on each Database Server as follows:
a. Log in as the root user.
b. Enable Oracle Clusterware CRS by using the following command:
# /u01/app/11.2.0/grid/bin/crsctl enable crs
Note: To use the dcli utility to enable Oracle Clusterware CRS, use the following
command:
# dcli -l root -g dbs_group "/u01/app/11.2.0/grid/bin/crsctl enable crs"
20. Perform a health check of Exadata Database Machine by using the steps described in
My Oracle Support note 1070954.1.
Note: The Oracle Exadata Database Machine HealthCheck utility collects data for
key software, hardware, and firmware releases, and configuration best practices
for the Database Machine. Oracle recommends that you periodically review the
current data for key components of the Database Machine, and compare them to
the supported release levels and recommended best practices. Oracle Exadata
Database Machine HealthCheck is not a database, network, or SQL performance
analysis tool. It is not a continuous monitoring utility, and does not duplicate other
monitoring or alerting tools, such as Integrated Lights Out Manager (ILOM) or
Oracle Enterprise Manager Grid Control.

Understanding the InfiniBand Network
Master Subnet Manager
The Subnet Manager manages all operational characteristics of
the InfiniBand network, such as the following:
• Discovering the network topology
• Assigning a local identifier to all ports connected to the
network
• Calculating and programming switch forwarding tables
• Monitoring changes in the fabric
For additional information, see the Sun Datacenter InfiniBand
Switch 36 User’s Guide:
http://download.oracle.com/docs/cd/E19197-01/index.html
The InfiniBand network can have more than one Subnet Manager, but only one Subnet
Manager is active at a time. The active Subnet Manager is the Master Subnet Manager.
The other Subnet Managers are the Standby Subnet Managers. If a Master Subnet Manager
is shut down or fails, a Standby Subnet Manager will automatically become the Master Subnet
Manager.
The following guidelines determine where Subnet Managers run on the Database Machine:
• Run Subnet Managers only on the Sun Datacenter InfiniBand Switch 36 switches. It is
possible to run Subnet Managers on a server using OpenSM, but the Subnet Managers
should never run on the Database Servers or Oracle Exadata Storage Servers.
• When the InfiniBand network consists of one, two, or three racks cabled together, all
switches should run Subnet Manager. The Master Subnet Manager should be run on a
spine switch. If the network has only leaf switches, as in Exadata Database Machine
Half Racks or Exadata Database Machine Quarter Racks, the Master Subnet Manager
runs on a leaf switch.
• When the InfiniBand network consists of four or more racks cabled together, only spine
switches should run Subnet Manager. The leaf switches should disable the Subnet
Manager.
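The placement guidelines above reduce to a simple decision on the number of racks cabled together and whether a spine switch is present. The following sketch encodes that decision. The sm_placement function and its output wording are illustrative only and do not query or configure any switch.

```shell
# Sketch only: summarize where Subnet Manager should run.
# $1 = number of racks cabled together, $2 = "yes" if a spine switch exists.
sm_placement() {
    racks=$1
    has_spine=$2
    if [ "$racks" -ge 4 ]; then
        # Four or more racks: leaf switches disable Subnet Manager.
        echo "run on: spine switches only; master: spine switch"
    elif [ "$has_spine" = "yes" ]; then
        echo "run on: all switches; master: spine switch"
    else
        # For example, a Half Rack or Quarter Rack with leaf switches only.
        echo "run on: all switches; master: leaf switch"
    fi
}

sm_placement 2 yes
sm_placement 1 no
sm_placement 4 yes
```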

Changing IP Addresses on an
Exadata Storage Server
To change the IP address on Oracle Exadata Storage Server:
1. Set the disks in the ASM disk group that are on the
Exadata Storage Server to OFFLINE by using the following
command on every ASM node and instance that accesses
the cell:
ALTER DISKGROUP diskgroup_name OFFLINE DISK
asm_disk_name
You must do the following when changing the fundamental configuration of a cell, such as
changing the IP address, host name, and InfiniBand address:
• Before changing the cell configuration, ensure that all ASM, Oracle RAC, and database
instances that use the cell will not access the cell while you are changing the IP
address.
• After changing the cell configuration, ensure that consumers of cell services have their
devices correctly reconfigured to use the new connect information of the cell.
• When changing a cell configuration, change only one cell at a time to ensure that ASM
and Oracle RAC work properly during the changes.

2. Log in as the root user, and run the ipconf utility
located in the /usr/local/bin directory.
Use the ipconf utility to change the following on a cell:
– IP address
– Host name
– NTP server
– Time zone
– DNS name servers
– InfiniBand addresses
The ipconf utility will shut down any running cell services. You must restart the cell services
by using the CellCLI utility after you verify your changes.
To verify changes, use the ipconf utility and review the values that are displayed.
You can also examine the /opt/oracle.cellos/ipconf.pl.log log.
The ipconf utility makes a backup of the files it modifies. When you rerun the utility, it
overwrites the existing backup file if you modify values. The log file maintains the complete
history of every ipconf operation you perform.

3. Log in to the cell as the celladmin user.
4. Restart all cell services by using the following command:
CellCLI> ALTER CELL RESTART SERVICES ALL
5. Verify that the services are started and functional.
6. If you modified the Ethernet or InfiniBand IP address,
update the cellip.ora file with the changed address
information on each ASM node or instance that uses the
cell.
7. For each database node, perform the following steps:
a. Start CRS.
b. Start ASM.
c. Start the database.
Repeat these startup steps for the next database node.
8. Set the disks in the ASM disk group to ONLINE by using
the following command:
SQL> ALTER DISKGROUP disk_group_name
ONLINE DISK asm_disk_name

Nonemergency Power Cycle Procedure

Powering On Oracle Exadata Storage Servers and
Database Servers
Oracle Exadata Storage Servers and Database Servers are
powered on by either pressing the power button at the front of
the machine, or by logging in to the ILOM interface and
powering up the system.
When a Database Server is powered on and the operating
system boots, Oracle Clusterware is automatically started if it is
installed. Oracle Clusterware then starts all resources that are
configured to start automatically.
The power-on sequence is as follows:
1. Rack, including switches
Ensure that the switches have had power applied for a few minutes to complete
power-on configuration before starting Oracle Exadata Storage Servers.
2. Oracle Exadata Storage Servers
Ensure that all Oracle Exadata Storage Servers complete the boot process before
starting the Database Servers.
3. Database Servers
Powering On Servers Using ILOM
Servers can be powered on by using Integrated Lights Out Manager (ILOM). ILOM can be
accessed by using the web console, the command-line interface (CLI), IPMI, or SNMP.
For example, to apply power to the dm01cel01 server by using IPMI, where dm01cel01-
ilom is the host name of the ILOM for the server to be powered on, run the following
command from a server that has ipmitool installed:
# ipmitool -H dm01cel01-ilom -U root chassis power on

The preceding command will cause the system to prompt for the password.
For additional information about using the ILOM to power on the servers, see the Oracle
Integrated Lights Out Manager (ILOM) 3.0 documentation. The documentation is available at:
http://docs.sun.com/app/docs/coll/ilom3.0?l=en.
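The server portion of the power-on sequence can be sketched as a dry run that prints the ipmitool commands in the required order: Oracle Exadata Storage Servers first, then Database Servers. Apart from dm01cel01-ilom, the ILOM host names below are hypothetical, as are the DRY_RUN variable and the function names. The rack and switches are not included, because they are assumed to have been powered on beforehand.

```shell
# Sketch only: print (or run) ipmitool power-on commands in sequence.
# DRY_RUN defaults to 1 so that nothing is powered on by accident.
DRY_RUN=${DRY_RUN:-1}

# Host lists are assumptions for illustration; only dm01cel01-ilom
# appears in the example above.
CELL_ILOMS="dm01cel01-ilom dm01cel02-ilom"
DB_ILOMS="dm01db01-ilom dm01db02-ilom"

power_on() {
    for host in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "ipmitool -H $host -U root chassis power on"
        else
            # Prompts for the ILOM password on each call.
            ipmitool -H "$host" -U root chassis power on
        fi
    done
}

power_on_sequence() {
    power_on $CELL_ILOMS
    # In a real run, wait here until every storage server has completed
    # its boot process before continuing.
    power_on $DB_ILOMS
}

power_on_sequence
```

Keeping the two host lists separate makes the ordering explicit and leaves a natural place for the wait between the two groups.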

Powering Off Oracle Exadata Storage Servers
Oracle Exadata Storage Servers are powered off and rebooted
by using the Linux shutdown command. The following
command shuts down an Oracle Exadata Storage Server
immediately:
# shutdown -h -y now
When powering off Oracle Exadata Storage Servers, all
storage services are automatically stopped.
The following command reboots an Oracle Exadata Storage
Server immediately:
# shutdown -r -y now
Notes
• All database and Oracle Clusterware processes should be shut down before shutting
down more than one Oracle Exadata Storage Server.
• Powering off or rebooting Oracle Exadata Storage Servers can impact database
availability.
• The shutdown commands to power off or reboot can be used for Database Servers.

Powering Off a Database Server
When powering off Database Servers, Oracle Clusterware
should be stopped before rebooting or shutting down a
Database Server. Oracle Clusterware is stopped by using the
following command:
# crsctl stop cluster

The following is the recommended shutdown procedure for Database Servers:
1. Stop Oracle Clusterware by using the following command:
# /u01/app/11.2.0/grid/bin/crsctl stop cluster
If any resources managed by Oracle Clusterware are still running after running the
crsctl stop cluster command, the command fails. Use the -f option to
unconditionally stop all resources, and stop Oracle Clusterware.
2. Shut down the operating system by using the following command:
# shutdown -h -y now
The dcli command utility can be used to run the shutdown command on more than
one server. The following command shows the syntax for the command:
dcli -l root -g group_name shutdown -h -y now
In the preceding syntax, group_name is the name of the Oracle Exadata Storage
Server or Database Server group.
The power-off sequence is as follows:
1. Database servers
2. Oracle Exadata Storage Servers
3. Rack, including switches

Powering Off Multiple Servers at the Same Time
The dcli utility can be used to run the shutdown command
on multiple servers at the same time. Do not run the dcli
utility from a server that will be shut down. For example, to shut
down all Oracle Exadata Storage Servers using the dcli utility,
run the command from a Database Server.
The following command shows the syntax to shut down all
Oracle Exadata Storage Servers at the same time:
# dcli -l root -g cell_group shutdown -h -y now
In the preceding command, cell_group is the file that
contains a list of all Oracle Exadata Storage Servers.
Powering Off the Database Machine by Using the dcli Utility
1. Stop Oracle Clusterware on all Database Servers by using the following command:
# /u01/app/11.2.0/grid/bin/crsctl stop cluster -all
2. Shut down all remote Database Servers by using the following command:
# dcli -l root -g remote_dbs_group shutdown -h -y now
In the preceding command, remote_dbs_group is the file that contains a list of all the
remote Database Servers.
3. Shut down all Oracle Exadata Storage Servers by using the following command:
# dcli -l root -g cell_group shutdown -h -y now
In the preceding command, cell_group is the file that contains a list of all Oracle
Exadata Storage Servers.
4. Shut down the local Database Server by using the following command:
# shutdown -h -y now
5. Remove power from the rack.
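The rule that dcli must not be run from a server that is itself being shut down can be enforced with a simple guard. The following sketch assumes that a group file lists one host name per line. The host_in_group and safe_group_shutdown functions are hypothetical helpers, and the sketch only prints the dcli command instead of running it.

```shell
# Sketch only: refuse to build a group shutdown command when the local
# host is listed in the group file (assumed format: one host per line).
host_in_group() {
    # $1 = group file, $2 = host name; succeeds on an exact whole-line match
    grep -qxF "$2" "$1"
}

safe_group_shutdown() {
    group_file=$1
    this_host=$2
    if host_in_group "$group_file" "$this_host"; then
        echo "refusing: $this_host is listed in $group_file" >&2
        return 1
    fi
    # Printed rather than executed; run it manually after review.
    echo "dcli -l root -g $group_file shutdown -h -y now"
}

# Demonstration with a scratch group file and illustrative host names.
group=$(mktemp)
printf 'dm01cel01\ndm01cel02\n' > "$group"
safe_group_shutdown "$group" dm01db01
rm -f "$group"
```

The same guard explains why the procedure uses a remote_dbs_group file that excludes the local Database Server, which is shut down last with a plain shutdown command.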

Powering On and Off Network Switches
The network switches do not have power switches. They power off when power is removed,
by way of the Power Distribution Unit (PDU) or at the breaker in the data center.
Emergency Power-Off Considerations

If there is an emergency, power to the Database Machine
should be halted immediately. The following emergencies may
require powering off the Database Machine:
• Natural disasters such as earthquake, flood, hurricane,
tornado, or cyclone
• Abnormal noise, smell, or smoke coming from the machine
• Threat to human safety
Emergency Power-Off Procedure
To perform an emergency power-off procedure for the Database Machine, turn off power at
the circuit breaker or pull the emergency power-off switch in the computer room. After the
emergency, contact Oracle Support Services to restore power to the machine.
Emergency Power-Off Switch
Emergency power-off (EPO) switches are required when computer equipment contains
batteries that are capable of supplying more than 750 volt-amperes for more than five
minutes. Systems that have these batteries include internal EPO hardware for connection to a
site EPO switch or relay. Use of the EPO switch will remove power from the Database
Machine.
Cautions and Warnings
The following cautions and warnings apply to the Database Machine:
• Do not touch the parts of this product that use high-voltage power. Touching them might
result in serious injury.
• Do not power off the Database Machine unless there is an emergency. In the case of an
emergency, follow the emergency power-off procedure.
• Keep the front and rear cabinet doors closed. Failure to do so might cause system
failure or result in damage to hardware components.
• Keep the top, front, and back of the cabinets clear to allow proper airflow and prevent
overheating of components.
• Use only the supplied hardware.
Installing and Configuring Auto Service Request:
Solaris Server
Before installing Auto Service Request (ASR), ensure that the following prerequisites are met
on a stand-alone Solaris server:
• Create a Sun Online Account (SOA) at http://reg.sun.com.
• Identify and designate a system to serve as ASR Manager, and ensure the following:
– The operating system is Solaris 10, Update 6 (10u6).
– Java version is JDK 1.6.0_04 or later.
– You have root user access to install the software.
• Identify and verify the ASR assets.
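The "JDK 1.6.0_04 or later" prerequisite is a version-floor comparison that can be checked mechanically. The following Python sketch compares legacy JDK version strings of the form major.minor.patch_update. The function names are illustrative only and are not part of ASR or the JDK, and the sketch assumes that the version string has already been obtained by some other means.

```python
import re


def parse_jdk_version(text):
    """Turn a legacy JDK version string such as '1.6.0_04' into a tuple of ints."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:_(\d+))?", text.strip())
    if match is None:
        raise ValueError("unrecognized JDK version: %r" % text)
    # A missing update number (for example '1.6.0') is treated as update 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def meets_minimum(installed, minimum="1.6.0_04"):
    """Return True when the installed version is the minimum or later."""
    return parse_jdk_version(installed) >= parse_jdk_version(minimum)
```

Comparing tuples of integers avoids the pitfall of plain string comparison, where "1.6.0_10" would sort before "1.6.0_4".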
The following procedure describes how to install Oracle Services Tools Bundle (STB), the
Sun Automated Service Manager (SASM) package, and the Auto Service Request package
on a qualified Sun server.
1. Install STB as follows:
a. Download and unzip the Oracle Services Tools Bundle from Doc ID 1153444.1.
b. Untar the STB bundle, and run the installation script, install_stb.sh. As part
of the installation, be sure to do the following:
– Enter I for installation.
– Enter Y to replace existing Serial Number in EEPROM (SNEEP) packages.
– Enter Y to replace existing Service Tags packages.
c. Confirm that SNEEP is installed correctly by running the following command from
a terminal window:
# sneep -a

d. Verify that Service Tags is reporting the system attributes correctly by running the
following command:
# stclient -E
If the output does not show the serial number, run the following command to
register the serial number:
# sneep -s SERIAL-NUMBER
2. Install the SASM package as follows:
Note: The SASM package needs to be installed only on ASR Manager.
a. Verify that SASM is version 1.2.1 or later. As the root user, run the following
command:
# pkginfo -l SUNWsasm
This command will show whether the SASM package is already installed. If
needed, download SASM by using the ASR Software Download link from the
following website: http://oracle.com/asr.
b. Install the SASM package by using the following command:
# pkgadd -d SUNWsasm.version_number.pkg
3. Install the ASR package as follows:
a. Download and unzip the ASR package.
Note: The ASR package needs to be installed only on ASR Manager.
b. Install the ASR package as the root user by using the following command:
# pkgadd -d SUNWswasr.version_number.pkg
c. Add the directory that contains the asr command to the PATH variable as follows:
# PATH=$PATH:/opt/SUNWswasr/bin
# export PATH
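After the PATH variable is updated, it can be useful to confirm that the asr command actually resolves before continuing to registration. The following Python sketch uses the standard shutil.which lookup. The helper name command_on_path is illustrative, and the installation directory mentioned in the usage note is an assumption based on the package name used in this procedure.

```python
import os
import shutil


def command_on_path(name, extra_dir=None):
    """Return the resolved path of `name`, or None if it cannot be found.

    The current PATH is searched first; `extra_dir`, when given, is
    appended to the search path for this lookup only.
    """
    search_path = os.environ.get("PATH", "")
    if extra_dir:
        search_path = search_path + os.pathsep + extra_dir
    return shutil.which(name, path=search_path)
```

For example, command_on_path("asr", "/opt/SUNWswasr/bin") returns None when the command is not found in either location, which suggests that the package is not installed where expected.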


Installing and Configuring Auto Service Request:
Enterprise Linux Server
Before installing Auto Service Request, ensure that the following prerequisites are met on a
stand-alone Oracle Enterprise Linux server:
• Create a Sun Online Account (SOA) at http://reg.sun.com.
• Identify and designate a system to serve as ASR Manager, and ensure the following:
– The operating system is Oracle Enterprise Linux 5.3 or later.
– Java version is JDK 1.6.0_04 or later.
– You have root user access to install the software.
• Identify and verify the ASR assets.

The following procedure describes how to install service tags, the SASM package, and the
Auto Service Request package on a qualified stand-alone Oracle Enterprise Linux server:
1. Install service tags for Oracle Enterprise Linux as follows:
Note: Service tags need to be installed only on the ASR Manager server.
a. Go to the Oracle Software Delivery Cloud – http://edelivery.oracle.com.
b. Select Oracle Enterprise Linux from the menu, and click Continue.
c. Download and unzip the latest svctag.i386.linux.zip file.
d. Install the service tags by using the following commands:
# rpm -i sunservicetag-1.1.5-1.i386.rpm
# rpm -i sun-hardware-reg-1.0.0-1.i386.rpm

2. Install the SASM package as follows:
Note: The SASM package needs to be installed only on the ASR Manager server.
a. Verify that SASM is version 1.2.1 or later. As the root user, run the following
command:
# rpm -q SUNWsasm
This command shows whether the SASM package is installed. If needed, download SASM from
the following website:
https://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_SMI-Site/en_US-USD/ViewProductDetail-Start?ProductRef=SASM-1.2.1-G-F@CDS-CDS_SMI.
b. Install the SASM package by using the following command:
# rpm -i SUNWsasm.version_number.rpm
3. Install the ASR package as follows:
Note: The ASR package needs to be installed only on the ASR Manager server.
a. Download and unzip the ASR package from the following website:
https://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_SMI-Site/en_US/-/USD/ViewProductDetail-Start?ProductRef=ASR-2.6-G-F@CDS-CDS_SMI.
b. Install the ASR package as the root user by using the following command:
# rpm -ihv SUNWswasr.version_number.rpm
c. Add the directory that contains the asr command to the PATH variable as follows:
# PATH=$PATH:/opt/SUNWswasr/bin
# export PATH


Registering ASR Manager

To register the ASR Manager:
1. Run the following command on the ASR Manager server as the root user:
asr register
2. Enter 1 or 2, depending on your location.
– 1 is transport.sun.com (Americas or Asia Pacific regions).
– 2 is transport.sun.co.uk (Europe, Middle East, or Africa regions).
3. If you are using a proxy server to access the Internet, enter the proxy server
information. If you are not using a proxy server, enter a hyphen (-).
4. Enter your SOA username and password when prompted.
ASR will validate the login. After the login credentials are
validated, the registration is complete.
Your SOA email address receives output from ASR
reports, notifications of ASR problems, and service request
(SR) generation.
Note: Passwords are not stored.
5. Check the registration status by using the following command:
# asr show_reg_status
The following is an example of the output:
# registered
6. Test the connection by using the following command. This ensures that ASR can send
information to the transport server.
# asr test_connection
The following is an example of the output:
Connecting to endpoint @
https://transport.sun.com/v1/queue/sasm-ping
Sending ping message.
Ping message sent.
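When the connection test is scripted, the captured output can be checked for the final confirmation line. The following Python sketch assumes that the output resembles the example in step 6. The function names are illustrative, and the actual messages may differ between ASR releases.

```python
def ping_succeeded(output):
    """Return True when the captured output contains the 'Ping message sent.' line."""
    lines = [line.strip() for line in output.splitlines()]
    return "Ping message sent." in lines


def endpoint_url(output):
    """Return the first https URL found in the output, or None if there is none."""
    for token in output.split():
        if token.startswith("https://"):
            return token
    return None
```

Reporting the endpoint alongside the result makes it easier to see which transport server was selected during registration.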


Configuring ASR Trap Destinations

Ensure that the Oracle Exadata Storage Server software release is 11.2.1.3.1 (or later).
To configure the trap destinations for Database Servers and Exadata cells:
1. (Optional) Install the monitor program on the Database Servers as follows:
Note: This step is necessary only if you are upgrading the Database Servers or Oracle
Exadata Storage Servers from an earlier release.
See also: My Oracle Support note 888828.1 for additional information about the patch set.
a. Download the current Oracle Exadata Storage Server Software patch set release.
b. Copy the db_11.2.1.3.1_patch.zip file to the Database Servers.
c. Shut down the Oracle stack, including the Grid
Infrastructure, Oracle Automatic Storage Management
(Oracle ASM), and Oracle Database.
d. Log in as the root user on each Database Server.
e. Unzip the db_11.2.1.3.1_patch.zip file to the
db_patch_11.2.1.3.1 directory.
f. Change to the db_patch_11.2.1.3.1 directory.
g. Run the install.sh script.
2. Add trap destinations for Database Servers.
Note: This step must be performed on each Database Server in the Database Machine.
a. Add a new trap destination by using the following command:
/opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -set_snmp_subscribers \
"(type=asr, host=host_name, port=162,community=public)"
In the preceding command, host_name is the host name or IP address of the destination.
Note: To use the dcli utility to add trap destinations for Database Servers, use the
following command:
dcli -g dbs_group -l root \
"/opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -set_snmp_subscribers \
\"\(type=asr, host=hostname, port=162, community=public\)\""
In the preceding command, dbs_group is the file that contains a list of all Database Server
host names in the Database Machine.
b. Verify all SNMP ASR subscribers by using the following command:
/opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -get_snmp_subscribers \
-type asr
The following is an example of the output:
(host=dm01db04,port=162,community=public,type=asr)
Note: To use the dcli utility to verify the trap destinations for Database Servers, use the
following command:
dcli -g dbs_group -l root \
"/opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -get_snmp_subscribers -type asr"
3. Add trap destinations for Exadata cells:
a. Log in as the celladmin user on the Exadata cell.
b. Add a trap destination by using the following command:
CellCLI> ALTER CELL snmpSubscriber = ((
host=host_name, port=162, community=public,
type=asr))
In the preceding command, host_name is the host name or IP address of the destination.
The default for the SNMP port is 162.
Note: To use the dcli utility to add the trap destinations for Exadata cells:
dcli -g cell_group -l celladmin "cellcli -e alter cell \
snmpSubscriber = \(\(host=hostname, port=162, community=public, \
type=ASR\)\)"
In the preceding command, cell_group is the file that contains a list of all Exadata cells in
the Database Machine.
c. Verify the trap destinations for Exadata cells by using the following command:
CellCLI> LIST CELL ATTRIBUTES snmpSubscriber
The following is an example of the output:
((host=dm01db04,port=162,community=public,type=ASR))
Note: To use the dcli utility to verify the trap destinations for Exadata cells:
dcli -g cell_group -l celladmin "cellcli -e list cell attributes snmpsubscriber"
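The subscriber strings used in steps 2 and 3 share one shape: a parenthesized, comma-separated list of key=value pairs. The following Python sketch builds such a string and parses the example output shown above into dictionaries. The helper names are illustrative and are not part of the Exadata tooling, and the parser assumes that values contain no commas or parentheses.

```python
import re


def format_subscriber(host, port=162, community="public", sub_type="asr"):
    """Build one subscriber group in the same field order as the add command."""
    return "(type=%s, host=%s, port=%d, community=%s)" % (
        sub_type, host, port, community)


def parse_subscribers(text):
    """Parse every innermost '(key=value,...)' group into a dictionary."""
    subscribers = []
    for group in re.findall(r"\(([^()]*)\)", text):
        if not group.strip():
            continue  # ignore empty parentheses
        entry = {}
        for pair in group.split(","):
            key, _, value = pair.partition("=")
            entry[key.strip()] = value.strip()
        subscribers.append(entry)
    return subscribers
```

Because the parser looks only at innermost groups, it accepts both the single-parenthesis form reported on Database Servers and the double-parenthesis form reported by CellCLI.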

Removing Trap Destinations
To remove trap destinations from Database Servers and Exadata cells, set snmpSubscriber
to an empty string on the Database Servers and reset snmpSubscriber to all non-ASR
subscribers on the Exadata cells:
• For Database Servers:
dcli -g dbs_group -l root \
"/opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -set_snmp_subscribers \"\""
• For Exadata cells:
dcli -g cell_group -l celladmin "cellcli -e alter cell \
snmpSubscriber = \'\'"
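On an Exadata cell that also sends traps to other monitoring systems, resetting snmpSubscriber means keeping every subscriber whose type is not ASR. The following Python sketch filters a current attribute value accordingly. It assumes that multiple subscribers appear as comma-separated groups inside one outer pair of parentheses; the host name nms01 in the usage note is hypothetical, and the function names are illustrative only.

```python
import re


def non_asr_subscribers(current):
    """Return the subscriber groups from `current` whose type is not ASR."""
    kept = []
    for group in re.findall(r"\(([^()]*)\)", current):
        fields = dict(
            (key.strip().lower(), value.strip())
            for key, _, value in (pair.partition("=") for pair in group.split(","))
        )
        # The type comparison is case-insensitive (asr and ASR both occur).
        if fields.get("type", "").lower() != "asr":
            kept.append("(" + group.strip() + ")")
    return kept


def rebuilt_subscriber_value(current):
    """Build the value to assign to snmpSubscriber, or '' when nothing remains."""
    kept = non_asr_subscribers(current)
    return "(" + ",".join(kept) + ")" if kept else "''"
```

For a cell whose current value lists nms01 without a type and dm01db04 with type=ASR, the rebuilt value keeps only the nms01 group. When the ASR subscriber is the only entry, the result is the empty string used in the command above.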


Activating ASR Destinations

The following procedure describes how to activate ASR assets. The procedure is run from
ASR Manager.
1. Activate the ASR Manager host by using the following
command:
asr activate_asset -i host_IP
In the preceding command, host_IP is the host IP
address.
2. Activate the ILOMs for the Database Servers and Exadata cells by using the following
command:
asr activate_asset -i ILOM_IP
In the preceding command, ILOM_IP is the ILOM IP address.
3. Activate the Database Servers and Exadata cells by using the following command:
asr activate_exadata -i asset_IP -h asset_hostname -n asset_ILOM_hostname
Note: This command must be run for each Database Server and each Exadata cell.
In the preceding command, asset_IP is the IP address of the database node or Exadata cell,
asset_hostname is the host name of the database node or Exadata cell, and
asset_ILOM_hostname is the ILOM host name of the database node or Exadata cell.
4. Verify that ASR is activated by using the following command:
asr list_asset -i asset_IP
In this command, asset_IP is the IP address of the Database Server host or ILOM, or the
Exadata cell host or ILOM. To list all assets, use the following command:
asr list_asset
The following is an example of the output:
IP_ADDRESS    HOST_NAME    SERIAL_NUMBER  ASR      PRODUCT_NAME
----------------------------------------------------------------
10.204.79.22  dm01cel07    0123ABC021     Enabled  SUN FIRE X4275 SERVER
10.204.79.33  dm01cel07-c  0234ABC021     Enabled  SUN FIRE X4275 SERVER
10.204.79.26  dm01db04-c   0345ABC51E6    Enabled  SUN FIRE X4170 SERVER
10.204.79.15  dm01db04     0456ABC1E6     Enabled  SUN FIRE X4170 SERVER
5. Confirm end-to-end ASR functionality by using the following command:
# asr report
The following is an example of the output:
Successfully submitted request for activation status report. Activation status report will be
sent to email address associated with Sun Online Account:netadmin@example.com
The report is an e-mail message that lists all activated assets.
6. Verify that the ASR installer set three cron jobs by using the following command as the
root user:
# crontab -l
The following are the recommended settings:
– asr report: Set once a week on Sunday.
– asr heartbeat: Set twice daily, or at least once daily.
– asr update_rules.sh: Set once daily, by default.
Note: If ASR Manager asset activation fails, contact Oracle Support Services to assign a
Primary Contact. This prerequisite applies to the Database Servers and Exadata cells in the
Database Machine.
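Because the activate_exadata command must be repeated for every Database Server and Exadata cell, and the result is then checked with list_asset, both steps lend themselves to a small script. The following Python sketch generates one activation command per asset and parses list_asset-style rows to find hosts that are not reported as Enabled. The function names and dictionary keys are illustrative, pairing a host with an ILOM name ending in -c is an assumption drawn from the example output, and the parser assumes one asset per line.

```python
def activation_commands(assets):
    """Return one 'asr activate_exadata' command line per asset dictionary."""
    template = "asr activate_exadata -i {ip} -h {hostname} -n {ilom_hostname}"
    return [template.format(**asset) for asset in assets]


def parse_asset_list(output):
    """Parse list_asset-style rows into dictionaries, skipping header lines."""
    assets = []
    for line in output.splitlines():
        fields = line.split(None, 4)
        # Data rows start with a dotted IPv4 address; header and rule lines do not.
        if len(fields) < 5 or fields[0].count(".") != 3:
            continue
        ip, host, serial, status, product = fields
        assets.append({"ip": ip, "host": host, "serial": serial,
                       "asr": status, "product": product.strip()})
    return assets


def hosts_not_enabled(assets):
    """Return the host names whose ASR column is anything other than 'Enabled'."""
    return [asset["host"] for asset in assets if asset["asr"] != "Enabled"]
```

An empty result from hosts_not_enabled indicates that every listed asset shows Enabled, which matches the example output in step 4.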


Validating Auto Service Request

Use the following command to validate Auto Service Request from a Database Server. The
command sends a test trap from the host in SNMP V2c format to the ASR Manager specified
by the snmpSubscriber attribute:
/opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -validate_snmp_subscriber -type asr
Note: To use the dcli utility to send a test trap from the hosts, use the following command:
dcli -g dbs_group -l root -n \
"/opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -validate_snmp_subscriber -type asr"
The following command validates Auto Service Request from Exadata cells:
ALTER CELL VALIDATE SNMP TYPE=asr
Note: To use the dcli utility to validate Auto Service Request from Exadata cells, use the
following command:
dcli -g cell_group -l celladmin -n "cellcli -e alter cell validate snmp type=asr"
Successful completion of the commands indicates that the test trap was sent, and the ASR
snmpSubscriber can be resolved to a valid host. The ASR Manager logs can be checked to
verify the receipt of the test trap. ASR Manager forwards the test trap to an Oracle Support
server, and sends an email to the SOA email account, acknowledging the receipt of the test
message without creating a service request.



ASR Support Process

Engaging Customer Incident Managers (CIMs) and ASR Support when ASR activation cannot
be completed during on-site deployment:
• Scenario A. The on-site field engineer needs help with ASR configuration or registration.
All server SNs are in MOS. The field engineer follows the SR process as described in EIS.
• Scenario B. The on-site ACS engineer reports server SNs that are missing, so ASR could
not be activated on those servers during the deployment.
The ACS engineer performs the following steps:
1. Document missing serial numbers.
2. Request that the customer log in to MOS and open a new Technical Exadata HW SR
using the rack serial number of the affected system.
3. The customer should be entered as the SR Contact.
4. Ensure that the customer enters the following subject line: “CIM: Missing Exadata SNs
for ASR.”
5. Ensure that the customer selects the following SR creation options:
a. SR Product = the Hardware product associated with the serial number provided
b. Problem Category = HW ASR Technology
c. SR Severity = 3
6. Ensure that the customer uploads, or copies and pastes, the list of missing server serial
numbers that need to be added to MOS and activated.
7. Document the SR number and include it in the deployment report/summary.


The Exadata EST engineer who is assigned such an SR performs the following steps:
1. From the SR summary, recognize that the SR is about missing SNs and needs CIM attention.
2. Contact the CIM/Exadata DM and ask to pull this SR into CIM ownership.
The CIM/Exadata DM performs the following step:
Assign the SR to a CIM.
The CIM engineer who is assigned such an SR performs the following steps:
1. Report missing serials needed for ASR.
2. When serials are recovered in MOS, update the SR to state that the ASR Backline team
needs to work with the customer remotely to activate ASR on the remaining assets.
3. Set the following SR attributes:
a. SR Product = the Hardware product associated with the serial number provided
b. SR Component = Auto Service Request (ASR) Issues
c. SR Subcomponent = Installation
d. SR Category = HW ASR Technology
The ASR backline engineer works with the customer and closes the SR.

Getting the HW Serial Number from the Server
Exadata serial numbers can be obtained by using the ipmitool command, which must be
executed on each server (DB or Storage):
[root@node1 ~]# ipmitool sunoem cli "show /SYS product_serial_number"
Connected. Use ^D to exit.
-> show /SYS product_serial_number

/SYS
Properties:
product_serial_number = 0937XF500B
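When serial numbers are collected from many servers, the value can be extracted from the captured ipmitool output instead of being copied by hand. The following Python sketch assumes that the output has the shape shown above. The function name is illustrative and is not part of ipmitool.

```python
import re


def product_serial_number(output):
    """Return the value of the product_serial_number property, or None.

    Only a line that begins with the property name is matched, so the
    echoed '-> show /SYS product_serial_number' command line is ignored.
    """
    match = re.search(r"^\s*product_serial_number\s*=\s*(\S+)",
                      output, re.MULTILINE)
    return match.group(1) if match else None
```

A None result signals that the property line was not found, for example because the ILOM session did not connect.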



Checking MOS Hardware Serial Numbers

This section describes how to check for HW serial numbers from the manufacturing System
Record. The serial numbers noted in the System Record can then be checked against the
serial numbers that are populated in My Oracle Support/Internal Support Portal.
1. Request the Orion GCS Call Center Worker privilege in APS.
- "Access & Mailing List Request"
- What: Oracle Applications
- How: Add Privileges
- Next Step
- Select One Account: "Orion"
- Next Step
- Available Privileges: "Orion GCS Call Center Worker"
- Move
- Next Step
- Justification: "Needed to support the ACS ASR installation process"
2. Wait for approval; follow instructions if request is rejected.
3. Log in to the Internal Support Portal (ISP): https://support.us.oracle.com
- Click the "Customer" tab.
- Click the "Asset" menu.
- Click "Menu" on the right and select "Create View."



New LSI RAID Battery Maintenance Procedure

Activity: See videos on how to retrofit, remove, and replace the Sun LSI BBU08 Hot-Swap
Battery:
• Available in the eKit
• One for X3-2
• One for X3-2L



New OneCommand Utility
Oracle Exadata Deployment Assistant
See the OneCommand section of the MOS article “Database Machine and Exadata Storage
Server 11g Release 2 (11.2) Supported Versions (Doc ID 888828.1).”
Optional Activity
Downloading and using the Oracle Exadata Deployment Assistant:
1. Download the Oracle Exadata Deployment Assistant (also known as OneCommand)
and unpack it to a local UNIX filesystem.
2. Run the onecmd.sh script located in the <PatchNumber>/onecmd/Exaconf directory.
3. Fill out all the details using the lab diagram as a guide and generate the configuration
files.



Summary

In this lesson, you should have learned how to describe and perform advanced tasks on the
Oracle Exadata Database Machine.



Utility Scripts and Tools


This appendix provides information about the various scripts that can be useful when
administering the Oracle Exadata Database Machine. This appendix includes:
• Utility script dcli
• Utility script verify-topology
• Utility script sosreport
• Utility script exachk
• Utility tool MegaCli64
• Utility script sundiag.sh
• Utility Flint application
• Utility F20 ESM Monitor


Utility Script dcli
Set up the dcli utility on compute node for storage cell administration.
The dcli utility simplifies any operations that must be run across a subset or all cells. You
can use this tool to execute commands or scripts across a defined set of cells in parallel.
The following steps show how to set up dcli on a host. dcli is a shell script that is not
supplied with RDBMS/ASM binaries. The dcli utility is in the
Unauthorized reproduction or distribution prohibitedฺ Copyright© 2017, Oracle and/or its affiliatesฺ

/opt/oracle/cell/cellsrv/deploy/scripts directory of Cell Server.


i) The dcli utility requires Python version 2.3 or later.
You can determine the version of Python by running python -V.
ii) Copy the dcli utility from a cell to a host computer from which
central management can be performed.
scp /opt/oracle/cell/cellsrv/deploy/scripts/dcli <user name>@<db host name>:<path on db host>
iii) Create a text file and add a list of cell server host names or IP addresses that you want
to manage.
celllist.txt:
cellserver1
cellserver2
iv) Set up the SSH for the root user to the cells:
dcli -l root -k -g celllist.txt
celladmin@cellserver1’s password: ******
celladmin@cellserver2’s password: ******
cellserver1: ssh key added
cellserver2: ssh key added
The user might be prompted to acknowledge cell authenticity, and might be prompted for the
cell root password.
After this acknowledgment, you can execute commands on the same cells without being
prompted for a password for that user from the host.
Example
$ dcli -g celllist.txt cellcli -e list cell
cellserver1: cellserver1 online
cellserver2: cellserver2 online
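The example output has one line per cell in the form "<cell>: <name> <status>". The following is a minimal sketch, assuming that format holds, of flagging cells that do not report online. The function name cells_not_online is hypothetical and is not part of dcli.

```shell
#!/bin/sh
# Hypothetical helper: read "dcli ... cellcli -e list cell" output on stdin
# and print the name of every cell whose last field is not "online".
# Assumes each line looks like "cellserver1: cellserver1 online".
cells_not_online() {
    awk 'NF && $NF != "online" { sub(/:$/, "", $1); print $1 }'
}

# Usage (on a host where dcli is already set up):
#   dcli -g celllist.txt cellcli -e list cell | cells_not_online
```

An empty result means every listed cell reported online.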



Utility Script verify-topology
# /opt/oracle.SupportTools/ibdiagtools/verify-topology
[ DB Machine InfiniBand Cabling Topology Verification Tool ]
QDR based Half Rack found. Doing regular fattree checks
Is every external switch connected to every internal switch..........[SUCCESS]
Are any external switches connected to each other....................[SUCCESS]
Are any hosts connected to spine switch..............................[SUCCESS]
Check if all hosts have 2 CAs to different switches..................[SUCCESS]
Leaf switch check: cardinality and even distribution.................[SUCCESS]
Check if each rack has a valid internal ring........................[SUCCESS]

Example with Bad IB Switch to IB Switch Cable
[root@trnadb01 ~]# /opt/oracle.SupportTools/ibdiagtools/verify-topology
[ DB Machine InfiniBand Cabling Topology Verification Tool ]
Bad link:Switch 0x21283a8371a0a0 Port 9A - Sun Port 9B
Reason : 2.5 Gbps Speed found. Could be 10 Gbps
Possible cause : Cable isn’t fully seated in
Bad link:Switch 0x21283a89eba0a0 Port 9B - Sun Port 9A
Reason : 2.5 Gbps Speed found. Could be 10 Gbps
Possible cause : Cable isn’t fully seated in

Example with Bad IB Cable on a DB Node
# /opt/oracle.SupportTools/ibdiagtools/verify-topology
[ DB Machine InfiniBand Cabling Topology Verification Tool ]
Is every external switch connected to every internal switch.....[SUCCESS]
Are any external switches connected to each other...............[SUCCESS]
Are any hosts connected to spine switch.........................[SUCCESS]
Check if all hosts have 2 CAs to different switches.............[ERROR]
Node trnadb01 has 1 endpoints. (Should be 2)
--------fattree End Point Cabling verification failed-----
Leaf switch check: cardinality and even distribution............[ERROR]
Internal QDR Switch 0x21283a8371a0a0 has fewer than 4 compute nodes
It has only 3 links belonging to compute nodes [SUCCESS]
Check if each rack has a valid internal ring...................[SUCCESS]
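The examples show problems either as a check ending in [ERROR] or as a "Bad link" line. The following is a minimal sketch, assuming those two markers are the only ones of interest, of filtering the tool output down to the problem lines. The function name topology_problems is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: scan verify-topology output on stdin and print every
# line that reports a problem. Returns 0 if nothing suspicious was found and
# 1 otherwise. Assumes problems appear as "[ERROR]" or "Bad link".
topology_problems() {
    if grep -E '\[ERROR\]|Bad link'; then
        return 1
    fi
    return 0
}

# Usage:
#   /opt/oracle.SupportTools/ibdiagtools/verify-topology | topology_problems
```

The follow-up lines that explain an error (for example, the node name) are not captured by this filter, so the full output should still be kept for diagnosis.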



Utility Script sosreport
# /opt/oracle.cellos/sreport.sh
The sosreport package automates the process of collecting the relevant trace files when an
error occurs. It should be used to ensure that Oracle Support gets all the necessary
information for root-cause analysis. The following is an example of packaging an incident:
# /opt/oracle.cellos/sreport.sh
sosreport (version 1.7)


This utility collects some detailed information about the hardware
and setup of your Enterprise Linux system. The information is
collected and an archive is packaged under /tmp, which you can send
to a support representative. This information will be used for
diagnostic purposes ONLY and it will be considered confidential
information.
This process may take a while to complete.
No changes will be made to your system.
Press ENTER to continue, or CTRL-C to quit.
Please enter your first initial and last name [trnacel07]: dwinter
Please enter the case number that you are generating this report
for: 10132345
Progress [###################100%##################][05:51/05:51]
Creating compressed archive...
Your sosreport has been generated and saved in:
/tmp/sosreport-dwinter.10132345-817953-683b39.tar.bz2
The md5sum is: 5a249a63e062f723cfc8b23fc6683b39
Please send this file to your support representative.
At this point, the packaged archive (a .tar.bz2 file) is in /tmp and ready for shipment to Oracle Support.
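The run prints an md5sum for the archive, which can be used to confirm that the file was not damaged in transfer. The following is a minimal sketch of that comparison. The function name verify_md5 is hypothetical, and the md5sum command from coreutils is assumed to be available.

```shell
#!/bin/sh
# Hypothetical helper: compare a file's MD5 checksum with an expected value,
# such as the md5sum printed at the end of the sosreport run.
# Returns 0 on a match and 1 otherwise. Requires the md5sum command.
verify_md5() {
    file=$1
    expected=$2
    actual=$(md5sum "$file" | awk '{ print $1 }') || return 1
    [ "$actual" = "$expected" ]
}

# Usage:
#   verify_md5 /tmp/sosreport-dwinter.10132345-817953-683b39.tar.bz2 \
#       5a249a63e062f723cfc8b23fc6683b39 && echo "checksum matches"
```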



Utility Script exachk
# /opt/oracle.SupportTools/exachk
Exadata Database Machine Health Checker
• Hundreds of checks with potentially monthly or quarterly editions
• Software checks (firmware, operating system, clusterware, ASM, database, Exadata)
• Hardware checks (database server, InfiniBand, Exadata cells, disks)
• Configuration best practices (Operating System, Clusterware, ASM, RAC, Database,
Exadata, InfiniBand)
• Consolidated RAC, Exadata MAA, and some performance configuration best practices
• Nonintrusive data gathering, analysis and reporting phases
Preinstalled on new Exadata deployments, but you must periodically update the kit to get the
latest checks
Currently supports DBM V2 and X2-2, on Linux only
More information and download
• MOS Note 757552.1 – Oracle Exadata Best Practices
• MOS Note 1070954.1 – Oracle Database Machine HealthCheck
• RAC Assurance Team –
- http://rat.us.oracle.com/pls/htmldb/f?p=191:393:3758227989974652::NO:393::
Key components of the Exachk Kit
• exachk – bash shell script
• collections.dat – driver file
• rules.dat – driver file
• User guide/readme.txt
Usage
Stage and run the tool only on database servers, as the Oracle
RDBMS software owner (for example, oracle) if Oracle software is installed.
The tool can be run with the following arguments:
1. -a – Performs all checks, best practice and database/clusterware patch/os
recommendations.
2. -o – For invoking optional functionality
v|verbose to display PASSing audit checks as well as non-PASSing
Example: exachk -a -o v
or exachk -a -o verbose
3. -v – Returns the version of the tool
4. -s – Silent, non-interactive mode
5. -S – Silent, non-interactive mode



Utility RAID Tool MegaCli64
# /opt/MegaRAID/MegaCli/MegaCli64 -h
MegaCLI SAS RAID Management Tool Ver 5.00.14 July 14, 2009(c)Copyright 2009, LSI
Corporation, All Rights Reserved.
Note: The following options may be given at the end of any command below:
[-Silent] [-AppLogFile filename] [-NoLog] [-page[N]]
[-] is optional.
N - Number of lines per page
MegaCli64 -v
MegaCli64 -help|-h|?
...
Gather information
Controller information
MegaCli64 -AdpAllInfo -aALL
MegaCli64 -CfgDsply -aALL
MegaCli64 -AdpEventLog -GetEvents -f events.log -aALL && cat events.log
Enclosure information
MegaCli64 -EncInfo -aALL
Virtual drive information
MegaCli64 -LDInfo -Lall -aALL
MegaCli64 -LDPDInfo -aALL
(shows the mapping between VD/LD and physical disks)
Physical drive information
MegaCli64 -PDList -aALL
MegaCli64 -PDInfo -PhysDrv [E:S] -aALL
Battery backup information
MegaCli64 -AdpBbuCmd -aALL



Controller Management
Silence active alarm:
MegaCli64 -AdpSetProp AlarmSilence -aALL
Disable alarm:
MegaCli64 -AdpSetProp AlarmDsbl -aALL
Enable alarm:
MegaCli64 -AdpSetProp AlarmEnbl -aALL
Virtual Drive Management
Create RAID 0, 1, 5 drive:
MegaCli64 -CfgLdAdd -r(0|1|5) [E:S, E:S, ...] -aN
Create RAID 10 drive:
MegaCli64 -CfgSpanAdd -r10 -Array0[E:S,E:S] -Array1[E:S,E:S] -aN
Remove drive:
MegaCli64 -CfgLdDel -Lx -aN
Physical Drive Management
Set state to offline:
MegaCli64 -PDOffline -PhysDrv [E:S] -aN
Set state to online:
MegaCli64 -PDOnline -PhysDrv [E:S] -aN
Mark as missing:
MegaCli64 -PDMarkMissing -PhysDrv [E:S] -aN
Prepare for removal:
MegaCli64 -PdPrpRmv -PhysDrv [E:S] -aN



Virtual Drive Management
Replace missing drive:
MegaCli64 -PdReplaceMissing -PhysDrv [E:S] -ArrayN -rowN -aN
Note: The number N of the array parameter is the span reference that you obtain by
using MegaCli64 -CfgDsply -aALL and the number N of the row parameter is the
physical disk in that span or array that starts with zero. (It is not the physical disk’s slot.)
Rebuild drive:
MegaCli64 -PDRbld -Start -PhysDrv [E:S] -aN
MegaCli64 -PDRbld -Stop -PhysDrv [E:S] -aN
MegaCli64 -PDRbld -ShowProg -PhysDrv [E:S] -aN
Clear drive:
MegaCli64 -PDClear -Start -PhysDrv [E:S] -aN
MegaCli64 -PDClear -Stop -PhysDrv [E:S] -aN
MegaCli64 -PDClear -ShowProg -PhysDrv [E:S] -aN
Bad to good:
MegaCli64 -PDMakeGood -PhysDrv [E:S] -aN
Changes drive in state Unconfigured-Bad to Unconfigured-Good
Hot Spare Management
Set global hot spare:
MegaCli64 -PDHSP -Set -PhysDrv [E:S] -aN
Remove hot spare:
MegaCli64 -PDHSP -Rmv -PhysDrv [E:S] -aN
Set dedicated hot spare:
MegaCli64 -PDHSP -Set -Dedicated -ArrayN,M,... -PhysDrv [E:S] -aN
Delete a logical device:
MegaCli64 -CfgLdDel -L0 -a0



Hot Spare Management
To start initialization:
MegaCli64 -LDInit -Start -L0 -a0
To display the background initialization:
MegaCli64 -LDBI -ProgDsply -L0 -a0
To display all the physical and logical devices on the array:
MegaCli64 -LDPDInfo -aALL
MegaCli64 -AdpAllInfo -aALL | grep "Degraded"
MegaCli64 -AdpAllInfo -aALL | grep "Failed Disks"
MegaCli64 -AdpEventLog -GetEvents -f logfile -aALL
To change or replace a drive, perform the following steps:
1. Set the drive offline if it is not already offline due to an error.
MegaCli64 -PDOffline -PhysDrv [E:S] -aN
2. Mark the drive as missing:
MegaCli64 -PDMarkMissing -PhysDrv [E:S] -aN
3. Prepare the drive for removal:
MegaCli64 -PDPrpRmv -PhysDrv [E:S] -aN
4. Change or replace the drive.
5. If you are using hot spares, the replaced drive should become your new hot-spare drive:
MegaCli64 -PDHSP -Set -PhysDrv [E:S] -aN
6. If you are not working with hot spares, you must add the new drive to your RAID virtual
drive again and start the rebuilding:
MegaCli64 -PdReplaceMissing -PhysDrv [E:S] -ArrayN -rowN -aN
MegaCli64 -PDRbld -Start -PhysDrv [E:S] -aN
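Because the replacement steps are order-sensitive, it can help to print the full command sequence for review before running anything. The following is a minimal dry-run sketch for the case without hot spares (step 6). The function name print_replace_steps and the sample values in the usage line are hypothetical, and nothing is executed against the controller.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the MegaCli64 commands for replacing one
# drive, in the order of the steps above, without executing anything.
# Arguments: enclosure:slot, adapter number, array number, row number.
print_replace_steps() {
    es=$1 adapter=$2 array=$3 row=$4
    cli=/opt/MegaRAID/MegaCli/MegaCli64
    echo "$cli -PDOffline -PhysDrv [$es] -a$adapter"
    echo "$cli -PDMarkMissing -PhysDrv [$es] -a$adapter"
    echo "$cli -PDPrpRmv -PhysDrv [$es] -a$adapter"
    echo "# ...physically replace the drive here..."
    echo "$cli -PdReplaceMissing -PhysDrv [$es] -Array$array -row$row -a$adapter"
    echo "$cli -PDRbld -Start -PhysDrv [$es] -a$adapter"
}

# Usage (sample values only): print_replace_steps 20:3 0 1 2
```

The array and row values still have to be taken from MegaCli64 -CfgDsply -aALL; the sketch does not look them up.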



Utility Hardware Diagnostics Script
# /opt/oracle.SupportTools/sundiag.sh

1. cp /var/log/messages* .
2. /bin/dmesg > `hostname -a`_${mdate}_dmesg.out
3. /sbin/lspci > `hostname -a`_${mdate}_lspci.out
4. /sbin/fdisk -l > `hostname -a`_${mdate}_fdisk-l.out
5. /opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aALL > `hostname -a`_${mdate}_megacli64-AdpAllInfo.out
6. /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | awk -f analysis.awk > `hostname -a`_${mdate}_megacli64-PdList_short.out
where analysis.awk in /opt/MegaRAID/MegaCli is:
# This is a little AWK program that interprets MegaCLI output
/Slot Number/ { counter += 1; slot[counter] = $3 }
/Device Id/ { device[counter] = $3 }
/Firmware state/ { state_drive[counter] = $3 }
/Inquiry/ { name_drive[counter] = $3 " " $4 " " $5 " " $6 }
END {
for (i=1; i<=counter; i+=1) printf ( "Slot %02d Device %02d (%s) status is: %s <br/>\n", slot[i], device[i], name_drive[i], state_drive[i]); }
7. /opt/MegaRAID/MegaCli/MegaCli64 -AdpEventLog -GetEvents -f /tmp/logfile -aALL && cp /tmp/logfile ./`hostname -a`_${mdate}_megacli64-GetEvents-all.out
8. /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -aALL > `hostname -a`_${mdate}_megacli64-LdPdInfo.out
9. /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL > `hostname -a`_${mdate}_megacli64-PdList_long.out
10. /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -LALL -aALL > `hostname -a`_${mdate}_megacli64-LdInfo.out
11. cellcli -e list cell detail > `hostname -a`_${mdate}_cell-detail.out
12. cellcli -e list celldisk > `hostname -a`_${mdate}_celldisk.out
13. cellcli -e list lun detail > `hostname -a`_${mdate}_lundetail.out
14. cellcli -e list physicaldisk detail > `hostname -a`_${mdate}_physicaldisk-detail.out
15. cellcli -e list physicaldisk where status!=normal > `hostname -a`_${mdate}_physicaldisk-fail.out
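Each output file name in the listing combines the host name, a date stamp held in the mdate variable, and a label. The following is a minimal sketch of building such a name. The function name out_name is hypothetical, and the date format in the usage line is an assumption, because the listing does not show how mdate is set.

```shell
#!/bin/sh
# Hypothetical helper: build an output file name of the form
# <host>_<date>_<label>.out. The braces in ${mdate} matter: without them the
# shell would look up a variable named "mdate_<label>" instead of "mdate".
out_name() {
    host=$1 mdate=$2 label=$3
    echo "${host}_${mdate}_${label}.out"
}

# Usage (date format is an assumption):
#   /bin/dmesg > "$(out_name "$(hostname)" "$(date +%Y_%m_%d_%H_%M)" dmesg)"
```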



Create a script called megaraid_status.sh for checking RAID status, make it executable with
chmod +x, and store it in /opt/MegaRAID/MegaCli.
#!/bin/sh
# Controllers to check, for example "a0 a1"
CONT="a0"
STATUS=0
MEGACLI=/opt/MegaRAID/MegaCli/MegaCli64
echo -n "Checking RAID status on "
hostname
for a in $CONT
do
    NAME=`$MEGACLI -AdpAllInfo -$a |grep "Product Name" | cut -d: -f2`
    echo "Controller $a: $NAME"
    noonline=`$MEGACLI -PDList -$a | grep Online | wc -l`
    echo "No of Physical disks online : $noonline"
    DEGRADED=`$MEGACLI -AdpAllInfo -$a |grep "Degrade"`
    echo $DEGRADED
    NUM_DEGRADED=`echo $DEGRADED |cut -d" " -f3`
    # A nonzero count sets the exit status to 1
    [ "$NUM_DEGRADED" -ne 0 ] && STATUS=1
    FAILED=`$MEGACLI -AdpAllInfo -$a |grep "Failed Disks"`
    echo $FAILED
    NUM_FAILED=`echo $FAILED |cut -d" " -f4`
    [ "$NUM_FAILED" -ne 0 ] && STATUS=1
done
exit $STATUS
Example output of script to capture in `hostname -a`_${mdate}_megacli64-status.out
Checking RAID status on trnA-db03.sodm.com
Controller a0: LSI MegaRAID SAS 9261-8i
No of Physical disks online : 3
Degraded : 0



# vi testraid.sh
#!/bin/sh
# Controllers to check, for example "a0 a1"
CONT="a0"
STATUS=0
MEGACLI=/opt/MegaRAID/MegaCli/MegaCli64
echo -n "Checking RAID status on "
hostname
for a in $CONT
do
    NAME=`$MEGACLI -AdpAllInfo -$a |grep "Product Name" | cut -d: -f2`
    echo "Controller $a: $NAME"
    noonline=`$MEGACLI -PDList -$a | grep Online | wc -l`
    echo "No of Physical disks online : $noonline"
    DEGRADED=`$MEGACLI -AdpAllInfo -$a |grep "Degrade"`
    echo $DEGRADED
    NUM_DEGRADED=`echo $DEGRADED |cut -d" " -f3`
    # A nonzero count sets the exit status to 1
    [ "$NUM_DEGRADED" -ne 0 ] && STATUS=1
    FAILED=`$MEGACLI -AdpAllInfo -$a |grep "Failed Disks"`
    echo $FAILED
    NUM_FAILED=`echo $FAILED |cut -d" " -f4`
    [ "$NUM_FAILED" -ne 0 ] && STATUS=1
done
exit $STATUS
# /opt/MegaRAID/MegaCli/testraid.sh
Checking RAID status on trnadb03.sodm.com
Controller a0: LSI MegaRAID SAS 9261-8i
No of Physical disks online : 3
Degraded : 0
Failed Disks : 0



# vi analysis.awk
# This is a little AWK program that interprets MegaCLI output
/Device Id/ { counter += 1; device[counter] = $3 }
/Firmware state/ { state_drive[counter] = $3 }
/Inquiry/ { name_drive[counter] = $3 " " $4 " " $5 " " $6 }
END {
for (i=1; i<=counter; i+=1) printf ( "Device %02d (%s) status is: %s <br/>\n", device[i], name_drive[i], state_drive[i]); }
# /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | awk -f analysis.awk
Device 11 (HITACHI H101414SCSUN146GSA250921ERW1YA ) status is: Online <br/>
Device 10 (HITACHI H101414SCSUN146GSA250921ERTL3A ) status is:

Flint Application
The Mellanox Firmware Tool (MFT) is a suite of applications installed on the Linux Infiniband
host. The Flint application performs the switch chip firmware upgrade.
The format of the command line is:
flint -d lid-LID -i image b
where:
• LID is the LID of the switch chip.
• image is the file name of the firmware binary image.
The Flint application is also used to verify the switch chip version and to reset the switch chip.
If the Flint application is not installed on your Linux Infiniband host, install it.
Install the Flint Application
1. As superuser on the Linux Infiniband host, open the following URL in a web browser:
http://www.mellanox.com.
2. Locate and click the Firmware Tools link. The Mellanox Firmware Tools (MFT) page
appears.
3. Click the following links to download the respective Linux files.
- MFT_Linux_Release_Notes (PDF file)
- MFT_User’s Manual for Linux (PDF file)
- MFT_SW for Linux (.tgz file)
Read the user’s manual for installation instructions and how to use the Flint application.



Download Switch Chip Firmware
1. As superuser, open a web browser on the Linux Infiniband host and go to this URL:
http://www.sun.com/software.
2. Download the most current versions of the switch chip firmware.
3. In the download directory, expand the compressed file.
4. On the Linux Infiniband host, use the ibswitches command to determine the LID of
the switch chip to be upgraded:
# ibswitches
The command returns a listing of switch GUIDs, descriptions, and LIDs, as in the
following example:
# ibswitches
.
.
Switch : 0x0021283a831da0a2 ports 36 "Sun DCS 36 QDR switch 2.0" base port 0 lid 198 lmc 0
.
.
.
In the example, the switch chip LID is 198.
5. Determine your current switch chip firmware version.
# flint -d lid-LID -qq q
where LID is the LID of the switch chip, as in the following example:
# flint -d lid-198 -qq q
Image type:FS2
FW Version:7.1.948
Device ID:48438
Chip Revision:A0
Description:NodeSys image
6. Compare the versions in the output to those versions that you previously downloaded.



7. If the installed firmware is older than the current version, upgrade the switch chip
firmware.
8. On the Linux Infiniband host, program the switch chip.
For example, to program a switch chip having a LID of 198 with the M2_I4_IMG.bin
firmware file:
# flint -d lid-198 -i M2_I4_IMG.bin b
Current FW version on flash:7.1.948
New FW version:7.2.0
Do you want to continue ? (y/n) : y
Burning first FW image without signatures - OK
Restoring first signature - OK
- Burning primary image- OK
- Verifying primary image- OK
The switch chip is programmed with the firmware.
9. Reset the switch chip:
# flint -d lid-198 swreset
Resetting device lid-198
Note: Resetting the switch chip might take up to 90 seconds to complete.
10. Verify the new switch chip firmware version:
# flint -d lid-198 -qq q
Image type: FS2
FW Version: 7.2.0
Device ID: 48438
Chip Revision: A0
Description: NodeSys image
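Two parts of the procedure are easy to get wrong by eye: reading the LID out of the ibswitches listing and deciding whether the installed firmware is older than the downloaded one. The following is a minimal sketch of both, assuming the output format shown in the examples. The function names switch_lids and fw_is_older are hypothetical, and the version comparison relies on sort -V from GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helpers for the LID lookup and the version comparison.

# Print the LID from each "Switch" line of ibswitches output (read on stdin).
# Assumes the LID follows the word "lid", as in the example output.
switch_lids() {
    awk '/^Switch/ { for (i = 1; i < NF; i++) if ($i == "lid") print $(i + 1) }'
}

# Return 0 if version $1 is strictly older than version $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
fw_is_older() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage:
#   ibswitches | switch_lids
#   fw_is_older 7.1.948 7.2.0 && echo "upgrade needed"
```

A plain string comparison would not be reliable here, because dotted versions do not sort lexically once a component reaches two or more digits.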



Exadata Storage Server Configuration
The data is collected by using Distributed Shell for Oracle Storage (dcli), a tool that allows
the execution of commands on multiple Exadata Storage Servers in parallel threads.
The following is an example of the script usage, executed in any of the Exadata Storage
Servers as user celladmin.
$ dcli -c cel01,cel02,cel03,cel04 "cellcli -e start GetSCConf.scl"
The preceding command executes the list of cellcli commands specified in the
GetSCConf.scl script on Exadata Storage Servers cel01, cel02, cel03, and cel04.
GetSCConf.scl
A preview of the GetSCConf.scl script:
set echo on
list cell detail
list lun detail
list physicaldisk detail
list celldisk detail
list griddisk detail
You can keep the host names of the Exadata Storage Servers in a text file and use it with
dcli:
$ dcli -g cells.txt "cellcli -e start GetSCConf.scl"
where cells.txt is:
cel01
cel02
cel03
cel04
Getting the Information
To obtain the data, execute the GetSCConf.sh script:
#!/bin/sh
dcli -g cells.txt "cellcli -e start GetSCConf.scl"
dcli -g cells.txt ifconfig
1. Log in to one of the Exadata Storage Servers as the celladmin user.
2. Create the GetSCConf.sh script and the cells.txt file.
3. Modify the content of the cells.txt file, entering the correct host names of the Exadata
Storage Servers.
4. Execute: $ sh GetSCConf.sh > GetSCConf.out.
5. The GetSCConf.out file can be downloaded for further analysis.
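The -c form takes a comma-separated list of cells and the -g form takes a file with one host name per line. The following is a minimal sketch of converting a group file into the list form, which can be handy for a one-off command. The function name cells_csv is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn a group file (one host name per line) into the
# comma-separated list used with "dcli -c". Blank lines are skipped.
cells_csv() {
    grep -v '^[[:space:]]*$' "$1" | paste -s -d, -
}

# Usage:
#   dcli -c "$(cells_csv cells.txt)" "cellcli -e list cell"
```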



Sun Fire X4170 Database Server Configuration
The configuration information can be obtained by using the dcli tool and executing the
GetDBConf.sh script.
The files used to collect the information are:
• GetDBConf.sh
#!/bin/sh
echo "####################"
echo "Getting cellip.ora"
echo "####################"
dcli -l usupport -g dbsrv.txt cat /etc/oracle/cell/network-config/cellip.ora
echo "#####################"
echo "Getting cellinit.ora"
echo "#####################"
dcli -l usupport -g dbsrv.txt cat /etc/oracle/cell/network-config/cellinit.ora
• File dbsrv.txt: Contains the host names for the Database Servers
db01
db02
To execute GetDBConf.sh, perform the following steps:
1. Log in to one of the Exadata Storage Servers as the celladmin user.
2. Create the GetDBConf.sh script and the dbsrv.txt file.
3. Modify the content of the dbsrv.txt file, entering the correct host names of the Database
Servers.
4. Execute $ sh GetDBConf.sh > GetDBConf.out.
The GetDBConf.out file can be downloaded for further analysis.



Monitoring ESM Lifespan
Because the onboard ESM has a two-year lifespan, two different methods are provided that
monitor how long an ESM has been installed, and you are notified when to replace the ESM.
• One option is the Sun Flash Accelerator F20 ESM Monitoring Utility, which is a simple
script that you install on your host server to track the life of the ESM. You must use this
monitoring option for F20 cards with part number 511-1500-01.
• The second option uses ILOM to monitor the F20 card. ILOM tracks the ESM lifespan
and notifies you when to replace the ESM. You must use this monitoring option for F20
cards with part numbers 511-1500-05 or greater, and with ILOM system firmware
version 7.2.7.d or later.
Sun Flash Accelerator F20 ESM Monitoring Utility
The Sun Flash Accelerator F20 ESM Monitoring Utility is a simple tool that you install on your
host server to track the life of the ESM. After it is installed, the ESM Monitoring Utility runs
weekly to track the age of your ESM. The utility sends messages to the console and the
/var/adm/messages file as the ESM approaches or exceeds the two-year replacement
interval. Optionally, you can use an external monitoring tool to configure an SNMP trap that
sends an email alert when these messages appear.
The utility can be run manually any time to display the current ESM replacement data on all
installed cards.
Note: Installation of this utility is required on cards with part number 511-1500-01 to maintain
optimal performance for the life of the card. This option will not work on cards with part
numbers 511-1500-05 or higher.
To install the utility, follow the directions in the read me file.
Purpose
The Sun Flash Accelerator F20 ESM Monitoring Utility checks the onboard Energy Storage
Module (ESM) daily, and indicates when to replace the ESM as it reaches its life-time
threshold. Replacement of the ESM ensures continued optimal performance while accessing
the FMods on the Sun Flash Accelerator F20 PCIe Card.



Installation
Make sure that you use the correct script for your operating system.
Linux and Solaris use similar scripts, which perform the following actions:
• Locate all Aura cards.
• Set up a periodic job to check the lapsed time information.
• Run the installation tool as root.
• Start the installer by giving the full path.
• Install with defaults to run the check daily.
If the installer is started with the -h switch, a description of the function is printed and the script
then exits.
Note: If there is an identical entry in the crontab, it will be replaced to avoid a duplicate entry.
Linux
The installer will modify the root crontab stored at /var/spool/cron/root. If the
customer wants to use an alternate location, the file should be moved to the preferred
location.
Unpack the distribution into the directory where you want the check tool to run from:
tar xf check_supercap.tar
As root, run the install_check_esm.sh installation script and answer the questions.
Description
This script installs a Linux tool, check_esm, as a cron job. The tool checks if the ESM
component in any Sun Flash Accelerator F20 cards installed in the system has exceeded its
lifespan. A warning message is sent out to the system console and logged to the system log
file, that is, /var/log/messages. Before replacing the ESM component, the user should run
check_esm to reset the ESM initial power-on time.
Synopsis
install_check_esm.sh [-h]
Running the installer with no switches will prompt for the cron data. If the defaults are used,
the tool runs on the 1st and 15th of each month.
[root@wgs94-203 check_supercap]# ./install_check_esm.sh
Installing into /home/gent/c800/check_supercap
The cron job will be run weekly.
Please set the time of day and the day of the week.
Press ENTER after each input
hour[0-23]: [default 0] 23
day of the week[0-6 with 0=Sunday]: [default 0] 6
You are about to create the cron job:
0 23 * * 6 /home/gent/c800/check_supercap/check_esm -p all 1,0,0 > /dev/null 2>&1
Are you sure that you want to continue? y
Starting creating your cron job...
Reading Aura ESM initial power-on date for all cards.
Done
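The transcript shows how the hour and day-of-week answers map onto the crontab entry. The following is a minimal sketch of that mapping, useful for checking an entry by hand. The function name esm_cron_line is hypothetical, and the check_esm arguments are copied from the transcript without further knowledge of their meaning.

```shell
#!/bin/sh
# Hypothetical helper: build a weekly crontab line like the one the installer
# prints. Arguments: hour (0-23), day of week (0-6, 0=Sunday), install dir.
# The check_esm arguments are copied from the transcript as-is.
esm_cron_line() {
    hour=$1 dow=$2 dir=$3
    case $hour in ''|*[!0-9]*) return 1 ;; esac
    case $dow in ''|*[!0-9]*) return 1 ;; esac
    [ "$hour" -le 23 ] && [ "$dow" -le 6 ] || return 1
    echo "0 $hour * * $dow $dir/check_esm -p all 1,0,0 > /dev/null 2>&1"
}

# Usage: esm_cron_line 23 6 /home/gent/c800/check_supercap
```

The five leading fields are minute, hour, day of month, month, and day of week, so "0 23 * * 6" means 23:00 every Saturday.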



Vital Product Data
The Vital Product Data (VPD) on the F20 is used to provide nonvolatile storage of the
threshold.
Correct operation of the tool requires that the Aura Card has a correct VPD installed. If the
VPD is blank or has an incorrect part number, the tool will not work. If the tool responds with a
message that is similar to “VPD Data is not Valid,” use LSIUtil to write a corrected VPD.
• LSIUtil Option 48 reads the VPD.
- If there is a VPD, make a note of the serial number.
• LSIUtil Option 49 writes the VPD.
- Use the file that is delivered with the kit as the prototype VPD.
- Enter the serial number when requested by LSIUtil.
- If you do not have a serial number and have not removed the card to find it, enter
55555555.
Note: If check_esm does not show all flash cards in the system, it is most likely due to
either of the following:
1. The card does not have the right part numbers in VPD. check_esm supports three
part numbers:
511-1275-02, 511-1275-03, and 511-1500-01
You can verify the part number by using the LSIUtil Option 48.
2. The card does not have the latest LSI HBA firmware with the SSID fix. You can
find the download instruction and firmware at the following location:
/net/elis-ha2-nfs.east/export/ds02/d134/mongo/aura/teams/fw/LSI/1068E/phase15-1.27.3-bios-SSID



ILOM Monitoring
If you have multiple Sun Flash Accelerator F20 PCIe cards of the same age installed,
consider replacing the ESMs at the same time to minimize system down time. Service the
ESM (F371-4650) as described in the Sun Flash Accelerator F20 PCIe User’s Guide
(820-7265).
ILOM ESM Monitoring Option
For later-generation F20 cards (part number 511-1500-05 or greater), the ESM lifespan is
automatically monitored by the ILOM system management firmware (system firmware version
7.2.7.d or greater) installed on your host.
ILOM monitors ESMs by recording the Total_Time_On for each installed F20 card, and then
issues warning messages (to the event log and to the host Solaris syslog) as an ESM
approaches the end of its two-year lifespan.
For example, one week before an ESM reaches its two-year threshold, ILOM issues this
warning message:
"/SYS/MB/RISER1/PCI4/F20CARD ESM is approaching its lifespan. Please schedule a
replacement as soon as possible."
When an ESM reaches its two-year threshold, ILOM issues this critical event message:
"/SYS/MB/RISER1/PCI4/F20CARD ESM has exceeded its lifespan. Please schedule a
replacement as soon as possible."
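The two messages correspond to two age thresholds: a warning one week before the two-year mark and a critical event at the mark itself. The following is a minimal sketch of that classification. It is an illustration only and not ILOM's implementation; the function name esm_event is hypothetical, and counting two years as 730 days is an assumption.

```shell
#!/bin/sh
# Hypothetical sketch of the thresholds described in the text, not ILOM code.
# Assumption: the two-year lifespan is counted as 730 days.
ESM_LIFESPAN_DAYS=730
ESM_WARNING_DAYS=7   # warning starts one week before the threshold

# Print "critical", "warning", or "ok" for an ESM age given in days.
esm_event() {
    age=$1
    if [ "$age" -ge "$ESM_LIFESPAN_DAYS" ]; then
        echo critical
    elif [ "$age" -ge $((ESM_LIFESPAN_DAYS - ESM_WARNING_DAYS)) ]; then
        echo warning
    else
        echo ok
    fi
}

# Usage: esm_event 725
```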
Note: You can configure ILOM to send these alerts by email or SNMP trap. See your ILOM
documentation for more information.
Service the ESM (F371-4650) as described in the Sun Flash Accelerator F20 PCIe User’s
Guide (820-7265).
After you have replaced your ESM, use ILOM’s standard fault clearing methods to remove the
fault warnings; this also resets the F20 card Total_Time_On counter to 0. For more
information about using ILOM, see http://docs.sun.com/app/docs/coll/ilom3.0?l=en.
